Advanced | Help | Encyclopedia
Directory


Maximum parsimony

Introduction


Maximum parsimony is a simple but popular technique used in cladistics to predict an accurate phylogenetic tree for an set of taxa (commonly a set of species or reproductively-isolated populations of a single species).

The input data used in a maximum parsimony analysis is in the form of "characters" for a range of taxa. A character could be a binary value for the presence or absence of a feature(such as the presense of a tail), or it could be the protein or nucleic acid residue at a particular site in the organism's genome.

There are a number of probabilistic and deterministic algorithms available for constructing phylogenetic trees. However, what we describe here is the criteria that is used to select the best tree, not the specific algorithm used to find that answer.

The trees used in maximum parsimony analysis are unrooted trees(there is no indication of time in the tree, only the relations between taxa). All the taxa used in the analysis are leaf nodes(often called tips) in the tree(so have only one edge into them). Internal nodes are inserted into the tree to represent the inferred ancestral species. Each internal node has exactly three edges into it. Transitions between characters are placed on any of the edges on the tree.

The maximum parsimony tree is the tree which explains the characters, and which has the less than or equal to the number of transitions on any other tree that also explains the characters. To do this, all treesare given a length, equal to the minimum number of transitions which can explain the tree. The tree with the lowest length is the maximum parsimony tree. This assumption that "the simplest possible explanation is always right" is a generalisation of Occam's Razor.


Problems with maximum parsiomony

Maximum parsimony is a very simple approach, and is popular for this reason. However, it is not always very accurate.

A major problem with maximum parsimony is that it assumes that the only way two species can share the same character is if they are genetically related. This is not always the case. For example, birds and bats have wings, while crocodiles and humans do not. Based on this data, maximum parsimony would tend to group crocodiles with humans, and birds with bats. However, humans are actually more closely related to bats(a mammal) than crocodiles(a reptile) or birds. A similar problem occurs to back-mutation, where, due to a small mutation, a character is removed, but then it later comes back. Because maximum parsimony uses unrooted trees, independent innovation and back-mutation are actually mathematically equivalent.

As documented in 1978 by Felsenstein, maximum parsimony can systematically fail on some types of trees, where there are a limited number of characters. This situations is called long branch attraction, and occurs, for example, where you have long branches(a high level of substitutions) for two characters(A & C), but short branches for another two(B & D). A and B diverged from a common ancestor, as did C and D.

Assume for simplicity that we are considering a single binary character(it can either be + or -). Because the distance from B to D is small, in the vast majority of all cases, B and D will be the same. Here, we will assume that they are both +(+ and – are assigned arbitrarily and swapping them is only a matter of definition). If this is the case, there are four remaining possibilities. A and C can both be +, in which case all taxa are the same and all the trees have the same length. A can be + and C can be -, in which case only one character is different, and we cannot learn anything, as all trees have the same length. Similarly, A can be + and C can be -. The only remaining possibility is that A and C are both -. In this case, however, we group A and C together, and B and D together.

As a consequence, when we have a tree of this type, the more data we collect(i.e. the more characters we study), the more we tend towards the wrong tree.

To overcome this problem, a more sophisticated approach, such as maximum likelihood, can be used.








Links: Addme | Keyword Research | Paid Inclusion | Femail | Software | Completive Intelligence

Add URL | About Slider | FREE Slider Toolbar - Simply Amazing
Copyright © 2000-2008 Slider.com. All rights reserved.
Content is distributed under the GNU Free Documentation License.