Phylogenetics
In biology, phylogenetics (Greek: phylon = tribe, race and genetikos = relative to birth, from genesis = birth) is the study of evolutionary relatedness among various groups of organisms (e.g., species, populations). Phylogenetics, also known as phylogenetic systematics, treats a species as a group of lineage-connected individuals over time. Phylogenetic taxonomy, which is an offshoot of, but not a logical consequence of, phylogenetic systematics, constitutes a means of classifying groups of organisms according to degree of evolutionary relatedness.
"Phylogenetics is the science of estimating the evolutionary past, in the case of molecular phylogeny, based on the comparison of DNA or protein sequences." — Sandra L. Baldauf
Phylogeny (or phylogenesis) is the origin and evolution of a set of organisms, usually a set of species. A major task of systematics is to determine the ancestral relationships among known species (both living and extinct). The most commonly used methods to infer phylogenies include cladistics, phenetics, maximum likelihood, and Bayesian inference. These last two depend upon a mathematical model describing the evolution of characters observed in the species included, and are usually used for molecular phylogeny where the characters are aligned nucleotide or amino acid sequences.
Terminology
"A phylogenetic tree is composed of branches (edges) and nodes. Branches connect nodes; a node is the point at which two (or more) branches diverge. Branches and nodes can be internal or external (terminal). An internal node corresponds to the hypothetical last common ancestor (LCA) of everything arising from it. Terminal nodes correspond to the sequences from which the tree was derived (also referred to as operational taxonomic units or 'OTUs'). Trees can be made up of multigene families (gene trees) or a single gene from many taxa (species trees, at least theoretically) or a combination of the two. In the first case, the internal nodes correspond to gene duplication events, in the second to speciation events." — Sandra L. Baldauf
Groups
- monophyletic (holophyletic)
- a natural group; all members are derived from a unique common ancestor (with respect to the rest of the tree) and have inherited a set of unique common traits (characters) from it (Baldauf, 2003).
- paraphyletic
- a group that excludes some of the descendants of its common ancestor (e.g. animals excluding humans) (Baldauf, 2003).
- polyphyletic
- a mixture of distantly related OTUs, perhaps superficially resembling one another or retaining similar primitive characteristics; that is, not a group at all (Baldauf, 2003).
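As a rough illustration of the monophyly definition, the sketch below tests whether a set of OTUs corresponds to exactly one node of a rooted tree, that is, whether the set contains all terminal nodes descended from some common ancestor and nothing else. The nested-tuple tree, the taxon labels and the function names are hypothetical, chosen only for this example. A group that fails the test may be paraphyletic or polyphyletic; the sketch does not tell the two apart.

```python
# Hypothetical rooted tree written as nested tuples.
# Strings are terminal nodes (OTUs); tuples are internal nodes.
TREE = ("outgroup", ("A", ("B", "C")))


def leaves(node):
    """Return the set of terminal nodes (OTUs) below a node."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result


def is_monophyletic(tree, group):
    """True if some node of the tree has exactly `group` as its set of leaves."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


print(is_monophyletic(TREE, {"B", "C"}))  # B and C share a node of their own
print(is_monophyletic(TREE, {"A", "B"}))  # their common ancestor also gave rise to C
```

In this toy tree the group {A, B} leaves out C, which descends from the same ancestor, so it is not monophyletic.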
Phylogenetic trees
The methods for calculating phylogenetic trees fall into two general categories (Hall, 2000):
- Distance-matrix methods (also known as clustering or algorithmic methods; much faster than the discrete data methods); and
- Discrete data methods (also known as tree searching methods; more information rich than distance-matrix methods)
Distance-matrix methods
Several distance-matrix methods exist. The main ones are listed below (Nei and Kumar, 2000):
- UPGMA
- clusters the least distant (most similar) species first. At each stage of clustering, part of the tree is built and its branch lengths are calculated. UPGMA assumes a constant evolutionary rate, so the two members of a cluster are given the same branch length from their shared node. It is a simple and fast method, but it often produces incorrect topologies when the constant-rate assumption is not met.
- Least Squares (LS) Method (or the Cavalli-Sforza-Edwards Method)
- calculates the differences between the observed distances between species and the distances implied by the estimated branch lengths. After it evaluates all possible topologies, it chooses the topology with the smallest difference. Branch lengths can be estimated in two ways, by the Fitch-Margoliash method or by least squares.
- Minimum Evolution (ME) Method
- estimates the total branch length of each topology. After it evaluates all possible topologies, it chooses the topology with the least total branch length. This method is computationally intensive and therefore slow, and with a small number of species to compare, the NJ method usually gives the same result as the ME method in less time.
- Neighbour-Joining (NJ) Method
- involves clustering of neighbour species that are joined by one node. It does not evaluate all the possible tree topologies, but at each stage of clustering the ME method is used. Thus, the NJ method is considered a simplified version of the ME method.
- Fitch-Margoliash
- see link
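The UPGMA procedure described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `upgma`, the taxon labels and the distance matrix are all made up for the example, and ties between equally close pairs are broken arbitrarily. The distance from a newly formed cluster to any other cluster is taken as the size-weighted mean of its two members' distances, and each new node is placed at half the distance between the clusters it joins, which is where the constant-rate assumption enters.

```python
from itertools import combinations


def upgma(names, matrix):
    """Cluster taxa by UPGMA.

    names:  list of taxon labels
    matrix: symmetric list of lists, matrix[i][j] = distance between taxa i and j
    Returns (tree, height). A tree is a label, or a pair of
    (subtree, branch_length) tuples; height is the height of the root.
    """
    # Active clusters: id -> (subtree, number of taxa, height of its node)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    dist = {frozenset((i, j)): matrix[i][j]
            for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        # Join the two clusters separated by the smallest distance.
        i, j = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        (ti, ni, hi), (tj, nj, hj) = clusters.pop(i), clusters.pop(j)
        height = dist[frozenset((i, j))] / 2
        # Distance to every remaining cluster: size-weighted mean.
        for k in clusters:
            dist[frozenset((next_id, k))] = (
                ni * dist[frozenset((i, k))] + nj * dist[frozenset((j, k))]
            ) / (ni + nj)
        # Both children end at the same height, as the constant rate requires.
        subtree = ((ti, height - hi), (tj, height - hj))
        clusters[next_id] = (subtree, ni + nj, height)
        next_id += 1
    (tree, _, height), = clusters.values()
    return tree, height


# Made-up distances that happen to fit a constant rate exactly.
NAMES = ["A", "B", "C", "D"]
MATRIX = [
    [0, 2, 4, 6],
    [2, 0, 4, 6],
    [4, 4, 0, 6],
    [6, 6, 6, 0],
]
tree, height = upgma(NAMES, MATRIX)
print(tree, height)
```

With these distances A and B are joined first, then C, then D. Data that depart from a constant rate can lead the same procedure to a wrong topology, which is the weakness noted for UPGMA.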
Discrete data methods
- parsimony
- see link
- maximum likelihood
- see link
- Bayesian methods
- see link
Roots
- outgroup
- anything that is not a natural member of the group of interest (i.e. the 'ingroup') (Baldauf, 2003).
- outcast
- the excluded member of a monophyletic group (i.e. the exclusion that makes it paraphyletic) is not an outgroup but an outcast (e.g. humans are not an outgroup of animals) (Baldauf, 2003).
- plesiomorphy
- the original (ancestral) character state. Shared plesiomorphies are called symplesiomorphies (uninformative similarity). Two kinds are shown on the tree above: (a) character 1, which is shared by the ingroup and outgroup, and (b) characters 3, 4, and 5, which are shared by the ingroup taxa only.
- apomorphy
- a derived character or character state. Shared apomorphies are called synapomorphies—informative similarity. The only synapomorphic characters on the tree are 6, 7, and 8. These provide information on branching relationships within the ingroup.
- autapomorphy
- uninformative differences unique to particular ingroup taxa. Characters 9, 10, and 11 are all autapomorphies for their respective taxa. Even though these characters are different between the taxa, they provide no cladistically useful information. Bear in mind though that they do provide information on branch length. A cladogram that shows branch length proportional to the amount of character change (including autapomorphic changes) is called a phylogram.
- holapomorphy
- present in the ingroup and their ancestor.
- homoplasy
- uninformative similarity (i.e., due to convergence or parallelism), or independent evolution of the same character (see Jukes-Cantor correction). Character 12 is an example: it is present in only some of the ingroup taxa. Clade A, B is supported by character 12 alone, whereas clade B, C is supported by characters 6, 7, and 8. So, by the principle of parsimony, clade B, C is favored. Character 12 can then be interpreted either as convergent in A and B or as holapomorphic (i.e. present in the ingroup and their ancestor) but lost in taxon C. Parsimony simply means "economy in reasoning": if all things are equal, the simplest explanation is the preferred one (Occam's Razor).
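The parsimony comparison between the two candidate clades can be made concrete by counting the character changes each tree requires. The sketch below uses Fitch's counting procedure for rooted binary trees. The presence/absence coding of characters 6, 7, 8 and 12 is an assumption reconstructed from the description in the text (the figure itself is not reproduced here), and the taxon labels, tree encodings and function names are hypothetical.

```python
def fitch_steps(tree, states):
    """Return (possible ancestral states, number of changes) for one character.

    tree:   a taxon label, or a pair (left, right) of subtrees
    states: dict mapping taxon label -> observed character state
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_steps(left, states)
    right_set, right_steps = fitch_steps(right, states)
    steps = left_steps + right_steps
    common = left_set & right_set
    if common:
        # The children can share a state, so no change is needed here.
        return common, steps
    # No shared state: one change must have occurred below this node.
    return left_set | right_set, steps + 1


def tree_length(tree, characters):
    """Total number of changes over all characters (the parsimony score)."""
    return sum(fitch_steps(tree, states)[1] for states in characters.values())


# Assumed coding: 1 = character present, 0 = absent.
CHARACTERS = {
    6: {"outgroup": 0, "A": 0, "B": 1, "C": 1},
    7: {"outgroup": 0, "A": 0, "B": 1, "C": 1},
    8: {"outgroup": 0, "A": 0, "B": 1, "C": 1},
    12: {"outgroup": 0, "A": 1, "B": 1, "C": 0},
}

TREE_AB = ("outgroup", (("A", "B"), "C"))  # groups A with B
TREE_BC = ("outgroup", ("A", ("B", "C")))  # groups B with C

print(tree_length(TREE_AB, CHARACTERS))  # 7 changes
print(tree_length(TREE_BC, CHARACTERS))  # 5 changes
```

Under this assumed coding, the tree that groups B with C needs 5 changes and the tree that groups A with B needs 7. On the shorter tree character 12 accounts for two of the changes, which is the homoplasy.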
Homology
Homologues can be orthologues or paralogues.
- orthologues
- Orthologues are genes thought to have evolved strictly by vertical descent (or vertically transmitted) from a common ancestor (i.e. parent to offspring). These genes usually arise from a common ancestral gene during speciation. Orthologous genes may or may not be responsible for a similar function. Their phylogeny traces that of their host lineage. Orthologues only duplicate when their host divides (i.e. along with the rest of the genome).
- paralogues
- members of multigene families; they arise by gene duplication (Baldauf, 2003).
Books
Phylogenetic Systematics
- Willi Hennig
- University of Illinois Press, Urbana, 1966.
- ISBN 0252068149 (280 pages)
Description: This book popularized the techniques of cladistics in the English-speaking world. It is based on work published in German starting 1950. Willi Hennig is considered the founder of cladistics, which he developed while working as an entomologist in East Germany.
Inferring Phylogenies
- Joseph Felsenstein
- Sinauer Associates, 2004.
- ISBN 0878931775 (664 pages)
Description: An excellent technical manual to guide any biologist wishing to construct a phylogenetic hypothesis.
See also
- Bayesian inference
- Phylogenetic tree
- Evolutionary tree
- Molecular phylogeny
- Maximum likelihood
- Bioinformatics
References
- Baldauf S (2003). Phylogeny for the faint of heart: a tutorial. TRENDS in Genetics 19(6):345-351.
- Hall BG (2000). Phylogenetic Trees Made Easy: A How-To Manual for Molecular Biologists. Sinauer Associates.
- Nei M and Kumar S (2000). Molecular Evolution and Phylogenetics. Oxford University Press, New York; pp. 73-113.
- Sneath PHA and Sokal RR (1973). Numerical Taxonomy. W.H. Freeman.