Difference between revisions of "UPGMA"

From Christoph's Personal Wiki
(Started article)
 
 
(4 intermediate revisions by the same user not shown)
Line 1: Line 1:
'''UPGMA''' ('''Unweighted Pair-Group Method using Arithmetic Averages''') is a simple bottom-up data clustering method used in [[bioinformatics]] for the creation of [[phylogeny|phylogenetic]] trees. The input data is a collection of objects with their pairwise distances and the output is a rooted tree (dendrogram). It is sometimes used for creating rooted phylogenetic trees under the assumption of a constant evolutionary rate. Initially, each object is in its own cluster. At each step, the nearest two clusters are combined into a higher-level cluster. The distance between any two clusters A and B is taken to be the average of all distances between pairs of objects a in A and b in B. UPGMA is not a well-regarded method for inferring phylogenetic trees unless the constant-rate assumption ([[molecular clock hypothesis]]) has been tested and justified for the data set being used.
+
'''UPGMA''' ('''Unweighted Pair-Group Method using Arithmetic Averages''') is a simple bottom-up data clustering method used in [[:Category:Bioinformatics|bioinformatics]] for the creation of [[phylogeny|phylogenetic]] trees.
  
UPGMA involves clustering of closely distant species. At each stage of clustering, tree branches are being built, and the branch lengths are calculated. UPGMA assumes a constant evolutionary rate, and so the two species in a cluster are given the same branch length from the node. It is a simple and fast method; however, because of the assumption, it often produces incorrect topologies when the assumption is not met.
+
The input data is a collection of objects with their pairwise distances and the output is a rooted tree (dendrogram). It is sometimes used for creating rooted phylogenetic trees under the assumption of a constant evolutionary rate. Initially, each object is in its own cluster. At each step, the nearest two clusters are combined into a higher-level cluster. The distance between any two clusters ''A'' and ''B'' is taken to be the average of all distances between pairs of objects ''a'' in ''A'' and ''b'' in ''B''. UPGMA is not a well-regarded method for inferring phylogenetic trees unless the constant-rate assumption ([[molecular clock hypothesis]]) has been tested and justified for the data set being used.
  
== Methods Using Distance Matrices ==
+
UPGMA involves clustering of closely distant species. At each stage of clustering, tree branches are being built, and the branch lengths are calculated. UPGMA assumes a constant evolutionary rate, and so the two species in a cluster are given the same branch length from the node. It is a simple and fast method; however, because of the assumption, it often produces incorrect topologies when the assumption is not met.
There are various methods of the distance matrix method. Listed below are the four main ones (Nei & Kumar, 2000):
+
  
; UPGMA : ''see above''
+
==See also==
; Least Squares (LS) Method : calculates the differences between the observed and estimated branch lengths between species. After it evaluates all possible topologies, it chooses the topology with the smallest difference. The estimation of branch lengths has two methods, [[Fitch-Margoliash]] and Least Squares.
+
*[[WPGMA]]
; Minimum Evolution (ME) Method : estimates the total branch length of each topology. After it evaluates all possible topologies, it chooses the topology with the least total branch length. This method is computationally intensive and therefore slow, and with a small number of species to compare, the NJ method usually gives the same result as the ME method in less time.
+
*[[Phylogenetic trees]]
; Neighbour-Joining (NJ) Method : involves clustering of neighbour species that are joined by one node. It does not evaluate all the possible tree topologies, but at each stage of clustering the ME method is used. Thus, the NJ method is considered a simplified version of the ME method.
+
  
== References ==
+
==References==
* Nei M and Kumar S (2000). Molecular Evolution and Phylogenetics. ''Oxford University Press'' (New York; pp73-113).
+
*Kimura M (1968). "[http://bioportal.weizmann.ac.il/course/evogen/Neutral/kimura.pdf Evolutionary rate at the molecular level]". ''Nature, 217:624-626''.
 +
*Morgan GJ (1998). "Emile Zuckerkandl, Linus Pauling, and the Molecular Evolutionary Clock, 1959-1965". ''Journal of the History of Biology, 31:155-178''.
 +
*Sarich VM, Wilson AC (1967). "Immunological time scale for hominid evolution". ''Science, 158:1200-1203''.
 +
*Zuckerkandl E, Pauling L (1962). "Molecular disease, evolution, and genetic heterogeneity", pp. 189–225 in ''Horizons in Biochemistry'', edited by M. Kasha and B. Pullman. Academic Press, New York.
 +
*Zuckerkandl E, Pauling L (1965). "Evolutionary divergence and convergence in proteins", pp. 97–166 in ''Evolving Genes and Proteins'', edited by V. Bryson and H. J. Vogel. Academic Press, New York.
  
[[Category:Academic Research]]
+
{{Phylogenetics}}
 
[[Category:Phylogenetics]]
 

Latest revision as of 10:34, 19 April 2007

UPGMA (Unweighted Pair-Group Method using Arithmetic Averages) is a simple bottom-up data clustering method used in bioinformatics for the creation of phylogenetic trees.

The input data is a collection of objects with their pairwise distances and the output is a rooted tree (dendrogram). It is sometimes used for creating rooted phylogenetic trees under the assumption of a constant evolutionary rate. Initially, each object is in its own cluster. At each step, the nearest two clusters are combined into a higher-level cluster. The distance between any two clusters A and B is taken to be the average of all distances between pairs of objects a in A and b in B. UPGMA is not a well-regarded method for inferring phylogenetic trees unless the constant-rate assumption (molecular clock hypothesis) has been tested and justified for the data set being used.
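
The clustering loop described above can be sketched in Python. This is a minimal, unoptimised illustration rather than a reference implementation: the function name `upgma`, the nested-tuple tree layout, and the example distances are all assumptions made for this sketch. Cluster distances are recomputed from the original leaf distances at every step, which follows the "average of all pairs" definition directly but is slower than updating a distance matrix in place.

```python
from itertools import combinations

def upgma(names, dist):
    """Build a rooted tree from pairwise distances.

    `dist` maps an unordered pair of leaf names, given as a tuple in either
    order, to their distance. A leaf is its name; an internal node is
    ((left_subtree, left_branch_length), (right_subtree, right_branch_length)).
    Returns (tree, root_height).
    """
    def leaf_distance(a, b):
        return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

    # cluster id -> (subtree, member leaves, height of the subtree's root)
    clusters = {i: (name, [name], 0.0) for i, name in enumerate(names)}

    def cluster_distance(x, y):
        # Average over all pairs (a in X, b in Y), as in the definition above.
        pairs = [(a, b) for a in clusters[x][1] for b in clusters[y][1]]
        return sum(leaf_distance(a, b) for a, b in pairs) / len(pairs)

    next_id = len(names)
    while len(clusters) > 1:
        # Pick the two nearest clusters (ties go to the first pair found).
        x, y = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(*pair))
        # The new node sits halfway between the two clusters.
        height = cluster_distance(x, y) / 2
        tree_x, members_x, height_x = clusters.pop(x)
        tree_y, members_y, height_y = clusters.pop(y)
        node = ((tree_x, height - height_x), (tree_y, height - height_y))
        clusters[next_id] = (node, members_x + members_y, height)
        next_id += 1

    tree, _, root_height = next(iter(clusters.values()))
    return tree, root_height

# Hypothetical, clock-like (ultrametric) example distances.
example = {("A", "B"): 2.0, ("A", "C"): 4.0, ("B", "C"): 4.0}
tree, root_height = upgma(["A", "B", "C"], example)
```

With these example distances, A and B are joined first at height 1.0, and C joins them at the root at height 2.0.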

UPGMA clusters the most closely related (least distant) species first. At each stage of clustering a new tree node is built and its branch lengths are calculated. Because UPGMA assumes a constant evolutionary rate, the two species joined in a cluster are given the same branch length from their shared node. The method is simple and fast, but it often produces incorrect topologies when the constant-rate assumption is not met.
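
The following Python sketch illustrates how unequal rates can mislead the first clustering step. Everything in it is a made-up example: the four-species tree ((A,B),(C,D)), its branch lengths, and the variable names are assumptions chosen so that A and C evolve much faster than B and D.

```python
from itertools import combinations

# Hypothetical true tree ((A,B),(C,D)) with unequal evolutionary rates:
# A and C sit on long terminal branches, B and D on short ones.
terminal = {"A": 5.0, "B": 1.0, "C": 5.0, "D": 1.0}
internal = 1.0                      # branch joining the (A,B) and (C,D) nodes
sisters = [{"A", "B"}, {"C", "D"}]  # true sister pairs

def path_length(x, y):
    """Distance between leaves x and y measured along the assumed tree."""
    d = terminal[x] + terminal[y]
    if {x, y} not in sisters:
        d += internal               # non-sisters also cross the internal branch
    return d

dist = {pair: path_length(*pair) for pair in combinations(sorted(terminal), 2)}

# UPGMA's first step joins the pair with the smallest distance ...
first_pair = min(dist, key=dist.get)
# ... and gives both members the same branch length, half that distance.
branch_length = dist[first_pair] / 2
```

Here `first_pair` is `("B", "D")`, each with a branch length of 1.5, although B and D are not sisters in the assumed tree: the two slowly evolving species look closest simply because they have accumulated the least change.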

See also

  • WPGMA
  • Phylogenetic trees

References

  • Kimura M (1968). "Evolutionary rate at the molecular level". Nature, 217:624-626.
  • Morgan GJ (1998). "Emile Zuckerkandl, Linus Pauling, and the Molecular Evolutionary Clock, 1959-1965". Journal of the History of Biology, 31:155-178.
  • Sarich VM, Wilson AC (1967). "Immunological time scale for hominid evolution". Science, 158:1200-1203.
  • Zuckerkandl E, Pauling L (1962). "Molecular disease, evolution, and genetic heterogeneity", pp. 189–225 in Horizons in Biochemistry, edited by M. Kasha and B. Pullman. Academic Press, New York.
  • Zuckerkandl E, Pauling L (1965). "Evolutionary divergence and convergence in proteins", pp. 97–166 in Evolving Genes and Proteins, edited by V. Bryson and H. J. Vogel. Academic Press, New York.