Difference between revisions of "Inferring Phylogenies"
From Christoph's Personal Wiki
(Started article) |
(Added "TOC". Needs formatting) |
||
Line 2: | Line 2: | ||
== Table of Contents == | == Table of Contents == | ||
+ | * PREFACE | ||
+ | * '''1. Parsimony methods''' | ||
+ | ** A simple example | ||
+ | *** Evaluating a particular tree | ||
+ | *** Rootedness and unrootedness | ||
+ | ** Methods of rooting the tree | ||
+ | ** Branch lengths | ||
+ | ** Unresolved questions | ||
+ | * '''2. Counting evolutionary changes''' | ||
+ | <ul> | ||
+ | <li>The Fitch algorithm</li> | ||
+ | <li>The Sankoff algorithm</li> | ||
+ | <ul> | ||
+ | <li>Connection between the two algorithms</li> | ||
+ | </ul> | ||
+ | <li>Using the algorithms when modifying trees</li> | ||
+ | <ul> | ||
+ | <li>Views</li> | ||
+ | <li>Using views when a tree is altered</li> | ||
+ | </ul> | ||
+ | <li>Further economies</li> | ||
+ | </ul> | ||
+ | * '''3. How many trees are there?''' | ||
+ | <ul> | ||
+ | <li>Rooted bifurcating trees</li> | ||
+ | <li>Unrooted bifurcating trees</li> | ||
+ | <li>Multifurcating trees</li> | ||
+ | <ul> | ||
+ | <li>Unrooted trees with multifurcations</li> | ||
+ | </ul> | ||
+ | <li>Tree shapes</li> | ||
+ | <ul> | ||
+ | <li>Rooted bifurcating tree shapes</li> | ||
+ | <li>Rooted multifurcating tree shapes</li> | ||
+ | <li>Unrooted shapes</li> | ||
+ | </ul> | ||
+ | <li>Labeled histories</li> | ||
+ | <li>Perspective</li> | ||
+ | </ul> | ||
+ | * '''4. Finding the best tree by heuristic search''' | ||
+ | <ul> | ||
+ | <li>Nearest-neighbor interchanges</li> | ||
+ | <li>Subtree pruning and regrafting</li> | ||
+ | <li>Tree bisection and reconnection</li> | ||
+ | <li>Other tree rearrangement methods</li> | ||
+ | <ul> | ||
+ | <li>Tree-fusing</li> | ||
+ | <li>Genetic algorithms</li> | ||
+ | <li>Tree windows and sectorial search</li> | ||
+ | </ul> | ||
+ | <li>Speeding up rearrangements</li> | ||
+ | <li>Sequential addition</li> | ||
+ | <li>Star decomposition</li> | ||
+ | <li>Tree space</li> | ||
+ | <li>Search by reweighting of characters</li> | ||
+ | <li>Simulated annealing</li> | ||
+ | <li>History</li> | ||
+ | </ul> | ||
+ | * '''5. Finding the best tree by branch and bound''' | ||
+ | <ul> | ||
+ | <li>A nonbiological example</li> | ||
+ | <li>Finding the optimal solution</li> | ||
+ | <li>NP-hardness</li> | ||
+ | <li>Branch and bound methods</li> | ||
+ | <li>Phylogenies: Despair and hope</li> | ||
+ | <li>Branch and bound for parsimony</li> | ||
+ | <li>Improving the bound</li> | ||
+ | <ul> | ||
+ | <li>Using still-absent states</li> | ||
+ | <li>Using compatibility</li> | ||
+ | </ul> | ||
+ | <li>Rules limiting the search</li> | ||
+ | </ul> | ||
+ | * '''6. Ancestral states and branch lengths''' | ||
+ | <ul> | ||
+ | <li>Reconstructing ancestral states</li> | ||
+ | <li>Accelerated and delayed transformation</li> | ||
+ | <li>Branch lengths</li> | ||
+ | </ul> | ||
+ | * '''7. Variants of parsimony''' | ||
+ | <ul> | ||
+ | <li>Camin-Sokal parsimony</li> | ||
+ | <li>Parsimony on an ordinal scale</li> | ||
+ | <li>Dollo parsimony</li> | ||
+ | <li>Polymorphism parsimony</li> | ||
+ | <li>Unknown ancestral states</li> | ||
+ | <li>Multiple states and binary coding</li> | ||
+ | <li>Dollo parsimony and multiple states</li> | ||
+ | <li>Polymorphism parsimony and multiple states</li> | ||
+ | <li>Transformation series analysis</li> | ||
+ | <li>Weighting characters</li> | ||
+ | <li>Successive weighting and nonlinear weighting</li> | ||
+ | <ul> | ||
+ | <li>Successive weighting</li> | ||
+ | <li>Nonsuccessive algorithms</li> | ||
+ | </ul> | ||
+ | </ul> | ||
+ | * '''8. Compatibility''' | ||
+ | <ul> | ||
+ | <li>Testing compatibility</li> | ||
+ | <li>The Pairwise Compatibility Theorem</li> | ||
+ | <li>Cliques of compatible characters</li> | ||
+ | <li>Finding the tree from the clique</li> | ||
+ | <li>Other cases where cliques can be used</li> | ||
+ | <li>Where cliques cannot be used</li> | ||
+ | <ul> | ||
+ | <li>Perfect phylogeny</li> | ||
+ | <li>Using compatibility on molecules anyway</li> | ||
+ | </ul> | ||
+ | </ul> | ||
+ | * '''9. Statistical properties of parsimony''' | ||
+ | <ul> | ||
+ | <li>Likelihood and parsimony</li> | ||
+ | <ul> | ||
+ | <li>The weights</li> | ||
+ | <li>Unweighted parsimony</li> | ||
+ | <li>Limitations of this justification of parsimony</li> | ||
+ | <li>Farris’s proofs</li> | ||
+ | <li>No common mechanism</li> | ||
+ | <li>Likelihood and compatibility</li> | ||
+ | <li>Parsimony versus compatibility</li> | ||
+ | </ul> | ||
+ | <li>Consistency and parsimony</li> | ||
+ | <ul> | ||
+ | <li>Character patterns and parsimony</li> | ||
+ | <li>Observed numbers of the patterns</li> | ||
+ | <li>Observed fractions of the patterns</li> | ||
+ | <li>Expected fractions of the patterns</li> | ||
+ | <li>Inconsistency</li> | ||
+ | <li>When inconsistency is not a problem</li> | ||
+ | <li>The nucleotide sequence case</li> | ||
+ | <li>Other situations where consistency is guaranteed</li> | ||
+ | <li>Does a molecular clock guarantee consistency?</li> | ||
+ | <li>The Farris zone</li> | ||
+ | </ul> | ||
+ | <li>Some perspective</li> | ||
+ | </ul> | ||
+ | * '''10. A digression on history and philosophy''' | ||
+ | <ul> | ||
+ | <li>How phylogeny algorithms developed</li> | ||
+ | <ul> | ||
+ | <li>Sokal and Sneath</li> | ||
+ | <li>Edwards and Cavalli-Sforza</li> | ||
+ | <li>Camin and Sokal and parsimony</li> | ||
+ | <li>Eck and Dayhoff and molecular parsimony</li> | ||
+ | <li>Fitch and Margoliash popularize distance matrix methods</li> | ||
+ | <li>Wilson and Le Quesne introduce compatibility</li> | ||
+ | <li>Jukes and Cantor and molecular distances</li> | ||
+ | <li>Farris and Kluge and unordered parsimony</li> | ||
+ | <li>Fitch and molecular parsimony</li> | ||
+ | <li>Further work</li> | ||
+ | <li>What about Willi Hennig and Walter Zimmermann?</li> | ||
+ | </ul> | ||
+ | <li>Different philosophical frameworks</li> | ||
+ | <ul> | ||
+ | <li>Hypothetico-deductive</li> | ||
+ | <li>Logical parsimony</li> | ||
+ | <li>Logical probability?</li> | ||
+ | <li>Criticisms of statistical inference</li> | ||
+ | <li>The irrelevance of classification</li> | ||
+ | </ul> | ||
+ | </ul> | ||
+ | * '''11. Distance matrix methods''' | ||
+ | <ul> | ||
+ | <li>Branch lengths and times</li> | ||
+ | <li>The least squares methods</li> | ||
+ | <ul> | ||
+ | <li>Least squares branch lengths</li> | ||
+ | <li>Finding the least squares tree topology</li> | ||
+ | </ul> | ||
+ | <li>The statistical rationale</li> | ||
+ | <li>Generalized least squares</li> | ||
+ | <li>Distances</li> | ||
+ | <li>The [[Jukes-Cantor model]]—an example</li> | ||
+ | <li>Why correct for multiple changes?</li> | ||
+ | <li>Minimum evolution</li> | ||
+ | <li>Clustering algorithms</li> | ||
+ | <li>UPGMA and least squares</li> | ||
+ | <ul> | ||
+ | <li>A clustering algorithm</li> | ||
+ | <li>An example</li> | ||
+ | <li>UPGMA on nonclocklike trees</li> | ||
+ | </ul> | ||
+ | <li>Neighbor-joining</li> | ||
+ | <ul> | ||
+ | <li>Performance</li> | ||
+ | <li>Using neighbor-joining with other methods</li> | ||
+ | <li>Relation of neighbor-joining to least squares</li> | ||
+ | <li>Weighted versions of neighbor-joining</li> | ||
+ | </ul> | ||
+ | <li>Other approximate distance methods</li> | ||
+ | <ul> | ||
+ | <li>Distance Wagner method</li> | ||
+ | <li>A related family</li> | ||
+ | <li>Minimizing the maximum discrepancy</li> | ||
+ | <li>Two approaches to error in trees</li> | ||
+ | </ul> | ||
+ | <li>A puzzling formula</li> | ||
+ | <li>Consistency and distance methods</li> | ||
+ | <li>A limitation of distance methods</li> | ||
+ | </ul> | ||
+ | * '''12. Quartets of species''' | ||
+ | <ul> | ||
+ | <li>The four point metric</li> | ||
+ | <li>The split decomposition</li> | ||
+ | <ul> | ||
+ | <li>Related methods</li> | ||
+ | </ul> | ||
+ | <li>Short quartets methods</li> | ||
+ | <li>The disk-covering method</li> | ||
+ | <li>Challenges for the short quartets and DCM methods</li> | ||
+ | <li>Three-taxon statement methods</li> | ||
+ | <li>Other uses of quartets with parsimony</li> | ||
+ | <li>Consensus supertrees</li> | ||
+ | <li>Neighborliness</li> | ||
+ | <li>De Soete’s search method</li> | ||
+ | <li>Quartet puzzling and searching tree space</li> | ||
+ | <li>Perspective</li> | ||
+ | </ul> | ||
+ | * '''13. Models of DNA evolution''' | ||
+ | <ul> | ||
+ | <li>Kimura’s two-parameter model</li> | ||
+ | <li>Calculation of the distance</li> | ||
+ | <li>The Tamura-Nei model, F84, and HKY</li> | ||
+ | <li>The general time-reversible model</li> | ||
+ | <ul> | ||
+ | <li>Distances from the GTR model</li> | ||
+ | </ul> | ||
+ | <li>The general 12-parameter model</li> | ||
+ | <li>LogDet distances</li> | ||
+ | <li>Other distances</li> | ||
+ | <li>Variance of distance</li> | ||
+ | <li>Rate variation between sites or loci</li> | ||
+ | <ul> | ||
+ | <li>Different rates at different sites</li> | ||
+ | <li>Distances with known rates</li> | ||
+ | <li>Distribution of rates</li> | ||
+ | <li>Gamma- and lognormally distributed rates</li> | ||
+ | </ul> | ||
+ | <ul> | ||
+ | <li>Distances from gamma-distributed rates</li> | ||
+ | </ul> | ||
+ | <li>Models with nonindependence of sites</li> | ||
+ | </ul> | ||
+ | * '''14. Models of protein evolution''' | ||
+ | <ul> | ||
+ | <li>Amino acid models</li> | ||
+ | <li>The Dayhoff model</li> | ||
+ | <li>Other empirically-based models</li> | ||
+ | <ul> | ||
+ | <li>Models depending on secondary structure</li> | ||
+ | </ul> | ||
+ | <li>Codon-based models</li> | ||
+ | <ul> | ||
+ | <li>Inequality of synonymous and nonsynonymous substitutions</li> | ||
+ | </ul> | ||
+ | <li>Protein structure and correlated change</li> | ||
+ | </ul> | ||
+ | * '''15. Restriction sites, RAPDs, AFLPs, and microsatellites''' | ||
+ | <ul> | ||
+ | <li>Restriction sites</li> | ||
+ | <ul> | ||
+ | <li>Nei and Tajima’s model</li> | ||
+ | <li>Distances based on restriction sites</li> | ||
+ | <li>Issues of ascertainment</li> | ||
+ | <li>Parsimony for restriction sites</li> | ||
+ | </ul> | ||
+ | <li>Modeling restriction fragments</li> | ||
+ | <ul> | ||
+ | <li>Parsimony with restriction fragments</li> | ||
+ | </ul> | ||
+ | <li>RAPDs and AFLPs</li> | ||
+ | <ul> | ||
+ | <li>The issue of dominance</li> | ||
+ | <li>Unresolved problems</li> | ||
+ | <li>Microsatellite models</li> | ||
+ | <li>The one-step model</li> | ||
+ | <li>Microsatellite distances</li> | ||
+ | <li>A Brownian motion approximation</li> | ||
+ | <li>Models with constraints on array size</li> | ||
+ | <li>Multi-step and heterogeneous models</li> | ||
+ | <li>Snakes and Ladders</li> | ||
+ | <li>Complications</li> | ||
+ | </ul> | ||
+ | </ul> | ||
+ | * '''16. Likelihood methods''' | ||
+ | <ul> | ||
+ | <li>Maximum likelihood</li> | ||
+ | <ul> | ||
+ | <li>An example</li> | ||
+ | </ul> | ||
+ | <li>Computing the likelihood of a tree</li> | ||
+ | <ul> | ||
+ | <li>Economizing on the computation</li> | ||
+ | <li>Handling ambiguity and error</li> | ||
+ | </ul> | ||
+ | <li>Unrootedness</li> | ||
+ | <li>Finding the maximum likelihood tree</li> | ||
+ | <li>Inferring ancestral sequences</li> | ||
+ | <li>Rates varying among sites</li> | ||
+ | <ul> | ||
+ | <li>Hidden Markov models</li> | ||
+ | <li>Autocorrelation of rates</li> | ||
+ | <li>HMMs for other aspects of models</li> | ||
+ | <li>Estimating the states</li> | ||
+ | </ul> | ||
+ | <li>Models with clocks</li> | ||
+ | <ul> | ||
+ | <li>Relaxing molecular clocks</li> | ||
+ | <li>Models for relaxed clocks</li> | ||
+ | <li>Covarions</li> | ||
+ | <li>Empirical approaches to change of rates</li> | ||
+ | </ul> | ||
+ | <li>Are ML estimates consistent?</li> | ||
+ | <ul> | ||
+ | <li>Comparability of likelihoods</li> | ||
+ | <li>A nonexistent proof?</li> | ||
+ | <li>A simple proof</li> | ||
+ | <li>Misbehavior with the wrong model</li> | ||
+ | <li>Better behavior with the wrong model</li> | ||
+ | </ul> | ||
+ | </ul> | ||
+ | * '''17. Hadamard methods''' | ||
+ | <ul> | ||
+ | <li>The edge length spectrum and conjugate spectrum</li> | ||
+ | <li>The closest tree criterion</li> | ||
+ | <li>DNA models</li> | ||
+ | <li>Computational effort</li> | ||
+ | <li>Extensions of Hadamard methods</li> | ||
+ | </ul> | ||
+ | * '''18. Bayesian inference of phylogenies''' | ||
+ | <ul> | ||
+ | <li>Bayes’ theorem</li> | ||
+ | <li>Bayesian methods for phylogenies</li> | ||
+ | <li>Markov chain Monte Carlo methods</li> | ||
+ | <li>The Metropolis algorithm</li> | ||
+ | <ul> | ||
+ | <li>Its equilibrium distribution</li> | ||
+ | <li>Bayesian MCMC</li> | ||
+ | </ul> | ||
+ | <li>Bayesian MCMC for phylogenies</li> | ||
+ | <ul> | ||
+ | <li>Priors</li> | ||
+ | </ul> | ||
+ | <li>Proposal distributions</li> | ||
+ | <li>Computing the likelihoods</li> | ||
+ | <li>Summarizing the posterior</li> | ||
+ | <li>Priors on trees</li> | ||
+ | <li>Controversies over Bayesian inference</li> | ||
+ | <ul> | ||
+ | <li>Universality of the prior</li> | ||
+ | <li>Flat priors and doubts about them</li> | ||
+ | </ul> | ||
+ | <li>Applications of Bayesian methods</li> | ||
+ | </ul> | ||
+ | * '''19. Testing models, trees, and clocks''' | ||
+ | <ul> | ||
+ | <li>Likelihood and tests</li> | ||
+ | <li>Likelihood ratios near asymptopia</li> | ||
+ | <li>Multiple parameters</li> | ||
+ | <ul> | ||
+ | <li>Some parameters constrained, some not</li> | ||
+ | <li>Conditions</li> | ||
+ | <li>Curvature or height?</li> | ||
+ | </ul> | ||
+ | <li>Interval estimates</li> | ||
+ | <li>Testing assertions about parameters</li> | ||
+ | <ul> | ||
+ | <li>Coins in a barrel</li> | ||
+ | <li>Evolutionary rates instead of coins</li> | ||
+ | </ul> | ||
+ | <li>Choosing among nonnested hypotheses: AIC and BIC</li> | ||
+ | <ul> | ||
+ | <li>An example using the AIC criterion</li> | ||
+ | </ul> | ||
+ | <li>The problem of multiple topologies</li> | ||
+ | <ul> | ||
+ | <li>LRTs and single branches</li> | ||
+ | </ul> | ||
+ | <li>Interior branch tests</li> | ||
+ | <ul> | ||
+ | <li>Interior branch tests using parsimony</li> | ||
+ | <li>A multiple-branch counterpart of interior branch tests</li> | ||
+ | </ul> | ||
+ | <li>Testing the molecular clock</li> | ||
+ | <ul> | ||
+ | <li>Parsimony-based methods</li> | ||
+ | <li>Distance-based methods</li> | ||
+ | <li>Likelihood-based methods</li> | ||
+ | <li>The relative rate test</li> | ||
+ | </ul> | ||
+ | <li>Simulation tests based on likelihood</li> | ||
+ | <ul> | ||
+ | <li>Further literature</li> | ||
+ | </ul> | ||
+ | <li>More exact tests and confidence intervals</li> | ||
+ | <ul> | ||
+ | <li>Tests for three species with a clock</li> | ||
+ | <li>Bremer support</li> | ||
+ | <li>Zander’s conditional probability of reconstruction</li> | ||
+ | <li>More generalized confidence sets</li> | ||
+ | </ul> | ||
+ | </ul> | ||
+ | * '''20. Bootstrap, jackknife, and permutation tests''' | ||
+ | <ul> | ||
+ | <li>The bootstrap and the jackknife</li> | ||
+ | <li>Bootstrapping and phylogenies</li> | ||
+ | <li>The delete-half jackknife</li> | ||
+ | <li>The bootstrap and jackknife for phylogenies</li> | ||
+ | <li>The multiple-tests problem</li> | ||
+ | <li>Independence of characters</li> | ||
+ | <li>Identical distribution — a problem?</li> | ||
+ | <li>Invariant characters and resampling methods</li> | ||
+ | <li>Biases in bootstrap and jackknife probabilities</li> | ||
+ | <ul> | ||
+ | <li><em>P</em> values in a simple normal case</li> | ||
+ | <li>Methods of reducing the bias</li> | ||
+ | <li>The drug testing analogy</li> | ||
+ | </ul> | ||
+ | <li>Alternatives to <em>P</em> values</li> | ||
+ | <ul> | ||
+ | <li>Probabilities of trees</li> | ||
+ | <li>Using tree distances</li> | ||
+ | <li>Jackknifing species</li> | ||
+ | </ul> | ||
+ | <li>Parametric bootstrapping</li> | ||
+ | <ul> | ||
+ | <li>Advantages and disadvantages of the parametric bootstrap</li> | ||
+ | </ul> | ||
+ | <li>Permutation tests</li> | ||
+ | <ul> | ||
+ | <li>Permuting species within characters</li> | ||
+ | </ul> | ||
+ | <ul> | ||
+ | <li>Permuting characters</li> | ||
+ | <li>Skewness of tree length distribution</li> | ||
+ | </ul> | ||
+ | </ul> | ||
+ | * '''21. Paired-sites tests''' | ||
+ | <ul> | ||
+ | <li>An example</li> | ||
+ | <li>Multiple trees</li> | ||
+ | <ul> | ||
+ | <li>The SH test</li> | ||
+ | <li>Other multiple-comparison tests</li> | ||
+ | </ul> | ||
+ | <li>Testing other parameters</li> | ||
+ | <li>Perspective</li> | ||
+ | </ul> | ||
+ | * '''22. Invariants''' | ||
+ | <ul> | ||
+ | <li>Symmetry invariants</li> | ||
+ | <li>Three-species invariants</li> | ||
+ | <li>Lake’s linear invariants</li> | ||
+ | <li>Cavender’s quadratic invariants</li> | ||
+ | <ul> | ||
+ | <li>The <em>K</em> invariants</li> | ||
+ | <li>The <em>L</em> invariants</li> | ||
+ | <li>Generalization of Cavender’s <em>L</em> invariants</li> | ||
+ | </ul> | ||
+ | <li>Drolet and Sankoff’s <em>k</em>-state quadratic invariants</li> | ||
+ | <li>Clock invariants</li> | ||
+ | <li>General methods for finding invariants</li> | ||
+ | <ul> | ||
+ | <li>Fourier transform methods</li> | ||
+ | <li>Gröbner bases and other general methods</li> | ||
+ | <li>Expressions for all the 3ST invariants</li> | ||
+ | <li>Finding all invariants empirically</li> | ||
+ | <li>All linear invariants</li> | ||
+ | <li>Special cases and extensions</li> | ||
+ | </ul> | ||
+ | <li>Invariants and evolutionary rates</li> | ||
+ | <li>Testing invariants</li> | ||
+ | <li>What use are invariants?</li> | ||
+ | </ul> | ||
+ | * '''23. Brownian motion and gene frequencies''' | ||
+ | <ul> | ||
+ | <li>Brownian motion</li> | ||
+ | <li>Likelihood for a phylogeny</li> | ||
+ | <li>What likelihood to compute?</li> | ||
+ | <ul> | ||
+ | <li>Assuming a clock</li> | ||
+ | <li>The REML approach</li> | ||
+ | </ul> | ||
+ | <li>Multiple characters and Kronecker products</li> | ||
+ | <li>Pruning the likelihood</li> | ||
+ | <li>Maximizing the likelihood</li> | ||
+ | <li>Inferring ancestral states</li> | ||
+ | <ul> | ||
+ | <li>Squared-change parsimony</li> | ||
+ | </ul> | ||
+ | <li>Gene frequencies and Brownian motion</li> | ||
+ | <ul> | ||
+ | <li>Using approximate Brownian motion</li> | ||
+ | <li>Distances from gene frequencies</li> | ||
+ | <li>A more exact likelihood method</li> | ||
+ | <li>Gene frequency parsimony</li> | ||
+ | </ul> | ||
+ | </ul> | ||
+ | * '''24. Quantitative characters''' | ||
+ | <ul> | ||
+ | <li>Neutral models of quantitative characters</li> | ||
+ | <li>Changes due to natural selection</li> | ||
+ | <ul> | ||
+ | <li>Selective correlation</li> | ||
+ | <li>Covariances of multiple characters in multiple lineages</li> | ||
+ | <li>Selection for an optimum</li> | ||
+ | <li>Brownian motion and selection</li> | ||
+ | </ul> | ||
+ | <li>Correcting for correlations</li> | ||
+ | <li>Punctuational models</li> | ||
+ | <li>Inferring phylogenies and correlations</li> | ||
+ | <li>Chasing a common optimum</li> | ||
+ | <li>The character-coding “problem”</li> | ||
+ | <li>Continuous-character parsimony methods</li> | ||
+ | <ul> | ||
+ | <li>Manhattan metric parsimony</li> | ||
+ | <li>Other parsimony methods</li> | ||
+ | </ul> | ||
+ | <li>Threshold models</li> | ||
+ | </ul> | ||
+ | * '''25. Comparative methods''' | ||
+ | <ul> | ||
+ | <li>An example with discrete states</li> | ||
+ | <li>An example with continuous characters</li> | ||
+ | <li>The contrasts method</li> | ||
+ | <li>Correlations between characters</li> | ||
+ | <li>When the tree is not completely known</li> | ||
+ | <li>Inferring change in a branch</li> | ||
+ | <li>Sampling error</li> | ||
+ | <li>The standard regression and other variations</li> | ||
+ | <ul> | ||
+ | <li>Generalized least squares</li> | ||
+ | <li>Phylogenetic autocorrelation</li> | ||
+ | <li>Transformations of time</li> | ||
+ | <li>Should we use the phylogeny at all?</li> | ||
+ | </ul> | ||
+ | <li>Paired-lineage tests</li> | ||
+ | <li>Discrete characters</li> | ||
+ | <ul> | ||
+ | <li>Ridley’s method</li> | ||
+ | <li>Concentrated-changes tests</li> | ||
+ | <li>A paired-lineages test</li> | ||
+ | <li>Methods using likelihood</li> | ||
+ | <li>Advantages of the likelihood approach</li> | ||
+ | </ul> | ||
+ | <li>Molecular applications</li> | ||
+ | </ul> | ||
+ | * '''26. Coalescent trees''' | ||
+ | <ul> | ||
+ | <li>Kingman’s coalescent</li> | ||
+ | <li>Bugs in a box—an analogy</li> | ||
+ | <li>Effect of varying population size</li> | ||
+ | <li>Migration</li> | ||
+ | <li>Effect of recombination</li> | ||
+ | <li>Coalescents and natural selection</li> | ||
+ | <ul> | ||
+ | <li>Neuhauser and Krone’s method</li> | ||
+ | </ul> | ||
+ | </ul> | ||
+ | * '''27. Likelihood calculations on coalescents''' | ||
+ | <ul> | ||
+ | <li>The basic equation</li> | ||
+ | <li>Using accurate genealogies—a reverie</li> | ||
+ | <li>Two random sampling methods</li> | ||
+ | <ul> | ||
+ | <li>A Metropolis-Hastings method</li> | ||
+ | <li>Griffiths and Tavaré’s method</li> | ||
+ | </ul> | ||
+ | <li>Bayesian methods</li> | ||
+ | <li>MCMC for a variety of coalescent models</li> | ||
+ | <li>Single-tree methods</li> | ||
+ | <ul> | ||
+ | <li>Slatkin and Maddison’s method</li> | ||
+ | <li>Fu’s method</li> | ||
+ | </ul> | ||
+ | <li>Summary-statistic methods</li> | ||
+ | <ul> | ||
+ | <li>Watterson’s method</li> | ||
+ | <li>Other summary-statistic methods</li> | ||
+ | <li>Testing for recombination</li> | ||
+ | </ul> | ||
+ | </ul> | ||
+ | * '''28. Coalescents and species trees''' | ||
+ | <ul> | ||
+ | <li>Methods of inferring the species phylogeny</li> | ||
+ | <ul> | ||
+ | <li>Reconciled tree parsimony approaches</li> | ||
+ | <li>Likelihood</li> | ||
+ | </ul> | ||
+ | </ul> | ||
+ | * '''29. Alignment, gene families, and genomics''' | ||
+ | <ul> | ||
+ | <li>Alignment</li> | ||
+ | <ul> | ||
+ | <li>Why phylogenies are important</li> | ||
+ | </ul> | ||
+ | <li>Parsimony method</li> | ||
+ | <ul> | ||
+ | <li>Approximations and progressive alignment</li> | ||
+ | </ul> | ||
+ | <li>Probabilistic models</li> | ||
+ | <ul> | ||
+ | <li>Bishop and Thompson’s method</li> | ||
+ | <li>The minimum message length method</li> | ||
+ | <li>The TKF model</li> | ||
+ | <li>Multibase insertions and deletions</li> | ||
+ | <li>Tree HMMs</li> | ||
+ | <li>Trees</li> | ||
+ | <li>Inferring the alignment</li> | ||
+ | </ul> | ||
+ | <li>Gene families</li> | ||
+ | <ul> | ||
+ | <li>Reconciled trees</li> | ||
+ | <li>Reconstructing duplications</li> | ||
+ | <li>Rooting unrooted trees</li> | ||
+ | <li>A likelihood analysis</li> | ||
+ | </ul> | ||
+ | <li>Comparative genomics</li> | ||
+ | <ul> | ||
+ | <li>Tandemly repeated genes</li> | ||
+ | <li>Inversions</li> | ||
+ | <li>Inversions in trees</li> | ||
+ | <li>Inversions, transpositions, and translocations</li> | ||
+ | <li>Breakpoint and neighbor-coding approximations</li> | ||
+ | <li>Synteny</li> | ||
+ | <li>Probabilistic models</li> | ||
+ | </ul> | ||
+ | <li>Genome signature methods</li> | ||
+ | </ul> | ||
+ | * '''30. Consensus trees and distances between trees''' | ||
+ | <ul> | ||
+ | <li>Consensus trees</li> | ||
+ | <ul> | ||
+ | <li>Strict consensus</li> | ||
+ | <li>Majority-rule consensus</li> | ||
+ | <li>Adams consensus tree</li> | ||
+ | <li>A dismaying result</li> | ||
+ | <li>Consensus using branch lengths</li> | ||
+ | <li>Other consensus tree methods</li> | ||
+ | <li>Consensus subtrees</li> | ||
+ | </ul> | ||
+ | <li>Distances between trees</li> | ||
+ | <ul> | ||
+ | <li>The symmetric difference</li> | ||
+ | <li>The quartets distance</li> | ||
+ | <li>The nearest-neighbor interchange distance</li> | ||
+ | <li>The path-length-difference metric</li> | ||
+ | <li>Distances using branch lengths</li> | ||
+ | <li>Are these distances truly distances?</li> | ||
+ | <li>Consensus trees and distances</li> | ||
+ | <li>Trees significantly the same? different?</li> | ||
+ | </ul> | ||
+ | <li>What do consensus trees and tree distances tell us?</li> | ||
+ | <ul> | ||
+ | <li>The total evidence debate</li> | ||
+ | <li>A modest proposal</li> | ||
+ | </ul> | ||
+ | </ul> | ||
+ | * '''31. Biogeography, hosts, and parasites''' | ||
+ | <ul> | ||
+ | <li>Component compatibility</li> | ||
+ | <li>Brooks parsimony</li> | ||
+ | <li>Event-based parsimony methods</li> | ||
+ | <ul> | ||
+ | <li>Relation to tree reconciliation</li> | ||
+ | </ul> | ||
+ | <li>Randomization tests</li> | ||
+ | <li>Statistical inference</li> | ||
+ | </ul> | ||
+ | * '''32. Phylogenies and paleontology''' | ||
+ | <ul> | ||
+ | <li>Stratigraphic indices</li> | ||
+ | <li>Stratophenetics</li> | ||
+ | <li>Stratocladistics</li> | ||
+ | <li>Controversies</li> | ||
+ | <li>A not-quite-likelihood method</li> | ||
+ | <li>Stratolikelihood</li> | ||
+ | <ul> | ||
+ | <li>Making a full likelihood method</li> | ||
+ | <li>More realistic fossilization models</li> | ||
+ | </ul> | ||
+ | <li>Fossils within species: Sequential sampling</li> | ||
+ | <li>Between species</li> | ||
+ | </ul> | ||
+ | * '''33. Tests based on tree shape''' | ||
+ | <ul> | ||
+ | <li>Using the topology only</li> | ||
+ | <ul> | ||
+ | <li>Imbalance at the root</li> | ||
+ | </ul> | ||
+ | <li>Harding’s probabilities of tree shapes</li> | ||
+ | <li>Tests from shapes</li> | ||
+ | <ul> | ||
+ | <li>Measures of overall asymmetry</li> | ||
+ | <li>Choosing a powerful test</li> | ||
+ | </ul> | ||
+ | <li>Tests using times</li> | ||
+ | <ul> | ||
+ | <li>Lineage plots</li> | ||
+ | <li>Likelihood formulas</li> | ||
+ | <li>Other likelihood approaches</li> | ||
+ | <li>Other statistical approaches</li> | ||
+ | <li>A time transformation</li> | ||
+ | </ul> | ||
+ | <li>Characters and key innovations</li> | ||
+ | <li>Work remaining</li> | ||
+ | </ul> | ||
+ | * '''34. Drawing trees''' | ||
+ | <ul> | ||
+ | <li>Issues in drawing rooted trees</li> | ||
+ | <ul> | ||
+ | <li>Placement of interior nodes</li> | ||
+ | <li>Shapes of lineages</li> | ||
+ | </ul> | ||
+ | <li>Unrooted trees</li> | ||
+ | <ul> | ||
+ | <li>The equal-angle algorithm</li> | ||
+ | <li>n-Body algorithms</li> | ||
+ | <li>The equal-daylight algorithm</li> | ||
+ | </ul> | ||
+ | <li>Challenges</li> | ||
+ | </ul> | ||
+ | * '''35. Phylogeny software''' | ||
+ | <ul> | ||
+ | <li>Trees, records, and pointers</li> | ||
+ | <li>Declaring records</li> | ||
+ | <li>Traversing the tree</li> | ||
+ | <li>Unrooted tree data structures</li> | ||
+ | <li>Tree file formats</li> | ||
+ | <li>Widely used phylogeny programs and packages</li> | ||
+ | </ul> | ||
+ | * REFERENCES | ||
+ | * INDEX | ||
[[Category:Academic Courses]] | [[Category:Academic Courses]] | ||
[[Category:Books]] | [[Category:Books]] |
Latest revision as of 10:55, 6 January 2006
Inferring Phylogenies (ISBN ) by Joseph Felsenstein.
Table of Contents
- PREFACE
- 1. Parsimony methods
- A simple example
- Evaluating a particular tree
- Rootedness and unrootedness
- Methods of rooting the tree
- Branch lengths
- Unresolved questions
- 2. Counting evolutionary changes
- The Fitch algorithm
- The Sankoff algorithm
- Connection between the two algorithms
- Using the algorithms when modifying trees
- Views
- Using views when a tree is altered
- Further economies
- 3. How many trees are there?
- Rooted bifurcating trees
- Unrooted bifurcating trees
- Multifurcating trees
- Unrooted trees with multifurcations
- Tree shapes
- Rooted bifurcating tree shapes
- Rooted multifurcating tree shapes
- Unrooted shapes
- Labeled histories
- Perspective
- 4. Finding the best tree by heuristic search
- Nearest-neighbor interchanges
- Subtree pruning and regrafting
- Tree bisection and reconnection
- Other tree rearrangement methods
- Tree-fusing
- Genetic algorithms
- Tree windows and sectorial search
- Speeding up rearrangements
- Sequential addition
- Star decomposition
- Tree space
- Search by reweighting of characters
- Simulated annealing
- History
- 5. Finding the best tree by branch and bound
- A nonbiological example
- Finding the optimal solution
- NP-hardness
- Branch and bound methods
- Phylogenies: Despair and hope
- Branch and bound for parsimony
- Improving the bound
- Using still-absent states
- Using compatibility
- Rules limiting the search
- 6. Ancestral states and branch lengths
- Reconstructing ancestral states
- Accelerated and delayed transformation
- Branch lengths
- 7. Variants of parsimony
- Camin-Sokal parsimony
- Parsimony on an ordinal scale
- Dollo parsimony
- Polymorphism parsimony
- Unknown ancestral states
- Multiple states and binary coding
- Dollo parsimony and multiple states
- Polymorphism parsimony and multiple states
- Transformation series analysis
- Weighting characters
- Successive weighting and nonlinear weighting
- Successive weighting
- Nonsuccessive algorithms
- 8. Compatibility
- Testing compatibility
- The Pairwise Compatibility Theorem
- Cliques of compatible characters
- Finding the tree from the clique
- Other cases where cliques can be used
- Where cliques cannot be used
- Perfect phylogeny
- Using compatibility on molecules anyway
- 9. Statistical properties of parsimony
- Likelihood and parsimony
- The weights
- Unweighted parsimony
- Limitations of this justification of parsimony
- Farris’s proofs
- No common mechanism
- Likelihood and compatibility
- Parsimony versus compatibility
- Consistency and parsimony
- Character patterns and parsimony
- Observed numbers of the patterns
- Observed fractions of the patterns
- Expected fractions of the patterns
- Inconsistency
- When inconsistency is not a problem
- The nucleotide sequence case
- Other situations where consistency is guaranteed
- Does a molecular clock guarantee consistency?
- The Farris zone
- Some perspective
- 10. A digression on history and philosophy
- How phylogeny algorithms developed
- Sokal and Sneath
- Edwards and Cavalli-Sforza
- Camin and Sokal and parsimony
- Eck and Dayhoff and molecular parsimony
- Fitch and Margoliash popularize distance matrix methods
- Wilson and Le Quesne introduce compatibility
- Jukes and Cantor and molecular distances
- Farris and Kluge and unordered parsimony
- Fitch and molecular parsimony
- Further work
- What about Willi Hennig and Walter Zimmermann?
- Different philosophical frameworks
- Hypothetico-deductive
- Logical parsimony
- Logical probability?
- Criticisms of statistical inference
- The irrelevance of classification
- 11. Distance matrix methods
- Branch lengths and times
- The least squares methods
- Least squares branch lengths
- Finding the least squares tree topology
- The statistical rationale
- Generalized least squares
- Distances
- The Jukes-Cantor model—an example
- Why correct for multiple changes?
- Minimum evolution
- Clustering algorithms
- UPGMA and least squares
- A clustering algorithm
- An example
- UPGMA on nonclocklike trees
- Neighbor-joining
- Performance
- Using neighbor-joining with other methods
- Relation of neighbor-joining to least squares
- Weighted versions of neighbor-joining
- Other approximate distance methods
- Distance Wagner method
- A related family
- Minimizing the maximum discrepancy
- Two approaches to error in trees
- A puzzling formula
- Consistency and distance methods
- A limitation of distance methods
- 12. Quartets of species
- The four point metric
- The split decomposition
- Related methods
- Short quartets methods
- The disk-covering method
- Challenges for the short quartets and DCM methods
- Three-taxon statement methods
- Other uses of quartets with parsimony
- Consensus supertrees
- Neighborliness
- De Soete’s search method
- Quartet puzzling and searching tree space
- Perspective
- 13. Models of DNA evolution
- Kimura’s two-parameter model
- Calculation of the distance
- The Tamura-Nei model, F84, and HKY
- The general time-reversible model
- Distances from the GTR model
- The general 12-parameter model
- LogDet distances
- Other distances
- Variance of distance
- Rate variation between sites or loci
- Different rates at different sites
- Distances with known rates
- Distribution of rates
- Gamma- and lognormally distributed rates
- Distances from gamma-distributed rates
- Models with nonindependence of sites
- 14. Models of protein evolution
- Amino acid models
- The Dayhoff model
- Other empirically-based models
- Models depending on secondary structure
- Codon-based models
- Inequality of synonymous and nonsynonymous substitutions
- Protein structure and correlated change
- 15. Restriction sites, RAPDs, AFLPs, and microsatellites
- Restriction sites
- Nei and Tajima’s model
- Distances based on restriction sites
- Issues of ascertainment
- Parsimony for restriction sites
- Modeling restriction fragments
- Parsimony with restriction fragments
- RAPDs and AFLPs
- The issue of dominance
- Unresolved problems
- Microsatellite models
- The one-step model
- Microsatellite distances
- A Brownian motion approximation
- Models with constraints on array size
- Multi-step and heterogeneous models
- Snakes and Ladders
- Complications
- 16. Likelihood methods
- Maximum likelihood
- An example
- Computing the likelihood of a tree
- Economizing on the computation
- Handling ambiguity and error
- Unrootedness
- Finding the maximum likelihood tree
- Inferring ancestral sequences
- Rates varying among sites
- Hidden Markov models
- Autocorrelation of rates
- HMMs for other aspects of models
- Estimating the states
- Models with clocks
- Relaxing molecular clocks
- Models for relaxed clocks
- Covarions
- Empirical approaches to change of rates
- Are ML estimates consistent?
- Comparability of likelihoods
- A nonexistent proof?
- A simple proof
- Misbehavior with the wrong model
- Better behavior with the wrong model
- 17. Hadamard methods
- The edge length spectrum and conjugate spectrum
- The closest tree criterion
- DNA models
- Computational effort
- Extensions of Hadamard methods
- 18. Bayesian inference of phylogenies
- Bayes’ theorem
- Bayesian methods for phylogenies
- Markov chain Monte Carlo methods
- The Metropolis algorithm
- Its equilibrium distribution
- Bayesian MCMC
- Bayesian MCMC for phylogenies
- Priors
- Proposal distributions
- Computing the likelihoods
- Summarizing the posterior
- Priors on trees
- Controversies over Bayesian inference
- Universality of the prior
- Flat priors and doubts about them
- Applications of Bayesian methods
- 19. Testing models, trees, and clocks
- Likelihood and tests
- Likelihood ratios near asymptopia
- Multiple parameters
- Some parameters constrained, some not
- Conditions
- Curvature or height?
- Interval estimates
- Testing assertions about parameters
- Coins in a barrel
- Evolutionary rates instead of coins
- Choosing among nonnested hypotheses: AIC and BIC
- An example using the AIC criterion
- The problem of multiple topologies
- LRTs and single branches
- Interior branch tests
- Interior branch tests using parsimony
- A multiple-branch counterpart of interior branch tests
- Testing the molecular clock
- Parsimony-based methods
- Distance-based methods
- Likelihood-based methods
- The relative rate test
- Simulation tests based on likelihood
- Further literature
- More exact tests and confidence intervals
- Tests for three species with a clock
- Bremer support
- Zander’s conditional probability of reconstruction
- More generalized confidence sets
- 20. Bootstrap, jackknife, and permutation tests
- The bootstrap and the jackknife
- Bootstrapping and phylogenies
- The delete-half jackknife
- The bootstrap and jackknife for phylogenies
- The multiple-tests problem
- Independence of characters
- Identical distribution: a problem?
- Invariant characters and resampling methods
- Biases in bootstrap and jackknife probabilities
- P values in a simple normal case
- Methods of reducing the bias
- The drug testing analogy
- Alternatives to P values
- Probabilities of trees
- Using tree distances
- Jackknifing species
- Parametric bootstrapping
- Advantages and disadvantages of the parametric bootstrap
- Permutation tests
- Permuting species within characters
- Permuting characters
- Skewness of tree length distribution
- 21. Paired-sites tests
- An example
- Multiple trees
- The SH test
- Other multiple-comparison tests
- Testing other parameters
- Perspective
- 22. Invariants
- Symmetry invariants
- Three-species invariants
- Lake’s linear invariants
- Cavender’s quadratic invariants
- The K invariants
- The L invariants
- Generalization of Cavender’s L invariants
- Drolet and Sankoff’s k-state quadratic invariants
- Clock invariants
- General methods for finding invariants
- Fourier transform methods
- Gröbner bases and other general methods
- Expressions for all the 3ST invariants
- Finding all invariants empirically
- All linear invariants
- Special cases and extensions
- Invariants and evolutionary rates
- Testing invariants
- What use are invariants?
- 23. Brownian motion and gene frequencies
- Brownian motion
- Likelihood for a phylogeny
- What likelihood to compute?
- Assuming a clock
- The REML approach
- Multiple characters and Kronecker products
- Pruning the likelihood
- Maximizing the likelihood
- Inferring ancestral states
- Squared-change parsimony
- Gene frequencies and Brownian motion
- Using approximate Brownian motion
- Distances from gene frequencies
- A more exact likelihood method
- Gene frequency parsimony
- 24. Quantitative characters
- Neutral models of quantitative characters
- Changes due to natural selection
- Selective correlation
- Covariances of multiple characters in multiple lineages
- Selection for an optimum
- Brownian motion and selection
- Correcting for correlations
- Punctuational models
- Inferring phylogenies and correlations
- Chasing a common optimum
- The character-coding “problem”
- Continuous-character parsimony methods
- Manhattan metric parsimony
- Other parsimony methods
- Threshold models
- 25. Comparative methods
- An example with discrete states
- An example with continuous characters
- The contrasts method
- Correlations between characters
- When the tree is not completely known
- Inferring change in a branch
- Sampling error
- The standard regression and other variations
- Generalized least squares
- Phylogenetic autocorrelation
- Transformations of time
- Should we use the phylogeny at all?
- Paired-lineage tests
- Discrete characters
- Ridley’s method
- Concentrated-changes tests
- A paired-lineages test
- Methods using likelihood
- Advantages of the likelihood approach
- Molecular applications
- 26. Coalescent trees
- Kingman’s coalescent
- Bugs in a box—an analogy
- Effect of varying population size
- Migration
- Effect of recombination
- Coalescents and natural selection
- Neuhauser and Krone’s method
- 27. Likelihood calculations on coalescents
- The basic equation
- Using accurate genealogies—a reverie
- Two random sampling methods
- A Metropolis-Hastings method
- Griffiths and Tavaré’s method
- Bayesian methods
- MCMC for a variety of coalescent models
- Single-tree methods
- Slatkin and Maddison’s method
- Fu’s method
- Summary-statistic methods
- Watterson’s method
- Other summary-statistic methods
- Testing for recombination
- 28. Coalescents and species trees
- Methods of inferring the species phylogeny
- Reconciled tree parsimony approaches
- Likelihood
- 29. Alignment, gene families, and genomics
- Alignment
- Why phylogenies are important
- Parsimony method
- Approximations and progressive alignment
- Probabilistic models
- Bishop and Thompson’s method
- The minimum message length method
- The TKF model
- Multibase insertions and deletions
- Tree HMMs
- Trees
- Inferring the alignment
- Gene families
- Reconciled trees
- Reconstructing duplications
- Rooting unrooted trees
- A likelihood analysis
- Comparative genomics
- Tandemly repeated genes
- Inversions
- Inversions in trees
- Inversions, transpositions, and translocations
- Breakpoint and neighbor-coding approximations
- Synteny
- Probabilistic models
- Genome signature methods
- 30. Consensus trees and distances between trees
- Consensus trees
- Strict consensus
- Majority-rule consensus
- Adams consensus tree
- A dismaying result
- Consensus using branch lengths
- Other consensus tree methods
- Consensus subtrees
- Distances between trees
- The symmetric difference
- The quartets distance
- The nearest-neighbor interchange distance
- The path-length-difference metric
- Distances using branch lengths
- Are these distances truly distances?
- Consensus trees and distances
- Trees significantly the same? different?
- What do consensus trees and tree distances tell us?
- The total evidence debate
- A modest proposal
- 31. Biogeography, hosts, and parasites
- Component compatibility
- Brooks parsimony
- Event-based parsimony methods
- Relation to tree reconciliation
- Randomization tests
- Statistical inference
- 32. Phylogenies and paleontology
- Stratigraphic indices
- Stratophenetics
- Stratocladistics
- Controversies
- A not-quite-likelihood method
- Stratolikelihood
- Making a full likelihood method
- More realistic fossilization models
- Fossils within species: Sequential sampling
- Between species
- 33. Tests based on tree shape
- Using the topology only
- Imbalance at the root
- Harding’s probabilities of tree shapes
- Tests from shapes
- Measures of overall asymmetry
- Choosing a powerful test
- Tests using times
- Lineage plots
- Likelihood formulas
- Other likelihood approaches
- Other statistical approaches
- A time transformation
- Characters and key innovations
- Work remaining
- 34. Drawing trees
- Issues in drawing rooted trees
- Placement of interior nodes
- Shapes of lineages
- Unrooted trees
- The equal-angle algorithm
- n-Body algorithms
- The equal-daylight algorithm
- Challenges
- 35. Phylogeny software
- Trees, records, and pointers
- Declaring records
- Traversing the tree
- Unrooted tree data structures
- Tree file formats
- Widely used phylogeny programs and packages
- REFERENCES
- INDEX