Difference between revisions of "Inferring Phylogenies"

From Christoph's Personal Wiki
(Started article)
 
(Added "TOC". Needs formatting)
 
Line 2: Line 2:
  
 
== Table of Contents ==
 
 +
* PREFACE
 +
* '''1. Parsimony methods'''
 +
** A simple example
 +
*** Evaluating a particular tree
 +
*** Rootedness and unrootedness
 +
** Methods of rooting the tree
 +
** Branch lengths
 +
** Unresolved questions
 +
* '''2. Counting evolutionary changes'''
 +
<ul>
 +
<li>The Fitch algorithm</li>
 +
<li>The Sankoff algorithm</li>
 +
<ul>
 +
<li>Connection between the two algorithms</li>
 +
</ul>
 +
<li>Using the algorithms when modifying trees</li>
 +
<ul>
 +
<li>Views</li>
 +
<li>Using views when a tree is altered</li>
 +
</ul>
 +
<li>Further economies</li>
 +
</ul>
 +
* '''3. How many trees are there?'''
 +
<ul>
 +
<li>Rooted bifurcating trees</li>
 +
<li>Unrooted bifurcating trees</li>
 +
<li>Multifurcating trees</li>
 +
<ul>
 +
<li>Unrooted trees with multifurcations</li>
 +
</ul>
 +
<li>Tree shapes</li>
 +
<ul>
 +
<li>Rooted bifurcating tree shapes</li>
 +
<li>Rooted multifurcating tree shapes</li>
 +
<li>Unrooted Shapes</li>
 +
</ul>
 +
<li>Labeled histories</li>
 +
<li>Perspective</li>
 +
</ul>
 +
* '''4. Finding the best tree by heuristic search'''
 +
<ul>
 +
<li>Nearest-neighbor interchanges</li>
 +
<li>Subtree pruning and regrafting</li>
 +
<li>Tree bisection and reconnection</li>
 +
<li>Other tree rearrangement methods</li>
 +
<ul>
 +
<li>Tree-fusing</li>
 +
<li>Genetic algorithms</li>
 +
<li>Tree windows and sectorial search</li>
 +
</ul>
 +
<li>Speeding up rearrangements</li>
 +
<li>Sequential addition</li>
 +
<li>Star decomposition</li>
 +
<li>Tree space</li>
 +
<li>Search by reweighting of characters</li>
 +
<li>Simulated annealing</li>
 +
<li>History</li>
 +
</ul>
 +
* '''5. Finding the best tree by branch and bound'''
 +
<ul>
 +
<li>A nonbiological example</li>
 +
<li>Finding the optimal solution</li>
 +
<li>NP-hardness</li>
 +
<li>Branch and bound methods</li>
 +
<li>Phylogenies: Despair and hope</li>
 +
<li>Branch and bound for parsimony</li>
 +
<li>Improving the bound</li>
 +
<ul>
 +
<li>Using still-absent states</li>
 +
<li>Using compatibility</li>
 +
</ul>
 +
<li>Rules limiting the search</li>
 +
</ul>
 +
* '''6. Ancestral states and branch lengths'''
 +
<ul>
 +
<li>Reconstructing ancestral states</li>
 +
<li>Accelerated and delayed transformation</li>
 +
<li>Branch lengths</li>
 +
</ul>
 +
* '''7. Variants of parsimony'''
 +
<ul>
 +
<li>Camin-Sokal parsimony</li>
 +
<li>Parsimony on an ordinal scale</li>
 +
<li>Dollo parsimony</li>
 +
<li>Polymorphism parsimony</li>
 +
<li>Unknown ancestral states</li>
 +
<li>Multiple states and binary coding</li>
 +
<li>Dollo parsimony and multiple states</li>
 +
<li>Polymorphism parsimony and multiple states</li>
 +
<li>Transformation series analysis</li>
 +
<li>Weighting characters</li>
 +
<li>Successive weighting and nonlinear weighting</li>
 +
<ul>
 +
<li>Successive weighting</li>
 +
<li>Nonsuccessive algorithms</li>
 +
</ul>
 +
</ul>
 +
* '''8. Compatibility'''
 +
<ul>
 +
<li>Testing compatibility</li>
 +
<li>The Pairwise Compatibility Theorem</li>
 +
<li>Cliques of compatible characters</li>
 +
<li>Finding the tree from the clique</li>
 +
<li>Other cases where cliques can be used</li>
 +
<li>Where cliques cannot be used</li>
 +
<ul>
 +
<li>Perfect phylogeny</li>
 +
<li>Using compatibility on molecules anyway</li>
 +
</ul>
 +
</ul>
 +
* '''9. Statistical properties of parsimony'''
 +
<ul>
 +
<li>Likelihood and parsimony</li>
 +
<ul>
 +
<li>The weights</li>
 +
<li>Unweighted parsimony</li>
 +
<li>Limitations of this justification of parsimony</li>
 +
<li>Farris&#8217;s proofs</li>
 +
<li>No common mechanism</li>
 +
<li>Likelihood and compatibility</li>
 +
<li>Parsimony versus compatibility</li>
 +
</ul>
 +
<li>Consistency and parsimony</li>
 +
<ul>
 +
<li>Character patterns and parsimony</li>
 +
<li>Observed numbers of the patterns</li>
 +
<li>Observed fractions of the patterns</li>
 +
<li>Expected fractions of the patterns</li>
 +
<li>Inconsistency</li>
 +
<li>When inconsistency is not a problem</li>
 +
<li>The nucleotide sequence case</li>
 +
<li>Other situations where consistency is guaranteed</li>
 +
<li>Does a molecular clock guarantee consistency?</li>
 +
<li>The Farris zone</li>
 +
</ul>
 +
<li>Some perspective</li>
 +
</ul>
 +
* '''10. A digression on history and philosophy'''
 +
<ul>
 +
<li>How phylogeny algorithms developed</li>
 +
<ul>
 +
<li>Sokal and Sneath</li>
 +
<li>Edwards and Cavalli-Sforza</li>
 +
<li>Camin and Sokal and parsimony</li>
 +
<li>Eck and Dayhoff and molecular parsimony</li>
 +
<li>Fitch and Margoliash popularize distance matrix methods</li>
 +
<li>Wilson and Le Quesne introduce compatibility</li>
 +
<li>Jukes and Cantor and molecular distances</li>
 +
<li>Farris and Kluge and unordered parsimony</li>
 +
<li>Fitch and molecular parsimony</li>
 +
<li>Further work</li>
 +
<li>What about Willi Hennig and Walter Zimmermann?</li>
 +
</ul>
 +
<li>Different philosophical frameworks</li>
 +
<ul>
 +
<li>Hypothetico-deductive</li>
 +
<li>Logical parsimony</li>
 +
<li>Logical probability?</li>
 +
<li>Criticisms of statistical inference</li>
 +
<li>The irrelevance of classification</li>
 +
</ul>
 +
</ul>
 +
* '''11. Distance matrix methods'''
 +
<ul>
 +
<li>Branch lengths and times</li>
 +
<li>The least squares methods</li>
 +
<ul>
 +
<li>Least squares branch lengths</li>
 +
<li>Finding the least squares tree topology</li>
 +
</ul>
 +
<li>The statistical rationale</li>
 +
<li>Generalized least squares</li>
 +
<li>Distances</li>
 +
<li>The [[Jukes-Cantor model]]&#8212;an example</li>
 +
<li>Why correct for multiple changes?</li>
 +
<li>Minimum evolution</li>
 +
<li>Clustering algorithms</li>
 +
<li>UPGMA and least squares</li>
 +
<ul>
 +
<li>A clustering algorithm</li>
 +
<li>An example</li>
 +
<li>UPGMA on nonclocklike trees</li>
 +
</ul>
 +
<li>Neighbor-joining</li>
 +
<ul>
 +
<li>Performance</li>
 +
<li>Using neighbor-joining with other methods</li>
 +
<li>Relation of neighbor-joining to least squares</li>
 +
<li>Weighted versions of neighbor-joining</li>
 +
</ul>
 +
<li>Other approximate distance methods</li>
 +
<ul>
 +
<li>Distance Wagner method</li>
 +
<li>A related family</li>
 +
<li>Minimizing the maximum discrepancy</li>
 +
<li>Two approaches to error in trees</li>
 +
</ul>
 +
<li>A puzzling formula</li>
 +
<li>Consistency and distance methods</li>
 +
<li>A limitation of distance methods</li>
 +
</ul>
 +
* '''12. Quartets of species'''
 +
<ul>
 +
<li>The four point metric</li>
 +
<li>The split decomposition</li>
 +
<ul>
 +
<li>Related methods</li>
 +
</ul>
 +
<li>Short quartets methods</li>
 +
<li>The disk-covering method</li>
 +
<li>Challenges for the short quartets and DCM methods</li>
 +
<li>Three-taxon statement methods</li>
 +
<li>Other uses of quartets with parsimony</li>
 +
<li>Consensus supertrees</li>
 +
<li>Neighborliness</li>
 +
<li>De Soete&#8217;s search method</li>
 +
<li>Quartet puzzling and searching tree space</li>
 +
<li>Perspective</li>
 +
</ul>
 +
* '''13. Models of DNA evolution'''
 +
<ul>
 +
<li>Kimura&#8217;s two-parameter model</li>
 +
<li>Calculation of the distance</li>
 +
<li>The Tamura-Nei model, F84, and HKY</li>
 +
<li>The general time-reversible model</li>
 +
<ul>
 +
<li>Distances from the GTR model</li>
 +
</ul>
 +
<li>The general 12-parameter model</li>
 +
<li>LogDet distances</li>
 +
<li>Other distances</li>
 +
<li>Variance of distance</li>
 +
<li>Rate variation between sites or loci</li>
 +
<ul>
 +
<li>Different rates at different sites</li>
 +
<li>Distances with known rates</li>
 +
<li>Distribution of rates</li>
 +
<li>Gamma- and lognormally distributed rates</li>
 +
<li>Distances from gamma-distributed rates</li>
 +
</ul>
 +
<li>Models with nonindependence of sites</li>
 +
</ul>
 +
* '''14. Models of protein evolution'''
 +
<ul>
 +
<li>Amino acid models</li>
 +
<li>The Dayhoff model</li>
 +
<li>Other empirically-based models</li>
 +
<ul>
 +
<li>Models depending on secondary structure</li>
 +
</ul>
 +
<li>Codon-based models</li>
 +
<ul>
 +
<li>Inequality of synonymous and nonsynonymous substitutions</li>
 +
</ul>
 +
<li>Protein structure and correlated change</li>
 +
</ul>
 +
* '''15. Restriction sites, RAPDs, AFLPs, and microsatellites'''
 +
<ul>
 +
<li>Restriction sites</li>
 +
<ul>
 +
<li>Nei and Tajima&#8217;s model</li>
 +
<li>Distances based on restriction sites</li>
 +
<li>Issues of ascertainment</li>
 +
<li>Parsimony for restriction sites</li>
 +
</ul>
 +
<li>Modeling restriction fragments</li>
 +
<ul>
 +
<li>Parsimony with restriction fragments</li>
 +
</ul>
 +
<li>RAPDs and AFLPs</li>
 +
<ul>
 +
<li>The issue of dominance</li>
 +
<li>Unresolved problems</li>
 +
<li>Microsatellite models</li>
 +
<li>The one-step model</li>
 +
<li>Microsatellite distances</li>
 +
<li>A Brownian motion approximation</li>
 +
<li>Models with constraints on array size</li>
 +
<li>Multi-step and heterogeneous models</li>
 +
<li>Snakes and Ladders</li>
 +
<li>Complications</li>
 +
</ul>
 +
</ul>
 +
* '''16. Likelihood methods'''
 +
<ul>
 +
<li>Maximum likelihood</li>
 +
<ul>
 +
<li>An example</li>
 +
</ul>
 +
<li>Computing the likelihood of a tree</li>
 +
<ul>
 +
<li>Economizing on the computation</li>
 +
<li>Handling ambiguity and error</li>
 +
</ul>
 +
<li>Unrootedness</li>
 +
<li>Finding the maximum likelihood tree</li>
 +
<li>Inferring ancestral sequences</li>
 +
<li>Rates varying among sites</li>
 +
<ul>
 +
<li>Hidden Markov models</li>
 +
<li>Autocorrelation of rates</li>
 +
<li>HMMs for other aspects of models</li>
 +
<li>Estimating the states</li>
 +
</ul>
 +
<li>Models with clocks</li>
 +
<ul>
 +
<li>Relaxing molecular clocks</li>
 +
<li>Models for relaxed clocks</li>
 +
<li>Covarions</li>
 +
<li>Empirical approaches to change of rates</li>
 +
</ul>
 +
<li>Are ML estimates consistent?</li>
 +
<ul>
 +
<li>Comparability of likelihoods</li>
 +
<li>A nonexistent proof?</li>
 +
<li>A simple proof</li>
 +
<li>Misbehavior with the wrong model</li>
 +
<li>Better behavior with the wrong model</li>
 +
</ul>
 +
</ul>
 +
* '''17. Hadamard methods'''
 +
<ul>
 +
<li>The edge length spectrum and conjugate spectrum</li>
 +
<li>The closest tree criterion</li>
 +
<li>DNA models</li>
 +
<li>Computational effort</li>
 +
<li>Extensions of Hadamard methods</li>
 +
</ul>
 +
* '''18. Bayesian inference of phylogenies'''
 +
<ul>
 +
<li>Bayes&#8217; theorem</li>
 +
<li>Bayesian methods for phylogenies</li>
 +
<li>Markov chain Monte Carlo methods</li>
 +
<li>The Metropolis algorithm</li>
 +
<ul>
 +
<li>Its equilibrium distribution</li>
 +
<li>Bayesian MCMC</li>
 +
</ul>
 +
<li>Bayesian MCMC for phylogenies</li>
 +
<ul>
 +
<li>Priors</li>
 +
</ul>
 +
<li>Proposal distributions</li>
 +
<li>Computing the likelihoods</li>
 +
<li>Summarizing the posterior</li>
 +
<li>Priors on trees</li>
 +
<li>Controversies over Bayesian inference</li>
 +
<ul>
 +
<li>Universality of the prior</li>
 +
<li>Flat priors and doubts about them</li>
 +
</ul>
 +
<li>Applications of Bayesian methods</li>
 +
</ul>
 +
* '''19. Testing models, trees, and clocks'''
 +
<ul>
 +
<li>Likelihood and tests</li>
 +
<li>Likelihood ratios near asymptopia</li>
 +
<li>Multiple parameters</li>
 +
<ul>
 +
<li>Some parameters constrained, some not</li>
 +
<li>Conditions</li>
 +
<li>Curvature or height?</li>
 +
</ul>
 +
<li>Interval estimates</li>
 +
<li>Testing assertions about parameters</li>
 +
<ul>
 +
<li>Coins in a barrel</li>
 +
<li>Evolutionary rates instead of coins</li>
 +
</ul>
 +
<li>Choosing among nonnested hypotheses: AIC and BIC</li>
 +
<ul>
 +
<li>An example using the AIC criterion</li>
 +
</ul>
 +
<li>The problem of multiple topologies</li>
 +
<ul>
 +
<li>LRTs and single branches</li>
 +
</ul>
 +
<li>Interior branch tests</li>
 +
<ul>
 +
<li>Interior branch tests using parsimony</li>
 +
<li>A multiple-branch counterpart of interior branch tests</li>
 +
</ul>
 +
<li>Testing the molecular clock</li>
 +
<ul>
 +
<li>Parsimony-based methods</li>
 +
<li>Distance-based methods</li>
 +
<li>Likelihood-based methods</li>
 +
<li>The relative rate test</li>
 +
</ul>
 +
<li>Simulation tests based on likelihood</li>
 +
<ul>
 +
<li>Further literature</li>
 +
</ul>
 +
<li>More exact tests and confidence intervals</li>
 +
<ul>
 +
<li>Tests for three species with a clock</li>
 +
<li>Bremer support</li>
 +
<li>Zander&#8217;s conditional probability of reconstruction</li>
 +
<li>More generalized confidence sets</li>
 +
</ul>
 +
</ul>
 +
* '''20. Bootstrap, jackknife, and permutation tests'''
 +
<ul>
 +
<li>The bootstrap and the jackknife</li>
 +
<li>Bootstrapping and phylogenies</li>
 +
<li>The delete-half jackknife</li>
 +
<li>The bootstrap and jackknife for phylogenies</li>
 +
<li>The multiple-tests problem</li>
 +
<li>Independence of characters</li>
 +
<li>Identical distribution&#8212;a problem?</li>
 +
<li>Invariant characters and resampling methods</li>
 +
<li>Biases in bootstrap and jackknife probabilities</li>
 +
<ul>
 +
<li><em>P</em> values in a simple normal case</li>
 +
<li>Methods of reducing the bias</li>
 +
<li>The drug testing analogy</li>
 +
</ul>
 +
<li>Alternatives to <em>P</em> values</li>
 +
<ul>
 +
<li>Probabilities of trees</li>
 +
<li>Using tree distances</li>
 +
<li>Jackknifing species</li>
 +
</ul>
 +
<li>Parametric bootstrapping</li>
 +
<ul>
 +
<li>Advantages and disadvantages of the parametric bootstrap</li>
 +
</ul>
 +
<li>Permutation tests</li>
 +
<ul>
 +
<li>Permuting species within characters</li>
 +
<li>Permuting characters</li>
 +
<li>Skewness of tree length distribution</li>
 +
</ul>
 +
</ul>
 +
* '''21. Paired-sites tests'''
 +
<ul>
 +
<li>An example</li>
 +
<li>Multiple trees</li>
 +
<ul>
 +
<li>The SH test</li>
 +
<li>Other multiple-comparison tests</li>
 +
</ul>
 +
<li>Testing other parameters</li>
 +
<li>Perspective</li>
 +
</ul>
 +
* '''22. Invariants'''
 +
<ul>
 +
<li>Symmetry invariants</li>
 +
<li>Three-species invariants</li>
 +
<li>Lake&#8217;s linear invariants</li>
 +
<li>Cavender&#8217;s quadratic invariants</li>
 +
<ul>
 +
<li>The <em>K</em> invariants</li>
 +
<li>The <em>L</em> invariants</li>
 +
<li>Generalization of Cavender&#8217;s <em>L</em> invariants</li>
 +
</ul>
 +
<li>Drolet and Sankoff&#8217;s <em>k</em>-state quadratic invariants</li>
 +
<li>Clock invariants</li>
 +
<li>General methods for finding invariants</li>
 +
<ul>
 +
<li>Fourier transform methods</li>
 +
<li>Gröbner bases and other general methods</li>
 +
<li>Expressions for all the 3ST invariants</li>
 +
<li>Finding all invariants empirically</li>
 +
<li>All linear invariants</li>
 +
<li>Special cases and extensions</li>
 +
</ul>
 +
<li>Invariants and evolutionary rates</li>
 +
<li>Testing invariants</li>
 +
<li>What use are invariants?</li>
 +
</ul>
 +
* '''23. Brownian motion and gene frequencies'''
 +
<ul>
 +
<li>Brownian motion</li>
 +
<li>Likelihood for a phylogeny</li>
 +
<li>What likelihood to compute?</li>
 +
<ul>
 +
<li>Assuming a clock</li>
 +
<li>The REML approach</li>
 +
</ul>
 +
<li>Multiple characters and Kronecker products</li>
 +
<li>Pruning the likelihood</li>
 +
<li>Maximizing the likelihood</li>
 +
<li>Inferring ancestral states</li>
 +
<ul>
 +
<li>Squared-change parsimony</li>
 +
</ul>
 +
<li>Gene frequencies and Brownian motion</li>
 +
<ul>
 +
<li>Using approximate Brownian motion</li>
 +
<li>Distances from gene frequencies</li>
 +
<li>A more exact likelihood method</li>
 +
<li>Gene frequency parsimony</li>
 +
</ul>
 +
</ul>
 +
* '''24. Quantitative characters'''
 +
<ul>
 +
<li>Neutral models of quantitative characters</li>
 +
<li>Changes due to natural selection</li>
 +
<ul>
 +
<li>Selective correlation</li>
 +
<li>Covariances of multiple characters in multiple lineages</li>
 +
<li>Selection for an optimum</li>
 +
<li>Brownian motion and selection</li>
 +
</ul>
 +
<li>Correcting for correlations</li>
 +
<li>Punctuational models</li>
 +
<li>Inferring phylogenies and correlations</li>
 +
<li>Chasing a common optimum</li>
 +
<li>The character-coding &#8220;problem&#8221;</li>
 +
<li>Continuous-character parsimony methods</li>
 +
<ul>
 +
<li>Manhattan metric parsimony</li>
 +
<li>Other parsimony methods</li>
 +
</ul>
 +
<li>Threshold models</li>
 +
</ul>
 +
* '''25. Comparative methods'''
 +
<ul>
 +
<li>An example with discrete states</li>
 +
<li>An example with continuous characters</li>
 +
<li>The contrasts method</li>
 +
<li>Correlations between characters</li>
 +
<li>When the tree is not completely known</li>
 +
<li>Inferring change in a branch</li>
 +
<li>Sampling error</li>
 +
<li>The standard regression and other variations</li>
 +
<ul>
 +
<li>Generalized least squares</li>
 +
<li>Phylogenetic autocorrelation</li>
 +
<li>Transformations of time</li>
 +
<li>Should we use the phylogeny at all?</li>
 +
</ul>
 +
<li>Paired-lineage tests</li>
 +
<li>Discrete characters</li>
 +
<ul>
 +
<li>Ridley&#8217;s method</li>
 +
<li>Concentrated-changes tests</li>
 +
<li>A paired-lineages test</li>
 +
<li>Methods using likelihood</li>
 +
<li>Advantages of the likelihood approach</li>
 +
</ul>
 +
<li>Molecular applications</li>
 +
</ul>
 +
* '''26. Coalescent trees'''
 +
<ul>
 +
<li>Kingman&#8217;s coalescent</li>
 +
<li>Bugs in a box—an analogy</li>
 +
<li>Effect of varying population size</li>
 +
<li>Migration</li>
 +
<li>Effect of recombination</li>
 +
<li>Coalescents and natural selection</li>
 +
<ul>
 +
<li>Neuhauser and Krone&#8217;s method</li>
 +
</ul>
 +
</ul>
 +
* '''27. Likelihood calculations on coalescents'''
 +
<ul>
 +
<li>The basic equation</li>
 +
<li>Using accurate genealogies—a reverie</li>
 +
<li>Two random sampling methods</li>
 +
<ul>
 +
<li>A Metropolis-Hastings method</li>
 +
<li>Griffiths and Tavaré&#8217;s method</li>
 +
</ul>
 +
<li>Bayesian methods</li>
 +
<li>MCMC for a variety of coalescent models</li>
 +
<li>Single-tree methods</li>
 +
<ul>
 +
<li>Slatkin and Maddison&#8217;s method</li>
 +
<li>Fu&#8217;s method</li>
 +
</ul>
 +
<li>Summary-statistic methods</li>
 +
<ul>
 +
<li>Watterson&#8217;s method</li>
 +
<li>Other summary-statistic methods</li>
 +
<li>Testing for recombination</li>
 +
</ul>
 +
</ul>
 +
* '''28. Coalescents and species trees'''
 +
<ul>
 +
<li>Methods of inferring the species phylogeny</li>
 +
<ul>
 +
<li>Reconciled tree parsimony approaches</li>
 +
<li>Likelihood</li>
 +
</ul>
 +
</ul>
 +
* '''29. Alignment, gene families, and genomics'''
 +
<ul>
 +
<li>Alignment</li>
 +
<ul>
 +
<li>Why phylogenies are important</li>
 +
</ul>
 +
<li>Parsimony method</li>
 +
<ul>
 +
<li>Approximations and progressive alignment</li>
 +
</ul>
 +
<li>Probabilistic models</li>
 +
<ul>
 +
<li>Bishop and Thompson&#8217;s method</li>
 +
<li>The minimum message length method</li>
 +
<li>The TKF model</li>
 +
<li>Multibase insertions and deletions</li>
 +
<li>Tree HMMs</li>
 +
<li>Trees</li>
 +
<li>Inferring the alignment</li>
 +
</ul>
 +
<li>Gene families</li>
 +
<ul>
 +
<li>Reconciled trees</li>
 +
<li>Reconstructing duplications</li>
 +
<li>Rooting unrooted trees</li>
 +
<li>A likelihood analysis</li>
 +
</ul>
 +
<li>Comparative genomics</li>
 +
<ul>
 +
<li>Tandemly repeated genes</li>
 +
<li>Inversions</li>
 +
<li>Inversions in trees</li>
 +
<li>Inversions, transpositions, and translocations</li>
 +
<li>Breakpoint and neighbor-coding approximations</li>
 +
<li>Synteny</li>
 +
<li>Probabilistic models</li>
 +
</ul>
 +
<li>Genome signature methods</li>
 +
</ul>
 +
* '''30. Consensus trees and distances between trees'''
 +
<ul>
 +
<li>Consensus trees</li>
 +
<ul>
 +
<li>Strict consensus</li>
 +
<li>Majority-rule consensus</li>
 +
<li>Adams consensus tree</li>
 +
<li>A dismaying result</li>
 +
<li>Consensus using branch lengths</li>
 +
<li>Other consensus tree methods</li>
 +
<li>Consensus subtrees</li>
 +
</ul>
 +
<li>Distances between trees</li>
 +
<ul>
 +
<li>The symmetric difference</li>
 +
<li>The quartets distance</li>
 +
<li>The nearest-neighbor interchange distance</li>
 +
<li>The path-length-difference metric</li>
 +
<li>Distances using branch lengths</li>
 +
<li>Are these distances truly distances?</li>
 +
<li>Consensus trees and distances</li>
 +
<li>Trees significantly the same? different?</li>
 +
</ul>
 +
<li>What do consensus trees and tree distances tell us?</li>
 +
<ul>
 +
<li>The total evidence debate</li>
 +
<li>A modest proposal</li>
 +
</ul>
 +
</ul>
 +
* '''31. Biogeography, hosts, and parasites'''
 +
<ul>
 +
<li>Component compatibility</li>
 +
<li>Brooks parsimony</li>
 +
<li>Event-based parsimony methods</li>
 +
<ul>
 +
<li>Relation to tree reconciliation</li>
 +
</ul>
 +
<li>Randomization tests</li>
 +
<li>Statistical inference</li>
 +
</ul>
 +
* '''32. Phylogenies and paleontology'''
 +
<ul>
 +
<li>Stratigraphic indices</li>
 +
<li>Stratophenetics</li>
 +
<li>Stratocladistics</li>
 +
<li>Controversies</li>
 +
<li>A not-quite-likelihood method</li>
 +
<li>Stratolikelihood</li>
 +
<ul>
 +
<li>Making a full likelihood method</li>
 +
<li>More realistic fossilization models</li>
 +
</ul>
 +
<li>Fossils within species: Sequential sampling</li>
 +
<li>Between species</li>
 +
</ul>
 +
* '''33. Tests based on tree shape'''
 +
<ul>
 +
<li>Using the topology only</li>
 +
<ul>
 +
<li>Imbalance at the root</li>
 +
</ul>
 +
<li>Harding&#8217;s probabilities of tree shapes</li>
 +
<li>Tests from shapes</li>
 +
<ul>
 +
<li>Measures of overall asymmetry</li>
 +
<li>Choosing a powerful test</li>
 +
</ul>
 +
<li>Tests using times</li>
 +
<ul>
 +
<li>Lineage plots</li>
 +
<li>Likelihood formulas</li>
 +
<li>Other likelihood approaches</li>
 +
<li>Other statistical approaches</li>
 +
<li>A time transformation</li>
 +
</ul>
 +
<li>Characters and key innovations</li>
 +
<li>Work remaining</li>
 +
</ul>
 +
* '''34. Drawing trees'''
 +
<ul>
 +
<li>Issues in drawing rooted trees</li>
 +
<ul>
 +
<li>Placement of interior nodes</li>
 +
<li>Shapes of lineages</li>
 +
</ul>
 +
<li>Unrooted trees</li>
 +
<ul>
 +
<li>The equal-angle algorithm</li>
 +
<li>n-Body algorithms</li>
 +
<li>The equal-daylight algorithm</li>
 +
</ul>
 +
<li>Challenges</li>
 +
</ul>
 +
* '''35. Phylogeny software'''
 +
<ul>
 +
<li>Trees, records, and pointers</li>
 +
<li>Declaring records</li>
 +
<li>Traversing the tree</li>
 +
<li>Unrooted tree data structures</li>
 +
<li>Tree file formats</li>
 +
<li>Widely used phylogeny programs and packages</li>
 +
</ul>
 +
* REFERENCES
 +
* INDEX
  
 
[[Category:Academic Courses]]
 
 
[[Category:Books]]
 

Latest revision as of 10:55, 6 January 2006

Inferring Phylogenies (ISBN ) by Joseph Felsenstein.

Table of Contents

  • PREFACE
  • 1. Parsimony methods
    • A simple example
      • Evaluating a particular tree
      • Rootedness and unrootedness
    • Methods of rooting the tree
    • Branch lengths
    • Unresolved questions
  • 2. Counting evolutionary changes
    • The Fitch algorithm
    • The Sankoff algorithm
      • Connection between the two algorithms
    • Using the algorithms when modifying trees
      • Views
      • Using views when a tree is altered
    • Further economies
  • 3. How many trees are there?
    • Rooted bifurcating trees
    • Unrooted bifurcating trees
    • Multifurcating trees
      • Unrooted trees with multifurcations
    • Tree shapes
      • Rooted bifurcating tree shapes
      • Rooted multifurcating tree shapes
      • Unrooted Shapes
    • Labeled histories
    • Perspective
  • 4. Finding the best tree by heuristic search
    • Nearest-neighbor interchanges
    • Subtree pruning and regrafting
    • Tree bisection and reconnection
    • Other tree rearrangement methods
      • Tree-fusing
      • Genetic algorithms
      • Tree windows and sectorial search
    • Speeding up rearrangements
    • Sequential addition
    • Star decomposition
    • Tree space
    • Search by reweighting of characters
    • Simulated annealing
    • History
  • 5. Finding the best tree by branch and bound
    • A nonbiological example
    • Finding the optimal solution
    • NP-hardness
    • Branch and bound methods
    • Phylogenies: Despair and hope
    • Branch and bound for parsimony
    • Improving the bound
      • Using still-absent states
      • Using compatibility
    • Rules limiting the search
  • 6. Ancestral states and branch lengths
    • Reconstructing ancestral states
    • Accelerated and delayed transformation
    • Branch lengths
  • 7. Variants of parsimony
    • Camin-Sokal parsimony
    • Parsimony on an ordinal scale
    • Dollo parsimony
    • Polymorphism parsimony
    • Unknown ancestral states
    • Multiple states and binary coding
    • Dollo parsimony and multiple states
    • Polymorphism parsimony and multiple states
    • Transformation series analysis
    • Weighting characters
    • Successive weighting and nonlinear weighting
      • Successive weighting
      • Nonsuccessive algorithms
  • 8. Compatibility
    • Testing compatibility
    • The Pairwise Compatibility Theorem
    • Cliques of compatible characters
    • Finding the tree from the clique
    • Other cases where cliques can be used
    • Where cliques cannot be used
      • Perfect phylogeny
      • Using compatibility on molecules anyway
  • 9. Statistical properties of parsimony
    • Likelihood and parsimony
      • The weights
      • Unweighted parsimony
      • Limitations of this justification of parsimony
      • Farris’s proofs
      • No common mechanism
      • Likelihood and compatibility
      • Parsimony versus compatibility
    • Consistency and parsimony
      • Character patterns and parsimony
      • Observed numbers of the patterns
      • Observed fractions of the patterns
      • Expected fractions of the patterns
      • Inconsistency
      • When inconsistency is not a problem
      • The nucleotide sequence case
      • Other situations where consistency is guaranteed
      • Does a molecular clock guarantee consistency?
      • The Farris zone
    • Some perspective
  • 10. A digression on history and philosophy
    • How phylogeny algorithms developed
      • Sokal and Sneath
      • Edwards and Cavalli-Sforza
      • Camin and Sokal and parsimony
      • Eck and Dayhoff and molecular parsimony
      • Fitch and Margoliash popularize distance matrix methods
      • Wilson and Le Quesne introduce compatibility
      • Jukes and Cantor and molecular distances
      • Farris and Kluge and unordered parsimony
      • Fitch and molecular parsimony
      • Further work
      • What about Willi Hennig and Walter Zimmermann?
    • Different philosophical frameworks
      • Hypothetico-deductive
      • Logical parsimony
      • Logical probability?
      • Criticisms of statistical inference
      • The irrelevance of classification
  • 11. Distance matrix methods
    • Branch lengths and times
    • The least squares methods
      • Least squares branch lengths
      • Finding the least squares tree topology
    • The statistical rationale
    • Generalized least squares
    • Distances
    • The Jukes-Cantor model—an example
    • Why correct for multiple changes?
    • Minimum evolution
    • Clustering algorithms
    • UPGMA and least squares
      • A clustering algorithm
      • An example
      • UPGMA on nonclocklike trees
    • Neighbor-joining
      • Performance
      • Using neighbor-joining with other methods
      • Relation of neighbor-joining to least squares
      • Weighted versions of neighbor-joining
    • Other approximate distance methods
      • Distance Wagner method
      • A related family
      • Minimizing the maximum discrepancy
      • Two approaches to error in trees
    • A puzzling formula
    • Consistency and distance methods
    • A limitation of distance methods
  • 12. Quartets of species
    • The four point metric
    • The split decomposition
      • Related methods
    • Short quartets methods
    • The disk-covering method
    • Challenges for the short quartets and DCM methods
    • Three-taxon statement methods
    • Other uses of quartets with parsimony
    • Consensus supertrees
    • Neighborliness
    • De Soete’s search method
    • Quartet puzzling and searching tree space
    • Perspective
  • 13. Models of DNA evolution
    • Kimura’s two-parameter model
    • Calculation of the distance
    • The Tamura-Nei model, F84, and HKY
    • The general time-reversible model
      • Distances from the GTR model
    • The general 12-parameter model
    • LogDet distances
    • Other distances
    • Variance of distance
    • Rate variation between sites or loci
      • Different rates at different sites
      • Distances with known rates
      • Distribution of rates
      • Gamma- and lognormally distributed rates
      • Distances from gamma-distributed rates
    • Models with nonindependence of sites
  • 14. Models of protein evolution
    • Amino acid models
    • The Dayhoff model
    • Other empirically-based models
      • Models depending on secondary structure
    • Codon-based models
      • Inequality of synonymous and nonsynonymous substitutions
    • Protein structure and correlated change
  • 15. Restriction sites, RAPDs, AFLPs, and microsatellites
    • Restriction sites
      • Nei and Tajima’s model
      • Distances based on restriction sites
      • Issues of ascertainment
      • Parsimony for restriction sites
    • Modeling restriction fragments
      • Parsimony with restriction fragments
    • RAPDs and AFLPs
      • The issue of dominance
      • Unresolved problems
      • Microsatellite models
      • The one-step model
      • Microsatellite distances
      • A Brownian motion approximation
      • Models with constraints on array size
      • Multi-step and heterogeneous models
      • Snakes and Ladders
      • Complications
  • 16. Likelihood methods
    • Maximum likelihood
      • An example
    • Computing the likelihood of a tree
      • Economizing on the computation
      • Handling ambiguity and error
    • Unrootedness
    • Finding the maximum likelihood tree
    • Inferring ancestral sequences
    • Rates varying among sites
      • Hidden Markov models
      • Autocorrelation of rates
      • HMMs for other aspects of models
      • Estimating the states
    • Models with clocks
      • Relaxing molecular clocks
      • Models for relaxed clocks
      • Covarions
      • Empirical approaches to change of rates
    • Are ML estimates consistent?
      • Comparability of likelihoods
      • A nonexistent proof?
      • A simple proof
      • Misbehavior with the wrong model
      • Better behavior with the wrong model
  • 17. Hadamard methods
    • The edge length spectrum and conjugate spectrum
    • The closest tree criterion
    • DNA models
    • Computational effort
    • Extensions of Hadamard methods
  • 18. Bayesian inference of phylogenies
    • Bayes’ theorem
    • Bayesian methods for phylogenies
    • Markov chain Monte Carlo methods
    • The Metropolis algorithm
      • Its equilibrium distribution
      • Bayesian MCMC
    • Bayesian MCMC for phylogenies
      • Priors
    • Proposal distributions
    • Computing the likelihoods
    • Summarizing the posterior
    • Priors on trees
    • Controversies over Bayesian inference
      • Universality of the prior
      • Flat priors and doubts about them
    • Applications of Bayesian methods
  • 19. Testing models, trees, and clocks
    • Likelihood and tests
    • Likelihood ratios near asymptopia
    • Multiple parameters
      • Some parameters constrained, some not
      • Conditions
      • Curvature or height?
    • Interval estimates
    • Testing assertions about parameters
      • Coins in a barrel
      • Evolutionary rates instead of coins
    • Choosing among nonnested hypotheses: AIC and BIC
      • An example using the AIC criterion
    • The problem of multiple topologies
      • LRTs and single branches
    • Interior branch tests
      • Interior branch tests using parsimony
      • A multiple-branch counterpart of interior branch tests
    • Testing the molecular clock
      • Parsimony-based methods
      • Distance-based methods
      • Likelihood-based methods
      • The relative rate test
    • Simulation tests based on likelihood
      • Further literature
    • More exact tests and confidence intervals
      • Tests for three species with a clock
      • Bremer support
      • Zander’s conditional probability of reconstruction
      • More generalized confidence sets
  • 20. Bootstrap, jackknife, and permutation tests
  • The bootstrap and the jackknife
  • Bootstrapping and phylogenies
  • The delete-half jackknife
  • The bootstrap and jackknife for phylogenies
  • The multiple-tests problem
  • Independence of characters
  • Identical distribution - a problem?
  • Invariant characters and resampling methods
  • Biases in bootstrap and jackknife probabilities
    • P values in a simple normal case
    • Methods of reducing the bias
    • The drug testing analogy
  • Alternatives to P values
    • Probabilities of trees
    • Using tree distances
    • Jackknifing species
  • Parametric bootstrapping
    • Advantages and disadvantages of the parametric bootstrap
  • Permutation tests
    • Permuting species within characters
    • Permuting characters
    • Skewness of tree length distribution
  • 21. Paired-sites tests
  • An example
  • Multiple trees
    • The SH test
    • Other multiple-comparison tests
  • Testing other parameters
  • Perspective
  • 22. Invariants
  • Symmetry invariants
  • Three-species invariants
  • Lake’s linear invariants
  • Cavender’s quadratic invariants
    • The K invariants
    • The L invariants
    • Generalization of Cavender’s L invariants
  • Drolet and Sankoff’s k-state quadratic invariants
  • Clock invariants
  • General methods for finding invariants
    • Fourier transform methods
    • Gröbner bases and other general methods
    • Expressions for all the 3ST invariants
    • Finding all invariants empirically
    • All linear invariants
    • Special cases and extensions
  • Invariants and evolutionary rates
  • Testing invariants
  • What use are invariants?
  • 23. Brownian motion and gene frequencies
  • Brownian motion
  • Likelihood for a phylogeny
  • What likelihood to compute?
    • Assuming a clock
    • The REML approach
  • Multiple characters and Kronecker products
  • Pruning the likelihood
  • Maximizing the likelihood
  • Inferring ancestral states
    • Squared-change parsimony
  • Gene frequencies and Brownian motion
    • Using approximate Brownian motion
    • Distances from gene frequencies
    • A more exact likelihood method
    • Gene frequency parsimony
  • 24. Quantitative characters
  • Neutral models of quantitative characters
  • Changes due to natural selection
    • Selective correlation
    • Covariances of multiple characters in multiple lineages
    • Selection for an optimum
    • Brownian motion and selection
  • Correcting for correlations
  • Punctuational models
  • Inferring phylogenies and correlations
  • Chasing a common optimum
  • The character-coding “problem”
  • Continuous-character parsimony methods
    • Manhattan metric parsimony
    • Other parsimony methods
  • Threshold models
  • 25. Comparative methods
  • An example with discrete states
  • An example with continuous characters
  • The contrasts method
  • Correlations between characters
  • When the tree is not completely known
  • Inferring change in a branch
  • Sampling error
  • The standard regression and other variations
    • Generalized least squares
    • Phylogenetic autocorrelation
    • Transformations of time
    • Should we use the phylogeny at all?
  • Paired-lineage tests
  • Discrete characters
    • Ridley’s method
    • Concentrated-changes tests
    • A paired-lineages test
    • Methods using likelihood
    • Advantages of the likelihood approach
  • Molecular applications
  • 26. Coalescent trees
  • Kingman’s coalescent
  • Bugs in a box—an analogy
  • Effect of varying population size
  • Migration
  • Effect of recombination
  • Coalescents and natural selection
    • Neuhauser and Krone’s method
  • 27. Likelihood calculations on coalescents
  • The basic equation
  • Using accurate genealogies—a reverie
  • Two random sampling methods
    • A Metropolis-Hastings method
    • Griffiths and Tavaré’s method
  • Bayesian methods
  • MCMC for a variety of coalescent models
  • Single-tree methods
    • Slatkin and Maddison’s method
    • Fu’s method
  • Summary-statistic methods
    • Watterson’s method
    • Other summary-statistic methods
    • Testing for recombination
  • 28. Coalescents and species trees
  • Methods of inferring the species phylogeny
    • Reconciled tree parsimony approaches
    • Likelihood
  • 29. Alignment, gene families, and genomics
  • Alignment
    • Why phylogenies are important
  • Parsimony method
    • Approximations and progressive alignment
  • Probabilistic models
    • Bishop and Thompson’s method
    • The minimum message length method
    • The TKF model
    • Multibase insertions and deletions
    • Tree HMMs
    • Trees
    • Inferring the alignment
  • Gene families
    • Reconciled trees
    • Reconstructing duplications
    • Rooting unrooted trees
    • A likelihood analysis
  • Comparative genomics
    • Tandemly repeated genes
    • Inversions
    • Inversions in trees
    • Inversions, transpositions, and translocations
    • Breakpoint and neighbor-coding approximations
    • Synteny
    • Probabilistic models
  • Genome signature methods
  • 30. Consensus trees and distances between trees
  • Consensus trees
    • Strict consensus
    • Majority-rule consensus
    • Adams consensus tree
    • A dismaying result
    • Consensus using branch lengths
    • Other consensus tree methods
    • Consensus subtrees
  • Distances between trees
    • The symmetric difference
    • The quartets distance
    • The nearest-neighbor interchange distance
    • The path-length-difference metric
    • Distances using branch lengths
    • Are these distances truly distances?
    • Consensus trees and distances
    • Trees significantly the same? different?
  • What do consensus trees and tree distances tell us?
    • The total evidence debate
    • A modest proposal
  • 31. Biogeography, hosts, and parasites
  • Component compatibility
  • Brooks parsimony
  • Event-based parsimony methods
    • Relation to tree reconciliation
  • Randomization tests
  • Statistical inference
  • 32. Phylogenies and paleontology
  • Stratigraphic indices
  • Stratophenetics
  • Stratocladistics
  • Controversies
  • A not-quite-likelihood method
  • Stratolikelihood
    • Making a full likelihood method
    • More realistic fossilization models
  • Fossils within species: Sequential sampling
  • Between species
  • 33. Tests based on tree shape
  • Using the topology only
    • Imbalance at the root
  • Harding’s probabilities of tree shapes
  • Tests from shapes
    • Measures of overall asymmetry
    • Choosing a powerful test
  • Tests using times
    • Lineage plots
    • Likelihood formulas
    • Other likelihood approaches
    • Other statistical approaches
    • A time transformation
  • Characters and key innovations
  • Work remaining
  • 34. Drawing trees
  • Issues in drawing rooted trees
    • Placement of interior nodes
    • Shapes of lineages
  • Unrooted trees
    • The equal-angle algorithm
    • n-Body algorithms
    • The equal-daylight algorithm
  • Challenges
  • 35. Phylogeny software
  • Trees, records, and pointers
  • Declaring records
  • Traversing the tree
  • Unrooted tree data structures
  • Tree file formats
  • Widely used phylogeny programs and packages
  • REFERENCES
  • INDEX