Inferring Phylogenies

From Christoph's Personal Wiki

Inferring Phylogenies by Joseph Felsenstein.

Table of Contents

  • PREFACE
  • 1. Parsimony methods
    • A simple example
      • Evaluating a particular tree
      • Rootedness and unrootedness
    • Methods of rooting the tree
    • Branch lengths
    • Unresolved questions
  • 2. Counting evolutionary changes
    • The Fitch algorithm
    • The Sankoff algorithm
      • Connection between the two algorithms
    • Using the algorithms when modifying trees
      • Views
      • Using views when a tree is altered
    • Further economies
  • 3. How many trees are there?
    • Rooted bifurcating trees
    • Unrooted bifurcating trees
    • Multifurcating trees
      • Unrooted trees with multifurcations
    • Tree shapes
      • Rooted bifurcating tree shapes
      • Rooted multifurcating tree shapes
      • Unrooted shapes
    • Labeled histories
    • Perspective
  • 4. Finding the best tree by heuristic search
    • Nearest-neighbor interchanges
    • Subtree pruning and regrafting
    • Tree bisection and reconnection
    • Other tree rearrangement methods
      • Tree-fusing
      • Genetic algorithms
      • Tree windows and sectorial search
    • Speeding up rearrangements
    • Sequential addition
    • Star decomposition
    • Tree space
    • Search by reweighting of characters
    • Simulated annealing
    • History
  • 5. Finding the best tree by branch and bound
    • A nonbiological example
    • Finding the optimal solution
    • NP-hardness
    • Branch and bound methods
    • Phylogenies: Despair and hope
    • Branch and bound for parsimony
    • Improving the bound
      • Using still-absent states
      • Using compatibility
    • Rules limiting the search
  • 6. Ancestral states and branch lengths
    • Reconstructing ancestral states
    • Accelerated and delayed transformation
    • Branch lengths
  • 7. Variants of parsimony
    • Camin-Sokal parsimony
    • Parsimony on an ordinal scale
    • Dollo parsimony
    • Polymorphism parsimony
    • Unknown ancestral states
    • Multiple states and binary coding
    • Dollo parsimony and multiple states
    • Polymorphism parsimony and multiple states
    • Transformation series analysis
    • Weighting characters
    • Successive weighting and nonlinear weighting
      • Successive weighting
      • Nonsuccessive algorithms
  • 8. Compatibility
    • Testing compatibility
    • The Pairwise Compatibility Theorem
    • Cliques of compatible characters
    • Finding the tree from the clique
    • Other cases where cliques can be used
    • Where cliques cannot be used
      • Perfect phylogeny
      • Using compatibility on molecules anyway
  • 9. Statistical properties of parsimony
    • Likelihood and parsimony
      • The weights
      • Unweighted parsimony
      • Limitations of this justification of parsimony
      • Farris’s proofs
      • No common mechanism
      • Likelihood and compatibility
      • Parsimony versus compatibility
    • Consistency and parsimony
      • Character patterns and parsimony
      • Observed numbers of the patterns
      • Observed fractions of the patterns
      • Expected fractions of the patterns
      • Inconsistency
      • When inconsistency is not a problem
      • The nucleotide sequence case
      • Other situations where consistency is guaranteed
      • Does a molecular clock guarantee consistency?
      • The Farris zone
    • Some perspective
  • 10. A digression on history and philosophy
    • How phylogeny algorithms developed
      • Sokal and Sneath
      • Edwards and Cavalli-Sforza
      • Camin and Sokal and parsimony
      • Eck and Dayhoff and molecular parsimony
      • Fitch and Margoliash popularize distance matrix methods
      • Wilson and Le Quesne introduce compatibility
      • Jukes and Cantor and molecular distances
      • Farris and Kluge and unordered parsimony
      • Fitch and molecular parsimony
      • Further work
      • What about Willi Hennig and Walter Zimmermann?
    • Different philosophical frameworks
      • Hypothetico-deductive
      • Logical parsimony
      • Logical probability?
      • Criticisms of statistical inference
      • The irrelevance of classification
  • 11. Distance matrix methods
    • Branch lengths and times
    • The least squares methods
      • Least squares branch lengths
      • Finding the least squares tree topology
    • The statistical rationale
    • Generalized least squares
    • Distances
    • The Jukes-Cantor model—an example
    • Why correct for multiple changes?
    • Minimum evolution
    • Clustering algorithms
    • UPGMA and least squares
      • A clustering algorithm
      • An example
      • UPGMA on nonclocklike trees
    • Neighbor-joining
      • Performance
      • Using neighbor-joining with other methods
      • Relation of neighbor-joining to least squares
      • Weighted versions of neighbor-joining
    • Other approximate distance methods
      • Distance Wagner method
      • A related family
      • Minimizing the maximum discrepancy
      • Two approaches to error in trees
    • A puzzling formula
    • Consistency and distance methods
    • A limitation of distance methods
  • 12. Quartets of species
    • The four point metric
    • The split decomposition
      • Related methods
    • Short quartets methods
    • The disk-covering method
    • Challenges for the short quartets and DCM methods
    • Three-taxon statement methods
    • Other uses of quartets with parsimony
    • Consensus supertrees
    • Neighborliness
    • De Soete’s search method
    • Quartet puzzling and searching tree space
    • Perspective
  • 13. Models of DNA evolution
    • Kimura’s two-parameter model
    • Calculation of the distance
    • The Tamura-Nei model, F84, and HKY
    • The general time-reversible model
      • Distances from the GTR model
    • The general 12-parameter model
    • LogDet distances
    • Other distances
    • Variance of distance
    • Rate variation between sites or loci
      • Different rates at different sites
      • Distances with known rates
      • Distribution of rates
      • Gamma- and lognormally distributed rates
      • Distances from gamma-distributed rates
    • Models with nonindependence of sites
  • 14. Models of protein evolution
    • Amino acid models
    • The Dayhoff model
    • Other empirically-based models
      • Models depending on secondary structure
    • Codon-based models
      • Inequality of synonymous and nonsynonymous substitutions
    • Protein structure and correlated change
  • 15. Restriction sites, RAPDs, AFLPs, and microsatellites
    • Restriction sites
      • Nei and Tajima’s model
      • Distances based on restriction sites
      • Issues of ascertainment
      • Parsimony for restriction sites
    • Modeling restriction fragments
      • Parsimony with restriction fragments
    • RAPDs and AFLPs
      • The issue of dominance
      • Unresolved problems
    • Microsatellite models
      • The one-step model
      • Microsatellite distances
      • A Brownian motion approximation
      • Models with constraints on array size
      • Multi-step and heterogeneous models
      • Snakes and Ladders
      • Complications
  • 16. Likelihood methods
    • Maximum likelihood
      • An example
    • Computing the likelihood of a tree
      • Economizing on the computation
      • Handling ambiguity and error
    • Unrootedness
    • Finding the maximum likelihood tree
    • Inferring ancestral sequences
    • Rates varying among sites
      • Hidden Markov models
      • Autocorrelation of rates
      • HMMs for other aspects of models
      • Estimating the states
    • Models with clocks
      • Relaxing molecular clocks
      • Models for relaxed clocks
      • Covarions
      • Empirical approaches to change of rates
    • Are ML estimates consistent?
      • Comparability of likelihoods
      • A nonexistent proof?
      • A simple proof
      • Misbehavior with the wrong model
      • Better behavior with the wrong model
  • 17. Hadamard methods
    • The edge length spectrum and conjugate spectrum
    • The closest tree criterion
    • DNA models
    • Computational effort
    • Extensions of Hadamard methods
  • 18. Bayesian inference of phylogenies
    • Bayes’ theorem
    • Bayesian methods for phylogenies
    • Markov chain Monte Carlo methods
    • The Metropolis algorithm
      • Its equilibrium distribution
      • Bayesian MCMC
    • Bayesian MCMC for phylogenies
      • Priors
    • Proposal distributions
    • Computing the likelihoods
    • Summarizing the posterior
    • Priors on trees
    • Controversies over Bayesian inference
      • Universality of the prior
      • Flat priors and doubts about them
    • Applications of Bayesian methods
  • 19. Testing models, trees, and clocks
    • Likelihood and tests
    • Likelihood ratios near asymptopia
    • Multiple parameters
      • Some parameters constrained, some not
      • Conditions
      • Curvature or height?
    • Interval estimates
    • Testing assertions about parameters
      • Coins in a barrel
      • Evolutionary rates instead of coins
    • Choosing among nonnested hypotheses: AIC and BIC
      • An example using the AIC criterion
    • The problem of multiple topologies
      • LRTs and single branches
    • Interior branch tests
      • Interior branch tests using parsimony
      • A multiple-branch counterpart of interior branch tests
    • Testing the molecular clock
      • Parsimony-based methods
      • Distance-based methods
      • Likelihood-based methods
      • The relative rate test
    • Simulation tests based on likelihood
      • Further literature
    • More exact tests and confidence intervals
      • Tests for three species with a clock
      • Bremer support
      • Zander’s conditional probability of reconstruction
      • More generalized confidence sets
  • 20. Bootstrap, jackknife, and permutation tests
    • The bootstrap and the jackknife
    • Bootstrapping and phylogenies
    • The delete-half jackknife
    • The bootstrap and jackknife for phylogenies
    • The multiple-tests problem
    • Independence of characters
    • Identical distribution—a problem?
    • Invariant characters and resampling methods
    • Biases in bootstrap and jackknife probabilities
      • P values in a simple normal case
      • Methods of reducing the bias
      • The drug testing analogy
    • Alternatives to P values
      • Probabilities of trees
      • Using tree distances
      • Jackknifing species
    • Parametric bootstrapping
      • Advantages and disadvantages of the parametric bootstrap
    • Permutation tests
      • Permuting species within characters
      • Permuting characters
      • Skewness of tree length distribution
  • 21. Paired-sites tests
    • An example
    • Multiple trees
      • The SH test
      • Other multiple-comparison tests
    • Testing other parameters
    • Perspective
  • 22. Invariants
    • Symmetry invariants
    • Three-species invariants
    • Lake’s linear invariants
    • Cavender’s quadratic invariants
      • The K invariants
      • The L invariants
      • Generalization of Cavender’s L invariants
    • Drolet and Sankoff’s k-state quadratic invariants
    • Clock invariants
    • General methods for finding invariants
      • Fourier transform methods
      • Gröbner bases and other general methods
      • Expressions for all the 3ST invariants
      • Finding all invariants empirically
      • All linear invariants
      • Special cases and extensions
    • Invariants and evolutionary rates
    • Testing invariants
    • What use are invariants?
  • 23. Brownian motion and gene frequencies
    • Brownian motion
    • Likelihood for a phylogeny
    • What likelihood to compute?
      • Assuming a clock
      • The REML approach
    • Multiple characters and Kronecker products
    • Pruning the likelihood
    • Maximizing the likelihood
    • Inferring ancestral states
      • Squared-change parsimony
    • Gene frequencies and Brownian motion
      • Using approximate Brownian motion
      • Distances from gene frequencies
      • A more exact likelihood method
      • Gene frequency parsimony
  • 24. Quantitative characters
    • Neutral models of quantitative characters
    • Changes due to natural selection
      • Selective correlation
      • Covariances of multiple characters in multiple lineages
      • Selection for an optimum
      • Brownian motion and selection
    • Correcting for correlations
    • Punctuational models
    • Inferring phylogenies and correlations
    • Chasing a common optimum
    • The character-coding “problem”
    • Continuous-character parsimony methods
      • Manhattan metric parsimony
      • Other parsimony methods
    • Threshold models
  • 25. Comparative methods
    • An example with discrete states
    • An example with continuous characters
    • The contrasts method
    • Correlations between characters
    • When the tree is not completely known
    • Inferring change in a branch
    • Sampling error
    • The standard regression and other variations
      • Generalized least squares
      • Phylogenetic autocorrelation
      • Transformations of time
      • Should we use the phylogeny at all?
    • Paired-lineage tests
    • Discrete characters
      • Ridley’s method
      • Concentrated-changes tests
      • A paired-lineages test
      • Methods using likelihood
      • Advantages of the likelihood approach
    • Molecular applications
  • 26. Coalescent trees
    • Kingman’s coalescent
    • Bugs in a box—an analogy
    • Effect of varying population size
    • Migration
    • Effect of recombination
    • Coalescents and natural selection
      • Neuhauser and Krone’s method
  • 27. Likelihood calculations on coalescents
    • The basic equation
    • Using accurate genealogies—a reverie
    • Two random sampling methods
      • A Metropolis-Hastings method
      • Griffiths and Tavaré’s method
    • Bayesian methods
    • MCMC for a variety of coalescent models
    • Single-tree methods
      • Slatkin and Maddison’s method
      • Fu’s method
    • Summary-statistic methods
      • Watterson’s method
      • Other summary-statistic methods
      • Testing for recombination
  • 28. Coalescents and species trees
    • Methods of inferring the species phylogeny
      • Reconciled tree parsimony approaches
      • Likelihood
  • 29. Alignment, gene families, and genomics
    • Alignment
      • Why phylogenies are important
    • Parsimony method
      • Approximations and progressive alignment
    • Probabilistic models
      • Bishop and Thompson’s method
      • The minimum message length method
      • The TKF model
      • Multibase insertions and deletions
      • Tree HMMs
      • Trees
      • Inferring the alignment
    • Gene families
      • Reconciled trees
      • Reconstructing duplications
      • Rooting unrooted trees
      • A likelihood analysis
    • Comparative genomics
      • Tandemly repeated genes
      • Inversions
      • Inversions in trees
      • Inversions, transpositions, and translocations
      • Breakpoint and neighbor-coding approximations
      • Synteny
      • Probabilistic models
    • Genome signature methods
  • 30. Consensus trees and distances between trees
    • Consensus trees
      • Strict consensus
      • Majority-rule consensus
      • Adams consensus tree
      • A dismaying result
      • Consensus using branch lengths
      • Other consensus tree methods
      • Consensus subtrees
    • Distances between trees
      • The symmetric difference
      • The quartets distance
      • The nearest-neighbor interchange distance
      • The path-length-difference metric
      • Distances using branch lengths
      • Are these distances truly distances?
      • Consensus trees and distances
      • Trees significantly the same? different?
    • What do consensus trees and tree distances tell us?
      • The total evidence debate
      • A modest proposal
  • 31. Biogeography, hosts, and parasites
    • Component compatibility
    • Brooks parsimony
    • Event-based parsimony methods
      • Relation to tree reconciliation
    • Randomization tests
    • Statistical inference
  • 32. Phylogenies and paleontology
    • Stratigraphic indices
    • Stratophenetics
    • Stratocladistics
    • Controversies
    • A not-quite-likelihood method
    • Stratolikelihood
      • Making a full likelihood method
      • More realistic fossilization models
    • Fossils within species: Sequential sampling
    • Between species
  • 33. Tests based on tree shape
    • Using the topology only
      • Imbalance at the root
    • Harding’s probabilities of tree shapes
    • Tests from shapes
      • Measures of overall asymmetry
      • Choosing a powerful test
    • Tests using times
      • Lineage plots
      • Likelihood formulas
      • Other likelihood approaches
      • Other statistical approaches
      • A time transformation
    • Characters and key innovations
    • Work remaining
  • 34. Drawing trees
    • Issues in drawing rooted trees
      • Placement of interior nodes
      • Shapes of lineages
    • Unrooted trees
      • The equal-angle algorithm
      • n-Body algorithms
      • The equal-daylight algorithm
    • Challenges
  • 35. Phylogeny software
    • Trees, records, and pointers
    • Declaring records
    • Traversing the tree
    • Unrooted tree data structures
    • Tree file formats
    • Widely used phylogeny programs and packages
  • REFERENCES
  • INDEX