Inferring Phylogenies
From Christoph's Personal Wiki
Inferring Phylogenies (ISBN ) by Joseph Felsenstein.
Table of Contents
- PREFACE
- 1. Parsimony methods
- A simple example
- Evaluating a particular tree
- Rootedness and unrootedness
- Methods of rooting the tree
- Branch lengths
- Unresolved questions
- 2. Counting evolutionary changes
- The Fitch algorithm
- The Sankoff algorithm
- Connection between the two algorithms
- Using the algorithms when modifying trees
- Views
- Using views when a tree is altered
- Further economies
- 3. How many trees are there?
- Rooted bifurcating trees
- Unrooted bifurcating trees
- Multifurcating trees
- Unrooted trees with multifurcations
- Tree shapes
- Rooted bifurcating tree shapes
- Rooted multifurcating tree shapes
- Unrooted shapes
- Labeled histories
- Perspective
- 4. Finding the best tree by heuristic search
- Nearest-neighbor interchanges
- Subtree pruning and regrafting
- Tree bisection and reconnection
- Other tree rearrangement methods
- Tree-fusing
- Genetic algorithms
- Tree windows and sectorial search
- Speeding up rearrangements
- Sequential addition
- Star decomposition
- Tree space
- Search by reweighting of characters
- Simulated annealing
- History
- 5. Finding the best tree by branch and bound
- A nonbiological example
- Finding the optimal solution
- NP-hardness
- Branch and bound methods
- Phylogenies: Despair and hope
- Branch and bound for parsimony
- Improving the bound
- Using still-absent states
- Using compatibility
- Rules limiting the search
- 6. Ancestral states and branch lengths
- Reconstructing ancestral states
- Accelerated and delayed transformation
- Branch lengths
- 7. Variants of parsimony
- Camin-Sokal parsimony
- Parsimony on an ordinal scale
- Dollo parsimony
- Polymorphism parsimony
- Unknown ancestral states
- Multiple states and binary coding
- Dollo parsimony and multiple states
- Polymorphism parsimony and multiple states
- Transformation series analysis
- Weighting characters
- Successive weighting and nonlinear weighting
- Successive weighting
- Nonsuccessive algorithms
- 8. Compatibility
- Testing compatibility
- The Pairwise Compatibility Theorem
- Cliques of compatible characters
- Finding the tree from the clique
- Other cases where cliques can be used
- Where cliques cannot be used
- Perfect phylogeny
- Using compatibility on molecules anyway
- 9. Statistical properties of parsimony
- Likelihood and parsimony
- The weights
- Unweighted parsimony
- Limitations of this justification of parsimony
- Farris’s proofs
- No common mechanism
- Likelihood and compatibility
- Parsimony versus compatibility
- Consistency and parsimony
- Character patterns and parsimony
- Observed numbers of the patterns
- Observed fractions of the patterns
- Expected fractions of the patterns
- Inconsistency
- When inconsistency is not a problem
- The nucleotide sequence case
- Other situations where consistency is guaranteed
- Does a molecular clock guarantee consistency?
- The Farris zone
- Some perspective
- 10. A digression on history and philosophy
- How phylogeny algorithms developed
- Sokal and Sneath
- Edwards and Cavalli-Sforza
- Camin and Sokal and parsimony
- Eck and Dayhoff and molecular parsimony
- Fitch and Margoliash popularize distance matrix methods
- Wilson and Le Quesne introduce compatibility
- Jukes and Cantor and molecular distances
- Farris and Kluge and unordered parsimony
- Fitch and molecular parsimony
- Further work
- What about Willi Hennig and Walter Zimmermann?
- Different philosophical frameworks
- Hypothetico-deductive
- Logical parsimony
- Logical probability?
- Criticisms of statistical inference
- The irrelevance of classification
- 11. Distance matrix methods
- Branch lengths and times
- The least squares methods
- Least squares branch lengths
- Finding the least squares tree topology
- The statistical rationale
- Generalized least squares
- Distances
- The Jukes-Cantor model: an example
- Why correct for multiple changes?
- Minimum evolution
- Clustering algorithms
- UPGMA and least squares
- A clustering algorithm
- An example
- UPGMA on nonclocklike trees
- Neighbor-joining
- Performance
- Using neighbor-joining with other methods
- Relation of neighbor-joining to least squares
- Weighted versions of neighbor-joining
- Other approximate distance methods
- Distance Wagner method
- A related family
- Minimizing the maximum discrepancy
- Two approaches to error in trees
- A puzzling formula
- Consistency and distance methods
- A limitation of distance methods
- 12. Quartets of species
- The four point metric
- The split decomposition
- Related methods
- Short quartets methods
- The disk-covering method
- Challenges for the short quartets and DCM methods
- Three-taxon statement methods
- Other uses of quartets with parsimony
- Consensus supertrees
- Neighborliness
- De Soete’s search method
- Quartet puzzling and searching tree space
- Perspective
- 13. Models of DNA evolution
- Kimura’s two-parameter model
- Calculation of the distance
- The Tamura-Nei model, F84, and HKY
- The general time-reversible model
- Distances from the GTR model
- The general 12-parameter model
- LogDet distances
- Other distances
- Variance of distance
- Rate variation between sites or loci
- Different rates at different sites
- Distances with known rates
- Distribution of rates
- Gamma- and lognormally distributed rates
- Distances from gamma-distributed rates
- Models with nonindependence of sites
- 14. Models of protein evolution
- Amino acid models
- The Dayhoff model
- Other empirically-based models
- Models depending on secondary structure
- Codon-based models
- Inequality of synonymous and nonsynonymous substitutions
- Protein structure and correlated change
- 15. Restriction sites, RAPDs, AFLPs, and microsatellites
- Restriction sites
- Nei and Tajima’s model
- Distances based on restriction sites
- Issues of ascertainment
- Parsimony for restriction sites
- Modeling restriction fragments
- Parsimony with restriction fragments
- RAPDs and AFLPs
- The issue of dominance
- Unresolved problems
- Microsatellite models
- The one-step model
- Microsatellite distances
- A Brownian motion approximation
- Models with constraints on array size
- Multi-step and heterogeneous models
- Snakes and Ladders
- Complications
- 16. Likelihood methods
- Maximum likelihood
- An example
- Computing the likelihood of a tree
- Economizing on the computation
- Handling ambiguity and error
- Unrootedness
- Finding the maximum likelihood tree
- Inferring ancestral sequences
- Rates varying among sites
- Hidden Markov models
- Autocorrelation of rates
- HMMs for other aspects of models
- Estimating the states
- Models with clocks
- Relaxing molecular clocks
- Models for relaxed clocks
- Covarions
- Empirical approaches to change of rates
- Are ML estimates consistent?
- Comparability of likelihoods
- A nonexistent proof?
- A simple proof
- Misbehavior with the wrong model
- Better behavior with the wrong model
- 17. Hadamard methods
- The edge length spectrum and conjugate spectrum
- The closest tree criterion
- DNA models
- Computational effort
- Extensions of Hadamard methods
- 18. Bayesian inference of phylogenies
- Bayes’ theorem
- Bayesian methods for phylogenies
- Markov chain Monte Carlo methods
- The Metropolis algorithm
- Its equilibrium distribution
- Bayesian MCMC
- Bayesian MCMC for phylogenies
- Priors
- Proposal distributions
- Computing the likelihoods
- Summarizing the posterior
- Priors on trees
- Controversies over Bayesian inference
- Universality of the prior
- Flat priors and doubts about them
- Applications of Bayesian methods
- 19. Testing models, trees, and clocks
- Likelihood and tests
- Likelihood ratios near asymptopia
- Multiple parameters
- Some parameters constrained, some not
- Conditions
- Curvature or height?
- Interval estimates
- Testing assertions about parameters
- Coins in a barrel
- Evolutionary rates instead of coins
- Choosing among nonnested hypotheses: AIC and BIC
- An example using the AIC criterion
- The problem of multiple topologies
- LRTs and single branches
- Interior branch tests
- Interior branch tests using parsimony
- A multiple-branch counterpart of interior branch tests
- Testing the molecular clock
- Parsimony-based methods
- Distance-based methods
- Likelihood-based methods
- The relative rate test
- Simulation tests based on likelihood
- Further literature
- More exact tests and confidence intervals
- Tests for three species with a clock
- Bremer support
- Zander’s conditional probability of reconstruction
- More generalized confidence sets
- 20. Bootstrap, jackknife, and permutation tests
- The bootstrap and the jackknife
- Bootstrapping and phylogenies
- The delete-half jackknife
- The bootstrap and jackknife for phylogenies
- The multiple-tests problem
- Independence of characters
- Identical distribution: a problem?
- Invariant characters and resampling methods
- Biases in bootstrap and jackknife probabilities
- P values in a simple normal case
- Methods of reducing the bias
- The drug testing analogy
- Alternatives to P values
- Probabilities of trees
- Using tree distances
- Jackknifing species
- Parametric bootstrapping
- Advantages and disadvantages of the parametric bootstrap
- Permutation tests
- Permuting species within characters
- Permuting characters
- Skewness of tree length distribution
- 21. Paired-sites tests
- An example
- Multiple trees
- The SH test
- Other multiple-comparison tests
- Testing other parameters
- Perspective
- 22. Invariants
- Symmetry invariants
- Three-species invariants
- Lake’s linear invariants
- Cavender’s quadratic invariants
- The K invariants
- The L invariants
- Generalization of Cavender’s L invariants
- Drolet and Sankoff’s k-state quadratic invariants
- Clock invariants
- General methods for finding invariants
- Fourier transform methods
- Gröbner bases and other general methods
- Expressions for all the 3ST invariants
- Finding all invariants empirically
- All linear invariants
- Special cases and extensions
- Invariants and evolutionary rates
- Testing invariants
- What use are invariants?
- 23. Brownian motion and gene frequencies
- Brownian motion
- Likelihood for a phylogeny
- What likelihood to compute?
- Assuming a clock
- The REML approach
- Multiple characters and Kronecker products
- Pruning the likelihood
- Maximizing the likelihood
- Inferring ancestral states
- Squared-change parsimony
- Gene frequencies and Brownian motion
- Using approximate Brownian motion
- Distances from gene frequencies
- A more exact likelihood method
- Gene frequency parsimony
- 24. Quantitative characters
- Neutral models of quantitative characters
- Changes due to natural selection
- Selective correlation
- Covariances of multiple characters in multiple lineages
- Selection for an optimum
- Brownian motion and selection
- Correcting for correlations
- Punctuational models
- Inferring phylogenies and correlations
- Chasing a common optimum
- The character-coding “problem”
- Continuous-character parsimony methods
- Manhattan metric parsimony
- Other parsimony methods
- Threshold models
- 25. Comparative methods
- An example with discrete states
- An example with continuous characters
- The contrasts method
- Correlations between characters
- When the tree is not completely known
- Inferring change in a branch
- Sampling error
- The standard regression and other variations
- Generalized least squares
- Phylogenetic autocorrelation
- Transformations of time
- Should we use the phylogeny at all?
- Paired-lineage tests
- Discrete characters
- Ridley’s method
- Concentrated-changes tests
- A paired-lineages test
- Methods using likelihood
- Advantages of the likelihood approach
- Molecular applications
- 26. Coalescent trees
- Kingman’s coalescent
- Bugs in a box—an analogy
- Effect of varying population size
- Migration
- Effect of recombination
- Coalescents and natural selection
- Neuhauser and Krone’s method
- 27. Likelihood calculations on coalescents
- The basic equation
- Using accurate genealogies—a reverie
- Two random sampling methods
- A Metropolis-Hastings method
- Griffiths and Tavaré’s method
- Bayesian methods
- MCMC for a variety of coalescent models
- Single-tree methods
- Slatkin and Maddison’s method
- Fu’s method
- Summary-statistic methods
- Watterson’s method
- Other summary-statistic methods
- Testing for recombination
- 28. Coalescents and species trees
- Methods of inferring the species phylogeny
- Reconciled tree parsimony approaches
- Likelihood
- 29. Alignment, gene families, and genomics
- Alignment
- Why phylogenies are important
- Parsimony method
- Approximations and progressive alignment
- Probabilistic models
- Bishop and Thompson’s method
- The minimum message length method
- The TKF model
- Multibase insertions and deletions
- Tree HMMs
- Trees
- Inferring the alignment
- Gene families
- Reconciled trees
- Reconstructing duplications
- Rooting unrooted trees
- A likelihood analysis
- Comparative genomics
- Tandemly repeated genes
- Inversions
- Inversions in trees
- Inversions, transpositions, and translocations
- Breakpoint and neighbor-coding approximations
- Synteny
- Probabilistic models
- Genome signature methods
- 30. Consensus trees and distances between trees
- Consensus trees
- Strict consensus
- Majority-rule consensus
- Adams consensus tree
- A dismaying result
- Consensus using branch lengths
- Other consensus tree methods
- Consensus subtrees
- Distances between trees
- The symmetric difference
- The quartets distance
- The nearest-neighbor interchange distance
- The path-length-difference metric
- Distances using branch lengths
- Are these distances truly distances?
- Consensus trees and distances
- Trees significantly the same? different?
- What do consensus trees and tree distances tell us?
- The total evidence debate
- A modest proposal
- 31. Biogeography, hosts, and parasites
- Component compatibility
- Brooks parsimony
- Event-based parsimony methods
- Relation to tree reconciliation
- Randomization tests
- Statistical inference
- 32. Phylogenies and paleontology
- Stratigraphic indices
- Stratophenetics
- Stratocladistics
- Controversies
- A not-quite-likelihood method
- Stratolikelihood
- Making a full likelihood method
- More realistic fossilization models
- Fossils within species: Sequential sampling
- Between species
- 33. Tests based on tree shape
- Using the topology only
- Imbalance at the root
- Harding’s probabilities of tree shapes
- Tests from shapes
- Measures of overall asymmetry
- Choosing a powerful test
- Tests using times
- Lineage plots
- Likelihood formulas
- Other likelihood approaches
- Other statistical approaches
- A time transformation
- Characters and key innovations
- Work remaining
- 34. Drawing trees
- Issues in drawing rooted trees
- Placement of interior nodes
- Shapes of lineages
- Unrooted trees
- The equal-angle algorithm
- n-Body algorithms
- The equal-daylight algorithm
- Challenges
- 35. Phylogeny software
- Trees, records, and pointers
- Declaring records
- Traversing the tree
- Unrooted tree data structures
- Tree file formats
- Widely used phylogeny programs and packages
- REFERENCES
- INDEX