Bootstrapping
Bootstrapping, when applied to phylogenetics, tests whether the dataset as a whole supports your phylogenetic tree, or whether the tree is just a marginal winner among many nearly equal alternatives.
"[Bootstrapping is accomplished] by taking random subsamples of the dataset, building trees from each of these and calculating the frequency with which the various parts of your tree are reproduced in each of these random subsamples. If group X is found in every subsample tree, then its bootstrap support is 100%; if it is found in only two-thirds of the subsample trees, its bootstrap support is 67%. Each of the subsamples is the same size as the original, which is accomplished by allowing repeat sampling of sites; that is, random sampling with replacement. It is a simple test, but bootstrap analyses of known phylogenies (viral populations evolved in the laboratory) show that is is a generally dependable measure of phylogenetic accuracy, and that values of 70% or higher are likely to indicate reliable groupings." — by Sandra L. Baldauf (2003).
Statistics
In statistics, bootstrapping is a method for estimating the sampling distribution of an estimator by resampling with replacement from the original sample. It is distinct from the jackknife procedure, which recomputes the estimator on leave-one-out subsets of the sample (typically to estimate bias and variance, and sometimes to flag influential observations), and from cross-validation, whose purpose is to check that results hold up on data withheld from fitting. More elaborate bootstrap variants exist for sampling without replacement, two-sample problems, regression, time series, hierarchical sampling, and other statistical problems.
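As a minimal sketch of the basic (one-sample, with-replacement) bootstrap described above, the following Python estimates the sampling distribution of the mean, then derives a standard error and a simple percentile interval from it. The data values, function names, and the choice of 2000 replicates are illustrative assumptions, not recommendations from the references.

```python
import random
import statistics


def bootstrap_distribution(sample, estimator, n_replicates=2000, seed=0):
    """Apply the estimator to n_replicates resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    return [estimator(rng.choices(sample, k=n)) for _ in range(n_replicates)]


def percentile_interval(values, level=0.95):
    """Simple percentile interval taken from the sorted bootstrap estimates."""
    ordered = sorted(values)
    lo_idx = int((1 - level) / 2 * len(ordered))
    hi_idx = int((1 + level) / 2 * len(ordered)) - 1
    return ordered[lo_idx], ordered[hi_idx]


# Made-up measurements, for illustration only.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 3.6]

boot_means = bootstrap_distribution(sample, statistics.mean)
std_error = statistics.stdev(boot_means)      # bootstrap standard error of the mean
low, high = percentile_interval(boot_means)   # approximate 95% interval

print(f"mean={statistics.mean(sample):.2f} se={std_error:.2f} interval=({low:.2f}, {high:.2f})")
```

Any function of the sample (median, correlation, and so on) can be passed as the estimator. The percentile interval is the simplest of several bootstrap interval constructions and was chosen here only for brevity.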
See also particle filter for the general theory of Sequential Monte Carlo methods, as well as details on some common implementations.
Conventions
Bootstrap values should be displayed as percentages, not raw values. This makes the tree easier to read and to compare with other trees (Baldauf, 2003).
By convention, only bootstrap values of 50% or higher are reported; lower values mean that the node in question was found in fewer than half of the bootstrap replicates (Baldauf, 2003).
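The two conventions above can be applied mechanically when labelling a tree. The sketch below assumes you already have, for each group, a raw count of the replicates in which it appeared. The function name, the string labels for groups, and the counts are hypothetical.

```python
def format_support_labels(clade_counts, n_replicates, threshold=50.0):
    """Convert raw replicate counts to percentage labels, dropping values below the threshold."""
    labels = {}
    for clade, count in clade_counts.items():
        percent = 100.0 * count / n_replicates
        if percent >= threshold:
            labels[clade] = f"{round(percent)}%"
    return labels


# Hypothetical counts out of 1000 bootstrap replicates.
raw_counts = {"(A,B)": 1000, "(C,D)": 667, "(A,B,C)": 412}
labels = format_support_labels(raw_counts, n_replicates=1000)
print(labels)  # {'(A,B)': '100%', '(C,D)': '67%'}
```

The group found in only 412 of 1000 replicates (41%) is left unlabelled, matching the 50% reporting convention.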
References
Phylogenetics
- Baldauf SL (2003). Phylogeny for the faint of heart: a tutorial. Trends in Genetics 19(6):345-351.
- Hillis DM and Bull JJ (1993). An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analyses. Syst Biol 42:182-192.
- Felsenstein J (1985). Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783-791.
Statistics
- Hesterberg TC, Moore DS, Monaghan S, Clipson A, and Epstein R (2005). Bootstrap Methods and Permutation Tests (software).
- Moore DS, McCabe G, Duckworth W, and Sclove S (2003). Bootstrap Methods and Permutation Tests.
- Simon JL (1997). Resampling: The New Statistics.