Dot-star file format

From Christoph's Personal Wiki
Revision as of 17:37, 3 January 2006 by Christoph (Started article)

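The dot-star file format (or ".* format") is used in phylogenetics to represent the partitions of a set of taxa induced by the branches of a tree, i.e. the clades that each branch separates. Each line describes one partition, marking every taxon with one of two symbols: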

dots (".")
    for the taxa on one side of the partition
stars ("*")
    for the taxa on the other side of the partition
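
As a minimal sketch, a single pattern can be read into its two sides in Python. The function names below are hypothetical, and taxa are assumed to be numbered from 1 by column position. Since a partition has no preferred side, swapping every dot for a star describes the same split; `canonical_split` captures this by returning an unordered pair of sides.

```python
def split_from_pattern(pattern: str) -> tuple[frozenset[int], frozenset[int]]:
    """Return (dot_side, star_side) as sets of 1-based taxon indices."""
    dots = frozenset(i for i, c in enumerate(pattern, start=1) if c == ".")
    stars = frozenset(i for i, c in enumerate(pattern, start=1) if c == "*")
    # Every column must be either a dot or a star.
    if len(dots) + len(stars) != len(pattern):
        raise ValueError(f"unexpected character in pattern {pattern!r}")
    return dots, stars


def canonical_split(pattern: str) -> frozenset[frozenset[int]]:
    """Order-independent form: swapping '.' and '*' gives the same split."""
    dots, stars = split_from_pattern(pattern)
    return frozenset({dots, stars})


# Hypothetical six-taxon pattern and its dot/star-swapped twin.
pattern = "..**.*"
flipped = "".join("*" if c == "." else "." for c in pattern)
print(split_from_pattern(pattern))
print(canonical_split(pattern) == canonical_split(flipped))  # True
```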

Example dot-star file:

***************************...****************** 408
***************************...*.**************** 446
.................**....**.****.***.............. 197
***********************..*******.*************** 185
*****************..*******.******.************** 138
.......................**..***.**............... 72
111111111111111111111112211333132111111111111111

The columns of stars and dots represent the sequences in the dataset, numbered 1 to n from left to right. Each row represents the separation of the sequences into two groups (clades): the stars and the dots. The number at the right end of each line is the number of resampled trees (for example, bootstrap replicates), out of the total number of resamplings, in which the branch separating the star clade from the dot clade occurs. This quantifies the support for any predicted branch.
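
The following Python sketch parses lines of the form `pattern count`, like the example above, into records. The names `SplitRecord`, `parse_dotstar` and `support` are hypothetical, and lines that do not match that form (such as the final row of digits) are simply skipped rather than interpreted. The total number of resamplings is passed in by the caller as `n_resamplings`, because the example does not record it.

```python
from typing import NamedTuple


class SplitRecord(NamedTuple):
    dots: frozenset[int]   # 1-based columns marked "."
    stars: frozenset[int]  # 1-based columns marked "*"
    count: int             # resampled trees containing this branch


def parse_dotstar(text: str) -> list[SplitRecord]:
    records = []
    for line in text.splitlines():
        parts = line.split()
        # Keep only "pattern count" lines; anything else is skipped.
        if (len(parts) != 2 or not parts[1].isdecimal()
                or not set(parts[0]) <= {".", "*"}):
            continue
        pattern, count = parts
        dots = frozenset(i for i, c in enumerate(pattern, 1) if c == ".")
        stars = frozenset(i for i, c in enumerate(pattern, 1) if c == "*")
        records.append(SplitRecord(dots, stars, int(count)))
    return records


def support(record: SplitRecord, n_resamplings: int) -> float:
    """Fraction of resampled trees in which this branch occurs."""
    return record.count / n_resamplings


example = """\
***************************...****************** 408
***************************...*.**************** 446
.................**....**.****.***.............. 197
***********************..*******.*************** 185
*****************..*******.******.************** 138
.......................**..***.**............... 72
111111111111111111111112211333132111111111111111
"""

# Print each branch's count together with the smaller side of its split.
for rec in parse_dotstar(example):
    smaller = min(rec.dots, rec.stars, key=len)
    print(rec.count, sorted(smaller))
```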

Another example

The format can also be used to assign a query sequence to a taxonomic group. For each taxonomic assignment in your database, count how many times the query sequence is a member of at least one partition (one of the two sets defined by an edge in the tree) whose members, apart from the query sequence itself, all belong to that taxonomic assignment.

For example, suppose you have 8 database sequences, of which sequences 1, 2, 3 and 5 belong to the group 'waggadoodles', and you get the following output:

1: ...*.***.
2: *.......*
3: .*......*
4: ******..*
5: ..**....*

where the last sequence (column) is the query sequence. The estimated probability that the query sequence belongs to the waggadoodles is then 60%, because in 3 out of 5 cases (cases 1, 2 and 3) it formed a unique (monophyletic) group with at least one waggadoodle and no sequences from other groups.
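
A minimal Python sketch of this check follows, assuming 1-based column indices with the query in the last column; the function names are hypothetical. Each case is modelled as a list of patterns so that the "at least one partition" rule can be applied when a case contributes several splits; in the worked example each case has a single line. Following "at least some waggadoodles", a side that holds only the query is not counted.

```python
def side_is_pure(pattern: str, group: set[int]) -> bool:
    """True if the side of the split holding the query (last column)
    contains at least one database sequence, all of them from `group`."""
    query_symbol = pattern[-1]
    # Database sequences on the same side of the partition as the query.
    same_side = {i for i, c in enumerate(pattern[:-1], start=1)
                 if c == query_symbol}
    return bool(same_side) and same_side <= group


def assignment_probability(cases: list[list[str]], group: set[int]) -> float:
    """Fraction of cases in which at least one split qualifies."""
    hits = sum(any(side_is_pure(p, group) for p in splits) for splits in cases)
    return hits / len(cases)


waggadoodles = {1, 2, 3, 5}
cases = [["...*.***."], ["*.......*"], [".*......*"],
         ["******..*"], ["..**....*"]]
print(assignment_probability(cases, waggadoodles))  # 0.6
```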