The '''[http://www.binf.ku.dk/~rasmus/webpage/ras.html Dr. Rasmus Nielsen Laboratory]''' is where I did research from December 2005 to February 2006. It is located at the ''[http://www.binf.ku.dk/ Centre for Bioinformatics]'', [http://www.ku.dk/ København Universitet], Denmark.

==Professor Information==
* '''Rasmus Nielsen''' (Ole Roemer Fellow)
* Office: 314
* Phone: +45 3532 1279
* E-mail: rasmus@binf.ku.dk

==Research topics==
* [[:Category:Phylogenetics|Phylogenetics]]
* [[MrBayes]]
* [[Clustal]]
* [[BLAST]]

==Overview of research==

* '''Taxonomy''' (from GenBank files)
* '''[[Clustal]]''' (align the top 50 hits; ignore the ''E''-value; remove redundancy)
** [[ClustalW]]: <tt>clustalw all.fsa</tt>
** [[ClustalX]]: <tt>clustalx all.aln</tt>
** [[NJplot]]: <tt>njplot all.ph</tt>
* '''[[MrBayes]]'''
** [[Bayesian inference]]
** [[Phylogenetics]]
** [[Bayesian Phylogenetic Analysis]]
** [[Markov chain Monte Carlo]]
** [[NEXUS file format]]
* '''Find the probability of each group'''
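The taxonomy step reads each database sequence's lineage out of its GenBank record. The following is a minimal stdlib-only sketch, assuming plain GenBank flat-file text with the usual 12-character indentation on continuation lines; the function name `genbank_taxonomy` is hypothetical. If Biopython is available, its GenBank parser exposes the same lineage list as `record.annotations["taxonomy"]`.

```python
def genbank_taxonomy(record_text: str) -> tuple[str, list[str]]:
    """Return (organism, lineage) from the text of one GenBank record.

    Hypothetical helper. Assumes the standard flat-file layout: an
    '  ORGANISM' line followed by lineage lines indented by 12 spaces,
    with taxa separated by ';' and the last one terminated by '.'.
    """
    lines = record_text.splitlines()
    for i, line in enumerate(lines):
        if not line.startswith("  ORGANISM"):
            continue
        organism = line[len("  ORGANISM"):].strip()
        lineage_lines = []
        # The lineage continues on the following lines indented by 12 spaces.
        for cont in lines[i + 1:]:
            if not cont.startswith(" " * 12):
                break
            lineage_lines.append(cont.strip())
        lineage_text = " ".join(lineage_lines).rstrip(".")
        lineage = [taxon.strip() for taxon in lineage_text.split(";") if taxon.strip()]
        return organism, lineage
    raise ValueError("record has no ORGANISM line")
```

GenBank lineages are not labelled with ranks, so deciding which entry is the genus, family or order still needs a separate lookup before the entries can be used as candidate groups.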
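Before running ClustalW, the hits are trimmed to the top 50 and made non-redundant. This is a minimal sketch, assuming the hits are already in a FASTA file in ranked order and that "redundant" means an identical sequence (near-identical sequences would need a clustering tool instead). It also assumes duplicates are skipped while walking down the ranking, so the result holds the first 50 distinct sequences. The helper names are hypothetical; only the output file name `all.fsa` comes from the commands above.

```python
def read_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs, in file order."""
    records: list[tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def top_nonredundant(records: list[tuple[str, str]], n: int = 50) -> list[tuple[str, str]]:
    """Keep the first n distinct sequences in hit order (E-values are ignored).

    Assumption: 'redundant' means an identical sequence, compared case-insensitively.
    """
    seen: set[str] = set()
    kept: list[tuple[str, str]] = []
    for header, seq in records:
        key = seq.upper()
        if key in seen:
            continue  # drop an exact duplicate of a better-ranked hit
        seen.add(key)
        kept.append((header, seq))
        if len(kept) == n:
            break
    return kept


def write_fasta(records: list[tuple[str, str]], width: int = 60) -> str:
    """Format (header, sequence) pairs as FASTA text with wrapped lines."""
    lines: list[str] = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

For example, with a hypothetical input file `hits.fsa`, `Path("all.fsa").write_text(write_fasta(top_nonredundant(read_fasta(Path("hits.fsa").read_text()))))` (using `pathlib.Path`) would produce the input for <tt>clustalw all.fsa</tt>.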

==Test runs==

* Nucleotide model: <tt>lset nst=2 rates=gamma</tt>, i.e. two substitution types (an HKY-type model) with gamma-distributed rate variation. The '''General Time Reversible''' ('''GTR''') model would be <tt>nst=6</tt>.
* ('''Do not do the following!''') Constrain all phylogenetic groups to be monophyletic.
* Use strict-clock trees with the uniform clock prior: <tt>prset brlenspr=clock:uniform</tt>

* 1,000,000 updates (i.e. MCMC generations)
* Discard the first 50,000 as burn-in.
* Sample a total of 10,000 trees (say).

* Then process the output to get the probability of each possible phylogenetic assignment of the query sequence.
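The test-run settings can be collected into one MrBayes command block. This minimal sketch only builds the block as text. `mrbayes_block` is a hypothetical helper, and `samplefreq=100` is an assumption chosen so that 1,000,000 generations give the 10,000 sampled trees mentioned above. The sketch also converts the burn-in from generations into samples, which is what <tt>sumt burnin=</tt> counts in MrBayes 3.1. In MrBayes 3.2 an absolute burn-in may additionally need <tt>relburnin=no</tt>.

```python
def mrbayes_block(ngen: int = 1_000_000, samplefreq: int = 100,
                  burnin_generations: int = 50_000, nst: int = 2,
                  gamma: bool = True, strict_clock: bool = True) -> str:
    """Return a 'begin mrbayes; ... end;' block for the test-run settings.

    nst=2 reproduces the test runs as written; nst=6 is the GTR model.
    """
    if ngen % samplefreq or burnin_generations % samplefreq:
        raise ValueError("ngen and the burn-in should be multiples of samplefreq")
    burnin_samples = burnin_generations // samplefreq  # sumt counts samples, not generations
    lset = f"lset nst={nst}" + (" rates=gamma" if gamma else "") + ";"
    commands = ["set autoclose=yes nowarn=yes;", lset]
    if strict_clock:
        commands.append("prset brlenspr=clock:uniform;")
    commands.append(f"mcmc ngen={ngen} samplefreq={samplefreq};")
    commands.append(f"sumt burnin={burnin_samples};")
    return "\n".join(["begin mrbayes;"] + ["  " + c for c in commands] + ["end;"])
```

With the defaults this yields <tt>mcmc ngen=1000000 samplefreq=100;</tt> and <tt>sumt burnin=500;</tt>. Calling `mrbayes_block(burnin_generations=100_000, nst=6, strict_clock=False)` reflects the revised plan in the notes (GTR, no clock, a burn-in of 1,000 samples).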

In MrBayes, if you run the program with the monophyly constraints, you force the query sequence '''not''' to be part of these groups: each constrained clade is defined by the database sequences alone, so the query cannot fall inside it. That approach therefore won't work.

Instead, run it without the constraints and simply check how often the query sequence is a member of a partition whose other members all belong to one particular phylogenetic group (at any phylogenetic level, e.g. species, genus, family or order).

Remember, MrBayes outputs the probabilities of specific groups (or partitions) directly, so you don't have to do anything with the trees yourself.

<div style="padding: 1em; margin: 10px; border: 2px dotted blue;">
<small><font color=blue>1</font></small>Maybe we should use <tt>nst=6</tt> (GTR). Maybe in the initial analyses we should also not assume a molecular clock, i.e. we should not use the <tt>prset brlenspr=clock:uniform</tt> option.

OK, so plateauing around 100,000 means that we should discard the first 100,000 iterations (1,000 samples with <tt>samplefreq=100</tt>). We then want to know how often the query sequence is part of a bipartition that, apart from the query sequence itself, contains only members of a particular taxonomic group. We want to do that at all taxonomic levels. Use the <tt>sumt burnin=1000</tt> command to get output that lists the best-supported partitions. For each taxonomic assignment in your database, then check how many times the query sequence is a member of at least one partition (one of the two sets defined by an edge in the tree) that, except for the query sequence, contains only sequences belonging to that taxonomic assignment.

For example, suppose you have 8 database sequences, sequences 1, 2, 3 and 5 belong to the group 'waggadoodles', and you get the following output:
<pre>
...*.***.
*.......*
.*......*
******..*
..**....*
</pre>

where the last column is the query sequence. Then the probability of the query sequence belonging to the waggadoodles is 60%, because it formed a unique (monophyletic) group with at least some waggadoodles, and no other sequences, in 3 out of 5 cases (cases 1, 2 and 3). In case 1 the query is on the '.' side together with sequences 1, 2, 3 and 5; in cases 4 and 5 its side also contains sequence 4, which is not a waggadoodle.
</div>
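The counting rule in the notes can be written down directly. This is a minimal sketch, assuming each bipartition is a '.'/'*' string like the <tt>sumt</tt> output above (one character per sequence, query last), that sequences are counted from 0, and that a "case" is a list of bipartitions, for example all splits of one sampled tree. The function names and the `taxonomy` input format are hypothetical. A case counts for a group if at least one of its bipartitions puts the query on a side whose other members are all, and at least one, members of that group.

```python
from collections.abc import Sequence


def query_side(partition: str, query: int) -> set[int]:
    """Indices on the same side of the split as the query.

    '.' and '*' mark the two sides of the split, so either symbol may hold the query.
    """
    mark = partition[query]
    return {i for i, c in enumerate(partition) if c == mark}


def supports_group(partition: str, members: set[int], query: int) -> bool:
    """True if the query's side holds at least one other sequence and,
    apart from the query, only members of the group."""
    others = query_side(partition, query) - {query}
    return bool(others) and others <= members


def group_probability(cases: Sequence[Sequence[str]], members: set[int], query: int) -> float:
    """Fraction of cases in which at least one bipartition supports the group."""
    hits = sum(any(supports_group(p, members, query) for p in case) for case in cases)
    return hits / len(cases)


def probabilities_by_level(cases: Sequence[Sequence[str]],
                           taxonomy: dict[int, dict[str, str]],
                           query: int) -> dict[str, dict[str, float]]:
    """Run group_probability for every taxon at every taxonomic level.

    taxonomy maps a database-sequence index to {level: taxon name}
    (hypothetical input format, e.g. {0: {"genus": "waggadoodles"}}).
    """
    groups: dict[str, dict[str, set[int]]] = {}
    for index, ranks in taxonomy.items():
        for level, name in ranks.items():
            groups.setdefault(level, {}).setdefault(name, set()).add(index)
    return {
        level: {name: group_probability(cases, members, query)
                for name, members in taxa.items()}
        for level, taxa in groups.items()
    }


if __name__ == "__main__":
    # Worked example from the notes: each output line is treated as one case.
    output = ["...*.***.", "*.......*", ".*......*", "******..*", "..**....*"]
    waggadoodles = {0, 1, 2, 4}  # sequences 1, 2, 3 and 5, counted from 0
    print(group_probability([[line] for line in output], waggadoodles, query=8))  # 0.6
```

Because each level is evaluated separately, the same cases give one probability per taxon at the species, genus, family and order levels.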

==External links==
===Taxonomy===
* [http://plants.usda.gov/ PLANTS Database], by the USDA
===References===
* [http://workshop.molecularevolution.org/resources/references/ Molecular Evolution papers]
===Misc===
* [[Dr. Rasmus Nielsen Laboratory/Notes]]

[[Category:Academic Research]]