Difference between revisions of "Dr. Rasmus Nielsen Laboratory"

From Christoph's Personal Wiki
(Overview of research: Added "BLAST" section)
(Overview of research: "Blastall" link)
Line 12:
  ** [[FASTA format]]
  ** [[Formatdb]]: <tt>formatdb -i databasefile -p F -o -n basename</tt>
- ** [[BLAST|Blastall]]: <tt>blastall -p blastn -d database_to_be_searched -i name_of_input_file -o name_of_output_file</tt>
+ ** [[Blastall]]: <tt>blastall -p blastn -d database_to_be_searched -i name_of_input_file -o name_of_output_file</tt>
  * '''Taxonomy''' (from GenBank files)
  * '''[[Clustal]]''' (align top 50 from BLAST output)

Revision as of 23:24, 23 January 2006

Dr. Rasmus Nielsen Laboratory is where I am currently doing research (December 2005-present). It is located at the Centre for Bioinformatics, Københavns Universitet, Denmark.

Professor Information

  • Rasmus Nielsen (Ole Roemer Fellow)
  • Office: 314
  • Phone: +45 3532 1279
  • E-mail: rasmus@binf.ku.dk

Overview of research

Test runs

  • Nucleotide model: General Time Reversible (GTR) (option: +gamma): lset nst=2 rates=gamma (note that in MrBayes nst=2 is an HKY-type model, whereas GTR corresponds to nst=6; see the note on nst=6 below)
  • (Do not do the following!) Constrain all phylogenetic groups to be monophyletic.
  • Make this an option: strict clock trees (uniform) (prset brlenspr=clock:uniform)
  • 1,000,000 updates (i.e. cycles)
  • Discard the first 50,000 as burn-in
  • Sample a total of 10,000 trees (say)
  • Then process output to get probabilities of each possible phylogenetic assignment of query sequence.
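The settings in the list above could be assembled into a MrBayes block along the following lines. This is only a sketch: the helper name mrbayes_block and its parameters are made up for illustration, samplefreq=100 is an assumption chosen so that 1,000,000 cycles yield 10,000 sampled trees, and the burn-in is converted from cycles to samples because sumt takes its burn-in as a number of samples. No monophyly constraints are written, in line with the warning in the list.

```python
def mrbayes_block(nst=2, ngen=1_000_000, samplefreq=100, burnin_gen=50_000):
    """Build a MrBayes command block as a string (illustrative helper, not part of MrBayes)."""
    # sumt expects the burn-in as a number of samples, not cycles.
    burnin_samples = burnin_gen // samplefreq
    lines = [
        "begin mrbayes;",
        f"  lset nst={nst} rates=gamma;",        # substitution model + gamma rates
        "  prset brlenspr=clock:uniform;",       # strict clock, uniform prior
        f"  mcmc ngen={ngen} samplefreq={samplefreq};",
        f"  sumt burnin={burnin_samples};",      # summarise partitions after burn-in
        "end;",
    ]
    return "\n".join(lines)


block = mrbayes_block()
print(block)
```

Calling mrbayes_block(nst=6, burnin_gen=100_000) would give the variant with nst=6 and sumt burnin=1000.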

In MrBayes, if you run the program with the constraints of monophyletic groups, you are forcing the query sequence not to be part of these groups. So that won't work.

Instead, run it without the constraints and simply check how often the query sequence is a member of a partition that only contains one particular phylogenetic group as members (at any phylogenetic level, e.g. species, genus, family, order, etc).

Remember, MrBayes will output the probabilities of specific groups (or partitions) directly. So you don't have to do anything with trees yourself.

1. Maybe we should use nst=6. Maybe also, in the initial analyses, we should not assume a molecular clock, i.e. we should not use the prset brlenspr=clock:uniform option.

OK, so plateauing around 100,000 means that we should not use the first 100,000 iterations (1000 samples with samplefreq=100). We then want to know how often the query sequence is part of a bipartition which, in addition to the query sequence itself, contains only members of a particular taxonomic group. We want to do that at all taxonomic levels. You use the sumt burnin=1000 command to get output that specifies all the most supported partitions. For each taxonomic assignment in your database data, you then check how many times the query sequence is a member of at least one partition (one of the two sets defined by an edge in the tree) which, except for the query sequence, has only sequences belonging to that taxonomic assignment as its members.
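As a rough sketch of the "at all taxonomic levels" check for a single partition, the function below looks at the query's side of a dot/star partition pattern and reports, for each level, the taxon shared by every database sequence on that side. The function name supported_groups, the lineages table and the example patterns are all made up for illustration; they are not taken from the data sets on this page, and real sumt output would first have to be parsed into such patterns.

```python
# Hypothetical lineages for four database sequences (indices 0-3).
lineages = {
    0: {"family": "Pinaceae", "genus": "Pinus"},
    1: {"family": "Pinaceae", "genus": "Pinus"},
    2: {"family": "Pinaceae", "genus": "Larix"},
    3: {"family": "Taxaceae", "genus": "Taxus"},
}


def supported_groups(pattern, query_index, lineages):
    """Return {level: taxon} for every level at which all database
    sequences on the query's side of the partition share one taxon."""
    mark = pattern[query_index]  # the query may be on the '*' or the '.' side
    side = [i for i, c in enumerate(pattern) if c == mark and i != query_index]
    if not side:
        return {}
    result = {}
    for level in lineages[side[0]]:
        taxa = {lineages[i][level] for i in side}
        if len(taxa) == 1:  # everything on the query's side agrees at this level
            result[level] = taxa.pop()
    return result


# The query is the last column (index 4) in these made-up patterns.
print(supported_groups("**..*", 4, lineages))
print(supported_groups("***.*", 4, lineages))
```

In this toy input, the first pattern groups the query with two Pinus sequences and so supports both the genus and the family, while the second also includes Larix and so supports the family only.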

For example, if you have 8 database sequences and sequences 1, 2, 3 and 5 belong to group 'waggadoodles', and you have the following output:

...*.***.
*.......*
.*......*
******..*
..**....*

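where the last sequence is the query sequence, then the probability of the query sequence belonging to the waggadoodles is 60%, because it formed a unique (monophyletic) group with at least some waggadoodles, and nothing else, in 3 out of 5 cases (cases 1, 2 and 3). Note that in case 1 the query sequence is on the '.' side of the partition, together with sequences 1, 2, 3 and 5.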
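The waggadoodle example can be checked with a few lines of Python. This is a minimal sketch that treats each of the five patterns as one equally weighted case, exactly as the example does; the function names are hypothetical, and positions are 0-based, so sequences 1, 2, 3 and 5 become indices 0, 1, 2 and 4 and the query is index 8.

```python
def query_side(pattern, query_index):
    """Indices of the database sequences on the same side of the
    partition as the query (the query itself is excluded)."""
    mark = pattern[query_index]
    return {i for i, c in enumerate(pattern) if c == mark and i != query_index}


def group_support(patterns, group, query_index):
    """Fraction of patterns in which the query's side contains
    only members of `group` (and at least one of them)."""
    hits = 0
    for pattern in patterns:
        side = query_side(pattern, query_index)
        if side and side <= group:  # non-empty subset of the group
            hits += 1
    return hits / len(patterns)


patterns = [
    "...*.***.",
    "*.......*",
    ".*......*",
    "******..*",
    "..**....*",
]
waggadoodles = {0, 1, 2, 4}  # sequences 1, 2, 3 and 5
support = group_support(patterns, waggadoodles, query_index=8)
print(support)  # 0.6
```

Cases 4 and 5 fail because the query's side also contains sequence 4 (and sequence 6 in case 4), neither of which is a waggadoodle.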

Data

Initial data (trnL; should be same as below)

Received: 8-Dec-2005
From: Rasmus
Description: Here are some sample sequences. Use the first sequence in each alignment. The sequences are from a gene called 'trnL'.
Contains:

  • TB458_478_8C.fst
  • TB458_478_9C.fst
  • TB554_19_1C.fst
  • TB554_3C.fst
  • TB554_4C.fst
  • TBA487_2C.fst
  • TBB487_1C.fst
  • TBC496_2C.fst
  • TBD487_2C.fst
  • TBE487_1C.fst

trnL

Received: 8-Jan-2006
From: Rasmus
Description: Here are some sequences from a gene called trnL. For these and all the other sequences, we want the database sequences to be considerably longer, i.e. maybe 500 bp or 1000 bp to the 3' and 5' of our sequences, or however much there typically is in the relevant database sequences.
Contains:

  • Mike_1_4C (n/a)
    • >Mike_1_4C
    • >Araucaria_araucana* AY145322
    • >Pinus_uliginosa AF543755
    • >Pinus_mugo AF543754
    • >Pinus_sylvestris AF543753
    • >Pinus_strobus AF479874
    • >Pseudotsuga_sinensis AF440504
    • >Larix_occidentalis AF440502
    • >Larix_potaninii_potaninii AF440495
  • Mike_2_Betulaceae (n/a)
    • >Mike_2_Betulaceae
    • >Betula_pendula AF327578
    • >Alnus_japonica AY211427
    • >Ticodendron_incognitum AY147073
    • >Corylus_avellana AY147072
    • >Ostryopsis_davidiana AY147071
    • >Alnus_incana AF327574
    • >Rhoiptelea_chiliantha AY147081
    • >Ostrya_rehderiana AY211424
    • >Lophostemon_confertus AF190385
    • >Allosyncarpia_ternata AF190387
  • Mike_3 (n/a)
    • >Mike_3
    • >Gaylussacia_brasiliensis AF271713
    • >Vaccinium_altomontanum AF271698
    • >Gaylussacia_ursina AF271707
    • >Gaylussacia_dumosa AF271708
    • >Menziesia_pilosa AF452224
    • >Menziesia_ciliicalyx AF452223
    • >Rhododendron_nipponicum AF452215
    • >Bejaria_aestuans AF394264
    • >Ledum_palustre AF394252
  • Mike_4_2C (n/a)
    • >Mike_4_2C
    • >Taxus_yunnanensis AF501590
    • >Taxus_baccata AY013744
    • >Taxus_cuspidata AY013721
    • >Taxus_wallichiana_chinensis AY013720
    • >Taxus_chinensis_mairei AF501588
    • >Taxus_canadensis AF506837
  • TB458_478_8C_Poaceae (1sub)
    • >TB458_478_8C_Poaceae_1sub
    • >Poa_pratensis AY061952
    • >Poa_annua AY589123
    • >Poa_compressa AY504649
    • >Arctagrostis_latifolia AY237904
    • >Phalaris_arundinacea AY589138
    • >Elymus_repens AY362791
    • >Poa_alpina_alpina Y18511
    • >Arctophila_fulva AY237903
    • >Dupontia_fisheri AY237899
    • >Festuca_paniculata AF543515
  • TB458_478_9C_Rhamnaceae (4sub)
    • >TB458_478_9C_Rhamnaceae_4sub
    • >Paliurus_spina_christi AJ390354
    • >Emmenosperma_alphitonioides AJ390351
    • >Phylica_pubescens Y16771
    • >Pleuranthodes_hillebrandii AJ390348
    • >Trevoa_quinquenervia AY460427
    • >Retanilla_trinervia AY460426
    • >Glycine_clandestina AF518127
    • >Ziziphus_oenoplia AB235097
    • >Crumenaria_erecta AJ390346
    • >Colubrina_asiatica AJ390350
    • >Retanilla_stricta AY460425
  • TB487_12_1C_Poaceae (0sub)
    • >TB487_12_1C_Poaceae_0sub
    • >Poa_pratensis AY061951
    • >Poa_annua AY589123
    • >Poa_compressa AY504649
    • >Arctagrostis_latifolia AY237904
    • >Phalaris_arundinacea AY589138
    • >Elymus_repens AY362791
    • >Poa_alpina_alpina Y18511
    • >Arctophila_fulva AY237903
    • >Dupontia_fisheri AY237899
    • >Festuca_paniculata AF543515
  • TB487_14_2C_Veroniceae (0sub)
    • >TB487_14_2C_Veroniceae_0sub
    • >Veronica_sublobata AF486366
    • >Veronica_sibthorpioides AY540876
    • >Veronica_triloba AF513333
    • >Veronica_stewartii AY540875
    • >Veronicastrum_stenostachyum AF513354
    • >Wulfenia_orientalis AF486410
    • >Wulfenia_carinthiaca AF486409
    • >Veronica_panormitana AY776284
    • >Nemesia_aff_denticulata DQ222330
    • >Nemesia_foetens DQ222328
  • TB487_18_2C_Polygonaceae (0sub)
    • >TB487_18_2C_Polygonaceae_osub
    • >Rheum_wittrockii AY566464
    • >Rheum_tanguticum AY566457
    • >Rheum_lhasaense AY566463
    • >Rheum_globulosum AY566449
    • >Rheum_nobile AY566465
    • >Rheum_reticulatum AY566462
    • >Rheum_nanum AY566444
    • >Rumex_hastatulus AJ698484
    • >Rumex_tuberosus AJ698483
    • >Oxyria_digyna AY566466
    • >Rumex_cristatus AJ704864
    • >Rumex_japonicus AJ810936
  • TB487_23_1C_Betulaceae (0sub)
    • >TB487_23_1C_Betulaceae_0sub
    • >Corylus_avellana AY147072
    • >Ostryopsis_davidiana AY147071
    • >Carpinus_betulus AY147070
    • >Ticodendron_incognitum AY147073
    • >Alnus_japonica AY211427
    • >Alnus_incana AF327574
    • >Alnus_sinuata AY147067
    • >Ostrya_rehderiana AY211424
    • >Rhoiptelea_chiliantha AY147081
    • >Engelhardia_fenzelii AY147076
    • >Eucalyptus_viminalis AF306336
  • TB496_8_2C_Saliceae (0sub)
    • >TB496_8_2C_Saliceae_0sub
    • >Salix_sericea AY756930
    • >Salix_babylonica AY756929
    • >Chosenia_arbutifolia AY756928
    • >Scolopia_mundii AY756966
    • >Trimeria_grandifolia AY756920
    • >Xylosma_vincentii AY756973
    • >Prockia_pentamera AY756963
    • >Xylosma_panamensis AY756971
    • >Populus_nigra AF327591
    • >Zuelania_guidonia AY756911
  • TB554_19_1C_Rutaceae (0sub)
    • >TB554_19_1C_Rutaceae_0sub
    • >Hesperethusa_crenulata AY295298
    • >Triphasia_trifolia AY295297
    • >Afraegle_paniculata AY295295
    • >Eremocitrus_glauca AY295293 <-- [1], [2]
    • >Severinia_buxifolia AY295290
    • >Microcitrus_garrowayi AY295287
    • >Clausena_excavata AY295284
    • >Poncirus_trifoliata AY295282
    • >Balsamocitrus_dawei AY295278
    • >Citrus_paradisi AY295277
  • TB554_22_4C_Brassicaceae (0sub)
    • >TB554_22_4C_Brassicaceae_0sub
    • >Capsella_bursa_pastoris AY236218
    • >Erysimum_cheiri DQ180259
    • >Alyssum_klimesii DQ180242
    • >Crucihimalaya_himalaica DQ180227
    • >Capsella_rubella DQ180225
    • >Arabidopsis_halleri DQ180220
    • >Arabis_turrita DQ180217
    • >Arabidopsis_arenosa DQ191385
    • >Conringia_perfoliata AY722610
    • >Lepidium_trifurcum AY015900
  • TB554_6_5C_Saliceae (0sub)
    • >TB554_6_5C_Saliceae_0sub
    • >Salix_sericea AY756930
    • >Chosenia_arbutifolia AY756928
    • >Salix_caprea AF327597
    • >Scolopia_mundii AY756966
    • >Pseudoscolopia_polyantha AY756964
    • >Oncoba_spinosa AY756958
    • >Ludia_sp AY756952
    • >Trimeria_grandifolia AY756920
    • >Xylosma_vincentii AY756973
    • >Populus_tremuloides AY756927

COI Insect

Received: 8-Jan-2006
From: Rasmus
Description: Here are sequences from COI from insects.
Contains:

  • insect1.fasta
    • >TB3_11C_Coleoptera_Scarabaeidae
  • insect2.fasta
    • >TB2_6C_Coleoptera_Carabidae
  • insect3.fasta
    • >Mike_Diptera_Calliphoridae

rbcL

Received: 8-Jan-2006
From: Rasmus
Description: And here are some more rbcL sequences...
Contains:

  • Asteraceae.fasta
    • >EreAACT...
    • >CGGA...
    • >PluchCTCC...
    • >Flaveria_bidenGGTC...
    • >Buddleja_asiatGGTC...
    • >Leontodon_hispiduCCATA...
  • Betulaceae1.fasta
  • Betulaceae2.fasta
  • fabaceae.fasta

External links

Taxonomy

ClustalW

MrBayes

Misc

References