NEXUS file format

From Christoph's Personal Wiki

NEXUS is a file format used by many popular programs, including GDA, PAUP*, Mesquite, ModelTest, MrBayes, and MacClade. NEXUS file names often have a .nxs or .nex extension.

The NEXUS format conveys data organized according to the character state data model, in which the features of operational taxonomic units (OTUs), such as species, individuals, genes, or genomes, are observable states of underlying homologous characters. For instance, in a protein sequence alignment, the proteins are the OTUs, the alignment columns are characters, and the amino acids (or gaps) are states. In evolutionary analysis, differences are typically treated as the result of state transitions that take place on the branches of a tree, so a NEXUS file also provides a means to represent trees, in the standard Newick (a.k.a. New Hampshire) format.
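
As a rough illustration of the character state model, the following minimal Python sketch stores a toy protein alignment (the four rows of the Basic example further down) as a mapping from OTU to states, and reads off one character (column). The names alignment and character_states are hypothetical and chosen only for this illustration.

```python
# Toy alignment: each OTU (row) maps to its states, one per character (column).
alignment = {
    "A": "MA-LL",
    "B": "MA-LE",
    "C": "MEATY",
    "D": "ME-TE",
}


def character_states(matrix, index):
    """Return the state each OTU shows for the character at a 0-based column."""
    return {otu: states[index] for otu, states in matrix.items()}


# The second character (index 1) separates A and B from C and D.
print(character_states(alignment, 1))
```

Here the gap symbol "-" is simply treated as one more state, as in the description above.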

Syntactic structure

The syntactic structure of a NEXUS file is as follows:

#NEXUS
begin < blockname >;
    < command > < argument > [additional argument];
    [ < another command with args >; ]
end;
[ < another block with commands > ]
  • The syntax for the TREES block is
BEGIN TREES;
    [Translate arbitrary-token-used-in-tree-description valid-taxon-name
        [, arbitrary-token-used-in-tree-description valid-taxon-name ...];]
    [Tree [*] tree-name=tree-specification;]
END;
  • Example syntax for a TREES block in a NEXUS file
BEGIN TAXA;
    TaxLabels Scarabaeus Drosophila Aranaeus;
END;

BEGIN TREES;
    Translate beetle Scarabaeus, fly Drosophila, spider Aranaeus;
    Tree tree1 = ((1,2),3);
    Tree tree2 = ((beetle,fly),spider);
    Tree tree3 = ((Scarabaeus,Drosophila),Aranaeus);
END;
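
The three trees in the example above describe the same topology: tree1 refers to taxa by number, tree2 by Translate token, and tree3 by taxon name. A minimal Python sketch of how a reader might resolve all three to full taxon names is given below. The function name resolve_labels is hypothetical, and the sketch assumes plain labels only (no quoted names, no branch lengths).

```python
import re


def resolve_labels(newick, taxlabels, translate):
    """Replace each leaf label in a Newick string with its full taxon name.

    A label may be a Translate token, a 1-based taxon number, or already
    a taxon name.
    """
    def lookup(match):
        token = match.group(0)
        if token in translate:
            return translate[token]
        if token.isdigit():
            return taxlabels[int(token) - 1]
        return token

    # Labels are assumed to be plain words or integers; quoted names and
    # branch lengths are not handled by this sketch.
    return re.sub(r"[A-Za-z0-9_]+", lookup, newick)


taxlabels = ["Scarabaeus", "Drosophila", "Aranaeus"]
translate = {"beetle": "Scarabaeus", "fly": "Drosophila", "spider": "Aranaeus"}

for tree in ("((1,2),3)", "((beetle,fly),spider)",
             "((Scarabaeus,Drosophila),Aranaeus)"):
    print(resolve_labels(tree, taxlabels, translate))
```

All three calls should print the same fully named tree.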

Each of the pre-defined types of public blocks may appear only once. The TAXA block is the only required block. There are some restrictions on the ordering of blocks and on the ordering of commands within a block. Application-specific "private" blocks are also possible. NEXUS keywords are not case-sensitive; block names are shown in upper case here only as a mnemonic aid.
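
The block-and-command structure described above can be sketched in a few lines of Python. This is a simplified illustration, not a complete parser: the function name parse_blocks is hypothetical, bracketed comments are stripped naively, and quoted tokens containing a semicolon are not handled. Keywords are compared in upper case because NEXUS keywords are not case-sensitive.

```python
import re


def parse_blocks(text):
    """Split NEXUS text into {BLOCKNAME: [command strings]}.

    Simplifications: [comments] are removed naively and quoted tokens
    that contain ';' are not supported.
    """
    text = re.sub(r"\[[^\]]*\]", "", text).lstrip()   # drop [comments]
    if text[:6].upper() != "#NEXUS":
        raise ValueError("missing #NEXUS header")
    blocks, current = {}, None
    for command in text[6:].split(";"):               # commands end with ';'
        words = command.split()
        if not words:
            continue
        keyword = words[0].upper()
        if keyword == "BEGIN":
            current = blocks.setdefault(words[1].upper(), [])
        elif keyword in ("END", "ENDBLOCK"):
            current = None
        elif current is not None:
            current.append(" ".join(words))
    return blocks


sample = """#NEXUS
begin taxa;
    TaxLabels Scarabaeus Drosophila Aranaeus;
END;
Begin Trees;
    Tree tree3 = ((Scarabaeus,Drosophila),Aranaeus);
end;
"""
print(parse_blocks(sample))
```

Note that stripping every bracketed comment would also discard information that some programs store in comments, such as the bracketed values inside the tree of the complex example below.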

Some important public blocks
Name          Description
TAXA          specifies OTUs in data set
CHARACTERS    specifies characters
SETS          assigns names to sets of objects (characters, states, taxa, etc.)
ASSUMPTIONS   houses assumptions about the data or gives general directions as to how to treat them
              (e.g., which characters are to be excluded from consideration)
CODONS        specifies codons and their genetic codes
DATA          equivalent to a CHARACTERS block in which the NewTaxa subcommand is included in the Dimensions command
TREES         stores information about trees
UNALIGNED
DISTANCES     contains distance matrices
source: Maddison et al., 1997


Some important commands
Name              Block        Description
TaxLabels         CHARACTERS   allows specification of the names of the taxa
CharLabels        CHARACTERS   label for a character (column)
StateLabels       CHARACTERS   label for a state (the type of an instance of a character)
CharStateLabels   CHARACTERS   combined label for a character and its states
CharSet           SETS         specifies and names a set of characters
TaxSet            SETS         gives a name to some set of OTUs
GeneticCode       CODONS       specifies a genetic code
CodeSet           CODONS       associates a code with a CharSet or TaxSet
Tree              TREES        specifies a "Newick tree"
CodonPosSet
StateSet
ChangeSet
TreeSet
CharPartition                  defines a partition of characters
TaxPartition                   defines a partition of taxa
TreePartition                  defines a partition of trees
UserType
WtSet                          specifies the weights of each character (standard object definition command)
TypeSet                        specifies the type assigned to each character as used in parsimony analysis
ExSet                          specifies which characters are to be excluded from consideration
AncStates                      allows specification of ancestral states
Dimensions        Common       specifies the number of characters
Format            Common       specifies the format of the data Matrix (a crucial command)
Eliminate         Common       allows specification of a list of characters that are to be excluded from consideration
Matrix            Common       contains a sequence of taxon names and state information for that taxon
source: Maddison et al., 1997


Format subcommands

The following are possible formatting subcommands:

  • DataType = { standard | DNA | RNA | nucleotide | protein | continuous }
  • RespectCase
  • Missing
  • Gap
  • Symbols
  • Equate
  • MatchChar
  • [No]Labels
  • Transpose
  • Interleave
  • Items
  • StatesFormat
  • [No]Tokens
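
To illustrate how the subcommands listed above appear in practice, here is a minimal Python sketch that turns the text of a Format command into a dictionary. The function name parse_format is hypothetical, and the sketch assumes subcommands separated by whitespace with no spaces around "="; quoted values with embedded spaces (as Symbols or Equate may use) are not handled.

```python
def parse_format(command):
    """Turn the text of a Format command into a dict of subcommands.

    key=value subcommands keep their value as a string; bare flags such
    as Interleave map to True. Keys are lower-cased because NEXUS
    keywords are not case-sensitive.
    """
    options = {}
    for token in command.rstrip(";").split()[1:]:   # skip the word "Format"
        key, sep, value = token.partition("=")
        options[key.lower()] = value if sep else True
    return options


print(parse_format("Format datatype=protein gap=- interleave;"))
print(parse_format("Format DataType=DNA Interleave=yes Gap=- Missing=?;"))
```

Both input lines are taken from the example files further down this page.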

NEXUS Objects

Many of the commands in a NEXUS file define objects or specify characteristics of them. All objects can be labeled (given names). Duplicate names should be avoided, as should names that differ only in case.

List of currently defined objects:

  • taxa
  • characters
  • states
  • trees
  • genetic codes
  • sets (of taxa, characters, states, classes of changes between states, trees)
  • partitions (of taxa, trees, characters)
  • weight sets
  • types
  • type sets
  • character exclusion sets
  • ancestral states
  • codon position sets

Example Trees

Two character state trees and the NEXUS commands that define them. The first tree has an unnamed state.

     2     3        4     6
      \   /          \   /
       \ /            \ /
        *              3  5
        |              | /
        |              |/
        1              2
        |              |
        |              |
        0              1
      first          second

USERTYPE first (CSTREE) = (((2,3))1)0;
USERTYPE second (CSTREE) = (((4,6)3,5)2)1;
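
The tree descriptions in the two commands above can be read as nested groups of descendants followed by the label of their ancestor. The Python sketch below recovers the (child, parent) pairs from such a description. The function name parse_cstree and the placeholder names "*1", "*2", ... for unnamed states are assumptions made for this illustration and are not part of the format.

```python
def parse_cstree(spec):
    """Return (child, parent) pairs from a character state tree description.

    Unnamed states receive placeholder names '*1', '*2', ... in the order
    in which they are closed.
    """
    spec = spec.replace(" ", "").rstrip(";")
    edges = []
    unnamed = 0
    pos = 0

    def node():
        nonlocal pos, unnamed
        children = []
        if pos < len(spec) and spec[pos] == "(":
            pos += 1                          # consume '('
            children.append(node())
            while spec[pos] == ",":
                pos += 1                      # consume ','
                children.append(node())
            pos += 1                          # consume ')'
        start = pos
        while pos < len(spec) and spec[pos] not in "(),":
            pos += 1
        label = spec[start:pos]
        if not label:                         # ancestor without a name
            unnamed += 1
            label = f"*{unnamed}"
        edges.extend((child, label) for child in children)
        return label

    node()
    return edges


print(parse_cstree("(((2,3))1)0;"))
print(parse_cstree("(((4,6)3,5)2)1;"))
```

For the first tree the unnamed state shows up as "*1", sitting between states 2 and 3 and their ancestor 1, which matches the drawing.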

Example NEXUS files

Basic

#NEXUS

BEGIN TAXA;
      dimensions ntax=4;
      taxlabels A B C D;
END;

BEGIN CHARACTERS;
      dimensions nchar=5;
      format datatype=protein gap=-;
      charlabels 1 2 3 4 Five;
      matrix
A     MA-LL
B     MA-LE
C     MEATY
D     ME-TE;
END;

BEGIN TREES;
       tree "basic bush" = ((A:1,B:1):1,(C:1,D:1):1);
END;

Simple example (original from paper)

#NEXUS
BEGIN TAXA;
      Dimensions NTax=4;
      TaxLabels fish frog snake mouse;
END;

BEGIN CHARACTERS;
      Dimensions NChar=20;
      Format DataType=DNA;
      Matrix
        fish   ACATA GAGGG TACCT CTAAG
        frog   ACATA GAGGG TACCT CTAAG
        snake  ACATA GAGGG TACCT CTAAG
        mouse  ACATA GAGGG TACCT CTAAG;
END;

BEGIN TREES;
      Tree best=(fish, (frog, (snake, mouse)));
END;

Complex example (original from paper)

note: block names in bold (<b>); commands underlined (<u>).

BEGIN <b>TAXA</b>;
  <u>DIMENSIONS</u> ntax=26;
  <u>TAXLABELS</u>  O_volvulus_AAB64227.1 O_volvulus_AAB64226.1 C_elegans_AAF39759.1 C_elegans_AAA83577.1 
    S_cerevisiae_CAA89634.1 C_albicans_AAC12872.1 S_pombe_CAB57444.1 N_crassa_AAA63780.1 M_musculus_AAA40121.1 
    C_capitata_AAA57249.1 D_virilis_CAA32060.1 D_erecta_AAF23595.1 D_orena_AAF23594.1 D_teissieri_AAF23599.1 
    D_yakuba_AAF23598.1 D_melanogaster_AAF50095.1 D_mauritiana_AAF23597.1 D_sechellia_AAF23596.1 
    D_simulans_CAA33720.1 Z_mays_AAB49913.1 O_sativa_AAC14464.1 O_sativa_AAC14465.1 A_thaliana_AAF99769.1 
    P_tremuloides_AAD01605.1 A_thaliana_BAB09468.1 A_thaliana_AAD29823.2;
END;
BEGIN <b>CHARACTERS</b>;
  <u>DIMENSIONS</u> ntax=26 nchar=30;
  <u>FORMAT</u>  datatype=protein gap=- missing=?;
  <u>CHARLABELS</u> 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 
    113 114 115 116 117 118 119 120;
  <u>MATRIX</u>

    M_musculus_AAA40121.1       QGTIHFEQKASGE--PVVLSGQITGLTE-G
    C_capitata_AAA57249.1       KGTVHFEQQDAKS--PVLVTGEVNGLAK-G
    N_crassa_AAA63780.1         KGTVIFEQESESA--PTTITYDISGNDPNA
       [--stuff deleted here--]
    D_simulans_CAA33720.1       KGTVFFEQESSGT--PVKVSGEVCGLAK-G
    S_cerevisiae_CAA89634.1     SGVVKFEQASESE--PTTVSYEIAGNSPNA
    S_pombe_CAB57444.1          SGVVTFEQVDQNS--QVSVIVDLVGNDANA;
END;
BEGIN <b>ASSUMPTIONS</b>;
  <u>WTSET</u> MySoapWeights  (VECTOR) = 1 1 1 1 1 1 1 1 0.83 0.8 0.8 0.8 0.8 0.8 0.71 0.71 1 1 1 1 1 1 1 1 
    1 1 1 1 1 1;
END;
BEGIN <b>TREES</b>;
  <u>TREE</u> "Cu-Zn Superoxide Dismutase" = (((((O_volvulus_AAB64227.1:0.31741,O_volvulus_AAB64226.1:0.13498):
    0.20268[1],(C_elegans_AAF39759.1:0.14579,C_elegans_AAA83577.1:0.27311):0.2533[1]):0.12655[0.98],
    ((S_cerevisiae_CAA89634.1:0.28255,C_albicans_AAC12872.1:0.25631):0.08358[0.91],(S_pombe_CAB57444.1:
    0.3159,N_crassa_AAA63780.1:0.1635):0.11954[0.97]):0.17514[1]):0.08988[0.77],(M_musculus_AAA40121.1:
    0.49149,(C_capitata_AAA57249.1:0.18945,(D_virilis_CAA32060.1:0.11453,(((D_erecta_AAF23595.1:0.00661,
    D_orena_AAF23594.1:0.00769):0.00497[0.92],(D_teissieri_AAF23599.1:0.004,D_yakuba_AAF23598.1:0.01012):
    0.0073[0.87]):0.01271[0.88],(((D_melanogaster_AAF50095.1:0.00836,D_mauritiana_AAF23597.1:0.00552):
    0.00203[0.28],D_sechellia_AAF23596.1:0.01103):0.00398[0.7],D_simulans_CAA33720.1:0.00595):0.00739[0.75]):
    0.11795[1]):0.11754[1]):0.12932[1]):0.10326[1]):0.0712[0.9],(((((Z_mays_AAB49913.1:0.05142,
    O_sativa_AAC14464.1:0.09031):0.02799[0.98],O_sativa_AAC14465.1:0.06915):0.05245[0.99],
    (A_thaliana_AAF99769.1:0.17064,P_tremuloides_AAD01605.1:0.1075):0.08023[1]):0.08596[1],
    A_thaliana_BAB09468.1:0.46052):0.06401[0.75],A_thaliana_AAD29823.2:0.42442):0.14252[0.94]);
END;

DNA

#NEXUS
BEGIN DATA;
        Dimensions NTax=10 NChar=705;
        Format DataType=DNA Interleave=yes Gap=- Missing=?;
        Matrix
Cow     ATGGC ATATC CCATA CAACT AGGAT TCCAA GATGC AACAT CACCA ATCAT AGAAG AACTA
Carp    ATGGCACACCCAACGCAACTAGGTTTCAAGGACGCGGCCATACCCGTTATAGAGGAACTT
Chicken ATGGCCAACCACTCCCAACTAGGCTTTCAAGACGCCTCATCCCCCATCATAGAAGAGCTC
Human   ATGGCACATGCAGCGCAAGTAGGTCTACAAGACGCTACTTCCCCTATCATAGAAGAGCTT
Loach   ATGGCACATCCCACACAATTAGGATTCCAAGACGCGGCCTCACCCGTAATAGAAGAACTT
Mouse   ATGGCCTACCCATTCCAACTTGGTCTACAAGACGCCACATCCCCTATTATAGAAGAGCTA
Rat     ATGGCTTACCCATTTCAACTTGGCTTACAAGACGCTACATCACCTATCATAGAAGAACTT
Seal    ATGGCATACCCCCTACAAATAGGCCTACAAGATGCAACCTCTCCCATTATAGAGGAGTTA
Whale   ATGGCATATCCATTCCAACTAGGTTTCCAAGATGCAGCATCACCCATCATAGAAGAGCTC
Frog    ATGGCACACCCATCACAATTAGGTTTTCAAGACGCAGCCTCTCCAATTATAGAAGAATTA

Cow     CTTCACTTTCATGACCACACGCTAATAATTGTCTTCTTAATTAGCTCATTAGTACTTTAC
Carp    CTTCACTTCCACGACCACGCATTAATAATTGTGCTCCTAATTAGCACTTTAGTTTTATAT
Chicken GTTGAATTCCACGACCACGCCCTGATAGTCGCACTAGCAATTTGCAGCTTAGTACTCTAC
Human   ATCACCTTTCATGATCACGCCCTCATAATCATTTTCCTTATCTGCTTCCTAGTCCTGTAT
Loach   CTTCACTTCCATGACCATGCCCTAATAATTGTATTTTTGATTAGCGCCCTAGTACTTTAT
Mouse   ATAAATTTCCATGATCACACACTAATAATTGTTTTCCTAATTAGCTCCTTAGTCCTCTAT
Rat     ACAAACTTTCATGACCACACCCTAATAATTGTATTCCTCATCAGCTCCCTAGTACTTTAT
Seal    CTACACTTCCATGACCACACATTAATAATTGTGTTCCTAATTAGCTCATTAGTACTCTAC
Whale   CTACACTTTCACGATCATACACTAATAATCGTTTTTCTAATTAGCTCTTTAGTTCTCTAC
Frog    CTTCACTTCCACGACCATACCCTCATAGCCGTTTTTCTTATTAGTACGCTAGTTCTTTAC

Cow     ATTATTTCACTAATACTAACGACAAAGCTGACCCATACAAGCACGATAGATGCACAAGAA
Carp    ATTATTACTGCAATGGTATCAACTAAACTTACTAATAAATATATTCTAGACTCCCAAGAA
Chicken CTTCTAACTCTTATACTTATAGAAAAACTATCA---TCAAACACCGTAGATGCCCAAGAA
Human   GCCCTTTTCCTAACACTCACAACAAAACTAACTAATACTAACATCTCAGACGCTCAGGAA
Loach   GTTATTATTACAACCGTCTCAACAAAACTCACTAACATATATATTTTGGACTCACAAGAA
Mouse   ATCATCTCGCTAATATTAACAACAAAACTAACACATACAAGCACAATAGATGCACAAGAA
Rat     ATTATTTCACTAATACTAACAACAAAACTAACACACACAAGCACAATAGACGCCCAAGAA
Seal    ATTATCTCACTTATACTAACCACGAAACTCACCCACACAAGTACAATAGACGCACAAGAA
Whale   ATTATTACCCTAATGCTTACAACCAAATTAACACATACTAGTACAATAGACGCCCAAGAA
Frog    ATTATTACTATTATAATAACTACTAAACTAACTAATACAAACCTAATGGACGCACAAGAG

Cow     GTAGAGACAATCTGAACCATTCTGCCCGCCATCATCTTAATTCTAATTGCTCTTCCTTCT
Carp    ATCGAAATCGTATGAACCATTCTACCAGCCGTCATTTTAGTACTAATCGCCCTGCCCTCC
Chicken GTTGAACTAATCTGAACCATCCTACCCGCTATTGTCCTAGTCCTGCTTGCCCTCCCCTCC
Human   ATAGAAACCGTCTGAACTATCCTGCCCGCCATCATCCTAGTCCTCATCGCCCTCCCATCC
Loach   ATTGAAATCGTATGAACTGTGCTCCCTGCCCTAATCCTCATTTTAATCGCCCTCCCCTCA
Mouse   GTTGAAACCATTTGAACTATTCTACCAGCTGTAATCCTTATCATAATTGCTCTCCCCTCT
Rat     GTAGAAACAATTTGAACAATTCTCCCAGCTGTCATTCTTATTCTAATTGCCCTTCCCTCC
Seal    GTGGAAACGGTGTGAACGATCCTACCCGCTATCATTTTAATTCTCATTGCCCTACCATCA
Whale   GTAGAAACTGTCTGAACTATCCTCCCAGCCATTATCTTAATTTTAATTGCCTTGCCTTCA
Frog    ATCGAAATAGTGTGAACTATTATACCAGCTATTAGCCTCATCATAATTGCCCTTCCATCC

Cow     TTACGAATTCTATACATAATAGATGAAATCAATAACCCATCTCTTACAGTAAAAACCATA
Carp    CTACGCATCCTGTACCTTATAGACGAAATTAACGACCCTCACCTGACAATTAAAGCAATA
Chicken CTCCAAATCCTCTACATAATAGACGAAATCGACGAACCTGATCTCACCCTAAAAGCCATC
Human   CTACGCATCCTTTACATAACAGACGAGGTCAACGATCCCTCCCTTACCATCAAATCAATT
Loach   CTACGAATTCTATATCTTATAGACGAGATTAATGACCCCCACCTAACAATTAAGGCCATG
Mouse   CTACGCATTCTATATATAATAGACGAAATCAACAACCCCGTATTAACCGTTAAAACCATA
Rat     CTACGAATTCTATACATAATAGACGAGATTAATAACCCAGTTCTAACAGTAAAAACTATA
Seal    TTACGAATCCTCTACATAATGGACGAGATCAATAACCCTTCCTTGACCGTAAAAACTATA
Whale   TTACGGATCCTTTACATAATAGACGAAGTCAATAACCCCTCCCTCACTGTAAAAACAATA
Frog    CTTCGTATCCTATATTTAATAGATGAAGTTAATGATCCACACTTAACAATTAAAGCAATC

Cow     GGACATCAGTGATACTGAAGCTATGAGTATACAGATTATGAGGACTTAAGCTTCGACTCC
Carp    GGACACCAATGATACTGAAGTTACGAGTATACAGACTATGAAAATCTAGGATTCGACTCC
Chicken GGACACCAATGATACTGAACCTATGAATACACAGACTTCAAGGACCTCTCATTTGACTCC
Human   GGCCACCAATGGTACTGAACCTACGAGTACACCGACTACGGCGGACTAATCTTCAACTCC
Loach   GGGCACCAATGATACTGAAGCTACGAGTATACTGATTATGAAAACTTAAGTTTTGACTCC
Mouse   GGGCACCAATGATACTGAAGCTACGAATATACTGACTATGAAGACCTATGCTTTGATTCA
Rat     GGACACCAATGATACTGAAGCTATGAATATACTGACTATGAAGACCTATGCTTTGACTCC
Seal    GGACATCAGTGATACTGAAGCTATGAGTACACAGACTACGAAGACCTGAACTTTGACTCA
Whale   GGTCACCAATGATATTGAAGCTATGAGTATACCGACTACGAAGACCTAAGCTTCGACTCC
Frog    GGCCACCAATGATACTGAAGCTACGAATATACTAACTATGAGGATCTCTCATTTGACTCT

Cow     TACATAATTCCAACATCAGAATTAAAGCCAGGGGAGCTACGACTATTAGAAGTCGATAAT
Carp    TATATAGTACCAACCCAAGACCTTGCCCCCGGACAATTCCGACTTCTGGAAACAGACCAC
Chicken TACATAACCCCAACAACAGACCTCCCCCTAGGCCACTTCCGCCTACTAGAAGTCGACCAT
Human   TACATACTTCCCCCATTATTCCTAGAACCAGGCGACCTGCGACTCCTTGACGTTGACAAT
Loach   TACATAATCCCCACCCAGGACCTAACCCCTGGACAATTCCGGCTACTAGAGACAGACCAC
Mouse   TATATAATCCCAACAAACGACCTAAAACCTGGTGAACTACGACTGCTAGAAGTTGATAAC
Rat     TACATAATCCCAACCAATGACCTAAAACCAGGTGAACTTCGTCTATTAGAAGTTGATAAT
Seal    TATATGATCCCCACACAAGAACTAAAGCCCGGAGAACTACGACTGCTAGAAGTAGACAAT
Whale   TATATAATCCCAACATCAGACCTAAAGCCAGGAGAACTACGATTATTAGAAGTAGATAAC
Frog    TATATAATTCCAACTAATGACCTTACCCCTGGACAATTCCGGCTGCTAGAAGTTGATAAT

Cow     CGAGTTGTACTACCAATAGAAATAACAATCCGAATGTTAGTCTCCTCTGAAGACGTATTA
Carp    CGAATAGTTGTTCCAATAGAATCCCCAGTCCGTGTCCTAGTATCTGCTGAAGACGTGCTA
Chicken CGCATTGTAATCCCCATAGAATCCCCCATTCGAGTAATCATCACCGCTGATGACGTCCTC
Human   CGAGTAGTACTCCCGATTGAAGCCCCCATTCGTATAATAATTACATCACAAGACGTCTTG
Loach   CGAATGGTTGTTCCCATAGAATCCCCTATTCGCATTCTTGTTTCCGCCGAAGATGTACTA
Mouse   CGAGTCGTTCTGCCAATAGAACTTCCAATCCGTATATTAATTTCATCTGAAGACGTCCTC
Rat     CGGGTAGTCTTACCAATAGAACTTCCAATTCGTATACTAATCTCATCCGAAGACGTCCTG
Seal    CGAGTAGTCCTCCCAATAGAAATAACAATCCGCATACTAATCTCATCAGAAGATGTACTC
Whale   CGAGTTGTCTTACCTATAGAAATAACAATCCGAATATTAGTCTCATCAGAAGACGTACTC
Frog    CGAATAGTAGTCCCAATAGAATCTCCAACCCGACTTTTAGTTACAGCCGAAGACGTCCTC

Cow     CACTCATGAGCTGTGCCCTCTCTAGGACTAAAAACAGACGCAATCCCAGGCCGTCTAAAC
Carp    CATTCTTGAGCTGTTCCATCCCTTGGCGTAAAAATGGACGCAGTCCCAGGACGACTAAAT
Chicken CACTCATGAGCCGTACCCGCCCTCGGGGTAAAAACAGACGCAATCCCTGGACGACTAAAT
Human   CACTCATGAGCTGTCCCCACATTAGGCTTAAAAACAGATGCAATTCCCGGACGTCTAAAC
Loach   CACTCCTGGGCCCTTCCAGCCATGGGGGTAAAGATAGACGCGGTCCCAGGACGCCTTAAC
Mouse   CACTCATGAGCAGTCCCCTCCCTAGGACTTAAAACTGATGCCATCCCAGGCCGACTAAAT
Rat     CACTCATGAGCCATCCCTTCACTAGGGTTAAAAACCGACGCAATCCCCGGCCGCCTAAAC
Seal    CACTCATGAGCCGTACCGTCCCTAGGACTAAAAACTGATGCTATCCCAGGACGACTAAAC
Whale   CACTCATGGGCCGTACCCTCCTTGGGCCTAAAAACAGATGCAATCCCAGGACGCCTAAAC
Frog    CACTCGTGAGCTGTACCCTCCTTGGGTGTCAAAACAGATGCAATCCCAGGACGACTTCAT

Cow     CAAACAACCCTTATATCGTCCCGTCCAGGCTTATATTACGGTCAATGCTCAGAAATTTGC
Carp    CAAGCCGCCTTTATTGCCTCACGCCCAGGGGTCTTTTACGGACAATGCTCTGAAATTTGT
Chicken CAAACCTCCTTCATCACCACTCGACCAGGAGTGTTTTACGGACAATGCTCAGAAATCTGC
Human   CAAACCACTTTCACCGCTACACGACCGGGGGTATACTACGGTCAATGCTCTGAAATCTGT
Loach   CAAACCGCCTTTATTGCCTCCCGCCCCGGGGTATTCTATGGGCAATGCTCAGAAATCTGT
Mouse   CAAGCAACAGTAACATCAAACCGACCAGGGTTATTCTATGGCCAATGCTCTGAAATTTGT
Rat     CAAGCTACAGTCACATCAAACCGACCAGGTCTATTCTATGGCCAATGCTCTGAAATTTGC
Seal    CAAACAACCCTAATAACCATACGACCAGGACTGTACTACGGTCAATGCTCAGAAATCTGT
Whale   CAAACAACCTTAATATCAACACGACCAGGCCTATTTTATGGACAATGCTCAGAGATCTGC
Frog    CAAACATCATTTATTGCTACTCGTCCGGGAGTATTTTACGGACAATGTTCAGAAATTTGC

Cow     GGGTCAAACCACAGTTTCATACCCATTGTCCTTGAGTTAGTCCCACTAAAGTACTTTGAA
Carp    GGAGCTAATCACAGCTTTATACCAATTGTAGTTGAAGCAGTACCTCTCGAACACTTCGAA
Chicken GGAGCTAACCACAGCTACATACCCATTGTAGTAGAGTCTACCCCCCTAAAACACTTTGAA
Human   GGAGCAAACCACAGTTTCATGCCCATCGTCCTAGAATTAATTCCCCTAAAAATCTTTGAA
Loach   GGAGCAAACCACAGCTTTATACCCATCGTAGTAGAAGCGGTCCCACTATCTCACTTCGAA
Mouse   GGATCTAACCATAGCTTTATGCCCATTGTCCTAGAAATGGTTCCACTAAAATATTTCGAA
Rat     GGCTCAAATCACAGCTTCATACCCATTGTACTAGAAATAGTGCCTCTAAAATATTTCGAA
Seal    GGTTCAAACCACAGCTTCATACCTATTGTCCTCGAATTGGTCCCACTATCCCACTTCGAG
Whale   GGCTCAAACCACAGTTTCATACCAATTGTCCTAGAACTAGTACCCCTAGAAGTCTTTGAA
Frog    GGAGCAAACCACAGCTTTATACCAATTGTAGTTGAAGCAGTACCGCTAACCGACTTTGAA

Cow     AAATGATCTGCGTCAATATTA---------------------TAA
Carp    AACTGATCCTCATTAATACTAGAAGACGCCTCGCTAGGAAGCTAA
Chicken GCCTGATCCTCACTA------------------CTGTCATCTTAA
Human   ATA---------------------GGGCCCGTATTTACCCTATAG
Loach   AACTGGTCCACCCTTATACTAAAAGACGCCTCACTAGGAAGCTAA
Mouse   AACTGATCTGCTTCAATAATT---------------------TAA
Rat     AACTGATCAGCTTCTATAATT---------------------TAA
Seal    AAATGATCTACCTCAATGCTT---------------------TAA
Whale   AAATGATCTGTATCAATACTA---------------------TAA
Frog    AACTGATCTTCATCAATACTA---GAAGCATCACTA------AGA
;
END;
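
In an interleaved matrix such as the one above, each taxon's sequence is split across several sections and has to be joined back together by taxon name. A minimal Python sketch of that step follows. The function name join_interleaved is hypothetical, the sketch assumes taxon names without spaces, and the sample lines are shortened pieces of the Cow and Carp rows used only for brevity.

```python
def join_interleaved(matrix_lines):
    """Concatenate the pieces of an interleaved matrix, taxon by taxon.

    Each non-blank line is 'taxon-name states...'; whitespace inside the
    states is ignored and the terminating ';' is dropped.
    """
    sequences = {}
    for line in matrix_lines:
        parts = line.replace(";", "").split()
        if not parts:
            continue                      # blank separator or lone ';'
        name, states = parts[0], "".join(parts[1:])
        sequences[name] = sequences.get(name, "") + states
    return sequences


lines = [
    "Cow     ATGGC ATATC",
    "Carp    ATGGCACACC",
    "",
    "Cow     CTTCACTTTC",
    "Carp    CTTCACTTCC",
    ";",
]
for name, seq in join_interleaved(lines).items():
    print(name, seq)
```

After joining, every sequence should have the length given by NChar in the Dimensions command, which makes a convenient sanity check.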

Amino Acid

#NEXUS 

Begin data;
Dimensions ntax=10 nchar=234;
Format datatype=protein gap=- interleave;
Matrix
Cow     MAYPMQLGFQDATSPIMEELLHFHDHTLMIVFLISSLVLYIISLMLTTKLTHTSTMDAQE
Carp    MAHPTQLGFKDAAMPVMEELLHFHDHALMIVLLISTLVLYIITAMVSTKLTNKYILDSQE
Chicken MANHSQLGFQDASSPIMEELVEFHDHALMVALAICSLVLYLLTLMLMEKLS-SNTVDAQE
Human   MAHAAQVGLQDATSPIMEELITFHDHALMIIFLICFLVLYALFLTLTTKLTNTNISDAQE
Loach   MAHPTQLGFQDAASPVMEELLHFHDHALMIVFLISALVLYVIITTVSTKLTNMYILDSQE
Mouse   MAYPFQLGLQDATSPIMEELMNFHDHTLMIVFLISSLVLYIISLMLTTKLTHTSTMDAQE
Rat     MAYPFQLGLQDATSPIMEELTNFHDHTLMIVFLISSLVLYIISLMLTTKLTHTSTMDAQE
Seal    MAYPLQMGLQDATSPIMEELLHFHDHTLMIVFLISSLVLYIISLMLTTKLTHTSTMDAQE
Whale   MAYPFQLGFQDAASPIMEELLHFHDHTLMIVFLISSLVLYIITLMLTTKLTHTSTMDAQE
Frog    MAHPSQLGFQDAASPIMEELLHFHDHTLMAVFLISTLVLYIITIMMTTKLTNTNLMDAQE

Cow     VETIWTILPAIILILIALPSLRILYMMDEINNPSLTVKTMGHQWYWSYEYTDYEDLSFDS
Carp    IEIVWTILPAVILVLIALPSLRILYLMDEINDPHLTIKAMGHQWYWSYEYTDYENLGFDS
Chicken VELIWTILPAIVLVLLALPSLQILYMMDEIDEPDLTLKAIGHQWYWTYEYTDFKDLSFDS
Human   METVWTILPAIILVLIALPSLRILYMTDEVNDPSLTIKSIGHQWYWTYEYTDYGGLIFNS
Loach   IEIVWTVLPALILILIALPSLRILYLMDEINDPHLTIKAMGHQWYWSYEYTDYENLSFDS
Mouse   VETIWTILPAVILIMIALPSLRILYMMDEINNPVLTVKTMGHQWYWSYEYTDYEDLCFDS
Rat     VETIWTILPAVILILIALPSLRILYMMDEINNPVLTVKTMGHQWYWSYEYTDYEDLCFDS
Seal    VETVWTILPAIILILIALPSLRILYMMDEINNPSLTVKTMGHQWYWSYEYTDYEDLNFDS
Whale   VETVWTILPAIILILIALPSLRILYMMDEVNNPSLTVKTMGHQWYWSYEYTDYEDLSFDS
Frog    IEMVWTIMPAISLIMIALPSLRILYLMDEVNDPHLTIKAIGHQWYWSYEYTNYEDLSFDS

Cow     YMIPTSELKPGELRLLEVDNRVVLPMEMTIRMLVSSEDVLHSWAVPSLGLKTDAIPGRLN
Carp    YMVPTQDLAPGQFRLLETDHRMVVPMESPVRVLVSAEDVLHSWAVPSLGVKMDAVPGRLN
Chicken YMTPTTDLPLGHFRLLEVDHRIVIPMESPIRVIITADDVLHSWAVPALGVKTDAIPGRLN
Human   YMLPPLFLEPGDLRLLDVDNRVVLPIEAPIRMMITSQDVLHSWAVPTLGLKTDAIPGRLN
Loach   YMIPTQDLTPGQFRLLETDHRMVVPMESPIRILVSAEDVLHSWALPAMGVKMDAVPGRLN
Mouse   YMIPTNDLKPGELRLLEVDNRVVLPMELPIRMLISSEDVLHSWAVPSLGLKTDAIPGRLN
Rat     YMIPTNDLKPGELRLLEVDNRVVLPMELPIRMLISSEDVLHSWAIPSLGLKTDAIPGRLN
Seal    YMIPTQELKPGELRLLEVDNRVVLPMEMTIRMLISSEDVLHSWAVPSLGLKTDAIPGRLN
Whale   YMIPTSDLKPGELRLLEVDNRVVLPMEMTIRMLVSSEDVLHSWAVPSLGLKTDAIPGRLN
Frog    YMIPTNDLTPGQFRLLEVDNRMVVPMESPTRLLVTAEDVLHSWAVPSLGVKTDAIPGRLH

Cow     QTTLMSSRPGLYYGQCSEICGSNHSFMPIVLELVPLKYFEKWSASML-------
Carp    QAAFIASRPGVFYGQCSEICGANHSFMPIVVEAVPLEHFENWSSLMLEDASLGS
Chicken QTSFITTRPGVFYGQCSEICGANHSYMPIVVESTPLKHFEAWSSL------LSS
Human   QTTFTATRPGVYYGQCSEICGANHSFMPIVLELIPLKIFEM-------GPVFTL
Loach   QTAFIASRPGVFYGQCSEICGANHSFMPIVVEAVPLSHFENWSTLMLKDASLGS
Mouse   QATVTSNRPGLFYGQCSEICGSNHSFMPIVLEMVPLKYFENWSASMI-------
Rat     QATVTSNRPGLFYGQCSEICGSNHSFMPIVLEMVPLKYFENWSASMI-------
Seal    QTTLMTMRPGLYYGQCSEICGSNHSFMPIVLELVPLSHFEKWSTSML-------
Whale   QTTLMSTRPGLFYGQCSEICGSNHSFMPIVLELVPLEVFEKWSVSML-------
Frog    QTSFIATRPGVFYGQCSEICGANHSFMPIVVEAVPLTDFENWSSSML-EASL--
	;
End;

References

  • Maddison DR, Swofford DL, and Maddison WP (1997). NEXUS: An extensible file format for systematic information. Syst Biol 46:590-621.
