Dot-parenthesis notation
The dot-parenthesis notation is used in bioinformatics to describe the secondary structure of RNA (including tRNA, rRNA, etc.).
Example
- Example #1 (Simple case):
>AB013372
GCGCCCGUAGCUCAAUUGGAUAGAGCGUUUGACUACGGAUCAAAAGGUUAGGGGUUCGACUCCUCUCGGGCGCG
(((((((..((((.........)))).((((((....).))))).....(((((.....)))))..))))))).
In the above example, the first row is the name of the sequence (e.g., accession number, organism name, etc.) and the second row is the actual sequence in question. The third row is where the dot-parenthesis notation is used.
The string of dots and parentheses must be of the same length as the actual sequence (usually a predicted one). A dot in the string indicates that the corresponding nucleotide is unpaired. If nucleotides i and j are paired, where i < j, a left parenthesis '(' at position i and a right parenthesis ')' at position j are shown instead.
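The pairing rule above can be sketched in Python with a stack: each '(' is remembered until the next unmatched ')' closes it. This is a minimal illustration rather than the parser of any particular tool; the function name dot_bracket_pairs and the choice of 1-based positions are assumptions made here for the example.

```python
def dot_bracket_pairs(structure: str) -> list[tuple[int, int]]:
    """Return base pairs (i, j), 1-based with i < j, from a dot-parenthesis string."""
    stack = []  # positions of '(' still waiting for a partner
    pairs = []
    for pos, symbol in enumerate(structure, start=1):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


# Sequence and structure taken from Example #1.
sequence = "GCGCCCGUAGCUCAAUUGGAUAGAGCGUUUGACUACGGAUCAAAAGGUUAGGGGUUCGACUCCUCUCGGGCGCG"
structure = "(((((((..((((.........)))).((((((....).))))).....(((((.....)))))..)))))))."

# The structure string must be exactly as long as the sequence.
assert len(sequence) == len(structure)

pairs = dot_bracket_pairs(structure)
print(len(pairs))  # 22 base pairs
print(pairs[0])    # (1, 73): the outermost pair
```

Every position that does not appear in any returned pair corresponds to a dot, that is, an unpaired nucleotide.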
- Example #2:
>structure
ssssddd.................(((.......)))...............................
>gca_bovine
AGCCCUGUGGUGAAUUUACACGUUGAAUUGCAAAUUCAGAGAAGCAGCUUCAAU-UCUGCCGGGGCUU
>gca_chicken
GACUCUGUAGUGAAGU-UCAUAAUGAGUUGCAAACUCGUUGAUGUACACUAA-AGUGUGCCGGGGUCU
>gca_mouse
GGUCUUAAGGUGAUA-UUCAUGUCGAAUUGCAAAUUCGAAGGUGUAGAGAAAU-CUCUACUAAGACUU
>gca_rat
AGCCUUAAGGUGAUU-AUCAUGUCGAAUUGCAAAUUCGAAGGUGUAGAGAAUCU-UCUACUAAGGCUU
In the above example, the same dot-parenthesis notation is used; however, it describes the secondary structure of all four sequences (from four different organisms) listed below it. The four 's' characters in the "structure" row force the corresponding nucleotides to be treated as unpaired (by any prediction algorithm), and the three 'd' characters force the corresponding nucleotides to be paired with something. The middle part is forced to form specific base pairs, as indicated by the parentheses.
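As a rough illustration of how such a "structure" row might be read, the Python sketch below sorts its symbols into the three kinds of constraint described above. The function name read_constraints and the returned dictionary layout are hypothetical, positions are 1-based, and treating '.' as "no constraint" is an assumption of this example rather than something stated for a specific tool.

```python
def read_constraints(structure: str) -> dict:
    """Split a constraint string into unpaired, paired, and explicit-pair positions."""
    unpaired = []    # 's': nucleotide must stay unpaired
    paired_any = []  # 'd': nucleotide must pair, partner not specified
    pairs = []       # '(' ... ')': these two positions must pair with each other
    stack = []       # positions of '(' still waiting for a partner
    for pos, symbol in enumerate(structure, start=1):
        if symbol == "s":
            unpaired.append(pos)
        elif symbol == "d":
            paired_any.append(pos)
        elif symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif symbol != ".":  # assumption: '.' leaves the position unconstrained
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return {"unpaired": unpaired, "paired_any": paired_any, "pairs": sorted(pairs)}


# "structure" row taken from Example #2.
constraints = read_constraints(
    "ssssddd.................(((.......)))..............................."
)
print(constraints["unpaired"])    # [1, 2, 3, 4]
print(constraints["paired_any"])  # [5, 6, 7]
print(len(constraints["pairs"]))  # 3
```

Because the constraint row applies to the alignment as a whole, each position refers to an alignment column, including columns where an individual sequence has a gap ('-').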
References
- Siebert S, Backofen R (2005). "MARNA: multiple alignment and consensus structure prediction of RNAs based on sequence structure comparisons". Bioinformatics, 21(16):3352-3359.
- Knudsen B, Hein J (2003). "Pfold: RNA secondary structure prediction using stochastic context-free grammars". Nucleic Acids Research, 31(13):3423-3428.