Difference between revisions of "GenBank"

From Christoph's Personal Wiki

Revision as of 00:13, 30 September 2007

The GenBank (aka Genetic Sequence Data Bank) sequence database is an open-access, annotated collection of all publicly available nucleotide sequences and their protein translations.[1][2] It is produced at the National Center for Biotechnology Information (NCBI).

Statistics

  • GenBank Flat File Release 161.0 (2007-08-15)
    • 76,146,236 loci, 79,525,559,650 bases, from 76,146,236 reported sequences.[3]
    • Uncompressed, the Release 161.0 flatfiles require roughly 299 GB (sequence files only) or 319 GB (including the 'short directory', 'index', and the *.txt files).
  • GenBank Flat File Release 158.0 (2007-02-15)
    • 67,218,344 loci, 71,292,211,453 bases, from 67,218,344 reported sequences.[3]
    • Uncompressed, the Release 158.0 flatfiles require roughly 251 GB (sequence files only) or 263 GB (including the 'short directory', 'index', and the *.txt files).
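
The growth between the two releases listed above can be worked out directly from the quoted figures. The following is a minimal sketch; the dictionary layout and the function name `growth` are illustrative choices, not part of any NCBI tool. The numbers are copied from the list above, with `total_gb` being the uncompressed size including the 'short directory', 'index', and *.txt files.

```python
# Figures copied from the release statistics above.
RELEASES = {
    "158.0": {"loci": 67_218_344, "bases": 71_292_211_453, "total_gb": 263},
    "161.0": {"loci": 76_146_236, "bases": 79_525_559_650, "total_gb": 319},
}


def growth(old, new):
    """Return {field: (absolute increase, percentage increase)} between two releases."""
    result = {}
    for field, old_value in old.items():
        delta = new[field] - old_value
        result[field] = (delta, 100.0 * delta / old_value)
    return result


def format_growth(table):
    """Render the growth table as one line of text per field."""
    return "\n".join(
        f"{field}: +{delta:,} ({percent:.1f}%)"
        for field, (delta, percent) in table.items()
    )
```

Calling `print(format_growth(growth(RELEASES["158.0"], RELEASES["161.0"])))` lists the increase in loci, bases, and disk footprint over the six months between the two releases.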

Note: You can find the current release number by issuing the following command:

lynx --dump ftp://ftp.ncbi.nih.gov/genbank/GB_Release_Number
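
The same lookup can be scripted. The sketch below uses only the Python standard library and the URL from the command above. It assumes that the `GB_Release_Number` file contains the release number as plain text, which has not been verified here. The function names `parse_release_number` and `fetch_release_number` are hypothetical.

```python
import re
from urllib.request import urlopen

# URL taken from the lynx command above.
RELEASE_URL = "ftp://ftp.ncbi.nih.gov/genbank/GB_Release_Number"


def parse_release_number(text):
    """Extract the first run of digits from the file contents.

    Assumption: the file holds the release number as plain text.
    """
    match = re.search(r"\d+", text)
    if match is None:
        raise ValueError("no release number found in: %r" % text)
    return int(match.group())


def fetch_release_number(url=RELEASE_URL, timeout=30):
    """Download the file and return the release number (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        text = response.read().decode("ascii", errors="replace")
    return parse_release_number(text)
```

Keeping the parsing separate from the download means the parsing step can be checked offline, for example with `parse_release_number("161\n")`.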

Selected Eukaryotic genomes

Note: The following are not part of the main NCBI GenBank database.

  • Fungi
    • Saccharomyces cerevisiae (Baker's Yeast)
    • Schizosaccharomyces pombe (Fission Yeast)
  • Plants
    • Arabidopsis thaliana
  • Vertebrates
    • Canis familiaris (Dog)
    • Gallus gallus (Chicken)
    • Homo sapiens (Human)
    • Mus musculus (Mouse)
    • Rattus norvegicus (Rat)
  • Invertebrates
    • Apis mellifera (Honey bee)
    • Caenorhabditis elegans (Nematode)
    • Drosophila melanogaster (Fruit fly)
  • Other
    • Encephalitozoon cuniculi (an intracellular parasite)

GenBank entries in the eukaryotic database

For details please refer to the NCBI genome FTP site at: ftp://ftp.ncbi.nih.gov/genomes/ and the list of completed eukaryotic genomes (NCBI).

See the complete list here: contig list (73,867 entries; 4.5 MB).

Flat file features

The following documents describe in detail the features of various flat files:

See also

  • Genome projects
  • TAB file format (aka "gb2tab")
  • build_gbff_cu.pl — Build a non-redundant cumulative GenBank flatfile from a set of GenBank Incremental Update (GIU) flatfiles provided by the NCBI. Documentation can be found here.
  • ffidx.pl — Generate an index file containing the sequence identifier and byte-offset of each record in a flatfile which contains biological sequence data.
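
The idea behind a byte-offset index, as described for ffidx.pl above, can be sketched in a few lines. This is not the ffidx.pl implementation (which is a Perl script not shown here); it is a minimal Python illustration that assumes each record starts with a `LOCUS` line whose second field is used as the sequence identifier. The function names are hypothetical.

```python
import io


def index_flatfile(stream):
    """Return a list of (identifier, byte_offset) pairs, one per record.

    `stream` must be opened in binary mode so that offsets are true byte
    positions. Assumption: a record begins with a line starting with b"LOCUS".
    """
    entries = []
    offset = 0
    for line in stream:
        if line.startswith(b"LOCUS"):
            fields = line.split()
            identifier = fields[1].decode("ascii") if len(fields) > 1 else ""
            entries.append((identifier, offset))
        offset += len(line)  # advance by the raw byte length of the line
    return entries


def read_record(stream, offset):
    """Seek to `offset` and return the record text up to and including b"//"."""
    stream.seek(offset)
    lines = []
    for line in stream:
        lines.append(line)
        if line.rstrip() == b"//":  # record terminator in GenBank flat files
            break
    return b"".join(lines)
```

With such an index, a single record can be retrieved by seeking straight to its offset instead of scanning the whole flatfile, which matters for files of the sizes quoted under Statistics.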

References

  1. Benton D (1990). "Recent changes in the GenBank On-line Service". Nucleic Acids Research, 18(6):1517–1520.
  2. Benson DA et al. (2006). "GenBank". Nucleic Acids Research, 34(Database):D16–D20.
  3. NCBI-GenBank Flat File Release 158.0 - Distribution Release Notes ('gbrel.txt'), 2007-02-15.

External links