Difference between revisions of "GenBank"

From Christoph's Personal Wiki
(See also)
Line 4:
  *GenBank Flat File Release '''158.0''' (2007-02-15)
  **'''67,218,344''' loci, '''71,292,211,453''' bases, from '''67,218,344''' reported sequences.<ref>[ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt NCBI-GenBank Flat File Release 158.0 - Distribution Release Notes] ('<code>gbrel.txt</code>') &mdash; 2007-02-15.</ref>
- **Uncompressed, the Release 158.0 flatfiles require roughly '''251 GB''' (sequence files only) or '''263 GB''' (including the '<code>short directory</code>', '<code>index</code>' and the <code>*.txt</code> files).
+ **Uncompressed, the Release 158.0 flatfiles require roughly '''251 GB''' (sequence files only) or '''263 GB''' (including the '<code>short directory</code>', '<code>index</code>', and the <code>*.txt</code> files).

  Note: You can find the current release number by issuing the following command:
Line 27:
  **''Drosophila melanogaster'' (Fruit fly)
  *Other
- **''Encephalitozoon cuniculi'' (An intracellular parasite)
+ **''Encephalitozoon cuniculi'' (an intracellular parasite)

  ===GenBank entries in the eukaryotic database===
  For details please refer to the NCBI genome FTP site at: ftp://ftp.ncbi.nih.gov/genomes/ and the [http://www.ncbi.nlm.nih.gov/genomes/static/euk_g.html list of completed eukaryotic genomes] (NCBI).

- See the complete list here: [http://www.cbs.dtu.dk/services/FeatureExtract/contig_sum.txt contig list] (73,867 entries; 4.5MB).
+ See the complete list here: [http://www.cbs.dtu.dk/services/FeatureExtract/contig_sum.txt contig list] (73,867 entries; 4.5 MB).

  ==See also==

Revision as of 09:50, 17 April 2007

GenBank (aka the Genetic Sequence Data Bank) is an open-access, annotated sequence database of all publicly available nucleotide sequences and their protein translations.[1][2] The database is produced at the National Center for Biotechnology Information (NCBI).

Statistics

  • GenBank Flat File Release 158.0 (2007-02-15)
    • 67,218,344 loci, 71,292,211,453 bases, from 67,218,344 reported sequences.[3]
    • Uncompressed, the Release 158.0 flatfiles require roughly 251 GB (sequence files only) or 263 GB (including the 'short directory', 'index', and the *.txt files).

Note: You can find the current release number by issuing the following command:

lynx --dump ftp://ftp.ncbi.nih.gov/genbank/GB_Release_Number

Selected Eukaryotic genomes

Note: The following are not part of the main NCBI GenBank database.

  • Fungi
    • Saccharomyces cerevisiae (Baker's Yeast)
    • Schizosaccharomyces pombe (Fission Yeast)
  • Plants
    • Arabidopsis thaliana
  • Vertebrates
    • Canis familiaris (Dog)
    • Gallus gallus (Chicken)
    • Homo sapiens (Human)
    • Mus musculus (Mouse)
    • Rattus norvegicus (Rat)
  • Invertebrates
    • Apis mellifera (Honey bee)
    • Caenorhabditis elegans (Nematode)
    • Drosophila melanogaster (Fruit fly)
  • Other
    • Encephalitozoon cuniculi (an intracellular parasite)

GenBank entries in the eukaryotic database

For details please refer to the NCBI genome FTP site at: ftp://ftp.ncbi.nih.gov/genomes/ and the list of completed eukaryotic genomes (NCBI).

See the complete list here: contig list (73,867 entries; 4.5 MB).

See also

  • Tab file format (aka "gb2tab")
  • build_gbff_cu.pl — Build a non-redundant cumulative GenBank flatfile from a set of GenBank Incremental Update (GIU) flatfiles provided by the NCBI. Documentation can be found here.
  • ffidx.pl — Generate an index file containing the sequence identifier and byte-offset of each record in a flatfile which contains biological sequence data.
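The byte-offset indexing idea behind ffidx.pl can be illustrated with a minimal Python sketch. This is not the ffidx.pl implementation: it assumes each record begins with a LOCUS line, takes the second field of that line as the sequence identifier, and uses a hypothetical function name (build_index) and two made-up records (DEMO0001, DEMO0002) purely for illustration.

```python
import io


def build_index(handle):
    """Return (identifier, byte_offset) pairs for each record in a flatfile.

    Assumptions (not taken from ffidx.pl): the handle is opened in binary
    mode, every record starts with a LOCUS line, and the second
    whitespace-separated field of that line is the identifier.
    """
    index = []
    offset = handle.tell()  # byte position of the line about to be read
    for line in iter(handle.readline, b""):
        if line.startswith(b"LOCUS"):
            fields = line.split()
            if len(fields) > 1:
                index.append((fields[1].decode("ascii"), offset))
        offset += len(line)  # advance by the exact number of bytes read
    return index


# Two made-up, heavily shortened records used only to exercise the sketch.
sample = (
    b"LOCUS       DEMO0001   10 bp    DNA     linear   UNA 01-JAN-2007\n"
    b"ORIGIN\n"
    b"        1 acgtacgtac\n"
    b"//\n"
    b"LOCUS       DEMO0002   4 bp    DNA     linear   UNA 01-JAN-2007\n"
    b"ORIGIN\n"
    b"        1 acgt\n"
    b"//\n"
)

index = build_index(io.BytesIO(sample))

# Emit one tab-separated "identifier<TAB>offset" line per record.
for name, offset in index:
    print(f"{name}\t{offset}")
```

With such an index, a single record can be retrieved by seeking to its stored offset and reading up to the terminating "//" line, instead of scanning the whole flatfile.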

References

  1. Benton D (1990). "Recent changes in the GenBank On-line Service". Nucleic Acids Research, 18(6):1517–1520.
  2. Benson DA et al. (2006). "GenBank". Nucleic Acids Research, 34(Database):D16–D20.
  3. NCBI-GenBank Flat File Release 158.0 - Distribution Release Notes ('gbrel.txt') — 2007-02-15.

External links