Difference between revisions of "GenBank"

From Christoph's Personal Wiki

Revision as of 23:49, 16 April 2007

The GenBank (aka Genetic Sequence Data Bank) sequence database is an open access, annotated collection of all publicly available nucleotide sequences and their protein translations.[1][2] The database is produced at the National Center for Biotechnology Information (NCBI).

Statistics

  • GenBank Flat File Release 158.0 (2007-02-15)
    • 67,218,344 loci, 71,292,211,453 bases, from 67,218,344 reported sequences.[3]
    • Uncompressed, the Release 158.0 flatfiles require roughly 251 GB (sequence files only) or 263 GB (including the 'short directory', 'index' and the *.txt files).

Note: You can find the current release number by issuing the following command:

lynx --dump ftp://ftp.ncbi.nih.gov/genbank/GB_Release_Number
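
If lynx is not installed, the same lookup can be sketched in Python. This is only a minimal sketch: it assumes that the GB_Release_Number file holds the release number as plain text (possibly with surrounding whitespace), and the names parse_release_number and fetch_release_number are made up for illustration.

```python
import re
import urllib.request

# Same URL as in the lynx command above.
RELEASE_URL = "ftp://ftp.ncbi.nih.gov/genbank/GB_Release_Number"


def parse_release_number(text):
    """Return the first integer found in the file contents.

    Assumption: the file contains just the release number, e.g. "158".
    """
    match = re.search(r"\d+", text)
    if match is None:
        raise ValueError("no release number found in %r" % text)
    return int(match.group())


def fetch_release_number(url=RELEASE_URL, timeout=30):
    """Download the file (needs network access) and parse its contents."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_release_number(response.read().decode("ascii", "replace"))


# Usage (requires network access to the NCBI FTP server):
#   print(fetch_release_number())
```

Parsing is kept separate from downloading so that the parsing step can be checked without touching the network.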

Selected Eukaryotic genomes

Note: The following are not part of the main NCBI GenBank database.

  • Fungi
    • Saccharomyces cerevisiae (Baker's Yeast)
    • Schizosaccharomyces pombe (Fission Yeast)
  • Plants
    • Arabidopsis thaliana
  • Vertebrates
    • Canis familiaris (Dog)
    • Gallus gallus (Chicken)
    • Homo sapiens (Human)
    • Mus musculus (Mouse)
    • Rattus norvegicus (Rat)
  • Invertebrates
    • Apis mellifera (Honey bee)
    • Caenorhabditis elegans (Nematode)
    • Drosophila melanogaster (Fruit fly)
  • Other
    • Encephalitozoon cuniculi (An intracellular parasite)

GenBank entries in the eukaryotic database

For details please refer to the NCBI genome FTP site at: ftp://ftp.ncbi.nih.gov/genomes/ and the list of completed eukaryotic genomes (NCBI).

See the complete list here: contig list (73,867 entries; 4.5 MB).

See also

  • Tab file format (aka "gb2tab")
  • build_gbff_cu.pl (ftp://ftp.ncbi.nih.gov/genbank/tools/build_gbff_cu.pl): Builds a non-redundant cumulative GenBank flatfile from a set of GenBank Incremental Update (GIU) flatfiles provided by the NCBI. Documentation: ftp://ftp.ncbi.nih.gov/genbank/tools/doc.build_gbff_cu.html
  • ffidx.pl (ftp://ftp.ncbi.nih.gov/genbank/tools/ffidx.pl): Generates an index file containing the sequence identifier and byte offset of each record in a flatfile that contains biological sequence data.
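
The idea behind a byte-offset index like the one ffidx.pl produces can be illustrated with a short Python sketch. This is not a reimplementation of ffidx.pl, whose output format is not described here. It assumes that each record starts with a LOCUS line and ends with a line beginning with "//", and it uses the locus name (the second field of the LOCUS line) as the identifier. The function names index_flatfile and read_record are hypothetical.

```python
def index_flatfile(handle):
    """Map each locus name to the byte offset at which its record starts.

    `handle` must be opened in binary mode so that the offsets are true
    byte positions. Lines before the first LOCUS line (e.g. a file
    header) are skipped.
    """
    index = {}
    offset = handle.tell()
    for line in iter(handle.readline, b""):
        if line.startswith(b"LOCUS"):
            fields = line.split()
            if len(fields) > 1:
                index[fields[1].decode("ascii")] = offset
        offset += len(line)
    return index


def read_record(handle, offset):
    """Seek to `offset` and return one record, up to and including '//'."""
    handle.seek(offset)
    lines = []
    for line in iter(handle.readline, b""):
        lines.append(line)
        if line.startswith(b"//"):
            break
    return b"".join(lines).decode("ascii")


# Usage, with "gbpri1.seq" standing in for any local flatfile:
#   with open("gbpri1.seq", "rb") as fh:
#       index = index_flatfile(fh)
#       record = read_record(fh, index["AB000001"])
```

With such an index a single record can be fetched by seeking straight to its offset instead of scanning the whole flatfile, which matters when the files are hundreds of gigabytes in size.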

References

  1. Benton D (1990). "Recent changes in the GenBank On-line Service". Nucleic Acids Research, 18(6):1517–1520.
  2. Benson DA et al. (2006). "GenBank". Nucleic Acids Research, 34(Database):D16–D20.
  3. NCBI-GenBank Flat File Release 158.0 - Distribution Release Notes ('gbrel.txt') — 2007-02-15.

External links