Difference between revisions of "GenBank"

Revision as of 23:39, 27 February 2008

The GenBank (also known as the Genetic Sequence Data Bank) sequence database is an open-access, annotated collection of all publicly available nucleotide sequences and their protein translations.^[1]^[2] The database is produced at the National Center for Biotechnology Information (NCBI).

Statistics

GenBank Flat File Release 164.0 (2008-02-15)
- 82,853,685 loci, 85,759,586,764 bases, from 82,853,685 reported sequences.^[3]
- Uncompressed, the Release 164.0 flatfiles require roughly 321 GB (sequence files only) or 342 GB (including the 'short directory', 'index', and the *.txt files).
GenBank Flat File Release 161.0 (2007-08-15)
- 76,146,236 loci, 79,525,559,650 bases, from 76,146,236 reported sequences.^[3]
- Uncompressed, the Release 161.0 flatfiles require roughly 299 GB (sequence files only) or 319 GB (including the 'short directory', 'index', and the *.txt files).
GenBank Flat File Release 158.0 (2007-02-15)
- 67,218,344 loci, 71,292,211,453 bases, from 67,218,344 reported sequences.^[3]
- Uncompressed, the Release 158.0 flatfiles require roughly 251 GB (sequence files only) or 263 GB (including the 'short directory', 'index', and the *.txt files).
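
The release figures above can be compared with a little arithmetic. The following Python sketch is only illustrative: the numbers are copied from the list above, while the names `RELEASES`, `mean_bases_per_locus`, and `growth` are invented for this example and are not part of any NCBI tool.

```python
# Loci and base counts copied from the release statistics listed above.
RELEASES = {
    "158.0": {"date": "2007-02-15", "loci": 67_218_344, "bases": 71_292_211_453},
    "161.0": {"date": "2007-08-15", "loci": 76_146_236, "bases": 79_525_559_650},
    "164.0": {"date": "2008-02-15", "loci": 82_853_685, "bases": 85_759_586_764},
}


def mean_bases_per_locus(release):
    """Average sequence length (bases per locus) for one release."""
    stats = RELEASES[release]
    return stats["bases"] / stats["loci"]


def growth(older, newer, field):
    """Return (absolute, percent) growth of `field` between two releases."""
    old = RELEASES[older][field]
    new = RELEASES[newer][field]
    return new - old, 100.0 * (new - old) / old


if __name__ == "__main__":
    for name in sorted(RELEASES):
        print(name, RELEASES[name]["date"], round(mean_bases_per_locus(name), 1))
    added, percent = growth("158.0", "164.0", "bases")
    print("bases added between 158.0 and 164.0:", added, f"({percent:.1f}%)")
```

Keeping the counts in one dictionary makes it easy to add later releases without touching the helper functions.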

Note: You can find the current release number by issuing the following command:

lynx --dump ftp://ftp.ncbi.nih.gov/genbank/GB_Release_Number
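
Where lynx is not available, the same lookup could be sketched in Python with the standard library. This is a rough equivalent, not an NCBI-provided method: it assumes the file contains the release number as plain text, and the helper names `parse_release_number` and `fetch_release_number` are invented for this example. The URL is the one used in the command above.

```python
import re
import urllib.request

# Same URL as in the lynx command above.
RELEASE_URL = "ftp://ftp.ncbi.nih.gov/genbank/GB_Release_Number"


def parse_release_number(text):
    """Extract the first number (e.g. '164' or '164.0') from the file contents.

    Assumption: the file holds the release number as plain text.
    """
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        raise ValueError("no release number found in the downloaded text")
    return match.group(0)


def fetch_release_number(url=RELEASE_URL, timeout=30):
    """Download the release-number file and return the number as a string."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        text = response.read().decode("ascii", errors="replace")
    return parse_release_number(text)
```

Calling `fetch_release_number()` needs network access to the FTP server; `parse_release_number` can be tried offline on any string.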

Selected Eukaryotic genomes

Note: The following are not part of the main NCBI GenBank database.

Fungi
- Saccharomyces cerevisiae (Baker's Yeast)
- Schizosaccharomyces pombe (Fission Yeast)
Plants
- Arabidopsis thaliana
Vertebrates
- Canis familiaris (Dog)
- Gallus gallus (Chicken)
- Homo sapiens (Human)
- Mus musculus (Mouse)
- Rattus norvegicus (Rat)
Invertebrates
- Apis mellifera (Honey bee)
- Caenorhabditis elegans (Nematode)
- Drosophila melanogaster (Fruit fly)
Other
- Encephalitozoon cuniculi (an intracellular parasite)

GenBank entries in the eukaryotic database

For details please refer to the NCBI genome FTP site at: ftp://ftp.ncbi.nih.gov/genomes/ and the list of completed eukaryotic genomes (NCBI).

See the complete list here: contig list (73,867 entries; 4.5 MB).

Flat file features

The following documents describe in detail the features of various flat files:

EMBL Features and Qualifiers
User Manual — by UniProt Knowledgebase (release 10.4; 2007-05-01)
Protein naming guidelines — by UniProt - Swiss-Prot Protein Knowledgebase (release 52.4; 2007-05-01)

Index files

The index keys (accession numbers, keywords, authors, journals, and gene symbols) of an index are sorted alphabetically. Following each index key, the identifiers of the sequence entries containing that key are listed (LOCUS name, division abbreviation, and primary accession number). The division abbreviations are:

PRI - primate sequences
ROD - rodent sequences
MAM - other mammalian sequences
VRT - other vertebrate sequences
INV - invertebrate sequences
PLN - plant, fungal, and algal sequences
BCT - bacterial sequences
VRL - viral sequences
PHG - bacteriophage sequences
SYN - synthetic sequences
UNA - unannotated sequences
EST - EST sequences (expressed sequence tags)
PAT - patent sequences
STS - STS sequences (sequence tagged sites)
GSS - GSS sequences (genome survey sequences)
HTG - HTGS sequences (high throughput genomic sequences)
HTC - HTC sequences (high throughput cDNA sequences)
ENV - Environmental sampling sequences
CON - Constructed sequences
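
When processing index files in a script, the division abbreviations listed above can be kept in a lookup table. The Python sketch below copies the table from that list; the names `DIVISIONS` and `describe_division` are invented for this example, and the layout of the index files themselves is not modelled here.

```python
# Division abbreviations and descriptions, copied from the list above.
DIVISIONS = {
    "PRI": "primate sequences",
    "ROD": "rodent sequences",
    "MAM": "other mammalian sequences",
    "VRT": "other vertebrate sequences",
    "INV": "invertebrate sequences",
    "PLN": "plant, fungal, and algal sequences",
    "BCT": "bacterial sequences",
    "VRL": "viral sequences",
    "PHG": "bacteriophage sequences",
    "SYN": "synthetic sequences",
    "UNA": "unannotated sequences",
    "EST": "EST sequences (expressed sequence tags)",
    "PAT": "patent sequences",
    "STS": "STS sequences (sequence tagged sites)",
    "GSS": "GSS sequences (genome survey sequences)",
    "HTG": "HTGS sequences (high throughput genomic sequences)",
    "HTC": "HTC sequences (high throughput cDNA sequences)",
    "ENV": "Environmental sampling sequences",
    "CON": "Constructed sequences",
}


def describe_division(code):
    """Return the description for a division abbreviation (case-insensitive)."""
    key = code.strip().upper()
    if key not in DIVISIONS:
        raise KeyError(f"unknown GenBank division abbreviation: {code!r}")
    return DIVISIONS[key]


if __name__ == "__main__":
    print(describe_division("pln"))
```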

References

[1] Benton D (1990). "Recent changes in the GenBank On-line Service". Nucleic Acids Research, 18(6):1517–1520.
[2] Benson DA et al. (2006). "GenBank". Nucleic Acids Research, 34(Database):D16–D20.
[3] NCBI-GenBank Flat File - Distribution Release Notes ('gbrel.txt'), 2007-02-15.

External links

GenBank (overview)
FTP directory containing full GenBank flat file releases (NCBI)
Genomes (NCBI)
List of completed eukaryotic genomes (NCBI)
The DDBJ/EMBL/GenBank Feature Table: Definition — version 6.6, 2006-10.
Trace Archive


@@ Line 3: / Line 3: @@
 ==Statistics==
 *GenBank Flat File Release '''164.0''' (2008-02-15)
-**'''82,853,685''' loci, '''85,759,586,764''' bases, from '''82,853,685''' reported sequences.<ref name="gbrel">[ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt NCBI-GenBank Flat File Release 158.0 - Distribution Release Notes] ('<code>gbrel.txt</code>') &mdash; 2007-02-15.</ref>
+**'''82,853,685''' loci, '''85,759,586,764''' bases, from '''82,853,685''' reported sequences.<ref name="gbrel">[ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt NCBI-GenBank Flat File - Distribution Release Notes] ('<code>gbrel.txt</code>') &mdash; 2007-02-15.</ref>
+**Uncompressed, the Release '''164.0''' flatfiles require roughly '''321 GB''' (sequence files only) or '''342 GB''' (including the '<code>short directory</code>', '<code>index</code>', and the <code>*.txt</code> files).
 *GenBank Flat File Release '''161.0''' (2007-08-15)
 **'''76,146,236''' loci, '''79,525,559,650''' bases, from '''76,146,236''' reported sequences.<ref name="gbrel"/>

