From Christoph's Personal Wiki

NCBI's Reference Sequence (RefSeq) database is a collection of taxonomically diverse, non-redundant, and richly annotated sequences representing naturally occurring molecules of DNA, RNA, and protein.[1] It includes sequences from plasmids, organelles, viruses, archaea, bacteria, and eukaryotes. Each RefSeq is constructed wholly from sequence data submitted to the International Nucleotide Sequence Database Collaboration (INSDC). Like a review article, a RefSeq is a synthesis of information integrated across multiple sources at a given time. RefSeqs provide a foundation for uniting sequence data with genetic and functional information, and they serve as reference standards for purposes ranging from genome annotation to reporting the locations of sequence variation in medical records. The RefSeq collection is available without restriction and can be retrieved in several ways: by searching or following links in NCBI resources (including PubMed, Nucleotide, Protein, Gene, and Map Viewer), by searching with a sequence via BLAST, or by downloading from the RefSeq FTP site.
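
Records can also be retrieved programmatically. The following is a minimal Python sketch, assuming NCBI's E-utilities `efetch` endpoint (a retrieval route not named in the text above). The function names `build_efetch_url` and `fetch_record` are hypothetical helpers introduced here for illustration, and the accession `NP_000537.3` is only an example identifier.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: NCBI E-utilities efetch endpoint, not mentioned in the article text.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, db="protein", rettype="fasta"):
    """Build an efetch URL for one accession (hypothetical helper)."""
    params = {
        "db": db,            # Entrez database, e.g. "protein" or "nuccore"
        "id": accession,     # RefSeq accession, optionally with version
        "rettype": rettype,  # record format, e.g. "fasta" or "gb"
        "retmode": "text",   # ask for plain text rather than XML
    }
    return EFETCH_URL + "?" + urlencode(params)


def fetch_record(accession, db="protein", rettype="fasta", timeout=30):
    """Download one record as text (hypothetical helper; needs network access)."""
    url = build_efetch_url(accession, db, rettype)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the URL; call fetch_record(...) to perform the download.
    print(build_efetch_url("NP_000537.3"))
```

For bulk retrieval, downloading from the RefSeq FTP site is likely the better choice; a per-record request like this one suits occasional lookups.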


RefSeq Release 54 (11 July 2012):

Proteins: 16,393,342
Organisms: 17,605

References


  1. Kim Pruitt, Garth Brown, Tatiana Tatusova, and Donna Maglott (2002). "The Reference Sequence (RefSeq) Database". NCBI. Bookshelf ID: NBK21091. Last Update: 6 April 2012.
