Biological databases

The following is an incomplete list of important biological databases commonly used in bioinformatics.

1 Primary sequence databases
2 Meta-databases
3 Genome databases
4 Genome browsers
5 Protein sequence databases
6 Protein structure databases
7 Protein-protein interactions
8 Metabolic pathway databases
9 Microarray databases
10 Mathematical model databases
11 PCR / Real time PCR primer databases
12 Specialized databases

Primary sequence databases

The International Nucleotide Sequence Database (INSD) consists of the following databases.

DDBJ: DNA Data Bank of Japan
EMBL Nucleotide DB (European Molecular Biology Laboratory)
GenBank [1] (National Center for Biotechnology Information)

Together, these databanks represent the current public knowledge of nucleotide sequences from all organisms. They exchange their stored records with one another and are the source for many other databases.
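
As a rough illustration of how a record in one of these databanks might be requested programmatically, the sketch below builds a URL for the `efetch` endpoint of NCBI's E-utilities, which serves GenBank nucleotide records. The helper name `build_efetch_url` and the choice of example accession are assumptions made for illustration and are not part of this list. The code only constructs the URL; it does not contact the server.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, db="nuccore", rettype="fasta"):
    """Return an efetch URL for one nucleotide record (hypothetical helper)."""
    params = {
        "db": db,            # nucleotide database
        "id": accession,     # accession number of the record
        "rettype": rettype,  # requested record format
        "retmode": "text",   # plain-text response
    }
    return f"{EFETCH_BASE}?{urlencode(params)}"


# Example accession chosen only for illustration.
url = build_efetch_url("U49845")
print(url)
```

Passing the resulting URL to any HTTP client would be the next step; NCBI's usage guidelines on request rates apply to real queries.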

Meta-databases

Strictly speaking, a meta-database is a database of databases rather than any one integration project or technology. Meta-databases collect data from different sources and usually make them available in a new and more convenient form, or with an emphasis on a particular disease or organism.

Entrez (National Center for Biotechnology Information)
euGenes (Indiana University)
GeneCards (Weizmann Inst.)
SOURCE (Stanford University)
mGen: combines four of the world's largest databases (GenBank, RefSeq, EMBL and DDBJ) and offers simple, program-friendly gene extraction
Bioinformatic Harvester (Karlsruhe Institute of Technology) — Integrating 26 major protein/gene resources.
MetaBase (KOBIC): a user-contributed database of biological databases.
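
The idea of collecting records from several sources and presenting them in one combined view can be sketched in a few lines. The example below is a toy illustration only, not how any of the resources listed above is implemented; the source names, field names, and record values are all invented for the example.

```python
from collections import defaultdict


def merge_sources(sources):
    """Combine per-source records into one view keyed by gene symbol.

    `sources` maps a source name to {gene_symbol: {field: value}}.
    """
    merged = defaultdict(dict)
    for source_name, records in sources.items():
        for symbol, fields in records.items():
            for field, value in fields.items():
                # Prefix each field with its source name so provenance is
                # kept and sources never overwrite each other's fields.
                merged[symbol][f"{source_name}.{field}"] = value
    return dict(merged)


# Made-up records standing in for two upstream databases.
toy_sources = {
    "sequence_db": {"TP53": {"accession": "ACC0001"}},
    "disease_db": {
        "TP53": {"phenotype": "example phenotype"},
        "BRCA1": {"phenotype": "another phenotype"},
    },
}

view = merge_sources(toy_sources)
print(view["TP53"])
```

Keeping the source name in each field is one simple way to preserve provenance; a real integration project would also need to reconcile identifiers that differ between sources.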

Genome databases

These databases collect organism genome sequences, annotate and analyze them, and provide public access. Some add curation of the experimental literature to improve computed annotations. A database may hold the genomes of many species or the genome of a single model organism.

Ensembl — provides automatic annotation databases for human, mouse, other vertebrate and eukaryote genomes.
JGI Genomes of the DOE-Joint Genome Institute — provides databases of many eukaryote and microbial genomes.
CAMERA Resource for microbial genomics and metagenomics
MGI Mouse Genome (Jackson Laboratory)
Corn, the Maize Genetics and Genomics Database
Saccharomyces Genome Database — genome of the yeast model organism.
Wormbase — genome of the model organism Caenorhabditis elegans
Zebrafish Information Network — genome of this fish model organism.
ENCODE — database of known functional elements in human genome.

Genome browsers

Genome browsers enable researchers to visualize and browse entire genomes (most host many complete genomes) together with annotation data such as gene prediction and structure, proteins, expression, regulation, variation, and comparative analysis. The annotation data usually come from multiple diverse sources.

Integrated Microbial Genomes (IMG) system by the DOE-Joint Genome Institute
UCSC Genome Bioinformatics Genome Browser and Tools (UCSC)
Ensembl — The Ensembl Genome Browser (Sanger Institute and EBI)
GBrowse — The GMOD GBrowse Project
Pathway Tools Genome Browser
X:Map - a genome browser that shows Affymetrix Exon Microarray hit locations alongside gene, transcript and exon data, using the Google Maps API

Protein sequence databases

UniProt: Universal Protein Resource (UniProt Consortium: EBI, Expasy, Protein Information Resource)
PIR Protein Information Resource (Georgetown University Medical Center)
Swiss-Prot: Protein Knowledgebase (Swiss Institute of Bioinformatics)
PEDANT: Protein Extraction, Description and ANalysis Tool (Forschungszentrum f. Umwelt & Gesundheit)
PROSITE — database of protein families and structural domains
DIP: Database of Interacting Proteins (University of California)
Pfam: Protein families — database of alignments and HMMs (Sanger Institute)
ProDom — comprehensive set of Protein Domain Families (Institut National de la Recherche Agronomique/Centre national de la recherche scientifique)
SignalP 3.0: server for signal peptide prediction (including cleavage site prediction), based on artificial neural networks and HMMs
SUPERFAMILY — library of HMMs representing superfamilies and database of (superfamily and family) annotations for all completely sequenced organisms

Protein structure databases

PDB: Protein Data Bank (RCSB: Research Collaboratory for Structural Bioinformatics)
CATH Protein Structure Classification
SCOP: Structural Classification of Proteins
SWISS-MODEL — server and repository for protein structure models
ModBase — database of comparative protein structure models (Sali Lab, UCSF)

Protein-protein interactions

BioGRID — a general repository for interaction datasets (Samuel Lunenfeld Research Institute)
STRING — a database of known and predicted protein-protein interactions (EMBL)
DIP: Database of Interacting Proteins

Metabolic pathway databases

BioCyc Database Collection (including EcoCyc and MetaCyc)
KEGG PATHWAY Database (Kyoto University)
MANET database (University of Illinois)
Reactome (Cold Spring Harbor Laboratory, European Bioinformatics Institute, Gene Ontology Consortium)

Microarray databases

ArrayExpress (European Bioinformatics Institute)
Gene Expression Omnibus (National Center for Biotechnology Information)
maxd (University of Manchester)
SMD (Stanford University)
GPX (Scottish Centre for Genomic Technology and Informatics)

Mathematical model databases

PCR / Real time PCR primer databases

PathoOligoDB: A free QPCR oligo database for pathogens

Specialized databases

BIOMOVIE (ETH-Zurich) — movies related to biology and biotechnology
CGAP Cancer Genes (National Cancer Institute)
Clone Registry Clone Collections (National Center for Biotechnology Information)
DBGET H.sapiens (Kyoto University)
GDB Hum. Genome Db (Human Genome Organisation)
SHMPD: The Singapore Human Mutation and Polymorphism Database
NCBI-UniGene (National Center for Biotechnology Information)
OMIM Inherited Diseases (Online Mendelian Inheritance in Man)
Off. Hum. Genome Db (HUGO Gene Nomenclature Committee)
HGMD disease-causing mutations (HGMD Human Gene Mutation Database)
PhenCode linking human mutations with phenotype
List with SNP-Databases
p53: The p53 Knowledgebase
Edinburgh Mouse Atlas
HvrBase++: Human and primate mitochondrial DNA
PolygenicPathways: genes and risk factors implicated in Alzheimer's disease, bipolar disorder, or schizophrenia
Connectivity map — transcriptional expression data and correlation tools for drugs
CTD: The Comparative Toxicogenomics Database — describes chemical-gene-disease interactions
