PDB

From Christoph's Personal Wiki
Revision as of 06:45, 25 April 2007 by Christoph (Talk | contribs)


The Protein Data Bank (PDB) is a repository for 3-D structural data of proteins and nucleic acids. This data, typically obtained by X-ray crystallography or NMR spectroscopy, is submitted by biologists and biochemists from around the world, is released into the public domain, and can be accessed for free.

ATOM coordinates format overview

The ATOM records present the atomic coordinates for standard residues (see http://deposit.pdb.org/public-component-erf.cif). They also present the occupancy and temperature factor for each atom. Heterogen coordinates use the HETATM record type. The element symbol is always present on each ATOM record; segment identifier and charge are optional.

  • Record Format
COLUMNS      DATA TYPE        FIELD      DEFINITION
------------------------------------------------------
 1 -  6      Record name      "ATOM  "
 7 - 11      Integer          serial     Atom serial number.
13 - 16      Atom             name       Atom name.
17           Character        altLoc     Alternate location indicator.
18 - 20      Residue name     resName    Residue name.
22           Character        chainID    Chain identifier.
23 - 26      Integer          resSeq     Residue sequence number.
27           AChar            iCode      Code for insertion of residues.
31 - 38      Real(8.3)        x          Orthogonal coordinates for X in 
                                         Angstroms
39 - 46      Real(8.3)        y          Orthogonal coordinates for Y in 
                                         Angstroms
47 - 54      Real(8.3)        z          Orthogonal coordinates for Z in 
                                         Angstroms
55 - 60      Real(6.2)        occupancy  Occupancy.
61 - 66      Real(6.2)        tempFactor Temperature factor.
77 - 78      LString(2)       element    Element symbol, right-justified.
79 - 80      LString(2)       charge     Charge on the atom.
  1. ATOM records for proteins are listed from amino to carboxyl terminus.
  2. Nucleic acid residues are listed from the 5' to the 3' terminus.
  3. No ordering is specified for polysaccharides.
  4. The list of ATOM records in a chain is terminated by a TER record.
  5. If more than one model is present in the entry, each model is delimited by MODEL and ENDMDL records.
  6. If an atom is provided in more than one position, then a non-blank alternate location indicator must be used for each of the positions. Within a residue, all atoms that are associated with each other in a given conformation are assigned the same alternate location indicator.
  7. For atoms in alternate sites, as marked by the alternate location indicator, sorting of atoms in the ATOM/HETATM list uses the following general rules:
    • In the simple case that involves a few atoms or a few residues with alternate sites, the coordinates occur one after the other in the entry.
    • In the case of large heterogen groups that are disordered, the atoms for each conformer are listed together.
  8. The insertion code is commonly used in sequence numbering.
  9. If the depositor provides the data, then the isotropic B value is given for the temperature factor.
  10. If there are neither isotropic B values from the depositor, nor anisotropic temperature factors in ANISOU, then the default value of 0.0 is used for the temperature factor.
  11. Columns 77 - 78 contain the atom's element symbol (as given in the periodic table), right-justified.
  12. Columns 79 - 80 indicate any charge on the atom, e.g., 2+, 1-. In most cases these are blank.
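
The column layout above maps directly onto string slices. The following is a minimal sketch of a reader for a single ATOM or HETATM line, not a complete PDB parser. The names `AtomRecord` and `parse_atom_line` are made up for this illustration, and the sketch assumes that all numeric fields are populated.

```python
from typing import NamedTuple


class AtomRecord(NamedTuple):
    """One ATOM/HETATM line; field names follow the table above (hypothetical type)."""
    record: str
    serial: int
    name: str
    alt_loc: str
    res_name: str
    chain_id: str
    res_seq: int
    i_code: str
    x: float
    y: float
    z: float
    occupancy: float
    temp_factor: float
    element: str
    charge: str


def parse_atom_line(line: str) -> AtomRecord:
    """Slice one ATOM/HETATM line using the 1-based columns from the table."""
    # Pad to 80 columns so lines with trailing blanks removed still slice safely.
    line = line.rstrip("\n").ljust(80)

    def col(first: int, last: int) -> str:
        # Convert 1-based inclusive columns to a 0-based Python slice.
        return line[first - 1:last]

    record = col(1, 6).strip()
    if record not in ("ATOM", "HETATM"):
        raise ValueError(f"not an ATOM/HETATM record: {record!r}")

    return AtomRecord(
        record=record,
        serial=int(col(7, 11)),
        name=col(13, 16).strip(),
        alt_loc=col(17, 17).strip(),   # empty string when blank
        res_name=col(18, 20).strip(),
        chain_id=col(22, 22).strip(),
        res_seq=int(col(23, 26)),
        i_code=col(27, 27).strip(),
        x=float(col(31, 38)),
        y=float(col(39, 46)),
        z=float(col(47, 54)),
        occupancy=float(col(55, 60)),
        temp_factor=float(col(61, 66)),
        element=col(77, 78).strip(),   # right-justified in the file
        charge=col(79, 80).strip(),    # blank in most cases
    )
```

Slicing by column rather than splitting on whitespace matters because adjacent fields may touch, as in `CG1AVAL`, where the atom name runs straight into the alternate location indicator.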
  • Verification/Validation/Value Authority Control

PDB checks ATOM/HETATM records for PDB format, sequence information, and packing. The PDB reserves the right to return deposited coordinates to the author for transformation into PDB format.

  • Relationships to Other Record Types

The ATOM records are compared to the corresponding sequence database. Residue discrepancies appear in the SEQADV record. Missing atoms are annotated in the remarks. HETATM records are formatted in the same way as ATOM records. The sequence implied by ATOM records must be identical to that given in SEQRES, with one exception: residues that have no coordinates (e.g., due to disorder) are missing from the ATOM records but must still appear in SEQRES.
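
One way to picture the SEQRES relationship: the residues seen in the ATOM records should be obtainable from the SEQRES sequence by deleting the residues that lack coordinates. The sketch below checks only that necessary condition. The function names are hypothetical, the SEQRES sequence is passed in as a plain list of residue names (the SEQRES record layout is not covered here), and the sketch assumes a single-model entry with standard residues in ATOM records.

```python
def residues_from_atoms(lines, chain_id):
    """Return residue names in file order for one chain, one entry per residue."""
    seen = set()
    names = []
    for line in lines:
        # Only ATOM records of the requested chain (column 22) are considered.
        if not line.startswith("ATOM") or line[21] != chain_id:
            continue
        # resSeq (columns 23-26) plus iCode (column 27) identify a residue.
        key = (line[22:26], line[26])
        if key not in seen:
            seen.add(key)
            names.append(line[17:20].strip())
    return names


def is_subsequence(observed, full):
    """True if `observed` can be obtained from `full` by deleting items."""
    remaining = iter(full)
    # `name in remaining` advances the iterator, so order is respected.
    return all(name in remaining for name in observed)
```

A call such as `is_subsequence(residues_from_atoms(lines, "A"), seqres_names)` returning `False` would point to a residue in the coordinates that SEQRES does not account for. A `True` result does not prove the two agree, since the check cannot tell which residues were actually disordered.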

  • Example
         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
ATOM    145  N   VAL A  25      32.433  16.336  57.540  1.00 11.92           N
ATOM    146  CA  VAL A  25      31.132  16.439  58.160  1.00 11.85           C
ATOM    147  C   VAL A  25      30.447  15.105  58.363  1.00 12.34           C
ATOM    148  O   VAL A  25      29.520  15.059  59.174  1.00 15.65           O
ATOM    149  CB AVAL A  25      30.385  17.437  57.230  0.28 13.88           C
ATOM    150  CB BVAL A  25      30.166  17.399  57.373  0.72 15.41           C
ATOM    151  CG1AVAL A  25      28.870  17.401  57.336  0.28 12.64           C
ATOM    152  CG1BVAL A  25      30.805  18.788  57.449  0.72 15.11           C
ATOM    153  CG2AVAL A  25      30.835  18.826  57.661  0.28 13.58           C
ATOM    154  CG2BVAL A  25      29.909  16.996  55.922  0.72 13.25           C
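
In the example, the side-chain atoms of VAL 25 appear twice, once with alternate location indicator A and once with B. A short sketch of how such records might be grouped is given below. The function name `alt_loc_occupancies` and the constant `EXAMPLE` are made up for this illustration, and `EXAMPLE` reuses a subset of the lines above.

```python
from collections import defaultdict

# A subset of the example records above, copied verbatim.
EXAMPLE = """\
ATOM    145  N   VAL A  25      32.433  16.336  57.540  1.00 11.92           N
ATOM    149  CB AVAL A  25      30.385  17.437  57.230  0.28 13.88           C
ATOM    150  CB BVAL A  25      30.166  17.399  57.373  0.72 15.41           C
ATOM    151  CG1AVAL A  25      28.870  17.401  57.336  0.28 12.64           C
ATOM    152  CG1BVAL A  25      30.805  18.788  57.449  0.72 15.11           C
"""


def alt_loc_occupancies(lines):
    """Map (chainID, resSeq, iCode, atom name) to {altLoc: occupancy}."""
    grouped = defaultdict(dict)
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        key = (
            line[21].strip(),        # chainID, column 22
            int(line[22:26]),        # resSeq, columns 23-26
            line[26].strip(),        # iCode, column 27
            line[12:16].strip(),     # atom name, columns 13-16
        )
        alt_loc = line[16].strip()   # column 17; "" when the atom has one position
        grouped[key][alt_loc] = float(line[54:60])  # occupancy, columns 55-60
    return dict(grouped)
```

For these records, `alt_loc_occupancies(EXAMPLE.splitlines())` maps the CB atom of VAL 25 to occupancies 0.28 for A and 0.72 for B, while the backbone N has a single entry under the blank indicator with occupancy 1.00.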
  • Known Problems

No distinction is made between ribo- and deoxyribonucleotides in the SEQRES records. These residues are identified with the same residue name (i.e., A, C, G, T, U).

External links