Amino acid

From Christoph's Personal Wiki
Revision as of 17:22, 8 January 2009 by Christoph (Talk | contribs)

In chemistry, an amino acid is a molecule that contains both amine and carboxyl functional groups. In biochemistry, the term usually refers to alpha-amino acids with the general formula H2N-CHR-COOH, where R is the side chain.[1]

Amino acid atoms

The following heavy (non-hydrogen) atoms, listed by their PDB atom names, are expected for each amino acid:

 aa   1 2  3 4 5  6   7   8   9   10  11  12  13  14 
 A:   N CA C O CB                                      : Alanine
 V:   N CA C O CB CG1 CG2                              : Valine
 L:   N CA C O CB CG  CD1 CD2                          : Leucine
 I:   N CA C O CB CG1 CG2 CD1                          : Isoleucine
 P:   N CA C O CB CG  CD                               : Proline
 M:   N CA C O CB CG  SD  CE                           : Methionine
 F:   N CA C O CB CG  CD1 CD2 CE1 CE2 CZ               : Phenylalanine
 W:   N CA C O CB CG  CD1 CD2 NE1 CE2 CE3 CZ2 CZ3 CH2  : Tryptophan
 G:   N CA C O                                         : Glycine
 S:   N CA C O CB OG                                   : Serine
 T:   N CA C O CB OG1 CG2                              : Threonine
 C:   N CA C O CB SG                                   : Cysteine
 Y:   N CA C O CB CG  CD1 CD2 CE1 CE2 CZ  OH           : Tyrosine
 N:   N CA C O CB CG  OD1 ND2                          : Asparagine
 Q:   N CA C O CB CG  CD  OE1 NE2                      : Glutamine
 D:   N CA C O CB CG  OD1 OD2                          : Aspartic acid
 E:   N CA C O CB CG  CD  OE1 OE2                      : Glutamic acid
 K:   N CA C O CB CG  CD  CE  NZ                       : Lysine
 R:   N CA C O CB CG  CD  NE  CZ  NH1 NH2              : Arginine
 H:   N CA C O CB CG  ND1 CD2 CE1 NE2                  : Histidine
 X:   N CA C O CB                                      : Nonstandard (ATOM or HETATM records)
 #:   N CA C O                                         : Unknown (ATOM records)
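
One possible use of this table is to check whether a residue read from a structure file is missing any heavy atoms. The following is a minimal sketch, not a reference implementation: the names EXPECTED_ATOMS and missing_atoms are hypothetical, the dictionary simply copies the twenty standard rows above (the X and # rows are left out), and the observed atom names are assumed to have been parsed elsewhere.

```python
# Expected heavy-atom names per one-letter code, copied from the table above.
# EXPECTED_ATOMS and missing_atoms are illustrative names, not an existing API.
EXPECTED_ATOMS = {
    "A": "N CA C O CB",
    "V": "N CA C O CB CG1 CG2",
    "L": "N CA C O CB CG CD1 CD2",
    "I": "N CA C O CB CG1 CG2 CD1",
    "P": "N CA C O CB CG CD",
    "M": "N CA C O CB CG SD CE",
    "F": "N CA C O CB CG CD1 CD2 CE1 CE2 CZ",
    "W": "N CA C O CB CG CD1 CD2 NE1 CE2 CE3 CZ2 CZ3 CH2",
    "G": "N CA C O",
    "S": "N CA C O CB OG",
    "T": "N CA C O CB OG1 CG2",
    "C": "N CA C O CB SG",
    "Y": "N CA C O CB CG CD1 CD2 CE1 CE2 CZ OH",
    "N": "N CA C O CB CG OD1 ND2",
    "Q": "N CA C O CB CG CD OE1 NE2",
    "D": "N CA C O CB CG OD1 OD2",
    "E": "N CA C O CB CG CD OE1 OE2",
    "K": "N CA C O CB CG CD CE NZ",
    "R": "N CA C O CB CG CD NE CZ NH1 NH2",
    "H": "N CA C O CB CG ND1 CD2 CE1 NE2",
}


def missing_atoms(residue, observed):
    """Return the expected atom names that are absent from `observed`.

    `residue` is a one-letter code; `observed` is any iterable of atom names.
    Extra atoms in `observed` (for example a terminal OXT) are ignored.
    """
    expected = set(EXPECTED_ATOMS[residue].split())
    return expected - set(observed)


# A serine whose side-chain oxygen was not resolved:
print(missing_atoms("S", ["N", "CA", "C", "O", "CB"]))
```

A residue code that is not in the dictionary raises a KeyError here; a caller that also has to handle the X and # rows would need to treat those cases separately.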

Reduced (redundant or simplified) alphabets for proteins

  • Two letters alphabet: Hydrophobic / polar (HP) model[2][3]
AGTSNQDEHRKP => P: Hydrophilic
CMFILVWY     => H: Hydrophobic
  • Five letters alphabet: Chemical / structural properties[4]
IVL   => A: Aliphatic
FYWH  => R: Aromatic
KRDE  => C: Charged
GACS  => T: Tiny
TMQNP => D: Diverse
  • Six letters alphabet: Chemical / structural properties #2[4]
IVL   => A: Aliphatic
FYWH  => R: Aromatic
KR    => C: Pos. charged
DE    => C: Neg. charged
GACS  => T: Tiny
TMQNP => D: Diverse
  • 3 IMGT amino acid hydropathy alphabet[5]
IVLFCMAW => H: Hydrophobic
GTSYPH   => N: Neutral
DNEQKR   => P: Hydrophilic
  • 5 IMGT amino acid volume alphabet[5]
GAS   => G: 60-90 Å³
CDPNT => C: 108-117 Å³
EVQH  => E: 138-154 Å³
MILKR => M: 162-174 Å³
FYW   => F: 189-228 Å³
  • 11 IMGT amino acid chemical characteristics alphabet[5]
AVIL => A: Aliphatic
F    => F: Phenylalanine
CM   => C: Sulfur
G    => G: Glycine
ST   => S: Hydroxyl
W    => W: Tryptophan
Y    => Y: Tyrosine
P    => P: Proline
DE   => D: Acidic
NQ   => N: Amide
HKR  => H: Basic
  • Murphy et al., 2000; 15 letters alphabet[6]
LVIM => L: Large hydrophobic
C    => C
A    => A
G    => G
S    => S
T    => T
P    => P
FY   => F: Hydrophobic/aromatic sidechains
W    => W
E    => E
D    => D
N    => N
Q    => Q
KR   => K: Long-chain positively charged
H    => H
  • Murphy et al., 2000; 10 letters alphabet[6]
LVIM => L: Large hydrophobic
C    => C
A    => A
G    => G
ST   => S: Polar
P    => P
FYW  => F: Hydrophobic/aromatic sidechains
EDNQ => E: Charged / polar
KR   => K: Long-chain positively charged
H    => H
  • Murphy et al., 2000; 8 letters alphabet[6]
LVIMC => L: Hydrophobic
AG    => A
ST    => S: Polar
P     => P
FYW   => F: Hydrophobic/aromatic sidechains
EDNQ  => E
KR    => K: Long-chain positively charged
H     => H
  • Murphy et al., 2000; 4 letters alphabet[6]
LVIMC   => L: Hydrophobic
AGSTP   => A
FYW     => F: Hydrophobic/aromatic sidechains
EDNQKRH => E
  • Murphy et al., 2000; 2 letters alphabet[6]
LVIMCAGSTPFYW => P: Hydrophobic
EDNQKRH       => E: Hydrophilic
  • Wang & Wang, 1999; 5 letters alphabet[7]
CMFILVWY => I
ATH      => A
GP       => G
DE       => E
SNQRK    => K
  • Wang & Wang, 1999; 5 letters variant alphabet[7]
CMFI => I
LVWY => L
ATGS => A
NQDE => E
HPRK => K
  • Wang & Wang, 1999; 3 letters alphabet[7]
CMFILVWY => I
ATHGPR   => A
DESNQK   => E
  • Wang & Wang, 1999; 2 letters alphabet[7]
CMFILVWY     => I
ATHGPRDESNQK => A
  • Li et al., 2003; 10 letters alphabet[8]
C   => C
FYW => Y
ML  => L
IV  => V
G   => G
P   => P
ATS => S
NH  => N
QED => E
RK  => K
  • Li et al., 2003; 5 letters alphabet[8]
CFYW    => Y
MLIV    => I
G       => G
PATS    => S
NHQEDRK => E
  • Li et al., 2003; 4 letters alphabet[8]
CFYW    => Y
MLIV    => I
GPATS   => S
NHQEDRK => E
  • Li et al., 2003; 3 letters alphabet[8]
CFYWMLIV => I
GPATS    => S
NHQEDRK  => E
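
Applying any of the alphabets above amounts to a per-residue lookup. The sketch below is one way this could be done, using the Murphy et al. 10-letter and the Wang & Wang 5-letter groupings as listed; the names MURPHY_10, WANG_5, build_table and reduce_sequence are hypothetical, and mapping unrecognised characters to "X" is an assumption made for this example rather than part of any of the cited schemes.

```python
# Group definitions copied from the lists above: members -> representative letter.
# All names in this block are illustrative, not taken from the cited papers.
MURPHY_10 = {
    "LVIM": "L", "C": "C", "A": "A", "G": "G", "ST": "S",
    "P": "P", "FYW": "F", "EDNQ": "E", "KR": "K", "H": "H",
}
WANG_5 = {"CMFILVWY": "I", "ATH": "A", "GP": "G", "DE": "E", "SNQRK": "K"}


def build_table(groups):
    """Expand {members: symbol} into a per-residue lookup table."""
    table = {}
    for members, symbol in groups.items():
        for aa in members:
            if aa in table:
                # A residue listed in two groups would make the mapping ambiguous.
                raise ValueError(f"residue {aa!r} appears in more than one group")
            table[aa] = symbol
    return table


def reduce_sequence(seq, groups, unknown="X"):
    """Rewrite `seq` in the reduced alphabet described by `groups`.

    Characters that are not covered by any group become `unknown`
    (an assumption of this sketch).
    """
    table = build_table(groups)
    return "".join(table.get(aa, unknown) for aa in seq.upper())


print(reduce_sequence("MKTAYIAKQR", MURPHY_10))  # LKSAFLAKEK
print(reduce_sequence("MKTAYIAKQR", WANG_5))     # IKAAIIAKKK
```

Keeping the groups as data rather than code means another alphabet from the list can be tried by adding one more dictionary, and the duplicate check in build_table catches a residue that was accidentally entered in two groups.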

References

  1. Proline is an exception to this general formula. It lacks the NH2 group because of the cyclization of the side chain.
  2. Chan HS, Dill KA (1989). "Compact polymers". Macromolecules, 22:4559-4573.
  3. Lau KF, Dill KA (1989). "A lattice statistical mechanics model of the conformational and sequence spaces of proteins". Macromolecules, 22:3986-3997.
  4. Betts MJ, Russell RB (2003). "Amino acid properties and consequences of substitutions". Bioinformatics for Geneticists, M.R. Barnes, I.C. Gray eds, Wiley.
  5. Pommié C, Levadoux S, Sabatier R, Lefranc G, Lefranc MP (2004). "IMGT standardized criteria for statistical analysis of immunoglobulin V-REGION amino acid properties". Journal of Molecular Recognition, 17:17-32. PMID: 14872534
  6. Murphy LR, Wallqvist A, Levy RM (2000). "Simplified amino acid alphabets for protein fold recognition and implications for folding". Protein Eng, 13:149-152. PMID: 10775656
  7. Wang J, Wang W (1999). "A computational approach to simplifying the protein folding alphabet". Nat Struct Biol, 6(11):1033-1038. PMID: 10542095
  8. Li T, Fan K, Wang J, Wang W (2003). "Reduction of protein sequence complexity by residue grouping". Protein Eng, 16(5):323-330. PMID: 12826723

Further reading

  • Spitzer M, Fuellen G, Cullen P, Lorkowski S (2004). "VisCoSe: visualization and comparison of consensus sequences". Bioinformatics, 20:433-435. PMID: 14960475.
  • Livingstone CD, Barton GJ (1993). "Protein sequence alignments: a strategy for the hierarchical analysis of residue conservation". Comput Appl Biosci, 9(6):745-56. PMID: 8143162.
