Difference between revisions of "Amino acid"

From Christoph's Personal Wiki

Revision as of 02:34, 16 May 2012

In chemistry, an amino acid is a molecule that contains both amine and carboxyl functional groups. In biochemistry, this term refers to alpha-amino acids with the general formula NH2CHRCOOH.[1]

Amino acid atoms

The following atoms are expected for a given amino acid:

 aa   1 2  3 4 5  6   7   8   9   10  11  12  13  14 
 A:   N CA C O CB                                      : Alanine
 V:   N CA C O CB CG1 CG2                              : Valine
 L:   N CA C O CB CG  CD1 CD2                          : Leucine
 I:   N CA C O CB CG1 CG2 CD1                          : Isoleucine
 P:   N CA C O CB CG  CD                               : Proline
 M:   N CA C O CB CG  SD  CE                           : Methionine
 F:   N CA C O CB CG  CD1 CD2 CE1 CE2 CZ               : Phenylalanine
 W:   N CA C O CB CG  CD1 CD2 NE1 CE2 CE3 CZ2 CZ3 CH2  : Tryptophan
 G:   N CA C O                                         : Glycine
 S:   N CA C O CB OG                                   : Serine
 T:   N CA C O CB OG1 CG2                              : Threonine
 C:   N CA C O CB SG                                   : Cysteine
 Y:   N CA C O CB CG  CD1 CD2 CE1 CE2 CZ  OH           : Tyrosine
 N:   N CA C O CB CG  OD1 ND2                          : Asparagine
 Q:   N CA C O CB CG  CD  OE1 NE2                      : Glutamine
 D:   N CA C O CB CG  OD1 OD2                          : Aspartic acid
 E:   N CA C O CB CG  CD  OE1 OE2                      : Glutamic acid
 K:   N CA C O CB CG  CD  CE  NZ                       : Lysine
 R:   N CA C O CB CG  CD  NE  CZ  NH1 NH2              : Arginine
 H:   N CA C O CB CG  ND1 CD2 CE1 NE2                  : Histidine
 X:   N CA C O CB                                      : Nonstandard (ATOM or HETATM records)
 #:   N CA C O                                         : Unknown (ATOM records)
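
The table above can be used to check whether a residue read from a structure file is complete. The following is a minimal Python sketch, assuming the atom names of each residue have already been parsed into a list; only a few residues are transcribed here, and the names `SIDE_CHAIN`, `expected_atoms`, and `missing_atoms` are hypothetical helpers introduced for illustration.

```python
# Backbone atoms shared by every residue in the table (columns 1-4).
BACKBONE = ("N", "CA", "C", "O")

# Side-chain atoms for a few residues, transcribed from the table above.
# Extend this dictionary with the remaining rows as needed.
SIDE_CHAIN = {
    "G": (),
    "A": ("CB",),
    "S": ("CB", "OG"),
    "C": ("CB", "SG"),
    "V": ("CB", "CG1", "CG2"),
    "K": ("CB", "CG", "CD", "CE", "NZ"),
    "W": ("CB", "CG", "CD1", "CD2", "NE1", "CE2", "CE3", "CZ2", "CZ3", "CH2"),
}


def expected_atoms(code):
    """Return the expected atom names for a one-letter residue code."""
    return BACKBONE + SIDE_CHAIN[code]


def missing_atoms(code, observed):
    """Return the expected atoms that are absent from `observed`, in table order."""
    seen = set(observed)
    return [atom for atom in expected_atoms(code) if atom not in seen]


if __name__ == "__main__":
    # A serine whose side-chain oxygen was not resolved.
    print(missing_atoms("S", ["N", "CA", "C", "O", "CB"]))
```

A residue is complete when `missing_atoms` returns an empty list. Hydrogens and terminal atoms such as OXT are not covered by the table, so this sketch ignores them.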

Reduced (redundant or simplified) alphabets for proteins

  • Two letters alphabet: Hydrophobic / polar (HP)[2][3]
AGTSNQDEHRKP => P: Hydrophilic
CMFILVWY     => H: Hydrophobic
  • Five letters alphabet: Chemical / structural properties[4]
IVL   => A: Aliphatic
FYWH  => R: Aromatic
KRDE  => C: Charged
GACS  => T: Tiny
TMQNP => D: Diverse
  • Six letters alphabet: Chemical / structural properties #2[4]
IVL   => A: Aliphatic
FYWH  => R: Aromatic
KR    => C: Pos. charged
DE    => C: Neg. charged
GACS  => T: Tiny
TMQNP => D: Diverse
  • 3 IMGT amino acid hydropathy alphabet[5]
IVLFCMAW => H: Hydrophobic
GTSYPH   => N: Neutral
DNEQKR   => P: Hydrophilic
  • 5 IMGT amino acid volume alphabet (volumes in Å³)[5]
GAS   => G: 60-90
CDPNT => C: 108-117
EVQH  => E: 138-154
MILKR => M: 162-174
FYW   => F: 189-228
  • 11 IMGT amino acid chemical characteristics alphabet[5]
AVIL => A: Aliphatic
F    => F: Phenylalanine
CM   => C: Sulfur
G    => G: Glycine
ST   => S: Hydroxyl
W    => W: Tryptophan
Y    => Y: Tyrosine
P    => P: Proline
DE   => D: Acidic
NQ   => N: Amide
HKR  => H: Basic
  • Murphy et al., 2000; 15 letters alphabet[6]
LVIM => L: Large hydrophobic
C    => C
A    => A
G    => G
S    => S
T    => T
P    => P
FY   => F: Hydrophobic/aromatic sidechains
W    => W
E    => E
D    => D
N    => N
Q    => Q
KR   => K: Long-chain positively charged
H    => H
  • Murphy et al., 2000; 10 letters alphabet[6]
LVIM => L: Large hydrophobic
C    => C
A    => A
G    => G
ST   => S: Polar
P    => P
FYW  => F: Hydrophobic/aromatic sidechains
EDNQ => E: Charged / polar
KR   => K: Long-chain positively charged
H    => H
  • Murphy et al., 2000; 8 letters alphabet[6]
LVIMC => L: Hydrophobic
AG    => A
ST    => S: Polar
P     => P
FYW   => F: Hydrophobic/aromatic sidechains
EDNQ  => E
KR    => K: Long-chain positively charged
H     => H
  • Murphy et al., 2000; 4 letters alphabet[6]
LVIMC   => L: Hydrophobic
AGSTP   => A
FYW     => F: Hydrophobic/aromatic sidechains
EDNQKRH => E
  • Murphy et al., 2000; 2 letters alphabet[6]
LVIMCAGSTPFYW => P: Hydrophobic
EDNQKRH       => E: Hydrophilic
  • Wang & Wang, 1999; 5 letters alphabet[7]
CMFILVWY => I
ATH      => A
GP       => G
DE       => E
SNQRK    => K
  • Wang & Wang, 1999; 5 letters variant alphabet[7]
CMFI => I
LVWY => L
ATGS => A
NQDE => E
HPRK => K
  • Wang & Wang, 1999; 3 letters alphabet[7]
CMFILVWY => I
ATHGPR   => A
DESNQK   => E
  • Wang & Wang, 1999; 2 letters alphabet[7]
CMFILVWY     => I
ATHGPRDESNQK => A
  • Li et al., 2003; 10 letters alphabet[8]
C   => C
FYW => Y
ML  => L
IV  => V
G   => G
P   => P
ATS => S
NH  => N
QED => E
RK  => K
  • Li et al., 2003; 5 letters alphabet[8]
CFYW    => Y
MLIV    => I
G       => G
PATS    => S
NHQEDRK => E
  • Li et al., 2003; 4 letters alphabet[8]
CFYW    => Y
MLIV    => I
GPATS   => S
NHQEDRK => E
  • Li et al., 2003; 3 letters alphabet[8]
CFYWMLIV => I
GPATS    => S
NHQEDRK  => E
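
Applying any of the alphabets above amounts to a per-residue lookup. The following is a minimal Python sketch, assuming each alphabet is written as a dictionary from group members to the group letter; it uses the Murphy et al. 4 letters alphabet as an example. The names `build_table` and `reduce_sequence`, and the choice of "X" for unrecognized residues, are assumptions made for illustration and do not come from the cited papers.

```python
# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

# Murphy et al., 2000; 4 letters alphabet, transcribed from the list above.
MURPHY_4 = {"LVIMC": "L", "AGSTP": "A", "FYW": "F", "EDNQKRH": "E"}


def build_table(groups):
    """Expand {members: letter} into {residue: letter}.

    Raises ValueError if the groups do not partition the 20 standard
    amino acids, i.e. a residue is assigned twice or not at all.
    """
    table = {}
    for members, letter in groups.items():
        for aa in members:
            if aa in table:
                raise ValueError(f"{aa} is assigned to more than one group")
            table[aa] = letter
    missing = AMINO_ACIDS - set(table)
    if missing:
        raise ValueError(f"unassigned residues: {''.join(sorted(missing))}")
    return table


def reduce_sequence(seq, table, unknown="X"):
    """Rewrite a protein sequence in the reduced alphabet."""
    return "".join(table.get(aa, unknown) for aa in seq.upper())


if __name__ == "__main__":
    table = build_table(MURPHY_4)
    print(reduce_sequence("MKTAYIAKQR", table))  # prints LEAAFLAEEE
```

The partition check in `build_table` is a design choice: it rejects a transcribed alphabet in which a residue appears in two groups or in none, before any sequence is converted.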

References

  1. Proline is an exception to this general formula. It lacks the NH2 group because of the cyclization of the side chain.
  2. Chan HS, Dill KA (1989). "Compact polymers". Macromolecules, 22:4559-4573.
  3. Lau KF, Dill KA (1989). "A lattice statistical mechanics model of the conformational and sequence spaces of proteins". Macromolecules, 22:3986-3997.
  4. Betts MJ, Russell RB (2003). "Amino acid properties and consequences of substitutions". Bioinformatics for Geneticists, M.R. Barnes, I.C. Gray eds, Wiley.
  5. Pommié C, Levadoux S, Sabatier R, Lefranc G & Lefranc MP (2004). "IMGT standardized criteria for statistical analysis of immunoglobulin V-REGION amino acid properties". Journal of Molecular Recognition, 17:17-32. PMID: 14872534
  6. Murphy LR, Wallqvist A, Levy RM (2000). "Simplified amino acid alphabets for protein fold recognition and implications for folding". Protein Eng, 13:149-152. PMID: 10775656
  7. Wang J, Wang W (1999). "A computational approach to simplifying the protein folding alphabet". Nat Struct Biol, 6(11):1033-1038. PMID: 10542095
  8. Li T, Fan K, Wang J, Wang W (2003). "Reduction of protein sequence complexity by residue grouping". Protein Eng, 16(5):323-330. PMID: 12826723

Further reading

  • Spitzer M, Fuellen G, Cullen P, Lorkowski S (2004). "VisCoSe: visualization and comparison of consensus sequences". Bioinformatics, 20:433-435. PMID: 14960475.
  • Livingstone CD, Barton GJ (1993). "Protein sequence alignments: a strategy for the hierarchical analysis of residue conservation". Comput Appl Biosci, 9(6):745-56. PMID: 8143162.

External links