Difference between revisions of "SmoothDock"

From Christoph's Personal Wiki

Latest revision as of 22:27, 15 March 2018

SmoothDock is a fully automated algorithm for finding physical interactions between proteins involved in common cellular functions. It was developed by Carlos J. Camacho and Christoph Champ at the University of Pittsburgh. It is based upon a previous algorithm, ClusPro,[1][2] developed by Camacho and Steven R. Comeau at Boston University (note: ClusPro is also based on a previous algorithm, Consensus[3]).

SmoothDock starts from around 20,000 predictions obtained from the FFT-based global search program DOT,[4] and then employs rigid-body minimization with a combination of empirical and standard force field energy terms, followed by clustering. In general outline, ClusPro and SmoothDock are similar to GRAMM-X: all three use an FFT-based initial global search, but the refinement/re-scoring protocols and potential terms differ in each case.

The SmoothDock algorithm

The SmoothDock algorithm comprises four steps:

  1. perform rigid-body docking using the program DOT, keeping the top 20,000 structures as ranked by surface complementarity;
  2. re-rank these structures according to a free energy estimate that includes both desolvation[5] and electrostatics and retain the top 2,000 complexes;
  3. cluster the filtered complexes using a pairwise RMS deviation criterion; and
  4. subject the twenty-five largest clusters to a smooth docking discrimination algorithm in which van der Waals forces are taken into account.

Step 1: rigid-body docking

Rigid-body docking using the fast Fourier transform (FFT)-based program DOT[6][7] is performed for each receptor/ligand target. The output is the top 20,000 receptor/ligand complexes sampled by DOT, ranked according to surface complementarity. Any experimental constraint on the binding area is also imposed at this stage.

Although DOT allows for the use of an electrostatic potential in the scoring function, we base the scoring solely on the surface complementarity between the two structures. DOT is run on a 128 Å × 128 Å × 128 Å grid, using a grid spacing of 1 Å. Using a pre-defined list of 13,000 rotations, over 2.7 × 10¹⁰ structures are evaluated, retaining 20,000 structures with the best surface complementarity scores, which are then further subjected to the empirical free energy filtering algorithm described below.
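As a quick check on the pose count quoted above, the number of evaluated structures is the number of grid translations multiplied by the number of rotations. The sketch below only reproduces that arithmetic; the variable names are illustrative and are not taken from DOT.

```python
# Count of rigid-body poses evaluated in the global search.
# Names are illustrative; only the numbers come from the text above.
GRID_POINTS_PER_AXIS = 128   # 128 Å box sampled at 1 Å spacing
N_ROTATIONS = 13_000         # size of the pre-defined rotation list

n_translations = GRID_POINTS_PER_AXIS ** 3
n_poses = n_translations * N_ROTATIONS

print(f"{n_translations:,} translations x {N_ROTATIONS:,} rotations = {n_poses:.2e} poses")
```

Only 20,000 of these poses are kept, which is fewer than one in a million.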

Step 2: filtering decoys

Following the procedure detailed elsewhere,[8][9][10] for each complex we compute the effective desolvation and electrostatic binding affinity between receptor and ligand. We then retain the 500 complexes with the best desolvation energy[11] and the 1,500 complexes with the best electrostatic energy[12], for a total of 2,000 candidate complexes.
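The selection described above can be sketched as follows. This is a minimal illustration, not the production filter: the `Decoy` record and the `filter_decoys` function are hypothetical names, lower energies are assumed to be better, and a decoy already kept for its desolvation energy is assumed to be skipped in the electrostatic pass so that the two lists add up to 2,000 distinct complexes.

```python
from typing import List, NamedTuple


class Decoy(NamedTuple):
    """Hypothetical record for one docked complex."""
    decoy_id: int
    e_desolv: float  # desolvation energy (lower is better)
    e_elec: float    # electrostatic energy (lower is better)


def filter_decoys(decoys: List[Decoy], n_desolv: int = 500, n_elec: int = 1500) -> List[Decoy]:
    """Keep the n_desolv best decoys by desolvation and the n_elec best by electrostatics."""
    by_desolv = sorted(decoys, key=lambda d: d.e_desolv)[:n_desolv]
    chosen = {d.decoy_id for d in by_desolv}
    # Assumption: decoys already selected above are not counted again here,
    # so the result holds n_desolv + n_elec distinct decoys when enough exist.
    remaining = [d for d in decoys if d.decoy_id not in chosen]
    by_elec = sorted(remaining, key=lambda d: d.e_elec)[:n_elec]
    return by_desolv + by_elec
```

Calling `filter_decoys(decoys)` on the 20,000 DOT structures would return the 2,000 candidates passed on to clustering.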

Step 3: clustering decoys

We cluster the filtered complexes using a pairwise RMS deviation (RMSD) criterion (see below), and retain the twenty-five clusters with the highest number of neighbors.[13]

The complexes are clustered in either of two ways:

  1. using an all Cα RMSD criterion and a 10 Å cutoff; and
  2. using a Cα binding site RMSD criterion and a cutoff radius of 7 Å.

All clustering is done in a hierarchical manner such that no overlaps occur between distinct clusters.
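Both criteria rest on the same quantity, the Cα RMSD between two ligand poses. A minimal sketch is given below, assuming that both poses list the same Cα atoms in the same order and are expressed in the receptor's coordinate frame, so that no superposition is needed. The function name `ca_rmsd` is illustrative.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def ca_rmsd(pose_a: Sequence[Coord], pose_b: Sequence[Coord]) -> float:
    """RMSD between two equally ordered lists of Cα coordinates (no superposition)."""
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and contain the same number of atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b)
    )
    return math.sqrt(squared / len(pose_a))
```

For the all-Cα criterion the lists would cover every ligand residue; for the binding-site criterion they would cover only the interface residues.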

Pairwise RMSD clustering

The top 2,000 energetically favourable structures are then clustered on the basis of a pairwise binding site root mean squared deviation (RMSD) criterion. For each of the 2,000 structures, the residues of the moving molecule (designated as the ligand) that have at least one atom within 10 Å of any atom of the still molecule (designated as the receptor) are recorded into a list. Then, the distance between the Cα of each of those residues and the Cα of the corresponding residues on each of the 2,000 ligands is calculated and stored into a matrix. Clusters are then formed by selecting the ligand that has the most neighbours below a previously selected clustering radius. Each member within the cluster is then eliminated from the matrix to avoid overlaps between clusters. This is repeated until at least 30 clusters are formed. The ligand with the most neighbours is the cluster centre, and is the representative structure for the cluster. The top cluster centres are then minimized in CHARMM[14] in the presence of the receptor, and concatenated into a PDB in NMR format.
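The greedy cluster selection described above can be sketched as follows. This is an illustration under stated assumptions rather than the code we run: `greedy_cluster` is a hypothetical name, the input is a precomputed symmetric matrix of pairwise RMSDs, ties are broken in favour of the lowest index, and the loop stops once `max_clusters` clusters exist or no structures remain.

```python
from typing import List, Sequence, Tuple


def greedy_cluster(
    dist: Sequence[Sequence[float]], radius: float, max_clusters: int = 30
) -> List[Tuple[int, List[int]]]:
    """Return (centre, members) pairs; each structure belongs to at most one cluster."""
    remaining = set(range(len(dist)))
    clusters: List[Tuple[int, List[int]]] = []
    while remaining and len(clusters) < max_clusters:
        best_centre, best_members = -1, []
        for i in sorted(remaining):
            # Neighbours of i: unassigned structures closer than the radius (i itself included).
            members = [j for j in remaining if j == i or dist[i][j] < radius]
            if len(members) > len(best_members):
                best_centre, best_members = i, members
        clusters.append((best_centre, sorted(best_members)))
        # Remove the whole cluster so that later clusters cannot overlap with it.
        remaining -= set(best_members)
    return clusters
```

Recomputing the neighbour lists on every pass is quadratic per cluster, which is acceptable for 2,000 structures but would need caching for much larger sets.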

Step 4: refinement and discrimination of native-like clusters

Using 10 representative structures from each cluster, the smooth docking algorithm[15] is used to optimize our free energy function around each cluster. We submit the top ranked complexes from those clusters that converge to the lowest free energies as estimated by Eq.1:

ΔG = E_elec + E_desolv + E_vdw   (Eq.1)
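As a small illustration of how Eq.1 is used for ranking, the sketch below sums the three terms for each candidate and orders the candidates from lowest to highest ΔG. The function names and the input layout (a dictionary of energy triples) are assumptions made for the example, and any energies passed in are placeholders rather than real results.

```python
from typing import Dict, List, Tuple


def free_energy(e_elec: float, e_desolv: float, e_vdw: float) -> float:
    """Eq.1: sum of the electrostatic, desolvation and van der Waals terms."""
    return e_elec + e_desolv + e_vdw


def rank_by_free_energy(candidates: Dict[str, Tuple[float, float, float]]) -> List[str]:
    """Return candidate names ordered from lowest to highest ΔG."""
    return sorted(candidates, key=lambda name: free_energy(*candidates[name]))
```

The first name in the returned list corresponds to the candidate with the lowest estimated free energy.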

Technical details

The algorithm described above is implemented in a combination of Fortran 77 and C. Input data and results are passed through multiple pipes as a series of I/O streams using Perl, awk, sed, and bash scripts, all of which are controlled via makefiles and make extensive use of regular expressions.

Since billions of calculations (a minimum of 128³ × 13,000 ≈ 2.7 × 10¹⁰) are needed for each receptor/ligand complex, the main (C) code was optimised to run in parallel on a dedicated cluster of 256 CPUs (using the MPICH implementation of MPI).
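One simple way to spread such a workload over the CPUs is to give each process a contiguous block of rotations. The sketch below shows that partitioning in plain Python. It is an assumption made for illustration: the text above does not state how the C code divides the work, and `rotation_slice` is a hypothetical helper, not part of the actual code base.

```python
def rotation_slice(rank: int, n_workers: int = 256, n_rotations: int = 13_000) -> range:
    """Return the block of rotation indices handled by worker `rank` (0-based)."""
    if not 0 <= rank < n_workers:
        raise ValueError("rank must be in [0, n_workers)")
    base, extra = divmod(n_rotations, n_workers)
    # The first `extra` workers take one additional rotation each,
    # so block sizes never differ by more than one.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)
```

With 13,000 rotations and 256 workers, every process handles either 50 or 51 rotations, each over the full translational grid.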

The SmoothDock Server

The algorithm above is available through a web server, which I developed in January 2005. The server provides fully automated protein–protein docking.

Notes

  • low affinity complexes: K_d < nM

See also

Concepts

References

  1. Comeau SR, Gatchell DW, Vajda S, Camacho CJ (2004). ClusPro: a fully automated algorithm for protein-protein docking. Nucleic Acids Res, 32:W96-9.
  2. Kozakov D, Brenke R, Comeau SR, Vajda S (2006). PIPER: An FFT-based protein docking program with pairwise potentials. Proteins.
  3. Prasad JC, Vajda S, Camacho CJ (2004). Consensus alignment server for reliable comparative modeling with distant templates. Nucleic Acids Res, 32:W50-4.
  4. Mandell JG, Roberts VA, Pique ME, Kotlovyi V, Mitchell JC, Tsigelny NE I, Ten Eyck LF (2001). Protein docking using continuum electrostatics and geometric fit. Protein Eng, 14:105–113.
  5. Camacho CJ, Kimura SR, DeLisi C, Vajda S (2000). Kinetics of desolvation-mediated protein-protein binding. Biophys J, 78(3):1094-1105.
  6. Ten Eyck LF, Mandell J, Roberts VA, Pique ME (1995). Surveying molecular interactions with DOT. In: Hayes A, Simmons M, editors. Proceedings of the 1995 ACM/IEEE Supercomputing Conference. New York: ACM Press.
  7. Katchalski-Katzir E, Shariv I, Eisenstein M, Friesem A, Aflalo C, Vakser I (1992). Molecular surface recognition: determination of geometric fit between proteins and their ligands by correlation techniques. Proc Natl Acad Sci USA, 89:2195-2199.
  8. Camacho C, Gatchell D, Kimura R, Vajda S (2000). Scoring docked conformations generated by rigid body protein-protein docking. Proteins, 40:525-537.
  9. Gatchell D, Vajda S, Camacho CJ. Sampling, clustering, refinement and discrimination of protein interactions using SmoothDock. To be Submitted.
  10. Camacho CJ, Weng Z, Vajda S, DeLisi C (1999). Free energy landscapes of encounter complexes in protein-protein association. Biophys J, 76:1166-1178.
  11. Zhang C, Vasmatzis G, Cornette JL (1997). Determination of atomic desolvation energies from the structures of crystallized proteins. J Mol Biol, 267:707-726.
  12. Camacho C, Gatchell D, Kimura R, Vajda S (2000). Scoring docked conformations generated by rigid body protein-protein docking. Proteins, 40:525-537.
  13. Gatchell D, Vajda S, Camacho CJ. Sampling, clustering, refinement and discrimination of protein interactions using SmoothDock. To be Submitted.
  14. Brooks BR, Bruccoleri RE, Olafson BD, States DJ, Swaminathan S, Karplus M (1983). CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. J Comput Chem, 4:187–217.
  15. Camacho CJ, Vajda S (2001). Protein docking along smooth association pathways. Proc Natl Acad Sci USA, 98:10636-10641.

External links