Difference between revisions of "SmoothDock"

From Christoph's Personal Wiki
(The ''SmoothDock'' algorithm)
Line 4: Line 4:
 
The ''SmoothDock'' algorithm comprises four steps:
 
The ''SmoothDock'' algorithm comprises four steps:
 
# perform rigid-body docking using the program [[DOT]], keeping the top 20,000 structures as ranked by surface complementarity;
 
# perform rigid-body docking using the program [[DOT]], keeping the top 20,000 structures as ranked by surface complementarity;
# re-rank these structures according to a free energy estimate that includes both desolvation and electrostatics and retain the top 2,000 complexes;
+
# re-rank these structures according to a free energy estimate that includes both desolvation<ref>Camacho CJ, Kimura SR, DeLisi C, Vajda S (2000). Kinetics of desolvation-mediated protein-protein binding. ''Biophys J'', '''78(3)''':1094-1105.</ref> and electrostatics and retain the top 2,000 complexes;
 
# cluster the filtered complexes using a pairwise RMS deviation criterion; and
 
# cluster the filtered complexes using a pairwise RMS deviation criterion; and
 
# the twenty-five largest clusters are subject to a smooth docking discrimination algorithm where van der Waals forces are taken into account.
 
# the twenty-five largest clusters are subject to a smooth docking discrimination algorithm where van der Waals forces are taken into account.
Line 11: Line 11:
 
Rigid-body docking using the Fast-Fourier Transform (FFT) based program [[DOT]]<ref>Ten Eyck LF, Mandell J, Roberts VA, Pique ME (1995). Surveying molecular interactions with DOT. ''In: Hayes A, Simmons M, editors. Proceedings of the 1995 ACM/IEEE Supercomputing Conference. New York: ACM Press''.</ref><ref>Katchalski-Katzir E, Shariv I, Eisenstein M, Friesem A, Aflalo C, Vakser I (1992). Molecular surface recognition: determinination of geometric fit between proteins and their ligands by correlation techniques. ''Proc Natl Acad Sci USA'', '''89''':2195-2199.</ref> is performed for each receptor/ligand target. The output of this program is the top 20,000 receptor/ligand complexes sampled by the DOT program and ranked according to surface complementarity. Any experimental constraint on the binding area is also imposed here.
 
Rigid-body docking using the Fast-Fourier Transform (FFT) based program [[DOT]]<ref>Ten Eyck LF, Mandell J, Roberts VA, Pique ME (1995). Surveying molecular interactions with DOT. ''In: Hayes A, Simmons M, editors. Proceedings of the 1995 ACM/IEEE Supercomputing Conference. New York: ACM Press''.</ref><ref>Katchalski-Katzir E, Shariv I, Eisenstein M, Friesem A, Aflalo C, Vakser I (1992). Molecular surface recognition: determinination of geometric fit between proteins and their ligands by correlation techniques. ''Proc Natl Acad Sci USA'', '''89''':2195-2199.</ref> is performed for each receptor/ligand target. The output of this program is the top 20,000 receptor/ligand complexes sampled by the DOT program and ranked according to surface complementarity. Any experimental constraint on the binding area is also imposed here.
  
Although DOT allows for the use of an electrostatic potential in the scoring function, we base the scoring solely on the surface complementarity between the two structures. DOT is run on a 128 Å x 128 Å x 128 Å grid, using a grid spacing of 1 Å. Using a pre-defined list of 13 000 rotations, over 2.7 x 1010 structures are evaluated, retaining 20 000 structures with the best surface complementarity scores, which are then further subjected to the empirical free energy filtering algorithm described below.
+
Although DOT allows for the use of an electrostatic potential in the scoring function, we base the scoring solely on the surface complementarity between the two structures. DOT is run on a 128 Å x 128 Å x 128 Å grid, using a grid spacing of 1 Å. Using a pre-defined list of 13,000 rotations, over 2.7 x 10<sup>10</sup> structures are evaluated, retaining 20,000 structures with the best surface complementarity scores, which are then further subjected to the empirical free energy filtering algorithm described below.
  
 
=== Step 2: filtering decoys ===
 
=== Step 2: filtering decoys ===
Line 20: Line 20:
  
 
The complexes are clustered in either of two ways:
 
The complexes are clustered in either of two ways:
# using an all C_&alpha; RMSD criterion and a 10 Å cutoff; and
+
# using an all C&alpha; RMSD criterion and a 10 Å cutoff; and
# using a C_&alpha; binding site RMSD criterion and a cutoff radius of 7 Å.
+
# using a C&alpha; binding site RMSD criterion and a cutoff radius of 7 Å.
  
 
All clustering is done in a hierarchical manner such that no overlaps occurred between distinct clusters.
 
All clustering is done in a hierarchical manner such that no overlaps occurred between distinct clusters.
  
 
==== Pairwise RMSD clustering ====
 
==== Pairwise RMSD clustering ====
The top 2000 energetically favorable structures are then clustered on the basis of a pairwise binding site root mean squared deviation (RMSD) criterion. For each of the 2000 structures, the residues of the moving molecule (designated as the ligand) that have at least one atom within 10 Å of any atom of the still molecule (designated as the receptor) are recorded into a list. Then, the distance between the C_&alpha; of each of those residues and the C_&alpha; of the corresponding residues on each of the 2000 ligands is calculated and stored into a matrix. Clusters are then formed by selecting the ligand that has the most neighbors below a previously selected clustering radius. Each member within the cluster is then eliminated from the matrix to avoid overlaps between clusters. This is repeated until at least 30 clusters are formed. The ligand with the most neighbors is the cluster center, and is the representative structure for the cluster. The top cluster centers are then CHARMm<ref>Brooks BR, Bruccoleri RE, Olafson BD, States DJ, Swaminathan S, Karplus M (1983). CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. ''J Comput Chem'', '''4''':187–217.</ref> minimized in the presence of the receptor, and concatenated into PDB NMR format.
+
The top 2000 energetically favorable structures are then clustered on the basis of a pairwise binding site root mean squared deviation (RMSD) criterion. For each of the 2000 structures, the residues of the moving molecule (designated as the ligand) that have at least one atom within 10 Å of any atom of the still molecule (designated as the receptor) are recorded into a list. Then, the distance between the C&alpha; of each of those residues and the C&alpha; of the corresponding residues on each of the 2000 ligands is calculated and stored into a matrix. Clusters are then formed by selecting the ligand that has the most neighbors below a previously selected clustering radius. Each member within the cluster is then eliminated from the matrix to avoid overlaps between clusters. This is repeated until at least 30 clusters are formed. The ligand with the most neighbors is the cluster center, and is the representative structure for the cluster. The top cluster centers are then CHARMm<ref>Brooks BR, Bruccoleri RE, Olafson BD, States DJ, Swaminathan S, Karplus M (1983). CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. ''J Comput Chem'', '''4''':187–217.</ref> minimized in the presence of the receptor, and concatenated into PDB NMR format.
  
 
=== Step 4: refinement and discrimination of native-like clusters ===
 
=== Step 4: refinement and discrimination of native-like clusters ===
Line 35: Line 35:
 
The above algorithm can be used via a [http://structure.pitt.edu/servers/smoothdock/ webserver], which I developed in January 2005. It is a fully automated algorithm for protein–protein docking via a webserver.
 
The above algorithm can be used via a [http://structure.pitt.edu/servers/smoothdock/ webserver], which I developed in January 2005. It is a fully automated algorithm for protein–protein docking via a webserver.
  
Note: Since tens of billions of calculations are needed for ''each'' receptor/ligand complex, we run the algorithm on a dedicated cluster (256 CPUs).
+
Note: Since billions of calculations (a minimum of 2.7 x 10<sup>10</sup>) are needed for ''each'' receptor/ligand complex, we run the algorithm on a dedicated cluster (256 CPUs).
  
 
== Notes ==
 
== Notes ==
Line 42: Line 42:
 
== See also ==
 
== See also ==
 
* [[CAPRI]]
 
* [[CAPRI]]
 +
* [http://en.wikipedia.org/wiki/Electrostatics Electrostatics]
 +
* [http://en.wikipedia.org/wiki/Permittivity Permittivity]
 +
* [http://en.wikipedia.org/wiki/Coulomb%27s_law Coulomb's law]
 +
* [http://en.wikipedia.org/wiki/Dielectric Dielectric]
 +
* [http://en.wikipedia.org/wiki/Dielectric_constant Dielectric constant]
 +
* [http://en.wikipedia.org/wiki/Fourier_transform Fourier transform]
 +
* [http://en.wikipedia.org/wiki/Fast_Fourier_transform Fast Fourier transform]
 +
* [http://en.wikipedia.org/wiki/RMSD RMSD]
  
 
== References ==
 
== References ==
<references/>
+
<small><references/></small>
 
== External links ==
 
== External links ==
 
* [http://structure.pitt.edu/servers/smoothdock/ SmoothDock Server]
 
* [http://structure.pitt.edu/servers/smoothdock/ SmoothDock Server]

Revision as of 04:48, 31 July 2006

SmoothDock is a fully automated algorithm for finding physical interactions between proteins involved in common cellular functions. It was developed by Carlos J. Camacho and Christoph Champ at the University of Pittsburgh. It is based upon a previous algorithm, ClusPro[1], developed by Camacho and Steven R. Comeau at Boston University (note: ClusPro is also based on a previous algorithm, Consensus[2]).

The SmoothDock algorithm

The SmoothDock algorithm comprises four steps:

  1. perform rigid-body docking using the program DOT, keeping the top 20,000 structures as ranked by surface complementarity;
  2. re-rank these structures according to a free energy estimate that includes both desolvation[3] and electrostatics and retain the top 2,000 complexes;
  3. cluster the filtered complexes using a pairwise RMS deviation criterion; and
  4. subject the twenty-five largest clusters to a smooth docking discrimination algorithm that takes van der Waals forces into account.

Step 1: rigid-body docking

Rigid-body docking using the Fast-Fourier Transform (FFT) based program DOT[4][5] is performed for each receptor/ligand target. The output of this program is the top 20,000 receptor/ligand complexes sampled by the DOT program and ranked according to surface complementarity. Any experimental constraint on the binding area is also imposed here.

Although DOT allows for the use of an electrostatic potential in the scoring function, we base the scoring solely on the surface complementarity between the two structures. DOT is run on a 128 Å x 128 Å x 128 Å grid, using a grid spacing of 1 Å. Using a pre-defined list of 13,000 rotations, over 2.7 x 10^10 structures are evaluated, retaining the 20,000 structures with the best surface complementarity scores, which are then further subjected to the empirical free energy filtering algorithm described below.
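The pose count quoted above can be checked with a few lines of arithmetic. This is only a sketch: it assumes that every rotation is combined with one ligand translation per grid point, which is an assumption about how the search space is enumerated and not a statement taken from the DOT documentation. All variable names are illustrative.

```python
# Back-of-the-envelope count of the rigid-body poses evaluated in Step 1.
# Assumption: one translation per grid point for every rotation.
GRID_POINTS_PER_AXIS = 128   # 128 Å box sampled at 1 Å spacing
N_ROTATIONS = 13_000         # size of the pre-defined rotation list
N_RETAINED = 20_000          # structures kept after Step 1

n_translations = GRID_POINTS_PER_AXIS ** 3
n_poses = n_translations * N_ROTATIONS
fraction_retained = N_RETAINED / n_poses

print(f"translations per rotation: {n_translations:,}")
print(f"poses evaluated:           {n_poses:,}")
print(f"fraction retained:         {fraction_retained:.1e}")
```

Under this assumption the product is 27,262,976,000, consistent with the "over 2.7 x 10^10" figure, and fewer than one pose in a million survives to Step 2.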

Step 2: filtering decoys

Following the procedure detailed elsewhere[6][7][8], for each complex we compute the effective desolvation and electrostatic binding affinity between receptor and ligand. We then retain the 500 complexes with the best desolvation energy[9] and the 1,500 complexes with the best electrostatic energy[10], for a total of 2,000 candidate complexes.
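The filtering step can be sketched as follows. This is a minimal illustration, not the server's code: the `Decoy` type and `filter_decoys` function are hypothetical, lower energies are assumed to be more favorable, and the electrostatics list is assumed to skip decoys already chosen for desolvation so that the two lists add up to exactly 2,000. The random energies are placeholders, not real data.

```python
import random
from typing import NamedTuple


class Decoy(NamedTuple):
    decoy_id: int
    desolvation: float     # assumed: lower is more favorable
    electrostatics: float  # assumed: lower is more favorable


def filter_decoys(decoys, n_desolv=500, n_elec=1500):
    """Keep the n_desolv best-desolvation and n_elec best-electrostatics decoys."""
    by_desolv = sorted(decoys, key=lambda d: d.desolvation)[:n_desolv]
    chosen = {d.decoy_id for d in by_desolv}
    # Assumption: decoys already selected above are excluded here,
    # so the two lists never overlap.
    remaining = [d for d in decoys if d.decoy_id not in chosen]
    by_elec = sorted(remaining, key=lambda d: d.electrostatics)[:n_elec]
    return by_desolv + by_elec


# Placeholder input: 20,000 decoys with made-up energies.
rng = random.Random(0)
decoys = [Decoy(i, rng.uniform(-10, 10), rng.uniform(-10, 10))
          for i in range(20_000)]
kept = filter_decoys(decoys)
print(len(kept))
```

Whether overlapping decoys are handled this way in practice is not stated in the text; taking the plain union of the two lists would give slightly fewer than 2,000 candidates.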

Step 3: clustering decoys

We cluster the filtered complexes using a pairwise RMS deviation (RMSD) criterion (see below), and retain the twenty-five clusters with the highest number of neighbors[11].

The complexes are clustered in either of two ways:

  1. using an all Cα RMSD criterion and a 10 Å cutoff; and
  2. using a Cα binding site RMSD criterion and a cutoff radius of 7 Å.

All clustering is done in a hierarchical manner such that no overlaps occur between distinct clusters.

Pairwise RMSD clustering

The top 2,000 energetically favorable structures are then clustered on the basis of a pairwise binding site root-mean-square deviation (RMSD) criterion. For each of the 2,000 structures, the residues of the moving molecule (designated as the ligand) that have at least one atom within 10 Å of any atom of the still molecule (designated as the receptor) are recorded in a list. Then, the distance between the Cα of each of those residues and the Cα of the corresponding residue on each of the 2,000 ligands is calculated and stored in a matrix. Clusters are then formed by selecting the ligand that has the most neighbors within a previously selected clustering radius. Each member of the cluster is then eliminated from the matrix to avoid overlaps between clusters. This is repeated until at least 30 clusters are formed. The ligand with the most neighbors is the cluster center and serves as the representative structure for the cluster. The top cluster centers are then minimized with CHARMm[12] in the presence of the receptor and concatenated into PDB NMR format.
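The greedy procedure described above can be sketched in a few lines. This is an illustration under stated assumptions, not the production code: the function names `rmsd` and `greedy_cluster` are hypothetical, no superposition is applied (all decoys are assumed to share the receptor's coordinate frame), ties between candidate centers are broken by lowest index, and the loop stops after `max_clusters` clusters or when no decoys remain.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) C-alpha positions."""
    squared = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
                  for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def greedy_cluster(dist, radius, max_clusters=30):
    """Cluster decoys from a symmetric distance matrix (list of lists).

    Returns a list of (center_index, sorted_member_indices) tuples.
    """
    remaining = set(range(len(dist)))
    clusters = []
    while remaining and len(clusters) < max_clusters:
        # Neighbors of each remaining decoy, counting the decoy itself.
        neighbors = {i: [j for j in remaining if dist[i][j] < radius]
                     for i in remaining}
        # The decoy with the most neighbors becomes the cluster center.
        center = max(sorted(remaining), key=lambda i: len(neighbors[i]))
        members = sorted(neighbors[center])
        clusters.append((center, members))
        # Remove the members so that clusters cannot overlap.
        remaining -= set(members)
    return clusters


# Toy input: six single-atom "ligands" placed along the x axis (in Å).
poses = [[(x, 0.0, 0.0)] for x in (0.0, 1.0, 2.0, 20.0, 21.0, 50.0)]
dist = [[rmsd(a, b) for b in poses] for a in poses]
clusters = greedy_cluster(dist, radius=7.0)
print(clusters)
```

With the toy input, the three poses near the origin form the largest cluster, the pair near 20 Å forms the second, and the isolated pose at 50 Å ends up alone. Recomputing the neighbor lists on every pass keeps the sketch short; for 2,000 structures a precomputed neighbor table would be cheaper.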

Step 4: refinement and discrimination of native-like clusters

Using 10 representative structures from each cluster, the smooth docking algorithm[13] is used to optimize our free energy function around each cluster. We submit the top-ranked complexes from those clusters that converge to the lowest free energies as estimated by Eq. 1:

ΔG = E_elec + E_desolv + E_vdw   (Eq.1)
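As a small illustration of how Eq. 1 drives the final ranking, the sketch below sums the three terms for each refined cluster and sorts by the result. The cluster names and energy values are invented for the example, and the units are left unspecified because the text does not state them.

```python
def binding_free_energy(e_elec, e_desolv, e_vdw):
    """Eq. 1: sum of electrostatic, desolvation and van der Waals terms."""
    return e_elec + e_desolv + e_vdw


# Hypothetical refined energies per cluster: (E_elec, E_desolv, E_vdw).
refined = {
    "cluster_a": (-12.0, -4.5, -30.0),
    "cluster_b": (-8.0, -6.0, -35.5),
    "cluster_c": (-15.0, -1.0, -20.0),
}

# Lowest free energy first.
ranked = sorted(refined, key=lambda name: binding_free_energy(*refined[name]))
for name in ranked:
    print(name, binding_free_energy(*refined[name]))
```

In this made-up example cluster_b ranks first even though cluster_c has the most favorable electrostatics, because the van der Waals term dominates the sum.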

The SmoothDock Server

The above algorithm is available through a webserver, which I developed in January 2005. The server provides fully automated protein–protein docking.

Note: Since billions of calculations (a minimum of 2.7 x 10^10) are needed for each receptor/ligand complex, we run the algorithm on a dedicated cluster (256 CPUs).

Notes

  • low affinity complexes: K_d < nM

See also

  • CAPRI
  • Electrostatics
  • Permittivity
  • Coulomb's law
  • Dielectric
  • Dielectric constant
  • Fourier transform
  • Fast Fourier transform
  • RMSD

References

  1. Comeau SR, Gatchell DW, Vajda S, Camacho CJ (2004). ClusPro: a fully automated algorithm for protein-protein docking. Nucleic Acids Res, 32:W96-9.
  2. Prasad JC, Vajda S, Camacho CJ (2004). Consensus alignment server for reliable comparative modeling with distant templates. Nucleic Acids Res, 32:W50-4.
  3. Camacho CJ, Kimura SR, DeLisi C, Vajda S (2000). Kinetics of desolvation-mediated protein-protein binding. Biophys J, 78(3):1094-1105.
  4. Ten Eyck LF, Mandell J, Roberts VA, Pique ME (1995). Surveying molecular interactions with DOT. In: Hayes A, Simmons M, editors. Proceedings of the 1995 ACM/IEEE Supercomputing Conference. New York: ACM Press.
  5. Katchalski-Katzir E, Shariv I, Eisenstein M, Friesem A, Aflalo C, Vakser I (1992). Molecular surface recognition: determination of geometric fit between proteins and their ligands by correlation techniques. Proc Natl Acad Sci USA, 89:2195-2199.
  6. Camacho C, Gatchell D, Kimura R, Vajda S (2000). Scoring docked conformations generated by rigid body protein-protein docking. Proteins, 40:525-537.
  7. Gatchell D, Vajda S, Camacho CJ. Sampling, clustering, refinement and discrimination of protein interactions using SmoothDock. To be Submitted.
  8. Camacho CJ, Weng Z, Vajda S, DeLisi C (1999). Free energy landscapes of encounter complexes in protein-protein association. Biophys J, 76:1166-1178.
  9. Zhang C, Vasmatzis G, Cornette JL (1997). Determination of atomic desolvation energies from the structures of crystallized proteins. J Mol Biol, 267:707-726.
  10. Camacho C, Gatchell D, Kimura R, Vajda S (2000). Scoring docked conformations generated by rigid body protein-protein docking. Proteins, 40:525-537.
  11. Gatchell D, Vajda S, Camacho CJ. Sampling, clustering, refinement and discrimination of protein interactions using SmoothDock. To be Submitted.
  12. Brooks BR, Bruccoleri RE, Olafson BD, States DJ, Swaminathan S, Karplus M (1983). CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. J Comput Chem, 4:187–217.
  13. Camacho CJ, Vajda S (2001). Protein docking along smooth association pathways. Proc Natl Acad Sci USA, 98:10636-10641.

External links

  • SmoothDock Server (http://structure.pitt.edu/servers/smoothdock/)