SmoothDock

From Christoph's Personal Wiki

Revision as of 07:40, 20 April 2007

SmoothDock is a fully automated algorithm for finding physical interactions between proteins involved in common cellular functions. It was developed by Carlos J. Camacho and Christoph Champ at the University of Pittsburgh. It is based upon a previous algorithm, ClusPro[1], developed by Camacho and Steven R. Comeau at Boston University (note: ClusPro is also based on a previous algorithm, Consensus[2]).

The SmoothDock algorithm

The SmoothDock algorithm comprises four steps:

  1. perform rigid-body docking using the program DOT, keeping the top 20,000 structures as ranked by surface complementarity;
  2. re-rank these structures according to a free energy estimate that includes both desolvation[3] and electrostatics and retain the top 2,000 complexes;
  3. cluster the filtered complexes using a pairwise RMS deviation criterion; and
  4. subject the twenty-five largest clusters to a smooth docking discrimination algorithm that takes van der Waals interactions into account.
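The four steps can be read as a funnel in which each stage consumes the previous stage's output. The Python sketch below shows only that data flow. The function and parameter names are hypothetical, and the callables are placeholders for DOT, the free-energy filter, RMSD clustering and smooth-docking refinement, not the actual Fortran/C implementation.

```python
from typing import Callable, List, Sequence, TypeVar

Pose = TypeVar("Pose")

def smoothdock_funnel(
    dock: Callable[[], Sequence[Pose]],
    rerank: Callable[[Sequence[Pose]], Sequence[Pose]],
    cluster: Callable[[Sequence[Pose]], List[List[Pose]]],
    refine: Callable[[List[Pose]], float],
    n_clusters: int = 25,
) -> List[List[Pose]]:
    """Hypothetical driver: chain the four stages and rank the surviving clusters."""
    poses = dock()            # step 1: rigid-body poses ranked by surface complementarity
    filtered = rerank(poses)  # step 2: poses kept by the desolvation/electrostatics filter
    # step 3: keep the n_clusters largest clusters (most members first)
    largest = sorted(cluster(filtered), key=len, reverse=True)[:n_clusters]
    # step 4: `refine` returns an estimated free energy for a cluster; lower is better
    return sorted(largest, key=refine)
```

Passing the stages in as callables keeps the sketch independent of any particular docking or energy code; each placeholder can be swapped for a real implementation.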

Step 1: rigid-body docking

Rigid-body docking with the fast Fourier transform (FFT)-based program DOT[4][5] is performed for each receptor/ligand target. DOT outputs the top 20,000 sampled receptor/ligand complexes, ranked by surface complementarity. Any experimental constraints on the binding site are also imposed at this step.
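Surface complementarity in FFT docking is scored by correlating a receptor grid with a ligand grid, following the approach of Katchalski-Katzir et al.[5]. The 2-D sketch below evaluates that correlation directly for every cyclic translation of the ligand; an FFT-based program such as DOT obtains the same kind of scores for all translations in one pass, and rotations would form an outer loop. The grid size, the one-cell surface layer, the 4 × 4 receptor and the penalty `RHO = -15` are illustrative assumptions; this is not DOT's actual scoring function.

```python
from itertools import product

RHO = -15  # assumed penalty for a ligand cell overlapping the receptor interior
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def receptor_grid(cells, n):
    """Receptor function on an n x n grid: +1 on surface-layer cells, RHO inside, 0 outside."""
    occupied = set(cells)
    grid = [[0] * n for _ in range(n)]
    for x, y in occupied:
        # a receptor cell with any empty neighbour belongs to the surface layer
        on_surface = any(((x + dx) % n, (y + dy) % n) not in occupied for dx, dy in NEIGHBOURS)
        grid[x][y] = 1 if on_surface else RHO
    return grid

def correlation_score(grid, ligand_cells, shift):
    """Sum of the receptor function under the translated ligand (cyclic, as in an FFT correlation)."""
    n = len(grid)
    sx, sy = shift
    return sum(grid[(x + sx) % n][(y + sy) % n] for x, y in ligand_cells)

def scan_translations(grid, ligand_cells):
    """Score every translation directly; an FFT would yield all n*n scores at once."""
    n = len(grid)
    return {s: correlation_score(grid, ligand_cells, s) for s in product(range(n), repeat=2)}
```

For example, with `receptor_grid([(x, y) for x in range(2, 6) for y in range(2, 6)], 10)` and a two-cell ligand `[(0, 0), (0, 1)]`, the best translations place both ligand cells on the receptor's surface layer (score 2), while translations that bury the ligand in the interior are heavily penalised.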

Although DOT allows for the use of an electrostatic potential in the scoring function, we base the scoring solely on the surface complementarity between the two structures. DOT is run on a 128 Å × 128 Å × 128 Å grid with a grid spacing of 1 Å. Using a pre-defined list of 13,000 rotations, over 2.7 × 10^10 structures are evaluated; the 20,000 structures with the best surface complementarity scores are retained and then subjected to the empirical free energy filtering algorithm described below.
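The quoted search-space size follows from the grid and rotation parameters: 128^3 = 2,097,152 translations for each of 13,000 rotations gives about 2.73 × 10^10 poses. The Python sketch below checks that arithmetic and shows one memory-bounded way to retain the 20,000 best-scoring poses from such a stream. The heap-based selection and all names are assumptions for illustration, not a description of how DOT stores its results.

```python
import heapq

GRID_POINTS = 128 ** 3  # translations on a 128 x 128 x 128 grid with 1 Å spacing
ROTATIONS = 13_000      # size of the pre-defined rotation set
KEEP = 20_000           # poses passed on to the free-energy filter

def search_space_size(grid_points=GRID_POINTS, rotations=ROTATIONS):
    """Rigid-body poses scored: every grid translation for every rotation."""
    return grid_points * rotations

def keep_best(scored_poses, k=KEEP):
    """Keep the k highest-scoring (score, pose) pairs, best first.

    For a generator input, heapq.nlargest consumes the stream once while keeping a
    heap of at most k entries, so the full list of scores is never materialised.
    """
    return heapq.nlargest(k, scored_poses, key=lambda item: item[0])
```

`search_space_size()` returns 27,262,976,000, consistent with the "over 2.7 × 10^10" figure in the text.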

Step 2: filtering decoys

Following the procedure detailed elsewhere[6][7][8], for each complex we compute the effective desolvation and electrostatic binding affinity between receptor and ligand. We then retain the 500 complexes with the best desolvation energy[9] and the 1,500 complexes with the best electrostatic energy[10], for a total of 2,000 candidate complexes.
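The text does not say how a complex that ranks highly on both energies is handled. The sketch below assumes that complexes already chosen on desolvation are skipped when filling the electrostatic quota, so that 500 + 1,500 distinct candidates are obtained, and that lower energies are more favourable. The function name and the input layout are hypothetical.

```python
def filter_candidates(energies, n_desolv=500, n_elec=1_500):
    """Pick candidates by desolvation and electrostatic energy (lower = more favourable).

    `energies` maps a pose id to a (desolvation, electrostatic) energy pair.
    Assumption: poses already chosen on desolvation are skipped when filling the
    electrostatic quota, so up to n_desolv + n_elec distinct poses are returned.
    """
    by_desolv = sorted(energies, key=lambda pid: energies[pid][0])
    chosen = by_desolv[:n_desolv]
    taken = set(chosen)
    by_elec = sorted(energies, key=lambda pid: energies[pid][1])
    chosen += [pid for pid in by_elec if pid not in taken][:n_elec]
    return chosen
```

Without the de-duplication step, a complex favoured by both terms would be counted twice and fewer than 2,000 distinct candidates would reach the clustering stage.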

Step 3: clustering decoys

We cluster the filtered complexes using a pairwise RMS deviation (RMSD) criterion (see below), and retain the twenty-five clusters with the highest number of neighbors[11].

The complexes are clustered in either of two ways:

  1. using an all Cα RMSD criterion and a 10 Å cutoff; and
  2. using a Cα binding site RMSD criterion and a cutoff radius of 7 Å.

All clustering is done in a hierarchical manner, so that distinct clusters never overlap.
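Both clustering variants rest on a Cα RMSD between ligand poses; they differ only in which residues enter the sum. The sketch below selects interface residues with the 10 Å any-atom cutoff described in the next subsection and computes a Cα RMSD over a chosen residue set. It assumes no superposition is needed because every pose shares the fixed receptor frame; the function names and data layouts are hypothetical.

```python
import math

def interface_residues(ligand_atoms, receptor_atoms, cutoff=10.0):
    """Ligand residues with at least one atom within `cutoff` Å of any receptor atom.

    ligand_atoms: list of (residue_id, (x, y, z)); receptor_atoms: list of (x, y, z).
    """
    selected = set()
    for res_id, atom in ligand_atoms:
        if res_id not in selected and any(math.dist(atom, r) <= cutoff for r in receptor_atoms):
            selected.add(res_id)
    return selected

def ca_rmsd(pose_a, pose_b, residues):
    """Cα RMSD between two ligand poses (residue_id -> Cα coordinate maps) over `residues`.

    No superposition is applied (assumption: all poses share the fixed receptor frame).
    Passing all ligand residues gives the all-Cα variant; passing the interface
    residues gives the binding-site variant.
    """
    residues = list(residues)
    total = sum(math.dist(pose_a[r], pose_b[r]) ** 2 for r in residues)
    return math.sqrt(total / len(residues))
```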

Pairwise RMSD clustering

The top 2,000 energetically favourable structures are then clustered on the basis of a pairwise binding-site root mean square deviation (RMSD) criterion. For each of the 2,000 structures, the residues of the moving molecule (the ligand) that have at least one atom within 10 Å of any atom of the stationary molecule (the receptor) are recorded in a list. The RMSD between the Cα atoms of these residues and the Cα atoms of the corresponding residues in every other ligand pose is then calculated and stored in a matrix. Clusters are formed by selecting the ligand with the most neighbours below a preset clustering radius; this ligand is the cluster centre and the representative structure of the cluster. All members of the cluster are then removed from the matrix to avoid overlaps between clusters, and the procedure is repeated until at least 30 clusters have been formed. The top cluster centres are then minimized with CHARMM[12] in the presence of the receptor and concatenated into a single file in PDB NMR (multi-model) format.
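A minimal sketch of the greedy neighbour-count procedure just described, assuming the pairwise RMSD matrix has already been computed and reading "below the clustering radius" as strictly less than. Ties between equally populated candidates are broken by lowest pose index, which is our choice; the source does not specify a tie-break. Function names are hypothetical.

```python
def _neighbours(rmsd, i, pool, radius):
    """Poses in `pool` whose RMSD to pose i is below `radius` (i counts as its own neighbour)."""
    return {j for j in pool if rmsd[i][j] < radius}

def greedy_clusters(rmsd, radius, n_clusters=30):
    """Greedy neighbour-count clustering on a symmetric pairwise RMSD matrix.

    Repeatedly picks the unassigned pose with the most unassigned neighbours as a
    cluster centre, then removes its members so that clusters never overlap.
    Returns (centre, sorted members) pairs in the order they were formed.
    """
    pool = set(range(len(rmsd)))
    clusters = []
    while pool and len(clusters) < n_clusters:
        centre = max(sorted(pool), key=lambda i: len(_neighbours(rmsd, i, pool, radius)))
        members = _neighbours(rmsd, centre, pool, radius)
        clusters.append((centre, sorted(members)))
        pool -= members  # eliminate members from further consideration
    return clusters
```

Because neighbour counts can only shrink as poses are removed, clusters come out in non-increasing order of size, so keeping the first 25 of the 30 clusters keeps the most populated ones.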

Step 4: refinement and discrimination of native-like clusters

The smooth docking algorithm[13] is applied to 10 representative structures from each cluster to optimize our free energy function around that cluster. We then submit the top-ranked complexes from the clusters that converge to the lowest free energies, as estimated by Eq. 1:

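ΔG = E_elec + E_desolv + E_vdw   (Eq. 1)

where E_elec, E_desolv and E_vdw are the electrostatic, desolvation and van der Waals energy terms, respectively.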
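Eq. 1 is a plain sum of three energy terms. The sketch below evaluates it for refined structures and ranks clusters by the lowest ΔG reached by any of their members, which is one reading of "clusters that converge to the lowest free energies". The class and function names are hypothetical, and no particular energy units are implied.

```python
from dataclasses import dataclass

@dataclass
class Refined:
    """One refined structure (hypothetical record) with its Eq. 1 energy terms."""
    cluster_id: int
    e_elec: float
    e_desolv: float
    e_vdw: float

    @property
    def delta_g(self) -> float:
        """Eq. 1: ΔG = E_elec + E_desolv + E_vdw."""
        return self.e_elec + self.e_desolv + self.e_vdw

def rank_clusters(refined):
    """Order cluster ids by the lowest ΔG reached by any of their refined structures."""
    best = {}
    for r in refined:
        best[r.cluster_id] = min(best.get(r.cluster_id, float("inf")), r.delta_g)
    return sorted(best, key=best.get)
```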

Technical details

The algorithm described above is implemented in a combination of Fortran 77 and C. Input data and results are passed between stages through pipes as I/O streams processed by Perl, awk, sed and bash scripts, which make extensive use of regular expressions; the whole workflow is driven by makefiles.

Since more than 2.7 × 10^10 candidate structures must be scored for each receptor/ligand complex, the main (C) code was optimised to run in parallel on a dedicated cluster of 256 CPUs using MPICH, an implementation of the MPI message-passing standard.
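The source does not say how the work is divided among the 256 CPUs. One natural scheme, assumed here purely for illustration, is to give each MPI rank a contiguous block of the 13,000 rotations, since the translational scan for each rotation is independent of the others. The sketch computes such blocks in plain Python; in an MPI program each rank would take the block at its own rank index. The function name is hypothetical.

```python
def partition(n_items, n_workers):
    """Split item indices 0..n_items-1 into n_workers contiguous, near-equal blocks.

    Returns one half-open (start, stop) range per worker (e.g. per MPI rank);
    the first n_items % n_workers workers each receive one extra item.
    """
    base, extra = divmod(n_items, n_workers)
    ranges, start = [], 0
    for rank in range(n_workers):
        stop = start + base + (1 if rank < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges
```

With 13,000 rotations and 256 ranks, `partition(13_000, 256)` gives 51 rotations to each of the first 200 ranks and 50 to each of the remaining 56, so no rank does more than one extra rotation's worth of work.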

The SmoothDock Server

The above algorithm is available through a web server (http://structure.pitt.edu/servers/smoothdock/), which I developed in January 2005. The server performs fully automated protein–protein docking.

Notes

  • high-affinity complexes: K_d < 1 nM (sub-nanomolar)

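See also

  • CAPRI
  • Electrostatics
  • Permittivity
  • Coulomb's law
  • Dielectric
  • Dielectric constant
  • Fourier transform
  • Fast Fourier transform
  • Root mean square deviation (bioinformatics) (RMSD)
  • Official UCSF DOCK Web-site (http://dock.compbio.ucsf.edu/Contributed_Code/index.htm)
  • Dockumentation (http://wiki.compbio.ucsf.edu/wiki/index.php/Main_Page): a community-driven project to document the UCSF DOCK program.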

References

  1. Comeau SR, Gatchell DW, Vajda S, Camacho CJ (2004). ClusPro: a fully automated algorithm for protein-protein docking. Nucleic Acids Res, 32:W96-9.
  2. Prasad JC, Vajda S, Camacho CJ (2004). Consensus alignment server for reliable comparative modeling with distant templates. Nucleic Acids Res, 32:W50-4.
  3. Camacho CJ, Kimura SR, DeLisi C, Vajda S (2000). Kinetics of desolvation-mediated protein-protein binding. Biophys J, 78(3):1094-1105.
  4. Ten Eyck LF, Mandell J, Roberts VA, Pique ME (1995). Surveying molecular interactions with DOT. In: Hayes A, Simmons M, editors. Proceedings of the 1995 ACM/IEEE Supercomputing Conference. New York: ACM Press.
  5. Katchalski-Katzir E, Shariv I, Eisenstein M, Friesem A, Aflalo C, Vakser I (1992). Molecular surface recognition: determination of geometric fit between proteins and their ligands by correlation techniques. Proc Natl Acad Sci USA, 89:2195-2199.
  6. Camacho C, Gatchell D, Kimura R, Vajda S (2000). Scoring docked conformations generated by rigid body protein-protein docking. Proteins, 40:525-537.
  7. Gatchell D, Vajda S, Camacho CJ. Sampling, clustering, refinement and discrimination of protein interactions using SmoothDock. To be Submitted.
  8. Camacho CJ, Weng Z, Vajda S, DeLisi C (1999). Free energy landscapes of encounter complexes in protein-protein association. Biophys J, 76:1166-1178.
  9. Zhang C, Vasmatzis G, Cornette JL (1997). Determination of atomic desolvation energies from the structures of crystallized proteins. J Mol Biol, 267:707-726.
  10. Camacho C, Gatchell D, Kimura R, Vajda S (2000). Scoring docked conformations generated by rigid body protein-protein docking. Proteins, 40:525-537.
  11. Gatchell D, Vajda S, Camacho CJ. Sampling, clustering, refinement and discrimination of protein interactions using SmoothDock. To be Submitted.
  12. Brooks BR, Bruccoleri RE, Olafson BD, States DJ, Swaminathan S, Karplus M (1983). CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. J Comput Chem, 4:187–217.
  13. Camacho CJ, Vajda S (2001). Protein docking along smooth association pathways. Proc Natl Acad Sci USA, 98:10636-10641.

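External links

  • SmoothDock Server (http://structure.pitt.edu/servers/smoothdock/)
  • List of docking software / servers (http://www.imb-jena.de/~rake/Bioinformatics_WEB/dd_tools.html)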