Dr. Carlos J. Camacho Laboratory
Latest revision as of 03:37, 27 December 2012
The Dr. Carlos J. Camacho Laboratory is where I did scientific research from October 2004 to July 2005, from March 2006 to September 2006 (in absentia), and from December 2006 to February 2007 (in absentia).
Scientific programming
Probably the most complicated programming project I have worked on was one in which we attempted to predict how two proteins interact (see SmoothDock). Because each protein-protein complex requires tens of billions of calculations (a minimum of 2.7 × 10^10), I had to write purpose-built C code optimised to run in parallel on a dedicated cluster of 256 CPUs, using MPICH, an implementation of the MPI message-passing standard. As a side note, I also had to translate some of the original Fortran 77 code into C so that it could be compiled with mpicc, the C compiler wrapper that ships with MPICH.
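For a sense of scale, 2.7 × 10^10 calculations split evenly over 256 CPUs comes to 105,468,750 per CPU. The C sketch below shows one common way of dividing a fixed number of work items among MPI processes (a block distribution). It is an illustration under that assumption, not the SmoothDock code, and the names WorkRange and block_range are hypothetical. In a real MPI program, rank and n_ranks would come from MPI_Comm_rank and MPI_Comm_size.

```c
#include <stdint.h>

/* Half-open range [begin, end) of work items owned by one process. */
typedef struct {
    int64_t begin;
    int64_t end;
} WorkRange;

/* Block distribution (hypothetical helper): the first n_items % n_ranks
   ranks each take one extra item, so loads differ by at most one item. */
WorkRange block_range(int64_t n_items, int n_ranks, int rank) {
    int64_t base = n_items / n_ranks;   /* items every rank receives */
    int64_t extra = n_items % n_ranks;  /* leftover items to spread */
    WorkRange r;
    /* Ranks before this one took (base + 1) items while rank < extra. */
    r.begin = rank * base + (rank < extra ? rank : extra);
    r.end = r.begin + base + (rank < extra ? 1 : 0);
    return r;
}
```

Each process then loops over its own range independently, and the partial results are combined at the end, for example with MPI_Reduce or MPI_Gather. With 2.7 × 10^10 items and 256 ranks the division is exact, so rank k owns items k × 105,468,750 up to (k + 1) × 105,468,750.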
We wanted to make our algorithm available to the wider scientific community, so we decided that a web server would be the best way to deliver it. What we needed was a simple, user-friendly interface to the back-end algorithm. Transferring the user input (here, the coordinates of each atom in a protein) to the cluster required a great deal of pre-processing: parsing, formatting, and error-checking. Likewise, the results returned by the cluster required post-processing before they could be emailed to the user.
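The error-checking part of pre-processing can be illustrated with a small C sketch that validates one line of atomic coordinates. It assumes the input is in PDB format, where ATOM and HETATM records hold x, y, and z in the fixed columns 31-38, 39-46, and 47-54; the text above only says the input was atom coordinates, and the name parse_atom_line is hypothetical. The production pipeline did this kind of work in Perl, awk, and sed rather than in C.

```c
#include <stdlib.h>
#include <string.h>

/* Parse x, y, z from one PDB-style ATOM/HETATM record (assumed format).
   Returns 1 on success, 0 if the line is not a usable coordinate record. */
int parse_atom_line(const char *line, double xyz[3]) {
    /* Accept only coordinate records. */
    if (strncmp(line, "ATOM  ", 6) != 0 && strncmp(line, "HETATM", 6) != 0)
        return 0;
    /* A record shorter than 54 characters cannot hold all three fields. */
    if (strlen(line) < 54)
        return 0;
    for (int i = 0; i < 3; i++) {
        char field[9];
        char *end;
        /* Copy the 8-character field at 0-based offset 30, 38, or 46. */
        memcpy(field, line + 30 + 8 * i, 8);
        field[8] = '\0';
        xyz[i] = strtod(field, &end);
        if (end == field)
            return 0;               /* blank or non-numeric field */
        while (*end == ' ')
            end++;
        if (*end != '\0')
            return 0;               /* junk after the number */
    }
    return 1;
}
```

Rejecting a malformed record at this stage, rather than letting it reach the cluster, means a bad upload fails quickly with a clear reason instead of wasting a parallel run.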
The entire system had to run autonomously, scheduled via crontab. As the administrator of this setup, I was responsible for keeping it running at all times. Because the system comprised thousands of lines of code, finding the cause of a crash would have been very difficult without extensive log files. I therefore made the logs easy to parse and had the system periodically email me a report on its "health".
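A sketch of what "easily parsable" logs make possible: with a fixed field order, a scheduled job can tally problems with a simple scan and put the totals in the periodic health email. The line format, the level names INFO/WARN/ERROR, and the names HealthCounts, tally_line, and tally_log are all hypothetical, not taken from the original system.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical log format, one event per line, space-separated:
   <date> <time> <LEVEL> <component> <free-text message> */
typedef struct {
    long info, warn, error, malformed;
} HealthCounts;

/* Classify one log line by its third whitespace-separated token. */
void tally_line(HealthCounts *c, const char *line) {
    char date[32], hms[32], level[16];
    if (sscanf(line, "%31s %31s %15s", date, hms, level) != 3) {
        c->malformed++;             /* too few fields to classify */
        return;
    }
    if (strcmp(level, "INFO") == 0)
        c->info++;
    else if (strcmp(level, "WARN") == 0)
        c->warn++;
    else if (strcmp(level, "ERROR") == 0)
        c->error++;
    else
        c->malformed++;             /* unknown level name */
}

/* Scan a whole log (lines assumed shorter than 1024 characters). */
HealthCounts tally_log(FILE *log) {
    HealthCounts c = {0, 0, 0, 0};
    char line[1024];
    while (fgets(line, sizeof line, log))
        tally_line(&c, line);
    return c;
}
```

A cron job could run such a tally over the previous period's log and mail the counts; the mailing step itself is not shown.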
The algorithm described above is implemented in a combination of Fortran 77 and C. The input data and the results, however, travel through multiple pipes as a series of I/O streams processed by Perl, awk/gawk, sed, and bash scripts. All of these are driven by makefiles and make extensive use of regular expressions.
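Each stage in such a pipeline typically reads a stream, keeps or rewrites the lines that match a regular expression, and passes the result on. The original stages were Perl, awk, sed, and bash; the C sketch below reproduces one grep-like stage with the POSIX regcomp/regexec API, which is available on Unix-like systems but is not part of ISO C. The name filter_stream is hypothetical.

```c
#include <regex.h>
#include <stdio.h>

/* One pipeline stage in the spirit of grep -E: copy every line of `in`
   that matches the extended regular expression `pattern` to `out`.
   Returns the number of lines copied, or -1 if the pattern is invalid. */
long filter_stream(FILE *in, FILE *out, const char *pattern) {
    regex_t re;
    /* REG_NOSUB: only a yes/no match is needed, not capture positions. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    char line[4096];
    long kept = 0;
    while (fgets(line, sizeof line, in)) {
        if (regexec(&re, line, 0, NULL, 0) == 0) {
            fputs(line, out);
            kept++;
        }
    }
    regfree(&re);
    return kept;
}
```

Called from a main as filter_stream(stdin, stdout, "^(ATOM  |HETATM)"), it behaves like grep -E '^(ATOM  |HETATM)' and can sit anywhere in a shell pipeline.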
Working with the command-line interface (CLI) and scripting languages is, I would say, my main strength. I developed these skills over more than seven years of active data mining across hundreds of terabytes of data, in a wide array of formats and from multiple sources.
Note: This server remains up and running, as of 21 November 2024.
Research topics
- Protein-protein docking (Rigid body)
- Protein-protein interaction prediction
- Protein structural alignment
- Molecular docking
- Protein-ligand docking
- Rigid body dynamics (see also: [1])
- Molecular mechanics
- Search algorithm
- Katchalski-Katzir algorithm
- DOT: a molecular interaction programme
- Fast Fourier transform (used in the DOT algorithm)
- Convolution theorem
- Van der Waals radius
- Force fields
- MPI / MPICH (for parallel computing)
- Mathematics:
  - Determinant (linear algebra)
  - Minor (linear algebra) (cofactor)
  - Trace (linear algebra)
  - Moment of inertia (inertia tensor matrix)
  - General least squares
  - Rotation matrix
  - Euler angles
  - Euler's equations
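In the Katchalski-Katzir approach, each molecule is mapped onto a grid and the geometric fit for every relative translation is a discrete correlation of the two grids. The C sketch below is a one-dimensional toy version computed directly: receptor cells are 1 on the surface, a large negative penalty inside (assumed here to be -15), and 0 outside, while ligand cells are 1 inside and 0 outside. Surface contact raises the score and overlap with the receptor interior lowers it sharply. The FFT route from the list computes the same scores for all N shifts in O(N log N) instead of O(N^2) by way of the convolution theorem; that step is left out here, and the names correlation_scores and best_shift are hypothetical.

```c
#include <stddef.h>

/* Circular correlation c[t] = sum_i a[i] * b[(i + t) mod n].
   a: receptor grid, b: ligand grid, c: output scores, all of length n.
   The wrap-around mirrors the periodicity an FFT-based version implies. */
void correlation_scores(const int *a, const int *b, int *c, size_t n) {
    for (size_t t = 0; t < n; t++) {
        int sum = 0;
        for (size_t i = 0; i < n; i++)
            sum += a[i] * b[(i + t) % n];
        c[t] = sum;
    }
}

/* Index of the highest-scoring shift (the first one wins on ties). */
size_t best_shift(const int *c, size_t n) {
    size_t best = 0;
    for (size_t t = 1; t < n; t++)
        if (c[t] > c[best])
            best = t;
    return best;
}
```

For example, with receptor {0, 0, 1, -15, -15, 1, 0, 0} and a two-cell ligand {1, 1, 0, 0, 0, 0, 0, 0}, the scores are {0, 0, 0, 1, -14, -30, -14, 1}: the two shifts that rest the ligand against a surface cell score highest, and shifts into the interior are heavily penalised.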
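Several of the mathematics items meet in one routine check: a proper rotation matrix R is orthogonal (R^T R = I) with determinant +1, and its trace gives the rotation angle through tr R = 1 + 2 cos θ. The C sketch below is one way to test this, for instance to catch numerical drift after rotation matrices built from Euler angles have been composed many times; the function names are hypothetical.

```c
/* Determinant of a 3x3 matrix by cofactor (minor) expansion on row 0. */
double det3(const double m[3][3]) {
    return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
         - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
         + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}

/* Trace: sum of the diagonal entries. */
double trace3(const double m[3][3]) {
    return m[0][0] + m[1][1] + m[2][2];
}

/* Proper rotation test: R^T R = I and det R = +1, each within tol. */
int is_rotation(const double r[3][3], double tol) {
    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++) {
            double dot = 0.0;       /* (R^T R)_ij = column i . column j */
            for (int k = 0; k < 3; k++)
                dot += r[k][i] * r[k][j];
            double expected = (i == j) ? 1.0 : 0.0;
            if (dot - expected > tol || expected - dot > tol)
                return 0;
        }
    }
    double d = det3(r) - 1.0;
    return d <= tol && -d <= tol;   /* rejects reflections (det = -1) */
}

/* Cosine of the rotation angle, from tr R = 1 + 2 cos(theta). */
double rotation_cos_angle(const double r[3][3]) {
    return (trace3(r) - 1.0) / 2.0;
}
```

For a 90° turn about the z axis, {{0, -1, 0}, {1, 0, 0}, {0, 0, 1}}, the determinant is 1 and the trace is 1, so cos θ = 0; a mirror such as diag(1, 1, -1) is orthogonal but fails the determinant test.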
Results
The research I have done in the laboratory has so far yielded two published papers[1][2] and four web servers:
- FastContact Server: a free energy scoring tool for protein-protein complex structures.
  - Version 1.0: programmer, server architect, and administrator; July 2005–December 2006.
  - Version 2.0: programmer, server architect, and administrator; January 2007–present.
- SmoothDock Server (currently under development):
  - Programmer, server architect, and administrator; January 2005–present (note: this server uses code optimised to run in parallel on 256 processors).
- LooseLoops Server (currently under development):
  - Programmer, server architect, and administrator; November 2004–present.
- Server architect and administrator; September 2004–present.
CAPRI
I participated in Round 6 (17 January 2005) and Round 7 (29 May 2005) of CAPRI (Critical Assessment of PRedicted Interactions), in which our group presented the SmoothDock Server.
Benchmarks
see: Protein-Protein Docking Benchmark
Keywords
protein-protein interactions; protein-DNA interactions; docking
See also
- Curriculum Vitae
References
- Camacho CJ, Ma H, Champ PC (2006). "Scoring a diverse set of high-quality docked conformations: a metascore based on electrostatic and desolvation interactions". Proteins, 63(4):868-77. DOI:10.1002/prot.20932. [HubMed]
- Champ PC, Camacho CJ (2007). "FastContact: a free energy scoring tool for protein-protein complex structures". Nucleic Acids Research (Web Issue). DOI:10.1093/nar/gkm326. [HubMed]
Further reading
- Katchalski-Katzir E, Shariv I, Eisenstein M, Friesem AA, Aflalo C, Vakser IA (1992). "Molecular surface recognition: determination of geometric fit between proteins and their ligands by correlation techniques". Proc Natl Acad Sci USA, 89(6):2195-9.
- Gray JJ, Moughon S, Wang C, Schueler-Furman O, Kuhlman B, Rohl CA, Baker D (2003). "Protein-protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations". J Mol Biol, 331(1):281-99.
- Proteins: Structure, Function, and Genetics (special issue), 52(1), 2003 (all articles).
- Bueno M, Camacho CJ, Sancho J (2007). "SIMPLE estimate of the free energy change due to aliphatic mutations: superior predictions based on first principles". Proteins, 68(4):850-62; PMID: 17523191.
- Bueno M, Camacho CJ (2007). "Acidic groups docked to well defined wetted pockets at the core of the binding interface: A tale of scoring and missing protein interactions in CAPRI". Proteins, [Epub]; PMID: 17803211.
External links
- Dr. Carlos J. Camacho Laboratory website: http://smoothdock.ccbb.pitt.edu/