Dr. Carlos J. Camacho Laboratory

From Christoph's Personal Wiki

Latest revision as of 03:37, 27 December 2012

The Dr. Carlos J. Camacho Laboratory is where I did scientific research from October 2004 to July 2005, from March 2006 to September 2006 (in absentia), and from December 2006 to February 2007 (in absentia).

Scientific programming

Probably the most complicated programming project I have worked on was one in which we were attempting to predict how two proteins interact (see SmoothDock). Because each protein-protein complex requires billions of calculations (a minimum of 2.7 × 10^10), I had to write custom C code optimised to run in parallel on a dedicated cluster of 256 CPUs, using the MPICH implementation of MPI. Along the way, I had to translate some of the original Fortran77 code into C so that it could be compiled with mpicc, MPICH's C compiler wrapper.
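The page does not say how the 2.7 × 10^10 evaluations were divided among the 256 CPUs. A common choice for independent evaluations is a block decomposition, where each process takes one contiguous range of indices. The following is a minimal sketch under that assumption; `WorkRange` and `block_partition` are hypothetical names, and in a real MPI program `rank` and `nprocs` would come from `MPI_Comm_rank` and `MPI_Comm_size`.

```c
#include <stdint.h>

/* Half-open range [begin, end) of evaluation indices owned by one process. */
typedef struct {
    int64_t begin;
    int64_t end;
} WorkRange;

/* Split `total` independent evaluations as evenly as possible across
   `nprocs` processes. The first (total % nprocs) ranks receive one extra
   item, so no two ranks differ by more than one evaluation.
   Hypothetical helper: in an MPI program, rank and nprocs would be
   obtained from MPI_Comm_rank and MPI_Comm_size. */
WorkRange block_partition(int64_t total, int rank, int nprocs) {
    int64_t base = total / nprocs;   /* items every rank gets */
    int64_t extra = total % nprocs;  /* leftover items for the first ranks */
    WorkRange r;
    if (rank < extra) {
        r.begin = rank * (base + 1);
        r.end = r.begin + base + 1;
    } else {
        r.begin = extra * (base + 1) + (rank - extra) * base;
        r.end = r.begin + base;
    }
    return r;
}
```

Because 2.7 × 10^10 happens to divide evenly by 256, each rank in this sketch would handle exactly 105,468,750 evaluations; the remainder handling only matters for totals that do not divide evenly.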

We wanted to make our algorithm available to the general scientific community, so we decided that a web server would be the best way to deliver it. What we needed was a simple, user-friendly interface to the back-end algorithm. Transferring the user input (here, the coordinates of every atom in a protein) to the cluster required a great deal of pre-processing (data parsing, formatting, and error-checking). Likewise, the results returned by the cluster required post-processing before they could be sent to the user by email.
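The page says only that users submitted atomic coordinates and that these were parsed and error-checked before reaching the cluster; the real pipeline used Perl, awk, sed, and bash for this. As an illustration of that error-checking step, the following C sketch assumes the input uses the fixed-column ATOM/HETATM records of the PDB format (an assumption; the page does not name the format). `Atom` and `parse_atom_line` are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for one parsed atom (PDB fixed columns assumed). */
typedef struct {
    char name[5];     /* atom name, columns 13-16 */
    char res_name[4]; /* residue name, columns 18-20 */
    char chain;       /* chain identifier, column 22 */
    long res_seq;     /* residue sequence number, columns 23-26 */
    double x, y, z;   /* coordinates in angstroms, columns 31-54 */
} Atom;

/* Copy `len` characters starting at 0-based offset `start`, then terminate. */
static void copy_field(const char *line, size_t start, size_t len, char *buf) {
    memcpy(buf, line + start, len);
    buf[len] = '\0';
}

/* True if only spaces remain in the string. */
static int only_spaces(const char *p) {
    while (*p == ' ') p++;
    return *p == '\0';
}

/* Parse a fixed-width real field; returns 0 if it is blank or malformed. */
static int parse_real(const char *line, size_t start, size_t len, double *out) {
    char buf[16];
    char *end;
    copy_field(line, start, len, buf);
    *out = strtod(buf, &end);
    return end != buf && only_spaces(end);
}

/* Parse a fixed-width integer field; returns 0 if it is blank or malformed. */
static int parse_int(const char *line, size_t start, size_t len, long *out) {
    char buf[16];
    char *end;
    copy_field(line, start, len, buf);
    *out = strtol(buf, &end, 10);
    return end != buf && only_spaces(end);
}

/* Returns 1 and fills *a for a well-formed ATOM/HETATM record, 0 otherwise,
   so bad submissions can be rejected before any cluster time is spent. */
int parse_atom_line(const char *line, Atom *a) {
    if (strlen(line) < 54) return 0;  /* too short to contain x, y, z */
    if (strncmp(line, "ATOM  ", 6) != 0 && strncmp(line, "HETATM", 6) != 0)
        return 0;
    copy_field(line, 12, 4, a->name);
    copy_field(line, 17, 3, a->res_name);
    a->chain = line[21];
    return parse_int(line, 22, 4, &a->res_seq)
        && parse_real(line, 30, 8, &a->x)
        && parse_real(line, 38, 8, &a->y)
        && parse_real(line, 46, 8, &a->z);
}
```

Rejecting a record outright, rather than guessing at a damaged coordinate, keeps a malformed upload from silently producing a meaningless docking run.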

The entire system had to run autonomously, scheduled via crontab. As the administrator of this setup, I was responsible for keeping it up at all times. Because the system comprised thousands of lines of code, tracking down the cause of a crash would have been very difficult without extensive log files. I designed these logs to be easily parsable and had the system periodically email me a report on its "health".
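The page does not give the log format or the contents of the "health" email. The following sketch assumes a hypothetical line format of `YYYY-MM-DD HH:MM:SS LEVEL message` and shows how easily parsable logs can be reduced to a one-line status report; `HealthSummary`, `tally_log_line`, and `format_health_report` are illustrative names, not the original code.

```c
#include <stdio.h>
#include <string.h>

/* Running counts of log entries by severity (hypothetical levels). */
typedef struct {
    int info;
    int warn;
    int error;
    int malformed;
} HealthSummary;

/* Classify one line of the assumed format "YYYY-MM-DD HH:MM:SS LEVEL message".
   Lines that do not fit are counted as malformed rather than ignored, so a
   corrupted log shows up in the report too. */
void tally_log_line(const char *line, HealthSummary *s) {
    char date[11], hms[9], level[8];
    if (sscanf(line, "%10s %8s %7s", date, hms, level) != 3) {
        s->malformed++;
    } else if (strcmp(level, "INFO") == 0) {
        s->info++;
    } else if (strcmp(level, "WARN") == 0) {
        s->warn++;
    } else if (strcmp(level, "ERROR") == 0) {
        s->error++;
    } else {
        s->malformed++;
    }
}

/* Render a one-line status report suitable for a periodic email body.
   Returns the snprintf result (length the full report would need). */
int format_health_report(const HealthSummary *s, char *buf, size_t n) {
    const char *status = (s->error > 0) ? "DEGRADED" : "OK";
    return snprintf(buf, n, "status=%s info=%d warn=%d error=%d malformed=%d",
                    status, s->info, s->warn, s->error, s->malformed);
}
```

A crontab entry could run a small program built around these functions and pipe its output to the system's mail command; keeping the report to a single `key=value` line makes it easy to grep across many emails.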

The algorithm described above is implemented in a combination of Fortran77 and C. The input data and the results, however, are passed through multiple pipes as a series of I/O streams handled by Perl, awk/gawk, sed, and bash scripts. These scripts are all driven by makefiles and make extensive use of regular expressions.
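Each stage of such a pipeline typically reads a stream, keeps or rewrites the lines that match a regular expression, and passes the rest of the work downstream. The real stages were Perl, awk, sed, and bash scripts; purely as an illustration in C, the sketch below uses the POSIX `regcomp`/`regexec` API to implement a `grep -E`-style filter. `filter_stream` is a hypothetical name, and the pattern in the usage note is only an example.

```c
#include <regex.h>
#include <stdio.h>

/* Copy every line of `in` that matches the extended POSIX regex `pattern`
   to `out`, like a `grep -E` stage in a shell pipeline.
   Returns the number of lines written, or -1 if the pattern is invalid.
   Lines longer than the buffer are processed in pieces (a simplification). */
long filter_stream(FILE *in, FILE *out, const char *pattern) {
    regex_t re;
    char line[1024];
    long written = 0;
    /* REG_NOSUB: we only need match/no-match, not capture positions. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    while (fgets(line, sizeof line, in) != NULL) {
        if (regexec(&re, line, 0, NULL, 0) == 0) {
            fputs(line, out);
            written++;
        }
    }
    regfree(&re);
    return written;
}
```

Called as `filter_stream(stdin, stdout, "^(ATOM|HETATM)")`, this would pass only coordinate records to the next stage, which is the same job a one-line `grep` or awk pattern does in the shell version.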

I would say that working with the command-line interface (CLI) and scripting languages is my main skill and strength. I have developed these skills over more than seven years of active data mining across hundreds of terabytes of data, in a wide array of formats and from multiple sources.

Note: This server remains up and running as of 24 November 2024.

Research topics

Results

The research I have done in the laboratory has, so far, yielded two published papers[1][2] and four web servers:

  • FastContact Server: a free energy scoring tool for protein-protein complex structures.
      Version 1.0: Programmer, server architect, and administrator; July 2005–December 2006.
      Version 2.0: Programmer, server architect, and administrator; January 2007–present.
  • Programmer, server architect, and administrator; January 2005–present (note: this server uses code optimised to run in parallel on 256 processors).
  • Programmer, server architect, and administrator; November 2004–present.
  • Server architect and administrator; September 2004–present.
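Reference [1] describes a metascore based on electrostatic and desolvation interactions, and [2] describes FastContact as a free energy scoring tool for protein-protein complexes. Their exact energy functions are not reproduced on this page. The sketch below shows only a generic electrostatic term of the kind such scores commonly include: pairwise Coulomb energy between two rigid bodies with a distance-dependent dielectric ε(r) = 4r. The dielectric model, the cutoff, the rounded constant 332, and the names `Charge` and `interface_electrostatics` are all assumptions for illustration, not FastContact's parameters, and no desolvation term is included.

```c
#include <stddef.h>

/* Hypothetical point-charge atom: position in angstroms, charge in units of e. */
typedef struct {
    double x, y, z;
    double q;
} Charge;

/* Coulomb constant in kcal*angstrom/(mol*e^2), rounded (assumed value). */
#define COULOMB_K 332.0

/* Sum of pairwise Coulomb energies (kcal/mol) between receptor and ligand
   atoms, using an assumed distance-dependent dielectric eps(r) = 4r.
   Each term is K*qi*qj / (eps(r)*r) = K*qi*qj / (4*r^2), so no square
   root is needed. Pairs beyond `cutoff` angstroms, or exactly overlapping
   atoms, are skipped. */
double interface_electrostatics(const Charge *rec, size_t nrec,
                                const Charge *lig, size_t nlig,
                                double cutoff) {
    double energy = 0.0;
    double cutoff2 = cutoff * cutoff;
    for (size_t i = 0; i < nrec; i++) {
        for (size_t j = 0; j < nlig; j++) {
            double dx = rec[i].x - lig[j].x;
            double dy = rec[i].y - lig[j].y;
            double dz = rec[i].z - lig[j].z;
            double r2 = dx * dx + dy * dy + dz * dz;
            if (r2 == 0.0 || r2 > cutoff2)
                continue;
            energy += COULOMB_K * rec[i].q * lig[j].q / (4.0 * r2);
        }
    }
    return energy;
}
```

For a +1 and a −1 charge 2 Å apart, this term gives 332 × (−1) / (4 × 4) = −20.75 kcal/mol; the double loop is O(nrec × nlig), which is one reason scoring many docked conformations benefits from the kind of parallel hardware described above.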

CAPRI

I participated in Round 6 (17 January 2005) and Round 7 (29 May 2005) of CAPRI (Critical Assessment of PRedicted Interactions), where our group presented the SmoothDock Server.

Benchmarks

see: Protein-Protein Docking Benchmark

Keywords

protein-protein interactions; protein-DNA interactions; docking

See also

  • Curriculum Vitae

References

  1. Carlos J. Camacho, Ma H, and P. Christoph Champ (2006). Scoring a diverse set of high-quality docked conformations: A metascore based on electrostatic and desolvation interactions. Proteins, 63(4):868-77. DOI:10.1002/prot.20932. [HubMed]
  2. P. Christoph Champ and Carlos J. Camacho (2007). FastContact: a free energy scoring tool for protein-protein complex structures. Nucleic Acids Research (Web Issue). DOI:10.1093/nar/gkm326. [HubMed]

Further reading

  • Katchalski-Katzir E, Shariv I, Eisenstein M, Friesem AA, Aflalo C, Vakser IA (1992). Molecular surface recognition: Determination of geometric fit between proteins and their ligands by correlation techniques. Proc Natl Acad Sci USA, 89(6):2195-9.
  • Gray JJ, Moughon S, Wang C, Schueler-Furman O, Kuhlman B, Rohl CA, Baker D (2003). Protein-protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations. J Mol Biol, 331(1):281-99.
  • Proteins: Structure, Function, and Genetics (2003). Special issue, 52(1), entire issue.
  • Bueno M, Camacho CJ, Sancho J (2007). SIMPLE estimate of the free energy change due to aliphatic mutations: superior predictions based on first principles. Proteins, 68(4):850-62. PMID: 17523191.
  • Bueno M, Camacho CJ (2007). Acidic groups docked to well defined wetted pockets at the core of the binding interface: A tale of scoring and missing protein interactions in CAPRI. Proteins, [Epub]. PMID: 17803211.

External links

  • Dr. Carlos J. Camacho Laboratory website: http://smoothdock.ccbb.pitt.edu/