PDB Validation Suite

Note: The majority of this article was taken directly from the README file.

The PDB Validation Suite is a set of tools used by the PDB for processing and checking structure data.

Installation

Note: The binary distribution contains an additional submodule, PROCHECK, which performs further structure checks. This submodule is not available in the source distribution.

  • Uncompress and unbundle the distribution using the following command:
zcat validation-vX.XXX-XXX.tar.gz | tar -xf - 

The result of this command is a subdirectory validation-vX.XXX-XXX in the current directory, which contains the following:

bin 
subdirectory that contains application executable "validation-v8"
data 
subdirectory that contains some data files needed by the application.
etc 
subdirectory that contains utility scripts and application software license agreement.
procheck 
subdirectory that contains executables for "procheck"
  • Set up the environment variables.

Define the RCSBROOT environment variable to point to the installation directory. Note that RCSBROOT is also used by other RCSB applications, such as ADIT and PDB_EXTRACT. If several of these applications are installed on the same computer, the most recent definition of RCSBROOT (for example, the last setenv command executed) determines which installation is used. Therefore, set the variable at the command line just before running the application. Assuming that the installation directory is:

/home/username/validation-vX.XXX-XXX

execute in the shell (for Bourne shell):

RCSBROOT=/home/username/validation-vX.XXX-XXX; export RCSBROOT

Add the "bin" subdirectory to the PATH environment variable:

PATH="$RCSBROOT/bin:$PATH"; export PATH
  • Make binary data from ASCII data

Change to the validation-vX.XXX-XXX/etc directory and run the script binary.sh:

cd validation-vX.XXX-XXX/etc
./binary.sh 

This command creates binary data files from the ASCII data files in the data/ascii directory and stores them in the data/binary directory. This step may take several minutes, and it must be completed before the tool can be used.
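
Because RCSBROOT is shared with other RCSB applications, one way to follow the advice above is to set it only for the duration of a single command. The following is a minimal sketch under that assumption, not something shipped with the distribution: the helper name with_rcsbroot is hypothetical, and it simply runs its arguments in a subshell with RCSBROOT and PATH set as described in the environment step.

```shell
# Hypothetical helper (not part of the suite): run one command with
# RCSBROOT and PATH set for that command only.
# The function body is a subshell, so the caller's environment is untouched.
with_rcsbroot() (
    RCSBROOT=$1                  # installation directory
    shift
    PATH="$RCSBROOT/bin:$PATH"   # put the suite's bin directory first
    export RCSBROOT PATH
    "$@"                         # run the remaining arguments as a command
)
```

For example, with_rcsbroot /home/username/validation-vX.XXX-XXX validation-v8 -f 1xyz.cif -o 2 -adit -exchange -public would run the tool against that installation without changing RCSBROOT for ADIT or PDB_EXTRACT sessions in the same shell.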

Application Usage Notes

Usage

  • For mmCIF files (note that only mmCIF files downloaded from the PDB or generated by PDB_EXTRACT should be used):
validation-v8 -f file_name -o 2 -adit -exchange -public
  • For PDB files:
validation-v8 -f file_name -o 0 -adit

For example, to create reports for a file in mmCIF format named 1xyz.cif, type:

validation-v8 -f 1xyz.cif -o 2 -adit -exchange -public
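
When validating a mix of files, the two invocations above can be selected automatically. The sketch below is illustrative only: the helper name validation_command is hypothetical, and it assumes that a .cif extension means mmCIF and that .pdb or .ent means PDB format, which the README does not state. It prints the command line rather than running it, so it can be inspected first.

```shell
# Hypothetical helper: print the validation-v8 command line for a file.
# Assumption: the file extension reliably indicates the format.
validation_command() {
    case $1 in
        *.cif)
            # mmCIF invocation from the usage notes
            printf 'validation-v8 -f %s -o 2 -adit -exchange -public\n' "$1"
            ;;
        *.pdb|*.ent)
            # PDB invocation from the usage notes
            printf 'validation-v8 -f %s -o 0 -adit\n' "$1"
            ;;
        *)
            echo "unrecognized extension: $1" >&2
            return 1
            ;;
    esac
}
```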

Output

The names of the output files begin with the root identifier <ID>, which is followed by an extension that indicates the file type.

For a PDB format file, the program uses the file name, without its extension and converted to uppercase, as the <ID>. For an mmCIF format file, the program uses the data block identifier as the <ID>.
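
To predict the <ID> before running the tool, the two rules can be approximated in the shell. This is a rough sketch, not the program's own logic: the function names are hypothetical, the PDB helper assumes any leading directory is ignored, and the mmCIF helper assumes the data block identifier is whatever follows the first lowercase data_ keyword at the start of a line.

```shell
# Hypothetical helpers approximating how the <ID> is derived.

# PDB file: base name without extension, converted to uppercase.
pdb_output_id() {
    base=${1##*/}     # drop any directory part (assumption)
    base=${base%.*}   # drop the extension
    printf '%s\n' "$base" | tr '[:lower:]' '[:upper:]'
}

# mmCIF file: identifier of the first data block (text after "data_").
mmcif_output_id() {
    awk '/^data_/ { print substr($1, 6); exit }' "$1"
}
```

For instance, pdb_output_id 1xyz.pdb prints 1XYZ, so the summary letter would be expected in 1XYZ.letter.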

The application creates the following files:

  • <ID>.letter: a text file that contains a summary validation letter.
  • <ID>.ps: a PostScript file that contains molecular graphics of the structure.

For crystal structures, this includes a view of the asymmetric unit and of the crystal packing. If an mmCIF file was validated, the biological unit of the entry is larger or smaller than the asymmetric unit, and the struct_biol_gen category was completed appropriately in the mmCIF file, a view of the biological unit(s) is also included.

For NMR ensemble structures, views of the first model and of the ensemble of all models are included. If the NMR entry contains only one model, a view of that model is included.

NUCHECK output: If the structure contains nucleic acids, the <ID>.ps file also includes plots describing the geometry, torsion, and base morphology of the nucleic acids generated by the program NUCHECK.

  • PROCHECK output: For crystal structures containing protein, there are ten PostScript files from PROCHECK:

File name / File contains

  1. <ID>_01.ps: Ramachandran plot
  2. <ID>_02.ps: Ramachandran plots by residue
  3. <ID>_03.ps: Chi1-Chi2 plots
  4. <ID>_04.ps: Main-chain parameters
  5. <ID>_05.ps: Side-chain parameters
  6. <ID>_06.ps: Residue properties
  7. <ID>_07.ps: Main-chain bond distance comparisons
  8. <ID>_08.ps: Main-chain bond angle comparisons
  9. <ID>_09.ps: RMS deviations from planarity
  10. <ID>_10.ps: Summary of geometrical distortions

For NMR structures containing protein, there are nine PostScript files from PROCHECK:

File name / File contains

  1. <ID>_01.ps: Ramachandran plot
  2. <ID>_02.ps: Ramachandran plots for all residue types
  3. <ID>_03.ps: Chi1-Chi2 plots
  4. <ID>_04.ps: Chi1 frequency distributions
  5. <ID>_05.ps: Chi2 frequency distributions
  6. <ID>_06.ps: Ensemble Ramachandran plots
  7. <ID>_07.ps: Residue properties
  8. <ID>_08.ps: Equivalent resolution
  9. <ID>_09.ps: Model secondary structures
  • <ID>.html: an HTML file with an Atlas summary containing the following:

For all structures:

The sequence of the residues in each chain (from entity_poly for an mmCIF file, from SEQRES for a PDB file, or from the coordinates if entity_poly or SEQRES is not provided).
Citation information (if provided).
Refinement information (if provided).

For crystal structures, additional information is listed:

Space group and cell constants.
Crystallization conditions (if provided).
Refinement information (if provided).
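
After a run, it can be handy to confirm that every file described above was written. The sketch below only generates the expected file names; the helper name expected_outputs and the method keywords xray and nmr are hypothetical, and it assumes the structure contains protein and that PROCHECK is available (binary distribution), so that the numbered PostScript files are produced.

```shell
# Hypothetical helper: list the output files described above for one <ID>.
# Usage: expected_outputs ID xray|nmr
expected_outputs() {
    id=$1
    case $2 in
        xray) count=10 ;;   # ten PROCHECK plots for crystal structures
        nmr)  count=9 ;;    # nine PROCHECK plots for NMR structures
        *) echo "method must be xray or nmr" >&2; return 1 ;;
    esac
    printf '%s\n' "$id.letter" "$id.ps" "$id.html"
    n=1
    while [ "$n" -le "$count" ]; do
        printf '%s_%02d.ps\n' "$id" "$n"   # <ID>_01.ps, <ID>_02.ps, ...
        n=$((n + 1))
    done
}
```

Piping the list into a loop such as while read -r f; do [ -f "$f" ] || echo "missing: $f"; done reports any file that was not created.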

References

PROCHECK

  • Laskowski RA, MacArthur MW, Moss DS, Thornton JM (1993). PROCHECK: a program to check the stereochemical quality of protein structures. J Appl Cryst, 26:283-291.
  • Morris AL, MacArthur MW, Hutchinson EG, Thornton JM (1992). Stereochemical quality of protein structure coordinates. Proteins, 12:345-364.

NUCHECK

  • Feng Z, Westbrook J, Berman HM (1998). NUCheck. Rutgers University, New Brunswick, NJ; Report No.: NDB-407.
