computer programs\(\def\hfill{\hskip 5em}\def\hfil{\hskip 3em}\def\eqno#1{\hfil {#1}}\)

Journal logoJOURNAL OF
APPLIED
CRYSTALLOGRAPHY
ISSN: 1600-5767
Volume 49| Part 3| June 2016| Pages 1081-1084

DCC: a Swiss army knife for structure factor analysis and validation

CROSSMARK_Color_square_no_text.svg

aResearch Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Chemistry and Chemical Biology, Center for Integrative Proteomics Research, Rutgers, State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA, bInstitute for Quantitative BioMedicine, Rutgers, State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA, and cSan Diego Supercomputer Center and Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA 92093, USA
*Correspondence e-mail: ezra.peisach@rcsb.org

Edited by A. R. Pearson, Universität Hamburg, Germany (Received 25 November 2015; accepted 15 March 2016; online 18 April 2016)

Since 2008, X-ray structure depositions to the Protein Data Bank archive (PDB) have required submission of experimental data in the form of structure factor files. RCSB PDB has developed the program DCC to allow worldwide PDB (wwPDB; https://wwpdb.org) biocurators, using a single command-line program, to invoke a number of third-party software packages to compare the model file with the experimental data. DCC functionality includes structure factor validation, electron-density map generation and slicing, local electron-density analysis, and residual B factor analysis. DCC outputs a summary containing various crystallographic statistics in PDBx/mmCIF format for use in automatic data processing and archiving pipelines.

1. Introduction

The Protein Data Bank (PDB) is the single global archive of biological structures determined by X-ray crystallography, nuclear magnetic resonance (NMR) and three-dimensional electron microscopy. The archive is managed by the Worldwide PDB collaboration (wwPDB) (Berman et al., 2003[Berman, H. M., Henrick, K. & Nakamura, H. (2003). Nat. Struct. Biol. 10, 980.]). wwPDB members include the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) (Berman et al., 2000[Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). Nucleic Acids Res. 28, 235-242.]), Protein Data Bank in Europe (Velankar et al., 2016[Velankar, S. et al. (2016). Nucleic Acids Res. 44, D385-D395.]), Protein Data Bank Japan (Kinjo et al., 2012[Kinjo, A. R., Suzuki, H., Yamashita, R., Ikegawa, Y., Kudou, T., Igarashi, R., Kengaku, Y., Cho, H., Standley, D. M., Nakagawa, A. & Nakamura, H. (2012). Nucleic Acids Res. 40, D453-D460.]) and the Biological Magnetic Resonance Bank (Ulrich et al., 2008[Ulrich, E. L. et al. (2008). Nucleic Acids Res. 36, D402-D408.]).

Prior to 2008, only the atomic coordinate model of the structure was required for PDB archive deposition. Subsequently, submission of experimental data (structure factors for X-ray crystallography, restraints and chemical shifts for NMR) became mandatory (https://www.wwpdb.org/news/news?year=2007#29-November-2007). At this time, numerous individual programs were available to aid in the manipulation and validation of the experimental data relative to the model, but all required expertise and familiarity with the details of each program.

DCC was created by RCSB PDB to combine and enable use of these existing programs. Some of the features include structure factor validation, electron-density map calculation, real-space R (RSR) calculations, detection and correction of partial B factors, and production of cut electron maps and scripts for display in Jmol (Hanson, 2010[Hanson, R. M. (2010). J. Appl. Cryst. 43, 1250-1260.]). The program name, DCC, comes from one of these functions and was named for electron-density correlation coefficient. These features are used daily by wwPDB biocurators.

2. Methods

2.1. Program function

DCC is a Python wrapper for a number of third-party software programs, including SFCHECK (Vaguine et al., 1999[Vaguine, A. A., Richelle, J. & Wodak, S. J. (1999). Acta Cryst. D55, 191-205.]), PHENIX (Adams et al., 2002[Adams, P. D., Grosse-Kunstleve, R. W., Hung, L.-W., Ioerger, T. R., McCoy, A. J., Moriarty, N. W., Read, R. J., Sacchettini, J. C., Sauter, N. K. & Terwilliger, T. C. (2002). Acta Cryst. D58, 1948-1954.]), REFMAC (Murshudov et al., 1996[Murshudov, G., Dodson, E. & Vagin, A. (1996). Proceedings of the CCP4 Study Weekend: Macromolecular Refinement, edited by E. Dodson, M. Moore, A. Ralph & S. Bailey, pp. 93-104. Warrington: CCLRC Daresbury Laboratory.]), MAPMAN (Kleywegt et al., 2004[Kleywegt, G. J., Harris, M. R., Zou, J., Taylor, T. C., Wählby, A. & Jones, T. A. (2004). Acta Cryst. D60, 2240-2249.]) and CNS (Brünger et al., 1998[Brünger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P., Grosse-Kunstleve, R. W., Jiang, J.-S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J., Rice, L. M., Simonson, T. & Warren, G. L. (1998). Acta Cryst. D54, 905-921.]). Through a command-line interface, DCC converts structure factor files from any recognized format, creates the specific input files required for each of these programs and then runs the required programs (Table 1[link]). DCC will also utilize whatever metadata are present in the atomic coordinate model file, including TLS records and wavelength and twinning information, to produce suitable input data for third-party packages. For instance, a virus structure in which strict non-crystallographic symmetry (NCS) refinement has been used may not include atomic coordinates for the entire asymmetric unit in the model file. In this case, DCC will expand the coordinates using the NCS operators for use with third-party programs.

Table 1
The list of command-line options available in DCC

The basic command `dcc -pdb xyzfile -sf sffile' performs the default functionalities described in the text using REFMAC. Any metadata in the model file are utilized in the calculations. If such information is not found in the file, parameters are optimized so that the calculated statistics best match those reported.

The basic command:
dcc -pdb xyzfile -sf sffile Where xyzfile is the coordinate file in either PDB or PDBx(mmCIF) format, and sffile is the structure factor file, which can be in any of the following formats [mtz, mmCIF, CIF (for small molecules; IUCr), CNS/Xplor, HKL2000/SCALEPACK, Dtrek, SHELX, SAINT, EPMR, XSCALE, XPREP, TNT, XTALVIEW, X-GEN, XENGEN, MULTAN and MAIN].
 
 
The options below can be added to the above command to perform additional tasks:
-o Followed by an output file name to hold the calculated statistics. If not given, the default name (pdbfile + _rcc_sum.cif) will be used.
-diags Followed by a log file name to hold error/warning messages.
-verb Add to keep the intermediate files during computations.
-rsr_all Add to calculate electron-density statistics (RSR, RSRZ, RSCC) by groups [residual, main chain, side chain, phosphate (if RNA/DNA)].
-edstat Add to use the EDSTAT program to calculate electron-density statistics (RSR, RSRZ, RSCC, RSZD, RSDO) by groups [residual, main chain, side chain, phosphate (if RNA/DNA)].
-sfcheck Add to validate X-ray data by SFCHECK.
-refmac Add to validate X-ray data by REFMAC (default).
-phenix_x Add to validate X-ray data by PHENIX (model_vs_data).
-phenix_n Add to validate neutron data by PHENIX (model_vs_data).
-phenix_xn Add to validate neutron and X-ray hybrid data by PHENIX (model_vs_data). The structure factor file (sffile) must be in mmCIF format. The first data block must be the X-ray data and the second data block must be the neutron data.
-cns Add to validate X-ray data by CNS/Xplor.
-all Add to validate X-ray data by all the programs (SFCHECK, REFMAC, Phenix). The calculated statistics such as R/Rfree will be listed by the programs.
-auto Add to validate X-ray data by the program used for refinement in the coordinate file (xyzfile). If the program fails then other programs will be used.
-map Add to calculate maps (mFoDFc, 2mFoDFc) in CCP4 format.
-ligmap Add to produce all the files (ligand density maps, tables and html files) and Jmol scripts for displaying the ligand density in a browser.
-omitmap Add to calculate residual electron-density statistics (RSR, RSRZ, RSCC) after omitting all the ligands.
-omit Followed by an identifier to calculate the omit map. For example, the command dcc -pdb xyzfile -sf sffile -omit A_3:5 calculates a map omitting residue numbers from 3 to 5 of chain A.
-fem Add to calculate density statistics and the map using the feature-enhanced map in PHENIX.
-bfull Convert residual to full B factors using the command dcc -bfull xyzfile.

One challenge in producing files for input to refinement programs is how best to represent ligands. Refinement programs require a full definition of all chemical components present in the system, including bond order and connectivity. However, for unreleased components, and prior to processing, such information is not available. Therefore, DCC treats all ligands as individual atoms for presentation to the refinement programs.

For structure factor validation, the user may specify which refinement package to use, or an automatic mode may be invoked that will use the package specified in the model file (Table 1[link]). A zero-cycle (static) refinement is used and the resulting calculated Rwork and Rfree and other statistics are captured. Based on the statistical analysis of the calculated data items, errors and warnings will be included in the output file. Sample output is depicted in Fig. 1[link].

[Figure 1]
Figure 1
Example output from running DCC on the PDB entry 4n7l (Saer et al., 2014[Saer, R. G., Pan, J., Hardjasa, A., Lin, S., Rosell, F., Mauk, A. G., Woodbury, N. W., Murphy, M. E. & Beatty, J. T. (2014). Biochim. Biophys. Acta, 1837, 366-374.]) invoked with the command dcc -pdb 4n7l.cif -sf r4n7lsf.ent using publicly available data from the PDB archive. Ellipses represent sections of the file that have been removed for brevity. The first section is the local real-space density statistics determined by the programs REFMAC and MAPMAN. The second section is a combination of the model file and the results of phenix.xtriage, and the third section is the result of structure factor validation in REFMAC. Various error/warning messages are presented by the PDBx/mmCIF data item (pdbx_density.error). There are more than 200 possible error/warning messages in the DCC program. Different structures will export different messages.

When TLS restraints are used in refining a structural model with REFMAC, authors occasionally deposit structures containing only partial B factors without including the isotropic TLS contribution (Touw & Vriend, 2014[Touw, W. G. & Vriend, G. (2014). Protein Eng. Des. Sel. 27, 457-462.]). DCC detects these partial B factors and then uses TLSANL (Howlin et al., 1993[Howlin, B., Butler, S. A., Moss, D. S., Harris, G. W. & Driessen, H. P. C. (1993). J. Appl. Cryst. 26, 622-624.]) to produce full B factors before performing validation.

DCC uses REFMAC (Murshudov et al., 1996[Murshudov, G., Dodson, E. & Vagin, A. (1996). Proceedings of the CCP4 Study Weekend: Macromolecular Refinement, edited by E. Dodson, M. Moore, A. Ralph & S. Bailey, pp. 93-104. Warrington: CCLRC Daresbury Laboratory.]) to produce electron-density maps. For local density analysis of both polymer and non-polymer residues, both EDSTAT (Tickle, 2012[Tickle, I. J. (2012). Acta Cryst. D68, 454-467.]) and MAPMAN (Kleywegt et al., 2004[Kleywegt, G. J., Harris, M. R., Zou, J., Taylor, T. C., Wählby, A. & Jones, T. A. (2004). Acta Cryst. D60, 2240-2249.]) are used to calculate RSR factors, density correlations and the real-space difference density Z score. MAPMASK (Winn et al., 2011[Winn, M. D. et al. (2011). Acta Cryst. D67, 235-242.]) is used to produce sliced maps for use with Jmol visualization.

The results of any analysis, and any additional calculations performed by DCC, are captured and stored in a PDBx/mmCIF formatted file. This feature allows DCC to be utilized as a component by other programs for further analysis. This capability also allows for the generation of tabular reports for review during PDB archive biocuration and facile loading to relational databases.

3. Results and discussion

The wrapper program DCC was developed as a command-line tool that can perform a variety of tasks to aid in the validation of structure factors and atomic coordinate models and the biocuration of PDB depositions. It supports format conversion and generates appropriate input files for a number of third-party programs. By the creation of a simple-to-use front end, biocurators and users are provided access to a variety of software packages without having to know the intricacies of each.

The versatility of a tool such as DCC is shown by its use in wwPDB validation reports. In 2008, the wwPDB formed an X-ray Validation Task Force (Read et al., 2011[Read, R. J. et al. (2011). Structure, 19, 1395-1412.]). To develop validation reports based on their recommendations, the wwPDB created a validation suite for X-ray structures (Gore et al., 2012[Gore, S., Velankar, S. & Kleywegt, G. J. (2012). Acta Cryst. D68, 478-483.]) that uses DCC to validate deposited structure factors.

Another use case arose during the 2011 wwPDB remediation effort to identify X-ray structures in which partial B factors were present in the atomic coordinate model file. Based on the output of DCC, annotators corrected TLS information in the entries and furnished an indicator that only partial B factors were present.

4. Conclusions

The program DCC is a versatile tool that is used daily by wwPDB biocurators. The usage of PDBx/mmCIF allows DCC to be employed in automatic pipelines. It is available for download from https://sw-tools.rcsb.org.

Acknowledgements

The authors thank Chenghua Shao for extensive testing and feedback during software development. This work was funded by a grant (No. DBI-1338415) from the US National Science Foundation. The RCSB PDB is managed by two members of the RCSB, Rutgers and UCSD, and is a member of the wwPDB organization.

References

First citationAdams, P. D., Grosse-Kunstleve, R. W., Hung, L.-W., Ioerger, T. R., McCoy, A. J., Moriarty, N. W., Read, R. J., Sacchettini, J. C., Sauter, N. K. & Terwilliger, T. C. (2002). Acta Cryst. D58, 1948–1954.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationBerman, H. M., Henrick, K. & Nakamura, H. (2003). Nat. Struct. Biol. 10, 980.  Web of Science CrossRef PubMed Google Scholar
First citationBerman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). Nucleic Acids Res. 28, 235–242.  Web of Science CrossRef PubMed CAS Google Scholar
First citationBrünger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P., Grosse-Kunstleve, R. W., Jiang, J.-S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J., Rice, L. M., Simonson, T. & Warren, G. L. (1998). Acta Cryst. D54, 905–921.  Web of Science CrossRef IUCr Journals Google Scholar
First citationGore, S., Velankar, S. & Kleywegt, G. J. (2012). Acta Cryst. D68, 478–483.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationHanson, R. M. (2010). J. Appl. Cryst. 43, 1250–1260.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationHowlin, B., Butler, S. A., Moss, D. S., Harris, G. W. & Driessen, H. P. C. (1993). J. Appl. Cryst. 26, 622–624.  CrossRef Web of Science IUCr Journals Google Scholar
First citationKinjo, A. R., Suzuki, H., Yamashita, R., Ikegawa, Y., Kudou, T., Igarashi, R., Kengaku, Y., Cho, H., Standley, D. M., Nakagawa, A. & Nakamura, H. (2012). Nucleic Acids Res. 40, D453–D460.  Web of Science CrossRef CAS PubMed Google Scholar
First citationKleywegt, G. J., Harris, M. R., Zou, J., Taylor, T. C., Wählby, A. & Jones, T. A. (2004). Acta Cryst. D60, 2240–2249.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationMurshudov, G., Dodson, E. & Vagin, A. (1996). Proceedings of the CCP4 Study Weekend: Macromolecular Refinement, edited by E. Dodson, M. Moore, A. Ralph & S. Bailey, pp. 93–104. Warrington: CCLRC Daresbury Laboratory.  Google Scholar
First citationRead, R. J. et al. (2011). Structure, 19, 1395–1412.  Web of Science CrossRef CAS PubMed Google Scholar
First citationSaer, R. G., Pan, J., Hardjasa, A., Lin, S., Rosell, F., Mauk, A. G., Woodbury, N. W., Murphy, M. E. & Beatty, J. T. (2014). Biochim. Biophys. Acta, 1837, 366–374.  Web of Science CrossRef CAS PubMed Google Scholar
First citationTickle, I. J. (2012). Acta Cryst. D68, 454–467.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationTouw, W. G. & Vriend, G. (2014). Protein Eng. Des. Sel. 27, 457–462.  Web of Science CrossRef CAS PubMed Google Scholar
First citationUlrich, E. L. et al. (2008). Nucleic Acids Res. 36, D402–D408.  Web of Science CrossRef PubMed CAS Google Scholar
First citationVaguine, A. A., Richelle, J. & Wodak, S. J. (1999). Acta Cryst. D55, 191–205.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationVelankar, S. et al. (2016). Nucleic Acids Res. 44, D385–D395.  Web of Science CrossRef PubMed Google Scholar
First citationWinn, M. D. et al. (2011). Acta Cryst. D67, 235–242.  Web of Science CrossRef CAS IUCr Journals Google Scholar

This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.

Journal logoJOURNAL OF
APPLIED
CRYSTALLOGRAPHY
ISSN: 1600-5767
Volume 49| Part 3| June 2016| Pages 1081-1084
Follow J. Appl. Cryst.
Sign up for e-alerts
Follow J. Appl. Cryst. on Twitter
Follow us on facebook
Sign up for RSS feeds