computer programs
DCC: a Swiss army knife for analysis and validation
a Research Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Chemistry and Chemical Biology, Center for Integrative Proteomics Research, Rutgers, State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA, b Institute for Quantitative BioMedicine, Rutgers, State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA, and c San Diego Supercomputer Center and Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA 92093, USA
*Correspondence e-mail: ezra.peisach@rcsb.org
Since 2008, X-ray structure depositions to the Protein Data Bank archive (PDB) have required submission of experimental data in the form of structure factor files. RCSB PDB has developed the program DCC to allow worldwide PDB (wwPDB; https://wwpdb.org) biocurators, using a single command-line program, to invoke a number of third-party software packages to compare the model file with the experimental data. DCC functionality includes validation, electron-density map generation and slicing, local electron-density analysis, and residual B factor analysis. DCC outputs a summary containing various crystallographic statistics in PDBx/mmCIF format for use in automatic data processing and archiving pipelines.
Keywords: Protein Data Bank; structure factor validation; utility programs; DCC.
1. Introduction
The Protein Data Bank (PDB) is the single global archive of biological structures determined by X-ray crystallography, nuclear magnetic resonance (NMR) and three-dimensional electron microscopy. The archive is managed by the Worldwide PDB collaboration (wwPDB) (Berman et al., 2003). wwPDB members include the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) (Berman et al., 2000), Protein Data Bank in Europe (Velankar et al., 2016), Protein Data Bank Japan (Kinjo et al., 2012) and the Biological Magnetic Resonance Bank (Ulrich et al., 2008).
Prior to 2008, only the atomic coordinate model of the structure was required for PDB archive deposition. Subsequently, submission of experimental data (structure factors for X-ray crystallography, restraints and chemical shifts for NMR) became mandatory (https://www.wwpdb.org/news/news?year=2007#29-November-2007). At that time, numerous individual programs were available to aid in the manipulation and validation of the experimental data relative to the model, but each required expertise and familiarity with its details.
DCC was created by RCSB PDB to combine and enable use of these existing programs. Its features include validation, electron-density map calculation, real-space R (RSR) calculations, detection and correction of partial B factors, and production of cut electron-density maps and scripts for display in Jmol (Hanson, 2010). The program name, DCC, comes from one of these functions and was named for the electron-density correlation coefficient. These features are used daily by wwPDB biocurators.
2. Methods
2.1. Program function
DCC is a Python wrapper for a number of third-party software programs, including SFCHECK (Vaguine et al., 1999), PHENIX (Adams et al., 2002), REFMAC (Murshudov et al., 1996), MAPMAN (Kleywegt et al., 2004) and CNS (Brünger et al., 1998). Through a command-line interface, DCC converts files from any recognized format, creates the specific input files required for each of these programs and then runs the required programs (Table 1). DCC will also utilize whatever metadata are present in the atomic coordinate model file, including TLS records and wavelength information, to produce suitable input data for third-party packages. For instance, a virus structure in which strict non-crystallographic symmetry (NCS) has been used may not include atomic coordinates for the entire asymmetric unit in the model file. In this case, DCC will expand the coordinates using the NCS operators for use with third-party programs.
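The NCS expansion described above amounts to applying each rotation and translation operator to every deposited atom position. The following is a minimal sketch of that idea only, not DCC's own code; the function names and the example operator are hypothetical, and a real model file would supply its operators through its own metadata records.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Matrix3 = Sequence[Sequence[float]]
Operator = Tuple[Matrix3, Vec3]


def apply_operator(rot: Matrix3, trans: Vec3, xyz: Vec3) -> Vec3:
    """Return rot * xyz + trans for a single atom position."""
    return tuple(
        sum(rot[i][j] * xyz[j] for j in range(3)) + trans[i]
        for i in range(3)
    )


def expand_ncs(coords: List[Vec3], operators: List[Operator]) -> List[Vec3]:
    """Apply every (rotation, translation) operator to every atom.

    The identity operator must be in the list if the deposited copy
    is to be kept in the expanded set.
    """
    expanded: List[Vec3] = []
    for rot, trans in operators:
        expanded.extend(apply_operator(rot, trans, xyz) for xyz in coords)
    return expanded


identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
# Hypothetical operator: 180 degree rotation about z, then a 10 A shift along x.
twofold_z = ((-1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, 1.0))
operators = [(identity, (0.0, 0.0, 0.0)), (twofold_z, (10.0, 0.0, 0.0))]

atoms = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]
full = expand_ncs(atoms, operators)
```

With two operators and two atoms, `full` holds four positions: the original pair followed by their symmetry-related copies.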
One challenge in producing files for input to refinement programs is how best to represent ligands. Refinement programs require a full definition of all chemical components present in the system, including bond order and connectivity. However, for unreleased components, and prior to processing, such information is not available. Therefore, DCC treats all ligands as individual atoms for presentation to the refinement programs.
For validation, the user may specify which refinement package to use, or an automatic mode may be invoked that will use the package specified in the model file (Table 1). A zero-cycle (static) refinement is used and the resulting calculated Rwork and Rfree and other statistics are captured. Based on the statistical analysis of the calculated data items, errors and warnings will be included in the output file. Sample output is depicted in Fig. 1.
When TLS restraints are used in refining a structural model with REFMAC, authors occasionally deposit structures containing only partial B factors without including the isotropic TLS contribution (Touw & Vriend, 2014). DCC detects these partial B factors and then uses TLSANL (Howlin et al., 1993) to produce full B factors before performing validation.
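The Rwork and Rfree statistics captured during validation follow the conventional crystallographic R factor, the sum of absolute differences between observed and calculated structure-factor amplitudes divided by the sum of observed amplitudes, evaluated separately over the working and free reflection sets. The sketch below illustrates that definition only. It assumes the amplitudes are already on a common scale, whereas the refinement packages invoked by DCC also apply scaling and solvent corrections that are not shown. The function names and sample values are hypothetical.

```python
from typing import Iterable, List, Tuple

# One reflection: (|Fobs|, |Fcalc|, True if it belongs to the free set)
Reflection = Tuple[float, float, bool]


def r_factor(pairs: Iterable[Tuple[float, float]]) -> float:
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|)."""
    pairs = list(pairs)
    denominator = sum(f_obs for f_obs, _ in pairs)
    if denominator == 0:
        raise ValueError("no observed amplitudes to compare against")
    return sum(abs(f_obs - f_calc) for f_obs, f_calc in pairs) / denominator


def r_work_and_free(reflections: List[Reflection]) -> Tuple[float, float]:
    """Split reflections by the free flag and return (Rwork, Rfree)."""
    work = [(fo, fc) for fo, fc, is_free in reflections if not is_free]
    free = [(fo, fc) for fo, fc, is_free in reflections if is_free]
    return r_factor(work), r_factor(free)


# Made-up amplitudes, for illustration only.
reflections = [
    (100.0, 90.0, False),
    (50.0, 55.0, False),
    (80.0, 60.0, True),
    (20.0, 20.0, True),
]
r_work, r_free = r_work_and_free(reflections)
```

For these values the working set gives 15/150 and the free set gives 20/100.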
DCC uses REFMAC (Murshudov et al., 1996) to produce electron-density maps. For local density analysis of both polymer and non-polymer residues, both EDSTAT (Tickle, 2012) and MAPMAN (Kleywegt et al., 2004) are used to calculate RSR factors, density correlations and the real-space difference density Z score. MAPMASK (Winn et al., 2011) is used to produce sliced maps for use with Jmol visualization.
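To make the local density statistics more concrete, the sketch below computes a real-space R value and a density correlation coefficient for one residue from two lists of map values sampled at the same grid points. It uses the commonly cited definitions (RSR as the sum of absolute differences over the sum of absolute sums, and a Pearson correlation) and is not the EDSTAT or MAPMAN implementation. Selecting grid points around a residue and scaling the maps are assumed to have been done beforehand, and the function names are hypothetical.

```python
import math
from typing import Sequence


def real_space_r(rho_obs: Sequence[float], rho_calc: Sequence[float]) -> float:
    """RSR = sum(|rho_obs - rho_calc|) / sum(|rho_obs + rho_calc|)."""
    if len(rho_obs) != len(rho_calc) or len(rho_obs) == 0:
        raise ValueError("maps must be sampled at the same, non-empty grid")
    numerator = sum(abs(o - c) for o, c in zip(rho_obs, rho_calc))
    denominator = sum(abs(o + c) for o, c in zip(rho_obs, rho_calc))
    if denominator == 0:
        raise ValueError("density sums to zero at every grid point")
    return numerator / denominator


def density_correlation(rho_obs: Sequence[float], rho_calc: Sequence[float]) -> float:
    """Pearson correlation between observed and calculated density."""
    if len(rho_obs) != len(rho_calc) or len(rho_obs) == 0:
        raise ValueError("maps must be sampled at the same, non-empty grid")
    n = len(rho_obs)
    mean_o = sum(rho_obs) / n
    mean_c = sum(rho_calc) / n
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(rho_obs, rho_calc))
    var_o = sum((o - mean_o) ** 2 for o in rho_obs)
    var_c = sum((c - mean_c) ** 2 for c in rho_calc)
    if var_o == 0 or var_c == 0:
        raise ValueError("correlation is undefined for a flat map")
    return cov / math.sqrt(var_o * var_c)


# Made-up density values at four grid points around one residue.
obs = [1.0, 2.0, 3.0, 4.0]
calc = [1.0, 2.0, 3.0, 2.0]
rsr = real_space_r(obs, calc)
cc = density_correlation(obs, calc)
```

A well-fitted residue gives an RSR near zero and a correlation near one; identical maps give exactly those limits.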
The results of any analysis, and any additional calculations performed by DCC, are captured and stored in a PDBx/mmCIF formatted file. This feature allows DCC to be utilized as a component by other programs for further analysis. This capability also allows for the generation of tabular reports for review during PDB archive biocuration and facile loading to relational databases.
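As an illustration of why a PDBx/mmCIF summary is convenient for downstream programs, the sketch below writes a single-row category of key-value items and reads it back. The category name `example_stats`, its item names and the values are all hypothetical and are not the categories DCC actually emits. The reader handles only simple one-line items and not loops, multi-line text fields or the full mmCIF quoting rules.

```python
import shlex
from typing import Dict


def write_category(block: str, category: str, items: Dict[str, object]) -> str:
    """Format items as a single-row mmCIF category inside one data block."""
    lines = [f"data_{block}", "#"]
    # Pad names so that values line up in one column.
    width = max(len(key) for key in items) + len(category) + 3
    for key, value in items.items():
        text = str(value)
        if text == "":
            text = "?"  # mmCIF marker for an unknown value
        elif any(ch.isspace() for ch in text):
            text = f"'{text}'"  # quote values that contain whitespace
        lines.append(f"_{category}.{key}".ljust(width) + text)
    lines.append("#")
    return "\n".join(lines) + "\n"


def read_category(text: str, category: str) -> Dict[str, str]:
    """Collect the one-line items of a category into a dictionary."""
    prefix = f"_{category}."
    found: Dict[str, str] = {}
    for line in text.splitlines():
        if line.startswith(prefix):
            tokens = shlex.split(line)  # handles the quoted values
            found[tokens[0][len(prefix):]] = tokens[1]
    return found


# Hypothetical statistics, for illustration only.
stats = {
    "r_work": 0.195,
    "r_free": 0.238,
    "program": "REFMAC",
    "note": "zero-cycle run",
}
cif_text = write_category("example", "example_stats", stats)
parsed = read_category(cif_text, "example_stats")
```

Because every value is tied to a named item, a pipeline step can pick out the statistics it needs, or load them into a database table, without parsing program-specific log files.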
3. Results and discussion
The wrapper program DCC was developed as a command-line tool that can perform a variety of tasks to aid in the validation of structure factors and atomic coordinate models and the biocuration of PDB depositions. It supports format conversion and generates appropriate input files for a number of third-party programs. By the creation of a simple-to-use front end, biocurators and users are provided access to a variety of software packages without having to know the intricacies of each.
The versatility of a tool such as DCC is shown by its use in wwPDB validation reports. In 2008, the wwPDB formed an X-ray Validation Task Force (Read et al., 2011). To develop validation reports based on their recommendations, the wwPDB created a validation suite for X-ray structures (Gore et al., 2012) that uses DCC to validate deposited structure factors.
Another use case arose during the 2011 wwPDB remediation effort to identify X-ray structures in which partial B factors were present in the atomic coordinate model file. Based on the output of DCC, annotators corrected TLS information in the entries and furnished an indicator that only partial B factors were present.
4. Conclusions
The program DCC is a versatile tool that is used daily by wwPDB biocurators. The usage of PDBx/mmCIF allows DCC to be employed in automatic pipelines. It is available for download from https://sw-tools.rcsb.org.
Acknowledgements
The authors thank Chenghua Shao for extensive testing and feedback during software development. This work was funded by a grant (No. DBI-1338415) from the US National Science Foundation. The RCSB PDB is managed by two members of the RCSB, Rutgers and UCSD, and is a member of the wwPDB organization.
References
Adams, P. D., Grosse-Kunstleve, R. W., Hung, L.-W., Ioerger, T. R., McCoy, A. J., Moriarty, N. W., Read, R. J., Sacchettini, J. C., Sauter, N. K. & Terwilliger, T. C. (2002). Acta Cryst. D58, 1948–1954.
Berman, H. M., Henrick, K. & Nakamura, H. (2003). Nat. Struct. Biol. 10, 980.
Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). Nucleic Acids Res. 28, 235–242.
Brünger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P., Grosse-Kunstleve, R. W., Jiang, J.-S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J., Rice, L. M., Simonson, T. & Warren, G. L. (1998). Acta Cryst. D54, 905–921.
Gore, S., Velankar, S. & Kleywegt, G. J. (2012). Acta Cryst. D68, 478–483.
Hanson, R. M. (2010). J. Appl. Cryst. 43, 1250–1260.
Howlin, B., Butler, S. A., Moss, D. S., Harris, G. W. & Driessen, H. P. C. (1993). J. Appl. Cryst. 26, 622–624.
Kinjo, A. R., Suzuki, H., Yamashita, R., Ikegawa, Y., Kudou, T., Igarashi, R., Kengaku, Y., Cho, H., Standley, D. M., Nakagawa, A. & Nakamura, H. (2012). Nucleic Acids Res. 40, D453–D460.
Kleywegt, G. J., Harris, M. R., Zou, J., Taylor, T. C., Wählby, A. & Jones, T. A. (2004). Acta Cryst. D60, 2240–2249.
Murshudov, G., Dodson, E. & Vagin, A. (1996). Proceedings of the CCP4 Study Weekend: Macromolecular Refinement, edited by E. Dodson, M. Moore, A. Ralph & S. Bailey, pp. 93–104. Warrington: CCLRC Daresbury Laboratory.
Read, R. J. et al. (2011). Structure, 19, 1395–1412.
Saer, R. G., Pan, J., Hardjasa, A., Lin, S., Rosell, F., Mauk, A. G., Woodbury, N. W., Murphy, M. E. & Beatty, J. T. (2014). Biochim. Biophys. Acta, 1837, 366–374.
Tickle, I. J. (2012). Acta Cryst. D68, 454–467.
Touw, W. G. & Vriend, G. (2014). Protein Eng. Des. Sel. 27, 457–462.
Ulrich, E. L. et al. (2008). Nucleic Acids Res. 36, D402–D408.
Vaguine, A. A., Richelle, J. & Wodak, S. J. (1999). Acta Cryst. D55, 191–205.
Velankar, S. et al. (2016). Nucleic Acids Res. 44, D385–D395.
Winn, M. D. et al. (2011). Acta Cryst. D67, 235–242.
This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.