scientific commentaries
The development and use of a crystallographic database
Department of Chemistry, University of Oslo, PO Box 1033 Blindern, N-0315 Oslo, Norway
*Correspondence e-mail: c.h.gorbitz@kjemi.uio.no
To scientists working with small-molecule or organometallic compounds, the Cambridge Structural Database constitutes an extremely important tool for reference to individual crystal structures and as a data source for statistical investigations. The article by Groom et al. [(2016), Acta Cryst. B72, 171–179] provides updated information on the use, development and future of this database.
Crystallographers constitute a privileged group of scientists, not only because they provide an understanding of nature to an extent and at a resolution that is truly unique, but also because their results are archived in a manner that is unrivaled among physical methods used for characterization of molecular properties. This archive, the Cambridge Structural Database (CSD), developed and maintained by the Cambridge Crystallographic Data Centre (CCDC), presently holds more than 800 000 entries for organic and organometallic compounds. Other databases include the Protein Data Bank (PDB; Berman et al., 2000) and the Inorganic Crystal Structure Database (ICSD; Belsky et al., 2002). The value of the CSD is documented by the fact that the currently used reference to it (Allen, 2002) has received more than 10 000 citations related to all fields of crystallography (Wong et al., 2010). The paper by Groom et al. (2016) supersedes Allen's article and will become the new standard reference to the CSD.
The authors describe the exponential growth in the number of entries in the CSD since the humble beginnings in 1965, and provide details in terms of the historical and gradual development of molecular complexity and size, journal used for publication and more. Two distinct ways in which the CSD provides value are pointed out: `aggregation and standardization of structures, which facilitates access to individual entries', and `the study of the collection of entries' (Kennard, 1997) related to the geometry of molecules and the interactions they make. Some key papers discuss structure correlation (Bürgi & Dunitz, 1983), C—H groups as donors (Taylor & Kennard, 1982) and geometric tables (Allen et al., 1987; Orpen et al., 1989). A recent example includes quantification of the symmetry preferences of intermolecular interactions in organic crystal structures (Taylor et al., 2015).
The process from deposition of a data set to a published paper is discussed in detail. In fact, most journals these days require that any crystallographic material be deposited at the CCDC before the manuscript is submitted for review; individual structures are subsequently identified by their CCDC deposition numbers in the printed paper. Publication triggers immediate public release of the corresponding structure(s). Retrieval of data is then open to anyone through requests posted at the CCDC web site. The authors also point out that many structures are now published only and directly as CSD Communications (previously known as Private Communications) and foresee that this will soon become the most popular way in which to publish crystal structures. After discussing how the CSD is used, they outline future developments, including systems that handle structures derived from non-crystallographic sources, such as electron diffraction, atomic force microscopy, free-electron lasers and NMR crystallography, but also from prediction algorithms.
The tremendous success of the CSD has several, more or less independent components:
As an example of this development, I searched the database for organic structures in which a methyl group has (at least) one C—C—H angle < 75°. A total of 349 such physically unrealistic narrow angles were found in 315 structures, one example being illustrated in Fig. 1(a). Sorted by year, the distribution in Fig. 1(b) shows that the number of such freak geometries has declined dramatically since the 1980s. There are two obvious reasons: the first is that most people now fix methyl groups in theoretical, staggered positions, the second that updated checkCIF routines issue a set of error messages when such geometries are encountered, Fig. 1(c). The fact that the number has still not dropped to zero does, however, raise some concern. Evidently, some structures are published by researchers who fail to use the programs properly and care little about checking their results. Furthermore, some reviewers take their role too lightly and do not discover and address Alert level B errors like those in Fig. 1(c). From my own experience as a reviewer I find that water molecules are particularly prone to error, and the frequency of wide H—O—H bond angles > 135° appears to still be on the rise, Fig. 1(b). More rigorous checkCIF algorithms will undoubtedly be available in the future, but in the end it is the responsibility of both authors and reviewers to correct such obvious errors before the structure enters the CSD.
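The geometric test behind a search of this kind can be sketched in a few lines. This is only a minimal illustration, assuming atomic positions are already available as Cartesian coordinates in Å; it does not use the CSD software, and the function names and sample coordinates are hypothetical. Only the two thresholds (75° for methyl angles, 135° for water) are taken from the text.

```python
import math

def bond_angle(a, b, c):
    """Return the angle a-b-c in degrees, with atom b at the vertex."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    cos_t = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_t))

def flag_methyl_angles(c_neighbor, c_methyl, hydrogens, min_angle=75.0):
    """Return (H index, angle) for each C-C-H angle below min_angle."""
    flagged = []
    for i, h in enumerate(hydrogens):
        angle = bond_angle(c_neighbor, c_methyl, h)
        if angle < min_angle:
            flagged.append((i, angle))
    return flagged

def flag_water_angle(h1, o, h2, max_angle=135.0):
    """Return (is_too_wide, angle) for an H-O-H angle."""
    angle = bond_angle(h1, o, h2)
    return angle > max_angle, angle

# Hypothetical coordinates (in Å), invented purely for illustration.
neighbor_c = (-1.5, 0.0, 0.0)
methyl_c = (0.0, 0.0, 0.0)
methyl_h = [
    (0.36, 1.03, 0.0),   # close to the ideal tetrahedral position
    (-0.90, 0.60, 0.0),  # folded back towards the C-C bond
]
suspicious = flag_methyl_angles(neighbor_c, methyl_c, methyl_h)
```

With these made-up coordinates only the second hydrogen is flagged. A real survey would in addition need the connectivity information that identifies methyl groups and water molecules in each entry, which is what the database itself supplies.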
I was myself introduced to the CSD when I worked on my master's thesis in the mid-1980s, but I used it in full for the first time a few years later in preparation of a paper on the hydrogen-bond distances and angles in the structures of amino acids and peptides (Görbitz, 1989). At the time, elucidating information on intermolecular interactions was quite an undertaking, above all because we did not have graphical computers (the first graphical CSD interfaces appeared in 1991, `modern' interfaces arrived in 2002 with ConQuest; Bruno et al., 2002; and Mercury; Macrae et al., 2006). Lacking any visual input or output, making sure that you had found what you intended to find required excessively time-consuming manual checking of intricate tables of molecular connectivities. Also, although the investigation was carried out only on a small subset of 749 amino acid and peptide structures extracted from the 67 000 structures in the CSD at the time, the generation of neighbors for calculating intermolecular interaction geometries exhausted our computer resources to the extent that a simple search would run overnight. The change compared to the way the CSD is used today is simply incredible. And evidently, according to Groom et al. (2016), it is going to get even better. Lucky crystallographers!
References
Allen, F. H. (2002). Acta Cryst. B58, 380–388.
Allen, F. H., Kennard, O., Watson, D. G., Brammer, L., Orpen, A. G. & Taylor, R. J. (1987). J. Chem. Soc. Perkin Trans. 2, pp. S1–S19.
Belsky, A., Hellenbrandt, M., Karen, V. L. & Luksch, P. (2002). Acta Cryst. B58, 364–369.
Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). Nucleic Acids Res. 28, 235–242.
Bernstein, H. J., Bollinger, J. C., Brown, I. D., Gražulis, S., Hester, J. R., McMahon, B., Spadaccini, N., Westbrook, J. D. & Westrip, S. P. (2016). J. Appl. Cryst. 49, 277–284.
Brown, I. D. & McMahon, B. (2002). Acta Cryst. B58, 317–324.
Bruno, I. J., Cole, J. C., Edgington, P. R., Kessler, M., Macrae, C. F., McCabe, P., Pearson, J. & Taylor, R. (2002). Acta Cryst. B58, 389–397.
Bürgi, H. B. & Dunitz, J. D. (1983). Acc. Chem. Res. 16, 153–161.
Görbitz, C. H. (1989). Acta Cryst. B45, 390–395.
Groom, C. R., Bruno, I. J., Lightfoot, M. P. & Ward, S. C. (2016). Acta Cryst. B72, 171–179.
Hall, S. R. & McMahon, B. (2016). Data Sci. J. 15, 1–15.
IUCr (2016). checkCIF, https://checkcif.iucr.org/.
Kennard, O. (1997). The Impact of Electronic Publishing on the Academic Community, pp. 159–166. London: Portland Press Ltd.
Macrae, C. F., Edgington, P. R., McCabe, P., Pidcock, E., Shields, G. P., Taylor, R., Towler, M. & van de Streek, J. (2006). J. Appl. Cryst. 39, 453–457.
Orpen, A. G., Brammer, L., Allen, F. H., Kennard, O., Watson, D. G. & Taylor, R. (1989). J. Chem. Soc. Dalton Trans. pp. S1–S83.
Sheldrick, G. M. (2015). Acta Cryst. C71, 3–8.
Taylor, R., Allen, F. H. & Cole, J. C. (2015). CrystEngComm, 17, 2651–2666.
Taylor, R. & Kennard, O. (1982). J. Am. Chem. Soc. 104, 5063–5070.
Wong, R., Allen, F. H. & Willett, P. (2010). J. Appl. Cryst. 43, 811–824.
© International Union of Crystallography. Prior permission is not required to reproduce short quotations, tables and figures from this article, provided the original authors and source are cited.