teaching and education
From atoms to bonds, angles and torsions: molecular metrics from crystal space, and two Excel implementations
Curtin Institute for Computation, Discipline of Chemistry, Curtin University, GPO Box U1987, Perth, WA 6845, Australia
*Correspondence e-mail: l.glasser@curtin.edu.au
Values of molecular bond lengths, bond angles and (less frequently) bond torsion angles are readily available from databases, from crystallographic software, and/or from interactive molecular and crystal visualization programs such as Jmol. However, the methods used to calculate these values are less well known. In this paper, the computational methods are described in detail, and live Excel implementations, which permit readers to readily perform the calculations for their own molecular systems, are provided. The methods described apply to both fractional coordinates in crystal space and Cartesian coordinates in Euclidean space (space in which the geometric postulates of Euclid are valid) and are vector/matrix based. In their simplest computational form, they are applied as algebraic expansions which are summed. They are also available in matrix formulations, which are readily manipulated and calculated using the matrix functions of Excel. In particular, their general formulation as metric matrices is introduced. The methods in use are illustrated by a detailed example of the calculations. This contribution provides a significant practical application which can also act as motivation for the study of matrix mathematics with respect to its many uses in chemistry.
Keywords: metric matrices; bond angles; bond lengths; torsion angles; vectors.
1. Introduction
Students and teachers of chemistry are familiar with the lengths of and angles between chemical bonds, and even with torsion angles defining molecular conformations. They are also familiar with the source of these data, which generally arise from diffraction experiments, principally with X-ray sources but also with electron or neutron sources. However, the methods of calculation which yield these values are less familiar, often hidden in crystallographic computational programs, and not generally accessible to the non-expert. The difficulties of these calculations are compounded by the fact that the experimental results are generally presented in crystal spaces, which best represent the crystalline symmetry but which are not readily manipulated by those only familiar with everyday orthogonal Euclidean space described using Cartesian coordinates. Even for students who have the opportunity to perform a full X-ray structure determination (Kantardjieff, 2010; Chapuis, 2011; Aldeborgh et al., 2014; Gražulis et al., 2015), it appears that the structural details are obtained from the computational software, without students having the opportunity of acquainting themselves with the calculation methods.
The fundamental data of crystallography, that is, chemical composition, unit-cell dimensions, space-group symmetry and fractional atom coordinates, are reported as the results of the diffraction experiments. These data are made widely available by publication both in the primary literature and, since the 1990s, via deposition as crystallographic information files (CIFs, *.cif; Hall et al., 1991; https://www.iucr.org/resources/cif), which the major crystallographic journals currently require; the CIFs are subject to extensive checks (https://journals.iucr.org/services/cif/checkcif.html) in order to ensure their integrity. These files are freely available (Glasser, 2016) and large collections have been made into databases, which may be free or commercial, such as the Cambridge Structural Database (CSD, for organic materials – now exceeding one million entries; Groom et al., 2016; https://www.ccdc.cam.ac.uk/products/csd/), the Inorganic Crystal Structure Database (ICSD; Zagorac et al., 2019), the free Crystallography Open Database (COD; Gražulis et al., 2009), the free American Mineralogist Crystal Structure Database (Downs & Hall-Wallace, 2003), the free RCSB Protein Data Bank (PDB; Berman et al., 2000) and so forth (Glasser, 2016). However, the CIF generally provides only the basic information, and the molecular metrics of bond lengths, bond angles and torsion angles around bonds may still need to be generated.
There are many molecular visualization programs which permit the user to import CIFs, display atoms, and select atom pairs to generate bond lengths, atom triplets to generate bond angles and even sequential atom quadruplets to generate bond torsions. Principal among these are the CSD's online WebCSD (Thomas et al., 2010) with its free downloadable Mercury program (Macrae et al., 2020; https://www.ccdc.cam.ac.uk/Community/csd-community/freemercury/), and the free downloadable Jmol (https://www.jmol.org/), Avogadro (https://avogadro.cc/), VESTA (Momma & Izumi, 2011) and CrystalOgraph (https://www.epfl.ch/schools/sb/research/iphys/teaching/crystallography/crystalograph/) programs (but the last provides only bond lengths).
It is the purpose of the current paper to introduce the reader to the methods of calculation of molecular metrics from crystal space (Dunitz, 1995; Julian, 2014). In order to allow these rather complex calculations to be readily performed by the reader, two separate live Excel implementations of the methods are supplied, in which users can easily insert their own data. One is a 'black-box' implementation of the BASIC code in Appendix I of Dunitz's book (Dunitz, 1995, pp. 495–497), while the second lays out the matrix calculations in detail, demonstrating the mathematical processes involved. These programs parallel an earlier implementation by the author using the proprietary MathCad software (Glasser, 1993).
We do not delve into the complexities of reciprocal spaces applicable in crystallographic calculations, since this extends into specialist applications of crystallography. However, it is of interest to note that the direct and reciprocal lattices are mutual Fourier transforms, with the momentum difference between incoming and diffracted X-rays of a crystal being a reciprocal-lattice vector (https://www.doitpoms.ac.uk/tlplib/reciprocal_lattice/index.php).
2. Calculations using Cartesian coordinates
The atomic data in CIFs are listed in fractional coordinates, x, y, z, but it may be simpler to calculate the molecular metrics using Cartesian coordinates, X, Y, Z. (It is, however, possible to obtain the same results directly from the fractional coordinates, and this is considered in a subsequent section.)
2.1. Cartesian coordinates, X, Y, Z
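The following matrix equation (McRee, 1993a,b; https://www.ruppweb.org/Xray/tutorial/Coordinate%20system%20transformation.htm) (easily applied using the Excel array function MMULT) transforms fractional coordinates, x, y, z, in crystal space into Cartesian coordinates, X, Y, Z, using the crystal cell constants a, b, c, α, β, γ:
$$\begin{pmatrix} X \\ Y \\ Z \end{pmatrix} = \begin{pmatrix} a & b\cos\gamma & c\cos\beta \\ 0 & b\sin\gamma & c(\cos\alpha - \cos\beta\cos\gamma)/\sin\gamma \\ 0 & 0 & V/(ab\sin\gamma) \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}, \tag{1a}$$
where the volume of the unit cell is
$$V = abc\,(1 - \cos^2\alpha - \cos^2\beta - \cos^2\gamma + 2\cos\alpha\cos\beta\cos\gamma)^{1/2}. \tag{1b}$$
The OpenBabel Chemical Formatter (O'Boyle et al., 2011; https://www.cheminfo.org/Chemistry/Cheminformatics/FormatConverter/index.html) provides a convenient facility to convert CIFs directly to the Cartesian XYZ format.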
The volume, V, may equivalently be obtained in matrix terms (using the Excel function MDETERM) as the square root of the determinant of the metric matrix, G, which is introduced in Section 3.
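The transformation of this section can be sketched in a few lines of Python as a cross-check on a spreadsheet. This is only an illustrative sketch: it assumes the common convention in which a lies along X and b lies in the XY plane, the function names `frac_to_cart_matrix` and `frac_to_cart` are hypothetical, and the cell constants and coordinates are invented for the example.

```python
import math

def frac_to_cart_matrix(a, b, c, alpha_deg, beta_deg, gamma_deg):
    """Return (matrix, volume): the 3x3 fractional-to-Cartesian matrix and the cell volume."""
    # Trigonometric functions work in radians, as they do in Excel.
    alpha, beta, gamma = (math.radians(t) for t in (alpha_deg, beta_deg, gamma_deg))
    ca, cb, cg = math.cos(alpha), math.cos(beta), math.cos(gamma)
    sg = math.sin(gamma)
    # Cell volume from the six cell constants.
    volume = a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)
    matrix = [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, volume / (a * b * sg)],
    ]
    return matrix, volume

def frac_to_cart(matrix, xyz):
    """Multiply the matrix by a fractional coordinate triple (the MMULT step)."""
    return [sum(matrix[i][j] * xyz[j] for j in range(3)) for i in range(3)]

# Hypothetical triclinic cell (lengths in angstrom, angles in degrees), for illustration only.
M, V = frac_to_cart_matrix(5.0, 6.0, 7.0, 80.0, 95.0, 100.0)
X, Y, Z = frac_to_cart(M, [0.25, 0.50, 0.75])
print([round(v, 4) for v in (X, Y, Z)], round(V, 3))
```

Because the matrix is upper triangular, the product of its diagonal elements equals the cell volume, which gives a quick consistency check on any implementation.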
2.2. Vectors
We use bold face, e.g. v, to represent a vector, while |v| or italic v represents the magnitude (length – a scalar) of the vector. The italic form is used in algebraic expressions.
Vectors are described in terms of their coordinates along basis axes. For a general vector pi pointing from the coordinate origin to a point i (e.g. an atom centre) along the crystal axes a, b, c,
$$\mathbf{p}_i = x_i\mathbf{a} + y_i\mathbf{b} + z_i\mathbf{c} \;\rightarrow\; (X_i, Y_i, Z_i), \tag{2a}$$
where the arrow represents transformation through equation (1a) from coordinates in crystal space to Cartesian coordinates.
The vector length is calculated using the square root of the dot (or scalar) product function of the vector with itself:
$$|\mathbf{p}| = (\mathbf{p}\cdot\mathbf{p})^{1/2} = (X^2 + Y^2 + Z^2)^{1/2}. \tag{2b}$$
For a general pair of vectors, p and q at an angle θ, the dot product yields a scalar value:
$$\mathbf{p}\cdot\mathbf{q} = |\mathbf{p}||\mathbf{q}|\cos\theta = X_pX_q + Y_pY_q + Z_pZ_q, \tag{2c}$$
where each of these vectors is referenced to the coordinate origin. In Excel, the dot product can be evaluated with the function SUMPRODUCT.
In the operation of the dot product function, the cosine first provides a projection of one vector onto the other; then the product function multiplies and sums the components together [equation (2c)]. In physics, for example, this may correspond to work as the product of force times distance in the same direction as determined by the angle θ between the vectors. When the two vectors are parallel (as in the determination of a bond length), θ is zero and the cosine multiplier has the value one.
The vector length rji between points i and j, corresponding to a bond length, is obtained from the coordinate differences with the vector being multiplied by itself:
$$r_{ji} = [(X_j - X_i)^2 + (Y_j - Y_i)^2 + (Z_j - Z_i)^2]^{1/2}. \tag{2d}$$
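As a minimal sketch of the dot product and bond-length calculation in Cartesian coordinates, the Python below mirrors what SUMPRODUCT does on two ranges. The helper names `dot`, `length` and `bond_length` are hypothetical, and the two atom positions are invented for illustration.

```python
import math

def dot(p, q):
    """Sum of component-wise products, as SUMPRODUCT gives for two ranges."""
    return sum(pi * qi for pi, qi in zip(p, q))

def length(p):
    """Vector length: square root of the dot product of the vector with itself."""
    return math.sqrt(dot(p, p))

def bond_length(atom_i, atom_j):
    """Distance between two atoms given as Cartesian (X, Y, Z) triples."""
    difference = [xj - xi for xi, xj in zip(atom_i, atom_j)]
    return length(difference)

# Hypothetical Cartesian coordinates in angstrom.
atom_i = (0.0, 0.0, 0.0)
atom_j = (1.0, 2.0, 2.0)
print(bond_length(atom_i, atom_j))  # 3.0
```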
Bond angles, θ, can be calculated using either the dot (scalar) product or the cross (vector) product function; the latter generates a pseudovector normal (defined below), n, orthogonal to the vector pair rji and sjk which lie at an angle θ with respect to one another. This normal vector n is only required if a torsion angle is to be calculated, as will be seen in the discussion of torsion angles below.
The bond angle is calculated with respect to the atom sequence i—j—k with each bond vector, rji and sjk, referenced to the coordinates of the central atom j of the triplet; for example, ΔXr = (Xj − Xi).
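θ can be determined from the arc cosine of the dot product (Cockcroft, 2006):
$$\theta_{ijk} = \arccos\left(\frac{\mathbf{r}\cdot\mathbf{s}}{|\mathbf{r}||\mathbf{s}|}\right). \tag{3a}$$
Algebraically (Cockcroft, 2006),
$$\cos\theta_{ijk} = \frac{\Delta X_r\Delta X_s + \Delta Y_r\Delta Y_s + \Delta Z_r\Delta Z_s}{rs},$$
where
$$r = (\Delta X_r^2 + \Delta Y_r^2 + \Delta Z_r^2)^{1/2}$$
and
$$s = (\Delta X_s^2 + \Delta Y_s^2 + \Delta Z_s^2)^{1/2}.$$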
Alternatively, the cross product, r × s, yields the pseudovector, n, which is normal to the plane defined by r and s. The bond angle, θijk, is determined using the arc sine function:
$$\theta_{ijk} = \arcsin\left(\frac{|\mathbf{r}\times\mathbf{s}|}{|\mathbf{r}||\mathbf{s}|}\right) = \arcsin\left(\frac{|\mathbf{n}|}{rs}\right).$$
A pseudovector (or axial vector), being perpendicular to the plane of the three atoms forming the bond angle, changes sign when converted to its mirror image, so that the cross product is non-commutative:
$$\mathbf{r}\times\mathbf{s} = -(\mathbf{s}\times\mathbf{r}).$$
The standard selection for the sign of the torsion is positive when r rotates to s according to the right-hand rule. The cross product, r × s, gives the signed area of the parallelogram bounded by the vectors r and s. Similarly, the (scalar) triple product, a · (b × c) (https://en.wikipedia.org/wiki/Triple_product), represents the signed volume, V, of the unit cell with cell constants a, b, c, α, β, γ [for the algebraic form of V, see equation (1b)]. Excel does not have a cross-product function, but a user-defined cross-product function is listed in the supplementary information file.
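The bond-angle calculation by the arc cosine route, together with a cross-product helper of the kind Excel lacks, might be sketched as follows. The names `dot`, `cross` and `bond_angle` are hypothetical stand-ins (not the supplementary user-defined function), and the coordinates are invented; both bond vectors are taken as central atom minus outer atom, following the convention stated above.

```python
import math

def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))

def cross(r, s):
    """Cross product r x s, normal to the plane of r and s."""
    return [r[1] * s[2] - r[2] * s[1],
            r[2] * s[0] - r[0] * s[2],
            r[0] * s[1] - r[1] * s[0]]

def bond_angle(atom_i, atom_j, atom_k):
    """Angle i-j-k in degrees from the arc cosine of the normalised dot product."""
    r = [xj - xi for xi, xj in zip(atom_i, atom_j)]  # central atom j minus atom i
    s = [xj - xk for xk, xj in zip(atom_k, atom_j)]  # central atom j minus atom k
    cos_theta = dot(r, s) / math.sqrt(dot(r, r) * dot(s, s))
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cos_theta))

# Hypothetical Cartesian coordinates in angstrom.
atom_i, atom_j, atom_k = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (-1.0, 1.0, 0.0)
print(bond_angle(atom_i, atom_j, atom_k))

# Swapping the operands reverses the direction of the normal.
r, s = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
print(cross(r, s), cross(s, r))
```

The arc cosine route returns an angle between 0 and 180° directly, which is one reason it is often the more convenient of the two routes.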
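Torsion angles, ϕ, are calculated as the twist around the central bond j—k of the two planes defined by the orthogonal pseudovectors of the angle triplets i—j—k and j—k—l (see Figs. 1 and 2). Thus, we use the dot product of the pair of triplets to generate the cosine of the torsion angle, cos ϕijkl, from which we evaluate ϕijkl through the arc cosine function:
$$\phi_{ijkl} = \arccos\left(\frac{\mathbf{n}_{ijk}\cdot\mathbf{n}_{jkl}}{|\mathbf{n}_{ijk}||\mathbf{n}_{jkl}|}\right), \tag{4a}$$
where n_ijk and n_jkl are the normals (cross products of the two bond vectors) of the triplets i—j—k and j—k—l, each constructed in the same way.
It remains to determine the sign of ϕijkl (between −180 and +180°), which is accomplished in equation (4b) by calculating first the cross product of the orthogonal axial vectors of the two planes defining the bond angles (this cross product generates a new vector, now parallel to the planes) and then the scalar dot product of this vector with the vector representing the central bond j—k:
$$\mathrm{sign}(\phi_{ijkl}) = \mathrm{sign}\,[(\mathbf{n}_{ijk}\times\mathbf{n}_{jkl})\cdot\mathbf{r}_{kj}], \tag{4b}$$
where r_kj, with components (X_k − X_j, Y_k − Y_j, Z_k − Z_j), is the central bond vector directed from atom j to atom k.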
By the right-hand rule, if these parallel vectors point in the same direction the torsion angle is positive, but it is negative if they point in opposite directions.
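A signed torsion angle can be sketched along the lines of equations (4a) and (4b) as below. This is an illustrative sketch rather than the code of the supplementary Torsion worksheet: the function name `torsion` and the bond-vector labels `b1`, `b2`, `b3` are assumptions, the test coordinates are invented, and collinear atoms (a zero-length normal) are not handled.

```python
import math

def sub(p, q):
    return [pi - qi for pi, qi in zip(p, q)]

def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))

def cross(r, s):
    return [r[1] * s[2] - r[2] * s[1],
            r[2] * s[0] - r[0] * s[2],
            r[0] * s[1] - r[1] * s[0]]

def torsion(atom_i, atom_j, atom_k, atom_l):
    """Signed torsion angle i-j-k-l in degrees, in the range -180 to +180."""
    b1 = sub(atom_j, atom_i)
    b2 = sub(atom_k, atom_j)  # central bond, directed from j to k
    b3 = sub(atom_l, atom_k)
    n1 = cross(b1, b2)        # normal to the plane i-j-k
    n2 = cross(b2, b3)        # normal to the plane j-k-l
    cos_phi = dot(n1, n2) / math.sqrt(dot(n1, n1) * dot(n2, n2))
    phi = math.degrees(math.acos(max(-1.0, min(1.0, cos_phi))))
    # Sign: the cross product of the two normals lies along the central bond;
    # the angle is negative when it points against the j -> k direction.
    if dot(cross(n1, n2), b2) < 0:
        phi = -phi
    return phi

# Hypothetical four-atom chain with a right-handed quarter twist.
print(torsion((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Moving the fourth atom to the mirror-image position, (0, −1, 1), reverses the sign of the result while leaving its magnitude unchanged.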
3. The metric matrix and crystal (vector) space
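Converting from the fractional coordinates of crystal (vector) space to Cartesian coordinates of standard space is convenient, as shown in Section 4, but it takes no account of the symmetry of crystals. It is sometimes appropriate to perform geometry in the crystal space directly (for example, to generate symmetry-related atoms absent from the data provided in the CIF) (De Graef & McHenry, 2011, 2012; https://dictionary.iucr.org/Asymmetric_unit). This introduces the scalar metric matrix, G (also known as the metric tensor, g, when its use extends to nonlinear processes and is no longer a scalar), which uses the six crystal constants, a, b, c, α, β, γ, to evaluate the dot products of the basis vectors a, b, c:
$$\mathbf{a}\cdot\mathbf{a} = a^2,\quad \mathbf{b}\cdot\mathbf{b} = b^2,\quad \mathbf{c}\cdot\mathbf{c} = c^2,\quad \mathbf{b}\cdot\mathbf{c} = bc\cos\alpha,\quad \mathbf{a}\cdot\mathbf{c} = ac\cos\beta,\quad \mathbf{a}\cdot\mathbf{b} = ab\cos\gamma.$$
We collect the basis vectors into a metric matrix, G, where
$$G = \begin{pmatrix} \mathbf{a}\cdot\mathbf{a} & \mathbf{a}\cdot\mathbf{b} & \mathbf{a}\cdot\mathbf{c} \\ \mathbf{b}\cdot\mathbf{a} & \mathbf{b}\cdot\mathbf{b} & \mathbf{b}\cdot\mathbf{c} \\ \mathbf{c}\cdot\mathbf{a} & \mathbf{c}\cdot\mathbf{b} & \mathbf{c}\cdot\mathbf{c} \end{pmatrix}$$
so that
$$G = \begin{pmatrix} a^2 & ab\cos\gamma & ac\cos\beta \\ ab\cos\gamma & b^2 & bc\cos\alpha \\ ac\cos\beta & bc\cos\alpha & c^2 \end{pmatrix}.$$
Defining the column vector of fractional coordinate differences along a bond,
$$\boldsymbol{\Delta} = \begin{pmatrix} \Delta x \\ \Delta y \\ \Delta z \end{pmatrix},$$
so that $\boldsymbol{\Delta}^{\mathrm T} = (\Delta x\ \ \Delta y\ \ \Delta z)$, where the superscript T represents the transpose, the squared bond length is
$$r^2 = \boldsymbol{\Delta}^{\mathrm T} G\, \boldsymbol{\Delta} = a^2\Delta x^2 + b^2\Delta y^2 + c^2\Delta z^2 + 2bc\cos\alpha\,\Delta y\Delta z + 2ac\cos\beta\,\Delta x\Delta z + 2ab\cos\gamma\,\Delta x\Delta y. \tag{5c}$$
As noted earlier, the volume of the unit cell is equal to the square root of the determinant of G, which can be found in Excel using the function MDETERM.
In computation, it is convenient to record the six terms in equation (5c), a², b², c², …, ab cos γ, individually, since they might be used repeatedly for both bond length and bond angle calculations (Cockcroft, 2006). In the algebraic expansion, equation (5c) provides the terms which need to be summed in order to calculate a bond length. However, it is simpler in Excel to calculate the metric matrix expression of equation (5c) directly using the MMULT array function, without requiring the six-term expansion.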
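The metric-matrix route to a bond length, and the determinant route to the cell volume, can be sketched in Python as follows. The function names (`metric_matrix`, `quadratic_form`, `det3`, `bond_length_frac`) are hypothetical; the cell constants are those quoted for L-valine in Section 4, while the pair of fractional coordinates is invented for illustration.

```python
import math

def metric_matrix(a, b, c, alpha_deg, beta_deg, gamma_deg):
    """Metric matrix G built from the six cell constants (angles in degrees)."""
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha_deg, beta_deg, gamma_deg))
    return [
        [a * a, a * b * cg, a * c * cb],
        [a * b * cg, b * b, b * c * ca],
        [a * c * cb, b * c * ca, c * c],
    ]

def quadratic_form(u, G, v):
    """Evaluate u^T G v, the role played by two nested MMULT calls in Excel."""
    return sum(u[i] * G[i][j] * v[j] for i in range(3) for j in range(3))

def det3(m):
    """Determinant of a 3x3 matrix (the role of MDETERM)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def bond_length_frac(G, frac_i, frac_j):
    """Bond length directly from fractional coordinates."""
    delta = [xj - xi for xi, xj in zip(frac_i, frac_j)]
    return math.sqrt(quadratic_form(delta, G, delta))

# Cell constants quoted for L-valine in Section 4.
G = metric_matrix(9.71, 5.27, 12.06, 90.0, 90.8, 90.0)
print("cell volume:", round(math.sqrt(det3(G)), 2))

# Hypothetical pair of fractional coordinates, for illustration only.
print("bond length:", bond_length_frac(G, (0.10, 0.20, 0.30), (0.15, 0.35, 0.32)))
```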
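The metric matrix can be used to calculate bond angles directly, without having to calculate bond lengths |r| and |s| independently, as follows:
$$\mathbf{r}\cdot\mathbf{s} = |\mathbf{r}||\mathbf{s}|\cos\theta = a^2\Delta x_r\Delta x_s + b^2\Delta y_r\Delta y_s + c^2\Delta z_r\Delta z_s + bc\cos\alpha\,(\Delta y_r\Delta z_s + \Delta z_r\Delta y_s) + ac\cos\beta\,(\Delta x_r\Delta z_s + \Delta z_r\Delta x_s) + ab\cos\gamma\,(\Delta x_r\Delta y_s + \Delta y_r\Delta x_s),$$
where (Δx_r, Δy_r, Δz_r) and (Δx_s, Δy_s, Δz_s) are the fractional coordinate differences along the two bonds.
This relation can also be expressed in terms of the metric matrix G:
$$\mathbf{r}\cdot\mathbf{s} = \boldsymbol{\Delta}_r^{\mathrm T} G\, \boldsymbol{\Delta}_s,$$
with Δ_r and Δ_s the corresponding column vectors of fractional coordinate differences.
Hence
$$\cos\theta = \frac{\boldsymbol{\Delta}_r^{\mathrm T} G\, \boldsymbol{\Delta}_s}{(\boldsymbol{\Delta}_r^{\mathrm T} G\, \boldsymbol{\Delta}_r)^{1/2}(\boldsymbol{\Delta}_s^{\mathrm T} G\, \boldsymbol{\Delta}_s)^{1/2}}. \tag{6}$$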
Torsion angle calculations are generally regarded as being too complex to be formulated with fractional coordinates, and are often performed using Cartesian coordinates, as discussed above. However, the BASIC program provided by Dunitz (1995) does not have this limitation.
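For bond angles, the metric-matrix calculation in fractional coordinates might look like the sketch below. The names `metric_matrix`, `quadratic_form` and `bond_angle_frac` are hypothetical, and the hexagonal-type cell and atom positions are invented so that the expected angle (the 120° between the a and b axes) is easy to verify by eye.

```python
import math

def metric_matrix(a, b, c, alpha_deg, beta_deg, gamma_deg):
    """Metric matrix G built from the six cell constants (angles in degrees)."""
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha_deg, beta_deg, gamma_deg))
    return [
        [a * a, a * b * cg, a * c * cb],
        [a * b * cg, b * b, b * c * ca],
        [a * c * cb, b * c * ca, c * c],
    ]

def quadratic_form(u, G, v):
    """Evaluate u^T G v."""
    return sum(u[i] * G[i][j] * v[j] for i in range(3) for j in range(3))

def bond_angle_frac(G, frac_i, frac_j, frac_k):
    """Angle i-j-k in degrees, computed directly from fractional coordinates."""
    r = [xj - xi for xi, xj in zip(frac_i, frac_j)]  # central atom j minus atom i
    s = [xj - xk for xk, xj in zip(frac_k, frac_j)]  # central atom j minus atom k
    cos_theta = quadratic_form(r, G, s) / math.sqrt(
        quadratic_form(r, G, r) * quadratic_form(s, G, s))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))

# Hypothetical hexagonal-type cell: atoms one cell edge along a and along b
# from the origin subtend the interaxial angle gamma at the origin.
G = metric_matrix(3.0, 3.0, 5.0, 90.0, 90.0, 120.0)
print(bond_angle_frac(G, (1, 0, 0), (0, 0, 0), (0, 1, 0)))
```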
4. An example calculation (for L-valine) using both algebraic expansions and vector methods
For this exercise we demonstrate how the equations introduced in Sections 2 and 3 are implemented in practice, using the example of L-valine (Torii & Iitaka, 1970) with the data and results as they appear in the Valine worksheet of the supplementary workbook gj5247sup2.xlsx.
Atom numbering:
Unit-cell constants: a = 9.71, b = 5.27, c = 12.06 Å, α = 90, β = 90.8, γ = 90°.
Transformation matrix for conversion from fractional coordinates, x, y, z, to Cartesian coordinates, X, Y, Z:
where the volume of the unit cell
Note: in Excel, angles must be in radians, where rads = degs*pi()/180.
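As a numerical check on this step, the Python sketch below evaluates the transformation matrix of equation (1a) and the cell volume for the L-valine cell constants quoted above, with the degree-to-radian conversion written out as in the Excel note. The variable names are illustrative, and the layout assumes the usual convention with a along X and b in the XY plane.

```python
import math

# Unit-cell constants quoted above (angstrom and degrees).
a, b, c = 9.71, 5.27, 12.06
alpha_deg, beta_deg, gamma_deg = 90.0, 90.8, 90.0

# Same conversion as the Excel expression degs*pi()/180.
alpha, beta, gamma = (deg * math.pi / 180 for deg in (alpha_deg, beta_deg, gamma_deg))
ca, cb, cg, sg = math.cos(alpha), math.cos(beta), math.cos(gamma), math.sin(gamma)

# Cell volume, then the upper-triangular transformation matrix.
V = a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)
M = [
    [a, b * cg, c * cb],
    [0.0, b * sg, c * (ca - cb * cg) / sg],
    [0.0, 0.0, V / (a * b * sg)],
]

for row in M:
    print(["%.4f" % value for value in row])
print("V = %.2f cubic angstrom" % V)
```

With α = γ = 90° the only off-diagonal element of any size is the small c cos β term, reflecting the slight departure of β from 90°.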
Algebraic result, after matrix multiplication:
The fractional atomic coordinates input from the LVALIN.cif file from the CSD are given in Table 1.
Coordinate transformation, from fractional to Cartesian, using atom C1 as an example:
Alternatively, by direct matrix multiplication following substitutions into equation (1a)
The resulting Cartesian atomic coordinates are given in Table 2.
Bond lengths [equation (2d) or (5c)]:
Using Cartesian coordinates,
Using fractional coordinates,
The resulting bond lengths are C8—C9 = 1.534 Å, C7—C8 = 1.547 Å and C1—O4 = 1.265 Å.
Bond angles [equations (3a) or (6)]:
A bond angle is calculated with respect to the atom sequence i—j—k with each bond vector, rji and sjk, referenced to the coordinates of the central atom j of the triplet.
Consider the Cartesian coordinates for the bond angle O1—C1—C7 (Table 3), extracted from the supplementary Valine spreadsheet.
Using the scalar dot product with Cartesian coordinates, and expanding algebraically,
In our Excel Valine worksheet, the bond angle for O1—C1—C7 is calculated in cell J24 using the Excel expression
= ACOS(SUMPRODUCT(M24:M26,O24:O26)/(M27*O27)) = 117.80°
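where M24:M26 and O24:O26 refer to the Cartesian coordinate differences: ΔXr, ΔYr, ΔZr for bond O1—C1 and ΔXs, ΔYs, ΔZs for bond C7—C1, normalized by dividing by the lengths of the respective bonds, M27 and O27. Note that the bond differences are all calculated with respect to the central atom, C1. Because ACOS returns its result in radians, the value must be converted to degrees (for example with the Excel function DEGREES) to obtain the angle quoted.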
Expressing this calculation in matrix terms [equation (6)],
where r·s is determined in Excel as SUMPRODUCT(r, s), applied to the two ranges holding the components of r and s.
Algebraically, with fractional coordinates,
Alternatively, a bond angle can be found using the vector cross product:
The cross product can be determined using the user-defined function CVp (listed in the supplementary file) with the pair of bonds in Table 3. This generates a vector normal to the plane of the bond pair:
The length of this vector is $|\mathbf{v}_{rs}| = (X_{rs}^2 + Y_{rs}^2 + Z_{rs}^2)^{1/2} = 1.673$. Hence
A choice needs to be made between the angle and its supplement. Note that the incorrect acute angle is also returned by the algebraic method unless the bond vectors are first referenced to the central atom j of the sequence ijk.
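One way to make this choice automatically is to inspect the sign of the dot product of the same two bond vectors, since a negative dot product indicates an obtuse angle. The sketch below illustrates this under that assumption; the function name `angle_from_cross` is hypothetical and the bond vectors are invented for illustration.

```python
import math

def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))

def cross(r, s):
    return [r[1] * s[2] - r[2] * s[1],
            r[2] * s[0] - r[0] * s[2],
            r[0] * s[1] - r[1] * s[0]]

def angle_from_cross(r, s):
    """Bond angle in degrees from the arc sine of |r x s| / (|r||s|)."""
    n = cross(r, s)
    sin_theta = math.sqrt(dot(n, n)) / math.sqrt(dot(r, r) * dot(s, s))
    theta = math.degrees(math.asin(min(1.0, sin_theta)))
    # The arc sine of a non-negative number lies between 0 and 90 degrees,
    # so an obtuse angle comes back as its supplement; a negative dot
    # product identifies that case.
    if dot(r, s) < 0:
        theta = 180.0 - theta
    return theta

# Hypothetical bond vectors, both referenced to the central atom.
r = [1.0, 0.0, 0.0]
s = [-1.0, 1.0, 0.0]
print(angle_from_cross(r, s))
```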
Torsion angles are best found by the complex method of the Torsion worksheet, using equations (4a) and (4b), or by simple substitution into the gj5247sup1.xls workbook.
5. Conclusions
Procedures by which bond lengths, bond angles and torsion angles can be calculated from either Cartesian or fractional crystal coordinates, individually or by use of metric matrices, are illustrated and also demonstrated using live Excel spreadsheets. These procedures exemplify the otherwise hidden methods used in crystallographic and molecular visualization programs.
6. Supplementary files
Supporting information:
(i) An Excel macro-enabled workbook, gj5247sup1.xls, which calculates torsion angles, bond angles and bond lengths using fractional coordinate data inserted by the user into the worksheet. This Excel file contains a macro.
(ii) An Excel workbook, gj5247sup2.xlsx, consisting of four worksheets labelled SF6, Serine, Valine and Torsion, which lays out the matrix calculations involved in calculating molecular geometry from fractional coordinates. All the calculations are performed live, using standard Excel functions.
(iii) The gj5247sup3.pdf file describes the contents and operations within the four worksheets of gj5247sup2.xlsx, and describes a user-defined Excel cross-product function.
Supporting information
Excel implementation of Dunitz BASIC program. DOI: https://doi.org/10.1107/S1600576720007311/gj5247sup1.xls
Excel workbook with 4 worksheets. DOI: https://doi.org/10.1107/S1600576720007311/gj5247sup2.xlsx
Description of Excel files. DOI: https://doi.org/10.1107/S1600576720007311/gj5247sup3.pdf
Acknowledgements
I thank both referees for their detailed and helpful corrections and comments which have saved me from much embarrassment. This contribution was inspired by the `Molecular Geometry' analysis of J. K. Cockcroft (2006).
References
Aldeborgh, H., George, K., Howe, M., Lowman, H., Moustakas, H., Strunsky, N. & Tanski, J. M. (2014). J. Chem. Crystallogr. 44, 70–81.
Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). Nucleic Acids Res. 28, 235–242.
Chapuis, G. (2011). Crystallogr. Rev. 17, 187–204.
Cockcroft, J. K. (2006). Molecular Geometry, https://pd.chem.ucl.ac.uk/pdnn/refine2/geometry.htm.
De Graef, M. & McHenry, M. E. (2011). Structure of Materials: Additional Material, https://som.web.cmu.edu/frames.html.
De Graef, M. & McHenry, M. E. (2012). Structure of Materials: An Introduction to Crystallography, Diffraction and Symmetry, 2nd ed. Cambridge University Press.
Downs, R. T. & Hall-Wallace, M. (2003). Am. Miner. 88, 247–250.
Dunitz, J. D. (1995). X-ray Analysis and the Structure of Organic Molecules, 2nd ed. Basel: Verlag Helvetica Chimica Acta.
Glasser, L. (1993). Comput. Chem. 17, 107–108.
Glasser, L. (2016). J. Chem. Educ. 93, 542–549.
Gražulis, S., Chateigner, D., Downs, R. T., Yokochi, A. F. T., Quirós, M., Lutterotti, L., Manakova, E., Butkus, J., Moeck, P. & Le Bail, A. (2009). J. Appl. Cryst. 42, 726–729.
Gražulis, S., Sarjeant, A. A., Moeck, P., Stone-Sundberg, J., Snyder, T. J., Kaminsky, W., Oliver, A. G., Stern, C. L., Dawe, L. N., Rychkov, D. A., Losev, E. A., Boldyreva, E. V., Tanski, J. M., Bernstein, J., Rabeh, W. M. & Kantardjieff, K. A. (2015). J. Appl. Cryst. 48, 1964–1975.
Groom, C. R., Bruno, I. J., Lightfoot, M. P. & Ward, S. C. (2016). Acta Cryst. B72, 171–179.
Hall, S. R., Allen, F. H. & Brown, I. D. (1991). Acta Cryst. A47, 655–685.
Julian, M. M. (2014). Foundations of Crystallography with Computer Applications, 2nd ed. Abingdon-on-Thames: CRC Press.
Kantardjieff, K. (2010). J. Appl. Cryst. 43, 1276–1282.
Macrae, C. F., Sovago, I., Cottrell, S. J., Galek, P. T. A., McCabe, P., Pidcock, E., Platings, M., Shields, G. P., Stevens, J. S., Towler, M. & Wood, P. A. (2020). J. Appl. Cryst. 53, 226–235.
McRee, D. E. (1993a). Practical Protein Crystallography. San Diego: Academic Press.
McRee, D. E. (1993b). Practical Protein Crystallography, https://www.sciencedirect.com/book/9780124860506/practical-protein-crystallography.
Moggach, S. A., Allan, D. R., Morrison, C. A., Parsons, S. & Sawyer, L. (2005). Acta Cryst. B61, 58–68.
Momma, K. & Izumi, F. (2011). J. Appl. Cryst. 44, 1272–1276.
O'Boyle, N. M., Banck, M., James, C. A., Morley, C., Vandermeersch, T. & Hutchison, G. R. (2011). J. Cheminform. 3, 33.
Thomas, I. R., Bruno, I. J., Cole, J. C., Macrae, C. F., Pidcock, E. & Wood, P. A. (2010). J. Appl. Cryst. 43, 362–366.
Torii, K. & Iitaka, Y. (1970). Acta Cryst. B26, 1317–1326.
Zagorac, D., Müller, H., Ruehl, S., Zagorac, J. & Rehme, S. (2019). J. Appl. Cryst. 52, 918–925.
© International Union of Crystallography. Prior permission is not required to reproduce short quotations, tables and figures from this article, provided the original authors and source are cited.