The anatomy of a comprehensive constrained, restrained refinement program for the modern computing environment – Olex2 dissected

Bourhis, L.J.; Dolomanov, O.V.; Gildea, R.J.; Howard, J.A.K.; Puschmann, H.

doi:10.1107/S2053273314022207

research papers

FOUNDATIONS
ADVANCES

ISSN: 2053-2733

Volume 71| Part 1| January 2015| Pages 59-75

doi:10.1107/S2053273314022207

Open

access

The anatomy of a comprehensive constrained, restrained refinement program for the modern computing environment – Olex2 dissected

Luc J. Bourhis,^a ^* Oleg V. Dolomanov,^b Richard J. Gildea,^c Judith A. K. Howard ^d and Horst Puschmann ^b

^aBruker AXS–SAS, 4 Allée Lorentz, 77447 Marne-la-Vallée cedex 2, France, ^bOlexSys Ltd, Department of Chemistry, Durham University, South Road, Durham, DH1 3LE, England, ^cDiamond Light Source Ltd, Diamond House, Harwell Oxford, Didcot, Oxfordshire, OX11 0DE, England, and ^dDepartment of Chemistry, Durham University, South Road, Durham, DH1 3LE, England
^*Correspondence e-mail: [email protected]

(Received 19 June 2014; accepted 8 October 2014)

This paper describes the mathematical basis for olex2.refine, the new refinement engine which is integrated within the Olex2 program. Precise and clear equations are provided for every computation performed by this engine, including structure factors and their derivatives, constraints, restraints and twinning; a general overview is also given of the different components of the engine and their relation to each other. A framework for adding multiple general constraints with dependencies on common physical parameters is described. Several new restraints on atomic displacement parameters are also presented.

Keywords: small molecules; refinement; constraints; restraints; least squares; Olex2.

1. Introduction

During the last four decades, small-molecule crystallographers have used a myriad of stand-alone routines and various comprehensive software packages, most notably those written by Sheldrick (1997 , 2008 ) and that designed by Bob Carruthers and John Rollett, then maintained and enhanced by David Watkin (Betteridge et al., 2003 ). These latter two packages have dominated the citations in all publications containing crystal structure analyses for many years now.

Therefore it might be claimed that the analysis of single-crystal X-ray (and neutron) diffraction data has reached a certain level of maturity and that there is no need for further program development. Nonetheless, there is still considerable activity in this area by various groups around the world and there has been a new release, SHELX2013, from Sheldrick recently. A retro-fit of new ideas to these older programs is not always possible, except perhaps by the authors themselves.

When we first embarked on the project herein reported¹ we were clear about one point – namely that we wanted to create a new and flexible refinement engine, working in a collaborative environment and to be based on trusted and mature pre-existing code. This new refinement engine would be open source and therefore available for verification and modification by others. Modern programming paradigms, unknown when the aforementioned packages were created in the 1960s, would form the basis of our development. It is hoped that such an architecture will allow extensions to the code without breaking the program or endangering the underlying functionality. It is this that we describe below in further detail and which now forms the underlying code for olex2.refine (https://www.olex2.org).

We decided to base the computational core of olex2.refine on a pre-existing project, the Computational Crystallography Toolbox (cctbx), that provides the foundation for the macromolecular refinement facilities in the PHENIX suite of Adams et al. (2010 ). We made that choice because it provided the solid and versatile tools (Grosse-Kunstleve & Adams, 2003 ) that we needed to develop our project. The most important of them is a comprehensive and robust toolbox to handle crystal symmetries (sgtbx) that supports every space group in any setting, even rather unusual settings such as tripled cells. Another key cctbx toolbox is the eltbx, which is concerned with scattering-factor computations for any atom or ion and any wavelength that could be encountered in practice. The cctbx also featured many of the restraints we needed, in the module cctbx.restraints.

For our project, we contributed to it a Small-Molecule Toolbox (smtbx) which features, in particular, the constraint framework presented in detail in §3 and Appendix C, and the computation of structure factors presented in Appendix A. We also contributed several new restraints to cctbx, restraints which are discussed in §4 and Appendix D. Finally, both the cctbx and the smtbx use the Scientific Toolbox (scitbx) for arrays, matrices, special functions and other purely mathematical matters. We have contributed a new least-squares toolbox (lstbx) to the scitbx, which provides flexible tools to deal with generic linear and non-linear problems, with or without an unknown overall scale factor. It implements the method based on normal equations presented in Appendix B.

The program Olex2 relies either on SHELX or on the smtbx to refine structures. If using the latter, Olex2 converts its internal model of a structure into the objects used by the smtbx and the cctbx to represent the unit cell, reflections, crystal symmetry, constraints, restraints etc. Once the smtbx has produced the desired results they are sent back to Olex2 which converts them to its own representations, so as to perform diverse post-processing and eventually display the results to the user of the program. Fig. 1 summarizes the dependencies between the various modules and programs that have just been described.

Figure 1
A high-level depiction of the software modules involved in olex2.refine: an arrow is drawn from each component pointing towards another component it uses.

2. Least-squares refinement

A small-molecule structure refinement typically minimizes the weighted least-squares (LS) function

$[L = \underbrace{\textstyle\sum\limits_h w_h \left(Y_{{\rm o},h} - K Y_{{\rm c},h} \right)^2}_{L_{{\rm data}}} + \underbrace{\textstyle\sum\limits_{{{\rm restraint} }\, i} w_i \left(T_{{\rm o},i} - T_{{\rm c},i} \right)^2}_{L_{{\rm restraints}}}, \eqno (1)]$

where Y_o (respectively, Y_c) denotes either the measured amplitude F_o (respectively, the modulus of the calculated complex structure factor $[|{F_{\rm c}}|]$ ) or the measured intensity I_o = F_o² (respectively, the calculated intensity $[I_{\rm c} = |{F_{\rm c}}| ^2]$ ), whereas K is an overall, unknown scale factor that places Y_c on the same scale as Y_o. The first sum over h runs over a set of m non-equivalent reflections that have been observed. Each observation is given an appropriate weight, w_h, based on the reliability of the measurement. These may be pure statistical weights, $[w = 1/\sigma^2(Y_{\rm o})]$ , where σ is the estimated standard deviation of Y_o, though more complex weighting schemes are usually used. In the most general case, w_h is a function of Y_o,h, $[\sigma(Y_{{\rm o},h})]$ , Y_c,h and h itself, whereas the most common weighting schemes exhibit only a dependence on the first three.

These X-ray observations can be supplemented with the use of `observations of restraint', as suggested by Waser (1963 ), where additional information such as target values for bond lengths, angles etc. is included in the minimization. This is the origin of the second term of the sum, where T_o is the target value for our restraint, and T_c is the value of the target function calculated using the current model (see, for example, Giacovazzo et al., 2011 ; Watkin, 2008 ). This term L_restraints may of course be absent. Conversely, the data term could be absent in a refinement against geometrical terms only [cf. the program DLS-76 by Baerlocher et al. (1978 ) for example].

In equation (1), the crystallographic parameters [i.e. atomic positions, atomic displacement parameters (ADPs) and chemical occupancy in routine refinements] that each component of Y_c and T_c depends upon will be collectively denoted as the vector $[y = (y_1, \ldots, y_p)]$ and we can therefore denote those dependencies with the compact notations Y_c(y) and T_c(y). The parameters that are actually refined may be a different set $[x_1, \ldots, x_n]$ , collectively denoted as a vector x. We assume that the dependency of y upon x is known analytically. Since the scale factor K is unknown, our problem is the minimization of L with respect to all of $[K, x_1, \ldots, x_n]$ . We will denote that dependency as L[y(x), K], or more tersely when we do not need to remember the crystallographic parameter vector y, as L(x, K). These notations reflect the important fact that we treat the scale factor K separately, as will be explained later.

For a small-molecule structure with a high data-to-parameter ratio, such unconstrained minimization as defined by the first term of equation (1), when $[y \equiv x]$ , may well be sufficient. However, as the structure becomes larger, or the data-to-parameter ratio worsens, unconstrained minimization may not be well behaved, or result in some questionable parameter values. It has become customary to rely on the use of two techniques to solve such issues:

Restraints: by having a non-zero second term in equation (1) – with the use of appropriate weighting of the restraints – the minimization is gently pushed towards giving a chemically sensible and hopefully correct structure.

Constraints: by having a parameter vector x shorter than y, therefore explicitly taking into account exact relationships between some of the parameters y_i.

Whether the refinement uses restraints or constraints or both, at each refinement cycle, we first find the value of K that minimizes L(x, K) keeping the parameter vector x fixed: we will denote it by $[\tilde{K}(x)]$ , where we use a notation that is explicit in its dependency on the value of the refined parameters at the beginning of the cycle. We then search for the shift vector s of the parameter vector x that brings $[L[x, \tilde{K}(x)]]$ closer to the minimum. It is solution of the so-called normal equations,

$[(B_{{\rm data}} + B_{{\rm restraints}})s = -(g_{{\rm data}} + g_{{\rm restraints}}), \eqno (2)]$

where B_data and B_restraints are the so-called normal matrices associated with the two terms sharing the same labels in equation (1), respectively, and where the right-hand side g_data and g_restraints are equivalently associated with those two terms [see the derivation of equation (72) in Appendix B].

The rest of the paper is organized as follows. §3 deals with the computation of the derivatives with respect to the refined parameters $[x_1, \ldots, x_n]$ from the derivatives with respect to the crystallographic parameters $[y_1, \ldots, y_p]$ . The computation of the latter derivatives is expounded in Appendix A along with the formulae for the complex structure factors F_c. Appendix C details the computation of the value and derivatives of constrained parameters for each constraint featured by olex2.refine. §4 is devoted to the general principles of the computation of B_restraints and g_restraints. It is completed by Appendix D, which gives the formulae for each restraint featured by olex2.refine. §5 deals with the refinement of twinned structures. §6 details the computation of standard uncertainties (s.u.'s), emphasizing the influence of constraints. §7 gives a very quick overview of the output of the results of a refinement. Appendix B presents the method used to find the optimal scale factor K, and then the optimal value of all the other refined parameters.

3. Constraints

3.1. Synopsis

The difference between restraints and constraints may be conceptualized mainly in two manners that we will first illustrate with a simple example: a geometrically constrained acetylenic hydrogen, X≡C—H. The position of the hydrogen must be such that the distance CH is equal to some ideal bond length d and such that the angle $[X \widehat{{\rm C}}{\rm H}]$ is equal to 180°. Expressed in such an implicit manner, those restrictions could be used to construct restraints by adding to the function to minimize a term w₁(CH − d)² + w₂( $[\cos X\widehat{{\rm C}}{\rm H}]$ − 1)². By doing so, the number of refined parameters would not be changed but the number of observations would increase by three.

It is, however, traditional to do the opposite, by reducing the number of parameters by three and keeping the number of observations unchanged. This is achieved by using the implicit form of the constraints to express the position of H as a function of the positions of the two carbon atoms. Denoting by x the triplet of coordinates of the atoms,

$[x_{{\rm H}} = x_{{\rm C}} + d {{x_{{\rm C}} - x_{X}}\over{\vert\vert x_{{\rm C}} - x_{X}\vert\vert}},\eqno(3)]$

where $[\vert\vert. \vert\vert]$ denotes the Euclidean norm. By using this formula, the observable Y_c of the fit (either $[| {F_{\rm c}}| ^2]$ or $[|{F_{\rm c}}|]$ ) that used to depend on x_C, x_X and x_H is replaced by a function $[\tilde{Y}_{\rm c}]$ of x_C and x_X but not of x_H. We will call this transformation a reparametrization: $[\tilde{Y}_{\rm c}(x_{{\rm C}}, x_{X}) = Y_{\rm c}[x_{{\rm C}}, x_{X}, x_{{\rm H}}(x_{{\rm C}}, x_{X})]]$ and we will say that x_H(x_C, x_X) is a reparametrization of x_H, whose arguments are x_C and x_X.

The derivatives for the remaining variables are obtained with the chain rule

$[\eqalignno { {{\partial \tilde{Y}_{\rm c}}\over{\partial x_{{\rm C}}}} &= {{\partial Y_{\rm c}}\over{\partial x_{{\rm C}}}} + {{\partial Y_{\rm c}}\over{\partial x_{{\rm H}}}} \,\, {{\partial x_{{\rm H}}}\over{\partial x_{{\rm C}}}}, &\cr {{\partial \tilde{Y}_{\rm c}}\over{\partial x_{X}}} &= {{\partial Y_{\rm c}}\over{\partial x_{X}}} + {{\partial Y_{\rm c}}\over{\partial x_{{\rm H}}}} \,\, {{\partial x_{{\rm H}}}\over{\partial x_{X}}}. & (4)\cr}]$

We use the following compact notations for derivatives: for a column vector

$[x=\left (\matrix { x_1 \cr x_2 \cr x_3}\right),]$

$[{{\partial F}/{\partial x}}]$ will always denote the row vector

$[\left ({{\partial F}\over{\partial x_1}}, \, {{\partial F}\over{\partial x_2}}, \, {{\partial F}\over{\partial x_3}}\right).]$

Given another column vector

$[y = \left (\matrix { y_1 \cr y_2 \cr y_3}\right),]$

$[{{\partial y}/{\partial x}}]$ will always denote the matrix

$[\left ({{\partial y_i}\over{\partial x_j}}\right) _{1 \le i,j \le 3}]$

where i (respectively, j) indexes the rows (respectively, the columns). The identity matrix will be denoted by $[{\bf 1}]$ .

It should be noted that it is customary to work within the `riding' approximation,

$[{{\partial x_{{\rm H}}}\over{\partial x_{X}}} = 0, \quad {{\partial x_{{\rm H}}}\over{\partial x_{{\rm C}}}} = {\bf 1}, \eqno (5)]$

which results in the much simplified chain rule

implemented in all refinement programs, including Olex2.

It is not always the case that all three coordinates of an atom are removed from the refinement by constraints. For example, an atom in the plane of the mirror z,y,x whose matrix reads

$[M = \left (\matrix { 0 & 0 & 1\cr 0 & 1 & 0\cr 1 & 0 & 0}\right) \eqno (7)]$

has its coordinates x = (x₁, x₂, x₃) constrained by the relation M x = x, which can be reduced to x₁ = x₃ only. The corresponding reparametrization may be written

$[x_1 = u_1,\quad x_2 = u_2,\quad x_3 = u_1, \eqno (8)]$

the observable Y_c becoming now a function of the vector of newly introduced refinable parameters u = (u₁, u₂). There is a general algorithm implemented in the cctbx that for any special position returns a matrix Z so that the reparametrization takes the form

$[x = Z u + z, \eqno (9)]$

where Z is a 3 ×2 or 3 ×1 matrix and z is a 3-vector. This algorithm first determines the space-group symmetries that leave the site invariant. The resulting system of linear equations, which read M x = x in our example, is then reduced to a triangular form from which the matrix Z and the vector z are then readily obtained. This algorithm does not therefore try to determine which components of u, if any, are also components of x. This is why we have presented the reparametrization for our example in this general form instead of keeping x₁ and x₂ as refinable parameters, which would sound more intuitive in the first place. This matrix Z is the matrix of constraints for the position of that atom.

The anisotropic displacement tensor U is subject to symmetry constraints as well. In our example, it must satisfy

$[M U M^T = U, \eqno (10)]$

where M^T denotes the transpose of the matrix M (see e.g. Giacovazzo et al., 2011). Any number of such matrix equations can always be rewritten as a system of equations whose most general form reads

$[P \left (\matrix { U_{11}\cr U_{22}\cr U_{33}\cr U_{12}\cr U_{13}\cr U_{23}}\right) = 0, \eqno (11)]$

where P has 6 columns and 6n rows, where n is the number of symmetry elements other than the identity involved in the special position [indeed equation (10), being trivial for $[M = {\bf 1}]$ , can safely be discarded]. In our example,

$[P = \left (\matrix { -1 & 0 & 1 & 0 & 0 & 0\cr 0 & 0 & 0 & 0 & 0 & 0\cr 1 & 0 & -1 & 0 & 0 & 0\cr 0 & 0 & 0 & -1 & 0 & 1\cr 0 & 0 & 0 & 0 & 0 & 0\cr 0 & 0 & 0 & 1 & 0 & -1}\right) \eqno (12)]$

and equation (11) reduces to

$[U_{11} = U_{33}\,\, { \rm and } \,\, U_{12} = U_{23}, \eqno (13)]$

leading to the constraint matrix

$[\left (\matrix { U_{11}\cr U_{22}\cr U_{33}\cr U_{12}\cr U_{13}\cr U_{23}}\right) = \left (\matrix { 1 & 0 & 0 & 0\cr 0 & 1 & 0 & 0\cr 1 & 0 & 0 & 0\cr 0 & 0 & 1 & 0\cr 0 & 0 & 0 & 1\cr 0 & 0 & 1 & 0}\right) \left (\matrix { v_1\cr v_2\cr v_3\cr v_4}\right). \eqno (14)]$

Then one would refine (v₁, v₂, v₃, v₄) instead of the U_ij. The cctbx provides an algorithm that computes this matrix P for any special positions, and then reduces it to triangular form in order to determine the constraint matrix for the ADP.

In other cases, a reparametrization will make some crystallographic parameters disappear while introducing new refinable parameters. A typical example is that of a tetrahedral X—CH₃, as the geometrical constraints leave one degree of freedom, a rotation about the axis X—C. Thus the reparametrization expresses the coordinates of the hydrogen atoms H₀, H₁ and H₂ as functions of the coordinates of the carbon atoms, and of an angle φ modelling that rotation,

$[\eqalignno { x_{{\rm H}n} &= x_{{\rm C}} + d \Biggl \{\sin \alpha \left [ \cos \left (\varphi + n {{2\pi}\over{3}}\right) e_1 + \sin \left (\varphi + n {{2\pi}\over{3}}\right) e_2 \right ] &\cr &\quad- \cos \alpha e_0 \Biggr\} ,&(15)\cr}]$

where $[\alpha \simeq]$ 109.5° and (e₀, e₁, e₂) is an orthonormal basis of column vectors with e₀ in the direction of the bond $[X \rightarrow {\rm C}]$ . The riding approximation in this case consists of neglecting the derivatives of those base vectors, leading to

$[\eqalignno { {{\partial \tilde{Y}_{\rm c}}\over{\partial x_{{\rm C}}}} &= {{\partial Y_{\rm c}}\over{\partial x_{{\rm C}}}} + {{\partial Y_{\rm c}}\over{\partial x_{{\rm H}}}}, &\cr {{\partial \tilde{Y}_{\rm c}}\over{\partial x_{X}}} &= {{\partial Y_{\rm c}}\over{\partial x_{X}}}, &\cr {{\partial \tilde{Y}_{\rm c}}\over{\partial \varphi}} &= {{\partial Y_{\rm c}}\over{\partial x_{{\rm H}}}} d \sin \alpha \left[ - \sin \left (\varphi + n {{2\pi}\over{3}}\right) e_1 + \cos \left (\varphi + n {{2\pi}\over{3}}\right) e_2\right].&\cr&&(16)\cr }]$

Thus a new derivative with respect to the new refinable parameter φ is introduced by this reparametrization.

The last important concept is that of the chaining or composition of reparametrizations, that we will illustrate with a combination of the examples above. This example is not particularly common but it is a simple illustration of the concept we want to introduce. In the CH₃ case, the atoms C and X could be on the special position studied in the next-to-last example. One type of disorder could be modelled by first applying the reparametrization (15) and then reparametrizing x_C and x_X using equation (8), introducing parameters (u,v) for the former and Mathematical equation for the latter. The derivatives would then be obtained by the chain rule, e.g.

$[{{\partial x_{{\rm H_0}}}\over{\partial u}} = {{\partial x_{{\rm H_0}}}\over{\partial x_{{\rm C}}}} {{\partial x_{{\rm C}}}\over{\partial u}} = 2 \eqno (17)]$

in the riding approximation. This composition of reparametrization may be represented as a graph: each parameter (some of them are triplets of coordinates, others are scalars), x_H₀, x_H₁, x_H₂, x_C, x_X, φ, u, v, Mathematical equation and , constitutes a node of that graph, whereas edges are drawn from each node to its arguments, i.e. the nodes it depends upon, as shown in Fig. 2. In this example, x_C has only one argument, u, whereas x_H₀ has three arguments, x_C, x_X and φ. The smtbx implements reparametrizations by explicitly building such a graph.

Figure 2
Example of a dependency graph for the chain of reparametrization discussed in §3

. Only the part for hydrogen atom H₀ is shown.

As models become more complex, e.g. hydrogen atoms riding on the atoms of a rigid body whose rotation centre is tied to an atom whose coordinates are refined, the reparametrization graph becomes deeper. We decided not to put arbitrary limits on that graph. Indeed, we could have made a closed list of reparametrizations and of reparametrization combinations that our framework would accept but instead we decided to write our code so that it could correctly handle the computation of parameter values and of partial derivatives for arbitrary reparametrizations, combined in arbitrary ways. This framework is therefore open as new types of reparametrizations can be added, without the need to change the basic infrastructure in any way, and without the risk of breaking existing reparametrizations. This has proven very useful to the authors as this enabled them to incrementally add the wealth of constraints now available, some of which are unique to olex2.refine, as discussed in Appendix C. Furthermore, it enables third parties to develop their own constraints without the need for the involvement of the original authors beyond documenting how the reparametrization framework works.

A crystallographic refinement may involve many such reparametrizations. By piecing them all together, we obtain one reparametrization that expresses all refinable crystallographic parameters as a smaller vector of independent parameters that shall then be refined. Our framework safeguards that piecewise construction in several ways. First, at most one reparametrization may be applied to any given parameter. An attempt to add a reparametrization to a parameter that is already subject to one would be rejected by our framework as an error. Then if a cycle were found in the dependency graph, the framework would also reject the parametrization. This would happen when at least one parameter, through a series of reparametrization combinations, depends upon itself. Thus our framework safeguards against incorrect user inputs, and also against bugs in our own code that automatically builds constraints.

3.2. Constraint matrix

There is a wealth of algorithms designed to minimize least squares, but crystallographic software has only implemented a few of them. The two most popular methods have historically been the full matrix³ (all small-molecule programs, including Olex2, as explained in Appendix B) and conjugate gradient LS (CGLS, see e.g. Björck, 1996 ) which SHELXL offers as an option along with full matrix. The macromolecular community later introduced the limited-memory Broyden–Fletcher–Goldfarb–Shanno method (LBFGS, Nocedal, 1980 ), phenix.refine (Afonine et al., 2012 ) and a sparse Gauss–Newton algorithm, REFMAC (Murshudov et al., 2011 , §4 and references therein). All methods mentioned above have in common the fact that they require only the computation of the value and the first-order derivatives of the calculated F or F², that we have denoted as Y_c in this paper. Since no higher-order derivatives are used, the implementation of constrained least squares is greatly simplified.

Indeed, after transforming every constraint on the model into a reparametrization (as explained and exemplified in the previous section) and piecing all those reparametrizations together, we obtain a global reparametrization of the vector of crystallographic parameters $[y = (y_1, \ldots, y_p)]$ as a function of the vector $[x = (x_1, \ldots, x_n)]$ of the parameters that are actually refined. It should be noted that any component y_k of y that is not reparametrized – e.g. a coordinate of the pivot atom on which another atom rides, and of course any parameter that is not involved in any constraint – is also a component x_l of x, i.e.

$[{{\partial y_k}\over{\partial x_j}} = \left \{ \matrix { 1 & {\rm if } \,\, j = l, \cr 0 & {\rm if } \,\, j \ne l.}\right.]$

From the known analytical expression of Y_c(x), one computes the derivatives of $[{{\partial Y_{\rm c}}/{\partial y_i}}]$ (the reader is referred to Appendix A for a detailed presentation of those computations). The minimization algorithm then only needs the derivatives of

$[\tilde{Y}_{\rm c}(x) = Y_{\rm c}[y(x)] \eqno (18)]$

with respect to each x_i, which are easily obtained by a simple application of the chain rule

$[{{\partial \tilde{Y}_{\rm c}}\over{\partial x}} = {{\partial Y_{\rm c}}\over{\partial y}} \,\, {{\partial y}\over{\partial x}}. \eqno (19)]$

The matrix $[{{\partial y}/{\partial x}}]$ is known as the constraint matrix in crystallographic circles.⁴ In standard mathematical nomenclatures, it is called the Jacobian of the transformation $[x \,\mapsto\, y]$ and we will therefore denote it J.

The computation of J takes advantage of the fact that it is a very sparse matrix. This stems from the fact that any given crystallographic parameter y_i depends on very few refined parameters x_j, as amply illustrated in the previous section. We take advantage of this property to drastically reduce the memory cost of J and the cost of computing each of its non-zero coefficients. More precisely, both of these costs scale as the number of non-zero elements in J instead of O(n²) if J were treated as a dense matrix.

4. Restrained least-squares refinement

In this section we will give an overview of the computations involved in restrained refinement. The mathematical formulae for each restraint objective and their derivatives can be found in Appendix D.

Since each restraint target T_c,i only depends on a very small subset of the y_j (e.g., in the case of a bond length, only the six coordinates of the two bonded atoms would play a role), the matrix of derivatives with respect to the crystallographic parameters

$[D_{{\rm restraints}} = {{\partial T_{\rm c}}\over{\partial y}} = \left ({{\partial T_{{\rm c}, i}}\over{\partial y_j}}\right) _{i,j}, \eqno (20)]$

which is known as the design matrix for the restraints, is very sparse. We therefore use sparse-matrix techniques to efficiently store and perform computations with D, by only storing non-zero elements, and never performing any multiplication that involves an element of D known to be zero. We then introduce the matrix of derivatives with respect to the refined parameters,

$[{\tilde D}_{{\rm restraints}} = {{\partial T_{\rm c}}\over{\partial x}} = D_{{\rm restraints}} {{\partial y}\over{\partial x}}, \eqno (21)]$

which is computed by forming the product of D and the constraint matrix. Thus the restraints are initially built up without any knowledge of the constraint matrix. This greatly simplifies their implementation and it simplifies their use in a refinement program that does not use constraints. This organization of the computation does not incur any inefficiency as the product (21) is a cheap operation since both matrices are sparse, and it therefore scales as the number of non-zero elements, which is typically much smaller than the number of parameters n.

The two terms in the normal equations (2) coming from the restraints then read as the matrix product and matrix-vector product,

$[B_{{\rm restraints}} = {\tilde D}^T_{{\rm restraints}} W{\tilde D}_{{\rm restraints}} \eqno (22)]$

$[g_{{\rm restraints}} = {\tilde D}^T_{{\rm restraints}} \Delta T, \eqno (23)]$

where D^T denotes the transpose of D and where W is the diagonal matrix featuring the restraint weights, and where

$[\Delta T_i = T_{{\rm o},i} - T_{{\rm c},i} \eqno (24)]$

is the residual for the ith restraint. We again take advantage of the sparsity of $[\tilde{D}_{{\rm restraints}}]$ to efficiently implement those products.

It would be desirable to place the weights of the restraints on the same scale as the typical residual, such that a restraint will have a similar strength for the same weight in different structures. Rollett (1970 ) suggests the normalization factor

$[w_{{\rm restraints}} = {{1}\over{m-n}} \sum\limits_{h}{ w_h (Y_{{\rm o},h} - K Y_{{\rm c},h})^2}. \eqno (25)]$

This is better known as the square of the goodness of fit, $[\chi^2]$ . This normalizing factor also allows the restraints to have greater influence when the fit of the model to the data is poor (and the goodness of fit is greater than unity), whilst their influence lessens as the fit improves (Sheldrick, 1997).

4.1. Implementation

The choice of the minimization algorithm has a significant impact on the organization of the computation of restraints. Indeed, a generic minimizer such as the LBFGS minimizer used in phenix.refine requires at each iteration only the function value L and the derivatives $[{{\partial L}/{\partial x}}]$ . The derivatives $[{{\partial L_{{\rm data}}}/{\partial x}}]$ and $[{{\partial L_{{\rm restraints}}}/{\partial x}}]$ can be calculated separately before combining their sum to obtain $[{{\partial L}/{\partial x}}]$ . In contrast, for full-matrix least squares we need the matrices of partial derivatives $[{{\partial F_{\rm c}}/{\partial x}}]$ and $[{{\partial T_{\rm c}}/{\partial x}}]$ . Therefore, depending on the optimization method used, we must be able to compute both $[{{\partial L}/{\partial x}}]$ and $[{{\partial T_{\rm c}}/{\partial x}}]$ where by the chain rule

$[\eqalignno{{{\partial L_{{\rm restraints}}}\over{\partial x}} &= {{\partial L}\over{\partial T_{\rm c}}}\,\, {{\partial T_{\rm c}}\over{\partial x}} &\cr&= 2w (T_{\rm o} - T_{\rm c})\,\, {{\partial T_{\rm c}}\over{\partial x}}. &(26)}]$

The restraints framework was designed in such a way that it could be easily extended by adding further restraints. Each restraint must provide the array of partial derivatives of the restraint with respect to the crystallographic parameters (one row of the matrix $[{{\partial T_{\rm c}}/{\partial y}}]$ ), the restraint delta, T_o - T_c, and the weight, w, of the restraint.

5. Twinning

Like all other refinement programs, we have adopted the model of twins proposed by Jameson (1982 ) and Pratt et al. (1971 ). The sample is modelled as d domains, each sizeable enough a single crystal to give rise to observable Bragg peaks. If peaks from different domains fall very close in reciprocal space, the integration software analysing frames will be able to compute only the sum of the intensities of these superposed peaks. Data for the rth reflection will therefore consist of an intensity F_o,r² and of a list of s_r Miller indices $[h_{r,i_1}, h_{r,i_2}, \ldots, h_{r,i_{s_r}}]$ where h_r,i is the triplet of Miller indices of the Bragg peak originating from diffraction by domain i. The model of the structure then predicts an intensity I_c,r for F_o,r² that reads

$[I_{{\rm c},r} = \textstyle\sum\limits_{l = 1}^{s_r} \alpha_{i_l} |F_{\rm c}(h_{r,i_l})|^2, \eqno (27)]$

where $[\alpha_i]$ is the fraction of the sample volume occupied by twin domain i. The least squares to minimize are then, adapting equation (1) for a twinned structure,

$[L = \textstyle\sum\limits_{r = 1}^m w_r \left(F_{{\rm o},r}^2-K I_{{\rm c},r}\right)^2, \eqno (28)]$

where the weight w_r is usually

$[w_r = w[F_{{\rm o},r}^2, \sigma(F_{{\rm o},r}^2), I_{{\rm c},r}], \eqno (29)]$

where w would be the same function discussed in the context of equation (1). The minimization of L with respect to the model parameters embodied in F_c and with respect to the α's is therefore subject to the constraint

$[1 = \textstyle\sum\limits_{i = 1}^d \alpha_i. \eqno (30)]$

There is a special case that is common and therefore important, where there are exactly d superposed peaks for each reflection, i.e., s_r = d for every r, and where there is a 3 ×3 matrix R, the twin law, that generates the Miller indices for each domain $[i \,\gt\, 1]$ as

$[h_{r,i} = h_{r,i-1}R\semi \eqno (31)]$

in this case, the input consists solely of a list of F_o(h)² (as for an untwinned refinement but with h = h_r,1) and of R. Such twins belong to the taxonomies pseudo-merohedral or merohedral. olex2.refine is able to refine a twinned structure input in this manner.

SHELXL users would handle the general case by using a reflection file in the HKLF5 format. Such a file is created by the integration software and this approach can deal with the most complex situations. By contrast, CRYSTALS always requires a list of twin laws.

olex2.refine can handle a more general case: when general and merohedral twinning are simultaneously present. For example, one may have four domains 1, 2, 3 and 4. Domains 1 and 3 are related by a twin law R, and so are domains 2 and 4, but the relative orientation of domains 1 and 2 does not correspond to any twin law. Thus the measured frames exhibit two lattices of Bragg peaks, with some overlaps, and the integration software will then output a list of (F_o², h₁), (F_o², h₂) and (F_o², h₁, h₂). The refinement engine needs to expand this to a list of (F_o², h₁, h₃ = h₁R), (F_o², h₂, h₄ = h₂R) and (F_o², h₁, h₂, h₃ = h₁R, h₄ = h₂R). olex2.refine performs this task on the fly, while passing Miller triplets to the code computing structure factors and their derivatives.

The theory and phenomenology of twinning extends far beyond our exposition, but from the narrow point of view of refinement our presentation is sufficient, since, in this paper, we do not concern ourselves with the computation of the twin law or with the indexing of superposed lattices.

6. Standard uncertainties

6.1. Variance matrix

In this section, we will discuss the rigorous computation of s.u.'s of and correlation between the parameters of a constrained model. As pointed out in Appendix B in the comment about equation (73), the variance matrix for the refined parameters x is

$[{\rm Var}\, (x) = B^{-1}, \eqno (32)]$

where

$[B_{ij} = {{\partial r}\over{\partial x_i}} \cdot{{\partial r}\over{\partial x_j}} \eqno (33)]$

is the normal matrix for the constrained least-squares minimization. However, we are interested in the variance matrix Var (y) for the crystallographic parameters of the model, not in Var (x). For small variations, we have the linear relation

$[\delta y \approx J \delta x \eqno (34)]$

and therefore, using the well known heuristic definition of the variance matrix of y as the mean value of $[\delta y \delta y^T]$ in the linear approximation around the minimum implicitly assumed throughout crystallographic refinement,

$[{\rm Var}\, (y) = J\, {\rm Var}\, (x)\, J^{T}. \eqno (35)]$

We would like to stress an important consequence of this formula: constrained parameters generally have non-zero s.u.'s. This is the case, for example, for the coordinates of riding atoms. For most constraints, the s.u.'s of hydrogen coordinates are equal to the s.u.'s of the atom they ride on but e.g. for a rotating –CH₃, they differ because the s.u. of the azimuthal angle increases the s.u. of the hydrogen coordinates that come from riding only.

6.2. Derived parameters

We will now discuss the s.u.'s of derived parameters. Such a parameter is a function f of a set of atomic parameters p_i and its variance can be derived using the same heuristic as above in a linear approximation where $[\delta f \simeq \sum_i ({\partial {f}}/{\partial p_i}) \delta p_i]$ by computing Var (f) as the mean of $[(\delta f)^2]$ . This leads to

$[\sigma ^2(f) = \sum_{i,j} \left ({{\partial {f}}\over{\partial p_i}}\right) \,\, \left ({{\partial {f}}\over{\partial p_j}}\right) \,\, {\rm cov}(p_i,p_j). \eqno (36)]$

An early occurrence of this formula can be found in Sands (1966 ).

As a result of this formula and of the consequence of equation (35) stated after it, any bond length, any bond angle and any dihedral angle involving a riding atom has a non-zero s.u. unless that geometrical quantity is fixed by the constraint. For example, in the case of (R₁, R₂, R₃)—C—H, for which the constraints do not fix the angles $[\widehat{R_i{\rm CH}}]$ , those angles have a non-zero s.u. This correct behaviour unfortunately triggers many alerts when a structure refined with Olex2 is checked with PLATON.

PLATON plays a very important role in detecting common flaws in the crystallographic workflow. However, since PLATON does not have access either to the covariance matrix or the constraint matrix, it has to make guesses that result in estimation of e.s.d.'s that may be out by up to a factor 2. This is the key problem we encounter with the way olex2.refine reports e.s.d.'s. Specifically, most structures refined with olex2.refine fail checkCIF test PLAT732 (an alert C), for the reason explained in the documentation of this alert (in Note 2):⁵ PLATON computes the s.u. of the said angle using the s.u. of the atomic coordinates as if they were independent parameters since it does not have access to the variance–covariance matrix that describes their correlations. In this case, those correlations are very significant since the position of H completely depends on the position of R1, R2 and R3. Hence the s.u. of this angle as reported by Olex2 differs from the PLATON estimate. Thus all failures of test PLAT732 for angles or distances involving constrained hydrogen atoms are spurious. As a result, if a referee were to require that all alert C's are to be addressed in a CIF submitted for publication in Acta Crystallographica, we advise authors to explain away PLAT732 for riding hydrogen atoms by quoting Note 2 in the documentation for that test. olex2.refine will actually automatically prepare the CIF file with such explanations.

6.2.1. Incorporating s.u.'s of unit-cell parameters

Derived parameters such as bond lengths and angles are a function of both the least-squares atomic parameters and the unit-cell parameters. As such, the s.u. of a derived parameter is likewise a function of both the atomic and unit-cell parameters as well as their respective s.u.'s. If the s.u.'s in atomic parameters are considered to be totally uncorrelated with the s.u.'s in the cell parameters, i.e. their covariance is zero, then the s.u. in a derived parameter can be considered as comprising two independent sources of uncertainties:

$[\sigma^2(f) = \sigma^2_{\rm cell}(f) + \sigma^2_{xyz}(f), \eqno (37)]$

where $[\sigma_{xyz}(f)]$ is the part coming from the uncertainties in the least-square estimates of the positional parameters, and $[\sigma_{\rm cell}(f)]$ comes from the uncertainties in the unit-cell parameters,

$[\sigma^2_{\rm cell}(f) = \sum_{i,j} {{\partial f}\over{\partial i}} \,\, {{\partial f}\over{\partial j}} \,\, {\rm cov} \, (i,j), \eqno (38)]$

where $[i,j = \{ a,b,c,\alpha,\beta,\gamma \}]$ .

This necessitates the calculation of the derivatives of the function with respect to the unit-cell parameters. In order to do so, it is easier to calculate separately the derivatives of the function with respect to the elements of the metrical matrix, and also the derivatives of the metrical matrix with respect to the cell parameters, and then to use the chain rule

$[{{\partial f}\over{\partial i}} = {{\partial f}\over{\partial g_{jk}}} \,\, {{\partial g_{jk}}\over{\partial i}}, \,\, i = a, b, c, \alpha, \beta, \gamma. \eqno (39)]$

Indeed $[{{\partial f}/{\partial g_{jk}}}]$ must be evaluated for every function, whereas $[{{\partial g_{jk}}/{\partial i}}]$ is constant for a given unit cell.

Now we consider the application of equation (36) to determine the s.u. in the length of the vector u, in fractional coordinates. The length, D, of the vector u is given by

$[D = (u^T G u)^{{{1}/{2}}}, \eqno (40)]$

where G is the metrical matrix (see e.g. Giacovazzo et al., 2011).

The derivatives of the distance, D, with respect to the elements of the metrical matrix, G, are given by

$[{{\partial D}\over{\partial g_{ii}}} = {{1}\over{2}} \, {{u_i^2}\over{D}} \eqno (41)]$

and (given the metrical matrix is symmetric)

$[{{\partial D}\over{\partial g_{ij}}} = {{u_i u_j}\over{D}}, \,\, { \rm for\,\, all } \,\, i\,\lt\, j. \eqno (42)]$

Similarly, for the angle between two vectors in fractional coordinates, u and v, where the angle is defined as

$[\theta = \arccos {{u^T {G}v}\over{\vert\vert u^T {G}u\vert\vert \,\,\vert\vert v^T{G}v\vert\vert}} \eqno (43)]$

$[\theta = \arccos {{r_A \cdot r_B}\over{\vert\vert r_A \vert\vert \,\,\vert\vert r_B\vert\vert}}, \eqno (44)]$

where r_A and r_B are the Cartesian equivalents of u and v, the derivative of the angle, θ, with respect to the elements of the metrical matrix, G, is given by

$[{{\partial \theta}\over{\partial g_{ii}}} = {{1}\over{2 \sin\theta}} \left ({{u_i^2 \cos{\theta}}\over{\vert\vert{r_{A}}\vert\vert ^2}} - {{2 u_i v_i}\over{\vert\vert r_{A} \vert\vert \,\, \vert\vert r_{B}\vert\vert}} + {{v_i^2 \cos{\theta}}\over{\vert\vert r_{B}\vert\vert^2}}\right) \eqno (45)]$

and

$[\eqalignno{&{{\partial \theta}\over{\partial g_{ij}}} = {{1}\over{2 \sin\theta}} \left ({{u_iu_j \cos{\theta}}\over{\vert\vert{r_{A}}\vert\vert ^2}} - {{u_i v_j + u_jv_i}\over{\vert\vert r_{A} \vert\vert \,\, \vert\vert r_{B}\vert\vert}} + {{v_iv_j \cos{\theta}}\over{\vert\vert r_{B}\vert\vert^2}}\right), &\cr & {\rm for \,\, all} \,\, i \,\lt\, j. &(46)\cr}]$

The derivatives of the metrical matrix with respect to the unit-cell parameters,

$[C = (a, b, c, \alpha, \beta, \gamma), \eqno (47)]$

needed in order to apply equation (39) are given below:

$[\eqalignno { {{\partial g_{11}}\over{\partial C}} &= (2 a,0,0,0,0,0) &\cr {{\partial g_{22}}\over{\partial C}} &= (0,2 b,0,0,0,0) &\cr {{\partial g_{33}}\over{\partial C}} &= (0,0,2 c,0,0,0) &\cr {{\partial g_{12}}\over{\partial C}} &= (b \cos\gamma,a \cos\gamma,0,0,0,-a b \sin\gamma) &\cr {{\partial g_{13}}\over{\partial C}} &= (c \cos\beta,0,a \cos\beta,0,-a c \sin\beta,0) &\cr {{\partial g_{23}}\over{\partial C}} &= (0,c \cos\alpha,b \cos\alpha,-a c \sin\beta,0,0). & (48)\cr}]$

6.3. Symmetry

The variance–covariance matrix that is obtained from the inversion of the least-squares normal matrix contains the variance and covariance of all the refined parameters. Frequently, it is necessary to compute functions that involve parameters that are related by some symmetry operator of the space group to the original parameters. Sands (1966) suggests that the symmetry should be applied to the variance–covariance matrix to obtain a new variance–covariance matrix for the symmetry-generated atoms. Alternatively, and it is this method that is used here, the original variance–covariance matrix can be used if the derivatives in equation (36) are mapped back to the original parameters.

Let the function f depend on the Cartesian site y_c that is generated by the symmetry operator R_c from the original Cartesian site x_c, i.e.

$[y_{\rm c} = R_{\rm c} x_{\rm c}. \eqno (49)]$

Then the gradient with respect to the original site can be obtained by

$[{{\partial f(y_{\rm c})}\over{\partial {x}_{\rm c}}} = {{\partial f(y_{\rm c})}\over{\partial {y}_{\rm c}}} \, R_{\rm c}. \eqno (50)]$

The variance–covariance matrix that is used in this case should be the one that is transformed to Cartesian coordinates. The variance–covariance matrix for Cartesian coordinates can be obtained from that for fractional coordinates by the transformation

$[{V}_{\rm c} = {\cal O} {V}_{f} {\cal O}^{T}, \eqno (51)]$

where the transformation matrix $[{\cal O}]$ needed to transform the entire variance–covariance matrix in one operation would be block diagonal, with the 3 ×3 orthogonalization matrix O repeated at the appropriate positions along the diagonal. This transformation can be computed efficiently using sparse-matrix techniques.

7. Refinement results

olex2.refine uses CIF (Hall et al., 1991 ) as its main output. The CIF contains information regarding the space group, the data indicators such as merging indices, the refinement indicators such as the R factors, goodness of fit, residual electron density, refinement convergence indicators and tabulated structure information, including tables of atomic parameters, bonds and angles. olex2.refine produces a table describing the restraints using the CIF restraints dictionary. Moreover, the olex2.refine CIF always contains a verbal description of the refinement model – hydrogen-atom treatment, constraints, restraints and their targets. Optionally, the refinement model file (SHELX RES file) and the reflections can also be included in the final CIF for deposition. It should be noted that the constraints and restraints unique to olex2.refine, i.e. not featured by SHELX, are saved in REM sections (using an XML format). This has the advantage that they can be read back by Olex2 while providing a `.res file' that can also be refined with SHELX, albeit with potentially a different model. As part of this work a CIF-handling toolbox (Gildea et al., 2011 ) was added to cctbx.

8. Conclusion

We have presented herein the full mathematical derivations of the concepts used within olex2.refine. This new refinement engine is feature-wise on a par with the established software in the field such as SHELXL and CRYSTALS. It actually provides a richer wealth of constraints than those classic suites. olex2.refine is immediately useful to the practising crystallographer since it is presently available from the program Olex2 by default. It can also be used independently as a library of components to write short scripts as well as more complex programs, dealing with any aspect of small-molecule refinement. Indeed, olex2.refine is largely based on the Small-Molecule Toolbox (smtbx) that is part of the Computational Crystallography Toolbox (cctbx), which is usable as a library for the Python programming language, thus providing great expressiveness, conciseness and ease of coding combined with an immense wealth of tools covering all of crystallography. We have explained herein how the smtbx added constrained least-squares refinement from scratch to the cctbx and how it added many restraints as well, thus opening new fields to the cctbx.

APPENDIX A

Computation of structure factors and their gradient

The formulae discussed in this appendix have been known for nearly a century and have been implemented in all known refinement programs. However, it seems to the authors that, during the last decades, the computation for a given Miller index h of F_c(h) and its gradient $[\nabla{F_{\rm c}(h)}]$ with respect to crystallographic parameters has very rarely been presented in all the minute details necessary to implement those formulae in a program, justifying this appendix in our humble opinion. We will compare our algorithm to Rollett (1965 ) point by point as we present it, but first we would like to highlight a technical difference. Rollett computes the real and imaginary parts separately, whereas our computation is performed with complex numbers, the rationale being that the composition of the different terms of the structure factor can be performed with mere complex multiplications. For example, the incorporation of anomalous scattering in equation (53) corresponds to equations (20) and (21) in Rollett.

A1. Structure factor of one atom

Since the structure factor of the entire unit cell is the sum over the contribution of each scatterer, we will focus on one such contribution only. We therefore consider a scatterer with fractional coordinates x, a thermal displacement tensor U in fractional coordinates and a chemical occupancy s (i.e. this occupancy does not take crystallographic symmetries into account). Its contribution F_uc(h) to the entire unit cell is obtained from its structure factor F(h) by the sum

$[F_{{\rm uc}}(h) = \textstyle\sum\limits_{(R\mid t) \in {\cal S}} F^{(R\mid t)}(h). \eqno (52)]$

$[{\cal S}]$ denotes the subset of symmetry operators Mathematical equation of the rotational part R and translational part t that generate the orbit of the position x under the application of the entire set of symmetries in the space group of the structure, whereas $[F^{(R\mid t)}(h)]$ is the transform of F(h) under the symmetry . For a spherical atomic model with an elastic scattering factor f(h²) that does not depend on the direction of h and an inelastic scattering factor Mathematical equation that does not depend on h (a property that holds for X-ray diffraction), F reads

$[F(h) = s [\,f(h^2) + f^{\prime} + i f^{\prime\prime}] \underbrace{\exp(-h U h^T) \exp(i 2\pi h x)}_{G(h)}. \eqno (53)]$

We will consistently denote a triplet h of Miller indices as a row vector, whose transpose h^T is therefore a column vector, and a triplet x of fractional coordinates as a column vector. In this context, h² denotes the Euclidean square of h, that therefore involves the reciprocal metric matrix M^* as h² = h M^* h^T.

If the thermal displacement is isotropic, the term exp(-h U h^T) is replaced by $[\exp(-2\pi^2 h^2 u_{{\rm iso}})]$ and it is then factored out of G(h).

We are therefore left with computing the sum of the transforms of G(h) under the symmetries in $[{\cal S}]$ , the other terms forming a factor multiplying this sum. By the definition of a Fourier transform, that transform reads

$[G^{(R\mid t)}(h) = G(hR)\exp(i 2\pi ht). \eqno (54)]$

Let us consider the case of non-primitive unit cells first. The sum can be partitioned as follows:

$[\textstyle\sum\limits_{(R\mid t) \in {\cal S}} G^{(R\mid t)}(h) = \sum\limits_{(R\mid t) \in {\cal S}^\prime} \sum\limits_{\tau \in {\cal T}} G^{(R \mid t + \tau)}(h),]$

where $[{\cal T}]$ is the set of all centring translations, including zero, and $[{\cal S}']$ is the set of `primitive' symmetries. Then with equation (54) we can factorize the sum as

$[\eqalignno{\textstyle\sum\limits_{(R\mid t) \in {\cal S}} G^{(R\mid t)}(h) &= \left [\textstyle\sum\limits_{\tau \in {\cal T}} \exp(i2\pi h\tau) \right]\left [\textstyle\sum\limits_{(R\mid t) \in {\cal S}^\prime} G(hR) \exp(i2\pi ht) \right]. &\cr&&(55)\cr}]$

The first term of the right-hand side does not depend upon the scatterer and it can therefore be precomputed. Commonly, Miller indices satisfy the centring conditions $[h\tau = 0\, {\rm mod}\, 1]$ . In this case, this term is equal to the number of centring translations. However, this assumption can be invalid in the refinement of some pseudo-merohedral twins since Miller indices satisfying a centring condition may be transformed by the twin law in Miller indices that do not satisfy it.

It should be noted that Rollett already advocated skipping the computation of the terms in $[\sum_{(R\mid t) \in {\cal S}} G^{(R\mid t)}(h)]$ corresponding to non-zero centring translations and then multiplying the result by 2. However, our framework can handle any group $[{\cal T}]$ , not only those with +1/2 translations. It would work for tripled cells as well, for example.

Thus we are left with the computation of the second term of the right-hand side of equation (55). This is equivalent to treating the case of a primitive unit cell. Three cases are to be considered.

(i) The space group is non-centric. Then there is no further simplification.

(ii) The space group is centric but the inversion $[({\overline 1}| t_{{\overline 1}})]$ may not be located at the origin. Then the sum over the symmetries may be split into the sum over the set $[{\cal S}^+]$ of proper symmetries and the sum over the set of improper symmetries. Since every improper symmetry may be written as $[({\overline 1}| t_{{\overline 1}})(R | t)]$ where Mathematical equation is a proper symmetry, we have

$[\textstyle\sum\limits_{(R\mid t) \in {\cal S}} G^{(R\mid t)} = \underbrace{\textstyle\sum\limits_{(R\mid t) \in {\cal O}^+} G^{(R\mid t)} }_{\Sigma} + \sum\limits_{(R\mid t) \in {\cal S}^+} G^{( {{\overline 1}\mid}{t_{{\overline 1}}}) (R\mid t)}.]$

However, the product involving the inversion simplifies as $[({\overline 1}| t_{{\overline 1}}) (R| t) = (-R | -t + t_{{\overline 1}})]$ and therefore using equation (54)

$[G^{({\overline 1}\mid t _{{\overline 1}}) (R\mid t)} = G(-hR) \exp(-i2\pi ht) \exp(i2\pi h t_{{\overline 1}}),]$

but then from the very definition of G(h) in equation (54)

$[G(-hR) = G(hR)^*, \eqno (56)]$

and therefore

$[\textstyle\sum\limits_{(R\mid t) \in {\cal S}} G^{(R\mid t)} = \Sigma + \Sigma^* \exp(i2\pi h t_{{\overline 1}}). \eqno (57)]$

(iii) If $[h t_{{\overline 1}} = 0]$ , which holds true for every reflection if the space group is origin centric ( $[t_{{\overline 1}} = 0]$ ), the previous case resolves to the real part of Σ only,

$[\textstyle\sum\limits_{(R\mid t) \in {\cal S}} G^{(R\mid t)} = 2 \sum\limits_{(R\mid t) \in {\cal S}^+} \exp[-h R U (hR)^T] \cos(hRx + t). \eqno (58)]$

The total sum over symmetries is therefore real, and is its derivative, the former involving a cosine and the latter involving a sine for the derivatives with respect to x.

In order to avoid redundant computations, our code distinguishes these three cases. One piece of code handles the case of centrosymmetric space groups using only real numbers to compute Σ (an equivalent optimization was presented in Rollett). Another piece of code handles the other two cases, first computing Σ using complex number algebra and then using equation (57) if the space group is centric and $[h t_{{\overline 1}} \neq 0]$ (an optimization not found in Rollett). Another optimization we applied is to precompute the terms hR and $[\exp(i2\pi t)]$ appearing in equation (54) as well as the test for the condition $[h t_{{\overline 1}} = 0]$ before the loop over all scatterers in the asymmetric unit (an optimization already advocated by Rollett). In all three cases, the computation of cos(hRx + t) and sin(hRx + t) is the most costly operation. This is mitigated by computing the structure factor and its derivatives together, as the cosine and sine then only need to be computed once for each reflection and scatterer. This optimization is the reason why we have written our own new code into the smtbx instead of reusing the original code in the cctbx. Indeed, in the latter, structure factors are computed separately from the derivatives, resulting in two separate loops over the reflections and the scatterers, and in sines and cosines being computed twice. This is rather well suited to the optimization algorithm used in phenix.refine (LBFGS) because it does not need the derivatives at every step. But it does not fit well with full-matrix least squares as is prevalent in small-molecule crystallography.

Our code also provides two options to compute the sines and cosines: either by using the trigonometric functions of the standard C++ library, or by using a cctbx function that interpolates between tabulated values of sines and cosines. The latter is much faster but less precise. Olex2 uses the former as CRYSTALS and SHELXL use the standard FORTRAN sin and cos functions. It should be noted that Rollett used an approximation of trigonometric functions by Chebyshev polynomials, which used to be employed by CRYSTALS.

A2. Derivatives of |F|² and |F|

Having computed the unit-cell structure factor F_c and its derivatives, we need to compute the derivatives of $[|{F_{\rm c}}| ^2]$ and perhaps as well $[|{F_{\rm c}}| = (|{F_{\rm c}}| ^2)^{1/2}]$ if performing a refinement against F. Since $[|{F_{\rm c}}| ^2 = F_{\rm c} F_{\rm c}^*]$ , where z^* denotes the complex conjugate of the complex number z, for any crystallographic parameter ξ,

$[{{\partial |{F_{\rm c}}| ^2}\over{\partial \xi}} = F_{\rm c}^* {{\partial {F_{\rm c}}}\over{\partial \xi}} + {F_{\rm c}}{{\partial {F_{\rm c}^*}}\over{\partial \xi}},]$

but since the parameter ξ is always real in practice,

$[{{\partial {F_{\rm c}^*}}\over{\partial \xi}} = \left ({{\partial {F_{\rm c}}}\over{\partial \xi}}\right) ^* \eqno (59)]$

and therefore

$[{{\partial |{F_{\rm c}}| ^2}\over{\partial \xi}} = 2 {\rm Re} \left ({F_{\rm c}^*} {{\partial {F_{\rm c}}}\over{\partial \xi}}\right) \eqno (60)]$

where Re denotes the real part. Then,

$[{{\partial |{F_{\rm c}}|}\over{\partial \xi}} = {{1}\over{|{F_{\rm c}}|}} \, {\rm Re} \left ({F_{\rm c}^*} {{\partial {F_{\rm c}}}\over{\partial \xi}}\right). \eqno (61)]$

The smtbx provides code to compute derivatives for both of these observables. However, Olex2 currently only offers the choice to refine against F².

A3. Extinction correction

olex2.refine models primary and secondary extinction with the same empirical correction used in SHELXL,

$[F^{\prime}_{\rm c} = F_{\rm c} \left (1+0.001 x \, {{F_{\rm c}^{2}\lambda^3}\over{\sin 2\theta}}\right) ^{-{{1}/{4}}}, \eqno (62)]$

where the so-called extinction parameter x is added to the set of refined parameters. As noted in SHELXL documentation, it is close to the work of Becker & Coppens (1974 ) but not identical.

APPENDIX B

Minimization of least squares with an overall scale factor

In this appendix, we are concerned with the minimization of L_data(x, K) as defined in equation (1). We will drop the label `data' throughout this appendix for clarity. We will denote by (x^*, K^*) the values of those parameters at which L(x,K) reaches the minimum we are interested in.

It is well known that the overall scale factor tends to be highly correlated with the thermal displacements. As a result, a starting value of K far from K^* tends to destabilize the fit and at the very least will increase the number of iterations necessary to converge to the minimum. It is, however, easy to compute a reasonable approximation of K^*. Indeed, as a function of K only, keeping all the other parameters x fixed, L(x, K) is a second-order polynomial. An analytical formula therefore exists for the value of K that minimizes that polynomial. Using the geometrical interpretation of least squares is the fastest manner to demonstrate this formula and leads to the most compact formula. We therefore introduce the scalar product of two sets of observables,

$[Y \cdot Y^\prime = \textstyle\sum\limits_h w(h) Y(h) Y^\prime(h), \eqno (63)]$

and the associated norm $[\vert\vert Y\vert\vert]$ ,

$[\vert\vert Y\vert\vert ^2 = \textstyle\sum\limits_h w(h) Y(h)^2, \eqno (64)]$

as well as the residual vector

$[r(x, K) = Y_{\rm o} - K Y_{\rm c}(x). \eqno (65)]$

With those notations, L(x, K), which reads

$[L(x, K) = \vert\vert {r(x, K)}\vert\vert ^2, \eqno (66)]$

reaches a minimum at

$[\tilde{K} = {{Y_{\rm c} \cdot Y_{\rm o}}\over{\vert\vert {Y_{\rm c}}\vert\vert ^2}}, \eqno (67)]$

while keeping x fixed, and the value at the minimum reads

$[L(x, \tilde{K})= \vert\vert Y_{\rm o}\vert\vert ^2 - \tilde{K}^2 \vert\vert {Y_{\rm c}(x)}\vert\vert ^2. \eqno (68)]$

We could then simply use $[\tilde{K}]$ as a starting value for a combined refinement of x and K, i.e. the minimization of L(x, K) with the independent vector (x, K) of parameters.

However, we chose to minimize $[L(x, \tilde{K})]$ instead, which is a function of x only since $[\tilde{K}]$ is completely determined by x. This is a special case of a technique called separable least squares (Nielsen, 2000 , and references therein) that converges to the same minimum as previously, but with fewer iterations for the same cost per iteration. In order to carry out that minimization, we first need to compute the first-order derivatives,

$[{{\partial L (x, {\tilde K})}\over{\partial x_j}} = {{\partial L}\over{\partial x_j}} (x, {\tilde K}), \, 1 \le j \le n, \eqno (69)]$

where the chain rule would normally mandate a second term $[({\partial L}/{\partial K})\, (x, {\tilde K}) ({\partial {\tilde K}}/{\partial x_j})]$ , but in this case it is zero because of the definition of $[{\tilde K}]$ . In this expression,

$[{{\partial L}\over{\partial x_j}} = 2r \cdot {{\partial r}\over{\partial x_j}}. \eqno (70)]$

Thus, the first-order derivative is exactly the same as it would have been with an independent parameter K. We then need the second-order derivatives

$[{{\partial ^2 L (x, {\tilde K})}\over{\partial x_i \partial x_j}} \approx {{\partial r (x, {\tilde K})}\over{\partial x_i}} \cdot {{\partial r (x, {\tilde K})}\over{\partial x_j}},\, 1 \le i,j \le n, \eqno (71)]$

where we have neglected the term $[[\partial ^2 r(x, {\tilde K})/\partial x_i \partial x_j] \cdot r (x, {\tilde K})]$ because, for a well behaved fit, the residual r and its curvature are small enough that this term can safely be neglected.

Let us now build the normal equations,

$[Bs = -g, \eqno (72)]$

for the shift, s, of the parameter vector x. The matrix B is known as the normal matrix. Those equations are simply the Newton equations, featuring the Hessian matrix, B_ij = $[ {{\partial ^2 L (x, {\tilde K})}/{\partial x_i \partial x_j}}]$ , computed using the approximation of equation (71) and the gradient g of $[L(x, {\tilde K})]$ . It should be noted that the solution, s, of equation (72) is then also the solution of the linear least-squares problem

$[\min_s \left\vert\left\vert r(x, \tilde{K}) + {{\partial r(x, \tilde{K})}\over{\partial x}} s \right\vert\right\vert ^2. \eqno (73)]$

The inverse of the normal matrix B when the least-squares minimum has been reached by $[L(x, \tilde{K})]$ can therefore be viewed as the variance matrix of x, which should be contrasted to the more classic combined refinement of x and K, for which the variance matrix is the sub-matrix of the inverse of the normal matrix obtained by removing the column and the row corresponding to K.

Using the definition (65) of r and the expression (67) of $[\tilde{K}]$ , we have

$[{{\partial r(x, \tilde{K})}\over{\partial x_i}} = -\left (\tilde{K} {{\partial Y_{\rm c}}\over{\partial x_i}} + {{\partial \tilde{K}}\over{\partial x_i}} Y_{\rm c}\right), \eqno (74)]$

$[{{\partial \tilde{K}}\over{\partial x_i}} = {{1}\over{\vert\vert {Y_{\rm c}}\vert\vert ^2}} {{\partial Y_{\rm c}}\over{\partial x_i}} \cdot \left (Y_{\rm o} - 2 \tilde{K}Y_{\rm c} \right) \eqno (75)]$

and therefore the normal matrix B and the right-hand side of the normal equations g read

$[\eqalignno { B_{ij} &= \tilde{K}^2 {{\partial Y_{\rm c}}\over{\partial x_i}} \cdot {{\partial Y_{\rm c}}\over{\partial x_j}} + \tilde{K} \left ({{\partial \tilde{K}}\over{\partial x_j}} {Y_{\rm c}} \cdot {{\partial Y_{\rm c}}\over{\partial x_i}} + {{\partial \tilde{K}}\over{\partial x_i}} {Y_{\rm c}} \cdot {{\partial Y_{\rm c}}\over{\partial x_j}}\right) &\cr &\quad + {{\partial \tilde{K}}\over{\partial x_i}} {{\partial \tilde{K}}\over{\partial x_j}} \vert\vert Y_{\rm c} \vert\vert, &\cr g_i &= -(Y_{\rm o} - \tilde{K}Y_{\rm c}) \cdot \left (\tilde{K} {{\partial Y_{\rm c}}\over{\partial x_i}} + {{\partial \tilde{K}}\over{\partial x_i}}Y_{\rm c}\right).&(76)\cr}]$

It should be noted that the first term of B would appear in exactly the same form if we had kept the scale factor K as an independent parameter. Moreover, terms very similar to the second and third terms of B would need to be computed in that case, as the normal matrix B would then have one more column and one more row which would feature those similar terms. Thus, the computation cost of our unorthodox method is the same as that of the more traditional refinement of the overall scale factor along with the other crystallographic parameters. In any case, for all methods based on the normal equations, the computation time is dominated by the first term of B, as it scales as O(mn²), where m is the number of reflections and n the number of parameters.

It should also be noted that we never construct and store the whole of the so-called design matrix $[[{{\partial Y_{\rm c}(h)}/{\partial x_i}}] _{h,i}]$ that appears in those equations. Instead, we compute the vector of derivatives $[[{{\partial Y_{\rm c}(h)}/{\partial x_i}}] _{i}]$ for a few reflections h and then we immediately accumulate them into the normal matrix B and into the right-hand side g, as opposed to constructing the whole design matrix and then forming the products $[{{\partial Y_{\rm c}}/{\partial x_i}} \cdot {{\partial Y_{\rm c}}/{\partial x_j}}]$ appearing in equation (76), for example. It used to be the only efficient way to proceed a few decades ago when least-squares refinement was being developed, but the fantastic increase of computer memory has made that point largely irrelevant.

The traditional argument is to show that the normal matrix is typically one to two orders of magnitude smaller than the design matrix. Indeed, the latter has m n elements, whereas the former has approximately n²/2 elements for large values of n. Since in a typical small-molecule crystal structure determination the data-to-parameter ratio m / n is typically in the range 10–30, the normal matrix is therefore 20 to 60 times smaller than the design matrix. With the common use of constraints, particularly with respect to those on the parameters of hydrogen atoms, this contrast gets even more striking as n shall then be replaced with the number of parameters actually refined, which can be as small as n/2 for typical organic structures. This results in another factor 2 in the comparison of the respective sizes of the design and of the normal matrix, and it makes the latter typically 40 to 120 times smaller than the former.

However, in the extreme case of a structure with about 1000 atoms in the asymmetric unit, the design matrix stored in double precision only requires 40 to 120 Mb of memory, assuming constraints reducing the number of refinable parameters by a factor 2, as opposed to around 2 Mb for the normal matrix. They would therefore both easily fit in the main memory of a modern PC which comes with many Gigabytes of RAM.

The counter-argument goes even further, as the normal matrix is actually not necessary either to solve the least-square problem or to compute estimated standard deviations for geometric quantities, which are necessary to publish a structure. Indeed, there are well known mathematical techniques based only on the design matrix to solve these two problems [cf. for example, Björck (1996), sections 2.8.3 for s.u. computations and 2.4.3 for the solution of the least squares].

However, the best of those design-matrix techniques, the QR decomposition, involves about twice as many floating-point operations as the method based on the normal equations. That is the actual motivation for our choosing the latter and therefore adhering to a long crystallographic tradition.

It should be noted that for singular or nearly singular problems, one could solve the LS problem by using singular-value filtering on the design matrix, which is the most robust technique. On the other hand, if one computes the normal matrix, therefore squaring the singularities, solving the LS problem with eigenvalue filtering is of little practical interest.

APPENDIX C

Constraints

olex2.refine provides constraints to influence the position(s) or the ADP('s) of an atom or a group of atoms, as well as their occupancy factors. This section will enumerate the constraints and provide the details of their implementation.

C1. Occupancy constraints

The smtbx provides a generic tool to express any scalar parameter v as an affine function of other scalar parameters $[u_1, u_2, \ldots, u_s]$ ,

$[v = \textstyle\sum\limits_{i = 1}^s a_i u_i + b, \eqno (77)]$

where the number of arguments s as well as the coefficients a_i and b can be freely chosen. The common case of two atoms whose occupancies o and Mathematical equation shall add up to 1 is easily handled by the case s = 1, as .

Higher values of s are useful to model a site partially occupied by different types of atoms when the requirement of full occupancy is complemented by other constraints. The example used in the SHELXL manual to illustrate the restraint command SUMP is such a case: a site is occupied by ions Na⁺, Ca²⁺, Al³⁺ and K⁺ with an average charge of +2, leading to the restrictions on their occupancies:

$[o_{{\rm Na^+}} + o_{{\rm Ca^{2+}}} + o_{{\rm Al^{3+}}} + o_{{\rm K+}} = 1, \eqno (78)]$

$[o_{{\rm Na^+}} + 2o_{{\rm Ca^{2+}}} + 3o_{{\rm Al^{3+}}} + o_{{\rm K+}} = +2. \eqno (79)]$

Instead of enforcing those relations as restraints, as advocated in the SHELX documentation, we can solve those equations to get the explicit dependencies

$[o_{{\rm Na^+}} = o_{{\rm Al^{3+}}} - o_{{\rm K+}}, \eqno (80)]$

$[o_{{\rm Ca^{2+}}} = 1 - 2o_{{\rm Al^{3+}}}, \eqno (81)]$

which can then readily be expressed as special cases of equation (77).

C2. Shared site and shared U constraint

Two scatterers share the same site or the same ADP tensor. These constraints are pretty straightforward to implement within the smtbx framework. Values of dependent parameters and the corresponding rows of the Jacobian are set equal to the values for the reference atom.

C3. Riding atoms/group U constraint

This constraint defined a group as riding on the given, pivot atom, i.e. the coordinates of the groups are given the same shifts as the pivot atom (SHELXL AFIX 3). Moreover the distances from the atom to the atoms of the group can be refined (SHELXL AFIX 4), which is applicable to groups like XY_n for $[1 \leq n \leq 4]$ where X is B, C, N etc. and Y is H, F, Cl etc. In general the transformation of the triplet of coordinates r is written down like this:

$[r^\prime = s(r-r_{\rm p}) + r_{\rm p}, \eqno (82)]$

where s is the distance scaling component (s = 1 if not refined) and r_p is the pivot atom centre. In the case when s is refined, the corresponding derivative of the transformation is

$[{{\partial r^\prime}\over{\partial s}} = r-r_{\rm p}, \eqno (83)]$

whereas

$[{{\partial r^\prime}\over{\partial r}} = s {\bf 1}. \eqno (84)]$

C4. Rotated U constraint

This constraint relates the ADPs of one atom to the ADPs of another atom as rotated by a fixed or refined angle around a given direction. The direction can be defined as a static vector, best line or best plane normal. Considering atoms A and B, the relation between their ADP tensor U_A and U_B reads

$[U_{B} = RU_{A}R^{T}, \eqno (85)]$

where R is the rotation matrix

$[R = \left (\matrix { txx + c_\alpha && tyx + zs_\alpha && tzx - ys_\alpha \cr txy - zs_\alpha && tyy + c_\alpha && tzy + xs_\alpha \cr txz + ys_\alpha && tyz - xs_\alpha && tzz + c_\alpha}\right) \eqno (86)]$

with $[c_\alpha = \cos\alpha]$ , $[s_\alpha = \sin\alpha]$ and $[t = 1-c_\alpha]$ , and where the rotation direction is represented by a normal (x,y,z) and where α is the rotation angle around this direction. This transform is trivially rewritten as a linear transform of (U_A,11, U_A,22, U_A,33, U_A,12, U_A,13, U_A,23) into (U_B,11, U_B,22, U_B,33, U_B,12, U_B,13, U_B,23), whose matrix is therefore the Jacobian needed for the chain rule [equation (19)]. If the rotation angle is refined, then the derivative of this transformation by the angle is

$[{{\partial U_B}\over{\partial \alpha}} = RUR_\alpha^{\prime T} + R'_\alpha U R^{T}, \eqno (87)]$

where $[R'_\alpha]$ is the derivative of R by α:

$[R^{\prime}_{\alpha} = \left (\matrix { xx s_\alpha - s_\alpha && yx s_\alpha + z c_\alpha && zx s_\alpha - y c_\alpha \cr xy s_\alpha - z c_\alpha && yy s_\alpha - s_\alpha && zy s_\alpha + x c_\alpha \cr xz s_\alpha + y c_\alpha && yz s_\alpha - x c_\alpha && zz s_\alpha - s_\alpha}\right). \eqno (88)]$

C5. Pivoted rotation of a rigid body

This constraint is suitable for groups which can be rotated around a given direction (SHELXL AFIX 7). For groups XY_n with $[1 \leq n \leq 4]$ and X being B, C, N etc. and Y being H, F, Cl etc., the distances from X to Y can also be refined (SHELXL AFIX 8). The coordinate transformation is

$[r^{\prime} = sR(r-r_{\rm p}) + r_{\rm p}, \eqno (89)]$

and its derivatives are

$[{{\partial r^{\prime}}\over{\partial \alpha}} = sR^{\prime}_\alpha(r - r_{\rm p}), \eqno (90)]$

$[{{\partial r^{\prime}}\over{\partial s}} = R(r - r_{\rm p}), \eqno (91)]$

where r_p is the centre of the pivot atom, s is the distance scale component (s = 1 if not refined), R is the rotation matrix [equation (86)] and Mathematical equation is the derivative of the rotation matrix by the angle [equation (88)]. If neither the angle nor distances are refined, this constraint reduces to simple riding.

C6. Free rotation of a rigid body

This constraint is the most generic case of the rigid-body refinement (SHELXL AFIX 6). In this case there are six refined parameters – the three Euler angles and three positional components; the ADPs are normally refined independently. The coordinate transformation in this case happens similarly to §C5; however a different rotation matrix is used:

$[R = \left (\matrix { c_\beta c_\gamma & - c_\beta s_\gamma & s_\beta \cr s_\alpha s_\beta c_\gamma + c_\alpha s_\gamma & - s_\alpha s_\beta s_\gamma+ c_\alpha c_\gamma & - s_\alpha c_\beta \cr - c_\alpha s_\beta c_\gamma+ s_\alpha s_\gamma & c_\alpha s_\beta s_\gamma+ s_\alpha c_\gamma & c_\alpha c_\beta}\right) \eqno (92)]$

and its derivatives by the angles are

$[{{\partial R}\over{\partial \alpha}} = \left (\matrix { 0 & 0 & 0 \cr c_\alpha s_\beta c_\gamma - s_\alpha s_\gamma & - c_\alpha s_\beta s_\gamma- s_\alpha c_\gamma & - c_\alpha c_\beta \cr s_\alpha s_\beta c_\gamma+ c_\alpha s_\gamma & - s_\alpha s_\beta s_\gamma+ c_\alpha c_\gamma & - s_\alpha c_\beta}\right)\eqno(93) ]$

$[{{\partial R}\over{\partial \beta}} = \left (\matrix { - s_\beta c_\gamma & s_\beta s_\gamma & c_\beta \cr s_\alpha c_\beta c_\gamma & - s_\alpha c_\beta s_\gamma & s_\alpha s_\beta \cr - c_\alpha c_\beta c_\gamma & c_\alpha c_\beta s_\gamma & - c_\alpha s_\beta}\right) \eqno (94)]$

$[{{\partial R}\over{\partial \gamma}} = \left (\matrix { - c_\beta s_\gamma & - c_\beta c_\gamma & 0 \cr - s_\alpha s_\beta s_\gamma + c_\alpha c_\gamma & - s_\alpha s_\beta c_\gamma- c_\alpha s_\gamma & 0 \cr c_\alpha s_\beta s_\gamma+ s_\alpha c_\gamma & c_\alpha s_\beta c_\gamma- s_\alpha s_\gamma & 0 }\right). \eqno (95)]$

For the special cases of spherical counter-ions or groups like Cp and Ph the size component of this constraint can also be refined (SHELXL AFIX 9).

C7. Non-crystallographic symmetry constraint

This constraint is applicable to structures containing fragments which should be exactly superposable onto each other but which appear at different positions with different orientations. The same equations as for the free rotation of a rigid body described in §C6 are used in this case, but the coordinates of one of the group are refined along with the rotation angles and shift [i.e. equation (82) is used where all r's are refined, contrary to the rigid-body case where those coordinates are fixed input, usually taken from a collection of known chemical groups]. This allows more observations to be used in the refinement of the atomic coordinates and ADPs.

APPENDIX D

Restraints and their derivatives

Possible restraints on the stereochemistry or geometry of atomic positions include restraints on bond distances, angles and dihedral angles, chiral volume and planarity. These restraints are used extensively in macromolecular crystallography, and hence were already implemented within the cctbx as part of the macromolecular refinement program phenix.refine. With the exception of the bond-distance restraint, these restraints were not able to accept symmetry-equivalent atoms. Since this is more frequently required in small-molecule crystallography, these restraints have now been extended to allow for symmetry. We have also implemented other restraints commonly used in small-molecule structure refinement, such as a bond similarity restraint, and restraints on anisotropic displacement parameters including restraints based on Hirshfeld's `rigid-bond' test (Hirshfeld, 1976 ), similarity restraints and isotropic ADP restraints.

D1. Restraints involving symmetry

Given a restraint, f(x), involving a site x which is outside the asymmetric unit and which is related to the site y within the asymmetric unit by some symmetry transformation Mathematical equation , i.e. x = Ry+t, the gradient is transformed as

$[\left.{{\partial {f}(x)}\over{\partial y}} = {{\partial {f}}\over{\partial x}} \right\vert _{x = Ry+t} R. \eqno (96)]$

D2. Bond similarity restraint

The distances between two or more atom pairs are restrained to be equal by minimizing the weighted variance of the distances, where the least-squares residual, L, is defined as the population variance biased estimator

$[L = \langle r - \langle r\rangle \rangle ^2, \eqno (97)]$

where $[r = (r_1, \ldots, r_n)]$ are the Euclidean distances between the atoms of each pair. The mean is defined as

$[\langle {\xi}\rangle = \textstyle\sum\limits_i \omega_i \xi_i, \eqno (98)]$

where

$[\omega_i = {{w_i}\over{\sum_j w_j}}, \eqno (99)]$

and where w_i is therefore the weight for the ith pair.

The computation of the derivatives is easier with the alternate form of this variance,

$[L = \langle {r^2} \rangle -\langle{r}\rangle^2 = \textstyle\sum\limits_i \omega_i r_i^2 - \left(\sum\limits_i \omega_i r_i\right)^2. \eqno (100)]$

The derivative of L with respect to a distance r_i is then

$[{{\partial L}\over{\partial r_i}} = 2 \omega_i(r_i - \langle {r} \rangle). \eqno (101)]$

Then the derivatives of r_i = (x_a² - x_b²)^{1 / 2}, where x_a and x_b are the Cartesian coordinates of the two atoms in that pair, are given by

$[{{\partial r_i}\over{\partial x_a}} = {{x_a - x_b}\over{r_i}}, \eqno (102)]$

and the same formula with $[b \leftrightarrow a]$ . Therefore, using the chain rule, the derivatives of the residual with respect to x_a are

$[{{\partial L}\over{\partial x_a}} = {{2 \omega_i (r_j - \langle {r}\rangle)(x_a - x_b)}\over{r_j}}, \eqno (103)]$

and the same formula with $[b \leftrightarrow a]$ .

D3. Planarity restraint

In this section, we will discuss a restraint enforcing a group of atom sites to be coplanar. We will first consider the case of four sites, with Cartesian coordinate vectors x_i, x_j, x_k, x_l. They are coplanar if and only if the associated tetrahedron has a zero volume. This volume reads

$[V_{ijkl} = {{a \cdot (b \times c)}\over{6}} \eqno (104)]$

and any circular permutation of (a, b, c), where a, b, c are any triplet of distinct edges of the tetrahedron. Choices that lead to elegant formulae are

$[\eqalignno { a &= x_i - x_j, &\cr b &= x_j - x_k,&\cr c &= x_k - x_l, & (105)\cr}]$

and any circular permutations of (i,j,k,l).

Therefore a straightforward restraint enforcing coplanarity is the square of this volume, with a weight w,

$[L_{ijkl} = w V_{ijkl}^2. \eqno (106)]$

Its derivative with respect to x_i is then easily obtained from equations (104) and (105):

$[{{\partial L_{ijkl}}\over{\partial x_i}} = {{wV_{ijkl}}\over{3}} \left [(x_j - x_k) \times (x_k - x_l)\right] ^T. \eqno (107)]$

The derivatives with respect to x_j, x_k, x_l are then obtained by circular permutations of (i,j,k,l), taking advantage of the behaviour of equations (104) and (105) with respect to permutations stated above.

We will now consider p sites with Cartesian coordinate vectors $[x_1, x_2, \ldots, x_p]$ that we want to restrain to be coplanar, with $[p \,\gt\, 4]$ . We form the list $[{\cal T}]$ of tetrahedron (x_i, x_i+1, x_i+2, x_i+3) where $[1 \le i \le p-3]$ . If the volume of the ith and (i + 1)th tetrahedron are, respectively, zero, then x_i, x_i+1, x_i+2, x_i+3, x_i+4 are coplanar as the first four and the last four are, respectively, coplanar and those two quadruplets share a common triplet. Thus if all tetrahedra in $[{\cal T}]$ have a zero volume, then all sites are coplanar, which leads to building a restraint as the sum of the restraint for each tetrahedron,

$[L_{{\rm flat}} = \textstyle\sum\limits_{i = 1}^{p-3} L_{i,i+1,i+2,i+3}. \eqno (108)]$

This method is identical to the FLAT restraint in SHELXL, perhaps apart from differing implementation details.

The minimum number of degrees of freedom necessary to model p points in a plane is 2p+3: 2p for the coordinates in the plane, plus two angles for the orientation of the plane, plus one for the distance of the origin to the plane. Since this number of degrees of freedom is exactly the unconstrained and unrestrained number of degrees of freedom 3p minus the number of restraints in the sum of equation (108), this restraint L_flat is therefore optimal.

D4. Restraints on atomic displacement parameters

We will discuss three types of restraints on anisotropic displacement parameters (Rollett, 1970), similar to restraints implemented in SHELXL and REFMAC (Murshudov et al., 1999 ).

D4.1. Rigid-bond restraint

In a `rigid-bond' restraint the components of the anisotropic displacement parameters of two atoms in the direction of the vector connecting those two atoms are restrained to be equal. This corresponds to Hirshfeld's `rigid-bond' test (Hirshfeld, 1976) for testing whether anisotropic displacement parameters are physically reasonable (Sheldrick, 1997) and is in general appropriate for bonded and 1,3-separated pairs of atoms and should hold true for most covalently bonded systems.

Since the mean-square displacement of an atom of thermal displacement tensor U along a normalized vector u is u^T U u, the restraint residual for two atoms A and B at respective positions x_A and x_B and with respective thermal displacement tensors U_A and U_B reads

$[r = {{u^T (U_A - U_B) u}\over{\vert\vert {u}\vert\vert}}, \eqno (109)]$

where

$[u = x_A - x_B, \eqno (110)]$

and the restraint term is then

$[L (r_A, U_A, r_B, U_B) = w r^2. \eqno (111)]$

Those formulae are valid whether using Cartesian or fractional coordinates, providing the adapted formula is used to compute $[\vert\vert {u}\vert\vert]$ , i.e.

$[\vert\vert {u}\vert\vert = \left(\textstyle\sum\limits_i u_i^2\right)^{1/2} \eqno (112)]$

in Cartesian coordinates and

$[\vert\vert {u}\vert\vert = \left(\textstyle\sum\limits_i G_{ij} u_i u_j\right)^{1/2} \eqno (113)]$

in fractional coordinates, where G is the metric matrix.

The computation of the derivatives of L with respect to the components of U_A and U_B proceeds in two steps. First, since $[u^T U u = \sum_i U_{ii} u_i^2 + 2\sum_{i \lt j} U_{ij}u_i u_j]$ , where U is either U_A or U_B,

$[{{\partial r}\over{\partial U_{A,ij}}} = \left \{ \let\normalbaselines\relax\openup4pt\matrix { \displaystyle{{u_i^2}\over{\vert\vert {u}\vert\vert ^2}}& {\rm if}\,\, i = j, \cr \displaystyle{{2u_i u_j}\over{\vert\vert {u}\vert\vert ^2}} & {\rm otherwise}}\right. \eqno (114)]$

$[{{\partial r}\over{\partial U_{B,ij}}} = \left \{ \let\normalbaselines\relax\openup4pt\matrix { -\displaystyle{{u_i^2}\over{\vert\vert {u}\vert\vert ^2}} & {\rm if}\,\, i = j, \cr -\displaystyle{{2u_i u_j}\over{\vert\vert {u}\vert\vert ^2}} & {\rm otherwise}.}\right. \eqno (115)]$

The chain rule then gives, where U is either U_A or U_B,

$[{{\partial L}\over{\partial U_{ij}}} = 2wr {{\partial r}\over{\partial U_{A,ij}}}. \eqno (116)]$

The derivatives of L(r_A, U_A, r_B, U_B) with respect to the positions r_A and r_B are ignored.

D4.2. ADP similarity restraint

The anisotropic displacement parameters of two atoms are restrained to have the same U_ij components. Since this is only a rough approximation to reality, this restraint should be given a smaller weight in the least-squares minimization than for a rigid-bond restraint and is suitable for use in larger structures with a poor data-to-parameter ratio. Applied correctly, this restraint permits a gradual increase and change in direction of the anisotropic displacement parameters along a side chain. This is equivalent to a SHELXL SIMU restraint (Sheldrick, 1997). The weighted least-squares residual is defined as

$[L = w \textstyle\sum\limits_{i = 1}^3 \sum\limits_{j = 1}^3 (U_{A,ij} - U_{B,ij})^2, \eqno (117)]$

which, denoting $[\Delta U = U_A - U_B]$ the matrix of deltas, is the trace of $[\Delta U \Delta U^T]$ , making it clear that it is invariant under any rotation R, since it transforms $[\Delta U]$ into $[R\Delta U R^T]$ .

Since U is symmetric, i.e. U_ij = U_ji, equation (117) can be rewritten as

$[L = w \left [\textstyle\sum\limits_i (U_{A,ii} - U_{B,ii})^2 + 2 \sum\limits_{i \,\lt\, j} (U_{A,ij} - U_{B,ij})^2 \right]. \eqno (118)]$

Therefore the gradients of the residual with respect to diagonal elements are then

$[{{\partial L}\over{\partial U_{A,ii}}} = 2w(U_{A,ii} - U_{B,ii}), \eqno (119)]$

whereas the gradients with respect to the off-diagonal elements are

$[{{\partial L}\over{\partial U_{A,ij}}} = 4w(U_{A,ij} - U_{B,ij}), \eqno (120)]$

and then the same equation with $[B \leftrightarrow A]$ .

D4.3. Isotropic ADP restraint

Here we minimize the difference between the ADP's U, and the isotropic equivalent, defined in Cartesian coordinates by

$[{U}_{{\rm eq}} = U_{{\rm iso}}{\bf 1}, \eqno (121)]$

and

$[U_{{\rm iso}} = {{1}\over{3}} {\rm Tr} {U}. \eqno (122)]$

Again, this is an approximate restraint and as such should have a comparatively small weight. A common use for this restraint would be for solvent water, where the two restraints discussed previously would be inappropriate (Sheldrick, 1997). As in equation (117), we define a restraint term invariant under rotations, as L = Tr (U-U_eq)(U-U_eq)^T, or explicitly,

$[L = w \left [\textstyle\sum\limits_i (U_{ii} - U_{{\rm eq},ii})^2 + 2 \sum\limits_{i \,\lt\, j} (U_{ij} - U_{{\rm eq},ij})^2\right] \eqno (123)]$

which simplifies in Cartesian coordinates into

$[L = w \left[ \textstyle\sum\limits_i (U_{ii} - U_{{\rm iso}})^2 + 2 \sum\limits_{i \lt j} U_{ij}^2 \right]. \eqno (124)]$

By mere inspection, we can see that the derivatives of the residual with respect to the off-diagonal elements are in Cartesian coordinates:

$[{{\partial L}\over{\partial U_{ij}}} = \left \{ \matrix { 2 w (U_{ii} - U_{{\rm iso}}) & {\rm if} \,\, i = j, \hfill\cr 4 w U_{ij}\hfill & {\rm otherwise}.}\right. \eqno (125)]$

D4.4. ADP U_eq similarity

The ADP of two atoms A and B may be restrained to have the same U_eq. The restraint term for atom A reads

$[L_{A} = w \left (U_{A,{\rm eq}} - {{U_{A,{\rm eq}} + U_{B,{\rm eq}}}\over{2}}\right) ^2 \eqno (126)]$

or in Cartesian coordinates

$[L_{A} = w \left (\sum_{i = 1}^3 {{U_{A,ii} - U_{B,ii}}\over{6}} \right) ^2. \eqno (127)]$

This restraint is therefore invariant under rotation.

The derivatives with respect to the diagonal elements of U_A are then

$[{{\partial L_A}\over{\partial U_{A,ii}}} = {{2w}\over{6}} \sum_{i = 1}^3 {{U_{A,ii} - U_{B,ii}}\over{6}}, \eqno (128)]$

and the same formula with $[B \leftrightarrow A]$ .

D4.5. Fixed ADP U_eq

When the U_eq of an atom is restrained to a fixed value, P, the residual for this restraint is

$[L_{A} = w (U_{{\rm eq}} - P) = w \left ({{1}\over{3}} \sum_{i = 1}^3 U_{ii} - P \right), \eqno (129)]$

and the derivatives of the residual with respect to diagonal values of U are

$[{{\partial L}\over{\partial U_{ii}}} = {{1}\over{3}}. \eqno (130)]$

D4.6. ADP volume similarity

The volumes of the thermal ellipsoids of two atoms A and B may be restrained to be equal. The restraint term for atom A reads

$[L_{A} = w \left (V_{A} - {{V_{A} + V_{B}}\over{2}}\right) ^2 = w \left ({{V_{A} - V_{B}}\over{2}}\right) ^2. \eqno (131)]$

The harmonic anisotropic displacement of an atom is described by a multivariate normal distribution of covariant matrix U (see e.g. Trueblood et al., 1996 ), which is the anisotropic displacement tensor of that atom. Therefore, the probability distribution function is

$[p(x) = {{1}\over{[(2\pi)^3 \det U]^{1/2}}} \exp(-\textstyle{{1}\over{2}}x^T U^{-1} x).]$

The surface of constant probability p is therefore an ellipsoid of equation

Mathematical equation

where

$[A = {{U^{-1}}\over{-2 \log \{p [(2\pi)^3 \det U]^{1/2}\}}}.]$

In order to arrive at a simple formula, we fix the probability p such that the denominator is equal to 1, i.e.

$[p = {{1}\over{[(2\pi)^3 e \det U]^{1/2}}}. \eqno (132)]$

The volume of an ellipsoid with such an equation is given by $[({4}/{3})\pi (\det A^{-1})^{1/2}]$ , i.e. with the chosen value of p,

$[V = {{4\pi}\over{3}} (\det U)^{1/2},\eqno (133)]$

whose derivatives read

$[\left (\let\normalbaselines\relax\openup2pt\matrix { {{\partial V}\over{\partial U_{11}}} \cr {{\partial V}\over{\partial U_{22}}} \cr {{\partial V}\over{\partial U_{33}}} \cr {{\partial V}\over{\partial U_{12}}} \cr {{\partial V}\over{\partial U_{13}}} \cr {{\partial V}\over{\partial U_{23}}}}\right) = {{4\pi}\over{6(\det U)^{1/2}}} \left (\let\normalbaselines\relax\openup2pt\matrix { U_{22} U_{33}-U_{23}^2\cr U_{11} U_{33}-U_{13}^2\cr U_{11} U_{22}-U_{12}^2\cr 2 (U_{13} U_{23}- U_{12} U_{33})\cr 2 (U_{12} U_{23}- U_{13} U_{22})\cr 2 (U_{12}U_{13}- U_{23} U_{11})}\right) . \eqno (134)]$

We would like to emphasize once more that our choice of ellipsoid whose volume we wish to restrain lacks physical or mathematical motivations. It is only driven by simplicity. It is interesting to contrast the thermal ellipsoid volume restraint with the U_eq similarity restraint. Indeed, the former restrains the sum of the eigenvalues of U whereas the latter restrains the product of those eigenvalues. They are therefore complementary.

Footnotes

¹EPSRC grant `Age Concern: Crystallographic Software for the Future' (EP/C536274/1).

³This method is also known as the Gauss–Newton algorithm.

⁴The earliest appearance of that concept in crystallographic literature, though not by that name, brought to the attention of the authors is by Larson (1980 ).

⁵This note seems to be a recent addition and the authors would like to thank Ton Spek for this helpful consideration.

Acknowledgements

This work has been funded by the EPSRC grant `Age Concern: Crystallographic Software for the Future' (EP/C536274/1) from the British Government. We wish to thank David Watkin with whom we shared this grant for his immense contribution to our understanding of crystallographic refinement and for suggesting many improvements and corrections to this paper. We also wish to thank all the contributors to the cctbx library which served as a foundation and model for our work, especially its main author Ralf W. Grosse-Kunstleve whose support has been invaluable. We also wish to thank Professor George Sheldrick for many fruitful discussions. Finally, we wish to thank one of the referees for pointing out that we should handle Miller indices that do not satisfy centring conditions, a remark that led us to change this article and our code accordingly.

References

Adams, P. D. et al. (2010). Acta Cryst. D66, 213–221. Web of Science CrossRef CAS IUCr Journals Google Scholar
Afonine, P. V., Grosse-Kunstleve, R. W., Echols, N., Headd, J. J., Moriarty, N. W., Mustyakimov, M., Terwilliger, T. C., Urzhumtsev, A., Zwart, P. H. & Adams, P. D. (2012). Acta Cryst. D68, 352–367. Web of Science CrossRef CAS IUCr Journals Google Scholar
Baerlocher, C., Hepp, A. & Meier, W. M. (1978). DLS-76, a FORTRAN program for the simulation of crystal structures by geometric refinement. Institut für Kristallographie und Petrographie, ETH, Zürich, Switzerland. Google Scholar
Becker, P. J. & Coppens, P. (1974). Acta Cryst. A30, 129–147. CrossRef IUCr Journals Web of Science Google Scholar
Betteridge, P. W., Carruthers, J. R., Cooper, R. I., Prout, K. & Watkin, D. J. (2003). J. Appl. Cryst. 36, 1487. Web of Science CrossRef IUCr Journals Google Scholar
Björck, Å. (1996). Numerical Methods for Least Squares Problems. Philadelphia: Society for Industrial and Applied Mathematics. Google Scholar
Giacovazzo, C., Luis Monaco, H., Artioli, G., Viterbo, D., Milanesio, M., Gilli, G., Gilli, P., Zanotti, G., Ferraris, G. & Catti, M. (2011). Fundamentals of Crystallography, 3rd ed., edited by C. Giacovazzo. Oxford University Press. Google Scholar
Gildea, R. J., Bourhis, L. J., Dolomanov, O. V., Grosse-Kunstleve, R. W., Puschmann, H., Adams, P. D. & Howard, J. A. K. (2011). J. Appl. Cryst. 44, 1259–1263. Web of Science CrossRef CAS IUCr Journals Google Scholar
Grosse-Kunstleve, R. W. & Adams, P. D. (2003). Comput. Commun. Newsl. 1, 28–38. Google Scholar
Hall, S. R., Allen, F. H. & Brown, I. D. (1991). Acta Cryst. A47, 655–685. CSD CrossRef CAS Web of Science IUCr Journals Google Scholar
Hirshfeld, F. L. (1976). Acta Cryst. A32, 239–244. CrossRef IUCr Journals Web of Science Google Scholar
Jameson, G. B. (1982). Acta Cryst. A38, 817–820. CrossRef CAS Web of Science IUCr Journals Google Scholar
Larson, A. C. (1980). Computing in Crystallography, edited by R. Diamond, S. Ramaseshan & K. Venkatesan, pp. 11.01–11.07. Bangalore: Indian Academy of Sciences. Google Scholar
Murshudov, G. N., Skubák, P., Lebedev, A. A., Pannu, N. S., Steiner, R. A., Nicholls, R. A., Winn, M. D., Long, F. & Vagin, A. A. (2011). Acta Cryst. D67, 355–367. Web of Science CrossRef CAS IUCr Journals Google Scholar
Murshudov, G. N., Vagin, A. A., Lebedev, A., Wilson, K. S. & Dodson, E. J. (1999). Acta Cryst. D55, 247–255. Web of Science CrossRef CAS IUCr Journals Google Scholar
Nielsen, H. B. (2000). Separable Nonlinear Least Squares. IMM, Department of Mathematical Modelling, Report No. IMM-Rep-2000-01. Technical University of Denmark, Lyngby, Denmark. Google Scholar
Nocedal, J. (1980). Math. Comput. 35, 773–782. CrossRef Google Scholar
Pratt, C. S., Coyle, B. A. & Ibers, J. A. (1971). J. Chem. Soc. A, pp. 2146–2151. CrossRef Web of Science Google Scholar
Rollett, J. S. (1965). Editor. Computing Methods in Crystallography. Oxford: Pergamon Press. Google Scholar
Rollett, J. S. (1970). Crystallographic Computing, edited by F. R. Ahmed, pp. 167–181. Copenhagen: Munksgaard. Google Scholar
Sands, D. E. (1966). Acta Cryst. 21, 868–872. CrossRef CAS IUCr Journals Web of Science Google Scholar
Sheldrick, G. M. (1997). The SHELX-97 Manual. Department of Structural Chemistry, University of Göttingen, Göttingen, Germany. Google Scholar
Sheldrick, G. M. (2008). Acta Cryst. A64, 112–122. Web of Science CrossRef CAS IUCr Journals Google Scholar
Trueblood, K. N., Bürgi, H.-B., Burzlaff, H., Dunitz, J. D., Gramaccioli, C. M., Schulz, H. H., Shmueli, U. & Abrahams, S. C. (1996). Acta Cryst. A52, 770–781. CrossRef CAS Web of Science IUCr Journals Google Scholar
Waser, J. (1963). Acta Cryst. 16, 1091–1094. CrossRef CAS IUCr Journals Web of Science Google Scholar
Watkin, D. (2008). J. Appl. Cryst. 41, 491–522. Web of Science CrossRef CAS IUCr Journals Google Scholar

This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.

FOUNDATIONS
ADVANCES

ISSN: 2053-2733

Volume 71| Part 1| January 2015| Pages 59-75

doi:10.1107/S2053273314022207

Open

access

Format		BIBTeX
		EndNote
		RefMan
		Refer
		Medline
		CIF
		SGML
		Plain Text
		Text

Format		BIBTeX
		EndNote
		RefMan
		Refer
		Medline
		CIF
		SGML
		Plain Text
		Text

Search IUCr Journals		doi		Advanced search
Author		volume	page

research papers\(\def\hfill{\hskip 5em}\def\hfil{\hskip 3em}\def\eqno#1{\hfil {#1}}\)

The anatomy of a comprehensive constrained, restrained refinement program for the modern computing environment – Olex2 dissected

1. Introduction

2. Least-squares refinement

3. Constraints

3.1. Synopsis

3.2. Constraint matrix

4. Restrained least-squares refinement

4.1. Implementation

5. Twinning

6. Standard uncertainties

6.1. Variance matrix

6.2. Derived parameters

6.2.1. Incorporating s.u.'s of unit-cell parameters

6.3. Symmetry

7. Refinement results

8. Conclusion

APPENDIX A

Computation of structure factors and their gradient

A1. Structure factor of one atom

A2. Derivatives of |F|2 and |F|

A3. Extinction correction

APPENDIX B

Minimization of least squares with an overall scale factor

APPENDIX C

Constraints

C1. Occupancy constraints

C2. Shared site and shared U constraint

C3. Riding atoms/group U constraint

C4. Rotated U constraint

C5. Pivoted rotation of a rigid body

C6. Free rotation of a rigid body

C7. Non-crystallographic symmetry constraint

APPENDIX D

Restraints and their derivatives

D1. Restraints involving symmetry

D2. Bond similarity restraint

D3. Planarity restraint

D4. Restraints on atomic displacement parameters

D4.1. Rigid-bond restraint

D4.2. ADP similarity restraint

D4.3. Isotropic ADP restraint

D4.4. ADP Ueq similarity

D4.5. Fixed ADP Ueq

D4.6. ADP volume similarity

Footnotes

Acknowledgements

References

research papers

A2. Derivatives of |F|² and |F|

D4.4. ADP U_eq similarity

D4.5. Fixed ADP U_eq