Constraints and restraints in crystal structure analysis

The restraint-based procedure in least-squares refinement is critiqued and the advantages of using internal coordinates are discussed.


Introduction
The aim of this letter is to express criticism of the widely used restraint-based approach to structural analysis from diffraction data and to underline the advantages of using constraints and internal coordinates. The debate is decidedly aged, but I believe that reopening it is desirable.
The problem of how to perform least-squares (LS) structural refinement from X-ray diffraction measurements, taking into account the subsidiary structural information available (known bond lengths, bond angles etc.), was debated back in the 1960s. The vexed question was whether to use constraints (precise specifications) or restraints (flexible specifications).
Constraints have, in fact, been used sparingly in the past 40-50 years (see later); restraints, on the other hand, have been used abundantly and are still widely employed. Another aim of this letter is to give reasons for the low popularity of constraint-based methods.
The ordinary LS procedure involves finding the minimum of the sum
$$S = \sum_{n=1}^{N} w_n \,(y_{o,n} - y_{c,n})^2, \qquad (1)$$
where $y_{o,n}$ are N measurable values and $y_{c,n}$ the corresponding values computable as functions of J variables $p_j$, with J much less than N, and where $w_n$ are appropriate weight factors. In diffraction analysis, $y_n$ are either the squared structure factors $F_n^2$ or the moduli $|F_n|$; the variables $p_j$ are structural variables, commonly the atomic fractional coordinates (a.f.c.).
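As a minimal sketch of the quantity being minimised, the weighted sum of squared residuals can be evaluated as follows. The function name and the numerical values are illustrative assumptions only; they are not taken from any refinement program.

```python
def weighted_sum_of_squares(y_obs, y_calc, weights):
    """Return S = sum_n w_n * (y_obs_n - y_calc_n)**2."""
    if not (len(y_obs) == len(y_calc) == len(weights)):
        raise ValueError("all three sequences must have the same length")
    return sum(w * (yo - yc) ** 2 for yo, yc, w in zip(y_obs, y_calc, weights))


# Toy numbers, chosen only for illustration (hypothetical data).
y_obs = [10.0, 4.0, 7.5]
y_calc = [9.0, 4.5, 7.5]
weights = [1.0, 4.0, 2.0]
S = weighted_sum_of_squares(y_obs, y_calc, weights)
```

In an actual refinement the computed values would depend on the structural variables, and S would be minimised with respect to them.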
Countless coordinate systems can be used, of course, as alternatives to the a.f.c., provided there are biunivocal relationships. The convenience of using internal coordinates (i.c.) for defining the molecular structure (bond lengths, bond angles and torsion angles) was soon recognized (see e.g. Wilson et al., 1955), since chemically connected atoms frequently have foreseeable distances and/or angles. Even so, other parameters will be necessary for defining the position and orientation of the molecules in the crystal, viz. molecular rotations and translations.
The number of i.c. needed for modelling an N-atom crystal structure is 3N, the same as the a.f.c. In the case of molecular crystals without symmetry, there are six rigid-body parameters and $3N - 6$ molecular i.c. to be assigned among interatomic distances and angles. The available distances and angles are indeed more than $3N - 6$, and the selection of i.c. among bond lengths and angles must be done carefully to produce a non-redundant coordinate system (Califano, 1974; Pulay et al., 1979). There are several good reasons for pursuing non-redundancy. The first is that, in modelling molecular structures, non-redundant i.c. behave as strictly independent variables, so that the a.f.c. are analytical functions of the i.c., whilst redundant coordinates imply non-analytical building steps (e.g. solving one or more equations); the second reason is that, in performing LS refinements, the number of degrees of freedom is reduced and one obtains matrices of the smallest possible size; the third is that redundant systems imply singular matrices, so that matrix inversion with standard procedures (e.g. Gauss-Jordan and Cholesky; see Press, 1996) is not possible.
Finally, if redundancy is avoided, carrying out a structural refinement based on internal coordinates is as simple as in the a.f.c. case, with the added advantage of having a smaller number of variables, if a number of i.c. (typically bond lengths, but also bond angles in certain cases) can be kept fixed. This applies of course in difficult cases, with a low data-to-unknown ratio. As discussed elsewhere (see Immirzi, 2007b), there is a general rule, which is applicable to all molecular crystals, for choosing the i.c. correctly: include all bond lengths among the i.c., then choose the other i.c. among angles, considering carefully the kind of construction adopted. There are several possibilities, the best known being the z-matrix method, devised by Eyring (1932). Other methods, discussed elsewhere (Immirzi, 2007a,b), make up for the limitations of the z-matrix method, which is not sufficient to cover all cases.

Using constraints and internal coordinates
If subsidiary information is available that can be considered as precise specifications, which analytically assume the form of K equations (constraints) of type $f_1(p_j) = 0, \ldots, f_K(p_j) = 0$, one could find the above minimum of S [equation (1)], whatever the coordinate system is, by adopting the method devised by Lagrange (1797), later termed the method of undetermined multipliers (Mellor, 1912). Hughes (1941) discussed the method in the crystallographic context; Waser (1963) pointed out that the method, while elegant, is often cumbersome in numerical applications. The problem is that when dealing with K constraints, the above LS matrix is not $J \times J$ but $(J + K) \times (J + K)$ instead. As a consequence, the matrix becomes, when K is more than a few units, rather ill conditioned and the procedure impractical.
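The growth of the matrix from J × J to (J + K) × (J + K) can be seen in a minimal sketch for the simplest case, a linear model with linear constraints. Everything here (function names, the tiny elimination solver, the two-parameter example) is an illustrative assumption and not the implementation of any crystallographic program; non-linear constraints would have to be linearised at each cycle.

```python
def solve_linear(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / aug[i][i]
    return x


def constrained_lsq(A, y, w, C, d):
    """Minimise sum_n w_n (y_n - (A p)_n)^2 subject to C p = d.

    Returns the parameters, the multipliers and the size of the bordered matrix.
    """
    n_obs, J, K = len(A), len(A[0]), len(C)
    # Ordinary J x J normal matrix and right-hand side.
    normal = [[sum(w[n] * A[n][i] * A[n][j] for n in range(n_obs))
               for j in range(J)] for i in range(J)]
    g = [sum(w[n] * A[n][i] * y[n] for n in range(n_obs)) for i in range(J)]
    # Bordered (J + K) x (J + K) matrix: [[normal, C^T], [C, 0]].
    size = J + K
    M = [[0.0] * size for _ in range(size)]
    for i in range(J):
        for j in range(J):
            M[i][j] = normal[i][j]
        for k in range(K):
            M[i][J + k] = C[k][i]
            M[J + k][i] = C[k][i]
    sol = solve_linear(M, g + list(d))
    return sol[:J], sol[J:], size


# Hypothetical example: two directly observed parameters constrained to be equal.
p, multipliers, size = constrained_lsq(
    A=[[1.0, 0.0], [0.0, 1.0]], y=[1.0, 3.0], w=[1.0, 1.0],
    C=[[1.0, -1.0]], d=[0.0])
```

Here J = 2 and K = 1, so the system solved is 3 × 3. The zero block on the diagonal of the bordered matrix means it is no longer positive definite, which is one reason why standard solvers for normal equations cannot be applied unchanged.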
Only a few crystallographic problems can be treated using the Lagrange method; one is the chain continuity in polymers (see Tadokoro, 1979; Immirzi et al., 2007). If precise specifications are numerous, it is decidedly better to use i.c. instead of ordinary a.f.c. With this strategy, the size of the LS matrix does not increase but decreases, since the number of i.c. truly optimized is much less than the number of the a.f.c.
The internal coordinates route, without going deeper into the redundancy problem (see above), was followed by Arnott & Wonacott (1966), who implemented the well known computer program LALS (linked atoms least squares; see also Smith & Arnott, 1978), which is of general applicability but has been used mainly in polymer crystallography. LALS has been repeatedly updated; the latest version is that reported by Okada et al. (2003).
There are other computer programs (e.g. SHELXL; Sheldrick, 2008) claiming the use of constraints without using internal coordinates, however. Such a procedure is limited to the case of linear relationships between a.f.c. accounted for by appropriate elimination of one variable computed as a function of others. To give a simple example, an atom lying along the x, y, 0 diagonal (P422 space group) is 'constrained' to have $x = y$ and this identity can be imposed with this machinery. There are, however, many more complicated situations (e.g. local non-crystallographic symmetries) and an elegant general solution exists for treating them with simplicity: using i.c. and using a symbolic language for the molecular modelling. The general-purpose program TRY, available free of charge on the Web (http://www.theochem.unisa.it/try.html), allows this.
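The elimination of a variable through a linear relationship such as x = y can be sketched as follows. This is only an illustration of the general idea, with hypothetical function names; it is not the code of SHELXL or of any other program.

```python
def expand_diagonal_site(u):
    """Map the single free parameter u to fractional coordinates (x, y, z) = (u, u, 0)."""
    return (u, u, 0.0)


def reduce_gradient(grad_xyz):
    """Chain rule for the free parameter u.

    dS/du = dS/dx * dx/du + dS/dy * dy/du, with dx/du = dy/du = 1;
    z is fixed at 0 and therefore does not contribute.
    """
    gx, gy, _gz = grad_xyz
    return gx + gy


# Hypothetical values: one refined parameter replaces three coordinates.
site = expand_diagonal_site(0.25)
dS_du = reduce_gradient((0.3, -0.1, 5.0))
```

The refinement then works with the single variable u, and the derivatives with respect to x and y are combined before the LS matrix is built.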

Using restraints
On the whole, constraints are used somewhat infrequently. In contrast, restraints are used rather liberally by most crystallographers whenever they are dealing with many variables and limited data, and also simply when they are dissatisfied with the results obtained with the ordinary procedure. Studies done using restraints are very numerous; in protein crystallography they are used very extensively.
The restraint-based least-squares approach was first proposed by Waser (1963); recent articles have been written by Watkin (1994) and by Prince et al. (1999). The well known crystallographic package SHELX (Sheldrick, 2008) also makes use of restraints. Waser's idea was to add to the above sum S [equation (1)] a second sum $S'$ to be performed on a number of quantities $f_m$ (typically bond lengths and bond angles) also computable as a function of the structural variables and having target values $\bar{f}_m$:
$$S' = \sum_{m=1}^{M} w'_m \,(f_m - \bar{f}_m)^2. \qquad (2)$$
The minimum of $S + S'$ is pursued instead of $S$. In practice, the role of $S'$ is that of 'forcing' the $p_j$ variables towards values rendering $f_m$ close to $\bar{f}_m$. The $w'_m$ are appropriate weights assigned by the user; the higher the $w'_m$, the stronger the forcing. The 'idealized' values for $w'_m$ would be $1/\sigma^2(f_m)$, with $\sigma(f_m)$ being the standard deviations for the $f_m$ observed in the reference structural models. Frequently, the $w'_m$ are more or less arbitrary.
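The 'forcing' role of the restraint weight can be seen in the simplest conceivable case, a single parameter p that is both measured directly several times and restrained to a target value. In this toy model, which is an assumption made purely for illustration, the minimum of the combined sum has a closed form, since setting the derivative with respect to p to zero gives p = (Σ w_n y_n + w' t) / (Σ w_n + w'), where t is the target.

```python
def restrained_estimate(y_obs, w, target, w_restraint):
    """Minimise sum_n w_n (y_n - p)^2 + w' (p - target)^2 over the single parameter p."""
    numerator = sum(wn * yn for wn, yn in zip(w, y_obs)) + w_restraint * target
    denominator = sum(w) + w_restraint
    return numerator / denominator


# Hypothetical bond-length-like numbers.
y_obs = [1.50, 1.54, 1.52]
w = [1.0, 1.0, 1.0]
target = 1.40

p_free = restrained_estimate(y_obs, w, target, 0.0)      # no restraint: mean of the data
p_mild = restrained_estimate(y_obs, w, target, 3.0)      # halfway between data and target
p_stiff = restrained_estimate(y_obs, w, target, 1.0e9)   # essentially equal to the target
```

With a zero restraint weight the result is the weighted mean of the observations; as the restraint weight grows, the result is pulled towards the target irrespective of the data.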

Critical observations
In the author's opinion, the restraint-based LS procedure, although founded on heuristic considerations, needs to be questioned. Waser's idea of treating the subsidiary conditions as if they were 'observational equations' is wanting, because it regards experimental data (the $F^2_{\mathrm{obs}}$ values) as analogous to the subsidiary information, whereas in fact the latter are merely 'expectations'. Summing data and expectations in constructing the LS matrix brings about several nonsensical inconsistencies.
Arguments against the restraint-based LS fit are as follows.
(1) The sums $S$ [equation (1)] and $S'$ [equation (2)] are intrinsically non-homogeneous, even if the summed items are rendered dimensionless in both cases by defining the $w_n$ and $w'_m$ properly. Note that N (the number of observations) can be as high as one wishes, without an upper limit, and the measurement of each $F^2_{\mathrm{obs}}$ may be repeated many times; M (the number of restrained quantities) depends instead on the actual molecular structure and cannot be increased or decreased arbitrarily. Consequently, in the computation of the LS matrix, the role of the sum over m can be arbitrarily reduced or enhanced and the same applies to the parameter shifts.
(2) It was stated earlier that Waser's idea is applicable whatever the coordinates are, and applies also to the internal coordinates themselves. Thus, let us perform two computations, both based on an appropriate set of i.c.: in the first case, perform a regular LS cycle without restraints, refining all the N i.c. which, being initially $g_i$, become $g_i^*$; in the second case, refine the first $N - K$ i.c., apply K restraints to the last K i.c., and use as 'target' just the values $g_i^*$. This second computation (the number of degrees of freedom reduces from N to $N - K$) will bring the i.c. to $g_i^\#$, which in general are different from $g_i^*$. If high $w'_m$ values are used (at the limit infinity) and small $w_n$ values (at the limit zero), the i.c. for the last K terms will coincide with $g_i^*$ (as target), but the first $N - K$ i.c. will gain totally random values since the constructed LS matrix does not depend in any way on the measurement performed. It is evident that this is nonsense: the procedure, like any mathematical procedure, must obey the continuity requirements.
(3) The convergence test (the ability of the procedure to find a solution after a number of cycles), a fundamental criterion for evaluating the reliability of the structural model in an LS fit, may become meaningless when the $w'_m$ weights are high. Indeed, convergence always takes place, provided the $w'_m$ weights are large enough.
(4) A multivariate regression finds a minimum moving in a multidimensional space. If no restrictions apply, the minimum point may be anywhere; if one restriction applies, the minimum point is compelled onto a manifold, and if there are more restrictions, it is compelled onto the intersections of many manifolds. Using restraints, therefore, is like simply moving in the vicinity, and it is a very complicated and risky affair! Turning to internal coordinates (cleverly chosen), and refining only the i.c. truly unknown, one reduces drastically the dimensionality of the space and chases a point without any restriction, forgetting manifolds and other mathematical devilries. Why run along tortuous routes when you can follow straight ones?
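Argument (1) can be made concrete with a deliberately simple toy model, assumed here only for illustration: a single parameter to which every observation and every restraint contributes with unit derivative. The 1 × 1 LS matrix is then N w + M w', and the fraction of it due to the restraints depends on how many observations happen to be included. The function name and the numbers are hypothetical.

```python
def restraint_share(n_obs, w_obs, n_restraints, w_restraint):
    """Fraction of the (1 x 1) LS matrix contributed by the restraint sum.

    Toy model: one parameter, unit derivatives, equal weights within each sum,
    so the matrix element is n_obs * w_obs + n_restraints * w_restraint.
    """
    total = n_obs * w_obs + n_restraints * w_restraint
    return n_restraints * w_restraint / total


# Same restraint and same weights; only the number of observations changes.
share_few = restraint_share(n_obs=3, w_obs=1.0, n_restraints=1, w_restraint=3.0)
share_many = restraint_share(n_obs=30, w_obs=1.0, n_restraints=1, w_restraint=3.0)
```

With three observations the restraint accounts for half of the matrix element; repeating each measurement ten times reduces its share to about 9%, although nothing about the restraint itself has changed.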
Finally, it is worth noting that, in using restraints, the number of refined parameters and the size of the LS matrices are the same as if no restraints were imposed. By contrast, constraint-based methods imply a robust reduction in the number of parameters to be adjusted and, consequently, the reliability of the fit, the convergence etc. are considerably improved. In special cases, the reduction may be drastic. To give an impressive example, an unsymmetric calix[6]arene (54 atoms) has 162 a.f.c. and can be modelled, at fixed bond lengths, using only 12 angles (see the TRY user manual). When the conditions are difficult (many unknowns and limited data) the advantages are evident.

Concluding remarks
This letter does not set out to dismiss the restraint-based LS approach proposed by Waser as wrong, only to point out that there are wobbly foundations and that there is a risk, especially when restraints are overused, of incorrect results. In contrast, the constraint-based LS approach is unexceptionable when internal coordinates are properly chosen. This should be enough to reopen a critical discussion on the rather outmoded (and hastily archived) question of whether it is better to use constraints or restraints when dealing with complicated molecules. In the author's opinion, the constraints route is decidedly preferable when there are many parameters and limited data. If, instead, the situation is the reverse, both constraints and restraints are superfluous instruments. Although they do not often cause trouble, it is preferable not to use them.
In the author's opinion, the poor uptake of constraint-based refinements in X-ray structural analysis, necessarily based on internal coordinates and not on atomic coordinates, is due to a general disregard of the fundamental point discussed above: the necessity of using non-redundant coordinate systems. In addition, computer programming is difficult if one wishes to create systems of general validity. By contrast, the fortunes of the restraint-based method were mainly a consequence of the procedural simplicity and the relatively simple programming.