research papers
Macromolecular
by model morphing using nonatomic parameterizations^{a}Department of Chemistry, University of York, York, England
^{*}Correspondence email: kevin.cowtan@york.ac.uk
is a critical step in the determination of a model which explains the crystallographic observations and thus best accounts for the missing phase components. The scattering density is usually described in terms of atomic parameters; however, in macromolecular crystallography the resolution of the data is generally insufficient to determine the values of these parameters for individual atoms. Stereochemical and geometric restraints are used to provide additional information, but produce interrelationships between parameters which slow convergence, resulting in longer times. An alternative approach is proposed in which parameters are not attached to atoms, but to regions of the electrondensity map. These parameters can move the density or change the local temperature factor to better explain the structure factors. Varying the size of the region which determines the parameters at a particular position in the map allows the method to be applied at different resolutions without the use of restraints. Potential applications include initial of molecularreplacement models with domain motions, and potentially the use of electron density from other sources such as electron cryomicroscopy (cryoEM) as the model.
Keywords: refinement; low resolution; computational methods.
1. Introduction
The primary aim of most Xray crystallographic experiments is to obtain a model of the scattering matter of the crystal in terms of atomic coordinates, from which the bonding and therefore the chemistry may be deduced. For macromolecules, differences between crystal cells and thermal motion within a crystal cell generally limit the resolution of the diffraction pattern so that atomic centres can not be accurately distinguished. In practice, an approximate model must be constructed by manual or automatic examination of the electron density, and then the parameters of the atomic model – coordinates, isotropic or anisotropic displacement parameters and occupancy – must be refined to best explain the observed diffraction amplitudes or intensities, yielding plausible estimates for the phase components that are lost in the experiment.
The et al., 1989; Sheldrick, 2007) or, in more recent years, a negative loglikelihood function (Murshudov et al., 1997; Blanc et al., 2004; Afonine et al., 2012). The minimization may be performed directly in by adjusting the model parameters to reduce the target function. Alternatively, the derivative of the target function with respect to the model structure factors can be determined in order to identify the direction in which changes to the model should alter the model structure factors. The Fourier transform of these gradients gives rise to a form of difference map, weighted according to the type of target function being used (Henderson & Moffat, 1971; Read, 1986). The calculation may therefore be considered a problem in adjusting the model parameters to minimize the features of the difference map.
is performed in such a way to minimize some target function, which may be a leastsquares difference between the observed structurefactor amplitudes and values calculated using the atomic model parameters (DriessenIn macromolecular crystallography, the limited resolution of the data mean that the observationtoparameter ratio may not be significantly greater than 1, and therefore additional geometric restraints are used between atoms to ensure that the problem is well determined and that the results are not overly influenced by the noise in the data. The Xray observations have a similar influence on every model parameter (at least in the case of atomic coordinates), since every atom contributes to every et al., 1999).
However, the geometrical restraints, and in particularly bond lengths which are tightly restrained, introduce strong correlations between different parameters. As a result the may take many cycles to converge, since a shift to one atom will in subsequent cycles require corresponding shifts to neighbours, nextneighbours and so on (MurshudovTo enable structures to be refined with weaker or lower resolution data, recent software allow additional restraints to be used relating more distant atoms (Headd et al., 2012; Nicholls et al., 2014) or stabilize the through ridge regression (Murshudov et al., 2011). These approaches may also reduce the rate of convergence.
One common ). Domain motions and shifts in chain register are particularly problematic in this regard. Recently, Terwilliger et al. (2013) introduced a technique for `model morphing', in which entire chain segments may be moved based on finding the shift which matches model features within a sphere onto map features, significantly increasing the range of convergence of the in difficult molecularreplacement cases.
problem arises when solving a structure by in which a nearhomologous structure is placed in the by rigidbody rotation and translation in order to best explain the observed structure factors. In favourable cases, can resolve the residual differences between the structures; however, for less homologous structures the may not progress (Dodson, 2008The problem of ) is often performed within or after and involves the optimization of the rotation and translation of rigid domains to explain the observed structure factors. Similarly, translation–libration–screw (TLS) seeks to determine the rigidbody motions of a domain which best explain the anisotropic blurring of atomic features in the electrondensity map (Driessen et al., 1989; Winn et al., 2001). Both methods require that the model be correctly divided into domains in which all of the atoms move, to a first approximation, in a coordinated manner.
at lower resolutions may also be considered one of parameterization: we are trying to a refine a model described in terms of atomic parameters by fitting a diffraction pattern which cannot possibly resolve atomic features. Some existing approaches address the problem by attaching parameters to structural domains rather than to individual atoms. Rigidbody (Huber & Schneider, 1985In this paper, we outline an alternative approach in which parameters are attached to the grid points of the electrondensity map. The shifts to the parameters at any grid point are determined over an extended spherical region around that grid point, the radius of which is chosen so that the shifts will be well determined. The overlap between the spherical regions surrounding neighbouring grid points means that the shifts are highly correlated, and as a result the effective number of independent parameters can be low (dependent on the radius of the spherical region), even though the number of grid points is larger than the number of observations (Wang & Shen, 1999).
1.1. Terminology
The following terminology conventions will be used.

2. Theory
Conventional crystallographic
typically involves iterating the following steps.In order to parameterize i.e. temperature factors). While there are no atoms to which to attach these parameters, the electron density is already sampled on a threedimensional grid; therefore, the parameters will be attached to the same lattice of grid points.
without reference to an underlying atomic model, the parameters may instead be attached to the electrondensity map itself. These parameters can be the same as for an atomic model, including positional coordinates and isotropic or anisotropic displacement parameters (The calculation starts with the current model electrondensity map ρ and a difference electrondensity map D determined from the disagreement between the observed and model structure factors using likelihood weights or other means. These are used to determine a map of shifts, or `shift field', for each parameter to be refined.
Positional parameters may then be refined to make changes to the electrondensity map ρ to explain the features of the difference map. For example, if in a region it is found that a shift along the a axis will reduce the features of the difference map, the current electron density can be transformed by applying this shift. If different shifts are applied in different parts of the map, the result may lead to a bulk rotation of a region of the electron density.
To determine whether a change in a parameter will explain some features of the difference map D, the current electrondensity map can be differentiated with respect to that parameter (for example the x coordinate). If the resulting gradient map is correlated with the difference map over a particular region, then applying a shift to that coordinate (and therefore to the density) will reduce the features of the difference map. The size of the shift may be determined by finding the shift which best explains the features of the difference map.
Shifts can be determined for each of the (positional and displacement) parameters together to best explain the features of the difference map. However, the difference map also contains noise features, arising in particular from errors in the phases. The noise will lead to errors in the determination of the parameter shifts, which may be reduced by estimating the parameter shifts from larger volumes of the map. The parameter shifts for each grid point in the map will therefore be determined to best explain the features of the difference map in a spherical region around the current grid point, the size of which can be adjusted according to the resolution and the noise in the data. This calculation will be repeated for every grid point in the map.
The parameter shifts are determined by multivariate leastsquares regression, and thus scale the shifts to produce a leastsquares explanation for the differencemap features. The explanatory variables are the gradients of the model map ρ with respect to each of the parameters, and one additional constant term which will account for any error in the mean of the electron density leading to a constant value in the difference map: this may vary slowly across the as a result of missing or inaccurate lowresolution reflections.
The leastsquares solution for the parameter shifts is given by (1), where Y is the vector of differencemap values (from the difference map D) for the sphere around the grid point for which the parameter shifts are to be determined, X is a matrix whose columns are the gradients of the model map ρ with respect to a given parameter over the corresponding map positions, with one column per parameter, including the constant, and Δ is the vector of parameter shifts which best explain the difference map in terms of changes to the current map,
This assumes that all of the map is equally informative in determining the magnitudes of the parameter shifts. However, some regions of the map, in particular the solvent which may not be modelled in the current electrondensity map, may be less useful in determining the parameters. This may be addressed by using weighted leastsquares regression in which a weight is attached to each observation, given by (2). This weight is expressed as a weight matrix whose diagonal elements correspond to the weights w attached to each grid point in the difference map W = diag(w),
If there are n parameters including the constant term, and m grid points within the sphere of density, then X^{T}WX is an n × n symmetric matrix whose (i, j) element is . The term X^{T}WY is an n vector whose ith element is .
The vector Y is a vector of m differencemap values at the m grid points within the sphere. In the case where three positional and one isotropic displacement parameter are being refined, the matrix X takes the form in (3), where the derivatives are calculated by Fourier transforms (Bricogne, 2001),
The vector Δ contains the shifts to each of the parameters given by (4), where Δ_{c} is the shift to the constant term which is not otherwise used,
This calculation must be performed for every grid point in the electrondensity map ρ. The inversion of a 5 × 5 matrix and matrixvector product at each grid point are computationally cheap; however, the summation of the products of weight, gradient and differencemap vectors over a spherical region of arbitrary radius about each grid point becomes computationally demanding for larger spheres.
The calculation can be performed in a computationally efficient manner by recognizing that the summation of the various terms over a spherical region around each grid point can be efficiently performed for every grid point in the map simultaneously by convolution. This is similar in concept to the Lifchitz formulation described in Agarwal et al. (1981), except that the gradients are being averaged over a larger sphere rather than over the volume of an individual atom.
If the convolution operator is written as ⊗, then the term becomes (w_{k}X_{ki}X_{kj})⊗g, where g is a spherically symmetric function whose radius determines the region over which grid points will contribute to the regression calculation. Similarly, the term becomes (w_{k}X_{ki}Y_{k})⊗g. Convolutions must be performed for each unique element of the n × n symmetric matrix and the n vector. When refining three positional coordinates, one isotropic displacement parameter and the constant, n = 5 and therefore 20 convolutions and maps are required. If the convolution is calculated using fast Fourier transforms, the speed of the resulting method is independent of the radius over which the regression is performed.
Once the field of shifts to the x, y, z and U_{iso} parameters has been determined, these shifts can be used to update an atomic model by interpolating the value of the shift field for a given parameter using the grid points surrounding the atom. Alternatively, if the model is an electrondensity map, the coordinate shifts can be used to interpolate a new map from the existing map (however, the handling of U_{iso} is more complex).
3. Testing
Preliminary evaluation of the shiftfield approach has been performed for the problem of refining isotropic displacement parameters (i.e. temperature factors). Tests were performed using 54 structures for which an atomic model and structure factors were obtained from the Protein Data Bank (Berman et al., 2007). The test structures were all originally solved by the Joint Center for Structural Genomics (Elsliger et al., 2010) using a largely automated structuresolution pipeline leading to comparatively uniform data, subject to the different diffraction resolutions of the different crystals.
For each test structure the model was rerefined against the observations using the REFMAC5 software (Murshudov et al., 2011) in order to obtain estimates of the R factor and free R factor (Brünger, 1993) using a current version of the software. The isotropic atomic displacement parameters were then set to a constant value (U_{iso} = 0.5, B_{iso} = 4π^{2}), and a zerocycle run of REFMAC5 was used to determine the (higher) R factor and free R factor for the constant U_{iso} model. The results are independent of the chosen constant U_{iso} because REFMAC5 refines an overall U_{iso} even when the model parameters are not modified. The REFMAC5 run also produces the model electrondensity map ρ and a difference map D, which were used in the determination of the shift field.
Shiftfield
of the model was then performed by iterating the following two steps.

There are three protocol choices to be made in the implementation of the shiftfield calculation as presented below.

The optimal value of r_{0} is related to the resolution of the data (Fig. 2), allowing an appropriate value for r_{0} to be selected for an unsolved structure without running multiple calculations with different values of r_{0}. The best linear fit for r_{0} based on the R factor is given by (5). Estimation of r_{0} to optimize the free R factor is preferable and would lead to larger values of r_{0}, but the relationship is subject to much greater uncertainties.
The resulting R factor is only very weakly dependent on the radius r_{0} of the regression calculation for most of the structures (Fig. 3). For those structures for which larger radii are optimal, smaller radii do produce significantly worse results; these are generally the structures with the lowest data resolution (i.e. the largest d_{min}; Fig. 2). The weak increase in R factor as r_{0} increases past its optimum value suggests that the larger values required at lower resolutions will still lead to useful results.
The masking of unmodelled regions is only expected to make a difference when using larger radii of the regression region, since a small regression region will not significantly overlap the unmodelled region. However, there is no evidence of a benefit from the use of a mask even with larger radii.
Most of the improvement in R factor is obtained in the first cycle of shiftfield however, some additional improvement is seen over the next two cycles (Fig. 4). This raises the possibility of using the method as a faster alternative to conventional owing to the fast convergence that is achieved through the omission of chemical restraints.
The final results of the shiftfield REFMAC5, in each case using the value of r_{0} which leads to the best R factor (Fig. 5). The shiftfield gives results which are similar to those of conventional with marginally lower values of the R factor and marginally higher values of the free R factor. The R factors will be artificially lowered by the multiple trials with different values of r_{0}. If r_{0} is selected to optimize the free R factor rather than the R factor (leading to larger values for r_{0}), the resulting free R factors are on average marginally lower than the values from REFMAC5, while the R factors are marginally higher.
are compared with the results of conventional atomic parameter inThe electrondensity gradient map, difference map and shift field are shown in Fig. 6 for the region around a selenomethionine residue in PDB entry 1o6a (Elsliger et al., 2010). The gradient map shows negative features around atomic centres and positive features in a cage surrounding the Se atom, showing that increasing the B factor removes density from the atomic centre and places it around the atom. The difference map is anticorrelated with the gradient map for the mainchain atoms, indicating that their B factors should decrease, but is correlated with the gradient map for the Se atom, indicating that its B factor should increase. The shift field, which may be crudely understood as the smoothed quotient of these two maps, therefore shows that the B factor of the mainchain atoms should decrease and that of the Se atom should increase.
These results provide a sanity check and a preliminary characterization of the behaviour of the method, but do not demonstrate utility for novel problems. In particular, U_{iso} for a model where the coordinates have already been optimized is an easy problem: the shiftfield is being performed without restraints, but only one quarter of the number of parameters are being used in comparison to conventional Realistic evaluation of the utility of the method will only be possible once coordinate has been implemented.
of4. Discussion
A new approach to the
of an electrondensity model against Xray crystallographic data has been described, which optimizes positional and isotropic displacement parameters on a regular grid, rather than for individual atoms. The parameters are optimized by determining the shifts which will best explain the features of the difference map, and are obtained by multivariate regression over a spherical region around each point on the grid. The radius of the spherical region may be adjusted, and larger radii are required for optimal results at lower resolutions. While the approach has only been tested so far at medium to high resolutions (better than 3 Å), further increasing the radius of the regression region may allow to provide useful results at much lower resolutions.One potential benefit of shiftfield
is that it may be used to refine one map against another, even in the absence of an atomic model. This may be useful in the of (NCS)related regions of a map for use in the NCS averaging of flexible domains, or in the fitting of cryoEM reconstructions to Xray crystallographic data.A second potential benefit arises from the calculation of shifts over larger regions. Since groups of atoms move together (because the spherical region around each atom has significant overlap with its neighbours), atoms tend to move in a coordinated fashion without the need for stereochemical restraints. The incorporation of restraints into the
calculation produces significant offdiagonal terms in the normal matrix, which reduces the rate of convergence.Shiftfield βsheet. Conventional or morphing based on atomic parameters (Terwilliger et al., 2013), are more appropriate in these cases.
also has significant limitations which make it complementary to rather than a replacement for conventional Given that the method produces coordinated shifts to the model parameters over larger regions, it is well suited to the of coordinated changes, for example owing to domain flexibility, but is unsuited to the of noncoordinated changes, such as sidechain orientations or longitudinal shifts in the individual strands of aAnother limitation arises when the shift field is used to modify a model or map. Variations in the coordinate shift over a region, for example at the boundary of two domains which need to move in different directions, will lead to distortions in the shifted electron density or model. In the model case the ) or a fragmentbased rebuilding method (Cowtan, 2008).
may need to be iterated with regularization (Emsley & Cowtan, 2004The shiftfield approach should also be applicable to the et al., 1999). At low resolution it is normal to divide the molecule into rigid domains and refine the anisotropic motions which would arise from rigidbody motions of the domains, which are described by a translation–libration–screw (TLS) tensor (Schomaker & Trueblood, 1968; Winn et al., 2001). The TLS description has been successful in improving the fit of macromolecular models to observations, but is dependent on the identification of rigid domains and is limited to the description of rigidbody motions or positional uncertainty. Spatially correlated anisotropic displacement parameters can represent the same kind of motion or positional uncertainty as the TLS description (Thorn et al., 2012). Shiftfield may therefore allow the of correlated anisotropic motion for regions of the molecule without assuming rigidbody motion or describing the domains.
of anisotropic displacement parameters. Individual atomic anisotropic displacement parameters can usually only be refined at high resolution since they require six parameters per atom (Murshudov4.1. Data and methods
The computer code and data sets used in this paper are available at https://doi.org/10.15124/0dd39199299c433ea29b1f83a6273fb2.
Supporting information
Link https://doi.org/10.15124/0dd39199299c433ea29b1f83a6273fb2
Methods and data to reproduce the results of the paper.
Acknowledgements
The authors are grateful to Eleanor Dodson (YSBL, The University of York) for critically reading the manuscript.
Funding information
This work was supported by the BBSRC through grant BB/L006383/1 and Collaborative Computational Project Number 4: Software for Macromolecular Xray Crystallography (CCP4).
References
Afonine, P. V., GrosseKunstleve, R. W., Echols, N., Headd, J. J., Moriarty, N. W., Mustyakimov, M., Terwilliger, T. C., Urzhumtsev, A., Zwart, P. H. & Adams, P. D. (2012). Acta Cryst. D68, 352–367. Web of Science CrossRef CAS IUCr Journals Google Scholar
Agarwal, R., Lifchitz, A. & Dodson, E. (1981). In Proceedings of the CCP4 Study Weekend. Refinement of Protein Structures, edited by P. A. Machin, J. W. Campbell & M. Elder. Warrington: Daresbury Laboratory. Google Scholar
Berman, H., Henrick, K., Nakamura, H. & Markley, J. (2007). Nucleic Acids Res. 35, D301–D303. Web of Science CrossRef PubMed CAS Google Scholar
Blanc, E., Roversi, P., Vonrhein, C., Flensburg, C., Lea, S. M. & Bricogne, G. (2004). Acta Cryst. D60, 2210–2221. Web of Science CrossRef CAS IUCr Journals Google Scholar
Bricogne, G. (2001). International Tables for Crystallography, Vol. B, edited by U. Shmueli, pp. 25–98. Dordrecht: Kluwer. Google Scholar
Brünger, A. T. (1993). Acta Cryst. D49, 24–36. CrossRef Web of Science IUCr Journals Google Scholar
Cowtan, K. (2008). Acta Cryst. D64, 83–89. Web of Science CrossRef CAS IUCr Journals Google Scholar
Dodson, E. (2008). Acta Cryst. D64, 17–24. Web of Science CrossRef IUCr Journals Google Scholar
Driessen, H., Haneef, M. I. J., Harris, G. W., Howlin, B., Khan, G. & Moss, D. S. (1989). J. Appl. Cryst. 22, 510–516. CrossRef CAS Web of Science IUCr Journals Google Scholar
Elsliger, M.A., Deacon, A. M., Godzik, A., Lesley, S. A., Wooley, J., Wüthrich, K. & Wilson, I. A. (2010). Acta Cryst. F66, 1137–1142. Web of Science CrossRef CAS IUCr Journals Google Scholar
Emsley, P. & Cowtan, K. (2004). Acta Cryst. D60, 2126–2132. Web of Science CrossRef CAS IUCr Journals Google Scholar
Headd, J. J., Echols, N., Afonine, P. V., GrosseKunstleve, R. W., Chen, V. B., Moriarty, N. W., Richardson, D. C., Richardson, J. S. & Adams, P. D. (2012). Acta Cryst. D68, 381–390. Web of Science CrossRef CAS IUCr Journals Google Scholar
Henderson, R. & Moffat, J. K. (1971). Acta Cryst. B27, 1414–1420. CrossRef CAS IUCr Journals Web of Science Google Scholar
Huber, R. & Schneider, M. (1985). J. Appl. Cryst. 18, 165–169. CrossRef CAS Web of Science IUCr Journals Google Scholar
Murshudov, G. N., Skubák, P., Lebedev, A. A., Pannu, N. S., Steiner, R. A., Nicholls, R. A., Winn, M. D., Long, F. & Vagin, A. A. (2011). Acta Cryst. D67, 355–367. Web of Science CrossRef CAS IUCr Journals Google Scholar
Murshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997). Acta Cryst. D53, 240–255. CrossRef CAS Web of Science IUCr Journals Google Scholar
Murshudov, G. N., Vagin, A. A., Lebedev, A., Wilson, K. S. & Dodson, E. J. (1999). Acta Cryst. D55, 247–255. Web of Science CrossRef CAS IUCr Journals Google Scholar
Nicholls, R. A., Fischer, M., McNicholas, S. & Murshudov, G. N. (2014). Acta Cryst. D70, 2487–2499. Web of Science CrossRef IUCr Journals Google Scholar
Read, R. J. (1986). Acta Cryst. A42, 140–149. CrossRef CAS Web of Science IUCr Journals Google Scholar
Schomaker, V. & Trueblood, K. N. (1968). Acta Cryst. B24, 63–76. CrossRef CAS IUCr Journals Web of Science Google Scholar
Sheldrick, G. M. (2007). Acta Cryst. A64, 112–122. Web of Science CrossRef IUCr Journals Google Scholar
Terwilliger, T. C., Read, R. J., Adams, P. D., Brunger, A. T., Afonine, P. V. & Hung, L.W. (2013). Acta Cryst. D69, 2244–2250. Web of Science CrossRef IUCr Journals Google Scholar
Thorn, A., Dittrich, B. & Sheldrick, G. M. (2012). Acta Cryst. A68, 448–451. Web of Science CrossRef CAS IUCr Journals Google Scholar
Wang, X. & Shen, S. (1999). J. Clim. 12, 1280–1291. Web of Science CrossRef Google Scholar
Winn, M. D., Isupov, M. N. & Murshudov, G. N. (2001). Acta Cryst. D57, 122–133. Web of Science CrossRef CAS IUCr Journals Google Scholar
This is an openaccess article distributed under the terms of the Creative Commons Attribution (CCBY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.