Real-space refinement in PHENIX for cryo-EM and crystallography

Afonine, P.V.; Poon, B.K.; Read, R.J.; Sobolev, O.V.; Terwilliger, T.C.; Urzhumtsev, A.; Adams, P.D.

doi:10.1107/S2059798318006551

research papers

STRUCTURAL BIOLOGY

ISSN: 2059-7983

Volume 74 | Part 6 | June 2018 | Pages 531-544


Open access


Pavel V. Afonine,^a,^b ^* Billy K. Poon,^a Randy J. Read,^c Oleg V. Sobolev,^a Thomas C. Terwilliger,^d,^e Alexandre Urzhumtsev ^f,^g and Paul D. Adams ^a,^h

^aMolecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA, ^bDepartment of Physics and International Centre for Quantum and Molecular Structures, Shanghai University, Shanghai 200444, People's Republic of China, ^cCambridge Institute for Medical Research, University of Cambridge, Wellcome Trust/MRC Building, Hills Road, Cambridge CB2 0XY, England, ^dBioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA, ^eNew Mexico Consortium, Los Alamos, NM 87545, USA, ^fFaculté des Sciences et Technologies, Université de Lorraine, BP 239, 54506 Vandoeuvre-les-Nancy, France, ^gCentre for Integrative Biology, IGBMC, CNRS–INSERM–UdS, 1 Rue Laurent Fries, BP 10142, 67404 Illkirch, France, and ^hDepartment of Bioengineering, University of California Berkeley, Berkeley, California, USA
^*Correspondence e-mail: pafonine@lbl.gov

(Received 10 January 2018; accepted 27 April 2018; online 30 May 2018)

This article describes the implementation of real-space refinement in the phenix.real_space_refine program from the PHENIX suite. The use of a simplified refinement target function enables very fast calculation, which in turn makes it possible to identify optimal data-restraint weights as part of routine refinements with little runtime cost. Refinement of atomic models against low-resolution data benefits from the inclusion of as much additional information as is available. In addition to standard restraints on covalent geometry, phenix.real_space_refine makes use of extra information such as secondary-structure and rotamer-specific restraints, as well as restraints or constraints on internal molecular symmetry. The re-refinement of 385 cryo-EM-derived models available in the Protein Data Bank at resolutions of 6 Å or better shows significant improvement of the models and of the fit of these models to the target maps.

Keywords: real-space refinement; cryo-EM; crystallography; map interpolation; atomic-centered targets; PHENIX.

1. Introduction

Improvements in the cryo-electron microscopy (cryo-EM) technique have led to a rapid increase in the number of high-resolution three-dimensional reconstructions that can be interpreted with atomic models (Fig. 1). This has prompted a number of new developments in PHENIX (Adams et al., 2010) to support the method, from model building (Terwilliger, Adams et al., 2018), map improvement (Terwilliger, Sobolev et al., 2018) and refinement (Afonine et al., 2013) to model validation (Afonine et al., 2018). In this manuscript, we focus on atomic model refinement using a map (primarily cryo-EM, but the same algorithms and software are also applicable to crystallographic maps).

Figure 1
Number of cryo-EM-derived models in the PDB at resolutions of 6 Å or better.

Model refinement is an optimization problem and as such it requires the definition of three entities (for reviews, see Tronrud, 2004; Watkin, 2008; Afonine et al., 2012, 2015). Firstly, the model, i.e. a mathematical construct that explains the experimental data, with an associated set of refinable parameters: in this case an atomic model with coordinates whose positions can be varied to improve the fit to the data. Secondly, the target function that links the model parameters to the experimental data: this function scores model-to-data fit and therefore guides refinement. Finally, an optimization method that changes the values of refinable model parameters such that the model agreement with the experimental data is improved. In PHENIX, gradient-driven minimization using the L-BFGS algorithm (Liu & Nocedal, 1989) is employed for this purpose. If the target function is expressed through diffraction intensities or structure factors, refinement is usually referred to as reciprocal-space, or Fourier-space, refinement (FSR). Alternatively, a target function may be formulated in terms of a map: a Fourier synthesis in the case of crystallography or a three-dimensional reconstruction from projections in the case of cryo-EM. Such refinement is referred to as real-space refinement (RSR). In both cases the targets are sums over a large number of similar terms corresponding to either reflections (FSR) or map grid points (RSR). A key methodological difference is that for RSR each term depends on only a few atoms, while for FSR each term depends on all model parameters. Most modern macromolecular refinement programs were developed for crystallographic data and therefore perform refinement in reciprocal space, at least as their main mode of operation (see Table 1 in Afonine et al., 2015). This work focuses on the real-space refinement of coordinates of atomic models.

In cryo-EM studies real-space refinement is a natural choice because a three-dimensional map is the output of the single-particle image-reconstruction method (see, for example, Frank, 2006) and does not change in a fundamental way as the atomic model is improved. This is not the case for crystallography, where the experimental data are diffraction intensities, and the associated and vital phase information has to be obtained indirectly. In crystallography, obtaining the best phases typically involves their calculation from atomic models, in turn making the resulting maps model-biased (see, for example, Hodel et al., 1992). Although FSR methods are predominant in crystallographic refinement, RSR is attractive in some contexts as it makes it possible to refine parts of the model locally and fast, and model incompleteness does not influence refinement as it does for FSR (Lunin et al., 2002). For this reason RSR has been particularly popular in the context of interactive model-building software such as FRODO, O (Jones, 1978; Jones et al., 1991), MAIN (Turk, 2013) and Coot (Emsley & Cowtan, 2004; Emsley et al., 2010).

In the case of cryo-EM an atomic model can also be refined using a reciprocal-space target. This can be achieved by converting the map into Fourier coefficients. These Fourier coefficients can then be used in reciprocal-space refinement using standard refinement protocols that are well established for crystallographic structure refinement (see, for example, Cheng et al., 2011; Baker et al., 2013; Brown et al., 2015). We note, however, that unless the map is converted to the full corresponding set of Fourier coefficients (and not a subset containing only a sphere limited to the stated resolution) this conversion may not be lossless.

To address the emerging structure-refinement needs of the rapidly growing field of cryo-EM, the phenix.real_space_refine program (Afonine et al., 2013), which is capable of the refinement of atomic models against maps, has been introduced into the PHENIX suite. It is not limited to cryo-EM and can also be used in crystallographic refinement (X-ray, electron or neutron). In this paper, we describe the implementation of the phenix.real_space_refine program and demonstrate its performance by applications to simulated data and to cryo-EM models in the PDB (Bernstein et al., 1977; Berman et al., 2000) and corresponding maps in the EMDB (Henrick et al., 2003). This is a work in progress, and further details and advances will be reported as the program evolves. To date, phenix.real_space_refine has been used in a number of documented structural studies (see, for example, Fischer et al., 2015; Shalev-Benami et al., 2016; Chua et al., 2016; Ahmed et al., 2016; Yang et al., 2016; Gao et al., 2016; Chen et al., 2016; Bhardwaj et al., 2016; Lokareddy et al., 2017; Hryc et al., 2017; Ahmed et al., 2017; Demo et al., 2017; Paulino et al., 2017; Liu et al., 2017).

2. Methods

2.1. Refinement flowchart

Fig. 2 shows the model-refinement flowchart as it is implemented in phenix.real_space_refine. This is very similar to the reciprocal-space refinement workflow implemented in phenix.refine (see Fig. 1 in Afonine et al., 2012).

Figure 2
Flowchart for phenix.real_space_refine.

The program begins by reading a model file, in PDB or mmCIF format, map data (as an actual map in MRC/CCP4 format or as Fourier map coefficients in MTZ format) and other parameters, such as resolution (if a map is provided) or additional restraint definitions for novel ligands, internal molecular symmetry (e.g. NCS in crystallography) or secondary structure. Once inputs have been read, the program proceeds to calculations that constitute a set of tasks repeated multiple times (macro-cycles). Tasks to be performed during the refinement are defined by the program automatically and/or by the user. In its default mode the program will only perform gradient-driven minimization of the entire model. Other nondefault tasks allow optimization using simulated annealing (SA; Brünger et al., 1987), morphing (Terwilliger et al., 2013), rigid-body refinement (see Afonine et al., 2009 and references therein) and systematic residue side-chain optimizations using grid searches in torsion χ-angle space (Oldfield, 2001). Parts of the model related by internal symmetry are determined automatically, if available, or can be defined by the user. In the presence of such internal symmetry, restraints or constraints can be applied between the coordinates of related molecules. The operators relating molecules can also be refined. The result of refinement, i.e. the refined model, is output as a file in PDB or mmCIF format.

Central to almost all tasks performed within a refinement macro-cycle is the target function. Its choice is the key for the success of refinement, i.e. efficient convergence to an improved model. Also of the same importance is the assessment of refinement progress by quantifying model quality and the goodness of model-to-map fit throughout the entire process. Some relevant points are discussed below.

2.2. Refinement target function

Macromolecular cryo-EM or crystallographic experimental data are almost always of insufficient quality to refine parameters of atomic models individually. To make refinement practical, restraints or constraints are almost always used in order to incorporate extra information into refinement, and the corresponding procedures are called restrained or constrained refinement. In restrained refinement the target function is a sum of data-based and restraints-based components:

$[T = {T}_{\rm data}+{w}_{\rm restraints} \times {T}_{\rm restraints}. \eqno(1)]$

The first term scores the model-to-data fit and the second term incorporates a priori information about the model. The weight w_restraints balances the contribution of restraints to maximize the model-to-data fit while also obeying the a priori information, and an optimal choice of its value is crucial. Constrained refinement does not change the target function but rather changes (reduces) the set of independent parameters that can vary. Examples include rigid-body refinement, the use of a riding model (Sheldrick & Schneider, 1997) to parameterize the positions of H atoms in refinement or the implementation of RSR by Diamond (1971) using torsion angles as variables.
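As an illustration of how the two terms of (1) combine, the following minimal Python sketch evaluates a restrained target for a toy model. The harmonic, sigma-weighted form of the bond-length restraint and the function names `bond_restraint_term` and `total_target` are assumptions made for this example only; they are not taken from the phenix.real_space_refine source.

```python
import numpy as np

def bond_restraint_term(coords, bonds, ideal_lengths, sigmas):
    """Toy T_restraints: sum of squared, sigma-weighted bond-length deviations.

    coords: (N, 3) Cartesian coordinates; bonds: (K, 2) atom index pairs.
    The harmonic form is an assumption made for illustration.
    """
    i, j = bonds[:, 0], bonds[:, 1]
    lengths = np.linalg.norm(coords[i] - coords[j], axis=1)
    return float(np.sum(((lengths - ideal_lengths) / sigmas) ** 2))

def total_target(t_data, t_restraints, w_restraints):
    """Equation (1): T = T_data + w_restraints * T_restraints."""
    return t_data + w_restraints * t_restraints
```

A larger `w_restraints` makes geometric deviations more costly relative to the map term, which is why the choice of its value matters.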

2.2.1. Model-to-map target (T_data)

In RSR, the T_data term scores the fit of the model being refined to a target map. In cryo-EM the map is a three-dimensional reconstruction, while in crystallography it may be, for example, a 2mF_obs − DF_model map (Read, 1986).

It is possible to express the difference between the two maps in the integral form (see, for example, Diamond, 1971)¹

$[T_{\rm data} = \textstyle \int \limits_{V}[\rho_{\rm calc}({\bf r})-\rho_{\rm tar}({\bf r})]^{2}\, {\rm d}{\bf r}. \eqno(2)]$

For (2) we suppose that the original target map is optimally scaled to the model map (Diamond, 1971; Chapman, 1995). In the following, we will consider the target to be essentially unchanged by manipulations that shift its value by a constant or a scale factor, as such manipulations do not change the position of the minimum of the target. If the Euclidean norms of ρ_tar(r) and ρ_calc(r) are conserved during refinement [i.e. if $[\textstyle \int_{V}\rho^{2}_{\rm tar}({\bf r})\,{\rm d}{\bf r}]$ = constant, as will be the case when the target map itself does not change, and if $[\textstyle \int_{V}\rho^{2}_{\rm calc}({\bf r})\,{\rm d}{\bf r}]$ = constant, which will be true if the overlap of atomic densities does not change] then minimization of (2) is equivalent to minimization of the anticorrelation target, which does not need the maps to be optimally scaled,

$[T_{\rm data} = -\textstyle \int \limits_{V} \rho_{\rm calc}({\bf r})\rho_{\rm tar}({\bf r})\,{\rm d}{\bf r}. \eqno(3)]$
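The step from (2) to (3) can be made explicit by expanding the square in (2); this short worked derivation uses the same notation and relies only on the two conservation assumptions stated above:

$[\textstyle \int \limits_{V}[\rho_{\rm calc}({\bf r})-\rho_{\rm tar}({\bf r})]^{2}\,{\rm d}{\bf r} = \int \limits_{V}\rho^{2}_{\rm calc}({\bf r})\,{\rm d}{\bf r} + \int \limits_{V}\rho^{2}_{\rm tar}({\bf r})\,{\rm d}{\bf r} - 2\int \limits_{V}\rho_{\rm calc}({\bf r})\rho_{\rm tar}({\bf r})\,{\rm d}{\bf r}.]$

When the first two integrals on the right-hand side are constant, only the cross term depends on the atomic positions, so (2) and (3) differ by an additive constant and a factor of 2 and therefore share the same minimum.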

Assuming the target ρ_tar and model-calculated ρ_calc maps are provided on the same grid, a continuous integration in (2) and (3) can be replaced with a numeric integration over the regular grid on which the maps are available (see, for example, Diamond, 1971),

$[T_{\rm data} = \textstyle \sum \limits_{{\bf n}\in G}[\rho_{\rm calc}({\bf n})-\rho_{\rm tar}({\bf n})]^{2} \eqno(4)]$

$[T_{\rm data} = -\textstyle \sum \limits_{{\bf n}\in G}\rho_{\rm calc}({\bf n})\rho_{\rm tar}({\bf n}), \eqno(5)]$

respectively. The set G of grid nodes used to calculate the targets (i.e. the integration volume) is either the whole map or an envelope (mask) surrounding the whole atomic model or its part that is subject to refinement.
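The grid sums (4) and (5) can be sketched in a few lines of Python with NumPy. This is an illustrative sketch only: the function names and the optional Boolean `mask` argument (standing in for the set G) are assumptions introduced here, and both maps are assumed to be arrays sampled on the same grid.

```python
import numpy as np

def least_squares_target(rho_calc, rho_tar, mask=None):
    """Equation (4): sum of squared map differences over the grid nodes in G."""
    diff = np.asarray(rho_calc, dtype=float) - np.asarray(rho_tar, dtype=float)
    if mask is not None:
        diff = diff[mask]  # restrict the sum to nodes inside the envelope
    return float(np.sum(diff ** 2))

def anticorrelation_target(rho_calc, rho_tar, mask=None):
    """Equation (5): minus the sum of map products over the grid nodes in G."""
    prod = np.asarray(rho_calc, dtype=float) * np.asarray(rho_tar, dtype=float)
    if mask is not None:
        prod = prod[mask]
    return -float(np.sum(prod))
```

For any pair of maps, the value of (4) equals the sum of the squared norms of the two maps plus twice the value of (5), so the two targets have the same minimum whenever both norms stay constant.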

To match the finite resolution of the target map in (5) accurately, several steps are required to compute the model map. Firstly, the model map distribution is calculated using one of the available approximations (Sears, 1992; Maslen et al., 1992; Waasmaier & Kirfel, 1995; Grosse-Kunstleve et al., 2004; Peng et al., 1996; Peng, 1998). A set of Fourier coefficients is then calculated from the distribution up to the resolution limit specified by the target map.² Finally, a subset of these coefficients is used to calculate the model Fourier synthesis ρ_calc that can then be used in (5). This synthesis is a representation of a model image at a given resolution. A typical refinement may require hundreds or even thousands of such model-image calculations, which are computationally expensive, involving two Fourier transforms.

Alternatively, a model map may be calculated from the atomic model directly as a sum of individual contributions of M atoms, with each contribution being a Fourier image (or its approximation) of the corresponding atom at a given resolution (see, for example, Diamond, 1971; Lunin & Urzhumtsev, 1984; Chapman, 1995; Mooij et al., 2006; Sorzano et al., 2015). While this is much faster than the previous method, it may be less accurate and still be computationally expensive, especially for large models.

The numeric integration over the whole map in (5) can be simplified by restricting it to the immediate vicinity of the atomic centers r_m, m = 1, …, M:

$[T_{\rm data} = -\textstyle \sum \limits_{m = 1}^{M}\rho_{\rm calc}({\bf r}_m){\tilde{\rho}}_{\rm tar}({\bf r}_m). \eqno(6)]$

Here, $[{\tilde{\rho}}_{\rm tar}({\bf r}_m)]$ are the values interpolated from the nearby grid node values ρ_tar(n) to the atomic centers r_m (Appendices A and B). Neglecting the local variation of the model map at the atomic centers (e.g. at low resolution) and thus supposing ρ_calc(r_m) ≃ constant for all m, the target simplifies further as (Rossmann, 2000; Rossmann et al., 2001)

$[T_{\rm data} = -\textstyle \sum \limits_{m = 1}^{M}{\tilde{\rho}}_{\rm tar}({\bf r}_m). \eqno(7)]$

The hypothesis ρ_calc(r_m) ≃ constant seems to be reasonable at low resolution, when a calculated map can be considered to be rather flat. On the other hand, minimization of (7) is essentially a fitting of atoms to the nearest peaks of the target map, which seems to be appropriate at high resolution as well. We show below (§3) that indeed this target function is efficient over a large resolution range; Appendix B supports this observation through the equivalence of targets (7) and (5) when taking map blurring/sharpening into account. If the difference in atomic size cannot be neglected, this target function can be modified to

$[T_{\rm data} = -\textstyle \sum \limits_{m = 1}^{M}w_m{\tilde{\rho}}_{\rm tar}({\bf r}_m), \eqno(8)]$

where w_m is an atom-specific weight. For example, w_m can be the electron number of the corresponding atom or it can be set negative for O atoms of Asp and Glu residues in the case of cryo-EM or for atoms that have a negative scattering length (such as hydrogen) in the case of neutron diffraction data. Clearly, for most of the macromolecular structures under consideration here these atom-centered targets are nearly the same, and for simplicity in the following we refer only to (7) unless otherwise stated. The computational cost of (7) is proportional, with a very small coefficient, to the number of atoms; these targets are therefore much faster to calculate than (5), which makes them advantageous for the refinement of large models. Unlike (4) or (5), the computational cost of (7) or (8) does not depend on the resolution or map-sampling rate. Essentially, target (5) optimizes the fit of the shape between model-calculated and experimental maps, while target (7) simply guides atoms to the nearest peaks in the experimental map. Therefore, refinement using (5) can produce a more accurate model-to-map fit. An optimal refinement protocol may consist of using target (7) for routine refinements and using (5) for the final refinement.
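The atom-centered targets (7) and (8) can be sketched as follows. This is a minimal illustration and not the program's implementation: trilinear interpolation is used here as a stand-in for the interpolation schemes of the appendices, and the sketch assumes an orthogonal P1 cell with the map origin at grid index (0, 0, 0), a uniform grid step and periodic wrapping. The names `trilinear` and `atom_centered_target` are hypothetical.

```python
import numpy as np

def trilinear(rho, frac_idx):
    """Trilinearly interpolate map values at fractional grid indices of shape (N, 3)."""
    rho = np.asarray(rho, dtype=float)
    pts = np.asarray(frac_idx, dtype=float)
    shape = np.array(rho.shape)
    base = np.floor(pts).astype(int)
    t = pts - base  # fractional offset within the grid cell, in [0, 1)
    values = np.zeros(len(pts))
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                offset = np.array([dx, dy, dz])
                idx = (base + offset) % shape  # periodic wrap (assumed P1 cell)
                w = np.prod(np.where(offset == 1, t, 1.0 - t), axis=1)
                values += w * rho[idx[:, 0], idx[:, 1], idx[:, 2]]
    return values

def atom_centered_target(rho_tar, sites_cart, grid_step, weights=None):
    """Equation (7), or (8) when per-atom weights w_m are supplied."""
    frac_idx = np.asarray(sites_cart, dtype=float) / grid_step
    rho_at_atoms = trilinear(rho_tar, frac_idx)
    if weights is None:
        weights = np.ones(len(rho_at_atoms))
    return -float(np.sum(np.asarray(weights, dtype=float) * rho_at_atoms))
```

Each atom touches only the eight grid nodes surrounding it, so the cost of one evaluation grows with the number of atoms and not with the size of the map.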

2.2.2. Restraints (T_restraints)

In restrained refinement, extra information is introduced through the term T_restraints with some weight (1). This extra term restrains model parameters to be similar, but not necessarily identical, to some reference values. At high to medium resolutions of approximately 3 Å or better, the standard set of restraints as implemented in PHENIX (Grosse-Kunstleve & Adams, 2004) includes restraints on covalent bond lengths and angles, dihedral angles, planarity and chirality, and a nonbonded repulsion term. However, at lower resolutions the amount of experimental data is insufficient to preserve the geometry characteristics of a higher level of structural organization (such as secondary structure), and therefore including extra information (restraints or constraints) to help to produce a chemically meaningful model is desirable. These extra restraints or constraints may include similarity of related copies (NCS in the case of crystallography), restraints on secondary structure and restraints to one or more external reference models (for implementation details in PHENIX, see Headd et al., 2012, 2014; Sobolev et al., 2015). phenix.real_space_refine can use the following extra restraints and constraints.

(i) Distance and angle restraints on hydrogen-bond patterns for protein helices and sheets and DNA/RNA base pairs.
(ii) Torsion-angle restraints on idealized protein secondary-structure fragments.
(iii) Restraints to maintain stacking bases in RNA/DNA parallel.
(iv) Ramachandran plot restraints.
(v) Amino-acid side-chain rotamer-specific restraints.
(vi) C^β deviation restraints.
(vii) Reference-model restraints, where a reference model may be a similar structure of better quality or the initial position of the model being refined.
(viii) Similarity restraints in torsion or Cartesian space.
(ix) NCS constraints.

2.2.3. Relative weight

The relative weight w_restraints is chosen such that the model fits the map as well as possible while maintaining reasonable deviations from ideal covalent bond lengths and angles. In PHENIX, w_restraints for RSR is determined by systematically trying a range of plausible values and performing a short refinement for each trial value. A similar procedure in FSR would be very computationally expensive because for each trial value of w_restraints the whole structure would need to be used. In RSR this is computationally feasible using (7) but not (5). The weight-calculation procedure implemented in phenix.real_space_refine splits the model into a set of randomly chosen segments, each one a few residues long. After trial refinements of each segment with different weights, the best weight is defined as the one that results in a model possessing reasonable bond and angle root-mean-square deviations (r.m.s.d.s) and that has the best model-to-map fit among all trial weights. The obtained array of best weights for all fragments is filtered for outliers and the average weight is calculated and defined as the best weight for the final refinement. This calculation typically takes less than a minute on an ordinary computer and is independent of the size of the structure or map. Instead of computing an average single weight for the entire model, this protocol can be extended (work in progress) to calculate and use different weights for different parts of the map, accounting for variations in local map quality.
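The weight-selection logic described in this section can be sketched in Python as below. This is a hedged illustration only: the trial refinements themselves are not shown, the r.m.s.d. thresholds (0.02 Å for bonds, 2.5° for angles) and the two-standard-deviation outlier filter are placeholder assumptions, and the function names are hypothetical and not part of PHENIX.

```python
import numpy as np

def best_weight_for_segment(trial_results, max_bond_rmsd=0.02, max_angle_rmsd=2.5):
    """Pick the best trial weight for one segment.

    trial_results: list of (weight, bond_rmsd, angle_rmsd, map_fit) tuples, one per
    trial refinement. Thresholds are illustrative assumptions, not program defaults.
    Returns None if no trial keeps the geometry within the thresholds.
    """
    acceptable = [r for r in trial_results
                  if r[1] <= max_bond_rmsd and r[2] <= max_angle_rmsd]
    if not acceptable:
        return None
    # Among geometrically acceptable trials, keep the one with the best map fit.
    return max(acceptable, key=lambda r: r[3])[0]

def overall_weight(segment_weights, n_sigma=2.0):
    """Average the per-segment best weights after a simple outlier filter."""
    w = np.array([x for x in segment_weights if x is not None], dtype=float)
    if w.size == 0:
        raise ValueError("no segment produced an acceptable weight")
    spread = np.std(w)
    if spread > 0:
        w = w[np.abs(w - np.median(w)) <= n_sigma * spread]
    return float(np.mean(w))
```

Because each trial refinement involves only a few residues and an atom-centered target, the total cost depends on the number of segments and trial weights and not on the size of the model.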

2.3. Evaluation of refinement progress and results

It is recognized that model validation (see, for example, Brändén & Jones, 1990; Read et al., 2011; Wlodawer & Dauter, 2017) is a critical step in structure determination, and a number of corresponding tools have been developed in crystallography (see, for example, Chen et al., 2010; Read et al., 2011; Gore et al., 2017; Williams et al., 2018 and references therein) and some in cryo-EM studies (see, for example, Henderson et al., 2012; Tickle, 2012; Lagerstedt et al., 2013; Barad et al., 2015; Pintilie et al., 2016; Joseph et al., 2017; Afonine et al., 2018). Generally, the process consists of assessing data, model quality and model-to-data fit quality, and is performed locally and globally. At the stage of refining a model we assume that the intrinsic data quality has already been evaluated, and only model quality and model-to-data fit need to be monitored.

The methods and tools to evaluate the geometric quality of a model are the same in crystallography and in cryo-EM. For example, the PHENIX comprehensive validation program provides a detailed report on model quality, making extensive use of the MolProbity validation algorithms (Chen et al., 2010; Richardson et al., 2018). In crystallography, the model-to-data fit is quantified by crystallographic R and R_free (Brünger, 1992) factors, which are global reciprocal-space metrics. In cryo-EM, model and data validation is currently performed by the comparison of complex Fourier coefficients in resolution shells; these coefficients are calculated from the model and from the full map or half-maps; different masks can be applied prior to calculation of these coefficients. In real space, the model-to-data fit can also be evaluated locally or globally by various correlation coefficients between a model-calculated map and the experimentally derived map (Urzhumtsev et al., 2014; Afonine et al., 2018). Some of these tools are used in §3.2, where models extracted from the PDB are refined against experimental cryo-EM maps.
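A real-space correlation coefficient of the kind mentioned above can be sketched as a Pearson correlation between two maps, optionally restricted to a mask. This is a generic illustration under the assumption that both maps are NumPy arrays on the same grid and that `mask` is a Boolean array; the exact definitions of the coefficients used by PHENIX are those given in the cited references, and `map_correlation` is a hypothetical name.

```python
import numpy as np

def map_correlation(rho_calc, rho_tar, mask=None):
    """Pearson correlation between two maps, optionally inside a Boolean mask."""
    a = np.asarray(rho_calc, dtype=float)
    b = np.asarray(rho_tar, dtype=float)
    if mask is not None:
        a, b = a[mask], b[mask]  # local or masked evaluation
    else:
        a, b = a.ravel(), b.ravel()  # global evaluation over the whole map
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    if denom == 0.0:
        raise ValueError("correlation is undefined for a constant map")
    return float(np.sum(a * b) / denom)
```

The coefficient is unchanged when either map is shifted by a constant or multiplied by a positive scale factor, so the maps do not need to be placed on a common scale first.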

3. Results

3.1. Test refinements with simulated data

Below, we illustrate the performance of refinement at different resolutions and map sharpnesses and using atomic models with various amounts of error in the coordinates. All refinements were performed using refinement target (1) with geometry restraints included with optimal weights and data term (7). We begin with several numerical tests using simulated data. The advantage of such tests is that one can study individual effects in a setting where the answer is known.

3.1.1. Preparing simulated data

A model from the PDB (PDB entry 3vb1) was chosen as a test model. The following manipulations were made to this model prior to test calculations: (i) the model was placed in a sufficiently large P1 unit cell, (ii) alternative conformations were replaced with a single conformation and (iii) model geometry was regularized using the phenix.geometry_minimization tool until convergence. In the following, we refer to this model as a reference model. Several Fourier maps at different resolutions d_high (1, 2, 3, 4, 5 and 6 Å) were calculated from the reference model considering three different overall B factors of 0, 100 and 200 Å²; these maps mimic ρ_tar (18 maps in total). The maps were calculated on a grid with the step equal to d_high/4. Additionally, we calculated the same maps on a much finer grid with a step of 0.2 Å; the same step was used for all maps independent of their resolution.
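The 18 simulated-map conditions can be enumerated with a short Python sketch. The attenuation function below uses the standard overall B-factor scale factor exp[−B/(4d²)] applied to a Fourier coefficient at spacing d; the names `b_factor_attenuation` and `simulated_map_conditions` are introduced here for illustration, and the map calculation itself is not shown.

```python
import math
from itertools import product

D_HIGH_VALUES = (1.0, 2.0, 3.0, 4.0, 5.0, 6.0)  # resolution limits, in Angstrom
B_OVERALL_VALUES = (0.0, 100.0, 200.0)          # overall B factors, in Angstrom^2

def b_factor_attenuation(b_overall, d_spacing):
    """Scale factor exp(-B / (4 d^2)) for a Fourier coefficient at spacing d."""
    return math.exp(-b_overall / (4.0 * d_spacing ** 2))

def simulated_map_conditions():
    """One entry per (d_high, B) pair, with the d_high/4 grid step used in the text."""
    return [
        {
            "d_high": d,
            "b_overall": b,
            "grid_step": d / 4.0,
            "attenuation_at_d_high": b_factor_attenuation(b, d),
        }
        for d, b in product(D_HIGH_VALUES, B_OVERALL_VALUES)
    ]
```

The attenuation at the resolution limit gives a rough sense of how strongly a given B factor blurs a map: the same B value suppresses the highest-resolution terms far more at 1 Å than at 6 Å.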

3.1.2. Refinement of the exact reference model

Firstly, we refined the reference model against finite-resolution maps calculated from this model, as described in §3.1.1. While the reference model corresponds to the minimum of (5), this is not the case for (7) because map peaks in finite resolution Fourier images do not necessarily correspond to atomic centers. Therefore, it is expected that refinement using (7) may shift the model from its original, correct, position. The goal of this test is to provide an estimate of the magnitude of these shifts after refinement. For each refined model we calculated the root-mean-square deviation (r.m.s.d.) from the reference model. Fig. 3 summarizes the result of this test. We observe the following.

(i) Refinement using a finer grid does not have any significant effect compared with using a d_high/4 grid step (compare the orange dots and black circles in Fig. 3).
(ii) The r.m.s.d. increases as the resolution worsens and ranges from as low as 0.01 Å at 1 Å resolution to as high as 0.48 Å at 6 Å resolution. These r.m.s.d.s are small compared with the details that can be resolved in maps at these resolutions. This justifies the use of a target (7) that is less accurate but much faster to calculate than (5).
(iii) Map sharpness has a mixed effect. At high resolution (1–2 Å) maps corresponding to the lowest B of 0 Å² produce more accurate results. At intermediate resolutions (3–5 Å) maps corresponding to both the lowest and the largest B perform worse compared with those corresponding to an intermediate value (B = 100 Å²). Maps with the largest B of 200 Å² result in overall less accurate models. These observations suggest that depending on resolution some attenuation of map sharpness may be useful.
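The r.m.s.d. used in these comparisons can be sketched as follows, assuming that both models list the same atoms in the same order and share the same coordinate frame, so that no superposition is applied. The function name `rmsd` is chosen here for illustration.

```python
import numpy as np

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two (N, 3) coordinate sets, in input units."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("coordinate arrays must have the same shape")
    # Mean over atoms of the squared displacement length, then square root.
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))
```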

Figure 3
Refinement of the exact model against 18 maps computed as described in §3.1.1. Each circle shows the root-mean-square deviation between the refined model and the reference model. Blue, green and orange full circles correspond to maps with overall B factors of 0, 100 and 200 Å², respectively. Open circles correspond to the map with an overall B factor of 100 Å² computed on the finer grid with a step of 0.2 Å. See §3.1.2 for details.

3.1.3. Refinement of perturbed reference models

Here, we describe tests that are similar to those in §3.1.2 except that instead of refining the reference model we refined perturbed reference models. These perturbed models were obtained by running molecular-dynamics (MD) simulations using the phenix.dynamics tool until a prescribed r.m.s.d. compared with the reference model was achieved. Given the stochastic nature of MD, it is possible to obtain many different models with the same r.m.s.d. from the reference model. Owing to the limited convergence radius of refinement and the finite resolution of the data, refinement of these models will not produce exactly the same refined models. Therefore, to ensure more robust statistics, for each chosen r.m.s.d. we generated an ensemble of 100 models. The r.m.s.d. values between the perturbed and reference models were chosen to be 0.5, 1.0, 1.5 and 2.0 Å. We then refined each of these 100 × 4 = 400 models against each of 18 maps (§3.1.1) calculated on a grid with a spacing of d_high/4. For each refined model (from 100 × 4 × 6 × 3 = 7200 refined models) we calculated the r.m.s.d. from the reference model and then the average r.m.s.d. over the corresponding ensemble of 100 models. Fig. 4 summarizes the results of this test. We observe the following.

(i) In most cases refinement was able to significantly reduce the difference between the reference and starting perturbed models. The refinement of models with a starting r.m.s.d. of 0.5 Å gives results similar to those from the refinement of a nonperturbed reference model (similar r.m.s.d.).
(ii) In almost all cases using a blurred map results in less accurate refined models.
(iii) In the case of large errors (1.5–2 Å) refinement against a 1 Å resolution map corresponding to an overall B of 0 Å² performs worse than refinement against blurrier maps. This can be rationalized as follows: the peaks in a very sharp map are narrow, and sufficiently large displacements of atoms away from these peaks result in shifts that are outside the convergence radius of minimization.
(iv) At resolutions of 3–5 Å maps that are neither very sharp nor very blurred produce the best results, although the effect is rather small. This suggests that there exists an optimal sharpening B value that is most suitable for refinement at a given resolution.
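The ensemble statistics of this test can be sketched in Python. Note that this sketch swaps in a different perturbation method: random Gaussian shifts rescaled to a prescribed r.m.s.d. replace the molecular-dynamics runs of phenix.dynamics used in the paper, purely to keep the example self-contained. The function names are hypothetical.

```python
import numpy as np

def perturb_to_rmsd(coords, target_rmsd, rng):
    """Shift atoms randomly so that the r.m.s.d. from the input equals target_rmsd.

    Stand-in for the MD-based perturbation described in the text (an assumption
    made for illustration; the resulting models are not geometrically regular).
    """
    coords = np.asarray(coords, dtype=float)
    shifts = rng.normal(size=coords.shape)
    current = np.sqrt(np.mean(np.sum(shifts ** 2, axis=1)))
    return coords + shifts * (target_rmsd / current)

def ensemble_average_rmsd(reference, models):
    """Average, over an ensemble of models, of the r.m.s.d. from the reference."""
    ref = np.asarray(reference, dtype=float)
    per_model = [
        np.sqrt(np.mean(np.sum((np.asarray(m, dtype=float) - ref) ** 2, axis=1)))
        for m in models
    ]
    return float(np.mean(per_model))
```

In the protocol described above, `ensemble_average_rmsd` would be applied to the 100 refined models of each (starting r.m.s.d., resolution, B factor) combination, giving 4 × 6 × 3 = 72 averaged values from the 7200 refinements.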

Figure 4
Refinement of perturbed models against maps computed as described in §3.1.1. The horizontal axis shows the r.m.s.d. between the reference model and perturbed models: 0.5, 1.0, 1.5 and 2.0 Å. The vertical axis shows the r.m.s.d. between the reference model and the refined models. Blue, green and orange full circles correspond to maps with overall B factors of 0, 100 and 200 Å², respectively. See §3.1.3 for details.

3.2. Refinement using data from the PDB and EMDB

3.2.1. Cryo-EM maps

Three-dimensional reconstructions (cryo-EM maps) represent the electric potential of the sample. Therefore, these maps are expected to have negative features around negatively charged moieties such as aspartate and glutamate (see, for example, Hryc et al., 2017). Furthermore, such moieties may be susceptible to radiation damage and therefore may have a weaker footprint in the reconstructions. This may have an implication for real-space refinement that uses target (7) [or (5) if the form factors do not reproduce the negative features] because this target favors atomic shifts towards positive map peaks. To investigate this effect, we surveyed map values at atomic positions considering reconstructions at 3 Å or better and map–model correlation better than 0.8. This selected nine (map, model) pairs. Prior to calculations, we normalized all selected maps to have zero mean value and a standard deviation of 1. Fig. 5(a) shows the distribution of map values for four groups of atoms: main-chain atoms, side-chain O atoms of Asp and Glu residues that may be negatively charged (OD1, OD2, OE1 and OE2), side-chain atoms of Arg and Lys residues that may be positively charged (NH1, NH2 and NZ) and all other side-chain atoms. We observe that side-chain O atoms of Asp and Glu residues indeed have systematically weaker map values, with about 8% of atoms having values below a threshold of −1 times the r.m.s. of the map. Negative map values for all other kinds of atoms are greater than −0.5 r.m.s. and may be considered as noise. We note that the size and flexibility of Asp, Glu, Arg and Lys side chains are likely to contribute to systematically weaker densities for these side chains. We repeated the same analysis for maps of lower resolution (3–4 Å; Fig. 5b). Here, the number of reliably observed atoms with negative features in the map is less than 1%.
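The map normalization and threshold count used in this survey can be sketched as follows. The sketch assumes that the map is a NumPy array and that map values at the atomic positions have already been obtained by interpolation; the function names are introduced here for illustration only.

```python
import numpy as np

def normalize_map(rho):
    """Rescale a map to zero mean and unit standard deviation."""
    rho = np.asarray(rho, dtype=float)
    sd = rho.std()
    if sd == 0.0:
        raise ValueError("cannot normalize a constant map")
    return (rho - rho.mean()) / sd

def fraction_below(values_at_atoms, threshold=-1.0):
    """Fraction of atoms whose normalized map value lies below the threshold."""
    values = np.asarray(values_at_atoms, dtype=float)
    return float(np.mean(values < threshold))
```

After normalization, map values are expressed in units of the map r.m.s. about its mean, which is what makes a threshold such as −1 comparable between reconstructions.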

Figure 5
Distribution of cryo-EM map values (scaled in r.m.s.) for selected groups of atoms, considering maps at 3 Å or better (a) and 3–4 Å (b) resolution. See §3.2.1 for details.

This analysis shows that for the majority of cryo-EM models (resolution of 3 Å or worse) the concern about negative features in the map is rather small and is unlikely to affect the results of refinement using (7) significantly. On the other hand, the rapidly increasing number of higher resolution cryo-EM maps (better than 3 Å) is likely to highlight the limitation of (7) and to demand further improvements of the refinement target [such as using (8) with properly chosen weights].
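The normalization and thresholding steps of this survey can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions, not the code used for the analysis: the function names and the toy numbers are hypothetical, and in a real calculation the map values at atomic positions would be obtained by interpolating the reconstruction.

```python
from statistics import fmean, pstdev

def normalize_map(values):
    """Rescale map values to zero mean and unit standard deviation."""
    mean = fmean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("a flat map cannot be normalized")
    return [(v - mean) / sd for v in values]

def fraction_below(values_at_atoms, threshold=-1.0):
    """Fraction of atoms whose normalized map value lies below the threshold."""
    if not values_at_atoms:
        return 0.0
    n_below = sum(1 for v in values_at_atoms if v < threshold)
    return n_below / len(values_at_atoms)

# Toy data (hypothetical): a flattened map with five grid values.
toy_map = [0.0, 1.0, 2.0, 3.0, 4.0]
normalized = normalize_map(toy_map)

# Pretend that the first two values were sampled at carboxylate O atoms.
carboxylate_values = normalized[:2]
fraction = fraction_below(carboxylate_values, threshold=-1.0)
```

Applying `fraction_below` separately to each of the four atom groups would give the kind of per-group statistics summarized in Fig. 5.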

3.2.2. Default refinement

In order to test the suggested methods and demonstrate their utility, we re-refined 385 cryo-EM models from the PDB that are reported at a resolution of 6 Å or better, that have model–map correlation greater than 0.3 and that contain only residues and ligands that are known to the PHENIX restraint library. A number of metrics were analyzed: the model-to-map correlation coefficient CC_mask calculated in the map region around the model (for an exact definition, see Afonine et al., 2018), the number of Ramachandran plot and rotamer outliers, excessive C^β deviations, the MolProbity clashscore (Chen et al., 2010) and the EMRinger score (Barad et al., 2015; calculated for 277 entries with maps at 4.5 Å resolution or better), all calculated for the initial models from the PDB and for the models after refinement. Default parameters were used in all refinements that, in addition to standard restraints, also include rotamer, C^β deviations and Ramachandran plot restraints, as well as NCS constraints where applicable (see §2.2.2). The program ran successfully, generating a refined model for all cases and highlighting the robustness of the algorithms and their implementation. In all cases we observe a substantial overall improvement of geometry metrics, such as reduced or fully eliminated Ramachandran plot and rotamer outliers, C^β deviations and MolProbity clashscore, as well as improvement of the model-to-data (map) fit (Fig. 6). Clearly, the removal of some outliers can be attributed to the use of rotamer, C^β deviations and Ramachandran plot restraints. Therefore, we also used an orthogonal validation metric to assess model improvement: EMRinger (Barad et al., 2015). We observe that the overall average EMRinger score for the initial models is 1.73 and that for the refined models is 2.26. The improvement of the EMRinger score for the refined models indicates that the amino-acid side chains are more chemically realistic and better fit the map. 
Detailed validation or analysis of individual refinement results is outside the scope of this work, but will be important in the future to assess the impact of stereochemical restraints on models, particularly when the starting models are of very poor quality.

Figure 6
Model statistics before (brown) and after (blue) refinement using phenix.real_space_refine, showing Ramachandran plot and residue side-chain rotamer outliers, C^β deviations, MolProbity clashscore and model–map correlation coefficient (CC_mask). The scatter plot shows the EMRinger score for the original and refined models (resolution better than 4.5 Å).

3.2.3. Refinement against sharpened maps

Our tests using simulated data (§3.1) have indicated that map sharpening or blurring may be useful in refinement. To investigate this with real experimental data, we performed the following test. We selected models as described in §3.2.2, additionally requiring that independent half-maps had also been deposited. This resulted in 76 entries. We performed test refinements against the first of the two half-maps and evaluated the fit of the refined model to the data using the original second half-map, which had not been used in any calculations. In two independent refinements, the first half-map was taken either as deposited or after modification with phenix.auto_sharpen (Terwilliger, Sobolev et al., 2018), which automatically sharpens or blurs the map to an optimal level. Fig. 7 shows the model–map correlation CC_mask for models refined against the original and sharpened first half-maps; the original second half-maps were used to compute the correlations. Overall, the CCs across all 76 cases are similar for refinement against the original first half-map and the sharpened first half-map. The refined models fit slightly but systematically better when using sharpened maps if the original model–map CC is low (<0.5) and slightly but systematically worse if the original model–map CC is higher (>0.5). This agrees with the observation that target (7) allows the removal of large errors but may slightly distort exact models (§3.1.2). Also, we note that the MolProbity scores for models refined against sharpened maps are systematically better, but the difference is small.
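The cross-validation idea used here, refining against one half-map and scoring against the other, relies on a correlation computed only inside a region around the model. The sketch below uses a plain Pearson correlation over masked grid points as a stand-in; the exact definition of CC_mask is given by Afonine et al. (2018) and may differ in detail. The function name and the toy numbers are hypothetical.

```python
from math import sqrt

def masked_cc(map_a, map_b, mask):
    """Pearson correlation of two flattened maps over points where mask is True."""
    a = [x for x, m in zip(map_a, mask) if m]
    b = [y for y, m in zip(map_b, mask) if m]
    n = len(a)
    if n < 2:
        raise ValueError("mask selects too few grid points")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("map is constant inside the mask")
    return cov / sqrt(var_a * var_b)

# Toy flattened maps (hypothetical values). The last grid point lies
# outside the mask and therefore does not influence the result.
model_map = [0.0, 1.0, 2.0, 1.0, 0.0, 5.0]
second_half_map = [0.1, 2.1, 4.1, 2.1, 0.1, -3.0]
mask = [True, True, True, True, True, False]

cc_free = masked_cc(model_map, second_half_map, mask)
```

Because the second half-map never enters the refinement, a correlation computed this way is not inflated by fitting to noise in the first half-map.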

Figure 7
Left, correlation coefficient CC_mask calculated using the original second half-maps and maps calculated from models refined against the first half-maps: original (x axis) versus sharpened (y axis). Right, MolProbity scores for models using original first half-maps versus sharpened first half-maps.

3.2.4. Re-refinement of the TRPV1 structure

The structure of the TRPV1 ion channel (PDB entry 3j5p; EMDB code EMD-5778) was determined by single-particle cryo-EM (Liao et al., 2013) at a resolution of 3.28 Å. The model was built manually and was not subjected to refinement. As a result, it contains substantial geometry violations: the clashscore is high (∼100) and about one third of the side chains are identified as rotamer outliers (Table 1). More recently, the better resolved part of this structure has been re-evaluated using the same data (Barad et al., 2015; PDB entry 3j9j; ankyrin domain not included). This involved some rebuilding and refinement using algorithms implemented in the Rosetta suite (DiMaio et al., 2015). The resulting model has a much improved clashscore and EMRinger score (Barad et al., 2015) and no rotamer outliers, yet the number of Ramachandran plot outliers has increased compared with the original model (Table 1). We refined PDB entry 3j5p (the portion that matches PDB entry 3j9j) against the original deposited map using phenix.real_space_refine with all default settings and no manual intervention. The refinement took about 3 min on a Macintosh laptop.³ Overall, the refined model is similar to PDB entry 3j9j (virtually no rotamer or Ramachandran plot outliers), the EMRinger score is improved further and the model-to-map correlation (CC_mask) is increased compared with both PDB entries 3j5p and 3j9j. Notably, the MolProbity clashscore decreased from 100.8 to 5.6 as a result of the resolution of numerous steric clashes (Fig. 8).

Table 1
Summary of statistics for the original model (PDB entry 3j5p), the model re-refined by Barad et al. (2015) (PDB entry 3j9j) and the model re-refined using phenix.real_space_refine

Metric	3j5p †	3j9j	3j5p † (phenix.real_space_refine)
CC_mask	0.65	0.59	0.82
EMRinger score	1.2	2.6	3.3
R.m.s.d.
Bonds (Å)	0.01	0.02	0.01
Angles (°)	1.50	1.10	1.44
Ramachandran plot (%)
Favored	95.8	94.5	93.3
Allowed	4.2	3.3	6.7
Outliers	0	2.2	0
Rotamer outliers (%)	32.3	0	<1
Clashscore	100.8	2.7	5.6
C^β deviations	0	0	0

†No ankyrin domain.

Figure 8
Backbone of the 3j5p model before (a) and after (b) refinement shown in black. The model before refinement contains a substantial number of steric clashes (indicated by red dots) and many side-chain rotamer outliers (blue side chains). Most clashes and rotamer outliers are resolved by phenix.real_space_refine. The images were created using the KiNG program (Chen et al., 2009) from within PHENIX.

Modeling experimental data at resolutions lower than atomic (atomic resolution being around 1–1.5 Å or better) may not be unambiguous (Terwilliger et al., 2007). Therefore, it may be instructive to perform several trial refinements, each using exactly the same settings but different (perturbed) input models. Here, we generated an ensemble of 100 perturbed models by running molecular-dynamics simulations (using the phenix.dynamics tool) until the r.m.s. deviation between the starting and simulated models reached 3 Å (Fig. 9a). We then refined all models using phenix.real_space_refine until convergence. This resulted in 100 refined models that are similar overall but vary locally (Fig. 9b). This highlights the fact that a single-model representation of experimental data is an approximation and should not be taken too literally (for example, when it comes to measuring and reporting distances between atoms). This test also demonstrates the rather large convergence radius of phenix.real_space_refine: the average map–model correlation (CC_mask) across all 100 refined models is 0.80, with the smallest and largest values being 0.79 and 0.81.

Figure 9
(a) Ensemble of perturbed 3j5p models; the r.m.s. deviation of each model from the initial model is 3 Å, showing chain A only. (b) Ensemble of refined models in the experimental map. The largest variation is observed in regions that lack density. The images were created using the ChimeraX program (Goddard et al., 2018).

4. Conclusions

Refinement of an atomic model against a map is increasingly important as the technique of cryo-EM rapidly develops. We have described the algorithms implemented in a new PHENIX tool, phenix.real_space_refine, that was specifically designed to perform such real-space refinements. RSR is a natural choice for cryo-EM, unlike crystallography, where real-space methods are complementary to Fourier-space refinement and are somewhat limited since crystallographic maps are almost always model-biased. Nevertheless, while this work was inspired by rapid advances in the field of cryo-EM and the increasing number of three-dimensional reconstructions that allow atomic models to be refined (as opposed to rigid-body docked), the implementation is not limited to cryo-EM and crystallographic maps can also be used.

The proposed real-space refinement procedure is fast owing to the use of an atom-centered refinement target function that has been shown to be efficient at all tested resolutions from 1 to 6 Å. Several options for key calculation steps, such as map interpolation, gradient calculation and preliminary processing of the target (experimental) map, are available, with the default choices selected on the basis of extensive test calculations. The real-space refinement algorithm includes a fast and efficient search for the optimal relative weight of restraints, a procedure that is extremely challenging for reciprocal-space refinement. The refinement algorithm is robust, with no failures for any of the cryo-EM maps tested. For all test model refinements improvements are observed; in some cases these improvements are significant. Future developments of the algorithms will include methods to account for local variation in map resolution, a fast and accurate calculation of (5) for the final refinement cycles and efficient modeling of atomic displacements.

APPENDIX A

Real-space targets and convolution

We show here that if the atoms all have the same shape, sampling a map at the positions of atomic centers, as in (7), can be made equivalent to the correlation function obtained by integrating or summing over the product of calculated and target densities, as in (3) or (5). Consider a simplified structure composed of a single atom. Looking for its best position according to (3) or (5) corresponds to seeking the position where the weighted average of the target map values (weighted by the atomic shape) inside a sphere centered at the trial atomic position is maximal. This calculation and check for the maximal value could be performed point by point. Alternatively, one can first calculate such averages for all grid points, replace the initial map values by these sums and then simply choose the maximum. From a mathematical point of view this averaging can be considered as a convolution and, if calculated simultaneously for the whole map, can be performed rapidly (Leslie, 1987; Urzhumtsev et al., 1989). Checking the values of the averaged, i.e. blurred, map for their maximum corresponds to using targets (7) or (8). Below, we give a formal interpretation of these real-space targets.

Let Z₀f₀(|s|; B₀) be a scattering factor of some isotropic atom characterized by a B₀ value and the electron number Z₀. Let Z₀ρ₀(r; B₀) be an image of this atom in the corresponding model map if it is placed at the origin. Both Z₀f₀(|s|; B₀) and Z₀ρ₀(r; B₀) are spherically symmetric and related by Fourier transformation. If a hypothetical structure is composed of a single atom positioned at r₀, the corresponding model map is

$[\rho_{{\rm calc},0}({\bf r}) = Z_0\rho_0({\bf r}-{\bf r}_{0}\semi B_{0}), \eqno (9)]$

which can be seen as a convolution of a point scatterer at position r₀ with the atomic shape. Owing to the spherical symmetry of ρ₀(r; B₀), the target function (3)

$[\eqalignno {T_{\rm data} &= -\textstyle\int\limits_{V} \rho_{\rm tar}({\bf r})\rho_{{\rm calc},0}({\bf r})\,{\rm d}{\bf r} = -Z_{\rm 0}\textstyle \int\limits_{V}\rho_{\rm tar}({\bf r})\rho_{0}({\bf r}-{\bf r}_0\semi B_{0})\,{\rm d}{\bf r} \cr & = -Z_{0}\textstyle\int\limits_{V}\rho_{\rm tar}({\bf r})\rho_{0}({\bf r}_{0}-{\bf r}\semi B_{\rm 0})\,{\rm d}{\bf r} & (10)}]$

can be interpreted as a convolution of the target map with ρ₀(r; B₀) taken at point r₀. Let {F_tar(s)} be the set of Fourier coefficients corresponding to the target map ρ_tar(r). By the convolution theorem, (10) is equal to the Fourier series of the corresponding Fourier coefficients,

$[\eqalignno {-Z_{0}\textstyle \sum \limits_{\bf s}&{\bf F}_{\rm tar}({\bf s})f_{0}(|{\bf s}|\semi B_{0})\exp(-2\pi i{\bf r}_0{\bf s}) \cr & = -Z_{0}\textstyle \sum \limits_{\bf s}[{\bf F}_{\rm tar}({\bf s})\cdot f_{0}(|{\bf s}|\semi B_{0})]\exp(-2\pi i{\bf r}_0{\bf s}) \cr &= -Z_{0}\rho_{{\rm tar}\_0}({\bf r}_{0}\semi B_{0}). &(11)}]$

Here, the map ρ_{tar_0}(r; B₀) is a Fourier series calculated with the coefficients F_tar(s)f₀(|s|; B₀). In other words, instead of blurring the model map with the atomic shape and calculating the point-by-point product of the two maps, one may blur the experimental map and leave the model map unblurred, i.e. as a point map.
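The equivalence expressed by (10) and (11) can be checked numerically on a toy example. The sketch below is an illustration under simplifying assumptions that are not part of the original work: a one-dimensional periodic grid, a Gaussian atomic shape, Z₀ = 1, and a direct circular convolution written out explicitly instead of the Fourier-space product. All function names are hypothetical.

```python
from math import exp

N = 32  # number of grid points in a periodic one-dimensional "unit cell"

def atom_shape(n, width=2.0):
    """Symmetric Gaussian atom image centered at grid point 0 (periodic)."""
    shape = []
    for i in range(n):
        d = min(i, n - i)  # periodic distance from the origin
        shape.append(exp(-(d / width) ** 2))
    return shape

def overlap_target(rho_tar, rho_0, j0):
    """Direct overlap: -sum_r rho_tar(r) * rho_0(r - r0), cf. (10) with Z0 = 1."""
    n = len(rho_tar)
    return -sum(rho_tar[i] * rho_0[(i - j0) % n] for i in range(n))

def blur_map(rho_tar, rho_0):
    """Circular convolution of the target map with the atomic shape."""
    n = len(rho_tar)
    return [sum(rho_tar[i] * rho_0[(j - i) % n] for i in range(n))
            for j in range(n)]

rho_0 = atom_shape(N)
# Arbitrary deterministic toy target map.
rho_tar = [((7 * i) % 11) / 11.0 - 0.3 for i in range(N)]
blurred = blur_map(rho_tar, rho_0)

# The overlap target at every trial position equals minus the blurred map there.
max_diff = max(abs(overlap_target(rho_tar, rho_0, j) + blurred[j])
               for j in range(N))

# Hence minimizing the overlap target and picking the maximum of the
# blurred map select the same position.
best_direct = min(range(N), key=lambda j: overlap_target(rho_tar, rho_0, j))
best_blurred = max(range(N), key=lambda j: blurred[j])
```

The blurred map is computed once, after which evaluating the target for any trial position is a single lookup, which is the source of the speed advantage discussed in the text.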

For a multi-atom model

$[\eqalignno {T_{\rm data} & = -\textstyle \int \limits_{V}\rho_{\rm tar}({\bf r})\rho_{\rm calc}({\bf r})\,{\rm d}{\bf r} = {-\textstyle \int \limits_{V}}\rho_{\rm tar}({\bf r}) \left [\textstyle \sum \limits_{m=1}^{M}\rho_{{\rm calc},m}({\bf r})\right]\, {\rm d}{\bf r} \cr & = -\textstyle \sum \limits_{m=1}^{M}\textstyle \int \limits_{V}\rho_{\rm tar}({\bf r})\rho_{{\rm calc},m}({\bf r})\,{\rm d}{\bf r}. & (12)}]$

At resolutions typical for bio-crystallography the shapes of macromolecular atoms are similar. If we additionally suppose that all of the atoms of the structure have the same (or similar) atomic displacement parameters B_m = B₀, then

$[T_{\rm data} \simeq - \textstyle\sum\limits_{m=1}^{M}Z_m\rho_{{\rm tar}\_0}({\bf r}_m\semi B_0)\eqno (13)]$

using the function ρ_{tar_0}(r; B₀) calculated once in advance. This shows that in calculating (8) we in fact implicitly sharpen the target map using ρ_tar(r) instead of ρ_{tar_0}(r; B₀). Even when using (8) as the target, it is likely to be beneficial to choose an optimal sharpening factor, just as the signal in map correlations can be improved.

If the difference in atomic B values cannot be neglected, one can calculate in advance a few maps ρ_{tar_0}(r; B_k) for a range of B-factor values B_k, k = 1, …, K, and use the appropriate ρ_{tar_0}(r_m; B_k) for a particular atom m,

$[R_{Z{\hbox {-}}{\rm atoms}} = -\textstyle\sum\limits_{m=1}^{M}Z_m\rho_{{\rm tar}\_0}[{\bf r}_m\semi B_k(m)]. \eqno (14)]$

If the atomic shapes are significantly different, as is the case for H atoms in neutron maps or negatively charged side chains in cryo-EM maps at high resolution, the approximation (13) can be used with Z_m being a negative value, or the target map can be convoluted with the respective atomic shape (which can be negative) before the sum over the relevant atoms is calculated.
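A target of the form (14) can be sketched as follows. This is a minimal illustration rather than the implemented algorithm: the rule of assigning each atom to the nearest tabulated B_k is an assumption (the text only asks for an appropriate map), the blurred maps are represented by hypothetical callables, and the toy data are one-dimensional. The last atom carries a negative weight to illustrate the case mentioned above.

```python
def nearest_bin(b_value, b_bins):
    """Index k of the tabulated B_k closest to b_value (assumed selection rule)."""
    return min(range(len(b_bins)), key=lambda k: abs(b_bins[k] - b_value))

def target_z_atoms(atoms, b_bins, blurred_maps):
    """Evaluate -sum_m Z_m * rho_tar_0(r_m; B_k(m)), cf. (14).

    atoms        : iterable of (Z, position, B) tuples
    b_bins       : B_k values for which blurred maps were precomputed
    blurred_maps : callables; blurred_maps[k](position) returns the value
                   of the map blurred with B_k at that position
    """
    total = 0.0
    for z, position, b in atoms:
        k = nearest_bin(b, b_bins)
        total -= z * blurred_maps[k](position)
    return total

# Toy example (hypothetical one-dimensional "maps" given as simple functions).
b_bins = [20.0, 60.0, 100.0]
blurred_maps = [
    lambda r: 3.0 - r,        # map blurred with B = 20
    lambda r: 2.0 - 0.5 * r,  # map blurred with B = 60
    lambda r: 1.0,            # map blurred with B = 100
]
atoms = [(6, 1.0, 25.0), (8, 2.0, 70.0), (-1, 0.0, 95.0)]

score = target_z_atoms(atoms, b_bins, blurred_maps)
```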

APPENDIX B

Three-dimensional interpolation used

B1. General remarks

Using the atom-centered targets (7) and (8) requires an efficient and accurate interpolation of the maps calculated on three-dimensional regular grids. Both the interpolated function values and the gradient are needed. In this work, two options have been considered: trilinear (https://en.wikipedia.org/wiki/Trilinear_interpolation) and tricubic (https://en.wikipedia.org/wiki/Tricubic_interpolation). Both interpolation procedures, including the gradient calculation, are available through the cctbx software library (Grosse-Kunstleve et al., 2002). Trilinear interpolation is the simplest and the easiest to understand. Its major disadvantage is that, by construction, the minimum of the interpolated function is always at one of the corners of the box of interpolation. Since the map grid step is usually larger than the accuracy of atomic positions required, this can impact the optimization procedure and results. For this reason, the tricubic interpolation has been chosen as the default method. Other interpolations have also been tried but are not discussed in this work. In the following, we first describe the interpolation procedures inside the unit cube and then adapt the results and the procedures to an arbitrary regular three-dimensional grid.
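The corner property of trilinear interpolation can be verified numerically. The following sketch is a textbook formulation written for illustration (it is not the cctbx routine); the corner values are arbitrary, and the function name is hypothetical.

```python
def trilinear(c, x, y, z):
    """Trilinear interpolation inside the unit cube.

    c[i][j][k] is the grid value at corner (i, j, k) with i, j, k in {0, 1};
    x, y, z are local coordinates in the range [0, 1].
    """
    # Interpolate along x on the four edges parallel to the x axis.
    c00 = c[0][0][0] * (1 - x) + c[1][0][0] * x  # y = 0, z = 0
    c01 = c[0][0][1] * (1 - x) + c[1][0][1] * x  # y = 0, z = 1
    c10 = c[0][1][0] * (1 - x) + c[1][1][0] * x  # y = 1, z = 0
    c11 = c[0][1][1] * (1 - x) + c[1][1][1] * x  # y = 1, z = 1
    # Interpolate along y, then along z.
    c0 = c00 * (1 - y) + c10 * y  # z = 0
    c1 = c01 * (1 - y) + c11 * y  # z = 1
    return c0 * (1 - z) + c1 * z

# Arbitrary corner values for the demonstration.
corners = [[[3.0, 1.0], [4.0, 1.5]],
           [[5.0, 9.0], [2.0, 6.0]]]

# Sample the interpolant on an 11 x 11 x 11 lattice covering the cube.
samples = [trilinear(corners, i / 10, j / 10, k / 10)
           for i in range(11) for j in range(11) for k in range(11)]

corner_values = [corners[i][j][k]
                 for i in (0, 1) for j in (0, 1) for k in (0, 1)]
lowest_sample = min(samples)
highest_sample = max(samples)
```

In this example the sampled extrema coincide with the smallest and largest corner values, so an optimizer following the interpolated map would be drawn to grid nodes rather than to positions between them.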

B2. Tricubic interpolation inside a unit cube

Let us consider an interpolation inside a unit cube, 0 ≤ x < 1, 0 ≤ y < 1, 0 ≤ z < 1. We search for a function in the form

$[\tilde{f}(x,y,z) = \textstyle\sum \limits_{k,l,m = 0}^{3}a_{klm}x^{k}y^{l}z^{m}. \eqno (15)]$

This function is cubic with respect to any of its three variables, giving expressions for the partial derivatives

$[\eqalignno {{{\partial \tilde{f}(x,y,z)}\over{\partial x}} & = \textstyle\sum\limits_{l,m = 0,k = 1}^{3}ka_{klm}x^{k-1}y^{l}z^{m}, \cr {{\partial \tilde{f}(x,y,z)}\over{\partial y}} & = \textstyle \sum \limits_{k,m = 0,l = 1}^{3}la_{klm}x^{k}y^{l-1}z^{m}, \cr {{\partial \tilde{f}(x,y,z)}\over{\partial z}} &= \textstyle\sum \limits_{k,l = 0, m = 1}^{3}ma_{klm}x^{k}y^{l}z^{m-1}. & (16)}]$

One can calculate all 64 coefficients in advance and use them for further calculations (Lekien & Marsden, 2005). Alternatively, one can build an interpolation for the coordinate x, then for the coordinate y and finally for the coordinate z (in any order of variables). To build interpolation (15), eight values from the cube corners are insufficient and either values from the neighboring grid points (the corners of the neighboring cubes) or derivatives in the corners of the unit cube are required. In the following, f_pqr with integers p, q, r stands for the grid function value f(p, q, r).

Firstly, we define a cubic interpolation

$[\tilde{f}(x) = a_{0}+a_{1}x+a_{2}x^{2}+a_{3}x^{3} \eqno (17)]$

of a function f(x) of one variable in the interval (0, 1) whose values are known at the integer grid nodes, f₋₁ = f(−1), f₀ = f(0), f₁ = f(1), f₂ = f(2). We denote this interpolation by int3(x; f₋₁, f₀, f₁, f₂) and its derivative by gint3(x; f₋₁, f₀, f₁, f₂), as they are called in cctbx:

$[{{{\rm d}\tilde{f}(x)}\over{{\rm d}x}} = a_{1}+2a_{2}x+3a_{3}x^{2}. \eqno (18)]$

The coefficients of this approximation are derived below. The procedure of the tricubic interpolation then becomes a suite of operations:

$[\eqalignno {\tilde{f}_{xpq} &= {\rm int3}[x\semi f_{(-1)pq}, f_{0pq}, f_{1pq}, f_{2pq}], \cr \tilde{f}_{qyp} &= {\rm int3}[y\semi f_{q(-1)p}, f_{q0p}, f_{q1p}, f_{q2p}], \cr \tilde{f}_{pqz} &= {\rm int3}[z\semi f_{pq(-1)}, f_{pq0}, f_{pq1}, f_{pq2}], &(19)}]$

where p and q are integers −1, 0, 1 or 2, then

$[\eqalignno {\tilde{f}_{xyq} &= {\rm int3}[y \semi \tilde{f}_{x(-1)q}, \tilde{f}_{x0q},\tilde{f}_{x1q}, \tilde{f}_{x2q}], \cr \tilde{f}_{qyz} &= {\rm int3}[z\semi\tilde{f}_{qy(-1)}, \tilde{f}_{qy0}, \tilde{f}_{qy1}, \tilde{f}_{qy2}], \cr \tilde{f}_{xqz} &= {\rm int3}[x\semi\tilde{f}_{(-1)qz}, \tilde{f}_{0qz}, \tilde{f}_{1qz}, \tilde{f}_{2qz}] & (20)}]$

and finally

$[\eqalignno {\tilde{f}_{xyz} &= {\rm int3}[z\semi \tilde{f}_{xy(-1)}, \tilde{f}_{xy0}, \tilde{f}_{xy1}, \tilde{f}_{xy2}], \cr \tilde{f}_{xyz} &= {\rm int3}[x\semi \tilde{f}_{(-1)yz}, \tilde{f}_{0yz}, \tilde{f}_{1yz}, \tilde{f}_{2yz}], \cr \tilde{f}_{xyz} &= {\rm int3}[y \semi\tilde{f}_{x(-1)z}, \tilde{f}_{x0z}, \tilde{f}_{x1z}, \tilde{f}_{x2z}]. & (21)}]$

The last three expressions are redundant and only one of them needs to be calculated. However, the expressions previous to them are necessary to calculate partial derivatives as

$[\eqalignno {{{\partial \tilde{f}(x,y,z)}\over{\partial x}} & = {\rm gint3}[x\semi\tilde{f}_{(-1)yz}, \tilde{f}_{0yz}, \tilde{f}_{1yz}, \tilde{f}_{2yz}], \cr {{\partial \tilde{f}(x,y,z)}\over{\partial y}} & = {\rm gint3}[y\semi\tilde{f}_{x(-1)z}, \tilde{f}_{x0z}, \tilde{f}_{x1z}, \tilde{f}_{x2z}], \cr {{\partial \tilde{f}(x,y,z)}\over{\partial z}} &= {\rm gint3}[z\semi\tilde{f}_{xy(-1)}, \tilde{f}_{xy0},\tilde{f}_{xy1}, \tilde{f}_{xy2}]. &(22)}]$

The coefficients of the one-dimensional cubic interpolation (17) can be chosen using various considerations. The possibility taken as the default choice in the current software version is to build a cubic function $[\tilde{f}(x)]$ such that it and its first derivative coincide with f(x) and with f′(x), respectively, at points 0 and 1. Since the f′(0) and f′(1) values are unknown, they are estimated as

$[f'(0)\simeq {{1}\over{2}}(f_{1}-f_{-1}), \quad f'(1)\simeq {{1}\over{2}}(f_{2}-f_{0}). \eqno (23)]$

This gives the coefficients of (17) in the form

$[\eqalignno {a_{0} & = f_{0}, \cr a_{1} &= {{1}\over{2}}(f_{1}-f_{-1}), \cr a_{2} & = {{1}\over{2}}(-f_{2}+4f_{1}-5f_{0}+2f_{-1}), \cr a_{3} & = {{1}\over{2}}(f_{2}-3f_{1}+3f_{0}-f_{-1}). & (24)}]$
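The one-dimensional interpolation (17), its derivative (18) and the coefficients (24) translate directly into code. The sketch below reuses the names int3 and gint3 from the text for readability, but it is an independent illustration; the actual cctbx functions and their signatures may differ.

```python
def cubic_coefficients(fm1, f0, f1, f2):
    """Coefficients a0..a3 of (17) from the grid values f(-1), f(0), f(1), f(2)."""
    a0 = f0
    a1 = 0.5 * (f1 - fm1)
    a2 = 0.5 * (-f2 + 4.0 * f1 - 5.0 * f0 + 2.0 * fm1)
    a3 = 0.5 * (f2 - 3.0 * f1 + 3.0 * f0 - fm1)
    return a0, a1, a2, a3

def int3(x, fm1, f0, f1, f2):
    """Cubic interpolation (17) at 0 <= x <= 1, evaluated in Horner form."""
    a0, a1, a2, a3 = cubic_coefficients(fm1, f0, f1, f2)
    return a0 + x * (a1 + x * (a2 + x * a3))

def gint3(x, fm1, f0, f1, f2):
    """Derivative (18) of the cubic interpolation at 0 <= x <= 1."""
    _, a1, a2, a3 = cubic_coefficients(fm1, f0, f1, f2)
    return a1 + x * (2.0 * a2 + x * 3.0 * a3)

# Four consecutive grid values (arbitrary numbers for the demonstration).
fm1, f0, f1, f2 = 1.0, 2.0, 4.0, 3.0

value_at_0 = int3(0.0, fm1, f0, f1, f2)   # reproduces f0
value_at_1 = int3(1.0, fm1, f0, f1, f2)   # reproduces f1
slope_at_0 = gint3(0.0, fm1, f0, f1, f2)  # equals (f1 - fm1) / 2, cf. (23)
slope_at_1 = gint3(1.0, fm1, f0, f1, f2)  # equals (f2 - f0) / 2, cf. (23)
```

With these coefficients the interpolant passes through f₀ and f₁ and matches the finite-difference slopes (23) at both ends, so neighboring intervals join with a continuous first derivative.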

B3. Tricubic interpolation on a regular grid

Now let a function f(x, y, z) be defined in fractional coordinates on a grid with the step d_x = N_x⁻¹, d_y = N_y⁻¹, d_z = N_z⁻¹. Let us consider a point (x_g, y_g, z_g) and a box of this grid that this point belongs to,

$[\eqalignno {&n_{x}d_{x} \leq x_{g}\,\lt\, (n_{x}+1)d_{x}, \cr &n_{y}d_{y} \leq y_{g}\,\lt\, (n_{y}+1)d_{y}, \cr & n_{z}d_{z} \leq z_{g}\,\lt\, (n_{z}+1)d_{z} & (25)}]$

with n_x, n_y, n_z being integer numbers. We introduce intermediate variables rescaling this `box' to a unit cube as

$[\eqalignno {0 \le x & = x_gd_x^{-1} - n_x \,\lt\, 1, \cr 0 \le y & = y_gd_y^{-1} - n_y \,\lt \,1, \cr 0 \le z & = z_gd_z^{-1} - n_z \,\lt\, 1 &(26)}]$

and apply the procedure (19)–(21) described above. According to (26), the respective derivatives are

$[\eqalignno {{{\partial {\tilde f}(x_g,y_g,z_g)} \over {\partial x_g}} & = d_x^{-1} {{\partial {\tilde f}(x,y,z)} \over {\partial x}}, \cr {{\partial {\tilde f} (x_g,y_g,z_g)} \over {\partial y_g}} & = d_y^{-1}{{\partial {\tilde f}(x,y,z)} \over {\partial y}}, \cr {{\partial {\tilde f}(x_g,y_g,z_g)} \over {\partial z_g}} & = d_z^{-1}{{\partial {\tilde f}(x,y,z)} \over {\partial z}}. & (27)}]$
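Putting the pieces together, the procedure of (19)–(22) with the rescaling (25)–(27) can be sketched in plain Python. This is an illustration, not the cctbx implementation: the nested-list grid layout, the periodic wrapping of grid indices and all function names are assumptions made for the example.

```python
import math

def _coeffs(fm1, f0, f1, f2):
    """Coefficients of the one-dimensional cubic, as in (24)."""
    return (f0,
            0.5 * (f1 - fm1),
            0.5 * (-f2 + 4.0 * f1 - 5.0 * f0 + 2.0 * fm1),
            0.5 * (f2 - 3.0 * f1 + 3.0 * f0 - fm1))

def int3(x, fm1, f0, f1, f2):
    a0, a1, a2, a3 = _coeffs(fm1, f0, f1, f2)
    return a0 + x * (a1 + x * (a2 + x * a3))

def gint3(x, fm1, f0, f1, f2):
    _, a1, a2, a3 = _coeffs(fm1, f0, f1, f2)
    return a1 + x * (2.0 * a2 + x * 3.0 * a3)

def tricubic_unit(block, x, y, z, deriv_axis=None):
    """Interpolate inside the unit cube from a 4 x 4 x 4 block of grid values.

    block[p][q][r] holds the values at offsets -1, 0, 1, 2 along x, y, z.
    If deriv_axis is 0, 1 or 2, the partial derivative along that axis is
    returned instead of the value, cf. (22).
    """
    op_x = gint3 if deriv_axis == 0 else int3
    op_y = gint3 if deriv_axis == 1 else int3
    op_z = gint3 if deriv_axis == 2 else int3
    plane = [[op_z(z, *block[p][q]) for q in range(4)] for p in range(4)]
    line = [op_y(y, *plane[p]) for p in range(4)]
    return op_x(x, *line)

def tricubic_on_grid(grid, xg, yg, zg):
    """Value and gradient at fractional coordinates on a periodic grid."""
    sizes = (len(grid), len(grid[0]), len(grid[0][0]))
    cell, local = [], []
    for coord, n in zip((xg, yg, zg), sizes):
        scaled = coord * n           # coord / d with step d = 1 / n, cf. (26)
        index = math.floor(scaled)
        cell.append(index)
        local.append(scaled - index)
    offsets = (-1, 0, 1, 2)
    block = [[[grid[(cell[0] + p) % sizes[0]]
                   [(cell[1] + q) % sizes[1]]
                   [(cell[2] + r) % sizes[2]]
               for r in offsets] for q in offsets] for p in offsets]
    value = tricubic_unit(block, *local)
    # Chain rule (27): multiply unit-cube derivatives by 1 / d = n.
    gradient = tuple(sizes[axis] * tricubic_unit(block, *local, deriv_axis=axis)
                     for axis in range(3))
    return value, gradient

# Toy grid (hypothetical): values linear in the grid indices.
n = 8
grid = [[[i + 2.0 * j + 3.0 * k for k in range(n)] for j in range(n)]
        for i in range(n)]
value, gradient = tricubic_on_grid(grid, 3.5 / n, 3.25 / n, 3.75 / n)
```

The test point is chosen away from the cell boundary so that the periodic wrapping does not affect the linear toy function; for such data the interpolation is exact and the gradient, expressed per unit of fractional coordinate, equals the slope per grid step multiplied by the number of grid points along each axis.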

Footnotes

¹It is a widely known consequence of Parseval's theorem [see, for example, Diamond (1971) or Arnold & Rossmann (1988)] that this is equivalent to a least-squares target between a full set of the corresponding complex Fourier coefficients; CNS (Brünger et al., 1998) describes this as a `vector LS target'.

²In crystallography, the set of the calculated Fourier coefficients usually coincides with that of the experimentally measured intensities.

³For a comparison of the CPU time required by the two methods, we refer to Kim & Sanbonmatsu (2017).

Funding information

This work was supported by the NIH (grant GM063210 to PDA, RJR and TT) and the PHENIX Industrial Consortium. This work was supported in part by the US Department of Energy under Contract No. DE-AC02-05CH11231. AU acknowledges the support and the use of resources of the French Infrastructure for Integrated Structural Biology FRISBI ANR-10-INBS-05 and of Instruct-ERIC. RJR is supported by a Principal Research Fellowship funded by the Wellcome Trust (Grant 082961/Z/07/Z).

References

Adams, P. D. et al. (2010). Acta Cryst. D66, 213–221.
Afonine, P. V., Grosse-Kunstleve, R. W., Echols, N., Headd, J. J., Moriarty, N. W., Mustyakimov, M., Terwilliger, T. C., Urzhumtsev, A., Zwart, P. H. & Adams, P. D. (2012). Acta Cryst. D68, 352–367.
Afonine, P. V., Grosse-Kunstleve, R. W., Urzhumtsev, A. & Adams, P. D. (2009). J. Appl. Cryst. 42, 607–615.
Afonine, P. V., Headd, J. J., Terwilliger, T. C. & Adams, P. D. (2013). Comput. Crystallogr. Newsl. 4, 43–44. https://www.phenix-online.org/newsletter/CCN_2013_07.pdf.
Afonine, P. V., Klaholz, B. K., Moriarty, N. W., Poon, B. K., Sobolev, O. V., Terwilliger, T. C., Adams, P. D. & Urzhumtsev, A. (2018). bioRxiv. https://doi.org/10.1101/249607.
Afonine, P., Urzhumtsev, A. & Adams, P. D. (2015). Arbor, 191, a219. https://doi.org/10.3989/arbor.2015.772n2005.
Ahmed, T., Shi, J. & Bhushan, S. (2017). Nucleic Acids Res. 45, 8581–8595.
Ahmed, T., Yin, Z. & Bhushan, S. (2016). Sci. Rep. 6, 35793.
Arnold, E. & Rossmann, M. G. (1988). Acta Cryst. A44, 270–283.
Baker, M. L., Hryc, C. F., Zhang, Q., Wu, W., Jakana, J., Haase-Pettingell, C., Afonine, P. V., Adams, P. D., King, J. A., Jiang, W. & Chiu, W. (2013). Proc. Natl Acad. Sci. USA, 110, 12301–12306.
Barad, B. A., Echols, N., Wang, R. Y.-R., Cheng, Y., DiMaio, F., Adams, P. D. & Fraser, J. S. (2015). Nature Methods, 12, 943–946.
Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). Nucleic Acids Res. 28, 235–242.
Bernstein, F. C., Koetzle, T. F., Williams, G. J., Meyer, E. F. Jr, Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). J. Mol. Biol. 112, 535–542.
Bhardwaj, A., Sankhala, R. S., Olia, A. S., Brooke, D., Casjens, S. R., Taylor, D. J., Prevelige, P. E. Jr & Cingolani, G. (2016). J. Biol. Chem. 291, 215–226.
Brändén, C.-I. & Jones, T. A. (1990). Nature (London), 343, 687–689.
Brown, A., Long, F., Nicholls, R. A., Toots, J., Emsley, P. & Murshudov, G. (2015). Acta Cryst. D71, 136–153.
Brünger, A. T. (1992). Nature (London), 355, 472–475.
Brünger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P., Grosse-Kunstleve, R. W., Jiang, J.-S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J., Rice, L. M., Simonson, T. & Warren, G. L. (1998). Acta Cryst. D54, 905–921.
Brünger, A. T., Kuriyan, J. & Karplus, M. (1987). Science, 235, 458–460.
Chapman, M. S. (1995). Acta Cryst. A51, 69–80.
Chen, V. B., Arendall, W. B., Headd, J. J., Keedy, D. A., Immormino, R. M., Kapral, G. J., Murray, L. W., Richardson, J. S. & Richardson, D. C. (2010). Acta Cryst. D66, 12–21.
Chen, V. B., Davis, I. W. & Richardson, D. C. (2009). Protein Sci. 18, 2403–2409.
Chen, Y. et al. (2016). Science, 353, aad8266.
Cheng, L., Sun, J., Zhang, K., Mou, Z., Huang, X., Ji, G., Sun, F., Zhang, J. & Zhu, P. (2011). Proc. Natl Acad. Sci. USA, 108, 1373–1378.
Chua, E. Y. D., Vogirala, V. K., Inian, O., Wong, A. S. W., Nordenskiöld, L., Plitzko, J. M., Danev, R. & Sandin, S. (2016). Nucleic Acids Res. 44, 8013–8019.
Demo, G., Svidritskiy, E., Madireddy, R., Diaz-Avalos, R., Grant, T., Grigorieff, N., Sousa, D. & Korostelev, A. A. (2017). Elife, 6, e23687.
Diamond, R. (1971). Acta Cryst. A27, 436–452.
DiMaio, F., Song, Y., Li, X., Brunner, M. J., Xu, C., Conticello, V., Egelman, E., Marlovits, T., Cheng, Y. & Baker, D. (2015). Nature Methods, 12, 361–365.
Emsley, P. & Cowtan, K. (2004). Acta Cryst. D60, 2126–2132.
Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. (2010). Acta Cryst. D66, 486–501.
Fischer, N., Neumann, P., Konevega, A. L., Bock, L. V., Ficner, R., Rodnina, M. V. & Stark, H. (2015). Nature (London), 520, 567–570.
Frank, J. (2006). Three-Dimensional Electron Microscopy of Macromolecular Assemblies. Oxford University Press.
Gao, Y., Cao, E., Julius, D. & Cheng, Y. (2016). Nature (London), 534, 347–351.
Goddard, T. D., Huang, C. C., Meng, E. C., Pettersen, E. F., Couch, G. S., Morris, J. H. & Ferrin, T. E. (2018). Protein Sci. 27, 14–25.
Gore, S. et al. (2017). Structure, 25, 1916–1927.
Grosse-Kunstleve, R. W. & Adams, P. D. (2004). IUCr Comput. Comm. Newsl. 4, 19–36. https://www.iucr.org/resources/commissions/crystallographic-computing/newsletters/4.
Grosse-Kunstleve, R. W., Sauter, N. K., Moriarty, N. W. & Adams, P. D. (2002). J. Appl. Cryst. 35, 126–136.
Grosse-Kunstleve, R. W., Sauter, N. K. & Adams, P. D. (2004). IUCr Comput. Comm. Newsl. 3, 22–31. https://www.iucr.org/resources/commissions/crystallographic-computing/newsletters/3.
Headd, J. J., Echols, N., Afonine, P. V., Grosse-Kunstleve, R. W., Chen, V. B., Moriarty, N. W., Richardson, D. C., Richardson, J. S. & Adams, P. D. (2012). Acta Cryst. D68, 381–390.
Headd, J. J., Echols, N., Afonine, P. V., Moriarty, N. W., Gildea, R. J. & Adams, P. D. (2014). Acta Cryst. D70, 1346–1356.
Henderson, R. et al. (2012). Structure, 20, 205–214.
Henrick, K., Newman, R., Tagari, M. & Chagoyen, M. (2003). J. Struct. Biol. 144, 228–237.
Hodel, A., Kim, S.-H. & Brünger, A. T. (1992). Acta Cryst. A48, 851–858.
Hryc, C. F., Chen, D.-H., Afonine, P. V., Jakana, J., Wang, Z., Haase-Pettingell, C., Jiang, W., Adams, P. D., King, J. A., Schmid, M. F. & Chiu, W. (2017). Proc. Natl Acad. Sci. USA, 114, 3103–3108.
Jones, T. A. (1978). J. Appl. Cryst. 11, 268–272.
Jones, T. A., Zou, J.-Y., Cowan, S. W. & Kjeldgaard, M. (1991). Acta Cryst. A47, 110–119.
Joseph, A. P., Lagerstedt, I., Patwardhan, A., Topf, M. & Winn, M. (2017). J. Struct. Biol. 199, 12–26.
Kim, D. N. & Sanbonmatsu, K. Y. (2017). Biosci. Rep. 37, BSR20170072.
Lagerstedt, I., Moore, W. J., Patwardhan, A., Sanz-García, E., Best, C., Swedlow, J. R. & Kleywegt, G. J. (2013). J. Struct. Biol. 184, 173–181.
Lekien, F. & Marsden, J. (2005). Int. J. Numer. Methods Eng. 63, 455–471.
Leslie, A. G. W. (1987). Acta Cryst. A43, 134–136.
Liao, M., Cao, E., Julius, D. & Cheng, Y. (2013). Nature (London), 504, 107–112.
Liu, D. C. & Nocedal, J. (1989). Math. Program. 45, 503–528.
Liu, Y., Pan, J., Jenni, S., Raymond, D. D., Caradonna, T., Do, K. T., Schmidt, A. G., Harrison, S. C. & Grigorieff, N. (2017). J. Mol. Biol. 429, 1829–1839.
Lokareddy, R. K., Sankhala, R. S., Roy, A., Afonine, P. V., Motwani, T., Teschke, C. M., Parent, K. N. & Cingolani, G. (2017). Nature Commun. 8, 14310.
Lunin, V. Y., Afonine, P. V. & Urzhumtsev, A. G. (2002). Acta Cryst. A58, 270–282.
Lunin, V. Y. & Urzhumtsev, A. G. (1984). Acta Cryst. A40, 269–277.
Maslen, E. N., Fox, A. G. & O'Keefe, M. A. (1992). International Tables for Crystallography, Vol. C, edited by A. J. C. Wilson, pp. 476–516. Dordrecht: Kluwer Academic Publishers.
Mooij, W. T. M., Hartshorn, M. J., Tickle, I. J., Sharff, A. J., Verdonk, M. L. & Jhoti, H. (2006). ChemMedChem, 1, 827–838.
Oldfield, T. J. (2001). Acta Cryst. D57, 82–94.
Paulino, C., Neldner, Y., Lam, A. K. M., Kalienkova, V., Brunner, J. D., Schenck, S. & Dutzler, R. (2017). Elife, 6, e26232.
Peng, L.-M. (1998). Acta Cryst. A54, 481–485.
Peng, L.-M., Ren, G., Dudarev, S. L. & Whelan, M. J. (1996). Acta Cryst. A52, 257–276.
Pintilie, G., Chen, D.-H., Haase-Pettingell, C. A., King, J. A. & Chiu, W. (2016). Biophys. J. 110, 827–839.
Read, R. J. (1986). Acta Cryst. A42, 140–149.
Read, R. J. et al. (2011). Structure, 19, 1395–1412.
Rossmann, M. G. (2000). Acta Cryst. D56, 1341–1349.
Rossmann, M. G., Bernal, R. & Pletnev, S. V. (2001). J. Struct. Biol. 136, 190–200. Web of Science CrossRef PubMed CAS Google Scholar
Sears, V. F. (1992). Neutron News, 3(3), 26–37. CrossRef Google Scholar
Shalev-Benami, M., Zhang, Y., Matzov, D., Halfon, Y., Zackay, A., Rozenberg, H., Zimmerman, E., Bashan, A., Jaffe, C. L., Yonath, A. & Skiniotis, G. (2016). Cell. Rep. 16, 288–294. Google Scholar
Sheldrick, G. M. & Schneider, T. R. (1997). Methods Enzymol. 277, 319–343. CrossRef PubMed CAS Web of Science Google Scholar
Sobolev, O. V., Afonine, P. V., Adams, P. D. & Urzhumtsev, A. (2015). J. Appl. Cryst. 48, 1130–1141. CrossRef CAS IUCr Journals Google Scholar
Sorzano, C. O. S., Vargas, J., Otón, J., Abrishami, V., de la Rosa-Trevín, J. M., del Riego, S., Fernández-Alderete, A., Martínez-Rey, C., Marabini, R. & Carazo, J. M. (2015). AIMS Biophys. 2, 8–20. Google Scholar
Terwilliger, T. C., Adams, P. D., Afonine, P. V. & Sobolev, O. V. (2018). bioRxiv, 267138. https://doi.org/10.1101/267138. Google Scholar
Terwilliger, T. C., Grosse-Kunstleve, R. W., Afonine, P. V., Adams, P. D., Moriarty, N. W., Zwart, P., Read, R. J., Turk, D. & Hung, L.-W. (2007). Acta Cryst. D63, 597–610. Web of Science CrossRef CAS IUCr Journals Google Scholar
Terwilliger, T. C., Read, R. J., Adams, P. D., Brunger, A. T., Afonine, P. V. & Hung, L.-W. (2013). Acta Cryst. D69, 2244–2250. Web of Science CrossRef IUCr Journals Google Scholar
Terwilliger, T. C., Sobolev, O. V., Afonine, P. V. & Adams, P. D. (2018). Acta Cryst. D74, 545–559. CrossRef IUCr Journals Google Scholar
Tickle, I. J. (2012). Acta Cryst. D68, 454–467. Web of Science CrossRef CAS IUCr Journals Google Scholar
Tronrud, D. E. (2004). Acta Cryst. D60, 2156–2168. Web of Science CrossRef CAS IUCr Journals Google Scholar
Turk, D. (2013). Acta Cryst. D69, 1342–1357. Web of Science CrossRef CAS IUCr Journals Google Scholar
Urzhumtsev, A., Afonine, P. V., Lunin, V. Y., Terwilliger, T. C. & Adams, P. D. (2014). Acta Cryst. D70, 2593–2606. Web of Science CrossRef IUCr Journals Google Scholar
Urzhumtsev, A. G., Lunin, V. Y. & Luzyanina, T. B. (1989). Acta Cryst. A45, 34–39. CrossRef CAS Web of Science IUCr Journals Google Scholar
Waasmaier, D. & Kirfel, A. (1995). Acta Cryst. A51, 416–431. CrossRef CAS Web of Science IUCr Journals Google Scholar
Watkin, D. (2008). J. Appl. Cryst. 41, 491–522. Web of Science CrossRef CAS IUCr Journals Google Scholar
Williams, C. J. et al. (2018). Protein Sci. 27, 293–315.
Wlodawer, A. & Dauter, Z. (2017). Acta Cryst. D73, 379–380.
Yang, H., Wang, J., Liu, M., Chen, X., Huang, M., Tan, D., Dong, M.-Q., Wong, C. C. L., Wang, J., Xu, Y. & Wang, H.-W. (2016). Protein Cell, 7, 878–887.

This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.