Cryo-EM single-particle structure refinement and map calculation using Servalcat
^{a}MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom, and ^{b}Scientific Computing Department, UKRI Science and Technology Facilities Council, Rutherford Appleton Laboratory, Harwell Campus, Didcot OX11 0FA, United Kingdom
^{*}Correspondence email: kyamashita@mrc-lmb.cam.ac.uk, garib@mrc-lmb.cam.ac.uk
In 2020, cryo-EM single-particle analysis achieved true atomic resolution thanks to technological developments in hardware and software. The number of high-resolution reconstructions continues to grow, increasing the importance of the accurate determination of atomic coordinates. Here, a new Python package and program called Servalcat is presented that is designed to facilitate atomic model refinement. Servalcat implements a refinement pipeline using the program REFMAC5 from the CCP4 package. After the refinement, Servalcat calculates a weighted F_{o} − F_{c} difference map, which is derived from Bayesian statistics. This map helps manual and automatic model building in real space, as is common practice in crystallography. The F_{o} − F_{c} map helps in the visualization of weak features including hydrogen densities. Although hydrogen densities are weak, they are stronger than in the electron-density maps produced by X-ray crystallography, and some H atoms are even visible at ∼1.8 Å resolution. Servalcat also facilitates atomic model refinement under symmetry constraints. If point-group symmetry has been applied to the map during reconstruction, the asymmetric unit model is refined with the appropriate symmetry constraints.
Keywords: cryo-EM; structure refinement; REFMAC5; Servalcat.
1. Notation
F_{T}: Fourier transform of unknown true map (complex values).
F_{n}: Fourier transform of noise in the observed map (complex values).
F_{o1}, F_{o2}: Fourier transforms of the two unweighted and unsharpened half maps from independent reconstructions (complex values).
F_{o}: Fourier transform of the observed full map, (F_{o1} + F_{o2})/2.
F_{c}: Fourier transform of calculated map from atomic coordinates (complex values).
E: structure factors normalized in resolution bins, F/(〈F^{2}〉)^{1/2}.
k: resolution-dependent scale factor between F_{o} and F_{T}.
D: resolution-dependent scale factor between F_{o} and F_{c}.
σ_{T}^{2}: variance of signal, var(F_{T}).
σ_{n}^{2}: variance of noise, var(F_{n}).
σ_{U}^{2}: variance of unexplained signal, var(DF_{c} − kF_{T}).
f: atomic scattering factor.
s: column vector of position in reciprocal space.
s^{T}: row vector of position in reciprocal space.
x: column vector of position in real space.
(R, t): rotation matrix and translation vector that could be an element of a point group.
B: displacement parameter of an atom, or blurring parameter for a local or global region of a map. A real value (isotropic case) or a 3 × 3 symmetric matrix (anisotropic case). Usually B is isotropic and atomic unless otherwise stated. Also called an atomic displacement parameter (ADP) if associated with an atom.
Unless otherwise stated, all quantities in Fourier space are dependent on s.
2. Introduction
Atomic model refinement is the optimization of the model's parameters against the observed data. Atomic parameters typically include coordinates, atomic displacement parameters (ADPs) and occupancies. In crystallography, refinement is crucial because of the phase problem: the accuracy of density maps relies on the accuracy of the phases of the structure factors. Accurate phases are not observed and must be calculated from the model (Tronrud, 2004). More accurate maps may be obtained as the model becomes more accurate through the refinement. In single-particle analysis (SPA) there is no phase problem, although the Fourier coefficients can be noisy, especially at high resolution.
Accurate atomic model determination is becoming more and more important due to the `resolution revolution' in cryo-EM SPA following the introduction of direct electron detectors and new data-processing methods (Bai et al., 2015). As of April 2021, more than 2500 SPA entries with resolutions better than 3.5 Å have been deposited in the Electron Microscopy Data Bank (EMDB; Tagari et al., 2002). This improvement in resolution has accelerated the development of methods for model building, refinement and validation. Automatic model-building programs that were originally developed for crystallography are now being adapted for cryo-EM SPA maps (Terwilliger, Adams et al., 2018; Hoh et al., 2020; Chojnowski et al., 2021). Density modification and local map sharpening can help to interpret the map (Jakobi et al., 2017; Terwilliger, Sobolev et al., 2018; Ramírez-Aportela et al., 2019; Ramlaul et al., 2019; Terwilliger et al., 2020). In general, care must be exercised when using any techniques based on prior knowledge; bias towards incorrect assumptions might lead to misinterpretation of the maps. Full-atom refinement can be performed either in real space (Afonine et al., 2018) or in reciprocal space (Murshudov, 2016).
After refinement, the model should be validated; the model should have a reasonable geometry and should describe the map well. Due to the low data-to-parameter ratio, all models will exhibit a degree of overfitting; however, the model should not deviate substantially from cross-validation data (Brown et al., 2015). MolProbity is the most widely used geometry validation tool, and includes analyses of clashes, rotamers and the Ramachandran plot (Chen et al., 2010). Map–model quality is assessed using real-space local correlations (Cragnolini et al., 2021), which have commonly been used in crystallography (Tickle, 2012). In reciprocal space, the R factor can be calculated as in crystallography, but the map–model Fourier shell correlation (FSC) is preferred as it does not depend on resolution-dependent scaling and takes phases into account explicitly. An F_{o} − F_{c} map, which highlights unmodelled features and errors in the current model, is almost always used in crystallography, and some similar tools already exist for SPA (Joseph et al., 2020). The σ_{A}-weighted (mF_{o} − DF_{c})exp(iφ_{c}) map as used in crystallography is not directly applicable to SPA, because phases are available for both F_{o} and F_{c} and we should model the error of F_{o} in the complex plane, rather than simply using the estimated phase error as in crystallography (see below).
In 2020, cryo-EM SPA achieved atomic resolution, according to Sheldrick's criterion (Wlodawer & Dauter, 2017), in structural analyses of apoferritin, which were reported by two groups (Nakane et al., 2020; Yip et al., 2020). Nakane et al. (2020) observed H-atom densities at 1.2 and 1.7 Å resolutions using F_{o} − F_{c} maps calculated by REFMAC5. There is a higher chance of observing hydrogen density in EM than in X-ray crystallography because of the increased contrast for the lighter elements (Clabbers & Abrahams, 2018). Nevertheless, hydrogen density is relatively weak and there is always a much higher peak from the parent atom nearby, so the F_{o} − F_{c} difference map is essential to see it. In addition, there is complexity in the interpretation of hydrogen peaks in EM. An electron in an H atom is usually shifted towards the parent atom from the nucleus position. In EM, both the electrons and the nucleus contribute to scattering, and this offset results in a shift of hydrogen density peaks beyond the position of the hydrogen nucleus (Nakane et al., 2020).
SPA structures often have point-group symmetries (rather than space-group symmetry as in crystallography). Approximately half of the SPA entries in the EMDB have non-C1 point-group symmetry according to their associated metadata. Such symmetry is advantageous and helps to reach higher resolution because it increases the effective number of particles. If the map is symmetrized, downstream analyses should be aware of it and the structural model must follow the symmetry. As in crystallography, it is natural to work in a single asymmetric unit. The MTRIX records in the PDB format or _struct_ncs_oper in the mmCIF format can be used to encode the symmetry information.^{1} Currently, for structures from SPA there are only a few depositions of such models in the PDB (excepting viruses). We recommend refining and depositing an asymmetric unit model, which makes sure the symmetry copies are truly identical. It should be noted that validation tools must be aware of any applied symmetry operators, but results should be reported for the asymmetric unit only. These considerations are only valid if the map is symmetrized, and we suggest that the point-group information should be required by the deposition system.
Here, we present Servalcat, a Python package and standalone program for the refinement and map calculation of cryo-EM SPA structures. Servalcat takes unsharpened and unweighted half maps of the independent reconstructions as inputs and implements a refinement pipeline using REFMAC5, which uses a dedicated likelihood function for SPA (Murshudov, 2016). After the refinement, Servalcat calculates a sharpened and weighted F_{o} − F_{c} map derived from Bayesian statistics as described below. If the map has point-group symmetry, the user can give an asymmetric unit model and a point-group symbol, and the program will output a refined asymmetric unit model with symmetry annotation as well as a symmetry-expanded model. The noncrystallographic symmetry (NCS) constraint function in REFMAC5 has been updated to consider symmetry-related nonbonded interactions and ADP similarity restraints (to ensure the similarity of ADPs of atoms brought into close proximity via symmetry operations).
Servalcat is freely available as a standalone package and also as part of CCP-EM (Burnley et al., 2017), where the REFMAC5 interface has been updated to use Servalcat.
3. Map calculation and sharpening using signal variance
Let us assume that F_{o} is the result of a position-independent blurring k of the true Fourier coefficients F_{T} with an independent zero-mean Gaussian noise with variance σ_{n}^{2}. That is,

$$F_{o} = kF_{T} + F_{n}, \qquad F_{n} \sim \mathcal{N}(0, \sigma_{n}^{2}). \tag{1}$$

Note that in this work we treat k as a function of resolution |s|. Multiplication by k in Fourier space is equivalent to isotropic blurring by a convolution in real space. In general, k could take on a different value at each point s in Fourier space, which would produce a position-independent but direction-dependent blurring in real space.
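The variance of the noise (σ_{n}^{2}) can be calculated from the half maps in resolution bins (Murshudov, 2016),

$$\sigma_{n}^{2} = \frac{1}{4}\left\langle |F_{o1} - F_{o2}|^{2} \right\rangle. \tag{2}$$

We will later use the relationship of σ_{T}^{2} and σ_{n}^{2} to the FSC, correlation coefficients in resolution bins (Rosenthal & Henderson, 2003),

$$\mathrm{FSC}_{\mathrm{half}} = \frac{k^{2}\sigma_{T}^{2}}{k^{2}\sigma_{T}^{2} + 2\sigma_{n}^{2}}, \qquad \mathrm{FSC}_{\mathrm{full}} = \frac{2\,\mathrm{FSC}_{\mathrm{half}}}{1 + \mathrm{FSC}_{\mathrm{half}}} = \frac{k^{2}\sigma_{T}^{2}}{k^{2}\sigma_{T}^{2} + \sigma_{n}^{2}}.$$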
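The per-bin quantities above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `bin_statistics` and the plain lists of complex coefficients are hypothetical, the noise variance of the full map is taken as ⟨|F_{o1} − F_{o2}|^{2}⟩/4, and FSC_{full} is obtained from FSC_{half} as 2FSC_{half}/(1 + FSC_{half}). It is not Servalcat's own implementation.

```python
import math


def bin_statistics(f_half1, f_half2):
    """Return (noise variance, FSC_half, FSC_full) for one resolution bin.

    f_half1, f_half2: lists of complex Fourier coefficients of the two
    unsharpened, unweighted half maps in the bin (hypothetical inputs).
    """
    n = len(f_half1)
    # Noise variance of the full map: <|F_o1 - F_o2|^2> / 4
    var_noise = sum(abs(a - b) ** 2 for a, b in zip(f_half1, f_half2)) / (4 * n)
    # Half-map Fourier shell correlation in this bin
    numerator = sum((a * b.conjugate()).real for a, b in zip(f_half1, f_half2))
    denominator = math.sqrt(sum(abs(a) ** 2 for a in f_half1)
                            * sum(abs(b) ** 2 for b in f_half2))
    fsc_half = numerator / denominator
    # Correlation expected for the averaged (full) map
    fsc_full = 2.0 * fsc_half / (1.0 + fsc_half)
    return var_noise, fsc_half, fsc_full
```

With identical half maps the noise variance is zero and both correlations are one; with uncorrelated half maps both correlations drop to zero.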
Let us also assume that the errors in the model follow a Gaussian distribution (Luzzati, 1952),

$$kF_{T} - DF_{c} \sim \mathcal{N}(0, \sigma_{U}^{2}).$$
We need two functions: the likelihood p(F_{o}; F_{c}) for the estimation of parameters (of the atomic model and of the distribution function) and the posterior distribution p(F_{T}; F_{o}, F_{c}) of the unknown F_{T} for map calculation.
3.1. Likelihood
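As derived in Murshudov (2016),

$$p(F_{o}; F_{c}) = \frac{1}{\pi(\sigma_{U}^{2} + \sigma_{n}^{2})} \exp\left(-\frac{|F_{o} - DF_{c}|^{2}}{\sigma_{U}^{2} + \sigma_{n}^{2}}\right)$$

is the likelihood function that is optimized during atomic model refinement. D and σ_{U}^{2} are obtained in each resolution bin i by maximizing the joint likelihood (7):

$$\prod_{j \in \mathrm{bin}\ i} p(F_{o,j}; F_{c,j}) = \frac{1}{[\pi(\sigma_{U}^{2} + \sigma_{n}^{2})]^{N_{i}}} \exp\left(-\frac{\sum_{j}|F_{o,j} - DF_{c,j}|^{2}}{\sigma_{U}^{2} + \sigma_{n}^{2}}\right), \tag{7}$$

where N_{i} is the number of Fourier coefficients in bin i.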
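To make the per-bin estimation concrete, here is a minimal Python sketch. It assumes that the likelihood in a bin is a complex Gaussian in F_{o} − DF_{c} with total variance σ_{U}^{2} + σ_{n}^{2} and that D is real; under those assumptions the maximum has a closed form (a least-squares scale for D, and the mean squared residual minus the noise variance for σ_{U}^{2}). The function name `estimate_d_and_var_u` is hypothetical, and REFMAC5 and Servalcat may estimate these parameters differently.

```python
def estimate_d_and_var_u(f_obs, f_calc, var_noise):
    """Closed-form maximum-likelihood estimates of D and var_U in one bin.

    f_obs, f_calc: lists of complex coefficients (hypothetical inputs).
    var_noise: noise variance of the full map in this bin.
    """
    n = len(f_obs)
    sum_fc2 = sum(abs(fc) ** 2 for fc in f_calc)
    # For a Gaussian likelihood the best real scale is the least-squares one
    d = sum((fo * fc.conjugate()).real for fo, fc in zip(f_obs, f_calc)) / sum_fc2
    # The total variance (var_U + var_noise) equals the mean squared residual
    mean_sq_residual = sum(abs(fo - d * fc) ** 2
                           for fo, fc in zip(f_obs, f_calc)) / n
    # var_U cannot be negative, so clamp at zero
    var_u = max(mean_sq_residual - var_noise, 0.0)
    return d, var_u
```

Clamping σ_{U}^{2} at zero covers bins in which the model already explains the data to within the noise level.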
3.2. Posterior distribution and map calculation
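The posterior distribution, as derived in Murshudov (2016),

$$p(F_{T}; F_{o}, F_{c}) \propto p(F_{o}; F_{T})\, p(F_{T}; F_{c})$$

is a 2D Gaussian distribution with the mean and variance

$$\langle F_{T} \rangle = \frac{1}{k}\left[wF_{o} + (1 - w)DF_{c}\right], \tag{11}$$

$$\mathrm{var}(F_{T}) = \frac{1}{k^{2}}\,\frac{\sigma_{U}^{2}\sigma_{n}^{2}}{\sigma_{U}^{2} + \sigma_{n}^{2}},$$

where

$$w = \frac{\sigma_{U}^{2}}{\sigma_{U}^{2} + \sigma_{n}^{2}}.$$

Coefficients for an F_{o} − F_{c}-type difference map can be derived as

$$\langle F_{T} \rangle - \frac{DF_{c}}{k} = \frac{1}{k}\,\frac{\sigma_{U}^{2}}{\sigma_{U}^{2} + \sigma_{n}^{2}}\left(F_{o} - DF_{c}\right).$$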
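The remaining unknown variable is k, which cannot be determined from the data alone. For position-independent isotropic Gaussian blurring, k has the form exp(−B_{overall}|s|^{2}/4) and B_{overall} may be estimated from line fitting of a Wilson plot (Wilson, 1942). However, such an estimate is unstable, especially when only low-resolution data are available. Here, we introduce a simple approximation using the variance of the signal. Let us assume that the true map consists of atoms with the same isotropic ADP of 〈B〉, and then

$$\sigma_{T}^{2} = \left\langle |F_{T}|^{2} \right\rangle = \left\langle \Big| \sum_{j} f_{j} \exp\left(-\frac{\langle B \rangle |s|^{2}}{4}\right) \exp(2\pi i\, s^{T}x_{j}) \Big|^{2} \right\rangle \approx \exp\left(-\frac{\langle B \rangle |s|^{2}}{2}\right) \sum_{j} f_{j}^{2}.$$

We ignored the interference terms Σ_{j≠l} f_{j}f_{l} exp[2πi s^{T}(x_{j} − x_{l})]. Further ignoring resolution-dependent terms in Σ_{j} f_{j}^{2}, we can use kσ_{T} as a proxy for k, which gives the best sharpening for the region with a local blurring parameter of 〈B〉. Using FSC_{full} = k^{2}σ_{T}^{2}/(k^{2}σ_{T}^{2} + σ_{n}^{2}) and 〈|F_{o}|^{2}〉 = k^{2}σ_{T}^{2} + σ_{n}^{2}, kσ_{T} can be transformed as follows:

$$k\sigma_{T} = \left(\frac{\mathrm{FSC}_{\mathrm{full}}}{1 - \mathrm{FSC}_{\mathrm{full}}}\right)^{1/2} \sigma_{n} = \left(\mathrm{FSC}_{\mathrm{full}} \left\langle |F_{o}|^{2} \right\rangle\right)^{1/2}.$$

The F_{o} − F_{c} coefficient then finally has the form

$$\frac{1}{k\sigma_{T}}\,\frac{\sigma_{U}^{2}}{\sigma_{U}^{2} + \sigma_{n}^{2}}\left(F_{o} - DF_{c}\right). \tag{17}$$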
Servalcat calculates an F_{o} − F_{c} map using (17). Note that the F_{o} − F_{c} map is only sensible when the ADPs are properly refined; otherwise we will see spurious peaks due to incorrect ADPs. For this reason, unsharpened F_{o} should be used as the input for atomic model refinement (see Section 4.1); the sharpening is then consistent as the same sharpening factor is applied to F_{o} and F_{c}. Note also that the sharpening is based on the average B value, so regions having very different B values may show fewer structural features.
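The map from the estimated true Fourier coefficients (11) may be useful, but there is a risk of model bias because of the contribution from F_{c}. In the future, techniques may be available to resolve the issue of model bias. At the moment, Servalcat provides the following as a default map for manual inspection. This is a special case of (11) in the absence of a model, that is with D = 0 (so that σ_{U}^{2} = k^{2}σ_{T}^{2} and the weight on F_{o} becomes FSC_{full}),

$$\frac{\mathrm{FSC}_{\mathrm{full}}}{k\sigma_{T}}\,F_{o} = \left(\mathrm{FSC}_{\mathrm{full}}\right)^{1/2} E_{o}. \tag{18}$$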
This is equivalent to EMDA's normalized expected map (Warshamanage et al., 2021).
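The two kinds of map coefficients discussed in this section can be sketched for a single resolution bin as follows. The sketch assumes that the difference-map weight is σ_{U}^{2}/(σ_{U}^{2} + σ_{n}^{2}), that kσ_{T} is evaluated as (FSC_{full}〈|F_{o}|^{2}〉)^{1/2}, and that the model-free map is FSC_{full}F_{o}/(kσ_{T}). The function name `map_coefficients` and the list-based inputs are hypothetical, and this is not Servalcat's code.

```python
import math


def map_coefficients(f_obs, f_calc, d, var_u, var_noise, fsc_full):
    """Weighted and sharpened Fo-Fc and Fo coefficients for one bin.

    All arguments are per-bin quantities (hypothetical inputs):
    complex coefficient lists f_obs and f_calc, the scale d,
    the variances var_u and var_noise, and the full-map FSC.
    """
    n = len(f_obs)
    mean_fo2 = sum(abs(fo) ** 2 for fo in f_obs) / n
    # k*sigma_T, used as a proxy for the blurring factor k
    k_sigma_t = math.sqrt(fsc_full * mean_fo2)
    # Weight that down-weights bins dominated by noise
    weight = var_u / (var_u + var_noise)
    fofc = [weight * (fo - d * fc) / k_sigma_t for fo, fc in zip(f_obs, f_calc)]
    # Model-free map: the D = 0 case, where the weight becomes FSC_full
    fo_map = [fsc_full * fo / k_sigma_t for fo in f_obs]
    return fofc, fo_map
```

Because the same divisor kσ_{T} is applied to both F_{o} and DF_{c}, the difference map stays consistent with ADPs refined against the unsharpened data.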
The approach here should work at any resolution where atomic model refinement is applicable.
3.3. Variance of a masked map
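The significance of difference map peaks is usually defined by the r.m.s.d. (sigma) level in crystallography. However, in SPA the box size is arbitrary and the voxels outside the molecular envelope lead to underestimation of the r.m.s.d. value. Here, we demonstrate how a mask inflates sigma-scaled density and show that it is useful to normalize the map using the standard deviation within the mask.
We consider a masked map containing n points in total, where m points are within the mask and thus the values for n − m points are zero. If we calculate the mean value of the whole data, with ρ_{i} the map value at point i,

$$\mu_{\mathrm{total}} = \frac{1}{n}\sum_{i=1}^{n}\rho_{i} = \frac{1}{n}\sum_{i \in \mathrm{mask}}\rho_{i} = \frac{m}{n}\,\mu_{\mathrm{mask}}. \tag{19}$$

Thus, to calculate the mean within the mask we can calculate the total mean and then use the formula for correction:

$$\mu_{\mathrm{mask}} = \frac{n}{m}\,\mu_{\mathrm{total}}. \tag{20}$$

For the variance,

$$\mathrm{var}_{\mathrm{total}} = \frac{1}{n}\sum_{i=1}^{n}\rho_{i}^{2} - \mu_{\mathrm{total}}^{2} = \frac{m}{n}\left(\mathrm{var}_{\mathrm{mask}} + \mu_{\mathrm{mask}}^{2}\right) - \mu_{\mathrm{total}}^{2}. \tag{21}$$

From here we can calculate var_{mask} if we know var_{total} and μ_{total}. If we denote f = m/n then we can write

$$\mathrm{var}_{\mathrm{mask}} = \frac{\mathrm{var}_{\mathrm{total}}}{f} - \frac{1 - f}{f^{2}}\,\mu_{\mathrm{total}}^{2}. \tag{22}$$

If the mean inside the mask is zero then there is a simple relationship between the total variance and the variance within the mask (var_{total} = f var_{mask}). This explains the dependence between the box size and the r.m.s.d. of a cryo-EM SPA map. Servalcat normalizes the F_{o} − F_{c} map by (var_{mask})^{1/2} when a mask file is given. (Otherwise only the F_{o} − F_{c} structure factors are written in MTZ format.)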
If we assume that the map consists of signal and noise, and there is no correlation between them, then we can claim that var_{mask} = var_{signal} + var_{noise}. Now, in addition, if we assume that we have modelled the map fully with an atomic model (or that two maps have an almost perfect overlap of signals) then the difference maps should consist almost entirely of noise. Therefore, var_{diffmap,mask} = var_{noise}. This variance should be calculated within the mask to make sure that we do not have variance reduction because of systematically low values outside the region occupied by the macromolecule. If we want to increase the reliability of these variances for a region of interest then we may also mask out other regions where there might be signal that is not fully accounted for by the current model. This can also be practiced in crystallography.
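The correction for the box size can be checked numerically with a short Python sketch. It assumes that all values outside the mask are exactly zero and that population variances (dividing by the number of points) are used, so that μ_{mask} = μ_{total}/f and var_{mask} = var_{total}/f − (1 − f)μ_{total}^{2}/f^{2} with f = m/n. The function name `masked_stats_from_total` is hypothetical.

```python
def masked_stats_from_total(mean_total, var_total, n_total, n_mask):
    """Mean and variance inside a mask from whole-box statistics.

    Assumes every point outside the mask has the value zero and that
    var_total is a population variance (hypothetical helper).
    """
    f = n_mask / n_total  # fraction of the box covered by the mask
    mean_mask = mean_total / f
    var_mask = var_total / f - (1.0 - f) * mean_total ** 2 / f ** 2
    return mean_mask, var_mask
```

For a difference map with zero mean inside the mask, the sigma level within the mask is larger than the whole-box value by a factor of (n/m)^{1/2}, which is why peak heights quoted in whole-box sigma units depend on the box size.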
4. Refinement procedure
In this section the refinement and map-calculation procedures are described. Everything other than REFMAC5 itself is implemented in Servalcat using the GEMMI library (https://github.com/project-gemmi/gemmi). Fig. 1 summarizes the procedure.
4.1. Map choice
The optimal map depends on the purpose. For manual inspection, optimally sharpened and weighted maps should be used so that the best visual interpretability is achieved. In general, this does not mean the best signal-to-noise ratio, but it does mean that the details of structural features are visible in the map. On the other hand, unsharpened and unweighted maps are preferred in refinement. If a sharpened map is used, some atoms may need to be refined to have negative B values (or nonpositive definite if anisotropic), but they are constrained to be positive in the refinement, resulting in suboptimal atomic models. In contrast, blurred maps will just give a shifted distribution of refined B values. An unweighted map is preferred because it enables the calculation of many properties including noise variance and optimally weighted maps after refinement (see Section 3). Users should therefore be aware that the ADPs in the model are not refined against the same map that is used for visual inspection. Cross-validation (Brown et al., 2015) can also be carried out throughout refinement and model building if both half maps are readily available. Therefore, unsharpened and unweighted half maps from two independent reconstructions are considered to be optimal inputs for the Servalcat pipeline, which performs atomic model refinement followed by map calculation.
4.2. Masking and trimming
The box size in SPA is often substantially larger than the molecule, which is unnecessary for atomic model refinement. Therefore the map is masked and trimmed into a smaller box to speed up calculations, as discussed in Nicholls et al. (2018).
Half maps are first sharpened, masked at a radius of 3 Å (default) from the atom positions and then blurred by the same factor. Sharpening before masking is important to avoid masking away any of the signal (the tails of the atomic density distributions), because the raw half maps are blurred and the signal is spread out. The optimal sharpening will differ depending on the region, but here we use an overall isotropic B value estimated by comparing F_{o} with F_{c} calculated from a copy of the initial model with all ADPs set to zero. Alternatively, a user-supplied B value can be used. The sharpened–masked–unsharpened half maps are then averaged to make a full map that is used as the refinement target in REFMAC5. After the refinement, the map–model FSC is calculated using a newly created mask based on the refined model.
4.3. Pointgroup symmetry
If the maps are symmetrized, the user can specify a point-group symbol and give the coordinates for just a single asymmetric unit. Symmetry operators are calculated from the symbols (Cn, Dn, O, T and I) following the axis convention in RELION (Scheres, 2012), which follows the common orientation convention (Heymann et al., 2005) except for T. It is also assumed that the centre of the box is the origin of symmetry. This requires translation for each rotation R_{j}, which can be calculated as c − R_{j}c = (I − R_{j})c, where c is the origin of symmetry. Reconstruction programs such as RELION (Scheres, 2012) usually follow this assumption. However, the rotation of the axes and the position of the origin are arbitrary in general, and in future will be determined automatically using ProSHADE (Nicholls et al., 2018; Tykac, 2018) and EMDA. The model in the asymmetric unit is expanded when creating a mask and performing map trimming. The rotation matrices are invariant to changing the box sizes and shifts of the molecule. The translation vectors in the symmetry operators are recalculated for the shifted model.
REFMAC5 internally generates symmetry copies when calculating F_{c} and restraint terms. For anisotropic ADPs, the B_{aniso} matrix in the Cartesian basis is transformed by R_{j}B_{aniso}R_{j}^{T}. This anisotropic ADP transformation is also implemented in GEMMI.
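As an illustration of how a rotation about the box centre leads to a translation (I − R_{j})c, the following Python sketch builds the operators of a cyclic group Cn and applies the R B R^{T} transformation to an anisotropic ADP matrix. The choice of the z axis as the n-fold axis is an assumption made for this sketch, the helper names are hypothetical, and neither Servalcat nor GEMMI is used here.

```python
import math


def mat_vec(m, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def mat_mul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]


def cn_operators(n, centre):
    """(rotation, translation) pairs for Cn about a z-parallel axis.

    The axis passes through 'centre'; the translation is (I - R)c.
    Placing the n-fold axis along z is an assumption of this sketch.
    """
    operators = []
    for j in range(n):
        angle = 2.0 * math.pi * j / n
        c, s = math.cos(angle), math.sin(angle)
        rot = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
        rc = mat_vec(rot, centre)
        trans = [centre[i] - rc[i] for i in range(3)]
        operators.append((rot, trans))
    return operators


def apply_operator(operator, x):
    """Map a position x to its symmetry copy R x + t."""
    rot, trans = operator
    rx = mat_vec(rot, x)
    return [rx[i] + trans[i] for i in range(3)]


def transform_b_aniso(rot, b_aniso):
    """Rotate a Cartesian anisotropic ADP matrix: R B R^T."""
    return mat_mul(mat_mul(rot, b_aniso), transpose(rot))
```

Shifting the model by a vector d leaves every rotation matrix unchanged but moves the symmetry origin to c + d, so the translations are simply recomputed as (I − R_{j})(c + d).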
During the refinement, nonbonded interaction and ADP similarity restraints are evaluated using the symmetry-expanded model, and the gradients are calculated for the model in the asymmetric unit.
If atoms are on special positions (for example on a rotation axis), they are restrained^{2} to sit on the special position and have anisotropic ADPs consistent with symmetry. Firstly, atoms are identified as being on a special position if the following condition is obeyed for any of the symmetry operators j (other than the identity),

$$|R_{j}x + t_{j} - x| < \varepsilon, \tag{23}$$

where ɛ is a tolerance that can be modified by users. The default value is 0.25 Å. If an atom is on a special position then the program makes sure that the symmetry operators for this position form a group that is a subgroup of the point group of the map. Once the elements of the subgroup for this atom have been identified, the atom is forced to be on that position by simply replacing its coordinates with

$$x_{\mathrm{new}} = \frac{1}{N_{\mathrm{sub}}}\sum_{j}\left(R_{j}x + t_{j}\right), \tag{24}$$

where N_{sub} is the number of elements in the subgroup.
In every cycle, the positions of these atoms are restrained to be on their special positions by adding a term to the target function,

$$\frac{1}{\sigma_{x}^{2}}\sum_{j}|x - R_{j}x - t_{j}|^{2}, \tag{25}$$

where the summation is performed over all subgroup elements of the special position and σ_{x} is a user-controllable weight parameter for special positions. The occupancy of the atom is adjusted based on the multiplicity of the position.
If anisotropic ADPs are used, they are also forced to obey symmetry conditions for atoms on special positions by replacing the anisotropic tensor with

$$B_{\mathrm{new}} = \frac{1}{N_{\mathrm{sub}}}\sum_{j}R_{j}B_{\mathrm{aniso}}R_{j}^{T}. \tag{26}$$

After this, similarly to the positional parameters, in every cycle restraints are applied to the anisotropic tensor of the atoms on special positions to avoid violation of the symmetry condition for the ADP,

$$\frac{1}{\sigma_{B}^{2}}\sum_{j}\left\| B_{\mathrm{aniso}} - R_{j}B_{\mathrm{aniso}}R_{j}^{T} \right\|^{2}, \tag{27}$$

where σ_{B} is a user-controllable weight parameter for B_{aniso} values on special positions. Here, the distance between anisotropic tensors is a Frobenius distance ‖B_{1} − B_{2}‖^{2} = Σ_{kl}(B_{1,kl} − B_{2,kl})^{2}.
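The detection and idealization of an atom on a special position can be sketched in Python as below. The sketch assumes that the operator list contains the identity, that an operator belongs to the site when |R_{j}x + t_{j} − x| is below the tolerance, and that the idealized position is the average of the matching symmetry copies. It does not verify that the matching operators form a subgroup, which the text says the program does, and the function names are hypothetical.

```python
import math


def mat_vec(m, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def snap_to_special_position(x, operators, eps=0.25):
    """Return (idealized position, multiplicity) for an atom at x.

    operators: list of (rotation, translation) pairs that includes the
    identity (an assumption of this sketch). eps is the tolerance in
    the same length unit as x.
    """
    images = []
    for rot, trans in operators:
        rx = mat_vec(rot, x)
        image = [rx[i] + trans[i] for i in range(3)]
        # An operator belongs to the site if it maps the atom onto itself
        # to within the tolerance
        if math.dist(image, x) < eps:
            images.append(image)
    multiplicity = len(images)
    if multiplicity <= 1:
        return list(x), 1  # general position: nothing to do
    # Averaging the symmetry copies places the atom exactly on the site
    x_new = [sum(img[i] for img in images) / multiplicity for i in range(3)]
    return x_new, multiplicity
```

An occupancy consistent with the site could then be taken as the full occupancy divided by the returned multiplicity.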
4.4. H atoms
Hydrogen electrons are usually shifted towards the parent atoms by 0.1–0.2 Å (Williams et al., 2018). This must be accounted for when calculating structure factors from the atomic model (F_{c}). REFMAC5 and Servalcat (GEMMI) use the Mott–Bethe formula (Mott & Bragg, 1930; Bethe, 1930; Murshudov, 2016), which can conveniently take this fact into account.
The atomic scattering factor for an atom with a shifted nucleus is

$$f(s) \propto \frac{Z\exp(2\pi i\, s^{T}\Delta x) - f_{x}(s)}{|s|^{2}}, \tag{28}$$

where Z is the atomic number, f_{x} is the X-ray scattering factor and Δx is the positional shift of the nucleus with respect to the centre of the electron density. The hydrogen density peak in real space is shifted beyond the position of the hydrogen nucleus and varies depending on the ADP and resolution cutoff (Nakane et al., 2020). The expected peak position may be calculated by the Fourier transform of (28). The new CCP4 monomer library includes nucleus bond distances (_chem_comp_bond.value_dist_nucleus; Nicholls et al., 2021).
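The effect of the nuclear shift on the scattering factor can be explored with a small Python sketch. It assumes the Mott–Bethe form with the nuclear term multiplied by a phase factor, (Z exp(2πi s·Δx) − f_{x})/|s|^{2}, and omits the constant prefactor, so the values are on an arbitrary scale. The function name and the value used for the X-ray scattering factor f_{x} are hypothetical.

```python
import cmath
import math


def electron_form_factor_shifted(z, fx, s_vec, delta_x):
    """Unscaled Mott-Bethe factor for an atom with a shifted nucleus.

    z: atomic number; fx: X-ray scattering factor at this s (a
    hypothetical input value); s_vec: reciprocal-space vector in 1/A;
    delta_x: shift of the nucleus from the electron-density centre in A.
    The constant Mott-Bethe prefactor is deliberately left out.
    """
    s2 = sum(c * c for c in s_vec)
    s_dot_dx = sum(a * b for a, b in zip(s_vec, delta_x))
    # Only the nuclear term carries the phase shift
    phase = cmath.exp(2j * math.pi * s_dot_dx)
    return (z * phase - fx) / s2
```

With a zero shift the expression reduces to the usual real-valued (Z − f_{x})/|s|^{2}; a non-zero shift makes the factor complex, which is what displaces the hydrogen peak in the resulting map.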
4.5. Refinement
REFMAC5 performs a refinement against the Fourier transform of a sharpened–masked–unsharpened map (see Section 4.2) using a dedicated likelihood function for SPA (7). The estimated noise is not used at the moment. No solvent model is used. The average of map–model FSC weighted by the number of Fourier coefficients in each shell (FSC average) is reported to monitor the refinement. At low resolution the use of jelly-body restraints or external restraints is encouraged to ensure a large radius of convergence and stabilize the refinement (Murshudov et al., 2011; Nicholls et al., 2012). Note that jelly-body restraints are only useful when the initial model geometry is of good quality because they try to keep the model in its current conformation. After the refinement, Servalcat shifts the model back to the original box and adjusts the translation vectors of the symmetry operators if needed. It also generates an MTZ file of map coefficients including the sharpened and weighted F_{o} − F_{c} and F_{o} maps (as calculated by equations 17 and 18).
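The FSC average mentioned above is a weighted mean over resolution shells, which can be written as a short Python sketch. The function name `fsc_average` and the list inputs are hypothetical; the per-shell map–model FSC values and coefficient counts would come from the refinement program.

```python
def fsc_average(fsc_per_shell, n_coeffs_per_shell):
    """Map-model FSC averaged over shells, weighted by shell population.

    fsc_per_shell: FSC value in each resolution shell.
    n_coeffs_per_shell: number of Fourier coefficients in each shell.
    """
    total = sum(n_coeffs_per_shell)
    weighted = sum(f * n for f, n in zip(fsc_per_shell, n_coeffs_per_shell))
    return weighted / total
```

Because high-resolution shells contain many more coefficients than low-resolution ones, they dominate this average.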
4.6. User interface
Servalcat has a command-line interface. A graphical interface will be available in CCP-EM, where the REFMAC5 interface has been updated and is now based on Servalcat.
From the user's point of view, the main difference in setting up a refinement job is that the default input is now a pair of half maps. (Refinement from a single input map is still possible but is no longer the default option.) The user is also offered more control over the options for weight, symmetry and handling of H atoms. At the end of the refinement, the F_{o} − F_{c} difference map from Servalcat is made available along with the other output files in the CCP-EM launcher.
5. Methods and results
5.1. F_{o} − F_{c} map for ligand visualization
F_{o} − F_{c} omit maps are widely used to convincingly demonstrate the existence of ligands in crystallography. They are also useful for this purpose in SPA. Fig. 2 shows an example of an F_{o} − F_{c} omit map for the ligand density from EMDB entries EMD-22898 (Kern et al., 2021) and EMD-8123 (Murray et al., 2016), clearly showing support for the presence of the ligand. To generate the map from EMD-22898, chain A of the atomic model from PDB entry 7kjr was refined using the half maps under C2 symmetry constraints. For EMD-8123, PDB entry 5it7 was refined using the half maps without symmetry constraints. After the refinement, the ligand and water atoms were omitted and the F_{o} − F_{c} maps were calculated. Map values were normalized within a mask. Since a suitable mask for EMD-22898 was not available in the EMDB, one was calculated from half-map correlation using EMDA.
The weighting and sharpening scheme in Servalcat was compared with alternatives using no weights or (FSC_{full})^{1/2} weights (Rosenthal & Henderson, 2003), both with sharpening by the overall B value as determined from Wilson plot fitting by RELION (Supplementary Figs. S1 and S2). Especially in the case of EMDB entry EMD-8123 (Supplementary Fig. S2), sharpening by the overall B value obtained by line fitting gave over-sharpened maps.
5.2. F_{o} − F_{c} map for detecting model errors
In crystallography, F_{o} − F_{c} maps are almost always used for manual and automatic model rebuilding. Strong negative density usually indicates that parts of the model should be moved away or removed, while strong positive density implies that there are unmodelled atoms. The F_{o} − F_{c} map is typically updated after every refinement session, and refinement may be stopped when there are no significant strong peaks.
The same practice is possible in SPA. Fig. 3 illustrates the use of the F_{o} − F_{c} map for detecting model errors using EMDB entry EMD-0919 and PDB entry 6lmt (Demura et al., 2020). Chain A of the model was refined using the half maps under C8 symmetry constraints. After the refinement, the F_{o} − F_{c} map was calculated and normalized using the standard deviation of the region within the EMDB-deposited mask. In this example, it is clear from the positive and negative difference peaks that the tryptophan and methionine side chains should be repositioned. The weighting and sharpening schemes are compared in Supplementary Fig. S3, demonstrating that appropriate weighting can increase the interpretability of maps.
5.3. Hydrogen density analysis
Nakane et al. (2020) reported convincing densities for H atoms in apoferritin and GABA_{A}R maps by cryo-EM SPA at 1.2 and 1.7 Å resolution, respectively. It is natural to ask what is the lowest resolution at which H atoms can be seen in cryo-EM SPA using currently available computational tools.
Here, we analyzed apoferritin maps from the EMDB to see if and when hydrogen densities could be observed. There are 25 mouse or human apoferritin entries at resolutions better than 2.1 Å, of which 19 had half maps and were used in the analysis (Table 1). Chain A of each model was refined using the half maps under O symmetry constraints. If there was no corresponding PDB entry, PDB entry 7a4m or 6z6u was placed in the map using MOLREP (Vagin & Teplyakov, 2010) followed by jiggle fit in Coot (Brown et al., 2015) before full atomic refinement. After ten cycles of refinement with REFMAC5, an F_{o} − F_{c} map was calculated and normalized within the mask. Riding H atoms were used in the refinement (so they are not refined, but generated at fixed positions; this is the default in REFMAC5) and they were omitted for F_{o} − F_{c} map calculation. Peaks of ≥2σ and ≥3σ were detected using PEAKMAX from the CCP4 package (Winn et al., 2011), and were associated with hydrogen positions if the distance from the peak was less than 0.3 Å. H atoms having multiple potential minima (such as those in hydroxyl, sulfhydryl or carboxyl groups) were ignored in the analysis. The ratios of the number of hydrogen peaks to the number of H atoms in the model are plotted in Fig. 4(a). The result shows that the 1.25 Å resolution data gave the highest ratio of ∼70% hydrogens detected (Fig. 5a). Even at 1.84 Å resolution approximately 17% of the H atoms may be found (Fig. 5b), while at 2.0 or 2.1 Å resolution only a few H atoms are visible in the map (Fig. 5c). The weighting and sharpening schemes are compared in Supplementary Figs. S4–S6. Note that there may be false positives due to, for example, alternative conformations or inaccuracies in the model.
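The peak-to-hydrogen association described above can be sketched in Python. The sketch assumes that peaks are given as (position, height in sigma) pairs and that an H atom counts as detected when at least one peak above the threshold lies within 0.3 Å of its model position. The function name `fraction_of_h_detected` and the input layout are hypothetical; this is not the PEAKMAX-based script used for the analysis.

```python
import math


def fraction_of_h_detected(h_positions, peaks, min_sigma=2.0, max_dist=0.3):
    """Fraction of H atoms with a difference-map peak nearby.

    h_positions: list of (x, y, z) model positions of H atoms in A.
    peaks: list of ((x, y, z), height_in_sigma) tuples.
    """
    detected = 0
    for h in h_positions:
        # An H atom is counted once, however many peaks are close to it
        if any(height >= min_sigma and math.dist(h, xyz) < max_dist
               for xyz, height in peaks):
            detected += 1
    return detected / len(h_positions)
```

Calling the function with `min_sigma=3.0` gives the stricter of the two thresholds used in the text.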

In addition, F_{o} − F_{c} maps were generated from the 1.2 Å resolution data (PDB entry 7a4m; EMDB entry EMD-11638) using several different resolution cutoffs. These were analysed in the same way (Fig. 4c), along with F_{c} maps calculated from the PDB entry 7a4m model at the same resolutions (Fig. 4d). Figs. 4(c) and 4(d) show that if the cryo-EM experiment and atomic model refinement are carried out carefully, with due attention to ADPs, then some H atoms can be seen even at 2.0 Å resolution.
For comparison, we performed the same analysis using X-ray crystallographic data for (apo)ferritins deposited in the PDB. 51 re-refined atomic models available in the PDB-REDO database (Joosten et al., 2012) were downloaded, crystallographic mF_{o} − DF_{c} maps were calculated using REFMAC5 and density peaks for H atoms were analysed as just described. The result (Fig. 4b) confirms that, as expected, H atoms are more visible in EM than using X-rays.
6. Conclusions
A new program, Servalcat, for the refinement and validation of atomic models using cryo-EM SPA maps has been developed. The program controls the refinement flow and performs difference-map calculations. A weighted and sharpened F_{o} − F_{c} map was derived as a validation tool, obtained from the posterior distribution of F_{T} and an approximation of an overall blurring factor calculated from the variance of the signal. We showed that such maps are useful to visualize H atoms and model errors, as in crystallography.
In this work, we assumed the blurring factor k was position-independent (see Section 3). However, in reality, blurring of maps is position- and direction-dependent, for example due to the varying mobility of different domains and/or uncertainty in the particle alignments. For such regions k should ideally be replaced with k_{local}, derived from a local map blurring parameter B_{local} according to k_{local}(s) = exp(−B_{local}|s|^{2}/4) (if isotropic) or exp(−s^{T}B_{local}s/4) (if anisotropic). If we could estimate B_{local} values, then we would be able to use them for the visual improvement of maps. This is especially important for identifying weak densities. We are working on this subject.
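The two forms of the local blurring factor can be compared with a short Python sketch. It assumes k_{local} = exp(−B|s|^{2}/4) for a scalar B and exp(−s^{T}Bs/4) for a symmetric 3 × 3 matrix B, with s in Å^{−1} and B in Å^{2}. The function names are hypothetical, and estimating B_{local} itself is outside the scope of the sketch.

```python
import math


def k_local_isotropic(b, s_vec):
    """Blurring factor exp(-B |s|^2 / 4) for a scalar B."""
    s2 = sum(c * c for c in s_vec)
    return math.exp(-b * s2 / 4.0)


def k_local_anisotropic(b_matrix, s_vec):
    """Blurring factor exp(-s^T B s / 4) for a symmetric 3x3 B."""
    quad = sum(s_vec[i] * b_matrix[i][j] * s_vec[j]
               for i in range(3) for j in range(3))
    return math.exp(-quad / 4.0)
```

Dividing Fourier coefficients by such a factor sharpens the corresponding region, while multiplying by it blurs the region; the anisotropic form reduces to the isotropic one when B is a multiple of the identity matrix.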
We showed that many H atoms may be observed in the difference maps, even up to a resolution of 2 Å. We would expect that they should also be visible in electron diffraction (MicroED) experiments. However, high accuracy would be needed in the experiment, data analysis and model refinement in both MicroED and cryo-EM SPA to achieve this experimentally. For example, the electron dose in cryo-EM experiments is often high enough to cause radiation damage (Hattne et al., 2018); H atoms are known to suffer from radiation damage (Leapman & Sun, 1995) and this would hinder their detection. Lower dose experiments might be needed for more reliable identification of hydrogen, even at the expense of resolution.
Symmetry is widely used in cryo-EM SPA. When symmetry is imposed in the reconstruction, it should be used throughout the downstream analyses, and all software tools should be aware of it and take it into account. The asymmetric unit model should be refined under symmetry constraints, and it should be deposited in the PDB with the correct annotation of the symmetry. The PDB and EMDB deposition system will need to validate the symmetry of both the model and the map. We hope that this will become common practice in the future. The same practice should be established for helical reconstructions, in which symmetry is described by the axial symmetry type (Cn or Dn), twist and rise (He & Scheres, 2017). Servalcat will support helical symmetry in the future.
Servalcat is freely available under an open source (MPL-2.0) licence at https://github.com/keitaroyam/servalcat. The features described in this paper have been implemented in REFMAC 5.8.0291 and Servalcat 0.2.0 (which requires GEMMI 0.4.9). Servalcat is also available in the latest nightly builds of the CCP-EM suite and will be included in the upcoming version 1.6 release.
Supporting information
Supplementary Figures. DOI: https://doi.org/10.1107/S2059798321009475/qt5003sup1.pdf
Footnotes
^{1}There is a similar record, BIOMT, which encodes the biological assembly. In SPA, the symmetry of the map usually corresponds to the biological assembly, but this is not always the case. Both MTRIX and BIOMT records are generally required during deposition.
^{2}Technically, fixed position constraints would be more appropriate here. We used restraints instead of constraints for simplicity of implementation. In the future, we will implement the use of constraints instead.
Acknowledgements
The authors are grateful to Marcin Wojdyr for the implementation of F_{c} calculation for EM in the GEMMI library, Takanori Nakane for critical reading of the manuscript, computational structural biology group members for discussion, and Jake Grimmett and Toby Darling from the MRC–LMB Scientific Computing Department for computing support and resources.
Funding information
This work was supported by the Medical Research Council as part of UK Research and Innovation (MC_UP_A025_1012 to KY and GNM; MR/V000403/1 to CMP and TB).
References
Afonine, P. V., Poon, B. K., Read, R. J., Sobolev, O. V., Terwilliger, T. C., Urzhumtsev, A. & Adams, P. D. (2018). Acta Cryst. D74, 531–544.
Bai, X.-C., McMullan, G. & Scheres, S. H. W. (2015). Trends Biochem. Sci. 40, 49–57.
Bethe, H. (1930). Ann. Phys. 397, 325–400.
Brown, A., Long, F., Nicholls, R. A., Toots, J., Emsley, P. & Murshudov, G. (2015). Acta Cryst. D71, 136–153.
Burnley, T., Palmer, C. M. & Winn, M. (2017). Acta Cryst. D73, 469–477.
Chen, V. B., Arendall, W. B., Headd, J. J., Keedy, D. A., Immormino, R. M., Kapral, G. J., Murray, L. W., Richardson, J. S. & Richardson, D. C. (2010). Acta Cryst. D66, 12–21.
Chojnowski, G., Sobolev, E., Heuser, P. & Lamzin, V. S. (2021). Acta Cryst. D77, 142–150.
Clabbers, M. T. B. & Abrahams, J. P. (2018). Crystallogr. Rev. 24, 176–204.
Cragnolini, T., Sahota, H., Joseph, A. P., Sweeney, A., Malhotra, S., Vasishtan, D. & Topf, M. (2021). Acta Cryst. D77, 41–47.
Danev, R., Yanagisawa, H. & Kikkawa, M. (2019). Trends Biochem. Sci. 44, 837–848.
Danev, R., Yanagisawa, H. & Kikkawa, M. (2021). Microscopy, dfab016.
Demura, K., Kusakizako, T., Shihoya, W., Hiraizumi, M., Nomura, K., Shimada, H., Yamashita, K., Nishizawa, T., Taruno, A. & Nureki, O. (2020). Sci. Adv. 6, eaba8105.
Fislage, M., Shkumatov, A. V., Stroobants, A. & Efremov, R. G. (2020). IUCrJ, 7, 707–718.
Guo, H., Franken, E., Deng, Y., Benlekbir, S., Singla Lezcano, G., Janssen, B., Yu, L., Ripstein, Z. A., Tan, Y. Z. & Rubinstein, J. L. (2020). IUCrJ, 7, 860–869.
Hattne, J., Shi, D., Glynn, C., Zee, C.-T., Gallagher-Jones, M., Martynowycz, M. W., Rodriguez, J. A. & Gonen, T. (2018). Structure, 26, 759–766.
He, S. & Scheres, S. H. W. (2017). J. Struct. Biol. 198, 163–176.
Heymann, J. B., Chagoyen, M. & Belnap, D. M. (2005). J. Struct. Biol. 151, 196–207.
Hoh, S. W., Burnley, T. & Cowtan, K. (2020). Acta Cryst. D76, 531–541.
Jakobi, A. J., Wilmanns, M. & Sachse, C. (2017). eLife, 6, e27131.
Joosten, R. P., Joosten, K., Murshudov, G. N. & Perrakis, A. (2012). Acta Cryst. D68, 484–496.
Joseph, A. P., Lagerstedt, I., Jakobi, A., Burnley, T., Patwardhan, A., Topf, M. & Winn, M. (2020). J. Chem. Inf. Model. 60, 2552–2560.
Kato, T., Makino, F., Nakane, T., Terahara, N., Kaneko, T., Shimizu, Y., Motoki, S., Ishikawa, I., Yonekura, K. & Namba, K. (2019). Microsc. Microanal. 25, 998–999.
Kern, D. M., Sorum, B., Mali, S. S., Hoel, C. M., Sridharan, S., Remis, J. P., Toso, D. B., Kotecha, A., Bautista, D. M. & Brohawn, S. G. (2021). Nat. Struct. Mol. Biol. 28, 573–582.
Leapman, R. D. & Sun, S. (1995). Ultramicroscopy, 59, 71–79.
Luzzati, V. (1952). Acta Cryst. 5, 802–810.
Mott, N. F. & Bragg, W. L. (1930). Proc. R. Soc. London A, 127, 658–665.
Murray, J., Savva, C. G., Shin, B.-S., Dever, T. E., Ramakrishnan, V. & Fernández, I. S. (2016). eLife, 5, e13567.
Murshudov, G. N. (2016). Methods Enzymol. 579, 277–305.
Murshudov, G. N., Skubák, P., Lebedev, A. A., Pannu, N. S., Steiner, R. A., Nicholls, R. A., Winn, M. D., Long, F. & Vagin, A. A. (2011). Acta Cryst. D67, 355–367.
Nakane, T., Kotecha, A., Sente, A., McMullan, G., Masiulis, S., Brown, P. M. G. E., Grigoras, I. T., Malinauskaite, L., Malinauskas, T., Miehling, J., Uchański, T., Yu, L., Karia, D., Pechnikova, E. V., de Jong, E., Keizer, J., Bischoff, M., McCormack, J., Tiemeijer, P., Hardwick, S. W., Chirgadze, D. Y., Murshudov, G., Aricescu, A. R. & Scheres, S. H. W. (2020). Nature, 587, 152–156.
Naydenova, K., Peet, M. J. & Russo, C. J. (2019). Proc. Natl Acad. Sci. USA, 116, 11718–11724.
Nicholls, R. A., Long, F. & Murshudov, G. N. (2012). Acta Cryst. D68, 404–417.
Nicholls, R. A., Tykac, M., Kovalevskiy, O. & Murshudov, G. N. (2018). Acta Cryst. D74, 492–505.
Nicholls, R. A., Wojdyr, M., Joosten, R. P., Catapano, L., Long, F., Fischer, M., Emsley, P. & Murshudov, G. N. (2021). Acta Cryst. D77, 727–745.
Pintilie, G., Zhang, K., Su, Z., Li, S., Schmid, M. F. & Chiu, W. (2020). Nat. Methods, 17, 328–334.
Ramírez-Aportela, E., Vilas, J. L., Glukhova, A., Melero, R., Conesa, P., Martínez, M., Maluenda, D., Mota, J., Jiménez, A., Vargas, J., Marabini, R., Sexton, P. M., Carazo, J. M. & Sorzano, C. O. S. (2019). Bioinformatics, 36, 765–772.
Ramlaul, K., Palmer, C. M. & Aylett, C. H. (2019). J. Struct. Biol. 205, 30–40.
R Core Team (2020). R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing.
Rosenthal, P. B. & Henderson, R. (2003). J. Mol. Biol. 333, 721–745.
Scheres, S. H. W. (2012). J. Struct. Biol. 180, 519–530.
Schrodinger, LLC (2020). The PyMOL Molecular Graphics System, Version 2.4.
Tagari, M., Newman, R., Chagoyen, M., Carazo, J.-M. & Henrick, K. (2002). Trends Biochem. Sci. 27, 589.
Tan, Y. Z. & Rubinstein, J. L. (2020). Acta Cryst. D76, 1092–1103.
Terwilliger, T. C., Adams, P. D., Afonine, P. V. & Sobolev, O. V. (2018a). Nat. Methods, 15, 905–908.
Terwilliger, T. C., Sobolev, O. V., Afonine, P. V. & Adams, P. D. (2018b). Acta Cryst. D74, 545–559.
Terwilliger, T. C., Sobolev, O. V., Afonine, P. V., Adams, P. D. & Read, R. J. (2020). Acta Cryst. D76, 912–925.
Tickle, I. J. (2012). Acta Cryst. D68, 454–467.
Tronrud, D. E. (2004). Acta Cryst. D60, 2156–2168.
Tykac, M. (2018). PhD thesis. University of Cambridge. https://doi.org/10.17863/CAM.31783.
Vagin, A. & Teplyakov, A. (2010). Acta Cryst. D66, 22–25.
Warshamanage, R., Yamashita, K. & Murshudov, G. N. (2021). bioRxiv, 2021.07.26.453750.
Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. New York: Springer.
Williams, C. J., Headd, J. J., Moriarty, N. W., Prisant, M. G., Videau, L. L., Deis, L. N., Verma, V., Keedy, D. A., Hintze, B. J., Chen, V. B., Jain, S., Lewis, S. M., Arendall, W. B. III, Snoeyink, J., Adams, P. D., Lovell, S. C., Richardson, J. S. & Richardson, D. C. (2018). Protein Sci. 27, 293–315.
Wilson, A. J. C. (1942). Nature, 150, 152.
Winn, M. D., Ballard, C. C., Cowtan, K. D., Dodson, E. J., Emsley, P., Evans, P. R., Keegan, R. M., Krissinel, E. B., Leslie, A. G. W., McCoy, A., McNicholas, S. J., Murshudov, G. N., Pannu, N. S., Potterton, E. A., Powell, H. R., Read, R. J., Vagin, A. & Wilson, K. S. (2011). Acta Cryst. D67, 235–242.
Wlodawer, A. & Dauter, Z. (2017). Acta Cryst. D73, 379–380.
Wu, M., Lander, G. C. & Herzik, M. A. (2020). J. Struct. Biol. X, 4, 100020.
Yip, K. M., Fischer, N., Paknia, E., Chari, A. & Stark, H. (2020). Nature, 587, 157–161.
Zivanov, J., Nakane, T., Forsberg, B. O., Kimanius, D., Hagen, W. J., Lindahl, E. & Scheres, S. H. W. (2018). eLife, 7, e42166.
This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.