## research papers

## Automated map sharpening by maximization of detail and connectivity

Thomas C. Terwilliger,^{a,b*} Oleg V. Sobolev,^{c} Pavel V. Afonine^{c,d} and Paul D. Adams^{d,e}

^{a}Bioscience Division, Los Alamos National Laboratory, Mail Stop M888, Los Alamos, NM 87545, USA, ^{b}New Mexico Consortium, Los Alamos, NM 87544, USA, ^{c}Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA, ^{d}Department of Bioengineering, University of California Berkeley, Berkeley, California, USA, and ^{e}Department of Physics and International Centre for Quantum and Molecular Structures, Shanghai University, Shanghai, 200444, People's Republic of China

^{*}Correspondence e-mail: tterwilliger@newmexicoconsortium.org

An algorithm for automatic map sharpening is presented that is based on optimization of the detail and connectivity of the sharpened map. The detail in the map is reflected in the surface area of an iso-contour surface that contains a fixed fraction of the volume of the map, where a map with a high level of detail has a high surface area. The connectivity of the sharpened map is reflected in the number of connected regions defined by the same iso-contour surfaces, where a map with high connectivity has a small number of connected regions. By combining these two measures in a metric termed the `adjusted surface area', map quality can be evaluated in an automated fashion. This metric was used to choose optimal map-sharpening parameters without reference to a model or other interpretations of the map. Map sharpening by optimization of the adjusted surface area can be carried out for a map as a whole or locally, yielding a locally sharpened map. To evaluate the performance of various approaches, we used a simple metric based on map–model correlation that can reproduce visual choices of optimally sharpened maps. The map–model correlation is calculated using a model with *B* factors (atomic displacement factors; ADPs) set to zero. Using this model-based metric to evaluate map-sharpening approaches, we found that optimization of the adjusted surface area can be an effective tool for map sharpening.

### 1. Introduction

Current methods for single-particle reconstruction of cryo-EM maps are capable of yielding maps with resolutions that are often better than 4.5 Å and are sometimes 2 Å or better (see Kühlbrandt, 2014; Baldwin *et al.*, 2018). The level of noise in the images used in reconstruction and in the resulting maps is highly resolution-dependent. For this reason, it is standard practice to represent a map as a Fourier series, estimate the signal and noise in the reconstruction as a function of resolution, and use this information to weight the Fourier terms as a function of resolution to maximize the interpretability of the final sharpened (or blurred) map (Rosenthal & Henderson, 2003). As the various errors in reconstruction are difficult to estimate accurately, other approaches for resolution-dependent weighting have also been considered. For example, feature-enhanced maps (FEMs; Afonine *et al.*, 2015) and the model-building software *Coot* (Emsley *et al.*, 2010) maximize the kurtosis of a map to choose an overall sharpening *B* factor *B*_{sharpen} (an exponential factor applied to Fourier terms; see DeLaBarre & Brunger, 2006; Wlodawer *et al.*, 2008). Nicholls *et al.* (2012) developed procedures for optimizing anisotropic versions of displacement factors by treating sharpening as an inverse deblurring problem. Joseph *et al.* (2016) used the method of Rosenthal and Henderson for map sharpening during the refinement of macromolecular structures. Burnley *et al.* (2017) have recently noted that optimizing map sharpening remains an open challenge, commenting that

> Presently, the optimum sharpening coefficient (where `optimum' means maximizing the interpretable density features) cannot be analytically determined either locally or globally, although attempts are ongoing.

*et al.* (2005) used the resolution-dependence of a model-based map in a sharpening procedure, and recently Jakobi *et al.* (2017) applied such a sharpening procedure locally to optimize the contrast and interpretability of density maps. Sharpening is also commonly applied in X-ray analysis. DeLaBarre & Brunger (2006) suggested strongly sharpening low-resolution maps; their procedure leads to an overall *B* value *B*_{iso} (the isotropic Wilson *B* factor, closely related to *B* factors or atomic displacement factors; see DeLaBarre & Brunger, 2006; Wlodawer *et al.*, 2008) of about zero. The *PHENIX* (Adams *et al.*, 2010) tools *AutoSol* (Terwilliger *et al.*, 2009) and *AutoBuild* (Terwilliger *et al.*, 2009) use map sharpening routinely in automated map interpretation and sharpen maps to an overall *B* value numerically equal to ten times the resolution in Å (*e.g.* *B*_{iso} = 40 Å^{2} at a resolution of 4 Å; Terwilliger *et al.*, 2008). Liu & Xiong (2014) applied sharpening to nearly 2000 X-ray maps and found that the map–model correlation could generally be improved through map sharpening.

Here, we present a model-free algorithm for optimizing the sharpening of a map that is based on simultaneously maximizing the level of detail in the map and the connectivity of the map. We show that this procedure can be an effective tool for map sharpening.

### 2. Methods

#### 2.1. Map sharpening and blurring

Prior to map sharpening or blurring, maps are (by default) corrected for anisotropy (Zwart *et al.*, 2005), where the final isotropic *B* value *B*_{iso} is set to be the average of the three diagonal terms in the matrix representing the anisotropic *B* factor.

We use a four-parameter function for map sharpening. The map is represented as a Fourier series and is typically sharpened at lower resolution and then blurred at higher resolution, although the map can also be blurred at all resolutions. The four parameters are a sharpening *B* factor (*B*_{sharpen}) applied at lower resolutions, a blurring *B* factor (*B*_{blur}) applied at high resolution, a transition resolution (*d*_{cut}) and a transition parameter (*k*) used to define the resolution range at which the transition between sharpening and blurring occurs. Note that here by `sharpening' we mean either overall sharpening or blurring. If the map is sharpened the sharpening *B* factor (*B*_{sharpen}) is positive; if the map is blurred it is negative. If the map is blurred, no additional blurring *B* factor is applied at high resolution.

The sharpening *B* factor (*B*_{sharpen}) is applied to amplitudes in the Fourier series representing the map through a resolution-dependent sharpening scale factor *A*_{sharpen}(*d*), where *d* is the resolution of a Fourier term,

$$A_{\rm sharpen}(d) = \exp\left(\frac{B_{\rm sharpen}}{4d^{2}}\right). \tag{1}$$

If the value of the *B* factor (*B*_{sharpen}) is positive, so that the map is being sharpened (amplitudes increasing at high resolution), a blurring scale factor is applied at high resolution as a kind of soft resolution limit. The blurring scale factor *A*_{blur}(*d*) is given by

$$A_{\rm blur}(d) = \exp\left(-\frac{B_{\rm blur}}{4d^{2}}\right). \tag{2}$$

The resolution where the scale factors change from sharpening to blurring is defined by the resolution cutoff (*d*_{cut}) and the sharpness of the transition is controlled by the transition parameter (*k*). The overall scale factor *A*(*d*) has the form

where the resolution-dependent weighting factors *w*_{sharpen}(*d*) and *w*_{blur}(*d*) are given by

Examining (4*a*) it can be appreciated that for low-resolution terms where *d* is much larger than *d*_{cut} the exponential term will be very small, so that the value of *w*_{sharpen}(*d*) will be close to unity and *w*_{blur}(*d*) will be close to zero. At the transition resolution *d*_{cut} the weights on sharpening and blurring will be equal, and at high resolutions the blurring weight *w*_{blur}(*d*) will be close to unity.

Typically, the transition resolution *d*_{cut} is taken to be the same as the resolution of the map, the blurring *B* value has a default value of *B*_{blur} = 200 Å^{2} (a value that leads to blurring by a factor of 250 at a resolution of 3 Å, for example) and the transition parameter has a default value of *k* = 10 Å^{−1}, leading to a 90% completion of the transition between sharpening and blurring over the resolution range *d*_{cut} − 0.2 Å to *d*_{cut} + 0.2 Å. This leaves the sharpening *B* value (*B*_{sharpen}) as the only parameter that is normally adjusted to optimize a map. Optionally, two of the other parameters can be optimized at present (*d*_{cut} and *k*). These optimizations are not normally used because in a large-scale test with 345 cryo-EM maps the average map–model correlation (using *B* values of zero for the model) after sharpening with and without these optimizations was essentially identical.
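As a quick numerical check of the default blurring quoted above, the minimal sketch below evaluates the two scale factors, assuming the standard exponential forms *A*_{sharpen}(*d*) = exp(*B*_{sharpen}/4*d*^{2}) and *A*_{blur}(*d*) = exp(−*B*_{blur}/4*d*^{2}), with *d* in Å and *B* values in Å^{2}. The function names are illustrative only, and the weighted transition between the two factors around *d*_{cut} is not reproduced here.

```python
import math


def a_sharpen(d, b_sharpen):
    """Sharpening scale factor exp(B_sharpen / (4 d^2)); d in A, B in A^2."""
    return math.exp(b_sharpen / (4.0 * d * d))


def a_blur(d, b_blur=200.0):
    """Blurring scale factor exp(-B_blur / (4 d^2)); default B_blur = 200 A^2."""
    return math.exp(-b_blur / (4.0 * d * d))


# With B_blur = 200 A^2, amplitudes at d = 3 A are divided by about 259,
# i.e. the "factor of 250" quoted in the text.
print(round(1.0 / a_blur(3.0)))  # → 259

# A positive B_sharpen boosts high-resolution terms more than low-resolution ones.
print(a_sharpen(3.0, 50.0) > a_sharpen(6.0, 50.0) > 1.0)  # → True
```

Because both factors are exponentials in 1/*d*^{2}, sharpening and blurring with the same *B* value cancel exactly, which is why a single signed *B* describes either operation.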

#### 2.2. Description of overall map sharpness

We represent a map as a Fourier series and then describe the resolution-dependent increase or decrease in mean amplitudes using an overall `*B* factor' (Wlodawer *et al.*, 2008). In essence, this description assumes that the amplitudes of the Fourier terms fall off with resolution according to

$$F(d) \propto \exp\left(-\frac{B_{\rm iso}}{4d^{2}}\right),$$

where *F*(*d*) is the mean amplitude in a shell of resolution *d* and *B*_{iso} is the overall *B* factor (where iso stands for isotropic). This *B* factor *B*_{iso} is calculated for an unsharpened or sharpened map by fitting an anisotropic *B* factor to the map coefficients (Fourier terms; Zwart *et al.*, 2005) and using the average of the three diagonal terms in the matrix representing the anisotropic *B* factor as *B*_{iso}.
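The description above implies ln *F*(*d*) = const − *B*_{iso}/4*d*^{2}, so an overall *B* can be read off a straight-line fit of ln *F* against 1/*d*^{2}. The sketch below is a simplified isotropic version under that assumption; the procedure in the text instead fits an anisotropic *B* factor and averages its three diagonal terms. The function name is hypothetical.

```python
import numpy as np


def fit_b_iso(d_shells, mean_amplitudes):
    """Fit ln F(d) = c - B_iso / (4 d^2) by least squares; return B_iso (A^2)."""
    s2 = 1.0 / np.asarray(d_shells, dtype=float) ** 2    # 1/d^2 for each shell
    log_f = np.log(np.asarray(mean_amplitudes, dtype=float))
    slope, _intercept = np.polyfit(s2, log_f, 1)          # ln F = slope * s2 + c
    return -4.0 * slope


# Synthetic shells that follow F(d) = 100 exp(-60 / (4 d^2)) exactly.
d = np.linspace(10.0, 3.0, 15)
f = 100.0 * np.exp(-60.0 / (4.0 * d ** 2))
print(round(fit_b_iso(d, f), 3))   # → 60.0
```

A negative result indicates a map whose mean amplitudes increase with resolution, as for the strongly sharpened maps discussed in §3.1.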

#### 2.3. Map sharpening by maximization of the adjusted surface area

Our core algorithm for map sharpening is to evaluate the interpretability of a sharpened map based on its level of detail and its connectivity (see Fig. 1). The level of detail is derived from the surface area of iso-contour surfaces enclosing a fixed volume of the map (by default 20% of the volume occupied by the macromolecule; see below). The connectivity is derived from the number of contiguous regions enclosed by these iso-contour surfaces. The iso-contour surfaces are simply contours such as those that would be displayed by software such as *Coot* (Emsley *et al.*, 2010) or *Chimera* (Pettersen *et al.*, 2004) at a given threshold level. The surface area (SA) of a set of iso-contour surfaces is taken to be the number of grid points that are outside the iso-contour surface and adjacent to a grid point inside the surface. The adjusted surface area (SA_{adjusted}) is the surface area minus the number of contiguous regions (*N*_{regions}) inside iso-contour surfaces (scaled with a factor *C*_{scale} as described below),

$$\mathrm{SA}_{\rm adjusted} = \mathrm{SA} - C_{\rm scale}\,N_{\rm regions}.$$

The key parameter in constructing the iso-contour surfaces is the threshold at which the surfaces are drawn. We set this threshold to a value that places a fixed percentage of the molecular volume inside the contours. This requires an estimate of the volume occupied by the molecule(s) in the map. This information can be input directly, but by default we estimate it in two steps. We first assume that some part of the map contains the molecule of interest (and therefore has a high variability of density) and that the remainder of the map (often the vast majority of grid points in the map) is essentially empty (flat). The volume occupied by the molecule is then identified using tools developed for the identification of the region occupied by the macromolecule in crystallographic maps (Wang, 1985; Terwilliger, 1999). The region containing the macromolecule is identified as the region in which the locally smoothed squared density is high, indicating a high variability in density. Cryo-EM maps typically have a very high contrast between the region containing the macromolecule and the region outside it, so the two regions are generally easy to distinguish. The region containing the macromolecule is estimated in a probabilistic fashion based on a guess (in practice, several guesses of widely varying volume fractions) of the fraction of the map volume inside the macromolecule (Terwilliger, 1999). If this fraction is underestimated, the region identified will typically be larger than the initial estimate. In essence, our procedure consists of identifying the region inside the macromolecule and updating the guess of the volume inside the macromolecule, cycling through this procedure several times. This calculation requires a smoothing radius, which by default is 1.5 times the resolution of the map. (This scale factor for the radius is a compromise between the high level of detail in the definition of the region inside the macromolecule that would be obtained with a radius equal to the resolution of the map and the much more robust definition of this region that can be obtained with a larger radius, such as twice the resolution of the map.) Once the region containing the macromolecule has been identified, the threshold is adjusted to yield a fixed fraction (typically 20%) of the volume of the molecule inside the iso-contour surfaces.
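One way to compute the threshold, the surface area and the number of regions on a grid is sketched below. It assumes that the molecular volume (in grid points) is already known, that "adjacent" means sharing a face with an inside grid point (six neighbors), and that regions are face-connected; the text does not state the connectivity used, and the function names are hypothetical.

```python
import numpy as np
from scipy import ndimage


def contour_threshold(rho, molecule_volume_points, fraction=0.2):
    """Density level that encloses `fraction` of the molecular volume (in grid points)."""
    n_target = max(1, int(round(fraction * molecule_volume_points)))
    return np.sort(rho, axis=None)[-n_target]   # n_target-th highest density value


def surface_area_and_regions(rho, threshold):
    """Return (SA, N_regions) for the iso-contour surface at `threshold`."""
    inside = rho >= threshold
    # SA: outside grid points that share a face with an inside grid point.
    shell = ndimage.binary_dilation(inside) & ~inside
    # N_regions: face-connected regions inside the contour (scipy's default 3D structure).
    _labels, n_regions = ndimage.label(inside)
    return int(shell.sum()), int(n_regions)


def adjusted_surface_area(sa, n_regions, c_scale):
    return sa - c_scale * n_regions


# Toy map: two separate 3x3x3 blocks of density in an empty box.
rho = np.zeros((20, 20, 20))
rho[5:8, 5:8, 5:8] = 1.0
rho[12:15, 12:15, 12:15] = 1.0
t = contour_threshold(rho, molecule_volume_points=270)   # 20% of 270 = 54 points
sa, n = surface_area_and_regions(rho, t)
print(t, sa, n, adjusted_surface_area(sa, n, c_scale=10.0))  # → 1.0 108 2 88.0
```

Each 3 × 3 × 3 block contributes 6 × 9 = 54 face-adjacent outside points, so splitting the same density into more pieces raises both SA and *N*_{regions}, which is exactly the trade-off the adjusted surface area is meant to balance.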

#### 2.4. Setting the scale factor between the surface area and the number of regions

In our procedure, the adjusted surface area SA_{adjusted} is normally calculated over a range of values of sharpening parameters for a particular map. We choose the scale factor *C*_{scale} by setting it to a value that yields the same value of the adjusted surface area for the most-sharpened version of a map as for the least-sharpened version of the same map. This therefore leads to a set of values for the adjusted surface area SA_{adjusted} *versus* the overall *B* factor *B*_{iso} in which the values of SA_{adjusted} are the same for the lowest and highest values of *B*_{iso}. Normally, owing to a sharp increase in the number of regions for low values of *B*_{iso}, the values of the adjusted surface area for intermediate values of the overall *B* factor are higher than those at the extremes. Our strategy for sharpening is to choose sharpening parameters that maximize the adjusted surface area SA_{adjusted}.
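The choice of *C*_{scale} and the selection of the optimum reduce to a few lines: equating SA − *C*_{scale}*N*_{regions} at the first and last trial gives *C*_{scale} = (SA_{first} − SA_{last})/(*N*_{first} − *N*_{last}). In this sketch the sharpening trials are assumed to be ordered by increasing *B*_{iso}, the function names are hypothetical, and the numbers in the example are invented purely to show a typical peaked curve.

```python
import numpy as np


def choose_c_scale(sa, n_regions):
    """C_scale giving equal SA_adjusted for the first and last sharpening trials."""
    sa = np.asarray(sa, dtype=float)
    n = np.asarray(n_regions, dtype=float)
    dn = n[0] - n[-1]
    if dn == 0:
        raise ValueError("N_regions is identical at the two extremes; C_scale undefined")
    return (sa[0] - sa[-1]) / dn


def optimal_b_iso(b_iso, sa, n_regions):
    """Return (B_iso maximizing SA_adjusted, C_scale, SA_adjusted values)."""
    c_scale = choose_c_scale(sa, n_regions)
    sa_adj = np.asarray(sa, dtype=float) - c_scale * np.asarray(n_regions, dtype=float)
    return b_iso[int(np.argmax(sa_adj))], c_scale, sa_adj


# Made-up values, ordered from most sharpened (low B_iso) to least sharpened.
b_iso = [-100, -50, 0, 50, 100, 150]
sa = [5000, 4800, 4500, 4000, 3400, 2800]
n_regions = [400, 150, 60, 40, 35, 30]
best, c_scale, sa_adj = optimal_b_iso(b_iso, sa, n_regions)
print(best, round(c_scale, 2))  # → 0 5.95
```

The sharp rise in *N*_{regions} at low *B*_{iso} is what pulls the adjusted surface area down at the sharpened end and produces the interior maximum.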

#### 2.5. Identifying cases where the algorithm is not applicable

Our procedure of maximizing the adjusted surface area is applicable in cases where the maximum of the adjusted surface area occurs at intermediate values of *B*_{iso} and where the maximum is clearly identifiable. We identify cases where the approach is not applicable in two ways. Firstly, if the scale factor *C*_{scale} would lead to an adjusted surface area at a preset intermediate value of the overall *B* factor *B*_{iso} (typically *B*_{iso} = 50 Å^{2}) that is less than the values at the extremes of *B*_{iso}, the approach is considered not to be applicable. Secondly, we estimate the signal and noise in the adjusted surface area, and we do not apply the procedure if the signal-to-noise ratio is less than a preset value (typically 3:1). The signal is taken to be the difference between the maximum value of the adjusted surface area and the value of the adjusted surface area at the extremes of *B*_{iso}. The noise is estimated from the local variation of the adjusted surface area for adjacent values of *B*_{iso}. The local variation is taken to be the difference between the value of the adjusted surface area SA_{adjusted} at a particular value of *B*_{iso} and the value of SA_{adjusted} interpolated between the two neighboring values of *B*_{iso}. The noise is then taken to be the r.m.s. value of this local variation.
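A possible implementation of the two applicability tests is sketched below. It assumes *B*_{iso} values sorted in increasing order with at least three trials, uses *B*_{iso} = 50 Å^{2} and a 3:1 ratio as defaults, and reads off the adjusted surface area at the intermediate value by linear interpolation; these details and the function name are our assumptions, not the published code.

```python
import numpy as np


def sharpening_applicable(b_iso, sa_adj, b_mid=50.0, min_ratio=3.0):
    """Return (applicable, signal/noise) for SA_adjusted sampled at increasing B_iso."""
    b = np.asarray(b_iso, dtype=float)
    v = np.asarray(sa_adj, dtype=float)
    extreme = 0.5 * (v[0] + v[-1])          # the two extremes are equal once C_scale is set
    # Test 1: SA_adjusted at an intermediate B_iso must not fall below the extremes.
    if np.interp(b_mid, b, v) < extreme:
        return False, 0.0
    # Noise: r.m.s. difference between each interior value and the linear
    # interpolation between its two neighbors.
    t = (b[1:-1] - b[:-2]) / (b[2:] - b[:-2])
    local = v[1:-1] - (v[:-2] + t * (v[2:] - v[:-2]))
    noise = float(np.sqrt(np.mean(local ** 2)))
    # Signal: height of the maximum above the extremes.
    signal = float(v.max() - extreme)
    if noise > 0:
        ratio = signal / noise
    else:
        ratio = np.inf if signal > 0 else 0.0
    # Test 2: the signal-to-noise ratio must reach the preset value.
    return ratio >= min_ratio, ratio


# A smooth, clearly peaked curve passes; a flat one does not.
b = np.arange(-100, 201, 25)
print(sharpening_applicable(b, 1000.0 - 0.01 * (b - 50.0) ** 2))
print(sharpening_applicable(b, np.full(b.shape, 500.0)))
```

Note that for a smooth curve the "noise" is not zero: curvature alone gives a residual against linear interpolation, so the ratio rewards a peak that is tall relative to its own local roughness.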

#### 2.6. Map sharpening based on kurtosis, half-map correlation or a model

We implemented methods for map sharpening based on kurtosis, on the correlation between half-maps and on map–model correlation. For sharpening based on kurtosis, the overall sharpening *B* value was simply adjusted to maximize the kurtosis of the map.
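Kurtosis-based sharpening can be sketched as a simple scan over trial *B* values. This illustration assumes an orthogonal map with cubic voxels held in a NumPy array, applies exp(*B*_{sharpen}/4*d*^{2}) to the Fourier terms with a hard resolution cutoff, and uses the excess kurtosis returned by `scipy.stats.kurtosis`; the grid of trial values and the helper names are hypothetical.

```python
import numpy as np
from scipy.stats import kurtosis


def sharpen_map(rho, voxel_size, b_sharpen, d_min):
    """Scale Fourier terms by exp(B_sharpen / (4 d^2)) and drop terms with d < d_min.

    Assumes an orthogonal grid with cubic voxels of edge `voxel_size` (A).
    """
    f = np.fft.fftn(rho)
    freqs = [np.fft.fftfreq(n, d=voxel_size) for n in rho.shape]
    sx, sy, sz = np.meshgrid(*freqs, indexing="ij")
    s2 = sx ** 2 + sy ** 2 + sz ** 2                # 1/d^2 for every Fourier term
    scale = np.exp(b_sharpen * s2 / 4.0)
    scale[s2 > 1.0 / d_min ** 2] = 0.0              # resolution cutoff
    return np.real(np.fft.ifftn(f * scale))


def best_b_by_kurtosis(rho, voxel_size, d_min, b_values):
    """Pick the B_sharpen (from a list of trial values) that maximizes map kurtosis."""
    k = [kurtosis(sharpen_map(rho, voxel_size, b, d_min), axis=None) for b in b_values]
    return b_values[int(np.argmax(k))], k


rng = np.random.default_rng(0)
rho = rng.normal(size=(16, 16, 16))        # stand-in for a real map
best_b, kurt = best_b_by_kurtosis(rho, voxel_size=1.0, d_min=3.0,
                                  b_values=[-50.0, 0.0, 50.0, 100.0])
```

A real implementation would more likely refine *B*_{sharpen} with a one-dimensional optimizer rather than a coarse grid; the grid keeps the sketch short.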

For half-map and map–model correlation a procedure similar to that developed by Rosenthal & Henderson (2003) was applied, except that the resolution-dependence of a model-based map was used as a normalization factor in each case (this normalization factor is similar to the scale factor used by Jakobi *et al.*, 2017). The map–map or map–model Fourier shell correlation (FSC) was calculated in shells of resolution. For half-map correlations, this correlation (CC) was converted into an estimate of the true correlation CC* between the full map and a perfect map using the formula (Rosenthal & Henderson, 2003)

$$\mathrm{CC}^{*} = \left(\frac{2\,\mathrm{CC}}{1+\mathrm{CC}}\right)^{1/2}.$$

For map–model correlations, by analogy to the σ_{A} analysis of errors in a crystallographic map (Read, 1986), the correlation (CC) was assumed to be related to the true correlation CC* by an exponential error function,

where the effective *B* value (*B*_{eff}) is estimated from a guess of the error in the model (r.m.s._{e}, assumed to be 1/4 the resolution of the map) using the relation

Finally, for each shell of resolution, the ratio (*R*) of the mean amplitude of the Fourier coefficients for a model-based map calculated with *B* values of zero to the mean amplitude of the Fourier terms representing the full starting map is calculated. Then, for half-map and map–model sharpening, the scale factor applied to all amplitudes in this shell of resolution is the product *R* × CC*. That is, amplitudes are increased for shells where the model-based amplitudes calculated with a *B* value of zero are larger than those obtained from the original map and are decreased for shells where the estimated accuracy of the map is low.
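For the half-map branch, the per-shell scale factor can be written compactly. The sketch assumes per-shell arrays of the half-map FSC and of the mean amplitudes of the zero-*B* model map and of the starting map; clipping negative FSC values to zero is our own choice. The map–model branch is omitted because the exponential relation between CC and CC* used there is not reproduced in this sketch, and the function names are hypothetical.

```python
import numpy as np


def cc_star_from_half_map_fsc(fsc):
    """CC* = sqrt(2 FSC / (1 + FSC)) (Rosenthal & Henderson, 2003); FSC clipped to [0, 1]."""
    fsc = np.clip(np.asarray(fsc, dtype=float), 0.0, 1.0)
    return np.sqrt(2.0 * fsc / (1.0 + fsc))


def shell_scale_factors(fsc_half, mean_amp_model_b0, mean_amp_map):
    """Per-shell scale factor R * CC*, with R = <|F_model(B=0)|> / <|F_map|>."""
    r = np.asarray(mean_amp_model_b0, dtype=float) / np.asarray(mean_amp_map, dtype=float)
    return r * cc_star_from_half_map_fsc(fsc_half)


# Three resolution shells (low to high resolution); values are illustrative.
print(shell_scale_factors([0.95, 0.6, 0.1], [10.0, 6.0, 3.0], [9.0, 4.0, 3.5]))
```

*R* restores the amplitude fall-off of an ideal zero-*B* map, while CC* suppresses shells that the half-maps indicate are unreliable.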

#### 2.7. Local map sharpening

We applied sharpening to overlapping local regions in a map and combined the resulting partial maps to yield a locally sharpened map. To focus on the regions containing density, a set of regions of density representing most of the density in the map was first identified. An overlapping set of boxes covering these regions of density was then extracted from the original map. The size of these boxes was typically 40 grid units on a side. The map corresponding to each box was sharpened, yielding a set of overlapping sharpened maps. The maps were combined by weighting overlapping regions based on the distance of each grid point from the center of the corresponding box, with an exponential fall-off of the weighting factor *w*,

where the characteristic distance *d*_{o} is set by default to the average nearest-neighbor distance between the centers of boxes. In our implementation, when local sharpening is carried out using the adjusted surface area as the optimization target, the same resolution cutoff is used throughout. The effectiveness of the procedure could likely be improved by using a suitable local resolution for each box.
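The blending step can be sketched as a distance-weighted average of the sharpened boxes. We assume here that the weight takes the form *w* = exp(−*r*/*d*_{o}), with *r* the distance in grid units from the box center; the exact functional form used in the implementation may differ, and the function name is hypothetical.

```python
import numpy as np


def combine_local_maps(shape, boxes, d0):
    """Blend overlapping, independently sharpened boxes into one map.

    boxes: list of (origin, sub_map) pairs, origin = grid index of the box corner.
    Each box contributes with weight w = exp(-r / d0), where r is the distance
    (in grid units) of a grid point from that box's center.
    """
    num = np.zeros(shape)
    den = np.zeros(shape)
    for origin, sub in boxes:
        idx = np.indices(sub.shape, dtype=float)
        center = (np.array(sub.shape, dtype=float) - 1.0) / 2.0
        r = np.sqrt(((idx - center[:, None, None, None]) ** 2).sum(axis=0))
        w = np.exp(-r / d0)
        region = tuple(slice(o, o + n) for o, n in zip(origin, sub.shape))
        num[region] += w * sub
        den[region] += w
    out = np.zeros(shape)
    covered = den > 0            # grid points not covered by any box stay at zero
    out[covered] = num[covered] / den[covered]
    return out


# Two boxes of constant density (1 and 3) overlapping along x.
a = ((0, 0, 0), np.ones((6, 4, 4)))
b = ((4, 0, 0), np.full((6, 4, 4), 3.0))
blended = combine_local_maps((10, 4, 4), [a, b], d0=3.0)
print(blended[:, 1, 1].round(2))
```

In the overlap, each grid point leans toward the box whose center is nearer, so values near a box edge (where that box's sharpening is least trustworthy) are dominated by its neighbor.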

#### 2.8. Map–model correlation using a model with *B* values of zero as a metric of map quality

It is useful to have a metric of map quality in order to compare different approaches for map sharpening. For this work, the metric of map quality is the correlation between the map of interest and a map calculated from a model in which all of the *B* factors are set to zero, using data to the same resolution as the map of interest. (This is done by taking the original map, converting it into a box of Fourier map coefficients, removing all terms that have a resolution higher than the defined value and finally computing a real-space map from the truncated set of Fourier map coefficients.)
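A minimal sketch of this metric is given below, assuming that the zero-*B* model map has already been computed on the same orthogonal grid (origin at index 0, cubic voxels) and that atomic coordinates are in Å in the same frame. Both maps are truncated at the same resolution, and the correlation is restricted to grid points within a chosen radius of an atom; the radius and the helper names are assumptions, since the text does not specify them.

```python
import numpy as np
from scipy.spatial import cKDTree


def truncate_resolution(rho, voxel_size, d_min):
    """Remove Fourier terms with d < d_min from a map on an orthogonal grid."""
    f = np.fft.fftn(rho)
    freqs = [np.fft.fftfreq(n, d=voxel_size) for n in rho.shape]
    sx, sy, sz = np.meshgrid(*freqs, indexing="ij")
    f[sx ** 2 + sy ** 2 + sz ** 2 > 1.0 / d_min ** 2] = 0.0
    return np.real(np.fft.ifftn(f))


def mask_near_atoms(shape, voxel_size, xyz, radius):
    """Boolean mask of grid points within `radius` A of any atom."""
    grid = np.indices(shape).reshape(3, -1).T * voxel_size   # grid coordinates in A
    dist, _ = cKDTree(xyz).query(grid)                       # distance to nearest atom
    return (dist <= radius).reshape(shape)


def map_model_cc(map_obs, map_model_b0, voxel_size, d_min, xyz, radius):
    """Correlation of two resolution-truncated maps over grid points near the model."""
    a = truncate_resolution(map_obs, voxel_size, d_min)
    b = truncate_resolution(map_model_b0, voxel_size, d_min)
    near = mask_near_atoms(map_obs.shape, voxel_size, xyz, radius)
    return float(np.corrcoef(a[near], b[near])[0, 1])
```

Because the model map has *B* = 0, its high-resolution terms are as strong as the resolution limit allows, so a map scores well only if it retains high-resolution signal in the right places.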

#### 2.9. Map–model pairs chosen from the Electron Microscopy Data Bank (EMDB) and the Protein Data Bank (PDB)

We began with all 1097 maps in the EMDB as of August 2017 that had an associated model in the PDB. We then removed 91 map–model pairs for which the resolutions reported in the PDB and the EMDB differed by 0.2 Å or more or were not reported, and selected all of the remaining pairs for which the reconstruction resolution was 4.5 Å or better, yielding 401 map–model pairs. We then removed 24 additional map–model pairs for which the map–model correlation was less than 0.3. Finally, we removed 16 map–model pairs for which the signal-to-noise criterion for applying our procedure was not satisfied (see §2.5), leaving 361 map–model pairs at resolutions from 1.8 to 4.5 Å. For some analyses only a subset of these data sets was tested. The comparison of automatic sharpening with maximization of map–model correlation used 345 of the data sets, excluding 16 data sets that were added to the analysis later. Comparisons involving half-data-set correlations include only the 59 data sets in our sample for which half-data sets had been deposited. The resolutions used in this work were those reported in the EMDB.

For each map–model pair the region of the map near the model was extracted, including density within 5 Å of any atom in the model, and placed in a new (typically smaller) box with the same gridding as the original map and a new origin at one corner of the map. This map and associated model (translated as necessary to match the new map) were used in the analyses described here.
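The extraction step can be approximated by cropping a padded bounding box around the model. This is a simplification under stated assumptions: an orthogonal grid whose index (0, 0, 0) coincides with the model origin, and a box that keeps all density inside the padded bounding box rather than only density within 5 Å of an atom (a distance mask could be applied afterwards). The function name is hypothetical.

```python
import numpy as np


def extract_box_near_model(rho, voxel_size, xyz, pad=5.0):
    """Cut out the grid region within `pad` A of the model's bounding box.

    Returns the new (smaller) map, with the same gridding and its origin at one
    corner, plus the model coordinates shifted into the new box frame.
    """
    xyz = np.asarray(xyz, dtype=float)
    lo = np.floor((xyz.min(axis=0) - pad) / voxel_size).astype(int)
    hi = np.ceil((xyz.max(axis=0) + pad) / voxel_size).astype(int) + 1
    lo = np.maximum(lo, 0)                       # stay inside the original map
    hi = np.minimum(hi, rho.shape)
    box = rho[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]].copy()
    return box, xyz - lo * voxel_size            # translate model to the new origin


rho = np.arange(30 ** 3, dtype=float).reshape(30, 30, 30)
box, new_xyz = extract_box_near_model(rho, 1.0, [[10.0, 10.0, 10.0], [12.0, 14.0, 10.0]])
print(box.shape, new_xyz.tolist())
```

Keeping the original gridding means no interpolation is needed, so the extracted map values are identical to those in the deposited map.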

### 3. Results and discussion

#### 3.1. Optimizing parameters for map sharpening/blurring by maximization of the adjusted surface area

The key idea in this work is that a map that is optimally sharpened will have more detail, leading to a high surface area of an iso-contour surface, yet at the same time it will have a high degree of connectivity, leading to a low number of regions enclosed by the same iso-contour surface. Figs. 2 and 3 illustrate these relationships for the cryo-EM reconstruction of the anthrax protective antigen pore (Jiang *et al.*, 2015), using the map EMD-6224 deposited in the EMDB (Lawson *et al.*, 2016) and the model PDB entry 3j9c deposited in the PDB (Berman *et al.*, 2000; Bernstein *et al.*, 1977), analysed at a nominal resolution of 2.9 Å. In each panel of Fig. 2 the deposited map is represented as a Fourier series up to a resolution of 2.9 Å and the amplitudes are adjusted with various `sharpening' *B* factors that emphasize or de-emphasize high-resolution information in the map.

In Fig. 2(*a*) the map is sharpened so that the overall resolution-dependence is approximated by a *B* factor of −100 Å^{2} (the amplitudes strongly increase at high resolution). In Fig. 2(*b*) the *B* factor is 60 Å^{2} (the amplitudes fall off somewhat at high resolution) and in Fig. 2(*c*) it is +150 Å^{2} (the amplitudes fall off substantially at high resolution). Examining Fig. 2(*a*), it can be seen that this highly sharpened map has a high level of detail but is also somewhat fragmented (the map has some breaks where there is no density at the locations of atoms in the model). Framed in terms of adjusted surface area, this map has a high surface area of the iso-contour surfaces, but it also has a large number of separate regions that are enclosed by these surfaces. Fig. 2(*b*) still has a high level of detail but has less fragmentation. Fig. 2(*c*) has much less detail and correspondingly less surface area of the iso-contour surfaces, but it also has less fragmentation than either Fig. 2(*a*) or Fig. 2(*b*).

Fig. 3 quantifies this analysis of the maps in Fig. 2. Fig. 3 shows the surface area, the number of regions and the adjusted surface area for a series of such maps with overall *B* factors varying from −100 to 300 Å^{2}. It can be seen that as the overall *B* factor for the map decreases (leading to an emphasis of high-resolution information), the surface area of the map increases. The number of regions inside iso-contour surfaces also increases in general; in particular, the number of regions becomes much larger very rapidly for maps with very low (or negative) overall *B* values. This rapid increase in the number of regions can be understood in terms of over-sharpening of the map and the associated fragmentation of regions of density.

The adjusted surface area is the surface area less a constant times the number of regions. This constant is set (see §2.4) for a particular map to yield the same value of the adjusted surface area for sharpening with the extreme low and high *B* values considered. It can be seen that after setting this constant (*C*_{scale}) the adjusted surface area SA_{adjusted} increases as a function of the overall *B* value of the map, comes to a maximal value and then decreases. In our procedure, the optimal values of the sharpening parameters are those that lead to a maximum of the adjusted surface area SA_{adjusted}. For the map of the anthrax protective antigen pore, this leads to an optimized map with an overall *B* value of about 20 Å^{2} (Fig. 2*e*).

Fig. 4 illustrates the resolution-dependence of the mean amplitude of Fourier coefficients for the deposited map of the anthrax protective antigen pore and compares them with those of the auto-sharpened map obtained on the basis of Fig. 3. It can be seen that in this case the auto-sharpened map has a resolution-dependence of amplitudes that is similar to that of the deposited map, except that owing to the blurring of the map near the high-resolution limit the amplitudes for the auto-sharpened map fall off in this resolution range. Figs. 2(*d*) and 2(*e*) show that the overall appearances of the deposited and auto-sharpened maps are quite similar in this case.

#### 3.2. Map–model correlation using a model with *B* factors of zero as a metric of map quality

It would be useful to have an overall metric of map quality that could be used to evaluate the quality of maps sharpened using our automated methods. As the maps we are analyzing in our tests have already been interpreted, structural models are available and can be used for this purpose. A challenge in map evaluation, even having a structural model, is that it is not exactly clear on what basis the map should be evaluated. Presumably the map should be evaluated based on its interpretability, but this could be visual interpretability, the quality of the model that can be automatically built into the map, or some other measure of interpretability.

For crystallographic maps, a commonly used metric of map quality is simply the map–model correlation (see, for example, Afonine *et al.*, 2018). This is calculated by creating a model-based map and obtaining the correlation between density values in the model-based map and those in the map of interest (for an analysis of related metrics, see Urzhumtsev *et al.*, 2014). Such a map–model correlation may be calculated for the map as a whole or just for grid points near atoms in the model. Normally, the atomic model used in this calculation will have been refined against X-ray data and will therefore have refined values of the *B* factors that match the resolution-dependence of the measured X-ray (intensity) data. The *B* factors used to generate the model-based map are normally the *B* factors that are associated with the refined model, and in this way the resolution-dependencies of the model-based map and of the original data are similar.

For cryo-EM maps the situation is more complicated because atomic models are typically refined not against primary data but against a map that has been modified with resolution-dependent scaling factors (see, for example, Rosenthal & Henderson, 2003). Additionally, the deposited maps may or may not be the maps used in refinement or map interpretation. Consequently, the *B* factors in refined cryo-EM models do not necessarily represent the resolution-dependence of the original images used in the reconstruction or that of the deposited maps.

To sidestep this issue, in this work we calculate model-based maps using *B* factors of zero. This map of course does not depend on the values of the *B* factors in the model and therefore it avoids the problems associated with the values of refined *B* factors. Further, it has the advantage that if the model were perfect this model-based map would (more or less) be the clearest map that could be created to represent it. The quality of a particular map of interest is then evaluated by calculating its correlation with such a zero *B*-factor model-based map, considering only points in the map that are near atoms in the model. (An alternative approach to using a model to evaluate a map would be to adjust the *B* factors in the model to maximize the map–model correlation. It is important however to note that in the present work the goal is to identify the optimal sharpening of a map, not to identify whether the map is accurate. In this context, the approach of adjusting *B* factors in the model to maximize the map–model correlation has the disadvantage that a map with slightly more accurate low-resolution information than high-resolution information could have a maximal map–model correlation when it is highly blurred, whereas the most useful map would be one that includes the high-resolution information.)

The key question about using model–map correlations based on a model with zero *B* factors to evaluate maps is whether this metric can in fact identify maps that have high interpretability. We attempt to address this issue here using two sets of variably sharpened maps. We first examine the maps visually, evaluating them based on the connectivity of density along the polypeptide chains and on the level of detail in the density. We then evaluate them based on map–model correlations. Figs. 5 and 6 each show a series of maps that have been sharpened using (3) with a variety of sharpening *B* values, leading to overall *B* values of the maps that range from 0 to 120 Å^{2}. Examining the series of maps with variable sharpening in Fig. 5 (innexin-6 gap junction channel; EMD-9571 and PDB entry 5h1r; Oshima *et al.*, 2016), it can be appreciated that the map in Fig. 5(*e*) (or perhaps Fig. 5*d*) is probably the easiest to interpret, in that the density most closely matches the model and there are no breaks in the density along the polypeptide chain. These maps have overall *B* values of 80 and 60 Å^{2}, respectively. For Fig. 6 (TRPV1 channel; EMD-5778 and PDB entry 3j5p; Liao *et al.*, 2013) the situation is somewhat different, and for this map the same criteria plausibly lead to the conclusion that the map in Fig. 6(*b*), with an overall *B* value of about 20 Å^{2}, is the most useful.

We then evaluated each map in Figs. 5 and 6 based on the map–model correlation. Fig. 7(*a*) shows an analysis of the maps in Fig. 5 in which the map–model correlation is plotted as a function of the overall *B* value for sharpened maps. Consider first the points labelled `model *B*_{iso} 0'. These reflect the map–model correlation as a function of overall map *B* values where the model map is calculated with *B* values (atomic displacement factors) of zero. It can be seen that the map–model correlation has a maximum at an overall map *B* value of 90–100 Å^{2}, similar to the visually optimal overall map *B* value of about 80 Å^{2}. Next note that if the same analysis is performed using model–map correlations calculated with model *B* values of −100 Å^{2} the map–model correlation has a maximum at an overall map *B* value of 40–50 Å^{2}, and if the model *B* values are set to +100 Å^{2} the map–model correlation has a maximum at an overall map *B* value of 160 Å^{2}. This case illustrates that the values used for the model *B* values have a very large effect on the overall map *B* value that leads to a maximal model–map correlation. It also shows that using model *B* values of zero leads to a maximal model–map correlation at overall map *B* values that are similar to those identified as optimal by visual inspection.

In Fig. 7(*b*) we apply the same analysis to the maps in Fig. 6. Model *B* values of zero lead to a maximal model–map correlation at an overall map *B* value (*B*_{iso} = 10–20 Å^{2}) that is similar to the value identified as optimal by visual inspection (*B*_{iso} = 20 Å^{2}). These results suggest that map–model correlation, with the model-based map calculated using model *B* values of zero, can reproduce visual evaluations of map interpretability quite well.

#### 3.3. Model accuracy required for evaluating maps based on map–model correlation with zero *B* factors

An important caveat in using map–model correlation with model *B* values of zero as a metric of map interpretability is that the model has to be similar to the true structure. We carried out a simulation to estimate how accurate the model has to be for these map–model correlations to identify interpretable maps. We took chain *A* of the major capsid protein of rotavirus (PDB entry 1qhd; Mathieu *et al.*, 2001) and used it to calculate model-based Fourier coefficients to a resolution of 3.5 Å using an overall *B* value of 72 Å^{2}. We then randomized the phases, introducing variable resolution-dependent errors (Terwilliger, 1999), to yield a series of maps of varying quality. Next, we sharpened or blurred each map of variable quality to yield modified maps with a range of overall *B* values. Starting again from the same maps of variable quality, for each map we generated a series of new models by randomizing the starting model with *SHAKE* (Ryckaert *et al.*, 1977) and refining it against the map with real-space refinement (Afonine *et al.*, 2018). This process yielded (i) a set of maps with variable quality, and for each such map (ii) a set of sharpened/blurred maps with varying overall *B* values and (iii) a set of refined models with varying coordinate errors.
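The resolution-dependent phase errors used to degrade the maps follow Terwilliger (1999), which is not reproduced here. As a simpler stand-in, the sketch below adds Gaussian phase errors whose standard deviation rises linearly with *s* = 1/*d* and reaches a chosen value at *s* = 1/*d*_{min}. This error model and all names (`add_phase_errors`, `sigma_deg_at_dmin`) are assumptions for illustration. Map quality is then summarized, as in the text, by the correlation with the error-free map.

```python
import numpy as np


def add_phase_errors(density, voxel_size, sigma_deg_at_dmin, d_min, seed=0):
    """Perturb Fourier phases with Gaussian errors whose s.d. grows linearly with s."""
    rng = np.random.default_rng(seed)
    coeffs = np.fft.rfftn(density)
    n0, n1, n2 = density.shape
    s = np.sqrt(np.fft.fftfreq(n0, d=voxel_size)[:, None, None] ** 2
                + np.fft.fftfreq(n1, d=voxel_size)[None, :, None] ** 2
                + np.fft.rfftfreq(n2, d=voxel_size)[None, None, :] ** 2)
    # Phase-error s.d. (radians): zero at s = 0, sigma_deg_at_dmin at s = 1/d_min.
    sigma = np.deg2rad(sigma_deg_at_dmin) * s * d_min
    dphi = rng.normal(size=coeffs.shape) * sigma
    # irfftn returns a real map; amplitudes are unchanged, only phases are perturbed.
    return np.fft.irfftn(coeffs * np.exp(1j * dphi), s=density.shape)


def map_cc(a, b):
    """Pearson correlation between two maps on the same grid."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))
```

Calling `add_phase_errors` with increasing `sigma_deg_at_dmin` gives a series of maps whose correlation with the original map decreases steadily, analogous to the maps of variable quality described above.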

We then used these maps and models to determine how much coordinate error a model can have before the maximum of map–model correlation occurs at an overall map *B* value very different from the value obtained with an error-free model. We started with the original model-based map without errors, along with its corresponding sharpened/blurred versions with varying overall *B* values and its corresponding refined models with varying coordinate errors. For each model with varying coordinate errors, we set the model *B* values to zero and carried out an analysis similar to that used in Figs. 7(*a*) and 7(*b*), yielding a plot of map–model correlation as a function of overall map *B* value and the overall map *B* value (*B*_{iso}) that maximized map–model correlation. Fig. 7(*c*) (blue curve) shows the overall map *B* values maximizing map–model correlation as a function of the r.m.s. error in the model coordinates. The optimal map *B* value is about 60 Å^{2} for the perfect map and the perfect model. For models with increasing error, the overall *B* value maximizing the map–model correlation gradually decreases, changing by about −20 Å^{2} from its initial value when the r.m.s. error reaches about 1 Å and by about −60 Å^{2} when it reaches 2 Å.
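The r.m.s. coordinate error on the abscissa of Fig. 7(*c*) can be computed directly from matched atom positions. The sketch below also generates perturbed models with a prescribed r.m.s. error using isotropic Gaussian displacements. The function names are hypothetical, and the displacement scheme is a simplified stand-in: it does not reproduce the *SHAKE* randomization or the subsequent real-space refinement used in the simulation.

```python
import numpy as np


def shake_coordinates(xyz, target_rms, seed=0):
    """Return xyz (N x 3, Å) displaced randomly so the r.m.s. displacement is target_rms."""
    rng = np.random.default_rng(seed)
    shifts = rng.normal(size=xyz.shape)
    # Rescale so that sqrt(mean over atoms of |shift|^2) equals target_rms exactly.
    shifts *= target_rms / np.sqrt((shifts ** 2).sum(axis=1).mean())
    return xyz + shifts


def rms_error(xyz_a, xyz_b):
    """R.m.s. distance (Å) between matched atoms of two models."""
    return float(np.sqrt(((xyz_a - xyz_b) ** 2).sum(axis=1).mean()))
```

A model produced by `shake_coordinates(xyz, 1.5)` has an r.m.s. error of exactly 1.5 Å relative to `xyz`. After refinement the error would usually change, so in a real simulation `rms_error` would be evaluated on the refined model.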

We repeated this analysis using maps of variable quality, for which the map–model correlation (CC) between the original model (with *B* values set to zero) and the map ranged from 0.42 to 0.98 (Fig. 7*c*). The optimal overall *B* values for a perfect model (an abscissa of zero in Fig. 7*c*) gradually increase with decreasing map quality, as expected. The overall *B* values maximizing map–model correlation gradually decrease with increasing model error in much the same way as they did for the perfect map. As maps that differ in overall *B* value by 20 Å^{2} are very similar, and even those differing by 60 Å^{2} are not greatly different (see Figs. 5 and 6), this analysis suggests that models with coordinate errors of up to about 1.5–2.0 Å, or about half of the resolution of this 3.5 Å resolution map, could be used effectively to evaluate the quality of this map.

#### 3.4. Evaluation of adjusted surface-area maximization as a method for optimizing map interpretability

We applied our algorithm for automatic map sharpening to 345 of the map–model pairs from the EMDB and PDB identified above. We then compared the overall *B* values obtained using our procedure with those obtained by maximizing map correlation to a model-based map calculated with model *B* values of zero (Fig. 8). It can be seen that the automatic map-sharpening procedure yields overall *B* values similar to those obtained by maximizing map–model correlation. The r.m.s. difference between overall *B* values using the two approaches is 32 Å^{2}, which by reference to Figs. 5 and 6 would lead to relatively small differences in map appearance.
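The two ingredients of the adjusted surface area described in the abstract, namely the surface area of an iso-contour enclosing a fixed fraction of the map volume and the number of connected regions inside that contour, can be computed as in the sketch below. How the two terms are combined is defined elsewhere in the paper; here we assume, purely for illustration, a linear penalty of `region_weight` per connected region. Counting the surface as exposed voxel faces and using six-neighbour connectivity are further simplifying assumptions, and all function names are hypothetical.

```python
from collections import deque

import numpy as np


def contour_mask(density, volume_fraction):
    """Voxels inside the iso-contour enclosing `volume_fraction` of the map.

    All voxels tied at the threshold are included, so the enclosed fraction
    can be slightly larger than requested.
    """
    n_keep = max(1, int(round(volume_fraction * density.size)))
    threshold = np.sort(density, axis=None)[-n_keep]
    return density >= threshold


def surface_area(mask, voxel_size):
    """Contour surface area (Å^2), counted as exposed voxel faces."""
    padded = np.pad(mask, 1).astype(np.int8)  # pad with False so map edges count as surface
    faces = sum(np.count_nonzero(np.diff(padded, axis=a)) for a in range(3))
    return faces * voxel_size ** 2


def count_regions(mask):
    """Number of 6-connected regions of True voxels (breadth-first flood fill)."""
    seen = np.zeros(mask.shape, dtype=bool)
    steps = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    n_regions = 0
    for start in zip(*np.nonzero(mask)):
        if seen[start]:
            continue
        n_regions += 1
        seen[start] = True
        queue = deque([start])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in steps:
                p = (x + dx, y + dy, z + dz)
                inside = all(0 <= p[a] < mask.shape[a] for a in range(3))
                if inside and mask[p] and not seen[p]:
                    seen[p] = True
                    queue.append(p)
    return n_regions


def adjusted_surface_area(density, voxel_size, volume_fraction, region_weight):
    """Hypothetical combination: surface area minus a penalty per connected region."""
    mask = contour_mask(density, volume_fraction)
    return surface_area(mask, voxel_size) - region_weight * count_regions(mask)
```

To choose a sharpening *B* value, one would evaluate this score on maps sharpened with each candidate value and keep the value giving the largest adjusted surface area. More surface means more detail, and fewer regions mean better connectivity.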

We next compared the maps obtained using our automatic map-sharpening procedure with the maps deposited in the EMDB, in each case using the map–model correlation calculated with model *B* values of zero as our metric of map quality. Fig. 9(*a*) compares our adjusted surface-area-based maps for 361 deposited map–model combinations with the original maps. In a high fraction of cases (65%) the automatically sharpened maps have a higher map–model correlation than the deposited maps; only 4% have a correlation more than 0.02 lower than the deposited maps, while 28% have a correlation at least 0.02 higher. Overall, the mean correlation is 0.02 higher for automatic sharpening. As the deposited maps have generally already been optimized by the depositors, this suggests that the automatic sharpening procedure works well.
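The summary statistics quoted above (the fraction of maps improved, the fractions changed by more than 0.02 in either direction and the mean change in correlation) can be computed from paired correlations as in this sketch. The function name is hypothetical. The 0.02 default margin and the strict/non-strict comparisons mirror the wording of the text. The inputs are assumed to be per-map correlations to the zero-*B* model map for the auto-sharpened and the deposited maps, listed in the same order.

```python
import numpy as np


def summarize_cc_comparison(cc_new, cc_ref, margin=0.02):
    """Compare paired map-model correlations (new maps vs reference maps)."""
    diff = np.asarray(cc_new, dtype=float) - np.asarray(cc_ref, dtype=float)
    return {
        "fraction_better": float(np.mean(diff > 0)),
        # "more than `margin` lower" -> strict inequality
        "fraction_worse_by_margin": float(np.mean(diff < -margin)),
        # "at least `margin` higher" -> non-strict inequality
        "fraction_better_by_margin": float(np.mean(diff >= margin)),
        "mean_difference": float(diff.mean()),
    }


stats = summarize_cc_comparison([0.80, 0.70, 0.60, 0.50], [0.75, 0.71, 0.60, 0.55])
```

Values falling exactly on the margin are sensitive to floating-point rounding, so correlations reported to a fixed number of decimals may need rounding before comparison.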

#### 3.5. Comparison of methods for optimizing map interpretability

We compared our procedure for map sharpening based on adjusted surface area with other map-sharpening procedures. Fig. 9(*b*) illustrates that map sharpening using our adjusted surface-area approach yields a higher map–model correlation (0.02 better on average) than half-map-based sharpening (Rosenthal & Henderson, 2003). Fig. 9(*c*) illustrates that our adjusted surface-area approach also yields a higher map–model correlation (0.01 better on average) than using the kurtosis of the map as a target. Fig. 9(*d*) shows that model-based optimization performs about as well as our adjusted surface-area-based procedure, with no difference in the average map–model correlation.

#### 3.6. Local sharpening of maps

Jakobi *et al.* (2017) have recently described a method for the local sharpening of maps that uses a refined model and is based on using the local resolution-dependence of the model-based map to normalize the Fourier terms in a map of interest. They find that some regions of maps can be more interpretable when local sharpening is applied than when global sharpening is used. We have implemented a local map-sharpening algorithm that can be applied together with any of our approaches for map sharpening. With our implementation of local sharpening we have not seen any substantial differences compared with global sharpening. Fig. 10(*a*) compares global and local sharpening using maximization of the adjusted surface area. Fig. 10(*b*) compares them using half-map sharpening and Fig. 10(*c*) compares them using model-based sharpening. For maximization of the adjusted surface area and for model-based sharpening, the locally and globally sharpened maps have very similar overall correlations to model-based maps calculated with *B* values of zero, indicating that, at least overall, the maps are about equally useful. For half-map sharpening the locally sharpened maps are slightly better on average than the globally sharpened maps (these locally sharpened maps are still not quite as good as those obtained by maximization of the adjusted surface area, however). As our results are specific to the methods for local sharpening described here, it is entirely possible that a different implementation might yield more substantial improvements in map quality.
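Our local-sharpening implementation is not detailed in this section. To make the general idea concrete, the sketch below performs a crude piecewise version: the whole map is sharpened with each candidate *B*, and within each cubic box the candidate with the best local score is kept. The scoring function is supplied by the caller; the example uses a local correlation to a zero-*B* model map, i.e. model-based local sharpening. The box size, all names and the absence of smoothing between boxes are simplifying assumptions. This is neither our algorithm nor that of Jakobi *et al.* (2017).

```python
import numpy as np


def b_scale(density, voxel_size, b, d_min):
    """Scale Fourier terms by exp(b * s**2 / 4) and zero those beyond 1/d_min."""
    coeffs = np.fft.rfftn(density)
    n0, n1, n2 = density.shape
    s2 = (np.fft.fftfreq(n0, d=voxel_size)[:, None, None] ** 2
          + np.fft.fftfreq(n1, d=voxel_size)[None, :, None] ** 2
          + np.fft.rfftfreq(n2, d=voxel_size)[None, None, :] ** 2)
    scale = np.where(s2 <= 1.0 / d_min ** 2, np.exp(b * s2 / 4.0), 0.0)
    return np.fft.irfftn(coeffs * scale, s=density.shape)


def map_cc(a, b):
    """Pearson correlation between two arrays of the same shape."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))


def locally_sharpen(density, voxel_size, d_min, b_values, box, score):
    """Piecewise local sharpening: in each box keep the candidate B with the best score.

    score(region, slices) -> float is supplied by the caller; `slices` locates the box.
    Returns the assembled map and a dict mapping box origin -> chosen B.
    """
    candidates = [b_scale(density, voxel_size, b, d_min) for b in b_values]
    out = np.empty_like(candidates[0])
    chosen = {}
    n0, n1, n2 = density.shape
    for i in range(0, n0, box):
        for j in range(0, n1, box):
            for k in range(0, n2, box):
                sl = (slice(i, i + box), slice(j, j + box), slice(k, k + box))
                scores = [score(c[sl], sl) for c in candidates]
                best = int(np.argmax(scores))
                out[sl] = candidates[best][sl]
                chosen[(i, j, k)] = b_values[best]
    return out, chosen
```

A practical implementation would use overlapping boxes with smooth weights to avoid discontinuities at box boundaries, and a model-free score (such as a local version of the adjusted surface area) if no model is available.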

#### 3.7. Examples of map improvement with auto-sharpening

Fig. 11 illustrates the map improvement that can be obtained with auto-sharpening. The purpose of this figure is to show that the auto-sharpening procedure can automatically provide highly interpretable maps without manual intervention. The examples illustrated were chosen based on the increase in map–model correlation after sharpening, so they reflect the maximum improvement that might be anticipated, and similar results could presumably also be obtained by careful manual adjustment of sharpening parameters. For each case, Fig. 11 shows a section of the original map (for example Fig. 11*a*) and the same section of the auto-sharpened map (for example Fig. 11*b*), along with the overall *B* values associated with each map and the contour level used to illustrate the map. It can be seen that the improvement in map interpretability through automatic sharpening can be considerable.

### 4. Conclusions

Our major observation is that it is possible to automatically identify optimized sharpening parameters for cryo-EM maps by simultaneously maximizing the level of detail in the maps and the connectivity of the maps. The adjusted surface area of a map reflects these factors and does not require a model or any other prior interpretation of a map, and maximizing it leads to improved maps. A secondary observation is that a useful metric of cryo-EM map quality is the correlation between the map and a model-based map in which the atomic *B* values all have values of zero. Applying our automatic map-sharpening procedure to 361 cryo-EM maps with resolutions from 1.8 to 4.5 Å and evaluating the resulting maps using this metric, we find that our procedure can improve deposited maps, that it is an improvement over both kurtosis-based and half-map sharpening procedures, and that it is about equal overall to model-based map sharpening, all as implemented in our tools.

The focus in this work is on the optimization of cryo-EM maps, but our procedures can also be applied to X-ray crystallographic maps. We have carried out only limited tests on crystallographic cases, but it seems likely that the approaches will generally be suited to both cryo-EM and crystallographic maps. In crystallography, map sharpening enters two steps of structure determination: in our tools for automated structure solution by experimental phasing (Terwilliger *et al.*, 2009) and for iterative model building, density modification and refinement (Terwilliger *et al.*, 2008), the current default is to sharpen maps using a sharpening *B* value based on the resolution of the map. Applying an optimization technique such as that developed here could potentially improve both of these steps in structure determination.

### Funding information

The authors appreciate support received from the US National Institutes of Health (grant P01GM063210 to PDA and TCT). This work was partially supported by the US Department of Energy under contract DE-AC02-05CH11231.

### References

Adams, P. D. *et al.* (2010). *Acta Cryst.* D**66**, 213–221.

Afonine, P. V., Moriarty, N. W., Mustyakimov, M., Sobolev, O. V., Terwilliger, T. C., Turk, D., Urzhumtsev, A. & Adams, P. D. (2015). *Acta Cryst.* D**71**, 646–666.

Afonine, P. V., Poon, B. K., Read, R. J., Sobolev, O. V., Terwilliger, T. C., Urzhumtsev, A. & Adams, P. D. (2018). *bioRxiv*, 249607. https://doi.org/10.1101/249607.

Baldwin, P. R., Tan, Y. Z., Eng, E. T., Rice, W. J., Noble, A. J., Negro, C. J., Cianfrocco, M. A., Potter, C. S. & Carragher, B. (2018). *Curr. Opin. Microbiol.* **43**, 1–8.

Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). *Nucleic Acids Res.* **28**, 235–242.

Bernstein, F. C., Koetzle, T. F., Williams, G. J., Meyer, E. F. Jr, Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). *J. Mol. Biol.* **112**, 535–542.

Burnley, T., Palmer, C. M. & Winn, M. (2017). *Acta Cryst.* D**73**, 469–477.

DeLaBarre, B. & Brunger, A. T. (2006). *Acta Cryst.* D**62**, 923–932.

Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. (2010). *Acta Cryst.* D**66**, 486–501.

Falke, S., Tama, F., Brooks, C. L. III, Gogol, E. P. & Fisher, M. T. (2005). *J. Mol. Biol.* **348**, 219–230.

Hite, R. K., Tao, X. & MacKinnon, R. (2017). *Nature (London)*, **541**, 52–57.

Jakobi, A. J., Wilmanns, M. & Sachse, C. (2017). *eLife*, **6**, e27131.

Jiang, J., Pentelute, B. L., Collier, R. J. & Zhou, Z. H. (2015). *Nature (London)*, **521**, 545–549.

Joseph, A. P., Malhotra, S., Burnley, T., Wood, C., Clare, D. K., Winn, M. & Topf, M. (2016). *Methods*, **100**, 42–49.

Kühlbrandt, W. (2014). *Science*, **343**, 1443–1444.

Lawson, C. L., Patwardhan, A., Baker, M. L., Hryc, C., Garcia, E. S., Hudson, B. P., Lagerstedt, I., Ludtke, S. J., Pintilie, G., Sala, R., Westbrook, J. D., Berman, H. M., Kleywegt, G. J. & Chiu, W. (2016). *Nucleic Acids Res.* **44**, D396–D403.

Liao, M., Cao, E., Julius, D. & Cheng, Y. (2013). *Nature (London)*, **504**, 107–112.

Liu, C. & Xiong, Y. (2014). *J. Mol. Biol.* **426**, 980–993.

Mathieu, M., Petitpas, I., Navaza, J., Lepault, J., Kohli, E., Pothier, P., Prasad, B. V., Cohen, J. & Rey, F. A. (2001). *EMBO J.* **20**, 1485–1497.

Merk, A., Bartesaghi, A., Banerjee, S., Falconieri, V., Rao, P., Davis, M. I., Pragani, R., Boxer, M. B., Earl, L. A., Milne, J. L. & Subramaniam, S. (2016). *Cell*, **165**, 1698–1707.

Merritt, E. A. & Bacon, D. J. (1997). *Methods Enzymol.* **277**, 505–524.

Nicholls, R. A., Long, F. & Murshudov, G. N. (2012). *Acta Cryst.* D**68**, 404–417.

Oshima, A., Tani, K. & Fujiyoshi, Y. (2016). *Nature Commun.* **7**, 13681.

Pettersen, E. F., Goddard, T. D., Huang, C. C., Couch, G. S., Greenblatt, D. M., Meng, E. C. & Ferrin, T. E. (2004). *J. Comput. Chem.* **25**, 1605–1612.

Read, R. J. (1986). *Acta Cryst.* A**42**, 140–149.

Rosenthal, P. B. & Henderson, R. (2003). *J. Mol. Biol.* **333**, 721–745.

Ryckaert, J.-P., Ciccotti, G. & Berendsen, H. J. C. (1977). *J. Comput. Phys.* **23**, 327–341.

Terwilliger, T. C. (1999). *Acta Cryst.* D**55**, 1863–1871.

Terwilliger, T. C., Adams, P. D., Read, R. J., McCoy, A. J., Moriarty, N. W., Grosse-Kunstleve, R. W., Afonine, P. V., Zwart, P. H. & Hung, L.-W. (2009). *Acta Cryst.* D**65**, 582–601.

Terwilliger, T. C., Grosse-Kunstleve, R. W., Afonine, P. V., Moriarty, N. W., Zwart, P. H., Hung, L.-W., Read, R. J. & Adams, P. D. (2008). *Acta Cryst.* D**64**, 61–69.

Urzhumtsev, A., Afonine, P. V., Lunin, V. Y., Terwilliger, T. C. & Adams, P. D. (2014). *Acta Cryst.* D**70**, 2593–2606.

Wang, B.-C. (1985). *Methods Enzymol.* **115**, 90–112.

Wlodawer, A., Minor, W., Dauter, Z. & Jaskolski, M. (2008). *FEBS J.* **275**, 1–21.

Zhang, Z. & Chen, J. (2016). *Cell*, **167**, 1586–1597.

Zwart, P. H., Grosse-Kunstleve, R. W. & Adams, P. D. (2005). *CCP4 Newsl. Protein Crystallogr.* **43**, contribution 7.

This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.