CERES: a cryo-EM re-refinement system for continuous improvement of deposited models
aMolecular Biosciences and Integrated Bioimaging, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA, bDepartment of Biochemistry, Duke University, Durham, NC 27710, USA, and cDepartment of Bioengineering, University of California Berkeley, Berkeley, CA 94720, USA
*Correspondence e-mail: dcliebschner@lbl.gov
The field of electron cryomicroscopy (cryo-EM) has advanced quickly in recent years as the result of numerous technological and methodological developments. This has led to an increase in the number of atomic structures determined using this method. Recently, several tools for the analysis of cryo-EM data and models have been developed within the Phenix software package, such as phenix.real_space_refine for the refinement of atomic models against real-space maps. Also, new validation metrics have been developed for low-resolution cryo-EM models. To understand the quality of deposited cryo-EM structures and how they might be improved, models deposited in the Protein Data Bank that have map resolutions of better than 5 Å were automatically re-refined using current versions of Phenix tools. The results are available on a publicly accessible web page (https://cci.lbl.gov/ceres). The implementation of a Cryo-EM Re-refinement System (CERES) for the improvement of models deposited in the wwPDB, and the results of the re-refinements, are described. Based on these results, contents are proposed for a `cryo-EM Table 1', which summarizes experimental details and validation metrics in a similar way to `Table 1' in crystallography. The consistent use of robust metrics for the evaluation of cryo-EM models and data should accompany every structure deposition and be reported in scientific publications.
Keywords: cryo-EM; Phenix; re-refinement; CERES; scientific web pages.
1. Introduction
Cryo-EM is an experimental technique that in the past has commonly been used to investigate large protein complexes, filaments and viruses. While the method was often limited to low resolution (5–9 Å), technological advances, such as the development of direct electron detectors (Faruqi et al., 2003; Milazzo et al., 2005; Deptuch et al., 2007; Li et al., 2013) and improvements in image processing (Campbell et al., 2012; Scheres, 2012; Bai et al., 2015), have led to an exponential increase in the number of cryo-EM models deposited in the Protein Data Bank (PDB; Berman et al., 2000; wwPDB Consortium, 2019). As a consequence, cryo-EM is now the third principal method for macromolecular structure determination (Fig. 1), representing 3.1% of deposited models in the PDB. While this is currently behind X-ray crystallography (88.8%) and nuclear magnetic resonance (NMR; 7.9%), some researchers project that deposition numbers will reach those of crystallography in only five years (Hand, 2020). The advances in cryo-EM technology have led to greatly improved resolutions of the deposited 3D reconstructions (Fig. 2). Low-resolution cryo-EM density maps can be used to dock models from X-ray crystallography or NMR, but density maps of 5 Å resolution or better can be used to solve structures de novo and to refine atomic models, similar to X-ray crystallography. More recently, the majority of deposited maps have resolutions of better than 5 Å (Figs. 2 and 3), with 1.15 Å currently being the highest (EMDB entry EMD-11668, PDB entry 7a6a; K. M. Yip, N. Fischer, A. Chari & H. Stark, unpublished work).
Cryo-EM 3D reconstructions are similar to the density maps derived from X-ray diffraction experiments. Hence, the tools developed for the building and refinement of crystallographic models can be readily modified for application to cryo-EM. Intensities from a diffraction experiment do not contain phase information,1 which is necessary to compute (X-ray) density maps, and refinement is therefore typically performed in reciprocal space. In contrast, cryo-EM reconstructions are real-space maps, making refinement in real space a natural choice (Afonine, Poon et al., 2018).
Best practices for the validation of cryo-EM maps and models have not yet been established. The validation criteria developed for crystallographic models can readily be applied to cryo-EM, as macromolecular stereochemistry obeys the same principles regardless of the experimental technique. However, data quality and model-to-data fit need to be formulated specifically for cryo-EM data. The process of finding the best validation parameters is therefore still ongoing, although progress has been made (Barad et al., 2015; Afonine, Klaholz et al., 2018; Williams et al., 2018; Richardson, Williams, Videau et al., 2018; Sobolev et al., 2020). In particular, the EMDataResource team (https://challenges.emdataresource.org/) works with the cryo-EM community to establish validation methods for the structure-determination process. As suggested by the EM Validation Task Force (Henderson et al., 2012), the team hosts benchmark challenges to stimulate community discussion about validation procedures. The most recent challenge is about model validation (Lawson et al., 2020).
The Phenix software (Liebschner et al., 2019) offers a series of programs that focus on the analysis of cryo-EM maps and models. For example, phenix.real_space_refine refines atomic models against real-space maps (Afonine, Poon et al., 2018), and the `Comprehensive validation' tool performs model validation using established geometry criteria from X-ray crystallography, based on MolProbity (Davis et al., 2007; Chen et al., 2010; Richardson, Williams, Hintze et al., 2018; Williams et al., 2018), and calculates cryo-EM-specific data and model-versus-data quality indicators (Afonine, Klaholz et al., 2018). To improve a cryo-EM map, the following tools can be used: automatic map sharpening (phenix.auto_sharpen; Terwilliger et al., 2018), density modification (phenix.resolve_cryo_em; Terwilliger, Ludtke et al., 2020; Terwilliger, Sobolev et al., 2020) and phenix.combine_focused_maps (recombination of the best parts of several maps). An initial atomic model representing a cryo-EM map can be obtained with phenix.dock_in_map and phenix.map_to_model (Terwilliger, Adams et al., 2020).
In light of these very recent advances in refinement and validation, it is worthwhile revisiting previously deposited cryo-EM structures. Much can be gained by re-refining each entry and assessing model and data quality. The cryo-EM field is evolving at such a rapid pace that recently deposited models were obtained (i) using the software and methods available at the time but which have since changed markedly and (ii) when no community-wide accepted consensus about validation was established. Therefore, re-refining cryo-EM structures with current methods represents an opportunity to obtain a snapshot of data and model quality based on consistent algorithms and validation criteria. Efforts to re-refine crystallographic models against diffraction data with the PDB-REDO procedure (Joosten, Salzemann et al., 2009; Joosten et al., 2014) have proved to be a great success. Brown et al. (2015) showed that cryo-EM models that were determined from maps with a resolution of 4 Å or better could be improved after re-refinement. Another benefit of re-refining a large number of models is that new computational methods and procedures can be tested for success, stability and validity. In this way, continuous improvement of the software and of the methods can be ensured.
In this work, we re-refined cryo-EM models with maps with a resolution of 5 Å or better. The results are available on a publicly accessible web page. Each processed map–model pair has an individual page that displays map, model and map-to-model validation statistics. A molecular viewer is incorporated, allowing easy visualization of maps and models along with identification of obvious issues. All re-refinement results can also be accessed in a table, letting the user browse for models with particular properties (for example, map resolution, model statistics or number of residues). We also suggest a set of metrics for model, data and model-versus-data quality (`cryo-EM Table 1') that should accompany every structural publication.
2. Materials and methods
2.1. Re-refinement procedure
The automated re-refinement procedure is divided into different tasks; a flow chart of the steps is shown in Fig. 4. All steps are described in detail in the following subsections. In summary: firstly, the maps and models are downloaded from the respective repositories and, if necessary, curated. Subsequently, the model composition is analyzed and chemical restraints for nonstandard compounds are created. The structure is then refined against the cryo-EM map using phenix.real_space_refine followed by examination of the statistics for the initial and the refined model. Finally, the results are exported and stored in a database. The script is based on code from the Computational Crystallography Toolbox (cctbx; Grosse-Kunstleve et al., 2002) and the Phenix software (Liebschner et al., 2019). Scripts are available in the RErefine GitHub repository (https://github.com/pafonine/CryoEMRErefine).
2.1.1. Obtaining maps and models
For structures determined by cryo-EM, the models and the maps are stored in different repositories, namely the PDB and the Electron Microscopy Data Bank (EMDB; Lawson et al., 2011). Corresponding map–model pairs (model file, map files and half-maps if available) are downloaded to local storage for further processing. We note that one map can be associated with several models or one model can be associated with several maps.
2.1.2. File preparation and curation
Models available in mmCIF format were converted to PDB format.2 If only the unique unit of a biological assembly was present in the model, the assembly was generated with the symmetry operators indicated in the header. The model and map files may each contain one or several values for the resolution of the map. In the majority of cases the values are the same, but if they were different then the map–model pair was not considered for re-refinement. The resolution that is indicated in both the model and map files is the `consensus resolution'. Maps and/or model(s) were not considered for further processing if they failed a curation check; the categories of failure encountered in practice (inconsistent resolution information, inconsistent gridding of maps and half-maps, bad symmetry or box information, and models consisting of more than 25% single-atom residues) are discussed in Section 3.1.
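The consensus-resolution check described above can be illustrated with a minimal Python sketch. This is not the CERES code: the function names, the numerical tolerance and the treatment of `5 Å or better' as an inclusive cutoff are assumptions made for illustration only.

```python
from typing import Optional, Sequence


def consensus_resolution(
    model_values: Sequence[float],
    map_values: Sequence[float],
    tolerance: float = 0.01,  # assumed tolerance (Å) for "the same" value
) -> Optional[float]:
    """Return the resolution shared by model and map metadata, else None."""
    if not model_values or not map_values:
        return None  # one of the files records no resolution at all
    values = list(model_values) + list(map_values)
    if max(values) - min(values) > tolerance:
        return None  # conflicting values: the pair is not re-refined
    return min(values)


def is_candidate(resolution: Optional[float], cutoff: float = 5.0) -> bool:
    """A pair qualifies only with a consensus resolution at or below the cutoff."""
    return resolution is not None and resolution <= cutoff
```

With the values quoted in Section 3.1 for PDB entry 6sfw (4 Å in the map file, 6 Å in the model file), `consensus_resolution([6.0], [4.0])` returns `None`, so the pair would be dropped.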
2.1.3. Model composition and ligand restraints
Model information, such as the number of ligands, residues and chains, the atomic displacement parameters (ADPs) and the occupancy statistics, was extracted from the PDB files created in the previous step. When a model contained ligands, it was sometimes necessary to supply geometry restraints for refinement. In these cases, ligand restraints were obtained with phenix.ready_set, which uses phenix.elbow (Moriarty et al., 2009).
2.1.4. Refinement
Refinements were performed with phenix.real_space_refine. The number of macrocycles was set to ten; electron scattering factors were used. Symmetry-related chains were detected automatically using the phenix.simple_ncs_from_pdb tool. If plausible copies are found, the operators relating them are derived and used to check whether the map was symmetrized (i.e. molecular symmetry was imposed during the reconstruction) using phenix.map_symmetry (Liebschner et al., 2019). Symmetry constraints are applied if the map was symmetrized or the resolution was worse than 4.5 Å. In addition to standard stereochemical and nonbonded restraints, we applied secondary-structure restraints (Headd et al., 2012) for protein and nucleic acid residues, Conformation-Dependent Library (CDL) restraints (Moriarty et al., 2016), Cβ-deviation restraints, rotamer restraints and Ramachandran plot restraints (Headd et al., 2012). The resolution limit for refinement was set to the consensus resolution. The resolution does not affect the refinement results, but is used to calculate and report the map–model correlation coefficient (Afonine, Poon et al., 2018).
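The decision of whether to apply symmetry constraints can be summarized in a short Python sketch. The class and function names below are hypothetical and are not Phenix parameters; the assumption that no constraints are applied when no symmetry-related copies are found is ours.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RefinementPlan:
    """Illustrative container for the settings described in the text."""
    macro_cycles: int = 10
    scattering_factors: str = "electron"
    symmetry_constraints: bool = False
    extra_restraints: Tuple[str, ...] = (
        "secondary structure",
        "CDL",
        "C-beta deviation",
        "rotamer",
        "Ramachandran plot",
    )


def plan_refinement(
    copies_found: bool,
    map_is_symmetrized: bool,
    resolution: float,
    constraint_threshold: float = 4.5,  # Å; a larger value is a worse resolution
) -> RefinementPlan:
    """Apply constraints if the map was symmetrized or the resolution is worse than 4.5 Å."""
    use_constraints = copies_found and (
        map_is_symmetrized or resolution > constraint_threshold
    )
    return RefinementPlan(symmetry_constraints=use_constraints)
```

For instance, `plan_refinement(True, False, 4.8)` enables constraints because of the resolution alone, whereas `plan_refinement(True, False, 3.0)` does not.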
2.1.5. Statistics and plots
Validation of both the input structures and refined structures was performed. This allowed validation of the refinement step as well as a comparison of the properties of the initial and refined structures, model parameters, data quality and the model-to-data fit. For validation, the same resolution cutoff as for refinement was applied. Most of the metrics are standard and are well documented in the crystallographic literature (Hooft et al., 1997; Chen et al., 2010; Read et al., 2011; Gore et al., 2017; Wlodawer, 2017; Williams et al., 2018). Novel metrics that are specific to cryo-EM are summarized in Section 3.2. Plots were generated with Matplotlib, a Python 2D plotting library (Hunter, 2007).
2.1.6. Monthly computations
For smaller models (PDB file size <20 MB), the models are re-refined once a month. By default, the search function and the table show the results from the previous month (as the results of the current month are typically still being processed). Older results can be accessed as well, which allows comparison of the runs from different months. As larger models require significantly longer processing times, they are processed every six months.
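The scheduling rule can be written down in a few lines of Python. This is an illustrative sketch only; the function names are hypothetical and the file size is assumed to be given in megabytes.

```python
MONTHLY_SIZE_LIMIT_MB = 20.0  # models below this PDB file size are run monthly


def months_between_runs(pdb_file_size_mb: float) -> int:
    """Return the re-refinement interval (in months) for a model of the given size."""
    return 1 if pdb_file_size_mb < MONTHLY_SIZE_LIMIT_MB else 6


def is_due(pdb_file_size_mb: float, months_since_last_run: int) -> bool:
    """True if enough months have passed for the model to be processed again."""
    return months_since_last_run >= months_between_runs(pdb_file_size_mb)
```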
2.2. Design of the website
The results of the re-refinements are stored in a database (PostgreSQL; https://www.postgresql.org). The webpage was created with Django (https://www.djangoproject.com), a web framework following the model–view–controller architectural pattern. Several common JavaScript and CSS libraries are used to create responsive content and tables: Bootstrap (https://getbootstrap.com), jQuery (https://jquery.com), django-tables2 (https://django-tables2.readthedocs.io/en/latest/) and Maphilight (https://github.com/kemayo/maphilight). The NGL viewer (Rose & Hildebrand, 2015; Rose et al., 2018), a library for molecular visualization, is embedded into each page of individual results, allowing visualization of the maps and models (initial and refined).
3. Results and discussion
As of June 2020, the EMDB contains about 11 000 maps. Approximately 4500 of these have corresponding models in the PDB. Among the maps with a corresponding model in the PDB, ∼3300 have a resolution of 5 Å or better, with ∼350 of them not passing the map and model curation step. As of the time of preparation of this manuscript, ∼2750 map–model pairs successfully passed at least one step of the re-refinement procedure3 and the results are displayed on the CERES website (https://cci.lbl.gov/ceres).
3.1. Filtered models
To be able to automatically re-refine models against maps, it is necessary to ensure that the maps and models pass some basic consistency checks. For example, of the 3308 map–model pairs (in local storage on 26 June 2020), 370 (more than 10%) did not pass the curation. The majority of failures are due to inconsistent resolution information in the files (263). Other failures are caused by inconsistent gridding in maps and half-maps (56), bad symmetry (or box) information (18) and the model consisting of more than 25% single-atom residues (17). A minority of failures are due to processing issues with Phenix tools (16). An example of inconsistent resolutions is PDB entry 6sfw [10162]:4 the resolution limit indicated in the map file is 4 Å, while that in the model (mmCIF format) file is 6 Å. A comment in the mmCIF file explains that although the overall resolution of the map is 4 Å, the region in which the molecule was modeled has a resolution of only 6 Å. Unfortunately, it is not practical to automatically screen comments that can explain the mismatch, leading to the removal of the entry from the list. Inconsistent gridding involving half-map files is not necessarily a problem for refinement, but such obvious disagreements are often indicative of other issues. Therefore, these instances are ignored in further processing. An example of inconsistent box information is PDB entry 6udk [20740]. A cryo-EM map is expressed as a three-dimensional array of density values inside a `map box'. For PDB entry 6udk, the cell indicated in the map file is a cube of length 291.20 Å. However, the cell lengths indicated in the model file are 1.0 Å. In such cases, the coordinates in the model file might actually correspond to a model that has been placed in a box with the same lengths as indicated in the map file. However, another possible scenario could be that the deposited map is in a box that is much larger than the molecule, while the molecule coordinates are expressed in a smaller box. In other instances, such as PDB entry 6hug [0275], the model file has no symmetry information at all.
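As a quick check of the figures above, the failure categories can be tallied in Python. The counts are those quoted in the text; the dictionary keys and the function name are illustrative.

```python
from typing import Dict

# Curation failures among the map-model pairs in local storage on 26 June 2020.
CURATION_FAILURES: Dict[str, int] = {
    "inconsistent resolution information": 263,
    "inconsistent gridding in maps and half-maps": 56,
    "bad symmetry (or box) information": 18,
    "more than 25% single-atom residues": 17,
    "processing issues with Phenix tools": 16,
}
TOTAL_PAIRS = 3308


def summarize(counts: Dict[str, int], total_pairs: int) -> Dict[str, float]:
    """Return the number of failures, the failed fraction and per-category shares."""
    failed = sum(counts.values())
    summary = {"failed": float(failed), "failed_fraction": failed / total_pairs}
    for category, count in counts.items():
        summary[category] = count / failed  # share of all curation failures
    return summary


summary = summarize(CURATION_FAILURES, TOTAL_PAIRS)
```

The five categories add up to the 370 failed pairs, which is somewhat more than 11% of the 3308 pairs, and inconsistent resolution information alone accounts for roughly seven in ten failures.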
The above examples of maps and models that failed the curation step underscore the necessity to store information consistently and clearly in the map and model metadata. Otherwise, errors in bookkeeping or ambiguous metadata can lead to erroneous results in automated data-mining efforts, such as structure-guided drug design (Dauter et al., 2014). Data-mining projects in crystallography from the Electron Density Server or PDB-REDO faced comparable issues (Kleywegt et al., 2004; Joosten, Salzemann et al., 2009; Joosten, Womack et al., 2009). Therefore, the information in the databases should be well curated and unambiguous so that it is usable by experts and non-experts alike.
3.2. Cryo-EM Table 1
Each re-refinement result is summarized in `Table 1', which represents the most important metrics of overall model and data quality and model-versus-data fit, similar to `Table 1' used in crystallography. While some quality indicators are identical for both, some are specific to cryo-EM. We recommend that new reports of cryo-EM structural studies include a cryo-EM Table 1, as it represents an expanded and amended version of the established Table 1 for crystallography. In the following, we briefly describe each indicator. This is not an exhaustive list, and newly developed metrics may eventually be added. The use of robust quality metrics by cryo-EM practitioners will enhance best practices of model building and refinement, and will help in checking whether the model that is being built into the map is as correct as it could be.
3.2.1. Model-quality indicators
3.2.2. Data-quality indicators
3.2.3. Model-versus-data fit
3.3. Results from re-refinement
Of the September 2020 data set, 2535 map–model pairs successfully passed through all steps of the re-refinement procedure. In the following, some of the quality indicators in cryo-EM Table 1 are discussed and compared for the initial and the re-refined structures.
We do not discuss the following indicators in detail. The geometry-restraints r.m.s.d. values are expected to be small in the resolution ranges typical for cryo-EM (although this may change in the future, with more and more maps determined at resolutions of 2 Å and better). Therefore, a decrease or increase does not necessarily mean that the structure is better or worse. We also do not show results for EMRinger score and d99 for the sake of brevity.
3.3.1. Map-to-model fit
In general, the map–model correlation coefficient (CCmask) reflects how well a model fits a map. For a large majority of the re-refined models, the CCmask improves after real-space refinement (Fig. 5). When the initial CCmask is high (>0.8), the improvement is relatively small. For initial CCmask values between 0.4 and 0.8, the improvement is usually more substantial. We also observe an improvement for models that have low initial correlation coefficients (0–0.4). While this may reflect a genuine improvement, it can also be indicative of problems with the starting model and map. We note that the CC should not be viewed as the single quality indicator of map-to-model fit, as there are scenarios in which it can be misleading. For example, if the CC is very low for one chain of a multi-chain model it decreases the overall CC. Furthermore, a model with a good map–model correlation can have bad model-quality indicators. Nevertheless, the CC can be useful to flag serious problems. If the initial CCmask is very low, it indicates that the model does not fit well to the map. This may occur if the deposited model does not superpose on the map, for example when it is shifted or rotated (or both) with respect to the map. Among the structures that were re-refined, 41 models have initial correlation coefficients smaller than 0.2 (gray shaded area in Fig. 5). In Figs. 6, 7 and 8, which show histograms for model-quality indicators, models that have initial CC values below 0.2 are highlighted with a lighter color. In this way, it can be seen whether models that have a low initial model-to-map fit result in models with suboptimal geometry.
An example of a model with a low initial CC is PDB entry 6eu1 [3956]; the initial value for CCmask is 0.01, with the model being slightly shifted with respect to the map. The re-refined model yields a CCmask value of 0.72, with an average shift of more than 2 Å compared with the initial structure. While the map–model fit is visibly improved, the model geometry deteriorates: the clashscore increases from 6.5 to 17.7 and the percentage of residues in the favored Ramachandran region only marginally improves from 83.5% (which is already poor) to 83.7%. The default real-space refinement procedure for individual coordinates is designed to improve the local details of the model. Larger scale changes require the application of simulated annealing (Grosse-Kunstleve et al., 2009) or morphing options (Terwilliger et al., 2013). However, very large-scale movements such as significant shifts or rotations of entire molecules or chains are outside the radius of convergence of the real-space refinement procedures. Therefore, for maps and models that yield very poor correlations before refinement, a different strategy would be required. For example, the model could be first refined as a rigid body, or better still the model could be docked into the map with phenix.dock_in_map. These strategies will be explored in further versions of the re-refinement server.
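To make the behavior of a masked correlation coefficient concrete, the following Python sketch computes a Pearson correlation between two maps over the grid points selected by a mask. It is a simplified illustration under the assumption that CCmask compares experimental and model-calculated map values near the model; it is not the Phenix implementation, and the function name and flattened-list representation are ours.

```python
from math import sqrt
from typing import Sequence


def masked_correlation(
    experimental: Sequence[float],
    calculated: Sequence[float],
    mask: Sequence[bool],
) -> float:
    """Pearson correlation of two flattened maps over grid points where mask is True."""
    if not (len(experimental) == len(calculated) == len(mask)):
        raise ValueError("maps and mask must have the same number of grid points")
    pairs = [(e, c) for e, c, m in zip(experimental, calculated, mask) if m]
    n = len(pairs)
    if n < 2:
        raise ValueError("at least two masked grid points are required")
    mean_e = sum(e for e, _ in pairs) / n
    mean_c = sum(c for _, c in pairs) / n
    cov = sum((e - mean_e) * (c - mean_c) for e, c in pairs)
    var_e = sum((e - mean_e) ** 2 for e, _ in pairs)
    var_c = sum((c - mean_c) ** 2 for _, c in pairs)
    if var_e == 0.0 or var_c == 0.0:
        raise ValueError("correlation is undefined for a constant map")
    return cov / sqrt(var_e * var_c)
```

In this toy setting, a calculated map that is a scaled and offset copy of the experimental map gives a correlation of 1, whereas shifting a single density peak by one grid point already makes the correlation negative, which mirrors how a displaced model can yield a very low initial CC.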
While there is a slight tendency for higher resolution maps to yield a higher CCmask, manifested by slightly shifted distributions for the resolution ranges better than 3 Å, 3–4 Å and 4–5 Å, the trend is not obvious enough to postulate that higher resolution maps generally have better CC values. This is unlike X-ray crystallography, where R factors are somewhat correlated with resolution (Joosten, Salzemann et al., 2009; Read & Kleywegt, 2009; Urzhumtsev et al., 2009).
3.3.2. Geometry
3.3.3. Data resolution
The current standard for determining the resolution of cryo-EM reconstructions is the frequency-dependent comparison (Fourier shell correlation; FSC) of half-maps. The nominal resolution dFSC of a map is where the FSC between half-maps is about 0.143, which corresponds to an estimated correlation of 0.5 between the experimentally determined map and the (unknown) true map (Rosenthal & Henderson, 2003). We note that other cutoff values, such as 0.5, may also be used (Böttcher et al., 1997; Frank, 2006). While the resolution does not affect the refinement results, it is used to calculate the map–model CC. Furthermore, the resolution is used as a criterion for filtering out candidates to be re-refined, i.e. models with maps that have a resolution worse than 5 Å are not refined. If the consensus resolution obtained from file headers is better than 5 Å but the real resolution of the map is lower, then the real-space refinement procedure may not provide optimal results. It is therefore of interest to be able to filter out these cases.
A scatter plot of the recalculated dFSC (from half-maps, if available) plotted against the consensus resolution is shown in Fig. 9(a). Unfortunately, not all EMDB depositions include half-maps, so dFSC could not be recalculated for all maps that were used for re-refinements. Only 659 of the 2535 map–model pairs had half-maps (∼25%). For these maps, the dFSC from half-maps is generally within 1 Å of the consensus resolution (gray shaded area around the diagonal). In some cases, dFSC is significantly better than the consensus resolution. It is possible that in these instances the resolution was determined using a stricter cutoff than 0.143 for the FSC. In only 18 cases was dFSC worse than 5 Å, with the worst dFSC being 6.62 Å. It may be that there was some error in depositing the half-maps in this case. It therefore seems that the reported consensus resolution is generally quite close to the recalculated dFSC value. Alternative measures to estimate the resolution for the other ∼75% of maps are dmodel and d99. dmodel is shown in Fig. 9(b). Generally, dmodel is within 1 Å of the consensus resolution. There are significant outliers, which as expected occur for models that have a very low initial model-to-map correlation (CCmask lower than 0.2, yellow crosses). For these cases, the resolution estimate dmodel is likely to be flawed. For models with initial CCmask values between 0.2 and 0.5, dmodel is often close to the consensus resolution, but may differ significantly (red circles). The remaining cases where CCmask > 0.5 and dmodel is worse than 5 Å are candidates to be filtered out.
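The way a resolution is read off an FSC curve at a given cutoff can be sketched in Python as follows. The sketch assumes that the FSC has already been computed per shell and simply locates the first crossing below the threshold by linear interpolation; the function name and the input layout are illustrative, and Phenix may determine dFSC differently in detail.

```python
from typing import Optional, Sequence


def resolution_at_threshold(
    spatial_frequency: Sequence[float],  # shell centers in 1/Å, ascending
    fsc: Sequence[float],                # FSC value of each shell
    threshold: float = 0.143,
) -> Optional[float]:
    """Return the resolution (Å) at which the FSC first drops below the threshold."""
    if len(spatial_frequency) != len(fsc):
        raise ValueError("spatial_frequency and fsc must have the same length")
    for i in range(1, len(fsc)):
        upper, lower = fsc[i - 1], fsc[i]
        if upper >= threshold > lower:
            # Interpolate linearly between the two shells around the crossing.
            fraction = (upper - threshold) / (upper - lower)
            step = spatial_frequency[i] - spatial_frequency[i - 1]
            return 1.0 / (spatial_frequency[i - 1] + fraction * step)
    return None  # the curve never crosses the threshold
```

For the same curve, a stricter cutoff such as 0.5 is crossed at a lower spatial frequency than 0.143 and therefore gives a numerically larger (worse) resolution, which is one way in which a reported resolution can differ from a recalculated dFSC.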
Intuitively, one expects dmodel to be most reliable when the model fits the map best, i.e. if the re-refined model fits better to the map, then its dmodel should be better than the initial value. However, it has been previously observed that this is not the case (Afonine et al., 2018). It is possible that the dmodel values do not follow this correlation owing to unusual atomic displacements or map peculiarities such as non-uniform resolution across the map volume.
For example, PDB entry 6i52 [4410] has a consensus resolution of 4.7 Å, while dmodel is equal to 8.9 Å. The deposited map has a visibly lower resolution than 4.7 Å (https://cci.lbl.gov/ceres/goto_entry/6i52_4410/09_2020). In this case, it is possible that the consensus resolution corresponds to a map that has been processed in some way, such as sharpening. Future re-refinements will include filtering for maps whose dmodel indicates a low resolution (worse than 5 Å) while the CCmask is acceptable. This way, maps that are likely to have a lower resolution than reported can be excluded from re-refinement.
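The filtering logic outlined in this section can be expressed as a small classification function. The following Python sketch uses the CCmask ranges and the 5 Å limit quoted above; the enum, the function name and the handling of values exactly at the boundaries are assumptions, and the sketch does not represent the planned CERES implementation.

```python
from enum import Enum


class DModelStatus(Enum):
    UNRELIABLE = "dmodel likely flawed (initial CCmask below 0.2)"
    UNCERTAIN = "dmodel may differ significantly (initial CCmask 0.2-0.5)"
    EXCLUDE = "map likely has a lower resolution than reported"
    KEEP = "dmodel consistent with a map of 5 Å or better"


def classify_by_dmodel(
    initial_cc_mask: float,
    d_model: float,
    resolution_cutoff: float = 5.0,  # Å
) -> DModelStatus:
    """Decide how much weight dmodel deserves and whether to exclude the map."""
    if initial_cc_mask < 0.2:
        return DModelStatus.UNRELIABLE
    if initial_cc_mask <= 0.5:
        return DModelStatus.UNCERTAIN
    if d_model > resolution_cutoff:
        return DModelStatus.EXCLUDE
    return DModelStatus.KEEP
```

An entry would thus only be excluded when the model fits the map reasonably well and dmodel is nevertheless worse than 5 Å, so that a poor dmodel caused by a misplaced model is not mistaken for a low-resolution map.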
3.4. Examples
The following section discusses two examples where the automatic re-refinement procedure led to models with better model and model-versus-data metrics. Each case exemplifies the features of phenix.real_space_refine at different resolution ranges.
3.4.1. PDB entry 5k12, 1.8 Å resolution
PDB entry 5k12 [8194] represents the cryo-EM structure of glutamate dehydrogenase obtained at 1.8 Å resolution (Merk et al., 2016). The structure is composed of six protein chains, each with 294 residues (the full sequence length is 558) and some water molecules. All of the metrics in cryo-EM Table 1 improve after re-refinement (Table 1), such as the clashscore, CaBLAM outliers, Ramachandran metrics and CCmask. Most strikingly, the CC per residue improves systematically for most protein residues (Fig. 10a). An example is residue Tyr471 (chain F; Fig. 10b). In the initial model, the side chain points into a region without density, while in the re-refined structure the side chain flips to an area that is within density. This model was released in 2016 (the version of Phenix used at the time was not indicated). The fact that the current version of phenix.real_space_refine can improve this model further underscores the value of automated re-refinement and illustrates how the computational tools have improved over time. This model may have benefited from recent enhancements in side-chain fitting procedures, which improve the orientations of side chains and move them into density peaks. This is of particular interest for large models, where manual corrections are time-consuming.
‡The EMRinger score is calculated when the map resolution is better than 4 Å.
3.4.2. PDB entry 3j4p, 4.8 Å resolution
PDB entry 3j4p [5681] represents the cryo-EM structure of the adeno-associated virus obtained at 4.8 Å resolution (Xie et al., 2013). The model is composed of 31 000 residues in 60 chains. While the model-to-map fit is essentially identical before and after re-refinement, all model-validation metrics improve significantly (Table 1). The model was released in 2013, when mature atomic refinement programs for cryo-EM models were not yet widely available. This highlights the benefit of revisiting older structures in the re-refinement project. We note that the model also contains 60 water molecules, 180 ions and sucrose octasulfate molecules (Na, Mg, SCR) and 120 alternative conformations. It is likely that these features are remnants of the models that were used as templates for the virus structure. At 4.8 Å resolution, water molecules typically do not show up as clear density peaks. Ions may only be identified to some extent if they adopt their characteristic coordination.
4. The CERES website
The website is hosted at https://cci.lbl.gov/ceres/. The content of the website is divided into different pages.
5. Conclusion
This work describes re-refinements of cryo-EM models at map resolutions of 5 Å or better with phenix.real_space_refine. 2535 model–map pairs were re-refined and the results are publicly accessible on a website. A significant number of models could not be re-refined because the automatic curation procedures were thwarted by inconsistent metadata information. Cryo-EM data, although dramatically better than before the recent innovations, are still often at low enough resolution to present challenges for model building, refinement and validation. Therefore, we were not surprised to observe that for those models that were successfully re-refined, the model-to-map fit and numerous geometric validation parameters generally improved. It is worth adding that a complete evaluation requires a robust measure of cross-validation, which is currently lacking. Therefore, the re-refinement results suggest that the current methods for constructing and refining models with cryo-EM data can be improved further, as can the best practices for practitioners of cryo-EM.
The results of the re-refinements also highlight some of the areas for improvement. As an example, the current algorithms in Phenix for Ramachandran restraints need to be better adapted to the features of low-resolution cryo-EM maps. Also, better validation metrics for assessing model-to-map agreement are required. The number of failures of automatic curation procedures argues for better standardization of metadata in map/model depositions, as well as the consistent collection of important data-validation information in the form of independent half-maps. The reanalysis of structures would be facilitated by the inclusion of more information in depositions. Half-maps could potentially be used as a measure for the resolution if the metadata give ambiguous results, but unfortunately the deposition of half-maps is not mandatory. To enable other researchers to judge the quality of cryo-EM structures, we suggest a number of validation metrics to be included in structural reports of cryo-EM models, similar to the `Table 1' used in crystallography. To address the artifacts that we observed in some models, future improvements of the re-refinement server will include filtering of ligands and water molecules that do not have any signal in the map. Furthermore, we will incorporate validation metrics that are specific for nucleic acids.
Footnotes
1. However, phases can be inferred through experimental phasing approaches such as SAD, MAD, MIR etc.
2. Future versions will use the mmCIF format exclusively, as the PDB format is deprecated.
3. The numbers are approximate, as new entries are added to the EMDB weekly. Curation is performed once a month. Re-refinements are run once a month.
4. Here and subsequently, the number in square brackets denotes the EMDB code.
5. As the full map may be very large, leading to a frustrating loading time, the maps displayed in the viewer have been cut out around the model with phenix.map_box.
Funding information
We gratefully acknowledge the financial support of NIH/NIGMS through grants 5P01GM063210 and the Phenix Industrial Consortium. This work was supported in part by the US Department of Energy under Contract DE-AC02-05CH11231.
References
Afonine, P. V., Klaholz, B. P., Moriarty, N. W., Poon, B. K., Sobolev, O. V., Terwilliger, T. C., Adams, P. D. & Urzhumtsev, A. (2018). Acta Cryst. D74, 814–840.
Afonine, P. V., Poon, B. K., Read, R. J., Sobolev, O. V., Terwilliger, T. C., Urzhumtsev, A. & Adams, P. D. (2018). Acta Cryst. D74, 531–544.
Bai, X., McMullan, G. & Scheres, S. H. W. (2015). Trends Biochem. Sci. 40, 49–57.
Barad, B. A., Echols, N., Wang, R. Y.-R., Cheng, Y., DiMaio, F., Adams, P. D. & Fraser, J. S. (2015). Nat. Methods, 12, 943–946.
Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). Nucleic Acids Res. 28, 235–242.
Böttcher, B., Wynne, S. A. & Crowther, R. A. (1997). Nature, 386, 88–91.
Brown, A., Long, F., Nicholls, R. A., Toots, J., Emsley, P. & Murshudov, G. (2015). Acta Cryst. D71, 136–153.
Campbell, M. G., Cheng, A., Brilot, A. F., Moeller, A., Lyumkis, D., Veesler, D., Pan, J., Harrison, S. C., Potter, C. S., Carragher, B. & Grigorieff, N. (2012). Structure, 20, 1823–1828.
Chen, V. B., Arendall, W. B., Headd, J. J., Keedy, D. A., Immormino, R. M., Kapral, G. J., Murray, L. W., Richardson, J. S. & Richardson, D. C. (2010). Acta Cryst. D66, 12–21.
Dauter, Z., Wlodawer, A., Minor, W., Jaskolski, M. & Rupp, B. (2014). IUCrJ, 1, 179–193.
Davis, I. W., Leaver-Fay, A., Chen, V. B., Block, J. N., Kapral, G. J., Wang, X., Murray, L. W., Arendall, W. B., Snoeyink, J., Richardson, J. S. & Richardson, D. C. (2007). Nucleic Acids Res. 35, W375–W383.
Deptuch, G., Besson, A., Rehak, P., Szelezniak, M., Wall, J., Winter, M. & Zhu, Y. (2007). Ultramicroscopy, 107, 674–684.
Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. (2010). Acta Cryst. D66, 486–501.
Faruqi, A. R., Cattermole, D. M., Henderson, R., Mikulec, B. & Raeburn, C. (2003). Ultramicroscopy, 94, 263–276.
Frank, J. (2006). Three-Dimensional Electron Microscopy of Macromolecular Assemblies. Oxford University Press.
Gore, S., Sanz García, E., Hendrickx, P. M. S., Gutmanas, A., Westbrook, J. D., Yang, H., Feng, Z., Baskaran, K., Berrisford, J. M., Hudson, B. P., Ikegawa, Y., Kobayashi, N., Lawson, C. L., Mading, S., Mak, L., Mukhopadhyay, A., Oldfield, T. J., Patwardhan, A., Peisach, E., Sahni, G., Sekharan, M. R., Sen, S., Shao, C., Smart, O. S., Ulrich, E. L., Yamashita, R., Quesada, M., Young, J. Y., Nakamura, H., Markley, J. L., Berman, H. M., Burley, S. K., Velankar, S. & Kleywegt, G. J. (2017). Structure, 25, 1916–1927.
Grosse-Kunstleve, R. W., Moriarty, N. W. & Adams, P. D. (2009). Proceedings of the ASME 2009 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Vol. 4, pp. 1477–1485. New York: ASME.
Grosse-Kunstleve, R. W., Sauter, N. K., Moriarty, N. W. & Adams, P. D. (2002). J. Appl. Cryst. 35, 126–136.
Hand, E. (2020). Science, https://doi.org/10.1126/science.aba9954.
Headd, J. J., Echols, N., Afonine, P. V., Grosse-Kunstleve, R. W., Chen, V. B., Moriarty, N. W., Richardson, D. C., Richardson, J. S. & Adams, P. D. (2012). Acta Cryst. D68, 381–390.
Henderson, R., Sali, A., Baker, M. L., Carragher, B., Devkota, B., Downing, K. H., Egelman, E. H., Feng, Z., Frank, J., Grigorieff, N., Jiang, W., Ludtke, S. J., Medalia, O., Penczek, P. A., Rosenthal, P. B., Rossmann, M. G., Schmid, M. F., Schröder, G. F., Steven, A. C., Stokes, D. L., Westbrook, J. D., Wriggers, W., Yang, H., Young, J., Berman, H. M., Chiu, W., Kleywegt, G. J. & Lawson, C. L. (2012). Structure, 20, 205–214.
Hooft, R. W. W., Sander, C. & Vriend, G. (1997). Bioinformatics, 13, 425–430.
Hunter, J. D. (2007). Comput. Sci. Eng. 9, 90–95.
Joosten, R. P., Long, F., Murshudov, G. N. & Perrakis, A. (2014). IUCrJ, 1, 213–220.
Joosten, R. P., Salzemann, J., Bloch, V., Stockinger, H., Berglund, A.-C., Blanchet, C., Bongcam-Rudloff, E., Combet, C., Da Costa, A. L., Deleage, G., Diarena, M., Fabbretti, R., Fettahi, G., Flegel, V., Gisel, A., Kasam, V., Kervinen, T., Korpelainen, E., Mattila, K., Pagni, M., Reichstadt, M., Breton, V., Tickle, I. J. & Vriend, G. (2009). J. Appl. Cryst. 42, 376–384.
Joosten, R. P., Womack, T., Vriend, G. & Bricogne, G. (2009). Acta Cryst. D65, 176–185.
Kleywegt, G. J., Harris, M. R., Zou, J., Taylor, T. C., Wählby, A. & Jones, T. A. (2004). Acta Cryst. D60, 2240–2249.
Lawson, C. L., Baker, M. L., Best, C., Bi, C., Dougherty, M., Feng, P., van Ginkel, G., Devkota, B., Lagerstedt, I., Ludtke, S. J., Newman, R. H., Oldfield, T. J., Rees, I., Sahni, G., Sala, R., Velankar, S., Warren, J., Westbrook, J. D., Henrick, K., Kleywegt, G. J., Berman, H. M. & Chiu, W. (2011). Nucleic Acids Res. 39, D456–D464.
Lawson, C. L., Kryshtafovych, A., Adams, P. D., Afonine, P. V., Baker, M. L., Barad, B. A., Bond, P., Burnley, T., Cao, R., Cheng, J., Chojnowski, G., Cowtan, K., Dill, K. A., DiMaio, F., Farrell, D. P., Fraser, J. S., Herzik, M. A., Hoh, S. W., Hou, J., Hung, L.-W., Igaev, M., Joseph, A. P., Kihara, D., Kumar, D., Mittal, S., Monastyrskyy, B., Olek, M., Palmer, C. M., Patwardhan, A., Perez, A., Pfab, J., Pintilie, G. D., Richardson, J. S., Rosenthal, P. B., Sarkar, D., Schäfer, L. U., Schmid, M. F., Schröder, G. F., Shekhar, M., Si, D., Singharoy, A., Terashi, G., Terwilliger, T. C., Vaiana, A., Wang, L., Wang, Z., Wankowicz, S. A., Williams, C. J., Winn, M., Wu, T., Yu, X., Zhang, K., Berman, H. M. & Chiu, W. (2020). bioRxiv, 2020.06.12.147033.
Li, X., Mooney, P., Zheng, S., Booth, C. R., Braunfeld, M. B., Gubbens, S., Agard, D. A. & Cheng, Y. (2013). Nat. Methods, 10, 584–590.
Liebschner, D., Afonine, P. V., Baker, M. L., Bunkóczi, G., Chen, V. B., Croll, T. I., Hintze, B., Hung, L.-W., Jain, S., McCoy, A. J., Moriarty, N. W., Oeffner, R. D., Poon, B. K., Prisant, M. G., Read, R. J., Richardson, J. S., Richardson, D. C., Sammito, M. D., Sobolev, O. V., Stockwell, D. H., Terwilliger, T. C., Urzhumtsev, A. G., Videau, L. L., Williams, C. J. & Adams, P. D. (2019). Acta Cryst. D75, 861–877.
Merk, A., Bartesaghi, A., Banerjee, S., Falconieri, V., Rao, P., Davis, M. I., Pragani, R., Boxer, M. B., Earl, L. A., Milne, J. L. S. & Subramaniam, S. (2016). Cell, 165, 1698–1707.
Milazzo, A.-C., Leblanc, P., Duttweiler, F., Jin, L., Bouwer, J. C., Peltier, S., Ellisman, M., Bieser, F., Matis, H. S., Wieman, H., Denes, P., Kleinfelder, S. & Xuong, N.-H. (2005). Ultramicroscopy, 104, 152–159.
Moriarty, N. W., Grosse-Kunstleve, R. W. & Adams, P. D. (2009). Acta Cryst. D65, 1074–1080.
Moriarty, N. W., Tronrud, D. E., Adams, P. D. & Karplus, P. A. (2016). Acta Cryst. D72, 176–179.
Oldfield, T. J. (2001). Acta Cryst. D57, 82–94.
Read, R. J., Adams, P. D., Arendall, W. B., Brunger, A. T., Emsley, P., Joosten, R. P., Kleywegt, G. J., Krissinel, E. B., Lütteke, T., Otwinowski, Z., Perrakis, A., Richardson, J. S., Sheffler, W. H., Smith, J. L., Tickle, I. J., Vriend, G. & Zwart, P. H. (2011). Structure, 19, 1395–1412.
Read, R. J. & Kleywegt, G. J. (2009). Acta Cryst. D65, 140–147.
Richardson, J. S., Williams, C. J., Hintze, B. J., Chen, V. B., Prisant, M. G., Videau, L. L. & Richardson, D. C. (2018). Acta Cryst. D74, 132–142.
Richardson, J. S., Williams, C. J., Videau, L. L., Chen, V. B. & Richardson, D. C. (2018). J. Struct. Biol. 204, 301–312.
Rose, A. S., Bradley, A. R., Valasatava, Y., Duarte, J. M., Prlić, A. & Rose, P. W. (2018). Bioinformatics, 34, 3755–3758.
Rose, A. S. & Hildebrand, P. W. (2015). Nucleic Acids Res. 43, W576–W579.
Rosenthal, P. B. & Henderson, R. (2003). J. Mol. Biol. 333, 721–745.
Scheres, S. H. W. (2012). J. Struct. Biol. 180, 519–530.
Sobolev, O. V., Afonine, P. V., Moriarty, N. W., Hekkelman, M. L., Joosten, R. P., Perrakis, A. & Adams, P. D. (2020). Structure, 28, 1249–1258.
Terwilliger, T. C., Adams, P. D., Afonine, P. V. & Sobolev, O. V. (2020). Protein Sci. 29, 87–99.
Terwilliger, T. C., Ludtke, S. J., Read, R. J., Adams, P. D. & Afonine, P. V. (2020). Nat. Methods, 17, 923–927.
Terwilliger, T. C., Read, R. J., Adams, P. D., Brunger, A. T., Afonine, P. V. & Hung, L.-W. (2013). Acta Cryst. D69, 2244–2250.
Terwilliger, T. C., Sobolev, O. V., Afonine, P. V. & Adams, P. D. (2018). Acta Cryst. D74, 545–559.
Terwilliger, T. C., Sobolev, O. V., Afonine, P. V., Adams, P. D. & Read, R. J. (2020). Acta Cryst. D76, 912–925.
Urzhumtsev, A., Afonine, P. V. & Adams, P. D. (2009). Acta Cryst. D65, 1283–1291.
Williams, C. J., Headd, J. J., Moriarty, N. W., Prisant, M. G., Videau, L. L., Deis, L. N., Verma, V., Keedy, D. A., Hintze, B. J., Chen, V. B., Jain, S., Lewis, S. M., Arendall, W. B., Snoeyink, J., Adams, P. D., Lovell, S. C., Richardson, J. S. & Richardson, D. C. (2018). Protein Sci. 27, 293–315.
Wlodawer, A. (2017). Methods Mol. Biol. 1607, 595–610.
Word, J. M., Lovell, S. C., LaBean, T. H., Taylor, H. C., Zalis, M. E., Presley, B. K., Richardson, J. S. & Richardson, D. C. (1999). J. Mol. Biol. 285, 1711–1733.
wwPDB Consortium (2019). Nucleic Acids Res. 47, D520–D528.
Xie, Q., Spilman, M., Meyer, N. L., Lerch, T. F., Stagg, S. M. & Chapman, M. S. (2013). J. Struct. Biol. 184, 129–135.
This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.