Modes and model building in SHELXE
(a) ICREA, Institució Catalana de Recerca i Estudis Avançats, Passeig Lluís Companys, 23, Barcelona, E-08003, Spain; (b) Crystallographic Methods, Institute of Molecular Biology of Barcelona (IBMB-CSIC), Barcelona Science Park, Helix Building, Baldiri Reixach, 15, Barcelona, 08028, Spain; and (c) Department of Structural Chemistry, Georg-August Universität Göttingen, Tammannstrasse 4, 37077 Göttingen, Germany
*Correspondence e-mail: uson@ibmb.csic.es
Density modification is a standard step in routine structure solution by any experimental phasing method, with single-wavelength or multi-wavelength anomalous diffraction being the most popular methods, and in extending fragments or incomplete models into a full solution. The effect of density modification on the starting maps from either source is illustrated in the case of SHELXE. The different modes in which the program can run are reviewed; these include less well known uses such as reading external phase values and weights or phase distributions encoded in Hendrickson–Lattman coefficients. Typically in SHELXE, initial phases are calculated from experimental data, from a partial model or map, or from a combination of both sources. The initial phase set is improved and extended by density modification and, if the resolution of the data and the type of structure permits, polyalanine tracing. As a feature to systematically eliminate model bias from phases derived from predicted models, the trace can be set to exclude the area occupied by the starting model. The trace now includes an extension into the gamma position or hydrophobic and aromatic side chains if a sequence is provided, which is performed in every tracing cycle. Once a correlation coefficient (CC) of over 30% between the structure factors calculated from such a trace and the native data indicates that the structure has been solved, the sequence is docked in all model-building cycles and side chains are fitted if the map supports it. The extensions to the tracing algorithm brought in to provide a complete model are discussed. The improvement in phasing performance is assessed using a set of tests.
Keywords: model building; phasing; density modification; MRSAD; SHELXE.
1. Introduction
Starting phases from molecular replacement (MR) or experimental phasing (see, for example, Read, 2001; Hendrickson et al., 1985) are often not accurate enough to make the solution of a macromolecular structure evident and to allow the building of a complete model for refinement. Still, once a starting solution is obtained it is possible to constrain the electron density to conform to previous structural knowledge. The modified map is then back-transformed and the phases that it renders are combined with the original phases to improve them. Such procedures are called density modification and were pioneered for macromolecules by Main (1967), while the first successful application of density modification was reported for small molecules by Hoppe and Gassmann in their phase-correction method (Hoppe & Gassmann, 1968).
Many sophisticated density-modification schemes have been proposed and have been incorporated into widely used programs such as DM (Cowtan & Main, 1998), SOLOMON (Abrahams & Leslie, 1996) and RESOLVE (Terwilliger, 2000). Effective concepts for macromolecular density modification include noncrystallographic symmetry (NCS) averaging (Main, 1967; Bricogne, 1976; Kleywegt & Read, 1997), solvent flattening (Wang, 1985), histogram matching (Zhang & Main, 1990), solvent flipping (Abrahams, 1997) and statistical approaches (Terwilliger, 2000, 2003; Cowtan, 2000). Map interpretation is extremely powerful as it extends information over the whole resolution range and is widely used, for example in the ARP/wARP algorithms (Perrakis et al., 2001).
SHELXE implements an alternative approach aiming to enforce stereochemical knowledge, the sphere-of-influence algorithm (Sheldrick, 2002), which is iterated with main-chain tracing (Sheldrick, 2010). In SHELXE, map interpretation has now been extended into the side chains to improve the phases and provide a more complete model.
2. Density modification in SHELXE
2.1. General principles
Fig. 1 shows a scheme representing phase improvement by density modification, adapted from the relevant chapter in International Tables for Crystallography (Zhang et al., 2001), to illustrate its practical effect in SHELXE. The idea is general, whether the starting phases have been determined through experimental phasing, nowadays most frequently single-wavelength anomalous diffraction (SAD) or multi-wavelength anomalous diffraction (MAD), or calculated from a partial model placed by MR or through a combination of both sources, as in MRSAD (Panjikar et al., 2009). Once approximate phases are available for a structure, an electron-density map can be computed. This density can be modified based on assumptions about the general physical properties underlying the structure: for instance, that in X-ray diffraction density should never be negative (Karle & Hauptman, 1964). Prior knowledge and statistical analysis of its properties are brought into the process. The modified map can be inverted back to calculate structure factors and the resulting phases are expected to have improved. Combining them with the original phases and the experimental amplitudes, a new, presumably better, map is calculated to initiate a fresh iteration. Fig. 1 illustrates the effect of density modification in the case of a protein originally phased using ARCIMBOLDO (Millán et al., 2015), displaying helices decorating a central β-sheet and containing zinc sites (PDB entry 6ys7). Initial phases are calculated from a single polyalanine helix of 16 amino acids placed by Phaser (McCoy et al., 2007), emphasized in the original map as it provides the only clearly defined feature. As the phases are calculated from this helix, the resulting map would in any case show such helical density due to model bias (Brünger, 1997; Luebben & Gruene, 2015; for an animated illustration see https://chango.ibmb.csic.es/resource/colibri.html), even if the true structure does not contain a helix in this position.
The map produced after 20 cycles of density modification has developed new features in areas where no initial model was present: in particular, clear density is apparent for a second helix, whereas interpreting the central β-sheet would be more challenging. Eventually, the map after additional model building followed by fresh cycles of density modification becomes very clear, with correct density extending to the side chains. Fig. 1(b) shows the map calculated for the same protein in a different, zinc-containing crystal (PDB entry 6ysd) from the raw SAD phases, after resolving the twofold ambiguity and adding the heavy-atom contribution. In contrast to the initial, model bias-dominated, fragment-derived maps in Fig. 1(a), the signal is more evenly distributed and noise is present everywhere in the map. Fig. 1(c) displays the resulting map for this data set after density modification and main-chain tracing.
2.2. Initial phase information: the different modes in SHELXE
In practice, SHELXE supports several sources of phase information as input, be it from an external calculation with a different program or internally generated and combined. Experimental phasing can be exploited for single isomorphous replacement with anomalous scattering (SIRAS), single isomorphous replacement (SIR), radiation-damage-induced phasing (RIP), MAD or, as in the case summarized in Table 1, SAD, after data preparation with SHELXC. Multiple isomorphous replacement (MIR) or MIR with anomalous scattering (MIRAS) would require an external program, such as SHARP (Vonrhein et al., 2007). Alternatively, phases can be calculated from a (partial) model provided in orthogonal (PDB) or crystal coordinates placed by MR (Thorn & Sheldrick, 2013). Map coefficients, phases and weights in the SHELXE .phi format (a renamed .phs file), structure factors in .fcf format, generated with SHELXL (Sheldrick, 2015), or phase distributions encoded as Hendrickson–Lattman coefficients (Hendrickson & Lattman, 1970) constitute suitable input. Experimental and map or model phases can be combined, either by providing a substructure located, for example, with SHELXD (Usón & Sheldrick, 1999) or ANoDe (Thorn & Sheldrick, 2011) or through an internal cross-Fourier synthesis and peak search. If the substructure is given, SHELXE will refer it to the same origin as the model and invert it if necessary. It is also possible to perform density modification on the phases derived from the model or map and have SHELXE determine the substructure in the final cycle. This last procedure would not combine both sources of phase information but may be useful in the context of some CCP4 pipelines (Agirre et al., 2023), for a subsequent iteration or to identify correctly placed partial models.
The various modes of running the program are illustrated for the case of an apoferritin structure (PDB entry 2g4h; Mueller-Dieckmann et al., 2007). Results are summarized in Table 1. The single data set, to a resolution of 2 Å, contains anomalous signal from the cadmium cations present. A helical polyalanine fragment of 32 residues was placed with Coot (Emsley et al., 2010) as a partial model fragment to provide phases from the derived coordinates and temperature factors, map and structure factors. From the results shown in Table 1, it can be seen that the substructure used in the SAD experiment, reflecting the sites and occupancies in the deposited PDB entry, can be improved for phasing purposes. In fact, better results are obtained with fewer atoms, and the modes locating the substructure from the partial model or map return the four major cadmium sites. SHELXE can also refine the anomalous substructure provided (-z). In addition, it is possible to read in phase distributions, as shown in the table for this example, which were generated with SHARP starting from the same sites. Anisotropic refinement of the substructure, accounting for various types of scatterers and starting from a phase distribution, may be required for difficult cases. The values in Table 1 show how this improved start leads to a much better map upon 20 cycles of density modification. In addition, this route will be necessary to solve a structure from a MIRAS experiment.
A partial model provides an alternative start, with slight differences between the PDB model and the map, as the map is used as provided, whereas the PDB model is trimmed to optimize the correlation to the native data of the structure factors calculated from it. Also, the default values for sharpening vary slightly for the orthogonal and crystal coordinate formats, due to their typical use contexts. MRSAD combination of experimental with model or map phases yields the best results. Any of these starts leads to the equivalent solved structure when model building is performed, and Table 1 shows the resulting phase errors when three cycles of main-chain tracing of the map are used to improve the phases.
2.3. The sphere-of-influence algorithm
Classical density modification, as established by B.-C. Wang (1985), divides the map into protein and solvent regions. Protein regions present the largest density fluctuations and their features should be further enhanced, whereas the density in the more featureless solvent region should be low and uniform and is accordingly flattened. Unique to SHELXE, the sphere-of-influence algorithm (Sheldrick, 2002) avoids locating and smoothing the boundary between the protein and the solvent. In this algorithm, the variance V of the density is calculated for a spherical surface of radius 2.42 Å (a typical 1,3 distance in a macromolecule) around each voxel in the map. For voxels with a low variance, as would be expected within the solvent region, the density at the voxel is 'flipped' ($\rho' = -\gamma\rho$, where $\gamma$ is typically 1.1 but may be set by the user). The procedure is related to the γ-correction (Abrahams, 1997), except that it does not require an explicit solvent boundary. For voxels with a high variance, typical for protein regions, the density is reset to zero if negative and is otherwise left unchanged or subjected to a sharpening function, $\rho_{\mathrm{mod}} = [\rho^{4}/(\nu^{2}\sigma^{2}(\rho) + \rho^{2})]^{1/2}$, with $\nu$ being resolution dependent and larger the higher the resolution. This function is similar in its behaviour to that used in ACORN (Foadi et al., 2000). For intermediate values of the variance, SHELXE applies a weighted combination of the corrected values for the protein and the solvent regions. Sharpening is particularly effective at high resolution or for experimental phases, and in default use is downweighted for fragment-derived phases.
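The voxel-wise treatment described above can be sketched in a few lines of Python. This is a minimal illustration and not the SHELXE implementation: the function name, the variance thresholds `v_low` and `v_high` and the linear blend used for intermediate variances are assumptions introduced here; only the flipping factor γ = 1.1 and the form of the sharpening function are taken from the text.

```python
import math


def modify_voxel(rho, variance, sigma, nu, gamma=1.1, v_low=0.2, v_high=0.8):
    """Return the modified density for one voxel (illustrative sketch only).

    rho      : density at the voxel
    variance : variance of the density on the sphere around the voxel
    sigma    : standard deviation of the map density, sigma(rho)
    nu       : resolution-dependent sharpening parameter
    v_low, v_high : hypothetical thresholds separating solvent-like and
                    protein-like variance (not values used by SHELXE)
    """
    # Solvent-like treatment: flip the density, rho' = -gamma * rho.
    solvent = -gamma * rho

    # Protein-like treatment: negative density is reset to zero, positive
    # density is sharpened with [rho^4 / (nu^2 sigma^2 + rho^2)]^(1/2).
    if rho <= 0.0:
        protein = 0.0
    else:
        protein = math.sqrt(rho**4 / (nu**2 * sigma**2 + rho**2))

    if variance <= v_low:
        return solvent
    if variance >= v_high:
        return protein

    # Assumed linear blend of both treatments for intermediate variance.
    w = (variance - v_low) / (v_high - v_low)
    return w * protein + (1.0 - w) * solvent
```

In this sketch no solvent boundary is ever computed: the local variance alone decides how each voxel is treated, which is the point made in the text.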
2.4. Extension of partial structures
Main-chain interpretation was incorporated into SHELXE to support phase improvement (Sheldrick, 2010) and was extended to include secondary-structure and tertiary-structure constraints for lower resolution purposes (Usón & Sheldrick, 2018). SHELXE typically traces one third to one half of the final structure in order to avoid compromising on accuracy, because deviations from the correct structure tend to quench the extension process. The factors underlying the chance of success in the extension of a substructure with given diffraction data are well understood, if not predictable in a quantitative way. They are illustrated in Fig. 2 using four contrasting examples. The structure of aldose reductase (PDB entry 4lbs), shown in Fig. 2(a), in complex with the bromine-containing ligand {2-[(4-bromo-2,6-difluorobenzyl)carbamoyl]-5-chlorophenoxy}acetic acid and NADP+ is, at 0.76 Å, one of the highest resolution structures deposited in the PDB (Fanfrlik et al., 2013) for the comparatively large content of the asymmetric unit of 316 amino acids. Nevertheless, the Br atom, which can be placed from the native Patterson (Patterson, 1935), suffices to expand 85% of the structure. Fig. 2(a) illustrates how, although the initial phases are characterized by an extremely high weighted mean phase error (wMPE) of above 80°, even density modification alone succeeds in very slowly improving the phase information, so that after 500 cycles the average phase errors have decreased by ∼5°. Main-chain autotracing accelerates convergence, and two rounds, interspersed with ten cycles of density modification, bring the errors down to 35°. Subsequent density modification brings the wMPE down to 15° (not shown in the figure). This constitutes a residual difference, given that the final deposited structure used as reference contains a model accounting for features established in the course of a high-resolution refinement that are outside the scope of the model used in phasing: H atoms, (anisotropic) displacement parameters, multiple conformations for disordered regions, bulk-solvent correction and scaling. Phase differences to the deposited structure are therefore never expected to be zero after density modification.
The resolution yielded by these aldose reductase crystals is extremely unusual, so in contrast PDB entry 1buu exemplifies a structure diffracting to a more typical resolution of 1.93 Å which can also be extended from a single atom to its 150 independent amino acids. Solution starts from a holmium(III) cation to provide starting phases characterized by an already remarkably low wMPE of below 60° (Fig. 2b). It should be remarked that holmium(III), with 64 rather than 36 electrons as in Br−, represents a considerable contribution to the total scattering. In Fig. 2(b), a steeper convergence can be observed during the 40 cycles displayed in the SHELXE run where main-chain tracing is used along with density modification than in the run where only density modification is used. This difference becomes negligible if both processes are allowed to run for more cycles until convergence, as in this and many other cases the final result will be limited by the quality of the data rather than by the starting phase information.
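As a worked illustration of the quantity used to follow convergence in these examples, a weighted mean phase error can be computed as sketched below. The text does not state the weighting scheme, so the choice of weights (for instance amplitudes or figures of merit) is an assumption left to the caller, and the function name is hypothetical.

```python
def weighted_mean_phase_error(phases_deg, reference_deg, weights):
    """Weighted mean of absolute phase differences, in degrees.

    Phase differences are wrapped into the range 0-180 degrees so that,
    for example, 350 and 10 degrees differ by 20 rather than 340 degrees.
    The weights are supplied by the caller (assumed, not prescribed here).
    """
    numerator = 0.0
    denominator = 0.0
    for phi, ref, w in zip(phases_deg, reference_deg, weights):
        diff = abs(phi - ref) % 360.0
        if diff > 180.0:
            diff = 360.0 - diff
        numerator += w * diff
        denominator += w
    return numerator / denominator
```

With this definition, a random phase set for acentric reflections gives a value near 90°, which is why starting errors of above 80° are described as extremely high.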
Modern synchrotrons and in-house diffractometers should be able to extract useful anomalous signal for experimental phasing, rendering the first two examples somewhat academic. Nevertheless, in the absence of heavy atoms the same pattern can be seen. Four residues (barely an α-helix turn) suffice, when correctly placed, to phase PDB entry 1zzk (0.95 Å resolution). As seen in Fig. 2(c), in this case tracing accelerates convergence, even though in its absence the same very low overall errors are eventually reached when more cycles are performed (not shown in the figure). Incomplete models at medium resolution may require autotracing for extension into a full solution; 10–15% of the main chain is typically enough at resolutions around 2 Å, as exemplified in Fig. 2(d) in the case of human myosin 5B (PDB entry 4j5m; Nascimento et al., 2013). This protein is 396 residues long, the data extend to 2.1 Å resolution and it can be phased from two helices of 17 residues each, whereas without building the main-chain model these starting phases deteriorate and no solution is achieved. Cases such as those described constitute frequent targets in pipelines such as AMPLE (Bibby et al., 2012; Rigden et al., 2018; Simpkin et al., 2019), MrBUMP (Keegan et al., 2018) or ARCIMBOLDO_LITE (Sammito et al., 2015).
The availability of high-resolution data has so far been critical for the extension of features outside the placed partial structures when these constitute a very limited fraction of the content of the asymmetric unit. Some improvement is achieved by extrapolating unmeasured data, whether missing low-resolution data or reflections beyond the resolution limit (Usón et al., 2007), in what has been named the 'free lunch' algorithm. This option is used to generate electron-density maps and can produce spectacular results for high resolution and/or high solvent content.
2.5. Phase information from predicted models
The advent of accurately predicted models frequently allows routine molecular-replacement solution of crystallographic structures using AlphaFold (Jumper et al., 2021) or RoseTTAFold models (Baek et al., 2021), for instance exploiting the optimized tools and pipelines available in CCP4 (Simpkin et al., 2023). SHELXE now provides a feature to allow systematic elimination of model bias: -V will exclude the area occupied by the starting model from tracing. This feature is used for verification (Caballero et al., 2018) within ARCIMBOLDO_SHREDDER (Sammito et al., 2014; Millán et al., 2018), which was originally designed to identify and refine the closest fragments present in a remote homolog structure. Within its dedicated mode for predicted models (Medina et al., 2022), combining traces from different fragments ensures that the resulting solution is derived only from the inferences of each of these fragments while the original model has been systematically eliminated.
2.6. No structure solution despite partially correct phase information
Borderline cases occur where despite partially correct start phases it may still not be possible to solve the structure by interpreting the experimental map, extending a substructure or eliminating the errors in a correctly placed model that presents large geometrical differences to the target. Locating the anomalous substructure or solving the molecular-replacement problem is not necessarily equivalent to solving the structure. When the starting phases are not accurate enough, and the more so the poorer the resolution, the correct starting information cannot successfully be extended and nonrandom starts remain unsolved. Also, a high percentage of helical structure is advantageous versus predominantly β-structures.
In such cases, the brute-force method implemented in SLIDER of probing all favourable side-chain assignments onto a trace can extract a solution (Borges et al., 2020). Our experience with this program underlies the choices to extend the model towards the side chains in the SHELXE implementation. Other approaches such as the sophisticated combination of building and refinement developed over the last three decades in ARP/wARP (Chojnowski et al., 2020) should also be mentioned here. In the case of SLIDER, we observed that in the absence of powerful hardware to support the arduous calculations associated with probing all possible side-chain assignments, simplified modes considering only aromatic and hydrophobic residues or even reducing every side chain to a serine (Schwarzenbacher et al., 2004) occasionally allowed a complete solution to be obtained from a poor start.
2.7. Gamma extension and map probing
Even in the absence of sequence information, it can be safely assumed that most residues will have a side chain with a C, O or S atom in the gamma position. As the density modification proceeds, main-chain electron density tends to be revealed earlier and more prominently, but even with large mean phase errors (see Fig. 1) clear electron density starts to show in parts of the structure. Gamma positions typically cluster in one of three staggered positions (Lovell et al., 2000). Probing density in each of these at a compromise distance of 1.47 Å, between the shorter CB–OG distance in serine and the CB–CG distance in most amino acids, establishes whether the difference between the highest and lowest electron-density value is significant. Furthermore, it allows the detection of features in the map. In every autotracing cycle the trace is probed at the gamma position, provided that there is clear density for the beta position. Otherwise, the residue is annotated as a probable glycine. If there is clear discrimination between the highest and lowest density in the alternative positions, the residue is modelled as pseudo-serine, with a slightly longer distance. Fig. 3(a) illustrates the inclusion of some gamma positions in the trace of a map for PDB entry 4ici, when it still has a large MPE of above 70°. If the maximum is not in trans, the ±30° conformation particular to proline will be probed with the appropriate geometry for its gamma carbon (Fig. 3b). Finally, if the intermediate and highest density values are similar and clearly discriminated from the lowest value, the residue is annotated as a probable valine, threonine or isoleucine. Gamma sites are included in the calculation of trace-derived phases for the next cycle. The improvement that this provides is modest in the first cycle, less than 2° in the best cases, but in no case has it been found to deteriorate the phases, as Table 2 shows for a set of test cases.
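The decision logic applied to the three staggered gamma positions can be summarized with the following sketch. It only classifies density values that have already been sampled; the thresholds `min_contrast` and `similarity_tol`, the function name and the returned labels are assumptions made for illustration, and the proline-specific probing is omitted.

```python
def classify_gamma(densities, beta_is_clear, min_contrast=0.5, similarity_tol=0.2):
    """Classify a traced residue from the density at three staggered
    gamma positions (illustrative sketch; thresholds are hypothetical).

    densities     : three density values sampled 1.47 A from CB
    beta_is_clear : True if there is clear density at the beta position
    """
    if not beta_is_clear:
        # Without beta density no gamma position is probed.
        return "probable glycine"

    lowest, middle, highest = sorted(densities)

    if highest - lowest < min_contrast:
        # No significant discrimination: leave the residue as alanine.
        return "alanine"

    if highest - middle <= similarity_tol:
        # Two comparably strong positions, both well above the lowest:
        # a branched residue is likely.
        return "probable valine, threonine or isoleucine"

    # A single clearly preferred position.
    return "pseudo-serine"
```

For example, densities of (1.0, 0.2, 0.1) single out one position and give a pseudo-serine, whereas (1.0, 0.9, 0.1) suggest a branched side chain.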
2.8. Sequence docking
Sequence docking is performed by combining experimental evidence from the features in the current electron-density map and previous structural knowledge. There is a vast amount of prior knowledge on sequence–structure relationships, but the reliability of secondary-structure prediction from the sequence is limited in the absence of comprehensive PDB data (Berman et al., 2000; Lange et al., 2020). It would be possible to involve the powerful methods implemented in HHpred (Söding et al., 2005) as an external dependency, but rather than a comprehensive sequence analysis, SHELXE sequence assignment is guided by more robust principles, which stood out clearly enough to be identified when only a very small subset of protein structures had been determined. Notable among these are the overall propensities of some amino acids to form secondary structure (Chou & Fasman, 1974), particularly of the residues that typically terminate or initiate secondary-structure elements or mark loops (Richardson & Richardson, 1988; for example the proline displayed in Fig. 3b), and the consistent association of hydrophobic residues upon sequence docking with the trace of a strand or a helix (Eisenberg et al., 1984). Fundamentally, it is the available electron-density map that can be interrogated and the secondary structure of the trace internally described with characteristic vectors (Medina et al., 2020).
Sequence information is input through a FASTA format file named with the root name of the data file and the extension .seq. If the flag -O2 is set but no sequence file is provided, the program will issue a warning and will perform only gamma tracing to aid phasing, so that pipelines do not fail due to the lack of this file. To assign the corresponding sequence to each of the traced chains, these are considered from longest to shortest. Probabilities are then calculated for all possible sequence assignments obtained by sliding a copy of the sequence, with two trailing dummy residues on each end. Probabilities are calculated as the sum of the individual probabilities of each residue in the trace on a logarithmic scale, combining the score obtained probing the electron density and the probability derived from prior structural knowledge, following the scheme described in McCoy (2004). When a substructure of anomalous scatterers comprising selenium from selenomethionine or sulfur from cysteine and/or methionine is available, its atoms are used as sequence markers (Fig. 3).
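The sliding assignment of a sequence onto a traced chain, scored as a sum of logarithms, can be sketched as follows. This is a simplified illustration under stated assumptions: the per-residue probability tables, the probability floor, the flat probability given to dummy residues and all names are hypothetical, and the actual scoring scheme follows McCoy (2004) rather than this code.

```python
import math

DUMMY = "X"  # placeholder for the dummy residues padding the sequence


def dock_sequence(density_prob, prior_prob, sequence,
                  n_dummy=2, floor=1e-3, dummy_p=0.05):
    """Rank all alignments of `sequence` onto a traced chain.

    density_prob[i] and prior_prob[i] map one-letter residue codes to the
    probability that traced residue i is of that type, from map probing
    and from prior structural knowledge, respectively (assumed inputs).
    Returns (log_score, start) tuples sorted from best to worst, where
    `start` is the index in `sequence` matched to the first traced residue
    (negative if the trace begins on a dummy residue).
    """
    padded = DUMMY * n_dummy + sequence + DUMMY * n_dummy
    n_trace = len(density_prob)
    ranked = []
    for offset in range(len(padded) - n_trace + 1):
        score = 0.0
        for i in range(n_trace):
            aa = padded[offset + i]
            if aa == DUMMY:
                p = dummy_p  # flat, uninformative probability
            else:
                p = density_prob[i].get(aa, floor) * prior_prob[i].get(aa, floor)
            score += math.log(p)  # sum on a logarithmic scale
        ranked.append((score, offset - n_dummy))
    ranked.sort(reverse=True)
    return ranked
```

Returning the full ranking rather than only the top hit makes it possible to require that the best assignment is distinctly better than the runner-up before it is accepted.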
2.9. Error correction
As sequence docking relies on correct main-chain tracing, it is now preceded by a step that locates and removes connections with unusual Ramachandran values (Hollingsworth & Karplus, 2010) and lower density than that of the flanking connections. In general, to avoid errors the criterion used to accept an assignment is that it needs to be distinctly better than any alternative. Before incorporating side chains into the final trace (Fig. 3d), the correlation coefficient (CC, expressed as a percentage) calculated omitting the side chains is compared, for every stretch of chain, with that obtained when they are included, analogous to the PDB optimization step introduced in SHELX macromolecular phasing (Sheldrick & Gould, 1995). If the CC characterizing the polyalanine trace is higher than when side chains are incorporated, they are eliminated from that part of the model.
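The comparison that decides whether side chains are retained in a stretch of chain can be illustrated with a plain Pearson correlation coefficient expressed as a percentage. This is only a sketch: which quantities are correlated (here two generic lists standing in for calculated and observed values) and the function names are assumptions, not a description of the internal SHELXE calculation.

```python
import math


def correlation_percent(calc, obs):
    """Pearson correlation coefficient between two equally long lists,
    expressed as a percentage."""
    n = len(calc)
    mean_c = sum(calc) / n
    mean_o = sum(obs) / n
    s_co = sum((c - mean_c) * (o - mean_o) for c, o in zip(calc, obs))
    s_cc = sum((c - mean_c) ** 2 for c in calc)
    s_oo = sum((o - mean_o) ** 2 for o in obs)
    return 100.0 * s_co / math.sqrt(s_cc * s_oo)


def keep_side_chains(cc_polyalanine, cc_with_side_chains):
    """Side chains are kept for a stretch only if they do not lower the CC
    relative to the polyalanine version of the same stretch."""
    return cc_with_side_chains >= cc_polyalanine
```

A stretch whose polyalanine CC is 42% but which drops to 40% once side chains are added would therefore revert to polyalanine.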
3. Tracing tests
Fig. 4 displays the results of the phasing and model building of apoferritin (Mueller-Dieckmann et al., 2007) described in Section 2.2. Starting phases are derived from anomalous difference data and cadmium sites or the phase-probability distributions calculated therefrom as Hendrickson–Lattman coefficients. Alternatively, starting phases of similar accuracy originate from a fragment of a long helix provided as a PDB file or in fractional crystal coordinates, as well as structure factors or a map calculated from the same model. Combinations of SAD and fragment or map phases are also included. The last model-building cycle involves side-chain tracing. In all of these cases, results after three groups of 20 cycles of density modification interspersed with map tracing are comparable. Nevertheless, in the cases where more complete starting information, combining model/map and SAD phases, is used convergence is faster and a more complete trace is already present in the first cycles.
Furthermore, 11 structures with resolutions ranging from 1.2 to 2.0 Å have been used to test and illustrate the new side-chain tracing features in SHELXE described above. Table 3 summarizes the characteristics, parameterization and phasing results obtained by incorporating side-chain tracing in SHELXE for SAD phasing and fragment cases where the atoms in the substructure or starting structure can be used as markers, as well as when this is not the case. For molecular-replacement solutions, initial phases should be limited (with the parameter -y) to a resolution dictated by the r.m.s.d. and extended in the course of density modification; the more limited the lower the identity between template and structure. For fragment phases, this default should be changed to use the full resolution available, as small fragments need to be accurate to be able to solve a structure. Hirustasin (Usón et al., 1999) and bucandin (Kuhn et al., 2000) have each been extended from ten sulfurs; SAD data for insulin and glucose isomerase were measured from nonmerohedrally twinned crystals (Sevvana et al., 2019). The SusD protein (PDB entry 3l22) was originally solved using a MAD experiment (Vollmar et al., 2020); for the purpose of this study it was phased from the peak wavelength used as SAD data at a resolution of 1.9 Å. VirusX CAS3 (Freitag-Pohl et al., 2019) was the first previously unknown structure where we used the SHELXE development version to build a model with side chains. The flavoprotein with PDB code 4ici in I41 was incorporated as a test where space-group inversion has to be performed at (0.5, 0.25, 0.5) rather than at the origin, an operation that is performed when phases from the fragment and from the pre-calculated anomalous substructure need to be referred to the same origin. Proteinase K (Wang et al., 2006) and AmiA (M. Alcorlo, M. R. Abdullah, S. Hammerschmidt & J. Hermoso, unpublished work) are proteins phased from fragments of homologs, placed and refined with ARCIMBOLDO_SHREDDER (Millán et al., 2018). The first is a test case starting from a map, as ALIXE (Millán et al., 2020) combines solutions in reciprocal space and outputs a set of phases and figures of merit. The second was originally solved with ARCIMBOLDO_SHREDDER and the test starts from a fragment.
Finally, PilA1 (Crawshaw et al., 2020) was originally solved with ARCIMBOLDO_LITE and contains three chains of 150 amino acids. For structures such as this one, with NCS, the FASTA format file read by SHELXE should explicitly contain a copy of the sequence for each of the chains present. The tracing results for all these cases are shown in Fig. 5.
4. Concluding remarks
This paper provides an overview of model building in SHELXE and of its effect as a constraint on density modification. It also describes all of the different modes in which SHELXE can be used and showcases how density modification can make a decisive contribution to phasing, which is sometimes hidden within pipelines.
The tracing algorithms have been expanded to enhance performance in borderline cases and to provide more complete models. Thus, the incorporation of side-chain atoms extends the phasing improvement brought about by model building. Furthermore, obtaining a more complete model, with side chains assigned and fitted to the density, is convenient. In the absence of a sequence or at lower resolution, tracing of polyserine has been added to SHELXE in all tracing cycles to increase model scattering. If a sequence is provided SHELXE can assign and fit side chains to the trace, which in the tests presented has been performed in the last tracing cycle, after extension of the gamma position in all previous cycles.
In view of the results presented, the recommended use when a sequence is provided corresponds to the default triggered by the flag -O: gamma extension and density probing are performed in every autotracing cycle, incorporating probable side chains for aromatic and hydrophobic residues with a partial occupancy of 0.6 into the trace used to generate phases for the next round of density modification. Still, models are output as polyalanine at this stage. Once the CC characterizing the trace reaches 30%, sequence docking will be performed in all remaining autotracing cycles, the best scored model with side chains will be saved as a PDB file and its derived phases will be combined in the calculation of the output map. SHELXE is often encountered within phasing pipelines, where a CC of greater than 25% between the structure factors calculated from the polyalanine trace and the native data is adopted as an indication that the structure has been solved at a resolution of 2.5 Å or better. As seen in Table 3, CC values up to twice those typically rendered by main-chain tracing are obtained from the complete models. Therefore, the procedure implemented ensures that the CC value will be consulted in the polyalanine trace and that the most complete and correct model will be output incorporating side chains for a solved structure.
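The thresholds quoted in this section can be gathered into a small decision sketch. It is meant only to make the sequence of decisions explicit: the function names and returned strings are hypothetical, and the 30%, 25% and 2.5 Å values are the ones stated in the text rather than parameters read from SHELXE.

```python
def autotracing_actions(cc_polyalanine, has_sequence, docking_threshold=30.0):
    """List what is done in one autotracing cycle under the default
    described in the text (illustrative sketch)."""
    actions = ["gamma extension and density probing"]
    if has_sequence:
        if cc_polyalanine >= docking_threshold:
            # The trace is considered solved: dock the sequence and fit
            # side chains in this and all remaining cycles.
            actions.append("sequence docking and side-chain fitting")
        else:
            # Side chains only contribute to phasing; the model is still
            # output as polyalanine.
            actions.append("aromatic/hydrophobic side chains at occupancy 0.6")
    return actions


def pipeline_considers_solved(cc_polyalanine, resolution, cc_threshold=25.0,
                              resolution_limit=2.5):
    """Criterion commonly adopted by pipelines: polyalanine CC above 25%
    for data extending to 2.5 A resolution or better."""
    return cc_polyalanine > cc_threshold and resolution <= resolution_limit
```

Both functions take the CC of the polyalanine trace as input, reflecting the point made above that the indicator consulted by pipelines remains the polyalanine value even when a model with side chains is output.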
The performance of this procedure was assessed within the Auto-Rickshaw pipeline on a set of 40 structures that had not previously been used to develop the algorithms. The resolution in this pool of structures ranged between 2.0 and 2.4 Å, yielding improved results over the previous distributed version and nearly complete models.
At low resolution, model bias becomes a concern in the face of practically complete starting models. It is planned to extend the current feature (-V) to systematically exclude model bias.
Acknowledgements
We thank Clemens Vonrhein for his help in generating the apoferritin HL-encoded starting phase distribution with SHARP and Santosh Panjikar for thoroughly testing the new model building on an independent pool of structures within his Auto-Rickshaw pipeline and for useful discussion and feedback. We are grateful to the State of Niedersachsen GMS.
Funding information
IU acknowledges grants PGC2018-101370-B-I00 and PID2021-128751NB-I00 (Ministry of Science and Innovation/Spanish State Research Agency/European Regional Development Fund/European Union), support from the Science and Technology Facilities Council (CCP4-ARCIMBOLDO_LOW) and support for a Visiting Fellowship at Clare Hall College, Cambridge.
References
Abrahams, J. P. (1997). Acta Cryst. D53, 371–376. CrossRef CAS Web of Science IUCr Journals Google Scholar
Abrahams, J. P. & Leslie, A. G. W. (1996). Acta Cryst. D52, 30–42. CrossRef CAS Web of Science IUCr Journals Google Scholar
Agirre, J., Atanasova, M., Bagdonas, H., Ballard, C. B., Baslé, A., Beilsten-Edmands, J., Borges, R. J., Brown, D. G., Burgos-Mármol, J. J., Berrisford, J. M., Bond, P. S., Caballero, I., Catapano, L., Chojnowski, G., Cook, A. G., Cowtan, K. D., Croll, T. I., Debreczeni, J. É., Devenish, N. E., Dodson, E. J., Drevon, T. R., Emsley, P., Evans, G., Evans, P. R., Fando, M., Foadi, J., Fuentes-Montero, L., Garman, E. F., Gerstel, M., Gildea, R. J., Hatti, K., Hekkelman, M. L., Heuser, P., Hoh, S. W., Hough, M. A., Jenkins, H. T., Jiménez, E., Joosten, R. P., Keegan, R. M., Keep, N., Krissinel, E. B., Kolenko, P., Kovalevskiy, O., Lamzin, V. S., Lawson, D. M., Lebedev, A. A., Leslie, A. G. W., Lohkamp, B., Long, F., Malý, M., McCoy, A. J., McNicholas, S. J., Medina, A., Millán, C., Murray, J. W., Murshudov, G. N., Nicholls, R. A., Noble, M. E. M., Oeffner, R., Pannu, N. S., Parkhurst, J. M., Pearce, N., Pereira, J., Perrakis, A., Powell, H. R., Read, R. J., Rigden, D. J., Rochira, W., Sammito, M., Sánchez Rodríguez, F., Sheldrick, G. M., Shelley, K. L., Simkovic, F., Simpkin, A. J., Skubak, P., Sobolev, E., Steiner, R. A., Stevenson, K., Tews, I., Thomas, J. M. H., Thorn, A., Valls, J. T., Uski, V., Usón, I., Vagin, A., Velankar, S., Vollmar, M., Walden, H., Waterman, D., Wilson, K. S., Winn, M. D., Winter, G., Wojdyr, M. & Yamashita, K. (2023). Acta Cryst. D79, 449–461. Web of Science CrossRef IUCr Journals Google Scholar
Backe, P. H., Messias, A. C., Ravelli, R. B., Sattler, M. & Cusack, S. (2005). Structure, 13, 1055–1067. Web of Science CrossRef PubMed CAS Google Scholar
Baek, M., DiMaio, F., Anishchenko, I., Dauparas, J., Ovchinnikov, S., Lee, G. R., Wang, J., Cong, Q., Kinch, L. N., Schaeffer, R. D., Millán, C., Park, H., Adams, C., Glassman, C. R., DeGiovanni, A., Pereira, J. H., Rodrigues, A. V., van Dijk, A. A., Ebrecht, A. C., Opperman, D. J., Sagmeister, T., Buhlheller, C., Pavkov-Keller, T., Rathinaswamy, M. K., Dalwadi, U., Yip, C. K., Burke, J. E., Garcia, K. C., Grishin, N. V., Adams, P. D., Read, R. J. & Baker, D. (2021). Science, 373, 871–876. Web of Science CrossRef CAS PubMed Google Scholar
Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). Nucleic Acids Res. 28, 235–242. Web of Science CrossRef PubMed CAS Google Scholar
Bibby, J., Keegan, R. M., Mayans, O., Winn, M. D. & Rigden, D. J. (2012). Acta Cryst. D68, 1622–1631. Web of Science CrossRef IUCr Journals Google Scholar
Borges, R. J., Meindl, K., Triviño, J., Sammito, M., Medina, A., Millán, C., Alcorlo, M., Hermoso, J. A., Fontes, M. R. M. & Usón, I. (2020). Acta Cryst. D76, 221–237. Web of Science CrossRef IUCr Journals Google Scholar
Bricogne, G. (1976). Acta Cryst. A32, 832–847. CrossRef CAS IUCr Journals Web of Science Google Scholar
Brünger, A. T. (1997). Methods Enzymol. 277, 366–396. CrossRef PubMed CAS Web of Science Google Scholar
Caballero, I., Sammito, M., Millán, C., Lebedev, A., Soler, N. & Usón, I. (2018). Acta Cryst. D74, 194–204. Web of Science CrossRef IUCr Journals Google Scholar
Chojnowski, G., Choudhury, K., Heuser, P., Sobolev, E., Pereira, J., Oezugurel, U. & Lamzin, V. S. (2020). Acta Cryst. D76, 248–260. Web of Science CrossRef IUCr Journals Google Scholar
Chou, P. Y. & Fasman, G. D. (1974). Biochemistry, 13, 222–245. CrossRef CAS PubMed Web of Science Google Scholar
Cowtan, K. (2000). Acta Cryst. D56, 1612–1621. Web of Science CrossRef CAS IUCr Journals Google Scholar
Cowtan, K. & Main, P. (1998). Acta Cryst. D54, 487–493. Web of Science CrossRef CAS IUCr Journals Google Scholar
Crawshaw, A. D., Baslé, A. & Salgado, P. S. (2020). Acta Cryst. D76, 261–271. Web of Science CrossRef IUCr Journals Google Scholar
Eisenberg, D., Weiss, R. M. & Terwilliger, T. C. (1984). Proc. Natl Acad. Sci. USA, 81, 140–144. CrossRef CAS PubMed Web of Science Google Scholar
Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. (2010). Acta Cryst. D66, 486–501. Web of Science CrossRef CAS IUCr Journals Google Scholar
Fanfrlik, J., Kolar, M., Kamlar, M., Hurny, D., Ruiz, F. X., Cousido-Siah, A., Mitschler, A., Rezac, J., Munusamy, E., Lepsik, M., Matejicek, P., Vesely, J., Podjarny, A. & Hobza, P. (2013). ACS Chem. Biol. 8, 2484–2492. Web of Science CAS PubMed Google Scholar
Foadi, J., Woolfson, M. M., Dodson, E. J., Wilson, K. S., Jia-xing, Y. & Chao-de, Z. (2000). Acta Cryst. D56, 1137–1147. Web of Science CrossRef CAS IUCr Journals Google Scholar
Freitag-Pohl, S., Jasilionis, A., Håkansson, M., Svensson, L. A., Kovačič, R., Welin, M., Watzlawick, H., Wang, L., Altenbuchner, J., Płotka, M., Kaczorowska, A. K., Kaczorowski, T., Nordberg Karlsson, E., Al-Karadaghi, S., Walse, B., Aevarsson, A. & Pohl, E. (2019). Acta Cryst. D75, 1028–1039. Web of Science CrossRef IUCr Journals Google Scholar
Hendrickson, W. A. & Lattman, E. E. (1970). Acta Cryst. B26, 136–143. CrossRef CAS IUCr Journals Google Scholar
Hendrickson, W. A., Smith, J. L. & Sheriff, S. (1985). Methods Enzymol. 115, 41–55. CrossRef CAS PubMed Google Scholar
Hollingsworth, S. A. & Karplus, P. A. (2010). Biomol. Concepts, 1, 271–283. CrossRef CAS PubMed Google Scholar
Hoppe, W. & Gassmann, J. (1968). Acta Cryst. B24, 97–107. CrossRef CAS IUCr Journals Web of Science Google Scholar
Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Tunyasuvunakool, K., Bates, R., Žídek, A., Potapenko, A., Bridgland, A., Meyer, C., Kohl, S. A. A., Ballard, A. J., Cowie, A., Romera-Paredes, B., Nikolov, S., Jain, R., Adler, J., Back, T., Petersen, S., Reiman, D., Clancy, E., Zielinski, M., Steinegger, M., Pacholska, M., Berghammer, T., Bodenstein, S., Silver, D., Vinyals, O., Senior, A. W., Kavukcuoglu, K., Kohli, P. & Hassabis, D. (2021). Nature, 596, 583–589. Web of Science CrossRef CAS PubMed Google Scholar
Karle, J. & Hauptman, H. (1964). Acta Cryst. 17, 392–396. CrossRef CAS IUCr Journals Web of Science Google Scholar
Keegan, R. M., McNicholas, S. J., Thomas, J. M. H., Simpkin, A. J., Simkovic, F., Uski, V., Ballard, C. C., Winn, M. D., Wilson, K. S. & Rigden, D. J. (2018). Acta Cryst. D74, 167–182. Web of Science CrossRef IUCr Journals Google Scholar
Kleywegt, G. J. & Read, R. J. (1997). Structure, 5, 1557–1569. Web of Science CrossRef CAS PubMed Google Scholar
Kuhn, P., Deacon, A. M., Comsa, D.-S., Rajaseger, G., Kini, R. M., Usón, I. & Kolatkar, P. R. (2000). Acta Cryst. D56, 1401–1407. Web of Science CrossRef CAS IUCr Journals Google Scholar
Lange, J., Baakman, C., Pistorius, A., Krieger, E., Hooft, R., Joosten, R. P. & Vriend, G. (2020). Protein Sci. 29, 330–344. Web of Science CrossRef CAS PubMed Google Scholar
Lovell, S. C., Word, J. M., Richardson, J. S. & Richardson, D. C. (2000). Proteins, 40, 389–408. Web of Science CrossRef PubMed CAS Google Scholar
Luebben, J. & Gruene, T. (2015). Proc. Natl Acad. Sci. USA, 112, 8999–9003. Web of Science CrossRef CAS PubMed Google Scholar
Main, P. (1967). Acta Cryst. 23, 50–54. CrossRef CAS IUCr Journals Web of Science Google Scholar
McCoy, A. J. (2004). Acta Cryst. D60, 2169–2183. Web of Science CrossRef CAS IUCr Journals Google Scholar
McCoy, A. J., Grosse-Kunstleve, R. W., Adams, P. D., Winn, M. D., Storoni, L. C. & Read, R. J. (2007). J. Appl. Cryst. 40, 658–674. Web of Science CrossRef CAS IUCr Journals Google Scholar
Medina, A., Jiménez, E., Caballero, I., Castellví, A., Triviño Valls, J., Alcorlo, M., Molina, R., Hermoso, J. A., Sammito, M. D., Borges, R. & Usón, I. (2022). Acta Cryst. D78, 1283–1293. Web of Science CrossRef IUCr Journals Google Scholar
Medina, A., Triviño, J., Borges, R. J., Millán, C., Usón, I. & Sammito, M. D. (2020). Acta Cryst. D76, 193–208. Web of Science CrossRef IUCr Journals Google Scholar
Millán, C., Jiménez, E., Schuster, A., Diederichs, K. & Usón, I. (2020). Acta Cryst. D76, 209–220. Web of Science CrossRef IUCr Journals Google Scholar
Millán, C., Sammito, M. & Usón, I. (2015). IUCrJ, 2, 95–105. Web of Science CrossRef PubMed IUCr Journals Google Scholar
Millán, C., Sammito, M. D., McCoy, A. J., Nascimento, A. F. Z., Petrillo, G., Oeffner, R. D., Domínguez-Gil, T., Hermoso, J. A., Read, R. J. & Usón, I. (2018). Acta Cryst. D74, 290–304. Web of Science CrossRef IUCr Journals Google Scholar
Mueller-Dieckmann, C., Panjikar, S., Schmidt, A., Mueller, S., Kuper, J., Geerlof, A., Wilmanns, M., Singh, R. K., Tucker, P. A. & Weiss, M. S. (2007). Acta Cryst. D63, 366–380. Web of Science CrossRef CAS IUCr Journals Google Scholar
Nascimento, A. F. Z., Trindade, D. M., Tonoli, C. C. C., de Giuseppe, P. O., Assis, L. H. P., Honorato, R. V., de Oliveira, P. S. L., Mahajan, P., Burgess-Brown, N., von Delft, F., Larson, R. E. & Murakami, M. T. (2013). J. Biol. Chem. 288, 34131–34145. Web of Science CrossRef CAS PubMed Google Scholar
Ng, K. K., Park-Snyder, S. & Weis, W. I. (1998). Biochemistry, 37, 17965–17976. Web of Science CrossRef CAS PubMed Google Scholar
Panjikar, S., Parthasarathy, V., Lamzin, V. S., Weiss, M. S. & Tucker, P. A. (2009). Acta Cryst. D65, 1089–1097. Web of Science CrossRef CAS IUCr Journals Google Scholar
Patterson, A. L. (1935). Z. Kristallogr. Cryst. Mater. 90, 517–542. CrossRef CAS Google Scholar
Perrakis, A., Harkiolaki, M., Wilson, K. S. & Lamzin, V. S. (2001). Acta Cryst. D57, 1445–1450. Web of Science CrossRef CAS IUCr Journals Google Scholar
Read, R. J. (2001). Acta Cryst. D57, 1373–1382. Web of Science CrossRef CAS IUCr Journals Google Scholar
Richardson, J. S. & Richardson, D. C. (1988). Science, 240, 1648–1652. CrossRef CAS PubMed Web of Science Google Scholar
Rigden, D. J., Thomas, J. M. H., Simkovic, F., Simpkin, A., Winn, M. D., Mayans, O. & Keegan, R. M. (2018). Acta Cryst. D74, 183–193. Web of Science CrossRef IUCr Journals Google Scholar
Sammito, M., Meindl, K., de Ilarduya, I. M., Millán, C., Artola-Recolons, C., Hermoso, J. A. & Usón, I. (2014). FEBS J. 281, 4029–4045. Web of Science CrossRef CAS PubMed Google Scholar
Sammito, M., Millán, C., Frieske, D., Rodríguez-Freire, E., Borges, R. J. & Usón, I. (2015). Acta Cryst. D71, 1921–1930. Web of Science CrossRef IUCr Journals Google Scholar
Schwarzenbacher, R., Godzik, A., Grzechnik, S. K. & Jaroszewski, L. (2004). Acta Cryst. D60, 1229–1236. Web of Science CrossRef CAS IUCr Journals Google Scholar
Sevvana, M., Ruf, M., Usón, I., Sheldrick, G. M. & Herbst-Irmer, R. (2019). Acta Cryst. D75, 1040–1050. Web of Science CSD CrossRef ICSD IUCr Journals Google Scholar
Sheldrick, G. M. (2002). Z. Kristallogr. Cryst. Mater. 217, 644–650. Web of Science CrossRef CAS Google Scholar
Sheldrick, G. M. (2010). Acta Cryst. D66, 479–485. Web of Science CrossRef CAS IUCr Journals Google Scholar
Sheldrick, G. M. (2015). Acta Cryst. C71, 3–8. Web of Science CrossRef IUCr Journals Google Scholar
Sheldrick, G. M. & Gould, R. O. (1995). Acta Cryst. B51, 423–431. CrossRef CAS Web of Science IUCr Journals Google Scholar
Simpkin, A. J., Caballero, I., McNicholas, S., Stevenson, K., Jiménez, E., Sánchez Rodríguez, F., Fando, M., Uski, V., Ballard, C., Chojnowski, G., Lebedev, A., Krissinel, E., Usón, I., Rigden, D. J. & Keegan, R. M. (2023). Acta Cryst. D79, 806–819. Web of Science CrossRef IUCr Journals Google Scholar
Simpkin, A. J., Thomas, J. M. H., Simkovic, F., Keegan, R. M. & Rigden, D. J. (2019). Acta Cryst. D75, 1051–1062. Web of Science CrossRef IUCr Journals Google Scholar
Söding, J., Biegert, A. & Lupas, A. N. (2005). Nucleic Acids Res. 33, W244–W248. Web of Science PubMed Google Scholar
Terwilliger, T. C. (2000). Acta Cryst. D56, 965–972. Web of Science CrossRef CAS IUCr Journals Google Scholar
Terwilliger, T. C. (2003). Acta Cryst. D59, 1688–1701. Web of Science CrossRef CAS IUCr Journals Google Scholar
Thorn, A. & Sheldrick, G. M. (2011). J. Appl. Cryst. 44, 1285–1287. Web of Science CrossRef CAS IUCr Journals Google Scholar
Thorn, A. & Sheldrick, G. M. (2013). Acta Cryst. D69, 2251–2256. Web of Science CrossRef IUCr Journals Google Scholar
Usón, I. & Sheldrick, G. M. (1999). Curr. Opin. Struct. Biol. 9, 643–648. Web of Science CrossRef PubMed CAS Google Scholar
Usón, I. & Sheldrick, G. M. (2018). Acta Cryst. D74, 106–116. Web of Science CrossRef IUCr Journals Google Scholar
Usón, I., Sheldrick, G. M., de La Fortelle, E., Bricogne, G., Marco, S. D., Priestle, J. P., Grütter, M. G. & Mittl, P. R. (1999). Structure, 7, 55–63. Web of Science PubMed Google Scholar
Usón, I., Stevenson, C. E. M., Lawson, D. M. & Sheldrick, G. M. (2007). Acta Cryst. D63, 1069–1074. Web of Science CrossRef IUCr Journals Google Scholar
Vollmar, M., Parkhurst, J. M., Jaques, D., Baslé, A., Murshudov, G. N., Waterman, D. G. & Evans, G. (2020). IUCrJ, 7, 342–354. Web of Science CrossRef CAS PubMed IUCr Journals Google Scholar
Vonrhein, C., Blanc, E., Roversi, P. & Bricogne, G. (2007). Methods Mol. Biol. 364, 215–230. PubMed CAS Google Scholar
Wang, B.-C. (1985). Methods Enzymol. 115, 90–112. CrossRef CAS PubMed Google Scholar
Wang, J., Dauter, M. & Dauter, Z. (2006). Acta Cryst. D62, 1475–1483. Web of Science CrossRef CAS IUCr Journals Google Scholar
Zhang, K. Y. J., Cowtan, K. D. & Main, P. (2001). International Tables for Crystallography, Vol. F, edited by M. G. Rossmann & E. Arnold, pp. 311–324. Dordrecht: Kluwer Academic Publishers. Google Scholar
Zhang, K. Y. J. & Main, P. (1990). Acta Cryst. A46, 41–46. CrossRef CAS IUCr Journals Google Scholar
This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.