research papers

BIOLOGICAL
CRYSTALLOGRAPHY
ISSN: 1399-0047
Volume 65| Part 2| February 2009| Pages 121-127

Model-building strategies for low-resolution X-ray crystallographic data

Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, England
*Correspondence e-mail: nickf@ebi.ac.uk

(Received 10 September 2008; accepted 27 November 2008)

The interpretation of low-resolution X-ray crystallographic data proves to be challenging even for the most experienced crystallographer. Ambiguity in the electron-density map makes main-chain tracing and side-chain assignment difficult. However, the number of structures solved at resolutions poorer than 3.5 Å is growing rapidly and the structures are often of high biological interest and importance. Here, the challenges faced in electron-density interpretation, the strategies that have been employed to overcome them and developments to automate the process are reviewed. The methods employed in model generation from electron microscopy, which share many of the same challenges in providing high-confidence models of macromolecular structures and assemblies, are also considered.

1. Introduction

The number of low-resolution structures solved by X-ray crystallography and deposited in the Protein Data Bank (PDB; Berman et al., 2000[Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). Nucleic Acids Res. 28, 235-242.]) has increased rapidly in recent years (see Fig. 1). Whereas such data sets might previously have been discarded in pursuit of higher resolution, the value of the biological information that can only be obtained from lower resolution data has begun to be realised. It has been observed that the threshold of acceptability for obtaining mechanistic insights has been changing (Brunger, 2005[Brunger, A. T. (2005). Structure, 13, 171-172.]), with a number of significant structures being described, such as the entire ribosome (at 7.8 Å resolution; Cate et al., 1999[Cate, J. H., Yusupov, M. M., Yusupova, G. Z., Earnest, T. N. & Noller, H. F. (1999). Science, 285, 2095-2104.]), plant photosystem I (at 4.4 Å resolution; Ben-Shem et al., 2003[Ben-Shem, A., Frolow, F. & Nelson, N. (2003). Nature (London), 426, 630-635.]), the reverse transcriptase from HIV in complex with a target RNA (at 4.7 Å resolution; Jaeger et al., 1998[Jaeger, J., Restle, T. & Steitz, T. A. (1998). EMBO J. 17, 4535-4542.]) and many others. The increased reporting of low-resolution structures has coincided with advances in both experimental and computational methods for structure determination that make tackling the problems associated with low-resolution data increasingly tractable.

Figure 1
Rate of deposition of low-resolution structures: the number of new structures deposited each year in the Protein Data Bank that were solved by X-ray crystallographic methods at a resolution poorer than 3.5 Å.

In this review, we provide an overview of the techniques used in building a model from low-resolution electron-density maps. This can be one of the most time-consuming, laborious and difficult tasks in the structure-determination process. There are a number of techniques in data processing and refinement, which will only be briefly mentioned, that can greatly aid the structure-determination process. Current refinement techniques have been reviewed by DeLaBarre & Brunger (2006[DeLaBarre, B. & Brunger, A. T. (2006). Acta Cryst. D62, 923-932.]) and elsewhere in this issue. The model-building methods applied to X-ray crystallography are contrasted with those used in modelling electron-microscopy data and the difference in definitions of resolution between the two methods is examined.

2. Problems of interpreting low-resolution X-ray data

A number of factors contribute to the problem of generating an atomic model from low-resolution (d > 3.5 Å) X-ray diffraction data. The primary cause of difficulty is that the number of observations used in the calculation of the electron-density map is significantly smaller than the number of parameters to be defined. This results in a map with a lack of atomicity, with helices appearing as tubes of density, lack of definition of peptide groups and accumulation of density in places other than the main chain (see Fig. 2). Furthermore, termination of the Fourier series at low resolution can cause diffraction ripples around peaks in electron density, making the map difficult to interpret.
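The imbalance between observations and parameters can be made concrete with a rough count. The following is a minimal sketch under stated assumptions rather than a calculation taken from this review: the number of unique reflections is approximated from the volume of the resolution sphere in reciprocal space for a primitive cell, and each atom is assumed to carry four parameters (x, y, z and an isotropic B factor). The function names, the cell volume, the atom count and the symmetry in the usage note are hypothetical.

```python
import math


def unique_reflections(cell_volume_A3, d_min_A, n_rotations=1):
    """Approximate number of unique reflections to resolution d_min.

    About (4/3)*pi*V/d_min**3 reciprocal-lattice points of a primitive cell
    lie inside a sphere of radius 1/d_min. Friedel's law halves this count
    and each point-group rotation operation divides it further.
    """
    n_total = (4.0 / 3.0) * math.pi * cell_volume_A3 / d_min_A ** 3
    return n_total / (2.0 * n_rotations)


def obs_to_param_ratio(cell_volume_A3, d_min_A, n_atoms_asu,
                       n_rotations=1, params_per_atom=4):
    """Ratio of unique reflections to refinable atomic parameters."""
    n_obs = unique_reflections(cell_volume_A3, d_min_A, n_rotations)
    return n_obs / (params_per_atom * n_atoms_asu)


if __name__ == "__main__":
    # Hypothetical crystal: 400 000 A^3 cell, point group 222 (4 rotations),
    # 3000 non-H atoms in the asymmetric unit.
    for d_min in (2.0, 3.5, 4.0):
        ratio = obs_to_param_ratio(400_000.0, d_min, 3000, n_rotations=4)
        print(f"d_min = {d_min:.1f} A: observations/parameters = {ratio:.2f}")
```

Because the count scales as 1/d³, doubling the resolution limit from 2 to 4 Å reduces the number of observations eightfold; for the hypothetical crystal above the ratio falls from above one to well below one.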

Figure 2
Features of high- and low-resolution electron-density maps. (a) A section of the high-resolution structure of XRCC4 solved at 2.3 Å resolution. (b) The equivalent section to (a) from the low-resolution structure of Lif1 solved at 3.9 Å resolution. Note the loss of side-chain and main-chain features. (c) A section of β-sheet from the 3.9 Å resolution structure of Lig4. The region of β-sheet is shown as black sticks, while the remaining trace is depicted as black lines. In all three the maps are calculated with 2Fo − Fc coefficients and thus may have some model bias.

Faced with such problems, the crystallographer has difficulty in tracing the peptide main chain, with ambiguities in direction and in the number of residues that make up sections of the structure. For example, the conformations of residues that cap a helix are often unclear as the helix unwinds into a turn region. In addition, it is also difficult to ascertain the number of residues that then make up the loop. Once a main-chain trace (or part trace) has been constructed, the assignment of residue type, through the placement of its side chain, is also taxing. Registry errors can easily occur as density for long side chains can be curtailed owing to side-chain disorder, resulting in the assignment of a residue with a shorter side chain instead. Further problems can arise with bulkier side chains, for example distinguishing between phenylalanine and tyrosine, where the lack of atomicity can result in the tyrosine hydroxy group being indiscernible. Assignment can also be hampered by the fragmentation of the main-chain trace, such that even if a section can be correctly assigned it cannot be continued for the entire structure. The more discontinuity in the main-chain trace, the more intractable the sequence assignment becomes.

3. Classical and current strategies for model building into low-resolution electron density

The issues of handling low-resolution electron density have been present since the first protein structures were experimentally determined. The early structure of myoglobin by Kendrew and coworkers in 1958 was determined at 6 Å resolution (and was subsequently extended to 2 Å). It is interesting to note that although the structure was solved at 6 Å resolution the crystal actually diffracted to a much higher resolution, but owing to problems in processing all the reflections the resolution was cut back (Kendrew et al., 1958[Kendrew, J. C., Bodo, G., Dintzis, H. M., Parrish, R. G., Wyckoff, H. & Phillips, D. C. (1958). Nature (London), 181, 662-666.]). The model was built as a tube connecting continuous peaks in electron density plotted on stacked sheets of glass (Kendrew, 1958[Kendrew, J. C. (1958). Nature (London), 182, 764-767.]). There were ambiguities in the trace, with at least two ways of tracing the molecule, which were not resolved until the 2 Å resolution model was built using the now-famous Kendrew wire models. Yet the 6 Å resolution model provided a rich resource of new insights into protein structure. Similar problems were encountered in the structure of haemoglobin solved by Perutz and coworkers at 5.5 Å resolution a short while after. The model was constructed by cutting, from a sheet of plastic, the shape of each peak above a certain cutoff and then assembling the pieces according to their positions in the different sections (Muirhead & Perutz, 1963[Muirhead, H. & Perutz, M. F. (1963). Nature (London), 199, 633-638.]).

Unlike these early model-building methods that relied just on the electron density, today there is a large knowledge base for macromolecular structure from macromolecular studies at high resolution as well as detailed analysis of small peptides. Often, there are fragments or domains of the structure under study, or at least a structural homologue, that have been solved at high resolution. These fragments can be used to perform molecular replacement. If these sections are too small or structurally dissimilar to be used for phasing, they can be manipulated manually within a graphics-based modelling program [such as O (Jones et al., 1991[Jones, T. A., Zou, J.-Y., Cowan, S. W. & Kjeldgaard, M. (1991). Acta Cryst. A47, 110-119.]) or Coot (Emsley & Cowtan, 2004[Emsley, P. & Cowtan, K. (2004). Acta Cryst. D60, 2126-2132.])] to obtain an approximate orientation of the section in the electron density. If fragments of previously determined structures are unavailable, sections of idealized secondary-structure fragments can be used. This can be particularly useful for α-helices as even at low resolution the groove of the helix can be discerned in the electron density. These fragments can then be connected by manually extending the main-chain trace using tools such as the `baton' tool in Coot, which has a defined residue length to place Cα atoms at appropriate distances apart. This is often performed in conjunction with an alignment of the sequence of the structure under investigation with structural homologues (supplemented with other homologous sequences to improve the alignment accuracy). From the sequence/structure alignment, secondary-structure elements can be inferred and the number of residues separating them can be estimated.
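The idea of a fixed-length baton can be illustrated with a few lines of vector arithmetic. This is a minimal sketch and not Coot's implementation: it assumes the commonly quoted spacing of about 3.8 Å between consecutive Cα atoms linked by a trans peptide, and the function name and the way the direction is chosen (towards a user-picked point such as a density peak) are hypothetical.

```python
import numpy as np

CA_CA_DISTANCE = 3.8  # Å; typical spacing of consecutive Cα atoms (assumed value)


def place_next_ca(current_ca, target_point, length=CA_CA_DISTANCE):
    """Place the next Cα at a fixed distance from the current one.

    The new atom lies on the line from current_ca towards target_point
    (for example a nearby density peak), exactly `length` Å away.
    """
    current_ca = np.asarray(current_ca, dtype=float)
    direction = np.asarray(target_point, dtype=float) - current_ca
    norm = np.linalg.norm(direction)
    if norm == 0.0:
        raise ValueError("target_point must differ from current_ca")
    return current_ca + length * direction / norm


if __name__ == "__main__":
    ca = [10.0, 5.0, 2.0]
    peak = [14.5, 6.0, 2.5]  # hypothetical density peak picked by the user
    print(place_next_ca(ca, peak))
```

Fixing the length means that only the direction has to be chosen at each step, which is what makes manual extension through weak density practical.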

The above general strategy was broadly employed in solving the structure of plant photosystem I at 4.4 Å resolution. The crystallographers utilized the Cα backbone of a subsection of a previously solved homologue, the cyanobacterial reaction centre, which was manually located in the electron density. This provided a core to which modifications (residue additions/deletions) could be made based on clear parts of the map and in combination with a sequence alignment. Idealized helices were also placed and manually modified to improve their fit to the map. Further subunits were assigned based on biochemical and other biophysical data, although the entire model could only be represented as a backbone trace (Ben-Shem et al., 2003[Ben-Shem, A., Frolow, F. & Nelson, N. (2003). Nature (London), 426, 630-635.]). A similar backbone-only trace was generated for the first structural model of the bovine mitochondrial F1-ATPase (at 6.5 Å resolution) using a tracing program (`skeletonization' in O, which reduces the electron density to idealized thin lines following the long polypeptide chains preserving the connectivity of the structure; Greer, 1985[Greer, J. (1985). Methods Enzymol. 115, 206-224.]). At this resolution the automated methods make many misinterpretations; thus, the trace was manually edited to exclude all atoms placed outside the density (Abrahams et al., 1993[Abrahams, J. P., Lutter, R., Todd, R. J., van Raaij, M. J., Leslie, A. G. W. & Walker, J. E. (1993). EMBO J. 12, 1775-1780.]). This structure was subsequently solved at 2.7 Å resolution (Abrahams et al., 1994[Abrahams, J. P., Leslie, A. G. W., Lutter, R. & Walker, J. E. (1994). Nature (London), 370, 621-628.]). A stripped-down polyglycine version of this higher resolution model was used as a molecular-replacement probe to determine the structure of the Escherichia coli F1-ATPase at 4.4 Å resolution. Further manual modelling to account for differences between the search model and the electron density, including extending into new regions, was conducted. Side-chain modelling was not possible and the model was deposited as a polyglycine model (Hausrath et al., 1999[Hausrath, A. C., Gruber, G., Matthews, B. W. & Capaldi, R. A. (1999). Proc. Natl Acad. Sci. USA, 96, 13697-13702.]).

While tracing the main chain can be challenging, the modelling of side chains can be even more problematic. A Cα trace may be all that can be confidently modelled, unless there are clear features in the electron density that can be used as points to begin to assign sequence. Features can include large `blobs' that can be attributed to a large side chain, most commonly tryptophan, combined with topological features seen in related structures, which might indicate relationships to other secondary structures. In addition, unusual topological features produced by sequence motifs can also aid in assigning sequence. Most useful are peaks in the density from heavy atoms used in phasing from MAD, SAD or MIRAS experiments. Sequence can also be attributed by extension from a fragment of a high-resolution structure if one has been docked or used for molecular replacement. Other modifications present in the structure, such as glycosylation sites or disulfide bridges, are also invaluable in acting as sequence-anchor points. Often, combinations of features are required to assign sequence effectively.

In the case of the 30S subunit (solved at 5.5 Å resolution), seven high-resolution structures were placed manually using both visual interpretation and other extensive experimental data including a neutron map of the centres of mass, footprinting studies and accumulated biochemical data. Undetermined substructures have also been resolved on the basis of helical secondary-structure predictions from biochemical and neutron scattering data as well as one section based on a secondary-structure prediction from the sequence (Clemons et al., 1999[Clemons, W. M. Jr, May, J. L., Wimberly, B. T., McCutcheon, J. P., Capel, M. S. & Ramakrishnan, V. (1999). Nature (London), 400, 833-840.]). In determining the 50S subunit (solved at 5.0 Å resolution), in addition to the placement of previously solved fragments, sections were identified using template fragments placed using ESSENS (Kleywegt & Jones, 1997a[Kleywegt, G. J. & Jones, T. A. (1997a). Acta Cryst. D53, 179-185.]) and some unusual shapes, e.g. the sarcin–ricin loop with a distinctive S shape. This became a marker to orientate other sections, such as the L6f region (Ban et al., 1999[Ban, N., Nissen, P., Hansen, J., Capel, M., Moore, P. B. & Steitz, T. A. (1999). Nature (London), 400, 841-847.]). In determining the entire 70S ribosome at 7.8 Å resolution, similar methods were used to generate an all-atom model: molecular replacement, in this case using a pseudo-atom model from an EM single-particle reconstruction, was combined with inferences from biochemical data and with knowledge of the 30S and 50S components (Cate, 2001[Cate, J. H. (2001). Methods, 25, 303-308.]; Cate et al., 1999[Cate, J. H., Yusupov, M. M., Yusupova, G. Z., Earnest, T. N. & Noller, H. F. (1999). Science, 285, 2095-2104.]).

In determining the structure of the fully glycosylated SIV gp120 envelope glycoprotein, the techniques described above were used to extend the trace manually from a polyalanine-backbone model derived from a high-resolution homologue. In addition, side-chain assignment was aided by negative B-factor sharpening as well as by using the heavy-atom selenium sites and glycosylation sites in conjunction with alignment to the HIV model (Chen et al., 2005[Chen, B., Vogan, E. M., Gong, H., Skehel, J. J., Wiley, D. C. & Harrison, S. C. (2005). Structure, 13, 197-211.]). In addition to using high-resolution substructures as molecular-replacement probes, it has also been possible to use homology models of subunits. This method was successfully used in determining the structure of human factor VIII at 3.98 Å resolution (Ngo et al., 2008[Ngo, J. C., Huang, M., Roth, D. A., Furie, B. C. & Furie, B. (2008). Structure, 16, 597-606.]). The presence of side chains in both the previously solved high-resolution subunits and the homology model allowed manual refitting and modelling to generate a complete model that could be effectively refined.
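Negative B-factor sharpening, mentioned above, amounts to rescaling the structure-factor amplitudes so that the higher resolution terms are up-weighted before the map is calculated. The following is a minimal sketch, assuming the usual isotropic form exp(−B/4d²); the function name and the value of −50 Å² are illustrative assumptions and are not taken from the cited work.

```python
import numpy as np


def sharpen_amplitudes(amplitudes, d_spacings, b_sharp=-50.0):
    """Apply an isotropic sharpening B factor to structure-factor amplitudes.

    Each amplitude is multiplied by exp(-b_sharp / (4 d^2)), where d is the
    resolution of the reflection in Å. A negative b_sharp boosts reflections
    with small d (high resolution) relative to those with large d.
    """
    amplitudes = np.asarray(amplitudes, dtype=float)
    d = np.asarray(d_spacings, dtype=float)
    return amplitudes * np.exp(-b_sharp / (4.0 * d ** 2))


if __name__ == "__main__":
    f_obs = np.array([100.0, 100.0, 100.0])  # hypothetical amplitudes
    d = np.array([10.0, 6.0, 4.0])           # resolution of each reflection (Å)
    print(sharpen_amplitudes(f_obs, d))
```

The sharpening also amplifies noise in the weak high-resolution terms, so in practice the value of B is a compromise that is usually judged by inspecting the resulting map.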

An existing high-resolution or homology model is not always required as a starting point. It is possible to use automated model-building software even at low resolution to generate backbone fragments which can then be used by other automated software to extend and assign the sequence. This was achieved for the structure of human 5-lipoxygenase-activating protein at 4.0 Å resolution, which used a combination of the helix-building module of the ARP/wARP package (Langer et al., 2008[Langer, G., Cohen, S. X., Lamzin, V. S. & Perrakis, A. (2008). Nature Protoc. 3, 1171-1179.]) and MIFit (Ferguson et al., 2007[Ferguson, A. D., McKeever, B. M., Xu, S., Wisniewski, D., Miller, D. K., Yamin, T. T., Spencer, R. H., Chu, L., Ujjainwalla, F., Cunningham, B. R., Evans, J. F. & Becker, J. W. (2007). Science, 317, 510-512.]). The sequence assignment was greatly aided by using the 18 selenomethionine sites and six bound inhibitor molecules as markers.

Sometimes the structural differences between existing high-resolution structures that could potentially be used as either molecular-replacement probes or as manually fitted subunits are too great to be used directly. This was the case for the cocrystal structure Lif1–Lig4 (Dore et al., 2006[Dore, A. S., Furnham, N., Davies, O. R., Sibanda, B. L., Chirgadze, D. Y., Jackson, S. P., Pellegrini, L. & Blundell, T. L. (2006). DNA Repair, 5, 362-368.]), which had a number of significant topology changes and low sequence similarity to its high-resolution homologue XRCC4. To overcome these issues, the structure was traced manually using general topology, alignments and structure prediction gained from knowledge of the homologue. With the lack of side-chain placement from positioned high-resolution subunits, sequence assignment is much more problematic. Key features in the electron density were identified, including a glycine–proline–proline 90° turn, a tryptophan located in the middle of a helix supporting a three-stranded sheet identified as a feature from a structural homologue and residues involved in the interaction between two subunits; these all acted as starting points for sequence assignment. In this example, side-chain modelling was achieved using the semi-automated real-space search algorithm RAPPER (Furnham, Dore et al., 2006[Furnham, N., Doré, A. S., Chirgadze, D. Y., de Bakker, P. I., DePristo, M. A. & Blundell, T. L. (2006). Structure, 14, 1313-1320.]), allowing a number of alternative main-chain/side-chain placements to be explored.

From the different strategies described above, the following action plan for building an atomic model can be employed when presented with low-resolution X-ray crystallographic data. Firstly, all available structural information (previously solved fragments and homologous structures) as well as structurally related information (including theoretical and biochemical data) should be collated. If related structures exist, they should be located either through use as molecular-replacement (MR) probes, for example using Phaser (McCoy et al., 2005[McCoy, A. J., Grosse-Kunstleve, R. W., Storoni, L. C. & Read, R. J. (2005). Acta Cryst. D61, 458-464.]), or, if experimental phases have been estimated, by manual placement in the electron density using graphical software packages such as Coot. Sections of secondary structure can be located in the electron density using fragment libraries and search tools in programs such as Buccaneer, ARP/wARP and PHENIX (Terwilliger et al., 2008[Terwilliger, T. C., Grosse-Kunstleve, R. W., Afonine, P. V., Moriarty, N. W., Zwart, P. H., Hung, L.-W., Read, R. J. & Adams, P. D. (2008). Acta Cryst. D64, 61-69.]). Sections of model can then be connected using automated approaches such as RAPPER for smaller loops or by manual extension in Coot or O. Model building can be informed by secondary-structure predictions based on sequence and also by locating amino acids in the sequence from biochemical data and by binding of heavy metals used in experimental phasing. Sequence placement can also benefit from the association of a particular motif with a usual structural feature. Rebuilding using automated methods such as RAPPER or real-space refinement methods as implemented in Coot can be used to improve the model further in conjunction with rounds of careful refinement. An overview of this general strategy is shown in Fig. 3. What is evident from the strategies employed in the past is that an inventive combination of approaches is frequently required in order to interpret the experimental data successfully.

Figure 3
Schematic of a general strategy for low-resolution X-ray crystallographic model generation.

4. Interpreting high-resolution electron microscopy electron-density maps

Although X-ray crystallography and NMR spectroscopy remain the methods of choice for studying biomolecular structures to atomic detail, electron microscopy (EM) serves as a complementary tool to study large complexes and macromolecular machines that are difficult to crystallize and are beyond the size threshold for NMR spectroscopy.

Structural interpretation of EM maps generally involves fitting a high-resolution X-ray/NMR structure or a homology model into the map. The model is first docked using programs that perform a global rigid-body search (Volkmann & Hanein, 1999[Volkmann, N. & Hanein, D. (1999). J. Struct. Biol. 125, 176-184.]; Wriggers & Birmanns, 2001[Wriggers, W. & Birmanns, S. (2001). J. Struct. Biol. 133, 193-202.]; Wriggers et al., 1999[Wriggers, W., Milligan, R. A. & McCammon, J. A. (1999). J. Struct. Biol. 125, 185-195.]). The initial fit can then be refined by limiting flexibility to between domains or in connecting loop regions in order to prevent overfitting at low resolution (Chen et al., 2001[Chen, L. F., Blanc, E., Chapman, M. S. & Taylor, K. A. (2001). J. Struct. Biol. 133, 221-232.], 2003[Chen, J. Z., Furst, J., Chapman, M. S. & Grigorieff, N. (2003). J. Struct. Biol. 144, 144-151.]; Gao et al., 2003[Gao, H., Sengupta, J., Valle, M., Korostelev, A., Eswar, N., Stagg, S. M., Van Roey, P., Agrawal, R. K., Harvey, S. C., Sali, A., Chapman, M. S. & Frank, J. (2003). Cell, 113, 789-801.]; Topf et al., 2008[Topf, M., Lasker, K., Webb, B., Wolfson, H., Chiu, W. & Sali, A. (2008). Structure, 16, 295-307.]). Normal-mode-based methods avoid the need to arbitrarily assign rigid and flexible regions by allowing shifts along the low-frequency normal modes of the molecule (Tama et al., 2004[Tama, F., Miyashita, O. & Brooks, C. L. III (2004). J. Mol. Biol. 337, 985-999.]). Unrealistic distortions are avoided by iterating the procedure and the structure is gradually optimized to fit the density map.

Recent advances in sample preparation and data handling (for a detailed review, see Zhou, 2008[Zhou, Z. H. (2008). Curr. Opin. Struct. Biol. 18, 218-228.]) have led to the resolution obtainable by EM reaching near-atomic levels. This has prompted the development of new methods that combine traditional model-building techniques taken from crystallography with new pattern-recognition algorithms (Kong et al., 2004[Kong, Y., Zhang, X., Baker, T. S. & Ma, J. (2004). J. Mol. Biol. 339, 117-130.]) suited to sub-nanometre resolution maps. At resolutions of between 4 and 8 Å helices can be identified by their characteristic cylindrical shaped density and β-sheets appear as flat continuous regions of density, although individual strands may not be identified. At resolutions closer to 4 Å, density for bulky side chains may be seen (Chiu et al., 2005[Chiu, W., Baker, M. L., Jiang, W., Dougherty, M. & Schmid, M. F. (2005). Structure, 13, 363-372.]). Visual inspection can be used to identify these features; however, the interpretation can be subjective (Chiu et al., 2002[Chiu, W., Baker, M. L., Jiang, W. & Zhou, Z. H. (2002). Curr. Opin. Struct. Biol. 12, 263-269.]). Automation has been achieved using SSEHunter (Baker et al., 2007[Baker, M. L., Ju, T. & Chiu, W. (2007). Structure, 15, 7-19.]), a feature-extraction program that identifies secondary-structure elements in maps of up to 10 Å resolution. The map is first quantized by designating pseudo-atoms that correspond to regions of high density and then traced using a thinning and pruning algorithm. The skeleton outline gives a simplified geometric representation of the map in which cylindrical shaped density characteristic of helices is represented as a curve and plate-shaped density corresponding to β-sheets is depicted as a surface. α-Helices are identified using a cross-correlation-based exhaustive search between the map and the density of a prototypical helix. The pseudo-atoms are given a combined weighted score based on the skeletal features observed, their relative distance to a high-density voxel in the helix-correlation map, the number and relative geometric positions of neighbouring pseudo-atoms and the aspect ratio of the local density region. Depending upon the score, the pseudo-atoms can then be interactively grouped to represent helices and sheets or an automated procedure can be used.
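The cross-correlation search for helices can be illustrated with a deliberately simplified sketch. It is not the SSEHunter implementation: only translations of the template over the map grid are sampled (a real search would also sample orientations), the score is a plain normalized correlation coefficient, and all function names are hypothetical.

```python
import numpy as np


def normalized_cc(a, b):
    """Correlation coefficient of two equally shaped density blocks."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return 0.0 if denom == 0.0 else float((a * b).sum() / denom)


def translation_search(density_map, template):
    """Slide the template over every grid offset and keep the best score.

    Returns (best_score, (z, y, x)), the offset of the best-matching window.
    """
    mz, my, mx = density_map.shape
    tz, ty, tx = template.shape
    best_score, best_offset = -1.0, None
    for z in range(mz - tz + 1):
        for y in range(my - ty + 1):
            for x in range(mx - tx + 1):
                window = density_map[z:z + tz, y:y + ty, x:x + tx]
                score = normalized_cc(window, template)
                if score > best_score:
                    best_score, best_offset = score, (z, y, x)
    return best_score, best_offset


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    template = rng.random((3, 3, 3))      # stand-in for prototypical helix density
    grid = np.zeros((8, 8, 8))
    grid[2:5, 3:6, 1:4] = template        # bury one copy in an empty map
    print(translation_search(grid, template))
```

Storing the score at every offset, rather than only the best one, would give a correlation map of the kind that the pseudo-atom scoring described above refers to.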

Once the positions and orientations of secondary-structure elements have been identified, a prototypical helix/strand can be fitted. Although the skeleton can be used as a guide to establish the connections between the secondary structures, branches can occur in regions of ambiguity. Ludtke and coworkers have shown that consensus secondary-structure prediction can be used to assign each Cα atom in the helix by mapping the sequences of the predicted helices onto the helices identified in the map based on their lengths and relative position. Connectivity can then be established based on the sequence and the surrounding density. This approach has been successfully used to build a Cα trace for the major capsid protein gp7 of epsilon15 virus at 4.5 Å resolution (Jiang et al., 2008[Jiang, W., Baker, M. L., Jakana, J., Weigele, P. R., King, J. & Chiu, W. (2008). Nature (London), 451, 1130-1134.]) and GroEL at 4 Å resolution (Ludtke et al., 2008[Ludtke, S. J., Baker, M. L., Chen, D. H., Song, J. L., Chuang, D. T. & Chiu, W. (2008). Structure, 16, 441-448.]). β-Sheets pose a more challenging problem for model building. Although SSEHunter can determine position and orientation, differentiating the number and direction of individual strands is difficult. The manually placed Cα atoms can then be refined to optimize the fit to the density and idealize hydrogen bonds and dihedral angles within helices and sheets. Although these methods have been developed for EM maps, they could equally be applied to low-resolution X-ray crystallography maps.

Often, some of the helices can be identified and structure-matching programs such as COSEC (Mizuguchi & Go, 1995[Mizuguchi, K. & Go, N. (1995). Protein Eng. 8, 353-362.]; Kinoshita et al., 1999[Kinoshita, K., Kidera, A. & Go, N. (1999). Protein Sci. 8, 1210-1217.]) and DejaVu (Kleywegt & Jones, 1997b[Kleywegt, G. J. & Jones, T. A. (1997b). Methods Enzymol. 277, 525-545.]) can be used to probe a library of PDB structures to identify possible homologues based on the relative position and orientation of the helices (Jiang et al., 2001[Jiang, W., Baker, M. L., Ludtke, S. J. & Chiu, W. (2001). J. Mol. Biol. 308, 1033-1044.]). Such partial structure-based fold recognition enables homologues to be identified that may have low sequence similarity but share a similar fold. The homologous structure can then be docked and flexibly refined into the map or can aid the model-building process by fitting fragments from structurally/sequentially conserved regions and help establish topology in regions in which loops appear disordered.

5. EM resolution and X-ray crystallographic resolution

The method of determining the resolution of an EM map is dependent on the nature of the sample imaged (Chiu et al., 2005[Chiu, W., Baker, M. L., Jiang, W., Dougherty, M. & Schmid, M. F. (2005). Structure, 13, 363-372.]). For two- and three-dimensional crystalline samples and filaments with helical symmetry, resolution relates to the highest peak that can be resolved in diffraction space. For single-particle cryo-EM the data are divided into two sets from which independent reconstructions are calculated and compared at various frequency shells in Fourier space (Leschziner & Nogales, 2007[Leschziner, A. E. & Nogales, E. (2007). Annu. Rev. Biophys. Biomol. Struct. 36, 43-62.]). Resolution can then be evaluated based on the Fourier shell correlation (FSC; Harauz & Van Heel, 1986[Harauz, G. & van Heel, M. (1986). Optik, 73, 146-156.]) or the spectral signal-to-noise ratio criterion (Unser et al., 2005[Unser, M., Sorzano, C. O., Thevenaz, P., Jonic, S., El-Bez, C., De Carlo, S., Conway, J. F. & Trus, B. L. (2005). J. Struct. Biol. 149, 243-255.]). The more commonly used FSC method ascertains the resolution as the frequency interval at which the two reconstructions have a normalized correlation coefficient equal to a certain threshold,

\[{\rm FSC} = {{\textstyle \sum F_{1}F_{2}^{*}}\over{\left(\textstyle \sum |F_{1}|^{2}\textstyle \sum |F_{2}|^{2}\right)^{1/2}}},\]

where F1 and F2 are the complex structure factors of the two reconstructions and the sum is over all Fourier space voxels contained in a resolution shell.
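As an illustration, the shell-by-shell calculation can be written in a few lines with NumPy. This is a minimal sketch rather than the code of any particular EM package: shells are taken to be one Fourier pixel wide, the squared terms in the denominator are evaluated as squared moduli because the structure factors are complex, and the function name is hypothetical.

```python
import numpy as np


def fourier_shell_correlation(map1, map2):
    """FSC between two equally sized 3D maps, one value per Fourier shell.

    Shell s collects all Fourier voxels whose distance from the origin
    rounds to s (in Fourier pixels), out to half the smallest box edge.
    """
    if map1.shape != map2.shape:
        raise ValueError("maps must have the same shape")
    f1 = np.fft.fftn(map1)
    f2 = np.fft.fftn(map2)
    # Integer frequency index along each axis: 0, 1, ..., -2, -1
    axes = [np.fft.fftfreq(n) * n for n in map1.shape]
    grids = np.meshgrid(*axes, indexing="ij")
    shell = np.rint(np.sqrt(sum(g ** 2 for g in grids))).astype(int)
    n_shells = min(map1.shape) // 2 + 1
    fsc = np.zeros(n_shells)
    for s in range(n_shells):
        mask = shell == s
        # For real maps the sum over a full shell is real (Friedel pairs)
        num = np.real(np.sum(f1[mask] * np.conj(f2[mask])))
        den = np.sqrt(np.sum(np.abs(f1[mask]) ** 2) *
                      np.sum(np.abs(f2[mask]) ** 2))
        fsc[s] = num / den if den > 0 else 0.0
    return fsc


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    signal = rng.normal(size=(16, 16, 16))
    half1 = signal + 0.5 * rng.normal(size=signal.shape)  # two noisy "half maps"
    half2 = signal + 0.5 * rng.normal(size=signal.shape)
    print(np.round(fourier_shell_correlation(half1, half2), 2))
```

In practice the two input maps would be the independent reconstructions from the two halves of the data set, usually after masking.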

Generally, a cutoff of 0.5 is used, although other cutoff criteria have been proposed (Rosenthal & Henderson, 2003[Rosenthal, P. B. & Henderson, R. (2003). J. Mol. Biol. 333, 721-745.]). It has been argued that the use of a fixed-value threshold cannot give a reproducible resolution value. Instead, threshold curves such as the σ-factor curve are more representative and give a more conservative estimate of the resolution (Saxton & Baumeister, 1982[Saxton, W. O. & Baumeister, W. (1982). J. Microsc. 127, 127-138.]); the 1/2 bit-information threshold curve is calibrated to give a resolution value comparable to the resolution value calculated in X-ray crystallography (van Heel & Schatz, 2005[Heel, M. van & Schatz, M. (2005). J. Struct. Biol. 151, 250-262.]).
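Reading a resolution off an FSC curve with a fixed cutoff can be sketched as follows. This is an illustrative sketch only: it assumes that shell k corresponds to a spatial frequency of k/(box size × voxel size), uses linear interpolation between neighbouring shells, and the function name and example curve are hypothetical. A threshold curve such as the 1/2-bit criterion would replace the constant by a value that varies from shell to shell.

```python
import numpy as np


def resolution_at_threshold(fsc, voxel_size_A, box_size, threshold=0.5):
    """Resolution (Å) at which the FSC curve first drops below a fixed cutoff.

    fsc[k] is the correlation in shell k, with spatial frequency
    k / (box_size * voxel_size_A) in 1/Å. Returns None if the curve never
    crosses the threshold or the crossing falls at zero frequency.
    """
    fsc = np.asarray(fsc, dtype=float)
    below = np.nonzero(fsc < threshold)[0]
    if below.size == 0 or below[0] == 0:
        return None
    k = below[0]
    # Linear interpolation between the last shell above and first shell below
    frac = (fsc[k - 1] - threshold) / (fsc[k - 1] - fsc[k])
    shell = (k - 1) + frac
    if shell == 0.0:
        return None
    return (box_size * voxel_size_A) / shell


if __name__ == "__main__":
    curve = [1.0, 0.9, 0.7, 0.3, 0.1]   # hypothetical FSC values per shell
    print(resolution_at_threshold(curve, voxel_size_A=2.0, box_size=20))
```

Because the reported value depends on the cutoff chosen, the same pair of reconstructions yields different resolution figures under different criteria.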

Since the definition of the resolution of an EM map is variable in terms of the criterion chosen to evaluate it, it is thus important that the resolution be validated with the structural details that can be discerned in the map. One might expect that an X-ray map would be more detailed than an EM map at the same resolution; however, a comparison (Cate et al., 1999[Cate, J. H., Yusupov, M. M., Yusupova, G. Z., Earnest, T. N. & Noller, H. F. (1999). Science, 285, 2095-2104.]) of the X-ray map of Thermus thermophilus 70S ribosomal complex obtained at 7.8 Å resolution with the cryo-EM map of the E. coli 50S subunit at 7.5 Å resolution (Matadeen et al., 1999[Matadeen, R., Patwardhan, A., Gowen, B., Orlova, E. V., Pape, T., Cuff, M., Mueller, F., Brimacombe, R. & van Heel, M. (1999). Structure, 7, 1575-1583.]) showed that the visual details observed in the latter were slightly better (van Heel, 2000[Heel, M. van (2000). Curr. Opin. Struct. Biol. 10, 259-264.]).

6. Towards automated strategies for model building into low-resolution X-ray data

The latest developments in automated model-building techniques, which allow more indistinct descriptions of the fragments/residues that are used as the basic search models, have extended the resolution at which these programs can effectively generate at least a partial model. These programs include Buccaneer (Cowtan, 2006[Cowtan, K. (2006). Acta Cryst. D62, 1002-1011.]), SOLVE/RESOLVE (Terwilliger, 2003[Terwilliger, T. C. (2003). Methods Enzymol. 374, 22-37.]) and the secondary-structure recognition package of ARP/wARP (Langer et al., 2008[Langer, G., Cohen, S. X., Lamzin, V. S. & Perrakis, A. (2008). Nature Protoc. 3, 1171-1179.]). Other methods, such as RAPPER, that combine prior knowledge about a structure, such as secondary structure and sequence, with the experimental data are beginning to emerge for low-resolution X-ray crystallography (Furnham, Dore et al., 2006[Furnham, N., Doré, A. S., Chirgadze, D. Y., de Bakker, P. I., DePristo, M. A. & Blundell, T. L. (2006). Structure, 14, 1313-1320.]). This permits hypotheses and weak assumptions about a structure to be tested. As more automated approaches emerge, it becomes increasingly possible to generate multiple models representing the data. This permits the exploration of the conformational space represented in the data, providing both a measure of the uncertainty in the interpretation of the electron density and the temporal and spatial heterogeneity of the structure (Furnham, Blundell et al., 2006[Furnham, N., Blundell, T. L., DePristo, M. A. & Terwilliger, T. C. (2006). Nature Struct. Mol. Biol. 13, 184-185.]).

Developments in low-resolution X-ray crystallographic structure modelling are beginning to mirror some of the recent developments in model generation for EM. As attempts are made to understand the mechanisms of complete molecular machines, there is a need to integrate more diverse structural data. In the determination of the architecture of the nuclear pore complex, spatial restraints derived from EM maps together with proteomics experiments and biophysical studies such as ultracentrifugation and affinity purification were used to restrain relative positions of individual protein components in a molecular simulation to generate ensembles of possible architectures for the complex (Alber et al., 2007[Alber, F., Dokudovskaya, S., Veenhoff, L. M., Zhang, W., Kipper, J., Devos, D., Suprapto, A., Karni-Schmidt, O., Williams, R., Chait, B. T., Rout, M. P. & Sali, A. (2007). Nature (London), 450, 683-694.]). As many of the same challenges in providing high-confidence models of macromolecular structures and assemblies are shared by both EM and low-resolution X-ray crystallography, it is likely that many of the methods will be combined and new strategies developed to provide more automated techniques for model construction.

Footnotes

Current address: EMBL Outstation Hinxton, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, England.

Acknowledgements

NF is currently supported by a Wellcome Trust project grant. AMK is funded by the Cambridge Commonwealth Trust.

References

Abrahams, J. P., Leslie, A. G. W., Lutter, R. & Walker, J. E. (1994). Nature (London), 370, 621–628.
Abrahams, J. P., Lutter, R., Todd, R. J., van Raaij, M. J., Leslie, A. G. W. & Walker, J. E. (1993). EMBO J. 12, 1775–1780.
Alber, F., Dokudovskaya, S., Veenhoff, L. M., Zhang, W., Kipper, J., Devos, D., Suprapto, A., Karni-Schmidt, O., Williams, R., Chait, B. T., Rout, M. P. & Sali, A. (2007). Nature (London), 450, 683–694.
Baker, M. L., Ju, T. & Chiu, W. (2007). Structure, 15, 7–19.
Ban, N., Nissen, P., Hansen, J., Capel, M., Moore, P. B. & Steitz, T. A. (1999). Nature (London), 400, 841–847.
Ben-Shem, A., Frolow, F. & Nelson, N. (2003). Nature (London), 426, 630–635.
Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). Nucleic Acids Res. 28, 235–242.
Brunger, A. T. (2005). Structure, 13, 171–172.
Cate, J. H. (2001). Methods, 25, 303–308.
Cate, J. H., Yusupov, M. M., Yusupova, G. Z., Earnest, T. N. & Noller, H. F. (1999). Science, 285, 2095–2104.
Chen, B., Vogan, E. M., Gong, H., Skehel, J. J., Wiley, D. C. & Harrison, S. C. (2005). Structure, 13, 197–211.
Chen, J. Z., Furst, J., Chapman, M. S. & Grigorieff, N. (2003). J. Struct. Biol. 144, 144–151.
Chen, L. F., Blanc, E., Chapman, M. S. & Taylor, K. A. (2001). J. Struct. Biol. 133, 221–232.
Chiu, W., Baker, M. L., Jiang, W., Dougherty, M. & Schmid, M. F. (2005). Structure, 13, 363–372.
Chiu, W., Baker, M. L., Jiang, W. & Zhou, Z. H. (2002). Curr. Opin. Struct. Biol. 12, 263–269.
Clemons, W. M. Jr, May, J. L., Wimberly, B. T., McCutcheon, J. P., Capel, M. S. & Ramakrishnan, V. (1999). Nature (London), 400, 833–840.
Cowtan, K. (2006). Acta Cryst. D62, 1002–1011.
DeLaBarre, B. & Brunger, A. T. (2006). Acta Cryst. D62, 923–932.
Dore, A. S., Furnham, N., Davies, O. R., Sibanda, B. L., Chirgadze, D. Y., Jackson, S. P., Pellegrini, L. & Blundell, T. L. (2006). DNA Repair, 5, 362–368.
Emsley, P. & Cowtan, K. (2004). Acta Cryst. D60, 2126–2132.
Ferguson, A. D., McKeever, B. M., Xu, S., Wisniewski, D., Miller, D. K., Yamin, T. T., Spencer, R. H., Chu, L., Ujjainwalla, F., Cunningham, B. R., Evans, J. F. & Becker, J. W. (2007). Science, 317, 510–512.
Furnham, N., Blundell, T. L., DePristo, M. A. & Terwilliger, T. C. (2006). Nature Struct. Mol. Biol. 13, 184–185.
Furnham, N., Doré, A. S., Chirgadze, D. Y., de Bakker, P. I., DePristo, M. A. & Blundell, T. L. (2006). Structure, 14, 1313–1320.
Gao, H., Sengupta, J., Valle, M., Korostelev, A., Eswar, N., Stagg, S. M., Van Roey, P., Agrawal, R. K., Harvey, S. C., Sali, A., Chapman, M. S. & Frank, J. (2003). Cell, 113, 789–801.
Greer, J. (1985). Methods Enzymol. 115, 206–224.
Harauz, G. & van Heel, M. (1986). Optik, 73, 146–156.
Hausrath, A. C., Gruber, G., Matthews, B. W. & Capaldi, R. A. (1999). Proc. Natl Acad. Sci. USA, 96, 13697–13702.
Heel, M. van (2000). Curr. Opin. Struct. Biol. 10, 259–264.
Heel, M. van & Schatz, M. (2005). J. Struct. Biol. 151, 250–262.
Jaeger, J., Restle, T. & Steitz, T. A. (1998). EMBO J. 17, 4535–4542.
Jiang, W., Baker, M. L., Jakana, J., Weigele, P. R., King, J. & Chiu, W. (2008). Nature (London), 451, 1130–1134.
Jiang, W., Baker, M. L., Ludtke, S. J. & Chiu, W. (2001). J. Mol. Biol. 308, 1033–1044.
Jones, T. A., Zou, J.-Y., Cowan, S. W. & Kjeldgaard, M. (1991). Acta Cryst. A47, 110–119.
Kendrew, J. C. (1958). Nature (London), 182, 764–767.
Kendrew, J. C., Bodo, G., Dintzis, H. M., Parrish, R. G., Wyckoff, H. & Phillips, D. C. (1958). Nature (London), 181, 662–666.
Kinoshita, K., Kidera, A. & Go, N. (1999). Protein Sci. 8, 1210–1217.
Kleywegt, G. J. & Jones, T. A. (1997a). Acta Cryst. D53, 179–185.
Kleywegt, G. J. & Jones, T. A. (1997b). Methods Enzymol. 277, 525–545.
Kong, Y., Zhang, X., Baker, T. S. & Ma, J. (2004). J. Mol. Biol. 339, 117–130.
Langer, G., Cohen, S. X., Lamzin, V. S. & Perrakis, A. (2008). Nature Protoc. 3, 1171–1179.
Leschziner, A. E. & Nogales, E. (2007). Annu. Rev. Biophys. Biomol. Struct. 36, 43–62.
Ludtke, S. J., Baker, M. L., Chen, D. H., Song, J. L., Chuang, D. T. & Chiu, W. (2008). Structure, 16, 441–448.
Matadeen, R., Patwardhan, A., Gowen, B., Orlova, E. V., Pape, T., Cuff, M., Mueller, F., Brimacombe, R. & van Heel, M. (1999). Structure, 7, 1575–1583.
McCoy, A. J., Grosse-Kunstleve, R. W., Storoni, L. C. & Read, R. J. (2005). Acta Cryst. D61, 458–464.
Mizuguchi, K. & Go, N. (1995). Protein Eng. 8, 353–362.
Muirhead, H. & Perutz, M. F. (1963). Nature (London), 199, 633–638.
Ngo, J. C., Huang, M., Roth, D. A., Furie, B. C. & Furie, B. (2008). Structure, 16, 597–606.
Rosenthal, P. B. & Henderson, R. (2003). J. Mol. Biol. 333, 721–745.
Saxton, W. O. & Baumeister, W. (1982). J. Microsc. 127, 127–138.
Tama, F., Miyashita, O. & Brooks, C. L. III (2004). J. Mol. Biol. 337, 985–999.
Terwilliger, T. C. (2003). Methods Enzymol. 374, 22–37.
Terwilliger, T. C., Grosse-Kunstleve, R. W., Afonine, P. V., Moriarty, N. W., Zwart, P. H., Hung, L.-W., Read, R. J. & Adams, P. D. (2008). Acta Cryst. D64, 61–69.
Topf, M., Lasker, K., Webb, B., Wolfson, H., Chiu, W. & Sali, A. (2008). Structure, 16, 295–307.
Unser, M., Sorzano, C. O., Thevenaz, P., Jonic, S., El-Bez, C., De Carlo, S., Conway, J. F. & Trus, B. L. (2005). J. Struct. Biol. 149, 243–255.
Volkmann, N. & Hanein, D. (1999). J. Struct. Biol. 125, 176–184.
Wriggers, W. & Birmanns, S. (2001). J. Struct. Biol. 133, 193–202.
Wriggers, W., Milligan, R. A. & McCammon, J. A. (1999). J. Struct. Biol. 125, 185–195.
Zhou, Z. H. (2008). Curr. Opin. Struct. Biol. 18, 218–228.

This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.
