Rapid model building of α-helices in electron-density maps

A method for rapid model building of α-helices at moderate resolution is presented.


Introduction
Building an atomic model is a key step in the interpretation of electron-density maps of macromolecules. Atomic models can be simple and readily visualized representations of the structures of macromolecules and are commonly used as the primary means of conveying structural information about a macromolecule.
Many methods have been developed for manual, semiautomatic and automatic interpretation of electron-density maps from macromolecules. Interactive methods include manual building of models into maps [e.g. O (Jones et al., 1991), MAIN (Turk, 1992), XtalView (McRee, 1999) and Coot (Emsley & Cowtan, 2004)] as well as on-demand local interpretation of maps in which the user specifies some information about the chain location or geometry and a model is automatically generated (Oldfield, 1994; Jones & Kjeldgaard, 1997; McRee, 1999). There are also a number of highly automated methods for the interpretation of maps of proteins. These include procedures for the identification of Cα-atom positions followed by the generation of complete polypeptide chains (Oldfield, 2002, 2003; Ioerger & Sacchettini, 2003; Cowtan, 2006), methods focusing on the identification of helical and extended structures followed by tracing loops and other structure (Levitt, 2001; Terwilliger, 2003), methods based on the identification of atomic positions and their interpretation in terms of a polypeptide chain (Perrakis et al., 1999), methods that use extensive conformational sampling (DePristo et al., 2005), probabilistic methods based on the recognition of density patterns in electron-density maps (DiMaio et al., 2007) and methods analyzing lower resolution density features in maps (Baker et al., 2007).
While these are powerful tools for the automated interpretation of electron-density maps representing structures of proteins, they typically take considerably longer to carry out than other initial steps in structure determination (heavy-atom location, phasing and density modification). Additionally, they all become progressively less effective as the resolution of the map decreases, although some progress has recently been made in this regard (DiMaio et al., 2007).
One approach for speeding up map interpretation and for broadening the resolution range over which accurate model building can be carried out is to identify and interpret features in the map that are as large as possible. In this way a substantial portion of a model can be generated all at once. Furthermore, provided that the features that are identified in this way are relatively uniform over many structures, these features can potentially be modelled accurately. The experience of many crystallographers has demonstrated that α-helices can readily be identified at low (5-8 Å) resolution (DeLaBarre & Brunger, 2006). At higher resolution, the O software has shown that the direction (and placement) of α-helices in a map can be accurately identified by averaging the electron density near several sequential Cα positions by applying a transformation corresponding to the relationship between sequential residues in an α-helix (Kleywegt & Jones, 1997). The key element in this approach is that the Cβ atoms in an α-helix point somewhat towards the N-terminus of the α-helix and this directionality of the side-chain density can be readily identified after averaging over several sequential residues in an α-helix.
Here, we combine these methods for α-helix identification and placement and use them to create a simple series of steps for automatic modeling of the α-helices in an electron-density map of a protein.

Modelling α-helices in an electron-density map
Our approach for modeling the α-helices in an electron-density map of a protein consists of three steps. These are as follows.
(i) Identification of α-helical density and modeling of α-helical axes and extent using maps with varying low-resolution cutoffs.
(ii) Determination of α-helix placement (direction, rotation about and translation along the α-helical axis) using the full available resolution.
(iii) Assembly of α-helices, elimination of overlaps and joining of adjacent segments.
The result of this process is a model of the α-helical portions of the structure that can be used as a starting point for further model building and map interpretation. These steps are described in detail below.

Identification of α-helical density and modeling of α-helical axes and extent using maps with varying low-resolution cutoffs
The first step in our process for modeling α-helices in the electron-density map of a protein is to identify the α-helices using a set of maps with low-resolution cutoffs from about 5 to 8 Å. While at high resolution an α-helix has a rather complicated pattern of density (Fig. 1a), at a resolution of 7 Å an α-helix appears as a tube of density (Fig. 1b), so that finding the α-helices can be quite straightforward.
A map is calculated (typically with a grid of about 1/3 to 1/6 the resolution of the map) at low resolution (7 Å in Fig. 1b) and a set of points is identified along the axis of the tubes of density corresponding to α-helices. The points are chosen to be a set for which (i) each point is in relatively high density (typically at least 2σ, where σ is the r.m.s. of the map), (ii) no more than one point that is adjacent to a chosen point has an electron-density value that is greater than the value at the chosen point and (iii) each chosen point is at least a specified distance (typically 2 Å) from each other chosen point. The second criterion is chosen to ensure that the chosen points are either at a peak of density or along a line of high density. A set of points satisfying these criteria for the map in Fig. 1(b) is shown in Fig. 1(c).
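The three selection criteria can be sketched in a few lines of Python. This is a minimal illustration only, not the RESOLVE or PHENIX implementation. The function name `pick_axis_points`, the use of all 26 surrounding grid points as "adjacent" points, the greedy highest-density-first ordering and the non-periodic toy grid are all assumptions made for the example.

```python
import itertools
import numpy as np

# Assumption: "adjacent" means the 26 grid points surrounding a grid point.
NEIGHBOURS = [d for d in itertools.product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]

def pick_axis_points(rho, spacing=1.0, n_sigma=2.0, min_dist=2.0):
    """Return grid indices of candidate axis points in a 3D density array.

    rho is treated as a non-periodic toy grid with cubic spacing (in angstroms);
    boundary points are skipped for simplicity.
    """
    sigma = np.sqrt(np.mean(rho ** 2))  # r.m.s. of the map
    nx, ny, nz = rho.shape
    candidates = []
    for i, j, k in itertools.product(range(1, nx - 1), range(1, ny - 1), range(1, nz - 1)):
        value = rho[i, j, k]
        if value < n_sigma * sigma:      # criterion (i): high density
            continue
        higher = sum(rho[i + a, j + b, k + c] > value for a, b, c in NEIGHBOURS)
        if higher <= 1:                  # criterion (ii): peak or ridge of density
            candidates.append((value, (i, j, k)))
    candidates.sort(reverse=True)        # assumption: consider highest density first
    chosen = []
    for _, idx in candidates:
        xyz = np.array(idx) * spacing
        # criterion (iii): keep points at least min_dist apart
        if all(np.linalg.norm(xyz - np.array(c) * spacing) >= min_dist for c in chosen):
            chosen.append(idx)
    return chosen

# Toy map: a single Gaussian blob centred on grid point (4, 4, 4).
grid = np.indices((9, 9, 9)) - 4
rho = np.exp(-(grid ** 2).sum(axis=0) / (2 * 1.5 ** 2))
points = pick_axis_points(rho)
```

For the toy blob only the central grid point survives all three criteria. In a real low-resolution map the same filter would be expected to leave a sparse string of points along each tube of density.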
Next, the points along the axis of the tube of density as shown in Fig. 1(c) are used to guess the location and direction of the axis of the tube of density. Each point is considered as a possible marker of the center of a tube of density and the directions to every other point (typically including only those within 25 Å) are considered, one at a time, as the direction of the tube of density. The center and direction are scored by calculating the electron density at intervals of typically 2 Å along the line they define and identifying the longest segment that satisfies the criteria that (i) every point along the line has a density of at least $\rho_{\mathrm{mean}} \times \mathrm{cut}_1$, where $\rho_{\mathrm{mean}}$ is the mean density in the segment and $\mathrm{cut}_1$ has a typical value of 0.5, and (ii) the points on the ends have densities of at least $\rho_{\mathrm{mean}} \times \mathrm{cut}_2$, where the value of $\mathrm{cut}_2$ is typically 0.75. These are the same criteria as used previously in building protein main-chain segments (Terwilliger, 2003). The score is then the square root of the number of points sampled along the line multiplied by the mean: $\rho_{\mathrm{mean}} \times N^{1/2}$. For each point, the direction yielding the highest score is saved. An additional optimization of the direction is then carried out by sampling randomly chosen directions within approximately 30° of the saved direction. The overall highest scoring direction is then saved along with the extent of the segment in which the sampled points satisfied the two criteria. This yields a set of potential α-helix locations, orientations and ends.
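The segment criteria and the score can be illustrated with a short Python sketch. It assumes that the density has already been sampled at regular steps along a candidate line and uses a brute-force search over all sub-segments. The function name `score_segment` is hypothetical, and the sketch does not require the segment to contain the central point, which the actual procedure presumably does.

```python
import math

def score_segment(densities, cut1=0.5, cut2=0.75):
    """Find the longest run of samples that meets both density criteria.

    densities: density values sampled at regular steps along a candidate axis.
    Returns (score, start, end) with end exclusive, where
    score = mean density * sqrt(number of points). Returns (0.0, 0, 0) if
    no segment qualifies.
    """
    best = (0.0, 0, 0)
    n = len(densities)
    for start in range(n):
        for end in range(start + 1, n + 1):
            seg = densities[start:end]
            mean = sum(seg) / len(seg)
            if mean <= 0:
                continue
            # criterion (i): every point is at least cut1 * mean
            interior_ok = all(d >= cut1 * mean for d in seg)
            # criterion (ii): both end points are at least cut2 * mean
            ends_ok = seg[0] >= cut2 * mean and seg[-1] >= cut2 * mean
            if interior_ok and ends_ok and (end - start) > (best[2] - best[1]):
                best = (mean * math.sqrt(end - start), start, end)
    return best

# Toy profile: four strong samples flanked by weak density.
profile = [0.1, 1.0, 1.2, 0.9, 1.1, 0.2, 0.1]
score, start, end = score_segment(profile)
```

For this toy profile the longest qualifying segment covers the four strong samples (indices 1 to 4), with a mean of 1.05 and a score of 1.05 × 4^(1/2) = 2.1.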
The final step in low-resolution identification of α-helices is to score each potential α-helix based on the correlation of density between the low-resolution electron-density map and an idealized tube of positive density. The basic idea in this scoring is to ensure that the potential α-helices have high density down their axis and low density a few angstroms away from the axis, as would a tube of density. In this simple scoring scheme, the idealized density consists of a tube of density down the axis of the potential α-helix with a density of 1 on the axis and zero elsewhere. The correlation is calculated down the axis of the α-helix and on the surface of a cylinder with a radius of 4 Å and an axis coincident with the axis of the α-helix. These correlations are then used to score each potential α-helix location, and the top-scoring locations (typically those with a correlation coefficient cc_helix_min of 0.5 or greater) are saved.
This process is typically repeated with maps with resolution cutoffs from about 5-8 Å and all the resulting α-helices are considered in the following steps.
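A minimal Python sketch of this tube-correlation score follows. It assumes that map density has already been interpolated at sample points on the helix axis and on the surrounding 4 Å cylinder. The function name `tube_correlation` and the toy values are illustrative assumptions, not part of the published implementation.

```python
import numpy as np

def tube_correlation(axis_density, cylinder_density):
    """Correlation between sampled map density and an idealized tube.

    The idealized density is 1 for samples on the axis and 0 for samples
    on the cylinder surface.
    """
    observed = np.concatenate([axis_density, cylinder_density])
    ideal = np.concatenate([np.ones(len(axis_density)),
                            np.zeros(len(cylinder_density))])
    return float(np.corrcoef(observed, ideal)[0, 1])

# Tube-like density: high on the axis, near zero 4 A away from it.
cc_tube = tube_correlation([1.0, 1.1, 0.9], [0.1, 0.0, -0.1, 0.0, 0.1, -0.1])
# Inverted case: weak on the axis, strong on the cylinder surface.
cc_inverted = tube_correlation([0.0, 0.1, -0.1], [1.0, 0.9, 1.1, 1.0, 0.9, 1.1])

CC_HELIX_MIN = 0.5  # default cutoff quoted in the text
keep_tube = cc_tube >= CC_HELIX_MIN
```

A tube-like profile gives a correlation close to 1 and would be kept at the default cutoff, whereas density concentrated away from the axis gives a negative correlation and would be rejected.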

Determination of α-helix placement (direction, rotation about and translation along the helical axis) using the full available resolution
The second overall step in α-helix identification is to use the high-resolution electron-density map to determine how an α-helix could be optimally placed in the electron density given the helix axis and the ends of the helical segment. This is performed in three stages. Firstly, the positioning along the helix axis of the tubes of density in the map corresponding to the main-chain atoms in each (potential) helix is determined. The direction of the α-helix is then identified and finally the positioning of an idealized α-helix is identified. Fig. 1(d) illustrates the approach used to position the helix axis of a segment in ideal α-helical density. The blue mesh corresponds to a contour of ideal density from an α-helical segment and the gray helix is an ideal helix with a radius of 2 Å and a pitch of 5.4 Å. The parameter that is optimized in this step is the translation of the gray ideal helix along the helix axis, with a score given by the mean density along the gray ideal helix multiplied by the square root of its length. As in the previous overall step, the ends of the helix are chosen to maximize its length, while requiring that the density at all intermediate points and at the ends be at least $\mathrm{cut}_1$ or $\mathrm{cut}_2$ times the mean in the segment, respectively.

Figure 2
SAD-phased density-modified electron-density map of a calcium pump (Sorensen et al., 2004) recalculated using the PHENIX AutoSol wizard at a resolution of 3.1 Å. (a) Section of map truncated at a resolution of 7 Å. (b) The same section as in (a) but calculated at a resolution of 3.1 Å, showing the helices found with the present procedure in yellow and those from the refined structure (PDB entry 1t5s; Sorensen et al., 2004) in red. This figure was created using Coot (Emsley & Cowtan, 2004).

The direction of the α-helix is identified by maximizing the density at the positions where Cβ atoms would be located given the location of the gray helix representing main-chain atoms as identified above. Fig. 1(d) illustrates this process. Two helices (shown in red and yellow in Fig. 1d) are constructed based on the gray helix. Each of these helices has a radius of 4 Å and a pitch of 5.4 Å. They are offset by ±1 Å along the helix axis from the gray main-chain helix. Depending on the direction of the helix, one of these two helices (the red helix in Fig. 1d) will typically be in much higher average density than the other, allowing the direction of the helix to be identified. A Z score is estimated reflecting the confidence in this difference from the ratio of the difference between the scores for the two directions to the estimated standard deviation of this ratio for random helix placements. This standard deviation is estimated from the variance of the values of the scores obtained for both directions, assuming incorrect periodicities of a helix of 80°, 90°, 110° and 120°. If the Z score was 2 or larger, the assignment of the direction was considered to be likely to be correct.
The positioning and extent of an idealized polyalanine α-helix in the high-resolution electron density is then identified by a simple search over rotations about the helix axis and translations along the helix axis, trimming the ends in the same fashion as described above and scoring by the mean value of electron density at the coordinates of atoms in the idealized α-helix multiplied by the square root of the number of atoms.
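The geometry of the probe helices and the form of the Z score can be sketched as follows. The sketch places one point per residue, using the standard α-helical values of 100° rotation and 1.5 Å rise per residue (equivalent to a pitch of 5.4 Å). The function names, the one-point-per-residue sampling and the way the decoy scores are pooled into a single standard deviation are assumptions made for illustration; the text does not specify these details.

```python
import math
import statistics

def helix_points(radius, n_res, rise=1.5, turn_deg=100.0, z_offset=0.0):
    """Points on a helix wound around the z axis, one point per residue.

    rise=1.5 A and turn_deg=100 correspond to 3.6 residues per turn and a
    pitch of 5.4 A. z_offset shifts the helix along its axis; +1 A and -1 A
    would give the two probe helices used to test the two directions.
    """
    points = []
    for i in range(n_res):
        angle = math.radians(turn_deg * i)
        points.append((radius * math.cos(angle),
                       radius * math.sin(angle),
                       rise * i + z_offset))
    return points

def direction_z_score(score_a, score_b, decoy_scores):
    """Difference between the two direction scores in units of the spread
    of scores from decoy helices built with incorrect periodicities
    (for example turn_deg = 80, 90, 110 and 120)."""
    return abs(score_a - score_b) / statistics.pstdev(decoy_scores)

probe = helix_points(radius=4.0, n_res=3, z_offset=1.0)
z = direction_z_score(3.0, 1.0, [0.5, 1.5, 0.5, 1.5])
direction_is_reliable = z >= 2.0  # acceptance threshold quoted in the text
```

In practice each score would be the mean map density interpolated at the probe points; here the scores are toy numbers chosen so that the decoy spread is 0.5 and the Z score is 4.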

Figure 3
Accuracy of α-helical models. The r.m.s.d. between the α-helical models obtained using the present method and the corresponding refined models from

Assembly of α-helices, elimination of overlaps and joining of adjacent segments
The previous steps result in a collection of α-helices that match the electron density but that may contain overlapping or otherwise incompatible fragments of α-helix. The assembly of all these fragments and the resolution of overlaps is carried out by the main-chain assembly routines in the RESOLVE software (Terwilliger, 2003). This process consists of ranking all fragments (α-helices) based on their match to the density using the scoring function described above and identifying fragments that have two or more sequential Cα atoms that overlap within about 1 Å and that can therefore be connected into longer chains. The highest scoring chain is then selected and all overlapping fragments are deleted. This process is continued until no fragments of at least a minimum length (typically four residues) are found. The resulting set of α-helices is saved.
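The select-and-delete part of this assembly can be illustrated with a greedy loop in Python. This is a simplified sketch and not the RESOLVE routine: it omits the joining of fragments into longer chains, treats any pair of Cα positions within 1 Å as an overlap, and uses hypothetical names (`overlaps`, `assemble`) and toy coordinates.

```python
import math

def overlaps(frag_a, frag_b, tol=1.0):
    """True if any point of frag_a lies within tol angstroms of frag_b."""
    return any(math.dist(p, q) <= tol for p in frag_a for q in frag_b)

def assemble(fragments, min_len=4):
    """Greedy selection of non-overlapping fragments.

    fragments: list of (score, [(x, y, z), ...]) tuples, one per helix.
    Repeatedly keeps the highest scoring remaining fragment and deletes
    every fragment that overlaps it.
    """
    pool = sorted((f for f in fragments if len(f[1]) >= min_len),
                  key=lambda f: f[0], reverse=True)
    kept = []
    while pool:
        best = pool.pop(0)
        kept.append(best)
        pool = [f for f in pool if not overlaps(best[1], f[1])]
    return kept

def line(x, n):
    """Toy fragment: n points spaced 1.5 A apart along z at a given x."""
    return [(x, 0.0, 1.5 * i) for i in range(n)]

fragments = [
    (5.0, line(0.0, 6)),    # best fragment
    (3.0, line(0.5, 6)),    # overlaps the best fragment, so it is deleted
    (2.0, line(20.0, 5)),   # far away, so it is kept
    (10.0, line(40.0, 3)),  # too short, so it is never considered
]
kept_scores = [score for score, _ in assemble(fragments)]
```

With these toy fragments the loop keeps the fragments scoring 5.0 and 2.0 and discards the other two.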

Application to experimental electron-density maps
We first tested our algorithm for α-helix identification using the electron-density map of a calcium pump with a transmembrane segment consisting of α-helices (Sorensen et al., 2004). For this analysis the map was recalculated using the PHENIX AutoSol wizard (Adams et al., 2002; Terwilliger et al., 2008) using SAD data to a resolution of 3.1 Å. A portion of this map truncated to a resolution of 7 Å is shown in Fig. 2(a). Tubes of density corresponding to helices are readily identifiable in the map. Fig. 2(b) shows the map at high resolution, along with the α-helices that were identified using the procedure described here (in yellow) and the α-helices from the refined structure (PDB entry 1t5s; Berman et al., 2000; Bernstein et al., 1977; Sorensen et al., 2004) (in red). It can be seen that the Cα positions of the α-helices identified using the present method very closely match those in the refined structure.
We next applied the method to a set of 42 density-modified electron-density maps obtained with MAD, SAD, MIR and a combination of SAD and SIR procedures with data extending to high resolutions ranging from 1.5 to 3.8 Å. These maps were calculated with the PHENIX AutoSol wizard (Terwilliger et al., 2008) using data that had previously led to refined models for each of the structures considered. Each map was examined for α-helices using the procedure described above. Table 1 summarizes the results of these tests, listing for each structure the number of residues of α-helix in the refined structure (as calculated with DSSP; Kabsch & Sander, 1983), the number of residues of α-helix found, the number of these residues that were correctly placed in α-helices (with a Cα atom within 3 Å of a Cα atom in an α-helix in the refined structure), the quality of the map (the correlation of the map with a map calculated from the refined model of the structure), the r.m.s. coordinate difference between main-chain atoms in the modeled α-helices compared with those in the refined structure and the correlation between the map and a map calculated from the α-helix model.
Overall, 63% of the 11 233 residues in α-helices in the refined structures were found. Viewed differently, 76% of the residues that were built using the present method in fact corresponded to α-helical segments of the refined structures, with a Cα atom within 3 Å of a Cα atom in an α-helix in the refined structure. The remaining 24% were built into structure that was not identified as α-helical by DSSP. The overall r.m.s.d. between modeled α-helices and refined coordinates (matching the closest corresponding atom, e.g. Cα with Cα, and including incorrectly modeled α-helices, but excluding any atoms more than 10 Å from any atom in the refined structures) was 1.3 Å. The CPU time (using 2.9 GHz Intel Xeon processors) required to analyze all 42 maps was 28 min or about 0.2 s per residue of α-helix placed. To provide a frame of reference for these results, we carried out one cycle of automated model building applying the PHENIX AutoBuild wizard (Terwilliger et al., 2008) to the same maps as used above. This procedure includes RESOLVE model building and phenix.refine refinement. The AutoBuild wizard correctly built 75% of the 11 233 residues in α-helices in the refined structures with an overall r.m.s.d. (for all main-chain and Cβ atoms in the entire models built) of 0.95 Å, requiring 43 h for the 42 maps.
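The timing figures quoted above can be cross-checked with a few lines of arithmetic. The total number of residues built is not stated in the text, so the sketch estimates it by assuming that the residues "found" (63% of 11 233) are the same residues as the correctly placed 76% of those built; this is an assumption made only for the estimate.

```python
# Figures quoted in the text.
helix_residues_refined = 11233
fraction_found = 0.63          # refined helical residues that were found
fraction_built_correct = 0.76  # built residues that match refined helices
helix_cpu_seconds = 28 * 60    # 28 min for all 42 maps
autobuild_cpu_seconds = 43 * 3600  # 43 h for one AutoBuild cycle on 42 maps

# Assumption: residues found == correctly placed built residues.
residues_found = fraction_found * helix_residues_refined
residues_built = residues_found / fraction_built_correct

seconds_per_residue = helix_cpu_seconds / residues_built
speedup = autobuild_cpu_seconds / helix_cpu_seconds

print(f"estimated residues built: {residues_built:.0f}")
print(f"CPU seconds per residue placed: {seconds_per_residue:.2f}")
print(f"AutoBuild / helix-search CPU time ratio: {speedup:.0f}")
```

Under this assumption roughly 9300 residues were built, which is consistent with the quoted figure of about 0.2 s per residue, and one AutoBuild cycle took roughly 90 times longer than the helix search.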
The maps used in this analysis were of fair to excellent quality, with correlations to model maps based on the corresponding refined structures of 0.53-0.89. Fig. 3(a) shows that for this set of maps the quality of the map has only a small effect on the quality of the α-helices built, as reflected in the r.m.s.d. between the main-chain atoms in the α-helices found and those in the corresponding refined models. Similarly, the resolution of the map, in the range 1.5-3.8 Å, had little effect on the quality of the models (Fig. 3b). However, it was possible to tell which models were accurate. Fig. 3(c) shows that the map-model correlation based on the coordinates of the α-helices that were built is inversely related to the r.m.s.d. between those coordinates and those of the corresponding refined structures. Those models with a model-map correlation of greater than about 0.45 generally had an r.m.s.d. of less than about 1.5 Å and those with lower model-map correlation generally had an r.m.s.d. of greater than 1.5 Å.
One parameter that might be particularly important in determining both the accuracy of the procedure and the number of residues built is the map-correlation cutoff used to choose the density at low resolution (cc_helix_min). The default value is a correlation of 0.5. We tested a range of values of cc_helix_min for the set of 42 maps in Table 1. Fig. 4(a) shows the overall r.m.s.d. of main-chain atoms from those in corresponding refined models and Fig. 4(b) shows the total number of residues built. Increasing the threshold correlation results in more accurate models but fewer residues built and the default value of 0.5 appears to be a reasonable compromise between these effects.

Conclusions
The procedure described here for the rapid placement of α-helices in electron-density maps may be useful in several contexts. Firstly, it may be useful as a method for the evaluation of map quality. Secondly, it may be useful in giving a rapid indication to a crystallographer as to whether they have successfully determined the structure in their crystals. Thirdly, it may be a useful approach to generating a partial model of a protein that can then be extended with other model-building tools.

Figure 4
Accuracy and residues built versus cutoff for accepting helices. (a) The overall r.m.s.d. as in Fig. 3 is plotted as a function of the parameter cc_helix_min which defines the minimum correlation of density between a helix and the electron-density map. The default is 0.5. (b) The overall number of residues built for the 42 structures in Table 1 is plotted as a function of cc_helix_min.

The author would like to thank the NIH Protein Structure Initiative for generous support of the Phenix project (1P01 GM063210) and the members of the Phenix project for extensive collaboration and discussions. The author is grateful to the many researchers who contributed their data to the PHENIX structure library. The algorithm described here is carried out by the PHENIX routine phenix.find_helices_strands with the keywords trace_chain=False and helices_only=True.