Departments of Molecular and Cellular

An ab initio molecular-replacement method for phasing X-ray diffraction data for symmetric helical membrane proteins has been developed. The described method is based on generating all possible orientations of idealized transmembrane helices and using each model in a molecular-replacement search.

Obtaining phases for X-ray diffraction data can be a rate-limiting step in structure determination. Taking advantage of constraints specific to membrane proteins, an ab initio molecular-replacement method has been developed for phasing X-ray diffraction data for symmetric helical membrane proteins without prior knowledge of their structure or heavy-atom derivatives. The described method is based on generating all possible orientations of idealized transmembrane helices and using each model in a molecular-replacement search. The number of models is significantly reduced by taking advantage of geometrical and structural restraints specific to membrane proteins. The top molecular-replacement results are evaluated based on noncrystallographic symmetry (NCS) map correlation, OMIT map correlation and $R_{\mathrm{free}}$ value after refinement of a polyalanine model. The feasibility of this approach is illustrated by phasing the mechanosensitive channel of large conductance (MscL) with only 4 Å diffraction data. No prior structural knowledge was used other than the number of transmembrane helices. The search produced the correct spatial organization and the position in the asymmetric unit of all transmembrane helices of MscL. The resulting electron-density maps were of sufficient quality to automatically build all helical segments of MscL including the cytoplasmic domain. The method does not require high-resolution diffraction data and can be used to obtain phases for symmetrical helical membrane proteins with one or two helices per monomer.

Introduction
Obtaining high-resolution structures of integral membrane proteins is one of the grand challenges in structural biology. Many processes important to the cell, such as electrochemical, immunological and signalling functions, occur at the membrane. Not surprisingly, membrane proteins are extremely attractive pharmacological targets. Modern biomedical research builds upon high-resolution structural information and the demand for membrane-protein structures is clearly increasing (Dahl et al., 2002), yet relatively few membrane-protein structures are known (Tusnady et al., 2004). Low expression, poor stability in the absence of the lipid bilayer, the presence of detergents and difficulty in forming well ordered crystals are some of the problems that account for the slow progress in membrane-protein structure determination. Even after crystals have been obtained, obtaining phases for X-ray diffraction data can be the next bottleneck.
In the past century, several methods have been developed to circumvent the phase problem in protein crystallography. Heavy-atom substitution (Robertson, 1935), direct methods and molecular replacement (Hoppe, 1957; Rossmann & Blow, 1962) are the most common ways to obtain approximate initial phases for electron-density calculation. Heavy-atom methods rely on the availability of numerous well diffracting crystals for soaking experiments and on the ability of the soaked compounds to bind at discrete locations within the protein. Such conditions can sometimes be difficult to achieve for membrane proteins (Bass et al., 2002). Membrane proteins expressed in eukaryotic expression hosts further suffer from difficulties in expressing selenomethionine-substituted protein.
If one can make a reasonable 'guess' as to the structure (for example, from a homologous protein), molecular replacement is the method of choice since no further experimental effort is required. Indeed, as the number of structures in the Protein Data Bank (PDB) increases (Berman et al., 2000; Sussman et al., 1998), molecular-replacement methods have become increasingly popular. However, relative to soluble proteins (~1030 folds), the number of known membrane-protein folds (~40 folds) is very low (Berman et al., 2000; http://scop.mrc-lmb.cam.ac.uk/scop/; Tusnady et al., 2004). Although the total number of membrane-protein folds might be smaller than the number for soluble proteins, the small number of presently known membrane-protein folds renders molecular replacement unlikely to succeed in many cases. However, membrane proteins have the advantage that their orientation is restricted in the lipid bilayer. By surveying known α-helical membrane-protein structures, it is possible to obtain constraints on helical arrangements such as the maximum helix tilt angle, helix-helix distances and helix-packing preferences (Bowie, 1997, 1999; Spencer & Rees, 2002; Strop et al., 2003). Additionally, the number of membrane-spanning helices can often be accurately predicted from the primary sequence (Cserzo et al., 1997; Krogh et al., 2001). Taking advantage of these constraints, we have developed an ab initio molecular-replacement method for phasing X-ray diffraction data for symmetric helical membrane proteins (Fig. 1). After generating an exhaustive ensemble of plausible models, each model is subjected to a molecular-replacement search. The top molecular-replacement models are evaluated based on noncrystallographic symmetry (NCS) map correlation, OMIT map correlation and free R value after simulated-annealing refinement.
As a test case, we successfully obtained phases for the mechanosensitive channel of large conductance (MscL; Chang et al., 1998) without any prior structural knowledge other than the number of transmembrane helices.

Model generation
Idealized Cα traces of helical assemblies with n-fold symmetry were generated using an implementation of the algorithm in MATHEMATICA (Wolfram, USA). The geometric quantities are defined in Fig. 2(a): the distance from the bundle symmetry axis to the projection of the first Cα atom on the helix axis is designated $r_{hi}$, and the rotation angle of the helix tilting plane $\alpha_{hi}$, the helix tilt $\beta_{hi}$ and the helix axial rotation $\gamma_{hi}$ are the Euler angles according to Arfken for a helix in bundle i (Arfken & Weber, 1995). The symmetrical assembly of helices is defined such that the origins of the helical coordinate systems (i.e. the helical axes; Fig. 2) lie evenly spaced about the circumference of a circle of radius $r_h$. A helical bundle may also undergo a collective rotation about the bundle symmetry axis, described by a bundle rotation angle subscripted b (Fig. 2c). Each helix was constructed with a 1.45 Å rise per residue, 3.76 residues per turn and a Cα helix radius of 2.58 Å (Kleywegt, 1999), producing an overall length of $l_h = (n_r - 1) \times 1.45$ Å, where $n_r$ is the number of residues. Thus, the variable parameters in generating all of the models are $r_{hi}$, $\alpha_{hi}$ and $\beta_{hi}$. The helical orientation (N- to C-terminal direction) and rotation about the individual helical axes ($\gamma_{hi}$) are not considered in the search. Although helix rotation might have a small effect on the quality of the molecular-replacement solution even at low resolutions, i.e. 5-6 Å, we use this approximation to limit the size of the calculation. The effect of helix rotation is also reduced by the use of polyalanine models and by helix translation in the subsequent molecular-replacement search.
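The construction described above can be sketched in a few lines of Python. This is only an illustration, not the authors' MATHEMATICA implementation: the function names, the choice of z as the bundle symmetry axis and the particular order of rotations (tilt about y, then rotation of the tilting plane about z) are assumptions made here, and the axial rotation of each helix is ignored, as in the search itself. Only the rise per residue, residues per turn and Cα radius are taken from the text.

```python
import math

RISE = 1.45            # Å rise per residue (value from the text)
RES_PER_TURN = 3.76    # residues per turn (value from the text)
CA_RADIUS = 2.58       # Å, C-alpha helix radius (value from the text)

def ideal_helix(n_res):
    """C-alpha trace of a straight helix whose axis runs along +z from the origin."""
    coords = []
    for k in range(n_res):
        phase = 2.0 * math.pi * k / RES_PER_TURN
        coords.append((CA_RADIUS * math.cos(phase),
                       CA_RADIUS * math.sin(phase),
                       RISE * k))
    return coords

def rotate_y(p, angle):
    """Rotate point p about the y axis by angle (radians)."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)

def rotate_z(p, angle):
    """Rotate point p about the z axis by angle (radians)."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def bundle(n_res, n_fold, radius, plane_rot_deg, tilt_deg):
    """Return n_fold helices (lists of xyz tuples) related by rotation about z.

    Assumed convention (hypothetical): tilt about y first, then rotate the
    tilting plane about z, then place the helix origin at distance `radius`.
    """
    plane_rot, tilt = math.radians(plane_rot_deg), math.radians(tilt_deg)
    first = []
    for p in ideal_helix(n_res):
        p = rotate_y(p, tilt)         # tilt the helix axis away from z
        p = rotate_z(p, plane_rot)    # orient the plane in which it tilts
        first.append((p[0] + radius, p[1], p[2]))   # helix origin on the circle
    return [[rotate_z(p, 2.0 * math.pi * i / n_fold) for p in first]
            for i in range(n_fold)]

helices = bundle(n_res=23, n_fold=5, radius=10.0, plane_rot_deg=115.0, tilt_deg=30.0)
print(len(helices), "helices of", len(helices[0]), "C-alpha atoms each")
```

The helix length of 23 residues in the usage line is an illustrative choice; it is not stated in the text.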
Several important restrictions were considered in creating the ensemble of structures. The restraints for $r_{h1}$ are calculated from $r_{h1} = d/[2\sin(\pi/n)]$, where d is the side length of an n-sided polygon. The minimum $r_{h1}$ is the smallest radius possible for an n-fold symmetric helical bundle with helices that have a diameter of 9 Å (an approximate diameter of a typical membrane-protein helix with its side chains). The maximum $r_{h1}$ for the inner bundle occurs when an outer bundle would be intercalated into the inner bundle, equivalent to a 2n-sided polygon (for a protein with two transmembrane helices per monomer); here, n is replaced with 2n in the preceding equation. When constructing the outer helical bundle, $r_{h2}$ is subjected to the restraint $r^{*}_{h1} \le r_{h2} \le r^{*}_{h1} + l_{h1}\sin\beta^{*}_{h1} + l_{h2}\sin\beta^{\max}_{h2}$, where $\beta$ denotes the helix tilt and the asterisks indicate the values chosen from the inner bundle search. The maximum $r_{h2}$ occurs when the tilts $\beta_{h1}$ and $\beta_{h2}$ are at their maximum values (Table 1) and the inner and outer helical bundles are still in contact (Fig. 2b). In all cases, to ensure that helical space was sampled equivalently, we utilized the relation $\Delta\alpha_h = s/(l_h\sin\beta_h)$, such that the increment $\Delta\alpha_h$ of the tilting-plane rotation decreases with increasing tilt $\beta_h$, where s is the distance between the helical axes (at the last Cα positions) in the ensemble. In other words, the number of $\alpha_h$ angles is calculated by dividing the circumference described by the end of the tilted helix by the helical spacing s.
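As a rough numerical illustration of these two relations, the sketch below evaluates the polygon radius for n and 2n sides and the angular increment of the tilting-plane rotation. The function names are hypothetical, and the helical spacing of 4 Å and helix length of 23 residues used in the example are assumed values chosen only for illustration; the 9 Å helix diameter and the 1.45 Å rise come from the text.

```python
import math

def polygon_radius(side, n_sides):
    """Circumradius of a regular polygon: r = d / (2 sin(pi / n))."""
    return side / (2.0 * math.sin(math.pi / n_sides))

def inner_radius_limits(n_fold, helix_diameter=9.0):
    """Minimum (n-sided) and maximum (2n-sided) radius of the inner bundle, in Å."""
    return (polygon_radius(helix_diameter, n_fold),
            polygon_radius(helix_diameter, 2 * n_fold))

def plane_rotation_step(spacing, helix_length, tilt_deg):
    """Increment of the tilting-plane rotation (radians): s / (l_h sin(tilt)).

    Undefined for a tilt of exactly zero, where all rotations are equivalent.
    """
    return spacing / (helix_length * math.sin(math.radians(tilt_deg)))

def n_plane_rotations(spacing, helix_length, tilt_deg):
    """Number of sampled rotations: circumference traced by the helix end / s."""
    circumference = 2.0 * math.pi * helix_length * math.sin(math.radians(tilt_deg))
    return max(1, round(circumference / spacing))

r_min, r_max = inner_radius_limits(n_fold=5)
print(f"inner-bundle radius between {r_min:.2f} and {r_max:.2f} Å")

length = (23 - 1) * 1.45          # helix length for an assumed 23 residues
for tilt in (10.0, 20.0, 30.0, 40.0):
    step = math.degrees(plane_rotation_step(4.0, length, tilt))
    print(tilt, round(step, 1), n_plane_rotations(4.0, length, tilt))
```

The loop shows the intended behaviour: as the tilt grows, the end of the helix sweeps a larger circle, so the angular step shrinks and more orientations are sampled.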
Each generated model was checked for steric clashes (models where the minimum inter-axial distance between any helices was less than 9 Å were eliminated). Additionally, models where the minimum inter-axial distance between inner and outer bundle helices was greater than 10.5 Å were also eliminated to ensure that the outer helical bundle would come into contact with the inner bundle. From this restrained ensemble, idealized polyalanine helix models including all main-chain and side-chain atoms were created with MOLEMAN and LSQMAN (Kleywegt, 1999).

Molecular replacement was carried out as described by McCoy et al. (2005). All models were searched against an MscL data set (Chang et al., 1998) limited to 15-5.0 Å resolution, with 40% identity and no allowed Cα clashes. Peak-selection criteria were set to 80% in order to optimize the calculations. The top molecular-replacement models were assessed with the Z score (Z) and log-likelihood gain (LLG) statistics (McCoy et al., 2005). Ten solutions with the highest Z score, ten solutions with the highest LLG score and ten solutions with the highest Z*LLG scores were selected for further scoring by NCS and OMIT map correlations, although other ways of selecting top solutions are possible. All molecular replacements for the inner helical bundle (305 models) were completed in approximately 80 CPU hours utilizing a 2.8 GHz Intel Xeon P4 processor. The secondary search for the combined inner/outer helical bundles (1050 models) completed in 130 CPU hours. The entire procedure is easily adaptable to parallel processing since each molecular-replacement search is independent.

Figure 2. Geometric description of the parameters used to generate the helical models. Constraints and limits are given in §2 and Table 1. Parameters of the helices in the inner bundle are subscripted h1, while the parameters of the outer helical bundle are subscripted h2. (a) The first helix is rotated with Euler angles $\alpha_{h1}$, $\beta_{h1}$ and $\gamma_{h1}$. The resulting new helix orientation is shown in a lighter shade of gray. (b) The second helix is rotated with Euler angles $\alpha_{h2}$, $\beta_{h2}$ and $\gamma_{h2}$ and placed at a distance $r_{h2}$ from the protein symmetry axis. (c) The angle subscripted b is the rotation of the outer helical bundle about the inner helical bundle.

Table 1. Parameters used for model building according to the geometry indicated in Fig. 2. The increment of the tilting-plane rotation angle $\alpha_{hi}$ is dependent on the tilting angle $\beta_{hi}$ and is calculated by dividing the circumference described by the end of the tilted helix by the helical spacing s as described in the methods.
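The selection of candidates for further scoring (the ten best by Z, the ten best by LLG and the ten best by Z*LLG) amounts to taking the union of three ranked lists. A minimal sketch follows; the record layout with `model`, `z` and `llg` keys is a hypothetical one chosen for illustration and does not correspond to the output format of any particular program.

```python
def select_top(solutions, k=10):
    """Union of the k best solutions by Z, by LLG and by the product Z*LLG.

    `solutions` is a list of dicts with hypothetical keys 'model', 'z', 'llg'.
    A model ranked highly by more than one criterion is returned only once.
    """
    criteria = (
        lambda s: s["z"],
        lambda s: s["llg"],
        lambda s: s["z"] * s["llg"],
    )
    chosen = {}
    for criterion in criteria:
        for s in sorted(solutions, key=criterion, reverse=True)[:k]:
            chosen[s["model"]] = s
    return list(chosen.values())

# Toy example with k=1: each criterion picks a different model.
toy = [
    {"model": "A", "z": 5.0, "llg": 10.0},
    {"model": "B", "z": 4.0, "llg": 20.0},
    {"model": "C", "z": 1.0, "llg": 30.0},
    {"model": "D", "z": 2.0, "llg": 5.0},
]
print(sorted(s["model"] for s in select_top(toy, k=1)))
```

Because the three lists usually overlap, the number of models passed on is at most 3k and often fewer. Note that the product is only a sensible ranking when both scores are positive; two negative scores would also give a large product.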

NCS map correlation, OMIT map correlation and refinement
The NCS map correlation scoring takes advantage of the fact that if the position of a model in the asymmetric unit is correct then the NCS axis of the search model will coincide with the crystal's NCS axis. In such cases, the NCS map correlation between the five monomers should be higher than if they were incorrectly placed within the asymmetric unit. Density modification including solvent flattening and NCS averaging with phase extension was performed in DM (Collaborative Computational Project, Number 4, 1994) prior to NCS map correlation calculation. The NCS mask for monomer A was calculated in NCSMASK from the coordinates of monomer A with a radius of 15 Å, removing any overlaps. NCS operators were obtained from the oriented model with LSQMAN (Kleywegt, 1999).
Independent of the NCS map correlation calculation, segments of five residues were omitted from each helix in the top molecular-replacement solutions and the maps generated from these models were subjected to prime-and-switch density modification with fivefold NCS averaging in the program RESOLVE (Terwilliger, 2000). After density modification, the map correlation for the omitted residues was computed with OVERLAPMAP (Branden & Jones, 1990), where the correlation coefficient is calculated as $CC = (\langle xy\rangle - \langle x\rangle\langle y\rangle)/[(\langle x^{2}\rangle - \langle x\rangle^{2})^{1/2}(\langle y^{2}\rangle - \langle y\rangle^{2})^{1/2}]$. The product of the OMIT and NCS correlation scores (NCS*OMIT) was used to delineate the top molecular-replacement models. The resulting polyalanine models were subjected to rigid-body and torsion-angle simulated-annealing refinement with the MLF maximum-likelihood target function as implemented in CNS (Brünger et al., 1998).
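The correlation coefficient above is the ordinary linear correlation between two sets of density values sampled at the same grid points. The sketch below computes it directly from the averages in the formula and forms the NCS*OMIT product used for ranking; it is a plain restatement of the formula, not the OVERLAPMAP code, and the function names and the example correlation values are illustrative assumptions.

```python
import math

def map_correlation(x, y):
    """CC = (<xy> - <x><y>) / sqrt((<x^2> - <x>^2) (<y^2> - <y>^2))."""
    if len(x) != len(y) or not x:
        raise ValueError("maps must be non-empty and of equal size")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    mean_xy = sum(a * b for a, b in zip(x, y)) / n
    var_x = sum(a * a for a in x) / n - mean_x ** 2
    var_y = sum(b * b for b in y) / n - mean_y ** 2
    return (mean_xy - mean_x * mean_y) / math.sqrt(var_x * var_y)

def combined_score(ncs_cc, omit_cc):
    """Product of the NCS and OMIT map correlations (NCS*OMIT)."""
    return ncs_cc * omit_cc

# Density values at four grid points (made-up numbers).
density_a = [1.0, 2.0, 3.0, 4.0]
density_b = [2.0, 4.0, 6.0, 8.0]
print(map_correlation(density_a, density_b))   # perfectly correlated maps
print(combined_score(0.6, 0.5))                # illustrative values only
```

Using the product rather than either correlation alone rewards models that score well on both criteria at once.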

Results and discussion
Our method is summarized in Fig. 1 for the inner and combined inner/outer helical bundles. Firstly, all possible models of the inner helical bundle were constructed and used as models in molecular-replacement searches. The top solutions from the molecular-replacement searches were further evaluated based on NCS map correlation and OMIT map correlation. At this point, the process was repeated with finer parameter variation in order to achieve a more precise model. Next, the top solution was 'fixed' in place and all possible conformations of the outer helical bundle were constructed. The resulting models were again subjected to molecular-replacement searches and the top solutions were scored based on NCS and OMIT map correlation. Once again, the process was repeated on a finer grid in order to achieve a more precise model.

Figure 3. [...] and outer (d) helical bundle ensembles. In all panels, the top ten Z, LLG and Z*LLG scores are shown in red, the best solution from a coarse search is shown in blue, and the best solution from a fine grid search is shown in green. For clarity, the best coarse and fine search solutions are also shown as large squares in panels (a) and (c).
The molecular replacement for the inner helical bundle (five helices) representing the inner transmembrane core of MscL was performed with a coarse set of 305 models (Table 1). The resolution range of the diffraction data was restricted to 15.0-5.0 Å in order to exclude high-resolution detail missing from our polyalanine helical models. The molecular-replacement results were represented as a scatter plot of Z versus LLG scores (Fig. 3a). Since the inner helical bundle is a small fragment of the asymmetric unit, distinguishing the correct solution from incorrect solutions is difficult. To circumvent this problem, we assessed the models with the highest Z, LLG and Z*LLG scores (shown in red and blue in Fig. 3a) with NCS map correlation and OMIT map correlation coefficients. The 'best' model, i.e. that with the largest product of NCS and OMIT map correlation, yielded the parameters $r_{h1}$ = 10 Å, tilting-plane rotation $\alpha_{h1}$ = 115° and tilt $\beta_{h1}$ = 30° (coloured blue in Figs. 3a and 3b). In order to obtain a higher accuracy solution for the inner helical bundle, a second finer search was performed around the top coarse solution. After repeating the process with the 'finer' parameters (Table 1), the resulting molecular-replacement models were again subjected to scoring by NCS and OMIT map correlation. The resulting best model yielded the parameters $r_{h1}$ = 11 Å, $\alpha_{h1}$ = 120° and $\beta_{h1}$ = 40°.
After the solution for the first (inner) helical bundle was found, an ensemble of second (outer) ring helices was constructed around the fixed geometry of the inner bundle. Roughly 1040 models composed of two transmembrane helices per monomer (ten helices in total) were subjected to another round of molecular-replacement searches (Table 1). The top results were again evaluated by NCS and OMIT map correlations (Figs. 3c and 3d), revealing the best solution with parameters $r_{h2}$ = 23 Å, tilting-plane rotation $\alpha_{h2}$ = 131°, tilt $\beta_{h2}$ = 30° and a bundle rotation (subscript b) of 14°. A finer grid search (see Table 1) around the top solution was also performed, producing $r_{h2}$ = 21 Å, $\alpha_{h2}$ = 122°, $\beta_{h2}$ = 30° and a bundle rotation of 10°. The correct solution was further distinguished from incorrect models by rigid-body and torsion-angle simulated-annealing refinement with a maximum-likelihood target function (Adams et al., 1999). The top five models were subjected to refinement and evaluated with the $R_{\mathrm{free}}$ statistic (Fig. 4a). Three models converged to approximately the same structure with an $R_{\mathrm{free}}$ of 0.46-0.48 (Fig. 4b). The two incorrect models are thus easily distinguishable by their higher $R_{\mathrm{free}}$ values (0.52-0.54). The final model found by ab initio molecular replacement and the known structure of MscL are qualitatively in good agreement (Figs. 5a and 5b).
To further validate the correctness of the final model, we calculated an anomalous difference electron-density map of an MscL gold derivative (Chang et al., 1998) using the phases obtained from the ab initio molecular-replacement searches. The resulting map contoured at 4σ is shown in Fig. 6(a) and clearly shows the symmetrical positions of the Au atoms. Anomalous difference maps calculated with incorrect models were not symmetrical and yielded no significant peaks at the known gold positions. We have also omitted a region of the inner helix (in all five monomers) and subjected the maps from this omitted model to a prime-and-switch density-modification protocol (Terwilliger, 2000). The resulting $2F_o - F_c$ OMIT maps contoured around the helices are shown in Figs. 6(b) and 6(c).

Figure 4. Comparison of (a) structure and (b) $R_{\mathrm{free}}$ statistic for the top five converging (light blue, dark blue and green) and nonconverging (black and gray) models after torsion-angle simulated-annealing refinement with a maximum-likelihood target function. The known transmembrane structure of MscL (red) is shown for reference.
Crucial questions are whether the resulting electron-density maps provide new features that are not part of the search model and whether they are sufficient for model building.
Remarkably, in addition to the electron density of the transmembrane region, there is also visible density for the cytoplasmic helical bundle (Fig. 6d). This extra-membranous region was not included in the molecular-replacement searches, demonstrating the presence of 'new' information in the electron-density maps. Automated construction of helical fragments with ARP/wARP (Perrakis et al., 1997) successfully built main-chain atoms for all ten helices in the transmembrane region as well as the five helices in the cytoplasmic region (Figs. 5c and 5d). While the ARP/wARP helix builder has been reported to work down to 3.5 Å resolution (http://www.embl-hamburg.de/ARP/), tracing of side chains and loops requires higher resolution data sets (at least 2.6 Å; http://www.embl-hamburg.de/ARP/). Therefore it is not surprising that ARP/wARP did not succeed in automatically building side-chain and loop regions using the 4.0 Å resolution data set. However, the new helical model generated in ARP/wARP followed by crystallographic refinement reduced $R_{\mathrm{free}}$ to 41.5%, improving electron-density maps for manual model building.
The native oligomerization state of membrane proteins is not always apparent from biochemical studies. It is often difficult to distinguish between closely related oligomerization states (such as tetramer, pentamer or hexamer). In many cases, only an approximate oligomerization state can be experimentally deduced. To address whether it is necessary to know the oligomerization state of the protein prior to using ab initio molecular replacement, we have also performed the entire procedure assuming incorrect fourfold and sixfold symmetry. The Z versus LLG plot for molecular replacements of models with fourfold (red), fivefold (green) and sixfold (blue) symmetries is shown in Fig. 7(a). The correct fivefold-symmetric model scored much higher than either the fourfold or sixfold symmetric models in the molecular-replacement searches. This distinction is also supported by the NCS*OMIT product scores (Fig. 7b) and is even more pronounced in the $R_{\mathrm{free}}$ statistics after refinement of the top models (Fig. 7c). Therefore, it appears that for MscL it is not necessary to know the oligomerization state. In situations where the oligomerization state is unknown, our ab initio molecular-replacement method could thus be used to determine the molecular symmetry.

Conclusions
[Figure caption fragment; remaining text not recovered] (Fenn et al., 2003; Kraulis, 1991).

One limitation of all molecular-replacement methods is that finding a correct solution becomes increasingly difficult as the fractional amount of scattering mass present in the search model decreases. In the case of MscL, we were able to find the correct solution with as little as 15% of the scattering mass, corresponding to one idealized transmembrane helix per monomer (the crystal structure of MscL consists of 3954 atoms, while the search model consisted of five polyalanine helices with a total of 575 atoms). Finding a molecular-replacement solution with such a low percentage of the asymmetric unit was probably aided by the high solvent content of the MscL crystals (85%). Additional algorithms such as generalized molecular replacement of flexible elements (Brünger, 1991) or normal-mode analysis (Suhre & Sanejouand, 2004) might be useful for these difficult molecular-replacement searches. In many cases, it may therefore be necessary to perform the molecular-replacement searches with a larger fraction of the asymmetric unit. However, as the number of helices in the model increases, the number of necessary models grows significantly. For example, if both helices of MscL were used together, the search would increase approximately 200-fold.
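The quoted fraction can be checked with simple arithmetic. The sketch below treats the ratio of atom counts as a stand-in for the fraction of scattering mass, which is an approximation made here (it weights all atoms equally). The helix length it derives assumes five non-hydrogen atoms per alanine residue (N, Cα, C, O, Cβ) and is an inference, not a number given in the text.

```python
ATOMS_PER_ALA = 5   # N, CA, C, O, CB: non-hydrogen atoms of one alanine residue

def model_fraction(model_atoms, total_atoms):
    """Fraction of the structure's atoms that are present in the search model."""
    return model_atoms / total_atoms

def residues_per_helix(model_atoms, n_helices, atoms_per_residue=ATOMS_PER_ALA):
    """Residues per helix implied by the atom count of a polyalanine model."""
    return model_atoms / (n_helices * atoms_per_residue)

fraction = model_fraction(575, 3954)
print(f"{fraction:.1%}")                 # about 14.5%, i.e. roughly 15%
print(residues_per_helix(575, 5))        # 23.0 residues per helix
```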
Our search method benefited greatly from reducing the total number of possible helical arrangements by utilizing geometrical and structural restraints. For larger helical assemblies, additional restraints or constraints limiting the number of models would be advantageous. Such restraints can come from experimental evidence of disulfide bonds, disulfide scanning experiments, chemical cross-linking (Faulon et al., 2003) or electron paramagnetic resonance (EPR) spectroscopy (Perozo et al., 2001, 2002). Additional reduction of the number of models can be achieved through the use of global search methods. Global search methods have been successful in some cases in predicting oligomeric membrane-protein structures (Adams et al., 1995, 1996; Arkin et al., 1994) and could be used to decrease the parameter space of the ab initio molecular-replacement searches. In some cases it may be possible to locate the NCS axis using a self-rotation function, although this was not possible for MscL. Restraining the orientation of the NCS axis could then simplify the molecular-replacement searches and thus significantly speed up the procedure. In cases where the number of models is too large to consider, statistical search tools such as genetic algorithms can provide an alternative approach. Although genetic algorithms are nondeterministic, they can be used to find approximate solutions to search problems and have been successful in many applications including molecular-replacement phasing (Kissinger et al., 1999).

Figure 6. (a) Anomalous difference electron-density map (red) contoured at 4σ computed for a gold derivative using the phases obtained from ab initio molecular replacement. $2F_o - F_c$ map contoured at 1σ computed from ab initio phases with omitted residues shown in green for (b) the complete transmembrane ensemble, (c) a single transmembrane helix and (d) the cytoplasmic domain (for clarity, the electron density is shown for only one cytoplasmic helix).
Presently, our ab initio molecular-replacement method is applicable mainly to symmetrical membrane proteins. Although it is difficult to estimate how many symmetrical membrane proteins are present in the genomes, one can obtain a rough estimate by examining known membrane-protein structures. Currently, there are 44 known α-helical membrane-protein families in the Protein Data Bank (Raman et al., 2006). Of these membrane-protein families, 52% form homo-oligomeric structures. Furthermore, in 31% of α-helical membrane-protein families the association of monomers forms the region responsible for the functionality of the protein. For example, in many ion channels the ion-conducting pathway coincides with the symmetry axis of the channel. Although these statistics might not hold in the future, there is a significant chance that many new membrane-protein structures will be symmetric.
Achieving a high-resolution structure of a membrane protein is a formidable task. Once the barrier of producing diffraction-quality crystals has been overcome, the next hurdle is the phase problem. We have shown that membrane proteins can provide sufficient constraints in the placement of α-helices to make ab initio molecular replacement possible. Therefore, our ab initio molecular-replacement method should be an important tool for phasing membrane proteins.