From electron microscopy to X-ray crystallography: molecular-replacement case studies

Test studies were conducted on five crystal structures of large molecular assemblies, in which EM maps were used as search models for structure solution by molecular replacement (MR) with the standard MR packages AMoRe, MOLREP and Phaser.


Introduction
Developments in technology and methodology have increasingly narrowed the gap between the two major structural biology techniques: X-ray crystallography and single-particle cryo-electron microscopy (cryo-EM). X-ray crystallography, the most effective method for obtaining the atomic model of a molecule, is increasingly being used to tackle the structures of large molecular complexes. Structures of macromolecules of megadalton molecular weight, such as the ribosome and fatty-acid synthase, have been solved by X-ray crystallography (Jenni et al., 2006; Korostelev et al., 2006; Nissen et al., 2000; Selmer et al., 2006; Lomakin et al., 2007). These large complexes have traditionally been studied by cryo-EM, in which two-dimensional projection images of the molecule are used to reconstruct a three-dimensional map at low resolution (Baker et al., 1999). Because of recent improvements in EM instruments and reconstruction techniques, cryo-EM reconstructions at resolutions better than 10 Å are now common (Rossmann et al., 2005). Structures have even reached 4 Å resolution (van Heel et al., 2000), approaching the atomic resolution obtained by X-ray crystallography.
These advances facilitate the interplay of the two methods, either by providing an EM image for X-ray studies or by providing an X-ray structure for EM studies. In the former case, an EM image is used as a search model for X-ray crystal structure determination by molecular replacement (MR). The initial low-resolution MR phases can then be extended to high resolution by density-modification techniques such as noncrystallographic symmetry (NCS) averaging. They can also be used to locate heavy-atom positions in difference Fourier maps, followed by heavy-atom phasing by isomorphous replacement or anomalous dispersion. In the latter case, parts of a large molecular complex whose structures have been solved by X-ray crystallography can be docked into the EM reconstruction. This frequently gives a more detailed structural description of the complex than either the EM map or the individual X-ray structures alone.
The exchange of information between the two techniques has been facilitated by a concerted effort to make EM images available in a standard convention (Heymann et al., 2005). A significant development is the establishment of a public database for three-dimensional EM data deposition, the Electron Microscopy Data Bank (EMDB; http://www.ebi.ac.uk/msd). EM images are deposited in the same CCP4 map format as that used in X-ray crystallography, which should significantly increase the use of EM data by the X-ray crystallographic community.
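To illustrate what sharing the CCP4 map format entails, the following Python sketch builds and parses the fixed 1024-byte main header of a CCP4/MRC-style map using only the standard `struct` module. It is a minimal sketch under stated assumptions: little-endian byte order, no extended header, and only a few fields handled (NC/NR/NS in words 1-3, MODE in word 4, sampling NX/NY/NZ in words 8-10, cell in words 11-16, axis order in words 17-19, space group in word 23 and the 'MAP ' stamp in word 53). The helper names `make_header` and `read_header` are hypothetical; a production reader should honour the machine stamp and the axis-order words. The grid step (Å per voxel) is the cell edge divided by the sampling, which is the quantity that must be corrected when the map is scaled to the correct magnification.

```python
import struct

HEADER_BYTES = 1024  # fixed-size main header of a CCP4/MRC map file


def make_header(grid, sampling, cell, mode=2):
    """Build a minimal little-endian CCP4-style header (illustration only)."""
    buf = bytearray(HEADER_BYTES)
    nc, nr, ns = grid
    # Words 1-10: NC, NR, NS, MODE, NCSTART, NRSTART, NSSTART, NX, NY, NZ
    struct.pack_into("<10i", buf, 0, nc, nr, ns, mode, 0, 0, 0, *sampling)
    # Words 11-16: cell edges (angstroms) and angles (degrees) as float32
    struct.pack_into("<6f", buf, 40, *cell)
    # Words 17-19: axis order MAPC, MAPR, MAPS (1 = X, 2 = Y, 3 = Z)
    struct.pack_into("<3i", buf, 64, 1, 2, 3)
    # Word 23: space-group number (1 = P1)
    struct.pack_into("<i", buf, 88, 1)
    # Word 53: file-type stamp; word 54: a common little-endian machine stamp
    buf[208:212] = b"MAP "
    buf[212:216] = bytes([0x44, 0x41, 0x00, 0x00])
    return bytes(buf)


def read_header(buf):
    """Parse a few fields of a little-endian CCP4-style header."""
    if len(buf) < HEADER_BYTES or buf[208:212] != b"MAP ":
        raise ValueError("not a CCP4/MRC map header")
    words = struct.unpack_from("<10i", buf, 0)
    cell = struct.unpack_from("<6f", buf, 40)
    sampling = words[7:10]
    return {
        "grid": words[0:3],
        "mode": words[3],
        "sampling": sampling,
        "cell": cell,
        # Grid step in angstroms per voxel along X, Y and Z
        "step": tuple(edge / n for edge, n in zip(cell[:3], sampling)),
    }
```

For example, a header for a 100 × 100 × 100 grid in a 300 Å cubic cell parses back with a grid step of 3.0 Å along each axis.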
Here, I will discuss issues related to the use of EM images as MR models for X-ray crystal structure determination. This study complements an excellent review by Dodson (2001). I will present three case studies, in which EM maps of the 50S ribosomal subunit, the 70S ribosome and fatty-acid synthase (FAS) were used to solve their X-ray crystal structures. Information about the EM images and the crystal data is summarized in Table 1. A brief procedure for preparing EM maps for MR packages such as AMoRe (Navaza, 2001), MOLREP (Vagin & Teplyakov, 1997) and Phaser (Read, 2001) will be described. The effects of errors, both in the EM data and in the MR solutions, on structure determination and phase extension will be discussed.

EM map preparation for molecular replacement
The use of EM maps for molecular replacement exactly parallels the use of atomic coordinates. Before an MR run, a model-preparation step is carried out in which the model is placed in an artificial unit cell with dimensions three or four times as large as those of the model. This ensures fine sampling of reciprocal space when the model is Fourier transformed to structure factors. The finely sampled model structure factors can then be interpolated to compute 'crystal' structure factors whenever they are needed in the MR procedure, for example when preparing input for the rotation and translation functions or when computing structure factors for the moving models during rigid-body refinement. The procedures for preparing the EM map are described below, together with the required programs, which are either from the CCP4 suite (Collaborative Computational Project, Number 4, 1994) or from the RAVE package of the Uppsala Software Factory (Kleywegt & Jones, 1999). A more EM-friendly version of AMoRe distributed by the program's author does not require steps (ii) and (iii) described below.
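The padding arithmetic can be sketched as follows. The reciprocal-lattice spacing of a cell with edge a is 1/a, so a box three times the size of the model samples the model transform about three times more finely than a cell that just encloses it. Assumptions not taken from the text: the model is represented by its largest dimension, the grid spacing is d_min/3 (a common rule of thumb), and the grid count is rounded up to an FFT-friendly number whose only prime factors are 2, 3 and 5. The helper `padded_p1_box` is hypothetical and stands in for the cell and grid parameters one would give to a program such as MAPROT.

```python
import math


def fft_friendly(n):
    """True if n > 0 has no prime factors other than 2, 3 and 5."""
    if n < 1:
        return False
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p
    return n == 1


def padded_p1_box(model_extent, d_min, pad_factor=3.0, oversampling=3.0):
    """Hypothetical helper: cubic P1 box for Fourier transforming a model.

    model_extent: largest model dimension (angstroms)
    d_min: high-resolution limit (angstroms)
    pad_factor: box edge relative to the model (the text suggests 3-4)
    oversampling: grid points per d_min (assumed; d_min/3 is a common choice)
    """
    step = d_min / oversampling
    n = math.ceil(model_extent * pad_factor / step)
    while not fft_friendly(n):
        n += 1
    edge = n * step
    return {
        "edge": edge,                 # cell edge in angstroms
        "ngrid": n,                   # grid points along the edge
        "step": step,                 # grid spacing in angstroms
        "recip_spacing": 1.0 / edge,  # reciprocal-lattice spacing (1/angstrom)
    }
```

For a hypothetical 250 Å model at 12 Å resolution, `padded_p1_box(250.0, 12.0)` gives a 192-point grid with a 768 Å edge, so the transform is sampled every 1/768 Å⁻¹ rather than every 1/250 Å⁻¹.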
(i) Scale the EM map to the correct magnification. EM maps are frequently deposited on a three-dimensional grid of arbitrary step (pixel) size, which gives the molecule an arbitrary size. The true grid step size should be obtained from the EM experimentalists and applied to the EM map to yield a molecule of the correct size. Alternatively, the size of the EM map can be adjusted by comparison with a separate atomic model when one is available. The size can conveniently be adjusted by using MAPMAN to convert the CCP4-format map to an ASCII (text) format such as the X-PLOR/CNS format and then editing the unit-cell parameters in the map header. MAPMAN can also produce a NEWEZD-type ASCII map whose unit-cell parameters can be edited in the same way.
(ii) Place the EM map into a large P1 cell. This can be achieved using the program MAPROT with desired unit-cell parameters and grid steps specified. The program requires a mask that covers the molecule. The mask can be converted from the EM map itself using programs such as MAMA or MAPMASK. The resulting EM map in the large unit cell can then be directly supplied to the MR program MOLREP as the search model.
(iii) Calculate structure factors from the placed model. When the current CCP4 distributions of AMoRe or Phaser are used, the EM map needs to be Fourier transformed to structure factors with a program such as SFALL. The structure factors can then be input either into AMoRe, whose sorting function (SORTFUN) generates the model structure-factor table in place of TABLING in an ordinary AMoRe run, or into Phaser to define the search ensemble.
(iv) Compare the mean intensity (or amplitude) as a function of resolution (the power spectrum) of the data calculated from the EM map with that of the X-ray data. Similar distributions over the resolution range used in molecular replacement make the scaling of Fc to Fobs more accurate. The comparison also guides the choice of the resolution range for the MR search and of the B factor by which to sharpen or blur the EM image.
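The power-spectrum comparison in step (iv) can be sketched as a fit of the log ratio of shell-averaged intensities against s² = 1/d². This sketch assumes the Debye-Waller form, in which amplitudes scale by exp(−B s²/4) and intensities by exp(−B s²/2), so that ln(⟨I_obs⟩/⟨I_calc⟩) = ln K − (B/2) s². Shells of equal width in s² and input as plain (d, I) pairs are further assumptions; the function names are hypothetical and do not correspond to any CCP4 program. A positive B suggests blurring the EM-derived data by B; a negative B suggests sharpening.

```python
import math


def shell_means(data, nshells, s2_min, s2_max):
    """Mean intensity in equal-width shells of s^2 = 1/d^2; data is [(d, I), ...]."""
    width = (s2_max - s2_min) / nshells
    sums = [0.0] * nshells
    counts = [0] * nshells
    for d, intensity in data:
        s2 = 1.0 / (d * d)
        if s2_min <= s2 < s2_max:
            k = min(int((s2 - s2_min) / width), nshells - 1)
            sums[k] += intensity
            counts[k] += 1
    centres = [s2_min + (k + 0.5) * width for k in range(nshells)]
    return [(c, sums[k] / counts[k]) for k, c in enumerate(centres) if counts[k]]


def relative_b(obs, calc, nshells, d_max, d_min):
    """Fit ln(<I_obs>/<I_calc>) = ln K - (B/2) s^2 by least squares; return (K, B)."""
    s2_min, s2_max = 1.0 / d_max**2, 1.0 / d_min**2
    mean_obs = dict(shell_means(obs, nshells, s2_min, s2_max))
    mean_calc = dict(shell_means(calc, nshells, s2_min, s2_max))
    pts = [(s2, math.log(mean_obs[s2] / mean_calc[s2])) for s2 in mean_obs if s2 in mean_calc]
    if len(pts) < 2:
        raise ValueError("need at least two populated shells")
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    slope = sum((x - mx) * (y - my) for x, y in pts) / sum((x - mx) ** 2 for x, _ in pts)
    return math.exp(my - slope * mx), -2.0 * slope
```

In practice the fit would be restricted to the resolution range intended for the MR search, since the low-resolution shells are dominated by solvent scattering.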

Case study 1: the effects of EM magnification or scaling errors
Because electron microscopy cannot directly determine the absolute size of the molecule, and the magnification error can be as large as 5% (Dodson, 2001), it is important to investigate how this error affects structure determination by molecular replacement. Test studies were conducted using an EM map of the Escherichia coli 50S ribosomal subunit (Matadeen et al., 1999; EMDB accession code 1019) as the search model against X-ray data from Haloarcula marismortui 50S crystals (Nissen et al., 2000). The EM map was first adjusted to approximately the correct size by comparison with the known atomic model of the 50S ribosomal subunit. Magnification errors from −15% to +15% were then introduced by altering the map header as described above. MR tests using the altered EM maps as search models were carried out with MOLREP at resolutions of 10 and 20 Å. In a second series of tests, structure factors were calculated from these EM maps and used as test data for an MR search with Phaser, using an atomic model of H. marismortui 50S as the search model.
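The header edit that introduces a magnification error can be sketched directly on the binary CCP4 header, as an alternative to the MAPMAN text-conversion route described above. Multiplying the cell edges (words 11-13) by a factor while leaving the grid sampling unchanged scales the voxel size, and hence the apparent size of the molecule, by the same factor. The sketch assumes a little-endian header with the standard word positions; `rescale_cell` and `magnification_series` are hypothetical names.

```python
import struct

CELL_OFFSET = 40  # byte offset of words 11-13 (cell edges, float32) in a CCP4 header


def rescale_cell(header, factor, endian="<"):
    """Return a copy of a CCP4 map header with the cell edges multiplied by factor.

    The grid sampling (words 8-10) is untouched, so the angstrom size of each
    voxel, and therefore of the molecule, scales by the same factor.
    """
    buf = bytearray(header)
    a, b, c = struct.unpack_from(endian + "3f", buf, CELL_OFFSET)
    struct.pack_into(endian + "3f", buf, CELL_OFFSET, a * factor, b * factor, c * factor)
    return bytes(buf)


def magnification_series(header, errors_percent):
    """Map each magnification error (percent, e.g. -15..+15) to an altered header."""
    return {e: rescale_cell(header, 1.0 + e / 100.0) for e in errors_percent}
```

Each altered header, written back in front of the unchanged density values, gives one search model for a test series like the one described above.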
The test results indicate that MR solutions are not very sensitive to EM magnification errors and that the tolerance depends on the highest resolution used in the search: errors of up to ±10% can be tolerated at 20 Å resolution, but only up to ±5% at 10 Å resolution (Fig. 1a). Similarly, when the atomic model was used to search against 'crystal' data calculated from EM maps of altered sizes, a solution could be found with size errors of up to ±10% (Fig. 1b).
Even though the correct magnification may not be crucial for molecular replacement, it is important for later steps such as phase extension because it provides better starting phases. It is therefore advantageous to determine the correct magnification by testing the correlation of the X-ray data with data calculated from EM maps of different sizes, or alternatively by choosing the size that gives the best MR solution. In this test case, the correct EM map size is ~2% smaller than that of the starting map (Fig. 1).
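A scan for the magnification that maximizes the correlation with the X-ray data can be sketched as below. The callback `calc_for_scale` is hypothetical: in a real application it would rescale the EM map, compute structure factors and match them reflection by reflection to Fobs. To keep the example self-contained, the usage replaces the EM map by a synthetic uniform sphere, whose amplitude is |3(sin x − x cos x)/x³| with x = 2πsR; this toy model only illustrates the scan and does not reproduce the 50S result.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_scale(f_obs, calc_for_scale, scales):
    """Return (correlation, scale) for the map scale that best matches f_obs.

    calc_for_scale(k) is a hypothetical callback giving amplitudes calculated
    from the EM map rescaled by k, in the same reflection order as f_obs.
    """
    return max((pearson(f_obs, calc_for_scale(k)), k) for k in scales)


def sphere_amplitudes(radius, s_values):
    """Synthetic stand-in for map-derived amplitudes: a uniform sphere."""
    out = []
    for s in s_values:
        x = 2.0 * math.pi * s * radius
        out.append(abs(3.0 * (math.sin(x) - x * math.cos(x)) / x**3))
    return out


# Toy usage: 'observed' data from a sphere 2% smaller than the starting model
s_values = [1 / 60 + i * (1 / 20 - 1 / 60) / 200 for i in range(200)]
f_obs = sphere_amplitudes(49.0, s_values)
corr, scale = best_scale(
    f_obs, lambda k: sphere_amplitudes(50.0 * k, s_values), [i / 100 for i in range(90, 111)]
)
```

In this toy case the scan recovers the 0.98 scale used to generate the synthetic 'observed' amplitudes.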

Case study 2: E. coli 70S ribosome in two crystal forms
The two recently published crystal structures of the E. coli 70S ribosome (Korostelev et al., 2006; Selmer et al., 2006) provide excellent MR test cases for studying the effects of different space groups and different numbers of independent molecules in the asymmetric unit (AU) of the crystal. There is one 70S molecule in the AU of the I422 crystal and there are two in the AU of the P2₁2₁2₁ crystal. A cryo-EM map of the 70S particle at 11 Å resolution (Rawat et al., 2003; EMDB accession code 1008) was used as the search model, and the test runs were carried out at 12 Å resolution with Phaser.
The MR search in the I422 crystal form was straightforward, with outstanding peaks for the correct solution in both the rotational and translational searches (Table 2). In the P2₁2₁2₁ crystal form, two orientations were clearly found in the rotational search, and two corresponding translational solutions were also found when searching for the first molecule alone (Table 2). These two translational solutions from the automated search routine are each correct individually, but may be referred to different, arbitrarily chosen origins. However, the automated routine did not find a solution for the second molecule. The structure was solved by running a second translational search in Phaser with molecule 1 fixed manually and with the rotation angles of molecule 2, taken from the first Phaser run that searched for molecule 1, also fixed.

Case study 3: an unknown structure of yeast fatty-acid synthase (FAS)
Yeast FAS, a 2.6 MDa α6β6 protein complex with point-group symmetry 32, was crystallized in two space groups, with one FAS particle in the AU of a P2₁ crystal and half an FAS particle in the AU of a P4₃2₁2 crystal (Lomakin et al., 2007). An EM map of yeast FAS (… Stoop, University of Texas; Kolodziej et al., 1997) was used as the search model (Fig. 2a).
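The NCS implied by point-group symmetry 32 can be made concrete with a short sketch. The group has six operators: the identity, two threefold rotations and three twofold rotations perpendicular to the threefold axis. With one particle per AU in P2₁ all six act as NCS operators, whereas in P4₃2₁2 one particle twofold coincides with a crystallographic twofold, leaving threefold NCS within the half particle. The rotation angle κ of each operator, from cos κ = (trace − 1)/2, predicts where self-rotation peaks appear (κ = 120° and κ = 180°). The orientation used here (threefold along z, one twofold along x) is an assumed reference frame, not the crystal setting.

```python
import math

TWOFOLD_X = ((1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, -1.0))


def rot_z(deg):
    """Rotation matrix for a rotation of deg degrees about z."""
    t = math.radians(deg)
    c, s = math.cos(t), math.sin(t)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)) for i in range(3)
    )


def point_group_32():
    """Six operators of point group 32: threefold along z, twofolds in the xy plane."""
    threefolds = [rot_z(0.0), rot_z(120.0), rot_z(240.0)]
    return threefolds + [matmul(r, TWOFOLD_X) for r in threefolds]


def kappa(r):
    """Rotation angle (degrees) of a proper rotation matrix, from its trace."""
    cos_k = (r[0][0] + r[1][1] + r[2][2] - 1.0) / 2.0
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_k))))
```

Rounded, the six κ values are 0°, 120°, 120°, 180°, 180° and 180°, i.e. one threefold axis and three twofold axes.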

MR solution in the P2₁ crystal
An outstanding solution was found at 24 Å resolution with each of AMoRe, MOLREP and Phaser. The solution was corroborated by packing analysis and by comparing the self-rotation peaks calculated from the positioned model with those of the observed X-ray data (Fig. 2b). However, phase extension to high resolution by sixfold NCS averaging was unsuccessful, presumably because the low-resolution MR solution was inaccurate, giving errors in the NCS operators. Compared with the final solution from an atomic model that later became available, the solution obtained with the EM map model had a positional error of a few ångströms in the xz (ac) plane (Fig. 2c). This error is presumably too large to be corrected by NCS-operator refinement during the averaging process.

MR solution in the P4₃2₁2 crystal
No MR solution could be found in the P4₃2₁2 crystal form with standard searches using either the full or half FAS particle as the model, despite extensive trials with various resolution cutoffs and magnification and intensity corrections. However, analysis of the crystal content and the X-ray data greatly simplifies molecular replacement. Because there is only half an FAS particle in the AU, the other half of the particle must be generated by a crystallographic twofold axis; the particle must therefore lie on that axis. The orientation of the particle about the twofold axis is defined by its threefold axis, which can be found from the peaks in the threefold section (κ = 120°) of the self-rotation function (Fig. 3a). This analysis reduced the search from six dimensions to one, along the xy diagonal on which the crystallographic twofold axis lies (Fig. 3b).
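The reduction to a one-dimensional search can be sketched by enumerating trial particle centres along the twofold axis. This sketch assumes the International Tables setting of P4₃2₁2, in which the operator (x, y, z) → (y, x, −z) is a twofold along [110] through the origin, so its fixed points are (t, t, 0) (and (t, t, ½), which the operator (−x, −y, z + ½) relates to the z = 0 line). Exact fractions are used so that the on-axis test is not affected by rounding. The helper names are hypothetical, and the Phaser line search itself is not reproduced.

```python
from fractions import Fraction


def twofold_110(p):
    """Assumed crystallographic twofold of P4_3 2_1 2: (x, y, z) -> (y, x, -z)."""
    x, y, z = p
    return (y, x, -z)


def on_axis(p):
    """True if the twofold maps point p onto itself modulo a lattice translation."""
    q = twofold_110(p)
    return all((a - b) % 1 == 0 for a, b in zip(p, q))


def line_search_points(n):
    """n trial centres (t, t, 0), evenly spaced along the [110] diagonal, 0 <= t < 1."""
    return [(Fraction(i, n), Fraction(i, n), Fraction(0)) for i in range(n)]
```

At each trial centre only the position t varies; the orientation is fixed by the particle twofold lying on the crystallographic axis and the threefold axis taken from the self-rotation peak.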
A one-dimensional search was carried out using the line-search option in Phaser. The EM map of half an FAS particle was placed on the xy diagonal with its threefold axis oriented according to the self-rotation peak. A solution was found but was initially unrecognized for two reasons. Firstly, the Z score of the correct solution was only 3.7, well below the value of 5 above which a solution is considered to be correct. Secondly, this solution scored below a false solution with a Z score of 4.0. The solution was identified a posteriori, once an MR solution had been determined from an atomic model consisting of the Cα atoms of homologous proteins for several domains of the molecule (Jenni et al., 2006). Although this atomic model accounts for less than 5% of the diffracting mass of the FAS particle, Phaser readily found the MR solution using data in the 60–10 Å resolution range (rotational Z score 8.7; translational Z score 17.3). This is because the Cα atoms come from domains that are distributed relatively evenly within the FAS particle; they define the shape of the molecule sufficiently well and correlate well with the FAS electron density at low resolution. However, the Cα model worked only at resolutions higher than 12 Å in both crystal forms. Neither the cryo-EM model nor the Cα model yielded a solution for the tetragonal crystal at resolutions lower than 12 Å, presumably because of the strong solvent contribution to the diffraction at low resolution.
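Why a correct solution can be overlooked follows from what a Z score measures: the number of standard deviations by which a trial score exceeds the mean of all trial scores. It is therefore a relative measure, and a correct peak at Z = 3.7 can rank below a false peak at Z = 4.0. The sketch below is a simplified illustration of this idea, not Phaser's internal calculation; the threshold of 5 is taken from the text, and the function names are hypothetical.

```python
import statistics


def z_scores(scores):
    """Z = (score - mean) / standard deviation over all trial solutions."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    return [(s - mean) / sd for s in scores]


def ranked(scores, threshold=5.0):
    """Trial indices sorted by Z score, each with its Z and a pass/fail flag."""
    zs = z_scores(scores)
    order = sorted(range(len(scores)), key=lambda i: zs[i], reverse=True)
    return [(i, round(zs[i], 2), zs[i] >= threshold) for i in order]
```

Because every Z score depends on the spread of the whole search, a weak but correct signal in a noisy search can stay below the threshold, as happened in the tetragonal FAS case.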

EM model phase extension
Phase extension by cross-crystal averaging between the two crystal forms was carried out with the program DMMULTI, starting from phases calculated from an EM map correctly placed by Phaser. The initial averaging mask was defined by the EM map. The NCS matrices within each crystal and those relating the two crystals were defined by the MR solutions from the atomic model. Because of the ninefold NCS (sixfold in the P2₁ crystal and threefold in the P4₃2₁2 crystal) and the accuracy of the NCS axes, phase extension was successful and produced an electron-density map of extraordinary quality at 4 Å resolution (Fig. 4). The map was calculated from X-ray structure factors sharpened with a B factor of −120 Å². B-factor sharpening dramatically enhanced the high-resolution details of the electron-density map; its magnitude was chosen to maximize this enhancement without overinflating the noise at high resolution. The NCS matrices defined by the MR solution from the EM map model might also have worked for cross-crystal averaging, because the high degree of NCS averaging might give a radius of convergence large enough to refine the NCS matrices and phases to the correct values.
Fig. 3. Self-rotation (a) and packing analysis (b) of the P4₃2₁2 FAS crystal. The threefold axis of the FAS particle is indicated by the white triangle in (b).
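The effect of the −120 Å² sharpening can be quantified with a short calculation. Assuming the conventional Debye-Waller form for amplitudes, F′ = F exp(−B s²/4) with s = 1/d, a negative B boosts high-resolution terms: for B = −120 Å² the factor is exp(120/(4 × 4²)) = exp(1.875) ≈ 6.5 at 4 Å, but only about 1.35 at 10 Å and 1.08 at 20 Å. The same factor multiplies the noise in those shells, which is why the magnitude has to be chosen with care. The function names are hypothetical.

```python
import math


def sharpening_factor(d, b):
    """Amplitude multiplier exp(-B s^2 / 4) with s = 1/d; negative B sharpens."""
    return math.exp(-b / (4.0 * d * d))


def sharpen(reflections, b):
    """Apply B-factor sharpening (or blurring) to [(d, amplitude), ...]."""
    return [(d, f * sharpening_factor(d, b)) for d, f in reflections]
```

A positive B gives factors below 1 and blurs the map instead, which is the case relevant to damping high-resolution terms of a search model.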

Conclusions
Test cases demonstrate that cryo-EM maps are viable models for solving X-ray crystal structures by MR. Of the five X-ray diffraction data sets tested, only one could not be solved by molecular replacement with an EM model using standard procedures. A cryo-EM model worked at very low resolution (82–24 Å) in a case where an incomplete atomic model could not produce a solution in the same resolution range. When a high degree of NCS is present, phases generated from the low-resolution EM map can be extended to high resolution by averaging. NCS is common, because a large fraction of macromolecules fortuitously crystallize with more than one molecule in the asymmetric unit. Furthermore, EM maps are most likely to be used as search models for large molecular complexes, which are frequently highly symmetric and may therefore crystallize with a high degree of NCS.