CCP4 study weekend
Phasing the 30S ribosomal subunit structure
MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, England
*Correspondence e-mail: ramak@mrc-lmb.cam.ac.uk
The methods involved in determining the 850 kDa structure of the 30S ribosomal subunit from Thermus thermophilus were in many ways identical to those that are generally used in standard protein crystallography. This paper reviews and analyses the methods that can be used in phasing such large structures and shows that the anomalous signal collected from heavy-atom compounds bound to the RNA is both necessary and sufficient for ab initio structure determination at high resolution. In addition, measures to counter problems with non-isomorphism and radiation decay are described.
Keywords: ribosomes; phasing; 30S ribosomal subunit; protein translation.
1. Introduction
In recent years, the limit on the kinds of macromolecules that can be tackled by traditional crystallographic methods at a resolution where a complete atomic model can be built (based on the electron density and knowledge of the molecular structure of the sample) has been pushed into the megadalton range. Several structures of both of the bacterial ribosomal subunits, the small 30S subunit of approximately 850 kDa and the large 50S subunit of around 1.5 MDa, have been determined using a combination of heavy-atom phasing methods (Ban et al., 2000; Schluenzen et al., 2000; Wimberly et al., 2000), as has the ten-subunit structure of the yeast RNA polymerase II holoenzyme of more than 500 kDa (Cramer et al., 2001). These structures are not only the result of decades of experience in how to stabilize and crystallize large complexes, but also owe their success to major technical innovations in data collection and to the large group of scientists who work hard to improve and expand synchrotron-radiation facilities around the world. Together, these structures provide hope that the crystallographic method is, in principle, without limits in terms of the size of macromolecule and unit cell that can be handled.
It is a common misperception that determination of very large structures requires an entirely different set of data-collection and phasing methods compared with those used for smaller individual protein structures. Traditionally, it has been believed that large clusters of heavy atoms were needed to crack the initial phase problem in these cases (Blundell & Johnson, 1976). This is with good reason, because single heavy-atom sites can be extremely difficult if not impossible to locate manually in noisy Patterson maps of large unit cells, a problem which can be alleviated by using clusters which appear as large `super-atoms' at low resolution. Once located, such clusters will provide very good phase information, but only at low resolution unless the individual atoms can be resolved, which is rarely the case. These phases can then be used to calculate difference Fourier maps for separate single heavy-atom soaks which are inherently less noisy than the Patterson maps. For structures in the megadalton range, such as the 50S subunit and 70S ribosome structures, the initial identification of cluster positions was aided by using crude low-resolution cryoelectron-microscopy masks as kinds of molecular-replacement search models (Ban et al., 1998; Cate, 2001; Cate et al., 1999). Whereas phase information from clusters or low-resolution masks can indeed be a convenient tool in the initial search for heavy atoms, it is often not imperative for either the location of heavy atoms or the ab initio phase calculation, as we will show later. Heavy-atom location and refinement programs are now so sensitive that in most cases they will be able to locate the individual sites directly without prior phase information. In the following, we will describe details of the phasing and structure solution of the 30S ribosomal subunit from Thermus thermophilus and discuss some of the specific problems that were addressed.
2. Data collection
Phasing of very large structures often goes hand in hand with attempts to improve the resolution of the diffraction data. Rod-shaped crystals of the 30S subunit from T. thermophilus with space-group symmetry P4₁2₁2 grow in about 15% MPD (Trakhanov et al., 1987) and reach a final size of 80–120 µm across after several weeks or even months at 277 K (Clemons et al., 2001). Efforts to improve the quality of the diffraction data from these crystals were made on several fronts, particularly by standardizing and automating crystal handling and mounting, pre-screening of crystals to select those suitable for high-resolution studies, fine-slicing φ-oscillations and the use of multiple crystals per data set (Clemons et al., 2001).
2.1. Crystal decay
Crystals of the 30S subunit are extremely sensitive to radiation, even at 100 K. This does not seem to be a general feature of large protein–RNA complex crystals, since crystals of both 50S subunits and 70S ribosomes (which have approximately the same protein:RNA ratio as the 30S subunit) are much less sensitive. However, such effects may well be more pronounced for large complexes, when they occur. Crystal decay poses problems, particularly when collecting anomalous heavy-atom data, where phasing is dependent upon small differences between the Friedel mates in the reflection set. For this reason, we decided early on to use the three-circle κ-goniostat installed at the Advanced Photon Source (APS) beamline 19ID to orient crystals during anomalous heavy-atom data collections (Wimberly et al., 2000). By orienting a crystal accurately so that it rotates about a normal to a mirror plane in reciprocal space, not only are Friedel mates recorded on identical or close frames, but it is also easier to keep track of which parts of reciprocal space have been covered by the individual wedges of data. The space group P4₁2₁2 belongs to the point group 4/mmm, which means that two mirror planes and a fourfold axis are present in the a*b* plane and a maximum of 45° of data needs to be collected for a complete data set if the crystal is aligned with c* on the spindle axis. To avoid overlaps at high resolution (in this case between 3.5 and 3.0 Å), very thin slicing of reciprocal space was necessary and most data sets were collected using 0.1° oscillations. Together with the crystal decay problems, however, this meant that only about 6–8° of data could be collected from each crystal and consequently that about 5–10 crystals were needed for each full data set.
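The arithmetic behind the last two sentences can be sketched in a few lines of Python. This is only an illustration: the function name `wedge_plan` is hypothetical, and it assumes ideal alignment, non-overlapping wedges and a single pass over the 45° range (no redundancy), none of which is guaranteed in practice.

```python
import math

def wedge_plan(total_range_deg, osc_width_deg, usable_range_per_crystal_deg):
    """Return (frames, crystals) needed to cover total_range_deg exactly once.

    Assumes non-overlapping wedges and no redundancy (illustrative only).
    """
    frames = round(total_range_deg / osc_width_deg)
    crystals = math.ceil(total_range_deg / usable_range_per_crystal_deg)
    return frames, crystals

# 45 degrees in 0.1 degree slices, with 6-8 degrees usable per crystal
for usable in (6.0, 8.0):
    frames, crystals = wedge_plan(45.0, 0.1, usable)
    print(f"{usable:.0f} deg per crystal: {frames} frames, {crystals} crystals")
```

Under these assumptions the minimum is six to eight crystals per data set, which sits inside the 5–10 quoted above once overlap between wedges and discarded crystals are allowed for.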
2.2. Unit cell and mosaicity
The unit cells of crystals of large macromolecules often show significant variation and this can cause problems when merging data from several crystals together. Furthermore, proper separation of high-resolution reflections is dependent on the mosaicity not being too high. In practice, this means that only a small fraction of a population of crystals will be suitable for data collection at high resolution and these can conveniently be selected at an early stage by screening all available crystals and selecting only those that fit the criteria. For the 30S subunit, crystals were initially screened at Daresbury SRS beamlines 9.6, 14.1 and 14.2 and selected for diffraction limit and mosaicity as judged from two single 0.1° oscillation images collected at right angles to one another. Crystals which met the criteria of diffracting to at least 4 Å at this lower-intensity source with an apparent mosaicity of less than 0.35° were kept and used for high-resolution data collection at higher-brilliance sources such as ESRF ID14-4 and APS 19ID. Resolution was often judged as the point where the average I/σ(I) of integrated non-merged reflections fell below 2.0–3.0, whereas the mosaicity could conveniently be estimated by the program MOSFLM, which in its more recent versions includes a feature to estimate mosaicity from a single frame (Collaborative Computational Project, Number 4, 1994; Leslie, 1992).
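The screening criteria described above can be expressed as a simple filter. The sketch below is illustrative only: the class and function names are hypothetical, the shell statistics are invented for the example, and it assumes that the resolution limit is taken as the last resolution shell whose mean I/σ(I) stays at or above the chosen threshold.

```python
from dataclasses import dataclass

@dataclass
class ScreenedCrystal:
    name: str
    resolution_limit_A: float   # estimated diffraction limit (Angstrom)
    mosaicity_deg: float        # apparent mosaicity (degrees)

def resolution_limit(shells, threshold=2.0):
    """Estimate the resolution limit from per-shell mean I/sigma(I).

    `shells` is a list of (d_min_A, mean_i_over_sigma) tuples ordered from
    low to high resolution. Returns the d_min of the last shell before the
    mean I/sigma(I) first drops below `threshold`, or None if no shell passes.
    """
    limit = None
    for d_min, i_over_sigma in shells:
        if i_over_sigma < threshold:
            break
        limit = d_min
    return limit

def passes_screen(crystal, max_resolution_A=4.0, max_mosaicity_deg=0.35):
    """Apply the two criteria from the text: at least 4 A and mosaicity < 0.35 deg."""
    return (crystal.resolution_limit_A <= max_resolution_A
            and crystal.mosaicity_deg < max_mosaicity_deg)

# Invented example values, for illustration only
shells = [(8.0, 20.0), (5.0, 9.0), (4.0, 3.1), (3.5, 1.4)]
crystal = ScreenedCrystal("xtal-01", resolution_limit(shells), 0.30)
print(crystal, passes_screen(crystal))
```

The threshold is left as a parameter because the text quotes a range (2.0–3.0) rather than a single cutoff.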
2.3. Multiple shots
More recently, we have been able to fine-tune the size of the collimated X-ray beam not only to avoid excess background on the diffraction images, but also to maximize the number of shots that can be performed along the length of a single crystal. In this way, it has been possible to collect as many as ten individual wedges from a single crystal and thus a complete data set, which reduces many of the problems with completeness and unit-cell variability which sometimes change randomly throughout the crystal. On the other hand, the mosaicity can often be seen to change gradually but significantly from one end of the crystal to the other. Fig. 1 shows a 30S subunit crystal mounted in a loop which has been shot multiple times along its length using a beam size of 40 µm (horizontal) by 100 µm (vertical).
3. Phasing at low resolution
During the early stages of the structure determination, attempts were made to carry out multiple-wavelength anomalous dispersion (MAD) experiments from 30S subunit crystals soaked in heavy-atom clusters such as W17 (Clemons et al., 1999). This was possible at a resolution of 9 Å and both dispersive and anomalous Patterson maps showed clear evidence for the binding of cluster super-atoms to the subunit (Fig. 2) (Clemons et al., 2001). Using the cluster positions, phases to low resolution but of very high quality could be calculated by treating the W17 cluster as a point scatterer. These phases could now be used to check for the positions of other cluster compounds and even single heavy-atom positions by difference Fourier methods, which are much less sensitive to noise than the Patterson map. Using this approach, more than 20 different heavy-atom compounds, including several compounds of the lanthanide elements, as well as various osmium compounds which are known to bind to the major groove of RNA, were screened and found to bind to the subunit (Clemons et al., 1999). Data sets collected from these soaks were scaled to each other and the degree of isomorphism between them was estimated from the scaling R factor. From this, a group of soaks could be isolated (including osmium hexammine, two types of W clusters, thallium bromide and the chlorides of ytterbium and lutetium) which were internally compatible with one another as judged by the scaling R factor and it was decided to use these heavy atoms to push the phasing to higher resolution. However, because of the significant problems with crystal decay, it was not possible to collect multiple-wavelength data for the heavy-atom soaks at resolutions higher than 7–9 Å. Whenever this was attempted, the strong internal decay meant that the collective set of data was less useful than a single data set carefully collected at the peak of the anomalous signal.
Single-wavelength data sets to 5.5 Å were now collected at the peak of anomalous signal for the group of internally compatible heavy-atom compounds and an electron-density map was calculated. For each derivative, the quality of the phasing (anomalous) signal from the individual data sets could now be judged by comparing the peak heights of the heavy-atom peaks in the difference Fourier map calculated using a defined set of phases. After careful solvent flattening (see below), the electron-density map showed clear signs of both double-helical RNA and tubes corresponding to helices and strands of the proteins in the subunit. By combination of the information in this low-resolution map with the vast amount of structural biochemical information available for the ribosome (such as cross-links and footprints), it was possible to construct a reasonably accurate model of the most well ordered third of the RNA. In addition, each of the protein structures which had been determined in isolation by either crystallography or NMR could be placed with great certainty into the model (Clemons et al., 1999).
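The use of a scaling R factor as a measure of isomorphism between two soaks can be illustrated with a short sketch. The exact definition used for the 30S data is not given in the text, so the form below is an assumption: a single least-squares scale factor k between matched structure-factor amplitudes, followed by R = Σ|F_a − kF_b| / ΣF_a. Scaling programs typically apply local or resolution-dependent scaling instead, which is not attempted here, and the function names are hypothetical.

```python
def scale_factor(f_a, f_b):
    """Least-squares scale k minimising sum((F_a - k * F_b)**2)."""
    return sum(a * b for a, b in zip(f_a, f_b)) / sum(b * b for b in f_b)

def scaling_r_factor(f_a, f_b):
    """R = sum|F_a - k F_b| / sum(F_a) for matched amplitude lists (assumed form)."""
    if len(f_a) != len(f_b) or not f_a:
        raise ValueError("amplitude lists must be non-empty and of equal length")
    k = scale_factor(f_a, f_b)
    return sum(abs(a - k * b) for a, b in zip(f_a, f_b)) / sum(f_a)

# Invented amplitudes, for illustration only
soak_1 = [10.0, 20.0, 30.0]
soak_2 = [5.0, 10.0, 18.0]
print(f"scaling R factor = {scaling_r_factor(soak_1, soak_2):.3f}")
```

A low R between two soaks suggests that they can be grouped together; any cutoff used to define `internally compatible' would have to be chosen empirically and is not specified above.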
4. Extension of phases to high resolution
From analysing the 5.5 Å data, it was clear that the majority of the phasing signal was contributed by the osmium hexammine derivative, which had more than 50 sites in the 30S subunit. Phasing at higher resolution went hand in hand with efforts to push the resolution of the native crystals; eventually, native data extending to 3.0 Å and derivative data to 3.3 Å were collected at beamlines ID14-4 at ESRF and 19ID at the APS. For every derivative data set, the crystals were aligned absolutely using the κ-goniostat as described above in order to maximize the anomalous signal in spite of crystal decay. This also helped determine which parts of reciprocal space had been covered by each crystal in the data set. When collecting native data at ESRF, the program STRATEGY was used to ensure that complete data were obtained as efficiently as possible (Ravelli et al., 1997). On average, about ten crystals/wedges would be needed for a full data set at 3.0 Å.
Problems with lack of isomorphism are generally aggravated at higher resolution, but in addition we found that the unit-cell variability of the native 30S subunit crystals was much greater than for the osmium hexammine derivative, for which the unit-cell axes cluster in a relatively narrow region (Fig. 3). To overcome this problem, we went back to the very foundation of the isomorphous replacement method, namely the observation that crystals of pairs of isostructural small-molecule compounds, such as KMnO₄ and BaSO₄ or Ag₃AsS₃ and Ag₃SbS₃ (also known as isomorphous pairs), commonly show perfect isomorphism (Harker, 1956). For protein crystals, we approximate this idea by adding a heavy atom and comparing the resulting structure to that where the heavy atom is absent. Clearly, such structures are usually not isomorphous in the strictest sense and this can lead to problems during phasing. Therefore, the idea was to mimic the original small-molecule case more closely by adding a compound to the 30S subunit crystals which was lighter than osmium hexammine but was isostructural to it, in order to counter the observed differences between the native and derivative crystals. We chose cobalt hexammine and analysis of the unit-cell axis distribution of crystals soaked in this compound clearly showed not only that these were compatible with those of the osmium derivative, but also that they varied much less than the native (Fig. 3). Thus, for all subsequent work, the `native' crystals were first soaked in cobalt hexammine (Clemons et al., 2001).
Derivative data extending to between 3.2 and 4.5 Å were collected from a number of different osmium compounds, including osmium hexammine chloride, pentaammine(dinitrogen)osmium (II) chloride, pentaammine(trifluoromethanesulfonato)osmium (III) trifluoromethanesulfonate and osmium bipyridine, in addition to the lanthanide compound lutetium chloride (Table 1) (Wimberly et al., 2000). No phase information from heavy-atom clusters was included in the final phasing. Experimental phases were calculated using both the programs SOLVE (Terwilliger & Berendzen, 1999) and SHARP (de La Fortelle & Bricogne, 1997), followed by an elaborate solvent-flattening procedure (see below). Both the SHARP and SOLVE maps were good enough that both bases of RNA and individual protein side chains could be seen (Fig. 4).
Table 1 footnotes: ‡ Pentaammine(dinitrogen)osmium (II) chloride. § Pentaammine(trifluoromethanesulfonato)osmium (III) trifluoromethanesulfonate. ¶ Osmium bipyridine. †† Lutetium chloride.
If we investigate the heavy-atom binding sites for osmium and lutetium inside the 30S subunit, it is clear that both types of compounds prefer to bind to the RNA rather than to the proteins of the subunit. Fig. 5 shows a typical example of an osmium hexammine complex bound in the major groove of an A-form RNA helix and a nearby Lu atom interacting directly with the bases of RNA. Such observations suggest that both osmium (in particular osmium hexammine) and lutetium are very well suited as derivatives for any structure involving RNA. Since lutetium has similar chemistry to the other lanthanide elements, any one of these should also provide a good derivative for RNA-containing crystals.
5. Solvent flattening
Density-modification techniques such as solvent flattening are all but essential to modern-day macromolecular structure determination as they provide strong and independent restraints on the phase information (Cowtan & Zhang, 1999). For very large structures, this effect is perhaps even more pronounced and in our case the 30S subunit crystals contain as much as 70% of solvent. The effect of solvent flattening was so dramatic that in electron-density maps calculated prior to solvent flattening only a faint outline of the subunit was visible, whereas in the solvent-flattened maps both protein side chains and RNA bases could be built with confidence. Fig. 6 shows a schematic outline of the solvent-flattening procedure used for the 30S subunit, which is similar to the protocol used by the program SHARP. Since the optimal solvent-content value to use for the calculation does not necessarily equal the actual solvent content of the crystals, we made the procedure loop through a range of solvent contents, typically from 58 to 80% in 2% steps, where the optimal value would hopefully be somewhere in between. For each value of the solvent content, the appropriate flipping factor (the density value to assign to the flattened regions, which is 0 for old-style solvent flattening and −1 for solvent flipping) was calculated based on the gamma correction formula
where ff is the flipping factor and x is the estimated value of the solvent content. The truncation level (which is the lower threshold for the fraction of the `protein' regions of the map that is included as solvent and thus is flattened) was varied independently between 30 and 40%. In other words, this meant that the 30–40% lowest `protein' density regions were treated as solvent. Using these parameters, 50 cycles of solvent flattening were now calculated with a decreasing sphere size to gradually extend the resolution of the phases. The sphere size (which defines the region of the map for which the standard deviation is calculated in each case to determine whether it belongs to a solvent or `protein' region) would start in the range corresponding to the resolution where very good phase information was available (typically 6–7 Å) and gradually decrease down to the maximum resolution of the data (3.0–3.3 Å). This was performed in a way such that the sphere size was kept fixed at the maximum and minimum levels for the first and last 20 cycles, respectively, and then decreased in a linear fashion during the 30 intervening cycles. For each cycle, the current map was flattened using SOLOMON (Abrahams & Leslie, 1996; Collaborative Computational Project, Number 4, 1994), phases were then derived from the modified density and combined with the previous set of phases and a new map was calculated (Fig. 6). By the end of the procedure, a series of maps corresponding to the different values of the solvent content was obtained.
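The parameter sweep described above can be sketched as follows. Several points are assumptions rather than details taken from the text: the flipping factor is computed with the relation ff = −(1 − x)/x, a form commonly quoted for the γ correction (it gives −1 at x = 0.5); the function names are hypothetical; and the hold and ramp lengths of the sphere-size schedule are left as parameters, because the counts quoted above (20 + 30 + 20) do not sum to the stated 50 cycles. The 10/30/10 split and the 7.0 to 3.0 Å range used in the example are illustrative choices only. No call to SOLOMON itself is attempted.

```python
def flipping_factor(x):
    """Assumed gamma-correction relation: ff = -(1 - x) / x, x = solvent fraction."""
    if not 0.0 < x < 1.0:
        raise ValueError("solvent fraction must lie strictly between 0 and 1")
    return -(1.0 - x) / x

def solvent_contents(start_pct=58, stop_pct=80, step_pct=2):
    """Solvent fractions to loop over, e.g. 0.58, 0.60, ..., 0.80."""
    return [p / 100.0 for p in range(start_pct, stop_pct + 1, step_pct)]

def sphere_schedule(r_max, r_min, n_hold_start, n_ramp, n_hold_end):
    """Sphere size per cycle: hold at r_max, decrease linearly, hold at r_min."""
    radii = [r_max] * n_hold_start
    for i in range(1, n_ramp + 1):
        # intermediate values lie strictly between r_max and r_min
        radii.append(r_max + (r_min - r_max) * i / (n_ramp + 1))
    radii += [r_min] * n_hold_end
    return radii

# Illustrative sweep: one schedule per trial solvent content
schedule = sphere_schedule(7.0, 3.0, n_hold_start=10, n_ramp=30, n_hold_end=10)
for x in solvent_contents():
    print(f"solvent {x:.2f}: ff = {flipping_factor(x):+.3f}, "
          f"{len(schedule)} cycles, sphere {schedule[0]:.1f} -> {schedule[-1]:.1f} A")
```

In a real run, each (solvent content, truncation level, schedule) combination would drive one complete density-modification job, and the resulting maps would then be compared.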
From here on, a tedious manual process began to judge which map was the best. In each case, three different parts of the map corresponding to known good and bad regions of the structure were inspected and compared. The procedure was repeated for other values of the initial sphere size and the truncation factor and again for other sets of calculated phases. More recent versions of the solvent-flattening procedure included in the phasing package SHARP now include the option to automatically monitor the progress of the solvent flattening and determine the best values for the solvent content (C. Vonrhein, personal communication). It would be extremely useful if such approaches were expanded to include automatic estimation of more of the parameters involved in solvent flattening, as this method is proving increasingly powerful as we progress towards ever larger structures.
6. A `postmortem' analysis of the 30S subunit phasing
The 30S subunit structure was determined at 3.0 Å resolution using the complete set of derivatives as mentioned above. However, it was always clear that the majority of the phasing information was derived from the osmium hexammine data, so it is interesting to investigate how much information is actually necessary to solve the structure ab initio. For this exercise, a series of new phase calculations was carried out in which data were gradually removed from the calculation. The quality of the resulting phases and maps was then judged by the mean phase difference between the experimental phases and phases calculated from the final refined 30S subunit model, as well as by the real-space map correlation between the experimental map and the final refined map (Table 2) (Clemons et al., 2001). The reference point was the original phasing protocol in SOLVE, in which both isomorphous and anomalous signal from all derivatives were used in the calculation. The reference protocol had a mean phase error of 32.4° and a map correlation of 86.6% after solvent flattening, which is clearly very good. Using only the anomalous signal from all derivatives, these values deteriorate only slightly to 34.3° and 85.1%, respectively, but when only isomorphous signal is used they are much worse (54.0° and 66.4%). This clearly shows the importance of the anomalous signal in this kind of phasing; the structure could not have been solved without it. Perhaps the most surprising result is that if the phasing is carried out using only the anomalous signal from our main derivative, osmium hexammine, a phase error of only 37.3° and a map correlation of 81.7% are obtained, which is almost as good as using all the available information. This shows that it is the anomalous signal from osmium that drives the phasing almost entirely.
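The two quality measures used in this comparison can be illustrated with a minimal sketch. It assumes an unweighted mean phase difference (with the difference wrapped into the range 0–180°) and a plain Pearson correlation over matched map grid points; the weighting, resolution binning and map sampling actually used for Table 2 are not specified here, and the function names are hypothetical.

```python
import math

def mean_phase_difference(phases_a_deg, phases_b_deg):
    """Unweighted mean of |phi_a - phi_b|, wrapped into the range 0-180 degrees."""
    diffs = []
    for a, b in zip(phases_a_deg, phases_b_deg):
        d = abs(a - b) % 360.0
        diffs.append(min(d, 360.0 - d))
    return sum(diffs) / len(diffs)

def map_correlation(rho_a, rho_b):
    """Pearson correlation coefficient between two maps sampled on the same grid."""
    n = len(rho_a)
    mean_a = sum(rho_a) / n
    mean_b = sum(rho_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(rho_a, rho_b))
    var_a = sum((a - mean_a) ** 2 for a in rho_a)
    var_b = sum((b - mean_b) ** 2 for b in rho_b)
    return cov / math.sqrt(var_a * var_b)

# Invented values, for illustration only
print(mean_phase_difference([10.0, 350.0], [350.0, 10.0]))
print(map_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

The wrap-around step matters: phases of 10° and 350° differ by 20°, not 340°, and a naive subtraction would inflate the apparent phase error.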
In a separate calculation, SOLVE was provided only with the native structure factors and the osmium hexammine data, with no additional information about the location of the sites. Initially, the two data sets were locally scaled to each other using the built-in scaling function of SOLVE and then subjected to automatic structure determination using the ADDSOLVE function. Without further information, the program was immediately able to locate eight strong osmium sites using data to 6 Å, from which further sites could be located. This shows that the 30S subunit structure could have been determined at high resolution without resorting to cluster compounds and entirely by using traditional heavy-atom soaking procedures.
7. Conclusion
With the determination of structures as large as the ribosome, crystallography has once again proven the method of choice for medium- to high-resolution studies of macromolecules of almost any size. With substantial technological developments such as high-resolution CCD detectors, tuneable high-flux X-ray beams, as well as continual software developments, this trend is set to continue. Our experience from the ribosome has shown that given modern synchrotron radiation and software, these very large structures can indeed be determined using the same phasing protocols involving isomorphous replacement and anomalous scattering that have become the de facto standard for macromolecular structure determination over the last 20 years. Most of the present crystallographic software is already well equipped to deal with large structures and, with ever faster computers, lengthy calculations can now be performed even at the desktop.
Many of the cell's vital functions, particularly in eukaryotes, are now known to be maintained by large and transient protein–protein and protein–RNA complexes, such as, for example, the splicing machinery and the recently discovered RNA-degrading apparatus, the exosome. With the technology in place to deal with structures of these dimensions, our challenges ahead now lie in characterizing, isolating and crystallizing these large and elusive complexes so that their structures can be determined at the molecular level and we can gain further insight into the sophisticated inner workings of higher organisms.
Acknowledgements
The authors would like to thank Dr Raimond Ravelli for continual help optimizing the data-collection protocol at ESRF ID14-4 and P. R. Evans for critical and helpful comments on the manuscript. DEB was funded by a Human Frontier Science Program postdoctoral fellowship.
References
Abrahams, J. P. & Leslie, A. G. W. (1996). Acta Cryst. D52, 30–42.
Ban, N., Freeborn, B., Nissen, P., Penczek, P., Grassucci, R. A., Sweet, R., Frank, J., Moore, P. B. & Steitz, T. A. (1998). Cell, 93, 1105–1115.
Ban, N., Nissen, P., Hansen, J., Moore, P. B. & Steitz, T. A. (2000). Science, 289, 905–920.
Blundell, T. L. & Johnson, L. N. (1976). Protein Crystallography. New York: Academic Press.
Cate, J. H. (2001). Methods, 25, 303–308.
Cate, J. H., Yusupov, M. M., Yusupova, G. Z., Earnest, T. N. & Noller, H. F. (1999). Science, 285, 2095–2104.
Clemons, W. M. Jr, Brodersen, D. E., McCutcheon, J. P., May, J. L., Carter, A. P., Morgan-Warren, R. J., Wimberly, B. T. & Ramakrishnan, V. (2001). J. Mol. Biol. 310, 827–843.
Clemons, W. M. Jr, May, J. L., Wimberly, B. T., McCutcheon, J. P., Capel, M. S. & Ramakrishnan, V. (1999). Nature (London), 400, 833–840.
Collaborative Computational Project, Number 4 (1994). Acta Cryst. D50, 760–763.
Cowtan, K. D. & Zhang, K. Y. (1999). Prog. Biophys. Mol. Biol. 72, 245–270.
Cramer, P., Bushnell, D. A. & Kornberg, R. D. (2001). Science, 292, 1863–1876.
Harker, D. (1956). Acta Cryst. 9, 1–9.
La Fortelle, E. de & Bricogne, G. (1997). Methods Enzymol. 276, 472–494.
Leslie, A. G. W. (1992). Jnt CCP4/ESF–EAMCB Newsl. Protein Crystallogr. 26.
Ravelli, R. B. G., Sweet, R. M., Skinner, J. M., Duisenberg, A. J. M. & Kroon, J. (1997). J. Appl. Cryst. 30, 551–554.
Schluenzen, F., Tocilj, A., Zarivach, R., Harms, J., Gluehmann, M., Janell, D., Bashan, A., Bartels, H., Agmon, I., Franceschi, F. & Yonath, A. (2000). Cell, 102, 615–623.
Terwilliger, T. C. & Berendzen, J. (1999). Acta Cryst. D55, 849–861.
Trakhanov, S. D., Yusupov, M. M., Agalarov, S. C., Garber, M. B., Ryazantsev, S. N., Tischenko, S. V. & Shirokov, V. A. (1987). FEBS Lett. 220, 319–322.
Wimberly, B. T., Brodersen, D. E., Clemons, W. M., Morgan-Warren, R. J., Carter, A. P., Vonrhein, C., Hartsch, T. & Ramakrishnan, V. (2000). Nature (London), 407, 327–339.
© International Union of Crystallography. Prior permission is not required to reproduce short quotations, tables and figures from this article, provided the original authors and source are cited.