diffraction structural biology Journal of Synchrotron Radiation

Studies of icosahedral virus capsids provide insights into the function of supramolecular machines. Virus capsid crystals have exceptionally large unit cells; as a result, they diffract weakly compared with protein crystals. HK97 is a dsDNA lambda-like bacteriophage whose 13 MDa capsid expands from 550 Å to 650 Å with large subunit conformational changes during virus maturation. The HK97 penultimate maturation intermediate was crystallized in a tetragonal unit cell that has lattice constants of 1010 Å × 1010 Å × 730 Å. The crystals could be cryoprotected, but diffracted to a modest resolution of 5 Å at a bending-magnet beamline. When these crystals were optimally exposed with two orders of magnitude more photons from a new insertion-device beamline, data extending to better than 3.8 Å resolution were obtained. Here, the strategies to collect and process such data are described. These strategies can be adapted for other crystals with large unit cells and for microcrystals.


Optimized exposure: pushing diffraction to the max
The bacteriophage HK97 Head II structure was the first high-resolution structure of a mature dsDNA virus capsid (Wikoff, Liljas et al., 2000; Helgstrand et al., 2003). Cryo-EM and solution X-ray scattering studies of the procapsid and a number of maturation intermediate states have shown that HK97 capsids (and probably other dsDNA capsids) undergo a 'molecular ballet' during DNA-packaging-induced capsid maturation (Lata et al., 2000; Lee et al., 2004; Wikoff et al., 2006). To fully understand the macromolecular mechanisms involved in capsid maturation, it is necessary to obtain high-resolution crystal structures of these intermediates. Biochemical and cryo-EM experiments revealed that the penultimate capsid state known as expansion intermediate IV (EI-IV) may be a candidate for crystallization (Gan et al., 2004). Moreover, two capsid forms that are related to EI-IV, termed Head I and pepEI-IV, could also be crystallized in the same space group, P4₃2₁2 (Gan et al., 2006). Such crystals will now be referred to as 'HK97 tetragonal crystals'.
The structure of EI-IV was determined to 8 Å resolution from a single crystal, revealing important details of molecular rearrangement at the secondary structural level. However, this 8 Å resolution structure missed all of the important details at the amino-acid side-chain level that can be revealed at 3-4 Å resolution. We observed that HK97 tetragonal crystals either diffracted to worse than 30 Å resolution, where the diffraction image is dominated by the concentric small-angle solution scattering pattern of residual uncrystallized capsids, or to a moderate resolution of ~8 Å. The drastic difference probably resulted from the vagaries associated with flash freezing.
The crystals were exposed for 1 min at a bending-magnet (BM) beamline, an exposure time that had been optimized for the 3.44 Å resolution Head II dataset collected at room temperature (Wikoff, Schildkamp et al., 2000). The best HK97 tetragonal crystals diffracted to 5.6 Å resolution for 200 one-minute exposures at the BM beamline. The crystals might be better ordered than this limit suggests, but diffraction was limited by the small number of unit cells. Indeed, a typical protein crystal has at least 1000-fold more unit cells than an HK97 tetragonal crystal of comparable size. The HK97 crystals also have an unusually large solvent content because they were prepared from empty capsids. Not surprisingly, HK97 crystals produce almost undetectable diffraction when they are exposed to the X-ray doses used for protein crystals.
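The unit-cell argument can be checked with a rough count. The sketch below compares the tetragonal cell quoted in the abstract (1010 Å × 1010 Å × 730 Å) with a hypothetical protein crystal; the 90 Å cubic protein cell and the 100 µm crystal edge are illustrative assumptions, not values from this work.

```python
# Rough count of unit cells in two crystals of the same size.
# Assumed for illustration only: a cubic crystal 100 µm on edge and a
# hypothetical protein crystal with a 90 Å cubic unit cell.

ANGSTROM_PER_MICRON = 1.0e4


def cell_volume(a, b, c):
    """Volume (Å^3) of a unit cell with orthogonal axes a, b, c in Å."""
    return a * b * c


def cells_in_crystal(edge_um, volume_a3):
    """Number of unit cells in a cubic crystal with the given edge (µm)."""
    edge_a = edge_um * ANGSTROM_PER_MICRON
    return edge_a ** 3 / volume_a3


hk97_volume = cell_volume(1010.0, 1010.0, 730.0)  # tetragonal cell from the text
protein_volume = cell_volume(90.0, 90.0, 90.0)    # assumed protein cell

hk97_cells = cells_in_crystal(100.0, hk97_volume)
protein_cells = cells_in_crystal(100.0, protein_volume)
ratio = protein_cells / hk97_cells

print(f"HK97 cell volume:   {hk97_volume:.3g} Å^3")
print(f"HK97 unit cells:    {hk97_cells:.3g}")
print(f"protein unit cells: {protein_cells:.3g}")
print(f"protein/HK97 ratio: {ratio:.0f}")
```

With these assumed dimensions the ratio comes out slightly above 1000, in line with the estimate in the text; a smaller protein cell would make the ratio larger.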

Experimental considerations
The highest-resolution data for the HK97 tetragonal crystals were obtained by what would be conventionally considered over-exposure of the crystals. Extended exposure times degraded the crystals but produced diffraction to much higher resolution than when a single crystal was used for a complete dataset. HK97 crystals were suitable for such a procedure because they are highly reproducible and can be flash-frozen. Multiple crystals were required to maximize the completeness of the dataset because they were not perfectly isomorphous, a problem often encountered when crystals are frozen. The diffraction images from many crystals were rejected owing to a small but significant difference in lattice dimensions. More than 20 crystals were screened for each capsid class studied, but data from no more than eight crystals contributed to each final dataset. The EI-IV tetragonal crystals were not as reproducible and therefore a dataset could not be collected using extended exposures.
Virus data have traditionally been collected at the brightest beamlines; for example, BioCARS 14-BMC was key to the Head II project (Wikoff, Schildkamp et al., 2000). Since the new tetragonal crystals diffracted weakly compared with the room-temperature crystals, they were exposed to the unattenuated beam of a new GM/CA CAT insertion-device (ID) beamline. The flux (1.3 × 10¹³ photons s⁻¹) at the ID-D beamline allowed up to 100-fold more photons to be collected per reflection per unit time. By reducing the exposure time from 30 min to 20 s, the background radiation noise was minimized and hundreds of diffraction images could be collected in a day. These brilliance-dose relationships are not linear because the diffraction power depends on other factors such as wavelength and crystal size, even when holding crystal 'quality' constant.
GM/CA CAT's ID-D beamline also features convergent optics, which enables the accurate integration of closely spaced reflections (Fig. 1). The convergent optics also increased the signal-to-noise ratio by imaging each reflection in an area smaller than 16 pixels, a value that was limited by the detector's point spread function (Fig. 2). To maximize the resolution range covered, large multi-element MarMosaic CCD detectors were used that could resolve more than 120000 reflections in a single image. Future improvements in detectors might result in all photons being imaged in 1-4 pixels, which will significantly simplify intensity integration.

Outcome
HK97 tetragonal crystals diffracted to 5-6 Å resolution for up to 200 one-minute 'standard exposures' at a BM beamline. Comparable crystals (same mother liquor, different preparation) diffracted to 3.8-4.2 Å resolution for up to 20 twenty-second exposures at the GM/CA CAT ID-D beamline. Such 'extended exposures' were optimal for answering key questions about HK97 capsids, but dramatically reduced the number of diffraction images that could be collected per crystal. Fortunately, many crystals were large enough that data could be collected by rastering with the small beam (40 µm × 60 µm). Owing to the random orientation of each crystal, the resulting dataset was not complete. Systematic absences were difficult to detect in the non-contiguous images, so an almost complete dataset was collected at low resolution with conventional exposure times. However, no (0, 0, 4n) systematic absences were observed, perhaps owing to the unfortunate orientation of the crystal's four-fold axis parallel to the spindle axis. A fortuitous (0, 0, 4n) systematic absence was eventually confirmed from a single test-crystal diffraction image, cutting the molecular replacement search time by half during the structure determination.

Pitfalls
There are a few important disadvantages to extended-exposure data collection. First, it is time-consuming at both the beamline and the data-processing stages. Second, it is not guaranteed to produce an interpretable structure because crystals that are apparently isomorphous at the integration stage may not scale well. Furthermore, a scaling cut-off must be chosen to reject images from non-isomorphous crystals (see below). However, these time costs were justified because the diffraction resolution was sufficient to answer the key biological questions, and therefore saved many months of potentially fruitless crystal optimization. These time costs may be impractical for determination of structures by isomorphous replacement or anomalous scattering. Moreover, crystals that do not have high non-crystallographic symmetry (NCS) may yield noisier electron density maps because frozen crystals are never perfectly isomorphous with each other.

Data processing strategies
Data from the optimally exposed crystals had many properties in common with data collected from crystals at room temperature. Mosaicity increases and diffraction falls off rapidly as the crystals are exposed to the unattenuated ID X-rays; many crystals are non-isomorphous; resolution limits are heterogeneous. All of these properties reduce the resolution of the merged dataset to significantly lower than that of the best crystal. To obtain the best possible structure from optimally exposed datasets, the data must be integrated and scaled with great care. HK97 tetragonal crystal diffraction images were processed and scaled using the methods implemented for Head II and Prohead II (Wikoff, Liljas et al., 2000; Wikoff et al., 2003), but with significant modifications.

Indexing and intensity integration
Individual HK97 diffraction images used to be indexed independently using the program DENZO (Otwinowski & Minor, 1997), even if they came from the same crystal. This strategy allowed the resolution cut-off to be optimized for each individual frame. Sequential images that are indexed independently are often assigned different symmetry-related orientation matrices, so the partially recorded reflections that account for the majority of reflections cannot be summed conveniently. Therefore, the images were auto-indexed once per crystal. Indeed, it was found that the post-scaling statistics, especially completeness and I/σ(I) of the highest-resolution bins, could be improved if all images from the same crystal were integrated using the starting orientation matrix of the first image (Gertsman et al., 2008).
The ability to record high-resolution reflections from a 1000 Å primitive-unit-cell crystal presented a new challenge: there were too many reflections to integrate. After extensive testing, it was found that up to 100000 reflections could be integrated in a single pass, so each image was integrated using a low- and a high-resolution pass. The resolution cut-off was chosen so that equal numbers of reflections were in each resolution band. Scaling of the resulting data revealed an artificially high I/σ(I) in the overlapping resolution band, but this did not produce any noticeable artefacts in the electron density maps.
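The choice of a cut-off that places equal numbers of reflections in each pass can be estimated from geometry. The sketch below assumes that the number of reciprocal-lattice points inside the resolution sphere grows as 1/d³; the function name and the 50 Å low-resolution limit are illustrative assumptions, and the sketch returns a single boundary without modelling the overlap between the two passes.

```python
def equal_count_split(d_low, d_high):
    """Resolution (Å) that divides the range [d_low, d_high] into two bands
    containing equal numbers of reflections, assuming the number of
    reflections out to resolution d is proportional to 1/d**3."""
    if not d_low > d_high > 0:
        raise ValueError("need d_low > d_high > 0")
    s3_low = d_low ** -3     # proportional to reflections excluded at low resolution
    s3_high = d_high ** -3   # proportional to reflections out to the outer limit
    s3_mid = 0.5 * (s3_low + s3_high)
    return s3_mid ** (-1.0 / 3.0)


def fraction_in_low_band(d_low, d_high, d_cut):
    """Fraction of reflections in [d_low, d_high] that lie in [d_low, d_cut]."""
    total = d_high ** -3 - d_low ** -3
    return (d_cut ** -3 - d_low ** -3) / total


d_cut = equal_count_split(d_low=50.0, d_high=3.8)   # 50 Å limit is assumed
print(f"split the two passes near {d_cut:.2f} Å")
print(f"fraction in low-resolution pass: "
      f"{fraction_in_low_band(50.0, 3.8, d_cut):.2f}")
```

When the low-resolution limit is very large, the boundary approaches d_high × 2^(1/3), that is, roughly 26% coarser than the outer resolution limit.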

Scaling, image rejection and reindexing
Optimally exposed data were scaled in three passes to eliminate the bad crystals and images that showed severe radiation damage. First, the reflection files from all crystals were scaled together. The non-isomorphous and 'bad' crystals, with much higher R_merge and χ² values than most other frames, were detected in the log files and excluded from further processing. While decreases in resolution can readily be detected by looking at the images, increases in mosaicity are detected by scaling. HK97 data were scaled in Scalepack (Otwinowski & Minor, 1997) using the option 'fit batch mosaicity', which refines independently the mosaicity of each image. The log file revealed that the mosaicity increased slowly at first, but then rapidly. The mosaicity of each crystal was classified as low (early exposures) and high (late exposures). All images were re-integrated using the highest mosaicity value from each mosaicity class to ensure that all reflections were integrated; each crystal was therefore treated as two crystals, one with low mosaicity and one with high mosaicity. Spurious reflections, such as those with negative σ(I) or an I/σ(I) value less than −1.5, were removed from the reflection files using a locally written Perl script. These trimmed image files were scaled one last time, but with the option 'fit crystal mosaicity'. Although the completeness could have been increased by scaling partially recorded reflections to full reflections, this procedure was not carried out because there were enough data to calculate interpretable maps.
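The rejection of spurious reflections can be illustrated with a short filter. This is a minimal sketch in Python rather than the locally written Perl script, which is not reproduced here; the `Reflection` record and the in-memory list are hypothetical stand-ins for the actual reflection-file format, and only the two criteria stated in the text are applied.

```python
from typing import Iterable, List, NamedTuple


class Reflection(NamedTuple):
    """One measured reflection (hypothetical in-memory record)."""
    h: int
    k: int
    l: int
    intensity: float
    sigma: float


def is_spurious(refl: Reflection, min_i_over_sigma: float = -1.5) -> bool:
    """True if sigma(I) is not positive or I/sigma(I) is below the cut-off."""
    if refl.sigma <= 0.0:
        # Negative sigma(I) is rejected as described in the text; zero is
        # also rejected here (an added guard) to avoid dividing by zero.
        return True
    return refl.intensity / refl.sigma < min_i_over_sigma


def trim(reflections: Iterable[Reflection]) -> List[Reflection]:
    """Keep only the reflections that pass both criteria."""
    return [r for r in reflections if not is_spurious(r)]


observed = [
    Reflection(1, 0, 4, 250.0, 12.0),   # strong reflection, kept
    Reflection(2, 1, 3, -5.0, 10.0),    # I/sigma = -0.5, kept
    Reflection(3, 2, 1, -30.0, 10.0),   # I/sigma = -3.0, removed
    Reflection(0, 0, 8, 40.0, -1.0),    # negative sigma, removed
]
kept = trim(observed)
print(f"kept {len(kept)} of {len(observed)} reflections")
```

Weakly negative intensities are retained deliberately, since they are a normal consequence of background subtraction for weak reflections; only strongly negative values are treated as spurious.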

Results and final maps
Four datasets were collected from three capsid forms (Table 1). The maps were sharpened by −B_Wilson and the structures were refined extensively using real-space refinement methods (Korostelev et al., 2002). There is a clear positive correlation between the number of crystals, R_cryst and B_Wilson. Since the maps were of comparable quality, it is likely that by using more (slightly non-isomorphous) crystals the errors in the averaged amplitudes increased significantly. The effects of such errors in electron density maps were mitigated by 30-fold NCS real-space averaging. An optimally exposed dataset was collected from two cryoprotected Head II crystals to control for crystallization artefacts. The electron density map calculated from this dataset was compared with that from the room-temperature Head II dataset, revealing similar quality (Fig. 3). Owing to the smaller number of crystals used for the cryo dataset (2 versus 60), R_cryst was significantly better (30% versus 36%).

Table 1
Statistics from optimally exposed crystals. Additional details can be found in Gan et al. (2006). Note that the Head II dataset was collected using the smallest MarMosaic detector.

Figure 3
Comparison of Head II electron density, in stereo. The densities were real-space averaged using (left) 60-fold or (right) 30-fold NCS and contoured at 1.5 RMS. Note that the maps are comparable in terms of noise and side-chain definition despite the differences in resolution and crystal exposure method.

Summary and prospectus
The lessons learned by determining structures from optimally exposed data are useful for other challenging crystals and can be summarized as follows. First, a complete low-resolution dataset should be collected to determine the space group and molecular replacement parameters. If a second dataset can be collected from the same crystal, the crystal should be manually rotated 45° from the spindle axis to increase the chances of recording a line of systematic absences. Next, a crystal should be exposed with 10- to 100-fold more dose than used for the low-resolution dataset. If the resolution increases significantly, the crystals are considered to be 'weakly' diffracting instead of 'poorly' diffracting and should be exposed in a dose series to determine how many images can be collected from one crystal. To maximize completeness, a large number of crystals should be shot. The data should be processed carefully, paying special attention to individual mosaicity and R_merge values. Extended exposures are feasible using the brightest beamlines, which have convergent beam fluxes of ≥ 1 × 10¹³ photons s⁻¹ mm⁻² at the specimen. To minimize air scatter, the beamline should have high-energy (> 20 keV) X-ray capability or a helium/vacuum path between the sample and detector.
A survey of the VIPERdb revealed that virus crystals with a unit-cell parameter greater than 500 Å rarely diffract beyond 3.4 Å resolution (Shepherd et al., 2006). Most of these crystals were exposed at room temperature, leading one to suspect that radiation damage is the limiting factor. Another resolution-limiting factor is the large background from the water solution-scattering ring. If suitable cryoprotection conditions can be found for these crystals, diffraction may be extended beyond the 3.4 Å resolution limit by optimizing the exposure. We suspect that breaking this resolution barrier may be especially challenging for crystals that have enormous Matthews coefficients (V_M). An HK97 tetragonal crystal has a V_M of 14 and produces up to three times more solvent scattering than a typical protein crystal, which has a V_M of 2-3 (Kantardjieff & Rupp, 2003).
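The Matthews coefficient quoted above can be reproduced approximately from numbers given in the abstract. The sketch below assumes four capsids per unit cell (eight asymmetric units in P4₃2₁2, each holding half a capsid, which is consistent with the 30-fold NCS averaging but is an inference rather than a stated value) and uses the standard Matthews approximation for the solvent fraction of a protein crystal, 1 − 1.23/V_M.

```python
def matthews_coefficient(cell_volume_a3, particles_per_cell, particle_mass_da):
    """V_M in Å^3 per Da: unit-cell volume divided by the mass it contains."""
    return cell_volume_a3 / (particles_per_cell * particle_mass_da)


def solvent_fraction(v_m):
    """Matthews' approximation for protein crystals: 1 - 1.23 / V_M."""
    return 1.0 - 1.23 / v_m


cell_volume = 1010.0 * 1010.0 * 730.0   # tetragonal cell from the abstract, Å^3
capsids_per_cell = 4                    # assumed: 8 asymmetric units x 1/2 capsid
capsid_mass = 13.0e6                    # 13 MDa capsid, from the abstract

v_m_hk97 = matthews_coefficient(cell_volume, capsids_per_cell, capsid_mass)
v_m_typical = 2.5                       # mid-range of the 2-3 quoted in the text

print(f"HK97 V_M:             {v_m_hk97:.1f} Å^3/Da")
print(f"HK97 solvent:         {solvent_fraction(v_m_hk97):.0%}")
print(f"typical protein V_M:  {v_m_typical:.1f} Å^3/Da")
print(f"typical solvent:      {solvent_fraction(v_m_typical):.0%}")
```

Under these assumptions V_M comes out close to the value of 14 given in the text, corresponding to a crystal that is roughly nine-tenths solvent by volume, compared with about half for a typical protein crystal.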