radiation damage
XFEL diffraction: developing processing methods to optimize data quality
Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
*Correspondence e-mail: nksauter@lbl.gov
Serial crystallography, using either femtosecond X-ray pulses from free-electron laser sources or short synchrotron-radiation exposures, has the potential to reveal metalloprotein structural details while minimizing damage processes. However, deriving a self-consistent set of Bragg intensities from numerous still-crystal exposures remains a difficult problem, with optimal protocols likely to be quite different from those well established for rotation photography. Here several data processing issues unique to serial crystallography are examined. It is found that the limiting resolution differs for each shot, an effect that is likely to be due to both the sample heterogeneity and pulse-to-pulse variation in experimental conditions. Shots with lower resolution limits produce lower-quality models for predicting Bragg spot positions during the integration step. Also, still shots by their nature record only partial measurements of the Bragg intensity. An approximate model that corrects to the full-spot equivalent (with the simplifying assumption that the X-rays are monochromatic) brings the distribution of intensities closer to that expected from an ideal crystal, and improves the sharpness of anomalous difference Fourier peaks indicating metal positions.
Keywords: serial femtosecond crystallography; X-ray free-electron laser; partiality; postrefinement; mosaicity.
1. Introduction
As a strategy to avoid radiation damage, serial crystallography techniques aim to spread the X-ray dose over numerous crystal specimens, with the goal of observing Bragg spots from material that is close to the undamaged state. Based on the general decay of diffraction at a third-generation synchrotron source, an upper limit for radiation absorbed dose of 30 MGy has been proposed (Owen et al., 2006) for single-crystal experiments. However, it is also clear that, even at doses far below this limit, damage at specific sites of interest is observed, in particular at metal sites where valence states and coordination geometry are sensitive to X-rays. In photosystem II (PSII), for example, the valence state of the multinuclear Mn4Ca complex can be monitored by X-ray absorption near-edge spectroscopy (XANES; Yano et al., 2005). This complex, which is responsible for catalyzing the water oxidation reaction that evolves oxygen during photosynthesis, has a high valent Mn4(III2, IV2) structure in dark-adapted crystals. Critically, XANES can detect the accumulation of the radiation-damaged low-valent Mn(II) state even at the smallest doses examined, 0.6 MGy (at 100 K, with 13.3 keV radiation; Yano et al., 2005). In contrast, the femtosecond-scale pulse durations from an X-ray free-electron laser (XFEL) permit the observation of the undamaged Mn4(III2, IV2) complex (Kern et al., 2012, 2013), as shown by Mn Kβ1,3 X-ray emission spectroscopy that is likewise sensitive to the valence state (Alonso-Mori et al., 2012). Short pulses also permit the direct observation of metal coordination bond lengths expected by theory from undamaged Mn (Suga et al., 2015). Furthermore, these XFEL-based observations can be performed under room-temperature conditions that permit time-dependent pump–probe studies of the water oxidation mechanism (Kern et al., 2014). 
Such experiments, when performed at the Linac Coherent Light Source (LCLS) with typical pulse durations of 40 fs, deliver a dose that would be equivalent to about 200 MGy (Kern et al., 2012) if they were carried out at synchrotron-source time scales that allow radiation absorption. Thus, despite reports of diffraction decay with exceedingly long XFEL pulses (150 fs) and higher equivalent doses of 3 GGy (Lomb et al., 2011), it appears that short-pulse XFEL still shots provide a promising method to look at radiation-sensitive structures, including those of metalloproteins.
Several high-resolution crystal structures have now been derived from XFEL diffraction (Boutet et al., 2012; Redecke et al., 2013; Barends et al., 2013a,b; Liu et al., 2013; Weierstall et al., 2014; Hattne et al., 2014; Kern et al., 2014; Sawaya et al., 2014; Cohen et al., 2014; Tenboer et al., 2014), with a common result being the large number of diffraction images required to produce a complete set of merged structure factors, ranging in these cases from 10⁴ to 1.8 × 10⁵. Part of this requirement arises from the heterogeneous quality of the diffraction images. When the data are examined in detail, it appears that the limiting resolution differs from shot to shot; this can be quantified by asking at what resolution the average Bragg spot signal-to-noise ratio [I/σ(I), where I is the intensity and σ(I) is the standard deviation from counting statistics] falls below a threshold value (Kern et al., 2012, 2014; Hattne et al., 2014). Scoring the data in this way suggests that only a small fraction of the images contributes signal at the outer limits of resolution. Considering this, the high resolution data are especially valuable, and special effort is warranted to optimize their measurement.
However, there are several known issues with XFEL data processing that make it difficult to gain accurate measurements of the high resolution signal. Firstly, there are tradeoffs made when implementing the algorithm for Bragg spot integration. The program cctbx.xfel (Hattne et al., 2014) chooses small integration masks that tightly conform to the pixels believed to contain signal based on a lattice model, so as to discard surrounding pixels that contain only Gaussian noise. However, these small integration masks make the model highly sensitive to the calibration of both the detector distance and the detector metrology, which defines the mutual positions of sensor tiles (Hart et al., 2012; Hattne et al., 2014). While cctbx.xfel can fortunately calibrate sensor positions to about 0.1-pixel accuracy, it is found that even a 0.5-pixel miscalibration noticeably degrades the integrated data, and that this is felt most acutely for the highest-resolution data (Hattne et al., 2014). Secondly, when performing a control data analysis with simulated image data, it can be shown that the inability to perfectly model the lattice orientation produces some Bragg spot predictions that are not in the simulated data, and misses other spots that really are in the simulation (Sauter et al., 2014). These effects are most pronounced at the highest resolution limits. Thirdly, correct modeling of the diffraction pattern of a crystal with mosaic structure (Nave, 1998, 2014) requires counterbalancing parameters that describe the crystal's physical properties. Increasing the parameter describing the angular spread of mosaic blocks permits the modeling of Bragg spots at the highest resolution limits, while decreasing the mosaic block size parameter allows the model to cover the low resolution Bragg spots. However, tuning these parameters requires assumptions about the mosaic structure of the crystal, and this entails increased uncertainty at the resolution extremes. 
Finally, in contrast to the usual rotation method employed at synchrotron sources, where each reciprocal lattice point is fully moved through the reflecting condition, Bragg spots recorded on still shots necessarily represent partial measurements of the intensity. While it has been shown experimentally that adjacent spots can have differing partialities of measurement (Hattne et al., 2014), particularly at high resolution, there is as yet no robust model to correct measurements to the full-spot equivalent.
Here further evidence that the data quality is most sensitive to error at the highest resolution is presented, providing further incentive for resolution-based filtering. However, it is demonstrated that a straightforward filter based on I/σ(I) removes real signal that is capable of improving anomalous difference measurements. Finally, with the eventual goal of deriving a proper expression to correct for partiality, it is demonstrated that a simplified model based on the assumption of monochromaticity provides a reasonable first step toward improving the structure factors.
2. Methods
Data were processed with the program cctbx.xfel (Hattne et al., 2014; Sauter et al., 2014). A tutorial for processing the thermolysin data is presented at https://cci.lbl.gov/xfel .
2.1. Data analyzed
Thermolysin diffraction patterns were reprocessed from a previously described data set (Hattne et al., 2014) that is publicly archived at the Coherent X-ray Imaging Data Bank (https://cxidb.org ), accession ID 23. Data were acquired during the L498 (December 2012) beam time at the 1 µm focus of the Coherent X-ray Imaging (CXI) instrument (Boutet & Williams, 2010) of LCLS. Typical crystal size was approximately 2 µm × 3 µm × 1 µm (Sierra et al., 2012). Since the thermolysin structure contains a single Zn atom, it was possible to use the signal-to-noise ratio of the anomalous difference electron density as a metric for the data processing quality (Sauter et al., 2014). Therefore, in the work presented here the analysis was limited to data (runs 16–27) collected at a wavelength of 1.269 Å, slightly more energetic than the Zn K-edge at 1.284 Å.
Simulated still-shot diffraction patterns from photosystem I (PSI) were obtained from James Holton (LBNL), and are available at https://bl831a.als.lbl.gov/example_data_sets/Illuin/LCLS . The 20000-image simulated dataset was created with the program fastBragg as described (Kirian et al., 2010, 2011), utilizing modeled structure factors from Protein Data Bank entry 1jb0. Spatially coherent simulations of randomly oriented parallelepiped nanocrystals (17 × 17 × 30 unit cells; cell lengths a = b = 281 Å, c = 165.2 Å) were performed, assuming constant-flux, polarized, monochromatic radiation (λ = 1.32 Å) with zero divergence, impinging on a pixel-array detector with pixel size (0.11 mm)² at a distance of 129 mm from the sample. Solvent scattering and shot noise were added so as to effectively limit the resolution to about 3.3 Å. At very low resolutions (d > 60 Å) the simulation exhibits diffraction fringes between Bragg spots as previously observed for PSI (Chapman et al., 2011); however, the present paper attempts to analyze only the central Bragg peak, and the analysis is limited to the 15–3.5 Å range. Angular misorientations between the cctbx.xfel models and the true crystal orientations used for the simulation were calculated after accounting for the orientational ambiguities due to the hexagonal lattice symmetry operators (six-fold along c and two-fold along a + b). Angular misorientations were then decomposed into a rotation Rz about an axis parallel to the beam, and a residual rotation Rxy about an axis perpendicular to the beam.
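The decomposition of a misorientation into Rz and Rxy can be illustrated with a short sketch. The code below is not taken from cctbx.xfel; it assumes the beam lies along z, represents the misorientation as a unit quaternion, and uses a standard swing-twist factorization (a twist about z applied first, followed by a swing about an axis in the xy plane). The function names are hypothetical, and the convention used in the paper may differ.

```python
import math

def quat_from_axis_angle(axis, angle_deg):
    """Unit quaternion (w, x, y, z) for a rotation of angle_deg about axis."""
    norm = math.sqrt(sum(a * a for a in axis))
    half = math.radians(angle_deg) / 2.0
    s = math.sin(half) / norm
    return (math.cos(half), axis[0] * s, axis[1] * s, axis[2] * s)

def quat_mul(q1, q2):
    """Hamilton product q1*q2 (rotation q2 applied first, then q1)."""
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return (w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
            w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
            w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
            w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2)

def swing_twist_angles(q):
    """Return (Rz, Rxy) in degrees for a unit quaternion q = swing * twist.

    Rz is the signed twist about the beam (z); Rxy is the non-negative
    residual rotation about an axis perpendicular to the beam.
    """
    w, x, y, z = q
    if w < 0:  # q and -q describe the same rotation; pick the short way
        w, x, y, z = -w, -x, -y, -z
    rz = 2.0 * math.degrees(math.atan2(z, w))
    rxy = 2.0 * math.degrees(math.atan2(math.hypot(x, y), math.hypot(w, z)))
    return rz, rxy

# Example: a 1.0 degree twist about the beam followed by a 0.3 degree tilt about x
q = quat_mul(quat_from_axis_angle((1, 0, 0), 0.3),
             quat_from_axis_angle((0, 0, 1), 1.0))
rz, rxy = swing_twist_angles(q)
```

In this example the two returned angles recover the 1.0° and 0.3° components that were composed, which is the sense in which Rxy isolates the part of the misorientation that moves reciprocal lattice points in and out of the reflecting condition.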
2.2. Correction of the integrated intensity to the full-spot equivalent
This section describes the component of partiality that arises from the crystal's mosaic structure (Nave, 1998, 2014), setting aside the effects of beam properties such as dispersion and divergence for future work. Consider a reciprocal lattice point at position Q (rlpQ; Fig. 1). Points on the Ewald sphere of radius 1/λ (λ, wavelength) satisfy the Laue conditions exactly, but what if rlpQ is located a distance rh from the surface, as in Fig. 1? In this case diffraction can still be modeled if crystal imperfections are taken into account, thus widening rlpQ into a ball of radius rs, and satisfying the Laue conditions at the spherical cap-shaped intersection between the Ewald sphere and the rlpQ ball. Although this intersection area AQ could be expressed analytically, it is convenient to approximate it as a circle of radius rp given by the right triangle in Fig. 1, with

$$A_Q = \pi r_p^2 = \pi\left(r_s^2 - r_h^2\right). \qquad (1)$$
To obtain the best match with experimentally observed still-shot diffraction (Sauter et al., 2014), it is useful to consider two parameters that contribute to the ball radius rs. Viewing the crystal as a mosaic of coherently scattering blocks (Nave, 1998) of effective width Deff gives

$$r_s = 1/D_{\mathrm{eff}}. \qquad (2)$$
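Meanwhile, considering the angular spread and variation among the mosaic blocks leads to the expression

$$r_s = \eta/(2d), \qquad (3)$$

where d is the resolution and η is the effective full-width mosaicity. Combining these two effects gives a final expression for the intersection area:

$$A_Q = \pi\left[\left(\frac{1}{D_{\mathrm{eff}}} + \frac{\eta}{2d}\right)^2 - r_h^2\right]. \qquad (4)$$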
For the partiality of the Bragg spot in Fig. 1, arising from the crystal's mosaic structure, it seems intuitive that the partiality should be proportional to AQ, with a maximum obtained when the Ewald sphere slices through the center of rlpQ, and a minimum of 0 at rh = rs. Taking the simplest case first, that with η = 0, one finds that the maximal area is a constant: π/Deff². To turn this into a measure for partiality, one must assure that the partiality always takes on values from 0 to 1, and that it is a unitless quantity instead of having dimensions of length⁻². This is accomplished by taking a suggestion from James Holton who, considering work on NaCl where a reference reflection was used (Bragg et al., 1921), proposed (private communication) that the ratio between AQ and the area intersected by the F000 point should be used:

$$P = A_Q / A_{000}, \qquad A_{000} = \pi / D_{\mathrm{eff}}^2. \qquad (5)$$
For F000, equation (5) always holds because rh and 1/d are identically zero.
Next, in the general case where η > 0, the maximal intersection area AQ increases as a function of resolution due to its dependence on 1/d, but this apparently goes against the expectation that the maximum partiality for any spot, independent of resolution, should be 1. To correct for this, one can normalize against the volume VQ of the reciprocal lattice point, so that the full expression for partiality P involves the ratio of area to volume:
where
To evaluate equation (6), the parameters Deff and η are determined separately for each image as previously described (Sauter et al., 2014). Plotting the partiality of Bragg spots from a single thermolysin image (Fig. 2) confirms the expected behavior: the distribution of P increases to a maximum at rh = 0 but never actually reaches 1.0 due to the normalization by VQ, and it falls off to zero at rh = rs.
Individual Bragg spot intensity measurements I are corrected to their full-spot equivalent IF with

$$I_F = I / P, \qquad (7)$$

and those measurements with very low partiality are discarded, i.e. those with rh > 0.9rs.
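The partiality correction described in this section can be sketched numerically. The block below is illustrative only and is not the cctbx.xfel implementation. It assumes a ball radius rs = 1/Deff + η/(2d), an intersection area proportional to rs² − rh², and, as one normalization consistent with the description of an area-to-volume ratio referenced to F000, the form P = (AQ/VQ)/(A000/V000) with spherical volumes V = (4/3)πrs³. That normalization is an assumption made here for illustration; the function names are hypothetical.

```python
import math

def ball_radius(d, d_eff, eta):
    """Reciprocal-space radius r_s (1/Angstrom) of the broadened lattice point.

    d: resolution (Angstrom); d_eff: mosaic block size (Angstrom);
    eta: full-width mosaic angular spread (radians).
    """
    return 1.0 / d_eff + eta / (2.0 * d)

def partiality(r_h, d, d_eff, eta):
    """Monochromatic partiality under the ASSUMED area/volume normalization."""
    r_s = ball_radius(d, d_eff, eta)
    if abs(r_h) >= r_s:
        return 0.0  # the Ewald sphere misses the ball entirely
    area_fraction = 1.0 - (r_h / r_s) ** 2   # A_Q / (pi r_s^2)
    # (A_Q/V_Q) / (A_000/V_000) reduces to area_fraction * (1/D_eff) / r_s
    return area_fraction * (1.0 / d_eff) / r_s

def full_spot_equivalent(intensity, r_h, d, d_eff, eta, max_fraction=0.9):
    """Return I/P, or None when r_h exceeds max_fraction * r_s (discarded)."""
    r_s = ball_radius(d, d_eff, eta)
    if abs(r_h) > max_fraction * r_s:
        return None
    return intensity / partiality(r_h, d, d_eff, eta)

# Example: 1000 Angstrom blocks, 0.001 rad mosaicity, a spot at 2 Angstrom
p_center = partiality(0.0, 2.0, 1000.0, 0.001)
```

With η = 0 this expression reduces to the simple area ratio AQ/A000, and with η > 0 its maximum at rh = 0 stays below 1, in line with the behavior described for Fig. 2.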
Prior to merging data from different images together, duplicate measurements from different images are placed on a common scale by determining a separate scaling factor G and isotropic temperature factor B for each image. In common with previous work on scaling (Hamilton et al., 1965; Fox & Holmes, 1966; Bolotovsky et al., 1998; Kabsch, 2014), here these parameters were determined by iterative non-linear least-squares minimization of a target functional,
and the summation is over all reflections measured on a given image. However, instead of taking the reference intensity to be the best least-squares estimate of the intensity over the global dataset, the shortcut is taken of using reference intensities measured at a synchrotron, in this case thermolysin intensities from Protein Data Bank entry 2tli. Furthermore, for the computation of partiality in still-shot data [equation (6)], the distances rh are sensitive functions of the crystal orientation, and in particular are susceptible to rotational uncertainties about the two orthogonal axes perpendicular to the X-ray beam; see §3.3 below. By expressing rh explicitly as a function of these rotations, the scaling equation (8) can be modified to include these rotations as free parameters. The necessary equation is:
$$r_h = \left\| R_y R_x A^{*} \mathbf{h} + \mathbf{s}_0 \right\| - 1/\lambda, \qquad (9)$$

where Ry and Rx are matrices describing rotational perturbations through angles θy and θx about orthogonal axes y and x perpendicular to the beam, A* is the orientation matrix determined by indexing (Sauter et al., 2006), also known as the UB matrix (Busing & Levy, 1967), h is the Miller index of the reflection, and s0 is the vector describing the travel direction of the X-ray beam with length 1/λ. With the final set of free parameters being G, B, θx and θy, this adjustment of the crystal orientation to optimize agreement between reference intensities and the corrected measured intensities is similar to other postrefinement protocols used for both classical rotation photography (Winkler et al., 1979; Rossmann et al., 1979) and XFEL scaling (White, 2014; Kabsch, 2014).
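To make the dependence of rh on the two rotational perturbations concrete, the following sketch evaluates the distance of a reciprocal lattice point from the Ewald sphere for given tilt angles. It is a minimal illustration under stated assumptions, not the cctbx.xfel code: the beam is taken along +z, the Ewald sphere is centered at −s0, the perturbation about x is applied before that about y (the order is immaterial to first order for small angles), and all function names are hypothetical.

```python
import math

def rot_x(t):
    """Rotation matrix for angle t (radians) about the x axis."""
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_y(t):
    """Rotation matrix for angle t (radians) about the y axis."""
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def mat_vec(m, v):
    """Product of a 3x3 matrix (nested lists) with a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def ewald_distance(a_star, hkl, wavelength, theta_x=0.0, theta_y=0.0):
    """Signed distance r_h of the perturbed lattice point from the Ewald sphere.

    a_star: 3x3 reciprocal-space orientation matrix (1/Angstrom);
    hkl: Miller index; wavelength in Angstrom; angles in radians.
    """
    q = mat_vec(a_star, hkl)
    q = mat_vec(rot_x(theta_x), q)
    q = mat_vec(rot_y(theta_y), q)
    inv_lambda = 1.0 / wavelength
    s0 = (0.0, 0.0, inv_lambda)  # ASSUMPTION: beam travels along +z
    return math.sqrt(sum((q[i] + s0[i]) ** 2 for i in range(3))) - inv_lambda

# Example: cubic 10 Angstrom cell, reflection (1,0,0), 1 Angstrom radiation
a_star = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.1]]
r_untilted = ewald_distance(a_star, (1, 0, 0), 1.0)
r_tilted = ewald_distance(a_star, (1, 0, 0), 1.0, theta_y=math.asin(0.05))
```

In this example the untilted point lies slightly outside the sphere, and a tilt of about 2.9° about y brings it onto the sphere. A least-squares routine that treats the two angles as free parameters alongside G and B would call such a function for every reflection on the image at each iteration.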
2.3. Comparison of data processing protocols
The thermolysin data were processed five times to assess the relative effects of differing protocols (Table 1). During the integration step, the lattice models were either truncated (Hattne et al., 2014) at a resolution limit, separate for each lattice, where integrated intensity measurements fell below a threshold value (protocols 4, 6 and 7POST); or the data were integrated to a fixed limit of 2.2 Å (6F and 7F,POST). In either case, negative measurements were removed before the data from separate lattices were scaled together. Scaling was performed either by finding a simple scaling constant to fit structure-factor intensities from each lattice to full calculated intensities based on PDB entry 2tli (4, 6 and 6F) as previously described (Hattne et al., 2014), or by the postrefinement protocol of §2.2 (7POST and 7F,POST). Once duplicate measurements were merged globally over the whole data set, intensity distribution statistics were calculated with phenix.xtriage (Zwart et al., 2005). The previously published XFEL thermolysin structure (PDB entry 4ow3) was re-refined against the newly processed data with phenix.refine (Afonine et al., 2012), and anomalous difference Fourier peak heights analyzed with phenix (Adams et al., 2010). Likelihood-weighted maps displayed with Coot (Emsley et al., 2010) are shown in Fig. S1. Correlation coefficients of these maps to a 1.65 Å reference model (from synchrotron-based data, PDB entry 2tlx) were determined after rigid-body refinement of the 2tlx model into the XFEL unit cell. Separately, in order to assess the ability to perform automated model building, the structure was solved by molecular replacement against 4ow3 with phaser (McCoy et al., 2007). Molecular replacement phasing information was combined with single-wavelength anomalous differences with phenix.autosol (Terwilliger et al., 2009), and fully automated fitting of the amino acid sequence was performed with phenix.autobuild (Terwilliger et al., 2008).
3. Results and discussion
3.1. Shot-to-shot heterogeneity is intrinsic to the data
Heterogeneity in XFEL-based serial crystallographic images is a necessary consequence of physical properties such as the mosaic structure that varies among crystals, and pulse-to-pulse differences in incident flux and the volume of crystal/beam intersection. Shot-to-shot variation in the limiting resolution has been previously noted for microcrystal populations of two proteins: PSII and thermolysin (Kern et al., 2013, 2014; Hattne et al., 2014). These reports, using data processed with cctbx.xfel, were based on the examination of Wilson plots (integrated Bragg spot intensity versus diffraction angle bin), to identify the limiting resolution where average intensity falls below the average counting-statistics noise. This analysis, however, does not convey whether the resolution limits are determined by actual falloff of the recorded spot intensities or by artifacts produced by the integration algorithm.
Fig. 3(a) confirms that the resolution limit variation is indeed intrinsic to the recorded data. The horizontal axis (scatter plot and histogram) reports the distribution of resolution limits judged by a spotpicking algorithm (Zhang et al., 2006). After removal of untrusted pixels and subtraction of local background, the signal is judged by whether the intensity exceeds local variance by a given threshold; the resulting population of Bragg spot candidates is then plotted as a function of diffraction angle and a uniform cutoff criterion applied over the whole data set. Resolution cutoffs determined this way are therefore independent of all the ensuing data processing details such as indexing (discovery of basis vector candidates), choice of basis vectors to form the model unit cell, application of symmetry constraints, and choice of algorithms for spot prediction and signal integration.
However, once the integrated intensities are analyzed with a Wilson plot, the resolution cutoffs of the integrated data [Fig. 3(a), vertical axis] are well correlated with those determined simply on the basis of spotpicking [correlation coefficient r = 89% for Fig. 3(a)], ruling out any distortion arising from data processing. Indeed, the lattice model used for data integration can be used to push beyond the limits of the spotfinder to some extent: for two-thirds of the images plotted (Fig. 3a), the cctbx.xfel integration limit is above the diagonal, and therefore the model is finding signal that is missed by straightforward spotpicking.
Given this successful result, why not simply ignore the resolution cutoffs altogether, use the lattice model to predict spot positions out to the corner of the detector, and thereby take full advantage of the weak measurements when ultimately the duplicate measurements are merged over the whole data set to produce structure factors? Indeed, it is widely recognized (e.g. Weiss, 2001) that high-quality reduced data can be obtained by merging numerous multiplicitous measurements. The argument against this proposition is that it supposes that the error model for the weak high-resolution spots is well characterized and suitably random, which is a requirement for merging data (Borek et al., 2003). The following sections (§§3.2–3.3) show that there are large non-random systematic uncertainties in the model. Moreover, while the spotpicking Bragg candidates offer a built-in validation of the model, the uncertainties are poorly characterized at the highest resolutions that are beyond the spotpicking limit.
3.2. The positional accuracy of the model is resolution-dependent
The CSPAD imaging detector at LCLS (Hart et al., 2012) is designed to fulfil stringent requirements: signal is integrated over a 50 fs X-ray pulse, readout is performed at the pulse repetition rate of 120 Hz, and operation is in vacuum. A large detection area is achieved by tiling 32 rectangular silicon sensors; however, this geometrical arrangement also creates the problem of knowing the sensors' relative positions (metrology) to subpixel accuracy. As reported earlier (Hattne et al., 2014), cctbx.xfel can determine or validate the tile displacements to within 0.1 pixel. It compares the positions of Bragg spots observed by the spotfinder with those predicted by the lattice model, and performs iterative non-linear least-squares parameter refinement over tile positions and lattice model parameters. Once the tile positions (and rotations) have been corrected, one can investigate the residual displacement errors of the bright Bragg spots (Fig. 3b).
Fig. 3(b) indicates that the positional error of the model increases at higher resolutions; this is evident both for individual images (blue traces), and for aggregate positional errors over the whole dataset (red curve). While positional uncertainty is quite manageable at 10 Å d-spacings (0.3 pixels), it becomes problematic (1.0 pixel) at 2.7 Å. Several factors may combine to cause this effect. Firstly, there is a positional error, potentially 1 pixel or greater, due to the assumption of monochromaticity. In reality the X-ray pulses at LCLS have a stochastic spectrum with ∼0.5% bandpass (Emma et al., 2010). Ideally the model could be augmented with a spot prediction algorithm that determines the wavelength range satisfying Bragg's law separately for each reflection (Hattne et al., 2014), thus taking the bandpass into account when predicting the 2θ diffraction angle. Secondly, the thickness of the silicon sensor (0.5 mm for the CSPAD) introduces a differential parallax effect, which again is potentially correctable (Hülsen et al., 2005). These phenomena affect spots' radial positions, and indeed we observe that the radial displacement is the largest component of the positional error (data not shown).
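The size of the radial smear implied by a finite bandpass can be estimated directly from Bragg's law. The sketch below computes the full-width radial spread on a flat detector normal to the beam for a given d-spacing. The sample-to-detector distance of 129 mm and pixel size of 0.11 mm are borrowed from the simulated geometry of §2.1 purely as illustrative assumptions (the distance used for the thermolysin experiment is not stated here), and the function name is hypothetical.

```python
import math

def radial_spread_pixels(d, wavelength=1.269, bandpass=0.005,
                         distance_mm=129.0, pixel_mm=0.11):
    """Full-width radial smear (pixels) of a Bragg spot due to bandpass.

    d and wavelength in Angstrom; bandpass is the fractional full width
    (delta lambda / lambda). Assumes a flat detector normal to the beam.
    """
    def radius_mm(lam):
        two_theta = 2.0 * math.asin(lam / (2.0 * d))  # Bragg's law
        return distance_mm * math.tan(two_theta)

    lam_lo = wavelength * (1.0 - bandpass / 2.0)
    lam_hi = wavelength * (1.0 + bandpass / 2.0)
    return (radius_mm(lam_hi) - radius_mm(lam_lo)) / pixel_mm

# Compare a low-resolution and a high-resolution reflection
spread_low_res = radial_spread_pixels(10.0)
spread_high_res = radial_spread_pixels(2.7)
```

Under these assumed parameters the smear is below one pixel at 10 Å and spans several pixels at 2.7 Å. This is a full-width estimate, so it bounds rather than predicts the centroid error, but it reproduces the trend that the monochromatic approximation is least adequate at high resolution.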
Regardless of the cause of the positional displacement shown in Fig. 3(b), values exceeding 1 pixel could significantly degrade the intensities, considering that a typical spot area is 5 square pixels (Hattne et al., 2014), and in view of cctbx.xfel's practice of constructing tightly conforming integration masks based on nearby bright spots. Rather than explicitly determining the uncertainty for each modeled spot at high resolution, cctbx.xfel currently takes the easier route of using the falloff of the Wilson plot as a proxy for uncertainty, and simply cutting off the integrated intensities past the apparent resolution limit. Other approaches to downweighting outlier data may be possible; for example, one of the 20 lattices plotted in Fig. 3(b) has positional displacements exceeding 2 pixels, which should probably disqualify it from the subsequent data merging process. Filtering individual lattices based on positional displacement rather than I/σ(I) falloff might offer a way to preserve weak high-resolution signal in the final merged intensities.
3.3. Resolution-dependent model quality, due to misorientation, affects map features
An inherent concern with still shots is that the orientation of the crystal is not uniquely determined by measuring the Bragg spot positions. Only one of the three rotational degrees of freedom is directly coupled to spot positions, namely the rotation Rz around the axis parallel to the beam. The other two rotations (Rxy) move reciprocal lattice points in and out of the reflecting condition, but do not change the direction of the diffracted rays. It has been possible to improve the outcome by placing an additional restraint on the refinement of the orientational model. Specifically, one can rotate the model lattice, while minimizing the deviations between the observed reciprocal lattice points and the Ewald sphere, with deviations being expressed either as distances (Kabsch, 2014) or rotational angles (Sauter et al., 2014).
The effect of these restraints can be directly gauged by considering simulated diffraction data. Fig. 4 shows that, for 1000 simulated 3.3 Å PSI diffraction patterns in random orientations, lattice models refined against spot positions alone have residual Rxy misorientations up to 0.3° (Fig. 4a); while applying the angular restraint brings most Rxy misorientations to below 0.05° (Fig. 4b). A misoriented lattice model can have a dramatic effect on spot predictions (Fig. 4a). Improperly oriented model lattices place the observed reciprocal lattice points far away from the Ewald sphere; thus the mosaicity parameter must be adjusted upward so that the predicted spot pattern can cover all the observations. As illustrated in Fig. 4(a), this has the unwanted effect of creating false predictions for numerous spots that are not actually recorded in the image. Furthermore, previous work (Sauter et al., 2014) with the PSI simulation shows that the fraction of spots predicted falsely increases with resolution. In parallel, if the experimental thermolysin data are processed with a protocol that omits the angular restraint and thus is believed to allow numerous false spot predictions, the ability to distinguish the Zn2+ anomalous difference Fourier peak is markedly decreased (Table 1, compare protocols 4 and 6). All these results provide further argument for cutting off integrated intensities at the resolution suggested by the Wilson plot, thereby guarding against the chance that any given measurement is errantly modeled due to lattice misorientation.
3.4. Direct test of the I/σ(I) cutoff
The preceding two sections raise cautions about the systematic errors present in high resolution data. Accordingly, the program cctbx.xfel has been implemented with the option of applying a separate resolution cutoff to individual lattices, reasoning that, for the highest-resolution bins where I/σ(I) falls below a particular threshold, the data integration model has probably diverged too much for the intensities to be useful (Hattne et al., 2014). However, as recent literature has highlighted the pitfalls of discarding data (Karplus & Diederichs, 2012; Diederichs & Karplus, 2013), Table 1 presents a direct comparison between thermolysin data processed with an I/σ(I)-dependent cutoff (protocols 6 and 7POST) and data processed with a fixed cutoff of 2.2 Å (6F and 7F,POST). As expected, the inclusion of more weak high-resolution data dramatically increases the average multiplicity of observations, as well as increasing the fraction of negative observations due to the poor quality of the high resolution models. Notably, however, the inclusion of more data also increases the height of the Zn2+ anomalous difference Fourier peak, suggesting that there is value in preserving the high resolution information. As noted in §3.2, it is worth developing alternative methods that would include more data, but yet still account for the known systematic errors such as positional displacement.
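The idea of a per-lattice I/σ(I) cutoff can be shown with a small sketch. This is a simplified stand-in, not the algorithm used by cctbx.xfel: reflections are sorted from low to high resolution, grouped into bins of equal count, and the cutoff is placed at the last bin whose mean I/σ(I) still meets the threshold. The function name, the binning scheme and the default threshold are all assumptions made for illustration.

```python
def resolution_cutoff(reflections, n_bins=10, threshold=1.0):
    """Return the d-spacing (Angstrom) at which mean I/sigma drops below threshold.

    reflections: iterable of (d, intensity, sigma) tuples for ONE lattice.
    Returns None if even the lowest-resolution bin fails the threshold.
    """
    ordered = sorted(reflections, key=lambda r: -r[0])  # low resolution first
    bin_size = max(1, len(ordered) // n_bins)
    cutoff = None
    for start in range(0, len(ordered), bin_size):
        chunk = ordered[start:start + bin_size]
        mean_i_over_sigma = sum(i / s for _, i, s in chunk) / len(chunk)
        if mean_i_over_sigma < threshold:
            break  # stop at the first bin that falls below the threshold
        cutoff = chunk[-1][0]  # smallest d-spacing in the last passing bin
    return cutoff

# Synthetic lattice: strong signal to 3.5 Angstrom, noise beyond
strong = [(d, 10.0, 1.0) for d in (10.0, 8.0, 6.0, 5.0, 4.0, 3.5)]
weak = [(d, 0.2, 1.0) for d in (3.0, 2.8, 2.6, 2.4, 2.3, 2.2)]
limit = resolution_cutoff(strong + weak, n_bins=4, threshold=1.0)
```

For this synthetic lattice the cutoff lands at 3.5 Å. A fixed-limit protocol such as 6F would instead keep every reflection to 2.2 Å, which is the comparison made in Table 1.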
3.5. Modeling the partiality
Even with utmost care given to choose data based on the significance level of the signal, a large inherent uncertainty remains with all still-shot data, due to the partial nature of the recorded intensities. This uncertainty is not present for rotation photography, where well established methods exist (Rossmann et al., 1979; Winkler et al., 1979) to quantify the spot partiality based on the volume of the reciprocal lattice point (rlpQ) swept up by the Ewald sphere due to rotation. However, this is not a useful measure for still shots where, in the absence of rotation, the swept-up volume due to rotation is always zero.
Two factors are directly relevant when considering spot partiality on still shots. First, due to crystal imperfection (Nave, 1998; Bellamy et al., 2000; Helliwell, 2005), the reciprocal lattice point itself is spread out into a finite volume, therefore it has a finite intersection area with the Ewald sphere even though the swept-up volume is zero. Secondly, due to the dispersion and divergence of the beam, one must consider a family of Ewald spheres of different radii (to account for dispersion; Hattne et al., 2014) and radius vector direction (to account for divergence). This degeneracy does sweep out a volume of the reciprocal lattice point, as has been discussed (White, 2014). In this paper, the focus is exclusively on the component of partiality due to crystal imperfection, as it seems a reasonable starting point. Many still datasets are acquired on endstations with negligible divergence, such as the LCLS/CXI 1 µm focus. While beam dispersion has been large (∼0.5%) for many XFEL datasets (Emma et al., 2010), it is also possible to acquire stills at synchrotron sources where the energy bandpass is potentially less than 10⁻⁴, and it is now possible to create seeded XFEL beams with similarly narrow bandpasses (Amann et al., 2012).
Therefore, a correction for partiality based on a monochromatic zero-divergence model is described in equation (6). Partiality is related to the finite width of the spot, due to the underlying mosaic disorder in the crystal that is modeled by two parameters: an effective mosaic block size Deff and an effective full-width mosaic angular spread η. Intensity measurements are corrected for partiality in combination with scaling and postrefinement [equations (8) and (9)]. Despite the simple assumption of monochromaticity, this treatment notably improves the XFEL thermolysin data, which were collected with a non-monochromatic source (Table 1, compare protocols 6 and 7POST). The multiplicity of observation decreases, due to many reciprocal lattice points being classified as lying too far from the Ewald sphere, thus discarding a set of measurements that have no signal. The crystallographic R-factors improve, and the significance level of the anomalous difference Fourier peak for the Zn increases from 5.8σ to 7.4σ. These effects depend on performing postrefinement [equation (9)] to determine the optimal crystal orientation for calculating partiality; no improvement is observed unless the partiality correction is combined with postrefinement (data not shown).
Statistics indicating the quality of the merged structure factors (Padilla & Yeates, 2003) also show that the partiality correction (with postrefinement) alters the intensity distribution so as to conform better with theoretical expectation (Table 1 and Fig. 5). Synchrotron datasets have long been judged by their intensity distributions (Wilson, 1949; French & Wilson, 1978; Stein, 2007). It would be useful if such metrics could also be applied to judge the quality of XFEL data. However, the present comparison shows that distributions of the L and Z statistics (defined in Fig. 5) are highly dependent on the data processing procedures, and that, while accounting for partiality helps, the optimal protocol has not yet been achieved. One straightforward avenue for improvement would be to incorporate known spectral dispersion information into the partiality calculation. XFEL pulses, in particular the self-amplified spontaneous emission (SASE) pulses in ordinary use, have complex and stochastic spectra, but it has been possible to measure these spectra on a shot-by-shot basis (Zhu et al., 2012). For future datasets where the incident spectra I0(E) are routinely available, one could perform a weighted summation over the entire bandpass to obtain the polychromatic partiality,

$$P_{\mathrm{poly}} = \frac{\sum_E I_0(E)\,P(r_h, E)}{\sum_E I_0(E)}, \qquad (10)$$
$$P_{\mathrm{poly}} = \frac{\sum_{E} I_0(E)\, P(r_h, E)}{\sum_{E} I_0(E)},$$
where the summations are performed over all energy increments within the measured spectrum, and the functional dependence of P(rh,E) is explicitly stated to emphasize that the Ewald-sphere distances rh are dependent on energy. Spectral measurements are not available for the thermolysin data presented here; however, other datasets that are linked to spectral information are under investigation.
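The spectrum-weighted summation can be sketched as follows. The example assumes a sampled spectrum I0(E) on a grid of photon energies, computes the energy-dependent Ewald-sphere distance rh from standard reciprocal-space geometry, and uses a placeholder Lorentzian-like function for the monochromatic partiality. The names `ewald_distance`, `mono_partiality` and `poly_partiality`, the parameter `r_s` (an effective reciprocal lattice point radius) and all numerical values are assumptions introduced for illustration; in particular `mono_partiality` stands in for equation (6) and does not reproduce it.

```python
import numpy as np

HC_EV_ANGSTROM = 12398.42  # photon energy (eV) multiplied by wavelength (Angstrom)

def ewald_distance(q, energy_ev, beam_dir=(0.0, 0.0, 1.0)):
    """Signed distance r_h (1/Angstrom) of reciprocal lattice point q from
    the Ewald sphere for photons of the given energy."""
    inv_wavelength = energy_ev / HC_EV_ANGSTROM
    s0 = inv_wavelength * np.asarray(beam_dir, dtype=float)  # incident wavevector
    # q lies on the sphere when |q + s0| equals 1/wavelength.
    return np.linalg.norm(np.asarray(q, dtype=float) + s0) - inv_wavelength

def mono_partiality(r_h, r_s):
    """Placeholder monochromatic partiality (assumed Lorentzian-like form)."""
    return r_s**2 / (2.0 * r_h**2 + r_s**2)

def poly_partiality(q, energies_ev, spectrum, r_s):
    """Spectrum-weighted average of the monochromatic partiality."""
    spectrum = np.asarray(spectrum, dtype=float)
    p = np.array([mono_partiality(ewald_distance(q, e), r_s) for e in energies_ev])
    return float(np.sum(spectrum * p) / np.sum(spectrum))

# Hypothetical reflection placed exactly on the Ewald sphere at 9000 eV.
e0 = 9000.0
k = e0 / HC_EV_ANGSTROM
q = np.array([0.3, 0.0, np.sqrt(k**2 - 0.3**2) - k])

# Hypothetical Gaussian spectrum sampled in 1 eV increments.
energies = np.linspace(e0 - 20.0, e0 + 20.0, 41)
spectrum = np.exp(-0.5 * ((energies - e0) / 8.0) ** 2)

p_mono = poly_partiality(q, [e0], [1.0], r_s=0.0002)
p_poly = poly_partiality(q, energies, spectrum, r_s=0.0002)
print(p_mono, p_poly)
```

With a single energy the weighted sum reduces to the monochromatic value, whereas a broad spectrum lowers the partiality of a reflection that is in exact diffracting condition only at the central energy.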
4. Conclusions
To the knowledge of the author, this is the first literature presentation of experimentally measured XFEL still-shot diffraction data that are explicitly corrected for partiality (albeit with the simplified assumption of monochromaticity), and modeled with a lattice that is oriented by postrefinement. Equation (6), the expression for still-shot partiality, is similar to equation (40) in a recent paper from Kabsch (2014), in that both rest on the assumption of monochromaticity. However, the Kabsch paper does not include the effect of mosaic block size [equation (2)], which makes a resolution-independent contribution to the size of reciprocal lattice points, necessary for optimal modeling of still data (Sauter et al., 2014) if the block size is small. The equation (6) approach differs substantially from that used by White (2014), as that paper defines partiality in terms of the fractional immersion of a reciprocal lattice point between two Ewald spheres of different wavelengths, representing the high- and low-energy limits of the XFEL spectrum.
While no attempt is made here to comparatively evaluate these three partiality and postrefinement methodologies, it is clear that, as a general principle, algorithm choices must rely on objective metrics that measure the quality of the result. Examples of data processing quality metrics include the r.m.s. displacement between observed and modeled Bragg spot positions (and its resolution dependence), statistics that rely on the moments of the intensity distribution (Stein, 2007), local L-statistics (Padilla & Yeates, 2003), crystallographic R-factors, and the height of anomalous difference Fourier peaks for metal sites.
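Several of these metrics are simple to compute. The sketch below evaluates the r.m.s. displacement between observed and predicted spot positions, the second moment of the intensity distribution, and the mean absolute L value, where L = (I1 − I2)/(I1 + I2). For ideal untwinned acentric data the expected values are ⟨I²⟩/⟨I⟩² = 2 and ⟨|L|⟩ = 1/2. The function names are hypothetical, and the simulated intensities are independent exponential draws paired at random; a real calculation would pair reflections that are close in reciprocal space.

```python
import numpy as np

def rms_displacement(observed_xy, predicted_xy):
    """R.m.s. distance between observed and predicted spot positions."""
    diff = np.asarray(observed_xy, dtype=float) - np.asarray(predicted_xy, dtype=float)
    return float(np.sqrt(np.mean(np.sum(diff**2, axis=1))))

def second_moment(intensities):
    """<I^2>/<I>^2, expected to be 2 for ideal acentric data."""
    i = np.asarray(intensities, dtype=float)
    return float(np.mean(i**2) / np.mean(i) ** 2)

def mean_abs_l(i1, i2):
    """<|L|> with L = (I1 - I2)/(I1 + I2), expected to be 0.5 for ideal
    untwinned acentric data."""
    i1 = np.asarray(i1, dtype=float)
    i2 = np.asarray(i2, dtype=float)
    return float(np.mean(np.abs((i1 - i2) / (i1 + i2))))

# Simulated ideal acentric intensities (exponential distribution); the two
# rows are paired as a stand-in for locally related reflections.
rng = np.random.default_rng(0)
ideal = rng.exponential(scale=1.0, size=(2, 100_000))

moment = second_moment(ideal[0])
abs_l = mean_abs_l(ideal[0], ideal[1])
rmsd = rms_displacement([[0.0, 0.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]])
print(moment, abs_l, rmsd)
```

Deviations of merged data from the ideal values give a model-free indication of how far the intensity distribution departs from theoretical expectation.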
Thermolysin is an informative case for testing the potential of still-shot crystallography. It is possible to phase the structure with synchrotron data using SAD phasing, from the single Zn metal site (Ferrer et al., 2013). However, the best XFEL thermolysin data (giving an 18σ anomalous difference Fourier peak out to 1.8 Å resolution) fall short of the phasing power needed for a SAD structure solution (Kern et al., 2014). Only a single SAD-phased XFEL structure has been published (of lysozyme; Barends et al., 2013b), yet the usefulness of XFEL techniques may depend on whether they can be utilized generally to solve new macromolecular structures, and gain high-resolution information on systems that would otherwise be damaged at synchrotron sources. Data processing strategies that help correct specific issues such as partial measurements and the heterogeneous distribution of resolution limits will hopefully lead to more favorable structural outcomes.
5. Software availability
The partiality correction and postrefinement procedures described here are incorporated into cctbx.xfel (https://cci.lbl.gov/xfel) and are available as a command line option in the cxi.merge program component.
Supporting information
Figure S1: Likelihood-weighted electron density maps. DOI: 10.1107/S1600577514028203/xh5046sup1.pdf
Footnotes
1 The earlier paper (Sauter et al., 2014) treats η as a rotational spread about the origin O of reciprocal space, causing rlpQ to be shaped as a spherical cap of radius 1/d and maximum half-angle η/2. Here, in contrast, equation (3) implies that rlpQ is shaped like a ball, not a spherical cap. This is chosen only to simplify the derivation of partiality, not to indicate a preference for a particular physical model (Nave, 1998; Juers et al., 2007) to describe the crystal.
2 Scaling with a set of reference intensities has been used in more traditional crystallographic settings to extract weak signal from long wavelength anomalous diffraction experiments (Mueller-Dieckmann et al., 2004). Moreover, it can be shown with XFEL data that a scaling reference introduces no intensity bias, by scaling XFEL lysozyme measurements (CXIDB accession ID 17) against an isomorphous lysozyme structure containing the alanine truncation mutant E35A (PDB entry 3ok0). After molecular replacement (with the correct structure 4et8 used as the search model) followed by refinement, the likelihood-weighted electron density map shows perfectly normal signal for glutamic acid 35, proving that the false scaling model does not distort the information content of the experimental intensities. Furthermore, if the 3ok0 structure is used for phasing instead of 4et8, one sees positive difference density for the glutamate side chain at 4 standard deviations, indicating that the intensities contain sufficient signal to overcome the phase bias introduced by the incorrect 3ok0 phasing model.
3 Supporting information for this paper is available from the IUCr electronic archives (Reference: XH5046).
Acknowledgements
I thank James M. Holton (Lawrence Berkeley National Laboratory) for making available both the PSI simulated data and the program fastBragg (https://bl831.als.lbl.gov/~jamesh/fastBragg), and for suggesting the functional form of the partiality correction, Peter Zwart and Paul Adams (LBNL) for technical discussions, and Helen Ginn and David Stuart (Oxford University), as well as Monarin Uervirojnangkoorn, William Weis and Axel Brunger (Stanford University) for discussing their separate work on partiality and postrefinement. This work was supported by NIH grants GM095887 and GM102520 and Director, Office of Science, Department of Energy (DOE) under contract DE-AC02-05CH11231 for data processing methods (NKS).
References
Adams, P. D., Afonine, P. V., Bunkóczi, G., Chen, V. B., Davis, I. W., Echols, N., Headd, J. J., Hung, L.-W., Kapral, G. J., Grosse-Kunstleve, R. W., McCoy, A. J., Moriarty, N. W., Oeffner, R., Read, R. J., Richardson, D. C., Richardson, J. S., Terwilliger, T. C. & Zwart, P. H. (2010). Acta Cryst. D66, 213–221.
Afonine, P. V., Grosse-Kunstleve, R. W., Echols, N., Headd, J. J., Moriarty, N. W., Mustyakimov, M., Terwilliger, T. C., Urzhumtsev, A., Zwart, P. H. & Adams, P. D. (2012). Acta Cryst. D68, 352–367.
Alonso-Mori, R. et al. (2012). Proc. Natl Acad. Sci. USA, 109, 19103–19107.
Amann, J. et al. (2012). Nat. Photon. 6, 693–698.
Barends, T. R. M. et al. (2013a). Acta Cryst. D69, 838–842.
Barends, T. R. M., Foucar, L., Botha, S., Doak, R. B., Shoeman, R. L., Nass, K., Koglin, J. E., Williams, G. J., Boutet, S., Messerschmidt, M. & Schlichting, I. (2013b). Nature (London), 505, 244–247.
Bellamy, H. D., Snell, E. H., Lovelace, J., Pokross, M. & Borgstahl, G. E. O. (2000). Acta Cryst. D56, 986–995.
Bolotovsky, R., Steller, I. & Rossmann, M. G. (1998). J. Appl. Cryst. 31, 708–717.
Borek, D., Minor, W. & Otwinowski, Z. (2003). Acta Cryst. D59, 2031–2038.
Boutet, S. & Williams, G. J. (2010). New J. Phys. 12, 035024.
Boutet, S. et al. (2012). Science, 337, 362–364.
Bragg, W. L., James, R. W. & Bosanquet, C. H. (1921). Philos. Mag. Ser. 6, 41, 309–337.
Busing, W. R. & Levy, H. A. (1967). Acta Cryst. 22, 457–464.
Chapman, H. N. et al. (2011). Nature (London), 470, 73–77.
Chen, V. B., Arendall, W. B. III, Headd, J. J., Keedy, D. A., Immormino, R. M., Kapral, G. J., Murray, L. W., Richardson, J. S. & Richardson, D. C. (2010). Acta Cryst. D66, 12–21.
Cohen, A. E. et al. (2014). Proc. Natl Acad. Sci. USA, 111, 17122–17127.
Diederichs, K. & Karplus, P. A. (2013). Acta Cryst. D69, 1215–1222.
Emma, P. et al. (2010). Nat. Photon. 4, 641–647.
Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. (2010). Acta Cryst. D66, 486–501.
Ferrer, J.-L., Larive, N. A., Bowler, M. W. & Nurizzo, D. (2013). Exp. Opin. Drug. Discov. 8, 835–847.
Fox, G. C. & Holmes, K. C. (1966). Acta Cryst. 20, 886–891.
French, S. & Wilson, K. (1978). Acta Cryst. A34, 517–525.
Hamilton, W. C., Rollett, J. S. & Sparks, R. A. (1965). Acta Cryst. 18, 129–130.
Hart, P. et al. (2012). Proc. SPIE, 8504, 85040C.
Hattne, J. et al. (2014). Nat. Methods, 11, 545–548.
Helliwell, J. R. (2005). Acta Cryst. D61, 793–798.
Hülsen, G., Brönnimann, C. & Eikenberry, E. F. (2005). Nucl. Instrum. Methods Phys. Res. A, 548, 540–554.
Juers, D. H., Lovelace, J., Bellamy, H. D., Snell, E. H., Matthews, B. W. & Borgstahl, G. E. O. (2007). Acta Cryst. D63, 1139–1153.
Kabsch, W. (2014). Acta Cryst. D70, 2204–2216.
Karplus, P. A. & Diederichs, K. (2012). Science, 336, 1030–1033.
Kern, J. et al. (2012). Proc. Natl Acad. Sci. USA, 109, 9721–9726.
Kern, J. et al. (2013). Science, 340, 491–495.
Kern, J. et al. (2014). Nat. Commun. 5, 4371.
Kirian, R. A., Wang, X., Weierstall, U., Schmidt, K. E., Spence, J. C. H., Hunter, M., Fromme, P., White, T., Chapman, H. N. & Holton, J. (2010). Opt. Express, 18, 5713–5723.
Kirian, R. A., White, T. A., Holton, J. M., Chapman, H. N., Fromme, P., Barty, A., Lomb, L., Aquila, A., Maia, F. R. N. C., Martin, A. V., Fromme, R., Wang, X., Hunter, M. S., Schmidt, K. E. & Spence, J. C. H. (2011). Acta Cryst. A67, 131–140.
Liu, W. et al. (2013). Science, 342, 1521–1524.
Lomb, L. et al. (2011). Phys. Rev. B, 84, 214111.
McCoy, A. J., Grosse-Kunstleve, R. W., Adams, P. D., Winn, M. D., Storoni, L. C. & Read, R. J. (2007). J. Appl. Cryst. 40, 658–674.
Mueller-Dieckmann, C., Polentarutti, M., Djinovic Carugo, K., Panjikar, S., Tucker, P. A. & Weiss, M. S. (2004). Acta Cryst. D60, 28–38.
Nave, C. (1998). Acta Cryst. D54, 848–853.
Nave, C. (2014). J. Synchrotron Rad. 21, 537–546.
Owen, R. L., Rudiño-Piñera, E. & Garman, E. F. (2006). Proc. Natl Acad. Sci. USA, 103, 4912–4917.
Padilla, J. E. & Yeates, T. O. (2003). Acta Cryst. D59, 1124–1130.
Redecke, L. et al. (2013). Science, 339, 227–230.
Rossmann, M. G., Leslie, A. G. W., Abdel-Meguid, S. S. & Tsukihara, T. (1979). J. Appl. Cryst. 12, 570–581.
Sauter, N. K., Grosse-Kunstleve, R. W. & Adams, P. D. (2006). J. Appl. Cryst. 39, 158–168.
Sauter, N. K., Hattne, J., Brewster, A. S., Echols, N., Zwart, P. H. & Adams, P. D. (2014). Acta Cryst. D70, 3299–3309.
Sawaya, M. R. et al. (2014). Proc. Natl Acad. Sci. USA, 111, 12769–12774.
Sierra, R. G. et al. (2012). Acta Cryst. D68, 1584–1587.
Stein, N. (2007). CCP4 Newsl. 47, contribution 9.
Suga, M., Akita, F., Hirata, K., Ueno, G., Murakami, H., Nakajima, Y., Shimizu, T., Yamashita, K., Yamamoto, M., Ago, H. & Shen, J.-R. (2015). Nature (London), 517, 99–103.
Tenboer, J. et al. (2014). Science, 346, 1242–1246.
Terwilliger, T. C., Adams, P. D., Read, R. J., McCoy, A. J., Moriarty, N. W., Grosse-Kunstleve, R. W., Afonine, P. V., Zwart, P. H. & Hung, L.-W. (2009). Acta Cryst. D65, 582–601.
Terwilliger, T. C., Grosse-Kunstleve, R. W., Afonine, P. V., Moriarty, N. W., Zwart, P. H., Hung, L.-W., Read, R. J. & Adams, P. D. (2008). Acta Cryst. D64, 61–69.
Weierstall, U. et al. (2014). Nat. Commun. 5, 3309.
Weiss, M. S. (2001). J. Appl. Cryst. 34, 130–135.
White, T. A. (2014). Philos. Trans. R. Soc. London B, 369, 20130330.
Wilson, A. J. C. (1949). Acta Cryst. 2, 318–321.
Winkler, F. K., Schutt, C. E. & Harrison, S. C. (1979). Acta Cryst. A35, 901–911.
Yano, J., Kern, J., Irrgang, K. D., Latimer, M. J., Bergmann, U., Glatzel, P., Pushkar, Y., Biesiadka, J., Loll, B., Sauer, K., Messinger, J., Zouni, A. & Yachandra, V. K. (2005). Proc. Natl Acad. Sci. USA, 102, 12047–12052.
Zhang, Z., Sauter, N. K., van den Bedem, H., Snell, G. & Deacon, A. M. (2006). J. Appl. Cryst. 39, 112–119.
Zhu, D., Cammarata, M., Feldkamp, J. M., Fritz, D. M., Hastings, J. B., Lee, S., Lemke, H. T., Robert, A., Turner, J. L. & Feng, Y. (2012). Appl. Phys. Lett. 101, 034103.
Zwart, P. H., Grosse-Kunstleve, R. W. & Adams, P. D. (2005). CCP4 Newsl. 43, 26–35.
This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.