Optimization of data collection taking radiation damage into account
^{a}EMBL Hamburg Outstation, c/o DESY, Notkestrasse 85b, 22607 Hamburg, Germany, and ^{b}European Synchrotron Radiation Facility, 6 Rue Jules Horowitz, BP220, 38043 Grenoble, France
^{*}Correspondence email: apopov@esrf.fr
To take into account the effects of radiation damage, new algorithms for the optimization of data-collection strategies have been implemented in the software package BEST. The intensity variation related to radiation damage is approximated by log-linear functions of resolution and cumulative X-ray dose. Based on an accurate prediction of the basic characteristics of data yet to be collected, BEST establishes objective relationships between the accessible data completeness, resolution and signal-to-noise statistics that can be achieved in an experiment and designs an optimal plan for data collection.
Keywords: X-ray data collection; protein crystals; BEST software; radiation damage.
1. Introduction
One of the main problems in data collection from macromolecular crystals is X-ray radiation damage to the crystals. Radiation damage is the result of complex physical and chemical processes induced by absorbed X-ray photons (see reviews by Ravelli & Garman, 2006; Garman & Owen, 2006). It occurs at any temperature and leads to a resolution-dependent reduction in diffraction intensity, changes in unit-cell parameters and crystal mosaicity, slight rotations and translations of protein molecules in the lattice, disulfide-bond breaks and decarboxylation of acidic residues (Burmeister, 2000; Weik et al., 2000; Ravelli & McSweeney, 2000). At cryotemperatures a large improvement in the crystal lifetime is obtained compared with that at room temperature (Haas & Rossmann, 1970). Damage at cryogenic temperatures is a function of X-ray dose and shows no significant dose-rate dependence over the range of fluxes available at third-generation synchrotron sources (Sliz et al., 2003). Radiation damage limits the information that can be obtained from a single crystal. It can also induce specific chemical modifications in the protein, which in turn can make the biological interpretations based on such an X-ray experiment problematic (Dubnovitsky et al., 2005).
The effects of radiation damage must be taken into account when designing an optimal data-collection strategy, especially at third-generation synchrotron undulator beamlines, where the empirical `radiation dose limit for cryocooled protein crystals' (Owen et al., 2006) can be reached after a few seconds of irradiation. An incorrect choice of data-collection parameters can easily lead to failure of the experiment.
Here, we present a further development of the methods and of the computer program BEST (Popov & Bourenkov, 2003; Bourenkov & Popov, 2006) for optimal planning of X-ray data collection from macromolecular crystals. The strategy-determination method has been extended to take radiation damage into account. BEST models the statistical results of data collection based on the processing of a few initial images. The radiation-damage model in BEST accounts for both average intensity decay and radiation-induced non-isomorphism; model parameters common to a wide range of macromolecular structures are used in combination with the program RADDOSE for dose-rate calculations (Murray et al., 2004) and, under the assumption that the crystal size is matched to the size of the beam, the method only requires a beamline with calibrated flux. The key feature of the BEST strategy is compensation of the signal loss arising from overall intensity decay by a gradual increase in the exposure time.
2. Overview of the method
The data-collection optimization method in BEST (Popov & Bourenkov, 2003) is based on modelling the data statistics prior to the experiment using the information extracted from a few initial diffraction images. To a certain extent, the algorithm within BEST is analogous to the methods that have been developed to allow the simulation of diffraction patterns (Sarvestani et al., 1998; Holton, 2008; Diederichs, 2009). A number of generalizations and approximations implemented in BEST make it very efficient computationally. The basic ideas are as follows.
i.e. taking into account the dynamic alterations of a structure that are induced by the measurement process.
2.1. Radiation-damage model
2.1.1. Resolution-dependent intensity decay
The change in the scattering power after exposure to a radiation dose D is expressed in our model by a change in the . Fig. 1(a) shows an experimental example of its radial projection, , for two data sets measured from one of our test samples (P19–siRNA1A; see §4.2 for experimental details) and covering the same narrow rotation range (3°) at an effectively zero dose and after an X-ray burn causing absorption of a dose D = 32 MGy. The total dose received by the crystal for each wedge was 0.54 MGy. Following common crystallographic methodology, the functions for a pair of are related by the relative B-factor scaling, with the scale and isotropic B factor being functions of dose,
Fig. 1(b) shows the relative scale and B factors as a function of D determined in a series of such exposures. Here, the crystal was irradiated so that it absorbed a dose of 1.5 MGy between data collections. The example illustrates typical behaviour, characterized by a linear increase in the Debye–Waller factor B(D) = βD, where β is a constant scale factor representing the intensity-decay rate. Such a dependence has been observed in our systematic studies involving a large number of model structures and a variety of irradiation conditions (dose rates at different synchrotrons; Bourenkov et al., 2006). The linearity of the B-factor increase with the dose has been confirmed in an independent study by Thorne and co-workers (Kmetko et al., 2006). Moreover, the decay rates observed in these two investigations of β ≃ 1 Å^{2} MGy^{−1} are also in very close agreement. These results are furthermore in good agreement with the linear decay of the net diffraction intensity in a broad resolution shell (h^{−1} > 2.5 Å) to 50% after a radiation dose of 43 MGy observed by Garman and co-workers (Owen et al., 2006), despite differences in the details of the data analysis. To relate this `radiation dose limit for cryocooled protein crystals' to the B-factor decay model, it is sufficient to integrate the function over a corresponding resolution shell. An extensive discussion unifying many observations supporting this model is given by Holton (2008).
In addition, it is worth noting that the increase in the Debye–Waller factor accounts for more than a tenfold decrease in the scattering power at D = 32 MGy and h^{−1} = 2.5 Å, whereas the change in the relative scale factor is responsible for a decrease of less than 20%. Presumably, the variation in scale factor can be neglected in a statistical model which aims to optimize the collection of high-resolution data.
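The tenfold figure quoted above can be checked numerically. The following is a minimal sketch rather than the BEST implementation: it assumes that h is the reciprocal resolution 1/d, so that a B-factor increase B(D) = βD attenuates intensities by exp(−Bh^{2}/2). This convention and the function name are assumptions made for illustration.

```python
import math

BETA = 1.0  # intensity-decay rate in A^2 per MGy (value quoted in the text)

def relative_intensity(dose_mgy, d_spacing, beta=BETA):
    """Fraction of the initial intensity left at resolution d (in A) after a dose.

    Assumes B(D) = beta * D and an attenuation of exp(-B * h^2 / 2) with h = 1/d.
    """
    h = 1.0 / d_spacing
    return math.exp(-beta * dose_mgy * h * h / 2.0)

remaining = relative_intensity(32.0, 2.5)
print(f"remaining fraction: {remaining:.3f}, "
      f"i.e. a {1.0 / remaining:.0f}-fold decrease")
```

Under these assumptions the intensity at 2.5 Å drops roughly 13-fold after 32 MGy, which is consistent with the `more than tenfold' decrease stated in the text.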
2.1.2. Radiation-induced non-isomorphism
Similar to classical B-factor scaling, radiation-induced non-isomorphism can be described by means of the well known Luzzati model (Luzzati, 1953). The non-isomorphism between two closely related structures, in our case one fresh and one irradiated to absorb a dose D, is modelled by a standard resolution-dependent non-isomorphism parameter σ_{A} (Read, 1986). We denote σ_{B}(h, D) as an expected absolute difference between reflection intensities at different doses and ,
Appropriate renormalization (scaling) of the `damaged crystal' data by a factor is assumed.
In our model, σ_{A} is expressed as an exponential function of both the dose and the resolution, σ_{A}(h, D) = exp(−αDh^{2}/4). The exponential dependence of σ_{A} on the resolution has a direct analogy with methods of σ_{A} modelling in structure refinement and phasing (e.g. Murshudov et al., 1997; de La Fortelle & Bricogne, 1997), where the representation of σ_{A} by a single exponential (as well as B-factor scaling) simply corresponds to the assumption that it is the same number of atoms in both structures that are being related. This assumption holds rather well in our case. The linearity with dose and quantification of the decay parameter α are substantially more difficult to demonstrate experimentally (compared with that shown in the previous section for B factors). This is because the variance represented by σ_{A} is always strongly convoluted with experimental errors and separating the two contributions requires rather elaborate data analysis. We have carried out such an analysis on a large number of model structures (Bourenkov et al., 2006), but the details are beyond the scope of this paper and will be published elsewhere.
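To give a feeling for the magnitude of σ_{A} in this model, the sketch below evaluates σ_{A}(h, D) = exp(−αDh^{2}/4) for a few doses and resolutions. It uses the default α = 0.1 Å^{2} MGy^{−1} quoted in §3 and assumes h = 1/d; the function name and the chosen sample values are illustrative only.

```python
import math

ALPHA = 0.1  # non-isomorphism decay parameter in A^2 per MGy (default quoted in section 3)

def sigma_a(dose_mgy, d_spacing, alpha=ALPHA):
    """sigma_A(h, D) = exp(-alpha * D * h^2 / 4), assuming h = 1/d (d in A)."""
    h = 1.0 / d_spacing
    return math.exp(-alpha * dose_mgy * h * h / 4.0)

for d in (3.0, 2.0, 1.5):
    row = ", ".join(f"D={dose:>4.1f} MGy: {sigma_a(dose, d):.4f}"
                    for dose in (0.0, 10.0, 30.0))
    print(f"d = {d} A -> {row}")
```

With these parameters σ_{A} stays close to unity over the dose range of a typical data set, which illustrates why separating it from experimental errors requires careful analysis.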
For a pair of redundant or symmetry-equivalent observations recorded after absorbed doses D_{1} and D_{2}, we define an exponential model of the as a function of dose and resolution,
which expresses, given a small value of the parameter α, our expectation that for a small increment the two observations will show small radiation-induced differences from each other.
2.2. Optimization of data collection
2.2.1. Signal-to-noise dependence on dose
Let us consider a rotation interval (wedge) Φ of data measured with a constant t_{exp} and Δ_{φ} at a dose rate ρ_{D} (in Gy s^{−1}). The width of this interval, Φ, is chosen to be small compared with the rotation range required for a complete data set but substantially broader than the integration range of a single reflection (e.g. Φ ≃ 5°). The expected value of the intensity of a reflection h observed at a spindle position φ ∈ Φ, with β being the intensity-decay parameter defined above, is given by
and the expected value of its standard uncertainty is
Averaging and for a list of reflections predicted at Φ, one obtains an expected value of the signal-to-noise ratio for a resolution shell h as a function of exposure time and rotation range per frame, /. Fig. 2(a) represents an example of such a function of exposure time (Δ_{φ} = 1° is fixed) modelled for a crystal of cubic insulin (see §4.1 for experimental details). For comparison, the same model is shown for the hypothetical case of ρ_{D} = 0. Neglecting the radiation damage, the maximum attainable signal-to-noise ratio is limited by the contribution of the instrumental error (k_{2}) or by the dynamic range of the detector. For this example, with an exposure time of t_{exp} ≥ 20 s no data could be collected at a resolution h^{−1} ≥ 1.5 Å owing to detector overload. The radiation damage sets an absolute limit on the statistics of ≤ 3.5, which could be attained using an optimal t_{exp} = 2.5 s per 1° rotation (for a given interval but not for a complete data set). It is obvious that the pattern in Fig. 2(a) would shift monotonically downwards and to the right for higher resolutions [smaller, faster decaying and shorter exposures] and vice versa at lower resolutions.
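The qualitative behaviour described here, a signal-to-noise ratio that passes through a maximum as the exposure time grows, can be reproduced with a toy model. The sketch below is not the BEST formulation of equations (4) and (5): the counting-statistics variance, the instrumental-error term k2, the signal and background rates and the convention h = 1/d are all assumptions chosen purely for illustration.

```python
import math

def mean_snr(t_exp, d_spacing, dose_rate=0.3, wedge=5.0, delta_phi=1.0,
             beta=1.0, signal_rate=50.0, background_rate=200.0, k2=0.03):
    """Average I/sigma over the frames of one wedge (toy model).

    t_exp is in s, dose_rate in MGy/s, the rates in counts/s (all hypothetical).
    """
    h2 = 1.0 / d_spacing ** 2
    n_frames = int(round(wedge / delta_phi))
    total = 0.0
    for k in range(n_frames):
        dose = dose_rate * t_exp * (k + 0.5)           # mid-frame dose in MGy
        signal = signal_rate * t_exp * math.exp(-beta * dose * h2 / 2.0)
        # counting noise of signal and background plus an instrumental term
        variance = signal + background_rate * t_exp + (k2 * signal) ** 2
        total += signal / math.sqrt(variance)
    return total / n_frames

times = [0.1 * i for i in range(1, 301)]               # 0.1 s ... 30 s
snr = [mean_snr(t, 1.5) for t in times]
best = max(range(len(times)), key=snr.__getitem__)
no_damage = [mean_snr(t, 1.5, dose_rate=0.0) for t in times]

print(f"with damage: optimum at t_exp ~ {times[best]:.1f} s, "
      f"mean I/sigma ~ {snr[best]:.2f}")
print(f"without damage: mean I/sigma at 30 s ~ {no_damage[-1]:.2f}")
```

In this toy model the damage-free curve keeps rising with exposure time, whereas the curve with damage has an interior maximum, because long exposures destroy the signal in the later frames of the wedge.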
2.2.2. Formulation of the optimization problem
Let us further assume that the rotation range providing a complete data set is chosen and partitioned into a series of consecutive subwedges Φ_{i}. Optimizing the data collection then means searching for a set of exposure parameters {t_{expi}, } that satisfy a set of simultaneous equations
at a highest possible resolution h = h_{max}(C). The statistical signal-to-noise target C must be chosen according to the crystallographic problem being addressed. The choice of C typically accounts for the data multiplicity given by the choice of rotation interval (assuming that the signal-to-noise ratio in a complete data set will be proportional to the square root of the multiplicity).
2.2.3. The algorithm
The solution is found iteratively via a highly efficient computational procedure. For a first trial, a high value of h is selected such that no solution to (6) is possible even for a first subwedge (the requested signal-to-noise ratio is above the maximum). h is decremented by a small step until the solution {t_{exp1}, } in a first subwedge is found. As can be seen from Fig. 2, the solution is not unique and, obviously, the solution with the highest speed of rotation (and hence with the lowest radiation dose) is selected. The constraints on which are set by reflection spatial overlaps are taken into account. The expected decrease in scattering power induced by the dose D_{1} = ρ_{D}Φ/ω_{1} accumulated while collecting the first wedge is then considered by substituting by in (4) before the iteration proceeds to a second wedge. There the solution (if it exists) will again be found, typically with a slower rotation and a higher dose D_{i+1} > D_{i} required etc. Thus, the optimization problem is solved by decrementing the resolution until the h_{max} is found at which the solution to (6) exists for the last subwedge.
Figs. 2(b) and 2(c) illustrate an optimization procedure for the above example of insulin. The full required interval of 20° was split into four subwedges. C = 2 was selected as an optimization target. Only the first two subwedges could be measured with the required signal-to-noise ratio at a resolution of 1.50 Å. A solution does not exist for a third subwedge. However, a solution does exist for all four subwedges at a resolution of 1.55 Å.
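The iteration can be sketched in a few lines. The example below is a simplified illustration under stated assumptions, not the BEST algorithm itself: the signal model (a Wilson-like fall-off with hypothetical parameters s0 and b0, a flat background and pure counting noise), the fixed number of frames per subwedge and the grid search over exposure times are all invented for the purpose. Only the control flow follows the text: for each trial resolution the shortest exposure that reaches the target is chosen subwedge by subwedge, the accumulated dose is carried forward, and the resolution is lowered whenever some subwedge has no solution.

```python
import math

def wedge_snr(t_exp, d_spacing, dose_start, dose_rate, n_frames,
              beta=1.0, s0=2000.0, b0=15.0, background_rate=200.0):
    """Toy mean I/sigma over one subwedge starting at accumulated dose dose_start (MGy)."""
    h2 = 1.0 / d_spacing ** 2
    signal_rate = s0 * math.exp(-b0 * h2 / 2.0)        # Wilson-like fall-off (assumed)
    total = 0.0
    for k in range(n_frames):
        dose = dose_start + dose_rate * t_exp * (k + 0.5)   # mid-frame dose
        signal = signal_rate * t_exp * math.exp(-beta * dose * h2 / 2.0)
        total += signal / math.sqrt(signal + background_rate * t_exp)
    return total / n_frames

def plan(target, n_wedges=4, n_frames=5, dose_rate=0.3):
    """Return (d_spacing, exposure times, total dose) at the highest resolution
    for which every subwedge reaches the target, or None if there is none."""
    t_grid = [0.05 * i for i in range(1, 401)]          # 0.05 s ... 20 s
    for step in range(301):                             # d = 1.00 ... 4.00 A
        d = round(1.0 + 0.01 * step, 2)
        dose, times = 0.0, []
        for _ in range(n_wedges):
            # shortest exposure (lowest dose) that reaches the target
            t = next((t for t in t_grid
                      if wedge_snr(t, d, dose, dose_rate, n_frames) >= target), None)
            if t is None:
                break                                   # no solution: lower the resolution
            times.append(t)
            dose += dose_rate * t * n_frames            # carry the dose forward
        if len(times) == n_wedges:
            return d, times, dose
    return None

d, times, dose = plan(2.0)
print(f"resolution {d:.2f} A, exposure times {[round(t, 2) for t in times]} s, "
      f"total dose {dose:.2f} MGy")
```

Because the accumulated dose only grows, the exposure times returned by this sketch never decrease from one subwedge to the next, which mirrors the gradual increase in exposure time that characterizes the strategies described in the text.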
2.3. Predictive merging statistics
The quality and internal consistency of the data sets are characterized by statistics expressing the variation of multiple (redundant and symmetry-equivalent) observations with respect to their σ^{−2}-weighted average. Let us consider a set of m_{hkl} such observations J_{hj}^{o} of a unique reflection hkl observed at respective dose values D_{hj} and rotation speeds ω_{hj}. The expected standard uncertainties are obtained by substituting the dose rate and measurement conditions into (4) and (5). If frame-to-frame scaling uses the first frame in the data set as a reference, an expected scale factor applied to the jth observation is approximately s_{hj} ≃ exp(−βD_{hj})ω_{1}/ω_{hj}, where ω_{1} is the rotation speed of the first subwedge. Denoting = and expanding standard equations for σ^{−2}-weighted merging (J_{hkl}^{o} = ), it is easy to show that the variance of J_{hj}^{o}s_{hj} about J_{hkl}^{o} is expressed by = . Note that only accounts for statistical measurement errors in the data.
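For readers less familiar with σ^{−2}-weighted merging, the standard inverse-variance average can be written out as follows. This is a generic sketch of the textbook formula, not code from BEST; the function name, the tuple layout and the sample numbers are hypothetical.

```python
import math

def merge(observations):
    """Inverse-variance weighted mean of scaled observations.

    observations: list of (intensity, sigma, scale) tuples; the scale factor
    multiplies both the intensity and its sigma before merging.
    Returns (merged_intensity, merged_sigma).
    """
    weights, weighted = [], []
    for intensity, sigma, scale in observations:
        w = 1.0 / (scale * sigma) ** 2       # weight = 1 / variance of the scaled value
        weights.append(w)
        weighted.append(w * scale * intensity)
    total = sum(weights)
    return sum(weighted) / total, math.sqrt(1.0 / total)

# Three observations of one reflection; later ones are weaker and scaled up more.
merged, merged_sigma = merge([(100.0, 10.0, 1.0), (80.0, 10.0, 1.25), (60.0, 10.0, 1.6)])
print(f"merged intensity {merged:.1f} +/- {merged_sigma:.1f}")
```

Observations that need a large scale factor, for example those recorded late at a high dose, carry a proportionally larger uncertainty and therefore contribute less to the merged value.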
Another independent term that contributes to the above variance originates from radiation-induced non-isomorphism. Following similar considerations for statistical variance and constructing a covariance matrix for a set of observations with considerations according to (2) and (3) one obtains (omitting straightforward derivation)
Here, δ_{ij} is a Kronecker delta.
The expected value of R_{merge} is then approximated to
The multiplier 2/π reflects the fact that is the variance of a sample from a normal distribution (measurement errors), whereas is associated with an exponential distribution (see, for example, Srinivasan & Parthasarathy, 1976). The function obeys the metric of the crystal.
Finally, the average signal-to-noise in the merged data, 〈J/σ(J)〉, which is usually estimated in data processing after applying some fudge factors correcting for unaccounted radiation-induced variance, is approximated by
Estimations according to (8) and (9), computed by summation over unique hkl in either the resolution shells or for a data set, are directly comparable with the respective values obtained from data processing.
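For comparison with the predicted values, the observed R_{merge} is computed from the processed data. The sketch below uses one common definition, the sum of absolute deviations from the mean divided by the sum of intensities over all multiply measured reflections. It takes the unweighted mean for simplicity, which is an assumption (scaling programs may use a weighted mean), and the sample intensities are made up.

```python
def r_merge(groups):
    """R_merge for groups of scaled intensities, one list per unique reflection.

    Reflections measured only once are skipped, since they carry no
    information about internal consistency.
    """
    numerator = 0.0
    denominator = 0.0
    for obs in groups:
        if len(obs) < 2:
            continue
        mean = sum(obs) / len(obs)
        numerator += sum(abs(i - mean) for i in obs)
        denominator += sum(obs)
    return numerator / denominator

value = r_merge([[100.0, 110.0, 90.0], [50.0, 54.0], [75.0]])
print(f"R_merge = {value:.3f}")
```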
3. Implementation
The above formulations were implemented in the program BEST (versions 3.0 and higher). BEST uses as input the results (the basic crystallographic parameters and integrated intensities) of the processing of the initial images by HKL (Otwinowski & Minor, 1997), MOSFLM (Leslie, 1992) or XDS (Kabsch, 1993). The background scattering pattern is obtained from the MOSFLM or XDS output or evaluated by BEST directly from the diffraction images. For the radiation-damage model the only required parameter is a dose rate. In the current implementation the parameters of the decay model α and β are fixed at 0.1 and 1.0 Å^{2} MGy^{−1}, respectively.
The optimization process begins by finding the shortest rotation range that provides a complete data set for starting at φ = 0. The statistical signal-to-noise target of in the highest resolution shell defined by the user is divided by the square root of the multiplicity in this interval to obtain the optimization constant C (6). Thus, the user request is related to the statistics of a complete data set. Note that for the sake of computational efficiency the optimization target is different from, although very similar to, the 〈J/σ(J)〉 signal-to-noise statistic that is used for judging the final data quality. The rotation range is partitioned into narrow (2–5°) subwedges and optimization is carried out as outlined in §2.2, which results in determination of the attainable resolution h_{max}(C) and an associated set of {t_{exp}, Δ_{φ}} pairs. The procedure is repeated for all starting angles in steps of 1°. The rotation interval that provides the highest attainable resolution is then again extended while h_{max}(C) increases. Thus, both the starting angle of data collection and the multiplicity are optimized. The implementation allows the application of a variety of constraints, for example on the rotation interval, the minimum acceptable multiplicity or Δ_{φ}, the total dose or total time of an experiment. The maximum resolution may also be constrained (to a value below an attainable resolution). In this case, the rotation interval is chosen using a minimum-dose criterion.
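The relation between the user request and the optimization constant C amounts to a single division. A minimal sketch, with a hypothetical function name and illustrative numbers:

```python
import math

def optimization_constant(target_snr, multiplicity):
    """Per-observation target C = requested merged signal-to-noise / sqrt(multiplicity)."""
    return target_snr / math.sqrt(multiplicity)

# A request of 3 in the outer shell with fourfold multiplicity gives C = 1.5.
print(optimization_constant(3.0, 4.0))
```

The division reflects the expectation that merging m independent observations improves the signal-to-noise ratio by a factor of the square root of m.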
In order to simplify the practical implementation of this multi-subwedge data-collection strategy with currently available data-collection interfaces, as well as further data reduction with available software, the small subwedges are appropriately recombined into a few (typically 3–6) larger subwedges of variable length. Thereby, insignificant differences in the optimal t_{exp} and Δ_{φ} between the adjacent small subwedges are smoothed out. This final data-collection strategy, consisting of a data-collection resolution (i.e. the detector distance) and a set of quadruples {φ_{start}, number of frames, t_{exp}, Δ_{φ}} is presented to the user as a final solution, together with a set of expected standard data statistics comprising completeness, multiplicity, R_{merge}, and 〈J/σ(J)〉 in the resolution shells.
4. Testing
In the following section, experimental examples are presented that demonstrate the validity of the approach. All measurements were carried out at the European Synchrotron Radiation Facility (ESRF, Grenoble, France) on beamline ID23-1 (Nurizzo et al., 2006). The detector was an ADSC Q315. The X-ray beam profile at ID23-1 has a Gaussian shape, with FWHM (full-width half-maximum) dimensions of 30 µm vertically and 40 µm horizontally at the sample position. The incident-beam intensity was monitored continuously and the monitors were calibrated to an absolute scale (photons s^{−1}) over the whole energy range. The exposure time per image at ID23-1 was not shorter than 0.1 s; in cases where shorter exposures were needed the beam was attenuated. An exposure time of 0.1 s and a rotation width of 1° were used for collecting initial images in all experiments.
The program RADDOSE (Murray et al., 2004) was used to estimate the absorbed dose on the basis of structure composition and crystallization conditions as indicated in the literature reference for each of the samples (except for FtsH). MOSFLM (Leslie, 1992) was used to process both the initial images and the collected data sets and SCALA (Evans, 2006; Collaborative Computational Project, Number 4, 1994) was used for scaling and evaluating the data statistics. For comparison of predicted and observed intensity-decay curves, the resolution-dependent scale factors versus frame number were extracted from the SCALA output.
4.1. Insulin
Small (35 µm) equidimensional bovine insulin crystals (Nanao et al., 2005) were used for test data collection. The crystals belonged to space group I2_{1}3, with unit-cell parameter a = 77.9 Å. The incident-beam wavelength was 0.97 Å. The beam was attenuated by a factor of 2. The flux was 1.0 × 10^{12} photons s^{−1} and the estimated dose rate was 0.3 MGy s^{−1}. One initial image was measured to 1.5 Å resolution in order to evaluate the crystal quality and to produce the input data for BEST modelling, including those presented in Fig. 2. Subsequently, 300 images were collected with t_{exp} = 0.1 s, Δ_{φ} = 1° and a resolution of 1.65 Å. Three data sets were obtained after processing and scaling these images. The first data set included the first 20 images and provided a complete (99%) data set with a multiplicity of 2.5 and a low total absorbed dose of 0.6 MGy, the second included 150 images (multiplicity of 18.6 and dose of 4.5 MGy) and the third included all data (multiplicity of 34.9 and dose of 9 MGy). The R_{merge} and 〈J/σ(J)〉 statistics for these data sets are compared with BEST predictions in Figs. 3(a) and 3(b), respectively. The example shows that BEST can accurately predict the statistical characteristics of data sets over a broad range of absorbed doses. The apparent mismatch of the predicted and observed 〈J/σ(J)〉 statistics in low-resolution shells arises from unaccounted-for systematic errors that are at the level of <1% of the intensity.
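The absorbed doses quoted for the three insulin data sets follow directly from the dose rate, the exposure time per image and the number of images. The short sketch below reproduces this bookkeeping; the function name is hypothetical and the numbers are those given in the text.

```python
DOSE_RATE = 0.3   # estimated dose rate in MGy/s
T_EXP = 0.1       # exposure time per image in s

def absorbed_dose(n_images, t_exp=T_EXP, dose_rate=DOSE_RATE):
    """Cumulative dose in MGy after n_images frames at constant exposure time."""
    return n_images * t_exp * dose_rate

for n in (20, 150, 300):
    print(f"{n:3d} images -> {absorbed_dose(n):.1f} MGy")
```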
Experimental intensity-decay curves in three resolution shells are compared with the decay model used in BEST for statistical predictions in Fig. 3(c). The non-monotonic character of the experimental curves is clearly a consequence of the combination of a slight mismatch of the crystal size with the vertical beam size and minor miscentring of the sample. Despite a noticeable inconsistency between the model and actual measurement conditions, the statistical predictions are in good agreement with the data.
4.2. P19–siRNA
Crystals of viral RNA suppressor P19 in complex with small interfering RNA from tomato bushy stunt virus (P19–siRNA; Ye et al., 2003) belonged to space group R32, with unit-cell parameters a = b = 90.5, c = 148.9 Å. The needle-like shape of the crystals, which were 200–300 µm in length and 25 µm thick, permitted the collection of several data sets from the same crystal by translating an unexposed volume into the beam. The incident-beam wavelength was 0.99 Å.
For the irradiation experiment described in §2.1.1 the flux was 2.75 × 10^{12} photons s^{−1} (dose rate 0.54 MGy s^{−1}). A fresh part of the same crystal was used for each data collection (P19–siRNA1A). During this experiment, the flux was 2.2 × 10^{12} photons s^{−1} (dose rate 0.4 MGy s^{−1}). Two initial images were measured with a 1° rotation at 0° and 90° angles, respectively, with an exposure time of 0.1 s and resolution of 2.3 Å. A target value of = 2 was set in BEST. The strategy calculation showed that a complete data set could be collected to a resolution of 2.45 Å with a total exposure time of 44 s corresponding to a dose of 17.6 MGy. The data-collection strategy is shown in Table 1; the optimal rotation width was 0.8° for all four subwedges.

After collecting the P19–siRNA1A data set, the crystal was recentred on an unexposed part and a second data set, P19–siRNA1B, was collected using the same starting angle (136°), number of frames (36) and Δ_{φ} as for P19–siRNA1A but with a constant exposure time of 1.22 s, i.e. with a total dose equal to that in P19–siRNA1A. Predicted and calculated data statistics for both data sets are shown by resolution shell in Fig. 4(a); Fig. 4(b) demonstrates how well the BEST model describes the diffraction-intensity drop with absorbed dose under close-to-ideal exposure conditions, i.e. when the crystal is smaller than the beam in a vertical direction.
Even though the same `optimum' total dose was used for both data sets, the data statistics are noticeably worse for P19–siRNA1B. The effect of decay compensation by exposure time in P19–siRNA1A is less pronounced when looking at the spherically averaged 〈J/σ(J)〉 statistics, which are insensitive with respect to the in signal-to-noise distribution within a resolution shell. The significant increase in R_{merge} in high-resolution shells is indicative of a severe degradation of the diffracted intensity towards the last frames of P19–siRNA1B (Fig. 4b). This was correctly predicted and successfully compensated for by increasing the exposure time of the last frames in P19–siRNA1A.
In a second experiment, a different, more strongly diffracting P19–siRNA crystal was used. The flux was 1.1 × 10^{12} photons s^{−1} and the dose rate was 0.2 MGy s^{−1}. An identical initial image-collection procedure (but with the detector distance set to yield a resolution of 2.0 Å) and calculations resulted in a strategy for the P19–siRNA2A data set (Table 2) at a resolution of 2.06 Å with a total exposure time of 44 s and a dose of 8.7 MGy.

Next, three further data sets, P19–siRNA2B, P19–siRNA2C and P19–siRNA2D, were collected from the same crystal translated to an unexposed region for each. For these data sets the same rotation range as for P19–siRNA2A was used (i.e. the same starting angle and constant Δ_{φ} = 1°; the number of frames was 42). t_{exp} was 1.05, 0.5 and 1.5 s for P19–siRNA2B, P19–siRNA2C and P19–siRNA2D, respectively, corresponding to equal total doses for P19–siRNA2A and P19–siRNA2B, an approximately 50% lower dose for P19–siRNA2C and a 50% higher dose for P19–siRNA2D. The data statistics for all four data sets are compared in Fig. 5. The statistics of P19–siRNA2A are clearly better than those of the other data sets in the high-resolution shells.
4.3. FAE
Crystals of the feruloyl esterase module of xylanase 10B from Clostridium thermocellum (FAE; Prates et al., 2001) belonged to space group P2_{1}2_{1}2_{1}, with unit-cell parameters a = 65.4, b = 108.8, c = 113.9 Å. The ESRF storage ring was operated at only 30 mA current, so the beam flux was only 0.3 × 10^{12} photons s^{−1}. The wavelength was 0.99 Å. Two initial images were measured with 1° rotation at 0° and 90° with an exposure time of 0.1 s and a resolution of 1.2 Å at the edge of the detector.
In this experiment the crystal size substantially exceeded the beam size. Obviously, under such conditions an essential assumption of the model, namely that at a rotation angle φ the diffracting volume receiving the dose D = ρ_{D}t_{exp}(φ − φ_{start})/Δ_{φ} (in equation 5) is the same, does not hold as fresh unexposed fractions of the crystal are coming into the beam during rotation. In order to partly compensate for this effect, a dose rate of 24 kGy s^{−1} was used in strategy optimization instead of an estimated nominal (for a static sample) dose rate of 60 kGy s^{−1}. This reduces the dose rate by a (fudge) factor of 2.5, which is approximately equal to the ratio of the maximum crystal size in the direction normal to the spindle axis to the vertical FWHM size of the beam. The strategy optimization with a requested of 2 in the last resolution shell showed that a complete data set could be collected to 1.3 Å with a total exposure time of 217 s (Table 3). Despite this rather simplistic approach, which may only roughly compensate for the lack of information on the real behaviour of the exposed crystal volume as a function of rotation angle (see §5), the predicted and observed data statistics (Fig. 6a), as well as the predicted and observed intensity-decay curves in resolution shells (Fig. 6b), agree well.
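The dose-rate correction used here is a simple rescaling. The sketch below reproduces the numbers from the text; the function name is hypothetical and the clamp to a factor of at least 1, meaning no correction when the crystal does not exceed the beam, is an assumption added for illustration.

```python
def effective_dose_rate(nominal_rate, size_ratio):
    """Scale the nominal (static-sample) dose rate by the crystal-to-beam size ratio.

    size_ratio is the maximum crystal size normal to the spindle axis divided
    by the vertical FWHM of the beam; ratios below 1 leave the rate unchanged.
    """
    return nominal_rate / max(1.0, size_ratio)

# Nominal 60 kGy/s with a size ratio of about 2.5 gives the 24 kGy/s used for FAE.
print(effective_dose_rate(60.0, 2.5))
```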

4.4. FtsH
The 70 kDa membrane protein FtsH from Aquifex aeolicus crystallizes in space group I222, with unit-cell parameters a = 137.9, b = 162.1, c = 170 Å and three FtsH molecules in the asymmetric unit. The crystals grew in 60% Tacsimate pH 7.0 and 10 mM AMPPNP and exhibited moderate diffraction quality. A bipyramidal sample approximately 120 µm in the largest dimension and 50 µm in the smallest dimension was used for data collection at a wavelength of 1.055 Å and a beam flux of 4 × 10^{11} photons s^{−1}. The estimated dose rate was nominally 70 kGy s^{−1}. In order to exploit nearly the whole crystal volume, the sample position relative to the beam was changed five times during data collection, with a relatively small rotation of 30° used per position.
Thus, it appeared possible to collect 150° of data with a multiplicity of about 6. Under these conditions, ≃ 3 for the last resolution shell (3.25–3.15 Å) in a complete data set would be reached provided that five 30° data wedges were measured so that ≃ 1.5 in each of them. The latter was set as a statistical target in the optimization of (constant) exposure time and oscillation width for a 30° wedge starting at 0°. An initial image measured at φ = 15° was used in BEST. The decay compensation normally achieved by changing the exposure time was disabled, simply because the manual implementation of data collection and processing for a large number of (sub)wedges would have been too tedious to perform and prone to mistakes. Optimization resulted in an achievable resolution limit of 3.15 Å, with t_{exp} = 2.0 s and Δ_{φ} = 0.50°. For an optimized wedge, the experimental decay curves and the data-processing statistics are in excellent agreement with the data (Figs. 7a and 7b). By repeating the same strategy for another four wedges, a complete data set was collected.
Despite the complications, the data set was of good quality (Table 4) and the data statistics are close to expected values. The structure was solved by molecular replacement a short time after the experiment (Vostrukhina & Baumann, personal communication).

It is worth noting that for this particular example the residual scattering intensity at the end of data collection is ∼65% of the starting value in the last resolution shell (Fig. 7b), which is a much larger decrease than in all of the other examples (Figs. 3c, 4b and 6b). This is a consequence of the fact that we disabled the facility for changing the exposure time to compensate for decay and this example provides a good illustration of the advantages of such compensation. The residual scattering power would still have permitted the collection of more data on the same part of the crystal, suggesting that even longer exposures might have been used to improve the signal-to-noise ratio. As the BEST calculations show, this was not the case. For longer exposures the signal to noise would improve only in the first frames of the wedge; it would degrade even more strongly for the last frames and thus degrade overall. The validity of the calculations is in turn directly supported by the experimental data (Fig. 7a).
5. Discussion
Experimenters collecting data on undulator beamlines have been confronted with the dilemma of underexposing versus overexposing their samples for a long time. Without a doubt, an educated crystallographer possessing significant experience in data collection on a particular at a particular instrument would usually find close-to-optimal conditions (e.g. similar to those shown in Fig. 5). Here, we demonstrate that under experimental conditions close to the model assumptions (i.e. the instrument is calibrated, the beam size matches the crystal size and the chemical composition of the sample is approximately known) our approach delivers an optimal data-collection strategy in a systematic way. It would be difficult (in our hands, rather impossible) to find notably better strategies.
Furthermore, as the application examples demonstrate, the method is tolerant with respect to the deviations from ideal conditions in real experiments. For instance, in the case of the FAE crystals, which were highly mismatched in size to the beam dimensions, we were able to adapt the model simply by applying a fudge factor to the dose rate. A fudge factor equal to the ratio of the beam size to crystal size is roughly applicable for any space group or redundancy. Such tolerance is directly explained by a very slow variation in signal to noise with the absorbed dose in the vicinity of the maximum (Fig. 2). This further indicates that the requirements for the accuracy of the flux-density calibration and other parameters involved in the dose calculations are essentially relaxed. As a rule of thumb, ∼20% accurate dose-rate estimates would be sufficient for practical purposes.
Nevertheless, the assumption that the beam size matches the crystal size currently remains a major limitation to the accuracy of the method. In many cases, for example for large plate-like crystals measured in a small beam, the errors in the statistical prediction will be much larger. Here, the data-collection procedures need to employ multiple recentrings or some other manoeuvres similar to those described for the example of FtsH. This application demonstrates that the radiation-damage model-based optimization can be used successfully in more complex scanning diffraction experiments. If a three-dimensional model of the crystal shape and a two-dimensional model of the beam profile were available, further development of the model which could take this information into account appears to be fairly straightforward. For crystal sizes in the range of several tens of micrometres or larger, methods of sample-shape characterizations exist (Leal et al., 2008; Brockhauser et al., 2008). Thus, for the range of beam sizes and crystals at a normal macromolecular crystallography beamline, such as ID23-1 at the ESRF, this development is technically feasible. Extension of the technique to micrometre-sized beam applications (Moukhametzianov et al., 2008) will be more demanding, but will be justified by the anticipation of a very significant gain in the data quality under the extreme dose rates delivered by the microbeams.
Another limitation to the practical applicability of the method at the beamlines may be related to a certain increase in the complexity of the data-collection procedure. This is largely overcome by software integration, e.g. in the EDNA online data-analysis framework (Incardona et al., 2009).
The demonstrated tolerance of the method with respect to deviations from ideal model conditions can be extrapolated to the possible variations in radiation-sensitivity between different macromolecular structures. Until now, we have not been confronted with a sample that could confidently be classified as significantly more or significantly less radiation-sensitive compared with the samples described by the default model parameters (α and β); in practice, apparent deviations in radiation-sensitivity often do not arise from a specific feature of a sample but rather from a mismatched beam size, miscalibration or other technical problems. If such an example were to occur, it could be resolved by recalibrating the model in a preliminary experiment involving a sacrificial sample or a part of the sample. The optimization algorithm can easily accommodate a change in the empirical parameters or, if required, an alternative to the simple exponential model used here.
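As an illustration of what such a recalibration might involve, the sketch below fits a single exponential decay, I(D) = I0 exp(−kD), to intensities measured at increasing cumulative dose in a preliminary experiment. This is only a simplified stand-in: the decay constant k and the function name are hypothetical, and the fit does not reproduce the resolution-dependent α and β parametrization used in BEST.

```python
import math


def fit_exponential_decay(doses, intensities):
    """Least-squares fit of ln I = ln I0 - k * dose.

    doses: cumulative absorbed doses (any consistent unit).
    intensities: positive mean intensities measured at those doses.
    Returns (I0, k). Illustrative only; not the BEST model.
    """
    if len(doses) != len(intensities) or len(doses) < 2:
        raise ValueError("need at least two (dose, intensity) pairs")
    if any(i <= 0 for i in intensities):
        raise ValueError("intensities must be positive to take logarithms")

    logs = [math.log(i) for i in intensities]
    n = len(doses)
    mean_dose = sum(doses) / n
    mean_log = sum(logs) / n

    # Ordinary linear regression of ln I against dose.
    sxx = sum((d - mean_dose) ** 2 for d in doses)
    if sxx == 0:
        raise ValueError("doses must not all be identical")
    sxy = sum((d - mean_dose) * (l - mean_log) for d, l in zip(doses, logs))

    slope = sxy / sxx
    intercept = mean_log - slope * mean_dose
    return math.exp(intercept), -slope


if __name__ == "__main__":
    # Synthetic, noise-free data generated with I0 = 100 and k = 0.3.
    doses = [0.0, 1.0, 2.0, 3.0]
    intensities = [100.0 * math.exp(-0.3 * d) for d in doses]
    i0, k = fit_exponential_decay(doses, intensities)
    print(i0, k)
```

A decay constant obtained in this way from a sacrificial sample could then be compared with the value implied by the default parameters to judge whether a sample really deviates in radiation-sensitivity.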
It is important to note that our radiation-damage model is essentially incomplete and may not be able to exhaustively account for the whole variety of radiation-induced processes occurring in crystals during data collection and their effects on the structure factors. It only accounts for the most pronounced systematic effects, the 'global' damage following the terminology of Holton (2008), and has the sole purpose of optimizing the data collection. 'Specific' damage is neglected. The optimization method is geared towards providing data to the highest possible resolution and implies a risk of inducing strong site-specific damage. This may lead in some particular cases to misinterpretations of the structure. Whenever data on the radiation-sensitivity of a site in question are available, appropriate dose constraints should be used in strategy optimization. Such an option is available in BEST. Note that BEST optimization will provide the optimum data-collection conditions and also the highest possible resolution in such cases.
A further possible consequence of choosing the last resolution-shell statistics and the resolution limit as optimization targets is that the associated low-resolution data may not be collected optimally at the same time. One can see this effect in all the data presented here in Fig. 5. In this sense, the method described here is only applicable to a range of experiments aiming at data collection to the highest possible resolution but at the limit of statistical significance. Even for such experiments, a separate low-resolution collection run often appears to be useful irrespective of detector overloads. This can easily be planned together with the high-resolution pass and only requires a separate run of BEST with an appropriate dose constraint (e.g. a small fraction, <10%, of the dose allocated to a high-resolution pass). For experiments aiming at highly accurate data at low to medium resolution, as in an SAD phasing experiment, the criterion used in this work would not be a suitable optimization target. We have derived a new statistical target specifically for the optimization of SAD data collection that is directly related to the noise in anomalous difference data and have developed methods of optimizing the data collection to this target. A manuscript describing these results is currently in preparation.
The program BEST is available for download at https://www.emblhamburg.de/BEST.
Acknowledgements
We would like to thank Lucy Malinina for providing P19–siRNA crystals and Marina Vostrukhina for providing FtsH crystals. This work was partially supported by the EC-funded project BIOXHIT (https://www.bioxhit.org), contract No. LHSG-CT-2003-503420. We gratefully acknowledge access to beamtime at the ESRF under Radiation Damage BAGs MX551, 666, 812 and 931.
References
Bourenkov, G. P., Bogomolov, A. & Popov, A. N. (2006). Fourth International Workshop on X-ray Damage to Biological Crystalline Samples, SPring-8, Japan.
Bourenkov, G. P. & Popov, A. N. (2006). Acta Cryst. D62, 58–64.
Brockhauser, S., Di Michiel, M., McGeehan, J. E., McCarthy, A. A. & Ravelli, R. B. G. (2008). J. Appl. Cryst. 41, 1057–1066.
Burmeister, W. P. (2000). Acta Cryst. D56, 328–341.
Collaborative Computational Project, Number 4 (1994). Acta Cryst. D50, 760–763.
Diederichs, K. (2009). Acta Cryst. D65, 535–542.
Dubnovitsky, A. P., Ravelli, R. B. G., Popov, A. N. & Papageorgiou, A. C. (2005). Protein Sci. 14, 1498–1507.
Evans, P. (2006). Acta Cryst. D62, 72–82.
Garman, E. F. & Owen, R. L. (2006). Acta Cryst. D62, 32–47.
Haas, D. J. & Rossmann, M. G. (1970). Acta Cryst. B26, 998–1004.
Holton, J. M. (2008). Acta Cryst. A64, C77.
Incardona, M.-F., Bourenkov, G. P., Levik, K., Pieritz, R. A., Popov, A. N. & Svensson, O. (2009). J. Synchrotron Rad. 16, 872–879.
Kabsch, W. (1993). J. Appl. Cryst. 26, 795–800.
Kmetko, J., Husseini, N. S., Naides, M., Kalinin, Y. & Thorne, R. E. (2006). Acta Cryst. D62, 1030–1038.
La Fortelle, E. de & Bricogne, G. (1997). Methods Enzymol. 276, 472–494.
Leal, R. M. F., Teixeira, S. C. M., Rey, V., Forsyth, V. T. & Mitchell, E. P. (2008). J. Appl. Cryst. 41, 729–737.
Leslie, A. G. W. (1992). Jnt CCP4/ESF–EACBM Newsl. Protein Crystallogr. 26.
Luzzati, V. (1953). Acta Cryst. 6, 142–152.
Moukhametzianov, R., Burghammer, M., Edwards, P. C., Petitdemange, S., Popov, D., Fransen, M., McMullan, G., Schertler, G. F. X. & Riekel, C. (2008). Acta Cryst. D64, 158–166.
Murray, J. W., Garman, E. F. & Ravelli, R. B. G. (2004). J. Appl. Cryst. 37, 513–522.
Murshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997). Acta Cryst. D53, 240–255.
Nanao, M. H., Sheldrick, G. M. & Ravelli, R. B. G. (2005). Acta Cryst. D61, 1227–1237.
Nurizzo, D., Mairs, T., Guijarro, M., Rey, V., Meyer, J., Fajardo, P., Chavanne, J., Biasci, J.-C., McSweeney, S. & Mitchell, E. (2006). J. Synchrotron Rad. 13, 227–238.
Otwinowski, Z. & Minor, W. (1997). Methods Enzymol. 276, 307–326.
Owen, R. L., Rudiño-Piñera, E. & Garman, E. F. (2006). Proc. Natl Acad. Sci. USA, 103, 4912–4917.
Popov, A. N. & Bourenkov, G. P. (2003). Acta Cryst. D59, 1145–1153.
Prates, A. M. J., Tarbouriech, N., Charnock, S. J., Fontes, C. M. J. A., Ferreira, L. M. A. & Davies, G. J. (2001). Structure, 9, 1183–1190.
Ravelli, R. B. G. & Garman, E. F. (2006). Curr. Opin. Struct. Biol. 16, 624–629.
Ravelli, R. B. G. & McSweeney, S. M. (2000). Structure, 8, 315–328.
Read, R. J. (1986). Acta Cryst. A42, 140–149.
Sarvestani, A., Walenta, A. H., Busetto, E., Lausi, A. & Fourme, R. (1998). J. Appl. Cryst. 31, 899–909.
Sliz, P., Harrison, S. & Rosenbaum, G. (2003). Structure, 11, 13–19.
Srinivasan, R. R. & Parthasarathy, S. (1976). Some Statistical Applications in X-ray Crystallography, p. 61. Oxford: Pergamon Press.
Weik, M., Ravelli, R. B. G., Kryger, G., McSweeney, S., Raves, M. L., Harel, M., Gros, P., Silman, I., Kroon, J. & Sussman, J. L. (2000). Proc. Natl Acad. Sci. USA, 97, 623–628.
Wilson, A. J. C. (1950). Acta Cryst. 3, 397–398.
Ye, K., Malinina, L. & Patel, D. J. (2003). Nature (London), 426, 874–878.
This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.