## research papers

## Some aspects of quantitative analysis and correction of radiation damage

Universität Konstanz, Fachbereich Biologie, M647, D-78457 Konstanz, Germany. Correspondence e-mail: kay.diederichs@uni-konstanz.de

Radiation damage is the major source of systematic error in macromolecular data collected at third-generation synchrotron beamlines. In this paper, a simple way of analysing data for radiation damage is proposed and shown to give results that are easy to interpret. Results of radiation-damage correction obtained with *XSCALE* (from the *XDS* package) are shown, and aspects of the mathematical treatment of radiation damage, as well as experimental requirements for its correction and utilization, are discussed. Furthermore, a method for quantifying the coverage and evenness of sampling of the rotation range is proposed.

### 1. Introduction

In recent years it has become clear that radiation damage is the major systematic source of error in most macromolecular X-ray data sets collected at synchrotrons. Ionization events in the crystal can conceptually be divided into two components. Primary radiation damage directly and unspecifically affects all atoms in the crystal, but has its greatest effect on the occupancy (and sometimes position) of heavy atoms (sulfur and metals) because of their comparatively large absorption cross-sections (reviewed in Garman, 2003). Secondary radiation damage arises from highly reactive free radicals and photoelectrons produced by the incident radiation. Even at cryotemperature (near 100 K) the photoelectrons diffuse through the crystal, although at much reduced rates, and target susceptible groups in macromolecules.

Taken together, these events produce both unspecific, random damage and specific, localized changes of the macromolecular structure. Random damage of atoms leads to disorder in the crystal and ultimately to deterioration of its diffracting power. Henderson (1990) estimated that the maximum dose tolerated by a crystal is about 2 × 10^{7} Gy, a dose reached at third-generation undulator beamlines within minutes of irradiation with an unattenuated beam.

The specific changes caused by radiation damage have important consequences for the phasing and interpretation of macromolecular structures. The loss of high-resolution information brought about by unspecific radiation damage reduces the amount of information in refined structures, and structural changes owing to irradiation might lead to wrong conclusions concerning functional aspects of protein structure. Experimental phasing of macromolecules depends on the presence of naturally available or experimentally introduced heavy atoms bound at specific sites of the structure. Radiation damage reduces the occupancy, and sometimes changes the location, of exactly those atoms that are used for phasing. In principle, this results in a breakdown of the theory used to derive phases from the small differences of structure factors in single/multiple-wavelength anomalous dispersion (SAD and MAD) or single/multiple isomorphous replacement (SIR and MIR) experiments, because the theory assumes that the heavy-atom contribution to the structure factors can be calculated from a time-independent model, and does not depend on the time (or rather, the dose) at which a reflection was recorded during the experiment.

Experimentally, … *et al.*, 2002; Garman, 2003). Strong attenuation and short exposure times are commonly used to find a compromise between data quality and radiation damage. At the data-reduction stage, an artificial temperature factor is employed to correct the average radiation damage, both as a function of dose and of resolution.

Quantitative ways of assessing radiation damage have been sought, but neither changes in cell parameters nor changes in mosaicity or a number of other investigated parameters have been found to be reproducible among crystals, even of the same species (Murray & Garman, 2002; Ravelli *et al.*, 2002). Correction of specific radiation damage has been shown to be effective at the level of the raw, unmerged intensity data, thus exploiting the redundancy (multiplicity) of observations of the unique reflections during data reduction and scaling (Diederichs *et al.*, 2003). Alternatively, correction was achieved by refinement of heavy-atom parameters as a function of dose against multiple observations during phasing (Schiltz *et al.*, 2004).

Here, aspects of the former correction are discussed, and a way of analysing data with respect to radiation damage is suggested and explored. Furthermore, indicators for the characterization of coverage and sampling of a data set's rotation range, and therefore also of its radiation damage, are proposed, which may be used to optimize data-collection strategy with respect to radiation damage and its correction.

### 2. Methods

The method suggested by Diederichs *et al.* (2003) requires and exploits redundancy (multiplicity of observations belonging to each unique reflection) to partially correct radiation damage. The basic idea is that the dose-dependent changes in electron density result in non-random, dose-dependent values of the structure factors, which can be approximated by a low-order function fitted to all observations of each unique reflection. The underlying physical model is the decomposition of the electron density into two parts, one that is constant, and one that depends on dose and can be used to model the (specific and unspecific) radiation damage.

Note that the traditional scaling model, which does not explicitly take the radiation damage of individual reflections into account, fits a zero-order model (a constant) to all observations of a unique reflection, thereby neglecting the information contained in the dose dependency of the intensities.

#### 2.1. Models for radiation damage at the level of unique reflections

Although radiation damage is a local phenomenon, its manifestation in the measured intensities involves the average over a large number of unit cells in the crystal. These intensities can therefore be considered slowly varying functions of the dose, at least over the range of doses commonly employed for synchrotron measurements of protein crystals, so that low-order functions can be used to model the radiation-induced change of intensities.

As particularly simple functions, a quadratic, a linear and an exponential model are defined and characterized below.

##### 2.1.1. Quadratic model

The constant, radiation-insensitive part of the dose-dependent electron density ρ(*x*, *y*, *z*; *d*) is denoted ρ_{c}(*x*, *y*, *z*), and the radiation-sensitive part, which depends on the dose *d*, is denoted ρ_{v}(*x*, *y*, *z*; *d*); thus

$$\rho(x, y, z; d) = \rho_{c}(x, y, z) + \rho_{v}(x, y, z; d).$$

If we assume that ρ_{v}(*x*, *y*, *z*; *d*) is linear in the dose, we can write

$$\rho_{v}(x, y, z; d) = d\,\Delta\rho(x, y, z),$$

where Δρ(*x*, *y*, *z*) is the change in electron density of the radiation-susceptible part of the structure upon irradiation with a unit dose. Omitting the indices *hkl* of the structure factors for brevity, the structure factor corresponding to the constant ρ_{c}(*x*, *y*, *z*) is the dose-independent *F*, and the structure factor corresponding to *d*Δρ(*x*, *y*, *z*) is *dG*. Then the total structure factor *H* of a reflection as a function of dose is

$$H(d) = F + d\,G.$$

The total intensity *y*^{quad}(*d*) = |*H*(*d*)|^{2} is

$$y^{\mathrm{quad}}(d) = |F|^{2} + 2d\,|F|\,|G|\cos\alpha + d^{2}|G|^{2},$$

where α is the angle between *F* and *G* in the complex plane. *y*^{quad}(*d*) therefore depends on two parameters, |*G*| and cos α, which need to be determined by least-squares fitting to the observed intensities, in addition to the zero-dose intensity |*F*|^{2}. In this way, provided that the assumption of linearity in dose holds, the observations of a dose-dependent intensity may be fully described, except for noise. However, the least-squares determination of the two parameters from the observations is not straightforward because of the restriction −1 ≤ cos α ≤ 1.
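The difficulty can be illustrated with a minimal sketch, which is an assumption for illustration and not the procedure of any program discussed here: fit the unconstrained polynomial *a* + *bd* + *cd*^{2} to the observations of one reflection by ordinary least squares, then map its coefficients to |*F*|^{2} = *a*, |*G*| = √*c* and the cosine of the angle between *F* and *G*, *b*/(2√(*ac*)). Requiring this cosine to lie in [−1, 1] is the same as requiring the fitted parabola never to become negative (*b*^{2} ≤ 4*ac*), and an unconstrained fit to noisy data can violate it. The function names and the `feasible` flag are hypothetical.

```python
import math


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(m, v):
    """Solve the 3x3 linear system m x = v by Cramer's rule."""
    det = _det3(m)
    if det == 0:
        raise ValueError("singular normal equations: need >= 3 distinct doses")
    solution = []
    for k in range(3):
        mk = [row[:] for row in m]
        for i in range(3):
            mk[i][k] = v[i]
        solution.append(_det3(mk) / det)
    return solution


def fit_quadratic_model(doses, intensities):
    """Hypothetical helper: unconstrained fit of a + b*d + c*d**2, mapped to
    |F|^2 = a, |G| = sqrt(c), cos(angle between F and G) = b / (2*sqrt(a*c))."""
    s = [sum(d ** k for d in doses) for k in range(5)]
    t = [sum(y * d ** k for d, y in zip(doses, intensities)) for k in range(3)]
    normal = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    a, b, c = _solve3(normal, t)
    if a <= 0 or c <= 0:
        # |F|^2 and |G|^2 must be positive; such a fit has no physical mapping.
        return {"F2": a, "G": None, "cos": None, "feasible": False}
    cos_angle = b / (2.0 * math.sqrt(a * c))
    # |cos| <= 1 is equivalent to b**2 <= 4*a*c, i.e. y(d) >= 0 for every d.
    return {"F2": a, "G": math.sqrt(c), "cos": cos_angle,
            "feasible": abs(cos_angle) <= 1.0}


doses = [0.25 * k for k in range(9)]
# Synthetic data generated with |F|^2 = 100, |G| = 4, cos = -0.9.
ok = fit_quadratic_model(doses, [100 - 72 * d + 16 * d * d for d in doses])
# A parabola that no |F + dG|^2 can produce: the implied cosine is below -1.
bad = fit_quadratic_model(doses, [100 - 60 * d + 2 * d * d for d in doses])
print(ok)
print(bad)
```

In practice a constrained optimizer (or a parametrization with the cosine written as cos of a free angle) would be needed; the unconstrained version only shows where the restriction bites.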

In the presence of unit-cell changes induced by radiation damage, the sampling of the Fourier transform of the macromolecule differs between the observations of a unique reflection. In this case, the quadratic model is not justified, and its coefficients do not provide a valid interpretation of the changes in the electron density.

##### 2.1.2. Linear model

Although the quadratic model shows the mathematical background and the underlying assumptions, a simpler approach appears to be adequate in practice. The simplest approach is a linear function

$$y^{\mathrm{lin}}(d) = |F|^{2}\,(1 + \beta d),$$

where β is an empirical `decay factor' (or `damage factor') without direct physical significance, which can be determined from a linear fit to the observations of each unique reflection. This was implemented by the author in the program *0-dose* (Diederichs *et al.*, 2003) for the general case, where β is determined from observations belonging to several data sets. A shortcoming of this model is that the correction may result in negative extrapolated intensities.
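For illustration only (this is not the *0-dose* code), the per-reflection fit can be written as an ordinary unweighted least-squares line, re-expressed in the assumed form |*F*|^{2}(1 + β*d*). A real implementation would at least weight observations by their standard uncertainties and, as described above, determine β from observations in several data sets; both refinements are omitted here, and the helper name is hypothetical. The second example, a weak reflection whose intensity rises with dose, shows how the extrapolated zero-dose value can become negative.

```python
def fit_linear_model(doses, intensities):
    """Hypothetical helper: unweighted least-squares fit of y(d) = F2 * (1 + beta*d).

    Returns (F2, beta), where F2 is the intensity extrapolated to zero dose.
    """
    n = len(doses)
    if n < 2 or len(set(doses)) < 2:
        raise ValueError("need observations at two or more distinct doses")
    mean_d = sum(doses) / n
    mean_y = sum(intensities) / n
    sxx = sum((d - mean_d) ** 2 for d in doses)
    sxy = sum((d - mean_d) * (y - mean_y) for d, y in zip(doses, intensities))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_d
    # The line a + b*d rewritten as F2 * (1 + beta*d); beta is undefined if F2 == 0.
    beta = slope / intercept if intercept != 0 else float("nan")
    return intercept, beta


doses = [10.0, 20.0, 30.0, 40.0]
print(fit_linear_model(doses, [3.0, 2.0, 1.5, 0.5]))  # weak reflection that decays
print(fit_linear_model(doses, [0.5, 1.5, 2.0, 3.0]))  # intensity rising with dose
```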

##### 2.1.3. Exponential model, and implementation in *XSCALE*

A better approximation can be expected from an exponential function

$$y^{\mathrm{exp}}(d) = |F|^{2}\exp(\beta d),$$

where β is again a `decay factor' which is determined from a nonlinear fit to the observations of each unique reflection. This model has the advantage that the theoretical values and their second derivative are positive, as in the case of the quadratic model.
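A compact sketch of such a per-reflection nonlinear fit is given below, assuming the form *y*(*d*) = |*F*|^{2} exp(β*d*). It uses unweighted Gauss-Newton iterations with step halving, starting from the zero-order model (β = 0, |*F*|^{2} equal to the mean intensity). The choice of optimizer and the helper name are assumptions for illustration; the fitting procedure inside *XSCALE*, which also refines per-data-set parameters, is not reproduced here.

```python
import math


def fit_exponential_model(doses, intensities, iterations=50, tol=1e-12):
    """Hypothetical helper: least-squares fit of y(d) = F2 * exp(beta * d)."""
    # Start from the zero-order model: constant intensity, no decay.
    f2 = sum(intensities) / len(intensities)
    beta = 0.0

    def sse(f, b):
        return sum((y - f * math.exp(b * d)) ** 2 for d, y in zip(doses, intensities))

    current = sse(f2, beta)
    for _ in range(iterations):
        # Accumulate J^T J and J^T r for the parameters (F2, beta).
        a11 = a12 = a22 = g1 = g2 = 0.0
        for d, y in zip(doses, intensities):
            e = math.exp(beta * d)
            j1, j2 = e, f2 * d * e          # dy/dF2 and dy/dbeta
            r = y - f2 * e
            a11 += j1 * j1
            a12 += j1 * j2
            a22 += j2 * j2
            g1 += j1 * r
            g2 += j2 * r
        det = a11 * a22 - a12 * a12
        if det == 0:
            break
        df2 = (a22 * g1 - a12 * g2) / det
        dbeta = (a11 * g2 - a12 * g1) / det
        # Halve the Gauss-Newton step until the sum of squares does not increase.
        step = 1.0
        while step > 1e-10 and sse(f2 + step * df2, beta + step * dbeta) > current:
            step *= 0.5
        if step <= 1e-10:
            break  # no downhill step left: treat as converged
        f2 += step * df2
        beta += step * dbeta
        new = sse(f2, beta)
        converged = current - new <= tol * (1.0 + current)
        current = new
        if converged:
            break
    return f2, beta


doses = [5.0 * k for k in range(9)]
f2, beta = fit_exponential_model(doses, [100.0 * math.exp(-0.05 * d) for d in doses])
print(f2, beta)
```

Starting from β = 0 makes the zero-order model the natural fallback: if the data contain no usable dose dependency, the iterations have little reason to move far from it.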

The algorithm was implemented in the program *XSCALE* (Kabsch, 2004), again for the general case where β is determined from observations belonging to several data sets. In addition, *XSCALE* determines two parameters per data set by least-squares fitting, both of which can also be input manually. The first (`STARTING_DOSE') takes into account the dose that the crystal has absorbed before the start of the data set, and the second (`DOSE_RATE') is a conversion factor from frame number to dose, which could be used for data sets collected at different wavelengths or with different exposure times. For a single data set, DOSE_RATE is set to one.

*XSCALE* extrapolates to zero dose, and STARTING_DOSE is by default zero for the first data set. If the user wants to extrapolate or interpolate intensities to any other value of the absorbed dose, a different STARTING_DOSE can be input. A negative value for STARTING_DOSE leads to interpolation within the dose range of the data set as long as the absolute value is less than the highest frame number difference within the data set.
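The relation between frame numbers, STARTING_DOSE and DOSE_RATE can be sketched under one explicit assumption: that the modelled dose of a frame is STARTING_DOSE + DOSE_RATE × (frame − first frame). This formula and the helper names are hypothetical illustrations of the description above, not *XSCALE* source code, but they reproduce the stated rule that a negative STARTING_DOSE whose absolute value is smaller than the frame-number range leads to interpolation.

```python
def frame_to_dose(frame, first_frame, starting_dose=0.0, dose_rate=1.0):
    """Hypothetical model: dose = STARTING_DOSE + DOSE_RATE * (frame - first_frame)."""
    return starting_dose + dose_rate * (frame - first_frame)


def zero_dose_frame(first_frame, starting_dose=0.0, dose_rate=1.0):
    """Frame at which the modelled dose is zero, i.e. the target of the correction."""
    return first_frame - starting_dose / dose_rate


def is_interpolation(starting_dose, first_frame, last_frame, dose_rate=1.0):
    """True if the zero-dose point lies strictly inside the frame range of the data set."""
    offset = zero_dose_frame(first_frame, starting_dose, dose_rate) - first_frame
    return 0 < offset < last_frame - first_frame


# Frames 1..360: STARTING_DOSE = -90 moves the zero-dose point to frame 91.
print(zero_dose_frame(1, starting_dose=-90.0))
print(is_interpolation(-90.0, 1, 360), is_interpolation(0.0, 1, 360))
```

With these numbers the corrected intensities refer to a point about a quarter of the way through the data set, rather than to its start.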

#### 2.2. Analysis of decay

Radiation damage leads to a dose-dependent deviation of intensities from the (theoretical) intensity at zero dose. On average, the absolute difference between two observations of the same reflection becomes larger with larger difference in dose (see *e.g.* Table 2 in Ravelli & McSweeney, 2000, Table 1 in Banumathi *et al.*, 2004 and Table 5 in Weiss *et al.*, 2005). A simple way to analyse radiation damage is a plot of fractional differences, calculated in analogy to the usual *R* factor, as a function of the separation of the contributing observations in frame-number (dose) space. We define a `decay *R* factor'

$$R_{d} = \frac{\sum_{hkl}\sum_{|i-j|=d}\left|Y_{i}^{hkl} - Y_{j}^{hkl}\right|}{\sum_{hkl}\sum_{|i-j|=d}\left(Y_{i}^{hkl} + Y_{j}^{hkl}\right)/2},$$

such that the differences of the observed intensities *Y* of the unique reflection *hkl* with centroids on frames *i* and *j* contribute towards *R*_{d} at *d* = |*i* − *j*|. As all pairs of observations of each unique reflection *hkl* contribute to this `*R* factor as a function of frame-number difference', a relatively smooth function is obtained which gives a quantitative way of assessing radiation damage based on the measured data, without reference to any theoretical model of radiation damage.
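A direct transcription of this definition is sketched below: for every pair of observations of the same unique reflection, the absolute intensity difference and the pair mean are accumulated under their frame-number difference, and the ratio of the sums gives *R*_{d}. The input format (a flat list of (hkl, frame, intensity) tuples with integer centroid frame numbers) and the function name are assumptions; resolution shells can be handled by calling the function separately on the observations of each shell.

```python
from collections import defaultdict
from itertools import combinations


def decay_r_factor(observations):
    """Hypothetical helper: R_d as a function of frame-number difference.

    observations: iterable of (hkl, frame, intensity); hkl is any hashable key
    identifying the unique reflection, frame is the integer centroid frame.
    Returns {frame_difference: R_d}.
    """
    by_hkl = defaultdict(list)
    for hkl, frame, intensity in observations:
        by_hkl[hkl].append((frame, intensity))
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for obs in by_hkl.values():
        # Every pair of observations of one unique reflection contributes once.
        for (fi, yi), (fj, yj) in combinations(obs, 2):
            diff = abs(fi - fj)
            numerator[diff] += abs(yi - yj)
            denominator[diff] += 0.5 * (yi + yj)
    return {diff: numerator[diff] / denominator[diff]
            for diff in sorted(numerator) if denominator[diff] != 0}


obs = [((1, 2, 3), 1, 100.0), ((1, 2, 3), 2, 90.0), ((1, 2, 3), 3, 80.0),
       ((2, 0, 1), 1, 50.0), ((2, 0, 1), 3, 40.0)]
rd = decay_r_factor(obs)
print(rd)
```

Because only pairs are compared, a reflection measured three times contributes three pairs and one measured twice contributes one, without any multiplicity-dependent normalization.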

In contrast to *R*_{sym} (Diederichs & Karplus, 1997), *R*_{d} does not depend on the multiplicity, because only pairwise comparisons are performed.

Plots of *R*_{d} (Fig. 1) by resolution shell can be routinely used to check data sets for radiation damage. A positive slope indicates significant radiation damage (Fig. 1*a*). After radiation-damage correction, the slope should be around zero (Fig. 1*b*).

The difference (without radiation-damage correction) between the *R* factors at low (*R*_{d,low}) and high (*R*_{d,high}) frame-number differences can be used to quantify the damage. To enable the comparison of radiation damage between different proteins, or different crystal forms or growth conditions of the same protein, we may in the simplest case calculate a resolution-dependent quantity `srd', considered to be a `susceptibility for radiation damage', as

$$\mathrm{srd} = \frac{R_{d,\mathrm{high}} - R_{d,\mathrm{low}}}{D_{\mathrm{high}} - D_{\mathrm{low}}},$$

where *D*_{low} and *D*_{high} are the doses corresponding to the low and high frame-number differences; srd is thus expressed in units of *R*-factor increase per dose.
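In code, srd can be estimated from an *R*_{d} curve of one resolution shell by averaging *R*_{d} over a window of small and a window of large frame-number differences and dividing the increase by the corresponding dose separation, taken as srd = (*R*_{d,high} − *R*_{d,low})/(*D*_{high} − *D*_{low}). The window boundaries, the use of the mean frame difference in each window as its position, and the constant `dose_per_frame` factor are all assumptions of this sketch.

```python
def srd(rd_by_frame_diff, low_window, high_window, dose_per_frame=1.0):
    """Hypothetical estimate of srd = (R_d,high - R_d,low) / (D_high - D_low).

    rd_by_frame_diff: {frame difference: R_d} for one resolution shell.
    low_window, high_window: inclusive (min, max) ranges of frame differences.
    dose_per_frame: assumed constant dose increment per frame.
    """
    def window_mean(window):
        lo, hi = window
        points = [(k, r) for k, r in rd_by_frame_diff.items() if lo <= k <= hi]
        if not points:
            raise ValueError(f"no R_d values in window {window}")
        mean_k = sum(k for k, _ in points) / len(points)
        mean_r = sum(r for _, r in points) / len(points)
        return mean_k, mean_r

    k_low, r_low = window_mean(low_window)
    k_high, r_high = window_mean(high_window)
    if k_high == k_low:
        raise ValueError("windows must correspond to different dose separations")
    return (r_high - r_low) / ((k_high - k_low) * dose_per_frame)


# Synthetic R_d curve rising linearly by 0.001 per frame of separation.
curve = {k: 0.05 + 0.001 * k for k in range(1, 101)}
print(srd(curve, (1, 10), (91, 100)))
```

Averaging over windows rather than taking single points makes the estimate less sensitive to the scatter of individual *R*_{d} values.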

#### 2.3. Sampling of rotation range and radiation damage

For a given crystal orientation and geometry of the diffraction experiment, software is usually available to plan data collection so as to obtain sufficient completeness of data and multiplicity of observations. Completeness is important to collect all available information, and multiplicity is required to allow scaling and outlier rejection in the traditional scaling model. When radiation-damage correction is to be performed, multiplicity is also required to determine the decay factor(s) of each unique reflection.

If the dose per image is constant (or slowly varying), the radiation damage of a crystal is proportional to its exposure to the beam, and therefore to the fraction of the data set's rotation range through which it has been rotated. To allow accurate interpolation of intensities to dose values within the data set's rotation range, and possibly slightly beyond it, it is important both to cover as much as possible of the whole rotation range with the observations and to collect the observations in equal intervals, rather than (for example) in pairs because of the alignment of a symmetry axis with the rotation axis. If either of these requirements is not met, the multiplicity cannot be expected to be a good indicator of whether interpolation and extrapolation are possible.

To the author's knowledge, parameters quantifying these effects have not been defined in the literature. Possible definitions are given below.

##### 2.3.1. Coverage of rotation range

A straightforward way to define the coverage *C* of rotation space for a single unique reflection is to calculate the fraction of the total rotation range of the data set (φ_{max} − φ_{min}) covered by its *n* observations. If the crystal is rotated around the φ axis, the observations of a reflection are measured at φ_{1}…φ_{n}, where the index 1 refers to the first observation and *n* to the last. We define for each reflection

$$C = \frac{\varphi_{n} - \varphi_{1}}{\varphi_{\max} - \varphi_{\min}},$$

which is zero if only one observation is made, and one if the first and last observations are measured at the start and end of the data set. Intermediate values indicate an incomplete coverage of total rotation space in an obvious way. The average value of *C* over all unique reflections in a data set could be a useful parameter to optimize, together with the resulting completeness, as a function of the starting value φ_{min} of a data set.
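The coverage *C* = (φ_{n} − φ_{1})/(φ_{max} − φ_{min}) and its data-set average translate directly into code. The sketch below assumes that the observation angles of each unique reflection are already available as a list (in degrees); the function names are hypothetical.

```python
def coverage(phis, phi_min, phi_max):
    """C = (phi_n - phi_1) / (phi_max - phi_min) for one unique reflection.

    phis: rotation angles (degrees) of the reflection's observations.
    phi_min, phi_max: start and end of the data set's rotation range.
    """
    if not phis:
        raise ValueError("need at least one observation")
    if phi_max <= phi_min:
        raise ValueError("phi_max must exceed phi_min")
    return (max(phis) - min(phis)) / (phi_max - phi_min)


def mean_coverage(phis_per_reflection, phi_min, phi_max):
    """Average C over all unique reflections of a data set."""
    values = [coverage(p, phi_min, phi_max) for p in phis_per_reflection]
    return sum(values) / len(values)


# One single observation (C = 0), one full-range reflection (C = 1), one half-range (C = 0.5).
reflections = [[10.0], [0.0, 90.0, 180.0], [45.0, 135.0]]
print(mean_coverage(reflections, 0.0, 180.0))
```

A planning program could evaluate `mean_coverage` for a series of trial φ_{min} values alongside completeness.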

##### 2.3.2. Evenness of sampling of rotation range

A possible way of calculating the `evenness of sampling of rotation range', and therefore of the evenness of sampling of radiation damage, for individual unique reflections is the following. Again, we assume that *n* observations of a given unique reflection cover a rotation range φ_{n} − φ_{1}. We can then calculate the rotation intervals Δ_{i} = φ_{i} − φ_{i−1} for *i* > 1 and define the entropy *H* of sampling as

$$H = -\sum_{i=2}^{n} p_{i}\ln p_{i},$$

with *p*_{i} = Δ_{i}/(φ_{n} − φ_{1}) (intervals with *p*_{i} = 0 contribute zero), and the evenness *E* of sampling as

$$E = \frac{H}{\ln(n - 1)},$$

with *E* defined as 0 if *n* ≤ 2.

This usage of the Shannon entropy *H* parallels that in other natural sciences, and results in an evenness *E* of sampling with a value of zero in the case of fewer than two intervals, and with a value of one when all intervals are of the same size (*n* > 2). Intermediate values result if the intervals are of unequal size.
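The evenness can be computed per reflection as sketched below, using *E* = *H*/ln(*n* − 1) with *H* = −Σ *p*_{i} ln *p*_{i} over the *n* − 1 rotation intervals, *p*_{i} being each interval's fraction of φ_{n} − φ_{1}. The function name is hypothetical, and returning 0 when all observations share one angle is an assumption not covered by the definition.

```python
import math


def evenness(phis):
    """E = H / ln(n - 1), with H = -sum p_i ln p_i over the n - 1 rotation intervals.

    Returns 0 for n <= 2 (fewer than two intervals), as in the definition.
    """
    phis = sorted(phis)
    n = len(phis)
    if n <= 2:
        return 0.0
    total = phis[-1] - phis[0]
    if total <= 0:
        return 0.0  # assumption: all observations at one angle means no sampling
    h = 0.0
    for previous, current in zip(phis, phis[1:]):
        p = (current - previous) / total
        if p > 0:  # a zero-width interval contributes nothing (0 ln 0 = 0)
            h -= p * math.log(p)
    return h / math.log(n - 1)


even = evenness([0.0, 30.0, 60.0, 90.0])
# Eight observations forming four tightly spaced, evenly distributed pairs.
paired = evenness([0.0, 0.001, 60.0, 60.001, 120.0, 120.001, 180.0, 180.001])
print(even, paired, math.log(3) / math.log(7))
```

For the paired example the result is close to ln 3/ln 7 ≈ 0.565, the value ln(*n*/2 − 1)/ln(*n* − 1) for *n* = 8.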

Again, the average value of *E* over all unique reflections in a data set could be a useful parameter to optimize, together with the resulting completeness and coverage, as a function of the starting value φ_{min} of a data set. As the three parameters are unlikely to be optimal for the same φ_{min} value, a compromise should be sought depending on the purpose of data collection.

### 3. Results and discussion

In §§3.1 to 3.4, aspects of quantification and correction of radiation damage at the level of individual reflections by different models are shown and discussed. The corrections are assumed to be done in addition to the usual artificial temperature factor correction of the zero-order model. In §§3.5 and 3.6, further aspects are discussed which relate to various data collection parameters.

#### 3.1. Analysis of radiation damage by the *R*_{d} plot

The *R*_{d} plot has been used to detect radiation damage in a number of projects. Fig. 1 shows *R*_{d} plots of data measured at the SLS (Swiss Light Source, Villigen, Switzerland) without and with radiation-damage correction (exponential model). Without radiation-damage correction of individual reflections, the slope of *R*_{d} is found to be higher for high-resolution shells than for low-resolution reflections. After correction, the slope is close to zero in all resolution ranges.

Within the limited experience obtained so far, the features seen in Fig. 1 are common to the projects where these calculations were made. As the slopes of *R*_{d} appear to be reproducible across data sets from crystals of the same drop, and qualitatively similar for high- and low-resolution shells, it can be expected that *R*_{d} gives a more robust quantitative estimation of radiation damage than changes in *e.g.* cell volume or mosaicity, and is also applicable for low-resolution data.

#### 3.2. Modelling of radiation damage at the level of unique reflections

Obviously, if the constant part of the electron density as well as the changes in … *et al.*, 2004) or implicitly (Weiss *et al.*, 2004) carried out to obtain better experimental phases. Likewise, if the constant part is known, the changes in the radiation-susceptible parts can be analysed by interpolation (with any of the low-order models) to a number of snapshots along the kinetic coordinate of radiation damage (Diederichs *et al.*, 2003; Wang & Ealick, 2004).

#### 3.3. Choice of model

The quadratic model, if applicable, gives insight into the mathematical side of the problem. If the radiation-sensitive … *et al.*, 2004).

However, the assumptions underlying the quadratic model (linearity in dose and the absence of unit-cell changes) can often not be considered fulfilled in practice; in particular, the kinetics are unlikely to follow a linear dose-damage relationship. Furthermore, the determination of two additional parameters instead of one (as in the linear and the exponential model) increases the risk of overfitting the experimental data. On the other hand, the one-parameter models lack a physical interpretation.

#### 3.4. Robust estimation at 1/4-dose and 3/4-dose

In this situation, it is useful to discuss a feature common to all least-squares fits with low-order functions. If we assume that the `true' dose dependency is itself an unknown low-order function resulting in the observed values *Y*(*d*), then the task of each of the possible models *y*(*d*) is to approximate the unknown function as well as possible in a least-squares sense. This results in a small number (one or two) of continuous ranges where the function values *y*(*d*) are larger than those of *Y*(*d*), alternating with one or two continuous ranges where they are smaller. The error of the approximation is largest at the ends of the dose interval and smallest where *Y*(*d*) and *y*(*d*) intersect. For all the low-order models, the points of intersection are close to 1/4 and 3/4 of the maximum dose, irrespective of the model used (Fig. 2).

We can therefore conclude that the error made by the approximation of the unknown dose-dependency is greatest when an extrapolation is performed to zero or maximum dose, and is smallest when interpolating to 1/4 or 3/4 of the maximum dose. Near the latter points, choice of a different low-order model does not result in a substantially changed interpolated value for the intensity. Therefore, the choice of the model does not matter much, if the interpolation is performed at these points.
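This behaviour can be checked numerically. The sketch below assumes a hypothetical `true' dose dependency exp(−2*d*) on a dose range scaled to [0, 1], fits a straight line to it by least squares, and locates the points where the two curves intersect. For this example they fall at roughly 0.19 and 0.77; for a least-squares straight line whose error is dominated by curvature, the crossings tend towards 1/2 ± 1/(2√3) ≈ 0.21 and 0.79, the zeros of the second Legendre polynomial shifted to [0, 1]. The approximation error at the ends of the range is several times larger than at 1/4 and 3/4.

```python
import math


def linear_least_squares(xs, ys):
    """Ordinary least-squares straight line; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def sign_changes(xs, residuals):
    """Positions (linearly interpolated) where the residual changes sign."""
    points = []
    for x0, x1, r0, r1 in zip(xs, xs[1:], residuals, residuals[1:]):
        if r0 == 0:
            points.append(x0)
        elif r0 * r1 < 0:
            points.append(x0 - r0 * (x1 - x0) / (r1 - r0))
    return points


def true_curve(d):
    # Hypothetical "true" dose dependency on a dose range scaled to [0, 1].
    return math.exp(-2.0 * d)


doses = [k / 200 for k in range(201)]
a, b = linear_least_squares(doses, [true_curve(d) for d in doses])
residuals = [true_curve(d) - (a + b * d) for d in doses]
crossings = sign_changes(doses, residuals)
errors = {d: abs(true_curve(d) - (a + b * d)) for d in (0.0, 0.25, 0.75, 1.0)}
print(crossings)
print(errors)
```

Replacing the straight line with another low-order model changes the crossing positions only slightly, which is the practical reason for interpolating near these points.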

#### 3.5. Effect of irradiated volume

The applicability of all the mathematical models, and therefore of the computational correction of radiation damage, depends on whether the crystal is uniformly irradiated during the experiment. If the crystal is larger than the beam in the direction perpendicular to the spindle axis, fresh parts of the crystal enter the beam path during rotation of the spindle, and only the center of the crystal receives the full dose (Fig. 3). The resulting radiation damage can in principle be described as a superposition of radiation effects at different doses. Clearly, no simple low-order model can account for this situation.

For a data set measured like this, the changes in intensities owing to radiation damage are largest at the beginning of the data set, when the center of the crystal still contributes to the observed structure factors. Later in the data set, the center of the crystal is amorphous and a dynamic equilibrium may be reached which is not accessible to radiation-damage correction, or does not even require any. Finally, after 180° rotation, no unirradiated crystalline material is left to enter the beam path, and another sudden change of diffraction properties occurs: the resulting intensities represent diffraction arising solely from radiation-damaged protein, and a different dynamic equilibrium results. This repeats itself with a period of 180°, which might have implications for radiation-damage-induced phasing (RIP; Ravelli *et al.*, 2003) experiments. Clearly, whether or not radiation-damage correction is performed, this also means that data sets should be scaled in batches of 180°.

Experiments handling this effect of the irradiated volume properly fall into two classes: either the beam is much smaller than the crystal, or the beam is about as big as the crystal. In the first case, the dynamic equilibrium is reached soon (provided that the … *R*_{d} plot; in the dynamic equilibrium case the slope of *R*_{d} should be smaller for ranges of data up to 180° than for a small data range near the beginning of data collection.

If a big crystal is not available or a very fine beam cannot be produced, an attempt should be made to match the size of the beam to that of the crystal (in the direction perpendicular to the spindle) by adjustment of optical elements (*e.g.* focus, slits). In this case, the measured intensities may be extrapolated or interpolated with any of the low-order functions discussed above.

#### 3.6. Coverage and sampling of rotation and dose space

At the current state of knowledge, it is still unknown which multiplicity of reflections is required to allow a reliable correction of radiation damage at the level of individual reflections. One important aspect is that the dose space should be sampled evenly by the observations, in order to accurately determine the parameter(s) of the decay model, and to allow inter- and extrapolation. For example, if all observations of a unique reflection are measured closely together, and therefore correspond to a similar dose, the sampling is poor, whereas an even sampling would result if the observations were spread out in the rotation (or rather, dose) range covered by the experiment.

The formalism given in §2.3.1 and §2.3.2 can be used to assess the coverage and evenness of sampling of rotation range, and therefore of dose space, for each unique reflection. Both coverage *C* and evenness *E* are numbers between 0 and 1. An evenness of 1 corresponds to the most even sampling, with equal rotation angle (dose) intervals between successive observations. If all observations occur in pairs (*e.g.* because of a certain alignment of symmetry elements with the rotation axis) and these pairs are evenly spaced, the evenness drops from one to approximately ln(*n*/2 − 1)/ln(*n* − 1), the limiting value when the spacing within each pair is negligible. The case of *E* = 0 corresponds to poor sampling (zero or one interval).

The average coverage and evenness of all reflections in a data set are therefore useful indicators which could be optimized or at least monitored by software which is designed to plan the data collection based on a given geometry of the diffraction experiment.

At the level of individual reflections, complete cross-validation (Diederichs *et al.*, 2003) or a different statistical significance test based on the actual data could be performed to assess whether inter- and extrapolation of intensity is going to be reliable or not. An implementation of a statistical significance test is available in *XSCALE* (keyword 0-DOSE_SIGNIFICANCE_LEVEL).
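The idea of cross-validation can be sketched for a single reflection: each observation is left out in turn, the zero-order model (the mean) and a linear decay model are fitted to the remaining observations, and their squared prediction errors for the omitted observation are summed. This is a hypothetical illustration with unweighted fits and invented helper names; it is not necessarily the procedure of Diederichs *et al.* (2003), and it does not reproduce the test behind the *XSCALE* keyword 0-DOSE_SIGNIFICANCE_LEVEL.

```python
def loo_errors(doses, intensities):
    """Leave-one-out squared prediction errors (zero-order, linear) for one reflection."""
    doses, intensities = list(doses), list(intensities)
    n = len(doses)
    if n < 3:
        raise ValueError("need at least three observations")
    err_zero = err_linear = 0.0
    for k in range(n):
        d_train = doses[:k] + doses[k + 1:]
        y_train = intensities[:k] + intensities[k + 1:]
        m = len(d_train)
        mean_d = sum(d_train) / m
        mean_y = sum(y_train) / m
        # Zero-order model: the prediction is the mean of the remaining observations.
        err_zero += (intensities[k] - mean_y) ** 2
        sxx = sum((d - mean_d) ** 2 for d in d_train)
        if sxx == 0:
            pred = mean_y  # slope undefined: fall back to the zero-order prediction
        else:
            sxy = sum((d - mean_d) * (y - mean_y) for d, y in zip(d_train, y_train))
            pred = mean_y + (sxy / sxx) * (doses[k] - mean_d)
        err_linear += (intensities[k] - pred) ** 2
    return err_zero, err_linear


def decay_model_supported(doses, intensities):
    """Hypothetical criterion: accept the decay model if it predicts left-out data better."""
    err_zero, err_linear = loo_errors(doses, intensities)
    return err_linear < err_zero


frames = [0, 1, 2, 3, 4, 5]
print(decay_model_supported(frames, [100, 91, 79, 71, 60, 50]))    # clear decay
print(decay_model_supported(frames, [100, 102, 98, 101, 99, 100]))  # noise only
```

For the noise-only reflection the linear model predicts the omitted observations worse than the mean does, so its extrapolated intensity would not be trusted under this criterion.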

#### 3.7. Concluding remarks

Radiation damage analysis and correction is still a new field of research. The aspects and indicators discussed here are hoped to advance the understanding of some important parameters in a diffraction experiment, to foster software improvements and ultimately to result in better data and new ways to obtain insight into physical, chemical and biological phenomena.

### Acknowledgements

The author would like to thank Drs R. Ravelli, S. McSweeney, V. Favre-Nicolin, U. Ermler, E. Warkentin, M. S. Weiss, W. Kabsch and the staff of beamline X06SA of the Swiss Light Source (Villigen, Switzerland) for discussion.

### References

Banumathi, S., Zwart, P. H., Ramagopal, U. A., Dauter, M. & Dauter, Z. (2004). *Acta Cryst.* D**60**, 1085–1093.

Diederichs, K. & Karplus, P. A. (1997). *Nature Struct. Biol.* **4**, 269–275.

Diederichs, K., McSweeney, S. & Ravelli, R. (2003). *Acta Cryst.* D**59**, 903–909.

Garman, E. (2003). *Curr. Opin. Struct. Biol.* **13**, 545–551.

Henderson, R. (1990). *Proc. R. Soc. London. Ser. B Biol. Sci.* **241**, 6–8.

Kabsch, W. (2004). *XDS* version December 2004. https://www.mpimf-heidelberg.mpg.de/~kabsch/xds.

Köhler, R., Schäfer, K., Müller, S., Vignon, G., Diederichs, K., Philippsen, A., Ringler, P., Pugsley, A. P., Engel, A. & Welte, W. (2004). *Mol. Microbiol.* **54**, 647–664.

Murray, J. & Garman, E. (2002). *J. Synchrotron Rad.* **9**, 347–354.

O'Neill, P., Stevens, D. L. & Garman, E. F. (2002). *J. Synchrotron Rad.* **9**, 329–332.

Ravelli, R. B. G., Leiros, H. K., Pan, B., Caffrey, M. & McSweeney, S. (2003). *Structure*, **11**, 217–224.

Ravelli, R. B. G. & McSweeney, S. M. (2000). *Structure*, **8**, 315–328.

Ravelli, R. B. G., Theveneau, P., McSweeney, S. & Caffrey, M. (2002). *J. Synchrotron Rad.* **9**, 355–360.

Schiltz, M., Dumas, P., Ennifar, E., Flensburg, C., Paciorek, W., Vonrhein, C. & Bricogne, G. (2004). *Acta Cryst.* D**60**, 1024–1031.

Wang, J. & Ealick, S. E. (2004). *Acta Cryst.* D**60**, 1579–1585.

Weiss, M. S., Mander, X., Hedderich, X., Diederichs, K., Ermler, U. & Warkentin, E. (2004). *Acta Cryst.* D**60**, 686–695.

© International Union of Crystallography. Prior permission is not required to reproduce short quotations, tables and figures from this article, provided the original authors and source are cited.