Integration of macromolecular diffraction data
^{a}MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, England
^{*}Correspondence email: andrew@mrclmb.cam.ac.uk
Diffraction intensities can be evaluated by two distinct procedures: summation integration and profile fitting. Equations are derived for evaluating the intensities and their standard errors for both cases, based on Poisson statistics. These equations highlight the importance of the contribution of the X-ray background to the standard error and give an estimate of the improvement which can be achieved by profile fitting. Profile fitting offers additional advantages in allowing estimation of saturated reflections and in dealing with incompletely resolved diffraction spots.
Keywords: profile fitting; summation integration.
1. Introduction
Data integration refers to the process of obtaining estimates of diffracted intensities (and their standard errors) from the raw images recorded by an X-ray detector. As two-dimensional area detectors are almost universally used to collect macromolecular diffraction data, only this type of detector will be considered in the following analysis.
When collecting data with a two-dimensional area detector, a decision has to be taken about the magnitude of the angular rotation of the crystal during the recording of each image. Two distinct modes of operation are possible: the rotation per image can be comparable to or greater than the angular reflection range of a typical reflection (coarse φ slicing), or it can be much less than the reflection width (fine φ slicing). The latter approach allows the use of three-dimensional profile fitting and, providing the detector is relatively noise-free, will improve the quality of the resulting data by minimizing the contribution of the X-ray background to the total measured intensity. However, there are significant overheads associated with recording, storing and processing the relatively large number of images that are required. Three-dimensional profile fitting is described in the article by Pflugrath (1999) and will not be discussed here.
2. Prerequisites for accurate integration
2.1. Crystal parameters
Only the integration procedure itself will be described in detail in this article. However, in order to obtain the highest quality data possible from a given set of images, there are a number of parameters which need to be determined in advance of or during the integration. The most important of these are the unit-cell parameters, which should be determined to an accuracy of a few parts in a thousand (or better). Post-refinement procedures (Winkler et al., 1979; Rossmann et al., 1979), which make use of the estimated φ centroids of observed spots rather than their detector coordinates, generally provide more accurate estimates than methods based on the spot positions. This is because spot positions are affected by residual spatial distortions (after applying appropriate corrections) and, additionally, the unit-cell parameters are correlated with the crystal-to-detector distance, which is not always accurately known. For either method, it is necessary to include data from widely separated regions of reciprocal space (ideally φ values 90° apart) in order to determine all unit-cell parameters accurately. This is particularly important for lower symmetry space groups.
The crystal orientation also needs to be known to an accuracy which corresponds to a few percent of the reflection width. For crystals with low mosaicity (e.g. 0.1°), this corresponds to a hundredth of a degree or better. Fortunately, it is a feature of post-refinement that the error in determining the orientation is typically a few percent of the reflection width, and so this condition can generally be met. It is important to allow for movement of the crystal by continuously updating the crystal orientation during integration. This is true even when using cryocooled crystals, as the magnetic couplings which attach the pin (holding the crystal) to the goniometer head are not strong enough to prevent small movements, particularly with the high angular rotation rates employed on intense synchrotron beamlines. Non-orthogonality of the incident X-ray beam and the rotation axis (if not allowed for) or an off-centred crystal will also give rise to apparent changes in crystal orientation with spindle rotation.
The crystal mosaicity can be estimated by visual inspection and refined by post-refinement. Refined values are quite reliable when the mosaic spread is less than about 0.5°, but become more dependent on the rocking-curve model for the high mosaicities which are often associated with frozen crystals. The presence of diffuse scatter, which appears as haloes around the Bragg diffraction spots, presents further difficulties in determining the correct mosaic spread. When processing coarse-sliced images, it is preferable to slightly overestimate the mosaic spread (rather than underestimate it). This will result in an increase in random errors (by adding in the X-ray background from an image on which the spot is not actually present), whereas using too small a value can give systematic errors (by underestimating the number of images on which the spot lies).
2.2. Detector parameters
Detector calibration is essential for high data quality. Both the spatial distortion and the non-uniformity of response of the detector must be known accurately, and it is equally important that these corrections are stable over the time scale of the experiment (and preferably for much longer).
Finally, the crystal-to-detector distance, the detector orientation and the direct-beam position must be refined and continuously updated during integration, using observed spot positions. The crystal-to-detector distance can vary during data collection if the crystal is not exactly centred on the rotation axis, and the direct-beam position can move after a beam refill at a synchrotron. For image-plate detectors with two (or more) plates, the direct-beam position and detector distance often differ slightly for different plates.
With appropriate care, it is normally possible to predict reflection positions on the detector to an accuracy of 20–30 µm, or a fraction of the pixel size, particularly for highly collimated X-ray beams available at synchrotron sources. This level of accuracy is necessary to minimize possible systematic errors, particularly in the case of profile fitting.
3. Methods of integration
There are two quite distinct procedures available for determining the integrated intensities: summation integration and profile fitting. Summation integration involves simply adding the pixel values for all pixels lying within the area of a spot and then subtracting the estimated background contribution to the same pixels. Profile fitting (Diamond, 1969; Ford, 1974; Rossmann, 1979) assumes that the actual spot shape or profile is known (in two or three dimensions), and the intensity is derived by finding the scale factor which, when applied to the known (or standard) profile, gives the best fit to the observed spot profile. In practice, profile fitting requires two separate steps: determination of the standard profiles and evaluation of the profile-fitting intensities. As will be shown later, profile fitting results in a reduction in the random error associated with weak intensities, but offers no improvement for very high intensities.
4. X-ray background
In the complete absence of X-ray background and detector noise, the integration of the diffraction images becomes very straightforward. Using the geometry of the Ewald sphere construction, it is possible to map every pixel on the detector into reciprocal space and assign to each pixel the indices of the nearest reciprocal-lattice point. The diffracted intensity could then be found by summing the pixel values for all pixels which have been assigned to that particular reciprocal-lattice point. In the absence of background and detector noise, these intensities are as accurate an estimate as it is possible to obtain, and methods like profile fitting offer no advantage. The inclusion of pixels which lie outside the physical extent of the spot on the detector does not compromise the signal-to-noise ratio. In practice, it is extremely rare for the background to be negligible, even for the relatively strong low-resolution diffraction spots.
4.1. The measurement box
X-ray scattering from air, the sample holder and the specimen itself gives rise to a general background in the images which has to be subtracted in order to obtain the Bragg intensities. Ideally, the background should be measured for the same pixels used to record the Bragg diffraction spot, but this is not usually practical and the background is determined using pixels immediately adjacent to the spot. In practice, the pixels to be used for the determination of the background (background pixels) and those to be used for evaluating the intensity (peak pixels) are defined using a 'measurement box'. This is a rectangular box of pixels which is centred on the predicted spot position. Each pixel within the box is classified as being a background or a peak pixel (or neither). This mask can either be defined by the user or the classification can be made automatically by the program. An example of a possible measurement-box definition is given in Fig. 1. The background parameters NRX, NRY and NC can be optimized automatically by maximizing the ratio of the intensity divided by its standard error, in a manner analogous to that described by Lehmann & Larsen (1974). It is generally assumed that the background can be adequately modelled as a plane, and the plane constants are determined using the background pixels. This allows the background to be estimated for the peak pixels, so that the background-corrected intensity can be calculated.
5. Integration by simple summation
5.1. Determination of the best background plane
The background-plane constants a, b and c are determined by minimizing
$$R_{1} = \sum_{i=1}^{n} w_{i}\,(\rho_{i} - a p_{i} - b q_{i} - c)^{2} \qquad (1)$$
where ρ_{i} is the total number of counts at the pixel with coordinates (p_{i}, q_{i}) with respect to the centre of the measurement box and the summation is over the n background pixels. w_{i} is a weight which should ideally be the inverse of the variance of ρ_{i}. Assuming that the variance is determined by counting statistics, this gives
$$w_{i} = 1/[G\,E(\rho_{i})] \qquad (2)$$
where G is the detector gain, which converts pixel counts to equivalent X-ray photons, and E(ρ_{i}) is the expectation value of the background counts ρ_{i}. In practice, the variation in background across the measurement box is usually sufficiently small that all weights can be considered to be equal.
This gives the following equations for a, b and c, as given in Rossmann (1979)
$$\begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} \sum p^{2} & \sum pq & \sum p \\ \sum pq & \sum q^{2} & \sum q \\ \sum p & \sum q & n \end{pmatrix}^{-1} \begin{pmatrix} \sum p\rho \\ \sum q\rho \\ \sum \rho \end{pmatrix} \qquad (3)$$
where all summations are over the n background pixels.
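As an illustration of the equal-weight case, the following minimal sketch builds and solves the 3 × 3 normal equations for the plane constants. The function names, the tuple layout of the input and the 11 × 11 box with a two-pixel background rim are assumptions made for this example only; they are not taken from any integration program.

```python
def solve_linear(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [[float(v) for v in A[r]] + [float(y[r])] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_background_plane(pixels):
    """Equal-weight least-squares plane rho = a*p + b*q + c.

    `pixels` holds one (p, q, rho) tuple per background pixel, with p and q
    measured from the centre of the measurement box.
    """
    pixels = list(pixels)
    n = len(pixels)
    s_pp = sum(p * p for p, q, rho in pixels)
    s_pq = sum(p * q for p, q, rho in pixels)
    s_qq = sum(q * q for p, q, rho in pixels)
    s_p = sum(p for p, q, rho in pixels)
    s_q = sum(q for p, q, rho in pixels)
    s_pr = sum(p * rho for p, q, rho in pixels)
    s_qr = sum(q * rho for p, q, rho in pixels)
    s_r = sum(rho for p, q, rho in pixels)
    normal = [[s_pp, s_pq, s_p],
              [s_pq, s_qq, s_q],
              [s_p, s_q, n]]
    a, b, c = solve_linear(normal, [s_pr, s_qr, s_r])
    return a, b, c


# Hypothetical 11 x 11 box: the outer two rows and columns are background.
# The counts follow a noise-free plane so the fit can be checked by eye.
background = [(p, q, 0.5 * p - 0.25 * q + 20.0)
              for p in range(-5, 6) for q in range(-5, 6)
              if abs(p) > 3 or abs(q) > 3]
a, b, c = fit_background_plane(background)
print(a, b, c)
```

With real data the counts are noisy, and unequal weights would simply multiply each term in the sums.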
5.1.1. Outlier rejection
It is not unusual for the diffraction pattern to display features other than the Bragg diffraction spots from the crystal of interest. Possible causes are the presence of a satellite crystal or twin component, white-radiation streaks, cosmic rays or zingers. In order to minimize their effect on the determination of the background-plane constants, the following outlier-rejection algorithm is employed.

5.2. Evaluating the integrated intensity and standard error
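The summation integration intensity I_{s} is given by
$$I_{s} = \sum_{i=1}^{m} (\rho_{i} - a p_{i} - b q_{i} - c) \qquad (4)$$
where the summation is over the m pixels in the peak region of the measurement box. If the peak region has mm symmetry, this simplifies to
$$I_{s} = \sum_{i=1}^{m} (\rho_{i} - c) \qquad (5)$$
To evaluate the standard error, this can be written
$$I_{s} = \sum_{i=1}^{m} \rho_{i} - \frac{m}{n} \sum_{j=1}^{n} \rho_{j} \qquad (6)$$
where the second summation is over the n background pixels. The variance in I_{s} is
$$\sigma_{I_{s}}^{2} = \sum_{i=1}^{m} \sigma_{i}^{2} + \left(\frac{m}{n}\right)^{2} \sum_{j=1}^{n} \sigma_{j}^{2} \qquad (7)$$
From Poisson statistics, this becomes
$$\sigma_{I_{s}}^{2} = \sum_{i=1}^{m} G\rho_{i} + \left(\frac{m}{n}\right)^{2} \sum_{j=1}^{n} G\rho_{j} \qquad (8)$$
$$\phantom{\sigma_{I_{s}}^{2}} = G\left[I_{s} + I_{bg} + \left(\frac{m}{n}\right)^{2} \sum_{j=1}^{n} \rho_{j}\right] \qquad (9)$$
where I_{bg} is the background summed over all peak pixels. We can also write
$$\sum_{j=1}^{n} \rho_{j} = \frac{n}{m}\, I_{bg} \qquad (10)$$
(this is only strictly true if the background region has mm symmetry). Then,
$$\sigma_{I_{s}}^{2} = G\left[I_{s} + I_{bg} + \frac{m}{n}\, I_{bg}\right] \qquad (11)$$
This expression shows the importance of the background (I_{bg}) in determining the standard error in the intensity. For weak reflections, the Bragg intensity (I_{s}) is often much smaller than the background (I_{bg}) and the error in the intensity is determined entirely by the background contribution.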
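The role of the background term can be made concrete with a small numerical sketch. It assumes a flat background and mm symmetry for both regions, so that the plane reduces to the mean background count; the function name, the default gain of 1 and the pixel values are hypothetical.

```python
import math


def summation_intensity(peak_counts, background_counts, gain=1.0):
    """Summation-integration intensity and its Poisson standard error.

    Assumes a flat background and mm symmetry of both regions, so the
    background under each peak pixel is the mean background count.
    """
    m = len(peak_counts)
    n = len(background_counts)
    mean_background = sum(background_counts) / n
    i_bg = m * mean_background              # background summed over the peak pixels
    i_s = sum(peak_counts) - i_bg           # background-corrected intensity
    # Variance G * [I_s + I_bg + (m/n) * I_bg], as in (11).
    variance = gain * (i_s + i_bg + (m / n) * i_bg)
    return i_s, math.sqrt(variance)


# Hypothetical weak spot: 9 peak pixels on a background of 10 counts per pixel,
# with 36 background pixels used to estimate the background level.
peak = [12, 15, 30, 15, 12, 14, 40, 14, 11]
background = [10] * 36
i_s, sigma = summation_intensity(peak, background)
print(i_s, sigma)
```

In this example the background under the peak (90 counts) exceeds the Bragg intensity (73 counts), so most of the variance comes from the background terms rather than from the spot itself.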
5.3. The effect of instrument or detector errors
Standard error estimates calculated using (11) are generally in quite good agreement with observed differences between the intensities of symmetry-related reflections for weak or medium intensities. This is particularly true if other sources of systematic error are minimized by measuring the same reflections five or more times by taking multiple exposures of the same small oscillation range and then processing the data in P1. However, even in this case, the agreement between strong intensities is significantly worse than that predicted using (11). This is consistent with the observation that it is very unusual to obtain merging R factors lower than 1%, even for very strong reflections where Poisson statistics would suggest merging R factors should be in the range 0.2–0.3%.
An experiment in which a diffraction spot recorded on photographic film was scanned many times on an optical microdensitometer showed that the r.m.s. variation in individual pixel values between the scans was greatest for those pixels immediately surrounding the centre of the spot, where the gradient of the optical density was greatest. One explanation for this observation is that these optical densities will be most sensitive to small errors in positioning the reading head, owing to vibration or other mechanical defects. A simple model for the instrumental contribution to the standard error of the spot intensity is obtained by introducing an additional term for each pixel in the spot peak,
where δρ/δx is the average gradient and K is a proportionality constant. Taking a triangular-shaped reflection profile, the gradient and integrated intensity are related by the equation
where x is the half-width of the reflection (in pixels). Writing
this gives
where the factor A allows for differences in spot size and K is ideally a constant for a given instrument.
The total variance in the integrated intensity is then
A value for K can be determined by comparing the goodness-of-fit of the standard profiles to individual reflection profiles (of fully recorded reflections) with that calculated from combined Poisson statistics and the instrument-error term. Standard errors estimated using (17) give much more realistic estimates than those based on (11), even for data collected with CCD detectors, where the physical model for the source of the error is clearly not appropriate.
6. Integration by profile fitting
Providing the background and peak regions are correctly defined, summation integration provides a method for evaluating integrated intensities which is both robust and free from systematic error. For weak reflections, however, many of the pixels in the peak region will contain very little signal (Bragg intensity), but will contribute significantly to the noise because of the Poissonian variation in the background [as shown by the I_{bg} term in (11)]. Profile fitting provides a means of improving the signal-to-noise ratio for this class of reflection (but will provide no improvement for reflections where the background level is negligible).
6.1. Forming the standard profiles
In order to apply profile-fitting methods, the first requirement is to derive a 'standard' profile which accurately represents the true reflection profile. Although analytical functions can be used, it is difficult to define a simple function which will cope adequately with the wide variation in spot shapes which can arise in practice. Most programs therefore rely on an empirical profile derived by summing many different spots. The optimum profile is that which provides the best fit to all the contributing reflections, i.e. that which minimizes
$$R_{2} = \sum_{h} w_{j}(h)\,[K_{h} P_{j} - \rho_{j}(h)_{\rm corr}]^{2} \qquad (18)$$
where P_{j} is the profile value for the jth pixel, ρ_{j}(h)_{corr} is the observed background-corrected counts at that pixel for reflection h, K_{h} is a scale factor and w_{j}(h) is a weight for the jth pixel of reflection h. The summation extends over all reflections contributing to the profile. The weight w_{j}(h) is given by
$$w_{j}(h) = 1/\sigma_{hj}^{2} \qquad (19)$$
and, from Poisson statistics, the variance σ_{hj}^{2} follows from the expectation value of the counts at pixel j,
$$\sigma_{hj}^{2} = G\,(K_{h} P_{j} + a p_{j} + b q_{j} + c) \qquad (20)$$
where a, b and c are the background-plane constants for reflection h. After Rossmann (1979), the summation-integration intensity I_{s}(h) can be used to derive a value for K_{h},
$$K_{h} = I_{s}(h) \Big/ \sum_{j} P_{j} \qquad (21)$$
In (20) and (21), as the profile values P_{j} are not yet determined, a preliminary profile derived, for example, from simple summation of strong reflections used in the detector-parameter refinement can be used, which will give acceptable weights for use in (18).
This method of deriving the standard profile is only appropriate for fully recorded reflections. However, in many cases there will be very few or no fully recorded reflections on each image. In such cases, the profile is determined by simply adding together the background-corrected pixel counts from all contributing reflections. In the program MOSFLM (Leslie, 1992), the profiles are determined using reflections on, typically, ten or more successive images, so that partials will be summed to give the correct fully recorded profile for the majority of the contributing reflections. Tests carried out using standard profiles derived using only fully recorded reflections and (18), or using both fully recorded and partially recorded reflections and simple summation, give data of the same quality as judged by the merging statistics.
The reflection profile changes across the face of the detector, owing to obliquity of incidence, changes in the projected diffracting volume and geometric factors. In the MOSFLM program, this variation is accommodated by determining several standard profiles (typically 9 or 25) for different regions of the detector. When evaluating the profile-fitted intensity for a given reflection, a weighted sum of the nearest standard profiles is calculated to provide the best estimate of the true profile at that position on the detector. For the central regions of the detector, there will be four contributing profiles, while at the edges there will be between one and three. The weights assigned to each profile vary linearly with the distance from the reflection to the centres of the regions used in determining the standard profiles. An alternative procedure, used in DENZO (Otwinowski & Minor, 1997), is to evaluate a new profile for each reflection based on spots lying within a pre-specified radius.
6.2. Evaluation of the profile-fitted intensity
Given an appropriate standard profile, the reflection intensity for fully recorded reflections is evaluated by determining the scale factor K and background-plane constants a, b and c which minimize
$$R_{3} = \sum_{i} w_{i}\,(\rho_{i} - a p_{i} - b q_{i} - c - K P_{i})^{2} \qquad (22)$$
where the summation is over all valid pixels in the measurement box. As before,
$$w_{i} = 1/\sigma_{i}^{2} \qquad (23)$$
and the variance σ_{i}^{2} follows from the expectation value of the counts at pixel i,
$$\sigma_{i}^{2} = G\,(a p_{i} + b q_{i} + c + J P_{i}) \qquad (24)$$
In order to calculate the weights, the background-plane constants and summation-integration intensity I_{s} are evaluated as described in §5, at the same time identifying any outliers in the background. The summation-integration intensity is used to evaluate the scale factor J in (24) using
$$J = I_{s} \Big/ \sum_{i} P_{i} \qquad (25)$$
In (22), the summation is over all valid pixels within the measurement box. This excludes pixels which are overlapped by neighbouring spots (if any) and any outliers identified in the background region.
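Minimizing R_{3} with respect to K, a, b and c leads to four linear equations from which K, a, b and c can be determined:
$$\begin{pmatrix} \sum wP^{2} & \sum wpP & \sum wqP & \sum wP \\ \sum wpP & \sum wp^{2} & \sum wpq & \sum wp \\ \sum wqP & \sum wpq & \sum wq^{2} & \sum wq \\ \sum wP & \sum wp & \sum wq & \sum w \end{pmatrix} \begin{pmatrix} K \\ a \\ b \\ c \end{pmatrix} = \begin{pmatrix} \sum wP\rho \\ \sum wp\rho \\ \sum wq\rho \\ \sum w\rho \end{pmatrix}$$
and the profile-fitted intensity I_{p} is then given by
$$I_{p} = K \sum_{i} P_{i}$$
The standard error in the profile-fitted intensity is given by
$$\sigma_{I_{p}}^{2} = \frac{\sum_{i} w_{i} \Delta_{i}^{2}}{N - 4}\, A^{-1}_{KK} \left(\sum_{i} P_{i}\right)^{2}$$
where
$$\Delta_{i} = \rho_{i} - (a p_{i} + b q_{i} + c + K P_{i}),$$
N is the number of pixels in the summation and A^{-1}_{KK} is the diagonal element for the scale factor K of the inverse normal matrix (used to minimize R_{3}).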
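The fit for a fully recorded reflection can be sketched as an ordinary weighted least-squares problem in the four unknowns K, a, b and c. In the example below, the function names, the `valid` mask (used here to drop a pixel treated as saturated), the Gaussian test profile and the noise-free test data are all assumptions for illustration. The standard-error expression is the usual least-squares estimate (weighted residual per degree of freedom times the K diagonal element of the inverse normal matrix) and is likewise an assumption rather than a transcription of any program.

```python
import math


def solve_linear(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [[float(v) for v in A[r]] + [float(y[r])] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def profile_fit(p, q, P, rho, w, valid=None):
    """Fit rho_i ~ K*P_i + a*p_i + b*q_i + c over the valid pixels.

    Returns K, (a, b, c), the profile-fitted intensity and its standard error.
    """
    n_all = len(rho)
    if valid is None:
        valid = [True] * n_all
    use = [i for i in range(n_all) if valid[i]]
    # Basis functions per pixel: profile value, p, q and a constant.
    basis = [(P[i], p[i], q[i], 1.0) for i in range(n_all)]
    A = [[sum(w[i] * basis[i][r] * basis[i][s] for i in use)
          for s in range(4)] for r in range(4)]
    y = [sum(w[i] * basis[i][r] * rho[i] for i in use) for r in range(4)]
    K, a, b, c = solve_linear(A, y)
    chi2 = sum(w[i] * (rho[i] - (K * P[i] + a * p[i] + b * q[i] + c)) ** 2
               for i in use)
    inv_kk = solve_linear(A, [1.0, 0.0, 0.0, 0.0])[0]   # (K, K) element of A^-1
    profile_sum = sum(P)                                 # includes excluded pixels
    i_p = K * profile_sum
    sigma = profile_sum * math.sqrt(chi2 / (len(use) - 4) * inv_kk)
    return K, (a, b, c), i_p, sigma


# Hypothetical 9 x 9 box with a Gaussian standard profile and noise-free counts.
coords = [(pi, qi) for qi in range(-4, 5) for pi in range(-4, 5)]
p = [pi for pi, qi in coords]
q = [qi for pi, qi in coords]
P = [math.exp(-(pi * pi + qi * qi) / (2.0 * 1.5 ** 2)) for pi, qi in coords]
rho = [50.0 * Pi + 0.2 * pi - 0.1 * qi + 8.0 for Pi, pi, qi in zip(P, p, q)]
w = [1.0 / r for r in rho]                               # Poisson weights, gain 1
valid = [not (pi == 0 and qi == 0) for pi, qi in coords] # drop the central pixel
K, plane, i_p, sigma = profile_fit(p, q, P, rho, w, valid)
print(K, plane, i_p, sigma)
```

Because the intensity is taken as K times the sum of the whole standard profile, excluding pixels from the fit (overlapped, outlying or saturated) does not by itself bias the estimate, provided the remaining pixels still determine K.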
In the case of partially recorded reflections, it is no longer valid to fit the sum of the scaled standard profile and a background plane to all pixels in the measurement box. Partially recorded reflections can have a profile which differs significantly from the standard profile, with the result that the background-plane constants take on physically unreasonable values in an attempt to compensate for this difference. Therefore, for partially recorded reflections, the summation in (22) is restricted to pixels in the peak region of the measurement box. Minimizing R_{3} with respect to the scale factor K then gives
$$K = \frac{\sum_{i} w_{i} P_{i}\,(\rho_{i} - a p_{i} - b q_{i} - c)}{\sum_{i} w_{i} P_{i}^{2}} \qquad (32)$$
where all summations are over the peak region only.
It is not possible to derive a standard error for partially recorded reflections based on the fit of the scaled standard profile (because partially recorded reflections have a different spot profile). For these reflections, the standard error can be calculated using (17).
6.3. Modifications for very close spots
In order to apply (22), it is necessary to exclude all pixels in the measurement box which are overlapped by a neighbouring spot. This applies not only to the pixels of the reflection being integrated, but also to the pixels of all the reflections used to form the standard profile. Consequently, a pixel should be excluded even if it is only overlapped by a neighbouring spot for one of the reflections used in forming the standard profile. When processing data from large unit cells, this can lead to a very high percentage of the background pixels being rejected and, therefore, a poor determination of the background-plane parameters. In these circumstances, the background plane is determined using only background pixels and excluding only those pixels which are overlapped by neighbours for the reflection actually being integrated. The profile-fitted intensity for both fully recorded and partially recorded reflections is then evaluated in the way described for partially recorded reflections in the previous section, with the summation in (32) extending only over peak pixels. The standard error in the intensity for partially recorded reflections is derived from (17) as before. For fully recorded reflections, the standard error has two components; the first is based on the fit of the scaled standard profile to the reflection profile and the second on the contribution from the background:
where m and n are the number of pixels in the peak and background, respectively.
6.4. Profile fitting very strong reflections
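For very strong reflections, the background level is very small, and (32) reduces to
$$K = \frac{\sum_{i} w_{i} P_{i} \rho_{i}}{\sum_{i} w_{i} P_{i}^{2}} \qquad (35)$$
the weights are given by
$$w_{i} = 1/(G K P_{i}) \qquad (36)$$
Substituting for w_{i} in (35) gives
$$K = \sum_{i} \rho_{i} \Big/ \sum_{i} P_{i} \qquad (37)$$
so that K Σ P_{i} = Σ ρ_{i}. As pointed out by Otwinowski (personal communication), this shows that for correctly weighted profile fitting, the profile-fitted intensity reduces to the summation-integration intensity for very strong intensities.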
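This limit is easy to verify numerically. The sketch below assumes a negligible background, takes the preliminary scale from the summed counts, and uses weights inversely proportional to the scaled profile; the function name and the five-pixel profile and counts are hypothetical.

```python
def strong_limit_scale(P, rho, gain=1.0):
    """Weighted profile-fit scale factor K when the background is negligible.

    The weights are 1 / (gain * J * P_i), where J is the scale obtained from
    the summed counts; all profile values must be positive.
    """
    J = sum(rho) / sum(P)
    w = [1.0 / (gain * J * Pi) for Pi in P]
    numerator = sum(wi * Pi * ri for wi, Pi, ri in zip(w, P, rho))
    denominator = sum(wi * Pi * Pi for wi, Pi in zip(w, P))
    return numerator / denominator


# Hypothetical strong spot whose counts deviate slightly from the profile shape.
P = [1.0, 4.0, 9.0, 4.0, 1.0]
rho = [1050.0, 3900.0, 9200.0, 4100.0, 980.0]
K = strong_limit_scale(P, rho)
I_p = K * sum(P)
print(I_p, sum(rho))
```

The profile-fitted and summed intensities agree to rounding error even though the individual pixel counts do not follow the standard profile exactly, because the 1/P_{i} weighting cancels the profile from both sums.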
6.5. Profile fitting very weak reflections
For very weak reflections, all pixels will have very similar counts and, therefore, all the weights will be the same. For simplicity, consider the case where the profile fit is evaluated only for the peak pixels; (32) then reduces to
$$I_{p} = K \sum_{i} P_{i} = \sum_{i} P_{i}\,(\rho_{i} - a p_{i} - b q_{i} - c) \times \frac{\sum_{i} P_{i}}{\sum_{i} P_{i}^{2}} \qquad (38)$$
The last term in this equation depends only on the shape of the standard profile. This shows that the intensity is a weighted sum of the individual background-corrected pixel counts (rather than a simple unweighted sum, as is the case for summation integration). Because the values of P_{i} are a maximum in the centre of the spot, this will place a higher weight on those pixels where the contribution of the Bragg diffraction is greatest and a very low weight on the peripheral pixels where the Bragg diffraction is weakest. In this way, profile fitting improves the signal-to-noise ratio without the risk of introducing any systematic error which may result from simply reducing the size of the peak region for weak spots.
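The weighting described above can be illustrated with a short sketch of the equal-weight estimate in (38). The function name and the five-pixel profile are hypothetical, and the counts are assumed to be already background-corrected.

```python
def weak_limit_intensity(P, rho_corr):
    """Equal-weight profile-fitted intensity over the peak pixels.

    P        : standard-profile values P_i
    rho_corr : background-corrected counts for the same pixels
    """
    weighted_sum = sum(Pi * ri for Pi, ri in zip(P, rho_corr))
    shape_factor = sum(P) / sum(Pi * Pi for Pi in P)   # depends on the profile only
    return weighted_sum * shape_factor


P = [1.0, 4.0, 9.0, 4.0, 1.0]
exact = [2.0 * Pi for Pi in P]                 # counts that follow the profile exactly
base = weak_limit_intensity(P, exact)

# The same 3-count fluctuation placed on an edge pixel or on the central pixel.
edge = weak_limit_intensity(P, [exact[0] + 3.0] + exact[1:])
centre = weak_limit_intensity(P, exact[:2] + [exact[2] + 3.0] + exact[3:])
print(base, edge - base, centre - base)
```

A fluctuation on a peripheral pixel changes the estimate far less than the same fluctuation at the centre, which is how background noise in the outer pixels is suppressed.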
6.6. Improvement provided by profile fitting weak reflections
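For very weak reflections, where all the weights w_{i} are approximately the same, the variance in I_{p} using (38) is given by
$$\sigma_{I_{p}}^{2} = \left(\frac{\sum_{i} P_{i}}{\sum_{i} P_{i}^{2}}\right)^{2} \sum_{i} P_{i}^{2} \sigma_{i}^{2} \qquad (39)$$
Assuming a flat background and very weak intensity, from Poisson statistics
$$\sigma_{i}^{2} = G\rho_{i} \qquad (40)$$
and, as ρ_{i} has approximately the same value (ρ) for all pixels,
$$\sigma_{I_{p}}^{2} = G\rho\, \frac{\left(\sum_{i} P_{i}\right)^{2}}{\sum_{i} P_{i}^{2}} \qquad (41)$$
The variance in the summation-integration intensity is simply
$$\sigma_{I_{s}}^{2} = m G \rho \qquad (42)$$
The ratio of the variances is thus
$$\frac{\sigma_{I_{s}}^{2}}{\sigma_{I_{p}}^{2}} = \frac{m \sum_{i} P_{i}^{2}}{\left(\sum_{i} P_{i}\right)^{2}} \qquad (43)$$
For a typical spot profile, the right-hand side (which depends only on the shape of the standard profile) has a value of 2, showing that profile fitting can reduce the standard error in the integrated intensity by a factor of 2^{1/2}.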
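The shape-dependent factor m Σ P_{i}² / (Σ P_{i})² is straightforward to evaluate for any tabulated profile. The sketch below uses a hypothetical Gaussian profile on a square peak region purely for illustration; the value obtained depends on the assumed spot width and on how generously the peak region is drawn, so it should not be read as the value for any real data set.

```python
import math


def variance_ratio(P):
    """m * sum(P_i^2) / (sum P_i)^2 for profile values over the m peak pixels."""
    m = len(P)
    return m * sum(x * x for x in P) / sum(P) ** 2


def gaussian_profile(half_width, sigma):
    """Hypothetical circular Gaussian sampled on a square peak region."""
    return [math.exp(-(p * p + q * q) / (2.0 * sigma ** 2))
            for p in range(-half_width, half_width + 1)
            for q in range(-half_width, half_width + 1)]


flat = variance_ratio([1.0] * 25)               # uniform profile
peaked = variance_ratio(gaussian_profile(4, 1.5))
print(flat, peaked, math.sqrt(peaked))
```

A uniform profile gives a ratio of exactly 1, so profile fitting then offers no gain over summation; the more sharply peaked the profile is relative to the peak region, the larger the ratio and the larger the reduction in standard error (its square root).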
6.7. Other benefits of profile fitting
6.7.1. Incompletely resolved spots
If adjacent spots are not fully resolved, there will be a systematic error in the integrated intensity which will be largest for weak spots which are adjacent to very strong spots. However, the profile-fitted intensity will be affected less than the summation-integration intensity because the peripheral pixels (where the influence of neighbouring spots is greatest) are down-weighted relative to the central pixels (where the neighbours will have least influence).
Further steps can be taken to minimize the errors caused by overlapping spots. Firstly, when forming the standard profiles, reflections are only included if they are significantly stronger than their nearest neighbours. This will minimize the errors in the standard profiles. Secondly, when evaluating the profile-fitted intensity of a particular reflection, pixels can be omitted if they are adjacent to a pixel which is part of a neighbouring spot (rather than having to be part of that spot).
A more satisfactory approach is to deconvolute spatially overlapping spots as described, for example, by Bourgeois et al. (1998).
6.7.2. Elimination of peak pixel outliers
In the same way that outliers in the background region can be identified and rejected (see §5.1.1), it is possible, in principle, to identify outliers in the peak region of fully recorded reflections as those pixels whose deviation from the scaled standard profile is significantly greater than that expected from counting statistics. This approach works well if the feature which gives rise to the outliers affects only a small fraction of the peak pixels and gives rise to large deviations; this is the case for some zingers, dead pixels and for diffraction from small ice crystals when collecting data from cryocooled samples.
Another source of outliers is the encroachment of a strong neighbouring spot into the peak region, as discussed in the previous section. When dealing with peripheral pixels, the outlier test can be applied to both fully recorded and partially recorded reflections, but a high σ cutoff (e.g. 10–20) must be used to avoid rejecting pixels which do not fit the profile simply because it is a partially recorded spot.
6.7.3. Estimation of overloaded reflections
Because of the limited dynamic range of current detectors, it is common for many low-resolution spots to contain saturated pixels. Providing the saturation level of the detector is known, such pixels can simply be excluded from the profile fitting, allowing a reasonable estimate of the true intensity (except when the majority of the pixels are saturated). A knowledge of the strong intensities is essential for structure solution based on molecular-replacement techniques, and so this is a very useful additional feature of profile fitting.
6.8. Profile fitting partially recorded reflections
Greenhough & Suddath (1986) have shown that when profile fitting is applied to partially recorded reflections this leads to a systematic error in the individual intensities, but there is no systematic error in the total summed intensity. Although their analysis is strictly only applicable to the case of unweighted profile fitting, experience has shown that even when using weighted profile fitting there is no evidence of systematic errors in the summed profile-fitted intensities of partially recorded reflections. This is particularly important, as many data sets collected from frozen crystals have few, if any, fully recorded reflections.
6.9. Systematic errors in profile-fitted intensities
The fundamental assumption in profile fitting is that the standard profiles accurately reflect the true profile of the reflection being integrated. Errors in the standard profile will result in systematic errors in the profile-fitted intensities. While these errors will often be small compared with the random (Poissonian) error for weak reflections, this is not necessarily the case for strong reflections, as the systematic error is typically a small percentage of the total intensity. Because the standard profiles are derived from the summation of many contributing reflections, small positional errors in spot prediction will lead to a broadening of the standard profile relative to the profile of an individual spot. The same broadening can occur because of the finite pixel size in the image, which means that a predicted spot position can lie up to half a pixel away from the centre of the measurement box. This error can be minimized by interpolating the pixel values in the image onto a grid which is centred exactly on the predicted position, but the interpolation step itself will inevitably distort the reflection profile. In spite of these difficulties, providing adequate care is taken to determine the crystal and detector parameters accurately (as mentioned in §2) so that the spot positions are predicted to within a small fraction of the overall spot width, there is no evidence (from merging statistics at least) of significant systematic error even in the stronger intensities.
Acknowledgements
I would like to thank Dr A. J. Wonacott, Dr P. Brick and Dr P. R. Evans for many stimulating and critical discussions on all aspects of data integration.
References
Bourgeois, D., Nurizzo, D., Kahn, R. & Cambillau, C. (1998). J. Appl. Cryst. 31, 22–35.
Diamond, R. (1969). Acta Cryst. A25, 43–55.
Ford, G. C. (1974). J. Appl. Cryst. 7, 555–564.
Greenhough, T. J. & Suddath, F. L. (1986). J. Appl. Cryst. 19, 400–409.
Lehmann, M. S. & Larsen, F. K. (1974). Acta Cryst. A30, 580–584.
Leslie, A. G. W. (1992). Jnt CCP4/ESF–EACMB. Newslett. Protein Crystallogr. 26.
Otwinowski, Z. & Minor, W. (1997). Methods Enzymol. 276, 307–326.
Pflugrath, J. (1999). Acta Cryst. D55, 1718–1725.
Rossmann, M. G. (1979). J. Appl. Cryst. 12, 225–238.
Rossmann, M. G., Leslie, A. G. W., Abdel-Meguid, S. S. & Tsukihara, T. (1979). J. Appl. Cryst. 12, 570–581.
Winkler, F. K., Schutt, C. E. & Harrison, S. C. (1979). Acta Cryst. A35, 901–911.
© International Union of Crystallography. Prior permission is not required to reproduce short quotations, tables and figures from this article, provided the original authors and source are cited.