research papers
Integration, scaling, spacegroup assignment and postrefinement
^{a}MaxPlanckInstitut für Medizinische Forschung, Abteilung Biophysik, Jahnstrasse 29, 69120 Heidelberg, Germany
^{*}Correspondence email: wolfgang.kabsch@mpimfheidelberg.mpg.de
A version of this paper will be published as a chapter in the new edition of Volume F of International Tables for Crystallography.
Important steps in the processing of rotation data are described that are common to most software packages. These programs differ in the details and in the methods implemented to carry out the tasks. Here, the working principles underlying the datareduction package XDS are explained, including the new features of automatic determination of spot size and reflecting range, recognition and assignment of crystal symmetry and a highly efficient algorithm for the determination of correction/scaling factors.
Keywords: XDS; integration; scaling; spacegroup assignment; postrefinement.
1. Introduction
The key steps in the processing of diffraction data from single crystals involve (i) modelling of the observed reflection positions in the detector plane, (ii) integration of diffraction intensities, (iii) data correction, scaling and postrefinement and (iv) spacegroup assignment. Much of the theory and many of the methods for carrying out these steps were developed about three decades ago for processing rotation data recorded on film and were subsequently extended in order to fully exploit the capabilities of a variety of electronic area detectors; some CCD (chargecoupled device) and multiwire detectors as well as a new pixel detector specially developed for data collection at synchrotron beamlines allow the recording of finely sliced rotation data because of their fast data readout. In this article, the principles of the methods are described as employed by the program XDS (Kabsch, 2010). These apply equally well to rotation images covering small or large oscillation ranges. A large number of other dataprocessing systems have been developed which differ in the details of the implementations. Some of these packages were described in chapter 25.2 of Volume F of International Tables for Crystallography (2001). The theory and practice of processing finesliced data have been discussed by Pflugrath (1997).
2. Modelling rotation images
The observed diffraction pattern, i.e. the positions of the reflections recorded on the rotationdata images, is controlled by a small set of parameters which must be accurately determined before integration can start. Approximate values for some of these parameters are given by the experimental setup, whereas others may be completely unknown and must be obtained from the rotation images. This is achieved by the automatic location of strong diffraction spots, the extraction of a primitive basis that yields integer indices for the observed reflections and the subsequent of all parameters to minimize the discrepancies between observed and calculated spot positions in the data images.
2.1. Coordinate systems and parameters
In the rotation method, the incidentbeam wavevector S_{0} of length 1/λ (where λ is the wavelength) is fixed while the crystal is rotated around a fixed axis described by a unit vector m_{2}. S_{0} points from the Xray source towards the crystal. It is assumed that the incident beam and the rotation axis intersect at one point at which the crystal must be located. This point is defined as the origin of a righthanded orthonormal laboratory coordinate system {l_{1}, l_{2}, l_{3}}. This fixed but otherwise arbitrary system is used as a reference frame to specify the setup of the diffraction experiment.
Diffraction data are assumed to be recorded on a fixed planar detector. A righthanded orthonormal detector coordinate system {d_{1}, d_{2}, d_{3}} is defined such that a point with coordinates X, Y in the detector plane is represented by the vector (X − X_{0})d_{1} + (Y − Y_{0})d_{2} + Fd_{3} with respect to the laboratory coordinate system. The origin X_{0}, Y_{0} of the detector plane is found at a distance F from the crystal position. It is assumed that the diffraction data are recorded on adjacent nonoverlapping rotation images, each covering a constant oscillation range Δ_{φ}, with image No. 1 starting at spindle angle φ_{0}.
Diffraction geometry is conveniently expressed with respect to a righthanded orthonormal goniostat system {m_{1}, m_{2}, m_{3}}. It is constructed from the rotation axis and the incidentbeam direction such that m_{1} = (m_{2} × S_{0})/m_{2} × S_{0} and m_{3} = m_{1} × m_{2}. The origin of the goniostat system is defined to coincide with the origin of the laboratory system.
Finally, a righthanded crystal coordinate system {b_{1}, b_{2}, b_{3}} and its reciprocal basis {, , } are defined to represent the unrotated crystal, i.e. at rotation angle φ = 0°, such that any reciprocallattice vector can be expressed as = + + , where h, k, l are integers.
As shown in §2.2, the location of all diffraction peaks recorded in the data images can be computed from the parameters S_{0}, m_{2}, b_{1}, b_{2}, b_{3}, X_{0}, Y_{0}, F, d_{1}, d_{2}, d_{3}, φ_{0} and Δ_{φ}. In addition, knowledge of the shape and extent of the diffraction spots is required for accurate estimations of their intensities. This can be achieved by a Gaussian model involving two parameters: the standard deviations of the reflecting range, σ_{M}, and of the beam divergence, σ_{D} (see §2.3). This leads to an integration region around the spot defined by the parameters δ_{M} and δ_{D}, which are typically chosen to be 6–10 times larger than σ_{M} and σ_{D}, respectively.
2.2. Spot prediction
Let denote any arbitrary reciprocallattice vector if the crystal has not been rotated, i.e. at rotation angle φ = 0°. Depending on the diffraction geometry, may be rotated into a position fulfilling the reflecting condition. The required rotation angle φ and the coordinates X, Y of the diffracted beam at its intersection with the detector plane can be found from as follows.
can be expressed by its components with respect to the orthonormal goniostat system as
Rotation by φ around axis m_{2} changes into ,
The incidentbeam and diffractedbeam wavevectors, S_{0} and S, have their termini on the and satisfy the
If ρ = denotes the distance of from the rotation axis, solutions for and φ can be obtained in terms of as
In general there are two solutions according to the sign of ·m_{1}. If ρ^{2} < (·m_{3})^{2} or > the have no solution and the reciprocallattice point is in the `blind' region.
If FS·d_{3} > 0 the diffracted beam intersects the detector plane at the point
which leads to a diffraction spot recorded at detector coordinates
2.3. Standard spot shape
A reciprocallattice point crosses the β axis' e_{1} = S × S_{0}/S × S_{0}, as introduced by Schutt & Winkler (1977). Rotation around the fixed axis m_{2}, as enforced by the rotation camera, thus leads to an increase in the length of the shortest path by the factor 1/e_{1}·m_{2}. This motivated the introduction of a coordinate system {e_{1}, e_{2}, e_{3}}, specific for each reflection, which has its origin on the surface of the at the terminus of the diffracted beam wavevector S,
by the shortest route only if the crystal happens to be rotated about an axis perpendicular to both the diffractedbeam and incidentbeam wavevectors, the `The unit vectors e_{1} and e_{2} are tangential to the while e_{3} is perpendicular to e_{1} and = S − S_{0}. The shape of a reflection, as represented with respect to {e_{1}, e_{2}, e_{3}}, then no longer contains geometrical distortions resulting from the fixed rotation axis of the camera and the oblique incidence of the diffracted beam on a flat detector. Instead, all reflections appear as if they had followed the shortest path through the and had been recorded on the surface of the sphere.
A detector pixel at X′, Y′ in the neighbourhood of the reflection centre X, Y, when the crystal is rotated by φ′ instead of φ, is mapped to the profile coordinates ∊_{1}, ∊_{2}, ∊_{3} by the following procedure:
ζ corrects for the increased path length of the reflection through the and is closely related to the reciprocal Lorentz correction factor
Because of crystal mosaicity and beam divergence, the intensity of a reflection is smeared around the diffraction maximum. The fraction of total reflection intensity found in the volume element d∊_{1}d∊_{2}d∊_{3} at ∊_{1}, ∊_{2}, ∊_{3} can be approximated by Gaussian functions:
2.4. Spot centroids and partiality
The intensity of a reflection can be completely recorded on one image or distributed among several adjacent images. The fraction R_{j} of total intensity recorded on image j, the `partiality' of the reflection, can be derived from the distribution function ω(∊_{1}, ∊_{2}, ∊_{3}) as
The integral is evaluated by using a numerical approximation of the error function, erf (Abramowitz & Stegun, 1972).
While the spot centroids in the detector plane are usually good estimates for the detector position of the diffraction maximum, the angular centroid about the rotation axis,
can be a rather poor guess for the true φ angle of the maximum. Its accuracy depends strongly on the value of φ and the size of the oscillation range Δφ relative to the mosaicity σ_{M} of the crystal. For a reflection fully recorded on image j, the value Z = φ_{0} + (j − ½)·Δ_{φ} will always be obtained, which is correct only if φ accidentally happens to be close to the centre of the rotation range of the image. In contrast, the φ angle of a partial reflection recorded on images j and j + 1 is closely approximated by Z = φ_{0} + [j + (R_{j+1} − R_{j})/2]·Δ_{φ}. If many images contribute to the spot intensity, Z(φ) is always an excellent approximation to the ideal angular position φ when the are satisfied; in fact, in the limiting case of infinitely finesliced data it can be shown that lim_{Δφ→0}Z(φ) = φ.
Most φ angles and their approximations obtained from the observed Z centroids and must therefore carefully distinguish between fully and partially recorded reflections. However, this distinction is unnecessary if the observed Z centroids are instead compared with their analytic forms, because the sensitivity of the centroid positions to the diffraction parameters is correctly weighted in either case (see §2.8).
routines minimize the discrepancies between the predicted2.5. Localizing diffraction spots
Often, some of the parameters controlling the diffraction experiment are either completely unknown or available only at a crude approximation. Accurate values for all parameters must be obtained from the recorded data, i.e. from a list of the coordinates of strong spots occurring in the images. As implemented in XDS, this list is obtained from all or a subset of the data images by the following procedure. Firstly, each pixel value is compared with the mean value and standard deviation of surrounding pixels in the same image and classified as a strong pixel if its value exceeds the mean by a given multiple (typically 3–5) of the standard deviation. Values of the strong pixels and their location addresses and image running numbers are saved in a file. After the scan, a hash table of sufficient size is allocated to accommodate the strong pixels from the file together with their addresses (for a discussion of the hash technique, see Wirth, 1976). As several strong pixels may belong to the same spot, they are labelled with a unique spot number so that any two such pixels which can be connected by direct strong neighbours in two or three dimensions (if there are adjacent images) belong to the same spot (equivalence class). The labelling is achieved by the highly efficient algorithm for the recording of equivalence classes developed by Rem (see Dijkstra, 1976). On termination, a list X′_{i}, Y′_{i}, Z′_{i} (i = 1, …, n) of the centroids of n strong spots is available.
2.6. Basis extraction
Any reciprocallattice vector can be written in the form = + + , where h, k, l are integers and , , are reciprocal basis vectors of the The basis vectors which describe the orientation, metric and symmetry of the crystal, as well as the reflection indices h, k, l, have to be determined from the list of strong diffraction spots X′_{i}, Y′_{i}, Z′_{i} (i = 1, …, n). Ideally, each spot corresponds to a reciprocallattice vector which satisfies the after a crystal rotation by φ. Substituting the observed value Z′ for the unknown φ angle (see §2.4), is found from the observed spot coordinates as
Unfortunately, the reciprocallattice vectors (i = 1, …, n) derived from the above list of strong diffraction spots often contain a number of `aliens' (spots arising from fluctuations in the background, from ice or from satellite crystals) and a robust method has to be used which is still capable of recognizing the dominant One approach, suggested by Bricogne (1986) and implemented in a number of variants (Otwinowski & Minor, 1997; Steller et al., 1997), is to identify a basis as the three shortest linear independent vectors b_{1}, b_{2}, b_{3}, each at a maximum of the Fourier transform . Alternatively, a reciprocal basis for the dominant can be determined from short differences between the reciprocallattice vectors (Howard, 1986; Kabsch, 1988a). As implemented in XDS, a basis is found by the following procedure.
The list of given reciprocallattice points (i = 1, …, n) is first reduced to a small number m of lowresolution differencevector clusters (μ = 1, …, m). f_{μ} is the population of a differencevector cluster ; that is, the number of times the difference between any two reciprocallattice vectors − is approximately equal to . In a second step, three linear independent vectors , , are selected among all possible triplets of differencevector clusters that maximize the function Q,
where
and h_{k}^{μ} is the nearest integer to ξ_{k}^{μ}. The absolute maximum of Q is assumed if all difference vectors can be expressed as small integral multiples of the best triplet. Deviations from this ideal situation are quantified by the quality measure q. The value of q declines sharply if the expansion coefficients ξ_{k}^{μ} deviate by more than ∊ from their nearest integers h_{k}^{μ} or if the indices are absolutely larger than δ. The constraint on the allowed range of indices prevents the selection of a spurious triplet of very short differencevector clusters which might be present in the set. Excellent results have been obtained using ∊ = 0.05 and δ = 5. The best vector triplet thus found is refined against the observed differencevector clusters. Finally, a is derived from the refined reciprocalbase vector triplet (see §6).
2.7. Indexing
Once a basis b_{1}, b_{2}, b_{3} of the is available, integral indices h_{i}, k_{i}, l_{i} must be assigned to each reciprocallattice vector (i = 1, …, n). Using the integers nearest to ·b_{k} (k = 1, 2, 3) as indices of the reciprocallattice vectors could easily lead to a misindexing of longer vectors because of inaccuracies in the basis vectors b_{k} and the initial values of the parameters describing the instrumental setup. A more robust solution of the indexing problem is provided by the local indexing method, which assigns only small index differences h_{i} − h_{j}, k_{i} − k_{j}, l_{i} − l_{j} between pairs of neighbouring reciprocallattice vectors (Kabsch, 1993).
The reciprocallattice points can be considered as nodes of a tree. The tree connects the n points to each other with the connections as its branches. The length ℓ_{ij} of a possible branch between nodes i and j is defined here as
where
h_{k}^{ij} is the nearest integer of ξ_{k}^{ij} and k = 1, 2, 3. Reliable index differences are indicated by short branches; in fact, ℓ_{ij} is 0 if none of the indices h_{k}^{ij} is absolutely larger than δ and the ξ_{k}^{ij} are integer values to within ∊. Typical values are ∊ = 0.05 and δ = 5. Defining the length of a tree as the sum of the lengths of its branches, a shortest tree among all n^{n−2} possible trees is determined using the elegant algorithm described by Dijkstra (1976). Starting with arbitrary indices 0, 0, 0 for the root node, the local indexing method then consists of traversing the shortest tree and thereby assigning each node the indices of its predecessor plus the small index differences between the two nodes.
During traversal of the tree, each node is also given a subtree number. Starting with subtree number 1 for the root node, each successor node is given the same subtree number as its predecessor if the length of the connecting branch is below a minimal length ℓ_{min}. Otherwise, its subtree number is incremented by 1. Thus, all nodes in the same subtree have internally consistent reflection indices. Defining the size of a subtree by the number of its nodes, `aliens' are usually found in small subtrees. Finally, a constant index offset is determined such that the centroids of the observed reciprocallattice points belonging to the largest subtree and their corresponding grid vectors are as close as possible. This offset is added to the indices of each reciprocallattice point.
2.8. Refinement
For a fixed detector, the diffraction pattern depends on the parameters S_{0}, m_{2}, b_{1}, b_{2}, b_{3}, X_{0}, Y_{0}, F. Starting values for the parameters can be obtained by the procedures described above, which do not rely on prior knowledge of the crystal orientation, spacegroup symmetry or unitcell metric. Better estimates of the parameter values, as required for the subsequent integration step, can be obtained by the method of least squares from the list of n observed indexed reflection centroids h_{i}, k_{i}, l_{i}, X_{i}′, Y_{i}′, Z_{i}′ (i = 1, …, n). In this method, the parameters are chosen to minimize a weighted sum of squares of the residuals
The residuals between the calculated (X_{i}, Y_{i}, Z_{i}) and observed spot centroids are
Let s_{μ} (μ = 1, …, k) denote the k independent parameters for which initial estimates are available. Expanding the residuals to first order in the parameter changes gives
The parameters should be changed in such a way as to minimize , which implies = 0 for μ = 1, …, k. The are found as the solution of the k normal equations
The parameters are corrected by and a new cycle of E is reached. The weights
is started until a minimum ofare calculated with the current guess for s_{μ} at the beginning of each cycle.
The derivatives appearing in the normal equations can be worked out from the definitions given in §§2.2 and 2.4 and only the form of the gradient of the Z residuals is shown. Assuming σ_{i} = σ_{M}/ζ_{i} (i = 1, …, n) is constant for each reflection, the gradients of the Z residuals are obtained from the chain rule and the relation derf(z)/dz = (2/π^{1/2})exp(−z^{2}).
Obviously, ∂Δ^{i}_{Z}/∂s_{μ} is small for a fully recorded reflection because of the small values of all exponentials appearing in ∂Δ^{i}_{Z}/∂φ_{i}. In contrast, the gradient for a partial reflection that is equally recorded on two adjacent images is most sensitive to parameter variations because one of the exponentials assumes its maximum value. In the limiting case of infinitely finesliced data it can be shown that lim_{Δφ→0}∂Δ^{i}_{Z}/∂φ_{i} = 1. Thus, the scheme based on observed Z centroids, as described here and implemented in XDS, is applicable to finesliced data and also to data recorded with a large oscillation range.
3. Integration
Assuming that the diffraction parameters have been refined successfully as described above, the intensity of a reflection is distributed in the neighbourhood of the predicted location of the diffraction peak among detector pixels of one or several adjacent rotation images. Accurate integration requires several steps: determination of a reflection mask, estimation of the background, generation of reference profiles and integration by profile fitting.
The intensity distribution of a reflection can be modelled analytically or derived from the observed profiles of neighbouring strong spots. For the rotation method, the profile shape depends strongly on the specific path of the reflection through the 2.3, which simplifies the task of modelling the expected intensity distribution as all reflection profiles become similar.
and on variations in the angle of incidence of the diffracted beam on a flat detector. These geometrical distortions can be eliminated by mapping the reflections onto the coordinate system defined in §3.1. Reflection mask
The parameters σ_{M} and σ_{D} of the Gaussian model (see §2.3) used to describe reflection shape can be determined automatically from one or more data images by the following procedure.
n reflections as Determination of the standard deviation of the reflecting range, the mosaicity σ_{M}, requires additional considerations. For each of the n reflections from the list above, let τ denote the angular difference between the rotation angle φ at its Bragg maximum and the centre of the oscillation angles covered by the image. The fraction of observed reflection intensity is (see §2.4) For a given σ_{M}/ζ the function R(τ; σ_{M}/ζ) assumes its maximum at τ = 0 and declines as τ increases. The decline depends strongly on the mosaicity σ_{M} and on the path length of the reflection through the which is accounted for by the factor 1/ζ. For a large mosaicity R(τ; σ_{M}/ζ) declines slowly, which explains why for such crystals many reflections with large τ values can be observed on a data image. Clearly, the list of strong spots located by the automatic procedure described above contains information about the mosaicity of the crystal. The problem of finding σ_{M} from this list can be solved if one considers each τ value as a random variable drawn from a probability distribution R(τ; σ_{M}/ζ) with population parameter σ_{M}/ζ. The mosaicity σ_{M} can then be estimated so that it maximizes the likelihood (joint probability) The parameters σ_{D} and σ_{M} are mainly used to specify the integration region around the spot defined by the parameters δ_{M} and δ_{D}, which are typically chosen to be 6–10 times larger than σ_{M} and σ_{D}, respectively (see §2.1). The reflection mask thus comprises all image pixels that satisfy when mapped to the profile coordinate system {e_{1}, e_{2}, e_{3}} defined in §2.3. In addition, pixels are excluded from the mask if they are closer to the predicted Bragg peak of an intruding reflection from the neighbourhood.3.2. Background
The region around a spot is assumed to have been chosen to be large enough to include a sufficient number of pixels which can be used for determination of the background. Background determination, as implemented in XDS, begins by sorting all pixels belonging to a reflection by increasing intensity. For weak or absent reflections, these values should represent a random sample drawn from a normal distribution. If this is not the case, the pixel with the largest intensity is removed until the sampling distribution of the remaining smaller items satisfies the expected distribution. This method will also exclude pixels with unexpected high values, such as ice reflections. The background, determined as the mean value of the accepted pixels, is systematically overestimated for strong spots because of some residual intensity extending into the accepted background pixels. This residual intensity is estimated from the expected distribution ω(∊_{1}, ∊_{2}, ∊_{3}) defined in §2.3 and removed from the final background value.
3.3. Standard profiles
Reflection profiles are represented on the D_{0} comprising 2n_{1} + 1, 2n_{2} + 1 and 2n_{3} + 1 equidistant gridpoints along e_{1}, e_{2} and e_{3}, respectively. The sampling distances between adjacent grid points are then Δ_{1} = δ_{D}/(2n_{1} + 1), Δ_{2} = δ_{D}/(2n_{2} + 1) and Δ_{3} = δ_{M}/(2n_{3} + 1). Thus, grid coordinate ν_{3} (ν_{3} = −n_{3}, …, n_{3}) covers the set of rotation angles
within a domainContributions to the spot intensity come from one or several adjacent data images (j = j_{1}, …, j_{2}), each covering the set of rotation angles
Assuming Gaussian profiles along e_{3} for all reflections (see §2.3), the fraction of counts (after subtraction of the background) contributed by data frame j to grid coordinate ν_{3} is
where σ = σ_{M}/ζ. The integrals can be expressed in terms of the error function, for which efficient numerical approximations are available (Abramowitz & Stegun, 1972). Finally, each pixel in data image j belonging to the reflection is subdivided into 5 × 5 areas of equal size and of the pixel signal is added to the profile value at grid coordinates ν_{1}, ν_{3}, ν_{3} corresponding to each subdivision.
This complicated procedure leads to more uniform intensity profiles for all reflections than using their untransformed shape. This simplifies the task of modelling the expected intensity distribution needed for integration by profile fitting. As implemented in XDS, reference profiles are learnt every 5° of crystal rotation at nine positions on the detector, each covering an equal area of the detector face. In the learning phase, profile boxes of the strong reflections are normalized and added to their nearest reference profile boxes. The contributions are weighted according to the distance from the location of the reference profile. Each grid point within the average profile boxes is classified as signal if it is above 2% of the Finally, each profile is scaled such that the sum of its signal pixels normalizes to one. The analytical expression ω(∊_{1}, ∊_{2}, ∊_{3}) defined in §2.3 for the expected intensity distribution is only a rough initial approximation, which is now replaced by the empirical reference profiles.
3.4. Intensity estimation
If an expected intensity distribution {p_{i}i ∈ D_{0}} of the observed profile is given in a domain D_{0}, the reflection intensity I can be estimated as
which minimizes the function
b_{i}, c_{i}, v_{i} (i ∈ D) are the background, contents and variance of pixels observed in a subdomain D D_{0} of the expected distribution. The background b_{i} underneath a diffraction spot is often assumed to be a constant which is estimated from the neighbourhood around the reflection. Determination of reflection intensities by profile fitting has a long tradition (Diamond, 1969; Ford, 1974; Kabsch, 1988b; Otwinowski, 1993). Implementations of the method differ mainly in their assumptions about the variances v_{i}. Ford used constant variances, which work well for films, which have a high intrinsic background. In XDS, which was originally designed for a multwire detector, v_{i} ∝ p_{i} was assumed, which results in a straight summation of backgroundsubtracted counts within the expected profile region, I = /. This particular simple formula is very satisfactory for the low background typical of these detectors. For the general case, however, better results can be obtained by using v_{i} = b_{i} + Ip_{i} for the pixel variances as shown by Otwinowski and implemented in DENZO and in later versions of XDS. Starting with v_{i} = b_{i}, the intensity is now found by an iterative process which is terminated if the new intensity estimate becomes negative or does not change within a small tolerance, which is usually reached after three cycles. It can be shown that the solution thus obtained is unique.
4. Scaling
The integrated intensities of the reflections need to be corrected by various factors arising from the following

If more than one data set is included, these are first put on approximately the same scale by the factor K·exp[B·(2sinθ/λ^{2})] involving two parameters, K and B, for each data set. The parameter values are assigned so that the resulting correction factors fit best to the observed intensity ratios of common reflections in each pair of data sets.
For the more detailed corrections, three types of twodimensional functions are used in succession to remove correlations of the intensity of a reflection with (i) image number and resolution, (ii) location in the detector plane and (iii) image number and 13 detector surface regions. To correct for nonuniform detector response such as edge effects at the boundaries of multisegment detectors, the use of smooth analytical correction functions was avoided. Instead, the correction functions are sampled at a finite set of grid regions covering all of the function's definition range. The grid regions are chosen automatically to be as small as possible without overfitting the data so that each sampling region contains more than a specified minimum number of reflections (default 50). Thus, the correction function G is represented by a possibly large number of reciprocal factors G_{l}, where the subscript l denotes the grid regions.
The correction factors G_{l} are found in a cyclic prodedure starting with G_{l} = 1. In each cycle, G_{l} is updated by a factor g_{l}. The target function for is based on an observational equation for each reflection
as introduced by Hamilton et al. (1965). The subscript h represents the unique reflection indices and hl denotes symmetryrelated reflections to h that need to be corrected by the reciprocal scaling factor g_{l} associated with grid point l; I_{hl} and σ_{hl} are their and standard deviation, respectively. This standard deviation is considered to be infinitely large if no such reflection was measured, which amounts to omitting the observational equation altogether. The factors g_{l} and the `true' intensities I_{h} are found at the minimum of the function
The first sum on the right side is a homogeneous function of g_{l} of degree zero so that the g_{l} would only be determined up to an arbitrary factor. The second sum on the right side is used to weakly restrain the scaling factors to one; a reasonable value is σ = 0.05. Minimization of Ψ leads to updates g_{l} in terms of the `true' intensities I_{h} which again depend on g_{l},
The new update factors g_{l} are obtained by using `true' intensities I_{h}^{o} from the previous cycle instead of the current I_{h} as defined above. At the end of the cycle, the old correction factors G_{l} are updated by multiplication with the new g_{l}. This cyclic procedure typically converges in less than six cycles.
The approach described here has been implemented in XDS and XSCALE and has been successfully used for more than two years. In contrast to the `shortest path' eigenvector method of Fox & Holmes (1966), which is very efficient for a relatively small number of variables, the computations here require a time that is proportional to the number of reflections used for scaling and thus quickly lead to a solution even when a very large number of correction factors from many data sets are involved.
5. Postrefinement
The number of fully recorded reflections on each single image rapidly declines for small oscillation ranges and the complete intensities of the partially recorded reflections have to be estimated. This presented a serious obstacle in early structural work on virus crystals, as the crystal had to be replaced after each exposure on account of radiation damage. A solution to this problem, the `postrefinement' technique, was found by Schutt, Winkler and Harrison and variants of this powerful method have been incorporated into most datareduction programs (for a detailed discussion, see Harrison et al., 1985; Rossmann, 1985). The method derives complete intensities of reflections that are only partially recorded on an image from accurate estimates for the fractions of observed intensity: the `partiality'. The partiality of each reflection can always be calculated as a function of orientation, unitcell metric, mosaic spread of the crystal and model intensity distributions. The accuracy of the estimated full reflection intensity obviously then strongly depends on a precise knowledge of the parameters describing the diffraction experiment. Usually, symmetryrelated fully recorded reflections can be found for many of the partial reflections and the list of such pairs of intensity observations can be used to refine the required parameters using a leastsquares procedure. Clearly, this is carried out after all images have been processed, which explains why the procedure is called `postrefinement'.
Adjustments of the diffraction parameters s_{μ} (μ = 1, …, k) are determined by minimization of the function E, which is defined as the weighted sum of squared residuals between calculated and observed partial intensites.
Here, I_{hj} is the intensity recorded on image j of a partial reflection with indices summarized as hj, I_{h} is the mean of the observed intensities of all fully recorded reflections symmetryequivalent to hj, g_{j} is the inverse scaling factor of image j, φ_{hj} is the calculated spindle angle of reflection hj at diffraction and R_{j} is the computed fraction of total intensity recorded on image j.
Expansion of the residuals Δ_{hj} to first order in the parameter changes δs_{μ} and minimization of E(δs_{μ}) leads to the k normal equations
Often, the normal matrix is illconditioned since changes in some unitcell parameters or small rotations of the crystal about the incident Xray beam do not significantly affect the calculated partiality R_{j}. To take care of these difficulties, the system of equations is rescaled to yield unit diagonal elements for the normal matrix and the correction vector δs_{μ} is filtered by projection into a subspace defined by the eigenvectors of the normal matrix with sufficiently large eigenvalues (Diamond, 1966).
The parameters are corrected by the filtered δs_{μ} and a new cycle of is started until a minimum of E is reached. The weights, residuals and their gradients are calculated using the current values for s_{μ} and g_{j} at the beginning of each cycle. The derivatives
appearing in the normal equations can be worked out from the definitions given in §§2.2 and 2.4 (to simplify the following equations, the subscript hj is omitted). The fraction R_{j} of total intensity can be expressed in terms of the error function (see §2.4) as
Using the relation derf(z)/dz = (2/π^{1/2})exp(−z^{2}), the derivatives of R_{j} are
The derivatives ∂φ/∂s_{μ}, ∂σ_{M}/∂s_{μ} and ∂ζ/∂s_{μ} remain to be worked out (not shown here). As discussed in detail by Greenhough & Helliwell (1982), spectral dispersion and asymmetric beam crossfire lead to some variation in σ_{M}, which makes it necessary to include additional parameters in the list s_{μ}. The effect of these parameters on the partiality is dealt with easily by the derivatives ∂σ_{M}/∂s_{μ}.
The g_{j}. With the now improved estimates for the partialities R_{j}, a new set of scaling factors can be obtained using the method outlined in §4. This alternating procedure of scaling and postrefinement usually converges within three cycles.
scheme described above requires initial scaling factorsThe use of error functions for modelling partiality, as implicated by a Gaussian model for describing spot shape, was chosen here for reasons of conceptual simplicity and coherence. This choice is unlikely to significantly alter the results of postrefinement that are based on other functions of similar form (see the discussion by Rossmann, 1985).
6. Spacegroup assignment
Identification of the correct
is not always an easy task and should be postponed for as long as possible. Sometimes, the true only becomes known when the structure has been successfully solved and refined! However, one can expect to identify a small number of possibilities from the diffraction experiment.Fortunately, all data processing as implemented in the program XDS can be carried out in the absence of any knowledge of the crystal symmetry and unitcell parameters. In this case, a is extracted from the observed diffraction pattern and processing of the data images continues to completion as if the crystal were triclinic. Clearly, the reflection indices then refer to the and must be reindexed once the is known. For all space groups, the required reindexing transformation is linear and involves only whole numbers, as shown in Part 9 of Vol. A of International Tables for Crystallography (1989).
Automatic spacegroup assignment is carried out in two steps once integrated intensities of all reflections are available (see Kabsch, 2010). Firstly, the Bravais lattices are identified that are compatible with the derived from the observed diffraction pattern. In the second step, all enantiomorphous space groups compatible with the observed symmetry are rated by a redundancyindependent R factor. The group is selected that explains all integrated intensities in the data set at an acceptable R factor requiring a minimum number of unique reflections (Occam's principle). This approach deliberately avoids any test for the presence of screw axes as these tests would depend strongly on the completeness of the data. Fortunately, the presence or absence of screw axes is irrelevant for the determination of data correction/scaling factors (see §4).
6.1. Determination of the Bravais lattice
The determination of possible Bravais lattices is based upon the concept of the A of International Tables for Crystallography (1989). A b_{1}, b_{2}, b_{3} of a given is defined there as a if it is righthanded and if the components of its
whose metric parameters characterize 44 types as described in Part 9 of Vol.satisfy a number of conditions (inequalities). The main conditions state that the basis vectors are the shortest three linear independent International Tables for Crystallography, each of the 44 types is characterized by additional equality relations among the six components of the reducedcell As an example, for character 11 (Bravais type tP) the components of the of the must satisfy
vectors with either all acute or all nonacute angles between them. As specified in(Note that the other tetragonal primitive A ≤ B = C with the fourfold as the shortest axis.)
character 21 requiresAny primitive triclinic cell describing a given
can be converted into a It is well known, however, that the thus derived is sensitive to experimental error. Hence, the direct approach of first deriving the correct and then finding the type is unstable and may in certain cases even prevent identification of the correct Bravais lattice.A suitable solution of the problem has been found that avoids any decision as to what the `true' ). The essential ingredients of this procedure are (i) a database of possible reduced cells and (ii) a backwardsearch strategy that finds the bestfitting cell in the database for each type.
is (see Kabsch, 1993The database is derived from a seed cell which strictly satisfies the definitions for a A of International Tables for Crystallography (1989) may be violated. These violations are treated as arising from experimental error.
All cells of the same volume as the seed cell whose basis vectors can be linearly expressed in terms of the seed vectors by indices −1, 0 or +1 are included in the database. Each in the database is considered as a potential although some of the defining conditions as given in Part 9 of Vol.The backwardsearch strategy starts with the hypothesis that the b_{1}, b_{2}, b_{3} from the database characterizes character 11 (Bravais tP), the quality index
type is already known and identifies the bestfitting cell in the database of possible reduced cells. In contrast to a forwarddirected search, it is now always possible to decide which conditions have to be satisfied by the components of the of the The total amount by which all these equality and inequality conditions are violated is used as a quality index. For example, to find out how well a potentialis computed. Positive values of p_{11} indicate that some conditions are not satisfied. All potential reduced cells in the data base are tested and the smallest value for p_{11} is used for rating type 11. A similar test is carried out for all 44 possible types using quality indices based on their defining conditions as listed in Part 9 of Vol. A of International Tables for Crystallography (1989).
The results obtained using this method are shown in Table 1 for the example of a data set comprising 177 images with each exposure covering 0.5° of spindle rotation. The of the protein crystal was P4_{3}2_{1}2 (unitcell parameters a = 159.4, b = 159.4, c = 160.3 Å), but this knowledge was not used in the processing. Instead, the data were processed with respect to a triclinic derived from the observed diffraction pattern as described above. The images contained a total of 292 998 reflections within the resolution range 20.0–3.0 Å; 57 548 reflections in the resolution range 10.0–5.0 Å were used for spacegroup determination. For determination of the symmetry all 44 possibilities were considered and rated by their quality index. The table shows the possible symmetries, their implied conventional unitcell parameters and a reindexing transformation. The table entries are sorted by increasing quality index and reveal a nearly cubic symmetry. A symmetry is considered to be acceptable if it has a low quality index and its implied unitcell parameters do not violate the ideal values by more than 3.0° in angles and 3% in cell axes. Thus, except for the last entry, all of the symmetries in the table are acceptable; the correct type 11 tP is highlighted. symmetries that are not accepted include all bodycentred lattices or those that are centred on all faces; they are omitted from the table.

The reindexing transformation REIDX() consists of 12 integers that relate the original indices h, k, l used during the integration to the indices h′, k′, l′ with respect to the new cell.
The value of the integer IDXV depends on the P1.
type used for specifying reflection indices in the integration step. IDXV is 1 for a primitive 2 for a facecentred or bodycentred 3 for a rhombohedral and 4 for a centred on all faces. In the example case we have IDXV = 1 because integration was carried out inNote also that elements 4, 8 and 12 of the transformation are always 0 in this example. These three extra elements were introduced to provide a simple tool for correcting the indices if all reflections are misindexed by a constant.
6.2. Finding possible space groups
For protein crystals, the absence of paritychanging symmetry operators restricts the number of possible space groups to 65 instead of 230. Moreover, the determination of correction factors for the integrated intensities does not depend on the presence or absence of any screw axes so that data processing can be finished without this knowledge. This reduces the problem to the identification of an enantiomorphous
without screw axes that is compatible with the observed symmetry (see above).For solution of the problem, a quality indicator of the mean variation in the intensities of symmetryequivalent reflections (R_{meas}) is calculated for each possible group. The decision for a particular group is then based on Occam's principle: the selected group must explain all integrated intensities in the data set at acceptable quality, thereby requiring a minimum number of unique reflections.
A suitable redundancyindependent data quality indicator has been suggested by Diederichs & Karplus (1997) and Weiss (2001),
The subscript h represents the unique reflection indices and hl denotes any of the n_{h} symmetryrelated reflections to h. The absolute differences between the observed intensities I_{hl} and their mean intensity I_{h} are weighted to remove any dependency on n_{h} and compared with the intensities. Small values of R_{meas} indicate accurate single observations I_{hl} and the use of symmetry operators compatible with the intensity data set.
For the above example data set, Table 2 lists all enantiomorphous groups which are in harmony with the observed symmetry shown in Table 1. For each listed UNIQUE is the number of unique reflections and COMPARED is the number of reflections used to calculate the redundancyindependent R factor R_{meas}. Two sets of groups can be distinguished clearly: those implying an acceptable R_{meas} and a second set with R_{meas} > 45%, which is totally unacceptable. Among the acceptable solutions a minimum number of unique reflections is needed if the crystal has the tetragonal spacegroup symmetry P422.

References
Abramowitz, M. & Stegun, I. A. (1972). Handbook of Mathematical Functions. New York: Dover Publications. Google Scholar
Bricogne, G. (1986). Proceedings of the EEC Cooperative Workshop on PositionSensitive Detector Software (Phase III), p. 28. Paris: LURE. Google Scholar
Diamond, R. (1966). Acta Cryst. 21, 253–266. CrossRef CAS IUCr Journals Web of Science Google Scholar
Diamond, R. (1969). Acta Cryst. A25, 43–55. CrossRef CAS IUCr Journals Web of Science Google Scholar
Diederichs, K. & Karplus, P. A. (1997). Nature Struct. Biol. 4, 269–275. CrossRef CAS PubMed Web of Science Google Scholar
Dijkstra, E. W. (1976). A Discipline of Programming, pp. 154–167. New Jersey: Prentice–Hall. Google Scholar
Evans, P. (2006). Acta Cryst. D62, 72–82. Web of Science CrossRef CAS IUCr Journals Google Scholar
Ford, G. C. (1974). J. Appl. Cryst. 7, 555–564. CrossRef IUCr Journals Web of Science Google Scholar
Fox, G. C. & Holmes, K. C. (1966). Acta Cryst. 20, 886–891. CrossRef CAS IUCr Journals Web of Science Google Scholar
Greenhough, T. J. & Helliwell, J. R. (1982). J. Appl. Cryst. 15, 338–351. CrossRef CAS Web of Science IUCr Journals Google Scholar
Hamilton, W. C., Rollett, J. S. & Sparks, R. A. (1965). Acta Cryst. 18, 129–130. CrossRef IUCr Journals Web of Science Google Scholar
Harrison, S. C., Winkler, F. K., Schutt, C. E. & Durbin, R. M. (1985). Methods Enzymol. 114, 211–237. CrossRef CAS PubMed Google Scholar
Howard, A. (1986). Proceedings of the EEC Cooperative Workshop on PositionSensitive Detector Software (Phases I and II), pp. 89–94 Paris: LURE. Google Scholar
International Tables for Crystallography (1989). Vol. A, pp. 738–749. Dordrecht: Kluwer Academic Publishers. Google Scholar
International Tables for Crystallography (2001). Vol. F, ch. 25.2, pp. 695–743. Dordrecht: Kluwer Academic Publishers. Google Scholar
Kabsch, W. (1988a). J. Appl. Cryst. 21, 67–71. CrossRef CAS Web of Science IUCr Journals Google Scholar
Kabsch, W. (1988b). J. Appl. Cryst. 21, 916–924. CrossRef CAS Web of Science IUCr Journals Google Scholar
Kabsch, W. (1993). J. Appl. Cryst. 26, 795–800. CrossRef CAS Web of Science IUCr Journals Google Scholar
Kabsch, W. (2010). Acta Cryst. D66, 125–132. Web of Science CrossRef CAS IUCr Journals Google Scholar
Otwinowski, Z. (1993). Proceedings of the CCP4 Study Weekend. Data Collection and Processing, edited by L. Sawyer, N. Isaacs & S. Bailey, pp. 56–62. Warrington: Daresbury Laboratory. Google Scholar
Otwinowski, Z., Borek, D., Majewski, W. & Minor, W. (2003). Acta Cryst. A59, 228–234. Web of Science CrossRef CAS IUCr Journals Google Scholar
Otwinowski, Z. & Minor, W. (1997). Methods Enzymol. 276, 307–326. Google Scholar
Pflugrath, J. W. (1997). Methods Enzymol. 276, 286–306. CrossRef CAS Web of Science Google Scholar
Rossmann, M. G. (1985). Methods Enzymol. 114, 237–280. CrossRef CAS PubMed Google Scholar
Schutt, C. & Winkler, F. K. (1977). The Rotation Method in Crystallography, edited by U. W. Arndt & A. J. Wonacott, pp. 173–186. Amsterdam: NorthHolland. Google Scholar
Steller, I., Bolotovsky, R. & Rossmann, M. G. (1997). J. Appl. Cryst. 30, 1036–1040. Web of Science CrossRef CAS IUCr Journals Google Scholar
Weiss, M. S. (2001). J. Appl. Cryst. 34, 130–135. Web of Science CrossRef CAS IUCr Journals Google Scholar
Wirth, N. (1976). Algorithms + Data Structures = Programs, pp. 264–274. New York: Prentice–Hall. Google Scholar
This is an openaccess article distributed under the terms of the Creative Commons Attribution (CCBY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.