 1. Introduction
 2. Physical reasons for differences of scale
 3. Modelling the correction factors
 4. Determining the correction factors
 5. Assessment of data quality
 6. Scaling of multiple-wavelength data sets and detection of anomalous signal
 7. Determination of Laue group, point group and space group
 A1. Files, data sets, runs and batches: data organization
 A2. Scaling model
 A3. Estimation of errors
 A4. Averaged intensities
 A5. Outlier rejection algorithm
 References
research papers
Scaling and assessment of data quality
^{a}MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, England
^{*}Correspondence email: pre@mrclmb.cam.ac.uk
The various physical factors affecting measured diffraction intensities are discussed, as are the scaling models which may be used to put the data on a consistent scale. After scaling, the intensities can be analysed to set the real resolution of the data set, to detect bad regions (e.g. bad images), to analyse radiation damage and to assess the overall quality of the data set. The significance of any anomalous signal may be assessed by probability and correlation analysis. The algorithms used by the CCP4 scaling program SCALA are described. A requirement for the scaling and merging of intensities is knowledge of the Laue group and point-group symmetries: the possible symmetry of the diffraction pattern may be determined from scores such as correlation coefficients between observations which might be symmetry-related. These scoring functions are implemented in a new program POINTLESS.
Keywords: scaling diffraction data; data quality; Laue group determination.
1. Introduction
The diffraction intensities measured by integrating spots recorded on an area detector are not all on the same scale because they are affected by a number of physical factors from the experiment, most of which are difficult to measure directly. The process of `data reduction' uses the redundancy of multiple measurements of symmetry-related reflections to put all observations on a common scale by fitting a scaling model which reflects the experiment. This process produces a data set which is internally consistent, within the errors of the model, though not necessarily correct on an absolute scale.
Analysis of the agreement between equivalent reflections after scaling gives estimates of the quality of the data and also highlights parts of the data which agree poorly with the rest. This allows decisions to be made about whether parts of the data should be rejected.
This paper discusses the physical reasons for the differences in scale, the scaling model and the analysis of data. This discussion is based on the CCP4 program SCALA, but the general ideas also apply to other implementations of scaling. Some more details of the program SCALA are given in Appendix A. This paper also discusses some considerations in the determination of the Laue group and hence the space group, and a new program (POINTLESS) which scores and ranks different possible Laue groups.
2. Physical reasons for differences of scale
The various factors affecting the measured intensity can be divided into those dependent on the primary beam and the way in which the crystal is rotated, those dependent on the diffracted beam direction and those dependent on the detector. These factors may then be combined into a model to correct the measured intensities.
2.1. Factors related to the incident X-ray beam
We generally assume that reciprocal space has been sampled by rotation of the crystal at a constant speed in an incident beam of constant or smoothly varying intensity and that adjacent images exactly abut each other in rotation angle. Variations in the rotation rate, rapid fluctuations in incident-beam intensity or errors in synchronization of the shutter cause systematic errors which are difficult or impossible either to detect or to model and ideally these factors should be explicitly monitored.
Correctable factors are slow variation in incident-beam intensity (for example on synchrotron beams), change in illuminated volume if the beam is smaller than the crystal and absorption in the primary beam. These can be grouped together into a single correction factor dependent on the crystal rotation.
2.2. Factors related to the crystal and diffracted beam
Absorption in the secondary beam direction is serious at long wavelengths and worth correcting in all cases, particularly as a relative correction for single- or multiple-wavelength measurements. The most difficult systematic error is radiation damage, since radiation causes the structure to change with time, which means that different reflections change at different rates. Extrapolation to zero time (Diederichs et al., 2003; Diederichs, 2006) requires many observations of each reflection well spaced out in time and this is not generally possible in radiation-sensitive cases. The relative B factor (see §3 and §A2.2) is essentially a correction for average radiation damage.
2.3. Factors related to the detector
The detector should be properly calibrated for spatial distortion and sensitivity of response as well as for any defective regions and should be stable: detector corrections cannot easily be extracted from diffraction data. The user will usually have to tell the integration program about shadows from the beamstop and other obstructions and it is important to do so.
3. Modelling the correction factors
The scaling model should be chosen as far as possible to describe the diffraction experiment performed. Various scaling models have been used to model the correction as a function of rotation (or time) and the direction of the diffracted beam: a good discussion of modelling the various factors, using a general exponential model, is given by Otwinowski et al. (2003). The simplest model applies a different scale factor for each image, but the scale does not usually vary sharply from one image to another, so a smooth function is more appropriate: the function used in SCALA was inspired by the method of Kabsch (1988) (see Appendix A and Kabsch, 2000). Using separate scales for each image (`batch' scales) introduces discontinuities in the scale, even if neighbouring scales are restrained together (Otwinowski et al., 2003), which is usually undesirable. `Batch' scaling also causes complications for partially recorded reflections in that different parts of the same reflection have different scales, so that in the determination of scales either the partial derivatives must be partitioned according to the calculated fraction or each part must be treated separately and scaled up to the full equivalent (Rossmann & van Beek, 1999); both methods use the calculated fraction, which is typically not very accurate. A smooth scale model avoids this problem by scaling each reflection after summing all its parts.
The other traditional component of the scaling model is a relative B factor, exp(−2Bsin^{2}θ/λ^{2}), where B is a function of time (or rotation or image number). This provides a resolution-dependent radiation-damage correction, but it is an average correction and cannot account for localized radiation damage. Like the scale factor, this is best treated as a smooth function of time (or rotation as its proxy; see §A2.2).
Absorption in the secondary beam direction is best parameterized as coefficients of real spherical harmonics, either in the rotating crystal frame or in the diffractometer frame (Katayama, 1986; Blessing, 1995). These two coordinate frames give very similar results if data are collected about a single rotation axis, but if a crystal is rotated about two or more axes a single absorption surface expressed in the crystal frame may in principle be used for all rotation sweeps. This assumes perfect centring of the crystal, so use of different surfaces for each sweep is likely to be better.
SCALA includes an optional and rather crude correction for errors introduced by the long tails on reflections from diffuse scattering and the inconsistency of sampling these tails by relatively coarse slicing on the rotation (see §A2.4). This may be helpful when the image width is comparable to or larger than the reflection width, when the sampling of the reflection profile is very different between reflections measured on one, two or more images. The error caused by this differential sampling is apparent in the systematic underestimation of fully recorded reflections (from one image) compared with summed partials (from two or more images), giving rise to a negative `partial bias', defined as , where 〈I_{full}〉 is the average of all fully recorded observations of the reflection (or more generally of the observations with the smallest number of parts) and I_{partial} is a summed partial observation and the summation is over all reflections which have both fully recorded and partial observations and over all summed partials. This correction should be applied with caution, since such bias can also arise from underestimation of the mosaic spread defining the width of the Bragg peak: in this case, the `tails' correction is inappropriate. It is also unlikely to be helpful when the image width is smaller than the reflection width (`fine slicing').
4. Determining the correction factors
The correction factors are optimized to make the data as internally consistent as possible by minimizing the difference between symmetry-related observations. Note that the only information we have is the measured difference between symmetry-related observations (unless an external reference data set is used), so any systematic error which follows the symmetry of the diffraction pattern will not be corrected: in particular, crystal absorption errors may remain since the shape of a crystal often obeys its diffraction symmetry. It follows that to obtain the most accurate data symmetry-related observations should be measured in as different a way as possible (by rotation about more than one axis). Conversely, to obtain the most accurate differences for phasing (anomalous or isomorphous), equivalent observations should be measured in as similar a way as possible, with the same systematic errors.
The function minimized is
Ψ = Σ_{h} Σ_{l} w_{hl}(I_{hl} − g_{hl}〈I_{h}〉)^{2},
where I_{hl} is the lth observation of reflection h, g_{hl} is its associated inverse scale, w_{hl} = 1/σ^{2}(I_{hl}) and 〈I_{h}〉 is the weighted average intensity for all observations l of reflection h (Hamilton et al., 1965; Fox & Holmes, 1966). The inverse scale g_{hl} is a function of all the parameters in the model.
From minimization of Ψ within one reflection,
〈I_{h}〉 = Σ_{l} w_{hl}g_{hl}I_{hl} / Σ_{l} w_{hl}g_{hl}^{2}.
By minimizing Ψ over all reflections, we obtain values for all the parameters. This is performed by a singular value decomposition (Fox & Holmes, 1966), eliminating two zero eigenvalues corresponding to the scales and the B factors, since the residual Ψ is unchanged by multiplying all the scale parameters by a constant or by adding a constant to all the B factors. The first scale factor is normalized to a value of 1 and the B-factor parameters are all forced negative by normalizing the largest one to 0. Parameters may be restrained by additional terms in the residual: for example, it is useful to restrain the coefficients of the spherical harmonic terms in the absorption correction to a target value of 0 to avoid wild corrections with limited data (see §A2.3).
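As an illustration of the quantities involved, the following minimal sketch evaluates the weighted mean intensity of each reflection and the residual Ψ for a fixed set of inverse scales. It does not refine any parameters and is not the SCALA implementation; the function names and the tuple layout `(h, I, sigma, g)` are assumptions made for this example.

```python
from collections import defaultdict


def weighted_means(observations):
    """Weighted mean <I_h> for each reflection h.

    observations: list of (h, I, sigma, g) tuples (hypothetical layout), where
    h is a hashable reflection identifier, I the measured intensity, sigma its
    standard deviation and g the inverse scale from the current model.
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for h, intensity, sigma, g in observations:
        w = 1.0 / sigma ** 2          # w_hl = 1 / sigma^2(I_hl)
        num[h] += w * g * intensity   # sum of w g I
        den[h] += w * g * g           # sum of w g^2
    return {h: num[h] / den[h] for h in num}


def residual(observations):
    """Psi = sum over h and l of w_hl (I_hl - g_hl <I_h>)^2."""
    means = weighted_means(observations)
    return sum((1.0 / sigma ** 2) * (intensity - g * means[h]) ** 2
               for h, intensity, sigma, g in observations)
```

For two observations of the same reflection that differ only by their inverse scale, for example `[("h1", 100.0, 10.0, 1.0), ("h1", 200.0, 10.0, 2.0)]`, the mean is 100 and the residual vanishes, as expected for perfectly consistent data.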
5. Assessment of data quality
After applying the refined scale model, the quality of the data may be assessed in a number of ways based on the internal consistency of the data and comparison of the corrected intensities with the corrected standard deviations (see §A3). There are a number of important questions about the data which need to be answered: what is the real resolution, are there bad regions of data which should be omitted, is there any anomalous signal and what is the overall quality of the data? The internal consistency may be measured as R factors or as correlation coefficients. The conventional R_{merge} (also known as R_{sym}) is not a particularly good measure of data quality as it only measures the discrepancy between observations and takes no account of the improvement in the merged intensity by averaging many observations: indeed, R_{merge} tends to increase with increasing multiplicity. Improved multiplicity-weighted R factors have been suggested by Diederichs & Karplus (1997), Weiss & Hilgenfeld (1997) and Weiss (2001). If n_{h} is the number of observations of reflection h, then
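R_{merge} = Σ_{h} Σ_{l} |I_{hl} − 〈I_{h}〉| / Σ_{h} Σ_{l} 〈I_{h}〉,
the traditional R_{merge},
R_{meas} = R_{r.i.m.} = Σ_{h} [n_{h}/(n_{h} − 1)]^{1/2} Σ_{l} |I_{hl} − 〈I_{h}〉| / Σ_{h} Σ_{l} 〈I_{h}〉,
the multiplicity-independent R factor, and
R_{p.i.m.} = Σ_{h} [1/(n_{h} − 1)]^{1/2} Σ_{l} |I_{hl} − 〈I_{h}〉| / Σ_{h} Σ_{l} 〈I_{h}〉,
the precision-indicating R factor.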
R_{meas} = R_{r.i.m.} is an improved version of the traditional R_{merge} and measures how well the different observations agree. R_{p.i.m.} is a measure of the quality of the data after averaging the multiple measurements.
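The three merging statistics differ only in the multiplicity-dependent factor applied to each reflection, which a short sketch makes explicit. This example assumes already scaled intensities, uses an unweighted mean for 〈I_{h}〉 for simplicity and skips reflections measured only once; the function name and input layout are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def merging_r_factors(observations):
    """Return (R_merge, R_meas, R_pim) from scaled observations.

    observations: iterable of (h, I) pairs, h being a hashable reflection
    identifier. Assumption: <I_h> is taken as the unweighted mean, and
    reflections with a single observation are ignored.
    """
    groups = defaultdict(list)
    for h, intensity in observations:
        groups[h].append(intensity)

    dev_merge = dev_meas = dev_pim = denom = 0.0
    for values in groups.values():
        n = len(values)
        if n < 2:
            continue
        mean = sum(values) / n
        dev = sum(abs(v - mean) for v in values)
        dev_merge += dev
        dev_meas += sqrt(n / (n - 1)) * dev   # multiplicity-independent
        dev_pim += sqrt(1 / (n - 1)) * dev    # precision-indicating
        denom += n * mean                     # sum over l of <I_h>
    return dev_merge / denom, dev_meas / denom, dev_pim / denom
```

With two observations of 90 and 110 for one reflection, this gives R_{merge} = 0.1, R_{meas} = 0.1 × 2^{1/2} and R_{p.i.m.} = 0.1.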
5.1. What is the real resolution?
The variance-weighted average intensities fall off with increasing resolution and may be compared with the corrected standard deviation estimate (§A3). A typical resolution cutoff is when 〈I/σ(I)〉 (averaging within resolution bins on 1/d^{2} = 4sin^{2}θ/λ^{2}) falls below 2.0. Beyond this point, the data are probably too weak to be useful. The correlation coefficient between intensities averaged within two random half data sets also gives an indication of the maximum resolution (see Fig. 3b and §6). Many crystals show anisotropic diffraction and the resolution limits ought to be anisotropic, but at present no programs treat anisotropic data gracefully.
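The 〈I/σ(I)〉 criterion can be sketched as follows. The example bins reflections uniformly on 1/d^{2} and reports the low-resolution edge of the first bin whose mean I/σ(I) falls below the threshold. The function name, the input layout and the choice of equal-width bins are assumptions for illustration only.

```python
def resolution_cutoff(reflections, n_bins=10, threshold=2.0):
    """Estimate a resolution cutoff (in the units of d) from <I/sigma>.

    reflections: list of (d, I, sigma) tuples (hypothetical layout).
    Returns the d-spacing at the low-resolution edge of the first bin on
    1/d^2 with mean I/sigma below `threshold`, or None if no bin qualifies.
    """
    s2 = [1.0 / d ** 2 for d, _, _ in reflections]
    lo, hi = min(s2), max(s2)
    width = (hi - lo) / n_bins or 1.0   # avoid zero width for a single shell
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for (_, intensity, sigma), s in zip(reflections, s2):
        b = min(int((s - lo) / width), n_bins - 1)
        sums[b] += intensity / sigma
        counts[b] += 1
    for b in range(n_bins):
        if counts[b] and sums[b] / counts[b] < threshold:
            return (lo + b * width) ** -0.5
    return None
```

In practice the bins would contain many reflections each, and the decision would be made by inspecting the whole curve rather than a single threshold crossing.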
5.2. Are there bad parts of the data?
A plot of R_{merge} against `batch' number will show if there are any individual images or parts of the data which are significantly worse than the rest of the data: this might suggest that there is a bad image or that something has gone wrong with the integration. In the case illustrated in Fig. 1 there is a blank image owing to the beam disappearing.
Radiation damage causes serious degradation of data quality and shows up in several plots against batch number, but most clearly from the relative B factor: Fig. 2 shows that as the crystal dies the scale increases, the B factor becomes more negative, the R_{merge} increases and 〈I〉/〈Scatter〉 [where Scatter is the r.m.s. value of (I_{hl} − 〈I〉)] decreases.
5.3. Outlier rejection
Occasionally, individual observed intensities are just wrong, for one of a number of reasons. These include (i) spots which do not belong to the main lattice but overlap a predicted position, from ice crystals, salt crystals or another crystal, (ii) zingers, i.e. events on the detector which do not arise from X-rays, and (iii) spots which lie outside the active area of the detector, e.g. behind the beamstop.
Detecting outliers is reasonably easy if the reflection has been measured many times, but is not possible for a reflection measured only once or twice: this is a major reason for measuring data with a high multiplicity. The outlier rejection algorithm used in SCALA is described in §A5. Note that outlier detection generally assumes that the majority of observations of a reflection are correct: one common case where this may cause problems is with spots behind a slightly miscentred beamstop, when it is possible that the majority of observations are wrong and the program will reject the correct ones. It is important to tell the integration program (e.g. MOSFLM) the position of the beamstop explicitly.
Spurious observations arising from ice or salt spots are often very large and may be rejected if they have an intensity much larger than would be expected (Read, 1999). This test is performed on the normalized amplitudes E, normalized as a function of resolution such that 〈E^{2}〉 = 1. An E of > 4 is very unlikely, but because of the errors in normalization, particularly at low resolution where the mean intensity is changing rapidly with resolution and there are relatively few reflections, or with anisotropic data, it is better to reject only observations with E > 8–10.
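A minimal sketch of such a test on normalized amplitudes is given below. It assumes that the intensities passed in all come from one narrow resolution shell, so that dividing by their mean gives 〈E^{2}〉 = 1 within the shell; the function name and the default limit are illustrative choices, not SCALA parameters.

```python
from math import sqrt


def flag_large_e(intensities, e_max=8.0):
    """Return indices of observations with normalized amplitude E > e_max.

    intensities: intensities from a single resolution shell (assumption),
    normalized here so that <E^2> = 1 within the shell.
    """
    mean_i = sum(intensities) / len(intensities)
    if mean_i <= 0:
        return []   # normalization is meaningless for a non-positive mean
    return [k for k, intensity in enumerate(intensities)
            if intensity > 0 and sqrt(intensity / mean_i) > e_max]
```

Note that a very large outlier inflates the shell mean and so lowers its own E value, which is one more reason for using a generous limit.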
6. Scaling of multiple-wavelength data sets and detection of anomalous signal
When multiple data sets have been collected from the same crystal (or indeed different crystals) at different wavelengths for a MAD experiment, the relative systematic errors may be reduced by scaling them together, assuming for the purposes of scaling that the differences between the data sets arising from the different wavelengths are small. Similarly, the differences between Bijvoet pairs I^{+} and I^{−} within a data set are usually small and may be ignored in the scaling step. Scaling data sets together forces all observations to be as similar as possible within the scaling model and improves the signal, since the scaling model varies slowly in reciprocal space while the desired signal (anomalous or dispersive differences) varies more rapidly, so the differences remaining after the relative systematic errors have been removed are closer to the true signal. This was discussed by Evans (1997), but in retrospect the scaling seems to work well without the reference data set recommended there.
It is useful to know if there is any significant anomalous or dispersive signal before attempting to locate anomalous scatterers. The observed anomalous differences may be compared with their estimated standard deviations using a normal probability plot of the normalized differences δ_{anom} = (I^{+} − I^{−})/[σ^{2}(I^{+}) + σ^{2}(I^{−})]^{1/2} (Howell & Smith, 1992). The slope of the central region of this plot will be >1 if the anomalous differences are larger than expected from their standard deviations (Fig. 3a). Another way of detecting a signal is from the correlation coefficient between anomalous differences in different data sets (Fig. 3b): this will fall off with resolution and may be used to set a suitable maximum resolution limit for initial trials to locate anomalous scatterers (Schneider & Sheldrick, 2002). If only one data set is available (SAD), it may be split randomly into two half data sets, provided the multiplicity is high enough, and the correlations calculated between the two halves (Fig. 3c).
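The normal probability analysis can be sketched as follows: the normalized differences are sorted, paired with the expected quantiles of a standard normal distribution and a least-squares line is fitted through the central part of the plot. The function name, the input layout and the choice of the central 80% of points are assumptions for this example.

```python
from math import sqrt
from statistics import NormalDist


def normal_probability_slope(pairs, central_fraction=0.8):
    """Slope of the central region of a normal probability plot.

    pairs: list of (I_plus, sigma_plus, I_minus, sigma_minus) tuples
    (hypothetical layout). A slope > 1 suggests that the anomalous
    differences are larger than expected from their standard deviations.
    """
    deltas = sorted((ip - im) / sqrt(sp ** 2 + sm ** 2)
                    for ip, sp, im, sm in pairs)
    n = len(deltas)
    # Expected order statistics of a standard normal distribution.
    expected = [NormalDist().inv_cdf((k + 0.5) / n) for k in range(n)]
    lo = int(n * (1 - central_fraction) / 2)
    xs, ys = expected[lo:n - lo], deltas[lo:n - lo]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx
```

Restricting the fit to the central region reduces the influence of the tails, where outliers and the few strongest differences dominate.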
Another way of analysing the significance of the anomalous signal from the half data sets is from a scatter plot: for each reflection we divide the I^{+} and the I^{−} observations randomly into two sets, average them within the sets and subtract them to obtain ΔI_{1} = 〈I^{+}〉_{1} − 〈I^{−}〉_{1}, ΔI_{2} = 〈I^{+}〉_{2} − 〈I^{−}〉_{2} for each reflection, then plot ΔI_{1} against ΔI_{2}. For perfect data where ΔI_{1} = ΔI_{2}, this plot would have all points lying along the diagonal. The correlation coefficient is the slope of the least-squares straight line fitted through these points, but it is very sensitive to a few outliers and makes no use of the fact that the slope should be 1.0 for ideal data. Real data (Fig. 4a) show a distribution which is roughly elliptical. The width of the distribution along the diagonal is a measure of the signal and its width perpendicular to the diagonal is a measure of the error, so the ratio of these, the r.m.s. correlation ratio = (r.m.s. deviation along diagonal)/(r.m.s. deviation perpendicular to diagonal), can be used as a measure of the significance of the signal and may be plotted as a function of resolution (Fig. 4b). In the absence of any anomalous signal, the distribution is circular (Fig. 4c) and the r.m.s. correlation ratio is close to 1 (Fig. 4d). This measure seems somewhat more robust than the correlation coefficient, with less variation between resolution bins, but leads to similar conclusions about a suitable resolution at which to truncate the data to preserve a strong signal: for the peak wavelength in the example in Figs. 3, 4(a) and 4(b), a good signal extends to about 3.6 Å resolution with correlation coefficients between and within data sets of above about 0.3 and an r.m.s. correlation ratio of above 1.5.
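The r.m.s. correlation ratio can be computed directly by rotating the (ΔI_{1}, ΔI_{2}) points by 45°, as in the sketch below. It assumes that the differences are centred on zero, so that deviations are taken about the origin; the function name is hypothetical.

```python
from math import sqrt


def rms_correlation_ratio(delta1, delta2):
    """Ratio of r.m.s. spread along the diagonal to that perpendicular to it.

    delta1, delta2: anomalous differences from the two random half data sets,
    one value per reflection. Assumption: both are centred on zero. The ratio
    is undefined for perfect data with delta1 == delta2 everywhere.
    """
    along = [(a + b) / sqrt(2) for a, b in zip(delta1, delta2)]
    across = [(a - b) / sqrt(2) for a, b in zip(delta1, delta2)]

    def rms(values):
        return sqrt(sum(v * v for v in values) / len(values))

    return rms(along) / rms(across)
```

For example, `rms_correlation_ratio([3, -3, 1, -1], [1, -1, 3, -3])` gives a ratio of 2, since the points spread twice as far along the diagonal as across it.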
7. Determination of Laue group, point group and space group
The true space group of a crystal cannot be known with certainty until the structure has been solved and refined, since it is easy to be misled by pseudosymmetry and perhaps by twinning, but the space-group symmetry does impose itself on the diffracted intensities and from these it is possible to propose the likely space group or at least a range of possibilities. It is useful to find the likely symmetry as early as possible during the initial examination of a crystal, since it affects the data-collection strategy (how much rotation range is needed for a complete data set). Scaling and merging depends on the Laue group (or more strictly, the point group; see below), since this controls which spots are related by symmetry. This section describes the methods which are used in a new program to determine the Laue group, POINTLESS, which will be distributed in the CCP4 suite.
7.1. Stages in space-group determination
Space-group determination can be considered as a series of stages of increasing difficulty: determining successively the lattice symmetry, the Laue group, the point group and the space group. At all stages, distinguishing between the possibilities may be uncertain owing to either a small number of observations or pseudosymmetry.
7.1.1. Lattice symmetry: crystal class
Autoindexing determines the unit-cell parameters of the observed lattice initially without constraints, but the lattice symmetry imposes restrictions on the allowed cell (e.g. a = b, α = β = 90°, γ = 120° for a hexagonal lattice) and lattice centring restricts which indices are present (e.g. h + k + l even for an I-centred lattice). When indexing a diffraction pattern, the user (or the program) chooses a lattice which fits geometrically to the observed spot positions within an acceptable limit on some penalty function (see, for example, Leslie, 2006), but the apparent cell restrictions may occur accidentally (e.g. β ≃ 90° in a monoclinic cell) and at the indexing stage the intensities are not available to indicate that the wrong choice has been made.
7.1.2. Laue group symmetry
The Laue group is the symmetry of the diffraction pattern, plus any lattice centring. It corresponds to the space group without any translations, with an added centre of symmetry from Friedel's law. The Laue group may be inferred from the observed symmetry of the diffraction pattern (see §7.2).
7.1.3. Point-group symmetry
To take anomalous scattering into account, intensity observations should be averaged according to the point group that can be derived from the space group by removing the lattice type and translations. For chiral space groups (i.e. for all macromolecular crystals), there is only one possible point group corresponding to each Laue group. For many non-chiral space groups, the point group may be inferred by determination of which principal zones of the reciprocal lattice are centric, which can be performed from intensity statistics: a centre of symmetry makes all reflections centric and a twofold axis (rotation or screw) makes the perpendicular zone centric, while a mirror or glide plane does not. However, in practice tests on zone statistics are unreliable, particularly in the presence of heavy atoms or pseudosymmetry (G. M. Sheldrick, personal communication).
7.1.4. Space-group symmetry
The space group is the point group plus translations (screw axes for chiral space groups). Screw axes are only visible in the diffraction pattern as systematic absences along the axes and these are not always very reliable as there may be few reflections and there may also be accidental absences. Determination of the translational part of the space group from axial absences must be considered as a hypothesis to be confirmed by structure solution. In non-chiral cases, possible glide planes introduce absences throughout a zone which may be detected more reliably.
7.2. Scoring functions for determination of Laue group symmetry
To distinguish between possible Laue groups, we need to compare observations which might be related by potential symmetry and score their agreement. There are two problems which need to be addressed in choosing a suitable method of scoring. Firstly, we would like to be able to obtain a preliminary idea of the symmetry from a very partial data set, from the first few images, even before a complete data collection and we want a method which is robust to limited data and will not give a spurious high score from a few accidental agreements. Secondly, we would like a score function which is insensitive to the scale between observations, since we need to know the symmetry to scale the data.
Two sorts of scoring functions have been tried.

Use of the correlation coefficient reduces the problem of the unknown scales, but the problem of small samples remains. The approach used in POINTLESS is to calculate the score given by all possible intensity pairs related by a potential symmetry element (the test score) and to compare this score with scores from the same size groups of unrelated pairs. The many pairs at the same resolution which cannot be related by symmetry are divided into groups of the same size as the test sample (with a maximum size of say 200, since larger groups should not be very different), the score is calculated for each group and then the mean and standard deviation of these scores are used to convert the test score into a Z score,
Z = [score(test) − 〈score(unrelated)〉]/σ[score(unrelated)].
7.3. Determining the Laue group in POINTLESS
POINTLESS reads unmerged integrated intensities from, for example, MOSFLM and determines the lattice with highest possible symmetry compatible with the unit-cell parameters, within a rather generous limit (currently 3°; Le Page, 1982). The symmetry in the file is ignored. Most of the symmetry handling in the program uses the cctbx library (Grosse-Kunstleve et al., 2002). Each symmetry element (rotation axis) in this lattice symmetry is scored separately using all pairs of observations related by that rotation. All the possible combinations of these elements are then scored, giving all the possible subgroups. For each subgroup, the score for elements belonging to the lattice group but not to the subgroup is subtracted from the score for elements which do belong to the subgroup.
This favours the highest symmetry consistent with a good score in preference to lower symmetries with good Z(for) scores.
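The conversion of a test score into a Z score, as described in §7.2, can be sketched as follows using the correlation coefficient as the score. This is an illustration only and not the POINTLESS code: the function names and input layout are hypothetical, and the sketch assumes at least two groups of unrelated pairs with non-zero variance.

```python
from math import sqrt
from statistics import mean, stdev


def correlation(pairs):
    """Linear correlation coefficient of a list of (x, y) intensity pairs."""
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def z_score(test_pairs, unrelated_pairs, max_group=200):
    """Z score of the test correlation relative to unrelated pairs.

    test_pairs: pairs related by the potential symmetry element.
    unrelated_pairs: pairs at similar resolution which cannot be related by
    symmetry; they are split into groups of the test sample size, capped at
    `max_group` pairs.
    """
    size = min(len(test_pairs), max_group)
    groups = [unrelated_pairs[k:k + size]
              for k in range(0, len(unrelated_pairs) - size + 1, size)]
    scores = [correlation(g) for g in groups]
    return (correlation(test_pairs) - mean(scores)) / stdev(scores)
```

Comparing against groups of the same size as the test sample means that a high correlation obtained from only a handful of pairs is automatically discounted, because small unrelated groups also show a wide spread of correlations.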
7.3.1. Example 1: an orthorhombic case with a ≃ b
A crystal indexed and integrated with unit-cell parameters a = 44.67, b = 46.10, c = 117.89 Å, α = β = γ = 90° was tested in the possible tetragonal lattice P4/mmm using either just the first 5° of data or a full 90° data set. Table 1 shows the scores for the individual possible symmetry elements: the twofold axes along c [001] and a [100] are clearly present, but the twofold along b [010] has only four pairs of observations and thus is indeterminate. The potential fourfold axis along c [001] is not obviously present and the diagonal twofolds are absent. With the full 90° data set (right-hand part of table) the Z scores are larger, mainly because σ[CC(unrelated)] is smaller, and the twofold along b is now clear. Table 2 shows the scores for all the possible Laue groups, showing that even with the very limited 5° of data the correct Laue group Pmmm is reasonably clear.


7.3.2. Example 2: pseudohexagonal Cmmm
A hexagonal lattice may be indexed as C-centred orthorhombic in three different ways, related by 60° rotations. Conversely, a true C-centred orthorhombic lattice with b = 3^{1/2}a can be indexed as hexagonal. In this case, an autoindexing program has only a one in three chance of picking the correct orthorhombic lattice.
In the case illustrated in Tables 3 and 4, the unit cell has b ≃ 3^{1/2}a and was indexed incorrectly. The scores on the individual symmetry elements (Table 3) clearly pick out the correct 222 set of rotations and the combination (Table 4) selects the correct Cmmm setting.


7.4. Future directions
Future developments of POINTLESS will include assessment of intensity statistics and systematic absences in order to score possible space groups and to detect twinning, and comparison with previously collected data sets to choose between alternative valid but non-equivalent indexing schemes. Ultimately, it is intended that all the facilities of SCALA will also be included.
APPENDIX A
Algorithms used in SCALA
This appendix describes some of the details of the scaling and analysis calculations in SCALA. It is not comprehensive, but covers the most important and commonly used functions. The description here refers to version 3.2.13. Most of the algorithms are also described in the documentation for SCALA distributed by CCP4.
A1. Files, data sets, runs and batches: data organization
Unmerged intensity data is read from a file in the CCP4 MTZ format, which represents a hierarchy of data organization. This file typically comes from the integration program MOSFLM, but intensities from other programs may be imported via the CCP4 programs COMBAT or DTREK2SCALA. With COMBAT, geometric information may be lost in this process, so not all scaling options may be available. The file may contain several data sets (e.g. collected at different wavelengths for MAD), each of which is divided by the program into `runs'. Each run consists of spots from a set of contiguous images (`batches') and has its own set of scaling parameters, i.e. the scales vary smoothly within the run but are different between runs. SCALA automatically divides data into runs at any discontinuity in batch number or rotation angle. A `reflection' consists of all `observations' of symmetry-related intensities and each observation may consist of a number of `parts'. Parts are summed to form a complete observation, provided that either the flags from MOSFLM are consistent (e.g. all parts 1–3 of three present) or the total calculated fraction (read from the file) is within limits (usually 0.95–1.05).
A2. Scaling model
The inverse scale factor for an observation I_{hl} (i.e. the lth observation of reflection h) is composed of four parts
A2.1. Scale factor
The scale term C_{hl} for a reflection measured at rotation angle φ is smoothly interpolated with Gaussian weights from a series of scales at intervals, typically 5° (Δφ), covering the range of the data in a run. For the normalized rotation angle r = (φ − φ_{0})/Δφ (where φ_{0} is the initial rotation angle),
C_{hl} = Σ_{i} C_{i} exp[−(r − r_{i})^{2}/V_{r}] / Σ_{i} exp[−(r − r_{i})^{2}/V_{r}],
where C_{i} are the scale factors at positions r_{i} and the summation i is over all scales close to position r, i.e. with (r − r_{i})^{2}/V_{r} < ProbLim (default = 3.0). V_{r} is the `variance' of the weight (default value 1.0). This is similar to the method of Kabsch (1988).
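The interpolation can be sketched as below. This is a minimal illustration rather than the SCALA code: it assumes that the scales C_{i} sit at integer normalized positions r_{i} = i and that the Gaussian weight has the form exp[−(r − r_{i})^{2}/V_{r}]; the function and parameter names are hypothetical.

```python
from math import exp


def smoothed_scale(phi, scales, phi0=0.0, dphi=5.0, v_r=1.0, prob_lim=3.0):
    """Gaussian-weighted interpolation of the scale at rotation angle phi.

    scales: list of scale factors C_i, assumed to lie at normalized
    positions r_i = i (i.e. at phi0 + i * dphi).
    Only scales with (r - r_i)^2 / v_r < prob_lim contribute.
    """
    r = (phi - phi0) / dphi
    num = den = 0.0
    for i, c_i in enumerate(scales):
        x = (r - i) ** 2 / v_r
        if x < prob_lim:
            w = exp(-x)      # Gaussian weight for this scale
            num += w * c_i
            den += w
    return num / den
```

Because each interpolated value is a weighted average of neighbouring parameters, the scale varies smoothly with rotation angle and no discontinuities are introduced between images.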
A2.2. B factor
The B-factor term is similarly derived from a smoothed function of `time' t (usually time is taken as equivalent to φ), with B factors defined at intervals (default interval Δt = 20° on φ).
where t_{0} is the initial time,
where B_{i} are the B factors at positions t_{i}, the summation is for (t − t_{i})^{2}/V_{B} < ProbLim and V_{B} is the smoothing weight (default = 0.5). Note the positive sign in the exponent arises because this is the inverse scale.
A2.3. Absorption correction
The absorption term is derived from summing a series of spherical harmonic terms (Katayama, 1986; Blessing, 1995) as a function of the diffracted beam vector s_{2}, expressed either in the diffractometer frame or the crystal frame,
S(s_{2}) = 1 + Σ_{l=1}^{l_{max}} Σ_{m=−l}^{+l} C_{lm}Y_{lm}(s_{2}),
where C_{lm} are the coefficients to be determined and Y_{lm}(s_{2}) are the spherical harmonic functions. Harmonics up to order l_{max} = 4 or 6 are sufficient to give a good fit. The initial `1' in this equation is essentially the zeroth-order term (l = 0). Ideally, absorption should be identical if the beam is reversed [i.e. S(s_{2}) = S(−s_{2})], which implies that terms with l odd should have zero coefficients, but inclusion of the odd-order terms allows for crystal miscentring and other approximations and provides a useful correction to errors in anomalous differences, since I^{+} and I^{−} observations then have different corrections applied, even for inverse-beam experiments.
The coordinate frame used for s_{2} is not critical, but usually the diffractometer frame is used,
where B is the crystal orthogonalization matrix, U is the orientation matrix, Φ is the diffractometer rotation matrix and s is the diffraction vector
where s_{0} is the incident-beam vector. The diffractometer frame
and the permuted crystal frame
where Q is a permutation matrix
To keep the absorption surface smooth in regions where there are no data, the coefficients C_{lm} are restrained to a value of 0 with a quadratic penalty function added to the total residual,
Σ_{lm} w_{s} C_{lm}^{2},
where the weight w_{s} = 1/σ_{s}^{2} with a default value of σ_{s} = 0.001. Otwinowski et al. (2003) have suggested using tighter restraints on higher-order terms.
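To give a feel for how strong this restraint is, the following sketch evaluates a quadratic penalty of this kind for a couple of hypothetical coefficient values. The function name and the choice of a single common weight for all coefficients are assumptions of the sketch.

```python
def restraint_penalty(coeffs, sigma_s=0.001):
    """Quadratic penalty w_s * sum(C_lm^2) with w_s = 1 / sigma_s^2."""
    w_s = 1.0 / sigma_s ** 2
    return w_s * sum(c * c for c in coeffs)

# Two hypothetical coefficients: the penalty is dominated by the larger one
print(restraint_penalty([0.01, -0.002]))
```

With the default σ_{s} the weight is 10^{6}, so a coefficient of 0.01 alone contributes about 100 to the residual; coefficients therefore move away from zero only where the data support them.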
A2.4. Tails, a correction for diffuse scattering
Diffuse scattering causes long tails on reflections, and tails in the direction of rotation (φ) are often truncated by the integration program when the reflection width on φ, the `mosaic spread', is underestimated. A spot may be integrated on one or more images: a fully recorded observation is integrated over a narrower rotation range than a partial, so will include less of the `tail' of the spot. This leads to a systematic difference between fulls and summed partials, a negative partial bias. SCALA implements a very crude correction for this systematic difference (Evans, 1997), based on some ideas from Blessing (1987).
Thermal diffuse scattering is proportional to the Bragg intensity J, so the measured intensity I = J(1 + α). The constant of proportionality varies with resolution (and may be anisotropic): it is modelled in SCALA as α = (sin θ/λ)^{2}α_{1}, where α_{1} is a refinable parameter. The width of the thermal diffuse scattering peak is assumed to be constant and is described by a refinable parameter v. The peak may be modelled as a triangle of height h and base 2v in the reciprocal-space coordinate q (Fig. 5), whose total area is the diffuse contribution Jα = hv. If the total scan range from the start of the first image to the end of the last image is less than 2v, the diffuse scattering peak is truncated by the areas marked A_{1} and A_{2} in Fig. 5. We can calculate a correction to the equivalent full scan.
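Intensity for full scan corrected for diffuse scattering is given by
I_{full} = J + hv = J(1 + α).
Intensity for partial scan from point u_{1} to point u_{2} is given by
I_{partial} = J + hv − A_{1} − A_{2} = J[1 + α(1 − C_{1} − C_{2})],
where C_{1} = A_{1}/hv and C_{2} = A_{2}/hv are the fractions of the diffuse peak cut off at each end of the scan.
Correction factor (dividing scale factor) = [1 + α(1 − C_{1} − C_{2})]/(1 + α) = f(v, α_{1}).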
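The correction factor can be evaluated with a short sketch. The geometry here is an assumption: the triangle is taken to be centred on the reflection at q = 0 with half-width v, and the scan runs from u_{1} ≤ 0 to u_{2} ≥ 0, so the fraction of the triangle area lying beyond a limit u is ½(1 − |u|/v)^{2}. The function names are hypothetical.

```python
def cut_fraction(u, v):
    """Fraction of the triangle area beyond scan limit u (zero if |u| >= v)."""
    d = max(0.0, 1.0 - abs(u) / v)
    return 0.5 * d * d

def tails_correction(u1, u2, v, alpha1, sin_theta_over_lambda):
    """Dividing scale factor [1 + alpha(1 - C1 - C2)] / (1 + alpha)."""
    alpha = alpha1 * sin_theta_over_lambda ** 2
    c1 = cut_fraction(u1, v)               # lost at the start of the scan
    c2 = cut_fraction(u2, v)               # lost at the end of the scan
    return (1.0 + alpha * (1.0 - c1 - c2)) / (1.0 + alpha)

# Hypothetical values: half-width v = 1, alpha1 = 5, 2 Angstrom resolution
print(tails_correction(-2.0, 2.0, 1.0, 5.0, 0.25))   # scan covers the whole peak
print(tails_correction(-0.5, 0.5, 1.0, 5.0, 0.25))   # scan truncates both tails
```

A scan that covers the whole triangle needs no correction, while in the limit of a vanishingly narrow scan both halves are lost and the factor tends to 1/(1 + α), i.e. the Bragg intensity alone.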
A3. Estimation of errors
Integration programs such as MOSFLM (Leslie, 2006) produce good estimates of intensities, but the estimates of the individual errors are less reliable and are typically underestimated. After scaling, the error estimates can be improved by comparing the observed scatter between observations with the estimated standard deviation and making them equal on average. If the standard deviations σ(I_{hl}) are correct, then the normalized deviations δ_{hl} = (I_{hl} − 〈I_{h}〉)/σ(I_{hl}) (where 〈I_{h}〉 is averaged over all observations of reflection h excluding the lth observation) should be distributed with a mean of 0.0 and a standard deviation of 1.0. A simple correction to give improved error estimates is σ′(I_{hl}) = Sdfac[σ^{2}(I_{hl}) + (Sdadd g_{hl}〈I_{h}〉)^{2}]^{1/2}.
These correction factors Sdfac and Sdadd have at least some physical justification. Sdadd allows for the fact that many potential errors are proportional to the true intensity, for example fluctuations in the incident beam and errors in the exact rotation. SCALA uses a default value of Sdadd = 0.02. The factor Sdfac is a more general correction for unknown errors, but includes uncertainty in the detector gain, which converts detector-readout values to the photon counts that are used to estimate Poissonian errors. To determine Sdfac, SCALA uses a normal probability analysis (Abrahams & Keve, 1971; Howell & Smith, 1992) of δ_{hl} and sets Sdfac equal to the slope of the central part of the normal probability plot, thus forcing the slope to be 1.0 after correction. Using just the central part of the plot avoids fitting outliers in the distribution.
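The two steps described above, inflating σ and estimating Sdfac from the slope of a normal probability plot, can be sketched as follows. This is an illustration under stated assumptions, not SCALA's implementation: the function names, the plotting positions (k + 0.5)/n and the cutoff `central=1.0` that defines the `central part' of the plot are all choices made for the sketch.

```python
import math
from statistics import NormalDist

def corrected_sigma(sigma, mean_i, sdfac=1.0, sdadd=0.02, g=1.0):
    """sigma' = Sdfac * sqrt(sigma^2 + (Sdadd * g * <I>)^2)."""
    return sdfac * math.sqrt(sigma ** 2 + (sdadd * g * mean_i) ** 2)

def sdfac_from_normal_plot(deltas, central=1.0):
    """Least-squares slope of ordered deltas against expected normal quantiles.

    Only points whose expected quantile lies within +/- central are fitted,
    so outliers in the wings do not influence the slope.
    """
    n = len(deltas)
    nd = NormalDist()
    pts = []
    for k, d in enumerate(sorted(deltas)):
        q = nd.inv_cdf((k + 0.5) / n)      # expected quantile for rank k
        if abs(q) <= central:
            pts.append((q, d))
    mq = sum(q for q, _ in pts) / len(pts)
    md = sum(d for _, d in pts) / len(pts)
    sxy = sum((q - mq) * (d - md) for q, d in pts)
    sxx = sum((q - mq) ** 2 for q, _ in pts)
    return sxy / sxx

# Synthetic deviations whose sigmas are underestimated by a factor of 1.5,
# with one gross outlier added in the wing
nd = NormalDist()
deltas = [1.5 * nd.inv_cdf((k + 0.5) / 200) for k in range(200)]
deltas[-1] = 50.0
print(sdfac_from_normal_plot(deltas))
print(corrected_sigma(10.0, 1000.0))
```

In the synthetic example the outlier lies outside the fitted range and so leaves the slope unaffected, which is the reason for restricting the fit to the central part of the plot.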
A5. Outlier rejection algorithm
Flowchart for rejection algorithm.

Acknowledgements
I thank George Sheldrick for many useful discussions on Laue group determination and data-quality analysis, Ralf Grosse-Kunstleve for his cctbx library and examples of how to use it, Airlie McCoy for advice on C++ programming and many other people with whom I have discussed data reduction over the years, including Andrew Leslie, Harry Powell, Eleanor Dodson, Jim Pflugrath, Gwyndaf Evans and Elspeth Garman.
References
Abrahams, S. C. & Keve, E. T. (1971). Acta Cryst. A27, 157–165.
Blessing, R. H. (1987). Crystallogr. Rev. 1, 3–58.
Blessing, R. H. (1995). Acta Cryst. A51, 33–38.
Diederichs, K. (2006). Acta Cryst. D62, 96–101.
Diederichs, K. & Karplus, P. A. (1997). Nature Struct. Biol. 4, 269–275.
Diederichs, K., McSweeney, S. & Ravelli, R. B. G. (2003). Acta Cryst. D59, 903–909.
Evans, P. R. (1997). Proceedings of the CCP4 Study Weekend. Recent Advances In Phasing, edited by K. S. Wilson, G. Davies, A. W. Ashton & S. Bailey, pp. 97–102. Warrington: Daresbury Laboratory.
Fox, G. C. & Holmes, K. C. (1966). Acta Cryst. 20, 886–891.
Grosse-Kunstleve, R. W., Sauter, N. K., Moriarty, N. W. & Adams, P. D. (2002). J. Appl. Cryst. 35, 126–136.
Hamilton, W. C., Rollett, J. S. & Sparks, R. A. (1965). Acta Cryst. 18, 129–130.
Howell, P. L. & Smith, G. D. (1992). J. Appl. Cryst. 25, 81–86.
Kabsch, W. (1988). J. Appl. Cryst. 21, 916–924.
Kabsch, W. (2000). In International Tables for Crystallography, Vol. F, edited by M. G. Rossmann & E. Arnold. Dordrecht: Kluwer Academic Publishers.
Katayama, C. (1986). Acta Cryst. A42, 19–23.
Le Page, Y. (1982). J. Appl. Cryst. 15, 255–259.
Leslie, A. G. W. (2006). Acta Cryst. D62, 48–57.
Otwinowski, Z., Borek, D., Majewski, W. & Minor, W. (2003). Acta Cryst. A59, 228–234.
Read, R. J. (1999). Acta Cryst. D55, 1759–1764.
Rossmann, M. G. & van Beek, C. G. (1999). Acta Cryst. D55, 1631–1640.
Schneider, T. R. & Sheldrick, G. M. (2002). Acta Cryst. D58, 1772–1779.
Weiss, M. S. (2001). J. Appl. Cryst. 34, 130–135.
Weiss, M. S. & Hilgenfeld, R. (1997). J. Appl. Cryst. 30, 203–205.
© International Union of Crystallography. Prior permission is not required to reproduce short quotations, tables and figures from this article, provided the original authors and source are cited.