## research papers

## Multiparametric scaling of diffraction intensities

^{a}Department of Biochemistry, University of Texas, Southwestern Medical Center at Dallas, TX 75390-9038, USA, and ^{b}Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA 22908, USA^{*}Correspondence e-mail: zbyszek@work.swmed.edu

A novel and general approach to scaling diffraction intensities is presented. The method minimizes the disagreement among multiple measurements of symmetry-related reflections using a stable

procedure. The scale factors are described by a flexible exponential function that allows different scaling corrections to be chosen and combined according to the needs of the experiment. The scaling model presented here includes: scale and temperature factor per batch of data; temperature factor as a continuous function of the radiation dose; absorption in the crystal; uneven exposure within a single diffraction image; and corrections for phenomena that depend on the diffraction peak position on the detector. This scaling model can be extended to include additional corrections for various instrumental and data-collection problems.Keywords: scaling; absorption; diffraction; exponential modelling.

### 1. Introduction

A natural starting point in the complex analysis of measurements is to define the data model that predicts observations based on the physics of the experiment. The purpose of data analysis is to solve the inverse problem – how to obtain the parameters of the physical model from the measurements. In crystallography, the inverse problem has many steps, and this article addresses one of them – determination of the scaling model in the diffraction data reduction process.

The basic equation that describes measured intensity of an *hkl* reflection is (Guinier, 1994)

where *I*_{b} is the flux density of the primary beam; *r*_{e} = *e*^{2} /*m*_{e} *c*^{2} is the classical electron radius (2.818 × 10^{−15} m); is the wavelength; is the cross product between the diffraction vector and (projection of the crystal rotation-speed vector on the plane perpendicular to the primary beam), term is known as the Lorentz factor; *P* is the polarization factor (Azároff, 1955); *T* is the transmission [absorbance is defined as *A* = - ln*T*, *T* = exp( - *A*)]; *v*_{u} is the volume of a primitive crystal *V* is the volume of the crystal exposed by the beam; is the square of the structure-factor amplitude for a given reflection *h**k**l*; *D*_{A} is the X-ray absorption in the detector's active material; *D*_{S} is the detector's response to a single absorbed X-ray photon; it may depend on the wavelength, the position in the detector, the incidence angle *etc*.

To derive , all components of (1) need to be determined. The product of all factors that multiply the structure-factor amplitude squared is the total scale factor:

where *I* is the intensity and *K* is the total scale factor.

In principle, all scaling components in (1) can be calculated from non-diffraction measurements and calibration of the data-collection system. The total scale factor of a particular observation can be expressed as

Typically, only relative scale factors, *k*_{r}, are calculated and an overall scale factor *k*_{o}, common to all measurements, is determined by comparing the scaled data to the squared structure-factor amplitudes predicted from an atomic model. The overall scale factor describes the typical lack of knowledge about the absolute calibration of the whole system (Evans, 1993).

We always assume that some components of scale factors are known from detector calibration, beam monitoring and the diffraction geometry; *k*_{c} represents their product. The scaling procedure described here determines only the remaining part of the relative scale factors, *k*_{s}:

In many cases, quantities needed to describe *k*_{s} are calculated based on the knowledge from subsequent stages of crystallographic analysis. In such cases, scaling and subsequent steps should be done iteratively, with reasonable guesses for starting values (see §§4.1 and 4.3).

### 2. General scaling method

The method presented here is a generalization of scaling methods where parameterized scaling corrections were applied, for example, using scale and temperature factor per batch of data (Fox & Holmes, 1966; Stuart & Walker, 1979; Evans, 1993). The presented method applies to crystallographic scaling the concept of exponential modeling (Della-Pietra *et al.*, 1997). Scale factors for individual measurements are calculated using a set of parameters *p*_{i}:

*f*_{i} is a pre-defined modeling function of experimental conditions for a given observation and *p*_{i} specify the *a priori* unknown parameters. The *i* is shorthand for the hierarchical index, which describes here both the type of correction for a given physical effect and indices of parameters for functions describing this correction. Some of the functions *f*_{i} in (6) can be derived directly from the description of a particular physical effect, for example an error of individual amplifier gain in multireadout channel CCD detectors (see §2.1.2). To represent an unknown but smooth-shaped function, for example one describing absorption in the crystal (see §2.1.3), the combination of the basis functions *f*_{i} and a group of parameters can be used. The values of smooth functions rather than the values of the *p*_{i} coefficients can be interpreted in physical terms.

The advantage of the exponential modeling approach is its flexibility when dealing with correlations among parameters and a uniform description of the parameter optimization process for various scaling models.

#### 2.1. Parameterization

Scale factors are used to describe crystal diffraction, the experimental system and to correct for potential approximation errors during the integration step.

On the crystal diffraction level, the main problem is absorption in the crystal, before and after diffraction (Kopfmann & Huber, 1968; Huber & Kopfmann, 1969; Stuart & Walker, 1979; Schutt & Evans, 1985).

Typical instrumental problems are: beam fluctuations (Stupakov & Heifets, 2002); the error in the position of the rotation axis; inaccuracy of detector calibration (Tate *et al.*, 2000); mechanical problems resulting in non-uniform crystal rotation and shutter errors (Evans, 1993).

Integration programs make certain assumptions about diffraction spot shape and size. Crystal mosaicity and changing spot shape may result in systematic underestimation of the diffraction intensity owing to reflection tails extending past the assumed shape. Imperfect profile shape predictions also result in systematic effects (Diamond, 1969; Leslie, 1999).

Parameterization of a particular source of an effect described by a scale factor may be quite effective in describing other physical phenomena or approximation errors, and for this reason refined values of parameters may not have a simple physical interpretation. For example, beam intensity corrections may also correct for changes in exposed volume of the crystal.

##### 2.1.1. Error in the Lorentz factor

An example of the application of a modeling function is correcting for a small discrepancy between the actual and assumed directions of the crystal rotation axis. The inaccuracy results in an error in the calculated value of the Lorentz factor. The scale factor to correct this error can be described using the parameter *p*_{l}, whose value represents a small angular error, and the corresponding function is

##### 2.1.2. Corrections that depend on the Bragg-peak position on the detector

Detector-specific corrections are important for measurements of a very weak phasing signal. A very small correction, of about 1% in magnitude, can be critical for phasing from weak anomalous scatterers. Even if the detector sensitivity has been properly calibrated, it may have changed with time. For detectors with multichannel read-out, the most likely source of the above problem is a relative change in amplifier gains. The amplifier corrections are represented by characteristic functions corresponding to CCD areas read out by separate channels:

The parameters *p*_{da,j} are logarithms of relative changes in amplifier gains.

Other detector corrections require a smooth function that can be described by a series of basis functions. One possibility is to use two-dimensional Fourier–Bessel series (Weissman, 1982); alternatively, one can use two-dimensional Chebyshev polynomials or two-dimensional cosines (Boyd, 1991). The last option transforms detector coordinates (*x*,*y*) onto the square . For transformed values of coordinates, the modeling function is

where *d*,*n*,*m* are components of the hierarchical index *i* from (6).

The above functions have been used to effectively correct for the signal decay during the imaging plate read-out. To correct for differences between plates in two-plate scanners, an additional index is necessary to describe separate corrections for individual plates.

Smooth functions dependent on the spot position might correct also for other phenomena, *e.g.* the non-rotating component of absorption, the incorrect polarization and systematic errors in integration. In diffraction data collected with a large oscillation range from mosaic crystals, reflections diffracting along the rotation axis have a distorted spot shape. Such spot-shape variability results in an integration error that decreases with the distance from the projection of the rotation axis onto the detector. A function that was found useful in correcting for this effect is

where *f*_{l} is a function describing error in the Lorentz factor [equation (7)] and *a* is a stabilizing factor (reasonable value *a* = 5).

In the case of anisotropic mosaicity, this correction can be made asymmetric with respect to the sign of *f*_{l} (left/right distinction for horizontal rotation axis). This can be accomplished by using two functions instead of the function in (10):

where *step* is a step function with zero value for negative arguments and one for positive.

##### 2.1.3. Correction for the absorption component that rotates with the crystal

Owing to a different geometry of diffraction for every reflection, the absorption in the crystal is not uniform. A frequently used approximation of crystal absorbance is the average of absorbances in the incoming- and diffracted-beam directions (Kopfmann & Huber, 1968). The crystal absorption can be parameterized by real spherical harmonics in the rotating crystal coordinate system (Katayama, 1986; Blessing, 1995):

where *a**s*,*l**m* and *a**c*,*l**m* are parts of the hierarchical index *i* from (6); *l*,*m* are indices of spherical harmonics; *P*_{lm} are Legendre polynomials; are polar coordinates of the incoming (index *i*) and the outgoing (index *o*) directions in the crystal coordinate system.

Owing to absorption being the same in opposite directions, the odd-order spherical harmonics should have zero coefficients and can be omitted in the description of pure absorption. Nevertheless, owing to correlation with other effects, the scaling results can be improved by the inclusion of low-order odd harmonics.

##### 2.1.4. Overall scale per batch of data

This scale factor describes the product of different physical effects: beam intensity, illuminated crystal volume and average absorption. It has a simple modeling function:

The parameter *p*_{sj } is the logarithm of the scale factor for batch *j*.

##### 2.1.5. Decay described by *B* factor

Resolution-dependent crystal decay can be described by a temperature factor that is a function of the accumulated radiation dose. This dependence can be stated explicitly:

where *f*_{pb} for *n* = 1 describes a linear dependence and higher-order terms can be added to describe more complex behavior of radiation decay. Alternatively, the traditional approach is to apply temperature factors separately to batches of data:

Parameter *p*_{bj } is the temperature factor of batch *j*.

##### 2.1.6. Uneven crystal rotation and/or exposure

The rotation method (Arndt & Wonacott, 1977) assumes that the exposure is constant in the angular range. The factors that can result in non-uniform exposure are: gear misalignment in the crystal goniostat, beam fluctuations and X-ray shutter timing errors. Uneven exposure can be reproducible from image to image and may also have a random component. The correction for the reproducible part of exposure fluctuations is well determined and its description needs relatively few parameters. The correction for the random component can be problematic owing to a large number of parameters needed to describe it. Random fluctuations should be eliminated rather than corrected for by scaling. The main value of correcting for random changes may lie in its usefulness as a powerful diagnostic tool to detect under-recognized instrumental problems.

Uneven exposure can be described by a function series based on the sine and cosine of the goniostat angle corresponding to a diffraction condition for the center of a Bragg peak. The correction of uneven exposure has to average the exposure variations over the angular range in which a particular reflection diffracts. The angular range depends not only on mosaicity but also on the geometry of a reflection crossing the Ewald sphere.

The reproducible unevenness of exposure due to mechanical backlash or a shutter error can be described by a series of frequencies *g* = *n**t*, where and is the rotation range. For a particular periodicity *g*, the modeling functions are:

where

For a Gaussian mosaicity profile, the averaging factor *A* should be a Gaussian function that depends on the coefficient *c*. The expression for *A* approximates the Gaussian function effectively but it can also approximate averaging resulting from other mosaicity shapes. The averaging resulting from the mosaicity shape function is described by two parameters, *a* and *b*, that tend to have values close to one and zero, respectively. For unusual shapes of intensity profiles, *a* and *b* may need to be optimized. To describe random fluctuations, one can use the term , where *g* = *n**t* /2 and *n* goes from 1 to a maximum value. Parameters *p*_{u,g,c,j} are independent for every frame of data *j*. The term corresponding to *n* = 0 is the same as the scale factor per batch of data (see §2.1.4).

The consequence of averaging of the exposure variations is the reduced impact of the fluctuations with very high frequencies. For example, a stepper motor creates fluctuations corresponding to the step size of the motor divided by the gear ratio, typically 0.01° or less. For crystals of typical mosaicity, the consequences of such high-frequency fluctuations are negligible. Fluctuations of very low frequencies are corrected by the scale factor per batch *j*. The fluctuations of intermediate angular scale are the ones most affected by (16).

### 3. Optimization of scale factors

The scaling procedure determines the parameters *p*_{i} [equation (6)] by a method where a -like target function is minimized. The target function describes the agreement between symmetry-related reflections weighted by the expectation of measurement errors. Two similar target functions can be generated, depending on whether the intensities or their logarithms are compared.

#### 3.1. Logarithmic target function resulting in a fully linear equation

A function that works very well in practice and whose optimization always converges in one cycle is obtained from comparing the logarithms of scaled measurements (Rae, 1965). First, we define *r* as the logarithm of the unknown scale factor *k*_{s}:

then *P*_{m} is the logarithm of the measured intensity *I*_{m} for a given observation:

where *P*_{s} is the logarithm of a scaled observation.

The intensity error estimate can be used to calculate the error estimate of the logarithm of the intensity and the corresponding weight *u* in the least-squares method:

The logarithm of the geometrical average of symmetry-related observations *P*_{av} [equation (20)] is

where *n* is the index of symmetry-related observations of a unique *hkl*. In the target function, we minimize the squared difference between the logarithms of scaled intensity and the scaled geometrical average of symmetry-related measurements:

The minimum of the target function can be found by using multivariate Newton methods. The first and second derivatives are straightforward to calculate. The first derivative is calculated with respect to *r* and then, using the chain rule, the derivative with respect to parameters *p*_{i} is calculated:

The second derivative with respect to *r* is

Because *r* is a linear function of unknown parameters *p*_{i}:

the first derivative of the target function with respect to *p*_{i} is

It is important to note that the second derivative with respect to *p*_{i} for the logarithmic target function is constant – independent of the parameters *p*_{i} [equations (23) and (25)]. In such a case, the solution is given by a single matrix equation, and there is no need to make initial guesses for parameter values.

#### 3.2. Alternative of scale factors

The traditional approach to finding scale factors is based on an equation where the inverse rather than the logarithm of the scale factor appears and where we use the arithmetic rather than the geometric average of intensities:

where *k*_{s} is the scaling factor, is the estimated error of *I*_{m} and is the mean intensity of reflections for a given *hkl*.

The calculation of the first and then the second derivative with respect to *k*_{s} follows the same logic as in the case of a logarithmic function. First derivative:

and second derivative:

where .

However, the formula

when used as a second derivative in the Newton method, gives better convergence because it better approximates the second derivative at the minimum of the target function.

Since in this case the second derivative is not constant with respect to *k*_{s}, the Newton method has to be used iteratively.

It is not clear if or when scale factors determined by logarithmic or traditional approaches are better, but even in the case when one prefers the traditional approach, it is preferable to run the first cycle with the logarithmic function owing to its superior convergence.

#### 3.3. The description of stabilization based on the prior knowledge of the magnitude of corrections

For reflections with no redundancy, scale factors are extrapolated from other measurements. Generally, without additional restraints, the extrapolation is unstable for highly correlated parameters. In the method presented here, the restraints can be obtained from the prior knowledge about the reasonable magnitude of refined parameters. The logic behind it is the same as that behind the restraints in the atomic *w*_{s} between frames, where expectation about *w*_{s} is a function of the data-collection stability (beam stability, goniostat and/or crystal vibrations). This knowledge is described by adding a penalty term to the functions being optimized:

where *p*_{sj } is a logarithm of the scale factor for batch *j*.

We can similarly treat the expectation about the magnitude of absorption coefficients. For a smooth absorption with the expectation of decreasing magnitude of parameters for high orders of spherical harmonics [equation (21)], a reasonable penalty term added to the target function [equation (21)] is

If we do not want to penalize high-order terms more, we can use the following term:

In a similar fashion, penalty terms can be created and used to restrain all the other scaling parameters.

### 4. Discussion

#### 4.1. Success criteria

The main criteria of scaling effectiveness come from subsequent stages of analysis: merging, phasing *etc*., described elsewhere. During scaling itself, we can only judge the convergence and the value of the target function – overall .

The method described above, based on the logarithmic target function, always converges in one cycle. Thus, the next step – merging – can always be accomplished, even with problematic data. A high value of overall in itself does not differentiate whether it is due to anomalous signal, low quality of data or an unreasonable error model. Comparing scaled symmetry-related data allows for the calculation of various statistics in order to identify potential problems with data quality, with the error model or the existence of non-isomorphism in the diffraction.

Potentially, merging analysis may separate sources of variability in the structure factors owing to anomalous and dispersive effects, radiation-induced changes, non-isomorphism among crystals, and ; Otwinowski & Minor, 2000). Redetermination of the error model, postrefinement and rejection of outliers may impact the assumptions about the data to be scaled, so it maybe worthwhile to redo the scaling after the execution of these steps.

In the case of structure factors not being constant, it is usually better to ignore this variability during determination of the scale factors and to scale the reflections but separate them during merging (Evans, 1993In the following example, the structure factors were stable and the measurement errors were lower than in most experiments. Traditional scaling with scale and temperature factor per batch of data resulted in better than average merging statistics, but still with a rather low value of the significance ratio for ).

from intrinsic sulfur. Application of the above-described scaling corrections improved the merging statistics and resulted in a significant anomalous signal for medium resolution (Table 1#### 4.2. Global and local scaling

Global scaling can be followed by local scaling (Matthews & Czerwinski, 1975). Local scaling is mostly applied to calculate differences of the phasing signal, where it is assumed that a group of measurements, for example those close together in the or in the detector space, should be on a similar scale. A flexible parameterization by the exponential modeling allows for a good description of all kinds of smooth corrections. Local scaling is much more limited in terms of what type of smooth variation is being corrected for, so it is unlikely to provide additional benefit to the general scaling method described here.

#### 4.3. Related problems non-correctable by multiplicative scaling

The basic equation (1) assumes a crystal with a level of imperfection that results in negligible extinction. More perfect crystals require a non-linear correction for extinction. However, crystals often are more than `ideally imperfect' (Guinier, 1994). The departure from crystallinity may result in diffuse scattering, extra modulation (incommensurate structure), stacking disorders, large mosaicity that produces spot overlap, *etc*. The sample may also be contaminated with other crystals or ice. These problems may require further analysis with already scaled data but with results that can impact the assumption about the data to be scaled. As in the case of changes in the error model or outliers rejection, iterations may be needed.

One of the purposes of scaling is to correct for a reasonable level of imperfection in the measurement system. Such a procedure may also attempt to correct severe problems resulting in non-obvious behavior of the determined scale factors. For example, an incorrect or missing description of the beamstop produces similar effects to a very large absorption in an area of

The smoothly varying absorption correction will produce a high estimate of absorption not only for the area behind the beamstop but also for reflections at some distance from the beamstop. Such problems should not be corrected by scaling, but rather by applying the proper experimental procedure.#### 4.4. Detector calibration

The purpose of scale determination can be the calibration of the instrument rather than the *et al.*, 2000).

### 5. Summary

The

from a single-crystal diffraction experiment requires scale factors that relate measured and calculated intensities. We describe here a scaling procedure based on exponential modeling that is very effective in correcting for factors that are difficult to calculate from the physical description of the experiment. Because such factors always exist in macromolecular crystallography owing to the complex geometry of absorption around the crystal and owing to radiation decay, this kind of scaling is in practice necessary. It is also effective in small-molecule crystallography.The power of the described method comes from the absolute stability of the ; Rossmann *et al.*, 1979; Evans, 1993) can be accommodated into an exponential approach. The parameterizations described here are being implemented in *Scalepack* (Otwinowski & Minor, 1997, 2000). This program corrects for a large group of problems that have been observed to be significant, though typically not all problems in every case. The development of the parameterization was driven by the recognition of problems encountered in practice. This process is likely to continue. The remarkable success of applying the corrections discussed in this article requires a description of the problems in the particular experiments, a subject of separate publications.

The parameters in the scaling model should be chosen on the basis of the aim of the experiment and the understanding of what is the most significant scaling problem likely to be encountered. The choice of parameterization is particularly important when obtaining the phasing signal is the target of the experiment. For typically sized crystals, errors in the measurements of strong reflections are mostly multiplicative, so in principle an appropriate scaling model can correct them. Carefully conducted experiments with proper scaling should allow for SAD structure solution from sulfur and other weak scatterers to be more than a rare occurrence.

### Acknowledgements

This work was supported by grant GM53163 from the National Institutes of Health. The authors would like to thank Janet Smith and MinSun LeeSong for providing diffraction data presented here, Halszka Czarnocka for her help with the preparation of the manuscript and Mischa Machius for his extensive comments and stimulating discussions.

### References

Arndt, U. W. & Wonacott, A. J. (1977). *The Rotation Method in Crystallography*. Amsterdam: North Holland.

Azároff, L. (1955). *Acta Cryst.* **8**, 701–704. CrossRef CAS IUCr Journals Web of Science

Blessing, R. H. (1995). *Acta Cryst.* A**51**, 33–38. CrossRef CAS Web of Science IUCr Journals

Boyd, J. P. (1991). *Chebyshev and Fourier Spectral Methods.* Mineola, NY: Dover.

Della-Pietra, S., Della-Pietra, V. & Lafferty, J. (1997). *IEEE Trans. Pattern Anal*. **19**, 380–393. CrossRef Web of Science

Diamond, R. (1969). *Acta Cryst.* A**25**, 43–55. CrossRef CAS IUCr Journals Web of Science

Evans, P. R. (1993). *Data Reduction: Data Collection and Processing*, edited by L. Sawyer, N. Isaacs & S. Bailey. Proceedings of the Daresbury CCP4 Study Weekend, pp. 114–123. Daresbury Laboratory, Warrington, England.

Fox, G. C. & Holmes, K. C. (1966). *Acta Cryst.* **20**, 886–891. CrossRef CAS IUCr Journals Web of Science

Guinier, A. (1994). *X-ray Diffraction in Crystals, Imperfect Crystals, and Amorphous Bodies.* New York: Dover.

Huber, R. & Kopfmann, G. (1969). *Acta Cryst.* A**25**, 143–152. CrossRef CAS IUCr Journals Web of Science

Katayama, C. (1986). *Acta Cryst.* A**42**, 19–23. CrossRef CAS Web of Science IUCr Journals

Kopfmann, G. & Huber, R. (1968). *Acta Cryst.* A**24**, 348–351. CrossRef IUCr Journals Web of Science

Leesong, M., Henderson, B. S., Gillig, J. R., Schwab, J. M. & Smith, J. L. (1996). *Structure*, **4**, 253–264. CrossRef CAS PubMed Web of Science

Leslie, A. G. W. (1999). *Acta Cryst.* D**55**, 1696–1702. Web of Science CrossRef CAS IUCr Journals

Matthews, B. W. & Czerwinski, E. W. (1975). *Acta Cryst*. A**31**, 480–487. CrossRef CAS IUCr Journals Web of Science

Otwinowski, Z. & Minor, W. (1997). *Processing of X-ray Diffraction Data Collected in Oscillation Mode*. In *Methods in Enzymology*, Vol. 276, edited by J. C. W. Carter & R. M. Sweet. New York: Academic Press.

Otwinowski, Z. & Minor, W. (2000). *Denzo and Scalepack*. In *International Tables for Crystallography*, Vol. F, edited by M. G. Rossman & E. Arnold. Dordrecht: Kluwer.

Rae, A. (1965). *Acta Cryst.* **19**, 683–684. CrossRef IUCr Journals Web of Science

Rossmann, M. G., Leslie, A. G. W., Abdel-Meguid, S. S. & Tsukihara, T. (1979). *J. Appl. Cryst.* **12**, 570–581. CrossRef CAS IUCr Journals Web of Science

Schutt, C. E. & Evans, P. R. (1985). *Acta Cryst.* A**41**, 568–570. CrossRef CAS Web of Science IUCr Journals

Stuart, D. & Walker, N. (1979). *Acta Cryst.* A**35**, 925–933. CrossRef CAS IUCr Journals Web of Science

Stupakov, G. & Heifets, S. (2002). *Phys. Rev. Spec. Top.* Ac. 5, No. 054402.

Tate, M. W., Eikenberry, E. F. & Gruner, S. M. (2000). *CCD Detectors*. In *International Tables for Crystallography*, Vol. F, edited by M. G. Rossman & E. Arnold. Dordrecht: Kluwer.

Weissman, L. (1982). *Strategies for Extracting Isomorphous and Anomalous Signals. Computational Crystallography*, edited by D. Sayre. Oxford: Clarendon Press.

© International Union of Crystallography. Prior permission is not required to reproduce short quotations, tables and figures from this article, provided the original authors and source are cited. For more information, click here.