The modular small-angle X-ray scattering data correction sequence

A data correction sequence is presented, consisting of ordered elementary steps that extract the small-angle X-ray scattering cross section from the original detector signal(s). It is applicable to a wide range of samples, including solids and dispersions.


Introduction
Attaining a high standard of data quality is paramount for any detailed analysis. This is of particular importance for small-angle scattering, where the largely featureless scattering patterns may easily be over- or under-fitted by an inexperienced user. Therefore, a consistent set of data corrections, which also puts well-founded uncertainty estimates on the resulting values, is a necessary addition to any small-angle scattering laboratory. Previous work on data correction procedures tended to follow an integral or ad-hoc approach, incorporating a limited subset of the available corrections and offering little flexibility or opportunity for tracing the effects of each individual correction (Stothart, 1987; Strunz et al., 2000; Dreiss et al., 2006).
A modular approach, consisting of a sequence of elementary data correction steps, would allow laboratories to select the subset relevant to their experiments or instruments, while allowing rapid evaluation of the significance of each step. While most of the individual data correction steps that can be considered to achieve high-quality data have been comprehensively collated before (Pauw, 2013), a recommended sequence in which to apply them has not been published. This was due to a previous lack of software that might benefit from such a scheme, and because the sequence was still under development at the time.
With the recent emergence of various comprehensive, modular data correction software packages (Basham et al., 2015; Filik et al., 2017; Arnold et al., 2014; Solé et al., 2017; Benecke et al., 2014; Nielsen et al., 2009; Girardot et al., 2010-2017; Taché et al., 2015-2017), establishing a recommended starting point for implementing such a data correction schema seems pertinent. This schema can be used as the core of a data correction software package, or as a reference correction sequence against which (faster) alternatives can be validated. It is hoped that adherence to this schema will improve the comparability of results obtained at different instruments.
IUCr macros version 2.1.10:2016/01/28

Over the last few years, the schema has been developed, tested and refined in practice on both laboratory- and synchrotron-based SAXS instruments; its modular nature makes it easy to trace and verify the effect of each individual correction step on the detected signal and its uncertainties. In particular, this data correction scheme has been developed for modern instruments, and a direct-detection, photon-counting detector is highly recommended in order to achieve the best results. In this work, the use of a photon-counting detector is implicit, as the data correction steps necessary to compensate for the inadequacies of other detector types have been omitted for brevity.
Herein, the schema is presented, the abbreviations of its individual corrections are briefly described, and the reasoning behind their placement is clarified.

The Schema
The recommended data correction schema for solids and dispersions is presented in Figure 1. For solid samples, this schema results in the scattering power of the solid in absolute units. For dispersions, both the solvent scattering and the sample scattering are obtained in absolute units. The solvent scattering can then be reused for future samples by adding it to a solvent scattering library. The disadvantage of this approach is that the uncertainties of several operations, such as those of the flatfield and polarization corrections, are added twice to the output for dispersions (Output C).
Note that for Processes B and C for dispersions, the capillary signal from Process A that is subtracted should be the same. That means that to obtain the solvent scattering cross-section, the same capillary should be used for Processes A and B, and to obtain the dispersion scattering cross-section, the same capillary should be used for Processes A and C.

The Steps and Reasoning Behind the Sequence
The mathematical expressions for each of the corrections below are described by Pauw (2013). Here, we focus on the justification of the steps and highlight the position dependency of some of them.
• DS (Data Read-in): Before starting any data corrections, the data must be read in correctly, where necessary compensating for the data storage peculiarities (Knudsen et al., 2013).
• MK (Masking): Invalid pixels are masked so they are not considered in the following corrections.
• PU (Poisson Uncertainty Estimator): The Poisson (counting) uncertainty needs to be calculated on the number of detected photons, and therefore is carried out before the deadtime, darkcurrent or flatfield corrections.
• DT (Deadtime): The signal is subsequently corrected for deadtime, returning the estimated number of photons arriving at each pixel based on the detected countrate.
• DC (Darkcurrent): The subtraction of natural background radiation (including the steady flow of cosmic rays) forms the dominant component of the darkcurrent correction. With the photon-counting detector assumed here, no significant contribution of the time-independent and flux-dependent darkcurrent components should be observed.
• TI (Time): A normalization to make the measurement independent of time.
• FL (Flux): A normalization to make the measurement independent of incident beam flux.
• TR (Transmission): A scaling correction, correcting for the probability of absorption (and only absorption) within the sample. The transmission should ideally be calculated by dividing the flux of the transmitted, scattered and diffracted signals by the incident flux. Note that the quality of the result depends very strongly on the quality of the transmission factor (in particular when the background subtraction operation is applied), and an accuracy of better than 99% should be aimed for.
• SA (Self-absorption): The sample self-absorption correction accounts for the increased probability of scattered rays being absorbed as they travel through slightly greater amounts of sample after the scattering event. This correction needs to be performed after the transmission correction: it represents a direction-dependent modification to the transmission correction, and does not replace the TR correction itself. It is feasible to implement and use for samples of plate-like geometry only, with the plate surface perpendicular to the X-ray beam direction. Being related to the transmission correction, it is placed next to it.
• BG (Background subtraction): The background signal is subtracted only after the measurement-dependent corrections have taken place, as the various parameters (transmission, flux, time, and therefore darkcurrent in particular) may differ between the sample and background measurements.
• FF (Flatfield): The flatfield correction, a multiplication matrix normalized to 1, corrects for interpixel sensitivity differences.This is the last of the corrections for detector imperfections.
• AE (Angular Efficiency): This correction compensates for variations in the detector efficiency depending on the photon angle of incidence onto the detector surface.It is detailed in Appendix B.
• SP (Solid-angle): A (geometric) correction for the solid angle subtended by each pixel.This can be calculated on the basis of the instrument geometry alone.
• PO (Polarization): The polarization correction accounts for the effect of polarization on the probability of scattering events, for both polarized and unpolarized beams. In the latter case, it is an azimuthally uniform (isotropic) correction. The polarization correction is performed before the background subtraction for dispersions, so that older solvent measurements can still be used for correction.
• TH (Thickness): The thickness correction normalizes the data to units of reciprocal lengths.Note that the thickness used in this correction is the thickness of the solid sample or the liquid phase for dispersions only.A derivation for this is provided in Appendix C.
• AU (Absolute Units): The absolute units correction scales the data to units of scattering cross-section, the fraction of radiation that is scattered per length of material per solid angle. This is commonly reported in units of reciprocal length per steradian.
• DV (Displaced Volume): This correction was not included in the original work, but is described in Appendix A. It can be applied to dispersions with high volume fractions of analyte, but must be applied to the solvent scattering signal only.
• AV (Averaging): This optional step reduces the dimensionality and size of the data, typically from 2D to a limited number of datapoints in 1D. This can be done azimuthally (to obtain $\frac{d\Sigma}{d\Omega}$ versus $Q$) or radially ($\frac{d\Sigma}{d\Omega}$ versus $\chi$). Azimuthal averaging is suitable for isotropic data, whereas radial averaging is typically applied to anisotropic data, over a limited radial range, to extract the degree of orientation, as is commonly done in fibre diffraction experiments.
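To make the ordering argument for the first steps concrete, the count-based steps (PU, DT, DC) and the scalar normalizations (TI, FL, TR) can be sketched as plain functions that carry the uncertainty along with the data. This is a minimal illustration, not the expressions of Pauw (2013): the non-paralysable deadtime model, the per-pixel dark signal, the neglect of uncertainties on the scalar divisors, and all numbers are assumptions. It does show why PU comes first: the uncertainty is taken from the raw detected counts, and every later step only scales or combines it.

```python
import math

# Minimal sketch of the first, count-based steps (PU, DT, DC) and the scalar
# normalizations (TI, FL, TR) on a flat list of pixels. Every step returns a
# (values, uncertainties) pair so that the uncertainty travels with the data.

def pu(counts):
    """PU: Poisson uncertainty sqrt(N), taken on the raw detected counts."""
    return [float(c) for c in counts], [math.sqrt(c) for c in counts]

def dt(values, sigmas, exposure_time, tau):
    """DT: assumed non-paralysable model, true = measured / (1 - rate * tau)."""
    factors = [1.0 / (1.0 - (v / exposure_time) * tau) for v in values]
    return ([v * f for v, f in zip(values, factors)],
            [s * f for s, f in zip(sigmas, factors)])

def dc(values, sigmas, dark, dark_sigmas):
    """DC: subtract a dark signal scaled to the same exposure time;
    the uncertainties add in quadrature."""
    return ([v - d for v, d in zip(values, dark)],
            [math.hypot(s, ds) for s, ds in zip(sigmas, dark_sigmas)])

def divide(values, sigmas, divisor):
    """TI, FL, TR: division by a scalar (time, flux or transmission).
    The scalar's own uncertainty is neglected here for brevity."""
    return [v / divisor for v in values], [s / divisor for s in sigmas]

# Hypothetical example: four pixels, 10 s exposure, 100 ns deadtime.
v, s = pu([100, 400, 0, 2500])
v, s = dt(v, s, exposure_time=10.0, tau=100e-9)
v, s = dc(v, s, dark=[1.0] * 4, dark_sigmas=[1.0] * 4)
for divisor in (10.0, 1.0e6, 0.5):  # time [s], flux [photons/s], transmission
    v, s = divide(v, s, divisor)
```

Keeping each step as a separate function mirrors the modular idea: any step can be dropped or inspected on its own, and its effect on both the values and the uncertainties can be traced.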
The averaging steps from 2D to 1D are placed last because 1) they are optional, and 2) the background subtraction process in particular can subtract anisotropic signals such as flares. In that case, the uncertainty is improved if the subtraction is performed in 2D rather than after averaging.
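The azimuthal averaging step itself can be sketched as binning of unmasked pixels by their scattering vector. This sketch assumes that $Q$ has already been computed for every pixel and that `mask[k] == True` marks an invalid pixel; the propagated uncertainty $\sqrt{\sum\sigma^2}/n$ is one assumed choice, and the standard error of the mean is a common alternative or complement. Radial averaging would bin in $\chi$ instead of $Q$.

```python
import bisect
import math

def azimuthal_average(q, intensity, sigma, q_edges, mask=None):
    """AV: average unmasked pixels into Q bins [q_edges[b], q_edges[b+1]).

    Returns a list of (mean Q, mean intensity, uncertainty) per non-empty bin.
    The uncertainty propagates the pixel uncertainties, sqrt(sum sigma^2) / n;
    this is an assumed choice (the standard error of the mean is a common
    alternative). mask[k] == True marks pixel k as invalid.
    """
    nbins = len(q_edges) - 1
    acc = [[0.0, 0.0, 0.0, 0] for _ in range(nbins)]  # sum Q, sum I, sum sigma^2, n
    for k, (qk, ik, sk) in enumerate(zip(q, intensity, sigma)):
        if mask is not None and mask[k]:
            continue  # pixel excluded by the MK step
        b = bisect.bisect_right(q_edges, qk) - 1
        if 0 <= b < nbins:  # pixels outside the bin range are ignored
            acc[b][0] += qk
            acc[b][1] += ik
            acc[b][2] += sk * sk
            acc[b][3] += 1
    return [(sq / n, si / n, math.sqrt(ss) / n)
            for sq, si, ss, n in acc if n > 0]

# Hypothetical pixels with precomputed Q values; the fourth pixel is masked.
profile = azimuthal_average(
    q=[0.10, 0.12, 0.20, 0.22, 0.50],
    intensity=[10.0, 12.0, 5.0, 7.0, 1.0],
    sigma=[1.0, 1.0, 1.0, 1.0, 1.0],
    q_edges=[0.0, 0.15, 0.30],
    mask=[False, False, False, True, False],
)
```

Because the bins are filled only after all 2D corrections (including the background subtraction) have been applied, anisotropic background features are removed pixel by pixel before they can be smeared into an azimuthal mean.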
Many of the corrections are multiplications, and are therefore commutative. The corrections that can be commuted have been grouped together where this is reasonable, such that reordering them within a group would not affect the result. This commutability becomes clear when we write Process B as a pseudo-equation, with → indicating a more involved operation, − indicating a subtraction, and × indicating a multiplication with either a scalar or a vector:

$$\mathrm{DS} \rightarrow \mathrm{MK} \rightarrow \mathrm{PU} \rightarrow \mathrm{DT} - \mathrm{DC} \times \mathrm{TI} \times \mathrm{FL} \times \mathrm{TR} \times \mathrm{SA} - \mathrm{BG} \times \mathrm{FF} \times \mathrm{AE} \times \mathrm{SP} \times \mathrm{PO} \times \mathrm{TH} \times \mathrm{AU}$$

In order to further reduce the propagated uncertainties, the sequence for dispersions could be modified by postponing the flatfield, angular efficiency, polarization and solid-angle corrections in Processes B and C until after the second background subtraction in Process C has been performed. This would reduce the uncertainties, as they are then added only once instead of twice (for the sample only, as opposed to both solvent and sample). Furthermore, if a flow-through cell is used, the thickness correction and absolute intensity scaling can also be postponed. While this seems desirable, the penalty is a drastic loss of generality: in this case, the solvent scattering signal is no longer obtained in absolute units, reducing its value for future use. Conversely, in the recommended scheme, any stored solvent signal can be used for future data correction of an appropriate dispersion, significantly reducing overhead. It is, therefore, recommended to determine the flatfield, polarization and instrumental geometry with sufficient accuracy, so that the resultant increase in uncertainty can be kept to a minimum.
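The grouping argument can be checked with a short sketch. The solid-angle and polarization factors below use standard textbook expressions for a flat detector perpendicular to the direct beam (an assumption here; the paper defers to Pauw (2013) for the exact forms), and the flatfield multiplier, pixel size and distance are hypothetical. Applying a group of multiplicative corrections in either order gives the same result, whereas a background subtraction does not commute with scalings that differ between the sample and background measurements.

```python
import math
from functools import reduce

def solid_angle(two_theta, pixel_area, distance):
    """SP: solid angle of a flat pixel on a detector perpendicular to the
    direct beam, (A / L^2) * cos^3(2θ); small-pixel approximation."""
    return pixel_area / distance ** 2 * math.cos(two_theta) ** 3

def polarization(two_theta, chi, horizontal_fraction=0.5):
    """PO: polarization factor; horizontal_fraction = 0.5 is an unpolarized
    beam, for which this reduces to (1 + cos^2 2θ) / 2 for every χ."""
    s2 = math.sin(two_theta) ** 2
    return (horizontal_fraction * (1.0 - s2 * math.cos(chi) ** 2)
            + (1.0 - horizontal_fraction) * (1.0 - s2 * math.sin(chi) ** 2))

def apply(value, multipliers):
    """Apply multiplicative corrections one after another."""
    return reduce(lambda acc, m: acc * m, multipliers, value)

# Hypothetical pixel at 2θ = 5°, χ = 30°, 172 µm pixels at 2 m.
tt, chi = math.radians(5.0), math.radians(30.0)
multipliers = [1.0 / solid_angle(tt, 172e-6 ** 2, 2.0),
               1.0 / polarization(tt, chi),
               1.03]  # hypothetical flatfield multiplier
forward = apply(123.0, multipliers)
backward = apply(123.0, multipliers[::-1])

# A subtraction does not commute with scalings that differ between the
# sample and background measurements (here, different transmissions):
sample, background, t_s, t_b = 200.0, 80.0, 0.5, 0.9
subtract_after = sample / t_s - background / t_b   # BG after TR (as in the schema)
subtract_before = (sample - background) / t_s      # BG before TR
```

The two orders agree to within floating-point rounding, while the two subtraction variants give clearly different values, which is why BG sits after the measurement-dependent group.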

A further practical modification
In practice, the flux and transmission corrections can be combined. We define the transmission factor $T = I_1/I_0$, with the incident flux denoted as $I_0$, and the emergent flux (the sum of the transmitted, scattered and diffracted radiation) as $I_1$. Then, defining the prior detected signal $I_p(Q)$ and the flux- and transmission-corrected signal $I_c(Q)$, we get:

$$I_c(Q) = \frac{I_p(Q)}{I_0\,T} = \frac{I_p(Q)}{I_1}$$

Combining these operations ostensibly negates the need for an upstream intensity monitor, to the great relief of many instrument scientists. However, as the transmission factor still needs to be known for the self-absorption correction, their elation is likely to be short-lived.
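A small sketch shows both halves of this argument: dividing by $I_1$ alone reproduces the separate FL and TR steps, but the self-absorption correction still needs $T$ itself. The SA factor used here, $(1-e^{-a})/a$ with $a = -\ln T\,(1/\cos 2\theta - 1)$, is an assumption that holds for a homogeneous plate perpendicular to the beam, where $-\ln T = \mu D$; the flux values are hypothetical.

```python
import math

def flux_transmission(prior, emergent_flux):
    """Combined FL + TR: divide by I1 = I0 * T (the emergent flux)."""
    return [i / emergent_flux for i in prior]

def self_absorption(transmission, two_theta):
    """SA for a homogeneous plate perpendicular to the beam (assumed):
    relative attenuation (1 - exp(-a)) / a, a = mu*D * (1/cos(2θ) - 1),
    with mu*D = -ln(T). Note that this needs T itself, not only I1."""
    a = -math.log(transmission) * (1.0 / math.cos(two_theta) - 1.0)
    return 1.0 if a == 0.0 else (1.0 - math.exp(-a)) / a

i0, i1 = 1.0e6, 4.0e5        # hypothetical incident and emergent flux
prior = [50.0, 20.0]
separate = [i / i0 / (i1 / i0) for i in prior]   # FL, then TR
combined = flux_transmission(prior, i1)          # FL and TR in one step
# SA correction at 2θ = 10°: divide by the relative attenuation
corrected = [c / self_absorption(i1 / i0, math.radians(10.0)) for c in combined]
```

The combined and separate results are identical, yet `self_absorption` cannot be evaluated without the ratio `i1 / i0`, so an incident-flux measurement (or an equivalent transmission measurement) remains necessary.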

Instrumental effects for consideration in the analysis rather than the data corrections
There are some effects which are, unfortunately, best considered in the scattering pattern analysis procedure rather than in the data correction procedure. There are three such effects: the resolution function smearing, the multiple scattering effect, and the energy dependence of the scattering length density contrast. We will discuss each of these briefly.
The resolution effect originates from the uncertainty in the scattering vector for each individual photon. Some of the origins of these uncertainties are well-defined, such as finite beam size and divergence, and the scattering vectors for an ensemble of photons will, therefore, exhibit a well-defined spread. This spread is known as the resolution function, and it can, in principle, be corrected for. The procedure to do this can be likened to a "sharpening" procedure in image processing, and carries the risk of introducing artifacts due to its ill-posed nature. As the effect is more prominent in the neutron scattering field, a workable solution has already been developed there: the mathematically safer method is to include the resolution function in the analysis. By convolving ("smearing") the model intensity with the resolution function, the problem becomes tractable, and the resolution can be taken into account without reservation (Rennie et al., 2013).
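The "smear the model, not the data" approach can be sketched as a direct quadrature of the model intensity over a resolution function. Here a homogeneous sphere serves as an example model, and the resolution function is assumed to be a Gaussian of constant width $\sigma_Q$; real instruments generally require a $Q$-dependent and possibly non-Gaussian resolution, as discussed by Rennie et al. (2013). Units of $Q$ and $R$ are assumed to be reciprocal and direct nanometres, respectively.

```python
import math

def sphere_intensity(q, radius=50.0):
    """Form factor of a homogeneous sphere (model), normalized to 1 at q = 0."""
    x = abs(q) * radius
    if x < 1e-6:
        return 1.0
    return (3.0 * (math.sin(x) - x * math.cos(x)) / x ** 3) ** 2

def smear(model, q_points, sigma_q, n_nodes=61, width=4.0):
    """Convolve model(q) with a Gaussian resolution function of constant
    width sigma_q (assumed), by direct quadrature over +/- width * sigma_q."""
    out = []
    for q in q_points:
        num = den = 0.0
        for k in range(n_nodes):
            dq = (-width + 2.0 * width * k / (n_nodes - 1)) * sigma_q
            w = math.exp(-0.5 * (dq / sigma_q) ** 2)
            num += w * model(q + dq)
            den += w
        out.append(num / den)
    return out

q_min = 4.4934 / 50.0             # first minimum of the sphere form factor
ideal = sphere_intensity(q_min)
smeared = smear(sphere_intensity, [q_min], sigma_q=0.005)[0]
```

In a fit, `smear(model, q_data, sigma_q)` would replace `model(q_data)`, so the measured data are never deconvolved; the minimum of the model is partially filled in, as it would be in the measurement.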
The same holds for the multiple scattering contribution (Warren & Mozzi, 1966). This is the probability that photons are scattered twice or more, and is directly related to the scattering probability of the material at the energy used, and to its thickness. The multiple scattering contribution is hard to correct for in the original data. It is much easier to convolve the model scattering pattern with the multiple scattering effect, weighted by its likelihood, and to take it into account in that manner (Rennie et al., 2013).
The last effect is the energy dependence of the scattering length density contrast.
This energy dependence implies that, while the scattering vector is defined independently of the energy, the scattering intensity will still depend on it, particularly near absorption edges. There is, to our knowledge, no current solution for this, and information on the energy used must, therefore, always accompany a scattering curve.

Conclusions
We have presented a comprehensive data correction sequence, which can be used as the core of a software implementation, or as a reference correction sequence against which other, faster implementations can be validated. The sequence is chosen so that it returns useful information, in particular for dispersions, where the absolute scattering signals of both the solvent and the analyte are obtained as separate output signals.
By presenting this schema, we hope to encourage unity and consistency in the worldwide data correction efforts, to the betterment of the small-angle X-ray scattering […]

[…] to the signal from the solvent. Proteins in solution and micellar systems are a prime example, but dispersed polymers and vesicles may also be affected.

Appendix B The Angle-dependent efficiency correction: AE
One additional correction can be considered, which takes into account the variation in detection probability of a photon passing through the detection layer at various angles (Zaleski et al., 1998). When a photon passes through the detector perpendicular to the sensor surface, its detection probability is proportional to the absorption probability. This is, then, a function of the linear absorption coefficient (and thus of the photon energy and sensor material) and of the thickness of the sensor layer. If the photon passes through the detector layer at an angle, the amount of material it traverses is greater, and the detection probability increases. This means that the detection efficiency is greater for photons arriving at oblique angles than for photons arriving perpendicular to the surface. This angle-dependent efficiency correction could be considered part of the flat-field response correction of the detector. Its source, however, is not a detector imperfection, but lies in the instrument geometry coupled with the detector sensor thickness, and it can, therefore, be considered separately. Since its magnitude can be easily estimated, it is straightforward to take into account. If we rewrite the derivation from Zaleski et al. (1998) to let $K$ represent the energy-absorption efficiency of a detector sensor of thickness $d$ as a function of the angle of incidence $\alpha$ of a photon to the detector surface normal, we get:

$$K(\alpha) = 1 - \exp\left(-\frac{\mu_{en}\, d}{\cos\alpha}\right)$$

where $\mu_{en}$ is the energy-absorption coefficient of silicon for a given energy, i.e. the mass energy-absorption coefficient multiplied by the density of silicon.
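The magnitude of this effect is easy to estimate numerically. The sketch below evaluates $K(\alpha) = 1 - \exp(-\mu_{en} d/\cos\alpha)$ and rescales an observed intensity by $K(0)/K(\alpha)$, i.e. to the efficiency at normal incidence; this normalization choice, the value $\mu_{en} d = 0.5$, and the assumption that $\alpha$ equals $2\theta$ (flat detector perpendicular to the direct beam) are all illustrative assumptions rather than values from the text.

```python
import math

def efficiency(alpha, mu_d):
    """K(alpha) = 1 - exp(-mu*d / cos(alpha)): absorption (detection)
    probability in a sensor layer, with mu_d the product mu_en * d."""
    return 1.0 - math.exp(-mu_d / math.cos(alpha))

def angular_efficiency_correction(i_obs, alpha, mu_d):
    """AE: rescale an observed intensity to the efficiency at normal
    incidence, i.e. multiply by K(0) / K(alpha) (normalization assumed)."""
    return i_obs * efficiency(0.0, mu_d) / efficiency(alpha, mu_d)

# Hypothetical, weakly absorbing sensor with mu_en * d = 0.5. For a flat
# detector perpendicular to the beam, alpha equals the scattering angle 2θ.
for deg in (0.0, 10.0, 20.0, 30.0):
    alpha = math.radians(deg)
    print(f"{deg:4.0f} deg  K = {efficiency(alpha, 0.5):.4f}  "
          f"correction = {angular_efficiency_correction(1.0, alpha, 0.5):.4f}")
```

For a thick, strongly absorbing sensor ($\mu_{en} d \gg 1$) both efficiencies approach 1 and the correction becomes negligible, which is consistent with the effect mattering mostly for thin sensors or high photon energies.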
The correction of the observed intensity $I_{obs}(Q)$ to the corrected intensity $I_{corr}(Q)$ then becomes:

$$I_{corr}(Q) = I_{obs}(Q)\,\frac{K(0)}{K(\alpha)}$$

[…] much larger than the thickness of the sample.

C.1.2. Geometric definitions
Fig. 4. Schematic overview of the definitions used in the derivation of the background correction for dispersions "sandwiched" between two container walls.
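The upstream sample container wall is denoted by the subscript 1, the sample by the subscript 2, and the downstream sample container wall by the subscript 3. The following definitions are made (cf. Figure 4):

• $D_n$: the thickness of phase $n$

The beam intensities entering and exiting the various phases therefore work out as:

$$I_1 = I_0 \exp(-\mu_1 D_1), \qquad I_2 = I_1 \exp(-\mu_2 D_2), \qquad I_3 = I_2 \exp(-\mu_3 D_3)$$

C.2.2. Absorption of scattered beam by subsequent components

Components located downstream of the scattering event absorb the scattered radiation over a path length slightly larger than that of the unscattered beam. The length of travel of the scattered photon through a subsequent phase $n$ is defined as:

$$l_n = \frac{D_n}{\cos(2\theta)}$$

The transmission factor $T$ of scattered radiation through subsequent phases therefore is:

$$T_n(2\theta) = \exp\left(-\frac{\mu_n D_n}{\cos(2\theta)}\right)$$

C.2.3. Intensity of the scattered beam in the scattering component

The scattered intensity and the direction-dependent transmission factor have been derived elsewhere (Pauw, 2013), where, for a component of thickness $D$, linear absorption coefficient $\mu$ and scattering probability $P$, entered by a primary beam of intensity $I_{in}$, it was found to be:

$$I_s = I_{in}\, P\, \frac{\exp\left(-\mu D/\cos(2\theta)\right) - \exp(-\mu D)}{\mu\left(1 - 1/\cos(2\theta)\right)}$$

For the initial derivation, however, we do not consider the scattering-angle-dependent increase in material path length, so that the term $\cos(2\theta) = 1$.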

C.2.4. Intensity scattered from component phases
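The scattered intensities of the individual components are defined as follows:

$$I_{s,n} = I_{n-1}\, P_n\, \frac{\exp\left(-\mu_n D_n/\cos(2\theta)\right) - \exp(-\mu_n D_n)}{\mu_n\left(1 - 1/\cos(2\theta)\right)}, \qquad n = 1, 2, 3$$

With $\cos(2\theta) = 1$, this simplifies to:

$$I_{s,n} = I_{n-1}\, P_n\, D_n \exp(-\mu_n D_n)$$

C.2.5. Intensity scattered from the total

The total scattered intensity is the sum of the scattering from all three components in the beam, attenuated by their subsequent phases.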
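Assuming phases 1 and 3 are identical, this simplifies to:

$$I_s = I_0 \exp(-2\mu_1 D_1 - \mu_2 D_2)\left(2 P_1 D_1 + P_2 D_2\right) = I_0\, T_s \left(2 P_1 D_1 + P_2 D_2\right)$$

where $T_s = \exp(-2\mu_1 D_1 - \mu_2 D_2)$ is the transmission factor of the filled cell.

C.2.6. Determining $P_1$

Before we can continue, we must determine $P_1$. We do this in a background measurement, by measuring the scattering from the empty cell $I_b$ (in practice, the cell is ideally evacuated, although the signal from air is assumed to be negligible). This implies that $P_2$ and $\mu_2$ are both zero, as this phase is not present in the measurement. We then obtain $P_1$ from Equation 12:

$$I_b = 2\, I_{0,b}\, P_1 D_1 \exp(-2\mu_1 D_1) = 2\, I_{0,b}\, T_b\, P_1 D_1$$

with $I_{0,b}$ the primary beam intensity during the background measurement and $T_b = \exp(-2\mu_1 D_1)$ the transmission factor of the empty cell. (NB: The first factor 2 originates from considering the upstream and downstream walls separately.) So that:

$$P_1 D_1 = \frac{I_b}{2\, I_{0,b}\, T_b}$$

C.2.7. Extracting $P_2$

Finally, we want to find the scattering probability $P_2$ of phase 2 (which is what we are really after), by rearranging equation 13:

$$P_2 = \frac{1}{D_2}\left(\frac{I_s}{I_0\, T_s} - \frac{I_b}{I_{0,b}\, T_b}\right)$$

Thus, even when we thoroughly consider the scattering process of a sample sandwiched between two sample cell walls, we arrive at a simple equation for determining the sample scattering probability from the total measured intensity.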

C.3. Final remarks
There are interesting aspects to using this background subtraction equation in practice. Firstly, it is not necessary to determine the sample cell wall thickness $D_1$. Secondly, both the sample measurement and the background measurement are normalized to the thickness of the sample phase $D_2$ only. Lastly, this is, of course, only valid if the same sample cell is used for both the background and the sample measurement.
Equation 17, as derived, is represented in the modular data correction sequence shown in Figure 1: the thickness correction occurs after the background subtraction, whereas the transmission and incident flux corrections are applied before it.
The same background equation also works for simpler cases, for example when measuring a solid sample with an empty background.
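The sandwich-cell result can be checked numerically with a forward model built from the same assumptions (plane layers, single scattering, $\cos(2\theta) = 1$). The sketch below is such a consistency check, not the authors' code, and the layer values are hypothetical. It accumulates the scattered intensity layer by layer, each contribution attenuated by the phases after it, and then applies $P_2 = \frac{1}{D_2}\left(\frac{I_s}{I_0 T_s} - \frac{I_b}{I_{0,b} T_b}\right)$: flux and transmission corrections before the subtraction, thickness normalization after it.

```python
import math

def scattered_intensity(i0, layers):
    """Single-scattered intensity from a stack of plane layers at
    cos(2θ) = 1. Each layer is (D, mu, P): thickness, linear absorption
    coefficient and scattering probability per unit length.
    Returns (I_s, T): the total scattered intensity and the transmission."""
    trans = [math.exp(-mu * d) for d, mu, _ in layers]
    total, entering = 0.0, i0
    for n, (d, mu, p) in enumerate(layers):
        contribution = entering * p * d * trans[n]   # I_{s,n} at cos(2θ) = 1
        for t_after in trans[n + 1:]:                 # absorbed by later phases
            contribution *= t_after
        total += contribution
        entering *= trans[n]                          # I_n = I_{n-1} exp(-mu_n D_n)
    return total, entering / i0

def extract_p2(i_s, i0_s, t_s, i_b, i0_b, t_b, d2):
    """P_2 = (I_s / (I0_s T_s) - I_b / (I0_b T_b)) / D_2:
    flux and transmission corrections before the subtraction,
    thickness normalization after it. D_1 is not needed."""
    return (i_s / (i0_s * t_s) - i_b / (i0_b * t_b)) / d2

# Hypothetical cell: walls (D_1, mu_1, P_1) and sample (D_2, mu_2, P_2), in mm.
wall, sample, vacuum = (0.01, 2.0, 0.3), (1.0, 0.1, 0.05), (1.0, 0.0, 0.0)
i_s, t_s = scattered_intensity(1.0e6, [wall, sample, wall])
i_b, t_b = scattered_intensity(5.0e5, [wall, vacuum, wall])  # empty cell
p2 = extract_p2(i_s, 1.0e6, t_s, i_b, 5.0e5, t_b, d2=1.0)
```

The recovered `p2` equals the input value of 0.05 even though the incident intensities differ between the two measurements and the wall thickness is never passed to `extract_p2`, in line with the remarks above.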

Fig. 1. The recommended data correction sequence for solids (Processes A and B) or dispersions (Processes A-C). Output B for solids is the corrected data in absolute units; for dispersions, it is the solvent scattering in absolute units. Output C for dispersions is the sample scattering in absolute units. The azimuthal averaging step can be considered for isotropically scattering samples.

• $t$: the running variable of distance traveled through all phases
• $t_0$: position at the start of the upstream sample container component
• $t_1$: position at the start of the sample component (end of the upstream sample container component)
• $t_2$: position at the start of the downstream sample container component (end of the sample component)
• $t_3$: position at the end of the downstream sample container component
• $P_n(2\theta)$: the scattering probability of phase $n$
• $I_0(t)$: the primary beam intensity at position $t$
• $I_s(t)$: the scattered intensity at position $t$
• $I_0$: the primary beam intensity
• $I_1$: the primary beam intensity entering the sample phase
• $I_2$: the primary beam intensity entering the downstream sample container component
• $I_3$: the primary beam intensity after absorption through all of the components
• $\mu_n$: the linear absorption coefficient of phase $n$
• $2\theta$: the angle of the scattered radiation
• $T_n$: the transmission factor of a given phase or set of phases

C.2. The derivation

C.2.1. Absorption of the unscattered beam

X-ray absorption is defined as:

$$\frac{\mathrm{d}I_0(t)}{\mathrm{d}t} = -\mu_n\, I_0(t), \qquad t_{n-1} \le t \le t_n$$

Substituting the components of equation 11 with the equations from 10, 5 and 7, we get for the total scattered intensity of both sandwich-cell walls and the intermediate sample:

$$I_s = I_0 \exp(-\mu_1 D_1 - \mu_2 D_2 - \mu_3 D_3)\left(P_1 D_1 + P_2 D_2 + P_3 D_3\right)$$