research papers
Mean weighted residuals reveal systematic overestimation of Bragg intensities in single-crystal diffraction
aDataQ Intelligence UG, Fichtelgebirgsstrasse 66, Germany, bDepartment of Chemistry, Materials and Chemical Engineering, Politecnico di Milano, Via Bassini 6, 20133, Milano, Italy, and cInstitute of Inorganic and Analytical Chemistry, Goethe University Frankfurt, Max-von-Laue Str. 7, 60438 Frankfurt am Main, Germany
*Correspondence e-mail: [email protected], [email protected]
The mean value of weighted residuals (〈ζ〉) was analysed for 8424 published single-crystal X-ray data sets of crystals containing only light elements (C, H, N, O). A striking asymmetry was observed: 71.5% of data sets exhibit positive 〈ζ〉 values, occurring 2.5 times more often than negative values. This imbalance suggests systematic errors, with evidence pointing to a slight overestimation of observed intensities (Iobs). Simulations and theoretical analysis show that such overestimation artificially lowers common data-quality metrics, including the popular merging factor Rmerge, the redundancy independent factor Rr.i.m., the precision indicating factor Rp.i.m., the weighted agreement factor wR(F2) and even atomic displacement parameters, creating a `rewarding error' that may reinforce confirmation bias. Experimental data confirm these findings, as residual factors reach their minima for 〈ζ〉 > 0 rather than at zero. These results highlight the need for critical evaluation of data-processing strategies and caution against relying solely on conventional agreement factors as indicators of accuracy.
Keywords: systematic errors; rewarding errors; metrics; single-crystal diffraction.
1. Introduction
Systematic errors in single-crystal diffraction are of general importance as the description and tracking of errors can be used to continuously improve the accuracy of the experiments, to validate data-acquisition and data-processing steps, to adjust parameter values in data-integration steps, to expose misconceptions, to validate new approaches and changes in hard- and software, and to improve correction procedures such as absorption and extinction models, as well as supporting modelling in challenging developing fields such as electron diffraction. In the recent past, the mean value of the weighted residuals 〈ζ〉, ζ = (Iobs − Icalc)/σ(Iobs), has been found to be a helpful data descriptor. A significant deviation from zero indicates the presence of systematic errors, which is frequently the case (Henn, 2019). The significance of the deviation from zero is calculated by dividing 〈ζ〉 by the standard deviation of the mean value σ(〈ζ〉). The standard deviation of the mean value is given by the square root of the unbiased sample variance over the number of reflections, $\sigma(\langle\zeta\rangle) = [\mathrm{var}(\zeta)/N_{\rm obs}]^{1/2}$, with the unbiased sample variance $\mathrm{var}(\zeta) = (N_{\rm obs} - 1)^{-1} \sum_{i=1}^{N_{\rm obs}} (\zeta_i - \langle\zeta\rangle)^2$. This definition for the significance of the mean value of the weighted residuals is analogous to the definition of the significance of a redundantly measured observed reflection, where the standard deviation of the mean value (as opposed to the standard deviation of the sample) is obtained by dividing the unbiased sample variance by the redundancy and taking the square root.
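The definitions above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `mean_weighted_residual` and the four-reflection example are hypothetical and not taken from the study.

```python
import math
import statistics


def mean_weighted_residual(i_obs, i_calc, sigma):
    """Return <zeta>, sigma(<zeta>) and the significance <zeta>/sigma(<zeta>)."""
    # Weighted residual for every reflection: zeta = (Iobs - Icalc) / sigma(Iobs)
    zeta = [(o - c) / s for o, c, s in zip(i_obs, i_calc, sigma)]
    mean = statistics.fmean(zeta)
    # statistics.variance is the unbiased sample variance (divides by N - 1)
    sigma_mean = math.sqrt(statistics.variance(zeta) / len(zeta))
    return mean, sigma_mean, mean / sigma_mean


# Hypothetical four-reflection example, chosen so that zeta = [1, 0, 2, 1]
mean, sigma_mean, significance = mean_weighted_residual(
    i_obs=[11.0, 10.0, 14.0, 6.0],
    i_calc=[10.0, 10.0, 10.0, 5.0],
    sigma=[1.0, 1.0, 2.0, 1.0],
)
print(mean, sigma_mean, significance)
```

A data set would be flagged as significantly shifted when the absolute value of the returned significance exceeds three.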
It was previously found that 〈ζ〉 tends to positive values. In a sample of 127 data sets published with IUCrData (https://iucrdata.iucr.org/x/), 52% of the data sets showed a significant positive deviation of the mean value of the weighted residuals from zero (Henn, 2019). These findings were later confirmed with an even larger sample of over 300 data sets from IUCrData (unpublished work). A positive shift of the residuals was connected earlier to several causes such as unrecognized low-energy contamination, unrecognized and disorder problems (Domagala et al., 2023).
It will be shown in this study how overestimation of observed intensities affects different metrics such as the merging R factor Rmerge, the redundancy independent merging factor Rr.i.m., the precision indicating merging factor Rp.i.m. (Weiss, 2001) and the weighted agreement factor wR(F2): it leads to an artificial lowering of the studied metrics.
When aiming for high data quality, researchers may adjust various data-integration parameters. Success is then judged by monitoring commonly used metrics. These metrics can create the impression that quality is improving. In reality, after a certain point, the actual data quality begins to decline – even though the metrics continue to suggest improvement. Only after model refinement may these errors become apparent, for example by a systematic positive shift of the residuals, provided these traces of systematic errors are actively searched for. This systematic shift may be so small for individual data sets that it remains well below the noise level; however, with many data sets in the sample, the shift is clearly exposed.
When certain systematic errors (such as a slight overestimation of observed intensities) lead to seemingly higher data quality with respect to certain metrics [such as Rr.i.m., Rp.i.m., Rmerge, R, wR(F2), Uij etc.], this is called a rewarding error with respect to the mentioned data quality metrics as it leads to an appreciated result. Rewarding errors are a particularly important class of errors since they meet the desired expectation of high data quality of the user. Therefore, they are less likely to be questioned (confirmation bias) and may occur frequently but may remain undetected for decades. When crystallographic software developers unintentionally and unknowingly fall for confirmation bias, this can also lead to undetected methodological issues.
The present work aims at confirming the tendency to positive residuals for a much larger sample of published data sets comprising only light elements, in order to minimize the impact of absorption correction errors discussed previously (Henn, 2025). Additionally, a possible explanation is offered by proposing that slight overestimation of Iobs on average is likely to be a cause of the shift of the residuals towards positive values.
2. The data
A total of 8424 crystallographic data sets containing only C, H, O and N were downloaded from the Crystallography Open Database (COD; Vaitkus et al., 2023; Mesto et al., 2013; Vaitkus et al., 2021; Quirós et al., 2018; Merkys et al., 2016; Gražulis et al., 2015; Gražulis et al., 2012; Gražulis et al., 2009; Downs & Hall-Wallace, 2003). The CIF tag _exptl_absorpt_process_details was used to determine the absorption correction processing software. Data processed with different releases of SADABS (Krause et al., 2015), SORTAV (Blessing, 1987; Blessing, 1997; Blessing, 1995), Rigaku/Oxford Diffraction and Stoe & Cie software were included in the sample. The overwhelming majority of structure models use the independent atom model. It is known that the quoted software version may sometimes be older than the version with which the data were actually obtained. No attempts were made to identify such cases. Most data sets were processed with one out of many releases of SADABS (N = 6781), with SADABS 1996 (N = 1919) having the largest share. Different releases of CrysAlis PRO [CrysAlis PRO Agilent releases from 2010 (N = 76), 2011 (N = 95), 2012 (N = 73), 2013 (N = 53) and 2014 (N = 80); CrysAlis PRO Oxford Diffraction releases from 2009 (N = 81) and 2010 (N = 71); CrysAlis PRO Rigaku OD 2015 (N = 72)], CrysAlis RED [CrysAlis RED Oxford Diffraction 2006 (N = 47), 2007 (N = 51), 2008 (N = 42) and 2009 (N = 83)] and CrystalClear [CrystalClear Rigaku MSC 2005 (N = 66) and CrystalClear Rigaku 2005 (N = 201)] add up to a total of 1091 Rigaku-associated data sets. Two releases of SORTAV follow [Blessing (1995) (N = 199) and Blessing (1997) (N = 33)], with a total of 232 data sets, and finally, X-RED32 [Stoe & Cie 2002 (N = 235)].
The resolution limit as recalculated from θmax and the wavelength ranges between 0.4476 and 1.1744 Å⁻¹, with mean value 0.6402 Å⁻¹ and median value 0.6276 Å⁻¹. Fifty per cent of all data sets have a maximum resolution in the range 0.6024–0.6601 Å⁻¹, with 25% of all data sets having a maximum resolution below and 25% above this range (see Table 1).
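As an illustration of how a resolution limit in Å⁻¹ can be recalculated from θmax and the wavelength, the following sketch assumes that the quoted values correspond to sin(θmax)/λ; the text does not state the formula explicitly, and the function name and the example values are hypothetical.

```python
import math


def resolution_limit(theta_max_deg, wavelength):
    """Return sin(theta_max)/lambda in inverse angstroms (assumed definition)."""
    return math.sin(math.radians(theta_max_deg)) / wavelength


# Hypothetical example: Mo K-alpha radiation (0.71073 A) and theta_max = 25 degrees
limit = resolution_limit(25.0, 0.71073)
print(f"sin(theta_max)/lambda = {limit:.4f} A^-1")
```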
‡Includes releases from Agilent and Oxford Diffraction: CrysAlis PRO Agilent 2010, CrysAlis PRO Agilent 2011, CrysAlis PRO Agilent 2012, CrysAlis PRO Agilent 2013, CrysAlis PRO Agilent 2014, CrysAlis PRO Oxford Diffraction 2009, CrysAlis PRO Oxford Diffraction 2010, CrysAlis PRO Rigaku OD 2015, CrysAlis RED Oxford Diffraction 2006, CrysAlis RED Oxford Diffraction 2007, CrysAlis RED Oxford Diffraction 2008, CrysAlis RED Oxford Diffraction 2009, CrystalClear Rigaku MSC 2005, CrystalClear Rigaku 2005.
The number of observed intensities ranges between 314 from a low-temperature redetermination of metaldehyde (tetragonal I4) at 150 K [COD 2205445, Barnett et al. (2005)] and 38082 from a structure in monoclinic space group P21/n containing a pyrene derivative C34H37N [COD 2240967, Thekku Veedu & Techert (2015)], measured at 100 K and refined as a non-merohedral twin. Only two data sets show a zero weighting scheme parameter a = 0 [N,N′-bis(3-methylphenyl)succinamide dihydrate, monoclinic space group P21/c, COD 2230904, Saraswathi et al. (2011); a polymorph of butobarbital, monoclinic space group P21/c, COD 2016360, Gelbrich et al. (2007)]. A total of 1057 data sets show weighting scheme parameter b > 1; the largest values are 36.3175 [ethyl 4-butylamino-3-nitrobenzoate, monoclinic space group C2/c, COD 2222973, Narendra Babu et al. (2009)] and 38.7799 from a study of peptide nanotubes with flexible pores and disordered solvent [COD 2103455, Görbitz (2002)].
The maximum crystal dimension lies between 0.03 mm [COD 2230768, Ismiyev (2011); COD 2218750, Liang & Qu (2008); COD 2238399, Chetioui et al. (2013); COD 2022704, Aristov et al. (2023)] and 10.28 mm [benzohydrazide, C25H29N3O, obtained with Cu Kα radiation at 296 K, COD 2234318, Bhat et al. (2012)]. The weighted agreement factor wR(F2) lies between 0.0483 [polymorph of myo-inositol, orthorhombic space group Pna21, T = 180 K, COD 2212154, Khan et al. (2007)] and 0.3664 [hexamethylenetetramine, C6H12N4·2C8H8O2, monoclinic space group P21/n, T = 173 K, COD 2104946, Lemmerer (2011)], with in total 19 data sets with wR(F2) > 0.3000 and 483 data sets with wR(F2) > 0.2000.
3. Mean and median of the weighted residuals increase simultaneously
Fig. 1(a) shows the histogram for the mean value of the weighted residuals 〈ζ〉 from the above-described sample of N = 8424 structures with light atoms C, H, O and N only. The brackets indicate averaging of the weighted residuals over individual data sets such that for every crystallographic data set one mean value of the weighted residuals 〈ζ〉 is obtained. The mean value over all data sets, $\overline{\langle\zeta\rangle} = 0.0521$, where the overline now indicates averaging over all data sets in the sample, is slightly larger than zero. This may appear to be a small value; however, Fig. 1(b) shows the histogram of the significance of the deviation from zero for the 8424 analysed data sets. A minority of only 45.45% are within the boundaries of ±3σ, as indicated by the black vertical lines; 41.79% of all data sets show a significance larger than plus three, and 12.76% show a significance less than minus three. Positive outliers appear 3.27-fold more frequently than negative outliers and in total the `outliers' are the majority. In Fig. 1(c), the median of the weighted residuals is plotted against the mean value of the weighted residuals for each data set. A strong correlation between these quantities is obvious from the plot. This observation is a first hint that for each individual data set the distribution of weighted residuals is shifted as a whole, i.e. the mean value is not mainly determined by a few strong positive outliers. The shift of the residual distribution as a whole can also be described by the fraction of positive excess residuals [Fig. 1(d)]. A well centred Gaussian distribution results in approximately 50% positive residuals ζ > 0 := ζ+ and 50% negative residuals ζ < 0 := ζ−. The difference between the integer numbers of positive and of negative residuals #ζ+ − #ζ− divided by the total number of weighted residuals Nobs (called `the fraction of positive excess residuals') is in this case a number close to zero and is accidentally sometimes slightly larger and sometimes slightly smaller than zero. However, the fraction of positive excess residuals increases with increasing mean value of the weighted residuals 〈ζ〉, which confirms and quantifies the observation from plot (c) that the shift of the distribution of weighted residuals is driven mainly by shifting the distribution as a whole, rather than by strong outliers.
Figure 1 (a) The mean value of the weighted residuals is not at zero but at a slightly larger value of 0.0521. This appears to be a small value; however, plot (b) shows the histogram of the significance of the deviation from zero for 8424 analysed data sets. (c) The mean value of the weighted residuals is strongly correlated with the median of the weighted residuals. The 95% confidence intervals for the fit parameters are given by [−0.012, −0.011] for the constant and [1.003, 1.015] for the slope. (d) The shift of the residual distribution as a whole can also be described by the fraction of positive excess residuals. When this fraction is multiplied by 100 it gives the percentage of positive excess residuals. When the distribution of residuals for an individual data set is well centred at zero, the percentage of positive excess residuals is close to zero. The fitted parameters are given in the plot with the respective estimated standard deviations. The 95% confidence intervals for the fitted parameters are given by [−0.008, −0.007] for the constant and [0.850, 0.863] for the slope. Plots (c) and (d) together may indicate data-processing or -reduction problems. For details, see text.
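The difference between a shift of the whole residual distribution and a mean value driven by a single strong outlier can be made concrete with a small sketch. The function name `shift_descriptors` and the two eight-value residual lists are hypothetical and only constructed so that both have the same mean value of 0.6.

```python
import statistics


def shift_descriptors(zeta):
    """Return mean, median and fraction of positive excess residuals."""
    n_pos = sum(1 for z in zeta if z > 0)
    n_neg = sum(1 for z in zeta if z < 0)
    return {
        "mean": statistics.fmean(zeta),
        "median": statistics.median(zeta),
        "excess_fraction": (n_pos - n_neg) / len(zeta),
    }


# A symmetric, well centred toy distribution of weighted residuals
centred = [-1.5, -0.5, 0.5, 1.5, -1.5, -0.5, 0.5, 1.5]

# Case 1: every residual is shifted by the same amount
shifted_as_whole = shift_descriptors([z + 0.6 for z in centred])

# Case 2: the same mean value is produced by one strong positive outlier
single_outlier = shift_descriptors(centred[:-1] + [centred[-1] + 4.8])

print(shifted_as_whole)
print(single_outlier)
```

In the first case the median and the fraction of positive excess residuals follow the mean value; in the second case both stay at zero although the mean value is the same.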
When the 483 data sets with wR(F2) > 0.2000 are excluded, the mean value of 〈ζ〉 over all remaining data sets (0.0521 for all data sets), the median of 〈ζ〉, 0.0384 (to be compared with 0.0398 for all data sets), the mean value of the significance 〈ζ〉/σ(〈ζ〉) (2.9477 for all data sets) and median[〈ζ〉/σ(〈ζ〉)] = 2.0484 (to be compared with 2.1287) are all slightly reduced, but the qualitative picture does not change.
3.1. Over- and underestimation of Iobs by δ
When Iobs denotes the reflection intensity in the reflection input file, Itrue denotes the (unknown) true intensity excluding random noise, ±Δ denotes random noise and δ denotes a constant systematic offset for all reflections, e.g. from a calibration error, and when no other errors apply, the resulting intensity is given by

$$I_{\rm obs} = I_{\rm true} \pm \Delta + \delta. \qquad (1)$$

The symbol ±Δ was used to emphasize the stochastic nature of noise with equal probabilities for positive and for negative fluctuations. Its amplitude is characterized by the estimated standard uncertainty of the observed reflection, s.u.(Iobs), in the reflection input file. Stochastic noise is not affected by a constant shift of origin. With this notation, when δ = 0 for all reflections, the observed intensity is unbiased with respect to the true intensity when averaging over the noise:

$$\langle I_{\rm obs} \rangle = \langle I_{\rm true} \pm \Delta \rangle = I_{\rm true}. \qquad (2)$$

When, in contrast, a systematic offset δ ≠ 0 applies,

$$\langle I_{\rm obs} \rangle = \langle I_{\rm true} \pm \Delta + \delta \rangle = I_{\rm true} + \delta, \qquad (3)$$

and the observed intensity is no longer unbiased. It is affected by a systematic shift δ.
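The model Iobs = Itrue ± Δ + δ can be illustrated with a short simulation. The sketch below is hypothetical in all its specifics: the intensity list, the constant s.u. of 2.0, the offset of one tenth of the s.u. and the function names are assumptions made for illustration, and the calculated intensity is set equal to the true intensity.

```python
import random
import statistics


def simulate_i_obs(i_true, su, delta, seed=1):
    """Return Iobs = Itrue + Gaussian noise (width su) + constant offset delta."""
    rng = random.Random(seed)
    return [t + rng.gauss(0.0, s) + delta for t, s in zip(i_true, su)]


def mean_zeta(i_obs, i_calc, su):
    """Mean weighted residual <zeta> with zeta = (Iobs - Icalc) / s.u.(Iobs)."""
    return statistics.fmean([(o - c) / s for o, c, s in zip(i_obs, i_calc, su)])


i_true = [5.0 * (1 + i % 20) for i in range(5000)]  # toy intensities
su = [2.0] * len(i_true)                            # assumed constant s.u.
delta = 0.2                                         # one tenth of the s.u.

# The same seed gives identical noise, so only the offset differs
zeta_unbiased = mean_zeta(simulate_i_obs(i_true, su, 0.0), i_true, su)
zeta_shifted = mean_zeta(simulate_i_obs(i_true, su, delta), i_true, su)
print(zeta_unbiased, zeta_shifted)
```

Because the noise realization is identical in both runs, the two mean values differ by δ/s.u.(Iobs), although δ is far below the noise level of any individual reflection.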
A constant positive or negative offset δ that is equal for all reflections may be seen as the extreme case of a whole class of systematic errors where the origin for only a subset of observed intensities, for example from resolution or exposure time batches, is shifted (origin drift) or where non-linearities in area detectors lead to spatial or intensity-dependent origin drifts. Distinct spatial inhomogeneities in detector responses were reported earlier (Paciorek et al., 1999; Pflugrath, 1999; Dudka, 2018). Insufficient or missing absorption correction procedures may affect specifically those reflections with the longest path through the crystal, though depending on the linear absorption coefficient of the crystal. Time-dependent drifts of the origin may occur from decreasing or fluctuating beam intensity, crystal decay, or just insufficient or missing correction for changing irradiated crystal volume. These errors may lead to different individual shifts depending on the coordinates of the detecting pixel in the detector (xdet., ydet.), resolution, intensity, beam profile or exposure time, and on geometry parameters.
All of the mentioned errors may result in an average shift 〈δ〉 of the reflection intensities. All of these systematic errors should therefore be kept in mind when only a constant value δ is discussed for simplification, as it can be regarded as the total net effect of the individual errors. To further simplify the discussion, all of these errors are summarized under the keywords `overestimation' and `underestimation' of the observed intensity. Fig. 1 indicates a dominance of overestimation of Iobs, so the focus is on overestimation.
A slight but systematic overestimation from, say, data-integration steps would not necessarily be visible in the individual data set, where other systematic errors may overlie and mask this specific small error. Additionally, a slight over- or underestimation Iobs = Itrue ± Δ + δ would easily explain the strong correlation between median(ζ) and 〈ζ〉 as depicted in Fig. 1(c), as even small errors |δhkl| < s.u.(Iobs,hkl) in the abundant weak data would immediately lead to small linear positive and negative changes in the median of the residuals, just as observed in Fig. 1(c).
The working hypothesis is therefore from here on that small but systematic errors in Iobs are an important factor in explaining Fig. 1(d). Overestimation of Iobs on average explains the overall appearance of the plots in Figs. 1(c) and 1(d) by accounting for the linear increase in (#ζ+ − #ζ−)/Nobs with increasing 〈ζ〉.
But overestimation of Iobs on average also raises further questions: (i) Why would small errors more frequently lead to overestimation, 〈ζ〉 > 0, than to underestimation (negative shifts 〈ζ〉 < 0)? And (ii) would regular overestimation of Iobs not increase the residual factors and call for corrections in this way?
4. How overestimation of Iobs affects the residual factors
The last two questions have a surprising answer: the residual factors decrease when the observed intensities are slightly overestimated, and they tend to increase when the observed intensities are slightly underestimated. This may unconsciously incentivize overestimation of Iobs rather than underestimation in detector calibration experiments or when setting default data-integration parameter values. In the following section an example will be used to briefly discuss how and why the residual factors are lower when the observed intensities Iobs are overestimated. In order to give more evidence that the rewarding behaviour of the agreement factors is not just a theoretical idea but a tangible effect, a simulation is performed with artificial data to prove this point, and traces in data from experiments are presented to further substantiate the working hypothesis.
4.1. Overestimation of Iobs reduces Rmerge, Rr.i.m. and Rp.i.m.
As an example, the merging R factor, Rmerge, which is also called Rsym, is briefly discussed. The importance of the merging R factor lies in its availability, as it is very frequently given in published data sets. As a metric for data quality it has severe weaknesses (Diederichs & Karplus, 1997). These weaknesses were addressed with the redundancy independent merging R factor Rr.i.m. and with the precision indicating merging R factor Rp.i.m. (Weiss, 2001). These important descriptors should be included as standard.
The merging R factor is defined according to

$$R_{\rm merge} = \frac{\sum_{hkl} \sum_{j=1}^{n_i} \left| I_j(hkl) - \langle I(hkl) \rangle \right|}{\sum_{hkl} \sum_{j=1}^{n_i} I_j(hkl)}, \qquad (4)$$

where ni is the redundancy of the unique reflection i and where 〈I(hkl)〉 indicates the mean value over the redundantly measured reflection (Arndt et al., 1968). The contribution from one arbitrarily chosen unique reflection i with redundancy ni in the numerator is the sum $\sum_{j=1}^{n_i} | I_j(hkl) - \langle I(hkl) \rangle |$. Suppose each individual measurement carries the same small constant error δ > 0. How does this affect the sum in the numerator? It increases each individual measurement by the same amount, $I_j(hkl) \rightarrow I_j(hkl) + \delta$, and as a consequence also the average value by the same amount: $\langle I(hkl) \rangle \rightarrow \langle I(hkl) \rangle + \delta$. As these terms are subtracted, the sum remains unchanged:

$$\sum_{j=1}^{n_i} \left| [I_j(hkl) + \delta] - [\langle I(hkl) \rangle + \delta] \right| = \sum_{j=1}^{n_i} \left| I_j(hkl) - \langle I(hkl) \rangle \right|.$$

A short way to state this fact is `The sum in the numerator of equation (4) remains unchanged under a transformation Ij(hkl) → Ij(hkl) + δ'; it is invariant under such a transformation. This is actually a trivial statement; it just means that the difference between numbers that are increased by the same amount remains unchanged. This holds also for each individual term in the numerator and thus for the numerator of equation (4) in total. The denominator of equation (4), however, is not invariant under such a transformation. It increases by niδ for the unique reflection i:

$$\sum_{j=1}^{n_i} [I_j(hkl) + \delta] = \sum_{j=1}^{n_i} I_j(hkl) + n_i \delta.$$

The last two equations taken together state that the merging R factor decreases when the observed intensities are overestimated by δ > 0 as the numerator is unchanged and the denominator increases in this case. The merging agreement factor responds `rewardingly' to the systematic error of overestimated intensities when a low value is perceived as desirable (confirmation bias). This holds also when not all of the redundantly measured intensities are overestimated by the exact same value δ, which was only assumed to simplify the discussion; it also holds when the observed intensities are overestimated by different amounts δj.¹
So far, the discussion has assumed small errors δ. However, the conclusions also apply to large errors: indeed, the larger δ is, the smaller the merging R factor. In other words, the more the observed intensities are overestimated, the smaller Rmerge gets.
The demonstrated behaviour of the merging R factor to reward overestimation of Iobs with lower values holds also for Rr.i.m. and Rp.i.m., as these differ from Rmerge only in the factors $[n_i/(n_i - 1)]^{1/2}$ and $[1/(n_i - 1)]^{1/2}$ (Weiss, 2001), respectively, in the numerator.
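The rewarding response of the three merging factors can be checked numerically with a small sketch. The function name `merging_r_factors`, the three toy groups of equivalent measurements and the offset δ = 1 are hypothetical; the redundancy factors follow the definitions of Weiss (2001), and unique reflections measured only once are skipped, which is an assumption of this sketch.

```python
import math


def merging_r_factors(groups):
    """Return (Rmerge, Rr.i.m., Rp.i.m.) for lists of equivalent measurements."""
    num_merge = num_rim = num_pim = denominator = 0.0
    for measurements in groups:
        n = len(measurements)
        if n < 2:
            continue  # singly measured reflections carry no merging information
        mean = sum(measurements) / n
        deviation = sum(abs(i - mean) for i in measurements)
        num_merge += deviation
        num_rim += math.sqrt(n / (n - 1)) * deviation
        num_pim += math.sqrt(1 / (n - 1)) * deviation
        denominator += sum(measurements)
    return num_merge / denominator, num_rim / denominator, num_pim / denominator


# Toy data: three unique reflections with redundancies 3, 4 and 2
groups = [[100.0, 104.0, 96.0], [10.0, 12.0, 8.0, 10.0], [1.0, 3.0]]
delta = 1.0
overestimated = [[i + delta for i in measurements] for measurements in groups]

r_unbiased = merging_r_factors(groups)
r_overestimated = merging_r_factors(overestimated)
print(r_unbiased)
print(r_overestimated)
```

The numerators are identical for both inputs, whereas the denominator grows by niδ per unique reflection, so all three factors come out lower for the overestimated intensities.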
4.2. Overestimation of Iobs reduces wR(F2), R1 and atomic displacement parameters
For the weighted agreement factor

$$wR(F^2) = \left[ \frac{\sum_{i=1}^{N_{\rm ref}} w_i \left( F_{{\rm obs},i}^2 - F_{{\rm calc},i}^2 \right)^2}{\sum_{i=1}^{N_{\rm ref}} w_i \left( F_{{\rm obs},i}^2 \right)^2} \right]^{1/2}$$

(with weights wi and where Nref is the number of reflections included in the refinement) and for the conventional R factor

$$R1 = \frac{\sum_{i} \left| |F_{{\rm obs},i}| - |F_{{\rm calc},i}| \right|}{\sum_{i} |F_{{\rm obs},i}|}$$

one cannot expect exactly the same results, as the structure model is involved in these cases in the form of Icalc or Fcalc. This is in contrast to Rmerge, Rr.i.m. and Rp.i.m. where the observed entities are not compared with calculated entities but only with other observed entities. For the weighted agreement factor and the conventional agreement factor it is reasonable to assume that overestimation of the observed intensity initially also lowers the respective residual factor, until, at some point, the difference between the weakest observed and calculated entities starts to increase the residual sum at a rate that is larger than the rate of increase of the denominator.
In order to prove that the weighted agreement factor is initially decreasing as a response to a slight overestimation of Iobs, a simulation was performed with artificial data (for more details about the simulations see Appendix A). In the simulation we know the exact true values of the observed intensities, which are never known in an experiment. Additionally, the exact error δ is known, as are the true model parameter values. For the simulation, the calculated intensities were extracted after convergence of a refinement and written to a reflection input file. Gaussian random noise was added in proportion to the s.u.(Iobs) values from the experiment. In order to perform the simulation at a level of the weighted agreement factor that approximately compares with average experimental values, the noise was chosen to be 4 times s.u.(Iobs). A number of such reflection input files were generated. Different values δ were added in incremental steps in addition to the noise for different artificial data sets. The same starting model was refined against each of these resulting artificial reflection input files. The resulting residual factors wR(F2) and R1 are plotted in Figs. 2(a) and 2(b), respectively, against the mean value of the weighted residuals 〈ζ〉. The lowest residual values are not attained for 〈ζ〉 = 0, as one might expect naively, but for 〈ζ〉 > 0. This proves that wR(F2) and R1 are rewarding overestimation of Iobs similarly to Rmerge, Rr.i.m. and Rp.i.m. The main effect of overestimation of Iobs on the structure model as obtained from the simulation is a systematic reduction in the atomic displacement parameter Ueqiv. As an example, the true value for the chlorine atom in the simulation described in Appendix A is Ueqiv = 0.05793. Increasing Iobs until 〈ζ〉 = 0.1071 leads to a reduction to Ueqiv = 0.05729, which corresponds to 2.56 standard deviations. The reduction in Ueqiv is systematic for all atoms and corresponds on average to 0.92 standard deviations.
Figure 2 Simulations with artificial data. A calibration error is simulated by adding for each simulation a different constant amount δ to all observed intensities. The added amount is small in the sense that (i) it is smaller than 2 times the smallest value of σ(Iobs) and (ii) model refinement against the simulated data results in weighting scheme parameters a = 0 and b = 0 for SIM 14–SIM 30. For the exact values of δ for each simulated data set see Table 2.
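Why a constant positive offset tends to lower displacement parameters can be illustrated with a deliberately simple toy model. It does not reproduce the refinement used for the simulations in this study: it assumes noise-free intensities of the form Icalc = k g exp(−2Bs²) with a known structure-dependent part g, unit weights, and only a scale factor k and one overall isotropic displacement parameter B as refined parameters. All names and numerical values are hypothetical.

```python
import math


def fit_scale_and_b(s2, g, i_obs, lo=0.0, hi=10.0, iterations=80):
    """Unit-weight least-squares fit of Icalc = k * g * exp(-2 * B * s2).

    For a trial B the optimal scale k is analytic, so only B is searched
    (golden-section search on the interval [lo, hi]).
    """
    def scale_and_cost(b_iso):
        model = [gi * math.exp(-2.0 * b_iso * t) for gi, t in zip(g, s2)]
        k = sum(o * m for o, m in zip(i_obs, model)) / sum(m * m for m in model)
        cost = sum((o - k * m) ** 2 for o, m in zip(i_obs, model))
        return k, cost

    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - ratio * (hi - lo)
    d = lo + ratio * (hi - lo)
    cost_c, cost_d = scale_and_cost(c)[1], scale_and_cost(d)[1]
    for _ in range(iterations):
        if cost_c < cost_d:          # minimum lies in [lo, d]
            hi, d, cost_d = d, c, cost_c
            c = hi - ratio * (hi - lo)
            cost_c = scale_and_cost(c)[1]
        else:                        # minimum lies in [c, hi]
            lo, c, cost_c = c, d, cost_d
            d = lo + ratio * (hi - lo)
            cost_d = scale_and_cost(d)[1]
    b_fit = 0.5 * (lo + hi)
    return scale_and_cost(b_fit)[0], b_fit


# Toy "structure": 400 reflections on a regular grid in s^2 = (sin(theta)/lambda)^2
n_refl = 400
s2 = [0.4 * (i + 0.5) / n_refl for i in range(n_refl)]
g = [(40.0, 100.0, 160.0, 100.0)[i % 4] for i in range(n_refl)]
b_true = 4.0  # assumed overall B in A^2, i.e. U = B / (8 pi^2) of about 0.05 A^2
i_true = [gi * math.exp(-2.0 * b_true * t) for gi, t in zip(g, s2)]

delta = 1.0  # constant overestimation added to every intensity
k_ref, b_ref = fit_scale_and_b(s2, g, i_true)
k_biased, b_biased = fit_scale_and_b(s2, g, [i + delta for i in i_true])

for label, b in (("unbiased", b_ref), ("overestimated", b_biased)):
    print(f"{label}: B = {b:.4f} A^2, U = {b / (8.0 * math.pi ** 2):.5f} A^2")
```

In this toy model the constant offset is relatively largest for the weak high-angle reflections, so the fit compensates with a flatter fall-off and the fitted B for the overestimated intensities lies below the true value.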
5. Are these findings in accordance with the experimental data?
The theoretical considerations and the simulations from the previous sections showed that the residual factors Rmerge, Rr.i.m. and Rp.i.m., wR(F2), and R1 are lower when the true intensity is slightly overestimated compared with the case where the observed intensity is unbiased with respect to the true intensity. For Rmerge, Rr.i.m. and Rp.i.m. (and all other residual factors with a similar structure), this is easy to demonstrate theoretically, as in these cases only observed entities enter the defining equations. The numerators in all of these equations are composed of sums of absolute differences from observed intensities and mean values thereof. These differences remain unchanged when the observed intensities are overestimated, whereas the denominators in all of these equations increase. This leads to a reduction of the residual factors in response. In the case of wR(F2) and R1, model-derived entities are involved. This changes the situation slightly, but not qualitatively, when only a small overestimation of Iobs is considered. These residual factors also decrease initially for overestimation of Iobs. A qualitative difference is that they finally start to indicate the systematic error if the overestimation is sufficiently distinct, whereas for Rmerge, Rr.i.m. and Rp.i.m. this is not the case.
The theoretically derived and simulation-confirmed rewarding behaviour of the residual factors with respect to overestimation of Iobs may explain why there are so many more data sets with a positive shift of the mean value of the residuals compared with those with a negative shift, where, ideally, positive and negative shifts should be equally distributed and equally strong.
The answer to this riddle could lie in the fact that overestimation of Iobs is rewarded such that it remains undetected in the data (confirmation bias). As a consequence, this should be visible from the experimental data themselves.
Fig. 3 shows the residual factors Rmerge (_diffrn_reflns_av_R_equivalents, red), wR(F2) (_refine_ls_wR_factor_ref, blue) and R1 (_refine_ls_R_factor_all, green), plotted as moving averages (with a window of 50 consecutive data points) as a function of the mean value of the weighted residuals. The common area in which all three residual factors reach their respective minimum value is shown as a yellow stripe in the range 0.025 ≤ 〈ζ〉 ≤ 0.065. The same yellow stripe is also depicted in Fig. 2. This confirms again the overall slight overestimation of Iobs.
Figure 3 Residual factors (y axis) plotted as moving averages over a window of 50 consecutive data points as a function of 〈ζ〉 (x axis). Blue: _refine_ls_wR_factor_ref. Red: _diffrn_reflns_av_R_equivalents. Green: _refine_ls_R_factor_all. The plot contains empty spaces where individual data points were not available. The averaging continues only when 50 or more data sets were available in a row with the desired value due to the chosen window. Different residual factors show their minimum values also for 〈ζ〉 > 0. The common minimum area for the three residual factors is shown as a yellow stripe.
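A moving average of this kind can be sketched as follows. The function name `moving_average_by_zeta` and the five-point example are hypothetical; the sketch assumes that the data sets are sorted by 〈ζ〉, that a missing residual factor is represented by `None`, and that each window is plotted at the mean 〈ζ〉 of its members, none of which is specified in detail in the text.

```python
def moving_average_by_zeta(mean_zeta, r_values, window=50):
    """Moving average of a residual factor over data sets sorted by <zeta>.

    Windows containing a missing residual factor (None) are skipped, which
    leaves gaps in the resulting curve.
    """
    pairs = sorted(zip(mean_zeta, r_values), key=lambda pair: pair[0])
    averaged = []
    for start in range(len(pairs) - window + 1):
        chunk = pairs[start:start + window]
        if any(r is None for _, r in chunk):
            continue
        x = sum(z for z, _ in chunk) / window
        y = sum(r for _, r in chunk) / window
        averaged.append((x, y))
    return averaged


# Hypothetical five-data-set example with a window of two and one missing value
curve = moving_average_by_zeta(
    mean_zeta=[0.3, 0.1, 0.2, 0.4, 0.5],
    r_values=[0.06, 0.04, None, 0.08, 0.10],
    window=2,
)
print(curve)
```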
6. Discussion and outlook
It was shown that the shift of the mean value of the residuals correlates strongly with the median of the residuals. This strong correlation is interpreted as a sign that the mean value of the weighted residuals is determined by a shift of the distribution of the residuals as a whole rather than by a small number of strong outliers. In other words, the abundant weak data in each data set may have a stronger influence on the mean value of the residuals than individual large outliers from other, more conventional errors, such as a slight unmodelled disorder or neglect of bonding density. This holds in particular when the abundant weak intensities are slightly over- or underestimated. The individual under- or overestimation of Iobs = Itrue ± Δ ± δ may well be within the limits of noise – it is virtually invisible on the level of the individual reflection – and may nevertheless influence the mean value of the weighted residuals as these many small but systematic errors δ accumulate. It was shown with the help of artificial data that slight overestimation of Iobs = Itrue ± Δ + |δ| leads to a lower weighted agreement factor than would be obtained with the unbiased, true values of Iobs = Itrue ± Δ. The true intensity is available in a simulation in contrast to an experiment. Note that overestimation of Iobs may already occur at the step of data integration and data processing.
It was furthermore shown that these ideas are not merely theoretical but are confirmed by experimental data. The minimum values for different residual factors are found in the sample of experimental data sets again for a slightly positive mean value of the weighted residuals – they are not centred around 〈ζ〉 = 0. This is taken as confirmation (i) that these residual factors respond in a rewarding way to slight overestimation of Iobs and (ii) that the rewarding behaviour may constitute an (unconscious) incentive for overestimation.
In the simulation presented in this study, under- and overestimation of Iobs was modelled by adding the same small amount ±nδ to each reflection. Different increments n resulted in different simulated data sets. This error is of course a simplified model of systematic errors in real experiments, where errors may be much more complicated. For discussion purposes, however, and for working out the consequences, it is a valid model. Also, detectors need to be calibrated. The calibration itself comes with an error, even if it is small. A calibration error offsets the origin for all reflections by the same value. Therefore, such an error may describe a real-world case quite accurately, even if it seems a little idealized and artificial at first glance.
A calibration error explains the simultaneous increase of the mean of the residuals with the number of positive excess residuals. The artificial lowering of agreement factors and of atomic displacement parameters by overestimation of Iobs explains why this error was overlooked for such a long time (confirmation bias). Overestimation of weak data in small-molecule single-crystal experiments was found and discussed earlier in the context of measurement strategies (Williams et al., 2019) for low-exposure-time and high-resolution data. The current results indicate that the problem might be much more widespread.
6.1. Factors contributing to overestimation of Iobs from the perspective of metrics
A slight, unintentional overestimation of the observed intensities, Iobs > IBragg, may go unnoticed and may even be encouraged because it leads to more desirable results. Two outcomes typically invite confirmation bias: (i) lower residual factors Rmerge, Rr.i.m., Rp.i.m., wR(F2), R1 and similar, and (ii) atomic displacement parameters that indicate smaller amplitudes, as briefly mentioned in Section 4.2. It is intuitive that small atomic displacement parameters are associated with high data quality, as systematic errors tend to accumulate in displacement parameters by increasing them. For this reason, atomic displacement parameters from X-ray diffraction experiments are sometimes compared with results from neutron diffraction experiments [as an example see Chodkiewicz & Woźniak (2025)] in order to give evidence for the accuracy of the X-ray refinement. Neutron diffraction experiments have a reputation of being of a higher accuracy; however, they may also be affected by slight systematic over- or underestimation of intensities.
For either X-ray or neutron diffraction experiments, small atomic displacement parameters may arise from effects that artificially increase Iobs > IBragg and thus may just be an artefact. Systematic errors that artificially reduce atomic displacement parameters may need to be excluded in order to ensure that small atomic displacement parameters are physically meaningful and not an artefact in X-ray and neutron diffraction experiments.
APPENDIX A
The simulations
The data set collected by Shraddha et al. (2020) was selected from the sample. It describes a structure crystallizing in the monoclinic space group P21/n and contains a moiety belonging to the imidazoles (C29H23ClN2O). The measurement was conducted on a Bruker diffractometer with Mo Kα radiation at 297 K.
To prepare the simulation, the refinement was first repeated with the command OMIT -100 but no other changes. This ensures that all reflections are taken into account, including large negative intensity observations. From the resulting reflection list file, the calculated intensities were extracted. They correspond to the true Bragg intensity without any noise. For the reference simulation SIM 22, noise was added to each reflection in proportion to the s.u.(Iobs) from the experimental data. The model parameters resulting from this reference refinement are defined to be the true model parameters and correspond to a refinement against observed intensities unbiased with respect to the true intensities as described in equation (2).
For refinements including a systematic shift δ of the origin like for a calibration error, equation (3), n increments of δSIM = ±0.0004F000 were added to all Itrue and exactly the same noise was added. For example, when the noise was +1.234 for reflection 234 in SIM 22, it was also +1.234 in all other simulations (SIM 14, SIM 18, SIM 26, SIM 30). This is to make sure that the results do not depend on the set of random numbers. In order to monitor the effect on the individual reflection, the maximum, minimum and mean value for δSIM/σ(Iobs) was calculated for each individual reflection in each simulated data set. The aim was to make the added systematic error insignificant with respect to σ(Iobs) such that it would not give rise to large residuals. SIM 22 is the reference simulation with no systematic error, δSIM 22 = 0. Simulations with numbers >22 have the increment added, while the others have it subtracted. For example, in SIM 26, δSIM 26 = +0.0016F000 for each reflection (n = 4 as 26 − 22 = 4), and in SIM 30 δSIM 30 = +0.0032F000. The largest change for an individual reflection in SIM 30 due to the systematic error was (δSIM 30)max = 1.8420σ(Iobs), well below three standard deviations of the observed intensity. Similarly, for SIM 14, the largest change was negative, δSIM 14 = −1.8420σ(Iobs).
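The construction of the simulated data sets described above can be sketched as follows. This is a minimal illustration and not the code used for this study: the noise is assumed to be Gaussian with standard deviation σ(Iobs), the values of F000, Itrue and σ(Iobs) are hypothetical toy numbers, and the function names are invented for this example. The residuals are taken against the noisy reference SIM 22 without any refinement, so the sketch only shows how a uniform offset displaces the mean weighted residual. It does not reproduce the refinement results.

```python
import random
import statistics

REFERENCE_SIM = 22   # SIM 22 carries no systematic offset
STEP = 0.0004        # one increment, in units of F000, as quoted in the text


def simulate(i_true, sigma, f000, sim_number, seed=0):
    """Return simulated I_obs for a data set labelled like 'SIM 26'.

    The offset is n * STEP * f000 with n = sim_number - REFERENCE_SIM.
    Re-seeding the generator gives every simulation exactly the same noise.
    """
    n = sim_number - REFERENCE_SIM
    delta = n * STEP * f000
    rng = random.Random(seed)
    return [it + delta + rng.gauss(0.0, s) for it, s in zip(i_true, sigma)]


def mean_weighted_residual(i_obs, i_ref, sigma):
    """Mean of (I_obs - I_ref) / sigma over all reflections."""
    return statistics.fmean([(o - r) / s for o, r, s in zip(i_obs, i_ref, sigma)])


# Hypothetical toy values, not taken from the experimental data set.
F000 = 500.0
I_TRUE = [0.5, 2.0, 15.0, 120.0, 900.0]
SIGMA = [1.0, 1.2, 2.0, 5.0, 20.0]

reference = simulate(I_TRUE, SIGMA, F000, REFERENCE_SIM)
for sim in (14, 18, 22, 26, 30):
    i_obs = simulate(I_TRUE, SIGMA, F000, sim)
    shift = mean_weighted_residual(i_obs, reference, SIGMA)
    print(f"SIM {sim}: mean weighted shift vs SIM 22 = {shift:+.4f}")
```

Because the noise is identical in all data sets, the difference between any two simulations is the uniform offset alone, which is what makes the comparison independent of the particular set of random numbers.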
The average change due to systematic errors was a reduction of the individual reflections by 0.1929σ(Iobs) for SIM 14 and correspondingly an increase by 0.1929σ(Iobs) for SIM 30. For SIM 30, this corresponds to adding uniformly to all reflections a total amount of additional scattering mass of 1.63% of F000², which leaves the relative maximum simulated intensities virtually unaffected when compared with the true value from SIM 22. The weakest reflections change by at most approximately 4% when compared with the true value of SIM 22. Overall, the changes induced by the simulation are all small and most likely much smaller than other errors, for example those due to a fluctuating beam intensity. Consequently, the weighting-scheme parameters a and b remain zero after invoking the weighting scheme. Although each individual distortion from the true intensity is small, these many small but one-sided distortions add up to a measurable effect (Table 2).
A shift of the weighted residuals by 0.0527 like in SIM 26 may appear small; however, it is already significant at 3.6798σ. The significance is given by dividing the mean value by the standard deviation of the mean value, $\sigma(\langle\zeta\rangle) = s/\sqrt{N_{\rm obs}}$, where $s^2 = \frac{1}{N_{\rm obs}-1}\sum_{i=1}^{N_{\rm obs}}(\zeta_i - \langle\zeta\rangle)^2$ is the unbiased sample variance. The larger Nobs, the smaller the standard deviation of the mean value. The simulations SIM 14–SIM 30 all have the same Nobs = 4807. 〈Nobs〉 = 3534.62 for the 8424 data sets from the sample described in Section 2.
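The significance of a mean weighted residual can be computed along the following lines. This is a sketch under the assumption that the standard deviation of the mean is the square root of the unbiased sample variance divided by the square root of Nobs, and the function names are hypothetical. The second function inverts the relation for the rounded values quoted for SIM 26, which gives an implied sample standard deviation of the weighted residuals close to one.

```python
import math
import statistics


def significance_of_mean(zeta):
    """Mean of the weighted residuals divided by the standard error of the mean."""
    n_obs = len(zeta)
    mean = statistics.fmean(zeta)
    s = statistics.stdev(zeta)          # square root of the unbiased (n - 1) variance
    return mean / (s / math.sqrt(n_obs))


def implied_sample_sd(mean, significance, n_obs):
    """Sample standard deviation implied by a quoted mean and significance."""
    return mean * math.sqrt(n_obs) / significance


# Rounded values quoted in the text for SIM 26.
print(round(implied_sample_sd(0.0527, 3.6798, 4807), 3))
```

For a fixed shift and a fixed spread of the residuals, the significance grows with the square root of Nobs, which is why the same shift is more significant in larger data sets.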
Footnotes
1Within certain limits: the more uniform the distribution of δj values for a given set of redundantly measured intensities, the more exactly the sum in equation (5) is invariant. Equation (5) holds exactly in the case of a calibration error, where the origin has the same offset δ for all reflections. For larger differences between individual values of δj, the described cancellation with the mean value no longer works fully, but the equation holds approximately when the set of δj is sufficiently uniform.
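The cancellation described in this footnote can be checked numerically. The sketch below assumes the conventional definition of the merging factor, Rmerge = Σ|Ii − 〈I〉| / ΣIi, summed over all groups of redundant measurements. The intensities and the offset are hypothetical, and the function names are invented for this example. A uniform offset leaves the sum of absolute deviations from the group means unchanged while increasing the denominator, so Rmerge decreases.

```python
def merge_numerator(groups):
    """Sum of |I_i - <I>| over all groups of redundant measurements."""
    total = 0.0
    for intensities in groups:
        mean = sum(intensities) / len(intensities)
        total += sum(abs(i - mean) for i in intensities)
    return total


def r_merge(groups):
    """Conventional merging factor: sum |I_i - <I>| / sum I_i."""
    denominator = sum(sum(intensities) for intensities in groups)
    return merge_numerator(groups) / denominator


def add_offset(groups, delta):
    """Add the same offset delta to every measurement (calibration-type error)."""
    return [[i + delta for i in intensities] for intensities in groups]


# Hypothetical redundant measurements of two reflections.
groups = [[10.0, 12.0, 11.0], [100.0, 98.0, 103.0]]
biased = add_offset(groups, 0.5)

print(merge_numerator(groups), merge_numerator(biased))
print(r_merge(groups), r_merge(biased))
```

If the offsets differ between measurements of the same reflection, the numerator is no longer exactly invariant, in line with the approximate validity stated above.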
Acknowledgements
JH thanks S. Mebs for bringing `confirmation bias' to his attention. Open access publishing facilitated by Politecnico di Milano, as part of the Wiley–CRUI-CARE agreement.
References
Aristov, M. M., Geng, H., Harris, J. W. & Berry, J. F. (2023). Acta Cryst. C79, 133–141.
Arndt, U., Crowther, R. & Mallett, J. (1968). J. Phys. E Sci. Instrum. 1, 510–516.
Barnett, S. A., Hulme, A. T. & Tocher, D. A. (2005). Acta Cryst. E61, o857–o859.
Bhat, M. A., Abdel-Aziz, H. A., Ghabbour, H. A., Hemamalini, M. & Fun, H.-K. (2012). Acta Cryst. E68, o1135.
Blessing, R. H. (1987). Crystallogr. Rev. 1, 3–58.
Blessing, R. H. (1995). Acta Cryst. A51, 33–38.
Blessing, R. H. (1997). J. Appl. Cryst. 30, 421–426.
Chetioui, S., Boudraa, I., Bouacida, S., Bouchoul, A. & Bouaoud, S. E. (2013). Acta Cryst. E69, o1322–o1323.
Chodkiewicz, M. & Woźniak, K. (2025). IUCrJ 12, 74–87.
Diederichs, K. & Karplus, P. A. (1997). Nat. Struct. Mol. Biol. 4, 269–275.
Domagala, S., Nourd, P., Diederichs, K. & Henn, J. (2023). J. Appl. Cryst. 56, 1200–1220.
Downs, R. T. & Hall-Wallace, M. (2003). Am. Mineral. 88, 247–250.
Dudka, A. P. (2018). Crystallogr. Rep. 63, 1051–1056.
Gelbrich, T., Zencirci, N. & Griesser, U. J. (2007). Acta Cryst. C63, o751–o753.
Görbitz, C. H. (2002). Acta Cryst. B58, 849–854.
Gražulis, S., Chateigner, D., Downs, R. T., Yokochi, A. F. T., Quirós, M., Lutterotti, L., Manakova, E., Butkus, J., Moeck, P. & Le Bail, A. (2009). J. Appl. Cryst. 42, 726–729.
Gražulis, S., Daškevič, A., Merkys, A., Chateigner, D., Lutterotti, L., Quirós, M., Serebryanaya, N. R., Moeck, P., Downs, R. T. & Le Bail, A. (2012). Nucleic Acids Res. 40, D420–D427.
Gražulis, S., Merkys, A., Vaitkus, A. & Okulič-Kazarinas, M. (2015). J. Appl. Cryst. 48, 85–91.
Henn, J. (2019). Crystallogr. Rev. 25, 83–156.
Henn, J. (2025). Crystals 15, 898.
Ismiyev, A. I. (2011). Acta Cryst. E67, o1863.
Khan, U., Qureshi, R. A., Saeed, S. & Bond, A. D. (2007). Acta Cryst. E63, o530–o532.
Krause, L., Herbst-Irmer, R., Sheldrick, G. M. & Stalke, D. (2015). J. Appl. Cryst. 48, 3–10.
Lemmerer, A. (2011). Acta Cryst. B67, 177–192.
Liang, W.-X. & Qu, Z.-R. (2008). Acta Cryst. E64, o1198.
Merkys, A., Vaitkus, A., Butkus, J., Okulič-Kazarinas, M., Kairys, V. & Gražulis, S. (2016). J. Appl. Cryst. 49, 292–301.
Mesto, E., Scordari, F., Lacalamita, M., De Cola, L., Ragni, R. & Farinola, G. M. (2013). Acta Cryst. C69, 480–482.
Narendra Babu, S. N., Abdul Rahim, A. S., Abd Hamid, S., Balasubramani, K. & Fun, H.-K. (2009). Acta Cryst. E65, o2070–o2071.
Paciorek, W. A., Meyer, M. & Chapuis, G. (1999). J. Appl. Cryst. 32, 11–14.
Pflugrath, J. W. (1999). Acta Cryst. D55, 1718–1725.
Quirós, M., Gražulis, S., Girdzijauskaitė, S., Merkys, A. & Vaitkus, A. (2018). J. Cheminform. 10, 23.
Saraswathi, B. S., Foro, S. & Gowda, B. T. (2011). Acta Cryst. E67, o1591.
Shraddha, K. N., Devika, S. & Begum, N. S. (2020). IUCrData 5, x191690.
Thekku Veedu, S. & Techert, S. (2015). Acta Cryst. E71, o629–o630.
Vaitkus, A., Merkys, A. & Gražulis, S. (2021). J. Appl. Cryst. 54, 661–672.
Vaitkus, A., Merkys, A., Sander, T., Quirós, M., Thiessen, P. A., Bolton, E. E. & Gražulis, S. (2023). J. Cheminform. 15, 123.
Weiss, M. S. (2001). J. Appl. Cryst. 34, 130–135.
Williams, A. E., Thompson, A. L. & Watkin, D. J. (2019). Acta Cryst. B75, 657–673.
This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.