 1. Introduction
 2. Data for this analysis
 3. Fitting data with original point density and point accuracy: mu2chi
 4. eFEFFIT
 5. eFEFFIT, IFEFFIT fitting
 6. High point density (HPD) XAFS analysis
 7. Interpolation of experimental data with minimal distortion of information content
 8. Information preservation
 9. Selection of kspacing of the interpolation grid
 10. Effects on refined structure
 11. Conclusion
 Supporting information
 References
q2xafs2017 workshop
Propagation of uncertainty in experiment: structures of Ni (II) coordination complexes
^{a}School of Physics, University of Melbourne, Australia
^{*}Correspondence email: chantler@unimelb.edu.au
Accurate experimental XAFS (X-ray absorption fine-structure) data including uncertainties are required during analysis for valid comparison of results and conclusions of hypothesis testing on structural determinations. Here an approach is developed to investigate data without standard interpolation of experimental data and with minimal loss of information content in the raw data. Nickel coordination complexes bis(inpropylsalicylaldiminato)nickel(II) (ipr) and bis(Nnpropylsalicylaldiminato)nickel(II) (npr) are investigated. The additional physical insight afforded by the correct propagation of experimental uncertainty is used to determine newly refined structures for the innermost coordination shell. Two sets of data are investigated for each complex: one optimized for high point accuracy and one optimized for high point density. Both are important, and in this investigation the quality of the physical insight from each is directly provided by measured and propagated uncertainties that fairly represent the relevant accuracies. The results provide evidence for an approximate tetrahedral geometry for the ipr Ni complex that is more symmetric than previously concluded, with our high point accuracy data yielding ligand lengths of 2.017 ± 0.006 Å and 2.022 ± 0.006 Å for Ni—N and Ni—O bonds, respectively, and an even more skewed square-planar (i.e. rhombohedral) arrangement for the npr complex with corresponding bond lengths of 2.133 ± 0.004 Å and 1.960 ± 0.003 Å. The ability to distinguish, using hypothesis testing, between the subtle differences in spectra arising from the approximate local tetrahedral and square-planar geometries of the complexes is also highlighted. The effect of standard interpolation on experimental spectra prior to fitting with theoretical model structures is investigated. While often performed as a necessary step for Fourier transformation into position space, this will nonetheless skew the fit away from the actual data taken, and fails to preserve the information content within the data uncertainty. The artificial effects that interpolation imposes on χ_{r}^{2} are demonstrated. Finally, a method for interpolation is introduced which locally preserves the χ_{r}^{2} and thus information content, when a regular grid is required, e.g. for further analysis in r-space.
Keywords: XAFS; information content; uncertainty; nickel coordination complexes.
1. Introduction
X-ray absorption fine structure (XAFS) is the series of oscillations observed in the absorption spectrum following an absorption edge, due to interference of the photoelectron wavefunction backscattering from nearby atoms. Despite containing detailed information on the arrangement of atoms in the local region around the central absorber, high-precision determination of physical characteristics relies on accurate fitting of theoretical models with the experimental data. To date, data collection has not defined uncertainties nor propagated them into hypothesis testing, with the exception of the X-ray extended range technique (XERT) (Chantler, 2009). While fingerprinting of XANES (X-ray absorption near-edge structure) and, for example, pre-edge features can be assessed to useful levels without least-squares analysis or principal component analysis, distinguishing alternate structures, ligands and shells needs a reliable measure of goodness-of-fit (Koningsberger et al., 2000; Chantler et al., 2012).
Techniques including X-ray crystallography and neutron diffraction can be used in structure determination (Ladd & Palmer, 1977) to a high degree of accuracy, though they have obvious limitations when the sample of interest is in noncrystalline, solution or dilute form. The extended X-ray absorption fine-structure (EXAFS) region of the spectrum is useful in determining the structure of isolated molecules (Sayers et al., 1971; Eisenberger & Kincaid, 1978), and has proven to be a powerful tool in subtle structural determination (Mazzara et al., 2000; Chantler et al., 2012).
Following one of many standard pre-edge removals, the experimental data are transformed into χ by
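The transformation is conventionally written in the following form. This is a sketch of the usual textbook expression, stated as an assumption rather than as the authors' exact equation; \mu_{\mathrm{norm}} is a placeholder for whichever normalization a given formalism adopts (typically the edge jump, or the smooth background itself).

```latex
\chi(E) = \frac{\mu(E) - \mu_{0}(E)}{\mu_{\mathrm{norm}}}
```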
where μ_{0} estimates the isolated-atom background curve. Depending upon formalism, this preprocessing removes background effects from a matrix and solvent and any absorption or scattering from atoms not involved in the edge; defines and subtracts the edge energy or edge offset E_{0}; removes the edge jump to the above-edge region; and estimates, by some means, an isolated-atom absorption function above the edge. It is often incorporated into a single routine which makes an empirical spline fit through the data points above the edge. This enables the isolation of the XAFS oscillations to allow structural determination. The spectrum is then converted to a function of wavenumber k,
where E_{0} is the `edge energy'.
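The conversion to wavenumber follows the free-electron relation k = [2m_{e}(E − E_{0})]^{1/2}/ħ. A minimal sketch in Python, assuming energies in eV and k in Å^{−1}; the function name `energy_to_k` and the handling of points below the edge are illustrative choices, not part of mu2chi.

```python
import numpy as np

# hbar^2 / (2 m_e) expressed in eV * Angstrom^2 (rounded)
HBAR2_OVER_2ME = 3.80998


def energy_to_k(energy_ev, e0_ev):
    """Return the photoelectron wavenumber k (1/Angstrom).

    Points at or below E0 have no real wavenumber and are returned as NaN.
    """
    energy_ev = np.asarray(energy_ev, dtype=float)
    delta = energy_ev - e0_ev
    k = np.full_like(delta, np.nan)
    above = delta > 0
    k[above] = np.sqrt(delta[above] / HBAR2_OVER_2ME)
    return k
```

Because each measured energy maps to its own k value, the resulting k array is in general not uniformly spaced.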
Much activity around the 1990s emphasized the need to fit spectra to allow structural insight, though the measures used varied quite widely and with nonuniform results. O'Day et al. (1994) introduced a goodness-of-fit measure but did not incorporate uncertainties or the standard deviation of experimental data. They stated that `there is currently no accepted method for determining these errors'. Similarly, Filipponi & Di Cicco (1995) comment that `any report should be accompanied by a detailed analysis of the statistical errors due to random noise in the raw spectra'. However, `general procedures to estimate errors … are still not well established'.
There have been attempts to estimate uncertainty for XAFS data. Dent et al. (1992) used a piecewise polynomial to extract residual noise, hopefully free of any structure, and equivalently used Fourier filtering to remove dominant structure and so yield a noise spectrum. Of course these are recursive methods and depend upon an ideal fit of any structure using empirical means in order to derive the variance and noise that would allow the structure to be determined.
Filipponi (1995) commented that the uncertainties on the fitted parameters should be given by the spread of such parameters resulting from variance from an ensemble of experimental spectra. However, he comments `unfortunately, only a single measurement is usually available'. He then provides three prescriptions for evaluating the noise distribution based upon an assumption of normal distribution of errors with assumptions of magnitudes of these multivariate distributions.
He suggests that a Metropolis Monte Carlo algorithm may be used to sample the parameter probability distribution. When applied to experimental data this will result in a sequence of independent sets of parameter values, each of which produces a best fit of the experimental spectrum. The spread then represents the statistical uncertainty. This again is a post-facto representation and depends upon the initial determination of uncertainty. Finally, statistical errors can also be estimated from an assumption of perfect structural determination followed by a noise analysis of the residual, a little like that of Dent et al. (1992).
The GNXAS software estimates the noise in energy space. After fitting the structure, an error bar for each data point is generated by first fitting a polynomial of degree q < M over M data points; the residual square difference divided by M − q forms an estimate of the noise in the data. Repeating this along the spectrum allows an uncertainty to be estimated at each point via interpolation (Westre et al., 1995; Filipponi & Di Cicco, 1995; Filipponi, 1995).
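The windowed polynomial procedure can be sketched as follows. This is an illustrative reading of the description above, not GNXAS code: the function name, the non-overlapping windows and the default `window` and `degree` values are assumptions. The divisor M − q follows the wording of the text, although a polynomial of degree q has q + 1 coefficients.

```python
import numpy as np


def local_noise_estimate(x, y, window=9, degree=3):
    """Estimate a noise standard deviation at every point of a spectrum."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    centres, sigmas = [], []
    for start in range(0, len(x) - window + 1, window):
        xs = x[start:start + window]
        ys = y[start:start + window]
        # Centre the abscissa so the polynomial fit is well conditioned.
        xc = xs - xs.mean()
        coeffs = np.polyfit(xc, ys, degree)
        resid = ys - np.polyval(coeffs, xc)
        # Residual sum of squares divided by M - q, as described in the text.
        variance = np.sum(resid ** 2) / (window - degree)
        centres.append(xs.mean())
        sigmas.append(np.sqrt(variance))
    # Interpolate the per-window estimates back onto every data point.
    return np.interp(x, centres, sigmas)
```

Such an estimate is only as good as the assumption that the polynomial removes all structure and none of the noise within each window.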
An alternative approach is employed by the IFEFFIT software package (Newville, 2001), which estimates an uncertainty of experimental spectra as a function of wavenumber based upon a Fourier transform of the R-space background, and fits against theoretical models produced via the package FEFF6 or, recently, FEFF8L. IFEFFIT is also the foundation for other software used in XAFS analysis, which often provides the benefit of a graphical user interface (GUI), such as the ARTEMIS and ATHENA packages (Ravel & Newville, 2005). The measure of model agreement in IFEFFIT is calculated as
or alternatively
where N_{indp} is an effective estimated `number of independent points' in the spectra given by the Nyquist formula,
for a fit range of Δk and ΔR in k- and R-space, respectively (Stern, 1993).
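The three expressions referred to in this passage are usually quoted in the following forms. They are given here as an assumption based on the commonly documented IFEFFIT conventions, not as the authors' exact equations: Δχ_{i} denotes the difference between data and model at point i, N the number of evaluated points, and ε the estimated measurement uncertainty. Some treatments add 1 or 2 to the Nyquist count.

```latex
\chi^{2} = \frac{N_{\mathrm{indp}}}{N\,\epsilon^{2}} \sum_{i=1}^{N}
  \left\{ \left[\mathrm{Re}\,\Delta\chi_{i}\right]^{2}
        + \left[\mathrm{Im}\,\Delta\chi_{i}\right]^{2} \right\},
\qquad
\chi_{r}^{2} = \frac{\chi^{2}}{N_{\mathrm{indp}} - N_{\mathrm{vars}}},
\qquad
N_{\mathrm{indp}} \simeq \frac{2\,\Delta k\,\Delta R}{\pi}
```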
ε_{R} estimates the uncertainty in the spectrum, calculated as the root-mean-square of the Fourier-transformed data in a region at high R. Parseval's theorem allows for conversion of this parameter into k-space, with w the power of the k-weighted spectrum (Newville et al., 1999),
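The conversion is usually quoted in the form below, given here as an assumption from the commonly cited expression rather than as the authors' own equation. ε_{k} and ε_{R} denote the k-space and R-space noise estimates, and k_{min}, k_{max} bound the Fourier transform range.

```latex
\epsilon_{k} = \epsilon_{R}
  \sqrt{ \frac{\pi\,(2w+1)}{\delta k \left( k_{\max}^{2w+1} - k_{\min}^{2w+1} \right)} }
```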
for data point k-spacing δk. However, since most sources of noise are not taken into account, the estimated uncertainties are too low, the error bars are too small, and χ_{r}^{2} is overly large, often 500–2000, compared with a more ideal propagated χ_{r}^{2} ≃ 1.
In an attempt to remedy this, the fit is often re-evaluated using a somewhat arbitrary user-defined constant uncertainty to yield a `good fit' with χ_{r}^{2} ≃ 1 (Calvin, 2013). This assumes the final fit is perfect in order to define the uncertainties, and is therefore of limited use for hypothesis testing. The use of any such uniform error affects the fit, since experimental uncertainties are not uniform across the spectrum. Without measuring the uncertainties experimentally, this skews the fit toward data points that actually have a large error, and away from those with small measured uncertainty.
Meanwhile Chantler's high-accuracy analyses of QED and atomic spectra were extended into synchrotron research and X-ray absorption in several key papers. Commenting that estimates of statistical precision are critical (Chantler et al., 1999), they set out ten key limitations of accuracy in X-ray absorption measurements to be addressed. This was followed by a detailed statistical analysis of noise and variance in synchrotron X-ray measurements and in ion chamber detection (Chantler et al., 2000a,b). This explicitly measured numerous contributions to variance and precision, though indeed earlier authors had investigated some of these details for absorption. Finally this led to the first implementation of the XERT (Chantler et al., 2001).
Essentially, to capture the actual uncertainty obtained in the experiment, one may begin with the variance of repeated measurements at each energy. Many experimental collection routines collect n multiple scans for the same sample of interest, where n may be from 3 to 10. XERT typically takes ten repeats for the same sample and aperture combination at each measured energy. XERT also performs additional measurements to aid in the correction of experimental systematics, so the calculation of final uncertainties in μ becomes less trivial, with procedures outlined by Tran et al. (2004), de Jonge et al. (2007), Chantler (2010) and Tantau et al. (2015). The determination of experimental uncertainties for μ is further complicated in the case of a Hybrid experimental setup, especially for millimolar solutions, as is true for the data for this work (Chantler et al., 2015; Islam et al., 2016). When these uncertainties have been propagated to give the uncertainty in μ, the results can be deposited or collected ready for data deposition, and it is best that this be done as part of the manuscript (Chantler et al., 2001, 2015; Tran et al., 2003, 2005; de Jonge et al., 2005, 2007; Glover et al., 2008; Islam et al., 2014, 2016; Tantau et al., 2015).
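The starting point described above, the variance of repeated measurements at each energy, can be sketched as follows. This is a minimal illustration only: the function name and array layout are assumptions, and it omits every systematic correction that XERT or Hybrid analyses apply.

```python
import numpy as np


def mean_and_standard_error(repeats):
    """Combine repeated scans.

    repeats: array of shape (n_repeats, n_energies), one row per scan.
    Returns the pointwise mean and the standard error of that mean.
    """
    repeats = np.asarray(repeats, dtype=float)
    n = repeats.shape[0]
    mean = repeats.mean(axis=0)
    # Sample standard deviation (ddof=1) divided by sqrt(n).
    std_err = repeats.std(axis=0, ddof=1) / np.sqrt(n)
    return mean, std_err
```

With only 3 to 10 repeats the standard error itself is poorly determined, which is one reason the full procedures cited above are more involved.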
In most cases, the absence of a derived uncertainty in μ implies that such an uncertainty cannot be propagated to an uncertainty in χ(E) or χ(k). However, in recent work (Islam et al., 2015, 2016) the uncertainty has been propagated to χ(E) or χ(k) explicitly from the standard error uncertainty in μ,
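Under the simplest assumptions the propagation takes the form below. This is a sketch, not the authors' published expression: it assumes χ = (μ − μ_{0})/μ_{norm}, with the smooth background μ_{0} and the normalization μ_{norm} (a placeholder symbol) treated as exact, so that only the measured σ_{μ} contributes.

```latex
\sigma_{\chi}(E) \simeq \frac{\sigma_{\mu}(E)}{\mu_{\mathrm{norm}}}
```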
However, the data are then always interpolated onto a regularly spaced grid in k. This distorts experimental values, point density, information content and experimental uncertainties (Islam et al., 2014). The change of point spacing will skew the fit toward a different region of the spectrum; hence any additional time spent during the experiment in particular energy regions to gain high detail, high point density or high point accuracy on features of interest will effectively be lost. These issues are general in XAFS analysis and apply to common packages including IFEFFIT, ARTEMIS, GNXAS and FDMX, for example.
In this work we extend the work of Islam et al. (2015) by examining the spectra taken of Ni (II) complexes (Chantler et al., 2015). The nickel complexes bis(inpropylsalicylaldiminato) nickel(II) (ipr), and bis(Nnpropylsalicylaldiminato) nickel(II) (npr) (Fig. 1) are known to have local metal environments of approximate tetrahedral and square-planar geometries, respectively (Fox et al., 1964; Britton & Pignolet, 1989). X-ray crystallography has been used to examine the solid-state structures, and, while small variations in the interatomic distances are observed, it is confirmed that the overall molecular geometry of the solid-state structures is maintained in solution. Hence these Ni complexes provide a convenient platform on which to demonstrate the sensitivity of XAFS to such differentiating characteristics. Besides being the subject of many structural and stereochemical analyses, salicylaldiminato Ni(II) complexes are also used as polymerization catalysts (Chan et al., 2000; Lu et al., 2011).
Normal XAFS is considered to be able to distinguish the coordination number to 25%, whereas in this case the difference is 0%. Further, the bond lengths and path differences are essentially identical, making any comparison of structure or distinction of one hypothesis (square planar) versus another (tetrahedral) an extremely fraught problem. It is therefore an ideal example to compare the importance of uncertainty propagation and its consequence upon structural determination.
We will perform fits on non-interpolated spectra to evaluate the information content and to provide more reliable parameter estimates for these challenging coordination complexes. We will therefore:
(i) Develop a novel method and code for transforming the raw data while maintaining point density and point accuracy, with code in the supporting information.
(ii) Determine newly refined local structures based on non-interpolated experimental data for both high point accuracy data sets and high point density data sets.
(iii) Highlight the ability of XAFS as a powerful tool in stereochemical analysis.
(iv) Demonstrate the effects standard interpolation has on χ_{r}^{2} and the consequences on the interpreted structure.
(v) Present a new method of interpolation which preserves information content.
(vi) Explain that the new approach is not only effective across the whole range of the spectrum but also with respect to the distribution of data point density and accuracy across the spectrum.
2. Data for this analysis
A total of four experimental data sets are used herein, gathered via the Hybrid technique, which simultaneously records fluorescence and transmission data (Chantler et al., 2015). The Ni complexes are in 15 mM frozen solution, kept at a temperature of ∼80 K, so as to observe and quantify structure in the absence of thermal disorder. For each complex, an XAFS transmission spectrum is collected following high-accuracy Hybrid methodologies (Chantler, 2010), providing experimental data with high point accuracy (HPA) and with the spectra corrected for experimental systematics including energy calibration, harmonic contamination and scattering using published methods (Tran et al., 2004; Glover & Chantler, 2009; Barnea et al., 2011; Islam et al., 2014; Tantau et al., 2015). Additionally, a spectrum is taken for each complex using a faster method, which trades accuracy for higher point density, in an experiment where time constraints did not permit both. Tabulations of the collected spectra can be found in the supporting information of Chantler et al. (2015).
3. Fitting data with original point density and point accuracy: mu2chi
In order to use, preserve and propagate the information content contained within the experimental uncertainties, we introduce the mu2chi processing to convert spectra from absorption versus energy E into χ versus k (i.e. it translates mu to chi) which avoids any interpolation and propagates experimental measurement uncertainty. Fig. 2 shows one of the stages of the mu2chi data reduction, whereby the background spline is removed. An example of a final mu2chi output is shown in Fig. 3. We provide the (general) code, manual and makefile in the supporting information.
There exist numerous methods for interpolation. Previous work (Islam et al., 2015) utilized a cubic spline approach. After the pre-edge and background (Fig. 2) are subtracted in the conventional manner, a cubic fit with standard deviation uncertainties is made through four data points and evaluated on a regular grid with 0.05 Å^{−1} spacing, iteratively stepping through the data. Each point on the grid then has multiple fitted values with uncertainties, with the final value determined by a weighted mean and the uncertainty being a weighted standard deviation.
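The stepping four-point cubic scheme can be sketched as follows. This is an illustrative reading of the description, not the code of Islam et al. (2015): the function names are hypothetical, the cubic through four points is evaluated exactly in Lagrange form, each evaluation carries a propagated variance, and overlapping estimates at a grid point are combined with inverse-variance weights. Uncertainties are assumed to be strictly positive.

```python
import numpy as np


def lagrange_weights(x_nodes, x_eval):
    """Lagrange basis values L_j(x_eval) for the given nodes."""
    w = np.ones(len(x_nodes))
    for j, xj in enumerate(x_nodes):
        for m, xm in enumerate(x_nodes):
            if m != j:
                w[j] *= (x_eval - xm) / (xj - xm)
    return w


def sliding_cubic_regrid(k, chi, sigma, step=0.05):
    """Regrid (k, chi, sigma) onto a regular grid using four-point cubics."""
    k, chi, sigma = (np.asarray(a, dtype=float) for a in (k, chi, sigma))
    n_grid = int(np.floor((k[-1] - k[0]) / step + 1e-9)) + 1
    grid = k[0] + step * np.arange(n_grid)
    values = [[] for _ in grid]
    variances = [[] for _ in grid]
    for start in range(len(k) - 3):
        nodes = slice(start, start + 4)
        inside = np.where((grid >= k[start]) & (grid <= k[start + 3]))[0]
        for g in inside:
            w = lagrange_weights(k[nodes], grid[g])
            values[g].append(np.dot(w, chi[nodes]))
            # Linear propagation of independent point uncertainties.
            variances[g].append(np.dot(w ** 2, sigma[nodes] ** 2))
    out_val = np.full(len(grid), np.nan)
    out_sig = np.full(len(grid), np.nan)
    for g in range(len(grid)):
        if not values[g]:
            continue  # grid point not covered by any window
        v = np.array(values[g])
        wt = 1.0 / np.array(variances[g])
        mean = np.sum(wt * v) / np.sum(wt)
        out_val[g] = mean
        if len(v) > 1:
            # Weighted standard deviation of the overlapping estimates.
            out_sig[g] = np.sqrt(np.sum(wt * (v - mean) ** 2) / np.sum(wt))
        else:
            out_sig[g] = np.sqrt(variances[g][0])
    return grid, out_val, out_sig
```

For data that are exactly cubic every window agrees, so the weighted standard deviation collapses to zero regardless of the measured uncertainties. This shows how such a scheme can replace the experimental uncertainty with a measure of local smoothness.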
This is common, but often the interpolated value will differ from that of the original, despite being located at the same E or k value (Fig. 4). This then does not reflect the real data, and measured outliers are often omitted by the spline, incorrectly improving the reported χ^{2} or χ_{r}^{2}.
4. eFEFFIT
The theoretical spectra are calculated following the photoelectron wave model (Lee & Pendry, 1975; Barton & Shirley, 1985), and expressed as a sum of photoelectron scattering paths through the equation (Zabinsky et al., 1995; Bunker, 2010),
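The path expansion is usually written in the standard form below, given as the textbook EXAFS equation and assumed, rather than confirmed, to match the authors' notation term by term.

```latex
\chi(k) = \sum_{j} \frac{N_{j} S_{0}^{2} F_{j}(k)}{k\, r_{j}^{2}}
  \sin\!\left[ 2 k r_{j} + \phi_{j}(k) \right]
  \exp\!\left( -2 \sigma_{j}^{2} k^{2} \right)
  \exp\!\left( -\frac{2 r_{j}}{\lambda(k)} \right)
```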
where N_{j} is the degeneracy of the path, S_{0}^{2} corresponds to many-body reduction effects, approximated as constant, F_{j}(k) is the backscattering amplitude function, φ_{j}(k) is the phase shift, σ_{j}^{2} is the Debye–Waller factor for thermal movements, λ(k) is the photoelectron mean free path and r_{j} is the half path length, with α being its relative scaling due to thermal expansion.
We introduce a modified version of the IFEFFIT subroutine FEFFIT, called eFEFFIT (error-FEFFIT or error-FEFF fit), which allows experimental uncertainties to be input and propagated. These should be used when determining the fit, as well as in the determination of χ_{r}^{2} [equations (10) and (11)], so as to better reflect the actual data and their significance (Smale et al., 2006),
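From the definitions given in the text, the two expressions take the following form. The symbols χ_{exp}, χ_{th} and σ_{i} are labels chosen here for the experimental value, the modelled value and the propagated uncertainty at point i; the assignment of equation numbers is inferred from the surrounding references and is an assumption.

```latex
\chi_{r}^{2} = \frac{\chi^{2}}{N_{\mathrm{pts}} - N_{\mathrm{vars}}},
\qquad
\chi^{2} = \sum_{i=1}^{N_{\mathrm{pts}}}
  \left[ \frac{\chi_{\mathrm{exp}}(k_{i}) - \chi_{\mathrm{th}}(k_{i})}{\sigma_{i}} \right]^{2}
```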
The numerator of equation (11) is the residual, calculated as the difference between χ_{exp}(k_{i}), the ith experimental data point converted to a function of wavenumber k, and the corresponding theoretical modelled value at that point, χ_{th}(k_{i}). σ_{i} is the associated propagated uncertainty. N_{pts} represents the number of points inside the fitting range, and N_{vars} the number of fitted parameters. The authors plan to distribute eFEFFIT code in the near future.
We recommend not interpolating the experimental data onto a regular grid, but rather interpolating the theoretical model onto the experimental data point array. Otherwise, it is difficult to preserve the information content of the original experiment during interpolation. Experimental data points should of course be taken at least at semi-regular points in k-space; however, this is dependent upon an exact and correct determination of E_{0} prior to data collection, which is generally implausible. Also, there can be a focus on local structure or multiple edges which makes a more uniform scan impractical. If the experimental point density varies greatly across the fitting range, the fit will be dominated by regions of high point density.
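The recommendation can be illustrated with a short sketch in which a finely gridded model is evaluated at the measured k values and compared with the data using the measured uncertainties. This is not eFEFFIT: the function name is hypothetical, and linear interpolation of the model is an assumption that is adequate only when the model grid is much finer than the structure in χ(k).

```python
import numpy as np


def chi_square_on_data_grid(k_exp, chi_exp, sigma_exp, k_model, chi_model, n_vars):
    """Return (chi2, reduced chi2) with the model evaluated at the data points."""
    k_exp = np.asarray(k_exp, dtype=float)
    chi_exp = np.asarray(chi_exp, dtype=float)
    sigma_exp = np.asarray(sigma_exp, dtype=float)
    # The model moves to the data; the data are never regridded.
    model_at_data = np.interp(k_exp, k_model, chi_model)
    resid = (chi_exp - model_at_data) / sigma_exp
    chi2 = float(np.sum(resid ** 2))
    dof = len(k_exp) - n_vars
    if dof <= 0:
        raise ValueError("need more data points than fitted parameters")
    return chi2, chi2 / dof
```

Each measured point enters with its own uncertainty, so accurately measured regions carry correspondingly more weight in the sum.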
5. eFEFFIT, IFEFFIT fitting
Our starting point for structural refinements in this work will be the structure presented in Table 4 of Islam et al. (2015). This represents an intermediate stage of analysis whereby lengths are refined yet the N—Ni—O angle is fixed at 90°. The model includes the carbon rings but omits the H atoms (Fig. 1).
Fitting should be done in k-space as opposed to r-space; otherwise interpolation to a uniform grid is still required for the fast Fourier transform, further compromising the information contained in the measured data points. We perform the fit without any k-weighting to avoid emphasizing different regions of the spectrum. Graphs throughout this work showing a k-weighted spectrum are scaled after the fit.
The theoretical models used are provided via FEFF8, with parameters RPATH = 4.85, being the maximum half scattering-path length, and the maximum number of legs NLEG = 6, applied consistently throughout both this and previous (Islam et al., 2015) work. Table 1 shows the fitted parameters using the previously refined model and conventionally spline-interpolated data, compared with the newly processed χ versus k non-interpolated experimental data. The uncertainties look similar, but the fit might appear worse as χ_{r}^{2} is around double for the `correct' raw data rather than the interpolated form. The interpolation smooths the data and ergo reduces apparent noise; but it does so artificially and hence distorts the spectrum and the apparent fit. Notice that there are small differences in the scale of thermal parameters but that all are physical and the nearest neighbours have a smaller thermal broadening.
‡ Fixed to physical value. § σ^{2} is the thermal broadening parameter for the next shortest 15 photoelectron scattering FEFF paths, set to be 0.002 Å^{2}.
The results obtained previously (Islam et al., 2015), and in Table 1 for comparison, only implemented experimental uncertainties σ in the post-fit calculation of equation (11). Henceforth we implement the eFEFFIT routine to utilize the experimental uncertainties both in determining the fit and in the calculation of χ_{r}^{2}. Table 2 shows the effect of utilizing experimental uncertainties in the fit in addition to using the raw (non-interpolated) data. This maintains the understanding of significance testing with experimental uncertainties, and entirely eliminates any ad hoc estimation of uncertainties. The analysis can then test the validity of theory, model, experimental uncertainty and structure. In Table 2, most parameters only shift a small amount, so the use of raw data and uncertainties might be seen as not so important; however, the quoted uncertainties are generally halved, some shifts are equal to one derived standard error, and the major changes will be seen in Tables 3 and 4.
‡ Fixed to physical value. § σ^{2} is the thermal broadening parameter for the next shortest 15 photoelectron scattering FEFF paths, set to be 0.002 Å^{2}.


Using the full eFEFFIT and non-interpolated data, a third step is now to perform a refinement of the key Ni—N and Ni—O bond lengths and N—Ni—O interatomic angle. Since the approximate geometries of each complex are known, the refinements for the ipr and npr complex will be based on the tetrahedral (TD) and square-planar (SQ) theories, respectively.
The first step is a two-dimensional refinement performed on the key bond lengths, keeping the O—Ni—N angle at 90°, followed by a one-dimensional angular scan using the new bond lengths. The new fitted parameters are presented in Table 3. Only the nitrogen and oxygen atomic positions are being modified while carbon coordinates remain fixed. The χ_{r}^{2} surface is presented in Fig. 5 and indicates the difficulty of the minimization to clearly distinguish the N from the O, as expected. The neighbouring carbon atoms and those in the rest of the molecule also undergo small changes in their positions to compensate for the new nitrogen and oxygen locations.
The χ_{r}^{2} valley is quite shallow (Figs. 5 and 6). The asymmetry of the valley in Fig. 5 indicates a relatively firm positioning of the distance of the nitrogen and oxygen atoms from the nickel centre, while allowing more flexibility as to the interchanging of their relative proximity. Also indicated is a sharper definition of the oxygen location than the nitrogen. A key result of this stage is a more plausible S_{0}^{2}. Another significant consequence is that the bond radii have shifted significantly from the result of the previous fit using spline-interpolated data without uncertainties. Hence the importance of the raw data with uncertainty for any quantitative analysis.
All refinements exhibit small discontinuities in χ_{r}^{2} as a function of bond length and angle (Fig. 6). These occur primarily from two sources. First, changes in key interatomic distances result in the relative contribution of certain paths crossing the threshold for acceptance criteria as set in FEFF. Redefining this parameter simply results in the discontinuities occurring elsewhere, and its omission produces too many unique FEFF paths for the software IFEFFIT to handle. Secondly, altering the bond lengths causes some paths to pass into or out of the maximum path length, also defined in FEFF. An effort to circumvent this artifact was to use only a few three-legged paths of any length, although a change in the position of the minima occurs. However, the overall gradient trend exists over subsections of the scan partitioned by the discontinuities. Hence the location of the minimum is largely unaffected by these small discontinuities.
The uncertainties of the refined bond lengths correspond to the fitted percentage uncertainty given for the α parameter. Uncertainties of the N—Ni—O bond angle are determined by matching the percentage increase in χ_{r}^{2} to that for the bond length uncertainties. This latter method applied to bond lengths (Fig. 5) is in good agreement with the former, more direct method of uncertainty determination. This is consistent with the standard error uncertainty corresponding to an increase in χ^{2} of unity.
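The criterion of a unit increase in χ^{2} can be applied to a one-dimensional scan as sketched below. This is an illustration of the general technique only: the function name is hypothetical, the scan is assumed to be sorted in the parameter, and the crossing is found by linear interpolation between scan points.

```python
import numpy as np


def one_sigma_interval(param, chi2, delta=1.0):
    """Return (best, lower, upper) where chi2 rises by `delta` above its minimum."""
    param = np.asarray(param, dtype=float)
    chi2 = np.asarray(chi2, dtype=float)
    i_min = int(np.argmin(chi2))
    target = chi2[i_min] + delta

    def crossing(indices):
        # Walk away from the minimum until chi2 first reaches the target,
        # then interpolate linearly between the two bracketing scan points.
        prev = i_min
        for i in indices:
            if chi2[i] >= target:
                frac = (target - chi2[prev]) / (chi2[i] - chi2[prev])
                return param[prev] + frac * (param[i] - param[prev])
            prev = i
        return np.nan  # the scan did not extend far enough

    lower = crossing(range(i_min - 1, -1, -1))
    upper = crossing(range(i_min + 1, len(param)))
    return param[i_min], lower, upper
```

Small discontinuities in the scan, such as those discussed above, can move the crossing points, so the interval is best checked against the overall trend of the curve.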
There is no guarantee that the lowest value of χ_{r}^{2} in three-dimensional parameter space has the bond length values determined in the two-dimensional length search. Therefore we now perform a simultaneous three-dimensional refinement with the newly refined bond lengths and angles (Table 4). Indeed we find that the minimum from such a rigorous search is significantly different from that from a two-dimensional and one-dimensional search. The earlier restricted search is common but is not sufficient in a flat valley. Hence restricted minimizations, while necessary, are causes for concern in quantitative analysis. Of course we all have molecules where the full number of independent parameters cannot be fitted as there is insufficient information content in the spectra. The solution is to constrain the physical independent motions to the most significant with chemically meaningful constraints, and to preserve the maximum information content of the data. If needed, of course, collect better data.
Having found suggested structures for each of the complexes, we should wonder whether the structures are simply local minima. Indeed, is the `square-planar' molecule optimized with the data set to be square planar, and is the `tetrahedral' molecule optimized to be tetrahedral? Conversely, might they be indistinguishable by XAFS analysis because the paths, especially two-legged paths, are basically identical?
In other words we need more serious hypothesis testing, by attempting to fit the `square-planar' molecule with the tetrahedral structure, and to fit the `tetrahedral' molecule using the square-planar structure. Can we distinguish the two conformers by XAFS analysis? Hence after refining the structure with the appropriate geometry, each complex is then fitted with the alternate model. The resulting cross-fits are inferior, confirming the correct stereochemistry for each moiety (Table 4, Fig. 7).
The uncertainties presented throughout this manuscript reflect the standard error (s.e.) for the parameter. This has exactly the same meaning as in any error analysis on the assumption that all variates are normal, Gaussian-distributed and independent. Uncertainties were determined in a manner consistent for all parameters and the same as the basic methodology of IFEFFIT. The uncertainties reported are fully consistent with a detailed mapping of χ^{2} and χ_{r}^{2}. In other words, queries should be made where discrepancies between analyses are much more than 3 s.e.
Table 4 shows better defined, more robust bond lengths when a three-dimensional parameter search is performed, especially for the npr complex. Significant shifts in bond length and angle are observed compared with the earlier tests, with the ipr complex gaining a higher level of symmetry. The cross-fits (ipr/SQ and npr/TD) yield larger values of χ_{r}^{2}, indicating that the subtle changes in geometry are correctly modelled by the fits from the data, and that the hypothesis testing is able to distinguish the two conformations. The thermal parameter is significantly larger and less well defined for the square-planar structure than for the tetrahedral structure.
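The difference between a restricted search and a simultaneous three-dimensional search can be illustrated with a toy example. Everything below is hypothetical: `toy_objective` is a correlated quadratic surface standing in for a full fit at each grid node, and its numbers are arbitrary rather than taken from the data.

```python
import itertools
import numpy as np


def grid_search_3d(objective, r_n_values, r_o_values, angle_values):
    """Evaluate objective(r_n, r_o, angle) at every grid node; return the lowest."""
    best = None
    for r_n, r_o, angle in itertools.product(r_n_values, r_o_values, angle_values):
        value = objective(r_n, r_o, angle)
        if best is None or value < best[0]:
            best = (value, r_n, r_o, angle)
    return best


def toy_objective(r_n, r_o, angle):
    # Hypothetical surface with a correlation between r_n and the angle.
    d_n, d_o, d_a = r_n - 2.00, r_o - 1.95, angle - 95.0
    return (d_n / 0.01) ** 2 + (d_o / 0.01) ** 2 + (d_a / 2.0) ** 2 + 50.0 * d_n * d_a


r_n_grid = np.round(np.linspace(1.90, 2.10, 21), 2)
r_o_grid = np.round(np.linspace(1.85, 2.05, 21), 2)
angle_grid = np.linspace(85.0, 105.0, 21)

# Simultaneous search over all three parameters.
full = grid_search_3d(toy_objective, r_n_grid, r_o_grid, angle_grid)
# Restricted search: bond lengths only, with the angle held at 90 degrees.
restricted = grid_search_3d(toy_objective, r_n_grid, r_o_grid, [90.0])
```

On this toy surface the restricted search returns a different `r_n` from the full search, because the correlation term shifts the best bond length whenever the angle is held away from its optimum.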
Table 5 presents results from Islam et al. (2015), showing IFEFFIT-fitted values for cross-fitting of the theories. As before, TD and SQ model structures are based on ipr and npr data, respectively.

Ni—N and Ni—O bond lengths in the previous literature were quite symmetric in both the tetrahedral and square-planar theories, as opposed to our results propagating experimental uncertainties without interpolation (Table 4). Similar trends in the fit parameters are seen. However, the interatomic angle shows a noticeable difference between the theories in our results, which is not as evident in Table 5.
This disagreement with the results in Table 4 shows the significant difference in structure, and ergo the error, resulting from the use of spline interpolation without fitting uncertainties, and illustrates by contrast the considerable changes that occur when the data and uncertainties are processed with a rigorous error-preserving method. It also suggests that using the non-interpolated data to fit the theoretical model allows for a sharper differentiation between the geometries of the individual complexes, and between hypotheses of structure or dynamics in general.
 6. High point density (HPD) XAFS analysis
The discussion above used spectra collected to emphasize high point accuracy so that each point carried insight as to the physical structure of the complexes. With this Hybrid method, independent measurements are made on a range of possible systematic contributions to the signal and each data point should have a small uncertainty. The HPA spectra have some large spacings in regions of k-space which may hide critical information. Relative changes of χ from point to point carry the most important information about structural changes, so a sufficiently fine point spacing to reveal and represent the frequencies corresponding to particular paths and bond distances is important. The HPD spectra are intended to cover all important structural frequencies with uniform stepping in k-space, but with lower pointwise accuracy, very like the standard continuous scans or the standard QuickXAFS scans. The HPD data are stepped in approximately equal steps in k-space, with some increased counting at higher k to better match statistics. It is interesting and useful to see how these choices affect the structural determination and how the non-interpolation is affected by different spacings and uncertainties. Therefore we repeat the last few steps, applying the above logic to the corresponding HPD spectra. This also tests whether the structural conclusions from the HPA spectra are confirmed by appropriate analysis of HPD spectra, or by other measurement cycles for the same sample and structure.
A simultaneous determination of the key interatomic bond distances along with the N—Ni—O angle is therefore carried out by performing a grid search over all three parameters and fitting over the k-range at each point. The fits with the theoretical models are depicted in Fig. 8. The SQ column of Tables 4 and 6 contains set values for certain IFEFFIT XAFS-fitted parameters, namely S_{0}^{2} and E_{0}. This was done to prevent the software from fitting unphysical values, which would also invalidate any cross comparisons. Ideally all parameters might be free, but their correlation and the limited information content prevent this. Hence they should be modelled with chemically and physically plausible restraints and constraints to yield a maximal search through parameter space of the most significant structural parameters. The number of fitted parameters N_{var} [equation (10)] changes by one due to the fixing of a parameter, and χ_{r}^{2} changes by only 1–2%. Hence no conclusions are affected by the small change of N_{var}.

Generally, there is good agreement between the results from the HPA and HPD data sets: the SQ/npr fit yields a higher degree of asymmetry in Ni—N and Ni—O bond lengths than the previous literature (Table 5). The TD/ipr structure also exhibits properties similar to those of the HPA results, with similar Ni—N and Ni—O distances. However, these were found to be inverted in the HPD data, with the nitrogen closer than the oxygen. The amplitude reduction factor, S_{0}^{2}, in the SQ/ipr fit has a slightly large value. The structure found for the square-planar model possesses slightly larger bond lengths than for the HPA data set, with a relatively higher uncertainty on the alpha scaling parameter. Cross-fitting the experimental data with the opposite model structures again produces inferior fits, and again demonstrates that error propagation without interpolation can discriminate between hypotheses, including those involving subtle geometric changes.
Given the stated error on the bond lengths shown in Tables 4 and 6, it is reasonable to question whether the corresponding resolution in r-space is sufficient to separate the peaks from the N and O atoms. However, it must be realized that one only needs to observe a shoulder in the peak of the radial distribution. Such asymmetry implies multiple radii, which can hence be expressed as independent locations to within the given uncertainty. Given a FWHM resolution of approximately 0.2–0.3 Å it is realistic to identify individual peaks of similar magnitude with separations of 0.18, 0.17 and 0.07 Å. We do note that the HPA data for the TD geometry have a separation of only 0.005 Å, and hence we do not conclude which bond is shorter; this particular data set is consistent with the distances being identical.
It is also observed that the HPA data were not able to distinguish a difference in key bond lengths for the tetrahedral arrangement, whilst the HPD data did. This is simply explained by the fact that one of the data sets contained more incisive information relating to this specific question, since the interference waves from each were better defined in the HPD. This shows that one particular method, HPA or HPD, is not 'better' than the other, as it depends on what information is desired. Ideally, one should have the high point density, with each point having the accuracy obtained from the HPA method.
The final Table 4 (HPA) and Table 6 (HPD), while derived from different data sets, should describe the same structure and be consistent within uncertainty: if both are evaluated to accurate levels; if the input data uncertainties are estimated accurately; if the model is a true representation of the structure observed (both the theory and the imputed structure); and if the respective data sets reflect the same structure. In most cases this consistency is clear from the tables, though the uncertainties on any given parameter vary, as they should. One must also consider the level of agreement between the structures resulting from the HPA and HPD data sets for the ipr sample. The HPA ipr data yielded key bond lengths of 2.017 Å and 2.022 Å for the N and O distances, respectively, with a stated uncertainty of 0.005 Å. In general, χ_{r}^{2} should be unity if the model is correct (i.e. perfect theory and exact structure). If it is not unity, then it is possible that the input data uncertainties are incorrect and underestimated by (χ_{r}^{2})^{1/2}, and hence if the model is exactly correct then the resulting uncertainties can commonly be reported as (standard error) × (χ_{r}^{2})^{1/2}. Assuming the uncertainties are complete as stated, the shifts from Table 4 to Table 6 for the N and O distances are 2.017 − 1.985 = 0.032 Å and 2.055 − 2.022 = 0.033 Å, or some 4.1 s.e.; or 2.2 s.e. including (χ_{r}^{2})^{1/2}. Hence the results are plausible and consistent, but do reveal shifts and sensitivities from different data sets.
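The comparison of shifts against standard errors can be sketched numerically. The snippet below is a minimal illustration, assuming that the two uncertainties are independent and add in quadrature, and that both are inflated by the same (χ_{r}^{2})^{1/2} factor when χ_{r}^{2} exceeds unity. The function name and the HPD uncertainty of 0.006 Å are illustrative assumptions, not values taken from Table 6.

```python
import math

def shift_in_standard_errors(value_a, sigma_a, value_b, sigma_b, chi2_r=1.0):
    """Return |a - b| in units of the combined standard error.

    The two uncertainties are added in quadrature. If chi2_r > 1 they are
    first inflated by sqrt(chi2_r), the convention described in the text.
    """
    scale = math.sqrt(chi2_r) if chi2_r > 1.0 else 1.0
    combined = scale * math.hypot(sigma_a, sigma_b)
    return abs(value_a - value_b) / combined

# Ni-N distance for ipr: HPA 2.017 A and HPD 1.985 A (values from the text).
# sigma_hpd = 0.006 A is an ASSUMED illustrative value, not read from Table 6.
n_sigma = shift_in_standard_errors(2.017, 0.005, 1.985, 0.006)
print(f"shift = {n_sigma:.1f} s.e.")
```

Passing a `chi2_r` above unity reduces the apparent significance by its square root, which is the sense in which the text quotes a smaller number of standard errors once (χ_{r}^{2})^{1/2} is included.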
7. Interpolation of experimental data with minimal distortion of information content
We now address the question of how best to preserve the information content of a data set if it must be interpolated onto a linear grid, in this case in k-space. We have argued that in general such a grid is liable to distort features and weight spectral ranges inappropriately, and also to scale noise in the wrong way. How then can we attempt an interpolation onto a regular linear grid in, for example, k-space which will approximately preserve the information content overall, and especially that of each region of the fitting range? Having demonstrated the errors of the popular spline interpolation used in the majority of packages, we first look at the common cubic interpolation, followed by an approach to preserve the information content such that the resulting χ_{r}^{2} remains constant to within some small margin.
For a cubic interpolation, the value of χ(k_{int}) can be determined by fitting a cubic function between the points k_{i} and k_{i+1}, where k_{i} < k_{int} < k_{i+1}. The points at k_{i−1}, k_{i+1} and k_{i}, k_{i+2} are used to estimate the gradient at points k_{i} and k_{i+1}, respectively. These gradients, along with the values of χ(k_{i}) and χ(k_{i+1}), are used to determine the four coefficients of the cubic polynomial, so that χ(k_{int}) may be calculated. A simple linear interpolation is performed on the first and last data intervals. In this manner we preserve all points of the data if the k-grid point is at the same point as the data, and we preserve the local average derivative to yield a smooth curve of interpolation that passes through all the data. This method is also common and useful in preserving a smooth first derivative in the interpolation. In other words, we do not use the standard spline, which smooths and replaces data points. In this cubic interpolation, we can define interpolated uncertainties as in equation (13).
This interpolated uncertainty yields exactly the original uncertainty at the original data points, so in that sense it is information-preserving. This prescription also has the advantage that the predicted uncertainty is the corresponding uncertainty for a single interpolated point.
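As an illustration of the cubic interpolation described above, the following Python sketch uses the cubic Hermite form, which is one standard way to build a cubic from two end values and two end gradients. It is not the mu2chi implementation: the function name `cubic_interpolate` is hypothetical, the gradients are simple secant estimates through the neighbouring points, and the interpolated uncertainties are not computed here.

```python
from bisect import bisect_right

def cubic_interpolate(k, chi, k_int):
    """Interpolate chi at k_int with a cubic through the bracketing points.

    Gradients at k[i] and k[i+1] are estimated from the neighbours
    (k[i-1], k[i+1]) and (k[i], k[i+2]). The first and last intervals
    fall back to linear interpolation, as described in the text.
    """
    if not k[0] <= k_int <= k[-1]:
        raise ValueError("k_int lies outside the measured range")
    # Index of the raw interval [k[i], k[i+1]] that contains k_int.
    i = min(bisect_right(k, k_int) - 1, len(k) - 2)
    h = k[i + 1] - k[i]
    t = (k_int - k[i]) / h
    if i == 0 or i == len(k) - 2:
        return chi[i] + t * (chi[i + 1] - chi[i])
    g0 = (chi[i + 1] - chi[i - 1]) / (k[i + 1] - k[i - 1])
    g1 = (chi[i + 2] - chi[i]) / (k[i + 2] - k[i])
    # Cubic Hermite basis: matches both end values and both end gradients.
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * chi[i] + h10 * h * g0 + h01 * chi[i + 1] + h11 * h * g1

# Non-uniform toy grid; the interpolant passes through every raw point.
k_raw = [2.0, 2.1, 2.3, 2.35, 2.6]
chi_raw = [0.10, 0.05, -0.08, -0.09, 0.02]
print(cubic_interpolate(k_raw, chi_raw, 2.2))
```

Because the Hermite basis reduces to the end values at t = 0 and t = 1, a grid point that coincides with a raw point returns the raw value unchanged, which is the property the text relies on.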
In a given interval between raw data points, interpolation of data will either introduce extra points or fail to place a point at all. Whenever an interval has no interpolated point, there will necessarily be loss of information content. Therefore, there is an implicit expectation or requirement that a sensible interpolation approach should always have at least one point in any range of the original experimental spectrum; that is, that all interpolation approaches ought to have a grid spacing less than or equal to the smallest grid spacing dk_{e} in the experimental data set across the range of interest, or across the range to be fitted.
Unless proper care is taken, the changing number of points in the summation in equation (11) influences χ_{r}^{2}. Even if N_{pts} were to be maintained, the implementation of a regular grid over non-uniform experimental data collection will bias the fit toward different regions of the spectrum. Table 7 shows the significant impact which interpolating the experimental data typically has on the resulting χ_{r}^{2}.
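To make the dependence on the number of points concrete, the sketch below evaluates a reduced χ² in the conventional form: the sum of squared, uncertainty-weighted residuals divided by N_{pts} − N_{var}. That this matches equation (11) term by term is an assumption, and the function name is hypothetical.

```python
def reduced_chi_squared(data, model, sigma, n_var):
    """Sum of ((data - model) / sigma)^2 divided by (N_pts - N_var)."""
    n_pts = len(data)
    if n_pts <= n_var:
        raise ValueError("need more data points than fitted variables")
    total = sum(((d - m) / s) ** 2 for d, m, s in zip(data, model, sigma))
    return total / (n_pts - n_var)

# Three points, each one standard deviation from the model, one fitted variable.
print(reduced_chi_squared([1.1, 2.1, 2.9], [1.0, 2.0, 3.0], [0.1, 0.1, 0.1], 1))
```

Adding interpolated points changes both the numerator (more terms) and the denominator (larger N_{pts}), and the two changes do not cancel unless the uncertainties of the added points are chosen with that in mind.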

Simply put, this interpolation routine should (always) introduce additional points into the data set, but if for example the two end points of a grid remain in the interpolation then any additional point introduced with any uncertainty will appear to add to the information content of the data incorrectly; that is, if the endpoints correctly represent all the information content of the data, then any additional interpolated point requires a weighting of zero to retain information content.
8. Information preservation
Whilst there are obviously many interpolation methods available, none of them (locally) preserves the information content of the data, and certainly nothing of the kind has been applied in the IFEFFIT and ATHENA software, whose interpolation changes data values and data uncertainties and hence is manifestly not information-preserving. The method which we present in this section preserves variance over a range of interpolation and parameter space, which is required for a non-distorted fit to be obtained.
Upon examination of the formula for χ_{r}^{2} it is evident that, in order to preserve χ_{r}^{2}, one must preserve the following quantity for each interpolation interval to remove region bias,
where the residual Res = .
We also define
where N_{var} is the number of IFEFFIT fitted variables (in this case four, i.e. E_{0}, α, and S_{0}^{2}) and N_{pts} is the total number of data points in the range over which IFEFFIT will eventually perform the fit. Therefore, this range should be entered into the mu2chi program, which should be rerun should the desired fitting range change. N_{pts} for each of our four data sets is shown in Tables 4 and 6. The residual is calculated post-fit, and the adjustment to the uncertainties is performed in the earlier mu2chi process. Therefore we now make the assumption that the residual is constant over the interpolation interval, and bring it out of the summation. Hence for each interval, we now aim to satisfy
where N_{nonint} and N_{int} are the number of points present inside the given interpolation interval, for the non-interpolated and interpolated data, respectively. For the non-interpolated data set, the number of points in the interval is taken to be unity, with the contribution to χ_{r}^{2} from the uncertainties of the interval endpoints divided equally between the two respective adjacent intervals. As the residual is assumed to be constant whether interpolated or not, these cancel. Thus one requires
whence
is the sum of the inverse of the uncertainty squared for the local noninterpolated interval.
In order for equation (17) to be true, we observe that the following expression must be unity,
Therefore, following the error bar calculation as per the non-information-preserving approach of equation (13), this value is determined for each interpolation interval. Should this value not be unity, the uncertainties within the interpolation interval are adjusted by altering the functional 'height' of the curve joining the tops of the error bars of the interpolated points. For this purpose, we introduce a parameter β, the incrementing of which allows the fine adjustment of the value of the error bars in a manner depicted in Fig. 9. β is adjusted in steps of 0.001 until either equation (19) is equal to unity within a predefined tolerance (0.01%), or a fail-safe mechanism is triggered after 1000 iterations. This is performed using
When β equals 0.5, this expression reverts to that in equation (13). The process is performed on each interpolation interval.
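The interval-wise condition can be illustrated with a simplified sketch. The procedure in the text adjusts β iteratively; the Python below instead applies a single closed-form scale factor to the interpolated uncertainties of one interval, which is a swapped-in simplification and not the mu2chi algorithm. It assumes the preserved quantity is the sum of inverse variances in the interval divided by N_{pts} − N_{var} of the respective data set, with each raw endpoint sharing its inverse variance equally between its two adjacent intervals. All names are hypothetical.

```python
import math

def rescale_interval_sigmas(sigma_lo, sigma_hi, sigmas_int, dof_raw, dof_int):
    """Scale interpolated uncertainties in one interval to preserve its weight.

    sigma_lo, sigma_hi : uncertainties of the two raw endpoints
    sigmas_int         : trial uncertainties of the interpolated points inside
    dof_raw, dof_int   : N_pts - N_var for the raw and interpolated data sets
    """
    if not sigmas_int:
        raise ValueError("interval has no interpolated point; information is lost")
    # Each endpoint shares its inverse variance equally between two intervals.
    target = 0.5 * (sigma_lo**-2 + sigma_hi**-2) / dof_raw
    current = sum(s**-2 for s in sigmas_int) / dof_int
    # After scaling, current / scale**2 equals target, so the ratio is unity.
    scale = math.sqrt(current / target)
    return [scale * s for s in sigmas_int]

# Two interpolated points in an interval whose endpoints have sigma = 0.01.
print(rescale_interval_sigmas(0.01, 0.01, [0.01, 0.01], dof_raw=10, dof_int=10))
```

A uniform scale factor keeps the relative shape of the trial error bars within the interval, whereas the β adjustment described above changes the height of the curve joining them. The two need not give the same point-by-point uncertainties even though both aim to bring the same ratio to unity.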
Table 8 shows how the information-preservation procedure of mu2chi works to restore χ_{r}^{2} to levels similar to those from fitting with the non-interpolated data. The HPA data have been restored to within 10% of their respective original χ_{r}^{2}. Conversely, the data with a denser point spacing, HPD, now show a χ_{r}^{2} significantly lower than the non-interpolated version, which is too low and does not reflect the original experimental data. This large discrepancy for the corrected HPD data is explained by considering the spacing of the interpolated grid, which in Table 8 was 0.05 Å^{−1}. At this spacing, many pairs of adjacent non-interpolated data points contain no interpolated data point between them, and so a contribution to the summation in equation (11) is lost, falsely lowering the reported χ_{r}^{2}. While the cause might be considered obvious, ensuring that this does not occur for your data is important.

The HPD data were taken via a single monotonic energy sweep, with data acquired at regular k-spacing based on an estimate of E_{0}. This then requires a small amount of interpolation, for which our method is perfectly applicable, with the number of interpolated points approximately equal to the number of points in the raw data. The β value depends on the fractional distance from the endpoints. The monotonic energy sweep in k-space is very similar to the continuous energy scans often used, which may be uniform in angular velocity, energy spacing or in principle k.
The following section examines the effect that changing the interpolation spacing has on the resulting χ_{r}^{2} for all four data sets.
9. Selection of k-spacing of the interpolation grid
Until now, all interpolation has been performed to a k-grid spacing of 0.05 Å^{−1}. Fig. 10 illustrates the percentage change in χ_{r}^{2} due to interpolation, compared with the respective non-interpolated counterpart, for each of the HPA and HPD data sets.
Fig. 10 illustrates that χ_{r}^{2} for the non-corrected cubic interpolated data has increased by around 20–40% in all cases. When using the information-preserving uncertainties, this changes to a difference of only approximately 10% for the HPA data, for all interpolation grids with spacing ≤ 0.05 Å^{−1}; for spacing > 0.05 Å^{−1}, χ_{r}^{2} becomes lower due to missed intervals as described above. This effect is seen again with the higher density HPD data, with χ_{r}^{2} being restored to within 5% for spacing ≤ 0.03 Å^{−1} before dropping rapidly with increasing grid spacing, due to lost intervals; this is an issue for HPD data at a lower k-spacing than for HPA. Hence the interpolation grid spacing should be chosen carefully for a given data set: as long as the interpolation has a finer grid than the raw data, this approach operates reasonably well to preserve information content.
Hence it is recommended that, should interpolation be required, the use of our information-preservation method will go a reasonable way to preserving the χ_{r}^{2} that is representative of the actual (i.e. non-interpolated) data.
Whether the raw data are at largely non-uniform spacing, or almost uniform with or without a fine grid, this approach can be applied successfully prior to a Fourier transform to position space. It should be emphasized that this method is not appropriate to correct for an oversampled spectrum in k-space, as too large a number of intervals will not possess an interpolated point within, and hence the information content cannot be preserved.
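A quick check of a candidate grid against the raw data can flag lost intervals before fitting. The sketch below, with hypothetical function and variable names, counts the raw-data intervals that would receive no interpolated point for a given grid spacing. Treating each interval as half-open and using a small floating-point tolerance are assumptions of this illustration.

```python
import math

def count_empty_intervals(k_raw, k_start, dk):
    """Count raw intervals [k_raw[i], k_raw[i+1]) that hold no grid point.

    The grid is k_start, k_start + dk, k_start + 2*dk, ... and is assumed
    to extend beyond the last raw point.
    """
    empty = 0
    for lo, hi in zip(k_raw, k_raw[1:]):
        # Index of the first grid point at or above lo (with a small tolerance).
        n = math.ceil((lo - k_start) / dk - 1e-9)
        first_at_or_above = k_start + max(n, 0) * dk
        if first_at_or_above >= hi:
            empty += 1
    return empty

# Toy raw grid (in inverse angstroms) with two fine steps and one coarse step.
k_raw = [0.0, 0.02, 0.04, 0.10]
for dk in (0.05, 0.01):
    print(dk, count_empty_intervals(k_raw, 0.0, dk))
```

A non-zero count for the intended fitting range indicates that the grid is too coarse for that data set, in the sense discussed above.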
10. Effects on refined structure
To illustrate how the changes in χ_{r}^{2} due to interpolation affect structural parameters, Table 9 shows the bond lengths and angles corresponding to a minimum χ_{r}^{2} when interpolated data are used, and also when using the data corrected by the above information-preservation method.

In all four data sets, using the information-preserving correction on the interpolated data produces structural parameters closer to those found with the non-interpolated data, demonstrating the benefits of local preservation of information content. A particular success is that of the npr HPA data, where the corrected interpolation significantly corrected the relatively large errors in all three structural parameters incurred using the initial cubic interpolation.
11. Conclusion
We introduce the idea of avoiding distortions of uncertainties and fits by avoiding uniform interpolations in k-space from any normal non-uniform experimental data. We provide code and theory for why this should give more insightful results, especially for hypothesis testing in comparison of χ_{r}^{2} measures or any other goodness-of-fit measure.
Correct propagation of information content gained from high-accuracy experimental techniques throughout the analysis procedure has been demonstrated, and by doing so we have been able to determine new information on the local structures of Ni complexes. The high-accuracy data (HPA) yield nickel bond lengths of 2.017 ± 0.006 Å and 2.022 ∓ 0.006 Å to the nitrogen and oxygen, respectively, and an interatomic angle of 85.12 ± 2.0° for the ipr Ni complex, suggesting a more highly symmetrized tetrahedral arrangement than prior determinations. In contrast, this analysis suggests a more skewed square-planar geometry for the npr Ni complex, with corresponding structural parameters of 2.133 ± 0.004 Å, 1.960 ∓ 0.003 Å and 88.7 ± 3.0° for the Ni—N and Ni—O lengths and the N—Ni—O angle, respectively. A second method of data collection, using a single energy sweep much like a continuous scan, provided similar accuracies and insight and is consistent within the defined uncertainty.
We investigate the possibility of deriving uncertainties for a uniform grid and explain some of the difficulties and challenges, including a necessary loss of information and the possibility of missing interpolation intervals. We demonstrate that any such method must yield non-uniform uncertainties even for a uniform grid in k-space.
Furthermore we have illustrated the effect interpolation has on the resulting χ_{r}^{2}, and the magnitude of some consequent errors. We demonstrated that an information-preservation algorithm in our mu2chi code, which uses an interval-wise scaling of uncertainties to conserve the local contribution to the final χ_{r}^{2}, can be quite effective given care over the interpolation spacing relative to the original data.
The methods presented here are important and applicable to all data, whether non-uniform or uniform with an offset, and to any data needing interpolation prior to a Fourier transform. Our recommendation is that using the original data uncertainties and avoiding interpolation, and therefore fitting in k-space, is the most incisive way of hypothesis testing with data. Should transforms be required in a processing environment, it is recommended that a cubic interpolation (not a spline) be used, as it can preserve the interpolation endpoints and derivatives, and that our presented local information-preserving algorithm be used to provide uncertainties that maintain the potential for hypothesis testing. This fully applies to any continuous scan with or without an offset requiring a reinterpolation; or to any continuous scan attempting to have a uniform drive speed in angle or energy; or to any stepwise scans uniform in θ, E, k or otherwise.
Supporting information
Code and manual for software. DOI: https://doi.org/10.1107/S1600577518006549/xj5011sup1.zip
Acknowledgements
The authors would like to acknowledge M. T. Islam et al. for the collection of the experimental data used in this work. Also thanks to L. F. Smale for his valuable assistance with the software, and for providing the eFEFFIT subroutine.
References
Barnea, Z., Chantler, C. T., Glover, J. L., Grigg, M. W., Islam, M. T., de Jonge, M. D., Rae, N. A. & Tran, C. Q. (2011). J. Appl. Cryst. 44, 281–286. Web of Science CrossRef CAS IUCr Journals Google Scholar
Barton, J. J. & Shirley, D. A. (1985). Phys. Rev. B, 32, 1892–1905. CrossRef Web of Science Google Scholar
Britton, D. & Pignolet, L. H. (1989). Acta Cryst. C45, 819–821. CSD CrossRef CAS Web of Science IUCr Journals Google Scholar
Bunker, G. (2010). Introduction to XAFS: A Practical Guide to Xray Absorption Fine Structure Spectroscopy. Cambridge University Press. Google Scholar
Calvin, S. (2013). XAFS for Everyone. Bosa Roca: CRC Press. Google Scholar
Chan, M. S. W., Deng, L. & Ziegler, T. (2000). Organometallics, 19, 2741–2750. Web of Science CrossRef Google Scholar
Chantler, C. T. (2009). Eur. Phys. J. Spec. Top. 169, 147–153. Web of Science CrossRef Google Scholar
Chantler, C. T. (2010). Radiat. Phys. Chem. 79, 117–123. Web of Science CrossRef CAS Google Scholar
Chantler, C. T., Barnea, Z., Tran, C. Q., Tiller, J. & Paterson, D. (1999). Opt. Quant. Elect. 31, 495–505. Web of Science CrossRef CAS Google Scholar
Chantler, C. T., Islam, M. T., Best, S. P., Tantau, L. J., Tran, C. Q., Cheah, M. H. & Payne, A. T. (2015). J. Synchrotron Rad. 22, 1008–1021. Web of Science CrossRef CAS IUCr Journals Google Scholar
Chantler, C. T., Rae, N. A., Islam, M. T., Best, S. P., Yeo, J., Smale, L. F., Hester, J., Mohammadi, N. & Wang, F. (2012). J. Synchrotron Rad. 19, 145–158. Web of Science CrossRef CAS IUCr Journals Google Scholar
Chantler, C. T., Tran, C. Q., Barnea, Z., Paterson, D., Cookson, D. J. & Balaic, D. X. (2001). Phys. Rev. A, 64, 062506. Web of Science CrossRef Google Scholar
Chantler, C. T., Tran, C. Q., Paterson, D., Barnea, Z. & Cookson, D. J. (2000a). Xray Spectrom. 29, 449–458. CrossRef Google Scholar
Chantler, C. T., Tran, C. Q., Paterson, D., Cookson, D. J. & Barnea, Z. (2000b). Xray Spectrom. 29, 459–466. Web of Science CrossRef Google Scholar
Dent, A. J., Stephenson, P. C. & Greaves, G. N. (1992). Rev. Sci. Instrum. 63, 856–858. CrossRef CAS Web of Science Google Scholar
Eisenberger, P. & Kincaid, B. (1978). Science, 200, 1441–1447. CrossRef CAS PubMed Web of Science Google Scholar
Filipponi, A. (1995). J. Phys. Condens. Matter, 7, 9343–9356. CrossRef CAS Web of Science Google Scholar
Filipponi, A. & Di Cicco, A. (1995). Phys. Rev. B, 52, 15135–15149. CrossRef CAS Web of Science Google Scholar
Fox, M. R., Orioli, P. L., Lingafelter, E. C. & Sacconi, L. (1964). Acta Cryst. 17, 1159–1166. CSD CrossRef IUCr Journals Web of Science Google Scholar
Glover, J. L. & Chantler, C. T. (2009). Xray Spectrom. 38, 510–512. Web of Science CrossRef CAS Google Scholar
Glover, J. L., Chantler, C. T., Barnea, Z., Rae, N. A., Tran, C. Q., Creagh, D. C., Paterson, D. & Dhal, B. B. (2008). Phys. Rev. A, 78, 052902. Web of Science CrossRef Google Scholar
Islam, M. T., Best, S. P., Bourke, J. D., Tantau, L. J., Tran, C. Q., Wang, F. & Chantler, C. T. (2016). J. Phys. Chem. C, 120, 9399–9418. Web of Science CrossRef Google Scholar
Islam, M. T., Chantler, C. T., Cheah, M. H., Tantau, L. J., Tran, C. Q. & Best, S. P. (2015). J. Synchrotron Rad. 22, 1475–1491. Web of Science CrossRef IUCr Journals Google Scholar
Islam, M. T., Tantau, L. J., Rae, N. A., Barnea, Z., Tran, C. Q. & Chantler, C. T. (2014). J. Synchrotron Rad. 21, 413–423. Web of Science CrossRef CAS IUCr Journals Google Scholar
Jonge, M. D. de, Tran, C. Q., Chantler, C. T., Barnea, Z., Dhal, B. B., Cookson, D. J., Lee, W.K. & Mashayekhi, A. (2005). Phys. Rev. A, 71, 032702. Google Scholar
Jonge, M. D. de, Tran, C. Q., Chantler, C. T., Barnea, Z., Dhal, B. B., Paterson, D., Kanter, E. P., Southworth, S. H., Young, L., Beno, M. A., Linton, J. A. & Jennings, G. (2007). Phys. Rev. A, 75, 032702. Google Scholar
Koningsberger, D., Mojet, B., van Dorssen, G. & Ramaker, D. (2000). Top. Catal. 10, 143–155. Web of Science CrossRef Google Scholar
Ladd, M. F. C. & Palmer, R. A. (1977). Structure Determination by Xray Crystallography. New York: Plenum Press. Google Scholar
Lee, P. A. & Pendry, J. B. (1975). Phys. Rev. B, 11, 2795–2811. CrossRef CAS Web of Science Google Scholar
Lu, J., Zhang, D., Chen, Q. & Yu, B. (2011). Front. Chem. Sci. Eng. 5, 19–25. Web of Science CrossRef Google Scholar
Mazzara, C., Jupille, J., Flank, A.M. & Lagarde, P. (2000). J. Phys. Chem. B, 104, 3438–3445. Web of Science CrossRef CAS Google Scholar
Newville, M. (2001). J. Synchrotron Rad. 8, 322–324. Web of Science CrossRef CAS IUCr Journals Google Scholar
Newville, M., Boyanov, B. I. & Sayers, D. E. (1999). J. Synchrotron Rad. 6, 264–265. Web of Science CrossRef CAS IUCr Journals Google Scholar
O'Day, P. A., Rehr, J. J., Zabinsky, S. I. & Brown, G. E. J. (1994). J. Am. Chem. Soc. 116, 2938–2949. Google Scholar
Ravel, B. & Newville, M. (2005). Phys. Scr. T115, 1007. CrossRef Google Scholar
Sayers, D. E., Stern, E. A. & Lytle, F. W. (1971). Phys. Rev. Lett. 27, 1204–1207. CrossRef CAS Web of Science Google Scholar
Smale, L., Chantler, C. T., de Jonge, M., Barnea, Z. & Tran, C. (2006). Radiat. Phys. Chem. 75, 1559–1563. Web of Science CrossRef Google Scholar
Stern, E. A. (1993). Phys. Rev. B, 48, 9825–9827. CrossRef CAS Web of Science Google Scholar
Tantau, L. J., Chantler, C. T., Bourke, J. D., Islam, M. T., Payne, A. T., Rae, N. A. & Tran, C. Q. (2015). J. Phys. Condens. Matter, 27, 266301. Web of Science CrossRef PubMed Google Scholar
Tran, C. Q., Chantler, C. T., Barnea, Z., de Jonge, M. D., Dhal, B. B., Chung, C., Paterson, D. & Wang, J. (2005). J. Phys. B, 38, 89–107. Web of Science CrossRef Google Scholar
Tran, C. Q., Chantler, C. T., Barnea, Z., Paterson, D. & Cookson, D. J. (2003). Phys. Rev. A, 67, 042716. Web of Science CrossRef Google Scholar
Tran, C. Q., de Jonge, M. D., Barnea, Z. & Chantler, C. T. (2004). J. Phys. B, 37, 3163–3176. Web of Science CrossRef CAS Google Scholar
Westre, T. E., Di Cicco, A., Filipponi, A., Natoli, C. R., Hedman, B., Solomon, E. I. & Hodgson, K. O. (1995). J. Am. Chem. Soc. 117, 1566–1583. CrossRef CAS Web of Science Google Scholar
Zabinsky, S. I., Rehr, J. J., Ankudinov, A., Albers, R. C. & Eller, M. J. (1995). Phys. Rev. B, 52, 2995–3009. CrossRef CAS Web of Science Google Scholar
This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.