bioXAS and metallogenomics
Bayes–Turchin approach to XAS analysis

aDepartment of Physics, University of Washington, Seattle, WA 98195, USA, bUniversity of California, Santa Cruz, CA 95064, USA, and cHahn-Meitner-Institut, D-14109 Berlin, Germany
*Correspondence e-mail: jjr@phys.washington.edu
Modern analysis of X-ray absorption fine structure (XAFS) is usually based on a traditional least-squares fitting procedure. Here an alternative Bayes–Turchin method is discussed which has a number of advantages. In particular, the method takes advantage of a priori estimates of the model parameters and their uncertainties, and avoids both the restriction on the size of the model parameter space and the necessity for Fourier filtering. Thus the method permits the analysis of the full X-ray absorption spectrum (XAS), including both the extended X-ray absorption fine structure (EXAFS) and the X-ray absorption near-edge structure (XANES). The approach leads to a set of linear equations for the model parameters, which are regularized using the `Turchin condition'. Also, the method naturally partitions parameter space into relevant and irrelevant subspaces, which are spanned by the experimental data or the a priori information, respectively. Finally, we discuss how the method can be applied to the analysis of XANES spectra based on fits of experimental data to full multiple-scattering calculations. An illustrative application yields reasonable results even for very short data ranges.
Keywords: XAS; XANES; Bayesian analysis.
1. Introduction
The goal of X-ray absorption spectra (XAS) analysis is to extract various experimental parameters, e.g. interatomic distances R, coordination numbers N, vibrational amplitudes etc. This requires the solution to an inverse problem, i.e. the determination of the physical parameters, which we denote by a vector x = (x1, x2, ..., xN), that best fit a model spectrum y(x) to a set of K measurements ȳk, k = 1, ..., K, i.e. by inverting the relation ȳ = y(x). Most modern approaches for the analysis of X-ray absorption fine structure (XAFS) are based on standard least-squares fitting algorithms (Lytle et al., 1988; Newville et al., 1995), either in k- or R-space, where k = [2m(E − E0)]1/2/ħ is the photoelectron wavenumber measured from threshold and R is its Fourier complement. In this approach one minimizes the normalized mean square error,

χ² = (1/K) Σk [ȳk − yk(x)]²/εk²,
with respect to each model parameter xi, where εk denotes the experimental error at each data point k. A number of software packages for this purpose are available, e.g. FEFFIT (Newville, 2001), EXAFSPAK (George & Pickering, 2000), EXCURVE (Binsted, Campbell et al., 1991; Binsted, Strange et al., 1991) and GNXAS (Filipponi & Di Cicco, 1995). The least-squares approach has also been applied recently to the analysis of X-ray absorption near-edge structure (XANES) using the MXAN package (Benfatto et al., 2001).
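As a concrete illustration, the normalized mean-square misfit minimized in these least-squares approaches can be sketched in a few lines of Python; the data values below are illustrative only, not taken from any real spectrum:

```python
# Minimal sketch of the normalized mean-square error of least-squares
# XAFS fitting: chi2 = (1/K) * sum_k [(ybar_k - y_k)/eps_k]**2.
# All numbers here are illustrative.

def chi_square(y_exp, y_model, eps):
    """Normalized mean-square misfit between experiment and model."""
    K = len(y_exp)
    return sum(((ye - ym) / e) ** 2
               for ye, ym, e in zip(y_exp, y_model, eps)) / K

y_exp   = [0.10, 0.08, -0.02, -0.05]   # measured fine structure (toy)
y_model = [0.11, 0.07, -0.01, -0.06]   # model spectrum (toy)
eps     = [0.01] * 4                   # uniform experimental error

print(chi_square(y_exp, y_model, eps))  # ~1 when the misfit matches eps
```

A value near unity indicates a fit consistent with the assumed errors.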
There are a number of problems with the least-squares approach, however. First, it is rarely obvious a priori which of the model parameters xi, or how many of them, can be included in the fits. Indeed, there are usually many more model parameters than can be represented by the data, which leads to an ill-conditioned inverse problem, as discussed below. The problem is typically avoided in practice by Fourier filtering and a judicious selection of model parameters. For example, the Nyquist criterion Np = 2ΔkΔR/π gives an estimate of the number of independent variables that span a given range Δk and ΔR in k- and R-space. This procedure, however, depends on user experience and judgment, and hence is difficult to automate. Also, the effects of systematic errors both in theory and experiment are not easily taken into account. Moreover, many parameters are strongly correlated, leading to large error bars in the results without the imposition of appropriate constraints.
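The Nyquist criterion Np = 2ΔkΔR/π can be evaluated directly; the ranges used below are illustrative, typical of an EXAFS fit:

```python
import math

def nyquist_parameters(dk, dr):
    """Nyquist estimate Np = 2*dk*dr/pi of the number of independent
    parameters supported by a k-range dk (Angstrom^-1) and an
    R-window dr (Angstrom)."""
    return 2.0 * dk * dr / math.pi

# Illustrative EXAFS ranges: dk = 12 Angstrom^-1, dr = 4 Angstrom
np_est = nyquist_parameters(12.0, 4.0)
print(round(np_est))  # about 31 independent parameters
```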
To address these problems we discuss an alternative method which permits the analysis of the full XAS spectrum, including the XANES and/or the EXAFS. The method is based on the Bayes–Turchin approach introduced into EXAFS analysis by Krappe & Rossner (2000, 2002). The approach has several advantages over traditional data-analysis procedures. In particular, the method avoids the restriction on the size of the model parameter space by making use of Bayes' theorem. Given suitable restraints based on a priori parameters and their errors, the Bayes–Turchin method leads to a stable set of linear equations for the model parameters. Moreover, the approach naturally partitions parameter space into relevant and irrelevant subspaces, which are determined primarily by the experimental data or the a priori information, respectively. Thus only relevant parameters in a given model parameter space are affected by a fit, while irrelevant parameters are essentially pinned at their a priori values.

A principal aim of this paper is to show that this approach is also advantageous for the analysis of XANES spectra. This requires a treatment of the smooth atomic background absorption as well as the fine structure, together with fast calculations of full multiple-scattering theory as in the FEFF8 code (Ankudinov et al., 1998), which are now possible using parallel algorithms (Ankudinov et al., 2002). Such an approach is important, e.g. in the analysis of biostructures such as metalloproteins. For such systems the XANES often contains the best signal-to-noise ratio and is easiest to collect experimentally. However, the XANES region is typically eliminated by Fourier filtering in conventional analyses.
The remainder of this paper is organized as follows. In §2 we discuss the problems with the least-squares method, and in §3 we outline the Bayes–Turchin approach. The method is illustrated with an application to the XANES of GeCl4 in §4. Finally, §5 contains some conclusions.
2. Least-squares analysis
We begin by showing formally that the usual least-squares approach leads to an ill-conditioned inverse problem (Krappe & Rossner, 2002). Assuming that the theoretical model yk(x) for the XAS is a smooth function of the parameters x = (x1, x2, ..., xN) near their true values x̄, one can approximate the model near any data point as

yk(x) ≈ yk(x̄) + Σi (∂yk/∂xi) xi,
where xi denotes a normalized residual, e.g. xR = (R − R̄)/ξR, and ξR is a nominal scaling factor chosen so that xR is of order unity in fits. Inserting this approximation into the definition of χ², we obtain

χ² = (1/K) |s − Σi gi xi|²,

where the vector s denotes the signal components,

sk = [ȳk − yk(x̄)]/εk.

Here u·v = Σk uk vk represents the dimensionless dot product in data space, i.e. |v|² = v·v, and gik are the components of the normalized model gradients gi,

gik = (1/εk) ∂yk/∂xi.

With these definitions, minimizing χ² with respect to the parameters xi leads to a set of N linear equations,

Σj Qij xj = bi,
where Qij are components of the N×N information matrix Q,

Qij = (1/K) gi·gj,
and bi are the normalized signal components,

bi = (1/K) gi·s.
Since the dot product is invariant under coordinate rotations, the information matrix does not depend on the space (e.g. k or R) in which it is evaluated.
To show that these equations are ill-conditioned, consider the eigenvalues qn of the symmetric matrix Q, listed in order of decreasing magnitude. Since Q is directly related to the model gradients, it is clear that parameters on which the model depends strongly correspond to large eigenvalues and, conversely, those which have little effect on the model correspond to small eigenvalues. The conditioning number ZQ is defined as the ratio q1/qN, which can grow large as the number of parameters is increased. For the applications to EXAFS discussed by Krappe & Rossner (2002), ZQ can be up to 10^16. Thus in the eigen space spanned by the eigenvectors of the matrix Q, the solutions to the linear equations are formally given by

xn = bn/qn.
However, the signal components bn are generally limited by the experimental noise and only include one factor of the model gradients, while the eigenvalues qn involve a product of two gradients and can be much smaller. As a result of noise, the weak eigen-components are therefore intrinsically unstable, and the standard least-squares approach breaks down as the number of parameters is increased, necessitating some form of regularization. Various methods of regularization exist, e.g. Wiener filtering (Bijaoui, 2002); however, many of these are more or less ad hoc. In the next section we show how the Bayesian approach naturally regularizes the inversion.
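The instability can be made concrete with a toy two-parameter example (not from the paper): two nearly collinear model gradients produce an information matrix with a huge conditioning number, so noise along the weak eigenvector is strongly amplified when solving the linear equations:

```python
# Toy illustration: nearly collinear model gradients g1, g2 give an
# information matrix Q with one tiny eigenvalue, so the least-squares
# solution of Q x = b amplifies noise along the weak eigenvector.

g1 = [1.0, 0.0]
g2 = [1.0, 1e-3]          # almost parallel to g1

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

Q = [[dot(g1, g1), dot(g1, g2)],
     [dot(g2, g1), dot(g2, g2)]]

# Eigenvalues of the symmetric 2x2 matrix Q from trace/determinant
tr = Q[0][0] + Q[1][1]
det = Q[0][0] * Q[1][1] - Q[0][1] * Q[1][0]
disc = (tr * tr / 4.0 - det) ** 0.5
q_big, q_small = tr / 2.0 + disc, tr / 2.0 - disc

# Conditioning number Z_Q = q1/qN grows as the gradients become collinear
Z_Q = q_big / q_small
print(Z_Q)  # ~4e6 for this example
```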
3. Bayes–Turchin approach
The Bayes–Turchin approach of Krappe & Rossner (2002) addresses the ill-posed nature of the inverse problem by making use of known or a priori information on the model parameters x. Here we will only summarize the key results, as the original papers provide much more detail. For simplicity we will assume that the parameters xi have Gaussian distributions, which may be correlated, i.e. they are characterized by the a priori probability distribution,

Papr(x) = (1/N) exp[−ΩA(x)],
where N is a normalization factor and ΩA is a quadratic form,

ΩA(x) = (1/2) Σij xi Aij xj,
with a kernel A determined by the inverse of the cross-correlation matrix,

(A−1)ij = ⟨xi xj⟩.
Thus, for example, the a priori variance of the coordinate R yields (A−1)R,R = ⟨xR²⟩.
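A minimal sketch of how the kernel A and the quadratic form ΩA might be constructed from assumed a priori variances and a correlation coefficient (all numerical values below are illustrative, not from the paper):

```python
# Sketch: build the a priori kernel A as the inverse of a 2x2
# cross-correlation matrix <x_i x_j> for two normalized residuals,
# e.g. x_R and x_E0.  Variances and correlation are illustrative.

var_R, var_E0, corr = 0.5, 1.0, 0.3
C = [[var_R, corr * (var_R * var_E0) ** 0.5],
     [corr * (var_R * var_E0) ** 0.5, var_E0]]

# Invert the 2x2 correlation matrix analytically
detC = C[0][0] * C[1][1] - C[0][1] * C[1][0]
A = [[ C[1][1] / detC, -C[0][1] / detC],
     [-C[1][0] / detC,  C[0][0] / detC]]

def omega_A(x):
    """Quadratic form Omega_A(x) = (1/2) sum_ij x_i A_ij x_j."""
    return 0.5 * sum(x[i] * A[i][j] * x[j]
                     for i in range(2) for j in range(2))

# The a priori distribution penalizes deviations from x = 0
print(omega_A([0.0, 0.0]))   # 0.0: no penalty at the a priori values
```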
Secondly, as discussed by Krappe & Rossner (2000), the method explicitly takes into account theoretical errors by considering the dependence of the model yk[x, z] on both the model parameters and on the theoretical parameters, e.g. the phase shifts, scattering amplitudes, mean free paths etc.; these are denoted by a dimensionless vector z = (z1, z2, ...), the components of which may also depend on the model parameters x.
The incorporation of the a priori information is carried out using Bayes' theorem for the conditional probability distribution, given the a priori information. This yields the a posteriori probability distribution for the model parameters x given the data ȳ,

P(x|ȳ) = P(ȳ|x) Papr(x)/P(ȳ).
Here the conditional probability is

P(ȳ|x) ∝ exp[−(K/2) χ²(x)],

where in tensor notation (i.e. writing u ε v = Σkk′ uk εkk′ vk′),

χ²(x) = (1/K) [ȳ − y(x)] ε−1 [ȳ − y(x)].
Here the K×K error matrix ε appears,

εkk′ = εk² δkk′ + εtrunc,kk′ + Σnn′ (∂yk/∂zn) ⟨δzn δzn′⟩ (∂yk′/∂zn′),

which characterizes the errors in the theory. In particular, εtrunc characterizes the truncation error arising, for example, from a finite cluster size of the model, while the last term involves errors in the theoretical parameters (e.g. the phase shifts) and their gradients. If the off-diagonal terms are neglected, the effect of ε is to renormalize the mean-square error at each point k by the sum of the squares of the experimental, model and truncation errors.
As a result, one obtains an a posteriori distribution given by

P(x|ȳ) ∝ exp[−Φ(x)],

where Φ(x) = (K/2)χ²(x) + ΩA(x) is given by the quadratic form,

Φ(x) = (K/2) Σij xi (Q′ij + Aij/K) xj − K Σi b′i xi + const.
The a posteriori expectation values of the model parameters,

⟨xi⟩ = ∫ xi P(x|ȳ) dN x,

are obtained by minimizing Φ with respect to xi. This yields the regularized linear equations

Σj (Q′ij + Aij/K) xj = b′i,
where the renormalized information matrix is

Q′ij = (1/K) Σkk′ (∂yk/∂xi) (ε−1)kk′ (∂yk′/∂xj),
and the renormalized signal coefficients are given by

b′i = (1/K) Σkk′ (∂yk/∂xi) (ε−1)kk′ [ȳk′ − yk′(x̄)].
Clearly the a priori information in the matrix A regularizes the inversion, since in the eigen-space of Q′ the solutions are then always stable. That is, in the basis where A is diagonal with eigenvalues an, the formal solutions for the model parameters are

xn = b′n/(qn + an/K),

which is well behaved even when qn → 0.
Moreover, it is seen that the a priori information naturally partitions the data into relevant and irrelevant subspaces, where

qn ≫ an/K (relevant), qn ≪ an/K (irrelevant).

Thus only the parameters in the relevant subspace are significantly fit by the data, while the parameters in the irrelevant subspace do not deviate significantly from their a priori values. An important finding is that the dimension NR of the relevant subspace is significantly smaller than that given by the Nyquist criterion, i.e. NR < Np = 2ΔkΔR/π, since NR takes into account the effects of experimental noise and systematic error.
These conditions can be satisfied by setting an = α for an appropriate cut-off parameter α (Krappe & Rossner, 2000). In particular, two methods for determining α were introduced by Turchin et al. (1971). The first fixes α such that the effective number of degrees of freedom in the fit is consistent with the information content of the data. This criterion ensures that the information in the data is not distorted by a priori information. A second criterion, for situations when one would like to decrease the effect of the a priori information, gives a smaller value of α.
4. XAS analysis
4.1. EXAFS analysis
In the original papers of Krappe & Rossner (2000, 2002), the Bayes method was applied to the analysis of EXAFS data of Cu metal based on a theoretical model using the multiple-scattering path expansion and the FEFF8 code. In an initial test, a large model parameter space of dimension N = 158 was used, including the variables S02, E0, and a large set of structural and vibrational parameters. Of these, only a small number turn out to be important. The dimension of the relevant subspace for this model was found to be NR = 21 from the Turchin cut-off and NR = 40 from the weaker cut-off, both much smaller than the Nyquist estimate, Np = 55. Subsequent investigations for a variety of systems showed that the approach is very robust. In these investigations, the vibrational information was fit to a small set of spring constants using an efficient theoretical model (Poiarkova & Rehr, 1999), which is considerably more economical than fitting pairwise Debye–Waller factors. The third cumulants were also included as model parameters.
4.2. XANES analysis
Here we have attempted to extend the Bayes–Turchin method to XANES analysis. The near edge requires a different fitting procedure for several reasons. First, the measured absorption μ(E) contains a background contribution μb(E) (from the other edges) as well as the atomic background μ0(E) from a given edge, in addition to the fine structure χ(E), all of which must be taken into account,

μ(E) = μb(E) + μ0(E)[1 + χ(E)].
Second, the path expansion for the fine structure does not always converge well in the XANES region, thus often necessitating full multiple-scattering calculations, which do not have a simple analytical form. Thus numerical gradients of the model with respect to the model parameters are needed,

∂yk/∂xi ≈ [yk(x + δêi) − yk(x)]/δ,

where êi is the ith unit vector and δ is a small step. To fit the atomic background, we have adopted a new Bayesian background-subtraction procedure introduced by Krappe & Rossner (2004). In this method, a correction is added to an a priori background absorption μ0(E) from FEFF, parameterized by a small set of spline parameters (c1, ..., cT), where in our work T is typically 3–5. Related Bayesian background-subtraction algorithms have also been developed by Klementev (2001).
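The forward-difference gradient described above can be sketched as follows; the stand-in model, parameter values and step size are illustrative assumptions, not the FEFF8 interface:

```python
import math

def model(x):
    """Stand-in for a full multiple-scattering XANES calculation
    (an illustrative Gaussian, not FEFF8): spectrum at 5 grid points."""
    return [math.exp(-((k - x[0]) ** 2) / x[1]) for k in range(5)]

def numerical_gradient(f, x, i, delta=1e-6):
    """Forward difference along the i-th unit vector e_i:
    [f(x + delta*e_i) - f(x)] / delta, one value per data point."""
    xp = list(x)
    xp[i] += delta
    f0, f1 = f(x), f(xp)
    return [(a - b) / delta for a, b in zip(f1, f0)]

x0 = [2.0, 1.5]                        # illustrative parameter values
g0 = numerical_gradient(model, x0, 0)  # d(model)/d(x[0]) at each point
print(len(g0))  # 5: one derivative per data point
```

In practice each gradient evaluation costs one extra full spectrum calculation per parameter, which is why fast parallel multiple-scattering codes matter here.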
4.3. Results
The above formalism was tested using high-quality gas-phase GeCl4 data (Bouldin, 1990) and the FEFF8 code. The model parameters included E0, R and between three and five spline parameters ct. Thus this example, with its rather restricted parameter set, should be viewed as a preliminary step towards more extensive future analysis. The pre-edge background was isolated using the ATHENA package (Ravel & Newville, 2005) and subtracted from the data prior to fitting. The a priori error values of these few parameters were found to have little effect on the fit; that is, all of these parameters were found to be relevant. However, such a priori estimates are generally expected to become important when additional parameters, such as structural distortions and spring constants, are added.
Our results are illustrated in Fig. 1, which shows the result of a fit in k-space of the XANES of GeCl4 from threshold to k = 3.89 Å−1, i.e. 11013–11170 eV. Clearly the fit is qualitatively satisfactory, and leads to a reduced chi-square χν² ≃ 19, where χν² = (K/ν)χ² and ν = K − 7 is the number of data points minus the number of model parameters, i.e. the number of degrees of freedom. The experimental error εk at all the data points was taken to be 0.01, based on the observed fluctuations in the tail of the experimental spectrum. Theoretical errors were not included, but are typically larger than the experimental errors, as discussed by Krappe & Rossner (2002). As seen in Fig. 1, the theoretical error is quite large near 11120 eV, probably owing to the neglect of non-spherical corrections and many-electron excitations in the theory. This systematic error is likely to be the primary source of error in the fit. The a priori and fitted values of the parameters R and the threshold level E0 are shown in Table 1.
An important question to be addressed by these preliminary results is how the fitted parameters vary as the data range is reduced. Remarkably, the fit is found to be reasonably stable, even for very small data ranges. Results for the near-neighbor distance R with respect to the data range [0, kmax] for various values of the maximum wavevector kmax are given in Table 2. Note that the results for the fitted distances R tend toward the experimental value 2.113 Å as the k-range is increased, while the errors tend to decrease. The errors in the fit parameters are still not well determined, owing to the lack of a treatment of the theoretical errors, which appear to dominate the fit. The r.m.s. errors in R, obtained from [(Q−1)R,R]1/2, are typically less than about 0.03 Å. However, since our fits did not include the theoretical error, these errors are underestimates.
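For illustration, extracting an r.m.s. parameter error from the diagonal of the inverse information matrix is a short calculation; the 2×2 matrix below is invented for the example, not taken from the GeCl4 fit:

```python
# Sketch: the r.m.s. uncertainty of a fitted parameter follows from
# the diagonal of the inverse information matrix,
# sigma_R = [(Q^-1)_RR]**0.5, here for an illustrative 2x2 Q.

Q = [[400.0, 120.0],
     [120.0, 250.0]]

detQ = Q[0][0] * Q[1][1] - Q[0][1] * Q[1][0]
Qinv_RR = Q[1][1] / detQ          # (Q^-1)_00 for a 2x2 matrix

sigma_R = Qinv_RR ** 0.5
print(sigma_R)   # r.m.s. error of the (normalized) residual x_R
```

The off-diagonal element shows why correlations matter: it inflates (Q−1)RR relative to the uncorrelated value 1/Q_RR.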
5. Conclusions
We have found that the Bayes–Turchin approach can be applied to the analysis of XANES data, even over short experimental data ranges. The spline correction to the atomic background is found to be stable and thus provides a useful procedure to improve background subtraction in XAS.
Such developments are important, e.g. in the analysis of biostructures such as metalloproteins. For such systems the Bayes–Turchin approach has many advantages compared with conventional least-squares fitting methods. In particular, the method avoids the restriction on the size of the model parameter space. Moreover, the method can take advantage of a priori estimates of model parameters, as well as their uncertainties and correlations, thus improving the significance of fits. Thus the method can perhaps be automated and has the potential to provide a smart black-box analysis tool. Finally, the method is quite general and can be applied as an add-on to existing analysis techniques by modifying the χ² function which is minimized, thus providing a procedure for adding fuzzy constraints on model parameters.
Acknowledgements
We thank M. Benfatto, S. Della Longa, E. A. Stern, M. Newville, G. George, B. Hedman, K. Hodgson and L. Sorensen for suggestions. This work was supported in part by the US Department of Energy grant DE-FG02-97ER45623 and the SSRL Structural Molecular Biology Program through NIH NCRR BTP grant RR-01209, and was facilitated by the DOE Computational Materials Sciences Network.
References
Ankudinov, A., Ravel, B., Rehr, J. J. & Conradson, S. (1998). Phys. Rev. B, 58, 7565–7576. (https://leonardo.phys.washington.edu/feff/ .)
Ankudinov, A. L., Rehr, J. J., Bouldin, C., Sims, J. & Hung, H. (2002). Phys. Rev. B, 65, 104107/1–104107/11.
Benfatto, M., Congiu-Castellano, A., Daniele, A. & Della Longa, S. (2001). J. Synchrotron Rad. 8, 267–269.
Bijaoui, A. (2002). Signal Process. 82, 709–712.
Binsted, N., Campbell, J. W., Gurman, S. J. & Stephenson, P. C. (1991). EXCURVE. SERC Daresbury Laboratory, Warrington, UK.
Binsted, N., Strange, R. W. & Hasnain, S. S. (1991). Biochemistry, 31, 12117–12125.
Bouldin, C. (1990). Private communication.
Filipponi, A. & Di Cicco, A. (1995). Phys. Rev. B, 52, 15122–15134.
George, G. N. & Pickering, I. J. (2000). EXAFSPAK, https://www-ssrl.slac.stanford.edu/exafspak.html .
Klementev, K. V. (2001). J. Phys. D, 34, 209–217.
Krappe, H. & Rossner, H. (2000). Phys. Rev. B, 61, 6596–6610.
Krappe, H. & Rossner, H. (2002). Phys. Rev. B, 66, 184303/1–184303/20.
Krappe, H. J. & Rossner, H. H. (2004). Phys. Rev. B, 70, 104102/1–104102/7.
Lytle, F. W., Sayers, D. E. & Stern, E. A. (1988). Physica B, 158, 701–722.
Newville, M. (2001). J. Synchrotron Rad. 8, 322–324.
Newville, M., Ravel, B., Haskel, D., Stern, E. A. & Yacoby, Y. (1995). Physica B, 208/209, 154–156.
Poiarkova, A. V. & Rehr, J. J. (1999). Phys. Rev. B, 59, 948–957.
Ravel, B. & Newville, M. (2005). Phys. Scr. In the press.
Turchin, V. F., Kozlov, V. P. & Malkevich, M. S. (1971). Sov. Phys. Usp. 13, 681–691.
© International Union of Crystallography. Prior permission is not required to reproduce short quotations, tables and figures from this article, provided the original authors and source are cited.