research papers\(\def\hfill{\hskip 5em}\def\hfil{\hskip 3em}\def\eqno#1{\hfil {#1}}\)

Journal logoJOURNAL OF
APPLIED
CRYSTALLOGRAPHY
ISSN: 1600-5767

Optimization of reflectometry experiments using information theory

CROSSMARK_Color_square_no_text.svg

aDepartment of Physics, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213, USA, bCenter for Neutron Research, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899-6102, USA, and cDepartment of Biomedical Engineering, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213, USA
*Correspondence e-mail: fheinrich@cmu.edu

Edited by E. P. Gilbert, ANSTO, Kirrawee DC, Australia (Received 2 August 2018; accepted 30 November 2018)

A framework based on Bayesian statistics and information theory is developed to optimize the design of surface-sensitive reflectometry experiments. The method applies to model-based reflectivity data analysis, uses simulated reflectivity data and is capable of optimizing experiments that probe a sample under more than one condition. After presentation of the underlying theory and its implementation, the framework is applied to exemplary test problems for which the information gain ΔH is determined. Reflectivity data are simulated for the current generation of neutron reflectometers at the NIST Center for Neutron Research. However, the simulation can be easily modified for X-ray or neutron instruments at any source. With application to structural biology in mind, this work explores the dependence of ΔH on the scattering length density of aqueous solutions in which the sample structure is bathed, on the counting time and on the maximum momentum transfer of the measurement. Finally, the impact of a buried magnetic reference layer on ΔH is investigated.

1. Introduction

Neutron reflectometry (NR) is a structure determination technique that resolves the thickness and composition of thin films at interfaces and surfaces with near-ångström resolution (Smith & Majkrzak, 2006[Smith, G. S. & Majkrzak, C. F. (2006). International Tables for Crystallography, Vol. C, Mathematical, Physical and Chemical Tables, edited by E. Prince, pp. 126-146. Chester: International Union of Crystallography.]). Applications of NR reach from hard-condensed matter (Majkrzak et al., 2006[Majkrzak, C. F., O'Donovan, K. V. & Berk, N. F. (2006). Neutron Scattering From Magnetic Materials, pp. 397-471. Amsterdam: Elsevier.]) to soft matter (Russell, 1990[Russell, T. P. (1990). Mater. Sci. Rep. 5, 171-271.]), including structural biology of lipid membranes (Heinrich & Lösche, 2014[Heinrich, F. & Lösche, M. (2014). Biochim. Biophys. Acta, 1838, 2341-2349.]). Given the limited availability of neutrons for scattering experiments and the flexibility in isotopic labeling of distinct components of the surface structure, it is worthwhile to optimize the experimental design with respect to the information gain. Presently, the design of neutron scattering experiments mostly follows rules of thumb, i.e. experience gained in similar experiments in the past. Here, we implement a quantitative and predictive framework to plan reflectometry work based on rigorous estimates of the information gained in a particular implementation of the experiment. With minor changes, this framework is applicable to X-ray reflectometry and, with some extension, to neutron and X-ray small-angle scattering.

In the recent past, Bayesian statistical methods have found applications in reflectometry for robust global model fitting and the determination of confidence limits on model parameters (Sivia & Webster, 1998[Sivia, D. S. & Webster, J. (1998). Physica B, 248, 327-337.]; Kirby et al., 2012[Kirby, B. J., Kienzle, P. A., Maranville, B. B., Berk, N. F., Krycka, J., Heinrich, F. & Majkrzak, C. F. (2012). Curr. Opin. Colloid Interface Sci. 17, 44-53.]; Maranville et al., 2016[Maranville, B. B., Kirby, B. J., Grutter, A. J., Kienzle, P. A., Majkrzak, C. F., Liu, Y. & Dennis, C. L. (2016). J. Appl. Cryst. 49, 1121-1129.]; Lesniewski et al., 2016[Lesniewski, J. E., Disseler, S. M., Quintana, D. J., Kienzle, P. A. & Ratcliff, W. D. (2016). J. Appl. Cryst. 49, 2201-2209.]). In particular, the work of Sivia and co-workers has provided a solid foundation for the application of Bayesian statistics to reflectivity data, discussing aspects such as parameter estimation, model selection and experimental design. Our work concerning experimental optimization adds to this foundation by introducing model fitting based on a Monte Carlo Markov chain (MCMC) simulation, which by design yields a sample of the posterior parameter density function (PDF) (Yustres et al., 2012[Yustres, Á., Asensio, L., Alonso, J. & Navarro, V. (2012). Comput. Geosci. 16, 1-20.]; Braak & Vrugt, 2008[Braak, C. J. F. ter & Vrugt, J. A. (2008). Stat. Comput. 18, 435-446.]; Lesniewski et al., 2016[Lesniewski, J. E., Disseler, S. M., Quintana, D. J., Kienzle, P. A. & Ratcliff, W. D. (2016). J. Appl. Cryst. 49, 2201-2209.]). A measure of the information gain from a given experiment is obtained from comparing the entropies of the posterior and prior PDFs, which represent the knowledge about the sample after and before the experiment, respectively. We show that with these two additions a flexible numerical framework for experimental design can be built. In developing this methodology for reflectometry, we closely follow established implementations in other fields such as systems biology (Liepe et al., 2013[Liepe, J., Filippi, S., Komorowski, M. & Stumpf, M. P. H. (2013). PLoS Comput. Biol. 9, e1002888.], 2014[Liepe, J., Kirk, P., Filippi, S., Toni, T., Barnes, C. P. & Stumpf, M. P. H. (2014). Nat. Protoc. 9, 439-456.]).

Fig. 1[link] summarizes the implemented method to quantify the information gain of an experiment. We start with a model-dependent description of a hypothetical sample structure S and instrument configuration E parameterized by a vector θΘ. (Capital letters denote a random variable, and lower-case letters denote a particular instance of a random variable.) Importantly, θ is not randomly drawn from Θ according to the prior PDF, as we do not optimize over different sample configurations within the prior. Using a model XS,E(θ), noise-free reflectivity data XS,E(θ) → x(Qz) are simulated over a finite range of discrete, experimentally accessible momentum transfer values Qz. Random normal noise z(Qz) is added to x(Qz) to obtain simulated sets of noisy data y(Qz) that could have occurred in a real measurement of the hypothetical structure. The standard deviation σ(Qz) of the normal noise Z depends on the instrument configuration, the momentum transfer Qz and the value of x(Qz) itself. It generally differs for every data point. Finally, model parameters are retrieved from y using an MCMC simulation that returns a sample of the posterior PDF p(θ | y). The information gain of the virtual experiment is evaluated as the difference in entropy of the prior and posterior PDFs, [\Delta {\widehat H} = H(\Theta )- H(\Theta \mid Y)]. Since the MCMC simulation employs the same model that was used to calculate x(Qz), [\Delta {\widehat H}] is a measure of the gain in information exclusively about model and experimental parameters contained in Θ. Other parameters intrinsic to the model that have fixed values, such as the known scattering length density of a substrate supporting the interfacial structure and instrumental parameters like those defining the resolution function, do not affect [\Delta {\widehat H}]. To optimize an NR experiment, sample or experimental properties are systematically varied to determine the maximum [\Delta {\widehat H}] in the search space.

[Figure 1]
Figure 1
Information processing steps in a virtual reflectivity experiment. The information gain is the difference in entropy between the posterior and prior PDFs.

Different approaches to determine the information content for small-angle scattering data have been established in the past (Moore, 1980[Moore, P. B. (1980). J. Appl. Cryst. 13, 168-175.]; Taupin & Luzzati, 1982[Taupin, D. & Luzzati, V. (1982). J. Appl. Cryst. 15, 289-300.]; Luzzati & Taupin, 1986[Luzzati, V. & Taupin, D. (1986). J. Appl. Cryst. 19, 39-50.]; Müller et al., 1996[Müller, J. J., Hansen, S. & Pürschel, H.-V. (1996). J. Appl. Cryst. 29, 547-554.]; Vestergaard & Hansen, 2006[Vestergaard, B. & Hansen, S. (2006). J. Appl. Cryst. 39, 797-804.]; Pedersen et al., 2014[Pedersen, M. C., Hansen, S. L., Markussen, B., Arleth, L. & Mortensen, K. (2014). J. Appl. Cryst. 47, 2000-2010.]; Konarev & Svergun, 2015[Konarev, P. V. & Svergun, D. I. (2015). IUCrJ, 2, 352-360.]; Larsen et al., 2018[Larsen, A. H., Arleth, L. & Hansen, S. (2018). J. Appl. Cryst. 51, 1151-1161.]). While the employed methods differ substantially, these approaches have in common that the information content of the experiment is quantified either directly from y, or from y given x and θ. Our method is strictly model based and it describes the information gain from virtual experiments using a series of discrete information processing steps as shown in Fig. 1[link], ultimately comparing the two endpoints of this process with respect to the information content. This has the advantage that it separates the information gain on model fit parameters from known model and experimental parameters, about which information is also carried by x and y. This procedure has a substantial advantage over other implementations in that it can handle information distributed across multiple related measurements that are analyzed simultaneously with one model.

After establishing the methodology, we apply it to a set of simple model systems, thereby demonstrating the optimization of fundamental experimental properties, such as counting time, maximum momentum transfer and the choice of the scattering length density (SLD) of the bulk solvent in NR measurements of fluid-immersed samples. These examples have been chosen to best illustrate the broad usefulness of the technique, and they can easily be extended to encompass other experimental situations of practical interest. For example, while the method described here is applied to the current generation of monochromatic neutron reflectometers, it can be adapted to other types of reflectometers or be used to predict the performance of experimental stations under development.

2. Theory and implementation

2.1. Information content of specular reflection data

A specular neutron or X-ray reflectometry experiment on a sample S using an experimental configuration E results in a particular measurement of the data, yY. Experimental results are generally analyzed in terms of a model, XS,E(θ), that relates a model parameter vector θΘ to an expected experimental outcome [{\hat y} = x]. The aim of data analysis is to find the posterior PDF p(θ | y) by which a particular parameter vector θ is realized given y and XS,E(θ). The traditional task of finding the vector θ that produces the best fit to the data, or the maximum of p(θ | y), is therefore only a particular aspect within this broader definition of data analysis.

The information gain ΔH is defined as the difference between the entropy H(Θ) of the prior PDF p(θ), representing the knowledge before the experiment, and the entropy H(Θ | y) of the posterior PDF p(θ | y), obtained after the measurement yielded a particular experimental outcome yY:

[\Delta H = H\left(\Theta \right)- H\left(\Theta \mid y\right). \eqno (1)]

Both entropies are functionals of their respective continuous PDFs:

[H\left(\Theta \right) = - \textstyle\int p\left(\theta \right)\log p\left(\theta \right)\,{\rm d}\theta ,\eqno (2)]

[H\left(\Theta \mid y\right) = - \textstyle\int p\left(\theta \mid y\right)\log p\left(\theta \mid y\right)\,{\rm d}\theta. \eqno (3)]

(All logarithms in this work are taken to the base of two, such that entropies are calculated in bits.) This approach does not consider that the experimental outcome y is a random variable itself, drawn from a pool of possible experimental outcomes Y. The appropriate, but significantly more expensive, quantity to calculate is the expected information gain [\Delta {\widehat H}] given all possible experimental outcomes y, which equals the mutual information I between the random variables Y and Θ:

[\eqalignno { \Delta {\widehat H} & = H\left(\Theta \right)- H\left(\Theta \mid Y\right) \cr & = H\left(\Theta \right)- \textstyle\int H\left(\Theta \mid y\right) \,p\left(y\right)\,{\rm d}y = I(Y,\Theta). & (4)}]

Using equation (4)[link], [\Delta {\widehat H}] can in principle be computed as a Monte Carlo integration over Θ and Y (Liepe et al., 2013[Liepe, J., Filippi, S., Komorowski, M. & Stumpf, M. P. H. (2013). PLoS Comput. Biol. 9, e1002888.]). The prior predictive distribution p(y) additionally needs to be computed (Liepe et al., 2013[Liepe, J., Filippi, S., Komorowski, M. & Stumpf, M. P. H. (2013). PLoS Comput. Biol. 9, e1002888.]), and it can be expressed in terms of the prior and posterior PDFs using Bayes' theorem:

[\eqalign { p\left(\theta \right)\,p\left(y\mid \theta \right) & = p\left(y\right)\,p(\theta \mid y), \cr p\left(y\right) & = {{p\left(\theta \right)\,p\left(y\mid \theta \right)}\over{p(\theta \mid y)}}.} \eqno (5)]

The conditional PDF p(y | θ) of observing a particular experimental outcome y, given a parameter vector θ, can be obtained using the model XS,E(θ) and instrument-specific normal variate random noise on n data points of y(Qz) with a standard deviation σ(Qz):

[p\left(y\mid \theta \right) = \prod _{i = 1}^{n}{{1}\over{\left( {2\pi } \right)^{1/2}{\sigma }_{i}\left({Q}_{z}\right)}}\exp\left\{{-{{{1}\over{2}}}{\left[{{{y}_{i}\left({Q}_{z}\right) -{x}_{i}\left({Q}_{z}\right) }\over{{\sigma }_{i}\left({Q}_{z}\right)}}\right]}^{2}}\right\}. \eqno (6)]

In practice, however, such a nested Monte Carlo integration is computationally costly.1 We therefore approximate [\Delta {\widehat H}] from the average of up to ten calculations of ΔH using independently simulated experimental data y (Liepe et al., 2013[Liepe, J., Filippi, S., Komorowski, M. & Stumpf, M. P. H. (2013). PLoS Comput. Biol. 9, e1002888.]).

2.2. Implementation of the algorithm

2.2.1. Simulation of experimental data

This work is carried out with simulated data to avoid systematic errors due to particular experimental instrumentation and to explore a large parameter space for optimization. Experimental data y(Qz) are simulated for the Magik reflectometer at the NIST Center for Neutron Research (NCNR) in Gaithersburg, MD, USA (Dura et al., 2006[Dura, J. A., Pierce, D. J., Majkrzak, C. F., Maliszewskyj, N. C., McGillivray, D. J., Lösche, M., O'Donovan, K. V., Mihailescu, M., Perez-Salas, U., Worcester, D. L. & White, S. H. (2006). Rev. Sci. Instrum. 77, 074301.]), with a beam footprint on the sample surface of 2.5 × 5 cm, equipped with a fluids cell for solvent-immersed samples. The models XS,E(θ) of interfacial structures were implementations of stratified slabs of homogeneous SLD (slab models) (Ankner & Majkrzak, 1992[Ankner, J. F. & Majkrzak, C. F. (1992). Proc. SPIE, 1738, 260-269.]). Noise-free experimental outcomes x(Qz) were calculated with refl1d (Kirby et al., 2012[Kirby, B. J., Kienzle, P. A., Maranville, B. B., Berk, N. F., Krycka, J., Heinrich, F. & Majkrzak, C. F. (2012). Curr. Opin. Colloid Interface Sci. 17, 44-53.]). To obtain the final simulated reflectivity y(Qz), normally distributed random noise z(Qz) with standard deviations σS,E[Qz, x(Qz), θ] was added to x(Qz). A detailed description of the data simulation is provided in the supporting information.

For the optimization of a particular experiment, parameter vectors θΘ for data simulation should be strictly drawn at random from Θ according to the prior PDF p(θ) (see Introduction[link]). However, for the problems discussed in this work, and for many other applications, the sample structure is sufficiently well known that variations within p(θ) are not expected to significantly affect ΔH and are negligible compared with changes in ΔH that occur when systematically varying parameters over much larger ranges during the experimental optimization. Choosing one sample representation θΘ from the prior PDF has the additional advantage that a costly Monte Carlo simulation over the prior PDF, which would be otherwise necessary, is avoided (Liepe et al., 2013[Liepe, J., Filippi, S., Komorowski, M. & Stumpf, M. P. H. (2013). PLoS Comput. Biol. 9, e1002888.]). For largely unknown sample structures that would be implemented by a much wider prior PDF, the dependence of ΔH on the particular sample representation θΘ might not be negligible. In this case, the framework presented in this work can easily be extended to include a Monte Carlo simulation over the prior PDF.

2.2.2. Entropy of the prior parameter density function

Lacking more detailed prior knowledge, the prior PDF p(θ) is the product of the prior probabilities of the assumed independent vector components θi. It is further assumed that the PDF of each component, p(θi), is constant over an interval Δθi with p(θi) = 1/Δθi:

[p\left(\theta \right) = \textstyle\prod p\left({\theta }_{i}\right) = \prod 1/\Delta {\theta }_{i}. \eqno (7)]

Using equation (2)[link], the entropy of p(θ) becomes the sum of the entropies of the independent p(θi):

[H\left(\Theta \right) = \textstyle\sum \log\Delta {\theta }_{i} .\eqno (8)]

The prior PDF is subjectively set by the experimenter. In addition to the calculation of ΔH, it is used to compute the acceptance probability of new states of the Markov chain during the MCMC analysis of y(Qz). As such, the choice of Δθi affects the posterior PDF in that it excludes parameter values outside of Δθi. Only parameters with non-uniform contributions to the posterior PDF, or in other words parameters that can be resolved with respect to the prior PDF, add to the information gain (see Section 3.1.3[link] for a detailed discussion). A change in interval length Δθi for a resolvable parameter leads to a constant offset in H(Θ) and, therefore, ΔH, which is inconsequential for the purpose of experimental optimization, as it relies on relative differences in ΔH.

2.2.3. Entropy of the posterior parameter density function

The posterior PDF is obtained from an MCMC simulation implemented in refl1d (Kirby et al., 2012[Kirby, B. J., Kienzle, P. A., Maranville, B. B., Berk, N. F., Krycka, J., Heinrich, F. & Majkrzak, C. F. (2012). Curr. Opin. Colloid Interface Sci. 17, 44-53.]) using the simulated data y and the model XS,E(θ) as inputs. MCMC analysis yields an unnormalized sample of the posterior PDF. We calculated the entropy of p(θ | y) using two different methods. The first method constructs a multivariate normal (MVN) probability density approximation from a random sample of 1000 points of the posterior:

[{p}^{\rm MVN}\left(\theta \mid y\right) = {{1}\over{{\left(2\pi \right)}^{d/2}{|\Sigma |}^{1/2}}}\exp{\left[-\textstyle {{1}\over{2}}{\left(\theta -\mu \right)}^{\rm T}{\Sigma }^{-1}\left(\theta -\mu \right)\right]}. \eqno (9)]

The vector μ is the mean of the sample of d-dimensional parameter vectors θ. |Σ| denotes the determinant of the variance–covariance matrix Σ of θ. Both values can be defined in terms of an expectation value [{\bb E}]:

[\mu = {\bb E}\left[\theta \right], \eqno (10)]

[\eqalignno { \Sigma & = {\bb E}\left[\left(\theta -\mu \right){\left(\theta -\mu \right)}^{\rm T}\right] \cr & = \left[{\rm Cov}\lfloor {\theta }_{i},{\theta }_{j}\rfloor, 0\lt i,j\lt d\right]. & (11)}]

The entropy of the MVN distribution is then computed as (Chen et al., 2016[Chen, B., Wang, J., Zhao, H. & Principe, J. C. (2016). Entropy, 18, 196.])

[{H}^{\rm MVN}\left(\Theta \mid y\right) = {{d}\over{2}}\log2\pi +{{d}\over{2}}+\log|\Sigma |. \eqno (12)]

The second method to calculate the entropy of the posterior follows Kramer et al. (2010[Kramer, A., Hasenauer, J., Allgöwer, F. & Radde, N. (2010). 2010 IEEE International Conference on Control Applications, pp. 493-498. IEEE.]). Here, the entropy of the posterior is obtained by Monte Carlo sampling from the unnormalized MCMC output using a sample size of 5000, while the normalization factor is obtained from a kernel density estimate using a Gaussian kernel (Silverman, 1986[Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. London, New York: Chapman and Hall.]). In the following, we denote this approach as the KDN method.

The sample sizes for the MVN and KDN approaches represent limits for which a computation of an equilibrated MCMC plus entropy estimate remains feasible given current computational resources. Because of those limits on sample size and the high dimensionality of NR models, an accurate and robust computation of the posterior entropy is often challenging. The MVN estimate fulfills here the role of a widely used reference against which the KDN estimate can be validated. As shown in Section 3[link], the MVN approximation was robust over repetitions of MCMC simulations but tended to underestimate the entropy of the posterior and, therefore, overestimated ΔH. This is not unexpected, as the MVN approximation performs less well on non-normal, i.e. asymmetric and heavily tailed, distributions (Kramer et al., 2010[Kramer, A., Hasenauer, J., Allgöwer, F. & Radde, N. (2010). 2010 IEEE International Conference on Control Applications, pp. 493-498. IEEE.]). The KDN method proved less robust, in turn, leading to somewhat larger standard deviations on ΔH and occasional outliers that were identified and eliminated.

2.2.4. Information gain

As discussed above, the computation of [\Delta {\widehat H} = I(Y, \Theta)] is significantly costlier than calculating the gain in information ΔH obtained from only a single experimental representation y. For large multivariate models such as those used in NR, computing [\Delta {\widehat H}] is currently not feasible. The differences between ΔH and [\Delta {\widehat H}] have been determined for models with fewer parameters. While significant, they were shown to be smaller than the changes in entropy due to experimental optimization in other applications (Liepe et al., 2013[Liepe, J., Filippi, S., Komorowski, M. & Stumpf, M. P. H. (2013). PLoS Comput. Biol. 9, e1002888.]). This observation is in agreement with results in this work. We estimated the difference between [\Delta {\widehat H}] and ΔH by averaging multiple independent values of ΔH and showed that the variations in ΔH for individual points of the optimization are significantly smaller than the changes in ΔH over the entire range of systematically varied parameters to be optimized.

2.2.5. Marginalization of the posterior parameter density function

Most models XS,E contain a subset of nuisance parameters δ that are required for constructing a valid model but are not of practical interest to the experimenter. Together with the parameters of interest η, they form the parameter space θ = (η, δ). Consequently, the relevant quantity for optimizing an NR experiment is often the marginal entropy of the posterior Hη(Θ | y) with respect to the parameters of interest (Sivia & Webster, 1998[Sivia, D. S. & Webster, J. (1998). Physica B, 248, 327-337.]; Chen et al., 2016[Chen, B., Wang, J., Zhao, H. & Principe, J. C. (2016). Entropy, 18, 196.]):

[{H}_{\eta }\left(\Theta \mid y\right) = -\textstyle\int p\left(\eta \mid y\right)\log p\left(\eta \mid y\right)\,{\rm d}\eta .\eqno (13)]

The marginal posterior PDF p(η | y) is obtained by integrating the joint probability of η and δ over the nuisance parameters δ:

[p\left(\eta \mid y\right) = \textstyle\int p\left(\eta, \delta \mid y\right) \,{\rm d}\delta. \eqno (14)]

Using an MVN distribution, a marginal entropy HηMVN(Θ | y) is calculated rather easily by dropping the unwanted parameters from the covariance matrix and the mean vector:

[{H}_{\eta }^{\rm MVN}\left(\Theta \mid y\right) = {{{d}_{\eta }}\over{2}}\log 2\pi +{{{d}_{\eta }}\over{2}}+\log|{\Sigma }_{\eta }|. \eqno (15)]

The computation of a KDN equivalent of the marginal entropy that involves Monte Carlo sampling from the MCMC-obtained posterior PDF is difficult and will be the topic of a future study. In this work, we exclusively compute total entropies of the posterior PDF and evaluate confidence limits on parameters of interest separately for selected points in the optimization space.

3. Results

3.1. A test structure

For a first demonstration of the method, we start with a simple artificial interfacial structure: a porous, atomically flat Si layer suspended above a planar solid Si substrate in aqueous solvent (Fig. 2[link]). This structure is characterized at first with one and then with two NR measurements, evaluated with a simple slab model, and analyzed for the resulting information gain under systematic variation of the SLD of the solvent, ρn. Model parameters are provided in Table 1[link]. The aim of the optimization is to identify the isotopic constitution of the aqueous solvent that maximizes information gain as ρn is varied between that of H2O (ρn ≃ −0.5 × 10−6 Å−2) and D2O (ρn ≃ 6.5 × 10−6 Å−2).

Table 1
Simulation parameters and MCMC fit results for a virtual NR measurement of the system shown in Fig. 2[link] in which one bulk solvent SLD was optimized

Where ranges are given in the first column, the parameter was systematically varied within these boundaries. Median parameter values and 68% confidence limits were determined by an MCMC fit of the simulated data and are given for selected solvent SLD values of the entire optimization range shown in Fig. 3[link].

      MCMC fit results (θ | y)
      Solvent SLD, ρn (10−6 Å−2)
Model parameter Parameterized sample representation, θ Fit boundaries, prior PDF limits −0.5 2 6.5
Thickness of interstitial water (Å) 20 ±10 21 ± 1 20 ± 7 20.0 ± 0.2
Thickness of porous Si layer (Å) 30 ±10 29.1 ± 0.9 29 ± 7 29.8 ± 0.2
SLD of porous Si layer (10−6 Å−2) 2.07 ±1 2.0 ± 0.1 2.1 ± 0.1 2.1 ± 0.2
Volume fraction of porous Si layer 0.95 ±0.05 0.95 ± 0.04 0.95 ± 0.03 0.95 ± 0.04
SLD of solvent, ρn (10−6 Å−2) [−0.5, 6.5] ±0.5 −0.50 ± 0.03 2.1 ± 0.1 6.50 ± 0.01
Interfacial roughness (Å) 3 ±1 2.9 ± 0.6 3.0 ± 0.7 3.4 ± 0.5
Log10 of background −8 ±1 −7.9 ± 0.7 −8.1 ± 0.6 −8.0 ± 0.7
†A constant background is routinely fitted as a free parameter to each experimental NR curve, and the same procedure is adopted here. This parameter accounts for insufficient background subtraction during data reduction and is a small fraction of the total background (see supporting information).
[Figure 2]
Figure 2
(a) Structural model of the test system. A 30 Å-thick porous Si layer (95% Si by volume) is surrounded by aqueous solvent and suspended at a distance of 20 Å from a solid Si surface. Pores in the Si layer are solvent filled. (b) Calculated reflectivities with simulated noise for three different solvent SLD values, reflectivity curves that are best fits to the data and their associated SLD profiles (inset). The noisy data and error bars correspond to those expected in a measurement of this hypothetical sample at a current-generation reactor-based instrument such as the Magik neutron reflectometer at the NCNR. Error bars represent 68% confidence limits.
3.1.1. One solvent contrast

Fig. 3[link] shows the expected information gain from a single NR measurement as a function of the SLD of the aqueous medium that surrounds and penetrates the porous Si layer (`one solvent contrast'). The MVN and KDN methods for entropy determination yield similar results, with the MVN results consistently slightly higher than the KDN results. The error bars in Fig. 3[link] represent standard deviations from five independent simulations per data point and allow us to assess the error introduced by computing ΔH instead of [\Delta {\widehat H}]. For this particular example, this error is significantly smaller than the changes in ΔH due to variation in solvent SLD.

[Figure 3]
Figure 3
Information gain ΔH as a function of the SLD of the aqueous solvent in a virtual NR experiment of the model structure shown in Fig. 2[link]. ΔH calculated using the MVN and KDN approximations follow each other closely. Error bars indicate one standard deviation obtained from five independent simulations.

The minimum information gain is observed under the condition that the bulk solvent SLD matches that of Si (ρn ≃ 2 × 10−6 Å−2). In this case, the porous solvent-filled Si slab, the substrate and the solvent all have the same SLD and are thus `invisible' to neutrons. The residual gain at the minimum of the curve, ΔH ≃ 3 bits, stems from a high confidence in determining the SLDs of the porous silicon layer, the bulk solvent and the scattering background. Because the SLD of the semi-infinite Si substrate is known, the unknown SLDs are well determined under matching conditions, even though the observed neutron reflectivity is zero. Other model parameters can only be resolved if the experimenter uses a solvent with an SLD different from that of Si. Consequently, the gain in information increases when its SLD deviates from that of Si, reaching ΔH ≃ 10 bits for H2O-based solvent and ΔH ≃ 15 bits for D2O. For the same absolute difference of the solvent SLD from that of Si, positive SLD deviations yield a larger information gain than negative deviations. This phenomenon is due to several effects. First, because the incoherent scattering from D2O is lower than that from H2O, samples bathed in D2O-rich solvent show lower intrinsic scattering background. Second, the higher SLDs of D2O-rich solvents lead to higher neutron reflectivity throughout the simulated range of Qz (0.08 ≤ Qz ≤ 0.26 Å−1), which can be determined with higher confidence. Finally, the presence of a critical angle of total internal reflection in the NR curve increases the gain in information for solvent SLDs with ρn > 4 × 10−6 Å−2, at which this critical angle can be observed within the simulated Qz range.

Explicit fit parameters and their uncertainties for three exemplary bulk solvent SLD values are listed in Table 1[link]. The parameter uncertainties for ρn = −0.5 × 10−6 Å−2 and ρn = 6.5 × 10−6 Å−2 are significantly smaller over the entire set of model parameters, reflecting the increased information gain for those solvents.

To put the abstract values of ΔH given in Fig. 3[link] in perspective, the following simplified comparison is educative. Under the assumption that the posterior PDF of a single uncorrelated parameter can be approximated by a Gaussian distribution, the contribution of this parameter to the posterior entropy is determined by the standard deviation σ of the distribution:

[H = \log\left[\sigma \left( {2\pi e} \right)^{1/2}\right]. \eqno (16)]

With respect to the entropy of the corresponding uniform prior PDF over the interval Δθ [equation (8)[link]], the contribution to the information gain from this single parameter is

[\Delta H = \log{{\Delta \theta }\over{\sigma \left( {2\pi e} \right)^{1/2}}}. \eqno (17)]

A Gaussian posterior PDF with a standard deviation approximately one-quarter [i.e. 1/(2πe)1/2] of the width of a uniform prior PDF has the same entropy as the latter and yields therefore zero information gain. Standard deviations above this threshold contribute a limited loss of information, particularly for the MVN estimate, only because of the different functional forms used to describe the prior and posterior PDFs. Equations (8)[link] and (16)[link] also show that decreasing the widths of either a uniform prior or a Gaussian posterior PDF by one-half changes the information gain by ∼1 bit (while neglecting parameter correlations).

3.1.2. Two solvent contrasts

The information gain of an NR experiment that consists of two reflectivity measurements with different solvents is shown in Fig. 4[link]. The two solvent SLDs in these measurements were independently varied between those of pure H2O and D2O, and data analysis was performed under the constraint that both structural models share the same set of parameters, except for the two solvent SLDs and their associated background levels. The minimum information gain is observed when both solvent SLDs are ρn = 2 × 10−6 Å−2. Particularly high information gains are found for bulk compositions at the extreme margins of solvent SLDs, (ρn1, ρn2) = (−0.5 × 10−6 Å−2, 6.5 × 10−6 Å−2). However, a combination of measurements with pure D2O and an H2O/D2O mixture with ρn = 4 × 10−6 Å−2 (denoted as CM4) yields a comparable information gain.

[Figure 4]
Figure 4
Information gain ΔH as a function of the aqueous solvent SLDs in two simultaneously evaluated, independent NR measurements of the model structure in Fig. 2[link]. SLDs are varied in steps of 0.5 × 10−6 Å−2. Entropies of the posterior were calculated using the MVN (a) and KDN (b) approximations. Symmetry-related data were independently calculated with both methods to obtain a visual impression of data reproducibility.

Similarly to the previous optimization, the KDN and MVN entropy estimates in Fig. 4[link] yield qualitatively equivalent results. While the MVN estimate slightly overestimates the information gain, the KDN estimate shows somewhat larger uncertainties in each data point. In all further examples discussed in this work, we only used KDN estimates.

Fit parameters and their uncertainties for three regions of particularly small and large ΔH (Fig. 4[link]) are provided in Table 2[link]. For the minimum information gain at (ρn1, ρn2) = (2.0 × 10−6 Å−2, 2.0 × 10−6 Å−2), the result is comparable to the experiment with one solvent contrast. Only the parameter uncertainties for the SLD and volume fraction of the porous Si, and the SLDs of both bulk solvents, show significant improvement over the prior PDFs. The parameter uncertainties for (−0.5 × 10−6 Å−2, 6.5 × 10−6 Å−2) and (6.5 × 10−6 Å−2, 4 × 10−6 Å−2) are significantly smaller than for (2.0 × 10−6 Å−2, 2.0 × 10−6 Å−2) over the entire set of model parameters.

Table 2
Results for a virtual NR experiment on the system shown in Fig. 2[link], which allows for two measurements with two distinct bulk solvent SLDs that were optimized (Fig. 4[link]) (for other details refer to Table 1[link])

      MCMC fit results (θ | y)
      Solvent SLDs ρn1, ρn2 (10−6 Å−2)
Model parameter Parameterized sample representation, θ Fit boundaries, prior PDF limits (2.0, 2.0) (−0.5, 6.5) (4.0, 6.5)
Thickness of interstitial water (Å) 20 ±10 21 ± 8 19.9 ± 0.2 19.9 ± 0.2
Thickness of porous Si layer (Å) 30 ±10 29 ± 7 30.0 ± 0.1 30.2 ± 0.2
SLD of porous Si layer (10−6 Å−2) 2.07 ±1 2.1 ± 0.1 2.08 ± 0.03 2.15 ± 0.06
Volume fraction of porous Si layer 0.95 ±0.05 0.95 ± 0.03 0.955 ± 0.009 0.97 ± 0.02
SLD of solvent 1, ρn1 (10−6 Å−2) [−0.5, 6.5] ±0.5 2.1 ± 0.1 6.499 ± 0.004 6.493 ± 0.003
SLD of solvent 2, ρn2 (10−6 Å−2) [−0.5, 6.5] ±0.5 2.1 ± 0.1 −0.49 ± 0.02 4.001 ± 0.002
Interfacial roughness (Å) 3 ±1 3.0 ± 0.7 2.7 ± 0.5 3.6 ± 0.4
Log10 of background 1 −8 ±1 −8.0 ± 0.7 −7.9 ± 0.7 −8.2 ± 0.5
Log10 of background 2 −8 ±1 −7.7 ± 0.7 −8.0 ± 0.7 −7.9 ± 0.7

Evaluations such as those shown in Fig. 4[link] are invaluable to determine whether it is advantageous in a real experiment to consecutively measure two NR curves from a sample bathed in distinct solvents or, rather, to allocate the same total measurement time to a single measurement with one solvent. In this example, values at the plot diagonal are consistently lower than off-diagonal values, which would argue in favor of conducting two distinct measurements.

3.1.3. Dependence of ΔH on counting time

Fig. 5[link](a) shows the information gain ΔH as a function of the counting time t for a single NR measurement of the model structure (Fig. 2[link]) immersed in D2O. After an initial fast increase of ΔH within the first 3 h of measurement, the information gain quickly enters a region of diminishing return in which further improvement requires increasingly long counting times. For the presented example and simulated instrument, we consider the transition region between those two regimes at 3–6 h an optimal range of counting time. The observed time dependence of the information gain is similar to those previously reported (Pedersen et al., 2014[Pedersen, M. C., Hansen, S. L., Markussen, B., Arleth, L. & Mortensen, K. (2014). J. Appl. Cryst. 47, 2000-2010.]; Berk & Majkrzak, 2009[Berk, N. F. & Majkrzak, C. F. (2009). Langmuir, 25, 4132-4144.]). It suggests that, at least for this simple test structure, an increase in neutron flux on future instrumentation will allow for shorter measurement times but may not yield significantly reduced fit parameter confidence limits.

[Figure 5]
Figure 5
Information gain ΔH as a function of the counting time t for a single NR measurement of the model structure (Fig. 2[link]) in D2O, calculated using the KDN estimate and plotted on linear (a) and logarithmic (b) time scales. The continuous curves are fits to equation (23)[link], which describes the information gain as being limited by the capacity of m′ parallel independent Gaussian channels. The displayed time range in (a) has been shortened to 24 h to focus on experimentally relevant measurement times. Error bars indicate one standard deviation obtained from ten independent simulations.

To describe the functional form of the relation shown in Fig. 5[link], we simplify the situation and consider the capacity of the Gaussian channel shown in Fig. 1[link] to be the single limiting factor for the information gain. In other words, for the current example we assume that the exact parameter vector θ used for data simulation could be retrieved by the MCMC simulation given noise-free reflectivity data y(Qz). We therefore neglect other limiting factors such as the loss of phase information during reflectivity simulation or shortcomings of the MCMC algorithm. The capacity C of a communication channel is defined as the maximum mutual information between the input X and the output Y for all possible choices of p(x). In communication theory, C sets the maximum transmission rate of information over the channel; in our example it constitutes an upper limit on the information gain on x by knowing y.

As shown in Fig. 1[link], the Gaussian channel adds random normal noise z(Qz) with standard deviation σ(Qz) to the noise-free simulated reflectivity x(Qz), thus providing the reflectivity with noise y(Qz). Since the reflectivity is simulated for n discrete values of Qz (n data points), such a Gaussian channel can be described by the combined action of n independent parallel Gaussian channels, each of which adds random normal noise to one data point of the simulated reflectivity. n independent parallel Gaussian channels have a combined maximum capacity C that depends on the signal-to-noise ratio y/σ per channel (Cover & Thomas, 2006[Cover, T. M. & Thomas, A. J. (2006). Elements of Information Theory. Hoboken: Wiley-Interscience.]):

[C = \sum \limits_{i = 1}^n {\textstyle{1 \over 2}}\log \left[{1 + {{{y_i}{{\left({{Q_z}} \right)}^2}} \over {{\sigma _i}{{\left({{Q_z}} \right)}^2}}}} \right]. \eqno (18)]

Neglecting contributions from background subtraction and incident intensity normalization, the signal-to-noise ratio y/σ can be computed solely from counting statistics. For each reflectivity data point, y/σ depends only on the number of specular counts N, which is the product of the counting time t and a constant specular count rate r:

[{y / \sigma } = {N / { N^{1/2} }} = \left( {rt} \right)^{1/2}. \eqno (19)]

We can therefore rewrite the combined capacity of the Gaussian channels as

[C = \textstyle \sum \limits_{i = 1}^n {1 \over 2}\log \left({1 + {r_i}{t_i}} \right) .\eqno (20)]

In a typical NR measurement, the signal-to-noise ratio y/σ = (rt)1/2 is kept approximately constant over the entire Qz range by increasing both counting time and beam cross section as Qz increases, in order to offset the general Qz−4 dependence of the specular reflectivity. We therefore simplify equation (20)[link] by assuming that all channels have the same relative variance, which is measured in an effective time t and at an effective rate r. We arbitrarily choose that t represents the total counting time of the reflectivity curve (instead of, for example, choosing the average measurement time per point which would only change the effective rate).2 On the basis of the Shannon–Nyquist sampling theorem (Pedersen et al., 2014[Pedersen, M. C., Hansen, S. L., Markussen, B., Arleth, L. & Mortensen, K. (2014). J. Appl. Cryst. 47, 2000-2010.]; Konarev & Svergun, 2015[Konarev, P. V. & Svergun, D. I. (2015). IUCrJ, 2, 352-360.]), the reflectivity data in the example are heavily oversampled. In addition, R is band limited. Consequently, not all n channels are independent, and we can write the channel capacity as that of m effective independent channels (m < n) (Cover & Thomas, 2006[Cover, T. M. & Thomas, A. J. (2006). Elements of Information Theory. Hoboken: Wiley-Interscience.]):

[C = m \textstyle{1 \over 2}\log \left({1 + rt} \right). \eqno (21)]

The channel capacity C imposes an upper limit on the actual channel rate I(X, Y), which itself is an upper limit on the information gain ΔH of the entire virtual experiment shown in Fig. 1[link]:

[\Delta H \simeq \Delta {\widehat H} \le I(X,Y) \le C. \eqno (22)]

Consequently, we apply the following equation for analysis of the KDN-derived information gain (Fig. 5[link]):

[\Delta H(t) = m^\prime\textstyle{{1}\over{2}}\log(1 + r^\prime t) + \Delta H_0. \eqno (23)]

The coefficient m′ can be interpreted as the number of independent parameters determined in the experiment, and r′ is associated with an average rate of increase in information gain per parameter. Fig. 5[link] shows the fit to ΔH(t) that yields m′ = 4.2 ± 0.1 and r′ = 219 ± 5 h−1. The value of m′ indicates four independent parameters, and inspection of the last column in Table 1[link] confirms this estimate, as four out of seven parameters show a significant improvement over the prior PDF (t = 6 h). ΔH0 is the systematic error in calculating the information gain for t → 0, which stems from evaluating a uniform posterior PDF that equals the prior PDF using the KDN estimate (see Section 3.1.1[link]). ΔH0 was determined to be −0.89 ± 0.05 bits.

The values of ΔH for 0.375 ≤ t ≤ 3 h show a comparatively high uncertainty (see Fig. 5[link]). When inspecting individual parameter uncertainties over this interval (data not shown), it was found that in this region a transition occurs, in which the number of independent parameters that can be resolved (are in scope of the prior PDF) increases from three to four, and variations in the simulated reflectivity due to random noise can lead to either outcome in the MCMC analysis. Accordingly, when fitting ΔH(t) using equation (23)[link] and a limited time interval 0 ≤ t < 0.375 h, a coefficient m′ = 3.0 ± 0.1 is obtained (fit not shown), which agrees with the ability to resolve three independent parameters. This indicates that, strictly, ΔH(t) has to be fitted piecewise over intervals of t, in which the number of resolvable independent parameters does not change. A thorough exploration of this aspect goes beyond the objective of this work and is left for a future study.

3.1.4. Dependence of ΔH on the maximum momentum transfer

Fig. 6[link] shows the dependence of the information gain on counting time t and maximum momentum transfer Qz,max of the simulated data for the test structure (Fig. 2[link]) immersed in D2O [Fig. 6[link](a)], and two related structures in which all layer thicknesses are scaled by a factor of 0.5 [Fig. 6[link](b)] or a factor of 2 [Fig. 6[link](c)]. For all three structures, ΔH shows an increase with t similar to that in Fig. 5[link], which is equivalent to a vertical slice of the independent optimization shown in Fig. 6[link](a) at Qz,max = 0.26 Å−1. All structures show a rather sudden increase in information gain when the reflectivity is extended beyond a certain critical value of Qz,max, which roughly matches the position of the second minimum of the reflectivity curves [Fig. 2[link](b)]. For the original test structure, this transition occurs at Qz,max ≃ 0.2 Å−1 [Fig. 6[link](a)], corresponding to the thicknesses of the Si slab of 30 Å. For sufficiently high Qz,max, a third reflection minimum can be observed at Qz = 0.3 Å−1, which stems from the 20 Å-thick interstitial water layer. However, Fig. 6[link](a) indicates that extending the reflectivity to this value does not significantly increase the information gain further. Accordingly, the thicknesses of both sample layers are already well resolved when limiting the reflectivity to Qz,max = 0.26 Å−1 (t = 6 h) (see Table 1[link], last column). This result is consistent with the canonical resolution estimate (Schalke & Lösche, 2000[Schalke, M. & Lösche, M. (2000). Adv. Colloid Interface Sci. 88, 243-274.]), which for Qz,max = 0.26 Å−1 yields a smallest resolvable structure size of Δz = π/Qz,max = 12 Å. A discussion of the effect of a limited Qz range on the information gain that goes beyond these rather qualitative arguments will require theory on time- and bandwidth-limited Gaussian channels (Cover & Thomas, 2006[Cover, T. M. & Thomas, A. J. (2006). Elements of Information Theory. Hoboken: Wiley-Interscience.]), and provides a promising avenue for future studies.

[Figure 6]
Figure 6
(a) Information gain (KDN estimate) as a function of maximum momentum transfer Qz,max and counting time t of the measurement for the model structure shown in Fig. 2[link] and related structures in which all layer thicknesses were multiplied by 1/2 (b) and 2 (c). The counting times shown apply to Qz,max = 0.26 Å−1, but were shorter and longer for smaller and larger Qz,max, respectively, as we have chosen an optimization scheme that preserves the counting statistics for individual data points, but not the total counting time when varying Qz,max.

With respect to experimental design, it is useful to determine the critical value of Qz, max for a particular sample to avoid spending neutron beamtime at unnecessarily high Qz for which the signal-to-noise ratio becomes increasingly unfavorable. Real-world samples, as opposed to the simulated structures used in this work, do not necessarily have a smallest feature size that would define Qz,max. Therefore, future optimizations using more complex structural models with a larger range of feature sizes will be of high interest to determine how to limit the Qz range of a measurement according to the smallest feature size of interest to the experimenter.

3.2. Influence of the substrate structure on information gain

NR sample substrates are sometimes engineered to contain one or several nanoscopic layers of high SLD buried near the interface which are not part of the interfacial structure of interest. Magnetic reference layers that scatter incident neutrons differently, depending on the polarization of the neutron in a magnetic field, can be particularly powerful in elucidating interfacial details (Holt et al., 2009[Holt, S. A., Le Brun, A. P., Majkrzak, C. F., McGillivray, D. J., Heinrich, F., Lösche, M. & Lakey, J. H. (2009). Soft Matter, 5, 2576-2586.]). Such sample engineering has been demonstrated to allow for a direct inversion of reflectivity data in certain cases (Blasie et al., 2003[Blasie, J. K., Zheng, S. & Strzalka, J. (2003). Phys. Rev. B, 67, 224201.]; Majkrzak et al., 2009[Majkrzak, C. F., Berk, N. F., Kienzle, P. A. & Perez-Salas, U. (2009). Langmuir, 25, 4154-4161.]). Here we explore the effect of a buried nanoscopic layer on the information content of NR data using a previously published test case (Zimmermann et al., 2000[Zimmermann, K. M., Tolan, M., Weber, R., Stettner, J., Doerr, A. K. & Press, W. (2000). Phys. Rev. B, 62, 10377-10382.]; Majkrzak & Berk, 2003[Majkrzak, C. F. & Berk, N. F. (2003). Physica B, 336, 27-38.]).

Zimmermann et al. (2000[Zimmermann, K. M., Tolan, M., Weber, R., Stettner, J., Doerr, A. K. & Press, W. (2000). Phys. Rev. B, 62, 10377-10382.]) described a set of distinct X-ray scattering length density profiles that yield nearly indistinguishable reflectivity curves. In turn, this prevents a unique determination of the SLD profile if any one of those reflectivities were measured. Majkrzak & Berk (2003[Majkrzak, C. F. & Berk, N. F. (2003). Physica B, 336, 27-38.]) constructed a set of similar neutron SLD profiles that result in the same ambiguity (profiles 1 and 2 in Fig. 7[link]). Both studies demonstrated that even partial knowledge of the sample structure can be insufficient to uniquely determine the SLD profile. It was shown that additional information, such as embedded reference structures, is necessary to uniquely determine the profiles.

[Figure 7]
Figure 7
(a) Two distinct surface structures (profile 1 and profile 2) on an Si substrate in air are indistinguishable by (non-polarized) neutron reflection, as shown by the red and black reflectivity curves in (b) (Majkrzak & Berk, 2003[Majkrzak, C. F. & Berk, N. F. (2003). Physica B, 336, 27-38.]). However, a reference layer buried beneath the surface structure [gray slab in (a) with 120 Å thickness and tunable SLD] is able to sufficiently increase the signal-to-noise ratio in the reflectivity to resolve the two profiles (blue and green curves, exemplarily shown for ρnnucl = 10 × 10−6 Å−2, ρnsplit = 0). Error bars indicate 68% confidence limits.

Here, we systematically explore this problem by burying a tunable soft magnetic reference layer of finite thickness near the substrate surface [gray layer in Fig. 7[link](a)]. The SLD of the reference layer attains two values that depend on the polarization of the incident neutrons. The total SLD of the reference layer is the sum of the nuclear SLD and the magnetic splitting, ρn ± = ρnnucl ± ρnsplit. Both parameters depend on the choice of the magnetic material and were thus systematically varied in the analysis (see Table 3[link]). For every point in the optimization space, including those with ρnsplit = 0, two reflectivity curves were simulated and analyzed, one for each neutron polarization. The statistical quality of each curve is equivalent to that obtained after 30 h counting on the Magik reflectometer at the NCNR. Reflectivities with ρnsplit = 0 are equivalent to those that would be obtained in a non-polarized NR experiment. The condition at which ρnnucl matches that of the underlying Si substrate and ρnsplit = 0 reproduces the structures described in the original work (Majkrzak & Berk, 2003[Majkrzak, C. F. & Berk, N. F. (2003). Physica B, 336, 27-38.]).

Table 3
Results for a virtual reflectivity experiment with polarized neutrons on the system shown in Fig. 7[link](a)

A 120 Å-thick reference layer is buried beneath the interfacial profile of interest. The SLD of this layer consists of a nuclear part and a magnetic splitting which leads to distinct SLD values seen by different neutron polarizations in a scattering experiment. The impact of (ρnnucl, ρnsplit) on ΔH was systematically evaluated in this analysis and is presented here for selected values. For other details refer to Table 1[link].

      MCMC fit results (θ | y)
      SLD of reference layer, (ρnnucl, ρnsplit) (10−6 Å−2)
      Profile 1 Profile 2
Parameter Parameterized sample representation, θ Fit boundaries, prior PDF limits (2, 0) (6, 0) (6, 0) (6, 2)
Nuclear SLD reference layer, ρnnucl (10−6 Å−2) [−2, 10] ±0.5 2.000 ± 0.002 6.002 ± 0.002 5.997 ± 0.002 6.002 ± 0.002
Magnetic SLD of reference layer, ρnsplit (10−6 Å−2) [0, 4] ±0.5 0.000 ± 0.004 0.001 ± 0.002 0.001 ± 0.002 2.001 ± 0.002
SLD layer 1 (10−6 Å−2) 0.6/0.4 [−0.5, 2.5] 0.5 ± 0.2 0.601 ± 0.006 0.402 ± 0.006 0.400 ± 0.006
SLD layer 2 (10−6 Å−2) 0.4/0.7 [−0.5, 2.5] 0.6 ± 0.1 0.407 ± 0.005 0.701 ± 0.006 0.698 ± 0.006
SLD layer 3 (10−6 Å−2) 0.7/0.5 [−0.5, 2.5] 0.6 ± 0.2 0.703 ± 0.006 0.493 ± 0.006 0.509 ± 0.006
SLD layer 4 (10−6 Å−2) 1.4/1.3 [−0.5, 2.5] 1.4 ± 0.2 1.400 ± 0.006 1.297 ± 0.006 1.302 ± 0.005
SLD layer 5 (10−6 Å−2) 1.2/1.45 [−0.5, 2.5] 1.4 ± 0.1 1.197 ± 0.006 1.451 ± 0.005 1.437 ± 0.005
SLD layer 6 (10−6 Å−2) 1.6/1.3 [−0.5, 2.5] 1.6 ± 0.2 1.602 ± 0.006 1.304 ± 0.006 1.298 ± 0.006
Interfacial roughness (Å) 3 [2, 5] 2.7 ± 0.5 3.0 ± 0.1 2.9 ± 0.1 3.09 ± 0.08
Log10 background spin ↑↑ −8 [−9, −5] −8.1 ± 0.2 −8.3 ± 0.4 −8.3 ± 0.4 −7.9 ± 0.2
Log10 background spin ↓↓ −8 [−9, −5] −8.0 ± 0.1 −7.7 ± 0.3 −8.1 ± 0.5 −7.7 ± 0.4

The information gain ΔH from virtual NR measurements of profiles 1 and 2 as a function of ρnnucl and ρnsplit is shown in Fig. 8[link]. In agreement with the previous work (Majkrzak & Berk, 2003[Majkrzak, C. F. & Berk, N. F. (2003). Physica B, 336, 27-38.]), ΔH is small for the original configuration without reference layer (ρnnucl = 2 × 10−6 Å−2, ρnsplit = 0). Correlation plots between the fit parameters in this configuration (see Figs. S5 and S6 in the supporting information) reveal that the MCMC identifies several distinct solutions for the SLD values of layers 1–6 of the two interfacial structures shown in Fig. 7[link].

[Figure 8]
Figure 8
Information gain ΔH (KDN estimate) for profiles 1 (a) and 2 (b) as functions of the nuclear and magnetic SLDs of a buried reference layer (see Fig. 7[link]).

If one of the neutron-polarization-dependent SLDs of the reference layer is sufficiently different (by 2 × 10−6 Å−2 or more) from that of the substrate, a unique solution is obtained. This is reflected in Fig. 8[link] by an increase in information gain of ∼15 bits; and the parameter correlation plots (see supporting information) collapse into one solution. Most notably, a polarized NR experiment (ρnsplit ≠ 0) is not required to uniquely resolve profiles 1 and 2, as the information gain does not depend on ρnsplit, as long as neither of ρn± matches that of the substrate. Table 3[link] lists the detailed results and their confidence limits for four selected points of the optimization: profile 1 without reference layer, profiles 1 and 2 with reference layer (ρnnucl = 6 × 10−6 Å−2, ρnsplit = 0), and profile 2 with reference layer and magnetic splitting (ρnnucl = 6 × 10−6 Å−2, ρnsplit = 2 × 10−6 Å−2). The information gain provided by the reference layer translates into significantly smaller uncertainties of the SLD values of the surface structure (layers 1–6).

In conclusion, a high-index reference layer boosts the overall reflectivity of the interfacial structure such that subtle details buried in noise of the reflectivity of the original structure become accessible. The near identity of the reflection spectra is not abrogated by the reference layer, but their overall magnitude is shifted to a level where the experiment can distinguish them, given the signal-to-noise ratio of a typical measurement.

4. Discussion

We have implemented a framework based on Bayesian statistics and information theory (Liepe et al., 2013[Liepe, J., Filippi, S., Komorowski, M. & Stumpf, M. P. H. (2013). PLoS Comput. Biol. 9, e1002888.]) to optimize surface-sensitive scattering experiments by evaluating their information content as a function of experimental parameters. The information content [\Delta {\widehat H}] of the experiment is obtained by approximating the mutual information between the prior and posterior PDFs using virtual experiments. By necessity, we applied a number of restrictions to our implementation. Instead of computing the full mutual information, we use a single representative parameter vector θ from the prior distribution. We also simulated only up to ten data sets y(θ) per sample representation θ, which differ by Gaussian noise. Analyzing the information gain ΔH obtained from those y(θ), we showed that the observed standard deviation of ΔH is sufficiently small that its average can be used to approximate [\Delta {\widehat H}].

We used two approaches to entropy calculation of the posterior PDF: the multivariate normal probability density approximation (MVN estimate) and an approach that samples directly from the posterior PDF (KDN estimate) (Kramer et al., 2010[Kramer, A., Hasenauer, J., Allgöwer, F. & Radde, N. (2010). 2010 IEEE International Conference on Control Applications, pp. 493-498. IEEE.]). We observed that these approaches yield qualitatively consistent results, as exemplified in Figs. 3[link] and 4[link], in which the MVN approach tends to overestimate ΔH. On the other hand, the KDN algorithm produces occasional outliers that need to be eliminated.

Results on a simple reflectometry problem of a fluid-immersed sample (Fig. 2[link]) demonstrated the usefulness of our approach. They confirm existing best practices on choosing the isotopic composition of aqueous solvents (Figs. 3[link] and 4[link]). For this example, the largest gains in information were obtained when the scattering contrasts between different components of the sample structure – such as the substrate, the interfacial structure and the surrounding aqueous solvent – were large. The least gain in information was obtained when the SLD of the solvent was matched to that of the substrate and the interfacial structure. Under contrast-matching conditions, when zero reflectivity is measured, the computed information gain was larger than zero. Since the model contains implicit parameters, such as that of the substrate SLD, even measuring zero reflectivity provides information.

Our approach explicitly simulates background levels for different isotopic compositions of aqueous solvents and air. H2O-based solvents create a large incoherent background and yield the lowest signal-to-noise ratio in the measured reflectivity. Therefore, predominantly H2O-based solvents are at a disadvantage over D2O-based solvents with comparable scattering contrast to the substrate and interfacial structure. In Fig. 3[link], this effect contributes to an asymmetry in information gain around the minimum located at substrate-matching conditions (ρn = 2 × 10−6 Å−2). Similarly, Fig. 4[link] shows that a combination of CM4 (ρn = 4 × 10−6 Å−2) and D2O yields the same information gain as a combination of H2O and D2O, although the latter creates the greater overall scattering contrast and should, therefore, provide more information.

We showed for the same sample structure that an extension of the measurement time t per reflectivity curve above the empirically determined optimal value of 3–6 h on the simulated instrumentation is ineffective, as the information gain is a logarithmic function of t (Fig. 5[link]). Furthermore, we determined the number of independent parameters supported by the data and the model by modeling the information gain as being limited by the Gaussian channel at the center of our optimization procedure (see Fig. 1[link]). We neglected other losses of information that might occur during the calculation of the noise-free reflectivity x, and we relied on the assumption that the MCMC robustly finds the global solution of the fitting problem. In quantitative terms, the fit of the time dependence to equation (23)[link] suggested that a measurement can resolve four independent parameters in this example. Equation (23)[link] further allowed us to independently determine the rate of information gain as a function of measurement time per independent parameter. This constitutes an improvement over similar studies in small-angle scattering (Pedersen et al., 2014[Pedersen, M. C., Hansen, S. L., Markussen, B., Arleth, L. & Mortensen, K. (2014). J. Appl. Cryst. 47, 2000-2010.]).

We also studied the dependence of ΔH on the maximum momentum transfer Qz,max in a measurement. Fig. 6[link] visualizes a result that is consistent with the rule of thumb expressed as the canonical resolution of a reflectivity experiment. For optimal information gain, a measurement must extend to a value of Qz that depends on the smallest feature size of the sample. We showed for our example that extending the measurement beyond this value does not yield a significant increase in information gain, other than that stemming from the time spent measuring additional data points. Moreover, Fig. 6[link] demonstrates that the transition from ignorance to knowledge is rather sudden at this particular value of Qz (left to right) – in contrast to the steady logarithmic increase in information gain as a function of simulated counting times (bottom to top).

While the virtual sample structure has a smallest length scale of 20 Å, real samples typically do not have such a limit. Nevertheless, the conclusion can be drawn that spending measurement time to assess reflectivities beyond the Qz,max that is associated with the smallest feature size of interest to the experimenter is not advisable. These conclusions are drawn from the simple structure investigated here; but the clarity of the result indicates that the general approach will probably also yield valuable and interesting results for more complex systems. Here, we have not explicitly tested the main consequence of the Shannon–Nyquist sampling theorem – the notion that a minimum spacing between data points in Qz is required to resolve a structure in real space with a limited total length (Vestergaard & Hansen, 2006[Vestergaard, B. & Hansen, S. (2006). J. Appl. Cryst. 39, 797-804.]; Shannon, 1949[Shannon, C. E. (1949). Proc. Inst. Radio Eng. 37, 10-21.]). However, this requirement is fulfilled in all virtual experiments simulated here; in fact, the simulated reflectivities are typically heavily oversampled.

While reference layers of high SLD and, in particular, magnetized reference layers probed with spin-polarized neutrons are increasingly used in NR investigations, details of how and to what degree they boost the information gain from the experiment have not been systematically investigated. Surprisingly, we found that performing a spin-polarized experiment is not required for maximum information gain in our example (see Fig. 8[link]). Magnetic splitting of the reference layer SLD is only effective if the nuclear SLD is near that of the Si substrate. However, once the nuclear SLD is shifted away from ρnnucl = 2 × 10−6 Å−2, the information gain is largely independent of the splitting. Our results show essentially uniform information gain for 0 ≤ ρnsplit ≤ 4 × 10−6 Å−2, for sufficiently large ρnnucl > 4 × 10−6 Å−2. Fig. 7[link](b) visualizes the mechanism by which the nanoscopic reference layer increases the information gain. While the interference structure of the reflectivity curve from the sample with the reference layer is more regular than that without, showing how the high-SLD layer dominates the signal, the details of the interference pattern can be more precisely determined because the signal is raised above the noise. Therefore, profiles 1 and 2 can be distinguished on a sample with a reference layer, but not without such a layer.

Because spin-polarized measurements typically occur at 1/2 of the unpolarized beam intensity, we conclude that polarized reflectometry is not always effective for measurements such as those discussed here. It will be interesting to study how more complex samples, such as partially hydrated sample structures, are affected by reference layers. The result that a polarized neutron measurement is not required to resolve the sample structure does not contradict theoretical work that polarized reflectometry and magnetic reference layers are sufficient to analytically reconstruct the SLD profile of certain classes of samples (direct inversion) (Majkrzak et al., 2009[Majkrzak, C. F., Berk, N. F., Kienzle, P. A. & Perez-Salas, U. (2009). Langmuir, 25, 4154-4161.]), nor that analytical data inversion recovers the maximum information content from the experimental data (phase-inversion principle) (Berk & Majkrzak, 2009[Berk, N. F. & Majkrzak, C. F. (2009). Langmuir, 25, 4132-4144.]).

5. Conclusion

We implemented a Bayesian and information theoretical framework to determine the information gain from reflectometry experiments with the purpose of experimental optimization. We applied this framework to a selection of test problems, demonstrating its usability and confirming many best practices that have guided the design of reflectometry experiments for a long time. At the same time, we gained non-intuitive insights that challenge some of them. A next step in this development will be an extension to more complex, and more relevant, applications of reflectometry in current research. Marginalization of the posterior PDF will be required to tailor the experiment effectively to a subset of parameters that are of immediate interest to the experimenter. With this in place, we predict significant utility of this framework for the optimization of reflectometry experiments from complex biomimetic interfaces.

Supporting information


Footnotes

1With current technology, the computation of p(θ | y) for a model of average complexity used in this work takes approximately 1.5 CPU hours. The calculation of H(Θ | y) is negligible in comparison. With a sample size of 1000 in Y, a nested Monte Carlo simulation would average 1500 CPU hours ≃ 60 CPU days per [\Delta {\widehat H}]. Optimizations in this work would require the calculation of between 15 and 200 values of [\Delta {\widehat H}], requiring 900 to 12 000 CPU days total.

2This neglects the diminishing effect of background on the relative variance. However, the background arising from the D2O solvent in the current sample is sufficiently small.

Acknowledgements

We thank Drs Markus Deserno and David L. Worcester for critical discussions. Certain commercial materials, equipment and instruments are identified in this work to describe the experimental procedure as completely as possible. In no case does such an identification imply a recommendation or endorsement by National Institute of Standards and Technology, nor does it imply that the materials, equipment or instrument identified are necessarily the best available for the purpose.

Funding information

This work was supported by the US Department of Commerce (grants 70NANB13H009 and 70NANB17H299). Will Golding implemented the initial entropy calculations within the MCMC optimization software and was supported by the Center for High Resolution Neutron Scattering (CHRNS), a partnership between the National Institute of Standards and Technology and the National Science Foundation under agreement No. DMR-1508249. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant No. ACI-1053575. Specifically, it used the Bridges system, which is supported by NSF award No. ACI-1445606, at the Pittsburgh Supercomputing Center (PSC) (Towns et al., 2014[Towns, J., Cockerill, T., Dahan, M., Foster, I., Gaither, K., Grimshaw, A., Hazlewood, V., Lathrop, S., Lifka, D., Peterson, G. D., Roskies, R., Scott, J. R. & Wilkins-Diehr, N. (2014). Comput. Sci. Eng. 16, 62-74.]).

References

First citationAnkner, J. F. & Majkrzak, C. F. (1992). Proc. SPIE, 1738, 260–269.  CrossRef Google Scholar
First citationBerk, N. F. & Majkrzak, C. F. (2009). Langmuir, 25, 4132–4144.  CrossRef CAS Google Scholar
First citationBlasie, J. K., Zheng, S. & Strzalka, J. (2003). Phys. Rev. B, 67, 224201.  Web of Science CrossRef Google Scholar
First citationBraak, C. J. F. ter & Vrugt, J. A. (2008). Stat. Comput. 18, 435–446.  Google Scholar
First citationChen, B., Wang, J., Zhao, H. & Principe, J. C. (2016). Entropy, 18, 196.  CrossRef Google Scholar
First citationCover, T. M. & Thomas, A. J. (2006). Elements of Information Theory. Hoboken: Wiley-Interscience.  Google Scholar
First citationDura, J. A., Pierce, D. J., Majkrzak, C. F., Maliszewskyj, N. C., McGillivray, D. J., Lösche, M., O'Donovan, K. V., Mihailescu, M., Perez-Salas, U., Worcester, D. L. & White, S. H. (2006). Rev. Sci. Instrum. 77, 074301.  Web of Science CrossRef Google Scholar
First citationHeinrich, F. & Lösche, M. (2014). Biochim. Biophys. Acta, 1838, 2341–2349.  CrossRef CAS Google Scholar
First citationHolt, S. A., Le Brun, A. P., Majkrzak, C. F., McGillivray, D. J., Heinrich, F., Lösche, M. & Lakey, J. H. (2009). Soft Matter, 5, 2576–2586.  CAS Google Scholar
First citationKirby, B. J., Kienzle, P. A., Maranville, B. B., Berk, N. F., Krycka, J., Heinrich, F. & Majkrzak, C. F. (2012). Curr. Opin. Colloid Interface Sci. 17, 44–53.  CrossRef CAS Google Scholar
First citationKonarev, P. V. & Svergun, D. I. (2015). IUCrJ, 2, 352–360.  Web of Science CrossRef CAS PubMed IUCr Journals Google Scholar
First citationKramer, A., Hasenauer, J., Allgöwer, F. & Radde, N. (2010). 2010 IEEE International Conference on Control Applications, pp. 493–498. IEEE.  Google Scholar
First citationLarsen, A. H., Arleth, L. & Hansen, S. (2018). J. Appl. Cryst. 51, 1151–1161.  CrossRef CAS IUCr Journals Google Scholar
First citationLesniewski, J. E., Disseler, S. M., Quintana, D. J., Kienzle, P. A. & Ratcliff, W. D. (2016). J. Appl. Cryst. 49, 2201–2209.  CrossRef CAS IUCr Journals Google Scholar
First citationLiepe, J., Filippi, S., Komorowski, M. & Stumpf, M. P. H. (2013). PLoS Comput. Biol. 9, e1002888.  CrossRef Google Scholar
First citationLiepe, J., Kirk, P., Filippi, S., Toni, T., Barnes, C. P. & Stumpf, M. P. H. (2014). Nat. Protoc. 9, 439–456.  CrossRef CAS Google Scholar
First citationLuzzati, V. & Taupin, D. (1986). J. Appl. Cryst. 19, 39–50.  CrossRef CAS IUCr Journals Google Scholar
First citationMajkrzak, C. F. & Berk, N. F. (2003). Physica B, 336, 27–38.  CrossRef CAS Google Scholar
First citationMajkrzak, C. F., Berk, N. F., Kienzle, P. A. & Perez-Salas, U. (2009). Langmuir, 25, 4154–4161.  CrossRef CAS Google Scholar
First citationMajkrzak, C. F., O'Donovan, K. V. & Berk, N. F. (2006). Neutron Scattering From Magnetic Materials, pp. 397–471. Amsterdam: Elsevier.  Google Scholar
First citationMaranville, B. B., Kirby, B. J., Grutter, A. J., Kienzle, P. A., Majkrzak, C. F., Liu, Y. & Dennis, C. L. (2016). J. Appl. Cryst. 49, 1121–1129.  CrossRef CAS IUCr Journals Google Scholar
First citationMoore, P. B. (1980). J. Appl. Cryst. 13, 168–175.  CrossRef CAS IUCr Journals Web of Science Google Scholar
First citationMüller, J. J., Hansen, S. & Pürschel, H.-V. (1996). J. Appl. Cryst. 29, 547–554.  CrossRef Web of Science IUCr Journals Google Scholar
First citationPedersen, M. C., Hansen, S. L., Markussen, B., Arleth, L. & Mortensen, K. (2014). J. Appl. Cryst. 47, 2000–2010.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationRussell, T. P. (1990). Mater. Sci. Rep. 5, 171–271.  CrossRef CAS Google Scholar
First citationSchalke, M. & Lösche, M. (2000). Adv. Colloid Interface Sci. 88, 243–274.  CrossRef CAS Google Scholar
First citationShannon, C. E. (1949). Proc. Inst. Radio Eng. 37, 10–21.  Google Scholar
First citationSilverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. London, New York: Chapman and Hall.  Google Scholar
First citationSivia, D. S. & Webster, J. (1998). Physica B, 248, 327–337.  CrossRef CAS Google Scholar
First citationSmith, G. S. & Majkrzak, C. F. (2006). International Tables for Crystallography, Vol. C, Mathematical, Physical and Chemical Tables, edited by E. Prince, pp. 126–146. Chester: International Union of Crystallography.  Google Scholar
First citationTaupin, D. & Luzzati, V. (1982). J. Appl. Cryst. 15, 289–300.  CrossRef CAS Web of Science IUCr Journals Google Scholar
First citationTowns, J., Cockerill, T., Dahan, M., Foster, I., Gaither, K., Grimshaw, A., Hazlewood, V., Lathrop, S., Lifka, D., Peterson, G. D., Roskies, R., Scott, J. R. & Wilkins-Diehr, N. (2014). Comput. Sci. Eng. 16, 62–74.  CrossRef Google Scholar
First citationVestergaard, B. & Hansen, S. (2006). J. Appl. Cryst. 39, 797–804.  Web of Science CrossRef CAS IUCr Journals Google Scholar
First citationYustres, Á., Asensio, L., Alonso, J. & Navarro, V. (2012). Comput. Geosci. 16, 1–20.  CrossRef Google Scholar
First citationZimmermann, K. M., Tolan, M., Weber, R., Stettner, J., Doerr, A. K. & Press, W. (2000). Phys. Rev. B, 62, 10377–10382.  CrossRef CAS Google Scholar

This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.

Journal logoJOURNAL OF
APPLIED
CRYSTALLOGRAPHY
ISSN: 1600-5767
Follow J. Appl. Cryst.
Sign up for e-alerts
Follow J. Appl. Cryst. on Twitter
Follow us on facebook
Sign up for RSS feeds