Optimization of reflectometry experiments using information theory
^{a}Department of Physics, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213, USA, ^{b}Center for Neutron Research, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899-6102, USA, and ^{c}Department of Biomedical Engineering, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213, USA
^{*}Correspondence email: fheinrich@cmu.edu
A framework based on Bayesian statistics and information theory is developed to optimize the design of surface-sensitive reflectometry experiments. The method applies to model-based reflectivity data analysis, uses simulated reflectivity data and is capable of optimizing experiments that probe a sample under more than one condition. After presentation of the underlying theory and its implementation, the framework is applied to exemplary test problems for which the information gain ΔH is determined. Reflectivity data are simulated for the current generation of neutron reflectometers at the NIST Center for Neutron Research. However, the simulation can be easily modified for X-ray or neutron instruments at any source. With application to structural biology in mind, this work explores the dependence of ΔH on the scattering length density of aqueous solutions in which the sample structure is bathed, on the counting time and on the maximum momentum transfer of the measurement. Finally, the impact of a buried magnetic reference layer on ΔH is investigated.
1. Introduction
Neutron reflectometry (NR) is a technique that resolves the thickness and composition of thin films at interfaces and surfaces with near-ångström resolution (Smith & Majkrzak, 2006). Applications of NR reach from hard condensed matter (Majkrzak et al., 2006) to soft matter (Russell, 1990), including structural biology of lipid membranes (Heinrich & Lösche, 2014). Given the limited availability of neutrons for scattering experiments and the flexibility in isotopic labeling of distinct components of the surface structure, it is worthwhile to optimize the experimental design with respect to the information gain. Presently, the design of neutron scattering experiments mostly follows rules of thumb, i.e. experience gained in similar experiments in the past. Here, we implement a quantitative and predictive framework to plan reflectometry work based on rigorous estimates of the information gained in a particular implementation of the experiment. With minor changes, this framework is applicable to X-ray reflectometry and, with some extension, to neutron and X-ray small-angle scattering.
In the recent past, Bayesian statistical methods have found applications in reflectometry for robust global model fitting and the determination of confidence limits on model parameters (Sivia & Webster, 1998; Kirby et al., 2012; Maranville et al., 2016; Lesniewski et al., 2016). In particular, the work of Sivia and coworkers has provided a solid foundation for the application of Bayesian statistics to reflectivity data, discussing aspects such as parameter estimation, model selection and experimental design. Our work concerning experimental optimization adds to this foundation by introducing model fitting based on a Markov chain Monte Carlo (MCMC) simulation, which by design yields a sample of the posterior parameter density function (PDF) (Yustres et al., 2012; Braak & Vrugt, 2008; Lesniewski et al., 2016). A measure of the information gain from a given experiment is obtained by comparing the entropies of the posterior and prior PDFs, which represent the knowledge about the sample after and before the experiment, respectively. We show that with these two additions a flexible numerical framework for experimental design can be built. In developing this methodology for reflectometry, we closely follow established implementations in other fields such as systems biology (Liepe et al., 2013, 2014).
Fig. 1 summarizes the implemented method to quantify the information gain of an experiment. We start with a model-dependent description of a hypothetical sample structure S and instrument configuration E parameterized by a vector θ ∈ Θ. (Capital letters denote a random variable, and lowercase letters denote a particular instance of a random variable.) Importantly, θ is not randomly drawn from Θ according to the prior PDF, as we do not optimize over different sample configurations within the prior. Using a model X_{S,E}(θ), noise-free reflectivity data X_{S,E}(θ) → x(Q_z) are simulated over a finite range of discrete, experimentally accessible momentum transfer values Q_z. Random normal noise z(Q_z) is added to x(Q_z) to obtain simulated sets of noisy data y(Q_z) that could have occurred in a real measurement of the hypothetical structure. The standard deviation σ(Q_z) of the normal noise Z depends on the instrument configuration, the momentum transfer Q_z and the value of x(Q_z) itself; it generally differs for every data point. Finally, model parameters are retrieved from y using an MCMC simulation that returns a sample of the posterior PDF p(θ | y). The information gain of the virtual experiment is evaluated as the difference in entropy of the prior and posterior PDFs, ΔH = H(Θ) − H(Θ | y). Since the MCMC simulation employs the same model that was used to calculate x(Q_z), ΔH is a measure of the gain in information exclusively about model and experimental parameters contained in Θ. Other parameters intrinsic to the model that have fixed values, such as the known scattering length density of a substrate supporting the interfacial structure, and instrumental parameters like those defining the resolution function, do not affect ΔH. To optimize an NR experiment, sample or experimental properties are systematically varied to determine the maximum of ΔH in the search space.
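This virtual-experiment pipeline can be sketched in a few lines of Python. The reflectivity model and noise model below are hypothetical stand-ins for illustration only; the actual calculations in this work use refl1d slab models and an instrument-specific σ(Q_z):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: a real workflow would compute x(Qz) from a
# slab model with refl1d and sample p(theta|y) with its MCMC engine.
def model(theta, qz):
    """Toy 'reflectivity' curve parameterized by theta = (d, rho)."""
    d, rho = theta
    return 1e-4 * (1.0 + np.cos(qz * d)) * np.exp(-qz / rho) + 1e-7

def noise_sd(x):
    """Assumed 5% relative noise, standing in for counting statistics."""
    return 0.05 * x

qz = np.linspace(0.008, 0.26, 100)   # accessible momentum transfer range
theta = (30.0, 2.0)                  # one sample representation from the prior

x = model(theta, qz)                      # noise-free data x(Qz)
y = x + rng.normal(0.0, noise_sd(x))      # noisy virtual measurement y(Qz)
# Next steps (not shown): MCMC fit of y with the same model, then
# Delta-H = H(prior) - H(posterior) from the returned posterior sample.
```

The loop closes by feeding y back into the same model via MCMC, so any entropy reduction reflects only the fit parameters in Θ.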
Different approaches to determining the information content of small-angle scattering data have been established in the past (Moore, 1980; Taupin & Luzzati, 1982; Luzzati & Taupin, 1986; Müller et al., 1996; Vestergaard & Hansen, 2006; Pedersen et al., 2014; Konarev & Svergun, 2015; Larsen et al., 2018). While the employed methods differ substantially, these approaches have in common that the information content of the experiment is quantified either directly from y, or from y given x and θ. Our method is strictly model based and describes the information gain from virtual experiments as a series of discrete information-processing steps, as shown in Fig. 1, ultimately comparing the two endpoints of this process with respect to their information content. This has the advantage that it separates the information gain on model fit parameters from that on known model and experimental parameters, about which information is also carried by x and y. The procedure has a further advantage over other implementations in that it can handle information distributed across multiple related measurements that are analyzed simultaneously with one model.
After establishing the methodology, we apply it to a set of simple model systems, thereby demonstrating the optimization of fundamental experimental properties, such as counting time, maximum momentum transfer and the choice of the scattering length density (SLD) of the bulk solvent in NR measurements of fluid-immersed samples. These examples have been chosen to best illustrate the broad usefulness of the technique, and they can easily be extended to encompass other experimental situations of practical interest. For example, while the method described here is applied to the current generation of monochromatic neutron reflectometers, it can be adapted to other types of reflectometers or be used to predict the performance of experimental stations under development.
2. Theory and implementation
2.1. Information content of specular reflection data
A specular neutron or X-ray reflectometry experiment on a sample S using an experimental configuration E results in a particular measurement of the data, y ∈ Y. Experimental results are generally analyzed in terms of a model, X_{S,E}(θ), that relates a model parameter vector θ ∈ Θ to an expected experimental outcome x. The aim of data analysis is to find the posterior PDF p(θ | y) by which a particular parameter vector θ is realized given y and X_{S,E}(θ). The traditional task of finding the vector θ that produces the best fit to the data, i.e. the maximum of p(θ | y), is therefore only a particular aspect within this broader definition of data analysis.
The information gain ΔH is defined as the difference between the entropy H(Θ) of the prior PDF p(θ), representing the knowledge before the experiment, and the entropy H(Θ | y) of the posterior PDF p(θ | y), obtained after the measurement yielded a particular experimental outcome y ∈ Y:

ΔH = H(Θ) − H(Θ | y). (1)
Both entropies are functionals of their respective continuous PDFs:

H(Θ) = −∫_Θ p(θ) log p(θ) dθ, (2)

H(Θ | y) = −∫_Θ p(θ | y) log p(θ | y) dθ. (3)
(All logarithms in this work are taken to base two, such that entropies are calculated in bits.) This approach does not consider that the experimental outcome y is itself a random variable, drawn from a pool of possible experimental outcomes Y. The appropriate, but significantly more expensive, quantity to calculate is the expected information gain ⟨ΔH⟩ over all possible experimental outcomes y, which equals the mutual information I(Θ, Y) between the random variables Y and Θ:

⟨ΔH⟩ = I(Θ, Y) = H(Θ) − ∫_Y p(y) H(Θ | y) dy. (4)
Using equation (4), ⟨ΔH⟩ can in principle be computed as a Monte Carlo integration over Θ and Y (Liepe et al., 2013). The prior predictive distribution p(y) additionally needs to be computed (Liepe et al., 2013), and it can be expressed in terms of the prior and posterior PDFs using Bayes' theorem:

p(y) = p(y | θ) p(θ) / p(θ | y). (5)
The conditional PDF p(y | θ) of observing a particular experimental outcome y, given a parameter vector θ, can be obtained using the model X_{S,E}(θ) and instrument-specific normal variate random noise on the n data points of y(Q_z) with standard deviation σ(Q_z):

p(y | θ) = ∏_{i=1}^{n} [2πσ²(Q_{z,i})]^{−1/2} exp{−[y(Q_{z,i}) − x(Q_{z,i})]² / [2σ²(Q_{z,i})]}. (6)
In practice, however, such a nested Monte Carlo integration is computationally costly.^{1} We therefore approximate ⟨ΔH⟩ from the average of up to ten calculations of ΔH using independently simulated experimental data y (Liepe et al., 2013).
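The averaging strategy can be illustrated on a toy problem with an analytic posterior. Everything below (the one-parameter model, prior width, noise level) is an assumption chosen for illustration, not the NR model:

```python
import numpy as np

rng = np.random.default_rng(1)

# One-parameter toy problem: theta_true is measured n times with Gaussian
# noise, so the posterior width is analytic and Delta-H can be computed in
# closed form for each simulated data set y.
def delta_H_bits(theta_true, n=50, noise=1.0, prior_width=10.0):
    y = theta_true + rng.normal(0.0, noise, n)      # one virtual experiment
    sigma_post = y.std(ddof=1) / np.sqrt(n)         # posterior width from y
    H_prior = np.log2(prior_width)                  # uniform prior, bits
    H_post = 0.5 * np.log2(2 * np.pi * np.e * sigma_post**2)
    return H_prior - H_post

# Approximate <Delta-H> by averaging up to ten independent repetitions,
# as done in this work; the spread quantifies the residual error.
samples = [delta_H_bits(3.0) for _ in range(10)]
mean_dH, sd_dH = float(np.mean(samples)), float(np.std(samples))
```

Because the posterior width is estimated from each noisy data set, ΔH fluctuates between virtual experiments; the mean approximates ⟨ΔH⟩.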
2.2. Implementation of the algorithm
2.2.1. Simulation of experimental data
This work is carried out with simulated data to avoid systematic errors due to particular experimental instrumentation and to explore a large parameter space for optimization. Experimental data y(Q_z) are simulated for the MAGIK reflectometer at the NIST Center for Neutron Research (NCNR) in Gaithersburg, MD, USA (Dura et al., 2006), with a beam footprint on the sample surface of 2.5 × 5 cm, equipped with a fluids cell for solvent-immersed samples. The models X_{S,E}(θ) of interfacial structures were implementations of stratified slabs of homogeneous SLD (slab models) (Ankner & Majkrzak, 1992). Noise-free experimental outcomes x(Q_z) were calculated with refl1d (Kirby et al., 2012). To obtain the final simulated reflectivity y(Q_z), normally distributed random noise z(Q_z) with standard deviations σ_{S,E}[Q_z, x(Q_z), θ] was added to x(Q_z). A detailed description of the data simulation is provided in the supporting information.
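A minimal sketch of such a noise simulation, assuming pure counting statistics with an assumed, illustrative incident-count scale; the full simulation used in this work also treats background and incident-intensity normalization (see the supporting information):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simplified counting-statistics noise model: the specular count
# N = incident_counts * x is Poisson-distributed, so the 1-sigma
# uncertainty on the reflectivity R is sqrt(N)/incident_counts.
def simulate_noisy(x, incident_counts=1e6):
    N = incident_counts * x                                  # expected counts
    sigma = np.sqrt(np.maximum(N, 1.0)) / incident_counts    # 1-sigma on R
    y = x + rng.normal(0.0, sigma)
    return y, sigma

qz = np.linspace(0.008, 0.26, 100)
x = 1e-4 * np.exp(-10.0 * qz)        # toy noise-free reflectivity curve
y, sigma = simulate_noisy(x)
```

Note that σ depends on Q_z and on x(Q_z) itself, as stated above, so every data point carries a different uncertainty.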
For the optimization of a particular experiment, parameter vectors θ ∈ Θ for data simulation should be strictly drawn at random from Θ according to the prior PDF p(θ) (see Introduction). However, for the problems discussed in this work, and for many other applications, the sample structure is sufficiently well known that variations within p(θ) are not expected to significantly affect ΔH and are negligible compared with changes in ΔH that occur when systematically varying parameters over much larger ranges during the experimental optimization. Choosing one sample representation θ ∈ Θ from the prior PDF has the additional advantage that a costly Monte Carlo simulation over the prior PDF, which would be otherwise necessary, is avoided (Liepe et al., 2013). For largely unknown sample structures that would be implemented by a much wider prior PDF, the dependence of ΔH on the particular sample representation θ ∈ Θ might not be negligible. In this case, the framework presented in this work can easily be extended to include a Monte Carlo simulation over the prior PDF.
2.2.2. Entropy of the prior parameter density function
Lacking more detailed prior knowledge, the prior PDF p(θ) is taken as the product of the prior probabilities of the assumed independent vector components θ_i. It is further assumed that the PDF of each component, p(θ_i), is constant over an interval Δθ_i with p(θ_i) = 1/Δθ_i:

p(θ) = ∏_i p(θ_i) = ∏_i 1/Δθ_i. (7)
Using equation (2), the entropy of p(θ) becomes the sum of the entropies of the independent p(θ_i):

H(Θ) = Σ_i log Δθ_i. (8)
The prior PDF is subjectively set by the experimenter. In addition to the calculation of ΔH, it is used to compute the acceptance probability of new states of the Markov chain during the MCMC analysis of y(Q_z). As such, the choice of Δθ_i affects the posterior PDF in that it excludes parameter values outside of Δθ_i. Only parameters with non-uniform contributions to the posterior PDF, or in other words parameters that can be resolved with respect to the prior PDF, add to the information gain (see Section 3.1.3 for a detailed discussion). A change in interval length Δθ_i for a resolvable parameter leads to a constant offset in H(Θ) and, therefore, ΔH, which is inconsequential for the purpose of experimental optimization, as it relies on relative differences in ΔH.
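The sum-of-logs prior entropy and the constant-offset behavior described above can be verified numerically; the interval lengths below are assumed, illustrative values:

```python
import numpy as np

# Uniform, independent priors: the entropy is the sum of log2 of the
# interval lengths Delta-theta_i, in bits.
def prior_entropy_bits(widths):
    return float(np.sum(np.log2(widths)))

# Example: seven fit parameters with assumed interval lengths.
widths = [20.0, 20.0, 4.0, 4.0, 2.0, 1.0, 1e-6]
H_prior = prior_entropy_bits(widths)

# Doubling one resolvable parameter's interval adds exactly 1 bit to
# H(Theta), i.e. a constant offset to Delta-H that cancels out when
# comparing relative differences during optimization.
widths2 = list(widths)
widths2[0] *= 2
assert abs(prior_entropy_bits(widths2) - H_prior - 1.0) < 1e-12
```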
2.2.3. Entropy of the posterior parameter density function
The posterior PDF is obtained from an MCMC simulation implemented in refl1d (Kirby et al., 2012) using the simulated data y and the model X_{S,E}(θ) as inputs. MCMC analysis yields an unnormalized sample of the posterior PDF. We calculated the entropy of p(θ | y) using two different methods. The first method constructs a multivariate normal (MVN) probability density approximation from a random sample of 1000 points of the posterior:

p_MVN(θ) = (2π)^{−d/2} |Σ|^{−1/2} exp[−(θ − μ)^T Σ^{−1} (θ − μ)/2]. (9)
The vector μ is the mean of the sample of d-dimensional parameter vectors θ, and |Σ| denotes the determinant of the variance–covariance matrix Σ of θ. Both values can be defined in terms of an expectation value:

μ = E[θ], Σ = E[(θ − μ)(θ − μ)^T]. (10)
The entropy of the MVN distribution is then computed as (Chen et al., 2016)

H^MVN(Θ | y) = (1/2) log[(2πe)^d |Σ|]. (11)

The second method to calculate the entropy of the posterior follows Kramer et al. (2010). Here, the entropy of the posterior is obtained by Monte Carlo sampling from the unnormalized MCMC output using a sample size of 5000, while the normalization factor is obtained from a kernel density estimate using a Gaussian kernel (Silverman, 1986). In the following, we denote this approach as the KDN method.

The sample sizes for the MVN and KDN approaches represent limits for which a computation of an equilibrated MCMC plus entropy estimate remains feasible given current computational resources. Because of those limits on sample size and the high dimensionality of NR models, an accurate and robust computation of the posterior entropy is often challenging. The MVN estimate fulfills here the role of a widely used reference against which the KDN estimate can be validated. As shown in Section 3, the MVN approximation was robust over repetitions of MCMC simulations but tended to underestimate the entropy of the posterior and, therefore, overestimated ΔH. This is not unexpected, as the MVN approximation performs less well on non-normal, i.e. asymmetric and heavily tailed, distributions (Kramer et al., 2010). The KDN method proved less robust, in turn, leading to somewhat larger standard deviations on ΔH and occasional outliers that were identified and eliminated.

2.2.4. Information gain
As discussed above, the computation of ⟨ΔH⟩ is significantly costlier than calculating the gain in information ΔH obtained from only a single experimental representation y. For large multivariate models such as those used in NR, computing ⟨ΔH⟩ is currently not feasible. The differences between ΔH and ⟨ΔH⟩ have been determined for models with fewer parameters. While significant, they were shown to be smaller than the changes in ⟨ΔH⟩ due to experimental optimization in other applications (Liepe et al., 2013). This observation is in agreement with results in this work. We estimated the difference between ⟨ΔH⟩ and ΔH by averaging multiple independent values of ΔH and showed that the variations in ΔH for individual points of the optimization are significantly smaller than the changes in ΔH over the entire range of systematically varied parameters to be optimized.
2.2.5. Marginalization of the posterior parameter density function
Most models X_{S,E} contain a subset of nuisance parameters δ that are required for constructing a valid model but are not of practical interest to the experimenter. Together with the parameters of interest η, they form the parameter space θ = (η, δ). Consequently, the relevant quantity for optimizing an NR experiment is often the marginal entropy of the posterior, H_η(Θ | y), with respect to the parameters of interest (Sivia & Webster, 1998; Chen et al., 2016):

H_η(Θ | y) = −∫ p(η | y) log p(η | y) dη. (12)
The marginal posterior PDF p(η | y) is obtained by integrating the joint posterior probability of η and δ over the nuisance parameters δ:

p(η | y) = ∫ p(η, δ | y) dδ. (13)
Using an MVN distribution, a marginal H_η^MVN(Θ | y) is calculated rather easily by dropping the unwanted parameters from the covariance matrix and the mean vector:

H_η^MVN(Θ | y) = (1/2) log[(2πe)^{d_η} |Σ_η|], (14)

where Σ_η is the variance–covariance matrix with the rows and columns of the nuisance parameters removed and d_η is the number of parameters of interest.
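The MVN entropy and its marginalization can be sketched with numpy; the sample size and the diagonal covariance below are assumed, illustrative values, not posteriors from an actual NR fit:

```python
import numpy as np

# Entropy (in bits) of a multivariate normal fitted to a posterior sample:
# H = 0.5 * log2((2*pi*e)**d * |Sigma|).
def mvn_entropy_bits(sample):
    s = np.asarray(sample, dtype=float)
    d = s.shape[1]
    cov = np.cov(s, rowvar=False)
    _, logdet = np.linalg.slogdet(cov)          # natural-log determinant
    return 0.5 * (d * np.log2(2 * np.pi * np.e) + logdet / np.log(2))

# Marginal entropy over the parameters of interest: drop the nuisance
# columns (i.e. the corresponding rows/columns of Sigma) before fitting.
def marginal_mvn_entropy_bits(sample, keep):
    return mvn_entropy_bits(np.asarray(sample)[:, keep])

rng = np.random.default_rng(3)
sample = rng.multivariate_normal([0.0, 0.0, 0.0],
                                 np.diag([1.0, 4.0, 0.25]), size=5000)
H_total = mvn_entropy_bits(sample)              # full posterior entropy
H_eta = marginal_mvn_entropy_bits(sample, [0, 1])  # parameter 2 as nuisance
```

For this diagonal example the exact values are about 6.14 and 5.09 bits; the sample estimates land close to them.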
The computation of a KDN equivalent of the marginal entropy that involves Monte Carlo sampling from the MCMC-obtained posterior PDF is difficult and will be the topic of a future study. In this work, we exclusively compute total entropies of the posterior PDF and evaluate confidence limits on parameters of interest separately for selected points in the optimization space.

3. Results
3.1. A test structure
For a first demonstration of the method, we start with a simple artificial interfacial structure: a porous, atomically flat Si layer suspended above a planar solid Si substrate in aqueous solvent (Fig. 2). This structure is characterized at first with one and then with two NR measurements, evaluated with a simple slab model, and analyzed for the resulting information gain under systematic variation of the SLD of the solvent, ρ_{n}. Model parameters are provided in Table 1. The aim of the optimization is to identify the isotopic constitution of the aqueous solvent that maximizes information gain as ρ_{n} is varied between that of H_{2}O (ρ_{n} ≃ −0.5 × 10^{−6} Å^{−2}) and D_{2}O (ρ_{n} ≃ 6.5 × 10^{−6} Å^{−2}).

3.1.1. One solvent contrast
Fig. 3 shows the expected information gain from a single NR measurement as a function of the SLD of the aqueous medium that surrounds and penetrates the porous Si layer (`one solvent contrast'). The MVN and KDN methods for entropy determination yield similar results, with the MVN results consistently slightly higher than the KDN results. The error bars in Fig. 3 represent standard deviations from five independent simulations per data point and allow us to assess the error introduced by computing ΔH instead of ⟨ΔH⟩. For this particular example, this error is significantly smaller than the changes in ΔH due to variation in solvent SLD.
The minimum information gain is observed under the condition that the bulk solvent SLD matches that of Si (ρ_n ≃ 2 × 10^{−6} Å^{−2}). In this case, the porous solvent-filled Si slab, the substrate and the solvent all have the same SLD and are thus `invisible' to neutrons. The residual gain at the minimum of the curve, ΔH ≃ 3 bits, stems from a high confidence in determining the SLDs of the porous silicon layer, the bulk solvent and the scattering background. Because the SLD of the semi-infinite Si substrate is known, the unknown SLDs are well determined under matching conditions, even though the observed neutron reflectivity is zero. Other model parameters can only be resolved if the experimenter uses a solvent with an SLD different from that of Si. Consequently, the gain in information increases as the solvent SLD deviates from that of Si, reaching ΔH ≃ 10 bits for H_2O-based solvent and ΔH ≃ 15 bits for D_2O. For the same absolute difference of the solvent SLD from that of Si, positive SLD deviations yield a larger information gain than negative deviations. This phenomenon is due to several effects. First, because the incoherent scattering from D_2O is lower than that from H_2O, samples bathed in D_2O-rich solvent show a lower intrinsic scattering background. Second, the higher SLDs of D_2O-rich solvents lead to higher neutron reflectivity throughout the simulated range of Q_z (0.008 ≤ Q_z ≤ 0.26 Å^{−1}), which can be determined with higher confidence. Finally, the presence of a critical angle of total internal reflection in the NR curve increases the gain in information for solvent SLDs with ρ_n > 4 × 10^{−6} Å^{−2}, for which this critical angle can be observed within the simulated Q_z range.
Explicit fit parameters and their uncertainties for three exemplary bulk solvent SLD values are listed in Table 1. The parameter uncertainties for ρ_{n} = −0.5 × 10^{−6} Å^{−2} and ρ_{n} = 6.5 × 10^{−6} Å^{−2} are significantly smaller over the entire set of model parameters, reflecting the increased information gain for those solvents.
To put the abstract values of ΔH given in Fig. 3 into perspective, the following simplified comparison is instructive. Under the assumption that the posterior PDF of a single uncorrelated parameter can be approximated by a Gaussian distribution, the contribution of this parameter to the posterior entropy is determined by the standard deviation σ of the distribution:

H(Θ | y) = (1/2) log(2πeσ²). (15)
With respect to the entropy of the corresponding uniform prior PDF over the interval Δθ [equation (8)], the contribution to the information gain from this single parameter is

ΔH = log Δθ − (1/2) log(2πeσ²). (16)

A Gaussian posterior PDF with a standard deviation of approximately one-quarter [more precisely, 1/(2πe)^{1/2}] of the width of a uniform prior PDF has the same entropy as the latter and therefore yields zero information gain. Standard deviations above this threshold contribute a limited loss of information, particularly for the MVN estimate, only because of the different functional forms used to describe the prior and posterior PDFs. Equations (8) and (16) also show that decreasing the width of either a uniform prior or a Gaussian posterior PDF by one-half changes the information gain by ∼1 bit (neglecting parameter correlations).
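Both of these statements — the zero-gain threshold at σ = Δθ/(2πe)^{1/2} and the 1-bit-per-halving rule — can be checked numerically; a minimal sketch:

```python
import numpy as np

# Differential entropies in bits.
def uniform_entropy_bits(width):
    return np.log2(width)

def gaussian_entropy_bits(sigma):
    return 0.5 * np.log2(2 * np.pi * np.e * sigma**2)

width = 1.0
# A Gaussian with sigma = width / sqrt(2*pi*e) (~0.242, roughly
# one-quarter of the prior width) has the same entropy as the uniform
# prior, i.e. zero information gain for this parameter.
sigma_eq = width / np.sqrt(2 * np.pi * np.e)
assert abs(gaussian_entropy_bits(sigma_eq) - uniform_entropy_bits(width)) < 1e-12

# Halving the posterior width gains exactly 1 bit.
gain = gaussian_entropy_bits(sigma_eq) - gaussian_entropy_bits(sigma_eq / 2)
assert abs(gain - 1.0) < 1e-12
```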
3.1.2. Two solvent contrasts
The information gain of an NR experiment that consists of two reflectivity measurements with different solvents is shown in Fig. 4. The two solvent SLDs in these measurements were independently varied between those of pure H_{2}O and D_{2}O, and data analysis was performed under the constraint that both structural models share the same set of parameters, except for the two solvent SLDs and their associated background levels. The minimum information gain is observed when both solvent SLDs are ρ_{n} = 2 × 10^{−6} Å^{−2}. Particularly high information gains are found for bulk compositions at the extreme margins of solvent SLDs, (ρ_{n}^{1}, ρ_{n}^{2}) = (−0.5 × 10^{−6} Å^{−2}, 6.5 × 10^{−6} Å^{−2}). However, a combination of measurements with pure D_{2}O and an H_{2}O/D_{2}O mixture with ρ_{n} = 4 × 10^{−6} Å^{−2} (denoted as CM4) yields a comparable information gain.
Similarly to the previous optimization, the KDN and MVN estimates in Fig. 4 yield qualitatively equivalent results. While the MVN estimate slightly overestimates the information gain, the KDN estimate shows somewhat larger uncertainties in each data point. In all further examples discussed in this work, we only used KDN estimates.

Fit parameters and their uncertainties for three regions of particularly small and large ΔH (Fig. 4) are provided in Table 2. For the minimum information gain at (ρ_n^1, ρ_n^2) = (2.0 × 10^{−6} Å^{−2}, 2.0 × 10^{−6} Å^{−2}), the result is comparable to the experiment with one solvent contrast. Only the parameter uncertainties for the SLD and thickness of the porous Si, and the SLDs of both bulk solvents, show significant improvement over the prior PDFs. The parameter uncertainties for (−0.5 × 10^{−6} Å^{−2}, 6.5 × 10^{−6} Å^{−2}) and (6.5 × 10^{−6} Å^{−2}, 4 × 10^{−6} Å^{−2}) are significantly smaller than for (2.0 × 10^{−6} Å^{−2}, 2.0 × 10^{−6} Å^{−2}) over the entire set of model parameters.

Evaluations such as those shown in Fig. 4 are invaluable for determining whether it is advantageous in a real experiment to consecutively measure two NR curves from a sample bathed in distinct solvents or, rather, to allocate the same total measurement time to a single measurement with one solvent. In this example, values on the plot diagonal are consistently lower than off-diagonal values, which argues in favor of conducting two distinct measurements.
3.1.3. Dependence of ΔH on counting time
Fig. 5(a) shows the information gain ΔH as a function of the counting time t for a single NR measurement of the model structure (Fig. 2) immersed in D_2O. After an initial fast increase of ΔH within the first 3 h of measurement, the information gain quickly enters a region of diminishing returns in which further improvement requires increasingly long counting times. For the presented example and simulated instrument, we consider the transition region between those two regimes at 3–6 h an optimal range of counting times. The observed time dependence of the information gain is similar to those previously reported (Pedersen et al., 2014; Berk & Majkrzak, 2009). It suggests that, at least for this simple test structure, an increase in neutron flux on future instrumentation will allow for shorter measurement times but may not yield significantly reduced fit parameter confidence limits.
To describe the functional form of the relation shown in Fig. 5, we simplify the situation and consider the capacity of the Gaussian channel shown in Fig. 1 to be the single limiting factor for the information gain. In other words, for the current example we assume that the exact parameter vector θ used for data simulation could be retrieved by the MCMC simulation given noisefree reflectivity data y(Q_{z}). We therefore neglect other limiting factors such as the loss of phase information during reflectivity simulation or shortcomings of the MCMC algorithm. The capacity C of a communication channel is defined as the maximum mutual information between the input X and the output Y for all possible choices of p(x). In communication theory, C sets the maximum transmission rate of information over the channel; in our example it constitutes an upper limit on the information gain on x by knowing y.
As shown in Fig. 1, the Gaussian channel adds random normal noise z(Q_z) with standard deviation σ(Q_z) to the noise-free simulated reflectivity x(Q_z), thus yielding the noisy reflectivity y(Q_z). Since the reflectivity is simulated for n discrete values of Q_z (n data points), such a Gaussian channel can be described by the combined action of n independent parallel Gaussian channels, each of which adds random normal noise to one data point of the simulated reflectivity. n independent parallel Gaussian channels have a combined maximum capacity C that depends on the signal-to-noise ratio y/σ per channel (Cover & Thomas, 2006):

C = (1/2) Σ_{i=1}^{n} log[1 + y²(Q_{z,i})/σ²(Q_{z,i})]. (18)
Neglecting contributions from background subtraction and incident intensity normalization, the signal-to-noise ratio y/σ can be computed solely from counting statistics. For each reflectivity data point, y/σ depends only on the number of specular counts N, which is the product of the counting time t and a constant specular count rate r:

y/σ = N/N^{1/2} = N^{1/2} = (rt)^{1/2}. (19)
We can therefore rewrite the combined capacity of the Gaussian channels as

C = (n/2) log(1 + rt). (20)
In a typical NR measurement, the signal-to-noise ratio y/σ = (rt)^{1/2} is kept approximately constant over the entire Q_z range by increasing both counting time and beam intensity as Q_z increases, in order to offset the general Q_z^{−4} dependence of the specular reflectivity. We therefore simplify equation (20) by assuming that all channels have the same relative variance, which is measured in an effective time t and at an effective rate r. We arbitrarily choose that t represents the total counting time of the reflectivity curve (instead of, for example, choosing the average measurement time per point, which would only change the effective rate).^{2} On the basis of the Shannon–Nyquist sampling theorem (Pedersen et al., 2014; Konarev & Svergun, 2015), the reflectivity data in the example are heavily oversampled. In addition, R is band limited. Consequently, not all n channels are independent, and we can write the channel capacity as that of m effective independent channels (m < n) (Cover & Thomas, 2006):

C = (m/2) log(1 + rt). (21)
The channel capacity C imposes an upper limit on the actual channel rate I(X, Y), which itself is an upper limit on the information gain ΔH of the entire virtual experiment shown in Fig. 1:

ΔH ≤ I(X, Y) ≤ C. (22)
Consequently, we apply the following equation for the analysis of the KDN-derived information gain (Fig. 5):

ΔH(t) = (m′/2) log(1 + r′t) + ΔH₀. (23)
The coefficient m′ can be interpreted as the number of independent parameters determined in the experiment, and r′ is associated with an average rate of increase in information gain per parameter. Fig. 5 shows the fit to ΔH(t) that yields m′ = 4.2 ± 0.1 and r′ = 219 ± 5 h^{−1}. The value of m′ indicates four independent parameters, and inspection of the last column in Table 1 confirms this estimate, as four out of seven parameters show a significant improvement over the prior PDF (t = 6 h). ΔH_{0} is the systematic error in calculating the information gain for t → 0, which stems from evaluating a uniform posterior PDF that equals the prior PDF using the KDN estimate (see Section 3.1.1). ΔH_{0} was determined to be −0.89 ± 0.05 bits.
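The diminishing-returns behavior can be made concrete by evaluating the fitted relation. The sketch below assumes the logarithmic form ΔH(t) = (m′/2) log₂(1 + r′t) + ΔH₀ with the coefficients reported above:

```python
import numpy as np

# Fitted coefficients reported above: m' = 4.2 parameters,
# r' = 219 per hour, dH0 = -0.89 bits (assumed functional form).
def dH_bits(t, m=4.2, r=219.0, dH0=-0.89):
    return 0.5 * m * np.log2(1.0 + r * t) + dH0

# Once r'*t >> 1, each doubling of the counting time adds only about
# m'/2 = 2.1 bits, regardless of how long one has already counted.
gain_3_to_6 = dH_bits(6.0) - dH_bits(3.0)     # ~2.1 bits for 3 extra hours
gain_6_to_12 = dH_bits(12.0) - dH_bits(6.0)   # ~2.1 bits for 6 extra hours
```

The constant gain per doubling of t illustrates why counting beyond the 3–6 h transition region yields rapidly diminishing returns.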
The values of ΔH for 0.375 ≤ t ≤ 3 h show a comparatively high uncertainty (see Fig. 5). When inspecting individual parameter uncertainties over this interval (data not shown), we found that a transition occurs in this region, in which the number of independent parameters that can be resolved with respect to the prior PDF increases from three to four, and variations in the simulated reflectivity due to random noise can lead to either outcome in the MCMC analysis. Accordingly, when fitting ΔH(t) using equation (23) over the limited time interval 0 ≤ t < 0.375 h, a coefficient m′ = 3.0 ± 0.1 is obtained (fit not shown), which agrees with the ability to resolve three independent parameters. This indicates that, strictly, ΔH(t) has to be fitted piecewise over intervals of t in which the number of resolvable independent parameters does not change. A thorough exploration of this aspect goes beyond the objective of this work and is left for a future study.
3.1.4. Dependence of ΔH on the maximum momentum transfer
Fig. 6 shows the dependence of the information gain on counting time t and maximum momentum transfer Q_{z,max} of the simulated data for the test structure (Fig. 2) immersed in D_2O [Fig. 6(a)], and two related structures in which all layer thicknesses are scaled by a factor of 0.5 [Fig. 6(b)] or a factor of 2 [Fig. 6(c)]. For all three structures, ΔH increases with t similarly to Fig. 5, which is equivalent to a vertical slice of the optimization shown in Fig. 6(a) at Q_{z,max} = 0.26 Å^{−1}. All structures show a rather sudden increase in information gain when the reflectivity is extended beyond a certain critical value of Q_{z,max}, which roughly matches the position of the second minimum of the reflectivity curves [Fig. 2(b)]. For the original test structure, this transition occurs at Q_{z,max} ≃ 0.2 Å^{−1} [Fig. 6(a)], corresponding to the thickness of the Si slab of 30 Å. For sufficiently high Q_{z,max}, a third reflection minimum can be observed at Q_z = 0.3 Å^{−1}, which stems from the 20 Å-thick interstitial water layer. However, Fig. 6(a) indicates that extending the reflectivity to this value does not significantly increase the information gain further. Accordingly, the thicknesses of both sample layers are already well resolved when limiting the reflectivity to Q_{z,max} = 0.26 Å^{−1} (t = 6 h) (see Table 1, last column). This result is consistent with the canonical resolution estimate (Schalke & Lösche, 2000), which for Q_{z,max} = 0.26 Å^{−1} yields a smallest resolvable structure size of Δz = π/Q_{z,max} ≈ 12 Å. A discussion of the effect of a limited Q_z range on the information gain that goes beyond these rather qualitative arguments will require theory on time- and bandwidth-limited Gaussian channels (Cover & Thomas, 2006), and provides a promising avenue for future studies.
With respect to experimental design, it is useful to determine the critical value of Q_{z,max} for a particular sample to avoid spending neutron beamtime at unnecessarily high Q_{z}, for which the signal-to-noise ratio becomes increasingly unfavorable. Real-world samples, as opposed to the simulated structures used in this work, do not necessarily have a smallest feature size that would define Q_{z,max}. Therefore, future optimizations using more complex structural models with a larger range of feature sizes will be of high interest to determine how to limit the Q_{z} range of a measurement according to the smallest feature size of interest to the experimenter.
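The design rule above follows directly from the canonical resolution estimate; a minimal sketch (numbers are the ones quoted in the text, the helper name is ours):

```python
# Invert the canonical resolution estimate dz = pi / Qz_max
# (Schalke & Loesche, 2000) to get the Qz range needed for a feature size.
import math

def critical_qz_max(smallest_feature):
    """Qz (1/A) beyond which a feature of the given size (A) adds little."""
    return math.pi / smallest_feature

dz_at_026 = math.pi / 0.26          # smallest size resolved at Qz_max = 0.26 1/A
qz_for_20A = critical_qz_max(20.0)  # canonical Qz for a 20 A layer
print(round(dz_at_026, 1), round(qz_for_20A, 2))
```
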
3.2. Influence of the substrate structure on information gain
NR sample substrates are sometimes engineered to contain one or several nanoscopic layers of high SLD buried near the interface which are not part of the interfacial structure of interest. Magnetic reference layers that scatter incident neutrons differently, depending on the polarization of the neutron in a magnetic field, can be particularly powerful in elucidating interfacial details (Holt et al., 2009). Such sample engineering has been demonstrated to allow for a direct inversion of reflectivity data in certain cases (Blasie et al., 2003; Majkrzak et al., 2009). Here we explore the effect of a buried nanoscopic layer on the information content of NR data using a previously published test case (Zimmermann et al., 2000; Majkrzak & Berk, 2003).
Zimmermann et al. (2000) described a set of distinct X-ray scattering length density profiles that yield nearly indistinguishable reflectivity curves. In turn, this prevents a unique determination of the SLD profile if any one of those reflectivities were measured. Majkrzak & Berk (2003) constructed a set of similar neutron SLD profiles that result in the same ambiguity (profiles 1 and 2 in Fig. 7). Both studies demonstrated that even partial knowledge of the sample structure can be insufficient to uniquely determine the SLD profile. It was shown that additional information, such as embedded reference structures, is necessary to uniquely determine the profiles.
Here, we systematically explore this problem by burying a tunable soft magnetic reference layer of finite thickness near the substrate surface [gray layer in Fig. 7(a)]. The SLD of the reference layer attains two values that depend on the polarization of the incident neutrons. The total SLD of the reference layer is the sum of the nuclear SLD and the magnetic splitting, ρ_{n}^{±} = ρ_{n}^{nucl} ± ρ_{n}^{split}. Both parameters depend on the choice of the magnetic material and were thus systematically varied in the analysis (see Table 3). For every point in the optimization space, including those with ρ_{n}^{split} = 0, two reflectivity curves were simulated and analyzed, one for each neutron polarization. The statistical quality of each curve is equivalent to that obtained after 30 h of counting on the Magik reflectometer at the NCNR. Reflectivities with ρ_{n}^{split} = 0 are equivalent to those that would be obtained in a non-polarized NR experiment. The condition at which ρ_{n}^{nucl} matches that of the underlying Si substrate and ρ_{n}^{split} = 0 reproduces the structures described in the original work (Majkrzak & Berk, 2003).
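The optimization grid just described can be sketched as follows. Units are 10^{−6} Å^{−2}; the grid values are illustrative and not the exact grid of Table 3.

```python
# For each (nuclear SLD, magnetic splitting) point, two reflectivity curves
# are simulated, one per neutron spin state, with reference-layer SLDs
# rho(+/-) = rho_nucl +/- rho_split.
import itertools

def reference_layer_slds(rho_nucl, rho_split):
    """Spin-up and spin-down SLD of the reference layer."""
    return rho_nucl + rho_split, rho_nucl - rho_split

for rho_nucl, rho_split in itertools.product([2.0, 4.0, 6.0], [0.0, 2.0]):
    up, down = reference_layer_slds(rho_nucl, rho_split)
    # rho_split = 0 is equivalent to a non-polarized measurement
    print(f"nucl={rho_nucl}, split={rho_split}: rho+={up}, rho-={down}")
```
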
The information gain ΔH from virtual NR measurements of profiles 1 and 2 as a function of ρ_{n}^{nucl} and ρ_{n}^{split} is shown in Fig. 8. In agreement with the previous work (Majkrzak & Berk, 2003), ΔH is small for the original configuration without reference layer (ρ_{n}^{nucl} = 2 × 10^{−6} Å^{−2}, ρ_{n}^{split} = 0). Correlation plots between the fit parameters in this configuration (see Figs. S5 and S6 in the supporting information) reveal that the MCMC identifies several distinct solutions for the SLD values of layers 1–6 of the two interfacial structures shown in Fig. 7.
If one of the neutron-polarization-dependent SLDs of the reference layer is sufficiently different (by 2 × 10^{−6} Å^{−2} or more) from that of the substrate, a unique solution is obtained. This is reflected in Fig. 8 by an increase in information gain of ∼15 bits, and the parameter correlation plots (see supporting information) collapse into one solution. Most notably, a polarized NR experiment (ρ_{n}^{split} ≠ 0) is not required to uniquely resolve profiles 1 and 2: the information gain does not depend on ρ_{n}^{split} as long as neither of the two SLDs ρ_{n}^{±} matches that of the substrate. Table 3 lists the detailed results and their confidence limits for four selected points of the optimization: profile 1 without a reference layer, profiles 1 and 2 with a reference layer (ρ_{n}^{nucl} = 6 × 10^{−6} Å^{−2}, ρ_{n}^{split} = 0), and profile 2 with a reference layer and magnetic splitting (ρ_{n}^{nucl} = 6 × 10^{−6} Å^{−2}, ρ_{n}^{split} = 2 × 10^{−6} Å^{−2}). The information gain provided by the reference layer translates into significantly smaller uncertainties of the SLD values of the surface structure (layers 1–6).
In conclusion, a high-index reference layer boosts the overall reflectivity of the interfacial structure such that subtle details buried in the noise of the reflectivity of the original structure become accessible. The near identity of the reflection spectra is not abrogated by the reference layer, but their overall magnitude is shifted to a level where the experiment can distinguish them, given the signal-to-noise ratio of a typical measurement.
4. Discussion
We have implemented a framework based on Bayesian statistics and information theory (Liepe et al., 2013) to optimize surface-sensitive scattering experiments by evaluating their information content as a function of experimental parameters. The information content of the experiment is obtained by approximating the mutual information between the prior and posterior PDFs using virtual experiments. By necessity, we applied a number of restrictions to our implementation. Instead of computing the full mutual information, we use a single representative parameter vector θ from the prior distribution. We also simulated only up to ten data sets y(θ) per sample representation θ, which differ by Gaussian noise. Analyzing the information gain ΔH obtained from those y(θ), we showed that the observed standard deviation of ΔH is sufficiently small that its average can be used to approximate the mutual information.
We used two approaches to the calculation of the posterior PDF: the multivariate normal probability density approximation (MVN estimate) and an approach that samples directly from the posterior PDF (KDN estimate) (Kramer et al., 2010). We observed that these approaches yield qualitatively consistent results, as exemplified in Figs. 3 and 4, in which the MVN approach tends to overestimate ΔH. On the other hand, the KDN algorithm produces occasional outliers that need to be eliminated.
Results on a simple reflectometry problem of a fluid-immersed sample (Fig. 2) demonstrated the usefulness of our approach. They confirm existing best practices for choosing the isotopic composition of aqueous solvents (Figs. 3 and 4). For this example, the largest gains in information were obtained when the scattering contrasts between different components of the sample structure – such as the substrate, the interfacial structure and the surrounding aqueous solvent – were large. The least gain in information was obtained when the SLD of the solvent was matched to that of the substrate and the interfacial structure. Even under contrast-matching conditions, when zero reflectivity is measured, the computed information gain was larger than zero. Since the model contains implicit parameters, such as the substrate SLD, even measuring zero reflectivity provides information.
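The MVN estimate mentioned above has a closed form for the differential entropy, H = (1/2) log₂[(2πe)^d det Σ]. A minimal numpy sketch follows, in which synthetic draws stand in for MCMC samples; the resulting entropies and gain are properties of these toy distributions, not results from this work, and the KDN route is not reproduced here.

```python
# Information gain via the multivariate normal (MVN) entropy approximation:
# fit a Gaussian to samples and use its closed-form differential entropy.
import numpy as np

def mvn_entropy_bits(samples):
    """Differential entropy (bits) of a Gaussian fitted to (n, d) samples."""
    cov = np.atleast_2d(np.cov(samples, rowvar=False))
    d = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2.0 * np.pi * np.e) + logdet) / np.log(2.0)

rng = np.random.default_rng(0)
prior = rng.uniform(-5.0, 5.0, size=(100_000, 2))     # broad prior draws
posterior = rng.normal(0.0, 0.1, size=(100_000, 2))   # narrow posterior draws
delta_H = mvn_entropy_bits(prior) - mvn_entropy_bits(posterior)
print(f"information gain ~ {delta_H:.1f} bits")
```

Note that the MVN form is exact only for Gaussian posteriors; for the broad uniform "prior" above it slightly overestimates the entropy, consistent with the tendency of the MVN approach noted in the text.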
Our approach explicitly simulates background levels for different isotopic compositions of aqueous solvents and air. H_{2}O-based solvents create a large incoherent background and yield the lowest signal-to-noise ratio in the measured reflectivity. Therefore, predominantly H_{2}O-based solvents are at a disadvantage compared with D_{2}O-based solvents of comparable scattering contrast to the substrate and interfacial structure. In Fig. 3, this effect contributes to an asymmetry in information gain around the minimum located at substrate-matching conditions (ρ_{n} = 2 × 10^{−6} Å^{−2}). Similarly, Fig. 4 shows that a combination of CM4 (ρ_{n} = 4 × 10^{−6} Å^{−2}) and D_{2}O yields the same information gain as a combination of H_{2}O and D_{2}O, although the latter creates the greater overall scattering contrast and should, therefore, provide more information.
We showed for the same sample structure that an extension of the measurement time t per reflectivity curve above the empirically determined optimal value of 3–6 h on the simulated instrumentation is ineffective, as the information gain is a logarithmic function of t (Fig. 5). Furthermore, we determined the number of independent parameters supported by the data and the model by modeling the information gain as being limited by the Gaussian channel at the center of our optimization procedure (see Fig. 1). We neglected other losses of information that might occur during the calculation of the noise-free reflectivity x, and we relied on the assumption that the MCMC robustly finds the global solution of the fitting problem. In quantitative terms, the fit of the time dependence to equation (23) suggested that a measurement can resolve four independent parameters in this example. Equation (23) further allowed us to independently determine the rate of information gain as a function of measurement time per independent parameter. This constitutes an improvement over similar studies in small-angle scattering (Pedersen et al., 2014).
We also studied the dependence of ΔH on the maximum momentum transfer Q_{z,max} in a measurement. Fig. 6 visualizes a result that is consistent with the rule of thumb expressed as the canonical resolution of a reflectivity experiment. For optimal information gain, a measurement must extend to a value of Q_{z} that depends on the smallest feature size of the sample. We showed for our example that extending the measurement beyond this value does not yield a significant increase in information gain, other than that stemming from the time spent measuring additional data points. Moreover, Fig. 6 demonstrates that the transition from ignorance to knowledge is rather sudden at this particular value of Q_{z} (left to right) – in contrast to the steady logarithmic increase in information gain as a function of simulated counting times (bottom to top).
While the virtual sample structure has a smallest length scale of 20 Å, real samples typically do not have such a limit. Nevertheless, the conclusion can be drawn that spending measurement time to assess reflectivities beyond the Q_{z,max} that is associated with the smallest feature size of interest to the experimenter is not advisable. These conclusions are drawn from the simple structure investigated here; but the clarity of the result indicates that the general approach will probably also yield valuable and interesting results for more complex systems. Here, we have not explicitly tested the main consequence of the Shannon–Nyquist sampling theorem – the notion that a minimum spacing between data points in Q_{z} is required to resolve a structure in real space with a limited total length (Vestergaard & Hansen, 2006; Shannon, 1949). However, this requirement is fulfilled in all virtual experiments simulated here; in fact, the simulated reflectivities are typically heavily oversampled.
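The sampling requirement just mentioned can be made concrete. A structure of total extent L produces Kiessig fringes of period 2π/L in Q_{z}, so resolving them requires at least two points per fringe, i.e. a point spacing of at most π/L. A small sketch, with an illustrative thickness:

```python
# Numeric check of the Shannon-Nyquist condition for reflectivity sampling:
# data points must be spaced no farther than pi/L apart in Qz.
import math

def max_qz_spacing(total_thickness):
    """Largest allowed Qz spacing (1/A) for a structure of given extent (A)."""
    return math.pi / total_thickness

L = 50.0                      # e.g. a 30 A slab plus a 20 A water layer
dq_required = max_qz_spacing(L)
dq_simulated = 0.001          # a typical fine simulation grid (assumed value)
print(dq_required, dq_simulated < dq_required)
```

With any realistic simulation grid this condition is comfortably met, consistent with the heavy oversampling noted above.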
While reference layers of high SLD and, in particular, magnetized reference layers probed with spin-polarized neutrons are increasingly used in NR investigations, details of how and to what degree they boost the information gain from the experiment have not been systematically investigated. Surprisingly, we found that performing a spin-polarized experiment is not required for maximum information gain in our example (see Fig. 8). Magnetic splitting of the reference-layer SLD is only effective if the nuclear SLD is near that of the Si substrate. However, once the nuclear SLD is shifted away from ρ_{n}^{nucl} = 2 × 10^{−6} Å^{−2}, the information gain is largely independent of the splitting. Our results show essentially uniform information gain for 0 ≤ ρ_{n}^{split} ≤ 4 × 10^{−6} Å^{−2}, provided that ρ_{n}^{nucl} > 4 × 10^{−6} Å^{−2}. Fig. 7(b) visualizes the mechanism by which the nanoscopic reference layer increases the information gain. While the interference pattern of the reflectivity curve from the sample with the reference layer is more regular than that without it, which shows how the high-SLD layer dominates the signal, the details of the interference pattern can be determined more precisely because the signal is raised above the noise. Therefore, profiles 1 and 2 can be distinguished on a sample with a reference layer, but not without such a layer.
Because spin-polarized measurements typically occur at half of the unpolarized beam intensity, we conclude that polarized reflectometry is not always effective for measurements such as those discussed here. It will be interesting to study how more complex samples, such as partially hydrated sample structures, are affected by reference layers. The result that a polarized neutron measurement is not required to resolve the sample structure does not contradict theoretical work showing that polarized reflectometry and magnetic reference layers are sufficient to analytically reconstruct the SLD profile of certain classes of samples (direct inversion) (Majkrzak et al., 2009), nor that analytical data inversion recovers the maximum information content from the experimental data (phase-inversion principle) (Berk & Majkrzak, 2009).
5. Conclusion
We implemented a Bayesian and information-theoretical framework to determine the information gain from reflectometry experiments with the purpose of experimental optimization. We applied this framework to a selection of test problems, demonstrating its usability and confirming many best practices that have long guided the design of reflectometry experiments. At the same time, we gained non-intuitive insights that challenge some of them. A next step in this development will be an extension to more complex, and more relevant, applications of reflectometry in current research. Marginalization of the posterior PDF will be required to tailor the experiment effectively to a subset of parameters that are of immediate interest to the experimenter. With this in place, we predict significant utility of this framework for the optimization of reflectometry experiments on complex biomimetic interfaces.
Supporting information
Details regarding the simulation of reflectivity data and additional figures. DOI: https://doi.org/10.1107/S1600576718017016/ge5055sup1.pdf
Footnotes
^{1}With current technology, the computation of p(θ | y) for a model of average complexity used in this work takes approximately 1.5 CPU hours. The calculation of H(Θ | y) is negligible in comparison. With a sample size of 1000 in Y, a nested Monte Carlo simulation would average 1500 CPU hours ≃ 60 CPU days per value of ΔH. Optimizations in this work would require the calculation of between 15 and 200 values of ΔH, requiring 900 to 12 000 CPU days in total.
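The footnote's cost arithmetic, spelled out (the 1.5 CPU h per posterior is the approximate figure quoted above; rounding follows the footnote):

```python
# Nested Monte Carlo cost estimate from the footnote.
cpu_h_per_posterior = 1.5
samples_in_Y = 1000
nested_cpu_h = cpu_h_per_posterior * samples_in_Y      # 1500 CPU hours
nested_cpu_days = round(nested_cpu_h / 24 / 10) * 10   # 62.5 rounded to ~60
total_low, total_high = 15 * nested_cpu_days, 200 * nested_cpu_days
print(nested_cpu_h, nested_cpu_days, total_low, total_high)
```
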
^{2}This neglects the diminishing effect of background on the relative variance. However, the background arising from the D_{2}O solvent in the current sample is sufficiently small.
Acknowledgements
We thank Drs Markus Deserno and David L. Worcester for critical discussions. Certain commercial materials, equipment and instruments are identified in this work to describe the experimental procedure as completely as possible. In no case does such an identification imply a recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the materials, equipment or instruments identified are necessarily the best available for the purpose.
Funding information
This work was supported by the US Department of Commerce (grants 70NANB13H009 and 70NANB17H299). Will Golding implemented the initial calculations within the MCMC optimization software and was supported by the Center for High Resolution Neutron Scattering (CHRNS), a partnership between the National Institute of Standards and Technology and the National Science Foundation under agreement No. DMR-1508249. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant No. ACI-1053575. Specifically, it used the Bridges system, which is supported by NSF award No. ACI-1445606, at the Pittsburgh Supercomputing Center (PSC) (Towns et al., 2014).
References
Ankner, J. F. & Majkrzak, C. F. (1992). Proc. SPIE, 1738, 260–269.
Berk, N. F. & Majkrzak, C. F. (2009). Langmuir, 25, 4132–4144.
Blasie, J. K., Zheng, S. & Strzalka, J. (2003). Phys. Rev. B, 67, 224201.
Braak, C. J. F. ter & Vrugt, J. A. (2008). Stat. Comput. 18, 435–446.
Chen, B., Wang, J., Zhao, H. & Principe, J. C. (2016). Entropy, 18, 196.
Cover, T. M. & Thomas, J. A. (2006). Elements of Information Theory. Hoboken: Wiley-Interscience.
Dura, J. A., Pierce, D. J., Majkrzak, C. F., Maliszewskyj, N. C., McGillivray, D. J., Lösche, M., O'Donovan, K. V., Mihailescu, M., Perez-Salas, U., Worcester, D. L. & White, S. H. (2006). Rev. Sci. Instrum. 77, 074301.
Heinrich, F. & Lösche, M. (2014). Biochim. Biophys. Acta, 1838, 2341–2349.
Holt, S. A., Le Brun, A. P., Majkrzak, C. F., McGillivray, D. J., Heinrich, F., Lösche, M. & Lakey, J. H. (2009). Soft Matter, 5, 2576–2586.
Kirby, B. J., Kienzle, P. A., Maranville, B. B., Berk, N. F., Krycka, J., Heinrich, F. & Majkrzak, C. F. (2012). Curr. Opin. Colloid Interface Sci. 17, 44–53.
Konarev, P. V. & Svergun, D. I. (2015). IUCrJ, 2, 352–360.
Kramer, A., Hasenauer, J., Allgöwer, F. & Radde, N. (2010). 2010 IEEE International Conference on Control Applications, pp. 493–498. IEEE.
Larsen, A. H., Arleth, L. & Hansen, S. (2018). J. Appl. Cryst. 51, 1151–1161.
Lesniewski, J. E., Disseler, S. M., Quintana, D. J., Kienzle, P. A. & Ratcliff, W. D. (2016). J. Appl. Cryst. 49, 2201–2209.
Liepe, J., Filippi, S., Komorowski, M. & Stumpf, M. P. H. (2013). PLoS Comput. Biol. 9, e1002888.
Liepe, J., Kirk, P., Filippi, S., Toni, T., Barnes, C. P. & Stumpf, M. P. H. (2014). Nat. Protoc. 9, 439–456.
Luzzati, V. & Taupin, D. (1986). J. Appl. Cryst. 19, 39–50.
Majkrzak, C. F. & Berk, N. F. (2003). Physica B, 336, 27–38.
Majkrzak, C. F., Berk, N. F., Kienzle, P. A. & Perez-Salas, U. (2009). Langmuir, 25, 4154–4161.
Majkrzak, C. F., O'Donovan, K. V. & Berk, N. F. (2006). Neutron Scattering From Magnetic Materials, pp. 397–471. Amsterdam: Elsevier.
Maranville, B. B., Kirby, B. J., Grutter, A. J., Kienzle, P. A., Majkrzak, C. F., Liu, Y. & Dennis, C. L. (2016). J. Appl. Cryst. 49, 1121–1129.
Moore, P. B. (1980). J. Appl. Cryst. 13, 168–175.
Müller, J. J., Hansen, S. & Pürschel, H.-V. (1996). J. Appl. Cryst. 29, 547–554.
Pedersen, M. C., Hansen, S. L., Markussen, B., Arleth, L. & Mortensen, K. (2014). J. Appl. Cryst. 47, 2000–2010.
Russell, T. P. (1990). Mater. Sci. Rep. 5, 171–271.
Schalke, M. & Lösche, M. (2000). Adv. Colloid Interface Sci. 88, 243–274.
Shannon, C. E. (1949). Proc. Inst. Radio Eng. 37, 10–21.
Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. London, New York: Chapman and Hall.
Sivia, D. S. & Webster, J. (1998). Physica B, 248, 327–337.
Smith, G. S. & Majkrzak, C. F. (2006). International Tables for Crystallography, Vol. C, Mathematical, Physical and Chemical Tables, edited by E. Prince, pp. 126–146. Chester: International Union of Crystallography.
Taupin, D. & Luzzati, V. (1982). J. Appl. Cryst. 15, 289–300.
Towns, J., Cockerill, T., Dahan, M., Foster, I., Gaither, K., Grimshaw, A., Hazlewood, V., Lathrop, S., Lifka, D., Peterson, G. D., Roskies, R., Scott, J. R. & Wilkins-Diehr, N. (2014). Comput. Sci. Eng. 16, 62–74.
Vestergaard, B. & Hansen, S. (2006). J. Appl. Cryst. 39, 797–804.
Yustres, Á., Asensio, L., Alonso, J. & Navarro, V. (2012). Comput. Geosci. 16, 1–20.
Zimmermann, K. M., Tolan, M., Weber, R., Stettner, J., Doerr, A. K. & Press, W. (2000). Phys. Rev. B, 62, 10377–10382.
This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.