Determining the maximum information gain and optimising experimental design in neutron reflectometry using the Fisher information

An approach based on the Fisher information (FI) is developed to quantify the maximum information gain and optimal experimental design in neutron reflectometry experiments. In these experiments, the FI can be calculated analytically and used to provide sub-second predictions of parameter uncertainties. This approach can be used to influence real-time decisions about measurement angle, measurement time, contrast choice and other experimental conditions based on the parameters of interest. The FI provides a lower bound on parameter estimation uncertainties, and these are shown to scale inversely with the square root of the measurement time, providing useful information for the planning and scheduling of experimental work. As the FI is computationally inexpensive to calculate, it can be computed repeatedly during the course of an experiment, saving costly beam time by signalling that sufficient data have been obtained, or preventing prematurely curtailed datasets by signalling that an experiment needs to continue. The approach's predictions are validated through the introduction of an experiment simulation framework that incorporates instrument-specific incident flux profiles, and through the investigation of measuring the structural properties of a phospholipid bilayer.


Introduction
The Fisher information (FI) [Fisher, 1925] has been applied across many fields, from information theory and communications [Wang and Yin, 2010, Barnes et al., 2019] to quantum mechanics [Barndorff-Nielsen and Gill, 2000, Petz, 2002], quantitative finance [Taylor, 2019] and volcanology [Telesca et al., 2009]. The FI provides a way of measuring the amount of information an observable variable carries about an unknown parameter of a distribution that models the observable. In certain situations it is possible to calculate the FI analytically, giving a measure of parameter uncertainty, as well as inter-parameter covariances, from which correlations can be derived. Neutron reflectometry allows one to model a measured reflectivity curve in order to determine the properties of the thin film layer structure that produced the curve. Most reflectometry analyses use sampling methods to extract parameter uncertainties, but this is computationally expensive and cannot be performed in real time with current software [Nelson and Prescott, 2019, Kienzle et al., 2017, Hughes, 2017]. In this work, we describe an application of the FI to neutron reflectometry that enables real-time estimation of parameter uncertainties, as well as a projection of these uncertainties over time. We compare the results to established sampling methods and demonstrate the FI's use for experimental design, and for potentially enabling early stopping of experiments based on counting statistics.
In reflectivity, a thin film is described by a thickness, a scattering length density (SLD), which is the product of the neutron scattering length and the film density, and an interfacial roughness; thin film heterostructures are composed of multiple thin films stacked on top of each other. In the analysis of reflectometry data, we are presented with data points of reflectivity, r, as a function of momentum transfer, Q, and wish to infer the SLD profile from the top surface to the substrate. For a single interface, i.e. a semi-infinite substrate, the neutron reflectivity decays as Q⁻⁴; this is the Fresnel reflectivity. A single layer on a substrate is analytically solvable [Sivia, 2013]; however, with more layers, multiple reflections are possible, and the inversion of the curve to an SLD profile is non-trivial. In fact, the loss of phase information upon reflection makes recovery of the SLD profile from the reflectivity curve an ill-posed inverse problem [Majkrzak and Berk, 1995], and approximations are required.
Typically, reflectometry analysis is model-dependent where a model is defined using series of contiguous layers and the model reflectivity is calculated using the Abelès matrix formalism for stratified media [Abelès, 1948] or Parratt recursive method [Parratt, 1954]. However, the solution to this analysis is not necessarily unique and often requires a priori knowledge such as details of a system or the underlying science. Such prior knowledge helps to limit the dimensionality of the optimisation space by reducing the number of structures that agree with the experimental data within some tolerance. Methods have been devised to estimate interface properties, using this prior knowledge, that describe the data while adhering to a given set of constraints. Such methods include optimisers applying gradient projection [Byrd et al., 1995], annealing processes [Xiang et al., 1997], and evolutionary algorithms [Storn and Price, 1997].
Another approach to optimisation is the use of sampling methods, of which two are discussed in this work: Metropolis-Hastings Markov chain Monte Carlo (MCMC) [Metropolis et al., 1953, Hastings, 1970] and nested sampling [Skilling, 2004, Skilling, 2006], both of which are Bayesian and sample the parameter posterior distribution. Due to the typically high dimensionality of the parameter space in reflectometry, Bayesian sampling methods tend to be computationally expensive and impractical for obtaining results, such as parameter estimates and covariances, in real time. We use refnx MCMC sampling [Nelson and Prescott, 2019] and dynesty nested sampling [Speagle, 2019] to sample our data and to compare the results to those derived from the FI; refnx uses the emcee package [Foreman-Mackey et al., 2012] to provide an implementation of an affine-invariant MCMC ensemble sampler [Goodman and Weare, 2010].
Much work has been undertaken on quantifying the information content of a reflectivity dataset, with most approaches applying Bayesian statistics, where probability represents a degree of belief or plausibility based on the evidence at hand [Sivia and Skilling, 2012]. One such approach looked at experimental optimisation by determining the information gain from a given experiment using the entropies of the posterior and prior probability density functions [Treece et al., 2019]. Similarly, work has been done on quantifying the information gain from scattering experiments as a function of the scattering length density of molecular components [Heinrich et al., 2020]. Many other Bayesian information-based approaches have been applied to reflectometry, including the use of Bayesian evidence to determine the set of free parameters that maximise model information density [McCluskey et al., 2020], and the use of maximum entropy to reconstruct an SLD profile from a reflectivity curve [Sivia et al., 1991].
Similarly to previous works, we propose a methodology for quantifying the information content of a reflectivity dataset for use in determining maximum information gain and experimental design optimisation. However, we attempt to solve a slightly different problem to previous work; the calculations that are made using our framework are different to those of Bayesian techniques. Although the goal of these estimation procedures remains the same, we derive the maximum information that the dataset contains, given the current data point error bars, not the information content that can be readily extracted; for example, by sampling the posterior distribution.
For our application, the uncertainties on our reflectivity points are defined as the square root of the number of neutron counts, and these counts are governed by Poisson statistics. Under these assumptions, we can analytically calculate the FI and apply the Cramér-Rao bound [Cramér, 1946, Rao, 1994]. The bound states that the inverse of the FI provides a lower bound on the variance of any unbiased estimator of a parameter and, as a consequence, the FI provides us with a strict upper bound on the amount of information extractable from the observable. In practice, using this analytical derivation, we can achieve sub-second calculations of parameter uncertainties.
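Stated symbolically, for an unbiased estimator ξ̂_i of a parameter ξ_i, the Cramér-Rao bound takes the standard form:

```latex
\operatorname{Var}\!\left(\hat{\xi}_i\right) \;\geq\; \left(g^{-1}\right)_{ii}
```

where g is the FI matrix defined in the following section. Equality holds only for efficient estimators, which is why the bound characterises the maximum, rather than the typically, extractable information.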
To evaluate the FI approach, we developed an experiment simulation framework based on the underlying assumption of Poisson statistics for neutron counts. Since the FI is calculated using neutron counts, such a framework is necessary for calculating the FI in the general case where any model is given. Further, this framework allows us to calculate the information content of any experimental conditions without spending costly beam time acquiring the same data. The simulation framework is general in that it can simulate any beamline, given the incident flux profile of the instrument in question, and is shown to be accurate without requiring computationally expensive Monte Carlo methods.

The Fisher Information Matrix
The FI is a fundamental quantity in statistical theory and quantifies parametric changes in probability distributions in terms of information. It is related to various kinds of information "distance", most notably the Kullback-Leibler (KL) divergence: a building block for many familiar information theoretic measures [Kullback, 1997]. The KL divergence is the standard way of measuring a difference between distributions in terms of information and, when applied to parametric distributions, provides the "distance" between parameter values. As a rule of thumb, we can understand a one sigma (one standard deviation) difference in parameters as a KL divergence of one nat (natural unit of information). To a first approximation, this distance is calculated from the FI.
More formally, the FI matrix, g, is the first non-zero term in the series expansion of the KL divergence from one (vector) parameter ξ to ξ + ∆ξ (written here as D[ξ ‖ ξ + ∆ξ]):

D[ξ ‖ ξ + ∆ξ] ≈ (1/2) ∆ξᵀ g ∆ξ.

Thus, we can think of the FI matrix as a way of scaling the parameters so that, for sufficiently small changes in parameters, the squared Euclidean distance is the informational change. The FI is therefore relative to the specified parameters, and measured in nats per parameter unit squared; it is local and not dimensionless.
The one nat per sigma relationship is exact for many widely used distributions, such as for a multivariate normal with constant covariance. In this case, the FI is the inverse of the associated covariance matrix. Correspondingly, a practical way this is used is to set an information threshold, e.g. one nat (one sigma), and find by how much parameters must change to reach this threshold, thereby specifying an acceptance and/or confidence region, as is discussed in section 2.4 and the SI.

Derivation
To derive the equations for information content quantification using the FI, we must first provide a structure for given reflectivity data of N points (equivalently, histogram bins). This structure consists of contiguous layers representing a physical sample, with each layer being defined by its thickness, SLD and interfacial roughness. A model is then described by this structure together with the measurement background noise, experimental scale factor and instrument resolution; we need only vary the M unknown parameters of this model. Such a model describes the reflectance at a given neutron momentum transfer, for example, using the Abelès matrix formalism implemented in refnx.
Please refer to the supplementary information (SI) for the full derivation but, in summary, the FI matrix, g_ξ, for the M parameters, ξ, of a model of N reflectivity points, is given by

g_ξ = Jᵀ M J,    (1)

where J is the Jacobian of the model reflectances, r_i, with respect to the parameters, ξ,

J_ij = ∂r_i / ∂ξ_j,    (2)

and M is the diagonal matrix of incident counts, s_i, divided by model reflectances, r_i,

M_ii = s_i / r_i.    (3)

In the FI matrix, the FI for an individual parameter, ξ_i, corresponds to the diagonal element (g_ξ)_ii.
Using equation 1, we can calculate the FI for a model describing a single dataset. However, more complicated experiments often involve multiple datasets, such as measuring multiple experimental contrasts. For such cases, n models, potentially containing inter-model parameter constraints, are required as input. The calculation is the same as in equations 1, 2 and 3 except the parameters, ξ, are the union of all of the (potentially shared) parameters of the n models. Additionally, the Jacobian, J, and matrix, M, are calculated over the concatenated incident counts and model reflectances.
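Concretely, this calculation is only a few lines of linear algebra. The sketch below is an illustrative NumPy implementation (ours, not the authors' code), assuming the Jacobian, incident counts and model reflectances are already available as arrays:

```python
import numpy as np

def fisher_matrix(J, counts, r):
    """Fisher information matrix g = J^T M J (equation 1 of the text).

    J      : (N, M) Jacobian of model reflectances w.r.t. the M parameters.
    counts : (N,) incident neutron counts s_i for each point.
    r      : (N,) model reflectances r_i.
    """
    M = np.diag(counts / r)  # diagonal matrix with M_ii = s_i / r_i
    return J.T @ M @ J       # (M, M) FI matrix

# For multiple datasets (e.g. several contrasts), concatenate first:
# J = np.vstack([J1, J2]); counts = np.concatenate([s1, s2]); r = np.concatenate([r1, r2])
```

For shared parameters across models, the per-dataset Jacobians are taken with respect to the union of all parameters before stacking, mirroring the concatenation described above.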

Experiment Simulation
To simulate an experiment, we require both a model and knowledge of the flux of incident neutrons as a function of wavelength. In our case, the incident flux profile was measured on the OFFSPEC reflectometer [Dalgliesh et al., 2011]. We can multiply this incident neutron flux by a constant in order to change the experimental counting time, giving us the number of incident neutrons for a simulated experiment; this approach was developed from the ideas presented by [Mironov et al., 2021]. To account for different measurement angles, we multiply the incident flux by a factor compensating for the different collimating slit openings. Since both of the slits that define the beam footprint scale linearly with the angle, we scale the intensity as the square of the angle.
We calculate the momentum transfer, Q, for each wavelength, λ, in the file using the measurement angle, θ, via the equation Q = 4π sin(θ) / λ. By default, these Q values are assigned to geometrically spaced bins, with the number of bins being set to the desired number of points for the simulated dataset. Following this, we calculate each bin's centre, Q_i. Alternatively, a given set of Q bin centres can be used. The model reflectivity for each bin, r_i, is calculated using refnx, and additive instrument background noise, α, is added (optionally, this can be accounted for in the model reflectivity calculation). Next, this reflectivity is multiplied by the bin's incident flux, µ_i, to obtain the reflected flux, and then multiplied by the simulated measurement time, τ, to get the reflected counts for the bin. We use the reflected counts as the mean rate parameter of a Poisson distribution, from which we draw a random value, giving us an appropriately randomised number of reflected counts, N_i.
The bin's uncertainty in count-space is then the square root of this value, √N_i. To obtain the reflectivity and associated uncertainty, we simply divide the "noisy" counts and uncertainty by the number of incident neutrons for the bin, s_i (i.e., the product of the bin's incident flux and measurement time, µ_i τ).
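The steps above can be sketched as follows. This is an illustrative reimplementation under simplifying assumptions (a user-supplied model function, no resolution smearing, and every bin receiving some incident flux), not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(model, wavelengths, flux, angle_deg, time, n_bins=70, bkg=1e-6):
    """Simulate one reflectivity dataset: Q binning, Poisson counting noise.

    model       : callable returning model reflectivity for an array of Q values.
    wavelengths : (K,) incident wavelengths from the flux profile file.
    flux        : (K,) incident flux at each wavelength.
    """
    theta = np.radians(angle_deg)
    q = 4 * np.pi * np.sin(theta) / wavelengths        # Q for each wavelength
    edges = np.geomspace(q.min(), q.max(), n_bins + 1)  # geometric Q bins
    centres = np.sqrt(edges[:-1] * edges[1:])           # bin centres Q_i
    mu, _ = np.histogram(q, bins=edges, weights=flux)   # incident flux per bin
    r = model(centres) + bkg                            # model reflectivity + background
    s = mu * time                                       # incident counts s_i = mu_i * tau
    counts = rng.poisson(r * s)                         # noisy reflected counts N_i
    return centres, counts / s, np.sqrt(counts) / s     # Q, reflectivity, uncertainty
```

A dataset is then simulated with, for example, `simulate(model, wavelengths, flux, 0.7, 60.0)` for a 0.7° measurement; repeating the call per model reproduces the multi-dataset case described below.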
The previously described process will generate a single reflectivity dataset for a given model. For more complicated experiments involving multiple datasets, n models are required as input, and the above process repeated for each model, yielding n simulated datasets.

Applying the Fisher Information
The FI can be applied to both experimentally measured and simulated data. As the FI measures the instrumental uncertainty relative to changes in model parameters, a parameterised model must be provided and parameter values input. These parameter values can be obtained as estimates based on data, or specified manually if the task is simply the verification of a particular structure.
We use refnx to define a model and to load the model's associated measured or simulated reflectivity data. The parameters of the model are then optimised using a fitting algorithm of choice, in our case differential evolution [Storn and Price, 1997], according to the χ² distance or, equivalently, the negative log-likelihood. The likelihood provides a measure of difference between a given dataset and model and, for this work, is defined as in its refnx implementation:

−ln L = (1/2) Σ_{i=1}^{N} [ ((r_i − r_i,m) / δr_i)² + ln(2π (δr_i)²) ],

where N is the number of measured data points, r_i is the experimental reflectivity at the ith Q point, δr_i is the uncertainty in the experimental reflectivity at the ith Q point, and r_i,m is the model reflectivity at the ith Q point calculated using the Abelès matrix formalism.
From here on, we no longer need the data, since the model and Poisson statistics describe the data sufficiently. Next, we calculate the Jacobian, J, whose entries are the gradients of the model reflectivity, r_i, with respect to each of the model parameters, ξ_j. We estimate this using a finite difference approximation based on the reflectance for parameter values 0.5% either side of the input value. Using this and the diagonal matrix, M, of incident counts divided by model reflectances, we can calculate the FI matrix using equation 1. Since the calculation of this matrix is relatively simple, implementation in other fitting software [Kienzle et al., 2017, Hughes, 2017, Björck and Andersson, 2007] should be straightforward.
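The finite difference step described above can be sketched as follows; `reflectivity` stands in for any function returning the N model reflectances (in our case a refnx model), and the 0.5% relative step matches the text:

```python
import numpy as np

def jacobian(reflectivity, params, step=0.005):
    """Central-difference Jacobian J_ij = dr_i/dxi_j, using parameter values
    0.5% either side of the input (assumes non-zero parameter values,
    since the step is relative)."""
    params = np.asarray(params, dtype=float)
    cols = []
    for j in range(len(params)):
        lo, hi = params.copy(), params.copy()
        lo[j] *= 1.0 - step
        hi[j] *= 1.0 + step
        cols.append((reflectivity(hi) - reflectivity(lo)) / (hi[j] - lo[j]))
    return np.column_stack(cols)  # shape (N, M)
```

A central difference is used here for accuracy; the same two evaluations per parameter give the one-sided variant if preferred.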
The FI matrix contains all of the information about parameter variances and covariances, but these values require extraction. The variance of a single parameter is simply given by the inverse of the (scalar) FI, and so its uncertainty is given by the square root of this inverse. In the general case of multiple parameters, the uncertainty, ε_i, for a parameter, ξ_i, is obtained from the square root of the corresponding diagonal element of the inverse of the FI matrix:

ε_i = √[(g_ξ⁻¹)_ii].    (4)

Finally, to extract the covariance between any two parameters, we can calculate a confidence ellipse of given size k standard deviations (see SI for details).
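Both extractions amount to inverting the FI matrix. A minimal sketch (the function names are ours; the ellipse axes follow from the eigendecomposition of the 2×2 covariance block):

```python
import numpy as np

def uncertainties(g):
    """Parameter uncertainties (equation 4 of the text):
    epsilon_i = sqrt of the i-th diagonal element of g^{-1}."""
    return np.sqrt(np.diag(np.linalg.inv(g)))

def confidence_ellipse(g, i, j, k=1):
    """Semi-axes and orientation of the k-sigma confidence ellipse for the
    parameter pair (i, j), from the 2x2 block of the covariance g^{-1}."""
    cov = np.linalg.inv(g)[np.ix_([i, j], [i, j])]
    vals, vecs = np.linalg.eigh(cov)          # eigenvalues ascending
    return k * np.sqrt(vals), vecs            # semi-axes, axis directions
```

For a diagonal FI matrix the ellipse axes align with the parameter axes; off-diagonal terms tilt the ellipse, which is how inter-parameter correlation becomes visible.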

Application to Soft Matter
To illustrate the utility of the FI, we applied our framework to an experiment measuring a common model system for structural biology: a 1,2-dimyristoyl-sn-glycero-3-phosphocholine (DMPC) bilayer deposited onto a silicon surface. The lipids were measured against two water contrasts, H 2 O and D 2 O. The data was taken using the CRISP neutron reflectometer [Penfold et al., 1987] as part of the ISIS neutron training course and simultaneously fit using RasCAL [Hughes, 2017]. This fitting was constrained against measured data for a bare Si/D 2 O interface including a native SiO 2 layer.
Our model for the bilayer was defined by two lipid leaflets with fixed surface coverage. The model was fitted by area per molecule rather than volume fractions to avoid ambiguity arising from differing total molar quantities of headgroup and tailgroup components. The model also accounted for the headgroups and tailgroups containing water through defects across their surfaces, as well as the water bound to the hydrophilic headgroups. After fitting the experimental data, we reparameterised the bilayer model as a function of contrast SLD and, using this new model, were able to simulate the DMPC bilayer experiment on the OFFSPEC reflectometer (using our instrument flux profile) with arbitrary contrast SLD. We then investigated the change in the FI, for each model parameter, with contrast SLD.
For our parameterisation, we have assumed that the molecular volumes of the headgroups and tailgroups are known and constant, and that any changes in molecule surface area are inversely proportional to the headgroup and tailgroup thicknesses. Structural biology is a large field of research in which varying values for these molecular volumes are used. Further, these molecular volumes may vary with measurement conditions and may not necessarily be constant in practice [Campbell et al., 2018]. However, so as not to overcomplicate our model, we have fixed them. The full details of the bilayer model parameterisation and fitting can be found in the SI.

Measured vs. Simulated Data
To demonstrate the robustness of our experiment simulation, we compare a dataset measured using the OFFSPEC neutron reflectometer to its simulated counterpart. The data was experimentally measured using angles of 0.3°, 0.4°, 0.5°, 0.6°, 0.7°, 2.0° and 3.0° with measurement times of 7.5, 7.5, 7.5, 15, 15, 60, and 120 minutes respectively. The data from these angles was stitched together to produce a single data file. To obtain a "ground truth" model for simulation, we fitted this stitched data using refnx, obtaining the parameter values of table 1. The background, experimental scale factor and resolution used for fitting were 8 × 10⁻⁷, 0.783, and 2.5% dQ/Q respectively.
To facilitate a measurable difference in the noise characteristics of the experimentally measured data and the data generated by our simulation framework, we took 1.5 minute time-slices from the measured data associated with each individual angle. For each measurement angle, we used the same angle, counting time and number of points for the simulation. As can be seen in figure 1, the noise characteristics of the time-sliced measured data and the simulated data are very similar. Statistically, comparing the Anscombe-transformed datasets [Anscombe, 1948] using Hotelling's t² test [Hotelling, 1931], we find p = 0.874 and t² = 0.159, implying no significant difference between the measured and simulated data.

Benchmarking
As mentioned previously, the FI approach has notable performance advantages. To demonstrate this, we compared the time taken to obtain parameter uncertainties using the FI approach and using established Bayesian methods, given a correct, fitted model. Fitting times have been excluded from these results since the time to fit would dominate the computation time of the FI approach, and thus would provide little insight into the computational advantage of the FI calculation. Additionally, since fitting is typically required for MCMC sampling but not for nested sampling, a fair comparison between all methods becomes difficult. We therefore focus on the time taken for each method to complete given optimal starting values.
The benchmark was run on a CPU with no methods having multiprocessing explicitly enabled. MCMC sampling was run with a 400 step burn-in period followed by a 30 step sample, with each sample being separated by 100 iterations. Nested sampling was run using the default dynesty stopping criterion, which is optimised for evidence estimation [Speagle, 2019]. Uniform priors were used with a 25% bound above and below the ground truth for each parameter. Following this, we ran our FI approach on the same samples. Table 2 compares the mean processing times of 10 samples for each number of layers and, as can be clearly seen, the FI approach is significantly faster.
It should be noted that our implementation is not particularly optimised and we believe further performance gains could be obtained if they were required.
For each number of layers in the interval [1, 6], we randomly generated 10 samples and varied the SLD, thickness and interfacial roughness of each layer in each sample. Each sample used a silicon substrate of SLD 2.047 × 10⁻⁶ Å⁻², and the random SLD, thickness and roughness of each layer were sampled from uniform distributions over the intervals [−1, 10] × 10⁻⁶ Å⁻², [20, 1000] Å and [2, 8] Å respectively. Using our experiment simulation, we synthesised data for each of these samples and ran both MCMC and nested sampling to obtain parameter uncertainties. Each experiment simulation consisted of 140 points obtained from two angles, 0.7° and 2.0°, using simulated measurement times of 7.5 and 30 minutes respectively. Background noise of 10⁻⁶, an instrument resolution of 2% dQ/Q and an experimental scale factor of 1.0 were used.

Corner Plots and Confidence Ellipses
In refnx and dynesty, the results of MCMC and nested sampling respectively can be visualised in a corner plot: "an illustrative representation of different projections of samples in high dimensional spaces" [Foreman-Mackey, 2016]. These Bayesian sampling methods sample the parameter posterior distribution, allowing contours to be drawn through samples that are equally probable. The FI, however, is developed from a frequentist view, and its confidence ellipses bound regions in which we have at least kσ confidence in the parameter values. Despite these fundamental differences, the sampling corner plots often agree closely with the FI confidence ellipses.
For samples with mostly uncorrelated parameters, we found that corner plots show strong agreement with confidence ellipses. However, when more parameter correlation is present, sampling uncertainties are much larger and we reach a point at which the FI still represents the maximum obtainable information, but this seemingly cannot be extracted from the experimental data. As a consequence, the confidence ellipses do not match the corner plots as closely. This discrepancy is shown in figure 2, which compares two samples: a simple sample with mostly uncorrelated parameters and a more complicated sample with more parameter correlation due to similar layer SLDs. The datasets of figure 2 were both simulated with the same run conditions as detailed in section 3.2.
One potential source of deviation between corner plots and confidence ellipses may come from our fitting algorithm of choice. For our application of the FI, our estimator is a fitting algorithm and, so far, we have assumed that this estimator is unbiased. Under this assumption, the Cramér-Rao bound implies that the inverse of the FI is a lower bound on the variance of the estimator. However, in practice, we found that our fitting algorithm of choice, differential evolution in refnx, may exhibit bias in some cases. To measure this bias, we simulated 1000 experiments, using the same simulation conditions as previously, for a number of different samples of varying complexity and calculated the difference between the "ground truth" and mean fitted parameter values. Table 3 shows the fitting biases in the parameters of the figure 2 samples. Evidently, the fitting bias is greater in the sample with larger inter-parameter correlations, particularly in the layer thicknesses; these biases are model dependent and potentially fitting-package dependent.
The Cramér-Rao bound may be modified for a biased estimator. However, for a real measurement, there is no way to be able to tell if such a bias exists. As such, we leave our approach with the stricter limit (since any bias always increases the variance), and remind ourselves that the maximum possible information contained in the data is not always going to be the maximum extractable.

Time Dependence
One potential use of the FI in reflectometry is enabling early stopping of experiments based on counting statistics. To determine the feasibility of this application, and to validate our implementation, we investigated how parameter uncertainties change with measurement time. As derived in the SI, we should expect the uncertainty of a parameter, ε, to be inversely proportional to the square root of the experiment measurement time, τ. Using the fact that ε ∝ 1/√τ, introducing a non-zero proportionality constant, α, and taking the natural logarithm of both sides, we see

ln ε = ln α − (1/2) ln τ.

Using this result, we should expect the gradient of the plot of log parameter uncertainty, ln ε, vs. log time, ln τ, to be −1/2. To confirm this is the case, we compared established fitting uncertainty measures and uncertainties derived from the FI with increasing time using our experiment simulation framework; the fitting uncertainties were calculated using differential evolution in refnx and the FI uncertainties using equation 4.
Using simple linear regression, we found that the time dependence of any parameter's uncertainty did indeed follow the inverse square root of the measurement time, as shown in figure 3. We used the same samples and simulation parameters as those used for figure 2, except for the simulated measurement time. Both samples were initially simulated using the same times as before, and then these times were multiplied by an increasing "time factor" from 1 to 1000. This essentially split a fixed time budget between the simulated angles, 0.7° and 2.0°, with a ratio of 1:4.
For the simple sample, the relationship is perfectly exhibited. However, for the more complicated sample, the results are slightly noisier due to the increased difficulty of fitting. This is particularly noticeable when the counting time is low and the data being fitted is impacted by the added noise to a greater degree. With low counting statistics, differential evolution may terminate in a minimum of the χ² parameter space that does not represent the ground truth model (i.e., the simulated data no longer uniquely describes the true parameter set), resulting in uncertainties that deviate from the time dependence relationship derived above. This difficulty in fitting is shown in figure 3, where the mean absolute error between the "ground truth" and fitted parameter values is plotted against time. As can be seen, at lower counting times the fitting errors are approximately an order of magnitude larger than those at higher counting times.
Since parameter uncertainties decrease as the inverse square root of the measurement time, we can easily project the evolution of these uncertainties and predict when a desired threshold would be reached, at which time we may want to cease the measurement. It is the choice of the experimenter to decide on such a threshold, likely weighing factors including the time to change angle, the time to change sample, the total time budget and the number of samples being measured. Such choices are not necessarily easy to make prior to starting an experiment, and so automating this process may warrant further investigation.
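The projection itself is a one-line consequence of ε ∝ 1/√τ. The helper below (a hypothetical name of ours) returns the total measurement time needed to reach a target uncertainty, given the current uncertainty and elapsed time:

```python
def time_to_reach(eps_now, tau_now, eps_target):
    """Project the measurement time needed for a target uncertainty, using
    epsilon proportional to 1/sqrt(tau):
        eps_now * sqrt(tau_now) = eps_target * sqrt(tau_target)
    => tau_target = tau_now * (eps_now / eps_target) ** 2
    """
    return tau_now * (eps_now / eps_target) ** 2
```

For example, halving an uncertainty requires quadrupling the counting time, which is why gains diminish rapidly late in a measurement.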

Application to Soft Matter
As detailed in section 2.5, we applied our framework to a soft matter experiment by taking experimentally measured data, fitting a DMPC bilayer model, and reparameterising the model as a function of bulk water contrast SLD. The fitted SLD profiles and experimental reflectivity data are shown in figure 4. Using the model, data was simulated for each contrast SLD from −0.56 × 10⁻⁶ Å⁻² to 6.35 × 10⁻⁶ Å⁻² (pure H2O to pure D2O) and the FI calculated for each model parameter, obtained from the diagonal elements of the FI matrix of equation 1. These results are shown in figure 4 for an initial contrast choice and for a second contrast choice, assuming D2O was measured first. For simulation, angles of 0.7° and 2.0° and times of 15 and 60 minutes, respectively, were used (typical measurement times). Also shown are the nested sampling corner plots from sampling simulated data of solely D2O, and of D2O and H2O contrasts together, using the reparameterised model. Included in these corner plots are the sampling uncertainties associated with each parameter.
Since the units of the FI are nats per parameter unit squared, it is not technically correct to directly compare the FI between parameters. However, we can still compare the information content of parameters of the same unit. As might naively be expected, we show that it is possible to extract more information about some parameters than others. This result is certainly no surprise for researchers in the field who have experience fitting this system, but it does allow us to quantify it.
We show that the information of a parameter as a function of contrast is non-monotonic, almost certainly due to hydration of various components leading to them becoming indistinguishable from neighbouring components at some bulk water deuterium concentrations. This is particularly noticeable with the SiO2 hydration parameter for an initial contrast choice, where the large drop in information is due to the matching of contrasts. Since most of the other model parameters describe multiple interfaces, with only one interface being able to become "invisible" through contrast matching at a time, there are no zeroes in the FI for these plots. Figure 4 indicates that the greatest information is almost always obtainable from the highest SLD water, D2O. For an initial measurement contrast, the difference in information between the H2O and D2O extremes is significant. However, when considering a second measurement contrast, the difference in information between the two contrast SLDs is smaller. It is well established that measuring multiple contrasts of different SLDs will reduce parameter uncertainties, so this result may seem unusual; it could suggest that measuring D2O for twice as long would reduce parameter uncertainties similarly to measuring both D2O and H2O. However, we believe the aim of measuring multiple contrasts is not only to lower parameter uncertainties but also to decouple parameter correlations (i.e., to lower inter-parameter covariances). Therefore, only considering which contrasts maximise the FI will not account for this covariance reduction. This is illustrated in the corner plots of figure 4, where the estimated posterior distributions from the combined D2O and H2O data are clearly much better defined (i.e., more Gaussian) than those from D2O alone. For example, the model roughness parameters are very poorly defined with just a single D2O measurement.
While difficult to display due to the number of parameter pairs, given a model it would be possible to calculate the optimal contrast to measure, minimising both parameter variances and inter-parameter covariances. The optimal solution is almost certainly model dependent, but given the broad features found here, a slightly incorrect model is unlikely to be an issue.
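Such a contrast scan can be sketched numerically. The snippet below is a hypothetical illustration: `fisher_for_contrast` is an assumed helper standing in for a real reflectivity calculation, and the D-optimality score det(g) is one choice of objective that penalises both large variances and strong inter-parameter covariances.

```python
import numpy as np

def fisher_for_contrast(contrast_sld):
    # Stand-in (hypothetical) for a real FI calculation: a toy 2x2 FI matrix
    # whose off-diagonal (covariance-related) term varies with the contrast.
    c = contrast_sld
    return np.array([[5.0 + c**2, 2.0 * np.cos(c)],
                     [2.0 * np.cos(c), 3.0]])

# Scan bulk-water SLDs from H2O (-0.56) to D2O (6.35), in units of 1e-6 / A^2,
# scoring each contrast by the determinant of its FI matrix (D-optimality).
contrasts = np.linspace(-0.56, 6.35, 100)
scores = [np.linalg.det(fisher_for_contrast(c)) for c in contrasts]
best = contrasts[int(np.argmax(scores))]
```

Maximising det(g) is equivalent to minimising the volume of the confidence ellipsoid, so it accounts for covariance reduction in a way that maximising individual diagonal FI entries does not.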

Limitations
Our framework is essentially frequentist and, much as one does in hypothesis testing, it proceeds by calculating probabilities based on hypothetical, assumed or estimated parameter values. The sizes of uncertainties, for example, are those that would exist if the estimated parameter values were correct. Determining uncertainties in this way may appear to be an issue, particularly in reflectometry, since determining the globally "correct" model is non-trivial. As ever, it is the choice of the experimenter to decide whether their model, guided by their underlying knowledge of the system, will accurately represent the true system. However, even if the model is not exactly correct, the FI still provides value. Since our calculations effectively perform a sensitivity analysis of the parameters (a more sensitive parameter means a larger gradient in the Jacobian, J, and therefore a larger FI), similar models with differing values are still very likely to exhibit the same behaviour and give the same trends. For example, for our DMPC bilayer data, many other model parameterisations have been argued for in the literature, as have differing values for the parameters we have chosen to fix. However, our simplifying assumptions describe the measured data to a satisfactory level and we are confident that equivalent parameterisations would not change the trends found from our calculations.
Usually, to optimise the parameter values, the model is fitted before calculating the FI; in our case, we have used differential evolution. Fitting algorithms can often provide estimates of parameter variances and covariances, and in some cases these values may be very similar to those provided by our framework. However, the fit-derived (co)variances have no deeper statistical underpinning and do not allow one to further investigate the system. Our framework, being based on the counting statistics and parameter sensitivity of the model, enables almost instant calculation of the uncertainties that would be obtained at any point in the future, or for different contrasts/conditions. Determining the (co)variances by re-fitting several hundred times to create a "phase diagram", similar to those described in section 3.5, is not feasible in anything close to real time, as would be useful during an experiment.
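The fit-then-analyse workflow can be sketched on a toy model (the exponential "reflectivity" stand-in, noise level and bounds here are illustrative assumptions, not the paper's actual model or data):

```python
import numpy as np
from scipy.optimize import differential_evolution

rng = np.random.default_rng(1)
q = np.linspace(0.01, 0.3, 50)
true = np.array([1.0, 20.0])
# Toy stand-in for a reflectivity model, with two free parameters
model = lambda q, xi: xi[0] * np.exp(-xi[1] * q)
data = model(q, true) * (1 + 0.01 * rng.standard_normal(50))

# Global fit by differential evolution, as in the paper's workflow;
# the FI would then be evaluated at the resulting point estimate xi_hat.
chisq = lambda xi: np.sum((model(q, xi) - data) ** 2)
fit = differential_evolution(chisq, bounds=[(0.1, 5.0), (1.0, 50.0)], seed=1)
xi_hat = fit.x
```

The key point is that the fitter only supplies the point estimate; the (co)variances and their time projection come from the FI evaluated at `xi_hat`, not from the optimiser itself.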

Future Work
The presented framework has many potential applications in neutron reflectometry, and other scattering techniques based on counting statistics. As demonstrated by our soft matter application, experimental design is one such use where the FI could be used to influence real-time decisions of measurement angle and/or contrast choice; a similar Bayesian approach would be infeasible for real-time application due to computational overhead. We would also like to apply our framework to more complex real-world systems, such as magnetic structures. This could provide answers to common questions posed in the literature and give insight as to why particular experimental design choices have found popularity.
The FI framework could be extended to quantify additional factors of an experiment, such as the time taken to change sample or angle. Additionally, work on quantifying and incorporating fitting biases and inter-parameter correlations into the FI calculation and subsequent analysis could bridge the gap between our framework and established methods. Since the FI uncertainties do not always match those obtained from established methods, it could also be possible to provide experimenters with a metric detailing how closely the FI results would be expected to match established methods. Since the largest variations were found to occur when parameters are strongly correlated, the Pearson correlation coefficient applied to the FI matrix (or similar) could be indicative here.

Conclusions
In this work, we presented a framework for determining the maximum information gain and for optimising the experimental design of neutron reflectometry experiments using the Fisher information (FI). We demonstrated how the FI allows us to quantify the information content of a measured data point in relation to given model parameters. To illustrate this point, we developed a robust framework for simulating experimental data with realistic noise characteristics, and then compared the FI-derived results to Bayesian sampling methods. The FI describes the maximum possible extractable information, and its results can therefore differ significantly from those of sampling methods. However, the approach has significant upsides in its run time, its ability to project uncertainties over time, and its ability to run experiments in silico. Finally, we demonstrated a practical application of the approach in determining the information content of the parameters of a DMPC bilayer sample, parameterised as a function of the bulk water contrast, allowing us to ascertain optimal measurement conditions.
The code for this work is open source and freely available on GitHub [Durant et al., 2021].

Measurement Model
In reflectometry, a model describes the reflectivity at a given neutron momentum transfer. These momentum transfer values are binned, with the measured number of neutrons in bin i being given by

λ_i = r_i s_i,

where r_i is the model reflectivity and s_i is the number of neutrons incident in bin i. s_i is a function of the number of incident neutrons in the ith bin at each experimental condition, μ_iθ, and the time each condition is measured for, τ_θ:

s_i(τ) = Σ_θ μ_iθ τ_θ.

Fisher Information about the λ coordinates
The probability distribution of the measurement in one bin is Poisson distributed, and the corresponding Fisher information (FI) for this bin, with respect to λ_i, is

g_λ_i = 1/λ_i.

The probability distribution for the whole measurement is, by independence,

P(x | λ) = ∏_i λ_i^x_i e^(−λ_i) / x_i!,

and the corresponding FI with respect to λ is

g_λ = diag(1/λ_1, …, 1/λ_N),

i.e. a diagonal matrix with values of g_λ_i, which happens to equal (in this case, but not in general) E[cov X]^−1.
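The Poisson result g_λ = 1/λ can be checked numerically by evaluating the expectation of the squared score directly (a standalone sketch, not part of the paper's codebase):

```python
import math

# Numerically verify the FI of a Poisson distribution: g_lambda should equal
# 1/lam, where g_lambda = E[(d/dlam log P(x|lam))^2].
lam = 7.5
fisher = 0.0
for x in range(200):                       # truncate the sum far into the tail
    # log Poisson pmf, computed via lgamma to avoid factorial overflow
    log_p = -lam + x * math.log(lam) - math.lgamma(x + 1)
    score = x / lam - 1.0                  # d/dlam log P(x|lam) = x/lam - 1
    fisher += math.exp(log_p) * score ** 2
print(round(fisher, 6))                    # ~ 1/7.5 = 0.133333
```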

Fisher Information about the ξ coordinates.
In general, we can transform the FI using tensor transforms, i.e.

g_ξ = (∂λ/∂ξ)^T g_λ (∂λ/∂ξ).

So, the FI in terms of ξ is just

(g_ξ)_jk = Σ_i (∂λ_i/∂ξ_j) g_λ_i (∂λ_i/∂ξ_k).

To get the FI as a function of τ, we take the derivative of equation 5 with respect to ξ, which gives us

∂λ_i/∂ξ_j = s_i (∂r_i/∂ξ_j).

The derivative ∂r_i/∂ξ_j is obtained from the model alone (irrespective of the data); it is the derivative of the reflectivity for bin i with respect to the jth ξ parameter.
We can now put everything together. First, the FI about the λ parameters, written in terms of s_i and r_i, is

g_λ_i = 1/λ_i = 1/(s_i r_i).

This can be re-written in matrix form in terms of a diagonal matrix, S = diag(s_1, …, s_N), and the Jacobian matrix, J, with entries J_ij = ∂r_i/∂ξ_j, for the N modelled reflectivity points with respect to the M parameters, ξ. The tensor transformation of g_λ in this notation is then

g_ξ = J^T S g_λ S J,

and the matrix S g_λ S is a composition of diagonal matrices, equal to

S g_λ S = diag(s_1/r_1, s_2/r_2, …, s_N/r_N).

Summary
In summary, the FI about ξ is given by

g_ξ = J^T M J,

where J is the Jacobian of the reflectances, r_i, with respect to the parameters, ξ, and M is a diagonal matrix with entries (s_1/r_1, s_2/r_2, …, s_N/r_N). s_i is the incident neutron flux, which depends on the experimental conditions k and the time spent measuring each, τ_k:

s_i(τ) = Σ_k τ_k μ_ik.
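A minimal numerical sketch of g_ξ = J^T M J follows; the function name, the central-difference Jacobian and the toy exponential "reflectivity" model are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def fisher_matrix(q, xi, s, reflectivity, step=1e-6):
    """FI matrix g_xi = J^T M J for Poisson counts with means lam_i = s_i * r_i.

    q -- momentum transfer bins; xi -- parameter vector;
    s -- incident counts per bin, s_i = sum_k tau_k * mu_ik;
    reflectivity -- model function r(q, xi).
    """
    r = reflectivity(q, xi)
    # Numerical Jacobian J[i, j] = dr_i / dxi_j via central differences
    J = np.empty((len(q), len(xi)))
    for j in range(len(xi)):
        dx = np.zeros_like(xi)
        dx[j] = step * max(abs(xi[j]), 1.0)
        J[:, j] = (reflectivity(q, xi + dx) - reflectivity(q, xi - dx)) / (2 * dx[j])
    M = np.diag(s / r)                     # diagonal matrix with entries s_i / r_i
    return J.T @ M @ J

# Toy stand-in model: exponential decay in place of a real reflectivity curve
model = lambda q, xi: xi[0] * np.exp(-xi[1] * q)
q = np.linspace(0.01, 0.3, 50)
g = fisher_matrix(q, np.array([1.0, 20.0]), s=np.full(50, 1e4), reflectivity=model)
sigma = np.sqrt(np.diag(np.linalg.inv(g)))  # Cramer-Rao lower bounds on sigmas
```

The inverse of g then gives the minimum achievable parameter (co)variances, which is the quantity projected forward in time throughout the paper.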

Confidence Ellipses
To calculate confidence intervals in the general case, we can find the set of parameters that differ from the estimate by a certain number of standard deviations. To do this, we need to know the length of a vector in parameter space in terms of the number of standard deviations. This is what the FI provides (technically, it gives a linear approximation to the informational distance between distributions, but the two are related). So, if we want to find a vector with a given length, k, we solve

k² = Δξ^T g_ξ Δξ.

It so happens that k in the above equation can be interpreted as a "number of standard deviations", so a 2σ error bar will have k = 2. In practice, it is useful to fix a direction and calculate the magnitude of the vector needed to reach the threshold, i.e. let Δξ = ε Δξ̂, where the hat denotes a unit vector and ε its magnitude. Consider the 1-D case: k² = ε² g. Since g is analogous to the inverse variance, we have k² = ε²/σ². Therefore, if we want to know where ε = 2σ, we would have k² = (2σ)²/σ² and thus k = 2.
In 2-D, the unit vectors can be written as (sin ϑ, cos ϑ). We can then solve the scalar equation

k² = ε² (sin ϑ, cos ϑ) g_ξ (sin ϑ, cos ϑ)^T

for ε over a sample of angles, ϑ, in [0, 2π], and graphically plot the points (ε(ϑ) sin ϑ, ε(ϑ) cos ϑ). The result is a confidence ellipse of size k between two chosen parameters. If |ξ| > 2, then for two chosen parameters, ξ_i and ξ_j, the above equation becomes

k² = ε² (sin ϑ, cos ϑ) [g_ξ_i,i, g_ξ_i,j; g_ξ_j,i, g_ξ_j,j] (sin ϑ, cos ϑ)^T.
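The ellipse construction above can be sketched directly (the 2×2 FI sub-block values here are arbitrary toy numbers):

```python
import numpy as np

# Trace a k-sigma confidence ellipse for a parameter pair (i, j) from a toy
# 2x2 Fisher information sub-block [[g_ii, g_ij], [g_ji, g_jj]].
g_sub = np.array([[4.0, 1.0],
                  [1.0, 2.0]])
k = 2.0                                    # 2-sigma contour

angles = np.linspace(0, 2 * np.pi, 360)
points = []
for theta in angles:
    u = np.array([np.sin(theta), np.cos(theta)])   # unit direction in parameter space
    # Solve k^2 = eps^2 * u^T g u for the distance eps along this direction
    eps = k / np.sqrt(u @ g_sub @ u)
    points.append(eps * u)
points = np.array(points)                  # (x, y) offsets tracing the ellipse
```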

Point Estimates vs Posterior Distributions
An estimator, usually written with a hat, is a function of sampled data that provides an estimate of a parameter. In frequentist statistics, one is concerned with the probability distributions of estimators, not of the parameters themselves (as would be the case in Bayesian statistics). The distribution of an estimator often has a variance. When it does, and when the estimator is unbiased, i.e. E[ξ̂] = ξ, the Cramér–Rao bound states that its variance is bounded below by the inverse of the FI, var(ξ̂) ≥ g_ξ^−1.
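A toy simulation illustrates this frequentist picture: the sample mean of Poisson counts is an unbiased estimator of the rate, and its variance attains the FI-derived lower bound 1/(n g_λ) with g_λ = 1/λ (all numbers below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
lam, n, trials = 5.0, 100, 20000
# Each trial: estimate lam from n Poisson draws via the sample mean (the MLE)
estimates = rng.poisson(lam, size=(trials, n)).mean(axis=1)
crb = 1.0 / (n * (1.0 / lam))              # Cramer-Rao bound, = lam / n
print(estimates.var(), crb)                # the two should closely agree
```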

Time Dependence
If there is a single measurement condition, we have s_i = μ_i τ. Then τ is a factor of all the s_i, and so of the matrix M, and thus of g_ξ. We can therefore write g_ξ = τ f_ξ. The size of an "error bar", ε, for a single parameter, ξ, at a threshold, k, in a direction, Δξ̂, is given by

k² = ε² Δξ̂^T g_ξ Δξ̂,

and so

k² = ε² τ (Δξ̂^T f_ξ Δξ̂).

Hence ε ∝ 1/√τ.
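The 1/√τ scaling can be made concrete with a toy time-normalised FI matrix f_ξ (the values below are arbitrary):

```python
import numpy as np

f = np.array([[4.0, 1.0], [1.0, 2.0]])     # toy time-normalised FI, f_xi
u = np.array([1.0, 0.0])                   # direction of the parameter of interest
k = 1.0                                    # threshold in standard deviations

def error_bar(tau):
    # eps = k / sqrt(u^T g_xi u) with g_xi = tau * f_xi
    g = tau * f
    return k / np.sqrt(u @ g @ u)

print(error_bar(1.0), error_bar(4.0))      # quadrupling tau halves the error bar
```

This is the relationship used for planning: projected uncertainties at any future counting time follow directly from one FI evaluation.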

Bilayer Model Parameterisation
For the measured DMPC sample, each experimental dataset was recorded with an instrument resolution of 2%. The instrument backgrounds were 3.21 × 10⁻⁶, 2.80 × 10⁻⁶ and 2.06 × 10⁻⁶ for the three contrasts, respectively. The SLDs of the Si substrate and following SiO2 layer were defined using known values of 2.073 × 10⁻⁶ Å⁻² and 3.41 × 10⁻⁶ Å⁻². From fitting in RasCAL, we obtained values of 14.7 Å and 24.5% for the SiO2 layer's thickness and hydration. For both the Si/SiO2 and SiO2/DMPC interfacial roughnesses, we obtained 2.00 Å; all other interfacial roughnesses shared a common parameter that was fitted as 6.57 Å.
We have assumed the molecular volumes of the headgroups and tailgroups are known and constant at 320.9 Å³ and 783.3 Å³, and that any changes in molecule surface area are inversely proportional to the headgroup and tailgroup thicknesses. Therefore, we need only fit one parameter: the area per molecule (APM) at the surface. From the APM we can calculate the tailgroup thickness using the known volume:

Thickness = Volume / APM. (7)

From fitting the APM parameter, we obtained a value of 49.9 Å². Also using the tailgroup volume, the SLD of the tailgroups can be calculated using the known tailgroup scattering length (SL) of −3.08 × 10⁻⁴ Å and the equation

SLD = Σb / Volume. (8)

Both the headgroups and tailgroups contain water through defects across their surfaces, but there is also water bound to the hydrophilic headgroups. The model accounted for these differing hydration types by varying two parameters: the total bilayer hydration and the number of headgroup-bound waters, for which we obtained values of 7.37% and 3.59, respectively. The headgroup water SLs in H2O and D2O were calculated as the product of the bound waters parameter and the known SLs of −1.64 × 10⁻⁵ Å and 2.00 × 10⁻⁴ Å for the H2O and D2O solutions. Further, the product of the bound waters parameter and the known volume of water, 30.4 Å³, yielded the headgroup water volume. The headgroup thickness was calculated using the total headgroup volume (including bound water), the APM parameter and equation 7. To determine the headgroup SLs in H2O and D2O, we took the sum of the known headgroup SL of 6.41 × 10⁻⁴ Å and the headgroup water SL in H2O and D2O previously calculated. Finally, using the headgroup SL in each contrast and the calculated headgroup volume, we calculated the SLD of the headgroups in each solution using equation 8.
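Equations (7) and (8) can be worked through with the fitted values quoted above (variable names are illustrative):

```python
# Worked example of equations (7) and (8) using the quoted fitted values.
tail_volume = 783.3          # tailgroup molecular volume (cubic Angstrom)
apm = 49.9                   # fitted area per molecule (square Angstrom)
tail_sl = -3.08e-4           # tailgroup scattering length (Angstrom)

tail_thickness = tail_volume / apm        # equation (7): thickness = volume / APM
tail_sld = tail_sl / tail_volume          # equation (8): SLD = sum(b) / volume

print(round(tail_thickness, 1))           # ~15.7 Angstrom
print(tail_sld)                           # ~ -3.93e-7 per square Angstrom
```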
We reparameterised the fitted model as a function of bulk water contrast SLD using the known SLD of the DMPC headgroups of 1.98 × 10⁻⁶ Å⁻² (if no hydrating water were present) and approximating the headgroup hydration as 27%.