Bayesian inference of metal oxide ultrathin film structure based on crystal truncation rod measurements

Reverse Monte Carlo software to analyze the atomic arrangements of perovskite oxide ultrathin films from the crystal truncation rod intensity is developed on the basis of Bayesian inference.


Introduction
Perovskite oxides have been studied for a long time and still attract much attention because of their variety of conductive, electronic and magnetic properties, as well as their practical applications (Tokura & Nagaosa, 2000;Goodenough, 2001;Dagotto, 2005). In the past 15 years, extensive work has been performed on epitaxial interfaces of perovskite oxides (Ohtomo et al., 2002;Ohtomo & Hwang, 2004;Nakagawa et al., 2006;Hwang et al., 2012;Salluzzo et al., 2013;Chakhalian et al., 2014;Middey et al., 2016). Epitaxial interfaces allow us to study atomically ordered interfaces between two different electronic systems, such as ferromagnet/superconductor interfaces or metal/insulator interfaces. Such structures are useful to test our understanding of condensed matter physics (Chaloupka & Khaliullin, 2008) and can be used to create new electronic phases (Reyren et al., 2007). For this reason, enormous effort has been made to control the oxide interface structures.
Most perovskite interfaces are fabricated by using pulsed laser deposition or molecular beam epitaxy techniques while monitoring the quality of the layer-by-later growth by reflection high-energy electron diffraction. The interfacial structure is often examined by using scanning transmission electron microscopy, which is sometimes combined with electron energy loss spectroscopy (Perna et al., 2010;Cantoni et al., 2012). This requires slicing the sample, which sometimes causes damage to the newly exposed surface. In addition, the sample environment is hardly controlled. The resolution of the atomic displacement is typically 0.5 Å for heavy elements, which is atomic resolution. However, when we want to study the electric polarization, which is an essential value for the physical properties of oxides, the resolution is still insufficient. For this purpose, a 0.1 Å resolution is required. Such resolution can be achieved by using surface X-ray diffraction to measure the crystal truncation rods (CTRs) arising from a sudden change in electron density at a specific plane (Robinson, 1986;Feidenhans'l, 1989). This method does not require slicing of the samples, allows tuning of the sample environment and has a resolution better than 0.1 Å . The main drawback of using this technique is the difficulty of the analysis. There are many software programs and algorithms to make the analysis easier. ROD (Vlieg, 2000) and its successor (Vonk, 2011) are classical examples, whose main purpose is the refinement of the surface atomic arrangement. Holographic phase retrieval methods (Takahashi et al., 2001;Yacoby et al., 2002) are often used to generate the initial models for the refinements (Fong et al., 2005;Willmott et al., 2007;Yamamoto et al., 2011;Fister et al., 2014). Such techniques to find a good initial model are indispensable for the interfacial structure analysis of perovskite oxides (Willmott et al., 2007). Electron density analysis techniques provide flexible models, although the derivation of the standard deviations of structure parameters is not straightforward. Holographic analysis relies on the assumption that the sample surface is homogeneous, which is often untrue. Iterative reconstruction (Fung et al., 2007;Bjö rck et al., 2008) shows that the true structure cannot be obtained without enforcing very strict constraints on the electron density in real space, such as positivity, atomicity, and similarity between the substrate and film structures.
Here, we develope a Monte Carlo (MC)-based refinement program to find reliable structural models for perovskite oxide interfaces. The method we adopt is similar to the reverse Monte Carlo method used for liquids or amorphous bodies (McGreevy & Pusztai, 1988;D'Alessandro & Cilloco, 2010;D'Alessandro, 2011). Each MC step makes a specific structural model, which can be regarded as a 'very strict positivity and atomicity constraint in real space'. In principle, the MC technique can treat inhomogeneous surfaces, including domain structures. In addition, the MC calculation can provide the probability density of the structure model, that is, the standard deviations of structure parameters are directly provided when a set of experimental results is given. In this paper, we consider (001)-oriented LaAlO 3 /SrTiO 3 ultrathin films. Fig. 1 presents a schematic of the film. La and Sr occupy the A site, and Al and Ti occupy the B site. Because of the assumed sample orientation, the structure is regarded as an alternating stack of AO and BO 2 planes; the oxygen site in the AO plane is called the O1 site, and those in the BO 2 plane are called O2 sites. In the present study, we refined the structure of a ten-unit-cell-thick region from the surface.

Structure model
The in-plane structure is assumed to have no superstructure, and the a lattice parameter to be the same as that of SrTiO 3 . Now we have only two parameters per site, that is, the atomic displacements along the surface normal direction from the ideal substrate lattice zð; nÞ and the occupancy occð; nÞ, where denotes A, B, O1 or O2, denotes La, Sr, Al, Ti, O1 or O2, and n denotes the layer index starting from the ideal substrate (see Fig. 1). Conditions of occðLa; nÞ + occðSr; nÞ = 1 and occðAl; nÞ + occðTi; nÞ = 1 are assumed, except for n values close to the surface. The isotropic atomic displacement parameter B (Å 2 ) was defined as being common to all atoms within the film. The total number of independent parameters m depends on the constraints we use, and the typical value of m for the present study was 60. The structural parameters of the model are expressed as H = ð 1 ; 2 ; . . . ; m Þ = [zðA; 1Þ, zðB; 1Þ, zðO1; 1Þ, zðO2; 1Þ, zðA; 2Þ, . . . , occðLa; 1Þ, occðSr; 1Þ, occðAl; 1Þ, occðTi; 1Þ, occðO1; 1Þ, occðO2; 1Þ, . . . ].

Bayesian inference
The experimentally observed diffraction intensity from a unit area of the sample surface at scattering vector Q is expressed by I exp ðQÞ. A set of CTR data is composed of the diffraction intensities at different Q, I exp = [I exp ðQ 1 Þ, I exp ðQ 2 Þ, . . . , I exp ðQ N Þ], where N denotes the total number of measured Q positions. Our MC software estimates the surface structure based on a posterior probability PðHjI exp Þ, that is, a conditional probability of structural parameters under the conditions of the measured CTR data. In Bayesian inference, maximizing the posterior probability to estimate parameters is called the maximum a posteriori estimation. The distribution of the posterior probability reflects the uncertainty of the estimation. According to Bayes' theorem, the posterior probability is given by research papers where PðI exp jHÞ is called the likelihood function, which represents the statisical property of the measurement noise, and PðHÞ is called the prior probability, which represents prior information on the structure provided by other experiments and theories. In this paper, PðHÞ is set to a uniform distribution except in the last paragraph of x4, which means that no prior information is assumed for H. As a result, PðHjI exp Þ is proportional to PðI exp jHÞ = Q N i¼1 P½I exp ðQ i ÞjH. The conditional probability P½I exp ðQ i ÞjH is assumed to be a Gaussian distribution with a standard deviation of ðQ i Þ: The calculated intensity for structure H at Q i is here expressed as I calc ðQ i ; HÞ = S jFðQ i ; HÞj 2 , where FðQ i ; HÞ denotes the scattering amplitude for structure H at Q i and S is a scale factor. Let ðQ i Þ be given by ðf½I calc ðQ i ; HÞ þ I bg ðQ i Þg 2 þ ½I bg ðQ i Þ 2 Þ 1=2 to express the Gaussian noise whose standard deviation is proportional to the intensity, which imitates the error arising from the optical misalignment. Here, I bg ðQ i Þ is the background intensity at Q i . From the typical standard deviation of our CTR data at equivalent positions, we estimate the value of ' 0:2.
The structure H that gives the maximum value of PðI exp jHÞ is the structure model most likely to reproduce the experimental data. For convenience, we introduce a cost function E(H) = ðÀ1=NÞ ln PðHjI exp Þ, whose minimization is equivalent to maximizing a corresponding posterior distribution. Substituting equations (1) and (2) into the definition of E(H), one obtains up to a constant that is independent of H. The advantage of the MC-based technique is the flexibility of the model construction. I calc ðQ i ; HÞ can include the effect of BO 6 octahedral rotation or any kind of domain structure, such as domains having different film thickness, through the ordinary formula of kinematical diffraction theory. In the practical use of the MC technique, the amount of information carried by the CTR data limits the complexity of the model. I exp ðQÞ at very close Q values are correlated, especially when the distance between the two points is closer than the instrumental resolution. This means that the amount of information in the CTR profile is proportional not to N but to the range of Q space. In this study, we selected the data step in the c Ã direction to be 0.02c Ã , which corresponds to $0.2 in the 2 angle for 16.5 keV X-rays. Thus, the resolution functions of the neighboring data points are well separated. Under these conditions, we will examine if the information derived from the CTR profiles is sufficient to obtain the global optima, by estimating both the accuracy and the precision in x3. The application to real experimental data will be presented in x4.

Initial model construction
The search for the global minimum of EðHÞ is divided into two parts: preparing initial parameters and the refinement of the parameters. The requirements of the initial model largely depend on the robustness of the refinement process. While the MC technique is robust to the initial parameters, a good initial model is preferable to achieve the model providing the global minimum of EðHÞ. The initial values of the structural parameters are modeled by using the error functions Erfðx À x c ; sÞ, where x c and s are the peak position and the width of the corresponding Gaussian function. The initial values of the interplanar distance c þ zðA; n þ 1Þ À zðA; nÞ are expressed as c þ Ác Erfðn À n int ; s z Þ, where c is the bulk substrate lattice parameter, Ác denotes the difference in the interplanar distance between the film and substrate, n int denotes the nominal interface position, and s z gives the sharpness of the interface. The depth profile of the initial distance c þ zðA; n þ 1Þ À zðA; nÞ is presented in Fig. 2(a), and the corresponding initial zðA; nÞ profile is shown in panel (b)  Depth dependence of (a) interplanar distance, (b) displacement of the A site and (c) occupancy of (Sr, La) of the LaAlO 3 /SrTiO 3 heterostructure model. Error bars represent the standard deviation of Pð i jI VM exp Þ. For most of the plots, the values are smaller than the symbol size. Shaded areas represent regions having occ(A, n) < 0.5. The differences between the resulting structure H and H VM are plotted with the origin shifted to the thick horizontal lines. The difference of zð; nÞ is magnified by a factor of 10. occð; nÞ is expressed by the product of two error functions, Erfðn À n surf ; s surf Þ½1 À Erfðn À n int ; s int Þ (where n surf , s surf , s int denote the nominal surface position, the sharpness of the surface and that of the interface, respectively), as presented in Fig. 2(c) (blue triangles). The number of parameters is reduced to six (Ác, n int , n surf , s z , s int , s surf ), which allows us to use the grid search method to find a good initial model.

Monte Carlo sampling
The refinement part of the MC calculation was performed by stochastic sampling with the Metropolis method (Metropolis et al., 1953): (1) One component of H is randomly selected to be modified. The parameter is modified by a small step Á defined by a Gaussian random number with a standard deviation of 0.004 Å for zð; nÞ and 0.01 for occð; nÞ.
(2) The change in EðHÞ caused by the parameter modification, ÁEðHÞ = EðH new Þ À EðH old Þ, where H old and H new are the structure models before and after the modification, is evaluated. The probability that the modification is accepted is given by r, which is defined as Here, T MC is a temperature parameter to adjust how often modifications are accepted in MC sampling. If the modification is not accepted, H new is restored to H old .
(3) The value of T MC is decreased according to an exponential annealing schedule. This process is called simulated annealing (Kirkpatrick et al., 1983). The number of Monte Carlo steps is typically 10 6 -10 7 . A 10 7 cycle iteration takes half an hour using a 4 GHz core i7 CPU with single-core calculation.
(4) After we have constructed a structural model that gives a satisfactory small value of EðHÞ, H is recorded during the MC calculation with a constant temperature T MC = 1=N, the inverse of the number of data points. This procedure generates samples from the posterior probability PðHjI exp Þ because the detailed balance condition is satisfied by the Metropolis method. Each component-wise posterior probability Pð i jI exp Þ has a peak. The peak position and width are interpreted as the resulting structural parameter i and its uncertainty, respectively. Since it evaluates the precision of structural parameters by directly sampling from their posterior probabilities, the MC technique can be applied to general cases where a strong correlation exists between the structural parameters. The values of and the scale factor S are refined together with H.

Analysis of virtual measurement data
To evaluate the performance of the MC refinement, the software developed for this purpose was applied to artificial CTR intensity profiles calculated from the reported structure parameters of a five-unit-cell-thick LaAlO 3 film on a TiO 2terminated SrTiO 3 (001) Fig. 3. The prepared VM data were 00, 01 and 11 rods. The total number of data points ðNÞ was 575. The minimum value of the cost function EðH VM Þ was 6.84. The initial model made with a grid search provided EðHÞ = 29.38, which corresponds to R = P jðjF exp j À jF calc jÞj= P jF exp j = 0.354, using the initial value of ¼ 0:15. After the grid search, all variable parameters H and were relaxed by the MC software. The MC calculation was divided into two stages by different temperature sequences. In the first temperature sequence, z(O1, n), occ(O1, n), z(O2, n) and occ(O2, n) were constrained to the corresponding values of metal ions on the same layer. The initial temperature was set to 0.5 to sample a wide parameter space. The second MC calculation with a different temperature sequence was applied to the model resulting from the first sequence by removing the constraints on z(O1, n) and z(O2, n), while the constraints on the occupancies for oxygen sites were maintained. The initial T MC was set to 0.1 and decreased by 1% per 2000 steps. In total 2 Â 10 6 calculation steps were performed for both stages of MC calculation.
The intensity profiles calculated from the resulting structure are shown in Fig. 3 as red curves. The structure model was improved to EðHÞ = 7.25 (R ¼ 0:04), and the resulting was 0.215 AE 0.002. The refined values of the structural parameters of the A site are presented in Fig. 2. Fig. 2(b) shows the depth dependence of zðA; nÞ. A positive displacement represents atomic movement towards the surface. Fig. 2(c) shows the depth dependence of occð; nÞ. The error bars are defined by the standard deviation of each parameter.
The precision of the structure parameters is also expressed by the standard deviation of each parameter. The average value of the standard deviation of zð; nÞ for each site ð z Þ is listed in Table 1. Here, the average is taken over the sites whose occupancy is larger than 0.5. CTR intensity profile of the VM data, I VM exp (open symbols), together with I calc before (blue) and after (red) the MC fitting (solid curves). The three CTR rods 00, 01 and 11 are shown with the scale shifted for clarity. metal sites is AE0.01 Å and that for the oxygen sites is AE0.04 Å . Standard deviations for the occupancy parameters at the interface ð int occ Þ and at the surface ð surf occ Þ are also listed in the same table. The typical precision of the metal occupancy is 3%.
The accuracy of the structure parameters is estimated from the differences between the resulting H and H VM , which are plotted in Figs. 2(b) and 2(c). The difference for zð; nÞ was approximately the same as the magnitude of the error bar, and the typical difference for occð; nÞ was three times larger than the standard deviation estimated by this procedure. The quantitative similarity between the accuracy and the precision suggests that the amount of information provided by the CTR profiles is well estimated by the number of data points N in the present case.

Analysis of experimental data
The experimental data of LaAlO 3 /SrTiO 3 along the 00, 01 and 11 rods with N = 575 (Yamamoto et al., 2011) were analyzed with our MC software. The initial value of S was determined in advance of the MC calculation by the steepest descent method using only the data near the Bragg peaks, where the CTR intensity is rather insensitive to the detail of the surface structure model. The initial values of the other structural parameters were the same as those used for the analysis made on the VM data. The initial systematic noise scale was chosen to be 0.15.
The initial model made with the grid search provides the value of EðHÞ = 42.54 (R ¼ 0:391). The analysis process for the MC sampling was the same as in x3, except for the refinement of the atomic displacement parameter B for the whole film. Fig. 4 shows the experimentally observed CTR intensity profiles together with the calculated profiles from the resulting structure. The value of EðHÞ was reduced to 7.66 (R ¼ 0:110). The MC refinement is proven to be robust enough for practical use of perovskite interfacial structure determination. Fig. 5(a) shows the depth dependence of zð; nÞ. The main structural features reported by Yamamoto et al. (2011), namely the shift of the O atoms in SrTiO 3 towards the surface and the lattice expansion around the interface, are reproduced in this analysis. Fig. 5(b) shows the depth dependence of occð; nÞ. Some amount of atomic interdiffusion at the interface is visible. The average values of the standard deviation of the structural parameters are listed in Table 1, where the average was taken over the sites where the occupancy was larger than 0.5. The precision of the parameters was similar to that in the VM analysis. The B value for the film region was found to be 1.16 AE 0.08 Å 2 , which is about three times larger than that of the substrate atoms. We also tried to refine the B parameters for each atom, and found that the B parameters were scattered from site to site unphysically, while the other parameters were nearly unchanged. This is because the effect of each B parameter on the cost function is too small to refine with the present dataset.
Lastly, we present the effect of the prior probability PðHÞ. The thickness of the film in the refined structure can be defined as the total values of occðLa; nÞ or occðAl; nÞ, which are 4.91 AE 0.08 and 4.46 Experimental intensity profiles (open symbols) together with calculated profiles (solid curves).

Figure 5
Depth dependence of (a) displacement and (b) occupancy of the LaAlO 3 / SrTiO 3 heterostructure. The horizontal axis represents the layer index n. Error bars represent the standard deviation of Pð i jI exp Þ. The values for some parameters are smaller than the symbol size.  Fig. 5. By applying a Gaussian prior probability with an average value of 5.0 and a standard deviation of 0.1, we obtained thicknesses of 4.92 AE 0.07 and 4.74 AE 0.13 for the La and Ti sites with the cost of a slight increase in R value. The R values before and after the application of the prior probability were 0.110 and 0.111, respectively. One can define an arbitrary PðHÞ, which will help to find a physically reasonable solution in a short time.

Conclusions
We have developed MC analysis software for the CTR scattering from perovskite-type transition metal oxide interfaces. The performance of the software was demonstrated by using a five-unit-cell-thick LaAlO 3 ultrathin film on an SrTiO 3 substrate as an example. The precision of the structural parameters estimated from the VM data analysis was AE0.01 and AE0.04 Å for the displacement of metal and O atoms, and that for the occupancies was AE0.03. The accuracy of the structural parameters was also examined, and it was found that the accuracy was similar to the precision. Experimental data were successfully analyzed by the same procedure. We are planning to apply this method to films with some inhomogeneity in thickness.