SUePDF: a program to obtain quantitative pair distribution functions from electron diffraction data

SUePDF is software to convert electron diffraction data to quantitative pair distribution functions.


Introduction
The pair distribution function (PDF) method is widely employed for studying structurally disordered materials (Warren, 1990;Egami & Billinge, 2002;Proffen et al., 2003). This is because it gives more structural information, beyond the information given by the traditional Bragg-peak-based analysis, from scattering data. Standard computer programs for PDFs for neutron and X-ray powder diffraction data are well established (Peterson et al., 2000;Juhá s et al., 2013). A number of efforts have been made to obtain PDFs from electron diffraction (ED) data (Cockayne & McKenzie, 1988;Tewes et al., 1994;Takagi et al., 2001;McBride, 2003;Ankele et al., 2005;Ishimaru et al., 2008;Abeykoon et al., 2012;Mu et al., 2013). However, these have not been sufficient to establish electron powder diffraction as one of the major data sources for PDF analysis. The main obstacle has been the multiple scattering of electrons, which may alter the scattering intensities in the ED patterns, making it difficult to extract an undistorted structure function (Uyeda, 1968;Cowley, 1969;Anstis et al., 1988;Cockayne & McKenzie, 1988). Beside this, the electron-matter interaction differs from the cases of X-rays and neutrons, implying that a dedicated procedure of data treatment is needed for ED data.
In order to obtain reasonable PDFs, ED data IðQÞ have to be scaled using appropriate electron scattering factors f e ðQÞ to yield a proper reduced structure function FðQÞ, where Q is the magnitude of the scattering vector. This is not an easy task because the ED data are generally contaminated by inelastic scattering and distorted by multiple scattering, instrument errors and noise. The widely reported solution for inelastic scattering is using an energy filter, but this may also cause additional distortion in the data when only electrons of certain energy-loss range can be blocked. Data scaling was previously carried out by multiplying the compositional averages hf 2 e ðQÞi and hf e ðQÞi 2 with a fitting parameter , which may be fixed by matching hf 2 e ðQÞi with IðQÞ at large Q (Cockayne & McKenzie, 1988;Cockcayne, 2007;Cockayne et al., 2010). An early software attempt to fit IðQÞ with hf 2 e ðQÞi for PDF extraction was made by drawing and then subtracting a background of IðQÞ (Hauschild, 2009). However, because the mathematical curve presented for this background is generally not suitable, the background is often constructed manually. The next software attempt was made by Mitchell & Petersen (2012), in which an additional fitting parameter was introduced to counter the discrepancy between IðQÞ and hf 2 e ðQÞi. This meant that FðQÞ was obtained by matching hf 2 e ðQÞi À with IðQÞ at large Q. In other work, instead of introducing an additional parameter, FðQÞ was directly adjusted by fitting the distortion feature to a fourth-order polynomial (Mu et al., 2013).
The program SUePDF aims to effectively correct ED data by taking into account the multiple scattering features and uniqueness of electron-matter interactions. For amorphous materials, the coherent multiple scattering can be considered as insignificant and incoherent multiple scattering should merely contribute to the background (Cowley, 1992), which may be modeled and subtracted. For crystalline-like nanoparticles, coherent multiple scattering should not affect the peak positions of the resulting PDFs but only the peak intensities. This has been demonstrated to be the case for crystals having a thickness of less than five times the electron mean free path (Anstis et al., 1988). The problem with modulated peak intensities caused by coherent multiple scattering is tackled in SUePDF by a renormalization procedure based on number densities, and the probability of having an atom at a certain distance is non-negative. In general, the major distortion of IðQÞ can be considered to form a smooth background, which is built up from the direct-beam tail, inelastic continuum and incoherent multiple scattering. SUePDF employs the combination of an optimizing parameterization and a reasonable mathematical model for the background. Inside SUePDF, background subtraction is coupled with data scaling in a loop-based routine to optimize the data treatment. After background subtraction and data scaling, the PDF can be obtained by a Fourier transform of the normalized data. On the basis of the physical meaning of PDFs, noise filtering and normalization procedures have to be routinely carried out. This improves the physical reliability of the outcome PDFs and allows the uncertainties to be evaluated. SUePDF also offers the possibility to correct finite size effects present for nanoparticles, by using a nanoparticle form factor computed for a given size and shape.

Background modeling
The smooth background observed for ED data is considered to be the contributions from the direct-beam tail, inelastic scattering continuum and incoherent multiple scattering. For an elastic scattering of a -wavelength electron with a semi-angle , Q ¼ 4 sin =, but in the case of smallangle inelastic scattering (Egerton, 2011) where ¼ E ÁE=½K E ð1 þ E Þ is the characteristic angle corresponding to an energy loss ÁE; E ¼ ð1 þ K E =E 0 Þ is the relativistic factor with K E the kinetic energy and   Normalized IðQÞ of a nanoporous carbon sample (back solid) and the corresponding hf 2 e ðQÞi ¼ f 2 e ðQÞ; the inset shows a ring pattern reconstructed from this normalized IðQÞ.
For example, if the electron energy loss is counted up to 2000 eV for an incident beam of 200 keV, then the inelastic beam tail is limited to below Q inel ð2000Þ ' 1:46 Å À1 . For lowloss electrons (ÁE < 50 eV), the inelastic component Q inel ð<50Þ is less than $0.036 Å À1 . Equation (2) means that the inelastic error is more significant for electrons of high energy loss. It is well known that high-loss electrons can be excluded using an energy filter. However, according to our experience, energy filtering also modifies the background, making an accurate quantitative background modeling difficult. The following mathematical model is introduced to fit the background BðQÞ of the electron powder diffraction pattern: Equation (3) is actually a positive-degree portion of a Laurent-type series where c k are the fitting parameters and N is the fitting order, which may vary for different sample compositions. A comparison between the power-law model and the Laurent-type model [described in equation (3), with N ¼ 7] for background fitting of electron powder diffraction data acquired from a nanoporous carbon sample is shown in Fig. 1. In addition, this background model works in both thin and thick samples. This is demonstrated in a study of amorphous silica discussed in x4.2. The measured bond lengths and the values reported in the literature are listed there and are in good agreement.

Optimization procedure for data scaling and background modeling
Electron scattering factors, denoted as f e ðQÞ, are used for data scaling. These factors may be obtained using the following Mott-Bethe formula (Mott & Massey, 1965): where Z is the atomic number, V is the valence number, a 0 ' 0:53 Å is the Bohr radius and f X ðQÞ is the X-ray scattering factor (Brown et al., 2006). For neutral atoms, it is recommended to use the available DFT-computationbased parameterizations (Kirkland, 2010), which are considered to be more accurate than those from the Mott-Bethe formula, particularly at low Q values. For samples composed of more than one element, the chemical compositions are required as input, together with the corresponding valence for each element.
The data scaling is constrained by a mathematical feature of the structure function SðQÞ. That is, SðQÞ ! 1 when Q ! 1. The background-subtracted and normalized scattering intensity IðQÞ is converted into SðQÞ by using the compositionaveraged hf 2 e ðQÞi and hf e ðQÞi 2 (Warren, 1990): The data scaling is done according to where I raw ðQÞ is the raw data of scattering intensity. The feature SðQÞ ! 1 when Q ! 1 restricts IðQÞ to attenuate around hf 2 e ðQÞi at high Q values. This suggests a necessary minimization of the following quantity, defined at the tail of IðQÞ: In SUePDF, the minimization of equation (7)   fitting order N, which are initially input by users, to optimize the normalized IðQÞ. A normalized IðQÞ of a nanoporous carbon sample, scaled by the corresponding hf 2 e ðQÞi ¼ f 2 e ðQÞ, is shown in Fig. 2.

Fourier transform of F(Q) to yield reduced PDF
The reduced PDF GðrÞ is obtained by a Fourier transform of the reduced structure function FðQÞ ¼ Q½SðQÞ À 1: FðQÞ sinðQrÞ dQ: The red dotted curve shown in Fig. 3(a) is the unfiltered GðrÞ for nanoporous carbon after the Fourier transform [equation (8)]. Besides being convolved with the termination function ½sinðQ max rÞ À sinðQ min rÞ=ðrÞ (Peterson et al., 2003), this unfiltered GðrÞ exhibits some artifact peaks at low (<1 Å ) and high (>20 Å ) values of r, which are equivalent to the low-and high-frequency noise, respectively, of the experimental data.

PDF normalization and noise filtering
The unfiltered GðrÞ obtained from equation (8) has to be adjusted by physical and mathematical constraints. These constraints refer back to the definition of PDF gðrÞ, which represents the probability density of finding a pair of two atoms separated by distance r (Egami & Billinge, 2002). Therefore gðrÞ, the probability density, has to be non-negative: where ðrÞ is the nanoparticle form factor (Kodama et al., 2006;Gilbert, 2008;Tran et al., 2016) [for bulk, ðrÞ 1] and 0 is the average number density of the sample. The normalization of the PDFs is based on equation (9). Prior knowledge about the shortest interatomic distance existing in the sample is used as a physical constraint, such that the PDF gðrÞ at distances smaller than a value r min is set to zero: Equation (10) is used to filter off lowfrequency noise. The high-frequency noise can be filtered off by setting an upper cut-off distance r max , where GðrÞ ¼ 0 at r > r max . This noise treatment of GðrÞ is demonstrated in Fig. 3(a). A Fourier back-transform of the treated GðrÞ then yields the noise-filtered FðQÞ data, shown in Fig. 3(b), for the here given example from a nanoporous carbon sample.

Evaluation of uncertainty
The treated reduced PDF G T ðrÞ is compared with G BFT ðrÞ, which is the Fourier transform of the [1.2-20 Å ] band-passfiltered FðQÞ, in Fig. 4(a). The observed difference between G T ðrÞ and G BFT ðrÞ is the consequence of the cutting of low and high frequencies which is propagated through the Fourier transform. Therefore, the uncertainty considered here is mainly associated with the low-and high-frequency noise, which correspond to the artifact short-and long-distance peaks found in the unfiltered GðrÞ, respectively. From Fig. 3(b), it is clear that the low-frequency noise can be attributed to some low-frequency distortions of the FðQÞ data. These distortions may be caused by the imperfection of the instrumental setup, the errors encountered during the background subtraction and data scaling using the approximated electron scattering factors. One of the typical errors is the beam convergence, which generally does not affect the positions of PDF peaks but does affect their intensities (McBride, 2003). Beside these errors, in the case of coherent structures, the low-frequency distortions may be related to the coherent multiple scattering which might not be treated Uncertainty in the noise filtering of PDFs: (a) treated G T ðrÞ (red dotted line) compared with G BFT ðrÞ (blue solid line) which has been transformed from [1.2-20 Å ] band-pass-filtered FðQÞ; (b) the corresponding normalized PDFs g T ðrÞ (red dotted line) and g BFT ðrÞ (blue solid line) and the difference between these (black line). From these, the evaluated uncertainty is $3.7%.
properly by background subtraction. The high-frequency noise, which is believed to be more random, may come from various sources. Some probable sources are the electromagnetic environment, mechanical instability of the instrument etc. Note that the properties of the recording media are of importance and can influence the results in both images and diffraction significantly, in particular resolution and sensitivity (Ruskin et al., 2013). In general, the higher the number of pixels in the detector, the better the resolution that can be achieved. The current generation of detectors, which are of CCD and CMOS type and have either 1000 Â 1000 or 2000 Â 2000 pixels installed in a typical electron microscope configuration, can provide significantly high Q max for most applications. A detector with better dynamical range and sensitivity can have the advantage of handling strong diffraction in the low-Q range and acquiring a weaker signal in the higher-Q range. As a result, the extended Q range can improve the overall quality of the ED-based PDF. The optimization of the acquisition condition for a particular detector can help to obtain quality electron diffraction patterns and therefore EDbased PDFs, especially to minimize artifacts given by the detector, such as streaking, the trace of the beam path given by the shutter, blooming etc.
The uncertainty of the normalized PDFs can be evaluated from the relative r.m.s. difference between the corresponding g T ðrÞ and g BFT ðrÞ: U gð½r min ;r max Þ ¼ r:m:s: g T ðrÞ À g BFT ðrÞ Â Ã r:m:s: g T ðrÞ Â Ã : ð11Þ U g([1.2 Å , 20 Å ]) ' 3.7% for the case of the nanoporous carbon sample shown in Fig. 4(b).

Electron total scattering profile input
The input data for SUePDF v1.0 is a one-dimensional electron total scattering intensity profile. The input file must have a simple x-y two-column format. The x column is the s values (in Å À1 when calibrated; note that Q = 2s) and the y column is the corresponding intensity I(Q). Multiple input files may be selected for integration of different diffraction data of the same sample and under the same experimental conditions. Multiple input files must be synchronized with the same format and the same data size. Fig. 5 shows an overview of the graphical user interface (GUI) after inputting a data file using 'INPUT BROWSER'.

Loading electron scattering factors
Databases of parameterization for both electron (Kirkland, 2010) and X-ray (Brown et al., 2006) scattering factors (or atomic form factors) are implemented in SUePDF. In cases of neutral atoms the electron database will be used; otherwise the X-ray database will be loaded in order to calculate the electron scattering factors for ions via the Mott-Bethe formula [equation (4)]. Besides the electron energy (in keV), chemical information, including the elemental composition, molar ratios and valences, is required as input to load appropriate electron scattering factors for data scaling. Fig. 6 shows the GUI panel for inputting chemical information and electron kinetic energy and calculation of electron scattering factors.

Background optimization
The background optimization is based on user input of the following: The GUI for calculating the electron scattering factor by inputting chemical information and electron kinetic energy.

Figure 5
An overview of the SUePDF GUI after inputting the data file of an electron scattering profile.
(i) Two points specifying the pre-peak background (corresponding to the lowest momentum transfer) and the tail.
(ii) The number of middle background reference points: these points will be automatically positioned as initial conditions between the previous two points of the pre-peak and the tail. Their positions will vary along the curve of the raw scattering profile to optimize the background. It is noted that these reference points generally do not lie on the background; their distances to the background are refined while their positions vary. The typically recommended number of these reference points is 3-8. Use of larger numbers of reference points consumes more computation time.
(iii) The maximum fitting order: typically recommended orders are 5-15. Larger fitting orders consume more computation time. Fig. 7 shows a background optimized in the SUePDF GUI, together with the scaled I(Q) and S(Q).

S(Q) correction and highfrequency noise filtering
SUePDF offers a routine for correction of the S(Q) tail. This routine is optional and only recommended when a good enough solution for the S(Q) scaling problem cannot be found by background optimization. The correction procedure is based on calculation of the median curve of the S(Q) tail. There are two steps: (1) Specifying the to-be-corrected tail of S(Q).
(2) Tuning the order of the median fitting to achieve a corrected tail of S(Q). The tuning may also be done automatically by pressing the 'Auto optimization' button.
High-frequency noise filtering is recommended because ED data usually contain high-frequency noise, which is more visible at high Q values as spiky oscillations. The filter is based on forward-and-back Fourier transforms. A cut-off distance is required as a user input. The cut-off distance serves as the 'highest frequency' allowed in the ED data and is supposed to relate to the atomic structure.

Nanoparticle form factor
The nanoparticle form factor takes into account the size and shape of the sample and quantifies how much they affect the normalized PDF. For a bulk sample this factor is unity. SUePDF v.1.0 offers the calculations for four basic shapes: sphere, cuboctahedron, cube and truncated cube (Tran et al., 2016). When working with nanoparticle samples, the user must input their size and shape to load the appropriate form factor. If this information is not loaded, the default form factor will be unity (for bulk). Fig. 8 shows the GUI panel for calculation of the nanoparticle form factor.

PDF renormalization
This renormalization is to achieve the quantitative PDF g(r). The renormalization procedure is based on the following: (a) The non-negativity of g(r) as probability density.
(b) The cutting off of short unphysical distances producing low-frequency distortion in the data (low-frequency filtering). This requires a user input of a lower cut-off distance (based on general prior knowledge of the shortest interatomic distance existing in the sample).
(c) The revision of number density. SUePDF is able to deduce a value of number density from the normalized ED   data. If a better value of number density is known already, it should be used as the correction for the deduced value. In a good case of data processing of nanoscale samples, the deduced value can be very close to the generally accepted one. Fig. 9 shows the PDF renormalization by cutting off low-frequency distortions and number density revision.

PDF quantification
As shown in Fig. 10, the coordination number can be measured by specifying an integration window for the relevant g(r) peak. The background-subtracted and noise-filtered electron scattering profile (which should be the extracted kinematical scattering data) can be reconstructed back into a ring pattern, shown as the inset of Fig. 10.

Gold nanoparticles
An example of using SUePDF to study $5 nm sized Au nanoparticles supported on an amorphous carbon film is shown in Fig. 11. The particles are considered to be close to spherical in shape (thus, the nanoparticle form factor of a 5 nm sphere was chosen), although some facets resembling cuboctahedral morphology can be seen in the high-resolution transmission electron microscopy (TEM) image (inset of Fig. 11a). To obtain reliable ED data for the Au nanoparticles, an ED data set of an equivalent blank carbon film was collected as the substrate reference. The PDF based on ED data with Q min = 1.5 Å À1 and Q max = 12.5 Å À1 is shown in Fig. 11(b) (red) in a comparison with the theoretical PDF of a 5 nm spherical model of a perfect face-centered cubic (fcc) Au nanoparticle (blue). PDF renormalization: the cutting off of low-frequency distortion and number density revision; g(r) is shown in the inset.

Figure 10
Measuring coordination number and reconstruction of the kinematical ED pattern.

Figure 11
(a) TEM images of $5 nm sized Au nanoparticles including a high-resolution image (inset); (b) EDbased PDF obtained using SUePDF (red) compared with the theoretical PDF of a 5 nm spherical Au nanoparticle model having perfect fcc structure (blue). apertures (marked with dashed blue and solid red circles for the thin and thick areas, respectively) defining the regions for ED acquisition. The corresponding (reduced) PDFs obtained using SUePDF are shown in Fig. 12(c). The PDF of the thin area (dashed blue line) was obtained from ED data with Q min ' 0.4 Å À1 and Q max ' 12 Å À1 . The PDF of the thick area (solid red line) was obtained with the same Q max but a slightly higher Q min (0.55 Å À1 ) in order to cut off the possible increase of inelastic scattering. Note that, because of the limited Q max , the EDbased PDFs are generally broader than X-ray/neutron-based PDFs and the termination ripples may interfere significantly with some low and broad peaks of amorphous materials (e.g. Si-Si peaks). The quantitative measurements of bond lengths and coordination numbers for both the thin and the thick areas are listed in Table 1, together the reference data from neutron scattering (Keen & Dove, 1999) and molecular dynamics simulation of bulk amorphous silica (Hoang, 2007). The bond lengths found by SUePDF do not change significantly from the thin to the thick areas. On the other hand, the average coordination numbers (measured with a number density of 0.065 Å À3 ) do vary from the thin to the thick region. Apart from the possible sources of error specified in x2.5, this reasonable variance could be attributed to the finite size of the studied regions and local effects when the ED data are obtained from small amounts of sample in a transmission electron microscope, which are not large enough to be fully considered as bulk samples. Fig. 13 shows an ED-based PDF of Cu 0.475 Zr 0.475 Al 0.05 metallic glass in comparison with the corresponding X-ray data (Kaban et al., 2015). The ED data have [Q min , Q max ] ED = [0.9, 12.2] Å À1 , while the X-ray data have [Q min , Q max ] X = [0.7, 21.1] Å À1 . Note that the difference in Q range can cause different termination effects on the ED-based PDF and X-ray PDF. Besides this, the sample amount in a TEM-based ED experiment is much less than the amount in an X-ray experiment, suggesting that the information given by the ED-based PDF is more local than the information given by the X-ray PDF.

Environment and distribution of SUePDF
SUePDF is written in the MATLAB language and is compiled as a stand-alone GUI program, which only requires the free MATLAB Runtime R2015a (or newer) environment (http:// www.mathworks.com) installed on a 64 bit Windows platform (Windows XP or newer is recommended). The installer (SUePDF_Installer.exe file) will automatically download (internet connection required) and install the MATLAB Runtime environment before installing SUePDF when executed.  Table 1 Bond lengths and coordination numbers of amorphous silica found by using SUePDF on the ED data obtained from thin and thick areas.
These values are compared with the results from neutron (N) scattering (Keen & Dove, 1999) and molecular dynamics (MD) simulations (Hoang, 2007 Figure 13 ED-based PDF (blue) of Cu 0.475 Zr 0.475 Al 0.05 metallic glass obtained using SUePDF compared with the corresponding X-ray data (Kaban et al., 2015) shown in red.

Figure 12
Comparison between thin and thick areas of amorphous silica for obtaining ED-based PDFs using SUePDF. (a), (b) TEM images of thin and thick areas, respectively, for ED acquisition; (c) the corresponding ED-based PDFs for the thin (dashed blue line) and thick (solid red line) areas.
SUePDF is distributed as free software for academic users, with an installer file and a manual document available for free download at https://osf.io/c2jq8/.

Summary
We have described the implementation of SUePDF, a GUI program dedicated to structural analysis based on electron diffraction data. SUePDF facilitates TEM-based structural studies of amorphous materials and nanoparticles by converting the electron diffraction data in the reciprocal space into quantitative PDFs in the direct space. SUePDF employs the scattering physics of electrons as well as the physical meaning of PDFs to achieve reliable data normalization. Noise is treated in SUePDF by band-pass Fourier filtering that also allows the evaluation of uncertainties caused by experimental conditions and data treatment procedures. Examples of using SUePDF to obtain quantitative PDFs of crystalline Au nanoparticles, amorphous silica and amorphous Cu 0.475 Zr 0.475 Al 0.05 metallic glass have been demonstrated.