Interactive visualization of multi-data-set Rietveld analyses using Cinema:Debye-Scherrer

A tool to visualize the results of a series of Rietveld analyses is presented, allowing identification of analysis problems, prediction of suitable starting values and acceleration of scientific insight from the experimental data.


Introduction
Modern diffraction instruments at neutron and synchrotron sources produce data sets at ever faster rates. Rietveld analysis (Rietveld, 1969; Young, 1993) is the standard analysis method for powder diffraction data. In many cases it fits tens of parameters to derive crystallographic parameters (e.g. lattice parameters, atomic positions, atomic displacement parameters), microstructural parameters (e.g. micro-stresses or particle sizes from peak broadening, or texture from peak intensities) and instrumental parameters. Some techniques and tools exist to automate the analysis, e.g. seqgsas in GSAS (Larson & Von Dreele, 2004), batch mode for MAUD (Lutterotti et al., 1999; Lutterotti, 2011), the sequential analysis in GSAS-II (Toby & Von Dreele, 2013), and SrRietveld for both GSAS and FullProf (Tian et al., 2013). These tools typically rely on a template refinement that converges when the experimental data are replaced by a new data set while the set of refinable parameters remains identical to the template. Script-based approaches such as gsaslanguage (Vogel, 2011) allow more flexibility. They can apply refinement strategies when a template refinement is used for a new data set (e.g. fix all parameters except scale and background and refine, then refine further parameters incrementally). They can also query parameter values to allow conditional refinements (e.g. refine lattice parameters only if the volume fraction exceeds a threshold). These tools automate the analysis, but a large data set of results from several tens or hundreds of Rietveld refinements still poses several challenges:
- inspecting the results (parameter values as well as graphical inspection of experimental intensities and fit);
- predicting parameter starting values likely to lead to convergence of the refinement;
- ultimately, gaining scientific insight.
Since Rietveld refinements frequently vary tens or even hundreds of parameters, inspecting the refined parameters of a larger series of Rietveld refinements is not trivial. Simultaneous multi-histogram refinements complicate the situation further. A single diverged parameter can invalidate the analysis of an entire experimental data set, making the identification of outliers mandatory. To the best of the authors' knowledge, no tool is currently available that rapidly identifies the parameter leading to divergence in some data sets. Similarly, a tool to efficiently inspect tens or hundreds of single- or multi-histogram Rietveld analyses graphically is missing. If real-time analysis is available, addressing problems related to the experiment may be crucial for a successful beam time; an example is too small or too large sample volumes. In many cases, trial-and-error approaches or cumbersome single-peak fitting are necessary to identify starting values for parameters (e.g. lattice parameters) of unknown structures (e.g. in a solid solution or at elevated temperature or pressure). A tool could use successful refinement results to predict starting values likely to lead to convergence, on the basis of external parameters (e.g. composition or temperature); such a tool would accelerate the data analysis. Finally, consider a tool that identifies trends in the highly multi-dimensional refinement results by correlating external parameters (sample composition or temperature) with refinement results (phase volume fractions, atomic positions). It would accelerate the pace of scientific insight from a series of powder diffraction analyses. A tool that readily visualizes relationships between any two parameters would expedite analyses that lead to new scientific discoveries.
If real-time data analysis is available at the beamline during the experiment, early identification of some relationships may direct the experimental design if the analyst notices a particular region of interest in the parameter space that they would like to sample more finely. In addition, an interactive graphic interface with a simple input format and connection to automated workflows, i.e. automated data collection and data analysis, would alleviate some of the user-interface challenges between workflow data and visualization tools.
While tools to visualize raw scattering data exist, e.g. Mantid (Arnold et al., 2014), the tool described here specifically deals with visualization of analyzed scattering or diffraction data. Similarly, tools to visualize individual crystal structures exist, e.g. VESTA (Momma & Izumi, 2011), but results of parametric studies or diffraction parameters related to microstructural parameters such as peak width cannot be visualized with such tools. The commercial PolySNAP3 visualization and analysis tool for high-throughput diffraction data (Barr et al., 2009) provides statistical analysis of the raw diffraction data, but not Rietveld analysis results.
In this article, we introduce an interactive data exploration tool, Cinema:Debye-Scherrer, and demonstrate how it can aid in the validation of Rietveld analyses, scientific discovery and experimental design. In Section 2, we present the visualization tool Cinema:Debye-Scherrer, which allows users to quickly get an overview of large data sets in order to identify outliers and interesting relationships between parameters. A script to extract relevant data from GSAS Rietveld refinements is also introduced. In Section 3, we demonstrate our visualization approach with Cinema:Debye-Scherrer on example data sets from parametric diffraction studies of uranium-niobium samples and caesium plumbo tribromide. Finally, in Section 4 we summarize our approach to the visualization of diffraction data results.

Cinema:Debye-Scherrer
Cinema is an open-source, image-based approach to extreme-scale data analysis. It is developed predominantly at Los Alamos National Laboratory as part of the Exascale Computing Project. In extreme-scale scientific simulations, data analytics risks becoming a bottleneck to scientific discovery. Cinema addresses this problem by making in situ visualization an integral part of the workflow. Cinema provides a framework for a highly interactive, image-based approach to data analysis that promotes exploration of, for example, simulation results or, in this particular case, Rietveld analysis results. Two kinds of custom components make Cinema adaptable to different problems: pipelines that interface with data stored in different formats, and viewers for data display. Cinema:Debye-Scherrer is the adaptation of the Cinema viewers to powder diffraction data sets, together with the subset of viewers chosen for inspecting Rietveld analysis results. Its name refers to the Debye-Scherrer camera.
2.1.1. Input format. The input specifications to Cinema were chosen with generality in mind. The input is a comma-separated values (CSV) file in which the final set of columns may contain paths to images. This is referred to as a Cinema database. The Cinema database specifications were designed for an image-based approach to the visualization of extreme-scale (>10¹⁵ FLOPS) in situ simulation data (O'Leary et al., 2016). However, because CSV is a widely used file format, the input to Cinema:Debye-Scherrer should be portable to many other applications as well.
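Below is a minimal sketch of such a database. It writes and reads a small CSV file whose last columns hold image paths, using only Python's standard csv module. Three things are assumptions made for illustration: the column names, the values, and the convention that image-path columns start with a FILE prefix. Consult the Cinema database specification for the authoritative naming rules.

```python
import csv
import io

# Hypothetical rows of a Cinema-style database. Column names and values are
# made up; the "FILE" prefix marking image-path columns is an assumed convention.
EXAMPLE_ROWS = [
    {"sample": "S01", "a_alpha": "2.85", "wtfrac_alpha": "0.12",
     "FILE_hist1": "images/S01_hist1.png", "FILE_hist2": "images/S01_hist2.png"},
    {"sample": "S02", "a_alpha": "2.86", "wtfrac_alpha": "0.30",
     "FILE_hist1": "images/S02_hist1.png", "FILE_hist2": "images/S02_hist2.png"},
]


def write_database(rows, stream):
    """Write rows as CSV: parameter columns first, image-path columns last."""
    fieldnames = list(rows[0].keys())  # dicts preserve insertion order
    writer = csv.DictWriter(stream, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


def read_database(stream, image_prefix="FILE"):
    """Read a database and split its columns into parameters and image paths."""
    reader = csv.DictReader(stream)
    records = list(reader)
    image_cols = [c for c in reader.fieldnames if c.startswith(image_prefix)]
    param_cols = [c for c in reader.fieldnames if c not in image_cols]
    return param_cols, image_cols, records
```

Here is a round trip through the two functions. Write EXAMPLE_ROWS to an io.StringIO buffer, rewind it, and call read_database. This returns the parameter columns and the image columns separately, which is the split a viewer needs: parameters go on axes, and images go to an image spread.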
2.1.2. Web-based visualization interface. Cinema:Debye-Scherrer is an interactive data exploration tool with a web-based interface implemented using the Cinema Components and D3 (Bostock et al., 2011) JavaScript libraries. A web-based interface has two advantages. The visualizations are displayed within web browsers, so they can be used on different platforms. The visualization code can also be placed on a server for remote sharing between collaborators. Fig. 1 shows an overview of the Cinema:Debye-Scherrer interface, which consists of several panels that display different types of visualizations or controls.
At the top of the interface is a parallel coordinates plot, which shows a set of points in an n-dimensional space drawn across n vertical axes. There is one vertical axis for each dimension. A dimension may be numerical (e.g. a lattice parameter value) or categorical (e.g. the composition shown on the first axis in the example). This visualization gives the analyst an overview of an n-dimensional data set, and the program's interactive interface allows the analyst to explore relationships between parameters. For example, a correlation appears as lines between two adjacent axes that do not cross, whereas an anti-correlation appears as lines that cross. The vertical axes may be rearranged to emphasize certain relationships in the data set. Interesting points or outliers may be highlighted (red lines in Fig. 1). A subset of samples can be selected by dragging a box vertically along a particular axis. A panel to the side of the parallel coordinates plot contains display options such as hiding axes or changing a particular axis to a logarithmic scale.
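The link between crossings and correlation can be made quantitative. Each axis maps values monotonically to heights, whether linearly or logarithmically. The straight segments of two paths between adjacent axes therefore cross exactly when the two samples are ordered oppositely on the two axes. Counting crossings is the same as counting discordant pairs. For C concordant and D discordant pairs, (C - D)/(C + D) is Goodman and Kruskal's gamma, which equals Kendall's tau when there are no ties. The Python sketch below illustrates this reading; it is not code from Cinema:Debye-Scherrer. It assumes both axes are drawn with the same orientation.

```python
from itertools import combinations


def crossing_statistics(left, right):
    """Count path crossings between two adjacent parallel-coordinate axes.

    left[i] and right[i] are the values of path i on the two axes. Both axes
    are assumed to have the same orientation and a monotonic scale, so two
    straight segments cross exactly when the paths are ordered oppositely.
    Pairs tied on either axis only touch at an axis and are not counted.

    Returns (crossings, score) with score = (C - D) / (C + D), where C counts
    concordant (non-crossing) and D discordant (crossing) pairs.
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(left)), 2):
        sign = (left[i] - left[j]) * (right[i] - right[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    total = concordant + discordant
    score = (concordant - discordant) / total if total else 0.0
    return discordant, score
```

If no pair of paths crosses, the score is 1. If every pair crosses, the score is -1, which is the anti-correlated case. An exactly linear anti-correlation additionally makes all paths meet in a single point. Many scattered crossings give a score near 0.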
At the bottom of the interface is a panel to switch between an image spread, a scatter plot and a sortable tabular display. The image spread can display multiple images for each sample in the data set. With the current interface to the GSAS Rietveld software, PNG files with measured data, Rietveld fit and difference curve are included for all histograms of a refinement. With slight modifications of the input file, photographs of samples, pole figures describing the texture or any other bitmaps could also be included. Scatter plots and tables are commonly used to inspect data sets from parametric diffraction studies, e.g. to plot unit-cell volume as a function of temperature or pressure. The scatter plot display in Cinema:Debye-Scherrer dynamically plots any two parameters from the input database. Examples of the scatter plot display are shown in Figs. 4 to 7.
Any dynamic selections or highlighted points in the parallel coordinates plot are communicated to the three displays in the lower panel. Selecting a range on any axis (e.g. a range of lattice parameter values) limits the displayed plots to the selected paths. Marking paths highlights the corresponding rows in the table view, and so on.
Highlighting an image in the image spread highlights the corresponding path. For example, Fig. 1 shows the image spread with a '1 2 3, 1/3' indicator on the second data set from the bottom right. Highlighting the left of that data set's two images highlights the path shown in blue in the multi-axis plot. Similarly, highlighted selections in the scatter plot and tabular displays propagate to the other visualizations. Hovering over a point in any of the visualizations displays its values for the displayed database columns (e.g. bottom left of Fig. 1).
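The selection logic can be sketched in Python rather than the JavaScript of the actual viewer. Brushed ranges act as a conjunction of filters, and a single shared selection object notifies every view. The function and class names are hypothetical and do not reflect the Cinema Components API. Comparisons with NaN are always false, so brushing an axis drops samples for which that parameter is undefined. This mirrors the behavior described for the urania weight fractions in the U-Nb example.

```python
import math


def select(records, brushes):
    """Indices of records lying inside every brushed range.

    records: list of dicts mapping column name -> float (NaN if unavailable).
    brushes: dict mapping column name -> (low, high), inclusive.
    Comparisons with NaN are False, so a brushed axis hides paths whose value
    on that axis is NaN; a missing column is treated the same way.
    """
    return [
        idx
        for idx, rec in enumerate(records)
        if all(low <= rec.get(col, math.nan) <= high
               for col, (low, high) in brushes.items())
    ]


class LinkedSelection:
    """Minimal observer: one shared selection pushed to every registered view."""

    def __init__(self):
        self.indices = []
        self._views = []

    def subscribe(self, callback):
        # Each view (parallel coordinates, scatter plot, table, image spread)
        # registers a callback that redraws it for the given row indices.
        self._views.append(callback)

    def update(self, indices):
        self.indices = list(indices)
        for callback in self._views:
            callback(self.indices)
```

An empty set of brushes selects every record, which corresponds to the unfiltered view.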
2.1.3. Workflow integration. A bash script has been developed that compiles the results from all GSAS PVE files in a directory (files containing parameter name, value and estimated standard deviation). The script uses the phase names defined in the GSAS experiment (EXP) files. It can accommodate phases that are present only in a subset of EXP files, as well as parameters that are refined only for certain files. A CSV file with external parameters (e.g. sample mass, temperatures or annealing times) can be integrated; such a file is readily produced with any spreadsheet software. The result is a CSV file that can be processed by Cinema:Debye-Scherrer, spreadsheet programs or other plotting software. Bitmap files with the experimental data, fit and difference curve for all histograms included in an analysis are also created and referenced in the resulting file. The script, gsas_prepare_cinema, is part of the gsaslanguage package but also works without the rest of the package.
These lightweight tools run quickly and do not add significant time to the analysis. Extracting the data from the 59 workflows described in this article with gsas_prepare_cinema takes of the order of seconds. The extracted data are also rendered quickly by Cinema:Debye-Scherrer: for data sets of a thousand samples with tens of parameters, the results appear in the web browser in approximately 1 s.
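The essential merging step of such a script can be sketched in Python. The real gsas_prepare_cinema is a bash script, and the GSAS PVE layout is not reproduced here. Instead, the sketch assumes a simplified, hypothetical format with one 'NAME VALUE ESD' triple per line, and the parameter names in it are made up. The point is the column handling. The header is the union of all parameters seen in any run. A parameter is written as NaN for a run in which it was not refined, or in which it belongs to a phase absent from that sample.

```python
import csv
import io
import math


def parse_pve(text):
    """Parse a simplified, hypothetical PVE-like listing: 'NAME VALUE ESD' per line."""
    params = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip titles, blank lines and lines not in the assumed layout
        name, value, esd = fields
        try:
            params[name] = (float(value), float(esd))
        except ValueError:
            continue  # three fields but not numeric: ignore
    return params


def compile_runs(runs, external):
    """Merge per-run parameters with external sample parameters.

    runs: {run_id: {param: (value, esd)}}; external: {run_id: {column: value}}.
    Every row gets the same columns; anything missing for a run becomes NaN.
    """
    ext_cols = sorted({c for cols in external.values() for c in cols})
    par_names = sorted({p for params in runs.values() for p in params})
    header = ["run"] + ext_cols
    for p in par_names:
        header += [p, p + "_ESD"]
    rows = []
    for run_id in sorted(runs):
        row = [run_id] + [external.get(run_id, {}).get(c, math.nan) for c in ext_cols]
        for p in par_names:
            value, esd = runs[run_id].get(p, (math.nan, math.nan))
            row += [value, esd]
        rows.append(row)
    return header, rows


def to_csv(header, rows):
    """Serialize the merged table as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()
```

Consider two runs where only one refines a UO₂ weight fraction. The other run's row then carries nan in that column, so both rows still share one header.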
2.1.4. Distribution and documentation. The source code and documentation of Cinema:Debye-Scherrer are available for download from https://github.com/cinemascience/cinema_debye_scherrer. New releases will also be posted there. Cinema:Debye-Scherrer was built with Cinema Components, a JavaScript library for constructing web-based visualization tools for Cinema databases. Some of the visualization capabilities in Cinema:Debye-Scherrer have been implemented in the Cinema Components library, which is available at https://github.com/cinemascience/cinema_components.
The script gsas_prepare_cinema takes the output from the gsaslanguage workflows and converts it to a Cinema database. It is included in gsaslanguage, which is available at https://github.com/Svennito/gsaslanguage.

Uranium-niobium
Fifty-nine samples were characterized by neutron diffraction. They cover four different U-Nb compositions, annealing times from minutes to years, annealing temperatures from 300 to 600 °C, and varying sample sizes. Neutron powder diffraction data were collected on the HIPPO (High-Pressure Preferred Orientation) instrument (Wenk et al., 2003) at the pulsed neutron spallation source LANSCE (Lisowski & Schoenberg, 2006) using a robotic sample changer (Losko et al., 2014). Data sets were collected for rotations of 0°, 67.5° and 90° around the vertical axis, allowing texture analysis following procedures described by Wenk et al. (2010). The HIPPO instrument has 1200 ³He detector tubes arranged on 45 detector panels on five rings around the incident beam. Their data were integrated for each of the five rings, resulting in five histograms per run. The histograms from the three rotations were then added ring by ring, giving five histograms for each sample. Any weak to moderate texture is randomized by this procedure, so the summed histograms correspond to effectively random grain orientations. All data sets were analyzed with the same gsaslanguage (Vogel, 2011) script. The analysis included monoclinic α″-U-Nb, tetragonal γ⁰-U-Nb, orthorhombic α-U and α′-U-Nb, and cubic γ-U. This choice is consistent with the U-Nb phase diagram and applicable time-temperature-transformation diagrams (Jackson, 1971; Hackenberg et al., 2011, 2015, 2017). In some cases, urania (UO₂, from sample oxidation) and aluminium (from sample holders) were detected. The phases to be included in the analysis were provided as command-line parameters to the analysis script, and a second script executed the analysis of all data sets in batch mode. The U-Nb phase composition was adjusted via the site-occupation factors in the cubic phase on the basis of the Jackson equation (Jackson, 1970), which relates lattice parameter to composition. This increased the accuracy of the weight fractions via mass-balance calculations.
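The ring-by-ring summation of the three rotations can be illustrated with a short Python sketch. It assumes that every rotation was reduced onto the same binning for a given ring, and it simply adds counts. Any normalization applied in the actual data reduction, for example to incident-beam monitor counts, is not described above and is not included.

```python
def sum_rotations(histograms):
    """Add the histograms of all sample rotations, ring by ring.

    histograms: {rotation_deg: {ring: list of counts per bin}}.
    Assumes identical binning for a given ring across rotations.
    Returns {ring: summed counts}; the input lists are not modified.
    """
    summed = {}
    for per_ring in histograms.values():
        for ring, counts in per_ring.items():
            if ring not in summed:
                summed[ring] = list(counts)  # copy so the input stays untouched
            elif len(counts) != len(summed[ring]):
                raise ValueError(f"binning mismatch for ring {ring}")
            else:
                summed[ring] = [s + c for s, c in zip(summed[ring], counts)]
    return summed
```

Three rotations with five rings each produce five summed histograms, one per ring, matching the five histograms per sample described above.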
After all data sets had been analyzed, the structural results were compiled into a single text file by gsas_prepare_cinema and merged with sample parameters such as annealing temperature and time. A metallographic interpretation of the results is the subject of a future publication.
3.1.1. Validation. After the analysis script had been developed with a few data sets, all 59 data sets were analyzed in batch mode. Fig. 2 illustrates the ability of Cinema:Debye-Scherrer to identify potential problems in some data sets. It shows the sample identifier, lattice parameters, unit-cell volume, peak width parameter and weight fraction of α-U. The majority of the analyses provided lattice parameters with a small variation. However, several analysis runs resulted in lattice parameters clearly outside the typical values from the majority of the 59 data sets. The paths of those runs are marked. It becomes immediately obvious that only two of the four outliers in the lattice parameters also result in distinct outliers for the unit-cell volume. In some cases, the lattice parameter deviations cancel each other out, and inspection of the unit-cell volume alone would not have revealed these outliers. Furthermore, in three of the four cases the peak width parameters are also above the typical values. In all cases, the weight fraction of α-U is low. These findings immediately caution the analyst about the results for these runs. Runs can be selected within a range on any of the axes. This allows inspection of the diffraction patterns produced by GSAS for these particular samples, including the difference curve. Such inspection may reveal problems common to the problematic analyses (see Fig. 1).

Figure 2
A parallel coordinates plot of sample identifier, lattice parameters a, b, c, unit-cell volume, peak width parameter and weight fraction of α-uranium (axes from left to right). The red highlighted paths show outliers for the lattice parameters.

Figure 3
A parallel coordinates plot of annealing temperature, annealing time and the weight fraction of the UO₂ phase. The bar on the UO2_WTFRAC (UO₂ weight fraction) axis shows the selected paths with samples that have a non-zero weight fraction for the UO₂ phase.

Figure 4
Full view of the Cinema:Debye-Scherrer interface with a scatter plot. The parallel coordinates plot shows composition, annealing temperature (in °C), α-U weight fraction and annealing time (in min).
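The validation above notes that lattice-parameter deviations can cancel in the volume. This follows from the first-order relation for an orthorhombic cell, V = abc, so that δV/V ≈ δa/a + δb/b + δc/c. A run with a too large by about 1% and b too small by about 1% therefore has an almost normal volume. The Python sketch below is a numerical stand-in for the visual check, not a feature of Cinema:Debye-Scherrer. It flags values that deviate from the series median by more than a relative tolerance (0.5%, an arbitrary choice), and it uses made-up lattice parameters rather than the measured data.

```python
import statistics


def relative_outliers(values, tol=0.005):
    """Flag values deviating from the series median by more than tol (relative)."""
    median = statistics.median(values)
    return [abs(v - median) > tol * abs(median) for v in values]


def orthorhombic_volume(a, b, c):
    """Unit-cell volume of an orthorhombic cell, V = a * b * c."""
    return a * b * c


# Made-up lattice parameters (in Å) for six runs; run index 4 has a too large
# and b too small by roughly the same relative amount.
A = [2.850, 2.851, 2.849, 2.850, 2.880, 2.850]
B = [5.870, 5.869, 5.871, 5.870, 5.809, 5.870]
C = [4.950, 4.951, 4.949, 4.950, 4.950, 4.950]
V = [orthorhombic_volume(a, b, c) for a, b, c in zip(A, B, C)]
```

With these numbers, run 4 is flagged on both the a and b lists, but no run is flagged on the volume list. This is the cancellation that makes a volume-only check insufficient.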
3.1.2. Parameter relationships. Fig. 3 illustrates the ability to gain scientific insight from a data set. It shows composition, annealing temperature and time together with the urania weight fraction. Selecting the entire range of weight fractions on the corresponding axis hides the paths of samples where no urania was found (indicated by 'NaN' on that axis). The parallel coordinates plot shows that oxidation occurred predominantly for medium annealing times and, except for three samples, only for the U-5.6Nb and U-5.9Nb samples. This could suggest a higher corrosion resistance of U-7.5Nb and U-7.7Nb, but in this case it is simply due to sample preparation. Fig. 4 shows composition, annealing temperature, annealing time and α-U weight fraction. The paths marked in red correspond to U-5.6Nb samples, and the same data points are also marked in red in the scatter plot. As introduced in Section 2, the crossings in the parallel coordinates plot can be read visually as a correlation, an anti-correlation or the absence of a correlation:
- Few crossings indicate a positive correlation, since as one parameter increases, so does the other.
- A strong negative correlation appears as most or even all of the paths crossing in a single point.
- Many crossings spread throughout the space between the axes indicate no correlation.

The axes can be rearranged by dragging the axis labels, so the analyst can inspect the correlation between two parameters by positioning their axes next to each other. This allows us to inspect correlations between α-U weight fraction, annealing time and annealing temperature. The many crossings of paths between annealing temperature and α-U weight fraction indicate that there is no correlation between these two parameters.
On the other hand, few paths cross between the α-U weight fraction and the annealing time. This indicates that increasing annealing time leads to an increased α-U weight fraction. The scatter plot of all α-U weight fractions against annealing time confirms this trend. This behavior is typical of what is captured in time-temperature-transformation (TTT) diagrams. These diagrams typically show so-called 'noses'. At temperatures near a nose, in the intermediate temperature range, short times are sufficient to transform significant volume fractions of the material. Above and below the nose, longer times are required. The general trend holds for the U-Nb system characterized here: longer annealing times lead to larger volume fractions of a new phase, α-U in this case. However, the TTT noses prevent a 'perfect' correlation with temperature. Similarly, the few crossings in Fig. 2 between the c lattice parameter and the unit-cell volume reflect the correlation between lattice parameter and unit-cell volume, which is obvious in this case. The few crossings in Fig. 5 between the γ-U lattice parameter and the weight fraction of the same phase indicate a correlation between those parameters (see below). Cinema:Debye-Scherrer would allow 'discovery' of such systematics. Fig. 4 also shows that no samples with short annealing times were measured for the U-5.6Nb composition (marked in red). This could indicate problems with the experiment design; in the present case, however, these samples had simply not yet been characterized. Fig. 5 shows composition, annealing time and temperature, γ-U lattice parameter, and γ-U weight fraction. Almost no paths cross between the γ-U lattice parameter a and the γ-U weight fraction, indicating a correlation between the two parameters. The scatter plot of γ-U weight fraction versus γ-U lattice parameter shows a linear relationship for a large range of lattice parameters, which was first identified by Jackson (1970).
However, our data show that for larger lattice parameters the relation might be nonlinear.

CsPbBr₃
The crystal structure of caesium plumbo tribromide, CsPbBr₃, was investigated by neutron diffraction as a function of temperature using HIPPO (see above for a description and references). The material has a perovskite-type structure that is orthorhombic at room temperature. It transforms to tetragonal and cubic crystal structures at 88 and 133 °C, respectively (Møller, 1959; Hirotsu et al., 1974; Stoumpos et al., 2013). Neutron diffraction patterns were collected in 10 °C steps from 35 to 175 °C and in larger increments up to 400 °C. Results of this study will be reported in a forthcoming paper.
3.2.1. Validation. Fig. 6 shows sample temperature, reduced χ², lattice parameters and unit-cell volumes of the three polymorphs. The paths between temperature and reduced χ² rank the fit quality of the three phases:
- The runs in the temperature range of the orthorhombic phase have the lowest reduced χ² values and therefore the best fit quality.
- The tetragonal runs show the worst agreement between experimental data and Rietveld fit.
- The cubic runs lie in between (except for the two highest temperatures).

The scatter plot at the bottom of Fig. 6 confirms this. The crossover between the axes for the orthorhombic lattice parameters a and b indicates peculiarities in the thermal expansion behavior. The paths for the tetragonal phase, highlighted in red, show approximately constant increments in the unit-cell volume for the 10 °C temperature increments. However, the center path of the five shows a larger a and a smaller c lattice parameter than expected from the increments between the remaining paths. This peculiarity would have gone undetected if only the unit-cell volume had been considered. It would also have gone undetected if only the reduced χ² had been considered, since this run's reduced χ² does not differ from the other runs for this phase. Such a feature may be detected during experiment design or by real-time data analysis and visualization. In that case, more data points could be collected in that temperature range to establish whether the observation is an artifact of the data analysis.
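The tetragonal-phase peculiarity is an irregular increment rather than an outlying value, so a check on consecutive differences is the natural numerical counterpart. The Python sketch below assumes equally spaced temperatures; otherwise each step should be divided by its temperature interval. It compares every step with the median step. The tolerance and the short series are made up for illustration and are not the measured CsPbBr₃ values.

```python
import statistics


def increments(values):
    """Differences between consecutive entries of a temperature series."""
    return [b - a for a, b in zip(values, values[1:])]


def irregular_steps(values, rel_tol=0.5):
    """Indices i of steps values[i] -> values[i + 1] whose increment differs
    from the median increment by more than rel_tol times that median."""
    steps = increments(values)
    median = statistics.median(steps)
    return [i for i, d in enumerate(steps) if abs(d - median) > rel_tol * abs(median)]


# Made-up series for five equally spaced temperatures (10 °C apart).
a_tet = [5.900, 5.905, 5.914, 5.915, 5.920]  # centre point sits high
volume_tet = [400.0, 401.0, 402.0, 403.0, 404.0]  # evenly spaced volumes
```

For a_tet, the two steps adjacent to the high centre point (indices 1 and 2) are flagged. The evenly spaced volume series yields no flags. This reproduces in miniature the situation in which the volume alone hides the anomaly.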
3.2.2. Parameter relationships. Fig. 7 shows the reduced χ², temperature, unit-cell volume and lattice parameters of the orthorhombic phase of CsPbBr₃ only. This subset was obtained by selecting the range of values on the unit-cell volume axis, which excludes the runs for which this parameter is not available. The unit-cell volume increases with approximately constant increments as the temperature is increased. Since no paths cross between unit-cell volume and lattice parameter a, a also increases with temperature. However, crossings occur between lattice parameters a and b as well as between b and c, indicating deviations from this proportionality. The scatter plot of the lattice parameter c as a function of temperature shows an initial expansion along this direction. At 65 °C, CsPbBr₃ starts to contract along c, which produces the crossings of the paths in the multi-axis plot.

Figure 6
Multi-axis plot of temperature, reduced χ², lattice parameters and unit-cell volumes of orthorhombic, tetragonal and cubic CsPbBr₃. The scatter plot shows the reduced χ² as a function of temperature.
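The onset of contraction along c corresponds to a sign change of the step-to-step change in the lattice parameter. Below is a minimal Python sketch that returns the temperatures at which consecutive changes have opposite signs. Its values are made up and chosen only to mimic the described behavior; they are not the measured data.

```python
def expansion_reversals(temperatures, values):
    """Temperatures at which a lattice parameter switches between expansion
    and contraction, i.e. where consecutive changes have opposite signs.

    temperatures[i] is reported because point i lies between step i - 1
    (values[i - 1] -> values[i]) and step i (values[i] -> values[i + 1]).
    """
    steps = [b - a for a, b in zip(values, values[1:])]
    return [
        temperatures[i]
        for i in range(1, len(steps))
        if steps[i - 1] * steps[i] < 0
    ]
```

A single noisy point produces two spurious reversals. For real data, one would therefore likely require a minimum step size or smooth the series first; that refinement is left out here.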
3.2.3. Experiment design. As discussed above, Cinema:Debye-Scherrer allowed us to identify changes in the increments of a and c in the tetragonal phase, while the unit-cell volume shows a constant increment for a constant temperature increment. If this discrepancy had been discovered during the experiment, this region could have been reinvestigated. Furthermore, during heating the c axis was found to reverse from expansion to contraction, while b changed from contraction to expansion. If this information had been available from real-time analysis during the experiment, this temperature region could have been reinvestigated with smaller temperature increments.

Summary and conclusions
The Cinema:Debye-Scherrer tool for multi-dimensional data visualization was described. We applied the tool to two cases: diffraction data sets from 59 U-Nb samples of different compositions, annealing times and annealing temperatures, and a study of the crystal structure evolution of CsPbBr₃ as a function of temperature. Both illustrated how this tool provides an efficient overview of Rietveld analysis results. The required data are extracted from standard result files of the widely used GSAS Rietveld analysis software. Since Cinema:Debye-Scherrer is based on CSV files, similar files can be produced for other analysis packages. The tool allows us to efficiently identify outliers of automated Rietveld analysis runs. At the same time, it offers efficient access to the graphical output of diffraction data, Rietveld fit and difference curves for multi-histogram refinements. In the case of the U-Nb samples, the automated Rietveld analysis takes several minutes per data set. This is owing to the complexity of the analysis, the simultaneous refinement of multiple histograms and the multitude of parameters in the low-symmetry phases present in the samples. The corresponding gsaslanguage command gathers all required information after the automated analysis within seconds, and Cinema:Debye-Scherrer displays the results almost instantaneously. Visualization of the parameter space covered during an experiment (annealing times and temperatures in this example) identifies gaps in experimental coverage. The presence or absence of crossing points in the paths between two axes allows us to quickly identify correlations, which can then be inspected further using scatter plots. This serves as an example of how new scientific insight can be gained at an accelerated pace. The tool offers three capabilities:
(i) quality control of (automated) Rietveld analysis;
(ii) identification of gaps in the experimental parameter space;
(iii) accelerated scientific insight.

All three are valuable for decision making during beam times when real-time data analysis is available. The tool has already proven useful for the management of large-scale sample sets and parametric studies at LANSCE. The visualization tool Cinema:Debye-Scherrer and the script for the extraction of GSAS Rietveld analysis results are available for download at the web sites provided in Section 2.1.4.

Figure 7
Multi-axis plot of reduced χ², temperature, unit-cell volume and lattice parameters of the orthorhombic phase of CsPbBr₃. The scatter plots show the c (middle) and b (bottom) lattice parameters as a function of temperature.