Representational analysis of extended disorder in atomistic ensembles derived from total scattering data

Representational analysis is used to characterize correlated short-range order in large atomistic ensembles. This method, analogous to tight-binding methods, enables the extraction of relevant structural parameters in an orthogonal and local basis that permits robust statistical analysis of crystalline disorder.

With the increased availability of high-intensity time-of-flight neutron and synchrotron X-ray scattering sources that can access wide ranges of momentum transfer, the pair distribution function method has become a standard analysis technique for studying disorder of local coordination spheres and at intermediate atomic separations. In some cases, rational modeling of the total scattering data (Bragg and diffuse) becomes intractable with least-squares approaches, necessitating reverse Monte Carlo simulations using large atomistic ensembles. However, the extraction of meaningful information from the resulting atomistic ensembles is challenging, especially at intermediate length scales. Representational analysis is used here to describe the displacements of atoms in reverse Monte Carlo ensembles from an ideal crystallographic structure in an approach analogous to tight-binding methods. Rewriting the displacements in terms of a local basis that is descriptive of the ideal crystallographic symmetry provides a robust approach to characterizing medium-range order (and disorder) and symmetry breaking in complex and disordered crystalline materials. This method enables the extraction of statistically relevant displacement modes (orientation, amplitude and distribution) of the crystalline disorder and provides directly meaningful information in a locally symmetry-adapted basis set that is most descriptive of the crystal chemistry and physics.

Introduction
Achieving an atomistic description of solids continues to provide a challenge to the study of materials, especially as we learn that imperfections and disorder of crystals can give rise to the emergence of unexpected materials properties. For example, the multifunctional properties of the perovskite manganites can only be explained by understanding the relationships between the local and average structures (Wu et al., 2007). Therefore, we strive to further classify and quantify the nature of any local ordering (short-range order) that is patterned in a disordered fashion. Pair distribution function (PDF) analysis of total scattering data has become a common technique for the characterization of local distortions and disorder in crystals, as well as of nanoparticle structures (Egami & Billinge, 2012; Billinge & Levin, 2007; Young & Goodwin, 2011; Keen & Goodwin, 2015).
Modeling of atomistic structures, with an emphasis on capturing the correct local structure, from experimentally derived atom-atom histograms poses a great challenge, especially when the best description of the PDF has a short finite correlation length (a domain) that becomes averaged into a higher symmetry in the crystallographic structure. To obtain an atomistic description of such a model with these domains (where each domain consists of a few unit cells), simulations containing thousands of atoms can be used to model the total scattering data. By employing a large-scale simulation, the limitations from periodic boundary conditions are lifted, thus allowing disordered aspects of the structure either to average out into the Debye-Waller factor in the case of crystalline disorder, or to lack any attributes of long-range order over the range of data provided in reciprocal space (after convolution with the finite size of the simulation) in order to describe amorphous solids (Renninger et al., 1974; McGreevy & Pusztai, 1988; Elliott, 1984). However, analysis of these large-scale atomistic ensembles containing thousands of atoms has been nontrivial, both in the challenge of extracting information relative to the average crystallographic structure and also in providing statistically meaningful information; there are typically many more free parameters in these simulations than there are independent observations (i.e. data).
Herein, we develop a systematic approach for analyzing the disorder in large atomistic simulations of complex crystal structures using representational analysis. The determination of crystallographic superstructures resulting from displacive distortions via symmetry-mode analysis of a statistical distribution of ensembles has proven to be very powerful (cf. WO3 and LaMnO3) (Kerman et al., 2012). Another similar approach, but coupled to a different analysis, has also made it possible to extract phonon dispersions from powder diffraction data (Dimitrov et al., 1999; Goodwin et al., 2004, 2005). Here, we use a variation of this technique adapted to the understanding of local structural variations by projecting displacements of atoms from their average crystallographic sites in atomistic ensembles onto a tight-binding-like basis formed from the symmetry-adapted modes of a single unit cell, as depicted in Fig. 1; we define these modes as 'tight-binding modes'. When displacements from an ideal crystallographic site are projected onto this locally symmetry-adapted basis, the disorder can be quantified and statistically analyzed to determine the frequency of specific displacement magnitudes and orientations. This manuscript outlines the analytical method and presents two illustrative applications of the method: the observation of a trigonal distortion in BaTiO3 at room temperature and the identification of the local displacement modes in the charge-ice pyrochlore Bi2Ti2O7. More broadly, our approach is equally important for the analysis of experimental diffraction and scattering data (King et al., 2011), ab initio and force-field-based simulations (Dixon & Elliott, 2014; Palin et al., 2014), and combinations of the two (White et al., 2010a,b). Furthermore, this approach provides a common language and representation for bridging experiment- and theory-derived models.

Introduction to total scattering methods
The analytical method described here operates on an ensemble of atoms that can be described as an enlarged 'big box' generated from small crystallographic unit cells. The atom positions need not sit on precisely ordered lattice sites; however, upon back-folding the big-box ensemble onto the parent unit cell, the average atom positions should project close to particular lattice sites, each with a position distribution resembling something like a Debye-Waller factor (i.e. the model may be paracrystalline). This method is agnostic to how the models are generated; the authors refer the reader to Egami & Billinge (2012), Young & Goodwin (2011), Keen & Goodwin (2015) and Tucker et al. (2007, 2001) for descriptions of modeling total scattering data. Here, we use 'total scattering' to refer to the scattering of X-rays or neutrons that describes the structure factor of the crystallographic symmetry (diffraction from periodically ordered components) and the diffuse scattering that can arise from displacements of atoms from their ideal lattice points, including displacements from thermal motion and static disorder in the crystal (Egami & Billinge, 2012). If the total scattering structure factor, S(Q), is measured to a sufficiently high momentum transfer [$Q_{\max} \gtrsim 15$ Å$^{-1}$; $Q = (4\pi \sin\theta)/\lambda$, where $\theta$ is half the scattering angle and $\lambda$ is the wavelength of the incident radiation], one can numerically take a sine Fourier transform to convert S(Q) into the reduced PDF, G(r):

$$G(r) = 4\pi r \rho_0 [g(r) - 1] = \frac{2}{\pi} \int_0^{\infty} Q\,[S(Q) - 1] \sin(Qr)\, \mathrm{d}Q,$$

where $\rho_0$ is the average number density of the material and g(r) is the atomic PDF. The atomic PDF, g(r), is a direct measure of the relative positions of the atoms in a solid, i.e. an experimentally accessible real-space histogram of all atom-atom separations in the solid (of both periodically ordered and disordered atoms).
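The sine Fourier transform described above can be illustrated with a minimal numerical sketch. The function name `reduced_pdf`, the trapezoidal integration and the synthetic single-pair structure factor are assumptions made only for illustration; they do not represent the data reduction used for the experimental data, which also involves corrections and a finite $Q_{\max}$ termination that are not treated here.

```python
import numpy as np

def reduced_pdf(q, s_q, r):
    """Approximate G(r) = (2/pi) * integral of Q [S(Q) - 1] sin(Q r) dQ on a finite Q grid."""
    q = np.asarray(q, dtype=float)
    s_q = np.asarray(s_q, dtype=float)
    r = np.asarray(r, dtype=float)
    # Integrand for every (r, Q) pair; shape (len(r), len(q)).
    integrand = q * (s_q - 1.0) * np.sin(np.outer(r, q))
    # Trapezoidal rule along the Q axis.
    dq = np.diff(q)
    integral = np.sum(0.5 * (integrand[:, 1:] + integrand[:, :-1]) * dq, axis=1)
    return (2.0 / np.pi) * integral

# Synthetic (hypothetical) input: a single atom pair at separation r0,
# for which S(Q) - 1 = sin(Q r0) / (Q r0).
r0 = 2.0                                 # Angstrom
q = np.linspace(0.0, 25.0, 5001)         # Angstrom^-1, so Q_max = 25
s_q = 1.0 + np.sinc(q * r0 / np.pi)      # np.sinc(x) = sin(pi x) / (pi x)
r = np.arange(0.5, 5.0, 0.01)            # Angstrom
g_r = reduced_pdf(q, s_q, r)
r_peak = r[np.argmax(g_r)]               # position of the strongest peak in G(r)
```

In this toy case the strongest peak of G(r) falls at the pair separation, with termination ripples on either side whose width is set by the chosen $Q_{\max}$.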
Because of the crystallographic phase problem, without the use of isotopic labeling or anomalous scattering it is not possible to assign peaks directly in the PDF to specific atoms, so atomic scale modeling must be used to make assignments to individual peaks. 'Small-box' models, which allow the extraction of bond lengths and a description of the thermal motion (i.e. Debye-Waller factors), can be obtained from least-squares (LS) optimization of a crystallographic unit cell, or some small variant thereof, to the experimental PDF using the software PDFgui (Egami & Billinge, 2012; Proffen & Billinge, 1999; Farrow et al., 2007). LS optimization is susceptible to finding local minima in the goodness-of-fit and is numerically cumbersome when the model contains many degrees of freedom, as applicable here. Additionally, these short-range-ordered models often fail to provide an accurate description of the crystallographic observations (Neilson et al., 2012, 2013; King et al., 2013).
A complementary approach to extract atomistic configurations from the PDF is to model simultaneously both the crystallographic structure factor and the PDF by employing a 'large-box' simulation of the total scattering data. A reverse Monte Carlo (RMC) algorithm can be used to find atomistic configurations of the ensemble consistent with both the experimentally determined G(r) and S(Q) (Tucker et al., 2007, 2001).

Coordinate transform and decomposition
The goal of this method is to define the atomic configurations of an ensemble as displacements from the ideal crystallographic positions. Here, we define a 'big' or 'large box' as an $M_x \times M_y \times M_z$ enlargement of the crystallographic unit cell to form an atomistic ensemble, but no attempt is made to constrain the symmetry between atoms, within either the subcells or the 'large box'. The simplest such basis is simply to write down displacement vectors, in Cartesian or lattice coordinates, for each atom in the ensemble. Each atom i within crystallographic unit cell n has a unique position defined by a vector $\mathbf{x}_{i,n}$. The vector $\mathbf{R}_n$ describes the spatial vector between each unit cell n within the ensemble. Each atom can be mapped as a displacement from its ideal position in the crystallographic unit cell, $\mathbf{x}^0_{i,n}$, by $\mathbf{u}_{i,n} = \mathbf{x}_{i,n} - \mathbf{x}^0_{i,n}$, where the values $\mathbf{x}^0_{i,n}$ are often determined from a traditional crystallographic analysis (Rietveld analysis or single-crystal structural refinement). Such a representation is shown schematically in Fig. 2(a) for a simple two-dimensional 'toy' model, a $2 \times 1$ 'big box' built from a crystallographic unit cell with two atoms and $C_4$ symmetry. While straightforward to compute, this basis (the displacement vectors) lacks any connection to the symmetries that are present, locally or on average, and is thus difficult to interpret. A more refined approach is to rewrite the displacements in terms of the normal modes of the crystallographic structure, with amplitudes and phases for every mode at every wavevector in the Brillouin zone (as determined by the point symmetry of each wavevector). This normal-mode basis provides physical insight because the atomic displacements are mapped onto symmetry-defined motions away from their ideal positions, and correlations between unit cells are captured.
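The displacement-vector representation described above can be sketched in a few lines. The array layout (positions indexed by cell and atom, expressed in units of the small-cell lattice vectors), the function names and the minimum-image wrapping under the periodic boundary of the big box are all assumptions of this sketch rather than details taken from the analysis code.

```python
import numpy as np

def ideal_positions(x0, supercell):
    """Ideal sites R_n + x0_i for every cell n; shape (Mx, My, Mz, N, 3), lattice units."""
    grids = np.meshgrid(*(np.arange(m) for m in supercell), indexing="ij")
    cell_origins = np.stack(grids, axis=-1).astype(float)   # R_n, shape (Mx, My, Mz, 3)
    return cell_origins[..., None, :] + np.asarray(x0, dtype=float)

def displacement_vectors(x, x0, supercell):
    """u_{i,n} = x_{i,n} - x0_{i,n}, wrapped to the nearest periodic image of the big box."""
    period = np.asarray(supercell, dtype=float)
    u = np.asarray(x, dtype=float) - ideal_positions(x0, supercell)
    return u - period * np.round(u / period)

# Toy input (hypothetical numbers): a 2 x 1 x 1 box with two atoms per cell.
supercell = (2, 1, 1)
x0 = np.array([[0.0, 0.0, 0.0], [0.5, 0.5, 0.0]])
u_true = np.zeros(supercell + (2, 3))
u_true[0, 0, 0, 0] = [-0.05, 0.02, 0.0]   # pushes this atom across the box boundary
u_true[1, 0, 0, 1] = [0.03, -0.04, 0.0]
x = np.mod(ideal_positions(x0, supercell) + u_true, supercell)
u = displacement_vectors(x, x0, supercell)
```

The wrapping step matters only for atoms whose displacement carries them across the edge of the simulation box; it assumes every displacement is smaller than half the box length along each direction.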
There is, however, an even better choice of basis that keeps many of the advantages of the classic normal-mode approach but retains physical insight into the local symmetry changes.
First, the local tight-binding (i.e. locally symmetry-adapted) modes are identified. This is accomplished by rewriting all possible atomic motions within a single unit cell into motions consistent with the point symmetry of the crystal at the Brillouin zone center, k = (0, 0, 0). Each motion (or mode) can be labeled according to the irreducible representation (irrep) that it transforms under in the point symmetry group and is described by a set of basis vectors describing the actual atomic motions. Identification of these tight-binding modes is straightforward: basis vectors spanning each irreducible representation for each space group have been tabulated by Kovalev (1993), or can be computed by various crystallographic tools, including KAREP (Hovestreydt et al., 1992), SARAh (Wills, 2002), BASIREPS (Rodriguez-Carvajal, 2001), the Bilbao Crystallographic Server (symmetry-adapted modes) (Aroyo et al., 2011; Kroumova et al., 2003) and the ISOTROPY software suite (Stokes et al., 2013). The inputs for these tools are the crystallographic space group and the atom positions of the small (crystallographic average) unit cell, as one would derive from Rietveld (or other suitable crystallographic) analysis.
These tight-binding modes provide an orthonormal and local basis for describing all possible motional degrees of freedom within a single unit cell, and are analogous to the normal vibrational modes of a molecular system. To retain this physical intuition but capture the degrees of freedom of a 'large-box' atomistic ensemble, we adopt a technique analogous to tight-binding methods in electronic structure calculations (Slater & Koster, 1954) and write down modes at a nonzero wavevector in terms of these local basis functions that we define at the Brillouin zone center (as for atomic orbitals), with appropriate phase factors to describe correlations between crystallographic unit cells in an ensemble. Specifically, we define the spatial correlations between unit cells within the ensemble with a quantized reciprocal wavevector, $k = 2\pi/R$. The vector spans the indices $\mathbf{k} = (2\pi n_x/M_x, 2\pi n_y/M_y, 2\pi n_z/M_z)$ for all $n_x = 0, 1, \ldots, (M_x - 1)$, $n_y = 0, 1, \ldots, (M_y - 1)$ and $n_z = 0, 1, \ldots, (M_z - 1)$; in other words, the wavevectors are in steps of $2\pi/M$ along each direction. For mathematical convenience, we define all values of k as positive. The amplitude of a tight-binding mode, $\Gamma_{j,\lambda}(\mathbf{k})$, with the associated phase factor described by the reciprocal-space wavevector k, is defined by

$$\Gamma_{j,\lambda}(\mathbf{k}) = \sum_i \sum_n \left( \mathbf{w}_{i,j,\lambda} \cdot \mathbf{u}_{i,n} \right) \exp(\mathrm{i}\, \mathbf{k} \cdot \mathbf{R}_n),$$

where i runs over all atoms in the crystallographic unit cell and n runs over all unit cells contained within the ensemble. The vector $\mathbf{R}_n$ points to the nth crystallographic unit cell in the ensemble. The values $\mathbf{w}_{i,j,\lambda}$ are the vectorial contribution of atom i to the mode described by the $(j, \lambda)$ pair. The vectorial contributions can span multiple atoms, as pertaining to the crystallographic multiplicity of the particular site in the original crystallographic unit cell.
The index j specifies each set of modes that together transform as an irreducible representation of the point group; the index $\lambda$ runs over all modes in the set, the number of which is equal to the dimensionality of the corresponding irreducible representation. Together, there are 3N distinct $(j, \lambda)$ pairs, or tight-binding modes, where N is the number of atoms in the small crystallographic cell.
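A minimal sketch of this projection is given below, assuming that the displacements are stored as an array indexed by cell and atom and that the mode vectors are supplied as an orthonormal set (here a trivial Cartesian basis stands in for the symmetry-adapted modes obtained from a crystallographic tool). The function names, the array layout, the positive sign of the phase factor and the absence of a normalization constant are assumptions of the sketch.

```python
import numpy as np

def phase_matrix(m):
    """exp(i k R) for k = 2*pi*n_k/m and R = 0..m-1 in lattice units; shape (m, m)."""
    n = np.arange(m)
    return np.exp(2j * np.pi * np.outer(n, n) / m)

def tight_binding_amplitudes(u, modes):
    """
    u     : (Mx, My, Mz, N, 3) displacements u_{i,n}
    modes : (n_modes, N, 3) orthonormal mode vectors w
    Returns complex amplitudes of shape (Mx, My, Mz, n_modes), indexed by (n_x, n_y, n_z).
    """
    mx, my, mz = u.shape[:3]
    # Local amplitude of each mode in each cell: sum over atoms of w . u
    local = np.einsum("mia,xyzia->xyzm", modes, u)
    # Sum over cells with the phase factor exp(i k . R_n), one axis at a time.
    return np.einsum("ax,by,cz,xyzm->abcm",
                     phase_matrix(mx), phase_matrix(my), phase_matrix(mz), local)

# Toy input (hypothetical): one atom per cell in a 4 x 1 x 1 box, Cartesian "modes".
modes = np.eye(3).reshape(3, 1, 3)
u_uniform = np.zeros((4, 1, 1, 1, 3))
u_uniform[..., 0] = 0.1                                  # same x displacement in every cell
signs = np.array([1.0, -1.0, 1.0, -1.0]).reshape(4, 1, 1, 1)
u_alternating = u_uniform * signs[..., None]             # x displacement alternates in sign
amps_uniform = tight_binding_amplitudes(u_uniform, modes)
amps_alternating = tight_binding_amplitudes(u_alternating, modes)
```

In this toy case the uniform displacement appears only at the zone center ($n_x = 0$), whereas the alternating displacement appears only at $n_x = 2$, i.e. $k_x = \pi$, which shows how the phase factor separates correlations between neighboring cells.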
There is no index k on w, just like there is no wavevector dependence on atomic orbitals in the classic tight-binding electronic structure approach, because all wavevector dependences are explicitly included in the phase factors. Further, note that to retain all degrees of freedom we allow the amplitudes of each tight-binding mode to be independent of all others, even if symmetry would constrain them (i.e. because one irreducible representation may be spanned by multiple modes). This allows us to consider, but not enforce, symmetry in describing the 'large-box' atomistic ensembles. Stated differently, the projection is only a change of basis; all $3N - 6$ degrees of freedom (for an ensemble of N atoms) are retained (omitting the three translational and three rotational degrees of freedom) and the exact atomistic ensemble can be reconstructed by the inverse of the projection. This method, as applied to the toy model, is shown in Fig. 2. We note that this is distinct from typical crystallographic order parameter analysis (Kerman et al., 2012; Dimitrov et al., 1999; Goodwin et al., 2004, 2005; Stokes et al., 2013; Campbell et al., 2006), in which the constraints of the parent crystallographic symmetry are preserved and the primary interoperable variables are the order parameter amplitudes, thus providing one number for a pair of basis vectors that describe a displacement transforming as a two-dimensional irreducible representation, versus two numbers in our approach. We retain all possible degrees of freedom.
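The statement that the projection is only a change of basis can be checked numerically with a round trip. The sketch below uses discrete Fourier transforms from NumPy for the phase factors and a randomly generated orthonormal matrix as a hypothetical stand-in for a complete set of 3N symmetry-adapted mode vectors; the function names and sign and normalization conventions are assumptions of the sketch.

```python
import numpy as np

def project(u, modes):
    """Forward projection: local mode amplitudes, then sum over cells with exp(+i k . R_n)."""
    local = np.einsum("mia,xyzia->xyzm", modes, u)
    n_cells = np.prod(u.shape[:3])
    # ifftn carries the exp(+i k . R) sign and a 1/n_cells factor, which is undone here.
    return np.fft.ifftn(local, axes=(0, 1, 2)) * n_cells

def reconstruct(amps, modes):
    """Inverse: undo the phase sum, then expand the local amplitudes on the mode vectors."""
    n_cells = np.prod(amps.shape[:3])
    local = np.fft.fftn(amps, axes=(0, 1, 2)) / n_cells
    return np.einsum("xyzm,mia->xyzia", local.real, modes)

rng = np.random.default_rng(0)
n_atoms = 2
supercell = (3, 2, 2)
# Hypothetical complete orthonormal basis (3N modes), not real symmetry-adapted modes.
q_mat, _ = np.linalg.qr(rng.normal(size=(3 * n_atoms, 3 * n_atoms)))
modes = q_mat.T.reshape(3 * n_atoms, n_atoms, 3)
u = 0.05 * rng.normal(size=supercell + (n_atoms, 3))

amps = project(u, modes)
u_back = reconstruct(amps, modes)
```

Because the mode set is complete and orthonormal, `u_back` reproduces `u` to numerical precision, and the summed squared amplitudes equal the summed squared displacements multiplied by the number of cells.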

Continuous symmetry measures
When using our tight-binding modes, we can determine the activity of the mode and the deviation of the ensemble from the crystallographic symmetry, not just from the mode amplitude but also from its mean-squared deviation (MSD) from an ensemble operated on by a symmetry operation of the parent crystallographic space group. There are at least two distinct types of continuous symmetry measures [as developed by Avnir and coworkers (Zabrodsky et al., 1992; Alvarez et al., 2005)] that we characterize here. First, the global activity of a single tight-binding mode ($j$, consisting of one or more individual modes depending on the dimensionality of the corresponding irreducible representation) can be quantified as the MSD, $s_{G,j}$, between the $|\Gamma_{j,\lambda}(\mathbf{k})|$ amplitudes and the new amplitude coefficients, $|\Gamma_{G,j,\lambda}(\mathbf{k})'|$, following application of a symmetry operation G of the crystallographic space group. For purely symmetry-conserving displacements, the MSD should be zero. Here, it is critical to combine the squared amplitudes of all individual modes that together transform as a single multidimensional irreducible representation (the innermost sums) because the amplitudes of individual modes can be varied simply by changing the choice of basis vectors within that mode set. The sum over all wavevectors is justified to identify local symmetry changes because it corresponds to summing the contributions derived from a single local tight-binding mode (as for an atomic orbital) and is exact in the molecular limit. The final square root is provided for convenience to make the magnitude of $s_{G,j}$ more physically interpretable. The related MSD, not broken down by individual mode sets, is similarly simple to calculate, where again the final square root is provided for convenience.
The second type of deviation from the parent space group that can be identified is distortions that do not retain an equivalence of mode amplitudes within a single mode set that transforms as a multidimensional irreducible representation. To illustrate this, consider a single box with $C_4$ symmetry and an atom in the center displaced along the diagonal direction (Fig. 3). Projected onto the two basis vectors $\Gamma_1$ and $\Gamma_2$, which together span the two-dimensional irreducible representation E in the corresponding point group, the amplitudes along each basis are initially equal. As the Euler angle $\phi$ that defines the absolute orientation of the basis vectors is varied, the intensity of $\Gamma_2$ reduces while $\Gamma_1$ increases until $\Gamma_1$ is collinear with the atom displacement; this oscillatory pattern continues the rest of the way. Note that the sum of the square amplitudes from the two contributions ($\Gamma_1$ and $\Gamma_2$) is a constant (this is required, as the magnitude of the displacement is not changing). However, across multiple subcells or across multiple simulation runs, one can differentiate between random and ordered displacements. Let $\phi_0$ be the initial angle of the displacement of the central atom. Different values of $\phi_0$ correspond to phase shifts of the values of $\Gamma_1^2$ (and $\Gamma_2^2$). If the $\phi_0$ values are completely random, then their average is a flat line as a function of Euler angle, with variances that are also flat (Fig. 3b). On the other hand, if the $\phi_0$ values are pinned to specific directions, then only a subset of the phase shifts is present. This will often result in an average that is still flat as a function of Euler angle, e.g. if they are pinned every 90°, but the variances will no longer be uniform (Fig. 3c). This can be exploited to determine whether the displacements are approximately random or fixed in some subset of orientations relative to the parent unit-cell coordinate system.
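The distinction between random and pinned displacement directions can be reproduced with a small numerical experiment. In this sketch, each sample is an in-plane displacement of fixed magnitude d at angle `phi0`, projected onto a basis vector rotated by the Euler angle `phi`; the function name, the sample size and the choice of pinning along the four diagonals are assumptions made for illustration.

```python
import numpy as np

def squared_amplitude_stats(phi0, phi, d=1.0):
    """Mean and variance over samples of the squared first-basis amplitude at each phi."""
    delta = phi0[None, :] - phi[:, None]        # shape (n_phi, n_samples)
    gamma1_sq = (d * np.cos(delta)) ** 2        # the second basis gives d^2 - gamma1_sq
    return gamma1_sq.mean(axis=1), gamma1_sq.var(axis=1)

rng = np.random.default_rng(1)
n_samples = 20000
phi = np.linspace(0.0, np.pi, 91)               # Euler angle of the basis, 2 degree steps

# Case 1: displacement directions drawn uniformly at random.
phi0_random = rng.uniform(0.0, 2.0 * np.pi, n_samples)
# Case 2: directions pinned to the four diagonals, i.e. every 90 degrees.
phi0_pinned = np.pi / 4 + (np.pi / 2) * rng.integers(0, 4, n_samples)

mean_random, var_random = squared_amplitude_stats(phi0_random, phi)
mean_pinned, var_pinned = squared_amplitude_stats(phi0_pinned, phi)
```

Both cases give a mean close to $d^2/2$ at every Euler angle, but only the pinned case shows a variance that changes with the Euler angle: it vanishes when the basis lies at 45° to all of the pinned directions and is largest when a basis vector points along them.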

Trigonal displacements in tetragonal BaTiO3
The ferroelectric ceramic BaTiO3 at T = 298 K provides an excellent example of local distortion that averages out to a higher crystallographic symmetry in the unit cell. The average crystallographic symmetry determined from Rietveld analysis is tetragonal, P4mm, which was used to define the tight-binding modes. However, the local bonding environment is significantly distorted and better described by the symmetry of the low-temperature R3m configuration (Kwei et al., 1995; Ravel et al., 1998; Page et al., 2010) [...] information on the medium-range order, such as information on the correlations between unit cells or the coherence length scale, even though such information can (and should) exist within the PDF.
Figure 4 [...] (b) The PDF analysis reveals a local distortion present at room temperature that resembles the low-temperature R3m crystal structure (Kwei et al., 1995; Ravel et al., 1998; Page et al., 2010), illustrated here with exaggerated Ti and O displacements. (c) The folded atomistic big-box ensemble generated from an RMC simulation, overlain with the anisotropic displacement ellipsoids determined from small-box modeling of the PDF (R3m structure). The (200), (020) and (002) [...]
The experimental data used for this analysis were collected using the NPDF instrument (Lujan Neutron Scattering Center, Los Alamos National Laboratory, New Mexico, USA) and were re-analyzed with adjusted relative absorption corrections [such that a scale factor was not needed to fit the intensity G(r)]; the experimental details and original report of the experimental data are given by Page et al. (2010). The Bragg profile and PDF were used to constrain RMC simulations using the RMCprofile code, as illustrated in Figs. 1(a) and 1(b). The simulation ensemble is a $12 \times 12 \times 12$ enlarged big-box ensemble of the tetragonal P4mm unit cell (8640 atoms) that was determined from Rietveld analysis. The ensembles were constrained by G(r) (in the range 1 < r < 24 Å) in addition to the Bragg profile from the 90° detector bank of the NPDF (1.7 < Q < 15.7 Å$^{-1}$, 3.7 > d > 0.4 Å). In addition to hard-sphere cutoffs, a small penalty was applied to the simulations for breaking [TiO6] coordination in order to accelerate the simulations. Two hundred different simulations were performed from the same starting configuration in order to build statistics in the analysis. Each simulation ensemble can be back-folded into the unit cell; the atom positions fall within a cloud-like distribution centered around the average crystallographic site (Fig. 4c).
Using the analysis method presented here, the atom positions were then decomposed into the tight-binding basis of the P4mm space group with a k-mesh divided into 12 discrete steps along each x, y and z direction with $M_x = M_y = M_z = 12$. The irreducible representations and corresponding basis vectors for the tight-binding (locally symmetry-adapted) modes were identified using the Bilbao Crystallographic Server (symmetry-adapted modes) (Kroumova et al., 2003) and are listed in Table 1; some basis functions are represented graphically in Fig. 5.
For the analysis of a single ensemble of BaTiO3, the tight-binding mode amplitudes that describe displacements along the ferroelectric polarization are not very large (Ti $A_1$, O1 $A_1$, O2 $A_1$, O2 $B_1$, Table 2). This makes sense, since the average positions of the Ti and O2 atoms are off-center along the elongated c-axis direction (Table 1) (Megaw, 1945, 1973). However, the displacements in the ab plane are significantly enlarged. This is represented graphically by the 'point-cloud' distributions of the atom positions in Fig. 4(c) that are overlain on top of the R3m unit cell used to describe the PDF by Page et al. (2010).
Figure 5 Visualization of selected tight-binding modes of BaTiO3 in the P4mm space group. For Ti (Wyckoff position 1b), all three basis vectors are shown, and they demonstrate the retained 3N degrees of freedom, as the pair that together transform as E are allowed to have independent amplitudes. For the O1 site (Wyckoff position 2c), the $A_1$ and $B_1$ modes each join two O-atom positions, but 3N degrees of freedom are retained for the two atoms generated from that Wyckoff position, noted by the six independent modes. While the E pairs will transform together if the local symmetry is also P4mm, all amplitudes are allowed to vary independently in this analysis.

Table 1 Basis vector components along each crystallographic direction for BaTiO3, described in the P4mm space group setting, using the fractional atom coordinates Ba (0, 0, 0), Ti (1/2, 1/2, 0.516), O1a (1/2, 0, 0.487), O1b (0, 1/2, 0.487) and O2 (1/2, 1/2, 0.978).
Basis vector components
O2 0 1 0

One problem with RMC simulations is that, if the data are insufficiently resolved such that some atoms are poorly constrained, then the simulation atoms can wander away from their ideal positions. This would give the same graphical appearance as in Fig. 4(c). However, the quantitative data presented in Table 2 show that these displacements are significant on average within an ensemble and that their variance is tightly defined, even across 200 simulations. As a control, we performed RMC simulations constrained by simulated PDFs. In one case (P4mm control), we computed G(r) from the P4mm crystal structure obtained by Rietveld analysis (convoluted with the appropriate instrumental resolution parameters, $Q_\mathrm{damp}$ and $Q_\mathrm{broad}$); the Bragg profile was the experimental Bragg profile. The simulated G(r) and Bragg profile were used to constrain 200 RMC simulations for analysis. For another control (R3m control), we took the reported R3m model determined from small-box modeling for the PDF [as reported by Page et al. (2010)] and simulated G(r) from that structural model; the Bragg profile was the experimental Bragg profile. These then constrained 90 independent RMC simulations for analysis. The P4mm control is a negative control that does not have additional displacements within the ab plane (beyond thermal disorder modeled by a Debye-Waller factor); the R3m control is a positive control for a known displacement in the ab plane coincident with thermal disorder. The tight-binding mode coefficients resulting from analysis of the P4mm control simulations do not have a substantial anisotropy (Table 2); while there is a statistically significant increase in the coefficients of displacements in the ab plane, this may be biased from using the experimental Bragg peaks in conjunction with the simulated G(r). For the R3m control, there is a significant and expected increase in displacements within the ab plane.
This analysis informs us that the tight-binding mode amplitudes are capable of identifying aperiodic displacements when expected; however, the values of the coefficients alone do not inform us as to whether particular symmetry operations are broken.
With a local trigonal distortion, the R3m-based model implies that there are specific vectors along which the Ti displacements are oriented; these are the vectors that point directly at the faces of the [TiO6] octahedra (i.e. the ⟨111⟩ directions, as referenced to the P4mm or $Pm\bar{3}m$ unit cells of BaTiO3). However, looking at the graphical representation in Fig. 4(c), it is impossible to tell if particular directions are preferred. Because the tight-binding modes within a set are mutually orthogonal and therefore yield locally orthogonal displacements, it is trivial to rotate the reference frame of the basis vectors and recompute their coefficient as a function of the Euler angle along the rotation axis of the multidimensional irreducible representation. In the P4mm description, this angle ($\phi$) rotates around the fourfold axis of the unit cell.
In our analysis, we decomposed the atomic displacements into amplitudes of specific tight-binding modes as a function of rotation about the Euler angle, $\phi$ (Fig. 6). To illustrate this analysis, we employ two control simulations. Shown in Fig. 6(a) is a simulation of the displacements of Ti atoms around an approximately random distribution of angles. In Fig. 6(b), the displacements have the same magnitude as in Fig. 6(a), but the angles are constrained to be a random integer multiple of $\pi/2$ rad. Therefore, the Ti atoms are clustered into four groups (akin to the ⟨111⟩ displacements). In both cases, the average coefficient of the tight-binding modes that together form a set and span a multidimensional irreducible representation will not change as a function of $\phi$, since the Ti atoms are displaced from the center by the same distance. However, the variance between tight-binding mode amplitudes [E(1) versus E(2)] will be distinct for each Euler angle (cf. Fig. 3). For the completely random distribution in Fig. 6(a), the coefficient multiplying $\Gamma_1$ of the irreducible representation E will vary continuously between 0 and the maximum value, as the basis vector rotates between being orthogonal to and collinear with the atomic displacement; the second basis vector ($\Gamma_2$) will also vary by the same amount, but its amplitude will be $\pi/2$ out of phase with $\Gamma_1$. Therefore, each basis vector will have the same variance with $\phi$, denoted by the error bars in Fig. 6. As in Fig. 6(b), if the atom displacements are clustered into groups, then the variation of basis vector coefficients will not be constant with $\phi$. When $\phi = 0$ rad, such that $\Gamma_1$ is oriented along the a axis, then its mixing coefficient will be $2^{1/2}$ times the average value; the coefficient of $\Gamma_2$ will be identical. Therefore, the difference in coefficients is zero. However, when $\phi$ orients one of the basis vectors directly towards the clustered displacements, one coefficient is maximal and the other is zero; this produces a large variation in the basis vector amplitudes. In the experimental simulations, there does not appear to be explicit clustering of the Ti atoms along particular displacement vectors (Fig. 6c). Looking at the variation in coefficients for all atoms in the unit cell, depicted by the error bars in Fig. 7, there does not appear to be any clustering of displacements as a function of Euler angle. While the two-dimensional irreducible representations E for atom O2 appear to exhibit a trend with $\phi$, the change in the average value of the coefficient reflects the definition of the basis vectors; the variations of the coefficients, as indicated by the error bars, do not change with $\phi$. This result is consistent for RMC simulations run for different times (as disorder tends to be artificially maximized for longer simulation runs).
Figure 6 Euler angle analysis of a multidimensional tight-binding mode set that transforms as a multidimensional irreducible representation. (a) A random distribution of atom positions from the crystallographic location (dashed circle in cartoon) produces an equivalent variance of the mixing coefficients (denoted with vertical bars) between the two basis vectors ($w_1$ and $w_2$) when the basis vectors are rotated about the Euler angle, $\phi$, parallel to the $C_4$ axis of the crystal structure. (b) A clustering of positions at regular intervals, such as $\pi/2$, will produce the same mixing coefficients as in part (a) for each tight-binding mode when averaged over all k and over all ensembles. However, a clustering of positions will yield a significant variance in the coefficients, denoted by the vertical bars. (c) The coefficients provided from the experimentally derived BaTiO3 ensembles do not display significant differences when the basis vectors are rotated about the Euler angle.
While a variation in coefficients with Euler angle can indicate clustering of displacements described by a multidimensional irreducible representation, it does not provide any indication of whether the degeneracy-inducing symmetry operation is broken. To find broken degeneracies, we turn to continuous symmetry measures as defined in the Method section. For BaTiO3, we compute the MSD for each generated symmetry operation of the crystallographic space group (P4mm). With four symmetry operations ($E$, $\sigma_v$, $C_2$ and $C_4$), there are a total of eight symmetry-related atoms that are generated from a general position; therefore, we test all unique combinations of these operations (each combination that generates one of the general positions).
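The count of eight symmetry-related positions can be verified by generating the point operations of 4mm from a fourfold rotation and one vertical mirror. The sketch below works with $3 \times 3$ matrices acting on Cartesian coordinates and ignores lattice translations; the specific matrices, the function name and the test point are assumptions chosen for illustration.

```python
import numpy as np

# Generators of the point group 4mm (assumed setting: C4 about z, mirror sending x to -x).
C4 = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]])
SIGMA_V = np.array([[-1, 0, 0], [0, 1, 0], [0, 0, 1]])

def generate_group(generators):
    """Close a set of matrix generators under multiplication and return all elements."""
    identity = np.eye(3, dtype=int)
    elements = {tuple(identity.ravel().tolist()): identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for h in generators:
            new = h @ g
            key = tuple(new.ravel().tolist())
            if key not in elements:          # keep only operations not seen before
                elements[key] = new
                frontier.append(new)
    return list(elements.values())

operations = generate_group([C4, SIGMA_V])

# Images of a general position (hypothetical coordinates) under every operation.
point = np.array([0.1, 0.2, 0.3])
images = {tuple(np.round(op @ point, 6).tolist()) for op in operations}
```

Each of the resulting operations is a product of powers of the fourfold rotation with or without the mirror, which corresponds to the combinations tested in the MSD analysis; all images of the general position share the same z coordinate because every operation leaves the polar axis unchanged.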
The MSDs illustrate that the atomic displacements in the ensembles show the highest deviation away from the fourfold rotation symmetry element. Histograms of all MSDs computed for BaTiO3 (summed over all k and irreducible representations) are illustrated in Fig. 8 for each symmetry operation. The histograms for related symmetry elements are clustered together; those combinations that equate to a fourfold rotation have the most significant MSD (Figs. 8g and 8h), followed by mirror planes parallel to the {110} planes, then mirror planes parallel to the {100} planes.

Figure 7
Tight-binding mode coefficients as a function of Euler angle, parallel to the C4 axis of P4mm, for the (a) Ti, (b) O1, (c) O2 and (d) Ba crystallographic sites, illustrating a constant variance as a function of Euler angle.

Figure 8
Histograms of the mean-square displacements, summed over all k and mode sets (j) for 240 ensembles for each symmetry operation of P4mm. (a) The identity E, (b) a vertical mirror along x, σv, (c) a twofold axis, 2C4 = C2, (d) a vertical mirror and twofold axis, σv + 2C4, (e) a vertical mirror plane along xy, σv + C4 = σxy, (f) the fourfold rotation axis C4 and (g) 3C4.
To probe which irreducible representations are most symmetry conserving, histograms of MSDs summed over all k for each irreducible representation are shown in Fig. 9; the histograms bin together the MSDs computed for the equivalent symmetry operations shown on the right. The histograms for the Ti A1 irreducible representation (Figs. 9a, 9c and 9e) show tightly grouped and low-value MSDs, indicating that the vertical Ti displacements tend to preserve the P4mm symmetry operations. However, the displacements that project onto the Ti irreducible representation E tend to break the symmetry operations, as inferred previously. The fourfold rotation axis appears to be the symmetry operation most frequently broken, as expected naively from the small-box trigonal model illustrated in Fig. 4(b), which lacks a fourfold axis but does have a vertical mirror plane parallel to the (110) plane.
The analyses presented here for BaTiO3 provide results that are sufficiently simple for easy comparison with small-box models of BaTiO3. The use of RMC simulations allows one to extract a single statistically relevant model of the atom positions that describes both the data regarding local atom separations (the PDF) and the average crystallographic symmetry (Bragg profile). For BaTiO3, the coefficients of the tight-binding modes and their spatial dependence reveal the presence of a significant distortion from the P4mm crystallographic symmetry. The resulting ensemble reveals that the atom positions are mostly displaced in the ab plane, which closely resembles the low-temperature R3m crystal structure. This example illustrates how such an analysis may be performed on materials with more complexity, in terms of both their crystal structure and their crystalline disorder, as described in the next section.

Correlated O and Bi displacements in Bi2Ti2O7
The analysis methods presented here are generally applicable to materials with more complex structures. The 'charge-ice' pyrochlore oxide Bi2Ti2O7 has a large unit cell that contains 88 atoms; direct inspection of ensembles becomes prohibitive with this many degrees of freedom in a single crystallographic unit cell (Hector & Wiggin, 2004). In Bi2Ti2O7 there is extensive disorder of the Bi sublattice, attributed to stereochemical activity of the lone pair (derived from the [Xe]5d¹⁰6s² electron configuration of Bi(III)) on a geometrically frustrated lattice. The geometry of the diamond lattice prevents long-range ordering of the dipoles, in a manner related to Pauling's ice rules (Seshadri, 2006). Previously, RMC simulations of total neutron scattering have been used to gain an atomistic representation of the static Bi displacements, which form a toroidal distribution of Bi atoms that encircle the ideal crystallographic site. Furthermore, the O′ atoms (Wyckoff site 8a) are connected to the non-spherically distributed Bi atoms and therefore become displaced from their ideal crystallographic sites into tetrahedral volumes. The original report, experimental data and experimental details are given by . The crystallographic Bi2Ti2O7 unit cell is described by the Fd-3m space group, which defines the irreducible representations and tight-binding modes used here.

Figure 10
Visualization of the tight-binding modes for the different Bi irreducible representations. The Eu, T2u and T1u(1) mode sets have the largest amplitude. They describe the displacements that generate a toroidal distribution of Bi positions and agree with the predicted displacement modes of Eu and T1u symmetry from ab initio density functional theory calculations.
By rewriting the atomic displacements in terms of the tight-binding modes, a straightforward examination of their amplitudes reveals several characteristics that lead to many of the same conclusions as presented by ; these coefficients are tabulated in Table 3 (the coefficients are averaged across modes related by face centering, over all k and across 320 distinct ensembles). Of the Bi modes (depicted in Fig. 10), those spanning the Eu and T2u irreducible representations generate displacements that reproduce the toroidal distribution of Bi positions observed by  and have the most significant amplitude. The mode spanning the A2u representation is orthogonal to the C∞ rotational axis of the torus and has a small amplitude. The modes spanning the T1u(1) and T1u(2) representations have intermediate orientations and amplitudes. The decomposition of atomic displacements into tight-binding modes reproduces the physically meaningful and intuitive results presented by ; here, the averaging across many wavevectors and simulations identifies the robustness of these displacements.
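The decomposition and averaging step can be sketched numerically as a projection of each displacement vector onto an orthonormal mode basis, followed by statistics over ensembles. This is a simplified illustration, not the procedure used to produce Table 3: the wavevector dependence is ignored, the six-component basis is a random orthonormal stand-in for the symmetry-adapted modes, and the coefficient values and noise level are hypothetical. Only the number of ensembles (320) is taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
n_dof = 6          # hypothetical: two atoms x three Cartesian components
n_ensembles = 320  # number of ensembles quoted in the text

# Stand-in orthonormal mode basis (columns); a real analysis would use the
# symmetry-adapted modes of the space group instead of a random basis.
basis, _ = np.linalg.qr(rng.normal(size=(n_dof, n_dof)))

# Hypothetical mode amplitudes plus ensemble-to-ensemble scatter.
true_coeffs = np.array([0.30, 0.25, 0.02, 0.0, 0.0, 0.0])
noise = 0.01 * rng.normal(size=(n_ensembles, n_dof))

# Cartesian displacements, one row per ensemble: u = sum_j c_j * mode_j.
displacements = (true_coeffs + noise) @ basis.T

# Projection onto the modes recovers the coefficients: c_j = mode_j . u.
coeffs = displacements @ basis

mean_amplitude = np.abs(coeffs).mean(axis=0)   # average amplitude per mode
spread = coeffs.std(axis=0)                    # distribution width per mode
```

Because the basis is orthonormal, the projection is exact and the total squared displacement is preserved, so high-amplitude modes can be ranked directly from mean_amplitude and their robustness judged from spread.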
Furthermore, identification of these high-amplitude modes allows one to create a 'small-box' model for a symmetry-constrained refinement. In work by  and Fennie et al. (2007), imaginary phonon modes were discovered at the Brillouin zone center; the symmetries of the polarization eigenvectors belong to the T1u and Eu irreducible representations (Fig. 11). The tight-binding modes spanning these irreducible representations have high-amplitude coefficients in the analysis performed here. Distortion of the Fd-3m lattice along these polarization modes yields a small unit cell of Cm symmetry that provides an excellent description of G(r) for r < 3.5 Å. The agreement of the high-amplitude tight-binding modes with the theory-predicted distortion modes and small-box refinement illustrates another utility of this approach for unknown systems.
In contrast, the distribution of Bi-atom displacements is varied (Fig. 14). The Bi A2u mode distribution is comparable to that of the Ti-O sublattice. However, many of the Bi displacement modes corresponding to multidimensional irreducible representations have high amplitudes and broad distributions, indicative of substantial static disorder in directions orthogonal to the linear O′-Bi-O′ bond axis. This strongly suggests that those displacement modes locally break the Fd-3m symmetry of the crystal structure.
In this analysis, the multidimensional irreducible representations are broken into their individual components (so as to retain the total number of degrees of freedom); however, the values of each tight-binding mode are identical in the case of Bi2Ti2O7. Then, to identify if and by how much the atomic displacements break the symmetry elements linking together tight-binding modes spanning a multidimensional irreducible representation, the continuous symmetry measure of each irreducible representation can be calculated. Fig. 15 contains histograms of the MSDs for each irreducible representation after operation on the simulation box by a specific symmetry operation (equation 4). The modes spanning the A2u representation do not show any dependence on the symmetry operation, while the modes corresponding to the Eu and T2u representations do show a dependence on the operations. Specifically, the face-centering [+(1/2, 0, 1/2), +(0, 1/2, 1/2)] and inversion (i) symmetry operations show the highest MSDs as well as the broadest distributions, suggesting that those symmetries are broken by the largest magnitude and in the most ways. In future work, it will be informative to analyze the compatibility relationships, as the degeneracy of different modes changes for k ≠ (0, 0, 0).
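As a rough illustration of how a mean-square-displacement symmetry measure responds to translation and inversion, the following sketch compares a displacement field with its image under each operation. It does not reproduce equation 4: the periodic simple-cubic grid of sites, the integer-site translations standing in for centering vectors, and the function names msd_translation and msd_inversion are all assumptions made for this example.

```python
import numpy as np

def msd_translation(u, shift):
    """Mean-square difference between a field and its translated image.

    u: (n, n, n, 3) displacement vectors on a periodic grid of ideal sites.
    shift: translation in units of grid sites along each axis.
    """
    moved = np.roll(u, shift=shift, axis=(0, 1, 2))
    return np.mean(np.sum((u - moved) ** 2, axis=-1))

def msd_inversion(u):
    """Mean-square difference between a field and its inverted image.

    Inversion through site (0, 0, 0) sends site r to -r (mod n) and
    reverses the sign of each displacement vector.
    """
    at_minus_r = np.roll(np.flip(u, axis=(0, 1, 2)), shift=(1, 1, 1), axis=(0, 1, 2))
    return np.mean(np.sum((u - (-at_minus_r)) ** 2, axis=-1))

n, delta = 4, 0.1   # hypothetical grid size and displacement magnitude

# Uniform polar field: every site displaced by +delta along z.
u_polar = np.zeros((n, n, n, 3))
u_polar[..., 2] = delta

# Staggered field: the z displacement alternates in sign along x.
signs = (-1.0) ** np.arange(n)
u_stag = np.zeros((n, n, n, 3))
u_stag[..., 2] = delta * signs[:, None, None]

polar_t = msd_translation(u_polar, (1, 0, 0))   # translation is preserved
polar_i = msd_inversion(u_polar)                # inversion is broken
stag_t1 = msd_translation(u_stag, (1, 0, 0))    # one-site translation broken
stag_t2 = msd_translation(u_stag, (2, 0, 0))    # two-site translation preserved
```

A measure of zero indicates that the displacement field is invariant under the operation, while a value of 4 delta squared corresponds to every displacement being reversed. Histograms of such values over many ensembles would then show which operations are broken most strongly.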
The crystal structure of Bi2Ti2O7 presents a very complex problem as the unit cell contains 88 atoms, resulting in 264 degrees of freedom or 264 distinct tight-binding modes to describe all atom displacements. When trying to analyze a large ensemble simulation of this structure, analysis in Cartesian coordinates becomes unwieldy. Decomposition of the structure into the crystallographically relevant local basis allows one to determine the highest amplitude disorder in the lattice, the distribution of amplitudes, the direction of the atomic displacements causing the disorder, and how the disorder breaks specific symmetry elements of the crystallographic space group and by how much.

Histograms of the mean-square displacements of the (a) Ti and (b) O mode sets, summed over all k, all symmetry operations and 320 ensembles, illustrating that the Ti-O sublattice is not disrupted. Only the O A1g mode shows any distribution of the MSD.

Figure 14
Histograms of the mean-square displacements of the Bi mode sets, summed over all k, all symmetry operations and 320 ensembles, illustrating that the Eu, T2u and T1u(2) representations break the symmetry operations of Fd-3m; those modes correspond to the toroidal displacements shown in Fig. 10.

Conclusions
The representational analysis of large atomistic ensembles generated from simulations (such as reverse Monte Carlo simulations of total scattering data), using a tight-binding basis derived from locally symmetry-adapted modes, is a robust method for quantifying disorder in the lattice. In many RMC simulations, the goal is to characterize subtle deviations from the lattice: displacements that are small perturbations of a lattice that already possesses a modicum of moderately isotropic thermal disorder. Therefore, isolation and quantification of the disorder (i.e. of infrequent events) requires statistical analysis. By representing the disorder with respect to a local basis of the background signal (i.e. symmetry-adapted modes of the crystallographic space group), displacements appear as a positive signal, are amplified and can be analyzed statistically. Additionally, the approach presented here provides a framework for analyzing other types of degrees of freedom, such as occupational/compositional disorder (e.g. solid solutions) or magnetism. Such a rigorous group-theoretical treatment is currently implemented in ISODISPLACE (Campbell et al., 2006).