Structure determination from unindexed powder data from scratch by a global optimization approach using pattern comparison based on cross-correlation functions

A new method for the structure determination of molecular crystals from unindexed powder data has been developed and successfully applied. The method performs a global optimization using pattern comparison based on cross-correlation functions.


Introduction
Structure determination from powder diffraction data (SDPD) is an important technique for the investigation of crystalline solids (David et al., 2002; David & Shankland, 2008; Harris, 2012; Černý, 2017). This is particularly true if the material cannot be prepared as a single crystal, or in cases where the structure of a powder of low crystallinity is at the centre of interest. SDPD generally starts with the indexing of the powder pattern. Reliable indexing fails if the pattern contains too few or too broad reflections or if the powder is not phase-pure (Brüning & Schmidt, 2015).
If indexing fails, the intensities of the hkl reflections cannot be extracted and reciprocal-space approaches such as direct methods cannot be applied. Also, the common direct-space methods that solve structures by translation, rotation and conformational changes of the molecules in the unit cell require the unit-cell parameters as input, e.g. in DASH (David et al., 2006; Spillman et al., 2015), FOX (Favre-Nicolin & Černý, 2004; Černý et al., 2017), EXPO (Altomare et al., 2009, 2013), TOPAS (Coelho, 2018) or MRIA (Zhukov et al., 2001).
Without knowledge of the unit-cell parameters and space group there are two main obstacles: (i) Six additional parameters (a, b, c, α, β, γ) must be determined. This is not a problem in principle, because they correspond to the reflection positions. However, it implies an enormous expansion of the search space and an increase in the required computing time.
(ii) The exploration of the search space depends on comparison of the observed pattern with the powder patterns simulated from structural models. Common direct-space methods and Rietveld refinement perform this comparison based on pointwise differences between the two curves, i.e. the intensity differences at each individual 2θ value. This approach to quantifying the agreement between the patterns, e.g. by the most commonly used χ² and R wp values (Toby, 2006), only works for changes in atomic coordinates, and for small changes in the unit-cell parameters, if the reflection positions do not shift by more than a few reflection half widths. The powder pattern is highly sensitive to even small changes in the unit-cell parameters. Hence, the comparative measure becomes meaningless if the unit-cell parameters of the structural model deviate too much from the correct ones and the simulated reflections do not overlap with the corresponding signals in the observed pattern.
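The loss of sensitivity of a pointwise measure can be illustrated with a minimal Python sketch. It is not taken from any of the programs cited above: the helper names gaussian_pattern and r_wp are hypothetical, and unit weights and a single Gaussian reflection are assumed for simplicity.

```python
import math

def gaussian_pattern(two_theta, centre, fwhm, height=1.0):
    """Single Gaussian reflection sampled on the grid two_theta (degrees)."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return [height * math.exp(-0.5 * ((x - centre) / sigma) ** 2) for x in two_theta]

def r_wp(observed, calculated):
    """Weighted profile R factor; unit weights are an assumption of this sketch."""
    num = sum((o - c) ** 2 for o, c in zip(observed, calculated))
    den = sum(o ** 2 for o in observed)
    return math.sqrt(num / den)

grid = [5.0 + 0.01 * i for i in range(1001)]      # 5 to 15 degrees 2theta
obs = gaussian_pattern(grid, 10.0, 0.2)           # "observed" reflection

# Shift the simulated reflection away from the observed one.
for shift in (0.0, 0.1, 0.5, 1.0, 2.0):
    calc = gaussian_pattern(grid, 10.0 + shift, 0.2)
    print(f"shift = {shift:4.1f} deg   R_wp = {r_wp(obs, calc):.3f}")
```

Under these assumptions, R wp reaches a plateau as soon as the two reflections no longer overlap, so it carries no information on whether a shift of 1° or 2° is closer to the correct position.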
In contrast to the intense development of global optimization methods working within the subspace determined by a given unit cell, there have been only a few attempts to apply global optimization approaches to the extensive global space beyond that. The search space contains more parameters, i.e. more dimensions, which are difficult to track. The even more crucial element is the link between model and experiment. The shape and characteristics of the multidimensional (dis)similarity hypersurface substantially affect the effectiveness and efficiency of the global search and local optimizations. Attempts at SDPD without prior indexing typically address these problems using alternative approaches to pattern comparison or using energy calculations. Hofmann & Kuleshova (2006) used a similarity index based on the distances between the normalized integral curves of the patterns for structure fitting to the powder data, starting from crystal structures predicted by force-field lattice energy minimization. Padgett et al. (2007) employed a combination of grid search and a genetic algorithm in the program OCEANA, using a combination of R wp values and force-field energy. Rapallo (2009) developed a hybrid Monte Carlo method implemented in the software VARICELLA, where coordinate changes are performed according to molecular dynamics, using a joint probability density of the potential energy and a disagreement factor that compares the Fourier transforms of the patterns. De Gelder and co-workers employed a genetic algorithm for simultaneous indexing and structure solution, using pattern matching based on cross-correlation functions in the program FIDDLE (de Gelder et al., 2008; Guguta, 2009; Smits et al., 2009). However, all these approaches have their limitations, and to our knowledge there is at present no generally applicable, well-functioning method for the structure determination of molecular crystals from unindexable powder data.
We developed a robust local optimization procedure that uses pattern matching based on cross-correlation functions for the fitting of a structural model to the experimental pattern. The approach was first implemented in the computer program FIDEL (FIt with DEviating Lattice parameters) (Habermehl et al., 2014). FIDEL fits not only the unit-cell parameters, but also simultaneously the position and orientation of the molecules and selected internal degrees of freedom. FIDEL proved to be capable of fitting significantly deviating structural models to powder data of high or low quality. In a previous paper (Habermehl et al., 2014), we described the useful and successful application of the procedure to the refinement of crystal structures, if a suitable structural model with possibly strongly deviating unit-cell parameters is available. The model may come from the crystal structures of isostructural compounds (e.g. solvates, hydrates or chemical derivatives), or from diffraction data measured at a different temperature or pressure. Alternatively, trial models can be obtained by a crystal structure prediction (CSP), e.g. a global lattice energy minimization. Typically, the simulated powder patterns of these structural models deviate significantly from the experimental data, in particular in their reflection positions. Nevertheless, the FIDEL fits were successful.

The generalized expression for similarity
To compare powder patterns and fit a crystal structure to the experimental powder pattern, FIDEL uses the generalized similarity measure S 12 , which was introduced by de Gelder et al. (2001) as a versatile similarity criterion for pattern comparison. S 12 correlates data points within a certain 2θ neighbouring range. This absolute and normalized measure is based on the weighted cross- and auto-correlation functions of the patterns to be compared. S 12 puts emphasis on the strong reflections, while being tolerant of changes in the position or shape of the reflections. It can properly recognize even rough matches, in particular with respect to signal shifts in otherwise similar patterns.
The cross-correlation function c 12 (r) of two powder patterns I 1 (2θ) and I 2 (2θ) correlates every data point of one pattern to data points at the 2θ distance of r in the other pattern,

    c_{12}(r) = \int I_1(2\theta) \, I_2(2\theta + r) \, \mathrm{d}(2\theta).    (1)

The auto-correlation functions c 11 (r) and c 22 (r) of each pattern are defined analogously. The correlation of data points, however, is restricted to a certain neighbourhood by the introduction of the triangular weighting function w(r),

    w(r) = \begin{cases} 1 - |r|/l & \text{if } |r| < l, \\ 0 & \text{if } |r| \geq l, \end{cases}    (2)

with the neighbouring range parameter l (l > 0) corresponding to the full width at half-maximum of the weighting function.

Integration of the weighted cross-correlation function leads to a single value. This value is normalized using the corresponding weighted auto-correlation functions of the two patterns, resulting in the generalized similarity measure described by de Gelder et al. (2001),

    S_{12} = \frac{\int w(r) \, c_{12}(r) \, \mathrm{d}r}{\left[ \int w(r) \, c_{11}(r) \, \mathrm{d}r \int w(r) \, c_{22}(r) \, \mathrm{d}r \right]^{1/2}}.    (3)

Generally, S 12 can adopt values between −1 and 1. In the case of powder patterns with positive intensities, S 12 adopts values between 0 and 1, where S 12 = 1 corresponds to identical patterns. A schematic illustration of S 12 is shown in Fig. 2 of Habermehl et al. (2021).
The similarity measure can be adapted to the specific characteristics of a problem by varying the neighbouring range parameter l. A large value of l allows the treatment of patterns with strongly deviating reflection positions. Narrowing the weighting function by decreasing l, on the other hand, leads to a more accurate comparison, which is useful for already very similar patterns. The limit of S 12 as l approaches 0 leads to a pointwise comparison of the two diagrams that corresponds to the Pearson correlation coefficient (de Gelder et al., 2001; Habermehl et al., 2021). The values of S 12 (l = 0) based on the full 2θ range of the experimental data are denoted here as the reference similarities S 0 12 (including the background on both sides) and S 0 12;bc for background-corrected patterns. S 0 12 and S 0 12;bc are employed as reference values for the comparison of results obtained with different values of l and different 2θ comparison ranges.
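The effect of the neighbouring range l can be explored with a small discrete sketch of the similarity measure in Python. It assumes patterns sampled on a common uniform 2θ grid and replaces the integrals by sums; the function names weighted_correlation and s12 are hypothetical and this is not the FIDEL implementation. For l = 0 the sketch simply falls back to the normalized pointwise product of the two patterns.

```python
import math
from typing import List, Sequence

def weighted_correlation(p1: Sequence[float], p2: Sequence[float],
                         step: float, l: float) -> float:
    """Discrete analogue of the integral of w(r) * c12(r) over r.

    Constant grid factors are omitted because they cancel in the ratio S12.
    """
    n = len(p1)
    k_max = int(l / step) if l > 0 else 0
    total = 0.0
    for k in range(-k_max, k_max + 1):
        # Triangular weight w(r) = 1 - |r|/l with r = k * step.
        w = 1.0 - abs(k) * step / l if l > 0 else 1.0
        if w <= 0.0:
            continue
        lo, hi = max(0, -k), min(n, n - k)   # keep i and i + k inside the grid
        total += w * sum(p1[i] * p2[i + k] for i in range(lo, hi))
    return total

def s12(p1: Sequence[float], p2: Sequence[float], step: float, l: float) -> float:
    """Generalized similarity: weighted cross-correlation normalized by
    the weighted auto-correlations of both patterns."""
    c12 = weighted_correlation(p1, p2, step, l)
    c11 = weighted_correlation(p1, p1, step, l)
    c22 = weighted_correlation(p2, p2, step, l)
    return c12 / math.sqrt(c11 * c22)

def gaussian(grid: Sequence[float], centre: float, fwhm: float) -> List[float]:
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return [math.exp(-0.5 * ((x - centre) / sigma) ** 2) for x in grid]

step = 0.02
grid = [5.0 + step * i for i in range(501)]       # 5 to 15 degrees 2theta
observed = gaussian(grid, 10.0, 0.2)
shifted = gaussian(grid, 10.5, 0.2)               # same reflection, shifted by 0.5 degrees

for l in (0.1, 0.5, 2.0):
    print(f"l = {l:3.1f} deg   S12 = {s12(observed, shifted, step, l):.3f}")
```

In this toy case a narrow weighting function barely registers the shifted reflection, whereas a broad one still recognizes the rough match, which is the behaviour exploited for preselection and for fits with deviating unit-cell parameters.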
The similarity measure S 12 can be used for: (i) The comparison of two crystal structures, regardless of their chemical composition or crystal symmetry (Macrae et al., 2008; Sacchi et al., 2020).
(ii) The comparison of a structural model and an experimental powder pattern.
(iii) The comparison of two experimental powder patterns.
(iv) The selection of a peak shape function and optimization of its full width at half-maximum (FWHM), based on the comparison of an experimental powder pattern with simulated patterns derived from an arbitrary list of signal positions and intensities (see Section 3.1).
(v) Local optimization by fitting a crystal structural model with possibly strongly deviating unit-cell parameters to an experimental powder pattern (Habermehl et al., 2014).
(vi) The clustering of similar structures by comparison of their simulated powder patterns (de Gelder et al., 2001) or by fitting structural models to simulated patterns (this work).
(vii) Automatic peak alignment of a set of in situ X-ray powder diffraction patterns using the maximization of the similarity of experimental powder patterns (Guccione et al., 2018).
(viii) The clustering of large lists of experimental powder patterns.
(ix) The screening of lists of structural models (e.g. from CSP or a database) by (a) comparison (de Gelder, 2006) or (b) fitting against an experimental powder pattern (Habermehl et al., 2014; Neumann, 2016) (this work).
(x) Global optimization approaches to SDPD from scratch using (a) comparison (de Gelder et al., 2008; Guguta, 2009; Smits et al., 2009) or (b) comparison and local optimization of structural models (this work).
S 12 can also be used to compare two pair distribution functions (PDFs). Correspondingly, the above-mentioned applications are possible either by comparing and fitting to powder patterns or by comparing and fitting to PDFs (e.g. Schlesinger et al., 2021).

Crystal structure fitting
The similarity measure S 12 is used by FIDEL for the fit of a crystal structure to a powder pattern. A structural model is described by the molecular geometry and a parameter vector. The molecular geometry is described by internal coordinates given as a z matrix (see Shankland, 2004). The parameter vector contains: (i) The unit-cell parameters a, b, c, α, β, γ,
(ii) The fractional coordinates m x, m y, m z of an anchor point of the molecule or molecular ensemble,
(iii) The rotation angles φ x, φ y, φ z describing the change in spatial orientation relative to the initial orientation, and
(iv) A number of internal degrees of freedom referring to distances, angles and torsions in the z matrix.
All these parameters can be fitted. Depending on the space group, some of them may be fixed or constrained. The internal degrees of freedom account for the variation in bond lengths, bond angles, rotation of bonds and even more complex conformational flexibilities. They are also used to model structures with more than one molecule in the asymmetric unit, including solvates and ionic compounds. The capability of this approach to model concerted conformation changes is limited. Some options may require sophisticated z-matrix constructions including dummy atoms.
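One possible way to organize such a parameter vector in code is sketched below in Python. The class StructureParameters, its field names and the index-based list of fixed parameters are hypothetical illustrations and do not reflect the internal data structures of FIDEL.

```python
from dataclasses import dataclass, field
from typing import List, Sequence

@dataclass
class StructureParameters:
    """Hypothetical container for the fitted parameters of one structural model."""
    cell: List[float]        # a, b, c (angstrom), alpha, beta, gamma (degrees)
    position: List[float]    # fractional coordinates of the anchor point
    rotation: List[float]    # rotation angles relative to the initial orientation (degrees)
    internal: List[float] = field(default_factory=list)  # selected z-matrix distances, angles, torsions
    fixed: List[int] = field(default_factory=list)       # vector indices held fixed by symmetry

    def to_vector(self) -> List[float]:
        """Flatten all parameters into one vector for the optimizer."""
        return self.cell + self.position + self.rotation + self.internal

    def free_indices(self) -> List[int]:
        """Indices of the parameters that are actually fitted."""
        return [i for i in range(len(self.to_vector())) if i not in self.fixed]

    def updated(self, free_values: Sequence[float]) -> "StructureParameters":
        """Return a new model with the free parameters replaced by free_values."""
        vec = self.to_vector()
        for i, value in zip(self.free_indices(), free_values):
            vec[i] = value
        n = len(self.internal)
        return StructureParameters(vec[0:6], vec[6:9], vec[9:12],
                                   vec[12:12 + n], list(self.fixed))

# Monoclinic example: alpha (index 3) and gamma (index 5) are fixed at 90 degrees.
model = StructureParameters(cell=[10.0, 5.0, 12.0, 90.0, 105.0, 90.0],
                            position=[0.1, 0.2, 0.3],
                            rotation=[0.0, 0.0, 0.0],
                            internal=[180.0],
                            fixed=[3, 5])
free = [model.to_vector()[i] for i in model.free_indices()]
free[0] = 10.5                     # change a, leave everything else untouched
new = model.updated(free)
print(new.cell)
```

Keeping the fixed parameters outside the optimizer's vector means that symmetry constraints cannot be violated by a fitting step, whichever local optimizer is used.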
During the fitting of a given crystal structural model (starting structure) to the powder pattern, the parameter vector is altered under maximization of the similarity S 12 of the simulated powder pattern and the background-corrected experimental pattern as the cost function. The local optimization is done by a robust and customizable fitting procedure using steepest ascent, conjugate gradient or hill-climb algorithms. The best results are obtained with a modified hill-climb algorithm, although this also requires the most computing time.
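The principle of a hill-climb optimization that maximizes a similarity value can be sketched in a few lines of Python. This is a plain coordinate-wise variant with step halving, not the modified algorithm used in FIDEL; the function hill_climb, its arguments and the toy objective are assumptions made for illustration only.

```python
from typing import Callable, List, Sequence

def hill_climb(objective: Callable[[Sequence[float]], float],
               start: Sequence[float],
               steps: Sequence[float],
               shrink: float = 0.5,
               min_step: float = 1e-4,
               max_iter: int = 10_000) -> List[float]:
    """Maximize objective by trying +/- one step along each parameter in turn."""
    x = list(start)
    step = list(steps)
    best = objective(x)
    for _ in range(max_iter):
        improved = False
        for i in range(len(x)):
            for direction in (1.0, -1.0):
                trial = list(x)
                trial[i] += direction * step[i]
                value = objective(trial)
                if value > best:           # accept only uphill moves
                    x, best, improved = trial, value, True
                    break
        if not improved:
            # No neighbour is better: refine the step sizes or stop.
            step = [s * shrink for s in step]
            if max(step) < min_step:
                break
    return x

def toy_similarity(p: Sequence[float]) -> float:
    """Stand-in for S12 with a single maximum of 1 at (1, -2)."""
    return 1.0 / (1.0 + (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2)

fitted = hill_climb(toy_similarity, start=[0.0, 0.0], steps=[0.5, 0.5])
print(fitted, toy_similarity(fitted))
```

In a real fit the objective would simulate the powder pattern for every trial parameter vector, which is why the number of objective evaluations dominates the computing time.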
The simulation of powder patterns from crystal structures is done based on a common methodology. The integral reflection intensities are computed from the crystal structure by

    I_{hkl} = s \, L \, P \, A \, T \, M \, |F_{hkl}|^2,    (4)

where s is a scaling factor, L the Lorentz factor, P the polarization factor, A the absorption factor, T a factor accounting for preferred orientation effects, M the reflection multiplicity according to the crystal symmetry and F hkl the complex structure factor. The powder pattern is derived by applying a peak shape function p to each I hkl value in a given range. FIDEL implements different functions for L, P, A, T and p that can be chosen and parametrized according to the experimental conditions of the diffraction measurement [for more details see Section 2.2 of Habermehl et al. (2014)]. The characteristics of the similarity measure S 12 facilitate working with static inputs and settings for the modelling of the diffraction pattern that are not altered during the fitting procedure. Typical preparations include a reasonable background correction of the experimental pattern, the selection or configuration of the intensity correction functions in equation (4), the selection of a peak shape function p and a rough estimate of the FWHM of the reflections. It is sufficient to satisfy these requirements once before fitting the structure. Hence, the local optimization procedure is fully focused on the fitting of the small number of structural parameters. Only the FWHM value is usually slightly adjusted at the stage of a FIDEL fine fit of a structural model that already matches the experimental data quite well.
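The two steps of the pattern simulation, the product of correction factors and the squared structure-factor modulus, followed by the distribution of each integrated intensity over a peak profile, can be sketched in Python as follows. The area-normalized Gaussian profile and all function names are assumptions of this sketch; the actual correction and peak shape functions available in FIDEL are described by Habermehl et al. (2014).

```python
import math
from typing import List, Sequence, Tuple

def integrated_intensity(s: float, lorentz: float, polarization: float,
                         absorption: float, texture: float,
                         multiplicity: int, f_hkl: complex) -> float:
    """Product of the correction factors and |F_hkl|^2."""
    return (s * lorentz * polarization * absorption * texture
            * multiplicity * abs(f_hkl) ** 2)

def gaussian_profile(delta: float, fwhm: float) -> float:
    """Area-normalized Gaussian peak shape (an assumed choice of profile)."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return math.exp(-0.5 * (delta / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def simulate_pattern(grid: Sequence[float],
                     reflections: Sequence[Tuple[float, float]],
                     fwhm: float) -> List[float]:
    """Sum the profiles of all reflections, given as (position, intensity) pairs."""
    pattern = [0.0] * len(grid)
    for position, intensity in reflections:
        for j, x in enumerate(grid):
            pattern[j] += intensity * gaussian_profile(x - position, fwhm)
    return pattern

step = 0.02
grid = [5.0 + step * i for i in range(1751)]            # 5 to 40 degrees 2theta
reflections = [(10.0, 100.0), (20.5, 40.0)]             # illustrative positions and intensities
pattern = simulate_pattern(grid, reflections, fwhm=0.2)
area = sum(pattern) * step
print(f"integrated area = {area:.2f}")
```

Because the profile is normalized to unit area, the integrated area of the simulated pattern reproduces the sum of the reflection intensities, which gives a simple consistency check for any alternative peak shape function.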
FIDEL's crystal structure fitting by maximization of S 12 is particularly suitable for poorly crystalline compounds (broad overlapping reflections) or powder data of low quality (e.g. phase-impure samples, low signal-to-noise ratio). The local optimization approach has been successfully applied for automatic SDPD starting from (i) the crystal structures of isostructural compounds, (ii) crystal data measured at different temperatures, and (iii) results of CSP by force-field methods, including successful application to the powder pattern of a sample of ethyl-tert-butyl ether with significant phase impurity [see Section 6 of Habermehl et al. (2014)]. The structural models resulting from a FIDEL fit are subsequently refined by an automatic Pawley fit and Rietveld refinement sequence using the program TOPAS (Coelho, 2007) controlled by FIDEL, and finalized by a user-controlled Rietveld refinement.

Development of the global optimization method
The method for structure determination from unindexed powder patterns described by Habermehl et al. (2014) requires as input either an appropriate structural model or a list of structures, e.g. from a crystal structure prediction (CSP). The significant increase in the reliability of CSP (Neumann, 2008; Reilly et al., 2016; Neumann & van de Streek, 2018) comes at the cost of exorbitant computing time. When searching for the structure corresponding to just one experimental powder pattern it is simply not necessary to try to find all possible low-energy structures for a given compound. Hence, we transferred the global optimization approach of CSP to the direct fit of crystal structures to powder diffraction data, thus avoiding the major effort for a reliable search for structure candidates by energy minimization. Furthermore, the approach by direct fitting may reveal the existence and kind of disorder in the examined structure, as well as other effects that could be missed by the approach via the screening of CSP results.
We developed a new method for SDPD from scratch by global optimization, FIDEL-GO ('FIt with DEviating Lattice parameters - Global Optimization'), based on the method employed by FIDEL for local optimization. Here we describe this global optimization method and its implementation (Section 2). By exploiting the potential and versatility of the pattern comparison approach of S 12 , a complete framework for SDPD evolved that comprises almost all scenarios of crystal structure determination from powder data (Section 2.4).
After giving some computational (Section 3) and experimental (Section 4) details, we present applications (Section 5) of SDPD from scratch with FIDEL-GO for four powders (Fig. 1): (i) the -phase of 4,11-difluoro-quinacridone (DFQ), (ii) the -phase of 2,9-dichloro-quinacridone (DCQ), (iii) 2,9-dichloro-6,13-dihydro-quinacridone (DCDHQ) and (iv) CuCl2(pyridine)2 (CuCP). DFQ, DCQ and DCDHQ are nanocrystalline organic pigments with rigid or semi-rigid molecules. They were chosen to demonstrate how the FIDEL-GO method works, and to prove that crystal structures of medium-sized molecules can actually be determined from powder patterns with only about 15 peaks. The limitations of the method are also discussed. The coordination polymer CuCP serves as an example of a moderately flexible compound with ten internal degrees of freedom in the calculation.

Method
The feasibility and success of SDPD from scratch with FIDEL-GO are attributed to the synergy of the following concepts and approaches: (i) The similarity measure S 12 and the adaptation of its neighbouring range parameter l for comparison, fitting and clustering.
(ii) The compact description of structural models with a minimal number of variable parameters.
(iii) The robust and well considered fitting algorithms.
(iv) A suitable setup and handling of the global parameter search space.
(v) The Monte Carlo approach for exploration of the search space.
(vi) The hierarchical search strategy advancing from preselection by comparison through several steps of fitting, evaluation and selection of structure candidates to the final refinement.
(vii) A sound overall architecture based on automation, frameworking and interfacing, supported by the integration of many concepts, methods, software and data sources.
The core elements of the new method are the global optimization runs (GO) in selected crystal symmetries (space group, Z′, Wyckoff positions). The overall procedure of SDPD from scratch with FIDEL-GO consists of the following successive stages, which will be referred to by their specified acronyms: (i) GO - global optimization runs (Section 2.3, Fig. 2), yielding sets of qualified structural models.
(ii) RE - automatic re-evaluation: (a) collection, filtering and ranking of the GO results, yielding the primary result set for each crystal symmetry (RE1); (b) enhanced FIDEL fitting and clustering of structures that reach a high similarity, yielding the final results of the global optimization, a list of top-ranking structure candidates (RE2).
(iii) AR - automatic Rietveld refinement of one or more structure candidates selected by the user based on critical evaluation of the RE2 results.
(iv) DO - geometry optimization by lattice energy minimization of selected structural models using dispersion-corrected density functional theory (DFT-D), if necessary.

Inputs and settings
The following inputs and static settings are required for SDPD from scratch: (i) A background-corrected powder pattern.
(ii) A molecular geometry model (e.g. from geometry optimization).
(iii) Selection of a peak shape function and estimation of FWHM.
(iv) Settings related to instrumentation and measurement parameters.
(v) Selection of internal degrees of freedom.
(vi) Selection of the crystal symmetries and search space setup.
(vii) A 2θ range for the comparison of simulated and experimental patterns (e.g. 3-40°).
(ix) Selection and configuration of optimization algorithms and convergence criteria.
In FIDEL-GO all of these settings are supported by reasonable defaults or automated procedures for their determination or generation.

Construction of the search space
Each global optimization run is performed in a given crystal symmetry, i.e. space group, Z′ and the site symmetry of the molecule(s). Likely crystal symmetries for the search can in some cases be derived from indecisive indexing or from the symmetries of related compounds. The general approach to the selection of crystal symmetries for SDPD from scratch, however, is based on the space group statistics of the Cambridge Structural Database (CSD; Groom et al., 2016). The statistical analysis by Pidcock et al. (2003) is used to identify the most common crystal symmetries for the molecular symmetries. The selection of space groups and special positions is usually fine-tuned based on crystallographic experience.
Sensible ranges are defined for every fitted parameter and for the cell volume. Minimum and maximum values of the unit-cell parameters are derived from the spatial dimensions of the molecules. The ranges for the parameters describing the position and orientation of the molecules are set according to the characteristics of the space group and the site symmetry. The ranges for conformational degrees of freedom are set considering chemical plausibility and molecular symmetry. The range for the cell volume is set according to the estimated molar volume based on volume increments given by Hofmann (2002) or to known crystal densities of related phases or compounds. The parameter and volume range settings apply only to the starting structure and do not constrain the trajectories of local optimization runs. The local optimization with the preferred hill-climb algorithm, however, can include restraints on the fitted parameters when they approach the range boundaries.
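How parameter ranges and a cell-volume window combine when random starting cells are generated can be illustrated by a simple rejection-sampling sketch in Python. The function names, the numerical ranges and the volume window below are purely illustrative assumptions; only the general formula for the volume of a triclinic cell is standard.

```python
import math
import random
from typing import List, Sequence, Tuple

def cell_volume(a: float, b: float, c: float,
                alpha: float, beta: float, gamma: float) -> float:
    """Volume of a unit cell from its parameters (angles in degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    arg = 1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg
    if arg <= 0.0:
        return 0.0            # the three angles do not describe a valid cell
    return a * b * c * math.sqrt(arg)

def random_cell(ranges: Sequence[Tuple[float, float]],
                volume_range: Tuple[float, float],
                rng: random.Random,
                max_tries: int = 100_000) -> List[float]:
    """Draw unit-cell parameters uniformly from their ranges and reject
    cells whose volume falls outside volume_range."""
    for _ in range(max_tries):
        cell = [rng.uniform(lo, hi) for lo, hi in ranges]
        if volume_range[0] <= cell_volume(*cell) <= volume_range[1]:
            return cell
    raise RuntimeError("no cell found inside the volume range")

# Illustrative monoclinic setup: alpha = gamma = 90 degrees, beta variable.
ranges = [(3.5, 25.0), (3.5, 25.0), (3.5, 25.0),
          (90.0, 90.0), (90.0, 130.0), (90.0, 90.0)]
rng = random.Random(1)
cell = random_cell(ranges, volume_range=(500.0, 600.0), rng=rng)
print([round(x, 3) for x in cell], round(cell_volume(*cell), 1))
```

The volume check is cheap compared with a pattern simulation, so rejecting implausible cells at this point costs very little computing time.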

Global optimization
The general problem of global optimization approaches to SDPD lies in the huge amount of computing time required, even if the number of fitted parameters is comparatively small. Powder patterns are highly sensitive to small changes in the crystal structure. This is an essential advantage for SDPD, but generates a major obstacle to SDPD from scratch without prior indexing. In structure fitting to powder data the most time-consuming computational task is the simulation of powder patterns from the structural models. The characteristics of the similarity measure S 12 are very well suited to coping with this problem. S 12 facilitates the detection of a rough match of a trial structure to the powder data using a broad weighting function w(r). This allows an effective preselection of suitable trial structures by comparison. Furthermore, the similarity hypersurface is smoothed out by the use of a relatively broad weighting function, which favours fast local optimizations going in the right direction. Successive narrowing of the neighbouring range leads to a more accurate fit. The combined approach of pre-selection and local optimizations under the general regime of successively reducing an initially broad neighbouring range of S 12 is the major key to the general applicability, scalability, efficiency and effectiveness of the method. It allows a drastic reduction in the number of time-consuming pattern simulations, while still being very specific in the search for the best match. The global optimization method of FIDEL-GO takes into account these characteristics of the problem and of S 12 by employing a complex hierarchical search strategy.
The procedure of a global optimization run in a given crystal symmetry is summarized in Fig. 2. The GO run is based on the generation of random trial structures. Each random structure passes through up to four successive steps with increasing computing effort: (i) GO1 - a check of cell volume and geometry (rejection of structures with too close contacts of atoms).
(ii) GO2 - pre-selection of trial structures by comparison of the simulated and experimental patterns using a broad weighting function, yielding S start 12 .
(iii) GO3 - a fast local fit: raw structure fitting with a fast (usually conjugate gradient) optimization algorithm, yielding S opt 12 .
(iv) GO4 - a cycle of more accurate local fits with a better but more time-consuming hill-climb algorithm, a narrowing weighting function and successively stricter convergence criteria.
The conditional steps GO3 and GO4 are triggered by the two threshold levels S start;thre 12 and S opt;thre 12 , respectively, that are adjusted dynamically during the GO run so that the relative computing times for the generation and pre-selection of trial structures (GO1-GO2), the raw structure fitting (GO3) and the post-optimization cycle of more accurate fittings (GO4) are balanced. This approach ensures that enough random structures are evaluated and allows the procedure to adapt automatically to various conditions resulting from the crystal symmetry and the experimental powder data.
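The conditional cascade GO1-GO4 with dynamically adjusted thresholds can be summarized in a schematic Python sketch. Several simplifications are assumed here: the thresholds are steered towards a target pass fraction rather than by the measured computing times, the structural model is a single number, and all names (go_run, adjust, climb, toy_similarity) are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple

def adjust(threshold: float, passed: bool, target: float, rate: float) -> float:
    """Nudge a threshold so that roughly a fraction `target` of trials passes."""
    threshold += rate * (1.0 - target) if passed else -rate * target
    return min(1.0, max(0.0, threshold))

def go_run(n_trials: int,
           generate: Callable[[], float],
           plausible: Callable[[float], bool],
           similarity: Callable[[float], float],
           fast_fit: Callable[[float], float],
           accurate_fit: Callable[[float], float],
           target: float = 0.1, rate: float = 0.05) -> List[Tuple[float, float]]:
    thr_start, thr_opt = 0.0, 0.0
    results: List[Tuple[float, float]] = []
    for _ in range(n_trials):
        trial = generate()
        if not plausible(trial):                       # GO1: cheap sanity check
            continue
        passed = similarity(trial) >= thr_start        # GO2: pre-selection
        thr_start = adjust(thr_start, passed, target, rate)
        if not passed:
            continue
        fitted = fast_fit(trial)                       # GO3: fast local fit
        passed = similarity(fitted) >= thr_opt
        thr_opt = adjust(thr_opt, passed, target, rate)
        if not passed:
            continue
        refined = accurate_fit(fitted)                 # GO4: accurate local fit
        results.append((similarity(refined), refined))
    results.sort(key=lambda item: item[0], reverse=True)
    return results

def toy_similarity(x: float) -> float:
    """Stand-in for the similarity of a one-parameter model; maximum at x = 7."""
    return math.exp(-(x - 7.0) ** 2)

def climb(x: float, step: float, n_steps: int) -> float:
    """Tiny one-dimensional hill climb used for both toy fitting stages."""
    for _ in range(n_steps):
        best = max((x, x - step, x + step), key=toy_similarity)
        if best == x:
            break
        x = best
    return x

rng = random.Random(0)
results = go_run(200,
                 generate=lambda: rng.uniform(0.0, 10.0),
                 plausible=lambda x: x >= 0.5,
                 similarity=toy_similarity,
                 fast_fit=lambda x: climb(x, 0.5, 20),
                 accurate_fit=lambda x: climb(x, 0.05, 20))
print(len(results), results[0])
```

The essential design point carried over from the text is that each more expensive stage is only reached by trial structures that passed the cheaper stage before it.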
In the initial setup the search space is huge with respect to the unit-cell parameters. Although the cell-volume constraint already restricts the search to a small part of it, the search space for the unit-cell parameters is still highly redundant in terms of crystallographic equivalence. The qualified structures accumulating in the list of results all come from the local optimization (GO3-GO4) with a wide convergence radius, in particular regarding the response of the unit-cell parameters to the dominant signals in the observed pattern. Accordingly, at least the best models are found multiple times and the result set is subjected to clustering of similar structures. Moreover, the unit-cell parameter search space is populated with qualified candidates rather selectively during the global search. These structural models appear with different unit-cell settings. After the automatic cell transformation, at least some of the unit-cell parameters usually show a monomodal distribution, thus effectively implying a fuzzy indexing of the powder data. The fitted parameters of the molecules (position, orientation and internal degrees of freedom) may exhibit a more or less pronounced mono- or multimodal distribution pattern as well, e.g. due to the steric hindrance of torsions or the stacking of planar molecules. This valuable information evolving during the global search is exploited by an auto-focusing mechanism that dynamically adapts the search space based on the population analysis of structures with high S 0 12;bc . The overall procedure passes through a number of iteration steps, where
(i) The list of qualified structural models is subjected to filtering, clustering and subsequent automatic cell transformation (labelled IT1 in Fig. 2),
(ii) The search space is narrowed based on the statistical evaluation of the structure candidates found thus far (IT2),
(iii) The weighting function for the similarity computation is narrowed (IT2), and
(iv) The convergence criteria become stricter (IT2).
The search ranges are carefully narrowed, reacting primarily to pronounced monomodal parameter distributions, in particular with respect to the unit-cell parameters. Since the trajectory of local structure fits is allowed to go beyond the borders of the actual search space, the automatic adaptation can even shift or widen the search ranges. A similar approach of using dynamic boundaries driven by the evolving parameter distributions is also used, for example, in the evolutionary direct-space SDPD method of Chong & Tremayne (2006).
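The idea of adapting a search range to the population of good candidates can be sketched in Python as follows. The criterion used here for a 'pronounced monomodal' distribution (a standard deviation small compared with the current range width) and the width of the new range (median plus or minus three standard deviations) are assumptions of this sketch, not the rules implemented in FIDEL-GO.

```python
import statistics
from typing import Sequence, Tuple

def refocus_range(values: Sequence[float],
                  current: Tuple[float, float],
                  k: float = 3.0,
                  max_rel_spread: float = 0.25,
                  min_count: int = 5) -> Tuple[float, float]:
    """Propose a new search range for one parameter from the values it takes
    in the current top candidates; keep the old range if the population is
    too small or too widely spread."""
    if len(values) < min_count:
        return current
    spread = statistics.pstdev(values)
    width = current[1] - current[0]
    if spread > max_rel_spread * width:
        return current                     # not clearly monomodal: do not narrow
    centre = statistics.median(values)
    # The new range follows the population, so it may also shift
    # beyond the old boundaries.
    return (centre - k * spread, centre + k * spread)

a_values = [9.9, 10.1, 10.0, 10.05, 9.95, 10.0]       # clustered around 10
b_values = [5.0, 5.1, 4.9, 20.0, 20.1, 19.9]          # two separate groups

print(refocus_range(a_values, (3.0, 25.0)))
print(refocus_range(b_values, (3.0, 25.0)))
```

Refusing to narrow a range for a widely spread or bimodal population is the cautious choice, since cutting away one of two modes could exclude the correct structure.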
The filtering out and rejection of unsuitable structures and the evaluation of potential structure candidates is primarily based on their reference similarity S 0 12;bc (Section 1.1). Each candidate is also characterized by the weight W C , an integer value indicating the number of levels (GO1-GO3 and cycles of GO4) it has passed. The cell volume, and optionally the result of a single-point force-field energy calculation E FF , are used as additional descriptors. The clustering of structure candidates is based on the pairwise comparison of simulated powder patterns using S 12 with a narrow neighbouring range and a high similarity threshold for the grouping of structures. Every cluster is represented by the structure that exhibits the highest S 0 12;bc . The other structures of the cluster are discarded and the value of their weight descriptor W C is added to the W C of the top candidate that represents the group. Of course, all structure candidates in the list of results fit the observed pattern to a considerable extent, which is typical of low-quality experimental data. The simulated powder patterns used for structure comparison are much more detailed and a large number of medium and weak reflections allows the differentiation of structures that are actually different. Any clustering runs the risk of concealing significant differences, thus merging structures incorrectly if the criteria are too tolerant. This is also true for existing polymorphs compared via the S 12 of simulated patterns (Sacchi et al., 2020). Optionally, FIDEL-GO can use a more secure variant that tests the structural similarity by fitting the lower-ranking structural model to the simulated pattern of the top candidate. If the simulated patterns become identical, the structures are indeed duplicates. However, this is very time-consuming and normally not necessary.
The clustering is always carried out with sufficiently strict criteria, thus leading to a certain persistence of structure candidates that turn out to be equivalent at a later stage.
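The clustering scheme described above, in which the best candidate represents a group and inherits the weights of the discarded members, can be sketched in Python as a greedy procedure. The Candidate class, the threshold value and the use of a simple normalized pointwise product as a stand-in for S 12 with a narrow neighbouring range are assumptions of this sketch.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class Candidate:
    name: str
    s_ref: float            # reference similarity to the experimental pattern
    weight: int             # W_C, number of levels passed
    pattern: List[float]    # simulated powder pattern

def pointwise_similarity(p1: Sequence[float], p2: Sequence[float]) -> float:
    """Stand-in for S12 with a very narrow neighbouring range."""
    num = sum(x * y for x, y in zip(p1, p2))
    return num / math.sqrt(sum(x * x for x in p1) * sum(y * y for y in p2))

def cluster(candidates: Sequence[Candidate],
            pair_similarity: Callable[[Sequence[float], Sequence[float]], float],
            threshold: float = 0.99) -> List[Candidate]:
    """Greedy clustering: candidates are visited in order of decreasing
    reference similarity; each one either joins an existing representative
    (adding its weight) or founds a new cluster."""
    representatives: List[Candidate] = []
    for cand in sorted(candidates, key=lambda c: c.s_ref, reverse=True):
        for rep in representatives:
            if pair_similarity(rep.pattern, cand.pattern) >= threshold:
                rep.weight += cand.weight      # discarded member passes on its weight
                break
        else:
            representatives.append(
                Candidate(cand.name, cand.s_ref, cand.weight, list(cand.pattern)))
    return representatives

candidates = [
    Candidate("A", 0.90, 3, [0.0, 1.0, 0.0, 0.0]),
    Candidate("B", 0.80, 2, [0.0, 1.0, 0.0, 0.0]),   # duplicate of A
    Candidate("C", 0.85, 1, [0.0, 0.0, 1.0, 0.0]),   # different structure
]
for rep in cluster(candidates, pointwise_similarity):
    print(rep.name, rep.s_ref, rep.weight)
```

Visiting the candidates in order of decreasing reference similarity guarantees that each cluster is represented by its best-fitting member without a separate selection step.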
After the general exploration of the search space the best-ranking structural models can be re-evaluated by targeting the global optimization procedure at parameter search regions in their neighbourhood (labelled CE in Fig. 2). While being a Monte Carlo method in the first place, the global optimization method of FIDEL-GO also includes mechanisms corresponding to other optimization approaches. The hierarchy of conditional random structure evaluations and fits (GO2-GO4) is similar to simulated annealing approaches. The iterative adaptation of parameter ranges (IT2) and the re-evaluation of search-space regions in the neighbourhood of top structure candidates (CE) resemble characteristics of evolutionary algorithms.

SDPD procedure and application framework
The result sets of one or more global optimization runs (GO) in each of the selected crystal symmetries are collected, filtered and ranked, yielding the primary result set for each crystal symmetry (RE1). The standard filter criteria account for the agreement with the powder data S 0 12 , a sensible molar volume and a maximum number of candidates considered. Subsequently, the top-ranking structures of the primary results are subjected to an automatic re-evaluation procedure including enhanced fitting and clustering (RE2), yielding the final results of the global optimization by FIDEL-GO. The enhanced fitting is performed using a larger 2θ range, smaller l values (0.1-0.2°) and stricter convergence criteria. Furthermore, the fine fit may include an improved profile modelling or the fitting of additional internal degrees of freedom.
After the evaluation of similarity values, molar volumes and crystal structures, and the visual comparison of the simulated powder patterns with the experimental data by the user, the SDPD procedure proceeds with automatic Rietveld refinements (AR) of selected promising structures. The structure determination is finalized by a careful and sound user-controlled Rietveld refinement (UR).
DFT-D geometry optimizations (DO) of selected structures can be employed in order to gain valuable hints in the case of persistent ambiguities of different structural models. In particular, in the case of 'problematic' powder data the global optimization may provide several different models that are chemically sensible and match the experimental data similarly well. Crystal structures can be validated by lattice energy minimization using DFT-D, as has been shown by Neumann and van de Streek (Neumann et al., 2008; van de Streek & Neumann, 2010; van de Streek & Neumann, 2014).
The overall SDPD procedure is outlined in Fig. 3. The global optimization method is primarily targeted at SDPD from scratch. However, FIDEL-GO has evolved into an almost comprehensive application framework suitable for a wide range of application scenarios. By specific configuration of the global optimization runs, the method can easily be adapted to a variety of 'less global' applications, described below. The flowchart in Fig. 3 also shows how several auxiliary third-party components and specific adaptations of the global optimization runs fit into the hierarchy of procedures that make up the general framework.
2.4.1. Structure solution fit (SF). If the unit-cell parameters are known from indexing or from isostructural compounds, the structure solution is carried out with very narrow ranges for the unit-cell parameters. The use of narrow ranges instead of fixed unit-cell parameters takes into account the limited accuracy of the indexing results and adds some flexibility to local optimization trajectories.
2.4.2. Reduced global fit (RG). If the indexing of a powder pattern is substantially uncertain or incomplete, the procedure can be run in one or a few space groups using specific range settings for the unit-cell parameters. This is a valuable approach, e.g. for patterns which are dominated by hk0 reflections, so that a*, b* and γ* can easily be determined, whereas information on c*, α* and β* is low or even completely absent.
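The reason why a pattern dominated by hk0 reflections constrains only part of the unit cell follows from the general triclinic expression for the reflection positions in terms of the reciprocal-lattice parameters. This is a standard crystallographic relation, stated here for illustration and not quoted from the FIDEL-GO description:

```latex
\frac{1}{d_{hkl}^{2}} = h^{2}a^{*2} + k^{2}b^{*2} + l^{2}c^{*2}
  + 2kl\,b^{*}c^{*}\cos\alpha^{*} + 2hl\,a^{*}c^{*}\cos\beta^{*} + 2hk\,a^{*}b^{*}\cos\gamma^{*}
\quad\Longrightarrow\quad
\frac{1}{d_{hk0}^{2}} = h^{2}a^{*2} + k^{2}b^{*2} + 2hk\,a^{*}b^{*}\cos\gamma^{*}
```

For l = 0 every term containing c*, α* or β* vanishes, so these three parameters are the ones that need wide search ranges in a reduced global fit.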
2.4.3. Regional fit (RF). The search space can be automatically centred around a given starting structure using comparatively small parameter ranges, e.g. whenever a local FIDEL fit as described by Habermehl et al. (2014) cannot successfully capture the structure, or when the fitting to the observed pattern leads to an obviously wrong local similarity maximum. The regional fit can also be used for the validation of a structure solution (VS), i.e. to check whether a structural model is really the best match to the powder data within a narrow parameter hyperspace region (see Gorelik et al., 2021).
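Centring the search space on a starting structure amounts to building a narrow range around each parameter and drawing random trial values from those ranges. The sketch below shows one way to do this; the function names, the relative and absolute widths, and the example cell are hypothetical choices for illustration and do not reflect the actual FIDEL-GO configuration.

```python
import random


def centred_ranges(start, rel_width=0.05, abs_width=None):
    """Return {name: (low, high)} centred on each starting value.

    rel_width is a fraction of the starting value; abs_width optionally
    overrides it with an absolute half-width for selected parameters.
    """
    abs_width = abs_width or {}
    ranges = {}
    for name, value in start.items():
        half = abs_width.get(name, abs(value) * rel_width)
        ranges[name] = (value - half, value + half)
    return ranges


def random_trial(ranges, rng):
    """Draw one trial parameter set uniformly from the given ranges."""
    return {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}


# Hypothetical starting cell (angstroms and degrees), not taken from the paper.
start = {"a": 3.80, "b": 12.90, "c": 14.60, "beta": 95.0}
ranges = centred_ranges(start, rel_width=0.02, abs_width={"beta": 2.0})
trial = random_trial(ranges, random.Random(0))
print(ranges["beta"], trial)
```

The same construction covers the structure solution fit with known unit-cell parameters: very small widths keep the cell close to the indexing result while still allowing the local optimization some freedom.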

2.4.4. Screening of large sets of structural models (SC). The random trial structures that are usually the starting point of the global optimization can be replaced by a list of input structures, e.g. from a CSP (see Section 5.2.2) or sets of possibly isotypical structures of chemical derivatives, solvates or hydrates. Thus the procedure turns into a tool for the automated screening, fitting, clustering and ranking of structure candidates for an experimental pattern.

Computational details
FIDEL-GO, the global optimization approach to SDPD described above, has been implemented by extension of the program FIDEL (Habermehl et al., 2014). The program is essentially a highly configurable non-interactive command-line tool supporting simple ad hoc calls as well as complex workflows. The core executable is written in ISO C with bindings to C and C++ libraries. It supports parallel execution of time-consuming tasks on multi-core machines. Active development and application is done on various Windows and Linux systems. The program supports the construction of complex process chains using FIDEL-GO features, XSL transformations (see Section 3.2), template-based outputs and the integration of external programs. A menu-based user interface allows fast interactive work with FIDEL. The FIDEL-GO calculations require a computational effort comparable to that of a CSP with force fields. A typical full global optimization run in one space group with 10^6 random trial structures took about 12 hours (range 3-30 hours) on an Intel Core i7 at 3.2 GHz. At present FIDEL-GO can only be used by an experienced user; therefore, it is not yet included in the commercial version of the program FIDEL.

Automation and performance
Automation and performance are two crucial aspects regarding the feasibility of approaches to SDPD for practical applications. Hence, substantial effort was made to meet these requirements by appropriate architectures for the methods and computational procedures. At the methodological level, both requirements are met by employing a flexible hierarchy of scalable or adaptive procedures. At the implementation level, a key to automation was the integration of existing methods, functions, programs and software libraries, in particular by integrating established or innovative open source software in the fields of chemistry, crystallography and diffraction. At the computational level, performance is supported by parallelization of major time-consuming computations on multi-processor machines. Performance options at different levels of the general procedure include the caching of reflection lists, intensity corrections and scattering factors, neglecting of hydrogen atoms, and the limitation of the 2θ range for powder pattern simulation. The general performance perspective of distributed computing is implicitly supported by any global optimization approach and explicitly supported by FIDEL-GO and its procedures.
(ii) Determination of an appropriate peak shape function and estimation of FWHM based on analysis of the diffractogram and peak extraction by PeakSearch (Oishi-Tomiyasu, 2012), followed by optimization of the FWHM value by FIDEL-GO. The optimization is done by fitting a powder pattern simulated from the extracted peak positions and intensities to the experimental data using S12 with a small l value. This includes neither indexing attempts nor the use of a structural model.
(iii) Construction of a z-matrix representation of the molecule(s) that allows a reasonable fit of molecular arrangements and flexibilities (see Section 1.2). Degrees of freedom for rotatable bonds and for the movement of fragments can be determined automatically using the structure analysis features of the OpenBabel library (O'Boyle et al., 2011).
(iv) Determination of constraints for the structure fitting due to space group and site symmetry, supported by cctbx (Grosse-Kunstleve et al., 2002).
(v) Determination of the expected cell volume range based on volume increments according to Hofmann (2002).
(vi) Clustering, ranking and filtering of sets of structural models using S12, complemented by evaluation of other descriptors such as molar volumes and lattice energies calculated by integrated force-field routines from CRYSCA (Schmidt & Englert, 1996; Schmidt & Kalkhof, 1998).
(vii) Generation of restraints for the Rietveld refinements based on CSD statistics provided by Mogul (Bruno et al., 2004).

Figure 3
Schematic flowchart describing the application framework for SDPD with FIDEL-GO. The general overall procedure for SDPD from scratch as described in this work is indicated by bold lines and borders. Applications of the global optimization method are shown in orange. Local structure fitting is shown in green. Major decisions and actions by the user are depicted in blue. The integration of optional third-party external programs is indicated by purple boxes and dashed paths. Yellow refers to stages inheriting multiple tasks that are largely automated but need some user interaction.
Other third-party software programs used to facilitate certain tasks and to complement existing features of FIDEL-

Interfacing, data processing, reporting and visualization using XML technologies
The development of complex scientific software in the academic domain faces some major problems with regard to usability and flexibility of the software in practice, as well as perspectives for long-term development. In addition to the requirements in terms of automation and performance, it is important to provide the user with suitable means for (i) preparing, adapting and configuring the inputs, (ii) customizing project design and control, (iii) easy evaluation, visualization and further processing of the results, and (iv) interfacing with other programs. Besides the implementation of basic capabilities to define and execute complex process chains, these objectives have primarily been achieved by the use of open standards and technologies based on XML (extensible markup language):
(i) XML and CML (chemical markup language) (Murray-Rust & Rzepa, 2011) for input and output of configurations, results and chemical structures.
(ii) Interactive HTML pages and SVG graphics for reporting, visualization and publishing.
(iii) XSLT (extensible stylesheet language transformation) for data processing, interfacing and report generation.
Thus a high degree of flexibility, customizability, transparency, validatability and portability can be realized while minimizing the non-trivial installation prerequisites and requirements regarding the programming skills of advanced users.

Experimental details
4.1. X-ray powder diffraction
X-ray powder diffraction (XRPD) data of the samples of DFQ, DCQ, DCDHQ and CuCP (Fig. 1) were recorded with Cu Kα1 radiation in transmission mode at room temperature. The samples were measured between polymer films on a Stoe Stadi-P diffractometer equipped with a curved Ge(111) primary monochromator and a linear position-sensitive detector. The program suite WinXPOW (Stoe & Cie, 2006) was used for data collection. Details are provided in Section S1 in the supporting information.
Selected crystal structure models were subjected to DFT-D lattice energy minimization (Neumann, 2008; van de Streek & Neumann, 2010) with CASTEP (Clark et al., 2005), using the combination of the PBE functional (Perdew et al., 1996) and the semi-empirical dispersion correction according to Grimme (2006). The convergence criteria were set to 'fine' level (energy tolerance 10^-5 eV atom^-1, maximum force tolerance 0.03 eV Å^-1, maximum stress tolerance 0.05 GPa, maximum displacement tolerance 0.001 Å). At first the energy minimizations were performed with fixed unit-cell parameters to yield improved structural models and molecular geometry restraints for the Rietveld refinement. For further validation and energy ranking the resulting structural models were then optimized without constraining the cell dimensions.
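The four tolerances quoted above act together: a geometry optimization step counts as converged only when all of them are satisfied at once. The fragment below restates the values from the text as a plain Python dictionary and checks a set of changes against them. The dictionary keys and the function `is_converged` are hypothetical names chosen for this sketch; they are not CASTEP keywords, and CASTEP performs this check internally.

```python
# Tolerances of the 'fine' level as quoted in the text.
FINE_TOLERANCES = {
    "energy_eV_per_atom": 1e-5,
    "max_force_eV_per_A": 0.03,
    "max_stress_GPa": 0.05,
    "max_displacement_A": 0.001,
}


def is_converged(changes, tolerances=FINE_TOLERANCES):
    """Return True only if every monitored quantity is within its tolerance."""
    return all(abs(changes[key]) <= tol for key, tol in tolerances.items())


# Hypothetical values from two successive optimization steps.
early_step = {
    "energy_eV_per_atom": 4e-4,
    "max_force_eV_per_A": 0.21,
    "max_stress_GPa": 0.30,
    "max_displacement_A": 0.012,
}
late_step = {
    "energy_eV_per_atom": 3e-6,
    "max_force_eV_per_A": 0.018,
    "max_stress_GPa": 0.02,
    "max_displacement_A": 0.0004,
}
print(is_converged(early_step), is_converged(late_step))
```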

Crystal structure prediction
For DCQ a crystal structure prediction was performed in the most common space groups using the force-field program CRYSCA (Schmidt & Englert, 1996; Schmidt & Kalkhof, 1998). CRYSCA performs a global lattice energy minimization, starting from a set of 10^5-10^7 random structures. The optimized structures are ranked by energy. The structure prediction is continued until the lowest-energy structures have been found several times from different starting points. In CRYSCA a crystal structure is described in the same way as in FIDEL-GO (see Section 1.2). The intermolecular interactions were treated as a sum of van der Waals, hydrogen-bonding and Coulomb terms. For the van der Waals potentials the DREIDING parametrization in its recommended 6-exp form (Mayo et al., 1990) was used. Hydrogen-bond energies were calculated using a self-developed 10-12 potential without angle dependency, but this potential did not yield very accurate structures. The Coulomb energy was calculated from atomic charges derived by the electrostatic potential (ESP) approach (Chirlian & Francl, 1987).
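The three energy contributions named above can be written down for a single atom pair as follows. This is a minimal sketch, not the CRYSCA implementation: the exponential-6 expression follows the DREIDING functional form, the 12-10 hydrogen-bond expression without angular term is an assumed stand-in for the self-developed potential (whose actual form is not given here), and all parameter values in the example are hypothetical.

```python
import math

COULOMB_CONSTANT = 332.0637  # kcal mol^-1 angstrom e^-2


def exp6(r, d0, r0, zeta=12.0):
    """Exponential-6 van der Waals energy; minimum of depth -d0 at r = r0."""
    rho = r / r0
    return d0 * (
        (6.0 / (zeta - 6.0)) * math.exp(zeta * (1.0 - rho))
        - (zeta / (zeta - 6.0)) * rho ** -6
    )


def hbond_12_10(r, d_hb, r_hb):
    """Angle-independent 12-10 hydrogen-bond energy; minimum -d_hb at r = r_hb."""
    x = r_hb / r
    return d_hb * (5.0 * x ** 12 - 6.0 * x ** 10)


def coulomb(r, q_i, q_j):
    """Coulomb energy of two point charges given in units of e."""
    return COULOMB_CONSTANT * q_i * q_j / r


def pair_energy(r, q_i, q_j, d0, r0, hbond=None):
    """Sum of the contributions for one atom pair (hbond = (d_hb, r_hb) or None)."""
    energy = exp6(r, d0, r0) + coulomb(r, q_i, q_j)
    if hbond is not None:
        energy += hbond_12_10(r, *hbond)
    return energy


# Hypothetical parameters for one contact at 2.9 angstroms.
print(pair_energy(2.9, -0.4, 0.3, d0=0.1, r0=3.4, hbond=(4.0, 2.75)))
```

A lattice energy would be obtained by summing such pair terms over all intermolecular atom pairs within a cutoff, which is the quantity that a global lattice energy minimization ranks.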

Rietveld refinements
All Rietveld refinements were performed using TOPAS Academic, Versions 4.1, 4.2 and 6 (Coelho, 2007, 2009, 2016, 2018). The robust automatic refinement procedure in stage AR consists of a sequence of seven TOPAS calls configured and controlled by FIDEL. Automatic refinements of structural models with Z' < 1 were performed in subgroups with Z' ≥ 1. Restraints for bond lengths and bond angles were usually automatically derived from median values from CSD statistics provided by Mogul (Bruno et al., 2004). Additional 'flatten' restraints were applied for planar moieties. The final user-controlled Rietveld refinements in stage UR usually started from the structural models obtained by the automatic Rietveld refinements. Molecular geometry restraints for the user-controlled refinements were preferably taken from DFT-D calculations. Details appear in Section S2 in the supporting information.

SDPD of DFQ by global optimization
4,11-Difluoro-quinacridone (C20H10F2N2O2, DFQ, Fig. 1) is a non-commercial orange pigment. The corresponding chloro derivative 4,11-dichloro-quinacridone is polymorphic with four described polymorphs (Hunger & Schmidt, 2018). Its -phase crystallizes in Pbca, Z = 4 (Chung & Scott, 1971). For DFQ no crystal structures are known. The powder pattern of DFQ exhibits only about nine sharp reflections and a number of broad humps (see Fig. 4). Reliable indexing is not possible. Nevertheless, this powder pattern was sufficient to determine the crystal structure (Fig. 5) using the global optimization method of FIDEL-GO.
In the SDPD procedure the molecule was treated as a rigid body of the point group C2h. According to space-group statistics, about 95% of all molecules with C2h symmetry are located on crystallographic inversion centres (Pidcock et al., 2003). Hence, global optimization runs were performed in the space groups P-1, P21/c, Pbca and C2/c with molecules on inversion centres, and additionally in statistically common space groups with molecules on the general position (P21, P21/c, P212121), and in P1, Z = 1. Additional runs were performed in C2/m with molecules on positions with site symmetry 2/m (see Table 1). The molar volume of quinacridone derivatives is usually about 10% smaller than the predicted volumes based on the volume increments of Hofmann (2002), due to the dense stacking of the aromatic systems. Hence, the volume range V/Z for the global optimization of DFQ was set to roughly 365 Å^3 ± 10%. The stepwise evolution of the SDPD process is outlined in Table 1. After initial checks of volume and geometry (GO1), a total of about 21 000 000 trial structures were evaluated by simple comparison of the simulated and observed powder patterns (GO2). Only ~137 000 of the trial structures (0.65%) qualified and were fitted to the experimental pattern (GO3). Of these fitted structure candidates, 7828 (5.7%) also entered the post-optimization cycle of more accurate fits (GO4). An overview of the primary result structures with the highest S'12,bc values in each of the nine crystal symmetries is shown in Table 2.

Figure 4
Final Rietveld refinement of DFQ (P21/c, Z' = 0.5, model B), showing the experimental X-ray powder diagram (black dots), the simulated diagram of the refined structure (red line), the difference curve (blue line) and reflection positions (black bars).

Table 1
Evolution of the SDPD of DFQ: number of structural models at successive levels of the overall procedure.
The numbers of random structures refer to trial structures with non-overlapping molecules within the given cell volume range. To test the procedure on different computer platforms, additional GO runs were performed for all crystal symmetries, in particular for the triclinic space groups. Runs with adapted settings were performed for P21/c with Z' = 0.5 and Z' = 1 to verify that they yield the best results matching the powder data equally well. The numbers of Rietveld refined structures (AR, UR) are listed under the crystal symmetries from which they originated, irrespective of the space groups actually used in the refinement.
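The volume window and the stage-to-stage pass rates quoted for DFQ can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the function and variable names are chosen for illustration.

```python
def volume_range(centre, rel_tol):
    """Return (low, high) for centre +/- rel_tol, with rel_tol given as a fraction."""
    return centre * (1.0 - rel_tol), centre * (1.0 + rel_tol)


# V/Z window for DFQ: roughly 365 cubic angstroms +/- 10%.
v_low, v_high = volume_range(365.0, 0.10)

# Numbers of structural models at successive stages, as quoted in the text.
stage_counts = {"GO2": 21_000_000, "GO3": 137_000, "GO4": 7_828}
frac_go3 = stage_counts["GO3"] / stage_counts["GO2"]
frac_go4 = stage_counts["GO4"] / stage_counts["GO3"]

print(f"V/Z range: {v_low:.1f} to {v_high:.1f} cubic angstroms")
print(f"GO2 -> GO3: {frac_go3:.2%}")
print(f"GO3 -> GO4: {frac_go4:.1%}")
```

The window spans about 328.5 to 401.5 Å^3, and the two ratios reproduce the 0.65% and 5.7% given above, so fewer than one in 2500 of the evaluated trial structures reaches the most expensive fitting stage.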
The re-evaluation fine fit led to a list of 122 structural models (RE2). The best 61 of them, with reference similarities S'12 between ~0.98 and 0.99 in effectively three crystal symmetries, were inspected as candidates for selection by the user [see Table 3(a), and Table S2 in the supporting information]. Eighteen structure candidates were selected for the evaluation by automatic Rietveld refinements (AR), resulting in Rwp values of 7-17%.
A unique solution that is better than all the others could not be identified. Four structural models gave a similarly good fit to the experimental data and exhibited chemically reasonable packing motifs, each of them found several times with minor differences. However, the packing in each instance is considerably different:
Model B: P21/c (Z' = 0.5). Criss-cross pattern similar to the -phase of unsubstituted quinacridone (Paulus et al., 2007) (cf. Fig. 5).
Model A: P21/c (Z' = 1). Criss-cross packing motif similar to model B, but in a cell double the size of model B.

Figure 5
The crystal structure of DFQ (P21/c, Z' = 0.5, model B). The view is along [001].

Table 3
SDPD of DFQ, structural models A-D: (a) final results of the global optimization, (b) automatic Rietveld refinements, models with Z' = 0.5 refined in subgroups with Z' = 1 or in P1, (c) DFT-D geometry optimizations, energies given relative to the lowest energy of all calculations, and (d) user-controlled Rietveld refinements with molecular geometry restraints derived from DFT-D.
For more details see Tables S2, S3, S4 and S5 in the supporting information.
GoF stands for goodness of fit.
Model D: P21/c (Z' = 1). A combination of the packing motifs of models B and C that has so far not been found in any phase of quinacridone or its derivatives.
DFT-D calculations of structural models A-D with CASTEP (DO) confirmed the close correspondence of models A and B and revealed a significantly lower likelihood of models C and D [Table 3(c)].
Finally, six candidates were subjected to a user-controlled Rietveld refinement (UR). Two pairs turned out to be duplicates, leaving four different structures (A, B, C and D) [Table 3(d)]. The top ranking structure A led to a better Rietveld fit to the data than B, but at the cost of double the number of atom position parameters. Structures A and B are virtually identical. In A the molecules are located on a general position in P21/c, Z = 4, while in B they are located on an inversion centre in P21/c, Z = 2. Bearing in mind the limited quality of the diffraction data, the deviation from the higher symmetry in structure A is not significant. Hence, model B (Fig. 5, CIF file in the supporting information), having the higher symmetry, should be regarded as the correct one. Its final Rietveld plot is shown in Fig. 4. The hypothesis that B is the correct structure was later confirmed by further investigations, including solid-state NMR measurements, alternative DFT-D calculations and fits to the pair distribution function (Schlesinger et al., 2022).

SDPD of DCQ by global optimization or by screening of CSP results
2,9-Dichloro-quinacridone (C20H10Cl2N2O2, DCQ, Fig. 1) is an industrial red pigment used for automotive coatings (Hunger & Schmidt, 2018). The structures of theand -phases, which are formed by high-temperature recrystallization or sublimation, are known from single-crystal X-ray analyses (Senju et al., 2005a,b). The crystal structure of the -phase, which is formed during the synthesis, is hitherto not known. The -phase is inherently nanocrystalline (see Fig. 6). The pigment is nearly insoluble in all solvents, even at elevated temperatures. All recrystallization attempts failed. It was not even possible to improve the crystallinity: under mild conditions the powder pattern of the sample did not improve, while under harsher conditions the phase changed to the more stable -phase.
The XRPD pattern of the -phase contains only about 14 reflections and cannot be indexed reliably. Here we present the structure determination of the -phase using FIDEL-GO in two different ways: (i) by structure solution from scratch with a global FIDEL-GO fit and (ii) using FIDEL-GO to screen the results of CSP (see Section 4.3).
5.2.1. SDPD from scratch by global optimization. The global optimization runs were performed in a similar way to that described for DFQ (see Section 5.1). The best primary result structures in each of nine crystal symmetries (RE1) are shown in Table S6 in the supporting information. The final results of the global optimization (RE2) yielded 60 structure candidates with S'12 values between 0.97 and 0.99 (see Table 4). The best ranking structure solution with a reference similarity S'12 of 0.987 was found in P1 (Z' = 1) and in P-1 (Z' = 0.5). The second best model, also in P-1 (Z' = 0.5), had an S'12 value of 0.981, but an unrealistically low molar volume.
The evolution of the SDPD of DCQ from the initial trial structure to the final Rietveld refinement is presented in Table 5. The best-ranking structure from the global optimization with FIDEL-GO showed a reasonable molar volume and packing, and matched the experimental data significantly better than all the other candidates. The DFT-D geometry optimization (DO) of this structure also indicates its correctness. The structure changed only slightly during geometry optimization with fixed unit-cell parameters (see Fig. S1 in the supporting information). The final user-controlled Rietveld refinement (UR) resulted in a good fit to the powder data (Fig. 6). The structure is shown in Fig. 7 and available in the CIF file in the supporting information. The molecules are arranged in chains parallel to the [110] direction. Each molecule is connected to two neighbouring molecules via double hydrogen bonds [Fig. 7(a)]. The chains are not fully planar but exhibit steps of 1.4 Å between neighbouring molecules [Fig. 7(b)]. The final crystal data are given in the supporting information.

Table 4
DCQ: final results of the global optimization by FIDEL-GO (stage RE2).

5.2.2. SDPD by screening of CSP results. Possible crystal structures of DCQ were predicted by global lattice energy minimizations using the program CRYSCA (Section 4.3). An overview of the CSP results is shown in Table S7 in the supporting information. None of the predicted structures showed a powder diagram similar to the experimental one. A total of 2190 low-energy structures from CSP in different crystal symmetries were screened by local fitting to the powder data with FIDEL-GO (Table 6). The highest reference similarity S'12,bc was obtained for a structure candidate in P1, Z = 1 [Fig. 8(a)]. After the FIDEL fit the resulting simulated powder pattern of this structure was quite similar to the experimental one [Fig. 8(b)]. The unit-cell parameters changed by up to approximately 0.6 Å or 1°, while the molecular orientation changed by up to ~3° during the fit. The example in Fig. 8 impressively demonstrates the power of a local fit with FIDEL.
The resulting structure has P-1, Z' = 0.5 symmetry and is practically identical to the structure determined by global optimization from scratch (Section 5.2.1).

Table 5
Stages in the evolution of the SDPD of DCQ for the structure that finally turned out to be the correct one.
The initial random trial structure (GO1-GO2) in P1 evolved through the fitting steps of the global optimization run (GO3-GO4) and went through automatic cell transformation and re-evaluation fine fit (RE). It was then transformed to P-1 and subjected to the final user-controlled Rietveld refinement (UR). The similarity values S12(l) refer to the comparison of the simulated pattern to the background-corrected experimental pattern based on the 2θ comparison range set for the global optimization. S'12,tp refers to the comparison of the powder pattern simulated by TOPAS and the experimental pattern. E_FF denotes the force-field energy.
Column headings: Random, Fast raw fit, Better fit 1, Better fit 2, Automatic transformation, Re-evaluation fit.

Figure 7

Figure 8
The best structure of DCQ in P1, Z = 1 from the screening of the CSP results, showing the background-corrected experimental powder pattern (dots) versus simulated patterns (red lines) of the structural model (a) before and (b) after the FIDEL fit.

SDPD of DCDHQ by global optimization
2,9-Dichloro-6,13-dihydro-quinacridone (C20H12Cl2N2O2, DCDHQ, Fig. 1) is a precursor in the industrial synthesis of DCQ (Hunger & Schmidt, 2018). Like DCQ, DCDHQ is obtained as a poorly crystalline powder, with an X-ray powder diagram that contains only about 17 reflections and cannot be reliably indexed (see Fig. 9). Because of the two sp3 carbon atoms in the central ring the molecule shows some conformational flexibility. According to the CSD, the majority of similar molecules show a tilted conformation (for details see Section S5.1 in the supporting information). For DCDHQ, two intramolecular degrees of freedom were considered, allowing for a twist and a tilt of the central ring. Global optimization runs with the two intramolecular degrees of freedom were performed in space groups P1, P21, C2/c, P21/c, P212121, Pbca and Pna21 with the molecule on the general position. Additional runs were performed in space groups P-1 and P21/c with a rigid planar molecule on an inversion centre (see Table S8 in the supporting information). The best primary result structures (stage RE1) in each of the nine crystal symmetries are listed in Table S9 in the supporting information. The 400 structural models with the highest similarity values were re-evaluated by a fine fit (stage RE2), yielding 99 structures in five different crystal symmetries with reference similarities S'12 between 0.98 and 0.99, and Rwp values in the range of 15-24% (Table 7), obtained with a simple pattern simulation (simple reflection profile, constant FWHM; see Section 1.2). Such a simple pattern simulation would not be suitable in a Rietveld refinement, but is fully sufficient for evaluation of the similarity measure S12, which depends much less on good modelling of the powder pattern concerning peak profiles, background, intensity corrections etc. Table 7 reveals that there are many structures that have similar unit-cell parameters (4, 6-7 or 16 Å, or doubled values) despite having different space groups and Z values. Apparently, these values reflect the major peak positions in the powder pattern. A similar situation is frequently observed for unsatisfactory indexing attempts of the powder pattern of a poorly crystalline sample. After thorough evaluation by the user, 16 structures were selected for the automatic Rietveld refinement (stage AR, Table S10 in the supporting information) and eight of them were subjected to DFT-D geometry optimization (stage DO, Table S11 in the supporting information). In the automatic Rietveld refinements the Rwp values dropped to roughly half of the values already achieved by FIDEL-GO, due to better modelling of peak profiles and background, and refinement of all atomic coordinates.
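For orientation, the Rwp values discussed here follow the conventional pointwise definition of the weighted profile R factor (the standard form as reviewed by Toby, 2006, given for illustration; the symbols y for the observed and calculated intensities at step i and w for the weights are the usual ones and are not defined elsewhere in this section):

```latex
R_{\mathrm{wp}} = \sqrt{\frac{\sum_{i} w_{i}\,\bigl(y_{i}^{\mathrm{obs}} - y_{i}^{\mathrm{calc}}\bigr)^{2}}{\sum_{i} w_{i}\,\bigl(y_{i}^{\mathrm{obs}}\bigr)^{2}}}
```

Because every term compares intensities at the same 2θ step, Rwp responds directly to the quality of the profile and background model, which is consistent with the drop in Rwp once these are modelled properly in the Rietveld refinement.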
Finally, five structures were subjected to user-controlled Rietveld refinements (stage UR, Table 8). The two structural models A3 and B1 exhibited the lowest energy in the DFT-D geometry optimization with free unit-cell parameters. Since their unit-cell parameters had changed substantially in the DFT-D calculations, the local fitting procedure of FIDEL was applied to re-adjust the unit-cell parameters of these models to the powder pattern before the final Rietveld refinement. The crystal data for models B1Z1, B1, C1 and A3 from the Rietveld refinement and the DFT-D optimized structure of A3 are available in the supporting information.
The powder pattern is dominated by 0kl reflections (h0l for model C1). All information about a*, β* and γ* is buried in the broad group of peaks at 24-28°. Hence, different unit cells match the pattern similarly well. All models are chemically sensible and contain chains of molecules connected by double hydrogen bonds. Models A1, A3 and A4 are quite similar and contain chains with steps, as in DCQ. Model B1 contains wavy chains. In model C1 the chains run in two different directions. Model B1 gives the best fit, but at the expense of double the number of atomic parameters. Model B1 was transformed from P1, Z = 2 to P1, Z = 1, yielding model B1Z1. After a careful Rietveld refinement the fit was quite good (Fig. 9). Nevertheless the hydrogen-bond topology remains questionable. Even with DFT-D calculations it remains unclear which is the correct structure. A detailed discussion, including Rietveld plots and figures of the structures, is provided in Section S5 in the supporting information.
This example shows both the power and limitations of FIDEL-GO's global optimization approach: even with very poor powder data, FIDEL-GO is able to find the crystal structures that match the diffraction data. However, it may happen that the Rietveld refinements do not allow the identification of the correct structure. In such cases, additional information is required, e.g. from electron diffraction (Gorelik et al., 2009, 2021), elaborate solid-state NMR (Bryce & Taulelle, 2017; Schlesinger et al., 2022) or PDF analyses (Billinge, 2019; Schlesinger et al., 2021).

Table 8
DCDHQ: user-controlled Rietveld refinements (stage UR).
Models B1 and A3 started from the lowest-energy structures of the DFT-D calculations after re-adjustment of the unit-cell parameters by FIDEL fitted to the experimental pattern. Models C1 and A4 started from the final global optimization results (Table 7). Model A1 started from the best structural model of the automatic Rietveld refinement (see Table S10 in the supporting information) after (re)transformation from P1 to P-1. Model B1Z1 started from B1 after transformation to P1, Z = 1. The Rietveld refinement of B1Z1 was performed with different settings, hence the R and GoF values cannot be compared with the other refinements. Molecular geometry restraints were derived from DFT-D.

SDPD of CuCP by global optimization
Dichloro-bis(pyridine-N)copper(II) {[CuCl2(C5H5N)2]n, CuCP, Fig. 1} is a member of a series of coordination polymers which we reported recently (Krysiak et al., 2014; Zhao et al., 2017; Heine et al., 2018, 2020). The compound consists of infinite copper-halogen chains of trans-edge-sharing distorted octahedra. It was used to test FIDEL-GO's capabilities to work with different types of intramolecular degrees of freedom. The sample was of sufficient crystallinity, but the measured X-ray powder data suffered from a low signal-to-noise ratio. Therefore, the powder pattern was smoothed by FIDEL-GO using PeakSearch (Oishi-Tomiyasu, 2012).
The molecular fragment CuCl2(C5H5N)2 is shown in Fig. 10. The geometry of the pyridine ligand and reasonable ranges for the Cu-N and Cu-Cl bond lengths were derived from similar complexes and Mogul statistics of structures in the CSD. Global optimization runs with the fragment CuCl2(C5H5N)2 on a general position and ten internal degrees of freedom (Fig. 10) were performed in space groups P1, P21, C2/c, P21/c and P212121. Additional runs with copper on a crystallographic inversion centre were performed in P-1, C2/c, P21/c and Pbca using the fragment CuCl(C5H5N) with four degrees of freedom, i.e. the Cu-N and Cu-Cl distances, the N-Cu-Cl angle and the rotation around the Cu-N bond (see Table S12 in the supporting information).
Global optimization by FIDEL-GO resulted in a number of top ranking structure candidates in P21/c (Z' = 0.5) and P21 (Z' = 1). An overview of the best primary result structures (stage RE1) in each of the nine crystal symmetries is shown in Table S13 in the supporting information. The final results of the global optimization (stage RE2, Table 9) show the high reliability and accuracy of the FIDEL-GO fit. The three structures with the highest S'12 were subjected to automatic Rietveld refinements (stage AR, Table S14 in the supporting information), evaluated by DFT-D calculations (stage DO, Table S15 in the supporting information) and finally subjected to a user-controlled Rietveld refinement (stage UR) to the original unsmoothed powder data. All three structures proved to be practically identical. The evolution of the final structure is shown in Table 9 and the Rietveld plot in Fig. 11. Although the search fragments contained only an incomplete coordination sphere with only four of the six ligands, all final structures exhibited the correct polymeric structure with octahedrally coordinated Cu atoms.
The crystal structure determined here from scratch from powder data with a low signal-to-noise ratio is in excellent agreement with the structure determined by Morosin (1975) from Mo Kα single-crystal data (see Table 9, and structure overlay in Fig. S5 in the supporting information). This demonstrates the capability and consistency of FIDEL-GO's global optimization approach to structure determination from low-quality powder data, even with a flexible coordination complex.

Table 9
CuCP: final results of the global optimization by FIDEL-GO (stage RE2).
For the best structure candidate the automatic Rietveld refinement (AR) and the final Rietveld refinement (UR) after transformation to P21/c (Z = 2) are shown, together with the published reference structure (CSD refcode PYRCUC02) after transformation from P21/n (Z = 2) to P21/c (Z = 2) for comparison. The S'12 values given for the Rietveld refinements refer to the comparison of the powder pattern simulated by TOPAS to the smoothed experimental pattern (AR) or the original experimental data (UR).

Figure 10
CuCP: a fragment on a general position with internal degrees of freedom for bond lengths (black), angles (red) and torsions (violet).

Conclusions
A method for the ab initio determination of organic and metal-organic crystal structures from powder data without prior indexing has been developed and implemented in the program FIDEL-GO. The global optimization approach uses the similarity measure S₁₂, which is based on weighted cross-correlation functions, for ranking, fitting and clustering of trial structures. SDPD from scratch requires only a reasonable molecular geometry and a general setup of the global search space in selected crystal symmetries. The unit-cell parameters, molecular position and orientation, and selected internal degrees of freedom are fitted simultaneously to the powder pattern.
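The definition of S₁₂ is not repeated in this section. As an illustration only, the following sketch assumes a commonly used form of a weighted cross-correlation similarity: the cross-correlation of two patterns is weighted with a triangular function of half-width `width` and normalized by the two weighted autocorrelations. The function names, the Gaussian toy patterns and all parameter values are hypothetical and are not taken from FIDEL-GO.

```python
import numpy as np


def gaussian_pattern(two_theta, centre, fwhm=0.1):
    """Toy powder pattern: a single Gaussian reflection on a 2-theta grid."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-0.5 * ((two_theta - centre) / sigma) ** 2)


def weighted_correlation(f, g, step, width):
    """Sum over shifts r of w(r) * c_fg(r), with w(r) = 1 - |r| / width.

    Assumes f and g are sampled on the same equidistant grid.
    """
    n_shift = min(int(round(width / step)), len(f) - 1)
    total = 0.0
    for k in range(-n_shift, n_shift + 1):
        weight = 1.0 - abs(k) * step / width
        if weight <= 0.0:
            continue
        if k >= 0:
            corr = np.dot(f[: len(f) - k], g[k:])   # sum_i f[i] * g[i + k]
        else:
            corr = np.dot(f[-k:], g[: len(g) + k])  # same sum for negative k
        total += weight * corr
    return total


def similarity(f, g, step, width):
    """Normalized weighted cross-correlation; equals 1 for identical patterns."""
    c_fg = weighted_correlation(f, g, step, width)
    c_ff = weighted_correlation(f, f, step, width)
    c_gg = weighted_correlation(g, g, step, width)
    return c_fg / np.sqrt(c_ff * c_gg)


if __name__ == "__main__":
    step = 0.01
    grid = np.arange(0.0, 10.0, step)
    observed = gaussian_pattern(grid, 5.0)
    shifted = gaussian_pattern(grid, 5.3)  # reflection shifted by 3 FWHM
    print("weighted cross-correlation:", similarity(observed, shifted, step, 1.0))
    print("pointwise overlap only:    ", similarity(observed, shifted, step, step))
```

With `width` equal to the step size, only the zero shift contributes and the measure reduces to a pointwise overlap, which vanishes once the reflections no longer overlap. With a wider weighting function the similarity decreases gradually with the shift, which is the property that makes such a measure usable when the unit-cell parameters are still far from correct.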
To explore the global search space efficiently and effectively, a hierarchical search strategy has been developed. The global optimization starts from a huge number of random trial structures and combines pre-selection by a rough comparison to the powder data with local optimizations of suitable candidates. The standard overall SDPD procedure of FIDEL-GO consists of three major steps: (i) the actual global optimization runs (GO), (ii) the re-evaluation of top-ranking primary results by the best possible FIDEL fits (RE) followed by automatic Rietveld refinements (AR) of promising user-selected structures, and (iii) the final identification and refinement of one or several best-matching structure candidates based on user-controlled Rietveld refinements (UR) and, optionally, DFT-D calculations (DO).
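The general pattern of such a hierarchical search (many random trials, a cheap pre-selection, local optimization of the survivors, final ranking) can be sketched on a toy problem. Everything in this sketch is an assumption made for illustration: the two-parameter model, the scoring function, the hill-climbing optimizer and all names are hypothetical and do not reproduce the selection criteria or fitting procedure of FIDEL-GO.

```python
import math
import random

TARGET = (5.0, 7.0)  # hypothetical "true" parameters of the toy problem


def toy_score(model):
    """Toy similarity in (0, 1]; 1 only at the target parameters."""
    return 1.0 / (1.0 + math.dist(model, TARGET) ** 2)


def toy_sample(rng):
    """Random trial model drawn uniformly from the search space."""
    return (rng.uniform(2.0, 12.0), rng.uniform(2.0, 12.0))


def toy_perturb(model, rng):
    """Small random change of all parameters."""
    return tuple(x + rng.gauss(0.0, 0.1) for x in model)


def hierarchical_search(sample, rough_score, fine_score, perturb,
                        n_trials, n_keep, n_steps, rng):
    """Random trials -> rough pre-selection -> local optimization -> ranking."""
    # Stage 1: many random trial models, ranked by a cheap comparison.
    trials = [sample(rng) for _ in range(n_trials)]
    trials.sort(key=rough_score, reverse=True)
    candidates = trials[:n_keep]

    # Stage 2: local optimization of each candidate (simple hill climbing;
    # a change is accepted only if it improves the score).
    results = []
    for model in candidates:
        best, best_score = model, fine_score(model)
        for _ in range(n_steps):
            trial = perturb(best, rng)
            trial_score = fine_score(trial)
            if trial_score > best_score:
                best, best_score = trial, trial_score
        results.append((best_score, best))

    # Stage 3: final ranking of the optimized candidates.
    results.sort(key=lambda item: item[0], reverse=True)
    return results


if __name__ == "__main__":
    ranked = hierarchical_search(toy_sample, toy_score, toy_score, toy_perturb,
                                 n_trials=2000, n_keep=10, n_steps=200,
                                 rng=random.Random(0))
    for score, model in ranked[:3]:
        print(f"{score:.4f}", model)
```

The pre-selection keeps the expensive local optimization to a small fraction of the trials. In this toy the rough and fine scores are the same function; in a real application the rough comparison would be a cheaper approximation.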
FIDEL-GO's robust approach to pattern comparison and structure fitting is suitable for experimental patterns of very low quality and for nanocrystalline powders. With the implementation of the global optimization method and the integration of many auxiliary components, FIDEL-GO has evolved into an almost comprehensive application framework. The elaborate multi-step procedure can easily be adapted to a wide range of application scenarios in powder diffraction. By downscaling and specific configuration of the method, it is possible to make use of additional information and to adapt the method to the specific characteristics of a problem.
The method was successfully applied to the ab initio structure determination from unindexed powder data of (metal-)organic phases consisting of small to medium-sized rigid or moderately flexible molecules. It is already viable in terms of computing time on a standard PC. With the increasing performance of common equipment and the growing availability of distributed computing environments, SDPD from scratch by global optimization should soon become a common option.
A remarkable aspect arising from the application of the method to SDPD from scratch using 'problematic' powder data is the challenge to the paradigm 'one powder - one structure'. The global optimizations may yield several solutions with a similarly good fit to the experimental pattern. Even if the method does not provide a unique solution, the obtained structural models are very valuable. On the one hand they give a very good impression of the possible crystal structures. On the other hand there are many other analytical tools available to resolve which of the structures is the correct one, e.g. computational methods such as DFT-D (as shown here), specific modelling and refinement techniques (e.g. with respect to disordered structures) or complementary experimental approaches such as electron diffraction, solid-state NMR or vibrational spectroscopy.
In order to assess the method's full scope and limitations, more rigorous evaluations on a broader scale have to be performed. This will include its application to a larger variety of actually existing examples, as well as systematic investigations based on specifically designed sets of simulated powder patterns. The method can complement existing approaches as a useful tool for the acquisition of structural information from powder diffraction data that is otherwise difficult or impossible to obtain and is in most cases discarded due to the lack of reliable indexing. Based on our experience with FIDEL-GO and the characteristics of the approach, we expect the method will also serve well for more flexible molecules, structures with Z′ > 1, disordered structures and phase-impure samples.
Finally, it should be noted that the similarity measure S₁₂ can also be applied to pair distribution functions (PDFs). There, S₁₂ and the structure solution approach of FIDEL-GO have been successfully used for the structure determination of organic compounds from scratch by a fit to the PDF without prior indexing.