- 1. Introduction
- 2. Method description
- 3. Polymorph generation probabilities
- 4. Polymorph sampling with EVCCP
- 5. Conclusion
- A1. Theoretical description for an extended variable reference coupled to crystal polymorph configurations
- A2. Calculation of free energy differences
- Supporting information
- References
- 1. Introduction
- 2. Method description
- 3. Polymorph generation probabilities
- 4. Polymorph sampling with EVCCP
- 5. Conclusion
- A1. Theoretical description for an extended variable reference coupled to crystal polymorph configurations
- A2. Calculation of free energy differences
- Supporting information
- References
research papers
Polymorph sampling with coupling to extended variables: enhanced sampling of polymorph energy landscapes and free energy perturbation of polymorph ensembles
aChemistry Department, Curtin University, Bentley, WA, 6102, Australia, bDepartment of Chemistry, New York University, New York City, NY, 10003, USA, cCourant Institute of Mathematical Science, New York University, New York City, NY, 10003, USA, and dNew York University-East China Normal University Center for Computational Chemistry at NYU Shanghai, 3663 Zhongshan Road North, Shanghai 200062, China
*Correspondence e-mail: eric.j.chan@curtin.edu.au
This article is part of a collection of articles covering the seventh
prediction blind test.A novel approach to computationally enhance the sampling of molecular crystal structures is proposed and tested. This method is based on the use of extended variables coupled to a Monte Carlo based crystal polymorph generator. Inspired by the established technique of quasi-random sampling of polymorphs using the rigid molecule constraint, this approach represents molecular clusters as extended variables within a thermal reservoir. Polymorph unit-cell variables are generated using pseudo-random sampling. Within this framework, a harmonic coupling between the extended variables and polymorph configurations is established. The extended variables remain fixed during the inner loop dedicated to polymorph sampling, enforcing a stepwise propagation of the extended variables to maintain system exploration. The final processing step results in a polymorph energy landscape, where the raw structures sampled to create the extended variable trajectory are re-optimized without the thermal coupling term. The foundational principles of this approach are described and its effectiveness using both a Metropolis Monte Carlo type algorithm and modifications that incorporate replica exchange is demonstrated. A comparison is provided with pseudo-random sampling of polymorphs for the molecule coumarin. The choice to test a design of this algorithm as relevant for enhanced sampling of crystal structures was due to the obvious relation between molecular structure variables and corresponding crystal polymorphs as representative of the inherent vapor to crystal transitions that exist in nature. Additionally, it is shown that the trajectories of extended variables can be harnessed to extract fluctuation properties that can lead to valuable insights. A novel thermodynamic variable is introduced: the free energy difference between ensembles of Z′ = 1 and Z′ = 2 crystal polymorphs.
Keywords: crystal structure prediction (CSP); polymorphism; enhanced sampling; lattice ensemble free energies.
1. Introduction
In-silico prediction (CSP) has gained significant interest among material engineers, chemical control specialists, and solid-state organic chemists (Chan et al., 2021; Davey & Garside, 2000; Desiraju, 1989, 2001; Hartman, 1973; Mullin, 2001; Price, 2013, 2004). However, the precise design of molecular building blocks for targeted packing motifs and desired physical properties remains a challenge in crystal engineering (Bernstein, 2008, 2002; Dunitz, 1995; Gavezzotti, 2006; Kitaigorodskiy et al., 1965; Kitaigorosky, 1973). This is due not only to the complex physical laws governing the packing of molecular crystals but also to the role of crystallization kinetics and factors such as necessitating practical experimentation as the primary means of design. While new approaches to CSP continue to emerge (Bier et al., 2021; Day et al., 2003; Neumann et al., 2008; Price, 2018; Reilly et al., 2016; Schneider et al., 2016; Yu & Tuckerman, 2011; Zhu et al., 2014), the most successful methods often remain closely guarded industrial secrets (Neumann, 2008; Hunnisett et al., 2024a, Hunnisett et al., 2024b).
Despite the wealth of crystal data in the Cambridge Structural Database (CSD), many reliable CSP approaches still rely on energy-based techniques and configurational sampling, rather than being data-driven. While machine learning has made significant strides in predicting protein structures (Jumper et al., 2021; Silver et al., 2018, 2016), the application of similar breakthroughs in CSP remains a challenge. CSP primarily relies on computational chemistry and molecular simulation techniques (Allen & Tildesley, 1987; Frenkel & Smit, 2002; Hermann et al., 2017; Parr & Weitao, 1995; Tuckerman, 2010) to bridge the gap between theory and experimental evidence.
Simulation-driven CSP methods focus on two main objectives: polymorph sampling and ranking stability. Polymorph sampling involves configuration sampling algorithms that require a molecular structure as input, which is the primary topic of this report. Ranking stability aims to accurately predict energy differences between pre-generated polymorph configurations and benefits from insights into crystal et al., 2016; Reilly et al., 2016; Yang & Day, 2021a,b; Hermann et al., 2017; Hoja et al., 2017; Wengert et al., 2021; Rossi et al., 2016).
(CasePolymorph sampling methods frequently utilize Monte Carlo (MC) sampling, basin hopping (BH), et al., 2008; Price, 2004; Yu & Tuckerman, 2011; Rosso et al., 2002; Sobol, 1977; Zhu et al., 2014; Bier et al., 2021). Most MC methods such as BH or Sobol sampling involve a subsequent molecular energy optimization from an higher energy test configuration to identify local minima. A key question pertains to how the probability of generating a structure correlates with a compound's intrinsic ability to crystallize in nature.
(MD) or evolutionary algorithms (NeumannEffective and efficient polymorph sampling algorithms must adapt as molecular systems grow in complexity, which may involve an increased number of torsional Z′). In silico polymorph screening can often be incomplete (Case et al., 2016; Sobol, 1977). CSP faces the challenge of dimensionality as the number of configuration variables increases, making exhaustive searches less feasible. To address this challenge, low-Z′ values are often used, and CSP employs pseudo-random (PR) or quasi-random (QR) sampling methods (van Eijck & Kroon, 1999; Case et al., 2016), or enhanced sampling schemes (Hasenbusch & Schaefer, 2010; Laio & Parrinello, 2002; Liu et al., 2005; Yu & Tuckerman, 2011). These techniques expedite the exploration of the configuration space, ensuring adequate sampling of a wide range of crystal structures and polymorphs.
or more molecules in the (Polymorphs often exhibit complex and diverse structures, and efficiently sampling them within a practical time frame in computational simulations can be challenging. The systems themselves are not inherently non-ergodic, but achieving ergodicity within a reasonable time frame presents a formidable challenge. This challenge also is primarily addressed with enhanced sampling techniques.
This report introduces an enhanced sampling approach, adapted from QR, BH and temperature-accelerated methods, with similarities to umbrella sampling. The method involves stepwise propagation of molecular coordinates represented by extended variables (EVs) in a heat bath, allowing a broader distribution of possible unit cells to be randomly sampled at each step. EVs are reference variables acting as a tool and, as described later, are not necessarily atomic coordinates. During each step, the EVs are held fixed, which biases a pseudo-random polymorph sampling stage. This method is referred to as `Extended Variable Coupled to Crystal Polymorph Monte Carlo' sampling (EVCCPMC or EVCCP).
The reasoning to design and test the EVCCP algorithm for the possibility of enhanced sampling of crystal structures was due to the obvious relation between extended variables being representative of a molecular configuration in the gaseous phase and the statistically biased generation of corresponding polymorphs being representative of inherent vapor to crystal transitions that can exist in nature.
This report aims to introduce and demonstrate EVCCP sampling conceptually within the context of CSP. EVCCP was trialed as part of a recent blind test (Hunnisett et al., 2024b). The specific focus of this report is to demonstrate a modification that enables replica exchange, known as the EVCCP modified replica exchange (EVCCPMRE) approach. It is worth noting that this modification conceptually allows for the exchange of non-ordinal extended variables, such as those associated with symmetry or Z′. However, this is outside the scope of this report and the authors intend to conduct a comprehensive study of EVCCPMRE, particularly focusing on the exchange of as a variable, separately as part of a future investigation.
Within the EVCCP framework, EVs exhibit ergodic behavior. This enables the calculation of thermodynamic properties pertaining to ensembles of crystal polymorph configurations. Specifically, it facilitates the calculation of Free Energy Differences (FED) between variables, such as stoichiometric ratios or different values of Z′. The latter serves as a qualitative measure of a molecule's propensity to form a polymorph with a specific Z′, distinct from a comparison of selected minimum energy polymorphs.
The system of coumarin polymorphs (Shtukenberg et al., 2017) was used as the benchmark compound for this investigation and comparisons are made between EVCCP and vanilla pseudo-random polymorph generation (van Eijck & Kroon, 1999). Coumarin was deemed ideal because it is well suited for rigid body approximation and there are experimentally known forms with different Z′ that crystallize in the same space group.
2. Method description
The EVCCP framework considers two independent atomic systems that describe identical sets of components (i.e. the cluster of Z′ molecules). One system represents the crystal polymorph and the other is a reference system containing extended variables (EVs) (Abrams & Tuckerman, 2008; Laio & Parrinello, 2002; Maragliano & Vanden-Eijnden, 2006; Ciccotti & Meloni, 2011). To elaborate, if the desire is to sample crystal polymorphs with Z′ = 4 then the EVs would represent an isolated cluster of four molecules in the gas phase. Both systems contain atomic coordinates (Ri), where R ∈ and the subscript i represents the ith atom in either system.
In EVCCP, the atomic positional coordinates (Ri) are mapped onto collective variables (CVs) for molecular centers and orientations (Euler angles). A matrix is used to describe the unit-cell parameters for the crystal polymorph (i.e. parallelepiped).
An orthogonal simulation box is used for the reference system that has a volume much greater than the volume occupied by all the atoms. X ∈ is a vector of coordinates with dimensionality (d) representing the CVs in the (d = Z′ × 6) and H is a vector representing the unit-cell parameters. H ∈ are the vector coordinates for a parallelepiped or H ≡ {a, b, c}, where a, b and c are the unit-cell vectors in Å. S are EVs which correspond with X (see Fig. 1).
The
[] describing the coupled systems isIn equation (1), U(X, H) is the surface (PES) of the crystal polymorphs, U(S) is the reference system potential and is a harmonic coupling term with a spring constant (k). For simplicity, in this study the reference system is chosen to be not self-interacting [i.e. U(S) = 0]. However, this is not at all a strict requirement and might be exploited for further investigation. The combined potential of the two systems is initially represented using
In EVCCP, the evolution of the polymorph system is adiabatically (quasi-static) decoupled from that of the reference system because each X and H are evaluated while S remains unchanged and vice-versa, i.e. the evolution or change of either system is subject to largely separate characteristic time-scales. If k = 0, the X and H are obtained by a minimization of the polymorph structure energy U(X, H) from some initial configuration (X0, H0). This final configuration will also have a generation probability P(X, H). Instead, k > 0 and X0 are instantiated with some reference coordinate S and independently , where a and b are limits on unit-cell vectors defined to specify a range of polymorph densities. The configurational energy currently undergoing minimization is denoted as U(X, H|S). This notation is employed here to represent the potential associated with the joint probability [i.e. P(S)*P(X, H|S) = P(X, H, S)], with U(X, H|S) specifically emphasizing the fixed nature of S. Exploiting the conditional probability P(X, H|S) is a conceptual underpinning of EVCCP. In the next section estimates for P(X, H|S) are made by statistical inference.
S are stepwise propagated with temperature (T)1 according to the metropolis MC updating scheme (Metropolis et al., 1953). The configuration energy used for these updates (detailed in Appendix A) is given by
Here Umin and Xmin are the respective energy and CV coordinate of the polymorph that was generated from a subset number of polymorphs (M) generated each EVCCP step which share the same fixed S coordinate (essentially argmin[U(X, H, S)]). To determine Umin the harmonic energy penalty for the mth polymorph must be taken into account. Umin and Xmin are elements of subsets of {X1, X2, …, XM} and {H1, H2, … HM} under the scope of a set value for S. Thus,
where Uadb(Xm, Hm|S) represents the unbiased energy component of the mth polymorph generated adiabatically by using the biased PES U(X, H|S) and X0 = S. This strategy involving optimization of multiple polymorphs to obtain Umin as part of each EVCCP step is likened to the particle swarm approach (Kennedy & Eberhart, 1995) and also has similarities with umbrella sampling (Torrie & Valleau, 1977), as demonstrated schematically in Fig. 2.
A flowchart indicating the most relevant steps for coding of the EVCCPMC algorithm is shown as Fig. 3. Initialization of the algorithm requires S0 (iter = 0, i.e. 0th iteration) which is read from a previously known or randomly selected polymorph. This Siter = S0 is input to the structure generator which is used to generate the mini-batch of M energy-minimized structures (see Fig. 2). It is the global minimum energy polymorph from the mini-batch that is used to obtain U(X, H|Siter). The next step is the metropolis update for Siter+1, i.e. generation of a test configuration Stest = Siter + ΔS. A different batch of M structures are generated using Stest as the seed EV resulting in the test configuration energy U(X, H|Stest) required for the update. This procedure is iterated (iter = iter+1) to determine each Siter until N structures have been sampled. For the algorithm to work effectively, an Ethreshold must be declared so that polymorph configurations with U(X = Stest, H) > Ethreshold can be rejected prior to the polymorph energy optimization because values for ΔS and H are generated randomly ( where B = kBT|ΔS|max) and can lead to unsuitable pre-optimization configurations. Each MC update for Stest is related to the corresponding estimate for P(X, H|Stest) i.e. the polymorphs sampled by the polymorph generator for Stest. Each step in the EVCCPMC trajectory contains both Siter, representing a node in a Markov chain of EV coordinates, as well as the collection of all M local minimum polymorphs.
3. Polymorph generation probabilities
When using PR or EVCCP sampling, the respective probabilities for polymorph generation P(X, H) or P(X, H|S) can be estimated using
where Nhits is the total number of hits obtained for a specified polymorph (X = X0, H = H0), N is the total number of polymorphs generated for a PR search and M (the mini-batch size) is the number of polymorphs generated for a EVCCP step. Some benchmark for the dependence of these ratios on sampling sizes pertaining to this work will now be demonstrated for different Z′.
P(X0, H0|S) is further evaluated through histogramming and 2D-projection of P(X0, H0|S − S0). All sampling was performed in P212121 with Z′ = 1, 2 or 3 corresponding with the known coumarin polymorphs (i.e. form V, III and IV).
3.1. Sampling of probability distributions
Fig. 4 are the results from PR [Figs. 4(a)–4(c)] and EVCCP [Figs. 4(d)–4(f)] sampling of the set of N, M = {5, 10, 20, 100, 1000} for Z′ = 3 (results for Z′ = 1, 2 are made available in the supporting information). For this part of the study only a gentle coupling (k = 2) was applied for EVCCP such that effects are mostly resulting from setting X = S. Figs. 4(a) and 4(d) compare the overall post-sampling unbiased optimized energy distributions P[U(X, H)] which for EVCCP results are denoted as P[Uunb(X, H|S)]. As expected, the estimate of these distribution functions becomes smoother as sample size (N) increases. For pseudo-random searches the P[U(X, H)] distribution appears more characteristic of the Maxwell Boltzmann distribution, the strong departure from this as shown in EVCCP sampling demonstrates the expected key differences between P(X, H) and P(X, H|S). In general, P[Uunb(X, H|S)] is bimodal exhibiting peaks roughly situated at 〈U(X, H)〉 as well as U(X0, H0).
Figs. 4(b), 4(c) and 4(e), 4(f) compare the polymorph sampling using a measure of the variance of the molecular center components in X from those components corresponding with the minimum energy polymorph that was identified in that particular sampling distribution [i.e. Xmin from Umin as shown in equation (4)]. The variance measure for molecular centers (RMSDcom) is thus defined as,
That is, polymorphs with RMSDcom = 0 represent the global minimum polymorph identified in a distribution. In Figs. 4(b) and 4(e) the RMSDcom is plotted against the harmonic biasing penalty term in equation (2) . As expected, the value of this bias (k = 2) increases with the RMSDcom. Also, the overall magnitudes of RMSDcom are much lower for EVCCP generated polymorph distributions.
Figs. 4(c) and 4(f) are plots of RMSDcom against the polymorph energy U(X, H, S) from equation (2) so that each polymorph energy includes the biasing contribution relative to S0 irrespective of how it was generated. Vertical dashed lines represent when the X was set to be S0 for coumarin form IV. Figs. 4(c) and 4(f) are the proof of the important fact that RMSDcom = [(S0, com − Xmin, com)2]1/2 can be seen to approach zero when sampling from a P(X, H|S) distribution as opposed to sampling from P(X, H).
The plots in Fig. 4 demonstrate the effect of increasing accuracy for P(X, H) or P(X, H|S) as sample sizes N, M → ∞. The comparison of Figs. 4(c) with 4(f) suggests that M can be small, in this case 5 < M < 10, so that Xmin = S0.
Raw MC generated polymorph data is commonly removed of duplicate configurations prior to stability ranking. The number of distinct or unique structures is denoted as . Fig. 5 shows bar plots of versus log(N) for the EVCCP searches [Fig. 5(a)] with versus log(M) for PR searches [Fig. 5(b)]. N, M values were ranged between [5, 2000]. Bars are stacked as triplets representing Z′ = 1, 2, 3 from left to right. The error bars show the ratio or . Markers on the right of each triplet are the corresponding P(X0, H0|S0) or P(X0, H0) estimate for each set of data [see equation (5)]. The center of each triplet is the log(N) or log(M) value for that group. As expected, the results demonstrate that and increase with Z′. Also when M = N, is lower than and converges to small values with increasing M indicating that EVCCP does generate less unique configurations than a random search (the only exception is the case of Z′ = 1 PR search). P(X0, H0|S0) estimates are higher than P(X0, H0) and in fact P(X0, H0) is negligible for Z′ > 1 which assists to confirm S will bias sampling for a specific polymorph.
3.2. Sampling of polymorph conditional probability distributions and effect of coupling (k) magnitude
To demonstrate the P(X, H|S) underlying EVCCP, the related probability distribution P(X0, H0|S0 + ΔS) was estimated from sampling [see equation (5)] with a specific deviation (ΔS where and B = |ΔS|max) from S0 (i.e. S0 + ΔS = S). It is assumed that P(X0, H0|S0) will be highest (i.e. B = 0), with P(X0, H0|S) → 0 as . Differences in the curvature of P(X0, H0|S − S0) distributions for the Z′ = 1, 2, 3 example forms are mapped as 2D-projections (i.e. multi-variate → bi-variate) as the coordinate space (shown in Fig. 6) using the integer values for [P(X0, H0|S − S0)]−1 provided in Table 1. The values shown in Table 1 are directly related to the EVCCP parameter (M) required to ensure the specific polymorph can be generated. Clearly, the projections becomes narrower as Z′ increases. The reasonable fit of Table 1 values using a bi-variate Gaussian function demonstrates the suitability of the Gaussian approximation used for evaluating U(X, H|S) [see equation (3), and equations (32) and (33) in Appendix A].
|
The expected effect of increasing the spring constant magnitude (k) on P(X, H|S) distribution is an increased probability at the origin and an overall narrower distribution. This is also shown in the sampling and the comparison between k = 0 and k = 1000 is shown in Table 1 and Fig. 6 for all Z′ examples.
The actual effect of k on polymorph generation each EVCCP step is best appreciated using Fig. 7. In Fig. 7 the reference EV are the same as in Table 1 only that M = 20 polymorphs are generated with either k = 10 or k = 1000. The resulting structures are arranged as a multi-structure projection down b with the origin as the center of the The difference between weak versus strong coupling can be clearly seen in that strong coupling results in fewer variants of polymorphs and that displacement from the reference system coordinates (molecules with green outline) between polymorphs is negligible.
4. Polymorph sampling with EVCCP
The EVCCP sampling threads were modified to enable replica exchange updates, following established principles in the field (Tuckerman, 2010; Frenkel & Smit, 2002). In this modified replica exchange (EVCCPMRE) approach, only the vector S traverses between replicas with different temperatures (T). This modification carries a significant implication: the configurational extended variables within S can encompass non-ordinal variables that influence structural energy and polymorph generation but remain invariant with respect to other components of S during the update moves within each T bath. In other words, these CSP search variables may lack obvious analytical derivatives, such as those linked to symmetry or Z′. A thorough evaluation of EVCCPMRE involving exchanges of variables such as is outside the scope of this investigation. Thus, the PES sampled in this report will be based on the marginal probabilities, given that variables like and Z′ are held constant in each MC run.
Fig. 8 demonstrates how Scom fluctuate and diffuse about some local minima with a variance relative to T.
Tests of polymorph screenings using EVCCPMRE with coupling k = 1000 were performed for 13 space groups for both Z′ = 3, 4 and compared with analogous unbiased PR searches (N = 8000). Unless otherwise specified the initial configuration for seeding the MRE S0 was generated using a smaller preliminary random search (N = 100). Bath T were chosen based on the condition that the acceptance rate for exchange moves should be close to 0.5. Baths were set at 263 K, 370 K, 574 K and 1142 K (roughly exponentially spaced). Since commonly P(X0,H0) < [8000]−1, each search is not considered exhaustive, thus identifying the same global minima during comparison is not a requirement for enhanced sampling. In principle, the high T bath acts to scan for new global minima using large displacements in EV space whereas the lower T baths can harvest information about polymorphs in surrounding super basins (Yang & Day, 2021a).
In Fig. 8(a) the molecular center EVs are plotted in projection down the x-axis of the reference system with the initial positions for Scom, 0 indicated using black squares. Fig. 8(d) plots energy output U(X, H|S) stepwise from a test simulation. The trajectories are from simulation with 200 steps of MC (no replica exchange) compared with MRE. The trajectory data shown for Fig. 8 was for Z′ = 3 searches in P212121 starting with a predetermined global minimum polymorph that has U(X, H) = 103.8 kJ mol−1 (iso-structural with coumarin form IV). In the 370 K MC run, the final EV position for Scom regenerates the initial polymorph [shown in Fig. 8(b)], whereas in the 370 K MRE bath this is not the case, the resulting structure is shown in Fig. 8(c) with overlay of the and EVs (red circles) in Fig. 8(a). The effect of MRE is clear upon inspection of Fig. 8(a) with an example indicated using a black arrow. In contrast when exchange is disabled, EVs in comparative reference system do not have same degree of spacial coverage in the same number of MC steps.
4.1. EVCCPMRE variant schemes
EVCCPMRE was implemented using the Python interpreter as wrapper code to drive a modified version of the codes for the UPACK (van Eijck & Kroon, 1999). Variations of EVCCPMRE were tested in order to identify if certain modifications of the workflow could further enhance the sampling and subsequent screening results. Despite the utilization of UPACK code for this work, these concepts are transferable and can be coded using many other publicly available generators. The basic MRE algorithm as previously described is referred to as MRE0 (baths = 4, k = 1000, steps = 200, cycles = 1, M = 10, N = 8000).
generatorA variant algorithm, MRE1, evaluates the addition of a history dependent biasing potential as a Gaussian kernel (Laio & Parrinello, 2002) placed along the EV path in attempts to further enhance sampling. In the MRE1 scheme, the historical biasing potential adds an energetic penalty to U(X, H|S) if the EV test-state S had already been previously visited.
The MRE2 scheme incorporates an additional forced update move which is referred to as `forced-relaxation'. The forced-relaxation move causes the system replicas to reset the EV state of each T bath, thus descend back into a local basin that was previously detected `on-the-fly'. The update occurs at the start of each MC cycle after a set number of MC steps. MRE2 is based on a notion that the EVs re-visit a low-energy basin after some predetermined length along the sampling path trajectory. When running MRE2 the mini-batch size was reduced (M = 5) so that twice as many MC steps will be run (baths = 4, steps = 40, cycles = 10, M = 5, N = 8000).
The final modification, MRE3 tests the performance of a global forced-relaxation resetting which occurs after several MC-cycles during the overall search (termed `relaxation-restart'). The MRE3 is similar to an MRE2 update, yet differs in that all polymorphs over a number of cycles from all replicas are evaluated for the unbiased [U(X, H)] global minimum required for restarting the MRE. This takes longer to run because the basic MRE0 does not perform structure optimizations on-the-fly to absolute full convergence each MC step (full convergence occurs during post-processing), but also differs because EVs for structures based on U(X, H|S) will be affected by the harmonic coupling k.
4.2. EVCCPMRE efficiency and comparison with pseudo-random search method
To facilitate a stepwise comparison of the sampling performance between variant searches, we devised a representative metric for the et al., 1989; Whitfield et al., 2002). The choice was an average energy for the window of 20 lowest energy ranked structures made relative by using a mean-shifted value (i.e. 〈U〉T20 − 〈U〉). Monitoring of the relative 〈U〉T20 − 〈U〉 parameter was calculated stepwise from each set of search statistics as a function of number of polymorphs generated (N, corresponding with the number of MC steps). Evaluation of overall search performances was made using typical energy versus density landscapes from unbiased structure energy optimization of resultant distinct polymorphs. The notation E = U(X, H) is used for the unbiased polymorph energy and Emin, 〈E〉T20 and percentiles of E (0.05%, 1.0%, 5.0%) were also evaluated.
of unique configurations identified within a low-energy window. This was loosely based on earlier work with replica exchange ergodic metrics (ThirumalaiComparative landscape plots for Z′ = 3 searches (space group and C2/c) and Z′ = 4 searches (space groups P21/c and C2) are shown as Fig. 9. Corresponding examples for the 〈U〉T20 − 〈U〉 metrics plotted as a function of N are shown in Fig. 10. The comparison of the 〈U〉T20 − 〈U〉 metrics in Fig. 10 demonstrates that EVCCPMRE sampling is more efficient at sampling low energy configurations. For variant searches the metric was able to reach the same value from the corresponding PS search in a smaller number of steps. For EVCCPMRE the metric appears to always converge to values lower than for PR. This does not imply that at this stage of development EVCCPMRE is more efficient at identifying new global minima (Emin), since this would also be highly system dependant, rather that the EVCCPMRE sampling algorithm was behaving as intended, and the concept was correctly implemented.
Search metrics were evaluated over all space groups and summarized for larger intervals of N = 2000, 4000, 6000 and 8000 (see Table 2). In Table 2, the measures Rg and δEg are used to summarize comparisons over the many space groups.
Here represents whether a 〈U〉T20 − 〈U〉 metric or Emin value is from MRE or PR sampling. The Rg value is the ratio between [0,1] for when is less for MRE than for random sampling averaged over all the searches (NSG) for a particular Z′. If Rg > 0.5 it means that more MRE searches gave the lower metric.
represents an average difference, over the space groups tested, between the values being compared. The degree to which Eg < 0 represents the magnitude by which MRE searches obtained a lower metric. From Table 2, the fact that values obtained δEg are slightly lower when running the modified MC updating schemes (algorithms MRE2 and MRE3) suggests that the forced-relaxation or relaxation-restart moves are useful options.
|
Landscape plots that combine search data for all space groups are provided as Fig. 11, with corresponding Emin, 〈E〉T20 values and percentiles provided in Table 3. Interestingly two of the Emin structures for Z′ = 4 were isostructural with the experimental coumarin form I polymorph (Z′ = 1, Pca21) and represent the overall global minimum for the searches. It is likely a structure corresponding with experimental form II (Z′ = 2, P21) was also generated.
|
Interestingly, the overall search results suggest any improvements in Emin, 〈E〉T20 and percentiles made by EVCCP sampling were minimal or ill-defined. There was some expectation that enhancements may not be statistically significant (or highly system dependent) as these tests were restricted to a simple rigid body system and that N is too small (i.e. with limited MC steps the simulation EVs remain far from equilibrium behavior). In the limit of identical N in comparison, the overall coverage from EVCCPMRE was not expected to be as good as PR sampling due to the opportunistic exploration of a k biased search space which appears to sacrifice the total number of distinct hits. The results certainly re-demonstrate the robustness and acceptability of the PR and QR sampling strategies for CSP.
As expected the MRE searches do identify different polymorphs in low energy regions especially for the case of MRE2 and MRE3 for Z′ = 3. It is expected that many more might still be identified for large enough N such that the difference between the total number of distinct hits for MRE search versus PR method is reduced. Interpreting the results shown in Tables 2 and 3 is less useful for initial bench-marking or design of hyper-parameter defaults. It is very likely that EVCCP specific parameters such as k and ΔS might need further experimentation or on-the-fly adjustment especially when screening other molecular compounds with varying chemical complexity (e.g. molecules with many torsional degrees of freedom).
4.3. Benchmark for identification of rare polymorphs
Can EVCCPMRE increase the probability of generating a specific polymorph with a known intrinsically low probability of occurring from PR sampling? The candidate polymorph used for this part of our study was the experimentally identified form IV (Z′ = 3) of coumarin, which was previously reported to have a probability P(XIV, HIV) of 1 in 60000 occurring from PR searches (Shtukenberg et al., 2017).
In order to make this evaluation, a P(XIV, HIV) estimate was made by running 20 (Z′ = 3) complete EVCCPMRE searches spawned with different initial CV coordinate (N = 8000) in P212121 and counting the number of times the form IV polymorph was generated. This was recalculated analogously using the PR method where it was found the probability was closer to 1 in 58000. The probabilities from variant EVCCPMRE search options are shown in Table 4. It is remarkable that the combination of both history-dependent biasing potential () and forced-relaxation updates () results in roughly a sixfold increase in the probability of successfully generating form IV. In hindsight, from the analysis of CSP landscapes (as documented in the previous section), any enhanced sampling effect from adding in the was less prominent if not spurious. However, for generating a specified rare polymorph, the utility appears more striking.
|
4.4. Free energy calculation
4.4.1. Overview
Typically for studies in molecular simulation of crystal et al., 2003; Hoja et al., 2017; Parrinello & Rahman, 1981; Reilly & Tkatchenko, 2013; Yu & Tuckerman, 2011). This is because a realistic ranking of polymorphs is attained by accurately factoring finite temperature effects. In practice, most methods involve direct calculation of phonon spectra to determine entropic contributions from a vibrational Strategies also exist which are based entirely on sampling with MD or MC finite temperature simulations. However, all methods are both approximate and computationally expensive (Baroni et al., 2001; Martoňák et al., 2003; Reilly & Tkatchenko, 2015; Nyman & Day, 2015; Frenkel & Ladd, 1984).
the temperature dependence of the free energy difference (FED) between individual forms is of considerable interest (DayIn an EVCCPMC simulation, the EV evolution includes contributions from the ensemble of crystal polymorphs to which there is the harmonic tether k. The EV are said to be scanning the polymorph probabilities at a finite temperature and it is assumed that thermodynamic ensemble averages such as free energy differences are derivable from a collective of EVCCPMC simulation trajectories. It is believed that the concept can be validated and such a FED estimate might be useful in future as a qualitative measure. To demonstrate, as example of such an ensemble FED estimate was determined, namely the FED between nominal variables Z′ = 1 and Z′ = 2. In this study we used the free energy perturbation (FEP) method for the FED calculation (Zwanzig, 1954). FEP from ECVCCP was appealing since it is straightforward to implement, requiring the EVs as input coordinates since a modified potential for U(X, H|S) could be evaluated (see Appendix A).
4.4.2. Free energy perturbation
The FED between Z′ = 1 and Z′ = 2 can be determined using EVCCPMC as the follows.
Given
then
To do this a Z′ = 2 trajectory was generated at a specified T and the resulting EV coordinates are fed back into the polymorph generator with settings for Z′ = 1. Because a Z′ = 2 calculation will generate two sets of configurations the energy is re-weighted to account for stoichiometry.
The U(X, H|S) will be affected both by the harmonic coupling constant (k) and the displacement factor (ΔS) for the shift magnitude which generates MC test positions. For this work, multiple simulation runs were performed with a range of different ΔS and k depending on T. The deviation of MC acceptance/rejection (AR) ratio from 0.5 was used as a guide (Frenkel & Smit, 2002) to ensure simulation results were reasonable. All Z′ = 2 (P212121) runs start from a known pre-determined global minimum EV (Sg0) corresponding with coumarin form III. Multiple trajectories (steps = 10000) were generated at twelve exponentially spaced T between 25 and 388 K. Spring constants ranged from 100 to 5000 in units of kJ mol−1 Å−1 for COM displacements or kJ mol−1 °−1 for Euler angles.
4.4.3. Simulation results
As expected, it was found that at very low temperatures Umin values do not deviate much since the conditional probability P(Xg0, Hg0|Sg0) was reasonably high (up to 0.33 with k = 1000 and M = 20). Also, the EV coordinates S fluctuate about the point Xg0 typical of the behavior for a system tethered to a harmonic spring. As the temperatures are increased the biasing components of the energies were higher and EVs move away from Sg0.
The plots of 〈U〉 are shown in Fig. 12. Each system was considered equilibrated after 2000 MC steps and 〈U〉 are averaged over the last 8000 steps.
The comparison of the instantaneous U(X, H|S) for a few different trajectories is depicted in Fig. 13. A plot of the FED values (with 〈ΔU〉 representative of the associated error) between Z′ = 2 and Z′ = 1 for coumarin crystal polymorph ensembles in P212121 is shown as Fig. 14. To facilitate the FED estimate, recall that a vibrational free energy curve (Nyman & Day, 2015) can be expressed as
with
where ωi, k are phonon frequencies. A ΔFvib(T) difference between any two such curves (e.g. and can then be approximated as
Thus the estimate for the FED between two curves can be grossly simplified as a straight line with a slope (). Despite being unrelated calculations and a high degree of error, there is a correspondence (albeit coincidental) with experimental observations and DFT based phonon calculations (Shtukenberg et al., 2017). For coumarin P212121, experimental form III (Z′ = 2, E0 K = −103.977 kJ mol−1) and form V (Z′ = 1, E0 K = −102.366 kJ mol−1) have E0 K values close to the global minimum configurations (Emin). For fitting the to the data points, the ZPE difference (CZPE) was not negligible and is fit as an intercept which takes on a positive value indicating that Z′ = 2 structures are expected to be more favorable at lower temperatures. At higher temperatures a transition point at ≃ 200 K occurs. Above this transition temperature Z′ = 1 polymorphs in P212121 are predicted to be more favorable due to an entropic stabilization. This is in agreement with results identified using PBE(0)+MDB DFT-based phonon calculations that demonstrated form V should be significantly stabilized when harmonic vibrations and zero-point energies are taken into consideration (Shtukenberg et al., 2017). However, suggesting that the answer to why one polymorph is more likely to crystallize than another may be qualitatively derived by ensemble averaging of many polymorphs is an overly bold statement which still remains invalid.
5. Conclusion
A new EV-based approach to CSP has been investigated. The approach relies on harmonic coupling between the reference EV coordinate system and the PR polymorph generator. Results of comparison of EVCCPMRE versus PR based on the coumarin system showed that EVCCP does not always lead to an overall greater yield of polymorphs in the low energy high density region of a landscape. It is believed this might be attributed to the selected ΔS and k used in the evaluation but is mostly attributed to the fact that coumarin CSP can be adequately performed within a rigid molecule approximation. It is believed that larger molecular systems with many more (i.e. more rotatable bonds) would benefit from the EVCCPMRE approach in contrast to the PR method.
EVCCPMRE can be modified with history dependent biasing (), forced-relaxation () or relaxation-restart () approaches to further enhance sampling. This was evidenced from the sampling statistics for coumarin form IV.
Averaging of 〈U〉 from EVCCPMC trajectories and performing FEP is one strategy to obtain a FED between ensembles of polymorphs (i.e. Z′ or space group). The FED tested was for the propensity of either Z′ = 2 or Z′ = 1 polymorphs to crystallize and by a sheer coincidence was found to correspond with the attributed entropic stabilization at T > 300 K that lead to the discovery of coumarin form V (Z′ = 1) from melting and cooling experiments.
APPENDIX A
Theoretical framework
A1. Theoretical description for an extended variable reference coupled to crystal polymorph configurations
Generally, a configurational N molecules (Tuckerman, 2010). The is used to statistically derive thermodynamic properties. We will make a simplification and state that one subset of these microstates encompasses possible crystal polymorphs for a molecule. The configurational variables in this subset are those necessary for describing these polymorphs and can include hidden variable types representing both translational and point symmetry.
as applied in molecular simulation, represents the phase space comprising all possible microstates in a system ofLet represent this rudimentary configurational
of the crystal polymorph space for a molecular system, defined as follows:Here has components that are collective variables (CV), which are the positional coordinates for molecular centers (Å) as well as relative molecular orientations in Euler angles (°). In cases where rotations about a bond can occur X will also contain these angular coordinates. The number of components in X is d = Z′*6 + ntor where Z′ is the number of molecules in the and ntor are the number of rotating bonds. are the vector coordinates for a parallelepiped or (unit cell). H ≡ {a, b, c} where a, b and c are the unit-cell vectors in units of Å. It is worth mentioning that in general when the system volume (V) and number of molecules (N) are fixed quantities which is not the case here. The states for the system are limited to involving only a certain fixed number of molecules (Z) that can occupy a parallelepiped (unit cell) having parameters denoted as H (treated as integration variables). Thus, Z is effectively replacing the N in the formalism. For any such system Z is also related to the symmetry (SG) which is considered a hidden constraint variable, held constant, affecting the number of molecules in the (Z′). For example, would represent the configurational for all Z′ = 2 structures in all possible space groups (there are 230 space groups, however most molecular crystals will commonly manifest in a smaller subset of < 30 of these space groups). can then be further subdivided into subsets of each with a fixed respective variable [e.g. = 2, SG = P21/c)]. The 0 K surface (PES) for the system is represented with U(X, H) and is a rough multidimensional surface. U(X, H) is associated with the crystal energy landscape which is a projection of sampled U(X, H) for local minimum polymorphs and their respective densities.
Efficient, accurate and thorough sampling of is the goal of et al., 2016; Nyman & Day, 2015; Schneider et al., 2016; Chan et al., 2021; Bier et al., 2021; Yang & Day, 2021b; Gavezzotti, 2006; Hoja et al., 2017; van Eijck & Kroon, 1999; Zhu et al., 2014).
prediction. Historically there are many approaches to do this that go beyond what is necessary to understand this report, however the interested reader can consult the following articles (ReillyIn the extended variable coupled to crystal polymorph (EVCCP) scheme, a reference coordinate system with an extended variable (EV) space S is introduced and included as an integration variable in the S also represents the same CV coordinates of the molecule as X. In this study, the reference system is not self-interacting so U(S) = 0.
Energetic coupling between the two systems is introduced which can be written as
where
Equation (16) represents the of a system that couples variables X, H, and S. This coupling is achieved by introducing an additional harmonic spring term with coupling strength k that restrains X and S to take on similar values. This coupling introduces interactions between the different variables, and βS represents the temperature associated with the thermal bath of S.
However, it is crucial to emphasize that equation (16) does not imply an adiabatic separation between the systems, and the surface (PES) U(X, H, S) is distinct from U(X, H). Adiabatic separation would require that the evolution of X and H is completely decoupled from S, meaning that they evolve at different time and frequency regimes.
To handle this, the conditional probability distribution P(X, H|S) is introduced. This distribution represents the probability of observing a specific configuration of X and H given a fixed value of S. The existence of this conditional probability distribution indicates that X and H evolve in a way that is co-dependent with S, which allows for the possibility of an adiabatic separation. This conditional distribution relates to the overall joint probability distribution P(X, H, S) through the product rule of probability theory, as shown in equation (18). In general this is known as the factorization of a joint probability and equation (16) represents this in terms of a which incorporates an interaction term between co-dependent variables.
Also, because
implies the relation
can be defined.
This conditional probability distribution is a subset of the larger overall distribution P(X, H), from which useful information and properties are extracted. In essence, the introduction of P(X, H|S) within the context of equation (16) makes the goal of sampling and extracting these useful ensemble properties more tractable (see Fig. 15). It serves to handle the interactions and dependencies between variables within the system. The notation U(X, H|S) indicates a modified potential that accounts for the adiabatic decoupling of variables for the relations described in equation (20).
This approach, where variables can now be adiabatically separated and are still co-dependent, shares conceptual similarities with umbrella sampling (Torrie & Valleau, 1977), a technique used to enhance the sampling of specific regions of the configurational space.
It is possible to state the following:
The P(S) in equation (21) is a familiar sum rule from probability theory. This means that
It is then proposed that the joint probability from equation (21) may be represented as
i.e. U(X, H, S) is replaced with U(X, H|S) and S is introduced in the denominator of equation (22) as an integration variable. Note that the modified potential U(X, H|S) still remains undefined. The denominator represents another which now includes the same integration variables as equation (16).
Also the independent construction of
is made. represents a micro-canonical Uadb(X, H|S)〉M represents the average over a subset of M microstates, which in practice is estimated from the biased distribution P(X, H|S), i.e. fixed value of S. Equation (25) represents scanning over the entire phase space of X and H to identify all configurations where U(X, H) = 〈Uadb(X, H|S)〉M. This differs from a conventional micro-canonical definition in that it is representative of a `Dirac comb' that identifies the subset of configurations which match for 〈Uadb(X, H|S)〉M. The subscript adb is for `adiabatic' and is used to differentiate Uadb(X, H|S) from = which will be described later. One can also represent the difference between and as
which will be used as a tool at a later stage. The 〈and
The following reformulation of equation (16) into equation (24) by application of equation (25) is made. This is written out as
which is the general form of the ) can be written as
used for EVCCP. Equation (28where 〈X〉M denotes the average value of X that was obtained from configurations where the value of S was fixed, hence X and H are adiabatically decoupled from S [i.e. using fixed S to sample an estimate for P(X, H|S) and obtain 〈Uadb(X, H|S)〉M and 〈X〉M]. Consider now that we can expand the energetic term in equation (29) as follows:
where Uadb(Xi, Hi|S) is the energy of the ith polymorph generated from P(X, H|S) under the influence of the reference S and
In the last line of equation (30) the value Umin is introduced which is the energy of the lowest energy configuration with the harmonic penalty taken into account. Umin is obtained from sampling the M structures for fixed S. Umin enables further manipulation of equation (30) so that P(X, H|S) can be approximated with a Gaussian distribution as shown in Fig. 15 [i.e. P(X, H|S) is a Gaussian distribution centered at Xmin and Hmin)]. A Gaussian approximation can be made which is based on the square of the deviation of the variables Xi and Hi from those of the minimum energy polymorph configuration Xmin and Hmin that was identified in the mini-batch generation. Within the proposed Gaussian approximation equation (30) becomes
A Gaussian approximation implies that in equation (30) the 〈Uadb(Xmin, Hmin|S)〉M = Umin. The summation is represented as an integral with a δ function and equation (32) is rewritten as follows:
Equations (28) and (29) are then rewritten as follows:
or
As shown in Fig. 15 the utility of this approach is to approximate the probability distribution P(X, H) with many smaller Gaussian distributions which are centered at S. Because S can deviate from X based on the value of the coupling constant (k) this means that the coupling constant also acts to adjust the spread of the Gaussian distribution which affects the resolution of how well the approximation of P(X, H) can be made since this approximation is based on sub-sampling many P(X, H|S). For very large values of k a reconstruction of P(X, H) from integration of P(X, H, S) would be as close as possible to the true P(X, H). However for smaller k the reconstruction of P(X, H) is smoothed.
A2. Calculation of free energy differences
The following equations outline the scheme used for calculating the free energy difference between polymorph ensembles (Umin, βS, Z′ = 2, SG = P212121) and (Umin, βS, Z′ = 1, SG = P212121) sampled using EVCCPMC. The Zwanzig relation (Zwanzig, 1954) provides a prescription for calculating the free energy difference from the ratio between and such that
where the subscript with angle brackets denotes averaging taken with respect to the collective variables S trajectory generated using , i.e. Z′ = 2.
It is appropriate to express such a formulation with respect to the energetic coupling terms [see equation (33)] used for EVCCPMC so that and are representative of
where the superscript (a) and (b) denote which ensemble had generated the particular variable, also U(X1, X2, H12|S1, S2), U(X1, H1|S1) and U(X2, H2|S2) are lattice energies which are scaled relative to Z′. Such a workflow requires generating a MC trajectory for Z′ = 2 to obtain S1(a) and S2(a) coordinates at a specified βS. This is followed by two separate Z′ = 1 re-calculations to obtain . So for Z′ = 1 the input S is fixed and comes directly from the Z′ = 2 calculation. The S(a) will experience a and forces created by X(b) and H(b) as it is coupled with the conditional distribution of polymorphs that can be generated using the fixed S(a), i.e. P(X(b), H(b)|S(a)). The corresponding energy is evaluated as U(X(b), H(b)|S(a)) + k(X(b) − S(a))2.
APPENDIX B
Further computational details
The algorithm for pseudo-random (PR) sampling of polymorphs and structure optimization was made available as part of the program UPACK (van Eijck & Kroon, 1999). The chosen force field parameters and charge assignments comply with the generalized amber force field (Wang et al., 2004).
An initial polymorph description or `test structure' as atomic positions and unit-cell parameters (Cartesian coordinates) or as CVs (X and H) is randomly sampled from a uniform distribution. The initial position in the polymorph landscape is then subjected to a density test so that the unit-cell parameters are sensible and within specified tolerances. Given the density test is satisfied, gradient descent moves are made based on the chosen force field and the structure is energy minimized and becomes a local minimum on the 0 K PES. If at any point within the optimization part workflow the structure energy goes above a certain threshold then the test structure is rejected.
In practice the P(X, H) associated with an identified polymorph at the local minimum depends on the boundaries associated with the gradients surrounding X and H. This probability is complex and not a uniform distribution. P(X, H) from PR sampling considers polymorphs as local minima on the PES and must have some dependence on the U(X, H). Generally, P(X, H) will take on a shape that is difficult to estimate even for very simple molecular systems and rarely evaluated in practice using statistical methods.
Supporting information
Sampling statistics of probability distributions comparing from pseudo-random and EVCCP methods for Z'=1 and Z'=2. Web links to all the codes related to this article. DOI: https://doi.org/10.1107/S205252062400132X/aw5082sup1.pdf
Footnotes
1If β = 0 then this is the special case of T = ∞ condition equivalent with the random search method.
Acknowledgements
The authors express gratitude to Professor Julian Gale, at Curtin Institute for Computation, for providing access to academic computing and library resources. The authors also extend thanks to the team at PAWSEY supercomputing research centre for ongoing access to the Nimbus facility. Special thanks must go to Julian Gale, Jutta Rogal, Peter Spackman and Leslie Vogt for their valuable discussions on the topic of CSP.
Funding information
The following funding is acknowledged: Materials Research Science and Engineering Center, NYU (award No. DMR-1420073).
References
Abrams, J. B. & Tuckerman, M. E. (2008). J. Phys. Chem. B, 112, 15742–15757. Web of Science CrossRef PubMed CAS Google Scholar
Allen, M. P. & Tildesley, D. J. (1987). Computer Simulation of Liquids. Clarendon Press. Google Scholar
Baroni, S., Gironcoli, S. de, Dal Corso, A. & Giannozzi, P. (2001). Rev. Mod. Phys. 73, 515. Web of Science CrossRef Google Scholar
Bernstein, J. (2002). Polymorphism in Molecular Crystals. Oxford University Press. Google Scholar
Bernstein, J. (2008). Crystal Polymorphism, pp. 87–109. Springer Netherlands. Google Scholar
Bier, I., O'Connor, D., Hsieh, Y.-T., Wen, W., Hiszpanski, A. M., Han, T. Y.-J. & Marom, N. (2021). CrystEngComm, 23, 6023–6038. Web of Science CrossRef CAS Google Scholar
Case, D. H., Campbell, J. E., Bygrave, P. J. & Day, G. M. (2016). J. Chem. Theory Comput. 12, 910–924. Web of Science CrossRef CAS PubMed Google Scholar
Chan, E. J., Shtukenberg, A. G., Tuckerman, M. E. & Kahr, B. (2021). Cryst. Growth Des. 21, 5544–5557. Web of Science CrossRef CAS Google Scholar
Ciccotti, G. & Meloni, S. (2011). Phys. Chem. Chem. Phys. 13, 5952. Web of Science CrossRef PubMed Google Scholar
Davey, R. J. & Garside, J. (2000). From Molecules to Crystallizers - an Introduction to Crystallization. Oxford Chemistry Primers, Vol. 86. Oxford Science Publications. Google Scholar
Day, G. M., Price, S. L. & Leslie, M. (2003). J. Phys. Chem. B, 107, 10919–10933. Web of Science CrossRef CAS Google Scholar
Desiraju, G. R. (1989). Crystal Engineering: the Design of Organic Solids. Elsevier. Google Scholar
Desiraju, G. R. (2001). Nature, 412, 397–400. Web of Science CrossRef PubMed CAS Google Scholar
Dunitz, J. D. (1995). X-ray Analysis and the Structure of Organic Molecules. Weinheim, New York: VCH. Google Scholar
Frenkel, D. & Ladd, A. J. C. (1984). J. Chem. Phys. 81, 3188–3193. CrossRef CAS Web of Science Google Scholar
Frenkel, D. & Smit, B. (2002). Understanding Molecular Simulation: From Algorithms to Applications, 2nd ed. Academic Press. Google Scholar
Gavezzotti, A. (2006). Molecular Aggregation: Structure Analysis and Molecular Simulation of Crystals and Liquids. Oxford University Press. Google Scholar
Hartman, P. (1973). Crystal Growth: an Introduction, Vol. 1. Amsterdam: North Holland Publishing Co. Google Scholar
Hasenbusch, M. & Schaefer, S. (2010). Phys. Rev. E, 82, 046707. Web of Science CrossRef Google Scholar
Hermann, J., DiStasio, R. A. & Tkatchenko, A. (2017). Chem. Rev. 117, 4714–4758. Web of Science CrossRef CAS PubMed Google Scholar
Hoja, J., Reilly, A. M. & Tkatchenko, A. (2017). WIREs Comput. Mol. Sci. 7, e1294. Google Scholar
Hunnisett, L. M., Francia, N., Nyman, J., Abraham, N. S., Aitipamula, S., Alkhidir, T., Almehairbi, M., Anelli, A., Anstine, D. M., Anthony, J. E., Arnold, J. E., Bahrami, F., Bellucci, M. A., Beran, G. J. O., Bhardwaj, R. M., Bianco, R., Bis, J. A., Boese, A. D., Bramley, J., Braun, D. E., Butler, P. W. V., Cadden, J., Carino, S., Červinka, C., Chan, E. J., Chang, C., Clarke, S. M., Coles, S. J., Cook, C. J., Cooper, R. I., Darden, T., Day, G. M., Deng, W., Dietrich, H., DiPasquale, A., Dhokale, B., van Eijck, B. P., Elsegood, M. R. J., Firaha, D., Fu, W., Fukuzawa, K., Galanakis, N., Goto, H., Greenwell, C., Guo, R., Harter, J., Helfferich, J., Hoja, J., Hone, J., Hong, R., Hušák, M., Ikabata, Y., Isayev, O., Ishaque, O., Jain, V., Jin, Y., Jing, A., Johnson, E. R., Jones, I., Jose, K. V. J., Kabova, E. A., Keates, A., Kelly, P. F., Klimeš, J., Kostková, V., Li, H., Lin, X., List, A., Liu, C., Liu, Y. M., Liu, Z., Lončarić, I., Lubach, J. W., Ludík, J., Maryewski, A. A., Marom, N., Matsui, H., Mattei, A., Mayo, R. A., Melkumov, J. W., Mladineo, B., Mohamed, S., Momenzadeh Abardeh, Z., Muddana, H. S., Nakayama, N., Nayal, K. S., Neumann, M. A., Nikhar, R., Obata, S., O'Connor, D., Oganov, A. R., Okuwaki, K., Otero-de-la-Roza, A., Parkin, S., Parunov, A., Podeszwa, R., Price, A. J. A., Price, L. S., Price, S. L., Probert, M. R., Pulido, A., Ramteke, G. R., Rehman, A. U., Reutzel-Edens, S. M., Rogal, J., Ross, M. J., Rumson, A. F., Sadiq, G., Saeed, Z. M., Salimi, A., Sasikumar, K., Sekharan, S., Shankland, K., Shi, B., Shi, X., Shinohara, K., Skillman, A. G., Song, H., Strasser, N., van de Streek, J., Sugden, I. J., Sun, G., Szalewicz, K., Tan, L., Tang, K., Tarczynski, F., Taylor, C. R., Tkatchenko, A., Touš, P., Tuckerman, M. E., Unzueta, P. A., Utsumi, Y., Vogt-Maranto, L., Weatherston, J., Wilkinson, L. J., Willacy, R. D., Wojtas, L., Woollam, G. R., Yang, Y., Yang, Z., Yonemochi, E., Yue, X., Zeng, Q., Zhou, T., Zhou, Y., Zubatyuk, R. & Cole, J. C. (2024). Acta Cryst. B80, https://doi.org/10.1107/S2052520624008679. Google Scholar
Hunnisett, L. M., Nyman, J., Francia, N., Abraham, N. S., Adjiman, C. S., Aitipamula, S., Alkhidir, T., Almehairbi, M., Anelli, A., Anstine, D. M., Anthony, J. E., Arnold, J. E., Bahrami, F., Bellucci, M. A., Bhardwaj, R. M., Bier, I., Bis, J. A., Boese, A. D., Bowskill, D. H., Bramley, J., Brandenburg, J. G., Braun, D. E., Butler, P. W. V., Cadden, J., Carino, S., Chan, E. J., Chang, C., Cheng, B., Clarke, S. M., Coles, S. J., Cooper, R. I., Couch, R., Cuadrado, R., Darden, T., Day, G. M., Dietrich, H., Ding, Y., DiPasquale, A., Dhokale, B., van Eijck, B. P., Elsegood, M. R. J., Firaha, D., Fu, W., Fukuzawa, K., Glover, J., Goto, H., Greenwell, C., Guo, R., Harter, J., Helfferich, J., Hofmann, D. W. M., Hoja, J., Hone, J., Hong, R., Hutchison, G., Ikabata, Y., Isayev, O., Ishaque, O., Jain, V., Jin, Y., Jing, A., Johnson, E. R., Jones, I., Jose, K. V. J., Kabova, E. A., Keates, A., Kelly, P. F., Khakimov, D., Konstantinopoulos, S., Kuleshova, L. N., Li, H., Lin, X., List, A., Liu, C., Liu, Y. M., Liu, Z., Liu, Z.-P., Lubach, J. W., Marom, N., Maryewski, A. A., Matsui, H., Mattei, A., Mayo, R. A., Melkumov, J. W., Mohamed, S., Momenzadeh Abardeh, Z., Muddana, H. S., Nakayama, N., Nayal, K. S., Neumann, M. A., Nikhar, R., Obata, S., O'Connor, D., Oganov, A. R., Okuwaki, K., Otero-de-la-Roza, A., Pantelides, C. C., Parkin, S., Pickard, C. J., Pilia, L., Pivina, T., Podeszwa, R., Price, A. J. A., Price, L. S., Price, S. L., Probert, M. R., Pulido, A., Ramteke, G. R., Rehman, A. U., Reutzel-Edens, S. M., Rogal, J., Ross, M. J., Rumson, A. F., Sadiq, G., Saeed, Z. M., Salimi, A., Salvalaglio, M., Sanders de Almada, L., Sasikumar, K., Sekharan, S., Shang, C., Shankland, K., Shinohara, K., Shi, B., Shi, X., Skillman, A. G., Song, H., Strasser, N., van de Streek, J., Sugden, I. J., Sun, G., Szalewicz, K., Tan, B. I., Tan, L., Tarczynski, F., Taylor, C. R., Tkatchenko, A., Tom, R., Tuckerman, M. E., Utsumi, Y., Vogt-Maranto, L., Weatherston, J., Wilkinson, L. J., Willacy, R. D., Wojtas, L., Woollam, G. R., Yang, Z., Yonemochi, E., Yue, X., Zeng, Q., Zhang, Y., Zhou, T., Zhou, Y., Zubatyuk, R. & Cole, J. C. (2024). Acta Cryst. B80, https://doi.org/10.1107/S2052520624007492. Google Scholar
Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Tunyasuvunakool, K., Bates, R., Žídek, A., Potapenko, A., Bridgland, A., Meyer, C., Kohl, S. A. A., Ballard, A. J., Cowie, A., Romera-Paredes, B., Nikolov, S., Jain, R., Adler, J., Back, T., Petersen, S., Reiman, D., Clancy, E., Zielinski, M., Steinegger, M., Pacholska, M., Berghammer, T., Bodenstein, S., Silver, D., Vinyals, O., Senior, A. W., Kavukcuoglu, K., Kohli, P. & Hassabis, D. (2021). Nature, 596, 583–589. Web of Science CrossRef CAS PubMed Google Scholar
Kennedy, J. & Eberhart, R. (1995). In Proceedings of ICNN'95. International Conference on Neural Networks, Vol. 4, pp. 1942–1948. Google Scholar
Kitaigorodskiy, A. I., Mnyukh, Y. V. & Asadov, Y. G. (1965). J. Phys. Chem. Solids, 26, 463–464. CrossRef Web of Science Google Scholar
Kitaigorosky, A. I. (1973). Molecular Crystals and Molecules. Academic Press. Google Scholar
Laio, A. & Parrinello, M. (2002). Proc. Natl Acad. Sci. USA, 99, 12562–12566. Web of Science CrossRef PubMed CAS Google Scholar
Liu, P., Kim, B., Friesner, R. A. & Berne, B. J. (2005). Proc. Natl Acad. Sci. USA, 102, 13749–13754. Web of Science CrossRef PubMed CAS Google Scholar
Maragliano, L. & Vanden-Eijnden, E. (2006). Chem. Phys. Lett. 426, 168–175. Web of Science CrossRef CAS Google Scholar
Martoňák, R., Laio, A. & Parrinello, M. (2003). Phys. Rev. Lett. 90, 075503. Web of Science PubMed Google Scholar
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. & Teller, E. (1953). J. Chem. Phys. 21, 1087–1092. CrossRef CAS Web of Science Google Scholar
Mullin, J. W. (2001). Crystallization, 4th ed. Oxford: Butterworth–Heinemann. Google Scholar
Neumann, M. A. (2008). J. Phys. Chem. B, 112, 9810–9829. Web of Science CrossRef PubMed CAS Google Scholar
Neumann, M., Leusen, F. & Kendrick, J. (2008). Angew. Chem. Int. Ed. 47, 2427–2430. Web of Science CrossRef CAS Google Scholar
Nyman, J. & Day, G. M. (2015). CrystEngComm, 17, 5154–5165. Web of Science CrossRef CAS Google Scholar
Parr, R. G. & Weitao, Y. (1995). Density-Functional Theory of Atoms and Molecules. Oxford University Press. Google Scholar
Parrinello, M. & Rahman, A. (1981). J. Appl. Phys. 52, 7182–7190. CrossRef CAS Web of Science Google Scholar
Price, S. L. (2004). Adv. Drug Deliv. Rev. 56, 301–319. Web of Science CrossRef PubMed CAS Google Scholar
Price, S. L. (2013). Acta Cryst. B69, 313–328. Web of Science CrossRef CAS IUCr Journals Google Scholar
Price, S. L. (2018). Faraday Discuss. 211, 9–30. Web of Science CrossRef CAS PubMed Google Scholar
Reilly, A. M., Cooper, R. I. et al. (2016). Acta Cryst. B72, 439–459. CrossRef IUCr Journals Google Scholar
Reilly, A. M. & Tkatchenko, A. (2013). J. Chem. Phys. 139, 024705. Web of Science CrossRef PubMed Google Scholar
Reilly, A. M. & Tkatchenko, A. (2015). Chem. Sci. 6, 3289–3301. Web of Science CrossRef CAS PubMed Google Scholar
Rossi, M., Gasparotto, P. & Ceriotti, M. (2016). Phys. Rev. Lett. 117, 115702. Web of Science CrossRef PubMed Google Scholar
Rosso, L., Mináry, P., Zhu, Z. & Tuckerman, M. E. (2002). J. Chem. Phys. 116, 4389–4402. Web of Science CrossRef CAS Google Scholar
Schneider, E., Vogt, L. & Tuckerman, M. E. (2016). Acta Cryst. B72, 542–550. Web of Science CrossRef IUCr Journals Google Scholar
Shtukenberg, A. G., Zhu, Q., Carter, D. J., Vogt, L., Hoja, J., Schneider, E., Song, H., Pokroy, B., Polishchuk, I., Tkatchenko, A., Oganov, A. R., Rohl, A. L., Tuckerman, M. E. & Kahr, B. (2017). Chem. Sci. 8, 4926–4940. Web of Science CSD CrossRef CAS PubMed Google Scholar
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T. & Hassabis, D. (2016). Nature, 529, 484–489. Web of Science CrossRef CAS PubMed Google Scholar
Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T., Simonyan, K. & Hassabis, D. (2018). Science, 362, 1140–1144. Web of Science CrossRef CAS PubMed Google Scholar
Sobol, I. (1977). USSR Comput. Math. Math. Phys. 16, 236–242. CrossRef Google Scholar
Thirumalai, D., Mountain, R. D. & Kirkpatrick, T. R. (1989). Phys. Rev. A, 39, 3563–3574. CrossRef CAS Web of Science Google Scholar
Torrie, G. M. & Valleau, J. P. (1977). J. Comput. Phys. 23, 187–199. CrossRef Web of Science Google Scholar
Tuckerman, M. E. (2010). Statistical Mechanics: Theory and Molecular Simulation. Oxford University Press. Google Scholar
van Eijck, B. P. & Kroon, J. (1999). J. Comput. Chem. 20, 799–812. Web of Science CrossRef CAS PubMed Google Scholar
Wang, J., Wolf, R. M., Caldwell, J. W., Kollman, P. A. & Case, D. A. (2004). J. Comput. Chem. 25, 1157–1174. Web of Science CrossRef PubMed CAS Google Scholar
Wengert, S., Csányi, G., Reuter, K. & Margraf, J. T. (2021). Chem. Sci. 12, 4536–4546. Web of Science CrossRef CAS PubMed Google Scholar
Whitfield, T. W., Bu, L. & Straub, J. (2002). Physica A, 305, 157–171. Web of Science CrossRef Google Scholar
Yang, S. & Day, G. (2021a). ChemRxiv, 10.26434/chemrxiv-2021-jh6cj. Google Scholar
Yang, S. & Day, G. M. (2021b). J. Chem. Theory Comput. 17, 1988–1999. Web of Science CrossRef CAS PubMed Google Scholar
Yu, T.-Q. & Tuckerman, M. E. (2011). Phys. Rev. Lett. 107, 015701. Web of Science CrossRef PubMed Google Scholar
Zhu, Q., Sharma, V., Oganov, A. R. & Ramprasad, R. (2014). J. Chem. Phys. 141, 154102. Web of Science CrossRef PubMed Google Scholar
Zwanzig, R. W. (1954). J. Chem. Phys. 22, 1420–1426. CrossRef CAS Web of Science Google Scholar
This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.