Embedding-theory-based simulations using experimental electron densities for the environment

For the first time, the use of experimentally derived molecular electron densities as ρB(r) in calculations based on frozen-density embedding theory (FDET) of environment-induced shifts of electronic excitations for chromophores in clusters is demonstrated. ρB(r) was derived from X-ray restrained molecular wavefunctions of glycylglycine to obtain environment densities for simulating electronic excitations in clusters.


Introduction
Frozen-density embedding theory (FDET) is the Hohenberg-Kohn theorems-based formal framework for multi-level simulations (Wesolowski, 2004). The total electron density is built up from two components, A (r) and B (r), of which only the first is constructed from quantum-mechanical descriptors. FDET was originally formulated for variational methods used to obtain such descriptors of the embedded species as: (i) a non-interacting reference system described with a Kohn-Sham determinant (Wesolowski & Warshel, 1993), (ii) an interacting system described with a multi-determinant wavefunction (Wesołowski, 2008), and (iii) a one-particle density matrix (Pernal & Wesolowski, 2009). An extension of FDET for non-variational methods has recently been formulated (Zech et al., 2019). Extensions of FDET for excited states can be made based on the response theory for either noninteracting (Wesolowski, 2004) or interacting (Hö fener et al., 2012) systems. Another possibility for describing excited states relies on the Perdew-Levy theorem on extrema of the groundstate energy functional (Perdew & Levy, 1985). It makes it possible to interpret other-than-the-lowest-energy stationary embedded wavefunctions obtained in FDET as excited states, as pointed out by Khait & Hoffmann (2010). In any of these variants of FDET, the embedded wavefunction depends on the chosen B (r). Several computational methods sharing some elements with FDET but differing in some key aspects, such as the choice of independent variables, self-consistency between the embedding potential and the embedded wavefunction, locality of the embedding potential etc., have been developed by various groups. We address the reader to reviews concerning -besides the methods based on FDET -related computational approaches (Wang & Carter, 2000;Wesolowski, 2006;Jacob & Neugebauer, 2014;Wesolowski et al., 2015;Krishtal et al., 2015).
At the present state of development of approximations for the FDET embedding functional [see equation (8) below], applications of FDET are limited to such systems where A (r) and B (r) do not overlap significantly Bernard et al., 2008). As a rule of thumb, FDET-based methods are only applicable to such cases where the environment is not covalently bound to the embedded species (Gö tz et al., 2009;Goodpaster et al., 2010;Fux et al., 2010). In such cases, the overlap between A (r) and B (r) is small and simple local and semi-local approximations are sufficiently accurate. FDET-based simulations can be seen as a variant of quantum mechanics/molecular mechanics simulations, in which the modeller decides on the procedure to generate B (r) instead of parametrizing the force-field parameters describing the energy contributions due to the interactions between the quantum system and its environment. Various system-and property-specific protocols for generating B (r) for FDETbased simulations are possible. Some examples of different treatments of the environment density are given below. If the environment comprises several weakly bound molecules, the corresponding B (r) can be obtained either from quantummechanical calculations for the whole cluster comprising all molecules in the environment or, in a simplified manner, as a superposition of molecular densities derived from some quantum-mechanical method (Wesolowski & Warshel, 1994;Humbert-Droz et al., 2014). If B (r) is localized in a predefined part of space, the effect of electronic polarization of the environment by the embedded species can be taken into account by optimizing B (r) also  or by 'pre-polarizing' it using simpler techniques (Zbiri et al., 2004;Ricardi et al., 2018). FDET can also be used to set up a multi-physics simulation in which B (r) represents a statistical ensemble-averaged electron density [h B i(r)] represented as a continuum derived using classical statistical-mechanics-based approaches (Kaminski et al., 2010;Laktionov et al., 2016). Such methods are especially useful for studying the electronic structure of solvated molecules .
The above examples show clearly that the choice of the procedure to generate B (r) is the key element of any FDETbased simulation. This can be made in an 'automatic' way by making some system-independent procedures, choices or approximations, or made in a system-dependent manner involving user-provided information about B (r) such as: (i) using as B (r) the ground-state density of some system obtained without putting in any information about the embedded species, (ii) localizing B (r) in a pre-defined region of space by choosing a limited set of atom-centred basis functions, (iii) allowing it to spread over the whole system, (iv) optimizing B (r) by means of the 'freeze-and-thaw' minimization of the total energy , or (v) any combination of the above. In principle, the density B (r) obtained from the unique partitioning of the total density using the approach developed by Carter and collaborators ) could be used as a possible 'automatic' procedure to generate B (r) in FDET.
The strategy by which B (r) is obtained from quantummechanical calculations for the environment only, i.e. in the absence of the embedded species, is particularly attractive. Both our experience and work by other researchers show that obtaining B (r) from an isolated calculation yields the dominant contribution to the complexation-induced shifts of the excitation energies, especially for excitations with shifts of large magnitude [see Fradelos et al. (2011 and Ricardi et al. (2018)]. Daday and co-workers showed that the effects of the optimization of B (r), either for the ground state alone or for both ground and excited state, are secondary, albeit numerically non-negligible: the excitation energy for methylenecyclopropene solvated by 17 water molecules (for which the reference shift obtained from the reference calculations for the whole cluster equals 0.86 eV) is fairly well reproduced (0.82 eV) with such a choice for B (r) (Daday et al., 2014). In the case of n-* excitations for acrolein in water, the corresponding shifts are 1.42 and 1.10 eV, respectively. Although the difference between the FDET with such a choice of B (r) and the reference shifts cannot be attributed to the 'neglect of the electronic polarization of the environment' within the formal framework of FDET, the optimization of B (r) (Daday et al., 2014) or pre-polarization  usually reduces this difference. This secondary importance of the explicit treatment of the polarization of the environment is due to the variational character of FDET and the fact that the partitioning of the total density of the complex into A (r) and B (r) is not unique in exact FDET, resulting in a better capacity to approach the exact total density [see the discussions by Wesolowski et al. (2015) and Humbert-Droz et al. (2014)].
The present work concerns yet another possibility for generating B (r) for FDET simulations of embedded species in a given environment consisting of non-covalently bound molecules, in which B (r) is obtained from experimental data concerning a different system: a molecular crystal of the environment molecule. Recent years have brought a number of reports showing that both electron densities (Hansen & Coppens, 1978) and wavefunctions (Jayatilaka, 2012) can be reconstructed from X-ray diffraction data. It is tempting, therefore, to explore these new possibilities to generate B (r) for use in FDET-based simulations. We have to emphasize that several approximations may undermine the use of X-raybased densities for FDET. The most important issues can be listed as follows: (i) the link to any experimental quantity is of course affected by experimental errors, which are unavoidable and may affect both precision and accuracy; (ii) the electron density and wavefunction that are extracted from experiment are static, whereas atoms are not steady in the crystal; (iii) the sampling of the diffraction in reciprocal space is necessarily incomplete; (iv) only the intensity of the diffracted ray is measured, and not the phase; and (v) the crystal sample is imperfect. For these reasons, the possibility of using experimental densities as B (r) in FDET hinges critically on robust and numerically stable protocols to generate such densities, and it is thus important to investigate the dependence of FDET results on such procedures. The current state of the art in wavefunction and electron-density reconstruction from X-ray diffraction data encourages this attempt. In particular, the above-mentioned pitfalls may be tackled as follows: (i) modern instrumentation enables measurement of the diffraction intensities with high precision; (ii) the deconvolution of thermal motion is reliable if the measurements are carried out at a sufficiently low temperature and if the resolution of the diffraction is sufficiently large; (iii) complementary input from theory can compensate for the missing information; (iv) appropriate modelling enables the phasing of the diffracted rays; and (v) data correction from the ideal kinematic theory of diffraction allows for sufficiently accurate data. The present work reports an exploratory study of the use of densities from X-ray restricted wavefunctions in FDET.
Concerning a particular variant of FDET and system to be investigated, we have chosen to evaluate the excitation energies obtained from LinearizedFDET Zech et al., 2015) for several organic chromophores, each hydrogenbonded to its environment. Our extensive benchmarking of the performance of FDET for such cases indicates that the errors in FDET excitation energies due to the approximations used for the explicit density functional for non-electrostatic components of the FDET embedding potential (see the next section) are small. In a benchmark set of embedded organic chromophores, the average deviation from the reference amounts to about 0.04 eV . This magnitude of the deviation defines the threshold for complexationinduced shifts in the excitation energy above which analysis of the dependence of the shift on B (r) is meaningful. In the embedded chromophores chosen for the present study, these shifts vary between 0.15 and 0.6 eV.

Embedded chromophores
Concerning the molecules for which B (r) is generated, we have chosen glycylglycine (GlyGly). For this exploratory study, it is crucial that the molecule(s) corresponding to B (r) are capable of forming hydrogen bonds with the chromophore. GlyGly satisfies this condition. Moreover, the molecular density of GlyGly reconstructed from X-ray diffraction data reflects features arising from intermolecular hydrogen bonds present in the crystal structure (Genoni et al., 2018). Fig. 1 shows the GlyGly molecule, together with its nearest neighbours in the crystal.
The densities reconstructed from experimental data on glycylglycine are used in the present work as B (r) in FDET calculations of excitation energies for eight different hydrogen-bonded complexes formed by one organic chromophore (acrolein, acrylic acid or acetone) and glycylglycine. Fig. 2 shows the considered clusters.
The hydrogen-bonding networks shown in Figs. 1 and 2 are not the same. In the crystal, all donors and acceptors are involved in hydrogen bonding, which is not the case in the investigated clusters. Nevertheless, each individual hydrogen bond in the clusters has its corresponding partner in the crystal. It can be expected, therefore, that the effect of the hydrogen bonding on B (r) in the cluster is also reflected in the density obtained from the crystal. The hydrogen-bonding pattern (dashed lines) for the glycylglycine molecule in the crystal, taken from Dos Santos et al. (2014). Only nearest atoms involved in hydrogen bonding are shown: oxygen (red) and nitrogen (blue).

Figure 2
The complexes of three different chromophores with glycylglycine.

FDET approach to multi-level simulations
For a system comprising N AB electrons in an external potential v AB (r), the functional E FDET v AB ½É A ; B is defined to satisfy by construction the following relation: where E HK v AB ½ is the Hohenberg-Kohn ground-state energy functional (Hohenberg & Kohn, 1964) and By virtue of the second Hohenberg-Kohn theorem, equation (1) leads to where E 0 = E HK v AB ½ 0 and 0 (r) are the ground-state energy and density of the total system, respectively. Equality is reached for a large class of densities B (r), Using conventional density functionals representing components of the total energy, and arbitrary partitioning of the where  (Levy, 1979). It is defined as The functional ÁF[], on the other hand, depends on the form of the wavefunction É used in equation (1) and is also defined via the constrained search (Wesołowski, 2008). For instance, if É A is a single determinant (È), it reads whereV V ee N A is the electron-electron repulsion operator, V ee ½ is the density functional of the electron-electron repulsion energy, and ÁF SD ½ is just the correlation functional (ÁF ½ E c ½) defined in the constrained-search formulation of density functional theory (Levy, 1979;Baroni & Tuncel, 1983). For É of the full CI form, ÁF FCI [] = 0 by definition.
Euler-Lagrange optimization of É A leads to the Schrö dinger-like equation where  (7) will be denoted as É EL A . Note that the energy is given not by the Lagrange multiplier but by equation (4). For exact density functionals, any variational method can be used to obtain É EL A and the corresponding density EL A ðrÞ, which satisfy by construction the basic FDET equality given in equation (1).
3.1. Reconstruction of q B (r) from X-ray diffraction data X-ray restrained wavefunctions (XRW), in the literature commonly (but incorrectly) termed X-ray constrained wavefunctions, were initially developed by Jayatilaka and coworkers (Jayatilaka, 2012;. Within the XRW paradigm, instead of applying the variational principle, like in conventional SCF, a special functional L is defined, based on a classical Hamiltonian and a function of the square difference between calculated and experimentally measured structure factors, which is ideally distributed with 2 statistics. Here, with N r and N p being, respectively, the number of experimental data and the number of parameters in the model. ðF k À F exp k Þ is the difference between the structure factors from the trial wavefunction and the experimental ones, and k is the experimental standard deviation. Thus, the minimization of L implies finding the minimal energy AND the best agreement with experiment. Of course, these cannot be achieved simultaneously and a parameter J must be defined in order to weight the two parts of the functional. Therefore, the functional takes the form This procedure allows the construction of molecular wavefunctions from experimental observations in crystals. By increasing J , both long-and short-range interactions in the crystal are progressively taken into account. In this work, we used structure factors measured for GlyGly to calculate X-ray restrained wavefunctions with J values from 0.0 to 1.0, since for higher values the SCF procedure does not converge. We stress that such a value of J = 1.0 has no specific meaning because the electronic energy of the Hamiltonian and the electron-density difference in the 2 function have two different units; thus J is not dimensionless but depends on the number of electrons, the molecular volume and the diffraction resolution. Moreover, the structure factors in the 2 function are weighted by the variance of their measurement statistics. The aforementioned wavefunctions were then used to calculate B (r).

Computational details
The following approximations were used in the reported FDET calculations: (i) ADC(2) treatment (Schirmer, 1982)  are not used at all), (iii) neglect of the v F [ A ] contribution to the embedding potential, (iv) monomer expansion of A (r) (only atomic basis sets centred on the chromophore), (v) monomer expansion of B (r) (only atomic basis sets centred on GlyGly), and (vi) chromophore-independent generation of B (r) using one of the following methods for the isolated GlyGly: Hartree-Fock, first-order Møller-Plesset (MP) perturbation theory, Kohn-Sham (KS) with PBE (Perdew et al., 1996) approximation for E xc [], coupled clusters singles and doubles (CCSD) or the reconstruction from experimental structure factors (see the next section).
The FDET results for each cluster are compared with the reference obtained from ADC(2) calculations. The reported reference shifts in the excitation energy are evaluated as Á ref = ADCð2Þ

À ADCð2Þ
AðBÞ , where AB denotes the complex and A(B) denotes the chromophore alone but with the basis set expanded by the functions localized on GlyGly [similar to how it is done in the counterpoise technique of Boys & Bernardi (1970) for intermolecular interaction energy]. In all calculations, including also the reconstruction of the electron density of GlyGly from the X-ray structure factors, the 6-311G basis set was used.
At J = 0, the wavefunction obtained from the X-ray diffraction data is just the Hartree-Fock molecular wavefunction. The numerical results should be identical, regardless of what software is used to generate B (r). We used this fact to check the numerical soundness of the procedures to exportimport densities B (r) obtained with different software. Tonto (Grimwood et al., 2003) was used for X-ray restrained wavefunction calculations, Psi-4 (Parrish et al., 2017) to generate the CCSD GlyGly density, and Q-Chem (Shao et al., 2015), with its ADCMAN (Wormit et al., 2014) and FDEMAN (Prager et al., 2016) modules, for all other calculations, including FDET/ADC (2)

Results
For the eight considered clusters, the lowest excitation energies obtained from FDET/ADC(2) calculations ( emb [ B ]) using several choices for B (r) are shown in Fig. 3, together with the corresponding reference supermolecular ADC(2) results. These excitations have n-* character and are blueshifted due to the interactions with the environment. The magnitude of the reference shift falls in the 0.15-0.6 eV range, which makes the shift in these complexes a suitable observable for discussing the effect of the B -dependency of the FDET results. For this type of excitation, the combined effect of the approximation used for the FDET embedding potential and the use of the isolated environment density as B (r) results in an average error in the excitation energy of magnitude 0.04 eV Ricardi et al., 2018).
We start with an analysis of the results obtained without taking any experimental information from the molecular crystal, i.e. Á emb ½ corresponds to a 'standard' FDET protocol in which the Hartree-Fock density of the isolated environment is used as B (r). The deviations from the reference are small and their magnitude is consistent with the benchmark results published elsewhere  At J > 0, both the correlation and the crystal-field effects are taken into account in B (r), leading to a further reduction in the shifts. The deviations of FDET shifts from the reference increase (see the values of Á emb ½ J ¼1 B in Fig. 3). As previously mentioned, we could not extend the restraint to values of J larger than 1, because the procedure became numerically unstable (Genoni et al., 2018). Unfortunately, although the effects of correlation and polarization by the crystal field are reflected in J B ðrÞ, they cannot be separated. Moreover, the environment of GlyGly in the molecular crystal and in the clusters analysed in the present work are different. As a result, even if the reconstruction of the density of GlyGly from X-ray structure factors were exact, this would not guarantee that such density would yield the best FDET results for the clusters under investigation. The values Á emb ½ J B at J = 0 and J = 1 represent, therefore, a good estimate of the maximum scatter (minimal and maximal bounds) of the FDET results due to the B -dependency of the FDET embedding potential. Within these bounds, the deviations from the reference do not exceed 0.1 eV (or 30% in terms of the relative error). This also points out the need for a thorough analysis of the disentangled effects of correlation and polarization in Á½ J B .
The subsequent part concerns the numerical stability of the FDET-derived complexation-induced shifts of the lowest excitation energy with respect to variations of B (r) corresponding to the change in the parameter J from 0 to 1. Fig. 4 shows the dependence of the calculated shifts Á emb ½ J B on the parameter J for each complex. The dependence of Á emb ½ J B on J is smooth and monotonic. Above J = 0.5 up to the maximum value used in this study, J = 1, Á emb ½ J B remains almost constant (it changes by as little as about 0.01 eV). The magnitude of the solvatochromic shift decreases for all but one system (acetone + GlyGly 1) when J increases. This can be ascribed to a decrease in dipolemoment magnitude for the XRW densities when J increases.
A similar trend appears for correlated methods, which, as is known, tend to yield lower dipole moments.
Turning back to practical applications, we notice that largescale simulations usually apply the monomer expansion for both A (r) and B (r), and using the isolated environment density as B (r) already assures good accuracy of the FDETderived environment-induced shifts. In such simulations, the modeller has a wide range of available methods to generate B (r) (see the Introduction). The data collected in Fig. 5 show how the FDET results depend on the method used to generate B (r), including Hartree-Fock, MP1, CCSD and KS-DFT(PBE). For reference purposes, the values of Á emb obtained from X-ray diffraction data at J = 0.25 are also given.
The results collected in Fig. 5 indicate clearly that X-rayderived molecular densities are suitable for generating B (r) for FDET calculations following the conventional protocol   expansion for B (r), lack of explicit treatment of B (r) polarization by the chromophore]. The deviations from the reference are, however, larger than if Hartree-Fock or correlated isolated GlyGly densities are used for this purpose. This does not bear direct relevance to the quality of these densities. The overall error of the FDET-derived excitation energy results from the balance of the errors in two FDET embedding potentials evaluated at two different pairs of densities, v emb ½ ES A ; B ; v B ðrÞ and v emb ½ GS A ; B ; v B ðrÞ, where GS A and ES A denote the ground and excited state, respectively, in which the non-electrostatic contributions are approximate.
It is worth noting that the use of X-ray-derived densities as B (r) leads to smaller errors than if the Kohn-Sham PBE calculations are used for this purpose.

Conclusions
Recent developments in techniques to reconstruct the electron density from X-ray diffraction data have made it possible not only to determine the maxima of the electron density (coinciding with the positions of nuclei) but also to reveal its more detailed features (Genoni et al., 2018). For a molecular crystal, the reconstruction yields a localized density of a single molecule but taking into account its chemical environment. In FDET-based simulations, the molecular densities are used as an input quantity providing the complete quantummechanical descriptor of the environment of the embedded species. In the present work, we explored the possibility of using the molecular density of glycylglycine derived from X-ray diffraction data collected for the molecular crystal in FDET calculations of the complexation-induced shift of the excitation energy in eight intermolecular complexes, each consisting of an organic chromophore hydrogen-bonded to one glycylglycine molecule.
The usability of such densities for this purpose was not evident before the present study was made. Several factors could, in principle, invalidate such practical applications of X-ray reconstructed densities. First of all, glycylglycine in the crystal and in the complexes analysed in the present work has different environments. This might result in different polarization of such localized molecular densities, and as a consequence, contribute to errors in the FDET results. Other factors relate rather to the reconstruction procedure. It cannot be made perfect due to (i) errors in the experimental measurements, (ii) the very basic assumption according to which the average of a dynamic quantity (electron density) is represented using an intermediate object, namely a static singledeterminant wavefunction, (iii) incompleteness of the experimental data, (iv) errors in the phasing procedures, and (v) crystal defects. It might be expected, therefore, that the reconstructed densities would indeed not deviate significantly from the constraint-free one and, in turn, yield similar excitation energies if used as B (r). The primary objective of this work was the verification of this expectation. The obtained results demonstrate, indeed, that X-ray reconstructed densities are suitable to be used as B (r) in FDET on a par with possible alternative techniques.
Despite the fact that the X-ray restrained wavefunction procedure does not yield a unique solution, but rather a range of densities parametrized by J , the scatter of the excitation energies obtained using the whole range of this parameter is rather narrow. For all but one complex (acetone + GlyGly 1, characterized by a remarkably short hydrogen bond of only 1.3 Å ) the excitation energies vary within approximately 0.05 eV, depending on the details of the reconstruction procedure. This scatter in calculated shifts is small compared with the range of variation in solvatochromic shifts (Reichardt, 1994;Improta et al., 2016), making FDET simulations using X-ray-derived molecular densities an attractive tool for making quantitative predictions and interpreting experimental results. Further reduction of this scatter is probably possible through disentangling the effect of crystal-field polarization and the correlation effect on the electron density of a molecule in a molecular crystal. We intend to deal with this issue in our subsequent work. Also here the experimentally derived B (r) might prove more useful than alternative techniques. This is the case when the molecules associated with B (r) have similar neighbours in the cluster to be investigated and in the molecular crystal used to generate B (r).
For the first time, we adopted an experimental density for the environment and tested this on spectral shifts for valence excitations. At present, a density calculated via an X-ray restrained procedure is the only possibility for this approach.
The numerical examples in this work in which we applied the proposed procedure concern microsolvated clusters. For finite systems, many alternatives to generate B (r) involving similar or even lower computational cost are possible (Hartree-Fock or Kohn-Sham densities of isolated environments including, or not, their optimization or pre-polarization). This numerical validation is the first stage in our long-standing interests and plans aimed at modelling the electronic structure of species in the condensed phase such as neat or doped molecular crystals (Meirzadeh et al., 2018). We plan to apply the same strategy to generate the FDET embedding potential using experimentally derived B (r) for modelling other spectroscopic properties (core excitations, NMR shifts, two-photon absorption, hyperpolarizabilities etc.) that are evaluated from embedded wavefunctions. In infinite systems, the generation of B (r) from first principles might face serious difficulties if done in a similar way as for the studies of clusters, for the following reasons. Firstly, a density obtained using a straightforward application of the simplest protocol [generation of B (r) in an artificial system with a void in place of the part described by means of É A ] might be unphysical or even impossible to obtain due to convergence problems or the need for a much larger supercell. Secondly, taking into account the effect of electronic correlation on the electron density in periodic systems is generally limited to Kohn-Sham types of methods. A final point is worth discussing. By using a wavefunction restrained to fit the electron density of a molecule in a crystal, the environment density used in the approach we propose here implicitly includes not only the effects of intermolecular interactions but also the long-range electrostatic effects of crystalline matter.