Protonated nucleobases are not fully ionized in their chloride salt crystals and form metastable base pairs further stabilized by the surrounding anions

Charge-density studies of cytosinium chloride, adeninium chloride hemihydrate and guaninium dichloride crystals based on ultra-high-resolution X-ray diffraction data and extensive theoretical calculations are presented. The studies confirm the importance of electrostatic interactions in ionic crystals and show how counterintuitive they can be for protonated nucleobases of like charge.


Experimental charge density models
Crystallization, data collection and processing. In order to obtain charge density quality, single crystals, adenine (3.7 ·10 −3 mol) or guanine (3.3 ·10 −3 mol) in their neutral form were dissolved in a mixture of 5.0 ml distilled water and few drops of 38% hydrochloric acid solution. The obtained suspensions were subsequently heated to 80 ℃ and heating was continued until the substrates dissolved entirely. The solutions were left in vials covered with paraffin film at 37 ℃. Prismatic crystals of good quality were obtained after one month. As for cytosinium chloride, equimolar amounts of cytosine and 4-thiouracil (1.5 ·10 −4 mol) were dissolved in 5.0 ml of distilled water and a few drops of 1M HCl. The mixture was stirred in ultrasound bath and heated to approximately 50 ℃, until both compounds completely dissolved. Solutions were left for evaporation at room temperature in cytosinium chloride (CC) case, and in 37 ℃ in adeninium chloride hemihydrate (ACH) and guaninium dichloride (GDC) case. Good quality crystals were obtained after one month. All aforementioned compounds were purchased from the Sigma-Aldrich Corporation.
For cytosinium chloride (CC) and adeninium chloride hemihydrate (ACH) crystal samples, subatomic resolution X-ray measurements were performed at 90 K on an Agilent Technologies SuperNova four-circle diffractometer equipped with a micro-focus sealed tube and Eos CCD detector. The temperature was controlled with an Oxford Cryosystems low-temperature nitrogen gas-flow device (Cryostream Plus). The crystals were mounted on a goniometer head using nylon loop and placed 60 mm from the detector. For CC a total of 3836 frames were collected in 39 runs using ω scan; rotation width of 1.0° and exposure time in the range of 4 -10 seconds. Similarly, for ACH a total of 3195 frames were collected in 67 runs using ω scan, a rotation width of 1.0° and an exposure time in the range of 12.50 -110 seconds. For guaninium dichloride (GDC) crystals, highresolution single crystal X-ray measurement was carried out on a Bruker APEX II ULTRA single-crystal diffractometer with a TXS rotating anode (Mo Kα, radiation λ = 0.71073 Å) equipped with a CCD-type area detector, multilayer optics and an Oxford Cryostream low temperature attachment set to 100 K. A transparent cube-shaped crystal sample of GDC was attached to a cryogenic nylon loop, mounted on a goniometer head and positioned 50 mm from the detector. A total of 12412 frames were collected in 44 runs to obtain the high redundancy data. Diffraction data was collected using the ω and φ scan method with a rotation width of 0.5° and the exposure time in the range of 20 -60 seconds.
For CC and ACH, the determination of unit cell parameters, integration of reflection intensities, and data reduction, including multiscan absorption corrections, were performed using CrysAlis PRO Version 1.171.36.32 (CrysAlis, 2013). Finally, reflection merging was carried out with the SORTAV program (Blessing, 1987;Blessing, 1989;Blessing, 1995;Blessing, 1997). Whereas for GDC, the determination of unit cell parameters, integration of the reflection intensities, and data reduction were performed with the APEX2 suite of programs (integration was carried out with SAINT V8.27B) (Bruker, 2013) and the multiscan absorption correction, scaling and merging of reflection data were carried out with the SORTAV program (Blessing, 1987;Blessing, 1989;Blessing, 1995;Blessing, 1997).
Structure solution and refinement. Using Olex2 (Dolomanov et al., 2009), the structures were solved with the ShelXS (Sheldrick, 2008) program with direct methods and refined with the olex2.refine refinement package with Gauss-Newton minimization using the independent atom model (IAM).
Multipole refinements were performed in the MoPro Suite software package (Guillot et al., 2001;Jelsch et al., 2005) with the use of the Stewart-Hansen-Coppens multipolar model (Stewart et al., 1975;Hansen & Coppens, 1978). In the multipole density formalism, the molecular electron density is expanded in pseudoatom density contributions. The density of each pseudoatom is given by: where ρcore and ρval are spherical and valence densities, respectively. The third term contains the sum of angular function ( , ) to account for aspherical deformations. The coefficients and are multipole populations for the spherical and multipole density, respectively. The κ and κ' are the scaling parameters which determine the expansion/contraction of spherical and multipolar valence densities, respectively.
For each sample, refinement was performed against structure factor amplitudes (F) and only those reflections fulfilling the I ≥ 2σ(I) conditions were taken into account as this was found to produce the best results. The initial atomic coordinates, x, y and z, for all atoms, anisotropic atomic displacement parameters ( 's) for non-hydrogen atoms and isotropic atomic displacement parameters for hydrogen atoms were taken from the IAM refinement.
With the use of the LSDB (Volkov, Li et al., 2004) program, all deformation parameters were defined with respect to their local Cartesian coordinate systems and the initial multipolar and contraction-expansion parameters for nucleobases and water molecules were transferred from UBDB2011 (Jarzembska & Dominiak, 2012). Multipole expansion was truncated at the hexadecapole ( = 4) and quadrupole ( = 2) levels for all non-hydrogen and hydrogen atoms respectively. For hydrogen atoms, only the bond-directed dipoles and quadrupole populations (i.e. and 2 ) were refined. Special positional constraints for occupancy, coordinates, and thermal and multipolar parameters were applied to all atoms in GDC and to the O1 atom in ACH. For each sample parameters for all atoms were constrained to the local symmetry suggested by the LSDB program. Each atom was assigned with core and spherical-valence scattering factors derived from Su and Coppens atomic wave functions for neutral atom configurations except chlorine atoms (Su & Coppens, 1998). In the case of chlorine atoms two possibilities were investigated: the Cl −1 ion scattering radial function and Cl −1 ion electronic configurations (core: 1s 2 2s 2 2p 6 , valence: 3s 2 3p 6 ), or the Cl 0 neutral scattering radial function and Cl 0 neutral electronic configurations (core: 1s 2 2s 2 2p 6 , valence: 3s 2 3p 5 ). The anomalous dispersion coefficients were taken from Kissel et al. (Kissel et al., 1995). The contraction-expansion parameters, κ and κ' for all hydrogen atoms were kept fixed at the UBDB2011 values during refinements. To eliminate instabilities in the refinements, it was necessary to constrain κ' parameters of the deformation functions to 1.0 for chlorine atoms. The values of the parameters for hydrogen atoms were estimated from the SHADE 3.0 server (Madsen, 2006); the values were not refined but updated in-between refinement stages until convergence. The X-H distances were restrained to the averaged values obtained from neutron diffraction studies (Allen & Bruno, 2010) with a restraint σ of 0.004 Å.
Additionally it was found that the Cl1 atom of ACH undergoes noticeable anharmonic motion. Gram-Charlier (GC) coefficients (Kuhs, 1983;Johnson, 1969;Scheringer, 1985) for up to the third order were used to successfully model anharmonic motion, whereas the physical reliability of the anharmonic model was confirmed by the probability density function (PDF) computed with the MolIso program (Hubschle & Luger, 2009). It is probable that also in the case of CC and GDC, anharmonic motion of Cl atoms is present, but the resolution of the data does not allow for reliable refinement of GC coefficients.
The detailed general refinement procedure involved the following refinement steps: (i) scale factor refinement (which was also refined in almost all other stages); (ii) high-order refinement (sinθ/λ > 0.7 Å −1 ) of the x, y, z and parameters was carried out for non-hydrogen atoms. Then, considering the whole resolution range, refinement of the scale factor, x, y, z, and parameters of the H atoms was carried out. These steps were repeated until convergence was obtained; (iii) the anisotropic ADPs ( ) of the H atoms were estimated using the SHADE program and were kept constrained to the SHADE values in subsequent refinements; (iv) multipole population parameters and ′ in a stepwise manner; (v) all multipole population parameters and structural parameters simultaneously followed by re-estimation of the from SHADE; (vi) for CC and GDC block refinement of: (vii) all multipole population parameters, non-hydrogen atom κ parameters and structural parameters simultaneously.
The outcomes of multipole refinements were verified by examining R-factors, GOFs and residual densities,

UBDB charge density models
UBDB models of charge densities for CC, ACH and GDC crystals were built on experimental geometries using LSDB (Volkov, Li et al., 2004) and the UBDB2011 (Jarzembska & Dominiak, 2012) version of databank. Chlorine atoms were treated as spherical anions with Cl −1 ion scattering radial function and Cl −1 ion electronic configurations (core: 1s 2 2s 2 2p 6 , valence: 3s 2 3p 6 ). Core and spherical-valence factors for each atom were taken from Su and Coppens' atomic wave functions (Su & Coppens, 1998). For the majority of calculations, the net charge of each molecule was scaled separately to its formal value: +1e for cytosinium and adeninium, +2 e for guaninium, 0 e for water molecules and -1 e for chloride anions. For alternative simulations (as explained further in the text), net molecular charges were scaled to experimentally observed values.

Periodic quantum mechanical calculations
Periodic quantum mechanical calculations based on the variational principle were applied to compute theoretical charge densities, theoretical structure factors and cohesive energies ( ℎ ). ℎ were computed using (Civalleri et al., 2008): where is the total energy of a system (per unit cell) and is the energy of a molecule extracted from the bulk. Z stands for the number of molecules (here ionic pairs, including half of water molecule in the case of ACH) in the unit cell. The Crystal14 package (Dovesi et al., 2014a;Dovesi et al., 2014b) was used at the DFT-B3LYP/pVDZ level of theory (Becke, 1988;Perdew, 1986;Lee et al., 1988;Dunning, 1989). Brillouin zone was sampled using 170 k-points (the shrinking factor of the reciprocal space net was set to 8). See Table S1 in SI for further details.
Static theoretical structure factors were computed from previously obtained wave functions using the XFAC option in Crystal14. For each computation, a set of unique Miller indices was firstly generated using the segment description approach (Le Page & Gabe, 1979;Flack, 1984) up to the Multipole refinements against theoretical structure factors were performed using the MoPro Suite software package with the analogous strategy as in the case of experimental structure factors. The scale factor and the x, y, z parameters were not refined. Basic statistical descriptors of the refinement are given in the Table   2 and more information can be found in SI, Figures S5 -S12. A QTAIM analysis was carried out on all experimental, UBDB and theoretical from periodic quantum mechanics electron density models of CC, ACH and GDC crystals. For models in the multipole representation (experimental, UBDB and the one fitted to theoretical structure factors) integrated charges were computed using the WinXPRO program (Stash & Tsirelson, 2002;Stash & Tsirelson, 2007). For exact (not approximated by the multipolar model) theoretical crystal electron densities, integrated charges were calculated for experimental and optimized geometry using TOPOND14 (Gatti et al., 1994;Tsirelson, 2002).

Electrostatic potentials
Electrostatic potentials for single molecules, selected dimers and entire crystals were computed from charge density models in multipole representation (experimental, UBDB and the one fitted to theoretical structure factors) with the use of the XDPROP module of the XD2016 package (Volkov et al., 2016). For calculations of electrostatic potentials of entire crystals contributions from atoms from at least 3x3x3 unit cells were taken into account. To plot molecular electrostatic potentials on molecular electron isodensities the MoleCoolQt Revision 576 program (Hubschle & Dittrich, 2011) was used.

Intermolecular interaction energies and electrostatic contributions to them
To compute intermolecular interaction energies for isolated dimers with geometry as found in studied crystals the DFT-based symmetry-adapted perturbation theory (DFT-SAPT) (Jansen & Hesselmann, 2001;Williams & Chabalowski, 2001;Jeziorski et al., 1994) method was applied. Formal charges were assigned to particular molecules (+1 e for cytosinium and adeninium, +2 e for guaninium, 0 e for water molecules and -1 e for chloride anions). Total intermolecular interaction energy in the DFT-SAPT is given as the sum of the first ( 1 ) and second-order ( 2 ) perturbation energy terms and the 2 energy term, specifically electrostatic ( 1 ), induction ( 2 ) and dispersion ( 2 ) energy terms together with exchange-repulsion ( ℎ 1 , ℎ− 2 and ℎ− 2 ) terms: The corrections 2 were applied in all cases to estimate the polarization effect beyond the second order. DFT-SAPT calculations applied Kohn-Sham (KS) orbitals which were determined using the PBE0AC exchangecorrelation (XC) potential (Heßelmann & Jansen, 2002) which consists of using the PBE0 functional with a hybrid kernel consisting of 75% adiabatic local density approximation and 25% coupled Hartree-Fock to solve coupledperturbed static and frequency-dependent KS equations for the second-order contributions (Gross et al., 1996).
For neutral and cationic molecules, the shift parameter ( ) to correct the asymptotic behaviour of the functional was calculated as the difference between the HOMO energy of each monomer (ϵHOMO) and the true ionization potential obtained from the calculation of its neutral/cationic and ionized forms ( ): The values of HOMO and ionization energies were calculated using the PBE0 functional. The anionic systems (chloride anions) were left without asymptotic correction since the XC potentials in these cases are short-ranged and decay with higher powers of 1/r (Lee & Burke, 2010). All DFT-SAPT and HOMO energy calculations were done in the MOLPRO2012.1 program (Werner et al., 2012) with the aug-cc-pVTZ Dunning basis set (Kendall et al., 1992;Dunning, 1989).
For the calculation of electrostatic, polarization, dispersion and repulsion contributions to lattice energy, the semi-classical density sum (the PIXEL method) (Gavezzotti, 2011) was used, which also relays on dimers approximation. Molecular electron densities of molecules bearing their formal charges were obtained using the GAUSSIAN09 revision B.01 program (Frisch et al., 2016) using the 6-31** basis set (Hariharan & Pople, 1973) at the MP2 level of theory. The electron densities were then analysed using the PIXELc module (Gavezzotti, 2003c;Gavezzotti, 2003d;Gavezzotti, 2003a;Gavezzotti, 2003b) of the Coulomb-London-Pauli (CLP) program (Gavezzotti, 2011) which allows the calculation of lattice energies. The total intermolecular interaction energy was defined in PIXEL as following: where, is the electrostatic interaction energy, is the polarization energy, is the dispersion energy and repulsion energy between interacting molecules. Due to limitation of the program, i.e.it was possible to perform calculation of crystal cohesive energy only for systems which have maximum two molecules in the asymmetric unit, crystal cohesive energy was computed only for the CC structure.
Electrostatic contributions to intermolecular interaction energies and to crystal cohesive energies were also .
It is to be noted that 1 from the Eq. 3 refers to the exact electrostatic interaction energy and from now onward will be abbreviated as .
In the case of refinement against static theoretical structure factors, high negative peaks at the vicinities of nuclei positions are observed on residual maps. This is very well know phenomenon and is usually attributed to limitations of multipole model, which is not flexible enough to accommodate all details of electron density changes while going from spherical isolated atoms/ions into molecules and crystals. To resolve the problem, extended Hansen-Coppens model most often is used in which core electron density term is refined in addition to valence spherical and deformation electron density. However there is another possibility, in our opinion. We have considerable experience with refinement of standard Hansen-Coppens model against valence-only theoretical structure factors; the refinement is a part of procedure for adding a new atom type to the UBDB.
Despite the fact that these structure factors do not include information about core electron densities, highresidual peaks close to nuclei positions are still present. Thus, in our opinion, in organic crystals, more elaborated model of spherical part of valence density is necessary. For example, to account for s and p hybridization one can split spherical valence density into two terms, one build form s-orbitals and another one from p-orbitals, and refine populations and expansion-contraction parameters for each term separately. In addition, high residual peaks in the vicinity of chloride nuclei could come from the fact that multipolar models in this study were built on the basis of the SCM electron densities which take into account relativistic effects, whereas during periodic calculations relativistic effects were not included. There is one more possible reason for the presence of residual peaks close to nuclei, it is the inherit difference between Slater and Gaussian functions, the former used in multipole model, the later in Crystal14 calculations.
To be sure that molecular charges (both, these computed from Pval, and these from topological analysis) obtained from the fit of standard multipole model to theoretical structure factors, we did various extended multipole model modelling for cytosinium chloride, among others: (a) core refinement: in addition to valence density parameters (second and third term of standard Hansen-Coppens model), we refined also expansioncontraction parameters and electron population for K-shell (for Cl-, O, N and C) and L-shell (for Cl-) electron densities; (b) split-valence refinement: in which we modified only the second term of the model, i.e. spherical valence electron density, by splitting it into two spherical terms, one build from s-orbitals and the second one from p-orbitals; (c) second-monopole refinement: in which in addition to standard parameters, we refined Two examples of refinements which lead to molecular charges around +/-0.80 and one example of refinement which gives charge around +/-1.05 e are given in Figure S14.

Transition state search
For select two dimers of protonated nucleobases transition state search was performed in Gaussian16 (Frisch

Figure S1
Fourier residual density maps in the plane of nucleobase (right column, contour level: ± 0.05 eÅ -3 ) and fractal dimension plots of residual density for CC (up), ACH (middle) and GDC (bottom) multipolar refinements against experimental structure factors with the use of Cl -1 ionic scattering radial functions.

Figure S2
Normal probability plots on F 2 (left) and scale factors, ∑| | ∑| | ⁄ , with respect to resolution (Å -1 ) plots (right) for CC (up), ACH (middle) and GDC (bottom) multipolar refinements against experimental structure factors with the use of Cl -1 ionic scattering radial functions. Prepared with the program XDRKplot from the WinGX package.

Figure S3
Fourier residual density maps in the plane of nucleobase (right column, contour level: ± 0.05 eÅ -3 ) and fractal dimension plots of residual density for CC (up), ACH (middle) and GDC (bottom) multipolar refinements against experimental structure factors with the use of Cl 0 neutral scattering radial functions.

Figure S4
Normal probability plots on F 2 (left) and scale factors, ∑| | ∑| | ⁄ , with respect to resolution (Å -1 ) plots (right) for CC (up), ACH (middle) and GDC (bottom) multipolar refinements against experimental structure factors with the use of Cl 0 neutral scattering radial functions. Prepared with the program XDRKplot from the WinGX package.

Figure S5
Fourier residual density maps in the plane of nucleobase (right column, contour level: ± 0.05 eÅ -3 ) and fractal dimension plots of residual density for CC (up), ACH (middle) and GDC (bottom) multipolar refinements against theoretical structure factors computed for experimental geometry with the use of Cl -1 ionic scattering radial functions.

Figure S6
Normal probability plots on F 2 (left) and scale factors, ∑| | ∑| | ⁄ , with respect to resolution (Å -1 ) plots (right) for CC (up), ACH (middle) and GDC (bottom) multipolar refinements against theoretical structure factors computed for experimental geometry with the use of Cl -1 ionic scattering radial functions. Prepared with the program XDRKplot from the WinGX package.

Figure S7
Fourier residual density maps in the plane of nucleobase (right column, contour level: ± 0.05 eÅ -3 ) and fractal dimension plots of residual density for CC (up), ACH (middle) and GDC (bottom) multipolar refinements against theoretical structure factors computed for experimental geometry with the use of Cl 0 neutral scattering radial functions.

Figure S8
Normal probability plots on F 2 (left) and scale factors, ∑| | ∑| | ⁄ , with respect to resolution (Å -1 ) plots (right) for CC (up), ACH (middle) and GDC (bottom) multipolar refinements against theoretical structure factors computed for experimental geometry with the use of Cl 0 neutral scattering radial functions. Prepared with the program XDRKplot from the WinGX package.

Figure S9
Fourier residual density maps in the plane of nucleobase (right column, contour level: ± 0.05 eÅ -3 ) and fractal dimension plots of residual density for CC (up), ACH (middle) and GDC (bottom) multipolar refinements against theoretical structure factors computed for optimized geometry with the use of Cl -1 ionic scattering radial functions.

Figure S10
Normal probability plots on F 2 (left) and scale factors, ∑| | ∑| | ⁄ , with respect to resolution (Å -1 ) plots (right) for CC (up), ACH (middle) and GDC (bottom) multipolar refinements against theoretical structure factors computed for optimized geometry with the use of Cl -1 ionic scattering radial functions. Prepared with the program XDRKplot from the WinGX package.

Figure S11
Fourier residual density maps in the plane of nucleobase (right column, contour level: ± 0.05 eÅ -3 ) and fractal dimension plots of residual density for CC (up), ACH (middle) and GDC (bottom) multipolar refinements against theoretical structure factors computed for optimized geometry with the use of Cl 0 neutral scattering radial functions.

Figure S12
Normal probability plots on F 2 (left) and scale factors, ∑| | ∑| | ⁄ , with respect to resolution (Å -1 ) plots (right) for CC (up), ACH (middle) and GDC (bottom) multipolar refinements against theoretical structure factors computed for optimized geometry with the use of Cl 0 neutral scattering radial functions. Prepared with the program XDRKplot from the WinGX package.

Figure S13
Fourier residual density map in the plane of adenine (right column, contour level: ± 0.05 eÅ -3 ) and fractal dimension plot of residual density for ACH multipolar refinements against experimental structure factors with the use of Cl -1 ionic scattering radial functions and with harmonic approximation of thermal motion for the Cl atom. Fourier residual density map in the plane of cytosine (contour level: ± 0.05 eÅ -3 ), fractal dimension plot of residual density and statistical descriptors for CC extended multipolar refinement (see description in the table above) against theoretical structure factors computed for experimental geometry with the use of Cl -1 ionic scattering radial functions.

Figure S16
Hirshfeld surface based fingerprints of the chloride anion in the CC (left), the chloride anion (up) and the water molecule (down) in the ACH (middle), and the first (up) and second (down) chloride anions in the GDC (right) crystal structures.

Figure S17
Percentage contributions of different interatomic contact types to the Hirshfeld surfaces of particular molecules present in the CC, ACH and GDC.