Transferable Hirshfeld atom model for rapid evaluation of aspherical atomic form factors

A databank of atomic densities calculated using Hirshfeld partitioning has been developed. It allows refinement with accuracy similar to that of Hirshfeld atom refinement without the need for time-consuming wavefunction calculations.


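S1. Some properties of the S12 similarity index
The $S_{12}$ similarity index is defined as $S_{12} = 100(1 - R_{12})$, where $R_{12}$ is defined as follows:
$$R_{12} = \int \sqrt{p_1(\mathbf{x})\, p_2(\mathbf{x})}\; d^3\mathbf{x}$$
where $p(\mathbf{x})$ is the probability density function (pdf) for finding an atom displaced by the vector $\mathbf{x}$ from its equilibrium position. In fact, $R_{12}$ is equal to the correlation coefficient (Merritt, 1999), a popular measure in macromolecular crystallography; the equality can be checked by comparing the exact expressions given in the original publications. $R_{12}$ is known in statistics as the Bhattacharyya coefficient (Bhattacharyya, 1943, 1946).

The pdf $p(\mathbf{x})$ is given by:
$$p(\mathbf{x}) = (2\pi)^{-3/2} (\det \mathbf{U})^{-1/2} \exp\left(-\tfrac{1}{2}\, \mathbf{x}^{T} \mathbf{U}^{-1} \mathbf{x}\right)$$
where the matrix $\mathbf{U}$ is the mean-square displacement tensor, whose components are known as anisotropic atomic displacement parameters (ADPs). The matrices $\mathbf{U}$ and $\mathbf{U}^{-1}$ have common eigenvectors. In a coordinate system whose axes are oriented along the eigenvectors of $\mathbf{U}$, the tensor becomes diagonal, with eigenvalues $u_i$ ($i = 1, 2, 3$), and the pdf becomes a product of three univariate Gaussian functions, each of the following form:
$$p_i(x_i) = (2\pi u_i)^{-1/2} \exp\left(-\frac{x_i^2}{2 u_i}\right)$$

When the two compared tensors share their eigenvectors, the $R_{12}$ integral becomes a product of three single-variable integrals, each of the form:
$$\int \sqrt{p_{1,i}(x)\, p_{2,i}(x)}\; dx = \left(\frac{2\sqrt{u_{1,i}\, u_{2,i}}}{u_{1,i} + u_{2,i}}\right)^{1/2}$$
If a tensor $\mathbf{U}$ is compared with its $n$-times larger version $n\mathbf{U}$, then each of the factors becomes:
$$\left(\frac{2\sqrt{n}}{1 + n}\right)^{1/2}$$
and, as a consequence, enlarging the tensor $\mathbf{U}$ $n$ times and comparing it with the original one gives:
$$S_{12}(n) = 100\left[1 - \left(\frac{2\sqrt{n}}{1 + n}\right)^{3/2}\right]$$
This shows that $S_{12}(n)$ behaves like a quadratic function of $n$ around $n = 1$, i.e. it grows slowly in that region. For example, enlarging the ADPs by 5% gives an $S_{12}$ value of only about 0.045 "percent" when compared with the original ones.

References
Bhattacharyya, A. (1943). Bull. Calcutta Math. Soc. 55, 99-110.
Bhattacharyya, A. (1946). Indian J. Stat. 7, 401-406.
Merritt, E. A. (1999). Acta Cryst. D55, 1997-2004.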
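The scaling behaviour described above can be checked numerically. The sketch below is only an illustration: it evaluates the overlap integral of two zero-mean Gaussian pdfs from the standard closed form in terms of determinants, and compares it with the expression for a tensor scaled by a factor n. The tensor values and the function names (`det3`, `s12`, `s12_scaled`) are assumptions introduced for this example and are not taken from the paper.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested sequences."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def s12(u1, u2):
    """S12 = 100 (1 - R12) for two ADP tensors expressed in the same frame.

    R12 is the overlap integral of two zero-mean Gaussian pdfs:
    R12 = 2^(3/2) (det U1 det U2)^(1/4) / (det(U1 + U2))^(1/2).
    """
    total = [[u1[r][c] + u2[r][c] for c in range(3)] for r in range(3)]
    r12 = 2.0 ** 1.5 * (det3(u1) * det3(u2)) ** 0.25 / math.sqrt(det3(total))
    return 100.0 * (1.0 - r12)


def s12_scaled(n):
    """S12 between a tensor U and n U: 100 [1 - (2 sqrt(n) / (1 + n))^(3/2)]."""
    return 100.0 * (1.0 - (2.0 * math.sqrt(n) / (1.0 + n)) ** 1.5)


# Hypothetical, positive-definite ADP tensor in Å^2 (illustrative values only).
U = [[0.020, 0.003, -0.001],
     [0.003, 0.030, 0.004],
     [-0.001, 0.004, 0.025]]
U_scaled = [[1.05 * x for x in row] for row in U]

print(round(s12(U, U_scaled), 4))   # general determinant formula
print(round(s12_scaled(1.05), 4))   # closed form for a scaled tensor
```

Both printed values should agree and lie close to the 0.045 quoted in the text, independently of the particular tensor chosen, because the scaled result depends only on n.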

S3. Accuracy of multipole expansion of atomic electron density in HAR
Table S1 Comparison of structures obtained from HAR refinement using atomic electron densities represented (1) with numerical values on an integration grid (standard approach) and (2) with a multipole expansion of the densities (up to a fixed order of spherical harmonics). The reported quantities are the average absolute difference in the length of covalent bonds to hydrogen atoms and the average rescaled overlapping coefficient for hydrogen atom ADPs.

S4.2. Final selection of molecules/ions
After each structure is divided into separate chemical units (molecules/ions) and atom types are assigned, a selection algorithm is applied to choose the final set of chemical units for further use in the creation of the databank. This set is used as a source of input geometries for wave function calculations after the X-H bond lengths have been adjusted to tabulated values from neutron measurements.
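The bond length adjustment mentioned above amounts to moving each hydrogen atom along its X-H bond direction until the bond has the tabulated length. A minimal sketch follows, assuming Cartesian coordinates in Å; the function name `set_bond_length` and the target value are placeholders chosen for illustration, since the tabulated neutron values are not listed in this section.

```python
import math


def set_bond_length(x_pos, h_pos, target):
    """Return a new H position at distance `target` from X along the X->H direction."""
    bond = [h - x for x, h in zip(x_pos, h_pos)]
    length = math.sqrt(sum(c * c for c in bond))
    if length == 0.0:
        raise ValueError("X and H positions coincide; bond direction is undefined")
    scale = target / length
    return [x + scale * c for x, c in zip(x_pos, bond)]


# Hypothetical example: a C-H bond of 0.96 Å elongated to a placeholder target length.
carbon = [0.0, 0.0, 0.0]
hydrogen = [0.0, 0.0, 0.96]
TARGET_C_H = 1.09  # placeholder value in Å, not taken from the paper

new_h = set_bond_length(carbon, hydrogen, TARGET_C_H)
print(new_h)
```

Only the hydrogen atom is moved; the position of the heavier atom X and the direction of the bond are left unchanged.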

The chemical unit selection algorithm:
A chemical unit selection procedure is applied repeatedly. Each application adds one chemical unit to the final set of chemical units. The procedure is repeated until all atom types are represented in at least N chemical units.
The chemical unit selection procedure:
1. Calculate the score function for all chemical units not selected so far (the selected ones are not used in the selection procedure).
2. Select the chemical unit with the highest score.
The score function calculation: the score is calculated from a sum over atom types. The quantities entering the score are: the number of atoms in the chemical unit; a factor equal to 100 if a chemical unit with an identical structural formula has already been chosen and 1 otherwise; the index of an atom type; the target number of chemical units containing atoms of a given type; the number of chemical units containing an atom of that type selected so far; and the similarity to the other selected chemical units containing an atom of that type. The comparison of structural formulas needed to evaluate the factor of 100 or 1 is performed with a graph isomorphism algorithm, using molecular graphs with chemical elements as 'colours' of the graph nodes and chemical bonds as edges.
The similarity is calculated from sums that run over the chemical units containing atoms of the given type selected so far. The expression involves a quantity restricted to the interval (0, 1) and a "scalar product" of two chemical formulas $f_1$ and $f_2$:
$$(f_1, f_2) = \sum_{e} n(e, f_1)\, n(e, f_2)$$
where the summation runs over all chemical elements $e$ occurring in the two chemical formulas $f_1$ and $f_2$, and $n(e, f_k)$ is the number of atoms of the element $e$ in the formula $f_k$.
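The repeated selection procedure described above is a greedy loop, which can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the score expression itself is not reproduced here, so `make_coverage_score` is a hypothetical stand-in that simply counts the atom types of a unit that are still under-represented. The helper `formula_dot` assumes that the "scalar product" of two chemical formulas is the sum over elements of the products of the atom counts. All names and the toy data are invented for the example.

```python
from collections import Counter


def formula_dot(f1, f2):
    """Assumed 'scalar product': sum over elements of the product of atom counts."""
    return sum(f1[e] * f2[e] for e in set(f1) & set(f2))


def select_units(units, target, score):
    """Greedy selection: repeatedly add the highest-scoring unselected unit.

    units  : dict mapping unit name -> set of atom types present in the unit
    target : required number of selected units per atom type (N in the text)
    score  : callable(name, selected, counts) -> number, supplied by the caller
    """
    all_types = set().union(*units.values())
    counts = Counter()          # atom type -> number of selected units containing it
    selected = []
    remaining = set(units)
    # Stop when every atom type is covered, or when no candidates are left.
    while remaining and any(counts[t] < target for t in all_types):
        # sorted() makes tie-breaking deterministic (first name in sorted order wins).
        best = max(sorted(remaining), key=lambda name: score(name, selected, counts))
        selected.append(best)
        remaining.remove(best)
        counts.update(units[best])
    return selected


def make_coverage_score(units, target):
    """Hypothetical stand-in score: atom types of the unit still below the target."""
    def score(name, selected, counts):
        return sum(1 for t in units[name] if counts[t] < target)
    return score


# Toy data with made-up atom type labels.
units = {"A": {"t1", "t2"}, "B": {"t2", "t3"}, "C": {"t1", "t2", "t3"}}
print(select_units(units, 2, make_coverage_score(units, 2)))

water = {"H": 2, "O": 1}
methanol = {"C": 1, "H": 4, "O": 1}
print(formula_dot(water, methanol))
```

Passing the score as a callable keeps the loop independent of the particular scoring expression, so a function implementing the full score (including the penalty for repeated structural formulas and the similarity term) could be substituted without changing the selection loop.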

Figure S1 Comparison of ADP similarity indices (S12) for the xylitol structure (a) from neutron measurement and (b) from HAR, and for ice VI (c) from neutron measurement and (d) from HAR.

Figure S3 Comparison of X-H bond lengths with reference neutron diffraction data in terms of average difference (in mÅ) for (a) non-polar (C-H) and (b) polar (N-H and O-H) bonds. THAM and HAR models are based on B3LYP/cc-pVTZ unless specified otherwise; (±) stands for HAR with the crystal environment represented via point multipoles. Structure abbreviations: G-A = Gly-L-Ala, Xyl. = xylitol, Car. = carbamazepine, NAC = NAC·H2O.

Table S3 Comparison of the average difference in X-H bond lengths (in mÅ) between results from X-ray structure refinement and reference neutron study data (X-ray value minus neutron value). HAR (±) stands for HAR with the crystal environment represented via point multipoles, HAR (alone) for the version without such a representation.

Table S4 Comparison of hydrogen atom ADPs obtained with aspherical atom model X-ray refinements in terms of the similarity index S12, with neutron diffraction experiments used as a reference. HAR (±) stands for HAR with the crystal environment represented via point multipoles, HAR (alone) for the version without such a representation.

Table S5 Comparison of R1 and wR2 agreement factors, of X-H bond lengths derived from aspherical atom models with reference neutron diffraction data in terms of absolute difference (in mÅ), and of hydrogen atom ADPs obtained with aspherical atom model X-ray refinements in terms of the rescaled overlapping index. All methods compared (TAAM, THAM and HAR with the crystal environment represented via point multipoles) are based on the B3LYP/6-31G(d,p) level of theory.