Estimating temperature-dependent anisotropic hydrogen displacements with the invariom database and a new segmented rigid-body analysis program

A novel method and a new program for estimating anisotropic displacement parameters for H atoms are presented. Results are validated against molecular orbital computations and neutron diffraction data.


Introduction
Improving the accuracy of structural information derived from conventional single-crystal X-ray diffraction (XRD) experiments has been the initial aim for transferring aspherical scattering factors (Brock et al., 1991) and it remains the central aim of the generalized invariom database (GID; Dittrich et al., 2013). Deriving molecular properties from aspherical electron density is another important and closely related aim, since such properties can only be as accurate as the best possible set of coordinates that can be refined from a given data set.
Anisotropic displacement parameters (ADPs) of atoms and their electron density distribution (EDD) are correlated (Hirshfeld, 1976). It has hence been shown to be beneficial to incorporate hydrogen ADPs for obtaining accurate EDDs by least-squares refinement of multipole parameters against highresolution data (Madsen et al., 2004). Since this must also hold true for parameters derived from conventional data sets, our recent efforts concern the estimation of temperature-dependent hydrogen ADPs (H-ADPs) for selected applications in structure determinations with the independent atom model (IAM) as well as refinement with aspherical scattering factors. This includes charge density (CD) studies and refinements of data sets of normal resolution (sin = ¼ 0:6 Å À1 or ISSN 1600-5767 d ¼ 0:84 Å ) using scattering-factor databases, of which the GID is one. 1 Scattering of H atoms is limited in reciprocal space owing to their comparably low scattering contribution and their missing core density. XRD therefore has limitations in locating their positions and displacements accurately (Cooper et al., 2010), a situation that has already been improved considerably by scattering factor databases (Dittrich et al., 2005) and Hirshfeld atom refinement (Jayatilaka & Dittrich, 2008;Capelli et al., 2014), as comparisons with results from neutron diffraction have shown. These developments allow free refinement of hydrogen parameters but require good low-order data (Orben & Dittrich, 2014).
Several authors have proposed improved or alternative hydrogen treatment in XRD, mainly for CD work. With respect to positional parameters Hoser et al. (2009) recommend to only use low-order reflections from a high-resolution data set to determine the X-H directions, and then elongate to average neutron diffraction values (Allen & Bruno, 2010), whereas we advocate the use of calculated positions and X-H distances from the invariom database (Schü rmann et al., 2012), a procedure also applicable to data sets of low quality.
The SHADE (simple hydrogen anisotropic displacement estimator; Madsen, 2006) and SHADE2 servers (Munshi et al., 2008) can provide estimates of H-ADPs by combining a TLS (translation-libration-screw) fit (Schomaker & Trueblood, 1968) of the non-H atoms with average internal modes tabulated from neutron diffraction. The SHADE2 server has established its usefulness in CD research.
Other ways to estimate ADPs for H atoms have been developed. Displacements can likewise be computed from spectroscopic data as implemented in the SHADE3 server (Roversi & Destro, 2004;Madsen & Hoser, 2014). This idea was applied first by Hirshfeld & Hope (1980). One can also carry out theoretical optimizations of isolated molecular structures (Flaig et al., 1998) or employ QM/MM cluster computations to retrieve the structure found in the crystal (Whitten & Spackman, 2006). Computed frequencies can subsequently be converted into internal atomic displacements, which are again combined with a TLS analysis 2 after appropriate scaling (Scott & Radom, 1996). Last but not least, full periodic computations, implemented in the SHADE3 server, may also provide H-ADPs (Madsen et al., 2013;Madsen & Hoser, 2014). However, all these approaches have disadvan-tages: estimates derived from diffraction data do not take into account temperature dependence of the internal contribution of atomic vibration; 3 neutron data for the SHADE2 approach are not available for rare bonding environments; theoretical studies require high computational costs and are thus unsuited for conventional structure determinations. This is why we introduce a new approach based on the invariom database, combined with a new freely available TLS analysis program.
Our approach relies on the geometry-optimized model compounds in the invariom database. 4 It covers a wide range of chemical environments in organic chemistry (Dittrich et al., 2013) and now also facilitates aspherical-atom refinements of coordination compounds (Dittrich et al., 2015). Earlier work is here extended by providing functionality to estimate H-ADPs relying on the established empirical rules of partitioning electron density with invarioms. The rules already allow one to separate and reconstruct molecular EDDs from fragments and now also provide estimates of the internal modes of vibration of a particular chemical environment.
Estimation of H-ADPs thus allows further improvement of all those structures where chemical environments are covered by scattering factors of the GID. Moreover, estimated H-ADPs increase the choices in handling three common situations: (a) data of low quality can be better evaluated by reducing the number of refined parameters; (b) high-quality data of comparably low resolution are available; or (c) refinement of H-atom positions becomes an option when aspherical scattering factors and ADPs are kept fixed, thereby reaching better agreement with results from neutron diffraction and bond-length predictions of quantum chemistry.
Central to this work is the underlying development of a new segmented-body (Schomaker & Trueblood, 1998) TLS refinement program called APD-Toolkit (anisotropic proton displacement toolkit), which is introduced here.

Automatic segmented rigid-body analysis
A simple approach that can provide information on the coupling between internal and external displacements is to assume segmented rigid-body motion. Our implementation analyzes the shape of all measured ADPs and determines how attached rigid groups should be added to the otherwise rigid body to best fit the observed ADPs. After internal and external contributions are estimated, a displacement model for H atoms is then generated by adding both contributions. The well known Fortran77 program for TLS fits THMA14c (Schomaker & Trueblood, 1998) is limited to 230 atoms in the asymmetric unit and can only handle up to seven manually defined attached rigid groups. These limitations were our motivation to develop a more flexible solution. Our program was developed to estimate the ADPs of H atoms and will be discussed next. 1 Four scattering-factor databases currently exist: the 'supramolecular-synthon based fragments approach' (SBFA; Hathwar et al., 2011), the 'experimental library multipolar atom model' (ELMAM2; Zarychta et al., 2007;Domagała et al., 2012) (both based on high-resolution experiments), the 'generalized invariom database' (GID; Dittrich et al., 2006Dittrich et al., , 2013 and the 'University at Buffalo Databank' (UBDB2011; Dominiak et al., 2007;Jarzembska & Dominiak, 2012) (the latter two based on theoretical DFT computations). All four rely on the established Hansen/Coppens multipole model (Hansen & Coppens, 1978) and can successfully be used to improve the accuracy and precision of least-squares structure refinements. 2 In the context of this manuscript the term 'TLS analysis' is used for a postrefinement analysis of the atomic displacement parameters determined by the refinement program. The term 'TLS refinement' is not used to avoid confusion with procedures that apply a TLS model during the refinement process (Merritt, 1999).

Workflow of the program
The APD-Toolkit carries out the following steps: (1) Determination of invariom names of all atoms.
(2) Calculation of internal displacement parameters from GAUSSIAN (Frisch et al., 2013) output files and caching of results 5 for subsequent applications.
(3) Transformation of internal ADPs to the crystal coordinate system.
(4) Calculation of the difference between observed and calculated internal ADPs for all non-H atoms to remove contamination of the TLS parameters with internal ADPs.
(5) Determination of a suitable segmentation model for the segmented rigid-body analysis.
(6) Computation of a physically meaningful set of TLS+ARG (attached rigid group) parameters describing ADP differences.
(7) Computation of external ADPs for all H atoms based on the TLS+ARG parameters and the atomic coordinates.
(8) Estimation of H-ADPs by adding internal and external contributions.

Automated rigid-body segmentation
APD-Toolkit automatically analyzes the shape of non-Hatom ADPs to obtain a suitable segmentation model for a segmented rigid-body analysis. In contrast to similar procedures in protein refinement (Painter & Merritt, 2006) the method implemented analyzes the refined model to find a physically plausible segmentation model instead of finding the model that minimizes the R 1 (F) value.
The procedure works as follows. In a first step all single bonds in the molecule are flagged as potential axes for vibrations along a torsion angle (Blom & Haaland, 1985). For every potential axis the molecule is then separated into two parts that are connected only by one bond representing the rotation axis. The smaller of the two groups is considered to be the attached rigid group (Schomaker & Trueblood, 1998); the larger one is the rigid body. In a next step the difference of ADPs ( in i ) of a pair of bonded atoms in the direction of the connecting vector is computed for all atom pairs within the attached rigid group. In addition, the corresponding value ( out i ) is determined for all atom pairs where one atom is part of the attached rigid group and the other atoms are part of the rigid body. For every potential rotation axis the 'rigidity index' is then determined, as defined in equation (1) and illustrated in Fig. 1. If is negative, the implied attached rigid group is accepted. The expression of is purely empirical.
The factor " is used to control the weight between the rigidity of the ARG and the flexibility relative to the rest of the group. A value of " ¼ 2 gave the most reasonable results in our studies.
The groups are cross-referenced after all rigid groups are assigned to ensure that they all consist of at least eight atoms. Smaller groups are discarded, since they would not allow stable optimization. 6 After a suitable segmentation model is found, a least-squares optimization is carried out to find the optimal TLS+ARG parameters.
The procedure is applied for each molecule in cases where the asymmetric unit contains more than one molecule. Adaptation of the procedure for disordered compounds and molecules on special positions is planned only for future versions.

Results and discussion
3.1. Similarity of ADPs from TLS+ONIOM and TLS+INV 7 estimates: similarity of U(2) from TLS+ONIOM and TLS+INV For initial validation, theoretical ADPs taken from the generalized invariom database were compared with those obtained from ONIOM computations (Svensson et al., 1996) to assess the transferability of internal ADPs. Computations were performed with the B3LYP functional and the basis set combination 6-31G(d,p):3-21G, which has been shown to be a good compromise between computational requirements and the quality of results . Internal ADPs are not compared directly since the internal parts of the ADPs encompass different parts of the overall displacements. Illustration of the rigidity index: the NO 2 group attached to a molecule R is considered rigid if the average of out i is twice as large as the average value of in i .
5 Frequency information is extracted from all GAUSSIAN output files and stored in a database linking normal modes to the model compound including its geometry. This reduces the amount of data to store from several hundred gigabytes to about 100 MB. In a second step, when the method is applied to a structural model, the database is loaded and displacement parameters are calculated for all model compounds at the given temperature. The results are stored in a second database file now containing all internal ADPs at a certain temperature. The file is about 4 MB in size. During every subsequent application of the method it is checked whether the database already contains ADPs for the current temperature and ADPs are only recomputed if the method is applied at this temperature for the first time. Otherwise the cached results are used.
with external displacements derived from TLS analysis, analogously to x3.3. For estimating internal ADPs a lowfrequency cutoff value of 200 cm À2 was used (Madsen et al., 2013). 8 Structural models of four test structures obtained by the TLS+INV approach are shown in Figs. 2-5. 9 To quantitatively compare ADPs obtained by different methods a procedure proposed by Whitten and Spackman was employed (Whitten & Spackman, 2006;Munshi et al., 2008). This procedure determines the spacial overlap of two sets of ADPs. It yields a value of the comparison parameter (S) of zero if both ADPs are identical and a value of 100 if the ADPs do not overlap at all. S is computed as Tables 1-3 indicate that the agreement between the two methods depends on whether or not an H atom is involved in hydrogen bonding. In these cases the ONIOM estimate is more realistic since the bonding interactions, which are omitted in the TLS+INV approach, add forces to the H atoms that counteract vibrational movement. For those atoms not involved in hydrogen bonding the agreement is good, especially in cases where the asymmetric unit is described as one overall rigid body. This is supported by the very small discrepancies seen in the structures of MBADNP and xylitol. In these cases the non-hydrogen-bonded atoms have nearly identical ADPs. When the asymmetric unit content is more flexible or contains more than one molecule the agreement becomes less good, as evident in the structure of l-phenylalaninium hydrogen maleate. Since the TLS+INV approach does not include intermolecular interactions it predicts larger ADPs than the ONIOM model, which approximates these interactions. Slightly larger differences seen for methyl-group H atoms can also be explained by intermolecular interactions: while the rotational movement of the methyl group around a C-X (X not hydrogen) single bond usually has a discrete minimum for an isolated molecule, intermolecular interactions      Structural model of methylbenzylaminodinitropyridine (MBADNP) at 20 K (Cole et al., 2002) with ADPs estimated with the TLS+INV approach.
8 Inaccuracies introduced in the estimation of internal displacements in the TLS+INV model are absorbed in the TLS part. We argue that the threshold separating internal and external displacements is arbitrary to some extent. Contributions to the displacements that are caused by vibrations with frequencies close to the threshold can be modeled by an internal and an external displacement model alike. 9 ONIOM computations for dimethylbiguanidiniumbishydrogensquarate did not converge and were therefore not included in the comparison.
can lead to flattening of the potential, thus reducing the force required for rotating these groups.
Overall, the differences between the two methods are of the same order of magnitude as the differences seen between the estimated models and neutron diffraction derived models discussed below. We therefore argue that the TLS+INV method is an equivalent and easier to apply substitute for the computationally more demanding TLS+ONIOM approach. Empirical corrections for hydrogen bonding could be added at a later stage.

Temperature dependence of relative U iso values
Accounting for the measurement temperature when calculating the internal contributions to the ADPs avoids systematic errors that otherwise would affect data sets collected at low temperatures, especially below 100 K (Lü bben et al., 2014). That the temperature-dependent behavior is well reproduced in the TLS+INV approach introduced here is shown by comparing the temperature dependence of U iso values obtained by the TLS+INV model with those determined from neutron diffraction studies and from ONIOM cluster computations in the same manner as in our earlier work (Fig. 6).
These results are in very good agreement with those of Lü bben et al. (2014) and reproduce the temperature dependence. Additionally, the TLS+INV approach is able to estimate unbiased ADPs in cases where H atoms are disordered. Since the invariom approach relies on non-interacting molecules in the gas phase, displacement parameters of H atoms involved in hydrogen bonding are less well estimated.

Comparison with results from neutron diffraction
Estimated ADPs were compared with ADPs refined against neutron diffraction data to further validate the TLS+INV method. A set of four structures where both high-resolution X-ray data and neutron data are available were taken from the literature (Cole et al., 2002;Woiń ska et al., 2014;Ş erb et al., 2014;Madsen et al., 2003). A scaling model was fitted to each neutron data set to bring both sets of ADPs onto the same scale (Blessing, 1995). This was achieved by computing the set of parameters S 1 -S 7 in equation (3) that minimize the difference between ADPs of equivalent atoms in both models.
Modified TLS+INV results were then compared with those obtained with the SHADE server. One should note that the accuracy and absolute scale of H-ADPs remain unknown. While appropriate for atoms with similar mass, it has been shown that this scaling model does not yield accurate results when heavier atoms like iron are involved (Blessing, 1995). It is therefore reasonable to suspect that application of scaling parameters obtained by fitting against C and O atoms yields only rough estimates of hydrogen parameters. Absolute values of this comparison should therefore be interpreted with

Figure 6
Temperature dependence of U H iso =U X eq ratios obtained with the TLS+INV approach. The results are in very good agreement with those from our earlier study (Lü bben et al., 2014) reporting neutron and TLS+ONIOM results. Note: H atoms with the invariom name H1c[1c1h1h] are disordered and therefore appear larger when compared with the TLS+INV model. The H atom with the invariom name H1o[1c] is involved in hydrogen bonding, which is not accounted for in the TLS+INV model. Therefore, its U H iso =U X eq ratio is systematically larger.
caution. Concerning this problem, the SHADE server benefits from error cancellation: since the internal ADPs are obtained from neutron diffraction studies, the ADPs are already scaled appropriately for comparison with neutron diffraction results and possible systematic errors could be obscured.
The S values listed in Tables 4-7 quantify differences between the respective methods of estimation and ADPs from models refined against neutron diffraction data. The estimation methods are thereby not compared directly; instead their agreement with experimental data is given.
Overall, one can clearly see that the SHADE server estimates are closer to the model refined against neutron diffraction data. However, the overall differences between all three models are of the same order of magnitude. It should also be noted that the values depend on the applied refinement model that was used prior to TLS analysis. A variation of about 0.5-0.7 in the average S values was observed when the invariom aspherical-atom model rather than the IAM model was used. Such variations do not appear to be systematic and can lead to smaller as well as larger S values. Therefore we conclude that the overall standard deviations of the S values must be of the same order of magnitude.
For the case of xylitol it is noteworthy that parameterization of the SHADE server was initiated with neutron data for this compound. Hence it is expected that the SHADE server performs especially well for this structure.

Usability
The program APD-Toolkit was designed specifically to be easy to use. To demonstrate this a series of crystal structures were taken from the literature. The structures were re-refined with the invariom model and a subsequent TLS+INV treatment was applied. The TLS+INV application only requires one program call with output files from a previous refinement and no further input is needed. Currently SHELXL-style (Sheldrick, 2008) .res files, XD-style (http://xd.chem.buffalo. edu/ ) .res files, CIFs and PDB files (http://www.rcsb.org/) are supported. In Table 8 the effect of including the estimated H-ADPs on R 1 (F) was investigated. Respective refinement models use the same number of parameters. Table 8 shows that the improvements in the R 1 (F) values are temperature dependent. We chose R 1 (F) (with units weights) for historic reasons and since unweighted R 2 (F) is not very meaningful. When non-hydrogen ADPs are large they increasingly deviate from the segmented rigid-body approximation, possibly because of anharmonic vibrational behavior (Zhurov et al., 2011). Therefore TLS analysis may not provide   The large discrepancy for atom H71 is most likely due to ill-determined displacement parameters in the neutron refinement, as becomes obvious from visual inspection.   Table 8 Temperature and resolution dependence of the improvements in the R value for a series of structure determinations.

Conclusion and outlook
The combination of segmented rigid-body analysis with information from geometry optimized model compounds allows one to rapidly estimate anisotropic hydrogen displacements from tabulated values in the TLS+INV approach introduced here. The invariom approach is thus being extended to predict not only aspherical scattering factors but also H-ADPs, all from one consistent model and notation. This is an important advantage over other scattering-factor databases.
The program APD-Toolkit provides an easy to use way of estimating these displacement parameters with an accuracy comparable to the TLS+ONIOM method without the need for extensive computations upon application. The software is a standalone alternative to the SHADE server, is freely available for download (http://ewald.ac.chemie.uni-goettingen.de/ programs.html; https://github.com/jluebben/APD-toolkit) for various operating systems and can be easily adapted for other applications. The underlying TLS+ARG implementation can be combined with other software to generate segmented rigidbody models in an automatic fashion -without requiring specialized input-file formats or restrictions in system size.

APPENDIX A Transformations and TLS fit A1. Coordinate transformation
The invariom database stores structural parameters like atomic positions and corresponding ADPs in a standardized format. These parameters need to be transformed to the crystal's native coordinate system upon application.
Atomic positions are stored in an artificial crystal coordinate system in fractional coordinates. The artificial cell is cubic with a cell length of 30 Å .
ADPs are obtained from frequency computations carried out with GAUSSIAN and stored in a Cartesian coordinate system.
If V is the unit-cell volume, the matrix M fc is used to transform from fractional space to Cartesian space: : M cf transforms from Cartesian to fractional systems: If M fc;inv is M fc with a ¼ b ¼ c ¼ 30 Å and ¼ ¼ ¼ 90 and M cf;cryst is M cf with the crystal's cell parameters, the atomic position of an atom in the invariom database v inv in the crystal's coordinate system v cryst can be computed as The matrix representation of an ADP in Cartesian space, is transferred to the crystal's coordinates system with where a, b and c are the cell constants of the crystal.