A survey of thermal expansion coefficients for organic molecular crystals in the Cambridge Structural Database

Thermal expansion coefficients are calculated for 6201 molecular crystals in the Cambridge Structural Database and the distributions of the values are assessed.

the transformation matrix to yield Cartesian axes from the crystal axes, which depends on the convention chosen. Schlenker (1978), PASCal and the Bilbao Crystallographic Server all adopt the Institute of Radio Engineers (IRE) convention: z(cart) parallel to crystal c, x(cart) parallel to crystal a*, and y(cart) perpendicular to x(cart) and z(cart), completing a right-handed set. The same convention was implemented in the Python code, and the resulting strain tensors were validated against the STRAIN module of the Bilbao server.
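The following is a minimal sketch of the IRE orthogonalisation and a strain tensor built from it. It is not the Python code referred to above. The function names `ire_matrix` and `strain_tensor` are hypothetical. The infinitesimal (symmetrised) strain definition eps = (F + F^T)/2 - I is an assumption, and the exact strain definition used by the original code or by Bilbao STRAIN may differ.

```python
import numpy as np


def ire_matrix(a, b, c, alpha, beta, gamma):
    """Columns are the lattice vectors a, b, c in the IRE Cartesian frame:
    z || c, x || a*, y completing a right-handed set. Angles in degrees.
    Fractional -> Cartesian coordinates: r_cart = M @ r_frac."""
    al, be, ga = np.radians([alpha, beta, gamma])
    ca, cb, cg = np.cos(al), np.cos(be), np.cos(ga)
    sa = np.sin(al)
    # Unit-cell volume from the six cell parameters
    vol = a * b * c * np.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)
    # b and c lie in the y-z plane because x is parallel to a* (normal to b, c)
    a_vec = np.array([vol / (b * c * sa), a * (cg - ca * cb) / sa, a * cb])
    b_vec = np.array([0.0, b * sa, b * ca])
    c_vec = np.array([0.0, 0.0, c])
    return np.column_stack([a_vec, b_vec, c_vec])


def strain_tensor(cell_ref, cell):
    """Symmetric small-strain tensor of `cell` relative to `cell_ref`.
    Each cell is (a, b, c, alpha, beta, gamma). Assumed definition:
    eps = (F + F^T)/2 - I, with deformation gradient F = M @ inv(M_ref)."""
    F = ire_matrix(*cell) @ np.linalg.inv(ire_matrix(*cell_ref))
    return 0.5 * (F + F.T) - np.eye(3)
```

For an orthorhombic cell the tensor is diagonal, with entries Δa/a, Δb/b and Δc/c. In general the principal strains are the eigenvalues, for example `np.linalg.eigvalsh(strain_tensor(cell_ref, cell))`.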

S3. Calculating the principal expansion coefficients
The three principal strains, L/L, are obtained as the eigenvalues of the strain tensors. For each data point above the minimum temperature, the strain tensor is calculated relative to the lowest temperature, then a linear LS fit is applied. As for PASCal, the eigenvalues at each step are sorted by magnitude and assumed to be in the same sequence through the range. The resulting  L values are relative to L at the minimum temperature in the supplied range. For consistency with the approach applied to the volume fit, the values are re-scaled to refer to L at T = 298 K.  L = 1E6 × LS gradient / (1 + L/L (298 K)) ppm K -1 for the plot of each eigenvalue vs T. Acta Cryst. (2021). B77, doi:10.1107/S2052520621003309 Supporting information, sup-2

S4. Calculating errors
Standard uncertainties are calculated using heteroscedasticity-consistent standard errors, as defined at https://en.wikipedia.org/wiki/Heteroscedasticity-consistent_standard_errors. For α_V, the quoted standard uncertainty is the standard error on the gradient of V vs T, divided by the reference volume extrapolated to 298 K. For α_L, the quoted standard uncertainty is the standard error on the gradient of ΔL/L vs T, divided by the reference value of L/L(T_min) extrapolated to 298 K, i.e. 1 + ΔL/L (298 K).
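A minimal sketch of the volumetric case follows, assuming the HC0 (White) variant of the heteroscedasticity-consistent estimator. The text does not say which variant (HC0 to HC3) was used. The function names are hypothetical. The 10^6 scaling of the uncertainty to ppm K⁻¹ is assumed to mirror the coefficient itself.

```python
import numpy as np


def hc0_slope_fit(x, y):
    """OLS straight line y = b0 + b1*x with an HC0 (White) standard error
    on the slope. Returns (b0, b1, se_b1). HC0 is an assumption here."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    meat = X.T @ (X * resid[:, None] ** 2)  # X^T diag(e_i^2) X
    cov = xtx_inv @ meat @ xtx_inv          # sandwich covariance estimate
    return beta[0], beta[1], np.sqrt(cov[1, 1])


def volumetric_coefficient(temps, volumes, t_ref=298.0):
    """alpha_V and its s.u. (ppm K^-1), both divided by the fitted V(298 K)."""
    b0, b1, se = hc0_slope_fit(temps, volumes)
    v_ref = b0 + b1 * t_ref  # reference volume extrapolated to 298 K
    return 1e6 * b1 / v_ref, 1e6 * se / v_ref
```

Unlike the ordinary OLS standard error, the sandwich form weights each point by its own squared residual. It therefore does not assume equal scatter at every temperature.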

S5. Fitting of the distributions
Histograms were produced using EXCEL, with bin ranges chosen to provide a smooth representation of the distribution. Continuous distributions were fitted to the derived histogram values using the SOLVER within EXCEL. For the volumetric coefficient, a normal distribution was applied. Three parameters (scale, mean, su) were optimised so as to minimise the sum of the squared differences between the normal distribution and the value in each histogram bin. The x values for the fit (denoted T in the EXCEL formula below) were taken to be the midpoints of the bins. For the principal coefficients and the anisotropy measure, the histogram was initially fitted using a skew normal distribution (https://en.wikipedia.org/wiki/Skew_normal_distribution), defined in EXCEL as follows:

scale * NORM.DIST(T, mean, su, FALSE) * NORM.DIST(alpha*T, alpha*mean, su, TRUE)

The first term is a symmetrical normal probability density, and the second term is a cumulative normal distribution whose argument is scaled by the skew parameter alpha. Four parameters (scale, mean, su, alpha) were optimised so as to minimise the sum of the squared differences between the skew normal distribution and the value in each histogram bin. The resulting continuous skew normal distribution was then approximated by two half normal distributions, defined with a common mean but individual standard deviations, su(L) and su(R), and a single scale parameter, the two halves being linked by the ratio su(R)/su(L). The four parameters defining these two half normal distributions (mean, su(L), su(R) and scale) were optimised so as to minimise the squared differences relative to the continuous skew normal distribution over the full range of the plot, with this sum of squares covering the lower half of the left distribution and the upper half of the right distribution. The values quoted in the paper are rounded to integers, to avoid any false indication of precision.

Table S1
This is noted in the text to be an example of a family containing a significant outlier at 173 K.

T (K)   α1 (ppm K⁻¹)   α2 (ppm K⁻¹)   α3 (ppm K⁻¹)   α1+α2+α3   α_V (ppm K⁻¹)
173     -23 (17)       26 (10)        49 (4)         52         136 (51)
190     -13 (2)        55 (3)         56 (1)         98         102 (7)

For this case (173 K row), with the poor linear fit to all of the points, the volumetric coefficient derived from a fit of V vs T (136 ppm K⁻¹) is significantly different from the sum of the principal coefficients (52 ppm K⁻¹).

S7.2. MNYPDO

Table S3
The largest identified structure family, noted in the text to contain a clear outlier at 296 K. The orthorhombic structure permits an independent check on the ΔL/L calculations, and on the extrapolation of α_L to 298 K, by a direct plot of L vs T.

Table S5
This example is noted in the text to be an exceptional case reported in the literature. The orthorhombic structure has principal axes aligned with the crystal axes, which permits independent checks on the ΔL/L calculations, and on the extrapolation of α_L to 298 K, by a direct plot of L vs T.
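The direct-plot check can be illustrated with a short sketch. For an orthorhombic cell the principal strain along an axis is simply L/L(T_min) - 1. A linear fit of L vs T divided by the fitted L(298 K) should therefore reproduce the S3 result exactly. The function names and lattice values below are hypothetical and serve only to show the equivalence; they are not the AHEJAZ data.

```python
import numpy as np


def alpha_direct(temps, lengths, t_ref=298.0):
    """alpha_L (ppm K^-1) from a direct linear fit of L vs T."""
    slope, intercept = np.polyfit(temps, lengths, 1)
    return 1e6 * slope / (slope * t_ref + intercept)  # divide by fitted L(298 K)


def alpha_via_strain(temps, lengths, t_ref=298.0):
    """Same coefficient via dL/L relative to L at the lowest temperature,
    then re-scaled to 298 K with the S3 denominator 1 + dL/L(298 K)."""
    temps = np.asarray(temps, dtype=float)
    lengths = np.asarray(lengths, dtype=float)
    l_min = lengths[np.argmin(temps)]
    slope, intercept = np.polyfit(temps, lengths / l_min - 1.0, 1)
    return 1e6 * slope / (1.0 + slope * t_ref + intercept)


# Hypothetical axis lengths (Angstrom) at hypothetical temperatures (K)
temps = [225.0, 250.0, 275.0, 300.0, 330.0]
lengths = [8.01, 8.03, 8.05, 8.08, 8.10]
print(alpha_direct(temps, lengths), alpha_via_strain(temps, lengths))
```

The two routes agree because ΔL/L is a linear transform of L. The factor 1/L(T_min) cancels between the gradient and 1 + ΔL/L(298 K).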

Check using linear fits of V vs T and L vs T (calculations in EXCEL)

Figure S4
The results are then quoted as ranges: the expansion coefficients (ppm K⁻¹) of the axes lie in the ranges 156 < α_a < 515, -85 < α_b < -32 and -204 < α_c < -48 over the temperature range 225-330 K.
The approach in the present paper is to assume linearity of the plot of V (or L) against T (recalling that this is necessary to handle the vast majority of the data set extracted from the CSD), thereby producing a single fitted value for the gradient. Considering the full data range for AHEJAZ (225-330 K), this produces the following plots and coefficients: