Designing better diffracting crystals of biotin carboxyl carrier protein from Pyrococcus horikoshii by a mutation based on the crystal-packing propensity of amino acids

In order to improve the efficiency of protein crystallization, an alternative approach using the mutation of surface residues was devised based on the results of a statistical analysis of the crystal-packing propensity of amino acids. A systematic crystallization experiment validated the results of the statistical analysis.


S1 Crystal-Packing Propensity of Amino Acids
We calculated the crystal-packing propensity Li for the 20 amino acids (i = 1, 2, ..., 20) from the crystal structure dataset, which was newly compiled from previous reports as described in Methods of the main text. The crystal-packing propensity represents the likelihood that a given amino acid is involved in crystal contacts. Figure S3 presents the crystal-packing propensity for each amino acid irrespective of secondary structural class. Comparison of the propensities showed that lysine was the rarest residue at the crystal contact interface; that is, lysine tends to avoid involvement in crystal contacts. However, alanine, the residue most commonly used as the replacement in the SER method, did not have a particularly high propensity: its crystal-packing propensity was about 0.964. To confirm our findings on crystal-packing propensities, especially their dependence on secondary structure, we also calculated the propensity using subsets selected randomly from the crystal structure dataset. In this trial, the crystal-packing propensities for the 20 amino acids were calculated 200 times, each time on 500 randomly selected entries, for each of the three secondary structural classes and for all classes combined. Figure S1 depicts boxplots of the calculated crystal-packing propensities for the 20 amino acids in each of the three secondary structural classes. A Kruskal-Wallis one-way analysis of variance revealed significant differences among the bootstrap averages of propensity for the three secondary structure classes for every amino acid (p < 0.01 in all cases). General trends and mean values are consistent with the results presented in Fig. 1B of the main text. In most cases, we found significant differences in the crystal-packing propensity of an amino acid depending on secondary structural class. We infer that these results support our argument in the main text.
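The resampling and the Kruskal-Wallis comparison described above can be sketched as follows. This is only an illustrative sketch, not the procedure actually used: the function names, the `statistic` callback (standing in for the propensity calculation defined in Methods of the main text), the fixed seed, and drawing each subset without replacement are all assumptions. The H statistic is computed with average ranks for ties but without the tie-correction factor, and no p-value is derived.

```python
import random


def subsample_statistic(entries, statistic, n_draws=200, subset_size=500, seed=0):
    """Evaluate `statistic` on n_draws random subsets of subset_size entries.

    Assumption: each subset is drawn without replacement; the source only
    says "500 randomly selected entries", repeated 200 times.
    """
    rng = random.Random(seed)
    return [statistic(rng.sample(entries, subset_size)) for _ in range(n_draws)]


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for a list of samples (no tie correction)."""
    # Pool all values, remembering which group each value came from.
    pooled = sorted((value, g) for g, group in enumerate(groups) for value in group)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n_total:
        # Find the run of tied values starting at position i.
        j = i
        while j < n_total and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        i = j
    weighted = sum(r * r / len(g) for r, g in zip(rank_sums, groups))
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


if __name__ == "__main__":
    # Toy data only: integers stand in for dataset entries, and the mean
    # stands in for the propensity calculation.
    toy_entries = list(range(1000))
    draws = subsample_statistic(toy_entries, lambda s: sum(s) / len(s))
    print(len(draws))
    print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

In practice, the three groups passed to `kruskal_wallis_h` would be the 200 resampled propensity values of one amino acid in each of the three secondary structural classes, and H would be compared against a chi-squared distribution with two degrees of freedom.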

S2 Factors Influencing Crystal-Packing Propensity
To identify factors affecting crystal contact formation, we conducted a multiple linear regression analysis by the stepwise forward selection method, with the crystal-packing propensity as the explained variable and 10 independent indices, which represent physicochemical properties of amino acids, as candidate explanatory variables. We used the values introduced by Kidera et al. (1), which represent 10 mutually independent physicochemical properties derived by factor analysis from the set of amino acid indices available at that time. This set of values defines the important properties of the 20 amino acids. The 10 properties are helix/bend preference, side-chain size, extended structure preference, hydrophobicity, double-bend preference, partial specific volume, flat extended preference, occurrence in alpha region, pK-C, and surrounding hydrophobicity. We conducted the multiple linear regression analysis against the crystal-packing propensity for each secondary structure class to pursue a more concrete source of the differences described above. Before the regression analysis, the crystal-packing propensity was standardized using the following equation: $z_i = (L_i - \mu)/\sigma$, where $L_i$ is the crystal-packing propensity of amino acid $i$ and $z_i$ is its standardized value.
In that equation, μ and σ respectively represent the mean and sample standard deviation of the crystal-packing propensity. The set of values for the 10 properties had already been standardized similarly. In addition, the values for side-chain entropy (SCE) (2) were standardized in the same manner when we used them in the multiple linear regression analysis.
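As a minimal sketch of this standardization, the snippet below subtracts the mean and divides by the sample standard deviation (n - 1 in the denominator), as stated above. The function name and the input values are hypothetical and are not taken from the dataset.

```python
from statistics import mean, stdev


def standardize(values):
    """Return (v - mu) / sigma for each value, using the sample standard deviation."""
    mu = mean(values)
    sigma = stdev(values)  # statistics.stdev uses the n - 1 denominator
    return [(v - mu) / sigma for v in values]


if __name__ == "__main__":
    # Hypothetical propensity values, for illustration only.
    toy_propensities = [0.8, 1.0, 1.2]
    print(standardize(toy_propensities))
```

After standardization the values have zero mean and unit sample standard deviation, so regression coefficients for different explanatory variables can be compared on the same scale.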
The results show that, for the helix and coil regions, the properties with a statistically significant effect on the crystal-packing propensity were hydrophobicity and side-chain size, the same as for the propensity over all regions (Table S4). For the sheet region, the dominant properties included the extended structure preference in addition to these two factors. The extended structure preference of isoleucine is quite high, which is perhaps reflected in the higher crystal-packing propensity of isoleucine. Furthermore, when we conducted the same analyses with SCE added to the 10 properties as a candidate explanatory variable, the result did not change (SCE was not sufficient to explain the likelihood of crystal contact formation). Therefore, the results suggest that the dominant properties for the crystal-packing propensity are hydrophobicity and side-chain size, rather than SCE. Nevertheless, we do not completely reject an effect of SCE on crystallization improvement. Although these two factors differ from SCE, they probably include part of the property of SCE: as shown in Table S5, hydrophobicity and side-chain size had the highest and second-highest correlation coefficients with SCE among the 10 factors (0.599 and 0.577, respectively). From the collected results of this study, we infer that the most dominant factors controlling crystallization are hydrophobicity and side-chain size, which might latently include the property of SCE.
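The forward selection used in this analysis can be illustrated with the following sketch. It is not the procedure actually used: the stopping rule (a minimum relative reduction in the residual sum of squares, `min_gain`) is an assumption standing in for the significance test, which is not specified here, and the function names and toy values are hypothetical.

```python
def ols_rss(columns, y):
    """Residual sum of squares of y regressed on `columns` plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[r] for col in columns] for r in range(n)]
    p = len(X[0])
    # Augmented normal equations: (X^T X | X^T y).
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return sum((y[r] - sum(b * x for b, x in zip(beta, X[r]))) ** 2 for r in range(n))


def forward_select(candidates, y, min_gain=0.05):
    """Greedily add the variable that most reduces the RSS.

    Assumption: selection stops when the best relative RSS reduction
    falls below `min_gain` (a stand-in for a significance criterion).
    """
    selected = []
    current = ols_rss([], y)
    while len(selected) < len(candidates) and current > 1e-12:
        best_name, best_rss = None, current
        for name in candidates:
            if name in selected:
                continue
            rss = ols_rss([candidates[k] for k in selected + [name]], y)
            if rss < best_rss:
                best_name, best_rss = name, rss
        if best_name is None or (current - best_rss) / current < min_gain:
            break
        selected.append(best_name)
        current = best_rss
    return selected


if __name__ == "__main__":
    # Made-up, mutually orthogonal toy variables; not the Kidera values.
    toy = {
        "hydrophobicity": [-1.5, -0.5, 0.5, 1.5, -1.5, -0.5, 0.5, 1.5],
        "side_chain_size": [-1.0, -1.0, -1.0, -1.0, 1.0, 1.0, 1.0, 1.0],
        "pK_C": [1.0, -1.0, -1.0, 1.0, -1.0, 1.0, 1.0, -1.0],
    }
    noise = [0.1, 0.1, -0.1, -0.1, -0.1, -0.1, 0.1, 0.1]
    y = [2 * h - s + e for h, s, e in
         zip(toy["hydrophobicity"], toy["side_chain_size"], noise)]
    print(forward_select(toy, y))
```

In the toy example the response depends only on the first two variables, so the third contributes no reduction in the residual sum of squares and is left out, which mirrors how a candidate such as SCE can fail to enter the model.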

S3 Analysis of Accessible Surface Area of Crystal Structures
Crystal packing of the wild type and the two mutants (A138I and A138Y) was investigated further through analysis of the accessible surface area (ASA). In all three crystal forms, crystal packing buries a similar amount of the ASA of the protein molecule in the asymmetric unit: about 2000 Å², corresponding to 47-52% of the total ASA (Table S6). As expected, the contribution of the side chain of residue 138 to the buried ASA was substantially greater in the mutants: 3.8% in A138I and 4.9% in A138Y, but only 0.7% in the wild type. A surface representation of the crystal-packing interface involving residue 138 depicts an extensive contribution of its side chain to the crystal-packing interaction (Fig. S2). It is noteworthy that the order of the buried ASA does not correspond to that of the crystal quality in terms of the ability to diffract X-rays.
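The quantities in this comparison follow from simple arithmetic, sketched below. The buried ASA is taken as the ASA of the isolated molecule minus its ASA within the crystal lattice, which is the usual definition and is assumed here. The function names are hypothetical and the numbers are round illustrative values, not the entries of Table S6.

```python
def buried_asa(asa_isolated, asa_in_crystal):
    """ASA (in Å^2) that becomes inaccessible on crystal packing."""
    return asa_isolated - asa_in_crystal


def percent(part, whole):
    """Express `part` as a percentage of `whole`."""
    return 100.0 * part / whole


if __name__ == "__main__":
    # Illustrative round numbers only (assumptions, not measured values).
    total = 4000.0        # ASA of the isolated molecule, Å^2
    in_crystal = 2000.0   # ASA remaining accessible in the lattice, Å^2
    buried = buried_asa(total, in_crystal)
    print(buried, percent(buried, total))
    # Share of the buried ASA contributed by one side chain (hypothetical 76 Å^2).
    print(percent(76.0, buried))
```

With these illustrative inputs, half of the total ASA is buried, and a side chain that buries 76 Å² accounts for 3.8% of the buried area.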

Figure S1
Crystal-packing propensity for each amino acid: Crystal-packing propensities for 20 amino acids depending on three secondary structural classes were calculated 200 times for every 500 randomly selected entries.

This contingency table was produced based on the corresponding plot in Fig. 2a.

The Pearson correlation coefficient between SCE and each variable was calculated.