research papers
Detection and correction of underassigned rotational symmetry prior to structure deposition
Physical Biosciences Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA
*Correspondence e-mail: nksauter@lbl.gov
Up to 2% of X-ray structures in the Protein Data Bank (PDB) potentially fit into a higher symmetry space group. Redundant protein chains in these structures can be made compatible with exact crystallographic symmetry with minimal atomic movements that are smaller than the expected range of coordinate uncertainty. The incidence of problem cases is somewhat difficult to define precisely, as there is no clear line between underassigned symmetry, in which the subunit differences are unsupported by the data, and pseudosymmetry, in which the subunit differences rest on small but significant intensity differences in the diffraction pattern. To help catch symmetry-assignment problems in the future, it is useful to add a validation step that operates on the refined coordinates just prior to structure deposition. If redundant symmetry-related chains can be removed at this stage, the resulting model (in a higher symmetry space group) can readily serve as a starting point for re-refinement using re-indexed and re-integrated raw data. These ideas are implemented in new software tools available at https://cci.lbl.gov/labelit.
Keywords: underassigned rotational symmetry; LABELIT; validation.
1. Introduction
The accuracy of the molecular model derived from X-ray crystallography is inherently limited by measurement uncertainty in the structure factors and intrinsic disorder of the crystal. Indeed, atomic level accuracy is only possible if the data-set resolution approaches or exceeds 1.0 Å (see Afonine et al., 2007, and references therein). At lower resolutions, prior assumptions about the stereochemistry are required in order to sufficiently restrain the refinement process (Hendrickson, 1985). Likewise, restraints arising from noncrystallographic symmetry (NCS) averaging are important for shaping the molecular envelope and producing interpretable electron-density maps (Jones & Liljas, 1984). However, in view of the probabilistic nature of these restraints it is best to exploit true constraints, such as crystallographic symmetry, when they are available. Symmetry constraints have two benefits: merging of the symmetry-equivalent reflections increases the accuracy of the measured structure factors, and modeling the asymmetric unit rather than the entire unit cell markedly decreases the number of parameters in the molecular model. A failure to identify the highest space-group symmetry compatible with the observations can have severe consequences for model building (Kleywegt et al., 1996), leading to unwarranted conclusions about the biology of the system under study.
This paper deals with the issue of finding potentially higher space-group symmetry given a particular data set and model. While the choice of space group is a routine aspect of structure solution, it is worth keeping in mind that experimental measurements never establish the space group with absolute confidence. There are always physical uncertainties to be considered both in the positions and the intensities of the Bragg reflections. Uncertainties in Bragg spot position affect the first step of space-group assignment, in which the crystal is classified into one of 14 Bravais types (based on the metric symmetry of the unit-cell dimensions). Starting with the three lengths and three angles of the unit cell, a convenient way to evaluate a potential symmetry axis is to compute the δ angle between the axis vectors expressed in direct and reciprocal space (Le Page, 1982). If the δ angle is identically zero the axis qualifies as a rotational symmetry operator as far as the unit-cell measurements are concerned. However, practical experience with typical rotation photography experiments shows that an allowance must be made for deviations as high as 1.4° from perfect alignment in order to construct the highest symmetry Bravais type consistent with the data (Sauter et al., 2004, 2006).
Beyond the classification of Bravais type, measurement uncertainties in the Bragg intensities can potentially hinder the assignment of the diffraction symmetry. Here again it is possible to evaluate individual symmetry operators based on the agreement of symmetry-related intensity measurements. (Friedel mates are treated as equivalent throughout this paper, regardless of whether there is an anomalous signal.) Defining the symmetry-operator reliability Rsymop as the average percentage difference between pairs of symmetry-related intensity measurements (equation 2 in Sauter et al., 2006), this statistic is ideally zero for a valid symmetry operator. However, nonzero values of up to 25% must be permitted (to account for poor measurement and/or anomalous signals) in order to assemble an optimal set of operators to describe the diffraction symmetry (Sauter et al., 2006; Evans, 2006).
It would be desirable if the acceptable tolerances chosen for δ and Rsymop could always be large enough to reflect the physical uncertainties for the specific experiment, but there is no established method to make this guarantee. Either by intention or by mistake, structures can be solved in space groups with symmetries that are too low. Indeed, from time to time it has been remarked (Hooft et al., 1994, 1996; Zwart et al., 2008) that certain structures deposited in the PDB (Berman et al., 2003) appear to have redundant subunit chains that are related by unassigned rotational symmetry operators. Furthermore, we have observed that some commonly used methods to determine the reduced cell are susceptible to numerical instability (Grosse-Kunstleve et al., 2004; Sauter et al., 2004), making it possible for high-symmetry Bravais types to be improperly identified, such as hexagonal rhombohedral (hR) being assigned as C-centered monoclinic (mC).
For small-molecule crystal structures, cases requiring reassignment into a higher symmetry space group have been well documented (Marsh & Herbstein, 1988; Marsh, 1995, 1997, 2009; Marsh & Spek, 2001) and symmetry-validation software is available (Le Page, 1988; Palatinus & van der Lee, 2008; Spek, 2009). Here, we perform a similar function for the macromolecular field, surveying the entire PDB for underassigned rotational symmetry operators. [We address neither underassigned translational symmetry operators, as was performed recently by Zwart et al. (2005, 2008), nor the topic of twinning, as has been covered by Lebedev et al. (2006).] Since we do not usually have recourse to the original raw data images, no judgements are made about the true crystallographic symmetry in individual cases. Rather, we develop scoring tools to quantify how closely a particular atomic model appears to fit into a higher symmetry, and coordinate-manipulation tools to interconvert models between space groups. The tools are intended to be used by the original investigator for validating the model at any stage prior to structure deposition or for correcting a model that is deemed suitable for re-analysis in a higher symmetry.
2. Computational methods
Software development was greatly facilitated by the framework provided by the open-source Computational Crystallography Toolbox (cctbx; Grosse-Kunstleve et al., 2002, 2006). PDB coordinate files from https://wwpdb.org were parsed with the cctbx.iotbx.pdb file reader. Analysis was restricted to coordinate sets determined by X-ray crystallography and additionally to proteins rather than nucleic acids. Solvent molecules, ligands, covalent modifications and alternate conformations were ignored. Structure factors from the PDB, when available, were validated with phenix.cif_as_mtz (Urzhumtseva et al., 2009) to assure consistency with the corresponding PDB coordinate entry. Raw diffraction images for selected cases were downloaded from the Joint Center for Structural Genomics (JCSG; https://www.jcsg.org).
2.1. Automated structure solution in all possible subgroups
Before proceeding with the all-PDB survey, we wish to confirm that the true symmetry can be deduced from the atomic model if the structure is intentionally solved in a lower symmetry space group. Such structures were generated automatically using original JCSG data sets as a starting point and are illustrated here using PDB entry 3b77 (Table S2 gives further examples). After integrating the 3b77 data set in the triclinic setting, merging trials performed with labelit.rsymop (Sauter et al., 2006) show that the Bragg intensities, together with the unit-cell dimensions, are consistent with Patterson symmetries P4/m, P12/m1 or P1̄. To obtain structure solutions in all three possible symmetries, the data were re-integrated, scaled and merged separately in each of these settings. Molecular-replacement solutions were determined with the program phenix.automr (McCoy et al., 2007) using the published P4 structure as a replacement model. Solutions A1, A2 and A3 (corresponding to the three symmetries noted above) were then built and refined with phenix.autobuild (Terwilliger et al., 2008). As this particular data set consists of a 90° rotation wedge intended for a tetragonal structure, the completeness of the data is quite low (57% out to a limiting resolution of 3.5 Å) when processed in the triclinic setting; however, this is still sufficient for the present purpose. To afford a comparison between crystallographic R factors (Table 1), each structure is refined at the same resolution, and the same set of free-R flags as initially calculated for the highest symmetry (P4) is expanded into the monoclinic and triclinic settings.
2.2. Relating the input symmetry to potential higher symmetries
In principle, it should be straightforward to check whether an atomic model can be reassigned to a higher symmetry target space group G. One simply lists the symmetry operators of the target and selects the operators that are absent in the input space group H. Applying these trial operators to the input structure will leave both atomic coordinates and structure-factor intensities invariant if the target symmetry is valid.
In practice this calculation is fairly complicated since space groups are conventionally expressed in different reference frames (Hahn, 1996). In the general case, the input and target symmetries will have different unit-cell basis vectors a, b, c and choices of origin. To assure that H is a subgroup of G a single point of view must be chosen, and the approach taken here is to perform all comparisons in the reference frame of the target symmetry. Converting from the input to the target reference frame requires the sequence of transformations depicted in Fig. 1. Beginning with the initial setting, a change of basis (Boisen & Gibbs, 1990) is applied to remove any centering operations (Grosse-Kunstleve, 1999). This primitive setting is then changed to a standard reduced setting (the `minimum' setting defined in Grosse-Kunstleve et al., 2004). To afford comparisons between Bragg reflections that are potentially symmetry-equivalent, we enumerate all Patterson settings that align with the reduced cell to within a tolerance δ (Sauter et al., 2006) and change the basis to each of these metric group settings in turn. Having selected one of these metric settings (see §2.3), we then need to evaluate all of the candidate space groups that share the same Patterson symmetry as the metric group, each requiring a basis change from the reduced setting to the candidate setting. At this point, a fractional translation (see §2.4) must be applied so that duplicate polypeptide chains are correctly related by the candidate space group's rotational symmetry operators. A final adjustment to the conventional setting is necessary in certain cases, particularly those orthorhombic cases in which the target symmetry is in a nonstandard setting such as P2₁22, which must be converted to the standard setting (P222₁) by an axis swap.
Each transformation in this sequence is represented by a change-of-basis operator, which combines a rotation matrix and a translation vector, each containing rational-valued elements. Composition of these operators is associative, so that the total transformation from input to target setting is succinctly expressed as a single rotation R and translation T. As detailed elsewhere (Giacovazzo et al., 1992; Sauter et al., 2006), the transformation (R, T) can be applied to fractional coordinates and symmetry operations from the input structure in order to re-express them in the target reference frame. It is important to realise that the entire input structure is moved as a rigid body under the operation (R, T), so the symmetry properties of the structure do not change during the transformation. It is just a matter of convenience to move the structure into the same reference frame where we already have a list of the trial symmetry operators of the target space group.
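The collapse of a chain of change-of-basis operators into a single (R, T) can be sketched in a few lines. This is only an illustration and not the cctbx implementation: the helper names `compose` and `apply_op` are hypothetical, the operators are assumed to act on fractional coordinates as x′ = Rx + T, and the axis swap and half-cell shift are arbitrary example values. Exact rational arithmetic is used because the operator elements are rational.

```python
from fractions import Fraction as F

def compose(op2, op1):
    """Combine two operators (R, T) into one: apply op1 first, then op2."""
    R2, T2 = op2
    R1, T1 = op1
    R = [[sum(R2[i][k] * R1[k][j] for k in range(3)) for j in range(3)]
         for i in range(3)]
    T = [sum(R2[i][k] * T1[k] for k in range(3)) + T2[i] for i in range(3)]
    return R, T

def apply_op(op, x):
    """Apply x' = R x + T to a fractional coordinate triple."""
    R, T = op
    return [sum(R[i][k] * x[k] for k in range(3)) + T[i] for i in range(3)]

# Illustrative operators (assumed values, not taken from the paper's Fig. 1):
axis_swap = ([[F(0), F(-1), F(0)], [F(-1), F(0), F(0)], [F(0), F(0), F(-1)]],
             [F(0), F(0), F(0)])
origin_shift = ([[F(1), F(0), F(0)], [F(0), F(1), F(0)], [F(0), F(0), F(1)]],
                [F(0), F(0), F(1, 2)])

total = compose(origin_shift, axis_swap)   # single (R, T) for the whole chain
x = [F(1, 4), F(1, 3), F(1, 8)]
x_target = apply_op(total, x)
```

Applying `total` once gives the same result as applying the two operators in sequence, which is the property that allows the whole pipeline of Fig. 1 to be stored as one rotation and one translation.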
2.3. Evaluation of the Patterson symmetry
We anticipated that the models from §2.1 solved in suboptimal space groups (A2 and A3) might have poorer crystallographic R factors than the optimal model A1. Instead, we found that the R-factor statistic did not help at all to distinguish between the best symmetry and the underassigned symmetry. The implication is one of caution: if the optimal Patterson symmetry is passed over at the stage of indexing and integration then the model-building and refinement process may be completed successfully without any indication of the oversight.
Fortunately, the model itself can be examined (following the approach of §2.2) to assess its compatibility with higher symmetry. A first step (Tables 2 and 3) is to establish missing symmetry operators based on back-calculated reflection intensities, Icalc. After expanding the atomic coordinate model to P1, the unit-cell measurements are used to construct the largest possible set of lattice symmetry operators, as described previously (Sauter et al., 2006). Each potential operator W is then independently scored based on the agreement of symmetry-related intensities,

$$R_{\mathrm{symop}} = \frac{\sum_{\mathrm{pairs}} \sum_{i} |I_i - \langle I \rangle|}{\sum_{\mathrm{pairs}} \sum_{i} I_i}, \qquad (1)$$

where $\sum_{\mathrm{pairs}}$ is a sum over all pairs of Bragg spots related by W, $\sum_{i}$ is a sum over both members of the pair and $\langle I \rangle$ is the mean intensity of the pair. Low Rsymop values indicate valid rotational symmetry in reciprocal space, and in the illustrated example it is apparent that there is a fourfold rotation along the z axis (Table 2). The fourfold is equally clear regardless of whether the model is taken from the monoclinic or the triclinic structure. The triclinic structure (A3) additionally reveals a twofold symmetry along the z axis, while the monoclinic model (A2) already assumes the presence of this twofold, so the Rsymop value for this operator is zero.
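As a rough illustration of how a per-operator score of this kind can be computed, the sketch below evaluates a merging-style R value over pairs of P1-expanded intensities related by an operator W. It is not the labelit.rsymop implementation. The function name `r_symop`, the dictionary layout and the convention that W acts on Miller indices as row vectors (H → HW) are assumptions made for the example.

```python
import numpy as np

def r_symop(intensities, W):
    """Merging-style R value for operator W.

    intensities: dict mapping (h, k, l) -> intensity, expanded to P1.
    W: 3x3 integer matrix acting on Miller-index row vectors.
    """
    numerator = 0.0
    denominator = 0.0
    seen = set()
    for hkl, i1 in intensities.items():
        mate = tuple(int(v) for v in np.dot(hkl, W))
        if mate == hkl or mate not in intensities:
            continue                      # invariant or unmeasured: no pair
        pair = frozenset((hkl, mate))
        if pair in seen:
            continue                      # count each related pair once
        seen.add(pair)
        i2 = intensities[mate]
        mean = 0.5 * (i1 + i2)
        numerator += abs(i1 - mean) + abs(i2 - mean)
        denominator += i1 + i2
    return numerator / denominator if denominator > 0 else float("nan")

# Twofold along z acting on indices: (h, k, l) -> (-h, -k, l)
twofold_z = [[-1, 0, 0], [0, -1, 0], [0, 0, 1]]
symmetric = {(1, 0, 0): 100.0, (-1, 0, 0): 100.0,
             (0, 1, 1): 50.0, (0, -1, 1): 50.0}
broken = {(1, 0, 0): 100.0, (-1, 0, 0): 80.0}
```

With these toy inputs, `r_symop(symmetric, twofold_z)` is zero and `r_symop(broken, twofold_z)` is 20/180, showing how a small intensity mismatch raises the score.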
In Table 3 the lattice symmetry operators are grouped together to show all possible Patterson settings consistent with the unit cell (to within the small angular tolerance δ). Each setting is scored by tabulating the worst-case symmetry-equivalence measure (Rsymop), considering all operators in the group. As expected, the illustrated example (triclinic structure A3) is consistent with only three of the metrically possible Patterson settings, namely P4/m, P12/m1 and P1̄, and not with any groups containing a twofold in the xy plane.
We arrive at the same conclusions about symmetry if we use the experimentally observed data (Tables 2 and 3) rather than model-calculated intensities. Starting with merged structure-factor amplitudes |Fobs|, the observations are expanded to P1, re-expressed as reflection intensities (Iobs) and used in (1) instead of Icalc. This methodology is readily used to evaluate the potential Patterson settings in any deposited reflection file from the PDB.
2.4. Identification of the space group and positioning of the model
Symmetry-equivalence of the reflections (1), together with knowledge of the unit cell, establishes the highest possible Patterson symmetry, but two questions remain to be answered: what is the space group, and where should the model be placed in the higher symmetry unit cell? Taking the example of structure A3, we wish to know which of the tetragonal space groups to focus on (P4, P4₁, P4₂ or P4₃) and where to place the polypeptide in relation to the z axis.
We begin by defining x, the fractional origin shift that must be applied in the setting of the target space group G to the input model in order to properly position it within the higher symmetry unit cell (denoted as `translation x' in Fig. 1). The model is correctly positioned when the application of space-group symmetry operators leaves the model invariant. In view of the prohibitive computational cost of translating the model to every position in the unit cell, we adopt a method from Navaza & Vernoslova (1995), dramatically speeding up the calculation by gauging the correlation between two types of calculated Bragg intensity: Imerge,G and Iensemble,G(x). Imerge,G is simply the set of reflection intensities calculated by expanding the atomic coordinates of the present model into P1 and merging the symmetry equivalents under G. Iensemble,G(x) is the result of applying the origin shift x, thus repositioning the model in the unit cell. The symmetry elements of G are then applied, giving a hypothetical ensemble containing multiple copies of the P1 model superimposed upon each other (one copy for each symmetry operator), from which intensities Iensemble,G(x) are calculated. The agreement between present model, origin shift and space group is described by the Pearson correlation coefficient

$$r(\mathbf{x}, G) = \frac{\langle \Delta I_{\mathrm{merge},G}\, \Delta I_{\mathrm{ensemble},G}(\mathbf{x}) \rangle}{\left[ \langle \Delta I_{\mathrm{merge},G}^{2} \rangle \, \langle \Delta I_{\mathrm{ensemble},G}(\mathbf{x})^{2} \rangle \right]^{1/2}}, \qquad (2)$$

where 〈 〉 is the average over all H and ΔIH = IH − 〈I〉 is the deviation between the calculated intensity for a given reflection and the average over all intensities. Navaza and Vernoslova's fast Fourier approach for calculating r(x, G) is computationally tractable even for large structures.
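For a single trial origin shift, the correlation described above reduces to an ordinary Pearson coefficient between two intensity lists indexed by the same reflections. The sketch below shows only that final step, evaluated directly; it does not reproduce the fast Fourier evaluation over all shifts, and the function name and example intensities are placeholders.

```python
import numpy as np

def intensity_correlation(i_merge, i_ensemble):
    """Pearson correlation between two intensity sets on the same reflections."""
    a = np.asarray(i_merge, dtype=float)
    b = np.asarray(i_ensemble, dtype=float)
    da = a - a.mean()          # deviation from the mean intensity
    db = b - b.mean()
    return float((da * db).mean() / np.sqrt((da ** 2).mean() * (db ** 2).mean()))

# Placeholder intensities: a well-placed model versus a poorly placed one.
i_merge = np.array([120.0, 35.0, 410.0, 77.0, 15.0, 260.0])
i_good = 4.0 * i_merge                      # identical up to an overall scale
i_poor = np.array([90.0, 300.0, 20.0, 150.0, 220.0, 60.0])

r_good = intensity_correlation(i_merge, i_good)
r_poor = intensity_correlation(i_merge, i_poor)
```

Because the coefficient is insensitive to overall scale, the ensemble intensities do not need to be normalized for the number of superimposed copies before the comparison is made.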
Peaks in the r(x, G) map that approach a value of 1.0 represent candidate translations for positioning the model into the target space group. In the illustrated example (Figs. 2a–2d), the relatively low correlation coefficients under P4₁ and P4₃ allow us to rule out these space groups, while space groups P4 and P4₂ are both shown to be viable candidates as far as intensity correlations are concerned. When viewing these correlation maps it is useful to realise that the r(x, G) function has a special type of symmetry variously called the Cheshire group (Hirshfeld, 1968) or affine normalizer (Koch & Fischer, 1996); the effect of this is to restrict the range of possible origin shifts to an area or volume smaller than the unit cell of G. For the four tetragonal space groups under consideration r(x, G) is independent of the position along the fourfold, so it is only necessary to illustrate a single section in Figs. 2(a)–2(d).
The correlation coefficient of (2) is very efficient for discriminating among origin shifts, but in this case it does not distinguish between the two candidate models that might be consistent with structure A3: a P4 model with origin shift xmax = 0 (Fig. 2e) and a P4₂ model shifted by xmax = ½c (Fig. 2f). The latter model happens to be incorrect in the sense that application of the 4₂ screw leads to an atomic model (red circles in Fig. 2f) that sterically clashes with the starting model (blue circles) rather than aligning with it; the model is effectively duplicated. Yet the calculated intensities for the two sets of asymmetric units are identical since intensities are invariant under the screw axis operator. What is missing in (2) is a recognition that the screw operation affects the structure-factor phase, even though it does not affect the amplitude.
Properly accounting for phases requires a separate calculation. We take the input model (triclinic structure A3 in this case), apply the origin shift xmax determined above, and then consider the calculated structure factors Fcalc and phases φcalc. Looking separately at each symmetry operator gi of G, a weighted phase difference factor is used to construct a symmetry agreement score as suggested by Palatinus & van der Lee (2008),
In this expression, symmetry operator gi has a rotational part W and a translational part w. The normalization constant C and modular integer n are as described in Palatinus & van der Lee (2008). Models that are invariant under the symmetry operator will have equal values of φHcalc and φHWcalc + 2πH·w, so the score will be zero. In our example, the symmetry agreement scores φ(4) = 0.002 and φ(4₂) = 0.578 clearly establish the correct space group as P4.
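The statement that a screw operation changes phases but not amplitudes can be checked numerically on a toy model. The sketch below does not compute the Palatinus & van der Lee score; it only verifies the underlying identity that, for a model moved by x′ = Wx + w, the structure factor at H equals the original structure factor at HW multiplied by exp(2πiH·w). Unit point scatterers, the random coordinates and the helper names are assumptions of the example.

```python
import numpy as np

def structure_factor(hkl, xyz):
    """F(H) = sum_j exp(2*pi*i H.x_j) for unit point scatterers."""
    h = np.asarray(hkl, dtype=float)
    return np.exp(2j * np.pi * (xyz @ h)).sum()

# 4_2 screw along z: fourfold rotation W plus a half-cell translation w
W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
w = np.array([0.0, 0.0, 0.5])

rng = np.random.default_rng(0)
xyz = rng.random((5, 3))          # toy fractional coordinates
xyz_sym = xyz @ W.T + w           # model after applying the screw operation

def compare(hkl):
    """Return F of the moved model at H, F of the original at HW, and the phase factor."""
    h = np.asarray(hkl, dtype=float)
    f_sym = structure_factor(h, xyz_sym)
    f_rot = structure_factor(h @ W, xyz)
    phase_factor = np.exp(2j * np.pi * (h @ w))
    return f_sym, f_rot, phase_factor

f_sym, f_rot, phase_factor = compare((1, 2, 3))
```

For a reflection with odd l the phase factor is −1, so the two structure factors have equal amplitude but phases that differ by π, which is exactly the information an intensity-only correlation cannot see.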
2.5. Positional refinement of the higher symmetry model
The Pearson correlation coefficient (2) is evaluated on a grid whose granularity is approximately half the limiting resolution of the diffraction. Therefore, the origin shift xmax from §2.4 is only a first approximation. Indeed, the displacement between the atomic model and the symmetry axes of the space group should arguably be the most precise element of any structure. Since the displacement is derived jointly from the positions of all the atoms, its uncertainty should be a tiny fraction of a bond length. It is thus appropriate to subject the origin shift to additional refinement. Furthermore, while (3) scores the symmetry agreement of structure factors in reciprocal space, it is also desirable to quantify the symmetry based on the atomic model in real space (or even to provide a computer-graphics snapshot of superimposed symmetry-equivalent molecules), giving a better intuitive grasp of the symmetry fit. This section presents methods for addressing these issues.
2.5.1. Matching of symmetry-equivalent molecules aided by coset decomposition
As noted in §2.2, we judge a target space group G by applying symmetry operators present in G that are absent in the input space group H. The relationship between group G and its subgroup H can be most usefully explored by the coset decomposition tools of group theory. In particular, the left coset decomposition of G with respect to H is defined as

$$G = g_1 H \cup g_2 H \cup \dots \cup g_n H. \qquad (4)$$

In this expansion, G is broken down into a series of n subsets (left cosets) generated by applying the symmetry operators gi ∈ G to each element of H. Operator g1 is defined to be the identity, while the elements g2…gn, termed left coset representatives, are the elements that require evaluation as trial symmetry operators for the input structure. The choice of which elements to count as left coset representatives is not unique; within each left coset any element can be chosen as the representative with equivalent results. The important property here is that only one representative from each coset need be considered.
The coset expansion makes it possible to quantify how close the noncrystallographic symmetry relationships of a structure come to crystallographic exactness. A necessary first step is to derive trial mappings of the unit-cell contents to itself, one mapping for each coset. The algorithm begins by origin-shifting the input structure to the optimized setting (Fig. 1). Matching polypeptide pairs (X to Y) are then determined for each coset representative gi using a triple loop. In the outer loop, gi is applied to each polypeptide chain X of the asymmetric unit. In the middle loop, each polypeptide chain Y is evaluated as a matching target (with the requirement that Y is only considered as a candidate if X and Y have similar amino-acid sequences). In the innermost loop, each operator h ∈ H is applied to Y and a match is declared if the coordinates approximately superimpose,

$$g_i X \simeq hY + t. \qquad (5)$$

In this expression, the atomic coordinates of X and Y are expressed in fractional coordinates and t represents an allowable translation vector on the lattice (one containing full-integer components). Superposition is determined using the method of Kearsley (1989) and calculations throughout this paper are limited to the Cα atoms of polypeptide chains.
A simple example of chain matching is illustrated in Fig. 3. There are 12 identical polypeptide chains in the asymmetric unit of the monoclinic structure A2. Adapting the input (H = P2) into the target (G = P4) leads to the decomposition

$$P4 = \{1, 2\} \cup 4^{+}\{1, 2\}, \qquad (6)$$

where the numerical symbols are intended to represent the identity operator 1, the twofold rotation 2 of P2 and the fourfold g2 = 4⁺ chosen as the single left coset representative. Under the operation of g2, polypeptide chains A–F map to chains G–L, while chains G–L map to chains A′–F′ in the second asymmetric unit of the monoclinic cell (corresponding to h = 2).
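The P4 over P2 example can be reproduced with a few lines of code operating on rotation matrices alone. This is a simplified sketch rather than the cctbx group-theory machinery: translation parts are ignored (they are zero for these two groups, apart from lattice translations), the twofold of P2 is taken along z so that both groups share one reference frame, and the helper names are hypothetical.

```python
import numpy as np

def mat(rows):
    """Hashable integer 3x3 matrix."""
    return tuple(tuple(int(v) for v in row) for row in rows)

def mul(a, b):
    return mat(np.dot(a, b))

def left_cosets(G, H):
    """Partition G into left cosets gH; G should list the identity first."""
    representatives, cosets, covered = [], [], set()
    for g in G:
        if g in covered:
            continue                      # already inside an earlier coset
        coset = frozenset(mul(g, h) for h in H)
        representatives.append(g)
        cosets.append(coset)
        covered |= coset
    return representatives, cosets

E = mat([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
C2 = mat([[-1, 0, 0], [0, -1, 0], [0, 0, 1]])     # twofold along z
C4p = mat([[0, -1, 0], [1, 0, 0], [0, 0, 1]])     # 4+ along z
C4m = mat([[0, 1, 0], [-1, 0, 0], [0, 0, 1]])     # 4- along z

G = [E, C2, C4p, C4m]     # rotation parts of P4
H = [E, C2]               # rotation parts of P2 (unique axis z)
representatives, cosets = left_cosets(G, H)
```

Here the first coset is H itself and the second is {4⁺, 4⁻}, so a single representative (4⁺ in this ordering) is the only trial operator that needs to be tested; choosing 4⁻ instead would give the same partition.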
2.5.2. High-precision refinement of the origin shift
In the preceding section, the approximately known origin shift x is used to discover symmetry-matched peptide pairs. We now turn this process around, performing least-squares refinement on these known matches to produce the best possible chain alignment, while considering x to be a free variable. For these purposes we revert the atomic coordinates back to the candidate group setting (Fig. 1) prior to the application of the origin shift. The function to be minimized is the Cartesian square difference between chain-matched Cα positions,

$$f(\mathbf{x}) = \sum_{i=2}^{n} \sum_{j=1}^{P} \sum_{k=1}^{N_\alpha} \left| \mathbf{O} \left[ g_i(\mathbf{X}_{jk} + \mathbf{x}) - h_{XY}(\mathbf{Y}_{jk} + \mathbf{x}) - \mathbf{t}_{XY} \right] \right|^{2}. \qquad (7)$$

The outer summation here is over all n cosets except for the first one, which just produces the identity mapping. The middle sum is over all P polypeptide chains in the asymmetric unit and the inner sum is over the Nα Cα pairs in the jth matching pair of chains (X, Y). Operator gi is the ith coset representative, while tXY and hXY are the translational and rotational symmetry operators in H required to produce a match between chains X and Y (5). Matrix O is the orthogonalization matrix required to convert fractional to Cartesian coordinates. After minimization of the function f, the refined origin shift is used to recalculate the optimized structure (Fig. 1).
Having determined the final origin shift, the input structure's fit with target space group G can now be evaluated. If the structure is perfectly invariant when the coset representative operators are applied, the value of the function f will be identically zero. The deviation from perfect symmetry can be expressed as the root-mean-squared deviation of Cα atoms from their symmetry-predicted positions,

$$\Delta r_{\mathrm{sym}} = \left( \frac{f}{\Sigma N} \right)^{1/2}, \qquad (8)$$

where ΣN symbolizes the total count of Cα matches over all matching polypeptide pairs and all cosets in the triple sum of (7).
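Because the symmetry operators act linearly on coordinates, a squared-difference target of this kind is quadratic in the origin shift and can be minimized by linear least squares. The sketch below illustrates this on a toy twofold axis displaced from the origin. It is not the authors' code: it assumes that the shift is added to the fractional coordinates before the operators are applied, it uses an orthogonal cell with made-up dimensions, and the function and variable names are hypothetical.

```python
import numpy as np

def refine_origin_shift(matches, O):
    """Least-squares origin shift and r.m.s. deviation for matched chains.

    matches: list of (Rg, tg, Rh, th, X, Y), where (Rg, tg) is the coset
    representative, (Rh, th) the subgroup operator plus lattice translation,
    and X, Y are (N, 3) arrays of matched fractional coordinates.
    O: orthogonalization matrix (fractional -> Cartesian).
    """
    A_blocks, b_blocks = [], []
    for Rg, tg, Rh, th, X, Y in matches:
        A = O @ (Rg - Rh)                               # coefficient of x
        const = (X @ Rg.T + tg - Y @ Rh.T - th) @ O.T   # residual at x = 0
        for row in const:
            A_blocks.append(A)
            b_blocks.append(row)
    A_all = np.vstack(A_blocks)
    b_all = np.concatenate(b_blocks)
    x, *_ = np.linalg.lstsq(A_all, -b_all, rcond=None)  # min-norm solution
    residual = A_all @ x + b_all
    n_atoms = len(b_blocks)
    delta_r_sym = np.sqrt((residual ** 2).sum() / n_atoms)
    return x, delta_r_sym

# Toy case: an exact twofold along z passing through fractional point s
O = np.diag([50.0, 60.0, 70.0])                  # assumed cell edges in Å
C2 = np.diag([-1.0, -1.0, 1.0])
identity = np.eye(3)
s = np.array([0.12, 0.07, 0.0])
rng = np.random.default_rng(1)
X = 0.3 * rng.random((10, 3))
Y = (X - s) @ C2.T + s                           # symmetry mate about the displaced axis
matches = [(C2, np.zeros(3), identity, np.zeros(3), X, Y)]
x_refined, delta_r_sym = refine_origin_shift(matches, O)
```

The refined shift moves the displaced axis onto the origin (x = −s), and the component along the axis, which the data cannot determine, is left at zero by the minimum-norm solution. With noisy coordinates the same call returns a nonzero r.m.s. deviation in the spirit of Δrsym.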
2.5.3. Generating coordinate sets corresponding to each asymmetric unit
Imposing additional symmetry on a structure implies that the number of unique polymer chains will be reduced; in fact, the resulting asymmetric unit will contain exactly P/n chains, the original number of chains divided by the number of cosets. The chain-matching results of §2.5.1 can be used to construct approximate models of the higher symmetry asymmetric unit. The key idea is to select one chain from each group of mutual chain matches; e.g. in Fig. 3 one chain is selected from each of the six groups {A, G}, {B, H}, {C, I}, {D, J}, {E, K} and {F, L}. While there are many possible combinations [$n^{P/n}$], we take the simple expedient of selecting the polypeptide from each group that appears first in the original PDB input file, so in this case chains A–F are selected as the primary model of the asymmetric unit. To visualize the extent to which the input structure differs from the perfect symmetry of G, n − 1 additional models are then generated, one for each remaining coset. These arise by looping over the chains X of the primary model and transforming their matched chains Y with

$$Y' = g_i^{-1}\left( h_{XY} Y + t_{XY} \right), \qquad (9)$$

thus placing the matching chains and the primary model in approximate alignment. The end product of this exercise is a set of n different models of the higher symmetry asymmetric unit, nearly superimposed, with differences among models reflecting the NCS variability of the input structure. These models can be readily output as PDB-format files for visual inspection and further analysis. In the example of Fig. 3, the two models consist of chains A–F and G–L, respectively.
2.5.4. Interpreting the deviation from perfect symmetry
Differences among these asymmetric unit (ASU) models combine two types of variation: a rigid-body component describing the motion of the ASU contents as a whole and a residual component reflecting the positions of individual atoms. The Δrsym measure of (8) contains both components, but it is also informative to separate the rigid-body and residual terms. To evaluate the residual component by itself, we perform a Kearsley (1989) alignment of the entire Cα contents of ASU models i and j, and evaluate the root-mean-squared deviation of superimposed atoms, Δrij. Averaging this quantity over all $\binom{n}{2}$ pairwise combinations of ASU models, the overall residual component can be expressed as

$$\Delta r_{\mathrm{ASU}} = \left( \frac{\sum_{i<j} N_{ij}\, \Delta r_{ij}^{2}}{\sum_{i<j} N_{ij}} \right)^{1/2}, \qquad (10)$$

where Nij is the total number of Cα matches between ASU models i and j. For cases where the ASU model contains more than one polypeptide chain, an additional measure of the residual term, Δrchain, is defined to represent deviations of atoms within individual chains. This quantity is calculated in an identical manner to (10) except that the Kearsley alignment is performed on individual pairs of chains and the resulting summation contains $(P/n)\binom{n}{2}$ terms.
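The residual component can be illustrated with a short superposition routine. Note the substitution: the sketch uses the Kabsch singular-value-decomposition algorithm in place of Kearsley's quaternion method (both yield the optimal rigid-body r.m.s. deviation), and it assumes that pairwise values are pooled as a count-weighted r.m.s. average. The function names and synthetic coordinates are illustrative only.

```python
import itertools
import numpy as np

def superposed_rmsd(P, Q):
    """R.m.s. deviation after optimal rigid-body superposition of P onto Q (Kabsch)."""
    P0 = P - P.mean(axis=0)
    Q0 = Q - Q.mean(axis=0)
    U, S, Vt = np.linalg.svd(P0.T @ Q0)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against improper rotations
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P0 @ R.T - Q0
    return np.sqrt((diff ** 2).sum() / len(P))

def pooled_residual(models):
    """Count-weighted r.m.s. of pairwise superposed deviations over all model pairs."""
    numerator, denominator = 0.0, 0
    for A, B in itertools.combinations(models, 2):
        n_ij = len(A)
        numerator += n_ij * superposed_rmsd(A, B) ** 2
        denominator += n_ij
    return np.sqrt(numerator / denominator)

# Synthetic C-alpha coordinates (Å): one base model, a rigidly moved copy
# and a copy with small random atomic displacements.
rng = np.random.default_rng(2)
base = 10.0 * rng.normal(size=(20, 3))
c, s = np.cos(np.pi / 6), np.sin(np.pi / 6)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
moved = base @ Rz.T + np.array([5.0, -3.0, 2.0])
noisy = base + rng.normal(scale=0.1, size=base.shape)
```

A purely rigid displacement (`moved`) contributes nothing to this measure, whereas random atomic displacements (`noisy`) do, which is the distinction drawn between the rigid-body and residual components.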
Values for Δrsym, ΔrASU and Δrchain for structures A2 and A3 are reported in Table 4. The predominant contribution to the NCS differences in these structures is from random deviations of individual atoms of the order of 0.1 Å. There is only an insignificant contribution (0.002 Å in structure A2 and 0.03 Å in structure A3) from rigid-body rearrangements of polypeptide chains.
2.6. Re-indexing the diffraction images in higher symmetry
We now suppose that a decision has been made to increase the symmetry of the atomic model. Clearly, the best outcome can be achieved by returning to the original diffraction images. Imposing the new space group G (P4 in the case of structures A2 and A3) on the original data will permit better unit-cell constraints for the prediction of spot positions during integration, afford more symmetry equivalents for outlier rejection during scaling and possibly remove model bias resulting from introducing too many free atoms during the model-building step.
Yet there are certain steps of the data-processing pipeline that would be wasteful to repeat. Since we already have an ensemble of models of the higher symmetry asymmetric unit, it is no longer necessary to repeat the decision during autoindexing in which a lattice is chosen from the list of lattices compatible with the observed cell. Similarly, no phasing protocols should be required, as the structure of the atomic model and its position in the unit cell have adequately been addressed by the fast translation function (§2.4) and subsequent refinement (§2.5.2).
An express route to re-refinement is achieved by adapting the autoindexing program labelit.index (Sauter et al., 2004) to accept the additional input of a PDB file containing one of the proposed ASU models from §2.5.3. Structure factors are calculated, taking into account a bulk-solvent correction (Afonine et al., 2005) to more realistically model the observed intensities. Separately, data from one or two frames of the raw data are integrated and corrected for Lorentz and polarization factors (Leslie, 1999), using a preliminary reduced cell (Grosse-Kunstleve et al., 2004) to model the lattice. We now wish to determine how the unit-cell basis vectors of the calculated and observed patterns need to be aligned in order to obtain the best fit between intensities. Two types of ambiguity need to be resolved. Firstly, in some cases the unit cell is close to fitting into a higher symmetry metric. A triclinic cell, e.g. with dimensions a ≃ b and α ≃ β, may require an axis swap (a′, b′, c′ = −b, −a, −c) to correctly model the observed pattern. Secondly, certain space groups permit multiple non-equivalent indexing schemes (Dauter, 1999), only one of which will allow the ASU model to align properly with the observations. For example, point groups 3, 4 and 6 can be indexed with the c axis up or down. All of these ambiguities can be resolved by exhaustively testing each possible reindexing scheme that preserves the unit-cell dimensions, and assessing the mutual scaling R factor (Weiss, 2001) between calculated and observed intensities.
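The exhaustive test over reindexing schemes can be sketched as a loop that maps observed indices through each candidate operator and scores the agreement with the calculated intensities. This is a simplified stand-in for the labelit.index procedure: the R factor used here is a plain linear R with a single overall scale rather than the mutual scaling R factor cited above, and the candidate list, dictionary layout and function names are assumptions.

```python
import itertools
import numpy as np

def scaling_r_factor(i_obs, i_calc):
    """Linear R factor after applying one overall scale to the calculated data."""
    scale = i_obs.sum() / i_calc.sum()
    return np.abs(i_obs - scale * i_calc).sum() / i_obs.sum()

def best_reindexing(obs, calc, candidates):
    """Score every candidate reindexing operator and return the best name.

    obs, calc: dicts mapping (h, k, l) -> intensity.
    candidates: dict mapping a label -> 3x3 integer matrix acting on
    Miller-index row vectors.
    """
    results = {}
    for name, M in candidates.items():
        o, c = [], []
        for hkl, intensity in obs.items():
            new = tuple(int(v) for v in np.dot(hkl, M))
            if new in calc:
                o.append(intensity)
                c.append(calc[new])
        if o:
            results[name] = scaling_r_factor(np.array(o), np.array(c))
    return min(results, key=results.get), results

# Toy data: the observations are indexed with c pointing the opposite way.
rng = np.random.default_rng(3)
calc = {hkl: 1.0 + 100.0 * rng.random()
        for hkl in itertools.product(range(-2, 3), repeat=3)}
c_down = [[0, 1, 0], [1, 0, 0], [0, 0, -1]]       # (h, k, l) -> (k, h, -l)
c_up = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
obs = {hkl: 2.0 * calc[tuple(int(v) for v in np.dot(hkl, c_down))]
       for hkl in calc}
best, results = best_reindexing(obs, calc, {"c_up": c_up, "c_down": c_down})
```

In this toy case only the `c_down` scheme brings observed and calculated intensities into agreement, so it is returned as the best choice; with real data the same loop would run over every operator that preserves the unit-cell dimensions.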
The result is an indexing solution for the diffraction pattern that correctly accounts for the position and orientation of the ASU model in G. At this point the full data set is integrated, scaled and converted to structure factors. Structure refinement is initiated (e.g. with phenix.refine) starting with the aforementioned ASU model. As shown in Table 1, the re-refinement of triclinic structure A3 in P4, without any further manual intervention, leads to a new structure (A4) that is comparable to the original published PDB file.
3. Results and discussion
A November 2009 snapshot of the PDB was analyzed to identify X-ray structures that are nearly invariant when additional rotational symmetry operators are imposed. Of almost 62 000 files in the database, about 53 000 are X-ray structures. Here, we focus on the approximately 52 000 that contain protein chains rather than exclusively […] or small […]. About 1000 structures, or 2%, were conservatively found to produce a good fit with a higher symmetry space group. Fig. 4 ranks these candidates in order of increasing Δrsym (a measure of the average Cα displacement required to impose the additional symmetry; see equation 8) up to an arbitrary cutoff value (see below) of Δrsym = 0.325 Å. A full listing is given in Table S1.
It is beyond the scope of this paper to deliver a definitive choice as to which space groups are best for individual structures. However, if the conservative group of 1000 shown in Fig. 4 is considered as a whole, there are strong arguments to favor the higher symmetry settings. Foremost is the small size of the displacements needed to bring equivalent atoms into a perfectly symmetrical arrangement. It is generally recognized that the coordinate accuracy of an X-ray structure is a fraction of the diffraction pattern's limiting resolution (Luzzati, 1952). Various methods are presently used to estimate the coordinate uncertainty (Kleywegt, 2000) and, where reported in the PDB, these 1σ uncertainty values are plotted in Fig. 4(a). Most of the estimated values shown (75%) are at least as high as Δrsym. Generally speaking, then, for this group there is a good chance that displacements seeming to be a product of subunit differences are really a result of experimental coordinate uncertainty.
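The ranking and the comparison with coordinate uncertainty can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the entry identifiers and numerical values are invented placeholders rather than PDB data, and the field names (delta_r_sym, sigma) are hypothetical. The sketch keeps candidates up to the Δrsym cutoff, sorts them, and reports the fraction whose reported 1σ coordinate uncertainty is at least as large as Δrsym.

```python
# Illustrative sketch only; entries and field names are hypothetical.

CUTOFF = 0.325  # Δrsym cutoff in Å, as used for Fig. 4 and Table S1

def rank_candidates(entries, cutoff=CUTOFF):
    """Keep entries with Δrsym at or below the cutoff, sorted by increasing Δrsym."""
    kept = [e for e in entries if e["delta_r_sym"] <= cutoff]
    return sorted(kept, key=lambda e: e["delta_r_sym"])

def fraction_within_uncertainty(entries):
    """Fraction of entries (with a reported sigma) for which sigma >= Δrsym."""
    with_sigma = [e for e in entries if e.get("sigma") is not None]
    if not with_sigma:
        return None
    n_ok = sum(1 for e in with_sigma if e["sigma"] >= e["delta_r_sym"])
    return n_ok / len(with_sigma)

# Invented placeholder entries (not real PDB codes or values).
entries = [
    {"id": "demo1", "delta_r_sym": 0.10, "sigma": 0.25},
    {"id": "demo2", "delta_r_sym": 0.40, "sigma": 0.30},
    {"id": "demo3", "delta_r_sym": 0.05, "sigma": None},
    {"id": "demo4", "delta_r_sym": 0.30, "sigma": 0.20},
]

ranked = rank_candidates(entries)
print([e["id"] for e in ranked])
print(fraction_within_uncertainty(ranked))
```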
This argument is made stronger by considering whether the imposition of added symmetry requires random displacements of individual atoms or rigid-body motions of entire polypeptide chains. The quantity ΔrASU (equation 10) gives an indication of the random variations of equivalent atoms once the polypeptide chains are superimposed by a rigid-body motion (Δrchain serves the same function for cases where there are multiple chains in the asymmetric unit). The plotted values in Fig. 4(a) demonstrate that for most cases ΔrASU (or Δrchain where appropriate) is nearly identical to Δrsym; on average, all but 0.01 Å of the displacement required comes from individual atomic motions. The fact that there is virtually no rigid-body component is consistent with the idea that subunit differences are a consequence of experimental uncertainties rather than true observations of NCS variation.
A final and compelling factor supporting the higher symmetries is the distribution of observed structure factors, which are published in the PDB for 694 of the cases. The agreement of symmetry-equivalent observed intensities is quite good under many of the higher symmetry operators, with values of Rsymop (Iobs) clustering about an average of 4% (Fig. 4b). The fact that the reported merging R values from these same 694 structures have an average of 8% suggests that any observed differences in symmetry-equivalent intensities are not experimentally significant.
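As an illustration of this type of statistic, the following Python sketch evaluates the agreement of observed intensities under one candidate operator. It assumes the Rtwin-like form Σ|I(h) − I(Sh)| / Σ[I(h) + I(Sh)] of Lebedev et al. (2006); equation (1) of this paper may differ in detail, and the function names and toy intensities are hypothetical.

```python
# Illustrative sketch only; assumes an Rtwin-like formula (Lebedev et al., 2006).

def apply_op(op, hkl):
    """Apply a 3x3 integer rotation matrix to a Miller index."""
    h, k, l = hkl
    return tuple(row[0] * h + row[1] * k + row[2] * l for row in op)

def r_symop(intensities, op):
    """Sum |I(h) - I(Sh)| / sum [I(h) + I(Sh)] over reflections whose
    symmetry mate Sh was also measured. Reflections mapped onto
    themselves are skipped, since they carry no information here."""
    numerator = 0.0
    denominator = 0.0
    for hkl, i_h in intensities.items():
        mate = apply_op(op, hkl)
        if mate == hkl or mate not in intensities:
            continue
        i_mate = intensities[mate]
        numerator += abs(i_h - i_mate)
        denominator += i_h + i_mate
    return numerator / denominator if denominator > 0 else None

# Candidate twofold axis along c: (h, k, l) -> (-h, -k, l)
twofold_c = ((-1, 0, 0), (0, -1, 0), (0, 0, 1))

# Toy observed intensities (hypothetical values).
i_obs = {(1, 2, 3): 100.0, (-1, -2, 3): 104.0, (2, 0, 1): 50.0, (-2, 0, 1): 48.0}

r_value = r_symop(i_obs, twofold_c)
print(r_value)
```

A value well below the merging R factor of the data set would suggest, as argued above, that the intensity differences between the proposed symmetry mates are within the measurement error.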
Taken together, the data in Fig. 4 are evidence that a considerable number of PDB structures could be reassigned to higher symmetry space groups. Reassigning the space group would reduce the number of polypeptide chains in the model by a factor of n, where n = 2 for most cases but in some cases is found to be 3, 4, 6 or even 12 (Table 5). It is not apparent whether the reassignment candidates have any particular properties in common, e.g. they seem to be distributed over the entire range of limiting resolutions represented in the PDB. Furthermore, all point groups for which supergroups are available are present in the list (Tables 6 and S1).
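The factor n can be illustrated with a short Python sketch. It assumes that the translational lattice is unchanged by the reassignment, so that n is simply the ratio of the orders of the two rotational point groups; the function name is hypothetical, and the table lists the standard orders of the chiral point groups available to protein crystals.

```python
# Illustrative sketch only; assumes the target is a true supergroup and
# that the lattice translations are unchanged by the reassignment.

# Number of rotation operators in each chiral crystallographic point group.
POINT_GROUP_ORDER = {
    "1": 1, "2": 2, "222": 4,
    "4": 4, "422": 8,
    "3": 3, "32": 6,
    "6": 6, "622": 12,
    "23": 12, "432": 24,
}

def chain_reduction_factor(current, target):
    """Return n = order(target) / order(current).
    The caller must ensure that `current` is a subgroup of `target`;
    only divisibility of the orders is checked here."""
    n, remainder = divmod(POINT_GROUP_ORDER[target], POINT_GROUP_ORDER[current])
    if remainder != 0 or n < 2:
        raise ValueError("target is not a proper supergroup by order")
    return n

print(chain_reduction_factor("1", "4"))    # triclinic model re-refined in P4
print(chain_reduction_factor("2", "222"))
print(chain_reduction_factor("1", "23"))
```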
Care should be taken to distinguish between the present results and a previous study by Wang & Janin (1993) showing that NCS symmetry axes tend to lie nearly parallel to unit-cell edges or face or body diagonals. The vast majority of structures listed by Wang and Janin are likely to have correctly classified space groups, with verifiable differences between NCS-related subunits. None of the cases listed in that paper appear in our list of candidates for reclassification (Table S1).
The choice of Δrsym = 0.325 Å as a cutoff for producing Fig. 4 and Table S1, while arbitrary, reflects the notion that larger values of Δrsym and Δrchain are more likely to exceed the expected coordinate uncertainty, implying confident pseudosymmetry rather than underassigned symmetry. It is instructive to consider how these latter categorizations relate to the mathematical treatment of §2.5.1: with underassigned symmetry the representatives g2…gn are exact symmetry operators leaving the structure invariant, while with pseudosymmetry these operators match atoms in the unit cell in an approximate rather than an exact fashion. Furthermore, with pseudosymmetry there is the attendant possibility of twinning (Padilla & Yeates, 2003), in which the representatives act as operators that describe the mutual relationship of different unit cells in the crystal. The Rsymop(gi) values obtained from equation (1) correspond to the Rtwin formula defined by Lebedev et al. (2006), suggesting a role for the Rsymop (Icalc) and Rsymop (Iobs) statistics in quantifying twinning as discussed in that reference.
Ideally, any validation process to prepare structures for final publication and deposition should scrutinize the choice of space group. Normally the compatible Bravais lattices are evident at the stage of autoindexing, when the observed unit-cell dimensions are checked for higher symmetry metrics. Subsequently, at the step of data-set merging, it is usually possible to unambiguously identify the point group of the diffraction pattern. Yet the data shown here indicate that a fraction of cases are misassigned, suggesting that a third check should be added at a later step, after the atomic model is built.
It is fair to ask how beneficial such a procedure would be. In the unusual but ideal situation in which the data are very accurately measured and there is an ample data-to-parameter ratio, it should be possible to obtain an accurate structure even if the symmetry is underassigned. However, in more typical cases in which the desired atomic details may be only marginally observable in the electron-density map, the constraints offered by perfect symmetry may be crucial to map interpretation. Much of crystallography today is centered on elucidating the relationship between proteins and small-molecular ligands, including ions, drugs and small […], and the models for these interactions may not be as well restrained by stereochemistry as those of proteins. Assignment into a higher symmetry space group may prove helpful in borderline cases where it is barely possible to discern the ligand. The ability to align symmetry-equivalent models arising from space-group reassignment (explained in §2.5.3) is intended to assist the crystallographer in determining whether there are regions of the model that may exhibit especially large changes under the proposed symmetry target and which therefore warrant extra attention.
The spectre of re-evaluating the space groups assigned to hundreds of crystal structures calls to mind recent discussions regarding the worth of archiving original crystallographic diffraction images (see, for example, Baker et al., 2008). If the objective is to justify a certain choice of symmetry to future investigators, then data archival assumes a new importance.
The procedures described here are included in the software package LABELIT, available for download by noncommercial users at https://cci.lbl.gov/labelit and for licensing by commercial users. Command-line parameters for the program labelit.check_pdb_symmetry, explained in the online manual, permit the input of both coordinates and structure factors. LABELIT is also included with the PHENIX package (Adams et al., 2002), available for download at https://www.phenix-online.org.
Supporting information
Supporting information file. DOI: https://doi.org/10.1107/S0907444910001502/dz5193sup1.pdf
Acknowledgements
The authors would like to thank Ashley Deacon (Joint Center for Structural Genomics) for creating the archive of full data sets associated with published JCSG structures, making it possible to develop new methods such as those described here. Karen Woo (Lawrence Berkeley National Laboratory) provided invaluable technical assistance. We thank the NIH for financial support of the LABELIT (1R01 GM77071) and PHENIX (1P01 GM63210) projects and for additional support to PHZ (Y1GM906411). This work was partially supported by DOE contract No. DE-AC02-05CH11231.
References
Afonine, P. V., Grosse-Kunstleve, R. W. & Adams, P. D. (2005). Acta Cryst. D61, 850–855.
Afonine, P. V., Grosse-Kunstleve, R. W., Adams, P. D., Lunin, V. Y. & Urzhumtsev, A. (2007). Acta Cryst. D63, 1194–1197.
Adams, P. D., Grosse-Kunstleve, R. W., Hung, L.-W., Ioerger, T. R., McCoy, A. J., Moriarty, N. W., Read, R. J., Sacchettini, J. C., Sauter, N. K. & Terwilliger, T. C. (2002). Acta Cryst. D58, 1948–1954.
Baker, E. N., Dauter, Z., Guss, M. & Einspahr, H. (2008). Acta Cryst. D64, 337–338.
Berman, H., Henrick, K. & Nakamura, H. (2003). Nature Struct. Biol. 10, 980.
Boisen, M. B. Jr & Gibbs, G. V. (1990). Mathematical Crystallography (Reviews in Mineralogy, Vol. 15), revised ed. Washington DC: Mineralogical Society of America.
Dauter, Z. (1999). Acta Cryst. D55, 1703–1717.
Evans, P. (2006). Acta Cryst. D62, 72–82.
Giacovazzo, C., Monaco, H. L., Viterbo, D., Scordari, F., Gilli, G., Zanotti, G. & Catti, M. (1992). Fundamentals of Crystallography. Chester, Oxford: IUCr/Oxford University Press.
Grosse-Kunstleve, R. W. (1999). Acta Cryst. A55, 383–395.
Grosse-Kunstleve, R. W., Sauter, N. K. & Adams, P. D. (2004). Acta Cryst. A60, 1–6.
Grosse-Kunstleve, R. W., Sauter, N. K., Moriarty, N. W. & Adams, P. D. (2002). J. Appl. Cryst. 35, 126–136.
Grosse-Kunstleve, R. W., Zwart, P. H., Afonine, P. V., Ioerger, T. R. & Adams, P. D. (2006). Newsl. IUCr Comm. Crystallogr. Comput. 7, 92–105.
Hahn, T. (1996). Editor. International Tables for Crystallography, Vol. A, 4th ed. Dordrecht: Kluwer Academic Publishers.
Hendrickson, W. A. (1985). Methods Enzymol. 115, 252–270.
Hirshfeld, F. L. (1968). Acta Cryst. A24, 301–311.
Hooft, R. W. W., Sander, C. & Vriend, G. (1994). J. Appl. Cryst. 27, 1006–1009.
Hooft, R. W. W., Vriend, G., Sander, C. & Abola, E. E. (1996). Nature (London), 381, 272.
Jones, T. A. & Liljas, L. (1984). Acta Cryst. A40, 50–57.
Kearsley, S. K. (1989). Acta Cryst. A45, 208–210.
Kleywegt, G. J. (2000). Acta Cryst. D56, 249–265.
Kleywegt, G. J., Hoier, H. & Jones, T. A. (1996). Acta Cryst. D52, 858–863.
Koch, E. & Fischer, W. (1996). International Tables for Crystallography, Vol. A, 4th ed., edited by T. Hahn, pp. 855–869. Dordrecht: Kluwer Academic Publishers.
Lebedev, A. A., Vagin, A. A. & Murshudov, G. N. (2006). Acta Cryst. D62, 83–95.
Le Page, Y. (1982). J. Appl. Cryst. 15, 255–259.
Le Page, Y. (1988). J. Appl. Cryst. 21, 983–984.
Leslie, A. G. W. (1999). Acta Cryst. D55, 1696–1702.
Luzzati, V. (1952). Acta Cryst. 5, 802–810.
Marsh, R. E. (1995). Acta Cryst. B51, 897–907.
Marsh, R. E. (1997). Acta Cryst. B53, 317–322.
Marsh, R. E. (2009). Acta Cryst. B65, 782–783.
Marsh, R. E. & Herbstein, F. H. (1988). Acta Cryst. B44, 77–88.
Marsh, R. E. & Spek, A. L. (2001). Acta Cryst. B57, 800–805.
McCoy, A. J., Grosse-Kunstleve, R. W., Adams, P. D., Winn, M. D., Storoni, L. C. & Read, R. J. (2007). J. Appl. Cryst. 40, 658–674.
Navaza, J. & Vernoslova, E. (1995). Acta Cryst. A51, 445–449.
Padilla, J. E. & Yeates, T. O. (2003). Acta Cryst. D59, 1124–1130.
Palatinus, L. & van der Lee, A. (2008). J. Appl. Cryst. 41, 975–984.
Sauter, N. K., Grosse-Kunstleve, R. W. & Adams, P. D. (2004). J. Appl. Cryst. 37, 399–409.
Sauter, N. K., Grosse-Kunstleve, R. W. & Adams, P. D. (2006). J. Appl. Cryst. 39, 158–168.
Spek, A. L. (2009). Acta Cryst. D65, 148–155.
Terwilliger, T. C., Grosse-Kunstleve, R. W., Afonine, P. V., Moriarty, N. W., Zwart, P. H., Hung, L.-W., Read, R. J. & Adams, P. D. (2008). Acta Cryst. D64, 61–69.
Urzhumtseva, L., Afonine, P. V., Adams, P. D. & Urzhumtsev, A. (2009). Acta Cryst. D65, 297–300.
Wang, X. & Janin, J. (1993). Acta Cryst. D49, 505–512.
Weiss, M. S. (2001). J. Appl. Cryst. 34, 130–135.
Zwart, P. H., Grosse-Kunstleve, R. W. & Adams, P. D. (2005). CCP4 Newsl. 43, contribution 7.
Zwart, P. H., Grosse-Kunstleve, R. W., Lebedev, A. A., Murshudov, G. N. & Adams, P. D. (2008). Acta Cryst. D64, 99–107.
This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.