Received 27 September 2012
Placement of molecules in (not out of) the cell
To uniquely describe a crystal structure, it is sufficient to specify the crystal unit cell and symmetry, and describe the unique structural motif which is repeated by the space-group symmetry throughout the whole crystal. It is somewhat arbitrary how such a unique motif can be defined and positioned with respect to the unit-cell origin. As a result of such freedom, some isomorphous structures are presented in the Protein Data Bank in different locations and appear as if they have different atomic coordinates, despite being completely equivalent structurally. This may easily confuse those users of the PDB who are less familiar with crystallographic symmetry transformations. It would therefore be beneficial for the community of PDB users to introduce standard rules for locating crystal structures of macromolecules in the unit cells of various space groups.
Keywords: scientific comment.
Crystals are built from identical unit cells extending in a parallel fashion in three dimensions. Moreover, each unit cell may contain a number of identical structural motifs (e.g. individual molecules or their complexes) arranged according to the symmetry of the particular space group. To uniquely specify the crystal structure, it is therefore sufficient to provide the locations of all of the unique atoms within the asymmetric unit of the crystal, i.e. the coordinates of all of these atoms with respect to the cell origin. From a purely crystallographic point of view, it does not matter in which asymmetric unit the specified atoms are located, and both constellations presented in Fig. 1 are equally correct.
Figure 1. Two possible crystallographically equivalent representations of the structure of 3-nitroacridine differing by a half-cell origin shift along the vertical axis and appropriate rearrangement of the atoms. Whereas in (a) the molecular structure of this compound is not clear, in (b) it is immediately apparent.
However, (molecular) crystals contain chemical compounds, and from the point of view of chemistry the situation in Fig. 1(a) is dramatically different from that in Fig. 1(b). Whereas in the latter the atomic connectivity and architecture of acridine are immediately apparent, the former representation makes little chemical sense. In analogy, if an asymmetric unit contains several molecules forming discrete oligomers, it is more informative to present the individual molecules grouped logically, rather than randomly, as illustrated in Fig. 2, and indeed most illustrations of oligomeric structures are already presented as biologically relevant assemblies on the PDB web pages.
Figure 2. Four independent protein protomers in the asymmetric unit of the structure 1woc as presented in the PDB (a) and after regrouping (b), when it becomes apparent that this structure consists of two similar dimers.
It is worth commenting on the concept of the 'asymmetric unit' (ASU in the following). Intuitively, it is clearly the part of a unit cell which, under the action of all symmetry operations of the space group, reproduces the complete content of the cell and therefore the whole crystal. As long as this requirement is fulfilled, it does not matter what the shape of the ASU is. International Tables for Crystallography (2005) contains definitions of the ASU for each space group in the form of a convex parallelepiped (in cubic groups it may be a more complicated polyhedron), but this choice is arbitrary and ASUs may have different shapes. In fact, each molecule or, more strictly, each unique structural motif forms an ASU which may have a quite complicated shape, not necessarily convex. An example of such a construction is the Voronoi (1908) tessellation, which was first applied to protein crystals by Richards (1974).
Apart from crystallographic correctness and chemical sense, the presentation of any macromolecular crystal structure should be logical and as easy to comprehend as possible by other scientists who may be less familiar with the principles of crystallography, e.g. biologists interested in the functioning and biochemical properties of a given molecule or complex. In this context, it matters how the structures are presented in the Protein Data Bank (PDB; Berman et al., 2000).
The PDB serves as the repository of macromolecular structures, but it is not responsible for the scientific content of the deposited models. However, it has certain rules concerning the presentation of the atomic models. For example, all solvent water molecules are automatically transformed by symmetry and renamed according to the closest macromolecular chain. The results of this procedure are absolutely equivalent to the original situation and it is meant to make it easier for the results of the structural analysis to be interpreted by people who are less familiar with crystallographic procedures.
The life of noncrystallographers interested in PDB models could be made even easier if some other considerations were also taken into account. Table 1 contains a list of all PDB structures of bovine trypsin complexed with various inhibitors crystallized in the orthorhombic space group P2₁2₁2₁ with similar unit-cell parameters. The list shows the positions of the trypsin molecules in the unit cell of specified dimensions. Any crystallographer would realise that all of these structures are practically isomorphous and therefore the architecture of all of the crystals is equivalent. The unique molecules in different structures are simply transformed by the space-group symmetry or the permissible shift of the cell origin. Non-specialists, however, may be confused and treat these models as structurally different.
In addition to humans, some crystallographic programs are also 'confused' as a result of placing molecules further away from the cell origin. The CCP4 program CONTACT (Winn et al., 2011) properly identifies six interacting neighboring molecules in most trypsin structures, except for PDB entries 1s0q, 2j9n and 3iti, in which the protein molecules are located farther from the cell origin (Table 2). As stated in the program documentation,
the default is to use only single translations (e.g. +A, -A, -A+B etc.), which works well if the molecule is reasonably positioned within the cell (not outside).
Table 2. Intermolecular contacts identified by the CCP4 program CONTACT for representative structures of trypsin from Table 1

PDB code   x      y      z      No. of contacts   Symmetry operations
5ptp       0.45   0.37   0.35   6                 ½ - x, 1 - y, ½ + z; ½ - x, 1 - y, -½ + z; ½ + x, ½ - y, 1 - z; -½ + x, ½ - y, 1 - z; 1 - x, ½ - y, ½ - z; 1 - x, -½ - y, ½ - z
3ptp       0.05   0.13   0.35   6                 ½ - x, -y, ½ + z; ½ - x, -y, -½ + z; ½ + x, ½ - y, 1 - z; -½ + x, ½ - y, 1 - z; -x, ½ - y, ½ - z; -x, -½ - y, ½ - z
2d8w       0.45   0.13   0.15   6                 ½ - x, -y, ½ + z; ½ - x, -y, -½ + z; ½ + x, ½ - y, -z; -½ + x, ½ - y, -z; 1 - x, ½ - y, ½ - z; 1 - x, -½ - y, ½ - z
1tx8       0.05   0.37   0.65   6                 ½ - x, 1 - y, ½ + z; ½ - x, 1 - y, -½ + z; ½ + x, ½ - y, 1 - z; -½ + x, ½ - y, 1 - z; 1 - x, ½ - y, - z; 1 - x, -½ - y, - z
1n6y       0.45  -0.13   0.35   6                 ½ - x, -y, ½ + z; ½ - x, -y, -½ + z; ½ + x, ½ - y, 1 - z; -½ + x, ½ - y, 1 - z; 1 - x, ½ - y, ½ - z; 1 - x, -½ - y, ½ - z
2oxs       0.05   0.13  -0.15   6                 ½ - x, -y, ½ + z; ½ - x, -y, -½ + z; ½ + x, ½ - y, -z; -½ + x, ½ - y, -z; -x, ½ - y, -½ - z; -x, -½ - y, -½ - z
2otv       0.05  -0.37  -0.15   6                 ½ - x, -1 - y, ½ + z; ½ - x, -1 - y, -½ + z; ½ + x, -½ - y, -z; -½ + x, -½ - y, -z; -x, ½ - y, -½ - z; -x, -½ - y, -½ - z
1s0r       0.55   0.63   0.35   6                 - x, 1 - y, ½ + z; - x, 1 - y, -½ + z; ½ + x, - y, 1 - z; -½ + x, - y, 1 - z; 1 - x, ½ - y, ½ - z; 1 - x, -½ - y, ½ - z
1s0q       0.95   0.63   0.15   4                 - x, 1 - y, ½ + z; - x, 1 - y, -½ + z; ½ + x, - y, -z; -½ + x, - y, -z
2j9n       0.45   0.87   0.35   4                 ½ + x, - y, 1 - z; -½ + x, - y, 1 - z; 1 - x, ½ - y, ½ - z; 1 - x, -½ - y, ½ - z
3iti       1.05   0.37   0.15   2                 ½ + x, ½ - y, -z; -½ + x, ½ - y, -z
Taking into account that in space group P2₁2₁2₁ the cell origin can be shifted by half of the unit-cell dimension in any direction, it is always possible to locate the center of the molecule (or, more generally, of the unique structural motif) in the region -¼ < x, y, z ≤ +¼. In fact, appropriately selecting one of the four existing orientations of the molecule in the cell, this region can be limited to, for example, 0 ≤ x, y < +¼, -¼ < z ≤ +¼. This is how the structure 2oxs is presented. However, a majority of the P2₁2₁2₁ models in the PDB lie outside this region (Table 3).
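The relocation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a procedure used by the PDB or by any existing program: the function names `wrap_half` and `standard_centre` are hypothetical, the input is the fractional position of the molecular center, and the half-cell translation parts of the four symmetry operations are treated as absorbed by the permitted origin shifts, so that each operation acts on the center only as a pattern of sign changes followed by a shift into the window -¼ < x, y, z ≤ +¼.

```python
import math

# Sign patterns of the rotation parts of the four P2(1)2(1)2(1) operations.
# Their half-cell translation parts are absorbed by the permitted origin shifts.
SIGNS = [(1, 1, 1), (-1, -1, 1), (-1, 1, -1), (1, -1, -1)]


def wrap_half(v):
    """Shift v by a multiple of 1/2 into the interval (-1/4, +1/4]."""
    return v - 0.5 * math.ceil(v / 0.5 - 0.5)


def standard_centre(x, y, z):
    """Return an equivalent centre with x, y >= 0 and every coordinate within 1/4 of zero."""
    for sx, sy, sz in SIGNS:
        cx, cy, cz = wrap_half(sx * x), wrap_half(sy * y), wrap_half(sz * z)
        if cx >= 0 and cy >= 0:
            return (cx, cy, cz)
    # The four patterns cover every sign combination of x and y.
    raise AssertionError("unreachable")


# Centres of 5ptp and 3iti as listed in Table 2
for centre in [(0.45, 0.37, 0.35), (1.05, 0.37, 0.15)]:
    print(tuple(round(c, 2) for c in standard_centre(*centre)))
```

Both centers map to approximately (0.05, 0.13, -0.15), the position at which 2oxs is presented. In practice the same operation and origin shift would have to be applied to every atom of the model, not only to its center.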
Since the atomic coordinates in the PDB are expressed in orthogonal ångström coordinates, their transformation has to proceed through conversion to fractional coordinates. Because of the inevitable limitations of the precision of the stored atomic coordinates and cell dimensions, this procedure will introduce a degree of error that increases with the distance of the model from the cell origin.
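The conversion between orthogonal and fractional coordinates can be sketched as follows. The sketch assumes the common orthogonalization convention (a along the Cartesian x axis, b in the xy plane), which an individual entry may override; the function names and the cell dimensions in the usage example are hypothetical and chosen only for illustration.

```python
import math


def orthogonalization_matrix(a, b, c, alpha, beta, gamma):
    """Upper-triangular matrix M such that cartesian = M * fractional.

    Cell lengths are in angstroms and cell angles in degrees.
    """
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    # Cell volume divided by a*b*c
    v = math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)
    return [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, c * v / sg],
    ]


def cart_to_frac(xyz, m):
    """Solve M * f = xyz by back substitution (M is upper triangular)."""
    x, y, z = xyz
    zf = z / m[2][2]
    yf = (y - m[1][2] * zf) / m[1][1]
    xf = (x - m[0][1] * yf - m[0][2] * zf) / m[0][0]
    return (xf, yf, zf)


def frac_to_cart(f, m):
    """Multiply M by the fractional coordinate vector f."""
    return tuple(sum(m[i][j] * f[j] for j in range(3)) for i in range(3))


# Hypothetical orthorhombic cell, used only to illustrate the conversion
m = orthogonalization_matrix(50.0, 60.0, 70.0, 90.0, 90.0, 90.0)
print(cart_to_frac((25.0, 15.0, 35.0), m))
```

For an orthorhombic cell the conversion reduces to dividing each coordinate by the corresponding cell edge, so the point above lies at approximately (0.5, 0.25, 0.5) in fractional coordinates.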
In conclusion, it would be beneficial to the wide community of PDB users if all of the structures in this depository could be 'remedied' by shifting their locations to positions as close as possible to the origin of the unit cell.
Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). Nucleic Acids Res. 28, 235-242.
International Tables for Crystallography (2005). Vol. A, edited by T. Hahn. Heidelberg: Springer.
Richards, F. M. (1974). J. Mol. Biol. 82, 1-14.
Voronoi, G. (1908). J. Reine Angew. Math. 134, 198-287.
Winn, M. D. et al. (2011). Acta Cryst. D67, 235-242.