Notes of a protein crystallographer: the advantages of combining new integrated methods of structure solution with traditional data visuals

The streamlined crystallographic software tools for solving macromolecular crystal structures should be complemented with the visual aids that were used in the past (for example precession images and stereographic projections) to facilitate better understanding of the basic crystallographic concepts by younger aspiring crystallographers.

The recent CCP4/APS School in Macromolecular Crystallography workshop at the Advanced Photon Source (https:// www.ccp4.ac.uk/schools/APS-2021/) entitled 'From data collection to structure refinement and beyond' brought together in a global electronic meeting some of the most prestigious senior crystallographers, premier method developers and the new generations of aspiring structural biologists and crystallographers. It was a virtual school that also included hands-on data-collection sessions at the Advanced Photon Source and comprised two full weeks of intense immersion in crystallographic methods for the training of younger crystallographers. I decided to register and participate in order to get a refresher course on the newest methods and software tools. I was certainly not disappointed.
Lectures on diffraction theory, the Ewald sphere, datacollection strategies and various methods such as MIR, MR, SIR, SAD etc. were presented at a very fast pace. The field is certainly vibrant with computational innovations that will facilitate the solution and refinement of ever more complex structures by conventional crystallography or by the promising youngest branch of the field, cryo-electron microscopy.
However, in my view, the introductory lectures on diffraction theory failed to take advantage of the traditional visualization tools that so well illustrate the theory and the basic principles of crystallography such as diffraction theory, the reciprocal lattice and the corresponding symmetry of the different projections. The basic principles were presented (for example the Ewald sphere), but the implications for the visualization of the reciprocal lattice were not adequately illustrated. In particular, the presenters failed to demonstrate the value of the images of the intensity-weighted reciprocallattice planes in the different Laue classes. I noticed the absence of highly integrated visual tools along the path to the structure solution of complex structures, including the existence of any noncrystallographic symmetry (NCS) if present (Rossmann & Blow, 1962). I am referring to precession photos ISSN 2059-7983 and stereographic projections of the self-rotation function (SRF). Am I a dinosaur? Am I ossified to the point of no redemption? Am I totally out of touch with the new techniques and programs that can solve structures automatically using specific and ultrafast pipelines and workflows? Possibly, but I do not think so.
I must add that the writing of this article was not motivated by nostalgia for the 'good old days' of tedious data collection and long, interminable and uncertain times for structure solution. My interest is in conveying to software developers, beamline scientists and future generations that the crystallographic community at large will benefit when the superb tools currently available are combined with the more traditional visual guides used in the past.
In this brief essay, I will illustrate this statement with some examples for three reasons. Firstly, although it has been said millions of times and is a mantra in many human endeavors, it is worth mentioning it again: a picture is worth a thousand words. Secondly, because having the younger generations recognize, understand and interpret these images will make them better crystallographers and structural biologists and not just 'program-running wizards'. Thirdly, but not least, because it will help in solving structures more easily, particularly those complex structures where the automatic workflows cannot find the correct solution for a variety of reasons: low resolution and/or incomplete data, complex NCS, twinning and/or ambiguous space-group assignment, among others.
The precession method was introduced in the 1930s in the laboratory of Professor Martin Buerger at the Massachusetts Institute of Technology. A complete discussion of the method was published by Buerger thirty years later (Buerger, 1964). It provided a simple and direct way of establishing the space group and unit cell of a crystal by providing undistorted views of the planes of the reciprocal lattice (Fig. 1a, left and right). The instrument (the precession camera) soon became commercially available and was widely used in crystallography laboratories.
For younger macromolecular crystallographers, it is worth mentioning that the native and heavy-atom derivative data for the determination of the structure of myoglobin at 2 Å resolution were collected by precession photography (Kendrew et al., 1960). The key advantage was that the successive photographs containing individual planes of the reciprocal lattice comprising the 'data set' were very easy to index. The intensities of the corresponding reflections were measured readily using the flatbed densitometers available at the time. The undistorted view of the weighted reciprocal lattice allows immediate indexing as the reflections are arranged regularly along the three main reciprocal axes (Fig. 1a, left and right). The data for the structure of hemoglobin at 5.5 Å resolution (Perutz et al., 1960) were collected by precession methods and using an early counter spectrometer (Arndt & Phillips, 1961), measuring the intensity of one reflection at a time.
Nowadays, you can only see precession cameras in small (museum-like) displays near well established crystallography laboratories: they are a relic of the past. However, current integrated visual software may display the intensity-weighted reciprocal space in almost any possible way with data collected using any hardware or available method.
Younger generations of macromolecular crystallographers are more familiar with the screenless oscillation methods used at synchrotron sources, which were initially used at conventional in-house rotating anodes, collecting the crystallographic reflections from various reciprocal planes simultaneously that could be indexed by the initial software (Arndt & Wonacott, 1977). The pre-alignment of the crystal axes with respect to the X-ray beam was important to facilitate the indexing and integration (Fig. 1b). The autoindexing software using Fourier methods that was introduced later (Otwinowski & Minor, 1997) enormously facilitated the indexing of the initial Figure 1 (a) Precession photographs of the two main planes of the reciprocal lattice (left, hk0; right, h0l) of crystals of C-phycocyanin, a blue-colored protein extracted from the green alga Agmenellum quadruplicatum. The precession angle is = 9 . These diffraction patterns were obtained with an overnight exposure on a precession camera model from the 1970s manufactured by the Charles Supper Co., Boston, Massachusetts, USA mounted on an Elliot rotating-anode generator. Note the 6mm symmetry of the diffraction pattern along the unique c trigonal axis (left) and the centrosymmetric pattern for the h0l plane (right), with the X-rays perpendicular to the main symmetry axis (threefold). A very small beam stop was used to visualize the lowest resolution reflections to understand the packing of the 32 molecular aggregates (Hackert et al., 1977). In the h0l reciprocal-lattice plane (right) the main axis (00l reflections) is horizontal. (b) Oscillation diffraction pattern (Á' = 1.5-3.5 ) of the same crystal form as in (a), with the c axis of the crystal oriented perpendicular to the X-ray beam (crystal-to-film distance 7.5 cm). The central lune corresponds to the h0l plane shown on the right in (a). The crystallographic parameters of the unit cell were a = b = 184.5, c = 60.5 Å (trigonal space group P321). diffraction patterns and the subsequent data collection and processing for randomly oriented crystals.
Obtaining precession photographs of the main projections of the reciprocal lattice (i.e. h0l, 0kl, hk0), often including the upper levels (i.e. h1l, 1kl, hk1), was a crucial (and often timeconsuming) step in any successful structure determination. This was possible by the careful analysis and visual inspection of these images (Figs. 2a-2c, 3a and 3b). I dare say that the majority of younger crystallographers will not even know what I am talking about. Yet all of these data, planes and projections are 'buried' in the well known 'mtz' files produced by the processing programs, namely MOSFLM/iMOSFLM , HKL-2000/HKL-3000 (Otwinowski & Minor, 1997) and XDS (Kabsch, 2010). They can be examined using the VIEWHKL program available in various software packages (CCP4 suite; Winn et al., 2011), as well as the abovementioned HKL-3000 (Minor et al., 2006) and Phenix suite (Liebschner et al., 2019). The precession photographs allow the immediate calculation of the unit-cell parameters, and by looking at the pattern of intensities of certain sets of reflections (for example axial reflections: 00h, 0kl, 00l) the presence of screw axes of symmetry can be determined, as well as any lattice centering by examining the systematic absences 3a and 3b).
This is what the current crystallographic programs do; it is not magic. In addition, the programs compare the intensities of what they think are symmetry-equivalent reflections and provide reasonable suggestions for the full symmetry of the crystal lattice based on some statistical inferences. All of this is important and valuable but, in my experience, it always helps to look at these 'pseudo' precession photographs to confirm what the programs are suggesting (Figs. 2a-2c, 3a and 3b). Except for enantiomorphic space groups (for example P6 1 versus P6 5 ), this is how the software proposes a space group Pseudo-precession photographs extracted using VIEWHKL from the processed data of Mycobacterium tuberculosis FBPase crystals (space group P6 1 22, unit-cell parameters a = b = 129.8, c = 141.2 Å ). (a) hk0 plane exhibiting 6mm symmetry. (b) 0kl plane with the 00l reflections vertical; notice the systematic extinctions 00l = 6n demonstrating the presence of the P6 1 (or P6 5 ) screw axis (Wolf et al., 2018). (c) h0l plane (00l reflections vertical) as rendered by the software available in Phenix (rainbow mode) showing the systematic absences (empty circles) along the main screw axis (different crystal but same space group). Reflection intensity ranges from blue to red (weak to strong). and will merge the integrated reflections if the statistical parameters are robust enough.
Nowadays, view-enabling programs such as HKL-3000 and Phenix can also be used to display various views, regions and planes of the reciprocal lattice (2D or 3D viewers in glowing colors!), including the completeness of the reciprocal data collected in three dimensions. This is very useful to identify certain regions of reciprocal lattice where significant 'gaps' exist due to the crystal orientation during data collection. These gaps or wedges can lead to incorrect space-group determination, particularly if the missing data are along the axes. This is something that the old precession photographs would not allow us to do (Fig. 2c).
The article by Rossmann and Blow defining the 'selfrotation function' (SRF) and its usage for 'the detection of sub-units within the crystallographic asymmetric unit' is a classic (Rossmann & Blow, 1962). The final sentence of the abstract establishes what was to become the most common application of this concept: 'application of the R function to horse haemoglobin gives a dominant peak that corresponds accurately to the relative orientation of the and chains'. Attempts to solve other protein structures in the years that followed often encountered the problem of having more than one molecule in the asymmetric unit with or without any point-group symmetry. Solving this problem was a key step in unraveling the heavy-atom sites and subsequent phase calculations and refinement. One of the most relevant examples of these years was the use of the rotation function for the discovery of a noncrystallographic twofold axis of symmetry in rhombohedral insulin (Dodson et al., 1966). This illustration demonstrated the value of calculating and interpreting the SRF in the early stages of structure determination. It was used routinely in the 1970s and beyond to establish the orientation of the symmetry elements in icosahedral viruses (point group 532), where it was crucial for the averaging of the initial electron-density maps and phase improvement. A time perspective on the molecular-replacement method and its applications has recently been published (Dodson, 2021).
Certainly, the new software packages do calculate the SRF to consider the presence of NCS and they routinely list the various NCS axes, typically using Eulerian angles, but the relative relationship of the NCS peaks is difficult to visualize using the Euler angles (, , or 1, 2, 3). It is certainly easier using the polar coordinates of the axial directions (', and the angle of rotation or ). I routinely use the option in MOLREP (Vagin & Teplyakov, 1997, 2010Winn et al., 2011) to calculate the SRF and plot it on a stereographic projection (Wulf net), which provides a superb overview of the crystallographic and NCS elements and their relative orientations with respect to the crystallographic axes. In my experience, these stereographic projections are very valuable to (i) characterize the NCS elements and their relative orientations with respect to the crystallographic axes (Fig. 4); (ii) discern issues with the quality of the crystallographic data, such as the degree of twinning (Fig. 5); (iii) identify the correct symmetry of the crystal lattice (Fig. 6) and (iv) unravel complex NCS symmetry Pseudo-precession photographs produced by VIEWHKL of data collected at the APS (LS-CAT) for the protein holo-(acyl-carrier-protein) synthetase (ACPS) from M. smegmatis. The program uses the reduced data in the 'mtz' file and extracts the corresponding reflections to display specific planes of the reciprocal lattices, such as hk0, hk1, hhl etc., creating 'pseudo-precession' photographs. The data were integrated and processed using the beamline software (HKL-2000). The space group was confirmed to be R3. These two pseudo-precession photographs illustrate the power of the precession images to identify the space group in a dramatic manner. (a) The hk0 view of the weighted reciprocal lattice along the main threefold axis showing sixfold symmetry. (b) In contrast, the 'upper level' photograph hk1 only displays threefold symmetry. This is diagnostic of trigonal space groups. Other crystallographic suites of programs such as HKL-3000 and Phenix (see text for details) can be used to effectively display the processed crystallographic data to complement the printed output of the crystallographic workflows. in low-symmetry space groups, including the presence of pseudo-symmetry, to facilitate structure solution (Fig. 7). I will illustrate these four cases with examples.
The orientation of the NCS symmetry elements in the lattice is the most traditional use, and in simple cases it is very easy to interpret using the direct listing from the programs. However, even the common presence of a 222 tetramer in the asymmetric unit can be challenging when none of the three twofold axes are aligned with the crystallographic symmetry. Fig. 4 illustrates the SRF of a crystal of the class II fructose 1,6-bisphosphatase (FBPase) from the pathogenic bacterium Francisella tularensis. The protein is present in solution as a tetramer of identical subunits and quite often the entire tetramer is present in the asymmetric unit of crystals belonging to space group P1. In the illustration, there are actually two tetramers with 222 symmetry within the unit cell in slightly different orientations and separated by about 70 Å along the c axis.
I have often found that obtaining the SRF immediately after the crystallographic data have been reduced helps to recognize 'red flags' regarding the quality of the data; this can be a significant time saver for a project. A lack of distinct features and the absence of the symmetry elements expected from the data reduction is often a reflection of disordered crystals and/ or certain types of twinning. In certain cases, it is possible to select the 'best crystal' data set to proceed based on inspection of the SRF. Fig. 5 illustrates how different degrees of twinning in two different data sets of P2 1 crystals grown in the presence of Mg 2+ (not the native divalent cation) are reflected in the appearance and interpretation of the SRF.

Figure 5
Degree of twinning and orientation of the tetramer in the P2 1 unit cell of Mg 2+ -containing crystals of F. tularensis FBPase. (a) SRF of data collected from crystal A, for which the twinning law was impossible to estimate. (b) SRF from crystal B that was interpreted as two 222 tetramers (axes directions labeled as PQR and P 0 Q 0 R 0 related by a crystallographic 2 1 screw axis; horizontal) with a twinning fraction of 0.433 as refined by REFMAC5 (Murshudov  et al., 2011). The structure was solved and fully refined (PDB entry 7js3). Reproduced with permission from Selezneva et al. (2020).
The most unexpected and revealing usage of the SRF has been to show the symmetry elements of the crystallographic data without any preconceived idea of the lattice. Of course, the auto-indexing programs provide reasonable guesses of the crystal lattice based on the values of the unit-cell parameters (dimensions and angles) and these are critical constraints. However, I encountered an example where the auto-indexing and subsequent refinement indicated a tetragonal P4 1 22 (or enantiomer) lattice and yet I could not solve the structure, even though it was a rather simple MR solution of a protein N-terminal domain of about 150 amino acids. A rather straightforward calculation and plotting the SRF immediately showed an unambiguous threefold axis of symmetry along the approximate body diagonal of the cell, directly pointing to a cubic lattice (Fig. 6). The structure was solved immediately after reprocessing the data in P4 1 32. In cases of space-group ambiguity, I suggest that processing the initial crystallographic data in lower symmetry space groups, within the same or different Laue classes, and examination of the SRF could point to the correct space-group assignment, in conjunction with the suggested solutions provided by the auto-indexing programs.
The dramatic impact of the use of the SRF to unravel the NCS of complex asymmetric units is presented in Fig. 7. The crystallographic data could be reduced either in P2 1 (unit-cell parameters a = 80.1, b = 130.3, c = 112.6 Å , = 90.07 ) or in P222 1 , with the longest c axis coinciding with the 2 1 screw axis. The NCS revealed significant peaks for twofold, threefold and even sixfold symmetries inside the asymmetric unit. The solution by MR proceeded in a stepwise fashion, as reported previously PDB entry 6pbs). It was particularly intriguing to see strong evidence for a threefold axis, as only dimers had been hinted at before. Finally, the solution was obtained and refined by solving for a 32 symmetrical cluster (six chains) in P222 1 indexing and searching for two such clusters in the diffraction data reduced as P2 1 (b unique). SRF of the crystallographic data for ClpC1 NTD-rufomycin processed in space group P4 1 22, showing the threefold approximately along the body diagonal of the crystallographic unit cell (bottom left) as well as the inclined twofold axis (top left). Notice that since the crystallographic axes are not exactly equal (in this tetragonal system) the orientation of the 'crystallographic' threefold axis is not exactly at 45 from each of the axes x, y and z. It would have been oriented exactly at the body diagonal if the data had been processed in P4 1 32 (the correct space group). It should be emphasized that the initial data integration and processing was performed in the tetragonal space group with acceptable R merge values in all resolution ranges. The crystals appeared to be tetragonal with no obvious threefold symmetry. Only after the failure to solve the structure was the space-group symmetry questioned. The R merge values from the cubic processing were not significantly better, although the redundancy increased.
Concurrent with the previous examples, it is worth presenting the following argument. The issues related to how to teach crystallography have been brought up for discussion many times. There are fewer formal courses of crystallography in the curricula that are important for training future students, even in the more traditional areas of biophysics, biophysical and analytical methods of structure determination, physical chemistry and others. The most common training occurs when graduate students encounter a need to solve a biological problem related to a macromolecule (protein/nucleic acid) of biological interest. The common theme is that solving the structure of the target molecule will provide insights into its function, whether catalytic, regulatory or other. Thus, the student is asked to attempt to clone, express, purify, crystallize and solve the structure of the macromolecule of interest. Often, there is no expertise in the field in the laboratory, and once the students manage to obtain crystals they are supposed to solve the structure with the 'powerful software' tools available. The assumption is that once the data have been collected and processed (possibly even by beamline scientists), the structure solution comes from running a series of programs. The focus is on solving the structure, and 'learning' crystallography is often equated to being able to run a series of programs, often as streamlined pipelines, and ending with the structure ready for 'interpretation' and rationalizing the biological function(s). Unfortunately, this approach does not teach crystallography, and this is the reason why workshops and crystallography schools are so useful. However, in addition to what those schools do very well, it is suggested that they also include use of the very valuable visual tools of the past to illustrate and explain more fully the crystallographic concepts underlying the amazing software tools that they are providing to younger generations. This suggestion should also be adopted by the courteous beamline scientists who often process the collected data for inexperienced users.
In summary, images are more valuable than words and various lists of copious numbers. They are invaluable as a complement to the extensive computer outputs. Certain  The complex NCS present in the crystals of ClpC1 NTD-ecumicin. The asymmetric unit of the crystal of ClpC1 NTD-ecumicin (space group P2 1 ) contained 12 molecules of the NTD domain of ClpC1 and 24 molecules of the ligand, the natural depsipeptide ecumicin. The 12 ClpC1NTD-ecumicin (1:2) complexes were packed in a nearly orthorhombic cell in two clusters of approximately 32 aggregates . 'traditional' crystallographic images are more informative than the various Cartesian plots produced by most crystallographic programs. I would ask software developers that in addition to the extensive output of their workflows, additional graphic files are output to allow the users the option to examine the three main projections of the reciprocal lattice and an image of the stereographic projection of the SRF using the integrated intensities. I would also encourage the beamline personnel and younger generations to use the integrated visualization tools available to 'cross-examine' the answers inferred by the programs. This visual information would perfectly complement the numerical and statistical output of these programs and would probably aid in space-group determination and in the better characterization of noncrystallographic symmetry, if present. Together, these additions will be a more reliable and informative path to the satisfying experience of solving the structure, with a complete understanding of the process and a fuller appreciation of the final result.