research papers
Patterson correlation methods: a review of CNS
with^{a}Lawrence Berkeley National Laboratory, One Cyclotron Road, Mail Stop 4230, Berkeley, CA 94720, USA
^{*}Correspondence email: rwgrossekunstleve@lbl.gov
This paper presents a review of the principles of CCP4 Study Weekend in mind. A complementary presentation with animated Patterson maps is available online (http://cci.lbl.gov/~rwgk/ccp4sw2001/ ). The implementation of molecularreplacement methods in the Crystallography and NMR System (CNS) is presented and discussed in some detail. The three principal components are the direct rotation function, Patterson correlation and the fast translation function. CNS is available online and is free of charge for academic users.
with the audience of theKeywords: molecular replacement; Patterson correlation methods; CNS.
1. Introduction
The method of ) and Rossmann & Blow (1962). The latter publication marks the beginning of practical application to the solution of macromolecular crystal structures. The term `molecular replacement' is somewhat misleading because nothing is `replaced' (but it is helpful for remembering the initials of one of the main champions of the method). The conventional understanding of what encompasses is the placement of one or more known molecular models in the of the crystal under study. The search models are often extracted from databases such as the Protein Data Bank (Berman et al., 2000) or different crystal forms that were solved previously.
was pioneered four decades ago by Hoppe (1957In general, placing a molecular model in a et al., 1999).
is a sixdimensional search problem. The six are most conveniently parameterized as three rotation angles and three translations along the basis vectors of the coordinate system. Conventionally, an (a volume of the search space that is unique under symmetry) is sampled on a uniform grid. For a typical macromolecular the product of angular and translational sampling points is usually too large to carry out an exhaustive sixdimensional search in a reasonable time with current computing resources. However, there have been cases where an exhaustive search has been carried out in spite of the computational cost (SheriffRossmann & Blow (1962) showed that it is possible to break up the sixdimensional search into two consecutive threedimensional searches: a search for the angular orientation of the search molecule (rotation search) and a subsequent search for the translation (translation search). This greatly reduces the demand for computing resources. The total number of sampling points for the two threedimensional searches is roughly proportional to the square root of the number of sampling points for the exhaustive sixdimensional search.
In the Crystallography and NMR System (CNS; Brunger et al., 1998), a third powerful procedure is usually inserted between the rotation search and the translation search: Patterson correlation of the molecular orientation. We will discuss all three stages in the order in which they are typically used.
2. Rotation search
CNS implements two types of rotation search. We will refer to the first type as the `traditional rotation search'. The second type is commonly referred to as the `direct rotation search'. There is a conceptual distinction between these two types of rotation searches. In the traditional rotation search, two Patterson maps are rotated with respect to each other and then superimposed. This can be performed either in direct or in (Crowther, 1972; Navaza, 1987). In contrast, in a direct rotation search the molecular model is rotated directly. The term `directly' is used because it is the fundamental concept of the rotation search to rotate the model. The following sections explain that rotating maps instead is actually a nontrivial optimization, devised to reduce the computer time.
2.1. Principles of the rotation search
An animation that illustrates the general ideas behind the traditional and the direct rotation searches is available at http://cci.lbl.gov/~rwgk/ccp4sw2001/ . The fundamental prerequisites for the understanding of the methods are as follows.
intramolecular (within a molecule) and intermolecular (between molecules). The intermolecular vectors for a given molecule can in turn be classified as vectors between the following.

In summary, the model
has only peaks arsing from intramolecular vectors and intermolecular vectors between copies arising from translations (type i in the list above). Conceptually, both the traditional and the direct rotation search superimpose this `partial' with the observed This can be viewed as a patternmatching procedure. The model is the search pattern. The observed contains the search pattern in an unknown angular orientation. In the observed the search pattern is obscured by other patterns (other types of interatomic vectors) that are not considered in the model Patterson, and noise.The traditional and the direct rotation search are two implementations of this patternmatching concept. Both have their advantages and disadvantages.
In the direct rotation search, the model is rotated directly and a structurefactor calculation is carried out for each sampled angular orientation. This has the advantage of avoiding approximations such as interpolations, but the disadvantage of being computationally expensive.
In the traditional rotation search, the computationally expensive structurefactor calculation is carried out only once to obtain a model
(the next paragraph explains this in detail) which is then rotated and superimposed with the observed This has the advantage of being relatively fast, but the disadvantage of involving approximations.Rotating Patterson maps with respect to each other is not as straightforward as it might seem at first sight. The problem arises from the fact that the two types of interatomic vectors present in the model rotate around different origins. In order to be able to consistently rotate the search pattern present in the map, the two types of vectors need to be spatially separated. Fig. 1 explains how this can be achieved by placing the search model in an artificially enlarged The intramolecular vectors in the large are then concentrated around the origin of the and the intermolecular vectors are concentrated around the other points. It is now possible to cut out the spherical region around the origin that contains the isolated intramolecular vectors, rotate it and superimpose the observed in order to find the angular orientation with the best match. For reasons that will become apparent in the next section (equation 1), the spherical region is commonly referred to as the integration sphere. Obviously, the radius of the integration sphere is chosen to be similar to the largest intermolecular vector. In several implementations of the traditional rotation function, including CNS, a region around the large Patterson origin peak is normally omitted to improve the signaltonoise ratio. The resulting actively used region of the model is then called the integration shell (represented as a yellow region in Fig. 1d).
the intramolecular vectors and the intermolecular vectors arising from translations, are, in general,spatially intermixed. As the search model is rotated, vectors in the map that are close to each other2.2. Rotation search target functions
In CNS, the traditional rotation function is evaluated in real space (Brünger, 1990). We will therefore use this term from now on. For the realspace rotation search, the Patterson correlation Rot(Ω) for a given angular orientation Ω is evaluated as the correlation integral,
P_{obs} and P_{model} are the observed and model Patterson functions, respectively, and u is a location vector in Patterson space U.
The direct rotation function CC(Ω) for a given angular orientation Ω of the search model is typically evaluated as the standard linear of the observed and calculated normalized structurefactor amplitudes E^{2}. The standard linear is well known and frequently used in statistics to measure the strength of a linear relation of two variables. The formula for the evaluation of the is
where X = E^{2}. The summations are computed for all H. 〈X〉 denotes the mean of the X_{H}.
At first sight, (1) and (2) look very different. However, in practice (1) is evaluated as the sum of products. (2) is again a sum of products. The difference is just that in (2) each variable is centered around its mean (this is achieved by the subexpressions of the type X − 〈X〉) and the sums in the denominator normalize the coefficient such that it has values in the range from −1 to 1. In the absence of approximations, the two ways of evaluating the Patterson correlation should give essentially identical results, even though one is evaluated in real space and the other in The absolute values will be different because the first expression is not normalized, but the rotation functions should be very similar except for a scaling factor.
2.3. Comparison of rotation searches
The approximate relative CPU times for rotation searches with AMoRe (evaluated in Navaza, 1987), the CNS real space and the CNS direct rotation function are shown in Table 1.

The direct rotation search is more than one order of magnitude slower than the realspace rotation search and AMoRe is yet another order of magnitude faster. What benefit can be expected from the direct rotation search in return for the large increase in computational expense?
In the previous section we stated that the two ways of evaluating the Patterson correlation should give similar results in the absence of approximations. However, in practice two significant approximations are made for the realspace rotation function. When the rotated is superimposed with the observed grid points do not in general superimpose directly and interpolation has to be used. The other significant approximation is that the correlation integral (1) is only evaluated for a selected set of peaks. Typically, only the highest 3000 peaks in the observed are considered in the calculation. In contrast, the direct rotation function is evaluated uniformly for the entire and does not involve interpolations.
DeLano & Brünger (1995) systematically compared the signaltonoise ratio for a number of test cases. They define the signaltonoise ratio of rotation functions as the ratio of the value of the highest signal point to that of the highest noise point, measured in standard deviations above the mean. `Points' of the rotation function are defined as peaks that are left after reduction by spatial (DeLano & Brünger, 1995). A `signal' is defined by the radius of convergence of Patterson correlation (see §3). Empirical observation led DeLano & Brünger (1995) to the conclusion that a rotationfunction peak that is within about 15° of one of the correct orientations will, in general, converge to it by Patterson correlation Rotationfunction peaks that are within the 15° range were thus considered to be a signal. Points outside this range were considered to be noise.
A typical result of DeLano and Brünger's systematic comparisons is shown in Fig. 2 for search models with all atoms, a polyalanine chain and just the C^{α} atoms. The direct rotation function consistently has a much better signaltonoise ratio. Similarly, in Fig. 3 the highresolution limit is varied. Again, the direct rotation function consistently has a significantly better signaltonoise ratio compared with the realspace rotation functions, both with and without removal of the Patterson origin peak.
3. Patterson correlation refinement
The second stage of the CNS molecularreplacement procedure is Patterson correlation (PC) which is the intervening step between the rotation search and the translation search (Brünger, 1990). The goal of PC is to improve the overall orientation of the search model. Typically, the is carried out for rigid bodies such as domains, subdomains or secondarystructure elements. The major difference from normal crystallographic rigidbody is that PC is conducted without using The rationale for this is similar to that for not using the symmetry in the rotation search (see §2.1). The target function of PC is typically defined as the standard linear correlation between observed and calculated squared normalized structurefactor amplitudes (E^{2}).
By improving the accuracy of the search model for the correct angular orientation, PC
improves the discrimination between correct and incorrect orientations and therefore enables the location of the correct peak in a noisy rotation function. In general, PC makes the combination of a threedimensional rotation search with a subsequent threedimensional translation search much more robust, so that one does not have to resort to exhaustive sixdimensional searches.Brünger (1997) systematically studied the radius of convergence of rigidbody PC under various conditions. One of the examples is a structure with two domains that are connected by a linker region. One domain was kept stationary and the other was systematically misaligned. Fig. 4 shows the value of the Patterson after PC as a function of the initial misaligned interdomain angle. In this particular case, it is found that the PC converges back to the correct angle if the second domain is misaligned by up to approximately 13°.
Another way to assess the power of PC . This figure shows that pretranslation PC has the potential to drastically reduce the number of noise peaks in the translation function. Owing to this noise reduction it can often become immediately obvious what the correct position of the search molecule is.
is shown in Fig. 54. Translation search
At this stage, the angular orientation of the search molecule is assumed to be known. The remaining problem is to determine the location of this oriented search molecule with respect to the symmetry elements. The fundamental concept for solving this problem is straightforward: the .
is subdivided into a regular grid and the search molecule is moved to each grid point in turn. At each location, a structurefactor calculation is performed. The agreement between these calculated and the observed structure factors is evaluated by some type of target function. Depending on the spacegroup symmetry, for macromolecules the result is a two or threedimensional translation function similar to the example shown in Fig. 5The translationsearch target functions available in CNS include the standard linear (equation 2 modified for translation instead of angular orientation) of normalized or unnormalized structurefactor amplitudes, both squared and unsquared (E, E^{2}, F, F^{2}) and the crystallographic R factor. The use of the latter is complicated by the fact that a reasonably accurate estimate of the scale factor between observed and calculated structure factors is required. The literature contains no conclusive evidence that this is a significant disadvantage in practice. However, a is the default choice in CNS.
Computing a translation function is relatively timeconsuming and optimizations are essential. Fujinaga & Read (1987) introduced an efficient method for computing the structure factors for each sampling point. More recently, Navaza & Vernoslova (1995) introduced an ingenious fast Fourier transform based method for computing the final two or threedimensional translation function without explicitly computing the structure factors as intermediate results. The target function for this fast translation function is the between squared structurefactor amplitudes (F^{2}).
Both the more conventional Fujinaga & Read (1987) type translation function and the fast translation function are implemented in CNS. Table 2 shows a comparison of the run times of the CNS conventional translation function (CTF) and the CNS fast translation function (FTF) for a variety of symmetries, unitcell sizes and resolution ranges. The rightmost column of Table 2 shows the factor by which the fast translation function is faster than the conventional one. Depending on the symmetry, unitcell dimensions and resolution range, the fast translation function is 200 to almost 500 times faster than the conventional search.

Because of this enormous increase in speed, the fast translation function has also found a use in the automatic heavyatom search procedure in CNS (GrosseKunstleve & Brunger, 1999). The search procedure consists primarily of alternating singleatom translation functions and PC refinements. This strategy is only practical if the fast translation function is used. CNS has been used by independent groups to automatically locate up to 40 heavyatom sites in the (Walsh et al., 2000).
5. Summary
The CNS procedures that are presented in the previous sections can be combined into a powerful general strategy for solving difficult molecularreplacement problems.
AMoRe program (Navaza, 1994) will give the correct answer much faster than the direct rotation search in CNS. However, for more difficult cases, the unique combination of enhanced signaltonoise ratio, spatial of the rotation function peaks, PC and the fast translation function is a very attractive and much faster alternative when compared with full sixdimensional searches.The time needed for the computation of the direct rotation function could be substantially reduced by using a well known optimization employed by several other programs (Castellano et al., 1992; Kissinger et al., 1999; Glykos & Kokkinidis, 2000). In CNS, a full structurefactor calculation is carried out for each sampling point in the direct rotation search. Alternatively, a fine sampling of the molecular transform and interpolation in could be employed. We expect that the resulting fast direct rotation search will be at least an order of magnitude faster. Therefore, the general strategy outlined above will be even more practical.
6. Program availability
CNS is available online at http://cns.csb.yale.edu/ and is free of charge for academic users. The procedures that are discussed in this paper are implemented in the two standard input files cross_rotation.inp (realspace rotation search and direct rotation search) and translation.inp (PC conventional translation function and fast translation function).
References
Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). Nucleic Acids Res. 28, 235–242. Web of Science CrossRef PubMed CAS Google Scholar
Brünger, A. T. (1990). Acta Cryst. A46, 46–57. CrossRef Web of Science IUCr Journals Google Scholar
Brünger, A. T. (1997). Methods Enzymol. 276, 558–580. CAS Google Scholar
Brunger, A. T., Adams, P. D., Clore, G. M., Gros, P., GrosseKunstleve, R. W., Jiang, J.S., Kuszewski, J., Nilges, N., Pannu, N. S., Read, R. J., Rice, L. M., Simonson, T. & Warren, G. L. (1998). Acta Cryst. D54, 905–921. Web of Science CrossRef CAS IUCr Journals Google Scholar
Buerger, M. J. (1959). Vector Space and its Application to Crystal Structure Analysis, New York: John Wiley & Sons. Google Scholar
Castellano, E. E., Oliva, G. & Navaza, J. (1992). J. Appl. Cryst. 25, 281–284. CrossRef CAS Web of Science IUCr Journals Google Scholar
Crowther, R. A. (1972). The Molecular Replacement Method, edited by M. G. Rossmann, pp. 173–178. New York: Gordon & Breach. Google Scholar
DeLano, W. L. & Brünger, A. T. (1995). Acta Cryst. D51, 740–748. CrossRef CAS Web of Science IUCr Journals Google Scholar
Fujinaga, M. & Read, R. J. (1987). J. Appl. Cryst. 20, 517–521. CrossRef Web of Science IUCr Journals Google Scholar
Glykos, N. M. & Kokkinidis, M. (2000). Acta Cryst. D56, 169–174. Web of Science CrossRef CAS IUCr Journals Google Scholar
GrosseKunstleve, R. W. & Brunger, A. T. (1999). Acta Cryst. D55, 1568–1577. Web of Science CrossRef CAS IUCr Journals Google Scholar
Hoppe, W. (1957). Acta Cryst. 10, 750–751. Google Scholar
Kissinger, C. R., Gehlhaar, D. K. & Fogel, D. B. (1999). Acta Cryst. D55, 484–491. Web of Science CrossRef CAS IUCr Journals Google Scholar
Navaza, J. (1987). Acta Cryst. A43, 645–653. CrossRef Web of Science IUCr Journals Google Scholar
Navaza, J. (1994). Acta Cryst. A50, 157–163. CrossRef CAS Web of Science IUCr Journals Google Scholar
Navaza, J. & Vernoslova, E. (1995). Acta Cryst. A51, 445–449. CrossRef CAS Web of Science IUCr Journals Google Scholar
Rossmann, M. G. & Blow, D. M. (1962). Acta Cryst. 15, 24–31. CrossRef CAS IUCr Journals Web of Science Google Scholar
Sheriff, S., Klei, H. E. & Davis, M. E. (1999). J. Appl. Cryst. 32, 98–101. Web of Science CrossRef CAS IUCr Journals Google Scholar
Tong, L. & Rossmann, M. G. (1990). Acta Cryst. A46, 783–792. CrossRef CAS Web of Science IUCr Journals Google Scholar
Walsh, M. A., Otwinowski, Z., Perrakis, A., Anderson, P. M. & Joachimiak, A. (2000). Structure Fold. Des. 8, 505–514. Web of Science CrossRef PubMed CAS Google Scholar
© International Union of Crystallography. Prior permission is not required to reproduce short quotations, tables and figures from this article, provided the original authors and source are cited. For more information, click here.