Density constraints and low-resolution phasing
^{a}LCM3B, UPRESA 7036 CNRS, Faculté des Sciences, Université Henri Poincaré Nancy I, 54506 Vandoeuvre-lès-Nancy, France, ^{b}Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Pushchino, Moscow Region 142292, Russia, and ^{c}Institut de Génétique et de Biologie Moléculaire et Cellulaire, CNRS, INSERM et Collège de France, Parc d'Innovation, BP 163, 67404 Illkirch CEDEX, CU de Strasbourg, France
^{*}Correspondence email: sacha@lcm3b.unancy.fr
Direct phasing needs additional information of a non-specific kind in order to select the correct phase set from all possible ones. This paper analyses the use of constraints which can be formulated in terms of electron-density values. One- and multidimensional histograms and connectivity properties are implemented as such constraints in density-modification procedures. These approaches usually cannot unambiguously select the best solution from a set of alternative phase variants. Nevertheless, they do allow the rejection of wrong solutions, and the use of cluster analysis and averaging on the remaining variants provides a good starting point for further phase-refinement procedures.
Keywords: density constraints.
1. Introduction
Nowadays, the direct determination of atomic coordinates from X-ray diffraction data is routine work for relatively small molecules. However, the construction of a macromolecular model usually requires the calculation of the Fourier synthesis
ρ(r) = Σ_{h∈S} F_{h} exp(iφ_{h}) exp(−2πi h·r)   (1)
at a limited resolution d = min_{h∈S}(1/|h|) and its interpretation in terms of an atomic model. In order to calculate the distribution ρ(r), phase values φ_{h} need to be assigned to the corresponding experimental structure-factor magnitudes F_{h}. In general, phasing methods use several sets of diffraction magnitudes measured under slightly different conditions: modified crystals (Perutz, 1956) or different wavelengths (for a recent review, see Hendrickson & Ogata, 1997). Otherwise, a known approximate model, usually atomic, of the whole molecule or a significant fraction of it is necessary (see, for example, the review by Rossmann, 1990). The problem of phase determination from a single set of magnitudes F_{h}, also known as direct phasing, is still a challenge for macromolecular crystallography. A collection of reviews on this subject was prepared for the ECM18 in Prague (Podjarny et al., 2000).
Direct phasing needs additional information of a non-specific kind in order to select the correct phase set from all possible ones. In this paper, we restrict ourselves to the information which may be formulated directly in terms of electron-density values. To be more precise, we consider constraints applied to values of a truncated Fourier series (1) calculated at grid nodes in the unit cell. These density constraints can be conventionally divided into several major groups depending on the way in which they are imposed.
1.1. Constraints on synthesis values at given points of the unit cell
Methods in this group use constraints on the density value based on the position of the grid point in the unit cell. For crystals with non-crystallographic symmetry this can be the condition that the density values at symmetrically related points are equal (Rossmann, 1972). Another possible constraint is the equality of the density values at all points of the solvent region (Bricogne, 1974). This gives a basis for a number of solvent-flattening and density-averaging methods. In both examples, additional geometrical information is used, namely the knowledge of the molecular envelope and/or of the non-crystallographic symmetry operators.
1.2. Constraints on synthesis values
These methods are based on the knowledge of typical values of electron-density distributions calculated at a given resolution. For any crystal, overly high or low density values of the synthesis calculated on an absolute scale are not possible. The same argument can be reformulated for root-mean-square deviations. In a more general form, this information may be represented as a Fourier synthesis histogram. It defines both the range of possible values and the probabilities of finding them in the unit cell (see §6 for more details). Special attention should be given to the fact that such properties can vary with the synthesis resolution. For example, the property of the electron-density distribution of being non-negative everywhere was successfully used to develop direct methods (Karle & Hauptman, 1950) and some density-modification methods (Qurashi, 1953; Hoppe, 1962 and many others; reviewed, for example, by Podjarny et al., 1996). Nevertheless, truncated Fourier series do not necessarily reveal the non-negativity even when calculated with the true phases. Furthermore, the Fourier synthesis histograms are different at different resolutions.
1.3. Topological properties
Another way to apply constraints to a Fourier synthesis is to restrict the shape of the region containing points with specified density values. For example, at high resolution one would expect to see a continuous image for the main chain with branches corresponding to the side chains if a proper density cutoff level is chosen. At low resolution, one would expect to see a number of compact domains showing the molecular packing. The expected shape of the molecule (if known) can also be considered as a constraint of this type. In these approaches, the criteria are purely geometric and the absolute values of the density distribution are not important.
This paper discusses the application of several density constraints for phasing at low resolution. Many of the tests discussed here were performed with calculated data in order to demonstrate clearly the character of the problem. More applications, including those with experimentally obtained data, are discussed in the original papers devoted to particular methods and are referred to below.
2. Low-resolution crystallographic images
2.1. Maps and reflection sets
At the usual resolution of about 3 Å or higher at which most crystallographic macromolecular models are constructed, the contrast of peaks in the electron-density maps is quite high and cannot be hidden by the absence of several low-resolution harmonics. Such a strong signal has for many years allowed crystallographers to avoid the particular problems of low-resolution data collection and phasing. In contrast, at a resolution of approximately 4 Å or lower crystallographic images do not have such strong details. Since they do not show detailed information, one could suppose that the syntheses are greatly influenced by a few of the strongest reflections and that it might be sufficient to phase these. On the other hand, data-set completeness was found to be crucial for the quality of low-resolution images (Podjarny et al., 1981; Rayment, 1983; Urzhumtsev et al., 1989; Urzhumtsev, 1991). Therefore, a special study was undertaken in order to check the results of phasing the strongest low-resolution reflections alone.
2.2. Test data
A test model was prepared simulating the position and a rough shape of the 50S particle from Haloarcula marismortui (H50S) phased directly at low resolution (Lunin et al., unpublished work; the experimental data were provided by A. Yonath). The space group is C222_{1} and the unit-cell parameters are a = 210, b = 300, c = 500 Å, with one molecule per asymmetric unit. This particular structure was chosen for tests because several different phasing methods had behaved abnormally. In order to obtain a test data set, five spheres approximating the shape and the position of the H50S particle were filled randomly by pseudo-atoms. Structure factors to 60 Å resolution (52 reflections in total) were calculated from this pseudo-atomic model and were used throughout this section to simulate experimental values (for more details, see Lunin et al., 1999).
The synthesis calculated with this complete data set, S, showed eight well separated molecular envelopes in the unit cell (Fig. 1a), consistent with the eight symmetrically related molecules. Since 52 reflections is currently too many for an exhaustive phase search (see §6.2), the 11 strongest reflections (subset S_{0}) were chosen from the set S. The synthesis calculated with these reflections (using the exact phases) is shown in Fig. 1(b). The molecular envelopes lost their shape and, more importantly, the map no longer shows separated individual molecules.
2.3. Seminvariant study
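Such deformation of molecular images can be explained in terms of seminvariant structure factors, i.e. those which do not change their magnitude and phase when an alternative origin permitted by the space group is used (Lunin et al., 1999). Let u be a permitted origin shift such that 2u = 0 (modulo 1), which is true for most cases. Then every synthesis ρ(r) can be represented by a sum of two components,
ρ(r) = ρ_{oi}(r) + ρ_{ov}(r),   (2)
with
ρ_{oi}(r) = ½[ρ(r) + ρ(r − u)],   ρ_{ov}(r) = ½[ρ(r) − ρ(r − u)].
It is easy to demonstrate that these partial syntheses ρ_{oi}(r) and ρ_{ov}(r) correspond to the Fourier series over the seminvariants (`oi' stands for origin independent) and over other reflections (`ov' stands for origin variable), respectively,
ρ_{oi}(r) = Σ_{h∈S_{oi}} F_{h} exp(iφ_{h}) exp(−2πi h·r),   (3)
ρ_{ov}(r) = Σ_{h∈S_{ov}} F_{h} exp(iφ_{h}) exp(−2πi h·r),   (4)
where S_{oi} contains the reflections of S for which h·u is an integer and S_{ov} contains the remaining reflections.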
The synthesis (3) shows the superimposition of two copies of a molecular image shifted by the vector u. The addition of the extra molecular copies results in merged envelopes rather than in separated molecular images. The synthesis (4) shows the true image surrounded (and possibly distorted) by its flipped and shifted copies.
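The decomposition can be checked numerically on a toy example. The following is a minimal sketch, assuming a one-dimensional periodic map sampled on 16 grid points and the origin shift u = ½; the function names `dft` and `split_by_origin_shift` and the Gaussian test density are illustrative choices, not part of the original work. It shows that the half-sum keeps only the even-order (seminvariant) harmonics and the half-difference keeps only the odd-order ones.

```python
import cmath
import math

def dft(values):
    """Plain discrete Fourier transform: F_h = sum_j rho_j * exp(2*pi*i*h*j/N)."""
    n = len(values)
    return [sum(v * cmath.exp(2j * math.pi * h * j / n) for j, v in enumerate(values))
            for h in range(n)]

def split_by_origin_shift(rho, shift):
    """Return (rho_oi, rho_ov) for a periodic 1D map and a shift given in grid steps."""
    n = len(rho)
    shifted = [rho[(j - shift) % n] for j in range(n)]      # rho(r - u)
    rho_oi = [0.5 * (a + b) for a, b in zip(rho, shifted)]  # origin-independent part
    rho_ov = [0.5 * (a - b) for a, b in zip(rho, shifted)]  # origin-variable part
    return rho_oi, rho_ov

# Hypothetical test density: one broad peak in a cell sampled at 16 points.
N = 16
rho = [math.exp(-0.5 * ((j - 5) / 1.5) ** 2) for j in range(N)]

# u = 1/2 corresponds to a shift by N/2 grid steps.
rho_oi, rho_ov = split_by_origin_shift(rho, N // 2)
F, F_oi, F_ov = dft(rho), dft(rho_oi), dft(rho_ov)

# Print |F_h| for the full map and for both components, for the first few h.
for h in range(4):
    print(h, round(abs(F[h]), 3), round(abs(F_oi[h]), 3), round(abs(F_ov[h]), 3))
```

In three dimensions the same argument applies with h·u in place of h/2: the factor exp(2πi h·u) equals +1 for seminvariant reflections and −1 for the others.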
2.4. Application of the seminvariant decomposition
For the model data set discussed above there are three possible independent vectors u for the origin shift: u_{1} = (½, 0, 0), u_{2} = (0, 0, ½) and u_{3} = (½, 0, ½); other origins in C222_{1} appear owing to the C-face-centred cell. Three seminvariant-removed sets of reflections, S_{1}–S_{3}, corresponding to u_{1}, u_{2} and u_{3}, respectively, as well as the set of 11 strongest reflections, S_{0}, are given in Table 1.

For each of these four data sets, a Fourier synthesis was calculated with the exact structure factors (exact in both magnitude and phase). Two extreme cases are presented in Fig. 2. If there is very little overlap of ρ(r) and ρ(r − u), as is the case for the vector u_{1}, the ρ_{ov}(r) component shows eight connected molecular regions even at a quite low cutoff level. However, an overlap of ρ(r) and ρ(r − u), as with the vector u_{3}, gives an endless continuous domain as was observed with direct phasing of the H50S particle.
If the selected set of strong reflections used for the phasing is dominated by S_{oi} or by S_{ov}, then the corresponding image may have features corresponding to one of these syntheses. Conversely, the set of structure factors for phasing can be chosen specifically to agree with a particular property and such selection will be discussed in §6. However, the complete set of reflections was used in the tests of §§3–5.
3. Density flattening at low resolution
For a Fourier synthesis calculated on a unit-cell grid, every grid point is characterized by its positional coordinates and by the value of the synthesis. When geometric information is also available, the position of the point can impose limitations on the synthesis value. This geometric information can be either the position with respect to the symmetry elements (e.g. Lunin, 1989) or, more usually, whether or not a point belongs to the molecular region. The latter constraint forms the basis of the solvent-flattening procedure, which was introduced in real space by Bricogne (1974) and became very popular after an automatic procedure for molecular-envelope determination was suggested (Wang, 1985). This method of phase improvement is based on the hypothesis that the electron-density distribution is more or less uniform in the intermolecular region. At low resolution this hypothesis can be extended to the molecular region, suggesting that the whole image is essentially a binary function or a molecular mask, equal to 1 inside the molecular envelope and equal to 0 outside. As far as this second hypothesis holds, there is the possibility that the envelope could provide a simple phase-extension procedure with far fewer parameters.
These two hypotheses were tested using simulated low-resolution data for the 50S ribosomal particle from Thermus thermophilus (Urzhumtsev et al., 1996; Podjarny et al., 1998); X-ray diffraction data and an envelope obtained by electron microscopy (Berkovitch-Yellin et al., 1990) were provided by A. Yonath. The space group is P4_{3}2_{1}2, with unit-cell parameters a = b = 496, c = 196 Å.
The following procedure was applied: (i) a density distribution was calculated at a given resolution (60, 40 or 30 Å) and (ii) a molecular envelope was defined as a set of unit-cell points with density values above a given threshold; the threshold was chosen in such a way that the region with higher values of the density occupied a given percentage of the unit-cell volume and consisted of a single domain. Then either the density distribution was flattened in the solvent region with the density inside the molecular envelope left unchanged (soft modification) or, in order to test the hypothesis of a flat envelope, the density inside the envelope was also flattened (hard modification).
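The two modifications can be illustrated with a minimal sketch. It assumes the map is given as a flat list of grid values, chooses the threshold only from the requested volume fraction (the single-domain check described above is omitted), and replaces flattened regions by their mean value; the function names and the toy values are hypothetical.

```python
def cutoff_for_fraction(values, fraction):
    """Cutoff such that roughly `fraction` of the grid points lie at or above it."""
    ordered = sorted(values, reverse=True)
    count = max(1, round(fraction * len(ordered)))
    return ordered[count - 1]

def flatten(values, cutoff, hard=False):
    """Soft modification: flatten the solvent only. Hard: flatten the envelope too.

    Assumes that at least one grid point lies below the cutoff.
    """
    inside = [v >= cutoff for v in values]
    solvent = [v for v, m in zip(values, inside) if not m]
    solvent_mean = sum(solvent) / len(solvent)
    if hard:
        molecule = [v for v, m in zip(values, inside) if m]
        molecule_mean = sum(molecule) / len(molecule)
        return [molecule_mean if m else solvent_mean for m in inside]
    return [v if m else solvent_mean for v, m in zip(values, inside)]

# Toy map of eight grid values; the envelope should cover 3/8 of the cell.
values = [0.1, 0.9, 0.2, 0.8, 0.0, 0.3, 0.7, 0.1]
cutoff = cutoff_for_fraction(values, 0.375)
soft = flatten(values, cutoff)             # solvent flattened, envelope untouched
hard = flatten(values, cutoff, hard=True)  # two-level (binary-like) map
```

Replacing each region by its mean keeps the average density of the map unchanged; a strict 0/1 mask, as in the binary-function hypothesis, differs from the hard-modified map only by a scale factor and a constant.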
In both cases, structure factors were calculated from the modified density distribution and compared with those calculated from the original model. If, after the density modification, the phases lead to a map correlation coefficient of 0.5 or higher (Lunin & Woolfson, 1993), we consider the phase extension to be successful. Only one cycle of density modification was performed in each case. The results of these tests, presented in Table 2, can be summarized as follows.


4. Constraints on the synthesis values; histograms
Constraints of this type do not depend on the position of the point in the unit cell and in particular do not depend on the character (solvent or protein) of a given point. The use of electron-density histograms provided the basis of a low-resolution ab initio phasing method developed by Lunin et al. (1990). This highlighted a number of features which were later found in many other approaches to ab initio phasing and it is, therefore, worth repeating the discussion here.
4.1. Electron-density histograms
To define the electron-density histogram υ(k) of a synthesis ρ(r), a set of density limits
ρ_{1} < ρ_{2} < … < ρ_{K + 1}
is chosen to cover the whole range of expected values of a given synthesis class (e.g. of a given resolution). Then each value ρ(r) is placed in a bin k such that ρ_{k} < ρ(r) < ρ_{k + 1} and the corresponding bin counter ν(k) is increased. After all points are treated, the normalized frequencies
υ(k) = ν(k)/N,   k = 1, …, K,
are calculated, where N is the total number of grid points. The histograms vary from crystal to crystal and depend on a number of parameters, but particularly on the resolution (Lunin, 1988). However, they do have a typical shape for protein crystals and can therefore be used as an additional source of information for phase improvement and direct phasing.
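The binning described above can be written down in a few lines. This is a minimal sketch, assuming half-open bins [ρ_{k}, ρ_{k+1}) with values on or beyond the outer limits clamped into the first or last bin (a convention chosen here so that every grid point is counted); the name `density_histogram` and the sample values are illustrative.

```python
import bisect

def density_histogram(values, limits):
    """Normalized frequencies of grid values in bins defined by increasing limits.

    limits holds K + 1 increasing numbers rho_1 < ... < rho_{K+1};
    the result holds K frequencies that sum to 1.
    """
    n_bins = len(limits) - 1
    counts = [0] * n_bins
    for v in values:
        k = bisect.bisect_right(limits, v) - 1  # limits[k] <= v < limits[k + 1]
        k = min(max(k, 0), n_bins - 1)          # clamp values at or beyond the limits
        counts[k] += 1
    return [c / len(values) for c in counts]

# Six hypothetical grid values and four equal bins between 0 and 1.
values = [0.05, 0.15, 0.15, 0.35, 0.95, 1.0]
limits = [0.0, 0.25, 0.5, 0.75, 1.0]
frequencies = density_histogram(values, limits)
```

Because only the values enter the calculation and not the grid positions, two maps that differ by an origin shift or by any permutation of grid points have identical histograms.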
4.2. Histograms and low-resolution solvent flattening
As shown by Lunin & Vernoslova (1991), improving the agreement of the calculated and the standard density histograms is the basis of most density-modification procedures,
ρ_{new}(r) = φ[ρ_{old}(r)].
Furthermore, the knowledge of two standard histograms at different resolutions defines the density-modification function φ that should be applied for phase extension. It is worth repeating that this function depends on the density value ρ_{old} and does not depend on the position r of the grid point where this value is calculated. In the case of low-resolution solvent flattening, discussed in §3, histograms were calculated for a range of resolutions between 20 and 90 Å. The comparison of these provided the density-modification function to be applied to the Fourier synthesis calculated at 90 Å resolution in order to reproduce the 20 Å resolution histogram. This function (Fig. 3) supports the idea of soft modification: low density values (which in this case correspond to the solvent region) should be flattened and higher values retained (in fact, the function suggests that the highest density values should be sharpened).
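One common way to build a position-independent modification function from a target histogram is to match cumulative distributions. The sketch below assumes this construction, which is not necessarily the one used in the cited work; the helper names `inverse_cdf` and `match_histogram` and the sample numbers are hypothetical. Each old value is mapped to the target value that has the same cumulative frequency, so equal old values always receive equal new values.

```python
import bisect

def inverse_cdf(limits, freqs, q):
    """Density value at cumulative fraction q, assuming uniform density within bins."""
    cumulative = 0.0
    for k, f in enumerate(freqs):
        if f > 0 and q <= cumulative + f:
            t = (q - cumulative) / f
            return limits[k] + t * (limits[k + 1] - limits[k])
        cumulative += f
    return limits[-1]

def match_histogram(values, limits, freqs):
    """Map each grid value to the target value with the same cumulative rank."""
    ordered = sorted(values)
    n = len(ordered)
    modified = []
    for v in values:
        lo = bisect.bisect_left(ordered, v)   # number of values strictly below v
        hi = bisect.bisect_right(ordered, v)  # number of values at or below v
        q = 0.5 * (lo + hi) / n               # mid-rank, identical for tied values
        modified.append(inverse_cdf(limits, freqs, q))
    return modified

# Hypothetical map values and a two-bin target histogram on [0, 2].
old = [0.3, -0.2, 0.1, 0.1, 0.9]
new = match_histogram(old, [0.0, 1.0, 2.0], [0.5, 0.5])
```

The resulting mapping is monotonic: it changes the distribution of values without changing their order, which is why the positions of peaks are preserved while their heights are rescaled.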
4.3. Model and data for direct phasing
The first test of the histogram-based direct-phasing method was performed with an artificially constructed atomic model. This model simulated the crystal of the elongation factor G (Chirgadze et al., 1991). In order to obtain this, an atomic model of a protein of similar molecular mass was placed without overlapping into the EF-G unit cell. Structure factors to 30 Å resolution (29 reflections) were calculated from this model and the magnitudes were used to simulate experimental data. The phases calculated from this model were taken as the correct phases with which the results could be compared. The histogram calculated from the exact 30 Å resolution synthesis was assumed to be known and was used as a source of phasing during the procedure.
4.4. Search procedure
A Monte Carlo procedure was applied with 100 000 phase sets at 30 Å resolution generated randomly and independently. For every phase set, a map was calculated using the given magnitudes and its electron-density histogram was compared with the exact one. Since the correct phase set was known, a phase correlation could be calculated for every phase set. The distribution of the phase correlation against the histogram correlation is shown in Fig. 4.
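The structure of such a search can be sketched on a one-dimensional toy problem. Everything below is an illustrative assumption: the three magnitudes, the uniform sampling of phases in [0, 2π), the equal-width bins and the use of the Pearson correlation as the histogram score are choices made for this sketch and are not taken from the original procedure.

```python
import math
import random

def synthesis(magnitudes, phases, n_grid):
    """1D synthesis rho(x_j) = sum_h 2 F_h cos(2 pi h x_j - phi_h), h = 1..H."""
    return [sum(2.0 * f * math.cos(2.0 * math.pi * (h + 1) * j / n_grid - p)
                for h, (f, p) in enumerate(zip(magnitudes, phases)))
            for j in range(n_grid)]

def histogram(values, lo, hi, n_bins):
    """Normalized frequencies in n_bins equal bins between lo and hi."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - lo) / (hi - lo) * n_bins)
        counts[min(max(k, 0), n_bins - 1)] += 1
    return [c / len(values) for c in counts]

def correlation(a, b):
    """Pearson correlation of two equally long sequences (0.0 if undefined)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den > 0 else 0.0

def random_search(magnitudes, target_hist, lo, hi, n_trials, n_grid=64, seed=0):
    """Score random phase sets by histogram correlation, best score first."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in magnitudes]
        rho = synthesis(magnitudes, phases, n_grid)
        score = correlation(histogram(rho, lo, hi, len(target_hist)), target_hist)
        results.append((score, phases))
    results.sort(key=lambda item: item[0], reverse=True)
    return results

# Hypothetical test case: the target histogram comes from known "true" phases.
magnitudes = [1.0, 0.6, 0.3]
true_phases = [0.3, 1.1, 2.0]
bound = 2.0 * sum(magnitudes)  # no synthesis value can exceed this in modulus
target = histogram(synthesis(magnitudes, true_phases, 64), -bound, bound, 10)
results = random_search(magnitudes, target, -bound, bound, n_trials=200)
```

Comparing the scores in `results` with the phase errors of the corresponding sets gives a small-scale analogue of the scatter plot described above; a high score by itself does not guarantee correct phases.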
4.5. Results
The analysis of the two-dimensional distribution of the histogram and the phase correlations (Fig. 4) leads to the following conclusions (see Fig. 5 as an illustration).

These observations are not specific to the histogram criterion but are also typical for other criteria used in low-resolution phasing (see, for example, Lunin, Lunina, Petrova et al., 2000). Although histogram-based phasing does have a number of difficulties such as a very high sensitivity to the set of structure factors used (§2) and the need for standard histograms, the tests show that such a method could have potential in direct phasing at low resolution.
4.6. Synthesis alignment
Two phase sets, while being formally different, may present the same solution of the phase problem but correspond to a different choice of the origin and/or enantiomorph. So, it is extremely important that the syntheses within a cluster are calculated with the same unit-cell origin before being averaged. The choice of the same origin and enantiomorph may be performed by means of a map-alignment procedure (Lunin & Lunina, 1996).
4.7. Histograms and wavelets
Recently, wavelet analysis has been introduced into the solution of the phase problem (Main, 1999; Main & Wilson, 2000; Wilson & Main, 2000; Lunin, 2000). The values of a Fourier synthesis calculated at grid nodes may be considered to some extent as wavelet coefficients for a special type of wavelet (Lunin, 2000). There exist numerous links between wavelet-based and density-constraint-based approaches, which are beyond the scope of the paper, as well as some other approaches which use grid density values as primary variables (Szöke et al., 1997). Here, we mention only that the use of the histograms as restraints on wavelet coefficients has given promising results in phase extension (Main & Wilson, 2000; Wilson & Main, 2000).
5. Mixed approaches
5.1. Multiple histograms
The phasing method suggested by Zhang & Main (1990) combines different types of information discussed above. They modified the density values inside the molecular region in order to match the density histogram to a known one and simultaneously flattened the density values in the solvent region. Such density modification relies on knowledge of the molecular region and may be considered as a modification using two different histograms. The first one is a usual histogram linked to a molecular region. The density flattening in the solvent region may be considered as histogram matching with a singular histogram which allows only one value (the mean solvent density value) for all the points in the solvent region. A natural generalization of this procedure would be the use for the solvent region of a more sophisticated histogram which takes into account variations of solvent-density values.
Conversely, if the molecular-region histogram is replaced by a singular histogram which allows only one value for all molecular-region points, the procedure of density modification becomes equivalent to flattening both outside and inside the molecular region, similar to the `hard' modification discussed above in §3.
5.2. Distance-dependent histograms
Attempts to obtain a better model for the solvent distribution led Schoenborn (1988) to a model in which the solvent density ρ depends on the distance r from the molecule (see also Cheng & Schoenborn, 1990). This was further developed by Urzhumtsev & Podjarny (1995a), who supposed that the solvent-density distribution at points at a distance r from the molecular border is not a constant but may be described by a histogram H(ρ;r). These histograms depend on the resolution d of the current Fourier synthesis. They could be used for density modification at any equidistant surface in a manner similar to the histogram matching.
To assign the solvent-density values more precisely, it is possible to use the observation that the density value could be lower for the points on a convex side of the envelope and be higher in cavities large enough to trap a solvent molecule. The obstacle for this is that at low resolution the exact molecular border is unknown. Nevertheless, these points can be discriminated on the basis of their position with respect to a series of envelopes calculated at different resolutions d_{m}, m = 1, …, M (Fig. 6). In this case, the histogram which describes the distribution of solvent-density values at the distance r from the precise molecular border may be replaced by a series of histograms H_{d1}(ρ;r_{1}), …, H_{dM}(ρ;r_{M}) operating with the distances r_{1}, …, r_{M} from the envelopes of corresponding resolution. Similarly, a set of histograms can also be calculated for points of the molecular region.
If the molecular envelope is known, for example, by electron microscopy, its position in the unit cell can be determined by molecular replacement (Urzhumtsev & Podjarny, 1995b). The molecular envelope can then be calculated at several lower resolutions, for which the histograms H_{dm}(ρ;r_{m}) are assumed to be known (r_{m} is the distance to the envelope calculated at the resolution d_{m}). The following procedure could be used to reconstruct a density distribution at a given resolution d from this set of flat envelopes.
For each point r in the unit cell

In this way, a set of flat envelopes and a corresponding set of histograms at different resolutions can be used to calculate a modulated density distribution which could provide better phases, as shown in §3. In the case of a single envelope and a single histogram H(ρ;r), the procedure will give a distance-dependent density distribution similar to that of Cheng & Schoenborn (1990).
Test calculations were performed on low-resolution data from the crystals of aldose reductase (Rondeau et al., 1992). The H(ρ;r) histograms for a 6 Å density distribution were assumed to be known and were calculated for several molecular envelopes for resolutions in the range 20–6 Å. As well as using the correct position for the envelopes, tests were also carried out in which the envelopes were not positioned exactly. For the correctly placed 6 Å resolution envelope, the structure factors calculated from a flat density were reasonably good, but this was not the case when the envelope was wrongly positioned. With a single histogram at 6 Å resolution, the procedure improved both magnitudes and phases, mostly in the higher resolution zones. When all four histograms, corresponding to 6, 8, 11 and 20 Å resolution envelopes, were used, the magnitudes and the phases for the entire resolution range were well predicted (Fig. 7), improving on the previous results (for further details, see Urzhumtsev & Podjarny, 1995a).
6. Topological features: connectivity
All mathematical methods for phasing are based on known properties of the density distribution at a given resolution. A Fourier synthesis can be viewed as a set of lines or surfaces joining points with the same density value. This suggests that it may be enough to recover a representative surface (or surfaces) rather than the exact density values at all points of the unit cell, providing a method to distinguish the correct solution from a number of noisy syntheses. A typical electron-density map should have the following properties.

In this section, we discuss methods to employ such prior knowledge in low-resolution phasing and the search procedures and criteria used.
6.1. Connectivity and possible criteria
For a given cutoff level κ we define the set of points Ω_{κ}, which may consist of several regions of the unit cell, by
Ω_{κ} = {r: ρ(r) > κ}.
Some practical details on the estimation of the connectivity of such regions calculated on a periodic grid in a crystal can be found in Lunin et al. (1999) and Lunin, Lunina & Urzhumtsev (2000).
The function ρ(r) may be calculated on different scales and it is convenient to define Ω_{κ} to be independent of the scale. In the following low-resolution studies we used two approaches, both based on the calculation of the volume for the corresponding region. Such a volume can be estimated from the number of points in the region, the unit-cell parameters and the total number of grid points in the unit cell.
Firstly, for a given cutoff level, the percentage of the unit-cell volume that it defines can be calculated by
p = 100% × vol(Ω_{κ})/V_{cell},
so that two syntheses can be analysed by comparing images corresponding to the same relative volume p. Secondly, it is possible to fix the absolute volume α (Å^{3} per residue) accounted for by Ω_{κ} per residue,
α = vol(Ω_{κ})/N_{res},
where V_{cell} is the unit-cell volume and N_{res} is the number of residues in the unit cell.
If the values of p or α are fixed then a change of the F_{obs} scale changes the absolute value of κ = κ(α) but does not alter the Ω^{α} = Ω_{κ(α)} region.
For a given synthesis, variation of κ changes the region Ω_{κ} and its topological features. A slow decrease in the cutoff level can lead to the appearance of new domains corresponding to lower peaks which will merge into connected regions and finally give a large connected domain with a number of holes of decreasing size inside. Thus, numerical values can be assigned to the different topological characteristics of a given synthesis. For example,

Such characteristics can be used as selection criteria for phase sets. The examples discussed below show that even in the simplest case of a single cutoff level such constraints are useful for phasing.
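The counting of connected domains on a periodic grid can be sketched as follows. This is a minimal illustration and not the procedure of the cited papers: it assumes the map is stored as a flat list in row-major order, uses six-neighbour connectivity with wrap-around at the cell edges, and picks the cutoff for a relative volume p (given here as a fraction rather than a percentage) as the midpoint between the last included and the first excluded grid value. All names are hypothetical.

```python
from collections import deque

NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

def connected_regions(rho, dims, cutoff):
    """Sizes (in grid points, largest first) of connected regions with rho > cutoff."""
    nx, ny, nz = dims
    inside = [v > cutoff for v in rho]
    seen = [False] * len(rho)
    sizes = []
    for start in range(len(rho)):
        if not inside[start] or seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        size = 0
        while queue:                      # breadth-first flood fill
            idx = queue.popleft()
            size += 1
            i, rest = divmod(idx, ny * nz)
            j, k = divmod(rest, nz)
            for di, dj, dk in NEIGHBOURS:
                # The modulo operations implement the periodicity of the crystal.
                nb = ((i + di) % nx) * ny * nz + ((j + dj) % ny) * nz + (k + dk) % nz
                if inside[nb] and not seen[nb]:
                    seen[nb] = True
                    queue.append(nb)
        sizes.append(size)
    return sorted(sizes, reverse=True)

def cutoff_for_volume_fraction(rho, p):
    """Cutoff for which about a fraction p of the grid points lies above it."""
    ordered = sorted(rho, reverse=True)
    count = max(1, min(len(ordered) - 1, round(p * len(ordered))))
    return 0.5 * (ordered[count - 1] + ordered[count])

# Toy 4 x 4 x 4 map: two points that touch across the cell boundary and one isolated.
dims = (4, 4, 4)
rho = [0.0] * 64
for i, j, k in ((0, 0, 0), (3, 0, 0), (2, 2, 2)):
    rho[i * 16 + j * 4 + k] = 1.0
kappa = cutoff_for_volume_fraction(rho, 3 / 64)
sizes = connected_regions(rho, dims, kappa)
```

Multiplying each size by the volume of one grid cell gives the region volumes, from which quantities such as the number of domains and the spread of their volumes at a given cutoff can be derived. When several grid points share the value at the cutoff rank, the selected volume can deviate from the requested one.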
6.2. Exhaustive searches
An exhaustive search in phase space can only be performed for a very small number of reflections, as the number of phase variants grows exponentially with the number of reflections. As discussed in §2, a synthesis calculated with a small number of reflections can have features which depend on the relative weight of seminvariant reflections and any selection criterion should therefore take this into account.
Conversely, for a given phasing criterion, an optimal set of structure factors can be chosen. In particular, when searching for the centre of molecules in the unit cell it is preferable to calculate syntheses without seminvariant reflections. As an illustration, we have calculated Fourier syntheses for each of the data sets S_{0} to S_{3} (Table 1) as well as the complete data set to 60 Å resolution and the connectivity for these has been analysed at different cutoff levels. Remarkably, the size and number of connected components changed differently for the different data sets as the cutoff level was varied (Table 3).

For the same set of structure factors, a number of syntheses with wrong phases were also calculated and for a given cutoff level κ some of these also gave the correct number of connected domains of equal size. However, a slight variation in κ led either to their merging or to the appearance of noise, allowing the wrong phase sets to be identified. A numerical criterion for the selection of phase sets can therefore be formulated in which a phase set is accepted as a possible solution for a given set of structure factors if it gives the correct number of equal connected domains at the lowest cutoff level. The examples (Table 3) show that this criterion is quite sensitive to the set of structure factors and in fact is not applicable at all for some data sets, e.g. S_{0}, the set of strongest reflections, where even the exact phases do not provide a synthesis with eight similar connected domains. On the other hand, the set S_{1} is particularly appropriate for this criterion and, in the synthesis calculated with the exact phases, the corresponding eight domains appear at high levels and do not merge until the cutoff level is quite low.
In order to check the selection power of this criterion, an exhaustive search procedure was applied for all four data sets S_{0}, S_{1}, S_{2} and S_{3}. For each set, all possible phase sets were checked (both values for centrosymmetric reflections and four values, ±π/4 and ±3π/4, for non-centrosymmetric reflections). For every phase set, the corresponding map was calculated and the lowest value for the cutoff level κ was found such that the image had eight connected domains (owing to symmetry, all of them had the same volume). The synthesis with the lowest value of κ was accepted as the solution and is shown in Fig. 8. The map correlation with the exact synthesis (also shown in Fig. 8) is 77%.
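The enumeration of phase sets in such a search can be sketched with a Cartesian product. The code below is only an illustration of the counting; the function names and the choice of 0 and π as the two allowed values of the centrosymmetric reflections in the example are assumptions (the allowed pair depends on the reflection and the space group).

```python
import itertools
import math

# The four sampled values for a non-centrosymmetric reflection.
ACENTRIC_VALUES = (math.pi / 4, 3 * math.pi / 4, -3 * math.pi / 4, -math.pi / 4)

def phase_sets(centric_choices, n_acentric):
    """Iterate over all phase sets.

    centric_choices: one (phi_a, phi_b) pair per centrosymmetric reflection.
    n_acentric: number of non-centrosymmetric reflections.
    """
    choices = list(centric_choices) + [ACENTRIC_VALUES] * n_acentric
    return itertools.product(*choices)

def number_of_phase_sets(n_centric, n_acentric):
    """Size of the search space: 2 values per centric, 4 per acentric reflection."""
    return 2 ** n_centric * 4 ** n_acentric

# Hypothetical example: three centric reflections (phases 0 or pi), two acentric.
centric = [(0.0, math.pi)] * 3
all_sets = list(phase_sets(centric, 2))
```

The exponential growth is the practical limit mentioned above: a hypothetical set of 11 non-centrosymmetric reflections would already give 4^{11} = 4 194 304 phase sets before any reduction by origin or enantiomorph fixing.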
Several important conclusions can be drawn. Firstly, the importance of the choice of structure factors must be stressed once again. More importantly, however, the tests show the usefulness of topological information in direct phasing. This could be developed further by using larger data sets which would require a more sophisticated search technique. One possible approach is discussed briefly in the following sections and the details can be found in Lunin, Lunina & Urzhumtsev (2000).
6.3. Random searches
As the number of phased reflections increases, a crystallographic image will show not only the molecular position, as in the previous case, but also the molecular shape. When using calculated data, it is difficult to model accurately the influence of experimental errors or the contribution of bulk solvent and tests with experimental data are therefore more significant. We have studied the use of topological information in direct phasing with experimentally measured structure-factor magnitudes for several cases where the atomic model was known. In all the tests, structure-factor phases calculated from the corresponding refined atomic model served as a reasonably good approximation to the phases of low-resolution reflections (Podjarny & Urzhumtsev, 1997).
6.4. Search procedure and selection criterion
A comprehensive search becomes impractical with a large number of reflections and either a random search or some other more systematic approach such as the use of a regular grid in the space of all phase sets (Gilmore et al., 1999) must be taken. Here, we have used a random search and, in order to further accelerate the search procedure, the connectivity criterion has been modified so that a single cutoff level was used in the analysis. In most of our tests, we have found that a suitable cutoff level, κ_{25}, corresponds to the region with a volume equal to 25 Å^{3} per residue. Obviously, this does not correspond to the volume of the protein molecule but simply provides nonoverlapping peaks corresponding to different molecules in a lowresolution Fourier synthesis. In general, if the cutoff level is lower, the envelopes for individual molecules begin to merge, although some exceptions will be discussed.
For each randomly generated phase set the Fourier synthesis was calculated and the number and size of connected regions for the cutoff level κ_{25} calculated. The phase set was selected only if the number of regions was equal to the number of molecules in the and if they were of approximately equal volume. As might be expected, a random search with such a selection criterion cannot give a single solution and statistical analysis of the selected syntheses is necessary. From a number of test applications we found that two different cases were possible, examples of which are discussed in the following sections.
6.5. Normal case: topologically based phasing for γ-crystallin IIIb
γ-Crystallin IIIb is a protein of 173 residues which crystallizes in space group P2_{1}2_{1}2_{1}, with unit-cell parameters a = 58.7, b = 69.5, c = 116.9 Å and two molecules per asymmetric unit. Among 100 000 randomly generated phase sets calculated to 24 Å resolution (28 reflections), 576 provided a synthesis satisfying the given criterion, i.e. the κ_{25} cutoff level showed eight connected regions of very similar volume (a 10% discrepancy between the volumes of the two quartets of domains was allowed because of the non-crystallographic symmetry linking them). This ensemble of selected phase sets had a higher concentration of good phase variants than the original random ensemble. The averaging of the 576 selected variants gave a map with a correlation coefficient of 0.89 with the exact map at 24 Å resolution. More details of this test are given in Lunin, Lunina & Urzhumtsev (2000) and Lunin, Lunina, Petrova et al. (2000).
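Because the Fourier synthesis is linear in its coefficients, averaging maps is equivalent to averaging the complex structure factors F_{h}exp(iφ_{h}) over the selected phase sets. The sketch below illustrates this, together with a map correlation coefficient computed in reciprocal space. It rests on several assumptions that are not taken from the text: the phase sets have already been brought to a common origin and enantiomorph, phases are in radians, the F_{000} term is excluded and all listed reflections carry equal weight. The function names are hypothetical.

```python
import cmath
import math


def average_structure_factors(magnitudes, phase_sets):
    """Average the complex coefficients F_h * exp(i * phi_h) over phase sets.

    magnitudes: list of F_h; phase_sets: list of phase lists (radians),
    each in the same reflection order as magnitudes.
    """
    n = len(phase_sets)
    return [
        f * sum(cmath.exp(1j * phases[h]) for phases in phase_sets) / n
        for h, f in enumerate(magnitudes)
    ]


def map_correlation(coeffs_a, coeffs_b):
    """Correlation of two zero-mean syntheses from their Fourier coefficients
    (Parseval's theorem; equal weights per reflection are an assumption)."""
    num = sum((a * b.conjugate()).real for a, b in zip(coeffs_a, coeffs_b))
    den = math.sqrt(sum(abs(a) ** 2 for a in coeffs_a)
                    * sum(abs(b) ** 2 for b in coeffs_b))
    return num / den if den > 0 else 0.0
```

Reflections on which the selected variants disagree are automatically down-weighted by the averaging, since their complex contributions partly cancel.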
6.6. Special case: topologically based phasing for RNAse Sa
The complete set of low-resolution data for RNAse Sa (Ševcik et al., 1991) was kindly provided by E. Dodson. The protein crystallizes in space group P2_{1}2_{1}2_{1}, with two molecules per asymmetric unit and unit-cell parameters a = 64.9, b = 78.3, c = 38.8 Å. It contains 96 residues and the complete data set to 18 Å resolution consists of 29 reflections.
In contrast to the case for γ-crystallin IIIb, we found that the synthesis calculated with experimental magnitudes and the model phases does not show eight separated domains of approximately the same volume at any cutoff level. This is a consequence of the dense packing of the molecules, possibly coupled with the contribution from bulk solvent (for a schematic illustration, see Fig. 9). This is confirmed by the observation that the synthesis calculated with model magnitudes and phases does show the eight separate envelopes. A study was performed to check whether the use of this idealized condition with experimental data would still provide the correct solution.
The calculations were performed at 18 Å resolution and a phase set was selected if at κ_{25} the corresponding synthesis showed eight connected domains of similar size. From 100 000 randomly generated phase sets, 558 were selected and the syntheses averaged. The correlation of the averaged map with the correct map at 18 Å resolution was 0.75 (0.91 at 24 Å). At a high cutoff level the final map showed eight separate molecular envelopes corresponding to the molecular positions. More details of this test can also be found in Lunin, Lunina & Urzhumtsev (2000).
6.7. Topologically based phasing: conclusions
We have found that the topological criterion, expressed through the number of connected domains of similar size, does allow direct phasing at low resolution in quite different cases. As with other selection criteria for direct-phasing procedures, the criterion does not provide a single solution but enriches the population of phase sets in those close to the correct solution. The topological constraints are quite weak and the selected phase sets vary widely in phase quality. However, simple averaging over the selected phase sets gives a map which can be used for model positioning, for phase improvement and for some preliminary envelope analysis. As before, cluster analysis can be applied to the selected data sets in order to improve the map further.
7. Conclusions
A variety of density constraints have been shown to be useful for low-resolution phasing, both for phase improvement and for direct phase determination. A number of common features have been discovered and, in particular, no search criterion has been found that unambiguously selects the correct phase set. Nevertheless, the selection of variants from a random population leads to a new population with a higher proportion of good phase sets. Simple averaging of these phase sets can give a reasonable macromolecular image and cluster analysis can further improve its quality. The strategy for phase searching should therefore be the statistical treatment of a relatively large number of selected variants rather than a search for the single best variant.
Since density-constraint phasing methods do not use an explicit macromolecular model, they are less influenced by the problem of bulk solvent. Recent results show the potential of topological criteria in a direct phasing protocol that may eventually lead to automated structure determination.
Acknowledgements
The authors thank C. Lecomte for his interest in this work. The work was supported in part by RFBR grants 970448319 and 990790461, by the CNRS through a fellowship (VYL) and the collaborative project RAS (VL)–CNRS (ADP, AGU), by the UHP, Nancy, by the Institut National de la Santé et de la Recherche Médicale and the Hôpital Universitaire de Strasbourg (HUS). They are also grateful to the groups headed by Yu. Chirgadze, D. Moras, E. Dodson and A. Yonath for providing the experimental diffraction data. The authors thank Dr J. Wilson for her valuable help in improving the manuscript.
References
Berkovitch-Yellin, Z., Wittmann, H. G. & Yonath, A. (1990). Acta Cryst. B46, 637–643.
Bricogne, G. (1974). Acta Cryst. A30, 395–405.
Cheng, X. & Schoenborn, B. P. (1990). Acta Cryst. B46, 195–208.
Chirgadze, Yu. N., Brazhnikov, E. V., Garber, M. B., Nikonov, S. V., Fomenkova, N. P., Lunin, V. Yu., Urzhumtsev, A. G., Chirgadze, N. Yu. & Nekrasov, Yu. V. (1991). Dokl. Acad. Nauk SSSR, 320, 488–491.
Gilmore, C., Dong, W. & Bricogne, G. (1999). Acta Cryst. A55, 70–83.
Hendrickson, W. A. & Ogata, C. M. (1997). Methods Enzymol. 276, 494–522.
Hoppe, W. (1962). Acta Cryst. 15, 13–17.
Karle, J. & Hauptman, H. (1950). Acta Cryst. 3, 181–187.
Lunin, V. Y. (1988). Acta Cryst. A44, 144–150.
Lunin, V. Y. (1989). Acta Cryst. A45, 501–505.
Lunin, V. Y. (2000). Acta Cryst. A56, 73–84.
Lunin, V. Y. & Lunina, N. L. (1996). Acta Cryst. A52, 365–368.
Lunin, V. Y., Lunina, N. L., Petrova, T. E., Skovoroda, T. P., Urzhumtsev, A. G. & Podjarny, A. D. (2000). Acta Cryst. D56, 1223–1232.
Lunin, V. Y., Lunina, N. L. & Urzhumtsev, A. G. (1999). Acta Cryst. A55, 916–925.
Lunin, V. Y., Lunina, N. L. & Urzhumtsev, A. G. (2000). Acta Cryst. A56, 375–382.
Lunin, V. Y., Urzhumtsev, A. G. & Skovoroda, T. P. (1990). Acta Cryst. A46, 540–544.
Lunin, V. Y. & Vernoslova, E. A. (1991). Acta Cryst. A47, 238–243.
Lunin, V. Y. & Woolfson, M. M. (1993). Acta Cryst. D49, 530–533.
Main, P. (1999). Abstracts of the XVIIIth IUCr Congress and General Assembly, p. 183. Abstract M12.BB.001.
Main, P. & Wilson, J. (2000). Acta Cryst. D56, 618–624.
Perutz, M. F. (1956). Acta Cryst. 9, 867–873.
Podjarny, A. D., Lunina, N., Urzhumtsev, A. G., Vernoslova, E. A., Petrova, T. & Lunin, V. (1998). Abstracts of the American Crystallographic Association Meeting, p. 74. Abstract 11.03.08.
Podjarny, A. D., Rees, B. & Urzhumtsev, A. G. (1996). Methods in Molecular Biology, Vol. 56, Crystallographic Methods and Protocols, edited by C. Jones, B. Milloy & M. R. Sanderson, pp. 205–226. Totowa, New Jersey: Humana Press.
Podjarny, A. D., Schevitz, R. W. & Sigler, P. B. (1981). Acta Cryst. A37, 662–668.
Podjarny, A. D. & Urzhumtsev, A. G. (1997). Methods Enzymol. 276, 641–658.
Podjarny, A., Urzhumtsev, A. & Usón, I. (2000). Reviews on Direct Phasing for the XVIIIth European Crystallography Meeting. In the press.
Qurashi, M. M. (1953). Acta Cryst. 6, 103.
Rayment, I. (1983). Acta Cryst. A39, 102–116.
Rondeau, J.-M., Tête-Favier, F., Podjarny, A., Reymann, J.-M., Barth, P., Biellmann, J.-F. & Moras, D. (1992). Nature (London), 355, 469–472.
Rossmann, M. G. (1972). The Molecular Replacement Method. New York, London, Paris: Gordon & Breach.
Rossmann, M. G. (1990). Acta Cryst. A46, 73–82.
Schoenborn, B. P. (1988). J. Mol. Biol. 201, 741–749.
Ševcik, J., Dodson, E. J. & Dodson, G. G. (1991). Acta Cryst. B47, 240–253.
Szöke, A., Szöke, H. & Somoza, J. R. (1997). Acta Cryst. A53, 291–313.
Urzhumtsev, A. G. (1991). Acta Cryst. A47, 794–801.
Urzhumtsev, A. G., Lunin, V. Y. & Luzyanina, T. B. (1989). Acta Cryst. A45, 34–39.
Urzhumtsev, A. G. & Podjarny, A. D. (1995a). Jnt CCP4/ESF–EACBM Newslett. Protein Crystallogr. 32, 12–16.
Urzhumtsev, A. G. & Podjarny, A. D. (1995b). Acta Cryst. D51, 888–895.
Urzhumtsev, A. G., Vernoslova, E. A. & Podjarny, A. D. (1996). Acta Cryst. D52, 1092–1097.
Wang, B.-C. (1985). Methods Enzymol. 115, 90–112.
Wilson, J. & Main, P. (2000). Acta Cryst. D56, 625–633.
Zhang, K. & Main, P. (1990). Acta Cryst. A46, 41–46.
© International Union of Crystallography. Prior permission is not required to reproduce short quotations, tables and figures from this article, provided the original authors and source are cited.