research papers
Refinement of severely incomplete structures with maximum likelihood in BUSTER–TNT
^{a}Global Phasing Ltd, Sheraton House, Castle Park, Cambridge CB3 0AX, England, ^{b}European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, England, and ^{c}Laboratory of Molecular Biophysics, Department of Biochemistry, Oxford University, South Parks Road, Oxford OX1 3QU, England
^{*}Correspondence email: pietro@biop.ox.ac.uk
BUSTER–TNT is a macromolecular refinement package. BUSTER assembles the structural model, scales observed and calculated structure-factor amplitudes and computes the model likelihood, whilst TNT handles the stereochemistry and NCS restraints/constraints and shifts the atomic coordinates, B factors and occupancies. In real space, in addition to the traditional atomic and bulk-solvent models, BUSTER models the parts of the structure for which an atomic model is not yet available (`missing structure') as low-resolution probability distributions for the random positions of the missing atoms. In BUSTER, the structure-factor distribution in the complex plane is a two-dimensional Gaussian centred around the structure factor calculated from the atomic, bulk-solvent and missing-structure models. The errors associated with these three structural components are added to compute the overall spread of the Gaussian. When the atomic model is very incomplete, modelling of the missing structure and the consistency of the BUSTER statistical model help structure building and completion because (i) the accuracy of the overall scale factors is increased, (ii) the bias affecting the atomic model is reduced by accounting for some of the scattering from the missing structure, (iii) the addition of a spatial definition to the source of incompleteness improves on traditional Luzzati and σ_{A}-based error models and (iv) the program can perform selective density modification in the regions of unbuilt structure alone.
Keywords: macromolecular refinement; maximum-likelihood methods; refinement of incomplete structures.
1. Introduction
BUSTER–TNT (Bricogne & Irwin, 1996) is a macromolecular refinement package. BUSTER (Bricogne, 1993a) assembles the structural model, scales observed and calculated structure-factor amplitudes and computes the model likelihood. The structural model in BUSTER can include a description of the parts of the structure for which an atomic model is not yet available (`missing structure'). TNT (Tronrud et al., 1987; Tronrud, 1992, 1996, 1997, 1999) receives the likelihood derivatives from BUSTER, evaluates the stereochemistry and NCS restraint residuals and their derivatives, and shifts the coordinates, B factors and occupancies of the atomic model to maximize their likelihood while satisfying the restraints.
Both the use of maximum likelihood (ML) and the modelling of the missing structure help in overcoming the major drawbacks encountered by classical methods [least squares (LS) + difference maps] when dealing with the refinement and completion of incomplete structures.
The ML method and the missing-structure parameterization are based on a statistical treatment of model structure factors by techniques that constitute the core of BUSTER (Bricogne, 1988, 1993a). Their purpose is to generate and exploit quantitative descriptions of the statistical behaviour of structure factors resulting from the two main sources of randomness present in the situation described above:

At any given stage of the refinement or completion process, model structure factors do not have a `calculated value' as implied by the usual notation F_{calc}; instead, they have a probability distribution. In practice, these distributions are often approximated by Gaussians and are hence described in terms of the expectation of any collection of random structure factors and by the covariance matrix of fluctuations around these expectations (Bricogne, 1988).
This statistical picture takes into account the phase uncertainty present in these model structure factors to drive the refinement of the model. Instead of treating their phases as constants when trying to improve the fit between the model amplitudes and the observed ones, BUSTER calculates the marginal probability distribution of model amplitudes and seeks to maximize the value taken by this marginal probability over the observed amplitudes. This value is called the likelihood of the current model, Λ, and its maximization with respect to all or any of the parameters describing the current model is called the ML refinement of those parameters.
Unlike in the LS method, the initial probability distribution for the model structure factors may contain an explicit dependence on parameters that influence the variance of the distribution, and such parameters may be refined along with the others. These parameters are referred to as imperfection parameters. It is through such refinable variance-modulating parameters that the ML method is able to keep a safe distance between the observed amplitudes and the amplitudes of the traditional F_{calc}s and thus avoid overfitting. Experimental information on the phases attached to the observed amplitudes can further assist in this bias removal.
The ML refinement of the atomic model (in conjunction with TNT) and the parameterization of the missing structure by means of a low-resolution real-space distribution are naturally associated in this formalism, in the sense that the probability distribution of the model structure factors, and hence the likelihood Λ of the current model, depends symmetrically on the atomic parameters (x, y, z, B, occupancies) describing the current atomic model and on other parameters, the Lagrange multipliers (λs), describing the extra detail currently conveyed by the positional distribution of the atoms in the missing structure. Since the model structure factors are sums of contributions from the atomic model, the missing structure and the bulk solvent, the gradient of the log-likelihood (LL), log(Λ), with respect to the expectations of the model structure factors can be redirected (by the chain rule) either towards the atomic parameters on which the atomic-model contribution depends, or towards the Lagrange multipliers on which the missing-structure contribution depends, or towards both.
The present paper focuses on the refinement of the atomic model in BUSTER–TNT, while a manuscript in preparation will describe phase refinement by ML variation of the missing atoms' λs in BUSTER.
2. Symbols used in this paper
Four types of real-space distributions are dealt with, all of which are handled in BUSTER as CCP4-format maps sampled on a crystallographic grid with N_{x}, N_{y} and N_{z} points along the crystallographic axes. We list here the symbols for these distributions (omitting any subscripts) as an aid to the reader.


3. The BUSTER–TNT structural model
The structural model used for the distribution of the atoms in the crystal in BUSTER has three components (or channels; Bricogne, 1988).^{1}

Under the hypothesis that the three structural components are independent, their sum, the structure factor for the whole structure, is also treated as a random vector and is distributed in the complex plane according to a Gaussian: the convolution of the Gaussians for the three individual components, whose offset and variance are the sums of the component offsets and variances.
The assumption that the errors of the atomic model and the bulk solvent are independent breaks down at low resolution, given that the Babinet opposite of the bulk-solvent envelope is most often computed by masking around the atomic model. Therefore, the error model at low resolution can be overly pessimistic, because reducing the total variance would require a negative covariance term, which a plain sum of variances cannot provide.
3.1. The partial structure
The partial structure or fragment is the set of atoms for which positional coordinates, B factors and occupancies are available and can be refined. The electron density computed from this atomic model is denoted by ρ^{frag}(x).
3.1.1. The Luzzati distribution of F_{h}^{frag}
We assume that the distribution of each and every atom is a Gaussian centred around the mean atomic position and that the atoms are all distributed independently of one another. It is also assumed that B factors and occupancy values have no errors associated with them, in the sense that they follow a degenerate probability distribution with zero variance (Luzzati, 1952). The same Luzzati model can be used to model errors in the placement of a rigid-body `fragment'.
Under these hypotheses for the distributions of positions, B factors and occupancies, the structure factor for the fragment, F_{h}^{frag}, follows a two-dimensional Gaussian distribution around an offset with a given variance; these first and second moments (offset and variance) are computed as follows (Bricogne & Irwin, 1996).

When the fragment coordinate error is very small, the imperfection factor B_{impf}^{frag} tends towards zero, the structure-factor offset tends to the unattenuated fragment structure factor and the associated variance shrinks towards zero. In this limiting case, and provided that the model contains no other source of error, the present formalism tends towards a standard LS problem.
At the opposite end of the imperfection regime, with large coordinate errors, B_{impf}^{frag} tends to infinity, the offset tends towards zero and the variance equals the full mean fragment intensity in the resolution bin. The imperfection factor erases all previous knowledge of atom localization, and the only remaining information comes from the number, type and temperature factors of the pool of missing atoms. This is the Wilson regime.
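The two limiting regimes can be checked numerically. The offset and variance expressions are not reproduced in this text, so the sketch below assumes a common Luzzati-type form: the offset is D·F_{h}^{frag} with D = exp(−B_{impf}^{frag} d*^{2}/4), and the variance is (1 − D^{2}) times the mean fragment intensity of the resolution bin. The helper name luzzati_moments is hypothetical and BUSTER's exact expressions may differ.

```python
import numpy as np


def luzzati_moments(f_frag, b_impf, d_star_sq, mean_intensity):
    """Hypothetical helper: offset and variance of F_h^frag under an assumed
    Luzzati model with attenuation D = exp(-b_impf * d*^2 / 4)."""
    d = np.exp(-b_impf * d_star_sq / 4.0)  # Luzzati attenuation, 0 < D <= 1
    offset = d * f_frag  # attenuated fragment structure factor
    variance = (1.0 - d ** 2) * mean_intensity  # spread taken from the bin's mean intensity
    return offset, variance


f = 120.0 * np.exp(0.7j)  # an arbitrary fragment structure factor
for b in (0.0, 20.0, 1.0e6):
    off, var = luzzati_moments(f, b, d_star_sq=0.25, mean_intensity=1.0e4)
    print(f"B_impf = {b:9.1f}   |offset| = {abs(off):8.3f}   variance = {var:10.3f}")
```

With B_{impf}^{frag} = 0 the offset equals the fragment structure factor and the variance vanishes (the LS limit); for very large B_{impf}^{frag} the offset vanishes and the variance equals the mean intensity of the bin (the Wilson limit).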
3.2. The missing structure
Within BUSTER, the atoms in the missing structure are described by adopting the random scatterer model, introduced to crystallography by Bricogne (1984). According to this model, the missing atoms are all equally and independently distributed at random, following the real-space distribution q^{miss}(x), which can itself be modulated subject to restraints (Bricogne, 1997; details of the modulation of q^{miss} will be given in a paper in preparation).
During the BUSTER–TNT refinement of the atomic model no modulation of q^{miss} is carried out, and for all practical purposes this distribution can be thought of as a prior probability distribution for the random positions of the atoms of the missing structure. In this case, q^{miss} is denoted m^{miss}(x) or simply m(x). With this probability model one can compute not only the expectation of the low-resolution electron density for the missing structure but also a statistical variance around that expectation (the latter is captured in the variance of F_{h}^{miss}; see §3.2.4).
The calculation of m^{miss}(x) is described in the next sections. Similar techniques can be used to compute the envelopes for the whole macromolecule or for the bulk solvent. The algorithms outlined here are described in more detail by Roversi et al. (2000).
3.2.1. Uniform prior m^{miss}(x)
The simplest choice for the prior probability distribution of the atoms in the missing structure is to exclude them from the regions that already contain a reliable atomic model; this approach brings into the statistical model the notion that a number of atoms are missing and that they are equally likely to be anywhere except where other atoms have been placed already.
The uniform prior distribution is defined in four steps.
We stress that this distribution is uniform outside the regions occupied by the model, hence the name `uniform prior', but its shape is not uniform; only in the absence of any atomic model would this be a truly uniform distribution throughout the unit cell.
Note also that if the bulk-solvent envelope is chosen to fill up all the space left empty by the macromolecular model, the missing-structure envelope and the bulk-solvent envelope overlap. Although they can still differ in the B parameter used in the blurring step, this overlap introduces very large correlations between the scaling parameters for these two components.
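The four steps themselves are not reproduced in this text. As a rough sketch only, assuming they amount to (i) masking the grid points near fragment atoms, (ii) taking the complement, (iii) blurring with a Gaussian of parameter B and (iv) normalizing to unit sum, a uniform prior could be computed on the grid of a cubic cell as follows. The function name, masking radius, blurring value and FFT-based blurring are illustrative choices, not BUSTER's implementation.

```python
import numpy as np


def uniform_prior(atoms_frac, n_grid, cell_edge, mask_radius=3.0, b_blur=200.0):
    """Hypothetical sketch of a 'uniform prior' m(x) on an n^3 grid of a cubic cell.
    atoms_frac: (n_atoms, 3) fractional coordinates of the fragment atoms;
    mask_radius (Angstrom) and b_blur (Angstrom^2) are illustrative values."""
    axis = np.arange(n_grid) / n_grid
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
    occupied = np.zeros((n_grid,) * 3, dtype=bool)
    for atom in np.asarray(atoms_frac, dtype=float):
        delta = grid - atom
        delta -= np.round(delta)  # minimum-image convention in the periodic cell
        occupied |= cell_edge * np.sqrt((delta ** 2).sum(axis=-1)) < mask_radius
    prior = (~occupied).astype(float)  # allowed region: outside the fragment mask
    # Gaussian blurring in reciprocal space with the factor exp(-B d*^2 / 4)
    h = np.fft.fftfreq(n_grid, d=1.0 / n_grid)
    hh, kk, ll = np.meshgrid(h, h, h, indexing="ij")
    d_star_sq = (hh ** 2 + kk ** 2 + ll ** 2) / cell_edge ** 2
    prior = np.fft.ifftn(np.fft.fftn(prior) * np.exp(-b_blur * d_star_sq / 4.0)).real
    prior = np.clip(prior, 0.0, None)  # remove tiny negative ripples from the FFT
    return prior / prior.sum()  # normalize to a probability distribution
```

The blurring step is where, as noted above, a missing-structure envelope and an overlapping bulk-solvent envelope could still differ.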
3.2.2. Model-based non-uniform prior m(x)
Sometimes a rough guess is available as to the placement of a subset of atoms, such as a protein loop or domain or a bound ligand, but the model tentatively built for the same atoms is questionable.
An envelope m^{miss}(x) can then be built around these ill-defined atoms, and the same atoms can be omitted from the atomic model. The real-space picture of the crystal then comprises the bulk-solvent envelope, the atomic model for the trusted traced atoms and the missing-structure envelope. The latter is localized around the tentatively placed atoms; it represents our prior expectation about their positions but does not retain any of the high-resolution details that are being assessed.
The prior distribution is computed in the same way as in the uniform-prior case, except for the definition of the initial binary mask: a mask is built around the total structure (fragment and missing structure), from which a mask around the fragment alone is then subtracted. By suitably choosing the masking radii, this protocol allows the generation of an envelope around a missing loop or domain or a layer of partially ordered solvent around the fragment.
Again, depending on the PDB models, masking radii and blurring factors used to compute the missing-structure and bulk-solvent distributions, their boundaries can sometimes overlap; in the majority of cases, however, the default parameters for masking and blurring minimize the extent of the overlaps and thus the potential for spurious high-resolution features in those regions.
3.2.3. Map-based non-uniform prior m(x)
Even when no tentative atomic model for the missing structure is available, some rough idea about its placement can be retrieved from the presence of high values of the density (or of its local r.m.s.d.) in noisy electron-density maps, using techniques first developed to perform phase improvement by density modification, either via the local average of the electron density (Wang, 1985; Leslie, 1987) or from its local fluctuation around the mean (Reynolds et al., 1985; Jones et al., 1991; Abrahams & Leslie, 1996; Abrahams, 1997).
Once the local density fluctuation, ω_{ρ}(x), has been obtained, one may use the homographic exponential model for the whole macromolecular envelope (for details, see Roversi et al., 2000),
Histogramming of ω_{ρ}(x) gives the value of μ_{macrom} that corresponds to the appropriate solvent fraction, while the value of β_{macrom} is taken as proportional to the reciprocal r.m.s. error of the starting density (Blow & Crick, 1959),
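FoM_{h} being the figure of merit,
$$\mathrm{FoM}_{h} = \left| \int_{0}^{2\pi} P_{h}(\varphi)\, \exp(i\varphi)\, \mathrm{d}\varphi \right|,$$
computed from the current phase probability distribution P_{h}(φ).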
Then, to exclude the fragment region from the prior probability distribution for the missing atoms, a homographic exponential model of the fragment density is needed. The local fluctuation of the fragment density can be computed from ρ^{frag}(x) as outlined above; the values of β_{frag} and μ_{frag} are computed from the r.m.s. error of the fragment model density and its fractional volume, in the same way as for the whole macromolecule. The homographic exponential model for the fragment density is then
Finally, the homographic exponential model for the missing structure envelope is proportional to the probability that position x lies in the whole macromolecular envelope but not in the fragment envelope,
and the prior distribution for the placement of the atoms of the missing structure is
An example of such a map-based missing-atom distribution, computed for the missing domain 1 of CD55 using a partial atomic model for domains 2, 3 and 4 of CD55, is shown in Fig. 1.
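The equations for the homographic exponential models did not survive in this text. Assuming that the homographic exponential takes the logistic form P(x) = 1/{1 + exp[−β(ω(x) − μ)]} (our assumption, not a quotation of Roversi et al., 2000), and treating the macromolecular and fragment envelopes as independent, the combination described above could be sketched as follows. All function names are hypothetical, and μ is chosen from the histogram of ω as the value exceeded by the requested fraction of grid points.

```python
import numpy as np


def homographic_exponential(omega, beta, mu):
    """Assumed logistic form of a homographic exponential envelope model."""
    return 1.0 / (1.0 + np.exp(-beta * (omega - mu)))


def threshold_for_fraction(omega, fraction):
    """Histogram-based choice of mu: the value exceeded by the requested fraction
    of grid points (e.g. the macromolecular fraction, 1 - solvent content)."""
    return float(np.quantile(omega, 1.0 - fraction))


def map_based_prior(omega_map, omega_frag, beta_macrom, mu_macrom, beta_frag, mu_frag):
    """Hypothetical sketch: prior for the missing atoms, proportional to the
    probability of lying in the macromolecular envelope but not in the fragment
    envelope (the two envelopes treated as independent), normalized to unit sum."""
    p_macrom = homographic_exponential(omega_map, beta_macrom, mu_macrom)
    p_frag = homographic_exponential(omega_frag, beta_frag, mu_frag)
    envelope = p_macrom * (1.0 - p_frag)
    return envelope / envelope.sum()
```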
3.2.4. The distribution of F_{h}^{miss}
Unlike the Luzzati error model, the structure-factor distribution for the missing structure, when expanded using an Edgeworth series (Bricogne, 1984), would, strictly speaking, contain terms beyond the second order. These terms are neglected, so that F_{h}^{miss} follows a Gaussian distribution around an offset with a given variance. The structure-factor offset and variance for the missing structure are computed as follows.
In the expression for the variance, the contribution arising from the statistical nature of the distribution of random atoms is computed using the products between the real and imaginary components of the unitary structure factors of the real-space missing-atom distribution. Another term in the same formula represents the average scattering power of the missing atoms, while K_{impf}^{miss} adjusts the `granularity' of the missing-atom scattering: by the central limit theorem, the missing-atom variance is greater if the scattering comes from a single `random scatterer' (Bricogne, 1984) of scattering power f than if the same amount of scattering is produced by N `random scatterers' of scattering power f/N, all distributed according to one and the same missing-atom positional probability distribution.
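The granularity argument can be checked with a small Monte Carlo experiment. The sketch below is a one-dimensional toy, not BUSTER's formula: N random scatterers, each of scattering power f/N, are drawn independently from a Gaussian positional distribution (centre 0.5, width sigma in fractional units), and the sampled mean and variance of F_{h} are compared with f·U_{h} and f^{2}(1 − |U_{h}|^{2})/N, where U_{h} is the Fourier transform (unitary structure factor) of the positional distribution. Function names and parameter values are hypothetical.

```python
import numpy as np


def sampled_moments(n_scatterers, f_total, h, sigma, n_trials=200_000, seed=0):
    """Monte Carlo mean and variance of F_h = sum_j (f_total / N) exp(2 pi i h x_j),
    with N scatterers drawn independently from a 1-D Gaussian positional
    distribution centred at x = 0.5 (fractional units) with width sigma."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.5, sigma, size=(n_trials, n_scatterers))
    f_each = f_total / n_scatterers
    f_h = (f_each * np.exp(2j * np.pi * h * x)).sum(axis=1)
    mean = f_h.mean()
    return mean, np.mean(np.abs(f_h - mean) ** 2)


def predicted_moments(n_scatterers, f_total, h, sigma):
    """Closed form: mean f*U_h and variance f^2 (1 - |U_h|^2) / N, where U_h is the
    Fourier transform (unitary structure factor) of the positional distribution."""
    u_h = np.exp(1j * np.pi * h) * np.exp(-2.0 * np.pi ** 2 * h ** 2 * sigma ** 2)
    return f_total * u_h, f_total ** 2 * (1.0 - abs(u_h) ** 2) / n_scatterers


for n in (1, 4):
    print(n, sampled_moments(n, 1.0, 1, 0.1)[1], predicted_moments(n, 1.0, 1, 0.1)[1])
```

The expectation is the same for any N, while the variance falls as 1/N: splitting a fixed amount of scattering over more independent scatterers makes the missing-atom contribution less `grainy'.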
The Luzzati-like variance modulation term in (16), which depends on B_{impf}^{miss}, is introduced as a means of overcoming the shortcomings arising from neglecting the covariances between the channels. Its functional form was selected so as to allow the refinement of the random-atom variance contribution as a function of resolution. Again, when B_{impf}^{miss} refines towards 0, the variance from the missing structure vanishes, while when this parameter is large the components of the variance tensor tend towards the full second moments of the missing-atom distribution. Unlike the average intensities that enter the calculation of the fragment (and solvent) variances [see (7) and (20)], the second moments in (16) can extend to high resolution, and only the presence of the modulation factor ensures the fall-off of the missing-structure variance with resolution.
3.3. The bulk solvent
The calculation of the bulk-solvent contribution to the crystal scattering by several methods of increasing complexity has been described in the crystallographic literature (Glykos & Kokkinidis, 2000, and references therein). The bulk-solvent density in BUSTER–TNT is modelled using an envelope uniformly filled with a given solvent electron density and thermally smeared with a fixed temperature factor B_{s}.
3.3.1. The distribution of F_{h}^{solv}
The structure-factor distribution for the bulk solvent, F_{h}^{solv}, follows a two-dimensional Gaussian distribution around an offset with a given variance. The offset and variance are computed as follows.
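The offset and variance expressions are not reproduced in this text. For the offset only, a common flat-solvent construction consistent with the description above (uniform density inside an envelope, smeared with a fixed B_{s}) can be sketched with NumPy FFTs. The cubic cell, the helper name and the parameter names k_sol and b_sol with their default values are assumptions, and the variance term is not modelled.

```python
import numpy as np


def bulk_solvent_offset(solvent_mask, cell_edge, k_sol=0.35, b_sol=50.0):
    """Hypothetical helper: structure factors of a flat bulk-solvent model in a
    cubic cell of edge cell_edge (Angstrom). solvent_mask is a 3-D 0/1 array on the
    crystallographic grid; k_sol (e/A^3) and b_sol (A^2) are illustrative values.
    The result is indexed like NumPy FFT output, i.e. by Miller indices (h, k, l)."""
    volume = cell_edge ** 3
    rho = k_sol * solvent_mask.astype(float)  # uniform density inside the envelope
    # F(h) = integral of rho(x) exp(+2 pi i h.x) dV; ifftn uses exp(+...) and divides by N,
    # so multiplying by the cell volume gives the sum times the voxel volume V / N.
    f_solv = volume * np.fft.ifftn(rho)
    hkl = np.meshgrid(*[np.fft.fftfreq(n, d=1.0 / n) for n in solvent_mask.shape], indexing="ij")
    d_star_sq = sum(idx ** 2 for idx in hkl) / cell_edge ** 2  # cubic cell only
    return f_solv * np.exp(-b_sol * d_star_sq / 4.0)  # thermal smearing with fixed B_s
```

At h = 0 the result is simply the solvent density times the solvent volume, which gives a quick sanity check on the FFT normalization.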

4. The BUSTER likelihood function
In the next sections we examine more closely the calculation of the BUSTER likelihood function, its gradient and its Hessian. First, the dependence of the model structure-factor offset and variance on the scaling parameters is analysed. We then briefly present the Rice likelihood functions and describe the incorporation of external phases via Hendrickson–Lattman coefficients (Hendrickson & Lattman, 1970).
4.1. Scale factors
All quantities entering a likelihood function need to be on an observational scale; prior to describing the likelihood function, in this section we describe the overall scale and temperature factors that are used to bring quantities from an absolute scale to an observational scale. The values of these overall scale and temperature factors are refined in BUSTER by maximizing their likelihood.
The three different contributions to the BUSTER–TNT model may also need to be scaled to one another; relative model scale and temperature factors are also refined jointly with the overall scaling parameters during the ML scaling in BUSTER.
4.1.1. The overall scaling and temperature factors
BUSTER overall scaling parameters are of three different types: the scale factor K_{overall}, the isotropic scaling B factor B_{iso} and the components of the anisotropic scaling tensor β, which enter the isotropic and anisotropic overall scaling factors,
where (notice the sign of the exponents)
The parametrization of the anisotropic scaling factor is slightly unusual but follows the convention adopted in TNT (Tronrud et al., 1987; Tronrud, 1997), which constrains the elements of β to make the tensor traceless and, of course, to obey crystal symmetry.
4.1.2. Component-specific scaling parameters
The calculated structure factor is a sum of contributions from three components, individually scaled to one another by component-specific scale factors. Because the fragment component usually represents the main contribution to the total structure factor, these scale factors are all expressed relative to the fragment, which is assumed to be on an absolute scale. The expressions for the calculated structure factor and its associated variance on an absolute scale are given in (22) and (23).
During scaling, all the component-specific imperfection parameters are refined together with the scaling parameters, and the full covariance between them is taken into account.
4.2. The Rice distribution and the BUSTER likelihood function
The BUSTER–TNT distribution of F_{h} is a Gaussian centred around the offset (22) with variance (23). When integrated over the phase, it gives the conditional distribution of the structure-factor amplitude, the Rice distribution
The Rice distribution is a function of the set P of scaling, imperfection and structural parameters, P = {P_{scal}, P_{impf}, P_{struct}}, in that these parameters enter the definitions of and . The details of the centric and acentric Rice distributions implemented in BUSTER–TNT are described by Bricogne (1997).
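As a concrete illustration of the acentric case, the sketch below evaluates the Rice log-likelihood at the observed amplitude, with the observed variance added to the model variance as described in the following paragraph. It assumes the standard acentric form p(F | D, Σ) = (2F/Σ) exp[−(F^{2} + D^{2})/Σ] I_{0}(2FD/Σ), where D is the modulus of the total offset and Σ the total variance, and that σ_{obs}^{2} is added to Σ; the centric case and BUSTER's exact conventions (Bricogne, 1997) are not reproduced, and the function names are hypothetical. log I_{0} is computed from its integral representation so that only NumPy is needed.

```python
import numpy as np


def log_i0(x, n_theta=512):
    """log I0(x) for x >= 0 from I0(x) = (1/pi) * integral_0^pi exp(x cos t) dt,
    evaluated with the midpoint rule after factoring out exp(x) to avoid overflow."""
    x = np.asarray(x, dtype=float)
    theta = (np.arange(n_theta) + 0.5) * np.pi / n_theta
    return x + np.log(np.exp(x[..., None] * (np.cos(theta) - 1.0)).mean(axis=-1))


def acentric_rice_loglik(f_obs, sig_obs, d_model, sigma_model):
    """Acentric Rice log-likelihood evaluated at F_obs.
    d_model: modulus of the total model offset (observational scale);
    sigma_model: total model variance E|F - offset|^2;
    sig_obs**2 is added to sigma_model (assumed convention)."""
    f_obs = np.asarray(f_obs, dtype=float)
    d_model = np.asarray(d_model, dtype=float)
    sigma = np.asarray(sigma_model, dtype=float) + np.asarray(sig_obs, dtype=float) ** 2
    x = 2.0 * f_obs * d_model / sigma
    return np.log(2.0 * f_obs / sigma) - (f_obs ** 2 + d_model ** 2) / sigma + log_i0(x)
```

Under the independence assumption, the total LL is the sum of this function over all acentric reflections.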
Assuming independence between reflections, the likelihood of a reflection should be computed by integrating the Rice distribution over the observed amplitude with a probability distribution involving both the observed amplitude and its variance. To avoid full two-dimensional integration and to simplify the calculation, for each observed reflection BUSTER instead computes the likelihood by consulting the Rice distribution at the value of the observed structure-factor amplitude. This approach effectively amounts to discarding the uncertainty in the observed amplitude. Because the observed uncertainty is usually much smaller than the model error,^{5} it is possible to approximate the integration over the observed amplitude by adding the observed variance (as a scalar tensor) to the variance obtained from the model.
The function maximized during the refinement of the parameters is the log-likelihood (LL) of the parameters in view of the observed data, i.e. the sum over all reflections of the logarithms of the individual likelihoods.
4.3. External phase distribution
Incorporation of external phase information can help refinement, especially in cases of limited resolution and/or data quality (Pannu et al., 1998). When the external phase information is cast in the form of Hendrickson–Lattman ABCD coefficients (Hendrickson & Lattman, 1970), its inclusion in the distribution for the overall structure factor is achieved very simply by adding the `external' Hendrickson–Lattman coefficients to the `endogenous' coefficients obtained from the BUSTER distribution for the overall structure factor. For this approach to be possible, both phase distributions must share the same origin. The resulting phase probability distribution is used instead of a uniform weight when integrating over the phase in (24). This process gives rise to the `elliptic' Rice likelihood function derived by Bricogne (1997).
4.4. The expected structure-factor amplitude F_{h}^{xpct}
The expected structure-factor amplitude, F_{h}^{xpct}, is defined as the first moment of the distribution of the amplitude of the total structure factor F_{h}, i.e. of the Rice distribution.
It is F_{h}^{xpct}, on the observational scale, that BUSTER compares with the observed amplitude F_{h}^{obs} to compute R-factor statistics (see §5.1).
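For an acentric reflection the first moment can be obtained by numerical quadrature of F·p(F). The sketch assumes the acentric Rice form p(F | D, Σ) = (2F/Σ) exp[−(F^{2} + D^{2})/Σ] I_{0}(2FD/Σ), with D the modulus of the total offset and Σ the total variance on the observational scale; the centric case is omitted and the function names are hypothetical. Rewriting exp[−(F^{2} + D^{2})/Σ] I_{0}(x) as exp[−(F − D)^{2}/Σ] e^{−x}I_{0}(x) keeps every factor finite.

```python
import numpy as np


def rice_pdf_acentric(f, d, sigma, n_theta=512):
    """Acentric Rice density (2F/S) exp(-(F^2 + D^2)/S) I0(2FD/S), computed as
    (2F/S) exp(-(F - D)^2/S) * exp(-x) I0(x) with x = 2FD/S to avoid overflow."""
    f = np.asarray(f, dtype=float)
    x = 2.0 * f * d / sigma
    theta = (np.arange(n_theta) + 0.5) * np.pi / n_theta
    scaled_i0 = np.exp(x[..., None] * (np.cos(theta) - 1.0)).mean(axis=-1)  # exp(-x) I0(x)
    return (2.0 * f / sigma) * np.exp(-((f - d) ** 2) / sigma) * scaled_i0


def expected_amplitude(d, sigma, n_grid=4001):
    """F^xpct as the first moment of the acentric Rice distribution (quadrature)."""
    f = np.linspace(0.0, d + 10.0 * np.sqrt(sigma), n_grid)
    p = rice_pdf_acentric(f, d, sigma)
    return float((f * p).sum() / p.sum())  # grid spacing cancels in the ratio
```

Two limits provide checks: for D = 0 the mean is √(πΣ)/2 (the Wilson case), and for D much larger than √Σ it approaches D + Σ/(4D), so F^{xpct} always slightly exceeds D.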
5. Refinement statistics
BUSTER computes several kinds of statistics that serve as a measure of the agreement between the model and the observed data. These statistics are evaluated in resolution bins for free and working sets of reflections and are monitored during the course of the calculation.^{6}
5.1. R factors
The R factors (both overall and in resolution bins) are computed using the expectation of the model amplitude rather than a calculated value, on the grounds that the expectation is the current model prediction for an observation. For reflections around resolution d*,
Note that F^{xpct} is on the observational scale because it is the first moment of the Rice distribution for the structure-factor amplitude (27).
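The binned R-factor formula is not reproduced in this text. Assuming the conventional definition R = Σ|F^{obs} − F^{xpct}| / ΣF^{obs}, with F^{xpct} in place of a calculated amplitude, and equal-count resolution bins (both assumptions), a minimal sketch is:

```python
import numpy as np


def binned_r_factors(f_obs, f_xpct, d_star_sq, n_bins=10):
    """Conventional R = sum|F_obs - F_xpct| / sum F_obs, overall and in
    equal-count resolution bins ordered by d*^2 (low to high resolution)."""
    f_obs, f_xpct = np.asarray(f_obs, dtype=float), np.asarray(f_xpct, dtype=float)
    order = np.argsort(d_star_sq)
    per_bin = [np.abs(f_obs[b] - f_xpct[b]).sum() / f_obs[b].sum()
               for b in np.array_split(order, n_bins)]
    overall = np.abs(f_obs - f_xpct).sum() / f_obs.sum()
    return overall, per_bin
```

Calling it separately on the working and free reflections gives the two sets of statistics monitored during refinement.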
5.2. Log-likelihood gain
To monitor the changes in likelihood of the parameters introduced by the ML refinement, it is useful to consider the log-likelihood gain (Bricogne, 1997), or LLG for short, defined as the logarithm of the ratio of the likelihood of the current parameters to the likelihood of the starting parameters.
The LLG is zero at the beginning of the calculation and is positive whenever the likelihood of the current parameters is higher than the likelihood of the starting parameters. As with the R factors, the overall log-likelihood gain is split over the working and the test sets of reflections.
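Because reflections are treated as independent, the LLG is the sum of per-reflection log-likelihood differences. A minimal sketch, assuming per-reflection log-likelihoods are available for the starting and the current parameters and that a boolean array flags the free (test) reflections; the function name is hypothetical:

```python
import numpy as np


def log_likelihood_gain(ll_current, ll_start, is_free):
    """LLG = log(L_current / L_start), i.e. the sum over reflections of per-reflection
    log-likelihood differences, reported separately for the working and free sets."""
    diff = np.asarray(ll_current, dtype=float) - np.asarray(ll_start, dtype=float)
    is_free = np.asarray(is_free, dtype=bool)
    return diff[~is_free].sum(), diff[is_free].sum()
```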
In the context of ML refinement, the log-likelihood gain is a more natural statistic than the R factor and therefore allows a more objective and sensitive assessment of progress. On the other hand, the LLG cannot be used to compare different refinements because, unlike the familiar R factor, it is relative to the likelihood of the model at the starting cycle.
5.3. Correlation coefficients between structure-factor amplitudes
To check the agreement with the data of structure-factor amplitudes computed from selected subsets of the BUSTER structural model, the program computes and outputs correlation coefficients between several kinds of structure-factor amplitudes in resolution bins. Each type of correlation coefficient relates two sets of amplitudes, F_{1} and F_{2},
The correlation coefficients, like the R factors, do not directly contain information on the quality of the phases, being computed from the amplitudes alone; unlike the R factors, the correlation coefficients are scale-independent. If the refinement and completion are successful, the correlation coefficients should increase (see Fig. 2 for their progression in the case of the CD55 refinements, starting from a two-domain-only model and ending at the full four-domain structure).
Depending on the particular F_{1} and F_{2}, each correlation coefficient contains information about a specific aspect of data quality, model quality or model-to-data agreement.
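The exact correlation-coefficient formula is not reproduced in this text. The sketch assumes the standard Pearson coefficient computed per resolution bin, which is unchanged when either amplitude set is multiplied by a positive overall scale, consistent with the scale independence noted above. Equal-count binning and the function name are our choices.

```python
import numpy as np


def binned_correlation(f1, f2, d_star_sq, n_bins=10):
    """Pearson correlation between two amplitude sets in equal-count resolution
    bins; unchanged if either set is multiplied by a positive overall scale."""
    f1, f2 = np.asarray(f1, dtype=float), np.asarray(f2, dtype=float)
    order = np.argsort(d_star_sq)
    return [float(np.corrcoef(f1[b], f2[b])[0, 1]) for b in np.array_split(order, n_bins)]
```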
5.4. log σ_{A} and overall Luzzati B_{impf}
The value of log σ_{A} (Srinivasan, 1966; Read, 1986) and that of the overall Luzzati imperfection, B_{impf}, are obtained by computing, respectively, the intercept and slope of the linear regression in d*^{2},
The values of F^{xpct} that enter (31) include contributions from the atomic model, the missing structure and the bulk solvent, so it is not possible to relate the value of B_{impf} obtained here directly to the mean-square coordinate error of the partial structure.
6. BUSTER–TNT refinement
Refinement in BUSTER–TNT is carried out in much the same way as in TNT alone, except for three main differences.

6.1. Algorithmic details
The first task of the BUSTER program is to generate the distributions for the missing structure and the solvent.^{7} During refinement of the atomic model those distributions are kept constant, as already mentioned. Overall and component-specific scaling and imperfection parameters and fragment atomic parameters are then refined. At each cycle, the following tasks are performed.

A summary of the parameters involved in scaling is shown in Table 1.

6.2. Approximations in the derivatives
For the sake of computational expediency, some approximations are made while calculating the derivatives of the total loglikelihood during both scaling and structural parameter refinement.
The gradient component of the log-likelihood with respect to any refined atomic parameter p can be written as
BUSTER–TNT neglects the second term in the sum on the right-hand side of (32), that is, the dependence of the variance on the parameter p.^{8}
The calculation of the second derivative of the log-likelihood needs to accommodate the fact that the TNT module rfactor can only compute and handle the second derivative of an LS residual, while the second derivative of the log-likelihood is needed.
To overcome this limitation, an ad hoc factor is calculated in BUSTER and passed to TNT, such that the TNT module rfactor effectively computes an approximation to the curvature of the log-likelihood rather than the curvature of the LS residual. The curvature of the log-likelihood can be approximated by a scalar quantity if we neglect the dependence on the variances and take the average of the absolute values of the diagonal elements only,
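The formula for this scalar is not reproduced in this text. The sketch below only illustrates the idea under stated assumptions: it uses the acentric Rice log-likelihood as a function of the model amplitude D with the variance held fixed, estimates each diagonal second derivative by central finite differences, and averages their absolute values. Function names, the step size and the use of finite differences are our choices; how BUSTER converts this scalar into the factor passed to rfactor is not shown (comparing it with the LS curvature of 2 per reflection would be one conceivable route, but that is an assumption).

```python
import numpy as np


def rice_loglik(f_obs, d, sigma, n_theta=512):
    """Acentric Rice log-likelihood as a function of the model amplitude d,
    written as log(2F/S) - (F - d)^2 / S + log(exp(-x) I0(x)) with x = 2 F d / S."""
    f_obs = np.asarray(f_obs, dtype=float)
    x = np.asarray(2.0 * f_obs * d / sigma, dtype=float)
    theta = (np.arange(n_theta) + 0.5) * np.pi / n_theta
    log_scaled_i0 = np.log(np.exp(x[..., None] * (np.cos(theta) - 1.0)).mean(axis=-1))
    return np.log(2.0 * f_obs / sigma) - (f_obs - d) ** 2 / sigma + log_scaled_i0


def scalar_curvature(f_obs, d, sigma, step=1.0e-3):
    """Mean absolute second derivative of the log-likelihood with respect to d,
    by central finite differences, with the variance sigma held fixed."""
    second = (rice_loglik(f_obs, d + step, sigma)
              - 2.0 * rice_loglik(f_obs, d, sigma)
              + rice_loglik(f_obs, d - step, sigma)) / step ** 2
    return float(np.mean(np.abs(second)))
```

For well-determined reflections (D much larger than √Σ) each diagonal term approaches −2/Σ, the curvature of a Gaussian with variance Σ/2.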
7. BUSTER–TNT refinement of a severely incomplete structure: CD55
As an example of the use of BUSTER–TNT to refine and help complete severely incomplete structures, we illustrate how the program was used to solve the structure of human CD55 starting from a 50% incomplete molecular-replacement model.
CD55 is a four-domain 28 kDa human complement regulator that accelerates the decay of the alternative and classical pathway convertases, thus protecting self-cells from complement-mediated lysis. The structure of a construct consisting of domains 3 and 4 only (hereafter CD55_{34}) was solved first (Williams et al., 2003). Subsequently, crystals of CD55 domains 1–4 (hereafter CD55_{1234}) were obtained; they belong to either of two crystal forms: A (2.3 Å data; PDB code 1ojv) and B (2.8 Å data; PDB code 1ojw). Both forms belong to space group P1, with two molecules in the asymmetric unit and about 50% solvent content. For details of data quality and processing, see Lukacik et al. (2004).
7.1. Phasing of the CD55_{1234} structure
The molecular-replacement program MOLREP (Vagin & Teplyakov, 2000) was used to place two independent copies of the crystallographic model for CD55_{34} (Williams et al., 2003). This model for CD55_{34} was refined in BUSTER–TNT, modelling the missing domains 1 and 2 with the homographic exponential model described in §3.2.3.
After BUSTER–TNT refinement and model building on domains 3 and 4, the phases were used to locate heavy atoms in a Pt derivative of crystal form A and an Au derivative of crystal form B. SHARP (de La Fortelle & Bricogne, 1997) heavy-atom refinement and phasing of these Pt and Au models in the two crystal forms separately did not lead to interpretable maps (Lukacik et al., 2004).
An iterative phasing procedure was then followed, cycling several times over the following three steps:
7.2. Analysis of the CD55_{34} and CD55_{234} BUSTER–TNT refinements
In this section, we analyse BUSTER–TNT refinements against the CD55_{1234} data. The refinements were performed with and without the missingstructure model at two different stages of model building: the initial refinements of domains 3 and 4 (50% incompleteness) and an intermediate stage where domains 2–4 were built and refined but domain 1 was still missing (25% incompleteness).
7.2.1. CD55_{34}
The molecular-replacement solution for the two copies of domains 3 and 4 was rigid-body refined and then subjected to B-factor-only refinement followed by joint positional and B-factor refinement with tight NCS restraints in both crystal form A and crystal form B. After rebuilding of the model for domains 3 and 4, a final round of tightly NCS-restrained refinement gave the model for domains 3 and 4 discussed in this section.
At the beginning of each refinement, the low-resolution distribution for the missing domains 1 and 2 was computed as a homographic exponential model (see §3.2.3) based on the nominal solvent content of 50% and variance filtering of the map obtained from the current phases. The bulk-solvent model was based on the Babinet opposite of the mask around domains 3 and 4 and therefore overlapped with the low-resolution missing-structure model. In a separate series of refinements, the same protocol was followed in the absence of the missing-structure model, refining the model for domains 3 and 4 with the bulk-solvent model computed, as mentioned above, by masking around domains 3 and 4.
Table 2 reports the overall scale factors for the final refinements of domains 3 and 4, in the presence and absence of the missing-structure model, for both crystal forms of CD55_{1234}. Modelling the missing structure clearly helps scaling of the higher-resolution data set, while the improvement for the 2.8 Å data in crystal form B is marginal.

However, for both crystal forms, a significant improvement in the phases (and in the quality of the derived electron-density and residual maps) is brought about by using the missing-structure model. This improvement is illustrated in Fig. 3 by the plot (for form B) of the average phase error,
In Fig. 3 we also report the phase error for REFMAC5 and CNS refinements of the same CD55_{34} model in crystal form B; they suffer from a larger phase error, which is expected given the 50% incompleteness and the limited resolution.
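The phase-error formula used for Fig. 3 is not reproduced in this text. A common definition, assumed here, is the (optionally weighted) mean absolute difference between model phases and a set of reference phases, with each difference wrapped into [0°, 180°]; which reference phases and weights were used for Fig. 3 is not stated here, and the function name is hypothetical.

```python
import numpy as np


def mean_phase_error(phi_model_deg, phi_ref_deg, weights=None):
    """Mean absolute phase difference in degrees, each difference wrapped to [0, 180]."""
    delta = np.abs(np.asarray(phi_model_deg, dtype=float)
                   - np.asarray(phi_ref_deg, dtype=float)) % 360.0
    delta = np.minimum(delta, 360.0 - delta)  # e.g. 350 vs 10 degrees counts as 20
    return float(np.average(delta, weights=weights))
```

Wrapping matters because phases are circular: without it, a 350° versus 10° comparison would be scored as a 340° error instead of 20°.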
7.2.2. CD55_{234}
To the refined model for domains 3 and 4, the model for domain 2, taken from a different crystal form (PDB code 1ojy), was added, thus generating a 25% incomplete model for CD55_{1234}. Again, BUSTER–TNT refinement and rebuilding were carried out separately, with and without a missing-structure model, and the phases and amplitude correlation coefficients were scored at the end. The distribution modelling the missing domain 1 of CD55, based on the phases obtained from the partial atomic model for domains 2, 3 and 4, is shown in Fig. 1. The effect of modelling the missing structure is still visible in the phase-error and phased-correlation-coefficient plots, whilst the fit to the amplitudes is essentially as good with or without the missing-atoms model (see Fig. 2).
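The observation that phased correlation coefficients discriminate between models while the amplitude fit does not can be illustrated with textbook definitions: the Pearson correlation of amplitudes, and the real part of the normalized complex inner product for phased coefficients. These definitions are assumptions and may differ in detail from those used for Fig. 2; the function names are hypothetical.

```python
import numpy as np

def amplitude_cc(f_a, f_b):
    """Pearson correlation between two sets of structure-factor amplitudes."""
    return float(np.corrcoef(np.asarray(f_a, float), np.asarray(f_b, float))[0, 1])

def phased_cc(amp_a, phi_a_deg, amp_b, phi_b_deg):
    """Re(sum F_a F_b*) / sqrt(sum |F_a|^2 * sum |F_b|^2) for complex structure factors."""
    F_a = np.asarray(amp_a, float) * np.exp(1j * np.deg2rad(phi_a_deg))
    F_b = np.asarray(amp_b, float) * np.exp(1j * np.deg2rad(phi_b_deg))
    num = np.sum(F_a * np.conj(F_b)).real
    den = np.sqrt(np.sum(np.abs(F_a) ** 2) * np.sum(np.abs(F_b) ** 2))
    return float(num / den)

amps = [100.0, 50.0, 20.0]
phases = [0.0, 90.0, 45.0]
cc_amp = amplitude_cc(amps, amps)                               # identical amplitudes
cc_same = phased_cc(amps, phases, amps, phases)                 # identical phases
cc_shift = phased_cc(amps, phases, amps, np.add(phases, 90.0))  # same amplitudes, phases off by 90 degrees
```

In the toy case the amplitude correlation is perfect in both comparisons, yet the phased coefficient falls from 1 to 0 when the phases are wrong, which is why phase-sensitive scores reveal differences that an amplitude fit can hide.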
At this level of incompleteness (25%), the phases are good enough to be subjected to a further step of ML phase refinement with maximum-entropy constraints in BUSTER (Bricogne et al., in preparation), to improve the density for the missing domain 1. The resulting phase error is shown in Fig. 3.
This analysis confirms earlier studies demonstrating that BUSTER–TNT can be used successfully to bootstrap structure building and completion from an initial incomplete molecular-replacement solution. At 50% incompleteness, additional phase information (in the form of NCS or multi-crystal averaging, or poor experimental phases) is required for structure solution; at 25% incompleteness or lower, the refinement will lead to structure completion, provided the incomplete model is accurate.
Examples of successful use of BUSTER–TNT to overcome 10–45% incompleteness and/or reduce phase bias in the refinement of macromolecular models can be found in the literature (e.g. Dessen et al., 1999; Fischmann et al., 1999; Bard et al., 2000; Koronakis et al., 2000; Somers et al., 2000; Ng et al., 2000; von Delft et al., 2001; Han et al., 2001; Vicens & Westhof, 2002; Hanzal-Bayer et al., 2002; Benach et al., 2002; Sagermann & Matthews, 2002; Madison et al., 2002; Svensson et al., 2003; Retailleau et al., 2003; Izard et al., 2003).
8. Further developments
Further developments are under way to address a number of limitations of the current software. Among the main improvements planned are:

9. Conclusions
BUSTER–TNT offers the possibility of refining incomplete macromolecular atomic models in the presence of a low-resolution probability-based model for the missing structure. The program combines the errors in the missing-atoms and bulk-solvent models to give a consistent statistical probability distribution for the structure-factor amplitude, which in turn is used to drive ML refinement of the model.
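The combination of error terms into a single amplitude distribution can be sketched for the simplest case. Assuming an acentric reflection, a single complex Gaussian whose centre has modulus |F_c| (standing here for the combined calculated contribution) and whose total variance Σ is the sum of independent error terms, the observed amplitude follows a Rice distribution, p(|F|) = (2|F|/Σ) exp[−(|F|^{2} + |F_c|^{2})/Σ] I_0(2|F||F_c|/Σ). The Python below evaluates this density and the corresponding negative log-likelihood. The variance components and function names are hypothetical, scale factors and centric reflections are omitted, and this is not BUSTER's code.

```python
import numpy as np

def total_variance(var_atomic, var_bulk_solvent, var_missing):
    """Hypothetical helper: independent error contributions add in the complex plane."""
    return var_atomic + var_bulk_solvent + var_missing

def rice_pdf(f_obs, f_centre, sigma):
    """Acentric amplitude pdf for a complex Gaussian with centre modulus f_centre and total variance sigma."""
    f_obs = np.asarray(f_obs, dtype=float)
    # np.i0 is the modified Bessel function of order 0; very large arguments would
    # need an exponentially scaled Bessel function to avoid overflow.
    return (2.0 * f_obs / sigma) * np.exp(-(f_obs ** 2 + f_centre ** 2) / sigma) * np.i0(2.0 * f_obs * f_centre / sigma)

def negative_log_likelihood(f_obs, f_centre, sigma):
    """Sum of -log p over reflections; the quantity an ML refinement would minimize."""
    return float(-np.sum(np.log(rice_pdf(f_obs, f_centre, sigma))))

sigma = total_variance(2.0, 0.5, 1.5)          # toy variances, total 4.0
grid = np.linspace(0.0, 50.0, 20001)
area = np.sum(rice_pdf(grid, 10.0, sigma)) * (grid[1] - grid[0])  # close to 1
nll_close = negative_log_likelihood([9.5, 10.2], 10.0, sigma)
nll_far = negative_log_likelihood([9.5, 10.2], 3.0, sigma)
```

Because the variances simply add, enlarging any one error term (for example, for a more incomplete model) broadens the distribution and down-weights the corresponding reflections, rather than forcing the model amplitudes to match the observations.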
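When the atomic model is very incomplete, modelling of the missing structure and the consistency of the BUSTER statistical model help structure building and completion because (i) the accuracy of the overall scale factors is increased, (ii) the bias affecting the atomic model is reduced by accounting for some of the scattering from the missing structure, (iii) the addition of a spatial definition to the source of incompleteness improves on traditional Luzzati and σ_{A}-based error models and (iv) the program can perform selective density modification in the regions of unbuilt structure alone.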
The program is available for download at https://www.globalphasing.com/buster/.
Footnotes
‡These authors contributed equally to this work.
^{1}The first and third contributions to the crystal electron density are the familiar contributions present in all macromolecular refinement programs, while the second is, to date, present in BUSTER only.
^{2}For a recent illustration of the use of the Babinet principle in bulk-solvent correction, see Guo et al. (2000).
^{3}If the is used instead of the model for the whole macromolecule, the solvent envelope will overlap with the missing structure regions.
^{4}This approach makes the BUSTER–TNT bulk-solvent correction less adequate after rigid-body refinement, when the atomic shifts are usually large.
^{5}Only when the model is complete and fairly error-free can the magnitudes of model and observation uncertainties be comparable.
^{6}Evaluation of the statistics is not performed with a generalized approach (Cowtan, 2002); rather, we use conventional binning of reflections in equal intervals of d^{-2}. Two distinct binnings are applied: a coarser one for overall statistics and another with smaller bin widths for free-set and working-set statistics.
^{7}Calls to the CCP4 program NCSMASK (Collaborative Computational Project, Number 4, 1994) are used to perform any masking steps needed, while any blurring steps are carried out within BUSTER.
^{8}The dependence on the variance is not completely neglected, as an entrainment factor is added to
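Footnote 6 refers to conventional binning of reflections in resolution shells. A minimal sketch, assuming shells of equal width in 1/d^{2} (the usual convention) and a hypothetical helper name, shows how the coarse and fine binnings could be obtained from the same reflection list with two different numbers of bins.

```python
import numpy as np

def resolution_bins(d_spacing, n_bins):
    """Hypothetical helper: assign reflections to n_bins shells of equal width in 1/d^2.

    Returns (bin_index, edges); bin_index runs from 0 (lowest resolution) to n_bins - 1.
    """
    s2 = 1.0 / np.asarray(d_spacing, dtype=float) ** 2
    edges = np.linspace(s2.min(), s2.max(), n_bins + 1)
    # Digitizing against the interior edges only keeps the highest-resolution
    # reflection in the last shell instead of creating an extra overflow bin.
    bin_index = np.digitize(s2, edges[1:-1])
    return bin_index, edges

d = np.array([10.0, 5.0, 3.0, 2.5, 2.0])   # toy d-spacings in Å
coarse, _ = resolution_bins(d, 2)           # e.g. bins for overall statistics
fine, _ = resolution_bins(d, 4)             # e.g. narrower bins for free/working-set statistics
```

Binning in 1/d^{2} rather than in d gives shells of roughly equal reciprocal-space volume, so high-resolution shells are not starved of reflections.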
Acknowledgements
The authors are grateful to the members of the Global Phasing Consortium for financial support. Partial financial support was also provided by European Commission grant No. QLRTCT200000398 within the AUTOSTRUCT project. Dale Tronrud helped in the interfacing of BUSTER and TNT. Eric de La Fortelle, Gwyndaf Evans, Richard J. Morris, Wlodzimierz Paciorek and Marc Schiltz contributed ideas, suggestions and criticism. Very valuable feedback was given by all the beta users of the program and especially by Thierry Fischmann, Sandra Jacob, Dirk Kostrewa and Will Somers. Petra Lukacik expressed, purified and crystallized human CD55_{1234}. The first implementation of the BUSTER–TNT scripts was written by John J. Irwin. PR is funded by Biotechnology and Biological Sciences Research Council grant No. 43/B16601 (to SML).
References
Abrahams, J. P. (1997). Acta Cryst. D53, 371–376.
Abrahams, J. P. & Leslie, A. (1996). Acta Cryst. D52, 30–42.
Bard, J., Zhelkovsky, A., Helmling, S., Earnest, T., Moore, C. & Bohm, A. (2000). Science, 289, 1346–1349.
Benach, J., Filling, C., Oppermann, U., Roversi, P., Bricogne, G., Berndt, K., Jornvall, H. & Ladenstein, R. (2002). Biochemistry, 41, 14659–14668.
Bertaut, E. (1955a). Acta Cryst. 8, 537–543.
Bertaut, E. (1955b). Acta Cryst. 8, 544–548.
Blow, D. M. & Crick, F. H. C. (1959). Acta Cryst. 12, 794–802.
Bricogne, G. (1984). Acta Cryst. A40, 410–445.
Bricogne, G. (1988). Acta Cryst. A44, 517–545.
Bricogne, G. (1993a). Acta Cryst. D49, 37–60.
Bricogne, G. (1993b). International Tables for Crystallography, edited by U. Shmueli, Vol. B, pp. 23–106. Dordrecht, Holland: Kluwer Academic Publishers.
Bricogne, G. (1997). Methods Enzymol. 276, 424–448.
Bricogne, G. & Irwin, J. J. (1996). Proceedings of the CCP4 Study Weekend. Macromolecular Refinement, edited by E. Dodson, M. Moore, A. Ralph & S. Bailey, pp. 85–92. Warrington: Daresbury Laboratory.
Collaborative Computational Project, Number 4 (1994). Acta Cryst. D50, 760–763.
Cowtan, K. (2002). J. Appl. Cryst. 35, 655–663.
Cowtan, K. & Main, P. (1993). Acta Cryst. D49, 148–157.
Delft, F. von, Lewendon, A., Dhanaraj, V., Blundell, T., Abell, C. & Smith, A. (2001). Structure, 9, 439–450.
Dessen, A., Tang, J., Schmidt, H., Stahl, M., Clark, J., Seehra, J. & Somers, W. (1999). Cell, 97, 349–360.
Fischmann, T., Hruza, A., Niu, X., Fossetta, J., Lunn, C., Dolphin, E., Prongay, A., Reichert, P., Lundell, D., Narula, S. & Weber, P. (1999). Nature Struct. Biol. 6, 233–242.
Glykos, N. M. & Kokkinidis, M. (2000). Acta Cryst. D56, 1070–1072.
Guo, D., Blessing, R. H. & Langs, D. A. (2000). Acta Cryst. D56, 451–457.
Han, M., Gurevich, V., Vishnivetskiy, S., Sigler, P. & Schubert, C. (2001). Structure, 9, 869–880.
Hanzal-Bayer, M., Renault, L., Roversi, P., Wittinghofer, A. & Hillig, R. (2002). EMBO J. 21, 2095–2106.
Hendrickson, W. & Lattman, E. (1970). Acta Cryst. B26, 136–143.
Izard, T., Evans, G., Borgon, R., Rush, C., Bricogne, G. & Bois, P. (2003). Nature (London), 427, 171–175.
Jones, E., Walker, N. & Stuart, D. (1991). Acta Cryst. A47, 753–770.
Koronakis, V., Sharff, A., Koronakis, E., Luisi, B. & Hughes, C. (2000). Nature (London), 405, 914–919.
La Fortelle, E. de & Bricogne, G. (1997). Methods Enzymol. 276, 472–494.
Leslie, A. G. W. (1987). Acta Cryst. A43, 134–136.
Lukacik, P., Roversi, P., White, J., Esser, D., Smith, G., Billington, J., Williams, P., Rudd, P., Wormald, M., Harvey, D., Crispin, M., Radcliffe, C., Dwek, R., Evans, D., Morgan, B., Smith, R. & Lea, S. (2004). Proc. Natl Acad. Sci. USA, 101, 1279–1284.
Luzzati, V. (1952). Acta Cryst. 5, 802–810.
McRee, D. (1999). J. Struct. Biol. 125, 156–165.
Madison, V. et al. (2002). Biophys. Chem. 101–102, 239–247.
Murshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997). Acta Cryst. D53, 240–255.
Ng, K. K., Petersen, J. F., Cherney, M. M., Garen, C., Zalatoris, J. J., Rao-Naik, C., Dunn, B. M., Martzen, M. R., Peanasky, R. J. & James, M. N. (2000). Nature Struct. Biol. 7, 653–657.
Pannu, N. S., Murshudov, G. N., Dodson, E. J. & Read, R. J. (1998). Acta Cryst. D54, 1285–1294.
Pannu, N. S. & Read, R. J. (1996). Acta Cryst. A52, 659–668.
Read, R. J. (1986). Acta Cryst. A42, 140–149.
Retailleau, P., Huang, X., Yin, Y., Hu, M., Weinreb, V., Vachette, P., Vonrhein, C., Bricogne, G., Roversi, P., Ilyin, V. & Carter, C. J. (2003). J. Mol. Biol. 325, 39–63.
Reynolds, R., Remington, S., Weaver, L., Fisher, R., Anderson, W., Ammon, H. & Matthews, B. (1985). Acta Cryst. B41, 139–147.
Roversi, P., Blanc, E., Vonrhein, C., Evans, G. & Bricogne, G. (2000). Acta Cryst. D56, 1316–1323.
Roversi, P., Irwin, J. J. & Bricogne, G. (1998). Acta Cryst. A54, 971–996.
Sagermann, M. & Matthews, B. (2002). J. Mol. Biol. 316, 931–940.
Somers, W., Tang, J., Shaw, G. & Camphausen, R. (2000). Cell, 103, 467–479.
Srinivasan, R. (1966). Acta Cryst. 20, 143–144.
Svensson, S., Ostberg, T., Jacobsson, M., Norstrom, C., Stefansson, K., Hallen, D., Johansson, I., Zachrisson, K., Ogg, D. & Jendeberg, L. (2003). EMBO J. 22, 4625–4633.
Tronrud, D. E. (1992). Acta Cryst. A48, 912–916.
Tronrud, D. E. (1996). J. Appl. Cryst. 29, 100–104.
Tronrud, D. E. (1997). Methods Enzymol. 277, 306–319.
Tronrud, D. E. (1999). Acta Cryst. A55, 700–703.
Tronrud, D. E., Ten Eyck, L. F. & Matthews, B. W. (1987). Acta Cryst. A43, 489–501.
Vagin, A. & Teplyakov, A. (2000). Acta Cryst. D56, 1622–1624.
Vicens, Q. & Westhof, E. (2002). Chem. Biol. 9, 747–755.
Wang, B.-C. (1985). Methods Enzymol. 12, 813–815.
Williams, P., Chaudhry, Y., Goodfellow, I., Billington, J., Powell, R., Spiller, O., Evans, D. & Lea, S. (2003). J. Biol. Chem. 278, 10691–10696.
© International Union of Crystallography. Prior permission is not required to reproduce short quotations, tables and figures from this article, provided the original authors and source are cited.