Modelling prior distributions of atoms for macromolecular refinement and completion

Roversi, P.; Blanc, E.; Vonrhein, C.; Evans, G.; Bricogne, G.

doi:10.1107/S0907444900008490

research papers

BIOLOGICAL
CRYSTALLOGRAPHY

ISSN: 1399-0047

Volume 56| Part 10| October 2000| Pages 1316-1323

Pietro Roversi,^a Eric Blanc,^b Clemens Vonrhein,^b Gwyndaf Evans ^a and Gérard Bricogne ^a,^c ^*

^aMRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, England,^bGlobal Phasing Ltd, Sheraton House, Castle Park, Cambridge CB3 0AX, England, and ^cLURE, Université Paris-Sud, Bâtiment 209D, 91405 Orsay, France
^*Correspondence e-mail: [email protected]

(Received 3 May 2000; accepted 14 June 2000)

Until modelling is complete, macromolecular structures are refined in the absence of a model for some of the atoms in the crystal. Techniques for defining positional probability distributions of atoms, and using them to model the missing part of a macromolecular crystal structure and the bulk solvent, are described. The starting information may consist of either a tentative structural model for the missing atoms or an electron-density map. During structure completion and refinement, the use of probability distributions enables the retention of low-resolution phase information while avoiding premature commitment to uncertain higher resolution features. Homographic exponential modelling is proposed as a flexible, compact and robust parametrization that proves to be superior to a traditional Fourier expansion in approximating a model protein envelope. The homographic exponential model also has potential applications to ab initio phasing of Fourier amplitudes associated with macromolecular envelopes.

Keywords: homographic exponential models; macromolecular refinement; probability distributions.

PDB reference: porcine pancreatic elastase, 1lvy

1. The case for low-resolution distributions in partial structure refinement and completion

Crystallographic partial structure refinement and completion is usually performed by omitting the questionable parts of the structure and refraining as much as possible from building in ill-defined density regions. If the starting phases are of poor quality, the process of phase improvement by model building is therefore slow, because some of the low-resolution positional information that is already available is not incorporated until the position of the missing atoms is unambiguously defined. In order to avoid locking in on an incorrect structure, even the most likely clues or inspired guesses about the position of the missing atoms are set aside, surrendering to the fear of model bias.

One way of overcoming these difficulties is the iterative placement of atoms in the peaks of the uninterpretable regions of the electron-density map, leading to a `hybrid model' for the crystal structure that comprises the protein model and free atoms (Perrakis et al., 1999 ). A different strategy is described here, as implemented in the computer program BUSTER (Bricogne, 1993 , 1997 ), which uses a Bayesian statistical model to merge consistently various sources of crystallographic phase information. At any stage during the phasing process, low-resolution real-space distributions are used in BUSTER to provide a statistical description of the scattering from the parts of structures that cannot be modelled reliably, either because they are weakly scattering (missing or disordered residues) or because of their intrinsic disorder (bulk solvent).

The main advantages of this procedure are: (i) the scaling of the data to the model is robust and accurate; (ii) the danger of biasing the refinement towards the initial values given to the parameters of the already traced atoms is less serious, because the scattering from the missing atoms is accounted for in a statistical sense; and (iii) from the low-resolution distribution for the missing atoms a maximum-entropy distribution can be derived; suitably scaled and thermally smeared, this is a versatile alternative to conventional weighted difference Fourier maps.

Before we examine closely how the real-space distributions are computed (§4), we add a brief section defining the symbols used throughout (§2) and a section containing the general outline of the structural model as implemented in BUSTER (§3).

2. Symbols used in this paper

In this paper, five types of real-space distributions are dealt with, all of which are handled in BUSTER as CCP4-format maps sampled on a crystallographic grid with NX, NY and NZ points along the crystallographic axes. We list here the symbols for these distributions (omitting any subscripts), as an aid to the reader.

q(x), a generic distribution in the crystallographic unit cell.
χ(x), an indicator function, i.e. a binary mask whose values are 0 or 1 only; V_χ is the fractional volume of the mask χ(x); when the latter is sampled on a crystallographic grid NX NY NZ,
$[V_{\chi} = (1/ {\rm NX \, NY \, NZ}) \textstyle \sum \limits_{i = 1}^{NX}\sum \limits_{j = 1}^{NY}\sum \limits_{k = 1}^{NZ} \chi(i,j,k). \eqno (1)]$
m(x), an envelope, i.e. a positive everywhere and continuous function, usually with low-resolution Fourier components only; m(x) is normalized so that its average in the unit cell is unity,
$[({{1}/{V}})\textstyle \int \limits_{V} m({\bf x})d^{3}{\bf x} = 1, \eqno (2)]$
V being the volume of the unit cell; when sampling m(x) on a grid,
$[(1/{\rm NX\,NY\,NZ})\textstyle \sum \limits_{i = 1}^{NX}\sum \limits _{j = 1}^{NY}\textstyle \sum \limits_{k = 1}^{NZ} m(i,j,k) = 1. \eqno (3)]$
p(x), a probability distribution, so that 0 ≤ p(x) ≤ 1; $[\textstyle \int_{V} p({\bf x})d^{3}{\bf x} = 1]$ .
ρ(x), an electron density, in e Å⁻³ units.
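The grid definitions (1) and (3) can be illustrated with a minimal NumPy sketch. The function names and the toy 8 × 8 × 8 grid are illustrative assumptions and are not taken from BUSTER.

```python
import numpy as np

def fractional_volume(chi):
    """Eq. (1): fraction of grid points that lie inside a binary mask."""
    chi = np.asarray(chi, dtype=float)
    return chi.sum() / chi.size

def normalize_envelope(m):
    """Eq. (3): rescale a non-negative map so that its grid average is 1."""
    m = np.asarray(m, dtype=float)
    mean = m.mean()
    if mean <= 0:
        raise ValueError("envelope must have a positive mean")
    return m / mean

# Toy grid (hypothetical): a 4 x 4 x 4 block of ones inside an 8 x 8 x 8 cell
chi = np.zeros((8, 8, 8))
chi[:4, :4, :4] = 1.0
v_chi = fractional_volume(chi)        # 64 / 512 = 0.125
m = normalize_envelope(chi + 0.1)     # positive everywhere, unit average
```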

Vertical bars denote the absolute value, |f(x)| = abs[f(x)]; angled brackets denote expectation value under a probability density, 〈f(x)〉 = $[\textstyle \int]$ P(x)f(x)dx; the asterisk stands for convolution, (f * g)(x) = $[\textstyle \int]$ f(x − y)g(y)dy.

3. The structural model

The electron density at point x in the unit cell is written as the sum of three contributions,

$[\rho_{\rm tot}({\bf x}) = \rho_{\rm frag}({\bf x})+ \rho_{ \rm rand}({\bf x})+ \rho_{\rm solv}({\bf x}), \eqno (4)]$

where ρ_frag(x) is the electron density for the known fragment of the structure for which the atomic positions are known with a good degree of confidence; ρ_rand(x) is the density for the atoms that are missing in the fragment and whose positions are described using a probability distribution and a random atom model (see §3.2); ρ_solv(x) is the bulk solvent density. Here, ρ_tot(x) is on an absolute scale.

By linearity of the Fourier transform, the model for the structure factor is

$[F_{\rm tot}({\bf h}) = {\cal F} [{\rho_{\rm tot}({\bf x})}]({\bf h}) = F_{\rm frag}({\bf h})+ F_{\rm rand}({\bf h})+ F_{\rm solv}({\bf h}), \eqno (5)]$

where the subscripts retain the meaning they have in (4).

Before we describe how the real-space distributions are computed, the next three subsections describe the individual components of the structural model in more detail.

3.1. The partial structure model

The atoms whose positions are known with a good degree of confidence are described by a set of conventional atomic model parameters. Their positions, isotropic displacement parameters (i.e. temperature factors) and occupancies can be refined by maximum likelihood, using an interface to the refinement package TNT (Tronrud et al., 1987 ; Tronrud, 1997 ), as previously described (Bricogne & Irwin, 1996 ). The standard stereochemical, geometrical and non-crystallographic symmetry (hard and soft) restraints are handled in TNT. During partial structure refinement the probability distribution for the random atoms, as well as the bulk-solvent distribution, are kept fixed.

3.2. The missing structure model

The prior expectation about the position of the missing atoms is cast in quantitative terms using an envelope m_rand(x) that is used as a positional prior distribution for the same atoms; the calculation of m_rand(x) is described in §4. As the suffix `rand' suggests, all the missing atoms are assumed to be randomly distributed according to m_rand(x).

Once the partial structure has been refined, a maximum-entropy distribution q_rand(x) for the missing atoms is computed in the form

$[q_{\rm rand}({\bf x}) = {{1}\over{Z}}m_{\rm rand}({\bf x})\exp\left[\textstyle \sum \limits_{\bf h} \lambda_{\bf h}\Xi_{\bf h}({\bf x})\right], \eqno (6)]$

where Z is a normalization factor such that $[\textstyle \int_{V}]$ q_rand(x)d³x = 1, λ_h are Lagrange multipliers and Ξ_h is the trigonometric structure factor, i.e. the structure factor for a point scatterer at rest,

$[\Xi_{\bf h}({\bf x}) = {{1}\over{|G|}} \textstyle \sum \limits_{g\in G}\exp[2\pi i{\bf h} S_{g}{\bf x}]. \eqno (7)]$

|G| is the number of elements of the space group G and S_gx = R_gx + t_g is the generic symmetry operation in G.

The calculation of q_rand(x) is performed varying the λ_h under the constraint of maximum entropy, as outlined in Roversi et al. (2000 ).
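As an illustration of (6) and (7), the following sketch evaluates q_rand(x) on a grid for a given set of multipliers. It is not the BUSTER implementation and it does not fit the λ_h. The function names are hypothetical; the list of reflections is assumed to hold one member of each Friedel pair with λ at −h equal to the complex conjugate of λ_h, so that each pair contributes 2 Re[λ_h Ξ_h(x)]; and the normalization is taken as a unit sum over grid points.

```python
import numpy as np

def trig_structure_factor(h, frac_xyz, ops):
    """Eq. (7): average of exp(2*pi*i h.(R x + t)) over the operations in `ops`.

    h        : three Miller indices
    frac_xyz : array (..., 3) of fractional coordinates
    ops      : list of (R, t) pairs (3x3 rotation matrix, translation vector)
    """
    total = 0.0
    for R, t in ops:
        sx = frac_xyz @ np.asarray(R).T + np.asarray(t)
        total = total + np.exp(2j * np.pi * (sx @ np.asarray(h)))
    return total / len(ops)

def maxent_distribution(m_rand, lambdas, hkls, ops):
    """Eq. (6) on a grid, for multipliers supplied by the caller."""
    axes = [np.arange(n) / n for n in m_rand.shape]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
    exponent = np.zeros(m_rand.shape)
    for lam, h in zip(lambdas, hkls):
        # Assumption: Friedel mate included implicitly, hence the factor 2 Re
        exponent += 2.0 * np.real(lam * trig_structure_factor(h, grid, ops))
    q = m_rand * np.exp(exponent)
    return q / q.sum()   # assumption: discrete normalization, sum over grid = 1

# Space group P1 and a flat prior on a toy 4 x 4 x 4 grid (hypothetical values)
p1 = [(np.eye(3), np.zeros(3))]
m_flat = np.ones((4, 4, 4))
q0 = maxent_distribution(m_flat, [0.0], [(1, 0, 0)], p1)   # all multipliers zero
q1 = maxent_distribution(m_flat, [0.5], [(1, 0, 0)], p1)   # one non-zero multiplier
```

With all multipliers set to zero the result is the normalized prior; a non-zero multiplier modulates the prior along the corresponding reflection.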

q_rand(x) can be normalized and turned into a positional posterior probability distribution. It shows the extent to which the prior expectation m_rand(x) is confirmed or contradicted by the observations. In the absence of noise and if the observations contained no information regarding the region of interest, the final probability distribution would coincide with the (normalized) prior (1/Z)m_rand(x) (because λ_h = 0 ∀ h). In practice, both noise and signal in the data will cause the λ_h to differ from zero and build features into q_rand(x). The contribution of the missing atoms to the structure factor is computed from q_rand(x) using the sum of the scattering factors for the same atoms,

$[{\bf F}_{\rm rand}({\bf h}) = \Sigma_{\rm rand}(\bf h) \times {\cal F}[{q_{\rm rand}({\bf x)}}]({\bf h}), \eqno (8)]$

where Σ_rand(h) is the sum of the scattering factors for the missing atoms,

$[\Sigma_{\rm rand}({\bf h}) = {\textstyle \sum \limits^{N_{\rm rand}}_{j}}f_{j}({\bf h}) \exp\left [-\langle B\rangle_{j}{{d^{*2}_{\bf h}}\over{4}}\right ]. \eqno (9)]$

3.3. The bulk-solvent model

The bulk-solvent structure factor F_solv(h) on the absolute scale can be computed from the Fourier components of the bulk-solvent density ρ_solv(x), smeared by the solvent temperature factor,

$[{\bf F}_{\rm solv}({\bf h}) = {\cal F}[{\rho_{\rm solv}({\bf x})}]({\bf h}) \times\exp\left[-B_{\rm solv}{{d_{\bf h}^{*2}}\over{4}}\right]. \eqno (10)]$

The bulk-solvent density is taken proportional to the bulk-solvent envelope m_solv(x),

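$[\rho_{\rm solv}({\bf x}) = \overline{\rho}_{\rm solv} \times V_{\rm solv} \times m_{\rm solv}({\bf x}), \eqno (11)]$

where $[\overline{\rho}_{\rm solv}]$ and V_solv are the mean electron density and the fractional volume of the bulk solvent; the factor V_solv compensates for the unit-average normalization (2) of m_solv(x) and is required for (10), (11) and (12) to combine into (13).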

In BUSTER, the bulk-solvent envelope m_solv(x) is never handled as such, the macromolecular envelope m_macrom(x) being used instead; m_macrom(x) is either computed from the whole-molecule atomic model [see §4.2, the volume V_macrom being the fractional volume of the whole binary mask χ_macrom(x)] or computed from the density using the known solvent-volume fraction (see §4.3).

Once m_macrom(x) is obtained, the Babinet principle,¹ relating the low-resolution Fourier components of two complementary distributions m_solv(x) and m_macrom(x), is used,

$[V_{\rm solv}{\cal F}[m_{\rm solv}({\bf x})]({\bf h}) = -V_{\rm macrom}{\cal F}[m_{\rm macrom}({\bf x})]({\bf h}), \eqno (12)]$

so that

$[\eqalignno {{\bf F}_{\rm solv}({\bf h})& = -\overline{\rho}_{\rm solv}V_{\rm macrom}\times {\cal F}[m_{\rm macrom}({\bf x})] ({\bf h}) \cr &\ \quad \times \ \exp \left[-B_{\rm solv}{{d^{\ast 2}_{\bf h}}\over {4}}\right]. & (13)}]$
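The Babinet relation (12) and the resulting solvent term (13) can be checked numerically on a toy mask. This is a sketch under several assumptions: an orthogonal cell, Fourier coefficients defined as unit-cell averages (so that the h = 0 coefficient is the mean of the map), fractional volumes, and hypothetical values for the solvent density, B_solv and cell edges. The result is per unit volume; no overall cell-volume scale is applied.

```python
import numpy as np

def d_star_sq(shape, cell):
    """1/d^2 for every FFT index, assuming an orthogonal cell with edges `cell` (Å)."""
    axes = [np.fft.fftfreq(n, d=1.0 / n) / a for n, a in zip(shape, cell)]
    hx, ky, lz = np.meshgrid(*axes, indexing="ij")
    return hx**2 + ky**2 + lz**2

def fourier_coefficients(rho):
    """F[rho](h) as the grid average of rho(x) exp(+2 pi i h.x)."""
    return np.fft.ifftn(rho)

def solvent_structure_factor(m_macrom, v_macrom, rho_solv_mean, b_solv, cell):
    """Eq. (13). The h = 0 term is not covered by the Babinet relation
    and should be ignored or handled separately."""
    damp = np.exp(-b_solv * d_star_sq(m_macrom.shape, cell) / 4.0)
    return -rho_solv_mean * v_macrom * fourier_coefficients(m_macrom) * damp

# Toy binary macromolecular mask and the two complementary unit-average envelopes
chi = np.zeros((8, 8, 8))
chi[2:6, 2:6, 2:6] = 1.0
v_macrom = chi.mean()
v_solv = 1.0 - v_macrom
m_macrom = chi / v_macrom
m_solv = (1.0 - chi) / v_solv

# Eq. (12): the two sides agree for every h except h = 0
lhs = v_solv * fourier_coefficients(m_solv)
rhs = -v_macrom * fourier_coefficients(m_macrom)

# Hypothetical parameter values, for illustration only
f_solv = solvent_structure_factor(m_macrom, v_macrom, rho_solv_mean=0.33,
                                  b_solv=50.0, cell=(40.0, 40.0, 40.0))
```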

4. Computing m_rand(x)

We can now examine more closely how the real-space envelopes are computed; in particular, we discuss here the calculation of the envelope for the missing atoms, m_rand(x). Similar techniques can be used to compute the envelopes for the whole macromolecule or for the bulk solvent.

As soon as an initial model is available, the prior distribution $[m_{\rm rand}(\bf x)]$ for the positions of the missing atoms can be computed in three ways: (i) by excluding the missing atoms from the regions already containing the partial structure (uniform prior, §4.1), (ii) by using a trial atomic model for the missing atoms (model-based non-uniform prior, §4.2) or (iii) simply from the local fluctuation of the electron density (map-based non-uniform prior, §4.3).

4.1. Uniform prior

The simplest choice for the missing atoms prior probability distribution is to exclude them from the regions that already contain a reliable atomic model: this brings into the statistical model the notion that a number of atoms are missing and that they are equally likely to be anywhere except where other atoms have been placed already.

The uniform prior distribution is defined in three steps as follows.

(i) A binary mask $[\chi_{\rm frag}^{\rm a.u.}(\bf x)]$ is drawn around the known partial structure; this step is performed using the program NCSMASK (Collaborative Computational Project, Number 4, 1994 ). The masking radius R_frag can be varied; the default for R_frag is 2.05 Å.
(ii) $[\chi_{\rm frag}^{\rm a.u.}(\bf x)]$ is symmetry expanded to cover the whole cell; this symmetry-expanded binary mask χ_frag(x) is negated to obtain a binary mask χ_rand(x) for the random atoms,
$[\chi_{\rm rand}({\bf x}) = 1-\chi_{\rm frag}({\bf x}). \eqno (14)]$
(iii) χ_rand(x) is blurred by means of a convolution with an isotropic Gaussian G(x; B_rand) and normalized,
$[m_{\rm rand}({\bf x}) = {{1}\over{V_{\chi_{\rm rand}}}}\times [\chi_{\rm rand} \ast G(B_{\rm rand})] (\bf x), \eqno (15)]$
where the parameter B_rand controls the width of the Gaussian and therefore the slope of m_rand(x) around the model used in generating $[\chi_{\rm frag}^{\rm a.u.}(\bf x)]$ .
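Steps (i) to (iii) can be sketched as follows. This is a simplified illustration, not NCSMASK or BUSTER: it assumes space group P1 and an orthogonal cell, the mask is drawn with a hypothetical helper, the value of B_rand is arbitrary, and the blurring uses a plain FFT product rather than aliased structure factors, so exact positivity of the result is not guaranteed.

```python
import numpy as np

def atom_mask(frac_atoms, shape, cell, radius):
    """Binary mask: 1 within `radius` (Å) of any atom (periodic, orthogonal cell)."""
    axes = [np.arange(n) / n for n in shape]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
    mask = np.zeros(shape, dtype=bool)
    for atom in np.atleast_2d(frac_atoms):
        delta = grid - atom
        delta -= np.round(delta)                      # minimum-image convention
        dist = np.sqrt(((delta * np.asarray(cell)) ** 2).sum(axis=-1))
        mask |= dist <= radius
    return mask.astype(float)

def gaussian_blur(rho, b_factor, cell):
    """Convolution with an isotropic Gaussian, applied as exp(-B d*^2 / 4)."""
    axes = [np.fft.fftfreq(n, d=1.0 / n) / a for n, a in zip(rho.shape, cell)]
    hx, ky, lz = np.meshgrid(*axes, indexing="ij")
    s2 = hx**2 + ky**2 + lz**2
    return np.real(np.fft.ifftn(np.fft.fftn(rho) * np.exp(-b_factor * s2 / 4.0)))

def uniform_prior(frac_atoms, shape, cell, r_frag=2.05, b_rand=50.0):
    chi_frag = atom_mask(frac_atoms, shape, cell, r_frag)   # steps (i)-(ii), P1 assumed
    chi_rand = 1.0 - chi_frag                               # eq. (14)
    v_rand = chi_rand.mean()                                # fractional volume, eq. (1)
    return gaussian_blur(chi_rand, b_rand, cell) / v_rand   # eq. (15)

# One hypothetical atom at the centre of a 20 Å cubic cell
m_rand = uniform_prior(np.array([[0.5, 0.5, 0.5]]), (20, 20, 20), (20.0, 20.0, 20.0))
```

Because the Gaussian has a unit zero-order Fourier coefficient, dividing by the fractional volume of χ_rand gives an envelope whose grid average is 1.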

The convolution in (15) is effected in reciprocal space, using a set of periodized (`aliased') structure factors for m_rand(x). The use of aliased structure factors to sample thermally smeared model densities on arbitrarily coarse crystallographic grids has been described in the Appendix of Roversi et al. (1998 ) and will not be detailed here.²

We stress that this distribution is uniform outside the regions occupied by the model, hence the name `uniform prior', but its shape is not uniform; only in the absence of any partial model is this a truly uniform distribution throughout the unit cell.

We also note that if the bulk-solvent envelope is chosen to fill all the space left empty by the macromolecular model, the missing atoms envelope and the bulk-solvent envelope overlap. They can still differ in the parameter B used in the blurring step (15).

4.2. Model-based non-uniform prior

Sometimes a rough guess is available as to the placement of a subset of atoms, such as a protein loop or domain or a bound ligand, but the model tentatively built for the same atoms is questionable. An envelope m_rand(x) can then be built around these ill-defined atoms and the same atoms omitted from the partial structure. The real-space picture of the crystal in this case then comprises the bulk-solvent envelope, the atomic model for the trusted traced atoms and the missing atoms envelope. The latter is localized around the tentatively placed atoms; it represents our prior expectation about their position but does not retain any of the high-resolution details that are being assessed.

The prior distribution is computed in four steps as follows.

(i) A binary mask $[\chi_{\rm macrom}^{\rm a.u.}(\bf x)]$ is drawn around the complete atomic model, including the parts that will be omitted; the radius for this masking can vary between 2 and 4 Å, depending on the degree of confidence one wants to retain regarding the omitted model (a tighter radius resulting in a distribution highly localized around the omitted atoms).
(ii) A binary mask $[\chi_{\rm frag}^{\rm a.u.}(\bf x)]$ is drawn around the part of structure that is going to be retained and a binary mask for the random atoms $[\chi_{\rm rand}^{\rm a.u.}(\bf x)]$ is obtained from
$[\chi_{\rm rand}^{\rm a.u.}({\bf x}) = \chi_{\rm macrom}^{\rm a.u.}({\bf x}) \times \left [1 - \chi_{\rm frag}^{\rm a.u.}({\bf x}) \right]. \eqno (16)]$
(iii) The $[\chi_{\rm rand}^{\rm a.u.}(\bf x)]$ mask is symmetry expanded to the unit cell to give χ_rand(x).
(iv) χ_rand(x) is blurred by means of a convolution with an isotropic Gaussian G(x; B_rand) and normalized as in (15).
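The mask combination (16) in step (ii) amounts to a pointwise product, as in this minimal sketch. The function name and the one-dimensional toy masks are illustrative only; symmetry expansion and blurring, steps (iii) and (iv), are not shown.

```python
import numpy as np

def random_atom_mask(chi_macrom, chi_frag):
    """Eq. (16): inside the whole-model mask but outside the retained-fragment mask."""
    chi_macrom = np.asarray(chi_macrom, dtype=float)
    chi_frag = np.asarray(chi_frag, dtype=float)
    return chi_macrom * (1.0 - chi_frag)

# One-dimensional toy masks (hypothetical): the omitted part covers points 4 and 5
chi_macrom = np.array([0, 1, 1, 1, 1, 1, 0, 0], dtype=float)
chi_frag = np.array([0, 1, 1, 1, 0, 0, 0, 0], dtype=float)
chi_rand = random_atom_mask(chi_macrom, chi_frag)
```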

4.3. Map-based non-uniform prior

Even when no atomic model is available, some rough idea about the placement of the missing atoms can be retrieved from the presence of high values of the local r.m.s.d. in noisy electron-density maps.

The local average of the electron density (Wang, 1985 ; Leslie, 1987 ) or its local fluctuation around the mean (Abrahams & Leslie, 1996 ; Abrahams, 1997 ) have been used to perform phase improvement by density-modification techniques.

The BUSTER envelope is also computed by local variance filtering of a noisy density map. Local averaging is performed by convolution with a Gaussian G(B), parametrized by a Debye–Waller factor B, and a solid sphere mask S(R), parametrized by a radius R. These convolutions are used in two filtering operations that select high and low frequencies in a distribution ρ(x),

$[\eqalignno {\rho^{\rm lo}(B,R)({\bf x})& = [\rho\ast G(B)\ast S(R)]({\bf x})& (17)\cr \rho^{\rm hi}(B,R)({\bf x})& = (\rho-\rho^{\rm lo})({\bf x)}.& (18)}]$

All the convolution steps are carried out in reciprocal space, by calculation of a set of aliased structure factors (Roversi et al., 1998), then Fourier-transformed to sample the density on the required grid.

For the (optional) high-frequency filtering, the following two measures of the local fluctuation around the local average can be defined:

(i) the local average of the absolute value of the deviation from the mean,
$[\omega({\bf x}) = [| \rho^{\rm hi}(B_{1},R_{1}) |\ast G(B_{2})\ast S(R_{2}) ](\bf x), \eqno (19)]$
(ii) the local r.m.s.d. from the local average,
$[\omega({\bf x}) = \{ [\rho^{\rm hi}(B_{1},R_{1}) ]^{2}\ast G(B_{2})\ast S(R_{2})\}^{{{1}\over{2}}}({\bf x}). \eqno (20)]$

The radius of the sphere for the high-pass filter is typically larger than the one for the low-pass filter in (19) and (20) (i.e. R₁ > R₂).

The high-frequency filter is useful in those cases where map Fourier components with d ≥ R₁ are either absent or cannot be trusted, but it can be omitted if the lowest-resolution features are correct; in this case, the following two local averages can be computed, also by Fourier transforms:

(i) the local average of the absolute value of the density,
$[\omega({\bf x}) = [|\rho^{\rm lo}| \ast G(B_{2})\ast S(R_{2})]({\bf x}), \eqno (21)]$
(ii) the local r.m.s. deviation from zero of the density,
$[\omega({\bf x}) = [(\rho^{\rm lo})^{2}\ast G(B_{2})\ast S(R_{2})]^{{{1}\over{2}}}({\bf x}). \eqno (22)]$
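The filters (17), (18) and (20) can be sketched with plain FFTs. The assumptions are an orthogonal cell, a solid sphere normalized to unit integral (so that the convolution is a local average), and ordinary discrete Fourier transforms in place of the aliased structure factors used in BUSTER. All names and parameter values are hypothetical.

```python
import numpy as np

def _s_grid(shape, cell):
    """|d*| = 1/d for every FFT index (orthogonal cell, edges in Å)."""
    axes = [np.fft.fftfreq(n, d=1.0 / n) / a for n, a in zip(shape, cell)]
    hx, ky, lz = np.meshgrid(*axes, indexing="ij")
    return np.sqrt(hx**2 + ky**2 + lz**2)

def local_average(rho, b_factor, radius, cell):
    """rho * G(B) * S(R), evaluated as a product in reciprocal space."""
    s = _s_grid(rho.shape, cell)
    gauss = np.exp(-b_factor * s**2 / 4.0)
    u = 2.0 * np.pi * radius * s
    sphere = np.ones_like(u)            # limit of the sphere transform for u -> 0
    big = u > 1e-4
    sphere[big] = 3.0 * (np.sin(u[big]) - u[big] * np.cos(u[big])) / u[big] ** 3
    return np.real(np.fft.ifftn(np.fft.fftn(rho) * gauss * sphere))

def local_rmsd(rho, b1, r1, b2, r2, cell):
    """Eq. (20): local r.m.s.d. of the high-pass filtered map of eq. (18)."""
    rho_hi = rho - local_average(rho, b1, r1, cell)    # eqs (17) and (18)
    var = local_average(rho_hi**2, b2, r2, cell)
    return np.sqrt(np.clip(var, 0.0, None))            # guard against round-off

# Toy map: noise in one half of the cell, flat in the other half
rng = np.random.default_rng(0)
cell = (32.0, 32.0, 32.0)
rho = np.zeros((16, 16, 16))
rho[:8] = rng.normal(size=(8, 16, 16))
omega = local_rmsd(rho, b1=0.0, r1=6.0, b2=0.0, r2=4.0, cell=cell)
omega_flat = local_rmsd(np.full((16, 16, 16), 3.0), 0.0, 6.0, 0.0, 4.0, cell)
```

Replacing the squared deviation by its absolute value and dropping the square root gives the variant (19).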

Once ω(x) is available, m_rand(x) should be obtained by homographic exponential modelling as described in the following section.

5. Homographic exponential modelling

We describe in this section a technique that affords a parametrization of low-resolution distributions and is used in BUSTER for computing macromolecular envelopes from noisy electron-density maps. The technique is a particular case of homographic mapping of a function e(x),

$[e({\bf x})\rightarrow{{a+b\times e({\bf x)}}\over{c+d\times e({\bf x})}}, \eqno (23)]$

where a = c = d = 1 and b = 0, and e(x) is an exponential e(x) = exp[ω(x)]; therefore, we propose to call it homographic exponential modelling.

The distributions obtained by homographic exponential modelling can be handled as values on a crystallographic grid and represent a new way of defining intrinsically `binary-like' macromolecular envelopes that are continuous and not binary. Alternatively, they can be parametrized with a finite set of coefficients in the expansion of ω, opening the way to ab initio low-resolution phasing based on phase permutation for a few coefficients of ω(x).

The potential of the homographic exponential modelling for ab initio phasing of envelope Fourier coefficients has been investigated by G. Bricogne and M. Ramin (G. Bricogne, unpublished results; Ramin, 1999 ). Here, we introduce the technique and present the results of a test study, aiming at the assessment of the number of Fourier coefficients of ω(x) that are needed to satisfactorily reconstruct a given m(x) when a homographic exponential model is adopted.

5.1. The Fermi–Dirac distribution

The problem of defining a low-resolution envelope for the macromolecule based on an electron-density map can be restated in the form of assigning to each pixel in the map a probability of belonging to the bulk solvent, which we can write p_solv(x). Correspondingly, p_macrom(x) = 1 − p_solv(x) is then the probability that the pixel at x belongs to the macromolecular volume.

It is clear that we are dealing with each pixel as an entity that can be in one and one only of two possible states (pixel in the bulk solvent/pixel in the macromolecule), like a fermion whose spin can be either of ±½; an analogy can be drawn with the occupancy distribution function for a system consisting of a finite number of fermion particles with a given total energy. This occupancy distribution function f_FD(E) follows a Fermi–Dirac distribution, depending on the temperature parameter β_FD and on the chemical potential μ_FD (Reif, 1965 ),

$[f_{\rm FD}(E) = {1}/\{{1+\exp[\beta_{\rm FD}(E-\mu_{\rm FD})]}\}. \eqno (24)]$

The chemical potential μ_FD arises from the requirement that the number of fermions is finite. At temperatures close to zero, the low-energy states are occupied [probability f_FD(E) ≃ 1] until the total number of fermions is reached; this defines the Fermi level (or Fermi energy μ_FD) of the system. The distribution quickly tails off to zero as the energy level increases; the states having energy higher than the Fermi level have zero occupancies unless the ratio of the energy gap (E− μ_FD) over the mean thermal energy 1/β_FD is small enough to permit some excitation.

By analogy, we can adopt some measure ω(x) of the local fluctuation of the electron density as an `envelope potential energy' and take β as inversely proportional to the r.m.s. error of the electron density (Blow & Crick, 1959 ),

$[{{1}\over{\beta}} \propto \textstyle \sum \limits_{\bf h} \varepsilon_{\bf h} \left(1-{\rm FOM}_{\bf h}^{2}\right) F_{\bf h}^{2}, \eqno (25)]$

FOM_h being the figure of merit,

$[{\rm FOM}_{\bf h} = ({ \langle \cos\varphi_{\bf h} \rangle^{2}+\langle \sin\varphi_{\bf h} \rangle^{2}})^{1/2}, \eqno (26)]$

computed from the current phase probability distribution P(φ_h).
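A figure of merit as in (26) can be evaluated from a sampled phase probability distribution, and (25) then follows up to its unspecified proportionality constant. The sketch below uses hypothetical function names and two toy distributions.

```python
import numpy as np

def figure_of_merit(phi, prob):
    """Eq. (26) for one reflection, with P(phi) sampled at the angles `phi` (radians)."""
    prob = np.asarray(prob, dtype=float)
    prob = prob / prob.sum()                  # normalize the sampled distribution
    mean_cos = np.sum(prob * np.cos(phi))
    mean_sin = np.sum(prob * np.sin(phi))
    return np.hypot(mean_cos, mean_sin)

def inverse_beta(fom, f_amp, epsilon):
    """Right-hand side of eq. (25); the proportionality constant is left out."""
    fom, f_amp, epsilon = (np.asarray(a, dtype=float) for a in (fom, f_amp, epsilon))
    return np.sum(epsilon * (1.0 - fom**2) * f_amp**2)

phi = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
fom_flat = figure_of_merit(phi, np.ones_like(phi))                   # no phase information
fom_sharp = figure_of_merit(phi, np.exp(50.0 * np.cos(phi - 1.0)))   # sharply peaked
```

A flat distribution gives a figure of merit close to zero, and a sharply peaked one gives a value close to one, so well-phased reflections contribute little to 1/β.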

Where ω(x) is large with respect to the density r.m.s. error, it is highly unlikely that pixel x belongs to the bulk solvent. So, for the probability that the pixel belongs to the solvent, we can take

$[p_{\rm solv}({\bf x}) \propto {{1}\over{1+\exp\{\beta[\omega(\bf x)-\mu ]\}}}. \eqno (27)]$

The value of μ depends on the number of pixels that define the solvent region (or the solvent-volume fraction); it can be computed by histogramming the ω(x) function and choosing for μ the value of ω(x) that will give the correct number of pixels within the solvent, starting from the pixels where the fluctuation is lowest, and including all the pixels with increasing values of the local fluctuation, until the desired solvent fraction is achieved.

The probability that the pixel at x belongs to the macromolecule is then

$[p_{\rm macrom}({\bf x}) = 1-p_{\rm solv}({\bf x}) \propto {{1}\over{1+\exp\{-\beta [\omega(\bf x)-\mu]\}}}. \eqno (28)]$
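The choice of μ by histogramming and the two probabilities (27) and (28) can be sketched as follows. The proportionality in (27) and (28) is taken here as an equality, the quantile function stands in for the histogram, and the value of β and the toy ω values are arbitrary assumptions.

```python
import numpy as np

def sigmoid(z):
    """Numerically stable 1 / (1 + exp(-z))."""
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def chemical_potential(omega, solvent_fraction):
    """mu such that a fraction `solvent_fraction` of the pixels has omega below mu."""
    return np.quantile(omega, solvent_fraction)

def solvent_probability(omega, beta, mu):
    return sigmoid(-beta * (omega - mu))      # eq. (27)

def macromolecule_probability(omega, beta, mu):
    return sigmoid(beta * (omega - mu))       # eq. (28)

# Toy fluctuation values (hypothetical) and a 40% solvent fraction
rng = np.random.default_rng(1)
omega = rng.random(1000)
mu = chemical_potential(omega, 0.40)
p_solv = solvent_probability(omega, beta=20.0, mu=mu)
p_macrom = macromolecule_probability(omega, beta=20.0, mu=mu)
```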

5.2. Homographic exponential modelling of missing atoms envelopes

This section describes the homographic exponential modelling of macromolecular envelopes starting from noisy maps. In particular, a description is given of the calculation of a homographic exponential model for the missing atom envelope in the presence of the density for the partial structure $[\rho_{\rm frag}({\bf x})]$ (see §4.3).

Once the local density fluctuation ω(x) has been obtained along the lines described in §4.3 and its histogramming has given the value of μ_macrom that corresponds to the appropriate solvent fraction, one has the homographic exponential model for the whole macromolecular envelope,

$[q_{\rm macrom}({\bf x}) = {{1}\over{1+\exp \{ -\beta_{\rm macrom} [\omega({\bf x})-\mu_{\rm macrom}] \}}}, \eqno (29)]$

the value of β_macrom being proportional to the reciprocal r.m.s. error of the starting density (25). Then, to exclude the fragment region from the prior-probability distribution for the random atoms, a homographic exponential model of the fragment density is needed. The local fluctuation ω_frag(x) can be computed based on ρ_frag(x) as outlined in §4.3; the values of β_frag and μ_frag are computed from the r.m.s. error of the fragment model density and its fractional volume, as seen above. The homographic exponential model for the fragment density is then

$[q_{\rm frag}({\bf x}) = {{1}\over{1+\exp \{-\beta_{\rm frag} [\omega_{\rm frag}({\bf x})-\mu_{\rm frag}]\}}}. \eqno (30)]$

Finally, the homographic exponential model for the missing atoms envelope is obtained by imposing that the pixel lies in the whole macromolecule envelope but not in the fragment envelope,

$[\eqalignno {q_{\rm rand}({\bf x})& = q_{\rm macrom}({\bf x})\times\left[1-q_{\rm frag}({\bf x})\right] & (31) \cr m_{\rm rand}({\bf x})& = {{V}\over{\textstyle \int_{V}q_{\rm rand}({\bf x})\,{\rm d}^{3}{\bf x}}} \times q_{\rm rand}({\bf x}). & (32)}]$
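Equations (29) to (32) combine into a few lines on a grid, as sketched below. The function names, the one-dimensional toy values of ω and ω_frag, and the choices of β and μ are assumptions made for illustration only.

```python
import numpy as np

def he_model(omega, beta, mu):
    """Eqs (29) and (30): 1 / (1 + exp{-beta [omega - mu]}), in a stable form."""
    return 0.5 * (1.0 + np.tanh(0.5 * beta * (np.asarray(omega, dtype=float) - mu)))

def missing_atom_envelope(omega, beta_m, mu_m, omega_frag, beta_f, mu_f):
    q_macrom = he_model(omega, beta_m, mu_m)
    q_frag = he_model(omega_frag, beta_f, mu_f)
    q_rand = q_macrom * (1.0 - q_frag)        # eq. (31)
    return q_rand / q_rand.mean()             # eq. (32) on a grid: unit average

# One-dimensional toy: macromolecule on points 2-5, known fragment on points 2-3
omega = np.array([0, 0, 5, 5, 5, 5, 0, 0], dtype=float)
omega_frag = np.array([0, 0, 5, 5, 0, 0, 0, 0], dtype=float)
m_rand = missing_atom_envelope(omega, 4.0, 2.5, omega_frag, 4.0, 2.5)
```

In this toy case the envelope is concentrated on points 4 and 5, which lie inside the macromolecular region but outside the fragment.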

5.3. A simple test

We describe here a simple calculation that investigates the behaviour of homographic exponential modelling of a known envelope m(x) under truncation of its Fourier spectrum, and compares it with a traditional finite-resolution Fourier expansion of the same m(x).

If m(x) is a given envelope and we intend to parametrize it using a homographic exponential model (28), we first map m(x) to the (0, 1) open interval by linear scaling,

$[m'({\bf x}) = {{\left[m({\bf x})-\min m({\bf x})\right]}\over{\left[\max m({\bf x})-\min m({\bf x})\right]}}. \eqno (33)]$

Then, we can compute the ω(x) from

$[\omega({\bf x}) = {{1}\over{\beta}} \log \left[{{m'({\bf x})}\over{1-m'({\bf x})}}\right]+\mu. \eqno (34)]$

Fourier analysis of ω(x), truncation of its Fourier coefficients at resolution d and Fourier synthesis of the truncated set of coefficients lead to the resolution-truncated ω_d(x) distribution

$[\omega_{d}({\bf x}) = {\overline {\cal F}}\{{X_{d}({\bf h})\times {\cal F}[{\omega({\bf x})}]({\bf h}})\}({\bf x}), \eqno (35)]$

where the truncation of the Fourier spectrum of ω(x) at resolution d in (35) is performed by multiplying it by the indicator function X_d(h), with d_h = 1/d*_h denoting the resolution of reflection h,

$[\eqalign {X_{d}({\bf h}) & = 1 \,\, {\rm if} \,\,d_{\bf h} \geq d, \cr & = 0 \,\, {\rm if}\,\, d_{\bf h}\ \lt\ d.}\eqno (36)]$

The homographic exponential, resolution-truncated m_HE,d(x) is then

$[\eqalignno {m'_{{\rm HE},d}({\bf x})& = {{1}\over{1+\exp\{-\beta[\omega_{d}({\bf x})-\mu]\}}}, & (37) \cr m_{{\rm HE},d}({\bf x})& = {{V}\over{\textstyle \int_{V}m'_{{\rm HE},d}({\bf x})\,{\rm d}^{3}{\bf x}}} \times m'_{{\rm HE},d}({\bf x}). &(38)}]$

We note here that for this particular test the actual values of β and μ are irrelevant, provided the same values are used in (34) and (37).

The conventional Fourier expansion of m(x), with truncation at resolution d, reads

$[m_{{\rm FT},d}({\bf x}) = {\overline {\cal F}} \{{X_{d}({\bf h})\times{\cal F}[{m({\bf x})}]({\bf h})}\}({\bf x}). \eqno (39)]$

m_HE,d(x) and m_FT,d(x) differ from m(x) because of the resolution truncation; m_FT,d(x) has no Fourier components past d Å, while m_HE,d(x), computed from the same number of Fourier coefficients, possesses extra-resolution owing to the exponential step.
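The two reconstructions (33) to (39) can be reproduced on a toy envelope with the sketch below. It assumes an orthogonal cell and a hypothetical spherical envelope rather than the PPE model, so it says nothing about the results reported here. Since the linear scaling (33) reaches 0 and 1 exactly, the scaled values are clipped to a small margin eps so that the logarithm in (34) stays finite; this clipping is an assumption of the sketch.

```python
import numpy as np

def resolution_filter(shape, cell, d_min):
    """X_d(h) of eq. (36): True for reflections with spacing d_h >= d_min."""
    axes = [np.fft.fftfreq(n, d=1.0 / n) / a for n, a in zip(shape, cell)]
    hx, ky, lz = np.meshgrid(*axes, indexing="ij")
    s = np.sqrt(hx**2 + ky**2 + lz**2)
    return s <= 1.0 / d_min + 1e-12

def truncate(rho, keep):
    """Eqs (35) and (39): Fourier analysis, truncation, Fourier synthesis."""
    return np.real(np.fft.ifftn(np.fft.fftn(rho) * keep))

def he_truncated(m, keep, beta=1.0, mu=0.0, eps=1e-3):
    m1 = (m - m.min()) / (m.max() - m.min())                    # eq. (33)
    m1 = np.clip(m1, eps, 1.0 - eps)                            # assumption, see lead-in
    omega = np.log(m1 / (1.0 - m1)) / beta + mu                 # eq. (34)
    omega_d = truncate(omega, keep)                             # eq. (35)
    m_he = 0.5 * (1.0 + np.tanh(0.5 * beta * (omega_d - mu)))   # eq. (37)
    return m_he / m_he.mean()                                   # eq. (38)

def correlation(a, b):
    """Real-space overall correlation coefficient between two maps."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]

# Hypothetical envelope: a sphere in a 32 Å cubic cell sampled on a 16^3 grid
n = 16
idx = np.arange(n) - n // 2
x, y, z = np.meshgrid(idx, idx, idx, indexing="ij")
m = (x**2 + y**2 + z**2 <= 25).astype(float) + 0.05
m /= m.mean()
keep = resolution_filter(m.shape, (32.0, 32.0, 32.0), d_min=10.0)
m_ft = truncate(m, keep)
m_he = he_truncated(m, keep)
cc_ft = correlation(m, m_ft)
cc_he = correlation(m, m_he)
```

By construction the homographic exponential reconstruction is positive everywhere, whereas the truncated Fourier synthesis is not constrained to be.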

In the following, we describe the test reconstruction of a model envelope for porcine pancreatic elastase (PPE; Meyer et al., 1986 ; Schiltz et al., 1997 ). The model envelope m(x) was generated as explained in §4.2, using the PDB-deposited structure, with a masking radius R = 2 Å and a blurring factor B = 100 Å². A conventional Fourier truncation and a truncated homographic exponential model were used to reconstruct the model envelope, as explained above. As noted in §2, all envelopes have been normalized so that their average in the unit cell is unity.

Table 1 reports the real-space overall correlation coefficients between the model envelope and its Fourier-truncated and homographic exponential-truncated reconstructions. The Fourier-truncated envelope gives marginally higher CCs when the resolution used for truncating the coefficients is 25 Å or lower: this is because the amplitudes and phases of the very few coefficients retained are exact for this envelope and not for m_HE,d(x). Overall, the values of the CCs are very similar for the two methods, mainly because the correlation coefficients are dominated by the lowest resolution components, which are essentially correct in both maps.

Table 1
Porcine pancreatic elastase: real-space correlation coefficients between a model envelope m(x) and its reconstructions by truncated homographic exponential modelling [m_HE,d(x)] and truncated Fourier synthesis [m_FT,d(x)]

Resolution d (Å) (No. coeffs)	〈CC(m, m_HE,d)〉	〈CC(m, m_FT,d)〉
30 (7)	0.594	0.604
25 (12)	0.634	0.662
20 (22)	0.760	0.758
15 (51)	0.840	0.832

More informative is the visual inspection of sections of the envelopes. Fig. 1 shows a section in the [100] plane of the PPE crystal for the model envelope; Figs. 2 and 3 show the same section of the 15 Å, Fourier-truncated and homographic exponential truncated envelopes, respectively, m_FT,d=15Å(x) and m_HE,d=15Å(x). In Fig. 2, m_FT,d=15Å(x) shows the well known Fourier artefacts arising from truncation: negative ripples, peaky features and a smeared out protein–solvent boundary. In Fig. 3, m_HE,d=15Å(x) is positive everywhere, has a flatter protein ceiling, a steeper slope at the solvent–protein boundary and a flatter solvent floor, with few oscillations. The solvent regions match the ones in the model envelope.

Figure 1
Porcine pancreatic elastase, [100] section of the model envelope m(x). Section: 57.973 × 75.32 Å. The centre of the section is the macromolecule's centre of gravity. The density was obtained by masking with a radius of 2 Å around the model and blurring with a Gaussian temperature factor B = 100 Å².

Figure 2
Porcine pancreatic elastase, [100] section of the 15 Å truncated Fourier reconstruction of the model envelope, m_FT,d=15Å(x). Size and orientation as in Fig. 1. The density was obtained by truncating the Fourier spectrum of the model density at 15 Å [51 data; see (39)].

Figure 3
Porcine pancreatic elastase, [100] section of the 15 Å truncated homographic exponential reconstruction of the model envelope, m_HE,d=15Å(x). Size and orientation as in Fig. 1. The density was obtained by truncating the ω spectrum at 15 Å (51 data) and recomputing the homographic exponential model (37).

Table 2 contains the correlation coefficients between the Fourier coefficients of the model PPE envelope and the Fourier coefficients of the 15 and 20 Å truncated homographic exponential models. Fig. 4 plots the same Fourier coefficients in resolution ranges. The fluctuations observed are typical of the spectrum of macromolecular envelopes; still, the amplitudes of the Fourier components of m_HE,d=15Å(x) retain an average correlation coefficient as high as 0.306 up to 8.2 Å, owing to the extrapolation achieved by the exponential step.

Table 2
Porcine pancreatic elastase: reciprocal-space correlation coefficients between the Fourier components $[\cal F]$ [m(x)](h) of a model envelope and the Fourier components $[\cal F]$ [m_HE,d(x)](h) of its truncated homographic exponential reconstruction

	〈CC{ $[{\cal F}]$ [m(x)](h), $[{\cal F}]$ [m_HE,d(x)](h)}〉
Resolution (Å) (No. coeffs)	d = 15 Å	d = 20 Å
14.1 (61)	0.982	0.920
10.0 (93)	0.170	0.125
8.2 (125)	0.306	0.087
7.1 (136)	0.118	−0.007
6.3 (151)	0.042	−0.040
5.8 (166)	0.154	0.079
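A shell-wise comparison of the kind reported in Table 2 can be sketched as follows. This is only an illustration under stated assumptions: the correlation is taken to be a Pearson coefficient between amplitudes within each resolution shell, and the function name, shell edges and synthetic coefficients are hypothetical, not the data or the code used for the table.

```python
import numpy as np

def shell_correlations(f_model, f_recon, d_spacing, shell_edges):
    """Pearson correlation between amplitudes in each resolution shell.

    shell_edges lists shell limits in decreasing d (Angstrom); a shell
    contains the coefficients with d_high < d <= d_low.
    """
    ccs = []
    for d_low, d_high in zip(shell_edges[:-1], shell_edges[1:]):
        in_shell = (d_spacing <= d_low) & (d_spacing > d_high)
        amp_model = np.abs(f_model[in_shell])
        amp_recon = np.abs(f_recon[in_shell])
        ccs.append(np.corrcoef(amp_model, amp_recon)[0, 1])
    return np.array(ccs)

# Synthetic stand-in data: identical coefficients below the truncation
# resolution (d > 15 Angstrom), unrelated random coefficients beyond it.
rng = np.random.default_rng(0)
n_coeffs = 600
d_spacing = np.linspace(30.0, 5.0, n_coeffs)
f_model = rng.normal(size=n_coeffs) + 1j * rng.normal(size=n_coeffs)
f_noise = rng.normal(size=n_coeffs) + 1j * rng.normal(size=n_coeffs)
f_recon = np.where(d_spacing > 15.0, f_model, f_noise)

shell_edges = [30.0, 15.0, 10.0, 5.0]
cc = shell_correlations(f_model, f_recon, d_spacing, shell_edges)
print(cc)
```

With these synthetic inputs the first shell is perfectly correlated and the outer shells scatter around zero; any correlation that survives beyond the truncation limit in real data, as in the d = 15 Å column of Table 2, therefore reflects extrapolation by the model.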

Figure 4
Porcine pancreatic elastase. Fourier components of the model envelope 〈 $[{\cal F}]$ [m(x)](h)〉 and of its 15 Å truncated reconstructions 〈 $[{\cal F}]$ [m_FT(x)](h)〉 and 〈 $[{\cal F}]$ [m_HE(x)](h)〉. Fs were averaged in groups of ten data each. The correlation coefficients 〈CC{ $[{\cal F}]$ [m(x)](h), $[{\cal F}]$ [m_FT(x)](h)}〉 are not shown because they are 1.0 for d > 15 Å and zero for d < 15 Å.

6. Conclusions

The macromolecular envelope m_rand(x) is a continuous distribution and not a binary mask; even regions of low density (or low-density r.m.s.d., if a variance filter is used) can therefore be retained within the envelope, with a (possibly small) non-zero probability. The subsequent maximum-entropy modulation of the envelope itself therefore has a chance of building up density in the same regions. This has potential in structure completion by density-modification techniques. The only other published example of solvent flattening using real-space continuous probability distributions is the Gaussian distribution described by Terwilliger (1999). The map-based algorithm implemented in BUSTER (§5) differs from previously published ones in that the macromolecular envelope is a homographic exponential model and can therefore be parametrized with a few coefficients of ω while still retaining its `binary-like' character.

Supporting information

PDB reference: porcine pancreatic elastase, 1lvy

Footnotes

¹For a recent illustration of the use of the Babinet principle in bulk-solvent correction, see Guo et al. (2000).

²Suffice it to say here that $[{\cal F}]$ [m_rand(x)](h) is first computed as the product of $[{\cal F}]$ [χ_rand(x)](h) and $[{\cal F}]$ [G(x; B_frag)](h); the set of $[{\cal F}]$ [m_rand(x)](h) is then made periodic on the lattice reciprocal to the real-space crystallographic grid. These aliased structure factors undergo Fourier synthesis and m_rand(x) is sampled on the desired grid; the aliasing ensures that the m_rand(x) distribution is positive everywhere and free from Fourier-truncation artefacts.
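The procedure outlined in this footnote can be sketched in one dimension. The sketch rests on several assumptions that are not taken from the paper: the Gaussian transform is written as exp(−B s²/4) with s = h/a in Å⁻¹ and B in Å², only a few aliasing terms are summed (`n_alias`), and the function name, grid and mask are hypothetical.

```python
import numpy as np

def blurred_envelope_1d(mask, cell_length, b_factor, n_alias=3):
    """Blur a 0/1 mask with a Gaussian via aliased Fourier coefficients.

    The Gaussian transform exp(-B s^2 / 4) is summed over reciprocal-space
    translations h + k * n, so the product corresponds to a circular
    convolution with the periodized Gaussian sampled on the grid.
    """
    n = mask.size
    h = np.fft.fftfreq(n, d=1.0 / n)  # signed integer indices
    g_aliased = np.zeros(n)
    for k in range(-n_alias, n_alias + 1):
        s = (h + k * n) / cell_length  # reciprocal distance in 1/Angstrom
        g_aliased += np.exp(-b_factor * s ** 2 / 4.0)
    return np.fft.ifft(np.fft.fft(mask) * g_aliased).real

# Hypothetical example: 60 Angstrom cell edge sampled on 64 grid points,
# mask of radius 2 Angstrom around a single position, B = 100 Angstrom^2.
n_grid, cell_length = 64, 60.0
x = np.arange(n_grid) * cell_length / n_grid
mask = (np.abs(x - 30.0) <= 2.0).astype(float)
m_rand = blurred_envelope_1d(mask, cell_length, b_factor=100.0)
print("minimum of blurred envelope: %.3e" % m_rand.min())
```

Because the aliased coefficients are those of a non-negative sampled kernel, the blurred envelope remains non-negative up to rounding error, which is the property the footnote attributes to the aliasing step.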

Acknowledgements

This work was partially supported by a TMR Marie Curie Grant (to PR) and a Sponsored Research Agreement from Pfizer Central Research (to GB). We wish to thank one of the referees for extremely helpful reviewing of the manuscript.

References

Abrahams, J. P. (1997). Acta Cryst. D53, 371–376.
Abrahams, J. P. & Leslie, A. (1996). Acta Cryst. D52, 30–42.
Blow, D. M. & Crick, F. H. C. (1959). Acta Cryst. 12, 794–802.
Bricogne, G. (1993). Acta Cryst. D49, 37–60.
Bricogne, G. (1997). Methods Enzymol. 276, 361–423.
Bricogne, G. & Irwin, J. J. (1996). Proceedings of the CCP4 Study Weekend. Macromolecular Refinement, edited by E. Dodson, M. Moore, A. Ralph & S. Bailey, pp. 85–92. Warrington: Daresbury Laboratory.
Collaborative Computational Project, Number 4 (1994). Acta Cryst. D50, 760–763.
Guo, D., Blessing, R. H. & Langs, D. A. (2000). Acta Cryst. D56, 451–457.
Leslie, A. (1987). Acta Cryst. A43, 134–136.
Meyer, E. F., Radhakrishnan, R., Cole, G. M. & Presta, L. G. (1986). J. Mol. Biol. 189, 553–559.
Perrakis, A., Morris, R. & Lamzin, V. (1999). Nature Struct. Biol. 6(2), 458–463.
Ramin, M. (1999). PhD thesis. LURE, Université Paris XI, Orsay, France.
Reif, F. (1965). Fundamentals of Statistical and Thermal Physics, 1st ed., pp. 350–351. Singapore: McGraw–Hill.
Roversi, P., Irwin, J. & Bricogne, G. (1998). Acta Cryst. A54, 971–996.
Roversi, P., Irwin, J. & Bricogne, G. (2000). In Electron, Spin and Momentum Densities and Chemical Reactivities, edited by P. G. Mezey & B. E. Robertson. Dordrecht: Kluwer. In the press.
Schiltz, M., Shepard, W., Fourme, R., Prangé, T., de La Fortelle, E. & Bricogne, G. (1997). Acta Cryst. D53, 78–92.
Terwilliger, T. C. (1999). Acta Cryst. D55, 1863–1871.
Tronrud, D. E. (1997). Methods Enzymol. 277, 306–319.
Tronrud, D. E., Ten Eyck, L. F. & Matthews, B. W. (1987). Acta Cryst. A43, 489–501.
Wang, B.-C. (1985). Methods Enzymol. 112, 813–815.

© International Union of Crystallography. Prior permission is not required to reproduce short quotations, tables and figures from this article, provided the original authors and source are cited.
