- 1. Introduction
- 2. Basic elements of the SMAR algorithm
- 3. The SMAR residuals
- 4. The phase refinement modes in SMAR
- 5. Conclusions
- A1. Definition and peak analysis of the δM Fourier synthesis
- A2. The general equation for δ(M) direct methods
- A3. The δM Fourier synthesis expressed with the signs of the |E| − 〈|E|〉 differences in the phase terms
- Supporting information
- References
research papers
The general equation of δ direct methods and the novel SMAR algorithm residuals using the absolute value of ρ and the zero conversion of negative ripples
Institut de Ciència de Materials de Barcelona (CSIC), Campus de la UAB, 08193 Bellaterra, Catalonia, Spain
*Correspondence e-mail: jordi.rius@icmab.es
This paper is dedicated to the memory of Professor Carles Miravitlles.
The general equation δM(r) = ρ(r) + g(r) of δ direct methods (δ-GEQ) is established which, when expressed in the form δM(r) − ρ(r) = g(r), is used in the SMAR phasing algorithm [Rius (2020). Acta Cryst. A76, 489–493]. It is shown that SMAR is based on the alternating minimization of the two residuals Rρ(χ) = ∫V [ρ(χ) − ρ(Φ)sρ]² dV and Rδ(Φ) = ∫V mρ[δM(χ) − ρ(Φ)sρ]² dV in each iteration of the algorithm, achieved by maximizing the respective Sρ(χ) and Sδ(Φ) sum functions. While Rρ(χ) converges to zero, Rδ(Φ) converges, as predicted by the theory, to a positive quantity. These two independent residuals combine δM and ρ each with |ρ| while keeping the same unknowns, leading to overdetermination for diffraction data extending to atomic resolution. At the beginning of a SMAR phase refinement, the zero part of the mρ mask [resulting from the zero conversion of the slightly negative ρ(Φ) values] occupies ∼50% of the unit-cell volume and increases by ∼5% when convergence is reached. The effects on the residuals of the two SMAR phase refinement modes, i.e. using only density functions (slow mode) or supplementing them with atomic constraints (fast mode), are discussed in detail. Due to its architecture, the SMAR algorithm is particularly well suited for Deep Learning. Another way of using δ-GEQ is to solve it in the form ρ(r) = δM(r) − g(r), which provides a simple new derivation of the already known δM tangent formula, the core of the δ recycling phasing algorithm [Rius (2012). Acta Cryst. A68, 399–400]. The nomenclature used here is: (i) Φ is the set of φ phases of ρ to be refined; (ii) δM(χ) = FT−1{c(|E| − 〈|E|〉) × exp(iα)}, with χ = {α} the set of phases of |ρ| and c a scaling constant; (iii) mρ is a mask, being either 0 or 1; and (iv) sρ is 1 or −1 depending on whether ρ(Φ) is positive or negative.
Keywords: SMAR phasing algorithm; δ-GEQ; δ direct methods; SMAR residuals; ipp density modification; crystal structure solution; origin-free modulus sum function; δ recycling algorithm; δM tangent formula.
1. Introduction
Historically, direct methods were developed to solve small crystal structures directly from high-resolution single-crystal diffraction data. From their origins in the 1950s (Sayre, 1952; Cochran, 1952; Zachariasen, 1952), they have seen continuous advances over the years, driven not only by the steady increase in computing power but also by clever algorithms and efficient implementations. Computing power continues to increase, but the solution of larger structures at lower resolution is still hindered in the case of equal atoms and requires the use of ubiquitous model fragments and density modification. In their widespread successful application, the computing time is in any case short compared with the experimental effort involved in X-ray structure determination.
One of the most recent advances in direct methods has been the |ρ|-based algorithm (SMAR) maximizing the SM,|ρ| (= SMAR) sum function (Rius, 2020). It corresponds to the latest stage in the evolution of the SM origin-free modulus sum function (originally named ZR; Rius, 1993), which still involved triple-phase structure invariants. Even today, ZR is the simplest and certainly one of the most successful direct methods working exclusively in reciprocal space (Rius et al., 1995). However, it was almost immediately superseded by the Shake & Bake strategy alternating between reciprocal- and real-space refinements (dual-space recycling methods), which allowed the solution of larger crystal structures (Weeks et al., 1993; Miller et al., 1993). Dual-space methods do not eliminate phase relationships in reciprocal space but complement them with peak picking in real space as an extreme form of density modification. A comprehensive description at the height of development of dual-space recycling methods can be found in International Tables for Crystallography, Vol. F (Sheldrick et al., 2012).
Following this trend, triple-phase invariants (whose number becomes exceedingly large for large structures) were replaced in SM by the more efficient and accurate Fourier transforms (FT) (Rius et al., 2007; Rius, 2014). More recently, the ρ² function in SM was replaced by the mathematically simpler |ρ| function, and a new mask was introduced that takes the negative values of ρ into account. Both changes led to the SMAR phasing algorithm (Rius, 2020). Important aspects of its application have been covered in two recent publications. The first deals with its extension to larger crystal structures by introducing the fast inner-pixel preservation procedure (ipp) for density modification and by using initial phase values derived from the modulus function (Rius & Torrelles, 2021). In the second, SMAR was adapted to the solution of substructures in protein crystals (Rius & Torrelles, 2022). The present short introduction is intended to give a brief overview of the 30-year development of that particular type of direct methods which shares the implicit or explicit use of the ρ ≃ δM (or δP) approximation (Rius, 2012a). To distinguish this family of direct methods from the rest, and also to help in their identification, the general term 'δ direct methods' is coined in this publication.
The aim of this article is to complete the theoretical foundations of the SMAR phasing algorithm. The algorithm described by Rius (2020) and shown in Fig. 1 essentially consists of the iterative application of the phasing formula
$$\varphi_{\mathbf k}^{\rm new} = \text{phase of} \int_V \delta_M(\mathbf r,\chi)\,m_\rho(\mathbf r)\,s_\rho(\mathbf r)\exp(2\pi i\,\mathbf k\cdot\mathbf r)\,dV, \qquad (1)$$
where (i) φk is the phase of the kth structure factor of ρ and belongs to the set Φ of phases to be refined; (ii) δM(χ) is equal to FT−1{c(|E| − 〈|E|〉) × exp(iα)}, with χ = {α} being the set of phases of |ρ| and c a scaling constant; (iii) r is a position vector inside V, the unit-cell volume; and (iv) mρ(r)sρ(r) is a mask function defined in Table 1 that can take the values 1, 0 or −1. The representative test case shown in Table 1 (t = 2.5) indicates that the zero part of the mask is around 50% at the beginning of a phase refinement with initially random phase values; as the refinement converges, the zero part increases by up to 5%. Most of the remaining part of the mask is taken up by ones, as the proportion of −1 values is kept very small (<1.0%).
The SMAR phasing algorithm was originally derived from the SMAR sum function. A sum function such as SMAR generally corresponds to the mixed integral of a residual as e.g. in the case of expression (20) in relation to (21) in this article. Only when the residual is known can the derivation of the phasing algorithm be considered complete, which leads to a better understanding of it and enables, for example, the estimation of the minimum value of the residual. In this context, the use of the δM ≃ ρ approximation in Rius (2020) represented a limitation. To overcome this, the relationship between δM and ρ is worked out in Appendix A, resulting in δ-GEQ which, when modified accordingly, leads to one of the two desired SMAR residuals. The derivation of the second residual is simpler and introduces the phases corresponding to |ρ(Φ)| in the algorithm. To increase the readability of this article, δ-GEQ is derived separately in Appendix A and a summary thereof given in Section 2.2.
To complete this introduction, it is interesting to mention, particularly for newcomers, that density modification in the context of direct methods was efficiently introduced in ACORN (Foadi et al., 2000). ACORN and SHELXE (Sheldrick, 2002) both use density sharpening and negative density elimination. Later, in the VLD algorithm, a difference term and a flipping term were combined (Burla et al., 2010). Finally, another related and more modern development was SHELXT, which is more broadly associated with the charge-flipping algorithm but combines it with and with density modification at part of the peak positions, used to eliminate atoms at random without atoms (Sheldrick, 2015). One distinctive feature of the density modification in SMAR is the zero conversion of only slightly negative densities and the preservation of the inner peak pixels (Rius & Torrelles, 2021).
All calculations in this article have been performed with a modified version of the XLENS_v1 code (Rius, 2011). The diffraction data used in the test calculations correspond to:
(i) Actinomycin Z3 with 1228 (C, N, O) + 8 Cl atoms in the unit cell. According to the protocol in the Protein Data Bank (PDB code 1a7z), 4 Cl sites are partially occupied and the other 4 Cl atoms have a rather large B value, so that their scattering powers are considerably reduced. The minimum d spacing (dmin) is 0.95 Å; a = 14.803, b = 24.780 and c = 65.059 Å, P212121 (Schäfer et al., 1998).
(ii) Alpha1 peptide with 503 (C, N, O) + 1 Cl. dmin = 0.90 Å; a = 20.846, b = 20.909 and c = 27.057 Å, α = 102.40, β = 95.33 and γ = 119.62°, P1 (Privé et al., 1999).
(iii) Pep1 with 344 (C, N, O). dmin = 1.00 Å; a = 13.999, b = 21.602 and c = 21.615 Å, P212121 (Antel et al., 1995).
(iv) Suoa with 188 (C, N, O). dmin = 1.00 Å; a = 18.350, b = 21.441 and c = 8.350 Å, P212121 (Oliver & Strickland, 1984).
2. Basic elements of the SMAR algorithm
2.1. The ρ Fourier synthesis: its mask definition and general relationship to |ρ|
When solving crystal structures by direct methods from atomic-resolution X-ray diffraction intensity data, the electron-density function is normally calculated with the Fourier synthesis
$$\rho(\mathbf r) = \frac{1}{V}\sum_{\mathbf k}|E_{\mathbf k}|\exp(i\varphi_{\mathbf k})\exp(-2\pi i\,\mathbf k\cdot\mathbf r), \qquad (2)$$
where |Ek| is the modulus and φk is the phase value of the (quasi-)normalized structure factor of the kth reflection (Main, 1975). The moduli and phases of all reflections form the {|E|} and Φ = {φ} sets, respectively. For clarity, the phase type (and, where relevant, the modulus type) used in the Fourier synthesis is added to the function name when required to avoid confusion, e.g. ρ(r, ΦT) specifies that ρ(r) is calculated with ΦT (the index T stands for true phase values).
Expression (2) is stated in terms of the normalized scattering factors, i.e. $f_j/(\sum_{i=1}^{N} f_i^2)^{1/2}$ for an arbitrary atom j (with fj being its normal scattering factor including the Debye–Waller factor). For the sake of simplicity, a crystal structure with N equal atoms in the unit cell is assumed throughout, so that the normalized scattering factor reduces to $N^{-1/2}$.
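To make the synthesis (2) concrete, the following minimal sketch builds a one-dimensional toy analogue: N equal point atoms on a periodic grid, quasi-normalized structure factors with normalized scattering factor N^−1/2, and a resolution cut-off. It is an illustration under stated assumptions, not the XLENS_v1 implementation: it uses NumPy's DFT sign convention (the complex conjugate of the crystallographic one), it omits the k = 0 term, and all function names are hypothetical.

```python
import numpy as np


def normalized_structure_factors(positions, n_grid):
    """Quasi-normalized E_k of N equal point atoms on a periodic 1D grid.

    With equal atoms the normalized scattering factor is N**-0.5, so
    E_k = N**-0.5 * sum_j exp(-2*pi*i*k*x_j/n) (NumPy sign convention).
    """
    density = np.zeros(n_grid)
    density[list(positions)] = 1.0
    return np.fft.fft(density) / np.sqrt(len(positions))


def observed_reflections(n_grid, k_max):
    """Boolean mask of the 'measured' reflections 0 < |k| <= k_max."""
    k = np.rint(np.fft.fftfreq(n_grid) * n_grid)  # integer reflection indices
    return (np.abs(k) <= k_max) & (k != 0)


def rho_synthesis(moduli, phases, observed):
    """Truncated Fourier synthesis of |E_k| exp(i phi_k) over the observed k."""
    coeffs = np.where(observed, moduli * np.exp(1j * phases), 0.0)
    # Friedel symmetry of the coefficients makes the synthesis real.
    return np.fft.ifft(coeffs).real


# Four equal atoms on a 64-point grid, resolution cut at |k| <= 10.
positions = [5, 20, 33, 47]
e_true = normalized_structure_factors(positions, 64)
observed = observed_reflections(64, k_max=10)
rho_true = rho_synthesis(np.abs(e_true), np.angle(e_true), observed)
```

Because the k = 0 term is left out, rho_true has zero mean, so the positive atomic peaks are necessarily accompanied by negative regions; the cut at |k| ≤ 10 adds the truncation ripples discussed below.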
Due to the limited number of Fourier terms (a consequence of the finite number of measured intensities), the ρ(r, ΦT) synthesis is affected by Fourier series truncation effects. These termination effects are mainly reflected in the broadening of the atomic peaks, each consisting of a large, spherically symmetric, positive central part (= CORE) surrounded by negative and positive waves (ripples) that decrease as one moves away from the peak center. For point-like atoms (for which the broadening due to scattering factors and thermal vibration is largely removed), the limiting spherical surface of the CORE lies ∼0.72 × dmin (Å) from the peak center [it corresponds to the first zero of the T3 in Lipson & Cochran (1966)]. For locations lying outside neighboring COREs, the ripple contributions of neighboring atoms add up and result in slightly negative and slightly positive zones (SNZs and SPZs, respectively). The ρ(r, Φ) values contained in the interval [−tσρ, 0) make up the SNZs (σρ and t are defined in Table 1). Since the ρ distribution in the crystal is positive definite, the probability that the SNZs accommodate the COREs of atomic peaks is zero, so this information can be introduced in the form of an mρ,t mask which is 0 for all negative ρ in the [−tσρ, 0) interval (negative ripple conversion to zero) and 1 elsewhere. As shown in Table 1, the mask value depends on ρ(r) and on the t parameter (since t is always 2.5 in this work, it is dropped from mρ,t for notational simplicity and we simply write mρ). Table 1 also shows that the zero part of the mask extends to at least 50% of the unit-cell volume for t ≃ 2.5 (for comparison, for t ≥ 10 the threshold is so low that the mask values of all negative ρ values become zero).
The relationship between the syntheses |ρ(Φ)| and ρ(Φ) is given by the equality
$$|\rho(\mathbf r,\Phi)| = \rho(\mathbf r,\Phi)\,s_\rho(\mathbf r), \qquad (3)$$
in which sρ(r) is 1 or −1 depending on whether ρ(r, Φ) is positive or negative. This equality is completely general, so the product ρ(r, Φ)sρ(r) can always replace |ρ(r, Φ)| in the derivation of the residuals. Finally, since ρ is positive definite and the diffraction data are assumed to reach atomic resolution, it is clear that for Φ = ΦT the negative values of ρ(r, ΦT) in (3) are always small in magnitude, so that (3) becomes ρ(r, ΦT) ≃ |ρ(r, ΦT)|. If we denote the phase set of the Fourier coefficients of the |ρ(r, ΦT)| synthesis by χT, then ΦT ≅ χT.
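The mask described above and the sign function of (3) can be sketched as follows. The sketch assumes that σρ is the root-mean-square value of the map, since Table 1 (where σρ is defined) is not reproduced here; the function name is hypothetical.

```python
import numpy as np


def smar_masks(rho, t=2.5, sigma=None):
    """Return (m_rho, s_rho) for a density map rho.

    m_rho is 0 for slightly negative values (-t*sigma <= rho < 0, i.e. the
    negative-ripple conversion to zero) and 1 elsewhere; s_rho is +1 where
    rho >= 0 and -1 otherwise. sigma defaults to the r.m.s. of rho, which is
    an assumption of this sketch.
    """
    rho = np.asarray(rho, dtype=float)
    if sigma is None:
        sigma = np.sqrt(np.mean(rho ** 2))
    slightly_negative = (rho < 0.0) & (rho >= -t * sigma)
    m_rho = np.where(slightly_negative, 0, 1)
    s_rho = np.where(rho >= 0.0, 1, -1)
    return m_rho, s_rho


rho = np.array([2.0, 0.5, -0.1, -0.2, -5.0, 0.0])
m_rho, s_rho = smar_masks(rho, t=2.5, sigma=1.0)
combined = m_rho * s_rho                      # takes the values 1, 0 or -1
assert np.allclose(np.abs(rho), rho * s_rho)  # equality (3): |rho| = rho * s_rho
```

Only the very negative value (−5.0, below −tσρ) keeps a nonzero mask entry, and its product mρsρ is −1, matching the three mask values listed in the Introduction.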
2.2. The general equation of δ and its different forms
The δM Fourier synthesis defined by
$$\delta_M(\mathbf r) = c\sum_{\mathbf k}(|E_{\mathbf k}| - \langle|E|\rangle)\exp(i\varphi_{\mathbf k})\exp(-2\pi i\,\mathbf k\cdot\mathbf r), \qquad (4)$$
with c a scaling constant, is studied in detail in Appendix A, showing that it contains two types of positive peak (A and B). Table 2 lists their peak strengths and positions. The stronger peaks of type A correspond to the atomic peaks of ρ (which are resolved in the Fourier map). The N − 1 times weaker peaks of type B are located at positions other than the atomic positions (any coincidence is accidental). These are the main constituents of the function g(r) and, due to their large number, must be severely overlapped in the unit cell. The standard form of the general equation of δ direct methods (δ-GEQ) corresponds to the sum of both contributions ρ(r) and g(r) [equation (43)]. However, δ-GEQ can be used in other forms. For example, it can be solved for ρ, so that δ-GEQ takes the form
$$\rho(\mathbf r) = \delta_M(\mathbf r) - g(\mathbf r). \qquad (5)$$
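In the same one-dimensional toy setting, the δM synthesis defined above differs from the ρ synthesis only in that the moduli |Ek| are replaced by |Ek| − 〈|E|〉. The sketch below averages 〈|E|〉 over the observed reflections only, keeps c as a free parameter (its value is fixed in Appendix A and not reproduced here) and uses NumPy's DFT sign convention; the function name is hypothetical.

```python
import numpy as np


def delta_m_synthesis(moduli, phases, observed, c=1.0):
    """Sketch of delta_M(r) = c * sum_k (|E_k| - <|E|>) exp(i phi_k) exp(...).

    <|E|> is averaged over the observed reflections only; unobserved
    reflections (outside the resolution limit, or k = 0) contribute nothing.
    """
    moduli = np.asarray(moduli, dtype=float)
    mean_e = moduli[observed].mean()
    coeffs = np.where(observed, (moduli - mean_e) * np.exp(1j * phases), 0.0)
    return c * np.fft.ifft(coeffs).real


# Toy data: four equal point atoms on a 64-point grid, |k| <= 10 observed.
n_grid = 64
k = np.rint(np.fft.fftfreq(n_grid) * n_grid)
observed = (np.abs(k) <= 10) & (k != 0)
density = np.zeros(n_grid)
density[[5, 20, 33, 47]] = 1.0
e_true = np.fft.fft(density) / 2.0          # 1/sqrt(N) with N = 4
delta_m = delta_m_synthesis(np.abs(e_true), np.angle(e_true), observed)
```

Reflections whose modulus equals 〈|E|〉 drop out of δM, so if all observed moduli were equal δM would vanish identically. This is a quick sanity check of any implementation.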
One obvious difficulty here is how to handle the unknown g function. This difficulty can be circumvented by introducing the mask mΔδ (being either 0 or 1), which is based on the observations that (i) δM and ρ have their strong peaks at the atomic positions; and (ii) g is formed by the positive, strongly overlapped peaks of type B, which are much more numerous but also much smaller than the peaks in δM and ρ. As shown in Fig. 2, the mΔδ mask is obtained by expressing the threshold Δδ in terms of the computable σ(δM) by means of Δδ = t1σ(δM), with t1 ≃ 2.5, such that mΔδ(r) = 1 for δM(r) ≥ Δδ and mΔδ(r) = 0 otherwise. This can be expressed mathematically by
$$\rho(\mathbf r) \simeq K\,\delta_M(\mathbf r)\,m_{\Delta\delta}(\mathbf r), \qquad (6)$$
with K being a suitable scaling constant. Fourier transforming both sides of (6), and since E and ρ are linked by the Fourier transform E = FT(ρ), the formula
$$|E_{\mathbf k}|\exp(i\varphi_{\mathbf k}) \simeq K\int_V \delta_M(\mathbf r)\,m_{\Delta\delta}(\mathbf r)\exp(2\pi i\,\mathbf k\cdot\mathbf r)\,dV \qquad (7)$$
results. The angular part of (7) corresponds to the δM tangent formula which forms the core of the δ recycling algorithm (Rius, 2012a,b). It has been successfully applied to X-ray diffraction data from small crystal structures, to 3D electron diffraction data (Rius et al., 2013; Capitani et al., 2014) and, due to its robustness, to synchrotron tts (tts = through the substrate) microdiffraction data (Rius et al., 2017). Some considerations regarding the implementation of the δM tangent formula are given in Section A3.
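The angular part of (7) can be sketched as a single phase-update step: threshold δM at t1σ(δM) to obtain mΔδ, then take the phases of the Fourier transform of δM mΔδ. The sketch assumes σ(δM) is the r.m.s. value of the map and uses NumPy's forward FFT in place of FT; it is not the published δ recycling implementation, and the function name is hypothetical.

```python
import numpy as np


def delta_m_tangent_step(delta_m, t1=2.5):
    """One deltaM tangent-formula step: phases of FT{delta_M * m_Delta_delta}.

    m_Delta_delta = 1 where delta_M >= t1 * sigma(delta_M) and 0 otherwise;
    sigma(delta_M) is taken as the r.m.s. of the map (an assumption).
    Returns the new phases and the mask.
    """
    delta_m = np.asarray(delta_m, dtype=float)
    sigma = np.sqrt(np.mean(delta_m ** 2))
    m_dd = (delta_m >= t1 * sigma).astype(float)
    new_phases = np.angle(np.fft.fft(delta_m * m_dd))
    return new_phases, m_dd
```

For fixed moduli, the phases returned here maximize the overlap between ρ(Φ) and δM mΔδ (a consequence of Parseval's theorem), which is one way of reading how (6) is enforced in each recycling cycle.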
SMAR uses another form of δ-GEQ in which g is isolated, namely
$$\delta_M(\mathbf r,\Phi_T) - \rho(\mathbf r,\Phi_T) = g(\mathbf r). \qquad (8)$$
According to ΦT ≅ χT at the end of Section 2.1, (8) can be expressed in terms of χT (= the set of phases corresponding to |ρ(ΦT)|), so that
$$\delta_M(\mathbf r,\chi_T)\,m_\rho(\mathbf r) - \rho(\mathbf r,\chi_T)\,m_\rho(\mathbf r) = g(\mathbf r)\,m_\rho(\mathbf r), \qquad (9)$$
where both sides are multiplied by mρ [which is derived from ρ(ΦT)]. Expression (9) is the basic equation for one of the two SMAR residuals (Rδ). Note the beneficial effect of introducing the mask mρ: since the zero part of the mask is ≥50% of the unit-cell volume, the unwanted contribution of g in (9) is suppressed for at least half of the unit cell.
3. The SMAR residuals
Each iteration of the SMAR algorithm consists of two differentiated parts, ending each part with the application of the corresponding phasing formula (upper-left and lower-right corners of Fig. 1). In this section, the two SMAR residuals leading to these phasing formulae are determined. In the following r is omitted unless absolutely necessary.
3.1. The Rρ(χ) residual
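The |ρ(Φ)| density function results from applying the absolute value operator to the ρ(Φ) Fourier synthesis. The structure factors of |ρ(Φ)| correspond to its Fourier transform,
$$\xi_{\mathbf k} = |\xi_{\mathbf k}|\exp(i\alpha_{\mathbf k}) = \int_V |\rho(\mathbf r,\Phi)|\exp(2\pi i\,\mathbf k\cdot\mathbf r)\,dV, \qquad (10)$$
with the moduli and phase values of the structure factors being globally denoted {|ξ|} and χ, respectively. The inverse Fourier transform of both sides of (10) yields
$$\rho(\mathbf r,\{|\xi|\},\chi) = |\rho(\mathbf r,\Phi)|. \qquad (11)$$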
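For χ = χT (and Φ = ΦT) it can be assumed that {|ξ|} ≃ {|E|}, since ρ(ΦT) ≃ |ρ(ΦT)|. Consequently, the integral
$$R_\rho(\chi) = \int_V [\rho(\mathbf r,\{|E|\},\chi) - \rho(\mathbf r,\{|\xi|\},\chi)]^2\,dV \qquad (12)$$
must be close to zero for χ = χT, which corresponds to the minimum of Rρ. Simplifying the notation of ρ({|E|}, χ) to ρ(χ) and replacing ρ({|ξ|}, χ) first by |ρ(Φ)| according to (11) and then by ρ(Φ)sρ according to (3), integral (12) takes the simpler form
$$R_\rho(\chi) = \int_V [\rho(\chi) - \rho(\Phi)s_\rho]^2\,dV. \qquad (13)$$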
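During the phase refinement, ρ(Φ)sρ = |ρ(Φ)| is never negative. To find the new χ set minimizing Rρ(χ), the integrand of (13) is expanded into three integrals. The two integrals with integrands |ρ(Φ)|² and ρ²(χ) are both equal to (1/V)Σk|Ek|² (Parseval's theorem) and hence phase independent, so that
$$R_\rho(\chi) = \frac{2}{V}\sum_{\mathbf k}|E_{\mathbf k}|^2 - 2\int_V \rho(\chi)\,\rho(\Phi)\,s_\rho\,dV; \qquad (14)$$
however, the third one, i.e. the function
$$S_\rho(\chi) = \int_V \rho(\chi)\,\rho(\Phi)\,s_\rho\,dV,$$
is phase dependent. The maximum of a functional like Sρ(χ) [which is equivalent to the minimum of Rρ(χ) due to the minus sign in (14)] can be found by solving the condition for an extremum, ∂Sρ/∂α = 0, ∀ α ∈ χ, which, in parallel to Rius et al. (2007), yields the χ phasing formula,
$$\alpha_{\mathbf k}^{\rm new} = \text{phase of} \int_V \rho(\mathbf r,\Phi)\,s_\rho(\mathbf r)\exp(2\pi i\,\mathbf k\cdot\mathbf r)\,dV. \qquad (15)$$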
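The χ step and the residual (13) can be sketched on a one-dimensional grid. Because |ρ(Φ)| = ρ(Φ)sρ by (3), the transform of ρ(Φ)sρ is simply the transform of |ρ(Φ)|. In this sketch the integral is replaced by a grid mean, NumPy's DFT convention is assumed, the moduli are random stand-ins for {|E|}, and the function names are hypothetical. For fixed Φ, Parseval's theorem guarantees that the updated α cannot increase the discretized Rρ.

```python
import numpy as np


def synthesis(moduli, phases):
    """Fourier synthesis with the given moduli and phases (NumPy inverse DFT)."""
    return np.fft.ifft(moduli * np.exp(1j * phases)).real


def chi_phasing(rho_phi):
    """chi phasing formula: alpha_k = phase of FT{rho(Phi) s_rho} = FT{|rho(Phi)|}."""
    return np.angle(np.fft.fft(np.abs(rho_phi)))


def r_rho(moduli, alpha, rho_phi):
    """Discretized R_rho(chi): grid mean of [rho(chi) - rho(Phi) s_rho]^2."""
    return float(np.mean((synthesis(moduli, alpha) - np.abs(rho_phi)) ** 2))


rng = np.random.default_rng(1)
moduli = np.abs(np.fft.fft(rng.normal(size=128)))   # stand-in for {|E|}
phi = np.angle(np.fft.fft(rng.normal(size=128)))    # current phase set Phi
rho_phi = synthesis(moduli, phi)
alpha_new = chi_phasing(rho_phi)
```

Taking moduli and phases from transforms of real arrays keeps the coefficients Friedel symmetric, so every synthesis in the sketch is genuinely real.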
3.2. The Rδ(Φ) residual
The residual Rδ is obtained from the left-hand side of (9) after generalizing χT to χ. This generalization entails two changes: (i) δM(r, χT) is simply changed to δM(r, χ), since in both cases the Fourier coefficients of δM contain the observed |E| − 〈|E|〉 values; and (ii) ρ(r, χT) is changed to ρ(r, {|ξ|}, χ). However, since ρ(r, {|ξ|}, χ) = |ρ(r, Φ)| = ρ(r, Φ)sρ(r) because of (11) and then (3), the selected generalized form is ρ(r, Φ)sρ(r) [this selection ensures that ρ(Φ) enters the residual expression (17)]. By applying these two changes to the left-hand side of (9), it becomes
$$\delta_M(\mathbf r,\chi)\,m_\rho(\mathbf r) - \rho(\mathbf r,\Phi)\,m_\rho(\mathbf r)\,s_\rho(\mathbf r). \qquad (16)$$
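Integration of (16) after squaring gives the Rδ residual (using mρ² = mρ),
$$R_\delta(\Phi) = \int_V [\delta_M(\chi)m_\rho - \rho(\Phi)m_\rho s_\rho]^2\,dV = \int_V m_\rho[\delta_M(\chi) - \rho(\Phi)s_\rho]^2\,dV, \qquad (17)$$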
where ρ(Φ)mρsρ is never negative [= |ρ(Φ)|mρ]. Of methodological importance is the minimum value of Rδ, which is attained for Φ = ΦT. An estimate of this value is obtained by squaring and integrating the right-hand side of (9), namely
$$R_\delta(\Phi_T) \simeq \int_V g^2\,m_\rho\,dV. \qquad (18)$$
In this integral, g = δM(ΦT) − ρ(ΦT) given in (8) and ρ(ΦT) used to calculate mρ are different functions with different peak distributions. Consequently, the samples of g² at the points where mρ = 1 can be assumed to be random, allowing the factorization of 〈mρ〉 from the integral:
$$R_\delta(\Phi_T) \simeq \langle m_\rho\rangle\int_V g^2\,dV = \langle m_\rho\rangle\,I_{g2}. \qquad (19)$$
The value of the normalized Ig2 is given by equation (49) in Appendix B, i.e. Ig2 ≃ 1.12.
According to (19), Rδ does not converge to zero but to the positive value Rδ(ΦT) ≃ 1.12 × 〈mρ〉. Since it is known from Table 1 that 〈mρ〉 is ∼0.45 at the end of a converging refinement, the approximate value of Rδ(ΦT) should be 0.45 × 1.12 ≃ 0.50.
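The two estimates of Rδ(ΦT) follow from (19) by simple arithmetic. The sketch below reproduces them with the values quoted in the text: Ig2 ≃ 1.12, 〈mρ〉 ≃ 0.45 in the slow convergence mode, and 〈mρ〉 = 27N/Nvox for Actinomycin Z3 in the fast convergence mode of Section 4.2, with N = 1228 + 8 = 1236. The function names are hypothetical.

```python
IG2_NORMALIZED = 1.12  # normalized I_g2 value quoted in the text


def r_delta_minimum(mean_mask, ig2=IG2_NORMALIZED):
    """Equation (19): R_delta(Phi_T) ~ <m_rho> * I_g2."""
    return mean_mask * ig2


def mean_mask_fast_mode(n_atoms, n_voxels, voxels_per_peak=27):
    """<m_rho> after ipp: 27 preserved voxels per expected atom (Section 4.2)."""
    return voxels_per_peak * n_atoms / n_voxels


slow = r_delta_minimum(0.45)
fast = r_delta_minimum(mean_mask_fast_mode(1228 + 8, 664875))
print(f"slow mode: {slow:.2f}, fast mode: {fast:.2f}")  # slow mode: 0.50, fast mode: 0.06
```

The order-of-magnitude drop between the two modes comes entirely from 〈mρ〉, since Ig2 does not depend on the mode.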
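Once the residual Rδ is defined and its minimum value known, the last step is to find the Φ phase set that minimizes Rδ. For this purpose, (17) is transformed (using sρ² = 1) into the sum
$$R_\delta = P + Q - 2S_\delta, \qquad (20)$$
with P, Q and Sδ being the following integrals:
$$S_\delta(\Phi) = \int_V \delta_M(\chi)\,m_\rho\,s_\rho\,\rho(\Phi)\,dV, \qquad (21)$$
$$P = \int_V m_\rho\,\rho^2(\Phi)\,dV, \qquad (22)$$
$$Q = \int_V m_\rho\,\delta_M^2(\chi)\,dV. \qquad (23)$$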
The presence of the mask mρ complicates the evaluation of these integrals. Section 3.2.1 evaluates them with the help of experimental information; its principal conclusion is that minimizing Rδ is essentially equivalent to maximizing Sδ. Knowing this, one only needs to find the desired maximum of Sδ(Φ) by solving the condition for an extremum, ∂Sδ/∂φ = 0, ∀ φ ∈ Φ. By expressing ρ(Φ) in (21) as a Fourier synthesis, one obtains
$$S_\delta(\Phi) = \frac{1}{V}\sum_{\mathbf k}|E_{\mathbf k}|\exp(i\varphi_{\mathbf k})\int_V \delta_M(\chi)\,m_\rho\,s_\rho\exp(-2\pi i\,\mathbf k\cdot\mathbf r)\,dV \qquad (24)$$
and, in parallel to Rius et al. (2007) and Rius (2020), the application of the condition for an extremum to (24) gives the Φ (or SMAR) phasing formula,
$$\varphi_{\mathbf k}^{\rm new} = \text{phase of} \int_V \delta_M(\mathbf r,\chi)\,m_\rho(\mathbf r)\,s_\rho(\mathbf r)\exp(2\pi i\,\mathbf k\cdot\mathbf r)\,dV. \qquad (25)$$
For simplicity, the Fourier-transformed function δMmρsρ is denoted ρ′ in the slow convergence mode and ρ′′ in the fast convergence mode (see Section 4).
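A matching sketch of the Φ step (25) in the slow convergence mode uses the same toy setting: ρ′ = δM mρ sρ is formed from the current ρ(Φ), and the new phases are those of its transform. The masks are held fixed while the phases are updated, as in the extremum condition; σρ is assumed to be the r.m.s. of ρ(Φ), the moduli and δM are random stand-ins, and all names are hypothetical.

```python
import numpy as np


def synthesis(moduli, phases):
    """Fourier synthesis with the given moduli and phases (NumPy inverse DFT)."""
    return np.fft.ifft(moduli * np.exp(1j * phases)).real


def smar_masks(rho, t=2.5):
    """m_rho (0 for -t*sigma <= rho < 0, else 1) and s_rho (sign of rho)."""
    sigma = np.sqrt(np.mean(rho ** 2))          # assumed definition of sigma_rho
    m_rho = np.where((rho < 0.0) & (rho >= -t * sigma), 0.0, 1.0)
    s_rho = np.where(rho >= 0.0, 1.0, -1.0)
    return m_rho, s_rho


def phi_phasing(delta_m, rho_phi, t=2.5):
    """Phi (SMAR) phasing formula: phi_k = phase of FT{delta_M m_rho s_rho}."""
    m_rho, s_rho = smar_masks(rho_phi, t)
    rho_prime = delta_m * m_rho * s_rho         # rho' of the slow convergence mode
    return np.angle(np.fft.fft(rho_prime)), m_rho, s_rho


def s_delta(delta_m, rho_phi, m_rho, s_rho):
    """Discretized S_delta: grid mean of delta_M m_rho s_rho rho(Phi)."""
    return float(np.mean(delta_m * m_rho * s_rho * rho_phi))


def r_delta(delta_m, rho_phi, m_rho, s_rho):
    """Discretized R_delta: grid mean of m_rho [delta_M - rho(Phi) s_rho]^2."""
    return float(np.mean(m_rho * (delta_m - rho_phi * s_rho) ** 2))


rng = np.random.default_rng(2)
moduli = np.abs(np.fft.fft(rng.normal(size=128)))   # stand-in for {|E|}
delta_m = rng.normal(size=128)                      # stand-in for delta_M(chi)
phi_old = np.angle(np.fft.fft(rng.normal(size=128)))
rho_old = synthesis(moduli, phi_old)
phi_new, m_rho, s_rho = phi_phasing(delta_m, rho_old)
rho_new = synthesis(moduli, phi_new)
```

In an actual iteration the masks would then be recomputed from the updated ρ(Φ); in the fast convergence mode ρ′ would additionally pass through ipp before being transformed.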
3.2.1. Evolution of Rδ, Sδ, P and Q during the SMAR phase refinements
First, the values of the integrals (21), (22) and (23) are normalized by division with
which only depends on the |E| values and is therefore computable. The values of −2Sδ, P and Q during the phase refinement were determined with the SMAR phasing algorithm already implemented in XLENS_v1 (Rius, 2020). The density function values used in the estimation of (21), (22) and (23) are those available before applying ipp (Fig. 1). Tables 3 and 4 show the evolutions for Actinomycin Z3, Suoa, Pep1 and Alpha1 peptide (each group of four numbers given in this subsection always refers to this order of test structures; the output files of the test calculations are available in the supporting information). For Actinomycin Z3, the evolutions of 2Sδ, ΔP and ΔQ are also represented in Fig. 3. The evolution of the different integrals can be summarized as follows:
(i) Integral Sδ: When starting from random phase values, the initial −2Sδ values are always close to zero for all test structures and become −1.17, −1.18, −1.16 and −1.09 at the end of the respective convergent refinements.
(ii) Integral P: The initial value P0 of integral P is 0.50 for all test structures and, as the phase refinement progresses, the difference ΔP = P − P0 increases. ΔP is approximately proportional to 2Sδ, with slopes 〈kΔP〉 equal to 0.171(6), 0.155(6), 0.159(6) and 0.162(5) for the four test structures (Tables 3 and 4). Consequently, the following empirical linear relationship between P and 2Sδ can be established:
$$P \simeq P_0 + \langle k_{\Delta P}\rangle\,2S_\delta. \qquad (27)$$
(iii) Integral Q: To understand the significance of integral Q, the integral
is also calculated for each test structure [it only depends on the (|E| − 〈|E|〉)² quantities]. The corresponding SDEL values are 1.762, 1.790, 1.756 and 1.728. According to Tables 3 and 4, the initial Q values are Q0 = 0.89, 0.89, 0.88 and 0.89, i.e. Q0 ≃ 0.50 × SDEL. In addition, during the respective phase refinements, the largest Q − Q0 differences are only 0.05, 0.01, 0.02 and 0.01. Consequently, it can be assumed that Q ≃ Q0.
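By taking all these results into account, Rδ can be simplified to
$$R_\delta \simeq P_0 + Q_0 - (1 - \langle k_{\Delta P}\rangle)\,2S_\delta. \qquad (29)$$
Since P0, Q0 and 〈kΔP〉 can be considered nearly constant during the phase refinement, and 〈kΔP〉 (≃0.16) is much smaller than 1, it follows from (29) that minimizing Rδ is essentially equivalent to maximizing Sδ.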
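The expansion (20) and its simplification (29) can be checked numerically. The sketch below assumes the identification P = ∫V mρ ρ²(Φ) dV and Q = ∫V mρ δM²(χ) dV (consistent with the behaviour of P0 and Q0 described above), replaces integrals by grid means, and uses the Actinomycin Z3 numbers of this section only as an arithmetic example; the function names are hypothetical.

```python
import numpy as np


def residual_terms(delta_m, rho_phi, m_rho, s_rho):
    """Grid-mean estimates of P, Q and S_delta (assumed identification, see text)."""
    p = float(np.mean(m_rho * rho_phi ** 2))
    q = float(np.mean(m_rho * delta_m ** 2))
    s = float(np.mean(delta_m * m_rho * s_rho * rho_phi))
    return p, q, s


def r_delta_simplified(p0, q0, k_dp, two_s_delta):
    """Equation (29): R_delta ~ P0 + Q0 - (1 - <k_dP>) * 2 S_delta."""
    return p0 + q0 - (1.0 - k_dp) * two_s_delta


# Exact identity behind (20): since s_rho**2 == 1, R_delta = P + Q - 2 S_delta.
rng = np.random.default_rng(3)
rho_phi = rng.normal(size=200)
delta_m = rng.normal(size=200)
m_rho = rng.integers(0, 2, size=200).astype(float)
s_rho = np.where(rho_phi >= 0.0, 1.0, -1.0)
p, q, s = residual_terms(delta_m, rho_phi, m_rho, s_rho)
r_direct = float(np.mean(m_rho * (delta_m - rho_phi * s_rho) ** 2))

# Actinomycin Z3 numbers quoted above: P0 = 0.50, Q0 = 0.89, <k_dP> = 0.171,
# and 2 S_delta = 1.17 at convergence.
r_end = r_delta_simplified(0.50, 0.89, 0.171, 1.17)
```

The simplified value (≃0.42) is of the same order as the experimental Rδ(ΦT) = 0.48 reported for Actinomycin Z3 in Section 4.1; note that (29) neglects Q − Q0, which reaches 0.05 for this structure.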
4. The phase refinement modes in SMAR
Since most experimental results of SMAR have already been discussed by Rius & Torrelles (2021, 2022) and in Section 3.2.1 of this contribution, only a selection of points directly related to the topic of this article are treated here, grouped according to the convergence mode.
4.1. The slow convergence mode
This mode only works with density functions, that is, the positions of the atomic peaks are not used. This mode requires the inclusion of the Fourier terms of all reflections (strong + weak) in the calculation of ρ and δM. The ρ′ = δMmρsρ values entered in the SMAR phasing formula are obtained as follows:
(i) For slightly negative ρ values (amounting to 50–55% of the unit cell), the ρ′ values are 0.
(ii) For positive ρ values (regardless of their strength and representing 45–50% of the unit cell), ρ′ is equal to δM.
(iii) Only for very negative ρ values (<1% of the unit cell for t ≃ 2.5), ρ′ is equated to −δM (the minus sign multiplying δM tends to restore the sign of the very negative ρ value).
In summary, ρ′ either takes the unrestricted values δM or −δM or is fixed at zero. The mask used by the SMAR algorithm results in smooth phase refinements based only on density functions. Moreover, the experimental Rδ(ΦT) values calculated at the end of converging phase refinements using (20) are 0.48, 0.42, 0.43 and 0.47 for Actinomycin Z3, Suoa, Pep1 and Alpha1 peptide, respectively (Tables 3 and 4). These values agree with the theoretical estimate Rδ(ΦT) ≃ 0.50 found in Section 3.2.
Regarding the very negative densities, some of them are due to the wrong input model (generated by the random starting phases) and disappear during the convergence process. The reductions observed in ρ′ are 0.17% → 0.08% for Actinomycin Z3, 0.18% → 0.06% for Alpha1 peptide, 0.18% → 0.02% for Pep1 and 0.17% → 0.01% for Suoa (Tables 3 and 4). To get an idea of the effect of t on the phasing of the intensity data of the four test structures, the numbers of successful trials (out of 25 per structure) were summed over the four structures for t = 1.5, 2.0, 2.5 and 10.0 (in the last case, all negative densities are converted to zero). The respective sums are 34, 76, 69 and 59 (Table S1 in the supporting information). The largest sums are obtained for t = 2.0 and 2.5 and the smallest for t = 1.5. For t = 10.0, the resulting sum is worse than for t = 2.0 or 2.5. These results suggest that (i) the best t values lie between 2.0 and 2.5, and (ii) although the very negative densities represent only a small percentage of the unit cell, setting them equal to zero is not beneficial for phase refinement. A possible explanation of the physical meaning of the very negative densities can be found in Section 3.3 of Rius (2020). In any case, a future comprehensive study focusing on this point would be useful.
4.2. The fast convergence mode
In this mode, which is the default operating mode of the SMAR algorithm, the part working only with density functions is supplemented by an additional step in which ρ′ is modified by the ipp method to give ρ′′ (Rius & Torrelles, 2021), which in turn replaces ρ′ in the Φ phasing formula (25). The ipp method is an effective way of accelerating the phase refinement and assumes that the approximate number N of expected atoms is known (which is normally the case). Briefly, ipp identifies in the ρ′ Fourier map those grid points closest to the centers of the N largest peaks. For each peak, the ρ′ values of the 27 inner-peak grid points (the central voxel and its 26 nearest neighbors) are then preserved, and the remaining grid points of the ρ′ Fourier map are set to zero, giving rise to the new ρ′′. In this way no interpolation is required to find peak centers and, at the same time, the large ρ′ values are preserved. Additionally, if the grid size Δgrid is ∼0.33 Å, ipp implicitly applies the minimum interpeak separation (mips) constraint. The criterion for considering a positive maximum of the ρ′ Fourier synthesis a peak is that the voxel closest to the peak center must be surrounded by 26 smaller nearest-neighbor voxels (some can even be negative). Consequently, none of the nearest neighboring voxels can become the center of another ρ′ peak. This means that for an isometric grid element with Δgrid = 0.33 Å the average mips value is ≃0.93 Å [the minimum, intermediate and maximum separations are 2 × 0.33 × 1 = 0.66 Å (6×), 2 × 0.33 × √2 ≃ 0.93 Å (12×) and 2 × 0.33 × √3 ≃ 1.14 Å (8×), respectively]. According to (19), the value of Rδ(ΦT) is proportional to 〈mρ〉. In the fast convergence mode, due to the application of ipp, 〈mρ〉 = 27N/Nvox, where Nvox is the total number of voxels in the unit cell. For Actinomycin Z3, 27N = 33372 and Nvox = 664875, and hence 〈mρ〉 = 0.05, much smaller than the typical 〈mρ〉 values for the slow convergence mode (∼0.45). Accordingly, Rδ(ΦT) ≃ 1.12 × 0.05 ≃ 0.06 is also much smaller than in the slow convergence mode.
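The ipp step described above can be sketched for a periodic three-dimensional grid as follows. This is a minimal illustration, not the XLENS_v1 code: a peak is taken to be a positive voxel strictly larger than its 26 periodic nearest neighbors, the N highest peaks are kept together with their 27 inner voxels, and everything else is set to zero. The function names are hypothetical.

```python
from itertools import product

import numpy as np

# The 27 offsets of a 3x3x3 block: the central voxel and its 26 nearest neighbors.
OFFSETS = list(product((-1, 0, 1), repeat=3))


def find_peaks(rho_prime):
    """Indices of positive voxels strictly larger than all 26 periodic neighbors."""
    is_peak = rho_prime > 0.0
    for offset in OFFSETS:
        if offset == (0, 0, 0):
            continue
        neighbor = np.roll(rho_prime, shift=offset, axis=(0, 1, 2))
        is_peak &= rho_prime > neighbor
    return np.argwhere(is_peak)


def ipp(rho_prime, n_atoms):
    """Inner-pixel preservation: keep the 27 inner voxels of the N largest peaks."""
    peaks = find_peaks(rho_prime)
    heights = rho_prime[tuple(peaks.T)]
    strongest = peaks[np.argsort(heights)[::-1][:n_atoms]]
    keep = np.zeros(rho_prime.shape, dtype=bool)
    shape = np.array(rho_prime.shape)
    for center in strongest:
        for offset in OFFSETS:
            keep[tuple((center + np.array(offset)) % shape)] = True
    return np.where(keep, rho_prime, 0.0)


grid = np.zeros((10, 10, 10))
grid[2, 2, 2], grid[2, 2, 3] = 5.0, 2.0    # peak with a weaker neighbor
grid[0, 5, 5], grid[9, 5, 5] = 4.0, 1.0    # peak whose neighbor wraps around
grid[7, 7, 7] = 3.0                        # third, weaker peak
rho_double_prime = ipp(grid, n_atoms=2)
```

When the preserved 3 × 3 × 3 blocks do not overlap, the fraction of voxels kept is exactly 27N/Nvox, which is the 〈mρ〉 used above for the fast convergence mode.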
A characteristic of this mode is that the calculation of ρ(Φ) only includes Fourier terms of those reflections satisfying the |E| ≥ |E|lim condition with |E|lim = 1.0, while the calculation of δM(χ) (and thus of ρ′) is always done with the Fourier terms of all k reflections (Fig. 1). That only the large |E| values should participate in the ρ update is certainly related to the fact that only the 27 inner voxels close to the peak center are preserved (the rest of the peak voxels become part of the zero mask). This is supported by the fact that, in the slow convergence mode, the Fourier terms of all reflections must be included in the calculation of ρ(Φ).
4.3. The correlation coefficient
In addition to estimating Rδ, the agreement of minuends and subtrahends in (17) can also be estimated using the correlation coefficient
(using the density values before ipp for the slow convergence case). For refinements reaching convergence, the CCρ′ values found are 0.715 for Actinomycin Z3 (Table 3), 0.742 for Suoa, 0.736 for Pep1 and 0.705 for Alpha1 peptide (Table 4). These moderately high correlation coefficients also confirm the small discrepancy introduced in the CCρ′ calculation by the g contributions of those voxels with mρ = 1. As expected, the corresponding CCρ′′ values (also listed in Tables 3 and 4) are much higher due to the smaller number of voxels with mρ = 1, e.g. CCρ′ = 0.72 and CCρ′′ = 0.90 for Actinomycin Z3.
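Since the explicit form of the correlation coefficient is not reproduced here, the sketch below assumes the usual Pearson form computed over all voxels, applied to the minuend δM mρ and the subtrahend ρ(Φ)mρsρ of (17). The function names are hypothetical.

```python
import numpy as np


def correlation_coefficient(x, y):
    """Pearson correlation coefficient between two maps, over all voxels."""
    x = np.ravel(np.asarray(x, dtype=float))
    y = np.ravel(np.asarray(y, dtype=float))
    dx = x - x.mean()
    dy = y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))


def cc_minuend_subtrahend(delta_m, rho_phi, m_rho, s_rho):
    """CC between delta_M m_rho and rho(Phi) m_rho s_rho, the two terms of (17)."""
    return correlation_coefficient(delta_m * m_rho, rho_phi * m_rho * s_rho)
```

Under this assumed form, voxels with mρ = 0 contribute identical zeros to both maps, which is one way of reading why the ipp-modified maps, with far fewer mρ = 1 voxels, give the higher CCρ′′ values.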
5. Conclusions
The main objective of this research was to complete the theoretical aspects of the SMAR phasing algorithm. For this purpose, the connection between δM and ρ has been examined in detail. This leads to the general equation of δ (δ-GEQ) which, written in its standard form, is δM = ρ + g, where the density function g is mainly formed by a large number of small positive B-type peaks. Two ways of using δ-GEQ have been investigated. In SMAR, δ-GEQ is used in its difference form, δM − ρ = g, while in δ recycling it is used solved for ρ, so ρ = δM − g. In this second case, it has been shown that the δM tangent formula can be derived directly from it by including a suitable mask.
Regarding the SMAR residuals it can be concluded that:
(i) Rρ(χ) measures the [ρ(χ) − ρ(Φ)sρ]² differences in the entire unit cell. The minimum value of Rρ is Rρ(χT) ≃ 0.
(ii) The Rδ(Φ) residual is based on its basic equation (9), i.e. δM(χT)mρ − ρ(χT)mρ = gmρ, where χT are the α phases corresponding to |ρ(ΦT)|.
(iii) Rδ(Φ) measures the [δM(χ)mρ − ρ(Φ)mρsρ]² differences in the entire unit cell after considering that ρ(χ) ≅ ρ(Φ)sρ. The minimum value corresponds to Rδ(ΦT), where Rδ(ΦT) ≃ 1.12〈mρ〉. It is shown that minimizing Rδ(Φ) is essentially equivalent to maximizing the sum function Sδ(Φ) [equation (21)], despite the presence of the mρ mask in the Rδ(Φ) definition. In all the examples calculated by the author, the Sδ maximum (characterized by a sudden Sδ increase) is always the true solution ΦT, which stands out clearly from the false solutions.
(iv) The convergence of SMAR is achieved by alternately applying the χ and Φ (or SMAR) phasing formulas in each iteration. These are αnew = phase of FT{ρ(Φ)sρ} and φnew = phase of FT{δM(χ)mρsρ}, respectively.
It has been found that at the start of a SMAR phase refinement the zero part of the mask (created by converting the slightly negative density function values to zero) occupies ∼50% of V and increases by ∼5% after convergence. According to Rδ(ΦT) ≃ 1.12〈mρ〉, the presence of the zero part of the mask leads to a drop in the Rδ value when convergence begins, since the volume of the regions with only a g contribution is reduced. When the number N of expected atoms is actively used (fast convergence mode), each SMAR iteration is supplemented with the ipp application, which significantly increases the volume of the zero part of the mask.
Finally, a brief reflection is in order. It is known that non-crystalline materials have a continuous diffraction pattern and that oversampling of the intensity data (Shannon, 1949) results in an overdetermined system of equations from which the phases can be solved (even at non-atomic resolution) (Miao et al., 2000). It is also known that oversampling cannot be applied to crystals due to their 3D periodicity. Here it is shown that, in the case of crystals, the combination of δM and ρ each with |ρ| produces two independent residuals while keeping the same unknowns. This also leads to overdetermination which should explain the observed efficiency of SMAR. It is also interesting to note that the SMAR algorithm is particularly well suited for Deep Learning due to its architecture.
APPENDIX A
Derivation of the general equation of δ direct methods
A1. Definition and peak analysis of the δM Fourier synthesis
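The δM Fourier synthesis is defined by
$$\delta_M = \mathrm{FT}^{-1}\{c(|E_{\mathbf k}| - \langle|E|\rangle)\exp(i\varphi_{\mathbf k})\} \qquad (31)$$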
with {φ} = ΦT and with c being an appropriate scaling constant. It can be reformulated by making use of the equality δM = δP/2 (Rius, 2012a), where δP is the Fourier synthesis that differs from (31) only in that (|Ek| − 〈|E|〉) is replaced by (|Ek|² − 〈|E|²〉). Consequently, (31) can be written as
By expressing (|Ek|² − 〈|E|²〉) and in terms of atom positions, it follows that
with denoting the normalized scattering factor of atom j. As can be derived from (33), δM has N³ − N² peaks at r = rj + rl − rm (l ≠ m), since is the unit for all k. Of interest are the values of δM at the N atomic positions, i.e. at r = rl with l = 1, …, N. It can easily be verified that there are N − 1 superposed peaks contributing to δM(rl), i.e. those that satisfy the equation rl = rj + rl − rm with l ≠ m and j = m; e.g. for N = 3 and r2 these are r1 + r2 − r1 and r3 + r2 − r3. To estimate the total strength of the δM peak at r = rl, expression (33) is rearranged into
with
For r = rl, the term in (34) becomes unity. In addition, if (35) and (36) are considered, expression (34) can be further simplified to
since = 〈|E|〉kNk and, for an equal atom structure, it holds that
The strength of an atomic peak in ρ is , so (38) can be simplified to
by considering = . Finally, by making
the strength of δM(rl) is equal to ρ(rl) in (40). The δM peaks placed at atomic positions compose the set of type A peaks.
Let us consider the remaining δM peaks at r = rj + rl − rm with l ≠ m and j ≠ m, which form the set of type B peaks. For a given B peak, the corresponding r position is obtained by adding the rj − rm, j ≠ m, interatomic vector to the atomic position vector rl, so the superposition of r with an atomic position is accidental. Consequently, the strength of a (single) B-type peak, e.g. at rjlm, is weaker than that of a δM(rl) peak (formed by the superposition of N − 1 single peaks). According to (40), it holds that
A2. The general equation for δ(M) direct methods
According to the above results, δM(ΦT) contains two types of positive peaks (A and B). Table 2 lists their peak strengths and positions. The stronger peaks of type A correspond to ρ and are resolved in the Fourier map. The N − 1 times weaker peaks of type B are located at positions other than the atomic positions (any coincidence is accidental). Because of their large number, i.e. N(N − 1)² [for comparison, there are only N(N − 1) non-origin peaks in the modulus function], these peaks must overlap strongly. The peaks of type B are the main constituents of the g function. Consequently, δM can be considered as the sum of both contributions, i.e.
$$\delta_M(\mathbf r) = \rho(\mathbf r) + g(\mathbf r),$$
which is called the general equation of δ(M) direct methods. In the following, the subscript M is left out of the equation name for the sake of generality. (Note that the g function could also contain contributions from even weaker peaks not considered here.)
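The peak counts used in A1 and A2 can be checked by enumerating the index triples (j, l, m), l ≠ m, of the triple-product peaks at r = rj + rl − rm. This is a purely combinatorial sketch (no peak heights or positions are computed) and the function names are hypothetical: triples with j = m land on atom l (type A), and all others are type B.

```python
from itertools import product


def classify_triples(n_atoms):
    """Count type A (j == m) and type B (j != m) triples with l != m."""
    type_a = type_b = 0
    for j, l, m in product(range(n_atoms), repeat=3):
        if l == m:
            continue          # only l != m enters the synthesis
        if j == m:
            type_a += 1       # r = r_l: superposes on atom l
        else:
            type_b += 1       # r = r_l + (r_j - r_m): off-atom in general
    return type_a, type_b


def superposed_on_atom(n_atoms, l):
    """Index triples (j, l, m) whose peak lands exactly on atom l (j == m)."""
    return [(m, l, m) for m in range(n_atoms) if m != l]


for n in (3, 10, 50):
    a, b = classify_triples(n)
    patterson = n * (n - 1)   # non-origin peaks of the modulus function
    print(n, a + b == n**3 - n**2, a // n, b, patterson)
```

With zero-based indices, `superposed_on_atom(3, 1)` returns the two triples corresponding to r1 + r2 − r1 and r3 + r2 − r3 in the text. The loop shows that each atom collects N − 1 type A contributions, while the N(N − 1)² type B peaks outnumber the N(N − 1) non-origin peaks of the modulus function by a factor N − 1.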
A3. The δM Fourier synthesis expressed with the signs of the |E| − 〈|E|〉 differences in the phase terms
In general, the Fourier coefficients of δM and ρ have different moduli and may differ in phase values. This is not obvious when comparing (2) and (4) because in both expressions the corresponding Fourier terms have the same φ phase values (which is quite useful for programming). However, when the absolute values ||E| − 〈|E|〉| are used in the synthesis, a phase shift Δφ must be added to each φ to take into account the sign. Consequently, while the Fourier coefficients of ρ are |E|exp(iφ), those of δM are ||E| − 〈|E|〉|exp[i(φ + Δφ)], with Δφ = 0 for |E| > 〈|E|〉 (strong reflections) and Δφ = π for |E| < 〈|E|〉 (weak reflections).
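The equivalence between the signed-modulus form of a δM coefficient and the absolute-value form with the Δφ shift can be checked numerically. The sketch assumes acentric |E| values drawn from the Wilson distribution (a Rayleigh distribution with 〈|E|²〉 = 1) and random φ; the function names are illustrative.

```python
import numpy as np


def coef_signed(e_abs, e_mean, phi, c=1.0):
    """delta_M coefficient with signed modulus: c(|E| - <|E|>) exp(i phi)."""
    return c * (e_abs - e_mean) * np.exp(1j * phi)


def coef_with_phase_shift(e_abs, e_mean, phi, c=1.0):
    """Same coefficient written as c||E| - <|E|>| exp[i(phi + dphi)]."""
    dphi = np.where(e_abs > e_mean, 0.0, np.pi)   # pi for weak reflections
    return c * np.abs(e_abs - e_mean) * np.exp(1j * (phi + dphi))


rng = np.random.default_rng(1)
e_abs = rng.rayleigh(scale=np.sqrt(0.5), size=1000)  # acentric |E|, <|E|^2> = 1
phi = rng.uniform(-np.pi, np.pi, size=1000)
e_mean = e_abs.mean()
print(np.allclose(coef_signed(e_abs, e_mean, phi),
                  coef_with_phase_shift(e_abs, e_mean, phi)))  # prints True
```

Keeping the signed modulus, as in (2) and (4), lets both syntheses share the same φ array in code; the Δφ form makes explicit that the weak reflections enter δM with phases shifted by π.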
APPENDIX B
Solution of the integral Ig2 = ∫V g² dV
Assuming that the integrand of Ig2 is g² = [δM(ΦT) − ρ(ΦT)]², squaring the binomial gives
The integral of the first two summands in (44) when expressed in terms of the respective Fourier coefficients is
If one proceeds analogously with the third summand in (44), it follows that
This means that the integral is zero except for k′ = −k where it becomes V, so that the final term in (46) can be approximated by
The values of the exponentials in (47) are 1, so that it reduces to
By adding (45) and (48) and normalizing by = = Nk/V, the final expression of the integral Ig2 is obtained, namely
From the theory of the distribution of |E| values, the values of 〈|E|²〉, 〈|E|〉 (acentric case) and c ≃ 2/〈|E|〉 can be derived, i.e. 1.00, 0.89 and 2.25, respectively, so that (49) gives Ig2 ≃ 1.12. Note that Ig2 only depends on |E| and |E| − 〈|E|〉, since the phases cancel out. Consequently, the values of Ig2 for g² equal to [δM(ΦT) − ρ(ΦT)]² and to [δM(χT) − ρ(χT)]² are identical.
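Because the phases cancel, Ig2 can be evaluated from |E| statistics alone. The sketch below rests on an assumed reading of the derivation (equations (45) to (49) are not reproduced here): after normalization by the number of reflections, Ig2 is taken to be the reflection average of [c(|E| − 〈|E|〉) − |E|]², which is what Parseval's theorem gives for ∫V [δM(ΦT) − ρ(ΦT)]² dV when both syntheses share the φ phases. Expanding the square gives Ig2 = 〈|E|²〉 + (c² − 2c)(〈|E|²〉 − 〈|E|〉²). The function names and the Monte Carlo check are illustrative additions.

```python
import numpy as np


def ig2_from_moments(mean_e2, mean_e, c):
    """Ig2 = <[c(|E| - <|E|>) - |E|]^2> expressed through moments of |E|.

    Uses <(|E| - <|E|>)^2> = <|E|(|E| - <|E|>)> = <|E|^2> - <|E|>^2.
    """
    var = mean_e2 - mean_e ** 2
    return mean_e2 + (c ** 2 - 2.0 * c) * var


def ig2_monte_carlo(n=200_000, seed=0):
    """Same average from sampled acentric |E|, P(|E|) = 2|E|exp(-|E|^2)."""
    rng = np.random.default_rng(seed)
    e = rng.rayleigh(scale=np.sqrt(0.5), size=n)  # <|E|^2> = 1
    c = 2.0 / e.mean()
    return float(np.mean((c * (e - e.mean()) - e) ** 2))


# Rounded values quoted in the text: 1.00, 0.89 and 2.25
print(round(ig2_from_moments(1.00, 0.89, 2.25), 3))
# Exact acentric moments: <|E|> = sqrt(pi)/2, c = 2/<|E|> = 4/sqrt(pi)
print(round(ig2_from_moments(1.00, np.sqrt(np.pi) / 2, 4 / np.sqrt(np.pi)), 3))
print(round(ig2_monte_carlo(), 3))
```

With the rounded values the closed form gives about 1.117, and with the exact acentric moments about 1.124; both are consistent with Ig2 ≃ 1.12, and the sampled estimate scatters closely around the exact value.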
Supporting information
Table S1 and output of test calculations of Section 3.2.1. DOI: https://doi.org/10.1107/S2053273324009628/tw5010sup1.pdf
Acknowledgements
The author thanks Dr Xavier Torrelles and Professor Salvador Galí for valuable suggestions and the reviewers for their constructive criticisms.
Funding information
The following funding is acknowledged: Agencia Estatal de Investigación (grant Nos. PID2021-124734OB-C22 and CEX2023-001263-S). The author acknowledges the support of the publication fee by the CSIC Open Access Publication Support Initiative through its Unit of Information Resources for Research (URICI).
References
Antel, J., Sheldrick, G. M., Bats, J. W., Kessler, H. & Müller, A. (1995). Unpublished work.
Burla, M. C., Caliandro, R., Giacovazzo, C. & Polidori, G. (2010). Acta Cryst. A66, 347–361.
Capitani, G. C., Mugnaioli, E., Rius, J., Gentile, P., Catelani, T., Lucotti, A. & Kolb, U. (2014). Am. Mineral. 99, 500–510.
Cochran, W. (1952). Acta Cryst. 5, 65–67.
Foadi, J., Woolfson, M. M., Dodson, E. J., Wilson, K. S., Jia-xing, Y. & Chao-de, Z. (2000). Acta Cryst. D56, 1137–1147.
Lipson, H. & Cochran, W. (1966). The Determination of Crystal Structures. The Crystalline State, Vol. III, edited by W. L. Bragg, pp. 324–325. London: G. Bell and Sons Ltd.
Main, P. (1975). Crystallographic Computing Techniques, edited by F. R. Ahmed, p. 99. Copenhagen: Munksgaard.
Miao, J., Kirz, J. & Sayre, D. (2000). Acta Cryst. D56, 1312–1315.
Miller, R., DeTitta, G. T., Jones, R., Langs, D. A., Weeks, C. M. & Hauptman, H. A. (1993). Science, 259, 1430–1433.
Oliver, J. D. & Strickland, L. C. (1984). Acta Cryst. C40, 820–824.
Privé, G., Anderson, D. H., Wesson, L., Cascio, D. & Eisenberg, D. (1999). Protein Sci. 8, 1400–1409.
Rius, J. (1993). Acta Cryst. A49, 406–409.
Rius, J. (2011). XLENS_v1: A Computer Program for Solving Crystal Structures from Diffraction Data by Direct Methods. Institut de Ciència de Materials de Barcelona, CSIC, Spain.
Rius, J. (2012a). Acta Cryst. A68, 77–81.
Rius, J. (2012b). Acta Cryst. A68, 399–400.
Rius, J. (2014). IUCrJ, 1, 291–304.
Rius, J. (2020). Acta Cryst. A76, 489–493.
Rius, J., Crespi, A. & Torrelles, X. (2007). Acta Cryst. A63, 131–134.
Rius, J., Mugnaioli, E., Vallcorba, O. & Kolb, U. (2013). Acta Cryst. A69, 396–407.
Rius, J., Sañé, J., Miravitlles, C., Amigó, J. M. & Reventós, M. M. (1995). Acta Cryst. A51, 268–270.
Rius, J. & Torrelles, X. (2021). Acta Cryst. A77, 339–347.
Rius, J. & Torrelles, X. (2022). Acta Cryst. A78, 473–481.
Rius, J., Vallcorba, O., Crespi, A. & Colombo, F. (2017). Z. Kristallogr. 232, 827–834.
Sayre, D. (1952). Acta Cryst. 5, 60–65.
Schäfer, M., Sheldrick, G. M., Bahner, I. & Lackner, H. (1998). Angew. Chem. Int. Ed. 37, 2381–2384.
Shannon, C. E. (1949). Proc. Inst. Radio Eng. 37, 10–21.
Sheldrick, G. M. (2002). Z. Kristallogr. 217, 644–650.
Sheldrick, G. M. (2015). Acta Cryst. A71, 3–8.
Sheldrick, G. M., Hauptman, H. A., Weeks, C. M., Miller, R. & Usón, I. (2012). International Tables for Crystallography, Vol. F, Crystallography of Biological Macromolecules, edited by E. Arnold, D. M. Himmel & M. G. Rossmann, ch. 16.1, pp. 413–442. Chichester: Wiley.
Weeks, C. M., DeTitta, G. T., Miller, R. & Hauptman, H. A. (1993). Acta Cryst. D49, 179–181.
Zachariasen, W. H. (1952). Acta Cryst. 5, 68–73.
This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.