research papers

FOUNDATIONS
ADVANCES
ISSN: 2053-2733

The general equation of δ direct methods and the novel SMAR algorithm residuals using the absolute value of ρ and the zero conversion of negative ripples

aInstitut de Ciència de Materials de Barcelona (CSIC), Campus de la UAB, 08193 Bellaterra, Catalonia, Spain
*Correspondence e-mail: jordi.rius@icmab.es

Edited by T. E. Gorelik, Helmholtz Centre for Infection Research, Germany (Received 28 May 2024; accepted 30 September 2024)

This paper is dedicated to the memory of Professor Carles Miravitlles.

The general equation δM(r) = ρ(r) + g(r) of the δ direct methods (δ-GEQ) is established which, when expressed in the form δM(r) − ρ(r) = g(r), is used in the SMAR phasing algorithm [Rius (2020). Acta Cryst. A76, 489–493]. It is shown that SMAR is based on the alternating minimization of the two residuals Rρ(χ) = ∫V [ρ(χ) − ρ(Φ)sρ]² dV and Rδ(Φ) = ∫V mρ[δM(χ) − ρ(Φ)sρ]² dV in each iteration of the algorithm by maximizing the respective Sρ(χ) and Sδ(Φ) sum functions. While Rρ(χ) converges to zero, Rδ(Φ) converges, as predicted by the theory, to a positive quantity. These two independent residuals combine δM and ρ each with |ρ| while keeping the same unknowns, leading to overdetermination for diffraction data extending to atomic resolution. At the beginning of a SMAR phase refinement, the zero part of the mρ mask [resulting from the zero conversion of the slightly negative ρ(Φ) values] occupies ∼50% of the unit-cell volume and increases by ∼5% when convergence is reached. The effects on the residuals of the two SMAR phase refinement modes, i.e. using only density functions (slow mode) or density functions supplemented by atomic constraints (fast mode), are discussed in detail. Due to its architecture, the SMAR algorithm is particularly well suited for Deep Learning. Another way of using δ-GEQ is by solving it in the form ρ(r) = δM(r) − g(r), which provides a simple new derivation of the already known δM tangent formula, the core of the δ recycling phasing algorithm [Rius (2012). Acta Cryst. A68, 399–400]. The nomenclature used here is: (i) Φ is the set of φ structure factor phases of ρ to be refined; (ii) δM(χ) = FT⁻¹{c(|E| − 〈|E|〉)×exp(iα)} with χ = {α}, the set of phases of |ρ|, and c = scaling constant; (iii) mρ = mask, being either 0 or 1, and sρ is 1 or −1 depending on whether ρ(Φ) is positive or negative.

1. Introduction

Historically, direct methods were developed to solve small crystal structures directly from high-resolution single-crystal diffraction data. From their origins in the 1950s (Sayre, 1952[Sayre, D. (1952). Acta Cryst. 5, 60-65.]; Cochran, 1952[Cochran, W. (1952). Acta Cryst. 5, 65-67.]; Zachariasen, 1952[Zachariasen, W. H. (1952). Acta Cryst. 5, 68-73.]), they have seen continuous advances over the years, not only driven by the steady increase in computing power, but also by clever algorithms and efficient implementations. Computing power continues to increase but the solution of larger structures at lower resolution is still hindered in the case of equal atoms and requires the use of ubiquitous model fragments and density modification. In their widespread successful application, the time scale is in any case shorter than the experimental effort involved in X-ray structure determination.

One of the most recent advances in direct methods has been the |ρ|-based algorithm (SMAR) maximizing the SM,|ρ| (= SMAR) sum function (Rius, 2020[Rius, J. (2020). Acta Cryst. A76, 489-493.]). It corresponds to the latest stage in the evolution of the SM origin-free modulus sum function (originally named ZR; Rius, 1993[Rius, J. (1993). Acta Cryst. A49, 406-409.]) which still involved triple-phase structure invariants. Even today, ZR is the simplest and certainly one of the most successful direct methods working exclusively in reciprocal space (Rius et al., 1995[Rius, J., Sañé, J., Miravitlles, C., Amigó, J. M. & Reventós, M. M. (1995). Acta Cryst. A51, 268-270.]). However, it was almost immediately superseded by the Shake & Bake strategy alternating between reciprocal- and real-space refinements (dual-space recycling methods) which allowed the solution of larger crystal structures (Weeks et al., 1993[Weeks, C. M., DeTitta, G. T., Miller, R. & Hauptman, H. A. (1993). Acta Cryst. D49, 179-181.]; Miller et al., 1993[Miller, R., DeTitta, G. T., Jones, R., Langs, D. A., Weeks, C. M. & Hauptman, H. A. (1993). Science, 259, 1430-1433.]). Dual-space methods do not eliminate phase relationships in reciprocal space but complement them with peak picking in real space as an extreme form of density modification. A comprehensive description at the height of development of dual-space recycling methods can be found in the International Tables for Crystallography, Vol. F (Sheldrick et al., 2012[Sheldrick, G. M., Hauptman, H. A., Weeks, C. M., Miller & Usón, I. (2012). International Tables for Crystallography, Vol. F, Crystallography of Biological Macromolecules, edited by E. Arnold, D. M. Himmel & M. G. Rossmann, ch. 16.1, pp. 413-442. Chichester: Wiley.]).

Following this trend, triple-phase invariants (whose number becomes exceedingly large for large structures) were replaced in SM by the more efficient and accurate Fourier transforms (FT) (Rius et al., 2007[Rius, J., Crespi, A. & Torrelles, X. (2007). Acta Cryst. A63, 131-134.]; Rius, 2014[Rius, J. (2014). IUCrJ, 1, 291-304.]). More recently, the ρ² function in SM was replaced by the mathematically simpler |ρ| function and a new mask was introduced that takes the negative values of ρ into account. Both changes have led to the SMAR phasing algorithm (Rius, 2020[Rius, J. (2020). Acta Cryst. A76, 489-493.]). Important aspects of its application have been covered in two recent publications. The first deals with its extension to larger crystal structures by introducing the fast inner-pixel preservation procedure (ipp) for density modification and by using initial phase values derived from the modulus function (Rius & Torrelles, 2021[Rius, J. & Torrelles, X. (2021). Acta Cryst. A77, 339-347.]). In the second publication SMAR was adapted to the solution of anomalous scattering substructures in protein crystals (Rius & Torrelles, 2022[Rius, J. & Torrelles, X. (2022). Acta Cryst. A78, 473-481.]). The present short introduction is intended to provide a brief overview of the 30-year development of that particular type of direct methods which shares the implicit or explicit use of the ρ ≃ δM (or δP) approximation (Rius, 2012a[Rius, J. (2012a). Acta Cryst. A68, 77-81.]). To distinguish this family of direct methods from the rest – and also to help in their identification – the general term `δ direct methods' is coined in this publication.

The aim of this article is to complete the theoretical foundations of the SMAR phasing algorithm. The algorithm described by Rius (2020[Rius, J. (2020). Acta Cryst. A76, 489-493.]) and shown in Fig. 1[link] essentially consists of the iterative application of the phasing formula

[\varphi_{k}^{\rm new} = {\rm phase \ of} \int_{V} \delta_{M} ({\bf r}) \, m_{\rho, t}^{\prime} ({\bf r}) \exp{(i 2\pi {\bf k} {\bf r})} \, {\rm d}{\bf r} , \eqno(1)]

where (i) φk is the phase of the kth structure factor of ρ and belongs to the set Φ of phases to be refined; (ii) δM(χ) is equal to [{\rm FT}^{-1} \left \{ c \left ( |E| - \langle |E| \rangle \right ) \exp(i\alpha) \right \}] with χ = {α} being the set of phases of |ρ| and c a scaling constant; (iii) r is a position vector inside V, the unit-cell volume; and (iv) [m_{\rho, t}^{\prime}] is a mask function defined in Table 1[link] that can take the values 1, 0 or −1. The representative test case shown in Table 1[link] (t = 2.5) indicates that the zero part in the [m_{\rho, t}^{\prime}] mask is around 50% at the beginning of a phase refinement with initially random phase values; as the refinement converges, the zero part increases by up to 5%. Most of the remaining part of the mask is taken up by ones, as the proportion of −1 values is kept very small (<1.0%).

Table 1
Values of the mρ(r) mask and the sρ(r) sign functions obtained from ρ(r, Φ) and for t = 2.5 with [\sigma_\rho^2] being the (phase-independent) variance of ρ(r) and [m_\rho^{\prime}] = sρmρ (Rius, 2020[Rius, J. (2020). Acta Cryst. A76, 489-493.])

The meanings of COREs, SPZs and SNZs are explained in Section 2.1[link]. Columns 6 and 7 give the mask compositions in % at the first and last iterations of a SMAR phase refinement reaching convergence (slow convergence mode) using the diffraction data for Actinomycin Z3 (Schäfer et al., 1998[Schäfer, M., Sheldrick, G. M., Bahner, I. & Lackner, H. (1998). Angew. Chem. Int. Ed. 37, 2381-2384.]).

Condition | Corresponds to | [m_\rho^{\prime}] | mρ | sρ | First (%) | Last (%)
ρ(r, Φ) > 0 | COREs and SPZs | 1 | 1 | 1 | 50.0 | 45.3
0 ≥ ρ(r, Φ) > −tσρ | SNZs | 0 | 0 | −1 | 49.4 | 54.5
ρ(r, Φ) ≤ −tσρ | Very negative values | −1 | 1 | −1 | 0.62 | 0.23
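The classification in Table 1 can be sketched in a few lines of NumPy. This is a minimal illustration, not the XLENS_v1 implementation: the function name `smar_masks` and the optional `sigma` argument are hypothetical, and σρ is assumed here to be the standard deviation of the sampled map.

```python
import numpy as np

def smar_masks(rho, t=2.5, sigma=None):
    """Return (m_rho, s_rho, m_prime) for a real density map, following Table 1."""
    if sigma is None:
        sigma = rho.std()                      # assumed estimate of sigma_rho
    s_rho = np.where(rho > 0, 1, -1)           # sign function: +1 only for rho > 0
    snz = (rho <= 0) & (rho > -t * sigma)      # slightly negative zones (SNZs)
    m_rho = np.where(snz, 0, 1)                # zero conversion of the SNZs
    m_prime = s_rho * m_rho                    # m' = s_rho * m_rho takes values 1, 0 or -1
    return m_rho, s_rho, m_prime

# Five sample density values with sigma fixed to 1 so the thresholds are easy to follow.
sample = np.array([2.0, 0.0, -1.0, -2.5, -3.0])
m_rho, s_rho, m_prime = smar_masks(sample, sigma=1.0)
```

With t = 2.5 and σρ = 1, the values 0.0 and −1.0 fall in the SNZs (mask 0), while −2.5 and −3.0 count as very negative values and receive m′ = −1.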
[Figure 1]
Figure 1
The iterative SMAR phasing algorithm in four stages. (Upper right-hand corner) Initial (or updated) φ phase estimates belonging to set Φ are combined with observed |E| values to obtain ρ and ρ(Φ)sρ = |ρ(Φ)| [the superscript 1) indicates that ρ is stored]. (Upper left) The FT of |ρ(Φ)| is calculated to get the new set χ of α phases as well as the calculated |ξ| values. (Lower left corner) The new α values are combined with the experimental ΔE = |E| − 〈|E|〉 and Fourier transformed to obtain δM(χ). The mρ mask and the sρ signs are derived from the stored ρ(Φ), and the δM(χ)sρmρ = ρ′ product is carried out. (Lower right) The FT of ρ′ supplies the updated φ phases [the superscript 2) indicates that when ipp is applied, ρ′ is further modified to ρ′′ before the FT operation] as well as the calculated [\left| {\cal E} \right|] values.
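The four stages of Fig. 1 can be summarized as the following NumPy sketch of one slow-mode cycle (no ipp) on a P1 grid. It is an illustration under stated assumptions rather than the author's code: the name `smar_iteration` is hypothetical, every grid reflection with |E| > 0 is treated as observed, the scaling constant c is set to 1 (a positive constant does not change the phases), and NumPy's `fftn`/`ifftn` pair is used, whose sign convention differs from that of expression (2) only by the relabeling k → −k.

```python
import numpy as np

def smar_iteration(E_abs, phi, t=2.5, c=1.0):
    """One slow-mode SMAR cycle (no ipp); returns the updated phi phases."""
    obs = E_abs > 0                                    # reflections treated as observed
    # Stage 1: rho(Phi) from |E| and current phases, then signs and mask (Table 1)
    rho = np.fft.ifftn(E_abs * np.exp(1j * phi)).real
    sigma = rho.std()
    s_rho = np.where(rho > 0, 1.0, -1.0)
    m_rho = np.where((rho <= 0) & (rho > -t * sigma), 0.0, 1.0)
    # Stage 2: alpha phases (set chi) from the FT of |rho(Phi)| = rho * s_rho
    alpha = np.angle(np.fft.fftn(rho * s_rho))
    # Stage 3: delta_M(chi) from Delta E = |E| - <|E|> and the alpha phases
    dE = np.where(obs, E_abs - E_abs[obs].mean(), 0.0)
    delta_M = np.fft.ifftn(c * dE * np.exp(1j * alpha)).real
    # Stage 4: rho' = delta_M * s_rho * m_rho; its FT supplies the new phi phases
    rho_prime = delta_M * s_rho * m_rho
    return np.angle(np.fft.fftn(rho_prime))

# Toy example: ten point atoms on a 16^3 grid, random starting phases, five cycles.
rng = np.random.default_rng(0)
grid = (16, 16, 16)
atoms = np.zeros(grid)
atoms[tuple(rng.integers(0, 16, size=(3, 10)))] = 1.0
E_abs = np.abs(np.fft.fftn(atoms))
E_abs[0, 0, 0] = 0.0                                   # drop the origin term
phi = rng.uniform(-np.pi, np.pi, size=grid)
for _ in range(5):
    phi = smar_iteration(E_abs, phi)
```

The toy moduli are not normalized and no convergence is claimed for so few cycles; the sketch only shows how the stored ρ(Φ) supplies both sρ and mρ for the later product with δM(χ).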

The SMAR phasing algorithm was originally derived from the SMAR sum function. A sum function such as SMAR generally corresponds to the mixed integral of a residual as e.g. in the case of expression (20)[link] in relation to (21)[link] in this article. Only when the residual is known can the derivation of the phasing algorithm be considered complete, which leads to a better understanding of it and enables, for example, the estimation of the minimum value of the residual. In this context, the use of the δM ≃ ρ approximation in Rius (2020[Rius, J. (2020). Acta Cryst. A76, 489-493.]) represented a limitation. To overcome this, the relationship between δM and ρ is worked out in Appendix A[link], resulting in δ-GEQ which, when modified accordingly, leads to one of the two desired SMAR residuals. The derivation of the second residual is simpler and introduces the phases corresponding to |ρ(Φ)| in the algorithm. To increase the readability of this article, δ-GEQ is derived separately in Appendix A[link] and a summary thereof given in Section 2.2[link].

To complete this introduction, it is interesting to mention, particularly for newcomers, that density modification in the context of direct methods was efficiently introduced in ACORN (Foadi et al., 2000[Foadi, J., Woolfson, M. M., Dodson, E. J., Wilson, K. S., Jia-xing, Y. & Chao-de, Z. (2000). Acta Cryst. D56, 1137-1147.]). ACORN and SHELXE (Sheldrick, 2002[Sheldrick, G. M. (2002). Z. Kristallogr. 217, 644-650.]) both use density sharpening and negative density elimination. Later, in the VLD algorithm, a difference and a flipping term were combined (Burla et al., 2010[Burla, M. C., Caliandro, R., Giacovazzo, C. & Polidori, G. (2010). Acta Cryst. A66, 347-361.]). Finally, another related and more modern development was SHELXT, which is more broadly associated with the charge-flipping algorithm but combines it with direct methods and with a density modification in which part of the peaks is eliminated at random (Sheldrick, 2015[Sheldrick, G. M. (2015). Acta Cryst. A71, 3-8.]). One distinctive feature of the density modification in SMAR is the zero conversion of only slightly negative densities and the preservation of the inner peak pixels (Rius & Torrelles, 2021[Rius, J. & Torrelles, X. (2021). Acta Cryst. A77, 339-347.]).

All calculations in this article have been performed with a modified version of the XLENS_v1 code (Rius, 2011[Rius, J. (2011). XLENS_v1: A Computer Program for Solving Crystal Structures from Diffraction Data by Direct Methods. Institut de Ciència de Materials de Barcelona, CSIC, Spain.]). The diffraction data used in the test calculations correspond to:

(i) Actinomycin Z3 with 1228 (C, N, O) + 8 Cl atoms in the unit cell. According to the refinement protocol in the Protein Data Bank (PDB code 1a7z), 4 Cl sites are partially occupied and the other 4 Cl atoms have a rather large B value, so that their scattering powers are considerably reduced. The minimum d spacing (dmin) is 0.95 Å; a = 14.803, b = 24.780 and c = 65.059 Å, space group P2₁2₁2₁ (Schäfer et al., 1998[Schäfer, M., Sheldrick, G. M., Bahner, I. & Lackner, H. (1998). Angew. Chem. Int. Ed. 37, 2381-2384.]).

(ii) Alpha1 peptide with 503 (C, N, O) + 1 Cl. dmin = 0.90 Å; a = 20.846, b = 20.909 and c = 27.057 Å, α = 102.40, β = 95.33 and γ = 119.62°, P1 (Privé et al., 1999[Privé, G., Anderson, D. H., Wesson, L., Cascio, D. & Eisenberg, D. (1999). Protein Sci. 8, 1400-1409. ]).

(iii) Pep1 with 344 (C, N, O). dmin = 1.00 Å; a = 13.999, b = 21.602 and c = 21.615 Å, P2₁2₁2₁ (Antel et al., 1995[Antel, J., Sheldrick, G. M., Bats, J. W., Kessler, H. & Müller, A. (1995). Unpublished work.]).

(iv) Suoa with 188 (C, N, O). dmin = 1.00 Å; a = 18.350, b = 21.441 and c = 8.350 Å, P2₁2₁2₁ (Oliver & Strickland, 1984[Oliver, J. D. & Strickland, L. C. (1984). Acta Cryst. C40, 820-824.]).

2. Basic elements of the SMAR algorithm

2.1. The ρ Fourier synthesis: its mask definition and general relationship to |ρ|

When solving crystal structures by direct methods from atomic resolution X-ray diffraction intensity data, the electron-density function is normally calculated with the Fourier synthesis

[\rho \left ( {\bf r}, \Phi_{\rm T} \right ) = {{1} \over {V}} \sum_{k} \left | {E}_{k} \right | \exp{(i \varphi_{k})} \exp{(-i 2\pi {\bf k} {\bf r})} , \eqno(2)]

where |Ek| is the modulus and φk is the phase value of the (quasi)-normalized structure factor of the kth reflection (Main, 1975[Main, P. (1975). Crystallographic Computing Techniques, edited by F. R. Ahmed, p. 99. Copenhagen: Munksgaard.]). The moduli and phases of all reflections form the {|E|} and Φ = {φ} sets, respectively. For clarity, the phase type (and, where applicable, the modulus type) used in the Fourier synthesis is added to the function name when required to avoid confusion, e.g. ρ(r, ΦT) specifies that ρ(r) is calculated with ΦT (the index T stands for true phase values).

Expression (2)[link] is stated in terms of the normalized scattering factors, i.e. [{\hat f}_j] = [{f_j}/{\sqrt {\sum \nolimits_m f_m^2}}] for an arbitrary atom j (with fj being its normal scattering factor including the Debye–Waller factor). For the sake of simplicity, a crystal structure with N equal atoms in the unit cell is assumed throughout, so that the normalized scattering factor reduces to [\hat f] = [1/\sqrt N].

Due to the limited number of Fourier terms (a consequence of the finite number of measured intensities), the ρ(r, ΦT) synthesis is affected by Fourier series truncation effects. These termination effects are mainly reflected in the broadening of the atomic peaks, each consisting of a large spherically symmetric positive central part (= CORE) surrounded by negative and positive waves (ripples) that decrease as one moves away from the peak center. For point-like atoms (for which broadening due to scattering factors and thermal vibration effects are largely removed), the limiting spherical surface of the CORE lies ∼0.72 × dmin (Å) from the peak center [it corresponds to the first zero of the T3 spreading function in Lipson & Cochran (1966[Lipson, H. & Cochran, W. (1966). The Determination of Crystalline Structures. The Crystalline State, Vol. III, edited by W. L. Bragg, pp. 324-325. London: G. Bell and Sons Ltd.])]. For locations lying outside neighboring COREs, the ripple contributions of neighboring atoms add up and result in slightly negative and slightly positive zones (SNZs and SPZs, respectively). The ρ(r, Φ) values contained in the interval [0, −tσρ] make up the SNZs (σρ and t are defined in Table 1[link]). Since the ρ distribution in the crystal is positive definite, the probability that the SNZs accommodate the COREs of atomic peaks is zero, so this information can be introduced in the form of an mρ,t mask which is 0 for all negative ρ in the [0, −tσρ] interval (negative ripple conversion to zero) and 1 elsewhere. As shown in Table 1[link], the mask value depends on ρ(r) and the t parameter (since t is always 2.5 in this work, it is suppressed in mρ,t for notation simplicity and we use simply mρ). Table 1[link] also shows that the zero part of the mask extends to at least 50% of the unit-cell volume for t ≃ 2.5 (just for comparison, for t ≥ 10 the threshold is so low that the mask values of all negative ρ values become zero).
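As a small worked example of the ∼0.72 × dmin rule quoted above, the approximate CORE radii for the four test data sets can be evaluated directly. The helper name `core_radius` is hypothetical, and the point-like-atom assumption of the text is taken over unchanged.

```python
CORE_FACTOR = 0.72  # first zero of the spreading function, in units of d_min (from the text)

def core_radius(d_min):
    """Approximate CORE radius in angstroms for point-like atoms at resolution d_min."""
    return CORE_FACTOR * d_min

# d_min values (in angstroms) of the four test data sets listed in the Introduction
d_min_by_data_set = {
    "Actinomycin Z3": 0.95,
    "Alpha1 peptide": 0.90,
    "Pep1": 1.00,
    "Suoa": 1.00,
}
radii = {name: core_radius(d) for name, d in d_min_by_data_set.items()}
```

For these data sets the CORE radius therefore lies between about 0.65 and 0.72 Å.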

The relationship between the syntheses |ρ(Φ)| and ρ(Φ) is given by the equality

[\left | \rho ({\bf r}, \Phi) \right | = \rho ({\bf r}, \Phi) \, s_{\rho} ({\bf r}) \quad \forall \ {\bf r} \subset V , \eqno(3)]

in which [s_\rho (r)] is 1 or −1 depending on whether ρ(r, Φ) is positive or negative. This equality is completely general, so the product ρ(r, Φ)sρ(r) can always replace |ρ(r, Φ)| in the derivation of the residuals. Finally, since ρ is positive definite and diffraction data are assumed to reach atomic resolution, it is clear for Φ = ΦT that the negative [s_\rho ({\bf r})] values in (3)[link] are always associated with small ρ(r, ΦT), so that (3)[link] becomes ρ(r, ΦT) = |ρ(r, ΦT)| [ \forall \ {\bf r} \subset V]. If we denote the phase set of the Fourier coefficients of the |ρ(r, ΦT)| synthesis by χT, then ΦT ≃ χT.

2.2. The general equation of δ direct methods and its different forms

The δM Fourier synthesis defined by

[{\delta}_{M} \left ( {\bf r}, {\Phi}_{\rm T} \right) = {{c} \over {V}} \sum \limits_{k} \left ( \left | {E}_{k} \right | - \langle \left | E \right | \rangle \right ) \exp{(i \varphi_{k})} \exp{(-i 2\pi {\bf k} {\bf r})} , \eqno (4)]

with c = [2/(\langle |E| \rangle - 1/{\sqrt N})], is studied in detail in Appendix A[link], showing that it contains two types of positive peak (A and B). Table 2[link] lists their peak strengths and positions. The stronger peaks of type A correspond to ρ (which are resolved in the Fourier map). The N − 1 times weaker peaks of type B are located at positions other than the atomic positions (coincidence is accidental). These are the main constituents of the function g(r) and, due to their large number, must be severely overlapped in the unit cell. The standard form of the general equation of the δ direct methods (δ-GEQ) corresponds to the sum of both contributions ρ(r) and g(r) [equation (43)[link]]. However, δ-GEQ can be used in other forms. For example, it can be solved for ρ, so δ-GEQ then takes the form

[\rho ({\bf r}) = {\delta_M} ({\bf r}) - g({\bf r}) . \eqno(5)]

Table 2
Overview of the properties of the two main peak types of the δM Fourier synthesis

The peak strength of a ρ(rl) peak corresponds to fNref/V.

Peak type (function) | Peak positions | Peak strengths | Number in unit cell
A (ρ) | At rl atomic positions (l = 1, N) | ρ(rl) = δM(rl) | N
B (g) | At rjlm = rj + rl − rm with l ≠ m and j ≠ m (j, l, m = 1, N) | g(rjlm) ≃ strength of {ρ(rl)/(N − 1)} | N(N − 1)²
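To give a feeling for the numbers in Table 2, the peak counts can be evaluated for one of the test structures. The sketch below assumes N equal atoms, as in the text; the function name `delta_m_peak_counts` is hypothetical.

```python
def delta_m_peak_counts(n_atoms):
    """Return (number of A peaks, number of B peaks, B/A strength ratio) for N equal atoms."""
    n_a = n_atoms                              # one A peak per atomic position
    n_b = n_atoms * (n_atoms - 1) ** 2         # B peaks: N(N - 1)^2 (Table 2)
    relative_strength = 1.0 / (n_atoms - 1)    # a B peak is N - 1 times weaker than an A peak
    return n_a, n_b, relative_strength

# Suoa, treated as 188 equal atoms in the unit cell
n_a, n_b, rel = delta_m_peak_counts(188)
```

For N = 188 this gives 6 574 172 peaks of type B, each about 0.5% of the strength of an A peak, which illustrates why the B peaks must overlap severely in the unit cell.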

One obvious difficulty here is how to handle the unknown g function. This difficulty can be circumvented by introducing the mask mΔδ (being either 0 or 1), which results from the realizations that (i) δM and ρ have their strong peaks at the atomic positions; and (ii) g is formed by the positive strongly overlapped peaks of type B which are much more numerous but also much smaller than the peaks in δM and ρ. As shown in Fig. 2[link], the mΔδ mask is obtained by expressing the threshold Δδ in terms of the computable σ(δM) by means of Δδ = t1σ(δM), with t1 ≃ 2.5, such that mΔδ(r) = 1 for δM(r) ≥ Δδ and mΔδ(r) = 0 otherwise. This can be mathematically expressed by

[\rho \left({\bf r}\right) = K \delta_{M} \left({\bf r}\right) \, m_{\Delta \delta} \left({\bf r}\right) , \eqno (6)]

with K being a suitable scaling constant. Fourier transforming both sides of (6)[link], and since E and ρ are linked by the Fourier transform E = FT(ρ), the formula

[E = K \, {\rm FT}\left \{ \delta_{M} m_{\Delta \delta} \right \} \eqno(7)]

results. The angular part of (7)[link] corresponds to the δM tangent formula which forms the core of the δ recycling algorithm (Rius, 2012a[Rius, J. (2012a). Acta Cryst. A68, 77-81.],b[Rius, J. (2012b). Acta Cryst. A68, 399-400.]). It has been successfully applied to X-ray diffraction data from small crystal structures, to 3D electron diffraction data (Rius et al., 2013[Rius, J., Mugnaioli, E., Vallcorba, O. & Kolb, U. (2013). Acta Cryst. A69, 396-407.]; Capitani et al., 2014[Capitani, G. C., Mugnaioli, E., Rius, J., Gentile, P., Catelani, T., Lucotti, A. & Kolb, U. (2014). Am. Mineral. 99, 500-510.]) and, due to its robustness, to synchrotron tts (tts = through the substrate) microdiffraction data (Rius et al., 2017[Rius, J., Vallcorba, O., Crespi, A. & Colombo, F. (2017). Z. Kristallogr. 232, 827-834.]). Some considerations regarding the implementation of the δM tangent formula are given in Section A3[link].
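The angular part of (7) can be illustrated with a short NumPy sketch. It is a simplified illustration under assumptions, not the δ recycling implementation: the name `delta_m_tangent_phases` is hypothetical, σ(δM) is taken as the standard deviation of the sampled map, and the positive constant K is omitted because it does not affect the phases.

```python
import numpy as np

def delta_m_tangent_phases(delta_M, t1=2.5):
    """Phases of FT{delta_M * m}, with m = 1 where delta_M >= t1 * sigma(delta_M)."""
    threshold = t1 * delta_M.std()                 # Delta_delta = t1 * sigma(delta_M)
    mask = (delta_M >= threshold).astype(float)    # binary mask, 1 above the threshold
    phases = np.angle(np.fft.fftn(delta_M * mask))
    return phases, mask

# Toy delta_M map built from eight point atoms with their true phases (P1, 12^3 grid)
rng = np.random.default_rng(1)
grid = (12, 12, 12)
atoms = np.zeros(grid)
atoms[tuple(rng.integers(0, 12, size=(3, 8)))] = 1.0
E = np.fft.fftn(atoms)
E[0, 0, 0] = 0.0                                   # drop the origin term
dE = np.abs(E) - np.abs(E).mean()                  # |E| - <|E|> (unnormalized toy moduli)
dE[0, 0, 0] = 0.0
delta_M = np.fft.ifftn(dE * np.exp(1j * np.angle(E))).real
phases, mask = delta_m_tangent_phases(delta_M)
```

In an actual recycling scheme the new phases would be recombined with the observed moduli to recompute δM, and the step repeated until the phases stabilize.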

[Figure 2]
Figure 2
(Left) Part of a hypothetical one-dimensional unit cell corresponding to an atomic position, with schematic representation of the associated δM(x) and ρ(x) [equation (5)[link]]. The binary mask mΔδ(x) is 1 if δM(x) is larger than the Δδ threshold and 0 otherwise. (Right) The same part of the unit cell shows the product function δM(x)mΔδ(x), which is proportional to ρ(x) (for equiatomic structures). If the number N of expected atoms in the unit cell is known, then the δM tangent formula reduces to a structure factor calculation over the N largest product-function peaks greater than Δδ, i.e. the integral of the FT in expression (7)[link] reduces to a sum.

SMAR uses another form of δ-GEQ in which g is isolated, namely

[{\delta}_{M} \left({\bf r}, {\Phi}_{\rm T} \right) - \rho \left({\bf r} , \Phi_{\rm T} \right) = g \left({\bf r}\right) \quad \forall \ {\bf r} \subset V . \eqno(8)]

According to ΦT ≃ χT at the end of Section 2.1[link], (8)[link] can be expressed in terms of χT (= the set of phases corresponding to |ρ(ΦT)|) so that

[{\delta}_{M} \left({\bf r}, {\chi}_{\rm T} \right) \, m_{\rho} \left({\bf r}\right) - \rho \left({\bf r}, {\chi}_{\rm T} \right) \, m_{\rho} \left({\bf r}\right) = g \left({\bf r}\right) \, m_{\rho} \left({\bf r}\right) \quad \forall \ {\bf r} \subset V , \eqno(9)]

where both sides are multiplied by mρ [which is derived from ρ(ΦT)]. Expression (9)[link] is the basic equation for one of the two SMAR residuals (Rδ). Note the positive effect of introducing the mask mρ. Since the zero part of the mask is ≥50% of the unit-cell volume, the unwanted contribution of g in (9)[link] is suppressed for at least half of the unit cell.

3. The SMAR residuals

Each iteration of the SMAR algorithm consists of two distinct parts, each ending with the application of the corresponding phasing formula (upper-left and lower-right corners of Fig. 1[link]). In this section, the two SMAR residuals leading to these phasing formulae are determined. In the following, r is omitted unless absolutely necessary.

3.1. The Rρ(χ) residual

The |ρ(Φ)| density function results from applying the absolute value operator to the ρ(Φ) Fourier synthesis. The structure factors of |ρ(Φ)| correspond to its Fourier transform,

[\left |\xi \right| \exp{(i\alpha)} = {\rm FT} \left [ \left | \rho \left ( \Phi \right ) \right | \right ] , \eqno(10)]

with the moduli and phase values of the structure factors being globally denoted {|ξ|} and χ, respectively. The inverse Fourier transform of both sides of (10)[link] yields

[\rho \left ( \left \{ \left | \xi \right | \right \}, \chi \right ) = \left | \rho \left (\Phi \right ) \right | . \eqno (11)]

For χ = χT it can be assumed that [\left \{ \left | \xi \right | \right \} \cong \left \{ \left | E \right | \right \}]. Consequently, the integral

[{R}_{\rho} \left(\chi \right) = \int \limits_{V} \left [ \rho \left ( \left \{ \left |E \right| \right \}, \chi \right ) - \rho \left ( \left \{ \left | \xi \right | \right \}, \chi \right ) \right ]^{2} {\rm d}V \eqno (12)]

must be close to zero for [R_\rho (\chi_{\rm T})] which corresponds to the minimum of Rρ. Simplifying the notation of ρ({|E|}, χ) to ρ(χ) and replacing ρ({|ξ|}, χ) first by |ρ(Φ)| according to (11)[link] and then by ρ(Φ)sρ according to (3)[link], integral (12)[link] takes the simpler form

[{R}_{\rho} \left ( \chi \right ) = \int \limits_{V} \left [ \rho \left ( \chi \right ) - \rho \left ( \Phi \right ) {s}_{\rho} \right ]^{2} {\rm d}V . \eqno(13)]

During the refinement the function ρ(Φ)sρ is always positive. To find the new χ set minimizing Rρ(χ), the squared integrand of (13)[link] is expanded, which splits the residual into three integrals. The two integrals with integrands |ρ(Φ)|² and ρ²(χ) are both equal to [{1 \over V} \sum \nolimits_k \left| E_k \right |^2] and hence phase independent; however, the third one,

[-2{S}_{\rho} \left ( \chi \right ) = -2 \int \limits_{V} \rho \left ( \chi \right ) \rho \left ( \Phi \right ) {s}_{\rho} \, {\rm d}V , \eqno (14)]

is phase dependent. The maximum of a functional like Sρ(χ) [which is equivalent to the minimum of Rρ(χ) due to the minus sign in (14)[link]] can be found by solving the condition for an extremum, ∂Sρ/∂α = 0, ∀ α ∈ χ, which, in parallel to Rius et al. (2007[Rius, J., Crespi, A. & Torrelles, X. (2007). Acta Cryst. A63, 131-134.]), yields the χ phasing formula,

[\alpha^{\rm new} = {\rm phase \ of \ FT} \left \{ \rho \left ( \Phi \right ) {s}_{\rho} \right \} .\eqno (15)]
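
The step from (13) to (15) can be written out explicitly. The following lines are a sketch that uses only s_ρ² = 1, Parseval's theorem and the definitions (10) and (14); the reciprocal-space form of S_ρ in the last line is added here for illustration and is not quoted from the original derivation.

```latex
\begin{align*}
R_\rho(\chi) &= \int_V \rho^2(\chi)\,{\rm d}V
              + \int_V \left[\rho(\Phi)\,s_\rho\right]^2 {\rm d}V
              - 2\int_V \rho(\chi)\,\rho(\Phi)\,s_\rho\,{\rm d}V \\
             &= \frac{2}{V}\sum_k |E_k|^2 - 2\,S_\rho(\chi), \\
S_\rho(\chi) &= \frac{1}{V}\sum_k |E_k|\,|\xi_k|
                \cos\!\left(\alpha_k - \alpha_k^{\rm new}\right),
\end{align*}
```

Here α_k^new denotes the phase of the kth Fourier coefficient of ρ(Φ)s_ρ. Since every term of the sum is largest when α_k = α_k^new, maximizing S_ρ(χ) reproduces the phasing formula (15).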

3.2. The Rδ(Φ) residual

The residual Rδ is obtained from the left side of (9)[link] after generalizing χT to χ. This generalization entails two changes: (i) δM(r, χT) is simply changed to δM(r, χ), since in both cases the Fourier coefficients of δM contain the observed |E| − 〈|E|〉 values; and (ii) ρ(r, χT) is changed to ρ(r, {|ξ|}, χ). However, since ρ(r, {|ξ|}, χ) = |ρ(r, Φ)| = ρ(r, Φ)sρ(r) because of (11)[link] and then (3)[link], the selected generalized form is ρ(r, Φ)sρ(r) [this selection ensures that ρ(Φ) enters the residual expression (17)[link]]. By applying these two changes to the left-hand side of (9)[link], it becomes

[\delta_{M} \left ( {\bf r}, \chi \right ) m_{\rho} \left ( {\bf r} \right ) - \rho \left ( {\bf r}, \Phi \right ) {s}_{\rho} ({\bf r}) \, {m}_{\rho} \left ( {\bf r} \right ) \quad \forall \ {\bf r} \subset V . \eqno(16)]

Integration of (16)[link] after squaring gives the Rδ residual,

[{R}_{\delta} \left ( \Phi \right ) = \int \limits_{V} \left [ \delta_{M} \left ( \chi \right ) {m}_{\rho} - \rho \left ( \Phi \right ) {m}_{\rho} {s}_{\rho} \right ]^{2} {\rm d}V , \eqno(17)]

where ρ(Φ)mρsρ is always positive [= |ρ(Φ)|mρ]. Of methodological importance is the minimum value of Rδ, which occurs for Rδ(ΦT). An estimate of this value is obtained by squaring and integrating the right-hand side of (9)[link], namely

[{R}_{\delta} \left ( \Phi_{\rm T} \right ) = \int \limits_{V} {m}_{\rho} {g}^{2} \, {\rm d}V . \eqno(18)]

In this integral, g = δM(ΦT) − ρ(ΦT) given in (8)[link] and ρ(ΦT) used to calculate mρ are different functions with different peak distributions. Consequently, the samples of g² at the points where mρ = 1 can be assumed to be random, allowing the factorization of 〈mρ〉 from the integral

[{R}_{\delta} \left ( {\Phi}_{\rm T} \right ) \cong \langle {m}_{\rho} \rangle \int \limits_{V} {g}^{2} \, {\rm d}V = \langle {m}_{\rho} \rangle {I}_{{g}^{2}} .\eqno(19)]

The value of the normalized Ig2 is given by equation (49)[link] in Appendix B[link], i.e.

[{I}_{{g}^{2}} = \left ( c - 1 \right )^{2} \langle \left | E \right |^{2} \rangle_{k} - c \left ( c - 2 \right ) \left ( \langle \left | E \right | \rangle_{k} \right )^{2} \ \simeq 1.12.]

According to (19)[link], Rδ does not converge to zero but to the positive value Rδ(ΦT) ≃ 1.12 × 〈mρ〉. Since it is known from Table 1[link] that 〈mρ〉 is ∼0.45 (at the end of a converging refinement), the approximated value of Rδ(ΦT) should be 0.45 × 1.12 = 0.50.
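The numerical values above can be reproduced with a few lines of Python. This is a hedged check, not the calculation of Appendix B: it assumes acentric Wilson statistics (〈|E|〉 = √π/2, 〈|E|²〉 = 1) and the large-N limit c ≈ 2/〈|E|〉 of the scaling constant; the function name `normalized_ig2` is hypothetical.

```python
import math

def normalized_ig2(mean_E, mean_E2=1.0):
    """Normalized integral of g^2, using c = 2/<|E|> (large-N limit, an assumption)."""
    c = 2.0 / mean_E
    return (c - 1.0) ** 2 * mean_E2 - c * (c - 2.0) * mean_E ** 2

mean_E_acentric = math.sqrt(math.pi) / 2.0   # Wilson value of <|E|> for acentric data
ig2 = normalized_ig2(mean_E_acentric)        # close to the 1.12 quoted in the text
mean_mask = 0.45                             # <m_rho> at convergence (Table 1)
r_delta_min = mean_mask * ig2                # expected minimum of R_delta, close to 0.50
```

For a finite number of atoms the 1/√N term in c raises the estimate somewhat, so the value should be read as an order-of-magnitude guide rather than an exact target.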

Once the residual Rδ is defined and its minimum value known, the last step is to find the Φ phase set that minimizes Rδ. For this purpose, (17)[link] is transformed into the sum

[{R}_{\delta} = P + Q - 2{S}_{\delta} , \eqno(20)]

with P, Q and Sδ being the following integrals:

[{S}_{\delta} = \int \limits_{V} \delta_{M} (\chi) \, \rho \left ( \Phi \right ) {s}_{\rho} {m}_{\rho} \, {\rm d}V , \eqno (21)]

[P = \int \limits_{V} {\rho}^{2} (\Phi) \, {m}_{\rho} \, {\rm d}V , \eqno (22)]

[Q = \int \limits_{V} {\delta}_{M}^{2} (\chi) \, {m}_{\rho} \, {\rm d}V . \eqno (23)]

The presence of the mask mρ complicates the solution of these integrals. Their evaluation with the help of experimental information is given in Section 3.2.1[link]. The principal conclusion of Section 3.2.1[link] is that minimizing Rδ is essentially equivalent to maximizing Sδ. Knowing this, one only needs to find the desired maximum of Sδ(Φ) by solving the condition for an extremum, ∂Sδ/∂φ = 0, ∀ φ ∈ Φ. Expressing ρ(Φ) in (21)[link] as a Fourier synthesis gives

[\eqalignno{ {S}_{\delta} \left ( \Phi \right ) = & \, {{1} \over {V}} \sum \limits_{k} \left | {E}_{-k} \right | \exp{(i \varphi_{-k})} \cr \times & \, \int \limits_{V} \delta_{M} \left ( {\bf r}, \chi \right ) \, {m}_{\rho} \left ( {\bf r} \right ) \, {s}_{\rho} \left ( {\bf r} \right ) \exp{(i 2\pi {\bf k} {\bf r})} \, {\rm d}{\bf r} & (24)}]

and, in parallel to Rius et al. (2007[Rius, J., Crespi, A. & Torrelles, X. (2007). Acta Cryst. A63, 131-134.]) and Rius (2020[Rius, J. (2020). Acta Cryst. A76, 489-493.]), the application of the condition for an extremum to (24)[link] gives the Φ (or also SMAR) phasing formula,

[{\varphi}^{\rm new} = {\rm phase \ of \ FT} \left \{ {\delta}_{M} \left ( \chi \right ) \, m_{\rho} s_{\rho} \right \} . \eqno(25)]

For simplicity, the Fourier-transformed function δMmρsρ is denoted ρ′ in the slow convergence mode and ρ′′ in the fast convergence mode (see Section 4[link]).

3.2.1. Evolution of Rδ, Sδ, P and Q during the SMAR phase refinements

First, the values of the integrals (21)[link], (22)[link] and (23)[link] are normalized by division with

[{\rm SRO}2 = \int \limits_{V} \rho^{2} \, {\rm d}V , \eqno(26)]

which only depends on the |E| values and is therefore computable. The values of −2Sδ, P and Q during the phase refinement progress were determined with the SMAR phasing algorithm already implemented in XLENS_v1 (Rius, 2020[Rius, J. (2020). Acta Cryst. A76, 489-493.]). The density function values used in the estimation of (21)[link], (22)[link] and (23)[link] are those available before applying ipp (Fig. 1[link]). Tables 3[link] and 4[link] show the evolutions for Actinomycin Z3, Suoa, Pep1 and Alpha1 peptide (each grouping of four numbers given in this subsection always refers to this order of test structures; the output files of the test calculations are available in the supporting information). For Actinomycin Z3, the evolutions of 2Sδ, ΔP and ΔQ are also represented in Fig. 3[link]. The evolution of the different integrals can be summarized as follows:

Table 3
Evolution of −2Sδ [equation (21)], P [equation (22)], Q [equation (23)] and Rδ [equation (20)] during a converging default SMAR refinement from random starting phases for Actinomycin Z3 (Schäfer et al., 1998), normalized by division by SRO2 [equation (26)] (t = 2.5)

〈kΔP〉 in (27) is the proportionality constant kΔP averaged over the number of refinement cycles. The columns headed 0 and −1 list the percentages of zero and negative voxels, respectively, in the ρ′ map. The columns headed CCρ′ and CCρ′′ give the correlation coefficients before and after the ipp application.

| Iteration | −2Sδ | P | Q | Rδ | kΔP | 0 (%) | −1 (%) | CCρ′ | CCρ′′ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 0.000 | 0.518 | 0.888 | 1.41 | | 49.75 | 0.17 | 0.002 | 0.515 |
| 2 | −0.436 | 0.574 | 0.880 | 1.02 | 0.169 | 51.60 | 0.09 | 0.307 | 0.652 |
| 5 | −0.622 | 0.601 | 0.890 | 0.87 | 0.162 | 52.32 | 0.08 | 0.425 | 0.710 |
| 10 | −0.678 | 0.614 | 0.898 | 0.83 | 0.168 | 52.39 | 0.10 | 0.457 | 0.729 |
| 15 | −0.700 | 0.621 | 0.905 | 0.82 | 0.172 | 52.48 | 0.10 | 0.467 | 0.745 |
| 20 | −0.724 | 0.624 | 0.913 | 0.81 | 0.172 | 52.53 | 0.11 | 0.479 | 0.746 |
| 21 | −0.744 | 0.628 | 0.915 | 0.80 | 0.172 | 52.62 | 0.10 | 0.491 | 0.767 |
| 22 | −0.826 | 0.642 | 0.936 | 0.75 | 0.172 | 52.91 | 0.09 | 0.533 | 0.800 |
| 23 | −1.070 | 0.687 | 0.944 | 0.56 | 0.175 | 55.00 | 0.07 | 0.664 | 0.876 |
| 24 | −1.164 | 0.710 | 0.936 | 0.48 | 0.172 | 54.65 | 0.08 | 0.714 | 0.906 |
| 25 | −1.169 | 0.710 | 0.934 | 0.48 | 0.172 | 54.70 | 0.07 | 0.717 | 0.908 |
| 29 | −1.166 | 0.710 | 0.938 | 0.48 | 0.180 | 54.68 | 0.08 | 0.715 | 0.903 |

〈kΔP〉 = 0.171 (6) (28×)
Rδ value at the end of the converging refinement.

Table 4
As for Table 3 but for three additional test examples

Only three stages of each phase refinement have been selected (at the beginning, when convergence begins and when it ends). In all three examples Q − Q0 ≤ 0.02 during the phase refinement.

| Data set | Iteration | −2Sδ | P | Q | Rδ | kΔP | 0 (%) | −1 (%) | CCρ′ | CCρ′′ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Suoa (Oliver & Strickland, 1984) | 1 | −0.033 | 0.517 | 0.893 | 1.38 | | 49.71 | 0.17 | 0.024 | 0.602 |
| | 13 | −0.747 | 0.617 | 0.890 | 0.76 | 0.157 | 53.62 | 0.04 | 0.504 | 0.753 |
| | 27 | −1.180 | 0.702 | 0.900 | 0.42 | 0.171 | 56.44 | 0.01 | 0.742 | 0.917 |
| Pep1 (Antel et al., 1995) | 1 | −0.025 | 0.520 | 0.880 | 1.38 | | 49.86 | 0.18 | 0.018 | 0.602 |
| | 14 | −0.712 | 0.611 | 0.875 | 0.77 | 0.154 | 53.13 | 0.05 | 0.487 | 0.740 |
| | 28 | −1.169 | 0.702 | 0.899 | 0.43 | 0.173 | 55.76 | 0.02 | 0.736 | 0.907 |
| Alpha1 peptide (Privé et al., 1999) | 1 | −0.013 | 0.519 | 0.866 | 1.37 | | 49.90 | 0.18 | 0.000 | 0.575 |
| | 21 | −0.651 | 0.605 | 0.861 | 0.82 | 0.156 | 52.40 | 0.09 | 0.451 | 0.711 |
| | 42 | −1.089 | 0.687 | 0.875 | 0.47 | 0.167 | 54.18 | 0.05 | 0.703 | 0.906 |

〈kΔP〉 = 0.155 (6) (26×) for Suoa, 0.159 (6) (27×) for Pep1 and 0.162 (5) (41×) for Alpha1 peptide.
Figure 3
Evolution of the normalized 2SMAR (top, in blue), (P − P0) (middle, in red) and (Q − Q0) (bottom, in green) for a converging SMAR refinement using Actinomycin Z3 data (Schäfer et al., 1998) (t = 2.5). See the heading of Table 3 for further details.

(i) Integral Sδ: When starting from random phase values, the initial −2Sδ values are always close to zero for all test structures and become −1.17, −1.18, −1.16 and −1.09 at the end of the respective convergent refinements.

(ii) Integral P: The initial P0 value of integral P is 0.50 for all test structures and, as the phase refinement progresses, the difference ΔP = P − P0 increases. ΔP is approximately proportional to 2Sδ with the slopes 〈kΔP〉 equal to 0.171 (6), 0.155 (6), 0.159 (6) and 0.162 (5) for the four test structures (Tables 3 and 4). Consequently, the following empirical linear relationship between P and 2Sδ can be established,

[P = 0.50 + \langle {k}_{\Delta P} \rangle {2S}_{\delta} . \eqno (27)]

(iii) Integral Q: To understand the significance of integral Q, the integral

[{\rm SDEL} = \int \limits_{V} \delta_{M}^{2} \, {\rm d}V \eqno (28)]

is also calculated for each test structure [it only depends on the (|E| − 〈|E|〉)² quantities]. The corresponding SDEL values are 1.762, 1.790, 1.756 and 1.728. According to Tables 3 and 4, the initial Q values are Q0 = 0.89, 0.89, 0.88 and 0.89, i.e. Q0 ≃ 0.50 × SDEL. In addition, during the respective phase refinements, the largest Q − Q0 differences are only 0.05, 0.01, 0.02 and 0.01. Consequently, it can be assumed that [Q \simeq Q_{0}].
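The empirical relations in points (ii) and (iii) can be checked against the tabulated numbers. The sketch below assumes that the per-cycle constant is kΔP = (P − P0)/(2Sδ) with P0 = 0.50, and compares the first-iteration Q values of Tables 3 and 4 with 0.50 × SDEL. The helper name `k_delta_p` is hypothetical; the input values are copied from Table 3 (Actinomycin Z3) and from the first rows of Tables 3 and 4.

```python
# (iteration, -2S_delta, P, tabulated k_dP) from Table 3 (Actinomycin Z3)
rows = [
    (2, -0.436, 0.574, 0.169), (5, -0.622, 0.601, 0.162),
    (10, -0.678, 0.614, 0.168), (15, -0.700, 0.621, 0.172),
    (20, -0.724, 0.624, 0.172), (21, -0.744, 0.628, 0.172),
    (22, -0.826, 0.642, 0.172), (23, -1.070, 0.687, 0.175),
    (24, -1.164, 0.710, 0.172), (25, -1.169, 0.710, 0.172),
    (29, -1.166, 0.710, 0.180),
]
P0 = 0.50

def k_delta_p(minus_two_s, p, p0=P0):
    """Assumed per-cycle slope: (P - P0) / (2 S_delta)."""
    return (p - p0) / (-minus_two_s)

k_values = [k_delta_p(m2s, p) for _, m2s, p, _ in rows]
k_deviation = max(abs(k - k_tab) for k, (_, _, _, k_tab) in zip(k_values, rows))

# Q at iteration 1 (Tables 3 and 4) against 0.50 * SDEL, same structure order
q_first = [0.888, 0.893, 0.880, 0.866]
sdel = [1.762, 1.790, 1.756, 1.728]
q_deviation = max(abs(q - 0.50 * s) for q, s in zip(q_first, sdel))
```

Under these assumptions the recomputed slopes stay within about 0.01 of the tabulated kΔP values, and the first-iteration Q values stay within about 0.01 of 0.50 × SDEL.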

By taking all these results into account, Rδ can be simplified to

[{R}_{\delta} \simeq \left ( {P}_{0} + {Q}_{0} \right ) - \left ( 1 - \langle {k}_{\Delta P} \rangle \right ) 2S_{\delta} . \eqno (29)]

Since P0, Q0 and 〈kΔP〉 can be considered nearly constant during the phase refinement, it follows from (29) that minimizing Rδ is essentially equivalent to maximizing Sδ.
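To see how well (29) tracks the tabulated residual, the following sketch compares, for a few rows of Table 3, the tabulated Rδ with (a) the sum P + Q − 2Sδ and (b) the approximation (29). The identity Rδ = P + Q − 2Sδ is an assumption here: it is what (29) implies when combined with (27) and Q ≃ Q0, and it is consistent with the tabulated values. Q0 is taken as the first-iteration Q of Table 3, and the function names are hypothetical.

```python
# (iteration, -2S_delta, P, Q, tabulated R_delta) from Table 3 (Actinomycin Z3)
rows = [
    (1, 0.000, 0.518, 0.888, 1.41),
    (2, -0.436, 0.574, 0.880, 1.02),
    (23, -1.070, 0.687, 0.944, 0.56),
    (29, -1.166, 0.710, 0.938, 0.48),
]
P0, Q0, K_MEAN = 0.50, 0.888, 0.171   # Q0: iteration-1 value of Q (assumption)

def r_sum(minus_two_s, p, q):
    """Assumed expansion of the residual: R = P + Q - 2S."""
    return p + q + minus_two_s

def r_approx(minus_two_s, p0=P0, q0=Q0, k=K_MEAN):
    """Approximation (29): R ~ (P0 + Q0) - (1 - <k>) * 2S."""
    return (p0 + q0) - (1.0 - k) * (-minus_two_s)

comparison = [
    (it, r_tab, r_sum(m2s, p, q), r_approx(m2s))
    for it, m2s, p, q, r_tab in rows
]
for it, r_tab, r_s, r_a in comparison:
    print(f"iteration {it:2d}: tabulated {r_tab:.2f}  sum {r_s:.3f}  approx {r_a:.3f}")
```

In this sketch the sum reproduces the tabulated Rδ to within the rounding of the table, while (29) deviates by a few hundredths near convergence, mostly because the small drift Q − Q0 is neglected.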

4. The phase refinement modes in SMAR

Since most experimental results of SMAR have already been discussed by Rius & Torrelles (2021, 2022) and in Section 3.2.1 of this contribution, only a selection of points directly related to the topic of this article are treated here, grouped according to the convergence mode.

4.1. The slow convergence mode

This mode only works with density functions, that is, the positions of the atomic peaks are not used. This mode requires the inclusion of the Fourier terms of all reflections (strong + weak) in the calculation of ρ and δM. The ρ′ = δMmρsρ values entered in the SMAR phasing formula are obtained as follows:

(i) For slightly negative ρ values (amounting to 50–55% of the unit cell), the ρ′ values are 0.

(ii) For positive ρ values (regardless of their strength and representing 45–50% of the unit cell), ρ′ is equal to δM.

(iii) Only for very negative ρ values (<1% of the unit cell for t ≃ 2.5), ρ′ is equated to −δM (the minus sign multiplying δM tends to restore the sign of the very negative ρ value).

In summary, ρ′ corresponds either to unrestricted δM and −δM values or to fixed δM = 0 ones. The mask used by the SMAR algorithm results in smooth phase refinements based only on density functions. On the other hand, the experimental Rδ(ΦT) values calculated at the end of converging phase refinements using (20) are 0.48, 0.42, 0.43 and 0.47 for Actinomycin Z3, Suoa, Pep1 and Alpha1 peptide, respectively (Tables 3 and 4). These values agree with the theoretical estimation of Rδ(ΦT) ≃ 0.50 found in Section 3.2.
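The three cases (i) to (iii) above can be sketched as a small function that builds ρ′ = δM mρ sρ point by point. How the boundary between 'slightly' and 'very' negative ρ values is derived from t is not restated in this section, so the boundary is passed in as an explicit, hypothetical parameter `very_negative_below`; treating ρ = 0 as positive is likewise an assumption of this sketch.

```python
def rho_prime(rho, delta_m, very_negative_below):
    """Return rho' = delta_M * m_rho * s_rho for each grid point.

    very_negative_below is a hypothetical negative cutoff separating
    'slightly negative' from 'very negative' rho values.
    """
    out = []
    for r, d in zip(rho, delta_m):
        if r >= 0.0:
            out.append(d)        # case (ii): m = 1, s = +1
        elif r < very_negative_below:
            out.append(-d)       # case (iii): m = 1, s = -1
        else:
            out.append(0.0)      # case (i): m = 0 (zero conversion)
    return out

# Hypothetical toy values on eight grid points
rho = [1.0, -0.1, -5.0, 0.2, -0.2, 0.1, -0.1, 0.3]
delta_m = [0.8, 0.3, 0.4, 0.1, -0.2, 0.05, 0.2, 0.1]
rho_p = rho_prime(rho, delta_m, very_negative_below=-4.0)
zero_fraction = rho_p.count(0.0) / len(rho_p)
```

In the toy data only the third grid point falls below the cutoff, so it is the only one where δM enters with a reversed sign.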

Regarding the very negative densities, some of them are due to the wrong input model (generated by the random starting phases) and disappear during the convergence process. The reductions observed in ρ′ are 0.17% → 0.08% for Actinomycin Z3, 0.18% → 0.06% for Alpha1 peptide, 0.18% → 0.02% for Pep1 and 0.17% → 0.01% for Suoa (Tables 3 and 4). To get an idea of the effect of t on the phasing of the intensity data of the four test structures, the sums of their total number of successful trials (out of 25) were determined for t = 1.5, 2.0, 2.5 and 10.0 (in the last case, the negative densities are all zero). The respective sums are 34, 76, 69 and 59 (Table S1 in the supporting information). The largest sums are obtained for t = 2.0 and 2.5 and the smallest for t = 1.5. For t = 10.0, the resulting sum is slightly worse than for t = 2.0 or 2.5. These results suggest that (i) the best t values are between 2.0 and 2.5, and (ii) although they represent only a small percentage of the unit cell, setting the very negative densities equal to zero is not beneficial for phase refinement. A possible explanation of the physical meaning of the very negative densities can be found in Section 3.3 of Rius (2020). In any case, a future comprehensive study focusing on this point would be useful.

4.2. The fast convergence mode

In this mode, which is the default operating mode of the SMAR algorithm, the part working only with density functions is supplemented by an additional step in which ρ′ is modified by the ipp method to give ρ′′ (Rius & Torrelles, 2021), which in turn replaces ρ′ in the Φ phasing formula (25). The ipp method is an effective way of accelerating the phase refinement and assumes that the approximate number N of expected atoms is known (which is normally the case). Briefly explained, ipp identifies in the ρ′ Fourier map those grid points closest to the centers of the N largest peaks. The ρ′ values of the 27 inner-peak grid points are then preserved for each peak and the remaining grid points of the ρ′ Fourier map are set to zero, giving rise to the new ρ′′. In this way no interpolation is required to find peak centers and, at the same time, the large ρ′ values are preserved. Additionally, if the grid size Δgrid is ∼0.33 Å, ipp implicitly applies the minimum interpeak separation (mips) constraint. The criterion for considering a positive maximum of the ρ′ Fourier synthesis a peak is that the voxel closest to the peak center must be surrounded by 26 smaller nearest-neighbor voxels (some can even be negative). Consequently, none of the nearest neighboring voxels can become the center of another ρ′ peak. This means that for an isometric grid element with Δgrid = 0.33 Å the average mips value is 0.955 Å [the minimum, intermediate and maximum separations are 2 × 0.33 × 1 = 0.67 Å (6×), 2 × 0.33 × [\sqrt 2] = 0.96 Å (12×) and 2 × 0.33 × [\sqrt 3] = 1.16 Å (8×), respectively]. According to (19), the value of Rδ(ΦT) is proportional to 〈mρ〉. In the fast convergence mode, due to the application of ipp, 〈mρ〉 = 27N/Nvox, where Nvox is the total number of voxels in the unit cell. For Actinomycin Z3, 27N = 33372 and Nvox = 664875, and hence 〈mρ〉 = 0.05, much smaller than the typical 〈mρ〉 values for the slow convergence mode (∼0.45). Accordingly, Rδ(ΦT) = 1.12 × 0.05 = 0.06 is also much smaller than in the slow convergence mode.
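The ipp step described above can be illustrated with a minimal sketch on a small periodic grid. It is not the implementation of Rius & Torrelles (2021): the function name `ipp`, the nested-list grid, the periodic wrap-around and the toy peak heights are all assumptions made for illustration. A voxel is accepted as a peak center if it is positive and strictly larger than its 26 nearest neighbors; the 27 voxels of each of the N largest peaks are kept and everything else is set to zero.

```python
from itertools import product

# Offsets of the 26 nearest neighbours of a voxel on a cubic grid.
OFFSETS = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]

def ipp(grid, n_atoms):
    """Return (rho'', mask fraction) for a periodic 3D grid of rho' values."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])

    def neighbours(i, j, k):
        return [((i + a) % nx, (j + b) % ny, (k + c) % nz) for a, b, c in OFFSETS]

    # Peak centres: positive voxels strictly larger than all 26 neighbours.
    peaks = []
    for i, j, k in product(range(nx), range(ny), range(nz)):
        v = grid[i][j][k]
        if v > 0 and all(v > grid[a][b][c] for a, b, c in neighbours(i, j, k)):
            peaks.append((v, (i, j, k)))
    peaks.sort(reverse=True)

    # Keep the 27 inner-peak voxels of the n_atoms largest peaks.
    kept = set()
    for _, centre in peaks[:n_atoms]:
        kept.add(centre)
        kept.update(neighbours(*centre))

    out = [[[grid[i][j][k] if (i, j, k) in kept else 0.0
             for k in range(nz)] for j in range(ny)] for i in range(nx)]
    return out, len(kept) / (nx * ny * nz)

# Hypothetical toy map: two strong peaks and one weak spurious maximum.
n = 6
grid = [[[0.0] * n for _ in range(n)] for _ in range(n)]

def put_peak(centre, height, shoulder):
    i, j, k = centre
    for a, b, c in OFFSETS:
        grid[(i + a) % n][(j + b) % n][(k + c) % n] = shoulder
    grid[i][j][k] = height

put_peak((1, 1, 1), 5.0, 1.0)
put_peak((4, 4, 4), 3.0, 0.5)
grid[1][4][1] = 0.2

rho_pp, mask_fraction = ipp(grid, n_atoms=2)

# Arithmetic quoted in the text for Actinomycin Z3
mean_mask_actz3 = 33372 / 664875
r_delta_actz3 = 1.12 * mean_mask_actz3
```

For non-overlapping peaks the mask fraction of the toy map equals 27N/Nvox, which is the relation used in the text for 〈mρ〉 in the fast convergence mode.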

A characteristic of this mode is that the calculation of ρ(Φ) only includes Fourier terms of those reflections satisfying the |E| ≥ |E|lim condition with |E|lim = 1.0, while the calculation of δM(χ) (and thus of ρ′) is always done with the Fourier terms of all k reflections (Fig. 1). That only the large |E| values should participate in the ρ update is certainly related to the fact that only the 27 inner voxels close to the peak center are preserved (the rest of the peak voxels become part of the zero mask). This is supported by the fact that, in the slow convergence mode, the Fourier terms of all reflections must be included in the calculation of ρ(Φ).

4.3. The correlation coefficient

In addition to estimating Rδ, the agreement of minuends and subtrahends in (17) can also be estimated using the correlation coefficient

[{\rm CC}_{\rho^{\prime}} = {{S_{MAR}} \over {\left ( P \, Q \right )^{1/2} }} \eqno (30)]

(using the density values before ipp for the slow convergence case). For refinements reaching convergence, the values found for CCρ′ are 0.715 for Actinomycin Z3 (Table 3), 0.742 for Suoa, 0.736 for Pep1 and 0.705 for Alpha1 peptide (Table 4). These moderately high correlation coefficients also confirm the small discrepancy introduced in the CCρ′ calculation by the g contributions of those voxels with mρ = 1. As expected, the corresponding CCρ′′ values (also listed in Tables 3 and 4) are much higher due to the smaller number of voxels with mρ = 1, e.g. CCρ′ = 0.72 and CCρ′′ = 0.90 for Actinomycin Z3.
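Expression (30) can be evaluated directly from the normalized integrals listed in Tables 3 and 4. The sketch below assumes that SMAR in (30) is the same sum function whose doubled, negated value is tabulated as −2Sδ; the function name `cc_rho_prime` is hypothetical, and the inputs are the last tabulated rows of each converging refinement.

```python
import math

# data set: (-2S_delta, P, Q, tabulated CC before ipp), last rows of Tables 3 and 4
final_rows = {
    "Actinomycin Z3": (-1.166, 0.710, 0.938, 0.715),
    "Suoa": (-1.180, 0.702, 0.900, 0.742),
    "Pep1": (-1.169, 0.702, 0.899, 0.736),
    "Alpha1 peptide": (-1.089, 0.687, 0.875, 0.703),
}

def cc_rho_prime(minus_two_s, p, q):
    """Correlation coefficient (30): S / (P Q)^(1/2), with S = -(-2S)/2."""
    return (-minus_two_s / 2.0) / math.sqrt(p * q)

cc_values = {name: cc_rho_prime(m2s, p, q)
             for name, (m2s, p, q, _) in final_rows.items()}
for name, cc in cc_values.items():
    print(f"{name}: CC = {cc:.3f} (tabulated {final_rows[name][3]:.3f})")
```

Under this assumption the recomputed coefficients agree with the tabulated ones to within the rounding of the input values.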

5. Conclusions

The main objective of this research was to complete the theoretical aspects of the SMAR phasing algorithm. For this purpose, the connection between δM and ρ has been examined in detail. This leads to the general equation of δ direct methods (δ-GEQ) which, written in its standard form, is δM = ρ + g, where the density function g is mainly formed by a large number of small positive B-type peaks. Two ways of using δ-GEQ have been investigated. In SMAR, δ-GEQ is used in its difference form, δM − ρ = g, while in δ recycling it is used solved for ρ, so ρ = δM − g. In this second case, it has been shown that the δM tangent formula can be derived directly from it by including a suitable mask.

Regarding the SMAR residuals it can be concluded that:

(i) Rρ(χ) measures the [ρ(χ) − ρ(Φ)sρ]² differences in the entire unit cell. The minimum value of Rρ is Rρ(χT) ≃ 0.

(ii) The Rδ(Φ) residual is based on its basic equation (9), i.e. δM(χT)mρ − ρ(χT)mρ = gmρ, where χT are the α phases corresponding to |ρ(ΦT)|.

(iii) Rδ(Φ) measures the [δM(χ)mρ − ρ(Φ)mρsρ]² differences in the entire unit cell after considering that ρ(χ) ≅ ρ(Φ)sρ. The minimum value corresponds to Rδ(ΦT) [\simeq {\langle m_\rho \rangle} \, I_{g^2}] where [I_{g^2} \simeq 1.12]. It is shown that minimizing Rδ(Φ) is essentially equivalent to maximizing the sum function Sδ(Φ) [equation (21)], despite the presence of the mρ mask in the Rδ(Φ) definition. In all the examples calculated by the author, the Sδ maximum (characterized by a sudden Sδ increase) is always the true solution ΦT which stands out clearly from the false solutions.

(iv) The convergence of SMAR is achieved by alternately applying the χ and Φ (or SMAR) phasing formulas in each iteration. These are αnew = phase of FT{ρ(Φ)sρ} and φnew = phase of FT{δM(χ)mρsρ}, respectively.

It has been found that at the start of a SMAR refinement, the zero part of the mask (created by converting the slightly negative density function values to zero) occupies 50% of V and increases by ∼5% after convergence. According to Rδ(ΦT) ≃ 1.12〈mρ〉 the presence of the zero part of the mask leads to a drop in the Rδ value when convergence begins, since the volume of the regions with only a g contribution is reduced. When the number N of expected atoms is actively used (fast convergence mode), each SMAR iteration is supplemented with the ipp application, which increases significantly the volume of the zero part of the mask.

Finally, a brief reflection is in order. It is known that non-crystalline materials have a continuous diffraction pattern and that oversampling of the intensity data (Shannon, 1949[Shannon, C. E. (1949). Proc. Inst. Radio Eng. 37, 10-21.]) results in an overdetermined system of equations from which the phases can be solved (even at non-atomic resolution) (Miao et al., 2000[Miao, J., Kirz, J. & Sayre, D. (2000). Acta Cryst. D56, 1312-1315.]). It is also known that oversampling cannot be applied to crystals due to their 3D periodicity. Here it is shown that, in the case of crystals, the combination of δM and ρ each with |ρ| produces two independent residuals while keeping the same unknowns. This also leads to overdetermination which should explain the observed efficiency of SMAR. It is also interesting to note that the SMAR algorithm is particularly well suited for Deep Learning due to its architecture.

APPENDIX A

Derivation of the general equation of δ direct methods

A1. Definition and peak analysis of the δM Fourier synthesis

The term δM is defined in the unit cell by

[{\delta}_{M} \left ( {\bf r}, {\Phi}_{\rm T} \right ) = {{c} \over {V}} \sum \limits_{k} \left ( \left | {E}_{k} \right | - \langle \left | E \right | \rangle \right ) \exp{(i {\varphi}_{k})} \exp{(-i 2\pi {\bf k} {\bf r})} , \eqno (31)]

with {φ} = ΦT and with c being an appropriate scaling constant. It can be reformulated by making use of the δM = δP/2 equality (Rius, 2012a), wherein δP is the Fourier synthesis which differs from (31) only in that (|Ek| − 〈|E|〉) is replaced by (|Ek|² − 〈|E|²〉). Consequently, (31) can be written as

[{\delta}_{M} \left ( {\bf r} \right ) = {{c} \over {2V}} \sum \limits_{k} {{\left ( \left | {E}_{k} \right |^{2} - \langle \left | E \right |^{2} \rangle \right )} \over {\left | {E}_{k} \right |}} \left | {E}_{k} \right | \exp{(i {\varphi}_{k})} \exp{(-i 2\pi {\bf k} {\bf r})} . \eqno (32)]

By expressing (|Ek|² − 〈|E|²〉) and [\left | {E_k} \right | \exp{(i {\varphi _k})}] in terms of atom positions, it follows that

[\eqalignno{ & {\delta}_{M} \left ({\bf r} \right) = \cr & {{c} \over {2V}} \sum \limits_{k} {{1} \over {\left | {E}_{k} \right |}} \sum \limits_{j} {\hat f}_{j} \sum \limits_{l}\sum \limits_{m(\ne l)}{\hat f}_{l} \, {\hat f}_{m} \exp{\left [ i2\pi {\bf k}\left({{\bf r}}_{j}+{{\bf r}}_{l}-{{\bf r}}_{m}-{\bf r}\right) \right ]} , \cr &&(33)}]

with [{\hat f}_j] denoting the normalized scattering factor of atom j. As can be derived from (33), δM has N³ − N² peaks at r = rj + rl − rm (l ≠ m), since [\exp{[i 2\pi {\bf k} ({\bf r}_j + {\bf r}_l - {\bf r}_m - {\bf r})]}] is then unity for all k. Of interest are the values of δM at the N atomic positions, i.e. at r = rl with l = 1, …, N. It can easily be verified that there are N − 1 superposed peaks contributing to δM(rl), i.e. those that satisfy the equation rl = rj + rl − rm with l ≠ m and j = m, e.g. for N = 3 and r2 these are r1 + r2 − r1 and r3 + r2 − r3. To estimate the total strength of the δM peak at r = rl, expression (33) is rearranged into

[\eqalignno{{\delta}_{M} \left ( {\bf r} \right ) = & \, {{c} \over{2V}} \sum \limits_{l} {\hat f}_{l} \sum \limits_{k} \exp{\left [ i 2\pi {\bf k} \left ( {\bf r}_{l} - {\bf r} \right ) \right ]} \cr & \, \times \sum \limits_{m(\ne l)} {\hat f}_{m} \exp{\left ( -i 2\pi {\bf k} {\bf r}_{m} \right )} \sum \limits_{j} {{1} \over {\left | {E}_{k} \right |}} {\hat f}_{j} \exp{ \left ( i 2\pi {\bf k} {\bf r}_{j} \right )} , \cr && (34)}]

with

[\sum \limits_{j} {{1} \over {\left | {E}_{k} \right |}} {\hat f}_{j} \exp{(i 2\pi {\bf k} {\bf r}_{j})} = \exp{(i {\varphi}_{k})} , \eqno(35)]

[\sum \limits_{m(\ne l)} {\hat f}_{m} \exp{(-i 2\pi {\bf k} {\bf r}_{m})} = \left | {E}_{k} \right | \exp{(-i {\varphi}_{k})} - {\hat f}_{l} \exp{(-i 2\pi {\bf k} {\bf r}_{l})} . \eqno(36)]

For r = rl, the [\exp{\left [ i 2\pi {\bf k} \left ( {\bf r}_l - {\bf r} \right ) \right ]}] term in (34) becomes unity. In addition, if (35) and (36) are considered, expression (34) can be further simplified to

[\eqalignno{ {\delta}_{M} \left ({\bf r}_{l} \right ) = & \, {{c} \over {2V}} \, {\hat f}_{l} \sum \limits_{k} \left [ \left | {E}_{k} \right | - {\hat f}_{l} \exp{(i {\varphi}_{k})} \exp{(-i 2\pi {\bf k} {\bf r}_{l})} \right ] &(37) \cr = & \, {{c} \over {2}} \left [ \left ( {{{\hat f}_{l} {N}_{k}} \over {V}} \right ) \langle \left | E \right | \rangle_{k} - {\hat f}_{l} \rho \left ( {\bf r}_{l} \right ) \right ] , &(38)}]

since [\sum \nolimits_k |{E_k}|] = 〈|E|〉kNk and, for an equal atom structure, it holds that

[{{1} \over {V}} \sum \limits_{k} {\hat f}_{l} \exp{(i {\varphi}_{k})} \exp{(-i 2\pi {\bf k} {\bf r}_{l})} = \rho \left ( {\bf r}_{l} \right ) . \eqno (39)]

The strength of an atomic peak in ρ is [{\hat f}_l {N_k}/V], so (38) can be simplified to

[{\delta}_{M} \left ( {\bf r}_{l} \right ) = c {{{\langle \left | E \right | \rangle}_{k} - (1/{\sqrt N})} \over {2}} \rho \left ( {\bf r}_{l} \right) \eqno (40)]

by considering [{\hat f}_l] = [1/{\sqrt N}]. Finally, by making

[c = 2/\left [ \langle \left | E \right | \rangle_{k} - (1/{\sqrt N}) \right ] , \eqno (41)]

the strength of δM(rl) is equal to ρ(rl) in (40). The δM peaks placed at atomic positions compose the set of type A peaks.
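As a small numerical illustration of (41), the sketch below evaluates the scaling constant c for an equal-atom structure with N atoms. It assumes the acentric expectation 〈|E|〉 = √π/2 ≃ 0.886; the function name `scaling_constant` and the chosen values of N are hypothetical.

```python
import math

# <|E|> for the acentric distribution of normalized structure factors
MEAN_E_ACENTRIC = math.sqrt(math.pi) / 2.0

def scaling_constant(n_atoms, mean_e=MEAN_E_ACENTRIC):
    """Scaling constant c of (41) for an equal-atom structure."""
    return 2.0 / (mean_e - 1.0 / math.sqrt(n_atoms))

def peak_factor(n_atoms, mean_e=MEAN_E_ACENTRIC):
    """Factor multiplying rho(r_l) in (40); equals 1 when c follows (41)."""
    c = scaling_constant(n_atoms, mean_e)
    return c * (mean_e - 1.0 / math.sqrt(n_atoms)) / 2.0

c_small = scaling_constant(100)       # small structure
c_large = scaling_constant(10_000)    # larger structure
c_limit = 2.0 / MEAN_E_ACENTRIC       # limit for N -> infinity
```

The constant decreases towards 2/〈|E|〉 as N grows, which is the value used for the estimate of the integral in Appendix B.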

Let us consider the remaining δM peaks at r = rj + rl − rm with l ≠ m and j ≠ m, which form the set of type B peaks. For a given B peak, the corresponding r position is obtained by adding the interatomic vector rj − rm (j ≠ m) to the atomic position vector rl, so the superposition of r with an atomic position is accidental. Consequently, the strength of a (single) B-type peak, e.g. at rjlm, is weaker than that of a δM(rl) peak (formed by the superposition of N − 1 single peaks). According to (40), it holds that

[{\rm strength \ of \ } {\delta_M} \left ( {\bf r}_{jlm} \right ) \approx {\rm strength \ of \ } \left [ \rho \left ( {\bf r}_{l} \right )/\left ( N - 1 \right ) \right ] . \eqno (42)]
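The peak bookkeeping of this section can be verified by brute-force enumeration of the index triples (j, l, m) in (33). The sketch below counts all terms with l ≠ m, those that fall on atomic positions (j = m, type A) and the remaining ones (j ≠ m, type B); the function name `count_delta_m_terms` is hypothetical.

```python
from itertools import product

def count_delta_m_terms(n):
    """Count the (j, l, m) terms of (33) with l != m, split into A and B types."""
    total = type_a = type_b = 0
    for j, l, m in product(range(n), repeat=3):
        if l == m:
            continue              # excluded by the m != l condition in (33)
        total += 1
        if j == m:
            type_a += 1           # r_j + r_l - r_m = r_l: atomic position
        else:
            type_b += 1           # general position r_j + r_l - r_m
    return total, type_a, type_b

counts_3 = count_delta_m_terms(3)
counts_5 = count_delta_m_terms(5)
```

For N = 3 this gives 18 terms in total, 6 of type A (N − 1 = 2 per atom, as in the example above) and 12 of type B.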

A2. The general equation for δ(M) direct methods

According to the above results, δM(ΦT) contains two types of positive peaks (A and B). Table 2 lists their peak strengths and positions. The stronger peaks of type A correspond to ρ and are resolved in the Fourier map. The N − 1 times weaker peaks of type B are located at positions other than the atomic positions (coincidence is accidental). Due to their large number, i.e. N(N − 1)² [for comparison, there are only N(N − 1) non-origin peaks in the modulus function], these peaks must have a strong overlap in the unit cell. The peaks of type B are the main constituents of the g function. Consequently, δM can be considered as the sum of both contributions, i.e.

[{\delta}_{M} \left ( {\bf r} \right ) = \rho \left ( {\bf r} \right ) + g \left ( {\bf r} \right ) \quad \forall \ {\bf r} \subset V , \eqno(43)]

which is called the general equation of δ(M) direct methods. In the following, the subscript M is left out of the equation name for the sake of generality. (Note that the g function could also contain contributions from even weaker unconsidered peaks.)

A3. The δM Fourier synthesis expressed with the signs of the |E| − 〈|E|〉 differences in the phase terms

In general, the Fourier coefficients of δM and ρ have different moduli and may differ in phase values. This is not obvious when comparing (2) and (4) because in both expressions the corresponding Fourier terms have the same φ phase values (which is quite useful for programming). However, when the absolute values ||E| − 〈|E|〉| are used in the synthesis, a phase shift Δφ must be added to each φ to take into account the sign. Consequently, while the Fourier coefficients of ρ are |E|exp(iφ), those of δM are ||E| − 〈|E|〉|exp[i(φ + Δφ)], with Δφ = 0 for |E| > 〈|E|〉 (strong reflections) and Δφ = π for |E| < 〈|E|〉 (weak reflections).
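The equivalence of the two ways of writing a δM Fourier coefficient can be checked numerically. The sketch below builds the coefficient once with the signed difference |E| − 〈|E|〉 and once with its absolute value plus the phase shift Δφ; the function name `delta_m_coefficient`, the positive value of c and the sample numbers are assumptions for illustration.

```python
import cmath
import math

def delta_m_coefficient(e_mod, mean_e, phi, c=1.0):
    """Return the delta_M coefficient in signed and in modulus-phase form.

    c is assumed positive, so the sign is carried by (|E| - <|E|>) alone.
    """
    signed = c * (e_mod - mean_e) * cmath.exp(1j * phi)
    shift = 0.0 if e_mod > mean_e else math.pi   # strong vs weak reflection
    modulus_phase = abs(c * (e_mod - mean_e)) * cmath.exp(1j * (phi + shift))
    return signed, modulus_phase

strong = delta_m_coefficient(1.6, 0.886, 0.7, c=2.25)
weak = delta_m_coefficient(0.4, 0.886, 0.7, c=2.25)
```

Both forms give the same complex number, which is why the signed form with unchanged φ is convenient in practice.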

APPENDIX B

Solution of the integral [I_{g^2} = \int \nolimits_{V} g^{2} \, {\rm d}V]

Assuming that the integrand of [I_{g^2}] is g² = [δM(ΦT) − ρ(ΦT)]², squaring the binomial gives

[I_{g^2} = \int \limits_{V} \left [ {\delta}_{M}^{2} \left ( {\Phi}_{\rm T} \right ) + {\rho}^{2} \left ( {\Phi}_{\rm T} \right ) - 2{\delta}_{M} \left ( {\Phi}_{\rm T} \right ) \, \rho \left ( {\Phi}_{\rm T} \right ) \right ] \, {\rm d}V . \eqno (44)]

The integral of the first two summands in (44) when expressed in terms of the respective Fourier coefficients is

[\eqalignno{ \int \limits_{V} \left ( {\delta}_{M}^{2} + {\rho}^{2} \right ) \, {\rm d}V = & \, {{1} \over {V}} \sum \limits_{k} \left \{ \left [ c \left ( \left | {E}_{k} \right | - \langle \left | E \right | \rangle \right ) \right ]^{2} + \left | {E}_{k} \right |^{2} \right \} \cr = & \, {{1} \over {V}} \left [ \left ( {c}^{2} + 1 \right ) \sum \limits_{k} \left | {E}_{k} \right |^{2} - {c}^{2} {N}_{k} \langle \left | E \right | \rangle^{2} \right ] . &(45)}]

If one proceeds analogously with the third summand in (44), it follows that

[\eqalignno{ & -2 \int \limits_{V} {\delta}_{M} \left ( {\Phi}_{\rm T} \right ) \rho \left ( {\Phi}_{\rm T} \right ) \, {\rm d}V = \cr & \quad \quad {{-2c} \over {V^2}} \sum \limits_{k} \sum \limits_{k^{\prime}} \left ( \left | {E}_{k} \right | - \langle \left | E \right | \rangle \right ) \left | {E}_{k^{\prime}} \right | \exp{\left [ i \left ( {\varphi}_{k} + {\varphi}_{k^{\prime}} \right ) \right ]} \cr & \quad \quad \times \int \limits_{V} \exp{\left [ -i 2\pi \left ( k + k^{\prime} \right ) {\bf r} \right ]} \, {\rm d} {\bf r} . &(46)}]

This means that the integral is zero except for k′ = −k where it becomes V, so that the final term in (46) can be approximated by

[{{-2c} \over {V}} \sum \limits_{k} \left ( \left | {E}_{k} \right | - \langle \left | E \right | \rangle \right ) \left | {E}_{k} \right | \exp{\left [ i \left ( {\varphi}_{k} - {\varphi}_{k} \right ) \right ]} .\eqno(47)]

The values of the exponentials in (47) are 1, so that it reduces to

[ {{-2c} \over {V}} \sum \limits_{k} \left ( \left | {E}_{k} \right | - \langle \left | E \right | \rangle \right ) \left | {E}_{k} \right | .\eqno(48)]

By adding (45) and (48) and normalizing by [\int \nolimits_{V} {\rho}^{2} \, {\rm d}V] = [{N}_{k} \langle \left | E \right |^{2} \rangle_{k}/V] = Nk/V, the final expression of the integral [I_{g^2}] is obtained, namely

[I_{g^2} = \left ( c - 1 \right )^{2} \langle \left | E \right |^{2} \rangle_{k} - c \left ( c - 2 \right ) \, \left ( \langle \left | E \right | \rangle_{k} \right )^{2} . \eqno (49)]

From the theory of the distribution of |E| values, the values of 〈|E|²〉, 〈|E|〉 (acentric case) and c ≃ 2/〈|E|〉 can be derived, i.e. 1.00, 0.89 and 2.25, respectively, so that (49) gives [I_{g^2} \simeq 1.12]. Note that [I_{g^2}] only depends on |E| and |E| − 〈|E|〉, since the phases cancel out. Consequently, the values of [I_{g^2}] for g² equal to [δM(ΦT) − ρ(ΦT)]² and to [δM(χT) − ρ(χT)]² are identical.
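The numerical value quoted for (49) can be reproduced in a few lines. The sketch below evaluates (49) with the rounded values given above and, as a cross-check, compares the same expression with a direct average of the squared Fourier coefficients of g, taken here as c(|E| − 〈|E|〉) − |E|, over a small hypothetical set of |E| values. The function name `i_g2` and the sample list are assumptions; the direct average is not divided by 〈|E|²〉.

```python
def i_g2(mean_e2, mean_e, c):
    """Expression (49): (c - 1)^2 <|E|^2> - c (c - 2) <|E|>^2."""
    return (c - 1.0) ** 2 * mean_e2 - c * (c - 2.0) * mean_e ** 2

# Value with the rounded acentric averages quoted in the text
value = i_g2(1.00, 0.89, 2.25)

# Cross-check on a hypothetical sample of |E| values
e_values = [0.2, 0.5, 0.9, 1.3, 1.6]
c = 2.25
mean_e = sum(e_values) / len(e_values)
mean_e2 = sum(e * e for e in e_values) / len(e_values)

# Each coefficient of g is c(|E| - <|E|>) - |E|; the phases cancel in g^2.
direct = sum((c * (e - mean_e) - e) ** 2 for e in e_values) / len(e_values)
from_formula = i_g2(mean_e2, mean_e, c)
```

The first evaluation gives a value close to 1.12, and the direct average agrees with the closed expression for the sample, which illustrates that only the moduli enter the result.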


Acknowledgements

The author thanks Dr Xavier Torrelles and Professor Salvador Galí for valuable suggestions and the reviewers for their constructive criticisms.

Funding information

The following funding is acknowledged: Agencia Estatal de Investigación (grant Nos. PID2021-124734OB-C22 and CEX2023-001263-S). The author acknowledges the support of the publication fee by the CSIC Open Access Publication Support Initiative through its Unit of Information Resources for Research (URICI).

References

Antel, J., Sheldrick, G. M., Bats, J. W., Kessler, H. & Müller, A. (1995). Unpublished work.
Burla, M. C., Caliandro, R., Giacovazzo, C. & Polidori, G. (2010). Acta Cryst. A66, 347–361.
Capitani, G. C., Mugnaioli, E., Rius, J., Gentile, P., Catelani, T., Lucotti, A. & Kolb, U. (2014). Am. Mineral. 99, 500–510.
Cochran, W. (1952). Acta Cryst. 5, 65–67.
Foadi, J., Woolfson, M. M., Dodson, E. J., Wilson, K. S., Jia-xing, Y. & Chao-de, Z. (2000). Acta Cryst. D56, 1137–1147.
Lipson, H. & Cochran, W. (1966). The Determination of Crystalline Structures. The Crystalline State, Vol. III, edited by W. L. Bragg, pp. 324–325. London: G. Bell and Sons Ltd.
Main, P. (1975). Crystallographic Computing Techniques, edited by F. R. Ahmed, p. 99. Copenhagen: Munksgaard.
Miao, J., Kirz, J. & Sayre, D. (2000). Acta Cryst. D56, 1312–1315.
Miller, R., DeTitta, G. T., Jones, R., Langs, D. A., Weeks, C. M. & Hauptman, H. A. (1993). Science, 259, 1430–1433.
Oliver, J. D. & Strickland, L. C. (1984). Acta Cryst. C40, 820–824.
Privé, G., Anderson, D. H., Wesson, L., Cascio, D. & Eisenberg, D. (1999). Protein Sci. 8, 1400–1409.
Rius, J. (1993). Acta Cryst. A49, 406–409.
Rius, J. (2011). XLENS_v1: A Computer Program for Solving Crystal Structures from Diffraction Data by Direct Methods. Institut de Ciència de Materials de Barcelona, CSIC, Spain.
Rius, J. (2012a). Acta Cryst. A68, 77–81.
Rius, J. (2012b). Acta Cryst. A68, 399–400.
Rius, J. (2014). IUCrJ, 1, 291–304.
Rius, J. (2020). Acta Cryst. A76, 489–493.
Rius, J., Crespi, A. & Torrelles, X. (2007). Acta Cryst. A63, 131–134.
Rius, J., Mugnaioli, E., Vallcorba, O. & Kolb, U. (2013). Acta Cryst. A69, 396–407.
Rius, J., Sañé, J., Miravitlles, C., Amigó, J. M. & Reventós, M. M. (1995). Acta Cryst. A51, 268–270.
Rius, J. & Torrelles, X. (2021). Acta Cryst. A77, 339–347.
Rius, J. & Torrelles, X. (2022). Acta Cryst. A78, 473–481.
Rius, J., Vallcorba, O., Crespi, A. & Colombo, F. (2017). Z. Kristallogr. 232, 827–834.
Sayre, D. (1952). Acta Cryst. 5, 60–65.
Schäfer, M., Sheldrick, G. M., Bahner, I. & Lackner, H. (1998). Angew. Chem. Int. Ed. 37, 2381–2384.
Shannon, C. E. (1949). Proc. Inst. Radio Eng. 37, 10–21.
Sheldrick, G. M. (2002). Z. Kristallogr. 217, 644–650.
Sheldrick, G. M. (2015). Acta Cryst. A71, 3–8.
Sheldrick, G. M., Hauptman, H. A., Weeks, C. M., Miller & Usón, I. (2012). International Tables for Crystallography, Vol. F, Crystallography of Biological Macromolecules, edited by E. Arnold, D. M. Himmel & M. G. Rossmann, ch. 16.1, pp. 413–442. Chichester: Wiley.
Weeks, C. M., DeTitta, G. T., Miller, R. & Hauptman, H. A. (1993). Acta Cryst. D49, 179–181.
Zachariasen, W. H. (1952). Acta Cryst. 5, 68–73.

This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.
