The general equation of δ direct methods and the novel SMAR algorithm residuals using the absolute value of ρ and the zero conversion of negative ripples

Rius, J.

doi:10.1107/S2053273324009628

research papers

FOUNDATIONS
ADVANCES

ISSN: 2053-2733

Volume 81| Part 1| January 2025| Pages 16-25

https://doi.org/10.1107/S2053273324009628

Open access


Jordi Rius ^a ^*

^aInstitut de Ciència de Materials de Barcelona (CSIC), Campus de la UAB, 08193 Bellaterra, Catalonia, Spain
^*Correspondence e-mail: [email protected]

Edited by T. E. Gorelik, Helmholtz Centre for Infection Research, Germany (Received 28 May 2024; accepted 30 September 2024)

This paper is dedicated to the memory of Professor Carles Miravitlles.

The general equation δ_M(r) = ρ(r) + g(r) of the δ direct methods (δ-GEQ) is established which, when expressed in the form δ_M(r) − ρ(r) = g(r), is used in the SMAR phasing algorithm [Rius (2020). Acta Cryst. A76, 489–493]. It is shown that SMAR is based on the alternating minimization of the two residuals R_ρ(χ) = ∫_V [ρ(χ) − ρ(Φ)s_ρ]² dV and R_δ(Φ) = ∫_V m_ρ[δ_M(χ) − ρ(Φ)s_ρ]² dV in each iteration of the algorithm by maximizing the respective S_ρ(Φ) and S_δ(Φ) sum functions. While R_ρ(χ) converges to zero, R_δ(Φ) converges, as predicted by the theory, to a positive quantity. These two independent residuals combine δ_M and ρ each with |ρ| while keeping the same unknowns, leading to overdetermination for diffraction data extending to atomic resolution. At the beginning of a SMAR phase refinement, the zero part of the m_ρ mask [resulting from the zero conversion of the slightly negative ρ(Φ) values] occupies ∼50% of the unit-cell volume and increases by ∼5% when convergence is reached. The effects on the residuals of the two SMAR phase refinement modes, i.e. only using density functions (slow mode) or supplemented by atomic constraints (fast mode), are discussed in detail. Due to its architecture, the SMAR algorithm is particularly well suited for Deep Learning. Another way of using δ-GEQ is by solving it in the form ρ(r) = δ_M(r) − g(r), which provides a simple new derivation of the already known δ_M tangent formula, the core of the δ recycling phasing algorithm [Rius (2012). Acta Cryst. A68, 399–400]. The nomenclature used here is: (i) Φ is the set of φ structure factor phases of ρ to be refined; (ii) δ_M(χ) = FT⁻¹{c(|E| − 〈|E|〉)×exp(iα)} with χ = {α}, the set of phases of |ρ| and c = scaling constant; (iii) m_ρ = mask, being either 0 or 1; (iv) s_ρ is 1 or −1 depending on whether ρ(Φ) is positive or negative.

Keywords: SMAR phasing algorithm; δ-GEQ; δ direct methods; SMAR residuals; ipp density modification; crystal structure solution; origin-free modulus sum function; δ recycling algorithm; δ_M tangent formula.

1. Introduction

Historically, direct methods were developed to solve small crystal structures directly from high-resolution single-crystal diffraction data. From their origins in the 1950s (Sayre, 1952; Cochran, 1952; Zachariasen, 1952), they have seen continuous advances over the years, driven not only by the steady increase in computing power, but also by clever algorithms and efficient implementations. Computing power continues to increase but the solution of larger structures at lower resolution is still hindered in the case of equal atoms and requires the use of ubiquitous model fragments and density modification. Where direct methods are applied successfully, the computing time is in any case shorter than the experimental effort involved in X-ray structure determination.

One of the most recent advances in direct methods has been the |ρ|-based algorithm (SMAR) maximizing the S_M,|ρ| (= S_MAR) sum function (Rius, 2020). It corresponds to the latest stage in the evolution of the S_M origin-free modulus sum function (originally named Z_R; Rius, 1993) which still involved triple-phase structure invariants. Even today, Z_R is the simplest and certainly one of the most successful direct methods working exclusively in reciprocal space (Rius et al., 1995). However, it was almost immediately superseded by the Shake & Bake strategy alternating between reciprocal- and real-space refinements (dual-space recycling methods) which allowed the solution of larger crystal structures (Weeks et al., 1993; Miller et al., 1993). Dual-space methods do not eliminate phase relationships in reciprocal space but complement them with peak picking in real space as an extreme form of density modification. A comprehensive description at the height of development of dual-space recycling methods can be found in the International Tables for Crystallography, Vol. F (Sheldrick et al., 2012).

Following this trend, triple-phase invariants (whose number becomes exceedingly large for large structures) were replaced in S_M by the more efficient and accurate Fourier transforms (FT) (Rius et al., 2007; Rius, 2014). More recently, the ρ² function in S_M was replaced by the mathematically simpler |ρ| function and a new mask was introduced that takes the negative values of ρ into account. Both changes have led to the SMAR phasing algorithm (Rius, 2020). Important aspects of its application have been covered in two recent publications, the first dealing with its extension to larger crystal structures by introducing the fast inner-pixel preservation procedure (ipp) for density modification (Rius & Torrelles, 2021) and using initial phase values derived from the modulus function. In the second publication SMAR was adapted to the solution of anomalous scattering substructures in protein crystals (Rius & Torrelles, 2022). The present short introduction is intended to provide a brief overview of the 30-year development of that particular type of direct methods which shares the implicit or explicit use of the ρ ≃ δ_M (or δ_P) approximation (Rius, 2012a). To distinguish this family of direct methods from the rest – and also to help in their identification – the general term 'δ direct methods' is coined in this publication.

The aim of this article is to complete the theoretical foundations of the SMAR phasing algorithm. The algorithm described by Rius (2020) and shown in Fig. 1 essentially consists of the iterative application of the phasing formula

$[\varphi_{k}^{\rm new} = {\rm phase \ of} \int_{V} \delta_{M} ({\bf r}) \, m_{\rho, t}^{\prime} ({\bf r}) \exp{(i 2\pi {\bf k} {\bf r})} \, {\rm d}{\bf r} , \eqno(1)]$

where (i) φ_k is the phase of the kth structure factor of ρ and belongs to the set Φ of phases to be refined; (ii) δ_M(χ) is equal to $[{\rm FT}^{-1} \left \{ c \left ( |E| - \langle |E| \rangle \right ) \exp(i\alpha) \right \}]$ with χ = {α} being the set of phases of |ρ| and c a scaling constant; (iii) r is a position vector inside V, the unit-cell volume; and (iv) $[m_{\rho, t}^{\prime}]$ is a mask function defined in Table 1 that can take the values 1, 0 or −1. The representative test case shown in Table 1 (t = 2.5) indicates that the zero part in the $[m_{\rho, t}^{\prime}]$ mask is around 50% at the beginning of a phase refinement with initially random phase values; as the refinement converges, the zero part increases by up to 5%. Most of the remaining part of the mask is taken up by ones, as the proportion of −1 values is kept very small (<1.0%).

Table 1
Values of the m_ρ(r) mask and the s_ρ(r) sign functions obtained from ρ(r, Φ) and for t = 2.5 with $[\sigma_\rho^2]$ being the (phase-independent) variance of ρ(r) and $[m_\rho^{\prime}]$ = s_ρm_ρ (Rius, 2020)

The meanings of COREs, SPZs and SNZs are explained in Section 2.1. Columns 6 and 7 give the mask compositions in % at the first and last iterations of a SMAR phase refinement reaching convergence (slow convergence mode) using the diffraction data for Actinomycin Z3 (Schäfer et al., 1998).

Condition	Corresponds to	$[m_\rho^{\prime}]$	m_ρ	s_ρ	First	Last
ρ(r, Φ) > 0	COREs and SPZs	1	1	1	50.0	45.3
0 ≥ ρ(r, Φ) > −tσ_ρ	SNZs	0	0	−1	49.4	54.5
ρ(r, Φ) ≤ −tσ_ρ	Very negative values	−1	1	−1	0.62	0.23
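
The three conditions of Table 1 can be written down compactly. The following is a minimal NumPy sketch, not the XLENS_v1 implementation: the function names `smar_masks` and `mask_composition`, the optional `sigma` argument and the use of the sampled map's standard deviation as σ_ρ are assumptions made for illustration only.

```python
import numpy as np

def smar_masks(rho, t=2.5, sigma=None):
    """Return (m_rho, s_rho, m_prime) for a sampled rho map, following Table 1."""
    if sigma is None:
        # Assumption: sigma_rho is estimated from the sampled map itself.
        sigma = rho.std()
    # s_rho: +1 where rho is positive, -1 otherwise (rho = 0 counts as SNZ).
    s_rho = np.where(rho > 0, 1, -1)
    # m_rho: 0 inside the slightly negative zones (0 >= rho > -t*sigma), 1 elsewhere.
    m_rho = np.where((rho <= 0) & (rho > -t * sigma), 0, 1)
    # m'_rho = s_rho * m_rho takes the values 1, 0 or -1.
    return m_rho, s_rho, s_rho * m_rho

def mask_composition(m_prime):
    """Percentages of grid points with m'_rho equal to 1, 0 and -1."""
    n = m_prime.size
    return tuple(100.0 * np.count_nonzero(m_prime == v) / n for v in (1, 0, -1))

# Invented sample values, with sigma fixed to 1 so that the threshold is -2.5.
rho = np.array([3.0, 0.5, -0.1, -0.4, -3.0])
m_rho, s_rho, m_prime = smar_masks(rho, t=2.5, sigma=1.0)
```

For a real map, `mask_composition` gives the kind of percentages listed in the last two columns of Table 1.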

Figure 1
The iterative SMAR phasing algorithm in four stages. (Upper right-hand corner) Initial (or updated) φ phase estimates belonging to set Φ are combined with observed |E| values to obtain ρ and ρ(Φ)s_ρ = |ρ(Φ)| [the superscript ¹⁾ indicates that ρ is stored]. (Upper left) The FT of |ρ(Φ)| is calculated to get the new set χ of α phases as well as the calculated |ξ| values. (Lower left corner) The new α values are combined with the experimental ΔE = |E| − 〈|E|〉 and Fourier transformed to obtain δ_M(χ). The m_ρ mask and the s_ρ signs are derived from the stored ρ(Φ), and the δ_M(χ)s_ρm_ρ = ρ′ product is carried out. (Lower right) The FT of ρ′ supplies the updated φ phases [the superscript ²⁾ indicates that when ipp is applied, ρ′ is further modified to ρ′′ before the FT operation] as well as the calculated $[\left| {\cal E} \right|]$ values.

The SMAR phasing algorithm was originally derived from the S_MAR sum function. A sum function such as S_MAR generally corresponds to the mixed integral of a residual as e.g. in the case of expression (20) in relation to (21) in this article. Only when the residual is known can the derivation of the phasing algorithm be considered complete, which leads to a better understanding of it and enables, for example, the estimation of the minimum value of the residual. In this context, the use of the δ_M ≃ ρ approximation in Rius (2020) represented a limitation. To overcome this, the relationship between δ_M and ρ is worked out in Appendix A, resulting in δ-GEQ which, when modified accordingly, leads to one of the two desired SMAR residuals. The derivation of the second residual is simpler and introduces the phases corresponding to |ρ(Φ)| in the algorithm. To increase the readability of this article, δ-GEQ is derived separately in Appendix A and a summary thereof given in Section 2.2.

To complete this introduction, it is interesting to mention, particularly for newcomers, that density modification in the context of direct methods was efficiently introduced in ACORN (Foadi et al., 2000). ACORN and SHELXE (Sheldrick, 2002) both use density sharpening and negative density elimination. Later, in the VLD algorithm, a difference and a flipping term were combined (Burla et al., 2010). Finally, another related and more modern development was SHELXT, which is more broadly associated with the charge-flipping algorithm but combines it with direct methods and with a density modification in which part of the peak positions is eliminated at random (Sheldrick, 2015). One distinctive feature of the density modification in SMAR is the zero conversion of only slightly negative densities and the preservation of the inner peak pixels (Rius & Torrelles, 2021).

All calculations in this article have been performed with a modified version of the XLENS_v1 code [Rius, J. (2011). XLENS_v1: A Computer Program for Solving Crystal Structures from Diffraction Data by Direct Methods. Institut de Ciència de Materials de Barcelona, CSIC, Spain]. The diffraction data used in the test calculations correspond to:

(i) Actinomycin Z3 with 1228 (C, N, O) + 8 Cl atoms in the unit cell. According to the refinement protocol in the Protein Data Bank (PDB code 1a7z), 4 Cl sites are partially occupied and the other 4 Cl atoms have a rather large B value, so that their scattering powers are considerably reduced. The minimum d spacing (d_min) is 0.95 Å; a = 14.803, b = 24.780 and c = 65.059 Å, space group P2₁2₁2₁ (Schäfer et al., 1998).

(ii) Alpha1 peptide with 503 (C, N, O) + 1 Cl. d_min = 0.90 Å; a = 20.846, b = 20.909 and c = 27.057 Å, α = 102.40, β = 95.33 and γ = 119.62°, P1 (Privé et al., 1999).

(iii) Pep1 with 344 (C, N, O). d_min = 1.00 Å; a = 13.999, b = 21.602 and c = 21.615 Å, P2₁2₁2₁ (Antel et al., 1995).

(iv) Suoa with 188 (C, N, O). d_min = 1.00 Å; a = 18.350, b = 21.441 and c = 8.350 Å, P2₁2₁2₁ (Oliver & Strickland, 1984).

2. Basic elements of the SMAR algorithm

2.1. The ρ Fourier synthesis: its mask definition and general relationship to |ρ|

When solving crystal structures by direct methods from atomic resolution X-ray diffraction intensity data, the electron-density function is normally calculated with the Fourier synthesis

$[\rho \left ( {\bf r}, \Phi_{\rm T} \right ) = {{1} \over {V}} \sum_{k} \left | {E}_{k} \right | \exp{(i \varphi_{k})} \exp{(-i 2\pi {\bf k} {\bf r})} , \eqno(2)]$

where |E_k| is the modulus and φ_k is the phase value of the (quasi)-normalized structure factor of the kth reflection (Main, 1975). The moduli and phases of all reflections form the {|E|} and Φ = {φ} sets, respectively. For clarity, the phase type (and, where applicable, the modulus type) used in the Fourier synthesis is added to the function name when required to avoid confusion, e.g. ρ(r, Φ_T) specifies that ρ(r) is calculated with Φ_T (the index T stands for true phase values).

Expression (2) is stated in terms of the normalized scattering factors, i.e. $[{\hat f}_j]$ = $[{f_j}/{\sqrt {\sum \nolimits_m f_m^2}}]$ for an arbitrary atom j (with f_j being its normal scattering factor including the Debye–Waller factor). For the sake of simplicity, a crystal structure with N equal atoms in the unit cell is assumed throughout, so that the normalized scattering factor reduces to $[\hat f]$ = $[1/\sqrt N]$ .

Due to the limited number of Fourier terms (a consequence of the finite number of measured intensities), the ρ(r, Φ_T) synthesis is affected by Fourier series truncation effects. These termination effects are mainly reflected in the broadening of the atomic peaks, each consisting of a large spherically symmetric positive central part (= CORE) surrounded by negative and positive waves (ripples) that decrease as one moves away from the peak center. For point-like atoms (for which broadening due to scattering factors and thermal vibration effects are largely removed), the limiting spherical surface of the CORE lies ∼0.72 × d_min (Å) from the peak center [it corresponds to the first zero of the T₃ spreading function in Lipson & Cochran (1966)]. For locations lying outside neighboring COREs, the ripple contributions of neighboring atoms add up and result in slightly negative and slightly positive zones (SNZs and SPZs, respectively). The ρ(r, Φ) values contained in the interval [0, −tσ_ρ] make up the SNZs (σ_ρ and t are defined in Table 1). Since the ρ distribution in the crystal is positive definite, the probability that the SNZs accommodate the COREs of atomic peaks is zero, so this information can be introduced in the form of an m_ρ,t mask which is 0 for all negative ρ in the [0, −tσ_ρ] interval (negative ripple conversion to zero) and 1 elsewhere. As shown in Table 1, the mask value depends on ρ(r) and the t parameter (since t is always 2.5 in this work, it is dropped from m_ρ,t for notational simplicity and the mask is written simply as m_ρ). Table 1 also shows that the zero part of the mask extends to at least 50% of the unit-cell volume for t ≃ 2.5 (just for comparison, for t ≥ 10 the threshold is so low that the mask values of all negative ρ values become zero).

The relationship between the syntheses |ρ(Φ)| and ρ(Φ) is given by the equality

$[\left | \rho ({\bf r}, \Phi) \right | = \rho ({\bf r}, \Phi) \, s_{\rho} ({\bf r}) \quad \forall \ {\bf r} \subset V , \eqno(3)]$

in which $[s_\rho (r)]$ is 1 or −1 depending on whether ρ(r, Φ) is positive or negative. This equality is completely general, so the product ρ(r, Φ)s_ρ(r) can always replace |ρ(r, Φ)| in the derivation of the residuals. Finally, since ρ is positive definite and diffraction data are assumed to reach atomic resolution, it is clear for Φ = Φ_T that the negative $[s_\rho ({\bf r})]$ values in (3) are always associated with small ρ(r, Φ_T), so that (3) becomes ρ(r, Φ_T) ≅ |ρ(r, Φ_T)| $[ \forall \ {\bf r} \subset V]$. If we denote the phase set of the Fourier coefficients of the |ρ(r, Φ_T)| synthesis by χ_T, then Φ_T ≅ χ_T.

2.2. The general equation of δ direct methods and its different forms

The δ_M Fourier synthesis defined by

$[{\delta}_{M} \left ( {\bf r}, {\Phi}_{\rm T} \right) = {{c} \over {V}} \sum \limits_{k} \left ( \left | {E}_{k} \right | - \langle \left | E \right | \rangle \right ) \exp{(i \varphi_{k})} \exp{(-i 2\pi {\bf k} {\bf r})} , \eqno (4)]$

with c = $[2/(|E| - 1/{\sqrt N})]$ , is studied in detail in Appendix A, showing that it contains two types of positive peak (A and B). Table 2 lists their peak strengths and positions. The stronger peaks of type A correspond to ρ (which are resolved in the Fourier map). The N − 1 times weaker peaks of type B are located at positions other than the atomic positions (coincidence is accidental). These are the main constituents of the function g(r) and, due to their large number, must be severely overlapped in the unit cell. The standard form of the general equation of the δ direct methods (δ-GEQ) corresponds to the sum of both contributions ρ(r) and g(r) [equation (43)]. However, δ-GEQ can be used in other forms. For example it can be solved for ρ, so δ-GEQ then takes the form

$[\rho ({\bf r}) = {\delta_M} ({\bf r}) - g({\bf r}) . \eqno(5)]$

Table 2
Overview of the properties of the two main peak types of the δ_M Fourier synthesis

The peak strength of a ρ(r_l) peak corresponds to fN_ref/V.

Peak type (function)	Peak positions	Peak strengths	Number in unit cell
A → ρ	At r_l atomic positions (l = 1, N)	ρ(r_l) = δ_M(r_l)	N
B → g	At r_jlm = r_j + r_l − r_m with l ≠ m and j ≠ m (j, l, m = 1, N)	g(r_jlm) ≃ strength of {ρ(r_l)/(N − 1)}	N(N − 1)²

One obvious difficulty here is how to handle the unknown g function. This difficulty can be circumvented by introducing the mask m_Δδ (being either 0 or 1), which results from the realizations that (i) δ_M and ρ have their strong peaks at the atomic positions; and (ii) g is formed by the positive strongly overlapped peaks of type B which are much more numerous but also much smaller than the peaks in δ_M and ρ. As shown in Fig. 2, the m_Δδ mask is obtained by expressing the threshold Δδ in terms of the computable σ(δ_M) by means of Δδ = t₁σ(δ_M), with t₁ ≃ 2.5, such that m_Δδ(r) = 1 for δ_M(r) ≥ Δδ and m_Δδ(r) = 0 otherwise. This can be mathematically expressed by

$[\rho \left({\bf r}\right) = K \delta_{M} \left({\bf r}\right) \, m_{\Delta \delta} \left({\bf r}\right) , \eqno (6)]$

with K being a suitable scaling constant. Fourier transforming both sides of (6), and since E and ρ are linked by the Fourier transform E = FT(ρ), the formula

$[E = K \, {\rm FT}\left \{ \delta_{M} m_{\Delta \delta} \right \} \eqno(7)]$

results. The angular part of (7) corresponds to the δ_M tangent formula which forms the core of the δ recycling algorithm (Rius, 2012a,b). It has been successfully applied to X-ray diffraction data from small crystal structures, to 3D electron diffraction data (Rius et al., 2013; Capitani et al., 2014) and, due to its robustness, to synchrotron tts (tts = through the substrate) microdiffraction data (Rius et al., 2017). Some considerations regarding the implementation of the δ_M tangent formula are given in Section A3.
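
The angular part of (7) can be illustrated on a one-dimensional grid. This is a minimal sketch under stated assumptions, not the δ recycling code: the function name `delta_m_tangent_phases`, the toy δ_M map and the use of NumPy's inverse FFT (scaled by the grid size) as the FT with the exp(+i2πkx) sign convention are all choices made here for illustration.

```python
import numpy as np

def delta_m_tangent_phases(delta_m, t1=2.5):
    """Phases of FT{delta_M * m_dd}, with m_dd = 1 where delta_M >= t1 * sigma(delta_M)."""
    threshold = t1 * delta_m.std()                 # threshold expressed via sigma(delta_M)
    mask = (delta_m >= threshold).astype(float)    # binary mask, either 0 or 1
    product = delta_m * mask                       # proportional to rho for equal atoms
    # FT with the exp(+2*pi*i*k*x/n) convention: n * ifft in NumPy.
    coeffs = np.fft.ifft(product) * product.size
    return np.angle(coeffs), mask

# Invented toy map: two strong peaks on a weak ripple background.
n = 64
x = np.arange(n)
delta_m = 0.2 * np.cos(2.0 * np.pi * 5 * x / n)
delta_m[[10, 40]] = 5.0
phases, mask = delta_m_tangent_phases(delta_m)
```

In this toy case only the two strong peaks survive the threshold, so the transform of the product function is a sum over two peak positions rather than over the whole grid.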

Figure 2
(Left) Part of a hypothetical one-dimensional unit cell corresponding to an atomic position, with schematic representation of the associated δ_M(x) and ρ(x) [equation (5)]. The binary mask m_Δδ(x) is 1 if δ_M(x) is larger than the Δδ threshold and 0 otherwise. (Right) The same part of the unit cell shows the product function δ_M(x)m_Δδ(x), which is proportional to ρ(x) (for equiatomic structures). If the number N of expected atoms in the unit cell is known, then the δ_M tangent formula reduces to a structure factor calculation over the N largest product-function peaks greater than Δδ, i.e. the integral of the FT in expression (7) reduces to a sum.

SMAR uses another form of δ-GEQ in which g is isolated, namely

$[{\delta}_{M} \left({\bf r}, {\Phi}_{\rm T} \right) - \rho \left({\bf r} , \Phi_{\rm T} \right) = g \left({\bf r}\right) \quad \forall \ {\bf r} \subset V . \eqno(8)]$

According to Φ_T ≅ χ_T at the end of Section 2.1, (8) can be expressed in terms of χ_T (= the set of phases corresponding to |ρ(Φ_T)|) so that

$[{\delta}_{M} \left({\bf r}, {\chi}_{\rm T} \right) \, m_{\rho} \left({\bf r}\right) - \rho \left({\bf r}, {\chi}_{\rm T} \right) \, m_{\rho} \left({\bf r}\right) = g \left({\bf r}\right) \, m_{\rho} \left({\bf r}\right) \quad \forall \ {\bf r} \subset V , \eqno(9)]$

where both sides are multiplied by m_ρ [which is derived from ρ(Φ_T)]. Expression (9) is the basic equation for one of the two SMAR residuals (R_δ). Note the positive effect of introducing the mask m_ρ. Since the zero part of the mask is ≥50% of the unit-cell volume, the unwanted contribution of g in (9) is suppressed for at least half of the unit cell.

3. The SMAR residuals

Each iteration of the SMAR algorithm consists of two distinct parts, each ending with the application of the corresponding phasing formula (upper-left and lower-right corners of Fig. 1). In this section, the two SMAR residuals leading to these phasing formulae are determined. In the following r is omitted unless absolutely necessary.

3.1. The R_ρ(χ) residual

The |ρ(Φ)| density function results from applying the absolute value operator to the ρ(Φ) Fourier synthesis. The structure factors of |ρ(Φ)| correspond to its Fourier transform,

$[\left |\xi \right| \exp{(i\alpha)} = {\rm FT} \left [ \left | \rho \left ( \Phi \right ) \right | \right ] , \eqno(10)]$

with the moduli and phase values of the structure factors being globally denoted {|ξ|} and χ, respectively. The inverse Fourier transform of both sides of (10) yields

$[\rho \left ( \left \{ \left | \xi \right | \right \}, \chi \right ) = \left | \rho \left (\Phi \right ) \right | . \eqno (11)]$

For χ = χ_T it can be assumed that $[\left \{ \left | \xi \right | \right \} \cong \left \{ \left | E \right | \right \}]$ . Consequently, the integral

$[{R}_{\rho} \left(\chi \right) = \int \limits_{V} \left [ \rho \left ( \left \{ \left |E \right| \right \}, \chi \right ) - \rho \left ( \left \{ \left | \xi \right | \right \}, \chi \right ) \right ]^{2} {\rm d}V \eqno (12)]$

must be close to zero for $[R_\rho (\chi_{\rm T})]$ which corresponds to the minimum of R_ρ. Simplifying the notation of ρ({|E|}, χ) to ρ(χ) and replacing ρ({|ξ|}, χ) first by |ρ(Φ)| according to (11) and then by ρ(Φ)s_ρ according to (3), integral (12) takes the simpler form

$[{R}_{\rho} \left ( \chi \right ) = \int \limits_{V} \left [ \rho \left ( \chi \right ) - \rho \left ( \Phi \right ) {s}_{\rho} \right ]^{2} {\rm d}V . \eqno(13)]$

During the refinement the function ρ(Φ)s_ρ is always positive. To find the new χ set minimizing R_ρ(χ), the squared integrand of (13) is expanded into three integrals. The two integrals with integrands |ρ(Φ)|² and ρ²(χ) are both equal to $[{1 \over V} \sum \nolimits_k \left| E_k \right |^2]$ and hence phase independent; however, the third one,

$[-2{S}_{\rho} \left ( \chi \right ) = -2 \int \limits_{V} \rho \left ( \chi \right ) \rho \left ( \Phi \right ) {s}_{\rho} \, {\rm d}V , \eqno (14)]$

is phase dependent. The maximum of a functional like S_ρ(χ) [which is equivalent to the minimum of R_ρ(χ) due to the minus sign in (14)] can be found by solving the condition for an extremum, ∂S_ρ/∂α = 0, ∀ α ∈ χ, which, in parallel to Rius et al. (2007), yields the χ phasing formula,

$[\alpha^{\rm new} = {\rm phase \ of \ FT} \left \{ \rho \left ( \Phi \right ) {s}_{\rho} \right \} .\eqno (15)]$
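
Written out, the expansion of (13) that leads to (14) reads as follows. This only restates the argument in the text, using s_ρ² = 1 and Parseval's theorem for the two quadratic terms:

$[R_{\rho} (\chi) = \int \limits_{V} \rho^{2} (\chi) \, {\rm d}V + \int \limits_{V} \rho^{2} (\Phi) \, s_{\rho}^{2} \, {\rm d}V - 2 \int \limits_{V} \rho (\chi) \, \rho (\Phi) \, s_{\rho} \, {\rm d}V = {{2} \over {V}} \sum \limits_{k} \left | E_{k} \right |^{2} - 2 S_{\rho} (\chi) .]$

Since the first term on the right-hand side does not depend on χ, the minimum of R_ρ(χ) coincides with the maximum of S_ρ(χ).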

3.2. The R_δ(Φ) residual

The residual R_δ is obtained from the left side of (9) after generalizing χ_T to χ. This generalization entails two changes: (i) δ_M(r, χ_T) is simply changed to δ_M(r, χ), since in both cases the Fourier coefficients of δ_M contain the observed |E| − 〈|E|〉 values; and (ii) ρ(r, χ_T) is changed to ρ(r, {|ξ|}, χ). However, since ρ(r, {|ξ|}, χ) = |ρ(r, Φ)| = ρ(r, Φ)s_ρ(r) because of (11) and then (3), the selected generalized form is ρ(r, Φ)s_ρ(r) [this selection ensures that ρ(Φ) enters the residual expression (17)]. By applying these two changes to the left-hand side of (9), it becomes

$[\delta_{M} \left ( {\bf r}, \chi \right ) m_{\rho} \left ( {\bf r} \right ) - \rho \left ( {\bf r}, \Phi \right ) {s}_{\rho} ({\bf r}) \, {m}_{\rho} \left ( {\bf r} \right ) \quad \forall \ {\bf r} \subset V . \eqno(16)]$

Integration of (16) after squaring gives the R_δ residual,

$[{R}_{\delta} \left ( \Phi \right ) = \int \limits_{V} \left [ \delta_{M} \left ( \chi \right ) {m}_{\rho} - \rho \left ( \Phi \right ) {m}_{\rho} {s}_{\rho} \right ]^{2} {\rm d}V , \eqno(17)]$

where ρ(Φ)m_ρs_ρ is always positive [= |ρ(Φ)|m_ρ]. Of methodological importance is the minimum value of R_δ which occurs for R_δ(Φ_T). An estimate of this value is obtained by squaring and integrating the right-hand side of (9), namely

$[{R}_{\delta} \left ( \Phi_{\rm T} \right ) = \int \limits_{V} {m}_{\rho} {g}^{2} \, {\rm d}V . \eqno(18)]$

In this integral, g = δ_M(Φ_T) − ρ(Φ_T) given in (8) and ρ(Φ_T) used to calculate m_ρ are both different functions with different peak distributions. Consequently, the samples of g² at the points where m_ρ = 1 can be assumed to be random, allowing the factorization of 〈m_ρ〉 from the integral

$[{R}_{\delta} \left ( {\Phi}_{\rm T} \right ) \cong \langle {m}_{\rho} \rangle \int \limits_{V} {g}^{2} \, {\rm d}V = \langle {m}_{\rho} \rangle {I}_{{g}^{2}} .\eqno(19)]$

The value of the normalized I_g² is given by equation (49) in Appendix B, i.e.

$[{I}_{{g}^{2}} = \left ( c - 1 \right )^{2} \langle \left | E \right |^{2} \rangle_{k} - c \left ( c - 2 \right ) \left ( \langle \left | E \right | \rangle_{k} \right )^{2} \ \simeq 1.12.]$

According to (19), R_δ does not converge to zero but to the positive value R_δ(Φ_T) ≃ 1.12 × 〈m_ρ〉. Since it is known from Table 1 that 〈m_ρ〉 is ∼0.45 (at the end of a converging refinement), the approximated value of R_δ(Φ_T) should be 0.45 × 1.12 = 0.50.
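
The arithmetic behind this estimate can be reproduced in a few lines. The sketch below is illustrative only: the function names are hypothetical, and the inputs used for I_g² (ideal acentric Wilson statistics with 〈|E|²〉 = 1 and 〈|E|〉 = √π/2, together with c = 2.25) are assumptions chosen because they reproduce the quoted value of ≃1.12, not values taken from the test calculations.

```python
import math

def i_g2(c, mean_e2, mean_e):
    """Normalized integral of g^2: (c-1)^2 <|E|^2> - c(c-2) <|E|>^2."""
    return (c - 1.0) ** 2 * mean_e2 - c * (c - 2.0) * mean_e ** 2

def r_delta_minimum(mean_mask, ig2=1.12):
    """Estimate of R_delta(Phi_T) from (19): <m_rho> times I_g2."""
    return mean_mask * ig2

# Assumed inputs (not from the paper's data): ideal acentric statistics, c = 2.25.
mean_e = math.sqrt(math.pi) / 2.0      # <|E|> for an ideal acentric distribution
ig2_example = i_g2(2.25, 1.0, mean_e)  # close to the quoted 1.12

# <m_rho> ~ 0.45 at the end of a converging refinement (Table 1).
r_min = r_delta_minimum(0.45)          # 0.45 * 1.12
```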

Once the residual R_δ is defined and its minimum value known, the last step is to find the Φ phase set that minimizes R_δ. For this purpose, (17) is transformed into the sum

$[{R}_{\delta} = P + Q - 2{S}_{\delta} , \eqno(20)]$

with P, Q and S_δ being the following integrals:

$[{S}_{\delta} = \int \limits_{V} \delta_{M} (\chi) \, \rho \left ( \Phi \right ) {s}_{\rho} {m}_{\rho} \, {\rm d}V , \eqno (21)]$

$[P = \int \limits_{V} {\rho}^{2} (\Phi) \, {m}_{\rho} \, {\rm d}V , \eqno (22)]$

$[Q = \int \limits_{V} {\delta}_{M}^{2} (\chi) \, {m}_{\rho} \, {\rm d}V . \eqno (23)]$

The presence of the mask m_ρ complicates the solution of these integrals. The interested reader can find in Section 3.2.1 their evaluations with the help of experimental information. The principal conclusion of Section 3.2.1 is that minimizing R_δ is essentially equivalent to maximizing S_δ. Knowing this, one only needs to find the desired maximum of S_δ(Φ) by solving the condition for an extremum, ∂S_δ/∂φ = 0, ∀ φ ∈ Φ. By expressing ρ(Φ) in (21) as a Fourier synthesis, then

$[\eqalignno{ {S}_{\delta} \left ( \Phi \right ) = & \, {{1} \over {V}} \sum \limits_{k} \left | {E}_{-k} \right | \exp{(i \varphi_{-k})} \cr \times & \, \int \limits_{V} \delta_{M} \left ( {\bf r}, \chi \right ) \, {m}_{\rho} \left ( {\bf r} \right ) \, {s}_{\rho} \left ( {\bf r} \right ) \exp{(i 2\pi {\bf k} {\bf r})} \, {\rm d}{\bf r} & (24)}]$

and, in parallel to Rius et al. (2007) and Rius (2020), the application of the condition for an extremum to (24) gives the Φ (or also SMAR) phasing formula,

$[{\varphi}^{\rm new} = {\rm phase \ of \ FT} \left \{ {\delta}_{M} \left ( \chi \right ) \, m_{\rho} s_{\rho} \right \} . \eqno(25)]$

For simplicity, the Fourier-transformed function δ_Mm_ρs_ρ is denoted ρ′ in the slow convergence mode and ρ′′ in the fast convergence mode (see Section 4).
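
One slow-mode cycle of Fig. 1 (without ipp) can be sketched on a one-dimensional toy problem. This is a hedged illustration, not the XLENS_v1 code: the helper names, the grid-average approximation of the residual integrals, the default value c = 1.0 for the scaling constant, the estimate of σ_ρ from the sampled map and the four-atom toy structure are all assumptions introduced here.

```python
import numpy as np

def to_coeffs(density):
    """Fourier coefficients: sum over x of density * exp(+2*pi*i*k*x/n)."""
    return np.fft.ifft(density) * density.size

def to_density(coeffs):
    """Fourier synthesis (1/n) * sum over k of coeffs * exp(-2*pi*i*k*x/n), cf. (2)."""
    return np.fft.fft(coeffs).real / coeffs.size

def smar_cycle(e_mod, phi, observed, c=1.0, t=2.5):
    """One slow-mode cycle; returns the new phi plus R_rho and R_delta (grid averages)."""
    # Stage 1: rho(Phi), its sign function and |rho(Phi)| = rho * s_rho, eq. (3).
    rho = to_density(e_mod * np.exp(1j * phi))
    s_rho = np.where(rho > 0, 1.0, -1.0)
    abs_rho = rho * s_rho
    # Stage 2: chi = phases of FT{|rho(Phi)|}, eq. (15).
    alpha = np.angle(to_coeffs(abs_rho))
    # Stage 3: delta_M(chi) from c(|E| - <|E|>) exp(i*alpha), observed terms only.
    delta_e = np.where(observed, c * (e_mod - e_mod[observed].mean()), 0.0)
    delta_m = to_density(delta_e * np.exp(1j * alpha))
    # Mask from the stored rho (Table 1); sigma_rho estimated from the map (assumption).
    sigma = rho.std()
    m_rho = np.where((rho <= 0) & (rho > -t * sigma), 0.0, 1.0)
    # Stage 4: phi = phases of FT{delta_M * m_rho * s_rho}, eq. (25).
    rho_prime = delta_m * m_rho * s_rho
    phi_new = np.angle(to_coeffs(rho_prime))
    # Residuals (13) and (17), approximated by averages over the grid.
    rho_chi = to_density(e_mod * np.exp(1j * alpha))
    r_rho = np.mean((rho_chi - abs_rho) ** 2)
    r_delta = np.mean(m_rho * (delta_m - abs_rho) ** 2)
    return phi_new, r_rho, r_delta

# Invented toy structure: four equal point atoms on a 128-point grid,
# with Fourier terms kept only up to |k| = 20 and the k = 0 term excluded.
n, kmax = 128, 20
k = np.fft.fftfreq(n, d=1.0 / n)
observed = (np.abs(k) <= kmax) & (k != 0)
atoms = np.zeros(n)
atoms[[10, 37, 80, 101]] = 1.0
e_mod = np.where(observed, np.abs(to_coeffs(atoms)) / np.sqrt(4), 0.0)

# Random starting phases taken from a random real map, so Friedel symmetry holds.
rng = np.random.default_rng(0)
phi = np.angle(to_coeffs(rng.standard_normal(n)))
for _ in range(20):
    phi, r_rho, r_delta = smar_cycle(e_mod, phi, observed)
```

Because the residuals depend on the scaling of δ_M relative to ρ, the numerical values returned by this toy should not be compared with those of Tables 3 and 4; the sketch only shows the order of operations.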

3.2.1. Evolution of R_δ, S_δ, P and Q during the SMAR phase refinements

First, the values of the integrals (21), (22) and (23) are normalized by division with

$[{\rm SRO}2 = \int \limits_{V} \rho^{2} \, {\rm d}V , \eqno(26)]$

which only depends on the |E| values and is therefore computable. The values of −2S_δ, P and Q during the phase refinement progress were determined with the SMAR phasing algorithm already implemented in XLENS_v1 (Rius, 2020). The density function values used in the estimation of (21), (22) and (23) are those available before applying ipp (Fig. 1). Tables 3 and 4 show the evolutions for Actinomycin Z3, Suoa, Pep1 and Alpha1 peptide (each grouping of four numbers given in this subsection always refers to this order of test structures; the output files of the test calculations are available in the supporting information). For Actinomycin Z3, the evolutions of 2S_δ, ΔP and ΔQ are also represented in Fig. 3. The evolution of the different integrals can be summarized as follows:

Table 3
Evolution of −2S_δ [equation (21)], P [equation (22)], Q [equation (23)] and R_δ [equation (20)] during a converging default SMAR refinement from random starting phases for Actinomycin Z3 (Schäfer et al., 1998), normalized by division by SRO2 [equation (26)] (t = 2.5)

〈k_ΔP〉 in (27) is the proportionality constant k_ΔP averaged over the number of refinement cycles. The columns headed 0 and −1 list the number of zero and negative voxels, respectively, in the ρ′ map in %. The columns headed CC_ρ′ and CC_ρ′′ give the correlation coefficients before and after the ipp application.

Iteration	−2S_δ	P	Q	R_δ	k_ΔP	0	−1	CC_ρ′	CC_ρ′′
1	0.000	0.518	0.888	1.41	–	49.75	0.17	0.002	0.515
2	−0.436	0.574	0.880	1.02	0.169	51.60	0.09	0.307	0.652
5	−0.622	0.601	0.890	0.87	0.162	52.32	0.08	0.425	0.710
10	−0.678	0.614	0.898	0.83	0.168	52.39	0.10	0.457	0.729
15	−0.700	0.621	0.905	0.82	0.172	52.48	0.10	0.467	0.745
20	−0.724	0.624	0.913	0.81	0.172	52.53	0.11	0.479	0.746
21	−0.744	0.628	0.915	0.80	0.172	52.62	0.10	0.491	0.767
22	−0.826	0.642	0.936	0.75	0.172	52.91	0.09	0.533	0.800
23	−1.070	0.687	0.944	0.56	0.175	55.00	0.07	0.664	0.876
24	−1.164	0.710	0.936	0.48	0.172	54.65	0.08	0.714	0.906
25	−1.169	0.710	0.934	0.48	0.172	54.70	0.07	0.717	0.908
29	−1.166	0.710	0.938	0.48†	0.180	54.68	0.08	0.715	0.903
					〈k_ΔP〉 = 0.171 (6) (28×)

†R_δ value at the end of the converging refinement.

Table 4
As for Table 3 but for three additional test examples

Only three stages of each phase refinement have been selected (at the beginning, when convergence begins and when it ends). In all three examples Q − Q₀ ≤ 0.02 during the phase refinement.

Data set	Iteration	−2S_δ	P	Q	R_δ	k_ΔP	0	−1	CC_ρ′	CC_ρ′′
Suoa	1	−0.033	0.517	0.893	1.38	–	49.71	0.17	0.024	0.602
(Oliver & Strickland, 1984)	13	−0.747	0.617	0.890	0.76	0.157	53.62	0.04	0.504	0.753
	27	−1.180	0.702	0.900	0.42	0.171	56.44	0.01	0.742	0.917
						〈k_ΔP〉 = 0.155 (6) (26×)

Pep1	1	−0.025	0.520	0.880	1.38	–	49.86	0.18	0.018	0.602
(Antel et al., 1995)	14	−0.712	0.611	0.875	0.77	0.154	53.13	0.05	0.487	0.740
	28	−1.169	0.702	0.899	0.43	0.173	55.76	0.02	0.736	0.907
						〈k_ΔP〉 = 0.159 (6) (27×)

Alpha1 peptide	1	−0.013	0.519	0.866	1.37	–	49.90	0.18	0.000	0.575
(Privé et al., 1999)	21	−0.651	0.605	0.861	0.82	0.156	52.40	0.09	0.451	0.711
	42	−1.089	0.687	0.875	0.47	0.167	54.18	0.05	0.703	0.906
						〈k_ΔP〉 = 0.162 (5) (41×)

Figure 3
Evolution of the normalized 2S_MAR (top, in blue), (P − P₀) (middle, in red) and (Q − Q₀) (bottom, in green) for a converging SMAR refinement using Actinomycin Z3 data (Schäfer et al., 1998) (t = 2.5). See the heading of Table 3 for further details.

(i) Integral S_δ: When starting from random phase values, the initial −2S_δ values are always close to zero for all test structures and become −1.17, −1.18, −1.16 and −1.09 at the end of the respective convergent refinements.

(ii) Integral P: The initial P₀ value of integral P is 0.50 for all test structures and, as the phase refinement progresses, the difference ΔP = P − P₀ increases. ΔP is approximately proportional to 2S_δ with the slopes 〈k_ΔP〉 equal to 0.171 (6), 0.155 (6), 0.159 (6) and 0.162 (5) for the four test structures (Tables 3 and 4). Consequently, the following empirical linear relationship between P and 2S_δ can be established,

$[P = 0.50 + \langle {k}_{\Delta P} \rangle {2S}_{\delta} . \eqno (27)]$

(iii) Integral Q: To understand the significance of integral Q, the integral

$[{\rm SDEL} = \int \limits_{V} \delta_{M}^{2} \, {\rm d}V \eqno (28)]$

is also calculated for each test structure [it only depends on the (|E| − 〈|E|〉)² quantities]. The corresponding SDEL values are 1.762, 1.790, 1.756 and 1.728. According to Tables 3 and 4, the initial Q values are Q₀ = 0.89, 0.89, 0.88 and 0.87, i.e. Q₀ ≃ 0.50 × SDEL. In addition, during the respective phase refinements, the largest Q − Q₀ differences are only 0.05, 0.01, 0.02 and 0.01. Consequently, it can be assumed that $[Q \simeq Q_{0}]$.

By taking all these results into account, R_δ can be simplified to

$[{R}_{\delta} \simeq \left ( {P}_{0} + {Q}_{0} \right ) - \left ( 1 - \langle {k}_{\Delta P} \rangle \right ) 2S_{\delta} . \eqno (29)]$

Since P₀, Q₀ and 〈k_ΔP〉 can be considered nearly constant during the phase refinement, it follows from (29) that minimizing R_δ is essentially equivalent to maximizing S_δ.
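As a numerical illustration of (27) and (29), the following Python sketch recomputes R_δ for the last listed iteration of each test structure from the values in Tables 3 and 4. It assumes that R_δ is assembled from the three integrals as P + Q − 2S_δ (the tabulated values are consistent with this form to within rounding). The function names and the dictionary layout are illustrative only and are not part of the SMAR code; Q₀ for Actinomycin Z3 is taken from the text, the other Q₀ values from the first iteration in Table 4.

```python
P0 = 0.50  # initial value of integral P for all test structures

# name: (2*S_delta, P, Q, tabulated R_delta, <k_dP>, Q0)
FINAL_ROWS = {
    "Actinomycin Z3": (1.166, 0.710, 0.938, 0.48, 0.171, 0.890),
    "Suoa":           (1.180, 0.702, 0.900, 0.42, 0.155, 0.893),
    "Pep1":           (1.169, 0.702, 0.899, 0.43, 0.159, 0.880),
    "Alpha1 peptide": (1.089, 0.687, 0.875, 0.47, 0.162, 0.866),
}


def r_delta_direct(two_s, p, q):
    """R_delta from the three integrals (assumed form P + Q - 2*S_delta)."""
    return p + q - two_s


def r_delta_linear(two_s, k, q0, p0=P0):
    """Equation (29): (P0 + Q0) - (1 - <k_dP>) * 2*S_delta."""
    return (p0 + q0) - (1.0 - k) * two_s


for name, (two_s, p, q, r_tab, k, q0) in FINAL_ROWS.items():
    direct = r_delta_direct(two_s, p, q)
    linear = r_delta_linear(two_s, k, q0)
    print(f"{name:15s} table {r_tab:.2f}  P+Q-2S {direct:.3f}  eq.(29) {linear:.3f}")
```

With these inputs the direct sum reproduces the tabulated R_δ values to within rounding, while the linear estimate (29) stays within roughly 0.06 of them, the difference being due to the Q ≃ Q₀ approximation and to the use of an averaged slope.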

4. The phase refinement modes in SMAR

Since most experimental results of SMAR have already been discussed by Rius & Torrelles (2021, 2022) and in Section 3.2.1 of this contribution, only a selection of points directly related to the topic of this article are treated here, grouped according to the convergence mode.

4.1. The slow convergence mode

This mode only works with density functions, that is, the positions of the atomic peaks are not used. This mode requires the inclusion of the Fourier terms of all reflections (strong + weak) in the calculation of ρ and δ_M. The ρ′ = δ_Mm_ρs_ρ values entered in the SMAR phasing formula are obtained as follows:

(i) For slightly negative ρ values (amounting to 50–55% of the unit cell), the ρ′ values are 0.

(ii) For positive ρ values (regardless of their strength and representing 45–50% of the unit cell), ρ′ is equal to δ_M.

(iii) Only for very negative ρ values (<1% of the unit cell for t ≃ 2.5), ρ′ is equated to −δ_M (the minus sign multiplying δ_M tends to restore the sign of the very negative ρ value).

In summary, ρ′ corresponds either to unrestricted δ_M and −δ_M values or to fixed δ_M = 0 ones. The mask used by the SMAR algorithm results in smooth phase refinements based only on density functions. On the other hand, the experimental R_δ(Φ_T) values calculated at the end of converging phase refinements using (20) are 0.48, 0.42, 0.43 and 0.47 for Actinomycin Z3, Suoa, Pep1 and Alpha1 peptide, respectively (Tables 3 and 4). These values agree with the theoretical estimation of R_δ(Φ_T) ≃ 0.50 found in Section 3.2.
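The three cases (i)–(iii) can be sketched in a few lines of Python. This is a minimal illustration and not the author's implementation: the threshold separating slightly negative from very negative densities is assumed here to be −tσ, with σ the standard deviation of ρ over the grid (the actual definition of t is the one given with the mask definition earlier in the paper), and the function name `smar_rho_prime` and the flat-list representation of the maps are hypothetical.

```python
from statistics import pstdev


def smar_rho_prime(rho, delta_m, t=2.5):
    """Return (rho_prime, m_rho, s_rho) for flat lists of grid values.

    Assumption of this sketch: 'very negative' means rho < -t * sigma,
    where sigma is the standard deviation of rho over all grid points.
    """
    sigma = pstdev(rho)
    rho_prime, m_rho, s_rho = [], [], []
    for r, d in zip(rho, delta_m):
        if r > 0.0:             # (ii) positive density: keep delta_M
            m, s = 1, 1
        elif r < -t * sigma:    # (iii) very negative density: keep -delta_M
            m, s = 1, -1
        else:                   # (i) slightly negative ripple: zero conversion
            m, s = 0, -1
        m_rho.append(m)
        s_rho.append(s)
        rho_prime.append(d * s if m else 0.0)
    return rho_prime, m_rho, s_rho
```

The fraction of grid points with m_ρ = 0 returned by such a routine corresponds to the zero part of the mask discussed in the text, and the fraction with m_ρ = 1 and s_ρ = −1 to the very negative part.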

Regarding the very negative densities, some of them are due to the wrong input model (generated by the random starting phases) and disappear during the convergence process. The reductions observed in ρ′ are 0.17% → 0.08% for Actinomycin Z3, 0.18% → 0.05% for Alpha1 peptide, 0.18% → 0.02% for Pep1 and 0.17% → 0.01% for Suoa (Tables 3 and 4). To get an idea of the effect of t on the phasing of the intensity data of the four test structures, the sums of their total number of successful trials (out of 25) were determined for t = 1.5, 2.0, 2.5 and 10.0 (in the last case, the negative densities are all zero). The respective sums are 34, 76, 69 and 59 (Table S1 in the supporting information). The largest sums are obtained for t = 2.0 and 2.5 and the smallest for t = 1.5. For t = 10.0, the resulting sum is slightly worse than for t = 2.0 or 2.5. These results suggest that (i) the best t values are between 2.0 and 2.5, and (ii) although they represent only a small percentage of the unit cell, setting the very negative densities equal to zero is not beneficial for phase refinement. A possible explanation of the physical meaning of the very negative densities can be found in Section 3.3 of Rius (2020). In any case, a future comprehensive study focusing on this point would be useful.

4.2. The fast convergence mode

In this mode, which is the default operating mode of the SMAR algorithm, the part working only with density functions is supplemented by an additional step in which ρ′ is modified by the ipp method to give ρ′′ (Rius & Torrelles, 2021), which in turn replaces ρ′ in the Φ phasing formula (25). The ipp method is an effective way of accelerating the phase refinement and assumes that the approximate number N of expected atoms is known (which is normally the case). Briefly explained, ipp identifies in the ρ′ Fourier map those grid points closest to the centers of the N largest peaks. The ρ′ values of the 27 inner-peak grid points are then preserved for each peak and the remaining grid points of the ρ′ Fourier map are set to zero, giving rise to the new ρ′′. In this way no interpolation is required to find peak centers and, at the same time, the large ρ′ values are preserved.

Additionally, if the grid size Δ_grid is ∼0.33 Å, ipp implicitly applies the minimum interpeak separation (mips) constraint. The criterion for considering a positive maximum of the ρ′ Fourier synthesis a peak is that the voxel closest to the peak center must be surrounded by 26 smaller nearest-neighbor voxels (some can even be negative). Consequently, none of the nearest neighboring voxels can become the center of another ρ′ peak. This means that for an isometric grid element with Δ_grid = 0.33 Å the average mips value is 0.955 Å [the minimum, intermediate and maximum separations are 2 × 0.33 × 1 = 0.67 Å (6×), 2 × 0.33 × $[\sqrt 2]$ = 0.96 Å (12×) and 2 × 0.33 × $[\sqrt 3]$ = 1.16 Å (8×), respectively].

According to (19), the value of R_δ(Φ_T) is proportional to 〈m_ρ〉. In the fast convergence mode, due to the application of ipp, 〈m_ρ〉 = 27N/N_vox, where N_vox is the total number of voxels in the unit cell. For Actinomycin Z3, 27N = 33372 and N_vox = 664875, and hence 〈m_ρ〉 = 0.05, much smaller than the typical 〈m_ρ〉 values for the slow convergence mode (∼0.45). Accordingly, R_δ(Φ_T) = 1.12 × 0.05 = 0.06 is also much smaller than in the slow convergence mode.
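The two estimates of this paragraph (the mips separations and the fast-mode R_δ) can be reproduced with the short Python sketch below. It assumes only what is stated above: the closest permitted peak center lies two grid steps away along each of the 26 neighbor directions, 〈m_ρ〉 = 27N/N_vox and R_δ(Φ_T) ≃ 1.12〈m_ρ〉. The function names are hypothetical. Evaluated with Δ_grid = 0.33 Å exactly, the mean separation comes out at about 0.93 Å, a couple of hundredths of an ångström below the rounded figures quoted in the text.

```python
import math
from itertools import product


def mips_separations(grid_step):
    """Minimum center-to-center separations for the 26 neighbor directions.

    A peak center excludes its 26 nearest neighbors as centers, so the
    closest permitted center along direction (i, j, k) is two steps away.
    """
    seps = []
    for i, j, k in product((-1, 0, 1), repeat=3):
        if (i, j, k) != (0, 0, 0):
            seps.append(2.0 * grid_step * math.sqrt(i * i + j * j + k * k))
    return seps


def fast_mode_r_delta(n_atoms, n_vox, i_g2=1.12):
    """Return (<m_rho>, R_delta estimate) when 27 voxels are kept per atom."""
    mean_mask = 27 * n_atoms / n_vox
    return mean_mask, i_g2 * mean_mask


seps = mips_separations(0.33)
print(f"mean mips = {sum(seps) / len(seps):.3f} A over {len(seps)} directions")

# Actinomycin Z3: 27N = 33372 and N_vox = 664875 (values quoted in the text)
mean_mask, r_fast = fast_mode_r_delta(33372 // 27, 664875)
print(f"<m_rho> = {mean_mask:.2f}, R_delta = {r_fast:.2f}")
```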

A characteristic of this mode is that the calculation of ρ(Φ) only includes Fourier terms of those reflections satisfying the |E| ≥ |E|_lim condition with |E|_lim = 1.0, while the calculation of δ_M(χ) (and thus of ρ′) is always done with the Fourier terms of all k reflections (Fig. 1). That only the large |E| values should participate in the ρ update is certainly related to the fact that only the 27 inner voxels close to the peak center are preserved (the rest of the peak voxels become part of the zero mask). This is supported by the fact that, in the slow convergence mode, the Fourier terms of all reflections must be included in the calculation of ρ(Φ).
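The ipp step described in this section (strict local maxima as peak centers, 27 inner voxels kept for each of the N strongest peaks, all other voxels set to zero) can be sketched as follows. This is only an illustration under simplifying assumptions: the map is stored as a dictionary over a complete periodic grid, no space-group symmetry is used, and the names `ipp` and `NEIGHBOURS` are hypothetical rather than taken from the SMAR code.

```python
from itertools import product

# The 26 nearest-neighbor offsets of a voxel
NEIGHBOURS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def ipp(rho_prime, n_peaks):
    """Keep the 27 inner voxels of the n_peaks strongest peaks, zero the rest.

    rho_prime: dict mapping (x, y, z) -> density on a full periodic grid.
    """
    nx = 1 + max(p[0] for p in rho_prime)
    ny = 1 + max(p[1] for p in rho_prime)
    nz = 1 + max(p[2] for p in rho_prime)

    def shifted(p, d):
        # Neighbor of p in direction d with periodic wrap-around
        return ((p[0] + d[0]) % nx, (p[1] + d[1]) % ny, (p[2] + d[2]) % nz)

    # Peak center: positive voxel strictly larger than all 26 neighbors
    centres = [p for p, v in rho_prime.items()
               if v > 0 and all(v > rho_prime[shifted(p, d)] for d in NEIGHBOURS)]
    centres.sort(key=lambda p: rho_prime[p], reverse=True)

    rho_pp = {p: 0.0 for p in rho_prime}
    for c in centres[:n_peaks]:
        rho_pp[c] = rho_prime[c]
        for d in NEIGHBOURS:
            q = shifted(c, d)
            rho_pp[q] = rho_prime[q]  # preserved as is, even if negative
    return rho_pp
```

Because a peak center must exceed all of its 26 neighbors, two centers can never be adjacent, which is how the minimum interpeak separation is enforced without any explicit distance test.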

4.3. The correlation coefficient

In addition to estimating R_δ, the agreement of minuends and subtrahends in (17) can also be estimated using the correlation coefficient

$[{\rm CC}_{\rho^{\prime}} = {{S_{MAR}} \over {\left ( P \, Q \right )^{1/2} }} \eqno (30)]$

(using the density values before ipp for the slow convergence case). For refinements reaching convergence, the found CC_ρ′ values are 0.715 for Actinomycin Z3 (Table 3), 0.742 for Suoa, 0.736 for Pep1 and 0.703 for Alpha1 peptide (Table 4). These moderately high correlation coefficients also confirm the small discrepancy introduced in the CC_ρ′ calculation by the g contributions of those voxels with m_ρ = 1. As expected, the corresponding CC_ρ′′ values (also listed in Tables 3 and 4) are much higher due to the smaller number of voxels with m_ρ = 1, e.g. CC_ρ′ = 0.72 and CC_ρ′′ = 0.90 for Actinomycin Z3.
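Equation (30) can be checked directly against the tabulated integrals. The sketch below assumes that S_MAR in (30) is half the 2S value listed (with a minus sign) in the −2S_δ column of Tables 3 and 4; the function name is hypothetical.

```python
import math


def cc_rho(two_s, p, q):
    """Equation (30), with S_MAR taken as half of the tabulated 2S value."""
    return 0.5 * two_s / math.sqrt(p * q)


# (2S, P, Q) at the last listed iteration of each refinement
FINAL = {
    "Actinomycin Z3": (1.166, 0.710, 0.938),
    "Suoa":           (1.180, 0.702, 0.900),
    "Pep1":           (1.169, 0.702, 0.899),
    "Alpha1 peptide": (1.089, 0.687, 0.875),
}

for name, (two_s, p, q) in FINAL.items():
    print(f"{name:15s} CC = {cc_rho(two_s, p, q):.3f}")
```

The values obtained in this way agree with the CC_ρ′ column of the tables to within about 0.001, the residual difference being attributable to the three-decimal rounding of the tabulated integrals.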

5. Conclusions

The main objective of this research was to complete the theoretical aspects of the SMAR phasing algorithm. For this purpose, the connection between δ_M and ρ has been examined in detail. This leads to the general equation of δ direct methods (δ-GEQ) which, written in its standard form, is δ_M = ρ + g, where the density function g is mainly formed by a large number of small positive B-type peaks. Two ways of using δ-GEQ have been investigated. In SMAR, δ-GEQ is used in its difference form, δ_M − ρ = g, while in δ recycling it is solved for ρ, i.e. ρ = δ_M − g. In this second case, it has been shown that the δ_M tangent formula can be derived directly from it by including a suitable mask.

Regarding the SMAR residuals it can be concluded that:

(i) R_ρ(χ) measures the [ρ(χ) − ρ(Φ)s_ρ]² differences in the entire unit cell. The minimum value of R_ρ is R_ρ(χ_T) ≃ 0.

(ii) The R_δ(Φ) residual is based on its basic equation (9), i.e. δ_M(χ_T)m_ρ − ρ(χ_T)m_ρ = gm_ρ, where χ_T are the α phases corresponding to |ρ(Φ_T)|.

(iii) R_δ(Φ) measures the [δ_M(χ)m_ρ − ρ(Φ)m_ρs_ρ]² differences in the entire unit cell after considering that ρ(χ) ≅ ρ(Φ)s_ρ. The minimum value corresponds to R_δ(Φ_T) $[\simeq {\langle m_\rho \rangle} \, I_{g^2}]$ where $[I_{g^2} \simeq 1.12]$. It is shown that minimizing R_δ(Φ) is essentially equivalent to maximizing the sum function S_δ(Φ) [equation (21)], despite the presence of the m_ρ mask in the R_δ(Φ) definition. In all the examples calculated by the author, the S_δ maximum (characterized by a sudden S_δ increase) is always the true solution Φ_T which stands out clearly from the false solutions.

(iv) The convergence of SMAR is achieved by alternately applying the χ and Φ (or SMAR) phasing formulas in each iteration. These are α^new = phase of FT{ρ(Φ)s_ρ} and φ^new = phase of FT{δ_M(χ)m_ρs_ρ}, respectively.

It has been found that at the start of a SMAR refinement, the zero part of the mask (created by converting the slightly negative density function values to zero) occupies 50% of V and increases by ∼5% after convergence. According to R_δ(Φ_T) ≃ 1.12〈m_ρ〉 the presence of the zero part of the mask leads to a drop in the R_δ value when convergence begins, since the volume of the regions with only a g contribution is reduced. When the number N of expected atoms is actively used (fast convergence mode), each SMAR iteration is supplemented with the ipp application, which increases significantly the volume of the zero part of the mask.

Finally, a brief reflection is in order. It is known that non-crystalline materials have a continuous diffraction pattern and that oversampling of the intensity data (Shannon, 1949) results in an overdetermined system of equations from which the phases can be solved (even at non-atomic resolution) (Miao et al., 2000). It is also known that oversampling cannot be applied to crystals due to their 3D periodicity. Here it is shown that, in the case of crystals, the combination of δ_M and ρ each with |ρ| produces two independent residuals while keeping the same unknowns. This also leads to overdetermination which should explain the observed efficiency of SMAR. It is also interesting to note that the SMAR algorithm is particularly well suited for Deep Learning due to its architecture.

APPENDIX A

Derivation of the general equation of δ direct methods

A1. Definition and peak analysis of the δ_M Fourier synthesis

The term δ_M is defined in the unit cell by

$[{\delta}_{M} \left ( {\bf r}, {\Phi}_{\rm T} \right ) = {{c} \over {V}} \sum \limits_{k} \left ( \left | {E}_{k} \right | - \langle \left | E \right | \rangle \right ) \exp{(i {\varphi}_{k})} \exp{(-i 2\pi {\bf k} {\bf r})} , \eqno (31)]$

with {φ} = Φ_T and with c being an appropriate scaling constant. It can be reformulated by making use of the δ_M = δ_P/2 equality (Rius, 2012a), wherein δ_P is the Fourier synthesis which differs from (31) only in that (|E_k| − 〈|E|〉) is replaced by (|E_k|² − 〈|E|²〉). Consequently, (31) can be written as

$[{\delta}_{M} \left ( {\bf r} \right ) = {{c} \over {2V}} \sum \limits_{k} {{\left ( \left | {E}_{k} \right |^{2} - \langle \left | E \right |^{2} \rangle \right )} \over {\left | {E}_{k} \right |}} \left | {E}_{k} \right | \exp{(i {\varphi}_{k})} \exp{(-i 2\pi {\bf k} {\bf r})} . \eqno (32)]$

By expressing (|E_k|² − 〈|E|²〉) and $[\left | {E_k} \right | \exp{(i {\varphi _k})}]$ in terms of atom positions, it follows that

$[\eqalignno{ & {\delta}_{M} \left ({\bf r} \right) = \cr & {{c} \over {2V}} \sum \limits_{k} {{1} \over {\left | {E}_{k} \right |}} \sum \limits_{j} {\hat f}_{j} \sum \limits_{l}\sum \limits_{m(\ne l)}{\hat f}_{l} \, {\hat f}_{m} \exp{\left [ i2\pi {\bf k}\left({{\bf r}}_{j}+{{\bf r}}_{l}-{{\bf r}}_{m}-{\bf r}\right) \right ]} , \cr &&(33)}]$

with $[{\hat f}_j]$ denoting the normalized scattering factor of atom j. As can be derived from (33), δ_M has N³ − N² peaks at r = r_j + r_l − r_m (l ≠ m), since $[\exp{[i 2\pi {\bf k} ({\bf r}_j + {\bf r}_l - {\bf r}_m - {\bf r})]}]$ is unity for all k at these positions. Of interest are the values of δ_M at the N atomic positions, i.e. at r = r_l with l = 1, …, N. It can easily be verified that there are N − 1 superposed peaks contributing to δ_M(r_l), i.e. those that satisfy the equation r_l = r_j + r_l − r_m with l ≠ m and j = m, e.g. for N = 3 and r₂ these are r₁ + r₂ − r₁ and r₃ + r₂ − r₃. To estimate the total strength of the δ_M peak at r = r_l, expression (33) is rearranged into

$[\eqalignno{{\delta}_{M} \left ( {\bf r} \right ) = & \, {{c} \over{2V}} \sum \limits_{l} {\hat f}_{l} \sum \limits_{k} \exp{\left [ i 2\pi {\bf k} \left ( {\bf r}_{l} - {\bf r} \right ) \right ]} \cr & \, \times \sum \limits_{m(\ne l)} {\hat f}_{m} \exp{\left ( -i 2\pi {\bf k} {\bf r}_{m} \right )} \sum \limits_{j} {{1} \over {\left | {E}_{k} \right |}} {\hat f}_{j} \exp{ \left ( i 2\pi {\bf k} {\bf r}_{j} \right )} , \cr && (34)}]$

with

$[\sum \limits_{j} {{1} \over {\left | {E}_{k} \right |}} {\hat f}_{j} \exp{(i 2\pi {\bf k} {\bf r}_{j})} = \exp{(i {\varphi}_{k})} , \eqno(35)]$

$[\sum \limits_{m(\ne l)} {\hat f}_{m} \exp{(-i 2\pi {\bf k} {\bf r}_{m})} = \left | {E}_{k} \right | \exp{(-i {\varphi}_{k})} - {\hat f}_{l} \exp{(-i 2\pi {\bf k} {\bf r}_{l})} . \eqno(36)]$

For r = r_l, the $[\exp{\left [ i 2\pi {\bf k} \left ( {\bf r}_l - {\bf r} \right ) \right ]}]$ term in (34) becomes unity. In addition, if (35) and (36) are considered, expression (34) can be further simplified to

$[\eqalignno{ {\delta}_{M} \left ({\bf r}_{l} \right ) = & \, {{c} \over {2V}} \, {\hat f}_{l} \sum \limits_{k} \left [ \left | {E}_{k} \right | - {\hat f}_{l} \exp{(i {\varphi}_{k})} \exp{(-i 2\pi {\bf k} {\bf r}_{l})} \right ] &(37) \cr = & \, {{c} \over {2}} \left [ \left ( {{{\hat f}_{l} {N}_{k}} \over {V}} \right ) \langle \left | E \right | \rangle_{k} - {\hat f}_{l} \rho \left ( {\bf r}_{l} \right ) \right ] , &(38)}]$

since $[\sum \nolimits_k |{E_k}|]$ = 〈|E|〉_kN_k and, for an equal atom structure, it holds that

$[{{1} \over {V}} \sum \limits_{k} {\hat f}_{l} \exp{(i {\varphi}_{k})} \exp{(-i 2\pi {\bf k} {\bf r}_{l})} = \rho \left ( {\bf r}_{l} \right ) . \eqno (39)]$

The strength of an atomic peak in ρ is $[{\hat f}_l {N_k}/V]$, so (38) can be simplified to

$[{\delta}_{M} \left ( {\bf r}_{l} \right ) = c {{{\langle \left | E \right | \rangle}_{k} - (1/{\sqrt N})} \over {2}} \rho \left ( {\bf r}_{l} \right) \eqno (40)]$

by considering $[{\hat f}_l]$ = $[1/{\sqrt N}]$. Finally, by making

$[c = 2/\left [ \langle \left | E \right | \rangle_{k} - (1/{\sqrt N}) \right ] , \eqno (41)]$

the strength of δ_M(r_l) is equal to ρ(r_l) in (40). The δ_M peaks placed at atomic positions compose the set of type A peaks.

Let us consider the remaining δ_M peaks at r = r_j + r_l − r_m with l ≠ m and j ≠ m, which form the set of type B peaks. For a given B peak, the corresponding r position is obtained by adding the r_j − r_m, j ≠ m, interatomic vector to the atomic position vector r_l, so the superposition of r with an atomic position is accidental. Consequently, the strength of a (single) B-type peak, e.g. at r_jlm, is weaker than that of a δ_M(r_l) peak (formed by the superposition of N − 1 single peaks). According to (40), it holds that

$[{\rm strength \ of \ } {\delta_M} \left ( {\bf r}_{jlm} \right ) \approx {\rm strength \ of \ } \left [ \rho \left ( {\bf r}_{l} \right )/\left ( N - 1 \right ) \right ] . \eqno (42)]$

A2. The general equation for δ_(M) direct methods

According to the above results, δ_M(Φ_T) contains two types of positive peaks (A and B). Table 2 lists their peak strengths and positions. The stronger peaks of type A correspond to ρ and are resolved in the Fourier map. The N − 1 times weaker peaks of type B are located at positions other than the atomic positions (coincidence is accidental). Due to their large number, i.e. N(N − 1)² [for comparison, there are only N(N − 1) non-origin peaks in the modulus function], these peaks must have a strong overlap in the unit cell. The peaks of type B are the main constituents of the g function. Consequently, δ_M can be considered as the sum of both contributions, i.e.

$[{\delta}_{M} \left ( {\bf r} \right ) = \rho \left ( {\bf r} \right ) + g \left ( {\bf r} \right ) \quad \forall \ {\bf r} \subset V , \eqno(43)]$

which is called the general equation of δ_(M) direct methods. In the following, the subscript M is left out of the equation name for the sake of generality. (Note that the g function could also contain contributions from even weaker unconsidered peaks.)
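The peak counts used in Appendices A1 and A2 can be verified by brute-force enumeration of the index triples (j, l, m) in (33). The following Python sketch does this for small N; the function name is hypothetical and the enumeration ignores accidental coincidences of positions, as the text does.

```python
from itertools import product


def classify_delta_m_peaks(n_atoms):
    """Count the single peaks r_j + r_l - r_m (l != m) by type.

    Returns (a_terms, b_terms): type A has j == m (the peak falls on r_l),
    type B has j != m (general position).
    """
    a_terms = b_terms = 0
    for j, l, m in product(range(n_atoms), repeat=3):
        if l == m:      # excluded by the m != l restriction in (33)
            continue
        if j == m:      # r_j + r_l - r_m collapses onto the atomic position r_l
            a_terms += 1
        else:
            b_terms += 1
    return a_terms, b_terms


for n in (3, 5, 10):
    a, b = classify_delta_m_peaks(n)
    print(f"N={n}: total {a + b}, A {a} ({a // n} per atom), B {b}")
```

For any N the enumeration gives N³ − N² single peaks in total, of which N(N − 1) superpose in groups of N − 1 on the N atomic positions (type A) and N(N − 1)² are of type B.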

A3. The δ_M Fourier synthesis expressed with the signs of the |E| − 〈|E|〉 differences in the phase terms

In general, the Fourier coefficients of δ_M and ρ have different moduli and may differ in phase values. This is not obvious when comparing (2) and (4) because in both expressions the corresponding Fourier terms have the same φ phase values (which is quite useful for programming). However, when the absolute values ||E| − 〈|E|〉| are used in the synthesis, a phase shift Δφ must be added to each φ to take into account the sign. Consequently, while the Fourier coefficients of ρ are |E|exp(iφ), those of δ_M are ||E| − 〈|E|〉|exp[i(φ + Δφ)], with Δφ = 0 for |E| > 〈|E|〉 (strong reflections) and Δφ = π for |E| < 〈|E|〉 (weak reflections).
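The equivalence of the two ways of writing the δ_M Fourier coefficients (signed difference with phase φ, or absolute difference with phase φ + Δφ) is easily checked numerically. The short Python sketch below uses hypothetical function names and arbitrary example values for |E|, 〈|E|〉 and φ.

```python
import cmath
import math


def delta_m_coefficient_signed(e_mod, e_mean, phi, c=1.0):
    """c (|E| - <|E|>) exp(i phi): the sign stays in the modulus term."""
    return c * (e_mod - e_mean) * cmath.exp(1j * phi)


def delta_m_coefficient_shifted(e_mod, e_mean, phi, c=1.0):
    """c ||E| - <|E|>| exp[i (phi + dphi)]: the sign moves into the phase."""
    dphi = 0.0 if e_mod > e_mean else math.pi  # strong: 0, weak: pi
    return c * abs(e_mod - e_mean) * cmath.exp(1j * (phi + dphi))


for e_mod in (1.60, 0.35):  # one strong and one weak reflection
    a = delta_m_coefficient_signed(e_mod, 0.89, 1.2)
    b = delta_m_coefficient_shifted(e_mod, 0.89, 1.2)
    print(f"|E| = {e_mod:.2f}: difference = {abs(a - b):.1e}")
```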

APPENDIX B

Solution of integral I_g² =∫_V g² dV

Assuming that the integrand of I_g² is g² = [δ_M(Φ_T) − ρ(Φ_T)]², squaring the binomial gives

$[I_{g^2} = \int \limits_{V} \left [ {\delta}_{M}^{2} \left ( {\Phi}_{\rm T} \right ) + {\rho}^{2} \left ( {\Phi}_{\rm T} \right ) - 2{\delta}_{M} \left ( {\Phi}_{\rm T} \right ) \, \rho \left ( {\Phi}_{\rm T} \right ) \right ] \, {\rm d}V . \eqno (44)]$

The integral of the first two summands in (44) when expressed in terms of the respective Fourier coefficients is

$[\eqalignno{ \int \limits_{V} \left ( {\delta}_{M}^{2} + {\rho}^{2} \right ) \, {\rm d}V = & \, {{1} \over {V}} \sum \limits_{k} \left \{ \left [ c \left ( \left | {E}_{k} \right | - \langle \left | E \right | \rangle \right ) \right ]^{2} + \left | {E}_{k} \right |^{2} \right \} \cr = & \, {{1} \over {V}} \left [ \left ( {c}^{2} + 1 \right ) \sum \limits_{k} \left | {E}_{k} \right |^{2} - {c}^{2} {N}_{k} \langle \left | E \right | \rangle^{2} \right ] . &(45)}]$

If one proceeds analogously with the third summand in (44), it follows that

$[\eqalignno{ & -2 \int \limits_{V} {\delta}_{M} \left ( {\Phi}_{\rm T} \right ) \rho \left ( {\Phi}_{\rm T} \right ) \, {\rm d}V = \cr & \quad \quad {{-2c} \over {V^2}} \sum \limits_{k} \sum \limits_{k^{\prime}} \left ( \left | {E}_{k} \right | - \langle \left | E \right | \rangle \right ) \left | {E}_{k^{\prime}} \right | \exp{\left [ i \left ( {\varphi}_{k} + {\varphi}_{k^{\prime}} \right ) \right ]} \cr & \quad \quad \times \int \limits_{V} \exp{\left [ -i 2\pi \left ( {\bf k} + {\bf k}^{\prime} \right ) {\bf r} \right ]} \, {\rm d} {\bf r} . &(46)}]$

The integral over V in (46) is zero except for k′ = −k, where it equals V. Since the Fourier coefficients of −k have the same modulus as those of k and the opposite phase (Friedel's law), the right-hand side of (46) can be written as

$[{{-2c} \over {V}} \sum \limits_{k} \left ( \left | {E}_{k} \right | - \langle \left | E \right | \rangle \right ) \left | {E}_{k} \right | \exp{\left [ i \left ( {\varphi}_{k} - {\varphi}_{k} \right ) \right ]} .\eqno(47)]$

The values of the exponentials in (47) are 1, so that it reduces to

$[ {{-2c} \over {V}} \sum \limits_{k} \left ( \left | {E}_{k} \right | - \langle \left | E \right | \rangle \right ) \left | {E}_{k} \right | .\eqno(48)]$

By adding (45) and (48) and normalizing by $[\int \nolimits_{V} {\rho}^{2} \, {\rm d}V]$ = $[{N}_{k} \langle \left | E \right |^{2} \rangle_{k}/V]$ = N_k/V, the final expression of the integral I_g² is obtained, namely

$[I_{g^2} = \left ( c - 1 \right )^{2} \langle \left | E \right |^{2} \rangle_{k} - c \left ( c - 2 \right ) \, \left ( \langle \left | E \right | \rangle_{k} \right )^{2} . \eqno (49)]$

From the theory of the distribution of |E| values, the values of 〈|E|²〉, 〈|E|〉 (acentric case) and c ≃ 2/〈|E|〉 can be derived, i.e. 1.00, 0.89 and 2.25, respectively, so that (49) gives I_g² ≃ 1.12. Note that I_g² only depends on |E| and |E| − 〈|E|〉, since the phases cancel out. Consequently, the values of I_g² for g² equal to [δ_M(Φ_T) − ρ(Φ_T)]² and to [δ_M(χ_T) − ρ(χ_T)]² are identical.
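The numerical value of I_g² follows from (49) in a few lines. The Python sketch below evaluates it both with the rounded values quoted above (〈|E|〉 = 0.89, c = 2.25) and with the acentric Wilson expectation 〈|E|〉 = √π/2 ≃ 0.886 together with c = 2/〈|E|〉; the function name is hypothetical.

```python
import math


def i_g2(mean_e, mean_e2=1.0, c=None):
    """Equation (49): (c - 1)^2 <|E|^2> - c (c - 2) <|E|>^2.

    If c is not given, the large-N approximation c = 2 / <|E|> is used.
    """
    if c is None:
        c = 2.0 / mean_e
    return (c - 1.0) ** 2 * mean_e2 - c * (c - 2.0) * mean_e ** 2


rounded = i_g2(0.89, c=2.25)               # values quoted in the text
wilson = i_g2(math.sqrt(math.pi) / 2.0)    # acentric <|E|> = sqrt(pi)/2
print(f"I_g2 = {rounded:.3f} (rounded inputs), {wilson:.3f} (Wilson acentric)")
```

Both evaluations round to 1.12, the value used throughout for the estimation of R_δ(Φ_T).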

Supporting information

Table S1 and output of test calculations of Section 3.2.1. DOI: https://doi.org/10.1107/S2053273324009628/tw5010sup1.pdf

Acknowledgements

The author thanks Dr Xavier Torrelles and Professor Salvador Galí for valuable suggestions and the reviewers for their constructive criticisms.

Funding information

The following funding is acknowledged: Agencia Estatal de Investigación (grant Nos. PID2021-124734OB-C22 and CEX2023-001263-S). The author acknowledges the support of the publication fee by the CSIC Open Access Publication Support Initiative through its Unit of Information Resources for Research (URICI).

References

Antel, J., Sheldrick, G. M., Bats, J. W., Kessler, H. & Müller, A. (1995). Unpublished work.
Burla, M. C., Caliandro, R., Giacovazzo, C. & Polidori, G. (2010). Acta Cryst. A66, 347–361.
Capitani, G. C., Mugnaioli, E., Rius, J., Gentile, P., Catelani, T., Lucotti, A. & Kolb, U. (2014). Am. Mineral. 99, 500–510.
Cochran, W. (1952). Acta Cryst. 5, 65–67.
Foadi, J., Woolfson, M. M., Dodson, E. J., Wilson, K. S., Jia-xing, Y. & Chao-de, Z. (2000). Acta Cryst. D56, 1137–1147.
Lipson, H. & Cochran, W. (1966). The Determination of Crystalline Structures. The Crystalline State, Vol. III, edited by W. L. Bragg, pp. 324–325. London: G. Bell and Sons Ltd.
Main, P. (1975). Crystallographic Computing Techniques, edited by F. R. Ahmed, p. 99. Copenhagen: Munksgaard.
Miao, J., Kirz, J. & Sayre, D. (2000). Acta Cryst. D56, 1312–1315.
Miller, R., DeTitta, G. T., Jones, R., Langs, D. A., Weeks, C. M. & Hauptman, H. A. (1993). Science, 259, 1430–1433.
Oliver, J. D. & Strickland, L. C. (1984). Acta Cryst. C40, 820–824.
Privé, G., Anderson, D. H., Wesson, L., Cascio, D. & Eisenberg, D. (1999). Protein Sci. 8, 1400–1409.
Rius, J. (1993). Acta Cryst. A49, 406–409.
Rius, J. (2011). XLENS_v1: A Computer Program for Solving Crystal Structures from Diffraction Data by Direct Methods. Institut de Ciència de Materials de Barcelona, CSIC, Spain.
Rius, J. (2012a). Acta Cryst. A68, 77–81.
Rius, J. (2012b). Acta Cryst. A68, 399–400.
Rius, J. (2014). IUCrJ, 1, 291–304.
Rius, J. (2020). Acta Cryst. A76, 489–493.
Rius, J., Crespi, A. & Torrelles, X. (2007). Acta Cryst. A63, 131–134.
Rius, J., Mugnaioli, E., Vallcorba, O. & Kolb, U. (2013). Acta Cryst. A69, 396–407.
Rius, J., Sañé, J., Miravitlles, C., Amigó, J. M. & Reventós, M. M. (1995). Acta Cryst. A51, 268–270.
Rius, J. & Torrelles, X. (2021). Acta Cryst. A77, 339–347.
Rius, J. & Torrelles, X. (2022). Acta Cryst. A78, 473–481.
Rius, J., Vallcorba, O., Crespi, A. & Colombo, F. (2017). Z. Kristallogr. 232, 827–834.
Sayre, D. (1952). Acta Cryst. 5, 60–65.
Schäfer, M., Sheldrick, G. M., Bahner, I. & Lackner, H. (1998). Angew. Chem. Int. Ed. 37, 2381–2384.
Shannon, C. E. (1949). Proc. Inst. Radio Eng. 37, 10–21.
Sheldrick, G. M. (2002). Z. Kristallogr. 217, 644–650.
Sheldrick, G. M. (2015). Acta Cryst. A71, 3–8.
Sheldrick, G. M., Hauptman, H. A., Weeks, C. M., Miller, R. & Usón, I. (2012). International Tables for Crystallography, Vol. F, Crystallography of Biological Macromolecules, edited by E. Arnold, D. M. Himmel & M. G. Rossmann, ch. 16.1, pp. 413–442. Chichester: Wiley.
Weeks, C. M., DeTitta, G. T., Miller, R. & Hauptman, H. A. (1993). Acta Cryst. D49, 179–181.
Zachariasen, W. H. (1952). Acta Cryst. 5, 68–73.

This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.
