research papers

FOUNDATIONS
ADVANCES
ISSN: 2053-2733

Extending the novel |ρ|-based phasing algorithm to the solution of anomalous scattering substructures from SAD data of protein crystals

aInstitut de Ciència de Materials de Barcelona, ICMAB-CSIC, Campus de la UAB, Bellaterra, Catalonia 08193, Spain
*Correspondence e-mail: jordi.rius@icmab.es

Edited by A. Altomare, Institute of Crystallography - CNR, Bari, Italy (Received 26 May 2022; accepted 29 August 2022; online 10 October 2022)

Owing to the importance of the single-wavelength anomalous diffraction (SAD) technique, the recently developed |ρ|-based phasing algorithm (SM,|ρ|) incorporating the inner-pixel preservation (ipp) procedure [Rius & Torrelles (2021). Acta Cryst A77, 339–347] has been adapted to the determination of anomalous scattering substructures and its applicability tested on a series of 12 representative experimental data sets, mostly retrieved from the Protein Data Bank. To give an idea of the suitability of the data sets, the main indicators measuring their quality are also given. The dominant anomalous scatterers are either SeMet or S atoms, or metals/clusters incorporated by soaking. The resulting SAD-adapted algorithm solves the substructures of the test protein crystals quite efficiently.

1. Introduction

Important present applications of the single-wavelength anomalous diffraction (SAD) technique are the location of SeMet atoms in crystals of multi-site genetically engineered proteins, the determination of the positions and occupancies of the heavy atoms (or clusters) entering the crystal, e.g. when soaking it in a solution, or also the direct use of chemical species already present in native crystals as anomalous scatterers (S, Cl, P, …). Knowledge of the anomalous scattering (AS) substructure provides starting phase values which can be iteratively improved by density modification. Although the substructure can be solved in favourable cases by the direct interpretation of the anomalous Patterson function (Rossmann, 1961[Rossmann, M. G. (1961). Acta Cryst. 14, 383-388.]), direct methods (DM) often offer the only alternative in complex cases. The application of DM to SAD data takes advantage of the availability of the experimentally accessible absolute values of the anomalous differences (|D|exp) between pairs of acentric reflections (Bijvoet pairs) which follows from the atomic scattering factor definition

[{f_j} = f_j^{\rm n} + f_j^\prime + if_j^{\prime\prime}, \eqno(1)]

where [f_j^{\rm n}] is the normal scattering factor of atom j, and [f_j^\prime] and [f_j^{\prime\prime}] are the corresponding real and imaginary anomalous dispersion corrections (the respective symbols for non-vibrating atoms are [f_0], [f_0^{\rm n}], [f_0^\prime] and [f_0^{\prime\prime}]). Let us consider a structure composed of N atoms, [N_{\rm A}] of which scatter anomalously, with r being the atomic position vector. The structure factor of an arbitrary reflection H is then

[{F_H} = \left| F_H^\prime \right|\exp(i\varphi _H^\prime) + i\left| F_H^{\prime\prime} \right|\exp(i\varphi _H^{\prime\prime})\eqno(2)]

with

[\left| F_H^\prime \right|\exp(i\varphi _H^\prime) = \textstyle \sum \limits_{l = 1}^N f_{l,H}^{\rm n}\exp(i2\pi {\bf Hr}_l) + \sum \limits_{j = 1}^{N_{\rm A}} f_j^\prime\exp(i2\pi {\bf Hr}_j)\eqno(3)]

[\left| F_H^{\prime\prime} \right|\exp(i\varphi _H^{\prime\prime}) = \textstyle \sum \limits_{j = 1}^{N_{\rm A}} f_j^{\prime\prime}\exp(i2\pi {\bf Hr}_j).\eqno(4)]

For two +H and −H reflections constituting a Bijvoet pair (from now on, [{F_{ + H}} = {F^ + }] and [{F_{ - H}} = {F^ - }]), the absolute value of the anomalous difference D is given by

[\left| D \right| = \left| {\left| {{F^ + }} \right| - \left| {{F^ - }} \right|} \right|\eqno(5)]

which is related to [| {F''} |] by the simple relationship (30) (see Appendix A)

[\left| D \right| = 2\left| {F''} \right| \times \left| \sin\left(\varphi ' - \varphi '' \right) \right| \eqno(6)]

if conditions (7a) and (7b), corresponding to (28) and (29), are met, i.e.

[\left| F \right|_{\rm av}^2 \,\gg \left| D \right|^2/4 \eqno(7a)]

and

[\left| F \right|_{\rm av}^2 \,\gg \left| {F''} \right|^2 \eqno(7b)]

with

[\left| F \right|_{\rm av}^2 = \left(\left| F^ + \right|^2 + \left| {F^ - } \right|^2 \right)/2. \eqno(8)]
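As a quick numerical illustration of how well (6) approximates (5) when conditions (7a) and (7b) hold, the following minimal sketch builds one Bijvoet pair from equation (2). The moduli and phases are hypothetical values chosen only for the example, and the function name `bijvoet_pair` is not part of any program mentioned in the text.

```python
import numpy as np

def bijvoet_pair(f_prime, f_dprime):
    """Return (|F+|, |F-|) for complex F' and F'' as combined in equation (2)."""
    f_plus = f_prime + 1j * f_dprime
    # For -H both partial sums are complex-conjugated, because the
    # coefficients f^n, f' and f'' in (3) and (4) are real.
    f_minus = np.conj(f_prime) + 1j * np.conj(f_dprime)
    return abs(f_plus), abs(f_minus)

# Hypothetical example: |F'| = 100, |F''| = 2, phases in radians.
phi1, phi2 = 0.9, 0.2
F1 = 100.0 * np.exp(1j * phi1)
F2 = 2.0 * np.exp(1j * phi2)

Fp, Fm = bijvoet_pair(F1, F2)
D_exact = abs(Fp - Fm)                                   # equation (5)
D_approx = 2.0 * abs(F2) * abs(np.sin(phi1 - phi2))      # equation (6)
F_av_sq = (Fp**2 + Fm**2) / 2.0                          # equation (8)

print(D_exact, D_approx, F_av_sq)
```

With these values |F|av² exceeds both |D|²/4 and |F′′|² by more than three orders of magnitude, and the two estimates of |D| agree closely.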

Equation (6) constitutes the basis for solving AS substructures by DM. First attempts showing the viability of locating AS in metalloproteins by DM were performed by Mukherjee et al. (1989[Mukherjee, A. K., Helliwell, J. R. & Main, P. (1989). Acta Cryst. A45, 715-718.]) with the program MULTAN87 (Debaerdemaeker et al., 1987[Debaerdemaeker, T., Germain, G., Main, P., Tate, C. & Woolfson, M. M. (1987). MULTAN87. A System of Computer Programs for the Automatic Solution of Crystal Structures from X-ray Diffraction Data. University of York, England.]) following the path previously paved by Wilson (1978[Wilson, K. S. (1978). Acta Cryst. B34, 1599-1608.]) in connection with the isomorphous replacement case and taking advantage of preliminary results on the location of AS using tuneable synchrotron radiation (Einspahr et al., 1985[Einspahr, H., Suguna, K., Suddath, F. L., Ellis, G., Helliwell, J. R. & Papiz, M. Z. (1985). Acta Cryst. B41, 336-341.]); however, it was the introduction of the dual-space DM that represented a substantial improvement in the determination of AS substructures. This DM strategy refines phases by iteratively alternating structure invariant manipulation (reciprocal space) with Fourier peak optimization (real space). It was first implemented in the Shake-and-Bake program (Miller et al., 1994[Miller, R., Gallo, S. M., Khalak, H. G. & Weeks, C. M. (1994). J. Appl. Cryst. 27, 613-621.]). This philosophy was also incorporated in SHELX (Sheldrick & Gould, 1995[Sheldrick, G. M. & Gould, R. O. (1995). Acta Cryst. B51, 423-431.]) which evolved to SHELXD by incorporating, among other things, Patterson seeding (Schneider & Sheldrick, 2002[Schneider, T. R. & Sheldrick, G. M. (2002). Acta Cryst. D58, 1772-1779.]). Descriptions of the application of SHELXD to the solution of the AS substructures are given by Usón & Sheldrick (2018[Usón, I. & Sheldrick, G. M. (2018). Acta Cryst. D74, 106-116.]) and Sheldrick (2010[Sheldrick, G. M. (2010). Acta Cryst. D66, 479-485.]). More recently, the capability of SAD phasing in the presence of only weak AS has increased due to the possibility of extending the SAD experiments to longer wavelengths as well as to the availability of faster and more accurate X-ray detectors (e.g. Leonarski et al., 2018[Leonarski, F., Redford, S., Mozzanica, A., Lopez-Cuenca, C., Panepucci, E., Nass, K., Ozerov, D., Vera, L., Olieric, V., Buntschu, D., Schneider, R., Tinti, G., Froejdh, E., Diederichs, K., Bunk, O., Schmitt, B. & Wang, M. (2018). Nat. Methods, 15, 799-804.]), allowing application of lower dose rates and thus increasing data redundancy on a unique crystal (data set scaling from multiple crystals is minimized). A recent promising alternative acquisition mode, especially useful for data collection from small, weakly diffracting and radiation-sensitive crystals, is serial crystallography. This technique is based on taking one single image (containing partial Bragg reflection information) from each microcrystal and completing the diffraction data set by combining the individual indexed images from thousands of crystals. A selection of de novo (SAD) phasing serial crystallography studies at synchrotron sources can be found in Nass et al. (2020[Nass, K., Cheng, R., Vera, L., Mozzanica, A., Redford, S., Ozerov, D., Basu, S., James, D., Knopp, G., Cirelli, C., Martiel, I., Casadei, C., Weinert, T., Nogly, P., Skopintsev, P., Usov, I., Leonarski, F., Geng, T., Rappas, M., Doré, A. S., Cooke, R., Nasrollahi Shirazi, S., Dworkowski, F., Sharpe, M., Olieric, N., Bacellar, C., Bohinc, R., Steinmetz, M. O., Schertler, G., Abela, R., Patthey, L., Schmitt, B., Hennig, M., Standfuss, J., Wang, M. & Milne, C. J. (2020). IUCrJ, 7, 965-975.]).

Recently, |ρ|-based DM in the form of the SM,|ρ| phasing algorithm (Rius, 2020[Rius, J. (2020). Acta Cryst. A76, 489-493.]) have been extended to large crystal structures through the introduction of the peakness-enhancing ipp (inner-pixel preservation) procedure (Rius & Torrelles, 2021[Rius, J. & Torrelles, X. (2021). Acta Cryst. A77, 339-347.]) (hereafter, to simplify its designation, the SM,|ρ| algorithm is specified with the acronym SMAR in which S stands for `sum function', M for `modulus function' and AR for `absolute ρ'). The aim of the present contribution is the adaptation of the ipp-improved SMAR to the solution of AS substructures from SAD data (SAD-SMAR). Its feasibility is shown with SAD data sets either kindly supplied by the respective authors or retrieved from the Protein Data Bank (PDB). All calculations have been carried out with a modified version of XLENS_v1 (Rius, 2011[Rius, J. (2011). XLENS_v1: a Computer Program for Solving Crystal Structures from Diffraction Data by Direct Methods. Institut de Ciència de Materials de Barcelona, CSIC, Spain, https://crystallography.icmab.es/software.]). To help the reader to assess the suitability of the test data, two indicators are given for each data set (extending to all acentric reflections in the corresponding resolution range used in the SAD-SMAR application), namely:

(i) The size of the anomalous signal (Bijvoet ratio), [\langle| D |\rangle/\langle| F |\rangle] (Hendrickson & Teeter, 1981[Hendrickson, W. A. & Teeter, M. M. (1981). Nature, 290, 107-113.]; Wang, 1985[Wang, B. C. (1985). Methods Enzymol. 115, 90-112.]) ranging from 0.012 to 0.070 in the selected test examples.

(ii) The precision of [| D |] given by the [\langle| D |/\sigma(| D |)\rangle] ratio (Schneider & Sheldrick, 2002[Schneider, T. R. & Sheldrick, G. M. (2002). Acta Cryst. D58, 1772-1779.]; Wang, 1985[Wang, B. C. (1985). Methods Enzymol. 115, 90-112.]), which should be >1.5 (ideally also for the outermost resolution shell) (Cianci et al., 2008[Cianci, M., Helliwell, J. R. & Suzuki, A. (2008). Acta Cryst. D64, 1196-1209.]; Giacovazzo, 2014[Giacovazzo, C. (2014). Phasing in Crystallography: a Modern Perspective. Oxford University Press.]). Logically, the precision of [| D|] directly depends on the precision of the corresponding [| {F^ + }|] and [| {F^ - } |] (more strictly of [{I^ + }] and [{I^ - }]).
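The two indicators (i) and (ii) can be evaluated from a list of Bijvoet pairs along the following lines. This is only a sketch: the function name and its arguments are hypothetical, σ(|D|) is assumed to be supplied by the data reduction, and taking |F| as the mean of |F+| and |F−| is an assumption made here for illustration.

```python
import numpy as np

def anomalous_indicators(f_plus, f_minus, sig_d):
    """Return (<|D|>/<|F|>, <|D|/sigma(|D|)>) for arrays of acentric Bijvoet pairs."""
    f_plus = np.asarray(f_plus, dtype=float)
    f_minus = np.asarray(f_minus, dtype=float)
    sig_d = np.asarray(sig_d, dtype=float)

    d = np.abs(f_plus - f_minus)          # |D| as in equation (5)
    f_mean = 0.5 * (f_plus + f_minus)     # assumption: |F| taken as the pair mean

    bijvoet_ratio = d.mean() / f_mean.mean()   # indicator (i)
    precision = np.mean(d / sig_d)             # indicator (ii)
    return bijvoet_ratio, precision

# Hypothetical amplitudes for three Bijvoet pairs.
ratio, prec = anomalous_indicators([102.0, 98.0, 100.0],
                                   [100.0, 100.0, 100.0],
                                   [1.0, 1.0, 1.0])
print(ratio, prec)
```

In practice both averages would be restricted to the acentric reflections within the resolution range used for phasing, and the second one would also be computed for the outermost shell.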

In SAD phasing, redundancy of diffraction data is an important data collection parameter, since it affects the variance of the average intensity estimates. As this work is based on published data sets, the cited redundancy values are those given by the respective authors.

2. The composition of the |D| set

Solving AS substructures by DM requires a prior selection of the experimental |D| values, |D|exp, since not all of them are appropriate. A preliminary check should ensure that the Bijvoet-pair reflections have |F|av values satisfying conditions (7a) and (7b). This is accomplished by preserving in the initial set of |D| differences only those reflections with |F|av values (expressed as |E|'s) larger than a given ECUT cut-off value. In the test calculations, the ECUT used is [\cong] 0.25, which causes the suppression of approximately 5% of the total of acentric reflections. The selection process continues with two additional rejection criteria which are applied directly to the |D| anomalous differences (to increase their reliability and to remove outliers). Since |D| is in general much smaller than |F|av, random errors inherent to |F+| and |F−| seriously affect the precision of |D|. Consequently, only those reflections fulfilling the |D| > DFCUT × σ(|D|) criterion are preserved in the |D| set (Hendrickson et al., 1988[Hendrickson, W. A., Smith, J. L., Phizackerley, R. P. & Merritt, E. A. (1988). Proteins, 4, 77-88.]; Grosse-Kunstleve & Brunger, 1999[Grosse-Kunstleve, R. W. & Brunger, A. T. (1999). Acta Cryst. D55, 1568-1577.]). In the test calculations, DFCUT is in general ∼0.4, which represents the additional removal of 10–15% of acentric reflections from the |D| set. The selection process ends with the outlier elimination, i.e. all reflections with |D|/r.m.s.d.(|D|) greater than ∼4.0 are filtered out (Hendrickson et al., 1988[Hendrickson, W. A., Smith, J. L., Phizackerley, R. P. & Merritt, E. A. (1988). Proteins, 4, 77-88.]; Grosse-Kunstleve & Brunger, 1999[Grosse-Kunstleve, R. W. & Brunger, A. T. (1999). Acta Cryst. D55, 1568-1577.]) [r.m.s.d.(|D|) = root-mean-square deviation of |D|]. The surviving reflections in the |D| set are generically denoted by H.
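The three-step selection described above may be sketched as a sequence of boolean filters. The function and argument names are hypothetical, the default cut-offs simply repeat the approximate values quoted in the text, and computing r.m.s.d.(|D|) as the root-mean-square of the |D| values that survive the first two filters is an assumption of this sketch rather than a documented detail of XLENS_v1.

```python
import numpy as np

def select_d_set(e_av, d, sig_d, ecut=0.25, dfcut=0.4, outlier=4.0):
    """Return a boolean mask of the Bijvoet pairs kept in the |D| set."""
    e_av, d, sig_d = (np.asarray(a, dtype=float) for a in (e_av, d, sig_d))

    keep = e_av > ecut                 # step 1: |F|av (as |E|) above ECUT
    keep &= d > dfcut * sig_d          # step 2: |D| > DFCUT * sigma(|D|)
    if not keep.any():
        return keep

    # Step 3 (assumed form): reject outliers relative to the r.m.s. of the
    # |D| values that passed steps 1 and 2.
    rms = np.sqrt(np.mean(d[keep] ** 2))
    keep &= d / rms <= outlier
    return keep

# Hypothetical data: 30 ordinary pairs, one outlier, one weak |E|, one imprecise |D|.
e_av = np.array([1.0] * 30 + [1.0, 0.1, 1.0])
d = np.array([1.0] * 30 + [20.0, 1.0, 0.1])
sig = np.ones(33)
mask = select_d_set(e_av, d, sig)
print(mask.sum(), "reflections kept out of", mask.size)
```

Applying the filters in this order means that weak or imprecise reflections do not influence the r.m.s. value used for the outlier test.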

3. The SAD-SMAR algorithm

3.1. The normalized X values

The SAD-SMAR algorithm uses, instead of the experimentally inaccessible quasi-normalized |E| values of the substructure (Main, 1976[Main, P. (1976). Crystallographic Computing Techniques, edited by F. R. Ahmed, K. Huml & B. Sedlácek, pp. 97-105. Copenhagen: Munksgaard.]), the normalized X values based on (6) and defined by the quotient

[{X^2} = {{{{\left| {F''} \right|}^2}{{\sin }^2}\psi } \over {{\langle{\left| {F''} \right|}^2}{{\sin }^2}{\psi}\rangle _s}}\eqno(9)]

with [\psi = \varphi ' - \varphi ''] and where s is the resolution shell corresponding to [| {F''} |^2]. Since [| {F''}|^2] and [{\sin ^2}\psi] may be assumed uncorrelated, the average term in the denominator can be decomposed into the product of [\langle| {F''} |^2\rangle_s] and [\langle\sin ^2\psi\rangle _s]. Furthermore, since [\varphi '] predominantly depends on the protein atoms and [\varphi ''] only on the anomalous scatterers, both phases can be considered largely uncorrelated and hence [\langle\sin ^2\psi\rangle _s] can be assumed to be 0.5, so that

[{X^2} = {{{{\left| {F''} \right|}^2}{{\sin }^2}\psi } \over {0.5\langle\left| {F''} \right|^2\rangle_s}}.\eqno(10)]
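The factor 0.5 in (10) can be made explicit with a one-line average. As an idealization (an assumption of this illustration, stronger than the 'largely uncorrelated' statement above), take ψ to be uniformly distributed between 0 and 2π within the shell; then

[\langle \sin^2\psi \rangle = {1 \over {2\pi}} \int_0^{2\pi} \sin^2\psi \,{\rm d}\psi = {1 \over {2\pi}} \int_0^{2\pi} {{1 - \cos 2\psi} \over 2}\,{\rm d}\psi = {1 \over 2}.]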

On the other hand, according to the |E| definition, [\langle| {F''} |^2\rangle_s] in (10) can be replaced by [| {F''} |^2/| {E''} |^2], so that the expression relating [X^2] and [| {E''} |^2] reduces to

[{X^2} = {\left| {E''} \right|^2} \times {{{{\sin }^2}\psi } \over {0.5}}.\eqno(11)]

If [X^2] is averaged over all reflections in its corresponding resolution shell s, then [\langle X^2\rangle_s] = 1, since [\langle| {E''} |^2\rangle_s] is 1 by definition and [\langle\sin ^2\psi\rangle_s] is 0.5.

In addition to X values, SAD-SMAR also uses modified X values called |Xm|. These are obtained (i) by calculating the M modulus function with X as Fourier coefficients (extending the sum to the H reflections), (ii) by suppressing the negative regions in M, and (iii) by back Fourier transforming the modified M function (Karle, 1980[Karle, J. (1980). Int. J. Quantum Chem.: Quantum Biol. Symp. 7, 357-367.]).

3.2. Calculation of X from |D|exp

The relation between X and |D| is easily found by introducing the square of (6) into (10)

[{X^2} = {{{{\left| D \right|}^2}} \over {2\langle\left| {F''} \right|^2\rangle_s}} = {{{k^2} {{\left({{{\left| D \right|}_{\rm exp}}} \right)}^2}} \over {2\langle\left| {F''} \right|^2\rangle_s}},\eqno(12)]

where k is the scaling constant putting [| D |_{\rm exp}] on the same scale as [| D |]. The [\langle| {F''} |^2\rangle_s] quantity in the denominator, i.e. the average intensity of the s shell, can be expressed as

[\langle\left| {F''} \right|^2\rangle_s = \exp\left[ - 2B\left({{{\sin{\theta _s}} \over \lambda }} \right)^2\right] \sum \limits_{j = 1}^{N_{\rm A}} {\left({f_{0j}^{''}} \right)^2},\eqno(13)]

where B is the overall atomic displacement parameter including vibrational and disorder effects. At this point, for convenience, each [f_{0j}^{\prime\prime}] will be converted to [q_j] by dividing by [f_{0L}^{\prime\prime}] (= the largest [f_{0j}^{\prime\prime}]). Replacement of [f_{0j}^{\prime\prime}] by [{q_j}f_{0L}^{\prime\prime}] in (13) and subsequent introduction of the modified (13) into (12) leads to the final expression

[{X^2} = {K^2}\exp\left[2B\left({{\sin{\theta _s}} \over \lambda } \right)^2\right]{{\left(\left| D \right|_{\rm exp}\right)^2} \over {\sum _{j = 1}^{N_{\rm A}}q_j^2}}\eqno(14)]

with

[K = k/\left(2^{1/2} f_{0L}^{\prime\prime}\right)\eqno(15)]

which allows the derivation of [X^2] from [(| D |_{\rm exp} )^2] provided that the AS composition is known. In view of (14), the estimation of the K constant and the B parameter can be obtained from a Wilson plot, since for each reciprocal-space shell, both [\langle X^2\rangle_s] and the [\langle(| D |_{\rm exp} )^2\rangle_s/\sum_{j = 1}^{N_{\rm A}} q_j^2] quotient are known.
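Averaging (14) over a shell and setting the shell average of X² to 1 gives ln[⟨(|D|exp)²⟩s/Σq²] = −ln K² − 2B(sin θs/λ)², i.e. a straight line in (sin θ/λ)². The following minimal sketch fits this line by least squares and then applies (14); the function names, the variable `stl` (standing for sin θ/λ) and the synthetic input are assumptions made for illustration and do not reproduce the XLENS_v1 implementation.

```python
import numpy as np

def wilson_fit(stl, d2_mean, q2_sum):
    """Estimate (K, B) from shell averages <|D|exp^2>_s at sin(theta)/lambda = stl."""
    stl = np.asarray(stl, dtype=float)
    y = np.log(np.asarray(d2_mean, dtype=float) / q2_sum)
    # Linear model: y = -ln(K^2) - 2 B stl^2
    slope, intercept = np.polyfit(stl**2, y, 1)
    B = -slope / 2.0
    K = np.exp(-intercept / 2.0)
    return K, B

def x_squared(d_exp, stl, K, B, q2_sum):
    """Equation (14): X^2 from |D|exp once K and B are known."""
    d_exp = np.asarray(d_exp, dtype=float)
    stl = np.asarray(stl, dtype=float)
    return K**2 * np.exp(2.0 * B * stl**2) * d_exp**2 / q2_sum

# Synthetic shells generated with hypothetical K = 0.05, B = 20 A^2, sum(q^2) = 12.
stl = np.linspace(0.05, 0.20, 10)
q2 = 12.0
d2 = q2 * np.exp(-2.0 * 20.0 * stl**2) / 0.05**2
K, B = wilson_fit(stl, d2, q2)
print(K, B)
```

With noise-free synthetic shells the fit returns the generating K and B; with real data the scatter of the shell averages about the line gives an impression of how well the single overall B describes the substructure.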

3.3. SAD-SMAR recycling

Phasing with the SMAR algorithm was first shown by Rius (2020[Rius, J. (2020). Acta Cryst. A76, 489-493.]). Later on, the ipp procedure, a simple way of enhancing peakness in Fourier maps, was added (Rius & Torrelles, 2021[Rius, J. & Torrelles, X. (2021). Acta Cryst. A77, 339-347.]). To show how the SAD-SMAR modification works, one phase refinement cycle is described in detail in Fig. 1. It has been divided into four stages, each one including one Fourier transform operation. These are:

[Figure 1]
Figure 1
The recursive SAD-SMAR phase refinement algorithm with enhanced peakness (ipp). Compared with the unmodified SMAR, the principal differences are the composition of Φh as well as the replacement of |E| values either by [X = 2^{1/2}| E''\sin\psi |] or by |Xm|.

(i) Calculation of the [\rho ''] density function. The phase refinement cycle begins with the introduction of Φh, the subset of [\varphi ''] phases of the h reflections to be refined (either initial or updated estimates). Unlike in non-anomalous SMAR applications where Φh contains the phases of all large reflections (i.e. those H reflections with |E| ≥ 1.00), in the case of SAD-SMAR, Φh only includes the [\varphi _h^{\prime\prime}] phases of those H reflections with X larger than a given XCUT cut-off (here XCUT = 1.00). Since [X/2^{1/2}] is equal to |E′′| |sin ψ|, the smallest possible value of |E′′| for a given X is [X/2^{1/2}] (which is reached for |sin ψ| = 1). In general, |sin ψ| will be lower than 1 and therefore [X/2^{1/2}] is a lower estimate of |E′′| (Grosse-Kunstleve & Adams, 2003[Grosse-Kunstleve, R. W. & Adams, P. D. (2003). Acta Cryst. D59, 1966-1973.]). How the composition of Φh depends on the X values is illustrated in Table 1 for XCUT = 1.00. It can be seen that most phases of reflections with |E′′|'s > 1.00 are present in Φh; however, this number decreases significantly for |E′′|'s between 1.00 and 0.70 and, finally, for |E′′|'s < 0.70, it becomes zero. In this work the initial estimates of [\varphi _h^{\prime\prime}] are the phase values corresponding to the Fourier coefficients of M′, i.e. the randomly shifted modulus function (Rius & Torrelles, 2021[Rius, J. & Torrelles, X. (2021). Acta Cryst. A77, 339-347.]). As can be seen in Fig. 1, the Fourier synthesis with [| X_{m,h} |\exp(i\varphi _h^{\prime\prime})] as Fourier coefficients gives the [\rho ''] density function from which the m′′ mask is derived (and stored). According to Rius (2020[Rius, J. (2020). Acta Cryst. A76, 489-493.]), m′′ is 1 (for [\rho ''] > 0), 0 (for [\rho ''] between 0 and −tσ) and −1 (for [\rho ''] < −tσ) with σ² being the variance of [\rho ''] (Φh) and t ∼ 2.65.

Table 1
Effect of XCUT on the composition of the Φh subset of phases

The central part of the table lists the |E′′|| sin ψ | products for selected |E′′| and |sin ψ| values (numbers in bold refer to XCUT = 1.00). As shown in the rightmost column, Φh contains no phases of reflections with |E′′| < 0.70; however, for |E′′| > 1.00, the percentage of reflections considered in Φh is very high, e.g. 85.56% for |E′′| = 2.

  | sin ψ |  
|E′′| 1.00 0.75 0.50 0.25 0.10 % in Φh
3.00 3.00 2.25 1.50 0.75 0.30 94.28
2.00 2.00 1.50 1.00 0.50 0.20 85.56
1.00 1.00 0.75 0.50 0.25 0.10 55.56
0.71 0.71 0.53 0.36 0.18 0.07 6.40
0.50 0.50 0.38 0.25 0.13 0.05 0.00
0.10 0.10 0.08 0.05 0.03 0.01 0.00

(ii) Calculation of the Fourier transform of |[\rho '']|. It gives the [|C_H^{\prime\prime}|\exp(i\alpha _H^{\prime\prime})] Fourier coefficients and provides the updated [\alpha _H^{\prime\prime}].

(iii) Calculation of [\delta _M^{\prime\prime}]. The [\delta _M^{\prime\prime}] density function is the inverse Fourier transform of the [[({X_H} - \langle X\rangle)\exp(i\alpha _H^{\prime\prime})]] coefficients formed by the experimental [{X_H} - \langle X\rangle] values and the updated [\alpha _H^{\prime\prime}] phases. The calculated [\delta _M^{\prime\prime}] is then multiplied with the previously stored m" mask to give the η product function.

(iv) Calculation of the Fourier transform of η. Peakness in η is enhanced by applying the ipp density modification procedure. Once completed, the modified η is Fourier-transformed to provide the new [\varphi _h^{\prime\prime}] and [| E_h^{\prime\prime} |] values, the latter being used in the calculation of the [{\rm CC}_h] figure-of-merit to follow the phase refinement convergence,

[{\rm CC}_h = \left[ {{\sum_h (|X_{m,h}| \times | E_h^{\prime\prime}|_{\rm new})^2} \over {\sum _h |X_{m,h}|^2 \times \sum_h | E_h^{\prime\prime} |_{\rm new}^2}} \right]^{1/2}.\eqno(16)]

If convergence is not achieved, the next cycle begins until the preset maximum number of cycles is reached.
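The three-valued m′′ mask derived in stage (i) and reused in stage (iii) can be sketched as follows. The function name is hypothetical, the density is assumed to be available as a real-valued grid array, and the remaining stages (Fourier transforms and the ipp modification) are deliberately left out of this illustration.

```python
import numpy as np

def ternary_mask(rho, t=2.65):
    """Return the m'' mask: 1 where rho > 0, -1 where rho < -t*sigma, 0 otherwise."""
    rho = np.asarray(rho, dtype=float)
    sigma = rho.std()                     # sigma^2 is the variance of the map
    mask = np.zeros(rho.shape, dtype=int)
    mask[rho > 0.0] = 1
    mask[rho < -t * sigma] = -1
    return mask

# Hypothetical one-dimensional 'map': weak positive and negative ripples
# plus a single strongly negative grid point.
rho = np.concatenate([np.full(50, 0.1), np.full(49, -0.1), [-10.0]])
m = ternary_mask(rho)
print(np.unique(m, return_counts=True))
```

In stage (iii) the product function η would then be obtained by multiplying the δ map point by point with this stored mask.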

4. Fourier refinement and figure-of-merit

After applying SAD-SMAR, the phases are further refined by Fourier recycling (five to ten cycles). In order not to have to modify the Fourier refinement module of already existing DM programs, e.g. of XLENS_v1 (Rius, 2011[Rius, J. (2011). XLENS_v1: a Computer Program for Solving Crystal Structures from Diffraction Data by Direct Methods. Institut de Ciència de Materials de Barcelona, CSIC, Spain, https://crystallography.icmab.es/software.]), the [F_n^{\prime\prime}] structure factor corresponding to a hypothetical structure with scatterers of [f_{0L}q_j] strengths is introduced ([f_{0L}] being the normal scattering factor of the scatterer with the largest anomalous correction, [f_{0L}^{\prime\prime}]). For this purpose, (11) and (14) are equated and both sides of the expression multiplied by [f_{0L}^2]. After rearranging the resulting expression, we obtain

[\eqalignno{&\left| E_n^{\prime\prime} \right|^2 \exp\left[ - 2B\left({{\sin{\theta _s}} \over \lambda } \right)^2\right]\left(\sum_{j = 1}^{N_{\rm A}} f_{0L}^2 q_j^2 \right) {{\sin ^2\psi } \over {0.5}}&\cr &= K^2f_{0L}^2\left(\left| D \right|_{\rm exp} \right)^2. &(17)}]

Notice that the first three factors of the left-hand side of (17) correspond to [| {F_n^{\prime\prime}} |^2]. Replacement of these by [| {F_n^{\prime\prime}} |^2] gives, after taking the square root, the best approximation Γ to the modulus of the structure factor

[\Gamma = \left| {F_n^{\prime\prime}} \right| {{\left| {\sin \psi } \right|} \over {\left({0.5} \right)^{1/2} }} = K{f_{0L}}{\left| D \right|_{\rm exp}}\eqno(18)]

which is used as observational data in the [(2\Gamma - | F_n^{\prime\prime} |_{\rm calc})\exp(i\varphi _{\rm calc}^{\prime\prime})] Fourier coefficients during recycling. At the end of the last Fourier refinement cycle, the (correlation coefficient based) residual is calculated

[{R_{\rm CC}} = 1000\,\left \{{1 - {{{{\left[{\sum _H \left({{\Gamma _H}\,{{\left| {F_{n,H}^{\prime\prime}} \right|}_{\rm new}}} \right)^{1/2} } \right]}^2}} \over {\left({\sum_H {\Gamma _H}\,} \right)\left({\sum _H {{\left| {F_{n,H}^{\prime\prime}} \right|}_{\rm new}}} \right)}}} \right\}\eqno(19)]

wherein the sums only include the H reflections with X ≥ 0.7.
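Equations (18) and (19) translate directly into a few lines of array code. The sketch below is illustrative only: the function names are hypothetical, and K and f0L are treated as given inputs (in a real calculation f0L would vary with resolution).

```python
import numpy as np

def gamma_obs(d_exp, K, f0L):
    """Equation (18): Gamma = K * f0L * |D|exp."""
    return K * f0L * np.asarray(d_exp, dtype=float)

def r_cc(gamma, f_new, x=None, xmin=0.7):
    """Equation (19); if x is given, only reflections with X >= xmin enter the sums."""
    gamma = np.asarray(gamma, dtype=float)
    f_new = np.asarray(f_new, dtype=float)
    if x is not None:
        sel = np.asarray(x, dtype=float) >= xmin
        gamma, f_new = gamma[sel], f_new[sel]
    num = np.sum(np.sqrt(gamma * f_new)) ** 2
    den = gamma.sum() * f_new.sum()
    return 1000.0 * (1.0 - num / den)

# Hypothetical amplitudes: proportional sets give 0, dissimilar sets a positive value.
g = np.array([1.0, 2.0, 3.0])
print(r_cc(g, 2.0 * g))
print(r_cc([1.0, 4.0], [4.0, 1.0]))
```

Because numerator and denominator scale in the same way, the residual is independent of the relative scale of Γ and the recalculated amplitudes, and by the Cauchy-Schwarz inequality it cannot be negative.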

5. Results of the test calculations

Relevant experimental information about the data sets used in the test calculations is given in Table 2. To improve the readability of the text, the test compounds are simply referenced with the appropriate PDB code. The verification of the SAD-SMAR tests was greatly facilitated by the availability of the refined model coordinates either kindly provided by the authors or deposited by them in the PDB. In this way, the r.m.s.d.'s between our substructure models and the deposited ones could be calculated. The most relevant results of the test calculations are summarized in Table 3. Table 4 complements this information by giving, for most test examples, the peak heights at the end of the Fourier recycling stage. Peak heights are always given in ρpeak/σ units, where ρpeak is the density at the peak centre and σ² is the variance of ρ.

Table 2
Relevant data collection parameters and indicators

¹Detailed author references in Section 5; ²main anomalous scatterers; ³redundancy of diffraction data taken from the published/deposited data (and later normalized to the point group order); ⁴highest resolution (in Å) of SAD data used in the structure refinement, with ⁵Rfree values from the respective authors; ⁶highest resolution (in Å) for SAD-SMAR application; ⁷Bijvoet ratio estimation; and ⁸〈|D|/σ(|D|)〉 calculations (in the whole range and in the outermost reciprocal-space shell).

PDB code¹  AS²    Space group  λ (Å)  Redundancy³  RESref⁴  Rfree⁵  RESSMAR⁶  〈|D|〉/〈|F|〉⁷  〈|D|/σ(|D|)〉⁸ whole  〈|D|/σ(|D|)〉⁸ outer
4jiu(a)    Zn     P2₁2₁2₁      1.282  1.63         1.60     –       2.50      0.0452         1.41                  1.36
5cx8(b)    Se     P2₁2₁2       0.979  1.68         2.40     0.208   3.00      0.0568         1.59                  0.80
4yu5(c)    Se     P2₁2₁2₁      0.979  1.10         2.90     0.207   3.30      0.0693         1.57                  0.90
5lac(d)    Se     P2₁2₁2       0.918  1.15         1.94     0.207   2.50      0.0696         2.51                  1.60
5iqy(e)    I      C222₁        1.542  6.83         2.40     0.234   3.00      0.0624         3.13                  1.69
3k9g(f)    I      P4₃2₁2       1.542  1.56         2.25     0.266   2.90      0.0433         2.68                  1.41
3km3(f)    I      R3(H)        1.542  1.87         2.10     0.222   2.80      0.0361         1.60                  1.01
3men(f)    I      P2₁2₁2₁      1.542  1.70         2.20     0.237   3.00      0.0466         1.71                  1.06
2g4h(g)    Cd     F432         2.000  3.04         2.00     0.218   2.90      0.0141         3.71                  1.46
4tno(h)    S, Cl  P4₁2₁2       2.066  9.53         2.14     0.305   2.60      0.0132         2.37                  1.12
4pgo(h)    S, Cl  P6₅22        2.066  8.58         2.30     0.203   3.00      0.0175         3.03                  1.21
2g4s(g)    S      P6₃22        2.000  2.86         2.15     0.323   3.20      0.0116         1.93                  1.47
(a) López-Pelegrín et al. (2013[López-Pelegrín, M., Cerdà-Costa, N., Martínez-Jiménez, F., Cintas-Pedrola, A., Canals, A., Peinado, J. R., Marti-Renom, M. A., López-Otín, C., Arolas, J. L. & Gomis-Rüth, F. X. (2013). J. Biol. Chem. 288, 21279-21294.]); (b) Goulas et al. (2016[Goulas, T., Garcia-Ferrer, I., Hutcherson, J. A., Potempa, B. A., Potempa, J., Scott, D. A. & Gomis-Rüth (2016). Mol. Oral Microbiol. 31, 472-485.]); (c) Arolas et al. (2016[Arolas, J. L., Goulas, T., Pomerantsev, A. P., Leppla, S. H. & Gomis-Rüth, F. X. (2016). Structure, 24, 25-36.]); (d) Kanitz et al. (2019[Kanitz, M., Blanck, S., Heine, A., Gulyaeva, A. A., Gorbalenya, A. E., Ziebuhr, J. & Diederich, N. E. (2019). Virology, 533, 21-33.]); (e) Krishna Das et al. (2016[Krishna Das, B., Kumar, A., Maindola, P., Mahanty, S., Jain, S. K., Reddy, M. K. & Arockiasamy, A. (2016). Biochem. Biophys. Res. Commun. 473, 1152-1157.]); (f) Abendroth et al. (2011[Abendroth, J., Gardberg, A. S., Robinson, J. I., Christensen, J. S., Staker, B. L., Myler, P. J., Stewart, L. J. & Edwards, T. E. (2011). J. Struct. Funct. Genomics, 12, 83-95.]); (g) Mueller-Dieckmann et al. (2007[Mueller-Dieckmann, C., Panjikar, S., Schmidt, A., Mueller, S., Kuper, J., Geerlof, A., Wilmanns, M., Singh, R. K., Tucker, P. A. & Weiss, M. S. (2007). Acta Cryst. D63, 366-380.]); (h) Weinert et al. (2013).

Table 3
Comparison of the SAD-SMAR phase refinement results for DFCUT = ∼0.4 and 0.0

¹Completeness as cD = ND/Nasy in % (ND = number of reflections in |D| set; Nasy = number of unique reflections); ²n.c.t. = number of converging (correct) trials out of 25; ³(average) number of cycles to reach convergence; ⁴,⁵final CCh and RCC values for correct solutions; ⁶number of sites found in the a.u. compared with published refined values; ⁷sep. = root-mean-square deviation in Å between found and published refined site positions.

PDB code  DFCUT  cD¹   n.c.t.²  ncycle³  CCh⁴  RCC⁵   nsites⁶        Sep.⁷
4jiu      0.375  71.2  25       5        0.91  39     1/1 Zn         0.15
          0.0    85.4  25       5        0.91  41
5cx8      0.375  71.6  25       12       0.88  59–61  12/12 Se       0.24
          0.0    86.6  25       10       0.88  62–64
4yu5      0.375  71.0  25       13       0.87  60–62  18/18 Se       0.35
          0.0    87.2  25       14       0.85  68–69
5lac      0.375  87.6  25       27       0.91  50–51  12/12 Se       0.18
          0.0    90.1  25       14       0.91  49
5iqy      0.450  76.2  8        <50      0.87  57–59  15/26 I        0.43
          0.0    86.0  13       <50      0.87  54–61
3k9g      0.375  73.8  21       <55      0.87  59–65  9/12 I         0.35
          0.0    81.8  20       <55      0.87  60–64
3km3      0.375  78.2  18       <36      0.87  65–69  13/16 I        0.43
          0.0    93.1  24       <45      0.87  68–71
3men      0.375  74.1  4        <100     0.88  55–58  33/35 I        0.24
          0.0    88.0  11       <55      0.88  57–58
2g4h      0.750  68.1  25       <50      0.87  57–61  5/5 Cd         0.22
                 80.8  25       <50      0.87  59–63
4tno      0.400  67.3  22       <30      0.88  51–54  4/3 S + 2 Cl   0.48
          0.0    79.2  20       <30      0.88  53–56
4pgo      0.375  70.9  22       <40      0.88  54–57  4/2 S + 2 Cl   0.56
          0.0    79.9  25       <40      0.87  55–59
2g4s      0.375  67.6  19       <125     0.88  57–60  3/4 S          0.18
          0.0    79.3  10       <125     0.88  59–62

Table 4
Heights of peaks in the final map of Fourier recycling for most test examples expressed in ρpeak/σ units (ρpeak = maximum peak density; σ² = variance of ρ)

The peaks in the a.u., ordered in decreasing height, are divided into two sets: A containing all correct signal peaks down to the first uninterpreted peak (only the heights of the first and last peaks are given, followed by the corresponding number of AS in brackets); B with mixed correct and uninterpreted peaks (with the heights of the latter in italics). According to these results, cut-off values of ρpeak/σ(ρ) for considering Fourier peaks as part of the substructure model can be set at around 5.0–7.0 (for soaked native crystals, they are slightly higher).

Code A B
5cx8 43.2 → 20.6 [12 Se] 5.6
4yu5 21.1 → 16.3 [18 Se +1 Zn] 15.0, 12.5, 7.0
5lac 48.0 → 12.4 [12 Se] 5.5
5iqy 17.8 → 7.0 [14 I] 6.8, 6.6, 6.3, 5.3, 5.3, 5.1, 5.0
3k9g 35.4 → 10.8 [8 I] 9.9, 9.2, 8.8
3km3 35.6 → 14.4 [9 I] 13.5, 12.5, 10.9, 10.7, 7.4, 7.2, 7.1
3men 36.3 → 8.7 [32 I] 8.1, 8.0, 7.6
4tno 17.0 → 11.0 [2 S, 2 Cl] 5.3
4pgo 23.9 → 9.6 [2S, 1 Cl] 8.1, 6.4, 6.3
2g4s 17.2 → 14.0 [3 S] 6.4, 5.9, 5.6

To get a rough idea of the quality of the deposited/supplied SAD refinements, the deposited Rfree values (listed in Table 2 together with the corresponding upper resolution limits, RESref) were compared with the median Rfree values of the PDB which are 0.24, 0.25, 0.26 and 0.28 for upper resolution limits corresponding to the intervals 1.95–2.15, 2.15–2.35, 2.35–2.40 and ∼2.90 Å (Read et al., 2011[Read, R. J., Adams, P. D., Arendall, W. B., Brunger, A. T., Emsley, P., Joosten, R. P., Kleywegt, G. J., Krissinel, E. B., Lütteke, T., Otwinowski, Z., Perrakis, A., Richardson, J. S., Sheffler, W. H., Smith, J. L., Tickle, I. J., Vriend, G. & Zwart, P. H. (2011). Structure, 19, 1395-1412.]). It is found that the Rfree values are less than or equal to the corresponding median Rfree values in all cases, except for 2g4s and 4tno, for which Rfree is significantly higher.

A preliminary test was the substructure solution of the proenzyme of proabylysin (PDB code 4jiu; a = 34.679, b = 44.896, c = 72.233 Å, P2₁2₁2₁). The data set was measured at ID29 (ESRF) at the Zn absorption edge (λ = 1.282 Å) (López-Pelegrín et al., 2013[López-Pelegrín, M., Cerdà-Costa, N., Martínez-Jiménez, F., Cintas-Pedrola, A., Canals, A., Peinado, J. R., Marti-Renom, M. A., López-Otín, C., Arolas, J. L. & Gomis-Rüth, F. X. (2013). J. Biol. Chem. 288, 21279-21294.]). The structure refinement (deposited by the same authors) contains one Zn ion, one macromolecule and 148 water molecules in the asymmetric unit (a.u.), amounting to 1055 atoms. The successful run of this simple case (separation between found and deposited Zn ion positions is ∼0.15 Å) confirmed the capability of SAD-SMAR to solve AS substructures at 2.5 Å resolution (B [\cong] 25.1 Å²). Next, the algorithm was tested on more challenging cases. To simplify the discussion, the test compounds are divided into three groups.

5.1. SeMet derivatives

Compared with other SAD situations, Se-SAD is particularly favourable because of the large AS strength of Se ([f_{0{\rm Se}}^{\prime\prime}] ∼3.9 and ∼3.3 e for λ = 0.979 and 0.919 Å, respectively) and because the substitution of S by Se in the methionine residues is normally complete. The data sets of the three tested SeMet derivatives correspond to:

5cx8: a = 56.64, b = 184.74, c = 144.31 Å, P2₁2₁2. A major immunodominant outer-membrane surface receptor antigen of Porphyromonas gingivalis measured at beamline (BL) XALOC (ALBA, Barcelona) (Goulas et al., 2016; Se derivative refinement deposited in PDB entry 5cx8; SAD data supplied by one of the authors). There are 12 Se positions, two macromolecules and 509 water molecules in the a.u., amounting to 8119 atoms. Application of SAD-SMAR yields the positions of the 12 Se atoms (B ≅ 2.3 Å²) with r.m.s.d. = 0.24 Å compared with the deposited refined model (Table 3).

4yu5: a = 97.61, b = 102.41, c = 242.88 Å, P2₁2₁2₁. Thuringilysin, a variant of zymogenic BaInhA2-E/A measured at BL XALOC (ALBA, Barcelona) (Arolas et al., 2016; Se derivative refinement deposited in PDB entry 4yu5; SAD data supplied by one of the authors). There are 18 Se, one Zn, two macromolecules and 104 water molecules in the a.u., amounting to 10 942 atoms. Application of SAD-SMAR supplies the positions of the 18 Se atoms (B ≅ 9.8 Å²) with r.m.s.d. = 0.35 Å. The Zn ion shows up in the Fourier map 1.06 Å from the deposited refined position. Its strength is similar to that of the two Se atoms with higher B values.

5lac: a = 94.144, b = 111.353, c = 58.191 Å, P2₁2₁2. A 3C-like protease of Cavalli virus collected at BL 14.2 (BESSY II, Berlin) (Kanitz et al., 2019; SAD and refinement data deposited in PDB entry 5lac). There are 12 Se positions (one of them split in the refinement), one macromolecule and 303 water molecules in the a.u., amounting to 4875 atoms. Application of SAD-SMAR yields the positions of the 12 Se atoms (B ≅ 4.9 Å²) with r.m.s.d. = 0.18 Å compared with the deposited model.

5.2. Native crystals soaked in heavy metal/metal cluster solutions

The first four cases of this subsection are native crystals soaked in a solution containing iodide ions, with diffraction data collected in-house on rotating anodes (Cu Kα radiation), where the anomalous signal of I is large ([f_{0{\rm I}}^{\prime\prime}] ∼6.9 e). The fifth case corresponds to crystals soaked in a Cd²⁺-containing solution.

5iqy: a = 40.89, b = 132.08, c = 97.57 Å, C222₁. An apo-dehydroascorbate reductase from Pennisetum glaucum (Krishna Das et al., 2016; SAD and refinement data deposited in PDB entry 5iqy). According to the deposited data, there are 26 sites occupied by a total of 13.3 I¹⁻, one macromolecule and 95 water molecules in the a.u. (1719 atoms). Application of SAD-SMAR yields 15 sites (B ≅ 45.5 Å²) containing 9.74 I¹⁻, which agree well with the deposited data (r.m.s.d. = 0.43 Å), as shown in Fig. 2. Table 5 compares the resulting site occupancies with the deposited ones.

Table 5
5iqy: list of top-ranked iodide site occupancies (≥0.40) obtained by applying the SAD-SMAR algorithm compared with those in the deposited refinement (Krishna Das et al., 2016) (see Fig. 2)

Only two peaks are missing. (Sep = separation between corresponding sites.)

Site No. Occ. SAD-SMAR Occ. LS Sep. (Å)
1 1.00 1.00 0.11
2 0.90 0.76 0.12
3 0.82 0.86 0.14
4 0.76 0.78 0.57
5 0.72 0.62 0.34
6 0.71 0.65 0.25
7 0.67 0.62 0.52
8 0.62 0.66 0.34
9 0.60 0.63 0.45
10 – 0.53 –
11 0.48 0.44 0.81
12 0.48 0.60 0.46
13 0.44 0.54 0.24
14 0.40 0.40 0.36
15 0.40 0.42 0.55
16 – 0.40 –
17 0.40 0.40 0.55
[Figure 2]
Figure 2
5iqy: (010) and (100) projections of the I¹⁻ site arrangement in the unit cell (only sites with occupancies ≥ 0.40): (violet) 120 (= 15 × 8) sites obtained by SAD-SMAR and Fourier recycling (which are also present in the deposited refinement; r.m.s.d. between found and deposited sites is 0.43 Å); (pink) 16 (= 2 × 8) additional sites only present in the deposited refinement (0.53 and 0.40 occupancies) (see Table 5).
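The quoted r.m.s.d. for 5iqy can be cross-checked from the separations listed in Table 5. The following is a minimal sketch, assuming that the r.m.s.d. is simply the root-mean-square of the 15 listed site separations (the text does not spell out the definition); the function name `rms` is illustrative only.

```python
import math

# Separations (in Å) between SAD-SMAR and deposited iodide sites, copied from
# the 15 rows of Table 5 that list a separation (sites 10 and 16 have none).
separations = [0.11, 0.12, 0.14, 0.57, 0.34, 0.25, 0.52, 0.34,
               0.45, 0.81, 0.46, 0.24, 0.36, 0.55, 0.55]


def rms(values):
    """Root-mean-square of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


rmsd = rms(separations)
print(f"{len(separations)} matched sites, r.m.s.d. = {rmsd:.2f} Å")
```

Under this assumption, the value rounds to 0.43 Å, in line with the figure given in the text.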

3k9g: a = 55.81, c = 200.90 Å, P4312. A plasmid partition protein (Abendroth et al., 2011; SAD and refinement data deposited in PDB entry 3k9g). According to the structure refinement deposited in the PDB, there are 12 I¹⁻ sites, one macromolecule and 91 water molecules in the a.u. (1858 atoms), with 6.6 I¹⁻ in the 12 sites. Application of SAD-SMAR yields nine coincident I¹⁻ sites (B ≅ 19.8 Å²) (r.m.s.d. = 0.35 Å), which account for a total of 5.3 I¹⁻, i.e. 81% of the refined I¹⁻ content. By normalizing the sum of the heights of the nine strongest Fourier peaks to 5.3, the respective found and deposited site occupancies (using the original site labelling) are I1: 0.91, 0.99; I2: 0.91, 1.00*; I3: 0.60, 0.61; I4: 0.50, 0.53; I5: 0.59, 0.38; I6: 0.54, 0.38; I7: 0.60, 0.53; I10: 0.32, 0.36; I12: 0.33, 0.29 (* truncated to 1.00).
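The normalization step used for 3k9g amounts to rescaling the Fourier peak heights so that they sum to the total iodide content. The sketch below illustrates this under stated assumptions: the function name `occupancies_from_peaks` and the peak heights (arbitrary units) are hypothetical, chosen only so that the scaled values reproduce the found occupancies listed above; the actual peak heights are not given in the text.

```python
def occupancies_from_peaks(heights, total):
    """Scale peak heights linearly so that the occupancies sum to `total`."""
    scale = total / sum(heights)
    return [h * scale for h in heights]


# Found occupancies quoted in the text for sites I1-I7, I10 and I12.
found = [0.91, 0.91, 0.60, 0.50, 0.59, 0.54, 0.60, 0.32, 0.33]

# Hypothetical peak heights in arbitrary units (NOT taken from the paper).
heights = [9.1, 9.1, 6.0, 5.0, 5.9, 5.4, 6.0, 3.2, 3.3]

occ = occupancies_from_peaks(heights, total=5.3)
print([round(o, 2) for o in occ])
print(f"sum of quoted occupancies = {sum(found):.2f}")
```

The quoted found occupancies do add up to 5.3, which is the constraint imposed by the normalization.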

3km3: a = 84.66, c = 140.74 Å, R3(H). A deoxycytidine triphosphate deaminase (Abendroth et al., 2011; SAD and refinement data deposited in PDB entry 3km3). The refinement in the PDB includes 16 I¹⁻ sites, two macromolecules and 516 water molecules in the a.u. (10 752 atoms), with 10.9 I¹⁻ in the 16 sites. SAD-SMAR gives 13 coincident I¹⁻ sites (B ≅ 6.2 Å²) (r.m.s.d. = 0.43 Å), which account for 87% of the refined I¹⁻ content. (Owing to the large variability of the individual isotropic B values of the metal sites, no attempt was made to estimate the site occupancies from the corresponding peak heights.)

3men: a = 45.70, b = 162.12, c = 173.07 Å, P2₁2₁2₁. An acetylpolyamine aminohydrolase (Abendroth et al., 2011; SAD and refinement data deposited in PDB entry 3men). According to the deposited data, there are 35 sites occupied by ∼23.1 I¹⁻, four macromolecules and 516 water molecules in the a.u. (10 825 atoms). Application of SAD-SMAR yields 33 coincident I¹⁻ sites (B ≅ 10.5 Å²) (r.m.s.d. = 0.24 Å), which account for ∼92% of the refined I¹⁻ content. The r.m.s.d. between the 33 found and corresponding deposited site occupancies is 0.172.

2g4h: a = 182.16 Å, F432. A Cd-containing apoferritin measured at BL X12 (EMBL/DESY, Hamburg) (Mueller-Dieckmann et al., 2007; SAD and refinement data deposited in PDB entry 2g4h). The anomalous signal of Cd²⁺ at λ = 2.00 Å is large ([f_{0{\rm Cd}}^{\prime\prime}] ∼7.2 e). According to the deposited refinement, the a.u. contains five Cd²⁺ sites (with occupancies > 0.10), two Cl¹⁻ sites, 101 water molecules and one apoferritin subunit (a macromolecule with 1374 atoms). Apoferritin is made up of 24 such protein subunits, which assemble to form a roughly spherical hollow shell with an external diameter of ∼120 Å and an internal diameter of ∼80 Å (Chrichton, 2019). The shell is placed at the nodes of the F lattice complex. Application of SAD-SMAR yields the five Cd²⁺ sites (B ≅ 33.7 Å²), with the found positions and occupancies close to the deposited values (r.m.s.d. between corresponding sites is 0.32 Å). The respective found and deposited occupancies (using the original site labelling) are Cd1: 0.50, 0.50; Cd2: 0.25, 0.25; Cd3: 0.14, 0.20; Cd4: 0.20, 0.18; Cd5: 0.14, 0.16. The Cd1 sites are located pairwise (∼8 Å separation) at the 12 vertices of a cubo-octahedron centred at (0, 0, 0) (with opposite vertices separated by ∼129 Å), i.e. close to the external diameter of the hollow shell. The same applies to Cd2, but with a somewhat longer intra-pair distance (∼13 Å) and a separation between opposite vertices of ∼75 Å, which roughly corresponds to the internal diameter of the hollow shell.

5.3. S-SAD phasing

The data sets of Pf1117 and Pf0907, two hypothetical proteins from Pyrococcus furiosus, were collected at BL X06DA at the Swiss Light Source (Weinert et al., 2015; the corresponding SAD and refinement information is deposited under the respective PDB codes 4tno and 4pgo).

4tno: a = 47.21, c = 82.28 Å; P4₁2₁2. According to the deposited data, its a.u. contains one macromolecule, three methionine S atoms and two Cl¹⁻ (709 atoms; [f_{0{\rm Cl}}^{\prime\prime}] ∼ 1.20 and [f_{0{\rm S}}^{\prime\prime}] ∼ 0.95 e). Application of SAD-SMAR yields the two Cl¹⁻ and two S atoms (B ≅ 47.9 Å²). The third (more disordered) S atom could not be located. The r.m.s.d. between found and deposited positions is 0.48 Å.

4pgo: a = 88.50, c = 73.12 Å; P6₅22. The deposited data indicate that, besides the macromolecule and water molecules, there are two methionine S atoms and two Cl¹⁻ in the a.u. (∼689 atoms). Application of SAD-SMAR leads to the same AS model (B ≅ 57.3 Å²) with r.m.s.d. = 0.56 Å.

The third and last example is the PB1 domain of the human scaffold protein NBR1 (Müller et al., 2006):

2g4s: a = 101.40, c = 42.59 Å; P6₃22. The data set was collected at BL X12 (EMBL/DESY, Hamburg) (Mueller-Dieckmann et al., 2007; SAD and refinement data deposited in PDB entry 2g4s). According to the deposited refinement, the a.u. contains, besides the macromolecule and the refined water molecules, four methionine S atoms (one of them with a higher B value) (689 atoms; [f_{0{\rm Cl}}^{\prime\prime}] ∼1.11 and [f_{0{\rm S}}^{\prime\prime}] ∼0.91 e). Application of SAD-SMAR shows the four expected S atoms (B ≅ 42.6 Å²), three of them as the three strongest Fourier peaks, with an r.m.s.d. of only 0.18 Å compared with the deposited model. The fifth-ranked Fourier peak corresponds to the fourth S atom (the one with the higher B value in the refinement) and is shifted by 1.1 Å from the deposited position. The fourth-ranked Fourier peak could not be assigned (it may correspond to a missing Cl¹⁻).

6. Conclusions

Based on the experimental conditions covered by the test examples, it may be concluded that SAD-SMAR can efficiently solve AS substructures from SAD data (i) with upper resolution limits (RESSMAR) between 2.50 and 3.3 Å; (ii) with average Bijvoet ratios of 0.065 (for SeMet derivatives), 0.014 (for S-SAD phasing) and 0.041 (for soaked native crystals); (iii) with [\langle| D |/\sigma (| D | )\rangle] values greater than 1.5; and (iv) with [\langle| D|/\sigma (| D |)\rangle] values for the outermost resolution shell ranging from 0.90 to 1.69 (the average being 1.25). The cut-off values of the various rejection criteria used in the tests were ECUT ≅ 0.25, r.m.s.d.(|D|) = 4 and DFCUT ∼ 0.4. The introduction of DFCUT ensures the suppression of the less reliable |D| values while keeping enough observations for a satisfactory DM run. The corresponding CCh values are close to 0.88 for converging trials (with the corresponding RCC values lying between 51 and 69). Since CCh values for non-converging trials are normally smaller by 0.02–0.03 (and RCC values are in general 1.3–1.4 times larger), identification of the correct trials should not be a problem. Notable is how quickly convergence is reached, especially for SeMet derivatives and for soaked native crystals. For native crystals with only S and/or Cl as anomalous scatterers, the test results indicate that SAD-SMAR can be applied successfully. In the three test structures, the S atoms belong to methionine residues and no disulfide bridges are present. Since SAD-SMAR only considers the lattice symmetry operations, it processes the initial phase estimates derived from the randomly shifted M′ modulus function quite efficiently (Rius & Torrelles, 2021). As shown in Table 4, the ρpeak/σ limit for considering the peaks at the end of Fourier recycling as part of the structure model can usually be set between 5.0 and 7.0.
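The separation between converging and non-converging trials described above lends itself to a simple automatic check. The following is only an illustrative sketch, not the procedure implemented in SAD-SMAR: the function name `flag_converged`, the two tolerance parameters and the trial values are all hypothetical, with the example numbers chosen to mimic the reported behaviour (CCh near 0.88 and RCC between 51 and 69 for converging trials; CCh lower by about 0.03 and RCC about 1.3–1.4 times larger otherwise). It also assumes that at least one trial in the batch has converged.

```python
def flag_converged(trials, cc_margin=0.01, rcc_factor=1.15):
    """Return the indices of trials whose figures of merit match the best one.

    trials: list of (cc_h, r_cc) pairs, one pair per phase-refinement trial.
    cc_margin, rcc_factor: hypothetical tolerances, not taken from the paper.
    """
    best_cc = max(cc for cc, _ in trials)
    best_rcc = min(rcc for _, rcc in trials)
    return [i for i, (cc, rcc) in enumerate(trials)
            if cc >= best_cc - cc_margin and rcc <= rcc_factor * best_rcc]


# Hypothetical (CCh, RCC) values for four trials.
trials = [(0.850, 78.0), (0.881, 58.0), (0.853, 80.5), (0.879, 59.0)]
print(flag_converged(trials))  # indices of the trials flagged as converged
```

With these made-up inputs, the second and fourth trials (indices 1 and 3) are flagged. In practice the tolerances would need to be tuned to the data at hand.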

To evaluate the influence of the DFCUT value on the phase refinement results, the test calculations were repeated with DFCUT = 0.0 and the results are included in Table 3 for comparison. For converging trials, the CCh values are similar (∼0.88) and the RCC values are 2 or 3 units larger (an increase which is to be expected, since the less reliable |D| values enter the calculation). The comparison of the number of converging trials (n.c.t.) for both series of calculations indicates that DFCUT ∼ 0.4 gives significantly higher n.c.t. values only for 2g4s and 4tno (by factors of 1.89 and 1.10, respectively). This is surely related to their higher Rfree values (0.323 and 0.305, respectively) when compared with the median Rfree value of the PDB (0.265).

One characteristic of SAD-SMAR is the delivery of almost complete models when it converges. Besides poor quality of the experimental data, the most probable causes of non-convergence are functional limitations of the model description, e.g. when the resolution of the data is insufficient to resolve the AS peaks in the Fourier map. Fortunately, because of the large separation between anomalous scatterers, this limitation is generally not a problem. However, at intermediate resolutions (>2.0 Å), the presence of disulfide bridges in proteins, e.g. between cysteine residues, represents a limitation of the otherwise highly effective ipp procedure (the approximate spherical symmetry of individual S Fourier peaks is lost in the overlapped S–S peak). This problem has already been addressed in SHELXD (Usón & Sheldrick, 2018; Sheldrick, 2010). Adapting the ipp philosophy to the treatment of disulfide bridges would considerably expand the scope of SAD-SMAR in S-SAD phasing.

APPENDIX A

By considering [i = \exp[i(\pi / 2)]] in expression (2), this can be written as

[{F_H} =\left| {F_H^\prime} \right|\exp(i\varphi _H^\prime) + \left| {F_H^{\prime\prime}} \right|\exp\left[i\left(\varphi _H^{\prime\prime} + {\pi \over 2} \right)\right].\eqno(20)]

Multiplication of FH and [{F_{ - H}}] by their respective complex conjugates gives, after some algebraic manipulation (Ramachandran & Srinivasan, 1970[Ramachandran, G. N. & Srinivasan, R. (1970). Fourier Methods in Crystallography. New York: John Wiley & Sons Inc.]),

[\left| {F_{ \pm H}} \right|^2 = \left| {F_H^\prime} \right|^2 + \left| {F_H^{\prime\prime}} \right|^2 \pm 2\left| {F_H^\prime} \right|\left| {F_H^{\prime\prime}} \right| \sin\left(\varphi _H^\prime - \varphi _H^{\prime\prime} \right).\eqno(21)]

Addition of [| {F_{ + H}} |^2] and [| {F_{ - H}}|^2] leads to

[2\left| {F_H^\prime} \right|^2= \left| {F_{ + H}} \right|^2 + \left| {F_{ - H}} \right|^2 - 2\left| {F_H^{\prime\prime}} \right|^2.\eqno(22)]

Likewise, by calculating their difference, the expression

[\left| {F_{ + H}} \right|^2 - \left| {F_{ - H}} \right|^2 = 4\left| {F_H^\prime} \right|\left| {F_H^{\prime\prime}} \right|\sin\left(\varphi _H^\prime - \varphi _H^{\prime\prime} \right)\eqno(23)]

is obtained which when squared yields

[8\left| {F_H^{\prime\prime}} \right|^2\sin^2\left(\varphi _H^\prime - \varphi _H^{\prime\prime} \right) = {{\left(\left| F_{ + H} \right|^2 - \left| F_{ - H} \right|^2 \right)^2} \over {2\left| {F_H^\prime} \right|^2}}\eqno(24)]

[= \left| {D_H} \right|^2{{\left| F_{ + H} \right|^2 + \left| F_{ - H} \right|^2 + 2\left| F_{ + H} \right|\left| F_{ - H} \right|} \over {\left| F_{ + H} \right|^2 + \left| F_{ - H} \right|^2 - 2\left| {F_H^{\prime\prime}} \right|^2}}\eqno(25)]

wherein

[{D_H} = \left| F_{ + H} \right| - \left| F_{ - H} \right|.\eqno(26)]

By squaring (26), it follows, after rearranging terms, that [2| F_{ + H}|| F_{ - H}| = | F_{ + H}|^2 + | F_{ - H}|^2 - | {D_H} |^2]. Replacement of [2| F_{ + H}|| F_{ - H} |] in (25) gives

[8\left| {F_H^{\prime\prime}} \right|^2\sin^2\left(\varphi _H^\prime - \varphi _H^{\prime\prime} \right) = 2\left| {{D_H}} \right|^2{{\left| F_{ + H} \right|^2 + \left| F_{ - H} \right|^2 - \left| D_H \right|^2/2} \over {\left| F_{ + H} \right|^2 + \left| F_{ - H} \right|^2 - 2\left| {F_H^{\prime\prime}} \right|^2}}.\eqno(27)]

Finally, for Bijvoet pairs satisfying the conditions

[\left| F_{ + H} \right|^2 + \left| F_{ - H} \right|^2 \,\gg \left| D_H \right|^2/2 \eqno(28)]

[\left| F_{ + H} \right|^2 + \left| F_{ - H} \right|^2 \,\gg 2\left| {F_H^{\prime\prime}} \right|^2 \eqno(29)]

the fractional term in (27) tends to 1, so that

[4\left| {F_H^{\prime\prime}} \right|^2\sin^2\left(\varphi _H^\prime - \varphi _H^{\prime\prime} \right) \simeq \left| {{D_H}} \right|^2 \eqno(30)]

holds.

This expression is also valid for structures containing anomalous scatterers of different types.
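The derivation in this appendix can be checked numerically on a toy model. The sketch below is an illustration under stated assumptions, not part of SAD-SMAR: the atom counts, the scattering weights (7.0 for light atoms, 34.0 and 3.9 for the real and imaginary contributions of an Se-like scatterer) and the random coordinates are all hypothetical. It evaluates, for a few reflections, the left-hand side of (30), the value of |D_H|² and the fractional term of (27); by (27), the first equals the product of the other two, and (30) is a good approximation whenever the fraction is close to 1.

```python
import cmath
import math
import random

random.seed(0)  # fixed seed so that the toy model is reproducible


def structure_factor(hkl, atoms):
    """Sum of weight * exp(2*pi*i*(h*x + k*y + l*z)) over (weight, xyz) atoms."""
    h, k, l = hkl
    return sum(w * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
               for w, (x, y, z) in atoms)


def random_xyz():
    """Random fractional coordinates."""
    return (random.random(), random.random(), random.random())


# Hypothetical toy structure: 200 light atoms and 4 anomalous scatterers.
light = [(7.0, random_xyz()) for _ in range(200)]
heavy_xyz = [random_xyz() for _ in range(4)]
normal = light + [(34.0, r) for r in heavy_xyz]   # contributes to F'_H
anomalous = [(3.9, r) for r in heavy_xyz]         # contributes to F''_H


def check(hkl):
    """Return (4|F''|^2 sin^2(phi' - phi''), |D_H|^2, fraction in (27))."""
    f1 = structure_factor(hkl, normal)        # F'_H
    f2 = structure_factor(hkl, anomalous)     # F''_H
    f_plus = f1 + 1j * f2                     # F_{+H}
    f_minus = f1.conjugate() + 1j * f2.conjugate()   # F_{-H}
    d = abs(f_plus) - abs(f_minus)            # D_H as defined in (26)
    s = abs(f_plus) ** 2 + abs(f_minus) ** 2
    lhs = 4 * abs(f2) ** 2 * math.sin(cmath.phase(f1) - cmath.phase(f2)) ** 2
    fraction = (s - d * d / 2) / (s - 2 * abs(f2) ** 2)
    return lhs, d * d, fraction


for hkl in [(1, 2, 3), (2, 0, 5), (4, 1, 1)]:
    lhs, d2, fraction = check(hkl)
    print(hkl, f"lhs = {lhs:.3f}  |D|^2 = {d2:.3f}  fraction = {fraction:.4f}")
```

The same check applies unchanged to a mixture of anomalous scatterers, since only the list of weights in `anomalous` would differ.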

Acknowledgements

The support and advice of Professor Xavier Gomis-Rüth (IBMB, CSIC) are highly appreciated. Dr Oriol Vallcorba (ALBA Synchrotron, Barcelona) and the two anonymous referees are also acknowledged for their valuable suggestions.

Funding information

The following funding is acknowledged: Project RTI2018-098537-B-C21 funded by MCIN/AEI/10.13039/501100011033/ and by ‘ERDF A way of making Europe’; Severo Ochoa FUNFUTURE (CEX2019-000917-S) funded by MCIN/AEI/10.13039/501100011033/.

References

Abendroth, J., Gardberg, A. S., Robinson, J. I., Christensen, J. S., Staker, B. L., Myler, P. J., Stewart, L. J. & Edwards, T. E. (2011). J. Struct. Funct. Genomics, 12, 83–95.
Arolas, J. L., Goulas, T., Pomerantsev, A. P., Leppla, S. H. & Gomis-Rüth, F. X. (2016). Structure, 24, 25–36.
Chrichton, R. (2019). Biological Inorganic Chemistry: a New Introduction to Molecular Structure and Function, 3rd ed. Amsterdam: Elsevier B. V.
Cianci, M., Helliwell, J. R. & Suzuki, A. (2008). Acta Cryst. D64, 1196–1209.
Debaerdemaeker, T., Germain, G., Main, P., Tate, C. & Woolfson, M. M. (1987). MULTAN87. A System of Computer Programs for the Automatic Solution of Crystal Structures from X-ray Diffraction Data. University of York, England.
Einspahr, H., Suguna, K., Suddath, F. L., Ellis, G., Helliwell, J. R. & Papiz, M. Z. (1985). Acta Cryst. B41, 336–341.
Giacovazzo, C. (2014). Phasing in Crystallography: a Modern Perspective. Oxford University Press.
Goulas, T., Garcia-Ferrer, I., Hutcherson, J. A., Potempa, B. A., Potempa, J., Scott, D. A. & Gomis-Rüth, F. X. (2016). Mol. Oral Microbiol. 31, 472–485.
Grosse-Kunstleve, R. W. & Adams, P. D. (2003). Acta Cryst. D59, 1966–1973.
Grosse-Kunstleve, R. W. & Brunger, A. T. (1999). Acta Cryst. D55, 1568–1577.
Hendrickson, W. A., Smith, J. L., Phizackerley, R. P. & Merritt, E. A. (1988). Proteins, 4, 77–88.
Hendrickson, W. A. & Teeter, M. M. (1981). Nature, 290, 107–113.
Kanitz, M., Blanck, S., Heine, A., Gulyaeva, A. A., Gorbalenya, A. E., Ziebuhr, J. & Diederich, N. E. (2019). Virology, 533, 21–33.
Karle, J. (1980). Int. J. Quantum Chem.: Quantum Biol. Symp. 7, 357–367.
Krishna Das, B., Kumar, A., Maindola, P., Mahanty, S., Jain, S. K., Reddy, M. K. & Arockiasamy, A. (2016). Biochem. Biophys. Res. Commun. 473, 1152–1157.
Leonarski, F., Redford, S., Mozzanica, A., Lopez-Cuenca, C., Panepucci, E., Nass, K., Ozerov, D., Vera, L., Olieric, V., Buntschu, D., Schneider, R., Tinti, G., Froejdh, E., Diederichs, K., Bunk, O., Schmitt, B. & Wang, M. (2018). Nat. Methods, 15, 799–804.
López-Pelegrín, M., Cerdà-Costa, N., Martínez-Jiménez, F., Cintas-Pedrola, A., Canals, A., Peinado, J. R., Marti-Renom, M. A., López-Otín, C., Arolas, J. L. & Gomis-Rüth, F. X. (2013). J. Biol. Chem. 288, 21279–21294.
Main, P. (1976). Crystallographic Computing Techniques, edited by F. R. Ahmed, K. Huml & B. Sedlácek, pp. 97–105. Copenhagen: Munksgaard.
Miller, R., Gallo, S. M., Khalak, H. G. & Weeks, C. M. (1994). J. Appl. Cryst. 27, 613–621.
Mueller-Dieckmann, C., Panjikar, S., Schmidt, A., Mueller, S., Kuper, J., Geerlof, A., Wilmanns, M., Singh, R. K., Tucker, P. A. & Weiss, M. S. (2007). Acta Cryst. D63, 366–380.
Mukherjee, A. K., Helliwell, J. R. & Main, P. (1989). Acta Cryst. A45, 715–718.
Müller, S., Kursula, I., Zou, P. & Wilmanns, M. (2006). FEBS Lett. 580, 341–344.
Nass, K., Cheng, R., Vera, L., Mozzanica, A., Redford, S., Ozerov, D., Basu, S., James, D., Knopp, G., Cirelli, C., Martiel, I., Casadei, C., Weinert, T., Nogly, P., Skopintsev, P., Usov, I., Leonarski, F., Geng, T., Rappas, M., Doré, A. S., Cooke, R., Nasrollahi Shirazi, S., Dworkowski, F., Sharpe, M., Olieric, N., Bacellar, C., Bohinc, R., Steinmetz, M. O., Schertler, G., Abela, R., Patthey, L., Schmitt, B., Hennig, M., Standfuss, J., Wang, M. & Milne, C. J. (2020). IUCrJ, 7, 965–975.
Ramachandran, G. N. & Srinivasan, R. (1970). Fourier Methods in Crystallography. New York: John Wiley & Sons Inc.
Read, R. J., Adams, P. D., Arendall, W. B., Brunger, A. T., Emsley, P., Joosten, R. P., Kleywegt, G. J., Krissinel, E. B., Lütteke, T., Otwinowski, Z., Perrakis, A., Richardson, J. S., Sheffler, W. H., Smith, J. L., Tickle, I. J., Vriend, G. & Zwart, P. H. (2011). Structure, 19, 1395–1412.
Rius, J. (2011). XLENS_v1: a Computer Program for Solving Crystal Structures from Diffraction Data by Direct Methods. Institut de Ciència de Materials de Barcelona, CSIC, Spain, https://crystallography.icmab.es/software
Rius, J. (2020). Acta Cryst. A76, 489–493.
Rius, J. & Torrelles, X. (2021). Acta Cryst. A77, 339–347.
Rossmann, M. G. (1961). Acta Cryst. 14, 383–388.
Schneider, T. R. & Sheldrick, G. M. (2002). Acta Cryst. D58, 1772–1779.
Sheldrick, G. M. (2010). Acta Cryst. D66, 479–485.
Sheldrick, G. M. & Gould, R. O. (1995). Acta Cryst. B51, 423–431.
Usón, I. & Sheldrick, G. M. (2018). Acta Cryst. D74, 106–116.
Wang, B. C. (1985). Methods Enzymol. 115, 90–112.
Weinert, T., Olieric, V., Waltersperger, S., Panepucci, E., Chen, L., Zhang, H., Zhou, D., Rose, J., Ebihara, A., Kuramitsu, S., Li, D., Howe, N., Schnapp, G., Pautsch, A., Bargsten, K., Prota, A. E., Surana, P., Kottur, J., Nair, D. T., Basilico, F., Cecatiello, V., Pasqualato, S., Boland, A., Weichenrieder, O., Wang, B. C., Steinmetz, M. O., Caffrey, M. & Wang, M. (2015). Nat. Methods, 12, 131–133.
Wilson, K. S. (1978). Acta Cryst. B34, 1599–1608.

This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.
