research papers

FOUNDATIONS
ADVANCES
ISSN: 2053-2733

Updating direct methods III. Reduction of structural complexity when first-rank semi-invariants are estimated via the Patterson map

Istituto di Cristallografia, Consiglio Nazionale delle Ricerche (CNR), Via G. Amendola 122/o, Bari, 70126, Italy
*Correspondence e-mail: [email protected]

Edited by L. Palatinus, Czech Academy of Sciences, Czechia (Received 23 August 2024; accepted 11 April 2025; online 22 May 2025)

A new theory for the probabilistic estimation of first-rank one-phase semi-invariants is presented. In this approach, atomic positions are treated as primitive random variables but are constrained by the a priori knowledge of interatomic vectors. This information is always available, thus allowing the new technique to be considered an ab initio probabilistic method conditioned by the knowledge of the Patterson map. The theoretical foundation for the estimation of triplet invariants was outlined in the first paper of this series [Giacovazzo (2019). Acta Cryst. A75, 142–157]. Subsequent experimental tests, shown in the second paper of this series [Burla et al. (2024). J. Appl. Cryst. 57, 1011–1022], have demonstrated the significant superiority of this new approach over existing methods. The improvements were so notable that it has been suggested this technique could be valuable for the ab initio solution of macromolecular structures. This work expands the probabilistic approach to include the estimation of first-rank one-phase semi-invariants. The hope is that they can contribute to the ab initio solution of macromolecular structures. Only in this way can one-phase semi-invariants go from being a historical curiosity to an effective tool for solving macromolecular structures.

1. Notation

DM, direct methods.

ASU, asymmetric unit.

Cs = (Rs, Ts), for s = 1,…, m. m is the number of symmetry operators Cs. Rs is the rotational part, Ts is the translational part.

t, the number of atoms in the asymmetric unit.

N = mt, the number of atoms in the unit cell.

fj, j = 1, …, N, atomic scattering factors (thermal factor included).

Zj, atomic number of the jth atom.

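\(F_{\mathbf{h}} = \sum_{j=1}^{N} f_j \exp(2\pi i\,\mathbf{h}\cdot\mathbf{r}_j) = |F_{\mathbf{h}}| \exp(i\varphi_{\mathbf{h}})\), the structure factor.

\(E_{\mathbf{h}} = |E_{\mathbf{h}}| \exp(i\varphi_{\mathbf{h}})\), the normalized structure factor of F. For equal-atom structures

\[E_{\mathbf{h}} = \frac{1}{\sqrt{N}} \sum_{j=1}^{N} \exp(2\pi i\,\mathbf{h}\cdot\mathbf{r}_j)\]

\(\mathbf{u}_{ij} = \mathbf{r}_i - \mathbf{r}_j\), interatomic vectors between the ith and the jth atoms.

H = (HKL) = \(\mathbf{h}(\mathbf{I} - \mathbf{R}_p)\), (HKL) are the indices of a generic one-phase structure semi-invariant of first rank, and (h, k, l) are the indices of the reflection h.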

1-ss, abbreviation for one-phase semi-invariant of the first rank.

Paper I: Giacovazzo (2019[Giacovazzo, C. (2019). Acta Cryst. A75, 142-157. ]).

Paper II: Burla et al. (2024[Burla, M. C., Giacovazzo, C. & Polidori, G. (2024). J. Appl. Cryst. 57, 1011-1022.]).

2. Introduction

The primary characteristics of 1-ss and their possible pivotal role in methods for addressing the phase problem in crystallography can be fully appreciated only when their unique properties are understood. To facilitate comprehension, let us begin with a concise overview of these properties.

A shift in the origin affects the translational components of symmetry operators without altering their rotational elements. This distinction was noted by Hauptman & Karle (1953[Hauptman, H. & Karle, J. (1953). The Solution of the Phase Problem. I. The Centrosymmetrical Crystal. ACA Monograph No. 3. New York: Polycrystal Book Service.], 1956[Hauptman, H. & Karle, J. (1956). Acta Cryst. 9, 45-55.]) in their seminal works. Consequently, any origin shift may modify the algebraic expression of the structure factors. For instance, in the space group P2/m, shifting the origin from an inversion centre to a point on a twofold axis results in a change in the algebraic form of the structure factor. However, moving the origin from one inversion centre to another does not result in any change. It can be said that the eight origins of the space group P2/m fall into the same equivalence class, while origins located on the twofold axes belong to another equivalence class.

All the origins allowed by a fixed functional form of the structure factors are called allowed origins and they are connected by the translational vectors \(\mathbf{X}_p\), called allowed translations, defined (Giacovazzo, 1974[Giacovazzo, C. (1974). Acta Cryst. A30, 390-395.]) by

\[(\mathbf{R}_s - \mathbf{I})\mathbf{X}_p = \mathbf{V}, \quad s = 1, \ldots, m, \tag{1}\]

where I is the identity matrix and V is a vector with zero or integer components. For centred space groups V may be a centring vector. The question of whether there are reflections whose phases do not vary under an allowed translation was quickly resolved: these reflections are known as structure semi-invariants. From a geometrical perspective, the necessary and sufficient condition for a phase \(\varphi_{\mathbf{h}}\) to be a structure semi-invariant is that the lattice planes (hkl) must contain all the origins permitted by the functional form of the structure factor. This unique characteristic enables the estimation of 1-ss phases from the observed diffraction amplitudes. No other type of reflection exhibits this particular characteristic.

The theory of representations (Giacovazzo, 1977[Giacovazzo, C. (1977). Acta Cryst. A33, 933-944.], 1980[Giacovazzo, C. (1980). Acta Cryst. A36, 704-711.]) subdivided the semi-invariants into two classes. Referring to the one-phase structure semi-invariants only, \(\varphi_{\mathbf{H}}\) is a structure semi-invariant of first rank (in our notation 1-ss) if its vectorial index H satisfies the relation

\[\mathbf{H} = \mathbf{h}(\mathbf{I} - \mathbf{R}_p) \tag{2}\]

for at least one (in this case the pth) of the m rotation matrices. There are \((m - 1)\) types of 1-ss for each space group, not all different from each other. Equation (2) is a constraint not only for the vectorial index H but also for the indices h: it is indeed a diophantine equation. For example, in \(P2_12_12_1\) the 1-ss reflections H have indices (0, 2k, 2l), (2h, 0, 2l) and (2h, 2k, 0), while the indices h are of type (fkl), (hfl) and (hkf), respectively, where f is a free integer. So, not only a single reflection, but a set of reflections h must satisfy equation (2) for each fixed H. We will denote by \(\{\mathbf{h}\}\) the set of reflections h that satisfy equation (2). A list of conditions defining the 1-ss for any space group has been given by Giacovazzo (1998[Giacovazzo, C. (1998). Direct Phasing in Crystallography. Oxford University Press.]) on p. 259.
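The index relation quoted for \(P2_12_12_1\) can be checked with a few lines of Python. This is only an illustrative sketch: it assumes the relation H = h(I − R_p) with h treated as a row vector, and the names `ROTATIONS`, `row_times_matrix` and `semi_invariant_index` are hypothetical helpers, not part of any crystallographic package.

```python
# Rotation parts of the three twofold screw axes of P2_1 2_1 2_1
# (the identity is omitted because it gives H = 0).
ROTATIONS = {
    "2_1 along z": [[-1, 0, 0], [0, -1, 0], [0, 0, 1]],
    "2_1 along y": [[-1, 0, 0], [0, 1, 0], [0, 0, -1]],
    "2_1 along x": [[1, 0, 0], [0, -1, 0], [0, 0, -1]],
}


def row_times_matrix(h, R):
    """Row vector h times the 3x3 matrix R (the product hR)."""
    return tuple(sum(h[i] * R[i][j] for i in range(3)) for j in range(3))


def semi_invariant_index(h, R):
    """Return H = h(I - R) = h - hR for an integer index triple h."""
    hR = row_times_matrix(h, R)
    return tuple(a - b for a, b in zip(h, hR))


if __name__ == "__main__":
    h = (3, 5, 7)
    for name, R in ROTATIONS.items():
        # Prints (6, 10, 0), (6, 0, 14) and (0, 10, 14) in turn.
        print(name, semi_invariant_index(h, R))
```

For the screw axis along z the third index of h drops out of H, which is why it plays the role of the free index f in the text; the same holds for the second and first index with the axes along y and x.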

The one-phase semi-invariants of second rank have indices which satisfy equation (1), but it is impossible to find indices h which satisfy equation (2). For example, in \(P2_12_12_1\) reflections of type (eee) with e ≠ 0 are semi-invariants of the second rank.

This article focuses exclusively on 1-ss. Their importance for DM lies in the possibility of estimating the \(\varphi_{\mathbf{H}}\) value directly from the amplitudes of the structure factors with indices h. In fact, each 1-ss with vectorial index H enters a set of triplet invariants of the type

\[\Phi = \varphi_{\mathbf{H}} - \varphi_{\mathbf{h}} + \varphi_{\mathbf{h}\mathbf{R}_p}\]

where h usually involves a free index. Estimating the triplet phase \(\Phi\) is equivalent to estimating \(\varphi_{\mathbf{H}}\). Unfortunately, despite various mathematical approaches, a sufficiently accurate estimate of the \(\varphi_{\mathbf{H}}\) values has never been achieved using classical DM. This has relegated 1-ss to a historical curiosity rather than a useful tool for structural solution, even for small molecules.
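The equivalence between the special triplet and the 1-ss phase can be made explicit in one step. The lines below are a sketch under two assumptions that are not spelled out at this point of the text: that the triplet has the form \(\Phi = \varphi_{\mathbf{H}} - \varphi_{\mathbf{h}} + \varphi_{\mathbf{h}\mathbf{R}_p}\) with \(\mathbf{H} = \mathbf{h}(\mathbf{I} - \mathbf{R}_p)\), and that symmetry-equivalent phases obey the usual relation \(\varphi_{\mathbf{h}\mathbf{R}_p} = \varphi_{\mathbf{h}} - 2\pi\,\mathbf{h}\cdot\mathbf{T}_p\).

```latex
\begin{align*}
\Phi &= \varphi_{\mathbf{H}} - \varphi_{\mathbf{h}} + \varphi_{\mathbf{h}\mathbf{R}_p} \\
     &= \varphi_{\mathbf{H}} - \varphi_{\mathbf{h}}
        + \left(\varphi_{\mathbf{h}} - 2\pi\,\mathbf{h}\cdot\mathbf{T}_p\right) \\
     &= \varphi_{\mathbf{H}} - 2\pi\,\mathbf{h}\cdot\mathbf{T}_p .
\end{align*}
```

Under these assumptions the unknown phase \(\varphi_{\mathbf{h}}\) cancels, and \(\Phi\) differs from \(\varphi_{\mathbf{H}}\) only by the known shift \(2\pi\,\mathbf{h}\cdot\mathbf{T}_p\), so an estimate of one is an estimate of the other.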

Giacovazzo (2019[Giacovazzo, C. (2019). Acta Cryst. A75, 142-157. ]) introduced a new perspective in Paper I of this series. His seminal approach maintains the use of atomic positions as primitive random variables, similar to the classical DM, but now these positions cannot freely span the full unit cell – they must conform to interatomic vectors observable in a Patterson map. In this new framework probabilistic formulas were developed for estimating triplet invariant phases.

Burla et al. (2024[Burla, M. C., Giacovazzo, C. & Polidori, G. (2024). J. Appl. Cryst. 57, 1011-1022.]), in what will henceforth be referred to as Paper II, were the first to verify the reliability of these new formulas. The belief was that the conformity with a Patterson map is a strong source of prior information, which is capable, at least from a theoretical point of view, of improving estimates of triplet invariants compared with probabilistic approaches that do not use the same constraint.

Burla et al.'s (2024[Burla, M. C., Giacovazzo, C. & Polidori, G. (2024). J. Appl. Cryst. 57, 1011-1022.]) findings demonstrated that the quality of the triplet cosines estimated as positive, informed by prior knowledge of the Patterson map, significantly surpasses the quality of the estimates made by Cochran (1955[Cochran, W. (1955). Acta Cryst. 8, 473-478.]). The improvement in estimates was so great that the authors could predict the usefulness of triplet invariants in the ab initio solution of macromolecular structures. From the DM perspective, this amounts to an apparent simplification of structural complexity, which explains the title of this series.

The implementation of this new approach required a profound change in the classic probabilistic procedure. This can be described as a two-step approach: it finds the joint probability distribution function of all the variables of interest involved in the procedure, and from this derives the conditional probability distribution given the available prior information.

For example, if for a crystal structure with structure factors F a partial structure with structure factors Fp is known (the bold character is used to indicate any set of structure factors), the joint probability distribution functions method first calculates the distribution P(F, Fp) and then imposes a priori knowledge of the partial structure through the conditional distribution function P(F|Fp). Similarly, for triplets the joint probability distribution

Mathematical equation

is first calculated, from which the conditional distribution

Mathematical equation

is obtained.
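In symbols, the second step of the classical procedure is just the definition of a conditional density. The following lines are a generic illustration for the partial-structure case, written with the symbols F and Fp used above; they are not a formula taken from Paper I or Paper II.

```latex
\[
P(\mathbf{F} \mid \mathbf{F}_p)
  = \frac{P(\mathbf{F}, \mathbf{F}_p)}{P(\mathbf{F}_p)}
  = \frac{P(\mathbf{F}, \mathbf{F}_p)}
         {\displaystyle\int P(\mathbf{F}, \mathbf{F}_p)\,\mathrm{d}\mathbf{F}} .
\]
```

The joint distribution must therefore exist and be computable before the prior information can be imposed, which is the requirement the approach of Paper I avoids.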

Unfortunately, not all types of prior information can be treated in this way (see Paper I for some unconventional types of prior information). In particular, Patterson information is a type of information whose treatment requires a different approach from the standard one, and this approach is the one described in Paper I. The reader will in fact find it difficult to imagine a joint probability distribution of the type P(F, P) where P represents the Patterson map, from which the P(F| P) can then be obtained.

The probabilistic approach described in Paper I directly calculates the conditional probability distributions without the intermediate passage with joint probabilities; this allows one to immediately use primitive variables already subjected to the constraints imposed by the Patterson map in the calculations. This change in the mathematical approach not only produces a strong simplification of the calculations but constitutes a new method for the treatment of many types of prior information.

A curiosity remains: why, in the almost century-long development of DM, has this new approach been overlooked? The author thinks that the reverence towards a universally accepted and practised method in classical statistics and in many fields of physics has had a great influence. He is referring to the two steps of joint probability distribution functions and conditional distributions.

The delay is also partly due to the fact that the modern use of prior information in the study of triplet invariants came after the development of DM, essentially thanks to Peter Main (1976[Main, P. (1976). Crystallographic Computing Techniques, edited by F. R. Ahmed, pp. 97-105. Copenhagen: Munksgaard.]). In that article the Patterson map was not classified as possible a priori information, probably because the Fourier transform of the Patterson map gives the observed intensities |F|², and the moduli |F| are already used as prior information in the conditional distributions. Perhaps it was not understood at the time that the Patterson map was a different kind of prior information.

Finally, Patterson as a source of prior information was taken into consideration in the article by Altomare et al. (1992[Altomare, A., Cascarano, G. & Giacovazzo, C. (1992). Acta Cryst. A48, 30-36.]), but the results were disappointing, probably due to a poor formulation of the problem.

In this paper, the probabilistic approach introduced in Paper I is extended to the estimation of the 1-ss. Section 3 addresses some algebraic problems that must be solved first. Other sections examine the joint probability distribution functions for the estimation of the 1-ss using prior information from either the Harker sections or the entire Patterson map. The findings indicate that, in the former scenario, it is sufficient to consider the zero-order moments (those that do not depend on N), while in the latter, it becomes necessary to introduce moments of order 1 (those containing the factor \(1/\sqrt{N}\)).

The analysis of all moments involved in the probabilistic distributions is detailed in Appendix A. Section 4 (and Appendix B) explores scenarios where the prior information is limited to the Harker sections. Section 5 (and Appendix C) covers cases where the prior information comes from the entire Patterson map.

The hope is that the prior information conveyed by the Patterson map will make the estimate of the 1-ss's much more accurate than in the past. If so, this new method will allow them to play an important role in future DM procedures.

A few final words on the ongoing project context. Papers I and II opened up new perspectives for DM. However, it is necessary to achieve a more complete theoretical formulation, which exploits the prior Patterson information for the estimate of the most important structure invariants and semi-invariants.

3. Algebraic characteristics of the one-phase semi-invariants of first rank

This section describes the most important algebraic features of 1-ss: they will suggest to us which probabilistic distributions need to be studied and will allow us to assess what kind of information we can get.

Let us suppose that Mathematical equation is a generic 1-ss. Its phase Mathematical equation does not vary when the origin varies on the allowed origins. But may Mathematical equation be a reflection with phase lying anywhere in the region (0, 2π), or must it have restricted phase values? This is an important question because it defines the type of joint probability distribution we have to derive.

We recall that a reflection with vectorial index h has restricted phase values if a rotation matrix \(\mathbf{R}_s\) can be found satisfying the relation

\[\mathbf{h}\mathbf{R}_s = -\mathbf{h}. \tag{3}\]

If equation (3) is satisfied, the general relation

\[\varphi_{\mathbf{h}\mathbf{R}_s} = \varphi_{\mathbf{h}} - 2\pi\,\mathbf{h}\cdot\mathbf{T}_s, \tag{4}\]

relating the phase of the asymmetric reflection h to those of the symmetry equivalents, restricts the phase \(\varphi_{\mathbf{h}}\) to

\[\varphi_{\mathbf{h}} = \pi\,\mathbf{h}\cdot\mathbf{T}_s + n\pi, \quad n = 0, 1. \tag{5}\]

Equation (3) defines the reflections h with restricted phase values, and equation (5) specifies their allowed phases. Let us now apply the condition (3) to the vectorial index Mathematical equation to check if Mathematical equation is a restricted phase reflection. According to equation (3) Mathematical equation will be symmetry restricted if a rotation matrix Mathematical equation may be found for which

Mathematical equation

It is easy to verify that equation (6) is satisfied if two conditions are simultaneously satisfied:

Mathematical equation

They are satisfied for all the symmetry operators of order 2 because for them Mathematical equation. Thus, in the orthorhombic space groups reflections of type (0ee), (e0e) or (ee0) have phases restricted to (0, π). The second condition, however, is not always satisfied. For example, for the space groups \(P3\), \(P3_1\), \(P3_2\), the reflections H = (HKL) with Mathematical equation and Mathematical equation are 1-ss without any phase restriction; that is, they are general acentric reflections with phases lying anywhere between 0 and 2π.

In conclusion, we encounter 1-ss both with unrestricted phases, corresponding to general acentric reflections, and with restricted phase values. However, one might question whether a 1-ss with restricted phase values can have permissible phases other than (0, π). From a strictly logical standpoint, this seems improbable. First let us consider the case of 1-ss without phase restrictions. Experimental data – whether diffraction intensities or the Patterson map – do not allow us to estimate the imaginary component of Mathematical equation due to its centric nature. Therefore, we can only estimate its real component, Mathematical equation. If Mathematical equation is estimated to be positive, then Mathematical equation will likely be closer to 0 than to π. Conversely, if Mathematical equation is estimated to be negative, then Mathematical equation will likely be closer to π. This restriction is crucial to avoid choosing the enantiomorph, which is logically impossible based on the experimental data alone.

Consider now the case of the 1-ss with restricted phase values: let us suppose (by contradiction) that (a, a + π) are the allowed phase values, where a is different from (0, π). In this case estimating Mathematical equation would automatically imply the definition of the enantiomorph. As an example, let us consider a 1-ss with restricted phases (π/4, 3π/4). If we estimate Mathematical equation then Mathematical equation is expected to be closer to π/4, if we estimate Mathematical equation then Mathematical equation is expected to be closer to 3π/4. Choosing between the two should imply a choice of the enantiomorph; thus, if a 1-ss has restricted phase values, the permissible phases are necessarily (0, π). However, this must be demonstrated, and this is done below.

We assume that Mathematical equation has restricted phase values; then, in accordance with equation (3), there will be a rotation matrix Mathematical equation for which Mathematical equation. In accordance with equation (5) Mathematical equation, and then the allowed phase values will be

Mathematical equation

If we prove that Mathematical equation takes integer values, then the allowed values for phase-restricted 1-ss will be (0, π). From the sequence of identities

Mathematical equation

it follows that

Mathematical equation

from which

Mathematical equation

Substituting equation (8) into equation (7) leads to

Mathematical equation

which is always an integer value. In conclusion, allowed phase values for restricted 1-ss must always be (0, π).
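The conclusion can be checked numerically on a toy model. The sketch below assumes the standard setting of \(P2_12_12_1\) with point atoms of unit weight placed at random; `SYMOPS`, `structure_factor` and `random_asu` are hypothetical names chosen for this illustration. For 1-ss indices of type (ee0), (0ee) and (e0e) the computed structure factor should be real, i.e. its phase is 0 or π, while a general reflection is in general complex.

```python
import cmath
import math
import random

# Symmetry operators of P2_1 2_1 2_1 (standard setting) stored as
# (diagonal of the rotation matrix, translation vector).
SYMOPS = [
    ((1, 1, 1), (0.0, 0.0, 0.0)),
    ((-1, -1, 1), (0.5, 0.0, 0.5)),
    ((-1, 1, -1), (0.0, 0.5, 0.5)),
    ((1, -1, -1), (0.5, 0.5, 0.0)),
]


def structure_factor(hkl, asu_atoms):
    """Unit-weight structure factor summed over all symmetry copies."""
    total = 0j
    for r in asu_atoms:
        for diag, t in SYMOPS:
            pos = [d * x + s for d, x, s in zip(diag, r, t)]
            arg = 2.0 * math.pi * sum(h * x for h, x in zip(hkl, pos))
            total += cmath.exp(1j * arg)
    return total


def random_asu(n_atoms, seed=0):
    """Random fractional coordinates for the atoms of the asymmetric unit."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random(), rng.random()) for _ in range(n_atoms)]


if __name__ == "__main__":
    atoms = random_asu(5)
    for hkl in [(2, 4, 0), (0, 2, 6), (4, 0, 2), (1, 2, 3)]:
        F = structure_factor(hkl, atoms)
        # The imaginary part vanishes (to rounding) for the first three.
        print(hkl, round(F.real, 6), round(F.imag, 6))
```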

The coexistence of phase-restricted and phase-unrestricted 1-ss suggests the study of probability distributions for acentric reflections. In particular, we will study \(P(E_{\mathbf{h}(\mathbf{I}-\mathbf{R}_p)} \mid \text{Harker sections})\) in Section 4 and Appendix B, and \(P(E_{\mathbf{H}}, E_{\mathbf{h}} \mid \text{Patterson map})\) in Section 5 and Appendix C.

4. The distribution \(P(E_{\mathbf{h}(\mathbf{I}-\mathbf{R}_p)} \mid \text{Harker sections})\)

The distribution \(P(E_{\mathbf{h}(\mathbf{I}-\mathbf{R}_p)} \mid \text{Harker sections})\) is connected with the Patterson superposition techniques (Sheldrick, 1992[Sheldrick, G. M. (1992). Crystallographic Computing 5, edited by D. Moras, A. D. Podjarny & J.-C. Thierry, pp. 145-157. Oxford University Press.]; Pavelčík et al., 1992[Pavelčík, F., Kuchta, L. & Sivý, J. (1992). Acta Cryst. A48, 791-796.]; Caliandro et al., 2014[Caliandro, R., Carrozzini, B., Cascarano, G. L., Comunale, G., Giacovazzo, C. & Mazzone, A. (2014). Acta Cryst. D70, 1994-2006.]; Burla et al., 2023[Burla, M. C., Carrozzini, B., Cascarano, G. L., Giacovazzo, C. & Polidori, G. (2023). Crystals 13, 874-892.]). These techniques calculate the symmetry minimum function (SMF) by combining symmetry-independent Harker sections according to

Mathematical equation

where Mathematical equation is a typical Harker vector. Min is the minimum operator to be applied, pixel by pixel, to the (Mathematical equation) Harker sections. Then a pivot peak in the SMF map is selected and used to calculate the minimum superposition function S(r) between the SMF and the translated Patterson map, according to

Mathematical equation

where Mathematical equation denotes the position of the pivot peak, usually corresponding to a heavy atom with known position.

The SMF is algebraic in nature, and is based on the information contained in the Harker sections: in fact Mathematical equation is nothing other than the Harker section corresponding to the symmetry operator Mathematical equation. In this section we describe a probabilistic method based on the information contained in Harker sections; it will then be generalized in Section 5.
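The pixel-by-pixel minimum that defines the SMF can be sketched as follows. This is a minimal illustration, not the implementation used in the cited programs: it assumes that every Harker section has already been transformed and resampled onto one common two-dimensional grid of candidate positions (that transformation is not shown), and `symmetry_minimum_function` is a hypothetical name.

```python
def symmetry_minimum_function(sections):
    """Return the pixel-by-pixel minimum of several maps on a common grid.

    Each element of `sections` is a list of rows of Patterson densities,
    assumed (for this sketch) to be sampled on the same grid.
    """
    if not sections:
        raise ValueError("at least one section is required")
    n_rows = len(sections[0])
    n_cols = len(sections[0][0])
    for sec in sections:
        if len(sec) != n_rows or any(len(row) != n_cols for row in sec):
            raise ValueError("all sections must share the same grid")
    return [
        [min(sec[i][j] for sec in sections) for j in range(n_cols)]
        for i in range(n_rows)
    ]


if __name__ == "__main__":
    section_a = [[5.0, 1.0], [3.0, 9.0]]
    section_b = [[2.0, 4.0], [7.0, 8.0]]
    print(symmetry_minimum_function([section_a, section_b]))
```

A pixel survives with a high value only if every section is high there, which is why the minimum is the cautious way to combine the sections.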

In Appendix B we show that the distribution Mathematical equation may be calculated by using only the cumulants of order zero (the cumulants that do not depend on the parameter 1/√N): in this case the information contained in the full Patterson map is not accessible. In the general case, in which an acentric 1-ss Mathematical equation is estimated via several Harker sections and, correspondingly, via several reflection sets Mathematical equation, we obtain (see Appendix B)

Mathematical equation

where Q is a scale factor not depending on Mathematical equation,

Mathematical equation

and

Mathematical equation

Mathematical equation is the μ-th normalized Harker peak amplitude in position Mathematical equation, Mathematical equation is the amplitude corresponding to such μ-th Harker peak and Mathematical equation is the amplitude of the origin Patterson peak. In essence, if Mathematical equation is the position of the jth atom, then Mathematical equation is the corresponding Harker vector and \(I_j\) is its peak intensity. In this article, as in Papers I and II, we prefer to enumerate the Harker vectors with a free-running index μ and not with j, because of the inevitable peak superposition present in the Harker sections (which breaks the one-to-one relationship between atom and peak).
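The link between an atomic position and its Harker vector can be illustrated with a short sketch. It assumes the common definition of a Harker vector as the difference between an atom and its symmetry copy, u = r − (R r + T), reduced to the unit cell; the opposite sign convention gives the inversion-related peak. The function name `harker_vector` and the example operator (the twofold screw axis along z of \(P2_12_12_1\)) are choices made for this illustration only.

```python
def harker_vector(r, rotation, translation):
    """Return u = r - (R r + T) with every component reduced to [0, 1)."""
    rotated = [sum(rotation[i][j] * r[j] for j in range(3)) for i in range(3)]
    return tuple((r[i] - rotated[i] - translation[i]) % 1.0 for i in range(3))


# Twofold screw axis along z in P2_1 2_1 2_1: (1/2 - x, -y, 1/2 + z).
SCREW_Z_ROTATION = [[-1, 0, 0], [0, -1, 0], [0, 0, 1]]
SCREW_Z_TRANSLATION = (0.5, 0.0, 0.5)

if __name__ == "__main__":
    atom = (0.10, 0.20, 0.30)
    u = harker_vector(atom, SCREW_Z_ROTATION, SCREW_Z_TRANSLATION)
    # The third component is 1/2 whatever the atomic position:
    # all such vectors lie on the Harker section w = 1/2.
    print(u)
```

Because the third component does not depend on the atom, every atom contributes one peak to the same section, and two atoms may contribute to the same pixel; this is the superposition that motivates the free-running index μ.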

In the symbol Mathematical equation the subscript emphasizes the fact that several Harker sections may contribute to the estimation of a single 1-ss. The prime on the summation over the peaks implies that peaks related by an inversion centre are excluded, and the prime on the summation over the reflections implies that reflections related by inversion are excluded.

From (9) the conditional phase distribution (10) may be derived:

Mathematical equation

If Mathematical equation then Mathematical equation is expected to be near 0, if Mathematical equation then Mathematical equation is expected to be near π.

Equations (9) and (10) need some clarification to be well interpreted. Indeed, according to Section 3, our approach cannot estimate Mathematical equation but only Mathematical equation, and therefore we can only decide whether Mathematical equation is closer to 0 or to π.

Equation (10) may also be applied to the case of 1-ss with restricted phase values, which may only take 0 or π values. Then equation (10) still holds, but Mathematical equation must be replaced by Mathematical equation.

The above observations allow us to better understand the relations between Patterson superposition techniques and the probabilistic method described here. In fact the SMF is a map in direct space that coincides with or combines various Harker sections. Its Fourier transform leads to an estimate of the phases of the 1-ss. Conversely, the probabilistic method described here is based on the term Mathematical equation of equation (9), which is nothing other than the Fourier transform of the entire Harker section or part of it. In this case the phase estimates of the 1-ss are immediately available.

5. The conditional distribution \(P(E_{\mathbf{H}}, E_{\mathbf{h}} \mid \text{Patterson map})\)

As mentioned in Section 4 and in Appendix A, the full Patterson map information is available only if the bivariate distribution \(P(E_{\mathbf{H}}, E_{\mathbf{h}} \mid \text{Patterson map})\) is calculated and moments up to order 1 (those that depend on the parameter \(1/\sqrt{N}\)) are included. This has been accomplished in Appendix C [see equation (33)]. From equation (33) it is easy to calculate the conditional probability distribution

Mathematical equation

where

Mathematical equation

and I0 is the modified Bessel function of order zero.

The first term in equation (12) is the contribution of the Harker sections; the second term comes from the non-Harker regions of the Patterson map. The k's are the cumulants of the distribution Mathematical equation defined in Appendix C. The algebraic expressions of the cumulants are not necessary here. We will make them explicit when (see below) we include in the formula all the reflections h that belong to the sets {h}.

To correctly interpret equation (12) we must remember (see Section 3) that a 1-ss may have restricted phase values [in this case Mathematical equation can only take the values (0, π)], or it may not be subject to any restriction on the phases: in this last case Mathematical equation can take any value between 0 and 2π. In both cases, our theoretical approach is only able to estimate the probability that Mathematical equation is either 0 or π. In conclusion, we can limit ourselves to calculating the probabilities Mathematical equation or Mathematical equation. We obtain

Mathematical equation

The two terms in the hyperbolic tangent argument estimate Mathematical equation independently of each other. Therefore, the Harker sections can indicate phases coinciding with or opposite to the phases suggested by the non-Harker regions. The first term suggests that Mathematical equation is probably close to 0 or π depending on whether \(k_{1000}\) is positive or negative. The second term suggests that Mathematical equation is probably 0 if \(k_{1020}\) and Mathematical equation have the same sign, and probably π if they have opposite signs.
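How two independent indications combine inside a hyperbolic tangent can be sketched numerically. The exact argument of equation (13) is not reproduced here: the sketch assumes the form P⁺ = 1/2 + 1/2 tanh(a + b) that is customary for sign probabilities in direct methods, and the parameters `harker_term` and `non_harker_term` are hypothetical stand-ins for the two contributions discussed above.

```python
import math


def sign_probability(harker_term, non_harker_term):
    """Return P+ = 1/2 + 1/2 tanh(a + b), an assumed generic form.

    `harker_term` stands for the zero-order (Harker) contribution and
    `non_harker_term` for the order-1 (non-Harker) contribution.
    """
    return 0.5 + 0.5 * math.tanh(harker_term + non_harker_term)


if __name__ == "__main__":
    # Both indications agree: the probability moves well above 1/2.
    print(sign_probability(1.0, 0.5))
    # The indications contradict each other and cancel: no information.
    print(sign_probability(1.0, -1.0))
    # The larger (Harker) term dominates a weak opposite indication.
    print(sign_probability(-2.0, 0.5))
```

In this form concordant terms reinforce each other, while discordant terms pull the probability back towards 1/2, so the larger contribution decides the sign.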

Let us now generalize the distribution (13) to the case in which all the reflections h that belong to the sets {h} contribute to the Mathematical equation estimate. In accordance with Appendices A and C the cumulants involved in the distribution Mathematical equation are defined as follows:

Mathematical equation

Mathematical equation

Mathematical equation

Mathematical equation

where the prime to the summation over the reflections implies that reflections related by inversion are excluded, and the prime to the summation over the Harker peaks implies that peaks related by an inversion centre are excluded.

Let us compare the generalized equation (13) with the classic formula (Hauptman & Karle, 1953[Hauptman, H. & Karle, J. (1953). The Solution of the Phase Problem. I. The Centrosymmetrical Crystal. ACA Monograph No. 3. New York: Polycrystal Book Service.]; Cochran & Woolfson, 1955[Cochran, W. & Woolfson, M. M. (1955). Acta Cryst. 8, 1-12.]; Giacovazzo, 1978[Giacovazzo, C. (1978). Acta Cryst. A34, 562-574.]),

Mathematical equation

and with the SMF described in Section 4. We observe:

(i) The SMF operates in direct space. By employing the implication transformation techniques, it converts the u coordinates of the Harker peaks into r coordinates of an electron-density map. When symmetrically independent Harker sections are present, the SMF generates a map featuring three-dimensional peaks. For precautionary reasons, their intensities are adjusted using the minimum function. Phases can be derived through a Fourier inversion of the SMF map.

Thus, the SMF technique effectively utilizes the information available in the Harker regions. However, it lacks access to the phase information that the Patterson peak distribution provides in non-Harker regions.

(ii) Equation (14) involves only moments of order 1, and therefore is unable to exploit the information provided by the Harker sections; in fact equation (14) does not contain a zero-order term comparable with the cumulant \(k_{1000}\) present in equations (12) and (13). This is a sign of weakness since the zero-order contributions provided by the Harker sections are typically larger than contributions of order 1.

Equation (14) seems too simplistic to offer reliable estimates of 1-ss. Let us consider, as an example, a symmorphic space group. Equation (14) will indicate Mathematical equation if Mathematical equation, and Mathematical equation if Mathematical equation. In equation (13) the phase indication provided by the term of order 1 is as follows: Mathematical equation if \(k_{1020}\) and Mathematical equation have the same sign, and probably π if they have opposite signs.

(iii) In the absence of information on the distribution of peaks in the Patterson map, equation (13) should reduce to equation (14). This is indeed what happens: for every reflection h we have

Mathematical equation

(iv) The Mathematical equation value estimated by equation (13) can exploit both Harker sections and non-Harker Patterson regions through zero- and 1-order moments, respectively. Therefore, from the perspective of the quantity of information utilized, equation (13) appears to be particularly well equipped for estimating the 1-ss phases.

6. Conclusions

This article is seminal in nature. A probabilistic theory is described that can estimate the one-phase semi-invariants of the first rank using the Patterson map (i.e. the positions of its peaks and the corresponding intensities) as prior information. The foundations on which the theory rests are similar to those described in Papers I and II. The analogy, however, is only partial: in fact the 1-ss satisfy algebraic properties that allow their phase estimation directly from experimental data. Furthermore, the probabilistic formula obtained in this paper cannot be a particular case of that obtained in Paper II for the triplet invariants. In order to explain this last statement, let us recall the observation made in the Introduction: the 1-ss may be obtained from the estimate of the special triplets:

Mathematical equation

Some readers may be tempted to estimate the 1-ss by using the Paper II formula, estimating general triplets, for the estimate of the special triplet Φ defined by (15). This is not allowed. One of the reasons that attests to this impossibility is that the terms that estimate the general triplets are all of order 1 while the semi-invariants are also defined by terms of zero order. In essence, the 1-ss must be estimated from formulas specifically obtained for them.

The aim of this work is to estimate one-phase semi-invariants of first rank by exploiting more information than that exploited by the classic formula (14). If the information gain translates into a better estimate of the 1-ss, then the 1-ss will lose the role of historical curiosity to which they have been condemned so far.

APPENDIX A

Moments of the conditional distributions

In this Appendix the most important moments of the following two conditional probability distribution functions

Mathematical equation

are calculated. The prior information used, i.e. the Harker sections or the full Patterson map, defines the characteristic functions of the two distributions and therefore their algebraic forms.

In accordance with the notation used in the main text, H is the vectorial index of a generic 1-ss, and Mathematical equation is the set of reflections that satisfies equation (2) for a given rotation matrix Mathematical equation. For the sake of simplicity, in most of our calculations we will not emphasize the fact that moments are of conditional type.

A1. Moments of order 0 (i.e. moments not depending on N)

The reader will easily find that Mathematical equation: no Harker or Patterson information may be exploited to change this trivial result. Let us now consider

Mathematical equation

for a given h belonging to the set Mathematical equation. If the information on the Harker sections is available, equation (17) may be rewritten as

Mathematical equation

where Mathematical equation is the standard notation for a generic Harker vector Mathematical equation. In a Patterson map the normalized (with respect to the origin peak intensity Mathematical equation) peak amplitudes are expected to be proportional to

Mathematical equation

where Mathematical equation is the Harker peak amplitude. Equation (18) may then be rewritten as

Mathematical equation

In the case of a symmorphic space group, the origin peak lies on the Harker section but it is not a Harker peak. Therefore, it must be excluded from the summation on the right-hand side of equation (19) because Mathematical equation is by definition a non-zero vector. Since more than one h can belong to the set Mathematical equation, then

Mathematical equation

We prefer to generalize equation (19) in the form

Mathematical equation

to emphasize four important characteristics:

(i) Mathematical equation is a Harker peak, not a generic Patterson peak.

(ii) The sum on N (the number of atoms in the unit cell) present in equation (17) is transformed, in equation (19), into a sum on Harker peaks. The larger the number of atoms in the ASU, the smaller the normalized intensities Mathematical equation will be. There is however a compensatory behaviour: the larger the number of atoms in the ASU, the larger the set Mathematical equation will be.

(iii) More than one rotation matrix Rp may satisfy equation (2). For example, in P2₁2₁2₁, for the semi-invariant H = (0, 0, e), two sets of h reflections satisfy equation (2): (0, k, e/2) and (h, 0, e/2), with h and k free indices. The symbol Mathematical equation emphasizes this possible situation.

(iv) Mathematical equation is a parameter with magnitude of order 1/N, which is very small when equation (14) is applied to proteins. The value of Mathematical equation, however, may be sufficiently large because equation (20) implies both a sum on the intensities of the Harker peaks (their number is of order N) and a sum on the reflections that belong to the set {h}.
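The P2₁2₁2₁ example in point (iii) can be checked by brute force. The sketch below assumes that equation (2) has the usual first-rank form H = h(I − Rp), which is consistent with the two sets quoted in (iii) but is not reproduced in this part of the text. The function names and the index limit are illustrative only.

```python
from itertools import product

# Rotational parts of the three twofold screw axes of P2(1)2(1)2(1).
ROTATIONS = {
    "2_1 along x": ((1, 0, 0), (0, -1, 0), (0, 0, -1)),
    "2_1 along y": ((-1, 0, 0), (0, 1, 0), (0, 0, -1)),
    "2_1 along z": ((-1, 0, 0), (0, -1, 0), (0, 0, 1)),
}


def h_times_i_minus_r(h, rot):
    """Row vector h multiplied by (I - R); assumed form of equation (2)."""
    return tuple(
        sum(h[i] * ((1 if i == j else 0) - rot[i][j]) for i in range(3))
        for j in range(3)
    )


def reflections_for(H, rot, limit):
    """All h with indices in [-limit, limit] such that h(I - R) = H."""
    rng = range(-limit, limit + 1)
    return [
        h for h in product(rng, repeat=3)
        if h_times_i_minus_r(h, rot) == tuple(H)
    ]


# Semi-invariant H = (0, 0, e) with e = 4, indices limited to [-2, 2].
for name, rot in ROTATIONS.items():
    print(name, reflections_for((0, 0, 4), rot, limit=2))
```

Under this assumption the axis along x yields reflections of type (h, 0, e/2), the axis along y yields reflections of type (0, k, e/2), and the axis along z yields none, in agreement with the two sets quoted in the text.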

But can equation (5) be split so that both the real and imaginary parts of Mathematical equation can be estimated? According to Section 3, this is not possible; indeed estimating the imaginary part would involve a choice of the enantiomorph. So let us calculate Mathematical equation and Mathematical equation:

Mathematical equation

Mathematical equation

Equation (22) is justified by observing that, if a reflection h belongs to the set Mathematical equation, the reflection −h will also belong to the same set. Therefore

Mathematical equation

If an inversion centre lies on the pth Harker section, then

Mathematical equation

where the prime to the summation over the reflections implies that reflections related by inversion are excluded, and the prime to the summation over the Harker peaks implies that peaks related by an inversion centre are excluded.

An inversion centre does not always lie on the Harker sections. For example, for the space group P3 the Harker section (u, v, 0) of the Patterson group Mathematical equation includes inversion points, but for the space groups P3₁ and P3₂ the Harker sections at z = 1/3 and z = 2/3 do not include any inversion point. The problem can be overcome by considering the Harker section at z = 2/3 as part of the Harker section at z = 1/3: the section at z = 2/3 indeed contains peaks which are related to those in the section at z = 1/3 by an inversion centre. Thus equation (23) can be used for any type of 1-ss, no matter if it is with or without restrictions on the phase values.
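The relation between the two Harker sections of P3₁ can be verified numerically. The following minimal sketch takes the standard general positions of P3₁, namely (x, y, z), (−y, x − y, z + 1/3) and (−x + y, −x, z + 2/3), builds the vectors between the symmetry equivalents of one atom, and compares the section at z = 1/3 with the inverted section at z = 2/3. The atomic position and the helper names are arbitrary choices made for illustration.

```python
from fractions import Fraction as F

# Symmetry operators of P3(1) as (rotation, translation), acting as r' = R r + T.
OPS_P31 = [
    (((1, 0, 0), (0, 1, 0), (0, 0, 1)), (F(0), F(0), F(0))),
    (((0, -1, 0), (1, -1, 0), (0, 0, 1)), (F(0), F(0), F(1, 3))),  # -y, x-y, z+1/3
    (((-1, 1, 0), (-1, 0, 0), (0, 0, 1)), (F(0), F(0), F(2, 3))),  # -x+y, -x, z+2/3
]


def apply(op, r):
    """Apply the symmetry operator op = (R, T) to the position r."""
    rot, t = op
    return tuple(sum(rot[i][j] * r[j] for j in range(3)) + t[i] for i in range(3))


def wrap(v):
    """Reduce fractional coordinates to the interval [0, 1)."""
    return tuple(x % 1 for x in v)


def harker_vectors(ops, r):
    """Vectors between all pairs of distinct symmetry equivalents of one atom."""
    equivalents = [apply(op, r) for op in ops]
    return {
        wrap(tuple(a - b for a, b in zip(p, q)))
        for i, p in enumerate(equivalents)
        for j, q in enumerate(equivalents)
        if i != j
    }


atom = (F(1, 10), F(1, 5), F(3, 10))  # arbitrary general position
vectors = harker_vectors(OPS_P31, atom)
section_13 = {v for v in vectors if v[2] == F(1, 3)}
section_23 = {v for v in vectors if v[2] == F(2, 3)}
inverted_13 = {wrap(tuple(-x for x in v)) for v in section_13}
print(inverted_13 == section_23)
```

For this atom every Harker vector falls on one of the two sections, and inverting the peaks at z = 1/3 reproduces exactly the peaks at z = 2/3, which is the property used in the text to merge the two sections.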

Let us now calculate the Mathematical equation, Mathematical equation, Mathematical equation, Mathematical equation, Mathematical equation and Mathematical equation moments. Assuming that h is a general reflection, we get

Mathematical equation

where the symbol Mathematical equation means that μ varies freely over the entire Patterson unit cell, Harker sections included, while Mathematical equation implies that the sum is extended only to peaks not related by an inversion centre. As a consequence,

Mathematical equation

The reader will easily find that expression (26) holds for the H reflections too:

Mathematical equation

A2. Moments of order 1 (say moments containing the factor 1/√N)

For a general triplet only four moments are non-vanishing:

Mathematical equation

where Mathematical equation stand for Mathematical equation, respectively. In the triplet with reflections Mathematical equation two of them are symmetry equivalent, and therefore the number of non-vanishing moments reduces to three: Mathematical equation, Mathematical equation, Mathematical equation. We have to check the values of the above three moments.

Let us first calculate the moment Mathematical equation. By definition

Mathematical equation

Simple trigonometric formulas and the identity

Mathematical equation

transform equation (27) into

Mathematical equation

Only the third and the fourth terms contribute to the average. From the third term we obtain the following contributions:

(i)

Mathematical equation

when Mathematical equation, Mathematical equation.

For equal-atom structures Mathematical equation. Accordingly, the term at point (i) is nothing but Mathematical equation. Some comments may be helpful. The term (i) corresponds to the contribution of the origin peak. If Mathematical equation is different from zero and the set {h} is large enough, the overall contribution originating from the entire set {h} is zero. But if Mathematical equation is a zero vector (and this happens regularly in symmorphic space groups, but it can also happen for some symmetry operators in non-symmorphic space groups) then every reflection h makes a positive contribution equal to Mathematical equation. The reason for this behaviour, perhaps unexpected by the reader, has its root in the positivity of the phase of the triplet invariant Mathematical equation. If Mathematical equation and Mathematical equation are sufficiently large, then Mathematical equation and Mathematical equation. In symmorphic space groups, therefore, it is more likely that Mathematical equation rather than Mathematical equation, and the reason lies in the contribution of the origin peak. If the Patterson peaks are not taken into consideration, the positivity will depend only on the Mathematical equation distribution (see Section 5).

(ii)

Mathematical equation

Mathematical equation is a generic Patterson vector Mathematical equation. We can then rewrite the above expression in the form

Mathematical equation

when Mathematical equation.

The Patterson however is always centric, and therefore Mathematical equation and Mathematical equation will exist. In accordance with our previous numerical results, the contribution may be rewritten in the form

Mathematical equation

Since i ≠ j, the summation over the Patterson peaks does not include the Harker peaks.

(iii)

Mathematical equation

Mathematical equation

An analogous contribution arises from the fourth term. If we collect the contributions (i), (ii), (iii) of the third and fourth terms in a single formula and let h vary within the set Mathematical equation, we finally get the value of Mathematical equation:

Mathematical equation

The value of the moment Mathematical equation may be calculated by the same mathematical technique. The reader will easily find that Mathematical equation.

The same techniques may be applied to estimate the moments Mathematical equation and Mathematical equation. The reader will find Mathematical equation.

APPENDIX B

The conditional distribution P(EH|Harker sections)

Let Mathematical equation be a generic acentric 1-ss. For the sake of simplicity, we will use in this section the simplified notation Mathematical equation. Then the characteristic function corresponding to the distribution Mathematical equation is given by

Mathematical equation

where u and v are carrying variables related to A and B, respectively. In accordance with Appendix A we have assumed Mathematical equation. Then

Mathematical equation

Standard calculations lead to

Mathematical equation

where Q represents all the terms that do not depend on φ. A trivial change of variables leads to

Mathematical equation

APPENDIX C

Handling of the distribution P(EH, Eh|Patterson map)

We study here the conditional probability distribution Mathematical equation. Its calculation requires a little more effort than the derivation of Mathematical equation. We will illustrate its derivation in some detail, because it is necessary to introduce some approximations to obtain a well designed formula.

For simplicity, we will denote by Mathematical equation the real and imaginary components, respectively, of Mathematical equation and by Mathematical equation its phase. A2 and B2 will be the real and imaginary parts, respectively, of Mathematical equation. The characteristic function of Mathematical equation is

Mathematical equation

which, expanded in series of moments mijkl up to the terms containing the factor Mathematical equation, gives

Mathematical equation

Equation (30) implicitly defines z. Furthermore (see Section 3)

Mathematical equation

In equation (30) we have taken into account the following relationships (see Section 3):

Mathematical equation

In order to express Mathematical equation in terms of cumulants we isolate Mathematical equation according to

Mathematical equation

and then we expand the logarithm according to

Mathematical equation

We obtain

Mathematical equation

from which the following cumulants are obtained:

Mathematical equation
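The passage from moments to cumulants through the expansion of the logarithm can be illustrated in one dimension. The sketch below assumes the standard Maclaurin series log(1 + z) = z − z²/2 + z³/3 − …, with z collecting the moment terms of the characteristic function; it is not the four-variable calculation of this Appendix, and the function names are illustrative only.

```python
from fractions import Fraction
from math import factorial


def poly_mul(a, b, order):
    """Product of two polynomials (coefficient lists), truncated at 'order'."""
    out = [Fraction(0)] * (order + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= order:
                out[i + j] += ai * bj
    return out


def cumulants_from_moments(moments):
    """Cumulants k_1..k_n from raw moments m_1..m_n via log(1 + z)."""
    order = len(moments)
    # z(t) = sum_n m_n t^n / n!, with t standing for the carrying variable (iu).
    z = [Fraction(0)] + [
        Fraction(m) / factorial(n) for n, m in enumerate(moments, start=1)
    ]
    log_series = [Fraction(0)] * (order + 1)
    power = [Fraction(1)] + [Fraction(0)] * order  # z**0
    for p in range(1, order + 1):
        power = poly_mul(power, z, order)  # z**p, truncated
        sign = 1 if p % 2 else -1
        for n in range(order + 1):
            log_series[n] += sign * power[n] / p
    # k_n is n! times the coefficient of t^n in log(1 + z).
    return [log_series[n] * factorial(n) for n in range(1, order + 1)]


# Raw moments 2, 6, 22 of a Poisson variable with mean 2.
print(cumulants_from_moments([2, 6, 22]))
```

For a Poisson variable all cumulants equal the mean, so the three cumulants recovered here are all equal to 2. For a standard normal variable (raw moments 0, 1, 0, 3) the same routine returns a vanishing fourth cumulant.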

We will neglect the cumulant k3000. Its use is not relevant: it has already been verified that the higher-order cumulants bring no benefit to the Wilson distribution. Finally, C(u1,v1,u2,v2) may be expressed in terms of cumulants as follows:

Mathematical equation

All the integrals may be calculated exactly via the following general relation:

Mathematical equation

We obtain

Mathematical equation

where

Mathematical equation

Since g has a magnitude of order Mathematical equation, we can introduce the following approximations: in the exponential terms

Mathematical equation

in the square root terms

Mathematical equation

Then the desired joint probability distribution, expressed in polar coordinates, is

Mathematical equation

References

Altomare, A., Cascarano, G. & Giacovazzo, C. (1992). Acta Cryst. A48, 30–36.
Burla, M. C., Carrozzini, B., Cascarano, G. L., Giacovazzo, C. & Polidori, G. (2023). Crystals 13, 874–892.
Burla, M. C., Giacovazzo, C. & Polidori, G. (2024). J. Appl. Cryst. 57, 1011–1022.
Caliandro, R., Carrozzini, B., Cascarano, G. L., Comunale, G., Giacovazzo, C. & Mazzone, A. (2014). Acta Cryst. D70, 1994–2006.
Cochran, W. (1955). Acta Cryst. 8, 473–478.
Cochran, W. & Woolfson, M. M. (1955). Acta Cryst. 8, 1–12.
Giacovazzo, C. (1974). Acta Cryst. A30, 390–395.
Giacovazzo, C. (1977). Acta Cryst. A33, 933–944.
Giacovazzo, C. (1978). Acta Cryst. A34, 562–574.
Giacovazzo, C. (1980). Acta Cryst. A36, 704–711.
Giacovazzo, C. (1998). Direct Phasing in Crystallography. Oxford University Press.
Giacovazzo, C. (2019). Acta Cryst. A75, 142–157.
Hauptman, H. & Karle, J. (1953). The Solution of the Phase Problem. I. The Centrosymmetrical Crystal. ACA Monograph No. 3. New York: Polycrystal Book Service.
Hauptman, H. & Karle, J. (1956). Acta Cryst. 9, 45–55.
Main, P. (1976). Crystallographic Computing Techniques, edited by F. R. Ahmed, pp. 97–105. Copenhagen: Munksgaard.
Pavelčík, F., Kuchta, L. & Sivý, J. (1992). Acta Cryst. A48, 791–796.
Sheldrick, G. M. (1992). Crystallographic Computing 5, edited by D. Moras, A. D. Podjarny & J.-C. Thierry, pp. 145–157. Oxford University Press.

This article is published by the International Union of Crystallography. Prior permission is not required to reproduce short quotations, tables and figures from this article, provided the original authors and source are cited.
