Facing the phase problem

This article reviews phase evaluation in macromolecular crystallography. It accompanies the 2023 Ewald Prize lecture at the 26th IUCr Congress, paying tribute to Paul Ewald’s far-reaching influence.


Introduction
The momentous discovery of X-ray diffraction by Friedrich, Knipping and Laue (Friedrich et al., 1912) has had manifold ramifications. Atomic structure determination is prominent among these, and is arguably pre-eminent. In favorable circumstances, which are not at all atypical, the number of unique reflections in the diffraction from a crystal exceeds the number of parameters needed to define its atomic structure, often many times over. Diffraction amplitudes can be measured accurately, whereby X-ray crystallography becomes a manifestly definitive science. However, since the atomic parameters relate nonlinearly to diffraction patterns, structure solution requires having an initial model within the radius of convergence of a minimization procedure that can relate the atomic model to the diffraction data. For complicated structures, such as those of biological macromolecules, suitable initial models are built to fit electron-density distributions from Fourier syntheses, which require phase angles as well as the diffraction amplitudes. Thus, we face the phase problem.

Discovery of X-ray diffraction and structure determination
The announcement of Laue's discovery that crystals can diffract X-rays was followed quickly by contributions from Bragg (1913a) and Ewald (1913) that clarified the conditions for diffraction: $\mathbf{S}\cdot\mathbf{a} = h$, $\mathbf{S}\cdot\mathbf{b} = k$, $\mathbf{S}\cdot\mathbf{c} = l$ by Laue, where $\mathbf{S} = \mathbf{s} - \mathbf{s}_0$ is the diffraction vector, $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$ are unit-cell vectors and $h$, $k$ and $l$ are integers; $n\lambda = 2d\sin\theta$ by Bragg, where $\lambda$ is the wavelength, $d$ is the spacing between 'reflecting' planes, $\theta$ is the reflection angle and $n$ is an integer; and by Ewald as the condition that the reciprocal-lattice vector $\mathbf{S} = h\mathbf{a}^* + k\mathbf{b}^* + l\mathbf{c}^*$ intercepts the $\mathbf{s}_0$-directed sphere of radius $1/\lambda$, where $\mathbf{a}^*$, $\mathbf{b}^*$ and $\mathbf{c}^*$ define the reciprocal unit cell and $|\mathbf{S}| = 2\sin\theta/\lambda = d^*_{hkl} = 1/d_{hkl}$.
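As a small numerical illustration of the Bragg and Ewald conditions stated above, the following sketch converts a plane spacing into a reflection angle and confirms that $|\mathbf{S}| = 2\sin\theta/\lambda$ equals $1/d$ for a first-order reflection. The function names, the wavelength value and the example spacing are assumptions chosen for illustration; they are not taken from the article.

```python
import math


def bragg_angle_deg(d_spacing, wavelength, order=1):
    """Return the Bragg angle theta in degrees from n*lambda = 2*d*sin(theta)."""
    sin_theta = order * wavelength / (2.0 * d_spacing)
    if sin_theta <= 0.0 or sin_theta > 1.0:
        # No real angle satisfies Bragg's law for this spacing and wavelength.
        raise ValueError("reflection is not accessible at this wavelength")
    return math.degrees(math.asin(sin_theta))


def diffraction_vector_length(theta_deg, wavelength):
    """Return |S| = 2*sin(theta)/lambda in reciprocal length units."""
    return 2.0 * math.sin(math.radians(theta_deg)) / wavelength


wavelength = 1.5418  # Angstrom; assumed Cu K-alpha value, for illustration only
d_spacing = 2.0      # Angstrom; arbitrary example spacing (hypothetical)

theta = bragg_angle_deg(d_spacing, wavelength)
s_length = diffraction_vector_length(theta, wavelength)
# For n = 1, s_length should reproduce 1/d_spacing.
```

Spacings shorter than half the wavelength raise an error, which corresponds to reciprocal-lattice points that lie beyond the reach of the Ewald sphere.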
A series of papers followed shortly from the Braggs deducing the atomic structures of salts and minerals from their diffraction patterns (Bragg, 1913b, 1914; Bragg & Bragg, 1913). These initial structures, including NaCl and diamond, were of face-centered cubic crystals with all atomic positions specified by the lattice symmetry. Pyrite (FeS$_2$) and calcite (CaCO$_3$) each have one free parameter, which Bragg determined by comparing relative intensities computed from alternative models with measured intensities. This approach, which came to be known as 'trial and error', was then used in many-parameter structural analyses including those for diopside [CaMg(SiO$_3$)$_2$] (Warren & Bragg, 1929) and benzene derivatives (Lonsdale, 1929, 1931). These early structures were determined by the logic of symmetry and by constraints from especially intense reflections (notably Lonsdale, 1929), but the phases associated with the reflected X-ray waves were not invoked.

Fourier transformation and the phase problem
Images form coherently as scattered waves are collected and then recombined by the optical lens of light microscopes or the electromagnetic lenses of electron microscopes. This is not directly possible for the X-ray diffraction experiment, however, since there are no lenses for hard X-rays; the phases of X-ray waves are lost as the X-ray diffraction is measured. Abbe (1873) described and tested a diffraction theory of image formation in the light microscope, and Porter (1906) provided a mathematical foundation for this double-Fourier-transform theory, which Glaeser et al. (2007) elaborated for the electron microscope. W. H. Bragg picked up on the ideas of Porter (Bragg, 1915) and suggested Fourier analysis for X-ray diffraction, but it was Ewald (1921) who formulated Fourier transformation in practical terms of the reciprocal lattice. Duane (1925) rediscovered electron-density reconstruction by Fourier synthesis. While not directly experimental, this computational approach to the completion of image formation does have the advantages of being free from lens distortions and of offering complete three-dimensional reconstructions. Bragg (1929) implemented the Ewald (1921) formulation of inverse Fourier transformation to produce images for diopside, using the known structure (Warren & Bragg, 1929) to compute phases (actually signs since this structure is centrosymmetric). These appealing images were highly influential, taking hold immediately in structure completion (Lonsdale, 1931), especially so with the innovation of the efficient Fourier strips of Beevers & Lipson (1934). Moreover, systematically direct structure determination would now be possible provided that phases could be determined. Comparison of diffraction from isomorphous series provided the basis for determining phases in the pioneering Fourier analyses of alums (Beevers & Lipson, 1935) and phthalocyanins (Robertson, 1936). When the 'phase problem' was first posed as such is unclear; however, as Patterson (1934) introduced his $|F|^2$ series, he plainly identified the determination of phases as the problem of X-ray crystal analysis. Certainly, it stood pre-eminent by the time that Hauptman and Karle boldly titled their ACA Monograph Solution of the Phase Problem. I. The Centrosymmetric Crystal (Hauptman & Karle, 1953).

Phase evaluation in the small-molecule tradition
With the adoption of Fourier methods, crystal structure determinations advanced from the simple to the complex, from projections to three-dimensional syntheses, and from centrosymmetric to noncentrosymmetric structures. For a couple of decades, these applications remained in the realm of what we now call small molecules; however, methods from this small-molecule tradition greatly influenced the approaches taken in macromolecular crystallography.
In many cases, the scattering dominance of higher Z elements can be the basis for phase evaluation, either by the isomorphous replacement method or by the heavy-atom method. The Fourier analysis of phthalocyanin from comparison of the apo and nickel-substituted forms (Robertson, 1936) typifies the former approach, and the analysis of platinum phthalocyanin (Robertson & Woodward, 1940) exemplifies the latter. These structures are centrosymmetric and simplified by having the metal ion at the origin. The structure of cholesteryl iodide is noncentrosymmetric, but the general position of its I atom was readily evident in the Patterson synthesis of a centrosymmetric projection, and the iodine-phased Fourier synthesis revealed the sterol structure (Carlisle & Crowfoot, 1945). Structures of the alkaloid strychnine were determined in one case using bromine as a heavy atom in combination with Patterson methods (Robertson & Beevers, 1951) and in another by selenium versus sulfur isomorphous replacement (Bokhoven et al., 1951). The latter entailed the complication of unspecified phase alternatives, which prompted the suggestion of double substitutions (selenium for sulfur and bromine for chlorine) to resolve the phase ambiguity (Bokhoven et al., 1951).
Higher Z elements, intrinsically present but not dominating the diffraction, sufficed (along with perseverance and chemical intuition) to solve the challenging structures of penicillin (Crowfoot et al., 1949) and vitamin B$_{12}$ (Hodgkin et al., 1957).
However, most organic molecules only have atoms of quasi-equal scattering strength (C, N and O, with H being a feeble scatterer), and the addition of heavy atoms (as was performed for cholesterol and strychnine) is not always readily feasible. Fortunately, the need for equal-atom phasing methods was met by the development of direct methods. The number of unique diffraction observations from a typical small-molecule crystal greatly exceeds the number of independent atomic parameters; thus, the reflections must be interrelated. Indeed, both the positivity-based determinants of Karle & Hauptman (1950) and the atomicity-based Sayre's equation (Sayre, 1952) demonstrate interactions among triples of structure factors $F(\mathbf{h})$, $F(\mathbf{k})$ and $F(\mathbf{h}-\mathbf{k})$, where $\mathbf{h} = (h, k, l)$ is a reciprocal-lattice vector. Moreover, from order-three determinants (Karle & Hauptman, 1950), the phase relationship $\varphi(\mathbf{h}) \simeq \varphi(\mathbf{k}) + \varphi(\mathbf{h}-\mathbf{k})$ is expected to hold when the associated structure-factor magnitudes are large. The probabilities for these and other such relationships (Hauptman & Karle, 1953; Karle & Hauptman, 1956) founded the basis for routinely effective phasing procedures for quite substantial equal-atom structures.

Resonance and anomalous scattering
Anomalous diffraction had little impact in small-molecule crystallography, but its potential for phase determination did become appreciated and tested during this pre-macromolecule era (Bijvoet, 1949, 1954; Okaya & Pepinsky, 1956; Ramachandran & Raman, 1956). Some time earlier, Kramers (1924) had already anticipated anomalous dispersion due to resonance with electronic transitions in the atomic model of Bohr (1913), and Mark & Szilard (1925) had characterized such effects in the diffraction from rubidium bromide.
The lattice of RbBr is face-centered cubic, as for NaCl and other alkali halides; however, since Rb$^+$ and Br$^-$ are isoelectronic, 'normal' diffraction from RbBr is as if from an $a/2$ primitive cube. Only even orders of [111] reflections are present for 'normal' X-ray scattering, as from Cu Kα radiation. With Sr Kα X-rays, chosen for having an energy (14.1 keV) between the bromine (13.5 keV) and rubidium (15.2 keV) K absorption edges, the two sites then scatter distinctively and reflections from (111) and (333) are observed. On considering the phase shifts for such phenomena (Ewald & Hermann, 1927), Ewald retracted his earlier 'proof' (based on the theory for dynamical diffraction, but not yet contemplating anomalous scattering) that Friedel's law always holds. Somewhat later, Coster, Knol and Prins famously bracketed the zinc absorption edge with gold lines to show pronounced differences between the (111) and ($\bar{1}\bar{1}\bar{1}$) reflections from zincblende (ZnS) for the ratios of intensities from Au Lα1 X-rays above the Zn K edge versus from Au Lα2 X-rays below the edge (Coster et al., 1930).
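The RbBr observation can be mimicked with a structure-factor sum for a rock-salt arrangement. This is only a sketch: the scattering factors are taken at zero scattering angle (36 electrons for each isoelectronic ion), and the resonant corrections are hypothetical round numbers invented to stand for a wavelength lying between the two absorption edges; they are not measured values.

```python
import cmath

# Face-centered lattice translations in fractional coordinates.
FCC_SHIFTS = [(0.0, 0.0, 0.0), (0.0, 0.5, 0.5), (0.5, 0.0, 0.5), (0.5, 0.5, 0.0)]


def rock_salt_structure_factor(hkl, f_cation, f_anion):
    """Sum atomic contributions for cations at the fcc sites and anions displaced by (1/2, 1/2, 1/2)."""
    h, k, l = hkl
    total = 0j
    for x, y, z in FCC_SHIFTS:
        total += f_cation * cmath.exp(2j * cmath.pi * (h * x + k * y + l * z))
        total += f_anion * cmath.exp(
            2j * cmath.pi * (h * (x + 0.5) + k * (y + 0.5) + l * (z + 0.5))
        )
    return total


# 'Normal' scattering: both ions scatter identically, so odd orders cancel.
normal_111 = rock_salt_structure_factor((1, 1, 1), 36.0, 36.0)
normal_222 = rock_salt_structure_factor((2, 2, 2), 36.0, 36.0)

# Hypothetical resonant corrections (f' + i f'') that differ between the two sites.
f_rb = 36.0 + complex(-3.0, 0.5)  # assumed illustrative values
f_br = 36.0 + complex(-2.0, 3.4)  # assumed illustrative values
resonant_111 = rock_salt_structure_factor((1, 1, 1), f_rb, f_br)
```

With identical scatterers the (111) sum vanishes while (222) is strong; once the two sites acquire different resonant terms, (111) becomes nonzero, which is the qualitative effect described for Sr Kα radiation.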
'Anomalous' scattering arises from resonance between the X-ray energy and the transition energy between a ground-state core orbital and an unoccupied outer orbital, which necessarily is a quantum phenomenon (Hönl, 1933; Als-Nielsen & McMorrow, 2011). The classical theory of Thomson (1906) for elastic scattering of X-rays from a free electron also applies to X-ray scattering from electrons bound into atomic orbitals but, of course, the electron-density distributions of atoms are also quantal, $\rho = \psi^*\psi$. The two processes (Fig. 1) are additive. Altogether, the atomic scattering factor, $f$, comprises a 'normal' component due to the Thomson scattering, $f^0$, plus the 'anomalous' component, $f^\Delta = |f^\Delta|\exp(i\delta)$, which entails an incremental phase shift, $\delta$, beyond the shift of $\pi/2$ that accompanies Thomson scattering. In principle, resonant scattering is the actual norm; in practice, however, the X-ray energy is often remote from absorption edges, and anomalies such as departure from Friedel's law are slight.
Because core electronic orbitals are confined near to the nucleus, whereas outer-shell electrons are relatively diffuse, the normal scattering decreases smoothly with diffraction-vector amplitude $S = |\mathbf{S}| = 2\sin\theta/\lambda$. Thereby, $f^0$ depends on scattering angle but not directly on wavelength. By contrast, $f^\Delta$ barely changes with scattering angle but depends sharply on wavelength when near the absorption-edge resonance. Thus,
$$f = f^0 + f^\Delta = f^0 + f'(\lambda) + if''(\lambda), \tag{1}$$
where $f'(\lambda)$ and $f''(\lambda)$ are the real and imaginary components of the anomalous scattering, respectively. The imaginary component of forward scattering extinguishes part of the incident beam, and is thereby absorptive; thus, $f''(E) = K\mu(E)E$, where $\mu$ is the absorption coefficient and $K$ is a constant of proportionality. Electrons ejected by photoelectric absorption are recaptured proportionately with absorption, and fluorescent photons are emitted as this happens. This provides a means for measuring $f''$ spectra. In turn, the $f'$ spectrum is determined from the $f''$ values by Kramers-Kronig transformation (Kronig & Kramers, 1928).

Evaluation of phases in macromolecular crystallography
Contemporaneous with the aforementioned advances in structure determination and demonstrations of anomalous scattering, diffraction experiments were also beginning for proteins. It was obvious from diffraction patterns of properly hydrated crystalline pepsin (Bernal & Crowfoot, 1934) that structures of 'a perfectly definite kind' could come from such patterns, but that such structures would not comprise the periodicities found by Astbury for α-type and β-type fibrous proteins (Astbury & Street, 1931). From the large unit-cell dimensions, it was doubtless clear as well that projections would be of little use and that general phases (not signs) would be required. It would take some while for protein crystal structures to be solved. Meanwhile, diffraction-constrained model building provided structural models for the constituent α-helices (Pauling et al., 1951) and β-sheets (Pauling & Corey, 1951) of proteins. These models took the molecular geometry from Pauling's crystal structures of amino acids and dipeptides, and their approach also paved the way for the double-helix DNA structure built by Watson & Crick (1953) to explain Franklin's diffraction patterns from B-form DNA (Franklin & Gosling, 1953).
Figure 1. The atomic scattering factor describes the coherent scattering of X-rays from a given atom. This has two additive components (equation 1): the normal component, $f^0$, from Thomson scattering decreases smoothly with the Bragg angle but has no intrinsic dependence on the wavelength, and the anomalous component, $f^\Delta$, from resonance with electron transitions has a sharp dependence on the wavelength near absorptive transitions but is nearly constant with respect to the scattering angle. Adapted from Box 1 of Hendrickson (1991) with permission from AAAS.
What emerged for evaluating phases for diffraction from protein crystals was the method of multiple isomorphous replacement (MIR). Perutz and coworkers showed that effects from Hg atoms added to hemoglobin crystals could suffice for phase determination (Green et al., 1954; Blow, 1958). Although phase information from a single heavy-atom derivative is ambiguous, in general giving two alternatives for each reflection, a second distinctive derivative could resolve the ambiguity, as suggested in Bijvoet's analysis of strychnine (Bokhoven et al., 1951) and elaborated algebraically and graphically by Harker (1956). Harker's phasing diagram illustrated the ambiguity from a single isomorphous derivative [SIR; Fig. 2(a)] and how this can be resolved by including a second derivative [MIR; Fig. 2(b)]. Blow & Rossmann (1961) showed that the anomalous scattering from a single heavy-atom derivative [Fig. 2(c)] could be used to resolve the phase ambiguity [SIRAS; Fig. 2(c)], and North (1965) and Matthews (1966) elaborated its use in the isomorphous replacement method. MIR with anomalous scattering (MIRAS) was first mentioned for the structure of reduced hemoglobin (Muirhead & Perutz, 1963), and single isomorphous replacement with anomalous scattering (SIRAS) was first used for the structure of rubredoxin (Herriott et al., 1970).
Algebraically, the isomorphous-replacement differences with a single derivative are given by equation (2), where $F_P(\mathbf{h}) = |F_P|\exp(i\varphi_P)$, $F_{PH}(\mathbf{h}) = |F_{PH}|\exp(i\varphi_{PH})$ and $F_H(\mathbf{h}) = |F_H|\exp(i\varphi_H)$ are the respective structure factors for reflection $\mathbf{h} = (h, k, l)$ from the parent protein P, the isomorph PH derivatized with heavy atom H, and the heavy-atom contribution H to PH. When anomalous scattering is present from a single kind of scatterer, the anomalous diffraction differences from Friedel mates (or Bijvoet equivalents) are given by equation (3) (Hendrickson, 1979). The phase ambiguities are clearly evident from the trigonometric factors of equations (2) and (3), whereas the orthogonality of isomorphous replacement and anomalous diffraction in SIRAS resolves the ambiguity.
Macromolecular crystallography also borrowed from the small-molecule tradition in other ways besides its adoption of isomorphous replacement. Although the Patterson function was ill-suited to the heavily overlapped interatomic vectors of proteins themselves, it was well suited to the critical problem of finding the positions of heavy atoms in the derivatives. For this, two developments were instrumental: one is the Patterson synthesis with coefficients $(|F_{PH}| - |F_P|)^2$ (Rossmann, 1960) or the corresponding squared differences between Friedel mates, $(|F_{PH}(\mathbf{h})| - |F_{PH}(-\mathbf{h})|)^2$ (Rossmann, 1961), which approximate to the Patterson maps for $|F_H|^2$, and the other concerns special symmetry-related Patterson sections (Harker, 1936), $v = 1/2$ for space group $P2_1$ for example, whereby certain atomic coordinates can be read off directly from self-vectors between symmetry mates. Similarly, while macromolecules themselves are out of the reach of direct methods, the substructures of heavy atoms are accessible from normalized difference coefficients, as shown by Steitz (1968) and now used routinely (Schneider & Sheldrick, 2002).
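The twofold ambiguity from a single derivative, and its resolution by a second, can be sketched with the triangle relation that follows from $F_{PH} = F_P + F_H$, namely $|F_{PH}|^2 = |F_P|^2 + |F_H|^2 + 2|F_P||F_H|\cos(\varphi_P - \varphi_H)$. This is an error-free toy calculation in the spirit of a Harker construction, not the article's own equations; all amplitudes and phases are hypothetical, and the function names are assumptions.

```python
import cmath
import math


def wrap(angle):
    """Map an angle in radians to the interval (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def sir_phase_alternatives(amp_p, amp_ph, f_h):
    """Return the two protein phases consistent with |F_P|, |F_PH| and a known F_H."""
    amp_h, phi_h = abs(f_h), cmath.phase(f_h)
    cos_term = (amp_ph ** 2 - amp_p ** 2 - amp_h ** 2) / (2.0 * amp_p * amp_h)
    cos_term = max(-1.0, min(1.0, cos_term))  # guard against rounding or noisy data
    delta = math.acos(cos_term)
    return phi_h + delta, phi_h - delta


# Hypothetical error-free example with a known answer.
true_phi_p = math.radians(50.0)
f_p = cmath.rect(100.0, true_phi_p)
f_h1 = cmath.rect(20.0, math.radians(110.0))  # first heavy-atom contribution
f_h2 = cmath.rect(15.0, math.radians(-30.0))  # second heavy-atom contribution

alts1 = sir_phase_alternatives(abs(f_p), abs(f_p + f_h1), f_h1)
alts2 = sir_phase_alternatives(abs(f_p), abs(f_p + f_h2), f_h2)

# The alternative shared by both derivatives is taken as the resolved phase.
best_pair = min(
    ((a, b) for a in alts1 for b in alts2),
    key=lambda pair: abs(wrap(pair[0] - pair[1])),
)
resolved_phase = wrap(best_pair[0])
```

Each derivative alone yields two phases placed symmetrically about the heavy-atom phase; only one of them recurs for the second derivative. With real data the circles do not intersect exactly, which is what motivates a probabilistic treatment.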
Harker diagrams (Harker, 1956) were used in manual phase evaluations for the structure of myoglobin at 6 Å resolution (Kendrew et al., 1958), which employed five heavy-atom derivatives. This approach proved laborious and fraught (Bodo et al., 1959), and Blow & Crick (1959) supplanted it by a probability treatment that introduced the lack-of-closure error (Fig. 3) and the 'best Fourier' as the probability-weighted inclusion of all phase angles. This Blow-Crick lack-of-closure error is defined as
$$\varepsilon(\varphi) = |F_{PH}| - \bigl||F_P|\exp(i\varphi) + F_H\bigr| \tag{7}$$
and the phase probability distribution is
$$P(\varphi) = N\exp[-\varepsilon^2(\varphi)/2E^2], \tag{8}$$
where $N$ is a normalization constant and the standard error $E^2$ is the variance for lack of closure for optimal (centroid) phase angles, $\varphi_c$, which in practice is arrived at by iteration, $E^2 = \langle\varepsilon^2(\varphi_c)\rangle$.
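A minimal numerical sketch of this probability treatment follows, assuming the lack-of-closure and Gaussian forms written as $\varepsilon(\varphi) = |F_{PH}| - ||F_P|\exp(i\varphi) + F_H|$ and $P(\varphi) \propto \exp[-\varepsilon^2/2E^2]$. The probability is sampled on a phase grid and the 'best Fourier' coefficient is formed as the probability-weighted centroid; its relative length is what is conventionally called the figure of merit. The input values are hypothetical, and the standard error is simply assumed rather than obtained by iteration.

```python
import cmath
import math


def blow_crick_distribution(amp_p, amp_ph, f_h, std_error, n_steps=360):
    """Sample the normalized phase probability on an even grid over 0 to 2*pi."""
    phases = [2.0 * math.pi * i / n_steps for i in range(n_steps)]
    weights = []
    for phi in phases:
        closure = amp_ph - abs(cmath.rect(amp_p, phi) + f_h)  # lack of closure
        weights.append(math.exp(-closure ** 2 / (2.0 * std_error ** 2)))
    total = sum(weights)
    return phases, [w / total for w in weights]


def best_fourier_coefficient(amp_p, phases, probs):
    """Return the centroid-weighted coefficient and the figure of merit."""
    centroid = sum(p * cmath.exp(1j * phi) for phi, p in zip(phases, probs))
    return amp_p * centroid, abs(centroid)


# Hypothetical single-derivative example.
f_p = cmath.rect(100.0, math.radians(50.0))
f_h = cmath.rect(20.0, math.radians(110.0))
phases, probs = blow_crick_distribution(abs(f_p), abs(f_p + f_h), f_h, std_error=5.0)
best_f, fom = best_fourier_coefficient(abs(f_p), phases, probs)
```

For a single derivative the distribution is bimodal and symmetric about the heavy-atom phase, so the centroid falls at that phase with a reduced weight. Multiplying the sampled distributions from several derivatives, under the independence assumption, sharpens the result.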
Assuming the independence of information from individual derivatives, the combined probability for multiple isomorphous replacements becomes the product of the individual probabilities. We found that the Blow-Crick formulation (equations 7 and 8) defied analytic reduction to a simplified representation for encoding phase information for facile storage and combination (Hendrickson & Lattman, 1970), but that a related alternative closure definition would do so:
$$\varepsilon'(\varphi) = |F_{PH}|^2 - \bigl||F_P|\exp(i\varphi) + F_H\bigr|^2. \tag{9}$$
The Gaussian form of (8) then gives the phase probability in terms of $\varepsilon'$ and its associated variance, $E'^2$, which was shown by Blundell & Johnson (1976) to be related to the Blow-Crick variance (equation 10). With this alternative definition for closure discrepancy, the phase information from isomorphous replacement can define the phasing coefficients $A$, $B$, $C$ and $D$ for the distribution
$$P(\varphi) = N\exp(A\cos\varphi + B\sin\varphi + C\cos 2\varphi + D\sin 2\varphi). \tag{11}$$
The formalism of (11), from which the phasing integrals can be evaluated analytically, also straightforwardly accommodated phase information from other sources, namely anomalous scattering, direct methods, noncrystallographic symmetry and partial structures.
Blundell & Johnson (1976) superbly chronicled the methods by which large numbers of macromolecular structures had already been determined; however, at that time many mainstays of biological crystallography were yet to come. The early analyses relied almost exclusively on isomorphous replacements for phase evaluation, albeit often supplemented by information from anomalous scattering. Effective structure-refinement methods had not yet taken hold (Jack & Levitt, 1978; Hendrickson & Konnert, 1980; Konnert & Hendrickson, 1980; Sussman, 1985; Brünger, 1991). Methods for cDNA cloning and manipulation were just being invented, but recombinant protein expression was not available. Synchrotron radiation was known (Rosenbaum et al., 1971), but crystallography beamlines did not exist. The rotation function was known (Rossmann & Blow, 1962) and molecular-replacement trials had been made (Tollin, 1969; Lattman et al., 1971), but the first atomic-level extension was then only in press (Schmid et al., 1974; Schmid & Herriott, 1976). Phasing from noncrystallographic symmetry had been anticipated (Rossmann & Blow, 1963), but density modification by molecular averaging, as symmetry is now exploited, and by solvent flattening were for the future. Direct methods had been tried for phase refinement (Hendrickson & Karle, 1973; Sayre, 1974), but the improvements were not striking. Without the continuous X-ray spectra from synchrotrons, truly effective use of resonance was not yet possible.
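The reduction of an intensity-based closure to four phasing coefficients, described earlier in this section, can be checked numerically. The sketch assumes the closure $\varepsilon'(\varphi) = |F_{PH}|^2 - ||F_P|\exp(i\varphi) + F_H|^2$ and a Gaussian in $\varepsilon'$ with variance $E'^2$; expanding the square then leaves only terms in $\cos\varphi$, $\sin\varphi$, $\cos 2\varphi$ and $\sin 2\varphi$ plus a constant. The coefficient expressions in the code come from that expansion, and the input values, including the variance, are hypothetical.

```python
import cmath
import math


def hl_coefficients(amp_p, amp_ph, f_h, variance):
    """Return (A, B, C, D) obtained by expanding -eps'^2 / (2 * variance)."""
    amp_h, phi_h = abs(f_h), cmath.phase(f_h)
    alpha = amp_ph ** 2 - amp_p ** 2 - amp_h ** 2
    beta = 2.0 * amp_p * amp_h
    return (
        alpha * beta * math.cos(phi_h) / variance,
        alpha * beta * math.sin(phi_h) / variance,
        -beta ** 2 * math.cos(2.0 * phi_h) / (4.0 * variance),
        -beta ** 2 * math.sin(2.0 * phi_h) / (4.0 * variance),
    )


def hl_exponent(coeffs, phi):
    """Evaluate A cos(phi) + B sin(phi) + C cos(2 phi) + D sin(2 phi)."""
    a, b, c, d = coeffs
    return (a * math.cos(phi) + b * math.sin(phi)
            + c * math.cos(2.0 * phi) + d * math.sin(2.0 * phi))


def gaussian_exponent(amp_p, amp_ph, f_h, variance, phi):
    """Evaluate -eps'^2 / (2 * variance) directly from the closure definition."""
    closure = amp_ph ** 2 - abs(cmath.rect(amp_p, phi) + f_h) ** 2
    return -closure ** 2 / (2.0 * variance)


def combine(*coefficient_sets):
    """Independent sources combine by adding their coefficients."""
    return tuple(sum(values) for values in zip(*coefficient_sets))


# Hypothetical error-free example with two derivatives.
f_p = cmath.rect(100.0, math.radians(50.0))
f_h1 = cmath.rect(20.0, math.radians(110.0))
f_h2 = cmath.rect(15.0, math.radians(-30.0))
variance = 1.0e6  # assumed value of E'^2

hl_1 = hl_coefficients(abs(f_p), abs(f_p + f_h1), f_h1, variance)
hl_2 = hl_coefficients(abs(f_p), abs(f_p + f_h2), f_h2, variance)
combined = combine(hl_1, hl_2)

grid = [math.radians(step) for step in range(360)]
# The two exponents should differ only by a phase-independent constant.
offsets = [
    gaussian_exponent(abs(f_p), abs(f_p + f_h1), f_h1, variance, phi) - hl_exponent(hl_1, phi)
    for phi in grid
]
best_phase_deg = max(range(360), key=lambda step: hl_exponent(combined, grid[step]))
```

Storing four numbers per reflection is what makes combination cheap: multiplying probability distributions becomes addition of coefficients, and the combined distribution here peaks at the phase shared by both derivatives.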

Phase information from partial structures
Although heavy atoms were highly effective in isomorphous replacement phasing, they did not suffice for direct heavy-atom phasing of macromolecules. Later, as the body of known structures grew and the molecular-replacement method developed, phase information from partial structures became increasingly important at intermediate stages of structure determination. The Sim probability formula (Sim, 1960) implicitly assumes an error-free partial structure, which effectively is the case for heavy atoms; however, partial structures from molecular replacement often deviate appreciably from the correct answer. A reliable formulation for structure completion needs to take this into account, as was anticipated by Rossmann & Blow (1961) and noted by us (Hendrickson & Lattman, 1970); however, it was not until Read (1986) implemented effective $\sigma_A$ estimates for the formulation of Srinivasan (1966) that partial structure development was placed on a firm footing.
Following the nomenclature of Sim (1960), the total structure of $N$ atoms is divided into the $P$ atoms of the partial structure model and $Q$ missing atoms. The expected contribution from the missing atoms is $\Sigma_Q = \sum_{j \in Q} f_j^2$. To also account for the missing contribution due to errors in the partial model, Srinivasan (1966) introduced an error factor due to Luzzati (1952) into a Sim-like formulation, following developments by Vand & Pepinsky (1957) and Srinivasan & Ramachandran (1965). The phase probability for a given reflection is then given by equation (12), where $N$ normalizes for unit integrated probability, $|E_N| = |F_N^{o}|/\Sigma_N^{1/2}$ is the normalized structure amplitude observed from all $N$ atoms, $|E_P| = |F_P^{c}|/\Sigma_P^{1/2}$ is the normalized structure amplitude calculated from the $P$ atoms of the model, $\Delta\mathbf{r}_j$ is the positional error for the $j$th atom of the model and $\sigma_1$ is defined from the ratio $\Sigma_P/\Sigma_N$. With appropriate evaluation of $\sigma_A$ (Read, 1986), the $\sigma_A$-based treatment of model errors appreciably facilitates the elaboration of partial structures derived both from molecular replacement and from experimental phasing, and coefficients for the simplified representation (Hendrickson & Lattman, 1970) derive straightforwardly.
Figure 3. The phase probability analysis of Blow & Crick (1959) by lack-of-closure error $\varepsilon(\varphi)$, equations (7) and (8).
... pseudosymmetry within myohemerythrin (Hendrickson & Ward, 1977). Elegant formulations for the phase information from noncrystallographic symmetry followed (Main & Rossmann, 1966; Crowther, 1967); however, real-space averaging proved simpler to effect, and such phase refinement has become a staple of density modification. Rossmann (1990) persisted in using 'molecular replacement' to describe phasing from noncrystallographic symmetry, but prevailing crystallographic parlance restricts the term to the third category of usage. Demand for such applications awaited the accumulation of known structures and appreciation of the value in analyses of evolutionary homologs and nonisomorphic variants of these knowns. This use gave us a phasing method unique to macromolecules, going beyond the small-molecule tradition, and it has come to prevail in macromolecular crystallography.

Molecular replacement
Molecular replacement, in this usual sense of the term, borrows the phases from a molecule of known structure to determine the crystal structure of an unknown candidate that includes a component structurally similar to this known. An orthogonal transformation is sought to reposition the known structure, $\{\mathbf{r}_A\}$, into the lattice of the unknown: $\{\mathbf{r}_B\} \simeq \{\mathbf{r}_{A'}\} = \mathbf{R}\{\mathbf{r}_A\} + \mathbf{t}$, where $\mathbf{R}$ and $\mathbf{t}$ are the rotation matrix and translation vector, respectively. The new structure is then composed by Fourier synthesis with phases from the previously known structure, as so transformed: $\rho_B(\mathbf{r}) \simeq \mathcal{F}\{|F_B^{obs}|\exp(i\varphi^{calc}[\{\mathbf{r}_{A'}\}])\}$. Rigid-body refinement improves the transformation, and partial structure methods combined with density modification (described below for SAD ambiguity resolution) can remove bias toward the known calculation. We used molecular averaging and solvent constancy to refine the density for octameric hemerythrin (Ward et al., 1975) after replacement from the structure of myohemerythrin (Hendrickson et al., 1975).
The rotation function (Rossmann & Blow, 1962), typically as fast Fourier transform (FFT) enhanced (Crowther, 1972), is proficient for finding the rotational orientation. The specification of translation can be more problematic. Early translation functions based on symmetry mates (Crowther & Blow, 1967) lacked generality and often proved inconclusive, packing functions have rather low resolution (Hendrickson & Ward, 1976), and brute-force variation is inefficient. Nevertheless, various program systems supported the promulgation of the method. Currently, the Phaser system (McCoy et al., 2007), based on maximum-likelihood procedures (Read, 2001), is commonly used.
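The repositioning step of molecular replacement, in which each model coordinate is mapped by a rotation followed by a translation, can be sketched in a few lines. This toy example works in Cartesian coordinates with a rotation about a single axis; the coordinates, angle and translation are hypothetical, and the search for the correct rotation and translation (the hard part of the method) is not attempted here.

```python
import math


def rotation_about_z(angle_deg):
    """Return a proper 3x3 rotation matrix for a rotation about the z axis."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def reposition(coords, rotation, translation):
    """Apply r' = R r + t to every coordinate of the search model."""
    moved = []
    for r in coords:
        rotated = [sum(rotation[i][j] * r[j] for j in range(3)) for i in range(3)]
        moved.append(tuple(rotated[i] + translation[i] for i in range(3)))
    return moved


def distance(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Hypothetical three-atom search model (Cartesian Angstrom coordinates).
model = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
moved = reposition(model, rotation_about_z(90.0), (0.0, 0.0, 5.0))
```

Because the transformation is orthogonal, interatomic distances within the model are unchanged; only its placement in the target cell differs, which is why phases calculated from the repositioned model can be paired with the observed amplitudes.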

Crambin and the power of anomalous diffraction
The early success of isomorphous replacement in macromolecular crystallography was perpetuated by emulation. The method is robust, tolerating imperfect isomorphism and errors from differencing between diffraction patterns measured from two different crystals. Nevertheless, one would ideally obtain the structure from a single native crystal. Moreover, we were unable to derivatize the well diffracting crystals of crambin that Martha Teeter had discovered (Teeter & Hendrickson, 1979). The sequence of crambin does include six disulfide-bridged cysteine residues (no methionines), and our experience in using anomalous scattering to locate the Fe atoms in myohemerythrin (Hendrickson et al., 1975) led us to contemplate a similar approach here as well, despite the expectation of much weaker signals. After estimating the precision that would be required, we then established measurement parameters to assure adequate counting statistics for data collection by single-counter diffractometry with a Cu Kα X-ray source, for which sulfur has $f'' = 0.557$ electrons (Cromer & Liberman, 1970). The diffraction was measured to a Bragg spacing limit of 0.945 Å. The positions of S atoms were readily deduced from Patterson maps, first at 3 Å resolution, where disulfide bridges are not resolved, and then at 1.5 Å resolution to separate the six individual S atoms. We computed phase probabilities from a rearranged version of equation (3) and attempted the resolution of phase ambiguities from the S atoms as a partial structure. Partial structure probabilities were computed from Sim's formula [i.e. equation (12) with $\sigma_A = 1$ and $F$s instead of $E$s] and combined by a probabilistic choice procedure (Hendrickson, 1971; Hendrickson & Teeter, 1981). Ambiguity resolution was not fully successful [Fig. 4]. Ultimately, we refined crambin anisotropically at the full 0.945 Å resolution of our data (Smith et al., 1986), and Teeter and coworkers refined the structure against data extended to 0.48 Å resolution (Schmidt et al., 2011). Sheldrick et al. (1993) showed that the entire structure could be developed without recourse to anomalous scattering from Patterson-derived S atoms, and Hauptman's group proved that the crambin structure of 418 non-H atoms (including waters) could also be solved directly by their Shake-n-Bake (SnB) procedure (Weeks et al., 1995), again without anomalous scattering. Clearly, crambin is an exceptional protein. There also are notable examples of small-molecule phasing based on anomalous scattering (Hall & Maslen, 1965; Moncrief & Lipscomb, 1966; Fridrichsons & Mathieson, 1967; Venkatesan et al., 1971). All the same, the crambin structure served to demonstrate the exceptional power of anomalous diffraction for phase definition and its potential for directly determining macromolecular structures.

Multi-wavelength anomalous diffraction (MAD)
In the 1950s ferment of phase-problem attention, various formulations were proposed for phase evaluation from anomalous scattering (Ramachandran & Raman, 1956; Okaya & Pepinsky, 1956; Mitchell, 1957; Raman, 1959). Tests of ambiguity resolution by analyses at two wavelengths followed (Ramaseshan et al., 1957; Hoppe & Jakubowski, 1971); however, in the absence of synchrotron radiation these experiments were limited to the X-ray wavelengths of characteristic emissions. The Bremsstrahlung continuum can also generate diffraction, of course, as in the first X-ray diffraction experiments (Friedrich et al., 1912); indeed, although weak, the Bremsstrahlung from a molybdenum anode sufficed to solve the structure of selenolanthionine (Hendrickson, 1985). In this experiment, we supplemented Mo Kα with three monochromatic wavelengths selected from the Bremsstrahlung to optimize anomalous signals from the Se K edge for phase evaluation by our new multi-wavelength anomalous diffraction (MAD) formulation. Templeton & Templeton (1988) refined the selenolanthionine structure and used synchrotron radiation to characterize the pleochroism of anomalous scattering near the Se K absorption edge.
As synchrotrons took hold, fluorescence spectroscopy experiments revealed very sharp features at many absorption edges, from which relevance to diffraction studies was immediately apparent (Phillips et al., 1978; Templeton et al., 1980; Lye et al., 1980). Influenced by our own synchrotron experience in the X-ray absorption spectroscopy of hemerythrin (Hendrickson et al., 1982) and frustrated by phase ambiguity in the crambin study (Hendrickson & Teeter, 1981), I then revisited the earlier analyses of anomalous diffraction, aiming for the optimization that synchrotron radiation affords. This led to my formulation of MAD (Hendrickson, 1985), which we tested in applications to lamprey hemoglobin (Hendrickson, 1985; Hendrickson et al., 1988) and which I introduced at the 13th IUCr Congress in Hamburg (Hendrickson, 1984). We have previously reviewed developments of anomalous diffraction and its many applications (Hendrickson, 1991, 2014; Hendrickson & Ogata, 1997) and I only summarize here.
In principle, all atoms are anomalous scatterers; however, the X-rays used in typical experiments ($E > 6$ keV; $\lambda < 2$ Å) are far from resonance for nearly all atoms in a biological macromolecule, whereby their anomalous scattering factors ($f^\Delta$) are negligible. We ignore the scattering from hydrogen altogether and consider N, C and O atoms to be normal scatterers (N), whereas, for example, the Fe atom in heme is an anomalous scatterer (A). The designation of A versus N is only a matter of relative effect: thus, P and S atoms may be designated N for a metal complex but A in studies of native nucleic acids or metal-free proteins. In our diffraction analysis (Hendrickson, 1985; Hendrickson et al., 1988), we separate contributions from the A subset of atoms from the total diffraction (T atoms, both A and N) in a manner of algebraic dissection due to Karle (1967). We then determine the A substructure and use it to deduce phases for the total diffraction, by which the entire structure should be revealed.
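The normal scattering components ($f^0$) are wavelength-independent and pure real, whereas the anomalous scattering components ($f^\Delta = f' + if''$) are wavelength-variant and complex. Structure factors follow from the contributing scattering-factor components (equation 1); thus,
$${}^{\lambda}F_T = {}^{0}F_T + {}^{\lambda}F'_A + i\,{}^{\lambda}F''_A.$$
Superscript prefixes of $\lambda$ or 0 connote wavelength dependence or independence, respectively. Thus, ${}^{\lambda}F_T$ is the complete structure factor, which is a function of wavelength, ${}^{\lambda}F'_A$ is the structure-factor component from the real parts of anomalous scattering, and ${}^{\lambda}F''_A$ is the structure-factor component from the imaginary parts of anomalous scattering. A macromolecule may contain more than one kind of anomalous scatterer, for example iron in heme and selenium in selenomethionine residues, and contributions of each kind $k$ are specified by ${}^{0}F_{A_k}$ and the scattering factors $f^0_k$, $f'_k$ and $f''_k$. Thereby, the structure factor for reflection $\mathbf{h} = (h, k, l)$ measured at wavelength $\lambda$ is given by
$${}^{\lambda}F_T(\mathbf{h}) = {}^{0}F_T(\mathbf{h}) + \sum_k \left[\frac{f'_k(\lambda)}{f^0_k} + i\,\frac{f''_k(\lambda)}{f^0_k}\right] {}^{0}F_{A_k}(\mathbf{h}). \tag{13}$$
The system of equations (13) for $N$ kinds of anomalous scatterers can in principle be solved from diffraction data at $2N + 1$ wavelengths, but this has not been performed. Commonly, a single kind of anomalous scatterer dominates the anomalous scattering, whereby this approximation suffices to initiate a fuller analysis. Then, analyzed in terms of the observable intensities and abbreviating ${}^{\lambda}F_T$ as ${}^{\lambda}F$,
$$|{}^{\lambda}F(\pm\mathbf{h})|^2 = |{}^{0}F_T|^2 + a(\lambda)|{}^{0}F_A|^2 + b(\lambda)|{}^{0}F_T||{}^{0}F_A|\cos\Delta\varphi \pm c(\lambda)|{}^{0}F_T||{}^{0}F_A|\sin\Delta\varphi, \tag{14}$$
with $a(\lambda) = (f'^2 + f''^2)/(f^0)^2$, $b(\lambda) = 2f'/f^0$ and $c(\lambda) = 2f''/f^0$. The coefficients $a(\lambda)$, $b(\lambda)$ and $c(\lambda)$ specify the spectral dependences of the anomalous scattering factors, and $|{}^{0}F_T|$, $|{}^{0}F_A|$ and $\Delta\varphi = {}^{0}\varphi_T - {}^{0}\varphi_A$ are the wavelength-invariant contributions from Thomson scattering and relate directly to electron density.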
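The separation of wavelength-dependent coefficients from wavelength-invariant structure terms can be verified with a short calculation for one kind of anomalous scatterer. The sketch assumes the standard MAD decomposition, in which the measured intensity is $|{}^{0}F_T|^2 + a|{}^{0}F_A|^2 + b|{}^{0}F_T||{}^{0}F_A|\cos\Delta\varphi \pm c|{}^{0}F_T||{}^{0}F_A|\sin\Delta\varphi$ with $a = (f'^2 + f''^2)/(f^0)^2$, $b = 2f'/f^0$ and $c = 2f''/f^0$, and compares it with a direct complex sum for a Friedel pair. The structure factors and scattering-factor values are hypothetical numbers, not data for any real edge.

```python
import cmath
import math


def mad_coefficients(f0, f_prime, f_double_prime):
    """Return the wavelength-dependent coefficients a, b and c."""
    a = (f_prime ** 2 + f_double_prime ** 2) / f0 ** 2
    b = 2.0 * f_prime / f0
    c = 2.0 * f_double_prime / f0
    return a, b, c


def mad_intensities(amp_t, amp_a, delta_phi, coeffs):
    """Return intensities for +h and -h from the wavelength-invariant terms."""
    a, b, c = coeffs
    common = amp_t ** 2 + a * amp_a ** 2 + b * amp_t * amp_a * math.cos(delta_phi)
    sine_term = c * amp_t * amp_a * math.sin(delta_phi)
    return common + sine_term, common - sine_term


def direct_intensities(f_t, f_a, f0, f_prime, f_double_prime):
    """Cross-check: add the resonant contribution to the normal structure factor."""
    factor = complex(f_prime, f_double_prime) / f0
    plus = abs(f_t + factor * f_a) ** 2
    # For -h the normal-scattering structure factors are complex conjugated,
    # but the resonant factor f' + i f'' is not.
    minus = abs(f_t.conjugate() + factor * f_a.conjugate()) ** 2
    return plus, minus


# Hypothetical values for one reflection and one wavelength.
f_t = cmath.rect(120.0, math.radians(40.0))   # normal scattering from all atoms
f_a = cmath.rect(15.0, math.radians(-25.0))   # normal scattering from the A atoms
f0, f_prime, f_double_prime = 34.0, -8.0, 5.0

coeffs = mad_coefficients(f0, f_prime, f_double_prime)
delta_phi = cmath.phase(f_t) - cmath.phase(f_a)
formula_plus, formula_minus = mad_intensities(abs(f_t), abs(f_a), delta_phi, coeffs)
direct_plus, direct_minus = direct_intensities(f_t, f_a, f0, f_prime, f_double_prime)
```

Because only the coefficients change with wavelength, measurements at several wavelengths give a linear system in $|{}^{0}F_T|^2$, $|{}^{0}F_A|^2$ and the cosine and sine products, from which the amplitudes and the phase difference can be extracted.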
The formulation of equation (14) also provides the basis for understanding the information content in the MAD experiment. Without approximation, the intensity differences between Friedel mates or their symmetry equivalents follow directly from equation (14) and, in the typical situation where anomalous scattering is weak relative to normal scattering, they give the Bijvoet difference $\Delta F_{\pm h}$ of equation (16). Similarly, a measure of phase information coming from intensity variation with wavelength, the anomalous dispersion, derives from equation (14). Taking averages over Friedel mates, again without approximation, the difference in intensities at two selected wavelengths $\lambda_i$ and $\lambda_j$ comprises two terms and, unless anomalous scattering is very strong, the second term greatly dominates; moreover, $|{}^{0}F_T(h)| \simeq \tfrac{1}{2}[\langle|{}^{\lambda_i}F(h)|\rangle + \langle|{}^{\lambda_j}F(h)|\rangle]$. This then gives the dispersive difference $\Delta F_{\Delta\lambda}$ of equation (18). Comparing equations (16) and (18), one sees that Bijvoet differences are proportional to $f''(\lambda)$ and to $\sin(\Delta\varphi)$, while dispersive differences are proportional to $|f'(\lambda_i) - f'(\lambda_j)|$ and to $\cos(\Delta\varphi)$, which demonstrates the orthogonality of information needed for definitive phase evaluation. The Bijvoet and dispersive differences given by equations (16) and (18) also serve as the basis for estimating the expected ratio of averaged anomalous diffraction to averaged total diffraction. Ratios are obtained for the root-mean-squared (r.m.s.) values of the respective diffraction differences relative to that of the total normal diffraction (Hendrickson, 1991),
r.m.s.($\Delta F_{\pm h}$)/r.m.s.(${}^{0}F_T$) $\simeq (N_A/2N_T)^{1/2}\,[2f''(\lambda)/Z_{\mathrm{eff}}]$ (19)
and r.m.s.($\Delta F_{\Delta\lambda}$)/r.m.s. […], where $N_A$ is the number of anomalous scatterers with scattering factors $f'$ and $f''$, $N_T$ is the number of non-H protein atoms and $Z_{\mathrm{eff}}$ is the effective atomic number for an average non-H atom. For typical protein molecules, $Z_{\mathrm{eff}}$ = 6.7.
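The sin/cos orthogonality and the expected signal ratios described above can be checked numerically. The following is a minimal sketch, assuming the standard factorized MAD intensity expression with $a = (f'^2 + f''^2)/(f^0)^2$, $b = 2f'/f^0$ and $c = 2f''/f^0$; all function names are hypothetical. The Bijvoet ratio follows equation (19) as given in the text, whereas the dispersive ratio is the analogous form from the literature (Hendrickson, 1991) and is an assumption here because that expression is incomplete in the text.

```python
import math


def mad_coefficients(f_prime, f_double_prime, f0):
    """Return (a, b, c) of the assumed factorized MAD intensity expression."""
    a = (f_prime ** 2 + f_double_prime ** 2) / f0 ** 2
    b = 2.0 * f_prime / f0
    c = 2.0 * f_double_prime / f0
    return a, b, c


def mad_intensity(ft, fa, dphi, f_prime, f_double_prime, f0, sign=1):
    """Intensity |F(+h)|^2 (sign=+1) or |F(-h)|^2 (sign=-1); dphi in radians.

    ft and fa are the wavelength-invariant amplitudes |0F_T| and |0F_A|.
    """
    a, b, c = mad_coefficients(f_prime, f_double_prime, f0)
    return (ft ** 2 + a * fa ** 2
            + b * ft * fa * math.cos(dphi)
            + sign * c * ft * fa * math.sin(dphi))


def bijvoet_difference(ft, fa, dphi, f_prime, f_double_prime, f0):
    """Exact |F(+h)| - |F(-h)| under the assumed intensity expression."""
    plus = math.sqrt(mad_intensity(ft, fa, dphi, f_prime, f_double_prime, f0, +1))
    minus = math.sqrt(mad_intensity(ft, fa, dphi, f_prime, f_double_prime, f0, -1))
    return plus - minus


def expected_bijvoet_ratio(n_a, n_t, f_double_prime, z_eff=6.7):
    """Equation (19): (N_A / 2 N_T)^(1/2) * [2 f'' / Z_eff]."""
    return math.sqrt(n_a / (2.0 * n_t)) * (2.0 * f_double_prime / z_eff)


def expected_dispersive_ratio(n_a, n_t, f_prime_i, f_prime_j, z_eff=6.7):
    """Assumed dispersive analogue: (N_A / 2 N_T)^(1/2) * |f'_i - f'_j| / Z_eff."""
    return math.sqrt(n_a / (2.0 * n_t)) * abs(f_prime_i - f_prime_j) / z_eff


# Hypothetical example: 6 anomalous scatterers among 1800 non-H atoms,
# with illustrative (not measured) scattering-factor values in electrons.
print("Bijvoet ratio:   ", expected_bijvoet_ratio(6, 1800, f_double_prime=6.0))
print("Dispersive ratio:", expected_dispersive_ratio(6, 1800, -10.0, -3.0))
```

Under these assumptions the exact Bijvoet difference stays close to the weak-scattering approximation $c(\lambda)\,|{}^{0}F_A|\sin\Delta\varphi$ as long as $|{}^{0}F_A|$ is small relative to $|{}^{0}F_T|$.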

Scattering-factor evaluation for MAD optimization
Anomalous scattering factors, which are needed for the design and analysis of MAD experiments, can be evaluated from fluorescence spectra measured from the very crystal used for diffraction measurements. The normal, Thomson scattering factors are obtained by transformation from quantum calculations of radial electron-density distributions. Experimental evaluation of anomalous scattering is important, however, because resonant features depend on transitions to unoccupied orbitals, which are critically affected by details of the molecular environment. Quantum calculations (Cromer & Liberman, 1970) for the isolated atoms provide a baseline for designs; however, the molecular reality is often appreciably different near the absorption edge itself. The calculated values remain valid at positions away from the edge, and we standardly normalize both the $f''$ spectrum and the Kramers-Kronig-derived $f'$ spectrum to the theoretical spectra, splicing experimental near-edge features into the theoretical framework (Hendrickson et al., 1988; Evans & Pettifer, 2001). Such spectra are shown in Fig. 5 for selenium from a selenomethionyl protein.
While full optimization of a MAD experiment depends on varied considerations, extrema from the spectra usually dictate the wavelengths of choice. From the Bijvoet-difference dependence of equation (16), one clearly seeks the wavelength of maximal $f''$ ($\lambda_3$ in Fig. 5), and from the dispersive-difference dependence of equation (18) and knowledge of the theoretical form of the $f'$ spectrum one chooses the wavelength of minimal $f'$ ($\lambda_2$ in Fig. 5) for contrast with one or more remote wavelengths ($\lambda_1$ and $\lambda_4$ in Fig. 5). Note that the $f'$ spectrum is roughly the negative of the first derivative of the $f''$ spectrum. Thus, the $f'$ minimum is at the rising inflection point of the $f''$ spectrum and there is a local $f'$ maximum at the descending inflection point.
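The selection rule just described (peak of $f''$ for the Bijvoet signal, minimum of $f'$ for the dispersive contrast) can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the tabulated spectrum is invented for the example and is not measured data. Remote wavelengths are left to the experimenter, as in the text.

```python
HC_EV_ANGSTROM = 12398.42  # Planck constant times speed of light, in eV * angstrom


def energy_to_wavelength(energy_ev):
    """Convert photon energy (eV) to wavelength (angstrom)."""
    return HC_EV_ANGSTROM / energy_ev


def choose_mad_energies(energies, f_prime, f_double_prime):
    """Return (peak_energy, inflection_energy) from tabulated spectra.

    peak_energy:       energy at which f'' is maximal (Bijvoet signal).
    inflection_energy: energy at which f' is minimal (dispersive contrast).
    """
    if not (len(energies) == len(f_prime) == len(f_double_prime)):
        raise ValueError("spectra must be sampled at the same energies")
    indices = range(len(energies))
    peak_index = max(indices, key=lambda i: f_double_prime[i])
    inflection_index = min(indices, key=lambda i: f_prime[i])
    return energies[peak_index], energies[inflection_index]


# Invented, illustrative spectrum near an absorption edge (energies in eV,
# scattering factors in electrons); real values come from fluorescence scans.
energies = [12600.0, 12650.0, 12656.0, 12658.0, 12660.0, 12664.0, 12700.0, 12800.0]
f_double_prime = [0.5, 0.8, 2.5, 5.0, 6.5, 4.5, 3.8, 3.7]
f_prime = [-4.0, -6.0, -9.5, -10.5, -8.0, -6.5, -5.0, -3.5]

peak, inflection = choose_mad_energies(energies, f_prime, f_double_prime)
print("peak:", peak, "eV =", round(energy_to_wavelength(peak), 4), "A")
print("inflection:", inflection, "eV =", round(energy_to_wavelength(inflection), 4), "A")
```

In this toy table the $f'$ minimum falls just below the $f''$ peak in energy, in line with the statement that the $f'$ minimum sits at the rising inflection point of the $f''$ spectrum.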
Features in anomalous scattering spectra vary widely among elements and absorption edges. Some are nearly featureless (notably mercury) and others are incredibly sharp and strong. Fig. 6(a) shows the sharp features of uranium spectra. The Bijvoet scattering strength at the $M_{\mathrm{V}}$ peak of $f''$ ($\lambda_2$) is 105 electrons, and the dispersive scattering difference $|f'(\lambda_3) - f'(\lambda_1)|$ between values at the descending $f''$ inflection point ($\lambda_3$) and the rising $f''$ inflection point ($\lambda_1$) is 100 electrons, over a peak width of only 5 eV. Low-energy X-ray diffraction experiments are challenging; nevertheless, we did accomplish a U $M_{\mathrm{IV}}$-edge MAD structure test on uranyl elastase at 3.2 Å resolution (Liu et al., 2001). Fig. 6(b) shows the $f''$ spectrum at the Yb $L_{\mathrm{III}}$ edge from a complex with N-cadherin, which has a Bijvoet-peak scattering strength of 30.2 electrons (Shapiro et al., 1995). The anomalous scattering strength increases in general from K to L to M edges, as seen in the at-scale comparison of $f''$ spectra in Fig. 6(c), and variably so depending on bonding and coordination interactions.
In modern practice, although fluorescence spectra are used to define the maxima and inflection points for wavelength selections, the actual anomalous scattering factors themselves are quantified by refinement, following earlier practice (Templeton & Templeton, 1982; Fanchon & Hendrickson, 1990; Weis et al., 1991). It is even possible to perform site-specific scattering-factor refinements to yield spectra from which redox states can be deduced (Spatzal et al., 2016).
From the outset, the aspiration for MAD phasing was for the resolution of the phase ambiguity that had vexed SAD and SIR experiments. This indeed was accomplished. For example, Fig. 8 shows experimental electron-density maps for the substrate-binding domain (SBD) of selenomethionyl DnaK compared with those from the refined atomic model.
[Figure caption, partially recovered: (a) […] (Hendrickson et al., 1989). (b) Selenomethionyl ribonuclease H from E. coli (Yang et al., 1990). (c) Lectin domain from rat mannose-binding protein (Weis et al., 1991). (d) Human glycoprotein hormone chorionic gonadotropin (Wu et al., 1994). (e) Tyrosine kinase domain of the human insulin receptor (Hubbard et al., 1994). (f) Adhesive domain CD1 of murine N-cadherin (Shapiro et al., 1995). Adapted with permission from AAAS (b) and (c), Elsevier (d) and Springer Nature (e, f).]
Three technical developments contributed especially strongly to the efficacy and explosive growth of MAD phasing: MAD-adapted synchrotron beamlines, selenomethionyl proteins and cryoprotection. Initially, we performed experiments at various synchrotrons around the world, improvising strategies and employing varied detector systems; we exploited several different anomalous elements, introducing them in various ways; and we measured data from crystals mounted in glass capillaries, either at room temperature or cooled slightly to ~0 °C. Later, we and others used beamlines specifically designed or adapted for MAD, commonly incorporated selenomethionine through recombinant protein expression, and universally adopted cryoprotection to ~100 K to control radiation damage.
A MAD beamline needs to provide fine control of the X-ray energy at any of many potential absorption edges, precise goniometry for crystal positioning, and a suitable diffraction detector system. We designed and built beamline X4A at NSLS expressly for our MAD experiments (Staudenmann et al., 1989), and other similarly purposed early beamlines (for example 5.0.2 at ALS, BM14 at ESRF and SBC-CAT 19ID at APS) also accommodated the growing demand for MAD. Somewhat later, the fixed-wavelength NE-CAT 24ID-E beamline at APS was set up to address Se SAD.
Although the metalloproteins and heavy-atom derivatives used for MIR are excellent sources for MAD phasing, a more general phasing vehicle is clearly desirable. Selenomethionine is just such an agent. We had used selenolanthionine in our initial MAD trials (Hendrickson, 1985), and we synthesized selenobiotin for our initial novel MAD structure (Hendrickson et al., 1989); hence, appreciating the potential benefit from the analogous replacement for methionine in proteins, we set out to produce selenomethionyl proteins. We found that Cowie & Cohen (1957), studying translation, had already done so in Escherichia coli, and we confirmed that E. coli cultures could be maintained when supplied exclusively with selenomethionine, meaning that recombinant proteins could have 100% replacement of selenium for sulfur in their methionines (Hendrickson et al., 1990). Selenium incorporation also proved to be possible in mammalian cell cultures (Wu et al., 1994), although usually not at the 100% level. Selenomethionyl proteins are widely used for MAD and SAD phasing (Hendrickson, 2014).
MAD experiments are demanding in that one seeks to obtain compatible data sets at three or four wavelengths, which is a serious challenge because of radiation damage. To the good fortune of MAD experiments, techniques for the cryopreservation of macromolecular crystals in vitreous ice were innovated (Hope, 1988) and perfected (Rodgers, 1994) contemporaneously with the introduction of MAD. The rate of radiation damage is reduced ~50-fold at 100 K compared with 300 K (Warkentin & Thorne, 2010). Our cryoprotected MAD structure of human chorionic gonadotropin (Wu et al., 1994) was our first of what became the standard for MAD and SAD diffraction studies.

Alternative approaches in MAD phasing analysis
Our theoretical analysis of MAD (equation 14) describes the set of observed intensities for the Bijvoet mates at various wavelengths, $\{|{}^{\lambda}F(\pm h)|^2\}$, in terms that separate wavelength-variant factors [$a(\lambda)$, $b(\lambda)$ and $c(\lambda)$] from factors involving the wavelength-invariant structure-factor quantities $|{}^{0}F_T(h)|$, $|{}^{0}F_A(h)|$ and $\Delta\varphi = ({}^{0}\varphi_T - {}^{0}\varphi_A)$. We devised a procedure for solving this system of equations for each reflection h subject to trigonometric identities. The set of $\{|{}^{0}F_A(h)|\}$ structure factors serves to define the atomic substructure of anomalous scattering centers, by Patterson or direct methods, and then to refine this substructure, subject to enantiomorph ambiguity. Phases calculated from the substructure then solve the phase problem, ${}^{0}\varphi_T = \Delta\varphi + {}^{0}\varphi_A^{\mathrm{calc}}$, provided that the substructure $\{\pm r_A\}$ is taken in the correct handedness.
The steps of this algebraic approach to MAD analysis, embodied in the program MADLSQ, are schematized in Fig. 9. The effectiveness of the process is epitomized by results from MAD phasing for the SBD of the molecular chaperone DnaK (Zhu et al., 1996). This structure was refined at 2.0 Å resolution after model building based on a 2.3 Å resolution electron-density map, which in turn was derived from a substructure determined at 3.0 Å resolution. We adopted a 'phase first, merge later' strategy in MADLSQ, whereby replication statistics define the precision of the analysis. Here, the replicates at 2.3 Å resolution gave merging statistics of $R(|{}^{0}F_T|)$ = 0.051, $R(|{}^{0}F_A|)$ = 0.356, $\langle\Delta(\Delta\varphi)\rangle$ = 36.5° and $\langle\sigma(\Delta\varphi)\rangle$ = 17.2°, where $R = \sum_j \sum_i \big|\,|F_i| - \langle|F|\rangle_j\big| / \sum_j \sum_i |F_i|$ for replicates i within each unique reflection j, $\Delta(\Delta\varphi)$ is the discrepancy between $\Delta\varphi$ estimates from a pair of replicates and $\sigma(\Delta\varphi)$ is the estimate of the $\Delta\varphi$ error from the least-squares minimization. $R(|{}^{0}F_T|)$ is commensurate with the $R_{\mathrm{sym}}$ value of 0.052, the higher value for $R(|{}^{0}F_A|)$ is compatible with expectation from the partial structure contribution of the six Se atoms to the total diffraction, and the precision of phase evaluations is consistent with the comparability of the experimental and the refined electron-density maps (Fig. 8).
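A replicate merging statistic of the kind quoted above can be sketched as follows. This assumes the conventional absolute-deviation form (sum of absolute deviations of replicate amplitudes from their per-reflection mean, divided by the sum of amplitudes); the function name and data layout are hypothetical, and this is not the MADLSQ implementation.

```python
def replicate_r_factor(groups):
    """Merging R over replicate amplitudes.

    groups maps each unique reflection (for example an (h, k, l) tuple) to the
    list of replicate amplitude estimates |F_i| for that reflection.
    Reflections with fewer than two replicates carry no replication
    information and are skipped (an assumption of this sketch).
    """
    numerator = 0.0
    denominator = 0.0
    for amplitudes in groups.values():
        if len(amplitudes) < 2:
            continue
        mean = sum(amplitudes) / len(amplitudes)
        numerator += sum(abs(amp - mean) for amp in amplitudes)
        denominator += sum(amplitudes)
    if denominator == 0.0:
        raise ValueError("no replicated reflections with non-zero amplitude")
    return numerator / denominator


# Illustrative values only.
example = {
    (1, 0, 0): [10.0, 12.0],
    (0, 1, 0): [20.0, 20.0],
    (0, 0, 1): [5.0],  # single observation, skipped
}
print(replicate_r_factor(example))
```

Applying the same function to replicate estimates of $|{}^{0}F_T|$ and of $|{}^{0}F_A|$ would give the two R values separately, which is how the contrast between 0.051 and 0.356 in the text can be read.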
A phase probability approach based on the factorized formalism of equation (14) provides a practical alternative to the algebraic approach of MADLSQ. This approach circumvents the defect that the least-squares analysis requires a minimal set of data for each reflection, whereas phase information often still exists when only a subset is available. A probability analysis based on equation (14) provides a comprehensive description of the phase information from anomalous diffraction (Pähler et al., 1990). In our MADABCD reformulation of the procedure, also illustrated in Fig. 9, we first estimate $|{}^{0}F_T|$ from the wavelength-dependent observations, $|{}^{0}F_T| \simeq \langle|{}^{\lambda_i}F(h)|^2\rangle$; next, recognizing experience from numerous SAD analyses showing that peak Bijvoet differences ($\Delta F_{\pm h}$ in equation 16) suffice for substructure determinations, we calculate $|{}^{0}F_A|$ and ${}^{0}\varphi_A$ from the refined substructure $\{r_A\}$; and finally we produce the joint probability distributions, $P(|{}^{0}F_T|, {}^{0}\varphi_T)$, which upon integration yield the best Fourier coefficients. Through integration over all possible values of $|{}^{0}F_T|$ in this two-dimensional probability function, a phase probability, $P({}^{0}\varphi_T)$, consistent with all available data is obtained (Fig. 9), and similarly for $P(|{}^{0}F_T|)$.
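How a phase probability distribution turns into a 'best' Fourier coefficient can be illustrated with a one-dimensional sketch. It assumes the usual centroid construction, in which the best phase and the figure of merit are the argument and modulus of the probability-weighted mean of $\exp(i\varphi)$; the function name is hypothetical, the distribution is taken as already marginalized over amplitude, and this is not the MADABCD code.

```python
import cmath
import math


def centroid_phase(prob):
    """Return (best_phase_degrees, figure_of_merit) for a sampled P(phi).

    prob[k] is the non-negative, unnormalized probability at
    phi_k = 360 * k / len(prob) degrees.
    """
    total = sum(prob)
    if total <= 0.0:
        raise ValueError("probability distribution must have positive weight")
    n = len(prob)
    centroid = sum(p * cmath.exp(2j * math.pi * k / n)
                   for k, p in enumerate(prob)) / total
    best_phase = math.degrees(cmath.phase(centroid)) % 360.0
    return best_phase, abs(centroid)


# A sharply unimodal distribution (resolved phase) at 90 degrees.
unimodal = [0.0] * 360
unimodal[90] = 1.0

# A bimodal distribution with two equally likely alternatives, at 60 and
# 120 degrees, mimicking an unresolved two-fold phase ambiguity.
bimodal = [0.0] * 360
bimodal[60] = 1.0
bimodal[120] = 1.0

print(centroid_phase(unimodal))
print(centroid_phase(bimodal))
```

The bimodal case shows why an unresolved ambiguity blurs the map: the centroid falls midway between the two alternatives and the figure of merit drops below unity, so the corresponding Fourier coefficient is down-weighted.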
A third alternative can be called the MAD-as-MIR approach, akin to the suggestions of Phillips & Hodgson (1980). A MAD experiment can be considered to be an in situ MIR experiment where differences are realized from physics instead of from chemistry. The dispersive differences (equation 18) are like those of MIR differences and the Bijvoet differences (equation 16) are like the anomalous differences that are included to give MIRAS. In the MAD-as-MIR approach, the data set at one of the wavelengths is chosen arbitrarily (also rather unsatisfyingly) as the reference ($F_P$) data set, and differences are taken relative to these data. Ramakrishnan et al. (1993) analyzed the structure of histone H5 using both MADLSQ and a MAD-as-MIR approach that employed MLPHARE (Otwinowski, 1991). They reported that the latter 'gave somewhat better maps than the algebraic formalism', and this method of analysis quickly took hold (Ramakrishnan & Biou, 1997).
It is worth emphasizing that anomalous diffraction defines absolute configuration (Bijvoet et al., 1951), while conventional diffraction analysis is insensitive to handedness. Thus, as mentioned above, the substructure of anomalous scatterers is unavoidably ambiguous $\{\pm r_A\}$, it being the case that the set of $\{r_A\}$ positions and its enantiomorphic set $\{-r_A\}$ equally satisfy the $|{}^{0}F_A|$ or $|\Delta F_{\pm h}|$ coefficients. On the other hand, only the correct enantiomer will be compatible with the signs as well as the magnitudes of the Bijvoet differences $\Delta F_{\pm h}$. A statistical basis for resolving the enantiomorph ambiguity has proved to be elusive, but electron-density maps derived from the alternatives are decisive when phase evaluations are accurate. The density map for DnaK SBD in the correct hand [Fig. 10(b)] shows clear-cut molecular boundaries and structural features, whereas the map from the wrong hand [Fig. 10(a)] contains no recognizable features.
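The handedness argument can be verified with a toy structure-factor calculation. The sketch below is illustrative only: the atom positions and scattering factors are invented, the function names are hypothetical, and temperature factors and symmetry are ignored. With purely real scattering factors the amplitudes from $\{r\}$ and $\{-r\}$ are identical, whereas with a complex (anomalous) scattering factor the difference $|F(h)| - |F(-h)|$ changes sign on inversion.

```python
import cmath
import math


def structure_factor(hkl, atoms):
    """F(h) = sum_j f_j exp[2 pi i (h x_j + k y_j + l z_j)].

    atoms is a list of (x, y, z, f) with fractional coordinates and a
    scattering factor f that may be complex for anomalous scatterers.
    """
    h, k, l = hkl
    return sum(f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
               for x, y, z, f in atoms)


def invert(atoms):
    """Return the enantiomorphic set of atoms, with r replaced by -r."""
    return [(-x, -y, -z, f) for x, y, z, f in atoms]


def friedel_amplitude_difference(hkl, atoms):
    """|F(h)| - |F(-h)| for the given model."""
    h, k, l = hkl
    return abs(structure_factor((h, k, l), atoms)) - abs(structure_factor((-h, -k, -l), atoms))


hkl = (1, 2, 3)

# All-normal scatterers (real f): the two hands give the same amplitude.
normal_only = [(0.10, 0.20, 0.30, 6.0), (0.40, 0.15, 0.70, 7.0), (0.25, 0.60, 0.05, 34.0)]
print(abs(structure_factor(hkl, normal_only)), abs(structure_factor(hkl, invert(normal_only))))

# One anomalous scatterer (complex f = f0 + f' + i f''): the sign of the
# Friedel difference distinguishes the hands.
with_anomalous = [(0.10, 0.20, 0.30, 6.0), (0.40, 0.15, 0.70, 7.0), (0.25, 0.60, 0.05, 26.0 + 4.0j)]
print(friedel_amplitude_difference(hkl, with_anomalous),
      friedel_amplitude_difference(hkl, invert(with_anomalous)))
```

This is why magnitudes alone, such as $|{}^{0}F_A|$ or $|\Delta F_{\pm h}|$, cannot select the hand, while the signed Bijvoet differences can.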

Single-wavelength anomalous diffraction (SAD) and density modification
SAD is a subset of MAD, and both the theoretical formalism and many aspects of the analytical procedures also apply for SAD as for MAD. The intrinsic phase definition from SAD is sharp but, with two equally likely alternatives for each reflection, the density-map features become blurred. The orthogonality of dispersive contributions in MAD readily resolves this ambiguity; however, MAD is not always an option. The absorption edges of lighter elements (Z < 26; for example phosphorus, sulfur, potassium, calcium and manganese) and some intermediate elements (for example iodine and xenon) are inaccessible at most synchrotron beamlines (Hendrickson & Ogata, 1997) and, in any case, low-energy diffraction experiments are complicated by high absorption and large Bragg angles. Moreover, a substructure of light atoms is relatively feeble as a partial structure for resolving the phase ambiguity, as we found with the S atoms of crambin (Hendrickson & Teeter, 1981).
A hint into ambiguity resolution was already evident from our analysis of trimeric hemerythrin. Being pre-MAD, we used Cu Kα radiation to exploit the anomalous scattering from the dimeric iron center in hemerythrin (Smith et al., 1983) as 'resolved' by the iron partial structure. The initial map was inadequate for interpretation on its own; however, threefold molecular averaging and 'solvent leveling', as it was then described, improved the map dramatically. A similarly dramatic enhancement from threefold symmetry averaging had been seen in analyzing the structure of influenza virus haemagglutinin (Wilson et al., 1981), which used procedures devised by Bricogne (1976) and applied decisively in studies with 17-fold (Champness et al., 1976) and 5-fold (Winkler et al., 1977) averaging. Although noncrystallographic symmetry is not always present, fluid solvent expanses are universal features of macromolecular crystals.
Being fluid, or cryo-vitrified, these solvent regions have a featureless, constant electron density, and, in an early article on phase evaluation, I described a procedure for using solvent constancy in phase refinement through iterative Fourier cycling such as used in molecular averaging (Hendrickson, 1981). We had used this procedure in our analyses of low-resolution structures of myohemerythrin (Hendrickson et al., 1975) and octameric hemerythrin (Ward et al., 1975) as well as in tests with orthorhombic Glycera hemoglobin (Hendrickson, 1981). Wang (1985) innovated general and powerful procedures for molecular envelope definition and solvent flattening, and, as applied to anomalous scattering problems, it was called the iterative single-wavelength anomalous scattering (ISAS) method. Successes followed for a neurophysin-dipeptide complex using an iodinated derivative (Chen et al., 1991) and in the corrected analysis of Cd,Zn metallothionein (Robbins et al., 1991).
Density modification advanced beyond Wang's introduction of solvent flattening (Wang, 1985) with the incorporation of histogram matching (Zhang & Main, 1990), the systematic inclusion of molecular averaging (Cowtan & Main, 1993), the innovation of solvent flipping (Abrahams & Leslie, 1996) and the introduction of maximum-likelihood processing (Terwilliger, 2000). As implemented in programs such as DM and reviewed by Cowtan (2010), density modification became a mainstay for phase refinement and map improvement.
Figure 10 Enantiomorph definition. Alternative electron-density maps at 2.3 Å resolution are shown from the analysis of the DnaK SBD (Zhu et al., 1996). One is based on phases from the substructure $\{+r_A\}$, as deduced by Patterson analysis from $\{|{}^{0}F_A|\}$ values, and the other is based on the alternative $\{-r_A\}$ where the z coordinates of the projection are reversed and equivalent slabs are shown. Interpretable protein features and appropriately featureless solvent expanses characterize the correct alternative on the right.
The results for ambiguity resolution in SAD phasing are dramatic. This is well illustrated in our sulfur-SAD analysis of DnaK-ATP (Liu et al., 2013). A typical probability distribution shows essentially perfect phase resolution (Fig. 11) […] partial structure information from the sulfur substructure and, to make this generally applicable, DM was performed without using the noncrystallographic symmetry that happened to exist. Map quality was judged by the map correlation coefficient (MapCC), which compares a given map with that from the ultimate refined atomic model. MapCC improved from 46.6% for the SAD map to 85.3% for the DM-modified map, from which 1117 of the 1200 residues (93%) in this DnaK-ATP structure were built automatically by the ARP/wARP program (Perrakis et al., 1999).
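A map correlation coefficient of the kind used above can be sketched as a Pearson correlation between two density maps sampled on the same grid. This is a minimal illustration with a hypothetical function name; it assumes both maps are already on a common grid, origin and scale, and it ignores the symmetry and masking details that crystallographic programs handle.

```python
import math


def map_cc(rho_a, rho_b):
    """Pearson correlation between two maps given as flat sequences of
    density values sampled at the same grid points."""
    if len(rho_a) != len(rho_b) or len(rho_a) < 2:
        raise ValueError("maps must have the same number (>= 2) of grid points")
    n = len(rho_a)
    mean_a = sum(rho_a) / n
    mean_b = sum(rho_b) / n
    covariance = sum((a - mean_a) * (b - mean_b) for a, b in zip(rho_a, rho_b))
    variance_a = sum((a - mean_a) ** 2 for a in rho_a)
    variance_b = sum((b - mean_b) ** 2 for b in rho_b)
    if variance_a == 0.0 or variance_b == 0.0:
        raise ValueError("a constant map has no defined correlation")
    return covariance / math.sqrt(variance_a * variance_b)


# Illustrative values only: a reference map and a noisier estimate of it.
reference = [0.0, 1.0, 3.0, 1.0, 0.0, -1.0]
estimate = [0.2, 0.7, 2.5, 1.4, -0.3, -0.6]
print(round(100.0 * map_cc(reference, estimate), 1), "%")
```

Because the coefficient is invariant to the overall scale and offset of either map, it measures agreement of features and not of absolute density levels.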

Evolution from MAD to SAD
It took some time for the power of density-modified SAD phasing to be recognized, and even for the acronym SAD to take root. In part, the success of MAD obviated the need for an alternative. After crambin (Hendrickson & Teeter, 1981), the next single-wavelength anomalous applications were those for B.-C. Wang's ISAS studies (Chen et al., 1991; Robbins et al., 1991), and for a lead derivative of a translation-factor domain (Biou et al., 1995) and a selenomethionyl enzyme (Turner et al., 1998). 'SAD' was used in describing experiments for a GFP structure (Ormö et al., 1996), for systematic methodology tests (Dauter et al., 1999; Dauter & Dauter, 1999) and for the structure of psoriasin (Brodersen et al., 2000). Influentially, a re-evaluation of seven MAD structures of selenomethionyl proteins suggested that density-modified SAD maps rivaled MAD maps or could even rescue failed MAD analyses (Rice et al., 2000). The Jolly SAD review by Dauter et al. (2002), presenting 15 tests and novel applications, heralded the start of the SAD era.
Structures of ever-increasing size came under study, and increasingly these investigations were on selenomethionyl proteins. Consequently, the sizes of selenium substructures increased with sequence length in proportion to the methionine frequency, ~1:59. An advance of comparable importance to density modification came in using direct methods for substructure determinations. Bijvoet differences (equation 16), typically taken at the wavelength of peak $f''$, are converted to normalized structure-factor estimates for the substructure and used in direct-methods programs. Turner et al. (1998) implemented this idea, adopting the Shake-and-Bake (SnB) dual-space recycling procedure (Miller et al., 1994; Smith et al., 1998) to find 30 selenium sites. Many other selenomethionyl-protein applications soon followed (Deacon & Ealick, 1999), including a 70-selenium enzyme (Deacon et al., 2000). Schneider & Sheldrick (2002) employed an efficient tangent-formula-based version of SnB in SHELXD, and a typical result is shown in Fig. 12 from our native SAD application to DnaK-ATP (Liu et al., 2013). Here, a few (red) among 10 000 random trials stand out from random in the plot of $\mathrm{CC}_{\mathrm{all}}$ versus $\mathrm{CC}_{\mathrm{weak}}$, each an equivalent solution for this 52-atom anomalous substructure.
Among the many selenomethionyl structures solved by SAD, particularly impressive examples come from large and challenging protein complexes produced from mammalian cell cultures and analyzed at modest resolution. The first structure of RAG1/RAG2 recombinase identified 52 of 58 selenium sites at 3.7 Å resolution (Kim et al., 2015) and the first structural analysis of DNA-PKcs identified 172 of 236 selenium sites at 4.3 Å resolution (Sibanda et al., 2017). The Se sites for RAG1/RAG2 were found by SHELXD, but those for DNA-PKcs used maps that were based on a $\mathrm{Ta_6Br_{12}^{2+}}$ cluster derivative.

Native SAD
Structural analysis from native molecules, without recourse to any modification, is an aspiration for structural biologists, and advances in SAD phasing now make native SAD accessible for the vast majority of macromolecules (Liu et al., 2012; Weinert et al., 2015). The crambin structure certainly qualifies as native SAD, although it was not so named at the time; however, the crambin structure is atypical in being quite small (46 residues) and at exceptional resolution [initially 1.5 Å (Hendrickson & Teeter, 1981) and later 0.48 Å (Schmidt et al., 2011)]. Sulfur SAD as resolved by solvent flattening was brought into play by Liu et al. (2000). Suggestions emerged that native SAD phasing could routinely be based on the S atoms of proteins (Dauter et al., 1999) and the P atoms of nucleic acids (Dauter & Adamiak, 2001) and that the use of lower X-ray energy (longer wavelength) would improve effectiveness (Weiss et al., 2001; Yang et al., 2003; Wagner et al., 2016). Experiments at lower energy do present complications, however.
The absorption edges for P (2.14 keV) and S (2.47 keV) are remote from the energies that can be reached by most beamlines. Moreover, as the X-ray energy is lowered to increase $f''$ for a low-Z element, difficulties arise for detector geometry from the consequent large scattering angles, for crystal thickness from an increased absorption coefficient and for background from pathway scatter. As shown in Fig. 13, above a critical thickness the signal as actually transmitted deteriorates with the lowering of X-ray energy. Microcrystals solve the thickness problem, but then radiation damage limits the crystal lifetime; the merging of data from multiple crystals addresses radiation damage and can also improve statistics by achieving higher data multiplicity (Liu et al., 2012). Measurements at multiple orientations also increase accuracy from multiplicity (Weinert et al., 2015). Pathway absorption and scattering at lower energy can be addressed by operation in vacuum (Aurelius et al., 2017) or in a helium environment (Karasawa et al., 2022).
Even just from the five examples of our initial multi-crystal sulfur SAD experiments (Liu et al., 2012), we estimated that ~90% of then-current PDB depositions were within the limits of these examples: fewer than 1200 residues, fewer than 32 sulfur sites and better than 2.8 Å resolution [Fig. 14(a)]. Since that time, native SAD has indeed entered the mainstream (Rose et al., 2015; Liu & Hendrickson, 2015). Five more-recent applications (El Omari et al., 2014; Akey et al., 2016; Zeng et al., 2019; Basu et al., 2019) are all at resolutions worse than the previous 2.8 Å limit (3.4-2.95 Å when refined; often much lower for phasing) and extend the size limit to 2362 residues, of which 118 are sulfur-containing. Sulfur SAD determinations have also been reported from FEL experiments by serial femtosecond crystallography (Nass et al., 2020). Although many native SAD efforts have used low X-ray energy (long wavelength) to increase the anomalous scattering strength (Basu et al., 2019; Nass et al., 2020; Karasawa et al., 2022), even the sulfur signal at the Se K edge ($f''$ = 0.235 electrons) can suffice. Thus, we now use sulfur SAD phasing of trypsin at the Se K edge to test the performance of the NYX beamline [Fig. 14(b)].

Anomalous diffraction and structural genomics
As MAD and SAD were advancing, DNA-sequencing efforts were advancing ever faster. The sequence of the first bacterial genome (Fleischmann et al., 1995) was soon followed by other genome sequences, including the human genome sequence (Lander et al., 2001). Structural biology could not keep pace with gene sequencing; however, the seemingly achievable concept of structural genomics emerged (Shapiro & Lima, 1998). Here, the aspiration was for structure determination on a pan-genomic scale, aiming to produce experimental structures representative of each sequence family in all living organisms. These representative structures would serve as templates for homology modeling of particular family members, as needed. MAD and SAD phasing of selenomethionyl proteins became a foundation for efficiency in structural genomics ventures worldwide.
Selenomethionyl proteins were at the heart of the structure-determination pipelines for all four large-scale centers of the […] 2011), the Northeast Structural Genomics Consortium (NESG; Xiao et al., 2010) and the New York SGX Research Center for Structural Genomics (NYSGXRC; Bonanno et al., 2005)] and for the New York Consortium on Membrane Protein Structure (NYCOMPS) membrane-protein center (Love et al., 2010). Over the ten-year period of the PSI protein-universe effort, PSI-1 and PSI-2 deposited 5097 atomic structures in the Protein Data Bank, of which 4598 (90%) were crystal structures. These crystal structures included 1353 (29%) determined by MAD and 2168 (47%) determined by SAD (Hendrickson, 2014).
Figure 13 Dependence of transmitted anomalous signals on crystal thickness and X-ray energy. Water is taken to approximate the absorptivity of a typical macromolecular crystal, and we plot the transmitted $f''$ anomalous signal from sulfur as a function of energy and thickness. The red line shows the ridge of maximal signal in this parameter space. This presentation is adapted from data presented in Fig. 2 […]
Contributions from structural genomics have had a large impact on the growth of Protein Data Bank (PDB) holdings (Chandonia & Brenner, 2006; Levitt, 2007), with MAD and SAD predominating at 69% (Hendrickson, 2014). Because of the emphasis of structural genomics on sequence families, without regard to biological activity or other properties, structural genomics has had an exceptional impact in structural novelty. Levitt (2007) estimated that 50% of nonredundant sequence data in the PDB from 2005 until mid-2007 came from structural genomics efforts. In turn, this novelty from MAD and SAD structures was crucial in enabling the remarkable advances in protein structure prediction made by AlphaFold (Jumper et al., 2021).

Changing practice in macromolecular phasing
At the outset of protein crystallography, all structures were determined by MIR phasing, although anomalous scattering soon entered by way of MIRAS and SIRAS. Many early structures led to isomorphic variants (IVs), including ligand complexes as well as the heavy-atom derivatives used in MIR, and ultimately these included mutant variants and other modifications. Molecular replacement (MR), as we know it now, first appeared with Tollin's test (Tollin, 1969) and it was not until the structure of crambin (Hendrickson & Teeter, 1981) that anomalous scattering was used alone in what we now call SAD. As recounted above, MAD followed in 1985. It was not until 1996 that the PDB began to record depositor declarations on the method of structure determination; however, during the decade 1990-1999 we produced the annual Macromolecular Structures (MS) series (Hendrickson & Wüthrich, 2000), in which we published curated abstracts of reported structures meeting the criteria of being a biological macromolecule, novel and experimentally determined at an atomic level. From the MS series before 1996 and from PDB statistics after, I was able to chronicle the evolution of crystallographic structure determination (Hendrickson, 2014).
The record of this progression documents a succession of sea changes in the prevalent usages of phasing methods. MR had not yet appeared when Blundell & Johnson (1976) 'wrote the book' on protein crystallography, but already in MS1991 (Hendrickson & Wüthrich, 1991) we recorded that 51% of MS-qualifying structures had been determined by MR, MIR dominated for de novo phasing and only a few MAD structures had been reported. At the expense of MIR, MAD had risen to 32% of de novo structures by 1999 (MS2000), and this rise continued as deduced from PDB statistics. […] replacement share (MIR + SIR) was over 80% at start of recordings in 1997, this predominance had reversed to ~90% for anomalous diffraction (MAD + SAD) from 2008 onwards. By 2013, SAD alone accounted for ~70% of all de novo structures. I have estimated that 77% of all de novo crystal structures of macromolecules accumulated through 2012 were determined by MAD or SAD phasing (Hendrickson, 2014).
[Figure 14 caption, partially recovered: A calcium ion has a peak height of 47 and the sulfur peaks range from 9.6 to 18.9. The peaks from calcium (Ca$^{2+}$) and sulfate […]]
Over the period 1999-2013 [Fig. 15(b)] there was substantial growth in PDB depositions; however, the fraction of structures obtained by de novo phasing remained essentially constant. What changed dramatically was the portion attributed to MR. In the MS series, we distinguished an IV from an MR. An IV is defined as a new structure in a previously described lattice, which should not require new phase evaluation, whereas an MR analysis requires a molecular search and model generation in the new lattice. For some time, many investigators have used MR procedures to solve any new structure, even if in a lattice that is isomorphic with a known structure; often, the resulting structure is then declared to be determined by MR upon PDB deposition. By PDB declaration, MR accounted for 34% of 1999 PDB depositions, compared with 20% by MS2000 definition. For 2013, MR was declared for 67% of PDB depositions, whereas by my manual curation the MR fraction was 44%, as plotted in Fig. 15(b).

Phase evaluation today
Macromolecular crystallography is in a state of comfortable and productive maturity. This, however, is also how it felt in 1971 at the watershed Cold Spring Harbor meeting as the fruits of MIR were on resplendent display (Watson, 1972). The prospective impact of the now predominating methods of molecular replacement (MR) and anomalous diffraction (MAD and SAD) was nowhere evident. Meanwhile, advances in molecular biology (gene sequences, recombinant proteins, mutational tests of structure-inspired hypotheses and selenomethionyl proteins), instrumentation (computers, molecular graphics, detectors, synchrotron beamlines and FELs) and diffraction methods (restrained refinement, MR, DM, MAD and SAD) effected revolutions in how we practice macromolecular crystallography today. It would be foolhardy to feel comfortable in our current maturity.
Developments in kindred disciplines continue to impact crystallography, and crystallography reciprocates.Recent advances in cryogenic electron microscopy (cryo-EM) and protein structure prediction (AlphaFold) relate cogently to crystallographic phase evaluation.
The 'resolution revolution' in cryo-EM (Kühlbrandt, 2014) quickly spawned remarkable advances in structural biology (Saibil, 2022). Direct imaging, even at a modest 4 Å resolution, was awe-inspiring to those who have faced the phase problem in crystallographic imaging. Most macromolecular crystallographers have become electron microscopists as well; this is much to the benefit of advancing cryo-EM, and the cryo-EM images are inspirational for advancing crystallography. X-ray crystallography can complement cryo-EM. A noteworthy example is in element identification, notably for bound ions. Cryo-EM is problematic in this regard, whereas refinement of the $f''$ anomalous scattering contribution from a site is usually definitive (Liu et al., 2013; Vecchioni et al., 2023).
I made the case above that MAD- and SAD-phased structures have contributed 77% of the novel PDB content (evaluated through 2012) on which the neural-network training needed to perfect AlphaFold was based (Jumper et al., 2021). The effectiveness of AlphaFold is widely appreciated in modern biology generally, and notably so for macromolecular crystallography. AlphaFold models immediately became the search models of choice for molecular replacement (McCoy et al., 2022; Gong et al., 2023; Terwilliger et al., 2023). AlphaFold accuracy is such that this development is expected to expand the reach of MR.
MAD and SAD will be at the ready for otherwise unmet challenges; however, cryo-EM, AlphaFold and future advances will likely decrease the dependence on crystallographic phase evaluation. Just as MIR gave way to MAD, and MAD gave way to SAD (even native SAD), SAD may give way to AlphaFold-guided MR.

Figure 2 Harker phasing diagrams illustrating the phase ambiguity inherent in SIR phasing (a) and SAD phasing (c), and its resolution by two derivatives in MIR (b) or by SIRAS (d). The 'native' amplitude circles (heavy lines) each have radius $|F_P|$. The single isomorphous pair from SIR derivative 1 yields options at Q or R, whereas derivative 2 gives alternatives Q and $R'$. The anomalous scattering from derivative 1 gives alternatives Q and $Q'$. Neglecting experimental error, in principle either MIR or SIRAS resolves the phase ambiguity. Adapted from Figs. 4 and 6 of Hendrickson (1981) with permission from Macmillan.
Figure 3 (a) Harker diagram illustrating the lack-of-closure error. (b) Phase probability associated with the single isomorphous derivative in (a). Reproduced from Fig. 5 of Hendrickson (1981) with permission from Macmillan.
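The twofold ambiguity of Fig. 2(a) and the lack-of-closure probability of Fig. 3 can be illustrated numerically. The following is a minimal Python sketch, not code from the article: it assumes that the heavy-atom structure factor F_H is already known, it uses a Gaussian (Blow-Crick style) error model with a single hypothetical standard deviation `sigma`, and all function names and numerical values are illustrative assumptions.

```python
import cmath
import math


def sir_phase_choices(fp_amp, fph_amp, fh):
    """Two native phases (radians) consistent with one isomorphous pair.

    fp_amp  : native amplitude |F_P|
    fph_amp : derivative amplitude |F_PH|
    fh      : complex heavy-atom structure factor F_H (assumed known)
    """
    # Law of cosines on the triangle F_PH = F_P + F_H.
    cos_delta = (fph_amp**2 - fp_amp**2 - abs(fh) ** 2) / (2.0 * fp_amp * abs(fh))
    cos_delta = max(-1.0, min(1.0, cos_delta))  # clamp: errors can push it past +/-1
    delta = math.acos(cos_delta)
    phi_h = cmath.phase(fh)
    return phi_h + delta, phi_h - delta


def lack_of_closure(phi, fp_amp, fph_amp, fh):
    """Observed |F_PH| minus the |F_PH| implied by a trial native phase phi."""
    return fph_amp - abs(cmath.rect(fp_amp, phi) + fh)


def phase_probability(fp_amp, fph_amp, fh, sigma, n_steps=360):
    """Normalized Gaussian phase probability sampled on a regular phase grid."""
    phis = [2.0 * math.pi * k / n_steps for k in range(n_steps)]
    weights = [
        math.exp(-lack_of_closure(p, fp_amp, fph_amp, fh) ** 2 / (2.0 * sigma**2))
        for p in phis
    ]
    total = sum(weights)
    return phis, [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical reflection: |F_P| = 100 at 40 degrees, F_H = 30 at 110 degrees.
    fh = cmath.rect(30.0, math.radians(110.0))
    fph_amp = abs(cmath.rect(100.0, math.radians(40.0)) + fh)
    for phi in sir_phase_choices(100.0, fph_amp, fh):
        print(round(math.degrees(phi) % 360.0, 3))
    phis, probs = phase_probability(100.0, fph_amp, fh, sigma=5.0)
    print(max(probs))
```

With error-free amplitudes the two returned phases are symmetric about the heavy-atom phase, and the sampled probability is bimodal with equal peaks at those two phases; a second derivative or anomalous data would be needed to choose between them.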
Figure 4 SAD phasing structural analysis of crambin. (a) A portion of what would now be called the sulfur SAD map for crambin at 1.5 Å resolution. The densest features are at S atoms. (b) The same portion as in (a) for the final $2F_o - F_c$ map after refinement to R = 0.104 at 1.5 Å resolution. (c) All-atom atomic model of crambin drawn by Irving Geis. S atoms are colored yellow and the first helix is filled in blue. (a) and (b) are adapted from Fig. 2 of Hendrickson & Teeter (1981) with permission from Springer Nature, and (c) is adapted from a drawing by Irving Geis with permission from the Howard Hughes Medical Institute, which now owns the Geis collection.
illustrates the degree of perfection in resulting experimental electron-density maps from our MAD structure of the
Wayne A. Hendrickson, Facing the phase problem. IUCrJ (2023). 10, 521-543

Figure 5 Anomalous scattering-factor spectra at the Se K edge of selenomethionyl human chorionic gonadotropin, derived from fluorescence measurement from a crystal as fitted with Kramers-Kronig transformation to theoretical selenium factors outside the range 12.648-12.676 keV. Adapted from Fig. 11 of Wu et al. (1994) with permission from Elsevier.

Figure 6 Comparison of sharp anomalous scattering spectra at M, L and K edges. (a) Resonance features at the $M_\mathrm{V}$ and $M_\mathrm{IV}$ absorption edges of uranyl nitrate (Liu et al., 2001). Wavelength positions at the $M_\mathrm{V}$ edge are marked by red lines. (b) Spectrum of the $f''$ component of Yb $L_\mathrm{III}$ anomalous scattering from ytterbium-derivatized N-cadherin D1 (Shapiro et al., 1995). (c) Direct at-scale comparison of the U $M_\mathrm{V}$ edge, Yb $L_\mathrm{III}$ edge and Se K edge in Figs. 6(a), 6(b) and 5, respectively. The U $M_\mathrm{IV}$, Yb $L_\mathrm{III}$ and Se K edge energies are on a linear scale of X-ray energy. (a) and (c) are reproduced from Liu et al. (2001) with permission from the National Academy of Sciences. Copyright (2001) National Academy of Sciences. (b) is reproduced from Shapiro et al. (1995) with permission from Springer Nature.

Figure 9 Schematic diagram of phase evaluation from MAD data, Bijvoet mates at multiple wavelengths $\{|{}^{\lambda}F(\pm\mathbf{h})|^{2}\}$, by two alternative approaches. Algebraic analysis by MADLSQ deduces $|{}^{0}F_T|$, $|{}^{0}F_A|$ and $\Delta\varphi = {}^{0}\varphi_T - {}^{0}\varphi_A$ for each reflection, from which $\{|{}^{0}F_A|\}$ generates the substructure of anomalous scatterers $\{\pm\mathbf{r}_A\}$. When taken in the correct hand, this solves the phase problem. In the phase probability pathway of MADABCD, the average of $\{|{}^{\lambda}F(\pm\mathbf{h})|^{2}\}$ provides an estimate for $|{}^{0}F_T|$, the peak Bijvoet differences $\{\Delta F_{\pm\mathbf{h}}\}$ yield $\{\pm\mathbf{r}_A\}$, and these observations form the basis for computing the joint probability $P(|{}^{0}F_T|, {}^{0}\varphi_A)$, an example of which is shown in the inset.
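The algebraic route described in the Figure 9 caption can be illustrated with a small, hedged Python sketch. It is not the MADLSQ implementation: an exact solve from three wavelengths stands in for the least-squares fit over redundant data, the data are simulated without error, and the scattering factors (`F0 = 34` and the listed f', f'' pairs) are hypothetical values chosen only for illustration, as are all function names. The sketch relies on the standard MAD relation in which each Bijvoet intensity equals |F_T|^2 + a|F_A|^2 + b|F_T||F_A|cos(Δφ) ± c|F_T||F_A|sin(Δφ), with a = (f'^2 + f''^2)/f0^2, b = 2f'/f0 and c = 2f''/f0.

```python
import cmath
import math


def mad_coefficients(f0, f_prime, f_dprime):
    """Wavelength-dependent factors a, b, c of the MAD equation."""
    a = (f_prime**2 + f_dprime**2) / f0**2
    b = 2.0 * f_prime / f0
    c = 2.0 * f_dprime / f0
    return a, b, c


def simulate_bijvoet_pair(ft, fa, f0, f_prime, f_dprime):
    """Intensities |F(+h)|^2 and |F(-h)|^2 from complex normal-scattering parts.

    ft : total normal-scattering structure factor
    fa : normal-scattering part from the anomalous scatterers
    """
    ratio = complex(f_prime, f_dprime) / f0
    i_plus = abs(ft + ratio * fa) ** 2
    i_minus = abs(ft.conjugate() + ratio * fa.conjugate()) ** 2
    return i_plus, i_minus


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve_mad(observations):
    """Recover |F_T|, |F_A| and delta-phi (radians) from three wavelengths.

    observations : three tuples (a, b, c, i_plus, i_minus)
    """
    if len(observations) != 3:
        raise ValueError("this sketch expects exactly three wavelengths")
    rows = [[1.0, a, b] for a, b, _c, _ip, _im in observations]
    means = [(ip + im) / 2.0 for _a, _b, _c, ip, im in observations]
    det = det3(rows)
    solution = []
    for col in range(3):  # Cramer's rule, one unknown per column
        m = [row[:] for row in rows]
        for k in range(3):
            m[k][col] = means[k]
        solution.append(det3(m) / det)
    ft_sq, fa_sq, cos_term = solution
    # Bijvoet half-differences isolate the sine term: (I+ - I-)/2 = c * sin_term.
    num = sum(c * (ip - im) / 2.0 for _a, _b, c, ip, im in observations)
    den = sum(c * c for _a, _b, c, _ip, _im in observations)
    sin_term = num / den
    return (math.sqrt(max(ft_sq, 0.0)), math.sqrt(max(fa_sq, 0.0)),
            math.atan2(sin_term, cos_term))


if __name__ == "__main__":
    F0 = 34.0                                            # hypothetical f0
    FACTORS = [(-8.0, 6.0), (-10.5, 3.0), (-3.0, 3.5)]   # hypothetical (f', f'')
    ft = cmath.rect(100.0, math.radians(75.0))
    fa = cmath.rect(20.0, math.radians(30.0))
    obs = [mad_coefficients(F0, fp, fpp) + simulate_bijvoet_pair(ft, fa, F0, fp, fpp)
           for fp, fpp in FACTORS]
    print(solve_mad(obs))
```

The Bijvoet averages determine the two squared amplitudes and the cosine term, while the Bijvoet differences determine the sine term, so the phase difference follows from a two-argument arctangent without a sign ambiguity once the hand of the substructure is fixed.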
Figure 12 Profile of correlation coefficients (CCs) from SHELXD for the sulfur substructure in the SAD analysis of DnaK-ATP (Liu et al., 2013). The distribution of $\mathrm{CC}_\mathrm{all}$ versus $\mathrm{CC}_\mathrm{weak}$ values is shown for 10 000 trials; the 36 successful solutions are colored red and the 99.6% random results are shown in blue. Reproduced from Fig. 5(f) of Liu et al. (2013).

Figure 11 SAD ambiguity resolution from density modification. Examples are taken from the sulfur SAD analysis of DnaK-ATP (Liu et al., 2013). (a) Phase probability distributions for a particular reflection as evaluated in Phaser from SAD combined with the S-atom partial structure (red) and after using DM for density modification that excluded molecular averaging. (b) A portion of the electron-density distribution phased from SAD combined with the sulfur partial structure. (c) The same portion of the map after density modification.
Figure 14 Sulfur substructures from native SAD analyses. (a) Ribbon diagram of the TorT-TorS$_\mathrm{S}$ ligand-histidine kinase complex showing the 28 S atoms (yellow) and three sulfate ions (yellow with red O atoms) used to define 1148 ordered protein residues. This structural analysis was performed at 7 keV [$f''$(S) = 0.73 electrons]. (b) Cα backbone model of bovine trypsin showing the 16 peaks above 6σ in a Bijvoet difference map. A calcium ion has a peak height of 47 and the sulfur peaks range from 9.6 to 18.9. The peaks from calcium (Ca$^{2+}$) and sulfate (SO$_4^{2-}$) ions and from two methionine S atoms (Met) are labeled. The other 12 peaks are in disulfide-bridged pairs. This structural analysis was performed at 12.7 keV [$f''$(S) = 0.235 electrons]. (a) was adapted from Fig. 9(d) of Liu et al. (2013).

Figure 15 Changing practice in macromolecular structure determination. (a) De novo PDB depositions. Fractional contributions are shown, year by year, from isomorphous replacement, MIR (red) and SIR (pink), compared with anomalous diffraction, MAD (green) and SAD (chartreuse), and with ab initio methods (blue). We parsed depositor declarations to the PDB from 1997 through 10 December 2013, counting multiple declarations into each. The orange line traces the total number of de novo depositions with time. (b) Pie charts of the major categories of PDB depositions (de novo, MR for molecular replacement and IV for isomorphous variant) in 1999 and in 2013. The area of each pie is proportional to the total number of depositions in that year. The division between MR and IV is from the MS2000 curation for 1999 and was hand curated from 67% declared as MR to 44% being in a novel lattice. (a) is reproduced from Hendrickson (2014) with permission from Cambridge University Press.