# Maximum entropy and the foundations of direct methods

A revision of the classical statistical methods of phase determination is presented which widens their theoretical foundations and consolidates their practical implementation, thus bringing about a major increase in their power. In a brief introductory survey (§ 1), the basic concepts and mathematical techniques of direct methods are analysed. Closer scrutiny (§ 2) reveals that severe inadequacies still impair the effectiveness of these methods. The asymptotic character of the series used to approximate joint distributions of structure factors demands that great caution be exercised to guarantee their accuracy, and this requirement can only be fulfilled if they are used within a multisolution algorithm in which the prior distribution of atoms is constantly updated so as to incorporate at every stage all the phase information assumed up to that point. Further limitations follow from the traditional practice of approximating joint distributions by products of marginal distributions of single invariants. A scheme for simultaneously overcoming both difficulties is then proposed. The pivotal element of this scheme is a device, based on Jaynes's maximum-entropy principle, for exploiting the prior knowledge of some structure factors in the construction of the joint distributions of others conditional on that knowledge. Jaynes's maximum-entropy formalism is presented and systematically applied to the construction of the requisite non-uniform prior distributions of atoms in § 3. The problem of effectively approximating conditional distributions of very large numbers of structure factors is solved in § 4 by a novel technique of 'maximum-entropy inversion' of Karle-Hauptman matrices, and the result obtained is shown to generalize the most sophisticated probabilistic formulae hitherto obtained. This procedure is proved in § 5 to coincide with an enhancement of the standard method of asymptotic expansions by means of Daniels's saddlepoint approximation. Its relationship to determinantal methods is investigated in § 6. A numerical algorithm for implementing these ideas is presented in § 7, together with an application to data from the small protein Crambin, and a unified strategy for its use ab initio is described and discussed in § 8. It is concluded that the phase-determination strategy proposed here will expedite the realization of the full potential of probabilistic direct methods, and is likely to bring macromolecular structures within their reach.