research papers
by evolutionary search
^{a}Pfizer Inc., 10350 North Torrey Pines Road, San Diego, CA 92037, USA
^{*}Correspondence email: chuck_kissinger@stromix.com
Stochastic search algorithms can be used to perform rapid sixdimensional molecularreplacement searches. A molecularreplacement procedure has been developed that uses an evolutionary algorithm to simultaneously optimize the orientation and position of a search model in a R factor. An alternative stochastic search procedure, simulated annealing, provides similar overall performance to evolutionary search. Methods of extending the evolutionary search algorithm to include internal optimization, selection and construction of the search model are now beginning to be investigated.
Here, the performance of this algorithm and its dependence on search model quality and choice of target function are examined. Although the evolutionary search procedure is capable of finding solutions with search models that represent only a small fraction of the total scattering matter of the target molecule, the efficiency of the search procedure is highly dependent on the quality of the search model. Polyalanine models frequently provide better search efficiency than allatom models, even in cases where the sidechain positions are known with high accuracy. Although the success of the search procedure is not highly dependent on the statistic used as the target function, the between observed and calculated structurefactor amplitudes generally results in better search efficiency than does theKeywords: molecular replacement; evolutionary search.
1. Introduction
New molecularreplacement methods have been introduced recently that use stochastic search algorithms to optimize the orientation and position of a molecular model in a ), evolutionary programming (Kissinger et al., 1999) and simulated annealing (Glykos & Kokkinidis, 2000). These methods are several orders of magnitude faster than a comprehensive sixdimensional search and have proven to be effective tools for molecular replacement.
Unlike traditional approaches, these methods do not divide the molecularreplacement procedure into separate rotation and translation searches. Instead, the three rotational and three translational parameters for a search model are optimized simultaneously. Several stochastic search techniques have been applied successfully to the molecularreplacement problem, including a (Chang & Lewis, 1997Sixdimensional searches have potential advantages over traditional molecularreplacement methods that use separate rotation and translation searches. Although significant advances have been made in increasing the effectiveness of rotationsearch methods (Navaza & Saludjian, 1997; Brünger, 1997; Tong & Rossmann, 1997), the signaltonoise ratio in a rotation search is relatively low because the correlation with the observed diffraction data is modest for a search model that is correctly rotated but mistranslated. Furthermore, the subsequent translation search can be highly sensitive to small errors in the orientation obtained from the rotation search. Conversely, when the orientation and position of a search model are optimized simultaneously the maximum achievable correlation between observed and calculated diffraction data can be obtained.
While advances in computing power have made systematic sixdimensional searches increasingly practical, particularly when spread over multiple processors (Sheriff et al., 1999), these generally remain very lengthy calculations. Stochastic search methods are dramatically faster than a complete systematic sixdimensional search. They do have some disadvantages, however. These algorithms are nondeterministic and the correct solution will not be found on every attempt. Multiple search attempts are typically required in order to ensure that the global optimum has been found. The usefulness of this type of algorithm, therefore, depends critically on its efficiency and reliability across the broad range of molecularreplacement problems. We have developed a program, EPMR, which uses an evolutionary algorithm to perform rapid sixdimensional molecularreplacement searches (Kissinger et al., 1999). Here, we examine the performance of this algorithm and the effect of the quality of the search model and the choice of statistic used as the target function. We also compare the performance of this algorithm with that of another stochastic search algorithm, simulated annealing.
2. using EPMR
In an evolutionary algorithm (Fogel et al., 1966), the global optimum of a function is found through the iterative optimization of a population of initially random trial solutions. During each cycle of optimization, the solutions are ranked and the best ones are retained to serve as `parents' of the next generation. Random variations are applied to the values of these parent solutions to generate a new population and this process is repeated for a number of generations. The size of the `mutations' applied to the parent solutions is typically adjusted to provide broad sampling of the entire search space in the initial stages of the search while gradually focusing in on only the most promising areas. Evolutionary algorithms have been applied successfully to a wide variety of optimization problems involving multidimensional nonlinear search spaces (Fogel, 1995).
In our procedure, the three rotational and three positional parameters that describe the orientation and position of the search model within the et al., 1999). Briefly, an initial population of molecularreplacement solutions is created by assigning a random orientation and position to each population member. The for each solution is calculated and compared with those of a small number of other randomly chosen solutions. The number of these competitions that each solution wins is used to rank the solutions and determine which ones will survive to the next generation. Surviving solutions are retained in the next generation without modification and are also used to regenerate the rest of the population by applying normally distributed random variations to the values for the orientation and position. Structure factors for each solution are calculated rapidly during the course of the optimization by precalculating the molecular transform of the search model and applying the rotations and translations in (Lattman & Love, 1970; Huber & Schneider, 1985; Castellano et al., 1992). A population size of 300 solutions, evolving over 50 generations, has proven to provide a good compromise between the speed and probability of success of an individual search. At the end of 50 generations, the population member with the best is chosen for a local conjugategradient optimization. Searches for multiple copies of a molecule in the can be carried out sequentially. (Simultaneous searches for multiple copies of a molecule are discussed below.) In the sequential procedure, when each copy of the molecule is found its static contribution to the structure factors is calculated and included in the correlationcoefficient calculations during subsequent searches.
are optimized using the between observed and calculated structurefactor amplitudes to rank the solutions. The procedure is described in detail in a previous publication (KissingerEvolutionary sixdimensional molecularreplacement searches are computationally efficient. As illustrated in Table 1 for two representative test cases, the use of an evolutionary algorithm is typically four or five orders of magnitude more efficient than a complete systematic sixdimensional search. In fact, a sixdimensional evolutionary search frequently requires fewer structurefactor calculations than would an equivalent combination of systematic rotation and translation searches [although separate rotation/translation searches can still be performed more quickly using the highly efficient `fast rotation' (Crowther, 1972; Navaza, 1993) and `fast translation' functions (Navaza & Vernoslova, 1995)]. A typical evolutionary molecularreplacement search can be completed in several minutes on a modern workstation. The true efficiency of this approach depends ultimately, however, not only on the speed of a single search attempt but also on how many attempts are required to ensure that the optimal solution has been found. For this reason, we have examined the effectiveness of the search procedure under a variety of conditions. In the following sections we explore two aspects of the performance of the algorithm: the search efficiency, defined as the probability of success in finding the correct solution on a single search attempt, and the sensitivity, defined as the maximum reduction in search model quality that still produces the correct solution. Our results are illustrated using a single test case, but are representative of results obtained on a wide variety of molecularreplacement problems.
‡Calcineurin–FKBP12–FK506 complex, P4_{3}2_{1}2, a = b = 105, c = 162 Å. 
3. Search efficiency and model accuracy
To determine the effect of model quality on the effectiveness of the evolutionary search procedure, we examined the search efficiency using models of systematically decreasing accuracy or completeness. Fig. 1 shows the success rate of the procedure with search models that were incrementally truncated by removing residues from the carboxyl terminus. For both a polyalanine model and a model including side chains, the search efficiency decreased in a roughly linear manner as the completeness of the search model was decreased. We obtained very similar results on a variety of other test cases. The same behaviour also was seen for search models with increasing amounts of coordinate error (data not shown). Clearly, in molecularreplacement problems where the search model is marginal, a large number of search attempts are required to ensure that the optimal solution is obtained. Although this is a drawback of the procedure, it may be compensated by the potential in some cases to use less complete or less accurate search models than is possible with conventional methods (Kissinger et al., 1999).
The chance of success in a given search attempt can be improved by increasing the population size or number of iterations of optimization, at the expense of more computation time. There is not a linear increase in search efficiency with increases in population size or number of iterations, however. The probability of success asymptotically approaches but never reaches 100%, with each incremental increase in population size providing a smaller improvement in search efficiency. It is usually more advantageous to increase the number of search attempts rather than the population size, as illustrated in Table 2 for three protocols that required roughly the same amount of computation time.

Interestingly, we have found that the search efficiency is frequently higher for polyalanine search models than for models that include side chains. This is true even for models in which the sidechain positions are quite accurate. We see two possible reasons for this. One is that a polyalanine model, although less complete, is frequently significantly more accurate. For the test case shown in Fig. 1, the RMS differences between the searchmodel coordinates and final refined coordinates are 0.37 Å for the polyalanine model and 0.71 Å for the allatom model. It has been a common observation that molecularreplacement models that are less complete but more accurate provide better results than more complete less accurate ones. The increased search efficiency occurs in this case despite the fact that the correlation coefficients are significantly lower for the polyalanine models than for the allatom models. We are investigating the possibility that use of polyalanine search models may be advantageous because this results in a smoother variation in the value of the throughout the search space, thus providing an easier landscape for the optimization algorithm to traverse.
We also compared the maximum searchmodel truncation that was possible using both polyalanine and allatom search models. The search models were truncated one residue at a time until the solution with the highest
that was obtained after 100 search attempts no longer corresponded to the correct solution. Surprisingly, nearly as many residues could be removed from the polyalanine model as from the allatom model. The minimally successful allatom search model, comprising 41 residues, represented roughly 35% of the total number of atoms in the molecule, while the minimal polyalanine model, 44 residues, represented only about 24% of the complete molecule.These results were obtained using a highquality search model in which most sidechain positions were very close to those in the final refined structure (RMS coordinate differences from the final model for sidechain atoms only were 1.1 Å). In molecularreplacement problems involving homology models derived from distantly related proteins, the results would be expected to be much less favourable for an allatom model. Our experiments suggest that there is rarely an advantage in including side chains in a model for
at least when using the evolutionary search procedure.4. Alternative scoring methods
We next investigated whether the search efficiency could be improved by using a statistic other than the standard shows the search efficiency for a polyalanine search model using three different targets: the using standard structure factors, the using normalized structure factors and the R factor. We have found little difference in search efficiency between correlation coefficients based on either unnormalized or normalized structure factors. In the test case shown, the use of normalized structure factors results in generally better search efficiency. Because our procedure does not attempt to optimize an overall temperature factor for the search model, the use of normalized structure factors might be advantageous in some cases because it minimizes the effect of errors in the temperature factors assigned to the atoms in the search model. The R factor consistently provided lower efficiency in our search procedure.
between observed and calculated structurefactor amplitudes as the target function. Fig. 2The maximum amount of searchmodel truncation that was possible using the different target statistics is shown in Fig. 3. Use of normalized structure factors allowed truncation of one additional residue from the search model in this case. Using the R factor, significantly less truncation was possible before the bestscoring solution no longer corresponded to the correct solution.
We are continuing to investigate alternative scoring methods. The latest version of EPMR allows the user to choose the between standard or normalized structurefactor amplitudes (or the squares of the structure factors) or the R factor as the target function. More sophisticated scoring methods, such as those based on (Bricogne, 1992; Read, 2001), might lead to further improvements in search efficiency and sensitivity.
5. Alternative stochastic search algorithms
We also investigated whether the search efficiency, particularly with lowaccuracy models, could be improved using an alternative stochastic search technique. Simulated annealing (Kirkpatrick et al., 1983; Kirkpatrick & Swendsen, 1985) is another widely used stochastic method for finding solutions to nonlinear optimization problems. This technique has been used for crystallographic (Brünger et al., 1987) and for protein from NMR data (Nilges et al., 1988); it has also recently been applied to the molecularreplacement problem (Glykos & Kokkinidis, 2000).
We implemented a sixdimensional molecularreplacement search procedure using a simulatedannealing algorithm following the recommendations from Kirkpatrick (1984). The temperature was decreased exponentially in 75 steps, with 300 trial moves at each temperature. To facilitate efficient sampling of the search space, we used a dynamically optimized Monte Carlo method based on the acceptance ratio of the trial moves (Bouzida & Kumar, 1992). Using this method, the acceptance ratio (the ratio of accepted moves to the total number of trial moves) is optimized by adjusting the maximum move size during each temperature step.
We compared the performance of this algorithm to that of our evolutionary search procedure. The results are shown in Fig. 4. The efficiency of the simulatedannealing algorithm was lower than that of the evolutionary algorithm, but also showed a roughly linear decrease in search efficiency with decreasing model completeness. This appears to be an inherent characteristic of these kinds of search methods when applied to the molecularreplacement problem. We believe the lower overall efficiency of the simulatedannealing algorithm primarily reflects a difference in the amount of tuning that we have performed to the two algorithms rather than an inherent superiority of the evolutionary algorithm. As expected, the maximum possible searchmodel truncation was identical for the two methods (data not shown).
6. The future of stochastic search methods in molecular replacement
Stochastic search algorithms have proven to be reliable and efficient tools for solving molecularreplacement problems. The ability to easily extend searches to more than six dimensions suggests a number of interesting ways in which these methods can be enhanced. For instance, when there are multiple copies of a molecule in the , 2001) and we have recently added this capability to EPMR. A simultaneous search has the same potential advantage over a sequential search that a sixdimensional search has over sequential rotation and translation searches; the signaltonoise ratio may be increased, but at the expense of increased computation time. The increase in the number of variables being optimized requires a much larger population size for the evolutionary search to be effective. There is currently a practical limit of two or three molecules that can be found simultaneously using our procedure without requiring impracticably long search times. (Problems involving larger numbers of molecules in the can be attempted using a series of twomolecule searches, however.)
the standard application of our method is to search for each copy sequentially, adding a static contribution to subsequent structurefactor calculations as each solution is found. However, procedures have recently been described for searching simultaneously for multiple copies of a molecule (Glykos & Kokkinidis, 2000In many molecularreplacement problems, it would be valuable to be able to specify particular internal ) has demonstrated the efficacy, when using separate rotation and translation searches, of optimizing the molecularreplacement model after the rotation search by `Patterson correlation' to improve the performance of the subsequent translation search. We have found that the addition of a small number of internal is possible in the evolutionary search procedure without a significant decrease in search efficiency. In this type of procedure, it is necessary to calculate a separate molecular transform for each independently varying part of the search model, however, and the resulting high memory requirements can limit the amount of internal optimization that can be attempted.
within the search model; for instance, to allow the angle between two domains to vary. Brünger (1990We are also extending the search algorithm to include not only optimization but also selection of the best molecular model. This is accomplished by including an additional parameter in the optimization that specifies which one of a set of search models is used. Initial experiments suggest that it is possible to select the optimal model from a set of similar aligned search models while simultaneously optimizing its position and orientation. This may be possible with larger databases of more distantly related structures as well if a suitable alignment is possible. The memory requirements for holding the molecular transforms of a large number of search models again represent a practical impediment to this approach, however. For this reason, screening of multiple search models might best be accomplished by evaluating the alternative models in parallel on multiple computers. Evolutionary algorithms are ideally suited for implementing in parallel across a large number of processors. We have developed a highly parallel version of EPMR to run on multiprocessor computers or computer clusters. This will facilitate the rapid screening of large databases of search models. The ultimate realization of this approach might be the use of a database of structures derived from the entire Protein Data Bank.
The extensible nature of stochastic search algorithms also makes them well suited for implementing procedures in which the search model is not simply optimized but instead actually constructed from structural fragments during the search process. A theoretical basis for such an approach has been discussed by Bricogne (1997). This kind of searchmodel `construction' clearly can be accomplished using a limited number of large domains, where the problem is essentially the same as that of finding multiple molecules in the We are investigating methods of extending the procedure to much smaller fragments.
The program EPMR can be downloaded free of charge from ftp://ftp.agouron.com/pub/epmr/ .
Footnotes
‡Current address: Structural GenomiX Inc., 10505 Roselle Street, San Diego, CA 92121, USA.
References
Bouzida, D., Kumar, S. & Swendsen, R. H. (1992). Phys. Rev. A, 45, 8894–8901. CrossRef CAS PubMed Web of Science Google Scholar
Brenner, C., Garrison, P., Gilmour, J., Peisach, D., Ringe, D., Petsko, G. A. & Lowenstein, J. M. (1997). Nature Struct. Biol. 4, 231–238. CrossRef CAS PubMed Web of Science Google Scholar
Bricogne, G. (1992). Proceedings of the CCP4 Study Weekend. Molecular Replacement, edited by E. J. Dodson, S. Gover & W. Wolf, pp. 62–75. Warrington: Daresbury Laboratory. Google Scholar
Bricogne, G. (1997). Methods Enzymol. 277, 14–18. CrossRef PubMed CAS Web of Science Google Scholar
Brünger, A. T. (1990). Acta Cryst. A46, 46–57. CrossRef Web of Science IUCr Journals Google Scholar
Brünger, A. T. (1997). Methods Enzymol. 276, 558–580. CAS Google Scholar
Brünger, A. T., Kuriyan, J. & Karplus, M. (1987). Science, 235, 458–460. PubMed Web of Science Google Scholar
Castellano, E. E., Oliva, G. & Navaza, J. (1992). J. Appl. Cryst. 25, 281–284. CrossRef CAS Web of Science IUCr Journals Google Scholar
Chang, G. & Lewis, M. (1997). Acta Cryst. D53, 279–289. CrossRef CAS Web of Science IUCr Journals Google Scholar
Crowther, R. A. (1972). The Molecular Replacement Method, edited by M. G. Rossmann, pp. 173–178. New York: Gordon & Breach. Google Scholar
Fogel, D. B. (1995). Evolutionary Computation: Towards a New Philosophy of Machine Intelligence. Piscataway, NJ: IEEE Press. Google Scholar
Fogel, L. J., Owens, A. J. & Walsh, M. J. (1966). Artificial Intelligence Through Simulated Evolution. New York: John Wiley. Google Scholar
Glykos, N. M. & Kokkinidis, M. (2000). Acta Cryst. D56, 169–174. Web of Science CrossRef CAS IUCr Journals Google Scholar
Glykos, N. M. & Kokkinidis, M. (2001). Acta Cryst. D57, 1462–1473. CrossRef CAS IUCr Journals Google Scholar
Huber, R. & Schneider, M. (1985). J. Appl. Cryst. 18, 165–169. CrossRef CAS Web of Science IUCr Journals Google Scholar
Kirkpatrick, S. (1984). J. Stat. Phys. 34, 975–986. CrossRef Web of Science Google Scholar
Kirkpatrick, S., Gelatt, C. D. & Vecchi, M. P. (1983). Science, 220, 671–680. Web of Science CrossRef PubMed CAS Google Scholar
Kirkpatrick, S. & Swendsen, R. H. (1985). Commun. ACM, 28, 363–373. CrossRef Web of Science Google Scholar
Kissinger, C. R., Gehlhaar, D. K. & Fogel, D. B. (1999). Acta Cryst. D55, 484–491. Web of Science CrossRef CAS IUCr Journals Google Scholar
Lattman, E. E. & Love, W. E. (1970). Acta Cryst. B26, 1854–1857. CrossRef CAS IUCr Journals Web of Science Google Scholar
Navaza, J. (1993). Acta Cryst. D49, 588–591. CrossRef CAS Web of Science IUCr Journals Google Scholar
Navaza, J. & Saludjian, P. (1997). Methods Enzymol. 277, 581–594. CrossRef Web of Science Google Scholar
Navaza, J. & Vernoslova, E. (1995). Acta Cryst. A51, 445–449. CrossRef CAS Web of Science IUCr Journals Google Scholar
Nilges, M., Marius, G. & Gronenborn, A. M. (1988). FEBS Lett. 239, 129–136. CrossRef CAS PubMed Web of Science Google Scholar
Read, R. J. (2001). Acta Cryst. D57, 1373–1382. Web of Science CrossRef CAS IUCr Journals Google Scholar
Sheriff, S., Klei, H. E. & Davis, M. E. (1999). J. Appl. Cryst. 32, 98–101. Web of Science CrossRef CAS IUCr Journals Google Scholar
Tong, L. & Rossmann, M. G. (1997). Methods Enzymol. 276, 594–610. CrossRef CAS PubMed Web of Science Google Scholar
© International Union of Crystallography. Prior permission is not required to reproduce short quotations, tables and figures from this article, provided the original authors and source are cited. For more information, click here.