Direct phase selection of initial phases from single-wavelength anomalous dispersion (SAD) for the improvement of electron density and ab initio structure determination

A novel direct phase-selection method to select optimized phases from the ambiguous phases of a subset of reflections to replace the corresponding initial SAD phases has been developed. With the improved phases, the completeness of built residues of protein molecules is enhanced for efficient structure determination.

Optimization of the initial phasing has been a decisive factor in the success of the subsequent electron-density modification, model building and structure determination of biological macromolecules using the single-wavelength anomalous dispersion (SAD) method. Two possible phase solutions ($\varphi_1$ and $\varphi_2$) generated from two symmetric phase triangles in the Harker construction for the SAD method cause the well known phase ambiguity. A novel direct phase-selection method utilizing the $\theta_{DS}$ list as a criterion to select optimized phases $\varphi_{am}$ from $\varphi_1$ or $\varphi_2$ of a subset of reflections with a high percentage of correct phases to replace the corresponding initial SAD phases $\varphi_{SAD}$ has been developed. Based on this work, reflections with an angle $\theta_{DS}$ in the range 35-145° are selected for an optimized improvement, where $\theta_{DS}$ is the angle between the initial phase $\varphi_{SAD}$ and a preliminary density-modification (DM) phase $\varphi_{DM}^{NHL}$. The results show that utilizing the additional direct phase-selection step prior to simple solvent flattening without phase combination using existing DM programs, such as RESOLVE or DM from CCP4, significantly improves the final phases in terms of increased correlation coefficients of electron-density maps and diminished mean phase errors. With the improved phases and density maps from the direct phase-selection method, the completeness of residues of protein molecules built with main chains and side chains is enhanced for efficient structure determination.

Introduction
X-ray protein crystallography has been an efficient and dominant method for determining the three-dimensional structures of biological macromolecules. Despite great progress towards its automation, the phasing of diffraction reflections is still a key step for structure determination. The single-wavelength anomalous dispersion (SAD) method using S atoms and various heavy atoms in protein molecules has become increasingly important in phasing because protein crystals typically suffer from radiation damage during the collection of diffraction data by the commonly used multiple-wavelength anomalous dispersion (MAD) method. Moreover, S-MAD is not easily achievable at current synchrotron facilities because the sulfur absorption edge lies in the low range of X-ray energies. The rate of success of S-SAD phasing is much more limited than that of the SAD method using heavy atoms (Hendrickson & Teeter, 1981; Dauter et al., 1999; Liu et al., 2000; Bond et al., 2001; Cianci et al., 2001; Gordon et al., 2001).
The two main steps in structure determination using the SAD method with sulfur and heavy atoms are (i) locating the anomalous scattering atoms in the unit cell to obtain the initial SAD phases from the anomalous differences of structure factors derived from the diffraction intensities and (ii) improving the phases and electron density from the initial SAD phases by density modification with various algorithms. In general, the overall average figure of merit of the SAD phases is much smaller than that from MAD phasing. A powerful method of density modification or phase improvement following the initial SAD phasing is hence essential for the success of structure determination. Several density-modification approaches are available, such as solvent flattening (Wang, 1985), maximum entropy in the direct method (Bricogne, 1984, 1988), phase extension combined with entropy maximization and solvent flattening (Prince et al., 1988), direct-space methods in phase extension and phase refinement (Refaat et al., 1996), solvent flattening to improve the direct-method phases (Giacovazzo & Siliqi, 1997) and the programs DM from CCP4 (Cowtan & Main, 1993) and RESOLVE (Terwilliger, 2000).
For example, in the SHELXC/D/E program suite (Sheldrick, 2008), SHELXC is designed to provide a statistical analysis of the experimental X-ray diffraction data, to estimate the structure factors $F_H$ of scattering atoms and to prepare the preliminary data for SHELXD and SHELXE to locate the positions of heavy atoms for initial phasing (Usón & Sheldrick et al., 2001; Schneider & Sheldrick, 2002) and to improve phases iteratively with density modification (Schneider & Sheldrick, 2002), respectively. The anomalous signals from heavy atoms can alternatively be refined iteratively with Phaser in CCP4 (McCoy et al., 2007). The CCP4 program DM can further improve the initial experimental SAD phase to give an improved electron-density map (Cowtan & Main, 1993). The powerful software SOLVE/RESOLVE can accomplish all of the steps for macromolecular structure determination by the SAD method, including data scaling, location of heavy atoms, initial SAD phasing, density modification and model building. In SAD mode, initial phases are obtained with SOLVE; RESOLVE subsequently performs the identification of noncrystallographic symmetry (NCS; Terwilliger, 2002), density modification (Terwilliger, 2000) and automated model building (Terwilliger, 2003). After density modification with solvent flattening, solvent flipping, NCS averaging, histogram matching, maximum likelihood or entropy maximization, an additional step using ARP/wARP can substantially improve the phases (Perrakis et al., 1997).
Beyond these protocols, some methods have been developed to resolve the phase ambiguity. One approach is to use the direct method based on the product of the Sim and Cochran distributions, which can improve the initial phases (Wang et al., 2004). It has been shown that assigning accurate phases to a few strong reflections can improve the density-modification process in terms of the mean phase errors and map correlation coefficients (Vekhter, 2005). A recent study reported that the map skewness, which describes the extent to which the extreme values in a map tend to be systematically positive or negative, can be used to identify the correct phases for a few of the strongest reflections. A genetic algorithm was developed to optimize the quality of phases using the skewness of the density map as a target function. Such optimized phases have been used in density modification and the quality of the density maps was better than those generated from the original centroid phases (Uervirojnangkoorn et al., 2013). The initial phases obtained from the SAD, SIRAS and SIR methods can be improved according to these two approaches.
In the present work, we focus mainly on the improvement of initial phases from the general SAD method using sulfur or heavy atoms with a novel 'direct phase-selection method' based on a '$\theta_{DS}$ list', where $\theta_{DS}$ is the angle between the initial SAD phase and the preliminary DM phase; this differs from previously reported methods. We demonstrate that this method of phase selection can resolve the phase ambiguity and improve the phases from SAD with increased effectiveness in combination with RESOLVE or DM utilizing only simple solvent flattening without phase combination and an FOM cutoff. A number of experimental SAD data sets with sulfur or metal (Zn, Gd, Fe and Se) atoms as the anomalous scatterers in proteins have been tested, including two unknown new protein structures; all results show that superior phases can be obtained with this new phase-selection method, yielding an enhanced quality of the corresponding electron-density maps and increased completeness of model building.

Figure 1 Harker construction for SAD phasing. The contribution of heavy atoms to a structure factor consists of a normal part, $F_H$, and an anomalous part, $F_H''$. The structure factor $F_{PH}$ is a normal part and $F_{PH}^{+}$ and $F_{PH}^{-}$ are anomalous parts of the protein crystal containing heavy atoms.

The phase ambiguity of SAD
The SAD experiment provides measurements of anomalous signals or Bijvoet differences,

$$\Delta F^{\pm} = |F_{PH}^{(+)}| - |F_{PH}^{(-)}|.$$

The amplitudes of the structure factors, $|F_{PH}^{(+)}|$ and $|F_{PH}^{(-)}|$, are measured from the diffraction intensities to estimate the contribution of anomalous scattering from heavy atoms. The positions of the heavy-atom substructures ($X_H$) can be located with the direct method or the Patterson method to derive the heavy-atom substructure factors ($F_H$) and the anomalous scattering contributions ($F_H''$). With this preliminary information, the Harker construction, which is based on the assumption that there are no errors in the amplitudes of structure factors or the heavy-atom model, generates two possible phase solutions ($\varphi_1$ and $\varphi_2$), with one being the true phase and the other a false phase, from two symmetric phase triangles, as shown in Fig. 1. The phase ambiguity arises from the existence of an angle $\theta$ between $F_{PH}$ and $F_H''$, related to $|F_{PH}^{(+)}|$, $|F_{PH}^{(-)}|$ and $|F_H''|$, which can be calculated as (Blundell & Johnson, 1976)

$$\theta \simeq \cos^{-1}\{[|F_{PH}^{(+)}| - |F_{PH}^{(-)}|] / 2|F_H''|\}$$

and

$$\varphi_{1,2} = \varphi_{SAD} \pm \theta,$$

in which $\varphi_{SAD}$ is the phase of $F_H''$.
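As an illustration of the ambiguity, the following minimal Python sketch computes the angle between $F_{PH}$ and $F_H''$ from the two Bijvoet amplitudes and returns the two candidate phases. It assumes that the two solutions lie symmetrically about the phase of $F_H''$; the function name, the clamping of the cosine argument and the numerical values are illustrative assumptions and are not taken from any of the programs used in this work.

```python
import math


def sad_candidate_phases(f_plus, f_minus, f_h_dd, phi_sad_deg):
    """Return (theta, phi_1, phi_2) in degrees for one reflection.

    f_plus, f_minus: measured amplitudes |F_PH(+)| and |F_PH(-)|
    f_h_dd:          amplitude |F_H''| from the heavy-atom substructure
    phi_sad_deg:     phase of F_H'' in degrees
    """
    if f_h_dd <= 0.0:
        raise ValueError("|F_H''| must be positive")
    ratio = (f_plus - f_minus) / (2.0 * f_h_dd)
    # Assumption: clamp to [-1, 1], because noisy data can exceed the
    # range allowed by the error-free Harker construction.
    ratio = max(-1.0, min(1.0, ratio))
    theta = math.degrees(math.acos(ratio))
    # Assumption: the two solutions are symmetric about phi_SAD.
    phi_1 = (phi_sad_deg + theta) % 360.0
    phi_2 = (phi_sad_deg - theta) % 360.0
    return theta, phi_1, phi_2


# Made-up numbers for a single reflection
theta, phi_1, phi_2 = sad_candidate_phases(105.0, 95.0, 10.0, 40.0)
print(f"theta={theta:.1f}  phi_1={phi_1:.1f}  phi_2={phi_2:.1f}")
```

With these made-up values the cosine argument is 0.5, so the two candidates sit 60° on either side of the 40° reference phase.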

Crystal preparation and data collection
Six SAD data sets were collected: five from protein crystals with known structures, lysozyme_S (sulfur), lysozyme_Gd (gadolinium), insulin_S, lectin_Zn (zinc) and cytochrome c3_Fe (iron), and one from a crystal of unknown structure, histidine-containing phosphotransfer B [HptB_Se (selenium)]. The crystallization of these proteins was performed by the hanging-drop vapour-diffusion method at 291 K. The crystallizations of lysozyme, insulin, lectin and cytochrome c3 were performed using previously described protocols (Nanao et al., 2005; Nagem et al., 2001; Huang et al., 2006; Aragão et al., 2003). Crystals of lysozyme_Gd and lectin_Zn were prepared with the soaking method, whereas HptB_Se was prepared with selenomethionine substitution during expression and crystallized (unpublished work). The X-ray SAD data sets were collected on beamline BL13B1 of the National Synchrotron Radiation Research Center (NSRRC) in Taiwan and beamline BL44XU of SPring-8 in Japan. The detailed statistics of data collection are summarized in Table 1.

Table 1 Statistics of X-ray data and structure refinement.
Values in parentheses are for the outermost shell. † $I_i(hkl)$ is the ith intensity measurement and $\langle I(hkl)\rangle$ is the weighted mean of all measurements of $I(hkl)$. The reflection cutoff [$I/\sigma(I) > 0$] was applied in generating the statistics. ‡ $R_{work} = \sum_{hkl} ||F_{obs}| - |F_{calc}|| / \sum_{hkl} |F_{obs}|$, where $F_{obs}$ and $F_{calc}$ are the observed and calculated structure-factor amplitudes of reflection hkl. § $R_{free} = \sum_{hkl} ||F_{obs}| - |F_{calc}|| / \sum_{hkl} |F_{obs}|$ for 5% of the reserved reflections.

Location of substructures and generation of initial SAD phases
The overall procedure of the new phase-selection method for phase improvement in this work is shown in Fig. 2. The details of the input and data for each program in all of the steps in this study are presented in Supplementary List S1. The S and heavy-atom substructures ($X_H$) were determined from the anomalous SAD data ($\Delta F^{\pm}$) with SHELXC/D/E in CCP4 (Sheldrick, 2008), which identified possible sites with high occupancies (Fig. 3). The positions and anomalous signals of S or heavy atoms were iteratively refined; the centroid phases were subsequently generated as the initial SAD phases ($\varphi_{SAD}$) with Phaser in CCP4 (McCoy et al., 2007).

The control group for commonly used procedures
The flowchart of the overall procedure in this work is divisible into two groups: the control group (indicated by solid lines) and the experimental group (indicated by dashed lines) (Fig. 2). The control group consists of the regular method and the non-constraint method.
3.3.1. Regular method. In this step, we used RESOLVE to improve the initial SAD phase to obtain the final DM phases ($\varphi_{DM}^{R}$) and the regular map from the data set for the regular method, which is defined below, with the SAD standard protocol, including the Hendrickson-Lattman coefficients (phase probabilities) and fom_cut parameters, which set the initial resolution for density modification at the point at which the FOM has the default value of 0.15 (Terwilliger, 2000). For the parallel comparison, we also separately used the CCP4 program DM with solvent flattening and the standard parameters (Cowtan & Zhang, 1999), including the Hendrickson-Lattman coefficients with all reflections for the entire calculation (all reflections automatically weighted by the $\sigma_A$ calculation were used in every cycle), for density modification from the same initial SAD phase. After these calculations, an adapted data set for the regular method was generated that included some important parameters from the mathematical operations, which include hkl, $F_{hkl}$, $\varphi_{SAD}$, initial FOM, $\varphi_{DM}^{R}$ (final DM phase from the regular method), $\theta$, $\varphi_1$ and $\varphi_2$, and $\varphi_C$ for further evaluation of the 'percentage correct'. This process is called the 'regular method', and the corresponding electron-density map using the final DM phases $\varphi_{DM}^{R}$ is called the 'regular map'. For the theoretical simulation, the calculated model phases ($\varphi_C$) were generated from the five corresponding refined structures. These initial structural models were obtained from the PDB (PDB entries 1gyo for cytochrome c3, 2bn3 for insulin and 2lyz for lysozyme) and our laboratory (lectin) and were refined with our experimental data with REFMAC5 (Murshudov et al., 2011; Winn et al., 2011) and visualized or adjusted with Coot (Emsley & Cowtan, 2004). The statistics of the structure refinement are summarized in Table 1.

3.3.2. Non-constraint method. For the non-constraint method, the final DM phases ($\varphi_{DM}^{N}$) were obtained using RESOLVE with the SAD standard default protocol, except for the Hendrickson-Lattman coefficients and fom_cut parameters, which set the initial resolution for density modification to the point at which the FOM value is 0. For a comparison, the CCP4 program DM was used in parallel to generate $\varphi_{DM}^{N}$ phases with standard default parameters and phase extension in FOM steps, in which only the low-resolution reflections were used in the first cycle and extra reflections were added in each cycle until all of the data were used. Histogram matching and the Hendrickson-Lattman coefficients were excluded. In other words, no phase combination was carried out, thus no Hendrickson-Lattman coefficients are provided; all of the data were used for density modification without a resolution cutoff in this method. After the above calculation with the non-constraint method, $\varphi_{DM}^{N}$ (the final DM phase from the non-constraint method) can be obtained. The phases $\varphi_{DM}^{N}$ and $\varphi_C$ were later used for evaluation of the 'percentage correct'. This process is called the 'non-constraint method' and the corresponding electron-density map using final DM phases $\varphi_{DM}^{N}$ is called the 'non-constraint map'. This calculation was performed for control purposes for comparison with the following experimental group with the same DM protocols.

The experimental group
In the experimental group, the direct phase-selection method is utilized as a new algorithm to optimize the initial phases. In this new approach, density modification with simple solvent flattening is first used only once to select one of the two possible phase choices from the SAD phase probability distribution for a subset of the reflections where the phase choice is most likely to be correct, thus differing from the standard approaches using SAD phase combination with Hendrickson-Lattman coefficients throughout density modification.

Preparation of data sets for the simulation test.
Among the six SAD experimental data sets, five cases, lysozyme_S, lysozyme_Gd, insulin_S, lectin_Zn and cytochrome c3_Fe, were examined with calculated phases $\varphi_C$ from their known models. The preliminary DM phases ($\varphi_{DM}^{NHL}$) were generated with one cycle of DM from the initial SAD phases $\varphi_{SAD}$ using the CCP4 program DM with solvent flattening, histogram matching and all reflections for the entire calculation, involving no Hendrickson-Lattman coefficients (NHL). The important angle parameters were then generated in the data sets for examination by simulation, some of which differed from those in the data set for the regular method in §3.3, such as $\varphi_{DM}^{NHL}$, $\varphi_{am}$ (ambiguity phase $\varphi_1$ or $\varphi_2$ determined from the preliminary DM phase $\varphi_{DM}^{NHL}$) and $\theta_{DS}$ (the angle between the initial SAD phase $\varphi_{SAD}$ and the preliminary DM phase $\varphi_{DM}^{NHL}$).

3.4.2. Overall procedures of the experimental group. The experimental group in Fig. 2 shows the protocol to produce improved phases and direct selection maps. The optimum initial phases $\varphi_{SAD}^{S}$ were determined from $\varphi_{DM}^{NHL}$ by the direct phase-selection method de novo (see details in the following sections). The final DM phases $\varphi_{DM}^{S}$ were then obtained from $\varphi_{SAD}^{S}$ using RESOLVE or DM, in parallel for comparison, using the same protocols as were used in the non-constraint method. The direct selection map is consequently generated with the final DM phases $\varphi_{DM}^{S}$.

Figure 3 Locations of the heavy-atom sites with the corresponding occupancies calculated with SHELXC/D/E in CCP4.

3.4.3. Phase-selection rule and definition. If the preliminary DM phase $\varphi_{DM}^{NHL}$ is located in region 1 or 2, the new initial $\varphi_{am}$ phase is selected as $\varphi_1$ or $\varphi_2$, respectively (Fig. 4a). The selection is defined as correct when $\varphi_{DM}^{NHL}$ and the model-calculated $\varphi_{am}$ are in the same region and as incorrect when they are in different regions. The selected correct or incorrect phase $\varphi_{am}$ ($\varphi_1$ or $\varphi_2$) is thus based on the phase $\varphi_{DM}^{NHL}$ because the model phase $\varphi_C$ is fixed by the refined structures. Here, we define the percentage correct as the ratio of the number of reflections with correctly selected phases to the total number of reflections. The percentage correct and the angle $\theta_{DS}$ define the 'confidence level' and the 'confidence interval', respectively.
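The selection rule and the percentage correct can be sketched in a few lines of Python. The regions are defined in Fig. 4(a), which is not reproduced here, so this sketch assumes that region 1 (or 2) is simply the set of phases closer to $\varphi_1$ (or $\varphi_2$) than to the other candidate; that assumption, the function names and the sample numbers are all hypothetical.

```python
def angular_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0-180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def select_region(phi_1, phi_2, phi_ref):
    """Return 1 if phi_ref is closer to phi_1, otherwise 2.

    Assumption: 'region k' means 'nearer to phi_k'; ties go to region 1.
    """
    d1 = angular_diff(phi_1, phi_ref)
    d2 = angular_diff(phi_2, phi_ref)
    return 1 if d1 <= d2 else 2


def percentage_correct(reflections):
    """reflections: iterable of (phi_1, phi_2, phi_dm_nhl, phi_c) in degrees.

    A selection counts as correct when the preliminary DM phase and the
    model phase fall in the same region.
    """
    n_total = 0
    n_correct = 0
    for phi_1, phi_2, phi_dm_nhl, phi_c in reflections:
        n_total += 1
        chosen = select_region(phi_1, phi_2, phi_dm_nhl)
        truth = select_region(phi_1, phi_2, phi_c)
        if chosen == truth:
            n_correct += 1
    if n_total == 0:
        raise ValueError("no reflections given")
    return 100.0 * n_correct / n_total


# Four made-up reflections: (phi_1, phi_2, phi_DM_NHL, phi_C)
sample = [
    (100.0, 340.0, 80.0, 120.0),
    (100.0, 340.0, 300.0, 110.0),
    (10.0, 250.0, 20.0, 5.0),
    (10.0, 250.0, 200.0, 260.0),
]
print(percentage_correct(sample))  # 75.0
```

In the sample, only the second reflection is assigned to the wrong region, which gives three correct selections out of four.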
The protocol to determine the percentage correct for the simulation cases is applicable not only to the direct phase-selection method in the experimental group but also to the regular and non-constraint methods in the control group. However, $\varphi_C$ is not available in practical cases without known structures to distinguish the correct or incorrect phase $\varphi_{am}$. The distribution statistics of the percentage correct from our five simulation cases enable us to instead use the novel '$\theta_{DS}$ list' as the criterion to select the phases for the practical cases without structural models. The details of experimental procedures utilizing the $\theta_{DS}$ list in the direct phase-selection method de novo are described in the following sections.

Direct phase-selection method based on $\theta_{DS}$ angles.
Our simulation results and statistics show that a high percentage correct occurs at an angle $\theta_{DS}$ in the range between 35 and 145° in regions 1 and 2 (Figs. 4c and 5). A higher confidence level is hence obtained in the confidence interval between 35 and 145° in regions 1 and 2. A '$\theta_{DS}$ list' from the smallest to the largest angles can be generated. Reflections from the $\theta_{DS}$ list with the angle $\theta_{DS}$ between 35 and 145° are selected, which have a relatively high probability of the correct selected phase $\varphi_{am}$. The selected phase $\varphi_{am}$ is either $\varphi_1$ or $\varphi_2$ depending on the preliminary DM phase $\varphi_{DM}^{NHL}$. The initial phases $\varphi_{SAD}$ of all of the reflections in the range 35-145° are then replaced by the corresponding selected phases $\varphi_{am}$ for optimized improvement.
Figure 5 The highest percentage correct occurs at an angle $\theta_{DS}$ in the range between 35 and 145° for the selected phase $\varphi_1$ or $\varphi_2$ in region 1 or 2, respectively. The horizontal axis indicates the range of the angle $\theta_{DS}$ from 0.1 to 180°, whereas the vertical axis indicates the percentage correct.

The reflections with replaced phases (selected phases $\varphi_{am}$) and the rest with unselected initial phases $\varphi_{SAD}$ are subsequently combined into a new data set with optimum initial phases $\varphi_{SAD}^{S}$. In this step, FOM = 1.0 was used as the weighting scheme for the selected phase $\varphi_{am}$ without Hendrickson-Lattman coefficients, whereas the initial FOM values were used for the rest of the unselected phases. In the DM process no phase recombination was carried out; the FOM was used only as the weighting scheme. The final DM phase $\varphi_{DM}^{S}$ was improved from $\varphi_{SAD}^{S}$ using RESOLVE or DM in parallel. After the above calculation, final DM phases $\varphi_{DM}^{S}$ from the direct phase-selection method can be obtained for the calculation of electron density and the evaluation of the percentage correct. Optimum initial phases $\varphi_{SAD}^{S}$ possessing a higher percentage correct have a better chance of improving the DM phases $\varphi_{DM}^{S}$ compared with $\varphi_{DM}^{R}$ and $\varphi_{DM}^{N}$ in the control group. This method is called the 'direct phase-selection method'.
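The selection and replacement steps described above can be summarized in a short Python sketch. It is not the authors' implementation: the dictionary keys, the inclusive treatment of the 35° and 145° bounds and the nearest-candidate rule used to pick between the two ambiguity phases are assumptions made for illustration.

```python
def angular_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0-180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def direct_phase_selection(reflections, lo=35.0, hi=145.0):
    """Build the optimized initial phase set from the initial SAD phases.

    Each reflection is a dict with the (hypothetical) keys phi_sad, fom,
    phi_1, phi_2 and phi_dm_nhl, all angles in degrees. The result is
    sorted by theta_ds, i.e. it is the theta_DS list with the new phases.
    """
    out = []
    for r in reflections:
        # theta_DS: angle between the initial SAD and preliminary DM phase
        theta_ds = angular_diff(r["phi_sad"], r["phi_dm_nhl"])
        if lo <= theta_ds <= hi:
            # Assumption: take the ambiguity phase nearer to the DM phase
            d1 = angular_diff(r["phi_1"], r["phi_dm_nhl"])
            d2 = angular_diff(r["phi_2"], r["phi_dm_nhl"])
            phi = r["phi_1"] if d1 <= d2 else r["phi_2"]
            fom, selected = 1.0, True  # selected phases get FOM = 1.0
        else:
            # Unselected reflections keep their initial phase and FOM
            phi, fom, selected = r["phi_sad"], r["fom"], False
        out.append({"theta_ds": theta_ds, "phi_s_sad": phi,
                    "fom": fom, "selected": selected})
    return sorted(out, key=lambda item: item["theta_ds"])


# Three made-up reflections
data = [
    {"phi_sad": 40.0, "fom": 0.3, "phi_1": 100.0, "phi_2": 340.0,
     "phi_dm_nhl": 90.0},
    {"phi_sad": 40.0, "fom": 0.6, "phi_1": 60.0, "phi_2": 20.0,
     "phi_dm_nhl": 50.0},
    {"phi_sad": 200.0, "fom": 0.2, "phi_1": 290.0, "phi_2": 110.0,
     "phi_dm_nhl": 30.0},
]
for row in direct_phase_selection(data):
    print(row)
```

Only the first reflection has an angle inside the window (50°), so only its phase is replaced and up-weighted; the other two, at 10° and 170°, pass through unchanged.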

Determination of heavy-atom substructures
For the five test cases, the substructures in protein crystals were first solved with SHELXC/D/E based on the anomalous difference maps of each SAD data set. The numbers of heavy-atom sites and sulfur 'super-atom' sites were determined (Fig. 3). Five and three strong sulfur 'super-atom' sites with occupancies greater than 0.75 and 0.60 were located in the unit cells of lysozyme_S and insulin_S at resolutions of 1.82 and 2.52 Å, respectively. Two Zn positions with occupancies near 1 were determined in lectin_Zn. One Gd position with occupancy ~1 was found in lysozyme_Gd. Eight Fe sites with occupancies greater than 0.8 were located in cytochrome c3_Fe. All of the sites with occupancies found by SHELXC/D/E were input directly to Phaser in CCP4 to generate the initial SAD phases $\varphi_{SAD}$. The overall initial $\langle FOM_{SAD}\rangle$ (mean FOM) values of the five test cases were determined as 0.489, 0.262, 0.553, 0.467 and 0.543 for cytochrome c3_Fe, lectin_Zn, lysozyme_Gd, lysozyme_S and insulin_S, respectively.

Relationship between the percentage correct and the angle $\theta_{DS}$
Based on the simulation results using the direct phase-selection method (§3.4) and the model phases $\varphi_C$, the statistics clearly show that the percentage correct at angles $\theta_{DS}$ in the range between 35 and 145° is generally higher than that at other angles (Figs. 4c and 5). In the five simulation cases, the percentage correct could not be reliably estimated in the range 0-10° for lectin_Zn, lysozyme_Gd, lysozyme_S and insulin_S because there are either no or only a few reflections in this small range.
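A distribution of this kind can be tabulated with a small binning helper. The sketch below is illustrative only: the bin edges, the function name and the sample records are assumptions, and empty bins are reported as `None` so that sparsely populated ranges are not mistaken for a percentage of zero.

```python
import bisect


def percentage_correct_by_bin(records, edges):
    """Tabulate the percentage correct in consecutive bins.

    records: iterable of (value, is_correct), where value is e.g. the
             angle theta_DS in degrees for one reflection
    edges:   ascending bin edges; bins are [e0, e1), [e1, e2), ..., with
             the last bin also including its upper edge
    Returns a list of (low, high, count, percentage or None).
    """
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    correct = [0] * n_bins
    for value, is_correct in records:
        if value == edges[-1]:
            i = n_bins - 1
        else:
            i = bisect.bisect_right(edges, value) - 1
        if 0 <= i < n_bins:
            counts[i] += 1
            if is_correct:
                correct[i] += 1
    table = []
    for i in range(n_bins):
        pct = 100.0 * correct[i] / counts[i] if counts[i] else None
        table.append((edges[i], edges[i + 1], counts[i], pct))
    return table


# Made-up records: (theta_DS, was the selected phase correct?)
records = [(5.0, False), (40.0, True), (60.0, True),
           (90.0, False), (150.0, False), (170.0, True)]
for low, high, n, pct in percentage_correct_by_bin(records, [0, 35, 145, 180]):
    print(low, high, n, pct)
```

The same helper could equally be applied to the initial FOM with edges between 0 and 1.0.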

The percentage correct versus the initial FOM using various methods
In this section, we show how the relation between the initial FOM and the percentage correct varies according to the three methods. In the five simulation cases, the data sets from the regular method, the non-constraint method and the direct phase-selection method show that the percentage correct varies with the range of the initial FOM (Fig. 6). After calculations with the regular method, the non-constraint method and the direct phase-selection method, some parameters, including the final DM phase and the model-calculated phase $\varphi_C$, are used to evaluate the percentage correct for each case for comparison purposes. A selection is defined as correct when the final DM phase ($\varphi_{DM}^{R}$, $\varphi_{DM}^{N}$ or $\varphi_{DM}^{S}$) and the model-calculated $\varphi_C$ are in the same region and as incorrect when they are in different regions. The percentage correct is defined as the ratio of the number of reflections with correctly selected phases to the total number of reflections. A comparison of the same reflections in each FOM interval among the data sets from the regular method, the non-constraint method and the direct phase-selection method indicates that the percentage correct using the direct selection method is higher than those of the regular and non-constraint methods in all five cases. The percentage correct with the regular method is slightly higher than that with the non-constraint method in each case (Fig. 6).

Figure 6 Percentage correct as a function of FOM for initial phases with the regular method, the non-constraint method and the direct phase-selection method. The horizontal axis indicates the initial FOM from the smallest to the largest values (0-1.0).

Improvement of density-map quality
In this section, we examine the differences among the regular map, the non-constraint map and the direct phase-selection map after final density modification with RESOLVE. A comparison of the regular maps, non-constraint maps and the direct phase-selection maps in all five test cases shows that the continuity and completeness of the electron-density maps using the direct phase-selection method are significantly improved and are superior to those of the regular map and the non-constraint map (Fig. 7). The statistical indicators, such as the map correlation coefficient and mean phase error, for the map quality after DM are evaluated in Table 2. On comparison, the new direct phase-selection method gives better map quality statistics than those for the regular and non-constraint methods for all test cases.

Figure 7 Electron-density maps of cytochrome c3_Fe, lectin_Zn, lysozyme_Gd, lysozyme_S, insulin_S and HptB_Se (the unknown structure) after density modification with RESOLVE from various methods (the non-constraint map, the regular map and the direct selection map) are shown with the same contour level 1.0σ in blue. The corresponding structures are shown as black sticks.
Similarly, with the above-mentioned protocol but using the CCP4 program DM with solvent flattening, the results from all test cases show that the new selection method gives better statistics than those for the regular and non-constraint methods (Table 3).
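For reference, a mean phase error of the kind reported in Tables 2 and 3 can be computed as the average absolute difference between the DM phases and the model phases. The sketch below assumes an unweighted mean and wraps each difference into the range 0-180°; the tables may use a different weighting, so this is an illustration rather than the exact statistic, and the sample values are made up.

```python
def angular_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0-180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def mean_phase_error(phi_dm, phi_c):
    """Unweighted mean of |phi_DM(i) - phi_c(i)| over N reflections.

    phi_dm: DM phases in degrees, one per reflection
    phi_c:  model phases in degrees, in the same reflection order
    """
    if len(phi_dm) != len(phi_c):
        raise ValueError("phase lists must have the same length")
    if not phi_dm:
        raise ValueError("no reflections given")
    total = sum(angular_diff(a, b) for a, b in zip(phi_dm, phi_c))
    return total / len(phi_dm)


# Made-up phases for four reflections
phi_dm = [10.0, 350.0, 180.0, 90.0]
phi_c = [20.0, 10.0, 170.0, 270.0]
print(mean_phase_error(phi_dm, phi_c))  # 55.0
```

Wrapping matters for pairs such as 350° and 10°, which differ by 20° rather than 340°.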

A comparison of model building with regular, non-constraint and direct selection maps
In our experiment, automated model building after the final density modification with RESOLVE for the five test cases was performed with ARP/wARP. According to the results of model building shown in Table 2, the completeness of autobuilt residues with main chains and side chains in lectin_Zn, lysozyme_Gd and lysozyme_S with the direct phase-selection method is greater than those with the non-constraint and regular methods. The autobuilding results for insulin_S are comparable among the three methods. For a parallel comparison, improvements in model building were also observed with the CCP4 program DM combined with the direct phase-selection method (Table 3).
All of the proteins could be autobuilt using ARP/wARP except for cytochrome c3_Fe (resolution 3.0 Å) because of the resolution limitation of 2.5 Å for ARP/wARP autobuilding. The structure of cytochrome c3_Fe could, however, be built manually (73%) based on the improved density map at 3.0 Å resolution generated from the direct phase-selection method, whereas the maps from the regular and non-constraint methods were not suitable for model building because of severe discontinuity.

Application to an unknown structure
In this section, we applied our newly investigated selection method to a practical case with an unknown structure: histidine-containing phosphotransfer domain B (HptB). HptB comprises 116 amino-acid residues with a molecular mass of ~13.2 kDa. HptB_Se was expressed in E. coli in selenomethionine medium for Se-SAD phasing and structure determination. Crystals of HptB_Se diffracted to 2.0 Å resolution and exhibited the symmetry of space group $I4_122$ (Table 1). Interpretation of the anomalous difference map with SHELXC/D/E revealed nine Se sites with occupancies in the range 0.4-1.0 (Fig. 3). The overall $\langle FOM_{SAD}\rangle$ of the initial SAD phases was determined to be 0.428 using Phaser in CCP4.
The preliminary DM phases ($\varphi_{DM}^{NHL}$) were obtained from the initial SAD phases after the first run of DM in CCP4. The $\theta_{DS}$ list from the smallest to the largest was then generated; the phase $\varphi_1$ or $\varphi_2$ in each reflection at a corresponding angle $\theta_{DS}$ in a range between 35 and 145° was subsequently selected with the direct phase-selection method. Data sets with optimized $\varphi_{SAD}^{S}$ were generated and subsequently directed to RESOLVE to calculate the direct selection map with improved final DM phases $\varphi_{DM}^{S}$ (Fig. 7). In this step, no phase combination was carried out, thus no Hendrickson-Lattman coefficients were provided; all of the data were used for density modification without a resolution cutoff. For comparison, the regular and non-constraint maps were also obtained with the regular and non-constraint methods, respectively, using RESOLVE. As a result, similar to the five test cases, the continuity and completeness of the direct selection map were significantly improved (Fig. 7).

Footnotes to Table 2: $\varphi_{DM}(i)$ is the DM phase of the ith reflection and $\varphi_c(i)$ is the model phase. N denotes the total number of reflections. § The completeness of autobuilt residues with side chains was calculated with ARP/wARP. All proteins can be autobuilt except for cytochrome c3_Fe, because the resolution limit of autobuilding in ARP/wARP is ~2.5 Å. ¶ Phase $\varphi_1$ or $\varphi_2$ is selected based on the model phase $\varphi_C$.
The initial protein structure was then autobuilt with ARP/wARP and the final model was refined and completed with REFMAC5 and Coot. The results of the automated model building of the HptB_Se structure with ARP/wARP are compared among the various methods; they show that the direct selection method produces a much higher completeness of residues built with side chains and main chains (Table 2). The newly determined and refined structure of HptB allowed us to calculate the model phases ($\varphi_C$) to interpret and to compare the regular method, the non-constraint method and the direct selection method with the statistics of map-quality indicators using RESOLVE. According to the comparison, the quality of the electron-density map using our new direct phase-selection method is much improved, with superior statistics for indicators including the map correlation coefficient, the mean phase error and the completeness of built residues (Table 2 and Fig. 7). For a comparison, improvements were also obtained with the CCP4 program DM combined with the direct phase-selection method (Table 3).
A few more test examples, including chitinase with Zn atoms, sulfite reductase with Fe (Hsieh, Liu et al., 2010) and the unknown structure of haemerythrin with Fe (Phimonphan et al., unpublished data), have also been examined using our new phase-selection method. All of the results showed that the direct phase-selection method produced an improvement similar to that in the previously described cases, with enhanced electron densities and statistics of indicators (Supplementary Table S1).

Simulated phase selection based on the model phase $\varphi_C$
According to the direct phase-selection method, a portion of the $\varphi_{am}$ ($\varphi_1$ or $\varphi_2$) phase set can be correctly selected with preliminary DM phases $\varphi_{DM}^{NHL}$. Our ultimate objective is to select the correct phase ($\varphi_1$ or $\varphi_2$) optimally. In our simulation experiment, we demonstrate that the correct phase ($\varphi_1$ or $\varphi_2$) can be effectively selected with the model phase $\varphi_C$ for each reflection in all simulation cases. The correct selected phases $\varphi_{am}$ are hence highly dependent on the phases $\varphi_C$. The mean phase errors of the final DM phases and the map correlation coefficients of both the initial phases and the final DM phases were calculated for five simulation cases for evaluation (Table 2). The simulation with known $\varphi_C$ produces much improved map correlation coefficients and mean phase errors (by 0.05-0.3 and 10-35°, respectively) relative to the regular and non-constraint methods with RESOLVE. A similar improvement of map correlation coefficients and mean phase errors was observed using the direct selection method combining the CCP4 program DM with solvent flattening and standard parameters (Table 3). This simulation provides a basis for the use of the correctly selected initial phases to improve the map correlation coefficients and mean phase errors. However, $\varphi_C$ is unavailable in practical cases without known structures for the selection of the correct phase $\varphi_{am}$. The derivative direct phase-selection method using the angle $\theta_{DS}$, without the information of $\varphi_C$, is hence investigated to improve the electron-density map for practical applications.

Table 3 Comparison of indicators of map quality among various methods with the CCP4 program DM.
$\varphi_{DM}(i)$ is the DM phase of the ith reflection and $\varphi_c(i)$ is the model phase. N denotes the total number of reflections. § The completeness of autobuilt residues with side chains was calculated with ARP/wARP. All proteins can be autobuilt except for cytochrome c3_Fe, because the resolution limit of autobuilding in ARP/wARP is ~2.5 Å. ¶ Phase $\varphi_1$ or $\varphi_2$ is selected based on the model phase $\varphi_C$.

Comparison of the quality of density maps with various methods
In all test cases, the electron-density map from our new direct phase-selection method is significantly better than those from conventional approaches in terms of map continuity and completeness (Fig. 7). The map correlation coefficients and mean phase errors are generally improved by 0.05-0.2 and 10-18°, respectively, using the direct phase-selection method with a single cycle, in which the newly selected phases φ_am (φ_1 or φ_2), which have a higher confidence level, replace the corresponding initial SAD phases φ_SAD (Table 2). An iterative calculation with several cycles was also performed to examine any improvement; however, we found that the direct phase-selection method with two or three cycles did not show a notable improvement in map quality and indicators. All of the simulation results show that using selected phases φ_am (φ_1 or φ_2) based on the preliminary DM phase φ_DM^NHL, instead of φ_C, could efficiently improve the map correlation coefficients and mean phase errors and generate density maps of a higher quality from the final DM phases φ_DM^S compared with the regular methods with the Hendrickson-Lattman coefficients and the commonly used DM default procedures.
To further demonstrate the power of the direct phase-selection method, calculations using the regular method with combinations of phase choice by OASIS direct methods and density modification with RESOLVE and DM were carried out for comparison. In general, the regular method with OASIS initial phases and DM produced better results than that with OASIS initial phases and RESOLVE. However, our direct phase-selection method gave results that were superior overall to calculations using OASIS initial phases combined with both DM programs (Supplementary Table S2).
Moreover, from analysis of our direct phase-selection method, selecting one of the two phase choices from the SAD phase probability distribution seems to shift the starting phase more than performing phase combination. Thus, we performed solvent flipping, another over-shifting method, for comparison. The results showed that using solvent flipping in the regular method did not produce better results than the direct selection method in terms of the mean phase errors and residues built (Supplementary Table S3).

Comparison of the map quality with various aspects of the direct selection method
To optimize the algorithm for direct phase selection, we performed a parallel comparison of various aspects related to this method, including FOM weighting, resolution, iterative cycles, initial SAD phases and θ_DS ranges. For the FOM, we compared three different FOM weighting schemes in our direct phase-selection method: (i) FOM = 1.0 for the selected phases and the initial FOM from SAD for the unselected phases, (ii) FOM = 1.0 for all phases and (iii) the initial FOM from SAD for all corresponding phases. In this comparison, the direct selection method with weighting scheme (i), which is used in this study, is either comparable to or better than the other two weighting schemes (ii) and (iii) (Supplementary Table S4). We also examined other possible weighting schemes coupled with the selection criteria, i.e. the percentage correct. Fig. 5 shows that the percentage correct varies over different ranges of θ_DS. We tested a weighting scheme corresponding to the percentage correct for selected reflections in the range 35-145°. The respective FOM values are calculated based on the ratio of percentage correct for reflections in different θ_DS ranges. For the unselected reflections, the FOMs are set to the initial FOM values from SAD phasing. The comparison shows that weighting scheme (i), with FOM = 1 for the directly selected phases, is either better than or comparable to the percentage correct-dependent FOM weighting scheme.
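The three FOM weighting schemes compared above can be written down compactly. The following Python sketch is illustrative only: the function name, the list-based representation of reflections and the use of integer indices to mark the selected reflections are all assumptions made for the example.

```python
def apply_fom_scheme(initial_foms, selected_indices, scheme="i"):
    """Return a new list of FOM values under one of the three schemes.

    scheme "i":   FOM = 1.0 for selected phases, initial SAD FOM otherwise
    scheme "ii":  FOM = 1.0 for all phases
    scheme "iii": initial SAD FOM for all phases
    """
    selected = set(selected_indices)
    if scheme == "i":
        return [1.0 if i in selected else f for i, f in enumerate(initial_foms)]
    if scheme == "ii":
        return [1.0] * len(initial_foms)
    if scheme == "iii":
        return list(initial_foms)
    raise ValueError("scheme must be 'i', 'ii' or 'iii'")
```

With initial FOMs of 0.3, 0.6 and 0.9 and only the second reflection selected, scheme (i) leaves the first and third values unchanged and raises the second to 1.0.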
For the resolution, we examined various resolution ranges varying from the highest resolution of the data and found no notable differences in improvement. For the iterative cycles, a series of iterative cycles was performed to examine any improvement with the direct phase-selection method, which showed that two or three more cycles did not produce a significant further improvement of the map quality and indicators. For the SAD initial phases, we tested the initial SAD phases generated from OASIS (He et al., 2007) and applied the same direct phase-selection protocol. The results show that our direct selection method using initial SAD phases generated directly from Phaser was better than using initial phases from OASIS. The details of the θ_DS ranges used are described in the following section.

Percentage correct as a function of the angle θ_DS
To determine the optimized range in this work, we extensively analyzed all of the calculations in various ranges for all test cases (Supplementary Table S5). The statistical indicators, including the map correlation coefficients, the mean phase errors and the completeness of the built residues, were improved in all cases with reflections with selected phases φ_am at θ_DS angles in the range 35-145° (except for lectin_Zn, where the angles were in the range 40-140°) relative to the other angle ranges. Considering the comparable statistical indicators (mean phase error 54.28 versus 54.14°) and the completeness of the built residues (84.3% versus 84.9% for main chains and 80.8% versus 80.5% for side chains) using the θ_DS ranges 35-145 and 40-140°, respectively (Supplementary Table S5), we suggest selecting the reflection phases from angles in the range 35-145° for lectin_Zn as in the other cases, for consistency, based on the statistical analysis. The fraction of selected reflections is about 0.28-0.48 of the total reflections for θ_DS angles between 35 and 145° in all six cases. θ_DS angles of <35 and >145° show a lower percentage correct for the selection of phase φ_1 or φ_2 in region 1 or 2. For cases with θ_DS < 35°, the angles between the preliminary DM phase φ_DM^NHL and the initial SAD phase φ_SAD might be too small to resolve the ambiguity between the two initial phases (φ_1 and φ_2), which places these reflections in the grey zone with a low percentage correct (Figs. 4b and 4c). Similarly, for cases with θ_DS > 145° (also in the grey zone), the angles between φ_DM^NHL and φ_SAD might be too large for the correct phase to be distinguished effectively from the incorrect one among the two initial phases (φ_1 and φ_2), again giving a low percentage correct.
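The range-based selection discussed in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the angle is taken as the smallest angular difference (0-180°) between the initial SAD phase and the preliminary DM phase, the 35 and 145° bounds are treated as inclusive, and the function and parameter names are hypothetical.

```python
def theta_ds(phi_sad, phi_dm_nhl):
    """Angle between the initial SAD phase and the preliminary DM phase (0-180 degrees)."""
    d = abs(phi_sad - phi_dm_nhl) % 360.0
    return 360.0 - d if d > 180.0 else d


def select_reflections(phi_sad_list, phi_dm_list, low=35.0, high=145.0):
    """Return indices of reflections whose angle lies in [low, high] and their fraction.

    Assumption: both bounds are inclusive.
    """
    if len(phi_sad_list) != len(phi_dm_list):
        raise ValueError("phase lists must have equal length")
    selected = [
        i
        for i, (s, d) in enumerate(zip(phi_sad_list, phi_dm_list))
        if low <= theta_ds(s, d) <= high
    ]
    fraction = len(selected) / len(phi_sad_list) if phi_sad_list else 0.0
    return selected, fraction
```

The returned fraction is the quantity that the text reports as roughly 0.28-0.48 of the total reflections for the six test cases; passing `low=40.0, high=140.0` reproduces the alternative range examined for lectin_Zn.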
We found that a higher average intensity ⟨I⟩ commonly corresponds to an angle θ_DS in the range between 40 and 120° (Supplementary Table S6). Some individual strong reflections have been shown to improve the map quality after density modification (Uervirojnangkoorn et al., 2013; Vekhter, 2005; Zhang & Main, 1990). The distribution of strong reflections might be one of the reasons why the higher percentage correct occurs at a θ_DS angle in the range 35-145° (Figs. 4c and 5). The algorithm of our direct selection method, based on the θ_DS angle combined with weighting schemes, is different from previous methods.

Percentage correct versus initial FOM with various methods
A comparison of reflections in the same batch among different data sets for the regular method, the non-constraint method and the direct phase-selection method shows that the percentage correct with the direct phase-selection method is generally 5-10% higher than that with the regular and non-constraint methods in the five simulation cases (Fig. 6). Using the selected phases φ_am (φ_1 or φ_2) could thus efficiently improve the percentage correct compared with the regular method with the Hendrickson-Lattman coefficients and the commonly used DM procedure as described in §2. The percentage correct generally decreases for reflections with large initial FOM values (>0.8), which might result from the two phases (φ_1 and φ_2) lying close to each other, similar to cases with θ_DS < 35°. The percentage correct might also be affected by lack of closure, random errors and systematic errors (Borek et al., 2003).
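A tabulation of percentage correct against initial FOM, as discussed above, could be produced along the following lines. This Python sketch is only illustrative: the function name and the equal-width FOM bins are assumptions, and the decision of whether a given reflection counts as correct is left to the caller, who passes it in as a list of booleans.

```python
def percentage_correct_by_fom(foms, is_correct, n_bins=5):
    """Percentage of correct phases in equal-width FOM bins over [0, 1].

    Returns one value per bin; bins without reflections yield None.
    """
    if len(foms) != len(is_correct):
        raise ValueError("inputs must have equal length")
    counts = [0] * n_bins
    hits = [0] * n_bins
    for fom, ok in zip(foms, is_correct):
        # FOM = 1.0 falls into the last bin rather than a bin of its own.
        b = min(int(fom * n_bins), n_bins - 1)
        counts[b] += 1
        if ok:
            hits[b] += 1
    return [100.0 * h / c if c else None for h, c in zip(hits, counts)]
```

With five bins, the last entry covers FOM values from 0.8 to 1.0, which is the region where the text notes a general decrease in the percentage correct.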

Conclusions
The discussions above clearly show that the new procedure of phase improvement, i.e. the direct phase-selection method, combined with RESOLVE or the CCP4 program DM, can effectively improve the phase in comparison to the regular method with Hendrickson-Lattman coefficients using RESOLVE and DM. In the direct selection method, the SAD standard protocol was applied in the RESOLVE routine except for the Hendrickson-Lattman coefficients and fom_cut parameters. Similar improvements in SAD phases and density maps were obtained using DM with standard parameters except for histogram matching and Hendrickson-Lattman coefficients.
Ideally, according to our simulation study, a relatively high completeness for the selection of the correct phases φ_1 or φ_2 could be achievable, but only based on known φ_C. A lack of known structures or model-calculated φ_C in practical applications led us to investigate the novel 'direct phase-selection method', which utilizes the 'θ_DS list' to select phases φ_am from φ_1 or φ_2 of selected reflections (28-48%) with a high percentage of correct phases to replace the corresponding initial SAD phases φ_SAD. A comparative analysis implies that the choice of a proper subset of reflections with the selected phase based on θ_DS might be more decisive than other aspects, such as the DM program and weighting scheme, in which FOM = 1.0 is used for the selected phases and the initial FOM of SAD is used for the unselected phases in this method.
Optimization of the initial phasing is considered to be a decisive factor in the success of the subsequent electron-density modification, model building and structure determination with the SAD method. Our new direct phase-selection method provides a powerful protocol with an essential additional selection step, combined with current DM software for simple solvent flattening, such as RESOLVE and DM, to resolve the initial phase ambiguities of a subset of reflections for further density modification. In contrast to most phase-improvement studies, which focus on density modification after the initial SAD phasing, our method focuses on the optimization of the initial phasing of a subset of reflections by imposing a binary phase choice, without using phase combination, to shift the phase probability distribution towards the better phase choice. With better initial SAD phases before carrying out the general DM procedure, the success rate of structure determination might be increased. The resulting final DM phases and electron-density maps were effectively improved by the direct phase-selection method compared with the regular method with Hendrickson-Lattman coefficients, yielding improved statistical indicators of map quality and completeness of model building. Based on our test results, with data of average or below average quality (high R_merge or medium-low resolution), the direct phase-selection method with an additional selection step for simple solvent flattening could still perform well with good electron density for model building. Optimization and increased completeness of the phase selection will be studied systematically in the near future.
We are indebted to the supporting staff at beamlines BL13B1, BL13C1 and BL15A1 at the National Synchrotron Radiation Research Center (NSRRC) and Masato Yoshimura and Hirofumi Ishii at the Taiwan-contracted beamline BL12B2 and beamline BL44XU at SPring-8 for technical assistance under proposal Nos. 2011A4017, 2011A4002, 2011B4012, 2011B4004, 2012A4009 and 2012A6760. We thank Professor Tomake Tsukihara for valuable suggestions and discussions. This work was supported in part by National Science Council (NSC) grants 98-2313-B-009-001-MY3 and 101-2628-B-213-001-MY4 and National Synchrotron Radiation Research Center (NSRRC) grants 1003RSB02 and 1023RSB02 to C-JC.