research papers
Ab initio phasing by molecular averaging in real space with new criteria: application to structure determination of a betanodavirus
aLife Science Group, Scientific Research Division, National Synchrotron Radiation Research Center, 101 Hsin-Ann Road, Hsinchu 30076, Taiwan, bInstitute of Biotechnology and University Center for Bioscience and Biotechnology, National Cheng Kung University, Tainan 701, Taiwan, cInstitute for Protein Research, Osaka University, 3-2 Yamadaoka, Suita, Osaka 565-0871, Japan, and dDepartment of Physics, National Tsing Hua University, Hsinchu 39943, Taiwan
*Correspondence e-mail: cjchen@nsrrc.org.tw
Molecular averaging, including noncrystallographic symmetry (NCS) averaging, is a powerful method for ab initio phase determination and phase improvement. Applications of the cross-crystal averaging (CCA) method have been shown to be effective for phase improvement after initial phasing by molecular replacement, isomorphous replacement, anomalous dispersion or combinations of these methods. Here, a two-step process for phase determination in the X-ray structural analysis of a new coat protein from a betanodavirus, Grouper nervous necrosis virus, is described in detail. The first step is ab initio structure determination of the T = 3 icosahedral virus-like particle using NCS averaging (NCSA). The second step involves structure determination of the protrusion domain of the viral molecule using cross-crystal averaging. In this method, molecular averaging and solvent flattening constrain the electron density in real space. To quantify these constraints, a new, simple and general indicator, free fraction (ff), is introduced, where ff is defined as the ratio of the volume of the electron density that is freely changed to the total volume of the crystal. This indicator is useful and effective for evaluating the strengths of both NCSA and CCA. Under the condition that a mask (envelope) covers the target molecule well, an ff value of less than 0.1, as a new rule of thumb, gives sufficient phasing power for the successful construction of new structures.
Keywords: ab initio phasing; NCS averaging; cross-crystal averaging; multi-crystal averaging; phasing power; icosahedral virus structure; free fraction.
1. Introduction
The molecular averaging method in real space, including noncrystallographic symmetry (NCS) averaging and cross-crystal averaging (CCA), coupled with solvent flattening, has proved to be powerful in improving the determination of the phases in protein crystallography. As a case in point, NCS averaging (NCSA) with phase extension of icosahedral viruses is a common procedure for phase improvement after initial calculations based on molecular replacement (MR) using a density map from a cryo-electron microscope, a similar model or initial experimental phases from isomorphous replacement or anomalous dispersion. Since the successful structure determination of the human common cold virus serotype 14 (Rossmann et al., 1985; Arnold et al., 1987), several virus structures have been solved using NCSA with phase extension. As the phasing power of NCSA is effective, the possibility of ab initio phase determination has been proposed. The feasibility of ab initio phase determination has been demonstrated for spherical viruses (Tsao et al., 1992). Ab initio trials and procedures have been reported for known icosahedral viruses (Taka et al., 2005). An assessment of the application of NCSA to icosahedral viruses, together with a detailed investigation of successful and unsuccessful cases, has also appeared (Plevka et al., 2011).
When the initial phases were unsatisfactory, the CCA method has been utilized for phase improvement (Taylor, 2010). Many cases were improved with CCA from initial phasing of various types: MR (Isupov et al., 2004), multiple isomorphous replacement (MIR; Crennell et al., 2000; Chen et al., 2005) and single-wavelength anomalous diffraction (SAD; Su et al., 2010). The phasing power of CCA for the improvement of initial phases from MR to obtain a partial unknown structure has been demonstrated and discussed (Li & Li, 2011), but there has been no attempt to achieve ab initio phasing by CCA.
Nodaviridae are a family of positive-sense single-stranded RNA viruses with a non-enveloped T = 3 icosahedral capsid. Alphanodavirus and Betanodavirus are the two major genera. Several structures of alphanodaviruses, including Nodamura virus (NoV; PDB entry 1nov; Zlotnick et al., 1997), Pariacoto virus (PaV; PDB entry 1f8v; Tang et al., 2001), Black beetle virus (BBV; PDB entry 2bbv; Wery et al., 1994) and Flock house virus (FHV; PDB entry 4fsj; J. A. Speir, Z. Chen, V. S. Reddy & J. E. Johnson, unpublished work), have been solved. Various strains of the genus Betanodavirus have been isolated from several fishes, such as Striped jack nervous necrosis virus (SJNNV), Tiger puffer nervous necrosis virus (TPNNV), Red-spotted grouper nervous necrosis virus (RGNNV) and Barfin flounder nervous necrosis virus (BFNNV) (Nishizawa et al., 1997). These betanodaviruses have a similar viral genome and cause viral nervous necrosis (VNN). We have recently determined the first crystal structures of a betanodavirus, Grouper nervous necrosis virus (GNNV), in various forms: a complete T = 3 GNNV-like particle (GNNV-LP) at 3.6 Å resolution, a T = 1 icosahedral subviral particle of the truncated surface domain (S-domain) at 3.1 Å resolution and the individual protrusion domain (P-domain) of the GNNV coat protein (CP) at 1.2 Å resolution (Chen et al., 2015). In this paper, we describe the processes used in phase determination of the T = 3 GNNV-LP and the P-domain in detail.
To determine the new structure of GNNV-LP, we applied molecular-averaging methods in two steps. Firstly, an NCSA ab initio phasing trial with a similar envelope from a different virus was performed. The structure of the whole virus-like particle (VLP) of T = 3 GNNV was successfully obtained. Atomic modelling with the amino-acid sequence of the S-domain of the T = 3 GNNV capsid protein was straightforward. The density map of the P-domain region in the T = 3 GNNV-LP was, however, insufficient for complete model building, so that only partial main chains could be traced. An analysis of the T = 3 GNNV particle structure led us to design a truncated form consisting of only the P-domain of the CP. Several crystal forms of the individual P-domain were subsequently obtained at high resolution (1.8 Å for phasing, 1.2 Å for final refinement). Secondly, utilizing the cut-out envelope of the P-domain region of the T = 3 GNNV-LP crystal, we applied CCA to the various crystal forms of the P-domain to acquire the correct phases, including the phases for the high-resolution data. In this manner, the structure of the individual P-domain was successfully built with good quality and was subsequently used for complete model building and refinement of T = 3 GNNV-LP.
Following earlier work (Arnold & Rossmann, 1986), a new and simpler quantification of the power of molecular averaging is introduced in our work. In molecular averaging, some portions of the electron density of the crystal are constrained by both molecular averaging and solvent flattening. We introduce an indicator, named the free fraction (ff), which corresponds to the ratio of the unconstrained (freely changing) volume to the total volume and can be easily applied to the case of CCA. Here, we discuss our attempts to apply ff to phase determinations with both NCSA and CCA.
2. Methods and results for phase determination of icosahedral T = 3 GNNV-LP using NCS averaging
2.1. Data collection and processing
Crystals of T = 3 GNNV-LP were first screened on beamlines BL13B1, BL13C1 and BL15A1 at NSRRC, Hsinchu, Taiwan and BL12B2 at SPring-8, Harima, Japan; the complete X-ray diffraction data sets were collected on BL44XU at SPring-8 with a charge-coupled device (CCD) detector (MX225-HE, Rayonix). The crystals of GNNV-LP belonged to the monoclinic space group C2, with unit-cell parameters a = 477, b = 422, c = 337 Å, β = 134°. Diffraction images were collected with an oscillation angle of 0.3° per frame and an exposure duration of 3 s at a wavelength of 0.9 Å. The high-resolution data set (50–3.6 Å) was collected first.
To assist ab initio phase determination, data at very low resolution are essential. During data collection, the beamstop was moved to the downstream side at 95 mm from the crystal, as near to the detector as possible; an X-ray beam of wavelength 1.2 Å was used to include the lowest resolution spots (266–12 Å). The images at this low resolution were collected with an oscillation angle of 0.5° per frame and an exposure duration of 3 s. The entire data sets, including those at very low and high resolutions, were recorded with an X-ray beam size of 50 µm on separate diffraction regions of a long GNNV-LP crystal of dimensions 0.1 × 0.1 × 0.3 mm.
For data processing, including integration and scaling, we used HKL-2000 (Otwinowski & Minor, 1997). During the processing of the data at the lowest resolution, the rejection criterion to select diffraction spots was changed to the loosest criterion in order to prevent the rejection of any spots. Even though heavily overloaded spots were observed in the region of very low resolution, we collected these spots without rejections; the number of missing reflections at the lower resolution side was thus kept to a minimum. Only two reflections, (h k l) = (3 1 −1) and (3 1 −2), were missed in the resolution range below 100 Å. A total of 600 frames were taken for the high-resolution data set and 360 frames for the low-resolution data set; scaling was performed concurrently with these 960 frames in total. The data set at the highest resolution eventually contained 499 184 reflections from 50 to 3.6 Å with a completeness of 98% for use in refinement.
2.2. Phase determination of T = 3 GNNV-LP
Self-rotation functions were calculated to determine the orientations of the icosahedral symmetries (twofold, threefold and fivefold axes) using MOLREP (Vagin & Teplyakov, 2010) from the CCP4 program suite (Winn et al., 2011). The orientation of the GNNV-LP particle was suggested to be the rotated orientation with CCP4 polar angles (ω, φ, κ) = (90, 90, 40°) from the icosahedral convention of orthogonal coordinate axes and fold axes: 2(z)-5-3-(x)2 (Arnold et al., 1984). There were two GNNV-LP particles in the monoclinic C2 unit cell, with one twofold NCS axis of the icosahedral virus particle coinciding with the crystallographic twofold axis. One asymmetric unit contained half of a GNNV-LP particle, or 30 copies of the icosahedral asymmetric unit. 60 matrices for icosahedral symmetries were thus generated, of which 30 were used in the averaging calculations as the initial matrix operators for NCSA. To simplify the matrix parameters, we arranged the centre of one GNNV-LP icosahedral particle to be located at the origin (0, 0, 0).
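The generation of the 60 icosahedral operators and the selection of 30 of them can be illustrated with a short Python sketch. This is not the authors' script: the fivefold axis direction (1, 0, φ_golden) is an assumed reading of the 2(z)-5-3-(x)2 convention, and pairing operators related by a twofold along y is only one plausible way to pick the 30 operators that remain once the crystallographic twofold is accounted for.

```python
import numpy as np

def rotation(axis, angle_deg):
    """Rotation matrix about `axis` by `angle_deg` (Rodrigues' formula)."""
    x, y, z = np.asarray(axis, dtype=float) / np.linalg.norm(axis)
    t = np.radians(angle_deg)
    K = np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])
    return np.eye(3) + np.sin(t) * K + (1.0 - np.cos(t)) * (K @ K)

def key(m):
    """Hashable, rounded fingerprint of a matrix, used to detect duplicates."""
    return tuple(np.round(m, 6).ravel() + 0.0)

def icosahedral_group():
    """Close two generators under multiplication to obtain all rotations."""
    golden = (1.0 + np.sqrt(5.0)) / 2.0
    generators = [rotation([0, 0, 1], 180.0),       # twofold along z
                  rotation([1, 0, golden], 72.0)]   # fivefold in the xz plane (assumed)
    found = {key(np.eye(3)): np.eye(3)}
    frontier = [np.eye(3)]
    while frontier:
        new = []
        for m in frontier:
            for g in generators:
                p = g @ m
                if key(p) not in found:
                    found[key(p)] = p
                    new.append(p)
        frontier = new
    return list(found.values())

def unique_under_twofold(ops, twofold):
    """Keep one operator from each pair {m, twofold @ m}."""
    kept, seen = [], set()
    for m in ops:
        if key(m) in seen:
            continue
        kept.append(m)
        seen.add(key(m))
        seen.add(key(twofold @ m))
    return kept

ops = icosahedral_group()
half = unique_under_twofold(ops, rotation([0, 1, 0], 180.0))
```

In a real calculation each operator would additionally be conjugated by the particle orientation found from the self-rotation function before use.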
For the ab initio phasing method, which used NCSA with phase extension, an appropriate mask was necessary to separate two regions: the protein region to be NCS-averaged and the solvent region to be flattened. The initial mask for NCSA and solvent flattening was created from the atomic structure of T = 3 FHV (PDB entry 4fsj), a disparate structure, with a larger mask radius of 11–13 Å around each atom (Fig. 1). The mask ranges with those protein regions overlapping with the neighbouring particle were truncated at half the distances of the nearest atoms. As the starting model, a spherical-shell model with uniform density was used; the outer and inner radii of the model were 159 and 119 Å, respectively. These radius values were typical initial parameters for the T = 3 virus particles suggested in previous work (Taka et al., 2005). A comparison of the data with the scattering amplitudes of the spherical-shell model is shown in Supplementary Fig. S1. The model reproduces similar nodes and peaks to the data of lower resolution (>60 Å).
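The scattering amplitude of a uniform spherical shell has a closed form, namely the difference between the transforms of two uniform spheres. The Python sketch below evaluates it for the radii quoted above; the function names are hypothetical, the density is set to unity and no scaling to the observed data is attempted.

```python
import numpy as np

def sphere_amplitude(s, radius):
    """Transform of a uniform sphere of unit density; s = 1/d in 1/Å."""
    x = 2.0 * np.pi * np.asarray(s, dtype=float) * radius
    volume = 4.0 / 3.0 * np.pi * radius**3
    with np.errstate(divide="ignore", invalid="ignore"):
        shape = 3.0 * (np.sin(x) - x * np.cos(x)) / x**3
    # At s = 0 the shape factor tends to 1, so the amplitude equals the volume.
    return volume * np.where(x == 0.0, 1.0, shape)

def shell_amplitude(s, r_outer=159.0, r_inner=119.0):
    """Uniform shell = outer sphere minus inner sphere (radii in Å)."""
    return sphere_amplitude(s, r_outer) - sphere_amplitude(s, r_inner)

# Amplitudes at a few illustrative d-spacings (Å) in the low-resolution range.
d_spacings = np.array([266.0, 100.0, 60.0, 25.0])
amplitudes = shell_amplitude(1.0 / d_spacings)
```

The nodes of this function are where the sign of the shell transform flips, which is why a comparison with the observed low-resolution amplitudes is a useful check on the chosen radii.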
Using data with the `initial' parameters of the above-described initial mask, the initial spherical model and the initial NCSA matrices, ab initio phasing with NCSA was subsequently applied. A basic NCSA cycle involved iterative calculations between real space and reciprocal space, which were linked with the Fourier transform (FT) and inverse FT (FT−1). In this cycle, NCSA and solvent flattening were applied to the density map in real space. A phase combination of Φcalc with w × Fobs by Rayment weighting (Rayment, 1983) was applied in reciprocal space, in which w corresponds to
$$w = \exp\left(-\frac{\bigl||F_{\rm obs}| - |F_{\rm calc}|\bigr|}{|F_{\rm obs}|}\right), \qquad (1)$$
where Fobs is the observed structure factor and Fcalc is the structure factor calculated from the processed density map. A schematic diagram to explain the basic cycle is presented in Fig. 2. In the cycling calculation, we used AVE from RAVE (Kleywegt & Jones, 1999) for NCSA and solvent flattening, FFT and SFALL (CCP4; Winn et al., 2011) for FT and FT−1, and SFTOOLS (coded by B. Hazes) to calculate the weighting factor. RSTATS (coded by S. E. V. Phillips) was used to scale Fcalc to Fobs and to calculate R factors and correlation coefficients between Fobs and Fcalc.
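The basic cycle can be sketched in a few lines of Python with NumPy. This is a toy illustration and not the AVE/SFALL/FFT/SFTOOLS pipeline used in the paper: the map is treated as a P1 grid, scaling of Fcalc to Fobs is omitted, the averaging step is a user-supplied function, the solvent is simply set to zero, and the exponential weight is the commonly quoted form of Rayment weighting, taken here as an assumption.

```python
import numpy as np

def rayment_weight(f_obs, f_calc):
    """w = exp(-| |Fobs| - |Fcalc| | / |Fobs|); assumed form of the weight."""
    f_obs = np.asarray(f_obs, dtype=float)
    f_calc = np.asarray(f_calc, dtype=float)
    w = np.zeros_like(f_obs)
    ok = f_obs > 0
    w[ok] = np.exp(-np.abs(f_obs[ok] - f_calc[ok]) / f_obs[ok])
    return w

def averaging_cycle(rho, f_obs, mask, average):
    """One cycle: real-space constraints, transform, phase combination, new map."""
    rho = average(rho)                    # NCS or cross-crystal averaging (user supplied)
    rho = np.where(mask, rho, 0.0)        # solvent flattening: zero outside the mask
    f_calc = np.fft.fftn(rho)             # structure factors of the constrained map
    w = rayment_weight(f_obs, np.abs(f_calc))
    combined = w * f_obs * np.exp(1j * np.angle(f_calc))  # w x Fobs with calculated phases
    return np.fft.ifftn(combined).real    # map for the next cycle

# Toy check: a map that already satisfies the constraints is left unchanged.
rho0 = np.zeros((8, 8, 8))
rho0[2:5, 2:5, 2:5] = 1.0
mask0 = rho0 > 0
f_obs0 = np.abs(np.fft.fftn(rho0))
rho1 = averaging_cycle(rho0, f_obs0, mask0, lambda m: m)
```

Phase extension would wrap this function in an outer loop that gradually admits higher-resolution reflections into `f_obs`.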
After more than 100 cycles of iteration at 25 Å resolution, phase extension was performed from 25 Å to the highest resolution (initially 3.7 Å to verify phasing and extended to 3.1 Å while testing the map quality) with 50 iterations in one reciprocal-lattice step (∼1/a; a is the unit-cell dimension). This process is referred to as `the procedure' hereafter. The numbers of both iterations (>100 cycles) at low resolution and phase-extension steps (>50 cycles) sufficed for convergence (Fig. 3). We performed 111 steps in phase extension from 25 to 3.7 Å (Fig. 4). An interpretable electron-density map was successfully obtained with the procedure using the initial parameters (Fig. 5).
The electron density of helical structural elements such as α-helices in GNNV-LP at 3.7 Å resolution (or better) was used to verify the correct hand (enantiomorph) of the phase sets (Fig. 5). To improve the electron density, we updated the mask based on the resulting map after every procedure. Each procedure was initiated from the spherical-shell uniform density model with the updated mask and the NCS operators. These NCS operators were refined as the icosahedral orientations by selecting the highest correlation coefficient among many steps with varied κ angles (for example from 39.0 to 41.0° with a 0.1° step in the `initial procedure'). After around ten iterations of the procedure, the correlation coefficient converged to give the highest value; we refer to this last procedure as the `final procedure'. The final icosahedral orientation was determined as (ω, φ, κ) = (90, 90, 40.025°), in which steps of 0.002° were used to determine the final κ value. The best values of the R factor and the correlation coefficient appeared to be 0.20 and 0.92, respectively, at ∼6 Å resolution. The overall values of these calculations are given in Table 1; the progress of phase extension is shown in Fig. 4. All graphics showing molecular structures were produced with PyMOL (https://www.pymol.org/).
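The two monitoring statistics quoted here can be computed as in the Python sketch below. It is an illustration only: the single linear scale factor is an assumption, and RSTATS may scale Fcalc to Fobs differently (for example in resolution shells).

```python
import numpy as np

def r_factor(f_obs, f_calc):
    """R = sum|Fobs - k Fcalc| / sum Fobs, with one linear scale k (assumed)."""
    f_obs = np.asarray(f_obs, dtype=float)
    f_calc = np.asarray(f_calc, dtype=float)
    k = f_obs.sum() / f_calc.sum()
    return np.abs(f_obs - k * f_calc).sum() / f_obs.sum()

def correlation_coefficient(f_obs, f_calc):
    """Pearson correlation coefficient between the two amplitude sets."""
    return float(np.corrcoef(f_obs, f_calc)[0, 1])
```

Scanning κ then amounts to repeating the procedure for each trial angle and keeping the orientation with the highest correlation coefficient.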
Table 1 footnotes: ‡The resolution limit for the `final' phase extension was extended to the outermost reflections while testing the map quality. §$R_{\rm work} = \sum_{hkl}\bigl||F_{\rm obs}| - |F_{\rm calc}|\bigr| / \sum_{hkl}|F_{\rm obs}|$, where Fobs and Fcalc are the observed and calculated structure-factor amplitudes, respectively, of reflection hkl. ¶Rfree is as Rwork but was calculated with 5% of randomly chosen reflections omitted from refinement.
2.3. Phase improvement and model building
In the last cycle of phase improvement, DM (Cowtan, 1994) was used for NCSA with refinement of the NCS operators. The quality of the final electron-density map was sufficient to build an atomic model apart from the P-domain region of T = 3 GNNV-LP. We initially suspected that the P-domain of T = 3 GNNV-LP failed to conform to the strict icosahedral symmetry, but utilizing DM with various trials, including the individual mask and the NCS operator around the P-domain, did not significantly improve the density map of the P-domain. From the density map of T = 3 GNNV-LP, the linker region, which links the surface and protrusion domains, was identified. From this information, the corresponding sequence and the spatial arrangement of the individual protrusion domain region were readily obtained for further analysis, which is described in the following section. Model building was performed manually with Coot (Emsley et al., 2010).
3. Methods and results for phasing of the protrusion domain by cross-crystal averaging
3.1. Crystallization
From the information of the corresponding sequence of the P-domain of T = 3 GNNV and a sequence alignment of various betanodaviruses, we constructed the truncated P-domain and performed crystallization experiments. The crystallization procedure of the individual P-domain was conducted using approaches similar to those used for GNNV-LP. We obtained several crystallization conditions that produced crystals in various space groups, including C2, P212121 and P3, using 0.8 M lithium chloride, 0.1 M Tris–HCl pH 8.5, 32%(w/v) PEG 4000 (C2), 0.2 M KCl, 0.1 M magnesium acetate, 0.05 M sodium cacodylate pH 6.5, 10%(w/v) PEG 8000 (P212121) and 0.2 M NaCl, 0.1 M Tris–HCl pH 8.0, 20%(w/v) PEG 4000 (P3). All crystals appeared within one week. Single crystals (∼0.1 × 0.1 × 0.1 mm) were transferred to the respective crystallization solution containing glycerol (20%) as a cryoprotectant and were flash-cooled with liquid N2 for data collection.
3.2. Data collection and processing
X-ray diffraction data were collected from the P-domain crystals with an oscillation angle of 1.0° per frame and an exposure duration of 20 s at a wavelength of 1.0 Å on beamline BL13C1 at NSRRC equipped with a CCD detector (Q315r, ADSC). All data sets were processed with HKL-2000 (Otwinowski & Minor, 1997). The statistics of data collection for four different crystal forms are presented in Table 1. The crystal forms are named P212121, C2D, C2S and P3, respectively, in terms of the corresponding space group and the number of NCS copies: S (single) or D (double). For the monoclinic space group C2, we obtained two crystal forms with large and small unit cells (Table 1). From analyses of the Matthews coefficient (Matthews, 1968), C2D was expected to have two molecules in the asymmetric unit, whereas C2S contained one.
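The Matthews analysis used to decide between one and two molecules per asymmetric unit can be sketched as follows. The numerical inputs in the example are hypothetical placeholders, not the P-domain cell parameters (which are listed in Table 1 of the original paper), and the solvent-content estimate uses the standard constant 1.23.

```python
import math

def monoclinic_cell_volume(a, b, c, beta_deg):
    """Unit-cell volume (Å^3) of a monoclinic cell: V = a b c sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))

def matthews_coefficient(cell_volume, n_asu, n_mol_per_asu, mol_weight_da):
    """V_M (Å^3/Da) = cell volume / (number of molecules in the cell x mass)."""
    return cell_volume / (n_asu * n_mol_per_asu * mol_weight_da)

def solvent_fraction(v_m):
    """Approximate solvent fraction from V_M (Matthews, 1968)."""
    return 1.0 - 1.23 / v_m

# Hypothetical example: a C2 cell (four asymmetric units) of 100 000 Å^3
# and a 10 kDa molecule, tested with one and two copies per asymmetric unit.
vm_single = matthews_coefficient(100000.0, 4, 1, 10000.0)
vm_double = matthews_coefficient(100000.0, 4, 2, 10000.0)
```

A copy number is usually considered plausible when the resulting V_M falls in the range commonly observed for protein crystals; here the doubled copy number would give an implausibly low value.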
3.3. Phase determination using cross-crystal averaging
An initial attempt to determine the structure of the P-domain using the MR method with the previous partially traced model of the P-domain from GNNV-LP was not straightforward and failed. We then performed the CCA method using the three data sets P212121, C2D and C2S. A flow chart showing the contribution of each crystal form of the P-domain to phasing calculations is shown in Fig. 6.
From the electron-density map of T = 3 GNNV-LP, the coordinates of the (regular interval) grid points of the map in the region of the P-domain with density greater than 0.4σ were selected. On these grid coordinates (not on density peaks), dummy C atoms were placed to form an envelope of the P-domain as a PDB-format file. This P-domain envelope served as the initial MR model for each crystal. Using Phaser (McCoy et al., 2007), we found the MR solutions with the highest scores. Fig. 7 shows the initial models with dummy atoms in each of the crystal forms.
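Building such a dummy-atom envelope can be sketched in Python as below. The function name, the residue name DUM, the chain label and the B factor are all illustrative assumptions; the map is assumed to have zero mean so that the threshold is a simple multiple of σ, and `orth` is an assumed 3 × 3 matrix converting fractional to Cartesian coordinates.

```python
import numpy as np

def envelope_dummy_atoms(density, region, orth, level=0.4):
    """Return PDB lines with a dummy C atom on every selected grid point.

    density : 3D array sampled on a regular grid spanning one unit cell
    region  : boolean array of the same shape marking the P-domain region
    orth    : 3x3 matrix taking fractional to Cartesian coordinates (Å)
    level   : contour threshold in units of the map sigma
    """
    sigma = density.std()
    idx = np.argwhere(region & (density > level * sigma))
    frac = idx / np.array(density.shape, dtype=float)
    xyz = frac @ np.asarray(orth, dtype=float).T
    lines = []
    for serial, (x, y, z) in enumerate(xyz, start=1):
        lines.append(
            f"HETATM{serial % 100000:5d}  C   DUM A{serial % 10000:4d}"
            + " " * 4
            + f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{20.0:6.2f}"
            + " " * 10
            + " C"
        )
    lines.append("END")
    return lines

# Toy map with a single grid point above the threshold, in a 10 Å cubic cell.
toy = np.zeros((4, 4, 4))
toy[1, 2, 3] = 5.0
pdb_lines = envelope_dummy_atoms(toy, np.ones((4, 4, 4), dtype=bool), 10.0 * np.eye(3))
```

Because the atoms sit on grid points rather than density peaks, the resulting model carries only the shape of the domain, which is the property the text relies on.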
From these MR solutions, the translocation matrices for CCA were derived. We used four translocation matrices: one identity matrix from the molecule of P212121 to itself, two matrices to the two molecules of C2D and one matrix to the one molecule of C2S. As in the NCSA case, a mask is necessary to define the region of the molecules; it was generated from the dummy atoms of the envelope. The phases, which were generated with the P-domain envelope, were improved by the CCA method. In the CCA calculation, we used in-house Python and shell scripts together with MAPROT (Stein et al., 1994) and programs from the CCP4 suite. The basic iteration process was the same as that for NCSA in §2, with averaging and solvent flattening in real space and Rayment weighting in reciprocal space. Solvent flattening was performed by filling zero values for densities in the solvent regions of each map.
In the CCA cycle, all reflections above 3.0 Å resolution were concurrently used from the first iteration, thus differing from the phase-extension method. From the MR result for the P3 data set, the P-domain envelope shows the trimeric NCS upon locating the crystallographic threefold symmetry. The trimeric NCS averaging was further included in the calculations. The number of cycles was 10 000, which sufficed for convergence (>200 cycles). The correlation-coefficient values were 0.617, 0.779 and 0.710 for P212121, C2D and C2S, respectively. After phase improvement by CCA, the C2S data were used to build the model, because its resolution was the highest (1.80 Å) at the time. ARP/wARP (Langer et al., 2008) built 98% of the total residues automatically in one trial. We subsequently obtained new data to higher resolution (1.2 Å) from one P212121 crystal, and the structure was subjected to a final refinement with this high-resolution data set. The refinement was performed with PHENIX (Adams et al., 2010) and REFMAC5 (Murshudov et al., 2011). Table 1 presents the statistics.
4. Discussion
4.1. NCSA ab initio phasing
The ab initio method with NCSA and phase extension has been successfully implemented to determine the structure of the T = 3 GNNV icosahedral VLP. The key to success was the similarity of the mask (or the envelope) to the target structure. Despite significant variations in the main chains of the CP between FHV and GNNV, the envelopes of these T = 3 capsids are somewhat similar to each other, especially in the spatial position of the P-domain at the quasi-threefold axis. Considering that the P-domain of FHV is much smaller than that of GNNV, the radius of the mask was typically increased to 13 Å to cover the entire structure of GNNV. Other trials using the large spherical-shell mask, which was cut by the bisector planes between neighbouring virus particles, were not successful in phasing. On inspection of this spherical-shell mask, we found that it was too small, covering only half of the P-domain, which had a negative effect on the phasing. Simulation trials using error-free amplitudes, Fsimulated, which were calculated from the FHV model positioned in the unit cell of the T = 3 GNNV-LP crystals, led to a map with a rough envelope that was uninterpretable in the first trial. In the second trial using a revised mask, which was made from the rough envelope, an interpretable map was obtained even with the mask cutting a small portion of the P-domain. This result is similar to that described in a previous report (Taka et al., 2005). In another simulation trial using the spherical mask with the addition of a hemispherical mask at the P-domain position that covered the entire P-domain region, the map was finally obtained as a Babinet inversion, that is, a solution in which the phases were incremented by π, resulting in negative electron density for the protein region (Tsao et al., 1992; Plevka et al., 2011).
Taken together, ab initio phasing is highly sensitive to the similarity of the mask to the target protein, but an ideal mask is not invariably required at the initial step.
4.2. Cross-crystal averaging phasing of the P-domain
In the CCA method, the first trial without the trimeric NCS did not yield a map from which ARP/wARP could build a model automatically. The second trial, including the trimeric NCS with NCSA and CCA, which was addressed in §3, was successful. By increasing the constraints of averaging through inclusion of the trimeric NCS, the phase sets were shifted further towards the correct solution, from which we could construct the complete model. As for the initial phase, the envelope cut out from the density of the T = 3 GNNV-LP served as the initial model or as the initial MR model to search for the translocation matrices. The envelope had only dummy C atoms placed at grid points of the density map with a constant level. The phasing succeeded before we had tried the cut-out density itself, which was expected to carry better phase information than the envelope. We can therefore state that only the envelope information of the P-domain was used in CCA phasing.
4.3. Quantification of the phasing strength of the molecular averaging
To quantify the strength of averaging in the sense of restriction of the phases, we introduce the free fraction (ff). ff is a simple indicator, which is easily calculated, while general to NCSA and CCA. When the copy number of NCS is 1, the ff value is identical to a fractional volume of protein molecules to be determined in the cell volume.
When the copy number of NCS is n, ff is expressed as
$$ff = \frac{1 - S}{n}, \qquad (2)$$
where S is the fraction of the solvent region. The solvent region is constrained to a constant value with solvent flattening. Furthermore, by the averaging restriction, only the electron density of 1/n of the protein region can be changed without constraint to reproduce the amplitude Fobs. Equivalently, the densities in the remaining 1 − 1/n of the protein volume are restricted to be the same as those in the free 1/n of the protein volume.
We can thus expand the same concept for the CCA case,
$$ff = \frac{1}{\sum_{k}\left[1 + (S_k/P_k)\right]}, \qquad (3)$$
in which Sk is the fraction of the solvent region of the kth crystal, Pk is the protein fraction of the kth crystal and the equation Sk + Pk = 1 holds. A detailed derivation of (3) is shown in Appendix A. In the case where Sk + Pk ≠ 1, for example where the solvent mask and the NCS mask are different, Pk can be substituted by 1 − Sk.
If we generalize the equation to m crystals, in which the kth crystal has nk NCS copies, we can write
$$ff = \frac{1}{\sum_{k=1}^{m}\left[n_k + n_k(S_k/P_k)\right]}. \qquad (4)$$
This ff value is another expression for the ratio of the number of data to the number of parameters (Appendix B). The total data (amplitudes) with their phases can reproduce the densities over the whole volume of the crystal at corresponding grid spacing. When ff is equal to 0.5, which means that half of the densities are fixed and the other half can be varied as free parameters to reproduce the amplitudes, it is the same condition as when the missing phases amount to half of the total number of parameters for the construction of a Fourier synthesis map. This value is the largest value (0.5) to determine a unique solution of the electron-density map. In the case of ff > 0.5, the system is deemed to be `underdetermined' or `overfitted' and has no unique solution. Relative to these values of ff, the `constraint ratio' has been discussed in Millane & Arnal (2015).
When ab initio phasing for the T = 3 GNNV-LP was successful, an FHV mask with a mask radius of 13 Å had a 0.71 fraction of the protein region; the copy number of NCS (nNCS) for averaging was 30. The ff value was hence 0.024 (= 0.71/30). We performed trial calculations varying the ff values by decreasing the copy number of NCS with the initial FHV mask and with the initial NCS matrix obtained from the results of self-rotation functions. The results are summarized in Table 2.
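The arithmetic above, and its multi-crystal generalization, can be checked with a small Python helper. The function name is hypothetical, and the multi-crystal expression is a reading of the substitution described in Appendix A (1 + Sk/Pk replaced by nk + nk Sk/Pk, with Sk + Pk = 1), so that each crystal contributes nk/Pk to the denominator.

```python
def free_fraction(crystals):
    """Free fraction ff for a set of crystals.

    crystals : iterable of (n_k, P_k) pairs, where n_k is the NCS copy number
               and P_k the protein (mask) fraction of the k-th crystal.
    With S_k + P_k = 1, n_k + n_k * S_k / P_k equals n_k / P_k.
    """
    return 1.0 / sum(n_k / p_k for n_k, p_k in crystals)

# GNNV-LP case quoted in the text: protein fraction 0.71 and 30 NCS copies.
ff_gnnv = free_fraction([(30, 0.71)])
```

For a single crystal the helper reduces to (1 − S)/n, and adding further crystal forms or NCS copies can only lower ff.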
The quality of the map deteriorated with a smaller copy number of NCS, as expected. Notably, the R factor and CC values have a tendency to deteriorate with an increased number of NCS copies in averaging, or a decreased ff value, because of the decreased number of free parameters. Because better R factor and CC values do not invariably indicate more correct phases, this behaviour must be taken into account in parameter searching when monitoring criteria such as the R factor and CC. The phase differences from the refined model are presented in Table 2. When the map suffers from Babinet inversion, its phase differences deviate greatly, which is caused by the mixing of another phase set. Eliminating the case affected by Babinet inversion, the phase differences gradually improve with smaller ff values.
We also performed trial calculations varying the ff values by altering the mask volume. To increase or to decrease the mask volume, we extended or decreased the inner radius of the mask of FHV while the outer particle side of the mask was fixed. The mask with the inner radius extended to either 110 or 90 Å resulted in a non-interpretable density map. The trend in the variation of the ff value is difficult to obtain from simply varying the sizes of the mask volume. Not only a good coverage of the target molecule but also a suitable shape of the mask or envelope is critical for the success of ab initio phasing.
The phasing results of the truncated P-domain with various ff values are summarized in Table 3. The calculations in the upper rows are a successful case in terms of automatic model building with ARP/wARP, compared with those in the lower rows, which failed. The trimeric symmetries were included in the calculation of the successful upper case, but were excluded in the calculation of the failed lower case; the same mask was used in both calculations. The R factor and CC values seem to be random, and might be affected by the structural similarity of each molecule and the validities of their transfer matrices, although these values deviate less in the upper case. In both cases, phase extension was not utilized because of a lack of low-resolution (>30 Å) data. The phase difference (57.4°) from the refined model was smaller in the successful case, whereas the value of the phase difference (81.8°) for the failed case was large and did not decrease from the initial value. ff seems to act as an effective indicator in these phasing calculations.
The other factors in molecular averaging, such as the mask, transfer matrices and molecular equalities, as well as the data quality, must also be considered and evaluated. In the NCSA case, the mask roughly resembled the target envelope in shape and completely covered it. The transfer matrices and molecular isomorphism were expected to be satisfactory in the surface domains because of the nature of the icosahedral symmetry. The data quality, including the completeness of the very low resolution data, was essentially satisfactory. An ff value of ∼0.1 seems to be appropriate to separate successful (<0.1) and unsuccessful (>0.1) calculations in terms of map qualities for model building with NCSA (Table 2). In the CCA case, the mask from its own electron density was thought to be almost ideal. The CCA transfer matrices and the CCA isomorphism of molecules among the crystals were expected to be less satisfactory than those of the viral NCS. The data quality was satisfactory, but low-resolution (>30 Å) data were not measured. Coincidentally, an ff value of ∼0.1 was also found as a midpoint threshold to differentiate successful and failed calculations with CCA (0.038 and 0.113, respectively, in this work). From these analyses, we can estimate the ff values of both practical cases of NCSA and CCA, and suggest a new criterion of an ff value of 0.1 as a new rule of thumb to phase new structures of viruses with a single CP or proteins with highly redundant NCS or with several crystal forms.
APPENDIX A
Derivation of the equations for ff
Given the volume p of a protein molecule in an NCS unit and the solvent-region volume s, the density of the protein volume can be changed while the solvent volume is constrained to a fixed value. The ratio of the freely changing volume to the entire volume is written as
$$ff = \frac{p}{p + s}. \qquad (5)$$
If the protein region is composed of n NCS molecules, only the density of one molecule can be free; the density for the other n − 1 molecules is constrained to the density of the first molecule. Therefore,
$$ff = \frac{p}{np + s}. \qquad (6)$$
We rewrite this equation using the fraction of the solvent region S = s/(np + s),
$$ff = \frac{1 - S}{n}.$$
We expand the CCA case in the same way. Supposing that there are m crystals, which all have one molecule in NCS, and that sk and pk denote the volumes of the solvent and protein for the kth crystal, respectively, ff is written as
$$ff = \frac{1}{\sum_{k=1}^{m}\left[1 + (s_k/p_k)\right]}. \qquad (7)$$
Furthermore,
$$S_k = \frac{s_k}{p_k + s_k} \qquad (8)$$
and
$$P_k = \frac{p_k}{p_k + s_k}, \qquad (9)$$
in which Sk and Pk denote the fractions of the solvent and the protein regions, respectively, in the kth crystal.
Substituting (8) and (9) into (7), (3) is derived as
$$ff = \frac{1}{\sum_{k=1}^{m}\left[1 + (S_k/P_k)\right]}.$$
If we generalize the equation to m crystals, where the kth crystal has nk NCS copies, we substitute nk + nk(Sk/Pk) for 1 + (Sk/Pk). We rewrite the equation as
$$ff = \frac{1}{\sum_{k=1}^{m}\left[n_k + n_k(S_k/P_k)\right]}.$$
APPENDIX B
Comparison of the formulation of ff to phasing power
In early work by Arnold & Rossmann (1986) to quantify the phasing power with NCS, the square of n was used to replace the factor n as
(from equation 17 in Arnold & Rossmann, 1986), where N is the redundancy of NCS and U/V is equal to 1/(1 − S). f is the proportion of data near 1.0 and is the error. Both quantities vary little, so that
whereas our phasing power P′ and ff have the relation
The solvent region and the n − 1 constrained copies of the protein region are counted equally as regions of constrained density. Our formulation follows from considering the ratio of data to parameters. In contrast, the formulation by Arnold and Rossmann follows from an analogy to the effect of multiple measurements on precision.
Acknowledgements
We are indebted to the staff of beamlines BL13B1, BL13C1 and BL15A1 at the National Synchrotron Radiation Research Center (NSRRC) in Taiwan and the staff of the Taiwan-contracted beamline BL12B2 and Eiki Yamashita and Akifumi Higashiura at BL44XU at SPring-8 in Japan for technical assistance under proposal Nos. 2012A4009, 2012A6760, 2012A6600, 2012B4002, 2012B4012, 2012B6600, 2013A4011, 2013A6600, 2013B4000, 2013B6600, 2014A4000, 2014A6600, 2014A6965 and 2014A4004. We thank Tomitake Tsukihara for discussions. We also thank Masaki Yamamoto of RIKEN and Takashi Kumasaka of JASRI and their staff for useful discussions. We are grateful to Sunney I. Chan for manuscript editing. Portions of this research were carried out at the NSRRC-NCKU Protein Crystallography Laboratory of the University Center for Bioscience and Biotechnology of National Cheng Kung University (NCKU). This work was supported in part by National Science Council (NSC) grants 101-2628-B-213-001-MY4, 102-2627-M-213-001-MY3 and NSRRC grants to C-JC.
References
Adams, P. D. et al. (2010). Acta Cryst. D66, 213–221.
Arnold, E., Erickson, J. W., Fout, G. S., Frankenberger, E. A., Hecht, H.-J., Luo, M., Rossmann, M. G. & Rueckert, E. R. (1984). J. Mol. Biol. 177, 417–430.
Arnold, E. & Rossmann, M. G. (1986). Proc. Natl Acad. Sci. USA, 83, 5489–5493.
Arnold, E., Vriend, G., Luo, M., Griffith, J. P., Kamer, G., Erickson, J. W., Johnson, J. E. & Rossmann, M. G. (1987). Acta Cryst. A43, 346–361.
Chen, B., Vogan, E. M., Gong, H., Skehel, J. J., Wiley, D. C. & Harrison, S. C. (2005). Structure, 13, 197–211.
Chen, N.-C., Yoshimura, M., Guan, H.-H., Wang, T.-Y., Misumi, Y., Lin, C.-C., Chuankhayan, P., Nakagawa, A., Chan, S. I., Tsukihara, T., Chen, T.-Y. & Chen, C.-J. (2015). PLoS Pathog. 11, e1005203.
Cowtan, K. (1994). Jnt CCP4/ESF–EACBM Newsl. Protein Crystallogr. 31, 34–38.
Crennell, S., Takimoto, T., Portner, A. & Taylor, G. (2000). Nature Struct. Biol. 7, 1068–1074.
Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. (2010). Acta Cryst. D66, 486–501.
Isupov, M. N., Brindley, A. A., Hollingsworth, E. J., Murshudov, G. N., Vagin, A. A. & Littlechild, J. A. (2004). Acta Cryst. D60, 1879–1882.
Kleywegt, G. J. & Jones, T. A. (1999). Acta Cryst. D55, 941–944.
Langer, G., Cohen, S. X., Lamzin, V. S. & Perrakis, A. (2008). Nature Protoc. 3, 1171–1179.
Li, W. & Li, F. (2011). Structure, 19, 155–161.
Matthews, B. W. (1968). J. Mol. Biol. 33, 491–497.
McCoy, A. J., Grosse-Kunstleve, R. W., Adams, P. D., Winn, M. D., Storoni, L. C. & Read, R. J. (2007). J. Appl. Cryst. 40, 658–674.
Millane, R. P. & Arnal, R. D. (2015). Acta Cryst. A71, 592–598.
Murshudov, G. N., Skubák, P., Lebedev, A. A., Pannu, N. S., Steiner, R. A., Nicholls, R. A., Winn, M. D., Long, F. & Vagin, A. A. (2011). Acta Cryst. D67, 355–367.
Nishizawa, T., Furuhashi, M., Nagai, T., Nakai, T. & Muroga, K. (1997). Appl. Environ. Microb. 63, 1633–1636.
Otwinowski, Z. & Minor, W. (1997). Methods Enzymol. 276, 307–326.
Plevka, P., Kaufmann, B. & Rossmann, M. G. (2011). Acta Cryst. D67, 568–577.
Rayment, I. (1983). Acta Cryst. A39, 102–116.
Rossmann, M. G., Arnold, E., Erickson, J. W., Frankenberger, E. A., Griffith, J. P., Hecht, H.-J., Johnson, J. E., Kamer, G., Luo, M., Mosser, A. G., Rueckert, R. R., Sherry, B. & Vriend, G. (1985). Nature (London), 317, 145–153.
Stein, P. E., Boodhoo, A., Armstrong, G. D., Cockle, S. A., Klein, M. H. & Read, R. J. (1994). Structure, 2, 45–57.
Su, J., Li, Y., Shaw, N., Zhou, W., Zhang, M., Xu, H., Wang, B.-C. & Liu, Z.-J. (2010). Protein Cell, 1, 453–458.
Taka, J., Naitow, H., Yoshimura, M., Miyazaki, N., Nakagawa, A. & Tsukihara, T. (2005). Acta Cryst. D61, 1099–1106.
Tang, L., Johnson, K. N., Ball, L. A., Lin, T., Yeager, M. & Johnson, J. E. (2001). Nature Struct. Biol. 8, 77–83.
Taylor, G. L. (2010). Acta Cryst. D66, 325–338.
Tsao, J., Chapman, M. S. & Rossmann, M. G. (1992). Acta Cryst. A48, 293–301.
Vagin, A. & Teplyakov, A. (2010). Acta Cryst. D66, 22–25.
Wery, J.-P., Reddy, V. S., Hosur, M. V. & Johnson, J. E. (1994). J. Mol. Biol. 235, 565–586.
Winn, M. D. et al. (2011). Acta Cryst. D67, 235–242.
Zlotnick, A., Natarajan, P., Munshi, S. & Johnson, J. E. (1997). Acta Cryst. D53, 738–746.
This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.