research papers

STRUCTURAL BIOLOGY
ISSN: 2059-7983

Solution of the structure of a calmodulin–peptide complex in a novel configuration from a variably twinned data set

Janelia Research Campus, 19700 Helix Drive, Ashburn, VA 20147, USA
*Correspondence e-mail: jacobpkeller@gmail.com

Edited by K. Diederichs, University of Konstanz, Germany (Received 25 February 2016; accepted 2 December 2016)

Structure determination of conformationally variable proteins can prove challenging even when many possible molecular-replacement (MR) search models of high sequence similarity are available. Calmodulin (CaM) is perhaps the best-studied archetype of these flexible proteins: while there are currently ∼450 structures of significant sequence similarity available in the Protein Data Bank (PDB), novel conformations of CaM and complexes thereof continue to be reported. Here, the details of the solution of a novel peptide–CaM complex structure by MR are presented, in which only one MR solution of marginal quality was found despite the use of 120 different search models, a difficulty compounded by the presence of a high degree of hemihedral twinning (overall refined twin fraction = 0.43). Ambiguities in the initial MR electron-density maps were overcome by using MR-SAD: phases from the MR partial model were used to identify weak anomalous scatterers (calcium, sulfur and chloride), which were in turn used to improve the phases, automatically rebuild the structure and resolve sequence ambiguities. Retrospective analysis of consecutive wedges of the original data sets showed twin fractions ranging from 0.32 to 0.55, suggesting that the data sets were variably twinned. Despite these idiosyncrasies and obstacles, the data and the final model were of high quality, and the model revealed a novel, nearly right-angled conformation of the bound peptide.

1. Introduction

Calmodulin is a ubiquitously expressed, highly conserved calcium-binding protein of great physiological importance (reviewed in Marshall et al., 2015; Tidow & Nissen, 2013). Its universal role is to bind, generally in the presence of calcium, to short peptide motifs within proteins and thereby regulate their function, although there has been an increasing number of reports of variations on this process. Hundreds of targets have been shown to bind calmodulin physiologically, with roles ranging from muscle contraction to ion-channel modulation to enzyme activation (Marshall et al., 2015), and many more have been predicted by bioinformatics. Owing to its importance and perhaps also to its amenability to production and purification (>100 mg l−1 protein can be produced in Escherichia coli), calmodulin has been the subject of many structural studies. Accordingly, there are hundreds of structures in the Protein Data Bank (PDB) which together demonstrate the highly varied conformational states and peptide-binding promiscuity of calmodulin.

Crystallographic merohedral twinning (reviewed in Helliwell, 2008) is a phenomenon that has in the past stymied structure solution but is becoming increasingly tractable as software packages incorporate treatments of it. It occurs when the diffraction spots from one crystal lattice are superimposed exactly on those of another, equivalent but independently diffracting lattice related to it by one or more so-called `twin operators'. In merohedral twinning per se, these twin operators correspond to symmetry operators of a higher symmetry Laue group that are absent from the true one. Hence, the measured intensities may appear to arise from a higher symmetry space group, whereas in reality they represent a weighted average of two or more overlapping diffraction patterns from congruent lower symmetry lattices. The weights of this average are denoted `twin fractions', and when the fractions are equal the crystal is said to be perfectly twinned. When the number of twin orientations is two, the crystal is said to be hemihedrally twinned, and when this number is four the term tetartohedral twinning is applied. In the following, a structure solution involving both of these twinning variants will be described.
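For a hemihedral twin with twin fraction α, each measured intensity J is a weighted sum of the true intensities I of a twin-related pair: J1 = (1 − α)I1 + αI2 and J2 = αI1 + (1 − α)I2. The minimal Python sketch below applies this model and inverts it algebraically. The function names are illustrative and are not taken from any program used in this work. The inversion divides by 1 − 2α, so it becomes ill-conditioned as α approaches 0.5. This is one reason why highly twinned data, such as those with the overall refined twin fraction of 0.43 reported in the abstract, are usually handled by twin refinement rather than by detwinning.

```python
from __future__ import annotations


def twin_mix(i1: float, i2: float, alpha: float) -> tuple[float, float]:
    """Observed intensities (J1, J2) of a twin-related pair for twin fraction alpha."""
    j1 = (1.0 - alpha) * i1 + alpha * i2
    j2 = alpha * i1 + (1.0 - alpha) * i2
    return j1, j2


def detwin(j1: float, j2: float, alpha: float) -> tuple[float, float]:
    """Algebraic inverse of twin_mix; impossible for a perfect twin (alpha = 0.5)."""
    denom = 1.0 - 2.0 * alpha
    if abs(denom) < 1e-6:
        raise ValueError("perfect twin: the two intensities cannot be separated")
    i1 = ((1.0 - alpha) * j1 - alpha * j2) / denom
    i2 = ((1.0 - alpha) * j2 - alpha * j1) / denom
    return i1, i2


# A strong/weak pair (hypothetical values) measured from a crystal with alpha = 0.43:
j1, j2 = twin_mix(120.0, 30.0, alpha=0.43)
print(j1, j2)                # about 81.3 and 68.7: the 4:1 contrast is almost lost
print(detwin(j1, j2, 0.43))  # about (120.0, 30.0), but any error in J is divided by 0.14
```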

2. Laboratory methods

2.1. Overexpression, peptide synthesis and purification

An open reading frame encoding human calmodulin with a single point mutation, D3Y, introduced to facilitate spectroscopic quantification, was cloned into the plasmid pET-24 and expressed in E. coli strain BL21(DE3). A 3 l culture was grown to log phase, induced with 0.5 mM IPTG and grown for a further 6 h at 37°C, at which point the cells were centrifuged for 10 min at 7000g, decanted and frozen at −80°C. The cell pellets were resuspended in Tris-buffered saline (TBS; 20 mM Tris–HCl, 138 mM NaCl, pH 7.4) plus 1 mM Na2EDTA, sonicated and centrifuged at 40 000g for 30 min to remove insoluble matter. The supernatant was then supplemented with CaCl2 to 2 mM and loaded onto a 50 ml bed volume butyl-Sepharose hydrophobic interaction chromatography column (Pharmacia), washed with 500 ml TBS plus 2 mM CaCl2 and finally eluted by substituting 1 mM EDTA for CaCl2. At this point the protein was already >95% pure as judged by SDS–PAGE. Nevertheless, the eluate was concentrated and loaded onto a Sephacryl 26/100 S-200 size-exclusion chromatography (SEC) column (GE Healthcare Life Sciences) for further purification and characterization. Calmodulin eluted as a single peak, and fractions from this peak were collected, concentrated to 60 mg ml−1 and frozen at −80°C. The overall yield of purified calmodulin was about 250 mg from 3 l of bacterial culture.

To form a CaM–peptide complex, a synthetic, C-terminally amidated peptide (LifeTein) of sequence KRNKALKKIRKLQKRGLIQMT (an intracellular loop from the murine Cl−/HCO3− exchanger SLC26A3) was added in molar excess to calmodulin in the presence of 2 mM CaCl2. The mixture was run on a Superdex S200 10/300 SEC column (GE Healthcare Life Sciences) with TBS plus 0.5 mM CaCl2 as the running buffer. The presence of a peptide–CaM complex was confirmed prior to crystallization both by a shift of the SEC peak relative to that of CaM alone and by SDS–PAGE of the same SEC peak fractions, which showed bands corresponding to both CaM and peptide. Since the peptide was in excess, no CaM-only peak was observed, although a peptide-only peak was observed in later fractions. Complex-containing fractions were concentrated to ∼10 mg ml−1 and stored at 4°C without buffer exchange.

2.2. Crystallization and data collection

Since a complex containing a closely related peptide had recently been solved locally (unpublished results), the same conditions were used to crystallize this complex: 2.4–3.0 M ammonium sulfate, 100–300 mM lithium sulfate, 100 mM Tris–HEPES pH 7–8; hanging-drop configuration with well solution:protein solution ratios of 0.5–3 µl:1 µl; temperature 293 K. Crystals readily nucleated spontaneously (in contrast to the aforementioned related complex, which almost always required seeding). They were, enigmatically, dissimilar to the previous crystals both in morphology and in their lattices. While many drops formed entities of variable crystallinity, apparently single crystals appeared in some of the drops after 1 d and grew for several days as hexagonal rods reaching a maximum diameter of ∼400 µm and a maximum length of ∼1 mm. The crystals had pointed ends, resembling pencils sharpened at both ends (Fig. 1a). Many of these mostly single crystals displayed growth defects and morphological twinning, but those harvested for X-ray diffraction appeared to be completely single. Crystals were either cooled without further cryoprotection or were cryoprotected in mother liquor supplemented to 3.2 M ammonium sulfate. Initial data sets collected from these crystals at a synchrotron displayed twinning, were of relatively poor quality and could not readily be solved by molecular replacement or by anomalous scattering from calcium and sulfur. Accordingly, they were not pursued further.

Figure 1
Crystals used in the current structure determination. (a) The morphology of most crystals grown under these conditions was rod-like, equilaterally hexagonal in section and pointed at the ends. Crystal size and singularity varied greatly. (b) A crystal similar to that used to determine the structure, with axial growth limited by the coverslip and the drop surface. The crystal actually used was more equilateral, more single-appearing and significantly larger. (c) Schematic representation of the size and shape of the actual crystal used in structure determination.

After about one year, one particularly large and single-appearing crystal without any visible optical irregularities was found growing perpendicular to the coverslip, thus forming an equilateral hexagonal lozenge ∼500 µm in diameter and ∼300 µm thick (Fig. 1c). This crystal was used for the structure solution described herein. Unfortunately, no microscopic images of this particular crystal were taken, but it was qualitatively similar to the crystal in Fig. 1(b). It was mounted face-on on a micro-mesh (MiTeGen) directly from the drop, without further cryoprotection, into the cryostream of a rotating copper-anode X-ray set: a MicroMax-007 HF Cu generator equipped with VariMax HF optics, a Saturn 944 HG CCD detector and a kappa goniostat. One 112.5° data set and two 180° data sets were collected as 5 s, 0.1° oscillations in ω (essentially equivalent to φ). The second data set was collected 180° shifted in ω relative to the first, and the third was similar to the second but collected with a χ angle of 10°. Even though the data were collected at the minimum crystal-to-detector distance, the crystal diffracted beyond the resolution limit that this geometry allowed (Table 1).

Table 1
Data collection and processing

Values in parentheses are for the highest resolution shell.

Diffraction source Cu Kα
Wavelength (Å) 1.54186
Temperature (K) 100
Detector Saturn 944 HG CCD
Crystal-to-detector distance (mm) 145
No. of crystals 1
Rotation range per image (°) 0.1
Total rotation range (°) 112.5 + 180 + 180 [three sweeps]
Exposure time per image (s) 5
Space group P3₂12
a, b, c (Å) 98.87, 98.87, 128.58
α, β, γ (°) 90, 90, 120
Mosaicity (°) 0.6
Resolution range (Å) 22.81–1.69 (1.73–1.69)
Total No. of reflections 1645096 (8675)
No. of unique reflections 76360 (2805)
Completeness (%) 96.5 (67.6)
Multiplicity 21.5 (3.1)
CC1/2 0.998 (0.763)
〈I/σ(I)〉 22.1 (2.8)
Rmeas (%) 11.0 (39.4)
Rp.i.m. (%) 2.1 (20.2)
Overall B factor from Wilson plot (Å²) 20.2
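The merging statistics in Table 1 follow standard definitions. The sketch below illustrates them; it is not the AIMLESS implementation, which differs in details such as outlier rejection. It computes Rmerge, Rmeas and Rp.i.m. from a dictionary that maps each unique reflection to its repeated intensity measurements, and the toy data are hypothetical. The multiplicity-dependent factors √(n/(n − 1)) and √(1/(n − 1)) do two things:

- they make Rmeas nearly independent of multiplicity;
- they make Rp.i.m. reflect the precision of the merged intensities.

With the high overall multiplicity here (21.5), Rp.i.m. is therefore much lower than Rmeas.

```python
from math import sqrt


def merging_r_factors(observations: dict) -> tuple:
    """Return (Rmerge, Rmeas, Rpim) for a dict {(h, k, l): [I_1, ..., I_n]}.

    Reflections measured only once are skipped, since they carry no
    information about the spread of repeated measurements.
    """
    num_merge = num_meas = num_pim = denom = 0.0
    for intensities in observations.values():
        n = len(intensities)
        if n < 2:
            continue
        mean = sum(intensities) / n
        spread = sum(abs(i - mean) for i in intensities)
        num_merge += spread
        num_meas += sqrt(n / (n - 1)) * spread   # multiplicity-corrected
        num_pim += sqrt(1.0 / (n - 1)) * spread  # precision of the merged mean
        denom += sum(intensities)
    return num_merge / denom, num_meas / denom, num_pim / denom


toy = {(1, 0, 0): [10.0, 12.0], (0, 1, 0): [5.0, 5.0, 8.0]}
r_merge, r_meas, r_pim = merging_r_factors(toy)
print(f"Rmerge {r_merge:.3f}  Rmeas {r_meas:.3f}  Rp.i.m. {r_pim:.3f}")
# Rmerge 0.150  Rmeas 0.193  Rp.i.m. 0.121
```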

It is important for the discussion below to mention a few details about the exact diffraction setup (refer to Fig. 2 for the following).

Before data collection, the beam center was found to be displaced horizontally by ∼150 µm from the eucentric point of the goniometer. This was determined both with a phosphor screen and by incremental manual translation of a small protein crystal followed by comparison of diffraction intensities. The crystal therefore had to be mounted slightly off-center from the goniometer axis to maximize exposure during the 180° sweeps. This introduced a translational component throughout each sweep, with the crystal following an arc-shaped trajectory. Since the crystal was large, it was not fully immersed in the beam, although the inferred beam spot appeared to be fully occupied by the crystal throughout the sweeps. Different amounts and regions of the crystal were therefore exposed over the course of data collection, which underlies some of the observations about variable twinning described below. In the initial and final frames of each data set the beam was roughly perpendicular to the face of the crystal, whereas in the intermediate frames it was side-on.

Owing to this improvised data-collection strategy, the goniometer had to be rotated manually by 180° for the opposing 180° sweeps, and the crystal had to be readjusted by manual horizontal translation to maximize exposure. This manual adjustment introduced some uncertainty about the exact orientation and region of the crystal exposed to the beam relative to the first data set. It is also likely to have shifted, to some degree, the locations of the diffraction spots on the detector, as well as the precise path of the beam through the crystal. These details are necessary for understanding the intricacies of the processing and analyses described herein, but in the final analysis these peculiarities of data collection did not seem to compromise the quality of the resulting structure.

Figure 2
Data-collection scheme. (a) Schematic of crystal at the beginning and end of sweeps, viewed along the beam axis. Since the beam center was not centered on the eucentric point of the goniometer, different parts of the crystal were immersed in the beam at different points in data collection. The effect is exaggerated for clarity. (b) Proposed twin domain structure of the crystal. `+' and `−' refer to the relative orientations of the domains.

3. Structure solution

3.1. Initial attempts

Initial indexing indicated a trigonal space group, so the data were integrated in MOSFLM (Leslie & Powell, 2007) in point group 3. Subsequent analysis in POINTLESS (Evans, 2011) suggested a space group in point group 622, so the data were merged and scaled accordingly in AIMLESS (Evans, 2011). Structure solution was then attempted by both MR and SAD, but no convincing solutions were found. Initial twinning tests in POINTLESS had indicated a twin fraction of 17–18%, and merohedral twinning is impossible in point group 622, so the true space group was inferred to be of lower symmetry. Under similar circumstances, an assumption-free approach is often taken, in which the data are processed in P1 and the symmetry is reassessed later. In the current case, however, this would have led to an excessively large number of search models per asymmetric unit for molecular replacement, or of heavy atoms for SAD. Retrospective analysis of the solved structure indicated a P1 unit cell containing 24 calmodulin molecules (in effect 48 structural entities, since calmodulin consists of two flexibly linked `lobes') and 96 calcium ions. Furthermore, the refined unit-cell parameters strongly suggested a space group in point group 3. The data were therefore re-evaluated in P3, in which the most likely number of calmodulin lobes per asymmetric unit was 16 (based on solvent-content analyses). During merging, care was taken with respect to the four equivalent indexing possibilities, and one of the three data sets indeed merged better with the others when it was re-indexed in an alternative setting.
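When a trigonal lattice is merged in point group 3, each data set can be indexed in four equivalent ways, related by the operators (h, k, l), (−h, −k, l), (k, h, −l) and (−k, −h, −l). A minimal sketch of how a consistent choice can be found uses a reference data set. Each candidate operator is applied to the other data set, and the operator whose intensities correlate best with the reference is kept. This is conceptually similar to reference-based reindexing, but it is not the algorithm of any particular program. The function names and toy data are hypothetical, and a real implementation would also map reindexed reflections back to the asymmetric unit.

```python
import random

# The four alternative indexings of a trigonal lattice merged in point group 3.
REINDEX = {
    "h,k,l": lambda h, k, l: (h, k, l),
    "-h,-k,l": lambda h, k, l: (-h, -k, l),
    "k,h,-l": lambda h, k, l: (k, h, -l),
    "-k,-h,-l": lambda h, k, l: (-k, -h, -l),
}


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def best_reindexing(reference, other):
    """Reindex `other` with each operator and return the best-correlating one."""
    scores = {}
    for name, op in REINDEX.items():
        moved = {op(*hkl): i for hkl, i in other.items()}
        common = [hkl for hkl in reference if hkl in moved]
        scores[name] = pearson([reference[hkl] for hkl in common],
                               [moved[hkl] for hkl in common])
    return max(scores, key=scores.get), scores


# Toy example: a random "reference" and a copy indexed in the (k, h, -l) setting.
rng = random.Random(0)
hkls = [(h, k, l) for h in range(-3, 4) for k in range(-3, 4) for l in range(-3, 4)]
reference = {hkl: rng.expovariate(1.0) for hkl in hkls}
other = {REINDEX["k,h,-l"](*hkl): i for hkl, i in reference.items()}
choice, scores = best_reindexing(reference, other)
print(choice)  # k,h,-l (each of these operators is its own inverse)
```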

3.2. Successful molecular replacement in point group 3

Although there were a large number of candidate MR search models in the PDB, PDB entry 1exr (Wilson & Brunger, 2000) was chosen as a starting model simply because it had previously been used as a search model for a related structure (unpublished work) and had already been prepared for MR searches. Calmodulin consists of two separate lobes connected by an α-helical segment known to be highly flexible, so each search model contained only one lobe rather than full-length calmodulin. Separate searches for N-terminal and C-terminal lobes (`N-lobes' and `C-lobes') were carried out in Phaser (McCoy et al., 2007) in all alternative space groups of point group 3. Since the number of molecules per asymmetric unit was uncertain, several options were tried. Successful solutions were found only with the N-lobe, and only in space group P3₂. At first, reasonably probable numbers of N-lobes were searched for, but the searches continued to succeed for progressively more N-lobes, and a maximum of 13 were eventually placed. This was far more than any plausible number of full-length calmodulin molecules per asymmetric unit. It was therefore concluded that the search model was probably being placed promiscuously as both N- and C-lobes, since the two lobes are often similar in structure. It remained to decipher which placed models corresponded to which lobes in reality. This problem was complicated considerably by the large number of lobes, by model bias and by the sequence similarity between the lobes.
As a first approach to discriminating between lobes, an MR search for 16 lobes, either N- or C-terminal, was run. It placed only 12 lobes, 11 of which were N-lobes: the N-lobe of the search model was apparently more similar in structure to the C-lobe of the crystal than was the C-lobe of the search model. At this point it might have been possible to identify the lobes manually from differences in the electron-density maps at the few positions where their sequences diverge. Instead, a more automated approach using MR-SAD (Schuermann & Tanner, 2003) was taken.

To carry out MR-SAD, the SAD pipeline of Phaser (McCoy et al., 2007) can use phases from a partial model to improve the identification of weak anomalous scatterers and subsequently derive phase information from them. The MR solution described above provided just such a partial model. It was therefore used to find anomalous peaks corresponding either to (i) purely anomalous `atoms' or to (ii) Ca, Cl and S atoms. The process was successful in both cases, yielding either 30 purely anomalous `atoms' or a total of 39 Ca, Cl and S atoms. In both cases, subsequent autobuilding with Buccaneer (Cowtan, 2006) built over 90% of the structure, and this time the N- and C-lobes of all eight calmodulin molecules and the bound peptides were assigned correctly. Subsequent steps of automatic and manual rebuilding improved the structure substantially, but only up to a point, beyond which the twinned nature of the data prevented further improvement. REFMAC 5.8.0135 (Murshudov et al., 2011) was able to address twinning automatically. At this point it appeared that the data were tetartohedrally twinned in P3₂, with all four possible twin operators having approximately equal twin fractions. As expected, adding twin refinement to the REFMAC protocol dramatically improved the model statistics.

3.3. Enigmatic MR server results

The following parenthetical, retrospective observations on the structure solution may be of general interest. Concurrently with the structure determination using PDB entry 1exr as a search model, the MrBUMP online server (Keegan & Winn, 2007) was used to identify and process ∼60 different search models. These were processed as separate N-lobe and C-lobe search models (∼120 lobes in total), using various programs both to prepare the models and to perform MR. In only one of these ∼120 cases was a solution identified: the N-terminal lobe of PDB entry 1exr, searched with Phaser. This was, serendipitously, the model and MR program described above, which had been selected simply out of convenience. It is further notable that this structure is not one of the many 99–100% sequence-identical structures in the PDB. It is instead a high-resolution (1.0 Å) structure of calcium-bound calmodulin from Paramecium tetraurelia, which is only 88% identical to human calmodulin. Moreover, 1exr is in the open, non-peptide-bound configuration, whereas the current refined structure shows a more stereotyped peptide-bound conformation. It would hardly have been possible to predict this result, and it remains puzzling even after the fact.

The following observations, however, may help to explain the anomaly. Firstly, this MR problem was particularly difficult owing to several features: extensive twinning, searches for large and undetermined numbers of copies per asymmetric unit, and the lack of a definitive space-group assignment. As such, the MR results may have been more sensitive than usual to variations in input parameters. Examination of the log files revealed time-outs in the majority of cases, caused by the time limits set by default on the server. The time-outs, however, were probably merely a symptom of some other underlying issue. This issue might be related to the wide range of conformations adopted by calmodulin: structures with perfect sequence identity can differ dramatically. It is therefore reasonable that calmodulin deviates from the assumptions used by MR programs to derive r.m.s.d. estimates from sequence similarity. Indeed, the r.m.s.d.s estimated by Phaser in the MrBUMP log files were systematically and substantially lower than those calculated against the final structure (Fig. 3) using the DALI server (Holm & Rosenström, 2010). Some of this discrepancy may arise from differences in the precise meaning of the two r.m.s.d. values. Even so, the discrepancies are very large in many cases (e.g. PDB entry 3wfn; Fig. 3), in accordance with the unusually large structural variability of calmodulin. Systematic underestimation of the unusually large r.m.s.d.s of calmodulin, combined with the other obstacles mentioned above, may therefore have prevented the MR software from finding solutions in a timely manner.
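The dependence of the expected r.m.s.d. on sequence identity can be illustrated with the classic Chothia–Lesk relation, r.m.s.d. ≈ 0.40 exp(1.87H) Å, where H is the fraction of non-identical residues. Phaser's current estimates (Oeffner et al., 2013) also take the number of residues into account, so this sketch is only a stand-in for that procedure. The relation shows that a model 88% identical to the target, such as PDB entry 1exr, is expected to deviate by only about 0.5 Å. The conformational variability of calmodulin can easily push the true deviation well beyond such an estimate.

```python
import math


def chothia_lesk_rmsd(seq_identity):
    """Expected r.m.s.d. (Å) of the conserved core from fractional sequence identity."""
    if not 0.0 <= seq_identity <= 1.0:
        raise ValueError("sequence identity must be a fraction between 0 and 1")
    mutated = 1.0 - seq_identity  # H, the fraction of non-identical residues
    return 0.40 * math.exp(1.87 * mutated)


for identity in (1.00, 0.88, 0.50):
    print(f"{identity:.0%} identical -> {chothia_lesk_rmsd(identity):.2f} Å")
```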

Figure 3
R.m.s.d. estimates by Phaser and r.m.s.d.s calculated by the DALI server, as indicated in the legend. A selection of N-lobe models used by the MrBUMP server is shown, with a range of different sequence identities (IDs) as indicated. The successful model, PDB entry 1exr, has the lowest r.m.s.d. (as reported by the DALI server) as well as the second lowest DALI/Phaser r.m.s.d. discrepancy of the models shown here. More similar structures were identified after structure solution, but were not tested by MrBUMP. The lack of a fixed relationship between sequence ID and Phaser r.m.s.d. estimates is likely to be owing to Phaser's use of both sequence ID and sequence length to produce improved r.m.s.d. estimates (Oeffner et al., 2013).

To determine to what degree r.m.s.d. estimates affect MR results with the current data, a number of parallel MR jobs were run in Phaser using manually input, incremented values of the r.m.s.d. parameter. The search model was the most similar structure in the PDB as identified by the DALI server, PDB entry 4q57 (J.-G. Song, J. Kostan, I. Grishkovskaya & K. Djinovic-Carugo, unpublished work). This model was assumed to be among the best possible search models, so results obtained with it should apply all the more to inferior search models. The efficacy of MR was indeed sharply dependent on the accuracy of the input r.m.s.d. (Fig. 4), even with what was assumed to be the best possible search model. When the input r.m.s.d. strayed too far from the optimum value, fewer copies were found and/or the CPU time per molecule increased dramatically (note the logarithmic scale of the vertical CPU-time axis in Fig. 4).

This result suggests two explanations for the MrBUMP server anomaly. First, because fewer molecules were found in total, the scores were poorer and did not rise to the level of probable solutions as reported by the server. Second, because of the longer CPU times, many of these searches reached the maximum time allotted by the server before completion. In the case of PDB entry 1exr, however, two conditions were uniquely favorable. Its actual r.m.s.d. was low, and the r.m.s.d. estimate from Phaser was close to that from DALI, perhaps, ironically, owing to its lower sequence similarity. These two features are likely to explain its success as a search model, and in light of these considerations the surprising MR result seems understandable. During the preparation of this paper, a newer version of Phaser was slated for release that refines the estimated r.m.s.d. after the first ensemble is placed. It is therefore likely that all input r.m.s.d.s would perform roughly similarly in the newer version (Airlie McCoy, personal communication).
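The parallel jobs described above amount to a one-dimensional grid search over the r.m.s.d. parameter. A hedged sketch of how such a grid could be set up is shown below; it writes one Phaser keyword script per trial value. The keywords follow the style of Phaser's documented keyword examples (MODE MR_AUTO, ENSEMBLE ... PDBFILE ... RMS, SEARCH ENSEMBLE ... NUM) and should be checked against the installed version. The file names, MTZ column labels, copy number and grid limits are all hypothetical, and the composition keywords are omitted for brevity.

```python
from __future__ import annotations

from pathlib import Path


def phaser_script(rms: float, copies: int, model: str, mtz: str) -> str:
    """Phaser keyword input for one MR trial with a fixed input r.m.s.d. (Å)."""
    lines = [
        "MODE MR_AUTO",
        f"HKLIN {mtz}",
        "LABIN F=F SIGF=SIGF",  # hypothetical MTZ column labels
        f"ENSEMBLE nlobe PDBFILE {model} RMS {rms:.2f}",
        f"SEARCH ENSEMBLE nlobe NUM {copies}",
        f"ROOT mr_rms_{rms:.2f}",
    ]
    return "\n".join(lines) + "\n"


def rms_grid(start: float = 0.4, step: float = 0.2, n: int = 9) -> list[float]:
    """Evenly spaced trial r.m.s.d. values; integer stepping avoids float drift."""
    return [round(start + i * step, 2) for i in range(n)]


def write_jobs(outdir: Path, model: str, mtz: str, copies: int = 4) -> list[Path]:
    """Write one keyword file per trial r.m.s.d. and return their paths."""
    outdir.mkdir(parents=True, exist_ok=True)
    paths = []
    for rms in rms_grid():
        path = outdir / f"mr_rms_{rms:.2f}.phaser"
        path.write_text(phaser_script(rms, copies, model, mtz))
        paths.append(path)
    return paths


# Hypothetical file names for the 4q57 N-lobe model and the merged data.
print(phaser_script(0.8, 4, "nlobe_4q57.pdb", "merged.mtz"))
```

Each script could then be fed to Phaser's keyword interface, for example `phaser < mr_rms_0.80.phaser`. The number of copies placed and the CPU time could then be compared across the grid, as in Fig. 4.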

Figure 4
MR results from parallel jobs using PDB entry 4q57 (the most structurally similar PDB entry to the N-terminus of the current structure, as calculated by DALI) as the search model, plotted as a function of input r.m.s.d. values. R.m.s.d. estimate from Phaser (open diamonds) and calculated r.m.s.d. from DALI (open triangles) are indicated. Note the sharp dependence of MR success on correct input r.m.s.d.

3.4. Adjustment of space group

At a later stage of refinement, it was noticed that the four twin fractions in REFMAC did not appear to be independent: two of the four twin operators remained paired at about 26–28%, while the other two were coupled at 22–24%. This suggested that the space group was in fact of higher symmetry than originally thought, and alternative space groups were therefore considered. Using the model derived thus far, MR jobs were run in all possible space groups of point group 32, and a solution with four calmodulin molecules was found in space group P3₂12. Refinement and further building in this space group resulted in a good structure and reduced the number of twin domains from four to two (from tetartohedral to hemihedral twinning). Concerns about overfitting and model bias owing to twinning were to some extent allayed by a large difference peak (>10σ) in the electron density at the C-terminus of the peptide. This peak corresponded to the as-yet unmodeled synthetic C-terminal amide of the peptide. Addition and refinement of this amide group satisfactorily accounted for the peak.
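The pairing of the four refined fractions is what would be expected if the true point group is 312, that of P3₂12. In its Laue group, −31m, the operator (−k, −h, −l) is a crystallographic symmetry operation, so it describes the same domain orientation as the identity. Likewise, (k, h, −l) describes the same orientation as (−h, −k, l). The sketch below makes this explicit (helper names are hypothetical). It generates −31m from a threefold axis, the (−k, −h, −l) twofold and the inversion, and then groups the four point-group-3 indexing operators into cosets of that group. The input fractions are hypothetical values chosen within the ranges quoted above, not refined values from this structure.

```python
from itertools import product

# 3x3 integer matrices acting on row vectors: (h, k, l) @ M gives the new indices.
IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
TWIN_OPS = {
    "h,k,l": IDENTITY,
    "-h,-k,l": ((-1, 0, 0), (0, -1, 0), (0, 0, 1)),
    "k,h,-l": ((0, 1, 0), (1, 0, 0), (0, 0, -1)),
    "-k,-h,-l": ((0, -1, 0), (-1, 0, 0), (0, 0, -1)),
}
THREEFOLD = ((0, -1, 0), (1, -1, 0), (0, 0, 1))   # (h, k, l) -> (k, -h-k, l)
INVERSION = ((-1, 0, 0), (0, -1, 0), (0, 0, -1))  # Friedel mates


def matmul(a, b):
    return tuple(tuple(sum(a[i][m] * b[m][j] for m in range(3)) for j in range(3))
                 for i in range(3))


def closure(generators):
    """Smallest set of matrices containing the generators and closed under products."""
    group = set(generators)
    while True:
        new = {matmul(a, b) for a, b in product(group, repeat=2)} - group
        if not new:
            return group
        group |= new


# Laue group -31m of point group 312: threefold, (-k,-h,-l) twofold and inversion.
LAUE_312 = closure([THREEFOLD, TWIN_OPS["-k,-h,-l"], INVERSION])


def pool_fractions(fractions, laue):
    """Sum the twin fractions of operators that differ only by a Laue-group operation."""
    pooled = {}
    for name, frac in fractions.items():
        coset = frozenset(matmul(TWIN_OPS[name], s) for s in laue)
        pooled.setdefault(coset, []).append((name, frac))
    return [(sorted(n for n, _ in members), sum(f for _, f in members))
            for members in pooled.values()]


# Hypothetical tetartohedral fractions within the ranges quoted in the text.
example = {"h,k,l": 0.27, "-k,-h,-l": 0.27, "-h,-k,l": 0.23, "k,h,-l": 0.23}
for names, total in pool_fractions(example, LAUE_312):
    print(names, round(total, 2))
```

Summing each coset collapses the tetartohedral description into a hemihedral one. The minor coset (0.46 in this toy example) plays the role of the single twin fraction refined in P3₂12.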

3.5. Model improvement

To improve the model, iterative cycles of manual rebuilding followed by refinement were employed. This rebuilding stage included the following:

- adjustment of rotamers;
- addition of alternative side-chain conformations;
- replacement of waters by solvent ions (chloride, sulfate and sodium);
- gradual addition of terminal residues as they became apparent.

Most refinements were carried out in REFMAC 5.8.0135 (Murshudov et al., 2011), but this eventually led to poor geometry. Parallel refinements in PHENIX 1.10.1-2155 (Adams et al., 2010) were therefore tested, and these resulted in poorer R values but drastically improved geometry. The PHENIX-refined model was then refined further in REFMAC with tighter geometric restraints, which achieved a reasonable balance between geometry and R values. Structure-refinement statistics are given in Table 2.

Table 2
Structure solution and refinement

Resolution range (Å) 22.8–1.69
No. of reflections, working set 68993
No. of reflections, test set 3631
Final Rwork (%) 9.71
Final Rfree (%) 14.08
Cruickshank DPI (Å) 0.0287
Refined twin fraction (REFMAC) 0.427
R.m.s. deviations
 Bonds (Å) 0.0086
 Angles (°) 1.2089
Ramachandran plot
 Most favored (%) 99.29
 Allowed (%) 0.14

3.6. Choice of anisotropic ADP model

Consideration of the various atomic displacement parameter (ADP) models led to the choice of the anisotropic model for the following reasons. Although the matter is controversial, the resolution of the current structure (∼1.7 Å) is on the borderline for anisotropic ADP modeling (this is the default cutoff used in phenix.refine, for example). On the other hand, parallel refinements were run with isotropic, isotropic plus translation–libration–screw (TLS) and anisotropic models, each starting from an isotropic model. Rfree was ∼1.5% lower for the anisotropic model than for the other two (Table 3). The gap between Rwork and Rfree was indeed larger in the anisotropic case (4.4% versus ∼2.1%), potentially indicating overfitting, but the geometry of the structure did not suffer appreciably. Such was the empirical observation, but certain conceptual issues must be addressed to justify the use of the anisotropic model for the current data set.

Table 3
Comparison of ADP models

ADP model type Biso TLS + Biso Anisotropic
Rwork (%) 13.52 13.42 9.71
Rfree (%) 15.57 15.49 14.08
Rfree − Rwork (%) 2.05 2.07 4.37
R.m.s.d., bonds (Å) 0.0081 0.0065 0.0086
R.m.s.d., angles (°) 1.1983 1.1067 1.2089
Average CC (%) 90.75 91.20 96.09
Cruickshank DPI (Å) 0.0196 0.0194 0.0287
FOM overall 88.12 88.25 89.52

In principle, if the ratio of the number of observations to the number of modeled parameters exceeds one (n/p > 1), the model parameters are overdetermined and can be fitted; in practice, considerably higher ratios are considered desirable. In the current case, the number of unique reflections is 72 624 and the number of modeled non-H atoms is 6813, yielding a reflections-to-atoms ratio of 10.66. Including restraints according to Rupp's equation (Rupp, 2010) adds 25 892 `observations', increasing the ratio to 14.46. Use of isotropic ADPs (four parameters per atom) leads to an n/p ratio of 3.62, while the anisotropic model (nine parameters per atom) implies an n/p ratio of 1.61, which would generally be considered insufficient. Furthermore, twinning may have a deleterious effect by reducing the independence of individual reflections.

For a number of reasons, however, the data herein may be of higher accuracy and precision than those used to derive the above rules of thumb:

- The overall multiplicity is 21.5.
- The signal was improved through fine φ-slicing (Pflugrath, 1999; Mueller et al., 2012).
- Each of the three data sets was collected in a different crystal orientation, so different detector pixels were used for each data set, probably increasing the accuracy of the intensities.
- Because the detector truncated the data, plots of data statistics are distributed unusually, with sharp degradations observed at 1.9 Å resolution, where truncation begins. As such, the values in the highest resolution shells do not reflect the properties of the data set in the usual way.

According to these considerations, it is perhaps tenable to allow lower-than-normal n/p ratios.
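The ratios quoted above can be reproduced with a few lines of arithmetic. The sketch takes the restraint count (25 892) as given rather than re-deriving it from Rupp's equation. It counts four parameters per atom for the isotropic model (x, y, z, B) and nine for the anisotropic model (x, y, z and six Uij components), ignoring occupancies.

```python
def obs_to_param_ratio(n_reflections, n_restraints, n_atoms, params_per_atom):
    """Observations (reflections plus restraints) per refined parameter."""
    return (n_reflections + n_restraints) / (n_atoms * params_per_atom)


N_REFL, N_RESTRAINTS, N_ATOMS = 72_624, 25_892, 6_813  # values quoted in the text

print(f"reflections per atom:   {N_REFL / N_ATOMS:.2f}")                   # 10.66
print(f"observations per atom:  {(N_REFL + N_RESTRAINTS) / N_ATOMS:.2f}")  # 14.46
print(f"n/p, isotropic:   {obs_to_param_ratio(N_REFL, N_RESTRAINTS, N_ATOMS, 4):.2f}")  # 3.62
print(f"n/p, anisotropic: {obs_to_param_ratio(N_REFL, N_RESTRAINTS, N_ATOMS, 9):.2f}")  # 1.61
```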

To the knowledge of the current author, there is no theoretical or quantitative treatment of the combined effects of data precision and n/p ratio on the parametrization of crystallographic models, although such a treatment should be feasible, at least empirically, either through the use of a high-redundancy, high-resolution data set and structure or through simulations thereof. At a minimum, since high-resolution structures show that protein atoms are in fact anisotropically displaced in the crystalline state, it seems reasonable to use this knowledge if the minimum criteria of n/p > 1 and improved model statistics are met. This discussion would, of course, become more critical should any biological ramifications be inferred from the anisotropic model; none are put forward here, but the caveat is noted.

3.7. Analysis of smaller wedges of data

The data-collection protocol used herein was non-optimal for most purposes, but in this case it allowed some interesting insights. After the twinning issues in the reflection data had been deciphered, questions arose concerning the correspondence between crystal morphology and the observed intensities, particularly since the crystal appeared to be single. It was hypothesized that the hexagonal-lozenge morphology in fact consisted of six conjoined crystalline wedges of alternating lattice orientation (Fig. 2). This made sense considering that the twin operator corresponded to a real-space rotation of 60° (or, equivalently, −60° or 180°) about the axis normal to the lozenge face, and the hypothesis proved testable. Since the beam did not bathe the entire crystal during data collection, and since it scanned through different regions of the crystal, the possibility arose that subsections of the rotation sweeps might have differing twin fractions because the beam traversed different proportions of the alternating triangular wedges. Accordingly, refinements were run using the smallest angular subsections of the reflection data that afforded near-completeness, i.e. 60°. These 60° subsections were merged as a `moving window' advanced in 30° increments for all three data sets. This showed that the twin fraction indeed varied as a function of rotation angle, ranging from 0.32 to 0.55 (Fig. 5a). While alternative hypotheses might explain the twinning equally well, the triangular-wedge hypothesis is probably the simplest and is sufficient to explain the observed changes in twin fraction. According to this model, had the crystal been completely bathed in the beam, perfect twinning would probably have been observed.
On the other hand, with better-informed alignment and a small beam that does not bathe the whole crystal, complete untwinned data sets might have been collected, suggesting a further use and justification for the increasingly abundant microfocus synchrotron beamlines.
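The moving-window bookkeeping described above can be sketched as follows. This is a minimal helper, assuming 0.1° frames (as stated in the Fig. 5 caption), 1-based inclusive image numbering and a hypothetical sweep length; each yielded range would then be processed and refined separately to obtain one twin fraction per window. The 1800-image (180°) sweep in the usage block is an illustrative assumption, not the actual length of the data sets.

```python
from typing import Iterator, Tuple


def moving_windows(n_frames: int, osc_deg: float = 0.1,
                   window_deg: float = 60.0,
                   step_deg: float = 30.0) -> Iterator[Tuple[int, int]]:
    """Yield 1-based, inclusive (first, last) image ranges of overlapping wedges.

    Each wedge spans window_deg of rotation and successive wedges start
    step_deg apart; wedges that would run past the end of the sweep are dropped.
    """
    frames_per_window = round(window_deg / osc_deg)
    frames_per_step = round(step_deg / osc_deg)
    first = 1
    while first + frames_per_window - 1 <= n_frames:
        yield first, first + frames_per_window - 1
        first += frames_per_step


if __name__ == "__main__":
    # Hypothetical 180-degree sweep of 0.1-degree frames (1800 images).
    for first, last in moving_windows(1800):
        print(f"images {first}-{last}")
```

With 50% overlap between neighbouring windows, adjacent twin-fraction estimates are not independent, which is consistent with the smoothly varying curves one would expect from such an analysis.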

Figure 5
Analysis of subsets of data. Image numbers are given as ranges under bars, and each frame represents 0.1°. (a) Twin fractions versus rotation range in three data sets [called Ainv, Achi10 and A (reindexed)]. Refinement of the final model against subsets of frames from each data set showed a smoothly varying twin fraction. Colored bars represent wedges from different data sets as indicated. (b) Efficacy of MR by two measures: Phaser Max LLG (blue) and (1 − normalized CPU time) (red); this unconventional inverse measure of CPU time was used to show the relationship more clearly. (c) Anomalous signal, shown as the average peak height of the 16 highest-magnitude anomalous peaks (blue), further averaged across the three data sweeps; values above the bars are in units of σ. Note that in the better data sets these peaks correspond to the 16 known calcium sites, whereas in worse cases these averages include some negative (false) peaks of large magnitude. Red bars indicate twin fractions.

Other analyses of possible interest were run on these same subsections of the data. First, the efficacy of MR was analyzed (Fig. 5b), and the less-twinned wedges were found to be more easily solved in Phaser. Also, in almost all cases, detwinning before MR made structure solution more difficult or impossible. This failure of detwinning may be attributable to the dramatic rise in estimated errors at higher twin fractions that is implicit in the detwinning equations, although it is difficult to see why detwinning would be worse than not detwinning at all. Perhaps in the current case the variability in twin fraction made detwinning less successful than it would otherwise have been. Next, anomalous peak heights from individually refined wedges were compared (Fig. 5c), with the expectation that peak heights from data sets of lower twin fraction would be greatest. Surprisingly, despite several approaches to this analysis, no obvious relationship between twin fraction and anomalous peak height was observed, perhaps because of the `noise' of variability in merging statistics among these data wedges. The relationship between twinning and anomalous signal has already been explored with simulated twinned data (http://smb.slac.stanford.edu/~holton/challenge/twin/), but further careful experiments would be needed to confirm those conclusions in a completely empirical setting.
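The error growth implicit in the detwinning equations can be made concrete with the standard algebraic detwinning relations for a hemihedral twin. The sketch below is a minimal illustration, assuming a known twin fraction α and independent Gaussian errors on the two observed twin-related intensities; it is not the implementation used by any particular program.

```python
import math


def detwin(j1: float, j2: float, alpha: float,
           sig1: float, sig2: float) -> tuple:
    """Algebraically detwin one pair of hemihedrally twin-related intensities.

    Model: J1 = (1 - alpha) * I1 + alpha * I2 and J2 = alpha * I1 + (1 - alpha) * I2.
    Returns (I1, I2, sigma_I1, sigma_I2), propagating independent errors on J1, J2.
    """
    denom = 1.0 - 2.0 * alpha
    if abs(denom) < 1e-6:
        raise ValueError("detwinning is undefined at alpha = 0.5")
    i1 = ((1.0 - alpha) * j1 - alpha * j2) / denom
    i2 = ((1.0 - alpha) * j2 - alpha * j1) / denom
    s1 = math.hypot((1.0 - alpha) * sig1, alpha * sig2) / abs(denom)
    s2 = math.hypot((1.0 - alpha) * sig2, alpha * sig1) / abs(denom)
    return i1, i2, s1, s2


def error_amplification(alpha: float) -> float:
    """Ratio sigma(I)/sigma(J) after detwinning when both observed sigmas are equal."""
    return math.hypot(1.0 - alpha, alpha) / abs(1.0 - 2.0 * alpha)


if __name__ == "__main__":
    for a in (0.0, 0.32, 0.43, 0.48):
        print(f"alpha = {a:.2f}: sigma grows {error_amplification(a):.1f}-fold")
```

At the refined overall twin fraction of 0.43 the sigmas grow roughly fivefold, and the factor diverges as α approaches 0.5; detwinned intensities can also become negative, which is one plausible reason detwinned data could behave poorly in MR.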

4. Results/discussion

4.1. Description of the structure and comparison to others

Since there are so many structures of calmodulin and calmodulin–peptide complexes in the PDB, it seemed unlikely that anything new would appear in this structure (Fig. 6). Indeed, when compared with the N- and C-terminal lobes of known calmodulin structures using DALI (Holm & Rosenström, 2010), the lobes of the structure solved herein are highly similar to tens of homologous structures, with Cα r.m.s.d. values of ∼1 Å. Comparison of the whole calmodulin structure with those in the PDB showed somewhat less similarity, the best value being 1.9 Å r.m.s.d. to a calmodulin–CaV1 peptide structure (PDB entry 2vay; Fig. 6c; Halling et al., 2009), but the poorer fit may be ascribed to relatively subtle but larger scale interdomain rearrangements. In any case, the overall structure of calmodulin itself in the current study was not remarkably different from previously determined structures.
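For reference, the conventional Cα r.m.s.d. between two structures is computed over N aligned Cα pairs (coordinates x_i and y_i) after optimal superposition by a rotation R and translation t, as written below. This is the standard textbook definition, offered only as an aid to reading the values above; the atom pairing behind each quoted value comes from the DALI alignment, whose own similarity scoring is based on intramolecular distance matrices rather than on this quantity.

```latex
\mathrm{r.m.s.d.} = \min_{\mathbf{R},\,\mathbf{t}}
  \sqrt{\frac{1}{N}\sum_{i=1}^{N}
  \left\lVert \mathbf{x}_i - \left(\mathbf{R}\,\mathbf{y}_i + \mathbf{t}\right) \right\rVert^{2}}
```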

Figure 6
Stereograms of structural comparisons and OMIT density. (a) Superposition of four NCS-related copies of the calmodulin–peptide complex described herein. The structures are nearly identical to each other, with the exception of one loop (foreground, center). (b) OMIT map density for the peptide, in white. (c) Superposition of the current structure with its closest full-length, peptide-bound structural relative (PDB entry 2vay). The peptide in the current structure diverges dramatically in conformation from the canonical one observed in PDB entry 2vay.

In contrast, the conformation of the bound peptide was strikingly different (Fig. 6c). Instead of the oft-observed straight α-helical form, the peptide in the current structure was bent by approximately 90° at approximately position 13 of the 19 residues observed in the structure. The significance of this is unknown, but ongoing studies comparing the current structure with other unpublished structures with similar peptides may uncover physiologically important ramifications of this configuration.
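One crude way to put a number on such a kink is sketched below in Python; it is an assumption for illustration, not the procedure used for the value above. Each helical segment's axis is approximated by the vector between the centroids of its first and last four Cα atoms (roughly one helical turn each, which largely averages out the helical wobble), and the bend is reported as the angle between the two axes. The kink index and the synthetic coordinates in the usage block are hypothetical; a careful analysis would more likely fit helix axes by least squares.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def centroid(points: Sequence[Point]) -> Point:
    n = len(points)
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)


def segment_axis(ca: Sequence[Point], n_end: int = 4) -> Point:
    """Approximate a helix axis as centroid(last n_end CA) - centroid(first n_end CA)."""
    a = centroid(ca[:n_end])
    b = centroid(ca[-n_end:])
    return (b[0] - a[0], b[1] - a[1], b[2] - a[2])


def angle_deg(u: Point, v: Point) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def bend_angle(ca: Sequence[Point], kink: int, n_end: int = 4) -> float:
    """Angle (degrees) between segment axes before and after a 0-based kink residue."""
    return angle_deg(segment_axis(ca[:kink + 1], n_end),
                     segment_axis(ca[kink:], n_end))


if __name__ == "__main__":
    # Hypothetical ideal alpha-helix (2.3 A radius, 1.5 A rise, 100 deg/residue)
    # running along z for residues 0-12, then along x for residues 13-18.
    ca = []
    for i in range(19):
        t = math.radians(100.0 * i)
        if i <= 12:
            ca.append((2.3 * math.cos(t), 2.3 * math.sin(t), 1.5 * i))
        else:
            ca.append((1.5 * (i - 12), 2.3 * math.cos(t), 18.0 + 2.3 * math.sin(t)))
    print(f"bend at residue 13: {bend_angle(ca, kink=12):.0f} degrees")
```

Residue 13 of 19 corresponds to the 0-based index 12 used here.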

5. Conclusion

The structure solution described herein provides, if nothing more, a good practical example in which many crystallographic concepts come to the fore. Most prominently, several aspects of twinning are illustrated, such as the relation of twinning to crystallographic symmetry (manifested in the decision to refine the structure as a hemihedral twin in P3₂12 rather than as a tetartohedral twin in P3₂), ambiguities between varieties of twinning, and the relationship between crystal morphology and twinning, as seen in the analyses of several windows of data. A second concept, or paradox, arose from the MR results: the apparent best structural match to the refined structure was not in fact the successful MR model. Third, the relationship between n/p ratios and data precision was important in determining which ADP model to use. Lastly, the novel conformation of the peptide may provide a basis for understanding the interaction of calmodulin with the protein from which the peptide was derived. Overall, it is suggested here that these data provide a good case for illustrating, in one example, a multitude of crystallographic concepts.

Additionally, a new experimental and data-analysis approach to twinning may be implicit herein. With the advent of many new beamlines with smaller beam profiles or microfocus beams, data-collection strategies are often designed to scan continuously through fresh, undamaged parts of the crystal. This is also occasionally performed, as in the current case, with off-center crystal mounting to sample a toroidal volume, thereby spreading the radiation dose over a larger volume (Zeldin et al., 2013). With data collected in this way, twinning analyses can be performed on small contiguous wedges of data and variation of the twin fraction can be measured. In cases similar to the current study, where the twin fraction varies significantly within the sample (Kerfeld et al., 1997), at least partial detwinning prior to structure solution may be more feasible, although, puzzlingly, detwinning as currently implemented does not seem to help structure determination. If this puzzle were solved, or if new experimental phasing algorithms accommodating twinning were developed, this type of data-collection strategy might prove highly effective, especially since experimental phasing methods currently tend to fare worse than MR in the presence of twinning (Dauter, 2003). Since these data-collection strategies have already been implemented as a method for reducing radiation damage (Zeldin et al., 2013), analyses of sub-wedges of microfocus-beam data from twinned crystals might make a critical difference, particularly in experimentally phased structure solutions.
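A per-wedge twinning analysis of this kind need not require full refinement. As an illustration only, Yeates' H test estimates a hemihedral twin fraction directly from pairs of twin-related acentric intensities, using the relation that the mean of H = |J1 − J2|/(J1 + J2) equals 1/2 − α for error-free, Wilson-distributed data. The Python sketch below assumes the twin-related pairs have already been extracted for a wedge; the simulated exponential (acentric Wilson) intensities are a stand-in for real data. Unlike refinement, the test cannot distinguish α from 1 − α, so a wedge with α = 0.55 would report about 0.45, and measurement errors bias the estimate downward.

```python
import random
from typing import List, Sequence, Tuple


def h_test_alpha(pairs: Sequence[Tuple[float, float]]) -> float:
    """Estimate a hemihedral twin fraction with the H test.

    H = |J1 - J2| / (J1 + J2) for each pair of twin-related acentric intensities;
    for Wilson-distributed, error-free data <H> = 1/2 - alpha. The test cannot
    tell alpha from 1 - alpha, so the estimate is of min(alpha, 1 - alpha).
    """
    hs = [abs(j1 - j2) / (j1 + j2) for j1, j2 in pairs if j1 + j2 > 0]
    if not hs:
        raise ValueError("no usable intensity pairs")
    return 0.5 - sum(hs) / len(hs)


def simulate_twinned_pairs(alpha: float, n: int, seed: int = 0) -> List[Tuple[float, float]]:
    """Stand-in data: acentric Wilson intensities (exponential), twinned with alpha."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        i1, i2 = rng.expovariate(1.0), rng.expovariate(1.0)
        pairs.append(((1 - alpha) * i1 + alpha * i2,
                      alpha * i1 + (1 - alpha) * i2))
    return pairs


if __name__ == "__main__":
    # Hypothetical per-wedge twin fractions spanning the range reported in the text.
    for true_alpha in (0.32, 0.43, 0.55):
        est = h_test_alpha(simulate_twinned_pairs(true_alpha, 20_000))
        print(f"true alpha {true_alpha:.2f} -> H-test estimate {est:.3f}")
```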

Supporting information


Acknowledgements

The author thanks Loren Looger for scientific advice. Financial support was through Loren Looger's laboratory at the Janelia Research Campus/HHMI. Rebecca Keller, Kay Diederichs, Randy Read, Pavel Afonine, Johan Hattne, Eric Schreiter and Loren Looger provided valuable help on the manuscript, and the author also thanks Johan Hattne, Francis Reyes and the many contributors to the CCP4bb mailing list for provocative and informative conversations about the crystallographic matters discussed herein. The final model and structure factors have been deposited in the Protein Data Bank (PDB entry 5dow) and the images are available from SBGRID (https://doi.org/10.15785/SBGRID/338).

References

Adams, P. D. et al. (2010). Acta Cryst. D66, 213–221.
Cowtan, K. (2006). Acta Cryst. D62, 1002–1011.
Dauter, Z. (2003). Acta Cryst. D59, 2004–2016.
Evans, P. R. (2011). Acta Cryst. D67, 282–292.
Halling, D. B., Georgiou, D. K., Black, D. J., Yang, G., Fallon, J. L., Quiocho, F. A., Pedersen, S. E. & Hamilton, S. L. (2009). J. Biol. Chem. 284, 20041–20051.
Helliwell, J. R. (2008). Crystallogr. Rev. 14, 189–250.
Holm, L. & Rosenström, P. (2010). Nucleic Acids Res. 38, W545–W549.
Keegan, R. M. & Winn, M. D. (2007). Acta Cryst. D63, 447–457.
Kerfeld, C. A., Wu, Y. P., Chan, C., Krogmann, D. W. & Yeates, T. O. (1997). Acta Cryst. D53, 720–723.
Leslie, A. G. W. & Powell, H. R. (2007). Evolving Methods for Macromolecular Crystallography, edited by R. J. Read & J. L. Sussman, pp. 41–51. Dordrecht: Springer.
Marshall, C. B., Nishikawa, T., Osawa, M., Stathopulos, P. B. & Ikura, M. (2015). Biochem. Biophys. Res. Commun. 460, 5–21.
McCoy, A. J., Grosse-Kunstleve, R. W., Adams, P. D., Winn, M. D., Storoni, L. C. & Read, R. J. (2007). J. Appl. Cryst. 40, 658–674.
Mueller, M., Wang, M. & Schulze-Briese, C. (2012). Acta Cryst. D68, 42–56.
Murshudov, G. N., Skubák, P., Lebedev, A. A., Pannu, N. S., Steiner, R. A., Nicholls, R. A., Winn, M. D., Long, F. & Vagin, A. A. (2011). Acta Cryst. D67, 355–367.
Oeffner, R. D., Bunkóczi, G., McCoy, A. J. & Read, R. J. (2013). Acta Cryst. D69, 2209–2215.
Pflugrath, J. W. (1999). Acta Cryst. D55, 1718–1725.
Rupp, B. (2010). Biomolecular Crystallography: Principles, Practice, and Application to Structural Biology, p. 638. New York: Garland Science.
Schuermann, J. P. & Tanner, J. J. (2003). Acta Cryst. D59, 1731–1736.
Tidow, H. & Nissen, P. (2013). FEBS J. 280, 5551–5565.
Wilson, M. A. & Brunger, A. T. (2000). J. Mol. Biol. 301, 1237–1256.
Zeldin, O. B., Gerstel, M. & Garman, E. F. (2013). J. Synchrotron Rad. 20, 49–57.

© International Union of Crystallography. Prior permission is not required to reproduce short quotations, tables and figures from this article, provided the original authors and source are cited.
