research communications
Serendipitous high-resolution structure of Escherichia coli carbonic anhydrase 2
aDepartment of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA, and bLife Sciences Institute, University of Michigan, Ann Arbor, MI 48109, USA
*Correspondence e-mail: rankinmi@umich.edu
X-ray crystallography remains the dominant method of determining the three-dimensional structure of proteins. Nevertheless, this resource-intensive process may be hindered by the unintended crystallization of contaminant proteins from the expression source. Here, the serendipitous discovery of two novel crystal forms and one new, high-resolution structure of carbonic anhydrase 2 (CA2) from Escherichia coli that arose during a crystallization campaign for an unrelated target is reported. By comparing unit-cell parameters with those in the PDB, contaminants such as CA2 can be identified, preventing futile molecular-replacement attempts. Crystallographers can use these new lattice parameters to diagnose CA2 contamination in similar experiments.
Keywords: contaminants; carbonic anhydrase; Escherichia coli.
PDB references: carbonic anhydrase 2, P41212, 9eat; space group P21212, 9eaw; space group C2221, 9ebz
1. Introduction
The process of obtaining a three-dimensional structure of a target protein can be time-consuming and expensive, regardless of the technique used. Each stage of the gene-to-structure pipeline has potential for failure, yet the most frustrating and expensive errors may arise at the very end during the analysis of diffraction data. Efforts to solve the structure through molecular replacement or experimental phasing may result in the unfortunate discovery that the crystallized protein was not the target protein.
The PDB contains many accidental structures of contaminants that arose during purification (Niedzialkowska et al., 2016; Grzechowiak et al., 2021). Typical purification schemes involve the addition of exogenous proteins such as lysozyme (Falgenhauer et al., 2021), Tobacco etch virus (TEV) protease (Tropea et al., 2009) or deoxyribonuclease (DNase) I (Funakoshi et al., 1980). Any of these proteins has the potential to persist through the purification and crystallize in lieu of the target protein. Genetically encoded fusion proteins such as maltose-binding protein (MBP; Lebendiker & Danieli, 2017) or glutathione S-transferase (GST; Harper & Speicher, 2011) may also remain in small quantities after cleavage and counterselection.
More commonly, contaminating proteins from the expression source lead to unintended structures. The nickel resin used in immobilized metal-affinity chromatography (IMAC), the most common method used to obtain large quantities of recombinant protein, has the potential to bind proteins other than the polyhistidine-tagged target. Many such contaminants from a nickel-affinity-based Escherichia coli purification strategy have been reported (Niedzialkowska et al., 2016; Grzechowiak et al., 2021; Bolanos-Garcia & Davies, 2006). These proteins may bind nickel resin or interact nonspecifically with the protein of interest and thus be retained through the final purification step. Common endogenous E. coli contaminants that have been reported to co-elute during nickel-affinity purification include ArnA (Andersen et al., 2013; Robichon et al., 2011), SlyD (Andersen et al., 2013; Robichon et al., 2011; Parsy et al., 2007), Hsp60 (GroEL; Bolanos-Garcia & Davies, 2006), YodA (David et al., 2003) and Can/YadF (carbonic anhydrase; Chai et al., 2021). Frequent contaminants are listed in the ContaBase database (Hungler et al., 2016).
An endogenous carbonic anhydrase frequently contaminates recombinant proteins from E. coli expression systems (Robichon et al., 2011; Chai et al., 2021; Cronk et al., 2001; Merlin et al., 2003). Carbonic anhydrase (EC 4.2.1.1) is a zinc-dependent enzyme that forms carbonic acid from CO2, a byproduct of carbohydrate and fat catabolism. In humans, carbonic anhydrases in red blood cells reversibly solubilize CO2 as carbonic acid, allowing it to reach the lungs to be exhaled (Doyle & Cooper, 2024). E. coli contains two carbonic anhydrase genes. The essential can gene (previously yadF; UniProt P61517) encodes carbonic anhydrase 2 (CA2), a β-class CA enzyme. CynT (UniProt P0ABE9) is a paralog of CA2 (33% sequence identity) that can complement disruption of can (Merlin et al., 2003). The PDB contains several structures of E. coli CA2, but none of CynT (Table 1).
Here, we report a case of persistent CA2 contamination that crystallized in three forms. Two are new crystal forms and the third yielded a high-resolution (1.43 Å) structure of a common CA2 crystal form.
2. Materials and methods
2.1. Protein production
The gene encoding a natural product biosynthetic protein of interest was cloned into the vector pMCSG7 using a ligation-independent cloning strategy (Stols et al., 2002). To facilitate phosphopantetheinylation of the target protein, the plasmid was transformed into the E. coli BL21(DE3) BAP1 cell line (Pfeifer et al., 2001), which constitutively expresses sfp, encoding a nonspecific phosphopantetheinyl transferase (Quadri et al., 1998). The expression strain also contained the pRare2-CDF (Whicher et al., 2013) plasmid. These cells were made competent by the Mix & Go! E. coli Transformation Kit (Zymo Research). Terrific Broth (TB) cultures containing 100 µg ml−1 ampicillin and 50 µg ml−1 spectinomycin were grown at 37°C with shaking at 225 rev min−1 until an OD600 of 1.0 was reached. The cultures were cooled to 20°C for 1 h, induced with 200 µM isopropyl β-D-1-thiogalactopyranoside (IPTG) and 2 g l−1 L-arabinose, grown for 18 h and harvested by centrifugation at 12 000g.
The cell pellet from a 1 l culture was resuspended in 70 ml lysis buffer [50 mM HEPES pH 7.8, 300 mM NaCl, 10%(v/v) glycerol, 20 mM imidazole pH 7.8], augmented with 1 mg ml−1 chicken lysozyme (Sigma), 50 µg ml−1 bovine DNase I (Sigma) and 2 mM MgCl2, and then incubated for 30 min at room temperature with agitation. Complete lysis was achieved via sonication (Branson Sonifier 450). Following centrifugation at 30 000g, the soluble fraction was collected, filtered (0.45 µm Millex-HP PES membrane filter unit, Millipore), incubated for 2 h with 5 ml packed Ni–NTA agarose beads (Qiagen) and loaded onto a glass column (Bio-Rad). The beads were washed with 100 ml lysis buffer before the protein was eluted in 40 ml elution buffer [50 mM HEPES pH 7.8, 300 mM NaCl, 10%(v/v) glycerol, 400 mM imidazole pH 7.8].
The eluate was then concentrated to 15 ml using a centrifugal filter unit (Amicon) with a 30 kDa molecular-weight cutoff (MWCO) before being diluted to 50 ml in gel-filtration buffer [50 mM HEPES pH 7, 50 mM NaCl, 10%(v/v) glycerol]. The diluted protein solution was passed through a 5 ml HiTrap Q HP anion-exchange column (Cytiva) at a flow rate of 3 ml min−1. Proteins were fractionated by a NaCl gradient (50–400 mM over 125 ml).
For a final purification step by gel filtration, proteins were concentrated to 5 ml and injected onto a Superdex 200 HiLoad 16/60 prep-grade gel-filtration column (GE Healthcare) that had been pre-equilibrated with gel-filtration buffer. Eluates were assessed by SDS–PAGE (Fig. 1). The target protein was obtained with an estimated purity of >95% and a CA2 fraction of <1%. Target fractions were pooled, concentrated, flash-frozen in liquid nitrogen and stored at −80°C.
2.2. Protein crystallization
The protein sample prepared above was thawed on ice and then dialyzed overnight at 4°C into a buffer consisting of 10 mM HEPES pH 7.8, 25 mM NaCl using Slide-A-Lyzer MINI dialysis cups with a 10 kDa MWCO. The protein was then concentrated to 8.2 mg ml−1 and broad screening with the MCSG suite (Microlytic) was performed using a Gryphon crystallization robot. Within three days, crystals of diverse morphology grew in many conditions (Table 2). Crystals were harvested directly from the growth conditions and flash-cooled by plunging them into liquid nitrogen. Once it was found that the crystals from the initial broad screen did not contain the protein of interest, they were not optimized further. This expression construct was abandoned in favour of a strategy that yielded a sample with higher purity.
2.3. Data collection and processing
Data were reduced and scaled using XDS (Kabsch, 2010; Table 3).
2.4. Structure solution and refinement
Molecular replacement (MR) in Phaser (McCoy et al., 2007) using a homolog of the target protein failed. We then searched the PDB for matching lattice parameters and identified CA2 (PDB entry 1i6p; Cronk et al., 2001) as a match for CA2 crystal form 2. MR via Phaser was then carried out using PDB entry 1i6p as a search model. At this point, CA2 contamination was suspected in the other crystals, so the high-resolution structure (PDB entry 9eat) was used as an MR search model for the CA2 samples in crystal forms 4 and 5. Refinement of all models was performed using iterative rounds of phenix.refine (Afonine et al., 2012) and manual model building in Coot (Emsley et al., 2010). The data from crystal form 2 exhibited a strong anomalous signal, presumably due to the tightly bound Zn2+ ion, so the f′ and f′′ contributions of Zn2+ were refined for this data set. All structural figures were created using PyMOL (Schrödinger). Structure validation was performed with MolProbity (Chen et al., 2010). Refinement statistics are summarized in Table 4.
3. Results and discussion
We report two novel crystal forms of E. coli CA2 that was obtained as a purification contaminant. These lattice parameters can be added to the list of crystal forms of CA2, saving time in the case of contamination. In addition to two novel crystal forms, we present a new, high-resolution (1.43 Å) view of CA2 (Fig. 2). The presumed Zn2+ ion is coordinated with tetrahedral geometry by Cys42, Asp44, His98 and Cys101. The coordinate bond lengths, as seen in the high-resolution structure, are Zn—SG(Cys42) at 2.2 Å (range 2.2–2.2 Å), Zn—OD2(Asp44) at 2.1 Å (range 1.9–2.1 Å), Zn—NE2(His98) at 2.1 Å (range 2.0–2.1 Å) and Zn—SG(Cys101) at 2.3 Å (range 2.2–2.3 Å), with the ranges reflecting observations from all models reported in this study.
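The coordination geometry described above can be checked numerically from atomic coordinates. The following is a minimal sketch, not the procedure used in this study: the function names are hypothetical, and the example coordinates are synthetic (an ideal tetrahedron scaled to 2.2 Å), not values taken from any deposited model. For an ideal tetrahedral site every ligand–metal–ligand angle is about 109.5°.

```python
import math
from itertools import combinations


def angle_deg(center, p, q):
    """Angle p-center-q in degrees."""
    v1 = [a - c for a, c in zip(p, center)]
    v2 = [a - c for a, c in zip(q, center)]
    dot = sum(x * y for x, y in zip(v1, v2))
    cos_angle = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def coordination_summary(metal, ligands):
    """Return metal-ligand distances and ligand-metal-ligand angles.

    metal   -- (x, y, z) of the metal ion in angstroms
    ligands -- dict mapping an atom label to its (x, y, z) in angstroms
    """
    distances = {name: math.dist(metal, xyz) for name, xyz in ligands.items()}
    angles = {
        (n1, n2): angle_deg(metal, ligands[n1], ligands[n2])
        for n1, n2 in combinations(ligands, 2)
    }
    return distances, angles


# SYNTHETIC example: ideal tetrahedron with 2.2 A bonds around the origin.
# The labels follow the text; the coordinates are NOT from a real structure.
_s = 2.2 / math.sqrt(3.0)
example_metal = (0.0, 0.0, 0.0)
example_ligands = {
    "SG(Cys42)": (_s, _s, _s),
    "OD2(Asp44)": (_s, -_s, -_s),
    "NE2(His98)": (-_s, _s, -_s),
    "SG(Cys101)": (-_s, -_s, _s),
}
```

Calling `coordination_summary(example_metal, example_ligands)` returns one distance per ligand and six angles. With real coordinates, large departures from the tetrahedral angle or from typical Zn–ligand distances would suggest a misassigned ion or ligand.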
The identity of a protein in a crystal is technically uncertain until the structure is solved. When structure solution fails despite high-quality data with no obvious pathologies, the contents of the crystal should be considered. This study highlights the unfortunate reality that even off-target macromolecules with low (<1%) abundance may readily crystallize. It is always useful to search the PDB for an entry with symmetry and cell constants that match the indexed data.
Sometimes, as was the case for the data sets resulting from CA2 crystal forms 4 and 5, there is no match for the cell and symmetry in the PDB. The fortuitous discovery of a CA2 crystal in the previously characterized form 2 from the same protein sample revealed the contaminant (Table 1). This observation led to successful MR structure determinations for the other two data sets with CA2 as a search model, yielding CA2 structures in crystal forms 4 and 5.
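The lattice comparison described above can be sketched in a few lines. This is an illustrative sketch only, under several assumptions: the names `UnitCell`, `cells_match` and `find_matches` are hypothetical, the tolerances (2% on cell lengths, 1° on angles) are arbitrary choices, and the reference list must be supplied by the user, for example from the PDB entries of known contaminants. The comparison is naive: it requires the same space-group symbol and axis order, and does not handle alternative settings or reduced cells, which dedicated tools treat properly.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class UnitCell:
    """Unit-cell lengths in angstroms and angles in degrees."""
    a: float
    b: float
    c: float
    alpha: float
    beta: float
    gamma: float


def cells_match(query: UnitCell, ref: UnitCell,
                length_tol: float = 0.02, angle_tol: float = 1.0) -> bool:
    """True if each length agrees within a fractional tolerance of the
    reference and each angle agrees within angle_tol degrees."""
    lengths_ok = all(
        abs(q - r) <= length_tol * r
        for q, r in zip((query.a, query.b, query.c), (ref.a, ref.b, ref.c))
    )
    angles_ok = all(
        abs(q - r) <= angle_tol
        for q, r in zip((query.alpha, query.beta, query.gamma),
                        (ref.alpha, ref.beta, ref.gamma))
    )
    return lengths_ok and angles_ok


def find_matches(query: UnitCell, query_space_group: str,
                 references: Iterable[Tuple[str, str, UnitCell]]) -> List[str]:
    """Return labels of reference entries whose space group is identical
    and whose cell matches the query within the default tolerances."""
    return [
        label
        for label, space_group, cell in references
        if space_group == query_space_group and cells_match(query, cell)
    ]


# PLACEHOLDER reference list: these cells are invented for illustration
# and are NOT the parameters of any CA2 crystal form.
example_references = [
    ("entry-A", "P 41 21 2", UnitCell(80.0, 80.0, 160.0, 90.0, 90.0, 90.0)),
    ("entry-B", "C 2 2 21", UnitCell(60.0, 90.0, 120.0, 90.0, 90.0, 90.0)),
]
```

In practice the reference list would be filled with the deposited cell parameters of suspected contaminants, and an empty result, as for CA2 crystal forms 4 and 5 here, does not rule out contamination by a protein in a previously unreported crystal form.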
When working with a new data set that has no matches in the PDB, alternative diagnostic approaches are available. If the amount of crystalline material permits, the size of the crystallized macromolecule may be estimated by SDS–PAGE, or more accurately assessed by mass spectrometry.
While researchers have had success with an exhaustive, iterative MR campaign using the full PDB as search models (Keegan et al., 2016), tools have been developed to solve contaminant structures by using MR more efficiently. MarathonMR uses a subset of the PDB based on fold families (Hatti et al., 2017). For common contaminants, ContaMiner performs automated MR against common suspects in the ContaBase database (Hungler et al., 2016). SIMBAD combines several strategies by first searching unit-cell parameters and then screening for common contaminants, before finally performing a brute-force search of a nonredundant subset of the PDB (Simpkin et al., 2018).
Sometimes the contaminating protein comes not from the expression source, but from contaminating cells. Serratia proteamaculans was suspected to have contaminated Trichoplusia ni, as the cyanate hydratase CynS co-purified with the target protein and formed well diffracting crystals (Butryn et al., 2015). Mass spectrometry identified CynS, and MR was successful. The Serratia genus appears to be notorious for cell contamination, as different laboratories have reported contamination with Serratia CynS (Pederzoli et al., 2020) and glycerol dehydrogenase (Musille & Ortlund, 2014) when expressing targets in E. coli.
Although the interference of contaminating proteins in a structural biology project is frustrating, it may sometimes lead to exciting results. Trace lysozyme added to cells during lysis formed a heterotrimeric complex that facilitated crystallization of the cortactin–Arg complex (Liu et al., 2012). Crystallographic analysis of co-purified contaminating proteins has also yielded novel structures. Examples include the yeast nicotinamidase Pnc1p (Hu et al., 2007), the putative cysteine hydrolase YcaC from Pseudomonas aeruginosa (Grøftehauge et al., 2015) and the Achromobacter sp. bacterioferritin Dh1f (Dwivedy et al., 2018).
Engineering approaches may also minimize the chances of co-eluting proteins when using a nickel-affinity purification strategy. Cell lines such as E. coli LOBSTR (low background strain) have been developed by modifying the arnA and slyD genes of E. coli BL21(DE3) such that the encoded proteins exhibit weaker binding to Ni–NTA resin (Andersen et al., 2013). Similarly, in the engineered NiCo21(DE3) E. coli strain, DNA encoding a chitin-binding domain is appended to the 3′ ends of slyD, can and arnA, allowing chitin-resin depletion of the corresponding problematic proteins. In this strain, glmS has also been altered to produce a protein that binds nickel resin with lower affinity.
While usually unwelcome, crystals of unintended targets may yield new results. We present a high-quality carbonic anhydrase 2 structure that may serve as a new standard for structural studies. We also report two CA2 structures in new crystal forms, whose lattice parameters may save time when others encounter the same contaminant.
Acknowledgements
We thank the beamline staff at the GM/CA sector of the Advanced Photon Source (APS) for their support. GM/CA@APS has been funded by the National Cancer Institute (ACB-12002) and the National Institute of General Medical Sciences (AGM-12006, P30 GM138396). This research used resources of the Advanced Photon Source, a US Department of Energy (DOE) Office of Science User Facility operated for the DOE Office of Science by Argonne National Laboratory under Contract No. DE-AC02-06CH11357.
Conflict of interest
The authors declare no conflicts of interest.
Funding information
The following funding is acknowledged: National Institutes of Health, National Institute of Diabetes and Digestive and Kidney Diseases (grant No. R01 DK042303 to Janet L. Smith); National Institutes of Health, National Cancer Institute (grant No. F31 CA265082 to Michael R. Rankin); National Institutes of Health, National Institute of General Medical Sciences (grant No. T32 GM145304).
References
Afonine, P. V., Grosse-Kunstleve, R. W., Echols, N., Headd, J. J., Moriarty, N. W., Mustyakimov, M., Terwilliger, T. C., Urzhumtsev, A., Zwart, P. H. & Adams, P. D. (2012). Acta Cryst. D68, 352–367.
Andersen, K. R., Leksa, N. C. & Schwartz, T. U. (2013). Proteins, 81, 1857–1861.
Bolanos-Garcia, V. M. & Davies, O. R. (2006). Biochim. Biophys. Acta, 1760, 1304–1313.
Butryn, A., Stoehr, G., Linke-Winnebeck, C. & Hopfner, K.-P. (2015). Acta Cryst. F71, 471–476.
Chai, L., Zhu, P., Chai, J., Pang, C., Andi, B., McSweeney, S., Shanklin, J. & Liu, Q. (2021). Crystals, 11, 1227.
Chen, V. B., Arendall, W. B., Headd, J. J., Keedy, D. A., Immormino, R. M., Kapral, G. J., Murray, L. W., Richardson, J. S. & Richardson, D. C. (2010). Acta Cryst. D66, 12–21.
Cronk, J. D., Endrizzi, J. A., Cronk, M. R., O'Neill, J. W. & Zhang, K. Y. J. (2001). Protein Sci. 10, 911–922.
Cronk, J. D., Rowlett, R. S., Zhang, K. Y. J., Tu, C., Endrizzi, J. A., Lee, J., Gareiss, P. C. & Preiss, J. R. (2006). Biochemistry, 45, 4351–4361.
David, G., Blondeau, K., Schiltz, M., Penel, S. & Lewit-Bentley, A. (2003). J. Biol. Chem. 278, 43728–43735.
Doyle, J. & Cooper, J. S. (2024). StatPearls. Treasure Island: StatPearls Publishing.
Dwivedy, A., Jha, B., Singh, K. H., Ahmad, M., Ashraf, A., Kumar, D. & Biswal, B. K. (2018). Acta Cryst. F74, 558–566.
Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. (2010). Acta Cryst. D66, 486–501.
Falgenhauer, E., von Schönberg, S., Meng, C., Mückl, A., Vogele, K., Emslander, Q., Ludwig, C. & Simmel, F. C. (2021). ChemBioChem, 22, 2805–2813.
Funakoshi, A., Tsubota, Y., Fujii, K., Ibayashi, H. & Takagi, Y. (1980). J. Biochem. 88, 1113–1118.
Grøftehauge, M. K., Truan, D., Vasil, A., Denny, P. W., Vasil, M. L. & Pohl, E. (2015). Int. J. Mol. Sci. 16, 15971–15984.
Grzechowiak, M., Sekula, B., Jaskolski, M. & Ruszkowski, M. (2021). Acta Biochim. Pol. 68, 29–31.
Harper, S. & Speicher, D. W. (2011). Methods Mol. Biol. 681, 259–280.
Hatti, K., Biswas, A., Chaudhary, S., Dadireddy, V., Sekar, K., Srinivasan, N. & Murthy, M. R. N. (2017). J. Struct. Biol. 197, 372–378.
Hu, G., Taylor, A. B., McAlister-Henn, L. & Hart, P. J. (2007). Arch. Biochem. Biophys. 461, 66–75.
Hungler, A., Momin, A., Diederichs, K. & Arold, S. T. (2016). J. Appl. Cryst. 49, 2252–2258.
Kabsch, W. (2010). Acta Cryst. D66, 125–132.
Keegan, R., Waterman, D. G., Hopper, D. J., Coates, L., Taylor, G., Guo, J., Coker, A. R., Erskine, P. T., Wood, S. P. & Cooper, J. B. (2016). Acta Cryst. D72, 933–943.
Lebendiker, M. & Danieli, T. (2017). Methods Mol. Biol. 1485, 257–273.
Liebschner, D., Afonine, P. V., Moriarty, N. W., Poon, B. K., Sobolev, O. V., Terwilliger, T. C. & Adams, P. D. (2017). Acta Cryst. D73, 148–157.
Liu, W., MacGrath, S. M., Koleske, A. J. & Boggon, T. J. (2012). Acta Cryst. F68, 154–158.
McCoy, A. J., Grosse-Kunstleve, R. W., Adams, P. D., Winn, M. D., Storoni, L. C. & Read, R. J. (2007). J. Appl. Cryst. 40, 658–674.
Merlin, C., Masters, M., McAteer, S. & Coulson, A. (2003). J. Bacteriol. 185, 6415–6424.
Musille, P. & Ortlund, E. (2014). Acta Cryst. F70, 166–172.
Niedzialkowska, E., Gasiorowska, O., Handing, K. B., Majorek, K. A., Porebski, P. J., Shabalin, I. G., Zasadzinska, E., Cymborowski, M. & Minor, W. (2016). Protein Sci. 25, 720–733.
Parsy, C. B., Chapman, C. J., Barnes, A. C., Robertson, J. F. & Murray, A. (2007). J. Chromatogr. B, 853, 314–319.
Pederzoli, R., Tarantino, D., Gourlay, L. J., Chaves-Sanjuan, A. & Bolognesi, M. (2020). Acta Cryst. F76, 392–397.
Pfeifer, B. A., Admiraal, S. J., Gramajo, H., Cane, D. E. & Khosla, C. (2001). Science, 291, 1790–1792.
Quadri, L. E. N., Weinreb, P. H., Lei, M., Nakano, M. M., Zuber, P. & Walsh, C. T. (1998). Biochemistry, 37, 1585–1595.
Robichon, C., Luo, J., Causey, T. B., Benner, J. S. & Samuelson, J. C. (2011). Appl. Environ. Microbiol. 77, 4634–4646.
Simpkin, A. J., Simkovic, F., Thomas, J. M. H., Savko, M., Lebedev, A., Uski, V., Ballard, C., Wojdyr, M., Wu, R., Sanishvili, R., Xu, Y., Lisa, M.-N., Buschiazzo, A., Shepard, W., Rigden, D. J. & Keegan, R. M. (2018). Acta Cryst. D74, 595–605.
Stols, L., Gu, M., Dieckman, L., Raffen, R., Collart, F. R. & Donnelly, M. I. (2002). Protein Expr. Purif. 25, 8–15.
Tropea, J. E., Cherry, S. & Waugh, D. S. (2009). Methods Mol. Biol. 498, 297–307.
Whicher, J. R., Smaga, S. S., Hansen, D. A., Brown, W. C., Gerwick, W. H., Sherman, D. H. & Smith, J. L. (2013). Chem. Biol. 20, 1340–1351.
This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.