Structure of the archaeal chemotaxis protein CheY in a domain-swapped dimeric conformation

Che proteins are part of the signal transduction pathway between stimulus and response, mediated by the archaellum. The structure of the chemotaxis protein CheY was determined in two different crystal forms. The novel CheF protein that connects chemotaxis signaling to the motility apparatus was expressed and crystallized, and data were acquired by X-ray diffraction.


Introduction
Archaea and bacteria share the ability to move in response to chemical or physical stimuli towards favorable growth conditions (Marwan & Oesterhelt, 2000). Motility is based on the rotation of the flagellum (in bacteria) or the archaellum (in archaea; formerly known as the archaeal flagellum), and the directionality of the movement is provided by modulating the switching frequency in response to the stimulus (Armitage, 1999). The molecular basis underlying taxis is composed of two systems: chemotaxis signal transduction, which processes the external stimulus, and the flagellum/archaellum, which responds to the chemotaxis output signal.
In bacteria, CheY-P interacts with the flagellar motor switch protein FliM (Welch et al., 1993). The CheY-P-FliM interaction has been shown to be responsible for increasing the probability of a switch in the rotational direction of the flagellum (Berg, 2003). In archaea, no homologs of FliM have been identified, and the interaction of CheY-P with different partners in bacteria and archaea has been considered to be a factor that separates the archaeal system of motility from the bacterial system of motility.
In interactomic studies, we have recently identified three candidate proteins in Halobacterium salinarum that are involved in the interaction of Che and Fla proteins (Schlesner et al., 2009). Analysis of deletion strains provided compelling evidence that two open reading frames in particular, OE2401F and CheF1 (OE2402F), are essential for controlling the directionality of archaellar rotation. OE2401F encodes a HEAT_PBS or HEAT family protein comprised of bihelical HEAT-like repeats. CheF1 encodes a protein from the conserved CheF-arch protein domain family (previously DUF439), which is exclusively found in Euryarchaea. CheF genes are consistently located in the chemotaxis gene regions and their occurrence is strictly correlated with the presence of che genes (Schlesner et al., 2009, 2012). Protein-protein interaction analysis of the halobacterial Che proteins revealed that CheF1 directly interacts with CheY, CheD and CheC2, as well as with ArlCE (FlaCE, OE_2386R). As such, CheF1 is proposed to provide the missing function of bridging the chemotaxis signal transduction system and the motility apparatus, thereby representing the factor that connects the Che cascade, which is shared by archaea and bacteria, to the archaea-specific motility apparatus. The structure of an archaeal CheY has recently been determined and its interplay with CheF has been analyzed, providing a basic molecular understanding of how a conserved chemotaxis system can target the entirely different motility structures in bacteria and archaea (Quax, Altegoer et al., 2018). Here, we present single-crystal X-ray structures of CheY from Pyrococcus horikoshii (PhCheY) in two different crystal forms, and the protein purification, crystallization and X-ray data collection of CheF (PhCheF). Our data support the conservation of the bacterial and archaeal response-regulator proteins.
We observe PhCheY to have a domain-swapped, pseudo-dimeric fold, which may reflect inherent properties of the protein fold but is not likely to be of physiological relevance.

Macromolecule production
The coding sequences for PhCheF and PhCheY were provided by a synthetic plasmid carrying the ORFs PH0494 (PhCheF) and PH0482 (PhCheY) and were amplified by PCR with PH0494-specific primers (forward primer AAGGAGATATACATATGCCGATCTTTGAAGCCCG; reverse primer GGTGGTGGTGCTCGAGCATGCTCACCAGGCCATATTTC) and PH0482-specific primers (forward primer AAGGAGATATACATATGGCTCGTGTTCTGGTTGT; reverse primer GGTGGTGGTGCTCGAGACTAGACAGCACACGGATTCAC). In an In-Fusion cloning reaction (Clontech, Japan), the gel-purified fragments were ligated with NdeI- and XhoI-linearized pET-22b(+) (Novagen, USA), yielding the plasmids pDW01-1 and pDW02-1 for the expression of PhCheF and PhCheY as C-terminally His-tagged proteins. For the construction of non-His-tagged variants, we removed the His tag by inserting TAG codons upstream of the XhoI restriction site by site-directed mutagenesis (plasmids pDW01-2 and pDW02-2).
For heterologous expression of proteins in Escherichia coli, the plasmids were transformed into BL21 Gold (DE3) cells (Agilent Technologies, USA). Single colonies were used to inoculate 35 ml LB medium containing 100 mg ml⁻¹ ampicillin and were incubated at 37 °C for 16 h. This preculture was used to inoculate 2 l TB medium, which was grown at 37 °C and 180 rev min⁻¹ until the mid-to-late log phase (OD₆₀₀ = 0.8-1.0) before inducing expression at 20 °C with 0.5 mM IPTG. The cells were harvested after 16 h of protein expression, and the cell pellets were frozen in liquid nitrogen and stored at −80 °C until further use. For purification, the cells were resuspended in appropriate buffers containing protease inhibitors (Roche, Switzerland) and DNaseI (Applichem, Germany) and were lysed using a French press. The lysates were centrifuged for 1 h at 4 °C and 47 000g. After centrifugation, the supernatants were subjected to purification protocols. The non-His-tagged proteins PhCheF and PhCheY (in buffer H; 20 mM Tris-HCl pH 7.5, 100 mM NaCl, 5 mM DTT) were purified by heating the supernatant to 80 °C for 20 min followed by centrifugation of the precipitated biomolecules at 24 000g for 20 min. As analyzed by SDS-PAGE, this treatment precipitated most of the proteins of the E. coli expression host, while the thermostable proteins PhCheF and PhCheY remained soluble. Concentration of PhCheF and PhCheY followed by size-exclusion chromatography (SEC) using a Superdex 200 26/60 column (GE Healthcare, USA), which removed the remaining proteins and soluble biomolecules (such as metabolites and nucleic acids), eventually yielded proteins that were suitable for biomolecular analysis and crystallization. His-tagged proteins were purified by nickel-chelating affinity chromatography [standard protocol; wash buffer W (20 mM Tris-HCl pH 7.5, 100 mM NaCl, 20 mM imidazole) and buffer E (the same as buffer W but with 500 mM imidazole)] and SEC (buffer H) using a Superdex 200 26/60 column (GE Healthcare, USA).
For the in vitro pull-down assay with PhCheF and His-tagged PhCheY, 2 ml of the supernatant of each preparation (after lysis using a French press and centrifugation) were first incubated at 37 °C for 15 min (in buffer W with 20 mM imidazole). The incubated protein mixture was then added to 2.5 ml equilibrated Ni-NTA beads (in 5-7.5 ml buffer W). After further incubation at 4 °C for 1 h, the slurry was applied to a gravity-flow column. The beads were washed five times with buffer W (one column volume per step) and the proteins were then eluted with buffer E (half a column volume per step). Fractions were collected and loaded onto a gel. The elution peak fractions were also subjected to SEC using a Superdex 200 10/300 column (GE Healthcare, USA). Ni-NTA pull-downs were also performed with 5 mM BeSO₄ as well as 50 mM NaF in the respective buffers to obtain the BeF₃⁻ species mimicking phosphorylated PhCheY (Lee et al., 2001). Macromolecule-production information is summarized in Table 1.

Data collection and processing
Multi-wavelength anomalous dispersion (MAD) data for PhCheF were collected on beamline PX2 at the Swiss Light Source synchrotron facility, Villigen, Switzerland. The MAD data set was collected at three wavelengths: 0.9795 Å (peak), 0.9797 Å (inflection) and 0.9718 Å (remote). For this experiment, the crystal in the droplet was transferred into a cryosolution consisting of the mother liquor supplemented with 20%(v/v) ethylene glycol for 1 min and was then cryocooled by plunging it into liquid nitrogen. X-ray diffraction data were recorded on a PILATUS 6M detector (Dectris) while the crystal was held in a gaseous N₂ stream at 100 K.
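The three MAD wavelengths quoted above can be expressed as photon energies using E = hc/λ. The following is a minimal sketch of that conversion; the function and variable names are illustrative assumptions and are not part of the data-collection software used in this work.

```python
# Convert X-ray wavelengths (in angstroms) to photon energies (in keV)
# using E = h*c / lambda, with h*c expressed in keV * angstrom.
HC_KEV_ANGSTROM = 12.398420  # product of the Planck constant and the speed of light


def wavelength_to_kev(wavelength_angstrom: float) -> float:
    """Return the photon energy in keV for a wavelength given in angstroms."""
    if wavelength_angstrom <= 0:
        raise ValueError("wavelength must be positive")
    return HC_KEV_ANGSTROM / wavelength_angstrom


# Wavelengths as stated in the text for the PhCheF MAD experiment.
mad_wavelengths = {"peak": 0.9795, "inflection": 0.9797, "remote": 0.9718}
mad_energies = {name: wavelength_to_kev(w) for name, w in mad_wavelengths.items()}

for name, energy in mad_energies.items():
    print(f"{name:>10}: {mad_wavelengths[name]:.4f} A = {energy:.4f} keV")
```

Because energy is inversely proportional to wavelength, the inflection data set corresponds to the lowest energy and the remote data set to the highest.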
In the case of PhCheY, data sets were collected on beamline ID14-1 at the European Synchrotron Radiation Facility, Grenoble. The crystals were also maintained at 100 K, while data were recorded on a CCD detector (ADSC Quantum Q315r). All data sets were auto-processed and merged with the xia2 suite (Winter et al., 2013) using XDS (Kabsch, 2010) and AIMLESS (Evans & Murshudov, 2013). Data-collection and processing statistics are summarized in Table 3.

Structure solution and refinement
In the case of PhCheY, two different crystal forms were obtained in the monoclinic space groups P2 and C2. The solvent-content calculations for the P2 crystal form performed with MATTHEWS_COEF (Matthews, 1968) indicated a solvent content of 58% with three molecules in the asymmetric unit. In the case of the C2 crystal form, MATTHEWS_COEF indicated a solvent content of 57% with six molecules in the asymmetric unit.
Table 1 Macromolecule-production information. Sequences of PhCheF and PhCheY are available from UniProt with accession codes O58230 and O58193, respectively.
[...] Table 4. The PhCheF data could not be phased and therefore the structure could not be determined.
All structural figures were drawn using PyMOL (DeLano, 2002; http://www.pymol.org). Atomic coordinates and experimental structure factors for PhCheY in space groups P2 and C2 have been deposited in the PDB with accession codes 6er7 and 6exr, respectively. Raw X-ray diffraction data for both PhCheF and PhCheY are available from the Zenodo science data archive (https://doi.org/10.5281/zenodo.1148967).
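The solvent-content estimates quoted for the two PhCheY crystal forms follow from the Matthews coefficient, V_M = V_cell/(Z × n × M_r), and the conventional approximation that the solvent fraction is 1 − 1.23/V_M. The sketch below illustrates this arithmetic only; it is not the MATTHEWS_COEF program, the function names are assumptions, and the unit-cell parameters and molecular weight in the forward example are hypothetical placeholders because the actual values are not given in this section.

```python
import math

PROTEIN_VOLUME_FACTOR = 1.23  # conventional constant, in cubic angstroms per dalton


def monoclinic_cell_volume(a, b, c, beta_deg):
    """Unit-cell volume (cubic angstroms) of a monoclinic cell with unique angle beta."""
    return a * b * c * math.sin(math.radians(beta_deg))


def matthews_coefficient(cell_volume, n_symops, n_molecules_asu, mol_weight_da):
    """Matthews coefficient V_M in cubic angstroms per dalton."""
    return cell_volume / (n_symops * n_molecules_asu * mol_weight_da)


def solvent_fraction(vm):
    """Approximate solvent fraction for a given Matthews coefficient."""
    return 1.0 - PROTEIN_VOLUME_FACTOR / vm


def vm_from_solvent_fraction(fraction):
    """Invert the approximation: V_M implied by a stated solvent fraction."""
    return PROTEIN_VOLUME_FACTOR / (1.0 - fraction)


# V_M values implied by the solvent contents stated in the text.
vm_p2 = vm_from_solvent_fraction(0.58)  # P2 form, three molecules per asymmetric unit
vm_c2 = vm_from_solvent_fraction(0.57)  # C2 form, six molecules per asymmetric unit

# Forward calculation with HYPOTHETICAL inputs (not the real PhCheY cell or mass).
hypothetical_volume = monoclinic_cell_volume(60.0, 50.0, 80.0, 100.0)
hypothetical_vm = matthews_coefficient(
    hypothetical_volume, n_symops=2, n_molecules_asu=3, mol_weight_da=13500.0
)
print(f"P2: V_M = {vm_p2:.2f}, C2: V_M = {vm_c2:.2f}")
print(f"hypothetical cell: solvent = {solvent_fraction(hypothetical_vm):.1%}")
```

The number of symmetry operators used here is 2 for P2 and would be 4 for C2; the number of molecules per asymmetric unit is the quantity that is varied when deciding which cell content is plausible.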

Preparation of P. horikoshii CheF and CheY
Proteins from H. salinarum have been expressed recombinantly in E. coli, but primarily as unfolded proteins that rely on the uncertainty of refolding protocols (Marg et al., 2005; Grininger et al., 2006). Therefore, we decided to work with CheF and CheY from P. horikoshii (termed PhCheF and PhCheY, respectively), which display sequence identities of 24 and 53% to CheF1 and CheY from H. salinarum, respectively (Supplementary Fig. S1). In addition, we expected increased thermostability of these proteins (Szilágyi & Závodszky, 2000) owing to the thermophilic lifestyle of the source (P. horikoshii; Kawarabayasi et al., 1998).
The structures of CheY available in the PDB typically display a compact folded structural appearance. The subunits of CheY consistently show a β1/α1/β2/α2/β3/α3/β4/α4/β5/α5 globular fold of five helices flanking a five-stranded parallel β-sheet. The subdomain in the N-terminal region (referred to as the N-terminal domain; residues 1-53) has α-helices 1-2 packing on opposite sides of β-strands 1-3, and the C-terminal region of the molecule (referred to as the C-terminal domain; residues 61-118) has α-helices 3-5 packing against β-strands 3-5. In striking contrast to all of the known CheY structures, PhCheY adopts an open conformation in both crystal forms, with the two subdomains of a given subunit directed away from one another. In both of the PhCheY structures solved in this study, two polypeptide chains assemble by pairing the N-terminal subdomain of one chain with the C-terminal subdomain of the other chain; i.e. the α3/β4/α4/β5/α5 part of the fold of each subunit packs with the β1/α1/β2/α2 part of the other subunit [Fig. 1(a)].
Table 3 Data collection and processing. Values in parentheses are for the outer shell. All data were processed to a CC₁/₂ of 0.5.
The crystal form with space group P2 contains three noncrystallographic symmetry (NCS) molecules in the asymmetric unit. Superposition of the three subunits shows differing orientations between the domains, leading to an interdomain movement of up to 90° [Fig. 1(b)]. The N- and C-terminal domains of PhCheY superimpose well onto each other in the crystal structure. The closest structural homolog is CheY from T. maritima (Usher et al., 1998). The r.m.s.d. diagram shows the largest overall deviation in the β3-α3 loop hinge region (values exceeding 3 Å) as well as in the 5-structure (residues 54-60) [Fig. 1(c)], facilitating the interdomain movement.
Figure 1 Data for the crystal form in space group P2 are shown. (a) Overall structure of PhCheY with the protomers of a dimeric structure colored cyan and magenta. PhCheY retains the overall (β/α)₅ fold of CheY, but shows a different packing by swapping about half of the fold. Three molecules are found in the asymmetric unit. Chain A is shown in cyan and the symmetry-related chain A′ is in magenta. The chains form a total interface of 2610 Å² for the AA′ interaction (the values are 2460 Å² for the BB′ interaction and 2420 Å² for the CC′ interaction). (b) Superposition of the polypeptide chains within the asymmetric unit. The N-terminal parts of the chains (residues 1-53) have been superimposed. Superpositions were calculated with the LSQ tool (least-squares fit) in Coot (Emsley et al., 2010). (c) Superposition of PhCheY monomers (top; ribbon representation of backbone) and of a PhCheY pseudomonomer (reconstituted from chains A and A′) with T. maritima CheY (middle; PDB entry 1tmy; Usher et al., 1998). R.m.s.d. plot showing deviations from a chain A/A′ PhCheY pseudomonomer (bottom). Values were calculated with the SSM (secondary-structure match) tool in Coot (Emsley et al., 2010).
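A per-residue deviation profile of the kind plotted in Fig. 1(c) can be sketched as the distance between matched Cα positions of two structures that have already been superimposed. The code below is only an illustration under that assumption: it does not perform the superposition itself (done with Coot in this work), the function names are hypothetical, and the coordinates are invented toy values rather than PhCheY data.

```python
import math


def per_residue_deviation(coords_a, coords_b):
    """Distances (angstroms) between matched atoms of two superimposed chains."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    return [math.dist(p, q) for p, q in zip(coords_a, coords_b)]


def overall_rmsd(deviations):
    """Root-mean-square deviation over all matched atoms."""
    if not deviations:
        raise ValueError("no deviations supplied")
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


def residues_above(deviations, threshold=3.0, first_residue=1):
    """Residue numbers whose deviation exceeds the threshold (default 3 angstroms)."""
    return [first_residue + i for i, d in enumerate(deviations) if d > threshold]


# Toy, invented C-alpha coordinates for three matched residues.
toy_chain_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
toy_chain_b = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 4.0, 0.0)]

toy_deviations = per_residue_deviation(toy_chain_a, toy_chain_b)
print(toy_deviations, overall_rmsd(toy_deviations), residues_above(toy_deviations))
```

Flagging residues above a 3 Å threshold mirrors the way the hinge region is singled out in the text, although the threshold and numbering offset are free parameters of this sketch.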
In the case of the C2 crystal form, there are six molecules in the asymmetric unit. Four of the subunits (A, B, D and F) interact with one another in a dimeric-type domain-swapped assembly as described above. The other two subunits, C and E, interact with their own crystallographic symmetry-related molecules to form the dimeric assembly with swapped domains. Superposition of the N-terminal subdomain among the different subunits A, B, C, D, E and F reveals that the C-terminal domain of subunits B, C, D, E and F undergoes a relative movement of up to 57° with respect to the N-terminal domain. In all cases, the rotational movement of the C-terminal domain was calculated after the initial superposition of the N-terminal domain. The connecting region in the β3-α3 loop is partially disordered for the two subunits C and E. For both crystal forms an unambiguous trace of the electron density in the β3-α3 loop was verified by a feature-enhanced map (FEM; Afonine et al., 2015; shown in Supplementary Fig. S2 for the P2 crystal form).
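The interdomain rotation described above can be quantified from the rotation matrix that relates the C-terminal domains of two chains once their N-terminal domains have been superimposed: for a proper rotation matrix R, the rotation angle is arccos[(tr R − 1)/2]. The following is a minimal sketch of that final step only. It assumes that the 3 × 3 rotation matrix has already been obtained from a least-squares fit; the helper that builds a test matrix is purely illustrative.

```python
import math


def rotation_angle_deg(rot):
    """Rotation angle (degrees) of a 3x3 proper rotation matrix given as nested lists."""
    trace = rot[0][0] + rot[1][1] + rot[2][2]
    # Clamp to [-1, 1] to guard against rounding errors before calling acos.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


def rotation_about_z(angle_deg):
    """Illustrative helper: rotation matrix for a rotation about the z axis."""
    t = math.radians(angle_deg)
    return [
        [math.cos(t), -math.sin(t), 0.0],
        [math.sin(t), math.cos(t), 0.0],
        [0.0, 0.0, 1.0],
    ]


# Example: a synthetic 57 degree rotation is recovered from its matrix.
example_matrix = rotation_about_z(57.0)
print(f"rotation angle: {rotation_angle_deg(example_matrix):.1f} degrees")
```

The angle obtained this way is independent of the direction of the rotation axis, so it reports the magnitude of the domain movement but not its hinge geometry.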
CheY has been discussed as the evolutionary ancestor fold of periplasmic binding proteins, such as the glucose/galactose-binding protein MglB. A domain swap, producing a pseudo-dimeric fold, has been suggested to be a key event in this process, and has been observed for the CheY-like protein Spo0A (Fukami-Kobayashi et al., 1999; Lewis et al., 2000) [Fig. 1(d)].

Comparison of archaeal CheY from P. horikoshii (PhCheY) and M. maripaludis
CheY from M. maripaludis has recently been structurally analyzed in the BeF₃⁻/NaF-activated state, mimicking phosphorylation of the active aspartate Asp57 (Asp53 in PhCheY), and in the non-activated state (Quax, Altegoer et al., 2018). The structural comparison revealed a repositioning of helix 4 accompanied by the displacement of the β4-α4 loop upon BeF₃⁻ binding, overall corresponding well to the structural changes that occur in E. coli and T. maritima CheY during activation (Lee et al., 2001; Ahn et al., 2013). The 'back-swapped' PhCheY pseudomonomer (reconstituted from chains A and A′) structure compares well with the structure of the non-activated CheY from M. maripaludis (r.m.s.d. of 1.1 Å, Z-score 22.1; Holm & Laakso, 2016) [Fig. 2(a)]. The positions of the active Asp53, as well as Thr81 and Tyr100, involved in translating the phosphorylation signal to a physiological output (Tyr-Thr coupling; Zhu et al., 1996), also superimpose well, although Tyr100 adopts a different rotamer position [Fig. 2(b)]. The accumulated negative charge at the N-terminal region of helix 4 in M. maripaludis CheY has been suggested to provide an archaea-specific interface for CheF interaction (Quax, Altegoer et al., 2018). As indicated by a qualitative surface-potential representation [Fig. 2(c)], PhCheY may be less negatively charged than M. maripaludis CheY, since it lacks a negatively charged amino acid at the position equivalent to Asp88 (Gly84 in PhCheY), as is also the case for other archaeal CheYs [Fig. 2(d)]. Glu91 is in a different rotamer position in PhCheY and may also contribute to the less developed negative surface potential [see Fig. 2(c)].

Biochemical characterization of interactions of CheF and CheY
A difficult aspect in interaction studies of CheF and CheY is that CheY is supposed to interact with CheF in a phosphorylated state (CheY-P), which is, however, not accessible from recombinant expression. As a generally accepted treatment for mimicking phosphorylated states, we therefore incubated PhCheY with 5 mM BeSO₄ and 30 mM NaF to modify PhCheY with the phosphoryl analog BeF₃⁻ (Lee et al., 2001). The SEC elution profiles did not reveal higher apparent molecular masses when CheY and CheF were eluted together, indicating no complex formation [Fig. 3(a)]. While cross-linking with glutaraldehyde also failed to detect an interaction of the proteins under the selected conditions [Fig. 3(b)], an Ni-NTA pull-down assay eventually allowed the specific interaction of the proteins to be monitored, with His-tagged PhCheY retaining PhCheF [Fig. 3(c)] during chromatographic elution. Both conditions, BeF₃⁻-free and BeF₃⁻-treated PhCheY, gave similar elution profiles. The complex observed in Ni-NTA pull-downs was not preserved during SEC. From Coomassie-stained SDS-PAGE of Ni-NTA pull-downs, PhCheY appears to be present in a molar excess, which might, however, result from nonsaturated PhCheY under the experimental conditions. We note that SEC showed the proteins to be monomeric after purification, with PhCheF having some tendency to form a higher oligomeric species under the chosen conditions [see Fig. 3(a)]. Data collected during the biochemical characterization of CheY and CheF therefore do not support a physiological relevance of the dimeric CheY species that are observed in the crystal structures.

Discussion
Two-component systems regulate a variety of fundamental processes in metabolism and motility, as well as a set of more specialized processes such as in virulence and development (Zschiedrich et al., 2016). A prototypical response regulator is composed of two domains: a receiver domain and an output domain. The receiver domain operates in a highly conserved mode of accepting a phosphoryl modification from the histidine kinase and forwarding this information to an effector domain. The effector domain triggers the output response, and the variety of effector domains allows a large number of responses regulated by two-component signals (Gao & Stock, 2009).
Owing to its important role in signal transduction, CheY has been intensively studied in recent decades. CheY proteins have been characterized as standalone proteins, resting in an equilibrium of non-activated and activated states that is shifted in response to phosphorylation (Lowry et al., 1994; Lee et al., 2001; Gardino & Kern, 2007). CheY is consistently described as monomeric, even at high concentrations and independent of the phosphorylation state. As the phosphorylated state is inherently unstable, with half-lives from seconds to several hours under ambient conditions (Swanson et al., 1996; Sanna et al., 1995), analysis of the conformational state of activated CheY is complicated, and for structural studies phosphorylation has mainly been mimicked by using BeF₃⁻-containing buffers (Lee et al., 2001). In the current working model for CheY, the activated state is read out by a subtly changed protein surface (Lee et al., 2001; Gao & Stock, 2009; Quax, Altegoer et al., 2018).
research communications Acta Cryst. (2019). F75, 576-585
While CheY has been broadly characterized in structure and function, a subdomain swap, as observed in PhCheY, has not been reported before. However, it was found that the CheY α/β protein fold itself does allow domain swapping. Dimerization by domain swapping has been observed for the CheY-homologous sporulation response regulator Spo0A, although the physiological relevance of this structure was questioned in this case owing to nonphysiological crystallization at low pH (Lewis et al., 2000). Further, the pseudodimeric fold of periplasmic binding proteins such as MglB [see Fig. 1(d)] has been suggested to have evolved from a CheY-like ancestor protein by a domain swap (Fukami-Kobayashi et al., 1999). When considering structural and conformational properties of the CheY fold, it is also informative to regard the folding properties of CheY. Studies on CheY revealed a heterogeneous folding trajectory, which is composed of N- and C-terminal subdomain folding in a hierarchical fashion with subdomain borders found swapped as in this study (Hills & Brooks, 2008; López-Hernández & Serrano, 1996). Both the structural properties of the CheY fold and the folding kinetics of CheY support the view of CheY being composed of two separate subdomains. The assessment of the physiological relevance of the domain-swapped dimeric PhCheY is strongly connected to the question of whether reversible domain swapping could occur on a time scale that is relevant for signaling.
Figure 2 Comparative analysis of CheY. (a) Superposition of the PhCheY pseudomonomer (reconstituted from chains A and A′) with M. maripaludis CheY (PDB entries 6ekg and 6ekh; Quax, Altegoer et al., 2018). Views and arrangement are as in Fig. 1(c). Superpositions were calculated with the LSQ tool (least-squares fit) in Coot (Emsley et al., 2010). PhCheY is in magenta/cyan and BeF₃⁻/NaF-activated and non-activated M. maripaludis CheY are in orange and yellow, respectively. The gray background refers to (b).
To switch between a monomeric and a dimeric state, an interface of roughly 1300 Å² between the subdomains of PhCheY would need to dissociate and reassociate in response to phosphorylation. Considering that the active aspartate (Asp53) is located in a positionally variable loop, it is unlikely that phosphorylation could efficiently induce the dissociation of the fold and enrich a 'swapped' form. Therefore, a physiological role for a domain-swapped dimeric PhCheY has to be ruled out. A physiological role of a domain-swapped dimer would also contradict the current understanding of the function of CheY (Quax, Altegoer et al., 2018).
Crystallization of PhCheY as a domain-swapped dimer can rather be explained as follows: PhCheY is monomeric in solution, as suggested by SEC [see Fig. 3(a)] and by SDS-PAGE of cross-linking experiments [see Fig. 3(b)]. Owing to a putative cold-induced destabilization of PhCheY (P. horikoshii grows at 98 °C), and putatively further induced by the crystallization conditions, the PhCheY fold disassembles into an open subdomain-dissociated conformation. At the high protein concentration in the crystallization drop, PhCheY can then reassemble from this open state with a partner polypeptide to form a domain-swapped dimeric assembly that is eventually removed from the monomer-dimer equilibrium by crystallization.
It will be exciting to further reveal the different structural appearances of archaeal and bacterial CheY in order to disclose the principles of diversification in bacterial and archaeal chemotaxis/motility systems, which is further pronounced by CheY interacting with FliM in bacteria and with CheF in archaea (Szurmant & Ordal, 2004; Albers & Jarrell, 2015; Schlesner et al., 2009).