The X-ray crystal structure of the N-terminal domain of Ssr4, a Schizosaccharomyces pombe chromatin-remodelling protein

The N-terminal domain of Ssr4 from S. pombe, which is an integral protein in the SWI/SNF and RSC chromatin-remodelling complexes, has been crystallized and its structure has been solved using iodine as a phasing vehicle.


Introduction
Chromatin-remodelling complexes are essential for life and are evolutionarily conserved. These complexes have recently been implicated in cancer progression, either owing to their intrinsic ability for tumour suppression or their ability to alter the expression of tumour-suppression gene products (Roberts & Orkin, 2004). Much of our understanding of the chromatin-remodelling process comes from studying the two complexes found in the yeast Saccharomyces cerevisiae: SWI/SNF and RSC. The structures of both of these multicomponent machines have recently been determined using cryo-EM (Han et al., 2020; Wagner et al., 2020). The yeast Schizosaccharomyces pombe is sometimes considered to be a more relevant model system for metazoan biology (Monahan et al., 2008), and also contains the chromatin-remodelling complexes SWI/SNF and RSC, but the S. pombe complexes have different subunit compositions to the S. cerevisiae complexes. Ssr4 (UniProt entry Q9P7Y0) is one of the proteins that is not conserved between the two organisms and is only found in S. pombe.
There are at least 12 components of the SWI/SNF global transcription-activator complex in S. pombe and these include at a minimum Arp9, Arp42, Snf5, Snf22, Snf30, Sbf59, Sol1, Ssr1, Ssr2, Ssr3, Ssr4 and Tfg3. Additionally, the RSC complex is composed of at least 13 components including Arp9, Arp42, Rsc1, Rsc4, Rsc7, Rsc9, Rsc58, Sfh1, Snf21, Ssr1, Ssr2, Ssr3 and Ssr4. These complexes interact with histones and histone-variant components of the eukaryotic chromatin. They regulate transcription by altering the contacts made between DNA and the histones that coat the DNA, thereby opening up or closing down the sections available for gene regulation. Typical modifications include the acetylation or methylation of various lysine residues found on histones, the phosphorylation of serines on the histones and the methylation of cytosine bases on the DNA (Strahl & Allis, 2000). Most of these modifications change the interactions between the histones and the DNA, which allows or denies the access of transcription factors and RNA polymerase to the promoter regions of the genome. Conversely, deacetylases, demethylases and phosphatases generally support the silencing of genes and the compaction of the chromatin (McKnight et al., 2015).
We were interested in characterizing some of the less well known components that had no structural homologues in the Protein Data Bank (PDB). An initial search of the PDB with the Ssr4 sequence confirmed that there were no homologous structures (by sequence), which made it of interest to us. This also meant that there was no molecular-replacement model for this protein structure, so de novo phasing would be required.

Protein production
The ssr4 gene from S. pombe (coding for 395 residues, lacking the N-terminal methionine) was ordered from GenScript and inserted into a pET vector with a six-His tag and TEV cleavage site at the N-terminus. The protein was expressed in Escherichia coli BL21(DE3) cells, which were induced by the addition of isopropyl β-D-1-thiogalactopyranoside to 1 mM at an OD600 of 0.6-0.8 and allowed to grow for a further 4-5 h at 28 °C. The cells were pelleted and frozen for later use. The frozen cell pellet was thawed, resuspended in TBSA (25 mM Tris pH 7.6, 135 mM NaCl, 2.7 mM KCl, 0.02% NaN3), sonicated and centrifuged at 14 000 rev min⁻¹ in a Beckman JA25.5 rotor to remove cell debris, and the supernatant was applied onto a 5 ml Ni-NTA column. The column was loaded with TBSA plus an extra 150 mM NaCl plus 10 mM imidazole, washed extensively with the same buffer with 20 mM imidazole and the protein was then eluted by the addition of 250 mM imidazole in TBSA with 150 mM NaCl. The protein was concentrated and applied onto a Superdex 200 16/60 gel-filtration column pre-equilibrated in TBSA. The peak fraction from the size-exclusion column was concentrated to 5 mg ml⁻¹ and stored at 4 °C. Approximately 4.5 mg of the full-length protein was obtained from a 1 l culture. The initial protein sample ran as a 46 kDa band on an SDS-PAGE gel, but gels run after storage at 4 °C showed a major band that ran at an apparent molecular mass of 25 kDa (Supplementary Fig. S1a). Trypsin (Sigma, catalogue No. T1426) was dissolved in water to 1 mg ml⁻¹ and 10 µl aliquots were lyophilized in PCR tubes. The protein was treated with trypsin by adding 50-100 µl aliquots of the protein sample to the PCR tubes containing 10 µg lyophilized trypsin and incubating on ice for an hour or more. The protein was used after trypsin treatment with no further purification.
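The trypsin quantities above can be checked with a short calculation. The following is a minimal sketch, assuming that the protein added to each tube was at the stored concentration of 5 mg ml⁻¹ (the text does not state the concentration used for the digest); the function names are illustrative only.

```python
def mass_ug(conc_mg_per_ml: float, volume_ul: float) -> float:
    """Return the mass in micrograms; 1 mg/ml is numerically equal to 1 ug/ul."""
    return conc_mg_per_ml * volume_ul


def protein_to_trypsin_ratio(protein_ug: float, trypsin_ug: float) -> float:
    """Return the protein:trypsin mass ratio (w/w)."""
    return protein_ug / trypsin_ug


# One lyophilized aliquot: 10 ul of trypsin at 1 mg/ml.
trypsin_ug = mass_ug(1.0, 10.0)

# Assumption: protein at 5 mg/ml, added in 50 ul or 100 ul aliquots.
protein_low_ug = mass_ug(5.0, 50.0)
protein_high_ug = mass_ug(5.0, 100.0)

ratio_low = protein_to_trypsin_ratio(protein_low_ug, trypsin_ug)
ratio_high = protein_to_trypsin_ratio(protein_high_ug, trypsin_ug)

print(f"trypsin per tube: {trypsin_ug:.0f} ug")
print(f"protein:trypsin (w/w) between {ratio_low:.0f}:1 and {ratio_high:.0f}:1")
```

Under these assumptions each tube holds 10 µg of trypsin, and the digest is run at a protein:trypsin mass ratio of between 25:1 and 50:1.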

Differential scanning fluorimetry (DSF)
The native and trypsin-treated proteins were characterized by DSF, where 0.30 µl of protein at 4 mg ml⁻¹ was diluted into a final volume of 20 µl. 0.3 µl of a 1:20 dilution of SYPRO Orange dye (Sigma, catalogue No. S5692) was added and the temperature was ramped from 20 to 100 °C in 0.5 °C steps in a plate-based real-time PCR machine (Bio-Rad CFX-96; Supplementary Figs. S2a and S2b). The fluorescence curves were analysed with Meltdown (Rosa et al., 2015). 13 different buffers ranging from pH 5 to 9, each at two NaCl concentrations (50 and 200 mM), were tested (Seabrook & Newman, 2013), along with the initial TBSA formulation.
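One common way to extract a melting temperature from a DSF curve is to take the temperature at which the first derivative of the fluorescence is largest. The sketch below illustrates this on a synthetic two-state (Boltzmann) curve sampled on the same 20-100 °C, 0.5 °C grid. It is not the Meltdown algorithm, and the curve parameters (slope, baseline and plateau values) are invented for illustration.

```python
import math


def boltzmann(temp_c, tm_c, slope=2.0, f_min=100.0, f_max=1000.0):
    """Synthetic two-state melt curve (illustrative parameters, not real data)."""
    return f_min + (f_max - f_min) / (1.0 + math.exp((tm_c - temp_c) / slope))


def estimate_tm(temps, fluorescence):
    """Return the temperature with the largest central-difference derivative."""
    best_index, best_slope = None, float("-inf")
    for i in range(1, len(temps) - 1):
        d = (fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        if d > best_slope:
            best_index, best_slope = i, d
    return temps[best_index]


# Temperature ramp matching the experiment: 20 to 100 C in 0.5 C steps.
temps = [20.0 + 0.5 * i for i in range(161)]

# Synthetic curve with a transition placed at 58 C (an arbitrary test value).
curve = [boltzmann(t, tm_c=58.0) for t in temps]

tm_estimate = estimate_tm(temps, curve)
print(f"estimated Tm: {tm_estimate:.1f} C")
```

For real data, a high initial fluorescence or a sloping pre-transition baseline distorts the derivative, which is one reason why dedicated tools fit or filter the curves before reporting a Tm.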

Mass spectrometry (MS)
Intact mass determination of Ssr4 preparations was achieved by LC-MS as described previously (Newman et al., 2019). Briefly, native and partially trypsin-digested protein samples were spiked with formic acid (FA) to a final concentration of 0.1%(v/v) and separated by reverse-phase liquid chromatography on an UltiMate 3000 RSLCnano system (Thermo Fisher Scientific) fitted with a 150 × 4.6 mm, 5 µm particle-size, 300 Å pore-size PLRP-S column (Agilent). Proteins were eluted at a flow rate of 250 µl min⁻¹ by applying a linear 30 min gradient from 0 to 80% solvent B [mobile phase A, 0.1%(v/v) FA; mobile phase B, 90%(v/v) acetonitrile/0.1%(v/v) FA] and ionized using an Apollo II electrospray ion source coupled to a MaXis II mass spectrometer (Bruker). High-resolution LC-MS data were analysed using the Intact Mass parsimonious charge-state deconvolution algorithm (Protein Metrics, USA; Supplementary Figs. S1b-S1f).
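The principle behind charge-state deconvolution can be illustrated with two adjacent peaks of an electrospray charge envelope. The sketch below is not the Protein Metrics algorithm; it only shows the textbook relationship m/z = (M + z·m_p)/z for positive-mode protonated ions, and the charge states (30+ and 31+) are arbitrary values chosen for illustration.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def mz_of(mass_da: float, charge: int) -> float:
    """m/z of an ion carrying `charge` protons."""
    return (mass_da + charge * PROTON_MASS) / charge


def deconvolve_pair(mz_low: float, mz_high: float):
    """Recover charge and neutral mass from two adjacent charge-state peaks.

    mz_high belongs to charge z and mz_low to charge z + 1.
    """
    z = round((mz_low - PROTON_MASS) / (mz_high - mz_low))
    mass = z * (mz_high - PROTON_MASS)
    return z, mass


# Simulate two neighbouring peaks for a 46 256 Da protein.
true_mass = 46256.0
peak_30 = mz_of(true_mass, 30)
peak_31 = mz_of(true_mass, 31)

charge, mass = deconvolve_pair(peak_31, peak_30)
print(f"peaks at m/z {peak_31:.3f} and {peak_30:.3f} -> z = {charge}, M = {mass:.1f} Da")
```

Production deconvolution software combines many charge states and isotope envelopes at once; the pairwise calculation only shows why a series of m/z peaks determines a single neutral mass.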

Crystallization
Crystallization experiments were set up in SD2 sitting-drop plates (Molecular Dimensions, UK) at 8 °C with droplets consisting of 200 nl protein solution plus 200 nl reservoir solution (1:1 ratio) with 50 µl reservoir solution in the wells. Some of the drops were microseeded in an effort to control nucleation and produce larger crystals. A seed stock was made by adding the contents of the reservoir and the crystal drop to a 1.5 ml centrifuge tube, adding a few (5-10) 0.4 mm glass beads and vortexing for a minute. Seeded drops contained 200 nl protein, 180 nl reservoir and 20 nl of the seed stock (Newman et al., 2008). The protein concentration was 5 mg ml⁻¹ in TBSA. Three different protein samples were used in crystallization: protein that had been purified and stored at 4 °C, protein that had been purified and treated with trypsin and protein that had been treated with extra protease inhibitors. For the last of these, freshly purified protein was treated with a 1:100 volume ratio of 100× Roche cOmplete protease-inhibitor cocktail (one tablet dissolved in 0.5 ml 100 mM EDTA).
Rod-shaped crystals (Fig. 1) grew from both native protein stored at 4 °C and trypsin-treated protein using reservoir conditions consisting of 1.5-1.9 M ammonium sulfate, 0.7-12% dioxane and either 100 mM MES, 100 mM bis-tris or 10%(v/v) malate-MES-Tris buffer at a pH between 5.5 and 5.8. Crystals appeared after a week and grew to full size (30 × 30 × 120 µm) over the course of a month. No indication of crystal formation was observed for the inhibitor cocktail-treated protein even after setting up ten 96-condition screens at two different protein concentrations.

Data collection and processing
Glycerol was added to the reservoir to give a final concentration of 20%; crystals were cryoprotected by adding 2 µl of the glycerol-doped reservoir to the drop with crystals. Iodine was introduced into the crystals by adding a tiny crystal of solid I2 to the reservoir solution (along with the glycerol) and this was used to briefly soak the crystals. Crystals were removed using a 50 µm MiTeGen Mylar loop and were cryocooled by plunging them into liquid nitrogen. Data were collected on the MX2 microfocus beamline at the Australian Synchrotron. 360° of data were collected with the X-ray energy set to 8007.8 eV for the iodine-treated crystals. Data sets were collected at 13 600 eV from the native crystals. The data were processed with XDS (Kabsch, 2010) and scaled with AIMLESS (Evans & Murshudov, 2013). Xenon, bromine and native sulfur SAD data sets were also collected at appropriate energies to obtain the maximum anomalous signal given the constraints of the beamline.
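Because the beamline settings above are quoted as photon energies, it can be helpful to convert them to wavelengths. The sketch below uses the standard relation λ = hc/E with hc ≈ 12 398.42 eV Å; the function name is illustrative.

```python
HC_EV_ANGSTROM = 12398.4198  # Planck constant times speed of light, in eV * Angstrom


def energy_to_wavelength(energy_ev: float) -> float:
    """Convert a photon energy in eV to a wavelength in Angstrom."""
    return HC_EV_ANGSTROM / energy_ev


iodine_wavelength = energy_to_wavelength(8007.8)   # iodine-soaked crystals
native_wavelength = energy_to_wavelength(13600.0)  # native crystals

print(f"8007.8 eV  -> {iodine_wavelength:.4f} A")
print(f"13600.0 eV -> {native_wavelength:.4f} A")
```

The iodine data were therefore collected at about 1.548 Å, close to the Cu Kα wavelength (1.5418 Å), and the native data at about 0.912 Å.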

Structure solution and refinement
The structure was initially solved with Auto-Rickshaw (Panjikar et al., 2005). The initial Buccaneer (Cowtan, 2006) model was then used as a starting point for manual rebuilding in Coot (Emsley et al., 2010) and refinement performed in REFMAC (Murshudov et al., 2011). MolProbity (Chen et al., 2010) was used to determine various quality measures of the structure, including the Ramachandran statistics listed in Table 1. Only the N-terminal half of the protein was seen in the crystal structure (Fig. 2); a topology diagram of the structure from PDBSum (Laskowski et al., 2018) is shown in Fig. 3.

Figure 1
Crystals of Ssr4.

Results and discussion
The full-length protein was expressed and initially purified, but it was cleaved during storage to leave just the N-terminal domain. The initial purification showed a major band at the expected mass of 46 kDa, but there were numerous minor contaminating bands, one of which we assume was a protease. Although the protein was cleaved by some contaminant, the addition of trypsin to the (cleaved) protein sample increased the number of crystals that were formed. The crystals themselves were orthorhombic, but with very similar b and c axes (see Table 1), so that autoprocessing would sometimes misassign the space group as tetragonal. Thermal melt curves were obtained for both the sample treated with protease inhibitor and the trypsin-treated sample. The protease-inhibited sample showed a high initial fluorescence in all buffer/salt combinations, but showed a clear melt transition, with a Tm of 58.0 ± 0.1 °C, in TBSA. The trypsin-treated sample showed flatter pre-transition curves and gave a Tm of 60.3 ± 0.1 °C. This indicated that there might be a portion of the protein that was disordered under the conditions tested. The Meltdown reports are shown in Supplementary Fig. S2.
In the MS analysis, the protease-inhibited sample eluted as a single peak with a retention time of ~15.4 min, with a major mass of 46 256 Da matching full-length Ssr4 with the loss of the N-terminal methionine. Accurate mass analysis is consistent with the loss of the N-terminal methionine (−131 Da), the presence of a single disulfide bond (−2 Da) and partial (phospho)gluconoylation of the His tag (+178/+258 Da), and some C-terminal clipping at Glu377. The partially trypsin-digested Ssr4 sample produced a complex ion chromatogram showing two prominent peaks. Peak 1 at a retention time of ~11.6 min contained a major mass of ~14 172.3 Da, which was most likely to correspond to a C-terminal Gly298-Arg411 fragment, and peak 2 at 15.8 min contained a major mass of 22 401.6 Da, matching an N-terminal Gly2-Lys199 fragment with partial (phospho)gluconoylation of the His tag (+178/+258 Da) (see Supplementary Figs. S1b-S1f).

Figure 2
Ribbon and electrostatic views of Ssr4. The top panel shows three views of Ssr4 as a cartoon ribbon, highlighting the secondary structure of the protein. The N-terminal His tag is coloured pink to distinguish it from the rest of the native structure, which uses the standard Jones colouring of blue for the N-terminus to red at the C-terminus. The rotation between adjacent views is approximately 90°, so that the leftmost and rightmost images are related by a rotation of 180°. Also shown in sticks are the three arginine residues along β6. Below, in the same orientation, is the electrostatic potential calculated by APBS. This figure was made using PyMOL (version 1.8; Schrödinger) and Gimp (https://www.gimp.org/).

The Ssr4 structure was solved using a crystal that had been soaked with iodine and which showed two major peaks for I atoms in the electron density (about 40σ and 30σ above the background), with three more peaks (12σ, 11σ and 5.5σ) that corresponded to either low-occupancy I or Cl atoms (some of these peaks are also seen in non-iodine-soaked structures). Auto-Rickshaw (Panjikar et al., 2005) was used to find the iodines, phase the data and build the initial structure via Buccaneer (Cowtan, 2006). After manual rebuilding and refinement, this initial structure was then used to phase a slightly higher resolution native data set (using Phaser; McCoy et al., 2007), and manual rebuilding using Coot (Emsley et al., 2010) gave a more complete structure, which was again refined using REFMAC (Murshudov et al., 2011) (see Table 1). The initial structure was also used to phase the sulfur SAD data using Phaser to determine whether these highly redundant data offered any additional information on the structure and for post hoc analysis. Density was seen for residues extending from the N-terminal tag to residue 179/180 of the Ssr4 protein in all three structures.
The structure of the Ssr4 N-terminal domain (Figs. 2 and 3) starts with a short helix (α1; residues 2-12), followed by another helix (α2; residues 15-20), a short β-strand (β1; residues 22-25) and then an unstructured coil that leads to a long helix (α3; residues 37-51). This then leads to another short strand (β2; residues 56-59), a loop to strand β3 (residues 68-73), a long loop to strand β4 (residues 92-98), a short turn to strand β5 (residues 101-108), a medium-size loop to strand β6 (the longest β-strand; residues 122-130) and a short loop to strand β7 (residues 137-144). The rest of the structure, residues 145-179/180, is mostly a long unstructured coil with two single-turn helical sections. The seven β-strands form an antiparallel sheet which is mostly covered on one side by the long helix (α3; residues 37-51), and on the other side of the β-sheet is a hole which is bounded on its far side by unstructured coils. Side chains fill this hole. It is interesting to note that the long β-strand β6 (residues 122-130) has a set of arginine residues all oriented in one direction emanating from it (Arg124, Arg126 and Arg128). We hypothesize that trypsin cleaves the protein shortly after the last residue modelled (Glu180), as the sequence at this point is Glu-Pro-Lys-Lys. PDBeFold (Krissinel & Henrick, 2004) was used to compare the Ssr4 structure with all other structures in the PDB. Two different structures of LytM (PDB entries 4zyb and 4bh5, from S. aureus and E. coli, respectively; Grabowska et al., 2015; Peters et al., 2013) can be aligned with an r.m.s.d. of 2.9-3.0 Å over 60-70 residues (Q-scores of 0.09-0.10), and PDB entry 5j6p (C. Shao, C. Wang, M. Zhang, C. Zhang & J. Zang, unpublished work), another S. pombe protein (Mis18), can be aligned with an r.m.s.d. of 4.1 Å over 67 residues. The aligned residues are parts of the β-sheet in both cases, with the rest of the protein not aligning at all.
This suggests that this domain of Ssr4 adopts a fold that is not currently represented in the public PDB. For this reason it was submitted to CASP14, with the results of the modelling to be presented in the future.
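To see why the PDBeFold alignments described above are considered weak, it helps to evaluate the Q-score, which combines the number of aligned residues, the r.m.s.d. and the lengths of the two chains: Q = N_align² / [(1 + (r.m.s.d./R0)²) N1 N2], with R0 = 3 Å. The sketch below is illustrative only: the chain lengths are hypothetical placeholder values and are not taken from the PDB entries discussed.

```python
def q_score(n_align: int, rmsd: float, n1: int, n2: int, r0: float = 3.0) -> float:
    """PDBeFold-style Q-score; 1.0 for identical structures, near 0 for unrelated ones."""
    return n_align ** 2 / ((1.0 + (rmsd / r0) ** 2) * n1 * n2)


# Hypothetical chain lengths (assumptions for illustration only):
N_SSR4 = 190    # roughly the modelled Ssr4 domain plus tag
N_TARGET = 110  # placeholder length for the compared chain

weak_hit = q_score(n_align=65, rmsd=3.0, n1=N_SSR4, n2=N_TARGET)
perfect_hit = q_score(n_align=100, rmsd=0.0, n1=100, n2=100)

print(f"Q for 65 residues aligned at 3.0 A: {weak_hit:.2f}")
print(f"Q for an identical pair: {perfect_hit:.2f}")
```

With about 65 residues aligned at 3 Å, the score is an order of magnitude below that of a true structural match, in line with the reported values of 0.09-0.10.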
The N-terminal tag used for purification forms a helix from residues −9 to −2 and there is a break in the density in two of the three structures where the residues Ser1 and Ala2 should reside (Ser1 is the last residue of the tag and Ala2 is the first residue of native Ssr4). These two residues were modelled into one of the models as there is some weak density in the iodine-phased data, with a break at Gly0. The helical part of the tag is packed against the N-terminal α1 helix of the Ssr4 protein body (Fig. 2). The density becomes thin before the third histidine of the His tag, with four residues (MGHH) missing from the N-terminus of the tag.
We calculated the electrostatic potential of Ssr4 using APBS ( [...] potential that might signal specific binding to DNA or histones, for example. Further studies will be required to determine how Ssr4 interacts in the SWI/SNF and RSC complexes and what its role might be in these chromatin-remodelling complexes. The structure was determined using the anomalous signal from the iodine-soaked crystals, although previous data sets using xenon, bromine and sulfur as potential phasing vehicles had been collected to relatively high resolution and with good statistics prior to obtaining the iodine data set. We observe four well ordered methionines (with one more that is slightly less ordered) and two cysteines in the structure that should contribute to an anomalous sulfur signal for a protein that is less than 200 amino acids in length, and the data extended to 2.1 Å resolution with about 50-fold redundancy/multiplicity on average for the anomalous data (100-fold for the full data set treating the Friedel mates as equivalent). Despite the good statistics and observable signal, the peaks found were weak (only about 5σ) and the structure could not be solved after trying several different software packages (Auto-Rickshaw, CRANK2, Phenix AutoSol and SHELX; Panjikar et al., 2005; Skubák & Pannu, 2013; Liebschner et al., 2019; Terwilliger et al., 2009; Sheldrick, 2015).

Figure 3
Topology of the Ssr4 protein with the N-terminal tag included as determined by PDBSum.

We suspect that the trypsin neatened the C-terminus left by the contaminating protease to create a crystallizable sample: the residues C-terminal to those seen in the structure are PKK, which correspond to trypsin sites at residues 182/183 in the full-length protein. Once it was recognized that the protein sample set up in crystallization trials had been cleaved, a new preparation was made and special care was taken to prevent proteolysis. This sample did not yield crystals. However, trypsin treatment of freshly prepared protein did yield the familiar rod-shaped crystals. In the LC-MS results of the trypsin-treated protein, peak 1 matches a C-terminal tryptic fragment corresponding to a predicted globular Ssr4 domain Gly298-Arg411 and peak 2 matches a large N-terminal tryptic fragment corresponding to a predicted globular Ssr4 domain Gly2-Lys199 (the numbering in these cases refers to the Ssr4 protein with an N-terminal His tag and TEV site). LC-MS analysis of the native protein showed some clipping of the C-terminal region of the protein, suggesting that the C-terminal region, although folded, is less stable than the N-terminal fragment that crystallized. The central region is predicted to be intrinsically unstructured and is probably rapidly proteolysed.
Most of the tag sequence, HHHHGTENLYFQGS, has clear electron density and forms a helix that packs nicely against the N-terminal domain of Ssr4; thus, the first 14 residues modelled are not native to Ssr4 (pink helix in Fig. 2).
As there were no homologous structures in the PDB which might serve as a molecular-replacement model for this protein structure, we attempted to use several different methods for de novo phasing: xenon, bromine and iodine soaks and native sulfur SAD phasing. Even after soaking, most of the crystals diffracted to beyond 2.5 Å resolution and complete data sets were acquired. In the case of the sulfur SAD data sets, multiple data sets were acquired, and these were sufficiently isomorphous to give a merged data set with 100-fold total (50-fold anomalous) redundancy/multiplicity in the data. Despite the high redundancy and the reasonably high resolution (complete data extended to 2.1 Å resolution with reasonable statistics), there was insufficient sulfur signal to successfully phase the structure. Although the sulfur, bromine and iodine data sets all showed some anomalous signal in the scaled statistics, only the iodine data set had sufficient signal to solve the structure.
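The difference between the sulfur and iodine experiments can be put in rough numerical terms with the Hendrickson-Teo estimate of the expected Bijvoet ratio, ⟨|ΔF|⟩/⟨|F|⟩ ≈ √2 √N_A f″ / (√N_P Z_eff), where N_A is the number of anomalous scatterers, N_P is the number of non-H protein atoms and Z_eff ≈ 6.7. The sketch below is a back-of-the-envelope calculation under several assumptions that are not from the text: about eight non-H atoms per residue, full occupancy for all scatterers, and approximate f″ values near 8 keV (about 0.56 e for S and about 6.8 e for I). The energy actually used for the sulfur data is not stated, so the sulfur figure in particular is indicative only.

```python
import math

Z_EFF = 6.7            # effective atomic number of an average protein atom
ATOMS_PER_RESIDUE = 8  # assumption: rough count of non-H atoms per residue


def bijvoet_ratio(n_anomalous: int, f_double_prime: float, n_residues: int) -> float:
    """Hendrickson-Teo estimate of the expected anomalous signal at zero scattering angle."""
    n_protein_atoms = n_residues * ATOMS_PER_RESIDUE
    return (math.sqrt(2.0) * math.sqrt(n_anomalous) * f_double_prime
            / (math.sqrt(n_protein_atoms) * Z_EFF))


N_RESIDUES = 194  # about 180 native residues plus 14 tag residues

# Assumed f'' values (approximate, near 8 keV); both are illustrative inputs.
sulfur_signal = bijvoet_ratio(n_anomalous=7, f_double_prime=0.56, n_residues=N_RESIDUES)
iodine_signal = bijvoet_ratio(n_anomalous=2, f_double_prime=6.8, n_residues=N_RESIDUES)

print(f"sulfur (5 Met + 2 Cys): {100 * sulfur_signal:.1f}%")
print(f"iodine (2 major sites): {100 * iodine_signal:.1f}%")
```

Under these assumptions the expected sulfur signal is below 1% of the mean structure-factor amplitude, whereas two well occupied iodine sites give several per cent, which is consistent with only the iodine data being sufficient for phasing.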

Conclusion
We have crystallized and solved the structure of the N-terminal domain (up to residue ~180 of 395) of Ssr4 using an iodine derivative. This structure does not align well with any other structure in the public PDB; we thus considered it to be 'novel' and it was submitted to CASP for this reason.