The crystal structure of the human smacovirus 1 Rep domain

The structure of the human smacovirus 1 Rep domain was determined at 1.33 Å resolution. This HUH endonuclease offers a new ssDNA-binding sequence specificity that will be exploited to increase orthogonality among Rep families.


Introduction
Smacoviridae is a family of small CRESS-DNA (circular Rep-encoding single-stranded DNA) viruses. These viruses have been found in the feces of multiple animals and are suspected to cause gastrointestinal disease in humans (Krupovic & Varsani, 2021; Li et al., 2022). Indeed, CRESS-DNA viruses mainly infect eukaryotes. However, it was recently found that, rather than infecting humans directly, smacoviruses may infect prokaryotes in the gut, which would make them the smallest viruses known to infect prokaryotes and functionally distinct from most other CRESS-DNA viruses (Díez-Villaseñor & Rodriguez-Valera, 2019; Zhao et al., 2019; Li et al., 2022).
In addition to functional differences, there are putative structural differences in the smacovirus replication initiator (Rep) domain. Reps belong to the HUH endonuclease superfamily of enzymes, which process single-stranded DNA (ssDNA) to replicate the genome during rolling-circle replication (Eisenberg et al., 1977; Chandler et al., 2013). Central to DNA processing by all HUH endonucleases is a structurally defined catalytic nickase domain that first recognizes a specific sequence/structure of DNA, then nicks the ssDNA at a 'nic site' to yield a sequestered 5′-end that remains covalently bound to the HUH endonuclease and a free 3′-OH that can be used as a primer for DNA replication, and finally facilitates a strand-transfer reaction to resolve the covalent intermediate (Fig. 1; Koonin, 1993; Ilyina & Koonin, 1992; Vega-Rocha et al., 2007; Boer et al., 2006; Chandler et al., 2013; Lovendahl et al., 2017). The HUH motif in the nickase domain, named after a triad of residues, is most often made up of two histidines separated by a bulky hydrophobic residue (U), but can also be histidine-U-glutamine (HUQ). Several recent crystal structures have illustrated how viral Reps recognize and position ssDNA for cleavage (Luo et al., 2018; Everett et al., 2019; Tompkins et al., 2021; Smiley et al., 2023). Recent comparisons of CRESS-DNA Rep-domain protein sequences show that smacovirus Rep domains are both the smallest and the most sequence-divergent of the CRESS-DNA viral Reps (Tarasova & Khayat, 2022).
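Because the HUH motif is defined as a residue pattern, it can be located in a protein sequence with a regular expression. The sketch below is illustrative only: the set of residues counted as "bulky hydrophobic" (I, L, V, M, F, W, Y) is an assumption, and the example sequence is hypothetical rather than the HSV1 Rep sequence.

```python
import re

# Assumption: "bulky hydrophobic" (U) approximated as I, L, V, M, F, W and Y.
BULKY_HYDROPHOBIC = "ILVMFWY"

# H, then one bulky hydrophobic residue (U), then H (HUH) or Q (HUQ).
# The lookahead lets overlapping triads (e.g. in "HIHIH") all be reported.
HUH_PATTERN = re.compile(rf"(?=(H[{BULKY_HYDROPHOBIC}][HQ]))")


def find_huh_motifs(protein_seq: str) -> list[tuple[int, str]]:
    """Return (1-based position, triad) for every HUH or HUQ match."""
    seq = protein_seq.upper()
    return [(m.start() + 1, m.group(1)) for m in HUH_PATTERN.finditer(seq)]


# Hypothetical toy sequence, not the HSV1 Rep sequence.
print(find_huh_motifs("MKTAYLHLQGSDWHVHKE"))  # [(7, 'HLQ'), (14, 'HVH')]
```

A motif match alone does not establish a catalytic site; in practice the triad is confirmed by alignment and structure, as done here with PROMALS3D.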
Finally, Rep domains from HUH endonucleases have been used as bioconjugation tags, termed HUH-tags, for applications that require covalent, sequence-specific protein-DNA bonds (Aird et al., 2018; Sagredo et al., 2016; Zdechlik et al., 2020). Structural information on new Reps can therefore guide their engineering to bind desired DNA sequences (Tompkins et al., 2021).
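HUH-tag conjugation relies on the nicking step of the catalytic cycle: the Rep cuts its target within a recognition sequence, stays covalently attached to the downstream 5′-end and releases the upstream fragment with a free 3′-OH. The toy model below only tracks which fragment ends up where; the target sequence, the `nic_offset` parameter and the example oligonucleotide are all hypothetical and do not describe the HSV1 nic site.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NickProducts:
    released_3oh: str   # upstream fragment, ends in a free 3'-OH
    rep_linked_5p: str  # downstream fragment, 5'-end bound to the Rep tyrosine


def nick_ssdna(oligo: str, target: str, nic_offset: int) -> Optional[NickProducts]:
    """Toy model of Rep nicking inside a recognition sequence (5'->3' strings).

    target and nic_offset are hypothetical: nic_offset is the number of target
    bases lying 5' of the scissile phosphate. Returns None if target is absent.
    """
    if not 0 < nic_offset < len(target):
        raise ValueError("nic site must fall inside the target sequence")
    start = oligo.find(target)
    if start == -1:
        return None  # no recognition sequence, no reaction
    cut = start + nic_offset
    return NickProducts(released_3oh=oligo[:cut], rep_linked_5p=oligo[cut:])


print(nick_ssdna("AAAATTTTCCCC", target="TTTT", nic_offset=2))
```

Real recognition depends on DNA structure as well as sequence, so a plain string search is only a stand-in for the specificity that the structural work aims to explain.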
These distinctions in function and domain composition suggest potential differences in the structure of the smacovirus Rep domain and in how it binds (i.e. bioconjugates) its target DNA. As a first step towards understanding the structural basis for the function of the smacovirus Rep domain in prokaryote infection, we solved a 1.33 Å resolution crystal structure of a smacovirus Rep domain and compared it with other CRESS-DNA viral Reps.

Protein production and purification
Cloning.
A codon-optimized gene block of the Rep-domain sequence from human smacovirus 1 (HSV1; accession No. AJE25845.1) was synthesized by Integrated DNA Technologies. An N-terminal His6-SUMO tag and 15 nucleotides homologous to the parent vector, pTD68, were included for cloning. The parent vector was linearized with the BamHI and XhoI restriction enzymes (New England Biolabs), and the gene block was inserted using an In-Fusion HD Cloning Kit (Takara) according to the manufacturer's protocol. The resulting plasmid was transformed into competent Escherichia coli Stellar cells and plated onto 100 µg ml−1 ampicillin plates. After overnight incubation at 37°C, colonies were picked and plasmid DNA was purified with a Qiagen Miniprep kit. The purified plasmid was confirmed by Sanger sequencing (Genewiz). Protein-production details are provided in Table 1.
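The 15-nucleotide homology arms are what allow In-Fusion to join the gene block to the linearized vector. The sketch below shows the idea of copying 15 bases from each vector end onto the insert; all sequences are placeholders (not the real pTD68 ends or the HSV1 gene block), and how the BamHI/XhoI overhang bases are counted in the arms follows the kit's own design guidance, which this sketch does not model.

```python
ARM_LENGTH = 15  # homology length included on the HSV1 gene block


def add_homology_arms(insert: str, vector_left_end: str, vector_right_end: str,
                      arm_length: int = ARM_LENGTH) -> str:
    """Flank an insert with arms identical to the ends of a linearized vector.

    vector_left_end: top-strand vector sequence ending at the upstream cut (5'->3').
    vector_right_end: top-strand vector sequence starting at the downstream cut.
    """
    if min(len(vector_left_end), len(vector_right_end)) < arm_length:
        raise ValueError("vector end sequences are shorter than the arm length")
    left_arm = vector_left_end[-arm_length:]   # last 15 nt before the cut
    right_arm = vector_right_end[:arm_length]  # first 15 nt after the cut
    return (left_arm + insert + right_arm).upper()


# Placeholder sequences (hypothetical), for illustration only.
left = "GCTAGCTAGGACTTCAAGCT"
right = "TTACCGATGCATGGTACCAA"
construct = add_homology_arms("ATGAAACATCATCAC", left, right)
print(construct)
```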

Protein expression and purification.
Verified plasmids were transformed into E. coli BL21(DE3) cells and cultured in 1 l Luria-Bertani (LB) broth with 100 µg ml−1 ampicillin at 37°C. Expression was induced at an OD600 of between 0.6 and 0.9 using 0.5 mM isopropyl β-D-1-thiogalactopyranoside, and the cells were grown for 20 h at 18°C. The cells were harvested by centrifugation and the pellet was resuspended in lysis buffer (50 mM Tris pH 7.5, 250 mM NaCl). 1 mM EDTA and a protease-inhibitor tablet (Pierce, Thermo Fisher) were added to prevent metal binding and degradation, respectively. Lysis was performed by sonication in 1 min intervals at 4°C. The homogeneous suspension was centrifuged at 24 000g for 25 min at 4°C. The supernatant was incubated for 1 h on a rotator with 2 ml HisPure Ni-NTA agarose beads (Thermo Fisher) pre-equilibrated with wash buffer (50 mM Tris pH 7.5, 250 mM NaCl, 1 mM EDTA, 30 mM imidazole), then loaded onto a gravity column and allowed to flow through. The protein-bound beads were washed with 25 ml wash buffer and the protein was eluted with 5 ml elution buffer (50 mM Tris pH 7.5, 250 mM NaCl, 1 mM EDTA, 250 mM imidazole). The eluted protein was dialyzed against 50 mM Tris pH 7.5, 150 mM NaCl, 1 mM EDTA. The His6-SUMO tag was cleaved with 5 ml ULP1 (1 U ml−1) overnight at 4°C, the digest was incubated with Ni-NTA agarose beads and the flowthrough was collected. The protein-containing flowthrough was further purified using an Enrich SEC70 (Bio-Rad) size-exclusion chromatography column. Fractions containing the 16 kDa target protein were pooled and concentrated to 2.7 mg ml−1 using a spin concentrator (Amicon Ultra-15 Centrifugal Filter Unit, 3 kDa molecular-weight cutoff).
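Setting up complexes with DNA and metal requires the protein concentration in molar rather than mass units. A minimal conversion, assuming the nominal 16 kDa mass quoted above (a sequence-derived molar mass would be more precise):

```python
def mg_per_ml_to_micromolar(conc_mg_per_ml: float, mass_kda: float) -> float:
    """Convert a mass concentration to a molar concentration in µM.

    mg/ml equals g/l; dividing by the molar mass (g/mol, numerically Da)
    gives mol/l, which is then scaled to µM.
    """
    molar = conc_mg_per_ml / (mass_kda * 1000.0)  # mol/l
    return molar * 1e6  # µM


# ~16 kDa protein concentrated to 2.7 mg/ml after size-exclusion chromatography.
print(round(mg_per_ml_to_micromolar(2.7, 16.0)))  # 169
```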

Crystallization
A protein solution containing a 10-nucleotide DNA oligonucleotide from the smacovirus origin of replication (AGTATTACGC) and Mn2+ was prepared in a 1:2:2 ratio. Drops consisting of 2 µl protein solution and 1 µl well solution were set up using the hanging-drop vapor-diffusion method. The well solution was composed of 0.1 M sodium acetate pH 5.0, 20% PEG 4000, 1 M guanidine-HCl. Upon crystal harvesting, 17% glycerol was added as a cryoprotectant. Crystallization details are listed in Table 2.
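The 1:2:2 ratio translates into stock volumes once the protein concentration is known (2.7 mg ml−1 of a ~16 kDa protein is roughly 169 µM). The sketch below assumes the ratio is molar protein:DNA:Mn2+, which the text does not state explicitly, and the stock concentrations in the example are hypothetical.

```python
def stock_volumes_for_ratio(protein_uM, protein_ul, dna_stock_uM, mn_stock_uM,
                            ratio=(1, 2, 2)):
    """Volumes (µl) of DNA and Mn2+ stocks giving a protein:DNA:Mn2+ molar ratio.

    Assumes the ratio is molar. Units: µM x µl = pmol, and pmol / µM = µl.
    Only molar amounts are matched; dilution of the protein is not corrected.
    """
    p, d, m = ratio
    protein_pmol = protein_uM * protein_ul
    dna_ul = protein_pmol * d / p / dna_stock_uM
    mn_ul = protein_pmol * m / p / mn_stock_uM
    return dna_ul, mn_ul


# Hypothetical stocks: 1 mM DNA oligonucleotide and 10 mM MnCl2.
dna_ul, mn_ul = stock_volumes_for_ratio(168.75, 20.0, dna_stock_uM=1000.0,
                                        mn_stock_uM=10000.0)
print(f"add {dna_ul:.2f} µl DNA and {mn_ul:.3f} µl Mn2+ stock")
```

Sub-microlitre volumes such as the Mn2+ addition here are hard to pipette accurately, so in practice an intermediate dilution of the stock would be used.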

Data collection and processing
The […] detector. The data set yielded a 1.33 Å resolution structure. Data-collection and processing details are provided in Table 3.

Figure 1
The catalytic activity of HUH endonucleases relies on the HUH/Q and tyrosine motifs to coordinate the nucleophilic attack on the DNA phosphate backbone (adapted from Tompkins et al., 2021).

Structure solution and structure refinement
Molecular replacement with other viral Reps did not provide sufficient phasing information; therefore, a molecular-replacement search model was first generated with AlphaFold2 (Jumper et al., 2021). The top-ranked model was then trimmed at the C-terminal end with PyMOL (version 2.0; Schrödinger) to remove short segments. The structure was solved with Phaser (McCoy et al., 2007) using the trimmed AlphaFold2-predicted model and was refined with Phenix 1.17.1 (Liebschner et al., 2019) and Coot (Emsley et al., 2010). MolProbity (Chen et al., 2010) was used for Ramachandran analysis. During refinement, it became clear that no ssDNA was bound in the structure. Structure-solution and refinement statistics are listed in Table 4. The final model was deposited in the Research Collaboratory for Structural Bioinformatics Protein Data Bank as PDB entry 8fr5.
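Refinement progress is judged by the crystallographic R factor, R = Σ| |Fo| − |Fc| | / Σ|Fo|, computed separately for the working reflections (Rwork) and for a test set excluded from refinement (Rfree). The sketch below shows that split on toy amplitudes; it assumes Fc has already been scaled to Fo, whereas refinement programs such as Phenix also fit the scale and bulk-solvent terms.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|) over the given reflections."""
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fo) for fo in f_obs)
    return num / den


def r_work_r_free(f_obs, f_calc, is_free):
    """Split reflections by their free-set flag and return (Rwork, Rfree)."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if free]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free


# Toy amplitudes (hypothetical), with Fc assumed already on the Fo scale.
fo = [100.0, 80.0, 60.0, 40.0]
fc = [90.0, 84.0, 60.0, 50.0]
flags = [False, False, False, True]
print(r_work_r_free(fo, fc, flags))
```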

Crystallization and structure determination
To uncover structural differences from other ssDNA-bound HUH-tags, we attempted to co-crystallize HSV1 Rep with ssDNA after mutating the catalytic tyrosine (Tyr81) to phenylalanine. This substitution allows the ssDNA to be coordinated but not covalently linked (Larkin et al., 2005). A covalent complex would not reveal the precleavage orientation, because once the ssDNA is cleaved and covalently linked its orientation changes (see Fig. 2). Although attempts to obtain the bound/coordinated structure were unsuccessful, we obtained an unbound structure of HSV1 Rep at 1.33 Å resolution (Fig. 3). The absence of 2Fo − Fc electron density for ssDNA bound to HSV1 Rep is illustrated in Supplementary Fig. S1. Protein crystals formed within days in many of the well conditions screened. The condition that yielded the largest crystals was 0.1 M sodium acetate pH 5.0, 20% PEG 4000, 1 M guanidine-HCl. The crystals belonged to space group P2₁, with unit-cell parameters a = 31.16, b = 49.37, c = 31.38 Å, α = 90.00, β = 110.30, γ = 90.00°. There was one protein molecule in the asymmetric unit. The final Rwork and Rfree values were 0.187 and 0.224, respectively.
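The unit-cell parameters fix the cell volume, which is the starting point for estimates such as solvent content. A small helper using the general triclinic formula; for a monoclinic cell such as this P2₁ cell (α = γ = 90°) it reduces to V = abc sin β.

```python
import math


def unit_cell_volume(a, b, c, alpha, beta, gamma):
    """Unit-cell volume (Å^3) from edge lengths in Å and angles in degrees.

    General triclinic formula; for a monoclinic cell (alpha = gamma = 90°)
    it reduces to V = a * b * c * sin(beta).
    """
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)


# Unit-cell parameters reported for the HSV1 Rep crystals.
v = unit_cell_volume(31.16, 49.37, 31.38, 90.00, 110.30, 90.00)
print(f"{v:.0f} Å^3")
```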

Structure analysis
Attempts to model the GEDG residues in the electron density adjacent to the HUH/Q motif (Chandler et al., 2013) were unsuccessful, indicating that this loop is flexible and therefore poorly resolved in the electron density. Modeling of ssDNA in the electron density adjacent to the catalytic HUQ and tyrosine motifs was also unsuccessful. The resulting unbound structure consists of β1, α1, β2, β3, α2, β4 and α3 secondary-structure elements, with the β-strands arranged antiparallel (Fig. 3). The catalytically dead phenylalanine substituting for the reactive tyrosine resides within α3, and the coordinating histidine and glutamine residues reside within β3. The overall Rep fold is highly conserved among Rep families (Fig. 4). When a sequence and structure alignment was performed using PROMALS3D (Pei et al., 2008), we found that the Rep from porcine circovirus 2 (PCV2; PDB entry 5xor) of the circovirus family is structurally closer to that from wheat dwarf virus (WDV; PDB entry 6q1m) of the geminivirus family than to that from HSV1 […] β-sheets. Another difference among the families compared here is the orientation of the HUH/Q and tyrosine residues in the catalytic motifs (Fig. 3), but this could also be explained by the absence of the divalent metal ion that is required to prime the active site for nucleophilic attack on the DNA substrate (Hickman et al., 2002, 2004).

Figure 2
The coordination of the ssDNA by the Rep in the pre- and post-cleavage complexes. A mutation from tyrosine to phenylalanine does not allow the cleavage reaction to proceed but retains the ssDNA-binding ability of the Rep (Larkin et al., 2005).

Figure 3
Ribbon model of HSV1 Rep showing the secondary-structure organization (left), consisting of β1, α1, β2, β3, α2, β4 and α3, and the orientation of the catalytic HUQ and tyrosine (Y81F in our model) motifs (right). In our noncoordinated structure, the noncoordinating amino acid 'U' is oriented away from the binding site and is therefore not shown here.

Figure 4
Structural alignment of HSV1 Rep (gray) with WDV Rep (orange, PDB entry 6q1m) on the left and PCV2 Rep (green, PDB entry 5xor) on the right. Superimpositions illustrate the structural conservation among the Reps (top) and the orientation of the catalytic HUH/Q and tyrosine residues (bottom). The r.m.s.d. on superimposition of HSV1 and WDV Rep is 2.4 Å and that for HSV1 and PCV2 Rep is 3.3 Å.
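The r.m.s.d. values above summarize how well two Rep structures overlay after optimal rigid-body superposition. The sketch below uses the Kabsch algorithm (SVD of the coordinate covariance matrix) on matched (N, 3) coordinate arrays; which atoms are paired, and which superposition program produced the reported values, is not specified here, so this is an illustration of the metric rather than a reproduction of the figure.

```python
import numpy as np


def kabsch_rmsd(P, Q) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal superposition.

    P and Q must list equivalent atoms (e.g. aligned C-alpha atoms) in order.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of equal shape")
    # Remove translation by centring both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # SVD of the 3x3 covariance matrix gives the optimal rotation.
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # Flip the last axis if needed so that R is a proper rotation (no mirror).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


# Toy check: a rotated and translated copy superimposes exactly.
P = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], dtype=float)
Rz = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
Q = P @ Rz.T + np.array([5.0, -2.0, 3.0])
print(kabsch_rmsd(P, Q))
```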

Figure 5
Structural and sequence comparison of HSV1 Rep with WDV Rep and PCV2 Rep using PROMALS3D (Pei et al., 2008). β-Strands are shown in blue and α-helices in red. Consensus amino-acid symbols: conserved amino acids are shown as bold uppercase letters; h, hydrophobic; s, small; p, polar; c, charged; -, negatively charged.

Table 3
Data collection and processing.