Structure of an H3N2 influenza virus nucleoprotein

The influenza virus nucleoprotein binds to the viral RNA genome and is essential for virus replication. Here, the structure of the nucleoprotein from an H3N2 virus is presented at 2.2 Å resolution.


Introduction
Influenza A viruses (IAVs) make a large contribution to the seasonal influenza burden and have established pandemic potential. The major antigenic components of IAVs are the hemagglutinin and neuraminidase proteins that decorate the viral envelope. These proteins are used to classify IAVs into different subtypes by assigning them H and N numbers (for example H1N1, H3N2 and H5N1). IAVs have a broad host range, covering a wide variety of mammals and birds. However, currently only IAVs of two subtypes, H1N1 and H3N2, exhibit sustained human-to-human transmission.
The IAV genome consists of eight segments of negative-sense RNA (vRNA), each encoding at least one essential protein. Each segment is assembled into a ribonucleoprotein complex, with the 5′ and 3′ termini both bound by the trimeric influenza virus polymerase. The rest of the segment is bound, on average, every 25 nucleobases (Ortega et al., 2000; Hutchinson et al., 2014) by the 56 kDa influenza virus nucleoprotein (NP). The NP forms homo-oligomers along the vRNA by inserting a loop, located close to its C-terminal tail, into the body domain of a neighbouring NP. NP is a multifunctional protein that influences the structure of the vRNA (Lee et al., 2017; Williams, Townsend et al., 2018; Dadonaite et al., 2019), with essential roles in nuclear trafficking of vRNAs (O'Neill et al., 1995) and replication (Portela & Digard, 2002).
Structures have been determined of NPs from influenza A, B (Ng et al., 2012) and D (Donchet et al., 2019) viruses. For IAVs, these include the A/WSN/1933 H1N1 (WSN; Ye et al., 2006) and A/Hong Kong/483/97 H5N1 (HK97; Ng et al., 2008) viruses. The structure of a monomeric mutant of the WSN NP, containing an R416A mutation (located in the oligomerization loop), has also been determined (Chenavas et al., 2013). No high-resolution structure has been determined of an NP associated with RNA. However, mutational analysis and structural information suggest that RNA binding is mediated by a basic groove located between the head and body domains (Ye et al., 2006; Elton et al., 1999). This basic groove is thought to associate with the phosphate backbone of the RNA. The interaction does not exhibit sequence specificity (Williams, Townsend et al., 2018), and NP also associates with single-stranded DNA (Newcomb et al., 2009).
Efforts are under way to develop antiviral therapeutics targeting NP (Hu et al., 2017), as well as to use it as a target for universal vaccines that are effective against multiple strains of influenza (Sun et al., 2020; Pleguezuelos et al., 2020). This work may be aided by a greater understanding of NP conservation at a structural level; however, no structure has previously been determined of an NP from an H3N2 virus. Here, we present the structure of the NP from the A/Northern Territory/60/1968 (H3N2) influenza virus (NT60) and discuss how it differs from previously determined influenza virus NP structures.

Macromolecule production
The sequence for the NT60 NP containing an R416A mutation was amplified from the vector pFL-TAP-NP R416A (Turrell, 2015). The R416A mutation was used as it has previously been shown to make the NP monomeric, and we reasoned that this construct would be more suitable for crystallization (Ye et al., 2006). This sequence was optimized for expression in Spodoptera frugiperda insect cells. Primers containing overhangs were used to generate a fragment with BamHI and EcoRI restriction sites at either end. The resulting fragment was ligated into the pGEX-6P-1 expression vector (for expression with an N-terminal glutathione S-transferase tag) and transformed into competent DH5α cells. Plasmid DNA was then extracted using a QIAprep Spin Miniprep kit (Qiagen) and successful integration of the insert was confirmed by sequencing. The constructs were transformed into Escherichia coli BL21 (DE3) cells and a 50% (v/v) glycerol stock was produced and stored at −80 °C.
Bacteria from the glycerol stock were used to inoculate 10 ml lysogeny broth (LB) containing ampicillin (100 µg ml−1) and grown at 37 °C overnight. The following day, the overnight culture was used to inoculate 2 l LB medium at a ratio of 1:100. Once an OD600 of 0.6 had been reached, protein expression was induced by the addition of isopropyl β-D-1-thiogalactopyranoside to a final concentration of 1 mM and the temperature was reduced to 18 °C. After 16 h, the bacteria were pelleted by centrifugation at 4000g for 15 min at 4 °C. The pellets were then resuspended in 25 ml wash buffer [50 mM HEPES-NaOH pH 7.5, 500 mM NaCl, 10% (v/v) glycerol, 0.05% (w/v) octyl β-D-1-thioglucopyranoside (OTG)] with the addition of 50 µl 1 M DTT, 2.5 mg RNase A, one SIGMAFAST protease inhibitor tablet (Sigma), 10 µl Benzonase Nuclease (Sigma) and 35 mg lysozyme. The resuspended cells were then lysed by sonication. The lysed cells were centrifuged at 35 000g for 45 min at 4 °C. 1 ml Glutathione Sepharose 4B beads (GE Healthcare) was added to the clarified supernatant, which was incubated at 4 °C for 3 h with gentle rotation. The beads were collected by centrifugation at 2000g at 4 °C for 3 min and the supernatant was removed. The beads were then washed five times with high-salt wash buffer [50 mM HEPES-NaOH pH 7.5, 1.5 M NaCl, 10% (v/v) glycerol, 0.05% (w/v) OTG]. The beads were washed again with wash buffer containing 5 mM DTT before being resuspended in 10 ml wash buffer supplemented with 5 mM DTT, 0.2 mg HRV 3C protease and 5 µl Benzonase Nuclease. After incubation overnight at 4 °C with gentle rotation, the beads were pelleted at 2000g for 5 min at 4 °C. The supernatant containing released protein was then collected and concentrated.
The concentrated protein was applied onto a Superdex 200 Increase 10/300 GL column (GE Healthcare) equilibrated with a buffer consisting of 25 mM HEPES-NaOH pH 7.5, 150 mM NaCl. The fractions containing the NP were pooled and concentrated in a 30 kDa Millipore Protein Concentrator to a protein concentration of ~20 mg ml−1. The correct molecular weight of the protein was confirmed by SDS-PAGE with Coomassie Blue staining. The protein was flash-frozen using liquid nitrogen and stored at −80 °C. Macromolecule-production information is summarized in Table 1.

Assessment of nucleic acid binding
The ability of the purified R416A NP to bind RNA was assessed by mixing a 1:1 molar ratio of NP with RNA of either five (5′-AGUAG-3′) or 14 (5′-CCUCUGCUUCUGCU-3′) nucleotides in length in buffer consisting of 25 mM HEPES-NaOH pH 7.5, 150 mM NaCl. After incubation for 10 min at room temperature, the mixture was subjected to size-exclusion chromatography (SEC) as described earlier. The NP-containing fraction was collected and the A260/A280 ratio was assessed. The ability of the purified NP to bind DNA was assessed by mixing the purified NP in a 4:1 molar ratio with a 100-nucleotide DNA in buffer consisting of 25 mM HEPES-NaOH pH 7.5, 150 mM NaCl. After incubation for 10 min at room temperature, the mixture was subjected to SEC. RNA binding was further investigated using a ThermoFluor assay (Walter et al., 2012) with a G nucleotide, a 5′-AG-3′ dinucleotide or the oligonucleotides 5′-UAUGAGGC-3′, 5′-AAAAAAAAAAAA-3′ and 5′-GUAUAUGAGGCCCA-3′. Each sample was analysed in triplicate in a 96-well PCR plate in an Mx3005P qPCR System (Agilent). The excitation filter was set to 492 nm and the emission filter to 585 nm. Data were collected in the range 25-95 °C using an 'expanding sawtooth' profile in which fluorescence is always recorded at 25 °C after 30 s incubations at increasing temperatures. A total volume of 40 µl was used (buffer: 25 mM HEPES-NaOH pH 7.5, 150 mM NaCl) containing 3 µg NP, 20 µM RNA and a 1:100 dilution of SYPRO Orange (Invitrogen). Melting curves were fitted and melting temperatures were determined using the JTSA web server (Bond, 2017).

Table 1. Macromolecule-production information.
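As a rough illustration of how a melting temperature can be read from a ThermoFluor curve, the Python sketch below takes the temperature at which the fluorescence rises most steeply. This is a simplification and is not the fitting procedure of the JTSA web server used in this work. The sigmoid model, its parameters and the two midpoint values are synthetic and were chosen only for demonstration; they are not measured data.

```python
import math


def boltzmann(temp_c, f_min, f_max, tm, slope):
    """Two-state sigmoid commonly used to describe a thermal unfolding curve."""
    return f_min + (f_max - f_min) / (1.0 + math.exp((tm - temp_c) / slope))


def estimate_tm(temps, fluorescence):
    """Return the temperature with the largest dF/dT (central differences)."""
    best_temp, best_slope = None, float("-inf")
    for i in range(1, len(temps) - 1):
        d = (fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        if d > best_slope:
            best_temp, best_slope = temps[i], d
    return best_temp


# Synthetic curves over the 25-95 degC range in 0.5 degC steps.
# The midpoints (50 and 53 degC) are arbitrary illustrative values.
temps = [25 + 0.5 * i for i in range(141)]
apo_curve = [boltzmann(t, 100.0, 1000.0, 50.0, 1.5) for t in temps]
rna_curve = [boltzmann(t, 100.0, 1000.0, 53.0, 1.5) for t in temps]

tm_apo = estimate_tm(temps, apo_curve)
tm_rna = estimate_tm(temps, rna_curve)
delta_tm = tm_rna - tm_apo  # stabilization attributed to the ligand
print(tm_apo, tm_rna, delta_tm)
```

With real data, noise usually makes a model fit more robust than a raw derivative, which is one reason a dedicated fitting tool is preferable.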

Crystallization
The protein was diluted to 10 mg ml−1 in a buffer consisting of 25 mM HEPES-NaOH pH 7.5, 150 mM NaCl. Crystallization trials were undertaken in Swissci 3-drop plates with a drop volume of 200 nl. The conditions which yielded the best-diffracting crystal are summarized in Table 2.
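The concentrations quoted for crystallization can be related to molar quantities with some simple arithmetic, sketched here in Python. The sketch assumes the approximate NP mass of 56 kDa given in the Introduction (the exact mass of the R416A construct may differ slightly), treats the stock as exactly 20 mg ml−1 and uses a hypothetical final volume of 50 µl. It also assumes that the 1.7-fold molar excess of RNA mentioned in the Results was calculated relative to NP at 10 mg ml−1, which the text does not state.

```python
NP_MASS_DA = 56_000.0  # approximate NP mass (56 kDa) quoted in the Introduction


def mg_per_ml_to_micromolar(conc_mg_ml, mass_da):
    """Convert mg/ml (equal to g/l) into micromolar using the molar mass in Da."""
    return conc_mg_ml / mass_da * 1e6


def dilution_volumes(stock_conc, target_conc, final_volume):
    """Apply C1*V1 = C2*V2 and return (stock volume, buffer volume)."""
    if target_conc > stock_conc:
        raise ValueError("target concentration exceeds stock concentration")
    stock_volume = target_conc * final_volume / stock_conc
    return stock_volume, final_volume - stock_volume


# NP at 10 mg/ml expressed as a molar concentration
np_um = mg_per_ml_to_micromolar(10.0, NP_MASS_DA)

# RNA concentration for a 1.7-fold molar excess (assumption: relative to np_um)
rna_um = 1.7 * np_um

# Diluting a ~20 mg/ml stock to 10 mg/ml in a hypothetical 50 ul final volume
stock_ul, buffer_ul = dilution_volumes(20.0, 10.0, 50.0)
print(round(np_um, 1), round(rna_um, 1), stock_ul, buffer_ul)
```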

Data collection and processing
A number of data sets were collected from cryocooled crystals at Diamond Light Source (DLS), Didcot, UK. Data-collection parameters and merging statistics for the best-diffracting crystal are summarized in Table 3. Data were processed using autoPROC (Vonrhein et al., 2011) and an anisotropic cutoff was applied to the data using STARANISO (Tickle et al., 2018). The data were weakly anisotropic and were thus truncated anisotropically, giving rise to low spherical completeness and I/σ(I) values.

Structure solution and refinement
The data quality was assessed for pathologies using phenix.xtriage (Zwart et al., 2005). The structure was then solved by molecular replacement in Phaser (McCoy et al., 2007) using a previously determined WSN R416A NP model (PDB entry 3zdp; Chenavas et al., 2013). Iterative rounds of automated refinement were performed in phenix.refine (Afonine et al., 2012) and manual model adjustment in Coot (Emsley et al., 2010). MolProbity (Williams, Headd et al., 2018) was used throughout for model validation. Data have been deposited in the PDB with the accession code 7nt8. Structural figures were all prepared using ChimeraX (Pettersen et al., 2021). Refinement statistics are summarized in Table 4.

Results and discussion
The NT60 monomeric mutant R416A NP was expressed in E. coli. After multiple high-salt washes and nuclease treatment, the protein was purified by SEC. A single symmetric peak was observed during SEC, which eluted at a volume consistent with the mass of monomeric NP (Fig. 1a). The peak position and the A260/A280 ratio of 0.49 indicate that the NP was successfully stripped of endogenous nucleic acids from the expression host. The ability of the monomeric R416A NP to bind RNA was investigated using a ThermoFluor assay, in which the melting temperature of the NP was determined in association with RNAs of different lengths (Fig. 1b). The melting temperature of the NP mixed with a 14-nucleotide RNA was increased by 2.8 °C compared with that of the NP in the absence of RNA (p < 0.0001, one-way ANOVA), suggesting that this association increased the stability of the NP. Shorter oligoribonucleotides did not significantly increase the melting temperature, although it cannot be excluded that the NP could be stabilized by shorter RNAs with different sequences.
RNA binding was further assessed by mixing the purified R416A NP in a 1:1 molar ratio with a five- or 14-nucleotide RNA. The A260/A280 ratio of the NP-containing fraction was then measured post-SEC. The NP mixed with the five-nucleotide RNA gave an A260/A280 value of 0.53 and the NP mixed with the 14-nucleotide RNA gave a value of 1.05. This indicates that the 14-nucleotide RNA is able to associate with the NP strongly enough to remain bound through SEC, but the five-nucleotide RNA is not. The ability of the R416A NP to bind DNA was also assessed by mixing purified NP in a 4:1 molar ratio with a 100-nucleotide DNA and performing SEC. This produced a second, earlier elution peak (Fig. 1a) that is likely to represent multiple NPs associating with a single piece of DNA (100-nucleotide DNA has a mass of ~30.7 kDa).
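The masses involved in the DNA-binding experiment can be put side by side with a short calculation, sketched here in Python. The average mass of 307 Da per nucleotide is an assumption chosen to be consistent with the ~30.7 kDa quoted for the 100-nucleotide DNA, the DNA is assumed to be single-stranded and the NP mass of 56 kDa is the approximate value from the Introduction. The fully loaded complex of four NPs on one DNA is a hypothetical upper bound implied by the 4:1 mixing ratio; the SEC data do not establish the actual stoichiometry.

```python
AVG_SSDNA_NT_MASS_DA = 307.0  # assumed average mass per nucleotide (see lead-in)
NP_MASS_KDA = 56.0            # approximate NP mass quoted in the Introduction


def ssdna_mass_kda(n_nucleotides, avg_nt_mass_da=AVG_SSDNA_NT_MASS_DA):
    """Approximate mass of a single-stranded DNA from its length."""
    return n_nucleotides * avg_nt_mass_da / 1000.0


def complex_mass_kda(n_np, dna_nucleotides):
    """Mass of a hypothetical complex of n_np NP monomers on one DNA strand."""
    return n_np * NP_MASS_KDA + ssdna_mass_kda(dna_nucleotides)


dna_kda = ssdna_mass_kda(100)              # the 100-nucleotide DNA alone
nt_per_np = 100 / 4                        # nucleotides available per NP at 4:1
full_complex_kda = complex_mass_kda(4, 100)
print(dna_kda, nt_per_np, full_complex_kda)
```

At a 4:1 ratio each NP has 25 nucleotides available, which matches the average NP spacing on vRNA cited in the Introduction.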
A range of crystallization trials were set up for the NT60 R416A NP both in the presence and in the absence of a 1.7-fold molar excess of 14-nucleotide RNA. Despite its ability to bind to and be stabilized by 14-nucleotide RNA, no RNA could be resolved from the crystals produced in its presence. The best-diffracting crystal produced in the absence of RNA gave a maximum resolution of 2.2 Å (Fig. 1c), with two NPs per asymmetric unit (referred to as chains A and B), in space group P2₁. The NP structure consists of head and body domains composed primarily of α-helices. A basic groove, thought to be the site of RNA binding, lies at the interface of these two domains. This groove contains a large number of arginine and lysine residues that, whilst located closely together in the folded structure, are dispersed widely in the protein sequence. Both NP chains are resolved from residues 21 to 389. Most of the oligomerization loop could not be resolved, with residues 390-417 and 390-437 disordered in chains A and B, respectively. At the C-terminus, residues 452-461 and 497-498 were not resolved.
The amino-acid sequence of the NP is highly conserved amongst IAVs. The NT60 NP shares 93.6% and 91.4% amino-acid sequence identity with the WSN (H1N1) and HK97 (H5N1) NPs, respectively, for which structures have previously been determined. The structure of the NT60 R416A NP is highly similar to other published IAV NP structures, with root-mean-square deviations of 1.2 Å compared with the WSN R416A NP (across 439 pairs), 4.0 Å compared with the WSN NP (across 393 pairs) and 5.5 Å compared with the HK97 NP (across 429 pairs). The differences in the amino-acid sequences of these three IAV NPs are widely dispersed both at the sequence level (Fig. 2a) and the structural level (Fig. 2b). Only one nonconserved residue is present in the basic region forming the predicted RNA-binding groove (Fig. 2c). A lysine at position 77 in the NT60 and WSN NPs is replaced by an arginine in the HK97 NP, maintaining the basic charge.
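For readers less familiar with the two comparison metrics used here, the following Python sketch shows one common way to compute each. How the values in this work were calculated is not stated, so the conventions are assumptions: identity is taken over aligned columns in which neither sequence has a gap, and the root-mean-square deviation is taken over atom pairs that have already been superposed. The sequences and coordinates are toy values, not NP data.

```python
import math


def percent_identity(aligned_a, aligned_b):
    """Percentage of identical residues over gap-free aligned columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation over paired, already superposed atoms."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy alignment: 10 gap-free columns, 9 of which are identical
identity = percent_identity("ACDEF-GHIKL", "ACDEFWGHIKV")

# Toy coordinates (in angstroms): each atom pair differs by 1 along z
deviation = rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                 [(0.0, 0.0, 1.0), (1.0, 0.0, -1.0)])
print(identity, deviation)
```

Because the deviation is averaged only over the paired atoms, values obtained over different numbers of pairs (such as the 439, 393 and 429 pairs quoted here) are not strictly comparable.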
The major difference between the structure presented here and those previously determined is the position of the 73-90 loop, the deletion of which produces an approximately fivefold decrease in RNA-binding affinity (Ng et al., 2008). In the H1N1 R416A structure, residues 82-89 of this loop extend into the putative RNA-binding site, whilst in the H5N1 model these residues are disordered and were not modelled. In chain A of our model we observe that residues 82-89 point away from the RNA-binding groove (Fig. 2d). The density for this region is incomplete in chain B. The 73-81 region of the loop appears to adopt a more conserved structure. This region of the loop appears to be critical to RNA binding, with simultaneous mutation of the Arg74 and Arg75 residues along with Arg174, Arg175 and Arg221 (which are located on the opposite side of the RNA-binding groove) having been shown to abolish RNA binding (Ng et al., 2008).
We observe that the C-terminus of the NT60 R416A NP folds towards the RNA-binding groove. This was observed for the R416A WSN monomeric mutant NP structure but not in the oligomeric structures. It has been suggested that this folding of the tail reduces the positive charge of this groove (Chenavas et al., 2013) and may explain the reduced RNA-binding affinity of the monomeric mutant (Elton et al., 1999).
We have presented the structure of the NT60 R416A NP at 2.2 Å resolution. The structure is highly similar to previously reported NP structures, but contributes to our understanding of structural conservation amongst the NPs from IAVs. This may aid in the design of therapeutics with activity against multiple subtypes of IAV to improve responses to future epidemic and pandemic events.