The crystal structure of PD1, a Haemophilus surface fibril domain

The crystal structure of PD1 from the Haemophilus surface fibril was determined at a resolution of 3.3 Å, revealing a novel domain arrangement.


Introduction
Haemophilus influenzae is a Gram-negative, facultatively anaerobic bacterium that commonly causes upper respiratory tract infections, pneumonia and acute meningitis (Danovaro-Holliday et al., 2008; Murphy et al., 2009). Different strains of H. influenzae are either encapsulated or unencapsulated, with the former subdivided into serotypes a-f and the latter described as nontypeable (Barenkamp & St Geme, 1996). H. influenzae infection is established by adherence of the pathogen to the host epithelial cell linings and various extracellular matrix (ECM) proteins (e.g. vitronectin), in a process mediated by many pilus and nonpilus adhesive factors (Cotter et al., 2005; Virkola et al., 2000; Hallström et al., 2006). Adhesion allows the bacterium to avoid clearance by the host, and facilitates the establishment of a deep-seated infection via numerous virulence mechanisms. While all strains of H. influenzae are pathogenic, it is the virulent type b (Hib) that, before the introduction of an effective vaccine in the 1990s, accounted for the greatest rates of patient morbidity and mortality. One virulence factor utilized by Hib is the Haemophilus surface fibril (Hsf), a trimeric autotransporter adhesin (TAA) protein that shares significant homology with another, better-characterized H. influenzae TAA known as Hia (Cotter et al., 2005; Singh et al., 2015).
TAAs, which are part of the type V family of secreted proteins, have three major types of domains arranged in a linear fibril 'lollipop' structure. Head and stalk domains are interspersed from the N-terminus in the extracellular region.
Head domains, which are formed from β-sheets with either transversal architectures, such as the YadA-like (YIhead) domains (Nummelin et al., 2004), or interleaved architectures, such as the tryptophan-ring (TrpRing) domains (Szczesny et al., 2008), typically mediate the adhesive activity of the proteins. The stalk forms a trimeric coiled-coil structure, with periodicity varying from heptads to pentadecads depending on the degree and direction of supercoiling (Hernandez Alvarez et al., 2010). Finally, the C-terminal translocator domain is a trimeric β-barrel, with each subunit contributing one amphipathic α-helix plus four β-sheets (Meng et al., 2008). This domain is responsible for the translocation of the remainder of the protein through the membrane and is found in all TAAs (Lehr et al., 2010). The highly conserved nature of this domain is in contrast to the diversity observed in the TAA stalk and head domains.
Recent studies have suggested that Hsf has an apparently novel 'hairpin-like' structure, based on EM images (Singh et al., 2015). In their shared regions, Hia and Hsf have 72% sequence identity (Hia161-1098 and Hsf1484-2413; Supplementary Fig. S1), but full-length trimeric Hsf (~750 kDa) is more than double the size of Hia (~340 kDa). The two binding domains of Hia (HiaBD1 and HiaBD2) have also been identified in Hsf (Laarmann et al., 2002); unlike Hia, however, Hsf has an additional binding domain (HsfBD3) and three putative domains, the structure and function of which are unknown. Moreover, a limited in silico approach to modelling the domains of Hsf revealed that it is likely to be a linear TAA of ~200 nm in length (Singh et al., 2015). Despite this, electron micrographs of Hsf expressed in H. influenzae RM804 appeared to show Hsf not as a linear TAA but as a double-folded hairpin-loop structure. Mapping of the domain arrangement suggested that the N-terminus of Hsf is located close to the membrane, consistent with the 'hairpin-like' hypothesis.
In addition to its adhesive function, Hsf has been shown to bind the complement inhibitor vitronectin (Vn): the interaction has been mapped to HsfBD2 and the C-terminal Vn residues 352-374 (Hallström et al., 2006; Singh et al., 2014). Acquisition of this glycoprotein, which is found in both serum and the ECM, allows H. influenzae to evade the complement system and adhere better to the epithelial surface, augmenting bacterial virulence. This may partly explain why, in contrast to Hia, Hsf is expressed in the most virulent, typeable strains of H. influenzae.

Figure 1. SDS-PAGE, native PAGE and SEC-MALLS demonstrating that the PD1 domains cause irreversible aggregation. (a) SEC-MALLS chromatogram of IMAC-purified PD1-GCN4. The degree of aggregation, as observed by the LS peak at the void volume (~5 min), the multiple dRI and UV peaks, and an incorrect molecular weight, demonstrated that the purified protein was not amenable to crystallization. (b) Native PAGE of PD1-GCN4 in the presence of increasing concentrations of urea. Increasing the urea concentration had no effect on migration and hence no effect on aggregation. Lane 1, carbonic anhydrase; lanes 3-7, PD1-GCN4 in urea at varying (0, 0.5, 1, 2 and 4 M) concentrations. (c) SDS-PAGE of HsfPD1 purified by IMAC (lanes 1-8) and SEC (lanes 9-13). High levels of expression were evident (lanes 4, 5 and 6) after the proteins were separated on a gradient gel (4-20%) and visualized with Coomassie Blue. Lane M, molecular-weight marker (labelled in kDa); lane 1, unbound; lanes 2-3 and 8, wash; lanes 4-7, IMAC elution fractions; lanes 9-13, SEC fractions. (d) SEC-MALLS chromatogram of IMAC- and SEC-purified HsfPD1 [the peak corresponds to one SEC fraction, lane 12 in the SDS-PAGE gel in (c)]. Alignment of the LS, UV and dRI peaks, and a correct molecular weight, confirmed the presence of trimeric, non-aggregating protein.
Here, we report the crystal structure of an Hsf putative domain, PD1. This structure reveals a novel domain arrangement for PD1, N-TrpRing:KG:TrpRing-C, and hence replaces the domain architecture previously described by in silico sequence analysis. This work forms part of an ongoing effort to determine the full-length structure of Hsf and to establish whether this TAA adopts the hypothesized novel 'hairpin-like' structure (Singh et al., 2015).

Materials and methods
2.1. Macromolecule production
2.1.1. PD1-GCN4
The Hsf domain PD1 was cloned between two GCN4 anchor proteins. GCN4 is a well characterized yeast transcription factor that forms a coiled-coil dimer in its native state. However, mutagenesis of specific residues in its hydrophobic core allows GCN4 to adopt various oligomeric states. Owing to this, variations of GCN4 are often used as partners for fusion proteins to facilitate stable oligomerization. In this case, the idea was to add a well characterized trimer-forming variant of GCN4 to both the N- and C-terminus to facilitate the stable trimerization of HsfPD1 (Hernandez Alvarez et al., 2008), as successfully used by the Lupas group in a number of structures (Hartmann et al., 2012; Koiwai et al., 2016). This fusion protein, PD1-GCN4, was expressed from a pIBA-PD1-GCN4tri-His6 plasmid generated using restriction-free (RF) cloning. The PD1 gene was amplified by polymerase chain reaction (PCR) from a pET-16b-hsf1-2414 plasmid. The primers were designed to generate a 'megaprimer' containing the PD1 gene with complementary overhangs to the destination vector, pIBA-GCN4tri-His6 (Supplementary Table S1). pIBA-GCN4tri-His6 was linearized by restriction digestion with XhoI (New England Biolabs) and used as a template in a secondary round of PCR to insert the PD1 gene (contained within the 'megaprimer') into the plasmid. Expression of PD1-GCN4 was induced at an OD600 of 0.6 by the addition of anhydrotetracycline hydrochloride to a final concentration of 8.6 mM for 4 h. The cells were collected by centrifugation (2000g for 10 min at 277 K), stored at 193 K overnight and resuspended in buffer A consisting of 50 mM NaH2PO4, 500 mM NaCl pH 8.0. The cells were lysed by sonication and supernatants were collected by centrifugation (16 000g for 10 min at 277 K). The protein was purified by immobilized metal ion-affinity chromatography (IMAC).
The cleared supernatant containing PD1-GCN4 was applied onto a Ni-NTA agarose column (GE Healthcare) previously equilibrated with buffer A (2 × 6 ml; three column volumes) and allowed to bind for 1 h with agitation. Proteins were eluted in buffer B consisting of 50 mM NaH2PO4, 500 mM NaCl, 10% glycerol, 300 mM imidazole pH 8.0. The quality of the purified protein was assessed by size-exclusion chromatography coupled to a multi-angle laser light scattering (SEC-MALLS) apparatus (Fig. 1a). SEC-MALLS was carried out using a Superdex 200 5/150 column pre-equilibrated with buffer C consisting of 50 mM Tris, 500 mM NaCl, 10% glycerol pH 8.0 at a flow rate of 0.2 ml min⁻¹ and was detected using a DAWN 8+ multi-angle light-scattering (LS) detector, an Optilab T-rEX differential refractive-index (dRI) detector and a UV-absorbance (UV) detector (Wyatt).
To prevent the aggregation of PD1-GCN4 (demonstrated by SEC-MALLS), the purification was repeated in the presence of increasing concentrations of urea. The protein was expressed and the cells were lysed as above. Subsequently, the cleared supernatant was applied onto Ni-NTA agarose resin (2 ml) and allowed to bind for 1 h with agitation. Purification was performed in batch mode. The resin was washed with buffer A (2 × 6 ml; three column volumes) and then divided into five equal volumes for elution of protein in different buffers: buffer B containing 0, 0.5, 1, 2 and 4 M urea. The protein was eluted and fractions were collected for native PAGE analysis (Fig. 1b).
2.1.2. PD1
Owing to aggregation problems with PD1-GCN4, we also expressed PD1 from a pET28-PD1-His6 plasmid generated using restriction-free (RF) cloning in the same way as PD1-GCN4 (§2.1.1). The PD1 gene was amplified by PCR from the pIBA-PD1-GCN4tri-His6 plasmid using primers capable of producing a 'megaprimer'. The pET-28-Tcfa-His6 destination vector was linearized by restriction digestion with XhoI and NcoI (New England Biolabs) to remove the tcfA gene (while retaining the His6 tag). This plasmid backbone was the template for a secondary round of PCR, utilizing the 'megaprimer', to insert the PD1 gene into the plasmid (Table 1). Expression of PD1 was induced at an OD600 of 0.6 by the addition of isopropyl β-D-1-thiogalactopyranoside (IPTG) to a final concentration of 0.5 mM for 4 h. The cells were collected and stored as before (§2.1.1) and resuspended in buffer C consisting of 50 mM Tris, 150 mM NaCl pH 8.0. The cells were lysed by sonication and supernatants were collected by centrifugation (16 000g for 10 min at 277 K). The protein was purified via IMAC on a Ni-NTA agarose column previously equilibrated with buffer C (2 × 6 ml; three column volumes) and allowed to bind for 1 h with agitation. Proteins were eluted in buffer D consisting of 50 mM Tris, 150 mM NaCl, 300 mM imidazole pH 8.0, and the pooled fractions were concentrated to 500 µl. Further purification was carried out by size-exclusion chromatography (SEC) on a Superdex 200 10/300 column pre-equilibrated with buffer E consisting of 50 mM Tris, 600 mM NaCl pH 8.0 and eluting imidazole-free protein at a flow rate of 0.2 ml min⁻¹. The purified fractions were then pooled and concentrated to 15 mg ml⁻¹ for crystallization. The quality of the purified protein was assessed prior to crystallization by SDS-PAGE and SEC-MALLS, carried out as described above, using buffer C (Figs. 1c and 1d).

Table 1. PD1 production information.
Primer: GTGGTGGTGGTGCTCGAGGTCTAGTTTTAAGCCATCAGCCAC
Cloning vector: pET-28
Expression vector: pET-28
Expression host: Escherichia coli BL21*
Complete amino-acid sequence of the construct produced: MDFVSGDKDTTSVTVESKDNGKRTEVKIGAKTSVIKDHNGKLFTGKELKDANNNGVTVTETDGKDEGNGLVTAKAVIDAVNKAGWRVKTTGANGQNDDFATVASGTNVTFADGNGTTAEVTKANDGSITVKYNVKVADGLKLD

Crystallization
Initial PD1 crystals were obtained using the Wizard Classic 3 and 4 crystallization screens (Molecular Dimensions) under the following conditions: protein concentration 15 mg ml⁻¹, 1 M LiCl, 0.1 M sodium citrate pH 4, 20%(w/v) polyethylene glycol (PEG) 6000. Crystallization was performed at 293 K using the sitting-drop vapour-diffusion method, in which 100 nl protein solution was mixed with an equal volume of reservoir solution. Drops were set up using a Formulatrix NT8 crystallization robot. Since the initial crystals diffracted poorly, further crystallization optimization of PD1 was performed. The best diffracting crystals grew from 0.75 M LiCl, 0.1 M sodium citrate pH 3.9, 17.6%(w/v) PEG 6000 (Table 2). Owing to the presence of PEG, this solution already had cryoprotectant properties and thus the crystals were flash-cooled directly in liquid nitrogen for data collection.

Data collection, processing and structure determination
Data were collected on beamline I03 at Diamond Light Source (DLS). Although radiation damage restricted the data set to the first 1100 images, the crystals diffracted to 3.3 Å resolution (Table 3). Indexing and integration were performed using XDS (Kabsch, 2010), while scaling and merging statistics were calculated using AIMLESS (Evans & Murshudov, 2013). The structure of PD1 was solved by molecular replacement (MR) with Phaser (McCoy et al., 2007), using the non-adhesive domain of Hia307-422 (PDB entry 3emi; Meng et al., 2008; 72.3% sequence identity) as the search model. Phaser found one unique solution in space group C2, with nine monomers in the asymmetric unit forming three trimers. The translation-function Z-score (TFZ) of 39.95 and log-likelihood gain (LLG) of 2010 indicated a correct MR solution. Refinement was carried out with PHENIX (Adams et al., 2010), using secondary-structure and noncrystallographic symmetry torsion restraints, and the structure was refined to an R factor of 0.296 (Table 4).

Purification of PD1-GCN4 and PD1
Initial efforts to purify PD1 utilizing GCN4 anchors (PD1-GCN4; Hernandez Alvarez et al., 2008; Deiss et al., 2014) were not successful owing to complete protein aggregation (Fig. 1a). This was unexpected, as 18 crystal structures of TAA domains have already been solved using this method. We thought that the aggregation might be owing to hydrophobic interactions between the head domains and/or owing to improper folding of these domains arising from their flanking by the GCN4. We therefore decided to purify the protein in the presence of increasing concentrations of urea (0.5-4 M) to prevent aggregation and to use native PAGE to assess the level of aggregation (Fig. 1b). However, this method was unsuccessful in reducing aggregation, as the protein still did not migrate as expected in the gel, suggesting that the GCN4 anchors cause extensive misfolding and not just a small amount of reversible aggregation. We therefore decided to remove the GCN4 anchors. The construct lacking GCN4 (§2.1.2) yielded protein that was amenable to crystallization. SEC-MALLS showed that the purified PD1 was trimeric (Fig. 1d); the molecular weight of the peak in the chromatogram was 45.6 kDa, as determined from the UV, LS and dRI signals using the ASTRA software (Wyatt). This is within experimental error (±5%) of the expected molecular weight of 47.8 kDa. It was clear from the chromatogram that no aggregates were present, and this construct yielded diffracting crystals (Supplementary Fig. S4).
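The comparison between the SEC-MALLS mass and the expected trimer mass can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the monomer mass is assumed to be one third of the expected 47.8 kDa trimer mass, which is an assumption of this example rather than a value reported separately in the text.

```python
def relative_deviation(measured_kda: float, expected_kda: float) -> float:
    """Fractional difference between measured and expected molecular weight."""
    return abs(measured_kda - expected_kda) / expected_kda


def nearest_oligomer(measured_kda: float, monomer_kda: float) -> int:
    """Closest integer number of chains consistent with the measured mass."""
    return round(measured_kda / monomer_kda)


measured = 45.6          # kDa, SEC-MALLS peak reported above
expected_trimer = 47.8   # kDa, expected trimer mass reported above
# Assumption for this sketch: three identical chains make up the expected mass.
monomer = expected_trimer / 3

deviation = relative_deviation(measured, expected_trimer)
chains = nearest_oligomer(measured, monomer)
print(f"deviation = {deviation:.1%}, nearest oligomeric state = {chains}")
```

With the reported values the deviation is about 4.6%, inside the quoted ±5% window, and the nearest integer oligomeric state is three.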

Structure of PD1
In our effort to characterize the full-length Hsf protein, we determined the crystal structure of trimeric PD1 at a resolution of 3.3 Å, thus providing the first insight into the molecular arrangement of Hsf to date. HsfPD1 crystallized in the monoclinic space group C2 (Table 1), with nine monomers in the asymmetric unit. The crystals had an estimated solvent content of 56.5%, with a Matthews coefficient (V_M) of 2.83 Å³ Da⁻¹. The number of residues identified in the density of each monomer varied between 129 and 133. The missing loops in the monomers are owing to poor electron density. Typical density is presented in Supplementary Fig. S3.
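The solvent content and the Matthews coefficient quoted above are linked by the standard Matthews relation, in which the protein fraction of the crystal volume is approximately 1.23/V_M. The short sketch below reproduces the reported solvent content from the reported V_M; the function name is hypothetical, and the constant 1.23 Å³ Da⁻¹ assumes a typical protein partial specific volume of about 0.74 cm³ g⁻¹.

```python
# Assumed constant: 1.23 A^3/Da corresponds to a typical protein
# partial specific volume of ~0.74 cm^3/g (Matthews relation).
PROTEIN_VOLUME_PER_DALTON = 1.23


def solvent_fraction(matthews_coefficient: float) -> float:
    """Estimate the solvent fraction of a protein crystal from V_M (A^3/Da)."""
    return 1.0 - PROTEIN_VOLUME_PER_DALTON / matthews_coefficient


v_m = 2.83  # A^3/Da, value reported for the HsfPD1 crystals
fraction = solvent_fraction(v_m)
print(f"estimated solvent content = {fraction:.1%}")
```

For V_M = 2.83 Å³ Da⁻¹ this gives a solvent content of roughly 56.5%, in agreement with the value stated in the text.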
The individual monomers of HsfPD1 are composed of three distinct domains that fold to form well characterized TAA domains. A proposed N-terminal TrpRing domain, a KG domain and a C-terminal TrpRing domain are seen in each PD1 monomer (Fig. 2b). The N-terminus of HsfPD1 spans 29 amino acids participating in the unexpected formation of three β-sheets, W1β1, W1β2 and W1β3 (where 'W' represents tryptophan), which share considerable structural homology with the C-terminal TrpRing domain. Although the sequence identity between these two regions is low (31%; Supplementary Fig. S2), a structural alignment of the 29 N- and C-terminal residues from one HsfPD1 monomer (Fig. 2c) confirms that the N-terminal region is indeed a TrpRing domain. The KG domain is composed of two β-strands, KGβ1 and KGβ2, as well as three α-helices, KGα1, KGα2 and KGα3. The C-terminal TrpRing is composed of five β-strands: W2β1, W2β2, W2β3, W2β4 and W2β5. All domains participate in extensive intertwining, where the C-terminal α-helices (KGα3) from each monomer come together to create the central core of the trimer interface. The KG and C-terminal TrpRing domains were easily identified by simple structural observation and comparison with other TAAs (Meng et al., 2008).
The TrpRing domains of TAAs are so named for the highly conserved tryptophan residue that resides at the beginning of the first β-strand. Owing to the structural homology between β-sheets W1β1, W1β2, W1β3 and the C-terminal TrpRing, we further analysed the full-length Hsf sequence and identified a tryptophan residue 27 residues upstream of our HsfPD1 N-terminus. Since our construct contained only 29 N-terminal residues upstream of the KG domain, and as TrpRing domains typically consist of ~55 amino acids, the structural evidence suggests that our N-terminal β-strands constitute the latter half of a TrpRing domain. Additionally, the interleaved nature of this proposed TrpRing domain and the fact that its N- and C-termini lie close to the trimer axis support this hypothesis. Prior to the solution of this structure, sequence analysis of full-length Hsf resulted in the annotation of HsfPD1 as a duplicate domain: N-KG:TrpRing-C (Singh et al., 2015). However, our crystal structure indicates a novel triplicate domain arrangement for HsfPD1, N-TrpRing:KG:TrpRing-C, an arrangement that is likely to extend to all Hsf putative domains.
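The residue counting behind the 'latter half of a TrpRing domain' argument can be made explicit. The sketch below is a simple illustration using only the numbers given above; the variable names are hypothetical, and it assumes that the upstream tryptophan marks the start of the domain and that the 27 upstream residues and the 29 construct residues are contiguous.

```python
# Values taken from the text above.
residues_in_construct = 29    # N-terminal residues upstream of the KG domain
residues_upstream_of_trp = 27 # distance from the construct N-terminus to the Trp
typical_trpring_length = 55   # approximate TrpRing size (~55 amino acids)

# Assumption for this sketch: the conserved Trp marks the start of the domain,
# so the putative full-length N-terminal TrpRing spans both segments.
putative_domain_length = residues_upstream_of_trp + residues_in_construct
fraction_in_construct = residues_in_construct / putative_domain_length

print(f"putative TrpRing length = {putative_domain_length} residues "
      f"(typical ~{typical_trpring_length})")
print(f"fraction present in the construct = {fraction_in_construct:.0%}")
```

Under these assumptions the putative domain spans 56 residues, close to the typical ~55, with roughly half of it present in the crystallized construct.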

Comparison of HsfPD1 with HiaBD1 and Hia307-422
Hsf and Hia are remarkably similar in their domain arrangement, as both possess adhesive domains (BDs) and domains of unknown function (PDs). Whilst BD domains have contiguous Neck-TrpRing architecture, the PD domains have a KG domain instead of the Neck domain. The adhesive activity of the BD domains in Hia results from the formation of an acidic pocket created by residues Asp618 and Ala620 of INα3, along with Val656 of the C-terminal TrpRing (Yeo et al., 2004; Cotter et al., 2005). The substitution of the Neck domain for KG domains abrogates the adhesive activity of PDs owing to the lack of an equivalent α-helix in KG to that of INα3 from the Neck domain. Indeed, a superposition of HsfPD1 with the non-adhesive head domain of Hia (PDB entry 3emi; Meng et al., 2008) shows strong structural similarity (Fig. 3a). In contrast, although a superposition of HsfPD1 with HiaBD1 (PDB entry 1s7m; Yeo et al., 2004) reveals modest structural similarity, the acidic pocket created by INα3 is clearly missing in HsfPD1 (Fig. 3b), suggesting that HsfPD1 is indeed a non-adhesive domain.

Evolution of putative domains in Hia and Hsf
Owing to the high sequence identity between the shared regions of Hia and Hsf (Supplementary Fig. S1), we predict that, had the N-terminus of the Hia307-422 construct been extended by ~40 residues, the same N-TrpRing:KG:TrpRing-C arrangement would have been observed. Moreover, this triplicate arrangement indicates an evolutionary link between BD and PD domains, in that BD domains are indeed triplicates, i.e. N-TrpRing:Neck:TrpRing-C, and Hsf is approximately double the length of Hia. Thus, PD domains may have evolved via the duplication of BD domains or vice versa. This duplication certainly contributes to the overall length of Hsf, and whilst it is consistently reported that the PD domains are of unknown function, one implication of this evolution is that the additional length created by these domains conveys a survival advantage on those strains of H. influenzae that express Hsf. This perhaps explains why Hsf is expressed by all typeable strains of H. influenzae (e.g. type b; Hib) whilst Hia is not, i.e. it is long enough to extend beyond the bacterial lipopolysaccharide layer and thus bind to complement regulators and ECM molecules to evade attack by the host.

Conclusion
Although HsfPD1 is in many respects a typical TAA domain, its novel domain arrangement (N-TrpRing:KG:TrpRing-C), which reveals the N-terminal TrpRing domain, demonstrates the necessity of structural characterization of such proteins, as opposed to sequence analysis alone. This arrangement yielded insights into the evolution of PD domains, supporting the divergent nature of TAAs, and supersedes the previous domain annotation. Furthermore, the structure of HsfPD1 will contribute to the understanding and determination of the hypothesized 'hairpin-like' structure of Hsf. Inclusion of the N-terminal TrpRing domain in computer models may help to refine them. Should the 'hairpin-like' hypothesis prove true, such refined models may reveal unique protein-protein interactions between antiparallel PD and BD domains, generating exciting insights into the structure of TAAs.