Structural basis of chitin utilization by a GH20 β-N-acetylglucosaminidase from Vibrio campbellii strain ATCC BAA-1116

Crystal structures of a GH20 β-N-acetylglucosaminidase from V. campbellii reveal substrate specificity in chitin utilization.

VhGlcNAcase from V. campbellii strain ATCC BAA-1116 has already been characterized by our group as an exo-acting enzyme that sequentially degrades short chitooligosaccharide chains (shorter than six sugar units) from the nonreducing end, generating GlcNAc monomers as the final products (Suginta et al., 2010; Meekrathok & Suginta, 2016; Sirimontree et al., 2016). The preferred substrate of VhGlcNAcase is chitotetraose [(GlcNAc)₄]. The enzyme can hydrolyze colloidal chitin, but its specific activity for the polymeric substrate is much lower than for chitin fragments (Suginta et al., 2010). We previously identified the gene encoding VhGlcNAcase in the chromosome of V. campbellii and cloned it into the pQE-60 expression vector, which is designed for expression in an Escherichia coli M15 (pREP4) host. In this study, we report the first crystal structures of a GH20 exo-β-N-acetylglucosaminidase from the marine species V. campbellii in the absence and the presence of the natural substrate (GlcNAc)₂, which was cleaved into GlcNAc during crystal soaking, so that one GlcNAc was bound in the active site. We also solved the structure of the D437A mutant of the same protein. Detailed crystallographic analysis revealed the important features of the enzyme for sugar binding and substrate specificity, and the roles of the active-site residues were elucidated by steady-state kinetic studies.

Experimental procedure

Mutant design and site-directed mutagenesis
Several active-site mutants were generated by the polymerase chain reaction (PCR) technique using the QuikChange Site-Directed Mutagenesis Kit (Stratagene, La Jolla, California, USA) according to the manufacturer's protocols. The full-length nag2 gene encoding VhGlcNAcase (residues 1-642), with ten additional residues at the C-terminus (RSRS residues of the cloning site followed by the His₆ tag), was cloned into the pQE-60 expression vector (Qiagen, Valencia, California, USA) as described in Suginta et al. (2010) and the recombinant plasmid harboring the nag2 gene was used as a DNA template for mutagenesis experiments. Table 1 provides a list of the mutagenic primers (BioDesign, Bangkok, Thailand and Bio Basic Canada, Ontario, Canada) with the mutated codons underlined. The correct mutations were confirmed by automated DNA sequencing (First BASE Laboratories, Seri Kembangan, Malaysia).

Protein expression and purification
Single colonies of the E. coli M15 (pREP4) host cells (Qiagen) transformed with the pQE-60/GlcNAcase construct were picked and grown overnight at 37 °C on Luria-Bertani (LB) agar plates containing 100 µg ml⁻¹ ampicillin (Amp) and 25 µg ml⁻¹ kanamycin (Kan). This overnight culture was used to inoculate a fresh 1 l culture at a 1:100 dilution. The fresh culture, in Terrific Broth (TB) medium supplemented with 100 µg ml⁻¹ Amp and 25 µg ml⁻¹ Kan, was grown at 37 °C with shaking at 200 rev min⁻¹ until an OD600 of 0.6 was reached. To induce expression of recombinant VhGlcNAcase, isopropyl β-D-1-thiogalactopyranoside (IPTG) was added to the culture medium to a final concentration of 0.4 mM. After 16 h of incubation at 20 °C, the cell pellet was collected by centrifugation and resuspended in 20 ml extraction buffer [20 mM Tris-HCl pH 8.0 containing 150 mM NaCl, 1 mM phenylmethylsulfonyl fluoride (PMSF), 5%(v/v) glycerol, 1 mg ml⁻¹ lysozyme and 1 mg ml⁻¹ DNase I] and then lysed on ice using a Sonopuls Ultrasonic homogenizer with a 6 mm diameter probe (50% duty cycle; amplitude setting 30%; total time 20 s; 6-8 repeats). Insoluble debris and unbroken cells were removed by centrifugation at 12 000g at 4 °C for 1 h and the supernatant was immediately applied onto a polypropylene column packed with 5 ml TALON Superflow metal-affinity resin (Clontech, USA) operated under gravitational flow at 4 °C. After sample application, unbound proteins were washed out with eight column volumes (CV) of equilibration buffer (20 mM Tris-HCl pH 8.0 containing 150 mM NaCl) followed by seven CV of the same buffer supplemented with 5 mM imidazole.
VhGlcNAcase was eluted by applying 3 × 10 ml of 150 mM imidazole in the equilibration buffer at pH 8.0. Fractions containing active VhGlcNAcase were then pooled and concentrated using Vivaspin-20 ultrafiltration membrane concentrators (Vivascience, Hanover, Germany) and further applied onto a HiPrep 16/60 Sephacryl S-200 column connected to an ÄKTAprime system (Amersham Bioscience, Piscataway, New Jersey, USA). VhGlcNAcase-containing fractions were pooled and concentrated, and the protein purity was confirmed by 12% SDS-PAGE. The final concentration of VhGlcNAcase was determined from the absorbance at 280 nm using a molar extinction coefficient of 118 720 M⁻¹ cm⁻¹ (Gill & von Hippel, 1989). The freshly prepared protein was aliquoted, flash-frozen in liquid nitrogen and stored at −80 °C until use.
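The concentration determination from the absorbance at 280 nm follows the Beer-Lambert law and can be sketched as below. This is only an illustration: the extinction coefficient and the expected molecular weight (74 kDa) are taken from the text, whereas the 1 cm path length, the dilution factor and the function names are assumptions introduced for the example.

```python
# Beer-Lambert estimate of protein concentration from A280 (illustrative sketch).
EXTINCTION_M_CM = 118_720.0  # molar extinction coefficient from the text (M^-1 cm^-1)
MW_G_PER_MOL = 74_000.0      # expected molecular weight quoted in the text (74 kDa)


def concentration_molar(a280, path_cm=1.0, dilution=1.0):
    """Return the molar concentration of the undiluted sample.

    path_cm and dilution are assumed parameters, not values from the text.
    """
    if a280 < 0 or path_cm <= 0 or dilution <= 0:
        raise ValueError("absorbance must be >= 0; path length and dilution must be > 0")
    # A = epsilon * c * l  ->  c = A / (epsilon * l), scaled back by the dilution factor
    return a280 * dilution / (EXTINCTION_M_CM * path_cm)


def concentration_mg_per_ml(a280, path_cm=1.0, dilution=1.0):
    """Convert the molar concentration to mg/ml (mol/l * g/mol = g/l = mg/ml)."""
    return concentration_molar(a280, path_cm, dilution) * MW_G_PER_MOL
```

For example, an absorbance of 1.1872 in a 1 cm cuvette would correspond to 10 µM protein under these assumptions.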

Investigation of the protein state by size-exclusion chromatography
The molecular weight (MW) of wild-type VhGlcNAcase was investigated using size-exclusion chromatography. A HiPrep 26/60 Sephacryl S-300 prepacked column connected to an ÄKTAprime system (GE Healthcare Biosciences, Bangkok, Thailand) was equilibrated with 20 mM Tris-HCl buffer pH 8.0 containing 150 mM NaCl and was operated at a flow rate of 2.0 ml min⁻¹. The gel-phase distribution coefficient (Kav) of the analyte between the stationary and mobile phases was calculated using equation (1), where Ve is the elution volume, Vo is the void volume and Vi is the volume of the stationary phase (Tayyab et al., 1991). The column was calibrated with the protein standards ribonuclease A (13.7 kDa), ovalbumin (43 kDa), bovine serum albumin (BSA; 66 kDa), aldolase (158 kDa), ferritin (440 kDa) and thyroglobulin (669 kDa). Blue dextran 2000 was used to determine the void volume Vo, while Nε-DNP-L-lysine hydrochloride (0.35 kDa) was added as a control for the retention volume of each protein and also to determine the total volume of the column. A plot of the Kav of individual standard proteins (calculated from equation 1) versus the logarithm of the MW yielded a linear calibration plot, which allowed the MW of VhGlcNAcase to be determined. To obtain the elution volume of VhGlcNAcase, the purified enzyme (4 mg) mixed with Nε-DNP-L-lysine was applied onto the HiPrep 26/60 Sephacryl S-300 gel-filtration column as specified. Fractions of 5 ml were collected, with VhGlcNAcase being eluted in two peaks: the first eluted close to the void peak, while the second eluted near the BSA peak. SDS-PAGE analysis and GlcNAcase activity assays showed that the first peak was aggregated and inactive enzyme, while the second peak was the active enzyme. The MW of the active VhGlcNAcase was therefore determined from the calibration curve as 76 kDa (expected value 74 kDa).
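The calibration procedure can be sketched as follows. The sketch assumes the commonly used definition Kav = (Ve − Vo)/(Vt − Vo), with Vt the total column volume, which may differ in notation from the authors' equation (1). The function names and the column volumes used in the usage note are hypothetical; only the standard-protein masses are taken from the text.

```python
import math

# Masses of the six calibration proteins listed in the text (kDa).
STANDARD_MW_KDA = [13.7, 43.0, 66.0, 158.0, 440.0, 669.0]


def kav(v_e, v_o, v_t):
    """Distribution coefficient, assuming Kav = (Ve - Vo) / (Vt - Vo)."""
    if v_t <= v_o:
        raise ValueError("total volume must exceed the void volume")
    return (v_e - v_o) / (v_t - v_o)


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mw_da(kav_value, slope, intercept):
    """Invert the calibration line Kav = slope * log10(MW) + intercept."""
    return 10 ** ((kav_value - intercept) / slope)
```

In use, one would compute Kav for each standard from its measured elution volume, fit the line against log10(MW), and then pass the Kav of the sample to `estimate_mw_da`. With a hypothetical void volume of 100 ml and total volume of 300 ml, an elution volume of 181 ml would give Kav = 0.405.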

Protein crystallization, data collection and processing
Preliminary crystallization of VhGlcNAcase, both in the apo form and in complex with N-acetylglucosamine (GlcNAc), was performed as described elsewhere (Meekrathok et al., 2015). Crystallization conditions were screened using sitting-drop vapor diffusion at 20 °C with the commercially available screens The JCSG Core Suites I, II, III and IV, The Classics and Classics II Suites, The PACT Suite, The PEGs Suite and The Anions Suite (Qiagen, Hilden, Germany). Under the optimized crystallization condition, 1.5 µl wild-type VhGlcNAcase solution (10 mg ml⁻¹) was mixed with 1.5 µl reservoir solution consisting of 0.1 M sodium acetate pH 4.6, 1.4 M sodium malonate, while the D437A mutant was mixed with 0.1 M bis-Tris pH 7.5, 0.1 M sodium acetate, 20%(w/v) PEG 3350. 3D plate-shaped crystals of the wild type grew at 20 °C within three days to dimensions of up to 400 × 200 × 20 µm. These wild-type crystals were subsequently soaked with 10 mM (GlcNAc)₂ in the corresponding mother liquor at 20 °C for a period of approximately 30 min, as described previously (Meekrathok et al., 2015). The native and soaked crystals were then transferred into a cryoprotectant solution containing the mother liquor with 2.9 M sodium malonate and 10 mM (GlcNAc)₂. The D437A mutant grew in a condition consisting of 0.1 M bis-Tris pH 7.5, 0.1 M sodium acetate, 20%(w/v) PEG 3350 and was transferred into a cryoprotectant solution consisting of mother liquor supplemented with 25%(v/v) glycerol. X-ray diffraction data were collected from all crystals at 100 K on the PX-II beamline at the Swiss Light Source in Villigen, Switzerland using a PILATUS 6M detector. The data-collection strategy was determined with iMosflm (Battye et al., 2011) from the CCP4 suite and the diffraction data were indexed, integrated and scaled using XDS (Kabsch, 2010). Crystallographic and refinement statistics are summarized in Table 2.

Table 1 Primers used for mutagenesis.
D303A, forward: 5′-CATTGGCATCTCACTGCGGATGAAGGCTGGCGTG-3′

Structure determination and refinement
Molecular replacement (MR) was employed to obtain phase information using Phaser (McCoy et al., 2007) from the CCP4 suite with the structure of β-hexosaminidase from Arthrobacter aurescens (PDB entry 3rcn; 35% identical to GlcNAcase from V. campbellii; Midwest Center for Structural Genomics, unpublished work) as a search model. The final model of wild-type VhGlcNAcase was subsequently used as a template to obtain the phases for the data sets for the VhGlcNAcase-substrate complex and the D437A mutant. Model building was performed by iterative cycles consisting of manual building in Coot (Emsley et al., 2010) and restrained refinement in REFMAC5 from the CCP4 suite. During the model-rebuilding process, electron density for only one GlcNAc molecule could be found in the structure even though the crystal was soaked with (GlcNAc)₂. The molecular topology of GlcNAc was taken from the Protein Data Bank (PDB entry 3gh5) and then modeled into the corresponding 2Fo − Fc and Fo − Fc maps. The crystallographic data and refinement statistics of the final models of the VhGlcNAcase structures are summarized in Table 2. The geometry of each final model was verified by PROCHECK (Laskowski et al., 1993) and MolProbity (Chen et al., 2010). Ligand-protein interactions were analyzed using LigPlot+ (Laskowski & Swindells, 2011), and the graphical structures and electron-density maps were visualized using PyMOL (DeLano, 2002).

GlcNAcase activity assay
GlcNAcase activity was determined by a colorimetric assay using 4-nitrophenyl N-acetyl-β-D-glucosaminide (pNP-GlcNAc; Sigma-Aldrich, St Louis, Missouri, USA) as a substrate. The reaction of 0.1-5 µg protein samples with 125 µM pNP-GlcNAc in 100 mM sodium phosphate buffer pH 7.0 in a total volume of 200 µl was carried out in triplicate in a 96-well microtiter plate at 37 °C for 10 min with constant agitation in a ThermoMixer Comfort (Eppendorf AG, Hamburg, Germany). The reaction was terminated by the addition of 100 µl 3 M sodium carbonate. The amount of 4-nitrophenol (4-NP) released was monitored optically at a wavelength of 405 nm using a Benchmark Plus microplate spectrophotometer (Bio-Rad Laboratories, Hercules, California, USA). A calibration curve of a 4-NP standard varying from 0 to 20 nmol was constructed, allowing determination of the molar quantity of 4-NP liberated by the enzymatic reaction. The specific hydrolytic activity of the enzyme was expressed as nanomoles of 4-NP produced in 1 min at 37 °C.
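The conversion from absorbance at 405 nm to the amount of 4-NP released can be sketched as a linear standard curve followed by division by the reaction time. This is a minimal illustration: the function names and the absorbance values in the usage note are hypothetical, and the optional normalization per milligram of protein is an assumption that is not stated in the text.

```python
def fit_standard_curve(nmol_standards, absorbances):
    """Least-squares line A405 = slope * nmol + intercept for the 4-NP standards."""
    n = len(nmol_standards)
    mean_x = sum(nmol_standards) / n
    mean_y = sum(absorbances) / n
    sxx = sum((x - mean_x) ** 2 for x in nmol_standards)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(nmol_standards, absorbances))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def absorbance_to_nmol(a405, slope, intercept):
    """Invert the standard curve to obtain nmol of 4-NP in the well."""
    return (a405 - intercept) / slope


def activity_nmol_per_min(nmol_product, minutes=10.0, mg_protein=None):
    """Activity in nmol 4-NP per min (10 min reaction as in the text).

    If mg_protein is given, the result is additionally divided by the protein
    mass; this per-mg normalization is an assumption made for illustration.
    """
    rate = nmol_product / minutes
    return rate if mg_protein is None else rate / mg_protein
```

For instance, with hypothetical standards that follow A405 = 0.05 + 0.04 × nmol, a reading of 0.45 would correspond to about 10 nmol of 4-NP, or about 1 nmol min⁻¹ for a 10 min reaction.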

Steady-state kinetic measurements
Kinetic studies of wild-type and mutant VhGlcNAcase were carried out using a colorimetric assay in a microtiter plate reader, as described above, with substrate concentrations varying from 0 to 500 µM. Briefly, a 200 µl reaction mixture consisting of 0-500 µM pNP-GlcNAc in 100 mM sodium phosphate buffer pH 7.0 and 0.1-30 µg enzyme was incubated at 37 °C with constant shaking for 10 min. The enzymatic reactions were then terminated by adding 100 µl 3 M sodium carbonate. The amount of reaction product was measured at 405 nm and converted to molar quantities using a calibration curve for 4-NP as described previously. The kinetic parameters (Km, kcat and kcat/Km) were determined from triplicate assays using the Michaelis-Menten function in GraphPad Prism version 0.6.0 (GraphPad Software, San Diego, California, USA).
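To illustrate what the nonlinear fit does, the sketch below fits v = Vmax[S]/(Km + [S]) to rate data by a damped Gauss-Newton least-squares iteration in plain Python. It is a stand-in for the commercial software, not a reproduction of its algorithm; the function names, the starting-value heuristic and the data in the usage note are assumptions made for this example.

```python
def michaelis_menten(s, vmax, km):
    """Initial rate predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)


def fit_michaelis_menten(s_values, v_values, iterations=100):
    """Estimate (Vmax, Km) by damped Gauss-Newton least squares."""
    data = list(zip(s_values, v_values))

    def sse(vmax, km):
        return sum((v - michaelis_menten(s, vmax, km)) ** 2 for s, v in data)

    # Heuristic start: Vmax ~ largest rate, Km ~ [S] whose rate is nearest Vmax/2.
    vmax = max(v_values)
    km = min(data, key=lambda p: abs(p[1] - vmax / 2.0))[0] or 1.0
    for _ in range(iterations):
        a11 = a12 = a22 = b1 = b2 = 0.0
        for s, v in data:
            denom = km + s
            d_vmax = s / denom                # partial derivative w.r.t. Vmax
            d_km = -vmax * s / denom ** 2     # partial derivative w.r.t. Km
            resid = v - vmax * s / denom
            a11 += d_vmax * d_vmax
            a12 += d_vmax * d_km
            a22 += d_km * d_km
            b1 += d_vmax * resid
            b2 += d_km * resid
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-30:
            break
        # Solve the 2x2 normal equations for the parameter update.
        dv = (b1 * a22 - b2 * a12) / det
        dk = (a11 * b2 - a12 * b1) / det
        current = sse(vmax, km)
        step = 1.0
        while step > 1e-6:
            new_vmax, new_km = vmax + step * dv, km + step * dk
            if new_km > 0 and sse(new_vmax, new_km) <= current:
                break
            step /= 2.0                        # halve the step until the fit improves
        else:
            break                              # no improving step found
        vmax, km = new_vmax, new_km
        if abs(step * dv) < 1e-12 and abs(step * dk) < 1e-12:
            break
    return vmax, km


def kcat(vmax, enzyme_amount):
    """kcat = Vmax / total enzyme, in consistent units (e.g. nmol/s and nmol)."""
    return vmax / enzyme_amount
```

Applied to noise-free rates generated with Vmax = 2.0 and Km = 120 µM over 0-500 µM substrate, the routine recovers both parameters; with real triplicate data the estimates would carry uncertainty that this sketch does not report.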

Crystallization, refinement statistics and crystal structures
Wild-type VhGlcNAcase and its catalytic mutant D437A (hereafter referred to as WT and D437A, respectively) were expressed at high levels in E. coli M15 (pREP4) cells as C-terminally His₆-tagged polypeptides and were purified to homogeneity on a cobalt-affinity column (Clontech, USA) followed by gel-filtration chromatography, giving a final yield of approximately 15-20 mg purified enzyme per litre of bacterial culture. Single crystals of apo WT and apo D437A were successfully grown by hanging-drop vapor diffusion under the optimized conditions described in Section 2 and diffraction data were obtained using synchrotron X-ray radiation to resolutions of 2.37 and 2.6 Å, respectively. A single crystal of WT VhGlcNAcase soaked with (GlcNAc)₂ diffracted to a resolution of 2.50 Å (Table 2). All crystals belonged to space group P2₁ with two molecules per asymmetric unit. The structures of all crystal forms were determined by the MR method and refined to Rwork and Rfree values of 0.21 and 0.25, respectively, for apo WT, 0.21 and 0.26, respectively, for WT-GlcNAc and 0.21 and 0.24, respectively, for apo D437A. The root-mean-square deviations (r.m.s.d.s) of bond lengths and angles of all crystals were refined to between 0.007 and 0.009 Å and between 1.20 and 1.34°, respectively. The average B factors refined to 55.34 Å² for apo WT, 36.35 Å² for WT-GlcNAc and 64.96 Å² for apo D437A. The coordinates and structure factors were deposited in the Protein Data Bank with PDB codes 6ezr for apo WT, 6ezs for the WT-GlcNAc complex and 6ezt for apo D437A. Fig. 1(a) shows the domain arrangement of VhGlcNAcase based on the 3D structure obtained from this study. The overall structure contains three distinct domains: an N-terminal carbohydrate-binding (CBD) domain (residues 1-114; pink) connected by a long linker to an α+β domain (residues 148-264; blue) followed by a C-terminal catalytic (Cat) domain (residues 265-642; green) (Val-Cid et al., 2015). Fig. 1(b) is a topology diagram showing details of the secondary-structural elements of VhGlcNAcase. The CBD domain consists primarily of eight antiparallel β-strands (β1-β8) that form an immunoglobulin-like fold, while the α+β domain is composed of two helices mixed with six strands flanked by three additional short strands that are aligned out of the main plane of this domain. The central Cat domain has a typical (β/α)₈ TIM-barrel-like fold, in which the helices and strands alternate in an antiparallel fashion. Note that in the TIM-barrel domain the fifth helix (α5), which joins strands β5 and β6 of the canonical TIM-barrel fold, and the seventh helix (α7), which joins strands β7 and β8, are missing and are replaced by three short helical segments (1-3). Segment 1 connects strands β5 and β6, and segments 2 and 3 connect strands β7 and β8.
We also observe an additional long helix at the end of helix α8, which is also seen in SpHex (PDB entry 1m01; Williams et al., 2002). Fig. 1(c) shows the structural architecture of VhGlcNAcase, in which the central Cat domain is flanked by the CBD domain and the α+β domain. This CBD domain is structurally related to the family 2 carbohydrate-binding module (CBM2) in the CAZy database (Lombard et al., 2014). A DALI search (http://ekhidna.biocenter.helsinki.fi; Holm & Sander, 1993) reveals that the CBD domain of VhGlcNAcase has the closest structural similarity to the CBD of endoglucanase D from Clostridium cellulovorans (Z-score = 13.1; r.m.s.d. of 1.8 Å over 96 residues; 13% sequence identity; PDB entry 3ndz; C. M. Bianchetti, R. W. Smith, C. A. Bingman & G. N. Phillips Jr, unpublished work). The α+β domain is similar to the α+β domain of β-Hex from A. aurescens (Z-score = 17.0; r.m.s.d. of 1.5 Å over 112 residues; 28% sequence identity; PDB entry 3rcn). The function of this domain is unknown, but it may help to solubilize and stabilize the catalytic domain (Val-Cid et al., 2015). Lastly, the closest relative of the catalytic domain of VhGlcNAcase is the TIM-barrel structure of β-Hex from A. aurescens (Z-score = 45.9; r.m.s.d. of 1.7 Å over 347 residues; 39% sequence identity; PDB entry 3rcn).
In VhGlcNAcase 12 cysteine residues are distributed over the three protein domains. The previously reported structure of β-hexosaminidase from Streptomyces plicatus (SpHex; PDB entry 1m01) suggested one disulfide bond in the Cat domain between Cys263 and Cys282 (Williams et al., 2002), which is not present in VhGlcNAcase. Although there are two neighboring cysteine residues very close to the active site in VhGlcNAcase (Cys272 and Cys583), these residues are not conserved in SpHex and their side chains are too far apart (4.5 Å) to form a disulfide bridge.

Dimer interface
The crystallographic analysis showed that the single crystals of apo (PDB entry 6ezr) and holo (PDB entry 6ezs) VhGlcNAcase (in complex with GlcNAc) contained two identical (r.m.s.d. of 0.14-0.16 Å over 639 Cα atoms) molecules per asymmetric unit (Meekrathok et al., 2015). The two protein molecules, designated Mol A and Mol B, are related by a twofold rotational axis, as seen in Fig. 2(a). Each holoenzyme molecule contains one GlcNAc unit (Fo − Fc density shown as an orange mesh with the sugar shown as sticks) at subsite −1 of the catalytic pocket (the Cat domain of Mol A is in green and that of Mol B in gray). The entrance to the substrate-binding cleft of the Cat domain is covered by the CBD domain (magenta for Mol A versus gray for Mol B) of the neighboring molecule. Notably, the WT crystals were soaked with the substrate (GlcNAc)₂ to obtain the ligand-bound structure, but we observed only a single GlcNAc in the active site of the enzyme. Given that (GlcNAc)₂ is a stable sugar, it was presumed that (GlcNAc)₂ was hydrolyzed during crystal soaking, leaving a GlcNAc product in the high-affinity site (subsite −1). Fig. 2(b) presents a surface representation of the molecular packing of Mol A and Mol B of WT VhGlcNAcase in complex with GlcNAc, with the residues in the dimer interface highlighted in orange. Fig. 2(c) shows Mol A and Mol B separately in the same orientation as in Fig. 2(a) [...] protein surface area (66 973.06 Å²).
Nineteen residues (Val12, Leu13, Ser14, Glu15, Gln16, Lys17, Gln18, Asn19, Arg21, Asp44, Arg45, Asp50, Ser51, Val52, Ser53, Ser87, Asn88, Pro89 and Arg91) from the CBD domain, 12 residues (Ile397, Glu438, Asn441, Glu489, Trp505, Leu506, Ser507, Glu509, Gln527, Trp546, Ala547 and Asn548) from the Cat domain and six residues (Val121, Ala123, Ser124, Pro125, Tyr126 and Arg127) from the linker are involved in the dimer interface. Most of these residues are hydrophilic, consistent with the finding that VhGlcNAcase is a monomer in solution, as shown below.

Molecular-weight determination
VhGlcNAcase was previously suggested to be a monomeric enzyme by native PAGE analysis (Meekrathok et al., 2015), but in the crystal structure two molecules per asymmetric unit were observed. To clarify this point, we carried out size-exclusion chromatography (SEC) to determine the apparent MW of VhGlcNAcase in its native form. The chromatographic profiles of VhGlcNAcase and six calibration proteins are shown in Fig. 3(a). The retention volume of 181 ml for VhGlcNAcase in the chromatographic profile was converted to the distribution coefficient (Kav), which corresponds to a MW of 76 kDa on the calibration curve plotted for Kav and log10 MW (Fig. 3b). Compared with the expected MW of VhGlcNAcase (74 kDa), the MW obtained from gel filtration confirms that VhGlcNAcase is a monomer in solution. Thus, the dimer observed in the crystals is likely to be a crystallization artifact induced by the particularly high protein concentration needed for crystal formation. We also carried out a PISA analysis (Krissinel & Henrick, 2007) and the results predicted no stable dimer formation in solution, confirming the SEC results.

Interactions with the sugar in the active site
The complex of WT VhGlcNAcase with substrate shows one molecule of GlcNAc occupying the −1 subsite (subsites were assigned based on the structure of SmChb in complex with chitobiose; Tews et al., 1996). The single GlcNAc was most likely to be produced by cleavage of the soaked substrate (GlcNAc)₂ by active WT enzyme on the surface of the crystal during the soaking time of approximately 30 min. The long loop that connects the β1-β2 hairpin of Mol B close to the entrance to the active site and Gln16 would clearly obstruct a long chitooligosaccharide chain from entering the active site and may explain why we did not observe any ligand molecule in the active site of the catalytically inactive D437A mutant co-crystallized with (GlcNAc)₂.

Figure 2 The dimer interface in the VhGlcNAcase crystals. (a) The overall crystal structure of VhGlcNAcase with two identical molecules per asymmetric unit. The GlcNAc product found in the active site of each molecule is shown in a black ball-and-stick representation, with the Fo − Fc electron density shown as a yellow mesh. (b) Surface representation of the dimer in the asymmetric unit. For Mol A, the Cat domain is shown in green, the α+β domain in cyan, the CBD domain in magenta and the linker that joins the Cat and CBD domains in gray. Mol B is represented in gray, while the dimer interface is colored orange. (c) A separate depiction of the dimer interface in the same orientation as in (a), with the dimer interface area highlighted in orange.
In the holo structure, the glycosidic O atom (O1) at the anomeric C atom of the −1 GlcNAc ring makes a hydrogen bond to the amide side chain of Gln16 on the loop belonging to Mol B. The side chain of Gln16 of Mol B also makes a second hydrogen bond to the O7 atom of the C2 acetamido group of −1 GlcNAc (Fig. 4a). The −1 GlcNAc adopted a ⁴C₁ chair conformation based on the Cremer-Pople parameter calculator (Cremer & Pople, 1975; Jeffrey & Yates, 1979). As shown in Fig. 4(b), the reducing end (C1 OH) of −1 GlcNAc makes two hydrogen bonds, to the side chain of Gln16 of Mol B and to Tyr530, and the C2 acetamido group is immobilized by the side chain of Gln16 of Mol B and a water molecule. The C3 OH forms two hydrogen bonds to nearby water molecules and a hydrogen bond to the side chain of Arg274. The nonreducing end (C4 OH) is immobilized by the side chains of Arg274 and Glu584. The C6 OH forms two hydrogen bonds, to the side chain of Asp532 and to the N atom of the Trp546 side chain. The sugar-enzyme interactions shown in Fig. 4(b) are within a distance of 3.5 Å.
The predicted catalytic pair Asp437-Glu438 is located near the C2 acetamido group of the bound sugar. Asp437 makes a hydrogen bond to the carbonyl residue of the C2 moiety, while the side chain of Glu438 was seen in two conformations, each with 0.5 occupancy. The first rotamer points away from the sugar molecule, while the second rotamer points towards it. The side chain of Gln16 of Mol B prevents Glu438 from occupying its ideal location close to the glycosidic O atom. Fig. 4(c) shows a surface representation of the empty subsite −1 of apo WT VhGlcNAcase (PDB entry 6ezr) that is surrounded by aromatic residues. Five conserved aromatic residues, Trp487, Trp505, Tyr530, Trp546 and Trp582, essentially create the pocket wall. Superimposition of the apo VhGlcNAcase structure onto the structure in complex with GlcNAc gives an r.m.s.d. of 0.90 Å over 1278 residues (two chains) and an r.m.s.d. of 0.26 Å over 639 residues (Mol A only), i.e. there are no significant differences between the CBD domain and the α+β domain of the two enzyme forms. When compared with the catalytic pocket of the unliganded enzyme (Fig. 4c), a small movement of the loop regions surrounding subsite −1 of the enzyme can be observed. The dimensions of the GlcNAc-fitted catalytic cleft (8.2 × 17.0 Å²) are slightly smaller than those of the empty pocket (9.4 × 17.6 Å²) (Fig. 4d). Four key side chains, Gln398 (part of loop L3), Asp437 and Glu438 (part of loop L4), and Trp505 (part of loop L6), are positioned close to the center of the catalytic pocket and are at an optimal distance to interact with the −1 GlcNAc.

Structural comparison of VhGlcNAcase with other GH20 members
The crystal structure of VhGlcNAcase in complex with GlcNAc is similar to those of other GH20 β-N-acetylhexosaminidases, including the WT β-hexosaminidase SpHex from S. plicatus (PDB entry 1m01; r.m.s.d. of 2.1 Å over 445 residues; Mark et al., 2001), the chitobiase SmChb from Serratia marcescens (PDB entry 1qbb; r.m.s.d. of 2.5 Å over 613 residues; Tews et al., 1996) [...]

Figure 3 Size-exclusion chromatographic profile and calibration curve of VhGlcNAcase and standard proteins. (a) The HiPrep 26/60 Sephacryl S-300 prepacked column was calibrated with six well defined globular protein standards plus the small molecule Nε-DNP-L-lysine, ranging from 0.35 to 669 kDa. Nε-DNP-L-lysine (0.35 kDa) was used to estimate the internal volume of the column and blue dextran 2000 was used to determine the void fraction. (b) The estimated molecular mass of VhGlcNAcase (76 kDa, expected 74 kDa) was determined from the calibration plot of Kav versus log MW after the Kav value had been calculated from the measured elution volume.

[...] domains, an N-terminal CBD domain (magenta), an α+β domain (cyan) and a C-terminal Cat domain (green), while SpHex contains only two domains: an α+β domain (cyan) connected to a C-terminal Cat domain (brown). The N-terminal CBD domain is missing in SpHex. SmChb consists of four domains designated as an N-terminal CBD domain (magenta) followed by an α+β domain (cyan), a TIM-barrel Cat domain (lilac) and a C-terminal immunoglobulin (IgG)-like domain (orange). The structure of insect OfHex is similar to that of SpHex, and contains two domains: an α+β domain (cyan) connected to a C-terminal Cat domain (red). Note that the Cat domain of the four enzymes has a common (β/α)₈-barrel fold and is conserved among all GH20 GlcNAcases. Fig. 5(b) shows the electrostatic surface around the active sites of VhGlcNAcase complexed with GlcNAc (PDB entry 6ezs) and NAG-thiazoline (PDB entry 6k35; Meekrathok et al., 2020), SpHex with GlcNAc (PDB entry 1m01), SmChb with (GlcNAc)₂ (PDB entry 1qbb) and OfHex1 E328A in complex with TMG-chitotriomycin (PDB entry 3vtr). The sugar-binding pockets of these GH20 GlcNAcases are highly polar due to six conserved proton-donating groups that form a strongly negatively charged surface (red) around subsite −1 of the active site of each enzyme [see also Fig. 6(a)] [...] open end towards the positive subsites (subsites +1, +2 and +3), allowing the accommodation of a chitooligosaccharide chain of 2-4 units. The active sites of SpHex and SmChb, on the other hand, are small and rather short open pockets that are suitable for accommodating only one or two GlcNAc units. The sugar ligand of at least one unit is found at subsite −1, which is located at the closed end of the tunnel/pocket. In all enzymes, cleavage occurs between subsites −1 and +1 [indicated by a white arrow, Fig. 5(b)]. A summary of the ligand-active-site residue interactions for all of the examined GH20 enzymes is presented in Table 3.
LIGPLOT analysis showed that both hydrogen bonds and hydrophobic interactions between the sugar ligand and the active-site residues of each enzyme are mainly formed at the most favored site −1. Many fewer interactions are seen at the product site +1 even with two units of sugar, such as chitobiose in SmChb or TMG-chitotriomycin in OfHex.

Identification of the active-site residues directly involved in catalysis by VhGlcNAcase
We have previously proposed that the catalytic mechanism for the cleavage of the glycosidic bond requires a catalytic pair located near the cleavage site in the sugar chain (Suginta et al., 2010; Meekrathok & Suginta, 2016). Since we only observe the GlcNAc cleavage product at subsite −1, and the glycosidic O atom at the anomeric C atom of the bound sugar is in direct contact with Gln16 of Mol B instead of the catalytic residue Glu438 due to a crystal-packing artifact, we cannot directly observe the catalytically relevant conformation. Fig. 6(a) shows an amino-acid alignment of three sequence segments of the catalytic domains of the enzymes shown in Fig. 5.
VhGlcNAcase has 29% identity overall to SpHex, 28% identity to SmChb and 24% identity to OfHex. The positions of the conserved active-site residues equivalent to Asp303, Asp304, His373, Asp437, Glu438, Asp532 and Glu584 of VhGlcNAcase are highlighted in red. Note that Tyr530, Trp546 and Trp582 (residues highlighted in blue), which make hydrophobic contacts with the −1 GlcNAc, are also conserved among the four GH20 enzymes. Notably, the acidic pair (Asp437-Glu438 in VhGlcNAcase) is completely conserved and was also identified as the catalytic pair in SpHex (Asp313-Glu314; Mark et al., 2001), SmChb (Asp539-Glu540; Tews et al., 1996) and OfHex (Asp327-Glu328; Liu et al., 2011). Fig. 6(b) shows the positions of these amino-acid side chains around the −1 GlcNAc molecule in VhGlcNAcase. To elucidate the impact of these residues on catalysis, they were chosen as targets for kinetic assessment. Site-directed mutagenesis generated nine single mutants, designated D303A, D303N, D437A, D437N, E438A, E438Q, H373A, D532A and E584A. All of these mutants showed drastic decreases in enzyme activity, as measured by hydrolysis of pNP-GlcNAc, especially the mutants of the catalytic residues D437A/N and E438A/Q (Fig. 6c), confirming the critical role of the conserved Asp437-Glu438 pair in substrate hydrolysis.

Figure 6 Active-site mutational design. (a) Sequence alignment of the catalytic domains of four GH20 enzymes: VhGlcNAcase, SpHex, SmChb and OfHex. Their amino-acid sequences were retrieved from the PDB using the PDB codes presented in Fig. 5. The sequence alignment was carried out by Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/) and displayed by Jalview version 2.11.1.3 (https://www.jalview.org/). The conserved polar residues equivalent to Asp303, Asp304, His373, Asp437, Glu438 and Glu584 of VhGlcNAcase are shaded red, while the conserved aromatic residues equivalent to Trp487, Trp505, Tyr530, Trp546 and Trp582 are shaded blue. (b) The positions of the mutated residues Asp303, His373, Asp437, Glu438 and Glu584 in the catalytic pocket surrounding the −1 GlcNAc molecule. Colors: green for C atoms of the binding residues and black for the C atoms of the sugar molecule, blue for N atoms and red for O atoms. (c) Bar graphs representing the relative enzymatic activity of the active-site mutants in pNP-GlcNAc hydrolysis in comparison with that of WT VhGlcNAcase.
The Michaelis-Menten parameters of the hydrolytic activity of the VhGlcNAcase variants were further analyzed and the values are presented in Table 4.
All of the mutants have significantly increased Km and decreased kcat values, yielding an overall decrease in the corresponding catalytic efficiency kcat/Km, relative to the values for the WT enzyme. As expected, the most severe loss in catalytic efficiency is observed with the catalytic mutant D437N, followed by D437A, E438A and E438Q. The D303A/N, H373A and D532A mutants show a moderate decrease in catalytic efficiency, while the E584A mutant showed a modest decrease in kcat/Km, which was only threefold less than that of the WT enzyme, suggesting that this residue does not play a critical role in catalysis.
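The comparison of catalytic efficiencies reduces to a ratio of kcat/Km values, as in the short sketch below. The function names are introduced for illustration and the numbers in the usage note are hypothetical; they are not the values reported in Table 4.

```python
def catalytic_efficiency(kcat, km):
    """Catalytic efficiency kcat/Km (e.g. s^-1 per µM when Km is in µM)."""
    if km <= 0:
        raise ValueError("Km must be positive")
    return kcat / km


def fold_decrease(wt_efficiency, mutant_efficiency):
    """How many times lower the mutant efficiency is than the wild-type value."""
    return wt_efficiency / mutant_efficiency
```

With hypothetical values of kcat = 6.0 s⁻¹ and Km = 100 µM for the wild type and kcat = 3.0 s⁻¹ and Km = 150 µM for a mutant, the mutant would be about threefold less efficient, which is the kind of modest change described for E584A.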
The long loops L2, L3 and L7 of VhGlcNAcase (in black) protrude into the region where the NGA2B substrate of StrH would be, which causes the substrate-binding region of VhGlcNAcase to be narrower. Fig. 7(b) emphasizes a severe clash between the side chains of Trp546 and Ala547 of loop L7 in VhGlcNAcase and the modeled NGA2B, suggesting that the active site of VhGlcNAcase cannot accommodate a branched oligosaccharide.

Discussion
We successfully solved the crystal structures of three VhGlcNAcase variants: two apo structures, of the wild type and the D437A mutant, and one holo structure, of the wild type in complex with (GlcNAc)2. After soaking the apo wild-type VhGlcNAcase crystals with (GlcNAc)2, we observed a clear F_o − F_c density map corresponding to only one GlcNAc molecule in the −1 subsite. This GlcNAc is most likely to be the product of hydrolysis of (GlcNAc)2 by active WT enzyme on the surface of the crystal during the soaking time of 30 min. Soaking or co-crystallizing the D437A mutant with (GlcNAc)2 was not successful, i.e. no substrate was seen in the active site, probably owing to the entrance of the active site of one monomer in the asymmetric unit (Mol A) being blocked by the side chain of Gln16 of the last strand of the CBD domain of the second monomer (Mol B). Since the active site of VhGlcNAcase is almost completely blocked by the neighboring monomer in the crystal structure, we assume that this would render such a dimeric enzyme extremely inefficient. This suggests that the active form of VhGlcNAcase is a monomer in solution, which was confirmed by a gel-filtration experiment. In addition, another crystal structure of VhGlcNAcase, in complex with NAG-thiazoline (NGT), was obtained in a different space group (PDB entry 6k35; Meekrathok et al., 2020) with an accessible active site and a completely different dimer in the asymmetric unit. This is consistent with the review by Val-Cid et al. (2015), in which the bacterial GH20 enzymes most closely related to VhGlcNAcase are described as being functional as monomers. Furthermore, a mutation of Gln16 would most likely not completely abolish the artificial dimerization: although the tip of the hairpin, including Lys17 and Gln18, blocks the entry to the ligand-binding pocket, the tightest interactions of the artificial dimer lie outside the hairpin interaction site, and the hairpin residues themselves interact relatively loosely. Lastly, we observed that monomeric VhGlcNAcase in solution is fully active towards both natural chitooligosaccharide and artificial pNP-glycoside substrates (Suginta et al., 2010). These results indicate that the dimeric interface seen in the crystal does not interfere with the accessibility of substrates to the active site or with the catalytic activity of the enzyme in solution.

Table 3 A summary of direct ligand-enzyme interactions obtained by LIGPLOT analysis.
Hydrogen bonds and van der Waals interactions were set at a 3.0 Å distance. The underlined amino-acid residues form hydrogen bonds, while residues shown in bold make hydrophobic interactions with the corresponding ligands.

Subsite −1
Arg274, Trp487, Trp505, Tyr530, Asp532, Trp582, Glu584
Arg162, Asp313, Glu314, Trp344, Trp361, Tyr393, Asp395, Trp442, Glu444
Arg349, Asp539, Glu540, Trp616, Trp639, Tyr669, Asp671, Phe672, Trp737, Glu739
Arg220, Asp249, Asp367,

The overall structures of apo WT and D437A mutant VhGlcNAcases are essentially identical, both containing three distinct domains.
The N-terminal CBD domain is followed by a relatively long linker that connects to the α+β domain. The α+β domain is similar to the GH20b domain in other GH20 β-hexosaminidases (Val-Cid et al., 2015), but its function is still unknown. Some of its residues interact with the surface of the Cat domain and may help to stabilize the structural integrity of this domain. The largest domain is the catalytic (Cat) domain, which consists of eight strands alternating with six helices, instead of eight helices as reported in the related N-acetylglucosaminidases SmChb (PDB entry 1qbb; Tews et al., 1996) and SpHex (PDB entry 1m01; Mark et al., 2001). The lack of helices 5 and 7 of the (β/α)8-barrel structure in the Cat domain and an additional helix at the end of helix 8 of VhGlcNAcase seem to be common structural features of GH20 glycoside hydrolases (Maier et al., 2003; Tews et al., 1996; Mark et al., 2001). In the closely related human HexB, the α-helices at positions 5 and 7 of the (β/α)8-barrel structure of the catalytic domain are also missing, and an additional C-terminal helix follows helix 8 (Maier et al., 2003).
The active site of VhGlcNAcase is strongly negatively charged, and analysis of the residues surrounding subsite −1 of the holo VhGlcNAcase structure reveals that five acidic residues, Asp303, Asp437, Glu438, Asp532 and Glu584, around the surface of the active site, together with His373, are completely conserved among the four GH20 GlcNAcases. Although some of them are not in direct contact with the sugar molecule, these residues apparently contribute to the anionic character of subsite −1. Comparison of the active-site architecture and ligand-protein interactions among the four GH20 orthologs (Figs. 5a and 5b) offers insight into substrate specificity and subsite preference. Essentially, the catalytic pocket of VhGlcNAcase is narrow and elongated, fitting a linear chitooligosaccharide chain of 2-4 GlcNAc units. The structure of the active site supports our previous quantitative HPLC and kinetic modeling of the enzymic reaction (Suginta et al., 2010), which predicted that the active site of VhGlcNAcase contains an array of most probably four binding subsites (−1, +1, +2 and +3). Based on this subsite topology, the cleavage site is located between subsites −1 and +1. In a recent study, we examined the inhibitory effect of NAG-thiazoline (NGT), a reaction-intermediate analog and a common inhibitor of GH20 GlcNAcases, and found that NGT strongly inhibited the hydrolytic activity of VhGlcNAcase, with a K_i of 62 ± 3 mM and a K_d of 32 ± 1.2 mM (Meekrathok et al., 2020). The structure of VhGlcNAcase in complex with NGT showed that NGT occupies subsite −1 (Fig. 5b, bottom left; PDB entry 6k35; Meekrathok et al., 2020). The N atom at the C2 position of the bicyclic NGT mimics the N atom in the cyclized C2 acetamido group of the oxazolinium intermediate and makes a strong hydrogen bond to Asp437, while Glu438 is oriented in the optimal position for catalysis.
These two residues are completely aligned with the catalytic pairs of -Hex1 (PDB entry 3sur; Sumida et al., 2011) and of SpHex (PDB entry 1hp5; Williams et al., 2002; Vocadlo & Withers, 2005), both of which employ the substrate-assisted mechanism for catalysis. Although further kinetic experiments are required to prove the mode of action of VhGlcNAcase, we assume that the enzyme, like other functionally characterized bacterial GH20 GlcNAcases, also employs this catalytic mechanism. In the substrate-assisted mechanism of glucosaminidases, two completely conserved acidic residues are proposed to be the catalytic pair (Meekrathok & Suginta, 2016; Jiang et al., 2011; Mark et al., 2001; Thi et al., 2014). In VhGlcNAcase, this acidic pair is identified as Asp437-Glu438, located next to the cleavage site. Given that VhGlcNAcase is an exolytic enzyme that sequentially degrades a chitooligosaccharide chain, releasing one GlcNAc at a time from the nonreducing end, the −1 GlcNAc is identified as the product remaining after the hydrolysis of (GlcNAc)2 during crystal soaking. In both the apo and holo structures we observed an obstruction of the entrance of the active site by Gln16 from a neighboring molecule in the crystal. As shown by the SEC experiments, this is a crystallization artifact that would not occur in the monomeric enzyme in solution. The holo structure of VhGlcNAcase with GlcNAc provides a snapshot of the sugar ring in the ^4C_1 conformation, which is the most favorable form of the sugar. The same sugar conformation was also observed in the active site of the β-hexosaminidase Hex1T from Paenibacillus sp. TS12 (PDB entry 3gh5; Sumida et al., 2009) and SpHex from S. plicatus (PDB entry 1m01; Mark et al., 2001).
Structural comparison of the apo and holo forms of VhGlcNAcase provides evidence of a ligand-induced conformational change in the local area of the active site. The most notable effect is caused by a swinging of the side chain of Glu438 in the holo structure towards the sugar ring, causing some movement of the corresponding loop L4 and narrowing the catalytic cavity around the −1 subsite by 0.6 Å (in length) × 1.2 Å (in width), while other parts of the active site remain unaltered. This induced-fit movement of Glu438 enables cleavage of the chitooligosaccharide bond between subsites −1 and +1, while the positioning of Asp437 may help the enzyme to stabilize the transition state by interacting with the acetamido group of the substrate through a substrate-assisted catalysis mechanism (Mark et al., 2001). Point mutation of Asp437 to Ala/Asn and of Glu438 to Ala/Gln generated mutants with almost no GlcNAcase activity, confirming the catalytic functions of these two residues. The reduction in catalytic activity of the D437A and D437N mutants is due to the loss of the negative charge of the side chain of this amino acid, which results in destabilization of the reaction intermediate in the transition state (Meekrathok & Suginta, 2016; Mark et al., 2001). Mutation of Glu438 to Ala/Gln instead removes the proton-donating group that is mandatory for bond cleavage in acid catalysis. A complete loss of catalytic activity was previously seen on changing the equivalent acidic pairs in SpHex (Asp313-Glu314; Mark et al., 2001), SmChb (Asp539-Glu540; Tews et al., 1996) and OfHex1 (Asp327-Glu328; Liu et al., 2011). Although they are all members of the GH20 family, VhGlcNAcase has <30% sequence identity to the other three GH20 enzymes (24% to OfHex and 29% to SmChb and SpHex), and the dissimilarities in the shape and the surface-charge properties of the sugar-binding pocket from those of the other enzymes highlight some novel features of this enzyme.
Its elongated and shallow catalytic pocket is suited to accommodating a linear chitooligosaccharide chain of 2-4 units (Suginta et al., 2010). The highly anionic surface along subsites −1, +1, +2 and +3 supports the formation of a large network of hydrogen-bond interactions between the binding residues and the corresponding sugar moieties, which essentially determines the strong binding affinity of VhGlcNAcase for its substrate. In contrast, the catalytic pockets of SmChb and SpHex are shorter but wider, suggesting that these two enzymes act preferentially on small substrates, i.e. sugar dimers. The catalytic cleft of OfHex is more open and has a less homogeneous surface charge, signifying a broad substrate specificity of this enzyme towards hetero GlcNAc-containing molecules. Although the catalytic domains of GH20 enzymes share a highly conserved (β/α)8 TIM-barrel architecture, some GH20 enzymes, such as StrH, HexA and HexB, recognize a broad range of sugar substrates, including branched β(1-4), β(1-3), β(1-2) and β(1-6) glycosidic-linked GlcNAc-containing glycans that are components of glycolipids, glycoproteins or sulfated glycoconjugates (Jiang et al., 2011; Sumida et al., 2011; Manuel et al., 2007; Intra et al., 2008). Superimposition of the catalytic domains of VhGlcNAcase and StrH in complex with NGA2B reveals that loop L7 of VhGlcNAcase in particular would cause steric clashes with a branched glycan, consistent with its preference for linear chitooligosaccharide chains.

Concluding remarks
This study reports the crystal structures of a GH20 β-N-acetylglucosaminidase, namely VhGlcNAcase, from the marine bacterium V. campbellii in the absence and presence of a natural ligand (GlcNAc). VhGlcNAcase contains three domains: a carbohydrate-binding domain (CBD), an α+β domain and a conserved (β/α)8 TIM-barrel catalytic domain. Size-exclusion chromatography confirmed that VhGlcNAcase is a monomeric enzyme in solution, with an apparent MW of 74 kDa. In the complex with product, one GlcNAc was found at subsite −1 of the highly negatively charged catalytic pocket. Binding of GlcNAc induces local conformational changes around the −1 subsite, where the sugar makes contacts with polar side chains through a network of hydrogen bonds and stacks against the side chain of Trp582. Site-directed mutagenesis and kinetic analysis suggest that the acidic pair Asp437 and Glu438 plays an important role in catalysis. Docking of VhGlcNAcase with a branched GlcNAc-derived glycan reveals severe steric clashes between the branched sugar and the active-site surface, consistent with the fact that VhGlcNAcase prefers chitin-derived chitooligosaccharides as natural substrates. The structural insight into VhGlcNAcase provides a further understanding of how Vibrio bacteria can thrive in marine ecosystems using chitin as their sole carbon source. From a biotechnological point of view, chitinases and GlcNAcases from Vibrio species are chitin-degrading enzymes that are naturally responsible for the recycling of chitin, and a one-pot reaction of these two enzymes could act as a powerful biocatalyst for the complete conversion of chitin biomass into small sugar products that can be used as starting materials for further chemical modifications or for the organic synthesis of highly compatible chitin-derived functional biomaterials.

Acta Cryst. (2021). D77, 674-689