Structure of the Mycobacterium smegmatis α-maltose-1-phosphate synthase GlgM

GlgM is responsible for the biosynthesis of α-maltose-1-phosphate, the building block for the third known biosynthetic pathway to glycogen/α-glucan, which is a target for antimycobacterials. Here, the first known structure of GlgM is reported.


Introduction
Glycogen is a ubiquitous carbon-storage molecule composed of a glucose polymer constructed with α-1,4 linkages and α-1,6 branch points (Preiss, 2009). The polymerization of glycogen was long thought to be catalysed solely by glycogen synthases that use NDP-glucose as a donor. The bacterial enzyme uses ADP-glucose as the sugar donor [GlgA, GT5 family (Lombard et al., 2014), EC 2.4.1.21, ADP-α-D-glucose:(1→4)-α-D-glucan 4-α-D-glucosyltransferase; Fig. 1a], while the eukaryotic enzyme uses UDP-glucose (GT3 family, EC 2.4.1.11). However, another bacterial polymerase has been identified that does not utilize an NDP-glucose donor. GlgE uses α-maltose-1-phosphate as the donor of a disaccharide (Fig. 1b; Kalscheuer et al., 2010; Bornemann, 2016). All three polymerases act in conjunction with a GlgB branching enzyme to generate the mature branched polymer. It is the combination of the polymerase and the branching enzyme that dictates the exact properties of the mature polymer (Rashid et al., 2016). Importantly, the polymer generated by Mycobacterium tuberculosis (often referred to by the generic term α-glucan) has been shown to be important in pathogenesis through its interaction with immune receptors (Kalscheuer et al., 2019).
An initial estimate of the occurrence of each bacterial pathway, based on genome sequences, suggested that 32% and 14% of bacteria possess the GlgA-dependent and GlgE-dependent pathways, respectively (Chandra et al., 2011). However, the figure for the former had to be revised down when it became clear that about a third of the GlgA homologues were not GT5 family members but belonged to GT4 (Henrissat et al., 2002; Koliwer-Brandl et al., 2016). Indeed, whenever a Gram-positive bacterium possessed the glgE gene, the GlgA homologue was invariably a GT4 family member. The significance of this was apparent when the GlgA homologue from M. tuberculosis was shown not to possess glycogen synthase activity but α-maltose-1-phosphate synthase activity [EC 2.4.1.342, ADP-α-D-glucose:α-D-glucose-1-phosphate 4-α-D-glucosyltransferase (configuration retaining)]. Thus, the enzyme uses ADP-glucose as the donor and α-glucose-1-phosphate as the acceptor. UDP-glucose can also be used as a donor, but less efficiently. The enzyme responsible for this previously unreported activity was then named GlgM (Fig. 1b). It was subsequently necessary to revise the occurrence of the GlgA pathway in bacteria down to 20%.
ISSN 2053-230X
The configuration of the α-glucan (glycogen) metabolic pathways in M. tuberculosis is now known to be very different from that assumed a few years ago.
A key feature is that the polymer is generated solely by the GlgE pathway and that its building block is generated either by the phosphorylation of maltose or by the action of GlgM (Fig. 1b). There were no reported structures of GlgM, so the aim of this study was to elucidate one from mycobacteria.

Protein production
The glgM gene (MSMEG_5080) from M. smegmatis mc²155 was synthesized with optimum codon usage for expression in Escherichia coli (GenScript, Piscataway, New Jersey, USA), allowing expression with a 6×His tag and a TEV cleavage site at the N-terminus of the protein. The construct was ligated into a pET-21a expression vector (Novagen, Darmstadt, Germany) using the NdeI and BamHI restriction sites. GlgM was produced in E. coli BL21 Star cells (AMS Biotechnology Europe), which were grown at 18 °C in lysogeny broth (LB) to an OD600 of 0.6, at which point expression was induced with 0.5 mM isopropyl β-D-1-thiogalactopyranoside (IPTG). The cells were harvested after a further 16 h of incubation. The protein was purified using a 1 ml HisTrap FF column (GE Healthcare, Amersham, United Kingdom) with imidazole-gradient elution. The protein was dialysed against 20 mM Tris pH 8.5 and concentrated to 12 mg ml⁻¹. Aliquots were stored at −80 °C. Size-exclusion chromatography was carried out using a Superdex 200 10/300 GL column eluted with 50 mM bis-Tris pH 6.0 containing 5 mM MgCl₂ and 100 mM NaCl.

Enzyme assay
GlgM activity was monitored by following ADP release using a continuous enzyme-coupled spectrophotometric assay involving the oxidation of NADH. Chemicals were purchased from Sigma-Aldrich. Unless otherwise stated, enzyme assays were carried out at 37 °C in 50 mM bis-Tris propane pH 6.0 containing 5 mM MgCl₂, 0.3 mM NADH, 1 mM phosphoenolpyruvate, 1 U lactate dehydrogenase, 1 U pyruvate kinase and 0.2 mg ml⁻¹ bovine serum albumin. Saturation kinetics for ADP-glucose and α-glucose-1-phosphate were measured in a Costar 96-well plate using a BMG Clariostar plate reader. Eight concentrations each of ADP-glucose and α-glucose-1-phosphate were used, ranging from 0.06 to 8.0 mM. The effect of pH was measured using sodium/potassium phosphate pH 5.0, bis-Tris pH 6.0 and bis-Tris propane pH 7.0, 8.0 and 9.0, with 1 mM each of ADP-glucose and α-glucose-1-phosphate. Temperature dependence (25-50 °C) was also determined with 1 mM each of ADP-glucose and α-glucose-1-phosphate using a Perkin Elmer Lambda 25 spectrophotometer. Enzyme concentrations were set to allow reactions to progress linearly for 5 min in a cuvette or for 100 s in a plate reader, with the total donor consumption being <10%. Initial rates (v₀/[E]) were determined by monitoring the absorption at 340 nm.
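The conversion from an absorbance trace to an initial rate can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the NADH molar absorption coefficient (6220 M⁻¹ cm⁻¹ at 340 nm), the 1 cm path length and the example slope and enzyme concentration are all assumptions that do not come from the text.

```python
# Assumed constants (not stated in the text).
EPSILON_NADH_340 = 6220.0  # M^-1 cm^-1, commonly quoted value for NADH at 340 nm
PATH_LENGTH_CM = 1.0       # cuvette; a plate well would need its own path length


def initial_rate_per_enzyme(slope_abs_per_min, enzyme_molar, path_cm=PATH_LENGTH_CM):
    """Return v0/[E] in s^-1 from the A340 slope (absorbance units per minute).

    In the coupled assay one NADH is oxidised per ADP released, so the rate of
    NADH loss (Beer-Lambert law) equals the rate of the GlgM reaction.
    """
    rate_molar_per_s = abs(slope_abs_per_min) / (EPSILON_NADH_340 * path_cm) / 60.0
    return rate_molar_per_s / enzyme_molar


def fraction_consumed(rate_per_enzyme, enzyme_molar, seconds, donor_molar):
    """Fraction of donor used up if the initial rate held for `seconds`."""
    return rate_per_enzyme * enzyme_molar * seconds / donor_molar


# Hypothetical example values, chosen only for illustration.
slope = -0.0622   # A340 change per minute
enzyme = 10e-9    # 10 nM enzyme
v0_over_e = initial_rate_per_enzyme(slope, enzyme)
consumed = fraction_consumed(v0_over_e, enzyme, seconds=300, donor_molar=1e-3)
print(f"v0/[E] = {v0_over_e:.2f} s^-1, donor consumed in 5 min = {consumed:.1%}")
```

The second function mirrors the <10% donor-consumption criterion described above, which keeps the measured slope within the initial-rate regime.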
Kinetics curves were fitted to either a Michaelis-Menten (1) or a substrate-inhibition model (2):
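Assuming the standard textbook forms of these two models (the symbols below are the conventional ones, with [S] the concentration of the varied substrate, and are an assumption rather than a transcription), they can be written as:

```latex
\begin{align}
\frac{v_0}{[\mathrm{E}]} &= \frac{k_{\mathrm{cat}}[\mathrm{S}]}{K_{\mathrm{m}} + [\mathrm{S}]} \tag{1}\\
\frac{v_0}{[\mathrm{E}]} &= \frac{k_{\mathrm{cat}}[\mathrm{S}]}{K_{\mathrm{m}} + [\mathrm{S}]\left(1 + [\mathrm{S}]/K_{\mathrm{i}}\right)} \tag{2}
\end{align}
```

Here k_cat is the turnover number, K_m the Michaelis constant and K_i the substrate-inhibition constant; equation (2) reduces to equation (1) when K_i is much larger than [S].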

Crystallization and data collection
Crystallization experiments were performed using a protein concentration of approximately 12 mg ml⁻¹ in 20 mM Tris pH 8.5 at a temperature of 20 °C. Screening was conducted by sitting-drop vapour diffusion in MRC 96-well crystallization plates (Molecular Dimensions) with a mixture of 0.3 µl precipitant (from both commercial and in-house screens) and 0.3 µl protein solution, using either an OryxNano or an Oryx8 crystallization robot (Douglas Instruments). A variety of commercially available screens were set up and promising conditions were optimized with the latter robot using the same crystallization format. Suitable crystals were grown from 25%(w/v) PEG 3350, 0.2 M malonate, 100 mM bis-Tris propane pH 6.5 and were subsequently cryoprotected using the crystallization conditions supplemented with 15%(v/v) ethylene glycol. Heavy-atom-derivative crystals were prepared by soaking for 90 min in the same cryoprotectant solution containing approximately 1 mM HgCl₂ and were then back-soaked for a few seconds in cryoprotectant lacking the heavy atom.
Figure 1. The two bacterial biosynthetic pathways to branched α-glucan (glycogen). (a) The well known GlgA-dependent pathway uses ADP-glucose (ADPG) as the building block for the polymerase (for example in E. coli). (b) The configuration of the pathway in M. tuberculosis, which lacks GlgA. The recently discovered GlgE pathway uses α-maltose-1-phosphate (M1P) as the building block for the polymerase. One route to this building block is via GlgM (GT4), a homologue of GlgA (GT5). G6P, glucose-6-phosphate; G1P, glucose-1-phosphate; UDPG, UDP-glucose; T6P, trehalose-6-phosphate. The figure is adapted from [...].
Crystals were harvested and flash-cooled in liquid nitrogen using LithoLoops (Molecular Dimensions). The mounted crystals were stored in Unipuck cassettes (MiTeGen) prior to transport to Diamond Light Source (DLS), Oxfordshire, United Kingdom, where they were transferred robotically to the goniostat on beamline I03 or I04 and maintained at −173 °C with a Cryojet cryocooler (Oxford Instruments). For the best native data set, a single pass of 3600 × 0.1° images was collected with the detector set to a resolution of 1.8 Å at a wavelength of 0.9795 Å. For the derivative data collections, the wavelength was set to 1.0052 Å, which is 50 eV above the theoretical L(III) X-ray absorption edge for mercury. The best derivative data set comprised two consecutive passes of 7200 × 0.1° images collected with the detector set to resolutions of 3.5 and 3.7 Å, respectively. X-ray diffraction data were recorded using a PILATUS 6M hybrid photon-counting detector (Dectris), integrated using XDS (Kabsch, 2010) and scaled and merged using AIMLESS (Evans & Murshudov, 2013) via the xia2 system (Winter, 2010); the resultant data-collection statistics are summarized in Table 1. All of the crystals belonged to space group P2₁2₁2, but the unit-cell parameters showed considerable crystal-to-crystal variation. For the best native and derivative data sets, these were a = 135.37, b = 144.93, c = 46.47 Å and a = 126.35, b = 137.45, c = 48.89 Å, respectively. Nevertheless, crystal-content estimates suggested that both crystal forms would contain two copies of the 43 485.3 Da protomer (calculated from the native GlgM sequence plus the uncleaved affinity tag) per asymmetric unit, giving solvent contents of approximately 53% and 50% for the native and the derivative, respectively.
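The crystal-content estimate and the choice of wavelength can be checked with a short calculation. The sketch below assumes the Matthews approximation (solvent fraction = 1 − 1.23/V_M), four symmetry operators for space group P2₁2₁2, the conversion factor hc ≈ 12398.42 eV Å and a tabulated Hg L(III) edge of about 12284 eV; none of these constants are stated in the text, and the function names are purely illustrative.

```python
HC_EV_ANGSTROM = 12398.42   # assumed value of h*c in eV * Angstrom
HG_L3_EDGE_EV = 12284.0     # assumed tabulated Hg L(III) absorption edge


def matthews_coefficient(a, b, c, protomer_da, copies_per_asu, sym_ops=4):
    """V_M in cubic Angstrom per Da for an orthorhombic cell (all angles 90 degrees)."""
    cell_volume = a * b * c
    return cell_volume / (sym_ops * copies_per_asu * protomer_da)


def solvent_fraction(vm):
    """Matthews approximation; 1.23 assumes a typical protein density."""
    return 1.0 - 1.23 / vm


def photon_energy_ev(wavelength_angstrom):
    """Photon energy in eV from the wavelength in Angstrom."""
    return HC_EV_ANGSTROM / wavelength_angstrom


PROTOMER_DA = 43485.3  # protomer mass quoted in the text
vm_native = matthews_coefficient(135.37, 144.93, 46.47, PROTOMER_DA, 2)
vm_deriv = matthews_coefficient(126.35, 137.45, 48.89, PROTOMER_DA, 2)
solvent_native = solvent_fraction(vm_native)
solvent_deriv = solvent_fraction(vm_deriv)
edge_offset_ev = photon_energy_ev(1.0052) - HG_L3_EDGE_EV

print(f"native:     V_M = {vm_native:.2f}, solvent = {solvent_native:.0%}")
print(f"derivative: V_M = {vm_deriv:.2f}, solvent = {solvent_deriv:.0%}")
print(f"energy above the Hg L(III) edge: {edge_offset_ev:.0f} eV")
```

Under these assumptions, two protomers per asymmetric unit reproduce the solvent contents quoted above; one or three copies would give values well outside the range typically seen for protein crystals.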

Structure solution and refinement
All subsequent steps were implemented via the CCP4i2 GUI (Potterton et al., 2018). Initial attempts to solve the structure by molecular replacement with a variety of templates did not yield convincing solutions, most likely owing to the relatively low amino-acid sequence identities of the templates used compared with GlgM (for example 27% for Pyrococcus abyssi GlgA; PDB entries 3fro and 2bis; Díaz et al., 2011; Horcajada et al., 2006). The structure of GlgM was subsequently solved at 3.5 Å resolution by single-wavelength anomalous dispersion using the Hg-derivative data set through the CRANK2 pipeline (Skubák & Pannu, 2013), which located 12 sites, but the density-modification stage performed by Parrot (Cowtan, 2010) did not detect any twofold noncrystallographic symmetry. In the final stage, Buccaneer (Cowtan, 2006) built a very incomplete and highly fragmented preliminary model. Nevertheless, the two longest fragments appeared to belong to a putative two-domain subunit. These were extracted from the model by editing in Coot (Emsley et al., 2010) and accounted for 61% of the residues expected for the subunit. This coordinate file was converted to a polyalanine model and then split into separate domains. These were used as inputs for a Phaser (McCoy et al., 2007) [...] giving Rwork and Rfree values of 0.511 and 0.531, respectively, to 1.9 Å resolution. Further refinement with REFMAC5 (Murshudov et al., 2011) did not yield any significant improvement (Rwork and Rfree values of 0.494 and 0.530, respectively). The model was then completely rebuilt with Buccaneer to give an 83% complete model with Rwork and Rfree values of 0.284 and 0.313, respectively, to 1.9 Å resolution. Thereafter, the model was completed by several iterations of manual rebuilding in Coot and restrained refinement in REFMAC5; TLS group definitions obtained from the TLSMD server (http://skuld.bmsc.washington.edu/~tlsmd/; Painter & Merritt, 2006) were used in the later stages of refinement. The geometry of the final model was validated with MolProbity (Chen et al., 2010) and B-factor information was extracted via the validation tool in CCP4i2 before submission to the Protein Data Bank (see Table 1 for a summary of model statistics).
Table 1 footnotes. I_i(hkl) is the ith observation of reflection hkl, ⟨I(hkl)⟩ is the weighted average intensity for all observations i of reflection hkl and N(hkl) is the number of observations of reflection hkl. § CC1/2 is the correlation coefficient between symmetry-equivalent intensities from random halves of the data set. ¶ The data set was split into 'working' and 'free' sets consisting of 95% and 5% of the data, respectively. The free set was not used for refinement. †† The R factors Rwork and Rfree are calculated as follows: $R = \sum_{hkl} \left| |F_{\mathrm{obs}}| - |F_{\mathrm{calc}}| \right| / \sum_{hkl} |F_{\mathrm{obs}}|$, where F_obs and F_calc are the observed and calculated structure-factor amplitudes, respectively. ‡‡ As calculated using MolProbity.
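The R factors quoted in this section follow the usual definition, R = Σ||F_obs| − |F_calc|| / Σ|F_obs|, evaluated separately over the working and free reflections. The following is a minimal sketch of that bookkeeping, assuming the calculated amplitudes have already been placed on the same scale as the observed ones; the amplitudes, the random split and the function names are hypothetical and do not reproduce what REFMAC5 does internally.

```python
import random


def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over the supplied reflections."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def split_work_free(n_reflections, free_fraction=0.05, seed=0):
    """Randomly flag a fraction of reflection indices as the 'free' set."""
    rng = random.Random(seed)
    n_free = round(free_fraction * n_reflections)
    free = set(rng.sample(range(n_reflections), n_free))
    work = [i for i in range(n_reflections) if i not in free]
    return work, sorted(free)


# Hypothetical amplitudes, for illustration only.
f_obs = [100.0, 80.0, 60.0, 40.0]
f_calc = [90.0, 85.0, 60.0, 45.0]
r_example = r_factor(f_obs, f_calc)

work_idx, free_idx = split_work_free(200)
print(f"R = {r_example:.4f}; {len(work_idx)} working and {len(free_idx)} free reflections")
```

Because the free reflections are excluded from refinement, Rfree acts as a cross-validation statistic, which is why the drop from about 0.53 to 0.31 after rebuilding is a meaningful sign of a correct solution.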

Enzyme activity of GlgM from M. smegmatis
Attempts to crystallize the α-maltose-1-phosphate synthase GlgM from M. tuberculosis were unsuccessful. We therefore switched our focus to the GlgM homologue from M. smegmatis, which shares 77% identity with the enzyme from M. tuberculosis. The recombinant protein exhibited α-maltose-1-phosphate synthase activity as expected. The pH and temperature optima were 6.0 and 40 °C, respectively. The kcat values for both α-glucose-1-phosphate and ADP-glucose were an order of magnitude higher than for the M. tuberculosis enzyme, while the corresponding values of Km were broadly similar (Table 2). The kinetics of the enzyme conformed to a ternary-complex mechanism. Furthermore, they were consistent with those of the M. tuberculosis enzyme, which exhibits a compulsory-order ternary-complex mechanism, whereby ADP-glucose binds to the enzyme before α-glucose-1-phosphate. For example, the kinetics could be fitted to the Michaelis-Menten equation (1) when varying the ADP-glucose concentration with a fixed α-glucose-1-phosphate concentration. In addition, the kcat for ADP-glucose increased and then decreased as the fixed α-glucose-1-phosphate concentration increased. In other words, α-glucose-1-phosphate exhibited substrate inhibition at high concentrations, as was manifest when varying its concentration with a fixed ADP-glucose concentration. Inhibition would be expected to occur when α-glucose-1-phosphate unproductively binds before the ADP-enzyme complex dissociates in the last step of the catalytic cycle. Finally, the Km and Ki for α-glucose-1-phosphate were broadly independent of the ADP-glucose concentration and were within an order of magnitude of each other. Although the M. tuberculosis GlgM enzyme exhibited a trace of glycogen synthase activity, which was three orders of magnitude lower than its α-maltose-1-phosphate synthase activity, the M. smegmatis enzyme exhibited no detectable glycogen synthase activity. In conclusion, GlgM from M. smegmatis exhibits high α-maltose-1-phosphate synthase (EC 2.4.1.342) activity.
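The rise and subsequent fall in rate that characterizes substrate inhibition can be illustrated numerically. The sketch below assumes the standard textbook forms of the Michaelis-Menten and substrate-inhibition models; the parameter values are hypothetical and are not the fitted values from Table 2.

```python
import math


def michaelis_menten(s, kcat, km):
    """v0/[E] = kcat*[S] / (Km + [S])."""
    return kcat * s / (km + s)


def substrate_inhibition(s, kcat, km, ki):
    """v0/[E] = kcat*[S] / (Km + [S]*(1 + [S]/Ki))."""
    return kcat * s / (km + s * (1.0 + s / ki))


def optimum_substrate(km, ki):
    """[S] at which the substrate-inhibition curve peaks: sqrt(Km*Ki)."""
    return math.sqrt(km * ki)


# Hypothetical parameters (kcat in s^-1, Km and Ki in mM), for illustration only.
KCAT, KM, KI = 100.0, 0.5, 8.0
s_opt = optimum_substrate(KM, KI)

for s in (0.06, 0.5, s_opt, 8.0):
    mm = michaelis_menten(s, KCAT, KM)
    si = substrate_inhibition(s, KCAT, KM, KI)
    print(f"[S] = {s:5.2f} mM  MM = {mm:6.2f} s^-1  inhibited = {si:6.2f} s^-1")
```

With Km and Ki within an order of magnitude of each other, as reported here for α-glucose-1-phosphate, the peak falls inside the 0.06 to 8.0 mM range used in the assays, so the downturn is visible in the data rather than being an extrapolation.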

Structure of the α-maltose-1-phosphate synthase GlgM
As expected for a GT4 family member, the GlgM protomer comprised two domains, each with an α/β architecture resembling a Rossmann fold (Figs. 2a and 2b). All but two of the N-terminal His-tag amino acids were unresolved. The nearest known structural homologues of GlgM were identified using DALI (Holm & Rosenström, 2010). These were invariably fellow glycosyltransferases. The lowest r.m.s.d. of the hits was 2.4 Å (PDB entry 3mbo; Bacillus anthracis str. Sterne UDP-GlcNAc:L-malate α-N-acetylglucosaminyltransferase; BshA; Parsonage et al., 2010) and the highest protein sequence identity was only 28% (PDB entry 3c4v; Corynebacterium glutamicum UDP-GlcNAc:1-L-myo-inositol-1-P α-N-acetylglucosaminyltransferase; MshA; Vetting et al., 2008). Both of these hits were GT4 family members, as expected, but neither hit used the same donor or acceptor as GlgM.
The top hits also included GT5 family bacterial glycogen synthases from E. coli (PDB entry 2qzs; Fig. 2c; Sheng et al., 2009), Pyrococcus abyssi (PDB entries 3fro and 2bis; Díaz et al., 2011; Horcajada et al., 2006) and Agrobacterium tumefaciens (PDB entry 1rzu; Buschiazzo et al., 2004). The lowest r.m.s.d. was 2.6 Å and the highest protein sequence identity was 27%, both of which were for the P. abyssi enzyme and were very similar to the values for the best GT4 hits. Like GlgM, these enzymes use ADP-glucose as a donor. The structural similarity between these proteins confirmed that GlgM had a GT-B fold. This was to be anticipated because both GT4 and GT5 family members are expected to have this fold according to the Carbohydrate Active Enzymes database (Lombard et al., 2014).
The glycogen synthases are known to exist in either open or closed conformations, which generally correspond to states with vacant and occupied active sites, respectively. However, [...] (Sheng et al., 2009; Fig. 2d) and P. abyssi (Díaz et al., 2012) [...] (Sheng et al., 2009) and within the same P. abyssi structure (PDB entry 3fro), although the A chain was closest to a closed conformation and the B chain was closest to an open conformation. It is likely that there is much more flexibility in this protein family in solution and that the conformations observed in crystal structures are determined in large part by crystal-packing forces, especially when the active sites are vacant. Analysis of all inter-subunit contacts present within the GlgM crystal using the jsPISA server (http://www.ccp4.ac.uk/pisa/; Krissinel, 2015) revealed one significant interface of 1041 Å², in which three parallel α-helices from the N-terminal domain of one subunit interact in an antiparallel fashion with the equivalent helices in a noncrystallographic twofold-related subunit to give a six-helix bundle and an elongated homodimer (Figs. 2a and 2b). Structural differences in the equivalent regions of the bacterial glycogen synthases would preclude the formation of such an interface. Indeed, the E. coli enzyme is reported to be monomeric (PDB entry 2qzs; Fig. 2c; Sheng et al., 2009), whilst the P. abyssi enzyme is trimeric in solution (Horcajada et al., 2006) and within crystals (PDB entry 3fro; Fig. 2f; Díaz et al., 2011). In the trimer, the interactions are also through the N-terminal domains, but in a manner that is distinct from that seen in GlgM (compare Figs. 2a and 2b with Fig. 2f). Confirmation that GlgM forms a dimer in solution was obtained using size-exclusion chromatography. This indicated a molecular mass of 107 kDa, which is close to the theoretical mass of a dimer (87 kDa); the slight overestimation of this value could be owing to the elongated aspect of the GlgM homodimer.
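The assignment of the size-exclusion result to a dimer can be made explicit with a small worked calculation. This sketch simply picks the oligomeric state whose theoretical mass lies closest to the apparent mass; the function name and the upper limit of six subunits are assumptions, and a calibrated column reports hydrodynamic size rather than mass, which is why elongated particles tend to be overestimated.

```python
PROTOMER_KDA = 43.4853   # protomer mass quoted in the text (43 485.3 Da)
APPARENT_KDA = 107.0     # apparent mass from size-exclusion chromatography


def nearest_oligomer(observed_kda, protomer_kda, max_subunits=6):
    """Return the subunit count whose theoretical mass is closest to the observed mass."""
    return min(
        range(1, max_subunits + 1),
        key=lambda n: abs(n * protomer_kda - observed_kda),
    )


n_subunits = nearest_oligomer(APPARENT_KDA, PROTOMER_KDA)
dimer_kda = 2 * PROTOMER_KDA
ratio = APPARENT_KDA / PROTOMER_KDA
print(f"apparent/protomer mass ratio = {ratio:.2f}; closest state = {n_subunits}-mer")
print(f"theoretical dimer mass = {dimer_kda:.0f} kDa")
```

The ratio of about 2.5 sits between a dimer and a trimer, so the chromatography on its own is only suggestive; it is the agreement with the single significant crystal interface that supports the dimer assignment.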
The C-terminal domains of the glycogen synthases are known to be primarily responsible for binding their nucleotide sugar phosphate donors. There were indeed fewer secondary-structural differences between GlgM and the glycogen synthases in this domain (compare the left-hand coloured domain in Figs. 2b and 2c), as expected given their shared donor substrates. Several amino-acid side chains that are known to interact with ADP-glucose in the E. coli and A. tumefaciens glycogen synthases are conserved in GlgM and in the glycogen synthase from P. abyssi (Fig. 2g). These are His109 (His161 in the E. coli enzyme), Val146 (Val211), Asn171 (Asn246), Arg207 (Arg300), Lys212 (Lys305), Glu289 (Glu377), Gly292 (Gly380) and Ile293 (Leu381). All of these residues are shared between the GlgM enzymes from M. smegmatis and M. tuberculosis.
By contrast, the N-terminal domain is likely to be largely responsible for binding the acceptor substrate, and there were more significant differences in secondary structure; most notably, the central parallel β-sheet is two strands shorter in GlgM relative to the bacterial GlgA glycogen synthases (compare the right-hand coloured domain in Figs. 2b and 2c). This would be expected given the significant differences between α-glucose-1-phosphate and a polymeric glucan substrate. The acceptor-binding sites of the glycogen synthases have not yet been defined in detail. Likewise, we have so far been unable to obtain ligand-bound structures of GlgM, making it difficult to rationalize these structural differences with respect to function. For example, the soaking of crystals with either ADP-glucose, glucose-1-phosphate, ADP-glucose plus glucose-1-phosphate or ADP plus glucose-1-phosphate did not yield ligand-bound structures. Furthermore, co-crystallization with both ADP and glucose-1-phosphate did not yield suitable crystals.
In conclusion, we have shown that GlgM from M. smegmatis exhibits high α-maltose-1-phosphate synthase activity and that it shares a GT-B fold with bacterial glycogen synthases. Further studies will be required to establish which structural features define the specificity for the acceptor in this enzyme.