Structural and functional studies of the glycoside hydrolase family 3 β-glucosidase Cel3A from the moderately thermophilic fungus Rasamsonia emersonii

Cel3A from the thermophilic fungus R. emersonii has proven to be more efficient in the hydrolysis of β-glycosidic linkages than Cel3A from H. jecorina.


Introduction
The complete degradation and saccharification of cellulose requires a suite of synergistically acting enzymes. Retaining β-glucosidases (BGLs) belong to glycoside hydrolase (GH) families GH1, GH3, GH5, GH30 and GH116 (Lombard et al., 2014). They hydrolyse the β-linkage from the reducing end of glucose oligosaccharides. These enzymes are secreted by cellulose-degrading organisms and it has been shown that enzyme mixtures with enhanced levels of the native GH3 Cel3A from the mesophilic fungus Hypocrea jecorina (HjCel3A) benefit the conversion of cellulose to glucose. The GH3 Cel3A from the thermophilic fungus Rasamsonia emersonii (ReCel3A) is a more efficient additive to enzyme mixtures compared with HjCel3A. Biochemical characterization of ReCel3A revealed a substrate preference for disaccharides over longer oligosaccharides. The crystal structure of ReCel3A is a tetramer composed of two biological dimers. Each protein molecule has a three-domain architecture, as observed for previous glycoside hydrolase family 3 BGLs.
Interesting features of the structure are the long C-terminal linker that extends the active-site cleft and the high degree of N-glycosylation. There are N-glycan chains that are partially covered by the extended linker and N-glycans that comprise part of the dimeric interface.

The need for fuels from biomass
The production of biofuels and chemicals in biorefineries from biomass, in lieu of nonrenewable petrochemicals, has garnered much attention in recent years (Ragauskas et al., 2006; Kamm & Kamm, 2007). Cellulose is a major structural polysaccharide in plant cell walls and is a highly attractive renewable energy source as it is the most abundant polysaccharide on earth (Chundawat et al., 2011). Cellulose is built up of β-1,4-linked glucose molecules and can be degraded to the monosaccharide glucose using enzymes. In the cell wall, cellulose molecules are organized into fibrils in which the chains are parallel to each other. Intermolecular hydrophobic and hydrophilic interactions hold the cellulose chains in a fibril together. Cellulose fibrils are highly recalcitrant and are not readily accessible to microbial and enzymatic degradation (Himmel et al., 2007). The filamentous fungus H. jecorina is capable of producing large amounts of extracellular plant polysaccharide-degrading enzymes (Martinez et al., 2008) and these enzymes have been used for a wide variety of industrial applications (Nakari-Setälä et al., 2009). To degrade lignocellulosic biomass H. jecorina produces a set of cellulases, which work together synergistically to degrade the recalcitrant cellulose polymer. The three main groups of cellulose-degrading enzymes are endoglucanases [endo-(1,4)-β-D-glucan hydrolases; EC 3.2.1.4], which randomly cleave the β-1,4 linkage between two adjacent glucose units in the cellulose polymer, cellobiohydrolases [(1,4)-β-D-glucan cellobiohydrolases; EC 3.2.1.91], which processively release the disaccharide cellobiose from either the reducing or the nonreducing end of a polymer, and BGLs (EC 3.2.1.21), which hydrolyse cellobiose into glucose monosaccharides.

Improvement of cellulase mixtures: ratio optimization, protein engineering and enzyme-homology screen
The enzymes needed for the industrial production of biofuels from lignocellulosic biomass represent a significant part of the process costs. Therefore, there is interest in obtaining enzyme mixtures with increased performance. One approach is to optimize the enzyme ratios of the mixture. It has been shown that enriching the H. jecorina secretome with additional amounts of the endogenous BGL Cel3A (HjCel3A) increases the performance of the mixture in the conversion of cellulose to glucose (Karkehabadi et al., 2014; Barnett et al., 1991). Alternatively, enzyme cocktails can be improved by protein engineering (Lantz et al., 2010). A third approach to improving enzyme mixtures is to substitute components with homologues from alternative sources.
In this study, ReCel3A was cloned and expressed heterologously in H. jecorina and subsequently purified for crystallization and biochemical characterization. ReCel3A is an efficient cellobiase. We compared the efficiency of the hydrolysis of lignocellulosic biomass by mixtures that contained either ReCel3A or Cel3A from H. jecorina (HjCel3A). Detailed biochemical analysis revealed that mixtures containing ReCel3A yielded a significantly improved performance. We also present the three-dimensional crystal structure of Cel3A from the moderately thermophilic filamentous fungus R. emersonii solved to 2.2 Å resolution. The structure was solved with an intact extensive C-terminal loop, similar to those observed for AfβG and AoβG (Agirre et al., 2016) but unlike the partially flexible or proteolytically cleaved C-terminal loop observed in the structure of AaBGL1 (Suzuki et al., 2013). In addition, it exhibits extensive N-glycosylation.

Expression and purification of R. emersonii Cel3A
The cel3a gene from R. emersonii (GenBank AAL69548.3) was codon-optimized for expression in H. jecorina and synthesized by GeneArt (now Life Technologies, Grand Island, New York, USA). The synthetic gene was cloned into a pTrex3G shuttle vector (amdSᴿ, ampᴿ, Pcbh1; Foreman et al., 2005). This construct was then used for the transformation of a derivative of H. jecorina strain RL-P37 with the four major cellulases deleted (cel5A, cel6A, cel7A and cel7B; Foreman et al., 2005). Transformants of H. jecorina were picked from Vogel's minimal medium plates (Vogel, 1956) containing acetamide after 7 d incubation at 37 °C. Picked transformants were grown in Vogel's minimal medium with a mixture of glucose and sophorose as a carbon source. The resulting H. jecorina strain expressed ReCel3A at levels exceeding several grams per litre, constituting more than 50% of the total secreted protein, as judged by SDS-PAGE. The supernatant was concentrated to 168 g of total protein per litre by ultrafiltration at 4 °C using Vivaspin 20 centrifuge concentration tubes with 3000 Da molecular-mass cutoff (Sartorius Stedim Biotech, France).
The ReCel3A culture liquid was sterile-filtered (Sarstedt Filtropur 0.2 µm filters) and then purified on an ÄKTAexplorer (GE Healthcare Biosciences, Sweden) by gel filtration using a Superdex 200 16/60 GL column (GE Healthcare Biosciences, Sweden). The column was equilibrated with 25 mM bis-tris propane pH 7.5. Elution fractions containing ReCel3A were concentrated using Vivaspin 20 centrifuge concentration tubes with 3000 Da molecular-mass cutoff (Sartorius Stedim Biotech, France) to a concentration of 15 mg ml⁻¹ for enzyme-crystallization studies. The ReCel3A protein was further purified by affinity chromatography using a p-aminobenzyl-thio-β-glucopyranoside tag coupled to activated Sepharose (GE Healthcare, Uppsala, Sweden) according to the manufacturer's instructions. The affinity column was equilibrated and washed with 100 mM acetate buffer pH 5.0 containing 200 mM NaCl. The bound protein was eluted from the column with 100 mM glucose in 100 mM acetate buffer pH 5.0. Glucose was removed by repeated concentration and dilution using the Vivaspin 20 tubes mentioned above. The ReCel3A sample was highly pure as judged by SDS-PAGE after the affinity-chromatography purification. The purified samples were used for kinetic analyses. The protein concentration was estimated by measuring the absorbance of the protein solution at 280 nm using a calculated extinction coefficient of 165 630 M⁻¹ cm⁻¹ for ReCel3A.
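The absorbance-based concentration estimate can be sketched with the Beer-Lambert relation, c = A280/(ε·l), using the extinction coefficient stated above. The function name, the default 1 cm path length and the dilution-factor argument are illustrative assumptions and are not taken from the purification protocol.

```python
# Extinction coefficient quoted in the text for ReCel3A (M^-1 cm^-1).
EPSILON_RECEL3A = 165_630.0


def molar_concentration(a280, epsilon=EPSILON_RECEL3A, path_cm=1.0, dilution=1.0):
    """Return the protein concentration in mol/l from an A280 reading.

    a280     -- measured absorbance at 280 nm (dimensionless)
    epsilon  -- molar extinction coefficient in M^-1 cm^-1
    path_cm  -- optical path length in cm (assumed 1 cm by default)
    dilution -- factor by which the sample was diluted before measurement
    """
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path_cm must be positive")
    if a280 < 0 or dilution <= 0:
        raise ValueError("a280 must be non-negative and dilution positive")
    # Beer-Lambert: A = epsilon * c * l, corrected for any prior dilution.
    return a280 * dilution / (epsilon * path_cm)


if __name__ == "__main__":
    # Hypothetical reading, for illustration only.
    conc = molar_concentration(0.50)
    print(f"{conc * 1e6:.2f} uM")
```

With these assumptions, a hypothetical A280 reading of 0.50 in a 1 cm cuvette corresponds to roughly 3.0 µM ReCel3A.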

Enzyme kinetics of ReCel3A
Kinetic characterization of ReCel3A was carried out using the substrates 2-chloro-4-nitrophenyl-β-D-glucopyranoside (CNPG) and 4-nitrophenyl-β-D-glucopyranoside (pNPG) (Sigma-Aldrich, USA). Both assays were run at 37 °C in 100 mM phosphate buffer pH 5.0 in Eppendorf tubes incubated in a Thermomixer R (Eppendorf, Germany). Enzyme at a suitable concentration (0.5-1 nM) was added for single measurements in each experiment to 600 µl substrate solution. At each time point, 100 µl of reaction mixture was withdrawn and added to 100 µl 0.5 M Na₂CO₃. The absorbance of the sample was then measured at 415 nm in a spectrophotometer. The initial velocity {[CNP] (µM min⁻¹) and [pNP] (µM min⁻¹)} was calculated using a standard curve for CNP and pNP in the range 0-30 µM. The kinetic parameters were calculated by fitting the data to the Michaelis-Menten equation with Plot (Wesemann, 2007). Using the two natural substrates cellobiose and cellotriose, the reaction was followed by detecting the catalytic products using high-performance anion-exchange chromatography with pulsed amperometric detection (HPAEC-PAD; Dionex ICS-3000, Sunnyvale, California, USA). A solution with a substrate concentration of 50-3000 µM and enzyme at 0.6-1.4 nM concentration was incubated at 37 °C at pH 5.0. An aliquot of 30 µl of the sample was withdrawn and added to 30 µl of 0.1 M NaOH to stop the reaction and this was performed at 2 min intervals for 10 min. Each sample was then loaded onto a CarboPac PA-100 analytical column (4 × 250 mm; Dionex, Sunnyvale, California, USA). Elution was performed using 100 mM NaOH and a gradient of sodium acetate from 10 to 170 mM in 100 mM NaOH over 27 min at a flow rate of 1 ml min⁻¹. Quantification of the hydrolysis products was performed using standards for the hydrolytic products.
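To illustrate how kinetic parameters are obtained from initial velocities, the following minimal sketch estimates Vmax and Km and derives kcat = Vmax/[E]. The text does not describe the fitting procedure used in Plot, so the sketch substitutes a Hanes-Woolf linear regression ([S]/v against [S]) as a simple stand-in. All function names are hypothetical, and the data are synthetic, noise-free values generated from assumed parameters, not measurements from this study.

```python
def michaelis_menten(s, vmax, km):
    """Initial velocity predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)


def fit_hanes_woolf(substrate, velocity):
    """Estimate (vmax, km) by least-squares regression of [S]/v on [S].

    Rearranging v = Vmax*[S]/(Km + [S]) gives
    [S]/v = [S]/Vmax + Km/Vmax, so slope = 1/Vmax and intercept = Km/Vmax.
    """
    if len(substrate) != len(velocity) or len(substrate) < 2:
        raise ValueError("need at least two paired observations")
    if any(v <= 0 for v in velocity):
        raise ValueError("velocities must be positive")
    xs = [float(s) for s in substrate]
    ys = [s / v for s, v in zip(xs, velocity)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("substrate concentrations must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


def turnover_number(vmax, enzyme_conc):
    """kcat = Vmax / [E]; both arguments must share the same concentration unit."""
    return vmax / enzyme_conc


# Synthetic example: assumed Vmax (uM/min) and Km (uM), not data from the study.
true_vmax, true_km = 0.05, 400.0
substrate_um = [50.0, 100.0, 250.0, 500.0, 1000.0, 3000.0]
velocity = [michaelis_menten(s, true_vmax, true_km) for s in substrate_um]

fitted_vmax, fitted_km = fit_hanes_woolf(substrate_um, velocity)
kcat_per_min = turnover_number(fitted_vmax, 0.001)  # 1 nM enzyme = 0.001 uM

if __name__ == "__main__":
    print(fitted_vmax, fitted_km, kcat_per_min)
```

Linearized fits weight experimental error unevenly, so for real, noisy data a direct nonlinear least-squares fit of the Michaelis-Menten equation is generally preferable.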

Differential scanning calorimetry
ReCel3A samples were dialyzed against 10 mM sodium acetate pH 5.0. The samples were diluted to 0.5 mg ml⁻¹ in the absence and presence of 1 mM glucose. The heat capacity was recorded over a temperature trajectory of 30-100 °C at a scan rate of 200 °C h⁻¹ using a MicroCal VP-Capillary DSC microcalorimeter (GE Healthcare, Pittsburgh, Pennsylvania, USA). Unfolding was irreversible for all tested samples.

Saccharification assay
Corn stover was pretreated with dilute sulfuric acid by the US Department of Energy National Renewable Energy Laboratory (NREL). It was washed with water and the pH was adjusted to 5.0 using soda ash. The acid-pretreated corn stover contained 56% cellulose, 4% hemicellulose and 29% lignin. Enzymes were dosed based on total protein load, and total protein was measured using either a bicinchoninic acid (BCA) assay kit (Bio-Rad, Hercules, California, USA) or the biuret method (Lowry et al., 1951). The enzyme was dosed as milligrams of protein per gram of cellulose in the reaction. Various amounts of ReCel3A (0.1-10 mg g⁻¹) in an experimental setup with four replicates were added to a base level of 10 mg g⁻¹ P37 Δbgl1, which is H. jecorina strain P37 with the bgl1 (cel3A) gene deleted. 75 µl of pretreated corn stover (PCS, loading 7% cellulose) per well was loaded into a flat-bottom 96-well microtitre plate (MTP). 30 µl of appropriately diluted enzyme solution was added to each reaction well. The plates were covered with aluminium plate sealers and incubated in a plate incubator at 50 °C with shaking. The reaction was terminated after 48 h incubation by adding 100 µl 100 mM glycine pH 10. After thorough mixing, the reaction mixtures were filtered through a 96-well filter plate (0.45 µm, PES; Millipore, Billerica, Massachusetts, USA). The filtrate was diluted into a plate containing 100 µl 10 mM glycine pH 10.0, and the amount of soluble sugars produced was measured by HPLC (Agilent 1100, Agilent, Santa Clara, California, USA) equipped with a de-ashing guard column (catalogue No. 125-0118, Bio-Rad, Hercules, California, USA) and a lead-based carbohydrate column (Aminex HPX-87P, Medway, Massachusetts, USA). The mobile phase was water and the flow rate was 0.6 ml min⁻¹. The fractional cellulose conversion was calculated from the amounts of released glucose and cellobiose divided by the maximum possible amount of glucose that can be produced. The amounts of cellobiose were corrected for the weight of one extra water molecule upon hydrolysis to glucose.
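The conversion calculation described above can be sketched as follows. The molar masses are standard values, while the function name, the choice of g l⁻¹ as the unit and the example numbers are illustrative assumptions. The exact bookkeeping used in the study is not given in the text.

```python
# Standard molar masses in g/mol.
GLUCOSE_MW = 180.16
CELLOBIOSE_MW = 342.30          # two glucose units minus one water
ANHYDROGLUCOSE_MW = 162.14      # one glucose unit within the cellulose chain


def fractional_conversion(glucose_g_l, cellobiose_g_l, cellulose_g_l):
    """Return the fraction of cellulose converted to soluble sugar.

    Cellobiose is expressed as glucose equivalents by adding the mass of the
    water molecule taken up on hydrolysis (factor 2*180.16/342.30).
    The maximum glucose yield accounts for the water added per anhydroglucose
    unit when cellulose is fully hydrolysed (factor 180.16/162.14).
    """
    if cellulose_g_l <= 0:
        raise ValueError("cellulose loading must be positive")
    cellobiose_as_glucose = cellobiose_g_l * (2.0 * GLUCOSE_MW / CELLOBIOSE_MW)
    max_glucose = cellulose_g_l * (GLUCOSE_MW / ANHYDROGLUCOSE_MW)
    return (glucose_g_l + cellobiose_as_glucose) / max_glucose


if __name__ == "__main__":
    # Hypothetical HPLC readings for one well, for illustration only.
    print(round(fractional_conversion(30.0, 2.0, 50.0), 3))
```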

Crystallization and data collection
Crystallization of ReCel3A was carried out at 293 K using the hanging-drop vapour-diffusion method (McPherson, 1999). Crystallization drops were produced by mixing 1 µl of 18.9 mg ml⁻¹ protein solution in 25 mM bis-tris propane (Sigma-Aldrich, USA) pH 7.5 with 1 µl reservoir solution consisting of 0.15 M MgCl₂·6H₂O (Merck Millipore, Germany) and 16%(w/v) polyethylene glycol (PEG) 3350 (Hampton Research, USA). Rectangular crystals appeared in the drops within one week. Prior to X-ray data collection, crystals of ReCel3A were transferred to a cryoprotectant solution consisting of 40%(v/v) 2-methyl-2,4-pentanediol (MPD; Hampton Research, USA) and 60%(v/v) reservoir solution before flash-cooling them in liquid nitrogen. X-ray diffraction data for ReCel3A were collected on the ID23-1 beamline at the European Synchrotron Radiation Facility (ESRF), Grenoble, France.

Structure determination and refinement
The collected X-ray data were processed using XDS (v. [...] of the X-ray diffraction data were excluded from the refinement for Rfree calculations (Brünger, 1992). Throughout the refinement, the electron-density maps were inspected and the model was manually adjusted during repetitive cycles of iterative model building using Coot v.0.8.3 (Emsley et al., 2010) and maximum-likelihood refinement using REFMAC v.5.8.0135 (Murshudov et al., 2011). Water molecules were added using ARP/wARP v.7.1 (Lamzin & Wilson, 1993). Statistics from data processing and structure refinement are summarized in Table 1. Figures were produced using PyMOL (DeLano, 2002) and Plot (Wesemann, 2007). The secondary-structure elements were assigned using STRIDE (Heinig & Frishman, 2004). Sequence similarities were calculated with ClustalW (Larkin et al., 2007). Root-mean-square deviation values (r.m.s.d.s) were calculated using LSQMAN (Kleywegt, 1996). Protein-interface volumes were calculated using PISA (Krissinel & Henrick, 2007). Atom coordinates and structure factors have been deposited in the Protein Data Bank (PDB) with accession code 5ju6.

Results and discussion
ReCel3A production and purification
The GH3 BGL ReCel3A was heterologously produced in an H. jecorina strain with the four major cellulase genes (cbh1, cbh2, egl1 and egl2) deleted. In this background ReCel3A was the major protein as judged by SDS-PAGE analysis. This simplified the subsequent purification steps in comparison to the previous production of ReCel3A in a wild-type H. jecorina strain (Murray et al., 2004). ReCel3A was purified to homogeneity using a custom affinity column. The p-aminobenzyl-β-D-glucose affinity matrix was an efficient step to separate ReCel3A from background proteins.

Table 1 footnotes: Iᵢ(hkl) is the intensity of the ith measurement of an equivalent reflection with indices hkl and ⟨I(hkl)⟩ is the mean intensity of Iᵢ(hkl) for all i measurements. ‡ Calculated using a strict-boundary Ramachandran definition given by Kleywegt & Jones (1996). § Calculated using the Privateer software (Agirre et al., 2015) within CCP4i2 and presented as introduced by [...]

Hydrolysis of lignocellulosic biomass
Previously, we demonstrated that H. jecorina cellulase mixtures with increased levels of the native BGL HjCel3A have enhanced cellulose-degradation activity (Karkehabadi et al., 2014). In this study, H. jecorina P37 Δbgl1 whole cellulase (lacking HjCel3A) mixtures supplemented with increasing levels of BGL (either ReCel3A or HjCel3A) were compared for the degradation of PCS (Fig. 1). The mixtures containing ReCel3A showed an up to 25% increase in glucose release compared with mixtures with an equal amount of HjCel3A added. These results encouraged us to study ReCel3A in more detail biochemically and to solve its three-dimensional structure using X-ray crystallography.

Enzyme kinetics
The accumulation of cellobiose during enzymatic biomass degradation severely inhibits the activity of cellulases, especially glycoside hydrolase family 7 cellobiohydrolases (Bezerra et al., 2006; Gruno et al., 2004). We have previously shown that increasing the amount of endogenous HjCel3A in mixtures of H. jecorina whole cellulase increases the conversion of phosphoric acid-swollen cellulose and washed PCS to glucose (Karkehabadi et al., 2014). The interest in the enzyme ReCel3A stemmed from the initial biochemical characterization performed by Murray et al. (2004). In that study it was shown that ReCel3A was a relatively thermostable GH3 BGL and retained much of its activity even at higher temperatures. We investigated the enzymatic properties of ReCel3A on different soluble glucan substrates, as reported in Table 2. The highest catalytic efficiency of ReCel3A among the substrates tested was for hydrolysing 2-chloro-4-nitrophenyl-β-D-glucopyranoside. More interestingly, kcat/Km was higher towards cellobiose than towards cellotriose. Hrmova et al. (1998) have previously shown that the barley BGL ExoI (HvExoI) has an increased affinity towards longer cellodextrins. This is also the case for HjCel3A (Karkehabadi et al., 2014), which when combined with their reported broad substrate affinity indicates that hydrolysing accumulating cellobiose during the degradation of cellulose might not be the primary or the only biological function of GH3 BGLs. The catalytic efficiencies of ReCel3A for the hydrolysis of model substrates, as found by Murray et al. (2004), are in general lower than those found for HjCel3A (Karkehabadi et al., 2014). We also compared the melting temperatures of ReCel3A and HjCel3A (Table 3). ReCel3A has an 8 °C higher melting temperature compared with HjCel3A in the presence of 1 mM glucose.
The superior performance of ReCel3A on PCS could be explained by its apparent preference for cellobiose compared with other types of disaccharides and cellodextrins and potentially by its higher thermal stability compared with HjCel3A.

Figure 1. Saccharification of washed acid-pretreated corn stover with 10 mg g⁻¹ H. jecorina strain P37 Δbgl1 supplemented with 0.1-10 mg g⁻¹ β-glucosidases. Data points and error bars represent the mean and standard deviation of four replicates. Horizontal lines indicate the conversion levels of 10 and 20 mg g⁻¹ P37 Δbgl1 as indicated.
[...] (HjCel3A; PDB entry 3zz1; Karkehabadi et al., 2014) as a search model. The electron-density map obtained after molecular replacement was of very good quality, from which it became obvious that the protein was heavily glycosylated (Fig. 2). The ReCel3A structure at 2.2 Å resolution was refined to final Rwork and Rfree values of 18.8 and 23.8%, respectively. The final ReCel3A structure model, consisting of four noncrystallographic symmetry (NCS)-related ReCel3A molecules in the asymmetric unit, contains a total of 3348 amino-acid residues, 1842 water molecules and a total of 181 carbohydrate residues. The structure model contains 32 cis-peptides, and there are 36 cysteines, of which 32 form 16 disulfide bonds. Additional X-ray data-collection and refinement statistics for the ReCel3A structure model are presented in Table 1.
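For readers less familiar with the refinement statistics quoted above, the conventional crystallographic R factor is given below. This is the standard textbook definition and is assumed, rather than quoted from the text, to be the one underlying the reported values. The symbols F_obs and F_calc (observed and model-calculated structure-factor amplitudes) are introduced here for illustration.

```latex
R = \frac{\sum_{hkl} \bigl|\, |F_{\mathrm{obs}}(hkl)| - |F_{\mathrm{calc}}(hkl)| \,\bigr|}
         {\sum_{hkl} |F_{\mathrm{obs}}(hkl)|}
```

Under this definition, Rwork is evaluated over the reflections used in refinement, whereas Rfree is evaluated over the test set of reflections excluded from refinement, so Rfree is normally the higher of the two.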

Overall structure
The ReCel3A crystal structure model is composed of four NCS-related ReCel3A protein molecules. The average root-mean-square deviation between the four molecules in the ReCel3A structure is 0.2 Å (with a highest deviation of 0.22 Å and a lowest deviation of 0.17 Å). Each of these protein chains consists of 834 amino-acid residues, and the first and last visible residues in all four ReCel3A molecules in the crystal structure are Asp21 and Pro855, respectively, of the translated deposited ReCel3A DNA sequence (GenBank AAL69548.3). Residues 1-20 of the translated ReCel3A DNA sequence constitute the signal peptide, as predicted by the SignalP server (Petersen et al., 2011), and are cleaved off prior to secretion of the mature protein. The numbering of amino-acid residues in the ReCel3A structure model starts from Met1 of the pre-protein. Each one of the four NCS-related ReCel3A molecules consists of three distinct structure domains, which are connected by two linker regions. No electron density is visible for the C-terminal residues 856-857, probably owing to high flexibility in this region of the protein. All other amino-acid residues of the four NCS-related ReCel3A molecules in the structure model are well ordered, and no gaps in the electron density for the main-chain atoms are found.
The ReCel3A structure is a three-domain structure with an assembly that is similar to previously reported structures, with an N-terminal domain with a TIM-barrel-like (α/β)₆ fold, sometimes also referred to as a collapsed TIM barrel, a middle (α/β)₆ sandwich domain (coloured gold in Fig. 3), which contains the glutamic acid that acts as a general catalytic acid, and a third C-terminal domain (coloured red in Fig. 3) with a fibronectin type III-like (FnIII-like) fold. [...] TnBgl3B (Pozzo et al., 2010), which all share the (α/β)₆ fold. These enzymes also have a third FnIII-like domain in common, the presence of which might contribute to stabilizing the fold of the first (α/β)₆-fold domain and allow the otherwise stable TIM barrel to collapse during evolution and open up for changes around the catalytic centre.

Subsite −1 and catalytic residues
The large number of hydrogen bonds between the protein and the ligand bound in the catalytic centre makes the binding in subsite −1 of GH3 enzymes highly specific. The two catalytic residues in ReCel3A were identified based on homology to other GH3 structures: Asp277 (nucleophile) and Glu505 (acid/base) (Fig. 4). In the ReCel3A structure, clear density is observed for a glucose unit in the −1 subsite and no indication of distortion from the relaxed chair conformation can be observed.

Putative +1 subsite
Trp278 of the ReCel3A structure aligns in the sequence with Trp268 of HvExoI, which is one side of the proposed 'coin slot' (Varghese et al., 1999). Trp278 has a similar inward shift towards the −1 subsite as the corresponding tryptophan residues in AaBGL1, HjCel3A, KmBglI and TnBgl3B (Figs. 4a-4d). The inward shifting of the tryptophan residue breaks the 'coin slot' and the rearrangement is a direct consequence of the collapsed TIM barrel described above. The −1 subsite widens when the second barrel β-strand is shorter and antiparallel. One side of the proposed 'coin slot' in the +1 subsite is partially replaced by the Tyr507 side chain. Although originating from another loop in the second domain, the phenolic ring occupies almost the same space as the benzene ring of the 'coin slot' tryptophan (Trp434 of HvExoI) to narrow the +1 subsite and the entrance to the active site (Fig. 4e).
Next to Trp278 in ReCel3A and potentially replacing the other side of the 'coin slot' are the two conserved aromatic residues Phe302 and Trp68. When compared with the corresponding residues in HjCel3A, the plane of the Trp68 side chain has turned almost 90° away from the +1 subsite. This allows aromatic stacking of the Phe302 and Trp68 side chains to form a hydrophobic 'knob' and places the phenylalanine residue in the +1 subsite rather than in the +2 subsite as in HjCel3A. This aromatic side-chain stacking further narrows the +1 subsite and contributes to a less pronounced +2 subsite. Although these aromatic residues are present in many BGLs, this stacking is not observed in HjCel3A, while it is in all three Aspergillus β-glucosidases with known structure [AaBGL1 (Figs. 4a and 4b), AfβG and AoβG].
Previously, we have shown that HjCel3A prefers the hydrolysis of slightly longer oligosaccharides, i.e. of cellotriose and cellotetraose compared with cellobiose (Karkehabadi et al., 2014). Our data for ReCel3A show that this enzyme prefers cellobiose to cellotriose. There is no increase in activity on cellotriose compared with cellobiose, which indicates that the +2 subsite contributes relatively little to substrate recognition. For HjCel3A, the activity increased for cellotriose compared with cellobiose, thus indicating the importance of a +2 subsite for HjCel3A (Karkehabadi et al., 2014). As mentioned above, the presence of a +2 subsite is less pronounced in ReCel3A than in HjCel3A, where the phenylalanine is also complemented with an asparagine (Asn261 in HjCel3A) to form the +2 subsite. The lack of a +2 subsite in ReCel3A could explain the activity profile for the enzyme as a more pronounced cellobiase than HjCel3A.

Dimerization
It has been shown that ReCel3A forms dimers in solution (Murray et al., 2004), which was confirmed in this study when performing gel-filtration characterization of ReCel3A. Dimerization is clearly supported by the crystal structure and also by the structure of AaBGL1 (Suzuki et al., 2013). The two molecules in the dimer are related by a 180° rotation (Fig. 3a). The dimer interface has a total overall contact area of 1572 Å², to which the modelled N-glycans contribute about 19%. It is mainly formed between the (α/β)₆ sandwich domains, but also includes interactions between the sandwich domain and the linker between domains 1 and 2 (Fig. 5d). This is similar to what is found in the AaBGL1 structure (Fig. 5e), which has a contact area of 1450 Å², but including the modelled N-glycans the contact area increases to 1935 Å². Similar interaction surfaces were observed in the recent publication by Agirre et al. (2016) on the structures of two Aspergillus β-glucosidases, one of which crystallized as a tetramer although with a slightly different architecture than that observed for ReCel3A.

N-glycosylation
ReCel3A is highly N-glycosylated, with a pattern resembling those of the three β-glucosidases from Aspergillus (Agirre et al., 2016; Suzuki et al., 2013). There are a total of 16 glycosylation sites in ReCel3A with the Asn-X-Ser/Thr N-glycosylation sequon. The ReCel3A structure model contains a total of 181 glycosylation residues, as summarized in Table 4. In spite of this relatively generous glycosylation of the ReCel3A molecules, it was possible to crystallize the protein without enzymatic removal of the N-glycans prior to the crystallization experiments. A large number of carbohydrate chains attached to the ReCel3A molecules in the structure model can be observed and modelled, and the longest chain is composed of ten carbohydrate residues. We can also see that the glycosylation chains contribute interactions at the crystal contacts as well as between the two NCS-related molecules.
In ReCel3A we can model carbohydrate chains, such as Man₇GlcNAc₂, that are known to be prevalent in Rut-C30-derived strains of H. jecorina (Stals et al., 2004). Wild-type strains of H. jecorina show a more normal endoplasmic reticulum (ER) glycosylation trimming, yielding Man₅₋₆GlcNAc₂ chains. Such glycans result from the trimming of Glc₃Man₉GlcNAc₂, which is transferred to the nascent peptide chain in the ER and then trimmed by α-glucosidases found in the ER. Further trimming occurs normally by the action of α-mannosidases and β-N-acetylglucosaminidases.
The Rut-C30-derived strains have an inefficient ER α-glucosidase, which accounts for the presence of untrimmed monoglucosylated N-glycans (Stals et al., 2004). We can clearly observe both the longer incompletely trimmed glycan chains, Man₈GlcNAc₂, and shorter monoglucosylated Glc₁Man₅GlcNAc₂ and Man₅₋₆GlcNAc₂ chains (Fig. 5a), as well as single N-acetylglucosamine (GlcNAc) residues, in the ReCel3A structure. Modelled carbohydrate glycans are not in themselves evidence of an N-glycosylation pattern. It is expected that glycosylation chains that are flexible and are not restrained by the protein crystal packing will not be observable using crystallographic methods. However, single GlcNAc residues are observed in the ReCel3A tetramer that cannot be part of a longer glycan as they pack tightly between protein chains and presumably provide important crystal contacts.
The exact mechanisms by which glycosylation affects the structure and function of cellulases and other proteins are unknown. N-glycosylation has been shown to increase the solubility, reduce the aggregation and enhance the thermal stability of proteins (Wang et al., 2010; Kayser et al., 2011; Ioannou et al., 1998). In ReCel3A most of the larger glycans reside on the first domain (chains I-IV). Chains V and VII are situated on the second domain, close to the proposed dimer interface of ReCel3A. The N-glycosylation chain IV shows the remarkable feature of being buried by the extended C-terminal loop. Two conserved aromatic residues, Tyr720 and Tyr727, on the loop provide stacking interactions with the two buried NAG residues 1201 and 1202. This is very similar to what was reported for the two Aspergillus β-glucosidases (Agirre et al., 2016). As stated previously, ReCel3A is most likely to exist as a dimer in nature. Interestingly, the overall glycosylation pattern for the dimer shows that the active site on each of the monomers seems to be encircled by glycans, of which some originate from the other monomer (Fig. 6a). The opposite face of ReCel3A, by contrast, is seemingly devoid of glycosylation (Fig. 3a), both modelled glycans and predicted sites, with the notable exception of glycan chain VI in protein chains A and C, where a single GlcNAc interacts with the opposite NCS-related protein molecule. The glycosylations in ReCel3A could be a contributing factor to the thermal stability of the enzyme. Another function could be to protect the active site from lignin-derived aromatic compounds or to promote substrate binding. It has been shown that N-glycans bind aromatic residues (Yamaguchi et al., 1999) and potentially cellulose (Payne et al., 2013).

Figure. Cartoon representations of (a) the R. emersonii Cel3A (ReCel3A) and (b) the A. aculeatus BGLI (AaBgl1) dimers in the two structures. N-glycans in the two structures are shown as magenta spheres.