Structural insights into the substrate-binding proteins Mce1A and Mce4A from Mycobacterium tuberculosis

The mammalian cell entry genes (mceA–mceF) encode the substrate-binding proteins of the lipid-transporting Mce complexes in mycobacteria. The MCE domain of Mce4A has been crystallized as a domain-swapped dimer with the signature β-barrel fold. Solution studies show that the domains of Mce1A and Mce4A are predominantly monomeric and suggest that the helical domain is involved in lipid interactions.


Introduction
Mycobacterium tuberculosis (Mtb) is a deadly intracellular pathogen that causes the disease tuberculosis (Tb), which is responsible for more than a million deaths every year. Approximately one quarter of the population of the world is latently infected with Mtb (World Health Organization, 2018). Mtb can persist in a host for months to years. It is one of the very few bacteria that rely on host lipids as the source of energy and carbon for intracellular survival. It also converts these lipid molecules into precursors for cell-membrane remodeling, cell-wall homeostasis and ultimately pathogenesis (Cantrell et al., 2013; Santangelo et al., 2016; Queiroz & Riley, 2017; Zhang et al., 2018; Fenn et al., 2020; Alonso et al., 2020). This property might be most relevant during the intra-phagosomal latent stage of infection (Pandey & Sassetti, 2008). The mammalian cell entry (Mce) proteins encoded by the mce1, mce2, mce3 and mce4 operons [Fig. 1(a)] play a pivotal role in the import of lipid molecules and in Mtb pathogenesis (Cole et al., 1998). Each of these operons comprises 10-14 genes. Their name is based on the initial observation that a DNA fragment (corresponding to Mce1A) from Mtb (strain H37Ra), when expressed in Escherichia coli, caused cell entry of E. coli into HeLa cells (Arruda et al., 1993). Similar to Mce1A, the expression of Mce3A and Mce4A in E. coli also provides E. coli with the ability to invade HeLa cells (El-Shazly et al., 2007; Saini et al., 2008). Nevertheless, subsequent research has shown that the primary role of these proteins is lipid transport. Mce proteins are also involved in modulating host cell signaling, cell-wall homeostasis and cell-membrane remodeling (Alonso et al., 2020; Fenn et al., 2020; Queiroz & Riley, 2017; Santangelo et al., 2016) and are therefore important for the survival and pathogenesis of Mtb.
In terms of lipid transport in Mtb, Mce proteins are characterized as ABC transporters. It is now well demonstrated that Mce1 is involved in the transport of mycolic acid/fatty acids and Mce4 imports cholesterol. Mtb that is disrupted in the Mce2 operon accumulates sulfolipid-1 at levels nearly ten times that of wild-type Mtb during stationary growth (Pandey & Sassetti, 2008; Casali & Riley, 2007; Marjanovic et al., 2011). The substrate specificity of the Mce3 complex is still unknown. These studies suggested that the mce operons encode the permeases (YrbEA and YrbEB) and the substrate-binding proteins (SBPs) for the formation of the ABC transporter (Casali & Riley, 2007; Perkowski et al., 2016). In addition, the mce1, mce3 and mce4 operons code for Mce-associated membrane proteins (Mam, also known as Mas), which probably stabilize the Mce complexes (Perkowski et al., 2016). The ATPase of this ABC transporter is proposed to be encoded by the mceG gene (also known as mkl), which is located elsewhere in the genome (Joshi et al., 2006).

Figure 1 (caption, partially recovered): [...] (Ekiert et al., 2017; Isom et al., 2020; Tang et al., 2021). Lipid transport by the EcMlaFEDB complex depends on a ferry-based lipid-transport mechanism, whereas the EcPqiB and EcLetB complexes facilitate a tunnel-based transport mechanism (Ekiert et al., 2017; Kamischke et al., 2019; Coudray et al., 2020; Isom et al., 2020; Liu et al., 2020; Mann et al., 2020). OM is the outer membrane of Mtb, PG is peptidoglycan and IM is the inner membrane. In each of these transporters the MCE domains are assembled into hexameric rings that stabilize the assembled homohexameric complexes.
Although important functions of Mce proteins from Mtb have been established, no detailed protein-level characterization or structural information is available for these proteins from Mtb or any other actinobacterial species. This is mainly due to difficulties in the recombinant expression and purification of these membrane proteins. Homologs of the Mce SBPs from E. coli (EcMlaD, EcPqiB and EcLetB) and Acinetobacter baumannii (AbMlaD) have recently been characterized (Ekiert et al., 2017; Kamischke et al., 2019; Coudray et al., 2020; Isom et al., 2020; Liu et al., 2020; Mann et al., 2020) [Fig. 1(b)]. A common feature of these proteins is that they all contain a conserved domain of approximately 100 residues, now referred to as the MCE domain, which is characterized by a seven-stranded β-barrel fold, although the sequence identity of these domains is very low. EcMlaD and AbMlaD have a single MCE domain, which forms a homohexamer in the assembled complex. EcPqiB and EcLetB have three and seven MCE domains, respectively, in a single polypeptide, which form stacks of homohexamers in the assembled complex. In contrast, each of the four Mtb mce operons encodes six different Mce SBPs, and these SBPs have more domains than the E. coli and A. baumannii homologs. In this study, our main objectives have been to identify the various domains of MtMce1A and MtMce4A, guided by sequence analysis and secondary-structure prediction, and to perform a detailed structural characterization. The results of these studies show that the SBPs of Mtb have unique structural properties that differ from those of their bacterial counterparts.

Biochemicals
The genomic DNA of Mtb H37Rv was purchased from ATCC. Phusion DNA polymerase and the restriction enzymes used for cloning were purchased from Thermo Scientific (Massachusetts, USA) and New England Biolabs. The Ni-NTA chromatography resin was obtained from Qiagen (Hilden, Germany).

Cloning, expression and purification of MtMce1A-1F and MtMce4A-4F
Individual MtMce1A-1F and MtMce4A-4F genes were PCR-amplified using Mtb H37Rv genomic DNA as the template with specific primers (Supplementary Tables S1 and S2). Each amplicon was cloned into the pETM11 vector (EMBL) using a restriction-based cloning method, resulting in an N-terminal His6 tag followed by a TEV protease site, the MceA-F gene and a C-terminal His6 tag. For protein expression, the plasmid was transformed into E. coli BL21-RIPL competent cells. Overnight cultures were grown at 30 °C until the OD600 reached 0.6, and expression of the protein was induced with 0.5 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) at 16 °C overnight. The cells were harvested by centrifugation at 4000g. The bacterial pellet was resuspended in the desired lysis buffer with a suitable detergent (Supplementary Table S3). The cells were lysed by sonication and the lysate was centrifuged at 15 000g and 4 °C for 30 min. The supernatant was then filtered (0.45 µm; Millipore) and the proteins were allowed to bind to the Ni2+-NTA matrix for 1 h. The beads were washed, and bound proteins were eluted from the Ni-NTA column using 400 mM imidazole in the elution buffer (Supplementary Table S3). At this step, the concentration of the detergent was reduced to 5 mM. The eluted protein was analyzed by 12% or 18% SDS-PAGE, concentrated (spin concentrator, molecular-mass cutoff 30 kDa; Millipore) and injected onto a size-exclusion chromatography (SEC) column (Superdex 200 10/300 or Superdex 75 HiLoad 16/600; GE Healthcare).

Expression and purification of the MtMce1A and MtMce4A domains
Based on the secondary-structure analysis, MtMce1A and MtMce4A domain constructs were generated. They were cloned into pETM11 using restriction-free cloning methods, and the constructs were named according to their secondary-structural features [...]. For selenomethionine (SeMet)-labeled MtMce4A 39-140, the construct was transformed into an auxotrophic strain of E. coli (B834), which was grown according to the protocol from Molecular Dimensions (Ramakrishnan et al., 1993). The expression and purification protocols were similar to those for native MtMce4A 39-140. SeMet incorporation was confirmed by electrospray ionization liquid chromatography-mass spectrometry (ESI LC-MS), which showed 100% incorporation of SeMet into the protein.

[...] (Wyatt Technologies). Purified protein at approximately 5-6 mg ml⁻¹ was loaded onto a pre-equilibrated Superdex 200 10/300 column using an autosampler at a rate of 0.4 ml min⁻¹ at 4 °C in a Shimadzu HPLC/FPLC system. The samples were then passed through a refractive-index (RI) detector, a UV detector and subsequently through the MALS detector. The cumulative data collected from the UV, MALS and RI detectors were analyzed using the ASTRA software (Wyatt Technologies). The protein-conjugate analysis method was used to analyze the proteins that were complexed with detergent. The detergent was considered as a modifier and the recommended dn/dc value of 0.1473 ml g⁻¹ for n-dodecyl β-D-maltoside (DDM) was used for the protein-conjugate analysis. Analysis of the soluble MtMce1A 36-148 and MtMce4A 39-140 constructs was performed without using the protein-conjugate protocol.
To understand the effect of heat and higher ionic strength on the oligomeric state of the MCE domain, purified MtMce4A 39-140 was subjected to buffer exchange [0.1 M 2-(N-morpholino)ethanesulfonic acid (MES), 0.7 M ammonium sulfate pH 6.0] using a 10 kDa molecular-mass cutoff Amicon concentrator. MtMce4A 39-140 was heated to 50 °C in a thermocycler, with an initial 1 min incubation at 20 °C followed by a 0.8 °C increase per minute up to 50 °C and a final incubation at 50 °C for 1 min. The heated protein was then centrifuged at 10 000g for 5 min and the supernatant was injected onto a Superdex 200 10/300 column pre-equilibrated with a buffer consisting of 0.1 M MES, 0.7 M ammonium sulfate pH 6.0. The column was coupled to a MALS detector and the eluate was analyzed further to obtain the molecular mass.
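As a quick check on the duration of the heating program described above, the ramp can be tabulated as follows. This is only an illustrative sketch: the function name and argument names are hypothetical, and the default values are the temperatures, rate and hold times quoted in the text.

```python
def ramp_duration_min(start_c=20.0, end_c=50.0, rate_c_per_min=0.8,
                      hold_start_min=1.0, hold_end_min=1.0):
    """Return (ramp time, total program time) in minutes for a linear ramp.

    Defaults follow the protocol in the text: 1 min at 20 °C, a linear
    increase of 0.8 °C per minute up to 50 °C, then 1 min at 50 °C.
    """
    ramp_min = (end_c - start_c) / rate_c_per_min
    total_min = hold_start_min + ramp_min + hold_end_min
    return ramp_min, total_min


ramp, total = ramp_duration_min()
print(f"ramp: {ramp:.1f} min, total: {total:.1f} min")
```

With the stated values the ramp itself takes 37.5 min, so the whole program lasts roughly 40 min before centrifugation.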

Circular-dichroism (CD) spectroscopy of the MtMce1A and MtMce4A domains
The MtMce1A and MtMce4A domains were diluted in water to obtain a lower buffer and salt concentration. The protein concentration used for CD measurements (Chirascan CD spectrophotometer, Applied Photophysics, Surrey, UK) was 0.05 mg ml⁻¹. Secondary-structure calculations of the CD spectra of the MtMce1A and MtMce4A domains purified with DDM and without DDM were performed using the CDNN and BestSel software packages, respectively (Micsonai et al., 2015, 2018). For the determination of the thermal melting temperature (Tm), the sample was heated from 22 to 92 °C at a rate of 1 °C min⁻¹. The melting curves were calculated by comparing the spectra from 190 to 280 nm with the global fit analysis protocol as implemented in the Global3 software from Applied Photophysics.

2.6. Native mass spectrometry of the MtMce1A 36-148 and MtMce4A 39-140 domains
MtMce1A 36-148 and MtMce4A 39-140 were buffer-exchanged into 20 mM ammonium acetate pH 6.8 using PD Miditrap G-25 columns (GE Healthcare, Sweden). Mass spectra were measured on a 12 T Bruker solariX XR FT-ICR mass spectrometer using an Apollo-II electrospray ion source (Bruker Daltonics, Bremen, Germany). The instrument was calibrated using sodium perfluoroheptanoic acid (NaPFHA) clusters and was operated with the FTMS Control 2.2 software. The mass spectra were further analyzed using the DataAnalysis 5.1 software.

2.7. SAXS analysis of MtMce1A and MtMce4A domains
2.7.1. Data collection. SAXS data for the purified Mce1A and Mce4A domains were collected on the B21 beamline at Diamond Light Source (DLS), UK. Data were collected based on the standard protocols for inline SEC-SAXS and batch-mode measurement using a PILATUS 2M two-dimensional detector at a sample-to-detector distance of 4.014 m and a wavelength of 0.99 Å. Inline SEC-SAXS measurements were collected for domains purified in the presence of the detergent DDM (MtMce1A 38-325, MtMce1A 126-454, MtMce1A 38-454, MtMce4A 39-320, MtMce4A 121-400 and MtMce4A 36-400) at an initial concentration of 5 mg ml⁻¹, as SEC can separate the protein-detergent complexes from the empty micelles (Berthaud et al., 2012). Batch-mode measurements were collected for MtMce1A 38-148 and MtMce4A 39-140 at 2 and 1 mg ml⁻¹, respectively, with bovine serum albumin (BSA) as a control. For each batch-mode concentration, 25 frames were collected.
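To relate the stated geometry to the scattering-vector modulus, the following sketch converts a radial position on the detector into q using the standard definition q = 4π sin(θ)/λ, where 2θ is the scattering angle. The wavelength and the sample-to-detector distance are taken from the text; the function name and the example radius of 0.1 m are hypothetical and are not beamline parameters.

```python
import math

WAVELENGTH_A = 0.99          # X-ray wavelength quoted in the text (Å)
DETECTOR_DISTANCE_M = 4.014  # sample-to-detector distance quoted in the text (m)


def q_from_radius(radius_m, distance_m=DETECTOR_DISTANCE_M,
                  wavelength_a=WAVELENGTH_A):
    """Scattering-vector modulus q (Å^-1) at a radial detector position (m)."""
    two_theta = math.atan2(radius_m, distance_m)  # full scattering angle 2θ
    return 4.0 * math.pi * math.sin(two_theta / 2.0) / wavelength_a


# Hypothetical example: a pixel 0.1 m from the direct-beam position.
print(f"q = {q_from_radius(0.1):.3f} Å^-1")
```

For this assumed radius the result is close to 0.16 Å⁻¹, which is the region where the micelle-related minimum and secondary maximum discussed later are observed.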
2.7.2. Data processing. Data processing and analysis were performed using the ScÅtter and ATSAS software packages (Franke et al., 2017). The 2D data were averaged to give a 1D data set of intensity, I(q), versus q, where q is the modulus of the scattering vector. The scattering of the buffer was subtracted from the protein scattering using ScÅtter. The data were rebinned using in-house-developed software (Vilstrup et al., 2020) to be approximately equidistantly spaced on a logarithmic q scale. The radius of gyration (Rg), forward scattering I(0) and maximum particle distance (Dmax) were calculated using PRIMUS. The molecular weight was calculated using two methods: volume of correlation (Rambo & Tainer, 2013) and SAXSMoW (Piiadov et al., 2019; Supplementary Tables S4, S5 and S6). Ab initio shapes were generated using DAMMIN (Svergun, 1999). For MtMce4A 39-140, the compact monomer was generated from residues 32-106 of chain A and residues 107-145 of chain B of the crystal structure. The elongated monomer corresponds to chain B of the crystal structure. These were further provided as templates in Robetta to add the missing residues (Raman et al., 2009; Song et al., 2013). For MtMce1A 36-148, the entire compact and elongated models were generated with Robetta using the MtMce4A 39-140 compact and elongated crystal structures as templates. The models were evaluated against the experimental data using an in-house-written program (Steiner et al., 2018; Vilstrup et al., 2020). The complexes of DDM with the various constructs were also analyzed using in-house-developed software. The program is based on the methods described previously (Kaspersen et al., 2014; Steiner et al., 2018; Vilstrup et al., 2020; Calcutta et al., 2012). The DDM micellar structure is represented by Monte Carlo points in a triaxial core-shell structure with superellipsoidal shape with shape parameter t = 3 (Maric et al., 2017), and the protein is represented by the atoms in the PDB structures.
When the protein overlaps with the core-shell structure, the corresponding Monte Carlo points were removed. The volume of the core was estimated from the number of points and the point density, and the aggregation number was calculated by dividing the core volume by the volume of a C12 chain (353 Å³). The shell contains both DDM headgroups and solvating buffer, and the thickness of the shell was fixed at 10 Å. In practice, the aggregation number was kept fixed and the lengths of the long axis and of one of the short axes were optimized, whereas the length of the third axis was calculated from these two and the aggregation number. The Monte Carlo points were assigned an excess scattering length corresponding to the electron densities of C12 tails and heads for points in the core and in the shell, respectively, taking into account the glycerol content of the buffer. Similarly, the excess scattering length of the atoms of the protein was adjusted taking the glycerol into account. The scattering of a hydration layer was added to the protein in the places where it is not in contact with the micelle. The protein structure was divided into three domains, namely the MCE, helical and tail domains, to allow rigid-body refinement. The domains (MCE+Helical, Helical+Tail and MCE+Helical+Tail, respectively, for the three constructs) were connected by soft restraints as described in Vilstrup et al. (2020). The algorithm for generating the micelle, including the estimates of the excess scattering length, was checked by fitting a data frame of pure micelles from the elution profile and gave a satisfactory fit. The SAXS data for all constructs have a deep minimum around q = 0.1 Å⁻¹ followed by a pronounced secondary maximum. This behavior is qualitatively very similar to that of pure DDM micelles, and the first tests revealed that such a q dependence could not be obtained when the protein penetrates significantly into the core of the micelles.
Further tests showed that reasonable agreement with the SAXS data was obtained when the helix of the protein was along the long axis of the DDM micelle. Therefore, starting structures with this position were used in the optimizations. Additionally, a soft restraint that keeps the helix in contact with the micelle was introduced. The structure was optimized by random searches, initially with large amplitudes, which were gradually decreased during optimization (Vilstrup et al., 2020). For each structure ten independent runs were performed, each with 4000 cycles of optimization. The structure with the best agreement with the SAXS data in terms of reduced χ² was selected as the resulting structure. Initially the aggregation numbers were estimated from the SEC-MALS results; however, in some cases this did not give good fits to the SAXS data. Therefore, the aggregation number was varied in a reasonable range for these cases.
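The aggregation-number estimate described above (core volume from the number of Monte Carlo points and the point density, divided by the C12 chain volume of 353 Å³) can be sketched as follows. This is not the in-house program: the function names are hypothetical and the example point count and point density are invented purely for illustration.

```python
C12_TAIL_VOLUME_A3 = 353.0  # volume of one C12 chain quoted in the text (Å^3)


def core_volume_a3(n_core_points, points_per_a3):
    """Core volume (Å^3) from the number of retained core points and the point density."""
    return n_core_points / points_per_a3


def aggregation_number(core_volume, tail_volume=C12_TAIL_VOLUME_A3):
    """Number of detergent molecules whose C12 tails fill the micelle core."""
    return core_volume / tail_volume


# Hypothetical example: 4236 core points sampled at 0.1 points per Å^3.
volume = core_volume_a3(4236, 0.1)
print(f"core volume = {volume:.0f} Å^3, N_agg = {aggregation_number(volume):.0f}")
```

Because points that overlap with the protein are removed before counting, the same relation also captures how insertion of the protein into the core would lower the apparent aggregation number.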

2.8. Crystallization, data collection, structure determination and structure refinement of MtMce4A
Purified MtMce4A 39-140 and SeMet-labeled MtMce4A 39-140 were concentrated to 7.5 mg ml⁻¹ in protein buffer (Table 1) and used in all of the crystallization experiments. Crystallization was performed using the sitting-drop vapor-diffusion method at three different drop ratios (100:150, 150:150 and 150:100 nl protein:reservoir solution) at 22 °C. Crystals were observed in all three drop ratios when using 100 mM sodium HEPES, 100 mM LiCl2, 20% PEG 400 pH 7.5 as the reservoir solution for native MtMce4A 39-140 and using 100 mM MES, 700 mM ammonium sulfate pH 6.0 as the reservoir solution for SeMet-labeled MtMce4A 39-140. The native MtMce4A 39-140 and SeMet-labeled MtMce4A 39-140 crystals were transferred to reservoir solution supplemented with 20% ethylene glycol and 25% glycerol, respectively, for a few minutes and the crystals were subsequently flash-cooled in liquid nitrogen.
The data for both the native MtMce4A 39-140 and SeMet-MtMce4A 39-140 crystals were collected on the BioMAX beamline at MAX IV, Lund, Sweden at 2.9 and 3.6 Å resolution, respectively (Table 1). Data processing and scaling were performed using XDS (Kabsch, 2010) and AIMLESS (Evans & Murshudov, 2013), respectively, which suggested that the space group was P6₁ or P6₅. The SeMet-labeled MtMce4A 39-140 structure was solved by SeMet SAD phasing using the CRANK2 (Skubák & Pannu, 2013) pipeline with 20 selenium sites. Subsequently, space group P6₅ was chosen based on its better figure of merit. The model obtained from the CRANK2 pipeline was completed iteratively by model building using Coot (Emsley et al., 2010) and refinement calculations using Phenix (Liebschner et al., 2019), resulting in a model with an Rwork and Rfree of 0.34 and 0.37, respectively. This model consisted of two swapped dimers in the asymmetric unit. This model was subsequently used as the search model for expert-mode molecular-replacement calculations (Expert-MR) in Phaser (McCoy et al., 2007) to determine the structure of native MtMce4A 39-140. The obtained molecular-replacement model was then used as an initial model for autobuilding in Phenix (Liebschner et al., 2019). The structure was further refined iteratively using several cycles of manual model building in Coot and refinement in Phenix. The final refinement steps gave an Rwork and Rfree of 0.19 and 0.23, respectively. This model of native MtMce4A 39-140 was then again used to refine the SeMet-labeled MtMce4A 39-140 structure, giving a final Rwork and Rfree of 0.21 and 0.24, respectively (Table 1).

[... Fig. 2(a)]. The first domain is an N-terminal transmembrane (TM) domain (~30-40 amino acids), which is predicted to form a single transmembrane helix, followed by a second domain with ~100 amino acids mainly composed of β-strands (seven in total), referred to as the MCE domain.
The third domain is predicted to mainly consist of long helices (~200 amino acids) and this domain is therefore referred to here as the helical domain. The fourth domain is predicted to be an unstructured domain and is referred to as the tail domain. Interestingly, the length of the tail domain varies between six and 260 amino acids between the various MtMceA-F SBPs, while the order and length of the other domains are well conserved. Additionally, the tail domains of MtMce1C, MtMce1D, MtMce4D and MtMce4F are proline-rich. Moreover, MtMce1E, MtMce2E, MtMce3E and MtMce4E contain a conserved sequence motif (referred to as the lipobox) in their N-terminus (Sutcliffe & Harrington, 2004). (... Table S9).

MtMce4A 39-140 crystallizes as a domain-swapped dimer
Structural studies were initiated on MtMce1A 38-454 and MtMce4A 36-400 as well as the soluble MCE domains MtMce1A 38-148 and MtMce4A 39-140. Despite extensive trials, only MtMce4A 39-140 crystallized readily in several conditions in space group P6₅. Given the low sequence identity of MtMce4A 39-140 to homologous proteins (~15%), the structure of MtMce4A 39-140 was determined using SeMet SAD phasing. The data-collection and data-processing statistics are reported in Table 1. Although Matthews coefficient calculations suggest the presence of 6-8 molecules in the asymmetric unit, assuming a solvent content of about 50%, the solved structure showed that only four molecules are present in the asymmetric unit, corresponding to a solvent content of about 71%. The structure was refined at 2.9 Å resolution (Table 1). Interestingly, further refinement and model building of the structure revealed that the four molecules of the asymmetric unit are formed by two domain-swapped dimers [Fig. 3(a)]. The electron-density map clearly defines the loops in the regions that define the swapping of the C-terminal part [Fig. 3(c)]. The domain-swapped dimer is formed by the extension of residues 107-141 from one molecule into the other molecule. The swapped region contains two β-strands and an extended loop [Fig. 3(a)].
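The relationship between the number of molecules in the asymmetric unit and the solvent content quoted above can be illustrated with the conventional Matthews approximation, in which the solvent fraction is about 1 - 1.23/VM (VM in Å³ Da⁻¹; the constant 1.23 assumes a typical protein partial specific volume). The sketch below is an illustration only: the function names are hypothetical, and the only input taken from the text is the solvent content of about 71% for four molecules.

```python
MATTHEWS_CONSTANT = 1.23  # conventional value for a typical protein (assumption)


def solvent_fraction(vm):
    """Approximate solvent fraction for a Matthews coefficient VM (Å^3/Da)."""
    return 1.0 - MATTHEWS_CONSTANT / vm


def vm_from_solvent(fraction):
    """Matthews coefficient (Å^3/Da) implied by a given solvent fraction."""
    return MATTHEWS_CONSTANT / (1.0 - fraction)


def rescale_vm(vm, n_from, n_to):
    """VM scales inversely with the number of molecules in the asymmetric unit."""
    return vm * n_from / n_to


vm_four = vm_from_solvent(0.71)  # four molecules, ~71% solvent (from the text)
for n in (4, 6, 8):
    vm_n = rescale_vm(vm_four, 4, n)
    print(f"{n} molecules: VM = {vm_n:.2f} Å^3/Da, solvent = {solvent_fraction(vm_n):.0%}")
```

Under these assumptions six to eight molecules would correspond to a solvent content of roughly 42-57%, which is consistent with the initial expectation of about 50% and shows how unusual the observed packing with four molecules is.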
The secondary structure mainly consists of antiparallel β-strands, forming a β-barrel-like structure. The topology diagram for the swapped dimer is shown in Fig. 3(b). The residues involved in formation of the seven-stranded β-barrel are Thr40-Ser46 (β1), Leu52-Met54 (β2a), Lys59-Gly65 (β2b), Ile65-Ser74 (β3), Arg81-Asp87 (β4), Thr99-Thr106 (β5), Ile107-Ile116 (β5′; considered as the sixth β-strand), His131-Val132 (β7a′) and Val137-Glu141 (β7b′). The residues from 107 to 141 are exchanged between the two monomers to complete the signature MCE fold. The overall structure has visible electron density for all of the residues corresponding to MtMce4A 39-140 except for the N-terminal residues 1-31 and C-terminal residues 143-146. The latter residues correspond to residues encoded by the vector region. Interestingly, comparison of the secondary-structure content of MtMce4A 39-140 calculated from the CD spectrum with the crystal structure showed a higher β-sheet content (39%) in the crystal than from the CD spectra in solution (28%; Supplementary Table S10), indicating that the protein has more secondary structure in the crystallized condition.

Since the domain-swapped dimer is only observed in the crystallization condition, the purified MtMce4A 39-140 was exchanged into crystallization buffer and analyzed by SEC-MALS. Surprisingly, SEC-MALS analysis also showed only the presence of monomeric MtMce4A 39-140 in the crystallization buffer (Supplementary Fig. S10). However, dimer formation was observed when MtMce4A 39-140 was heated slowly to 50 °C in the crystallization buffer (0.7 M ammonium sulfate; Supplementary Fig. S10). These observations suggest that incubation of this protein solution with the crystallization solution at 22 °C probably facilitated the protein in attaining a different conformation, including the formation of a domain-swapped dimer. The dimer appears to be selectively crystallized, for example favored by better crystal contacts, compared with the monomer.
There are other examples of full-length proteins and truncated domains which exist in different oligomeric states in solution but occur as domain-swapped dimers in the crystalline phase. These examples include barnase, cyanovirin-N, the N-terminal domain of Spo0A and the SH3 domain of Eps8, to name a few (Yang et al., 1999; Lewis et al., 2000; Radha Kishan et al., 1997).

Elongated conformation of MtMce1A 36-148 and MtMce4A 39-140 in solution
The domain-swapped dimer is only observed in the crystals. Therefore, to understand the structures of MtMce1A 36-148 and MtMce4A 39-140 in solution, SAXS experiments were performed. The measured intensities I(q) are displayed as a function of the modulus, q, of the scattering vector. Structural parameters calculated from the scattering intensities are given in Supplementary Table S4. The radius of gyration (Rg) and maximum interatomic distance (Dmax) were determined to be 21.6 and 70 Å for MtMce1A 36-148 and 21.7 and 80 Å for MtMce4A 39-140, respectively [Supplementary Figs. S13(c) and S13(d)]. Interestingly, the determined Dmax for both MtMce1A 36-148 and MtMce4A 39-140 is much higher than the maximum diameter of monomeric EcMlaD (35 Å), pointing towards an elongated structure for both of the proteins. Further, the ab initio molecular shapes reconstructed by DAMMIN indicate that both MtMce1A 36-148 and MtMce4A 39-140 attain an elongated shape under the purified conditions [Figs. 4(a) and 4(b)]. From SEC-MALS and SAXS, we know that the proteins exist as monomers in solution. Therefore, the ab initio shape of MtMce4A 39-140 was fitted with two types of MtMce4A 39-140 monomer: a compact monomer consisting of residues 32-106 from chain A and 107-145 from chain B of the crystal structure, and an elongated monomer consisting only of chain A as observed in the crystal structure. The missing N-terminal tag and linker sequences were modeled in these molecules using Robetta, as explained in Section 2. The χ² values of the compact and elongated models calculated against the experimental SAXS data were 10.0 and 2.0, respectively [Fig. 4(b)]. Similarly, in the case of MtMce1A 36-148, a template-based model (obtained from Robetta) was used to fit the SAXS data, and the χ² values for the compact and elongated models were 14.0 and 11.0, respectively [Fig. 4(a)], here also slightly favoring the elongated model.
Further, the domain-swapped region of MtMce1A 36-148 was optimized by rigid-body refinement and this improved the χ² to 4.2. In summary, the elongated models fit better than the compact models in both cases (Fig. 4). Taken together, these SAXS studies suggest that both MtMce1A 36-148 and MtMce4A 39-140 are in an elongated conformation in solution under the purified conditions, and the presented elongated models derived from the crystal structure in Fig. 4 represent one of the possible elongated conformations in solution. Nevertheless, in the crystals the MCE fold is still conserved despite its domain-swapped dimer conformation.
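The χ² values used above to rank the compact and elongated models follow the usual goodness-of-fit idea of comparing a model curve with the measured intensities, weighted by the experimental errors. The sketch below shows one generic way to compute a reduced χ² with a fitted scale factor; it is not the in-house program used in this work, and the function name and the toy numbers are hypothetical.

```python
def reduced_chi2(i_exp, sigma, i_model, n_params=1):
    """Reduced chi-square of a model curve after least-squares scaling.

    i_exp, sigma and i_model are equal-length sequences of measured
    intensities, their standard deviations and model intensities.
    n_params counts fitted parameters (here only the scale factor).
    """
    # Least-squares optimal scale factor between model and data.
    num = sum(e * m / s ** 2 for e, m, s in zip(i_exp, i_model, sigma))
    den = sum(m * m / s ** 2 for m, s in zip(i_model, sigma))
    scale = num / den
    chi2 = sum(((e - scale * m) / s) ** 2
               for e, m, s in zip(i_exp, i_model, sigma))
    return chi2 / (len(i_exp) - n_params)


# Toy data (invented): a model proportional to the data fits perfectly,
# whereas a flat model does not.
data = [10.0, 8.0, 5.0]
errors = [0.5, 0.5, 0.5]
print(reduced_chi2(data, errors, [5.0, 4.0, 2.5]))
print(reduced_chi2(data, errors, [1.0, 1.0, 1.0]))
```

Values close to 1 indicate agreement within the experimental errors, so a drop such as that from 10.0 to 2.0 reported for MtMce4A 39-140 represents a substantial improvement in the fit.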

3.6. Comparison of the MtMce4A 39-140 structure with the E. coli and A. baumannii homologs
Recently, structures of homologs of Mce SBPs from E. coli (EcMlaD, EcPqiB and EcLetB) and A. baumannii (AbMlaD) have been determined (Tang et al., 2021; Ekiert et al., 2017; Coudray et al., 2020; Kamischke et al., 2019) [Fig. 1(b)]. Based on these homohexameric structures, two different mechanisms of lipid transport have been reported. [...] of seven homohexameric MCE domains one above the other connecting the inner and outer membranes, with a central channel mediating lipid transport (Ekiert et al., 2017; Liu et al., 2020). Like LetB, PqiB also forms a central pore that is formed by three stacked Mce homohexamers, with their long C-terminal helix forming a narrow channel for lipid transport.
In comparison to the homologs from E. coli (EcMlaD, EcPqiB and EcLetB) and A. baumannii MlaD (AbMlaD) (Ekiert et al., 2017; Coudray et al., 2020; Isom et al., 2020; Kamischke et al., 2019; Liu et al., 2020; Mann et al., 2020), the overall MCE fold with a seven-stranded β-barrel is conserved in the domain-swapped dimer of MtMce4A 39-140. Notably, part of the MCE fold in MtMce4A 39-140 is completed by domain swapping. Therefore, we used the compact monomer for structural analysis and comparison. The compact monomer is formed by residues 32-106 of chain A and residues 107-145 of chain B, whereas the model of the elongated monomer is formed by residues 32-145 of chain A.
Superposition of the Cα atoms of the MCE domain of MtMce4A on EcMlaD and AbMlaD yields root-mean-square deviations (r.m.s.d.s) of 1.7 and 2.6 Å, respectively [Fig. 5(a)]. The sequence identity between the MtMCE domain and the E. coli and A. baumannii MCE Mla domains is lower than 15% (Supplementary Fig. S11). The overall topology of the protein is conserved, with conformational differences mainly in the [...] domains and the EcPqiB1-3 MCE domains ranges from 7 to 18% and that between the MtMCE domains and the EcLetB1-7 domains ranges from 13 to 26% (Supplementary Fig. S11). The superposition showed that the β-barrel fold is conserved and the observed differences are mainly in the loop regions. For example, β2a (52-54) is unique to MtMce4A 39-140 and is absent throughout in EcPqiB1-3 and EcLetB1-7. The β3-β4 loop conformation present on the exterior surface varies amongst MtMce4A 39-140, EcPqiB1-3 and EcLetB1-7. It is notable that the length of the β3-β4 loop remains constant (four residues) in all of the MCE domains except EcPqiB3, which has 18 residues in the loop. Furthermore, the β6-β7 loop in MtMce4A 39-140 has a different conformation compared with EcPqiB1-3 and EcLetB1-7. Amongst the available Mce SBP structures and MceA-F from MtMce1-4, MtMce4A 39-140 has the highest number of proline residues in the β6-β7 loop. The role of this proline-rich loop is not understood.
The β5 strand and the hydrophobic β5-β6 loop (also referred to as the pore-lining loop; PLL) involved in forming the hydrophobic central pore have a different conformation in MtMce4A 39-140, which contrasts with EcMlaD and AbMlaD. The PLL (β5-β6 loop) comprising the hydrophobic channel is much longer (16-27 residues) in EcPqiB1-2 and EcLetB1-7 when compared with MtMce4A 39-140 (five residues). We found that the PLL in EcPqiB3 has only seven residues and it is the only MCE domain in EcPqiB and EcLetB which shares this feature with MtMce4A 39-140. Interestingly, the conformation of the PLL varies throughout the MCE domains of EcPqiB1-3 and EcLetB1-7 (Supplementary Fig. S12). The central pore of all of the reported Mce SBP hexamers is lined by the highly hydrophobic residues of the PLL, which allows the transport of small hydrophobic lipid molecules across the membranes. The variation in the length of the PLL depends on the transport mechanism followed by the particular Mce complex as well as the number of MCE domains that are present. For example, EcMlaD and AbMlaD have a smaller PLL (six residues) and they have a single MCE domain and follow a ferry-based transport mechanism. In comparison, the PLL is longer in EcPqiB and EcLetB (17-27 residues), which have three and seven MCE domains and follow a tunnel-based lipid-transport mechanism. It has been reported that the PLL of EcPqiB1-3 and EcLetB1-7 follows the pattern ΦxxΦΦ, where Φ denotes a hydrophobic amino acid and x represents any amino acid. Although this pattern is followed in MtMce1A (¹¹²ATTVF¹¹⁶), it does not align with the other Mce SBPs from Mtb (Supplementary Fig. S11). Instead, the other Mtb Mce SBPs follow the pattern xxxΦΦ in the PLL, which aligns with EcMlaD and AbMlaD. Notably, the Mtb Mce SBPs and the MlaDs have only one MCE domain. Nevertheless, this conserved 'duo' of consecutive hydrophobic residues in the MtMce1A-F and MtMce4A-F SBPs indicates the formation of a hydrophobic pore.
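Taking the PLL patterns discussed above as ΦxxΦΦ and xxxΦΦ (Φ, hydrophobic residue; x, any residue), a simple motif scan can be sketched as follows. The set of residues treated as hydrophobic is an assumption made for this illustration and is not defined in the text; the function names and the use of 'h' in place of Φ are likewise hypothetical. The only sequence used is the MtMce1A segment ATTVF quoted in the text.

```python
import re

# Assumed hydrophobic residue set (one-letter codes); not specified in the text.
HYDROPHOBIC = "AVILMFWY"


def motif_to_regex(motif):
    """Translate a motif such as 'hxxhh' (h = hydrophobic, x = any) into a regex."""
    parts = {"h": f"[{HYDROPHOBIC}]", "x": "[A-Z]"}
    return "".join(parts[symbol] for symbol in motif)


def find_motif(sequence, motif):
    """Return the 1-based start positions of all (overlapping) motif matches."""
    pattern = re.compile(f"(?=({motif_to_regex(motif)}))")
    return [match.start() + 1 for match in pattern.finditer(sequence)]


print(find_motif("ATTVF", "hxxhh"))  # MtMce1A residues 112-116 from the text
print(find_motif("ATTVF", "xxxhh"))
```

Because x matches any residue, every ΦxxΦΦ hit is also an xxxΦΦ hit; distinguishing the two patterns therefore depends on whether the first position is hydrophobic in the aligned sequences.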
In addition, the helical domain of the MtMceA-F SBPs also has a high number of hydrophobic residues, although a clear 'motif' is not observed. The monomeric nature of MtMce4A 39-140 is in contrast to the other Mce proteins (MlaD, PqiB and LetB) from E. coli and A. baumannii, which form a homohexamer (Ekiert et al., 2017; Kamischke et al., 2019; Coudray et al., 2020; Isom et al., 2020; Liu et al., 2020; Mann et al., 2020; Tang et al., 2021). Based on EcMlaD, we modeled a hypothetical homohexamer of MtMce4A 39-140 [Fig. 5(b)] by superposing the MtMce4A 39-140 monomer onto each of the six EcMlaD monomers. Interestingly, the protein-protein interface of the modeled homohexamer of MtMce4A 39-140 has multiple steric clashes, which would preclude the formation of homohexamers in MtMce4A [the clashes between chains A and B are shown in Fig. 5(b)]. These clashes are absent in EcMlaD, AbMlaD, EcPqiB1-3 and EcLetB1-7, where homohexamers are formed. Overall, these comparisons show the different properties of MtMce4A 39-140, although the core MCE fold is well conserved.
[...] Tables S5 and S6). The elution profile has two peaks: one for the PDCs and one for the empty micelles, showing that the PDCs and empty micelles are separated during SEC. The SAXS data of the PDCs display a minimum at a scattering-vector modulus of 0.1 Å⁻¹ followed by a broad bump. This suggests that the protein is interacting with a nearly intact detergent micelle. However, the scattering from the PDCs is distinctly different from that of empty micelles owing to the additional strong scattering from the protein in the PDCs. This makes the forward scattering much greater for the PDCs (Supplementary Fig. S14). The empty micelle has a deeper minimum compared with the minima of the PDCs around q = 0.1 Å⁻¹. Additionally, the detergent is organized differently in the PDCs and therefore the shape of the secondary maximum also differs between PDCs and empty micelles (Supplementary Fig. S14; Kaspersen et al., 2017; Pedersen et al., 2020). The scattering contribution of the detergent micelle interferes with the protein scattering, and detergent scattering cannot be separated from protein scattering in PDCs. Therefore, instead of calculating ab initio shapes for the protein, both the detergent micelle and the protein parts were modeled, and the SAXS data for the MtMce1A and MtMce4A constructs in complex with DDM were modeled using in-house-developed software (Vilstrup et al., 2020).
The crystallographic data for the Mtb MCE domain suggest that the Mtb SBPs cannot assemble as a homohexameric complex. In order to obtain greater insight into the possible structural properties, models of MtMce1A and MtMce4A were generated using I-TASSER. In both the MtMce1A and MtMce4A models, the extended helix of the helical domain turns back at residues Glu248 and Asp215, respectively, to form a coiled-coil structure, where the coiled-coil helices are held together by hydrophobic interactions. This brings the tail domain close to the MCE domain. This model is referred to as the 'coiled-coil model' [Figs. 6(a) and 6(b)]. In addition, a second variation of this model was generated by opening the helical domain to form an extended helix, keeping the tail domain far away from the MCE domain. This model is referred to as the 'extended helical model' [Figs. 7(a) and 7(b)].
From our experimental data, it is clear that the MCE domain is soluble and that constructs containing the helical domain require detergent for purification. Therefore, the detergent micelle must interact with the helical domain. Although the helical domain has a high number of hydrophobic amino acids, it also has polar residues, which preclude the possibility of the helix being completely inserted into the micelle, as for a typical transmembrane protein. Furthermore, calculations of the SAXS intensity with the helix inserted into the core show that these models smear out the minimum, so that it is not as deep as observed in the data. Therefore, a core-shell model of the detergent micelle was used, where the core represents the hydrophobic tails (dodecyl chains) of the detergent molecules and the shell represents the polar head groups and the water molecules associated with them (Kaspersen et al., 2014). The core-shell model of the detergent micelle is represented by Monte Carlo points acting as placeholders for the electron-density difference. On testing multiple micelle shapes using Monte Carlo points, the best fit was obtained when using a super-ellipsoid shape with the long axis along the helical domain, which maximizes the interaction of the protein helix with the core of the micelle. The micelle size (aggregation number) was initially estimated from SEC-MALS and SAXS scattering analysis (Kaspersen et al., 2014) to be in the range in [...] were not satisfactory it was further varied in a reasonable range to obtain the best fit to the SAXS data.
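The idea of representing a core-shell micelle by Monte Carlo points can be illustrated with a short sketch. This is not the in-house software referred to above. It assumes a one-exponent super-ellipsoid, |x/a|^p + |y/b|^p + |z/c|^p <= 1, approximates the shell by enlarging each semi-axis by a constant thickness, and uses hypothetical dimensions; the function names are likewise introduced here for illustration only.

```python
import random


def inside_superellipsoid(point, semi_axes, exponent):
    """True if the point lies inside |x/a|^p + |y/b|^p + |z/c|^p <= 1."""
    return sum(abs(coord / axis) ** exponent
               for coord, axis in zip(point, semi_axes)) <= 1.0


def sample_core_shell(n_points, core_axes, shell_thickness,
                      exponent=3.0, seed=0):
    """Fill a core-shell super-ellipsoid with uniformly distributed points.

    Points are drawn in the bounding box of the outer body and rejected if
    they fall outside it (rejection sampling). Accepted points are labelled
    'core' (hydrophobic tails) or 'shell' (head groups plus hydration water).
    Assumption: the outer body has semi-axes core + shell_thickness, which
    only approximates a shell of constant thickness.
    """
    rng = random.Random(seed)
    outer_axes = tuple(axis + shell_thickness for axis in core_axes)
    core, shell = [], []
    while len(core) + len(shell) < n_points:
        point = tuple(rng.uniform(-axis, axis) for axis in outer_axes)
        if not inside_superellipsoid(point, outer_axes, exponent):
            continue
        if inside_superellipsoid(point, core_axes, exponent):
            core.append(point)
        else:
            shell.append(point)
    return core, shell


if __name__ == "__main__":
    # Hypothetical dimensions in Å, with the long axis along z (taken here
    # as the direction of the helical domain).
    core_pts, shell_pts = sample_core_shell(2000, (15.0, 15.0, 30.0), 6.0)
    print(len(core_pts), len(shell_pts))
```

In a full calculation each point would carry the electron-density contrast of its region, and the scattering intensity would be computed from the combined point set of micelle and protein.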
With these assumptions, both the coiled-coil and the extended helical models for each of the MtMce1A and MtMce4A constructs were optimized (ten independent runs) together with a micelle with an appropriate aggregation number to fit the SAXS data. For MtMce1A 38-325 as well as MtMce4A 39-320 (MCE+Helical domains), both the coiled-coil and extended models showed convincing fits, with χ² values ranging from 3 to 20 (Supplementary Figs. S15 and S16).
In the case of MtMce1A 38-454 (MCE+Helical+Tail domains) the extended helical model [Fig. 7(a)] has a χ² range of 15-24, compared with 22-50 for the coiled-coil model. The extended model fits the data well in the full q range, with a small deviation around the minimum, where the model curve is not quite low enough [Fig. 7(c)]. The coiled-coil model is too small, with some deviations at low q, and the optimization compensates partly by displacing the MCE domain away from the Helical+Tail domain, leading to some disconnectivity in the structure [Fig. 6(a)]. In the case of MtMce4A 36-400 the data are not fitted well at high q values for both models, although the low-q data fit better to the extended model [Fig. 7(d)]. Similar to the MtMce1A 38-454 coiled-coil model [Fig. 6(a)], the MCE domain and the Helical+Tail domain also become disconnected in the MtMce4A 36-400 coiled-coil model [Fig. 6(b)].
The Helical+Tail domain fits for the MtMce4A 121-400 extended and coiled-coil models have similar χ² values in the range 4.6-8.0 (Supplementary Fig. S17). Both of these models have less deep minima than the data. The MtMce1A 126-454 extended model fit has a χ² range of 40-75, whereas the coiled-coil model fit shows a χ² of between 114 and 182. Similar to the MtMce4A 121-400 models, the minima are also less deep in the MtMce1A 126-454 models (Supplementary Fig. S18). We have to accept that the tail domain is unstructured, with greater uncertainty in the structure prediction. This could also be a reason for the poorer SAXS fits for all of the constructs with the tail domain. The counting statistics of the data vary somewhat between samples and therefore the χ² values also vary; it is observed that the χ² values are often higher for data with good counting statistics. Therefore, we decided to also calculate R factors and weighted R factors, as used in crystallography. R factors are dominated by the high intensities at low q, whereas weighted R factors are a normalized measure of (χ²)^0.5. The determined values are both in the range 1-5% (Supplementary Tables S5 and S6). They reveal that the deviation between data and fits is lower than the χ² values suggest.
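The relationship between these fit measures can be made concrete with a small sketch. The exact definitions used for Supplementary Tables S5 and S6 are not given in this section, so the formulas below are assumptions based on common usage: reduced χ² = Σ[(I_obs - I_calc)/σ]² / (N - n_params), R = Σ|I_obs - I_calc| / Σ|I_obs| and wR = {Σ[(I_obs - I_calc)/σ]² / Σ(I_obs/σ)²}^0.5. The function name fit_statistics and the numbers in the example are hypothetical.

```python
import math


def fit_statistics(i_obs, i_calc, sigma, n_params=0):
    """Return (reduced chi-square, R factor, weighted R factor).

    Assumed definitions (not taken from the paper):
      reduced chi2 = sum(((Iobs - Icalc) / sigma)^2) / (N - n_params)
      R            = sum(|Iobs - Icalc|) / sum(|Iobs|)
      wR           = sqrt(sum(((Iobs - Icalc) / sigma)^2)
                          / sum((Iobs / sigma)^2))
    """
    n = len(i_obs)
    if not (n == len(i_calc) == len(sigma)):
        raise ValueError("all input lists must have the same length")
    dof = n - n_params
    if dof <= 0:
        raise ValueError("need more data points than fitted parameters")

    chi2_sum = sum(((o - c) / s) ** 2 for o, c, s in zip(i_obs, i_calc, sigma))
    reduced_chi2 = chi2_sum / dof
    r_factor = (sum(abs(o - c) for o, c in zip(i_obs, i_calc))
                / sum(abs(o) for o in i_obs))
    weighted_r = math.sqrt(chi2_sum / sum((o / s) ** 2
                                          for o, s in zip(i_obs, sigma)))
    return reduced_chi2, r_factor, weighted_r


if __name__ == "__main__":
    # Hypothetical intensities: every point deviates by exactly one sigma.
    chi2, r, wr = fit_statistics([100.0, 50.0, 10.0],
                                 [98.0, 51.0, 10.5],
                                 [2.0, 1.0, 0.5])
    print(chi2, r, wr)
```

With these definitions, halving every σ quadruples χ² but leaves both R and wR unchanged, which is consistent with the observation that χ² tends to be higher for data with good counting statistics even when the relative deviation between data and fit is small.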
The above analysis confirms that the detergent micelle interacts with the helical domain irrespective of whether the helix turns back (as in the coiled-coil model) or is extended (as in the extended helical model) in MtMce1A and MtMce4A. High-resolution information is needed to unambiguously conclude which of these two models is relevant under physiological conditions. Considering the low-resolution information in the SAXS data, as well as the possible errors in the generated MtMce1A and MtMce4A models, which are partly based on structural predictions, our analysis gives the best possible explanation for the observed SAXS data in a qualitative and semi-quantitative manner. The methods applied here for the analysis of MtMce1A and MtMce4A can be generalized to other membrane proteins as well as to membrane-associated proteins purified in the presence of detergents.

Concluding remarks
The challenges in purifying mycobacterial Mce proteins have hampered their study for many years. However, in this study recombinantly expressed and purified Mtb Mce1A-1F and Mce4A-4F SBPs have been characterized. Each of the SBPs was individually expressed and purified from E. coli. Further, we have classified the Mtb Mce1A-1F and Mce4A-4F SBPs into four different domains based on secondary-structure prediction. The domain characterization shows the presence of a unique tail domain in the SBPs from Mtb that is not present in the other characterized homologs [Fig. 1(a)]. The predicted length of the tail domain varies from 34 residues to 218 residues in the MtMce1A-1F and MtMce4A-4F SBPs. Further characterization shows that the full length as well as all of the domains of MtMce1A and MtMce4A remain as monomers in solution when purified individually. Only the MCE domain is soluble in the absence of detergents. The MCE domains of MtMce1A and MtMce4A occur as monomers in solution, as also shown by mass spectrometry. The crystal structure of the Mtb MCE domain reveals a β-barrel fold, as also found for its homologs, despite very low sequence identity (15% or less). The MCE+Helical and the Helical+Tail domain constructs require detergents for solubility. Further, SAXS analysis of MtMce1A, MtMce4A and their domains suggests that the helical domain may adopt the 'coiled-coil' or 'extended helical' conformation. In the coiled-coil model the MCE and tail domains are near each other, whereas the MCE and tail domains are far away from each other in the extended helical model. Irrespective of the conformation of the helical domain, it is very clear that the helical domain requires detergent for its stability and is either involved in interaction with the lipid substrates or embedded in the membrane. Structural analysis of MtMce4A 39-140 suggests that the homohexamer cannot be formed, at least in Mce4A, due to multiple steric clashes.
The fact that there are six Mce SBPs in Mtb suggests that the six MceA-F SBPs may interact with each other to form heterohexamers, where the helical domains of the six MceA-F molecules may form a channel as observed in EcPqiB (Ekiert et al., 2017), but in Mtb this channel will be more extended. The resulting heterohexameric arrangement would therefore favor a tunnel-based mechanism for lipid transport. The presence of a single MCE domain, a longer helical domain and an additional tail domain will make the overall architecture of the