Structure of the N-terminal domain of the protein Expansion: an ‘Expansion’ to the Smad MH2 fold

Expansion is a modular protein that is conserved in protostomes. The first structure of the N-terminal domain of Expansion has been determined at 1.6 Å resolution and the new Nα-MH2 domain was found to belong to the Smad/FHA superfamily of structures.


Introduction
The Drosophila transcription factor Tramtrack (Ttk) is involved in a wide range of processes during development of the tracheal system. The analysis of gene-expression changes in Drosophila embryos after inducing Ttk loss of function and gain of function enabled the identification of the Expansion gene (CG13188; Exp). A search for proteins similar to Expansion led to the identification of CG13183 (recently renamed Rebuf; Reb; Rotstein et al., 2011). These two proteins share 56% amino-acid similarity, which is concentrated in the N-terminal part of the sequence. Expansion and Rebuf (Exp/Reb) proteins have been identified in several Drosophila species, other insects and arthropods (Iordanou et al., 2014). They are annotated in the NCBI database as modular proteins containing an N-terminal MH2 domain and a variable C-terminal region that shows no sequence similarity to other characterized domains. The sequence similarity to the MH2 domain is remarkable, since these domains were believed to be present exclusively in Smad proteins, which are the main players in the TGF-β signalling pathway in metazoans (Massagué, 2012). Smad proteins comprise two conserved domains separated by a linker that does not adopt a defined tertiary structure. The N-terminal (MH1) domain binds to DNA sites in promoters, while the linker and the C-terminal (MH2) domain are the protein-interaction sites. As mediators and regulators of cytokine signalling, Smads are involved in many cellular processes from cell homeostasis to differentiation, division and cell death (Massagué et al., 2005).
The presence of an MH2 domain in Exp/Reb proteins has been used as a hallmark to classify them as Smad-like proteins (Iordanou et al., 2014). However, the sequence identity of the Exp/Reb proteins to the Smads is very low and is restricted to the MH2 domain. Furthermore, the differences are not only at the sequence level but also in the localization of the domain: in the N-terminal part in Exp/Reb in contrast to a C-terminal position in Smads. Secondary-structure predictions indicate that the Exp/Reb MH2 domain might contain additional elements of secondary structure preceding the MH2 fold. All of these characteristics suggest that Exp/Reb might constitute a new family of proteins that share the presence of a divergent MH2 domain with the Smads.
To clarify this issue, and prompted by the similarities and differences between the Exp/Reb and Smad proteins, we set out to investigate the presence of the Exp/Reb proteins in metazoans and to characterize the structure of this new MH2 domain. Our results reveal that Exp/Reb proteins are restricted to protostomes, whereas Smads are highly conserved in both protostomes and deuterostomes. The crystal structure of this domain has been determined at 1.6 Å resolution and represents the first structure of an MH2 domain to be defined outside the Smad family of proteins. Although the structure displays the main features of the MH2 fold, it contains an additional α-helical region that covers the concave site of the MH2 domain and defines the specific structure of the Exp/Reb MH2 domain. Based on this observation, we refer to the Exp/Reb MH2 domain as an Nα-MH2 domain, a new member of the FHA/Smad superfamily of MH2 domains.
A characteristic of activated Smad proteins is the formation of quaternary structures through interactions of their MH2 domains (Shi & Massagué, 2003). Even if Exp/Reb proteins are different from Smads, the presence of the Nα-MH2 domain led us to hypothesize that Exp/Reb proteins could perhaps also modulate TGF-β signalling in protostomes through the formation of heterotrimers using the Nα-MH2 domain as a binding partner for Smads. The functional and structural studies presented here support the nonparticipation of Expansion in the canonical TGF-β signalling pathway. Analysis of the Nα-MH2 structure and its comparison with those of Smad proteins provides the basis for understanding the binding differences. Our data are in agreement with the results reported in the literature indicating that Exp/Reb proteins are required for specific activities in protostomes, regulating receptor tyrosine kinase signalling to control terminal branch size and morphology (Iordanou et al., 2014).

Experimental procedures
2.1. Sequence alignment and secondary-structure prediction
Sequences corresponding to Expansion proteins were retrieved from the Ensembl Metazoa database (http://metazoa.ensembl.org) with PSI-BLAST (Altschul et al., 1990), using the D. melanogaster Expansion protein (CG13188) as the query. Multiple sequence alignments of the query and the target proteins were generated with MAFFT v.7.164b using the iterative L-INS-i method (parameters: BLOSUM62 scoring matrix and gap-opening penalty set to 1.5).
Conserved residues were highlighted using the BoxShade server v.3.21 (http://sourceforge.net/projects/boxshade/) written by K. Hofmann and M. Baron. A graphical representation of secondary structure was added to the alignment using the ESPript server (Robert & Gouet, 2014).
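The alignment settings described above could be expressed as a MAFFT command line roughly as follows. This is a minimal sketch, not the exact invocation used for the paper. The input file name is hypothetical. The sketch assumes, following the MAFFT manual, that L-INS-i corresponds to `--localpair --maxiterate 1000`, that the scoring matrix is selected with `--bl 62` and that the gap-opening penalty is set with `--op 1.5`.

```python
import shlex


def build_mafft_command(fasta_in, gap_open=1.5, blosum=62):
    """Return an argument list for an L-INS-i style MAFFT run.

    fasta_in is a hypothetical path to the unaligned sequences.
    """
    return [
        "mafft",
        "--localpair",            # local pairwise alignments (L-INS-i)
        "--maxiterate", "1000",   # iterative refinement cycles
        "--bl", str(blosum),      # BLOSUM matrix number
        "--op", str(gap_open),    # gap-opening penalty
        fasta_in,
    ]


# Hypothetical input file; MAFFT writes the alignment to standard output.
command = build_mafft_command("expansion_homologues.fasta")
print(shlex.join(command))
```

The command is only assembled and printed here; to run it, the list could be passed to `subprocess.run` with standard output redirected to a file.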

Cloning of the Expansion Nα-MH2 domain
Constructs for the Expansion domain were cloned into the pETM-11 expression vector (EMBL) by means of ligation-independent cloning. The initial construct used for NMR screening and preliminary crystallization trials consisted of a fragment comprising residues 29-240. We also prepared a second construct including an N-terminal extension (residues 3-240), which was used for structural studies.
Inserts were obtained by PCR using the appropriate primers and the D. melanogaster Expansion cDNA (CG13188, isoform B) as a template. PCR amplification was carried out using the standard PCR Master Mix with 0.02 unit µl⁻¹ Taq DNA polymerase (Thermo Scientific), 1 ng µl⁻¹ template DNA and 0.5 µM of each primer, using an annealing temperature of 321 K for the first ten rounds and 341 K for the last 30 rounds. After subsequent purification of the reaction product using the GeneJET PCR purification kit (Fermentas), ligation-independent cloning was performed in the presence of 6 µg ml⁻¹ RecA DNA recombinase in the recommended buffer (New England Biolabs) at 310 K for 15 min. The full recombination reaction was then used for transformation in E. coli DH5α and positive clones were selected on kanamycin agar plates (50 µg ml⁻¹). Positive clones were verified by DNA sequencing and subsequently transformed into the expression strain E. coli BL21 (DE3) Rosetta (Invitrogen), selecting positive clones on kanamycin (50 µg ml⁻¹) + chloramphenicol (34 µg ml⁻¹) agar plates.

Protein expression and purification
Cultures were grown at 310 K until an OD of ~0.6 was reached; the temperature was then lowered to 293 K and protein expression was induced by the addition of 1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG). Expression took place for 12-15 h and the cells were harvested by centrifugation (4000g, 15 min). Labelled samples for NMR were prepared in a similar manner using minimal medium (M9) enriched with ¹⁵NH₄Cl.
The cells were washed in TBS buffer and 5 g of cells were resuspended at 277 K in 10 ml buffer consisting of 50 mM Tris-HCl pH 7.2, 500 mM NaCl, 1 mM phenylmethylsulfonylfluoride (PMSF), 0.1 mM β-mercaptoethanol (BME), 0.1 mM EDTA, supplemented with 0.1 mg ml⁻¹ lysozyme, 0.25 mg ml⁻¹ DNaseI, 25 mM MgCl₂ and 0.01 mg ml⁻¹ RNaseA for lysis. Lysis was performed in a pressurized cell homogenizer at 277 K and the crude solution was incubated for 20 min on ice. The soluble fraction was then isolated by centrifugation (45 000g, 20 min, 277 K).
Immobilized metal-ion affinity chromatography (IMAC) was performed on an ÄKTA FPLC system at room temperature using a prepacked 1 ml His-tag column (GE Healthcare). The domain of interest was purified using buffer consisting of 50 mM Tris-HCl pH 7.2, 500 mM NaCl, 0.1 mM BME, 0.1 mM EDTA and eluted with a gradient to the same buffer freshly supplemented with 500 mM imidazole. Peak fractions were isolated and diluted with buffer (50 mM Tris-HCl pH 7.2, 200 mM NaCl, 0.1 mM BME) to lower the salt and imidazole concentrations for enzymatic digestion. Subsequently, the N-terminal 6×His tag was removed by overnight digestion at 277 K with Tobacco etch virus (TEV) protease and the digested protein was repurified by IMAC as described above. Size-exclusion chromatography (SEC) was performed on a Superdex 200 10/300 column using buffer consisting of 50 mM Tris-HCl pH 7.2, 200 mM NaCl, 0.1 mM BME, and the peak fractions were collected and concentrated to ~10 mg ml⁻¹ using centrifugal filters (Amicon).
For overexpression experiments, we used the Gal4/UAS system (Brand & Perrimon, 1993). We used the breathlessGal4 (btlGal4) driver, which drives expression in all tracheal cells, and the nubbinGal4 (nubGal4) driver, which drives expression in the wing disc. Crosses were kept at 29 °C to maximize the expression of the transgenes.
To visualize the 'tracheal pattern', embryos carrying btlGal4 UAS-srcGFP (cell membrane staining) were stained for GFP. The btlGal4 driver in this combination also drives the other UAS transgenes.
Confocal images were acquired with a Leica TCS-SPE system. Images were post-processed with ImageJ and Adobe Photoshop and assembled using Adobe Illustrator.

NMR experiments
An HSQC (heteronuclear single-quantum coherence) NMR experiment (eight scans and 128 increments in the indirect dimension) was recorded at 298 K on a Bruker Avance III 600 MHz spectrometer equipped with a z-pulse field gradient unit and a triple (¹H, ¹³C, ¹⁵N) resonance probe head. The protein sample (0.7 mM) was equilibrated in 20 mM deuterated Tris-HCl, 150 mM NaCl buffer with 10% D₂O and the pH adjusted to 7.5. The data were processed using the XwinNMR 3.5 software supplied with the Bruker NMR spectrometer.

Crystallization
Initial crystallization conditions were identified from sparse-matrix screens. A series of gradient screens optimized the final condition and three-dimensional diffraction-quality crystals were finally grown in sitting drops at 293 K from 12%(v/v) 1,2-propanediol, 9%(w/v) PEG 20 000, 0.1 M glycine pH 9. The short protein construct crystallized in similar conditions.

Data collection
Data were collected at 100 K from a monoclinic crystal using a PILATUS 6M detector on BL13-XALOC at the ALBA Synchrotron Light Source, Barcelona, Spain. 360 images were collected at a wavelength of 0.97949 Å with an exposure time of 1 s and an oscillation of 1°. Data were processed in the XDS suite (Kabsch, 2010) and POINTLESS (Evans, 2006). Statistics of data collection and model building are presented in Table 1.
Table 1 footnotes: CC1/2 is the correlation between intensities from random half data sets (Karplus & Diederichs, 2012). Rfree is the cross-validation R factor computed for a test set of reflections (5%) which were omitted from the refinement process. R.m.s. deviations from ideal values are in accordance with Engh & Huber (2001). Additional footnotes cite Kabsch (2010) and Chen et al. (2010).

Phasing, model building and refinement
Structure solution and model refinement were performed with programs from the PHENIX suite (Adams et al., 2010). Initial molecular-replacement solutions were rebuilt using Rosetta model completion and relaxation in phenix.mr_rosetta. Automated rebuilding yielded the best results starting from Smad2 MH2 (PDB entry 1khx), which resulted in a model with a free R factor of 30%.
Refinement was performed with phenix.refine employing simulated-annealing and energy-minimization cycles. Initially, tight geometry restraints were applied, which were released gradually as the Rwork and Rfree factors diverged. After several rounds of iterative refinement and model building in OMIT maps (Terwilliger et al., 2008) and feature-enhanced maps (FEMs; Afonine et al., 2015) in Coot (Emsley & Cowtan, 2004), density representing the N-terminal helix of the molecule could easily be identified in a difference electron-density map (mFo − DFc). Iterative refinement and model building were continued until no significant improvement in R factors could be obtained. The refined model was validated using MolProbity (Chen et al., 2010) and the wwPDB Validation Server (http://wwpdb-validation.wwpdb.org; Berman et al., 2003). The Ramachandran plot is shown in Supplementary Fig. S2.

Structural analysis
Structural superimposition was performed with the superpose algorithm in PyMOL (Schrödinger) based on the β-strand content of each structure as calculated by DSSP (Kabsch & Sander, 1983). Sequence alignments with the EMBOSS Needle algorithm returned values for sequence identity and similarity (Rice et al., 2000).
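To illustrate what the identity and similarity percentages mean, the sketch below computes both from a pair of already aligned sequences. It is not EMBOSS Needle itself. Needle counts a position as similar when the substitution-matrix score is positive, whereas the residue groups used here are a hypothetical simplification. The choice of the full alignment length (gap columns included) as the denominator is also an assumption.

```python
# Hypothetical groups of residues treated as "similar"; a substitution
# matrix such as BLOSUM62 would normally decide this instead.
SIMILAR_GROUPS = ("ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG")


def identity_and_similarity(aligned_a, aligned_b):
    """Return (% identity, % similarity) for two gapped, equal-length strings."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be aligned and non-empty")
    length = len(aligned_a)  # assumed denominator: all alignment columns
    identical = 0
    similar = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns count towards neither score
        if a == b:
            identical += 1
            similar += 1  # identical positions are also similar
        elif any(a in group and b in group for group in SIMILAR_GROUPS):
            similar += 1
    return 100.0 * identical / length, 100.0 * similar / length


# Toy alignment, for illustration only (not taken from the paper).
print(identity_and_similarity("ACDE-", "ACEEK"))
```

In the toy alignment, three of five columns are identical and one further column (D/E) falls in the same group, so similarity is higher than identity, as in the values reported for Exp/Reb and the Smads.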

Identification of a non-Smad MH2-like domain in protostomes
The sequence identity of the Exp/Reb Drosophila proteins (also known as CG13188 and CG13183, respectively) is mostly restricted to the MH2 domain (~30%). A comparison with Smads reveals a similarity of about 16%, which is limited to the MH2 domain. This similarity suggests that the Exp/Reb and Smad proteins share the presence of the MH2 domain and that the Exp/Reb proteins may constitute a new protein family with a specialized function in a subset of metazoan phyla.
To clarify this issue, we searched the Ensembl Metazoa database (http://metazoa.ensembl.org) using PSI-BLAST with the Expansion full-length sequence as the query. Our search retrieved two sets of matches. The first set reflects similarity to the entire query in nematodes, arthropods and other hexapods, suggesting the presence of orthologous proteins in these organisms. The second set reflects similarity of the N-terminal part of the query sequence to only the C-terminal part of Smads, specifically to the MH2 domain, as reported in the annotation of these proteins in the NCBI database. Remarkably, the similarity within the Exp/Reb subfamily covers an additional region preceding the canonical MH2 domain, which is absent in the Smad sequences. The alignment comparing the sequences of the Exp/Reb domains with those of the MH2 domain of Smads is shown in Fig. 1(a).
Secondary-structure predictions using NetSurfP (Petersen et al., 2009) corroborated the sequence similarity of the new domain to that of the Smad MH2 structure and indicated that the additional conserved region preceding the canonical MH2 fold might adopt a helical structure. Of the two predicted helices, the sequence that lies adjacent to the canonical MH2 domain is more conserved than the fragment predicted at the most N-terminal part of the protein. The prediction is depicted at a 0.4 level of probability in the sequence alignment (Fig. 1a). A schematic representation of the domain organization of the Exp/Reb proteins and the similarity to Smads is shown in Fig. 1(b). An alignment based on the full sequence of the Expansion proteins is shown in Supplementary Fig. S3.

Drosophila Exp/Reb do not participate in the TGF-β pathway
In addition to the structural work, we investigated the functional implication of this new family of proteins using cellular and genetic approaches in Drosophila. The TGF-β signalling pathway has been shown to play a key role in tracheal formation, specifying the most dorsal and ventral branches (Llimargas & Casanova, 1997; Ribeiro et al., 2002; Vincent et al., 1997). When the pathway is downregulated (by overexpressing the inhibitory Smad), the formation of dorsal and ventral tracheal branches is compromised. However, when the pathway is constitutively activated (by overexpressing a constitutively active Thick Veins receptor; TkvCA) all branches migrate along the dorso-ventral axis (Supplementary Figs. S4a, S4b and S4c). We have observed that the pattern of dorsal and ventral branches was correct not only when Exp/Reb were downregulated using RNAi (Supplementary Figs. S4d and S4e) but also when both genes were removed using a combination of chromosome deficiencies that uncovers both of them (Supplementary Fig. S4f). In fact, we detected a completely different phenotype in loss-of-function conditions for these genes.
To further test any possible involvement of Exp/Reb in the TGF-β pathway, we also analyzed the phenotypes of their downregulation in the wing (also obtaining a negative effect) and compared them with the defects in the downregulation of the control Smad4/Medea. As depicted in Supplementary Figs. S4(g), S4(h) and S4(i), the detected phenotypes of Exp/Reb downregulation are very different from those of the control, confirming that Exp/Reb do not appreciably transduce TGF-β signals. Our results are consistent with recently reported experiments (Iordanou et al., 2014) and suggest that this new family of proteins is involved in functional pathways different from the canonical TGF-β pathway.

Overall, these results support our hypothesis that the Exp/Reb proteins define a new family of proteins specific for protostomes that have the MH2 domain in common with Smads.

Protein expression, NMR and crystallographic screening.
Based on the sequence conservation and on the secondary-structure predictions, we selected two different domain boundaries for structural studies: a construct including the most conserved predicted helix (amino acids 29-240) and a second construct comprising nearly the full N-terminal region of the protein (amino acids 3-240) (Fig. 1b). These two recombinant proteins were soluble and eluted as monomers in size-exclusion chromatography (Supplementary Fig. S5). According to NMR experiments, the construct consisting of amino acids 29-240 was folded (Supplementary Fig. S6), and the initial crystallographic results were obtained using this construct. Since diffraction-quality crystals were also obtained from the larger construct, we focused the structural work on this construct consisting of amino acids 3-240. This would allow us to elucidate the role of the entire N-terminal region.
Diffraction data were collected from a monoclinic wedge-shaped crystal with approximate dimensions of 10 × 30 × 150 µm. The data were processed in space group P2₁ to a maximum resolution of 1.6 Å. Matthews coefficient analysis indicated a solvent content of ~27% for one protomer (~28 kDa) in the asymmetric unit, which was consistent with the tight packing observed in the structure.
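The relationship behind the Matthews analysis can be sketched as follows. The Matthews coefficient V_M is the unit-cell volume divided by the total protein mass in the cell, and the solvent fraction is commonly estimated as 1 − 1.23/V_M. The constant 1.23 Å³ Da⁻¹ is the conventional value for globular proteins and is an assumption here, as are all function names. The unit-cell parameters belong to Table 1 and are not reproduced, so the example only inverts the reported solvent content.

```python
# Conventional constant relating V_M to the protein volume fraction
# (assumed value, in cubic angstroms per dalton).
PROTEIN_VOLUME_CONSTANT = 1.23


def matthews_coefficient(cell_volume, mol_weight, asu_per_cell, copies_per_asu=1):
    """V_M in cubic angstroms per dalton.

    cell_volume    : unit-cell volume in cubic angstroms
    mol_weight     : molecular weight of one protomer in daltons
    asu_per_cell   : asymmetric units per unit cell (2 for P2_1)
    copies_per_asu : protomers per asymmetric unit
    """
    return cell_volume / (mol_weight * asu_per_cell * copies_per_asu)


def solvent_fraction(v_m):
    """Estimated solvent fraction for a given Matthews coefficient."""
    return 1.0 - PROTEIN_VOLUME_CONSTANT / v_m


def v_m_from_solvent_fraction(fraction):
    """Matthews coefficient implied by a given solvent fraction."""
    return PROTEIN_VOLUME_CONSTANT / (1.0 - fraction)


# The reported solvent content of about 27% implies this V_M.
v_m = v_m_from_solvent_fraction(0.27)
print(f"V_M = {v_m:.2f} cubic angstroms per dalton")
```

Under these assumptions, a solvent content of about 27% corresponds to a V_M of roughly 1.7 Å³ Da⁻¹, near the low end of the range usually seen for protein crystals and in keeping with the tight packing noted above.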

Structural determination of the Nα-MH2 domain.
Owing to the anticipated structural resemblance of the new domain to the canonical Smad MH2 domain, we first attempted to solve the structure by conventional molecular replacement (MR) using Phaser (McCoy et al., 2007) with several human Smad MH2 domains as search models (   cher et al., 2004). Among these solutions, the best was that obtained using the human Smad3 MH2 domain (PDB entry 1mjs). The selected solution (using Phaser) reported values of LLG = 56.1, TFZ = 4.3 and Rval = 58.5 at 3 Å resolution, and allowed us to partially trace the map. However, we were unable to perform rigid-body refinement as we could not improve the R factors beyond Rwork = 0.5207 and Rfree = 0.5215. Similar values were obtained for other resolution ranges.
Since this approach was unsuccessful, we decided to apply the MR-Rosetta algorithm, which has recently been demonstrated to facilitate molecular replacement in cases where only search models of low sequence identity are available. Using the methods compiled in the MR-Rosetta pipeline (DiMaio et al., 2011; Terwilliger et al., 2012) from the PHENIX package (Adams et al., 2010) and an ensemble constructed from the known structures of canonical MH2 domains as a search model, a structural solution could be obtained. The ensemble and the final model are displayed in Supplementary Fig. S1. The data were cut following the recent recommendations by Karplus & Diederichs (2012) at a conservative resolution cutoff of CC1/2 > 0.5, which resulted in the data-collection and model-refinement statistics reported in Table 1. The high-quality diffraction data allowed the modelling of residues 27-236, with Rwork = 0.1765 and Rfree = 0.1977.
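The CC1/2 criterion used for the resolution cutoff can be illustrated with a short sketch. CC1/2 is the Pearson correlation between intensities from two random half data sets, evaluated per resolution shell. The data are then cut at the last shell for which the value stays above the chosen threshold. The function names and the shell values below are hypothetical and do not come from Table 1; in practice these statistics are reported by the data-processing software.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def resolution_cutoff(shells, threshold=0.5):
    """Return d_min of the last consecutive shell with CC1/2 above threshold.

    shells: (d_min in angstroms, CC1/2) pairs ordered from low to high
    resolution. Returns None if even the first shell fails the test.
    """
    cutoff = None
    for d_min, cc_half in shells:
        if cc_half <= threshold:
            break
        cutoff = d_min
    return cutoff


# Hypothetical shell statistics, for illustration only.
example_shells = [(3.0, 0.99), (2.0, 0.95), (1.7, 0.80), (1.6, 0.55), (1.5, 0.30)]
print(resolution_cutoff(example_shells))
```

With these made-up shells the cutoff falls at the 1.6 Å shell, since the next shell drops below 0.5.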
As expected, the refined model revealed a tertiary structure with a striking resemblance to the canonical Smad MH2 fold (Figs. 2a and 2b). The structure comprises a β-sandwich core of twisted antiparallel β-sheets capped at one end by a three-helix bundle and at the other by a region containing an α-helix and several loops. In Smad MH2 domains this region is commonly referred to as the 'loop-helix region' (Shi et al., 1997).
At first glance, the most obvious structural difference when superimposed on Smad2 MH2 domains is the presence of the Nα helix, which is formed by the N-terminal residues 34-47 and covers one side of the core β-sandwich (Figs. 2c and 2d). This new element of secondary structure is named Nα to indicate its position at the N-terminus of the domain and, most importantly, to maintain the canonical nomenclature of MH2 domains (Fig. 2c). A Cα trace of the Nα-MH2 domain and the electron-density map at contour levels of 1σ and 2σ are shown in Fig. 2(f). A few additional structural differences are observed in the length of the helices that form the helical bundle and in the area adjacent to the 'loop-helix region' comprising helix α2, a loop and strand β8, referred to here as the 'H2 region' (Fig. 2b, green rectangles). Also significant is the reduced length of the L3 loop that connects β10 and β11. This loop in Smad MH2 domains comprises an interaction motif for phosphorylated residues, and is fundamental to receptor binding and to the quaternary structure and function of the Smad proteins (Lo et al., 1998). In Smad domains loop L3 comprises 17 residues, whereas in Expansion this loop is shorter (11 residues) and is very different in sequence (Fig. 1a). To observe the effect of the differences in and around the L3 area, we have represented the surface charge distribution of the Expansion Nα-MH2 domain and that of Smad2 for comparison (Figs. 2d and 2e) and highlighted the presence of positively charged patches in the Smad2 MH2 domain that are absent in Expansion.
… adjacent monomers in the functional trimer of the Smad proteins. Previous results have suggested how cancer-derived mutations, which map to the same area in the Smad proteins, inhibit the formation of the functional heterotrimer (Shi et al., 1997). When superposing the Nα-MH2 domain (shown in blue in Figs. 3a, 3b and 3c) onto one monomer of the Smad2 MH2 homotrimer (shown in light grey), the Nα helix overlaps with the area occupied by the 'loop-helix region' of the adjacent monomer in the trimer (in the figure this second MH2 is shown in dark grey). This structural 'clash' most certainly compromises the formation of homotrimers by the Nα-MH2 domain in a manner similar to that observed for the Smad MH2 domains. Furthermore, it will also prevent the formation of heterotrimers with the MH2 domain of Smad proteins.
In Smad proteins, where the MH2 domain is located at the protein C-terminus (Fig. 1b), the linker preceding the MH2 domain does not adopt a defined tertiary structure. However, in several structures of Smad complexes segments of the linker have been observed to adopt different conformations when folding upon interaction with other proteins (Aragón et al., 2011, 2012; Shi & Massagué, 2003).
The secondary-structure predictions of the Expansion family of proteins suggested the presence of a second helix at the very N-terminus of the protein (Fig. 1a). However, no electron density could be observed for this predicted helix, likely reflecting a degree of flexibility in this area. Patches of positive electron density observed in a difference-density map (mFo − DFc) suggested that the region comprising the first 24 residues extends into a solvent channel of the crystal structure; however, efforts to improve the model in this region did not improve the refinement statistics and the model was therefore truncated at the N-terminus.
3.3.4. Protein interactions in MH2 domains.
The divergent sequence of loop L3 in the Nα-MH2 domain indicates a significant difference in the function of the Expansion Nα-MH2 domain with respect to canonical Smad MH2 domains. In the TGF-β pathway, signals are propagated through receptor activation of R-Smads by phosphorylation of the S-X-S motif at the C-terminus of the MH2 domain, leading to heterotrimer formation through the MH2 domains of two R-Smads with the common Smad4. Formation of the heterotrimer triggers the subsequent translocation of Smads to the nucleus. Mutations in the L3 loop of the Smad proteins abolish the formation of heterotrimers and hence signalling in the TGF-β pathway (Wu et al., 2001). Whereas unphosphorylated Smad proteins are able to form dimers and trimers in a concentration-dependent manner (Shi & Massagué, 2003), the presence of the additional helix and the lack of a canonical L3 loop in the Nα-MH2 domain could effectively prevent this domain from supporting the formation of similar dimers and trimers.
The Smad MH2 domain is commonly known to support protein interactions, and interestingly the overall composition of the Expansion Nα-MH2 domain remains largely the same despite the low degree of sequence identity. However, the H2 region that has been established as a protein-protein interaction site in Smad proteins (Qin et al., 1999; Wu et al., 2002) is completely different in the Expansion Nα-MH2 domain. In the refined model of the Nα-MH2 domain the β-sandwich core comprises two antiparallel β-sheets, each having five strands, whereas in Smad MH2 the upper sheet comprises six strands (Figs. 2a and 2b). In the Expansion Nα-MH2 domain the region of amino acids 162-165 that would correspond to the β8 strand in Smads lacks a defined secondary structure; it instead bears characteristics of random coil similar to most of the adjacent H2 region, apart from helix α2. In Smad proteins the H2 region extends from strand β7 in the upper β-sheet, with helix α2 followed by a short loop that connects to strand β8 continuing to strand β9. In the Nα-MH2 domain strand β7 is followed by a large loop connected to a reoriented helix α2, followed by an extended region (corresponding to strand β8) continuing to strand β9 (Figs. 2a and 2b).
In the Smad proteins the H2 region and the β8 strand are implicated in protein-protein interactions by β-sheet augmentation, similar to common PDZ domains (Cowburn, 1997; Doyle et al., 1996; Morais Cabral et al., 1996; Schultz et al., 1998), by annealing an additional β-strand to an existing β-sheet. This type of coordination is common in protein-protein interactions (Remaut & Waksman, 2006) and has been characterized structurally in three cases for Smad proteins. The structure of the isolated human Smad4 MH2 domain was solved from a construct with a protracted N-terminal boundary containing the first part of the linker region (for the 'common' Smad4 this is also known as the Smad-activation domain). As mentioned above, the structure revealed that part of the linker adopts an extended conformation and interacts with the H2 region by β-sheet augmentation (PDB entry 1dd1; Qin et al., 1999). Similarly, the structure of Smad4 in complex with the repressor protein c-Ski (PDB entry 1mr1) also shows that the interaction occurs through β-sheet augmentation of strand β8 (Wu et al., 2002). Moreover, the interaction between Smad2 and SARA (Smad anchor for receptor activation) has been characterized structurally and was also revealed to involve β-sheet augmentation, but not in the H2 region (PDB entry 1dev; Wu et al., 2000). It is possible that secondary-structural changes induced by ligand binding could stabilize the β8 structure. Indeed, NetSurfP predicts the presence of strand β8 in Expansion, indicating (to some extent) an intrinsic folding property of this area. Of interest in the remodelling of the H2 region of the Expansion Nα-MH2 domain as a potential protein-interaction site is the consideration of the degree of specificity that this remodelling might provide.
3.3.5. Structural classification in the Smad/FHA family.
Structural homology to the Smad MH2 domain has also been found in FHA domains and the C-terminal regulatory domain of IRF-3. Despite a complete absence of sequence conservation between these proteins, an evolutionary link has previously been suggested owing to the structural and distant functional similarities between the β-sandwiches in these proteins (Durocher et al., 2000; Huse et al., 2001; Takahasi et al., 2003). Indeed, these proteins have all been classified into the same SCOP superfamily: the Smad/FHA domain. The superimposed structures of the human MDC1 FHA domain, the C-terminal regulatory domain of IRF-3 and the Expansion Nα-MH2 domain are shown in Supplementary Fig. S7. Since the MH2 domain of Smads is similar to the IRF-3 regulatory domain and to the FHA domain, the Nα-MH2 domain of Expansion is also a member of the same family of structures.
IRF proteins are only found in vertebrates, whereas the Smad MH2 domain and the FHA domain co-exist in metazoans and the FHA domain is also found in prokaryotes. Durocher et al. (2000) found the minimal β-sandwich of the FHA domain to comprise alternative protein-interaction sites, similar to those identified in the Smad MH2 domain, and Takahasi et al. (2003) speculated that the different flanking regions of the β-sandwich in the C-terminal regulatory domain of IRF-3 and Smad-MH2 developed on the β-sandwich scaffold of the FHA protein to facilitate phosphorylation signalling in higher organisms.

Conclusions
We have determined the structure of the Nα-MH2 domain present in Expansion at 1.6 Å resolution. The addition of the N-terminal helix, the differences in the L3 loop and its lack of a role in the canonical TGF-β signalling pathway support the classification of Expansion as a new family of proteins that share the presence of the MH2 domain with Smads. Furthermore, the structural differences between the Smad MH2 and Expansion Nα-MH2 domains could have evolved to host a different range of protein-interaction partners, with implications for different cellular functions which apparently have been conserved in protostomes (Mollusca, Annelida and Arthropoda phyla).
For these reasons, we suggest that Expansion should not be termed a 'Smad-like' protein. Furthermore, we propose that the Smad/FHA family of structures should be 'expanded' to also include the Expansion Nα-MH2 domain.