research papers

STRUCTURAL
BIOLOGY
ISSN: 2059-7983

Crystal structure of human chondroadherin: solving a difficult molecular-replacement problem using de novo models

aDepartment of Biochemistry and Structural Biology, Lund University, Box 124, SE-221 00 Lund, Sweden, and bDepartment of Rheumatology and Molecular Skeletal Biology, Clinical Sciences Lund, Lund University, BMC-C12, SE-221 84 Lund, Sweden
*Correspondence e-mail: raemisch@scripps.edu, derek.logan@biochemistry.lu.se

Edited by G. J. Kleywegt, EMBL-EBI, Hinxton, England (Received 5 July 2016; accepted 12 December 2016)

Chondroadherin (CHAD) is a cartilage matrix protein that mediates the adhesion of isolated chondrocytes. Its protein core is composed of 11 leucine-rich repeats (LRRs) flanked by cysteine-rich domains. CHAD makes important interactions with collagen as well as with cell-surface heparan sulfate proteoglycans and α2β1 integrins. The integrin-binding site is located in a region of hitherto unknown structure at the C-terminal end of CHAD. Peptides based on the C-terminal human CHAD (hCHAD) sequence have shown therapeutic potential for treating osteoporosis. This article describes a still-unconventional structure solution by phasing with de novo models, the first of a β-rich protein. Structure determination of hCHAD using traditional, though nonsystematic, molecular replacement was unsuccessful in the hands of the authors, possibly owing to a combination of low sequence identity to other LRR proteins, four copies in the asymmetric unit and weak translational pseudosymmetry. However, it was possible to solve the structure by generating a large number of de novo models for the central LRR domain using Rosetta and multiple parallel molecular-replacement attempts using AMPLE. The hCHAD structure reveals an ordered C-terminal domain belonging to the LRRCT fold, with the integrin-binding motif (WLEAK) being part of a regular α-helix, and suggests ways in which experimental therapeutic peptides can be improved. The crystal structure itself and docking simulations further support that hCHAD dimers form in a similar manner to other matrix LRR proteins.

1. Introduction

Chondroadherin (CHAD) is an extracellular matrix protein that is prominent in cartilaginous tissues, bone and tendon (Shen et al., 1998[Shen, Z., Gantcheva, S., Månsson, B., Heinegård, D. & Sommarin, Y. (1998). Biochem. J. 330, 549-557.]; Önnerfjord et al., 2012[Önnerfjord, P., Khabut, A., Reinholt, F. P., Svensson, O. & Heinegård, D. (2012). J. Biol. Chem. 287, 18913-18924.]). It was first isolated and purified from bovine cartilage (Larsson et al., 1991[Larsson, T., Sommarin, Y., Paulsson, M., Antonsson, P., Hedbom, E., Wendel, M. & Heinegård, D. (1991). J. Biol. Chem. 266, 20428-20433.]), and is classified as a member of the leucine-rich repeat (LRR) family (Neame et al., 1994[Neame, P. J., Sommarin, Y., Boynton, R. E. & Heinegård, D. (1994). J. Biol. Chem. 269, 21547-21554.]). Since its first isolation, CHAD has been found to have multiple physiological roles, for example in cartilage and bone development, where it acts at the interface between the cell surface and the extracellular matrix. It is known to bind collagen type II (Månsson et al., 2001[Månsson, B., Wenglén, C., Mörgelin, M., Saxne, T. & Heinegård, D. (2001). J. Biol. Chem. 276, 32883-32888.]), as well as both the N-terminal and C-terminal non-triple-helical domains of the pericellular matrix collagen type VI (Wiberg et al., 2002[Wiberg, C., Heinegård, D., Wenglén, C., Timpl, R. & Mörgelin, M. (2002). J. Biol. Chem. 277, 49120-49126.], 2003[Wiberg, C., Klatt, A. R., Wagener, R., Paulsson, M., Bateman, J. F., Heinegård, D. & Mörgelin, M. (2003). J. Biol. Chem. 278, 37698-37704.]). Additionally, articular chondrocytes can adhere to CHAD through integrin α2β1 (Camper et al., 1997[Camper, L., Heinegård, D. & Lundgren-Åkerlund, E. (1997). J. Cell Biol. 138, 1159-1167.]), interacting with a binding motif (WLEAK) at the C-terminal end of CHAD (Haglund et al., 2011[Haglund, L., Tillgren, V., Addis, L., Wenglén, C., Recklies, A. & Heinegård, D. (2011). J. Biol. Chem. 286, 3925-3934.]).
Moreover, peptides comprising the human CHAD (hCHAD) C-terminal sequence bind the heparan sulfate chains of syndecan cell-surface proteoglycans with high affinity and affect focal adhesion formation (Haglund et al., 2013[Haglund, L., Tillgren, V., Önnerfjord, P. & Heinegård, D. (2013). J. Biol. Chem. 288, 995-1008.]).

Gene-targeted mice lacking CHAD show cartilage and bone phenotypes (Hessle et al., 2013[Hessle, L., Stordalen, G. A., Wenglén, C., Petzold, C., Tanner, E. K., Brorson, S.-H., Baekkevold, E. S., Önnerfjord, P., Reinholt, F. P. & Heinegård, D. (2013). PLoS One, 8, e63080.]), with changed nanomechanical properties (Batista et al., 2014[Batista, M. A., Nia, H. T., Önnerfjord, P., Cox, K. A., Ortiz, C., Grodzinsky, A. J., Heinegård, D. & Han, L. (2014). Matrix Biol. 38, 84-90.]). Because CHAD mRNA and protein were found at lower levels in older women with osteoporosis, a role for CHAD in bone metabolism was suspected. Indeed, a cyclic peptide (306CQLRGLRRWLEAKASRPDATC326), representing the α2β1 integrin-binding sequence of hCHAD, impaired preosteoclast migration through a nitric oxide synthase 2-dependent mechanism and decreased osteoclastogenesis and bone resorption. Thus, this cyclic peptide or derivatives thereof have potential for the treatment of osteoporosis. The same peptide was also later demonstrated to inhibit breast-cancer-induced bone metastases and to inhibit primary tumour growth in a mouse model (Rucci et al., 2015[Rucci, N., Capulli, M., Olstad, O. K., Önnerfjord, P., Tillgren, V., Gautvik, K. M., Heinegård, D. & Teti, A. (2015). Cancer Lett. 358, 67-75.]). Given these interesting findings, the three-dimensional structure of human CHAD, in particular its C-terminal non-LRR domain, would contribute to understanding its role in intermolecular interactions and the therapeutic potential of CHAD-mimicking peptides.

Structurally, CHAD belongs to the small leucine-rich repeat proteins (SLRPs) of the extracellular matrix (ECM), which have disulfide clusters at both the N- and C-terminal ends of the LRR region (Hocking et al., 1998[Hocking, A. M., Shinomura, T. & McQuillan, D. J. (1998). Matrix Biol. 17, 1-19.]; Iozzo & Murdoch, 1996[Iozzo, R. V. & Murdoch, A. D. (1996). FASEB J. 10, 598-614.]). SLRPs can further be divided into five subfamilies (Iozzo & Schaefer, 2015[Iozzo, R. V. & Schaefer, L. (2015). Matrix Biol. 42, 11-55.]). The canonical SLRPs include class I, which contains biglycan, decorin and asporin; class II, which consists of fibromodulin, lumican, keratocan, osteoadherin and PRELP; and class III, which contains epiphycan, osteoglycin and opticin. Class IV contains CHAD, CHADL, nyctalopin and tsukushi, while class V consists of podocan and podocan-like.

Structural information on LRR proteins in general continues to give important insights. Many extracellular receptors are comprised of LRR domains. Targeting such domains with inhibitors or agonists will often require accurate structural information. The importance and usefulness of this protein class is further underlined by several recent advances in engineering artificial LRR proteins for use as protein-binding scaffolds or as biomaterials for other applications (Parker et al., 2014[Parker, R., Mercedes-Camacho, A. & Grove, T. Z. (2014). Protein Sci. 23, 790-800.]; Rämisch et al., 2014[Rämisch, S., Weininger, U., Martinsson, J., Akke, M. & André, I. (2014). Proc. Natl Acad. Sci. USA, 111, 17875-17880.]; Parmeggiani et al., 2015[Parmeggiani, F., Huang, P.-S., Vorobiev, S., Xiao, R., Park, K., Caprari, S., Su, M., Seetharaman, J., Mao, L., Janjua, H., Montelione, G. T., Hunt, J. & Baker, D. (2015). J. Mol. Biol. 427, 563-575.]; Park et al., 2015[Park, K., Shen, B. W., Parmeggiani, F., Huang, P.-S., Stoddard, B. L. & Baker, D. (2015). Nature Struct. Mol. Biol. 22, 167-174.]).

Surprisingly, despite their regularity and the strong structural similarities amongst LRR-protein structures, molecular replacement can still be challenging. hCHAD was successfully crystallized, and an initial crystallographic analysis has been reported (Pramhed et al., 2008[Pramhed, A., Addis, L., Tillgren, V., Wenglén, C., Heinegård, D. & Logan, D. T. (2008). Acta Cryst. F64, 516-519.]). After the diffraction data were collected in 2005, many attempts were made to solve the structure using heavy atoms and conventional molecular replacement (MR). MR strategies included careful preparation of search models using homologues known at the time, such as decorin (PDB entries 1xcd, 1xec and 1xku; Scott et al., 2004[Scott, P. G., McEwan, P. A., Dodd, C. M., Bergmann, E. M., Bishop, P. N. & Bella, J. (2004). Proc. Natl Acad. Sci. USA, 101, 15633-15638.]) and biglycan (PDB entry 2ft3; Scott et al., 2006[Scott, P. G., Dodd, C. M., Bergmann, E. M., Sheehan, J. K. & Bishop, P. N. (2006). J. Biol. Chem. 281, 13324-13332.]), as well as the use of automatic pipelines such as MrBUMP (Keegan & Winn, 2008[Keegan, R. M. & Winn, M. D. (2008). Acta Cryst. D64, 119-124.]) and BALBES (Long et al., 2008[Long, F., Vagin, A. A., Young, P. & Murshudov, G. N. (2008). Acta Cryst. D64, 125-132.]) that systematically search all available homologous structures. Two different data sets were used: one extending to 3.2 Å resolution and the other to 2.3 Å resolution. Account was taken of the weak translational noncrystallographic symmetry (tNCS) believed to be present in the crystals (Pramhed et al., 2008[Pramhed, A., Addis, L., Tillgren, V., Wenglén, C., Heinegård, D. & Logan, D. T. (2008). Acta Cryst. F64, 516-519.]), either by feeding the vectors explicitly to the MR programs or by generating pairs of molecules with the requisite relationships. 
The repetitive nature of the LRR led to unreliable sequence alignments with homologous structures, which were an impediment to the success of, for example, MrBUMP, as some of the generated models were made up of disconnected segments. Semi-plausible MR solutions could occasionally be found, but they never included all four monomers predicted to occupy the asymmetric unit, and the electron-density maps were not conducive to the extensive rebuilding required to bootstrap the refinement process. We attributed our lack of success to a combination of three factors: (i) the low sequence identity of homologous models in the PDB (consistently 25–32%); (ii) the requirement to search for four copies of CHAD in the asymmetric unit; and (iii) the presence of weak tNCS (Pramhed et al., 2008[Pramhed, A., Addis, L., Tillgren, V., Wenglén, C., Heinegård, D. & Logan, D. T. (2008). Acta Cryst. F64, 516-519.]). Even the eventual implementation of corrections for tNCS in Phaser (Read et al., 2013[Read, R. J., Adams, P. D. & McCoy, A. J. (2013). Acta Cryst. D69, 176-183.]) did not help to find an unambiguous solution. Thus, we abandoned attempts to solve the hCHAD structure around 2009, as no new structures appearing in the PDB after that date had higher sequence identity than those existing previously.

Phasing with de novo models is a relatively new, emerging strategy to obtain solutions from crystallographic data in cases where no sufficiently similar homologous models are available. Several proof-of-principle studies have demonstrated the great potential of using de novo protein models or smaller fragments (Das & Baker, 2009[Das, R. & Baker, D. (2009). Acta Cryst. D65, 169-175.]; DiMaio et al., 2011[DiMaio, F., Terwilliger, T. C., Read, R. J., Wlodawer, A., Oberdorfer, G., Wagner, U., Valkov, E., Alon, A., Fass, D., Axelrod, H. L., Das, D., Vorobiev, S. M., Iwaï, H., Pokkuluri, P. R. & Baker, D. (2011). Nature (London), 473, 540-543.]; Rodríguez et al., 2012[Rodríguez, D., Sammito, M., Meindl, K., de Ilarduya, I. M., Potratz, M., Sheldrick, G. M. & Usón, I. (2012). Acta Cryst. D68, 336-343.]; Bibby et al., 2012[Bibby, J., Keegan, R. M., Mayans, O., Winn, M. D. & Rigden, D. J. (2012). Acta Cryst. D68, 1622-1631.]; Terwilliger et al., 2012[Terwilliger, T. C., DiMaio, F., Read, R. J., Baker, D., Bunkóczi, G., Adams, P. D., Grosse-Kunstleve, R. W., Afonine, P. V. & Echols, N. (2012). J. Struct. Funct. Genomics, 13, 81-90.]; DiMaio, 2013[DiMaio, F. (2013). Acta Cryst. D69, 2202-2208.]; Rämisch et al., 2015[Rämisch, S., Lizatović, R. & André, I. (2015). Acta Cryst. D71, 606-614.]). To date, a number of protein structures have been solved using helical fragments as search probes (Millán et al., 2015[Millán, C., Sammito, M. & Usón, I. (2015). IUCrJ, 2, 95-105.]), but only very few new larger protein structures have been reported that have been solved using de novo models (Bruhn et al., 2014[Bruhn, J. F., Barnett, K. C., Bibby, J., Thomas, J. M. H., Keegan, R. M., Rigden, D. J., Bornholdt, Z. A. & Saphire, E. O. (2014). J. Virol. 88, 758-762.]; Hotta et al., 2014[Hotta, K., Keegan, R. M., Ranganathan, S., Fang, M., Bibby, J., Winn, M. D., Sato, M., Lian, M., Watanabe, K., Rigden, D. J. & Kim, C.-Y. (2014). Angew. Chem. Int. Ed. 53, 824-828.]). 
The structure solution of β-rich proteins has been identified as particularly challenging owing to the intrinsic variability of β-sheets relative to α-helices (Bibby et al., 2012[Bibby, J., Keegan, R. M., Mayans, O., Winn, M. D. & Rigden, D. J. (2012). Acta Cryst. D68, 1622-1631.]).

Promising success in de novo structure prediction of large repeat proteins (unpublished data) led us to re-examine the previously collected hCHAD diffraction data, but employing structure prediction instead of homologous protein structures to find MR solutions. This paper describes in detail the solution of the hCHAD structure by MR using de novo models. We combined native crystallographic data with MR using models obtained by unbiased de novo structure prediction in Rosetta (Leaver-Fay et al., 2011[Leaver-Fay, A. et al. (2011). Methods Enzymol. 487, 545-574.]; Rohl et al., 2004[Rohl, C. A., Strauss, C. E. M., Misura, K. M. S. & Baker, D. (2004). Methods Enzymol. 383, 66-93.]; Bradley et al., 2005[Bradley, P., Misura, K. M. S. & Baker, D. (2005). Science, 309, 1868-1871.]) in combination with the extensive, multifaceted search methodology implemented in the AMPLE software (Bibby et al., 2012[Bibby, J., Keegan, R. M., Mayans, O., Winn, M. D. & Rigden, D. J. (2012). Acta Cryst. D68, 1622-1631.]).

Chondroadherin forms noncovalent dimers in solution. The structure now obtained shows the dimeric assembly anticipated from structures of other SLRP proteins, and protein–protein docking in silico supported that the observed dimer structure may be biologically relevant rather than merely a crystallization artefact.

2. Experimental procedures

2.1. Protein crystallization and data collection

The production of native recombinant full-length hCHAD in HEK 293 cells has been described (Månsson et al., 2001[Månsson, B., Wenglén, C., Mörgelin, M., Saxne, T. & Heinegård, D. (2001). J. Biol. Chem. 276, 32883-32888.]). Crystallization and data collection for hCHAD have been reported previously, as have the native Patterson and self-rotation function (Pramhed et al., 2008[Pramhed, A., Addis, L., Tillgren, V., Wenglén, C., Heinegård, D. & Logan, D. T. (2008). Acta Cryst. F64, 516-519.]). The structure solution was performed with the previously reported data set to 2.3 Å resolution, but as a final step we reprocessed the data to 2.1 Å resolution and re-refined the structure. Data quality to 2.1 Å resolution is reported in Table 1[link].

Table 1
Data and structure-quality details for hCHAD

Values in parentheses are for the outer shell.

Space group P2₁
Unit-cell parameters (Å, °) a = 56.5, b = 111.6, c = 128.7, β = 92.2
Resolution range (Å) 14.9–2.10 (2.14–2.10)
Rmerge(I) 0.088 (0.607)
Rp.i.m.(I) 0.063 (0.475)
CC1/2(I) 0.994 (0.562)
Multiplicity 2.4 (2.0)
Mean I/σ(I) 6.7 (1.1)
Completeness (%) 95.8 (91.1)
Wilson B factor (Å²) 43.9
No. of reflections used in refinement 88893
No. of reflections in Rfree set 7790 [5.1%]
Rwork(F) 0.238
Rfree(F) 0.260
No. of non-H atoms
 Total 10488
 Macromolecules 10359
 Water molecules 127
R.m.s.d., bonds (Å) 0.005
R.m.s.d., angles (°) 0.83
Ramachandran favoured (%) 93.7
Ramachandran allowed (%) 6.1
Ramachandran outliers (%) 0.2
Rotamer outliers (%) 0.9
Clashscore 2.2
Average B factor (Å²)
 Overall 52.4
 Macromolecules 52.5
 Solvent 42.7
No. of TLS groups 4
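
For readers less familiar with the refinement statistics in Table 1, the conventional definition of the crystallographic R factor is given below. This is the standard textbook form and is assumed here rather than quoted from the refinement program; the calculated amplitudes are taken to be already on the scale of the observed amplitudes.

```latex
R = \frac{\sum_{hkl} \bigl|\, |F_{\mathrm{obs}}(hkl)| - |F_{\mathrm{calc}}(hkl)| \,\bigr|}
         {\sum_{hkl} |F_{\mathrm{obs}}(hkl)|}
```

Rwork is obtained when the sums run over the reflections used in refinement, and Rfree when they run over the test set of reflections excluded from refinement.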

2.2. Structure modelling and molecular replacement

The structure of the entire hCHAD was modelled using the Abrelax application of the Rosetta macromolecular modelling suite with the talaris2013 energy function (O'Meara et al., 2015[O'Meara, M. J., Leaver-Fay, A., Tyka, M., Stein, A., Houlihan, K., DiMaio, F., Bradley, P., Kortemme, T., Baker, D., Snoeyink, J. & Kuhlman, B. (2015). J. Chem. Theory Comput. 11, 609-622.]) and allowing fragments from homologous proteins. We generated 50 000 models. Visual inspection of the best-scoring models revealed strong structural convergence of the central part of the sequence, whereas the termini showed a wide range of predicted tertiary structures. To reduce the noise introduced by this lack of convergence, we removed residues 1–34 and 231–359 from the 500 lowest-energy models before using these for MR trials. Control modelling was performed in an identical manner, but excluding homologous structures from the fragment set by using the -nohoms option in the Rosetta fragment-picking application.
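
The selection and trimming step described above can be sketched in a few lines of Python. This is a minimal illustration, not the script used in this work: the function names and the score dictionary are hypothetical, and the models are assumed to be standard fixed-column PDB files in which the residue number occupies columns 23–26.

```python
from typing import Dict, List


def lowest_energy(scores: Dict[str, float], n: int) -> List[str]:
    """Return the names of the n models with the lowest (best) energy."""
    return sorted(scores, key=scores.get)[:n]


def trim_pdb(pdb_text: str, first: int, last: int) -> str:
    """Keep only ATOM records whose residue number lies in [first, last]."""
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            resseq = int(line[22:26])  # PDB columns 23-26: residue number
            if first <= resseq <= last:
                kept.append(line)
    return "\n".join(kept)
```

With a dictionary of Rosetta total scores keyed by model name, `lowest_energy(scores, 500)` followed by `trim_pdb(text, 35, 230)` for each selected model would reproduce the residue range retained for the MR trials.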

To increase the chance of finding an MR solution, we chose to employ the AMPLE algorithm implemented in CCP4 v.6.4 (Winn et al., 2011[Winn, M. D. et al. (2011). Acta Cryst. D67, 235-242.]) with default parameters on a 32-core Linux workstation. This algorithm was developed to aid MR using models from protein structure prediction. AMPLE parallelizes several MR approaches, for example pruning off side chains or removing portions of the termini (MrBUMP; Keegan & Winn, 2008[Keegan, R. M. & Winn, M. D. (2008). Acta Cryst. D64, 119-124.]). Phasing in AMPLE was performed using MOLREP (Vagin & Teplyakov, 2010[Vagin, A. & Teplyakov, A. (2010). Acta Cryst. D66, 22-25.]) and Phaser v.2.5.6 (McCoy et al., 2007[McCoy, A. J., Grosse-Kunstleve, R. W., Adams, P. D., Winn, M. D., Storoni, L. C. & Read, R. J. (2007). J. Appl. Cryst. 40, 658-674.]) in combination with automatic model building using Buccaneer (Cowtan, 2008[Cowtan, K. (2008). Acta Cryst. D64, 83-89.]). Prior to structure modifications and phasing, AMPLE uses SPICKER (Zhang & Skolnick, 2004[Zhang, Y. & Skolnick, J. (2004). J. Comput. Chem. 25, 865-871.]) to align and cluster the input models for MR. Following MR with AMPLE, a large portion of the remaining structure was built using phenix.autobuild (Terwilliger et al., 2008[Terwilliger, T. C., Grosse-Kunstleve, R. W., Afonine, P. V., Moriarty, N. W., Zwart, P. H., Hung, L.-W., Read, R. J. & Adams, P. D. (2008). Acta Cryst. D64, 61-69.]) and Buccaneer. Finally, less well ordered parts of the termini, in particular the C-terminal cap, were added in several cycles of manual model building using Coot (Emsley et al., 2010[Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. (2010). Acta Cryst. D66, 486-501.]) and refinement using phenix.refine. TLS modelling (Winn et al., 2001[Winn, M. D., Isupov, M. N. & Murshudov, G. N. (2001). Acta Cryst. D57, 122-133.]) was used during refinement, with one set of TLS parameters per protein chain. 
Automatically generated torsion-angle NCS restraints were used between the four copies of hCHAD in the asymmetric unit. Release of these restraints on the fully refined structure did not result in a lower Rfree value, so they were retained in the final cycle. The coordinates and structure factors for hCHAD have been deposited in the Protein Data Bank as entry 5lfn.

As a test of whether conventional MR might have worked under near-optimal conditions, we generated search models comprising residues 35–230. This was performed for the top five structural homologues to the final refined hCHAD structure. Models were prepared using either CHAINSAW (Stein, 2008[Stein, N. (2008). J. Appl. Cryst. 41, 641-643.]) or PDBCLIP, producing side chains truncated to the CG atom or unmodified side chains, respectively. An ensemble model was generated for each type by superimposing the five individual models and MR runs were performed for both with a protocol identical to that used for the Rosetta ensemble.

2.3. Protein–protein docking

Global protein–protein docking was performed using Rosetta 3.5 with the talaris2013 energy function and standard options as described in the Rosetta manual (https://www.rosettacommons.org/docs/latest/Home). Prior to the simulations, the side-chain orientations of the separated monomers were optimized to avoid bias from the complex structure. 20 000 decoys were generated, each from different random starting orientations of the two monomers.

3. Results

3.1. Structure solution

Given our lack of success using conventional MR methods, we were inspired to solve the structure of hCHAD by molecular replacement using de novo molecular models. We generated 50 000 models for the complete sequence using de novo structure prediction in Rosetta. The central part of the 500 lowest-energy models (residues 35–230) then served as input for MR trials with AMPLE. Clustering of the 500 input models by SPICKER yielded a large cluster containing 200 models. AMPLE then generates subclusters; it produced 117 ensembles with varying backbone and side-chain truncations. Those ensembles were used as input for subsequent MR trials. Several ensembles yielded promising solutions. The best solution, obtained using Phaser, had four copies of the search model successfully placed with a translation-function Z-score (TFZ) of 9.4 and a final (refined) LLG of 1683. Placement of the first copy yielded a TFZ score of 4.5 and an LLG of 500. Subsequent refinement using phenix.refine resulted in a promising Rfree value of 0.42.

AMPLE attempts phasing by gradually truncating the backbone and mutating side chains to alanine. Remarkably, the best phasing solution was obtained using an ensemble of full-atom models without any further truncation. For subsequent automatic model building we started with phenix.autobuild, which yielded an improved model with R = 0.33 and Rfree = 0.38. This result strongly suggested that we had successfully circumvented the initial molecular-replacement issues using de novo models. At this point, only a few missing β-strands had been successfully added to the initial Phaser solution, whereas most of the electron density at the missing C-terminal part was filled with water molecules. Unfortunately, as frequently observed, automatic model building built disconnected chains. Furthermore, the four chains in the asymmetric unit differed significantly in length. We removed the water molecules and some randomly placed short peptides, manually reconnected the backbone of the longest autobuilt chain and superimposed this chain onto the others using the MAMMOTH software (Lupyan et al., 2005[Lupyan, D., Leo-Macias, A. & Ortiz, A. R. (2005). Bioinformatics, 21, 3255-3263.]). After further automatic and manual model building and refinement against data reprocessed to 2.1 Å resolution, we were able to finalize the structure of hCHAD to Rwork and Rfree values of 0.238 and 0.260, respectively (Table 1[link]).

The models from structure prediction showed an unexpected accuracy, at least regarding the converged portion (positions 35–230). 298 structures deviated by less than 2 Å in r.m.s.d. from the final hCHAD model, and 76 deviated by 1.5 Å or less. Fig. 2(b) shows an overlay of the final hCHAD model (chain A) and the cluster centre (centroid) of the ensemble that yielded the best MR solution. This ensemble (provided as Supporting Information) contained 30 models with pairwise r.m.s.d.s between 1.0 and 1.7 Å (mean r.m.s.d. of 1.3 Å). The centroid of this ensemble is also the centroid of the 200-model ensemble that was generated in the first AMPLE step (SPICKER). When sorted by Rosetta energy, the centroid model is ranked as number 150 out of 500. Thus, our result supports the premise of the AMPLE strategy, i.e. that for MR trials ensembles of structure-prediction models might be a better choice than simply selecting the best energy models.
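
To make the notion of an ensemble centroid concrete, the sketch below picks the model with the lowest mean r.m.s.d. to all other models. It is a simplified illustration only: it is not the clustering algorithm implemented in SPICKER, the function names are hypothetical, and the coordinate sets (for example Cα positions of residues 35–230) are assumed to be already superimposed and of equal length.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """R.m.s.d. between two equally long, already superimposed coordinate sets."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))


def centroid_index(models: List[Sequence[Coord]]) -> int:
    """Index of the model with the lowest mean r.m.s.d. to all other models."""
    if len(models) < 2:
        raise ValueError("at least two models are required")

    def mean_rmsd(i: int) -> float:
        others = [rmsd(models[i], m) for j, m in enumerate(models) if j != i]
        return sum(others) / len(others)

    return min(range(len(models)), key=mean_rmsd)
```

A centroid chosen in this way need not be the lowest-energy model, which is the situation reported above, where the centroid ranks 150th of 500 by Rosetta energy.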

Despite numerous unsuccessful MR attempts in the past, we made a post hoc control attempt to solve the structure using existing PDB models under 'near-optimal' conditions, i.e. knowing the final structure and which portion of the structure had worked using Rosetta models. We generated ensembles of the top five structural homologues to residues 35–230 of the final refined structure of hCHAD (Table 2[link]), with both truncated and complete side chains. The best solution with side-chain truncation had four copies with a TFZ of 13.0 and a final (refined) LLG of 1590. Relative to the first chain, the rotation of the second chain was correct, but the solution was displaced relative to the correct solution by two repeats. For the third and fourth chains the rotations and translations were both incorrect. Refinement of this solution with phenix.refine gave Rfree = 0.481. Similar results were obtained for the all-atom ensemble (TFZ = 11.0, LLG = 1371, Rfree = 0.475), in which all but the first chain were incorrectly placed. In neither case did the maps show obvious areas for improvement.

Table 2
R.m.s. deviations of the 20 closest structural homologues in the PDB90 data set to the core region 35–230 of hCHAD found using the DALI server

The hits are sorted according to DALI Z-score (Z). R.m.s.d. is the r.m.s. deviation in Cα positions with respect to the query structure; Ntot is the total number of residues in the target structure; %ID is the percentage sequence identity over the aligned residues. An X in the final column indicates that the structure was available at the time that the MR searches described in this paper were carried out.

PDB code, chain Z R.m.s.d. (Å) Nalign Ntot %ID Description Available?
4psj, A 26.9 1.9 194 255 31 Engineered protein OR464 X
2o6q, A 25.2 1.7 196 270 33 Hagfish variable lymphocyte receptor A29 X
4qxe, A 25.2 1.5 196 443 29 Hagfish VLR coupled to LGR4  
1p8t, A 25.1 1.5 196 285 30 Reticulon 4 receptor (Nogo 66 receptor) X
4kt1, A 25.1 1.5 196 461 29 LGR4 ECD X
3kj4, A 25.0 1.5 196 283 29 Rat Nogo receptor 1 X
2z81, A 24.9 1.9 195 550 25 Toll-like receptor 2 X
3m18, A 24.7 2.1 194 245 31 Variable lymphocyte receptor A X
4p91, A 24.6 1.5 194 286 28 Reticulon 4 receptor-like 2 (Nogo receptor 2) X
4rcw, A 24.6 2.2 178 234 29 Slit and NTRK-like protein 1  
4y61, B 24.5 1.8 194 235 27 Slitrk2 LRR1  
4bsr, A 24.3 1.5 195 483 27 LGR5 X
4rca, B 24.2 1.9 194 241 28 Slitrk2  
4r5d, A 24.2 2.3 185 441 30 Designed leucine-rich repeat protein  
4r5c, A 23.6 1.7 192 304 33 Designed leucine-rich repeat protein  
3rfs, A 23.5 2.4 193 263 27 Designed leucine-rich repeat protein X
4li1, B 23.4 1.8 193 425 25 LGR4 X
4r6f, A 23.1 2.1 195 329 30 Designed leucine-rich repeat protein  
3rfj, A 22.9 2.4 193 268 28 Designed leucine-rich repeat protein X
4lxr, A 22.9 1.8 195 755 25 Toll  

3.2. Overall structure of CHAD

The asymmetric unit of hCHAD crystals, in space group P2₁, contains four chains arranged in two pairs, A/B and C/D (see Fig. 1[link]), that resemble the dimers of decorin and biglycan seen previously (see below). Chains A and B are related to each other by a twofold noncrystallographic symmetry axis almost parallel to the unit-cell b axis, as are chains C and D. Chains A and D are related by a twofold axis almost parallel to the unit-cell a axis. The second of these twofold NCS axes corresponds to the twofold peak in the self-rotation function previously noted (Pramhed et al., 2008[Pramhed, A., Addis, L., Tillgren, V., Wenglén, C., Heinegård, D. & Logan, D. T. (2008). Acta Cryst. F64, 516-519.]) at θ = 21.4°, φ = 0°. This arrangement of axes leads to weak translational NCS. For example, chain C is related to chain D in another asymmetric unit by a rotation of a few degrees and a translation of (0.152, 0.495, 0.215), which is close to the peak in the native Patterson function at (0.129, 0.473, 0.220) noted previously (Pramhed et al., 2008[Pramhed, A., Addis, L., Tillgren, V., Wenglén, C., Heinegård, D. & Logan, D. T. (2008). Acta Cryst. F64, 516-519.]).
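
How close the NCS translation is to the native Patterson peak can be estimated by converting the difference between the two fractional vectors to ångströms using the unit-cell parameters in Table 1. The sketch below is illustrative only: the function names are hypothetical, the orthogonalization convention (x along a, y along b, z along c*) is an assumption, and symmetry-equivalent positions of the Patterson peak are not considered.

```python
import math


def frac_to_cart_monoclinic(frac, a, b, c, beta_deg):
    """Orthogonalize a fractional vector for a monoclinic cell (unique axis b).

    Assumed convention: x along a, y along b, z along c*.
    """
    u, v, w = frac
    beta = math.radians(beta_deg)
    return (a * u + c * w * math.cos(beta), b * v, c * w * math.sin(beta))


def offset_angstrom(v1, v2, cell):
    """Length in angstroms of the difference between two fractional vectors."""
    diff = tuple(p - q for p, q in zip(v1, v2))
    x, y, z = frac_to_cart_monoclinic(diff, **cell)
    return math.sqrt(x * x + y * y + z * z)


CELL = dict(a=56.5, b=111.6, c=128.7, beta_deg=92.2)  # values from Table 1
ncs_translation = (0.152, 0.495, 0.215)  # chain C to chain D, as quoted in the text
patterson_peak = (0.129, 0.473, 0.220)   # native Patterson peak, as quoted in the text

print(f"{offset_angstrom(ncs_translation, patterson_peak, CELL):.1f}")
```

The printed value is the distance between the two vectors in ångströms; a separation of a few ångströms on cell edges of 56–129 Å is consistent with the description of the two vectors as close.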

[Figure 1]
Figure 1
Arrangement of molecules within the crystallographic unit cell. The asymmetric unit is shown in colour (chains A, B, C and D); dimers are formed by chains AB and CD. Vertical black lines indicate the intra-dimer twofold symmetry axes, and the second twofold-symmetry axis, relating chains A and D, is indicated in the centre of the asymmetric unit, almost parallel to the a axis of the unit cell.

The resolved part of the hCHAD sequence starts with Cys23. This position has been reported to be the N-terminal amino acid in the mature protein when purified directly from cartilage. The protein used for crystallization was expressed in human cells; thus, the crystal structure is consistent with cleavage of the signal peptide upon protein translocation into the endoplasmic reticulum. Based on sequence analysis, CHAD has been classified as a leucine-rich repeat protein (Neame et al., 1994[Neame, P. J., Sommarin, Y., Boynton, R. E. & Heinegård, D. (1994). J. Biol. Chem. 269, 21547-21554.]). Consistent with this analysis, the crystal structure (Fig. 2[link]a) shows a classical horseshoe-shaped LRR fold with a linear hydrophobic core, a concave side composed of a long parallel β-sheet with 13 strands, and the frequently observed asparagine ladder (Jenkins & Pickersgill, 2001[Jenkins, J. & Pickersgill, R. (2001). Prog. Biophys. Mol. Biol. 77, 111-175.]). There are 11 LRRs, the structures of which differ mostly on the convex side of the horseshoe. The 20 most structurally similar LRR proteins in the PDB are listed in Table 3[link].

Table 3
R.m.s. deviations of the 20 closest structural homologues in the PDB90 data set to the whole of hCHAD found using the DALI server

Sorting and annotation is the same as for Table 2[link].

PDB code, chain Z R.m.s.d. (Å) Nalign Ntot %ID Description Available?
5a5c, B 35.7 2.3 319 332 34 Engineered LRRTM2  
4oqt, A 31.3 3.0 319 475 25 Lingo-1 X
4u7l, A 31.2 2.7 318 455 27 LRIG1 X
5cmp, A 31.0 2.6 308 324 26 FLRT3  
3zyj, A 30.2 2.8 291 396 30 NGL1 X
5ftt, F 30.1 2.5 312 332 24 FLRT2  
1xcd, A 29.9 2.1 285 305 28 Decorin X
4qxf, B 29.7 2.2 278 301 27 LRR-containing GPCR X
3zyo, A 29.7 2.2 291 391 30 LRR-containing protein 4B X
4qxe, A 29.6 2.2 296 443 26 LRR-containing GPCR X
4kt1, A 29.6 2.4 303 461 26 LRR-containing GPCR X
2ft3, B 29.2 2.5 289 305 28 Biglycan X
3rg1, B 29.1 3.0 313 599 23 CD180 X
1p8t, A 28.3 2.7 265 285 26 Reticulon 4 receptor X
3kj4, A 28.3 2.6 262 283 26 Rat Nogo receptor X
4p91, A 28.2 2.6 271 286 29 Reticulon 4 receptor-like protein 2 X
4bv4, R 28.0 2.6 306 440 24 Toll/variable lymphocyte receptor B chimera X
3zyi, A 28.0 2.5 287 395 26 LRR-containing protein 4 X
4r6f, A 28.0 2.5 275 329 26 Designed LRR protein DLRR_I X
4li1, B 27.9 2.8 295 425 24 LGR4 X
[Figure 2]
Figure 2
Structure of human chondroadherin. (a) Tertiary structure of the hCHAD monomer. (b) Overlay of the cluster centre of the ensemble that yielded the best MR solution and the final hCHAD model (chain A). (c) Repeats 2–7; the main-chain trace illustrates the high structural regularity. (d) N-terminal caps of hCHAD, biglycan (PDB entry 2ft3) and decorin (PDB entry 1xcd); disulfide bonds are shown as sticks. (e) C-terminal caps of hCHAD and Nogo receptor; the two disulfide bonds are shown as sticks. (f) C-terminal caps of biglycan and decorin for comparison; they have a different arrangement and disulfide bonding to hCHAD. (g) Comparison of the functionally related proteins hCHAD (purple), biglycan (sand) and decorin (pink) shows very similar curvatures compared with the Nogo receptor LRR domain (green).

In contrast to many SLRPs, CHAD is characterized by the extreme regularity of its LRRs, all but two of which are exactly 23 residues long. This is reflected in the structure, where each repeat, apart from repeat 8, contains a β-strand and a loop. Repeats 2–7 have an almost identical main-chain conformation in the loop regions, which leads to extreme regularity in this part of the structure (Fig. 2c). Their structure is remarkably similar to that of repeats 1–8 of the Nogo receptor (He et al., 2003). A sequence alignment of all of the hCHAD repeats demonstrates how many different sequences are compatible with virtually identical backbone structures (Fig. 3). Repeats 8–11 are more varied: repeat 8 contains a short α-helix that disrupts the regularity of the convex face, and repeat 9 is one residue longer, resulting in a bulge in the loop. Repeat 10 returns to the conformation of repeats 2–7.

Figure 3
Alignment of individual hCHAD leucine-rich repeat sequences.

As noted above, the similarities to other SLRPs are more limited. The most similar structure is that of the class I SLRP decorin, where the LRRs vary in length from 21 to 30 amino acids owing to the formation of diverse secondary-structure elements on the convex face of the molecule (Scott et al., 2004).

Only ten of the LRRs in CHAD were identified by Neame and coworkers in their initial sequence analysis (Neame et al., 1994). This is possibly a result of using a different definition of repeat boundaries, thus overlooking the β-strand of the first LRR. In the first hCHAD repeat (residues 33–52), the first two residues of the LRR hallmark motif LxxLxL adopt a conformation different from that in canonical LRRs. The first repeat forms a cap together with the ten N-terminal residues; N-terminal caps typically deviate from internal repeats in LRR proteins.
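The LxxLxL hallmark motif referred to above can be located in a sequence with a simple pattern scan. The following is a minimal Python sketch and not the procedure used in the sequence analyses cited here; the function name and the toy sequence are hypothetical, and the pattern treats only leucine as the conserved residue, exactly as the motif is written in the text.

```python
import re

# Hallmark motif as written in the text: L, two arbitrary residues, L, one
# arbitrary residue, L. The lookahead allows overlapping hits to be reported.
LRR_MOTIF = re.compile(r"(?=(L..L.L))")


def find_lrr_motifs(sequence):
    """Return (1-based start position, matched hexapeptide) for each LxxLxL hit."""
    return [(m.start() + 1, m.group(1)) for m in LRR_MOTIF.finditer(sequence)]


# Hypothetical toy sequence for illustration only; this is NOT the hCHAD sequence.
toy_sequence = "AALKELDLSNNAAAALRTLNLGGG"

for position, hexapeptide in find_lrr_motifs(toy_sequence):
    print(position, hexapeptide)
```

A scan of this kind only flags candidate repeat starts. Where the repeat boundaries are drawn relative to each hit remains a matter of definition, which is the point made above about the first LRR.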

The hCHAD N-terminal cap has a canonical LRRNT architecture, which is the simplest form known for LRR proteins (Park et al., 2008). It comprises only a short loop and a β-strand with an antiparallel orientation relative to the strands of the following repeats. The short N-terminus is stabilized by two disulfide bonds, one between the first and third cysteines and one between the second and fourth cysteines (Cys23–Cys29 and Cys27–Cys38; Fig. 2d). The fifth cysteine points towards the core and does not engage in a disulfide bond.

The C-terminal cap contains a complete α-helix (residues 306–317) followed by a stretch of amino acids that lack defined secondary-structure elements. Similarly to the N-terminus, the C-terminus of the LRR solenoid structure is stabilized by two disulfide bonds. Sequence alignment of class IV SLRPs suggested a disulfide pattern as seen in the Nogo receptor structure (McEwan et al., 2006). The solved structure confirms this analysis; the conformation of the C-terminal cap is very similar to that of the Nogo receptor (see Fig. 2e and below). This pattern, however, differs from the initial biochemical analysis of bovine CHAD (96% identical to hCHAD) using proteolytic cleavage and sequencing by Edman degradation. As expected for class IV SLRPs, the sixth and eighth cysteines form a disulfide bond (Cys304–Cys326) that connects the loop of the 11th LRR and the first loop of the cap region, whereas the seventh and the ninth cysteines (Cys306–Cys346) connect the N-terminal end of the α-helix to the neighbouring loop at the last visible residue in hCHAD (Fig. 2e). However, this last disulfide bond is only visible in chains A and D, and even in these chains it shows signs of partial reduction. The disulfide-bonding pattern originally proposed for bovine CHAD (Cys304–Cys346 and Cys306–Cys326) is stereochemically unlikely, as the SG atoms of the cysteines are about 11 and 13 Å from each other, respectively.
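The stereochemical argument can be illustrated with a simple distance check between SG atoms. The sketch below is in Python and assumes a typical S–S bond length of about 2.05 Å; the tolerance, the function names and the coordinates are illustrative assumptions and are not taken from the hCHAD model.

```python
import math

# Typical S-S bond length in a disulfide (angstroms). The tolerance is an
# illustrative choice, not a value from the paper.
SS_BOND_LENGTH = 2.05
TOLERANCE = 0.3


def sg_distance(sg_a, sg_b):
    """Euclidean distance in angstroms between two SG atoms given as (x, y, z)."""
    return math.dist(sg_a, sg_b)


def is_plausible_disulfide(sg_a, sg_b):
    """True if the SG-SG distance is compatible with a covalent S-S bond."""
    return abs(sg_distance(sg_a, sg_b) - SS_BOND_LENGTH) <= TOLERANCE


# Hypothetical coordinates chosen only to reproduce the two regimes discussed.
bonded_pair = ((0.0, 0.0, 0.0), (2.0, 0.3, 0.2))    # about 2 A apart
distant_pair = ((0.0, 0.0, 0.0), (11.0, 0.0, 0.0))  # 11 A apart

print(is_plausible_disulfide(*bonded_pair))
print(is_plausible_disulfide(*distant_pair))
```

Cysteine pairs whose SG atoms are more than 10 Å apart, as for the originally proposed pairing, fail such a check by a wide margin unless the backbone rearranges substantially.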

Interestingly, although the repeat structure is more similar to that of the Nogo receptor, the overall geometry of hCHAD more closely resembles those of the functionally related proteins biglycan and decorin. The curvature of the LRR region is significantly lower (i.e. with a larger radius) for the Nogo receptor LRR domain than for the three extracellular matrix proteins (Fig. 2g).

The C-terminal tail of hCHAD (residues 347–360) was not resolved in the structure. This part of the molecule mediates binding to syndecan cell-surface proteoglycan receptors, which, in concert with integrin ligation, enables cytoskeletal rearrangement and focal adhesion formation (Haglund et al., 2013).

3.3. hCHAD forms a dimer in the crystal

The extracellular matrix proteins decorin, biglycan and opticin, all of which are LRR proteins similar to chondroadherin, are known to form homodimers (Le Goff et al., 2012; Scott et al., 2003, 2006). Indeed, dimerization in solution has been observed both for cartilage-tissue-extracted CHAD and for native and recombinantly expressed CHAD (Larsson et al., 1991; Månsson et al., 2001; Pramhed et al., 2008). In the crystal structures of both decorin and biglycan, the dimer interface is located at the N-terminal half of the concave face. For decorin it was shown that the homodimer interface coincides with the collagen-binding site, and that only monomeric decorin is able to bind collagen (Islam et al., 2013). The crystal structure presented here suggests that the hCHAD dimeric assembly resembles those formed by decorin and biglycan (Fig. 4a). Although the angle between the two monomers is larger than for the other two LRR proteins, the location of the interface is very similar. As seen, for example, in the decorin dimer, the central part of the chondroadherin dimer interface contains very few inter-chain contacts, whereas the flanking regions form two much tighter interfaces. At 1500 Å², the dimer interface is significantly smaller than those of decorin and biglycan (2300 and 2550 Å², respectively). In particular, the ring-stacking interaction between His28 and Phe150 and hydrogen bonding between several histidines and tyrosines appear to be key interactions (Fig. 4b). In the hCHAD dimer, the two monomers are much more inclined relative to the symmetry axis than is the case for decorin (Fig. 4c).

Figure 4
(a) Quaternary structure of the hCHAD dimer. The surface representation shows the lack of contacts between the subunits in the central part of the interface and the two interfaces formed between the respective N-terminal caps and repeats 4–9. (b) Key interactions within the dimer interfaces include ring stacking between His28 and Phe150 as well as hydrogen bonding between tyrosines and histidines. (c) Comparison of hCHAD and decorin (PDB entry 1xcd) dimers. Two monomers are superimposed (ribbon representation); the dashed line indicates an angle of 0°.

To test whether the dimeric arrangement in the crystal is consistent with dimer formation in solution, we performed interface analysis using PISA (Krissinel & Henrick, 2007) and docking simulations in Rosetta. The PISA analysis suggested that dimer formation is unlikely to occur in solution. However, control analyses using structures of the experimentally verified dimers biglycan (PDB entry 2ft3) and decorin (PDB entries 1xku, 1xec and 1xcd) gave the same result, with an even worse interface ΔΔG, rendering this analysis inconclusive. The Rosetta docking simulations were started by randomly re-orientating the separated subunits (distance of 100 Å) and energy-minimizing all side chains in the separated chains. Fig. 5 shows the Rosetta energies of the best 1000 docking decoys and their r.m.s.d.s from the dimer in the crystal structure. There is a strong convergence of the best-scoring docking solutions towards the orientation in the crystal: the orientations that are very close to the crystallographic dimer are also the most favourable energetically. Such a clear convergence is a strong indicator of the correctness of the predicted binding interface. On its own, convergence in docking simulations can only serve as an indicator that the system of interest does indeed form a complex. However, the presence of a dimer in the crystal structure and the known dimerization of two structurally and functionally closely related proteins, together with the computational docking results, suggest that the hCHAD dimers observed in solution correspond to the dimer assembly of the crystal structure.
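One simple way to put a number on the convergence described above is to ask what fraction of the lowest-energy decoys lie close to the reference dimer. The Python sketch below illustrates this idea only; it is not part of the Rosetta protocol used here, and the function name, the cutoff of 2 Å, the choice of ten decoys and both synthetic decoy sets are assumptions made for illustration.

```python
def funnel_fraction(decoys, n_best=10, rmsd_cutoff=2.0):
    """Fraction of the n_best lowest-energy decoys within rmsd_cutoff (angstroms).

    decoys is a list of (energy, rmsd) pairs; lower energy is better.
    """
    if not decoys:
        raise ValueError("no decoys supplied")
    best = sorted(decoys, key=lambda decoy: decoy[0])[:n_best]
    near_reference = sum(1 for _, rmsd in best if rmsd <= rmsd_cutoff)
    return near_reference / len(best)


# Synthetic decoy sets (hypothetical numbers, not the data behind Fig. 5).
# Funnelled landscape: energy rises steadily with distance from the reference.
funnelled = [(-100.0 + 5.0 * (0.03 * i), 0.03 * i) for i in range(1000)]
# Scrambled landscape: energy carries no information about the r.m.s.d.
scrambled = [(-100.0 + 0.1 * ((37 * i) % 1000), 0.03 * i) for i in range(1000)]

print(funnel_fraction(funnelled))
print(funnel_fraction(scrambled))
```

In the funnelled set all ten of the best-scoring decoys are near the reference, whereas in the scrambled set only one of them is. A value close to 1 corresponds to the kind of convergence seen for hCHAD, although the thresholds are arbitrary and should be chosen to suit the system.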

Figure 5
Result of protein–protein docking simulations with two hCHAD monomers placed randomly in space. The plot shows the complex energy (Rosetta energy units) versus the Cα r.m.s.d. to the dimer structure in the crystal for the top 1000 models. There is a clear convergence towards the crystal structure, indicating it to be a biologically relevant dimer.

3.4. hCHAD has a structured integrin-binding site

The crystal structure of hCHAD reveals a well ordered C-terminal cap structure unlike that observed in canonical SLRPs (i.e. class I, II or III SLRPs). In fact, the hCHAD capping structure conforms to the LRRCT motif found in most LRR proteins, rather than to the LRRCE motif unique to the canonical SLRPs (Park et al., 2008). In the absence of structural information, this part has previously been consistently considered as mostly unstructured and often described as two large loops, each closed by a disulfide bond, although sequence analysis suggested that it and that of nyctalopin might fall into the same structural group as the Nogo receptor and GPIbα. This region carries the binding site for α2β1 integrin and is hence of functional importance for cell adhesion. Previous studies showed that a short peptide comprising residues 306–318 (CQLRGLRRWLEAK) contains the integrin-binding site (Haglund et al., 2011). Using scrambled sequence variants, the WLEAK sequence could be identified as crucial for integrin binding. The hCHAD structure shows that residues 306–317 are precisely those that form the C-terminal α-helix (Fig. 6). Hence, the integrin-binding motif is likely to be structured in vivo. Trp314 and Leu315 from the WLEAK sequence are packed against the last LRR, and Trp314 is equivalent to those residues previously identified as important for the hydrophobic core of the LRRCT cap (Park et al., 2008). Trp314 can potentially form contacts with a binding partner upon the reorientation of a surface residue that covers one side of the tryptophan ring. In chains A and B of the asymmetric unit this residue is Lys318; in chains C and D it is Arg321. These differences are owing to crystal contacts between chains A and B and symmetry-related copies of the same chain, which affect the conformation of the loop immediately following the α-helix (residues 318–324). In particular, Lys318 is positioned partly by a salt bridge to Glu89 in the symmetry-related neighbour. These differences between hCHAD chains indicate conformational plasticity in the C-terminal cap that may be functionally relevant. However, the more likely conformation in solution is that seen in chains C and D, where Arg321 stacks on Trp314. This is owing both to the better energetics of the cation–π interaction and to the fact that this conformation is unaffected by crystal contacts. If hCHAD does not undergo major structural changes upon integrin binding, the interface would be tightly interlocked because the tryptophan would lie at the bottom of a cleft between the 11th LRR and the α-helix. Integrin would need to fit into this cleft if the Trp314 side chain is indeed part of the interface. In contrast, Glu316 from the WLEAK motif is exposed in all four copies of hCHAD and is involved in a salt bridge to Arg312 on the same side of the helix. In this position it could potentially coordinate the Mg²⁺ ion in α2β1 integrin. Whether integrin binds to the structured helix or whether the motif adopts a different conformation upon binding remains to be investigated.

Figure 6
Structure of the integrin-binding site. (a) Position and conformation of the WLEAK motif (yellow) in the LRRCT capping structure. (b, c) Electron density around the WLEAK motif shown from different angles. Density maps were generated at a contour level of 1σ.

4. Discussion

Despite the solution of an increasing number of leucine-rich repeat protein structures, molecular replacement remains challenging in the absence of close homologues. There has been progress in structural modelling of protein repeats using repeat-type specific knowledge (Kajava, 2001). Here, we demonstrate for the first time that general de novo structure prediction can yield models of leucine-rich repeat proteins that are suitable for obtaining high-resolution molecular-replacement solutions. To our knowledge, this is only the third structure that has been solved using de novo structure prediction and AMPLE. The two structures previously solved using this approach were the coiled-coil domain of the Nipah virus phosphoprotein (Bruhn et al., 2014) and the S-adenosyl-L-methionine-dependent methyltransferase Ecm18 (Hotta et al., 2014), which has a size similar to that of CHAD. However, to our knowledge hCHAD is the first structure of a β-rich protein to be solved de novo. All-β proteins were identified as the most difficult structural class to tackle using these methods, possibly owing to the fundamental variability of the β-sheet tertiary structure. It could be the case that this potential structural variability was compensated for by the extreme regularity of the hCHAD structure (e.g. in LRRs 2–7, which have almost identical loop conformations). This could enable de novo modelling algorithms to produce unusually accurate structures. In contrast, even the minor differences in the LRR length and structure of the loop regions compared with decorin and other homologues were possibly enough to foil conventional MR at a sequence-identity level of 30%. Table 2 shows a list of the 20 top structural homologues identified by a DALI (Holm & Rosenström, 2010) search against the PDB90 subset of PDB entries using the refined model. None of these has an r.m.s.d. of less than 1.5 Å to the final structure for residues 35–230. Thus, the structural optimization performed in Rosetta has clearly had an influence on the quality of the search model.
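For readers less familiar with the r.m.s.d. values quoted above, the quantity is the square root of the mean squared distance between equivalent atoms. The Python sketch below computes it for two coordinate sets that are assumed to be already superimposed and matched residue by residue; it performs no alignment or superposition of its own, unlike DALI, and the function name and coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (angstroms) between paired (x, y, z) positions.

    Both lists must already be superimposed and in the same residue order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Hypothetical C-alpha positions spaced 3.8 A apart along x (illustration only).
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
# The same trace displaced rigidly by 1.5 A along y.
displaced = [(x, y + 1.5, z) for x, y, z in reference]

print(rmsd(reference, displaced))
```

In practice the value depends strongly on which residues are paired and on how the superposition is done, which is why the tables report the number of aligned residues alongside each r.m.s.d.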

It should be stressed that models from de novo structure prediction are not necessarily the only way to address difficult problems such as that described here. A plethora of methods for generating input structures for MR exist. For example, alternative structure-prediction methods such as homology modelling can be used, or phasing may succeed using small fragments as implemented in ARCIMBOLDO. Furthermore, it cannot be ruled out that even more exhaustive trials using conventional MR methodology could eventually have been successful. Our post hoc control MR attempt produced solutions that were close to, but significantly worse than, the Rosetta solution, with chains misoriented with respect to each other both in rotation and translation. This result suggests that conventional MR would have been difficult even under the best conditions and that the use of a large ensemble of energy-minimized all-atom models with the correct sequence was key to the success of MR in this case. In any case, the successful structure solution of a relatively large all-β protein such as hCHAD using de novo models underlines the increasing utility of protein structure prediction as one alternative to the use of homologous structures for MR. Moreover, the combination of Rosetta structure prediction and AMPLE yielded a solution without requiring any parameter adjustments. Therefore, an MR solution was obtained with relatively little time and effort.

Rosetta employs a fragment-insertion strategy that uses pre-generated fragments based on sequence homology. Thus, the availability of structural homologues with similar sequences can help to find a more correct solution. Here, fragments from two relatively close homologues (PDB entries 2z81 and 2o6q) were part of the fragment sets. A control modelling run with homologues excluded (performed after the structure was solved) resulted in six fewer models with an r.m.s.d. of <1.5 Å than did the initial modelling. In general, the likelihood of successful modelling will be reduced if less similar structures are available. However, extensive benchmarking and the biennial CASP competitions, as well as our control modelling, show that accurate structures can be predicted even in the absence of homologues.

Like decorin and biglycan, CHAD forms dimers in solution, as seen by size-exclusion chromatography, SDS–PAGE and Western blotting of CHAD (Larsson et al., 1991; Månsson et al., 2001; Pramhed et al., 2008). Thus, the dimeric assembly observed in the crystal structure probably corresponds to the native dimers, although it may in principle be the result of frequently observed weak crystal contacts and hence differ from the native dimer in solution. Interestingly, the docking simulations identified the crystal dimer with high accuracy. In the absence of convergence towards a single state, it is difficult to judge whether two proteins interact natively, because the native interaction might not be correctly sampled or its energy might be evaluated wrongly owing to deficiencies in the available energy functions. However, when convergence towards a single state is seen, one can be more confident, as the presence of a 'funnel' correlates well with relatively high affinities (Gray et al., 2003). Moreover, we find it highly unlikely that an artificial dimer that strongly resembles the oligomers of functionally and structurally closely related proteins would form by coincidence.

The structure obtained highlights the structural and functional redundancy of extracellular matrix proteins; both the overall structure and the oligomerization behaviour are remarkably similar between chondroadherin, decorin and biglycan. Here, we show that the previously identified integrin-binding motif of hCHAD is a well ordered α-helix that forms an integral part of a classical LRRCT C-terminal cap. Previously, the design of peptides from the hCHAD C-terminus was based on the assumption that the C-terminus consists of loops. The specific binding of these peptides to integrin suggests that the integrin-binding motif can adopt the correct conformation even in the absence of the rest of the protein. However, binding a structured linear epitope with a stable conformation within a protein comes at a lower entropic cost than binding the same epitope in a flexible peptide. This means that the binding affinity in the second case is likely to be lower than when binding the epitope in its native context. The structural information on the C-terminal domain presented here can now be used to improve existing therapeutic peptides. For example, the cyclic peptide 306–326 investigated by Capulli et al. (2014) is based on the idea that the peptide will cyclize through the originally proposed disulfide-bonding pattern Cys306–Cys326 (Neame et al., 1994), whereas the true disulfide is between Cys304 and Cys326. In the crystal structure Cys306 is about 13 Å from Cys326, so forcing this pair together may distort the peptide conformation in solution. Interestingly, Haglund and coworkers detected higher affinity when using a linear peptide version, which might be owing to the release of the otherwise non-native conformation (Haglund et al., 2011). In general, the peptides are most likely to be very flexible. They lack the hydrophobic core of the C-terminal cap that is formed by the inside of the α-helix along with residues in loops C-terminal to the peptide, most importantly Ile337 and Phe343. Inclusion of more residues from the C-terminal cap, or utilizing the correct disulfide pattern in therapeutic peptides, might improve their potency by more closely mimicking the native structure.
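The entropic argument made above for a flexible peptide versus the structured epitope can be made explicit with standard binding thermodynamics. The relations below are textbook expressions rather than results from this study; they assume a 1 M standard state and, as a simplification, the same binding enthalpy for the peptide and for the intact protein.

```latex
\begin{align}
\Delta G^{\circ}_{\mathrm{bind}} &= \Delta H^{\circ} - T\,\Delta S^{\circ}, &
K_{\mathrm{d}} &= \exp\!\left(\frac{\Delta G^{\circ}_{\mathrm{bind}}}{RT}\right), \\
\frac{K_{\mathrm{d}}^{\mathrm{peptide}}}{K_{\mathrm{d}}^{\mathrm{protein}}}
  &= \exp\!\left(\frac{\Delta\Delta G^{\circ}}{RT}\right), &
\Delta\Delta G^{\circ} &= -T\left(\Delta S^{\circ}_{\mathrm{peptide}} - \Delta S^{\circ}_{\mathrm{protein}}\right).
\end{align}
```

Because a flexible peptide loses more conformational entropy on binding, its binding entropy is the more negative of the two, so the free-energy difference is positive and the peptide has the larger dissociation constant. At 298 K, where RT is about 0.59 kcal/mol, each additional 1.36 kcal/mol of entropic penalty corresponds to a roughly tenfold weaker affinity.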

Footnotes

Present address: Department of Immunology and Microbial Science, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037, USA.

§Present address: Euro Diagnostica AB, PO Box 50117, SE-202 11 Malmö, Sweden.

Acknowledgements

We wish to express our gratitude to the late Professor Dick Heinegård for his contribution to the initial stages of this project and Dr Ingemar André (Lund University) for providing computational infrastructure as well as the encouragement to try MR with modelled LRR proteins. We thank the staff at the MAX-lab macromolecular crystallography beamlines for access and technical assistance.

References

Batista, M. A., Nia, H. T., Önnerfjord, P., Cox, K. A., Ortiz, C., Grodzinsky, A. J., Heinegård, D. & Han, L. (2014). Matrix Biol. 38, 84–90.
Bibby, J., Keegan, R. M., Mayans, O., Winn, M. D. & Rigden, D. J. (2012). Acta Cryst. D68, 1622–1631.
Bradley, P., Misura, K. M. S. & Baker, D. (2005). Science, 309, 1868–1871.
Bruhn, J. F., Barnett, K. C., Bibby, J., Thomas, J. M. H., Keegan, R. M., Rigden, D. J., Bornholdt, Z. A. & Saphire, E. O. (2014). J. Virol. 88, 758–762.
Camper, L., Heinegård, D. & Lundgren-Åkerlund, E. (1997). J. Cell Biol. 138, 1159–1167.
Capulli, M., Olstad, O. K., Önnerfjord, P., Tillgren, V., Muraca, M., Gautvik, K. M., Heinegård, D., Rucci, N. & Teti, A. (2014). J. Bone Miner. Res. 29, 1833–1846.
Cowtan, K. (2008). Acta Cryst. D64, 83–89.
Das, R. & Baker, D. (2009). Acta Cryst. D65, 169–175.
DiMaio, F. (2013). Acta Cryst. D69, 2202–2208.
DiMaio, F., Terwilliger, T. C., Read, R. J., Wlodawer, A., Oberdorfer, G., Wagner, U., Valkov, E., Alon, A., Fass, D., Axelrod, H. L., Das, D., Vorobiev, S. M., Iwaï, H., Pokkuluri, P. R. & Baker, D. (2011). Nature (London), 473, 540–543.
Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. (2010). Acta Cryst. D66, 486–501.
Gray, J. J., Moughon, S., Wang, C., Schueler-Furman, O., Kuhlman, B., Rohl, C. A. & Baker, D. (2003). J. Mol. Biol. 331, 281–299.
Haglund, L., Tillgren, V., Addis, L., Wenglén, C., Recklies, A. & Heinegård, D. (2011). J. Biol. Chem. 286, 3925–3934.
Haglund, L., Tillgren, V., Önnerfjord, P. & Heinegård, D. (2013). J. Biol. Chem. 288, 995–1008.
He, X. L., Bazan, J. F., McDermott, G., Park, J. B., Wang, K., Tessier-Lavigne, M., He, Z. & Garcia, K. C. (2003). Neuron, 38, 177–185.
Hessle, L., Stordalen, G. A., Wenglén, C., Petzold, C., Tanner, E. K., Brorson, S.-H., Baekkevold, E. S., Önnerfjord, P., Reinholt, F. P. & Heinegård, D. (2013). PLoS One, 8, e63080.
Hocking, A. M., Shinomura, T. & McQuillan, D. J. (1998). Matrix Biol. 17, 1–19.
Holm, L. & Rosenström, P. (2010). Nucleic Acids Res. 38, W545–W549.
Hotta, K., Keegan, R. M., Ranganathan, S., Fang, M., Bibby, J., Winn, M. D., Sato, M., Lian, M., Watanabe, K., Rigden, D. J. & Kim, C.-Y. (2014). Angew. Chem. Int. Ed. 53, 824–828.
Iozzo, R. V. & Murdoch, A. D. (1996). FASEB J. 10, 598–614.
Iozzo, R. V. & Schaefer, L. (2015). Matrix Biol. 42, 11–55.
Islam, M., Gor, J., Perkins, S. J., Ishikawa, Y., Bächinger, H. P. & Hohenester, E. (2013). J. Biol. Chem. 288, 35526–35533.
Jenkins, J. & Pickersgill, R. (2001). Prog. Biophys. Mol. Biol. 77, 111–175.
Kajava, A. V. (2001). J. Struct. Biol. 134, 132–144.
Keegan, R. M. & Winn, M. D. (2008). Acta Cryst. D64, 119–124.
Krissinel, E. & Henrick, K. (2007). J. Mol. Biol. 372, 774–797.
Larsson, T., Sommarin, Y., Paulsson, M., Antonsson, P., Hedbom, E., Wendel, M. & Heinegård, D. (1991). J. Biol. Chem. 266, 20428–20433.
Leaver-Fay, A. et al. (2011). Methods Enzymol. 487, 545–574.
Le Goff, M. M., Sutton, M. J., Slevin, M., Latif, A., Humphries, M. J. & Bishop, P. N. (2012). J. Biol. Chem. 287, 28027–28036.
Long, F., Vagin, A. A., Young, P. & Murshudov, G. N. (2008). Acta Cryst. D64, 125–132.
Lupyan, D., Leo-Macias, A. & Ortiz, A. R. (2005). Bioinformatics, 21, 3255–3263.
Månsson, B., Wenglén, C., Mörgelin, M., Saxne, T. & Heinegård, D. (2001). J. Biol. Chem. 276, 32883–32888.
McCoy, A. J., Grosse-Kunstleve, R. W., Adams, P. D., Winn, M. D., Storoni, L. C. & Read, R. J. (2007). J. Appl. Cryst. 40, 658–674.
McEwan, P. A., Scott, P. G., Bishop, P. N. & Bella, J. (2006). J. Struct. Biol. 155, 294–305.
Millán, C., Sammito, M. & Usón, I. (2015). IUCrJ, 2, 95–105.
Neame, P. J., Sommarin, Y., Boynton, R. E. & Heinegård, D. (1994). J. Biol. Chem. 269, 21547–21554.
O'Meara, M. J., Leaver-Fay, A., Tyka, M., Stein, A., Houlihan, K., DiMaio, F., Bradley, P., Kortemme, T., Baker, D., Snoeyink, J. & Kuhlman, B. (2015). J. Chem. Theory Comput. 11, 609–622.
Önnerfjord, P., Khabut, A., Reinholt, F. P., Svensson, O. & Heinegård, D. (2012). J. Biol. Chem. 287, 18913–18924.
Park, H., Huxley-Jones, J., Boot-Handford, R. P., Bishop, P. N., Attwood, T. K. & Bella, J. (2008). BMC Genomics, 9, 599.
Park, K., Shen, B. W., Parmeggiani, F., Huang, P.-S., Stoddard, B. L. & Baker, D. (2015). Nature Struct. Mol. Biol. 22, 167–174.
Parker, R., Mercedes-Camacho, A. & Grove, T. Z. (2014). Protein Sci. 23, 790–800.
Parmeggiani, F., Huang, P.-S., Vorobiev, S., Xiao, R., Park, K., Caprari, S., Su, M., Seetharaman, J., Mao, L., Janjua, H., Montelione, G. T., Hunt, J. & Baker, D. (2015). J. Mol. Biol. 427, 563–575.
Pramhed, A., Addis, L., Tillgren, V., Wenglén, C., Heinegård, D. & Logan, D. T. (2008). Acta Cryst. F64, 516–519.
Rämisch, S., Lizatović, R. & André, I. (2015). Acta Cryst. D71, 606–614.
Rämisch, S., Weininger, U., Martinsson, J., Akke, M. & André, I. (2014). Proc. Natl Acad. Sci. USA, 111, 17875–17880.
Read, R. J., Adams, P. D. & McCoy, A. J. (2013). Acta Cryst. D69, 176–183.
Rodríguez, D., Sammito, M., Meindl, K., de Ilarduya, I. M., Potratz, M., Sheldrick, G. M. & Usón, I. (2012). Acta Cryst. D68, 336–343.
Rohl, C. A., Strauss, C. E. M., Misura, K. M. S. & Baker, D. (2004). Methods Enzymol. 383, 66–93.
Rucci, N., Capulli, M., Olstad, O. K., Önnerfjord, P., Tillgren, V., Gautvik, K. M., Heinegård, D. & Teti, A. (2015). Cancer Lett. 358, 67–75.
Scott, P. G., Dodd, C. M., Bergmann, E. M., Sheehan, J. K. & Bishop, P. N. (2006). J. Biol. Chem. 281, 13324–13332.
Scott, P. G., Grossmann, J. G., Dodd, C. M., Sheehan, J. K. & Bishop, P. N. (2003). J. Biol. Chem. 278, 18353–18359.
Scott, P. G., McEwan, P. A., Dodd, C. M., Bergmann, E. M., Bishop, P. N. & Bella, J. (2004). Proc. Natl Acad. Sci. USA, 101, 15633–15638.
Shen, Z., Gantcheva, S., Månsson, B., Heinegård, D. & Sommarin, Y. (1998). Biochem. J. 330, 549–557.
Stein, N. (2008). J. Appl. Cryst. 41, 641–643.
Terwilliger, T. C., DiMaio, F., Read, R. J., Baker, D., Bunkóczi, G., Adams, P. D., Grosse-Kunstleve, R. W., Afonine, P. V. & Echols, N. (2012). J. Struct. Funct. Genomics, 13, 81–90.
Terwilliger, T. C., Grosse-Kunstleve, R. W., Afonine, P. V., Moriarty, N. W., Zwart, P. H., Hung, L.-W., Read, R. J. & Adams, P. D. (2008). Acta Cryst. D64, 61–69.
Vagin, A. & Teplyakov, A. (2010). Acta Cryst. D66, 22–25.
Wiberg, C., Heinegård, D., Wenglén, C., Timpl, R. & Mörgelin, M. (2002). J. Biol. Chem. 277, 49120–49126.
Wiberg, C., Klatt, A. R., Wagener, R., Paulsson, M., Bateman, J. F., Heinegård, D. & Mörgelin, M. (2003). J. Biol. Chem. 278, 37698–37704.
Winn, M. D. et al. (2011). Acta Cryst. D67, 235–242.
Winn, M. D., Isupov, M. N. & Murshudov, G. N. (2001). Acta Cryst. D57, 122–133.
Zhang, Y. & Skolnick, J. (2004). J. Comput. Chem. 25, 865–871.

© International Union of Crystallography. Prior permission is not required to reproduce short quotations, tables and figures from this article, provided the original authors and source are cited.
