research papers

STRUCTURAL BIOLOGY
ISSN: 2059-7983

Identification and characterization of two classes of G1 β-bulge

College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow G12 8QQ, United Kingdom
*Correspondence e-mail: david.leader@glasgow.ac.uk

Edited by B. Kobe, University of Queensland, Australia (Received 15 September 2020; accepted 24 November 2020; online 26 January 2021)

In standard β-bulges, a residue in one strand of a β-sheet forms hydrogen bonds to two successive residues ('1' and '2') of a second strand. Two categories, 'classic' and 'G1' β-bulges, are distinguished by their dihedral angles: 1,2-αRβR (classic) or 1,2-αLβR (G1). It had previously been observed that G1 β-bulges are most often found as components of two quite distinct composite structures, suggesting that a basis for further differentiation might exist. Here, it is shown that two subtypes of G1 β-bulges, G1α and G1β, may be distinguished by their conformation (αR or βR) at residue '0' of the second strand. β-Bulges that are constituents of the composite structure named the β-bulge loop are of the G1α type, whereas those that are constituents of the composite structure named β-link here are of the G1β type. A small proportion of G1β β-bulges, but not G1α β-bulges, occur in other contexts. There are distinctive differences in amino-acid composition and sequence pattern between these two types of G1 β-bulge which may have practical application in protein design.

1. Introduction

The β-bulge was first described (Richardson et al., 1978) as a small motif in which, in its commonest and standard form, a residue ('X') in one antiparallel strand of a β-sheet forms main chain–main chain hydrogen bonds to two successive residues ('1' and '2') of a second strand instead of making both hydrogen bonds to a single residue. This disrupts the regular β-sheet so that a bulge occurs, in some cases ending the participation of one or both of the strands in the sheet. Originally, two main types of β-bulge were distinguished: the classic β-bulge, with an αR conformation at position 1, and the G1 β-bulge, with an αL conformation at position 1 (Richardson et al., 1978). (The definitions of αR, αL etc. used here can be found in Section 5.) The name G1 derives from the frequent, but not invariable (Chan et al., 1993), occurrence of glycine at this position. Other variants ('wide', 'bent' and 'special') have been described (Chan et al., 1993) but are much less frequent, at only 10% of all β-bulges (Craveur et al., 2013).

There has recently been renewed interest in β-bulges because the inclusion of both classic (Marcos et al., 2017) and G1 (Dou et al., 2018) β-bulges in protein design has proved to be necessary to achieve certain structural features. It was originally observed that G1 β-bulges occurred in the context of two quite different composite structures: the β-bulge loop (Milner-White, 1987) and what we call the β-link [a structure incorporating a β-bulge and a type II β-turn (Venkatachalam, 1968) directed away from the β-sheet (Richardson et al., 1978)]. The question arises whether features of G1 β-bulges exist that favour the formation of one or the other of these composites and, if so, whether this information can be used in the design of synthetic proteins. We show here that by considering the conformation of the amino-acid residue N-terminal to the doubleton of the G1 β-bulge, such a distinction can be made.

2. Materials and methods

This work employed two MySQL relational databases that modelled the atoms, residues and hydrogen bonds in different sets of proteins. The smaller one, Protein Motif, which was used in the initial phases of this work (Leader & Milner-White, 2009), contains information on 417 globular proteins from the 500 Protein Data Bank files from the Richardson laboratory (Lovell et al., 2003). (Not all proteins in this and the larger data set were used because some contained duplicated amino-acid positions and other nonstandard features that conflicted with our database schema, causing them to be rejected.) Secondary-structure information and φ and ψ dihedral angles of residues were derived using DSSP (Kabsch & Sander, 1983), whereas for the χ and ω angles we utilized BBDEP (Dunbrack & Karplus, 1993). Backbone and inter-residue hydrogen bonds were derived using HBPlus (McDonald & Thornton, 1994).

The Protein Motif database was populated with a range of motifs derived from SQL queries specifying residue numbers and identities, dihedral angles and hydrogen bonds. For β-bulges the initial specification for the query was two consecutive residues (1 and 2) with a hydrogen bond between the main-chain CO of residue 1 and the main-chain NH of a third residue (X) and a hydrogen bond between the main-chain NH of residue 2 and the main-chain CO of residue X. A further stipulation was that residue 2 should have the βR conformation (defined in Section 5). These β-bulges were divided into two classes: 1,2-αRβR (classic) and 1,2-αLβR (G1).
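The query specification above can be illustrated with a minimal sketch. The two-table schema, the column names, the ASCII conformation codes ('aR', 'aL', 'bR') and the toy rows are all hypothetical and are not those of the Protein Motif database; SQLite is used here only so that the example is self-contained, whereas the work itself used MySQL.

```python
import sqlite3

# Hypothetical, simplified schema (an assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE residue (chain TEXT, num INTEGER, conf TEXT);
-- One row per main-chain hydrogen bond: NH of donor to CO of acceptor.
CREATE TABLE hbond (chain TEXT, donor INTEGER, acceptor INTEGER);
""")

# Toy data: residues 10/11 and 50/51 are doubletons; 30 and 70 are the X residues.
conn.executemany("INSERT INTO residue VALUES (?, ?, ?)", [
    ("A", 9, "bR"), ("A", 10, "aL"), ("A", 11, "bR"), ("A", 30, "bR"),
    ("A", 50, "aR"), ("A", 51, "bR"), ("A", 70, "bR"),
])
conn.executemany("INSERT INTO hbond VALUES (?, ?, ?)", [
    ("A", 30, 10),  # NH of X  to CO of residue 1
    ("A", 11, 30),  # NH of residue 2 to CO of X
    ("A", 70, 50),
    ("A", 51, 70),
])

# Residues 1 and 2 are consecutive, CO(1) and NH(2) both bond to X,
# residue 2 is bR, and the conformation of residue 1 decides the class.
QUERY = """
SELECT r1.num, r2.num, hx.donor,
       CASE r1.conf WHEN 'aR' THEN 'classic' ELSE 'G1' END AS bulge_type
FROM residue r1
JOIN residue r2 ON r2.chain = r1.chain AND r2.num = r1.num + 1
JOIN hbond hx ON hx.chain = r1.chain AND hx.acceptor = r1.num
JOIN hbond h2 ON h2.chain = r1.chain AND h2.donor = r2.num
             AND h2.acceptor = hx.donor
WHERE r2.conf = 'bR' AND r1.conf IN ('aR', 'aL')
ORDER BY r1.num
"""
bulges = conn.execute(QUERY).fetchall()
for res1, res2, res_x, bulge_type in bulges:
    print(res1, res2, res_x, bulge_type)
```

Each returned row gives the residue numbers of positions 1, 2 and X together with the class assigned from the conformation at position 1.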

This database is part of the public web application Motivated Proteins (Leader & Milner-White, 2009) incorporating the molecular viewer Jmol (Herráez, 2006) and is also part of the desktop application Structure Motivator (Leader & Milner-White, 2012). Motivated Proteins allows the visualization of individual motifs in the context of the protein, whereas Structure Motivator allows the visualization of dihedral angles at different motif positions.

The second, larger, database, Proteins4K, was constructed specifically for this work. It contains information on 4485 globular proteins from the 'Top 8000' filtered structures from the Richardson laboratory (https://kinemage.biochem.duke.edu/databases/top8000.php). It was built using the same pipeline as Protein Motif, except that a script, dihedral.pl, kindly provided by Roland Dunbrack, was used instead of BBDEP. We used Proteins4K for command-line queries and populated it with β-bulges and the composite motifs encompassing them: β-bulge loops and β-links. The SQL queries for β-bulges made the same hydrogen-bond specifications as above, with the inclusion of dihedral angles at positions 0, 1 and 2 to provide subclasses.

Our approach differs from others employed to study structural motifs, such as the PROMOTIF program (Hutchinson & Thornton, 1996). Although computationally less powerful than dedicated programs written in a language such as Fortran, SQL queries of a relational database modelling protein structure were used because of their flexibility. Regardless of the motifs that already populate the database, one can quickly retrieve and visualize information about constructs that suggest themselves in the course of an investigation.

3. Results and discussion

3.1. Differentiation between the G1 β-bulges in β-bulge loops and β-links

The relational database of protein structural information, Protein Motif (Leader & Milner-White, 2009; see Section 2), containing 417 proteins was used for our initial work and for that in Fig. 2. In addition to primary data, it is populated with derived small structural motifs, including the β-bulge loop (Milner-White, 1987) and the β-link. The latter is a composite of a β-bulge and a type II β-turn where the 1,2-positions of the β-bulge constitute the 3,4-positions of the β-turn (Fig. 1). [The β-link was originally described by Richardson et al. (1978), but was not named by them and has been somewhat neglected until recently.]

Figure 1
The different types of β-bulges and their relationship to composite structures. The singleton is designated 'X' and the doubleton residues as '1' and '2' in the N to C direction. In the diagrams, inter-main-chain hydrogen bonds are represented as broken lines, with the red circles representing O atoms and the blue circles representing N atoms. (a) Differentiation of classic and G1 β-bulges on the basis of their conformation at position 1 (yellow). (b) Subclassification of G1 β-bulges into types G1α and G1β on the basis of their conformation at position 0 (light blue). (c) Relationship of G1 β-bulges to larger composite structures: the β-bulge loop-5 and the β-link. Representative backbone three-dimensional structures of the composites in the context of two β-strands are shown below the diagrams, with the four residues of the β-bulge indicated in CPK colours and other residues in white: β-bulge loop (PDB entry 1a2p; Martin et al., 1999) and β-link (PDB entry 2sak; Rabijns et al., 1997).

While visualizing the dihedral angles of β-bulge loops and β-links as Ramachandran plots in the desktop application Structure Motivator (Leader & Milner-White, 2012), it became evident that the G1 β-bulges belonging to these two composite motifs differed at what would be position '0', N-terminal to the doubleton. In the β-bulge loop this had the αR conformation, whereas in the β-link it had the βR conformation. When modified versions of β-bulges, extended to include position '0', were viewed in the Structure Motivator application, two separate distributions of dihedral angles were apparent (Fig. 2).

Figure 2
Main-chain dihedral angles at different positions of G1 β-bulges and composite structures containing them. The results are for the motifs present in the Protein Motif data set of 417 proteins visualized with Structure Motivator. The numbers at the top right of each frame indicate the residue position in the nomenclature of Fig. 1. (Position −1 of β-bulge loop-6 is not shown.)

We have therefore altered the definition of the β-bulge to include position '0' and have subdivided the G1 β-bulges into two classes: G1α, 0,1,2-αRαLβR, and G1β, 0,1,2-βRαLβR. These are illustrated diagrammatically in Fig. 1(b). Fig. 1(c) shows examples of the two composite motifs within protein structures.
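As an illustration of this extended definition, the mapping from the conformations at positions 0, 1 and 2 to a class name might be sketched as follows. The function names and the returned label strings are hypothetical conveniences; only the classes classic, G1α and G1β are defined in the text.

```python
def shorthand(conf0: str, conf1: str, conf2: str) -> str:
    """Build the shorthand used in the text, e.g. '0,1,2-αRαLβR'."""
    return f"0,1,2-{conf0}{conf1}{conf2}"


def bulge_class(conf0: str, conf1: str, conf2: str) -> str:
    """Name a standard β-bulge from the conformations at positions 0, 1, 2.

    Assumes the hydrogen-bonding pattern to residue X has already been
    checked; only the dihedral-angle part of the definition is applied.
    """
    if conf2 != "βR":
        return "not a standard β-bulge"
    if conf1 == "αR":
        return "classic"
    if conf1 == "αL":
        # Position 0 separates the two G1 subtypes.
        return {"αR": "G1α", "βR": "G1β"}.get(conf0, "G1 (other)")
    return "not a standard β-bulge"


for confs in [("αR", "αL", "βR"), ("βR", "αL", "βR"), ("βR", "αR", "βR")]:
    print(shorthand(*confs), "->", bulge_class(*confs))
```

The label 'G1 (other)' simply collects G1 β-bulges whose position 0 is neither αR nor βR.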

3.2. Occurrence of G1 β-bulges outside β-bulge loops or β-links

Having established that the extended definition of G1 β-bulges allows one to distinguish those present in β-bulge loops from those in β-links, it was pertinent to ask whether β-bulges occurred in contexts other than these composites. We performed the following analysis using the tenfold larger database Proteins4K. We first queried the database for all β-bulges conforming to the pattern 0,1,2-θαRβR (classic) or 0,1,2-θαLβR (G1), stipulating that the pattern of hydrogen bonding to residue X be as in Fig. 1(a). (θ represents any of the four pairs of dihedral angles, αL, αR, βL and βR.) The numbers of instances of each of the eight subtypes so defined are given in the 'Total' column of Table 1. It can be seen that almost all classic β-bulges are of subtype 0,1,2-βRαRβR and that the vast majority of G1 β-bulges are of the subtypes G1α or G1β. (The proportions of these three types are included in Fig. 1.) As can be seen in Fig. 2, for any specification such as βR, the values of the dihedral angles found at different positions and in different motifs vary. Mean values for the major types of β-bulge are given in Supplementary Table S1.

Table 1
Occurrence of different types of β-bulge and their participation in composite motifs

Standard β-bulges were retrieved from a database of 4485 proteins by queries specifying the hydrogen-bonding pattern in Fig. 1(a) and the dihedral angles given in the Subtype column. Queries were made to determine the number of each subtype present in the two composite motifs indicated. For β-bulge loops this involved the additional specification that the singleton residue X was at position −2, −3 or −4 for β-bulge loop-5, loop-6 or loop-7, respectively. For β-links this involved the additional specification of a hydrogen bond between the peptide-bond O atom at position −1 and the peptide-bond N atom at position 2 in the numbering of Fig. 1(c).

| Type | Subtype | Total | β-Bulge loop | β-Link |
|---|---|---|---|---|
| 1,2-αRβR (classic) | 0,1,2-αRαRβR | 38 (<1%) | 25† | 0 |
| | 0,1,2-βRαRβR | 5133 (50%) | 2 | 0 |
| | 0,1,2-αLαRβR | 116 (1%) | 101‡ | 0 |
| | 0,1,2-βLαRβR | 8 (<1%) | 0 | 2 |
| 1,2-αLβR (G1) | 0,1,2-αRαLβR (G1α) | 3348 (33%) | 3312§ | 0 |
| | 0,1,2-βRαLβR (G1β) | 1506 (15%) | 2 | 1283 |
| | 0,1,2-αLαLβR | 128 (1%) | 123¶ | 68 |
| | 0,1,2-βLαLβR | 13 (<1%) | 0 | 0 |

†β-Bulge loop-5 (24 instances), β-bulge loop-6 (one instance).
‡β-Bulge loop-5 (88 instances), β-bulge loop-6 (13 instances).
§β-Bulge loop-5 (2154 instances), β-bulge loop-6 (1155 instances), β-bulge loop-7 (three instances).
¶β-Bulge loop-5 (81 instances), β-bulge loop-6 (8 instances), β-bulge loop-7 (33 instances).
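The counts in Table 1 can be converted into proportions with a few lines of arithmetic. The sketch below copies the counts from the table; the variable and function names are illustrative only.

```python
# Counts copied from Table 1: (total, in β-bulge loops, in β-links).
TABLE1 = {
    "0,1,2-αRαRβR": (38, 25, 0),
    "0,1,2-βRαRβR": (5133, 2, 0),
    "0,1,2-αLαRβR": (116, 101, 0),
    "0,1,2-βLαRβR": (8, 0, 2),
    "0,1,2-αRαLβR": (3348, 3312, 0),   # G1α
    "0,1,2-βRαLβR": (1506, 2, 1283),   # G1β
    "0,1,2-αLαLβR": (128, 123, 68),
    "0,1,2-βLαLβR": (13, 0, 0),
}


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


grand_total = sum(total for total, _, _ in TABLE1.values())

# Share of each subtype among all standard β-bulges (the 'Total' column).
share_of_all = {sub: percent(row[0], grand_total) for sub, row in TABLE1.items()}

# Fraction of each G1 subtype found in its characteristic composite motif.
g1a_total, g1a_loops, _ = TABLE1["0,1,2-αRαLβR"]
g1b_total, _, g1b_links = TABLE1["0,1,2-βRαLβR"]
g1a_in_loops = percent(g1a_loops, g1a_total)
g1b_in_links = percent(g1b_links, g1b_total)

print(grand_total)
print(f"G1α in β-bulge loops: {g1a_in_loops:.1f}%")
print(f"G1β in β-links: {g1b_in_links:.1f}%")
```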

The Proteins4K database was populated with these subclasses of β-bulges, which were then queried to determine the proportion in higher-order structures. To identify β-bulges in loops such as the β-bulge loop-5 (Fig. 1c), loop-6 or higher, the query was for the position of residue X relative to residue 1. To identify β-bulges in β-links the query was for a hydrogen bond between positions −1 and 2 of the β-bulge (Fig. 1c). The final two columns of Table 1 show that 99% of G1α β-bulges occur in β-bulge loops and 85% of G1β β-bulges occur in β-links. The 15% of G1β β-bulges that are not in β-links are considered below.
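The two membership tests described here might be expressed as in the following sketch. It assumes that all residues lie in one chain with consecutive integer numbering and that hydrogen bonds are supplied as (NH donor residue, CO acceptor residue) pairs; the function name and data layout are hypothetical and are not taken from the Proteins4K schema.

```python
from typing import Optional, Set, Tuple

# Position of X (in the numbering of Fig. 1c) for each β-bulge loop size.
LOOP_BY_X_POSITION = {-2: "β-bulge loop-5", -3: "β-bulge loop-6", -4: "β-bulge loop-7"}


def composite_motif(res1: int, res_x: int,
                    hbonds: Set[Tuple[int, int]]) -> Optional[str]:
    """Assign a β-bulge to a composite motif, or return None.

    res1   -- residue number of position 1 of the β-bulge
    res_x  -- residue number of the singleton X
    hbonds -- (donor NH residue, acceptor CO residue) pairs
    """
    res0 = res1 - 1
    x_position = res_x - res0  # position 0 has offset 0
    if x_position in LOOP_BY_X_POSITION:
        return LOOP_BY_X_POSITION[x_position]
    # β-link: CO at position -1 hydrogen bonded to NH at position 2.
    pos_minus1, pos2 = res1 - 2, res1 + 1
    if (pos2, pos_minus1) in hbonds:
        return "β-link"
    return None


# Illustrative residue numbers only (not taken from any PDB entry).
print(composite_motif(10, 7, set()))        # X three residues before position 1
print(composite_motif(20, 40, {(21, 18)}))  # X distant, turn hydrogen bond present
```

Testing for the loop first and the β-link second is a choice made for this sketch; the text does not specify an order.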

3.3. Amino-acid preferences of G1α and G1β β-bulges

Fig. 3 compares the amino-acid compositions of the main types of β-bulge at the four defining positions, 0, 1, 2 and X. Fig. 3(a) shows that although both G1α and G1β β-bulges have a high proportion of glycine at position 1, their amino-acid compositions differ considerably at the other positions. This is most marked for position X, where G1α β-bulges are rich in amino acids with side chains that have hydrogen-bonding potential (50% Asn/Asp, 15% Ser/Thr), whereas G1β β-bulges are rich in aliphatic amino acids at this position (63% Ala/Ile/Leu/Val). Position 0, the conformation of which differentiates the G1 β-bulges, also shows differences in amino-acid composition, with G1α β-bulges being rich in residues with polar side chains (73% Asn/Asp/Gln/Glu/Ser/Thr), whereas G1β β-bulges have 48% Ala/Lys/Pro/Val. A degree of similarity occurs at position 2, with both types of G1 β-bulge having many residues with polar side chains, although G1β β-bulges are enriched in aspartate (G1β, 21%; G1α, 4%).
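Position-specific compositions of this kind can be computed from a list of four-letter sequence patterns, as in the sketch below. The four input patterns are a toy example chosen only to exercise the function; the resulting percentages are not results from the Proteins4K database.

```python
from collections import Counter
from typing import Dict, Iterable

POSITIONS = ("0", "1", "2", "X")


def composition(patterns: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Percentage of each amino acid at positions 0, 1, 2 and X.

    Each pattern is a four-letter string in one-letter amino-acid code,
    ordered as positions 0, 1, 2, X.
    """
    patterns = list(patterns)
    result = {}
    for idx, pos in enumerate(POSITIONS):
        counts = Counter(p[idx] for p in patterns)
        total = sum(counts.values())
        result[pos] = {aa: 100.0 * n / total for aa, n in counts.items()}
    return result


toy_patterns = ["DGTN", "DGSN", "KGEN", "TGED"]  # toy input, not real frequencies
comp = composition(toy_patterns)
for pos in POSITIONS:
    print(pos, comp[pos])
```

Grouped figures such as 'Asn/Asp' follow by summing the relevant entries for a position, for example `comp["X"].get("N", 0) + comp["X"].get("D", 0)`.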

Figure 3
Amino-acid compositions of different classes of β-bulge in the Proteins4K database. The four positions 0, 1, 2, X are as shown in Fig. 1. The numbers in the figure are the percentages of the total for all 20 contributed by each individual amino-acid residue. (a) The three classes of β-bulge: classic (5133), G1α (3348) and G1β (1506). (b) Division of G1α β-bulges into those present in β-bulge loop-5 (BBL-5; 2154) and loop-6 (BBL-6; 1155) and of G1β β-bulges into those present in (1283) or absent from (223) β-links.

We separated G1α β-bulges into those that are components of β-bulge loop-5 and β-bulge loop-6 structures, and separated G1β β-bulges into those that are components of β-links and the 15% that are not. Their amino-acid compositions are shown in Fig. 3(b). It is evident that G1α β-bulges belonging to β-bulge loop-5 motifs have a higher proportion of glycine residues at position 1 than those in β-bulge loop-6 motifs, and that at position 0 their polar amino acids are skewed to aspartate and asparagine at the expense of threonine. G1β β-bulges within β-links are likewise enriched in glycine at position 1 compared with those not in β-links. Also noteworthy is that the enrichment in aspartate at position 2 of G1β β-bulges is confined to those in β-links.

Some of these differences in amino-acid composition can be rationalized in terms of constraints imposed by the composite structures of which G1 β-bulges are components. This is illustrated in Fig. 4. The polar side chain at position X of approximately 70% of G1α β-bulges (which is rare at this position in G1β β-bulges) may be involved in either backbone (Fig. 4a) or side-chain (Figs. 4b and 4c) hydrogen bonding within the β-bulge loop. In the case of G1β β-bulges additional side-chain hydrogen bonding is often found from a polar side chain at position 2 to the backbone NH or CO at position −1 (Figs. 4d, 4e and 4f). The amino-acid residue frequently involved is aspartate, which is much less abundant in G1β β-bulges that are not parts of β-links. Aspartate is equally rare at this position in G1α β-bulges. Hydrogen bonding by aspartate and asparagine side chains to nearby main-chain atoms has previously been observed in small motifs (Eswar & Ramakrishnan, 1999; Wan & Milner-White, 1999; Duddy et al., 2004). The greater abundance of glycine residues at position 1 in G1 β-bulges in the more tightly constrained β-bulge loop-5 motifs and β-links suggests a role in stabilizing the respective β-turns in these latter structures. It should also be mentioned that there is a clear difference in the distribution of dihedral angles found at position X of β-bulge loop-5 and β-bulge loop-6 motifs within the general βR region, as indicated by arrowheads and asterisks, respectively, in Fig. 2.

Figure 4
Examples of additional hydrogen bonding in composite structures incorporating G1 β-bulges: (a) G1α β-bulge loop (PDB entry 119l, residues 20–24; Blaber et al., 1993), (b) G1α β-bulge loop (PDB entry 1aqb, residues 124–128; Zanotti et al., 1998), (c) G1α β-bulge loop (PDB entry 1fnc, residues 239–243; Bruns & Karplus, 1995), (d) G1β β-link (PDB entry 1a62, residues 55, 92–95; Allison et al., 1998), (e) G1β β-link (PDB entry 1k7i, residues 318–321, 334; Hege & Baumann, 2001), (f) G1β β-link (PDB entry 1fdr, residues 9, 83–86; Ingelman et al., 1997).

3.4. Sequence patterns and heterogeneity of G1 β-bulges

The difference in the dihedral angles of G1α and G1β β-bulges enables one to distinguish them in proteins of known three-dimensional structure. In a similar way, a machine-learning approach allows one to assign the most probable structure of the two on the basis of amino-acid preferences (D. P. Leader, E. J. Milner-White & S. Rogers, unpublished work). However, in engineering proteins with specific subtypes of β-bulge a sequence of amino acids must be selected that is likely to produce the desired structure: a choice made from the many combinations of the most frequent amino acids in the four positions 0, 1, 2 and X.

Supplementary Table S2 contains a list of sequence patterns for the G1 β-bulges. Although the number of variants is large, it is instructive to examine the five that occur most frequently in each category, as shown in Table 2. For G1α β-bulges present in β-bulge loop-5 motifs, tripeptides for the 0, 1, 2 sequence of the type DG(S/T/N) are common, as expected from the amino-acid composition, and allow the selection of combinations with residue X that are uncommon in other subtypes. The frequent occurrence of the 0, 1, 2, X combination KGEN is less expected: it is as abundant as all other –GEN combinations in total. Its structure is shown in Fig. 4(c), with hydrogen bonds between the asparagine side chain at position X and the glutamate side-chain O atom and backbone NH group. The lysine residue is oriented away from the β-bulge hydrogen bonds towards the surface of the protein and, in all instances except one, does not interact with the carboxyl group of the glutamate. For the G1α β-bulges in β-bulge loop-6 motifs the most common sequences are consistent with the frequencies of amino acids.

Table 2
Sequence patterns for G1 β-bulges

The five most frequently occurring patterns are shown for each motif. The frequency is per thousand motifs, with the actual number of instances in parentheses. Where no instance of a sequence pattern was found for a particular motif the entry in the table has been left blank to facilitate comparison.

| Sequence (0, 1, 2, X) | G1α (BBL5)† | G1α (BBL6)‡ | G1β (β-link)§ | G1β (no β-link)¶ |
|---|---|---|---|---|
| KGEN | 11 (24) | | 2 (2) | |
| DGND | 10 (22) | | | |
| DGTN | 10 (21) | 1 (1) | | |
| DGSN | 9 (20) | 2 (2) | | |
| DGTL | 9 (19) | | | |
| TGED | 1 (2) | 16 (19) | | |
| TGKD | 2 (4) | 16 (18) | | |
| TGEN | | 10 (12) | | |
| TGRD | 0 (1) | 10 (11) | | |
| TGAD | 1 (3) | 9 (10) | | |
| PGDV | | | 21 (27) | |
| VGDV | | | 12 (16) | |
| EGDV | | | 9 (12) | |
| IGDV | | | 9 (12) | |
| KGDV | | | 9 (12) | |
| QNEL | | | | 13 (3) |
| ADVT | | | | 9 (2) |
| AGIT | | | | 9 (2) |
| AGVT | | | | 9 (2) |
| KDYY | | | | 9 (2) |

†G1α β-bulge within a β-bulge loop-5 (1143 unique patterns in 2153 motif occurrences).
‡G1α β-bulge within a β-bulge loop-6 (854 unique patterns in 1152 motif occurrences).
§G1β β-bulge within a β-link (824 unique patterns in 1283 motif occurrences).
¶G1β β-bulge not within a β-link (213 unique patterns in 223 motif occurrences).
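The per-thousand frequencies in Table 2 follow from the instance counts and the numbers of motif occurrences given in the footnotes. A minimal sketch of the calculation is given below; the function names are illustrative and the list passed to `top_patterns` is a toy input rather than data from the study.

```python
from collections import Counter
from typing import List, Tuple


def per_thousand(count: int, total: int) -> int:
    """Frequency of a pattern per thousand motifs, rounded to an integer."""
    return round(1000 * count / total)


def top_patterns(patterns: List[str], n: int = 5) -> List[Tuple[str, int, int]]:
    """Return the n most frequent patterns as (pattern, per thousand, count)."""
    counts = Counter(patterns)
    total = len(patterns)
    return [(seq, per_thousand(c, total), c) for seq, c in counts.most_common(n)]


# Counts and motif totals taken from Table 2 and its footnotes.
print(per_thousand(24, 2153))  # KGEN in β-bulge loop-5
print(per_thousand(19, 1152))  # TGED in β-bulge loop-6
print(per_thousand(27, 1283))  # PGDV in β-links
print(per_thousand(3, 223))    # QNEL outside β-links

# Toy list of patterns to show the ranking step.
print(top_patterns(["PGDV"] * 3 + ["VGDV"] * 2 + ["AGIT"], n=2))
```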

The situation for the majority of G1β β-bulges, those that form β-links, is that the most abundant combinations 0, 1, 2, X are of the type –GDV, as in the amino-acid compositions. The most frequent sequence pattern is PGDV, a reflection of proline being most frequent at position 0. What is not evident from Table 2 is that the amino acid at position −1 is either lysine or arginine in half of the 27 instances. The disposition of these side chains is towards the surface of the protein away from the β-bulge hydrogen bonds (Fig. 4f), resembling that of lysine at position 0 in the KGEN motif of G1α β-bulges. In this case, however, about half of the basic side chains interact with the carboxylate group of the aspartate. These observations are consistent with previous analysis of the distribution of amino acids in β-sheets, which showed that lysine and arginine are often found at the edges of sheets (Fujiwara et al., 2014), where most G1 β-bulges are located.

Although we believe that this analysis of sequence patterns will be useful in protein design, it is evident that other factors determine whether a particular pattern will be appropriate in any instance.

4. Conclusions

This work answers a longstanding question about G1 β-bulges by showing that there are two subtypes, G1α and G1β, which can be differentiated on the basis of the conformation at position 0. A reclassification of β-bulges on this basis has been implemented in the Protein Motif database and the publicly available web (Leader & Milner-White, 2009) and desktop (Leader & Milner-White, 2012) applications that incorporate it.

An important aspect of this reclassification is that these two types of G1 β-bulge are integral components of two different composite structures: G1α β-bulges in β-bulge loops and G1β β-bulges in β-links. G1α β-bulges and the loops containing them occur in different types of β-sheet as an alternative to the simple β-turn in β-hairpin and β-meander structures. In β-barrels, these loops may serve to reduce strain (Dou et al., 2018). The β-links (Richardson et al., 1978), in which the majority of G1β β-bulges reside, have received less attention, but our unpublished work shows that they are important in small β-barrels and in β-sandwich proteins. The analysis of G1 β-bulges in the present work should help to inform the design of engineered proteins in these categories.

5. Abbreviations

αR encompasses the range of dihedral angles −140° < φ < −20°, −90° < ψ < 40°, αL the range 20° < φ < 140°, −40° < ψ < 90°, βR the range 150° < φ or φ < −25°, 40° < ψ or ψ < −150° and βL the range 20° < φ < 140°, −180° < ψ < −80° (here the γL region is included within the αL region). These abbreviations are used in shorthand descriptions of β-bulges to indicate the conformations at residues 0, 1 and 2 on the 'bulged' strand: for example, 0,1,2-αRαLβR indicates a β-bulge in which residue 0 has the αR conformation, residue 1 has the αL conformation and residue 2 has the βR conformation.
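The ranges given above translate directly into a simple classifier, sketched below. The function name is hypothetical, angles are assumed to be in degrees within −180° to 180°, and returning None for angle pairs outside all four regions is a choice made for this sketch rather than something specified in the text.

```python
from typing import Optional


def conformation(phi: float, psi: float) -> Optional[str]:
    """Assign a (phi, psi) pair to αR, αL, βR or βL using the stated ranges.

    Angles are in degrees. Returns None when the pair falls outside
    all four regions.
    """
    if -140 < phi < -20 and -90 < psi < 40:
        return "αR"
    if 20 < phi < 140 and -40 < psi < 90:
        return "αL"  # includes the γL region, as noted in the text
    if (phi > 150 or phi < -25) and (psi > 40 or psi < -150):
        return "βR"
    if 20 < phi < 140 and -180 < psi < -80:
        return "βL"
    return None


# Shorthand for a residue triplet at positions 0, 1 and 2 (toy angles).
angles = [(-60.0, -45.0), (60.0, 40.0), (-120.0, 130.0)]
label = "0,1,2-" + "".join(conformation(phi, psi) or "?" for phi, psi in angles)
print(label)
```

With the toy angles shown, the three residues fall in the αR, αL and βR regions, so the label corresponds to the G1α pattern.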


References

Allison, T. J., Wood, T. C., Briercheck, D. M., Rastinejad, F., Richardson, J. P. & Rule, G. S. (1998). Nat. Struct. Mol. Biol. 5, 352–356.
Blaber, M., Lindstrom, J. D., Gassner, N., Xu, J., Heinz, D. W. & Matthews, B. W. (1993). Biochemistry, 32, 11363–11373.
Bruns, C. M. & Karplus, P. A. (1995). J. Mol. Biol. 247, 125–145.
Chan, A. W., Hutchinson, E. G., Harris, D. & Thornton, J. M. (1993). Protein Sci. 2, 1574–1590.
Craveur, P., Joseph, A. P., Rebehmed, J. & de Brevern, A. G. (2013). Protein Sci. 22, 1366–1378.
Dou, J., Vorobieva, A. A., Sheffler, W., Doyle, L. A., Park, H., Bick, M. J., Mao, B., Foight, G. W., Lee, M. Y., Gagnon, L. A., Carter, L., Sankaran, B., Ovchinnikov, S., Marcos, E., Huang, P.-S., Vaughan, J. C., Stoddard, B. L. & Baker, D. (2018). Nature, 561, 485–491.
Duddy, W. J., Nissink, J. W., Allen, F. H. & Milner-White, E. J. (2004). Protein Sci. 13, 3051–3055.
Dunbrack, R. L. Jr & Karplus, M. (1993). J. Mol. Biol. 230, 543–574.
Eswar, N. & Ramakrishnan, C. (1999). Protein Eng. 12, 447–455.
Fujiwara, K., Ebisawa, S., Watanabe, Y., Toda, H. & Ikeguchi, M. (2014). Proteins, 82, 1484–1493.
Hege, T. & Baumann, U. (2001). J. Mol. Biol. 314, 187–193.
Herráez, A. (2006). Biochem. Mol. Biol. Educ. 34, 255–261.
Hutchinson, E. G. & Thornton, J. M. (1996). Protein Sci. 5, 212–220.
Ingelman, M., Bianchi, V. & Eklund, H. (1997). J. Mol. Biol. 268, 147–157.
Kabsch, W. & Sander, C. (1983). Biopolymers, 22, 2577–2637.
Leader, D. P. & Milner-White, E. J. (2009). BMC Bioinformatics, 10, 60.
Leader, D. P. & Milner-White, E. J. (2012). BMC Struct. Biol. 12, 26.
Lovell, S. C., Davis, I. W., Arendall, W. B., de Bakker, P. I., Word, J. M., Prisant, M. G., Richardson, J. S. & Richardson, D. C. (2003). Proteins, 50, 437–450.
Marcos, E., Basanta, B., Chidyausiku, T. M., Tang, Y., Oberdorfer, G., Liu, G., Swapna, G. V. T., Guan, R., Silva, D.-A., Dou, J., Pereira, J. H., Xiao, R., Sankaran, B., Zwart, P. H., Montelione, G. T. & Baker, D. (2017). Science, 355, 201–206.
Martin, C., Richard, V., Salem, M., Hartley, R. & Mauguen, Y. (1999). Acta Cryst. D55, 386–398.
McDonald, I. K. & Thornton, J. M. (1994). J. Mol. Biol. 238, 777–793.
Milner-White, E. J. (1987). Biochim. Biophys. Acta, 911, 261–265.
Rabijns, A., De Bondt, H. L. & De Ranter, C. (1997). Nat. Struct. Mol. Biol. 4, 357–360.
Richardson, J. S., Getzoff, E. D. & Richardson, D. C. (1978). Proc. Natl Acad. Sci. USA, 75, 2574–2578.
Venkatachalam, C. M. (1968). Biopolymers, 6, 1425–1436.
Wan, W.-Y. & Milner-White, E. J. (1999). J. Mol. Biol. 286, 1633–1649.
Zanotti, G., Panzalorto, M., Marcato, A., Malpeli, G., Folli, C. & Berni, R. (1998). Acta Cryst. D54, 1049–1052.

This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.
