The β-link motif in protein architecture

The β-link is a composite protein motif consisting of a G1β β-bulge and a type II β-turn, and is generally found at the end of two adjacent strands of antiparallel β-sheet. The 1,2-positions of the β-bulge are also the 3,4-positions of the β-turn, with the result that the N-terminal portion of the polypeptide chain is orientated at right angles to the β-sheet. Here, it is reported that the β-link is frequently found in certain protein folds of the SCOPe structural classification at specific locations where it connects a β-sheet to another area of a protein. It is found at locations where it connects one β-sheet to another in the β-sandwich and related structures, and in small (four-, five- or six-stranded) β-barrels, where it connects two β-strands through the polypeptide chain that crosses an open end of the barrel. It is not found in larger (eight-stranded or more) β-barrels that are straightforward β-meanders. In some cases it initiates a connection between a single β-sheet and an α-helix. The β-link also provides a framework for catalysis in serine proteases, where the catalytic serine is part of a conserved β-link, and in cysteine proteases, including Mpro of human SARS-CoV-2, in which two residues of the active site are located in a conserved β-link.


Introduction
The two major architectural features of proteins, α-helix and β-sheet, occur where a suitable pair of dihedral angles is repeated along the backbone, leading to extensive hydrogen bonding. These features are often combined into higher order structures such as helix bundles, β-sandwiches and β-barrels. At the opposite level of complexity are small (three- to six-residue) structural elements, protein motifs, that are defined by combinations of dihedral angles and patterns of hydrogen bonding. Such motifs may cause changes in the direction of the polypeptide chain [for example, in the β-turn (Venkatachalam, 1968; Richardson, 1981; Wilmot & Thornton, 1988; Hutchinson & Thornton, 1994; Gunasekaran et al., 1998) and the Schellman loop (Schellman, 1980; Milner-White, 1988)] or produce indentations in protein surfaces that enable them to bind particular ligands [for example, in the nest (Watson & Milner-White, 2002; Afzal et al., 2014) and the crown bridge (Leader & Milner-White, 2015)].
The β-bulge is a small motif in which a single residue (X) on one β-strand forms hydrogen bonds to two successive residues (1 and 2; referred to as the 'doubleton') of a second, usually antiparallel, β-strand. Two broad categories of β-bulge have been defined: classic β-bulges, which have the αR conformation at position 1, and G1 β-bulges, with the αL conformation at this position (Richardson et al., 1978). We have recently shown that two classes of G1 β-bulge can be distinguished on the basis of the conformation of the residue (numbered 0) preceding the doubleton: G1α, where the conformation is αR, and G1β, where it is βR (Leader & Milner-White, 2021). This led to the recognition that the members of each class of G1 β-bulge most frequently occur as part of a composite with a specific second small hydrogen-bonded motif. In the case of the G1α β-bulge this composite is with a type I β-turn and is the well known β-bulge loop (Richardson et al., 1978; Milner-White, 1987; Chan et al., 1993; Blandl et al., 2003; Craveur et al., 2013). Almost all G1α β-bulges (99%) are found in β-bulge loops (Leader & Milner-White, 2021).
ISSN 2059-7983
In the case of the G1β β-bulge the composite is with a (hydrogen-bonded) type II β-turn, with the 1,2-positions of the β-bulge corresponding to the 3,4-positions of the β-turn (Fig. 1). The role of the β-turn in the composite can be regarded as making a hydrogen bond to ensure that the polypeptide chain not only enters or leaves the β-sheet at the position of the disruptive β-bulge, but that it does so in a direction that is perpendicular to the β-sheet. In this respect, it differs from most other small hydrogen-bonded motifs, which result in either turns or indentations in proteins. This composite motif has not received much attention since it was originally described (Richardson et al., 1978). It has recently been assigned a name, β-link, and is the subject of the present work.
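The distinction between the two G1 subclasses rests only on the conformation of residue 0, so it can be illustrated with a few lines of Python. This is a minimal sketch rather than the authors' procedure: the function names are hypothetical, and the box boundaries used to label Ramachandran regions are rough illustrative assumptions. The test angles are the textbook values for position 2 of ideal type I and type II β-turns, which is residue 0 of the bulge in the numbering used here.

```python
def rama_region(phi: float, psi: float) -> str:
    """Coarse Ramachandran label for one residue (angles in degrees).

    The box boundaries are illustrative assumptions only.
    """
    if phi < 0:
        # Negative phi: helical basin if psi is moderate, otherwise extended.
        return "alphaR" if -100 <= psi <= 45 else "betaR"
    # Non-negative phi: only the left-handed helical basin is labelled here.
    return "alphaL" if -45 <= psi <= 100 else "other"


def g1_subclass(phi0: float, psi0: float):
    """Subclass of a G1 beta-bulge from the conformation of residue 0.

    Returns None when residue 0 is in neither the alphaR nor the betaR region.
    """
    region = rama_region(phi0, psi0)
    return {"alphaR": "G1alpha", "betaR": "G1beta"}.get(region)


print(g1_subclass(-60, -30))   # ideal type I turn geometry  -> G1alpha
print(g1_subclass(-60, 120))   # ideal type II turn geometry -> G1beta
```

Under these assumptions the type I geometry maps to G1α and the type II geometry to G1β, in line with the pairing of each bulge subclass with its β-turn type described above.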
The β-link includes 85% of G1β β-bulges (Leader & Milner-White, 2021), and approximately 22% of proteins contain at least one instance of this motif. As there are fewer β-links in proteins than β-sheets, the question arose as to whether the motif only occurred in particular structural contexts. This has been approached here by first identifying which folds of the SCOPe classification of protein architecture contain β-links and then determining whether the motifs occupy specific positions within these folds. We report that the majority of β-links are found where an antiparallel β-sheet connects to certain other structural domains. Two situations predominate: in certain β-sandwich structures at the junction between the two component sheets, and in many small β-barrels at one end of a loop across an end of the barrel. Thus, the role of the β-link is not merely to provide a terminus to a β-sheet, but to form a connection between specific components of larger units of protein architecture.

Materials and methods
This work employed a new MySQL relational database, Protein Motif 2, containing structural information from a set of 4484 individual protein subunits derived from the Richardson laboratory Top 8000, which is 70% nonredundant in terms of structure (http://kinemage.biochem.duke.edu/databases/top8000.php). It was populated with small hydrogen-bonded motifs, using the program HBPlus (McDonald & Thornton, 1994) to determine the hydrogen bonds, in a similar manner to that described previously for an original, smaller, database (Leader & Milner-White, 2009). Protein Motif 2 incorporates a table listing the SCOPe classification (Murzin et al., 1995; Fox et al., 2014; Chandonia et al., 2017) [... (Leader & Milner-White, 2012) also incorporates this new database.] The β-links in the database were determined from SQL queries of the Protein Motif 2 database for structures with the three hydrogen bonds shown in the inset in Fig. 1, together with a requirement that the dihedral angles of residue 2 be within the βR region of the Ramachandran plot. This selects for the combination of a type G1β β-bulge and a type II β-turn shown in CPK colouring in Fig. 1(d) without any separate explicit queries for these constituents. It should be emphasized that the specification does not include the two hydrogen bonds between the two β-strands shown in white in Fig. 1(d) [...] not necessarily located at the ends of perfect antiparallel β-strands.
Figure 1. Structure of the β-link and its relationship to the constituent motifs. (a) Type II β-turn. (b) G1β β-bulge. (c) β-Link. The residues are numbered either as for the β-turn (italic serif font) or the β-bulge (roman sans-serif font) or both. The two hydrogen-bonding residues common to both motifs (positions 1 and 2 of the β-bulge, which are positions 3 and 4 of the β-turn) are coloured pink. (d) The β-link in the context of two antiparallel strands of a β-sheet in staphylokinase (PDB entry 2sak; Rabijns et al., 1997). The residues of the β-link are represented using the CPK colour scheme with hydrogen bonds coloured red. Proximal residues and hydrogen bonds are coloured white. The numbering is as for the β-bulge, extended in the negative direction. Inset: diagrammatic representation of the hydrogen-bonding pattern with residues numbered as in (d).
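The logic of the search described above (a fixed set of hydrogen bonds plus one dihedral-angle constraint) can be sketched in Python. This is not the authors' SQL and does not reproduce the Protein Motif 2 schema: the function name, the data layout and the example pattern are all hypothetical. In the example pattern only the first bond, NH of residue 2 to CO of residue -1, follows from the standard i to i+3 definition of a β-turn; the two bonds involving X are placeholders, because the exact pattern is defined in the inset of Fig. 1 and not in the text.

```python
def find_links(hbonds, pattern, regions, n_residues, required_region="betaR"):
    """Return (index of bulge position 1, index of X) for every match.

    hbonds:   set of (donor_index, acceptor_index) main-chain hydrogen bonds
    pattern:  list of (donor_role, acceptor_role); roles are -1, 0, 1, 2 or "X"
    regions:  dict mapping residue index to a Ramachandran region label
    """
    hits = []
    for i in range(2, n_residues - 1):            # i is bulge position 1
        fixed = {-1: i - 2, 0: i - 1, 1: i, 2: i + 1}
        # Dihedral constraint on residue 2 (region label is an assumption).
        if regions.get(fixed[2]) != required_region:
            continue
        for x in range(n_residues):               # candidate residue X
            if x in fixed.values():
                continue
            where = dict(fixed, X=x)
            if all((where[d], where[a]) in hbonds for d, a in pattern):
                hits.append((i, x))
    return hits


# Hypothetical pattern: one turn bond plus two placeholder bulge bonds.
EXAMPLE_PATTERN = [(2, -1), (1, "X"), ("X", 2)]

toy_hbonds = {(6, 3), (5, 20), (20, 6)}
toy_regions = {6: "betaR"}
print(find_links(toy_hbonds, EXAMPLE_PATTERN, toy_regions, n_residues=25))
```

As in the text, the sketch imposes no condition on the flanking inter-strand hydrogen bonds, so a match does not have to lie at the end of a regular pair of strands.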
The numerical data reported here (Fig. 2 and supporting information) are from the analysis of β-links in the Protein Motif 2 database. Of the 4484 protein subunits in the database, only 2885 had an assigned SCOPe ID in release 2.07. Of the total of 1283 β-links, 973 resided in a protein fold that had been assigned a SCOPe ID, and these constituted the subset analysed. Further work to determine the extent of conservation of β-links in particular folds was performed by examining other proteins with the same or related IDs using the SCOPe web resource. Positions in these proteins corresponding to β-links in the database were identified either by visual inspection using the 3D protein graphics program Jmol (Herráez, 2006) or, where necessary, with the multiple sequence alignment program ClustalX (Larkin et al., 2007).

Assignment of β-links to architectural components of proteins
SCOPe (Structural Classification Of Proteins; Fox et al., 2014) was used to determine whether β-links occur in particular types of protein domain. This classification has the hierarchy 'class' (designated by a character) followed by 'fold', 'superfamily' and 'family' (designated by numerals). For example, in group b.1.1.1 the class is 'All beta proteins', the fold is 'Immunoglobulin-like beta-sandwich', the superfamily is 'Immunoglobulin' and the family is 'V-set domains'. The SCOPe ID for the environment of each β-link was identified and the great majority were found to be in classes b, c or d. The distribution of β-links is seen in Fig. 2. We initially examined folds of class b, which contain 58% of all β-links. The class comprises 178 folds (b.1, b.2 etc.), although many have few members. The analysis was simplified by considering the 12 folds described as β-sandwich or β-sandwich-like as one group, including 18% of all β-links, and the 15 folds described as β-barrels as another group, containing 22% of all β-links. Distributions of β-links between folds are available in Supplementary Fig. S1 and Supplementary Table S1.
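The class.fold.superfamily.family hierarchy lends itself to a small parsing helper. The sketch below is illustrative only: the type and function names are hypothetical, and the fold-number ranges used for grouping (b.1-b.33 plus b.82 as sandwich-like, b.34-b.62 as barrels) are the shorthand ranges quoted in this paper's section descriptions, not an official SCOPe grouping.

```python
from typing import NamedTuple


class ScopeId(NamedTuple):
    cls: str          # class, a single character such as "b"
    fold: int
    superfamily: int
    family: int


def parse_scope_id(sid: str) -> ScopeId:
    """Split an ID such as 'b.1.1.1' into its four hierarchical levels."""
    cls, fold, superfamily, family = sid.strip().split(".")
    return ScopeId(cls, int(fold), int(superfamily), int(family))


def structural_group(sid: ScopeId) -> str:
    """Coarse grouping of class b folds using the ranges quoted in the text."""
    if sid.cls != "b":
        return "other"
    if 1 <= sid.fold <= 33 or sid.fold == 82:
        return "sandwich-like"
    if 34 <= sid.fold <= 62:
        return "barrel"
    return "other"


print(parse_scope_id("b.1.1.1"))
print(structural_group(parse_scope_id("b.47.1.2")))
```

Keeping the levels as separate fields makes it straightforward to count motifs per fold (class plus fold number) while ignoring the superfamily and family levels.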

β-Links in sandwich and sandwich-like structures
β-Sandwich proteins (folds b.1-b.33) consist of two antiparallel β-sheets, the planes of which are juxtaposed and in many cases linked by one or more strands of polypeptide chain. The double-stranded β-helix (fold b.82) is similar in these respects, and we regard it as β-sandwich-like. Together, these folds constitute 46% of all class b folds in the database and include an approximately similar proportion of the β-links in class b. It emerged that many of these β-links have the role, which has not previously been reported, of making an angular connection between the two sheets of a β-sandwich. Fig. 3 illustrates three different aspects of the connection. The sharp bend between the two sheets is evident in the ribbon view of the whole domain, in which the β-link is towards the upper right of one sheet (salmon) and makes a connection to a second sheet (blue). The β-link makes this connection through the N-terminal portion of the strand on which the doubleton of the β-bulge is located (Fig. 3a). A plan view of three strands in the proximity of the β-link shows how the planes of the β-turn and the left-hand (salmon) sheet are approximately at right angles (Fig. 3b). More precisely, the first (N-terminal) residue of the β-turn, that nearest to the right-hand (blue) sheet, is directed away from the right-hand sheet in an approximately perpendicular orientation to it. Here, the hydrogen bonding of the two upper strands of the right-hand sheet extends to the start of the β-turn, which is the tightest situation possible (Fig. 3c). However, such close relative positioning is not found in every β-sandwich.
Figure 2. Distribution of β-links between different SCOPe classes. Purple bars: number of instances of the class in the database. Green bars: number of β-links in the class. Note that in some cases there is more than one β-link in a particular instance of a SCOPe class.
The number and topology of the strands vary considerably between different β-sandwich and β-sandwich-like protein folds, as does the position of the β-link. Their common feature is that the β-link is at the effective corner of the sheet on which it resides, even when it is not actually on the first strand. Fig. 4 uses diagrams (Figs. 4a-4f) to show how this idea can accommodate β-sandwich proteins which show increasing divergence from a situation of two 'ideal' sheets with the same number of antiparallel strands of similar length. Proteins illustrating the successive features introduced in Figs. 4(a)-4(f) are presented in Figs. 4(g)-4(l), although it is emphasized that in other respects (for example the total number of strands) these proteins may not correspond to the simple patterns in the diagrams.
We start with two similar sheets in which antiparallel strands are linked by the continuation of the two strands on which the β-link (red) occurs (Fig. 4a). This is illustrated by α-amylase inhibitor (Fig. 4g), which has three strands in each sheet rather than the four depicted in Fig. 4(a). One variation of this involves a different number of strands in the two sheets (Fig. 4b), where the antiparallel nature of the sheet at the back is disrupted by an additional strand (dark grey). This is illustrated by CD58 (Fig. 4h). In another variation (Fig. 4c) the β-link is not on the first ('upper') strand. Here, an additional strand (dark grey) is shown above the linking strand, displaced from the position of the β-link. It is illustrated for the TNF receptor ligand (Fig. 4i). In some cases the second strand from the front sheet is not connected to the sheet behind (Fig. 4d), as illustrated by Charcot-Leyden crystal protein (Fig. 4j), where it continues to the strand below.
Figure 4. Arrangements of strands in some β-sandwiches containing β-links. Diagrammatical representations are shown (a-f), with corresponding illustrations as ribbon models (g-l). In the diagrams the two layers of the sandwich are coloured salmon and blue, while the β-links are red: the β-turn is indicated by a kink and the line representing the singleton of the β-bulge is shorter. In the ribbon models the colour scheme is similar to that in Fig. 3(a). (a) The simplest situation, in which each sheet has the same number of antiparallel strands and the two upper strands are connected between both sheets. (b) As (a), but with an additional strand (dark grey) in the back sheet separating the two upper strands that are adjacent in the front sheet.
Certain β-sandwich or β-sandwich-like folds contain two β-links. One such situation is that shown in Fig. 4(e), where the second β-link ('2') is on the back sheet, with pseudo-symmetry to that on the front sheet. We observed this in the double-stranded β-helix, an example of which is shown in Fig. 4(k).
The other situation is a feature of cupredoxin-like folds of type b.6.1 and has a second β-link diametrically opposite to the first at the other end of the same pair of β-strands (Fig. 4f). This may seem surprising in relation to disruption of hydrogen bonding between strands 2 and 3 but, as the diagram indicates and the example of plastocyanin illustrates (Fig. 4l), strand 3 is always truncated (cf. Fig. 4c). (Fold b.6.1 only has three strands on the front sheet.) It is possible to consider this β-link as being at the bottom left of the front sheet.
In all but one of the folds where the β-link does occur it is found in the context of the two flanking hydrogen bonds shown in white in Fig. 1(d). An exception is fold b.1.1 (immunoglobulin-like), where the apparent insertion of an extra residue results in the flanking hydrogen bonds being separated from the β-link by a wide β-bulge. This is shown in Figs. 5(a) and 5(c), with the corresponding region of a b.1.2 fold included for comparison (Figs. 5b and 5d).
β-Links are absent from two folds (b.3 and b.11) classified as β-sandwich-like for which there is significant representation in the database (over four members; see Supplementary Table S1). In all five examples of fold b.11, each with two similar domains, connections between the two sheets are associated with (and are presumably facilitated by) a combination of a classic β-bulge and an adjacent glycine residue (Supplementary Fig. S2). In contrast, the connections in the nine examples of fold b.3 are not associated with any single structure, but exhibit a variety of β-bulges of different types or none at all.
Of 263 β-sandwich-like proteins in the database (folds b.1-b.33 and fold b.82), 49% contain at least one β-link, with 180 β-links in total. We have inspected each of these β-links and have ascertained that 99% lie at a corner between the two sheets of the sandwich and 91% are involved in a direct connection between the corners of the two sheets of the types shown in Fig. 4. The group of 8% that are not involved in a direct connection contains ten with a very extended connection and five in which the connection is between two adjacent strands on the same sheet.
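The figures in this paragraph can be cross-checked with simple arithmetic. The short Python sketch below uses only the numbers quoted above; the variable names are illustrative, and the count of proteins containing a β-link is an approximation derived from the rounded 49%.

```python
proteins_total = 263        # sandwich-like proteins in the database
links_total = 180           # beta-links found in them
pct_with_link = 49          # % of proteins with at least one beta-link
pct_corner, pct_direct = 99, 91

# The group without a direct connection: ten extended + five same-sheet.
not_direct = 10 + 5
pct_not_direct = round(100 * not_direct / links_total)

# Approximate number of proteins containing a beta-link (from a rounded %).
proteins_with_link = round(proteins_total * pct_with_link / 100)

print(pct_not_direct)       # 8, matching 99% - 91%
print(proteins_with_link)   # about 129
```

The 15 β-links without a direct connection correspond to 8% of 180, which agrees with the difference between the 99% located at a corner and the 91% making a direct connection.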

β-Links in small β-barrels
Approximately 40% of the β-links found in SCOPe class b folds in the database occur in β-barrels (folds b.34-b.62). Of these, over 90% have 4-6 strands and are categorized as small (Youkharibache et al., 2019), with most of the remainder being eight-stranded barrels with atypical strand arrangements (Supplementary Fig. S1). The striking, and previously unreported, feature of most of these β-links is that they make a connection from the N-terminus of one strand of the β-barrel, through a polypeptide loop, to the C-terminus of another strand of the same β-barrel. This and other aspects of the connection are illustrated in Fig. 6 for a variety of different types of β-barrel. An example, α-spectrin, of the large group of four-stranded β-barrels containing a β-link is shown in Fig. 6(a). The antiparallel nature of the strands in four-stranded barrels means that a connection across either of the two ends of the barrel is only made between adjacent strands (Fig. 6a, i). In contrast to the β-bulge loop (which is also found connecting strands in β-barrels) this is not a tight (5-6-residue) connection: in this example it is a loop of 14 residues crossing the upper end of the barrel (Fig. 6a, ii). β-Links in barrels of this fold (b.34) are flanked by the two additional hydrogen bonds in Fig. 1(d), as are β-links of most other β-barrel folds (Supplementary Table S1). The exception is fold b.36 (PDZ domain-like), where they are generally absent or, at best, weak. A typical example is shown in Fig. 6(b), where it can be seen that the divergence of the two strands only allows one hydrogen bond (marked with an arrowhead). An example of a β-link in a five-stranded β-barrel is seen in Figs. 7(c) and 7(d) (OB-fold protein, eIF5a). It connects a short 3₁₀-helix across the end of the barrel. A β-link in a six-stranded antiparallel β-barrel is shown in Fig. 6(c) for ferredoxin reductase. As in many such six-stranded β-barrels, the strand connecting to the β-turn of the β-link is directly opposite, three strands away. In this example the connecting polypeptide is extensive, including a short helix across the 'upper' end of the barrel. A different six-stranded β-barrel, in which not all strands are antiparallel, is the double-ψ β-barrel. The example in Fig. 6(d), endoglucanase V, contains two β-links: one at each end of the barrel. Fig. 6(e) shows the two β-links in the eight-stranded β-barrel of cyclophilin (b.62), which is unusual in that some strands are parallel. The β-link at the top of the image resembles those in the smaller β-barrels by making a connection to the opposite side across the 'upper' end of the barrel by a three-turn α-helix. The topology of the b.62 fold differs from other eight-stranded β-barrel folds (b.60 and b.61), in which the regular meander is expected to preclude this type of β-link. The second β-link in Fig. 6(e) is atypical in being directed away from the barrel (see below).
The location of a β-link within the β-sheet of a β-barrel differs from that in a β-sandwich, where occurrence at the end of an outer strand involves no loss of hydrogen bonding. The situation in a β-barrel with an even number of strands is that it is an antiparallel β-sheet rolled into a cylinder in which every strand is hydrogen-bonded to two other strands. However, the strands of the β-barrel are staggered with respect to one another, so that the extremities of a strand are only hydrogen-bonded to one adjacent strand. Thus, β-links can occur in one direction without disrupting the hydrogen bonding of the β-barrel, but not in the other direction. This is illustrated in Fig. 7(a) [...] one orientation and one in the other. In each case the strand adjacent to that with the doubleton is shown in green, and the potential hydrogen bonding that the residues N-terminal to the doubleton would make to that strand is indicated by the grey gradient fill. At position '1' no hydrogen bonds are made in any case, so the β-turn in the β-link does not disrupt hydrogen bonding. At position '2' the hydrogen bonding is disrupted. This is why a β-link at position '2' is rarely observed. An exception was mentioned above for cyclophilins (Fig. 6e) and is shown in more detail in Supplementary Fig. S3.
The β-links in several classes of β-barrel exhibit an additional structural feature which involves hydrogen bonding to a third strand. If one considers the second strand to be that on which the singleton of the G1β β-bulge of the β-link is situated, an additional β-bulge of the 'classic' type, with three hydrogen bonds, is made between this and a third, antiparallel, strand. This is shown in Fig. 7(b), where it is seen that positions 1 and 2 of the classic β-bulge are X + 1 and X + 2 in relation to the G1β β-bulge. Figs. 7(c) and 7(d) show this in the context of the whole of the β-barrel. This additional classic β-bulge associated with the β-links is a feature of many small barrels, particularly in the PDZ domain (Fig. 6b), which functions to bind specific proteins. This involves the third strand, above, extending the β-sheet to a fourth strand: the C-terminal peptide of the ligand (see Supplementary Fig. S4).
The additional classic β-bulge is also found in the double-stranded β-helix, but is rare in the β-sandwiches. β-Links are absent from three folds (b.50, b.55 and b.61) classified as β-barrels for which there is significant representation in the database, and there are only a few β-links in folds b.45 and b.60: see Supplementary Table S1.
Of 219 proteins in the database with small β-barrels (folds b.34-b.62), 60% contain at least one β-link, with 167 β-links in total. We have inspected each of these β-links and have ascertained that 92% are formed between adjacent strands at the end of the barrel and 76% form connections across the end of the barrel of the types shown in Fig. 6. The group of 16% that are appropriately located but are not involved in a connection across the end of the barrel includes some that connect to other parts of the protein and some where the connecting chain terminates above the barrel.

Other β-links
SCOPe class c and d folds contain mixtures of α-helices and β-sheets. Although the proportion of β-links in c and d folds is low, together they comprise 34% of β-links in the database (Fig. 2). Many fall into specific folds: in three or four (c.66, d.26, d.145 and perhaps d.157) the β-link occurs where a strand of a β-sheet connects to an α-helix. Two examples are shown in Supplementary Fig. S5. In both, the β-link is associated with a classic β-bulge as in Fig. 7(b).
Not all β-links are found in the structural environments discussed above, and instances occur in various situations, some of which do not involve β-strands. Two structures, seen in Supplementary Fig. S6, involving a pair of cooperating β-links, are worthy of attention. The first is fold b.85, the β-clip (Iyer & Aravind, 2004), which is what might be regarded as a belt of two β-strands looped out from the rest of the structure. The second is fold b.84, the barrel-sandwich hybrid.
Figure 7. [...] 40.4.5. (d) As in (c), but a backbone diagram with the β-link/β-bulge combination coloured CPK, the remaining parts of the three strands on which they occur salmon pink, and the rest of the β-barrel white.

β-Links at the active site of serine and cysteine proteases
No mention has yet been made of a role for the β-link in enzyme function. However, the catalytic Ser195 of serine proteases is located in the β-turn of a β-link, a feature noted by Richardson et al. (1978). This is shown for trypsin in Fig. 8(a), where it can be seen that the β-link is at the connection of opposite strands across an end of one of the two six-stranded β-barrels (family b.47.1.2) and that Ser195 occupies the position of residue 0. In all active enzymes of this type examined, including eukaryotic, bacterial and viral serine proteases, the β-link was conserved (Supplementary Table S2). The classic β-bulge (as in Fig. 7b) associated with the β-link is found in the eukaryotic serine proteases examined, but only in some of the bacterial and viral enzymes.
Although the β-link is also present in the inactive precursors of serine proteases, trypsinogen and chymotrypsinogen, it is absent from human prokallikrein 6 (Fig. 8b). The activation of kallikrein differs from that of other serine proteases, as it involves the movement of a new N-terminus (Leu16) to the proximity of Asp194, breaking its interaction with Trp141 and reorienting the nucleophilic Ser195 in position for catalysis (Gomis-Rüth et al., 2002). Fig. 8(b) shows that this also allows the formation of the hydrogen bond of the β-turn of the β-link.
Superfamily b.47.1, containing the serine proteases, also encompasses the family b.47.1.4, which contains viral proteases that are structurally homologous to the serine proteases, but in which the catalytic Ser195 is replaced by a cysteine residue. (Cysteine proteinases such as papain occupy a different fold: d.3.) The catalytic mechanism of these viral proteases also differs from that of the serine proteases in the involvement of a second essential histidine residue (Anand et al., 2003). Both the catalytic cysteine and the second histidine (His163) are located in the conserved β-link, with the histidine occupying the singleton position (X) of the G1β β-bulge component. This is shown in Fig. 8(c) for the protease Mpro from human SARS-CoV-2 (Jin et al., 2020). The first histidine, His41, corresponds to the well known His57 of trypsin, but the third component of the catalytic triad in trypsin, Asp102, is absent in the viral protease.

Discussion
In the original description of the motif studied in this work it was suggested that 'one sort of function' that it might have 'would be to influence the direction in which a strand could leave a β-sheet' (Richardson et al., 1978). The present work describes the architectural nature of such a function. When a strand leaves a β-sheet through a β-link it is frequently directed towards and connects with another structural component of the protein: a second sheet in a sandwich structure, a strand at the other side of a small β-barrel, or an α-helix. These latter environments are not as different as may at first appear because one can regard many β-barrels as 'two β-sheets packed face to face, with the strands in each sheet lying roughly perpendicular to one another' (Chothia & Janin, 1982; Youkharibache et al., 2019). Furthermore, a short helix is often associated with the β-links in small β-barrels (Youkharibache et al., 2019; Fig. 6). Of course, β-links occur in other contexts, as discussed in Section 3.4, but at least 50% would appear to have this newly identified architectural role.
The specific nature of this role is evident from the frequency with which β-links occur at the same position in particular SCOPe folds (Supplementary Table S2), but we recognize that such evolutionary conservation is not absolute. In the β-sandwich we have found that in many cases at positions where a β-link is not completely conserved a G1 β-bulge is still present; for example in family b.1.8.1 (Supplementary Table S2). Here, one supposes that either other features of protein structure fulfil the role previously mentioned for the hydrogen bond of the type II β-turn, or a wider loop between sheets does not require this. In other cases, such as folds b.3 and b.11 (Section 3.2), the connections between the two sheets appear to be facilitated quite differently.
Figure 8. [...] Katz et al., 1995). Backbone plot with the β-link, associated classic β-bulge and active-site residues (labelled and side chains shown) coloured CPK, adjacent residues salmon pink and other parts of the protein white. (b) (i) Human prokallikrein 6 (PDB entry 1gvl chain A; Gomis-Rüth et al., 2002). (ii) Human kallikrein 6 (PDB entry 1l2e chain A; Bernett et al., 2002). Shown is the region near the catalytic Ser195. The arrowheads indicate the position of the hydrogen bond in the β-turn of the β-link found only in the active protein. Side chains are shown for Ser195, Asp194 and residues in proximity to the latter. Leu16 is not shown in (i) because it is distal to the residues shown before the activation of prokallikrein. (c) Human SARS-CoV-2 Mpro (PDB entry 7bqy; Jin et al., 2020). The environment of the β-link with the side chains of the active-site Cys145, His163 and His41 is shown.
It is possible to imagine that in addition to its role in maintaining a particular protein architecture, the β-link might initiate the folding of the protein into this architecture. If, as is widely, although not universally (Leader & Milner-White, 2011), assumed, proteins fold as they are synthesized, in the N to C direction, this would seem somewhat at odds with the position of the β-link on the connection. However, we have no data that bear on this question.
We are not aware of previous reports of the relationship of β-links to architectural classifications such as SCOPe, but Craveur et al. (2013) did report such a study of β-bulges. Their work showed high conservation of β-bulges at particular positions, but it did not differentiate between the two subclasses of G1 β-bulge, nor consider folds within SCOPe classes, making it difficult to relate to the present study.
Although our emphasis has been on how β-links can connect larger structural components of proteins, we have shown that in at least one circumstance such β-links can play an additional role of their own. The active-site Ser195 of serine proteases, and the cysteine residue of homologous cysteine proteases, are situated at the end of a small β-barrel in which a β-link initiates a loop across the top of the barrel. These residues are located at the same position within the completely conserved β-link, and it seems likely that the extensive inter-main-chain hydrogen-bond network in and around the β-link ensures that the serine or cysteine side chain is orientated optimally for catalysis. The possible relevance of this to the development of inhibitors of viral proteases, such as that of SARS-CoV-2 (Fig. 8c), should not be discounted. Impressive progress has been made recently in the design and synthesis of proteins, including that of an eight-stranded β-barrel (Dou et al., 2018), but as far as we are aware none of these constructs have yet included β-sandwiches or small β-barrels. We have previously described patterns of amino acids that distinguish β-links from other composites of β-bulges (Leader & Milner-White, 2021) and have also found some small differences between the β-links in β-sandwiches and β-barrels (Supplementary Table S3). Now that the architectural role of the β-link has been recognized, we hope that this will help in the design of increasingly ambitious synthetic proteins.