A comprehensive classification and nomenclature of carboxyl–carboxyl(ate) supramolecular motifs and related catemers: implications for biomolecular systems

The vast diversity of carboxyl–carboxyl(ate) arrangements is reduced to 17 supramolecular motifs and eight catemers. Examples of each, extracted from the CSD, are presented.


Introduction
Carboxyl and carboxylate [written collectively as carboxyl(ate)] groups are found in a large variety of biomolecular compounds and also in drugs and synthetic molecular systems. In biomolecules, the two amino acids Asp and Glu represent ~2% of the ~2 million amino acids found in the Protein Data Bank (PDB, November 2014 release; Berman et al., 2000). In synthetic systems, carboxyl(ate) groups assemble to form essential supramolecular synthons recurrently used in crystal engineering (Desiraju, 2007, 2013; Merz & Vasylyeva, 2010) and are present in ~37 000 (~5-6%) of the ~675 000 crystal structures in the Cambridge Structural Database (CSD Version 5.35, November 2013; see Table 1; Allen, 2002; Chisholm et al., 2006; Groom & Allen, 2014).
The formation of carboxyl(ate) dimers, and further of carboxyl catemer motifs, implies the involvement of the common syn but also the less common anti conformers, as well as the syn and/or anti lone pairs of the O atoms (Görbitz & Etter, 1992a; Das et al., 2005; Das & Desiraju, 2006; Sanphui et al., 2013; Fig. 1). Theoretical studies have investigated the relative stability of the syn and anti conformers. It is generally accepted that in the gas phase, the syn conformer is favoured over the anti conformer by 21.4-28.9 kJ mol⁻¹ depending on the theoretical level and basis set used in quantum chemical calculations (Kamitakahara & Pranata, 1995; Sato & Hirata, 1999; Nagy, 2013). In aqueous solution, the estimated relative energy difference between the two conformers is reduced to 7.12 kJ mol⁻¹ (Nagy, 2013). A further point of interest involves the relative basicity of the syn and anti lone pairs of carboxylate groups. Theoretical studies have reported that although the syn lone pairs are intrinsically more basic, the basicity difference decreases and even cancels out when environmental effects are taken into consideration (Li & Houk, 1989; Allen & Kirby, 1991; Gao & Pavelites, 1992). In line with these data, a significant number of catemer motifs involving anti conformers have been observed in various crystallographic surroundings, supporting the view that environmental effects are able to reverse anticipated conformational equilibria (Das & Desiraju, 2006). Anti conformers have also been considered in drug discovery strategies involving bioisosterism (McKie et al., 2008; Meanwell, 2011; Allen et al., 2012).
Given the importance of these carboxyl-carboxyl(ate) dimers in both the chemical and biochemical realms, the present study aims at: (i) providing an exhaustive classification of all possible dimers and catemers involving these groups; (ii) proposing a systematic nomenclature for them; (iii) defining recurrent hydrogen-bond properties. This study should contribute to an improved understanding of the structural diversity observed in small-molecule crystal structures, and provide insights into crystal engineering of new materials (Desiraju, 2007, 2013), including pharmaceutical cocrystals (Blagden et al., 2007). However, the main incentive of the study resides in acquiring reliable statistical data that will help to understand carboxyl(ate) interactions in biomolecular systems. In this respect, analysing small-molecule crystal structures, where H atoms are systematically observed, has a clear edge over exploring biomolecular systems where H-atom positions are rarely reported (Ahmed et al., 2007; Fisher et al., 2012).

Methods
The Cambridge Structural Database (CSD Version 5.35, November 2013) was searched for structures containing carboxyl-carboxyl(ate) motifs by using explicit H-atom positions. All searches were performed with the ConQuest software (Bruno et al., 2002) using filters so that error-containing, polymeric and powder structures were excluded, as well as structures marked as disordered. Although H-atom disorder is common in carboxylic systems, structures where the H atom could not be unambiguously assigned to a single O atom were not considered (Leiserowitz, 1976; Berkovitch-Yellin & Leiserowitz, 1982; Wilson et al., 1996; Das et al., 2005; Thomas et al., 2010; Hursthouse et al., 2011). This criterion leads to exclusion of 12 out of the 23 catemers listed by Das & Desiraju (2006). However, Steiner (2001) reported that statistics were not affected by excluding disordered structures. The searches were also restricted to structures with low R-factor values (R ≤ 0.05) unless otherwise specified. Metal-bound carboxyl(ate) groups were excluded given their specific structural features (Hocking & Hambley, 2005). Note that the November 2006 CSD release contained less than 2/3 of the structures found in the November 2013 release. Thus, the present searches significantly extend those presented in earlier publications on smaller samples of structures (Kuduva et al., 1999; DeVita Dufort et al., 2007; Langkilde et al., 2008). Since carboxyl(ate) groups are involved in strong hydrogen bonds (Jeffrey, 1997; Steiner, 2001; Langkilde et al., 2008), a stringent hydrogen-bond cut-off criterion could be used (O···O ≤ 2.8 Å).

Figure 1
Carboxyl(ate) groups (syn and anti conformers) and schematic structure of a catemeric chain. The syn and anti lone pairs of the three carboxyl(ate) O atoms are marked by double dots.
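The selection criteria described in this section can be summarized as a single predicate. The following Python sketch is only an illustration: the `Candidate` record, its field names and the example refcodes are hypothetical and do not correspond to the ConQuest interface, and the two thresholds are read as inclusive upper limits (R ≤ 0.05, O···O ≤ 2.8 Å), which is an assumption.

```python
from dataclasses import dataclass, replace

R_FACTOR_MAX = 0.05  # assumed inclusive limit for "low R-factor" structures
O_O_MAX = 2.8        # assumed inclusive O...O hydrogen-bond cut-off, in angstrom


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record describing one carboxyl-carboxyl(ate) contact."""
    refcode: str
    r_factor: float      # conventional R-factor as a fraction (0.05 = 5%)
    disordered: bool
    has_errors: bool
    polymeric: bool
    powder: bool
    metal_bound: bool
    o_o_distance: float  # donor O ... acceptor O distance, in angstrom


def keep(c: Candidate) -> bool:
    """Return True when the contact passes all filters listed in the text."""
    if c.disordered or c.has_errors or c.polymeric or c.powder or c.metal_bound:
        return False
    return c.r_factor <= R_FACTOR_MAX and c.o_o_distance <= O_O_MAX


# Hypothetical entries used only to exercise the predicate.
good = Candidate("HYPOTH01", 0.04, False, False, False, False, False, 2.65)
high_r = replace(good, refcode="HYPOTH02", r_factor=0.07)
long_contact = replace(good, refcode="HYPOTH03", o_o_distance=2.95)
selected = [c.refcode for c in (good, high_r, long_contact) if keep(c)]
```

Keeping every criterion in one function makes it easy to relax a single filter, for example the R-factor limit, when statistics for the entire CSD are wanted.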
The H-atom positions were not considered for analysing hydrogen-bond lengths since their position is systematically unreliable when not derived from neutron diffraction experiments (Vishweshwar et al., 2004;Allen & Bruno, 2010). Neutron diffraction surveys provide an average 1.018 Å (Allen & Bruno, 2010) or even a 1.070 Å value (Vishweshwar et al., 2004) for the carboxyl O-H bond length, compared with an average of 0.87 Å derived from our survey. Hence, we have not used H atoms in the subsequent analysis, except for obviously differentiating carboxyl from carboxylate groups and for defining the syn/anti character of the former. An incidental advantage of not using H atoms is that our defined criteria can be used in biomolecular systems where H atoms are rarely characterized.
The geometric parameters used to distinguish the syn and anti conformers of the carboxyl groups and the spatial syn and anti arrangement of carboxyl-carboxyl(ate) dimers are detailed in §3.1. Specific criteria were used to exclude a few borderline and possibly error-containing structures. For instance, the WEGBUH structure (Ying, 2012) contains a short (2.58 Å) interaction between two O atoms of the carboxylic hydroxyl groups that corresponds rather to a carboxyl-carboxylate than to a carboxyl-carboxyl motif. Similarly, a significant number of structures are excluded where the H atoms are located out of the O=C-O plane by more than 0.4 Å.
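The out-of-plane criterion can be illustrated with a short point-to-plane calculation. This is a minimal sketch, assuming Cartesian coordinates in Å and a plane defined by the three heavy atoms of the carboxyl group; the function names and the example coordinates are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def out_of_plane_distance(o_carbonyl, c, o_hydroxyl, h):
    """Distance (angstrom) of the H atom from the plane through O, C and O."""
    normal = _cross(_sub(o_carbonyl, c), _sub(o_hydroxyl, c))
    norm = math.sqrt(_dot(normal, normal))
    if norm == 0.0:
        raise ValueError("the three heavy atoms are collinear")
    return abs(_dot(_sub(h, c), normal)) / norm


def h_is_in_plane(o_carbonyl, c, o_hydroxyl, h, tolerance=0.4):
    """True when the H atom deviates from the plane by at most `tolerance`."""
    return out_of_plane_distance(o_carbonyl, c, o_hydroxyl, h) <= tolerance


# Hypothetical geometry: heavy atoms in the xy plane, H lifted along z.
example = out_of_plane_distance((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                                (0.0, 1.0, 0.0), (0.5, 0.5, 0.3))
```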
The results of the searches were analysed using Vista (CCDC, 1994), and all structures were visualized using Mercury CSD Version 3.3 (Macrae et al., 2008). Catemer structures were individually examined and classified. The possibility that some of the presented catemer motifs could belong to large rings rather than infinite chains was considered and excluded.

Carboxyl and carboxylate groups
Carboxylic acids bear a proton that is commonly found in the syn and more rarely in the anti conformation. In order to distinguish between the syn and anti conformers, we imposed the following criterion on the O···O-H angle (Fig. 2). The syn conformer corresponds to angle values between 0 and 120°; the anti conformer to angle values between 120 and 180°. The relative proportion of these conformers is roughly 9/1 in favour of syn, while negatively charged carboxylate groups represent about 2/3 of the total carboxyl groups.

Figure 2
Geometric parameters used for separating the carboxyl syn and anti conformers. The syn conformers are defined by an O···O-H angle below 120° (marked by a blue dashed line on the histogram). The anti conformers are defined by a value greater than 120°. The histogram has been derived from an ensemble of low R-factor (R ≤ 0.05) carboxylic acid containing structures.
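The angular criterion can be sketched in a few lines of Python. The sketch assumes that the O···O-H angle is measured at the hydroxyl O atom, between the carbonyl O atom of the same carboxyl group and the H atom, and that a value of exactly 120° is assigned to anti; both points are assumptions, as are the function names and the example coordinates.

```python
import math


def angle_deg(a, vertex, b):
    """Angle a-vertex-b in degrees for three Cartesian points."""
    v1 = [a[i] - vertex[i] for i in range(3)]
    v2 = [b[i] - vertex[i] for i in range(3)]
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("coincident atoms")
    cos_angle = sum(x * y for x, y in zip(v1, v2)) / (n1 * n2)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def carboxyl_conformer(o_carbonyl, o_hydroxyl, h, cutoff_deg=120.0):
    """Return 'syn' or 'anti' from the O...O-H angle at the hydroxyl O atom."""
    angle = angle_deg(o_carbonyl, o_hydroxyl, h)
    return "syn" if angle < cutoff_deg else "anti"


# Hypothetical coordinates (angstrom) with the hydroxyl O atom at the origin.
syn_like = carboxyl_conformer((2.2, 0.0, 0.0), (0.0, 0.0, 0.0), (0.5, 0.8, 0.0))
anti_like = carboxyl_conformer((2.2, 0.0, 0.0), (0.0, 0.0, 0.0), (-0.9, 0.3, 0.0))
```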

Table 2
Number of structures and fragments containing carboxyl(ate) groups in the CSD.
Only low R-factor structures (R ≤ 0.05) are considered. Statistics were also collected for the anti conformer subgroups that take into account the carboxyl groups that are involved in intra- and intermolecular hydrogen bonds, respectively. Distances are in Å, angles in °.

Figure 3
Three dicarboxylic acids with an anti carboxyl group involved in an intramolecular hydrogen bond, schematically displayed under the CSD most represented mono-anion dicarboxylic acid form.

similar to those reported in an early study (Leiserowitz, 1976). Our updated values are reported in Table 2. Note that, due to its partial double-bond character, the C=O bond of carboxyl groups is shorter by ~0.11 Å than the adjacent C-O(H) hydroxyl bond. The anti conformer population is more heterogeneous than the syn population since anti conformers are involved in a large diversity of intermolecular but also intramolecular bonds, such as those observed in oxalic, malonic and maleic (Fig. 3) as well as phthalic acids. For the three former acids in their most represented mono-anion dicarboxylic acid form, the average d(O···O) hydrogen-bond distances are 2.67 ± 0.03 (10 structures), 2.46 ± 0.03 (20 structures) and 2.44 ± 0.03 Å (107 structures), stressing the formation of very short hydrogen bonds. Since the scope of this study is to examine supramolecular motifs, we eliminated from our searches all 'intramolecular' contacts involving an anti carboxyl conformer unless otherwise specified. When structures containing intramolecular hydrogen bonds were excluded, the number of fragments containing an anti carboxyl conformer decreased from 1168 to 223.

Carboxyl-carboxyl(ate) interactions
3.2.1. Nomenclature. An evaluation of carboxyl(ate) interaction modes based on the syn/anti carboxylic conformers and the syn/anti carboxyl(ate) lone pairs led to a total of 17 carboxyl-carboxyl(ate) dimers comprising: (i) one cyclic dimer; (ii) 12 carboxyl-carboxyl dimers involving a single hydrogen bond; (iii) four carboxyl-carboxylate dimers. Free rotation around the interlinking hydrogen bond is considered for all except the cyclic dimer (Fig. 4). The formation of three-centred or bifurcated hydrogen bonds was not considered since they do not appear in previous (Görbitz & Etter, 1992b) and current CSD surveys as well as in molecular dynamics simulations of formate and acetate ions in water (Payaka et al., 2009, 2010). This simplifies considerably the presented nomenclature.
Sixteen interaction modes involve a single hydrogen bond linking the two units. We propose a three-letter nomenclature for carboxyl-carboxyl dimers based on: (i) the syn or anti conformer of the first carboxyl group that is by convention always the hydrogen-bond donor group of the dimer; (ii) the syn or anti lone pair of the carbonyl hydrogen-bond acceptor group of the dimer; (iii) the syn or anti conformer of the dimer hydrogen-bond acceptor group. The first letter (S or A) corresponds to the syn or anti conformer; the second letter (S or A) to the lone pair involved in the hydrogen bond; the third letter (S or A separated by a dash from the two others) to the position of the H atom not involved in the hydrogen bond. For the eight dimers involving the participation of a

Figure 5
Geometric parameters used for separating carboxyl-carboxylate dimers involving syn or anti lone pairs. The histogram has been drawn for a subensemble of SS and SA dimers. The syn conformers are defined by an O(H)···O···O angle below 130° (marked by a blue dashed line on the histogram). The anti conformers are defined by a value greater than 130°.

Table 3 (fragment)
Hydroxyl dimer ss-a 2 (7) 2 (7) 2.71
sa-s 6 (8)
(102) 68 (117) 2.54 ± 0.06
† The number of structures and fragments are given for structures with low R-factors (R ≤ 0.05). The number of structures and fragments derived from the entire CSD (no R-factor restrictions) are given in parentheses.
‡ Average distances (Å) calculated for the R ≤ 0.05 subset.
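The count of 17 interaction modes can be reproduced by a simple enumeration. The sketch below is an illustration under stated assumptions: the class names used as dictionary keys are ours, and the restriction that the second and third letters of a 'hydroxyl dimer' must differ is inferred from the four types named in the text (ss-a, sa-s, as-a, aa-s) rather than quoted from it.

```python
from itertools import product


def enumerate_dimers():
    """Enumerate carboxyl-carboxyl(ate) dimer labels by class."""
    # 'Carbonyl dimers': donor conformer, carbonyl lone pair, acceptor conformer.
    carbonyl = [f"{d}{lp}-{acc}" for d, lp, acc in product("SA", repeat=3)]
    # 'Hydroxyl dimers' (lower case): the accepting hydroxyl O atom carries
    # both the lone pair and the H atom, so they are assumed to occupy
    # different positions (second letter != third letter).
    hydroxyl = [f"{d}{lp}-{acc}"
                for d, lp, acc in product("sa", repeat=3) if lp != acc]
    # Carboxyl-carboxylate dimers: donor conformer and carboxylate lone pair.
    carboxylate = [f"{d}{lp}" for d, lp in product("SA", repeat=2)]
    return {
        "cyclic": ["cyclic"],
        "carbonyl": carbonyl,
        "hydroxyl": hydroxyl,
        "carboxylate": carboxylate,
    }


dimers = enumerate_dimers()
total = sum(len(labels) for labels in dimers.values())
single_bond = total - len(dimers["cyclic"])
```

With these assumptions the enumeration gives one cyclic dimer, eight 'carbonyl dimers', four 'hydroxyl dimers' and four carboxyl-carboxylate dimers.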
3.2.2. Geometric classification criteria. As noted above (Fig. 2), simple geometric criteria can be used to filter the carboxyl syn and anti conformers. It was less obvious how to discriminate dimers based on their syn or anti lone pair bonding types. After having tried several options, we found that the histograms of the O(H)···O···O angle, which involves the hydrogen-bond donor O atom and the two carboxylate O atoms, are the most helpful for this purpose. The histogram drawn for the carboxyl-carboxylate dimers is unambiguous and prompted us to use a 130° cut-off for separating the SS from the SA dimers and the AA from the AS dimers (Fig. 5). Although a clear partition is difficult to identify on the SS-S dimer histogram (data not shown), a visualization of these dimers confirmed the soundness of the defined criteria. As is often the case, borderline conformations are observed and are difficult to eliminate, but they do not alter the inferred landscape.
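The two cut-offs combine into a two-letter label for carboxyl-carboxylate dimers. The following sketch assumes that both angles have already been measured in degrees and that values equal to a cut-off are assigned to anti; the function names are hypothetical.

```python
CONFORMER_CUTOFF_DEG = 120.0  # O...O-H angle separating syn from anti conformers
LONE_PAIR_CUTOFF_DEG = 130.0  # O(H)...O...O angle separating syn from anti lone pairs


def _letter(angle_deg: float, cutoff_deg: float) -> str:
    """Map an angle to 'S' (below the cut-off) or 'A' (at or above it)."""
    if not 0.0 <= angle_deg <= 180.0:
        raise ValueError("angle must lie between 0 and 180 degrees")
    return "S" if angle_deg < cutoff_deg else "A"


def classify_carboxylate_dimer(donor_angle_deg: float,
                               lone_pair_angle_deg: float) -> str:
    """Return SS, SA, AS or AA for a carboxyl-carboxylate dimer.

    The first letter describes the donor carboxyl conformer, the second
    the carboxylate lone pair that accepts the hydrogen bond.
    """
    return (_letter(donor_angle_deg, CONFORMER_CUTOFF_DEG)
            + _letter(lone_pair_angle_deg, LONE_PAIR_CUTOFF_DEG))


# Hypothetical angle pairs, one per dimer type.
labels = [classify_carboxylate_dimer(d, lp)
          for d, lp in [(60.0, 110.0), (60.0, 150.0), (170.0, 110.0), (170.0, 150.0)]]
```

Because neither angle in this sketch depends on knowing which group is charged, the same function could be applied to heavy-atom geometries once a donor has been assigned.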
3.2.3. Carboxyl-carboxyl interaction modes. Cyclic dimer: This dimer is undoubtedly the best represented in the CSD (Table 3). The distance between the O atoms involved in the hydrogen bond is on average close to 2.65 ± 0.03 Å (Fig. 6) and consequently shorter by 0.17 Å than the accepted H2O···OH2 hydrogen-bond length (2.82 Å). Cyclic dimers are almost perfectly planar.
'Carbonyl dimers': Eight 'carbonyl dimer' types were identified (Table 3). The four types involving the syn conformer of the donor carboxyl group, the SA-S dimers in particular, are well represented. The synplanar rotamers are generally not observed except for the SA-S dimers, where they are as prominent as antiplanar rotamers (Fig. 7). Note that syn- and antiplanar rotamers are defined by inter-dimer dihedral angles with values close to 0 and 180°, respectively (see, for example, Fig. 7c). The ACETAC09 acetic acid structure seems to be stabilized by a C-H···O interaction involving the methyl group, an orientation that is not found for chloroacetic acid in the CLACET01 structure and illustrates how weak interactions participate in structural networks. Not surprisingly, the four dimer types involving the anti conformer of the donor carboxyl are rare. Among them, the AA-S dimer that involves the anti lone pair of a carbonyl group is best represented. However, convincing structures are found for each dimer type (Fig. 8). The hydrogen-bond length distribution is broader than the one given for the cyclic dimers, while the average hydrogen-bond length is roughly the same (2.66 ± 0.05 Å; Fig. 6).
'Hydroxyl dimers': Although the two carboxyl hydroxyl groups could form hydrogen bonds, this interaction occurs rarely. Only two ss-a and six sa-s conformers were characterized (Table 3; Fig. 9). Neither of the two other possible conformers, as-a and aa-s, was observed. This points to the fact that the lone pairs of carboxyl -OH groups seem to be much less basic and/or accessible to other carboxyl groups than the lone pairs of more common hydroxyl groups.
The SS dimer, involving a hydrogen bond between a syn hydroxyl group and a syn carboxylate lone pair, is the most prevalent carboxyl-carboxylate dimer in the CSD (Table 3). The antiplanar SS dimer is frequently observed while dimers close to the synplanar orientation are much less represented (Fig. 10). Some rare occurrences of the synplanar orientation stabilized by intervening groups (such as NH4+ in JEDPUE; see Fig. 10) are reported. In those instances, the distances between the O atoms not involved in the hydrogen bond exceed 3.0 Å.
All SA rotamers, involving a hydrogen bond between a syn hydroxyl group and an anti carboxylate lone pair, are nicely represented with some preference for the antiplanar orientations. The AS and AA dimers are less abundant but are still observed in a significant number of structures.
The most distinctive feature of these carboxyl-carboxylate dimers is related to the very short average hydrogen-bond distance between the two O atoms (2.54 ± 0.06 Å), which does not seem to be dependent on the dimer type (Fig. 6). The shortest observed hydrogen bonds (2.43 ± 0.04 Å) belong to intramolecular mono-anion dicarboxylic acids (Figs. 6 and 10).
3.2.5. Carboxyl(ate)-water hydrogen-bond length. The hydrogen-bond length between carboxyl(ate) groups and water molecules is strongly dependent on the acceptor or donor character of the former. When bound to the hydroxyl group, the average d(C-O(H)···Ow) distance is 2.59 ± 0.06 Å (Fig. 11a); when bound to a carboxyl(ate) carbonyl group, the average d(C=O···Ow) distance (2.77 ± 0.07 Å) becomes close to water hydrogen-bond distances (Figs. 11b and c). The shortest reported hydrogen-bond lengths are close to 2.4 Å. Such a short length is found in the CACTUW structure (Vishweshwar et al., 2004), where the (C=O)O-H···Ow distance is close to 2.48 Å and involves an anti conformer (Fig. 9a). Interestingly, only 44 water molecules establish a hydrogen bond with the lone pair of the carboxyl -OH group either in syn or anti (compared with the ~2800 water molecules found around the other groups), confirming its poor acceptor potential. The associated distances are close to 2.80 Å.
3.2.6. Catemers. Nomenclature: The dimer nomenclature can be adapted without major modifications to the catemer motifs, for which two classes can be defined: (i) the homo-catemers involving the formation of a continuous chain of the same dimer and (ii) the hetero-catemers involving two alternating dimer types. In the latter case, we impose the convention that the syn conformer precedes the anti conformer. Thus, the SS-A·AS-S code should be used instead of the AS-S·SS-A code. In the current CSD release, four homo- and four hetero-catemer types were identified (Table 4 and Fig. 12).
Figure 9
Rare carboxyl-carboxyl dimers involving the lone pair of the hydroxyl group ('hydroxyl dimers'). The C and O atoms not belonging to the interacting carboxyl groups are shown in light blue, N atoms are shown in magenta. The light blue spheres indicate that the molecule has been truncated for visualization purposes. (a) Antiplanar ss-a dimer (CACTUW; R = 0.04). Due to the size of the system, only the interacting fragments are shown. The unusually short carboxyl-Ow distance is given. The red asterisks mark the carboxyl groups involved in the ss-a dimer. (b) Antiplanar sa-s dimer (CAYJAO; R = 0.06). (c) Synplanar sa-s dimer involving two fumaric acid molecules (EMONAW; R = 0.11). The N-containing interacting molecule has been truncated due to its size.

Catemer formation rule: The SS-S and SA-S homo-catemers are the most represented followed by the SS-A·AA-S hetero-catemers. Three other catemers are poorly represented.
After closer examination of the catemer nomenclature (Table 4), a simple rule emerged. If the dimer starts with a syn or an anti conformer it should end with an identical conformer. Thus, the SS-S, SA-S, AS-A and AA-A dimers form homo-catemers since the first and the last conformers are identical, while the SS-A, SA-A, AS-S and AA-S dimers need to associate with a complementary motif and can only form hetero-catemers. According to this rule, all eight possible homo- and hetero-catemer combinations were identified in the CSD, although the SS-A·AS-S (ROZHEU; Dawid et al., 2009) and SA-A·AS-S catemers (MEKLOE; Das & Desiraju, 2006) were identified in only one instance. Table S1 of the supporting information provides a list of all characterized catemers, which were manually checked to confirm that they are not part of large rings.
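The catemer formation rule lends itself to a direct enumeration. The sketch below is an illustration with hypothetical function names; it reads the rule as requiring that the acceptor conformer of one dimer (third letter) matches the donor conformer of the next dimer in the chain (first letter), and it applies the convention that the syn-first dimer is written first in hetero-catemer codes.

```python
from itertools import product

# The eight 'carbonyl dimer' labels: donor conformer, lone pair, acceptor conformer.
CARBONYL_DIMERS = [f"{d}{lp}-{acc}" for d, lp, acc in product("SA", repeat=3)]


def enumerate_catemers(dimers=CARBONYL_DIMERS):
    """Return (homo, hetero) catemer codes allowed by the formation rule."""
    # Homo-catemers: first and last letters are identical, so the dimer
    # can repeat along the chain.
    homo = [d for d in dimers if d[0] == d[-1]]
    # Hetero-catemers: a syn-first dimer ending in A alternates with an
    # anti-first dimer ending in S; the syn-first dimer is written first.
    syn_first = [d for d in dimers if d[0] == "S" and d[-1] == "A"]
    anti_first = [d for d in dimers if d[0] == "A" and d[-1] == "S"]
    hetero = [f"{s}·{a}" for s, a in product(syn_first, anti_first)]
    return homo, hetero


homo_catemers, hetero_catemers = enumerate_catemers()
```

Under this reading the enumeration yields four homo-catemers and four hetero-catemers, the latter matching the four codes listed in Table 4.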
4. Discussion
4.1. A systematic classification of carboxyl-carboxyl(ate) dimers . . .
By using simple stereochemical considerations, we have demonstrated that the apparently overwhelming diversity of carboxyl-carboxyl(ate) dimers (Rodríguez-Cuamatzi et al., 2007) can be reduced to 17 supramolecular motifs when considering free rotation around the interlinking hydrogen bond. A hierarchy of motifs emerged that distinguishes first the cyclic dimer (1929 fragment occurrences), followed by the SS (947 occurrences), SA (357 occurrences) and SA-S dimers (234 occurrences) (Table 3). The other dimers are less represented and some are rare, especially those in the 'hydroxyl dimer' class where the as-a and the aa-s types are absent from the current CSD release (Fig. 4). This latter observation is in agreement with the fact that strong donor groups such as carboxyl -OH functions are also poor acceptors, as reported in small molecules and biomolecular systems (Ramanadham et al., 1993; Steiner, 2002).
The reasons why, in certain circumstances, carboxyl groups prefer to form single hydrogen-bonded dimers, sometimes extending into polymeric-like catemeric chains, rather than cyclic dimers remain puzzling, although much has been written on this topic, including considerations related to the preferential involvement of syn and anti lone pairs and conformers (Glusker, 1998; Sato & Hirata, 1999; Nagy, 2013).
In order to better appreciate these conformational preferences, statistical models predicting the number of hydrogen bonds that might form between any donor/acceptor pair in a crystal structure have been derived using CSD data (Allen et al., 1999; Galek et al., 2014), along with computational models providing estimates of their intrinsic stability (Dunitz & Gavezzotti, 2012). These studies confirmed the pre-eminence of the cyclic dimer over other motifs. Although such approaches appear promising, they suffer from: (i) drawbacks related to the still noticeable lack of a sufficient number of crystal structures; (ii) the difficulty of taking environmental effects into account; (iii) important approximations in the calculation of the interatomic forces at play in such complex systems. In this respect, non-additive contributions are especially difficult to estimate and quantum mechanical calculations confirmed that the energy gap between different motifs is small and lies within the precision limits of the methods (Meot-Ner et al., 1999; Meot-Ner, 2012). The most important factor to take into account is related to the strong competition of alternate binding motifs. Indeed, in CSD crystal structures, it was established that the probability of formation of dimers was around 30%, the remainder forming hydrogen bonds with a great variety of other acceptors (Steiner, 2001, 2002). Interestingly, unforeseen motifs are still brought to light. To cite only a few of them, new crystal forms of aspirin were recently published (Hursthouse et al., 2011) and a crystallization study of a family of mono-substituted salicylic acid compounds reported an unexpectedly large diversity of motifs (Montis & Hursthouse, 2012). To understand the association rules of these supramolecular synthons and to be able to be truly predictive, we probably still have to expand current databases by orders of magnitude.

Figure 11
Histograms showing the distance distribution between the two O atoms directly involved in the carboxyl(ate)-water hydrogen bond. For clarity, only water molecules positioned in a 1 Å slice above and below the plane defined by the three heavy atoms of the carboxyl(ate) groups are considered. A cut-off of 2.2 Å for d(C=O···H-Ow) or d(C-OH···Ow) was used.

Table 4
Number of catemer-containing structures in the CSD.
Only low R-factor structures (R ≤ 0.05) are taken into account (see complete list in Table S1). Disordered, error-containing, polymeric and powder structures were excluded from the search.

                  No. of structures
Hetero-catemer
SS-A·AS-S         1
SS-A·AA-S         17
SA-A·AS-S         1
SA-A·AA-S         7

. . . and associated catemers
For catemers, we designed a simple rule derived from the carboxyl-carboxyl(ate) dimer nomenclature that postulates that only eight catemer motifs can be formed (Fig. 12). As for dimers, a catemer hierarchy exists, with the SA-S catemer being the most represented (Table 4). The possible origin of the less frequent formation of catemer motifs over the common cyclic dimer has been addressed by several authors and is of special interest in crystal engineering (Beyer & Price, 2000; Das & Desiraju, 2006; Sanphui et al., 2013). Basically, the same factors involved in the preferential formation of one or the other dimer play a role here, namely steric factors, supporting C-H···O interactions and hydrogen-bond competition with various types of chemical groups in addition to specific stereoelectronic effects. These observations stress that intrinsic or local energetic considerations are not sufficient to describe the formation rules of these motifs (Leiserowitz, 1976; Berkovitch-Yellin & Leiserowitz, 1982; Kuduva et al., 1999; Das & Desiraju, 2006; Hursthouse et al., 2011).
As for dimers, new catemer patterns are still being uncovered, such as in 1,2-phenylenedipropynoic acid, where two carboxylic groups from the same molecule are involved in the formation of a SA-A·AS-S catemeric chain (unfortunately the structure was not deposited in the CSD; Saravanakumar et al., 2009). Furthermore, recent examples of carboxylic acid catemer and dimer synthon polymorphs were reported (Gajda et al., 2009; Sanphui et al., 2013). Overall, we characterized 122 catemers that can be compared with the 73 catemers characterized from a survey of the April 1998 CSD (Kuduva et al., 1999). Note that in the present study, we were able to categorize two particularly rare catemers observed in only one instance each (Table 4). This is fortunate since we believe we now have a complete structural sample of each of the eight possible homo- and hetero-catemer structures.

Short hydrogen bonds
Besides these classification attempts, this study supports findings established in earlier surveys on smaller structural samples that hydrogen bonds involving carboxyl-carboxylate dimers are on the shorter and consequently stronger side of hydrogen bonds (Jeffrey & Saenger, 1991; Jeffrey, 1997; Steiner, 2001, 2002; Vishweshwar et al., 2004; Langkilde et al., 2008). It is beyond the scope of this paper to analyse the reasons why such short hydrogen bonds are formed. However, the topic of short or 'strong' hydrogen bonds involving, amongst others, the carboxyl(ate) groups found in proteins has received great attention, especially since they were associated with enzymatic catalytic mechanisms (Perrin & Nielson, 1997; Katz et al., 2002; Gilli & Gilli, 2009; Perrin, 2010; Hosur et al., 2013) involving either the syn or anti lone pairs (Zimmerman et al., 1991).
The carboxyl-carboxyl hydrogen bonds are generally considered as π-cooperative bonds or bonds belonging to the class of 'resonance-assisted hydrogen bonds' (RAHB; Vishweshwar et al., 2004; Bertolasi et al., 2006; Gilli & Gilli, 2009). In these motifs, the COOH donor is activated by π-cooperative hydrogen bonding (O-H···O=C). The carboxyl-carboxylate hydrogen bonds that involve a bond between an acid and its conjugate base fall clearly in a different pool where the stabilizing effect is induced by the presence of the negative charge. These bonds are also called ionic hydrogen bonds (Steiner, 1999; Meot-Ner, 2012) or negatively 'charge-assisted hydrogen bonds' (CAHB; Vishweshwar et al., 2004; Gilli & Gilli, 2009). They are on average ~0.1 Å shorter than the RAHB hydrogen bonds (Fig. 6). This is particularly obvious when both groups have similar pKa values as in protein structures where they play important structural and sometimes catalytic functions (Cleland & Kreevoy, 1994; Hosur et al., 2013).
A third category of hydrogen bonds is found in mono-anion dicarboxylic compounds (Fig. 3). These intramolecular hydrogen bonds can be regarded as very short CAHBs given their average 2.43 Å distance (Fig. 6d). Consequently, they also belong to the strongest class of hydrogen bonds among those involving carboxyl(ate) groups. The shortening of the hydrogen bond is attributed to the presence of the electronegative O acceptor atom. They are probably further stabilized by some synergism due to increased π-delocalization facilitated by their intramolecular character (Perrin & Nielson, 1997). These dimers involve both the anti conformer and a carbonyl lone pair, supporting the view that the lone pair basicity scale might be essentially contextual. Further, these mono-anion dicarboxylic compounds are involved in the formation of at least two types of hetero-catemeric chains: (i) the SA-A·AS-S (Fig. 7d) and (ii) SA-A·AA-S types (Fig. 13d).
Rather counterintuitively, the shortest carboxyl(ate)-water hydrogen bonds involve the neutral carboxyl and not the charged carboxylate group (Fig. 11). Such short hydrogen bonds were analysed by density functional theory (Śmiechowski et al., 2011; Brown et al., 2012) and extensively discussed in a small-molecule neutron diffraction study where the authors were able to demonstrate the associated chain of polarization events (Vishweshwar et al., 2004). The latter group observed that not only charge and resonance assistance can lead to very short intermolecular hydrogen bonds [d(O···O) ≈ 2.4-2.5 Å], but polarization assistance must also be considered in terms of σ-cooperative stabilization (see Fig. 9a). These synergistic effects were named 'synthon-assisted hydrogen bonds' or SAHB (Brown et al., 2012). Examples of such multi-centred short hydrogen bonds can also be found in biomolecular systems and might play a significant role at catalytic sites (Cleland & Kreevoy, 1994; Katz et al., 2002).

Implications for biomolecular systems
Carboxyl dimers that involve simultaneous protonation of two Asp/Glu amino acids have not been reported in biomolecular systems, although carboxyl-carboxylate dimers appear to be relatively frequent in a wide pH range that can extend to 8.0 (Sawyer & James, 1982; Flocco & Mowbray, 1995; Torshin et al., 2003; Wohlfahrt, 2005; Langkilde et al., 2008). The formation of such interactions is surprising since it is generally assumed that given the pKa of the Asp (~3.9) and Glu (~4.3) residues (Pace et al., 2009), they would be deprotonated at physiological pH. As an outcome, carboxyl(ate) groups can form four different dimer types that extend to 16 when the two Asp/Glu amino-acid types are considered. However, since H-atom positions can rarely be observed in macromolecular systems, SA and AS dimers cannot be differentiated and this number reduces to nine due to degeneracy.
It was reported that the SA/AS arrangement is the most common in proteins (62%) followed by SS (24%) and AA (14%; Wohlfahrt, 2005), in contrast to the present study where the SS dimer dominates (Table 3). This originates probably from the better accessibility of the anti lone pairs of the Asp/Glu residues that are not shielded by large chemical groups, as is observed in a majority of CSD structures. However, it remains to be determined whether the SA or the AS arrangement is favoured or whether they are energetically not differentiable; in other words, whether the anti conformer is preferred over the syn conformer or whether these preferences are contextual, as so often witnessed in all types of chemical systems. Theoretical calculations on model systems favour the AS arrangement (Wohlfahrt, 2005), while the present study identifies the SA arrangement as being the most frequent (Table 3).
To identify the protonated state of Asp/Glu residues in X-ray structures, efforts based on stereochemical factors have been made. The most obvious consideration relates to the hydrogen-bond proximity of two carboxyl(ate) O atoms, the associated distance being generally well below 2.7 Å (Sawyer & James, 1982; Ramanadham et al., 1993; Flocco & Mowbray, 1995; Torshin et al., 2003; Wohlfahrt, 2005; Langkilde et al., 2008). The carboxyl C-O(H) and C=O bond lengths differ by ~0.1 Å (Table 2) and the bond electron densities have also been exploited in the analysis of high-resolution protein structures (≤ 1.3 Å), leading to clear identification of protonated Asp/Glu residues (Ahmed et al., 2007; Fisher et al., 2012). In the absence of good neutron diffraction structures (Ahmed et al., 2007; Hosur et al., 2013), such techniques could help to unscramble the degeneracy issue mentioned above. On a similar line of thought, short side-chain Asp/Glu carboxyl(ate) to Ow distances could be used to infer protonation states of the residues (Ramanadham et al., 1993).

Summary and perspectives
This work illustrates the diversity of supramolecular motifs generated by a single chemical group and offers a comprehensive carboxyl-carboxyl(ate) dimer and catemer nomenclature. As noted above: (i) 17 possible carboxyl-carboxyl(ate) interaction modes including syn and anti conformers as well as carbonyl lone pairs were identified; (ii) among them, the cyclic dimer is the most represented; (iii) instances of all other possible interaction modes were found in the CSD, except the two as-a and aa-s 'hydroxyl dimers'; (iv) based on this classification, eight catemeric types could be uniquely identified; (v) the anti conformers are well represented and form distinguishable supramolecular motifs implying no significant basicity difference between the syn and anti lone pairs; (vi) the strongest (intramolecular) hydrogen bonds are observed in mono-anion dicarboxylic compounds and involve simultaneously an anti conformer and an anti lone pair, supporting the fact that anti interactions are by no means weaker than syn interactions; (vii) the shortest hydrogen-bond lengths found in this survey, including those formed with water molecules, are close to 2.36 Å (Fig. 6d); (viii) cooperative effects appear to be important in probably all systems involving carboxyl(ate) groups and should always be considered.

research papers Acta Cryst. (2015). B71, 164-175
Although significant progress has been achieved in crystal engineering, it seems appropriate to recall a sobering thought by Steiner, who wrote in a paper on hydrogen-bond competition: 'Even though it is true that strong hydrogen-bond donors tend to interact with strong acceptors, this is valid only as a tendency. Weak acceptors also have a certain chance of attracting the strong donor. This weakens the general applicability of rules for predicting hydrogen-bond modes from hierarchies of donor and acceptor strengths and indeed all such rules published are very unreliable in practice' (Steiner, 2001). Further, Desiraju, witnessing the constant discovery of unforeseen structures, noted that after all: 'it would seem that brute-force method will eventually win' (Desiraju, 2007), suggesting that many more interaction rules of increasing complexity will be brought to light and that concerted but also serendipitous crystallization experiments are still very much needed to make progress in the field. These considerations on small supramolecular synthons apply fully to biomolecular systems where carboxyl(ate) groups are found to adapt in surprising and still insufficiently documented ways to their local environment.