Structure-based screening of binding affinities via small-angle X-ray scattering

Protein-protein and protein-ligand interactions can alter the scattering properties of participating molecules, and thus be quantified by solution small-angle X-ray scattering (SAXS). In such cases, scattering reveals structural details of the bound complex, number of species involved, and in principle strength of the interaction. However, determining binding affinities from SAXS-based titrations is not yet an established procedure with well-defined performance expectations. We thus used periplasmic binding proteins and in particular histidine-binding protein as a standard reference, then examined precision and accuracy of affinity prediction at multiple concentrations and exposure times. By analyzing several structural and comparative scattering metrics, we found that the volatility of ratio between titrated scattering curves and a common reference most reliably quantifies ligand-triggered changes. This ratio permits the determination of affinities at low signal-to-noise ratios and without pre-determining the complex scattering, demonstrating that SAXS-based ligand screening is a promising alternative biophysical method for drug discovery pipelines.

SIGNIFICANCE
Solution X-ray scattering can be used to screen a set of biomolecular interactions, which yields quantitative information on both structural changes and dissociation constants between binding partners. However, no common benchmarks yet exist for the application of SAXS within drug discovery workflows. Thus, investigations into its performance limitations are currently needed to make SAXS a reliable source for high-throughput screening. This study establishes a generalizable protocol based on protein-ligand interactions, and demonstrates its reproducibility across several beamline setups. In the simplest case, the micromolar binding affinities can be determined directly from measured intensities without knowledge of the molecular structure, with material consumption that is competitive with other biophysical screening techniques.


INTRODUCTION
Small-angle X-ray scattering (SAXS) is a widely used technique to examine structural features on the micrometer and nanometer scales, offering ready access to the physical behaviours of biomolecules in a solution environment. (1-3) Within this context, SAXS reports the globally-averaged distance distribution between scattering electron densities around all atoms. This distribution is obtained by measuring the excess intensity of a sample solution over that of an equivalent buffer solution without sample. The scattered photon intensity I(q), as a function of the momentum transfer q and its associated scattering angle 2θ between the incident and deflected beams, obeys:

$$I(q) = 4\pi \int_0^{D_{max}} P(r)\, \frac{\sin(qr)}{qr}\, dr,$$

where P(r) is the desired distance distribution between all atoms in and nearby the molecule with maximum spatial diameter D_max, weighted by the excess electron density. The globally-averaged nature of SAXS implies that a mixture of multiple species with negligible long-range spatial interactions will linearly contribute their respective scattering intensities. Thus, by titrating two species at different input concentrations and measuring I(q) at each point, their populations at equilibrium can in principle be retrieved along with respective structural information. This is the basis of SAXS-based ligand screening. (4,5)

For illustration, consider a simple two-state interaction between a receptor R and its ligand L that forms a complex RL. The concentrations of these three species at equilibrium are governed by a dissociation constant K_D that describes the strength of the interaction:

$$K_D = \frac{[R][L]}{[RL]}.$$

Thus, K_D can be measured by titrating R and L at multiple input ligand:receptor ratios. When the size of L is small, its scattering is relatively constant over the q-ranges covered by SAXS. If we assume that measured I(q) changes directly correspond to the balance between R and RL once the constant contribution has been accounted for, then dissociation constants can be directly determined from perturbations of the SAXS signal (Fig. 1).

The availability of high-intensity synchrotron sources and automated workflows (6-9) enables precise SAXS measurements within seconds, or far less using time-resolved setups. (10) Thus, in the context of screening, a single synchrotron experiment can feasibly screen a small set of candidate ligands and identify members that interact strongly, weakly, or not at all. The precision is such that SAXS can track subtle changes regardless of whether the biomolecules are composed of well-folded domains (11,12) or disordered chains. (13,14) This capability has been leveraged to conduct titrations of diverse biomolecular systems: proteins, (15-20) nucleic acids, (21,22) detergents, (23,24) and others, in order to expose possible structural mechanisms that underlie their functional behavior.
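The two-state picture above can be sketched numerically. The following is a minimal illustration, not the authors' implementation: it solves the 1:1 binding quadratic for the bound fraction and forms the population-weighted intensity of a receptor/complex mixture. The function names and the example numbers are assumptions introduced here.

```python
import math

def fraction_bound(r_total, l_total, kd):
    """Fraction of receptor present as complex RL for 1:1 binding.

    All three arguments share one concentration unit (e.g. µM).
    """
    b = r_total + l_total + kd
    # Physical root of [RL]^2 - b*[RL] + r_total*l_total = 0
    rl = (b - math.sqrt(b * b - 4.0 * r_total * l_total)) / 2.0
    return rl / r_total

def mixture_intensity(i_free, i_bound, f_bound):
    """Linear combination of free and bound scattering curves on a shared q-grid."""
    return [(1.0 - f_bound) * a + f_bound * b for a, b in zip(i_free, i_bound)]

# Hypothetical example: 10 µM receptor, 10 µM ligand, K_D = 10 µM
f = fraction_bound(10.0, 10.0, 10.0)
print(round(f, 4))
print(mixture_intensity([1.0, 0.5], [0.8, 0.3], f))
```

The constant ligand contribution mentioned in the text is deliberately left out of this sketch.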
The majority of the above studies concentrate on the structural information obtained, although several do utilize SAXS to additionally compute K_D. Broad uptake of affinity-based applications is hindered by a number of practical difficulties. Firstly, a ligand screen may produce novel complex conformations that are not known prior to the experiment. This complicates the application of population modelling to compute K_D directly, since the scattering signals of the corresponding purified complex are not available. A reliable alternative procedure for K_D prediction must therefore be found using a replacement metric. Secondly, no broadly applicable guidelines have yet been established on how to formulate protocols for SAXS-based ligand screening. While prospective beamline measurements will provide sufficient context to establish sample requirements and measurement protocols, one lacks screening-specific details such as their influence on the observable range and precision of K_D values. To help tackle this challenge, we conducted reference titrations for a series of bacterial periplasmic binding proteins (PBPs), (25,26) leveraging their consistent purification and SAXS-detectable compaction upon ligand capture (27) to yield a protocol that can be replicated across multiple beamlines. The availability of SAXS titrations at multiple ligand K_D and multiple protein concentrations enables us to propose the volatility of ratio V_R as a viable scattering metric for generalized ligand screening, and to begin defining practical sensitivity ranges relevant to SAXS-based ligand screening.
The production and purification of GlnBP and DEBP follow previous protocols. (28) We summarize these below with slight modifications for HisBP. After plasmid amplification in E. coli DH5α cells, protein expression was carried out in E. coli BL21(DE3) cells. Production cultures were grown in LB media at 37 °C supplemented with either 100 µg/ml ampicillin (GlnBP/DEBP) or 50 µg/ml kanamycin (HisBP). During prospective trials, this was replaced by ¹⁵N-labelled minimal media for parallel SAXS, ITC, and NMR experiments. Once OD₆₀₀ exceeded 0.6, cultures were induced with 1 mM isopropyl β-D-1-thiogalactopyranoside for 4 hr. Cells were spun down, resuspended in loading buffer (500 mM NaCl, 20 mM imidazole, and 20 mM NaH₂PO₄ [pH 7.4]), lysed by sonication, then filtered before conducting His₆-tag-based purification by Ni-nitrilotriacetic acid affinity (Ni-NTA) chromatography. After initial loading and washing, on-column refolding was conducted via a decreasing urea gradient from 8 M to 0 M to remove bound endogenous ligands, and the protein was then eluted in buffer (500 mM NaCl, 300 mM imidazole, and 20 mM NaH₂PO₄ [pH 7.4]). HisBP was further exchanged back into loading buffer, cleaved overnight via addition of TEV protease, then re-loaded onto a second Ni-NTA column to separate the similar-sized N-terminal tag. All PBPs were finally subjected to size-exclusion chromatography (SEC) in the final buffer used for all experimental measurements (100 mM NaCl, 0.5 mM TCEP and 20 mM NaH₂PO₄ [pH 7.4]), except for NMR measurements where samples were diluted due to the addition of 9% D₂O.

SAXS measurements
Candidate ligands for each protein were selected on the basis of known affinities in literature and chemical similarity. Titrations were conducted at fixed protein concentrations to exclude signal-to-noise factors. An initial mass concentration of 2.0 or 4.0 mg ml⁻¹ was employed in prospective experiments, then scaled to 2.0, 1.0, 0.5, and 0.25 mg ml⁻¹ in later concentration variation studies. 12-point titrations were prepared at ligand:protein ratios of 0.0, 0.2, 0.6, 0.8, 0.9, 1.0, 1.1, 1.2, 1.5, 2.0, 4.0, and 10.0 to cover a theoretical K_D sensitivity within 2 orders of magnitude of the fixed protein concentration employed, assuming ∼5% random error (Fig. S7). A total of eight beamline experiments were conducted, using the automated sample changing environments available on-site, operated in constant-flow mode: three at ESRF BM29, three at DESY P12, one at Australian Synchrotron SAXS/WAXS, and one at Diamond B21. We note that one prospective session at ESRF and two at DESY have been excluded from this study. Further, experiments at ESRF BM29 and Diamond B21 were carried out on site, while the included DESY P12 experiment was carried out by mail-in. For these measurements, samples were prepared by pipetting ligand:protein mixtures onto 96-well plates prior to transport. For the Australian Synchrotron measurements, apo-HisBP was lyophilized in its final buffer, reconstituted on-site using pure H₂O, and then mixed with ligands in 96-well plates. Measurement parameters for each site are catalogued in Table 1. For brevity, we report six representative scattering curves for HisBP, GlnBP, and DEBP in their free and native ligand-bound forms in Table S5.

Automated analysis pipeline
The SAXScreen workflow was utilized for the semi-automated prediction of binding curves from scattering intensities, (20) with modifications for quantitative prediction. Buffer-subtracted intensities provided by the respective synchrotron pipelines were used as starting points for analysis. DATGNOM (30) was used to determine a maximum molecular extent D_max = 7.2 for the PBPs, by averaging values found for apo-HisBP at a preliminary ESRF experiment. This was fixed for all computations involving D_max, including V_R. The usable q-range was visually determined by conservatively discarding aggregation and noise-dominated regions, and also adjusted to match cross-site data for comparison of similar HisBP concentrations: 0.2-2.0 nm⁻¹ below 20 µM, 0.25-2.5 nm⁻¹ at 40 µM, and 0.3-3 nm⁻¹ above 80 µM. The q-range for DEBP and GlnBP was set to 0.1-2.8 nm⁻¹. We considered the data cleaning algorithms provided by SAXS Merge to further exclude user bias, (31) but did not use these in the final analysis. A common scattering reference was created per HisBP concentration and per beamline experiment by averaging apo-protein replicates after discarding outliers. This reference was used for both V_R and χ_lin computations. Further explanation of V_R is available in the Supplementary Material. Other metrics were computed as previously reported. (20) To fit binding curves against metrics, we consider as a single set all titrations conducted per protein concentration per beamline experiment. After eliminating measurement artifacts, the set of curves was fitted using a single Powell-minimization algorithm over the following free parameters: the shared apo-value, the holo-values for each ligand titration, and the log dissociation constants log₁₀ K_D for each ligand titration. Ill-defined regions during fitting are capped by limiting K_D to within 4 orders of magnitude of the receptor concentration (c.f. Fig. S7).
Two uncertainty estimation methods were implemented to define error bars in K_D. For metrics possessing measurement uncertainty (χ_lin, R_g and V_c), 1000 replicates were produced by adding Gaussian noise with σ equal to the measurement uncertainty. Error bars indicate the σ of log K_D from the ensemble of replicate predictions. For metrics where uncertainty was not computed (V_R and PV), the uncertainty was estimated by single-point removal, where K_D was recomputed after removal of each titration point. The σ of the resulting replicates is taken as the σ of log K_D. To eliminate cases where the fitting process fails due to ill-constrained parameters, replicates were discarded if the fitted apo-holo differences were > 3σ from the mean value of all replicates.
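The fitting and single-point-removal steps described above can be illustrated with a simplified sketch. Note the substitutions: a grid search over log₁₀ K_D stands in for the Powell minimization, only one ligand titration is fitted (so there is no shared apo-value across ligands), and apo/holo values are solved by linear least squares at each trial K_D. All function names and the synthetic data are assumptions made for illustration.

```python
import math
import statistics

def fraction_bound(r_total, l_total, kd):
    """Bound receptor fraction for 1:1 binding."""
    b = r_total + l_total + kd
    return (b - math.sqrt(b * b - 4.0 * r_total * l_total)) / (2.0 * r_total)

def fit_log_kd(ratios, values, r_total, steps=161):
    """Best log10(K_D) on a grid spanning 4 orders either side of r_total."""
    centre = math.log10(r_total)
    best = None
    for k in range(steps):
        log_kd = centre - 4.0 + 8.0 * k / (steps - 1)
        f = [fraction_bound(r_total, x * r_total, 10.0 ** log_kd) for x in ratios]
        n = len(f)
        mf, mv = sum(f) / n, sum(values) / n
        sff = sum((a - mf) ** 2 for a in f)
        if sff == 0.0:
            continue
        # Metric model: value = apo + (holo - apo) * f, linear in apo and slope
        slope = sum((a - mf) * (v - mv) for a, v in zip(f, values)) / sff
        apo = mv - slope * mf
        sse = sum((apo + slope * a - v) ** 2 for a, v in zip(f, values))
        if best is None or sse < best[0]:
            best = (sse, log_kd)
    return best[1]

def leave_one_out_sigma(ratios, values, r_total):
    """Standard deviation of log10(K_D) after removing each titration point in turn."""
    fits = [
        fit_log_kd(ratios[:i] + ratios[i + 1:], values[:i] + values[i + 1:], r_total)
        for i in range(len(ratios))
    ]
    return statistics.stdev(fits)

# Synthetic, noise-free titration: 20 µM receptor, K_D = 20 µM (illustrative values)
ratios = [0.0, 0.2, 0.6, 0.8, 0.9, 1.0, 1.1, 1.2, 1.5, 2.0, 4.0, 10.0]
values = [0.02 + 0.10 * fraction_bound(20.0, x * 20.0, 20.0) for x in ratios]
print(fit_log_kd(ratios, values, 20.0), leave_one_out_sigma(ratios, values, 20.0))
```

The grid bounds mirror the cap of 4 orders of magnitude around the receptor concentration stated in the text; the grid resolution (0.05 log units here) limits the precision of this sketch.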

ITC
All ITC measurements were conducted on a Malvern MicroCal PEAQ ITC at 20 °C with stirring at 200 rpm, using the same buffer as in SAXS measurements. The cell was initially loaded with 200 µl of 20 µM protein with an initial ligand concentration of 200 µM in the syringe to quantify HisBP:His binding. This was further adjusted in subsequent runs to resolve weaker interactions as necessary. ITC titration curves for all experiments are shown in Figure S14 along with the concentrations used.

[...] (2013)(37), where the shift in peak position is used to derive free and bound populations without considering peak intensities. The effect of site-specific exchange rates upon peak positions is taken into account. (38) We discarded residues with either insignificant shifts at saturation (δ < 0.05), or which were in slow exchange such that peak positions alone were no longer informative. This resulted in 110 sites included in the fitting process. The binding interaction is modeled with site-specific bound-state CSP, site-specific bound-state lifetimes τ, and a single overall K_D. This translates to 221 degrees of freedom, fitted to 1468 target CSP values.

Evaluation of multiple PBPs at ESRF Grenoble
Within the in vivo context, PBPs serve as nutrient sensing and transport proteins, and are physically composed of two hinged globular domains each providing one half of a ligand binding site within their central cleft. We surveyed a set of amino acid specific PBPs with preliminary SAXS measurements conducted at ESRF BM29 and DESY P12 to identify a member that is best suited for evaluating the performance of SAXS-based ligand screening: histidine-binding protein (HisBP), glutamine-binding protein (GlnBP), and aspartate/glutamate-binding protein (DEBP). In the absence of ligand, PBPs exist either as purely open conformations, as is the case for DEBP, or in a pre-existing open-closed conformational equilibrium, as in the cases of HisBP and GlnBP. (41) Ligand binding universally increases the population of closed/bound conformations, where the physical compaction is measurable as a small but significant change in the SAXS radius of gyration R_g: 1.34±0.05 Å for HisBP:His binding, 0.66±0.29 Å for GlnBP:Gln binding, and 3.23±0.23 Å for DEBP:Glu binding. The net scattering changes can also be summarized via other parameters, such as Porod volume PV, linearity of fit χ_lin, (20) volume of correlation V_c, (42) and volatility of ratio V_R. (43) V_R in particular compares the rate of intensity changes between two scattering curves; its nomenclature arises from economics, where V_R is used to compare the "riskiness" between two stocks, i.e. which is more prone to rapid fluctuations in price. The adoption of V_R in SAXS serves as a generalized measure of scattering difference that is unbiased over q and also insensitive to protein size. To enable comparisons between titrations within a single SAXS experiment, we average the replicate apo-protein scattering curves per concentration and compute V_R against these averages, and further take into account unbound ligand scattering by adding constant corrections to minimize V_R. The scattering perturbations of a series of ligands against HisBP, GlnBP, and DEBP at a preliminary ESRF session are summarized in Fig. 1c. The high protein concentrations in this initial test (2-4 mg ml⁻¹) give rise to excellent signal-to-noise ratios. This was necessary to detect the relatively minute GlnBP perturbations, and in addition limits the deterioration of accuracy due to lost data points. Despite the loss of the 0.2:1 and 0.6:1 scattering curves in DEBP titrations, binding could still be quantified. Of the tested candidates, the significant scattering perturbations and availability of both nanomolar and micromolar interactions suggest that HisBP is the best candidate to use for the further exploration of screening protocols.

(Caption fragment: titrations that produce no significant changes in scattering curves have been denoted as no change.)
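To make the V_R comparison concrete, the sketch below uses one commonly quoted formulation: the ratio of two curves is averaged into bins, and the normalised jumps between adjacent bins are summed. This formulation, the equal-width binning, the `n_bins` parameter, and the trial-offset scan are all assumptions made for illustration; the authors' exact definition (including how D_max enters) is given in their Supplementary Material and may differ.

```python
import math

def volatility_of_ratio(i_sample, i_reference, n_bins=10):
    """Sum of normalised jumps in the binned intensity ratio (assumed formulation)."""
    ratio = [a / b for a, b in zip(i_sample, i_reference)]
    size = len(ratio) // n_bins
    if size == 0:
        raise ValueError("need at least n_bins points")
    # Average the ratio within equal-sized bins to suppress point-to-point noise.
    binned = [sum(ratio[k * size:(k + 1) * size]) / size for k in range(n_bins)]
    # Each term is scale-invariant, so an overall concentration error cancels.
    return sum(abs(a - b) / ((a + b) / 2.0) for a, b in zip(binned, binned[1:]))

def vr_with_constant_correction(i_sample, i_reference, offsets, n_bins=10):
    """Smallest V_R after subtracting each trial constant (e.g. flat ligand scattering).

    Offsets are assumed small enough that corrected intensities stay positive.
    """
    return min(
        volatility_of_ratio([v - c for v in i_sample], i_reference, n_bins)
        for c in offsets
    )

# Synthetic Guinier-like curves; the R_g values (1.9 and 1.8 nm) are illustrative only.
q = [0.05 * (i + 1) for i in range(40)]
reference = [math.exp(-(x * 1.9) ** 2 / 3.0) for x in q]
sample = [math.exp(-(x * 1.8) ** 2 / 3.0) for x in q]
print(volatility_of_ratio(sample, reference))
```

Identical curves, or curves differing only by a scale factor, give V_R = 0 in this formulation, which is consistent with the text's remark that V_R of independent apo measurements should ideally be zero.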
Aside from the binding curves exhibited in V_R, we also observe analogous binding interactions exhibited by the majority of structural parameters (Fig. S1). However, these other parameters appear to exhibit significantly lower precision due to measurement and preparation limitations. While χ_lin also corrects for small concentration variations, and appears to be the next-best performing metric, applying linear fitting to raw intensities appears to over-emphasize the contributions from low angles where signal-to-noise is at its optimum. This renders χ_lin vulnerable to residual aggregation artifacts, which also affect the net molecular compactness as measured by R_g. These limitations to sample purity are particularly problematic for GlnBP, where the scattering intensity itself experiences a maximum of only 10% change upon ligand capture (Fig. S2). The overall size as reported by Porod volume is limited by our ability to precisely prepare molecular concentrations, estimated to similarly possess ∼10% error. Only DEBP exhibits an apparent size change greater than this margin. We also note that V_c appears to pick up universal binding at high ligand saturation, inconsistent with all other metrics. This is likely to be a false detection of gradual buffer differences stemming from unbound ligand and counter-ions. Regardless of the metric used, significant artifacts such as those from sample loss cannot be corrected post-hoc and must be manually removed.
While the scattering perturbations closely resemble binding curves, the derived K_D values are not always in agreement with nominally equivalent estimates derived from ITC (Table 2). For instance, almost all SAXS titrations estimate the nanomolar HisBP:His and GlnBP:Gln interactions instead to be 10-100 fold weaker. Curiously, SAXS suggests that DEBP binds not only its known ligands Glu and Asp, but also their polar equivalents Gln and Asn. Here, it is important to note that the two sources need not be consistent because SAXS measures structural change while ITC measures binding thermodynamics. SAXS-based K_D reports the structural equilibrium between bound-closed/free-closed versus both bound-open/free-open conformations. If ligand binding alone is not sufficient to fully stabilize a closed conformation, then this may differ from the ITC measurement, which reports the net heat produced arising from both ligand-binding and conformational change steps. Another explanation for the discrepancy is that the sample requirements in our SAXS protocols are unable to effectively discriminate nanomolar affinities for PBP-sized biomolecules (∼30 kDa). This is explored further below.

HisBP replication studies
We investigated the sensitivity limits of SAXS-based screening by conducting HisBP titrations at multiple protein concentrations, selected to explore the balance between sample requirements and discriminative power between nanomolar affinities. Resulting V_R measurements at four beamlines are reported in Fig. 2 with respective K_D predictions visualized in Fig. 3. A summary of K_D values is included in Table 2 with full values in Table S1. For completeness, we include results for χ_lin, R_g, and V_c in Figures S3-S5 and corresponding K_D values in Tables S2-S4.
We first find that the total V_R change upon ligand saturation decreases with decreasing HisBP concentration, due to the concomitant decrease in usable q-range. In the case of HisBP, dropping concentrations from 40 µM to 20 µM leads to loss of detectable intensity changes above q = 2 nm⁻¹ (Fig. S6). The remaining scattering range is nevertheless sufficient to quantify binding at most concentrations. The minimum viable concentration is imposed by total sample exposure time: the short exposure times of 4-5 s trialled at DESY and ESRF provided insufficient signal-to-noise ratios to quantify scattering changes using 10 µM protein, corresponding to 0.26 mg ml⁻¹. In contrast, long 40 s exposures trialled at Diamond enable screening at this concentration, which is near the sample limitations recommended for synchrotron scattering. These findings suggest that the trade-off between performance and sensitivity occurs below 20 µM for HisBP. Although exact numbers are also influenced by additional factors such as source intensity and sample stability, we expect comparable limitations for similarly-sized systems exhibiting a total V_R change of 0.1.
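The molar and mass concentrations quoted above are related by the molecular weight. As a small worked conversion, the sketch below assumes a molecular weight of 26 kDa for HisBP, a value inferred here from the stated equivalence of 10 µM and 0.26 mg ml⁻¹ rather than reported directly.

```python
HISBP_MW_DA = 26_000.0  # assumed: inferred from 10 µM ≈ 0.26 mg/ml in the text

def micromolar_to_mg_per_ml(conc_um, mw_da=HISBP_MW_DA):
    """mol/l times g/mol gives g/l, which is numerically equal to mg/ml."""
    return conc_um * 1e-6 * mw_da

def mg_per_ml_to_micromolar(conc_mg_ml, mw_da=HISBP_MW_DA):
    """Inverse conversion from mass to molar concentration."""
    return conc_mg_ml / mw_da * 1e6

# Mass concentrations used in the concentration variation studies
for mass in (2.0, 1.0, 0.5, 0.25):
    print(mass, "mg/ml ->", round(mg_per_ml_to_micromolar(mass), 1), "µM")
```

Under this assumption the 0.5 mg ml⁻¹ series sits close to the 20 µM threshold discussed in the text.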
Repeated HisBP titrations also show that SAXS-based titrations conform to the K_D range limits imposed by two-state binding (Fig. S7). In particular, we observe a strong dependence upon protein concentration of the fitted K_D values for HisBP:His, and to a lesser extent HisBP:Arg (Fig. 3). This systematic error excludes the alternative hypothesis that histidine is unable to fully stabilize a closed conformation, which is also corroborated by observations in NMR chemical shifts of slow kinetics and saturation near 1:1 ratios (Fig. S8). Although we initially estimated that K_D cannot be distinguished below 2 orders of magnitude of protein concentrations, the practical limit appears to be somewhat closer to 1. Regardless of the quantitative limitations, V_R retains the qualitative ranking between the four ligands for the majority of experiments. Longer exposure protocols appear to provide both more consistent ranking and less severe systematic errors. Further analysis of scattering parameters also suggests structural differences between the His-bound and Arg-bound HisBPs. Both V_R and χ_lin indicate larger perturbations elicited by His binding, in contrast to their comparable total change in R_g. This suggests that HisBP reorients its domains laterally to accommodate the larger Arg sidechain. We further confirmed the results in NMR titrations of the two HisBP ligands: while the active site is shared, the extent and direction of shifts are altered at numerous locations (Figs. S8-S10).
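The range limit imposed by two-state binding can be illustrated by comparing ideal titration curves for K_D values spaced a decade apart. This is a hedged sketch, assuming 1:1 binding, a 20 µM receptor, and the 12 ligand:protein ratios listed in the SAXS measurements section; the helper names are introduced here for illustration.

```python
import math

RATIOS = [0.0, 0.2, 0.6, 0.8, 0.9, 1.0, 1.1, 1.2, 1.5, 2.0, 4.0, 10.0]

def fraction_bound(r_total, l_total, kd):
    """Bound receptor fraction for 1:1 binding."""
    b = r_total + l_total + kd
    return (b - math.sqrt(b * b - 4.0 * r_total * l_total)) / (2.0 * r_total)

def max_curve_difference(r_total, kd_a, kd_b, ratios=RATIOS):
    """Largest gap in bound fraction between two K_D values over the titration."""
    return max(
        abs(fraction_bound(r_total, x * r_total, kd_a)
            - fraction_bound(r_total, x * r_total, kd_b))
        for x in ratios
    )

r = 20.0  # µM receptor, illustrative
for factor in (0.001, 0.01, 0.1):
    gap = max_curve_difference(r, factor * r, 10.0 * factor * r)
    print(f"K_D = {factor}x vs {10 * factor:g}x receptor: max difference {gap:.3f}")
```

The gap between curves shrinks steadily as K_D falls further below the receptor concentration, so a fixed level of random error eventually masks the difference between tight binders.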

DISCUSSION
The practical niche for SAXS-based ligand screening appears to be a complementary filter that grants structural information on the bound configurations, providing information on the equilibrium between structurally distinguishable states. This provides a unique advantage in determining whether candidate ligands elicit or inhibit structural changes required for native function, serving as an in vitro discriminator between agonists and antagonists in the context of drug discovery. When permitted by sample stability and available beam time, it is possible to quantitatively derive K_D at the lower sample limits of synchrotron scattering, giving SAXS a competitive advantage on this aspect versus alternative structural biology methods. On the other hand, the ability to differentiate between strong interactions is limited by minimum receptor concentrations required to obtain scattering data and detect potentially minute variations. Outside of large (>100 kDa) complexes exhibiting significant global rearrangements, we do not expect SAXS to provide nanomolar resolution. While the tested scattering parameters all provide information on the mixture of states during titration, V_R appears to be the most reliable method to extract the underlying populations and thus a means of quantifying K_D. Notably, no assumption is made of the substrate material. It is possible that V_R can quantify structural alterations in other existing applications of SAXS, for instance cellular ultrastructure, (44) whole-cell morphology, (45) and beyond.
The HisBP titrations conducted here are sufficient to provide initial guidelines on viable SAXS-based ligand screening protocols. A 20 µM limit for HisBP translates to a minimum sample requirement of 0.31 mg per titration, competitive with ∼0.24 mg used per ITC run and 2.4 mg used for NMR titration. This is again likely to hold for similarly-sized systems exhibiting comparable structural perturbations. The minimum consumption is ultimately determined by the replicate measurements necessary to reliably determine apo, 1:1, and ligand-excess scattering. This is in turn determined by the accuracy of V_R measurements versus its expected change due to structural perturbations. We note that V_R of independent apo measurements constitutes a measure of accuracy, as its ideal value is zero. In contrast, the expected V_R changes due to binding are system-dependent. Further work will be needed to derive an algorithm that predicts the expected V_R changes based on the receptor structure or scattering pattern, and thus determine screening viability. This will also help determine whether V_R is a universally applicable metric for affinity computation, both in terms of the type of structural changes and in terms of reaction complexity. It is possible that other metrics may be superior in cases where intensity changes are restricted to particular q-regions, or where more than two states are involved. Here, we expect that direct population modelling of intensities will persist as the theoretically optimal choice.
The throughput performance of each beamline used here varies between 3.7 hours and 6 hours per 96-well plate depending on overall exposure time, sample handling, and cleaning of measurement capillaries. The last two are major limiting factors to further improvement for solution setups, where the next breakthroughs in both speed and precision are likely to take place on existing robotic and promising microfluidic-chip (46-48) platforms. Although not employed here, we note that the SIBYLS beamline claims the highest throughput so far at ∼15 minutes per plate with exposures on the order of one second. Adjusting for the 20 s exposure times required to improve precision, this slows to ∼45 minutes per plate, sufficient to screen 10² ligands in a 24-hour session. For reference, our ITC and NMR protocols require ∼48 hours to accumulate 96 measurements. This is significantly slower, but is less constrained by available access.
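The throughput estimate can be checked with simple arithmetic. The sketch below assumes that every ligand consumes one 12-point titration (as in the SAXS measurements section) and that no wells are reserved for buffer or apo replicates, so it gives an upper bound; the function name and defaults are illustrative.

```python
def ligands_per_session(minutes_per_plate, session_hours=24,
                        wells_per_plate=96, points_per_titration=12):
    """Upper bound on ligands screened in one session, using whole plates only."""
    plates = (session_hours * 60) // minutes_per_plate
    titrations_per_plate = wells_per_plate // points_per_titration
    return plates * titrations_per_plate

# ~45 min per plate (the adjusted SIBYLS estimate quoted in the text)
print(ligands_per_session(45))
# 3.7 h and 6 h per plate (the range observed at the beamlines used here)
print(ligands_per_session(222), ligands_per_session(360))
```

At ∼45 minutes per plate this bound is a few hundred ligands, the same order of magnitude as the 10² quoted above once wells for references and replicates are set aside.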

CONCLUSIONS
In summary, this work presents a detailed analysis of the accuracy and precision of SAXS-based K_D determination, using sample setups that are competitive with secondary screening approaches used to complement high-throughput screening. In comparison to the throughput of pharmacological screening assays (up to 10⁶ compounds), the throughput here is sufficient to validate a selection of initial hits and inform on the structural implications of ligand binding. This translates to discrimination between likely agonists and antagonists, potential oligomerization, and other observations that may not be available from standard pharmacological assays. In this way, SAXS-based ligand screening can benefit drug discovery pipelines as an independent source of structural information for solution biological processes, without requiring isotope-labelling for NMR or reliable crystallization protocols. Further work will be needed to confirm that V_R can be used to directly retrieve K_D regardless of the biological system being used, which will also contribute towards a useful library of reference data to guide future screening efforts. Along with potential feasibility studies using lab-based X-ray sources, we may yet see a mass uptake of this particular structure-based screening tool.

AUTHOR CONTRIBUTIONS
PC and JH conceived and designed the research. PC conducted NMR and the majority of SAXS measurements, analyzed data, and updated SAXScreen software. PM conducted half of sample preparations and trained PC to conduct the other half. KP conducted the majority of ITC measurements and trained PC for the remaining measurements, as well as analysis. JH conducted