REGALS: a general method to deconvolve X-ray scattering data from evolving mixtures

Mixtures of biological macromolecules are inherently difficult to study using structural methods, as increasing complexity presents new challenges for data analysis. Recently, there has been growing interest in studying evolving mixtures using small-angle X-ray scattering (SAXS) in conjunction with time-resolved, high-throughput, or chromatography-coupled setups. Deconvolution and interpretation of the resulting datasets, however, are nontrivial when neither the scattering components nor the way in which they evolve are known a priori. To address this issue, we introduce the REGALS method (REGularized Alternating Least Squares), which incorporates simple expectations about the data as prior knowledge and utilizes parameterization and regularization to provide robust deconvolution solutions. The restraints used by REGALS are general properties such as smoothness of profiles and maximum dimensions of species, which makes it well-suited for exploring datasets with unknown species. Here we apply REGALS to analyze experimental data from four types of SAXS experiment: anion-exchange (AEX) coupled SAXS, ligand titration, time-resolved mixing, and time-resolved temperature jump. Based on its performance with these challenging datasets, we anticipate that REGALS will be a valuable addition to the SAXS analysis toolkit and enable new experiments. The software is implemented in both MATLAB and Python and is freely available as an open-source package.


Introduction
unknown before the experiment is performed and must be inferred from the data itself. In such cases, the challenge is to identify appropriate mathematical tools to incorporate more general, physically motivated restraints that lead to a reliable and accurate model-free separation.
In dilute solution, SAXS intensities from non-interacting components combine linearly in proportion to their relative concentrations. A SAXS dataset from a mixture can therefore be described as the convolution of the concentration and SAXS profiles, and deconvolution can be performed using matrix factorization techniques such as singular value decomposition (SVD) [14,15]. However, to recover the scattering from each component, the basis vectors from SVD must be recombined using prior knowledge about what constitutes a physically valid solution. The field of chemometrics has developed a number of algorithms for solving this problem, known as multivariate curve resolution or MCR [16,17]. When a physicochemical model is available, the alternating least squares (MCR-ALS) algorithm can perform deconvolution using the model as a hard restraint [17]. In the context of SAXS, deconvolution with hard restraints has been applied to time-resolved experiments [11,18-20], equilibrium titrations [10,12,21,22], unfolding experiments [23,24], protein-micelle interactions [25], and fibril formation [26]. Interestingly, MCR can be performed without assuming a hard model by imposing soft restraints such as positivity, unimodality, and local rank [17]. Such model-free deconvolution is seldom applied to SAXS data because soft restraints are rarely sufficient to provide a robust and unique solution on their own [16]. One exception is SAXS data collected with in-line size-exclusion chromatography (SEC-SAXS), where MCR-ALS has been combined with evolving factor analysis (EFA) [27] to separate overlapping elution peaks [28,29].
Although SVD and MCR algorithms are well suited to certain SAXS experiments, they are a poor fit for other more challenging datasets. A notable example is SAXS data collected with in-line anion exchange chromatography (AEX) [30]. AEX separates according to charge by applying the sample to cationic media and eluting with a salt gradient. In SAXS, the salt gradient produces a changing background scattering that must be accounted for. Because this changing background violates certain assumptions of the EFA method, model-free deconvolution of AEX-SAXS data is not possible with EFA. We previously encountered this issue when analyzing AEX-SAXS data from the large subunit of B. subtilis ribonucleotide reductase (BsRNR) [31]. To overcome this challenge, we incorporated a simple assumption as additional prior information: namely, that the background scattering must change gradually over time. Using the ALS algorithm with smoothness regularization applied to the concentration of background scattering components, we achieved a clean separation of multiple protein and buffer components [31].
Here, we examine the generality of this strategy for the model-free deconvolution of other complex types of SAXS data where traditional "soft" restraints are insufficient. We describe the REGALS (REGularized ALS) toolset and demonstrate its application to a wide variety of SAXS experiments from evolving mixtures. Unlike most deconvolution methods that impose a physicochemical model, REGALS relies on very general parametric models for the SAXS profiles and concentration curves. The models include two types of restraint: smoothness and compact support. In AEX-SAXS, for example, each elution peak is assumed to be non-zero over a particular range (compact support), and the background components are assumed to be smooth. For the BsRNR dataset, we find this is sufficient to deconvolve the protein scattering peaks. In other cases, such as equilibrium titration and time-resolved SAXS, where concentrations are typically non-zero in all (or nearly all) data frames, the assumption that concentrations have compact support is insufficient.
However, compact support can be applied to the SAXS profiles in real space by imposing a maximum particle dimension. We show that compact support in real space, as well as boundary conditions applied to the concentration basis functions, provide sufficient information for successful deconvolution of such data.
Finally, we introduce the REGALS software package, which is adaptable by design, freely available, and open source.

Background
A dilute, evolving mixture of K components scatters X-rays according to the following linear model:

I(q, x) = Σ_{k=1}^{K} y_k(q) c_k(x),   (Equation 1)

where y_k(q) are the individual SAXS profiles and c_k(x) are the relative concentrations. The SAXS profiles depend on the scattering vector magnitude q = (4π/λ) sin θ, where λ is the X-ray wavelength and 2θ is the scattering angle. The concentration profiles depend on an independent variable x (representing time, ligand concentration, etc.). Since intensities are measured at discrete values of q and x, Equation 1 can be written in matrix form as follows:

I_calc = Y Cᵀ,   (Equation 2)

where Y contains the scattering profiles arranged side-by-side as column vectors and C likewise contains the concentration profiles. Here and throughout this section, the intensity matrix has dimensions of M × N (N scattering profiles with M discrete values of q).
Our aim is to determine Y and C given the measured intensity I_meas, which contains noise. This is accomplished by minimizing the least-squares error between data and model:

χ² = Σ_{i,j} (I_meas,ij − (Y Cᵀ)_ij)² / σ_ij²,   (Equation 3)

where σ_ij are the standard errors of the measured intensity. In the following, we assume that the experimental errors depend only on q, so that Equation 3 can be written as a Frobenius norm of the error-weighted residual:

χ² = ‖Σ⁻¹ (I_meas − Y Cᵀ)‖_F²,   (Equation 4)

where Σ is a diagonal matrix with Σ_ii = N⁻¹ Σ_{j=1}^{N} σ_ij. This simplifying assumption is approximately correct for the datasets considered here.
In general, minimizing χ² is not sufficient to determine Y and C uniquely. The main issue is that basis vectors can be mixed (or "rotated") without changing χ²: for any non-singular K × K matrix Ω, replacing Y → YΩ and C → CΩ⁻ᵀ leaves the product YCᵀ unchanged. Thus, the primary challenge of deconvolution is to impose appropriate restraints that provide a unique and physically meaningful solution.
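The rotation ambiguity can be illustrated in a few lines of numpy (a synthetic toy example, not part of the REGALS software; all sizes and the mixing matrix Ω are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-component mixture: Y (M x K) SAXS profiles, C (N x K) concentrations
M, N, K = 50, 20, 2
Y = rng.random((M, K))
C = rng.random((N, K))
I = Y @ C.T  # noise-free "data" matrix (Equation 2)

# Any non-singular K x K matrix Omega mixes the components...
Omega = np.array([[1.0, 0.7],
                  [0.3, 1.0]])
Y_rot = Y @ Omega
C_rot = C @ np.linalg.inv(Omega).T

# ...without changing the fit: the product Y C^T is invariant
assert np.allclose(Y_rot @ C_rot.T, I)
print("max deviation:", np.abs(Y_rot @ C_rot.T - I).max())
```

Both (Y, C) and (Y_rot, C_rot) fit the data identically, even though the mixed components may be non-physical (for example, negative).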
Deconvolution problems resembling Equation 2 arise in many experimental contexts. A common approach is to apply SVD [14,15], by which an error-weighted data matrix is decomposed as follows:

Σ⁻¹ I_meas = U S Vᵀ,   (Equation 5)

where U has the left singular vectors as columns, V contains the right singular vectors as columns, and S contains the singular values along the diagonal in decreasing order. The uniqueness of the decomposition results from the fact that the singular vectors form an orthonormal basis.
The singular values s_j = S_jj are positive and indicate the importance, or weight, of each pair of left and right singular vectors. When the number of observations (N) is much larger than the number of independent components in the signal (which is generally the case for the examples studied here), most of the singular values will be small and represent the noise in the data, while a few large singular values correspond to the signal of interest. To detect significant singular values, it is useful to calculate a normalized singular value, as follows:

ŝ_j = s_j / (√M + √N),   (Equation 6)

where M and N are the numbers of rows and columns of the data matrix. If no signal is present, random matrix theory shows that ŝ_j < 1 in the limit where the data matrix is large (see [32] and references therein). Thus, components corresponding to signal above the noise are expected to have ŝ_j > 1.
By retaining only the K most important singular vectors (U → U_K, etc.), one obtains an approximate (reduced-rank) representation of the data. Thus, a solution for Y and C can be constructed from SVD as follows:

Y = Σ U_K S_K^{1/2},   C = V_K S_K^{1/2}.   (Equation 7)

Here, the singular value weights have been distributed evenly between the SAXS and concentration basis vectors, but other choices could be made depending on the normalization conditions.
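The component-counting procedure can be sketched with numpy on synthetic data (two rank-one signal components plus unit-variance Gaussian noise; all amplitudes and sizes here are arbitrary stand-ins for an error-weighted SAXS dataset):

```python
import numpy as np

rng = np.random.default_rng(1)

# Error-weighted data: two rank-one signal components plus unit-variance noise
M, N = 200, 100
signal = (5 * rng.standard_normal((M, 1))) @ (5 * rng.standard_normal((1, N))) \
       + (3 * rng.standard_normal((M, 1))) @ (3 * rng.standard_normal((1, N)))
data = signal + rng.standard_normal((M, N))

s = np.linalg.svd(data, compute_uv=False)   # singular values, decreasing order
s_norm = s / (np.sqrt(M) + np.sqrt(N))      # normalized singular values

# Random-matrix theory: pure-noise singular values fall below sqrt(M) + sqrt(N),
# so normalized values well above 1 indicate signal components
print("first four normalized singular values:", s_norm[:4])
```

For this example the first two normalized singular values stand out far above 1, while the remainder cluster at or below 1, correctly identifying a two-component signal.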
Although SVD provides a unique low-rank decomposition of the data, the orthonormality of the singular vectors often produces non-physical results. For instance, the component SAXS profiles or concentrations might have negative values. It is therefore often necessary to further unmix (or "rotate") the SVD basis vectors by applying physical restraints [10,19,23,25]. In traditional MCR techniques, physical restraints are imposed using "hard" or "soft" models, whose applicability depends on the type of experiment performed and prior knowledge. Alternatively, prior information can be imposed through Tikhonov-Miller regularization, in which additional functions are minimized at the same time as χ² [33,34]. As described above, in an AEX-SAXS experiment, the expectation that background scattering varies gradually over time can be enforced using a regularization function that penalizes large oscillations [31].
Regularization is also used in conventional SAXS data analysis to infer the pair distance distribution function, or P(r), from the measured intensity [35]. Essentially, P(r) represents the probability of two electrons in the sample being separated by a distance r, and it is related to the scattering intensity by a Fourier transform:

I(q) = 4π ∫₀^{d_max} P(r) [sin(qr)/(qr)] dr,   (Equation 8)

where the integral terminates at the maximum particle dimension, d_max (since P(r) = 0 for r > d_max). Although Equation 8 can be inverted analytically, in practice the intensity is measured over a finite q-range, and thus inversion is an ill-posed problem. Since the Fourier transform is a linear operator, Tikhonov-Miller regularization can be applied. P(r) is discretized as a vector u of length R, which samples values of P(r) on a uniform grid with spacing Δr. Equation 8 can then be written as

I_calc = A u,   (Equation 9)

where I_calc is a vector of length M, and A is an M × R matrix with elements

A_ij = 4π Δr sin(q_i r_j)/(q_i r_j).   (Equation 10)

The standard indirect Fourier transform (IFT) method for SAXS data minimizes the χ² between I_calc and I_meas plus a regularization term:

û = arg min_u { ‖Σ⁻¹(I_meas − A u)‖² + λ ‖B u‖² }.   (Equation 11)

Typically, the matrix B performs a discrete approximation of the second derivative [36,37], which enforces smoothness by penalizing wildly oscillating solutions. The regularization parameter (or Lagrange multiplier) λ controls the tradeoff between minimizing χ² and minimizing the regularization function. The optimization problem is solved by the method of normal equations, with the (formal) result:

û = (Aᵀ Σ⁻² A + λ Bᵀ B)⁻¹ Aᵀ Σ⁻² I_meas.   (Equation 12)

In this study, we describe a general method for deconvolving SAXS data from mixtures that applies regularization to both the concentration and SAXS profile basis vectors. We first formulate the deconvolution problem (Equation 2) using a parametric representation of the basis vectors, similar to the IFT example above. This parametric form allows the SAXS profiles to be represented in the real-space (P(r)) basis if desired. Then, we describe the REGALS algorithm for minimizing the sum of χ² (Equation 3) and regularization terms.
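The IFT machinery can be sketched as follows. This is a simplified illustration, not a published IFT implementation: Σ is taken as the identity, the measurement is simulated noise-free from a made-up smooth P(r), and the λ heuristic is an arbitrary choice:

```python
import numpy as np

# Simulated "measurement": a smooth P(r) pushed through the Fourier kernel
# (Equation 8), then recovered by regularized normal equations
dmax, R = 50.0, 51
r = np.linspace(0, dmax, R)
dr = r[1] - r[0]
p_true = r**2 * (dmax - r)**2
p_true /= p_true.max()                    # smooth bump with P(0) = P(dmax) = 0

q = np.linspace(0.01, 0.30, 100)
qr = np.outer(q, r)
A = 4 * np.pi * dr * np.sinc(qr / np.pi)  # A_ij = 4*pi*dr*sin(qr)/(qr)
I_meas = A @ p_true                        # noise-free for simplicity

# Second-difference operator B enforces smoothness of the solution
B = np.diff(np.eye(R), n=2, axis=0)

# Regularized normal equations (Sigma = identity); lam scaling is ad hoc
lam = 1e-3 * np.trace(A.T @ A) / np.trace(B.T @ B)
p_hat = np.linalg.solve(A.T @ A + lam * B.T @ B, A.T @ I_meas)

fit_err = np.linalg.norm(A @ p_hat - I_meas) / np.linalg.norm(I_meas)
print("relative fit error:", fit_err)
```

Even though the q-range covers only a handful of Shannon channels, the smoothness penalty selects a well-behaved P(r) that closely tracks the true function.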

Deconvolution by regularized least squares
In order to deconvolve SAXS data from evolving mixtures, we introduce a method to impose mathematical constraints that embody prior information (or general expectations) about a SAXS experiment. The first way that constraints are imposed is through a parameterization of the basis vectors:

y_k = A_k u_k,   c_k = A′_k v_k,   (Equation 13)

where u_k and v_k are the parameter vectors for the SAXS profile and concentration bases, respectively. Here and in the following equations, primed functions or matrices refer to the concentration basis, in order to distinguish them from the SAXS profile basis. We implemented three types of basis vector: simple, smooth, and real-space (Figure 1a). In a simple basis vector, A_k is the identity matrix and the parameter vector encodes the basis vector directly. In a smooth basis vector, A_k performs a linear interpolation from a uniform grid of control points to the experimental grid, which need not be uniform. Finally, in a real-space basis vector (which applies exclusively to SAXS profiles), u_k samples P(r) on a uniform grid and A_k is given by Equation 10. With u and v denoting global parameter vectors constructed by concatenating the parameter vectors of the individual basis functions (for example, u is u_1, . . . , u_K placed end to end), χ² is calculated from Equations 4 and 13 as follows:

χ²(u, v) = ‖Σ⁻¹ (I_meas − Y Cᵀ)‖_F².   (Equation 14)

The regularization functions are a sum of quadratic regularizers acting on each component's parameter vector:

R(u) = Σ_k λ_k ‖B_k u_k‖²,   R′(v) = Σ_k λ′_k ‖B′_k v_k‖².   (Equations 15 and 16)

The regularization parameters λ_k and λ′_k control the tradeoff between minimizing χ² and each regularizing function. For smoothness regularization, B_k is a discrete approximation of the second derivative [37]. Zero boundary conditions are optionally imposed by removing the parameters on the boundary and deleting the corresponding rows of A_k and B_k [37].
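The "smooth" basis type can be illustrated by building the interpolation matrix and the second-difference regularizer explicitly (a toy sketch; the control grid and test function are arbitrary choices, not REGALS internals):

```python
import numpy as np

# A "smooth" basis vector: parameters live on a coarse uniform control grid,
# and the matrix A linearly interpolates them onto a non-uniform experimental grid
def interp_matrix(x_ctrl, x_exp):
    A = np.zeros((len(x_exp), len(x_ctrl)))
    for j in range(len(x_ctrl)):
        e = np.zeros(len(x_ctrl))
        e[j] = 1.0
        A[:, j] = np.interp(x_exp, x_ctrl, e)  # hat function for control point j
    return A

x_ctrl = np.linspace(0.0, 10.0, 6)                             # 6 control points
x_exp = np.sort(np.random.default_rng(2).uniform(0, 10, 40))   # non-uniform grid

A = interp_matrix(x_ctrl, x_exp)
u = x_ctrl**2 / 100.0        # parameter vector (a smooth test function)
c = A @ u                    # basis vector evaluated on the experimental grid

# Smoothness regularizer: discrete second derivative on the control grid
B = np.diff(np.eye(len(x_ctrl)), n=2, axis=0)
print("smoothness penalty |B u|^2:", np.sum((B @ u)**2))
```

Because A acts by linear interpolation, A @ u reproduces np.interp of the control values, and the penalty ‖B u‖² is small for slowly varying parameter vectors.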

REGALS algorithm
The optimization problem described in the previous section (Equations 14, 15, and 16) is nonlinear, and therefore does not afford a straightforward solution. We chose to adapt the alternating least squares (ALS) algorithm, which is often used in classic MCR [16,17,38]. ALS replaces the single nonlinear optimization problem with two linear problems that are solved in an alternating fashion over many iterations: beginning with an initial guess, one set of basis functions is optimized (e.g. the concentrations) with the other held fixed, and then the other basis functions are optimized. This is repeated until the change in basis vectors from one iteration to the next is smaller than a certain tolerance or the maximum number of iterations has been reached.
The REGALS algorithm solves Equation 14 iteratively using ALS with regularization (Figure 1c). First, an initial guess is made for the concentration basis parameters (v). This can be supplied by the user or generated automatically based on the parameterization type and boundary conditions. In the first least-squares step, the SAXS basis functions are optimized while the concentrations are held fixed:

û = arg min_u { χ²(u, v) + R(u) }.   (Equation 17)

Then, the profiles are normalized according to their parameterization type; for simple and smooth types, the parameters are divided by the root-mean-squared value, while for the real-space type, parameters are normalized by the scattering intensity at q = 0 calculated from the area under the P(r) curve (see Equation 8). In the second least-squares step, the concentration basis functions are optimized while the SAXS profiles are held fixed:

v̂ = arg min_v { χ²(u, v) + R′(v) }.   (Equation 18)

Statistics are calculated at this stage, including the change in the basis vectors from the previous iteration (sum of the absolute values of the differences) as well as the χ² for the current model (Equation 3). Finally, the cycle is repeated until convergence is reached according to user-specified termination conditions. Further details about parameter estimation, error analysis, and implementation can be found in Methods.
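A stripped-down regularized ALS loop in the spirit of the algorithm can be written as follows. This is not the REGALS implementation: it uses simple bases only, Σ = identity, smoothness applied only to the concentrations, and a hypothetical synthetic two-component dataset:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic 2-component dataset: Gaussian "elution peaks" times smooth profiles
M, N, K = 80, 60, 2
q = np.linspace(0.01, 0.5, M)
x = np.arange(N)
Y_true = np.column_stack([np.exp(-(q * 20)**2 / 2), np.exp(-(q * 35)**2 / 2)])
C_true = np.column_stack([np.exp(-(x - 20)**2 / 30), np.exp(-(x - 35)**2 / 30)])
noise = 1e-3
I = Y_true @ C_true.T + noise * rng.standard_normal((M, N))

# Smoothness penalty on concentrations (simple basis vectors, Sigma = identity)
B = np.diff(np.eye(N), n=2, axis=0)
lam = 1e-3

# Initialize C from truncated SVD, then alternate regularized least squares
U, s, Vt = np.linalg.svd(I, full_matrices=False)
C = Vt[:K].T * np.sqrt(s[:K])
for it in range(30):
    # Step 1: solve for profiles Y with C fixed (plain least squares)
    Y = np.linalg.solve(C.T @ C, C.T @ I.T).T
    Y /= np.linalg.norm(Y, axis=0)                    # normalize the profiles
    # Step 2: solve for C with Y fixed, with smoothness regularization;
    # the K coupled systems are stacked with Kronecker products
    G = np.kron(Y.T @ Y, np.eye(N)) + lam * np.kron(np.eye(K), B.T @ B)
    rhs = (I.T @ Y).T.reshape(-1)                     # stacks c_1, ..., c_K
    C = np.linalg.solve(G, rhs).reshape(K, N).T

chi2 = np.linalg.norm(I - Y @ C.T)**2 / (noise**2 * M * N)
print("reduced chi^2:", chi2)
```

After a few dozen iterations the reconstruction Y Cᵀ fits the data to roughly the noise level, mirroring the convergence behavior described above.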

REGALS deconvolution of AEX-SAXS data
During an AEX separation, sample bound to the column is eluted by flowing buffer with increasing salt concentration. The main challenge in deconvolving AEX-SAXS data is to account for the changing background scattering from the buffer. We analyzed a dataset previously reported for the large subunit of BsRNR [31], which eluted from the column in two main peaks during a linear gradient of 100 to 400 mM NaCl. We refined the model using regularization to enforce smoothness of the background components. The SAXS profiles were not parameterized (simple basis vectors were used). To ensure that each protein concentration model fully encompassed the peak for each component but was not larger than necessary, we performed several trial refinements with REGALS while varying the region of support and inspecting a plot of residual χ² vs. frame number (not shown). The model parameters are summarized in Supplementary Table S1. Finally, REGALS was run for 50 iterations, at which point it was well converged. The overall reduced χ² was 1.011, suggesting that the refined model accounted for most of the signal. We previously showed that the protein components were in excellent agreement with models of the monomeric and dimeric forms derived from crystal structures [31].
Although SVD had suggested that two background components were needed to describe the data, we asked whether two components were strictly necessary for deconvolving the protein peaks. To test this, we removed the minor component from the model (B2) and performed the deconvolution using REGALS. As expected, the quality of fit was noticeably worse when the background was modeled with one component compared with two (Figure 3a, bottom vs. top). Interestingly, the fit of the one-background model is worse in the buffer-only region of the data, but it achieves a near-perfect fit (χ² ≈ 1) in the region where the proteins elute. This observation suggests that the protein components absorbed the background subtraction error. Indeed, a comparison of the extracted SAXS profiles for C1 shows a significant deviation from the expected shape in the low-q region if only one background is used (Figure 3b). These results indicate that the buffer scattering in AEX-SAXS can be complex and must be modeled well to achieve well-subtracted SAXS profiles. Furthermore, they underscore the importance of collecting the full buffer scattering before and after the peak in AEX-SAXS experiments, as this information is effectively used to extrapolate the complex behavior underneath the elution peaks.

REGALS deconvolution with real-space SAXS restraints
In SAXS datasets from time-resolved or ligand titration experiments, it is common for components to have non-zero concentrations in most or all of the measurements, so compact support cannot be assumed in the concentration basis as it was in the AEX-SAXS example above. To robustly deconvolve such datasets, it is necessary to incorporate additional prior information. Within REGALS, this can be done in two ways: (1) by imposing boundary conditions on the concentration basis vectors, and (2) by limiting the maximum dimension of certain components through real-space parameterization of the SAXS basis vectors.

Equilibrium titrations.
As a first test of real-space restraints, we examined a challenging ligand titration dataset of phenylalanine hydroxylase (PheH) [28]. The tetrameric enzyme undergoes a conformational change upon binding its allosterically activating ligand, L-phenylalanine (L-phe). In SAXS, the signature of this conformational change is an oscillating mid-q feature that appears at physiological concentrations of L-phe (Figure 4a, 0-1 mM L-phe). At higher concentrations of ligand, the mid-q feature does not change further, but an increase in scattering at low q is observed (Figure 4a). To deconvolve the PheH titration dataset, we constructed a REGALS model with three components: resting tetramer, activated tetramer, and aggregate. The SAXS profiles for each component were modeled using the real-space parameterization (P(r)). The resting and activated tetramers were estimated to have a maximum dimension of 130 Å based on previous studies [28], and the aggregate was assigned a maximum dimension of 300 Å, the largest dimension that could be measured based on the Shannon limit for this dataset (d_max < π/q_min). Boundary conditions of P(r) = 0 were imposed at both r = 0 and r = d_max. We also imposed prior information on the concentration basis vectors using a smooth parameterization. According to the equilibrium model for this system [28], the concentration of activated tetramer is negligible at 0 mM L-phe, so a zero boundary condition was imposed there. For the resting tetramer, we limited the range of the basis vectors. The model parameters are summarized in Supplementary Table S2. The basis vectors were optimized using the REGALS algorithm, which converged after 50 iterations with an overall reduced χ² of 1.41.
One advantage of using the real-space parameterization is that P(r) functions are obtained directly from the deconvolution and provide immediate insight into particle shape. For the resting and activated PheH tetramers, we find that the P(r) functions decay to zero smoothly at d_max (Figure 4b, components 1 and 2), as expected for compact particles. The peak in P(r) shifts toward larger dimensions in the activated tetramer, indicating that it has a less compact conformation. The P(r) for the aggregated species also decays toward zero. In ligand titration datasets like this one, the concentrations of different components are often of great interest, since they give insight into the equilibrium behavior of the system, including cooperativity and binding affinities. The REGALS deconvolution of the PheH titration produced concentration profiles that appear physically reasonable (Figure 4c). To verify that smoothness regularization had not overly biased the result, we also extracted concentration estimates at each point without regularization (Equation 27) and found that they agree with the regularized curves (Figure 4c, circles vs. continuous curves). We find that the aggregate is present under all conditions, staying at a low level between 0 and 1 mM L-phe before rising sharply at high concentrations, in agreement with prior SEC-SAXS experiments [28]. The resting tetramer converts into the activated tetramer in a manner characteristic of a cooperative two-state transition, as shown previously [28].
Further analysis of this equilibrium is beyond the scope of this study. However, we note that the arbitrary concentration scale of Figure 4c can be transformed readily into the fraction of resting and activated species, which can be fit using an equilibrium model. The REGALS results are normalized by the area under P(r) (equal to I(q → 0)), and this quantity is expected to be the same for components with the same molecular weight. Thus, in this case the tetramer concentrations in Figure 4c differ from the true concentrations (e.g. in mg/mL) by the same scale factor.
One assumption in the REGALS model was that the aggregate did not change in size or shape as a function of [L-phe], which may not be the case, particularly since its shape appears to be rod-like and therefore its growth might be non-terminating. To check whether this assumption was supported by the data,

Time-resolved SAXS.
Based on the successful application of real-space REGALS to the challenging PheH titration dataset, we next tested it on time-resolved SAXS data. Time-resolved experiments may be performed by rapid mixing or by pump-probe schemes. Pump-probe datasets are special in that very small changes can be measured by examining difference profiles (laser on minus laser off), which removes systematic error. Given these differences, we chose to evaluate REGALS with both mixing and pump-probe datasets, as described below.
First, we chose to analyze a stopped-flow mixing dataset from the soluble nucleotide-binding domains (NBDs) of the membrane transport protein MsbA, which was recently published and deposited in a public database [39]. In the experiment, a solution of nucleotide-free NBD monomers was rapidly mixed with ATP, resulting in ligand binding and dimerization, followed by ATP hydrolysis and dissociation back to the monomeric state. This transient increase in average size can be observed in a Kratky plot (q²I(q) vs. q), where the main peak shifts to the left (lower q) and then to the right (Figure 5a). In the original publication, the relative concentrations of NBD monomer and dimer at each time point were fit using calculated scattering profiles from known crystal structures. However, in time-resolved experiments generally, it is often the case that atomically detailed structures are not available, either because they have not been characterized at high resolution or because they are dynamic. Therefore, we asked whether REGALS could deconvolve the MsbA dataset using only general properties of the molecules.
The REGALS model consisted of two components representing the NBD monomer and dimer. The parameterization was similar to the PheH titration example above: the smooth parameterization was used for the concentrations, and real-space for the SAXS profiles. Based on the full-length structure of dimeric MsbA (PDB ID: 3B60), we estimated the d_max of the dimeric and monomeric forms of the NBD portions to be 70 Å and 62 Å, respectively. Reflecting the prior observation that the NBDs relax to a fully monomeric state after ATP hydrolysis, we applied a zero boundary condition to the dimer concentration at the final time point (approximately 2 minutes after mixing). The model parameters are summarized in Supplementary Table S3. REGALS was run for 100 iterations, resulting in an overall reduced χ² of 0.335. The fact that χ² < 1 here suggests that the reported experimental errors were overestimated, so χ² is not a reliable statistic for quality of fit in this case. However, the quality of fit was confirmed by examining the residuals (not shown).
The deconvolved concentration profiles show a rise and fall of the dimer component, with a concomitant dip in the monomer, resembling the profiles obtained by fitting scattering calculated from crystal structure models in the original publication [39]. The P(r) functions are also physically reasonable, with single peaks that decay smoothly to zero as r approaches the maximum dimension.

For a pump-probe dataset, we chose a temperature-jump SAXS/WAXS experiment performed on the protein CypA [40]. These experiments involved rapidly heating the sample by approximately 10 °C with an infrared laser pulse of several nanoseconds duration, followed by a synchrotron X-ray pulse of approximately 500 ns duration after a delay of 562 ns to 1 ms. Following the methods in the original publication [40], difference profiles were constructed (laser on minus laser off) for both the protein and buffer blanks, and these were scaled together in the WAXS regime and subtracted. The remaining signal, attributed to the effect of the rapid temperature change, is most significant in the SAXS regime (Figure 6a), and it evolves non-trivially as a function of the time delay (Figure 6a, inset). Note that the difference profiles are negative; this is thought to result from the differential thermal expansion coefficients of protein and water, which would reduce scattering contrast at high temperature [40].
Previously, the biphasic appearance of the mean intensity (Figure 6a, inset) was interpreted as a fast transition to excited states of the molecule followed by a slow relaxation toward equilibrium [40]. Although SVD analysis revealed three significant components, no kinetic or structural interpretation of the basis vectors was reported. We asked whether a real-space REGALS deconvolution might offer additional insight. Based on the SVD result, we chose to model three components (C1, C2, and C3). For all three, a smooth parameterization was used for the concentration basis, and real-space for the SAXS profile basis. The first component (C1) was assigned to represent the transient process following the T-jump, with a concentration of zero at both end points. No constraints were applied to the concentrations of the other two components.
In real space, C2 was assigned a maximum dimension of 46 Å estimated from a crystal structure of CypA (PDB ID: 3k0n). Lacking further information with which to restrain the model, the maximum dimensions for C1 and C3 were adjusted by trial and error based on quality of fit and the subjective appearance of the P(r) functions. The final model parameters are summarized in Supplementary Table S4. Note that since difference intensities are fit, this parameterization represents the difference P(r) function, ΔP = P_on − P_off, and d_max represents the maximum dimension over which changes to P(r) occur after heating.
The REGALS algorithm was run for 400 iterations, converging to an overall reduced χ² of 1.667. Although the difference intensities are negative (Figure 6a), the deconvolved ΔP(r) functions are all positive (Figure 6b) because the REGALS algorithm normalizes SAXS basis functions by the integral of P(r). Consequently, some of the concentrations are negative (Figure 6c). Negative concentrations (or SAXS curves) are a necessary feature when analyzing difference intensities, and they can be a challenge to conceptualize.
However, two immediate observations can be made. First, the concentration of C3 is approximately constant for the first ∼4 µs after the T-jump, and the change on those timescales is captured by C1 and C2. According to the ΔP(r) functions for C1 and C2, we conclude that the fast processes occur on length scales up to ∼60 Å, somewhat larger than the size of the CypA monomer. Changes on longer timescales additionally involve C3, which has a much longer range of 150 Å and likely involves interparticle interactions, because the experiments were done at a relatively high protein concentration of 50 mg/mL.
To gain a more intuitive picture of the changes following the T-jump, we used the regularized basis functions to reconstruct the time evolution of ΔP(r). This removes, to some extent, the influence of choices made during the REGALS parameterization, and resolves the sign ambiguity. Since the signal is dominated by the contrast decrease (not shown), we subtracted the first time point to obtain ΔΔP(r,t) ≡ ΔP(r,t) − ΔP(r,t = 562 ns), which tracks the change in signal after the T-jump (Figure 6d). This reconstructed signal reveals a clear positive feature with a peak at r ∼ 35 Å that appears on fast time scales, followed by a negative feature with a peak at r ∼ 60 Å on slower time scales. The physical explanation is not entirely clear, but one hypothesis might be transient partial unfolding followed by an increase in inter-particle repulsion (or a decrease in attraction). As experiments which rely on difference intensities are often performed at high protein concentrations, further investigation of inter-particle interactions is of great interest.

Conclusions
In this work, we introduced REGALS as a robust, generally applicable technique to deconvolve challenging SAXS datasets from evolving mixtures. The strategy implemented in REGALS has several key advantages. Most notably, prior knowledge is taken into account without having to impose a physicochemical "hard" model. Future work will focus on augmenting the REGALS toolkit to further expand the range of applications.
Here, we found that two simple restraints, smoothness and compact support, proved powerful for expressing prior knowledge. However, many other types of restraint are possible within the REGALS framework. Examples of particular interest to SAXS include sparseness and non-negativity [22], hard restraints on certain components with known scattering curves, and fixed non-zero or derivative boundary conditions. In addition, REGALS could be applied to datasets with more than one independent variable using methods from MCR of multi-way data [16]. For example, the CypA time series analyzed here was one among several conducted at different initial temperatures [40], and thus the entire dataset might be analyzed using a multi-way REGALS decomposition. Furthermore, the assumption of dilute solution can be relaxed by adding extra components to represent terms in the Taylor expansion of the structure factor [25], which may be of particular interest for time-resolved experiments that require high protein concentrations. Finally, certain parameter choices in REGALS may be automated by leveraging the Bayesian interpretation of regularized linear regression [41,42], much as the regularization parameter and d_max are determined automatically in Bayesian IFT [36]. We anticipate that the REGALS method described here, and future developments, will be a valuable addition to the SAXS data analysis toolset and enable new applications.


Solving the least-squares problems

Setting the gradient of Equation 17 to zero yields K coupled sets of linear equations for the SAXS basis parameters:

Σ_l M_kl u_l + λ_k B_kᵀ B_k u_k = b_k,  k = 1, . . . , K,   (Equation 19)

where

M_kl = (c_kᵀ c_l) A_kᵀ Σ⁻² A_l  and  b_k = A_kᵀ Σ⁻² I_meas c_k.   (Equation 20)

Note that these equations can be combined and written in the form (M + H)u = b, where H is block-diagonal with blocks λ_k B_kᵀ B_k, making them straightforward to solve using standard numerical methods. Similarly, the parameters for the concentration basis (Equation 18) are found by solving the K sets of linear equations:

Σ_l M′_kl v_l + λ′_k (B′_k)ᵀ B′_k v_k = b′_k,  k = 1, . . . , K,   (Equation 21)

where

M′_kl = (y_kᵀ Σ⁻² y_l) (A′_k)ᵀ A′_l  and  b′_k = (A′_k)ᵀ I_measᵀ Σ⁻² y_k.   (Equations 22 and 23)

Extracting scattering curves and error estimates
After fitting a dataset with a REGALS model for Y and C, the results are typically smooth versions of the concentrations and SAXS profiles. However, for further analysis (such as fitting atomistic models to the SAXS data), it is desirable to extract curves resembling experimental data with properly estimated errors.
Previously, we applied a projection algorithm that uses the pseudoinverse of the concentration matrix to generate SAXS profiles and associated error bars [28]. For the datasets examined here, we found that the pseudoinverse method amplifies noise in certain cases. We therefore developed an alternative method that makes use of the regularized basis vectors to overcome this issue. To extract a particular component k, a residual data matrix is first reconstructed by subtracting the model with component k excluded. The unregularized basis functions y and c are then extracted by minimizing the residual with either the scattering profile or the concentration held fixed. The solutions can be written as weighted averages of the residual data matrix, and the uncertainties are estimated by standard propagation of the experimental errors.
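For the SAXS-profile case, the extraction described above reduces to a weighted average over frames with the regularized concentration held fixed, followed by error propagation. A minimal numpy sketch (the function and array names are ours, not from the REGALS package):

```python
import numpy as np

def extract_component(D, sigma, Y, C, k):
    """Extract an experimental-style SAXS profile for component k.

    D     : (n_q, n_frames) data matrix
    sigma : (n_q, n_frames) experimental standard errors
    Y     : (n_q, K) regularized SAXS profiles from the fit
    C     : (n_frames, K) regularized concentration profiles
    k     : index of the component to extract
    """
    # Residual data matrix: data minus the model with component k excluded
    keep = np.arange(Y.shape[1]) != k
    D_k = D - Y[:, keep] @ C[:, keep].T

    # Least squares with the concentration c_k held fixed:
    # y = D_k c_k / (c_k . c_k), i.e. a weighted average over frames
    c_k = C[:, k]
    w = c_k / (c_k @ c_k)            # averaging coefficients
    y = D_k @ w

    # Standard propagation of the experimental errors through the average
    y_err = np.sqrt((sigma ** 2) @ (w ** 2))
    return y, y_err
```

The concentration-basis extraction is symmetric, with the roles of the scattering profile and concentration exchanged.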

Regularization parameter estimation
The regularization parameters λ_k for the SAXS and concentration bases reflect prior information about the smoothness of the solutions.
They are not known in advance, so initial values must be chosen by the user and adjusted further if REGALS fails to converge. However, the regularization parameter is not an intuitive quantity, and it depends in a complicated fashion on the noise level in the data and the particular regularizer chosen. To assist the user in selecting initial values, we provide the option of specifying a more intuitive parameter: the "number of good parameters," n_k. This parameter comes from the Bayesian interpretation of regularized linear regression [41,42], and it estimates how many parameters are effectively determined by the data (as opposed to the regularizer). The number of good parameters for u_k (the SAXS basis) is given by Equation 30, where d_k is the vector of generalized eigenvalues of the matrices M_kk (which depends on |c_k|; see Equation 20) and H_k = B_k^T B_k. To determine λ_k given n_k, Equation 30 is solved numerically using the initial guess for c_k. Strictly speaking, n_k should be determined using the final value of c_k (after REGALS has converged); however, we have found that initial estimates of n_k are usually close to the final values. Similarly, regularization parameters for the concentration basis are found by solving Equation 30, where d_k are the generalized eigenvalues of the corresponding concentration-basis matrices M_kk (Equation 23) and H_k = B_k^T B_k.
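In the Bayesian treatment cited above, the number of good parameters conventionally takes the form n(λ) = Σ_i d_i/(d_i + λ), which decreases monotonically in λ and can therefore be inverted numerically. A minimal scipy sketch under that assumption (function names are ours; we also assume the regularizer matrix H is positive definite, which may not hold for every choice of B in practice):

```python
import numpy as np
from scipy.linalg import eigh
from scipy.optimize import brentq

def n_good(lmbda, d):
    """Number of good parameters at regularization strength lmbda."""
    return np.sum(d / (d + lmbda))

def solve_lambda(M, H, n_target):
    """Find lambda such that the number of good parameters is n_target.

    M : least-squares normal matrix (depends on the current fit)
    H : regularizer matrix B^T B (assumed positive definite here)
    """
    # generalized eigenvalues d solving M v = d H v
    d = eigh(M, H, eigvals_only=True)
    d = d[d > 0]                       # keep the informative directions
    # n(lambda) falls monotonically from len(d) to 0, so bracket widely
    return brentq(lambda lam: n_good(lam, d) - n_target, 1e-12, 1e12)
```

In an ALS setting, this solve would be repeated per component k, using the current estimate of the complementary basis to form M.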

Software implementation
The REGALS method was developed in MATLAB and subsequently translated into python. Both implementations are freely available as an open-source software package.

AEX-SAXS of BsRNR large subunit
The collection and preprocessing of AEX-SAXS data from the large subunit of B. subtilis ribonucleotide reductase (BsRNR) were described in the original publication [31]. Briefly, the as-isolated protein was eluted from a MonoQ column using a linear gradient of 100 to 500 mM NaCl directly into a SAXS flow cell. Scattering images were recorded continuously during elution (q-range of 0.008 to 0.700 Å⁻¹). After integration, each profile was normalized by the transmitted beam intensity, and buffer-only curves collected before the start of the gradient were averaged and subtracted from the remaining curves. A set of 1737 frames was retained for further analysis, beginning just after the start of the linear gradient and ending before the gradient completed, when the NaCl concentration had reached approximately 400 mM. These preprocessed data are available in NrdE_mix_AEX.mat (a MATLAB-formatted hdf5 file).
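The normalization and buffer-subtraction steps described above follow a standard pattern. A minimal numpy sketch (the function and array names are illustrative, not taken from the published processing scripts):

```python
import numpy as np

def preprocess(frames, i0, buffer_idx):
    """Normalize frames by transmitted beam and subtract an averaged buffer.

    frames     : (n_q, n_frames) integrated 1D scattering profiles
    i0         : (n_frames,) transmitted beam intensities
    buffer_idx : indices of buffer-only frames collected before the gradient
    """
    # normalize each profile by its transmitted beam intensity
    norm = frames / i0[np.newaxis, :]
    # average the buffer-only curves and subtract from every frame
    buffer_mean = norm[:, buffer_idx].mean(axis=1)
    return norm - buffer_mean[:, np.newaxis]
```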

Time-resolved mixing of MsbA NBD with ATP
As a first example of time-resolved SAXS data, we chose a recently published stopped-flow mixing dataset [39]. In the experiment, a soluble nucleotide binding domain (NBD) construct (residues 330-581 of the ATP-binding cassette transporter MsbA) was mixed with Mg²⁺-ATP in a 1:1 (v/v) ratio (final concentrations 500 µM NBD and 450 µM ATP). One X-ray exposure of 35 ms was acquired per shot after a variable time delay of 20 ms to 120 s.
The time-resolved MsbA NBD dataset, consisting of 23 buffer-subtracted scattering curves (0.01 < q < 0.5 Å⁻¹), was downloaded from a public database (https://www.sasbdb.org/data/SASDGV5/), minimally reformatted, and saved as MsbA_time_resolved.mat (a MATLAB-formatted hdf5 file). Minor preprocessing was performed before running REGALS. Upon inspection, we noted a strong negative-going feature at low q suggesting a background subtraction error. We therefore truncated the data below q = 0.015 Å⁻¹.
In addition, we found that the average intensity displayed slight random shot-to-shot jitter. We corrected for this by applying a scale factor to each curve, which was found by fitting a 5th-order polynomial to the mean intensity vs. log10(time). The resulting scale factors were close to 1 (standard deviation of 0.013).
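The jitter correction described above can be sketched as follows, assuming the hypothetical names below (this is our illustration, not the original analysis script):

```python
import numpy as np

def jitter_scale_factors(curves, times, order=5):
    """Per-frame scale factors from a polynomial trend of mean intensity.

    curves : (n_q, n_frames) buffer-subtracted profiles
    times  : (n_frames,) delay times in seconds
    order  : degree of the polynomial fit (5 in the text)
    """
    mean_i = curves.mean(axis=0)
    x = np.log10(times)
    # smooth trend of mean intensity vs. log10(time)
    trend = np.polyval(np.polyfit(x, mean_i, order), x)
    # multiplying each curve by its factor brings it onto the trend
    return trend / mean_i
```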

Time-resolved temperature-jump of CypA
As an example of a pump-probe time-resolved experiment, we chose recently reported T-jump SAXS/WAXS data collected on the cis-proline isomerase CypA [40]. Here we analyze one particular set of experiments corresponding to the wild-type CypA protein and buffer blanks following a T-jump to 29.9 ± 0.1 °C. After downloading the raw T-jump data [43], we repeated the published data reduction protocol [40] using a custom MATLAB script (available upon request). Briefly, difference scattering curves (ΔI = I_on − I_off) were calculated for both the protein and buffer blanks, and a series of scaling operations was performed to correct for shot-to-shot variations, most crucially the scaling of the buffer difference profiles in order to minimize ΔI_protein − ΔI_buffer in the WAXS regime, where solvent scattering predominates. This produced a set of 28 difference profiles: 27 logarithmically distributed time points after T-jump and one control where the laser was off prior to X-ray exposure. The control profile was close to zero, indicating that the data processing had not introduced errors, and the remaining profiles resembled those reported in the original publication. After initial data reduction, the WAXS data were discarded and the SAXS portion of the curves (0.025 ≤ q < 1 Å⁻¹) was saved as CypA_Tjump.mat (a MATLAB-formatted hdf5 file).
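The key buffer-scaling step, minimizing ΔI_protein − s·ΔI_buffer over the WAXS regime, has a closed-form least-squares solution. A minimal sketch (the q cutoff and all names here are illustrative assumptions, not taken from the published reduction protocol):

```python
import numpy as np

def buffer_scale(dI_protein, dI_buffer, q, q_waxs=1.5):
    """Least-squares scale factor for the buffer difference profile.

    Minimizes ||dI_protein - s * dI_buffer||^2 over the WAXS regime
    (q > q_waxs), where solvent scattering dominates, giving the
    closed-form solution s = (p . b) / (b . b).
    """
    waxs = q > q_waxs
    p, b = dI_protein[waxs], dI_buffer[waxs]
    return (p @ b) / (b @ b)
```

The scaled buffer difference s·ΔI_buffer would then be subtracted from ΔI_protein before further analysis.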