Global fitting of multiple data frames from SEC–SAXS to investigate the structure of next-generation nanodiscs

A framework is presented for refining models from several data frames from the same size-exclusion chromatography small-angle scattering experiment. The method can be employed to drastically reduce the number of parameters refined from the data series.


Introduction
Small-angle scattering (SAS) is a well established and widely applied method that is used to investigate a broad range of soluble samples, ranging from particles of biomolecular origin, including proteins and nucleotide-based structures, to self-assembled systems such as micelles, vesicles and various lipid-protein complexes, including nanodiscs. The use of small-angle scattering for investigating biomolecular structures has triggered large improvements on both the instrument and the sample-environment sides. These improvements have been driven by the frequent scarcity of sample and the relatively small signal over the background, as well as the propensity of many biomolecular samples to aggregate.
The combination of size-exclusion chromatography (SEC) and small-angle X-ray scattering (SAXS) into an integrated SEC-SAXS setup and, more recently, of SEC and small-angle neutron scattering (SANS) into SEC-SANS, are good examples of such improvements (David & Pérez, 2009; Mathew et al., 2004; Watanabe & Inoko, 2009; Jordan et al., 2016; Johansen et al., 2018). Although SEC-SAS dilutes the sample and hence decreases the signal over the background, this is in most cases counterbalanced because the remaining signal comes from a single species or a narrow distribution of species, making the data interpretation less ambiguous.
With the introduction of SEC-SAXS and SEC-SANS, size-exclusion-based segregation splits the sample into size-sorted fractions from which data are then continuously recorded by SAXS or SANS. Using this setup on a polydisperse sample, the investigator will obtain much more information than if the SAS analysis is performed on the nonfractionated sample. For example, for pure protein samples which are prone to oligomerization this setup may be used to separate and collect information on the different oligomeric states of the protein. Usually, SEC-SAXS and SEC-SANS are used with the goal of overcoming protein-aggregation issues, since the sample is irradiated immediately after SEC purification (Jeffries et al., 2016; Ryan et al., 2018). In these cases there is a narrow focus on a single species.
There are circumstances in which SEC fails to fully separate molecules with differing structures. Initial SEC-SAS data processing often involves checking for monodispersity within the relevant peak in the chromatogram by calculating the radius of gyration (R_g) or molecular weight (MW) per frame. The use of a program such as CHROMIXS (Panjkovich & Svergun, 2018), for example, makes this process very simple. Using this information, typically the average of a small set of consecutive frames is selected for further analysis. Usually the rest of the SEC-SAS data series is not analysed in depth, despite possibly also containing relevant information about the species. Furthermore, in cases where two or more discrete populations are merged into a single chromatographic peak there are advanced mathematical techniques available, such as state-of-the-art evolving factor analysis (EFA) software (Hopkins et al., 2017; Konarev et al., 2022; Tully et al., 2021), to deconvolve the overlapping peaks and isolate the SAXS profiles corresponding to each population. This is less applicable, however, to the naturally occurring polydispersity around a single species in self-assembled systems.
Nanodiscs are disc-shaped particles consisting of a central lipid bilayer encircled by two amphipathic membrane-scaffolding proteins (MSPs), as depicted in Fig. 1(a) (Bayburt ...; Denisov et al., 2004). Nanodiscs are formed by a self-assembly process involving detergent-solubilized lipids and MSPs. The self-assembly is initiated by removal of the detergent, making the lipids and MSPs form particles in a process that is highly dependent on the MSP and lipids of choice. In addition, membrane proteins can be included in the self-assembly, resulting in membrane protein-loaded nanodiscs. Due to the presence of lipids, nanodiscs are commonly used as a platform to study the structure and function of membrane proteins in a native-like environment (Denisov & Sligar, 2017).
Figure 1. The experimental setup and broadening of the peak during SEC-SAXS. (a) Molecular visualization of a DMPC-loaded csMSP1D1ΔH5 nanodisc built with CHARMM-GUI NanodiscBuilder (Jo et al., 2008; Qi et al., 2019). (b) Schematic of the SEC-SAXS setup, highlighting the distance between the HPLC UV280 absorbance detector and the capillary where SAXS is recorded. (c) Normalized chromatogram and scattergram for csMSP1D1ΔH5 nanodiscs. The grey points indicate UV absorbance and the red points indicate the total intensity per frame. Solid lines are exponentially modified Gaussian (EMG) fits to the data. The centres of the two peaks are aligned. (d) The black profile is the EMG fit to the chromatogram in absorbance units. The red profile is the corrected version, substituting in parameters from the fit to the scattergram while keeping the area under the curve constant. (e) I(0)/c as a function of the elution volume. Black points are calculated from the original SEC profile. Red points are calculated from the corrected profile. The dashed line is the theoretical value estimated for 120 DMPC per nanodisc.
In this article, we investigate and discuss how the large amount of information obtained in a SEC-SAXS experiment can be brought into play through global analysis of the data. We use dimyristoylphosphatidylcholine (DMPC)-loaded nanodiscs of three different sizes, facilitated through three next-generation circularized (Nasr et al., 2017) and supercharged (Johansen et al., 2019) membrane-scaffold proteins (csMSPs). Circularization refers to the covalent linkage of the MSP N- and C-termini in order to improve size homogeneity, while increasing the number of negatively charged residues enhances the solubility of the nanodisc. The smallest nanodisc that we investigate, csMSP1D1ΔH5, is approximately 8 nm in diameter (Hagn et al., 2013), followed by csMSP1D1, which is approximately 10 nm in diameter (Hagn et al., 2013), and finally csMSP1E3D1, which is 13 nm in diameter (Johansen et al., 2019). The solution structures of these three nanodiscs have been studied previously by offline SEC purification and standard robot SAXS measurements (Johansen et al., 2019, 2021), but without a focus on the underlying size and shape distributions within the populations. In this study, we demonstrate that this kind of information is easily accessible via SEC-SAXS. To the obtained data we fit a simple geometrical model for the nanodiscs that we have used several times before (Skar-Gislinge & Arleth, 2011; Skar-Gislinge et al., 2010).
Global fitting of multiple data sets is already used to investigate simultaneously acquired SAXS and SANS data through the fitting of a common model which is then calculated in the relevant contrast. This has been widely exploited and several examples are available in the literature for various types of systems, for example microemulsions (Arleth & Pedersen, 2001), nanodiscs (Skar-Gislinge et al., 2010), the self-assembly of polymers into toroids (Hollamby et al., 2016) and micelles (Mineart et al., 2019), and specifically deuterated proteins in solution (Whitten et al., 2007; Heller et al., 2003).
A global fitting approach can also be used to analyse a series of data on the same sample where a subset of the model parameters is conserved throughout the series and others vary. For such shared parameters, a single value is refined for all data sets. For parameters which are not shared, a distinct value is refined for each data set. Such approaches have been applied to diverse cases: the analysis of SAXS data from time-dependent fibrillating samples (Herranz-Trillo et al., 2017; Ortore et al., 2011), the variation of monomer-dimer equilibria with concentration (Blobel et al., 2009), temperature-induced aggregation (Mariani et al., 2010; Gonnelli et al., 2020), a SANS analysis of the growth behaviour of SDS micelles and even the analysis of both a series of SAXS data and a series of SANS data simultaneously (Sinibaldi et al., 2008).
The global approach to model fitting has strength in that it ensures a more self-consistent analysis across data sets and with fewer parameters. Additionally, a larger amount of the acquired data are used to evaluate the proposed model and to determine the model parameters. The weakness lies in the added complexity of the modelling setup.
Overall, we show how the global fitting approach provides a more robust analysis of the obtained SEC-SAXS data for nanodiscs. As a part of this, we are able to rationalize the degree of lipid loading in the nanodiscs over the SEC peak. For the small csMSP1D1ΔH5 discs we find that there is minimal size separation over the peak, but for the slightly larger csMSP1D1 discs as well as the even larger csMSP1E3D1 discs we observe how the SEC splits the sample up into discs ordered from higher to lower lipid-to-MSP stoichiometries. The geometric parameters of the nanodiscs over the SEC peak can be described with a linear frame-to-frame relationship in order to reduce the number of free parameters while still providing a detailed structural overview of the nanodisc populations and without compromising the integrity of the fit to the data sets. The global model provides excellent fits to the whole series of eight SAXS data sets from the same SEC peak simultaneously for each of our three samples. Using our global model we are able to reduce the number of free parameters to 16, compared with 56 free parameters if we were to refine the nanodisc model against eight SAXS frames independently.
As a side note, we introduce a novel approach for quantifying the broadening of the peak during a SEC-SAXS experiment, with the aim of calculating more accurate concentration estimates, which are essential for modelling on an absolute scale.

Sample preparation
MSP-based nanodiscs were prepared as described elsewhere (Johansen et al., 2021), excluding the final size-exclusion chromatography (SEC) purification. Briefly, DMPC was solubilized to 50 mM with reconstitution buffer (20 mM Tris-HCl pH 7.5, 150 mM NaCl) containing 100 mM sodium cholate. The solubilized DMPC was mixed with MSP in molar ratios of 55:1 (csMSP1D1ΔH5), 80:1 (csMSP1D1) and 130:1 (csMSP1E3D1) and was diluted with reconstitution buffer to a final DMPC concentration of 10 mM. The samples were incubated at 28 °C with 15% (w/v) detergent-absorbing beads (Amberlite XAD-2, Merck) for three hours. The samples were separated from the beads, stored on ice and transported to the SAXS facility.

Data acquisition
SAXS data were collected on BM29 at the European Synchrotron Radiation Facility (ESRF) using the online SEC-SAXS setup (Pernot et al., 2013), where the temperature of the SAXS capillary was kept at 10 °C. Samples of 200 µl were loaded onto a Superdex 200 Increase 10/300 GL column (GE) equilibrated in phosphate buffer. For csMSP1D1ΔH5 and csMSP1E3D1 nanodiscs the buffer was 20 mM sodium phosphate pH 7.0, 150 mM NaCl, while for csMSP1D1 nanodiscs the buffer was phosphate-buffered saline (Sigma) with 1 mM DTT. We note that nanodiscs were initially reconstituted in Tris-based buffer according to standardized procedures; however, the pK_a of Tris is quite temperature-sensitive, and to keep the pH stable in our measurements we opted for buffer exchange into phosphate buffer, which is rather insensitive to temperature. SAXS frames of 1 s were continuously measured during sample elution. The intensity was measured as a function of q, with q = 4π sin θ/λ, where θ is half the scattering angle and λ is the wavelength (here 0.9919 Å), and calibrated to units of cm⁻¹ using H₂O as a calibration standard (Orthaber et al., 2000). The absorbance at 280 nm was converted to a concentration using protein extinction coefficients calculated with ProtParam (Gasteiger et al., 2005): 18 450 M⁻¹ cm⁻¹ for csMSP1D1ΔH5 and csMSP1D1 and 26 930 M⁻¹ cm⁻¹ for csMSP1E3D1. Note that DMPC does not absorb light at this wavelength. The loading nanodisc concentrations were 0.06 mM for csMSP1D1 and csMSP1E3D1 nanodiscs and 0.16 mM for csMSP1D1ΔH5 nanodiscs.
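The two conversions used in this section can be illustrated with a minimal sketch. It is not taken from the beamline software: the function names are ours, the Beer-Lambert path length of 1 cm is an assumption (the path length of the UV cell is not stated in the text), and the absorbance value in the example is hypothetical.

```python
import math

WAVELENGTH_A = 0.9919  # X-ray wavelength in angstrom, as quoted in the text


def q_from_theta(theta_rad, wavelength=WAVELENGTH_A):
    """Momentum transfer q = 4*pi*sin(theta)/lambda; theta is HALF the scattering angle."""
    return 4.0 * math.pi * math.sin(theta_rad) / wavelength


def molar_concentration(absorbance, extinction_coeff, path_length_cm=1.0):
    """Beer-Lambert law, c = A / (epsilon * l), returned in mol/l.

    path_length_cm = 1.0 is an assumed default, not a value from the text.
    """
    return absorbance / (extinction_coeff * path_length_cm)


# Extinction coefficients at 280 nm quoted in the text (M^-1 cm^-1)
EXTINCTION = {"csMSP1D1ΔH5": 18450.0, "csMSP1D1": 18450.0, "csMSP1E3D1": 26930.0}

# Hypothetical absorbance reading, for illustration only
c = molar_concentration(0.1845, EXTINCTION["csMSP1D1"])
print(f"{c * 1e6:.1f} µM of absorbing protein")
```

The returned value is the concentration of the absorbing species; converting it to a nanodisc concentration requires the number of MSPs per disc, which is left out of the sketch.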

Data processing
To reduce the size of the data series, the 1 s SAXS frames were averaged over 10 s. Fifty frames collected prior to the elution peak, corresponding to buffer, were then averaged and used for background subtraction. The baseline intensity remains stable before and after the peak, indicating that the chosen buffer frames are suitable for the entire data series (see, for example, Supplementary Fig. S1). SAXS data were rebinned to lie evenly on a logarithmic q-scale. Pair-distance distributions, p(r), were obtained by the indirect Fourier transform (IFT) method using the online program BayesApp available at https://genapp.rocks/ (Savelyev & Brookes, 2019; Hansen, 2000). Radii of gyration (R_g) and the forward scattering, I(0), were calculated using AUTORG from ATSAS (Petoukhov et al., 2007). Scattergrams were generated by calculating the total intensity in the q-range 0.008-0.3 Å⁻¹ per SAXS frame and plotting it as a function of elution volume, where we use the HPLC flow rate to convert SAXS time stamps to elution volumes so that the scattergrams and chromatograms can be aligned. The nanodisc model is implemented in WillItFit (Pedersen et al., 2013).
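The frame averaging, buffer subtraction and scattergram construction described above can be sketched as follows. This is an illustrative outline under simplifying assumptions, not the processing code used for the data: frames are plain lists of intensities on a common q-grid, error propagation is omitted, and the function names and the choice of the frame mid-point as the time stamp are ours.

```python
def block_average(frames, block=10):
    """Average consecutive frames in blocks of `block`; incomplete trailing blocks are dropped."""
    averaged = []
    for start in range(0, len(frames) - block + 1, block):
        chunk = frames[start:start + block]
        averaged.append([sum(column) / block for column in zip(*chunk)])
    return averaged


def subtract_buffer(frames, buffer_frames):
    """Subtract the point-by-point mean of the buffer frames from every frame."""
    n = len(buffer_frames)
    buffer_mean = [sum(column) / n for column in zip(*buffer_frames)]
    return [[i - b for i, b in zip(frame, buffer_mean)] for frame in frames]


def scattergram(q, frames, flow_rate_ml_per_s, frame_time_s, q_min=0.008, q_max=0.3):
    """Return (elution volume in ml, summed intensity within [q_min, q_max]) per frame."""
    keep = [j for j, qj in enumerate(q) if q_min <= qj <= q_max]
    points = []
    for n, frame in enumerate(frames):
        # Assumption: each frame is stamped at the mid-point of its exposure.
        volume = flow_rate_ml_per_s * frame_time_s * (n + 0.5)
        points.append((volume, sum(frame[j] for j in keep)))
    return points
```

A typical call sequence would be `block_average` on the raw 1 s frames, `subtract_buffer` using the frames recorded before the peak, and `scattergram` with the HPLC flow rate to place each frame on the elution-volume axis.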

2.4. Small-angle scattering and principles of the modelling
2.4.1. Modelling of nanodiscs. Our main objective is to refine structural models of our nanodiscs from the SEC-SAXS data presented in Fig. 2. The model of choice in this study is the well established nanodisc model (Skar-Gislinge et al., 2010; Skar-Gislinge & Arleth, 2011), in which the geometric structure of the nanodisc is described by a series of form-factor amplitudes, each of which accounts for the scattering from a distinct part of the nanodisc. The nanodisc model is sketched in Fig. 3(c). These form-factor amplitudes have been mathematically described in the literature (Pedersen, 1997). The model is calculated on an absolute scale by utilizing the sample concentration, as well as the molecular composition of the MSP and DMPC, to calculate the scattering length applicable for each part of the nanodisc, as listed in Supplementary Table S7.
Overall, the nanodisc model is described by the following quantities: (i) the axis ratio of the patch of lipid bilayer, ε, (ii) the average area per phospholipid headgroup in the bilayer, A_L, (iii) the number of lipids in a nanodisc, N_L, (iv) the partial specific molecular volume of a phospholipid, v_L, (v) the partial specific molecular volume of an MSP, v_P, and (vi) the height of the cylinder describing the protein belt. In this study, we fix this height at 25.8 Å throughout our refinement, in line with previous studies. The model is sketched in Fig. 3. Additionally, we refine a constant background contribution, b, and a term accounting for the interface roughness in our model, R (Als-Nielsen & McMorrow, 2011). We denote this set of parameters as h.
Such models are usually refined by minimizing the reduced χ², denoted χ²_r, which measures the agreement between the data and a specified model function, I_Mod(q, h). This quantity is defined as

$$\chi^2_r = \frac{1}{N_\mathrm{DoF}} \sum_{j=1}^{N} \left[ \frac{I_j - I_\mathrm{Mod}(q_j, h)}{\sigma_j} \right]^2 , \qquad (1)$$

where q_j, I_j and σ_j constitute the jth data point in a data set consisting of N data points. N_DoF is the number of degrees of freedom, which we compute as the number of data points minus the number of parameters in the model.
2.4.2. Global fitting of multiple frames. In this study, we refined our structural models from several data sets simultaneously and found the best fit for the whole series. As our data sets were collected across a peak in the same SEC experiment, we split our list of parameters into two categories: parameters that we assumed to vary across the irradiated SEC fractions and parameters that we assumed not to vary. All nanodiscs within the same sample comprise the same lipids and MSPs, and hence there should be minimal variation in the volumes of lipids and MSPs. Although there is evidence to suggest that the dynamics and packing of lipids embedded in nanodiscs vary depending on the distance of the lipid from the rim (Bengtsen et al., 2020; Martinez et al., 2017), on average the area per headgroup should remain stable under identical experimental conditions. Rather, depending on sample preparation, there may be a distribution of fully loaded circular discs and underloaded elliptical discs (Skar-Gislinge et al., 2018). Thus for the kth data set we refine individual values of N_L, ε and b (which we denote by h_k). The parameters v_L, v_P, A_L and R are refined to a single value used in all of the models; we label these parameters H.
In order to accommodate this categorization of our parameters, we redefine our figure of merit, χ²_r, from equation (1) as

$$\chi^2_r = \frac{1}{N_\mathrm{DoF}} \sum_{k=1}^{M} \sum_{j=1}^{N_k} \left[ \frac{I_{k,j} - I_\mathrm{Mod}(q_{k,j}, h_k, H)}{\sigma_{k,j}} \right]^2 , \qquad (2)$$

where N_k is the number of data points in the kth data set, of which there are M, which now prompts us to denote the jth data point in the kth data set by (q_{k,j}, I_{k,j}, σ_{k,j}). Note that the model function now depends on not only the parameters specific to the kth data set, h_k, but also the 'global' parameters that are identical across all of the data sets, H. This is an adaptation of a similar scheme to analyze temperature series of SAXS data (Johansen et al., 2021). Additionally, rather than allowing the individual parameters in h_k to vary irrespective of the other data sets, this scheme allows us to assume and enforce, for example, linear trends between the various frames to lower the total number of parameters refined in the scheme: i.e. rather than refining M individual values of N_L, we assume a linear trend across the SEC fractions, N_L = an + b, where n is the frame number in the data series and a (slope) and b (intercept, not to be confused with the background b) are parameters to be refined. Hence, we reduce the number of parameters in the refinement scheme by M − 2. By employing the same idea for ε, we reduce the number of refined parameters by an additional M − 2. In a sense, this notion is a natural extension of the idea of the 'global' parameters in H which are simply kept constant across the frames, and hence their frame-to-frame relationship is described by a single parameter using a zeroth-order polynomial rather than two parameters in a first-order polynomial. We remark that a linear function is sufficient for our purposes; providing a more physical model to describe particles eluting from a SEC column could require a more complicated relationship and further investigation is necessary before drawing conclusions. More complicated relationships can readily be employed but become impractical (or simply useless) if they require a number of coefficients comparable to the number of data sets, unless there is a solid underlying theory to support their use.
Figure 3. Model fit results for csMSP1E3D1 nanodiscs. (a) Global fit to experimental SAXS data sets from frames with increasing elution volumes/positions across the SEC peak. Data sets are the middle eight highlighted frames in Fig. 2(a). (Individual fits are shown in Supplementary Fig. S3.) The topmost data set is on an absolute scale, while those below are scaled by 2^(−n), where n is the frame number. (b) Refined structural parameters. The coloured data points indicate parameters refined from each data set individually. The black lines indicate parameters refined from the global fit, where one shared value is found for A_L, v_P and v_L, while N_L and ε are both forced to follow a linear trend. (c) Representation of the nanodisc model used; a quarter of the MSP belt is not shown to highlight the interior structure of the lipid bilayer.
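The structure of the global figure of merit described in this section can be sketched in a few lines. The sketch is only an illustration of the bookkeeping and is not the WillItFit implementation: `toy_model` is a stand-in (a Guinier-like decay plus a flat background) for the nanodisc form factor, and all function and argument names are hypothetical. One parameter follows a linear frame-to-frame trend, one is shared by all frames and one is refined per frame.

```python
import math


def linear_trend(slope, intercept, n):
    """Frame-to-frame constraint: value = slope * n + intercept for frame number n."""
    return slope * n + intercept


def toy_model(q, radius_of_gyration, scale, background):
    """Stand-in model (Guinier-like decay plus flat background), NOT the nanodisc model."""
    return scale * math.exp(-(q * radius_of_gyration) ** 2 / 3.0) + background


def global_reduced_chi2(datasets, slope, intercept, scale, backgrounds):
    """Reduced chi-square summed over all frames and all points in each frame.

    datasets         : list of frames, each a list of (q, I, sigma) tuples
    slope, intercept : global parameters describing the linearly varying quantity
    scale            : a global parameter shared by every frame
    backgrounds      : one frame-specific background per frame
    """
    n_points = sum(len(frame) for frame in datasets)
    n_params = 3 + len(backgrounds)  # slope, intercept, scale + one background per frame
    total = 0.0
    for n, (frame, bkg) in enumerate(zip(datasets, backgrounds), start=1):
        rg = linear_trend(slope, intercept, n)
        for q, intensity, sigma in frame:
            total += ((intensity - toy_model(q, rg, scale, bkg)) / sigma) ** 2
    return total / (n_points - n_params)
```

In a refinement, this function would be handed to a minimizer that varies the global and frame-specific parameters together; the choice of minimizer is outside the scope of the sketch.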

Co-calibration of the SEC-UV280 and the SEC-SAXS intensities
The SEC-SAXS setup is sketched in Fig. 1(b). Broadening of the elution peak often occurs during SEC-SAXS experiments due to Taylor dispersion (Taylor, 1953) and the difference in diameter between the HPLC tubing and the SAXS capillary (Bucciarelli et al., 2018). Here, we introduce a novel approach for estimating and correcting for this broadening. The approach is illustrated in Figs. 1(c) and 1(d) and Supplementary Fig. S2. In Fig. 1(c) the normalized chromatogram for csMSP1D1ΔH5 nanodiscs is plotted with its corresponding scattergram, i.e. the scattering intensity per individual frame as a function of the elution volume. The centre of the peak of the scattergram is aligned with the centre of the peak of the chromatogram and the broadening of the scattergram is clearly visible.
Exponentially modified Gaussian (EMG) functions are good models for chromatographic peaks under a range of conditions (Naish & Hartwell, 1988; Busnel et al., 2001), where broadening can be characterized by two parameters: the standard deviation (width) and a relaxation parameter (skew). EMGs were fitted to the main peaks of the chromatogram and scattergram via nonlinear least-squares regression. A 'corrected' SEC profile was then calculated by keeping the area under the EMG fit of the chromatogram constant, but substituting in the width and skew parameters from the fit to the scattergram in order to take account of the change in the shape of the peak, which becomes wider and develops a tail on the right-hand side. Thus, the corrected profile approximates the UV absorption as if it were recorded directly on the SAXS capillary and should provide much more accurate concentration estimates. The original SEC peak and the corrected SEC peak can be compared in Fig. 1(d). Estimating the sample concentration directly from the raw HPLC absorption measurements may lead to underestimated concentrations in the tails of the peak and overestimated concentrations in the centre.
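The correction described above can be illustrated with a short sketch. It assumes one common parametrization of the EMG (area, Gaussian centre `mu`, width `sigma` and relaxation parameter `tau`); these symbol names, the dictionary layout and the decision to keep the chromatogram centre `mu` unchanged are our assumptions, and the fitting step itself is not shown.

```python
import math


def emg(x, area, mu, sigma, tau):
    """Exponentially modified Gaussian whose integral over x equals `area`.

    mu, sigma : centre and width of the Gaussian component
    tau       : relaxation (skew) parameter of the exponential tail
    """
    lam = 1.0 / tau
    exponent = 0.5 * lam * (2.0 * mu + lam * sigma ** 2 - 2.0 * x)
    tail = math.erfc((mu + lam * sigma ** 2 - x) / (math.sqrt(2.0) * sigma))
    return area * 0.5 * lam * math.exp(exponent) * tail


def corrected_profile(x, uv_fit, saxs_fit):
    """UV profile reshaped with the width and skew of the scattergram fit.

    uv_fit and saxs_fit are dicts with keys 'area', 'mu', 'sigma', 'tau'.
    The area (and, as an assumption, the centre) is taken from the UV fit.
    """
    return emg(x, uv_fit["area"], uv_fit["mu"], saxs_fit["sigma"], saxs_fit["tau"])
```

Because the area is preserved while the peak becomes broader, the corrected profile is lower at the centre and higher in the tails than the original chromatogram fit, which is the behaviour described in the text.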
As a check, we calculated the forward scattering I(0) divided by concentration for the SAXS data sets as a function of elution volume, as plotted in Fig. 1(e). The values calculated with the original concentrations show a prominent decrease and then an increase, which cannot be readily explained. For a fully homogeneous sample, I(0)/c should remain constant. If there is some size variation I(0)/c may decrease systematically towards the right-hand side, which is seen for the values calculated with the corrected concentrations. These values also fall close to an estimate of I(0)/c which we calculated for csMSP1D1ΔH5 nanodiscs loaded with 120 DMPC. We note that during modelling the nanodisc form factor multiplied by the new concentrations matches the experimental SAXS intensities perfectly without the need for an additional scaling factor.
One potential drawback of this method lies in the fact that scattering intensity scales with squared particle volume while protein UV absorption does not, meaning that some discrepancy between the shapes of the chromatograms and the scattergrams is to be expected. In this case, however, the corrected SEC profile performed better and the method could be considered for other SEC-SAXS studies in which accurate concentration estimates are desirable for absolute-scale modelling or molecular-weight determination.

SEC-SAXS data overview
The SEC-SAXS data and associated p(r) distributions for all three nanodisc species are shown in Fig. 2. For each nanodisc species the data indicate some systematic structural variation across the size-sorted fractions. For the smallest nanodiscs, R_g stays constant across the SEC peak at ~40 Å; however, for csMSP1D1 nanodiscs there is a steady decrease from ~46 to 42 Å, and for csMSP1E3D1 nanodiscs the decrease from ~58 to 52 Å is even more apparent. Each of the scattering curves is compatible with what we typically observe for monodisperse nanodiscs: a flat Guinier region in the low-q regime, followed by a trough and a broad bump at medium to high q. csMSP1D1ΔH5 and csMSP1E3D1 display the typical nanodisc double-bump feature (Skar-Gislinge et al., 2010; Denisov et al., 2005). For csMSP1D1, and even more significantly for csMSP1E3D1, as the position of the fraction in the elution profile progresses, the first minimum in the scattering curve shifts systematically to higher q values, indicating a change in particle shape. The p(r) distributions reaffirm this, showing a systematic loss of depth of the first minimum alongside a decrease in the maximum pair distance (D_max) as we move to larger elution volumes. Again, these variations are least prominent in the small discs and most prominent in the large discs, which may suggest that larger discs are more structurally disperse. Altogether, these observations suggest that even within a SEC-purified nanodisc population there is some size distribution which may be sorted by a SEC column so that larger particles elute first, but below some resolution it will not be separated into multiple elution peaks.
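The per-frame R_g and I(0) values discussed above come from the low-q Guinier region. As a minimal illustration of the underlying relation, ln I(q) = ln I(0) − (R_g²/3) q², the sketch below performs an unweighted straight-line fit. It is not AUTORG, which additionally selects the fitting range automatically and handles experimental uncertainties; the function name is ours.

```python
import math


def guinier_fit(q, intensity):
    """Unweighted least-squares line through ln(I) versus q^2; returns (R_g, I(0)).

    Only meaningful for points inside the Guinier region (low q) with I > 0,
    and raises ValueError from math.sqrt if the fitted slope is positive.
    """
    x = [qi ** 2 for qi in q]
    y = [math.log(i) for i in intensity]
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return math.sqrt(-3.0 * slope), math.exp(intercept)


# Synthetic, noise-free example with R_g = 40 Å (values chosen for illustration only)
q_values = [0.008 + 0.001 * i for i in range(20)]
i_values = [0.05 * math.exp(-(q * 40.0) ** 2 / 3.0) for q in q_values]
rg, i0 = guinier_fit(q_values, i_values)
```

Applying such a fit frame by frame gives the R_g trace across the elution peak.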

Modelling and data analysis
Analysing many data sets from the same SEC-SAXS experiment with the nanodisc model provides more detailed insights into the size and shape distributions underlying the populations. We select eight sequential SAXS data sets for each sample. Firstly, we refine the model against each data set independently as a precursor. Secondly, we refine the model against all eight data sets simultaneously with both global and frame-specific free parameters in order to constrain the fits further and investigate the amount of information which can be extracted with a reduced number of free parameters. We note that although each of the individual data sets is collected from a narrow fraction of the SEC-purified sample, the data sets may still contain some slight overlap between different nanodisc sizes. The refined model therefore describes the average scattering from the nanodiscs present and does not account for polydispersity within a certain frame.
3.3.1. Individual fits. When fitted to the individual frames, the nanodisc model provides excellent fits to each of the SAXS data sets chosen for further analysis. The individual fits are plotted in Supplementary Figs. S3, S4 and S5. The refined model parameters from individual fits to the eight SAXS data sets for csMSP1E3D1 nanodiscs are plotted as coloured points in Fig. 3(b) and are further listed in Supplementary Table S1. The results for csMSP1D1 and csMSP1D1ΔH5 nanodiscs are given in the supporting information.
For all three nanodisc samples the area per lipid headgroup, A_L, and the partial specific molecular volumes of the lipid, v_L, and MSP, v_P, generally fluctuate only slightly between frames. This is in line with our expectations since the volume of DMPC and of each MSP should be very stable for the entire sample, regardless of elution volume. Although prone to local fluctuations, the refined value of the area per headgroup should also remain stable. For the three nanodiscs, A_L was refined to values of between 49.5 and 53.5 Å², which is in good agreement with previous values of 47.5 Å² for DMPC-loaded nanodiscs (Johansen et al., 2021), 52.1 Å² for DMPC-loaded peptide discs (Midtgaard et al., 2014) and 47.2 Å² for a pure DMPC bilayer (Tristram-Nagle et al., 2002), all of which were recorded at 10 °C. We mention that since the temperature is not controlled over the entire SEC-SAXS instrumentation, the temperature of the sample may be slightly above 10 °C. This may affect the lipid packing slightly; however, as the temperature was kept below the melting temperature of DMPC at 24 °C the effect will not be prominent (Johansen et al., 2021). v_L becomes up to 5% larger than the reported value of 1041 Å³ (Tristram-Nagle et al., 2002). v_P stays within 5% below our pre-estimated values based on the molecular compositions, which are specific for each MSP. Prominent frame-to-frame fluctuations of these three free parameters could be the result of overfitting to the SAXS data and strong correlations between parameters in the model.
Rather, the systematic variations in the SAXS data sets are reflected in the steady decrease in the number of lipids per nanodisc, N_L, as a function of elution volume, likely coinciding with a general increase in the axis ratio, ε. Since the circumference of the nanodisc is determined by the length of the MSP and is therefore expected to remain constant, variation in the number of lipids (and thereby the bilayer surface area) must be compensated by some variation in the shape of the disc. Although ε is poorly determined by this method, we assume that this dependency between N_L and ε is present across the sample. Each data set indicates elliptical discs, where discs with higher lipid-to-MSP stoichiometries appear to be slightly rounder, while discs with lower lipid-to-MSP stoichiometries become more elliptical. The same trend has been observed many times (Skar-Gislinge et al., 2010; Graziano et al., 2018). According to our analysis, csMSP1E3D1 nanodiscs contain the largest underlying size distribution, with a difference in N_L of 65 lipids between the size-sorted first and last frames, from 325 to 260 lipids. csMSP1D1 decreases by 35 lipids from 150 to 115 and csMSP1D1ΔH5 decreases by 15 lipids from 130 to 115.
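The geometric argument above (fixed circumference, so fewer lipids implies a more elliptical disc) can be made concrete with a small sketch. It rests on assumptions that are ours rather than the paper's: the bilayer patch is treated as an ideal ellipse, its perimeter is evaluated with Ramanujan's approximation, and the lipids are split evenly over two leaflets so that N_L = 2 × area / A_L. The nanodisc model itself is parametrized differently.

```python
import math


def ellipse_perimeter(a, b):
    """Ramanujan's approximation to the perimeter of an ellipse with semi-axes a and b."""
    h = ((a - b) / (a + b)) ** 2
    return math.pi * (a + b) * (1.0 + 3.0 * h / (10.0 + math.sqrt(4.0 - 3.0 * h)))


def area_at_fixed_perimeter(perimeter, axis_ratio):
    """Area of the ellipse with the given perimeter and axis ratio (major/minor)."""
    # The perimeter scales linearly with size, so the minor semi-axis follows directly.
    minor = perimeter / ellipse_perimeter(axis_ratio, 1.0)
    return math.pi * axis_ratio * minor ** 2


def lipids_that_fit(perimeter, axis_ratio, area_per_headgroup):
    """Number of lipids in a two-leaflet patch, N_L = 2 * area / A_L (assumed relation)."""
    return 2.0 * area_at_fixed_perimeter(perimeter, axis_ratio) / area_per_headgroup
```

For any fixed perimeter, `lipids_that_fit` decreases monotonically as the axis ratio grows from 1, which is the qualitative dependency between N_L and ε assumed in the text.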
Unlike previous reports (Johansen et al., 2021), we do not see a simple linear correlation between axis ratio and length of the MSP here, despite larger discs theoretically being more structurally flexible. csMSP1D1 nanodiscs persistently have the largest axis ratio, which varies between 1.6 and 1.8, whereas csMSP1D1ΔH5 nanodiscs have the smallest, varying between 1.3 and 1.5, and csMSP1E3D1 lies in between with values varying between 1.45 and 1.65. Although seemingly incidental, this coincides with a recent coarse-grained molecular-dynamics study of the same circularized MSPs (cMSPs, non-supercharged; Kjølbye et al., 2021), where cMSP1D1 was found to have the highest degree of anisotropy, with cMSP1D1ΔH5 being the most circular and cMSP1E3D1 falling in between. These results suggest that there are other factors influencing the shape of nanodiscs besides the degree of lipid loading, especially the choice of MSP and its intrinsic rigidity.
3.3.2. Global fits. Fitting the nanodisc model to M data sets individually requires 7M free parameters. Certain parameters, however, should be conserved when examining data sets from the same SEC-SAXS experiment and hence fitting the parameter M times becomes redundant. The individual fits justify the introduction of global parameters for A_L, v_L, v_P and R to ensure that the model refinement is self-consistent and that these parameters are better determined. N_L and ε, however, capture important trends between the data sets as a function of elution volume. This information would be lost if fitting using a constant rather than the two-parameter function that we utilized here.
A global model could be set up with A_L, v_L, v_P and R as global parameters and N_L, ε and b as frame-specific parameters, such that the number of free parameters is 4 + 3M. However, to constrain the fit even further, frame-to-frame linear relationships are enforced for N_L and ε, where the y intercept and slope of the respective functions are global parameters as described in Section 2.4.2 and shown in the top row in Fig. 3(b), capturing increasing or decreasing trends across the data series using only two free parameters per function. In this implementation of the model, the number of free parameters is 8 + M, where the only frame-specific parameter is the background, b. In this case, where M = 8, swapping from individual modelling to the global modelling described here drastically reduces the number of free parameters from 56 (7 × 8) to 16 (8 + 8). Fig. 3(a) shows the global fit refined against the eight SAXS data sets simultaneously for csMSP1E3D1 nanodiscs. The refined model parameters are listed in Supplementary Table S1 and the frame-to-frame relationships are plotted in Fig. 3(b) as solid black lines. Global results for csMSP1D1 and csMSP1D1ΔH5 nanodiscs are given in the supporting information. Despite the extra constraints, the global model is able to describe the entire series of SAXS data sets excellently, with no features standing out visually as poorly captured. The global model achieves impressive χ²_r values of 7.5, 5.4 and 5.9 for csMSP1E3D1, csMSP1D1 and csMSP1D1ΔH5, respectively, as calculated by equation (2). Furthermore, reasonable structural parameters are maintained over the three samples and the important frame-to-frame trends are sustained.
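The parameter bookkeeping in this section can be checked with a few lines of arithmetic. The function below is only an illustration of the counting argument; its name and keyword arguments are ours.

```python
def count_free_parameters(n_frames, n_model_params=7, n_global=4, n_linear=2):
    """Free-parameter counts for the three refinement schemes described in the text."""
    n_frame_specific = n_model_params - n_global          # N_L, epsilon and b
    individual = n_model_params * n_frames                # 7M
    shared_only = n_global + n_frame_specific * n_frames  # 4 + 3M
    # Each linearly constrained parameter costs a slope and an intercept;
    # the remaining frame-specific parameter (the background) is refined per frame.
    linear_trends = (n_global + 2 * n_linear
                     + (n_frame_specific - n_linear) * n_frames)  # 8 + M
    return {"individual": individual,
            "shared_only": shared_only,
            "linear_trends": linear_trends}


print(count_free_parameters(8))
# → {'individual': 56, 'shared_only': 28, 'linear_trends': 16}
```

The saving from the linear trends grows with the number of frames: for every additional frame, the individual scheme adds seven parameters while the constrained global scheme adds only one.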
For csMSP1E3D1 and csMSP1D1ΔH5 the global fit parameters mimic the individual fit parameters very closely, which suggests that the results are reliable and that the choice of frame-specific and global parameters is appropriate. For csMSP1D1 the global fit parameters, although still satisfactory, are a slightly looser match to the individual fit parameters, especially the axis ratio, where the global model possibly determines a much steeper slope. We note that this could be explained by the fact that this data series has the poorest signal-to-noise ratio. We further comment that the large error on the ε slope for all three experiments should be expected, since it is clear from the individual fits that ε is poorly determined and a range of slopes could be applicable. Refined global fit parameters should not be anticipated to emerge as the exact mean of the individual fit results, since the global fit minimizes the risk of overfitting to the SAXS data and constrains correlations between fit parameters.
We observe that the refined gradient of the straight line representing the fraction-dependent change in the number of lipids, N_L, further rationalizes the degree of polydispersity present in each respective nanodisc sample: the largest disc, csMSP1E3D1, shows the greatest gradient of −9.96 N_L per frame, with csMSP1D1 showing a gradient of −5.00 N_L per frame and csMSP1D1ΔH5 showing the most gentle gradient of −1.47 N_L per frame. These slopes can be compared with linear fits to R_g as a function of position, where we calculate slopes of −0.23, −0.18 and −0.03 Å per frame for csMSP1E3D1, csMSP1D1 and csMSP1D1ΔH5, respectively.
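A per-frame slope of this kind can be obtained from an ordinary least-squares straight-line fit. The sketch below is only illustrative: `linear_fit` is a hypothetical helper, the fit is unweighted (the fits reported here may have accounted for uncertainties), and the numbers are synthetic placeholders, not values from this study.

```python
from typing import Sequence, Tuple


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Unweighted least-squares line y = intercept + slope * x.

    Returns (intercept, slope).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic example: eight frames with a made-up linear decrease in R_g (in Å).
frame_index = list(range(8))
rg_values = [50.0 - 0.2 * i for i in frame_index]  # placeholder numbers only
intercept, slope = linear_fit(frame_index, rg_values)
print(f"slope = {slope:.2f} Å per frame")
```

Applying the same function to the refined N_L values per frame would give a slope in lipids per frame, which is the quantity compared between the three nanodisc samples above.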
Summing up, employing frame-to-frame constraints in our analysis of the presented SEC-SAXS data seems to allow considerably more constrained fits of a large amount of data whilst still producing realistic models and capturing inter-frame trends in a quantitative manner. The most notable advantages are the considerable reduction in the total number of parameters refined from the data and the tractability of refining a single model that accounts for all of the data sets, rather than individual models for each data set that must then be compared at a later stage. In the cases presented here, both advantages seem to come at little expense in terms of the quality of the fits.

Conclusions and further perspectives
Often during SEC-SAXS analysis only a small fraction of the SEC peak is considered and a large amount of structural information is discarded. We perform a comprehensive investigation into three types of next-generation nanodiscs by analysing many SAXS data sets from the same SEC-SAXS experiment. The size-sorted SAXS data sets reveal some systematic polydispersity within the structure of the nanodisc populations. A global approach to model fitting provides a robust analysis to help characterize the polydispersity. We observe that the SEC column gradually splits the samples into discs with high and low lipid-to-MSP stoichiometries. We employ simple frame-to-frame linear functions to further reduce the number of free parameters in the fitting routine. Despite the extra constraints, the global model is able to describe the entire series of SAXS data sets excellently and provides a detailed overview of the nanodisc populations through frame-specific and global refined values.
The reduction in the number of parameters refined from the data sets is a particularly attractive attribute of the outlined modelling scheme. Like similar inference tasks, model refinement from small-angle scattering data is prone to overfitting, so these simplifications (in terms of the number of parameters in the model) provide a convenient means of analysing, in a somewhat constrained manner, the extensive amount of data that one obtains from, for example, a SEC-SAXS experiment. Naturally, such schemes rely intrinsically on the validity of the assumed trends across the analysed data sets. Here, we successfully employ constant and linear relationships and argue that they are indeed sufficient to capture the general behaviour of our data, mainly because we observe little to no increase in our figure of merit, and hence little loss in the overall quality of our fits, when employing them.
Our method has general applicability for samples and systems with inherent polydispersity within the resolution of the SEC column, including cases where the SEC peak is asymmetric or where two peaks have merged together. These include nanodiscs, as presented here, as well as similar membrane-protein carrier systems, including di-block copolymer lipid particles, for example, styrene-maleic acid lipid particles (Knowles et al., 2009), saposin lipid particles (Frauenfeld et al., 2016) and detergent micelles. Additionally, the method could be modified to analyse biological systems in different types of equilibrium and where distinct populations cannot be sufficiently separated on SEC for individual analysis (Vestergaard, 2016). These include, for example, protein monomer-dimer equilibria, protein-ligand equilibria, phase-separated disordered proteins or systems adopting different structural states. In these cases, our method could be complementary to the popular evolving factor analysis (EFA) programs, where model-independent EFA can be employed to identify and isolate uncontaminated profiles of the distinct populations for further structural analysis, potentially including global fitting (although of only two or three data sets). With EFA it is possible to extract an overall picture of sources of extreme structural heterogeneity within a sample. Previous examples include identifying scattering contributions from massive contaminants (Meisburger et al., 2016), separating protein monomers from dimers or oligomers (Hopkins et al., 2017; Konarev et al., 2022) and separating bound and unbound protein states (Tully et al., 2021). Our presented method is more suitable, however, when the desired outcome is a continuous description of systematic polydispersity across a data series, particularly when there is an underlying distribution within a single population or when the amount of polydispersity is too small for EFA to detect. This is only possible by investigating many narrow fractions of the elution profile. Furthermore, EFA fails when the chromatographic peak is too asymmetrical or when two peaks are too close together (Konarev et al., 2022). In this work we analyse data sets directly from SEC-SAXS and assume that each fraction contains only a single population; however, one should be cautious since this is not necessarily true given the resolution of the SEC column.
Furthermore, our global fitting scheme is readily suitable for SEC-SANS experiments, and would be a very powerful fitting platform if the model could be refined against series of SEC-SAXS data sets and series of SEC-SANS data sets simultaneously. Finally, issues with peak broadening are well acknowledged in the SEC-SAXS community (Ryan et al., 2018) and efforts have been made to measure the absorption directly on the SAXS capillary (Bucciarelli et al., 2018). As part of our overall method, we suggest a simple correction procedure for the online absorption measurement, which eliminates part of the problem with peak broadening and thereby allows more accurate determination of the forward scattering and hence of parameters such as molecular weight.