ATSAS 3.0: expanded functionality and new tools for small-angle scattering data analysis

ATSAS is a comprehensive software suite for the processing, visualization, analysis and modelling of small-angle scattering data. This article describes developments in the ATSAS 3.0 release, including new programs for data simulation and for the structural modelling of lipids, nucleic acids and polydisperse systems.


Introduction
Small-angle scattering (SAS) of X-rays (SAXS) or neutrons (SANS) provides low-resolution structural information about various nanostructured systems, including biological macromolecules in solution. Over the past two decades, SAS has become an increasingly common technique in the integrative structural biology toolkit (Graewert & Svergun, 2013; Tuukkanen et al., 2017; Brosey & Tainer, 2019). Importantly, SAS can be combined with high-resolution methods such as X-ray crystallography, nuclear magnetic resonance and cryo-electron microscopy, as well as other biophysical techniques like circular dichroism, static and dynamic light scattering, and cross-linking mass spectrometry (Lipfert & Doniach, 2007; Kachala et al., 2015). Solution SAS, in particular, allows the study of macromolecules in close to physiological environments and also the effects of changing environmental conditions, for example by varying temperature, pH or pressure, or by adding ligands. Increased availability of and continuous improvements to SAXS laboratory sources, third-generation synchrotrons, high-neutron-flux nuclear reactors and spallation sources have contributed to the growth of the biological SAS community (Fig. 1), which in turn has spurred developments in sample environments and instrument hardware (Classen et al., 2013; Kirby et al., 2013; Pernot et al., 2013; Acerbo et al., 2015; Blanchet et al., 2015; Heller et al., 2018; Liu et al., 2018; Wood et al., 2018). For instance, size-exclusion chromatography coupled to SAS (SEC-SAS), first demonstrated by Mathew et al. (2004), has now become a routine approach for the separation and structural analysis of mixture components, and is offered at many SAXS and SANS beamlines, as well as for laboratory instruments (David & Pérez, 2009; Graewert et al., 2015; Jordan et al., 2016; Brennich et al., 2017; Yeh et al., 2017; Johansen et al., 2018; Ryan et al., 2018; Bucciarelli et al., 2018; Inoue et al., 2019).
Improved X-ray detectors enable time-resolved measurements at shorter timescales and, in combination with lasers and rapid mixing devices, facilitate the study of macromolecular kinetics (Cammarata et al., 2008; Kubelka, 2009; Pollack, 2011; Graceffa et al., 2013; Levantino et al., 2015; Tuukkanen et al., 2017; Josts et al., 2020).
In a solution SAS experiment, the scattered radiation is generally isotropic and recorded as a 2D detector image. The isotropic data are azimuthally ('radially') averaged into a 1D scattering intensity curve $I(s)$, $s = (4\pi \sin\theta)/\lambda$, where $2\theta$ is the angle between the scattered and the incident radiation and $\lambda$ is the wavelength. Increased data quality can be achieved through the collection of replicate exposures from the sample, which are averaged into a single 1D scattering profile. Replicate exposures are similarly collected and averaged for the solvent. The resulting average is subtracted from the average sample scattering as a background, which includes the solvent, the sample holder and parasitic scattering effects. The background-subtracted 1D scattering profile thus represents scattering data from the sample alone and can be used to derive important structural characteristics such as the radius of gyration ($R_g$) (Guinier, 1939), maximum dimension ($D_{max}$), pair distance distribution function [$p(r)$] (Glatter, 1977; Svergun, 1992; Hansen, 2012), Porod volume ($V_p$) (Porod, 1951) and molecular weight (MW) (Orthaber et al., 2000; Mylonas & Svergun, 2007; Hajizadeh et al., 2018; Piiadov et al., 2019). Low-resolution models may be generated ab initio, either as dummy-atom/residue models (Svergun, 1999; Svergun et al., 2001; Franke & Svergun, 2009) or electron densities (Grant, 2018). Hybrid methods incorporating high-resolution models from other techniques such as X-ray crystallography can be applied to obtain atomistic representations of the macromolecule (Petoukhov & Svergun, 2005; Panjkovich & Svergun, 2016a). Sample polydispersity, which may occur due to oligomer formation or intrinsic molecular flexibility or disorder, can be considered, e.g., by modelling the solute as a mixture with defined components, each having different volume fractions (Tria et al., 2015; Konarev & Svergun, 2018).
The software tools developed by the SAS community offer access to various data analysis and modelling options. These include the multipurpose packages SASTBX (Liu et al., 2012), BioXTAS RAW (Hopkins et al., 2017), Sasview (http://www.sasview.org) and ScÅtter, which contain utilities for data handling and analysis in the form of radial averaging of 2D detector images to 1D scattering profiles, calculation of model-independent structural parameters, SEC-SAXS data processing and deconvolution (for BioXTAS RAW), and model fitting and refinement (for SASTBX). Various specific modelling tools have also been developed. A non-exhaustive list includes SASfit, which constructs models using an extensive library of analytical expressions (Breßler, Kohlbrecher & Thünemann, 2015); GenApp, a modular infrastructure containing SASSIE and US-SOMO, for atomistic modelling which integrates hydrodynamic information (Perkins et al., 2016; Brookes et al., 2016); FoXS, a web server for the calculation of SAXS data from atomic coordinates, which may be combined with docking (FoXSDock) or flexibility modelling (MultiFoXS) for biomolecular structures (Schneidman-Duhovny et al., 2016); and GENFIT, McSAS and X+, which perform ab initio modelling accounting for shape polydispersity, primarily for soft-matter SAS but with applications to large supramolecular assemblies of biomolecules like micelles and fibrils (Spinozzi et al., 2014; Bressler, Pauw & Thünemann, 2015; Ben-Nun et al., 2010). ATSAS is an evolving cross-platform software suite under continuous development which encompasses numerous utilities for SAS data processing, visualization, analysis and modelling. The general scope of the ATSAS suite is shown in Fig. 2, which enumerates specific programs that may be used for various data analysis scenarios. 
The utilities are largely developed for biological solutions but are generally applicable to a wide variety of monodisperse and polydisperse systems (Konarev et al., 2006; Petoukhov et al., 2007, 2012; Franke et al., 2017). First released in 2003, ATSAS has since been downloaded more than 100 000 times by over 18 000 unique users, and its use has grown along with the expanding community of biological SAS practitioners (Fig. 1). ATSAS online, a web application facilitating easy access to a subset of ATSAS tools, has similarly experienced a constant increase in usage since its release in 2007. An average of 40 000 jobs are submitted to ATSAS online per year, representing around 900 unique users. The growing ATSAS user community has served as an impetus for the continued improvement of the suite and has prompted new developments of specialized tools, several of which are discussed below. For the general ATSAS description we refer readers to previous publications (Konarev et al., 2006; Petoukhov et al., 2007, 2012) and to the comprehensive presentation of the features in ATSAS 2.8 (Franke et al., 2017). Here we focus on the changes made since the ATSAS 2.8 release, which include major improvements in the existing tools, technological and standardization updates, new modules for data simulation, and modelling programs for specific systems such as membrane proteins, liposomes and nucleic acids.

Figure 1
The number of biological SAS publications per year (1) has steadily increased over the past two decades, accompanied by an increase in the number of biological SAS publications which cite the ATSAS software suite (2). The numbers of unique users per year that downloaded ATSAS (3) and used the web applications in ATSAS online (4) also show a concurrent increase.
2. Calculation and simulation of scattering data

2.1. CRYSOL for anomalous SAXS
CRYSOL utilizes a spherical harmonics approach for rapidly calculating the scattering amplitudes and isotropic SAXS intensities from high-resolution atomic structures of macromolecules and optionally fitting the calculated scattering to experimental SAXS data (Svergun et al., 1995). Since the ATSAS 2.8 release, CRYSOL has been updated to provide scattering intensities not only in units proportional to electrons squared but also on an absolute scale per unit concentration [$I_{abs}(s)$ (cm$^{-1}$)/$c$ (mg ml$^{-1}$); file extension .abs]. In addition, CRYSOL can now be used to calculate scattering curves that incorporate wavelength-dependent anomalous effects. Anomalous X-ray scattering occurs when the wavelength of incident radiation is at or near an atom's absorption edge, i.e. at the energy that corresponds to electronic transitions of a particular element. At wavelengths close to the edge, the incident radiation is partially absorbed, resulting in electrons being excited to higher-energy states and a consequent reduction in scattering intensity (James et al., 1948). This anomalous effect allows one to quantify distance information in crystallography (Hendrickson, 2014), and has also been used for the same purpose in SAXS (Stuhrmann & Notbohm, 1981; Miake-Lye et al., 1983). The net reduction in the SAXS signal is, however, very low and has the potential to be lost in the background scattering (Fig. 3); therefore, accurate evaluation of the anomalous effect is of great importance in designing and cross-validating anomalous SAXS (ASAXS) experiments. 
An atom's X-ray scattering form factor $f$ is represented as a function with a wavelength-independent term, $f_0$, and two wavelength-dependent anomalous correction terms, $f'$ and $f''$ (James et al., 1948):

$$f(\lambda) = f_0 + f'(\lambda) + i f''(\lambda). \qquad (1)$$

Absorption edges are wavelengths at which $f'$ and $f''$ are at local minima and maxima, respectively, resulting in a decreased magnitude of the atomic form factor and an overall decrease in scattering intensity. CRYSOL may now be used to account for anomalous scattering effects, using the correction terms $f'$ and $f''$ for elements from calcium to uranium, and for X-ray energies in the range from 1.0 to 29.4 keV. The corrections were tabulated by the University of Washington Biomolecular Structure Center, http://skuld.bmsc.washington.edu/scatter/AS_periodic.html. The ASAXS mode of CRYSOL can be accessed via the command line by specifying the absorbing element and the energy in eV. The anomalous correction terms are applied to all instances of the specified element, while the rest of the atomic form factors are computed as usual. Since the provided correction terms are theoretical and may vary from the experimental values based on the chemical environment of the absorbing atom, users may also specify custom data files containing the experimental $f'$ and $f''$ values to more accurately account for the anomalous effects.
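The effect of the correction terms on a single atom can be illustrated with a minimal sketch. It assumes the standard complex form $f = f_0 + f' + i f''$ and is not CRYSOL code; the function names are hypothetical, and the $f'$ and $f''$ values are made-up placeholders rather than tabulated corrections. Only $f_0 = 65$ is physical, being the forward-scattering value for terbium (Z = 65).

```python
def anomalous_form_factor(f0: float, f_prime: float, f_double_prime: float) -> complex:
    """Complex atomic form factor f = f0 + f' + i f'' (hypothetical helper)."""
    return complex(f0 + f_prime, f_double_prime)


def relative_scattering_power(f0: float, f_prime: float, f_double_prime: float) -> float:
    """Ratio |f|^2 / f0^2: scattering power of one atom at the chosen energy
    relative to the wavelength-independent case."""
    f = anomalous_form_factor(f0, f_prime, f_double_prime)
    return abs(f) ** 2 / f0 ** 2


# Placeholder inputs for illustration only (NOT tabulated corrections).
F0_TB = 65.0                # forward scattering of terbium, Z = 65
F_PRIME_DEMO = -20.0        # made-up f' near an absorption edge
F_DOUBLE_PRIME_DEMO = 10.0  # made-up f'' near an absorption edge

if __name__ == "__main__":
    ratio = relative_scattering_power(F0_TB, F_PRIME_DEMO, F_DOUBLE_PRIME_DEMO)
    print(f"|f|^2 / f0^2 = {ratio:.3f}")
```

A negative $f'$ lowers the real part of $f$ by more than $f''$ adds to its magnitude in this example, so the atom scatters less strongly at the edge, in line with the reduction in intensity described above.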

Simulation of experimental scattering data
Realistic simulated data are often required to test SAS data analysis and modelling programs on a wide variety of macromolecules, for which experimental scattering data might be unavailable. IMSIM (image simulation) simulates 2D SAXS patterns that can be processed into 1D scattering data using existing radial averaging applications (Franke et al., 2020), e.g. IM2DAT, discussed in the next section. IMSIM requires calculated scattering data in absolute scale, e.g. from CRYSOL, and follows a purely statistical simulation approach, where the final intensities and error estimates of 1D patterns obtained from the radial average of the simulated 2D images exhibit the same statistical properties as observed with actual experimental data. Effects due to changes in concentration, exposure time, flux, wavelength, sample-detector distance and dimensions, pixel size, and detector mask and incident beam position can be considered in the simulation, but not systematic instrument effects. As currently implemented, IMSIM simulates X-ray scattering only, but with the addition of a constant to account for incoherent scattering and a resolution function to incorporate instrumental smearing effects (Barker & Pedersen, 1995) it may also be adapted to simulate 2D SANS patterns in future ATSAS releases.
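The statistical idea can be sketched for a single detector ring. This is a simplified illustration that assumes every pixel of the ring records an independent Poisson-distributed photon count with the same expected value; it is not the IMSIM implementation, and the function names are hypothetical.

```python
import math
import random


def poisson_sample(lam: float, rng: random.Random) -> int:
    """Draw one Poisson-distributed count (Knuth's multiplication method).

    Adequate for modest expected counts (lam below a few hundred)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_ring(expected_count: float, n_pixels: int, rng: random.Random):
    """Simulate the pixels of one ring and average them.

    Returns (mean count per pixel, standard error of that mean)."""
    counts = [poisson_sample(expected_count, rng) for _ in range(n_pixels)]
    total = sum(counts)
    mean = total / n_pixels
    # For Poisson counts the variance of the sum equals the sum itself.
    sigma = math.sqrt(total) / n_pixels
    return mean, sigma


if __name__ == "__main__":
    rng = random.Random(0)
    mean, sigma = simulate_ring(expected_count=4.0, n_pixels=20000, rng=rng)
    print(f"mean = {mean:.3f}, sigma = {sigma:.4f}")
```

Repeating this for every ring, with the expected count derived from the absolute-scale intensity and the experimental settings, would give a 1D profile whose error estimates follow counting statistics.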
Aside from applications in SAXS methods development and testing, the simulated data could be used, for example, to aid experimental design or beamline configuration to optimize photon counting and statistical variance in I(s), and also for educational purposes. Figs. 3 and 4 depict examples of 1D scattering profiles resulting from simulated 2D detector images from IMSIM which were subsequently radially averaged by IM2DAT.

Primary data processing
Primary data processing spans the steps from radial averaging to the computation of model-independent structural parameters from 1D scattering data. Care should be taken in the derivation of 1D scattering data from the set of 2D detector images, particularly with the identification and removal of outlier data frames, and error estimation and propagation. Improperly derived 1D scattering data would result in inaccurate structural parameters and potentially erroneous models. The practical implications of improper data handling are illustrated by the apparent and widespread misspecification of experimental errors in many data sets submitted to the Small-Angle Scattering Biological Databank (Kikhney et al., 2020). This highlights the need for cross-validation methods, preferably at multiple steps in the data processing pipeline. Below, we discuss several updates in the ATSAS 3.0 package which can be used for cross-validation at different processing steps, from the 2D image to the calculation of structural parameters.

Figure 3
Simulated SAXS data for parvalbumin (PDB ID 1pal), with terbium atoms in two calcium-binding sites of the protein. Regular, wavelength-independent scattering (top panel, black) was computed with CRYSOL in default mode, while anomalous scattering (top panel, red) was evaluated with CRYSOL in anomalous mode, at the L$_{III}$ absorption edge of terbium (7517 eV). Experimental data were simulated with IMSIM at two parvalbumin concentrations, 10 and 50 mg ml$^{-1}$. DATCMP was used to compare regular and anomalous scattering at the two concentrations, showing greater differences at 50 mg ml$^{-1}$ (details in Table 1). Residual plots on the bottom panel more clearly depict the differences between regular and anomalous scattering at 10 (black) and 50 mg ml$^{-1}$ (red). At both concentrations, there is a reduction in forward scattering at the absorption edge. The difference between regular and anomalous SAXS is partly obscured by noise at 10 mg ml$^{-1}$ but is more clearly visible at 50 mg ml$^{-1}$ parvalbumin.

Figure 4
1D scattering data from beta-lactamase (PDB ID 5hw5) simulated by IMSIM and radially averaged with IM2DAT (dark blue), overlaid with the source data calculated by CRYSOL (cyan), and the corresponding fit of the ab initio model from DATMIF (pink). The inset shows the DATMIF bead model superimposed on the source model. The offset residual plots show random distribution of the residuals around zero within the expected bounds (±3). Corresponding goodness-of-fit statistics are reported in Table 1.

Basic operations on 2D and 1D scattering data
IMOP (image operations) is a new support application for operations on 2D images, similar to the established DATOP (data operations) for 1D scattering data (Franke et al., 2017). IMOP supports addition and subtraction operations on images of equal size, as well as AND, OR and XOR operations that are intended for binary masks. In addition, it may be used to permanently apply a given bit-mask to an image. An example of the use of these elemental operations of IMOP is cross-validation of data reduction operations, e.g. by comparing radial averaging of N images and summing the 1D patterns versus the summation of N images followed by radial averaging.
IM2DAT (image to data), formerly called RADAVER (Konarev et al., 2006), performs azimuthal/radial averaging of 2D detector images into 1D scattering patterns. Error estimation is based on Poisson counting statistics. To detect outliers within the data of each ring, the Poisson-distributed photon counts are transformed via the Anscombe transform (Anscombe, 1948) to approximate a normal distribution, and a median-based robust z score (Iglewicz & Hoaglin, 1993) is calculated to reject outliers where z > 4. No attempt at subpixel analysis (i.e. pixel-splitting) has been implemented as this would probably introduce correlations between neighbouring intensity estimates, which cannot easily be tracked and propagated in subsequent operations.
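The outlier rejection described above can be sketched as follows. This is an illustration under stated assumptions, not the IM2DAT source: the Anscombe transform is taken in its usual form $2\sqrt{x + 3/8}$, the robust z score is the median-based modified z score $0.6745\,(x - \mathrm{median})/\mathrm{MAD}$, and rejection on the absolute value of z is an assumption. All function names are hypothetical.

```python
import math
import statistics


def anscombe(count: float) -> float:
    """Anscombe transform: makes Poisson counts approximately normal
    with unit variance."""
    return 2.0 * math.sqrt(count + 0.375)


def robust_z_scores(values):
    """Median-based modified z scores: 0.6745 * (x - median) / MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0.0:
        # Degenerate ring (more than half the values identical): no scaling
        # is possible, so nothing is flagged in this sketch.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def reject_outliers(counts, threshold: float = 4.0):
    """Split the photon counts of one ring into (kept, rejected)."""
    scores = robust_z_scores([anscombe(c) for c in counts])
    kept = [c for c, z in zip(counts, scores) if abs(z) <= threshold]
    rejected = [c for c, z in zip(counts, scores) if abs(z) > threshold]
    return kept, rejected


if __name__ == "__main__":
    ring = [98, 102, 100, 97, 103, 101, 99, 100, 250]
    kept, rejected = reject_outliers(ring)
    print("kept:", kept)
    print("rejected:", rejected)
```

Because both the median and the median absolute deviation are insensitive to a few extreme pixels, a single hot pixel does not inflate the scale estimate and mask itself, which is the usual motivation for a robust score over a mean-based one.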
In contrast to past versions of ATSAS, in which the radial averaging application was only available upon request, IM2DAT has now been included by default in ATSAS 3.0 to facilitate its use with IMSIM. The users may, of course, also separately employ IM2DAT to reprocess existing experimental 2D data. 1D data produced by radial averaging can be used for various downstream operations implemented in the DATTOOLS suite. Although there are no conceptual changes in DATTOOLS compared with its previous description (Franke et al., 2017), the error propagation implemented in these tools was extensively validated and corrected where needed. Once the provenance and independence of the initial error estimates are established, they can be used in further operations.

Variance and residual analysis
In SAS data analysis, several model-independent parameters, e.g. $D_{max}$, $V_p$ and MW, are computed as point estimates only, without an estimate of variance. In these cases, DATRESAMPLE may be used to determine the variability of these estimates by parametric resampling of the experimental intensities, i.e. by drawing randomly, at each $s$, from a normal distribution (Marsaglia & Bray, 1964) whose expected value is the intensity and whose standard deviation is the error estimate, scaled to account for the additional uncertainty. For example, to validate the $R_g$ variation estimate provided by DATRG, a single data frame can be resampled N = 1000 times, with $R_g$ calculated for each resampled frame from the same data range. The standard deviation of the obtained set of N resampled $R_g$ values can then be compared with the standard error estimate provided by DATRG for the original data. In addition to generating or validating variance estimates, DATRESAMPLE may be used to augment available training data for machine-learning applications by resampling a single data set N times.
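The resampling procedure can be sketched in a few lines. This is a minimal illustration and not the DATRESAMPLE or DATRG code: $R_g$ is obtained here from an unweighted straight-line fit of $\ln I$ against $s^2$ (Guinier approximation), no error scaling is applied, and all function names are hypothetical. The sketch assumes low-noise data so that all resampled intensities remain positive.

```python
import math
import random
import statistics


def guinier_curve(s_values, i0: float, rg: float):
    """Ideal Guinier intensities I(s) = I0 * exp(-(s * Rg)^2 / 3)."""
    return [i0 * math.exp(-((s * rg) ** 2) / 3.0) for s in s_values]


def guinier_rg(s_values, intensities) -> float:
    """Unweighted least-squares fit of ln I versus s^2; Rg = sqrt(-3 * slope)."""
    x = [s * s for s in s_values]
    y = [math.log(i) for i in intensities]  # requires positive intensities
    x_mean, y_mean = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    return math.sqrt(-3.0 * sxy / sxx)


def resampled_rg_stats(s_values, intensities, sigmas, n_resamples=1000, seed=0):
    """Draw n_resamples synthetic frames and return (mean Rg, std of Rg)."""
    rng = random.Random(seed)
    rgs = []
    for _ in range(n_resamples):
        # Each point is drawn from N(intensity, sigma) independently.
        frame = [rng.gauss(i, sig) for i, sig in zip(intensities, sigmas)]
        rgs.append(guinier_rg(s_values, frame))
    return statistics.fmean(rgs), statistics.stdev(rgs)


if __name__ == "__main__":
    s = [0.01 + 0.002 * k for k in range(26)]
    intensity = guinier_curve(s, i0=1000.0, rg=20.0)
    sigma = [0.01 * v for v in intensity]  # assumed 1% errors
    mean_rg, sd_rg = resampled_rg_stats(s, intensity, sigma)
    print(f"Rg = {mean_rg:.2f} +/- {sd_rg:.2f}")
```

The spread returned by `resampled_rg_stats` plays the role of the standard deviation of the resampled $R_g$ values, which would then be compared with the standard error reported for the original frame.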
The analysis of the outliers allows one to identify data sets influenced by effects like sample misloading, denaturing or radiation damage. The identification of these systematic deviations is one of the most important steps in the analysis pipeline. In previous ATSAS releases, DATCMP provided two statistical tests to determine the presence of systematic deviations: the reduced $\chi^2$ test, which requires well estimated experimental errors (Pearson, 1900), and CORMAP, which is independent of experimental errors. In this release, we added the Anderson-Darling statistic to DATCMP. This test evaluates the goodness of fit of the distribution of standardized residuals, i.e. the differences between experimental data and calculated scattering, divided by the propagated error estimates, to the expected standard normal distribution (Anderson & Darling, 1954; Stephens, 1974; Marsaglia & Marsaglia, 2004). Based on the properties of the standard normal distribution, it follows that, for two SAS profiles identical up to experimental noise, the residuals should be symmetric and centred on zero, and approximately 99.7% of them should fall in the range of ±3 (Fig. 4). Table 1 summarizes the results of the Anderson-Darling test, alongside the reduced $\chi^2$ and CORMAP tests, for the cases illustrated in Figs. 3 and 4. The first two cases in Table 1 involve the comparison of regular and anomalous scattering curves simulated from parvalbumin [Protein Data Bank (PDB) ID 1pal; Declercq et al., 1991] at two different concentrations (Fig. 3). At both concentrations, the standardized residuals were observed to have large systematic deviations from the standard normal distribution, and the hypothesis of the data sets being identical up to noise can be rejected at a significance level of $\alpha = 0.01$ for all three DATCMP tests, i.e. anomalous scattering effects, although rather small, are still reliably detected by the statistical tests. The next two cases in Table 1 are illustrated in Fig. 4. 
The (arbitrarily selected) high-resolution structure of beta-lactamase (PDB ID 5hw5; Roose et al., in preparation) was used as a model structure, from which noiseless scattering data were calculated using CRYSOL and experimental effects simulated using IMSIM. The IMSIM-simulated data were further used to generate an ab initio bead model with DATMIF. The third case in Table 1 compares the noiseless scattering data calculated with CRYSOL with those simulated using IMSIM, while the fourth compares the scattering profile of the ab initio bead model and the simulated data. In these last two cases, the hypothesis of being identical up to noise cannot be rejected at a significance level of $\alpha = 0.01$ for all the tests in DATCMP. As illustrated by the residual plots in Fig. 4, there are no systematic deviations in either case, the standardized residuals are randomly distributed, and their distribution follows, indeed, a standard normal distribution as underlined by the Anderson-Darling test.
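The Anderson-Darling comparison of standardized residuals with the standard normal distribution can be sketched as below. This is an illustration and not the DATCMP implementation: it computes only the $A^2$ statistic for a fully specified distribution and compares it with the commonly tabulated asymptotic critical value for $\alpha = 0.01$ (3.857, quoted here as an assumption), instead of evaluating a p value. Function names are hypothetical.

```python
import math
from statistics import NormalDist

# Asymptotic critical value of A^2 at alpha = 0.01 for a fully specified
# distribution, taken from standard tables (assumption of this sketch).
A2_CRITICAL_001 = 3.857


def standardized_residuals(experimental, calculated, sigmas):
    """(I_exp - I_calc) / sigma at every data point."""
    return [(e - c) / s for e, c, s in zip(experimental, calculated, sigmas)]


def anderson_darling_std_normal(residuals) -> float:
    """A^2 statistic of the residuals against the standard normal N(0, 1)."""
    x = sorted(residuals)
    n = len(x)
    cdf = NormalDist().cdf
    eps = 1e-16  # clip to avoid log(0) for extreme residuals
    total = 0.0
    for i, xi in enumerate(x, start=1):
        f_low = min(max(cdf(xi), eps), 1.0 - eps)
        f_high = min(max(cdf(x[n - i]), eps), 1.0 - eps)
        total += (2 * i - 1) * (math.log(f_low) + math.log(1.0 - f_high))
    return -n - total / n


def identical_up_to_noise(residuals, critical=A2_CRITICAL_001) -> bool:
    """True if the hypothesis of standard normal residuals is not rejected."""
    return anderson_darling_std_normal(residuals) < critical


if __name__ == "__main__":
    n = 200
    ideal = [NormalDist().inv_cdf((k + 0.5) / n) for k in range(n)]
    shifted = [r + 1.0 for r in ideal]  # systematic offset of one sigma
    print(identical_up_to_noise(ideal), identical_up_to_noise(shifted))
```

Unlike the reduced $\chi^2$ value, which only summarizes the overall magnitude of the residuals, this statistic is sensitive to the shape of their distribution, with extra weight in the tails.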

Derivation and validation of the p(r) function
Real-space distance information can be extracted from SAS data as a pair distance distribution function, $p(r)$. The scattering intensity $I(s)$ is the Fourier transform of the $p(r)$ function:

$$I(s) = 4\pi \int_0^{D_{max}} p(r) \frac{\sin(sr)}{sr}\, dr. \qquad (2)$$

The $p(r)$ function is then derived from $I(s)$ by the inverse transform:

$$p(r) = \frac{r^2}{2\pi^2} \int_0^{\infty} s^2 I(s) \frac{\sin(sr)}{sr}\, ds. \qquad (3)$$

Using equation (3) to compute the $p(r)$ function directly from experimental data is challenging, due to the limited angular range that can be physically measured and the contribution of experimental noise, particularly at high angles. To overcome these difficulties, indirect Fourier transformation approaches were developed, such as GNOM in the ATSAS package (Svergun, 1992). Here, $p(r)$ is parameterized by a set of analytical functions, and regularization is employed to balance the fit to experimental data and smoothness of the resulting distribution in real space, while also accounting for possible smearing effects (Glatter, 1977; Semenyuk & Svergun, 1991; Hansen, 2012). However, the direct application of equation (3) might be worth revisiting, especially as improvements in instrumentation, data collection and detector technologies have made experimental data less noisy and increasingly over-sampled, with often negligible smearing effects.
The program DATFT was developed to compute the $p(r)$ function through a direct Fourier transform of $I(s)$, without the use of regularization. This approach is applicable if $I(0)$ and $R_g$ are reliably assessed from the data using the Guinier approximation (e.g. in the absence of aggregation and interparticle interference) and may be used to cross-validate the $p(r)$ function obtained from GNOM. To reduce termination effects, i.e. artificial oscillations in the $p(r)$ function caused by the absence of scattering data at higher angles (Harris, 1978), DATFT extrapolates high-angle data as $I(s) = s^{-n}$, where the value of $n$ can be selected (e.g. $n = 4$ for globular particles and $n = 2$ for flexible chains). As input, DATFT takes the experimental scattering data, the desired number of points in the $p(r)$ function and its distance range $r_{max}$. In addition, $I(0)$ and $R_g$ must be provided to DATFT to facilitate the extrapolation of truncated low-angle data using the Guinier approximation (Guinier, 1939). The resulting $p(r)$ function gives an estimate of $D_{max}$, as well as $R_g$ derived from the entire experimental data set, which can be used to cross-validate the $R_g$ estimated from the Guinier region ($s < 1/R_g$) (Feigin & Svergun, 1987). Generally, no data pre-processing is required before the application of DATFT. However, best results are achieved for low-noise experimental data on an equidistant $s$ grid.
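A direct transform of this kind can be sketched numerically. The example assumes the standard inverse sine transform $p(r) = \frac{r^2}{2\pi^2}\int_0^\infty s^2 I(s)\,\frac{\sin(sr)}{sr}\,ds$, evaluated with the trapezoidal rule, and the real-space relation $R_g^2 = \int r^2 p(r)\,dr \,/\, (2\int p(r)\,dr)$. It is not the DATFT implementation: the low- and high-angle extrapolations are omitted, and all function names are hypothetical.

```python
import math


def trapezoid(x, y) -> float:
    """Trapezoidal integral of samples y over the grid x."""
    return sum(0.5 * (y[k] + y[k + 1]) * (x[k + 1] - x[k]) for k in range(len(x) - 1))


def direct_pr(s_values, intensities, r_values):
    """Direct sine transform of I(s).

    Uses p(r) = r / (2 pi^2) * integral of s * I(s) * sin(s r) ds, which is
    the same expression with the factor s r cancelled, so no division by
    zero occurs at s = 0 or r = 0."""
    p = []
    for r in r_values:
        integrand = [s * i * math.sin(s * r) for s, i in zip(s_values, intensities)]
        p.append(r * trapezoid(s_values, integrand) / (2.0 * math.pi ** 2))
    return p


def rg_from_pr(r_values, p_values) -> float:
    """Radius of gyration from the second moment of p(r)."""
    second_moment = trapezoid(r_values, [r * r * p for r, p in zip(r_values, p_values)])
    return math.sqrt(second_moment / (2.0 * trapezoid(r_values, p_values)))


if __name__ == "__main__":
    rg_true = 10.0
    s = [0.002 * k for k in range(601)]  # equidistant grid, 0 to 1.2
    # Noise-free test curve: a Guinier-like Gaussian valid at all s.
    intensity = [math.exp(-((x * rg_true) ** 2) / 3.0) for x in s]
    r = [0.5 * k for k in range(101)]    # 0 to 50
    p = direct_pr(s, intensity, r)
    print(f"Rg from p(r): {rg_from_pr(r, p):.3f}")
```

With noisy or truncated data, the integrand no longer decays to zero within the measured range, which is where the extrapolations described above become necessary.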
To verify whether the given $p(r)$ function is consistent with the experimental scattering data, a new tool, PDDFFIT, can be employed, which is useful for both the programs utilizing the reciprocal-space fits and those modelling directly to the $p(r)$ function. PDDFFIT derives the scattering data from the $p(r)$ function using equation (2), allowing a convenient comparison with experimental data with DATCMP or PRIMUS/Qt. Two helper tools were also added to ATSAS for manipulating output files from GNOM: OUT2POFR and OUT2FIT. OUT2POFR extracts the $p(r)$ function into a separate file, e.g. for plotting with a third-party software application, while OUT2FIT does the same for the fit between the experimental data and the Fourier transform of the $p(r)$ function.

Table 1
Summary of statistical analysis using the three DATCMP methods to assess goodness of fit of the data presented in Figs. 3, 4 and 7.
The test values are given, with the corresponding p values in parentheses. For all three tests, the hypothesis that there are no significant differences between data sets holds for significance level $\alpha = 0.01$ when the p value > $\alpha$.

Protein MW estimates from SAXS data
MW estimates derived from solution scattering data provide important information about possible aggregation or the oligomeric state of a macromolecule in solution. SAXS-derived MW estimates can be obtained if the concentration of the macromolecule is known, by comparing against scattering either from pure water or from a reference sample of known concentration and MW (Orthaber et al., 2000; Mylonas & Svergun, 2007). In the absence of accurate concentration estimates, for example for SEC-SAXS experiments, concentration-independent methods can be used. Some concentration-independent MW assessment methods use scattering invariants that are independent of data scaling, such as the Porod invariant ($Q_p$) to obtain an estimate of the volume ($V_p$) of the sample, from which MW is derived by dividing the volume by the partial specific volume to obtain $MM_{Qp}$ (Porod, 1951), and applying additional corrections as done in DATPOROD and SAXSMoW (Petoukhov et al., 2012; Piiadov et al., 2019). Another scattering invariant, the volume of correlation ($V_c$), was found to correlate with MW in a large survey of protein and RNA structures in the PDB, and this relationship can be used for MW estimation. The machine-learning method DATCLASS also leveraged numerous structures in the PDB, performing shape classification and $D_{max}$ and MW estimation from scattering data, independent of data scaling. In addition to the individual methods, we developed a Bayesian approach to combine the concentration-independent MW estimates into a single consensus value, while also providing a probability estimate and credibility interval (Hajizadeh et al., 2018). All methods mentioned are combined into a command-line tool DATMW, which is also accessible from the graphical user interface PRIMUS/Qt (described in Section 5).

Structure modelling using SAS data
SAS-based structure modelling goes beyond the parameters derived from primary data analysis to provide insight into the 3D organization of macromolecular systems. The modelling approaches for monodisperse systems range from ab initio methods that are purely based on the scattering data to hybrid methods incorporating high-resolution models of domains/ subunits and biochemical information. Additionally, scattering data from polydisperse systems can be modelled as mixtures of several scattering species, where the SAS data allow the evaluation of their volume fractions in solution. Below we discuss new structure modelling tools in the current ATSAS release as well as new features added to existing tools. Of particular note are the approaches for lipid and nucleic acid structure analysis developed in response to the increased use of SAS to characterize these types of macromolecules.

Ab initio methods
Ab initio modelling is applicable in cases where no structural information is available about the macromolecule of interest. ATSAS contains several ab initio modelling tools that are based on either comparison with simple shapes (BODIES) (Konarev et al., 2003), bead/dummy-atom models (DAMMIN, DAMMIF and MONSA) (Svergun, 1999;Franke & Svergun, 2009;Svergun & Nierhaus, 2000), or, in the case of proteins, dummy amino-acid representations (GASBOR) (Svergun et al., 2001). Below, we briefly describe two new tools for ab initio modelling in ATSAS 3.0.
4.1.1. Direct modelling from experimental data. Several ab initio bead modelling applications in the ATSAS suite (DAMMIN, DAMMIF, GASBOR) do not model the experimental data directly, using instead the regularized scattering data computed by GNOM during the generation of the $p(r)$ function. A new application, DATMIF, derived from DAMMIF, has been added to the current ATSAS release. DATMIF produces bead models by direct fitting of the scattering data, thereby making use of the experimental error estimates. Aside from the data fit, the only modelling penalty applied by DATMIF is the Akaike information criterion (AIC), which minimizes the number of parameters (in this case, beads). Hence, the AIC minimizes the volume of the final model, which results in compact protein-like structures (Fig. 4, inset).

Multiphase modelling of solubilized membrane proteins. MONSA performs ab initio modelling of systems consisting of multiple phases with distinct contrasts (Svergun, 1999; Svergun & Nierhaus, 2000) and may thus be used to model detergent-solubilized transmembrane proteins. However, the ab initio reconstruction of membrane proteins is an ill-posed problem, with an even larger number of potential solutions than the single-phase ab initio modelling. A proper use of additional information about the system is therefore essential for this type of ab initio analysis. A new preparatory tool, DAMEMB, imposes knowledge-based constraints by building the initial MONSA search volume consisting of three phases corresponding to the protein, detergent tails and detergent heads (Fig. 5). Users may specify the thickness of the last two phases on the basis of the chemistry of the detergent used. To facilitate optimal data fitting in MONSA, the phase assignment of the boundary regions between each pair of phases is variable, including any boundary shared between the protein core and the solvent phases. DAMEMB may also be used for membrane-associated proteins by shifting the protein phase to the surface of the search volume, and symmetry restrictions may be imposed.

Figure 5
A DAMEMB-generated initial search volume for multiphase modelling of membrane proteins with MONSA. The protein phase 1 (cyan) is defined within a spherical core region, located at the origin of the search volume. The core volume is surrounded by two distinct phases, 2 and 3, corresponding to the tail (pink) and head group (yellow) regions of a detergent molecule. The thickness of each phase, as well as that of the boundary region Δd, may be specified by the user.

Hybrid methods
Hybrid modelling methods can be employed in cases where either partial or full high-resolution structures of the macromolecule of interest are available. Hybrid methods in ATSAS utilize either rigid-body or flexible modelling approaches. In rigid-body methods, the high-resolution structures are represented as immutable blocks arranged in space to optimally fit the scattering data, while also meeting geometric criteria such as structure connectivity and lack of clashes. ATSAS programs for rigid-body modelling include, but are not limited to, SASREF, which models oligomers and complexes given the structures of the subunits; BUNCH, which builds multidomain protein models given the structures of the domains while adding missing linker residues; and CORAL, a combination of the above two methods, to model protein complexes with missing residues (Petoukhov & Svergun, 2005;Petoukhov et al., 2012). Flexible modelling does not keep the high-resolution models fixed, instead allowing them to change conformation. For example, the ATSAS program SREFLEX permits high-resolution protein structures to be morphed along their Cartesian normal modes, in order to find alternative conformations better agreeing with the experimental scattering data (Panjkovich & Svergun, 2016a).
In the current ATSAS release, two hybrid modelling tools were added: ELLLIP, for the rigid-body modelling of bicellar systems, and NMATOR, for modelling conformational changes in nucleic acid structures. Below we present these new tools, as well as updates to SREFLEX.
4.2.1. Quasi-atomistic bicellar modelling. The program ELLLIP builds quasi-atomistic models of ellipsoidal liposomes (Fig. 6) (Petukhov et al., 2020). The liposomes are constructed as two nested ellipsoids corresponding to the inner and outer leaflets. The sizes and shapes of the leaflets can be specified by the user by defining the lengths of the ellipsoid semi-axes. Two quasi-uniform angular grids are generated for the outer and inner liposomal leaflets, and each of them can have a user-defined number of directions. The angular grids are then populated with pairs of adjacent lipid molecules, which could be previously modelled with molecular dynamics as decoupled building blocks. Subsequently, ELLLIP may be used to randomize the positions of the lipids, whereby their centres are additionally displaced to account for the possible nonideality and disorder of the bilayer. In addition to liposome modelling, ELLLIP is applicable to other bicellar systems, e.g. those made of proteins. Note that the program does not perform any optimizations or fitting of the experimental data; it just generates the liposomal scaffolds, which can be used in subsequent modelling with other tools.
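The scaffold construction described above can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not ELLLIP's implementation: a Fibonacci spiral is used here as one possible quasi-uniform angular grid, each grid direction receives a single anchor point rather than a pair of atomistic lipids, and the function names and semi-axis values are hypothetical.

```python
import math
import random

def quasi_uniform_directions(n):
    """n roughly evenly spread unit vectors (Fibonacci spiral, an assumed choice)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    directions = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        rho = math.sqrt(1.0 - z * z)
        phi = golden_angle * i
        directions.append((rho * math.cos(phi), rho * math.sin(phi), z))
    return directions

def leaflet_positions(semi_axes, n_directions, max_shift=0.0, seed=0):
    """Place one anchor point per grid direction on an ellipsoidal leaflet."""
    a, b, c = semi_axes
    rng = random.Random(seed)
    points = []
    for ux, uy, uz in quasi_uniform_directions(n_directions):
        # Distance along the direction at which x^2/a^2 + y^2/b^2 + z^2/c^2 = 1
        t = 1.0 / math.sqrt((ux / a) ** 2 + (uy / b) ** 2 + (uz / c) ** 2)
        # Optional random radial displacement to mimic bilayer disorder
        t += rng.uniform(-max_shift, max_shift)
        points.append((t * ux, t * uy, t * uz))
    return points

# Arbitrary semi-axes (angstroms) for the outer and inner leaflets
outer = leaflet_positions((60.0, 50.0, 40.0), 200)
inner = leaflet_positions((40.0, 30.0, 20.0), 120)
disordered_outer = leaflet_positions((60.0, 50.0, 40.0), 200, max_shift=2.0)
```

In a fuller model, each anchor point would carry a lipid oriented along the local surface normal; only the grid and the randomization step are shown here.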
4.2.2. Modelling conformational changes. Normal mode analysis (NMA) approximates conformational changes of a macromolecule as coordinated, harmonic motions around an initial equilibrium position (Goldstein, 1950) and has been shown to approximate interdomain motions in many proteins (Tama & Sanejouand, 2001; Krebs et al., 2002; Alexandrov et al., 2005; Tobi & Bahar, 2005; Dobbins et al., 2008; Wako & Endo, 2011). NMA is the basis for the SREFLEX algorithm (Panjkovich & Svergun, 2016a), which models conformational changes in proteins by modifying an initial structure along its low-frequency normal modes in Cartesian space, in search of a model that provides an improved fit to the experimental scattering data. SREFLEX can be used, for example, to model conformational differences between the crystal and solution structures, provided that these differences are detectable by SAS. A new feature has been implemented in the current version of SREFLEX, which produces a pool of alternative models from an initial high-resolution structure. The pool mode of SREFLEX can be used as a source of initial models for modelling structures with intrinsic flexibility, for example, with EOM, the ensemble optimization method (Tria et al., 2015).
SREFLEX was found to work well for proteins but has limitations for nucleic acids, possibly leading to breaks in the modified models. The new program NMATOR also employs NMA to capture conformational differences by SAS (Fig. 7 and Table 1) but uses the normal modes in torsion angle space instead of Cartesian space (Manalastas-Cantos & Svergun, 2021). NMATOR has been optimized for single-chain nucleic acid structures, morphing high-resolution models through coordinated, iterative bond rotations that alter the backbone dihedral angles: i.e. φ and ψ for protein structures; α, β, γ, ε and ζ for nucleic acids. In order to prevent spuriously large amplitudes at the ends of the molecule that may occur due to lighter packing, we have added a stiffening factor to the tip regions, as described by Lu et al. (2006). Since only bond rotations are imposed, NMATOR avoids the nonviable motions that may result from NMA in Cartesian space; the latter does not consider bond connectivity, and can thus introduce distortions due to excessive bond stretching or compression (López-Blanco & Chacón, 2016). NMATOR can be used in three modes: (i) to compute normal modes in torsion angle space, (ii) to refine an initial structure along its normal modes and fit the experimental SAXS data, as discussed above, and (iii) to generate a pool of alternative configurations from the initial model, which can be used for ensemble modelling of flexible structures, in a similar way to SREFLEX's pool mode.

Figure 6. ELLLIP builds a liposome as two nested quasi-ellipsoids corresponding to the inner and outer liposome leaflets. The ellipsoidal shapes can be user specified by defining the lengths of the ellipsoid semi-axes ($A_\mathrm{out}$, $B_\mathrm{out}$ and $C_\mathrm{out}$ for the outer leaflet, and $A_\mathrm{in}$, $B_\mathrm{in}$ and $C_\mathrm{in}$ for the inner leaflet). Atomic models of the constituent lipids (grey beads) are placed on angular grids (top right) that define the outer (pink) and inner (blue) leaflets of the liposome. After the grids have been populated with lipids, a randomization step occurs in which the lipid molecules are displaced to account for possible disorder.
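The difference between moves in torsion angle space and moves in Cartesian space can be shown on a toy chain. The sketch below is not NMATOR's code and computes no normal modes; it only applies a single bond rotation (via the Rodrigues rotation formula) and a single Cartesian displacement to a hypothetical five-atom chain, to show that the former preserves every bond length while the latter does not. All names and coordinates are assumptions.

```python
import math

def rotate_about_axis(point, origin, axis_end, angle):
    """Rotate `point` by `angle` (radians) about the line origin -> axis_end."""
    axis = [e - o for e, o in zip(axis_end, origin)]
    norm = math.sqrt(sum(a * a for a in axis))
    k = [a / norm for a in axis]
    v = [p - o for p, o in zip(point, origin)]
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    k_cross_v = [k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0]]
    k_dot_v = sum(a * b for a, b in zip(k, v))
    rotated = [v[i] * cos_a + k_cross_v[i] * sin_a + k[i] * k_dot_v * (1.0 - cos_a)
               for i in range(3)]
    return tuple(r + o for r, o in zip(rotated, origin))

def apply_torsion(chain, bond_index, angle):
    """Rotate every atom after the bond (bond_index, bond_index + 1) about that bond."""
    a, b = chain[bond_index], chain[bond_index + 1]
    fixed = chain[:bond_index + 2]
    moved = [rotate_about_axis(p, a, b, angle) for p in chain[bond_index + 2:]]
    return fixed + moved

def bond_lengths(chain):
    return [math.dist(p, q) for p, q in zip(chain, chain[1:])]

# Hypothetical five-atom chain (arbitrary coordinates)
chain = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.4, 0.0),
         (3.5, 1.4, 0.3), (4.0, 2.8, 0.0)]

torsion_move = apply_torsion(chain, 1, math.radians(40.0))

# Cartesian move: shift only the last atom, ignoring connectivity
x, y, z = chain[-1]
cartesian_move = chain[:-1] + [(x + 0.5, y, z)]

print(bond_lengths(chain))
print(bond_lengths(torsion_move))    # unchanged bond lengths
print(bond_lengths(cartesian_move))  # last bond is stretched
```

A torsional normal mode would combine many such rotations with mode-specific amplitudes; the bond-length invariance shown here holds for each rotation and therefore for their composition.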

Polydisperse systems
In contrast to monodisperse systems, in which all particles in solution are identical, polydisperse systems require data analysis methods that take into account both the structures and the volume fractions of different particles in solution. The scattering profile from a mixture can be represented as the volume-weighted sum of the scattering profiles of the individual components:

$$I(s) = \sum_{k=1}^{N} v_k I_k(s).$$

Here the mixture is assumed to contain N distinct scattering species, each with the scattering profile $I_k(s)$, comprising volume fractions $v_k$. The addition of unknown variables to the system, such as scattering species of unknown structure and/or concentration, necessitates the use of multiple distinct scattering curves to adequately constrain the possible solutions.
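A small numerical illustration of the volume-weighted sum may help. The sketch below is not an ATSAS program: it assumes two components modelled as homogeneous spheres (using the standard sphere form factor), builds a mixture curve from known volume fractions and then recovers the fraction of the first component by linear least squares under the constraint that the two fractions sum to one. Function names and radii are hypothetical.

```python
import math

def sphere_intensity(s, radius):
    """Scattering intensity of a homogeneous sphere, normalized to I(0) = 1."""
    x = s * radius
    if x < 1e-8:
        return 1.0
    amplitude = 3.0 * (math.sin(x) - x * math.cos(x)) / x ** 3
    return amplitude ** 2

def mixture_intensity(fractions, profiles):
    """Volume-weighted sum of component profiles sampled on a shared s grid."""
    n_points = len(profiles[0])
    return [sum(v * p[i] for v, p in zip(fractions, profiles))
            for i in range(n_points)]

def fit_two_component_fraction(mixture, profile_a, profile_b):
    """Least-squares v such that mixture ~ v * a + (1 - v) * b."""
    numerator = sum((m - b) * (a - b)
                    for m, a, b in zip(mixture, profile_a, profile_b))
    denominator = sum((a - b) ** 2 for a, b in zip(profile_a, profile_b))
    return numerator / denominator

s_grid = [0.01 * i for i in range(1, 51)]          # inverse angstroms
small = [sphere_intensity(s, 20.0) for s in s_grid]
large = [sphere_intensity(s, 50.0) for s in s_grid]

mixed = mixture_intensity([0.3, 0.7], [small, large])
recovered = fit_two_component_fraction(mixed, small, large)
print(round(recovered, 6))
```

With noise-free data and both component profiles known, a single curve suffices; once a component's profile is itself unknown, the problem becomes underdetermined, which is why several curves are needed.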
Depending on the type of polydisperse system, the scattering curves can either represent different time points (for evolving systems) or different sample conditions. In the present ATSAS release, three new methods were added to characterize polydisperse systems: DAMMIX, for ab initio reconstruction of an unknown intermediate in an evolving system; LIPMIX and BILMIX, to model polydispersity in multilamellar and asymmetric lipid vesicles, respectively.
4.3.1. Modelling evolving systems. DAMMIX reconstructs ab initio the low-resolution shape of a transient component together with its volume fraction, on the basis of multiple scattering patterns recorded from an evolving system (Konarev & Svergun, 2018). The system is assumed to be a closed three-component mixture with known starting and final structures, and an unknown intermediate to be reconstructed. The three components have volume fractions with the relationship $v_m(k) + v_i(k) + v_a(k) = 1$, for k scattering curves representing different time points, where $v_m$, $v_i$ and $v_a$ are volume fractions for the monomer (starting structure), intermediate and aggregate (final structure), respectively (Fig. 8).
DAMMIX can also be applied to two-component evolving systems when one component (e.g. the monomer) is known, allowing the reconstruction of the unknown component. In addition, DAMMIX can be used to retrieve the shapes of unknown components in systems with multiple assembly states, for instance, virus-like particles or nanoparticles stabilized by polymer chains. For these more complicated pathways, chemometric approaches such as multivariate curve resolution-alternating least squares (MCR-ALS) (Herranz-Trillo et al., 2017) and evolving factor analysis (EFA) (Maeder, 1987; Maeder & Neuhold, 2007; Meisburger et al., 2016) could aid in finding subsets of the data taken along the pathways where DAMMIX may be applied.

Figure 7. NMATOR models conformational changes in RNA structures to fit SAS data, while preserving bond lengths. Both the initial and target models were obtained from the solution NMR ensemble of U65 Box H/ACA snoRNA (35 nt; PDB ID 2pcv; Jin et al., 2007). The target model is shown as grey spheres in the bottom-left inset, with the initial model superimposed in cyan. SAXS data were simulated from the target model with IMSIM. The conformational differences between the initial and target models are detected as a poor fit between the IMSIM-simulated SAXS data from the target and the scattering data computed by CRYSOL from the initial model (statistics are summarized in Table 1). The NMATOR model (red) recapitulated the unbending of the short helix, resulting in a better correspondence to the target model and a much better fit to the simulated data. The residuals are shown in the bottom panel.

4.3.2. Modelling polydisperse lipid vesicles. The programs BILMIX and LIPMIX (Konarev et al., 2021) use scattering data from a mixture of lipid vesicles to reconstruct the electron density across the lipid bilayer [$\rho(z)$] and the size distribution of the vesicles [$D_v(r)$] (Fig. 9). BILMIX can account for vesicle anisotropy, while LIPMIX allows the vesicles to be modelled as multilayered structures.
In both programs, the scattering data from a lipid vesicle are approximated using a separated form factor (SFF) approach. SFF is a product of the form factor of a thin spherical shell $F_\mathrm{TS}$, which defines the vesicle size, and the form factor of a flat lipid bilayer $F_\mathrm{FB}$ describing the electron density across the bilayer (Kiselev et al., 2002; Pencer et al., 2006). The scattering profile of each distinct vesicle k of a specific size and architecture can thus be expressed by equation (5). The last term in equation (5) is implemented only in LIPMIX, and accounts for the presence of M distinct multilayer architectures, each with an inter-bilayer structure factor $S^{i}_\mathrm{FB}$ and occupancy factor $w_i$ (Zhang et al., 1994). The form factor $F_\mathrm{FB}(s)$ is the Fourier transform of the electron-density profile $\rho(z)$ (Fig. 9, right panel), defined as a sum of Gaussian terms. The two Gaussian terms of width $\sigma_\mathrm{H1}$, centred at $\pm z_\mathrm{H1}$, represent the hydrophilic head groups, while the Gaussian term of width $\sigma_c$ centred at z = 0 (the middle of the bilayer) represents the electron density of the hydrophobic core. The two Gaussian terms of width $\sigma_\mathrm{H2}$, centred at $\pm z_\mathrm{H2}$, are implemented only in BILMIX and allow the modelling of asymmetric electron-density profiles, e.g. proteins associated with the inner or outer leaflets of the liposome. Both BILMIX and LIPMIX can be utilized to model various liposomal systems and serve as tools for lipidomics structural studies.
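The separated form factor idea can be sketched numerically. The code below is an illustration under explicit assumptions and is neither the BILMIX nor the LIPMIX implementation: it takes the standard amplitude of an infinitely thin spherical shell, models a symmetric bilayer profile as two head-group Gaussians plus one core Gaussian (the asymmetric terms and the inter-bilayer structure factor are omitted), uses the analytical Fourier transform of a Gaussian, and multiplies the squared amplitudes. Function names, amplitudes and widths are hypothetical.

```python
import math

def thin_shell_amplitude(s, radius):
    """Scattering amplitude of an infinitely thin spherical shell."""
    x = s * radius
    sinc = math.sin(x) / x if x > 1e-8 else 1.0
    return 4.0 * math.pi * radius ** 2 * sinc

def gaussian_profile(z, terms):
    """Electron-density profile: sum of Gaussians (amplitude, centre, width)."""
    return sum(a * math.exp(-((z - z0) ** 2) / (2.0 * w * w))
               for a, z0, w in terms)

def bilayer_amplitude(s, terms):
    """Fourier transform of the Gaussian profile, returned as (real, imag)."""
    real = imag = 0.0
    for a, z0, w in terms:
        scale = a * w * math.sqrt(2.0 * math.pi) * math.exp(-0.5 * (s * w) ** 2)
        real += scale * math.cos(s * z0)
        imag += scale * math.sin(s * z0)
    return real, imag

def sff_intensity(s, radius, terms):
    """Product of squared shell and bilayer amplitudes (no structure factor)."""
    real, imag = bilayer_amplitude(s, terms)
    return thin_shell_amplitude(s, radius) ** 2 * (real * real + imag * imag)

# Arbitrary symmetric profile: head groups at +/-18 A, hydrophobic core at 0
terms = [(1.0, 18.0, 3.0), (1.0, -18.0, 3.0), (-0.5, 0.0, 5.0)]

s_grid = [0.005 * i for i in range(1, 61)]
curve = [sff_intensity(s, 300.0, terms) for s in s_grid]
```

A polydisperse sample would then be described by summing such curves over vesicle radii, weighted by the size distribution.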

Technical updates and standardization
Several changes have been made in ATSAS 3.0 to facilitate maintainability and future development. These include preparations for read and write compatibility with the mmCIF format, as well as updates to the graphical frameworks.

mmCIF compatibility
A number of programs in the ATSAS suite make use of high-resolution structure files, including CRYSOL, which computes scattering from atomic coordinates, and the hybrid modelling methods, which use high-resolution structures as building blocks for SAS-guided modelling. As the PDB has made mmCIF the new standard format for structure files (Hall & McMahon, 2005;Berman et al., 2014;Adams et al., 2019), the ATSAS software is currently being adapted to be read and write compatible with both PDB and mmCIF formats. As of the current release, the programs BUNCH and NMATOR utilize both PDB and mmCIF formats as input. In order to use BUNCH, a preparatory program, PRE_BUNCH, must first be run. This produces a single PDB file containing the domains and the appropriate number of dummy atoms representing the missing loop regions, which is then used by BUNCH as input. PRE_BUNCH has been updated to read both PDB and mmCIF structure files, thus allowing BUNCH to be used with mmCIFs as the initial input. For other relevant ATSAS applications and in the interim period while not all ATSAS programs are natively mmCIF compatible, a format conversion utility CIF2PDB can be used. CIF2PDB converts structure files from mmCIF to the PDB format, making them readable by all ATSAS programs.

Updates to graphical interfaces
PRIMUS/Qt provides an interactive graphical user interface for many ATSAS applications and acts as an interactive plotting and data analysis tool. In the current release, PRIMUS/Qt was ported to utilize the most recent long-term support release of the Qt5 framework (https://www.qt.io) for continued and improved cross-platform support. The functional enhancements in PRIMUS/Qt include, but are not limited to, improved plot display, configurability and export to bitmap and vector graphic formats with variable size and resolution, addition of residual plots where data fitting is performed, and a redesign of the view for pairwise comparisons of data sets. The latter now allows for minor mismatches of the angular grid and provides a square heatmap-like overview of comparison results employing CORMAP or the reduced χ² test. Further, all statistics implemented in DATCMP are immediately accessible in this view.
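The square overview of pairwise comparisons can be pictured as a symmetric matrix of test statistics. The sketch below is only an illustration and does not reproduce the formulas used by DATCMP or PRIMUS/Qt: it assumes that all curves share the same angular grid, that no scaling between curves is needed, and that the number of degrees of freedom equals the number of points. The toy curves follow the Guinier approximation with hypothetical radii of gyration and a constant error.

```python
import math

def reduced_chi_square(curve_a, curve_b):
    """Reduced chi-square between two curves given as (intensity, error) pairs."""
    total = 0.0
    for (ia, ea), (ib, eb) in zip(curve_a, curve_b):
        total += (ia - ib) ** 2 / (ea ** 2 + eb ** 2)
    return total / len(curve_a)

def pairwise_matrix(curves):
    """Square matrix of reduced chi-square values for all pairs of curves."""
    return [[reduced_chi_square(a, b) for b in curves] for a in curves]

def guinier_curve(rg, s_grid, error=0.01):
    """Toy data: I(s) = exp(-s^2 Rg^2 / 3) with a constant error estimate."""
    return [(math.exp(-(s * rg) ** 2 / 3.0), error) for s in s_grid]

s_grid = [0.005 * i for i in range(1, 41)]
curves = [guinier_curve(rg, s_grid) for rg in (20.0, 21.0, 25.0)]
matrix = pairwise_matrix(curves)

for row in matrix:
    print(" ".join(f"{value:10.3f}" for value in row))
```

Rendering each matrix element as a coloured cell gives the heatmap-like view; similar curves appear as low values close to the diagonal.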
The graphical interface in CHROMIXS enables a convenient and rapid display of thousands of SEC-SAS data frames, as well as manual or automated selection of sample and buffer frames (Panjkovich & Svergun, 2018). Extra features have been added to CHROMIXS since the ATSAS 2.8 release, which include the calculation of MW and R g estimates for the selected sample-peak elution frames, as well as the ability to load and visualize other time-course data, e.g. UV absorbance (Fig. 10).
A plugin, SASpy, enables the usage of a subset of ATSAS functions within the molecular visualization system PyMOL, facilitating creation, manipulation and SAS-guided refinement of hybrid models in a graphical environment (Panjkovich & Svergun, 2016b). SASpy has been updated to be both Python 2 and Python 3 compatible. Also, feature updates to several main ATSAS programs are now available in SASpy, such as an explicit hydrogens toggle for CRYSOL, which enables users to generate accurate scattering amplitudes for the input structure files with atomic groups not recognized in the default mode. SASpy is distributed both as a component of the ATSAS package and as an open-source PyMOL plugin (https://github.com/emblsaxs/saspy).

Conclusions
The ATSAS 3.0 release introduces a set of new functionalities, which include modelling tools for lipids and nucleic acids, and expanded options for polydisperse systems. Data simulation tools have also been introduced in this release, with the intention of spurring SAS methods development in a wider developer community. In addition, to facilitate maintainability and future development, ATSAS was updated to technical standards, including support of the mmCIF format and utilization of the most recent versions of graphical frameworks.
ATSAS can be installed and used locally (installers for Windows, Mac OS and Linux available at https://www.embl-hamburg.de/biosaxs/software.html). Alternatively, many programs can be run on the EMBL Hamburg cluster via the ATSAS online interface (https://www.embl-hamburg.de/biosaxs/atsas-online/). Feedback from the user community serves as an important guide to future developments in ATSAS and can be given at the SAXIER forum (https://www.saxier.org/forum/).

Figure 10. CHROMIXS updates. (a) Regions in the SEC-SAS data (blue line) which represent the sample (green, on the peak) and buffer (red, on the flat region) can be selected manually or automatically. The R g or MW across the sample region (black correlation, through the sample elution peak) can be calculated. (b) Complementary time-course data (black dots), such as a UV absorbance trace to track protein elution, can be loaded and viewed together with the SEC-SAS data. The third (rightmost) UV absorbance peak corresponds to buffer mismatch, i.e. components in the sample buffer that are not present in the SEC running buffer.