Describing small-angle scattering profiles by a limited set of intensities

An indirect Fourier transform method is presented which describes a solution scattering profile from a reduced set of intensities. Equations are derived to fit the experimental profile using least squares and to calculate commonly used size and shape parameters directly from the reduced set of intensities, along with associated uncertainties. An analytical equation is derived enabling regularization of the real-space pair distribution function. Convenient software is provided to perform all described calculations.


S1. Extension of Moore's IFT
Moore uses a trigonometric series to define a function $Q(r) = P(r)/r$. This definition yields a convenient relationship between the real-space $Q(r)$ and the reciprocal-space $U(q) = qI(q)$, which are Fourier mates. This results in equations 17 and 18, defining $P(r)$ and $I(q)$:

$$P(r) = \frac{r}{2\pi^2} \sum_{n=1}^{\infty} a_n \sin\left(\frac{n\pi r}{D}\right) \tag{17}$$

$$I(q) = \frac{D}{\pi} \sum_{n=1}^{\infty} \frac{a_n}{q} \left[ \frac{\sin(qD - n\pi)}{qD - n\pi} - \frac{\sin(qD + n\pi)}{qD + n\pi} \right] \tag{18}$$

where $a_n$ are the weights for each term in the series (the Moore coefficients) and $D$ is the maximum particle dimension. (Note: modest variations from Moore's original description of these functions, by a factor of $2\pi$, are due to the use of $q = 4\pi \sin(\theta)/\lambda$ rather than $s = 2 \sin(\theta)/\lambda$, where $2\theta$ is the scattering angle and $\lambda$ is the wavelength.)
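As a numerical sanity check of equations 17 and 18, the following minimal sketch evaluates both series for a few made-up Moore coefficients and compares equation 18 with a direct quadrature of $P(r)$. It assumes the usual convention $I(q) = 4\pi \int_0^D P(r) \sin(qr)/(qr)\,dr$, which is not written out in this section; the function names and the coefficient values are illustrative only.

```python
import math


def sinc(x):
    """sin(x)/x with the removable singularity at x = 0 handled explicitly."""
    return 1.0 if abs(x) < 1e-12 else math.sin(x) / x


def moore_pr(r, a, D):
    """P(r) from Moore coefficients a = [a_1, a_2, ...] (equation 17)."""
    series = sum(a_n * math.sin(n * math.pi * r / D)
                 for n, a_n in enumerate(a, start=1))
    return r / (2.0 * math.pi ** 2) * series


def moore_iq(q, a, D):
    """I(q) from the same coefficients (equation 18), valid for q > 0."""
    series = sum(a_n / q * (sinc(q * D - n * math.pi) - sinc(q * D + n * math.pi))
                 for n, a_n in enumerate(a, start=1))
    return D / math.pi * series


def iq_by_quadrature(q, a, D, steps=4000):
    """Trapezoid estimate of 4*pi * integral_0^D P(r) sin(qr)/(qr) dr."""
    h = D / steps
    total = 0.0
    for k in range(steps + 1):
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * moore_pr(k * h, a, D) * sinc(q * k * h)
    return 4.0 * math.pi * h * total


a = [1.0, -0.3, 0.1]   # illustrative coefficients, not fitted to any data
D = 50.0               # maximum dimension in Angstrom
print(moore_iq(0.07, a, D), iq_by_quadrature(0.07, a, D))
```

The two printed values should agree to within the quadrature error, which is the sense in which the same $a_n$ describe both the real-space and the reciprocal-space profile.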
Key to Moore's approach, and to other IFT methods (Glatter, 1977; Svergun, 1992), is that the weights $a_n$ define both the real-space and reciprocal-space profiles through the appropriate basis functions. Least squares can be used to determine the $a_n$ values and their associated standard errors by minimizing $\chi^2$ (equation 19):

$$\chi^2 = \sum_{i=1}^{N} \left[ \frac{I_e(q_i) - I_c(q_i)}{\sigma_i} \right]^2 \tag{19}$$

where $I_e(q_i)$ is the experimental intensity for data point $i$, $I_c(q_i)$ is the intensity calculated at $q_i$ from equation 18, $\sigma_i$ is the experimental error on the intensity, and $N$ is the total number of data points.
Moore's use of Shannon information theory to define $I(q)$ resulted in a selection of $q$ values, namely $q_n = n\pi/D$, termed "Shannon channels" (Feigin & Svergun, 1987; Svergun & Koch, 2003; Rambo & Tainer, 2013). The intensities at $q_n$, i.e. $I_n = I(q_n)$, are therefore important values: they determine the $a_n$ values and can thus be used to completely describe the low-resolution size and shape of a particle obtainable by SAS.
It is therefore convenient to derive the mathematical relationship between $I_n$ and $a_n$.
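Note that here we use $m$ to refer to a particular term in the series, and $n$ when referring to the terms in the function defining the entire series. The intensity $I_m$ at $q_m = m\pi/D$ is

$$I_m = \frac{D}{\pi} \sum_{n=1}^{\infty} \frac{a_n D}{m\pi} \left[ \frac{\sin((n - m)\pi)}{(n - m)\pi} - \frac{\sin((n + m)\pi)}{(n + m)\pi} \right]$$

Since

$$\frac{\sin((n - m)\pi)}{(n - m)\pi} - \frac{\sin((n + m)\pi)}{(n + m)\pi} = \begin{cases} 0 & : n \neq m \\ 1 & : n = m \end{cases}$$

the sum reduces to a single term when $m = n$, resulting in

$$I_m = \frac{D^2}{\pi^2} \frac{a_m}{m}$$

and therefore

$$a_m = \frac{m\pi^2}{D^2} I_m \tag{23}$$

Defining the basis functions $B_n$ as

$$B_n(q) = \frac{n\pi}{qD} \left[ \frac{\sin(qD - n\pi)}{qD - n\pi} - \frac{\sin(qD + n\pi)}{qD + n\pi} \right]$$

$I(q)$ can now be expressed as a sum of the basis functions $B_n$ weighted by physical intensity values at $q_n$:

$$I(q) = \sum_{n=1}^{\infty} I_n B_n(q)$$

As in Moore's original approach, the $B_n$ functions are determined by the maximum dimension of the particle, $D$. The $B_n$ functions for $D = 50$ Å are illustrated in Figure 1.

The $P(r)$ function can be determined from the continuous $I(q)$ according to equation 27:

$$P(r) = \frac{r}{2\pi^2} \int_0^{\infty} q I(q) \sin(qr)\,dq \tag{27}$$

$P(r)$ can also be represented using the series of $I_n$ values by inserting equation 23 into equation 17, resulting in equation 28:

$$P(r) = \frac{r}{2D^2} \sum_{n=1}^{\infty} n I_n \sin\left(\frac{n\pi r}{D}\right) \tag{28}$$

or by defining real-space basis functions $S_n$ as follows:

$$S_n(r) = \frac{r}{2D^2}\, n \sin\left(\frac{n\pi r}{D}\right), \qquad P(r) = \sum_{n=1}^{\infty} I_n S_n(r)$$

As intensity values measured precisely at each $q_m$ are typically not collected during an experiment, least-squares minimization of $\chi^2$ can be used to determine optimal values for each $I_m$ from the oversampled SAS profile, resulting in greatly increased precision for each $I_m$ compared to measured intensities. To determine the set of optimal $I_m$ values, let

$$\chi^2 = \sum_{i=1}^{N} \frac{1}{\sigma_i^2} \left[ I_e(q_i) - \sum_{n} I_n B_n(q_i) \right]^2$$

Values for each $I_m$ are sought which minimize $\chi^2$, i.e. where $\delta\chi^2/\delta I_m = 0$, yielding

$$\sum_{n} I_n \sum_{i=1}^{N} \frac{B_m(q_i) B_n(q_i)}{\sigma_i^2} = \sum_{i=1}^{N} \frac{I_e(q_i) B_m(q_i)}{\sigma_i^2}$$

for all $m$. Let

$$C_{mn} = \sum_{i=1}^{N} \frac{B_m(q_i) B_n(q_i)}{\sigma_i^2}$$

and

$$Y_m = \sum_{i=1}^{N} \frac{I_e(q_i) B_m(q_i)}{\sigma_i^2}$$

so that the optimal values are given in matrix form by $I = C^{-1} Y$.

Furthermore, the errors on the calculated $I_c(q)$ curve can be calculated as

$$\sigma_{I_c}^2(q) = \sum_{m} \sum_{n} B_m(q)\, (C^{-1})_{mn}\, B_n(q)$$

and the errors in $P(r)$ are

$$\sigma_{P}^2(r) = \sum_{m} \sum_{n} S_m(r)\, (C^{-1})_{mn}\, S_n(r)$$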
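The least-squares step described above can be sketched with NumPy as follows. This is a minimal illustration, not the software provided with the paper: the function names and the vector name `Y` are hypothetical, the series is truncated at a user-chosen `n_max`, and the basis functions are written in the form obtained by inserting the $a_n$ to $I_n$ relation into equation 18, so their normalization may differ from other implementations.

```python
import numpy as np


def shannon_basis(q, D, n_max):
    """Return B with B[n-1, i] = B_n(q_i) for q_i > 0 and n = 1..n_max."""
    q = np.asarray(q, dtype=float)
    n = np.arange(1, n_max + 1)[:, None]
    x = q[None, :] * D
    # np.sinc(t) is sin(pi t)/(pi t), so sin(u)/u equals np.sinc(u / pi).
    return n * np.pi / x * (np.sinc((x - n * np.pi) / np.pi)
                            - np.sinc((x + n * np.pi) / np.pi))


def fit_shannon_intensities(q, I_exp, sigma, D, n_max):
    """Weighted least-squares estimate of I_n, with propagated errors on the fit."""
    B = shannon_basis(q, D, n_max)
    W = B / np.asarray(sigma, dtype=float) ** 2
    C = W @ B.T                      # C_mn = sum_i B_m(q_i) B_n(q_i) / sigma_i^2
    Y = W @ np.asarray(I_exp)        # Y_m  = sum_i I_e(q_i) B_m(q_i) / sigma_i^2
    C_inv = np.linalg.inv(C)         # covariance matrix of the fitted I_n
    I_n = C_inv @ Y
    I_fit = I_n @ B
    sigma_fit = np.sqrt(np.einsum("mi,mn,ni->i", B, C_inv, B))
    return I_n, C_inv, I_fit, sigma_fit


# Synthetic, noise-free example: the fit should return the I_n used to build it.
D = 50.0
q = np.linspace(0.01, 0.3, 150)
I_true = np.array([1.0, 0.5, 0.2, 0.05])          # made-up values
I_exp = I_true @ shannon_basis(q, D, 4)
I_n, C_inv, I_fit, sigma_fit = fit_shannon_intensities(
    q, I_exp, np.full(q.size, 0.01), D, 4)
print(I_n)
```

With real data the number of terms would normally be tied to the measured $q$ range, for example roughly $q_{max} D / \pi$ channels; that choice is left to the caller in this sketch.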

S2. Derivation of Size Parameters and Error Estimates
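Here the detailed derivation is presented for calculating $R_g$ from the $I_n$ values. Derivations for the remaining parameters and errors follow similarly. $R_g$ can be calculated from the $P(r)$ curve according to the following equation:

$$R_g^2 = \frac{\int_0^D r^2 P(r)\,dr}{2 \int_0^D P(r)\,dr}$$

To determine the equation relating the $I_n$ values to $R_g$, we substitute equation 28 into this expression:

$$R_g^2 = \frac{\int_0^D \frac{r^3}{2D^2} \sum_{n=1}^{\infty} n I_n \sin\left(\frac{n\pi r}{D}\right) dr}{2 \int_0^D \frac{r}{2D^2} \sum_{n=1}^{\infty} n I_n \sin\left(\frac{n\pi r}{D}\right) dr}$$

Since

$$I(0) = 4\pi \int_0^D P(r)\,dr$$

the denominator can be simplified to

$$2 \int_0^D P(r)\,dr = \frac{I(0)}{2\pi}$$

The numerator becomes

$$\int_0^D \frac{r^3}{2D^2} \sum_{n=1}^{\infty} n I_n \sin\left(\frac{n\pi r}{D}\right) dr$$

Since Porod's law shows that intensity (and thus the $I_n$ values) decays as $q^{-4}$ for globular particles, and similarly as $q^{-2}$ for random chains, the infinite sum is guaranteed to converge to a finite value. Since the sum converges, the Fubini and Tonelli theorems (Fubini, 1907; Tonelli, 1909) show that the infinite sum can be exchanged with the finite integral as follows:

$$\sum_{n=1}^{\infty} \int_0^D \frac{r^3}{2D^2}\, n I_n \sin\left(\frac{n\pi r}{D}\right) dr$$

Pulling constants out of the integral results in

$$\frac{1}{2D^2} \sum_{n=1}^{\infty} n I_n \int_0^D r^3 \sin\left(\frac{n\pi r}{D}\right) dr$$

The integral can be solved by three iterations of integration by parts and evaluated at the limits to obtain

$$\int_0^D r^3 \sin\left(\frac{n\pi r}{D}\right) dr = (-1)^{n+1} \frac{D^4 \left[ (n\pi)^2 - 6 \right]}{(n\pi)^3}$$

which can be combined with equation 40 to ultimately obtain equation 8. Similar steps can be followed for the remaining parameters.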
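Carrying the steps above through gives $I(0) = 2\sum_n (-1)^{n+1} I_n$ and $R_g^2 = \frac{D^2}{I(0)} \sum_n (-1)^{n+1} I_n \left[1 - \frac{6}{(n\pi)^2}\right]$. These closed forms are worked out here from the definitions in this section and may be arranged differently from equation 8. The sketch below checks them against a homogeneous sphere, for which $R_g^2 = 3R^2/5$; the function names are illustrative.

```python
import math


def i0_from_shannon(I_n):
    """Forward scattering I(0) = 2 * sum_n (-1)^(n+1) I_n."""
    return 2.0 * sum((-1) ** (n + 1) * I for n, I in enumerate(I_n, start=1))


def rg_from_shannon(I_n, D):
    """Radius of gyration from Shannon intensities and maximum dimension D."""
    weighted = sum((-1) ** (n + 1) * I * (1.0 - 6.0 / (n * math.pi) ** 2)
                   for n, I in enumerate(I_n, start=1))
    return D * math.sqrt(weighted / i0_from_shannon(I_n))


def sphere_intensity(q, R):
    """Normalized scattering intensity of a homogeneous sphere of radius R."""
    x = q * R
    return (3.0 * (math.sin(x) - x * math.cos(x)) / x ** 3) ** 2


R = 25.0
D = 2.0 * R
# Intensities sampled exactly at the Shannon channels q_n = n*pi/D.
I_n = [sphere_intensity(n * math.pi / D, R) for n in range(1, 401)]
print(i0_from_shannon(I_n), rg_from_shannon(I_n, D), math.sqrt(0.6) * R)
```

The series is truncated at 400 terms here only to make the truncation error negligible for the comparison; the terms decay quickly, as expected from Porod's law.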
The average vector length in the particle, $\bar{r}$, is defined as

$$\bar{r} = \frac{\int_0^D r P(r)\,dr}{\int_0^D P(r)\,dr}$$

The Porod volume can then be calculated, using its definition containing the Porod invariant $Q$ (Porod, 1982), by the following equation:

$$V_p = \frac{2\pi^2 I(0)}{Q}, \qquad Q = \int_0^{\infty} q^2 I(q)\,dq$$

The volume of correlation (Rambo & Tainer, 2013), $V_c$, is defined as

$$V_c = \frac{I(0)}{\int_0^{\infty} q I(q)\,dq} = \frac{V_p}{2\pi \ell_c}$$

where $\ell_c$ is the length of correlation (Porod, 1982). $V_c$ can thus be estimated from the $I_n$ values.

Since the matrix $C^{-1}$ contains all the information on the variances and covariances of the $I_n$ values, the uncertainties in each parameter can be derived using error propagation.
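For a parameter $f$ that depends on the $I_n$ values, the propagated variance is

$$\sigma_f^2 = \sum_{m} \sum_{n} \frac{\partial f}{\partial I_m}\, (C^{-1})_{mn}\, \frac{\partial f}{\partial I_n}$$

The error in $I(0)$, which follows from the $q \to 0$ limit of equation 18 as $I(0) = 2\sum_n (-1)^{n+1} I_n$, is thus

$$\sigma_{I(0)}^2 = 4 \sum_{m} \sum_{n} (-1)^{m+n} (C^{-1})_{mn}$$

The errors in $R_g$, $\bar{r}$, $Q$, $V_p$, $V_c$ and $\ell_c$ (the last estimated from equation 49) follow in the same way from the partial derivatives of the corresponding expressions. In equation 8 it can be seen that $R_g$ has a non-linear dependence on $I_n$, and thus the error on $R_g$ is dependent on $R_g$ itself.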
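The propagation can be illustrated for $I(0)$ and $R_g$ with a short NumPy sketch. It assumes the closed forms $I(0) = 2\sum_n (-1)^{n+1} I_n$ and $R_g^2 = \frac{D^2}{I(0)}\sum_n (-1)^{n+1} I_n [1 - 6/(n\pi)^2]$, derived from the definitions in this supplement rather than copied from the numbered equations, and it takes the covariance matrix `cov` (that is, $C^{-1}$) as an input. The function names and the example numbers are made up.

```python
import numpy as np


def _signs_and_weights(n_max):
    n = np.arange(1, n_max + 1)
    signs = np.where(n % 2 == 1, 1.0, -1.0)        # (-1)^(n+1)
    weights = 1.0 - 6.0 / (n * np.pi) ** 2
    return signs, weights


def i0_and_error(I_n, cov):
    """I(0) and its standard error from the I_n values and their covariance."""
    I_n = np.asarray(I_n, dtype=float)
    signs, _ = _signs_and_weights(I_n.size)
    grad = 2.0 * signs                              # dI(0)/dI_m
    return grad @ I_n, np.sqrt(grad @ cov @ grad)


def rg_and_error(I_n, cov, D):
    """R_g and its standard error; note that the gradient contains R_g itself."""
    I_n = np.asarray(I_n, dtype=float)
    signs, weights = _signs_and_weights(I_n.size)
    i0 = 2.0 * signs @ I_n
    rg = D * np.sqrt((signs * weights) @ I_n / i0)
    grad = signs * (D ** 2 * weights - 2.0 * rg ** 2) / (2.0 * rg * i0)  # dR_g/dI_m
    return rg, np.sqrt(grad @ cov @ grad)


I_n = np.array([0.6, 0.09, 0.001, 0.006])           # illustrative values
cov = 1e-6 * np.eye(4)                              # illustrative covariance
print(i0_and_error(I_n, cov), rg_and_error(I_n, cov, 50.0))
```

The gradient of $R_g$ depends on $R_g$ through the term $2R_g^2$, which is the non-linear dependence noted above.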

S3. Derivation of Analytical Regularization of P(r)
To enable regularization of the $P(r)$ curve within the derivation described above, $S$ has been chosen to take the commonly used form of equation 58, based on the integral of $[P''(r)]^2$ over $0 \le r \le D$, where $P''(r)$ is the second derivative of $P(r)$ with respect to $r$. The second derivative is often chosen because it is sensitive to large oscillations in the $P(r)$ function: smoother functions have fewer oscillations and thus give a smaller $S$. This representation allows for an analytical solution to the problem of regularization of the $P(r)$ curve.
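To begin, the second derivative of $P(r)$ can be calculated from equation 28 as

$$P''(r) = \sum_{n=1}^{\infty} I_n \frac{n}{2D^2} \left[ 2k_n \cos(k_n r) - k_n^2\, r \sin(k_n r) \right], \qquad k_n = \frac{n\pi}{D}$$

Differentiating equation 58 with respect to $I_m$, with the factor of 2 from the chain rule absorbed into the normalization of $S$, gives

$$\frac{\delta S}{\delta I_m} = \int_0^D \left[ \frac{m}{2D^2} \left( 2k_m \cos(k_m r) - k_m^2\, r \sin(k_m r) \right) \right] \sum_{n=1}^{\infty} I_n \frac{n}{2D^2} \left( 2k_n \cos(k_n r) - k_n^2\, r \sin(k_n r) \right) dr$$

Since the term in square brackets outside the sum is independent of $n$, it can be brought inside the sum, and the integration and summation exchanged:

$$\frac{\delta S}{\delta I_m} = \sum_{n=1}^{\infty} I_n \frac{mn}{4D^4} \int_0^D \left( 2k_m \cos(k_m r) - k_m^2\, r \sin(k_m r) \right) \left( 2k_n \cos(k_n r) - k_n^2\, r \sin(k_n r) \right) dr$$

The terms in the integrand can now be expanded, yielding four terms in total:

$$\frac{\delta S}{\delta I_m} = \sum_{n=1}^{\infty} I_n \frac{mn}{4D^4} \int_0^D \left[ 4k_m k_n \cos(k_m r)\cos(k_n r) - 2k_m k_n^2\, r \cos(k_m r)\sin(k_n r) - 2k_m^2 k_n\, r \sin(k_m r)\cos(k_n r) + k_m^2 k_n^2\, r^2 \sin(k_m r)\sin(k_n r) \right] dr$$

Each term can now be integrated and evaluated at the limits to obtain the following for $m \neq n$:

$$\frac{\delta S}{\delta I_m} = \sum_{n=1}^{\infty} I_n \left[ 0 - \frac{(-1)^{m+n} m^2 n^4 \pi^2}{2D^5 (m^2 - n^2)} + \frac{(-1)^{m+n} m^4 n^2 \pi^2}{2D^5 (m^2 - n^2)} + \frac{(-1)^{m+n} m^4 n^4 \pi^2}{D^5 (m^2 - n^2)^2} \right]$$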
These terms can be combined and represented by $G_{mn}$ below. Note that when $m = n$ the equation is undefined, so the integration has been repeated after taking the limit as $m$ approaches $n$ for the special case when $m = n$. Taken together, the derivative of $S$ with respect to $I_m$ can now be represented as equation 65:

$$\frac{\delta S}{\delta I_m} = \sum_{n=1}^{\infty} I_n G_{mn} \tag{65}$$

where

$$G_{mn} = \begin{cases} \dfrac{\pi^2}{2D^5} (mn)^2\, \dfrac{m^4 + n^4}{(m^2 - n^2)^2}\, (-1)^{m+n} & : m \neq n \\[2ex] \dfrac{\pi^2}{48 D^5}\, n^4 \left( 2n^2\pi^2 + 33 \right) & : m = n \end{cases}$$
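The closed-form $G_{mn}$ can be checked numerically. The sketch below builds the matrix from the two cases above and compares it with a trapezoid estimate of $\int_0^D S_m''(r)\, S_n''(r)\,dr$, where $S_n(r) = n\, r \sin(n\pi r/D)/(2D^2)$ is the real-space basis function implied by equation 28. The function names are illustrative, and the comparison assumes that $G_{mn}$ is exactly this overlap integral.

```python
import numpy as np


def g_matrix(n_max, D):
    """G[m-1, n-1] from the closed-form expressions for G_mn."""
    idx = np.arange(1, n_max + 1)
    M, N = np.meshgrid(idx.astype(float), idx.astype(float), indexing="ij")
    sign = np.where((idx[:, None] + idx[None, :]) % 2 == 0, 1.0, -1.0)
    same = np.eye(n_max, dtype=bool)
    # Use a dummy denominator on the diagonal; those entries are replaced below.
    denom = np.where(same, 1.0, (M ** 2 - N ** 2) ** 2)
    off_diag = (np.pi ** 2 / (2 * D ** 5) * (M * N) ** 2
                * (M ** 4 + N ** 4) / denom * sign)
    diag = np.pi ** 2 / (48 * D ** 5) * N ** 4 * (2 * N ** 2 * np.pi ** 2 + 33)
    return np.where(same, diag, off_diag)


def s_second_derivative(r, n, D):
    """Second derivative of S_n(r) = n r sin(n pi r / D) / (2 D^2)."""
    k = n * np.pi / D
    return n / (2 * D ** 2) * (2 * k * np.cos(k * r) - k ** 2 * r * np.sin(k * r))


def g_matrix_numeric(n_max, D, points=20001):
    """Trapezoid estimate of the overlap integrals of S_m'' and S_n'' on [0, D]."""
    r = np.linspace(0.0, D, points)
    h = r[1] - r[0]
    d2 = np.array([s_second_derivative(r, n, D) for n in range(1, n_max + 1)])
    weights = np.full(points, h)
    weights[[0, -1]] = h / 2
    return (d2 * weights) @ d2.T


G = g_matrix(4, 50.0)
G_num = g_matrix_numeric(4, 50.0)
print(np.max(np.abs(G - G_num) / np.abs(G)))
```

Because the entries of $G$ scale as $D^{-5}$ they are very small in absolute terms, so any comparison should use a relative rather than an absolute tolerance.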
Following the same procedure outlined in equations 31 through 34 and now including equation 65, equation 15 can be solved by least-squares minimization to yield the optimal values for each $I_m$ while accounting for the regularizing function $S$ according to the following modified equations: