research papers
Dummyatom modelling of stacked and helical nanostructures from solution scattering data
^{a}Institute of Inorganic Chemistry, Graz University of Technology, Stremayrgasse 9/V, Graz 8010, Austria
^{*}Correspondence email: amenitsch@tugraz.at
The availability of dummyatom modelling programs to determine the shape of monodisperse globular particles from smallangle solution scattering data has led to outstanding scientific advances. However, there is no equivalent procedure that allows modelling of stacked, seemingly endless structures, such as helical systems. This work presents a beadmodelling algorithm that reconstructs the structural motif of helical and rodlike systems. The algorithm is based on a `projection scheme': by exploiting the recurrent nature of stacked systems, such as helices, the full structure is reduced to a single buildingblock motif. This building block is fitted by allowing random dummyatom movements without an underlying grid. The proposed method is verified using a variety of analytical models, and examples are presented of successful shape reconstruction from experimental data sets. To make the algorithm available to the scientific community, it is implemented in a graphical computer program that encourages user interaction during the fitting process and also includes an option for shape reconstruction of globular particles.
Keywords: SAXS; stacked structures; helical structures; shape retrieval; SasHel; structure determination; solution scattering; computational modelling; structural biology; nanoscience.
1. Introduction
Smallangle Xray scattering (SAXS) is an established technique to study the structural aspects, such as the size and shape, of molecular systems in solution (Li et al., 2016). As this structural information is not directly apparent from the recorded scattering intensity, one requires a fitting process that generally relies on an underlying mathematical model (Pedersen, 1997). While a variety of such analytical models are available in the literature (Pedersen, 1997), each of them is bound to a given shape. The fitting process of an experimental data set therefore requires previous knowledge of the sample such that the appropriate model can be chosen. Dummyatom (DA) modelling, which describes the particle shape as a variable bead assembly, bypasses this issue as the fitting process is no longer constrained to a single geometry (Chacón et al., 1998; Walther et al., 2000; Svergun et al., 2001; Franke & Svergun, 2009; Koutsioubas et al., 2016). Consequently, no a priori knowledge regarding the solute's shape is necessary. This advancement has helped to make SAXS an attractive technique to characterize stable molecules in solution, far beyond the community of scattering experts (Yang, 2014).
However, single molecules in solution are not always stable. In fact, it is in their nature to interact and aggregate, often resulting in highly organized hierarchical systems stretching over several orders of magnitude (Palmer & Stupp, 2008; Wasielewski, 2009; Busseron et al., 2013; Praetorius & Dietz, 2017). Helical and chiral superstructures are a common yet spectacular class of such systems, e.g. the length of the human genome DNA double helix exceeds several centimetres (Venter et al., 2001) while the forces that give rise to the characteristic helical motif act on the molecular (nanometre) level (Dobbs, 2007). Amyloid fibrillation, i.e. the aggregation of proteins into insoluble, often helical, fibrils, has been linked to critical diseases such as type 2 diabetes or Alzheimer's (Dobson, 2003; von Bergen et al., 2000) as well as to unwanted drug degradation (MorozovaRoche & Malisauskas, 2007; Vestergaard et al., 2007). Consequently, a comprehensive understanding of these systems requires structural characterization at the (supra)molecular scale. Smallangle Xray diffraction of aligned fibres [e.g. the famous studies of the structure of DNA (Wilkins et al., 1953; Watson & Crick, 1953)] is a powerful technique for this purpose. However, the evaluation of solution scattering data from randomly oriented helical structures faces two challenges: (i) only a few analytical models are available from which structural information, such as pitch and twist, can be retrieved (Schmidt, 1970; Pringle & Schmidt, 1971; Hamley, 2008; Teixeira et al., 2010); and (ii) the endless nature of helices does not allow DA modelling using current programs (Volkov & Svergun, 2003; Gingras et al., 2008).
In this work, we present a beadmodelling algorithm to determine the structural motif of monodisperse systems, highly elongated in one dimension, in solution from SAXS data. We use symmetrical boundary conditions to project the seemingly infinite nature of e.g. helical systems onto a single buildingblock unit, represented by dummy atoms (DA). This building block is altered by random DA movements while simultaneously fitting the corresponding scattering curve against the experimental one. The proposed method is verified using simulated data sets of various onedimensional structures and we subsequently apply it to experimentally obtained SAXS data. The full algorithm is implemented in a computer program (SasHel), which also includes an option for globular geometries. The reduction of the system's complexity by symmetric projection and the fast code implementation result in a toolkit that allows a full shape retrieval from scattering data in the order of 3–90 min on a standard work station (see Appendix A), depending on the level of detail.
2. Projection scheme
Dummyatom modelling is based on the idea that a given particle shape is represented by an assembly of N small beads of equal scattering length density, called dummy atoms (DA). According to the Debye formula (Debye, 1915), the scattering intensity of this assembly can then be calculated from the bead assembly as
where the norm of the scattering vector is denoted as q = 4πsin(θ)/λ (2θ is the scattering angle and λ is the radiation wavelength) and f_{DA}(q) represents the DA form factor (given in this work by a Gaussian sphere approximation; Koutsioubas & Pérez, 2013; Svergun et al., 1995; Grudinin et al., 2017; Fraser et al., 1978). A DA diameter of 0.2 nm was used for all reconstructions. As this formalism considers the distances between all possible DA pairs r_{ij} = r_{j} − r_{i} inside the assembly, its mathematical complexity is . Hence, the computational effort increases drastically for large N. Adequate modelling of seemingly infinite rodlike geometries requires very large numbers of DAs (N 10 000), making the Debye formula appear inadequate in its standard notation.
Rodlike structures, such as helices, often possess a certain structural motif, a building block, which recurs along the elongation direction. Hence, only this building block is of interest for shape reconstruction. We define a building block of N_{BB} dummy atoms with its elongation direction parallel to the z axis (see Fig. 1). The full rodlike structure is then reproduced by duplicating the building block M times along the z direction, with a stacking distance corresponding to the building block's height H_{BB} (see Fig. 1). The evaluation of this stacked model by means of the Debye formula [equation (1)] now includes redundant terms, as distinct motifs inside the structure are now recurrent. For instance, the single buildingblock motif would be evaluated at each repetition along z, such that the same computation is performed a total of M times. It is therefore sufficient to calculate the scattering intensity of the building block only once and scale it by the number of stacks. The same principle applies to interbuildingblock contributions. For example, as the motif of neighbouring building blocks can be found (M − 1) times in the full structure, we can evaluate this motif once and, again, multiply it by the corresponding scalar.
In mathematical terms, the Debye formula can be adjusted to neglect these redundancies, resulting in
where e_{z} denotes the unit vector along the z axis (elongation/stacking direction). This formalism reduces the calculation of the scattering intensity to the sum of structural motifs found inside the stacked model (see scheme in Fig. 1). It further bypasses the demand for a DA model to represent the full structure, as we evaluate only projections of the building block instead of all stacked duplicates, hence the term `projection scheme'. Its benefit, compared with the standard Debye formula, is a reduction in the mathematical complexity from to . In the case of the example shown in Fig. 1 (N_{BB} = 400, M = 15), this increases the calculation speed approximately seven fold.
The notion of reducing a given model to its structural motifs raises a M necessary to represent a seemingly endless structure. This number defines the length of the entire projected model L, simply via L = MH_{BB}. According to scattering theory, rodlike structures present a characteristic q^{−1} powerlaw behaviour in the intermediate Porod regime (Glatter & Kratky, 1982), whereas the transition from this Porod regime (q^{−1}) to the adjacent lowerangle Guinier regime (q^{−0}) occurs at a specific scattering vector q_{1} depending on the length of the rod, which can be estimated using q_{1} = 18^{1/2}/L in the simplified framework of the Hammouda model (Hammouda, 2010). Thus, if the experimental data present a q^{−1} power law at even the smallest accessible scattering angle q_{min}, the length of the structure must be larger than L 18^{1/2}/q_{min}. Similarly, we use this relation to determine the minimum number of stacks required, according to M 2 + 18^{1/2}/(H_{BB}q_{min}) (with the additional term of + 2 to avoid truncation effects).
regarding the required number of stacks3. Determination of the stacking distance
The projection scheme is a faster alternative for calculating the scattering intensity of a stacked structure compared with the standard Debye formula. As is apparent from equation (2), the formalism requires an additional input parameter, the stacking distance between the building blocks H_{BB}. In the following, we present a pathway to determine this using the pair distance distribution function (PDDF).
The PDDF corresponds to the surfaceweighted probability to find two points separated by a distance r inside a particle (Glatter & Kratky, 1982). It is thus a histogram of all distances that appear inside a given particle, weighted by the corresponding electron densities. In the case of DA assemblies, the PDDF can be also interpreted as the volumeweighted pair correlation function p(r).
An analytically available example of this definition yields the case of stacked spheres using the formalism based on the work of Glatter (1980), from which a series of conclusions can be drawn (see Fig. S1 in the supporting information for a system of ten stacked spheres). Evidently, as the system contains ten spheres the corresponding PDDF presents ten peaks. The first peak (blue trace in Fig. S1) relates to the mean shape of all the spheres involved – it is hence an intrabuildingblock PDDF. The following nine peaks (red traces in Fig. S1) are caused by the repetitive nature of the system, as for each possible spheretosphere distance a new peak is found – they are hence interbuildingblock PDDFs. These interbuildingblock PDDFs, in particular the distances between them, therefore hold information on the stacking distance between the building blocks.
In order to determine the stacking distance of a structure from a given PDDF, it would be intuitive to measure the peak distances. However, the exact peak shapes and therefore positions are distorted due to (i) the linear highr decay in the PDDF (see dashed lines in Figs. S1 and S2) and (ii) the overlap of neighbouring peak contributions (Glatter, 1980; Feigin & Svergun, 1987). A direct measurement of the peak positions in the PDDF might therefore result in misdetermination of the stacking distance. A stable approach to circumvent this issue is to calculate the derivative of the PDDF (dPDDF). This numerically fast and easy operation suppresses the abovementioned decay distortion (see dPDDF in Fig. S1). The resulting dPDDF can then be fitted by e.g. a damped sinusoidal function in which the period is directly related to the mean stacking distance (see Fig. S1 and Section S1 in the supporting information).
A more complex example illustrating the named distortion effects is the case of a torus (a ring with a circular ). Such a torus presents two characteristic dimensions: the ring–centre diameter and the ring thickness. As a result, the PDDF of such a single torus exhibits two distinct peaks correlating to these structural features (see Fig. S2; the model scattering data, including an artificial error band, were calculated according to Appendix A). Considering now the case of multiple stacked rings (Kornmueller et al., 2015), for each ring that is added on top of the other(s) we find a new peak in the corresponding PDDF (see Figs. S2 and S3). However, in this case, the radial size of the tori (diameter 20 nm) was selected to be larger than the stacking distance between them (7 nm), resulting in an increased distortion of the interbuildingblock peak positions [see distortion phenomenon (ii) above, as well as Fig. S2]. As shown in Fig. S2, the determination of the stacking distance by fitting of the dPDDF yields a stable result.
Kawaguchi, 2001We employ this approach to determine the stacking distance from all repetitive structures presented in this work. The advantage of this choice is that we avoid the use of a complementary characterization technique, as the underlying PDDF can be determined from a given data set using a variety of available software packages. A more detailed discussion of the limitations and possible causes of misdetermination can be found in Sections S1.2 and S1.3 and Figs. S1–S6.
4. Fitting algorithm
According to the presented projection scheme, we reduce a seemingly infinite structure to a buildingblock motif together with two boundary conditions, the stacking distance H_{BB} and the number of necessary stacks M. As these two values are determined directly from either the PDDF or the scattering data, the modelling occurs within the single buildingblock volume. We thus represent the buildingblock motif by a dummyatom configuration X such that the scattering intensity I_{calc}(q_{m}) (where q_{m} denotes the q value of the measured data points) can be calculated using equation (2). To find a configuration that best fits the experimental scattering intensity I_{exp}(q_{m}), the relation
needs to be minimized, where σ_{exp}(q_{m}) denotes the experimental error and N_{exp} the number of data points.
In DA modelling, the principle idea behind this minimization procedure is straightforward: a given initial configuration is gradually altered until the best agreement between the corresponding scattering curves is found. Here, we randomly fill an initial buildingblock volume with DAs that are set free at the start of the fitting procedure (500–1500 DAs per building block are suggested, depending on the experimental resolution – see Section S2). In order to optimize the target function χ^{2}, various metaheuristic methods exist such as Simulated Annealing (Kirkpatrick et al., 1983) or the (Mitchell, 1996). In our case, we employ a metaheuristic fitting procedure (Boussaïd et al., 2013; Gogna & Tayal, 2013) with an antifragile implementation (Taleb & Douady, 2013) and the algorithm improves χ^{2} by randomly moving single DAs. In contrast with current DA modelling programs, the DAs do not move on an underlying grid. As the magnitude of the movement is scaled by a temperature factor, we force the system to freeze eventually in a given condition and the algorithm to converge.
DA modelling faces the general problem of uniqueness: as models consist of >10^{3} DAs, the information content given by the scattering data is highly overdetermined by a given configuration. Fitting of the scattering data without restraints can therefore lead to physically unfeasible results, in particular with regard to the model and compactness. Current algorithms control and optimize these two properties during the fitting process by means of a `looseness penalty' regularization term as a quantitative measure of the DAs' local vicinity: by counting and maximizing the number of contacting neighbours of each DA, a compact and homogeneous configuration is achieved (Svergun, 1999; Franke & Svergun, 2009; Koutsioubas et al., 2016). We adapt this proven technique to our gridfree algorithm in two ways. First, we introduce the parameter d_{N12}, denoting the distance between a given DA and its 12 (closepacked limit) nearest neighbours. Similar to the looseness penalty, d_{N12} quantifies the local vicinity of each DA, acting as a classifier for the algorithm to decide if a given DA position is accepted or not. Second, we apply the idea of a regularization term and introduce a `radial compactness' parameter RC(X) into the minimization procedure, keeping the DA close to the (radial) centre of mass of the configuration.
As the choice of a gridfree DA algorithm is a departure from current implementations, we face challenges for which no readily available solutions exist. In particular, we must introduce new concepts, (i) to obtain a hardcontact limit for neighbouring DAs, (ii) to allow model scalability over different length scales and (iii) to generate a random movement depending on the current annealing temperature. In the following subsections we discuss these topics in more detail, starting with the introduction of the scaling parameter D_{X} and the randommovement generator, and then moving to the mathematical definitions of d_{N12} and RC(X) regarding and compactness. A general overview of these concepts and their parameters is given in Table 1. At the end of this section, we further address the topic of model uniqueness and repeatability and give a conclusive overview of the algorithm's implementation in the program SasHel.

4.1. Scalability – scaling parameter D_{X}
Allowing DAs to move randomly without an underlying grid brings an advantage in terms of the model volume and therefore the radial size. In a fixed grid system, the final DA density is predefined by the grid resolution, which is related to the resolution of the scattering pattern in N_{DA} ≃ D_{X}^{ 2} for radial growth). Radial compression, on the other hand, is limited by the grid resolution, such that at a certain point the grid type (e.g. hexagonal packing of the DAs) may impose artificial structural features. This is particularly true for the case of highly elongated structures, where the grid resolution is likely to be more coarse than the experimental limit as a reasonable computational overhead (N_{DA} < 10 000) must be maintained. Our gridfree approach does not present such limitations: as the number of DAs per building block (mainly determined by the experimental resolution) is kept constant, the algorithm on its own adapts to a change in DA density while at the same time maintaining the relative model resolution and computational resources (see Section S2.2 for a discussion of model resolution). This in particular benefits users who have chosen the wrong radial size for the starting configuration, as the algorithm adapts according to the scattering curve without computational drawbacks. We exploit this adaptive potential of our algorithm using an estimated diameter D_{X} of a given configuration X, which acts as a scaling parameter throughout the algorithm.
In the case of reconstruction of a highly elongated structure, radial expansion of the DA configuration during the fitting process is only possible by the addition of new DAs, hence increasing the numerical overhead (We quantitatively track the growing or shrinking of the DA configuration X throughout the fitting process via the R_{G,X}. For elongated cases, we assume a cylindrical reference geometry, such that we estimate D_{X} after every DA movement according to
Along the z axis we introduce a continuity condition (see the limited buildingblock height in Fig. 1) such that DAs leaving the building block in the vertical direction are reprojected back inside the buildingblock volume.
4.2. Randommovement generator
As we pursue the concept of gridfree random DA movements to find an optimal configuration, we require a randommovement generator that depends on a series of parameters, including (i) the dimensions of the current DA configuration (H_{BB} and D_{X}), (ii) the current temperature T of the fitting process () and (iii) the helical of the sample (helical field bias γ). In every numerical iteration throughout the fitting process, a chosen DA is moved along the unit axes e_{x}, e_{y} and e_{z} by the random vector r_{rand}(T), where
Here, a random number generator Rand returns any value between −1 Rand_{x,y,z} 1 every time it is called. In order to adjust the random movement by the size of the current configuration, the radial and longitudal contributions are scaled by the current diameter D_{X} or the buildingblock height H_{BB}, respectively (movement along the z direction is scaled by α to ensure axial with a standard value of α = 0.1). Further, each movement along the x, y or z direction is scaled by the current temperature T. In the case of radial movement along the x or y direction, we added a helical bias term (depending on the DA's current e_{z} position z_{i}; the buildingblock motif extends from z = 0 to z = H_{BB}). This bias term is scaled by the helical field bias γ (user defined with values of the order of 0 γ 1, default value 0.3) as well as by the current temperature T, such that it favours counterclockwise solutions motivated by the first structural model of DNA (Watson & Crick, 1953). It is important to note that, mathematically, the helical bias term correlates quadratically with the current temperature T. Therefore, its contribution is only significant in the early stages of the fitting process (see Fig. S11 for its influence on the reconstruction) such that, using its default value of 0.3, it does not influence the modelling of nonhelical systems (Fig. S15) but facilitates the reconstruction of highaspect helical motifs.
In the case of fitting globular structures (M = 1), the randommovement vector r_{rand}(T) is significantly simplified according to
where the current diameter D_{X} is now evaluated over all three directions, such that equation (4) now also includes the contribution of the DAs' current z positions z_{i}^{2}.
4.3. – the d_{N12} parameter
The random nature of DA movements causes a side effect with regard to the uniqueness of the fitted model: as there are infinite possibilities for describing a given structure by randomly filling it with point scatterers, the terminology of a unique model is not reasonable in this context. However, certain criteria related to the
of the DA configuration make a fitted DA reconstruction, in a physical sense, very unlikely. These scenarios are (i) highdensity DA clusters and (ii) single disconnected DAs far away from the remaining configuration.We optimize the ; Franke & Svergun, 2009; Koutsioubas & Pérez, 2013), namely by evaluating the local vicinity around each DA. In these fixedgrid implementations, the distance between neighbouring DAs is known such that only the number of contacts needs to be counted. In our gridfree case, we use the same principle in an inverted manner: we assume an ideal closepacked condition with 12 neighbours and calculate their mean distance to the central DA, resulting in the parameter d_{N12}. In the extreme case of a DA within a highdensity cluster [scenario (i) above], d_{N12} will be very small. On the other hand, for a single freefloating DA that is far away from the core DA assembly [scenario (ii)], d_{N12} will be very large. The magnitude of d_{N12} is hence inversely proportional to the DA density around a given DA. Equally, the average d_{N12} over the full DA configuration d_{N12,X} denotes an inverse measure of the mean DA density of the configuration.
throughout the fitting process by analogy with the established looseness parameter in fixedgrid systems (Svergun, 1999We use the d_{N12} parameter for a twofold purpose. In order to avoid scenario (i), we do not allow DAs to come closer than 0.1〈d_{N12,X}〉. This acts as a hardcontact limit so that we circumvent DA clustering and therefore unfeasible singularities throughout the fitting process. In order to avoid scenario (ii), we repeatedly force DAs with d_{N12} 2〈d_{N12,X}〉 that are outside the mean radial distance 〈r〉_{X} to move towards the centre of mass of the closest fraction of DAs. This avoids free floating of a single DA and therefore forces the DAs to remain in a compact configuration. A detailed description of how the d_{N12} parameter is implemented in the fitting algorithm can be found in Section S3 and Fig. S27.
4.4. Compactness – the radial compactness parameter RC(X)
In order to prevent unphysical disassembly of the DA configuration during the fitting procedure, we retain the DAs close to the radial centre of mass (RCOM) of a given configuration. We achieve this by introducing a potential well such that the DAs are weighted as a function of their radial distance from the centre of mass r_{i}^{COM}. DAs inside the potential well should be unaffected, so that they move freely regardless of their position. Once DAs drift towards the well border, a force should push them back towards the centre of mass, hence avoiding radial disassembly. To find an adequate mathematical description of this potential field φ we specified the following criteria: (i) radial continuity to avoid nonlinearities; (ii) asymptotic behaviour, such that and ; (iii) scalability in magnitude (0 φ 1); and (iv) scalable potentialwell size and steepness. Based on these requirements (and inspired by the potential field for a Gaussian distribution of electrostatic charges; Schlick, 2010), we use the mathematical properties of the error function erf(x) to define a radialsymmetric potential well acting on each DA. As a result, we obtain the radial compactness parameter RC(X) according to
where r_{i}^{COM} represents the radial distance between the RCOM of configuration X and the ith DA (in the case of globular geometries, r_{i}^{COM} represents the distance between the centre of mass of configuration X and the ith DA). With regard to the specifications defined above, this formalism fulfils the abovementioned points (i)–(iv) by means of the scaling parameter D_{X} (which is evaluated after every DA movement). In particular, for point (iv), scalable potentialwell size and steepness, the potentialwell boundary (halfheight position) will be found at (r_{i}^{COM})/D_{X} = (π +1)/2π = 0.66, where its derivative is 1/π. In absolute terms, for a very compact structure such as a cylinder with a smooth surface or an infinitely thin cylindrical shell, RC(X) will be approximately 0.06 or 0.24, respectively. Each single atom moving further away from D_{X} will cause RC(X) to increase towards 1.
We account for this radial compactness throughout the fitting process by using RC(X) scaled by the compactness weight β as a regularization term. Thus, the algorithm in fact minimizes the function
instead of only χ^{2} alone.
As an ultimate radial boundary, similar to the searchvolume diameter in DAMMIN (Svergun, 1999), we apply a critical diameter D_{X,crit} = 2D_{X} such that single DAs moving far away from the RCOM are repeatedly forced into a more compact configuration (see Section S3 and Fig. S27 for a detailed description of how the radial compactness is implemented in the algorithm).
4.5. Model resolution and uniqueness
The most obvious visual side effect of using random movements is related to the outer surface of the fitted model. For example, if DAs can only move on an artificial grid, the fitted configuration of a globular particle will present a surface smoothness according to the lattice planes of the grid. As already discussed above, allowing random DA movements results in infinite possibilities for representing a given particle volume and thus also its surface. Consequently, the surface of a randommovement fitted configuration will be significantly rougher than a corresponding fixedgrid model. However, this increased surface roughness is just a visual symptom of the information content provided by the scattering curve, as we discuss in the following.
In absolute terms, we can expect to end up with a fitted configuration that presents structural features, both on the surface and within the volume, that are below the resolution limit of the experimental data (d_{min} = π/q_{max}, where q_{max} is the upper angular range of the experimental data). This implies that e.g. thin helical tapes require a large accessible angular range in order to be resolved properly. If this is not the case, the retrieved model is at high risk of being overinterpreted. On a less obvious note, a low angular resolution can further lead to artefacts within the configuration that might not be seen by a common surface representation (we address this topic in more detail in Section S2).
A common technique to avoid misinterpretation of such artefacts is to test the reproducibility of the reconstruction (Volkov & Svergun, 2003). This approach brings a series of advantages. First, the overall stability and reliability of the reconstruction from a given data set are assessed. Second, a consecutive averaging process of all reconstructions projects the DAs onto an artificial occupancyweighted grid, which provides a straightforward procedure to determine DA validity and volume inhomogeneity. Third, this occupancy map helps to identify structural artefacts in single reconstructions caused by insufficient information content of the scattering data, thus avoiding over interpretation. Fourth, the reconstruction of an arbitrary shape from scattering data is a (highly) underdetermined problem so it is not guaranteed to obtain a unique result from a given fitting algorithm. Performing a reproducibility analysis when reconstructing a structural motif from experimental scattering data, in particular when the information content and validity of the data are questionable, is hence highly recommended. In quantitative terms, the reproducibility of a reconstruction may be judged by the mean normalized spatial discrepancy 〈NSD〉 of parallel reconstructions, which represents an measure of dissimilarity (〈NSD〉 = 0 for identical models) for the independent runs (Volkov & Svergun, 2003).
4.6. Implementation
We implemented the fitting algorithm in the computer program SasHel, a Qt graphical user interface (GUI) written in C++ that allows user interaction. When starting a fitting procedure, the program generates a random cylindrical starting configuration according to a userdefined diameter, the buildingblock stacking distance and the number of DAs per building block. Once started, the fitting algorithm undergoes N_{k} iterations: starting from a temperature T_{0}, the system cools down as defined by the quenching coefficient q_{T} (0 q_{T} 1) such that the current temperature at a given iteration k is T_{k} = T_{0} q_{T}^{ k} (we recommend the default values N_{k} = 100, T_{0} = 1 and q_{T} = 0.99 as an initial set of parameters for convergence). In each k iteration, all the DAs are randomly moved using the randommovement generator (see Section 4.2) under the restraints explained in Section 4.3, such that a movement is only accepted if (i) the new DA position complies with the hardcontact limit 0.1〈d_{N12,X}〉 (if not, up to 100 new movements are considered) and (ii) an improvement in the function f(X) is found [see equation (8)]. After each k iteration the sequence of DAs is randomly mixed to avoid sequential biasing of the algorithm.
Throughout the fitting procedure, we pursue the concept of antifragility (Taleb & Douady, 2013), a concept applicable to metaheuristic optimization (Boussaïd et al., 2013; Gogna & Tayal, 2013): the converging system is repeatedly forced out of its local minimum such that a global minimum is more likely to be found. These nonoptimal moves are forced onto the configuration without consideration of the damage caused to the model, making the algorithm less prone to distortion of the solution space by regularization terms. A summary of the full algorithm, including an example implementation in pseudocode, can be found in Section S3 and Fig. S27.
The program SasHel further includes a `parallel mode' that runs a unique reconstruction on each available CPU core, so the model's validity and reproducibility can easily be tested using e.g. DAMAVER (Volkov & Svergun, 2003).
5. Model examples
To test the implemented algorithm, we simulated a number of scattering intensities from seemingly endless bodies (see Appendix A for detailed dimensions). All reconstructions were performed using the default fitting parameters as follows: starting temperature T_{0} = 1, quenching coefficient q_{T} = 0.99, number of iterations N_{k} = 200, helical bias parameter γ = 0.3, compactness weight β = 1, DA form factor diameter D_{DA} = 0.2 nm and number of stacked building blocks M = 10. The dimensions of the initial random configuration, including the stacking distance of the building blocks, were determined from the relative dPDDFs (see Figs. S12 and S13). For all reconstructions, we used 500–800 DAs per building block, resulting in computation times for each run between 20 and 60 min on a standard workstation, respectively.
The scattering intensities of the theoretical models are shown in Fig. 2, along with the fits from the reconstructions. For all geometries used, we find agreement between the theoretical and fitted scattering intensities (the χ^{2} value of all fits is below the threshold of 0.1). Yet, cases A–D show slight oscillations in the higherq regime 1 q 2 nm^{−1} (see Fig. S14), which is a caused by the stacking nature of identical building blocks (see Section S2.3).
Fig. 3 presents the corresponding threedimensional models and reconstructions (see Fig. S15 for point representations). Models A and B show a circular in agreement with the model, while in the case of the cylindrical shell the empty core is present. Further, the rectangular of model C is visible in the reconstructed model. However, the sharp corners are not fully resolved, which is most likely the effect of insufficient resolution from the scattering data (d_{res} ≃ 1.6 nm compared with the rectangular a × b = 4 × 20 nm). In cases D–G, the helical fingerprints (single or doublestranded nature) are well resolved in the reconstructed models. Cases D and E, and F and G, present noticeable differences in their cross sections, helping to distinguish between a helical filament (empty core) and a helical tape (filled core). Further, variations in the thickness of the helical strands can be observed. The overall agreement between model and reconstruction for the main features demonstrates the functionality of the algorithm for similar seemingly endless bodies.
In order to evaluate the stability and reproducibility of the reconstruction algorithm, eight independent runs for each model geometry were analysed using DAMAVER according to the literature (Volkov & Svergun, 2003). Accordingly, we obtained the mean normalized spatial discrepancy 〈NSD〉 as a measure of the similarity of independent reconstructions of each model (〈NSD〉 = 0 for identical configurations). As shown in Fig. 3, the 〈NSD〉 of models A–G gradually increases from 1.01 to 1.37 with the complexity of the model. On an absolute scale, these 〈NSD〉 values are higher than those in the literature (Volkov & Svergun, 2003), where stable reconstructions were linked to an 〈NSD〉 of 0.4–0.7. The reason for this difference is found in the gridfree nature of our approach: NSD values for gridfree programs such as GASBOR are generally higher (Svergun et al., 2001). To validate our findings, we constructed eight randomly filled artificial cylinders and singlestrand helices (N_{DA} = 700), yielding 〈NSD〉 values of 1.03 ± 0.01 and 1.06 ± 0.01, respectively. These values, in context with the 〈NSD〉 of the above reconstructions, provide a reference for gridfree models where an 〈NSD〉 between 1 and 1.4 is observed.
It is also possible to run the fitting algorithm using only M = 1 stacks, which corresponds to the case of the building block alone. Thus, the same program can also be used to fit globular particles. We tested this option in a similar way to the method outlined above by simulating a number of scattering intensities from globular bodies (see Appendix A for detailed dimensions). For all reconstructions, we used the same default fitting parameters as mentioned previously. The dimensions of the initial random configurations were determined from the maximum dimension found in the relative PDDFs (see Fig. S12). However, we increased the number of DAs to 800–1200, now resulting in computation times between 5 and 10 min per run on a standard workstation, respectively.
The scattering intensities of the theoretical models and the fits from the reconstructions are shown in Fig. 4(a). For all geometries used, we find agreement between the theoretical and fitted scattering intensities (the χ^{2} value of all fits is below the threshold of 0.1). Evidently, the oscillations previously found for the repeating motif in the higherangle regime (see Fig. 2) are not present. The threedimensional models and reconstructions corresponding to the scattering patterns are shown in Fig. 4(b) (see Fig. S16 for point representations). Also in this case, the fitted morphologies clearly represent the corresponding models, including the hollow geometries I and L. The 〈NSD〉 values for eight independent reconstructions of each geometry are within the range 0.96–1.28, as denoted in Fig. 4(b) [in relative terms, the 〈NSD〉 values for eight randomly filled artificial spheres and cubes (N_{BB} = 1500) were 1.04 ± 0.01 and 1.06 ± 0.01, respectively]. These findings validate the use of the algorithm for globular bodies as well.
6. Experimental examples
As final example, we applied the described reconstruction procedure to experimental data for a selfassembled peptide doublestrand helix (Kornmueller et al., 2015). Prior to the fitting process, we determined a buildingblock stack spacing of 53 nm from the corresponding PDDF (see inset in Fig. 5 and Fig. S13). The scattering data were then fitted using 800 DAs over the angular range 0.08 q 2.14 nm^{−1}, resulting in a realspace resolution of approximately π/q_{max} = 1.5 nm, where the default fitting parameters according to Section 4 were used.
As shown in Fig. 5(a), we again find agreement between the experimental and fitted scattering curves (χ^{2} = 0.08). The corresponding realspace reconstruction (Fig. 5b) presents two independent tapes within the building block (see Fig. S17a for point representations). However, the two tapes do not appear to be symmetric along the z direction, suggesting a displacement angle of φ 180° between them. The of the helical tapes presents a rather rough surface that does not allow more detailed interpretation, as is typical for such randommovement DA models. Nevertheless, a comparison of the reconstruction with the model according to the previously published dimensions (Kornmueller et al., 2015) gives good agreement (see the red model in Fig. 5c). A movie of the rotating helices can be found in the supporting information. To obtain a measure of the uniqueness of the final model, we repeated the fitting procedure to end up with 16 independent reconstructions. The average of all 16 models (Volkov & Svergun, 2003), as shown in Fig. S18(a), is consistent with the single reconstruction. In agreement with the reference values shown in Fig. 3, the 〈NSD〉 was found to be 1.27 ± 0.02, hence confirming the reproducibility of the reconstruction.
Analogous to the above, we further applied the reconstruction procedure to experimental data for the globular protein alcohol dehydrogenase 1 (ADH) in phosphatebuffered saline at pH 7.5. The data set was taken from the SASBDB database (Valentini et al., 2015; date of download 12 December 2016), corresponding to the identifier SASDA52. Again, we first determined the size of the initial random configuration (d = 9 nm) from the PDDF, as seen on the right of Fig. 5. We then fitted the scattering data over the angular range 0.13 q 6 nm^{−1}, corresponding to a realspace resolution of approximately 0.5 nm. The default fitting parameters (M = 1) according to Section 4 were used, but this time the number of DAs was increased to 1500, due to the outstanding angular range.
Also in this case, the experimental data were fully fitted throughout the reconstruction procedure (χ^{2} = 0.24; see Fig. 5c). Interestingly, the corresponding realspace model (Fig. 5d) presents a characteristic triangular that is recurrent from different perspectives (see Fig. S17b for point representations). We thus compared the reconstruction with the ADH found in the literature (Raj et al., 2014), and an overlay of the two models can be seen in Fig. 5(d). Undoubtedly, the reconstructed model is qualitatively in agreement with the molecular structure; quantitatively, we find an NSD between the and the reconstruction of 0.93. Also in this case, we performed a total of 16 independent reconstructions (〈NSD〉 of 1.20 ± 0.02), where the averaged representation is consistent with the molecular structure (see Fig. S18b).
This, in congruence with the previous example, demonstrates the applicability of the proposed fitting algorithm for the realspace reconstruction of seemingly infinite and globular geometries from experimental smallangle scattering data.
7. Conclusions
In conclusion, we have presented a new method for the reconstruction of the structural motif of seemingly endless rodlike systems, such as helices. In this regard, the following points were critical:
(i) We optimized the numerical complexity of the Debye formula for the unique case of recurrent symmetry along the elongation direction, resulting in the proposed projection scheme.
(ii) Based on this projection scheme, we developed a metaheuristic fitting algorithm. Instead of minimizing a variety of structural penalties (Franke & Svergun, 2009), the system is repeatedly forced out of the current numerical equilibrium by forcing the DA to move, regardless of the damage caused.
(iii) The algorithm is implemented in the multiplatform compatible graphical computer program SasHel, which allows live tracking of the fitting progress in real and and encourages user interaction.
We have demonstrated the functionality and reliability of the presented method using a variety of analytical and experimental examples. These showcases provide a comprehensive reference for future users. We have further addressed and discussed the risks of wrongly chosen fitting parameters or insufficient data quality: we have illustrated a series of negative examples to discuss which reconstructed features might be true or not (see Section S2 and Figs. S7–S11 and S19–S26). SasHel, the computer program corresponding to this work, also includes an option for shape reconstruction of globular particles. This, in congruence with its originally intended use, makes the program applicable over a wide range of SAXSbased structural studies, expanding the scope of currently available dummyatom modelling software.
The program SasHel is freely available for academic use. The most uptodate version can be obtained from https://sashel.tugraz.at or upon request from the authors.
8. Related literature
The following references are cited in the supporting information: Damaschun et al. (1968); Konarev & Svergun (2015); Shannon & Weaver (1949); Taupin & Luzzati (1982).
APPENDIX A
Model calculations
All scattering patterns shown throughout this work were calculated according to the literature, with the exact dimensions and references shown in Table 2. The error band of each data set was estimated according to σ(q) = 0.2(cm^{−1/2})[I(q)]^{1/2} (see Section S2 and Figs. S19–S23 for test cases with noisy data). For models A–L we used a total of 124 data points in the angular range 0.01 q 2 nm^{−1}. In case of seemingly endless geometries, we further ensured that the stacking distance H_{BB} was resolvable in the lower angular limit (H_{BB} π/q_{min}). All PDDFs throughout this work were calculated using the GIFT software package (Bergmann et al., 2000).
All reconstructions were performed on a `standard work station' from Hewlett–Packard, using an Intel Core i7 4800MQ processor with four cores, each operating at 2.7 GHz.
Supporting information
Additional discussion and figures. DOI: https://doi.org/10.1107/S2052252518005493/tj5016sup1.pdf
Supporting movie: rotation of the reconstruction from experimental data as shown in Fig. 5(b). DOI: https://doi.org/10.1107/S2052252518005493/tj5016sup2.avi
Acknowledgements
We thank K. Kornmüller and R. Prassl for providing us with the experimental scattering data for the selfassembled peptide double helix. We further thank M. DeMarco for the first core implementation of the algorithm and for consultation throughout the program development, and M. Kriechbaum for revision of the manuscript.
Funding information
The following funding is acknowledged: Horizon 2020 Framework Programme, H2020 Research Infrastructures (grant No. 654360 NFFAEurope).
References
Bergen, M. von, Friedhoff, P., Biernat, J., Heberle, J., Mandelkow, E. M. & Mandelkow, E. (2000). Proc. Natl Acad. Sci. USA, 97, 5129–5134. Google Scholar
Bergmann, A., Fritz, G. & Glatter, O. (2000). J. Appl. Cryst. 33, 1212–1216. Web of Science CrossRef CAS IUCr Journals Google Scholar
Boussaïd, I., Lepagnot, J. & Siarry, P. (2013). Inf. Sci. 237, 82–117. Google Scholar
Busseron, E., Ruff, Y., Moulin, E. & Giuseppone, N. (2013). Nanoscale, 5, 7098–7140. Web of Science CrossRef CAS Google Scholar
Chacón, P., Morán, F., Díaz, J. F., Pantos, E. & Andreu, J. M. (1998). Biophys. J. 74, 2760–2775. Web of Science CAS PubMed Google Scholar
Damaschun, G., Müller, J. J. & Pürschel, H. V. (1968). Monatsh. Chem. 99, 2343–2348. CrossRef CAS Web of Science Google Scholar
Debye, P. (1915). Ann. Phys. 351, 809–823. CrossRef Google Scholar
Dobbs, M. B. (2007). Clin. Orthop. Relat. Res. 462, 2. Web of Science CrossRef Google Scholar
Dobson, C. M. (2003). Nature, 426, 884–890. Web of Science CrossRef CAS Google Scholar
Feigin, L. A. & Svergun, D. I. (1987). Structure Analysis by SmallAngle Xray and Neutron Scattering. New York: Plenum Press. Google Scholar
Franke, D. & Svergun, D. I. (2009). J. Appl. Cryst. 42, 342–346. Web of Science CrossRef CAS IUCr Journals Google Scholar
Fraser, R. D. B., MacRae, T. P. & Suzuki, E. (1978). J. Appl. Cryst. 11, 693–694. CrossRef CAS IUCr Journals Web of Science Google Scholar
Gingras, A. R., Bate, N., Goult, B. T., Hazelwood, L., Canestrelli, I., Grossmann, J. G., Liu, H., Putz, N. S. M., Roberts, G. C. K., Volkmann, N., Hanein, D., Barsukov, I. L. & Critchley, D. R. (2008). EMBO J. 27, 458–469. Web of Science CrossRef CAS Google Scholar
Glatter, O. (1980). Acta Phys. Austriaca, 52, 243–256. Google Scholar
Glatter, O. & Kratky, O. (1982). Small Angle Xray Scattering. London: Academic Press. Google Scholar
Gogna, A. & Tayal, A. (2013). J. Exp. Theor. Artif. Intell. 25, 503–526. Web of Science CrossRef Google Scholar
Grudinin, S., Garkavenko, M. & Kazennov, A. (2017). Acta Cryst. D73, 449–464. Web of Science CrossRef IUCr Journals Google Scholar
Guinier, A. & Fournet, G. (1955). SmallAngle Scattering of Xrays. New York: John Wiley and Sons Inc. Google Scholar
Hamley, I. W. (2008). Macromolecules, 41, 8948–8950. Web of Science CrossRef CAS Google Scholar
Hammouda, B. (2010). J. Appl. Cryst. 43, 1474–1478. Web of Science CrossRef CAS IUCr Journals Google Scholar
Kawaguchi, T. (2001). J. Appl. Cryst. 34, 580–584. Web of Science CrossRef CAS IUCr Journals Google Scholar
Kirkpatrick, S., Gelatt, C. D. & Vecchi, M. P. (1983). Science, 220, 671–680. CrossRef PubMed CAS Web of Science Google Scholar
Konarev, P. V. & Svergun, D. I. (2015). IUCrJ, 2, 352–360. Web of Science CrossRef CAS PubMed IUCr Journals Google Scholar
Kornmueller, K., LetofskyPapst, I., Gradauer, K., Mikl, C., CachoNerin, F., Leypold, M., Keller, W., Leitinger, G., Amenitsch, H. & Prassl, R. (2015). Nano Res. 8, 1822–1833. Web of Science CrossRef CAS Google Scholar
Koutsioubas, A., Jaksch, S. & Pérez, J. (2016). J. Appl. Cryst. 49, 690–695. Web of Science CrossRef CAS IUCr Journals Google Scholar
Koutsioubas, A. & Pérez, J. (2013). J. Appl. Cryst. 46, 1884–1888. Web of Science CrossRef CAS IUCr Journals Google Scholar
Li, T., Senesi, A. J. & Lee, B. (2016). Chem. Rev. 116, 11128–11180. Web of Science CrossRef CAS PubMed Google Scholar
Mitchell, M. (1996). Comput. Math. Appl. 32, 133. Google Scholar
MorozovaRoche, L. & Malisauskas, M. (2007). Curr. Med. Chem. 14, 1221–1230. CAS Google Scholar
Palmer, L. C. & Stupp, S. I. (2008). Acc. Chem. Res. 41, 1674–1684. Web of Science CrossRef CAS Google Scholar
Pedersen, J. S. (1997). Adv. Colloid Interface Sci. 70, 171–210. CrossRef CAS Web of Science Google Scholar
Praetorius, F. & Dietz, H. (2017). Science, 355, eaam5488. Web of Science CrossRef Google Scholar
Pringle, O. A. & Schmidt, P. W. (1971). J. Appl. Cryst. 4, 290–293. CrossRef CAS IUCr Journals Web of Science Google Scholar
Raj, S. B., Ramaswamy, S. & Plapp, B. V. (2014). Biochemistry, 53, 5791–5803. Web of Science CrossRef CAS PubMed Google Scholar
Schlick, T. (2010). Molecular Modeling and Simulation: An Interdisciplinary Guide. New York: Springer New York. Google Scholar
Schmidt, P. W. (1970). J. Appl. Cryst. 3, 257–264. CrossRef CAS IUCr Journals Web of Science Google Scholar
Shannon, C. E. & Weaver, W. (1949). The Mathematical Theory of Communication. Urbana: University of Illinois Press. Google Scholar
Svergun, D. I. (1999). Biophys. J. 76, 2879–2886. Web of Science CrossRef PubMed CAS Google Scholar
Svergun, D., Barberato, C. & Koch, M. H. J. (1995). J. Appl. Cryst. 28, 768–773. CrossRef CAS Web of Science IUCr Journals Google Scholar
Svergun, D. I., Petoukhov, M. V. & Koch, M. H. (2001). Biophys. J. 80, 2946–2953. Web of Science CrossRef PubMed CAS Google Scholar
Taleb, N. N. & Douady, R. (2013). Quant. Finance, 13, 1677–1689. Web of Science CrossRef Google Scholar
Taupin, D. & Luzzati, V. (1982). J. Appl. Cryst. 15, 289–300. CrossRef CAS Web of Science IUCr Journals Google Scholar
Teixeira, C. V., Amenitsch, H., Fukushima, T., Hill, J. P., Jin, W., Aida, T., Hotokka, M. & Lindén, M. (2010). J. Appl. Cryst. 43, 850–857. Web of Science CrossRef CAS IUCr Journals Google Scholar
Valentini, E., Kikhney, A. G., Previtali, G., Jeffries, C. M. & Svergun, D. I. (2015). Nucleic Acids Res. 43, D357–D363. Web of Science CrossRef CAS PubMed Google Scholar
Venter, J. C. et al. (2001). Science, 291, 1304–1351. Web of Science CrossRef PubMed CAS Google Scholar
Vestergaard, B., Groenning, M., Roessle, M., Kastrup, J. S., van de Weert, M., Flink, J. M., Frokjaer, S., Gajhede, M. & Svergun, D. I. (2007). PLoS Biol. 5, e134. Web of Science CrossRef PubMed Google Scholar
Volkov, V. V. & Svergun, D. I. (2003). J. Appl. Cryst. 36, 860–864. Web of Science CrossRef CAS IUCr Journals Google Scholar
Walther, D., Cohen, F. E. & Doniach, S. (2000). J. Appl. Cryst. 33, 350–363. Web of Science CrossRef CAS IUCr Journals Google Scholar
Wasielewski, M. R. (2009). Acc. Chem. Res. 42, 1910–1921. Web of Science CrossRef CAS Google Scholar
Watson, J. D. & Crick, F. H. C. (1953). Nature, 171, 737–738. CrossRef PubMed CAS Web of Science Google Scholar
Wilkins, M. H. F., Stokes, A. R. & Wilson, H. R. (1953). Nature, 171, 738–740. CrossRef CAS Web of Science Google Scholar
Yang, S. (2014). Adv. Mater. 26, 7902–7910. Web of Science CrossRef CAS PubMed Google Scholar
This is an openaccess article distributed under the terms of the Creative Commons Attribution (CCBY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.