Dummy-atom modelling of stacked and helical nanostructures from solution scattering data

A bead-modelling approach is presented to determine the structural motifs of helical and rod-like systems from small-angle solution scattering data. The implemented algorithm is verified using analytical models and is further applied to reconstruct from experimental scattering data the building block of a self-assembled peptide double helix.

In order to investigate whether the PDDF can be obtained from scattering data even if $D_{max} > \pi/q_{min}$, we calculated the scattering pattern of 63 concentric stacked rings (see model L in Table 2) using a d-spacing of 7 nm between them, which results in a total height of 439 nm. We then obtained the PDDF from this data set using an increasing low-q Fourier limit, such that the resolvable limit decreases simultaneously. The resulting curves, as well as the dPDDFs with the corresponding fits, are shown in Supporting Figure S3.
Of course, the low-q transition from the $q^{0}$ Guinier to the rod-like $q^{-1}$ Porod regime in the PDDF fits of the scattering patterns occurs earlier the more the scattering curve is truncated (see Supporting Figure S2, top). However, the resulting PDDFs and dPDDFs remain congruent, as only the number of stacking-induced oscillations decreases due to the artificial truncation of the low-q range and, consequently, $D_{max} = \pi/q_{min}$. Further, and most important for the scope of this work, the corresponding fits of the dPDDF yield the right stacking-distance with an approximate error of < 2%. A model reconstruction from the used scattering pattern can be found in Supporting Figure S5.

S1.3. Limitations and error of misdetermination
Of course, the determination of the stacking-distance from the dPDDF relies on the fact that the stacking-induced oscillations can be separated from the single-building-block and the cross-sectional fingerprints. In the case of helical structures, there exist two scenarios where this might not be the case: extremely high (case one) and extremely low (case two) building-block aspect ratios.
Moreover, in case three, the effect of a wrongly determined stacking-distance on the reconstruction is discussed.
For case one, if the helix is pulled apart, the stacking-distance becomes significantly larger than the helix diameter: the actual twisted motif becomes weaker and less influential until, eventually, the helix is completely straight. Hence, the cross-sectional fingerprint dominates the PDDF while the stacking-induced oscillations disappear. To investigate the possible error resulting from this effect, we determined the stacking-distance from the dPDDF of helices according to model D in Figures 2 and 3 with increasing pitch, ranging from 10 to 100 nm (in all cases with a diameter of 20 nm). The corresponding scattering curves, PDDFs, dPDDFs and fits can be found in Supporting Figure S5. For all cases with a stacking-distance < 70 nm, the error of misdetermination was less than 2%, while this error systematically rises up to 6% for larger aspect ratios.
The other scenario (case two) in which the stacking-induced oscillations might be influenced by the building-block is the opposite case of a highly compressed helix. In these cases, the stacking-distance is smaller than the helix diameter, hence convoluting the first 1-2 high-r oscillations. This can cause misinterpretation if these peaks are not excluded when fitting the damped sine function to retrieve the stacking-distance. As an example, we fitted the dPDDF of the helix in model D in Figures 2 and 3 with a diameter of 20 nm and a pitch of 10 nm. As seen in Supporting Figure S6, the determined stacking-distance changes from 9.83 to 9.64 nm if the first peak is considered, thus leading to an increased error.
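The effect of excluding the first peaks can be illustrated with a minimal sketch of a damped-sine fit. The functional form (an exponentially damped sine with free phase), the grid-search strategy, the synthetic data and all names (`fit_damped_sine`, `r_min`, `tau`) are assumptions made for illustration; they are not taken from the SasHel implementation.

```python
import math

def fit_damped_sine(r, y, periods, decays):
    """Fit y(r) ~ exp(-r/tau) * (a*sin(2*pi*r/d) + b*cos(2*pi*r/d)).

    The period d and decay length tau are scanned on a grid; for each pair
    the amplitudes a and b follow from linear least squares.
    Returns the (d, tau) pair with the smallest sum of squared residuals.
    """
    best = None
    for d in periods:
        for tau in decays:
            s = [math.exp(-ri / tau) * math.sin(2 * math.pi * ri / d) for ri in r]
            c = [math.exp(-ri / tau) * math.cos(2 * math.pi * ri / d) for ri in r]
            # Normal equations of the two-parameter linear problem.
            ss = sum(v * v for v in s)
            cc = sum(v * v for v in c)
            sc = sum(u * v for u, v in zip(s, c))
            sy = sum(u * v for u, v in zip(s, y))
            cy = sum(u * v for u, v in zip(c, y))
            det = ss * cc - sc * sc
            if abs(det) < 1e-12:
                continue
            a = (sy * cc - cy * sc) / det
            b = (cy * ss - sy * sc) / det
            sse = sum((yi - a * si - b * ci) ** 2 for yi, si, ci in zip(y, s, c))
            if best is None or sse < best[0]:
                best = (sse, d, tau)
    return best[1], best[2]

# Synthetic, noise-free "dPDDF" with a stacking-distance of 10 nm (assumed values).
TRUE_D, TRUE_TAU, PHASE = 10.0, 40.0, 0.3
r_all = [float(i) for i in range(100)]  # nm
y_all = [math.exp(-ri / TRUE_TAU) * math.sin(2 * math.pi * ri / TRUE_D + PHASE)
         for ri in r_all]

# Exclude the region below r_min, where the building-block fingerprint
# would overlap with the stacking-induced oscillations.
r_min = 15.0
r_fit = [ri for ri in r_all if ri >= r_min]
y_fit = [yi for ri, yi in zip(r_all, y_all) if ri >= r_min]

periods = [8.0 + 0.02 * k for k in range(201)]    # 8 ... 12 nm
decays = [10.0 * k for k in range(1, 11)]         # 10 ... 100 nm
d_fit, tau_fit = fit_damped_sine(r_fit, y_fit, periods, decays)
print(f"fitted stacking-distance: {d_fit:.2f} nm (decay length {tau_fit:.0f} nm)")
```

With real data, `r_min` would be chosen by inspecting the dPDDF, and the grid could be replaced by any non-linear least-squares routine.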
However (case three), there always exists the possibility of choosing a slightly wrong building-block stacking-distance, e.g. when noisy experimental data is used (see Supporting Information section S2 for details). Hence, the question arises to which extent the reconstruction remains stable. We thus took the scattering patterns of models D and F (pitch of 50 nm) and reconstructed the building-block motif using a series of wrong stacking-distances, ranging from 30 to 70 nm. Figure S7). Similarly, the corresponding reconstructions present significant artefacts both in the lateral and cross-sectional perspective. As one would expect, these deviations in real and reciprocal space become smaller the closer the used stacking-distance comes to the real dimension (50 nm). Even though both the fitted scattering curves and the reconstructions of the cases using 45 and 55 nm still show deviations from the ideal case, the motif of the structure, a continuous single-strand helix, remains preserved.
In the case of model F, the fitted scattering patterns are in good agreement with the analytical model over the full q-range. Even in the low-q range, no particular deviations can be found (see Supporting Figure S9). Interestingly, the reconstructions in all cases present the typical double-strand motif, even in the cases of 30 and 70 nm (model pitch of 50 nm). However, particularly these two cases present strong deviations from the ideal shape with regard to the continuity of the helical strands. It is further noteworthy that in none of the reconstructions of model F is one helical strand more densely populated than the other, as one would expect if a helical motif were forced onto the system. The helical bias term (set to 0.3 for all reconstructions shown in this work) hence does not influence the final reconstruction.
These illustrative examples suggest that the algorithm presented in this work supplies stable reconstructions even if the stacking-distance of the building-blocks is determined with a relative error of less than 10%.

S2. Additional considerations on "model resolution"
The term "model resolution" used in this work relates to two major aspects: the resolution limited by the information content provided by the scattering curve, and the resolution relating to the number of dummy atoms (DAs) used. While these aspects overlap to some extent, we briefly address them separately to raise awareness and prevent model over- and misinterpretation.

S2.1. Angular information content
A quantitative estimate of the information content accessible from scattering data is the number of Shannon channels $N_s$. In the case of globular particles, this number is defined by the largest accessible scattering angle $q_{max}$ and the maximum real-space dimension $D_{max}$ by $N_s = q_{max} D_{max}/\pi$ (Shannon & Weaver, 1949; Damaschun et al., 1968; Taupin & Luzzati, 1982). Nevertheless, for helical structures, $D_{max}$ is seemingly infinite, which makes this formalism impractical.
In a similar but more general formalism, $q_{max}$ can also be used to estimate the minimal sampling distance in real space. This sampling distance hence defines, mathematically speaking, the real-space resolution corresponding to the scattering curve, such that the smallest resolvable feature is $> \pi/q_{max}$ (Glatter & Kratky, 1982; Feigin & Svergun, 1987). In the case of scattering from a helical structure, the smallest structural motif is the width of the helical tape cross-section. Thus, the question arises to which extent the reconstruction of this motif is affected by an insufficient angular range and therefore real-space resolution.
To visually investigate this effect, we repeated the reconstructions of models D and F in Figures 2 and 3, using scattering curves with a decreased and an increased angular range. The resulting models and the corresponding curves are shown in Supporting Figures S19 and S20. In the case of model D, all three reconstructions present the characteristic helical motif. However, the quality of the reconstructed tape as well as the overall radial cross-section is evidently better with increasing q-range. In the case of model F, all three reconstructions likewise present the characteristic double-strand motif, and again the quality of the reconstructed strands as well as the overall radial cross-section is better with increasing q-range. Interestingly, even in the low-resolution case (S.Fa) both helical strands are equally populated, suggesting that the helical bias parameter does not influence the reconstruction.
In relative terms, the helical-tape cross section of the used models is 5 nm, whereas the resolution defined by the scattering angle corresponds to 3.1, 1.6 and 1.0 nm in the case of model scattering S.Da/S.Fa, S.Db/S.Fb and S.Dc/S.Fc, respectively. As only the latter two reconstructions (S.Db/S.Fb and S.Dc/S.Fc) resemble the actual model shape, we can conclude that reconstruction is not feasible if the resolution defined by the scattering range is worse than half of the helical-tape cross section.
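The comparison above can be reproduced with a short sketch, assuming the resolution relation $\pi/q_{max}$ with $q_{max}$ the upper limit of the angular range. Only $q_{max}$ = 2 nm$^{-1}$ (case S.Db/S.Fb) is stated in this document; the values 1 and 3 nm$^{-1}$ used below for the other two cases are assumptions chosen to match the quoted resolutions, and the function names are hypothetical.

```python
import math

def real_space_resolution(q_max):
    """Smallest resolvable real-space feature pi/q_max (nm if q_max is in nm^-1)."""
    return math.pi / q_max

def motif_resolvable(q_max, motif_width):
    """Rule of thumb from the text: the resolution must not be worse
    than half of the smallest structural motif."""
    return real_space_resolution(q_max) <= 0.5 * motif_width

TAPE_WIDTH = 5.0  # nm, helical-tape cross section of models D and F

# q_max = 2.0 nm^-1 is given in the text; 1.0 and 3.0 nm^-1 are assumed.
for label, q_max in (("S.Da/S.Fa", 1.0), ("S.Db/S.Fb", 2.0), ("S.Dc/S.Fc", 3.0)):
    res = real_space_resolution(q_max)
    ok = motif_resolvable(q_max, TAPE_WIDTH)
    print(f"{label}: q_max = {q_max:.1f} nm^-1, resolution = {res:.2f} nm, "
          f"tape resolvable: {ok}")
```

The half-width criterion is an empirical observation from the three test cases, not a general theorem, so the boolean result should be read as a plausibility check only.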

S2.2. Statistical information content
Experimental scattering data is never noise-free. Hence, the information content provided by a scattering curve is not only limited by the experimentally accessible angular range but also by the statistical quality of the data: the noise (Konarev & Svergun, 2015). The obvious question with regard to this work is at which noise level the determination of the stacking-distance and the fitting algorithm break down. Therefore, we took the reference patterns of models D and F with an angular range corresponding to case S.Db/S.Fb ($q_{max}$ = 2 nm$^{-1}$) and gradually added artificial noise. In detail, we defined the error- We first used the noisy data sets to determine the stacking-distance. For cases S.D5, S.D10 and S.D50 the found building-block height is within 10% of the real value (see Supporting Figure S21).
Only in the case of S.D100 do we find a stacking-distance of 56.89 nm, which is off by 14% compared to the model pitch of 50 nm. For cases S.F5-100, the building-block height is within 10% of the real value (see Supporting Figure S16). However, in the case of S.F100, only two oscillations are found in the dPDDF, rendering the determined value questionable.
Nevertheless, we reconstructed the structural motif from all scattering patterns using the determined stacking-distances. As seen in Supporting Figure S22, starting from the case of S.D10 the helical strand is no longer continuous in the reconstructed model, a phenomenon similar to the effect of insufficient angular resolution as observed for S.Da in Supporting Figure S19. However, as seen in the point representations in Supporting Figure S22c, the DA density remains homogeneous such that a helical motif is suggested. In the cases of S.D50 and S.D100, excessive DA clustering is observable such that the reconstruction is not feasible. However, a detailed look at the model data and the fitted scattering patterns of S.D10, S.D50 and S.D100 in Supporting Figure S22a shows an interesting anomaly: the reconstruction algorithm wrongly fits artefacts in the scattering patterns that are caused by the random noise. In order to avoid this, we used the scattering patterns of the fitted PDDF curves (see Supporting Figure S21) as input for the reconstruction procedure. As shown in the point representations in Supporting Figure S23, all reconstructions now present a homogeneous DA distribution and no clustering occurs. However, a continuous helical strand is only found for cases S.D5, S.D10 and S.D50.
Similar phenomena are observed for model F: as shown in Supporting Figure S24, direct fitting of the scattering data yields the characteristic double-strand motif for all reconstructions. For S.F50 and S.F100, minor deviations of the single strands from the ideal shape can be observed. In contrast to model D, reconstructions from the scattering patterns of the fitted PDDF curves (see Supporting Figure S16) show no significant improvement or worsening (see Supporting Figure S25). However, a detailed comparison of the point plots (Supporting Figures S24c and S25c) reveals a more homogeneous DA distribution within the helical strands in the latter case (using the scattering intensity of the fitted PDDF). We find it further noteworthy that also in this case, no influence of the helical field can be seen in any of the double-strand helix reconstructions.
To summarize the above findings, direct fitting of noisy experimental data can lead to physically unfeasible artefacts such as DA clustering or strong inhomogeneities in the DA density, which can be reduced by using the smoothed scattering curve from the PDDF determination.

S2.3. DA resolution
In the case of fixed-grid DA modelling (Chacón et al., 1998; Walther et al., 2000; Svergun, 1999; Franke & Svergun, 2009; Koutsioubas & Pérez, 2013), the scale of the grid and thus the model resolution is defined prior to the fitting process. Hence, the number of DAs that represent the final model solely depends on the shape of the reconstruction. However, in the case of fitting by random DA movement, as proposed in this work, such an intrinsic spatial resolution limit does not exist.
Consequently, to increase the model resolution, it appears intuitive to simply increase the number of DAs used for the fitting process, as this would add additional degrees of freedom to e.g. model the structure's surface. However, and as already written in section 4.4, when performing such a random-movement model reconstruction, one can expect to end up with a fitted configuration that presents structural features below the resolution limit of the experimental data ($d = \pi/q_{max}$, where $q_{max}$ is the upper angular range of the fittable data (Glatter & Kratky, 1982; Feigin & Svergun, 1987)).
Excessively increasing the number of DAs might thus not necessarily increase the model resolution, as the amount of non-resolvable model artefacts will increase as well. To minimize the risk of over-interpreting such artefacts, we suggest keeping the number of DAs rather low (from experience < 1000) or, if large numbers cannot be avoided, always comparing the final model with the resolution limit provided by the scattering curve and performing a statistical analysis using e.g. DAMAVER (Volkov & Svergun, 2003) (see Supporting Figure S18 for an example of such an analysis).
However, when minimizing the number of DAs used for the fitting process, one might run into other issues such as general under-sampling or, in the case of the proposed projection scheme, coherence effects resulting from the mirroring of an exact DA conformation. To illustrate the latter effect, we calculated scattering curves of artificially constructed single-strand helices with a decreasing number of DAs. In one case, we generated the helix (consisting of 15 building-block units) by randomly filling up the full helix, such that no two building-blocks would be the same. We then used the standard Debye formula to obtain the scattering pattern. In the other case, we only filled up one single building-block and then used our projection scheme to calculate the scattering curves. The results as well as the corresponding analytical model (Pringle & Schmidt, 1970) are shown in Supporting Figure S26.
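The two constructions described above can be sketched with the standard Debye formula for identical point scatterers, $I(q) = \sum_i \sum_j \sin(q r_{ij})/(q r_{ij})$. The sketch below is an illustration under stated assumptions: the helix geometry is only loosely modelled on model D (diameter 20 nm, pitch 50 nm, tape width 5 nm), the number of blocks and DAs is reduced to keep the runtime short, and the explicit replication of one building-block is a stand-in for, not an implementation of, the projection scheme used in this work. All function names are hypothetical.

```python
import math
import random

def debye_intensity(points, q_values):
    """Debye formula for identical point scatterers (unit form factor)."""
    n = len(points)
    dists = [math.dist(points[i], points[j])
             for i in range(n) for j in range(i + 1, n)]
    intensities = []
    for q in q_values:
        total = float(n)  # self terms (i == j) contribute 1 each
        for r in dists:
            x = q * r
            total += 2.0 * (math.sin(x) / x if x > 1e-12 else 1.0)
        intensities.append(total)
    return intensities

def random_block(n_da, rng, radius=10.0, pitch=50.0, tape_width=5.0):
    """n_da random DAs on one helical turn (one building-block), lengths in nm."""
    block = []
    for _ in range(n_da):
        t = rng.random()                      # position along the turn, 0..1
        phi = 2.0 * math.pi * t
        z = pitch * t + tape_width * rng.random()
        block.append((radius * math.cos(phi), radius * math.sin(phi), z))
    return block

def shift(block, dz):
    """Translate a building-block along the helix axis."""
    return [(x, y, z + dz) for x, y, z in block]

rng = random.Random(0)
N_BLOCKS, N_DA, PITCH = 5, 30, 50.0  # reduced sizes, assumed for illustration

# Case 1: one building-block replicated along the axis (identical blocks).
one_block = random_block(N_DA, rng)
stacked = [p for k in range(N_BLOCKS) for p in shift(one_block, k * PITCH)]

# Case 2: every building-block filled independently (non-identical blocks).
filled = [p for k in range(N_BLOCKS)
          for p in shift(random_block(N_DA, rng), k * PITCH)]

q_values = [0.05 * (i + 1) for i in range(40)]  # 0.05 ... 2.0 nm^-1
i_stacked = debye_intensity(stacked, q_values)
i_filled = debye_intensity(filled, q_values)

for q, a, b in zip(q_values[::8], i_stacked[::8], i_filled[::8]):
    print(f"q = {q:.2f} nm^-1  identical blocks: {a:9.1f}  independent blocks: {b:9.1f}")
```

Plotting both curves on a log-log scale and varying `N_DA` is one way to look for the kind of coherence peaks discussed in this section; whether they appear at these reduced sizes depends on the random configuration.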
As expected, the lower angular regime of all model curves ($q$ < 1 nm$^{-1}$) is in good agreement with the analytical one. Even in the case of the 250 DAs/BB model, which corresponds to an approximate next-neighbor DA distance of 3 , no significant deviations are visible. Yet, when looking at the upper angular regime (1 < $q$ < 2 nm$^{-1}$) of the projection scheme curve (250 DAs/BB), two rather obvious peaks appear. If the number of DAs is now increased, these peaks become weaker (500 DAs/BB) and eventually disappear (1000 DAs/BB). However, the sole fact that the peak positions do not change between the 250 and the 500 DAs/BB models indicates that this effect is caused by the stacking of identical building-blocks. A comparison to the scattering curves from the fully filled helices (equivalent to the stacking of non-identical building-blocks) confirms this, as no such peaks can be found in the 250 DAs/BB model.
To summarize the above, an excessive increase of the number of DAs used in a fitting process not only causes an immense numerical overhead, but also leads to an increased number of non-resolvable artefacts. This can further lead to misinterpretation of reconstructed models. On the other hand, if coherence effects, such as those depicted in Supporting Figure S26, are visible in the fitted curve, one should nevertheless consider increasing the number of DAs to suppress these effects.

S3. Fitting algorithm: technical details
The fitting algorithm implemented in SasHel starts from a random configuration 0 . If wanted (but
A pseudo-code implementation of the full fitting algorithm is shown in Supporting Figure S27.

Scattering curves of single strand helices (see model D in Table 2

can cause a distortion on the fitting procedure. Inclusion of this peak thus leads to an increased error (top) in the determined stacking-distance. The scattering curve as well as the PDDF corresponding to the dPDDF can be found in Supporting Figure S4.

Figure S7
Reconstruction of model D (see Table 2 of the main text) with a building-block height of