Quantitative analysis of the effect of radiation on mitochondria structure using coherent diffraction imaging with a clustering algorithm

A clustering algorithm based on deep learning is proposed to perform accurate image reconstruction from noisy coherent diffraction patterns. Structural changes in mitochondria induced by X-ray radiation damage are quantitatively characterized and analysed at the nanoscale with different radiation doses.


S1. Phase retrieval and image reconstruction
The reconstruction process for the first diffraction pattern is described in detail as an example, because the parameters and procedures used for the other three patterns were identical. After setting α = (160, 140, 120, 100, 85, 70, 55, 40, 30, 20) and β = 0.9 (α and β are the parameters of the Gaussian kernels and the real-space limit, respectively), a total of m = 1,000 reconstructions with different initial phases and a 69×83 rectangular support were performed to generate a tight support. During the iterations, the error between the calculation and the experiment was monitored by an R factor (R_f) (Sekiguchi et al., 2016),

R_f = \sum_{u,v} \big| |F_c(u,v)| - |F_e(u,v)| \big| \Big/ \sum_{u,v} |F_e(u,v)|,

where u and v are spatial frequencies, and |F_c(u,v)| and |F_e(u,v)| are the Fourier moduli of the calculation and the experiment, respectively. The OSS algorithm comprised 2,000 iterations in total, divided into steps of 200 iterations each. Each step updated α to make the Gaussian kernel sharper for smoother filtering, and the reconstruction with the smallest R_f was selected as the input for the next step. However, for the high-noise diffraction patterns in this paper, α changed sharply in the first few steps. If these steps also took the reconstruction with the smallest R_f as the input, the reconstruction performance was poor, so the last reconstruction of each of the first i steps was used directly as the input for the next step; here i = 3. In each of the remaining steps, GrabCut was used to generate a new support representing the approximate outline of the object (Rother et al., 2004). GrabCut is a widely used image-segmentation algorithm that requires nothing beyond a rectangular region set in advance: everything outside this rectangle is treated as pure background, and, based on the characteristics of that external background, the algorithm separates the object from the background inside the region. Conveniently, the rectangular support set in the initial iteration could be used as the GrabCut input.
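As a concrete illustration, the R factor used to monitor the iterations can be computed in a few lines of NumPy (a minimal sketch; the function and variable names are ours, not from the original code):

```python
import numpy as np

def r_factor(F_calc, F_exp):
    """R_f: normalized L1 difference between the calculated and experimental
    Fourier moduli, summed over all spatial frequencies (u, v)."""
    return np.sum(np.abs(np.abs(F_calc) - np.abs(F_exp))) / np.sum(np.abs(F_exp))

# Example: R_f is 0 for a perfect match and grows with the modulus mismatch.
rng = np.random.default_rng(0)
F_exp = rng.random((69, 83)) + 0.1   # toy experimental Fourier modulus
F_calc = F_exp * 1.1                 # calculation off by 10% everywhere
print(r_factor(F_calc, F_exp))       # ~0.1
```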
After 10 steps, one additional step of error reduction (ER) was implemented to further prevent stagnation (Fienup, 1982). In the end, rough electron-density maps were obtained. By averaging the tens of reconstructions with the smallest R_f (here we averaged 24 reconstructions) and applying GrabCut, a tight support could be generated.
To prevent the support from being too compact and cutting into the sample, the tight support was dilated by 3 pixels.
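The 3-pixel dilation of the support can be sketched with scipy.ndimage (a minimal example using a toy support; the real support comes from GrabCut):

```python
import numpy as np
from scipy.ndimage import binary_dilation

support = np.zeros((69, 83), dtype=bool)
support[30:40, 40:50] = True                  # toy tight support

# Dilate by 3 pixels so the support does not cut into the sample.
loose_support = binary_dilation(support, iterations=3)

print(support.sum(), loose_support.sum())     # the dilated support is larger
```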
After the first generation of reconstructions described above, the next generation started from the newly generated tight support to improve the imaging quality, while the other parameters remained unchanged. Sekiguchi et al. (2016) applied PCA and k-means in both support estimation and phase retrieval for fine structures. In this study, we investigated ConvRe, a clustering algorithm based on deep learning, whose flow chart is shown in orange in Fig. 2. Throughout the whole process, feature extraction is the most important stage, since it determines the final clustering performance.
IUCrJ (2022). 9, https://doi.org/10.1107/S2052252521012963 Supporting information

First, all of the reconstructions were scaled up from 83×69 pixels to 224×224 pixels (through cv2.resize) (Culjak et al., 2012), since the weights of the convolutional neural networks (CNNs) used were pre-trained on the ImageNet database and can only process images of the same size as in training (Russakovsky et al., 2015). ImageNet was chosen because its database contains millions of images, and weights obtained by training on such a large database are highly reliable. Then, through the keras.applications module (Chollet, 2015), the VGG16, VGG19 and ResNet50 networks were used for feature extraction to find the best-performing network for these reconstructions (Simonyan & Zisserman, 2014; He et al., 2016). The fully connected layers were removed, and the outputs were computed from a series of convolution layers. After that, there were 25,088 (for VGG16 and VGG19) or 100,352 (for ResNet50) features per reconstruction, and principal component analysis (PCA) was applied to reduce the dimensionality of these feature vectors. Therefore, by making a scatter plot of the first three principal components, the number of clusters k to be set next could be determined.
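The dimensionality-reduction step can be sketched as follows. Random matrices stand in for the CNN features here (in the actual pipeline they come from keras.applications networks with the fully connected layers removed), and the feature width is kept small for speed; an SVD-based PCA yields the first three principal components used for the scatter plot:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for CNN features: 1000 reconstructions x 512 features
# (VGG16 actually yields 25,088 features per reconstruction).
features = rng.normal(size=(1000, 512))

# PCA via SVD: center, decompose, project onto the first 3 components.
centered = features - features.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
pcs = centered @ Vt[:3].T       # (1000, 3) scores for the scatter plot

print(pcs.shape)                # (1000, 3)
```

The three columns of `pcs` are ordered by explained variance, so plotting them directly gives the scatter plot from which k is read off.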
Finally, the reconstructions were clustered into k clusters by k-means (through sklearn.cluster.KMeans), one of the most popular clustering algorithms owing to its simplicity and efficiency (Lloyd, 1982). For better performance and efficiency, we used k-means++, which keeps the initial k centroids away from each other (Arthur & Vassilvitskii, 2007). Cohn and Holm demonstrated the accuracy of the above process with VGG16 as the feature-extraction network (Cohn & Holm, 2020). Although the network depth of VGG16, VGG19 and ResNet50 increases progressively, and the theoretical performance becomes correspondingly stronger, a comparison is still necessary to choose the most suitable network. Table S1 shows the results of the three networks after clustering; the cluster with the smallest R_f of each network is in bold. Among them, the cluster obtained by VGG16 contained the largest number of reconstructions while having the smallest error, so VGG16 was the most suitable network for Pattern 1. Here, the number of reconstructions included in a cluster was also used as a criterion for selecting the cluster, because under normal circumstances the correct results tend to appear most frequently; this criterion is also used in XFEL data analysis (van der Schot et al., 2015; Sekiguchi et al., 2016). However, if the reconstructions in this cluster were selected indiscriminately and averaged as the imaging result, both the resolution and the contrast would be poor. Therefore, the cross correlation (CC) was introduced,

CC_i = \frac{\sum_{x,y} [\rho_i(x,y) - \langle\rho_i\rangle]\,[\bar{\rho}(x,y) - \langle\bar{\rho}\rangle]}{\sqrt{\sum_{x,y} [\rho_i(x,y) - \langle\rho_i\rangle]^2 \,\sum_{x,y} [\bar{\rho}(x,y) - \langle\bar{\rho}\rangle]^2}},

where \rho_i(x,y) and \langle\rho_i\rangle are the i-th reconstruction and its mean value, respectively, and \bar{\rho}(x,y) is the average of all reconstructions in this cluster, used as the comparison standard. As shown in Fig. 3(b), the Fourier error is the normalized R_f, error_F = R_f / max(R_f). The real-space error, error_R = 1 - CC_i, represents the disagreement between reconstructions.
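The clustering step itself can be sketched with scikit-learn (toy data; the three well-separated groups are hypothetical and only illustrate that k-means++ recovers cluster membership, which is then used as a selection criterion):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Toy PCA scores for 1000 reconstructions, forming k = 3 separated groups.
pcs = np.concatenate([rng.normal(c, 0.5, size=(n, 3))
                      for c, n in [(0, 500), (4, 300), (8, 200)]])

# k-means++ initialization keeps the k initial centroids apart.
labels = KMeans(n_clusters=3, init="k-means++", n_init=10,
                random_state=0).fit_predict(pcs)

# Cluster sizes: correct reconstructions tend to form the largest cluster.
print(np.bincount(labels))
```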
Taking these two errors as the coordinates of each reconstruction, the 24 reconstructions nearest to the origin were averaged as the final imaging result, which is shown in Fig. S2. The clustering results of the four patterns are shown in Fig. S3.
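Selecting the 24 reconstructions nearest to the origin of the (Fourier error, real-space error) plane reduces to a simple sort by distance (sketch with hypothetical error values):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200                              # reconstructions in the chosen cluster
error_F = rng.random(n)              # normalized R_f per reconstruction
error_R = rng.random(n)              # 1 - CC per reconstruction

# Distance of each reconstruction to the origin of the error plane.
dist = np.hypot(error_F, error_R)
best24 = np.argsort(dist)[:24]       # indices of the 24 nearest

print(best24.shape)                  # (24,)
```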

Figure S3
Clustering of one thousand reconstructions from the patterns at different radiation doses using VGG16.

S2. Radiation dose estimation for mitochondrion
To quantify the radiation dose, the incident X-ray flux has to be calculated first:

F_{L,H} = n_e g w / E,

where F_L and F_H represent the flux of the LROI and HROI, respectively, n_e = 4.5×10^5 is the number of electron counts, g = 10^7 is the amplifier gain, w = 3.65 eV is the energy required to create an electron-hole pair, and E = 5.5 keV is the photon energy.
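A worked numerical check of this step (a sketch assuming the flux is obtained as n_e·g·w/E from the quantities listed above; this relation is our reading of the definitions, not code from the paper):

```python
# Assumed relation: photons/s = n_e * g * w / E, with all energies in eV.
n_e = 4.5e5       # number of electron counts
g = 1e7           # amplifier gain
w = 3.65          # eV required to create one electron-hole pair
E = 5.5e3         # photon energy: 5.5 keV in eV

flux = n_e * g * w / E
print(f"{flux:.2e} photons/s")    # ~3.0e9 photons/s
```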
Then the total number of incident photons per projection, P_T, could be obtained. The exposure time for the LROI pattern was ~0.08 s per exposure with 1,000 exposures, and for the HROI pattern ~12 s per exposure with 80 exposures.
The total number of incident X-ray photons per projection and unit area (through a pinhole with a diameter of 10 μm) was

I = \eta P_T / A,

where η = 83.8% is the fraction coefficient and A = 1.77×10^-10 m² is the area of the pinhole.
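Putting the exposure parameters together (a sketch; the per-second fluxes F_L and F_H are placeholders, while the exposure times, η and A follow the values quoted above):

```python
# Photons per projection and per unit area (placeholder fluxes F_L, F_H).
F_L, F_H = 3.0e9, 3.0e9     # hypothetical fluxes (photons/s) for LROI, HROI
T_L = 0.08 * 1000           # LROI: ~0.08 s/exposure x 1000 exposures = 80 s
T_H = 12.0 * 80             # HROI: ~12 s/exposure x 80 exposures = 960 s

P_T = F_L * T_L + F_H * T_H # total incident photons per projection
eta = 0.838                 # fraction coefficient
A = 1.77e-10                # pinhole area in m^2

I = eta * P_T / A           # photons per unit area (photons/m^2)
print(f"{I:.2e} photons/m^2")
```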
Finally, the radiation dose of the first pattern could be determined by

D = \mu_\rho I E,

where μ_ρ = 23.20 cm²/g is the mass absorption coefficient of mitochondria at 5.5 keV, I is the number of incident photons per unit area and E is the photon energy. Since only the HROI was collected for the following three patterns, the exposure time was 960 s each time. In the same way, the remaining radiation doses could be calculated, which were 57.8, 85.6 and 113 MGy, respectively.

As shown in Fig. S4(a), the PRTF curves of Patterns 1 and 4 show an abnormal rise in the high-frequency region, which is contrary to theory: in the high-frequency region the signals are weak and the signal-to-noise ratio is poor, so there should be no tendency for the reconstruction quality to improve. By introducing a Wiener filter, the noise of the experimental pattern could be suppressed, and the outliers in the PRTF were thus removed. The curves of the Wiener-filtered phase retrieval transfer function (wPRTF) are shown in Fig. S4(b); in the high-frequency region they show a downward trend, which is consistent with theory.
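The dose conversion can be checked numerically (a sketch; the photons-per-area value is a placeholder, and the relation D = μ_ρ·I·E, with everything converted to SI units, is our reading of the quantities above rather than code from the paper):

```python
# Dose estimate: D = mu_rho * I * E (assumed relation, SI units throughout).
mu_rho = 23.20 * 0.1            # mass absorption coeff.: cm^2/g -> m^2/kg
E_J = 5.5e3 * 1.602e-19         # photon energy: 5.5 keV in joules
I = 1.5e22                      # placeholder photons per m^2 per projection

D = mu_rho * I * E_J            # dose in Gy (J/kg), since Gy = J/kg
print(f"{D / 1e6:.1f} MGy")     # ~30.7 MGy for this placeholder I
```

Note the unit conversion: 1 cm²/g = 0.1 m²/kg, so 23.20 cm²/g = 2.32 m²/kg.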