Evaluation of the performance of classification algorithms for XFEL single-particle imaging data

The performances of three image-classification algorithms were evaluated. The three classification methods lead to different datasets and subsequently result in different electron density maps of the reconstructed models.

The architecture of the convolutional neural network (CNN) used in this study is shown in Figure S1. Binary cross-entropy and adaptive moment estimation (Adam) (Kingma & Ba, 2014) are used as the loss function and the optimisation algorithm, respectively. Our program is built on the Keras framework with a Theano backend.
Based on the architecture in Figure S1, the number of parameters in the CNN can be calculated as below.
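As an illustration of how such a count is obtained, the following minimal sketch applies the standard parameter formulas for convolutional and fully connected layers. The layer sizes are hypothetical placeholders chosen for the example; they are not taken from Figure S1. Pooling and activation layers contribute no trainable parameters and are therefore omitted.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, out_channels, bias=True):
    """Trainable parameters of a 2D convolutional layer."""
    weights = kernel_h * kernel_w * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def dense_params(in_features, out_features, bias=True):
    """Trainable parameters of a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


# Hypothetical layer stack, for illustration only (NOT the architecture in Figure S1).
layers = [
    ("conv", conv2d_params(3, 3, 1, 16)),      # 3x3 kernels, 1 -> 16 channels
    ("conv", conv2d_params(3, 3, 16, 32)),     # 3x3 kernels, 16 -> 32 channels
    ("dense", dense_params(32 * 8 * 8, 64)),   # flattened 8x8x32 feature map -> 64 units
    ("dense", dense_params(64, 1)),            # single output unit for binary classification
]

total_params = sum(count for _, count in layers)
for name, count in layers:
    print(f"{name:>5}: {count}")
print("total:", total_params)
```

In a Keras model the same total can be cross-checked against the parameter count reported by the model summary.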

S2. Graph cut model
The construction of the sparse graph for the data and the computation of the adjacency matrix are the most time-consuming parts of the implementation of the GC algorithm. We used the MATLAB software package VLFeat (Vedaldi & Fulkerson, 2010) to construct the k-nearest neighbour graph for the dataset ($k = 5$ in this application), and the adjacency matrix was computed accordingly. For simplicity, we denote the gradient operator defined on the graph by the symbol $\nabla$. More precisely, let $u(x)$ be a function defined on the vertices of the graph $G$. Given a vertex $x$ and one of its neighbouring vertices $y$, the gradient of $u$ at $x$ in the direction from $x$ to $y$ is denoted by $\nabla u(x, y)$. Thus, $\nabla u(x)$ is a sparse $N$-dimensional vector, where $N$ is the number of vertices, whose number of nonzero entries equals the number of neighbours of $x$. By stacking all of the $\nabla u(x)$ as rows, we obtain $\nabla u$ as an $N \times N$ sparse matrix.
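The construction described above can be sketched in a few lines. This is a brute-force illustration in plain Python, not the VLFeat-based MATLAB implementation used in this work. The unit edge weights, the symmetrisation of the graph and the gradient form w(x, y)(u(y) − u(x)) are assumptions made for the example, as are all function names.

```python
import math


def knn_graph(points, k):
    """Return a symmetric adjacency dict {i: {j: weight}} for a k-nearest-neighbour graph."""
    n = len(points)
    adjacency = {i: {} for i in range(n)}
    for i in range(n):
        # Sort all other points by Euclidean distance to point i (brute force).
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            # Unit weights are an assumption; symmetrise so the graph is undirected.
            adjacency[i][j] = 1.0
            adjacency[j][i] = 1.0
    return adjacency


def graph_gradient(adjacency, u):
    """Sparse gradient: row x holds w(x, y) * (u[y] - u[x]) for each neighbour y of x."""
    return {x: {y: w * (u[y] - u[x]) for y, w in nbrs.items()}
            for x, nbrs in adjacency.items()}


# Toy data: two well-separated groups of 2D points standing in for pattern features.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
adjacency = knn_graph(points, k=2)

# Labelling function with one "mislabelled" vertex (index 2) in the first group.
u = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
grad_u = graph_gradient(adjacency, u)
print(grad_u[0])  # nonzero only along the edge to the mislabelled vertex
print(grad_u[3])  # all zeros: the labels agree within the second group
```

Each row of the result stores entries only for the neighbours of a vertex, which is why the stacked gradient is sparse even though it is formally an N × N matrix.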
Accordingly, we can define a "flow" variable $p$ on the graph with the same dimensions as $\nabla u$, where a row, for example $p(x)$, denotes the flow from the vertex $x$ to every vertex in the graph. The divergence operator ($\mathrm{div}$) is defined as the adjoint of the gradient operator ($\nabla$) and acts on a flow $p$ on the graph such that $\mathrm{div}\, p$ is an $N$-dimensional vector. The prior probability $P(x)$ for vertices of the graph models the conditional probability of $x$ belonging to the single-particle patterns given the set $S_1$ of the single-particle patterns and the set $S_0$ of the non-single-particle patterns. We use the following definition for $P(x)$: where $d(x, y) = (A^{(2)}_{xy})^2 / (A^{(2)}_{xx} A^{(2)}_{yy})$. Here $A^{(2)}_{xy}$ is the $(x, y)$ entry of the matrix $A^2$ ($A$ to the second power).
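The adjoint relation between the two operators can be checked numerically. The sketch below assumes a small hypothetical graph with symmetric unit weights, the gradient form w(x, y)(u(y) − u(x)), and the sign convention ⟨∇u, p⟩ = ⟨u, div p⟩; other sign conventions (for example div defined as the negative adjoint) are also in common use, and the one used in this work is not stated here.

```python
import random

# Small undirected graph with unit weights (hypothetical example): {vertex: {neighbour: weight}}.
adjacency = {
    0: {1: 1.0, 2: 1.0},
    1: {0: 1.0, 2: 1.0},
    2: {0: 1.0, 1: 1.0, 3: 1.0},
    3: {2: 1.0},
}


def graph_gradient(adjacency, u):
    """Row x holds w(x, y) * (u[y] - u[x]) for each neighbour y of x (assumed form)."""
    return {x: {y: w * (u[y] - u[x]) for y, w in nbrs.items()}
            for x, nbrs in adjacency.items()}


def graph_divergence(adjacency, p):
    """Adjoint of graph_gradient for symmetric weights:
    (div p)[x] = sum over neighbours y of w(x, y) * (p[y][x] - p[x][y])."""
    return [sum(w * (p[y][x] - p[x][y]) for y, w in adjacency[x].items())
            for x in sorted(adjacency)]


random.seed(0)
u = [random.random() for _ in adjacency]                       # function on vertices
p = {x: {y: random.uniform(-1.0, 1.0) for y in nbrs}           # flow on directed edges
     for x, nbrs in adjacency.items()}

grad_u = graph_gradient(adjacency, u)
div_p = graph_divergence(adjacency, p)

# Inner product over edges versus inner product over vertices.
lhs = sum(grad_u[x][y] * p[x][y] for x in adjacency for y in adjacency[x])
rhs = sum(u[x] * div_p[x] for x in adjacency)
print(abs(lhs - rhs) < 1e-12)
```

The flow has one entry per directed edge, matching the sparsity pattern of the gradient, while the divergence returns one value per vertex.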
The pseudo code for the GC algorithm is listed in Figure S2. Each iteration of the main loop involves one evaluation of the gradient of the labelling function $u$ and one evaluation of the divergence of the flow $p$, which together account for most of the computational cost. In addition, there is one projection onto the infinity-norm ball ($\|p\|_\infty \le 1$), which restricts each row of $p$ to have at most unit norm, and one projection onto the set $\Delta$, in which each entry lies in the interval [0, 1].
Figure S3 shows the pseudo code of the DM method, and the source code is available at https://github.com/haoyuanli93/DiffusionMap . For CXIDB 58, the first three components of the eigenvectors are used for clustering, where $(\psi_1, \psi_2, \psi_3) = (-0.75, 0, 0)$ is recognised as the most ideal single-particle diffraction pattern.
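Returning to the two projection steps of the GC iteration, both are inexpensive element-wise or row-wise operations. The sketch below is an illustration only: it assumes that the norm of each row of the flow is the Euclidean norm (an entry-wise variant would instead clip every entry to [−1, 1]), it stores rows densely for brevity, and the function names are hypothetical.

```python
import math


def project_rows_to_unit_ball(p):
    """Rescale every row of the flow whose Euclidean norm exceeds 1 back onto the unit ball."""
    projected = []
    for row in p:
        norm = math.sqrt(sum(value * value for value in row))
        scale = 1.0 / max(1.0, norm)  # rows already inside the ball are left unchanged
        projected.append([value * scale for value in row])
    return projected


def project_to_unit_interval(u):
    """Clip every entry of the labelling function to the interval [0, 1]."""
    return [min(1.0, max(0.0, value)) for value in u]


# Toy flow with one row outside the unit ball and one row inside it.
p = [[3.0, 4.0], [0.3, 0.4]]
# Toy labelling function with entries below 0 and above 1.
u = [-0.2, 0.5, 1.7]

p_proj = project_rows_to_unit_ball(p)
u_proj = project_to_unit_interval(u)
print(p_proj)
print(u_proj)
```

Because neither projection couples different vertices, their cost is small compared with the gradient and divergence evaluations.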

S4. Calculation method of orientation distributions
The orientation distribution of the merged dataset is calculated as $P_j = \sum_i p_{ij}$, where $P_j$ is the total probability of the $j$th orientation and $p_{ij}$ is the probability of the $j$th orientation for the $i$th pattern. In this study, only the 10 orientations with the highest probabilities were used for each pattern.
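The accumulation can be sketched as follows. This is a minimal illustration assuming that the total probability of an orientation is the plain (unnormalised) sum over patterns and that each pattern is given as a mapping from orientation index to probability; the function name and the toy data are hypothetical and do not reflect the Dragonfly output format.

```python
import heapq


def orientation_distribution(pattern_probs, n_orientations, top_k=10):
    """Sum, over patterns, the probabilities of each pattern's top_k most likely orientations."""
    totals = [0.0] * n_orientations
    for probs in pattern_probs:
        # probs maps orientation index -> probability for a single pattern.
        top = heapq.nlargest(top_k, probs.items(), key=lambda item: item[1])
        for orientation, prob in top:
            totals[orientation] += prob
    return totals


# Toy example: 3 patterns, 4 orientations, keeping only the top 2 orientations per pattern.
pattern_probs = [
    {0: 0.7, 1: 0.2, 2: 0.1},
    {1: 0.6, 3: 0.3, 0: 0.1},
    {1: 0.5, 2: 0.4, 3: 0.1},
]
totals = orientation_distribution(pattern_probs, n_orientations=4, top_k=2)
print(totals)
```

With the default `top_k=10` the function matches the truncation described above; the smaller value in the toy example only keeps the data readable.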

S5. Orientation recovery using simulation data
10,000 diffraction patterns were simulated from a solid icosahedron using randomly sampled Euler angles, shown in Figure S8a. The orientation recovery on this simulated data was carried out using the Dragonfly program, and the recovered orientation distribution is shown in Figure S8b. Although there are some fluctuations in the distributions, there is no belt-like distribution like that observed in Figure S4.
Furthermore, the fluctuation levels are much smaller in the simulation dataset, compared to uneven