High-speed classification of coherent X-ray diffraction patterns on the K computer for high-resolution single biomolecule imaging

A code with an algorithm for high-speed classification of X-ray diffraction patterns has been developed. Results obtained for a set of 1 × 106 simulated diffraction patterns are also reported.


Introduction
The X-ray free-electron laser (XFEL) generates an intense X-ray laser pulse as short as a few femtoseconds. This type of light source is anticipated to offer a new possibility of single-particle coherent X-ray diffraction imaging (CXDI) for non-crystalline biomolecular samples (Neutze et al., 2000;Schlichting & Miao, 2012). The intense X-ray laser pulse is irradiated onto a single biomolecular target, and two-dimensional coherent diffraction patterns are recorded repeatedly, each for a random unknown orientation. Even with the use of an intense XFEL, the diffraction intensity arising from a single particle is weak, causing diffraction patterns deeply immersed in quantum noise.
A decade ago, a basic scheme of data analysis for threedimensional structure determination was suggested (Huldt et al., 2003). This scheme consists of three steps. At first, the diffraction patterns are classified according to similarity and averaged within each similarity group in order to improve the signal-to-noise (S/N) ratio. Then, a three-dimensional diffraction intensity function is constructed by aligning signalenhanced two-dimensional diffraction patterns in reciprocal space. Finally, the phase is retrieved by applying the oversampling method (Sayre, 1952;Gerchberg & Saxton, 1972;Fienup, 1982).
Generally, along the suggested line, we reported a detailed algorithm for classifying and assembling two-dimensional noisy diffraction patterns to construct a three-dimensional diffraction intensity function (Tokuhisa et al., 2012). The algorithm enables signals immersed in the quantum noise to be extracted, which is indispensable in constructing a nearatomic-resolution three-dimensional structure. We have reported that the algorithm can classify the diffraction data with statistics as low as 0.1 photons pixel À1 . To classify diffraction patterns according to the similarity, a correlation pattern is calculated for each pair of diffraction patterns. In order to construct a structure with sub-nanometer resolution for the case of 70S ribosome, it is necessary to analyze about 1 Â 10 6 diffraction patterns.
All-pair calculation for this number of patterns is of high computational cost. In this paper we report a representative-all pair scheme in order to reduce the cost significantly. In this scheme, correlation patterns are calculated between one from two-dimensional diffraction patterns representing each similarity group and the other from a set of whole observed twodimensional diffraction patterns. The number of correlation patterns to be calculated is about 13 billion for the above example. Even with this representative-all pair scheme, the calculations take about 100 days in the case of a 10TFLOPS computer.
For a system of data analysis to be practically useful, it is necessary to process the calculation concurrent to the data collection in order to diagnose the data quality during the experiments. The calculation results can then be used to optimize the experimental parameters (Tokuhisa, 2013). To achieve these goals we have implemented a code of high-speed classification on the K computer, a 10PFLOPS supercomputer at RIKEN Advanced Institute for Computational Science (AICS; http://www.aics.riken.jp/en/). We report the present status of our developments on (i) the non-visual automatic similarity detection algorithm, (ii) the representative-all pair classification scheme, (iii) program parallelization, and (iv) an efficient diffraction data flow between the XFEL facility, SACLA (Ishikawa et al., 2012), and the K computer. Computation results obtained using a set of 1 Â 10 6 simulated diffraction patterns are also reported.

2.
A high-speed classification system 2.1. Automatic similarity detection algorithm In our method of detecting similarity between a pair of twodimensional diffraction patterns i and j, we calculate a correlation pattern c ij (, ) as a function of two variables and and defined as follows (Tokuhisa et al., 2012), Here is the angle of diffraction, which is expressed as 2 in the usual literature, is the angle of rotation of the detector plane around the incident beam axis, N is the number of Shannon pixels on a circle with a fixed value of , s Q is the photon number to be observed by a detector Shannon pixel with solid angle !, and " s s is its mean over pixels on the above circle. The quantum-mechanically expected mean s(k) of s Q is given by where I i is the incident X-ray intensity, r CE is the classical electron radius, F(k) is the structure factor, k is the momentum transfer and i(k) is the diffraction intensity density. The magnitude of momentum transfer is given as where is the wavelength of the incident X-ray. Simulated examples of s Q and c ij are shown in Fig. 1. Reflecting the fact that the target is a single particle, the experimentally observed diffraction pattern s Q (, ) is immersed deeply in the quantum noise especially in the higher-angle range. This noisy nature of s Q is inherited in the noisiness of c ij . When a pair of s Q s for i and j are similar, a high correlation line appears in c ij . The correlation line becomes invisible against the noisy background at a high k region, k > k N , where k N (subscript standing for 'noise') is the value of k at which the standard deviation of the background noise becomes as high as 0.6 ' exp(À1/2) (Tokuhisa et al., 2012).
Detection of similarity between a pair of s Q s is thus translated to detection of a high correlation line in c ij . To do this job at high speed, we developed an algorithm of non-visual automatic similarity detection. The basic idea of the algorithm is to use the following integral value of c ij so that the quantum noise is averaged out within a single figure and the positive definite signals are enhanced by integration, Here, k C is the correlation length of the intensity data which is approximately given by k C = 1/L with L being the length of a sample molecule. In our method, the upper bound of the integration k up is chosen in the range k up k N where " s sðk N Þ = I i r 2 CE ! " i iðk N Þ assumes the value of about 0.1. An example of this integrated correlation pattern, I c , is also shown in Fig. 1. If a significant maximum of I c (k up , ) is detected at = , it gives the direction of the high correlation line. We then identify the value of k up for which I c (k up , ) assumes the peak value within the range k up k N , write such a value as I c ðk k; Þ, and will refer to it as the peak value of the integrated correlation. This value is used to judge the similarity between the pair of s Q s for i and j. A higher value of I c ðk k; Þ means a higher similarity. Use of I c contributed to improve the sensitivity of the method significantly.
In our method, s Q s are classified according to similarity and averaged within each similarity group. The similarity is judged by I c ðk k; Þ. If we employ a higher threshold value of I c ðk k; Þ for a pair of s Q s to be classified into one group, the number of similarity groups will become larger. But, at the expense of a large number of groups, we can attain higher structural resolution of the final result. We can control the obtainable structural resolution by the threshold value of I c ðk k; Þ.

Representative-all pair classification scheme
To avoid the necessity of carrying out c ij calculations for all pairs of 1 Â 10 6 s Q s, we adopt the representative-all pair scheme mentioned in the Introduction. This scheme contains two tasks: (i) selection of representative s Q s and (ii) implementation of similarity detection for pairs, each pair consisting of one from a set of the representative s Q s and the other from the set of whole s Q s. Even though it is conceivable to design a scheme to solve task (i) while processing task (ii), we nonetheless developed a simple scheme in which the two tasks are carried out in two separate steps in sequence. This simple scheme has the merit of being quick and flexible. Because of its quickness, this scheme is suited for the real time application of the analysis system. In step 1, task (i) is carried out as follows. We start from defining a targeted structural resolution. In the simulation calculation for 70S ribosome, a particle with length L of about 270 Å , we set the targeted resolution r to be about 5 Å . This value is translated to an allowed solid angle ! G = 2 G of a circular disc on the Ewald sphere for each similarity group, where G = r/L (Tokuhisa et al., 2012) turns out to be about 1 . In order to select a good set of representative s Q s, the respective beam directions of the associated patterns should not be close. Here we select a set of representative s Q s that satisfies all the angles between the respective beam directions larger than G on the Ewald sphere. We estimate the maximum number of points on the sphere satisfying this requirement by 4/! G , where 4 is the solid angle of the whole sphere. This number turns out to be about 13000. Thus the targeted resolution is translated to the number of representative s Q s. We then prepare a set of relatively small number of s Q s sampled from uniform random orientations for which all pair c ij calculation is possible. In our simulation we prepared tentatively a set of about 1.5 times as many diffraction patterns as compared with the number of targeted representative s Q s. It should be noted that such a set of relatively small number of s Q s can be prepared at an early stage of an on-going experiment. For this small set we carried out all pair c ij calculations to obtain I c ðk k; Þ. By referring to I c ðk k; Þ sorted in descending order, we identify pairs of s Q s in each of I c ðk k; Þ, and erase one of the pair of s Q s in this order from the list of candidates of representatives until the remaining number of candidates becomes exactly the number of targeted groups. I c ðk k; Þ, where this occurs, is recorded as the threshold peak value I c,representative to be referred to in task (ii).
In step 2, task (ii) is carried out as follows. At first I c ðk k; Þ is calculated for all pairs, each pair consisting of one from the representative s Q s and the other from the whole set of about 1 Â 10 6 s Q s. This part of the calculation can be divided into independent separate jobs by dividing the large set of whole s Q s into subsets; or, even while the whole set is being generated during the experiment, calculation can be started for a part of the growing set. This flexible feature is a result of the two-step scheme we adopted.
As a result of step 2, about 80 s Q s on average are expected to be assigned to belong to each similarity group. This is the number needed to improve the S/N ratio so that mutual alignment of signal-enhanced s Q s in the reciprocal space can be performed.  After the correlation calculations, we proceed to identify pairs judged to be similar. This is done by comparing I c ðk k; Þ for each pair with a certain threshold value of I c,group . In this paper, values of I c,group were chosen so that the average number of s Q s in the similarity groups is larger than 80 and I c,group < I c,representative .

Assessment of the algorithm
The algorithm we propose in this paper carries out similarity detection among a large number of experimentally observed diffraction patterns. The algorithm also gives a relative rotation angle ij of the detector plane for each pair. A pair of s Q s for i and j is defined to be similar, when the angle ij between the respective beam directions is less than a certain cut-off value 0 . We are applying the proposed algorithm for a set of s Q s. Because s Q s in this paper are the simulated diffraction patterns, the values of ij and ij are in fact known precisely beforehand. We can assess the quality of the proposed algorithm by comparing its result with precise values from the simulation.
The result of assessment is expressed in terms of two probabilities, P right , the probability that a result of classification is right, and, P capture , the probability that a right pair is captured, which are expressed, respectively, as follows, Here the quantities appearing on the right-hand-sides are defined as follows. A set of pairs whose simulation ij values are smaller than a certain value 0 is defined as A with its number of elements denoted as N A . Set B is defined as a set of pairs whose value is within such a narrow range around ij as j À ij j < Á 0 , where Á 0 is a small value of angle taken to be 1 in this paper. A set of pairs that are judged by the algorithm to have high I c ðk k; Þ is defined as C with its number of elements denoted as N C . A set of pairs that are correctly captured and judged as a right pair is given by the product set A \ B \ C with its number of elements designated as N A\B\C .

Parallelization of the classification program
Basically, the correlation calculation must be applied to any possible combination of two s Q s. This procedure can be easily parallelized by decomposing the diffraction data set; however, a naïve implementation cannot avoid reading the same file multiple times and this file I/O can be a severe performance bottleneck.
The K computer consists of 82944 nodes. Since each node has 16 GB of memory, the total amount of memory of the K computer is approximately 1.3 PB. The total size of 1 Â 10 6 diffraction patterns is approximately 14 TB, much smaller than the whole memory size of the K computer. Thus, all diffraction data can be loaded into the memory of the K computer. The first prototype program was developed by using the MPI (Message Passing Interface) library. Each MPI process reads a dedicated file and then the read data is passed to the other nodes upon request. In this way, the file I/O bottleneck can successfully be avoided. Based on this prototype, a new program is under development to achieve better performance.

Result of application of developed classification scheme
In this section, we report the result of application of the developed scheme and algorithm for diffraction data simulated for 70S ribosome. The incident X-ray wavelength = 1 Å and intensity I i = 2.55 Â 10 20 photons pulse À1 mm À2 are assumed in the simulation. This intensity can be realised when the XFEL beam emitted at SACLA is fully transported and focused down to 50 nm Â 50 nm. Note that this focusing condition will make the hit rate of the XFEL pulse to the molecule lower and may require novel experimental methodology. For this molecule, we set the targeted resolution r to be 4.7 Å . This value corresponds exactly to G being 1.0 . This resolution is translated to the number of similarity groups to be 13146 and to the necessary number of s Q s to be 1.05 million. In our treatment in this paper we do not explicitly pay attention to the centrosymmetric property of i(k). When we take this symmetry into account (which we should in real experiments), a single s Q is to be subjected to the classification twice, the first time as the pattern itself and the second time as its centrosymmetric pattern. In this treatment, 1.05 million s Q s for classification can be prepared from the half number of s Q s (Tokuhisa et al., 2012). In this paper we prepared 1.05 million s Q s by using equation (2) and the PDB coordinate of 70S ribosome, 1yl3 and 1yl4 (Jenner et al., 2005). Each s Q consists of photon-count data by pixels arranged in a two-dimensional square lattice. The photon-count data are given up to the diffraction angle corresponding to 0.74 Å À1 , a value far enough to achieve 4.7 Å resolution. These s Q s on a square lattice were then converted into a form suitable for the c ij computation, i.e. photon-count data by fictitious pixels on a circle with fixed value of . In fact, in order to compute c ij rapidly using a fast Fourier transform library, a Fourier transform of such data is prepared and stored in place of the original s Q . The size of the data for the whole 1.05 million s Q s is 14 TB.
In step 1 for selection of the representative s Q s, 13252 patterns (slightly different from the targeted number 13146 for a very technical reason) were selected from a set of exactly 20000 s Q s by the method described in x2.2. The obtained value of I c,representative is 0.00111. The distance to the nearest representative is found distributed roughly between 0.4 and 2.2 with the average being 1.1 , which is very near to our target value of 1.0 . The result shows that our algorithm can detect the similarity between a pair of s Q s with an accuracy of about 1 . The calculation of this step was carried out in one job using the computational resource of 3.8 M nodes s.
The calculation of I c for classification of 1.05 million diffraction patterns into 13252 groups was carried out by dividing the whole calculation into 255 independent 1 h jobs, each using 385 nodes, with the total computational resource used being 207 M nodes s. The total number of correlation calculations is thus 13.8 billion. The input data for each job is (i) the data set of all representative s Q s, common for all jobs, and (ii) a part of the data set from the whole s Q s allocated to the job. If we use all of the 82944 nodes of the K computer, the whole calculation can be finished in 71 min.
After calculation of I c ðk k; Þ for all representative-all pairs, we proceed to judge whether or not each s Q from the whole set belongs to the similarity group of each representative. This judgement is done by comparing I c ðk k; Þ for each pair with a threshold value, I c,group , for the judgement. In this paper, I c,group is set to be 0.0010 as described in x2.2, yielding the average number of s Q s in the similarity groups to be 82.8. Here we allowed one s Q to belong to more than one group. In this treatment, a total of 1097672 pairs are assigned as similar (set C). For each pair thus judged to be similar, the exact value of similarity ij is in fact known from the record of simulation. The distribution of the similarity value ij versus I c is shown in Fig. 2(a). Almost all ij are less than 2.0 , indicating that the attainable resolution of our analysis is better than 9.4 Å . This shows that our method can achieve sub-nanometer-resolution three-dimensional imaging of biomolecules with an XFEL. For higher-resolution imaging, we should solve a few problems. It is noted that about 28% of s Q s were found to be orphans, i.e. to belong to no similarity groups of the representatives. Upgrading of the method for the selection of the representatives should reduce the number of orphans. Our algorithm failed to identify 133387 pairs with ij < 1 as belonging to set C. Out of the pairs in set C, 713248 pairs are found to belong to set N A\B\C . P right and P capture are found to be 0.65 and 0.68, respectively. Revision of the automatic similarity detection method, e.g. equation (4), should improve these values. Fig. 2(b) shows the distribution of the number of s Q s classified in each similarity group. In cases where the classification calculation is carried out on a real-time basis, the diffraction pattern collection experiment should be carried out by monitoring such a graph as in Fig. 2(b) until the average becomes larger than 80.

An efficient data flow between SACLA and the K computer
The SACLA facility is located 60 km in a straight line from the K computer. Both facilities are connected via the Wide Area Network, SINET4 (http://www.sinet.ad.jp/index_en.html). The data transfer system is now under construction. In Fig. 3 we show the data flow diagram. First, the diffraction data are saved to a storage device in SACLA in a run data format. Next, s Q s not suitable for analysis are excluded by applying a filtering algorithm. Data sets of s Q s, each with a proper size, are then transferred from SACLA to the K computer in a certain interval by using SINET4, where 10 Gbps bandwidth is reserved from the SACLA facility to the edge node of SINET4. A dedicated network is also in the proposal phase to secure the on-line data-transmission bandwidth. In the K computer, each s Q is then converted into a Fourier-transformed format suitable for subsequent calculation of c ij before the two-step classification computation is executed. During the classification calculations, the temporal results can be monitored remotely from the SACLA beamline endstation so Result of classification of a set of about 1 Â 10 6 diffraction patterns for 70S ribosome obtained by simulation assuming the intensity of incident X-ray is I i = 2.55 Â 10 20 photons pulse À1 mm À2 . (a) Distribution of values of ½ ij ; I c ðk k; Þ, where pairs with I c ðk k; Þ > I c,group = 0.0010 are judged similar and ij , the angle between each incident beam direction for a pair, is the value known from the simulations. (b) Distribution of the number of members in each similarity group.

Figure 3
Schematic diagram showing an efficient data flow between the XFEL facility SACLA and the K computer. that data quality can be diagnosed by the experimentalists. The run data format and the similarity list are implemented on HDF5 (HDF group, http://www.hdfgroup.org/). The complete system of the above data flow will be operational in the near future.

Conclusion
We developed a code with a classification algorithm (Tokuhisa et al., 2012) compatible with data as large as 1 Â 10 6 diffraction patterns. The code is designed so as to be able (i) to finish the whole classification calculation within about 1 h of computation by the K computer, and (ii) to conduct the classification concurrent to the experimental data collection. The benchmark with simulated data demonstrated the speed and flexibility that enables the target experimental scheme. It is shown that our method can achieve a sub-nanometer resolution imaging by the synergistic use of SACLA and the K computer.
We have found a rather large number (about 28%) of diffraction patterns (orphans) which were not classified into any similarity groups. An ad hoc improvement would be to select additional representatives from the orphans, and recalculate grouping for those additional representatives. A more serious improvement would be to re-examine the estimation of the number of representative diffraction patterns by 4/! G and the size of a relatively small set of diffraction patterns (currently 1.5 times the targeted number) from which the targeted number of representatives are selected. We expect that the above change would also contribute to improve the observed P right and P capture . Revision of the automatic similarity detection method is also under consideration for the improvement of these quantities. Use of a highperformance I/O library will reduce the reading and the writing time of the data which occupy half of the execution time in this work.
Recently, illumination of multiple molecules to increase the scattering intensity has been proposed to overcome the low statistics of each diffraction pattern (Oroguchi & Nakasako, 2013). In this case, the analysis of the diffraction patterns becomes more complex, and makes the attribution of diffraction patterns to the structure of each molecule limited. On the other hand, single-particle coherent X-ray imaging, which has been discussed in this paper, has a clear physical relation between the diffraction pattern and the structure of each molecule. The latter has several technological issues to be overcome, such as a low hit rate of particles by the XFEL pulse. One of them is the diagnostics of the data quality. The present study shows that data diagnostics during the data acquisition can be executed by the dedicated code implemented on the state-of-art computation infrastructure.
Part of the results were obtained using the K computer at the RIKEN Advanced Institute for Computational Science (proposal Nos. hp120213 and hp120214).