Advances in long-wavelength native phasing at X-ray free-electron lasers

Significant improvements to the current state-of-the-art in phasing strategies in serial femtosecond crystallography are presented and quantified.

IUCrJ (2020). 7, doi:10.1107/S2052252520011379 Supporting information, sup-1

S1. Online data monitoring and offline data conversion
Data from 31 modules of the Jungfrau 16M detector (Mozzanica et al., 2016) were read out by custom detector backend software at a rate of 775 MB/s. After the frame had been assembled from the individual modules, the images were streamed using the zeromq protocol to the data writer (25 Hz, all images), to the online visualization (1 Hz, one out of 25 images) and to the online analysis processes (5 Hz, every 5th image). The writer saved the images in the uncorrected, raw format (exactly as read out from the detector) in hdf5 files, one file per run. Unless interrupted, most of the runs had 10,000 images.
For online visualization the bokeh library [http://bokeh.pydata.org/] was used to display an assembled image in a web browser after detector corrections (pedestal subtraction, gain correction, bad pixel identification and masking). The online visualization functions included maintaining a buffer of 60 images, zoom and pan functions to inspect any area of interest in the image, and reporting of time-based and frame-based image characteristics, such as the integrated intensity of the whole image, of a pixel or of an area of interest.
In the online analysis processes, each received image was analyzed for the presence of Bragg spots using a custom version of the peakfinder8 algorithm from the CrystFEL software suite (White et al., 2012). Images with 15 or more identified spots were regarded as crystal hits. Information about the positions of spots in an image and the number of crystal hits was sent to another online visualization process, which displayed the percentage of hits and the diffraction resolution limit. The peak detection parameters of the peakfinder8 algorithm, such as the number of connected pixels, the intensity threshold and the signal-to-noise ratio, needed to be adjusted for each sample and each set of data collection settings.
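The read-out and streaming rates quoted in this section can be cross-checked with a short calculation. The sketch below assumes that each Jungfrau module delivers 1024 × 512 pixels at 2 bytes per pixel and that MB is read as 2^20 bytes; neither is stated in the text. The function names are illustrative, and the modulo-based selection is only one possible way to derive the 5 Hz and 1 Hz streams from the 25 Hz stream, not necessarily how the backend software does it.

```python
MODULES = 31                      # modules read out (from the text)
FRAME_RATE_HZ = 25                # frames per second sent to the writer (from the text)
PIXELS_PER_MODULE = 1024 * 512    # assumption: module geometry is not given in the text
BYTES_PER_PIXEL = 2               # assumption: 16-bit raw pixel values


def raw_data_rate_mib_per_s(modules=MODULES, rate_hz=FRAME_RATE_HZ):
    """Raw data rate in MiB/s for the given number of modules and frame rate."""
    frame_bytes = modules * PIXELS_PER_MODULE * BYTES_PER_PIXEL
    return frame_bytes * rate_hz / 2**20


def consumers_for_frame(frame_index):
    """Return the consumers that receive the frame with this running index.

    Hypothetical routing rule: every frame goes to the writer, every 5th
    frame to the online analysis and every 25th frame to the visualization.
    """
    consumers = ["writer"]
    if frame_index % 5 == 0:
        consumers.append("analysis")
    if frame_index % 25 == 0:
        consumers.append("visualization")
    return consumers


if __name__ == "__main__":
    print(f"raw data rate: {raw_data_rate_mib_per_s():.0f} MiB/s")
    one_second = [consumers_for_frame(i) for i in range(FRAME_RATE_HZ)]
    for name in ("writer", "analysis", "visualization"):
        print(name, sum(name in c for c in one_second), "frames per second")
```

Under these assumptions one frame of 31 modules is 31 MiB, so 25 frames per second reproduce the quoted 775 MB/s.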
The online hit detection was used to identify and correct misalignments of the jet with respect to the beam position and to estimate the amount of data collected from a given sample. An image was displayed in the online visualization approximately 1 second after it had been recorded by the Jungfrau 16M detector. For the offline data analysis, the raw diffraction images were converted to the corrected, unassembled format by applying gain and pedestal corrections, and the images were then saved as hdf5 files.
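The hit criterion used by the online analysis (15 or more identified spots) and the hit percentage it reports can be sketched as follows. This is a minimal illustration only: the spot counts are assumed to come from an external peak finder such as peakfinder8, which is not reimplemented here, and the function names are hypothetical.

```python
MIN_SPOTS_FOR_HIT = 15  # threshold quoted in the text


def is_hit(n_spots, min_spots=MIN_SPOTS_FOR_HIT):
    """An image counts as a crystal hit if it has at least min_spots spots."""
    return n_spots >= min_spots


def hit_rate_percent(spot_counts, min_spots=MIN_SPOTS_FOR_HIT):
    """Percentage of images classified as hits; 0.0 if no images were analyzed."""
    if not spot_counts:
        return 0.0
    hits = sum(1 for n in spot_counts if is_hit(n, min_spots))
    return 100.0 * hits / len(spot_counts)


if __name__ == "__main__":
    # Made-up spot counts for four images, for illustration only.
    print(hit_rate_percent([0, 3, 15, 40]))
```

Keeping the threshold as a parameter reflects the need to tune the detection settings for each sample.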
The pedestal values were calculated from dark runs, which were taken approximately every 24 hours during the experiment. The conversion factors (gain factors) for each pixel and gain setting, used to transform ADU values to keV, were obtained from the SLS Detector group that manufactured the Jungfrau 16M detector.
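The conversion from raw ADU values to keV described above can be illustrated per pixel. The sketch makes several assumptions that are not stated in the text: the two most significant bits of the 16-bit raw value encode the gain stage and the lower 14 bits the ADC value, the pedestal is a simple mean over dark frames, and the gain factors are expressed in ADU per keV so that the conversion is a division. All names and numbers are illustrative.

```python
# Assumed mapping from the two gain bits of a raw value to the gain stage index.
GAIN_BITS_TO_STAGE = {0b00: 0, 0b01: 1, 0b11: 2}
ADC_MASK = 0x3FFF  # assumed: lower 14 bits hold the ADC value


def pedestal_from_dark(dark_adc_values):
    """Pedestal of one pixel as the mean ADC value over dark frames (assumed estimator)."""
    return sum(dark_adc_values) / len(dark_adc_values)


def raw_to_kev(raw, pedestals, gains):
    """Convert one raw pixel value to keV.

    pedestals and gains hold one value per gain stage for this pixel;
    gains are assumed to be in ADU per keV.
    """
    stage = GAIN_BITS_TO_STAGE[raw >> 14]
    adc = raw & ADC_MASK
    return (adc - pedestals[stage]) / gains[stage]


def correct_pixels(raw_values, pedestal_maps, gain_maps):
    """Apply the per-pixel correction to a flat sequence of raw values."""
    return [
        raw_to_kev(raw, pedestal_maps[i], gain_maps[i])
        for i, raw in enumerate(raw_values)
    ]


if __name__ == "__main__":
    # Made-up calibration for two pixels, for illustration only.
    pedestal_maps = [(1000.0, 0.0, 0.0)] * 2
    gain_maps = [(50.0, 1.0, 1.0)] * 2
    print(correct_pixels([1100, 1000], pedestal_maps, gain_maps))
```

The images stay unassembled in this step: the correction acts pixel by pixel and does not depend on the module geometry.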

S2. Structure determination of thaumatin from data set acquired at 6.06 keV by native-SAD
The structure of thaumatin was determined to 1.95 Å resolution from all available indexed images (271,609) collected at 6.06 keV by the automated pipeline for structure determination implemented in CRANK2 (Skubak & Pannu, 2013). The heavy atom substructure was found by SHELXD (Sheldrick, 2010) after 616 trials searching for 9 sites with a resolution cut-off of 3.5 Å. The CFOM of the best solution was 68.3, CCall was 44.5 and CCweak was 23.8 (Supplementary Figure 3). The combined model building and refinement finished after 212 residues were built in 3 fragments; 97.1 % of the residues were assigned to the sequence (Supplementary Figure 4). The refinement Rwork and Rfree factors of the automatically built thaumatin model were 24.9 % and 29.6 %, respectively. The average peak height of the phased anomalous Fourier difference map for the 17 highest peaks obtained with the fully refined model was 14.6.
The structure of thaumatin from the minimal number of indexed images (50,000) was determined to 2.0 Å resolution using the CRANK2 pipeline. The heavy atom substructure was found by SHELXD after 4171 trials searching for 10 sites with a resolution cut-off of 3.5 Å and the additional parameters ESEL = 1.3 and MIND = (-3.5, 2.8). Lower than default ESEL values usually work better for low-resolution data sets (http://shelx.uni-ac.gwdg.de/~athorn/pdf/thorn2017a.pdf). The MIND parameter defines the minimal distance (in Å) between heavy atom (HA) sites. This MIND setting ensured that each disulphide was treated as a single HA site. The CFOM of the best solution was 56.8, CCall was 39.3 and CCweak was 17.5 (Supplementary Figure 5). The combined model building and refinement finished after 202 residues were built in 1 fragment; 97.1 % of the residues were assigned to the sequence (Supplementary Figure 6). The refinement Rwork and Rfree factors of the automatically built thaumatin model were 27.6 % and 32.7 %, respectively. The average peak height of the phased anomalous Fourier difference map for the 17 highest peaks obtained with the fully refined model was 9.57.
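For orientation, the substructure search settings quoted above correspond to the standard SHELXD keywords SHEL (resolution limits), FIND (number of sites), ESEL and MIND. The sketch below only illustrates this mapping. CRANK2 prepares the SHELXD input itself, so the helper function is hypothetical, and the low-resolution limit of 999 Å is a placeholder assumption that is not given in the text. The trial counts quoted above are the trials at which the best solution was found, so they are not used to set a trial limit here.

```python
def shelxd_search_lines(n_sites, d_min, esel=None, mind=None, d_max=999):
    """Return SHELXD instruction lines for a substructure search.

    n_sites: number of heavy atom sites to search for (FIND)
    d_min:   high-resolution cut-off in Å (second value of SHEL)
    esel:    optional minimum E value (ESEL)
    mind:    optional pair of MIND values
    d_max:   low-resolution limit in Å (placeholder assumption)
    """
    lines = [f"SHEL {d_max} {d_min}", f"FIND {n_sites}"]
    if esel is not None:
        lines.append(f"ESEL {esel}")
    if mind is not None:
        lines.append("MIND " + " ".join(str(value) for value in mind))
    return lines


if __name__ == "__main__":
    # Settings quoted for the 50,000-image thaumatin data set at 6.06 keV.
    for line in shelxd_search_lines(10, 3.5, esel=1.3, mind=(-3.5, 2.8)):
        print(line)
```

Leaving ESEL and MIND optional mirrors the runs in which the defaults were kept.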

S3. Structure determination of thaumatin from data set acquired at 4.57 keV by native-SAD
The structure of thaumatin was determined to 2.65 Å resolution from all available indexed images (242,578) collected at 4.57 keV by the automated pipeline for structure determination implemented in CRANK2. The heavy atom substructure was found by SHELXD after 78 trials searching for 9 sites with a resolution cut-off of 3.5 Å. The CFOM of the best solution was 77.4, CCall was 49.3 and CCweak was 28.1 (Supplementary Figure 7). The combined model building and refinement finished after 211 residues were built in 6 fragments; 92.7 % of the residues were assigned to the sequence (Supplementary Figure 8). The refinement Rwork and Rfree factors of the automatically built thaumatin model were 28.1 % and 35.2 %, respectively. The average peak height of the phased anomalous Fourier difference map for the 9 highest peaks obtained with the fully refined model was 14.5.
The structure of thaumatin from the minimal number of indexed images (20,000) was determined to 2.65 Å resolution using the CRANK2 pipeline. The heavy atom substructure was found by SHELXD after 9738 trials searching for 9 sites with a resolution cut-off of 3.0 Å. The CFOM of the best solution was 41.6, CCall was 27.2 and CCweak was 14.4 (Supplementary Figure 9). The combined model building and refinement finished after 207 residues were built in 5 fragments; 90.3 % of the residues were assigned to the sequence (Supplementary Figure 10). The refinement Rwork and Rfree factors of the automatically built thaumatin model were 31.3 % and 40.7 %, respectively. The average peak height of the phased anomalous Fourier difference map for the 9 highest peaks obtained with the fully refined model was 9.82.

S4. Structure determination of A2A from data set acquired at 4.57 keV by native-SAD
The structure of A2A was determined to 2.65 Å resolution from all available indexed images (199,136) collected at 4.57 keV using the automated pipeline for structure determination implemented in CRANK2 (Online Methods). The heavy atom substructure was found by SHELXD after 1506 trials searching for 16 sites with a resolution cut-off of 3.0 Å. The CFOM of the best solution was 64.9, CCall was 41.4 and CCweak was 23.5 (Supplementary Figure 11). The combined model building and refinement finished after 432 residues were built in 11 fragments; 89.6 % of the residues were assigned to the sequence (Supplementary Figure 12). The refinement Rwork and Rfree factors of the automatically built A2A model were 30.4 % and 33.8 %, respectively. The average peak height of the phased anomalous Fourier difference map for the 17 highest peaks obtained with the fully refined model was 12.0.
The structure of A2A from the minimal number of indexed images (50,000) was also determined to 2.65 Å resolution using the CRANK2 pipeline. The heavy atom substructure was found by SHELXD after 18921 trials searching for 12 sites with a resolution cut-off of 3.5 Å and the additional parameters ESEL = 1.3 and MIND = (-3.5, 2.8), as in the substructure solution protocol for the thaumatin 6.06 keV data set with the minimal number of images. The CFOM of the best solution was 39.1, CCall was 29.4 and CCweak was 10.0 (Supplementary Figure 13). The combined model building and refinement finished after 403 residues were built in 12 fragments; 73.2 % of the residues were assigned to the sequence (Supplementary Figure 14). The refinement Rwork and Rfree factors of the automatically built A2A model were 34.6 % and 43.2 %, respectively. The average peak height of the phased anomalous Fourier difference map for the 17 highest peaks obtained with the fully refined model was 9.48.
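The statistics reported in sections S2 to S4 can be collected in one place to compare the full and minimal data sets. The sketch below tabulates the quoted values and derives two quantities: the gap between Rfree and Rwork, and the residual CCall + CCweak − CFOM. The second quantity assumes that CFOM is the sum of CCall and CCweak, as it is commonly described for SHELXD; this relation is not stated in the text. The variable and function names are illustrative.

```python
# (data set, indexed images, CFOM, CCall, CCweak, Rwork %, Rfree %), as quoted in S2 to S4.
RUNS = [
    ("thaumatin 6.06 keV", 271609, 68.3, 44.5, 23.8, 24.9, 29.6),
    ("thaumatin 6.06 keV", 50000, 56.8, 39.3, 17.5, 27.6, 32.7),
    ("thaumatin 4.57 keV", 242578, 77.4, 49.3, 28.1, 28.1, 35.2),
    ("thaumatin 4.57 keV", 20000, 41.6, 27.2, 14.4, 31.3, 40.7),
    ("A2A 4.57 keV", 199136, 64.9, 41.4, 23.5, 30.4, 33.8),
    ("A2A 4.57 keV", 50000, 39.1, 29.4, 10.0, 34.6, 43.2),
]


def r_gap(r_work, r_free):
    """Difference between Rfree and Rwork in percentage points."""
    return round(r_free - r_work, 1)


def cfom_residual(cfom, cc_all, cc_weak):
    """CCall + CCweak - CFOM, assuming CFOM is defined as their sum."""
    return round(cc_all + cc_weak - cfom, 1)


if __name__ == "__main__":
    print(f"{'data set':<20}{'images':>8}{'Rfree-Rwork':>13}{'CFOM residual':>15}")
    for name, images, cfom, cc_all, cc_weak, r_work, r_free in RUNS:
        gap = r_gap(r_work, r_free)
        residual = cfom_residual(cfom, cc_all, cc_weak)
        print(f"{name:<20}{images:>8}{gap:>13.1f}{residual:>15.1f}")
```

With the values as quoted, the gap between Rfree and Rwork is larger for each minimal data set than for the corresponding full data set. The CFOM residual is zero for five runs and 0.3 for the A2A 50,000-image run, where the quoted values are retained as given.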