Electron Event Representation (EER) data enables efficient cryoEM file storage with full preservation of spatial and temporal resolution

Direct detector device (DDD) cameras have revolutionized electron cryomicroscopy (cryoEM) with their high detective quantum efficiency (DQE) and output of movie data. A high ratio of camera frame rate (frames/sec) to camera exposure rate (electrons/pixel/sec) allows electron counting, which further improves DQE and enables recording of super-resolution information. Movie output also allows for computational correction of specimen movement and compensation for radiation damage. However, these movies come at the cost of producing large volumes of data. It is common practice to sum groups of successive camera frames to reduce the final frame rate, and therefore file size, to one suitable for storage and image processing. This reduction in the camera’s temporal resolution requires decisions to be made during data acquisition that may result in the loss of information that could have been advantageous during image analysis. Here we present experimental analysis of a new Electron Event Representation (EER) data format for electron counting DDD movies, which is enabled by new hardware developed by Thermo Fisher Scientific for their Falcon DDD cameras. This format enables recording of DDD movies at the raw camera frame rate without sacrificing either spatial or temporal resolution. Experimental data demonstrate that the method retains super-resolution information and allows correction of specimen movement at the physical frame rate of the camera while maintaining manageable file sizes. The EER format will enable the development of new methods that can utilize the full spatial and temporal resolution of DDD cameras.

Introduction
Complementary metal oxide semiconductor (CMOS) direct detector device (DDD) cameras for cryoEM provide improved detective quantum efficiency (DQE) compared to other detectors (McMullan et al., 2016). Furthermore, these cameras can record movies of the specimen during irradiation. Movies are output from the detector as raw 'camera frames' (Fig. 1A), with successive frames summed to produce 'exposure fractions' that are saved for image processing (Li et al., 2013). [...] For electron counting, the exposure per frame is limited to one electron for every ~40 to 100 pixels. This low density of electrons per frame allows individual electrons to be detected with a low probability of two electrons impinging on the same region during the recording of the frame, which would lead to undercounting of electrons, a phenomenon known as 'coincidence loss'. Each electron deposits energy into multiple pixels upon hitting the sensor, and consequently the center of the impact event can be localized to a specific region of a pixel in order to allow super-resolution imaging (Li et al., 2013). [...]

The size $S_{\mathrm{opt}}$ of an optimally compressed EER movie in bytes, neglecting any file header information, is therefore given by equation 4 [...], where $E$ is the total electron exposure in the movie in e⁻/pixel and $N_{\mathrm{frames}}$ is the number of camera frames recorded.

The EER format implemented for Falcon cameras uses run-length encoding (RLE) to reduce data size. For each camera frame the pixel distances between detected electrons, in the scanline order in which they are stored in memory, are encoded with a constant word length, $b_{\mathrm{RLE}}$. In the current algorithm, $b_{\mathrm{RLE}}$ was set at 7 bits. The maximum value, $m$, for the given number of bits (i.e. $m = 2^{b_{\mathrm{RLE}}} - 1 = 127$ for $b_{\mathrm{RLE}} = 7$ bits) is used to indicate that no electron was detected within the next $m$ pixels. This scheme does not achieve the optimal data compression and file size described in equation 4, but has the advantage of straightforward image encoding and decoding. The approximate total file size with RLE compression, $S_{\mathrm{RLE}}$, is given by the product of the total electron exposure $E$, the number of pixels $N_{\mathrm{pixels}}$, and the number of bits per electron $b_{\mathrm{RLE}} + 2\log_2(u)$, but with a correction to account for the extra bits needed to represent the situation where no electron was detected after $m$ pixels:

$$S_{\mathrm{RLE}}(p, u, E) = \frac{1}{8}\, E \cdot N_{\mathrm{pixels}} \left( \frac{b_{\mathrm{RLE}}}{1 - (1 - p)^{m}} + 2\log_2(u) \right). \qquad (5)$$

The optimal choice for $b_{\mathrm{RLE}}$ to minimize file size depends on $p$. The use of 7 bits enables small file sizes when typical exposure rates for electron counting are used. The EER format implemented for Falcon cameras uses $u = 4$, meaning physical pixels are divided into 4×4 sub-pixels.

Figure 1D shows typical EER file sizes (50 e⁻/pixel total exposure with 1 Å/pixel) compared to standard image formats, such as MRC image stack files (Cheng et al., 2015). In contrast to the EER files, the MRC files described in the figure have reduced temporal resolution due to averaging of successive frames. Where the example MRC files preserve super-resolution information they use 2×2, rather than 4×4, sub-pixels. When more than ~35 exposure fractions [...] resolution information. The intersection of the EER curve with the conventional fractionation approach curve will occur at a larger number of exposure fractions if a compressed image format is used (e.g. LZW-TIFF). However, the amount of image compression that can be achieved depends strongly on image content and consequently it is difficult to compare these methods analytically. In principle, RLE compression could be applied to conventional movies saved with each exposure fraction consisting of a single super-resolution camera frame. However, the real-time output of EER data from the camera avoids saving extremely large uncompressed intermediate files even temporarily, which would make workflows prohibitively complicated. Lossy compression approaches have also been shown to reduce file sizes when complete preservation of information is not required (Eng et al., 2019). Consequently, conventional files that are smaller than the EER format can be produced, but doing so requires sacrificing temporal or spatial resolution.

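The run-length scheme and the size estimate of equation 5 can be illustrated with a minimal Python sketch. It is not the camera firmware or the on-disk bit layout: symbols are kept as tuples rather than packed bits, the function names are hypothetical, and we assume at most one event per pixel per frame. We also assume that p denotes the probability of an electron event per pixel per camera frame, because the definition of p is not part of the surviving text.

```python
from math import log2

B_RLE = 7                     # run-length word length in bits (value from the text)
M = 2**B_RLE - 1              # 127: reserved word, "no electron in the next M pixels"
U = 4                         # sub-pixel divisions per pixel edge (value from the text)
SUB_BITS = 2 * int(log2(U))   # 4 bits locate the event inside its pixel


def encode_frame(events):
    """Encode one camera frame.

    events: (pixel_index, sub_x, sub_y) tuples sorted in scanline order,
            with at most one event per pixel (assumption of this sketch).
    Returns symbols: (M,) for a skip word, (gap, sub_x, sub_y) for an event.
    """
    symbols, cursor = [], 0              # cursor = next pixel not yet encoded
    for pixel, sub_x, sub_y in events:
        gap = pixel - cursor
        while gap >= M:                  # gap too long for one word: emit skips
            symbols.append((M,))
            gap -= M
        symbols.append((gap, sub_x, sub_y))
        cursor = pixel + 1
    return symbols


def decode_frame(symbols):
    """Invert encode_frame, recovering (pixel_index, sub_x, sub_y) tuples."""
    events, cursor = [], 0
    for symbol in symbols:
        if len(symbol) == 1:             # skip word: advance M pixels, no event
            cursor += M
            continue
        gap, sub_x, sub_y = symbol
        events.append((cursor + gap, sub_x, sub_y))
        cursor += gap + 1
    return events


def encoded_bits(symbols):
    """Bits used: B_RLE per word, plus SUB_BITS for each real event."""
    return sum(B_RLE + (SUB_BITS if len(s) == 3 else 0) for s in symbols)


def rle_size_bytes(p, u, exposure, n_pixels, b_rle=B_RLE):
    """Equation 5: approximate RLE file size in bytes (requires 0 < p <= 1)."""
    m = 2**b_rle - 1
    bits_per_electron = b_rle / (1 - (1 - p)**m) + 2 * log2(u)
    return exposure * n_pixels * bits_per_electron / 8
```

In this model the term 1/(1 − (1 − p)^m) is the expected number of words per electron when the gaps between events are geometrically distributed, which is why sparse frames pay for additional skip words. Evaluating `rle_size_bytes` for several values of `b_rle` at a fixed `p` shows how the best word length depends on the exposure per frame.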
[...] Thermo Fisher Scientific Falcon 3EC or 4 localize electrons with sub-pixel accuracy using a centroiding procedure before electron positions are recorded. As described above, this super-resolution information is preserved in the EER format by sub-dividing each physical pixel into u×u sub-pixels. Because the Nyquist resolution of a camera is given by two times the edge length of a pixel, sub-division of physical pixels by a factor of u reduces the Nyquist resolution limit to 1/u of its original value. Even without sub-pixel localization of electrons, images retain information beyond the Nyquist frequency because the corners of Fourier transforms encode spatial frequencies that are finer than the Nyquist frequency in the x or y direction of the image (Fig. 2A).

We investigated the ability of a Titan Krios electron microscope with a Falcon camera and EER capability to record information beyond the physical Nyquist frequency of the camera sensor. Images of a standard cross-grating with polycrystalline gold were recorded with a physical pixel size of 1.71 Å (Fig. 2B). The Fourier transform of the image shows diffraction peaks that correspond to 2.35 Å, a spatial frequency 1.46× the Nyquist frequency, which corresponds to a resolution of 3.42 Å (Fig. 2C, red circle). Therefore, it is evident that the electron counting algorithm combined with the EER data format enables recording of information beyond the physical Nyquist limit of the camera.
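The arithmetic behind these numbers can be checked with a short Python sketch. The function names are hypothetical, and the corner limit assumes a square pixel grid, so that the diagonal of the Fourier transform reaches √2 times the Nyquist frequency along x or y.

```python
from math import sqrt


def nyquist_resolution(pixel_size):
    """Finest resolvable period along x or y: two pixel edge lengths."""
    return 2.0 * pixel_size


def corner_resolution(pixel_size):
    """Finest period at the corners of the Fourier transform of a square grid."""
    return nyquist_resolution(pixel_size) / sqrt(2.0)


pixel = 1.71                                   # physical pixel size in Å (from the text)
gold_peak = 2.35                               # gold diffraction peak in Å (from the text)
ratio = nyquist_resolution(pixel) / gold_peak  # frequency relative to Nyquist
```

With these values `ratio` rounds to 1.46, as quoted above. The 2.35 Å peak is also finer than the corner limit of about 2.42 Å, so under this simple geometric picture it would not be expected to arise from the corners of the Fourier transform alone.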

To test whether the super-resolution capability of EER files could be applied to biological specimens, we imaged human light-chain apoferritin particles with a calibrated physical pixel size of 1.64 Å and a physical pixel Nyquist resolution of 3.28 Å. Movies were recorded as EER data with a total exposure of ~42 e⁻/Å² on the specimen and a camera exposure rate of 0.63 e⁻/pixel/sec. These movies were then converted to 30 MRC format exposure fractions. 3D reconstruction from 118,766 particle images extracted from 157 movies with a conventional refinement workflow gave a 3D resolution by Fourier shell correlation of 3.3 Å (Fig. 2D, black curve). It should be noted that 3D reconstructions with resolutions close to the Nyquist frequency can suffer from artefacts that limit the ability to resolve their highest-resolution features. Next, the same EER files were converted to movies with 30 fractions but with a pixel size of 0.82 Å (Nyquist resolution 1.64 Å). Electrons were placed on a pixel grid that is 4×4 supersampled from the camera's physical pixel grid. Sub-pixel positions were either chosen randomly or taken from the EER information. Subsequently, the images were Fourier cropped to give an effective 2×2 supersampling of the physical pixel grid. 3D reconstruction from these images following the same workflow used with the conventional image files gave 3D maps with resolutions of 3.1 Å for the random sub-pixel placement (Fig. 2D, blue curve) and 2.7 Å for placement with information from EER (Fig. 2D, red curve). The resolution from the randomized sub-pixel information, 3.1 Å, is notable because it goes beyond the physical Nyquist resolution of 3.28 Å. This effect is due to information past the Nyquist resolution found in the corners of the Fourier transform of the image (Fig. 2A), although improved motion correction in the supersampled images may also improve the map. The resolution from the reconstruction that used sub-pixel information from the EER file was 2.7 Å, 18 bins in Fourier space beyond the physical Nyquist resolution and 13 bins in Fourier space beyond the randomized sub-pixel control. Numerous features in the maps indicate improved resolution where EER sub-pixel information was used (Fig. 2E, right, blue asterisks) compared to where random information was used (Fig. 2E, left, red asterisks).

Intra-fraction motion correction enabled by EER imaging
The ability to fractionate exposures up to the physical frame rate of the camera, without needing to store the data as high frame rate movies, provides the possibility of improved measurement [...] in the applied motion trajectories, third-order B-spline interpolation was used to assign the position of each particle in each camera frame (Fig. 3A, blue line). Three-dimensional reconstruction using just the measured motion from the 30 exposure fractions without interpolation produced a map at 2.10 Å resolution (Fig. 3B, black curve). In contrast, applying interpolated motion at the physical frame rate prior to averaging gave a map at 2.07 Å, which is an improvement of two bins in Fourier space (Fig. 3B, red curve). Beam-induced motion in the early frames of a movie is thought to be one of the primary limits to resolution in cryoEM at present (Henderson, 2018). [...]

The calculation of 3D maps from different exposure fractions described in Fig. 3C shows that it is possible to obtain the highest resolution from a single exposure fraction after pre-exposure of the specimen with 1.4 e⁻/Å². This finding is consistent with the large body of evidence that the earliest part of the exposure, where high-resolution information should be best preserved, suffers from the most beam-induced specimen motion (Henderson, 2018). The position of this optimum indicates that smoother application of the measured particle motion from interpolation has the greatest effect near the beginning of the movie where motion is still large, while in the first 1.4 e⁻/Å² of exposure inaccuracies in the measured motion prevent the smoother application from improving map resolution. This result is particularly encouraging. It suggests that new techniques that are capable of more accurate measurement of beam-induced motion could allow for extraction of high-resolution information from the earliest frames of a movie. EER data, which preserves the full temporal resolution of data acquired with DDD cameras while maintaining manageable file sizes, can allow for development of these improved beam-induced motion correction methods.
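The idea of assigning a position to every camera frame from motion measured on a coarser set of exposure fractions can be sketched as follows. This is an illustration only: it uses a natural cubic spline written in plain Python, which is a swapped-in choice because the boundary conditions of the B-spline interpolation used in this work are not stated here. The function names, the placement of each measured shift at the temporal centre of its fraction, and the extrapolation of the end segments to the first and last frames are all assumptions of the sketch.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys)."""
    n = len(xs) - 1                                   # number of segments
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)                               # second derivatives, 0 at both ends
    if n >= 2:
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n)]
        for k in range(1, n - 1):                     # forward elimination (Thomas algorithm)
            w = h[k] / diag[k - 1]
            diag[k] -= w * h[k]
            rhs[k] -= w * rhs[k - 1]
        m[n - 1] = rhs[-1] / diag[-1]                 # back substitution
        for k in range(n - 3, -1, -1):
            m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]

    def evaluate(t):
        # Pick the segment containing t; points outside the knots reuse an end segment.
        i = min(max(bisect_right(xs, t) - 1, 0), n - 1)
        a, b = xs[i + 1] - t, t - xs[i]
        return ((m[i] * a**3 + m[i + 1] * b**3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    return evaluate


def per_frame_shifts(fraction_shifts, n_frames):
    """Interpolate (dx, dy) shifts measured per fraction to every camera frame.

    fraction_shifts: list of (dx, dy), one per exposure fraction (at least two).
    Time is measured in camera frames; frame f spans [f, f + 1).
    """
    n_fractions = len(fraction_shifts)
    width = n_frames / n_fractions                    # frames per fraction (may be fractional)
    centres = [(k + 0.5) * width for k in range(n_fractions)]
    spline_x = natural_cubic_spline(centres, [s[0] for s in fraction_shifts])
    spline_y = natural_cubic_spline(centres, [s[1] for s in fraction_shifts])
    return [(spline_x(f + 0.5), spline_y(f + 0.5)) for f in range(n_frames)]
```

For a movie like those described in the Data collection section, `per_frame_shifts(shifts_30, 2312)` would return one interpolated shift for each of the 2312 raw frames from the 30 measured values, where `shifts_30` is a hypothetical list of per-fraction shifts for one particle.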

Specimen preparation
Human apoferritin was a gift from Ms. Taylor Sicard and Prof. Jean-Philippe Julien (The Hospital for Sick Children) and was used at 10 mg/mL. Holey gold grids with a regular array of ~2 µm holes were prepared as described previously (Marr et al., 2014). Grids were subjected to 15 sec of glow discharge in air before freezing in liquid ethane with a Gatan CP3 grid freezing device. The grid freezing device chamber was at room temperature and 90% relative humidity, and blotting was done for 10 sec with an offset of -0.5 mm.

Data collection
Images were acquired as described in the main text with a Titan Krios G3 electron microscope from Thermo Fisher Scientific operating at 300 kV and equipped with a Falcon 3EC camera and a prototype EER module (used for intra-fraction motion correction experiments) and later a prototype Falcon 4 camera (used for super-resolution experiments). Automatic data collection was done with the EPU software package. For EER intra-fraction motion correction, 325 movies of human light-chain apoferritin were collected with the Falcon 3EC camera at 75,000× nominal magnification, corresponding to a calibrated pixel size of 1.06 Å. Falcon 3EC movies were recorded simultaneously in both EER format, with 2312 raw frames per movie, and 16-bit MRC format, with 30 fractions per movie. The camera exposure rate and the total exposure of the specimen were 0.80 e⁻/pixel/sec and ~41 e⁻/Å², respectively, with defocus ranging from 0.4 µm to 1.6 µm. Following completion of this aspect of the work, we replaced the Falcon 3EC camera with a prototype Falcon 4 camera, which increased the physical frame rate from 40 to 250 frames/sec. Consequently, for EER super-resolution data, 157 movies were collected on the same microscope but with the prototype Falcon 4 camera. A nominal magnification of 47,000× gave a calibrated pixel size of 1.64 Å. This camera did not allow for simultaneous recording of EER data and conventional movies. After collection, these EER files could be converted to standard [...]

EER image handling
The prototype EER module for the Falcon 3EC camera ran custom firmware with real-time EER encoding, streaming the data to a dedicated computer running the Ubuntu 16.04 operating system. With the Falcon 4 camera, the EER files were stored with the standard Falcon 4 storage infrastructure, which normally records MRC exposure fractionation stacks. Electron detection events were stored with run-length encoding as described in the text of the manuscript. Frames were packed in a BigTIFF compliant file format with a gain reference image stored separately in an MRC file. Information about defects was encoded in the same gain reference with a value of '0'. EER files were decoded using a hybrid CPU/GPU implementation of the decoding algorithm. To utilize sub-pixel information optimally for both super-resolution and non-super-resolution cases, all decoded images were reconstructed on the full 4×4 supersampled image grid and subsequently Fourier-cropped to the desired resolution. For single particle cryoEM, EER files were converted to standard exposure-fractionated image stacks that could be used in standard image processing pipelines. In the final correction of motion for individual particle images, the EER files were decoded with the desired supersampling (i.e. 4×4 oversampling followed by Fourier cropping), image shifts were applied, and exposure weighting was performed as described previously (Rubinstein & Brubaker, 2015). Application of image shifts to data from EER files was done by placing electrons on shift-compensated positions rather than first composing an image and then applying shifts by interpolation in real space or phase changes in Fourier space. The procedure of shifting electron positions prior to image reconstruction is less expensive computationally than image interpolation, and prevents image interpolation artefacts. Efficient gain correction was performed by retrieving the gain correction coefficient from the uncorrected pixel location for each detected electron and applying it as a weighting factor for the contribution of the electron to its shifted position. During these procedures, the individual particle motion trajectories were either smoothed with cubic spline interpolation, or not interpolated as a control, as described in the manuscript.
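The placement of electrons at shift-compensated positions with gain weighting can be sketched in Python as follows. This is not the hybrid CPU/GPU decoder used in this work. The function name, the event tuple layout, the sign convention (the measured shift is subtracted), nearest-sub-pixel rounding, and the choice to skip events that land on defect pixels or outside the grid are assumptions of the sketch. Fourier cropping and exposure weighting are omitted.

```python
from math import floor


def place_events(events, shifts, gain, width, height, u=4):
    """Accumulate events on a u-times supersampled grid at shifted positions.

    events: iterable of (frame, x, y, sub_x, sub_y) in physical pixel units,
            with sub_x and sub_y in range(u).
    shifts: shifts[frame] = (dx, dy) in physical pixels, to be compensated.
    gain:   gain[y][x] coefficient per physical pixel; 0 marks a defect.
    Returns a sparse image: {(col, row): summed weight} on the supersampled grid.
    """
    image = {}
    for frame, x, y, sub_x, sub_y in events:
        weight = gain[y][x]              # looked up at the uncorrected pixel location
        if weight == 0:                  # defect pixel: contributes nothing
            continue
        dx, dy = shifts[frame]
        # Move the event itself instead of interpolating a composed image.
        col = floor(x * u + sub_x - dx * u + 0.5)
        row = floor(y * u + sub_y - dy * u + 0.5)
        if 0 <= col < width * u and 0 <= row < height * u:
            image[(col, row)] = image.get((col, row), 0.0) + weight
    return image
```

Because each event is moved individually, the cost scales with the number of detected electrons rather than with the number of pixels per frame, which is one way to read the statement above that this procedure is cheaper than image interpolation.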