OH cleavage from tyrosine: debunking a myth

A systematic macromolecular crystallography investigation into the observed electron density loss around the –OH group of tyrosines, as a function of dose at 100 K, is reported. It is concluded that a probable explanation is aromatic ring disordering as opposed to –OH cleavage; occurrence of the latter mechanism is a misconception perpetuated in radiation damage literature, and is unsupported by any observations in radiation chemistry.

scores calculated by EDSTATS (Tickle, 2012) for the 1dwa structure retrieved directly from the PDB and retrieved from PDB_REDO (Joosten et al., 2014) after 10 cycles of rigid body refinement. RSZO is a metric for model precision and is calculated as RSZO = mean(ρobs) / σ(Δρ), in which σ(Δρ) is the standard uncertainty of the 1dwa Fobs − Fcalc map. The percentages of residues within 1dwa with RSZO scores below specific thresholds (between 0.5 and 3) are provided. The RSZO score is unbounded from above, with larger positive values indicating a better fit of the residue to the density (Tickle, 2012).

Figure S1.1. 2mFobs − DFcalc electron density maps and mFobs − DFcalc maps for the 1dwa myrosinase model at different stages in the PDB_REDO re-refinement, centred at residue Tyr-215. All 2mFobs − DFcalc and mFobs − DFcalc maps were generated over the asymmetric unit with FFT, and are contoured at 2 σ (grey) and ±4 σ (green/red) respectively. (a) The original 1dwa model extracted directly from the PDB (with ordered water in red), with the corresponding original electron density maps (maps were extracted from the PDB in .mmcif format, and converted to .mtz format with the CCP4 program CIF2MTZ). In (b), the coordinate model after 10 cycles of rigid body refinement with PDB_REDO has been superimposed on the coordinate model and electron density maps in (a). (c) The re-refined 1dwa model (with ordered water in yellow) and electron density maps extracted from PDB_REDO after 10 cycles of rigid body refinement. In (d), the coordinate model after full model rebuilding with PDB_REDO has been superimposed on the coordinate model and electron density maps in (c). (e) The fully re-refined 1dwa model (with ordered water in pink) and electron density maps extracted from PDB_REDO after full model rebuilding. Full model rebuilding with PDB_REDO resulted in correct placement of both the Tyr-215 aromatic group and the ordered solvent molecules. All subfigures were rendered in PyMOL (www.pymol.org).

Figure S1.2. Specific radiation damage to Tyr-215 in the myrosinase damage series (Burmeister, 2000). (a) Fobs(5) − Fobs(1) difference map, overlaid on the original initial coordinate model in white (PDB accession code: 1dwa). Structure factor amplitudes were retrieved directly from the PDB for the Fobs(5) − Fobs(1) map calculation, with phases derived from the original 1dwa coordinate model. Negative difference density is observable adjacent to Tyr-215 but does not align with the -OH group. (b) Fobs(5) − Fobs(1) difference map, overlaid on the re-refined initial coordinate model in white (PDB accession code: 1dwa) retrieved from PDB_REDO. Structure factor amplitudes were retrieved from PDB_REDO for the Fobs(5) − Fobs(1) map calculation, with phases derived from the re-refined 1dwa coordinate model. Clear negative difference density is observable centred at the position of the Tyr-215 -OH group. In both cases, the Fobs(5) − Fobs(1) maps were generated by FFT within the RIDL pipeline, and are overlaid on the coordinate models at ±5 σ contouring levels in green/red.
Figure S1.3. Dloss damage signature histogram plots over all atoms of selected residue types for (a,b) myrosinase (Burmeister, 2000), (c,d) TRAP (Bury et al., 2016), and (e,f) malate dehydrogenase (Fioravanti et al., 2007). Gaussian kernel density estimates are overlaid on the histogram plots. For each plot, Kolmogorov-Smirnov (KS) test statistics have been calculated to measure the similarity between the two residue types included. Other doses within these damage series exhibit qualitatively similar behaviour (data not shown).

[…] (f) insulin (unpublished data), (g) the C-protein DNA complex, (h) phosphoserine aminotransferase, and (i) acetylcholinesterase. Tyr-OH residues exhibiting hydrogen bond interactions to Glu or Asp carboxyl groups are coloured blue. See main text for the list of original publications for each protein structure. For clarity, only proteins for which the damage series consisted of > 2 higher dose datasets have been included. X-axis doses are those reported originally (see original publications for corresponding dose calculations). The exception is myrosinase, for which doses were originally quoted in units of photons mm⁻²; diffraction weighted doses (DWD) (Zeldin, Brockhauser, et al., 2013) have been calculated in RADDOSE-3D (Zeldin, Gerstel, et al., 2013) for this damage series using the crystal composition (heavy atom content, crystal size) and beam characteristics (energy, flux, exposure time, and collimation) supplied in Burmeister (2000).
Figure S1.5. Relationship between Dloss(atom) and change in Bdamage (Gerstel et al., 2015) relative to the 1dwa structure, for each Tyr -OH atom in the myrosinase structure at each higher dose in the series (a-d). For each atom a and higher dose structure k = 1dwf, 1dwg, 1dwh, 1dwi, the relative change is calculated as […]. To account for non-unity occupancies in the originally deposited data, all atomic occupancies in each coordinate model were set to 1 and a further round of isotropic B-factor refinement was performed in phenix.refine (Adams et al., 2010) prior to calculating Bdamage. The high correlation between Dloss and change in Bdamage is striking, since the two radiation damage metrics are calculated independently. Whereas Bdamage depends on the refined atomic B-factor values of the coordinate model, Dloss is derived directly from Fobs(n) − Fobs(1) difference electron density values.

Supplementary material 2: GH7 protein crystallization and data collection

Crystallization
Glycoside hydrolase family 7 cellulase from Daphnia pulex (DpCel7B) was supplied by the National Renewable Energy Laboratory (NREL), having been overexpressed in a Trichoderma reesei system. A crystal was grown using a hanging-drop crystallization protocol with 2.75 mg/ml protein mixed with 0.1 M monosodium citrate and 0.9 M ammonium sulphate, pH 4. The approximately cuboid-shaped crystal had dimensions of 75 × 75 × 10 µm. Prior to cryocooling, the crystal was soaked for 2 minutes in buffer solution with 30% (v/v) glycerol added as a cryoprotectant, and was then immediately stored in liquid nitrogen.

X-ray data collection
Data were collected at 100 K on beamline I02 at the Diamond Light Source, using an incident wavelength of 0.980 Å (12.7 keV) and a Pilatus 6M detector, positioned 244.9 mm from the crystal throughout data collection. The beam passed through a 200 µm aperture and was slitted to 60 × 120 µm (vertical × horizontal), with an approximately Gaussian profile (vertical × horizontal FWHM: 17.7 × 104.8 µm). The flux at the sample position was determined using a pre-calibrated 500 µm thick silicon PIN photodiode to be ~1.77 × 10¹² ph/s at 100% beam transmission. The crystal was initially oriented with the small crystal dimension (z = 10 µm) parallel to the beam direction. A 2000° continuous sweep of data was collected, consisting of 9999 frames (Δφ = 0.2°) of 0.04 s exposure time each. The beam transmission was held at 25% throughout.
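As a quick consistency check on the collection parameters quoted above, the following minimal sketch recomputes the photon energy from the wavelength, the total rotation and exposure time of the sweep, and the attenuated flux. The function names are illustrative and not part of the original work; the conversion constant hc ≈ 12.398 keV·Å is a standard value, and the linear scaling of flux with beam transmission is an assumption made here.

```python
# Illustrative consistency check of the quoted data collection parameters.
# All names are hypothetical. ASSUMPTION: flux scales linearly with transmission.

HC_KEV_ANGSTROM = 12.398  # h*c in keV·Å (standard conversion constant)


def photon_energy_kev(wavelength_angstrom: float) -> float:
    """Photon energy in keV for a given wavelength in Å."""
    return HC_KEV_ANGSTROM / wavelength_angstrom


def sweep_summary(n_frames: int, delta_phi_deg: float, exposure_s: float,
                  full_flux_ph_s: float, transmission: float) -> dict:
    """Total rotation, total exposure time and attenuated flux for one sweep."""
    return {
        "total_rotation_deg": n_frames * delta_phi_deg,
        "total_exposure_s": n_frames * exposure_s,
        "flux_ph_s": full_flux_ph_s * transmission,
    }


energy = photon_energy_kev(0.980)
summary = sweep_summary(9999, 0.2, 0.04, 1.77e12, 0.25)
print(f"photon energy: {energy:.2f} keV")
print(f"total rotation: {summary['total_rotation_deg']:.1f} deg")
print(f"total exposure: {summary['total_exposure_s']:.2f} s")
print(f"attenuated flux: {summary['flux_ph_s']:.3e} ph/s")
```

Under these assumptions, the 9999 frames span just under the nominal 2000° and the photon energy agrees with the quoted 12.7 keV to the stated precision.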

Data processing
The single 2000° sweep of diffraction data was divided into 11 disjoint 180° wedges (sufficient for completeness due to the C121 space group), each consisting of 900 images. Each wedge of data was integrated using DIALS (Fuentes-Montero et al., 2014), and then scaled and merged in AIMLESS. To obtain an initial set of phases for the first dataset, molecular replacement was performed in PHASER (McCoy et al., 2007), using an identical GH7 family cellobiohydrolase from Daphnia pulex already deposited in the PDB (accession code: 4xnn; resolution 1.9 Å; McGeehan, in preparation) as a search model. The resulting initial coordinate model was then refined using REFMAC5, first with 10 cycles of rigid body refinement, followed by repeated rounds of 10 cycles of restrained and TLS refinement, coupled with manual inspection in COOT. Final protein geometry was assessed with Molprobity4 (Chen et al., 2010).
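The wedge division described above can be sketched as follows. This is a minimal illustration only: the function name is hypothetical, frames are assumed to be numbered from 1, and any frames left over after the last complete wedge are assumed to be discarded (the text does not state how the remainder was handled).

```python
# Sketch: split a continuous sweep into disjoint, fixed-width wedges.
# ASSUMPTIONS: 1-based frame numbering; trailing partial wedge is dropped.

def split_into_wedges(n_frames: int, delta_phi_deg: float, wedge_deg: float):
    """Return a list of (first_frame, last_frame) tuples, one per complete wedge."""
    frames_per_wedge = round(wedge_deg / delta_phi_deg)
    n_wedges = n_frames // frames_per_wedge
    return [
        (i * frames_per_wedge + 1, (i + 1) * frames_per_wedge)
        for i in range(n_wedges)
    ]


wedges = split_into_wedges(n_frames=9999, delta_phi_deg=0.2, wedge_deg=180.0)
for number, (first, last) in enumerate(wedges, start=1):
    print(f"dataset {number}: frames {first}-{last}")
```

With 9999 frames at 0.2° per frame, this yields 11 wedges of 900 images each, matching the numbers given in the text.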
For the model from each later dataset, 10 cycles of rigid body refinement were performed in REFMAC5, using the refined coordinate model derived from the initial dataset coupled with the merged structure factors from the later dataset. This is a standard protocol for refining multi-dataset protein damage series (Southworth-Davies et al., 2007). Since the crystal unit cell dimensions generally increase as a function of dose, rigid body refinement was employed to compensate for the slight changes in unit cell parameters with increasing dose. For the current analysis, no restrained refinement was performed for the models from the higher dose datasets. The Dloss metric is derived from Fobs(n) − Fobs(1) Fourier difference map coefficients, using the phases obtained from the refined coordinate model for the initial dataset; restrained refinement of models for higher dose datasets is therefore not required. Data reduction and refinement statistics for the full damage series are reported in Tables S2.1 and S2.2. Coordinates and structure factor amplitudes have been deposited in the PDB with accession codes: 5mcc, 5mcd, 5mce, 5mcf, 5mch, 5mci, 5mcj, 5mck, 5mcl, 5mcm and 5mcn.

Dose calculations
The diffraction weighted dose (DWD) was calculated using RADDOSE-3D for each dataset in the damage series. The crystal was modelled as a cuboid with dimensions 75 × 75 × 10 µm. The crystal composition was determined using the 4xnn coordinate model, which has an identical sequence. The contribution of heavy elements within the crystallization buffer (0.1 M Na and 0.9 M S) was included in the calculated crystal absorption coefficients. A Gaussian-shaped beam was modelled with parameters as given above. As a result of the high symmetry in the data collection procedure, for dataset n = 1, 2, …, 11 the resulting DWD value was calculated to be 1.11 + 2.16 × (n − 1) MGy. By the end of the final dataset (n = 11) the crystal had absorbed an accumulated dose of DWD = 22.7 MGy.

Table S2.2: Assessment of final macromolecular geometry for the refined coordinate model corresponding to the first dataset, as reported by Molprobity4 (Chen et al., 2010). Since only rigid-body refinement was performed for the higher dose datasets (using the refined initial dataset coordinate model coupled with the merged structure factor amplitudes from each higher dose dataset), the reported geometry statistics are conserved for the higher dose datasets.
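The linear dose relation quoted in the dose calculations above, DWD(n) = 1.11 + 2.16 × (n − 1) MGy, can be tabulated for the 11 datasets with a short sketch. The function name and default arguments are illustrative; the two coefficients are taken from the text, and the sketch does not reproduce the underlying RADDOSE-3D calculation.

```python
# Sketch: evaluate the quoted linear DWD relation for each dataset.
# The coefficients come from the text; the function name is hypothetical.

def dwd_mgy(n: int, first_mgy: float = 1.11, step_mgy: float = 2.16) -> float:
    """Diffraction weighted dose (MGy) for dataset n (n = 1 is the first dataset)."""
    if n < 1:
        raise ValueError("dataset index n must be >= 1")
    return first_mgy + step_mgy * (n - 1)


doses = [dwd_mgy(n) for n in range(1, 12)]
for n, dose in enumerate(doses, start=1):
    print(f"dataset {n:2d}: DWD = {dose:.2f} MGy")
```

For n = 11 the relation gives 22.71 MGy, consistent with the 22.7 MGy quoted for the final dataset.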

Supplementary material 3: Tyr -OH versus -Cζ correlation analysis
To verify whether published reports of negative Fobs(n) − Fobs(1) difference density at a small subset of Tyr residues (Tyr-330 in myrosinase (Burmeister, 2000), Tyr-63 in TRAP (Bury et al., 2016)) were compatible with a model of radiation-induced ring displacement, Dloss values for Tyr -OH have been compared directly with those for the covalently bound Tyr-Cζ atoms. Here, linear regression fitting has been restricted to structures deemed to contain statistically valid sample sizes of atoms (> 60 kDa).
For myrosinase, at the highest dose analysed (Fig. S3.1a) a high positive correlation exists between Tyr -OH and -Cζ Dloss (linear R²: 0.72–0.82 across the dose range), which is comparable to that for both Asp -Cγ & -Oδ1 (Fig. S3.1c) and Glu -Cδ & -Oε1 (Fig. S3.1e) (linear R²: 0.81–0.91 and 0.83–0.91 respectively across the dose range). The observed high correlations for Asp and Glu correspond here to full oxidative cleavage of the carboxylate. The comparably high correlation between Dloss for Tyr-Cζ and Tyr-OH supports a hypothesis of radiation-induced disordering of the overall tyrosyl aromatic ring, which moves as a fixed, covalently bound unit. For explicit cleavage of the phenolic C-O bond, a lower correlation would be anticipated, similar to those reported between the Asp -Cγ & -Cβ (Fig. S3.1b) and Glu -Cδ & -Cγ (Fig. S3.1d) atoms (linear R²: 0.11–0.23 and 0.42–0.59 respectively across the dose range). These reduced correlations are expected since the Asp-Cβ and Glu-Cγ atoms are not predicted to be cleaved during side-chain decarboxylation.
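The linear R² values quoted above are coefficients of determination for a straight-line fit between the Dloss values of two atoms within the same side chain. The following is a minimal sketch, assuming an ordinary least-squares fit with an intercept (for which R² equals the squared Pearson correlation coefficient). The function name and the example numbers are hypothetical and are not data from any of the damage series.

```python
# Sketch: linear coefficient of determination (R^2) between paired Dloss values.
# ASSUMPTION: ordinary least-squares line with intercept, so R^2 = (Pearson r)^2.

def r_squared(x, y) -> float:
    """R^2 of a simple linear regression of y on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("R^2 is undefined when either variable is constant")
    return (s_xy ** 2) / (s_xx * s_yy)


# Hypothetical per-residue Dloss values for two atoms of the same side chain,
# e.g. Tyr -OH (x) and Tyr -Czeta (y); invented numbers for illustration only.
dloss_oh = [0.10, 0.25, 0.40, 0.55, 0.80]
dloss_cz = [0.12, 0.20, 0.45, 0.50, 0.85]
print(f"R^2 = {r_squared(dloss_oh, dloss_cz):.2f}")
```

In this picture, atoms that lose density together as a rigid unit give R² close to 1, whereas atom pairs in which only one atom is affected give low R².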
Similarly, for the TRAP damage series, the Asp -Cγ & -Oδ1 and Glu -Cδ & -Oε1 Dloss behaviour was more correlated than that of Asp -Cγ & -Cβ and Glu -Cδ & -Cγ (Fig. S3.2a-d).
For TRAP, the Tyr -OH and -Cζ behaviour was only weakly correlated across the large dose range studied (1.3–25.0 MGy; Bury et al., 2016) (Fig. S3.2e). We suggest here that the low correlation observed for Tyr is a consequence of the 11-fold symmetry around each TRAP ring. Each of the two TRAP rings within the asymmetric unit contains 11 symmetry-related copies of a single Tyr residue (Tyr-63), all of which are predicted to exhibit similar Dloss values due to the conserved local protein environment (binding interactions, solvent accessibility) of each Tyr-63 residue around a TRAP ring. Consequently, the Tyr scatter plot may not contain an adequate sampling of Tyr residues for linear regression analysis in TRAP.
In contrast, other proteins (malate dehydrogenase (Fioravanti et al., 2007), acetylcholinesterase (Weik et al., 2000), and phosphoserine aminotransferase (Dubnovitsky et al., 2005)) exhibited negligible Dloss correlation between Tyr -OH and -Cζ atoms (Figs. S3.3 & S3.4). However, no Tyr-OH groups were flagged as radiation-sensitive within these structures (Fig. 3, overall Tyr-OH ranks: 42, 36 and 54 for these proteins respectively, with the highest individual Tyr-OH positions being above 290, 138 and 235 at all tested doses). Consequently, a correlation would not be expected between -OH and -Cζ, with noise dominating at low Dloss values. In summary, whereas the correlation between Tyr -OH and -Cζ atom electron density loss is inconsistent between the investigated proteins, in a subset of cases for which negative Fobs(n) − Fobs(1) map peaks have been detected near Tyr-OH groups (myrosinase and TRAP), a positive correlation is present.

Figure S3.1. (a-e) Scatter plots to compare Dloss(atom) behaviour for atoms within the same residue side-chains, for the highest dose myrosinase crystal dataset (Burmeister, 2000). The linear coefficient of determination is computed for each scatter plot.

Figure S3.2. (a-e) Scatter plots to compare Dloss(atom) behaviour for atoms within the same residue side-chains, for dataset 5 (11.6 MGy) of the TRAP crystal damage series (Bury et al., 2016). The linear coefficient of determination is computed for each scatter plot.

(a) R²: 0.21 (b) R²: 0.17

Figure S3.4. The linear coefficient of determination (R²) quantifying the linear correlation between Dloss for selected atom pairs for large protein structures (> 60 kDa). Box plots illustrate the variation in R² value between each dataset for each separate damage series.