Metrics for comparison of crystallographic maps
^{a}Centre for Integrative Biology, Department of Integrated Structural Biology, IGBMC, CNRS UMR 7104–INSERM U964–Université de Strasbourg, 1 Rue Laurent Fries, BP 10142, 67404 Illkirch, France, ^{b}Faculté des Sciences et Technologies, Université de Lorraine, 54506 Vandoeuvre-lès-Nancy, France, ^{c}Lawrence Berkeley National Laboratory, One Cyclotron Road, BLDG 64R0121, Berkeley, CA 94720, USA, ^{d}Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Pushchino 142290, Russian Federation, ^{e}Los Alamos National Laboratory, Los Alamos, NM 87545-0001, USA, and ^{f}Department of Bioengineering, University of California Berkeley, Berkeley, CA 94720, USA
^{*}Correspondence email: sacha@igbmc.fr
Numerical comparison of crystallographic contour maps is used extensively in structure solution and model refinement, analysis and validation. However, traditional metrics such as the map correlation coefficient (map CC, real-space CC or RSCC) sometimes contradict the results of visual assessment of the corresponding maps. This article explains such apparent contradictions and suggests new metrics and tools to compare crystallographic contour maps. The key to the new methods is rank scaling of the Fourier syntheses. The new metrics are complementary to the usual map CC and can be more helpful in map comparison, in particular when only some aspects of the maps, such as regions of high density, are of interest.
Keywords: Fourier syntheses; crystallographic contour maps; map comparison; sigma scale; rank scaling; correlation coefficients.
1. Notation
F(hkl)exp[iφ(hkl)]: crystallographic structure factor with indices hkl.
F_{calc} = F_{calc}exp(iφ_{calc}): structure factors calculated from an atomic model.
F_{model} = F_{model}exp(iφ_{model}): structure factors calculated from an atomic model including modelled contribution from bulk solvent and various scales (Afonine et al., 2013).
N_{x}, N_{y}, N_{z}: grid numbers defining a regular grid in real space.
N_{grid}: total number of grid nodes of the region used for comparison; in particular, N_{grid} = N_{x} × N_{y} × N_{z} if the maps are analyzed for the whole unit cell.
n = (n_{x}, n_{y}, n_{z}): grid node defined by its three integer indices.
ρ(x, y, z): Fourier synthesis calculated at the point (x, y, z) of direct space.
ρ(n) = ρ(n_{x}, n_{y}, n_{z}): Fourier synthesis calculated in grid node n.
ρ_{σ}(n) = ρ_{σ}(n_{x}, n_{y}, n_{z}): Fourier synthesis scaled in σ.
ρ_{d1–d2}: Fourier synthesis calculated with structure factors in the resolution range (d_{1}, d_{2}).
ρ_{complete}, ρ_{incomplete}: Fourier syntheses calculated with a complete set of structure factors up to a given high-resolution cutoff or with some reflections excluded from this set; both the resolution value and the method used to exclude reflections are described explicitly for particular tests.
(F, φ) synthesis: Fourier synthesis calculated with the Fourier coefficients Fexp(iφ).
N_{μ}: number of grid nodes with the value below the cutoff level μ in the Fourier synthesis ρ: ρ(n) < μ; μ is given in the same units as ρ.
η(μ; ρ): quantile rank corresponding to the cutoff level μ for the Fourier synthesis ρ(n).
Q(n): (quantile) rank-scaled Fourier synthesis ρ(n).
P(n): rank-scaled Fourier synthesis ρ(n) with the values flattened outside the peaks.
M(q) = {n: Q(n) < q}: mask defined by the cutoff level expressed in the quantile rank q.
D(q; ρ_{a}, ρ_{b}): discrepancy function between two grid functions, ρ_{a}(n) and ρ_{b}(n), in particular between two Fourier syntheses.
CC(ρ_{a}, ρ_{b}): map correlation coefficient between two grid functions.
CC_{r}(ρ_{a}, ρ_{b}): rank correlation coefficient between two grid functions.
CC_{<qpeak>}(ρ_{a}, ρ_{b}): peak correlation coefficient between two grid functions; selected peaks correspond to the q_{peak} quantile rank.
2. Introduction
Macromolecular crystallography operates with the electron (or neutron) density distribution in crystals. For ideal crystals, this physical entity can be described by a periodic function ρ_{exact}(x, y, z) of three fractional coordinates (x, y, z) and can be represented by a Fourier series composed of an infinite number of complex coefficients F(hkl)exp[iφ(hkl)] (Ewald, 1913),
ρ_{exact}(x, y, z) = κ Σ_{hkl} F(hkl)exp[iφ(hkl)]exp[−2πi(hx + ky + lz)].   (1)
The values of these coefficients, called structure factors, depend on the crystal under study. The scale factor κ, equal to the inverse unit-cell volume, puts function (1) on an absolute scale; alternative scales can also be used. In crystallographic practice, Fourier series contain only a finite set S of terms and are usually calculated on a three-dimensional regular grid N_{x} × N_{y} × N_{z} with the grid nodes described by integer indices n = (n_{x}, n_{y}, n_{z}),
ρ(n) = κ Σ_{hkl ∈ S} F(hkl)exp[iφ(hkl)]exp[−2πi(hn_{x}/N_{x} + kn_{y}/N_{y} + ln_{z}/N_{z})].   (2)
We call these grid functions (2) Fourier syntheses. To be analyzed visually or by a computer program, these mathematical entities are traditionally explored by contouring three-dimensional isosurfaces
ρ(x, y, z) = μ_{n},   (3)
where μ_{n} are empirically chosen values. The result of such contouring is a geometric object that is referred to below as a crystallographic contour map.
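As an illustration of how a finite Fourier series becomes a grid function, the following is a minimal one-dimensional sketch rather than the procedure used in this work. The function name `fourier_synthesis_1d`, the dictionary layout of the coefficients and the sign convention of the exponent are assumptions made for the example.

```python
import cmath
import math


def fourier_synthesis_1d(coefficients, n_grid, kappa=1.0):
    """Evaluate a 1D Fourier synthesis on a regular grid of n_grid nodes.

    coefficients maps an integer index h to a pair (amplitude F, phase phi
    in radians). Only the real part is kept, which assumes that the
    coefficients obey Friedel symmetry (hypothetical input layout).
    """
    rho = []
    for n in range(n_grid):
        total = 0j
        for h, (amplitude, phase) in coefficients.items():
            structure_factor = amplitude * cmath.exp(1j * phase)
            total += structure_factor * cmath.exp(-2j * math.pi * h * n / n_grid)
        rho.append(kappa * total.real)
    return rho


# A Friedel pair h = +1, -1 with unit amplitudes and zero phases
# gives the grid function 2*cos(2*pi*n/N).
rho = fourier_synthesis_1d({1: (1.0, 0.0), -1: (1.0, 0.0)}, n_grid=4)
```

Removing terms from the `coefficients` dictionary mimics an incomplete set S of reflections.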
Crystallographic structure solution typically deals with many maps arising at different stages of the process. Often, one is required to compare maps in order to assess model-building and/or refinement steps. Quantitative comparison of maps calculated for the same crystal, for different crystals and even for different structures is important to evaluate the progress of structure solution and to validate the structure. However, confusion about the three terms given above, electron (or neutron) density distribution, Fourier syntheses and corresponding Fourier contour maps, sometimes leads to apparent contradictions between numerical and visual analyses, as shown below.
As an example, we consider the exact electron density ρ_{pept_a}(n) = ρ_{exact}(n) corresponding to a peptide model (B = 1 Å^{2}) placed in an orthogonal unit cell with unit-cell parameters a = b = 6, c = 3 Å, space group P1. ρ_{pept_b}(n) is its Fourier synthesis at a resolution of 0.5 Å and ρ_{pept_c}(n) is a Fourier synthesis calculated at a resolution of 1.0 Å for the same peptide model but taken with B = 5 Å^{2} and completed by a water molecule with B = 20 Å^{2}.
The maps for ρ_{pept_a}(n) and ρ_{pept_b}(n) shown at 2σ (§3.1.1) are very similar to each other (compare Fig. 1a with Fig. 1b). However, the usual map correlation coefficient
CC(ρ_{a}, ρ_{b}) = Σ_{n}[ρ_{a}(n) − 〈ρ_{a}〉][ρ_{b}(n) − 〈ρ_{b}〉] / {Σ_{n}[ρ_{a}(n) − 〈ρ_{a}〉]^{2} Σ_{n}[ρ_{b}(n) − 〈ρ_{b}〉]^{2}}^{1/2}   (4)
(see Supporting Information^{1} §S1) between ρ_{a}(n) = ρ_{pept_a}(n) and ρ_{b}(n) = ρ_{pept_b}(n) is only 0.90; here, 〈ρ_{a}〉 and 〈ρ_{b}〉 represent the mean values of ρ_{a}(n) and ρ_{b}(n), respectively. Indeed, the contour maps at 1σ (compare Fig. 1d with Fig. 1e) show that ρ_{pept_b}(n) differs significantly from ρ_{pept_a}(n). This reminds us that similarity of two contour maps at some cutoff level does not necessarily imply similarity of the corresponding syntheses.
Note that here we use the coefficient (4) to compare the whole syntheses, for example as in Read (1986) and Lunin & Woolfson (1993), while it can also be used locally (see, for example, Brändén & Jones, 1990; Kleywegt et al., 2004; Rupp, 2006; Tickle, 2012).
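The whole-map coefficient (4) can be sketched in a few lines. This is an illustrative implementation for maps stored as flat lists on the same grid, not the code used for the results reported here. The name `map_cc` and the toy values are assumptions. The example also shows that applying the same permutation of grid nodes to both maps leaves the coefficient unchanged, since only paired values enter the sums.

```python
import math


def map_cc(rho_a, rho_b):
    """Correlation coefficient of two maps sampled on the same grid."""
    if len(rho_a) != len(rho_b) or not rho_a:
        raise ValueError("maps must be non-empty and of equal size")
    n = len(rho_a)
    mean_a = sum(rho_a) / n
    mean_b = sum(rho_b) / n
    numerator = sum((a - mean_a) * (b - mean_b) for a, b in zip(rho_a, rho_b))
    denominator = math.sqrt(
        sum((a - mean_a) ** 2 for a in rho_a)
        * sum((b - mean_b) ** 2 for b in rho_b)
    )
    return numerator / denominator


rho_a = [0.0, 1.0, 4.0, 9.0]
rho_b = [0.0, 2.0, 3.0, 10.0]
cc = map_cc(rho_a, rho_b)

# The same reordering of grid nodes in both maps does not change the CC.
order = [2, 0, 3, 1]
cc_permuted = map_cc([rho_a[i] for i in order], [rho_b[i] for i in order])
```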
Secondly, the traditional choice of a cutoff level in σ (§3.1.1) is often not appropriate for map comparison. The map for ρ_{pept_c}(n) at 2σ (Fig. 1f) shows a much larger volume of the unit cell in comparison with that for ρ_{pept_a}(n) at the same 2σ cutoff level (Fig. 1a). However, the maps look similar when taken at different cutoff values (compare Fig. 1c with Fig. 1a).
Thirdly, the three maps ρ_{pept_a}(n), ρ_{pept_b}(n) and ρ_{pept_c}(n) look similar to each other, while the map CC calculated using (4) is high for one pair of them, CC(ρ_{pept_a}, ρ_{pept_b}) = 0.9, and is low for another, CC(ρ_{pept_a}, ρ_{pept_c}) = 0.6.
In fact, the map correlation coefficient (4) is obtained by comparing two sets of values calculated on the same grid, comparing all these values point by point but with no reference to the position of these points in space (these may even be in a one-dimensional space). However, when we compare two maps visually we look at the shape of one or a few chosen isosurfaces. In other words, these two methods of comparison give different characteristics for different objects related to each other as explained above.
Fig. 2 illustrates a practical example with two protein models available in the PDB (Bernstein et al., 1977; Berman et al., 2000). Here again, the calculated CC values disagree with the visual analysis. The corresponding details are given in §4.2.1.
Crystallographers use contour maps at different contour levels to focus on different aspects of the maps. At high contour levels the most prominent features are shown, while at lower contour levels more details of the electron density are seen. In many cases it is the most prominent features that are most useful to the crystallographer in identifying where atoms are likely to be present in the structure. In other cases, of course, the details of the map are very important in identifying errors in atomic placement and in comparing different maps.
In this article, we focus on a subset of the information in a map, such as the prominent features in the electron density, and suggest new approaches to comparing crystallographic maps. The emphasis in this work is on the shapes of isosurfaces in these maps. These are the shapes that crystallographers normally use to identify the atomic features of structures in crystals.
Suppose we have two functions calculated on the same grid. For each function a mask can be defined by some isosurface, with all the points inside this mask having a value greater than the cutoff associated with the isosurface. We would like to compare the shapes of these masks (isosurfaces). Intuitively, masks containing a different number of grid nodes are different. The question we focus on is how similar are two masks composed of the same number of grid nodes, i.e. covering the same volume of the unit cell. We show below that to answer this question it is convenient to rescale the syntheses in the quantile rank (see §3.1.2) instead of a traditional scaling in σ (see §3.1.1).
After introducing rank scaling, we discuss a way to create a normalized metric useful in the comparison of two masks or a series of masks for various cutoff levels (§3.2). This naturally leads to a use of the Spearman rank correlation (Spearman, 1904; see also, for example, Lehmann & D'Abrera, 1998 and references therein), which is the same as the conventional correlation coefficient calculated for rank-scaled maps (§3.3). Considering only grid nodes with relatively high rank values results in another metric, a peak correlation coefficient (§3.4) that corresponds to a visual comparison of the contour maps and that is based on much of the key structural information in the maps. §4 gives various illustrations where the new metrics complement the traditional map correlation coefficient or explain some of its apparent contradictions with a visual analysis.
Comparison of maps calculated on different grids is outside the scope of this work.
3. Methods
3.1. Scaling of crystallographic Fourier syntheses
3.1.1. Scaling by σ
In macromolecular crystallography, currently the most popular way of scaling crystallographic syntheses is by σ. Sigma-scaled Fourier syntheses are obtained as follows,
ρ_{σ}(n) = [ρ(n) − 〈ρ〉]/σ,   (5)
with
〈ρ〉 = (1/N_{grid}) Σ_{n} ρ(n)   (6)
and
σ^{2} = (1/N_{grid}) Σ_{n} [ρ(n) − 〈ρ〉]^{2}.   (7)
Here, ρ(n) is some initial function, N_{grid} is the number of grid points in the unit cell and 〈ρ〉 is always equal to 0 when the term F_{000} is absent from the Fourier series (2). With such a scaling, the grid function (5) has the properties
(1/N_{grid}) Σ_{n} ρ_{σ}(n) = 0   (8)
and
(1/N_{grid}) Σ_{n} [ρ_{σ}(n)]^{2} = 1.   (9)
Empirically, crystallographers consider values of ρ_{σ}(n) > 1 as a `signal level' at which the structural details are analyzed (values notably above the mean value, i.e. above the value for bulk solvent) and values of ρ_{σ}(n) > 3 as a `strong signal level'.
Another source of confusion comes from the map correlation coefficient (4). In statistics, the correlation coefficient is used to compare two sets of values from related distributions. However, the same formal expression is often used in crystallography, instead of the least-squares metric (Supporting Information §S1), to compare two syntheses defined as vectors in an N_{grid}-dimensional space. We stress that in the current work we do not consider the crystallographic Fourier syntheses as random functions, even though such a consideration has previously been used in a number of projects (see, for example, Luzzati, 1953; Blow & Crick, 1959; Ramachandran & Raman, 1959; Main, 1979 and references therein; Vijayan, 1980; Read, 1986; Lunin, 1989; Terwilliger, 2000; Burla et al., 2010; Lang et al., 2014). In the following, we consider that both the map correlation coefficient (4) and the new metrics are calculated for the whole unit cell. Naturally, they can be calculated locally for any part of the unit cell; in this case, N_{grid} would be the number of grid nodes inside this part.
Since the scaling (5)–(7) is a linear transformation, the correlation coefficient (4) calculated for the ρ_{σ}(n) values coincides with the CC calculated using the original values ρ(n).
While such scaling in σ is convenient to distinguish macromolecular features, it may be misleading when used for visual and numerical comparison of syntheses, as the example in §2 shows (Fig. 1; see also §4.1). The reason for this is that the distribution of the values of the syntheses (Lunin, 1988, 1993; Main, 1990a,b) may be different for the two syntheses. As a consequence, the same cutoff level in σ selects different numbers of grid nodes for these syntheses. Obviously, regions composed of a different number of points (using the same grid) can never be equal.
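The effect described above can be reproduced with a toy calculation. The sketch below is illustrative only: the names `sigma_scale` and `fraction_above` and the two artificial value distributions are assumptions, not data from this work. It scales two grid functions in σ and counts the fraction of nodes above the same 2σ cutoff.

```python
import math


def sigma_scale(rho):
    """Subtract the mean and divide by the root-mean-square deviation."""
    n = len(rho)
    mean = sum(rho) / n
    sigma = math.sqrt(sum((v - mean) ** 2 for v in rho) / n)
    return [(v - mean) / sigma for v in rho]


def fraction_above(rho_sigma, cutoff):
    """Fraction of grid nodes with a value strictly above the cutoff."""
    return sum(1 for v in rho_sigma if v > cutoff) / len(rho_sigma)


# A 'sharp' map: a few strong peaks over a flat background.
sharp = [0.0] * 95 + [10.0] * 5
# A 'smooth' map: values spread uniformly.
smooth = [float(i) for i in range(100)]

sharp_fraction = fraction_above(sigma_scale(sharp), 2.0)
smooth_fraction = fraction_above(sigma_scale(smooth), 2.0)
```

In this toy case the 2σ level selects 5% of the nodes of the sharp map and none of the nodes of the smooth one, so the two selected regions cannot coincide.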
3.1.2. Rank scaling
The map comparison becomes easier if the Fourier syntheses are scaled in quantile ranks or are rank scaled. In image processing, this operation is referred to as histogram equalization (see, for example, Pratt, 1978). This means that for each cutoff value μ we count the number N_{μ} of grid nodes n such that the synthesis value is below it, ρ(n) < μ, and we then calculate the ratio
η(μ; ρ) = N_{μ}/N_{grid}.   (10)
Here, the second argument, ρ, is the Fourier synthesis to be studied and the first argument, μ, is a particular value. In statistics, the value η (10) is called a quantile rank; when multiplied by 100 this gives the percentile rank. The notions of percentile and quantile and the corresponding ranks have recently been used in crystallography by Pozharski (2010), Gore et al. (2012) and Tickle (2012), although for different goals. Previously in crystallography, a scaling in units complementary to the quantile/percentile rank, i.e. in the fractional unit-cell volume covered by the mask ρ(n) > μ, has been used by Vagin (personal communication) and by Lunin and co-workers (Lunin, 1988; Vernoslova & Lunin, 1993).
For a given synthesis ρ, the function (10) increases with μ. This monotonic behaviour permits an easy rank scaling (Appendix A), replacing the value ρ(n) at each point by
Q(n) = η[ρ(n); ρ],   (11)
using η(μ; ρ) (10). This scaling does not change the shape of any isosurface, as all points with the same value of μ have the same value of the new function. Note that in contrast to the rescaling in σ, rank rescaling is a nonlinear transformation.
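A minimal sketch of rank scaling follows, assuming the map is held as a flat list of values. The function name `rank_scale` is hypothetical, and the use of a sorted copy with binary search is one possible implementation choice rather than the procedure of Appendix A. Each value is replaced by the fraction of grid nodes with a strictly lower value.

```python
import math
from bisect import bisect_left


def rank_scale(rho):
    """Replace each value by the fraction of nodes with a strictly lower value."""
    ordered = sorted(rho)
    n = len(rho)
    # bisect_left returns the number of sorted entries strictly below v.
    return [bisect_left(ordered, v) / n for v in rho]


rho = [0.3, -1.2, 5.0, 0.3]
q = rank_scale(rho)

# A monotonically increasing (nonlinear) transformation leaves the ranks unchanged.
q_transformed = rank_scale([math.exp(v) for v in rho])
```

Nodes with equal values receive equal ranks, so the isosurface shapes are preserved.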
Most commonly, macromolecular crystallographers work with syntheses calculated with the coefficients (amplitudes) wF_{obs} or 2mF_{obs} − DF_{calc} (Read, 1986) and scaled in σ. Analyzing these syntheses, at least for the resolutions 1–3 Å at which many structural projects are carried out, the cutoff values μ used for visual interpretation range approximately between 1 and 2σ. The particular choice may depend on the resolution, bulk-solvent content and other factors. Fig. 3 shows that the ranks corresponding to these values vary approximately from 0.85 to 0.95. These model calculations agree with calculations using various experimental data (not shown); in particular, this includes experimental data from PDB entries with low, medium and extremely high solvent content. This also agrees with the previous observation by Ioerger & Sacchettini (2002).
Other scaling methods, e.g. choosing another κ value [for example such that max_{n}ρ(n) = 100 or using a so-called `absolute scale'] or another nonlinear scheme (for example, Bhat, 1988; Lunin et al., 2000) are known, but we will not review this issue here in detail.
3.2. Comparison of two masks
Since the introduction of graphics stations in macromolecular crystallography, syntheses have typically been presented by a single isosurface at a time with the possibility of varying the corresponding cutoff levels. When we compare two syntheses visually, we look at the shape of the masks covered by the corresponding isosurfaces (there are a number of publications on image analysis that discuss the relevant computational procedures; see, for example, Bruckner & Möller, 2010 and references therein). As mentioned above, the quantitative similarity of two masks can be examined most readily when these masks are constructed so that they contain equal volumes. This is a particular advantage of the rank-scaling approach, which naturally leads to equal volumes at given contour levels in different maps. Other one-to-one synthesis-scaling schemes with a similar property (Supporting Information §S3) are less convenient for the goals of the current work.
In order to compare two masks, we start by measuring (calculating) the difference between them. Let Q_{a}(n) and Q_{b}(n) be the rank-scaled versions of the two syntheses ρ_{a}(n) and ρ_{b}(n). For any quantile rank value, 0 ≤ q ≤ 1, the subsets (masks)
M_{a}(q) = {n: Q_{a}(n) < q}, M_{b}(q) = {n: Q_{b}(n) < q}   (12)
contain the same number N_{selected} = qN_{grid} of grid nodes. The difference between these masks may be described by the number N_{diff} of the nodes that belong to one of them and do not belong to the other,
N_{diff} = N[M_{ab}(q)] + N[M_{ba}(q)],   (13)
where
M_{ab}(q) = {n: n ∈ M_{a}(q), n ∉ M_{b}(q)}, M_{ba}(q) = {n: n ∈ M_{b}(q), n ∉ M_{a}(q)}   (14)
and N[M] denotes the number of grid nodes in the set M.
Note that by construction M_{ab}(q) and M_{ba}(q) contain the same number of points. The condition N_{diff} = 0 means that the masks M_{a}(q) and M_{b}(q) coincide. If N_{diff} > 0 then the masks are different, but the value of N_{diff} does not allow judgment of the degree of this difference because the same number of differing points N_{diff} may have a different significance for small and large rank values q.
To put this difference N_{diff} on a scale, we compare M_{a}(q) with a random set M_{random} composed of the same number N_{selected} of grid nodes distributed uniformly in the unit cell and thus containing no structural information. On average, the number of grid nodes of M_{random} that are outside M_{a}(q) is just the number of grid nodes in M_{random} multiplied by the fraction of the cell that is outside M_{a}(q), i.e. by (1 − q),
(1 − q)N_{selected} = q(1 − q)N_{grid}.   (15)
The same estimate is valid for the comparison of M_{random} with M_{b}(q). Based on this, we normalize N_{diff} as
D(q; ρ_{a}, ρ_{b}) = N_{diff}/[2q(1 − q)N_{grid}].   (16)
The calculated values of the normalized function D(q; ρ_{a}, ρ_{b}) at some value of the argument q may be equal to its minimal possible value, zero, when the corresponding masks coincide and may approach one when the two masks are uncorrelated. Since (15) is only a statistical estimate, in practice D(q; ρ_{a}, ρ_{b}) may sometimes happen to be greater than one. We notate (16) as D(q; ρ_{a}, ρ_{b}) and not D(q; Q_{a}, Q_{b}) to stress that this measure can be applied to any two functions calculated on the same grid and not necessarily functions rescaled in some specific way. We call (16) a discrepancy function. Different values of the argument q are useful for obtaining different types of information: high q values are useful for identifying the peaks of the functions (atomic positions or macromolecular chain), while q close to 0.5 is useful for the identification of molecular envelopes (the actual corresponding value of q varies with the fraction of the solvent region).
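The discrepancy function can be sketched as follows. This is a hedged illustration, not the authors' implementation: the names `rank_scale` and `discrepancy` are hypothetical, and the normalization by 2q(1 − q)N_{grid} is assumed from the random-mask estimate given in the text, with the two one-sided counts added together.

```python
from bisect import bisect_left


def rank_scale(rho):
    """Fraction of grid nodes with a strictly lower value, for every node."""
    ordered = sorted(rho)
    return [bisect_left(ordered, v) / len(rho) for v in rho]


def discrepancy(rho_a, rho_b, q):
    """Normalized count of nodes lying in exactly one of the two masks Q < q."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    q_a, q_b = rank_scale(rho_a), rank_scale(rho_b)
    mask_a = {i for i, v in enumerate(q_a) if v < q}
    mask_b = {i for i, v in enumerate(q_b) if v < q}
    n_diff = len(mask_a ^ mask_b)  # symmetric difference of the two masks
    return n_diff / (2.0 * q * (1.0 - q) * len(rho_a))


rho_a = [float(i) for i in range(10)]
d_same = discrepancy(rho_a, rho_a, 0.5)            # identical masks
d_reversed = discrepancy(rho_a, rho_a[::-1], 0.5)  # fully anticorrelated masks
```

The reversed map gives a value above one, which is consistent with the remark that the normalization is only a statistical estimate for uncorrelated masks.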
3.3. Rank correlation coefficient
When calculating the discrepancy function D (16) between two syntheses, we compare masks of equal size (`equivalent masks'), varying the cutoff level at which these masks are selected. To make such comparison easier, we rank-scale the syntheses. When comparing a pair of equivalent masks we check each grid node one by one, identifying whether this grid node is inside only one mask, the other, both or neither.
Alternatively, after rank scaling the two syntheses Q_{a}(n) and Q_{b}(n) we may express their similarity by
CC_{r}(ρ_{a}, ρ_{b}) = CC(Q_{a}, Q_{b}).   (17)
This metric of similarity of syntheses ρ_{a}(n) and ρ_{b}(n) varies from −1 to 1, and in statistics it is known as Spearman's rank correlation coefficient (Spearman, 1904). We may note that (Appendix A)
The key property of the rank correlation coefficient CC_{r}(ρ_{a}, ρ_{b}) is its invariance with respect to scaling of the syntheses ρ_{a}(n) and ρ_{b}(n). As mentioned in §3.1.1, scaling by σ does not change the standard CC(ρ_{a}, ρ_{b}). In particular, CC(ρ_{a}, ρ_{b}) = 1 for all proportional functions, i.e. when ρ_{b}(n) = λρ_{a}(n) for all n. An important advantage of the rank correlation coefficient CC_{r}(ρ_{a}, ρ_{b}) compared with CC(ρ_{a}, ρ_{b}) is that the former is invariant upon any monotonic (and not necessarily linear) rescaling of the syntheses ρ_{a}(n) and ρ_{b}(n). In particular, CC_{r}(ρ_{a}, ρ_{b}) = 1 for a pair of nonproportional functions related by any monotonically increasing function ρ_{b}(n) = f[ρ_{a}(n)].
Note that using CC_{r}(ρ_{a}, ρ_{b}), in contrast to D(q; ρ_{a}, ρ_{b}), applies not only to Fourier maps shown as series of masks but also to any continuous spectrum of colours or intensities (see, for example, Schotte et al., 2003).
As an example, the rank correlation coefficients CC_{r} for the peptide syntheses defined in §2 are given by CC_{r}(ρ_{pept_a}, ρ_{pept_b}) = 0.56 and CC_{r}(ρ_{pept_a}, ρ_{pept_c}) = 0.22, which is more indicative of their difference than the standard map CC values, which are equal to 0.90 and 0.60, respectively. More details of comparison of these syntheses using the discrepancy function, the rank correlation coefficient and other metrics as defined below are discussed in §4.1.
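The invariance of the rank correlation coefficient under monotonic rescaling can be checked numerically. The following sketch assumes maps stored as flat lists; the helper names `rank_scale`, `pearson` and `rank_cc` and the toy values are illustrative assumptions. It computes the conventional coefficient on rank-scaled values and compares it with the conventional coefficient on the original values.

```python
import math
from bisect import bisect_left


def rank_scale(rho):
    """Fraction of grid nodes with a strictly lower value, for every node."""
    ordered = sorted(rho)
    return [bisect_left(ordered, v) / len(rho) for v in rho]


def pearson(x, y):
    """Conventional correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    denominator = math.sqrt(
        sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y)
    )
    return numerator / denominator


def rank_cc(rho_a, rho_b):
    """Conventional coefficient computed on the rank-scaled maps."""
    return pearson(rank_scale(rho_a), rank_scale(rho_b))


rho_a = [0.0, 1.0, 2.0, 3.0, 4.0]
rho_b = [math.exp(v) for v in rho_a]  # monotonic but nonlinear rescaling

cc_standard = pearson(rho_a, rho_b)
cc_rank = rank_cc(rho_a, rho_b)
```

Here the standard coefficient is noticeably below one, while the rank coefficient equals one within rounding error.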
3.4. Comparison of peaks
Syntheses such as wF_{obs} or 2mF_{obs} − DF_{calc} scaled in σ have both positive and negative values. While analysis of negative values may be important (see, for example, Urzhumtsev et al., 1989), often only the regions of positive values are of interest. This is the case for visual analysis and manual model building; for example, the program Coot (Emsley et al., 2010) defaults to showing non-difference σ-scaled maps at μ > 0. However, maps similar in the positive domain may be different in the negative domain. This may give rise to an apparent contradiction: similar-looking maps (inspected in the positive domain only) may have low correlations computed using the entirety of the maps.
Since map regions with high values contain most of the structural information, it is useful to have a way to compare contour maps such that (i) differences between low values of the synthesis should not play a role and (ii) if a high value in one synthesis corresponds to a low value in another synthesis, the desired metric should not depend on the exact value of the lower value.
For example, for structures with the most frequent percentage of bulk solvent, the separation of positive and negative values in σ-scaled maps roughly corresponds to half of the syntheses, i.e. to the quantile-rank cutoff q = 0.50. When comparing the top halves of the rank-scaled syntheses Q_{a}(n) and Q_{b}(n), we exclude from comparison all grid points for which the values in both syntheses are low, defining the set of remaining grid nodes as
Ω_{50} = {n: Q_{a}(n) ≥ 0.50 or Q_{b}(n) ≥ 0.50}.   (19)
Similarly, to effectively compare regions with high density (near peaks in the map) corresponding to a higher quantile rank value 0.5 < q_{peak} < 1.0, we define
Ω_{qpeak} = {n: Q_{a}(n) ≥ q_{peak} or Q_{b}(n) ≥ q_{peak}}.   (20)
We then flatten the syntheses values in the Ω_{qpeak} points if these values are below q_{peak} for one of the syntheses,
P_{a}(n) = max[Q_{a}(n), q_{peak}], P_{b}(n) = max[Q_{b}(n), q_{peak}],   (21)
and finally calculate
CC_{<qpeak>}(ρ_{a}, ρ_{b}) = Σ_{n ∈ Ω_{qpeak}}[P_{a}(n) − 〈P_{a}〉][P_{b}(n) − 〈P_{b}〉] / {Σ_{n ∈ Ω_{qpeak}}[P_{a}(n) − 〈P_{a}〉]^{2} Σ_{n ∈ Ω_{qpeak}}[P_{b}(n) − 〈P_{b}〉]^{2}}^{1/2}.   (22)
Here,
〈P_{a}〉 = (1/N_{Ω,qpeak}) Σ_{n ∈ Ω_{qpeak}} P_{a}(n), 〈P_{b}〉 = (1/N_{Ω,qpeak}) Σ_{n ∈ Ω_{qpeak}} P_{b}(n)   (23)
and N_{Ω,qpeak} is the number of grid nodes in Ω_{qpeak} defined by (20). For example, a q_{peak} value equal to 0.50 defines CC_{50} and a q_{peak} value equal to 0.90 defines CC_{90}. As previously, the sums in (22) exclude all grid nodes in which both syntheses have values lower than the chosen threshold, indicating that we are not interested in comparison of syntheses at these points.
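The peak correlation coefficient can be sketched from the verbal description above. The code is an assumption-laden illustration rather than the authors' implementation: the names `peak_cc`, `pearson` and `rank_scale` are hypothetical, nodes are kept when at least one rank is at or above q_{peak} (the choice of a non-strict inequality is an assumption), and lower values at kept nodes are flattened to q_{peak}.

```python
import math
from bisect import bisect_left


def rank_scale(rho):
    """Fraction of grid nodes with a strictly lower value, for every node."""
    ordered = sorted(rho)
    return [bisect_left(ordered, v) / len(rho) for v in rho]


def pearson(x, y):
    """Conventional correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    denominator = math.sqrt(
        sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y)
    )
    if denominator == 0.0:
        raise ValueError("one of the flattened maps is constant")
    return numerator / denominator


def peak_cc(rho_a, rho_b, q_peak):
    """Correlation over nodes where at least one rank reaches q_peak."""
    q_a, q_b = rank_scale(rho_a), rank_scale(rho_b)
    kept = [i for i in range(len(q_a)) if q_a[i] >= q_peak or q_b[i] >= q_peak]
    # Flatten values below the threshold so their exact size plays no role.
    p_a = [max(q_a[i], q_peak) for i in kept]
    p_b = [max(q_b[i], q_peak) for i in kept]
    return pearson(p_a, p_b)


# Two toy maps that agree in their top half and disagree in their bottom half.
rho_a = [float(i) for i in range(20)]
rho_b = [float(9 - i) for i in range(10)] + [float(i) for i in range(10, 20)]

cc_all = pearson(rho_a, rho_b)
cc_50 = peak_cc(rho_a, rho_b, 0.5)
```

In this toy case the whole-map coefficient is about 0.75, whereas the coefficient restricted to the top half equals one within rounding error.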
3.5. Practical applications
Depending on the particular problem, different tools are useful to compare crystallographic Fourier syntheses and the corresponding contour maps.
Naturally, when the similarity of three-dimensional functions (for example, crystallographic Fourier syntheses) is analyzed, for example when these functions are used to extract the phase values of corresponding Fourier coefficients, the traditional map correlation coefficient (4) is still a good metric.
However, in a major part of crystallographic projects only the Fourier contour maps for positive cutoff values (in σ-scaled syntheses) are used for visual inspection of maps. Moreover, for syntheses at a resolution of 1–3 Å the most frequently used cutoff levels of 1–2σ correspond to rank values q of as high as 0.85–0.95. To accompany the traditional visual analysis, we suggest using the coefficient CC_{90} as a rule of thumb and switching to CC_{95}, using higher rank values, in the case of a larger fraction of bulk solvent, higher map resolution or smaller B factors, and switching to CC_{85} or CC_{80} in the opposite situations. The CC_{50} may be used to characterize the similarity of isosurfaces roughly corresponding to molecular masks for structures with typical values of the bulk-solvent fraction.
Use of the coefficient CC_{r} may be advised when the whole set of isosurfaces, including those for negative peaks, are studied. The discrepancy function D(q; ρ_{1}, ρ_{2}) completes this toolset when more detailed information is required.
§4 below provides examples of applications of the new correlation coefficients to macromolecular diffraction data. All of these applications confirm that the new metrics reflect important synthesis details that the standard CC does not fully consider. Moreover, in some cases they explain an apparent disagreement between CC and visual map analysis.
With regard to an appropriate visual comparison of syntheses, we suggest rank-scaling them first and selecting the same cutoff value for the visualization of each. Alternatively, the syntheses can be taken on their initial scales (for example in σ) with the cutoff levels selected from equalization of the corresponding rank values as described in (28) and (29) in Appendix A.
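The second option, keeping each synthesis on its own scale and choosing cutoff levels of equal rank, might be sketched as follows. This is not the procedure of Appendix A, which is not reproduced here. The function name `cutoff_for_rank` and the rounding of q·N_{grid} to the nearest integer are assumptions made for the example.

```python
def cutoff_for_rank(rho, q):
    """Value of the map below which approximately a fraction q of nodes lies."""
    if not 0.0 <= q < 1.0:
        raise ValueError("q must satisfy 0 <= q < 1")
    ordered = sorted(rho)
    index = min(round(q * len(ordered)), len(ordered) - 1)
    return ordered[index]


# Two toy maps on different scales; the second is a nonlinear rescaling of the first.
rho_a = [float(i) for i in range(10)]
rho_b = [v ** 3 for v in rho_a]

level_a = cutoff_for_rank(rho_a, 0.8)
level_b = cutoff_for_rank(rho_b, 0.8)

# Nodes at or above the respective cutoffs: equal volumes by construction.
mask_a = {i for i, v in enumerate(rho_a) if v >= level_a}
mask_b = {i for i, v in enumerate(rho_b) if v >= level_b}
```

The two cutoff levels differ numerically, yet they select the same number of grid nodes, which is the condition needed for comparing the shapes of the masks.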
4. Examples, applications and results
4.1. Peptide model data
We first apply the new metrics to the syntheses ρ_{pept_a}(n), ρ_{pept_b}(n) and ρ_{pept_c}(n) defined in §2 for a simulated peptide crystal. For the very sharp electron-density distribution ρ_{pept_a}(n) corresponding to a crystal with very few atoms, the rank scale is lower than that for the macromolecular syntheses at usual resolutions of 1–3 Å. In particular, for ρ_{pept_a}(n) the value q = 0.80 corresponds to a zero cutoff level in σ, the value q = 0.95 corresponds to 0.6σ and q = 0.99 corresponds to 2.2σ. (Fig. 3 reminds us that for typical macromolecular syntheses the value 0σ corresponds to the range q = 0.40–0.60, the value 1σ corresponds to the range q = 0.85–0.90 and 2σ to the range q = 0.90–0.95.)
For the exact electron-density distribution ρ_{pept_a}(n) and the corresponding synthesis ρ_{pept_b}(n) at a resolution of 0.5 Å, the rank correlation coefficient CC_{r} is lower than the standard map CC (Table 1). This means that for most cutoff levels the masks in the 0.5 Å resolution synthesis differ significantly from those in the exact electron density. Figs. 1(d) and 1(e) provide an example. The coefficient CC_{90}(ρ_{pept_a}, ρ_{pept_b}) is above 0.80, indicating that the peaks (their position and shape) around atomic positions are more or less conserved.

Both the CC and CC_{r} correlation coefficients for ρ_{pept_a}(n) with ρ_{pept_c}(n) are lower than for the comparison of ρ_{pept_a}(n) with ρ_{pept_b}(n); this is owing to the lower resolution of ρ_{pept_c}(n) and the presence of an additional atom in the crystal. Neither of these values indicates similarity of the contour maps showing peaks above the level for the water molecule (Figs. 1a and 1c), while the CC_{95} does.
The Supporting Information (§S2) contains another example built on the basis of this peptide model; this example is more mathematical and illustrates comparison of grid functions by different correlation coefficients in a more transparent way. These results confirm that the new metrics describe the information contained in the crystallographic contour maps much better than the traditional metrics.
4.2. Incomplete low-resolution data sets
4.2.1. Explaining an apparent contradiction between low correlation coefficients and similar contour maps
A model F_{calc}exp(iφ_{calc}) Fourier synthesis (referred to as ρ_{incomplete}) computed for PDB entry 1nh2 by (2) using reflection indices from the deposited data set (Bleichenbacher et al., 2003; highest resolution 1.9 Å; data completeness 95%) shows part of the structure very poorly (Fig. 2a). Fig. 2(b) shows a model Fourier synthesis ρ_{complete} calculated with the same coefficients using all theoretically possible reflections up to 1.9 Å resolution. The CC calculated using (4) between the two syntheses is 0.70. Since both syntheses were calculated with the model data, the only source of difference is the missing reflections, essentially the lowest resolution reflections (there are 300 reflections missing from 408 with resolution below 10 Å; all 59 reflections with resolution below 20 Å are missing).
A similar comparison of ρ_{incomplete} with ρ_{complete} for another test case (PDB entry 3cr1; MacElrevey et al., 2008; highest resolution 2.25 Å; data completeness 98%) yields an even lower map CC = 0.64, which one would expect to be reflected by a larger difference between the two maps. This low CC is owing to missing only 2% of the reflections (there are 116 reflections missing out of 251 at a resolution below 10 Å and 32 reflections out of 42 at a resolution below 20 Å). However, the contour map obtained with the incomplete data set is perfectly interpretable for the whole molecule and is very similar to the map calculated with the complete set of reflections (compare Fig. 2d with Fig. 2c). This illustrates that the map CC is not necessarily a good predictor of the visual similarity of maps either from its value or when comparing different pairs of maps.
The rank correlation coefficient CC_{r} (18) is 0.30 for 1nh2 and just 0.01 for 3cr1 and is even lower than the values of the standard map CC. It shows that in this case of missing low-resolution data most of the masks are severely changed compared with the corresponding masks in ρ_{complete} (see also Urzhumtsev, 1991; Urzhumtseva & Urzhumtsev, 2011).
The peak correlation coefficient CC_{90} considers only the part of the map in the quantile rank greater than 0.90 and gives different information. Its value is 0.67 for 1nh2 and 0.83 for 3cr1 and shows that the peaks are conserved much better for 3cr1, agreeing with the visual analysis. This relationship is not shown by either the standard map CC or the rank CC_{r}. Fig. 4(a) expands on this calculation of CC_{90} by showing the discrepancy function D(q) for these two comparisons. It can be seen that for most rank values q the contours are quite different in both cases, D(q) ≃ 1, that for high q values such as 0.90 they are equally similar and for very high values such as q ≃ 0.95 they are more similar for 3cr1.
4.2.2. Effect of low-resolution incompleteness on crystals with various solvent contents
The examples in §4.2.1 illustrate the effect of lowresolution data incompleteness. It seemed possible that the strength of this effect might depend on the fraction of bulk solvent in the crystal. We made a comparative analysis considering three cases of bulksolvent content: near the very common value of 50% (PDB entry 1zud; Lehmann et al., 2006; solvent content 0.47), very high (PDB entry 1q09; Changela et al., 2003; solvent content 0.84) and very low (PDB entry 1ous; Loris et al., 2003; solvent content 0.24).
For each of these structures, we calculated a complete set of structure factors F_{calc}exp(iφ_{calc}) from the atomic model at a resolution of 2 Å. We call the Fourier synthesis calculated with these structure factors ρ_{2–∞}(n) = ρ_{complete}(n). We also calculated another Fourier synthesis F_{calc}exp(iφ_{calc}) in which all of the structure factors at a resolution outside the range 2–10 Å were excluded. We call this synthesis omitting lowresolution data beyond 10 Å ρ_{2–10}(n) = ρ_{incomplete}(n).
Both the conventional CC and the rank CC_{r} comparing ρ_{2–∞}(n) and ρ_{2–10}(n) decrease with increasing volume of the bulk-solvent region in these cases (Table 2). Note that for 1ous, with an extremely low bulk-solvent content, all of the maps are well conserved.

The variation in CC_{r} is more significant; in particular, its value close to zero for 1q09 means that for these data most of the masks changed, information that is difficult to extract from the CC value of above 0.7. At the same time, the peaks are well conserved for all three structures; Fig. 5 gives an example for 1zud. The larger the bulk-solvent content, the higher the quantile rank corresponding to the highest value of the peak correlation coefficient (Table 2; Fig. 4b).
The situation is quantitatively similar when we compare the corresponding maps ρ_{4–∞}(n) = ρ_{complete}(n) and ρ_{4–10}(n) = ρ_{incomplete}(n) calculated with data at lower resolution, in the ranges from 4 Å to infinity and from 4 to 10 Å, respectively (Table 2).
4.3. Effect of data-resolution cutoff
Intuitively, it is clear that excluding high-resolution data changes the maps in a different way than excluding low-resolution data. It is easy to illustrate this using the new metrics.
To do so, for each of the three structures described in §4.2 we calculated the F_{calc}exp(iφ_{calc}) syntheses ρ_{2–∞}(n) and ρ_{4–∞}(n) with the complete data sets at resolutions of 2 and 4 Å, respectively, and compared them. The map CC values CC(ρ_{2–∞}, ρ_{4–∞}) are relatively high; for example, for 1q09 this coefficient is as high as 0.88. This number shows some difference in the maps at such high- and low-resolution cutoffs; however, one might intuitively expect a much larger difference. Indeed, the rank CC_{r}(ρ_{2–∞}, ρ_{4–∞}) is much lower for 1q09, being equal to 0.44 and showing that the maps are substantially different.
As expected, the peak correlation coefficients for high rank values q are low (see, for example, CC_{95} and CC_{99}) since the peaks are merged in the 4 Å resolution maps in comparison with the 2 Å resolution maps. The close-to-zero values of these coefficients are more intuitive than the value of the map CC(ρ_{2–∞}, ρ_{4–∞}) given above.
At the same time, some peak correlation coefficients are relatively high, e.g. CC_{80}(ρ_{2–∞}, ρ_{4–∞}) = 0.85 for 1q09. The corresponding rank value corresponds well to that defining the molecular region (see also Fig. 5) and shows that the molecular masks are less affected by excluding the high-resolution data. For the 1ous data, the molecule occupies practically the whole unit cell (and simply the whole unit cell if structural waters are included), and all peak correlation coefficients for it are low, showing changes in the maps at all cutoff levels.
Thus, using CC_{r} and the rank correlation coefficients may illustrate features that are difficult to see when referring only to the standard map CC (4).
4.4. Effect of excluding reflections for cross-validation
§4.2 shows that the loss of a relatively small number of low-resolution reflections (as few as 2%) can result in significant changes in the Fourier contour maps. On the other hand, the test data set (Brünger, 1992), typically containing 5–10% of the total number of reflections, is purposely excluded from all calculations; this should be the case for all steps including, formally speaking, the calculation of contour maps (although the latter is not always the case in practice). These data are used for validation (Brünger, 1992) and to estimate statistical parameters (Lunin & Skovoroda, 1995; Pannu & Read, 1996; Murshudov et al., 1997). In general, the reflections for the test set are chosen randomly and uniformly across reciprocal space.
There is an old and frequently asked question whether excluding such reflections noticeably distorts the Fourier contour maps. We do not analyze this question in detail here, but simply illustrate the effects for a typical protein structure under typical conditions. To do so, we used the IF2 structure that we recently solved (Simonetti et al., 2013; PDB entry 4b3x). The corresponding crystals belonged to space group P2_{1}2_{1}2_{1}, with unit-cell parameters a = 45.42, b = 61.46, c = 162.40 Å. The experimental data set is complete to 2 Å resolution (with only two low-resolution reflections missing); bulk solvent occupies approximately 50% of the unit cell. The R and R_{free} values calculated by PHENIX (Adams et al., 2010) are less than 0.18 and 0.22, respectively, showing that the structure factors F_{model}exp(iφ_{model}) calculated from the atomic model including the correction from the bulk solvent [Jiang & Brünger (1994), with the improvements described by Afonine et al. (2013)] reproduce the experimental data well. Thus, we used the phase values φ_{model} as the best possible approximation to the unknown values to be associated with the experimental structure-factor amplitudes F_{obs}.
We calculated a series of Fourier syntheses at a resolution of 2 Å with coefficients F_{obs}exp(iφ_{model}), with the fraction of randomly excluded reflections ranging between 5 and 10%, as is routinely undertaken for test-set reflections. Each of these syntheses was compared with a synthesis calculated with the complete data set. The CC between them remained high, i.e. above 0.90, even when the test set contained up to 20% of the data. However, the peak correlation coefficients CC_{50}–CC_{80} indicated non-negligible map changes when 10% of the data were excluded. The maps showed significant noise at the rank value q = 0.80 (roughly 0.4σ for this synthesis) and incorrect density for a few weakly defined side chains. We note that the molecule occupies approximately half of the unit cell (q = 0.50). The differences resulting from the exclusion of 10% of reflections are more significant than the differences owing to experimental errors in amplitudes, as can be seen from comparison with the maps calculated with coefficients F_{model}exp(iφ_{model}) (Table 3). Overall, maps obtained with the model data F_{model}exp(iφ_{model}) showed behaviour similar to that of the F_{obs}exp(iφ_{model}) maps.
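The effect of removing a random fraction of Fourier coefficients can be explored on a toy map with a few lines of NumPy. This is a minimal sketch under simplifying assumptions (space group P1, Fourier coefficients taken directly from the grid, a random test map); the helper names `friedel_mate` and `exclude_random_reflections` are hypothetical and do not correspond to any PHENIX function.

```python
import numpy as np

def friedel_mate(a):
    """Return b with b[h, k, l] = a[-h, -k, -l] (indices modulo the grid size)."""
    return np.roll(a[::-1, ::-1, ::-1], shift=1, axis=(0, 1, 2))

def exclude_random_reflections(rho, fraction, seed=0):
    """Zero a random fraction of Fourier coefficients, Friedel mates together."""
    rng = np.random.default_rng(seed)
    F = np.fft.fftn(rho)
    r = rng.random(F.shape)
    u = np.minimum(r, friedel_mate(r))         # identical for h and -h
    # P(min of two uniform numbers < t) = 1 - (1 - t)^2, set equal to `fraction`.
    threshold = 1.0 - np.sqrt(1.0 - fraction)
    keep = u >= threshold
    keep[0, 0, 0] = True                       # always keep F(000)
    return np.fft.ifftn(F * keep).real, keep

def map_cc(x, y):
    """Standard map correlation coefficient over all grid nodes."""
    return np.corrcoef(x.ravel(), y.ravel())[0, 1]

rng = np.random.default_rng(1)
toy = rng.random((24, 24, 24))                 # hypothetical test map
rho_work, keep = exclude_random_reflections(toy, fraction=0.10)
cc = map_cc(toy, rho_work)
print(f"excluded fraction: {1.0 - keep.mean():.3f}, map CC: {cc:.3f}")
```

Excluding Friedel mates together keeps the truncated synthesis real. For real data the selection would also have to respect the crystallographic symmetry, which this sketch ignores.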

Summarizing, we suggest that an analysis of the rank and peak correlation coefficients could be used as a routine tool for identifying a suitable fraction of reflections for a test set in Fourier syntheses, even when this set has already been assigned. A synthesis may be calculated with the working set of reflections and with the full available data set, and if the rank or peak correlation coefficients between these maps are low (a more systematic analysis is probably required to define appropriate critical values), the test data set might be reduced for further calculations by reassigning, also randomly and uniformly, some reflections back to the working set. As this example shows, the usual map CC alone may not be sufficiently informative.
4.5. Bulk-solvent contribution
It is widely accepted that using a bulk-solvent correction is vital in order to properly include low-resolution data in the structure-solution process (see, for example, Phillips, 1980; Fenn et al., 2010; Afonine et al., 2013 and references therein). However, the influence of the bulk-solvent correction on Fourier syntheses has been less discussed.
To analyze the direct effect of the bulk-solvent contribution on the Fourier synthesis, complementary to the synthesis with {F_{model}exp(iφ_{model})} for the IF2 model (§4.4), we calculated another synthesis with the structure factors {F_{calc}exp(iφ_{calc})} without a bulk-solvent correction. The data sets were complete at the resolution of 2 Å. As mentioned above, the first data set, including the bulk solvent, reproduces the experimental data quite well.
The map CC between the two syntheses, equal to 0.89, indicates their high similarity. However, the rank correlation coefficient CC_{r} of 0.62 shows that in fact the changes in the map owing to unmodelled bulk solvent are not negligible. This means that ignoring a bulk-solvent correction when modelling the `experimental syntheses' may result in maps that differ from the correct maps and therefore may lead to wrong or unjustified conclusions. In particular, such data are not recommended for analysis of molecular envelopes since they may be mostly affected by this improper modelling (Table 4). At the same time, such simulated syntheses can be successfully used when studying only the structural details since CC_{80}–CC_{95} indicate very high similarity of the peaks.

Comparison of the corresponding syntheses calculated at a resolution of 3 Å gives values comparable with those for the 2 Å resolution syntheses. However, the peak correlation coefficients for the rank q ≥ 0.9 are lower. For example, the coefficient CC_{99}, corresponding roughly to the 3σ cutoff level, decreases from 0.95 at 2 Å to 0.86 at 3 Å. This indicates that at lower resolution limits the unmodelled bulk-solvent contribution may distort not only the molecular envelopes but also the peaks of the syntheses.
5. Discussion
The several examples presented in this work show that the traditional map CC does not always correspond well to the similarity of or the difference in two Fourier syntheses based on visual examination. Approaches are presented to address this problem. They are based on the concept of a rank scaling of the syntheses. With such a scaling, regions selected with the same cutoff level contain the same number of grid nodes and the number of grid nodes in common is a useful measure of the similarity of the maps at that cutoff level.
The rank correlation coefficient CC_{r} is calculated as a correlation of the rank-scaled syntheses instead of the initial values ρ(n), for example those in σ. Both CC and CC_{r} are equal to 1 when the values of the two syntheses are related by a linear transformation. However, in contrast to CC, CC_{r} is equal to 1 also when the values of the syntheses are related by a nonlinear monotonic transformation; here, the maps are exactly the same but correspond to different cutoff levels on the original scales.
To accompany traditional visual analysis, we suggest using the peak correlation coefficients, in particular CC_{90}, as a rule of thumb, adjusting the peak level to particular situations and problems. To compare molecular masks or peaks in the low-resolution maps, the CC_{50} may be more appropriate. The discrepancy function D(q; ρ_{a}, ρ_{b}) compares the selected regions (masks) by counting the number of grid nodes in complementary regions, regardless of the exact values of the syntheses in these nodes.
The computational tools described here may be applied to answer additional questions to those that we have illustrated. The new coefficients may be calculated not in the whole unit cell but locally in a given region. With the peak correlation coefficient one may compare syntheses previously difficult to compare numerically, such as the usual σ_{A} synthesis and a difference synthesis. These tools may be used, in the case of comparing several maps, to select the one for which the corresponding contour maps correspond better to a control map. Naturally, the choice of the map for comparison is important and should be considered for each particular project.
The developed metrics can also be applied to compare maps corresponding to different crystals or to noncrystallographic objects, for example reconstructed images. The only requirement is that the compared parts of the images are of the same size and the maps are calculated on the same grid.
The tools discussed in this manuscript, namely the discrepancy function D(q; ρ_{a}, ρ_{b}), the rank correlation coefficient CC_{r}(ρ_{a}, ρ_{b}) and the peak correlation coefficient CC_{<qpeak>}(ρ_{a}, ρ_{b}), are implemented in PHENIX (Adams et al., 2010) and are also available as an independent program from AU.
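The behaviour of CC, CC_{r} and D(q) summarized in this section can be checked on toy data. The sketch below is not the PHENIX implementation: ranks are obtained by sorting rather than from a histogram, and the normalization of the discrepancy function by 2q(1 − q), chosen so that D ≈ 1 for unrelated maps, is an assumption made here for illustration. The definitions given earlier in the article remain the reference.

```python
import numpy as np

def rank_scale(rho):
    """Replace each map value by its quantile rank in [0, 1)."""
    flat = rho.ravel()
    order = np.argsort(flat, kind="stable")
    ranks = np.empty(flat.size)
    ranks[order] = np.arange(flat.size) / flat.size
    return ranks.reshape(rho.shape)

def map_cc(a, b):
    """Standard (Pearson) map correlation coefficient."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]

def rank_cc(a, b):
    """Correlation coefficient of the rank-scaled maps."""
    return map_cc(rank_scale(a), rank_scale(b))

def discrepancy(a, b, q):
    """Fraction of nodes selected in exactly one of the two masks at rank q,
    divided by 2q(1 - q), its expected value for unrelated maps (assumption)."""
    mask_a = rank_scale(a) >= q
    mask_b = rank_scale(b) >= q
    mismatch = np.count_nonzero(mask_a ^ mask_b) / a.size
    return mismatch / (2.0 * q * (1.0 - q))

rng = np.random.default_rng(0)
rho_a = rng.normal(size=(16, 16, 16))
rho_b = np.exp(rho_a)                    # nonlinear monotonic transform of rho_a
rho_c = rng.normal(size=(16, 16, 16))    # unrelated map

print(map_cc(rho_a, rho_b), rank_cc(rho_a, rho_b))
print(discrepancy(rho_a, rho_b, 0.9), discrepancy(rho_a, rho_c, 0.9))
```

For the monotonically transformed map the standard CC is well below 1 while the rank CC equals 1 and the masks coincide at every cutoff; for the unrelated map the discrepancy is close to 1.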
APPENDIX A
Rank-scaled synthesis and corresponding statistical moments
Firstly, for a synthesis calculated on an arbitrary scale on a grid composed of N_{grid} nodes n, one computes the frequency of its values, as was introduced into crystallography by Lunin (1988) and Main (1990a,b). To do so, the interval (ρ_{min}, ρ_{max}) = [min ρ(n), max ρ(n)] is divided into J nonintersecting subintervals called bins: (ρ_{0} = ρ_{min}, ρ_{1}), (ρ_{1}, ρ_{2}), …, (ρ_{J−1}, ρ_{J} = ρ_{max}). For each grid node n, we identify the interval j to which the corresponding value ρ(n) belongs,
and add a unit to the counter n_{j} of this bin. The frequencies of the synthesis values are then calculated as
giving
The quantile ranks q_{j} corresponding to the bin borders μ = ρ_{j} are the numbers N(ρ_{j}) of grid nodes n with the synthesis value below the threshold, ρ(n) < ρ_{j}, normalized as
For an intermediate value ρ_{j−1} < μ < ρ_{j}, its quantile rank q may be calculated by a linear interpolation
making it a strictly increasing function of the initial synthesis values. Inversely, for a given rank q_{j−1} < q < q_{j} the corresponding initial synthesis value is recovered as
The value complementary to q(μ; ρ) gives the fractional volume V_{f} of the region selected by the corresponding cutoff level
Scaling of crystallographic Fourier syntheses in fractional volume has been described previously, for example by Vernoslova & Lunin (1993).
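The histogram-based rank scaling described in this appendix can be sketched as follows. This is an illustrative reading of the text, assuming equal-width bins and using NumPy's piecewise-linear interpolation; the function names are hypothetical. The inverse mapping is unambiguous only when every bin contains at least one grid node, so that the rank is strictly increasing.

```python
import numpy as np

def quantile_rank_table(rho, n_bins=1000):
    """Bin borders rho_0..rho_J and the quantile ranks q_0..q_J at these borders."""
    edges = np.linspace(rho.min(), rho.max(), n_bins + 1)
    counts, _ = np.histogram(rho, bins=edges)            # counters n_j
    freq = counts / rho.size                             # frequencies of the values
    q_edges = np.concatenate(([0.0], np.cumsum(freq)))   # normalized counts below rho_j
    return edges, q_edges

def to_rank(mu, edges, q_edges):
    """Quantile rank of a cutoff mu by linear interpolation inside its bin."""
    return np.interp(mu, edges, q_edges)

def from_rank(q, edges, q_edges):
    """Cutoff on the original scale that corresponds to the rank q."""
    return np.interp(q, q_edges, edges)

rng = np.random.default_rng(0)
rho = rng.random(20000) ** 2             # hypothetical skewed synthesis values
edges, q_edges = quantile_rank_table(rho, n_bins=50)

cutoff = from_rank(0.90, edges, q_edges)
v_f = 1.0 - to_rank(cutoff, edges, q_edges)   # fractional volume above the cutoff
print(cutoff, v_f)
```

Applying `to_rank` to the whole array gives the rank-scaled synthesis Q(n); the fractional volume above a cutoff is simply the complement of its rank.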
Let Q(n) be a rank-scaled synthesis where each value ρ(n) is substituted by the corresponding quantile rank using (28). We split the nodes into M equal groups defined by the values of the synthesis, with no relation to their position in the cell. The first N_{grid}/M points correspond to the lowest values of the synthesis, with rank values 0 < Q(n) < 1/M; the next N_{grid}/M points correspond to slightly higher values with 1/M < Q(n) < 2/M, etc. Then, for large enough M,
Similarly,
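The first two moments of a rank-scaled synthesis follow from the grouping just described. The calculation below is a sketch rather than a transcription of the original formulas: it assumes, as an illustrative choice, that every node of group m is represented by the midpoint rank (m − 1/2)/M.

```latex
\[
\langle Q \rangle = \frac{1}{N_{\rm grid}} \sum_{\mathbf n} Q(\mathbf n)
  \approx \frac{1}{M} \sum_{m=1}^{M} \frac{m - 1/2}{M}
  = \frac{1}{M^{2}} \cdot \frac{M^{2}}{2} = \frac{1}{2},
\]
\[
\langle Q^{2} \rangle = \frac{1}{N_{\rm grid}} \sum_{\mathbf n} Q^{2}(\mathbf n)
  \approx \frac{1}{M} \sum_{m=1}^{M} \left( \frac{m - 1/2}{M} \right)^{2}
  = \frac{1}{3} - \frac{1}{12 M^{2}} \;\longrightarrow\; \frac{1}{3}
  \quad (M \to \infty).
\]
```

Under this assumption the variance of any rank-scaled synthesis tends to ⟨Q²⟩ − ⟨Q⟩² = 1/12, independently of the original map, which is the property that makes the rank correlation coefficient insensitive to monotonic rescaling.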
Supporting information
Supporting Information. DOI: https://doi.org/10.1107/S1399004714016289/kw5094sup1.pdf
Acknowledgements
The contour maps used in this work and shown in Figs. 1, 2 and 5 and in the Supporting Information were produced using PyMOL (DeLano, 2002). PVA, TCT and PDA thank the NIH (grant GM063210) and the PHENIX Industrial Consortium for support of the PHENIX project. This work was supported in part by the US Department of Energy under Contract No. DE-AC02-05CH11231 (PVA, TCT and PDA) and by Russian Foundation for Basic Research grant 13-04-00118-a (VYL). AU thanks the French Infrastructure for Integrated Structural Biology (FRISBI) ANR-10-INSB-05-01 and Instruct as part of the European Strategy Forum on Research Infrastructures (ESFRI). We thank the referees for their very fruitful and constructive comments.
References
Adams, P. D. et al. (2010). Acta Cryst. D66, 213–221.
Afonine, P. V., Grosse-Kunstleve, R. W., Adams, P. D. & Urzhumtsev, A. (2013). Acta Cryst. D69, 625–634.
Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). Nucleic Acids Res. 28, 235–242.
Bernstein, F. C., Koetzle, T. F., Williams, G. J., Meyer, E. F. Jr, Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). J. Mol. Biol. 112, 535–542.
Bhat, T. N. (1988). J. Appl. Cryst. 21, 279–281.
Bleichenbacher, M., Tan, S. & Richmond, T. J. (2003). J. Mol. Biol. 332, 783–793.
Blow, D. M. & Crick, F. H. C. (1959). Acta Cryst. 12, 794–802.
Brändén, C.-I. & Jones, T. A. (1990). Nature (London), 343, 687–689.
Brünger, A. T. (1992). Nature (London), 355, 472–475.
Bruckner, S. & Möller, T. (2010). Comput. Graph. Forum, 29, 773–782.
Burla, M. C., Caliandro, R., Giacovazzo, C. & Polidori, G. (2010). Acta Cryst. A66, 347–361.
Changela, A., Chen, K., Xue, Y., Holschen, J., Outten, C. E., O'Halloran, T. V. & Mondragon, A. (2003). Science, 301, 1383–1387.
DeLano, W. L. (2002). PyMOL. http://www.pymol.org.
Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. (2010). Acta Cryst. D66, 486–501.
Ewald, P. P. (1913). Phys. Z. 14, 465–472.
Fenn, T. D., Schnieders, M. J. & Brunger, A. T. (2010). Acta Cryst. D66, 1024–1031.
Gore, S., Velankar, S. & Kleywegt, G. J. (2012). Acta Cryst. D68, 478–483.
Ioerger, T. R. & Sacchettini, J. C. (2002). Acta Cryst. D58, 2043–2054.
Jiang, J.-S. & Brünger, A. T. (1994). J. Mol. Biol. 243, 100–115.
Kleywegt, G. J., Harris, M. R., Zou, J., Taylor, T. C., Wählby, A. & Jones, T. A. (2004). Acta Cryst. D60, 2240–2249.
Lang, P. T., Holton, J. M., Fraser, J. S. & Alber, T. (2014). Proc. Natl Acad. Sci. USA, 111, 337–342.
Lehmann, C., Begley, T. P. & Ealick, S. E. (2006). Biochemistry, 45, 11–19.
Lehmann, E. L. & D'Abrera, H. J. M. (1998). Nonparametrics: Statistical Methods Based on Ranks. Englewood Cliffs: Prentice–Hall.
Loris, R., Tielker, D., Jaeger, K. E. & Wyns, L. (2003). J. Mol. Biol. 331, 861–870.
Lunin, V. Y. (1988). Acta Cryst. A44, 144–150.
Lunin, V. Y. (1989). Acta Cryst. A45, 501–505.
Lunin, V. Y. (1993). Acta Cryst. D49, 90–99.
Lunin, V. Y., Lunina, N. L. & Urzhumtsev, A. G. (2000). Acta Cryst. A56, 375–382.
Lunin, V. Y. & Skovoroda, T. P. (1995). Acta Cryst. A51, 880–887.
Lunin, V. Y. & Woolfson, M. M. (1993). Acta Cryst. D49, 530–533.
Luzzati, V. (1953). Acta Cryst. 6, 142–152.
MacElrevey, C., Salter, J. D., Krucinska, J. & Wedekind, J. E. (2008). RNA, 14, 1600–1616.
Main, P. (1979). Acta Cryst. A35, 779–785.
Main, P. (1990a). Acta Cryst. A46, 372–377.
Main, P. (1990b). Acta Cryst. A46, 507–509.
Murshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997). Acta Cryst. D53, 240–255.
Pannu, N. S. & Read, R. J. (1996). Acta Cryst. A52, 659–668.
Phillips, S. E. V. (1980). J. Mol. Biol. 142, 531–554.
Pozharski, E. (2010). Acta Cryst. D66, 970–978.
Pratt, W. K. (1978). Digital Image Processing. New York: Wiley.
Ramachandran, G. N. & Raman, S. (1959). Acta Cryst. 12, 957–964.
Read, R. J. (1986). Acta Cryst. A42, 140–149.
Rupp, B. (2006). Nature (London), 444, 817.
Schotte, F., Lim, M., Jackson, T. A., Smirnov, A. V., Soman, J., Olson, J. S., Phillips, G. N. Jr, Wulff, M. & Anfinrud, P. A. (2003). Science, 300, 1944–1947.
Simonetti, A., Marzi, S., Fabbretti, A., Hazemann, I., Jenner, L., Urzhumtsev, A., Gualerzi, C. O. & Klaholz, B. P. (2013). Acta Cryst. D69, 925–933.
Spearman, C. (1904). Am. J. Psychol. 15, 72–101.
Terwilliger, T. C. (2000). Acta Cryst. D56, 965–972.
Tickle, I. J. (2012). Acta Cryst. D68, 454–467.
Urzhumtsev, A. G. (1991). Acta Cryst. A47, 794–801.
Urzhumtsev, A. G., Lunin, V. Y. & Luzyanina, T. B. (1989). Acta Cryst. A45, 34–39.
Urzhumtseva, L. & Urzhumtsev, A. (2011). J. Appl. Cryst. 44, 865–872.
Vernoslova, E. A. & Lunin, V. Y. (1993). J. Appl. Cryst. 26, 291–294.
Vijayan, M. (1980). Acta Cryst. A36, 295–298.
This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.