On incoherent diffractive imaging

Starting from a simple model of stochastic fluorescence emission, a theory is derived of contrast formation and signal-to-noise ratio for incoherent diffractive imaging; its feasibility for plausible experimental parameters is discussed.


Introduction
X-ray diffraction capitalizes on the fact that microscopic signals of scattered waves add up coherently and form a macroscopic interference pattern which can be captured by X-ray detectors in the far-field. Hence, within a volume defined by a coherence length of the radiation, the signal is enhanced by constructive interference proportional to the square of the scattering centers $ N 2 , most notably in the forward direction and, for diffraction from crystals, at the Bragg peak positions. Moreover, even for distances beyond the coherence length, the diffraction signal scales linearly with the interaction volume. Modulating the scattering intensity as a function of the outgoing wavevector IðkvÞ with wavenumber k and unit direction vector v, the measurable far-field interference pattern thus encodes the spatial Fourier transform of the 'scattering length density' FðrÞ on atomic scales. Since the scattering process is coherent, the difference between the outgoing v and incoming wavevector v 0 determines the phase shifts between scattering centers, and in the kinematical scattering regime the intensity is proportional to the structure factor SðqÞ / jF fFgðqÞj 2 as a function of the scattering vector q kðv 0 À vÞ.
Notwithstanding the abiding importance of coherent diffraction, the small cross section of Thomson scattering limits structural analysis at the level of single small crystallites or even single molecules. This has raised a desire to make better use of the much higher cross section for photo-electric absorption, not only for spectroscopy, but also for a generalized diffraction method. Depending on photon energy E ¼ h -! and atomic number Z, the ratio of cross sections for photoelectric absorption and elastic (Thomson) scattering can easily result in hundreds of photons being absorbed per coherently scattered photon. Photo-ionization then leaves an excited atom with an inner-shell vacancy behind, from which a sizeable fraction decays via emission of an X-ray fluorescence photon. Fluorescence emission was long perceived not to convey any information about the microscopic sample structure, due to the random nature of the emission with independent phases. 1 Correspondingly, one would expect a random speckle pattern in the far-field, encoding not only the path length differences due to structure but also the unknown random phase differences. Yet, all structure in the far-field pattern 'averages out' because the coherence time of X-ray fluorescence is generally on the order of 1 fs, and all measurements are significantly longer than the coherence time.
However, as observed by Classen et al. (2017), the pulse length of modern XFELs (X-ray free-electron lasers) on the order of a few up to a hundred femtoseconds provides an intrinsic time gating for the fluorescence emission.
Although not yet quite as short as the coherence time of X-ray fluorescence, the pulse lengths might just be short enough to leave some structural information in the fluorescence. In the work of Classen et al. (2017), a simple time-independent quantummechanical model was used to show that the two-point correlations of the fluorescence intensity are in fact proportional to the very same structure factor SðqÞ which emerges in coherent scattering plus a constant offset. They hence proposed a method to extract spatial information from incoherent diffraction patterns, which they termed incoherent diffractive imaging (IDI) in analogy to coherent diffractive imaging (CDI) (Miao et al., 1999(Miao et al., , 2015Chapman et al., 2006). The underlying principle, intensity interferometry, goes back to the work of Hanbury Brown & Twiss (1956), where it was used to measure the angular diameter and separation of stars. However, we are not aware of any successful realization of IDI with atomic resolution, to date.
A closely related experiment, exploiting two-point intensity correlations of X-ray fluorescence, was reported by Inoue et al. (2019). The authors used the correlations in the fluorescence from a thin copper foil, excited by XFEL pulses from SACLA (SPring-8 Angstrom Compact free electron Laser), to infer the duration of the exciting pulse and spatial extent of the focal spot. Unlike IDI, however, which aims for 3D imaging with atomic resolution, the latter experiment extracted only the spatial extent of the scattering volume with a resolution just below 1 mm, and thus served mainly for characterization of the pulse itself. Similarly, two-point intensity correlations have been studied analytically as a means to deduce the source size as well as the detector point-spread-function (Gureyev et al., 2017). While the experiment of Inoue et al. (2019) is hence still far from a realization of IDI with atomic resolution, it is conceptually similar, since in both cases structural information is extracted from two-point intensity correlations of hard X-ray fluorescence. Further, two-point and higher-intensity correlations of incoherent diffraction data have also been exploited to reconstruct 2D test structures imaged by FEL (free-electron laser) pulses (Schneider et al., 2017). They are thus prototypical for an emerging class of experiments, where the information is not contained in the mean of the experimental data (which is homogeneous), but in its dependency structure.
The concept of IDI and the absence of its experimental demonstration have also sparked theoretical investigations. Ho et al. (2020) discussed the coherence time of hard X-ray fluorescence following the excitation by intense XFEL pulses. More recently, Trost et al. (2020) discussed the signal-to-noise ratio (SNR) of IDI, using a simple time-independent waveoptical model. Their model considers a set of 'emitters that emit monochromatic spherical waves with random relative phases' and includes the photon (detection) statistics of a pixel detector. Based on their model, they derive and simulate the scaling of the SNR of the two-point photon correlations as a function of several external parameters, such as the mean number of counts per pixel, the number of emitters and the number of modes. They also give an analytical expression for the number of modes as a function of the coherence time, the duration of the excitation pulse and the polarization state. Moreover, they mention that as the pixel size is increased, the number of modes increases. However, they do not explicitly treat this 'speckle sampling' effect, arguing that it can be considered by an appropriate adjustment of the parameters. Similarly, they discuss only qualitatively how 'large crystals' can lead to an increased number of modes, when the linear extent of the scattering volume is greater than the coherence length of the emissions, but do not quantify this effect. They conclude, in agreement with the seminal work of Classen et al. (2017), that, under optimized conditions, 'IDI may offer utility in structure determination of single molecules at X-ray FELs' and that 'IDI could potentially provide element-specific structural information to complement weak coherent scattering' (Trost et al., 2020).
Here, we ask to what extent we can translate our intuition from the kinematic and dynamic theory of coherent diffraction to incoherent diffraction? In particular, can we develop a quantitative understanding of contrast formation in incoherent diffraction, including the effects that were mentioned qualitatively in the works of Classen et al. (2017) and Trost et al. (2020) in a self-contained way?
To answer these questions, we develop a time-dependent probabilistic model for incoherent emissions following a short excitation pulse that accounts for the geometry of scatterers and detector. We treat the fluorescence emissions with fully specified (self-) coherence functions and explicit emission and propagation times. Since two emission events may have lost temporal overlap when reaching the detector, in particular for extended samples, the effective contrast can be severely degraded. We derive detailed estimates for the contrast in terms of a few parameters, including coherence time, sample extent, as well as the number and relative strength of emission lines. In particular, we can show that the contrast is inherently limited and derive a universal upper bound on the photon yield and correspondingly on the SNR.
The text is structured as follows. In Section 2, we describe the model, compute some statistical properties in terms of the radiated energy, and discuss contrast formation in the twopoint correlations. We subsequently extend this model in Section 3 to account for quantized photodetection and derive an expression for the SNR. In contrast to the statistical treatment in the work of Trost et al. (2020), we treat the coincident measured counts as statistically dependent random variables and give rigorous lower bounds on the noise level and thereby rigorous upper bounds on the SNR. In the course of Section 4 we discuss different contributions to the contrast with special emphasis on the geometrical implications of the finite coherence time, propagation of the excitation pulse, and non-negligible size of the scattering volume. Section 5 is concerned with spatial sampling of the correlation signal and the implications of finite-sized detector pixels. In particular, we quantify, using rigorous upper bounds on the contrast, how the contrast decreases with increasing angular pixel size and size of the scattering volume, as was previously mentioned qualitatively in the works of Trost et al. (2020) and . In Section 6 we estimate and show how the constraints imposed by the contrast relations inherently limit the photon yield. We use these estimates to discuss the feasibility of experiments in Section 7. Section 8 summarizes and concludes our findings.

A probabilistic model for incoherent emissions
We develop a simple probabilistic model to describe the random incoherent emissions from an ensemble of emitters, following excitation by a short pulse. A time-dependent description is necessary to capture the loss of contrast due to the finite coherence time in a closed form. We use this model to derive statistical properties of the energy that is radiated during each pulse and show how the structure factor of the emitter distribution enters the two-point intensity correlations. The main symbols used throughout the article are listed in Appendix C for reference.

Setting
Let us consider the following experimental setting, as sketched in Fig. 1. An ensemble of atoms is irradiated by an ultrashort excitation pulse that excites a sizeable number of them. Some fraction of the excited atoms subsequently undergoes a radiative transition and each emits a fluorescence photon. These photons are registered by (energy-sensitive) pixels of a 2D pixel array detector. The average radiated energy is isotropic if we neglect secondary interactions with the sample. However, the equal-pulse two-point intensity correlations À contain a structural signal of the emitter ensemble.
IDI, as originally proposed, uses pulses from an XFEL that produce inner-shell vacancies due to photo-absorption. There, the majority of the emitted photons stem from radiative transitions to the K shell, most frequently the K 1 and K 2 transitions. The model we describe, however, neither depends on the nature of the excitation pulse nor on the particular type of transitions, and should therefore be equally applicable to other spectral ranges, in particular the optical regime, and even other kinds of incoherent emissions. We formulate the theory as generally as possible, referring to the parameter range of K radiation only for examples.
Significant correlations can only be observed when the pulse duration (implementing the time gating) is comparable to the coherence time of the emitted light. The coherence time of inner-shell fluorescence radiation is on the order of femtoseconds, which is significantly shorter than the time resolution of any conceivable detector. We thus assume that only the total energy, deposited over an entire pulse, is registered experimentally.
Although emissions of individual photons are typically discussed in a quantum-mechanical formulation, a large number of emissions can be conveniently described semiclassically. Interference in two-point intensity correlations from a pair of incoherently excited atoms is discussed for example in the book by Agarwal (2013), ch. 14. There it is shown that the emerging interference fringes have 100% visibility, which is a clear sign that the underlying process is non-classical (Mandel, 1999). However, the visibility gradually decreases with increasing number of emitters N E and asymptotically approaches 50%. In fact, Classen et al. (2017) have calculated that the interference patterns produced, respectively, by single-photon emitters (SPE) and thermal light sources (TLS) converge with one another with Oð1=N E Þ. It thus seems plausible that the interference from a large number of emitters can be approximated well with a semi-classical description, but special care has to be taken when discussing two-photon contributions.

Basic assumptions
First, we discuss the emissions as randomly parameterized classical electromagnetic waves (see Fig. 2). Later, in Section 3, we additionally include quantized photodetection, following the book of Goodman (1985). We initially suppose that the fields are perfectly polarized to simplify the notation. The unpolarized nature of fluorescence is taken into account in Section 4.3. Let N E be the number of individual emitters and let r m for m 2 f1; . . . ; N E g denote their positions. We assume the system to be stationary in the sense that each pulse has the same initial conditions and can be represented as a realization of the same statistical ensemble.
We make the following assumptions: (i) The event in which a specific atom with index m emits a photon follows a Bernoulli distribution with probability s m . We associate the random variables b m 2 f0; 1g with the emission of a photon.
(ii) The photon energy h -! m takes values from a finite set of discrete transition energies (emission lines) ' m with probabilities corresponding to the relative line intensities.
(iii) The emission times t m are randomly distributed with probability density proportional to the cycle-averaged local intensity I exc ðr m ; tÞ of the excitation pulse (the lifetime of the excited state is described by the time evolution of the field amplitudes) at position r m .
(iv) The emissions are independent, such that the random variables characterizing the emissions, b m , t m and h -! m , are mutually independent.
(v) The emissions can be described by outgoing spherical waves.
Let u m ðvr; tÞ be the complex-valued analytic signal associated with the electromagnetic disturbance due to the emission from the mth atom. The normalized vector v denotes the observation direction relative to the center of the scattering volume. Let denote the energy flow of u m into the infinitesimal solid angle d at some point vr in the far-field. The total electromagnetic disturbance at position vr due to all the emissions can be written as The total energy flow through d becomes

Correlations
These basic assumptions suffice to significantly simplify the average intensity and two-point intensity correlations. The expectation value of W can be expressed as Inserting (2) and exploiting the independence of the individual emissions show Here, we have used E½u m ðtÞ ¼ 0 and b 2 m ¼ b m . Performing the time integral and inserting Eb m ¼ s m yields Next, we calculate the two-point correlations of the intensity. Consider the energy flow into the direction v 0 through the solid angle d 0 as a second observable. We are interested in the correlation of W d and W 0 d 0 , expressed as Importantly, the random variables that appear are identical for both observation directions. We obtain and E Uðr; tÞUðr; tÞ Ã Uðr 0 ; t 0 ÞUðr 0 ; t 0 Þ Ã ½ Â E u m ðr; tÞu n ðr; tÞ Ã u m 0 ðr 0 ; t 0 Þu n 0 ðr 0 ; t 0 Þ Ã Â Ã : Here, we use the notation r vr and r 0 v 0 r 0 . Since the emission times are mutually independent and Eu m ¼ 0 as well as Eu 2 m ¼ 0, only certain combinations of the indices survive the expectation value: Sketch of the probabilistic model. The emitters at fixed positions r m randomly emit spherical waves with finite coherence time C . The moment of emission is random and proportional to the instantaneous intensity of the excitation pulse, which is assumed to be a traveling wave with pulse duration exc . E Uðr; tÞUðr; tÞ Ã Uðr 0 ; t 0 ÞUðr 0 ; t 0 Þ Ã ½ Eb m Eb n E u m ðr; tÞu m ðr 0 ; t 0 Þ Ã u n ðr 0 ; t 0 Þu n ðr; tÞ Ã Â Ã : Performing the time integration yields To evaluate the remaining integrals, we express the u m as u m ðr; tÞ u m ðt À t m À m Þ ð 12Þ with geometric phases m ¼ jr À r m j=c. The integral in (11) can then be expressed as with mn ¼ t m À t n and interference terms In the far-field we have jr m j ( jrj such that the usual far-field approximation gives m À n ' Àv Á T mn with In the following we suppress the solid-angle differentials. To further evaluate these interference terms (14), we require additional assumptions on the temporal structure of the individual emissions. We assume that the emissions produce electromagnetic signals that are characteristic for the involved atomic energy levels (emission line ' m ) up to the spatial origin r m and emission time t m . More precisely, we parameterize the waves in terms of their respective temporal and spatial origin and their emission line ' m such that where u 0 ð'; tÞ, for each emission line ', describes an outgoing spherical wave. Inserting this into (14), we obtain We have assumed that signals with ' m 6 ¼ ' n do not interfere because the emission lines do not overlap.

Spectrum and self-coherence
It turns out that (17) is fully determined by the complex degree of coherence (CDC) of U, which in turn is fully determined by the (phase-less) spectrum of U.
For simplicity, we ignore the spatial dimensions and also consider the simplest situation of only a single contributing emission line '. Then, we may express the self-coherence at time delay of the total signal U as the ensemble average (Mandel & Wolf, 2015): Here the expectation value is taken over the time offsets t m which are distributed over some time interval longer than the coherence time. Since u 0 ðtÞ oscillates rapidly with some central frequency ! 0 and the time offsets are mutually independent, each non-diagonal term must vanish. The diagonal terms become for jj ( exc . Here the duration of the excitation pulse exc quantifies the width of the distribution of t m . Then, also the self-coherence (18) Normalizing the left-hand side gives by definition the CDC ' associated with the emission line ', such that The temporal autocorrelation of u 0 is hence directly related to the macroscopically accessible CDC of an isolated emission line. Importantly, equation (21) implies that the temporal autocorrelation of u 0 is fully determined by ' , which is determined by the coherence time C;' and the line shape.

Interference terms
Substituting (21) in (17) yields The CDC factorizes into a rapidly oscillating part and an envelope, such that The envelope is real for symmetric line shapes but may take negative values in general. However, for Gaussian and Lorentzian line shapes, it is restricted to non-negative real values. It can hence be expressed as ' ¼ j ' j. In particular for Lorentzian line shapes, we have ' ðÞ ¼ expðÀjj= C;' Þ where C;' is the coherence time. Inserting the factorized ' ðÞ yields where k m ¼ ! m =c.
Suppose that the emissions are spectrally filtered so that only photons from a relatively narrow energy band are registered in the detectors. More precisely, suppose that ðk m À " k kÞ= " k k is very small for all m, where " k k ¼ E½! m =c is the mean wavenumber of the involved frequencies. In the same sense suppose that the pulse energies are approximately equal, i.e. w m ; w 0 m ' " w w. The expectation value of (24) can then be written as with The coupling coefficients C mn take values in the interval ½0; 1. Note that their diagonal elements are C mm ¼ 1 and that C mn ¼ C nm as can be readily seen. From here on set " w w 1 and q " k kðv À v 0 Þ to simplify the notation. After inserting (25) into (13), (11) finally yields Comparing this expression with the result of Classen et al. (2017), which was derived with a time-independent quantummechanical description of SPE, shows two differences. First, their model is time independent and as such implicitly assumes C mn 1. Second, our classical formulation fails to reproduce that the diagonal terms, corresponding to coincident detection of the same atomic emission, must vanish. However, since the diagonal terms scale with Oð1=N E Þ relative to the other terms, they can be neglected for large N E and the two expressions converge. Using EW ¼ EW 0 ¼ P m s m and neglecting the single sum simplifies the correlation to The right-hand side of (28) consists of two parts: a constant background EWEW 0 and a structural signal AE = À À EWEW 0 . It follows that

Effective contrast and structure factor
The two-emitter contributions in (28) are individually attenuated by the coupling coefficients C mn . Although these coefficients could be computed individually for a specific model, it is instructive to consider their global average, Using the approximation C mn ' C eff decouples the contributions of the individual emitters in (28) and the right-hand side simplifies to with Equation (32) resembles the definition of the structure factor defined in the context of elastic scattering, although the coefficients have a different meaning.
The sums can be expressed as integrals. Defining the (unitless) emitter distribution we can write the structure factor (32) as where F fsg denotes the 3D Fourier transform of sðXÞ. We scaled the emitter distribution with the wavenumber " k k, because it is a property of the emission line and hence determined by the emitters. We will mostly use the unit-less scattering vector, defined as x v À v 0 , in place of q. The two are related by q ¼ " k kx so that jxj probes a length scale of =jxj. We will use the two notations SðqÞ ¼ SðxÞ synonymously.
Equation (31) is equally applicable for W 0 ¼ W, which provides the variance VarW ¼ E½WW À ðEWÞ 2 . Using E½WW ¼ Àð0Þ and Sð0Þ ¼ 1, we read off that For clarity and later reference, we rewrite (31) more explicitly to include the two observation directions and their solid-angle differentials as

Comparison with elastic scattering
Compare (36) with the elastically scattered intensity, which can be written as where here v in is the propagation direction of the incoming beam and k the wavenumber of the incoming and scattered beam. Recall that the elastic structure factor S el is defined as in (32); however, instead of the emission probabilities s m , the coefficients are interpreted as atomic form factors. There are two profound formal differences between (37) and (36) besides the dependence on one and two directions, respectively. First, (36) contains constant (unit) offset. Second, the signal is attenuated by the constant C eff quantifying the contrast of the structure factor to this constant background. The correlation function thus contains a constant intrinsic background and the intrinsic contrast of the structural signal to this background is always less than unity. The coherently scattered intensity on the other hand has no intrinsic background and therefore no intrinsic contrast limitations. Only additional processes that contribute to the intensity form a background and limit the contrast. This extrinsic contrast, however, is not limited to unity.

Photon statistics and noise
The goal of this section is to estimate the noise level of measurements of AE. Apart from the fluctuations of the classical energies W, which are caused by the randomness of the emissions (source noise) [Trost et al. (2020) call this contribution phase noise, because it originates from the random phases in their model], additional fluctuations emerge due to the quantization of the electromagnetic field or the detection process (shot noise). In fact, in the low-photon regime, this second source of noise is the bigger contribution to the noise level. Importantly, it is independent of the signal level of the correlation.

Statistics at a single observation point
We follow the semi-classical formalism described in the textbook of Goodman (1985), starting by repeating some fundamentals for reference purposes. Let K be the number of counts registered by the detector at observation direction v and covering the solid-angle differential d. The conditional distribution K j W, conditioned with the energy W entering the detector, is Poissonian with parameter W, where is the quantum efficiency of the detector. (Here we include the photon energy h -! in the definition of the quantum efficiency.) The expectation value and variance of the (unconditional) K can be expressed in terms of moments of W as Inserting (35), which gives VarW in terms of EW for the model discussed in Section 2, we obtain Equation (39) gives the fluctuations of the number of counts measured with a single detector as a function of the mean counts. The fluctuations have two contributions: the source noise, C eff ðEKÞ 2 , and the shot noise, EK. As we are going to see in Section 6, the mean count numbers that can be expected are usually well below " K K < $ 1. In this regime, the shot noise strongly dominates the source noise.

Count-correlations
The situation is more complicated for coincident measurements at two or more points, because the emissions into the two respective directions are correlated. We can, however, assume that the two detectors do not influence each other, such that the conditional random variables K j W and K 0 j W 0 are in fact independent. Under this assumption we have that E½KK 0 = 2 E½WW 0 and similarly Cov½K; K 0 ¼ 2 Cov½W; W 0 (Goodman, 1985). As a result, the correlations and covariances of the two classical energies and the two photon counts are related simply by a constant factor. In particular, we can define the signal as and use the expressions for Cov½W; W 0 . Note that (40) assumes K 6 ¼ K 0 , because otherwise the two conditional distributions are no longer independent. For K ¼ K 0 we get Cov½K; K ¼ VarK which includes a shot-noise term that is absent in (40) (Singer & Vartanyants, 2013).

Fluctuations in the count-correlations
The covariance on the right-hand side of (40) can be expressed as Cov½K; K 0 ¼ E½ÁKÁK 0 , where ÁK ¼ K À EK and ÁK 0 alike. We would like to calculate the second central moment, Var½ÁKÁK 0 . Unfortunately, there is no simple identity relating this expression to moments of W and W 0 . A simple but rather lengthy calculation (see Appendix B) shows that In particular, since all terms on the right-hand side are nonnegative, we conclude that This proves that the variance is greater than or equal to the product of the shot noise of two individual (independent) intensity measurements, irrespective of the joint statistics of W and W 0 . It is obvious that the other terms in (41) will contribute significantly for higher average counts EK, so that the variance scales differently. However, this high-photon regime is of little relevance here due to the low photon numbers that can be expected (see Section 6). For further discussion of this high-photon-count regime, see the work of Trost et al. (2020).

Measurements and SNR
We now turn our attention to the SNR for estimates of the covariance AE. We first specify, for reference, how to compute AE from measured coincident realizations of K and K 0 . Suppose that ð j ; 0 j Þ for j 2 f1; . . . ; Rg are R independent (coincident) realizations of ðK; K 0 Þ. Define the sample mean as and " 0 alike, and set Then " converges to EK and b AE AE converges to Cov½K; K 0 for increasing R. The SNR of b AE AE can be defined as Using Inserting (42) and AE ' EKEK 0 Â C eff SðxÞ, and assuming isotropic radiation with " Recall that (42) bounds the variance by the shot noise of the intrinsic background, which results in a strict upper bound on the SNR. For higher photon counts, however, the source noise may exceed the shot noise so that (48) overestimates the SNR. In fact, for increasing " K K, the SNR saturates at R 1=2 SðxÞ (Trost et al., 2020).
When only a single pair of detectors is used then the number of independent realizations R is given by the number of pulses N pulses . In practice, however, it makes sense to use a pixel array detector and acquire a large number N pixel of measurements simultaneously. Since not all possible pairs of pixels correspond to distinct scattering vectors, every frame provides multiple realizations for the same x simultaneously. The multiplicity ðxÞ corresponding to the scattering vector x decays from N pixel for x ¼ 0 to 1 for large x, where its exact shape depends on the chosen discretization and the detection geometry. Note that these simultaneous realizations are in general not independent. Even if the individual intensity measurements were independent (if the coincident intensity measurements were strictly independent, their correlation, and thereby the signal, would vanish), then the individual terms in (44) would still be correlated, because some j appear more than once (Trost et al., 2020). Nevertheless, for the purpose of an optimistic upper bound on the SNR, we can use R N pulses ðxÞ and implicitly count the simultaneous realizations as if they were independent.
Solving (48) for N pulses gives the minimum number of pulses required for a certain target SNR and a given signal level: The main results of this section are (48) and (49), giving an optimistic bound on the relation between the SNR and the number of pulses. Although we derived the SNR only for a specific method of estimating the two-point covariances, which we do not claim to be optimal, the result illustrates a general characteristic: for low photon counts, the noise in the estimated covariances is dominated by the shot noise of the individual measurements, which is independent of the signal level. As a result, the signal level does not cancel out and the SNR is linearly proportional to the signal level. These exponents, together with the scaling of contrast and photon counts, sensitively govern the achievable SNR for different experimental settings, as we will discuss in Section 6.

Contrast estimates
Since the contrast C eff is a crucial parameter in (36) and (48), it seems appropriate to discuss it in more detail. We make some simplifying assumptions to factorize C eff and discuss the main constituents individually. Each can be estimated from a few parameters, such as the duration of the excitation pulse, the spatial extent of the scattering volume, and the spectrum of the emitted radiation. Suppose for simplicity that all involved emission lines have the same coherence time C and therefore that j ' j is identical for all emission lines. Then, the coefficients (26) factorize into a spatiotemporal (ST) and spectral (L) factor, Here we dropped the index ' since all j ' j are alike. Equation (50) gives an approximate expression for the coupling coefficient between two specific emitters. The macroscopic contrast C eff , however, is determined by the average coupling constant between all pairs of emitters. Assuming that the spectral factor is constant for all pairs of emitters allows one to take the averages separately, so that with C L eff ¼ hC L mn i m;n and C ST eff ¼ hC ST mn i m;n .

Spectral overlap
We discuss the spectral contribution first. Suppose, for simplicity, that all emitters have identical emission spectra, consisting of discrete emission lines with relative intensities p ' . In that case, the first contribution to the coupling coefficients becomes The factor therefore does not depend on m and n, but differs for diagonal (identical emitter) and non-diagonal (distinct emitters) contributions. Since there are only N E diagonal terms, but N 2 E À N E non-diagonal terms, only the latter contribute significantly to C eff . In particular, the K line splits into the K 1 and K 2 lines with relative intensities of 2/3 and 1/3, so that C L eff ðKÞ ' 5=9.

Spatiotemporal overlap
A pair of emissions contributes to the structural signal only when the two emissions arrive within the coherence time at each pixel. This temporal overlap is purely governed by the duration of the excitation pulse when the scattering volume is research papers Acta Cryst. (2021). A77, 480-496 smaller than the coherence length. For larger scattering volumes, however, the finite-speed propagation of the excitation pulse and of the emissions also enter the picture, and we have to consider the combined temporal and spatial overlap.
The overlap C ST mn could be approximated analytically for certain symmetric configurations; however, covering all possibilities would be lengthy and cumbersome. Instead, our strategy is to write down the full expression and to derive some complementary upper bounds for it. In this way, we can include all configurations with little effort and give rigorous estimates on the maximum achievable contrast.
The factor C ST eff is governed by the emitter locations r m and the probability distribution of the emission times t m . We write the probability density function of the temporal separation mn ¼ t m À t n as Éðr m ; r n ; Þ. The probability density of t m is proportional to the intensity of the excitation pulse I exc ðr m ; tÞ, by assumption, so that Éðr m ; r n ; Þ ¼ R I exc ðr m ; t þ ÞI exc ðr n ; tÞ dt R I exc ðr m ; tÞ dt Â R I exc ðr n ; tÞ dt : The spatial diagonal Éðr; r; Þ is the temporal autocorrelation of the excitation pulse, which, neglecting dispersion, does not depend on r. For brevity we write É 0 ðÞ ¼ Éðr; r; Þ.
We model the excitation pulse as a dispersion-less plane wave with propagation direction v exc , group velocity c and duration exc . Since a plane wave can be written as a function of a single argument, I exc ðr m ; tÞ I exc ðr m Á v exc À ctÞ, a simple shift in the integration variable of (53) shows that the crosscorrelation can be expressed as Note that É 0 ð0Þ $ 1= exc and R É 0 ðÞ d ¼ 1. The expectation value can be expressed as where v cen ¼ ðv þ v 0 Þ=2. For the last identity, we have substituted (54) and shifted the integration variable. First, we focus on the purely temporal components of the overlap, corresponding to a small scattering volume. Suppose that all emitters lie close together such that jT mn j ( C and jT mn j ( exc . Under that assumption (55) simplifies to Importantly, this quantity is position-independent. We can give upper bounds to (56) using the inequality R f ðxÞgðxÞ dx max x ff ðxÞg R gðxÞ dx; which holds for non-negative functions f and g. Noting that R jðÞj 2 d ¼ C , we obtain This shows that the contrast is independent of the emitter geometry and hence 'pulse limited', for small scattering volumes. Larger scattering volumes, on the other hand, do affect the contrast. Consider a cuboid scattering volume, aligned with the observation directions as shown in Fig. 3. Assuming a homogeneous emitter distribution, we can calculate the average over all emitter pairs. The average can be expressed as where T j ¼ L j =c are the propagation durations, j ¼ ðv cen À v exc Þ j are the respective projections to the coordinate axes, and ÃðtÞ ¼ maxf1 À jtj; 0g is the unit triangle function. Note that the dependence on j can be neglected when exc ) T j and exc ) C because then É 0 is essentially independent of its argument. Exchanging the order of integration, we obtain withÉ Note that (60) reduces to (58) in the limit of small T j . First, consider the effect of T 1 and T 3 onÉ É 0 . Using (57) on both integrals and extending the bounds to infinity, we obtaiñ É É 0 ðÞ min É 0 ðÞ; Next, applying (57) on the integral over t 2 in (60)  Consider a collection of emitters that is homogeneously distributed in a cuboid scattering volume with edge lengths L j . The edges of the cuboid are aligned with respect to the scattering vector x and v cen ¼ ðv þ v 0 Þ=2.
For the second inequality we have set j 2 j ¼ 0, which does not decrease the value of C ST eff . Here AC jj denotes the temporal autocorrelation function of jðÞj, For the second identity we have used jðÞj ¼ expðÀjj= C Þ, corresponding to a Lorentzian line shape. The time integral Inserting (62) into (63), using (57) to bound the integral over É 0 ðÞ, and then inserting (65) yields which is the equivalent of (58) for larger volumes and large scattering vectors x. Equation (66) is only useful for jxj > 0. Going back to (63), we find another inequality for j 2 j > 0. More precisely, (63) implies Here we have set jxj ¼ 0, which only increases the value of the integral. InsertingÉ É 0 ðÞ É 0 ðÞ yields This shows that even for small scattering vectors, the contrast can be significantly below the pulse-limited contrast, when the detector is not placed in the forward direction with respect to the excitation pulse. This effect is most pronounced with jðv cen À v exc Þ 2 j ' 2 in a 'back-scattering' geometry but also enters with jðv cen À v exc Þ 2 j ¼ jv cen j in the 90 geometry that has been proposed by Classen et al. (2017) and is depicted in the work of Trost et al. (2020, Fig. 1).
In summary, we have discussed that for small scattering volumes (i.e. negligible propagation time), the spatiotemporal contrast is given by (58), which we refer to as the pulse-limited case. For larger scattering volumes, this estimate is complemented by (66) and (68). Since all estimates are rigorous upper bounds, the smallest one takes preference. In particular, the contrast can only be pulse limited when all of jxjL 2 < $ 2c C , jðv cen À v exc Þ 2 jL 2 < $ c C and jðv cen À v exc Þ j jL j < $ 2c C for j 2 f1; 3g are satisfied. This is the case, irrespective of the scattering volume, when measuring at small scattering vectors jxj ( 1 in the forward direction with respect to the excitation pulse, so that v exc k v cen . For all other cases, however, optimal (pulse-limited) contrast requires effectively that L j < $ c C in all three dimensions. Importantly, we have that C ST eff < $ 2c C =ðjxjL 2 Þ even in the best case. Note that the scattering vector corresponds to a feature size a by jxj $ =a. The constraint on L 2 can thus be expressed as The right-hand side is on the order of 10 4 for inner-shell fluorescence.

Polarization effects
In the preceding discussion we have assumed the emissions to be perfectly polarized such that they can be expressed as scalar functions uðr; tÞ. In an ensemble of emitters without a preferred spatial orientation however, the individual emitters produce (on average) unpolarized emissions. This further reduces the contrast.
Consider two fixed observation directions v and v 0 . For anỹ v v that lies in the plane spanned by v and v 0 the emissions can be decomposed into two orthogonal polarization components, one parallel component ðÞ and one orthogonal component ðÞ. A detector that does not distinguish the polarization registers both components, W ðÞ and W ðÞ , independently, such that the total deposited energy can be written as W ¼ W ðÞ þ W ðÞ . It follows that Assuming that the two polarization components are uncorrelated and equally intense, we obtain Comparison with (31) and using EW ¼ 2EW ðÞ shows that effectively When both polarizations produce the same contrast, the contrast for unpolarized light is reduced by 2. On the other hand, when one component produces no contrast at all, the contrast is reduced by 4. We can thus incorporate unpolarized emissions simply by introducing C P eff , with 1=4 C P eff 1=2, as an additional factor into (51).

Spatial sampling and finite detectors
We have seen that the intensity correlation Àðv À v 0 Þ between two observation directions v and v 0 gives access to the struc-research papers ture factor SðxÞ at the scattering vector x ¼ v À v 0 . Next we shall discuss requirements on the sampling of observation directions in order to map the structure factor as a function of the scattering vector x. Note that we consider SðxÞ as the quantity of interest and ignore the topic of reconstructing the real-space emitter distribution. In particular, resolution exclusively refers to the resolution of SðxÞ. Moreover, we quantify and discuss how finite detectors influence the contrast.

Spatial spectrum of the correlation signal
Set EW ¼ EW 0 ¼ 1 for this section. Consider AEðxÞ as a function of the unit-less scattering vector x. Its spatial spectrum is easily computed from (28) and can be expressed as Note that the spectral amplitudes are real and positive since ÀðxÞ is real and symmetric by definition. The spectrum of the homogeneous background is a single function. The spectrum of AE, on the other hand, is similar to the spatial self-correlation or generalized Patterson function (Cowley, 1995) of the emitter distribution, but attenuated by the C mn coefficients. The coefficients can be pulled out of the sums in the same way as in (31) by approximating them with C eff , so that Integrating the spectrum over R 3 gives approximately C eff . Consider the previously introduced geometry, a homogeneous emitter distribution contained in a box with edge lengths L 1 , L 2 and L 3 as shown in Fig. 3. Approximating the homogeneous emitter distribution by a continuous density, the spatial spectrum (73) can be expressed as with ÁX s;j ¼ " k kL j , AEð0Þ ¼ C eff and the unit triangle function ÃðtÞ ¼ maxf1 À jtj; 0g.
More generally, any compact emitter distribution can be enclosed in such a box and its spatial frequencies are therefore limited by ÁX s;j for all dimensions j. These frequencies only depend on the respective linear extent but not on the exact form of the scattering volume.
In other words, the structure factor and AEðxÞ are band limited with bandwidth ÁX s . They can thus be sampled aliasing-free with the Nyquist rate 2ÁX s , that is on a 3D grid with spacing less than or equal to =ÁX s;j .

Finite pixels
The argument of ÀðxÞ in (36) is determined by the difference of two unit vectors v and v 0 , representing two observation directions. In a real system these observation directions are not points but finite solid angles, given by the pixel size and emitter-to-detector distance. We describe these solid angles as finite patches on the (normalized) Ewald sphere, i.e. Á; Á 0 & S 2 . The measured correlation function can then be expressed as with Hðx; Á; Á 0 Þ ¼ where Ã denotes a 3D convolution, 1 Á ðvÞ the indicator function of Á on the sphere S 2 and Á ðxÞ the Dirac surface delta function (Laplacian of the indicator) of Á on R 3 . The vectors v 0 2 Á and v 0 0 2 Á 0 are the central directions of the two observation regions.
In practice, the two solid angles are small enough to neglect the curvature of the Ewald sphere. They can hence be approximated as planar patches with respective normals v 0 and v 0 0 [see Fig. 4(a)]. We discuss the symmetric case of two partially aligned planar squares with relative angle ' and side lengths (in practice, the side lengths Á are given by the angular size of the detector pixels) Á as sketched in Fig. 4(a). Here, the convolution kernel H can be computed analytically. However, the exact expression for H is less important than its support, which is sketched in Fig. 4(b). We approximate H by The effect of finite-sized detector pixels (a) on the correlations ÀðxÞ can be described by a 3D convolution H Ã À. (b) Support of the convolution kernel H.
where Áx H;1 ¼ Á; Áx H;2 ¼ 1 À jx=2j 2 À Á 1=2 Á; Áx H;3 ¼ jx=2jÁ: The Fourier transform of this approximate kernel reads As expected, the finite pixels cause a low-pass filtering of ÀðxÞ with angular frequency scale 1=Áx H and thereby limit the resolution with which ÀðxÞ can be measured, irrespective of the sampling rate. The more general case of unaligned detectors corresponds to a less regular convolution kernel but qualitatively has the same effect.

Contrast
Finite pixels limit the resolution with which the structure factor SðxÞ can be measured since higher spatial frequencies are attenuated. However, due to the presence of the intrinsic background, they also limit the achievable intrinsic contrast. We will illustrate this with a simple example that can be computed analytically: the homogeneous emitter distribution in a box as shown in Fig. 3.
For that purpose we decompose À ¼ 1 þ AE into a sum of constant unit background and signal AEðxÞ ¼ C eff SðxÞ. The measured correlation becomes H Ã À ¼ 1 þ H Ã AE, using that the background is unaffected by the convolution. We may factorize the approximate convolution kernel HðxÞ = Q j H j ðx j Þ as well as the signal AEðxÞ ¼ Q j AE j ðx j Þ into 1D functions, such that Each factor satisfies the inequality Exploiting the Fourier convolution theorem we find a second triple of inequalities, 2 Combining the two inequalities yields Substituting ÁX H;j and Áx s;j shows jH Ã AEj AEð0Þ min 1; with A generalization of this result, not relying on specific assumptions on the shape of AE and H, is presented in Appendix A. Equation (85) implies that the contrast falls off with 1=L j for each linear extent L j of the scattering volume that exceeds a critical length in the order of L crit . Note that (85) is based on a not necessarily sharp upper bound on the signal strength, so that the contrast may suffer even for smaller L j . To put this into perspective, consider K radiation of iron ( ' 0.19 nm) and a typical pixel detector with 50 mm pixel size. Placing the detector 10 cm, 1 m or 10 m from the emitters corresponds to L crit of 0.38 mm, 3.8 mm or 38 mm, respectively.

Photon count estimates
We have seen in (48) that the SNR is bounded by an expression proportional to C eff Â " K K. Here, we derive universal upper bounds on this product, based on the fact that the emitter density " E is finite and that the contrast decreases with increasing extent of the scattering volume.
The mean number of photons emitted into 4 from a cuboid scattering volume with edge lengths L j (see Fig. 3) is nL 1 L 2 L 3 " E , where n is the emission efficiency in the considered energy band. A detector pixel of angular size Á and quantum efficiency registers on average counts if we neglect self-absorption within the scattering volume. We have derived two independent constraints between scattering volume and contrast. First, (66) implies, in particular, that the contrast falls off with 1=L 2 for L 2 > $ 2c C =jxj (coherence time constraint). We set to quantify the margin by which the coherence time constraint is satisfied. Second, (85) implies that the contrast falls off with 1=L j for each dimension j in which the linear extent of the scattering volume L j is larger than a certain critical size (sampling constraint). We analogously set to quantify the margin by which the sampling constraints are satisfied. Inserting (88) and (89) into (87) yields The unit-less scattering vector corresponds to a feature size a via jxj $ =a, so that (90) can be expressed as Having expressed " K K in terms of the parameters , we can optimize the product C eff Â " K K. Equations (66) and (85) imply that whereC C 1. Since (92) becomes proportional to every that is less than one, it follows that where ¼ 1 refers to all the coefficients. In particular, increasing the mean photon count " K K beyond its value for ' 1 does not increase the SNR, because the contrast is reduced just as much. In other words, there is an optimal size of the scattering volume, beyond which the SNR does not increase any further.
Although some -dependent factors are included inC C and are not explicitly written in (92), they only reduce the contrast further. Because of these factors and since the individual inequalities are not sharp (there can be significant margins between the left-and right-hand side), the maximum SNR may actually be attained with some > $ 1 but, nevertheless, will not exceed the given bound.
For jxj ( 1, the sampling constraint on L 3 becomes arbitrarily large and thereby exceeds the self-absorption length (the 1=e length of the emitted light in the material), so that (90) strongly overestimates the achievable " K K. We can use S;2 instead of C to quantify L 2 to obtain where L SA is the self-absorption length. This estimate is independent of the coherence time.
To put (91) into perspective, consider a dense monoelemental iron crystal with C ' 2.6 fs, a ' 0.29 nm and " E ' 85 nm À3 . For these values, (69) yields L 2 < $ 2.3 mm. Optimistically assuming ¼ 1, C ¼ 1 and S ¼ 4, (91) shows that n ' 0:1% of all atoms are required to emit a photon for an average photon count of " K K ¼ 0:1 photons per detector pixel. The right-hand side of (91) is proportional to the squared feature size a 2 and therefore grows quadratically with increasing a. However, when measuring atomic distances, it effectively scales with 1=a, because the mean emitter density " E scales with 1=a 3 . Consider a crystal structure with lattice constant a, such that " E $ 1=a 3 . The number " E a 2 c C , which bounds the mean photon count " K K in (91), is then essentially governed by c C =a. In general, increasing the feature size a increases the SNR only as long as the mean emitter density decays slower than 1=a 2 .
As an example consider a crystal with ten times the lattice constant of iron, a ' 2.9 nm, and one iron atom per unit cell. The emission efficiency needs to be ten times as high for the same photon yield as for the pure iron crystal, so that, based on our previous estimate, n ' 1% for " K K ¼ 0:1. In summary, we have shown that an optimal size of the scattering volume exists. Larger volumes do not result in higher SNR, even though the photon count is increased. The mean photon count corresponding to optimal SNR can be estimated by (90), (91) or (94), with all coefficients > $ 1.

Discussion
Next, we use our findings to discuss the feasibility of different experiments and to highlight the challenges. We chose two particular experiments to illustrate the fundamentally different experimental regimes.

Determination of illumination spot size
First, consider an experiment to determine the illumination spot size, following Inoue et al. (2019). A spot size of L 2 = 0.5 mm with copper K radiation ( = 0.154 nm) corresponds to a scattering vector jxj $ =L 2 ¼ 3 Â 10 À4 . Correspondingly, the width of the zeroth-order correlation peak has to be resolved, similar to the primary beam profile in classical smallangle X-ray scattering geometry. To this end, the detector is placed in the forward direction a few metres downstream of the sample. The sample can be a foil so that the focal spot size of the excitation pulse determines the transverse extent of the scattering volume. Here, since the scattering vectors are small, the sample thickness L 3 can be chosen largely independently of the detection geometry, in practice up to the self-absorption depth. In particular, the contrast is pulse limited, because jxjL 2 $ ( c C and v exc À v cen vanishes. As a rough estimate, consider a 10 fs excitation pulse, 2 fs coherence time (corresponding to copper K), C P eff ¼ 1=2 and C L eff ¼ 5=9 (the two K lines) so that C eff $ 0:05. The selfabsorption length depends on the material and the emission line. For the K lines of copper with 8.96 g cm À3 mass density the 1=e length is 22 mm. Moreover, by using a highly focused XFEL beam it should be possible to ionize a large fraction of the sample atoms, so that a mean photon number of 0.1 per pixel can be easily achieved according to (94). Using a pixel detector with 1000 Â 1000 pixels provides $ 10 6 parallel realizations. Inserting these numbers into (49) estimates that at least 14 pulses would be required to sample the half maximum S ¼ 0:5 with a signal-to-noise level of 10. The experiment is relatively robust, because translations and rotations of the scattering volume do not affect the signal in first order. Inoue et al. (2019) used such a measurement to determine the pulse duration and the focal size at SACLA. In contrast, if one wanted to perform an analogous experiment at a laser-driven plasma source (Schoenlein et al., 2019), where on the order of 10 8 copper K emissions occur per excitation pulse of about 100 fs, the emission efficiency can be estimated to 10 À5 based on the atom density of copper, a spot size of about 2 mm and a target thickness of 10 mm. The low emission efficiency results in at most 10 À3 photons per pixel according to (94). The squared dependence on the photon count and contrast in (49) then would require more than 10 7 pulses. Ironically, initial attempts of such an endeavor had motivated the present work.

Atomic resolution from Bragg scattering
Second, consider an experiment to resolve crystal planes. As an example, consider the K 1 radiation from iron with C = 2.6 fs, = 0.19 nm and a total K-shell fluorescence yield of 35% (Schoonjans et al., 2011). Pure iron at room temperature has a b.c.c. (body-centered cubic) lattice with lattice constant 0.29 nm and number density " E = 85 nm À3 . Since the wavelength is fixed by the emission line, the lowest (110) reflections have a (unit-less) scattering vector of q=k ¼ 0:927. To reach this peak, the pixel array detector needs a field of view of about 55 in one dimension. The accessible scattering vectors are bounded by jv À v 0 j 2, so that the highest accessible reflection is the (220). As previously discussed, the coherence time constrains the sample size to about '2 mm through (69).
Consider a cubic perfect crystal with diameter 0.5 mm ( C ' 4) so that the contrast is excitation-pulse limited. The sampling constraint with S ¼ 4 requires an angular pixel size of Á < $ 6 mdeg, so that a detector covering 60 requires 10 4 pixel-columns in that direction. This could be realized by arranging ten 1M detectors in an arc. Fewer detector modules with larger gaps could also be used in principle, at the cost of covering only specific Bragg peaks.
A very optimistic estimate on the contrast for a 10 fs-long XFEL pulse gives C eff ¼ 0:07, for perfect conditions. This estimate assumes a perfect single crystal and that the system is perfectly stationary and stable. In particular, the orientation of the sample has to be stable on the order of magnitude of Á=jxj. An uncertainty in the sample orientation that exceeds this limit smears out the Bragg peaks over several resolution elements and therefore decreases the contrast by ðÁ=jxj=Þ 2 . An uncertainty of $ 60 mdeg, for example, would decrease the contrast by 10 À2 down to 7 Â 10 À4 . Whereas a stable sample orientation of 6 mdeg is easily obtained for static and extended samples, it is non-trivial to reach this accuracy in single-particle experiments with random orientations. The coherent diffraction signal of the particles could be used to calculate the particle orientation in each pulse. However, solely from a sampling perspective, detectors with a huge pixel number would be necessary to reach the desired angular accuracy , neglecting further experimental inaccuracies of 3D orientation determination. In this specific example, the coherent diffraction signal needs to be sampled on a detector with a minimum of about 10 4 Â 10 4 pixels to obtain $ 6 mdeg. Note that additional factors such as lattice vibrations (Debye-Waller factor), lattice strain and defects have not been considered.
Next, we discuss the multiplicity of correlation measurements from a single pulse (detector frame). The scattering vectors given by the sets of all pixel pairs are distributed in a volume . Most of the realizations correspond to small scattering vectors, while the larger scattering vectors of the Bragg peaks have a strongly reduced multiplicity . Assuming 10 3 pixel rows gives a conservative estimate of $ 10 3 parallel realizations for a Bragg peak signal.
Assuming a mean photon count of " K K ¼ 0:1 and using (49) we see that on the order of 2 Â 10 6 realizations are required for an SNR of 10, which could be acquired within 2 Â 10 3 pulses, assuming optimal conditions. Since the number of pulses depends quadratically on the contrast, an uncertainty in the sample orientation of 0.06 would increase the required number of pulses to 2 Â 10 7 , which is a sizeable number for a dense mono-elemental iron sample.
It will be challenging to realize a mean photon count of even " K K ¼ 0:1 for dilute samples. Using (91) with C ¼ S ¼ 4 and assuming 100% detection efficiency shows that about 0.4% of the atoms in the pure iron sample need to emit a K photon for a per-pixel photon yield of 0.1. Correspondingly, to achieve the same resolution in a sample with 1% iron content, 40% of the iron atoms would have to emit a K photon, which is already above the K-shell fluorescence yield. Therefore, such dilute samples can only produce a photon count of less than 0.1 photons per pixel, even when fully ionized. Optimizing the geometry for smaller scattering vectors, i.e. coarser resolution, enables one to use larger scattering volumes and can improve the photon yield to some degree. However, the given estimate already corresponds to a sample with diameter 0.5 mm, which does not leave much room for increase in the case of nanocrystallography or single-molecule diffraction.

Summary and conclusions
We have derived comprehensive equations relating the twopoint intensity correlations to the structure factor S of the emitter configuration. We have reproduced the expression given by Classen et al. (2017) up to an additive term of order Oð1=N E Þ, which is hence irrelevant for a large number of atomic emitters. This agreement with the results of  and the classical description presented by Trost et al. (2020) underlines that IDI does neither rely on any nonclassical states of light nor beat any classical limits.
By including time dependence, we have obtained an explicit expression for the contrast between the structural signal and the inherent background in the correlation functions. Equation (28) shows that the total signal can be decomposed into a sum of terms from individual pairs of emitters and that each term is attenuated by a coupling constant C mn that takes values from 0 to 1. Averaging the coupling constant over all pairs of emitters gives an effective contrast C eff of less than one.
We have given an estimate for the SNR, equation (48). In the low-photon regime, it scales linearly with the mean photon count " K K and the signal strength C eff S. Importantly, we have obtained a rigorous upper bound on the SNR for two statistically dependent coincident measurements. Based on our model, we have identified several factors influencing the contrast and have quantified them in terms of experimentally accessible parameters. First, the fact that multiple emission lines contribute to the total signal decreases the contrast by a factor C L eff that is inversely proportional to the number of emission lines and depends on their relative strengths. This emphasizes the need for an effective energy discrimination of the emitted radiation. Second, the lack of polarization of the emitted radiation decreases the contrast by a factor of C P eff , where 1=4 C P eff 1=2 depending on the angular separation of the two observation directions. Third, the finite coherence time C , the finite propagation time through the sample, and the finite duration of the excitation pulse exc strongly affect the contrast by a spatiotemporal factor C ST eff . The scaling of the contrast depends on the relative magnitude of the three involved time scales. In particular, the pulse-limited scaling, (58), does only apply for small scattering volumes, or when both observation directions are in a small cone around the propagation direction of the excitation pulse v exc . In general, the contrast is additionally affected by the extent of the scattering volume, as detailed in (66). It implies that C ST eff < $ 2c C =jxj=L 2 which effectively restricts the linear extent L 2 of the scattering volume for larger scattering vectors jxj. Although for two fixed directions (or pixel coordinates), the size restriction applies to only one dimension, for effective use of a 2D pixel array, involving simultaneous measurements at many different scattering vectors v À v 0 , it effectively applies to all dimensions.
The contrast is also affected by the integration associated with finite detector pixels. If the detector pixels are too large to properly resolve the speckle patterns, the signal is decreased while the inherent background remains unaffected, resulting in a loss of contrast. The scaling of the contrast can be expressed in terms of a critical length L crit ¼ =Á depending on the wavelength and the angular pixel size Á. According to (85), the contrast is decreased proportional to L crit =L j for each dimension j in which the diameter of the scattering volume exceeds a critical length. For small scattering vectors, only the transverse dimensions L 1 and L 2 are relevant while the thickness L 3 is effectively unconstrained, whereas for large scattering vectors, all dimensions contribute approximately equally.
Importantly, the constraints of the critical linear extent of the scattering volume, as discussed above, also directly limit the total photon yield per pixel for SPE due to their finite number density " E . We have shown in particular that the SNR cannot be improved beyond an optimal value, which is significantly lower than anticipated. Increasing the photon count beyond its corresponding optimum decreases the contrast and does not improve the SNR. The optimal photon count and SNR depend on the magnitude of the scattering vectors jxj and thus on the probed length scales. For small scattering vectors, the sample thickness can be increased independently of the transverse size to optimize the photon yield, so that the photon yield is bounded by (94). In contrast, for larger scattering vectors, all dimensions enter with the same scaling and the photon yield is bounded by (91).
Although both constraints had been mentioned by  and Trost et al. (2020), their implications for the photon yield had not yet been quantified or discussed.
Based on these bounds, we have discussed examples of two fundamentally different experimental regimes. First, an experiment aiming to resolve the geometry of the scattering volume as demonstrated by Inoue et al. (2019). Second, an experiment to resolve the crystal structure within the scattering volume as was proposed by Classen et al. (2017). For the latter case, we have shown that the best possible SNR is inversely proportional to the lattice constant when aiming for atomic resolution. We have also given estimates for the best possible photon count and the corresponding fractions of ionized atoms. Moreover, we have discussed how the pulse-topulse orientational stability influences the SNR.
We would like to stress that the simulation presented by Classen et al. (2017) strongly overestimates the achievable photon count and SNR for the discussed geometry. More than 5 photons per pixel are only possible with a large and dense sample with optimal geometry, which is inconsistent with the stated assumptions. Similarly, Trost et al. (2020) used mean photon counts in the range from 10 À2 to 10 3 in their simulations. We have shown that for atomic resolution IDI even " K K $ 10 À1 can be achieved without sacrificing the SNR only for samples with a high density of emitters, whereas for dilute samples with a lower emitter density, such as macromolecules, " K K < $ 10 À1 is more realistic.
In light of the low SNR, even under idealized conditions, and the required pulse-to-pulse stability, we come to the conclusion that utilizing IDI for serial crystallography will be extremely challenging in general -even more so for diffractive imaging of single molecules, as was envisioned by Classen et al. (2017) and deemed achievable by Trost et al. (2020).
We hope that our quantitative estimates may serve as a solid basis for discussing the use of structure determination based on incoherent emissions. In particular, we hope that our results may be useful to assess the limit of length scales that can be reasonably probed. Finally, we would like to mention that the derived limits, which are quite fundamental, scale very favorably with the coherence time of the emissions. It should therefore not escape our attention that emissions with longer coherence time such as visible light fluorescence could result in quite realistic IDI experiments on larger length scales.

APPENDIX A Contrast for arbitrary emitter distributions
We utilized concrete assumptions on the shape of the functions AE and H in the derivation of (85). In particular we assumed the multiplicative separability of the two functions to estimate each dimension separately. Clearly, this strategy fails for more general classes of AE. Here we outline a similar derivation without the need the factorize AE and H.
Following the same approach with the full 3D functions shows that jH Ã AEj AEð0Þ and H Ã AE j j ð2Þ 3 R F fHgðXÞ F fAEgðXÞ d 3 X: Unfortunately the next inequality in (83) cannot be applied here, because F fAEg is a distribution. Nevertheless, F fHg is smooth and slowly oscillating because H is compactly supported. We may therefore approximate the emitter distribution sðXÞ by a continuous emitter density E ðXÞ such that AEðxÞ ' AEð0Þ jF f E gðxÞj 2 jF f E gð0Þj 2 ¼ AEð0Þð2Þ 6 jF f E gðxÞj 2 R E ðXÞ d 3 X 2 : ð96Þ We can thus express the Fourier transform of AE by the normalized cross-correlation of E , Substituting F fAEg in (95) gives Z F fHgðXÞ F fAEgðXÞ d 3 X AEð0Þ We obtain where V denotes the volume of the emitter distribution [in units of ð2=Þ 3 ] and Our approximate expression (78) yields V H = ðÁÞ 3 jx=2jð1 À jx=2j 2 Þ 1=2 , roughly corresponding to the volume of the support of H. Equation (99) is a generalization of (83) for arbitrary detector shapes and scattering volumes. Approximating F fAEg by a continuous emitter density requires F fHg to be approximately constant on the (dimension-less) length scale of the emitter distance. On the other hand, in the limit of very wide H and thus very sharply peaked F fHg the individual -distributions are isolated in (95). We then obtain that H Ã AE j j AEð0Þ independent of the scattering volume and detector size.

APPENDIX B
Variance of dependent random variables Let ðX; X 0 Þ $ P X;X 0 be a pair of possibly dependent random variables on R þ . Define Y; Y 0 conditionally via P Y;Y 0 jX¼x;X 0 ¼x 0 ¼ PoiðxÞ Poiðx 0 Þ as a tensor product of Poisson distributions. We seek an expression for Var½ðY À EYÞðY 0 À EY 0 Þ in terms of X and X 0 . We have EY ¼ EE½Y j X ¼ EX and EY 0 ¼ EX 0 , as well as