Ptychographic X-ray Speckle Tracking

We present a method for the measurement of the phase gradient of a wavefront by tracking the relative motion of speckles in projection holograms as a sample is scanned across the wavefront. By removing the need to obtain an un-distorted reference image of the sample, this method is suitable for the metrology of highly divergent wavefields. Such wavefields allow for large magnification factors, that, according to current imaging capabilities, will allow for nano-radian angular sensitivity and nano-scale sample projection imaging. Both the reconstruction algorithm and the imaging geometry are nearly identical to that of ptychography, except that the sample is placed downstream of the beam focus and that no coherent propagation is explicitly accounted for. Like other x-ray speckle tracking methods, it is robust to low-coherence x-ray sources making is suitable for lab based x-ray sources. Likewise it is robust to errors in the registered sample positions making it suitable for x-ray free-electron laser facilities, where beam pointing fluctuations can be problematic for wavefront metrology. We also present a modified form of the speckle tracking approximation, based on a second-order local expansion of the Fresnel integral. This result extends the validity of the speckle tracking approximation and may be useful for similar approaches in the field.


Introduction
New facilities are providing ever more brilliant x-ray sources. To access the full potential of these sources we need x-ray optics that are capable of focusing light to meet the requirements of various imaging modalities. Thus there is an increasing need for at-wavelength and in-situ wavefront metrology techniques that are capable of measuring the performance of these optics to the level of their desired performance. This is a challenging task, as current x-ray optics technologies are attaining focal spot sizes below 10 nm (Huang et al., 2013;Mimura et al., 2010;Morgan et al., 2015;Bajt et al., 2018;Murray et al., 2019). Furthermore, adaptive optics are being employed to correct for wavefront aberrations by altering the physical state of a lens system in response to real-time measurements of wavefront errors (Mercère et al., 2006;Zhou et al., 2019). Such systems therefore benefit from fast and accurate wavefront metrology for rapid feedback.
One such method, falling into the second category above, was introduced by Bérujon et al. (2012) 1 . This method is a wavefront metrology technique based on near-field speckle-based imaging, which they term the "x-ray speckle tracking" (XST) technique. In XST, the 2D phase gradient of a wavefield can be recovered by tracking the displacement of localised "speckles" between an image and a reference image produced in the projection hologram of an object with a random phase/absorption profile. Additionally, XST can be employed to measure the phase profile of an object's transmission function. Thanks to the simple experimental set-up, high angular sensitivity and compatibility with low coherence sources this method has since been actively developed for use in synchrotron and laboratory light sources, see (Zdora et al., 2018b) for a recent review.
In ptychography, a sample is scanned across the beam wavefront (typically at or near the focal plane of a lens) while diffraction data is collected in the far-field of the sample. An iterative algorithm is usually employed to update initial estimates for the complex wavefront of the illumination and the sample transmission functions. If illuminated regions of the sample overlap sufficiently, then it is possible for a unique solution for both of these functions to be obtained (Hüe et al., 2010). Thus, ptychography is an imaging modality that performs both aberration free sample imaging and wavefront metrology simultaneously. This is in contrast to XST where these two imaging modalities correspond to separate imaging geometries. Table 1 The PXST method
Imaging geometry see Fig. 3 Described in section 2.
Iterative update algorithm see Fig. 6 Described in section 3.

Angular sensitivity
In the plane of the sample, see Eq. 47.
In the plane of the detector, see Eq. 48.
We propose a combined approach, which we term Ptychographic X-ray Speckle Tracking (PXST). In this approach, nearfield inline holograms are recorded as an unknown sample is scanned across an unknown wavefield. Estimates for the undistorted sample projection image and the wavefield are then updated based on the observed speckle displacements. There is no reference image and no additional speckle-producing object is required. This imaging geometry allows for XST to be used for highly divergent x-ray beams, thus expanding the applicability of this simple and robust method to include next generation high numerical aperture x-ray lenses. Berujon et al. (2014) have proposed a similar method, also based on XST and compatible with highly divergent beams.
In their approach, the second derivative of the wavefront phase is measured. Additionally, nano-radian angular sensitivity can be achieved with relatively small step sizes in the scan of the sample on a piezo-driven stage (discussed further in the next section). In contrast, PXST more closely aligns with current XST based techniques, such as the "unified modulated pattern analysis" method of (Zdora et al., 2017;Zdora et al., 2018a), that do not rely on small sample translations.
In section 2, we briefly review the XST method and its extension to PXST. In section 3 we present the governing equation, which is based on a second-order expansion of the Fresnel diffraction integral (presented in section A). The region of validity for the speckle tracking approximation determines the applicable imaging geometries, which are presented in section 4. We present the iterative reconstruction algorithm and the target function, which is to be minimised by the algorithm, in section 5. Conditions for the uniqueness of the solution are discussed in section 6. Finally, the theoretically achievable angular sensitivity of the wavefront reconstruction as well as the imaging resolution of the sample projection image are then presented in section 7. For reference we define the mathematical symbols used throughout the paper in table 2. In table 1 we summarise the main results of this article and refer the reader to the relevant sections.

Table 2
Symbols In(x, z) n th recorded image geometric magnification factor λ wavelength of radiation σ eff smallest resolvable speckle displacement in the plane of the detector i imaginary number a · b dot product between vectors a and b illumination wavefront in the sample plane. w and φ are the intensity and phase respectively.
illumination wavefront in the detector plane. W and Φ are the intensity and phase respectively.
transverse gradient operator

Background
The problem with wavefront metrology is that it is much more difficult to measure a wavefront's phase than its intensity; the intensity can be measured directly by placing a photon counting device at any point in the wavefront's path, whereas the phase information is indirectly encoded in the wavefront's intensity profile as it propagates through space. For plane wave illumination, no measurement of the wavefront's intensity alone will reveal its direction of propagation. One solution to this problem is to place an absorbing object at a known point in the path of the light from which the direction of propagation can then be inferred from the relative displacement between the centre of the object and the shadow cast on a screen some distance away, just as the angle of the sun can be estimated by following the line from a shadow to its object. This simple idea forms the basis of the Hartmann sensor (Daniel & Ghozeil, 1992), shown schematically in Fig 1. Originally designed to measure aberrations in telescopes and later for atmospheric distortions, the Hartmann sensor can be used as an x-ray wavefront metrology tool by cutting a regular grid of small holes, spaced at known intervals (say x i where i is the hole index), in a mask and then recording the shadow image on a detector, which is placed a small distance downstream of the mask.
Provided that each hole can be matched with each shadow image, the angle made between them, Θ(x i ) = arctan (∆x(x i )/z) (in 1D), is equal to the average direction of propagation of that part of the wavefront passing through each hole Θ( is the observed displacement along x of the i th shadow, z is the distance between the mask and the detector and d is the hole width.

Figure 1
Illustration of the Hartmann sensing principle. Top: phase of the wavefront incident on the entrance surface of the mask. The phase has been scaled by λ/2π so that the normal to the tangent is parallel to the local direction of propagation of the wavefront. The arrows indicate the direction of propagation at the centre of each mask hole. Middle: intensity of the wavefront as it propagates from the mask (top) to the detector (bottom). The colour scale is shown on the left. Bottom: the one-dimensional intensity profile of the wavefront as measured by the detector.
With a suitable interpolation routine, Θ(x) can be estimated from the set of Θ(x i ) and the phase profile can be obtained up to a constant with One limitation of this technique is that the resolution obtained is limited by the spacing between each hole in the mask. For example, Mercère et al. (2006) used a Hartmann sensor in an active optic system with a grid of 75 × 75 square holes over a 10 mm × 10 mm area, whereas the CCD detector had a 1024 × 1024 grid of pixels over a 13 mm × 13 mm area. Thus the Hartman sensor had a resolution 10.5 times worse than the CCD detector.
The maximum density of the holes in the grid is limited. This is because the task of uniquely matching each shadow image with each hole becomes more difficult as the hole density is increased -a problem that is easier to appreciate in two dimensions. In 2012 Bèrujon et al. realised a simple yet elegant solution to this problem, one that allowed for an arbitrarily fine grid of "masks" with a resolution and sensitivity limited only by the CCD pixel array and the signal to noise ratio (Bérujon et al., 2012). Their solution, the XST, is to replace the binary mask of identical holes with a thin random phase object, such as a diffuser, as shown in Fig. 2. Because the diffuser is random, the shadow from each sub-region of the diffuser is uniqueencoded by the speckle pattern seen on the detector -so that one can therefore consider any point in the diffuser to be the centre of a virtual Hartmann hole. In this sense, the random object serves as a high density fiducial marker for each of the light rays that pass from the reference or mask plane to the detector.

Figure 2
Illustration of the X-ray Speckle Tracking (XST) principle. Top: as in Fig. 1. Middle: as in Fig. 1, with the binary mask replaced by a random phase / absorption mask (dashed outline). Bottom: Sub-regions of the measured shadow image (solid black line) are compared to the reference shadow image (dashed blue line) to determine displacements (black arrows).
In the Hartmann sensor, it is assumed that the mask is well characterised, so that the shadow positions can be compared to their ideal positions, which are known a priori. However, since the mask is no longer a simple geometric object, it is now necessary to record a reference image of the mask with which to compare the distorted image.
In addition to measurements of a wavefronts phase, the XST principle can be extended to incorporate phase imaging of arbitrary samples. This can be achieved by recording an image of the wavefront with the diffuser (acting as a mask) in the beam path -this image is called the "reference" image. Then, another image is recorded with an additional sample (the one to be imaged) placed in the beam path, in addition to the diffuserthis is referred to simply as the "image". Here the relative displacements between the "reference" and the "image" are due, not to the phase profile of the wavefront, which effects both images equally, but to the phase profile of the sample transmission function.
These are two XST imaging configurations suggested by Bèrujon et al., one for imaging samples and the other for wavefront metrology: (i) in the differential configuration a speckle image is recorded with and without the addition of a sample, and (ii) in the absolute configuration a speckle image is recorded at two detector distances with respect to the mask.
In (i), the relative motion of speckles reveals the local phase gradient of the sample in the beam. Whereas in (ii), the total wavefront phase is recovered and is therefore useful for characterising x-ray beamline optics (this is the configuration shown in Fig. 2). Of course, it is still possible to characterise beamline optics in (i), just not in-situ, by placing the optical element in the sample position. This approach has been useful, for example, in measuring the phase profile of compound refractive lens systems (Berujon et al., 2012) but is impractical for larger systems such as Kirkpatrick-Baez mirrors.
Since the proposal by Bèrujon et al. there have been a number of substantial improvements; see for example the extensive review by Zdora et al. (2018b). For example, Zanette et al. (2014) developed a method where a diffuser is scanned so as to obtain a number of reference / image pairs at different diffuser positions. This step can add a great deal of redundancy, which improves the angular sensitivity of the method and even allows for multi-modal imaging of the sample when employed in the differential configuration. In subsequent publications, this approach has been termed the Unified Modulated Pattern Analysis (UMPA) method (Zdora et al., 2017;Zdora et al., 2018a).
In the absolute configuration, where the reference and image have been recorded at two detector distances, the smallest resolvable angular displacement (the angular sensitivity) is given by the ratio of the effective pixel size, which is the smallest resolvable displacement of a speckle, to the distance between the reference and image planes ∆θ = d/∆z. Therefore, the best accuracy is obtained by maximising the distance between the reference and image planes. However for highly divergent wavefields, as would be produced (for example) by a high numerical aperture lens system, there arises an unavoidable trade off between the wavefront sampling frequency and the angular sensitivity. In this situation the ideal location of the image plane is as far downstream of the lens focus as is required to fill the detector array with the beam, as this maximises the wavefront sampling frequency. In order to minimise ∆θ (maximise ∆z) one should then place the reference plane as close as possible to the beam focus. But in this plane, the footprint of the beam on the detector may be much smaller than in the image plane due to the beam divergence. This leads to a poorly sampled reference image, as only a few pixels will span the wavefront's footprint. Therefore, the smallest resolvable speckle shift will be larger than that obtainable by plane wave illumination, by a factor proportional to the beam divergence.
Realising this, Berujon et al. (2014) devised an XST technique, X-ray Speckle Scanning (XSS), that relies on small displacements of the XST mask between acquired images. No reference image or images are required and the diffraction data is recorded in a single plane. This enables the sampling frequency to be maximised by placing the detector such that the divergent beam fills the pixel array. Without a reference image however, the speckle locations in one image are instead compared to the locations observed in neighbouring images. As the speckle displacements in each image are proportional to the phase gradi-ent, the differential of the speckle locations between images are proportional to the second derivative of the phase; thus it can be viewed as a wavefront curvature measurement. The achievable angular sensitivity is now proportional to the step size of the mask, which can be substantially smaller than the effective pixel size. Interestingly, this approach is similar in principle to the Wigner-distribution deconvolution approach described in (Chapman, 1996).
In the following section, we describe an approach that is similar in principle to the one described above: (iii) Ptychographic-XST: shadow images are recorded as the mask / object is translated across the wavefront. In this method (see Fig. 3) the unknown object acts as both the imaging target and the speckle mask simultaneously. There is no special reference image, rather each image serves as a reference for all other images. Both the wavefront phase (without the influence of the object) as well as the object image (without the influence of wavefront distortions) are determined in an iterative update procedure. At each iteration, speckles 2 in the recorded images are compared with the current estimate of the reference image (in contrast to the XSS method). Images are recorded at a fixed detector distance and there is no trade off between phase sensitivity and the wavefront sampling frequency, making this method suitable for highly divergent beams. Because the speckle displacements are compared between the image and the estimated reference, large angular distortions can be accommodated. This is advantageous because it allows for the sample to be placed very near the beam focus, where the phase gradients across the sample surface are largest and where the magnification factor allows for high imaging resolution and angular sensitivity.

The Speckle Tracking approximation
In this section we describe the governing equation that relates the measured intensities in each image and the reference in terms of the wavefront phase. For monochromatic light, in the Fresnel diffraction regime the image formed on a detector placed a distance z downstream of an object is given by where T (x) represents the exit-surface wave of the light in the plane z = 0. For plane wave illumination, under the projection approximation [see Eq. (2.39) of (Paganin, 2006)], T (x) also represents the transmission function of the object. Now let us suppose that, rather than plane-wave illumination, the object is illuminated by a wavefront with an arbitrary phase (φ) and amplitude ( √ w) profile given by p(x, 0) = √ w(x)e iφ(x) . The observed intensity is now given by (3)

Figure 3
Illustration of the ptychographic-XST method. The beamline illumination was focused (off-axis) in 2D by two linear focusing lenses, with numerical apertures of 0.015 (horizontal) and 0.014 (vertical). The Siemens star sample was placed 371 µm downstream of the focal plane. Images were recorded on a CCD pixel array detector 0.71 m downstream of the focus. The scan data consists of 49 shadow images, recorded as the sample was translated across the beam profile. The wavefront phase and reference image maps were refined iteratively.
For XST based techniques, the challenge is to relate the image (I) to the reference (I ref ) via a geometric transformation. Here, the reference image as defined in Eq. 2 represents an image of the sample, a distance z downstream of the sample plane, that is undistorted by wavefront aberrations or magnified by beam divergence. For the purposes of this section, I ref could be a recorded image, but in following sections we will see that this image can be estimated from a set of distorted images.
We should note that at this point the mathematical description is rather general. For example, in the differential configuration of XST, T (x) would represent the wavefront generated by the diffuser in the plane of the object and p(x, 0) would represent the transmission function of the object. In what follows however, we will continue to describe T (x) as the object or mask transmission function and p(x, z) as the x-ray beam profile (unmodulated by the object).
A common approach to this problem is outlined by Zanette et al. (2014). There, φ is expanded up to first-order, and √ w to zeroth-order, in a Taylor series about the point x: where φ H (x ) and √ w H (x ) are the higher order terms in the expansion. Now we have, for φ H and √ w H ≈ 0, This confirms the intuitive assumption that the local gradient of φ at each position along the sample is converted into a lateral displacement of the speckles observed in the reference image. Equation 6 serves well in the limit where φ H and √ w H approach 0 (i.e. for smooth wavefronts) and is employed in a number of XST based techniques. For example in the UMPA approach (see Eq. 9 in (Zdora et al., 2018b)) the governing equation is given by whereĪ r is the mean intensity of the reference pattern and D(x) is a term the authors refer to as the "dark field signal". This term is related to a reduction in fringe visibility due to fine features in w(x) and, in fact, serves as an alternative contrast mechanism when solved for in addition to the phase gradients. Putting this term aside by setting D = 1, one can see that Eq. 7 reduces to Eq. 6. Given the restrictive nature of the approximations employed however, it is not surprising that Eq. 6 quickly fails to serve as a valid approximation for larger phase gradients. To see this, let us consider a well known analytic solution to I in terms of I ref called the "Fresnel scaling theorem", which is described in, for example, appendix B of (Paganin, 2006). Simply put, it states that: The projected image of a thin scattering object from a point source of monochromatic light is equivalent to a magnified defocused image of the object illuminated by a point source of light infinitely far away.
The derivation is rather simple and so we shall present it here using the current notation. Let us say that the image, I, is formed by the point source of illumination a distance z 1 along the optical axis (the z-axis) and that this distance is large enough that we can ignore intensity variations of the illumination across the sample surface, so that √ w(x) = 1. The probing illumination in the plane of the sample is then given by p(x, 0) = e iπx 2 /λz1 . Substituting this into Eq. 3 and completing the square in the exponent, we have where the geometric magnification factor M = z1+z z1 and z/M is the effective propagation distance (z). But according to Eq. 6 we would have with a geometric magnification factor M = z1 z1−z , in contradiction to the result from the Fresnel scaling theorem. As expected, the results agree in the limit z 1 → ∞, i.e. in the limit where the phase gradient approaches 0. Current formulations for XST based on Eq. 6 (in the absolute configuration), are expected to perform badly when the effective source distance, z 1 , approaches the propagation distance, z, or, in the differential configuration, when the sample transmission function departs significantly from the weak phase approximation.
In a notable departure from this approach, Pagenin et al. have recently developed an alternative description of the speckle tracking approximation based on a "geometric flow" equation (Paganin et al., 2018) This approximation, which closely resembles the transport of intensity equation (Teague, 1983), has the remarkable property that φ may be determined analytically from a reference-image pair, thus permitting the rapid and simple processing of large tomographic data sets. This approach also assumes small and local distortions of the reference image and is, therefore, illsuited as an approximation for larger phase gradients. For example, substituting the quadratic phase for a diverging wavefield, φ = πx 2 λz1 , into Eq. 11 yields This corresponds to a geometric magnification factor of M = Comparing the above equation with Eq. 12, we have m = −z z1−z . Solving for the geometric magnification factor yields M = z1−z z1−2z as above.
Remarkably, with only a minor modification to the speckle tracking formula in Eq. 6, a second-order expansion of the phase term can be accommodated in the Fresnel integral, leading to where ∇φ and ∇Φ are the transverse gradients of the illuminating wavefield phase in the sample and image planes respectively (without the influence of the object) and w and W are the intensity profiles of the illuminating wavefield in the reference and image planes respectively. In Fig. 4 we show a diagram for a hypothetical PXST imaging experiment. In this diagram one can see the lens, focal, sample, reference and image planes respectively. The reference image would have been measured by plane-wave illumination in the plane indicated. A point that is not illustrated in the diagram, is that both the image and the reference image exhibit propagation effects, such as Fresnel fringes. We note, once again, that the speckle tracking approximation above, applies to more imaging geometries / modalities than that displayed in Fig. 4. Equations 13 and 14 are reciprocal statements of the same approximation and choosing between them is a matter of convenience depending on the desired application. We note here that this approximation makes a distinction between the phase gradients in the sample and image planes, whereas it is common to assume that they are similar or related by a lateral scaling factor (magnification). This distinction is not important in cases where the separation between these two planes and the beam divergence are small, but becomes critical for highly magnified imaging geometries or long propagation distances. This approximation is not as strong as the "stationary phase approximation" (Fedoryuk, 1971), which links coherent propagation theory with geometric optics, although the principles used to derive this result are similar. The derivation is straightforward and self-contained but lengthy, and may be found in section A of the appendix.

Figure 4
Schematic diagram for a hypothetical projection imaging experiment. The illuminating beam propagates from left to right and the solid black lines indicate the boundaries of the illumination wavefront. The sample is depicted as a small black filled circle in the sample plane and as a black circle in the reference and image planes. The red lines depict the illumination's wavefront in the sample and image planes, which are not merely related by transverse magnification. The distorted shape of the circle in the image plane represents possible distortions of the speckle produced by the sample and the transverse phase gradients of the illumination.
Equations 13 and 14 posses two beneficial properties for the current analysis: they relate the image and its reference via a geometric transformation and they are consistent with the Fresnel scaling theorem. In fact, the Fresnel scaling theorem is a special case of the above approximations when w(x) = 1 and φ(x) = π x 2 λz1 . Evaluating Eq. 14 for these values of w, φ and usingz = zz 1 /(z + z 1 ) we have , which yield the correct magnification and scaling factors, in agreement with Eq. 8. Similarly, we can evaluate Eq. 13 using where these values for the illumination's wavefront in the plane of the detector follow from the Fresnel approximation for a point source placed a distance z + z 1 upstream and from flux conservation of the beam when w(x) = 1 in the sample plane. Evaluating Eq. 13 yields which is once again, in agreement with Eq. 8. In general, for arbitrary φ, the phase curvature of the illumination may vary in direction, as is the case (for example) in an astigmatic lens system, and also with position in the image. Thus, the magnification is also position dependent and directional: where Given the extended validity of Eq. 13, we suggest that the following modification to the UMPA equation (Eq. 7), will achieve better results: x , or, using the notation of Zdora et al. : We also note that although Paganin et al.'s geometric flow algorithm (Eq. 11) is a poor approximation for larger distortion factors (large M), it may be a more general physical description in the limit M → 1. As the authors note, the term ∝ ∇I ref · ∇φ in the expansion of Eq. 11 accounts for speckle translations that arise from strong intensity gradients of the reference image, i.e. that are not generated from ∇φ alone.

Limits to the Approximation
The second-order speckle tracking approximation of Eqs 13 and 14 are subject to the following approximations where these are additional to the approximations necessary for the paraxial approximation to hold, u(x) = x − λz 2π ∇Φ(x) and g(x, x ) ≡ π |x−x | 2 λz + φ(x ). In general, these approximations hold best for smooth wavefront amplitudes √ w, predominantly quadratic phase φ and large spatial frequencies of the object.
Here, we examine the speckle tracking approximation, in 1D, for the imaging geometry depicted in Fig. 4 and with parameters corresponding to a typical experiment utilising x-ray multilayer Laue lenses. For this example we choose that the illumination is formed by a lens with a hard edged aperture and with the sample placed at two possible distances from the focal plane, z 1 = 500 µm and z 1 = 10 µm. The lens has a numerical aperture of NA = 0.01 and the detector is placed in the far-field of the probe and the sample, with z 1 + z = 1 m. This imaging geometry leads to an effective propagation for plane-wave illumination that is nearly identical to the distance from the focus to the sample (z ≈ z 1 ). The wavelength is 10 −9 m. The sample has a Gaussian profile so that T (x) = 1 − ne −x 2 /2σ 2 , where n = 1 − i was chosen arbitrarily and would be proportional to the sample thickness and the deviation from unity of the refractive index. The Fresnel number is thus F ≈ σ 2 /λz. Comparison between the images formed according to Frensel diffraction theory and the speckle tracking approximation. In the left and middle columns, the sample has a σ-width of 0.15µm and is placed 500µm from the focus. In the left column the sample is centred in the beam profile, whilst in the middle it has been shifted to the edge. In the right column the sample has a σ-width of 0.01µm and is placed 10µm from the focus. First row: the exit surface wave intensities formed by illuminating a small Gaussian object with divergent illumination (black line). The intensities have been scaled by the factor z/z. The sample transmission amplitudes are shown in blue. The angles along the x-axis are given by arctan x/z 1 and match those of the second row. Second row: the intensity of the wavefront in the image plane (black line) and the images formed by the speckle tracking approximation (blue line). The fractional differences are shown in red. The angles are given by arctan x/(z 1 + z).
The wavefronts in the sample and image planes were simulated using the discrete form of the Fresnel diffraction integral. The illumination's wavefront in the image plane is given by p(x, z) = c √ W (x)e iΦ(x) , where c is a complex pre-factor that does not depend on x, √ W (x) was calculated numerically and Φ is almost quadratic, with Φ(x) ≈ πx 2 λ(z1+z) . Note that φ(x), the phase profile of the illumination in the sample plane, is not given by πx 2 λz1 as would be the case for a point source of light (i.e. for NA → ∞). This is because the hard edges of the aperture produce Fresnel fringes that progress from the edge of the wavefront to the focal point at x = 0 as one moves from the image to the focal plane.
To test the validity of the speckle tracking approximation, we compare these simulated Fresnel images with those formed by evaluating Eq. 13. In this case Eq. 13 can be evaluated analyti-cally with where In order to arrive at the above result, we have assumed that Φ is purely quadratic across the wavefront, but this approximation has not been used when simulating the image according to Fresnel diffraction theory.
In appendix B, we suggest a suitable criterion for the speckle tracking approximation to hold for this imaging geometry based on the second criterion above, √ λz + λzq T z 1 NA 1, where q T = 1/X is the spatial frequency corresponding to full period features of size X. This criterion holds for features within the plateau of the illumination profile.
In the first column of Fig. 5, we have placed the sample in the centre of the illumination profile. Here, the left hand side of Eq. 24 evaluates to 0.8 and one can see that the fractional differences between the image and the approximation are small compared to that of the middle column. There, the sample has been shifted to the edge of the illumination profile, where the slope of the illumination amplitude is large. This leads to a breakdown of the second condition ( √ w(x ) ≈ √ w(u(x))) and indeed the discrepancy between the approximation and the image is largest near the edge of the pupil region and slowly reduces for features closer towards the central region.
In the right column of Fig. 5, the sample is smaller, with a σvalue of 0.01µm and has been moved closer to the focal point. The left hand side of Eq. 24 now evaluates to 11.0. As expected, this increase from 0.8 in the first column to 11.0 in the right column corresponds to an increasing discrepancy between the speckle tracking approximation and the image. This image is in the transition region between the near-field and far-field diffraction regimes. Clearly, features in the diffraction outside of the holographic region, where √ W ≈ 0, are not represented at all by the approximation.
In both the second and third examples shown here, the errors in the speckle tracking approximation are dominated by the error in the approximation √ w(x ) ≈ √ w(u(x)). This is not surprising given that the zeroth-order expansion of √ w(x ) about u(x) is a much stronger approximation than the second-order expansion of φ(x ) about u(x) (both approximations are necessary to arrive at the speckle tracking formula).
The increased quality of projection images due to smoother illumination profiles was one of the principle motivations behind Salditt and collaborators' efforts to develop an xray single-mode waveguide, in order to improve their tomoholographic imaging methods; see for example (Krenkel et al., 2017).

Reconstruction Algorithm
In this section we describe the steps necessary to recover estimates for Φ(x) and I ref (x) from a series of N measurements of the kind depicted in Fig. 3; where each recorded image on the detector corresponds to a translation of the sample in the transverse plane by ∆x n (here n is the image index). According to the speckle tracking approximation of Eq. 13, the geometric relationship between the recorded images I n (x) and the un-recorded reference image I ref (x) is given by Translating the sample by ∆x n along the x-axis leads to a corresponding translation of the reference image, this is because the convolution integral in Eq. 2 possesses translational equivariance.

Figure 6
Flow diagram for the PXST iterative refinement cycle. The ←'s represent an update of the item on the left given the items to the right of the arrow. The dashed line arrows represent optional paths in the algorithm. Each step in the diagram corresponds to an equation in the main text: "update reference" to Eq. 27, "update phase gradients" to Eq. 28, "update sample translation" to Eq. 29, "calculate error" to Eq. 26, "regularise phase gradients" to Eq. 30 and finally the "irrotational constraint" and "integrate phase gradients" steps are given by Eq. 31.
To recover estimates for Φ(x) and I ref (x), we choose to minimise the target function in an iterative update procedure with respect to ∇Φ(x) and (as needed) ∆x n , subject to where σ 2 I (x) is the variance of the recorded intensities at each detector pixel, such that σ 2 I (x) = I 2 n (x) − I n (x) 2 n n . In fact Eq. 27 is the analytic solution for the minimum of n ε(n, x) with respect to I ref (x) but for σ 2 I (x) = 1. The reason we have set σ 2 I (x) = 1 for the reference image update is that, in this way, the reference image is formed preferentially from parts of the image with larger intensities and thus will not be unduly effected by detector noise. This is also the update procedure that is often employed in single-mode ptychographic reconstructions.
The update for ∇Φ(x) is given by while holding I ref (x) and ∆x n constant, where argmin ∇Φ means "the argument of the minimum" with respect to ∇Φ, which is to say, the ∇Φ that gives rise to the minimum of n ε(n, x). The minimisation is performed by evaluating n ε(n, x) for possible value of ∇Φ(x) within a pre-defined search window. The update for ∆x n is given by while holding I ref (x) and ∇Φ(x) constant. Once again, the minimisation is performed by evaluating possible value of ∆x n within a pre-defined search window. Additionally, it is often desirable to regularise ∇Φ(x) during the update procedure (especially for the first few iterations), according to where ⊗ is the convolution operator and σ is the regularisation parameter that can be reduced as the iterations proceed. Once the iterative procedure has converged, the phase profile of the illumination (Φ(x)) can be recovered from the gradients (∇Φ(x)) by numerical integration. For this we follow the method outlined in the supplementary section of Zanette et al. (2014). Let us label the final value of the phase gradients by δ(x) ≡ ∇Φ(x). The procedure is then given by where ∇Φ(x) is evaluated numerically and the minimisation is performed via the least squares conjugate gradient method. The fact that Φ is given by the numerical integration of ∇Φ suggests a further constraint that could be employed in the update procedure. As noted by Paganin et al. (2018): since ∇Φ is given by the gradient of a scalar field, then ∇Φ will be irrotational if Φ is continuous and single valued. This follows from the Helmholz theorem, which states that any field can be written as the sum of a gradient and a curl. Since we know that ∇Φ is, by definition, the gradient of Φ, then the curl must be zero ∇ × ∇Φ = 0. In their work, this condition is automatically satisfied by the solution. Here however, we must incorporate this as a separate constraint. An irrotational field f is one that satisfies where f x (x) and f y (x) are the x and y components of the vector field respectively. To ensure that ∇Φ is irrotational, one need only apply the numerical integration in Eq. 31 followed by numerical differentiation as needed during the update procedure. If this condition is not enforced, then the degree to which the recovered ∇Φ is irrotational can be used as a measure of the fidelity of the result. Numerical considerations for the implementation of this iterative update procedure, in addition to the source code developed to implement the PXST algorithm has been published online 3 .
The algorithm presented here is by no means the only approach to solve for the phase gradients and reference image. Indeed, similar problems emerge in many areas of imaging such as computer vision, medical imaging and military targeting applications. In magnetic resonance imaging, the process of identifying the distortions that relate an image to its reference is often termed the "image registration" problem and generating the reference image from a set of distorted views is termed "atlas construction". So called "diffeomorphic image registration" algorithms are popular in that field, many of which are based on Thirion's demons and log-demons algorithm (Thirion, 1998;Lombaert et al., 2014). Indeed, this approach has been employed in the context of XST by Guillon et al. (Berto et al., 2017) to recover the phase gradients from an image/reference pair. Others in the XST field use correlation based approaches, where the geometric mapping between a small region of the distorted image and the reference is determined by the point which provides the greatest correlation correlation coefficient (Zdora et al., 2018b). The approach outlined in this work was employed because of its simplicity and ease of implementation. However, it seems likely (in the authors' view) that one or more of these approaches could be adapted to the current problem in order to produce superior results.

Example reconstruction
Here we provide a brief example of a PXST reconstruction from a simulated 1D dataset. This example is not intended as realistic simulation for an actual experiment, see  for experimental results in 2D. Rather, it serves as a simple illustrative check on the basic principles of PXST.
The simulated sample is similar to that shown in Fig. 2. It was constructed in Fourier space with a Gaussian intensity profile and random phases at each pixel. The real-space object is thus complex valued, so that rays passing through the sample will be both absorbed and deflected in angle. The intensity of the illumination profile, in the plane of the detector, was formed by setting W equal to a top hat function filtered with a Gaussian kernel. This filter produces a smooth tapered fall-off in the intensity near the edges of the beam that helps to avoid aliasing artefacts during numerical propagation of the wavefront. The phase profile, Φ, was constructed with the quadratic function πx 2 /(λ(z 1 + z)), where λ = 1.2 nm (1 keV), z = 20 mm and z 1 = 40 mm, so that the focal plane of the illumination is upstream of the sample in the top panel of Fig. 7 by a distance that is twice that of the sample to detector distance. This leads to an average magnification factor of 1.5. In addition to this, a sinusoidal phase profile was added to the phase in order to simulate the result of aberrations in the lens system, this can be seen as the dashed black line in the bottom panel of Fig. 7. Top: intensity of the wavefront propagating from the sample (z = 0) to the detector plane (z = 20 mm). The linear colour scale ranges from 0 (white) to the maximum value (black). Middle: stack of the 1D images recorded as the sample is scanned across the wavefield (to the right). The colour scale is the same as in top. Bottom: reconstructed and input phase aberrations in the detector plane. See text for further details.
The intensity of the wavefront, I(x, z), propagating from the exit surface of the sample to the detector plane is shown in the top panel. Upon close inspection, one can see that the intensities in the plane of the detector are non-trivially related to those in the exit surface of the sample. As such I(x, 0) cannot be constructed from I(x, z) by a scaling in x (magnification) or indeed by any geometric mapping. We make the point again, that in PXST the "reference" image is not the sample transmission profile, rather, it is the intensity profile one would have observed on a detector placed a distancez = zz 1 /(z 1 + z) = 13.3 mm downstream of the sample illuminated by a plane wave. It is the geometric mapping between the reference image (not the sample transmission) and the recorded images that is used to reconstruct the phase profile of the illumination.
The advantage of this 1D example is that one can visualise the entire dataset in a single 2D image. In the middle panel of Fig. 7 the 1D images formed on the detector, as the sample is scanned across the wavefield, are displayed as an image stack. Along the vertical axis is the image number and the horizontal axis is in angle units, which are the angles made from the point source to each pixel in the image. It is seen that this image stack consists of a series of lines that appear to flow towards positive angles as the image number increases. These are the features in the image that can be obviously tracked through the stack. In this representation, the gradient of the lines at each diffraction angle are proportional the local wavefront curvature. For example, at a diffraction angle of ≈ −0.75 mrad, the wavefront aberrations have a negative curvature and so features at this point in the wavefield are demagnified with respect to the mean. At a diffraction angle of ≈ 0.75 mrad, the opposite is true (with a greater magnification) and the line gradients are shallow with respect to those at ≈ −0.75 mrad. In addition to variations in the geometric magnification, the wavefront aberrations also locally adjust the effective propagation distance of the speckles. This is a non-geometric effect and (unlike the local variations in the magnification) is not accounted for by the PXST reconstruction algorithm. For the current example, we have deliberately set the aberrations such that the local magnification and effective propagation distance vary by a significant fraction across the wavefield. This allows for their effect to be clearly observed in the simulated data, but also leads to some errors in the phases.
The reconstructed phase profile, after 30 iterations of the PXST algorithm, is shown as the blue line in the bottom panel of Fig. 7. The pedestal, tilt and defocus terms have been removed prior to display, to allow the sinusoidal aberration profile to be clearly visualised. Near the edges of the illumination, W ≈ 0 and the phases could not be determined (as expected). Apart from this, the differences between the ground truth and reconstructed phase profile (0.1 rad root mean squared error) are too small to see in this plot but are still much greater the theoretical lower limit of ≈ 0.0001 (this limit is defined in section 7)due to strength of the aberrations (as described in the previous paragraph).

Pedestal and tilt terms are unconstrained in the illumination phase profile.
In general, the solution to the target function in Eq. 26 does not constrain terms proportional to 1, x and y in the recovered phase profile. These terms are sometimes referred to as the "pedestal" and "tilt" components of the pupil function. To see this, consider a phase profile Φ = c + d · x + Φ, where Φ corresponds to the true phase profile in the plane of the detector and where c and d are constants. This leads to the phase gradients ∇Φ = d+∇Φ. Substitution into Eq. 25 yields independent of the pedestal term. Also, the tilt in the phase profile has produced a shift in the reference image, which is generally undetectable unless the position of a feature in the object is known a priori with respect to the detector. Alternatively, we could have absorbed the term λz 2π d as a constant offset in the sample translation vectors ∆x n = ∆x n + λz 2π d. We note that the tilt terms are typically unconstrained in speckle tracking techniques, since it is common to allow for an overall offset in the sample or detector positions.

Speckle patterns of sufficient density are required for a unique solution to exist.
Clearly, in the extreme case where no speckles are recorded then the phase is completely unconstrained, so that and for any Φ . This condition could also be reached in the limit where the fringe visibility of the speckle pattern approaches zero. The requirement for adequate fringe visibility is a point which is emphasised by Zanette et al. (2014) as well as many others in the field (Zdora et al., 2018b). Of course, the above condition could also be reached for any sub-domain of x. If, for example, I n (x ) = W (x ) for all n, then the phase terms are unconstrained at the points x . This suggests a more general (necessary) condition for a unique solution to exist: that a speckle of sufficient contrast must be observed at least once at each position in the image. As an example, the Hartmann sensor discussed in section 2 does not satisfy this condition and, as such, it is necessary to interpolate values of Φ between the mask holes, rendering the method insensitive to high order aberrations that lead to rapid variations in ∆Φ. We should note that this is not always an issue, especially in cases where the low order aberrations of the wavefront are of primary concern, such as when we wish to correct them by some means, or when the low order aberrations are dominant and dominate the imaging performance of the optic. In another extreme, the phase is also unconstrained for N = 1, when only a single image has been recorded. This is because the unknown I ref can be adjusted to accommodate any Φ : for and for any Φ . This situation has arisen because the observed speckles are modelled as a function of ∇Φ and I ref , both of which are refined in the PXST method. So if a given speckle is observed only once, at a location x , and the phase gradient at x is not otherwise constrained, then multiple solutions for ∇Φ and I ref exist. The above two considerations suggest the more general (necessary) conditions for a unique solution to exist: A speckle of sufficient contrast must be observed at least once at each position in the image, and, this speckle must be observed at least twice and at different positions in the image.
Note that the above constraint does not require that every speckle must be observed more than once. The easiest way to satisfy this condition is to use a sample that produces a dense, high contrast array of speckles on the detector, such as a diffuser. With such a sample, the above constraint may be satisfied with just two images (i.e. for N ≥ 2), provided that the sample step size is not greater than half the illuminated region of the sample along the direction of the step.
However, it is possible for a unique solution to exist even when the sample produces only a single observable speckle in each image. In this case, the above condition can be satisfied by scanning the sample such that this speckle is observed at each point in the image. This is a far less efficient means for wavefront sensing than using a diffuser. But this generality allows for nearly any object, such as the Siemens star in Fig. 3 or a Hartmann mask, to be used as a wavefront sensing device.

Ambiguities can arise due to unknown sample positions.
If the sample translation vectors (∆x n ) are unknown, then there exists a family of solutions to Eq. 25, with each solution corresponding to a set of translation vectors related by an affine transformation. Consider a set of translation vectors ∆x n = x 0 + A · ∆x n , where x 0 is an overall offset and the dot product is between the 2 × 2 linear transformation matrix A and the true sample translation vectors. As described previously, any overall offset in the translation vectors generates a corresponding offset in the reference image and a tilt term in the recovered phases. So, neglecting the offset term, we generate this family of solutions by the substitution ∆x n = A −1 · ∆x n into Eq. 25: If, on the other hand, the true sample translation vectors are given by the input values of ∆x n , but with small random offsets of mean 0, then the true solution for the phase and reference image can be recovered from the retrieved values by removing the effect of any affine transformation that may have arisen during the reconstruction. This can be accomplished by minimising n |∆x out n − A · ∆x in n | 2 with respect to A, where ∆x in n and ∆x out n are the input and output values of ∆x n respectively, then generating the corresponding solutions for ∇Φ and I r from the above equations. This situation can arise, for example, due to small relative errors in the translation of a stepper motor or from the pointing jitter of an XFEL pulse.

An unknown rotation of the sample stage axes with respect to the detector axes can be corrected.
A common systematic error for the input sample positions, is an overall rotation of the axes of the sample translation stages with respect to the pixel axes of the detector. In this case the linear transformation matrix reduces to the rotation matrix Here, we can make use of the fact that in general ∇Φ of Eq. 41 is not irrotational for θ = 0 and A = R(θ). If (u, v) ≡ ∇Φ out then we can demand that the vector field R −1 · (u, v) be irrotational. With R −1 = R(−θ) and Eq. 32, we have where the derivatives with respect to x and y are evaluated numerically. However, if the irrotational constraint was enforced during the reconstruction, then the recovered ∆x out n is free of the erroneous rotation and no further analysis is required.

The raster grid pathology produces artefacts for latticelike sample translations.
A common annoyance encountered in ptychography is the so called "raster grid pathology" (Thibault et al., 2009). The raster grid pathology arises when reconstructing both the illumination and sample profiles from diffraction data acquired while the sample is scanned along a regular grid. In that case, the recovered illumination and sample transmission functions may be modulated by any function, so long as it is periodic on a lattice of points upon which all of the sample positions lie. In many cases, the governing equation for a ptychographic reconstruction is given by where F [·] is the Fourier transformation operator over the transverse plane and represents the propagation of the exit-surface wavefront ψ n (x) ≡ T (x − ∆x n )p(x, 0) to the detector (in the far-field of the sample). If the sample is translated along a regular grid, for example, with step size d then ∆x n = n · d, where the vector n = (i n , j n ) is the 2D lattice index corresponding to the n th image (for integer i n and j n ). If we make the substitution p (x, 0) = f (x)p(x, 0) into the above equation, then we have The raster grid pathology can be avoided by ensuring that the sample scan positions lack any translational symmetry, i.e. by scanning the sample in non-regular patterns, for example in a spiral grid, or by adding a random offset to every grid position (Fannjiang, 2018). Given that Eq. 44 is nothing but a special case of the Fresnel integral in Eq. 2 (from which the speckle tracking approximation is derived) it is natural to consider whether or not the same pathology applies here. One can show that a similar pathology does indeed arise in the present case. Here, the illumination's intensity is constrained during the reconstruction, so instead we make the substitution p (x, z) = p(x, z)e ig(x) , which is equivalent to Φ (x) = Φ(x) + g(x), into Eq. 25: So, rather than modulating the reference image with a periodic function, the pathology here creates a geometric distortion of the reference image.

Angular Sensitivity and Imaging Resolution
7.1. The resolution of the reference image is given by the demagnified effective pixel size.
The angular resolution for the wavefield in the plane of the detector (∆Θ Φ ) is not, in general, the same as the angular resolution in the plane of the sample. This is because of the difference in extent between the effective pixel size and the demagnified effective pixel size that arises due to divergent illuminating wavefields. ∆Θ Φ is given by For a fixed focus-to-detector distance, once again z = z t − z 1 , and we have An interesting feature of the above equation is that the effect of the source incoherence δ pix σ s does not vary with z 1 ; as z 1 is decreased the increase in the magnification factor leads to a corresponding decrease in the effective pixel size due to the finite source size, but this is exactly balanced by the reduction in the angle subtended by the demagnified effective pixel size. Therefore, the optimal position for the sample, in order to minimise ∆Θ Φ , is as close to the focal plane as possible. However, this can be a dangerous limit to approach, as the speckle tracking approximation will begin to break down due to the rapidly varying illuminating wavefield -at this point, it would be necessary to employ a fully coherent model for the wavefront propagation, for example, by switching to far-field ptychography. Additionally, it can be beneficial to increase the focus-to-sample distance for practical reasons; for example, increasing z 1 provides a larger field-of-view of the sample in each image which aids in positioning the region of interest of the sample with respect to the illuminating beam and, typically, increases the speckle visibility. For these reasons, it can be beneficial to approach the limit where σ det M ≈ σ s or z 1 ≈ z t σ s σ det .
Here z 1 has been increased until the demagnified pixel size is approximately equal to the demagnified feature size one would observe from a point-like object due to incoherence alone. This represents the transition between modes where ∆Θ Φ is dominated by the detector pixel size (larger z 1 ) and by the finite source size (smaller z 1 ).

7.4.
Although the angular sensitivity of the wavefields in the sample and detector planes differ, the phase sensitivities are equal and are both minimised by maximising the magnification.
The difference between ∆Θ φ and ∆Θ Φ may at first appear to be a curious asymmetry. However, the phase sensitivity in the plane of the sample and the detector are in fact equal for a given sample position. For divergent illumination, the angular distribution of the wavefield in the sample plane is larger than that of the wavefield in the detector plane by a factor of M. On the other hand, the sampling frequency of the wavefield in the sample plane is also larger by the factor M. So, when propagating uncertainties in the angular distributions to the integrated phase profiles, these two effects cancel.
Recall the relationship between the integrated phase profile and the angular distribution of the wavefield in Eq. 1. Uncertainties in the angular distribution are thus scaled by the step size in x after integration, so that Therefore, with σ ref = σ eff /M, we have that ∆φ = ∆Φ ≈ 2π λ σ 2 eff /M and both are minimised by placing the sample as close as possible to the focal plane, as described in the previous subsection.

Discussion and conclusion
We have presented a modified form of the speckle tracking approximation, valid to second-order in a local expansion of the phase term in the Fresnel integral. This result extends the validity of the speckle tracking approximation, thus allowing for greater variation of the unknown phase profile and thus for greater magnification factors when the wavefield has a high degree of divergence (such as that produced by a high numerical aperture lens system) or, when imaging a sample in the differential configuration of XST, to allow for greater phase variation across the transmission function of the sample (such as that produced by a thick specimen). We suggest that this approximation can be used, with little modification, in many of the existing XST applications and suggest such a modification for the UMPA approach.
We have also presented the PXST method, a wavefront metrology tool capable of dealing with highly divergent wavefields (like XSS) but unlike XSS, the resolution does not depend on the step size of the sample translations transverse to the beam. Coupled with a high numerical aperture lens, PXST provides access to nanoradian angular sensitivities as well as highly magnified views of the sample projection image. With a suitable scattering object, which in this case is the sample itself, a minimum of two images are required although more images will improve robustness and resolution.
We must emphasise that it is only the projection image of the sample that is recovered. The phase and transmission profile of the sample must be inferred from the projection image via standard techniques (Wilkins et al., 2014). This is in contrast to other methods that provide multiple modes of imaging of the sample, such as the transmission, phase and the so called "darkfield" profiles. What distinguishes PXST from these methods, is that the sample image is obtained in addition to the wavefield phase in the absolute configuration of XST; that is, both can be obtained from a single scan series of the sample.
A further application of this method is to use it as an efficient prior step to Fourier ptychography, by recording images out of focus. The recovered illumination and sample profiles can be used as initial estimates for a Fourier ptychographic reconstruction. Experimentally, this additional step can be achieved simply by moving the sample towards the focal plane of the lens. In some cases, this additional step would not even be required, so that speckle tracking followed by ptychography could be performed on the same dataset.
For experimental results utilising the PXST method, see . These results are based on a campaign of measurements for the development of high numerical aperture wedged multi-layer Laue lens systems.