
JOURNAL OF SYNCHROTRON RADIATION
ISSN: 1600-5775

Multi-stage deep learning artifact reduction for parallel-beam computed tomography


Leiden University, Einsteinweg 55, 2333 CC Leiden, The Netherlands
*Correspondence e-mail: j.shi@liacs.leidenuniv.nl

Edited by A. Stevenson, Australian Synchrotron, Australia (Received 19 September 2024; accepted 14 January 2025; online 17 February 2025)

Computed tomography (CT) using synchrotron radiation is a powerful technique that, compared with laboratory CT techniques, offers high spatial and temporal resolution while also providing access to a range of contrast-formation mechanisms. The acquired projection data are typically processed by a computational pipeline composed of multiple stages. Artifacts introduced during data acquisition can propagate through the pipeline and degrade image quality in the reconstructed images. Recently, deep learning has shown significant promise in enhancing image quality for images representing scientific data. This success has driven increasing adoption of deep learning techniques in CT imaging. Various approaches have been proposed to incorporate deep learning into computational pipelines, but each has limitations for synchrotron CT, either in properly addressing the specific artifacts or in computational efficiency. Recognizing these challenges, we introduce a novel method that incorporates separate deep learning models at each stage of the tomography pipeline — projection, sinogram and reconstruction — to address specific artifacts locally in a data-driven way. Our approach includes bypass connections that feed both the outputs from previous stages and raw data to subsequent stages, minimizing the risk of error propagation. Extensive evaluations on both simulated and real-world datasets illustrate that our approach effectively reduces artifacts and outperforms comparison methods.

1. Introduction

X-ray computed tomography (CT) is a non-invasive imaging technique widely employed in fields ranging from medical diagnostics to industrial non-destructive testing and security screening (Hansen et al., 2021[Hansen, P. C., Jørgensen, J. & Lionheart, W. R. (2021). Computed Tomography: Algorithms, Insight, and Just Enough Theory. SIAM.]). Synchrotron facilities are at the high end of X-ray tomography capabilities, providing the flux required to image a range of specimens at high spatial and temporal resolutions (Thompson et al., 1984[Thompson, A., Llacer, J., Campbell Finman, L., Hughes, E., Otis, J., Wilson, S. & Zeman, H. (1984). Nucl. Instrum. Methods Phys. Res. 222, 319-323.]; Westneat et al., 2008[Westneat, M. W., Socha, J. J. & Lee, W.-K. (2008). Annu. Rev. Physiol. 70, 119-142.]; Sun et al., 2018[Sun, Z., Ng, C. K. & Sá Dos Reis, C. (2018). Quant. Imaging Med. Surg. 8, 609-620.]; Labriet et al., 2018[Labriet, H., Nemoz, C., Renier, M., Berkvens, P., Brochard, T., Cassagne, R., Elleaume, H., Estève, F., Verry, C., Balosso, J., Adam, J. F. & Brun, E. (2018). Sci. Rep. 8, 12491.]). Moreover, the synchrotron environment provides access to a range of imaging modes for probing absorption, phase, fluorescence, diffraction and other contrast-formation mechanisms. As illustrated in Fig. 1[link], synchrotron tomography datasets are typically processed by a computational pipeline (Hintermüller et al., 2010[Hintermüller, C., Marone, F., Isenegger, A. & Stampanoni, M. (2010). J. Synchrotron Rad. 17, 550-559.]; Gürsoy et al., 2014[Gürsoy, D., De Carlo, F., Xiao, X. & Jacobsen, C. (2014). J. Synchrotron Rad. 21, 1188-1193.]; Ganguly et al., 2021[Ganguly, P. S., Pelt, D. M., Gürsoy, D., de Carlo, F. & Batenburg, K. J. (2021). J. Synchrotron Rad. 28, 1583-1597.]), which involves three sequential stages: (i) acquiring X-ray projection images, (ii) converting these projections into sinogram images, and (iii) reconstructing the data to visualize the object's internal structure. Each stage may consist of multiple processing steps. Implementations of such pipelines can be found in several open-source software packages (Gürsoy et al., 2014[Gürsoy, D., De Carlo, F., Xiao, X. & Jacobsen, C. (2014). J. Synchrotron Rad. 21, 1188-1193.]; Van Aarle et al., 2015[Aarle, W. van, Palenstijn, W. J., De Beenhouwer, J., Altantzis, T., Bals, S., Batenburg, K. J. & Sijbers, J. (2015). Ultramicroscopy, 157, 35-47.]; Biguri et al., 2016[Biguri, A., Dosanjh, M., Hancock, S. & Soleimani, M. (2016). Biomed. Phys. Eng. Expr. 2, 055010.]; Kazantsev et al., 2022[Kazantsev, D., Wadeson, N. & Basham, M. (2022). SoftwareX, 19, 101157.]; Kim & Champley, 2023[Kim, H. & Champley, K. (2023). arXiv:2307.05801.]), such as TomoPy, the ASTRA Toolbox and LEAP.

[Figure 1]
Figure 1
Schematic representation of a typical synchrotron CT pipeline, consisting of three stages: projection, sinogram and reconstruction. Each stage may consist of multiple processing steps.
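
To make this three-stage flow concrete, the sketch below processes a projection stack with the open-source TomoPy package mentioned above. It is a minimal illustration of the generic pipeline, not the workflow of any particular beamline; the input file names and the choice of the gridrec algorithm are assumptions for the example.

    import numpy as np
    import tomopy

    # Raw projection stack of shape (n_angles, M, N) plus flat- and
    # dark-field calibration images; loading is beamline-specific, so
    # here they are simply assumed to exist as NumPy files.
    proj = np.load('projections.npy')
    flat = np.load('flats.npy')
    dark = np.load('darks.npy')

    # Projection stage: flat-field correction and negative-log transform.
    proj = tomopy.normalize(proj, flat, dark)
    proj = tomopy.minus_log(proj)

    # Sinogram stage: with the angle axis first, each fixed detector row
    # of the stack already constitutes one sinogram.
    theta = tomopy.angles(proj.shape[0], 0.0, 180.0)

    # Reconstruction stage: analytic reconstruction of all slices.
    recon = tomopy.recon(proj, theta, algorithm='gridrec')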

The quality of the reconstructed images is often compromised by various artifacts introduced during data acquisition. These artifacts propagate through the processing pipeline and result in artifacts in the reconstructed images. For example, ring artifacts arise from variations in the detector response, and excessively high-energy photons captured by the detector can lead to zinger artifacts. This propagation of artifacts can be mitigated by including artifact-specific data processing steps early in the computational pipeline, such as processing sinogram images for ring artifact reduction (Rivers, 1998[Rivers, M. (1998). Tutorial: Introduction to X-ray Computed Microtomography Data Processing, University of Chicago, USA.]; Titarenko et al., 2010[Titarenko, S., Withers, P. J. & Yagola, A. (2010). Appl. Math. Lett. 23, 1489-1495.]; Paleo & Mirone, 2015[Paleo, P. & Mirone, A. (2015). J. Synchrotron Rad. 22, 1268-1278.]; Titarenko, 2016[Titarenko, V. (2016). J. Synchrotron Rad. 23, 1447-1461.]; Vo et al., 2018[Vo, N. T., Atwood, R. C. & Drakopoulos, M. (2018). Opt. Express, 26, 28396-28412.]; Mäkinen et al., 2021[Mäkinen, Y., Marchesini, S. & Foi, A. (2021). J. Synchrotron Rad. 28, 876-888.]) and processing projection images for zinger artifact reduction (Rivers, 1998[Rivers, M. (1998). Tutorial: Introduction to X-ray Computed Microtomography Data Processing, University of Chicago, USA.]; Gürsoy et al., 2014[Gürsoy, D., De Carlo, F., Xiao, X. & Jacobsen, C. (2014). J. Synchrotron Rad. 21, 1188-1193.]; Faragó et al., 2022[Faragó, T., Gasilov, S., Emslie, I., Zuber, M., Helfen, L., Vogelgesang, M. & Baumbach, T. (2022). J. Synchrotron Rad. 29, 916-927.]; Mertens et al., 2015[Mertens, J., Williams, J. & Chawla, N. (2015). Nucl. Instrum. Methods Phys. Res. A, 800, 82-92.]). While these methods are commonly used in practice, they require parameter tuning to reduce artifacts properly and can introduce additional artifacts, especially if the parameters are not set optimally (Pelt et al., 2018[Pelt, D. M., Batenburg, K. J. & Sethian, J. A. (2018). J. Imaging, 4, 128.]). We acknowledge that, while the effort required for hand-tuning may be relatively modest, significant expertise is often required.

Meanwhile, deep learning has significantly advanced the state-of-the-art in image quality enhancement for a variety of imaging tasks, including denoising, super-resolution and deblurring for natural images (Burger et al., 2012[Burger, H. C., Schuler, C. J. & Harmeling, S. (2012). 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2392-2399. IEEE.]; Zhang et al., 2017[Zhang, K., Zuo, W., Chen, Y., Meng, D. & Zhang, L. (2017). IEEE Trans. Image Process. 26, 3142-3155.]; Pan et al., 2016[Pan, J., Sun, D., Pfister, H. & Yang, M.-H. (2016). Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1628-1636.]; Liang et al., 2021[Liang, J., Cao, J., Sun, G., Zhang, K., Van Gool, L. & Timofte, R. (2021). Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1833-1844.]). Its effectiveness has led to increased use in CT imaging (Pelt et al., 2018[Pelt, D. M., Batenburg, K. J. & Sethian, J. A. (2018). J. Imaging, 4, 128.]; Nauwynck et al., 2020[Nauwynck, M., Bazrafkan, S., Van Heteren, A., De Beenhouwer, J., Sijbers, J., München, Z. & Sammlungen, S. (2020). The International Conference on Image Formation in X-ray Computed Tomography. Regensburg, Germany.]; Liu et al., 2023[Liu, Y., Wei, C. & Xu, Q. (2023). Med. Phys. 50, 4308-4324.]; Zhu et al., 2023[Zhu, Y., Zhao, H., Wang, T., Deng, L., Yang, Y., Jiang, Y., Li, N., Chan, Y., Dai, J., Zhang, C., Li, Y., Xie, Y. & Liang, X. (2023). Comput. Biol. Med. 155, 106710.]; Shi et al., 2023[Shi, J., Pelt, D. M. & Batenburg, K. J. (2023). International Workshop on Machine Learning in Medical Imaging, pp. 52-61. Springer.]). Existing efforts toward integrating deep learning in the synchrotron tomography pipeline roughly fall into three different categories:

(i) Integrating deep learning based post-processing steps after the image reconstruction step. Deep learning models, particularly convolutional neural networks (CNNs), are often applied on reconstructed images to enhance image quality (Pelt et al., 2018[Pelt, D. M., Batenburg, K. J. & Sethian, J. A. (2018). J. Imaging, 4, 128.]; Chen et al., 2017[Chen, H., Zhang, Y., Zhang, W., Liao, P., Li, K., Zhou, J. & Wang, G. (2017). Biomed. Opt. Expr. 8, 679-694.]; Yang et al., 2018[Yang, Q., Yan, P., Zhang, Y., Yu, H., Shi, Y., Mou, X., Kalra, M. K., Zhang, Y., Sun, L. & Wang, G. (2018). IEEE Trans. Med. Imaging, 37, 1348-1357.]; Gholizadeh-Ansari et al., 2020[Gholizadeh-Ansari, M., Alirezaie, J. & Babyn, P. (2020). J. Digit. Imaging, 33, 504-515.]; Yan et al., 2023[Yan, R., Liu, Y., Liu, Y., Wang, L., Zhao, R., Bai, Y. & Gui, Z. (2023). IEEE Trans. Comput. Imaging, 9, 83-93.]; Liu et al., 2023[Liu, Y., Wei, C. & Xu, Q. (2023). Med. Phys. 50, 4308-4324.]). Although these methods work well for noise reduction, they struggle with non-local artifacts that cannot be easily addressed using local image information, as shown in Fig. 2[link]. Additionally, coupling these models with classical methods that operate before reconstruction introduces the risks associated with parameter tuning and new artifact creation.

[Figure 2]
Figure 2
Comparison of deep learning based post-processing performance on reconstructed images from the simulated foam phantom dataset (Pelt et al., 2022[Pelt, D. M., Hendriksen, A. A. & Batenburg, K. J. (2022). J. Synchrotron Rad. 29, 254-265.]) affected by local artifacts (noise) and non-local artifacts. The red and green insets show enlarged views of the affected areas. Peak signal to noise ratio (PSNR) and structural similarity index measure (SSIM) (Wang et al., 2004[Wang, Z., Bovik, A. C., Sheikh, H. R. & Simoncelli, E. P. (2004). IEEE Trans. Image Process. 13, 600-612.]) values are provided in the top-right and lower-right corners, respectively. Deep learning based post-processing is less effective in reducing global artifacts than noise. While this figure demonstrates differences in artifact reduction based on simulation, similar challenging scenarios can occur in real-world dynamic experiments (Bührer et al., 2019[Bührer, M., Stampanoni, M., Rochet, X., Büchi, F., Eller, J. & Marone, F. (2019). J. Synchrotron Rad. 26, 1161-1172.]) or fast scan setups (Raufaste et al., 2015[Raufaste, C., Dollet, B., Mader, K., Santucci, S. & Mokso, R. (2015). Europhys. Lett. 111, 38004.]; Mokso et al., 2017[Mokso, R., Schlepütz, C. M., Theidel, G., Billich, H., Schmid, E., Celcer, T., Mikuljan, G., Sala, L., Marone, F., Schlumpf, N. & Stampanoni, M. (2017). J. Synchrotron Rad. 24, 1250-1259.]) at synchrotron facilities.

(ii) Replacing individual blocks in the computational pipeline with deep learning modules for specific artifact reduction tasks. By incorporating deep learning modules into the CT pipeline at the projection and sinogram stages and training them in a supervised manner for a specific artifact-reduction task, classical algorithms for artifact reduction can be replaced by learned methods (Yuan et al., 2018[Yuan, H., Jia, J. & Zhu, Z. (2018). 15th International Symposium on Biomedical Imaging (ISBI2018), pp. 1521-1524. IEEE.]; Ghani & Karl, 2019[Ghani, M. U. & Karl, W. C. (2019). IEEE Trans. Comput. Imaging, 6, 181-193.]; Nauwynck et al., 2020[Nauwynck, M., Bazrafkan, S., Van Heteren, A., De Beenhouwer, J., Sijbers, J., München, Z. & Sammlungen, S. (2020). The International Conference on Image Formation in X-ray Computed Tomography. Regensburg, Germany.]; Fu et al., 2023[Fu, T., Wang, Y., Zhang, K., Zhang, J., Wang, S., Huang, W., Wang, Y., Yao, C., Zhou, C. & Yuan, Q. (2023). J. Synchrotron Rad. 30, 620-626.]; Zhu et al., 2023[Zhu, Y., Zhao, H., Wang, T., Deng, L., Yang, Y., Jiang, Y., Li, N., Chan, Y., Dai, J., Zhang, C., Li, Y., Xie, Y. & Liang, X. (2023). Comput. Biol. Med. 155, 106710.]). However, these models only deal with a specific step of the computational pipeline and may introduce new artifacts in certain cases. When applied in sequence together with post-processing, newly introduced artifacts can propagate through the pipeline and result in artifacts in the reconstructed images.

(iii) Replacing the entire pipeline with end-to-end deep learning. Ideally, a fully integrated end-to-end deep learning pipeline (Wang et al., 2020[Wang, W., Xia, X.-G., He, C., Ren, Z., Lu, J., Wang, T. & Lei, B. (2020). IEEE Trans. Comput. Imaging, 6, 1548-1560.]; Chen et al., 2022[Chen, C., Xing, Y., Gao, H., Zhang, L. & Chen, Z. (2022). IEEE Trans. Med. Imaging, 41, 2912-2924.]) would comprehensively address the challenges mentioned above, such as error propagation, the introduction of new artifacts and the insufficient reduction of non-local artifacts. However, the vast data size in synchrotron tomography experiments makes this approach computationally impractical, since the task becomes a quasi-3D problem even when using 2D networks. The data volumes typical for synchrotron tomography far exceed the memory capacities of most modern GPUs. For example, consider a dataset with a 2000³ volume and 2000 projection angles, which demands about 164.8 GB of memory just for a single forward inference using a 2D mixed-scale dense (MS-D) network (Pelt & Sethian, 2018[Pelt, D. M. & Sethian, J. A. (2018). Proc. Natl Acad. Sci. USA, 115, 254-259.]) with a depth of 100. Such large datasets are not only common in synchrotron-based setups but are also increasingly encountered in laboratory-based CT systems, especially when using large flat-panel detectors for high-quality scans with a high number of projections. The primary distinction between these systems lies in the high-throughput acquisition enabled by the higher temporal resolution of synchrotron CT. End-to-end training is even more demanding, as it additionally requires storing gradients and other computational overhead for back propagation, making such an approach impractical.

The aim of this work is to propose a novel deep learning based method, specifically designed for artifact reduction throughout the synchrotron tomography pipeline, that combines the ability to address multiple types of artifacts simultaneously, each in its own data domain, with the computational efficiency required for processing large synchrotron tomography datasets. We address the above-mentioned challenges by:

(i) Comprehensive integration. We incorporate deep learning across all three CT processing stages — projections, sinograms and reconstructions — targeting specific artifacts at each stage to exploit deep learning's full potential while simplifying artifact reduction by maintaining their locality.

(ii) Bypass connections. We include bypass connections that provide each stage access not only to all outputs from preceding stages but also to their raw inputs, reducing the risk of error propagation throughout the pipeline.

(iii) Efficient training. To avoid the computational complexity of fully end-to-end training, we adopt 2D CNNs for each stage and propose to train them individually to ensure computational efficiency and practical applicability.

This paper is organized as follows. Section 2[link] provides an overview of the related concepts and notation that underlie our motivation for the proposed method. In Section 3[link] we describe the details of our method, which involves using a series of CNNs to reduce artifacts at different stages of the processing pipeline, and describe ways of obtaining high-quality reference data for training. Section 4[link] covers the experimental design and implementation specifics. In Section 5[link] we present and analyze the experimental results. Section 6[link] is dedicated to discussing the implications and significance of our findings. Lastly, in Section 7[link], we conclude the paper by highlighting potential application areas for our method.

2. Notation and concepts

In this section, we first provide a comprehensive overview of the CT pipeline that forms the basis of our method. We then define general notation for CT artifacts that arise from the acquisition process and show how they propagate through the pipeline, leading to artifacts in the reconstructed images. Next, we discuss deep learning based denoising methods for CT images and define the corresponding notation. Finally, we introduce notation for the classical artifact reduction operations that are performed at different steps of the pipeline. This foundation establishes the context and terminology for our proposed artifact reduction method.

2.1. CT pipeline

We apply our multi-stage method to a typical synchrotron CT processing pipeline with three sequential steps (Hintermüller et al., 2010[Hintermüller, C., Marone, F., Isenegger, A. & Stampanoni, M. (2010). J. Synchrotron Rad. 17, 550-559.]; Gürsoy et al., 2014[Gürsoy, D., De Carlo, F., Xiao, X. & Jacobsen, C. (2014). J. Synchrotron Rad. 21, 1188-1193.]; Ganguly et al., 2021[Ganguly, P. S., Pelt, D. M., Gürsoy, D., de Carlo, F. & Batenburg, K. J. (2021). J. Synchrotron Rad. 28, 1583-1597.]), illustrated in Fig. 1[link]. First, the CT system scans the object and acquires a series of projection images. Second, those projection images are rearranged into sinogram images. Finally, the reconstructed images are computed.

In X-ray CT, the projection of an object from a specific angle is fundamentally described by the Beer–Lambert law, which models the attenuation of X-rays as they pass through the object. This law states that the intensity I of X-ray radiation exiting the object is exponentially related to the integral of the object's linear attenuation coefficients μ(x, y, z) along the path s of the beam through the object. Mathematically, this relationship is expressed as [I = I_{0}\exp[-\!\int\mu(x,y,z)\,{\rm{d}}s]], where I0 is the initial intensity of the X-ray beam. We define a projection image as the intensities captured by the detector along a certain angle. Given a detector with M × N pixels, the object is scanned with Na scanning angles, producing a set of projection images [{\bf p}\in{\bb{R}}^{N_{\rm{a}}\times M\times N}]. Each projection image is influenced by inherent imaging noise such as Poisson noise, arising from the statistical nature of X-ray photon detection. Additionally, artifacts caused by systematic errors in the projection image, like ring artifacts, can be addressed by techniques such as flat-field correction (Prell et al., 2009[Prell, D., Kyriakou, Y. & Kalender, W. A. (2009). Phys. Med. Biol. 54, 3881-3895.]; Van Nieuwenhove et al., 2015[Van Nieuwenhove, V., De Beenhouwer, J., De Carlo, F., Mancini, L., Marone, F. & Sijbers, J. (2015). Opt. Express, 23, 27975-27989.]).
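
As a small numerical illustration of this relationship, the following NumPy sketch inverts the Beer–Lambert law, combining flat-field correction with the negative-log transform; the helper name and the synthetic values are ours.

    import numpy as np

    def intensities_to_line_integrals(I, flat, dark, eps=1e-9):
        """Invert the Beer-Lambert law: given measured intensities I and
        flat/dark calibration fields, return -log(transmission), i.e. the
        line integral of the attenuation coefficient along each ray."""
        transmission = (I - dark) / np.maximum(flat - dark, eps)
        return -np.log(np.clip(transmission, eps, None))

    # 50% transmission corresponds to a line integral of ln 2 ~ 0.693.
    p = intensities_to_line_integrals(np.array([500.0]), np.array([1000.0]),
                                      np.array([0.0]))
    print(p)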

We first introduce the rearrange operation, denoted as [{\cal T}], where [{\cal T}:{\bb{R}}^{N_{\rm{a}}\times M\times N}\to{\bb{R}}^{M\times N_{\rm{a}}\times N}]. We then define sinogram images as the rearranged projection images [{\bf s}\in{\bb{R}}^{M\times N_{\rm{a}}\times N}]. In the reconstruction stage, reconstruction methods are applied to the sinogram images to compute the reconstructed images [{\bf r}\in{\bb{R}}^{Z\times Y\times X}]. The reconstruction operation is [{\cal R}], where [{\cal R}:{\bb{R}}^{M\times N_{\rm{a}}\times N}\to{\bb{R}}^{Z\times Y\times X}]. The projection images p, sinogram images s and reconstructed images r represent the same underlying object and are three data representations in the pipeline.
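
In code, the rearrange operation [{\cal T}] amounts to a single axis permutation of the projection stack, as the following minimal NumPy sketch shows.

    import numpy as np

    def rearrange(projections):
        """T: R^(Na x M x N) -> R^(M x Na x N). Swap the angle and
        detector-row axes so that each slice along axis 0 is the
        sinogram of one detector row."""
        return np.transpose(projections, (1, 0, 2))

    p = np.random.rand(1024, 512, 512)   # Na = 1024 angles, 512 x 512 detector
    s = rearrange(p)
    assert s.shape == (512, 1024, 512)   # M sinograms of size Na x N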

2.2. Artifacts

For a general artifact introduced by the imaging process, we illustrate its propagation through the pipeline, leading to artifacts in the reconstructed images. Additionally, we provide schematic representations of the different artifacts in each pipeline stage to facilitate a better understanding of their characteristics.

Corrupted projections [\hat{{\bf p}}] = [\left\{\hat{{\bf p}}_{1},\ldots,\hat{{\bf p}}_{N_{\rm{a}}}\right\}] consist of a series of corrupted projection images [\hat{{\bf p}}_{i}], and are a combination of the underlying clean projection images p and artifacts n,

[\hat{{\bf p}} = {\bf p}+{\bf n}. \eqno(1)]

For example, n can contain noise, offsets from miscalibrated detector pixels that cause ring artifacts, and/or outliers that cause zinger artifacts. In a scenario where no artifact removal steps are included in the CT pipeline and, for illustration, we assume a linear reconstruction operation [{\cal R}_{\rm lin}], the resulting reconstructed images [\hat{{\bf r}}] would be expressed as

[\hat{{\bf r}} = {\cal R}_{\rm{lin}}\left[{\cal T}\left(\hat{{\bf p}} \right)\right] = {\cal R}_{\rm{lin}}\left[{\cal T}\left({\bf p}\right)\right]+{\cal R}_{\rm{lin}}\left[{\cal T}\left({\bf n}\right)\right]. \eqno(2)]

Here, [{\cal R}_{\rm{lin}}\left[{\cal T}\left({\bf p}\right)\right]] represents the ideal artifact-free reconstruction image, and [{\cal R}_{\rm{lin}}\left[{\cal T}\left({\bf n}\right)\right]] represents the artifacts originating from artifact term n in corrupted projection images. In this manner, artifacts that occur during acquisition are passed through the pipeline and become artifacts in the reconstructed images. In the following, we describe three common artifact types encountered in (high-energy) CT systems; existing methods for reducing artifacts are discussed in Section 2.4[link]:

(i) Noise. Poisson noise is a common artifact in CT, arising from insufficient photon counts at the detector (Boas & Fleischmann, 2012[Boas, F. E. & Fleischmann, D. (2012). Imaging Med. 4, 229-240.]). Low-dose CT can suffer from strong noise artifacts due to fewer photons captured by the detector pixels. Poisson noise is introduced during acquisition and presents as local disruptions in projection images. It corresponds to a local perturbation in all stages of the CT pipeline, as illustrated in Fig. 3[link].

[Figure 3]
Figure 3
Representations of noise, ring and zinger artifacts in projection, sinogram and reconstruction stages. Red patterns are schematic illustrations of distortions. Noise is a local artifact in images of all stages. Distorted pixel values in the same positions of projection images become a line in the sinogram, resulting in a ring artifact in the reconstructed image. Extremely high pixel values (randomly distributed in different positions) remain as high-value spots in the sinogram and cause crossing lines in the reconstructed image as zinger artifacts.

(ii) Ring artifacts. Ring artifacts arise from systematic detector errors, such as miscalibrated or defective elements in the detector. For example, as demonstrated in Fig. 3[link], a detector element may record its value with an additive offset applied to the actual data (Boas & Fleischmann, 2012[Boas, F. E. & Fleischmann, D. (2012). Imaging Med. 4, 229-240.]; Pelt & Parkinson, 2018[Pelt, D. M. & Parkinson, D. Y. (2018). Meas. Sci. Technol. 29, 034002.]). Consistent detector offsets in the projection images translate to straight lines in sinogram images and become ring-like artifacts in the reconstructed images.

(iii) Zinger artifacts. Zinger artifacts often appear in high-energy CT, such as synchrotron CT. They are caused by extremely high-value spots in projection images because the detector occasionally records high-energy photons (Mertens et al., 2015[Mertens, J., Williams, J. & Chawla, N. (2015). Nucl. Instrum. Methods Phys. Res. A, 800, 82-92.]). Since these spots' occurrence is stochastic among projections, they appear as prominent local spots in sinograms, as shown in Fig. 3[link]. After reconstruction, the local artifacts become crossing streaks in the reconstructed images.

2.3. Post-processing with deep learning

We denote the CNN as fθ, with θ representing the network's learnable parameters. In the context of CT image post-processing, the CNN operates on a corrupted reconstruction image [\hat{{\bf r}}] to yield a processed output rPP as follows,

[{\bf r}^{\rm{PP}} = f_{\theta}(\hat{{\bf r}}). \eqno(3)]

This paper focuses on 2D CNNs for their computational efficiency over 3D counterparts, which is especially important given the large size of CT volumes, often exceeding 1000³ voxels. Thus, fθ is applied slice-by-slice across the Z slices of [\hat{{\bf r}}], aggregating the outputs into rPP.
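
A minimal PyTorch sketch of this slice-by-slice application is given below; model stands in for any trained 2D network fθ and is assumed to be defined elsewhere.

    import torch

    @torch.no_grad()
    def apply_slicewise(model, volume):
        """Apply a 2D CNN to each of the Z slices of a corrupted
        reconstruction of shape (Z, Y, X) and stack the outputs."""
        model.eval()
        out = []
        for z in range(volume.shape[0]):
            slice_in = volume[z].unsqueeze(0).unsqueeze(0)  # (1, 1, Y, X)
            out.append(model(slice_in).squeeze(0).squeeze(0))
        return torch.stack(out, dim=0)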

To optimize the CNN parameters θ to approximate artifact-free reconstructions, we employ supervised learning with a training set [X = \{(\hat{{\bf r}}_{1},{\bf r}_{1}^{\rm{HQ}}),\ldots,(\hat{{\bf r}}_{N^{t}},{\bf r}_{N^{t}}^{\rm{HQ}})\}], pairing Nt corrupted reconstructions [\hat{{\bf r}}_{i}] with high-quality (HQ) references [{\bf r}_{i}^{\rm{HQ}}]. High-quality reconstructions are typically acquired through high-dose scans, a large number of projection images or advanced reconstruction techniques (Mohan et al., 2014[Mohan, K. A., Venkatakrishnan, S., Drummy, L. F., Simmons, J., Parkinson, D. Y. & Bouman, C. A. (2014). 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6909-6913. IEEE.]; Kazantsev et al., 2017[Kazantsev, D., Bleichrodt, F., van Leeuwen, T., Kaestner, A., Withers, P. J., Batenburg, K. J. & Lee, P. D. (2017). IEEE Trans. Comput. Imaging, 3, 682-693.]). However, consistently acquiring high-quality reconstructions is often impractical due to their high resource demands, long acquisition times or high radiation exposure. By leveraging CNN-based post-processing, high-quality images can be approximated from low-dose or under-sampled acquisitions, providing an efficient alternative to traditional approaches. This capability is particularly advantageous for data from dynamic experiments (Sieverts et al., 2022[Sieverts, M., Obata, Y., Rosenberg, J. L., Woolley, W., Parkinson, D. Y., Barnard, H. S., Pelt, D. M. & Acevedo, C. (2022). Commun. Mater. 3, 78.]) or batch processing of similar objects, where rapid and reliable artifact reduction is crucial.

The optimal parameters [\theta^{*}] are determined by minimizing the loss function L that quantifies the discrepancy between the CNN outputs and reference images,

[\theta^{*} = \mathop{\rm{arg\,min}}_{\theta} \sum\limits_{i\,=\,1}^{\,N^{\,t}} L\left[\,f_{\theta}\left(\hat{{\bf r}}_{i}\right),{\bf r}^{\rm{HQ}}_{i}\right]. \eqno(4)]
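
A condensed PyTorch sketch of this supervised optimization with an L2 loss is shown below, assuming a standard DataLoader that yields paired corrupted and high-quality slices.

    import torch

    def train(model, loader, epochs=100, lr=1e-3):
        """Minimize the summed loss of equation (4) over the training
        pairs, here with an L2 (mean squared error) loss."""
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = torch.nn.MSELoss()
        for _ in range(epochs):
            for corrupted, reference in loader:  # batches of (B, 1, Y, X)
                opt.zero_grad()
                loss = loss_fn(model(corrupted), reference)
                loss.backward()
                opt.step()
        return model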

CNN-based post-processing has been shown to effectively reduce noise across various CT imaging applications. However, CNNs typically learn to exploit local information due to their use of small convolution kernels, even when their depth allows for large receptive fields. Due to their reliance on local convolution kernels, CNNs are generally better suited for mitigating local artifacts than non-local ones, such as ring or zinger artifacts. This limitation is illustrated in Fig. 2[link], where deep learning post-processing shows notable performance in noise reduction but is less effective against non-local artifacts. This observation underscores the rationale for the multi-stage method proposed in this manuscript.

2.4. Classical artifact reduction methods

In practice, classical (i.e. non-learning) artifact reduction methods are typically performed at different stages of the CT pipeline. Specifically, ring artifacts are primarily addressed at the sinogram stage, zinger artifacts are tackled within projection images, and denoising techniques are applied to the reconstructed images. We define these artifact reduction steps in classical CT imaging workflows as specific operations: (i) an operation on projection images Ap, (ii) an operation on sinogram images As, and (iii) an operation on reconstructed images Ar.

The integration of these artifact reduction methods into the CT pipeline can be represented by the following formula, which sequentially applies projection stage operator Ap, sinogram stage operator As and reconstruction stage operator Ar to produce final, artifact-reduced images,

[{\bf r}^{\rm{PL}} = {\bf A}_{\rm{r}}\left[{\cal R}\left({\bf A}_{\rm{s}} \left\{{\cal T}\left[{\bf A}_{\rm{p}}\left(\hat{{\bf p}}\right)\right] \right\}\right)\right]. \eqno(5)]

In detail, classical denoising methods, such as median or Wiener filtering and TV-based regularization (Rudin et al., 1992[Rudin, L. I., Osher, S. & Fatemi, E. (1992). Physica D, 60, 259-268.]), effectively reduce noise by enhancing image smoothness or exploiting patch similarity [e.g. BM3D (Dabov et al., 2007[Dabov, K., Foi, A., Katkovnik, V. & Egiazarian, K. (2007). IEEE Trans. Image Process. 16, 2080-2095.])]. These are denoted by Ar when applied to reconstructed images. Additionally, noise reduction can also be implicitly integrated within the reconstruction algorithm [{\cal R}].

For ring artifacts, strategies often involve filtering techniques applied directly to sinograms to address linear disturbances (Rivers, 1998[Rivers, M. (1998). Tutorial: Introduction to X-ray Computed Microtomography Data Processing, University of Chicago, USA.]; Anas et al., 2010[Abu Anas, E. M., Lee, S. Y. & Hasan, M. K. (2010). Phys. Med. Biol. 55, 6911-6930.]; Ketcham, 2006[Ketcham, R. A. (2006). Proc. SPIE, 6318, 216-222.]; Münch et al., 2009[Münch, B., Trtik, P., Marone, F. & Stampanoni, M. (2009). Opt. Express, 17, 8567-8591.]) or, alternatively, transforming reconstructed images to polar coordinates for line-based artifact correction (Sijbers & Postnov, 2004[Sijbers, J. & Postnov, A. (2004). Phys. Med. Biol. 49, N247-N253.]; Brun et al., 2009[Brun, F., Kourousias, G., Dreossi, D. & Mancini, L. (2009). World Congress on Medical Physics and Biomedical Engineering, 7-12 September 2009, Munich, Germany, Vol. 25/4, Image Processing, Biosignal Processing, Modelling and Simulation, Biomechanics, pp. 926-929. Springer.]; Chao & Kim, 2019[Chao, Z. & Kim, H.-J. (2019). Phys. Med. Biol. 64, 235015.]). These are captured by As and Ar, respectively. Meanwhile, zinger artifact reduction in projection images Ap is frequently addressed with filtering methods, such as those provided by software like TomoPy (Gürsoy et al., 2014[Gürsoy, D., De Carlo, F., Xiao, X. & Jacobsen, C. (2014). J. Synchrotron Rad. 21, 1188-1193.]), which includes specialized functions for this purpose.
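
Expressed with TomoPy primitives, one possible instantiation of the classical chain in equation (5) is sketched below; the chosen filters and parameter values are examples rather than a prescription.

    import tomopy

    def classical_pipeline(proj, theta):
        """One example of r^PL = A_r[R(A_s{T[A_p(p_hat)]})]."""
        # A_p: median-based outlier removal against zinger artifacts.
        proj = tomopy.remove_outlier(proj, dif=0.5, size=3)
        # A_s: wavelet-Fourier stripe removal (Munch et al., 2009)
        # against ring artifacts, acting on the sinogram representation.
        proj = tomopy.remove_stripe_fw(proj)
        # R: analytic reconstruction.
        recon = tomopy.recon(proj, theta, algorithm='gridrec')
        # A_r: light median filtering of the reconstructed slices.
        return tomopy.median_filter(recon, size=3)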

3. Algorithm

This section outlines our multi-stage artifact reduction approach. We delve into the methodology, detailing the CNN training process and the acquisition of high-quality reference data for training purposes. The discussion concludes by highlighting the design choices that enhance the computational efficiency of our method.

3.1. Multi-stage artifact reduction

Our data-driven approach, depicted in Fig. 4[link], utilizes three CNNs [f_{\theta_{\rm{p}}}^{\,\rm{p}}], [f_{\theta_{\rm{s}}}^{\,\rm{s}}] and [f_{\theta_{\rm{r}}}^{\,\rm{r}}] to sequentially process projection, sinogram and reconstruction data. To enhance the efficacy of this sequence and reduce the risk of error propagation, our model integrates bypass connections that incorporate both raw and previously processed data at each stage. This design not only improves the robustness of artifact reduction across the pipeline but also maintains the integrity of the original data. Each CNN is trained independently in a sequential manner to ensure training efficiency.

[Figure 4]
Figure 4
Schematic of our proposed multi-stage artifact reduction method, illustrating distortions and their correction at each stage. Our method includes bypass connections that incorporate both raw and previously processed data at each stage to reduce the risk of error propagation throughout the pipeline.

Three CNNs are employed independently at each stage of the pipeline, with their specific inputs and outputs defined as follows:

(i) Projection stage. The input to the first network, [f_{\theta_{\rm{p}}}^{\,\rm{p}}], is the set of raw projection images [\hat{{\bf p}}] = [\left\{\hat{{\bf p}}_{1},\ldots,\hat{{\bf p}}_{N_{\rm{a}}}\right\}], where Na denotes the number of projections. The output is the set of enhanced projection images [{\bf p}^{*}] = [\big\{{\bf p}_{1}^{*},\ldots,{\bf p}_{N_{\rm{a}}}^{*}\big\}], computed as [{\bf p}^{*}] = [f_{\theta_{\rm{p}}}^{\,\rm{p}}(\hat{{\bf p}})].

(ii) Sinogram stage. The second network, [f_{\theta_{\rm{s}}}^{\,\rm{s}}], processes the sinograms, which are obtained by rearranging both the raw and enhanced projections using a transformation [{\cal T}]. The input to this network consists of both the raw sinograms [{\cal T}(\hat{{\bf p}})] and the rearranged enhanced projection images [{\cal T}({\bf p}^{*})]. The output is the enhanced sinograms [{\bf s}^{*}] = [f_{\theta_{\rm{s}}}^{\,\rm{s}}\left[{\cal T}\left(\hat{{\bf p}}\right),{\cal T}\left({\bf p}^{*}\right)\right]].

(iii) Reconstruction stage. The final network, [f_{\theta_{\rm{r}}}^{\,\rm{r}}], refines the reconstructed images by taking as input the reconstructions of all sinograms, including those derived from raw data and the enhanced outputs of previous stages. The input is composed of the reconstructed raw sinograms [{\cal R}[{\cal T}(\hat{{\bf p}})]], the reconstructed rearranged enhanced projection images [{\cal R}[{\cal T}({\bf p}^{*})]], and the reconstructed enhanced sinograms [{\cal R}({\bf s}^{*})]. The output is the set of enhanced reconstructed images [{\bf r}^{*}], calculated as [{\bf r}^{*}] = [f_{\theta_{\rm{r}}}^{\,\rm{r}}\{{\cal R}\left[{\cal T}\left(\hat{ {\bf p}}\right)\right]], [{\cal R}\left[{\cal T}\left({\bf p}^{*} \right)\right]], [{\cal R}\left({\bf s}^{*}\right)\}].

In essence, this process underscores the strategic application of 2D CNNs across different stages of data processing, emphasizing computational efficiency and systematic enhancement. For example, [f_{\theta_{\rm{p}}}^{\,\rm{p}}] is applied to each of the Na projection images [\hat{{\bf p}}], culminating in the collection of processed projections [{\bf p}^{*}]. This stepwise approach is mirrored in subsequent stages, ensuring comprehensive and effective artifact reduction through the entire imaging pipeline. The method is summarized in Algorithm 1[link]:

[Scheme 1]
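
As Algorithm 1 is reproduced here only as a scheme image, the following Python sketch restates the inference flow of items (i)-(iii) above, with the three trained networks and the operations [{\cal T}] and [{\cal R}] passed in as callables.

    def multi_stage_inference(p_hat, f_p, f_s, f_r, T, R):
        """Sketch of Algorithm 1: multi-stage artifact reduction with
        bypass connections feeding raw data to every later stage."""
        # Projection stage.
        p_star = f_p(p_hat)
        # Sinogram stage: raw and enhanced sinograms as joint input.
        s_star = f_s(T(p_hat), T(p_star))
        # Reconstruction stage: reconstructions of the raw data and of
        # both enhanced representations as joint input.
        return f_r(R(T(p_hat)), R(T(p_star)), R(s_star))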

3.2. Training procedure

Our training procedure, as detailed in Algorithm 2[link], strategically decomposes the multi-stage artifact reduction process into separate segments, optimizing the parameters Θ = {θp, θs, θr} of each CNN independently for efficiency and practicality. Although our approach employs 2D CNNs for each phase, the incorporation of the operations [{\cal T}] and [{\cal R}], along with the inherently 3D nature of CT images, gives our method a 3D-like capability for artifact reduction. By avoiding the computationally intensive demands of end-to-end 3D training, our strategy greatly reduces the computational burden typically faced by 3D deep learning models. As demonstrated in Section 5.2[link], chaining the projection, sinogram and reconstruction stages outperforms models trained on single stages alone, with further improvements achieved through our proposed bypass connections. This approach not only maintains the model's applicability to large-scale CT problems but also ensures that each stage is fine-tuned to its specific artifact reduction target.

For our training procedure, we primarily employ a supervised learning approach, though alternatives like self-supervised learning are also feasible. Supervised training necessitates high-quality reference projections, sinograms and reconstructions. These references are attainable through several methods, tailored to the specific requirements of the use case. High-quality scans may involve using increased radiation doses and capturing a large number of projections, from which reference reconstructions are derived. When the projection angles in low-quality scans match those in high-quality scans, corresponding high-quality projections can be directly selected. Alternatively, for non-matching angles, simulations of projections from high-quality reconstructions can be performed, for instance, utilizing tools like the ASTRA Toolbox (Van Aarle et al., 2015[Aarle, W. van, Palenstijn, W. J., De Beenhouwer, J., Altantzis, T., Bals, S., Batenburg, K. J. & Sijbers, J. (2015). Ultramicroscopy, 157, 35-47.]). Another approach involves generating artifact-reduced reconstructions through advanced scanning techniques or processing methods designed to minimize artifacts (Zhu et al., 2013[Zhu, Y., Zhao, M., Li, H. & Zhang, P. (2013). Med. Phys. 40, 031114.]; Pelt & Parkinson, 2018[Pelt, D. M. & Parkinson, D. Y. (2018). Meas. Sci. Technol. 29, 034002.]). These reconstructed images can then serve as the basis for simulating high-quality projections and sinograms, ensuring a comprehensive set of reference data for training.

The training begins with optimizing θp for the projection domain, using high-quality reference projections to establish a training set. The objective function is minimized to find the optimal parameters. Following this, we proceed to the sinogram domain, optimizing θs based on both corrupted and enhanced sinogram pairs, accommodating discrepancies in projection angles through upsampling. The process culminates in the reconstruction domain, refining θr by leveraging the outputs of the previous stages and high-quality reference reconstructions to guide the training. Throughout these stages, domain-specific loss functions can be employed to precisely target and mitigate artifacts, ensuring effective reduction in each domain (Algorithm 2[link]):

[Scheme 2]
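
Algorithm 2 is likewise reproduced only as a scheme image; the sketch below restates the sequential training procedure, with placeholder callables standing in for dataset assembly and the per-stage supervised optimization of equation (4).

    def multi_stage_training(p_hat, p_hq, s_hq, r_hq, T, R,
                             make_pairs, train_network):
        """Sketch of Algorithm 2: optimize theta_p, theta_s and theta_r
        one after another, each on its own stage's input/reference pairs."""
        # Stage 1: projection network on (corrupted, HQ) projections.
        f_p = train_network(make_pairs(p_hat, p_hq))
        p_star = f_p(p_hat)
        # Stage 2: sinogram network; raw sinograms enter via the bypass.
        f_s = train_network(make_pairs((T(p_hat), T(p_star)), s_hq))
        s_star = f_s(T(p_hat), T(p_star))
        # Stage 3: reconstruction network; all earlier outputs are inputs.
        f_r = train_network(
            make_pairs((R(T(p_hat)), R(T(p_star)), R(s_star)), r_hq))
        return f_p, f_s, f_r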

4. Experiments

In this section, we detail the artifact simulation on the simulated dataset and introduce the real-world dataset for evaluating our method. We also cover the implementation details and the metrics used to evaluate our experiments.

4.1. Datasets

Our method was assessed using the following datasets:

(i) Dataset with simulated artifacts.

 Foam phantom. Using the foam_ct_phantom package (Pelt et al., 2022[Pelt, D. M., Hendriksen, A. A. & Batenburg, K. J. (2022). J. Synchrotron Rad. 29, 254-265.]), we created simulated cylinder foam phantoms of dimensions 512 × 512 × 512. For each experiment, two unique phantoms were generated for training and testing, respectively. These phantoms contained 100000 non-overlapping bubbles of various sizes within a cylindrical volume. Projection images were generated at a resolution of 512 × 512 pixels across 1024 angles spanning 180° using parallel beam geometry.

 LoDoInd. From the LoDoInd dataset (Shi et al., 2024[Shi, J., Elkilany, O., Fischer, A., Suppes, A., Pelt, D. & Batenburg, K. (2024). eJNDT, 29, https://doi.org/10.58286/29228.]), we selected the reference tube sample, characterized by its composition of 15 different materials, such as coriander and pine nuts. This material diversity in the LoDoInd dataset presents a higher level of complexity compared with the single-material foam phantom dataset, offering a more challenging scenario that effectively evaluates our method's performance on intricate data. The sample's dimensions are 4000 × 1250 × 1250 pixels, with the top half designated for training and the bottom for testing. The upper and lower halves of the sample are notably distinct in material composition and structural features, ensuring a meaningful evaluation of the model's generalization capability. Projection images were simulated over 1024 angles distributed over a 180° range, similar to the foam phantom setup.

(ii) Dataset with real-world artifacts.

TomoBank. The fatigue-corrosion experimental dataset from TomoBank (De Carlo et al., 2018[De Carlo, F., Gürsoy, D., Ching, D. J., Batenburg, K. J., Ludwig, W., Mancini, L., Marone, F., Mokso, R., Pelt, D. M., Sijbers, J. & Rivers, M. (2018). Meas. Sci. Technol. 29, 034004.]) provided 25 distinct tomographic sets of aluminium alloy subjected to fatigue testing. Each set consists of 1500 projection images with dimensions of 2560 × 2160 pixels, uniformly sampled over 180°. Dark-field and flat-field images were also captured. We utilized tomo_00056 for training and tomo_00055 for testing, with volumes sized 2160 × 2560 × 2560 pixels; these scans were acquired before and after the alloy broke into two pieces during the corrosion experiment, so the training and testing images differ substantially.

4.2. Artifact simulation

We applied simulations of noise, ring and zinger artifacts to the foam phantom and LoDoInd datasets. Fig. 5[link] displays the simulated foam phantom with varying levels of these artifacts.

[Figure 5]
Figure 5
Example image with various levels of artifacts on a simulated foam phantom dataset. The artifacts, including noise, ring artifacts and zinger artifacts, were generated by varying the parameters I0, Pring and Pzinger, respectively. The PSNR and SSIM metrics with respect to the ground truth image are provided for each reconstructed image, displayed in the bottom left.

(i) Noise. Low-dose projections were simulated by applying Poisson noise to the projection images, converting the data into raw photon counts following the procedure given by Pelt et al. (2022[Pelt, D. M., Hendriksen, A. A. & Batenburg, K. J. (2022). J. Synchrotron Rad. 29, 254-265.]). Noise levels were regulated by two factors: the average absorption γ, set to absorb about half the photons, and the incident photon count I0, which was varied to simulate different noise levels. For low-quality reconstructions, 256 projection images were used, and, for high-quality reconstructions, we used 1024 projection images.

(ii) Ring artifacts. Ring artifacts were simulated by introducing fixed errors to randomly selected detector pixels to emulate systematic detector errors. This was quantified by [{\bf d}_{\rm{ring}}\in{\bb{R}}^{M\times N}], influenced by the affected pixel percentage Pring and the deviations' standard deviation σring,

[{\bf d}_{\rm{ring}} = {\bf M}\left(P_{\rm{ring}}\right)\,{\cal N} \left({\bf 0},\sigma_{\rm{ring}}^{2}{\bf I}\right), \eqno(6)]

where M(Pring) is a mask with Pring percent of pixels set to 1. Corrupted projections [\hat{{\bf p}}] were obtained by adding dring to each projection image [{\bf p}_{i}],

[\hat{{\bf p}}_{i} = {\bf p}_{i}+{\bf d}_{\rm{ring}}. \eqno(7)]

The artifact's severity is modulated by Pring, with σring fixed at 0.005.

(iii) Zinger artifacts. Zinger artifacts were simulated by setting random pixel values in projection images to excessively high levels, imitating detector saturation. The impact of zingers was determined by the percentage of affected projections Pproj, set at 10%, and the percentage of altered pixels within these projections Pzinger. The excessive value v was set at 5, with the variation in Pzinger affecting the number of streaks in the reconstructed slices.
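
A compact NumPy sketch of these three corruptions, following the descriptions above, is given below; it illustrates the corruption model with the parameter names of this section rather than reproducing the exact simulation code.

    import numpy as np

    rng = np.random.default_rng(0)

    def corrupt(proj, I0=100, P_ring=0.1, sigma_ring=0.005,
                P_zinger=0.001, P_proj=0.1, v=5.0):
        """Apply simulated noise, ring and zinger artifacts to a clean
        projection stack of line integrals with shape (Na, M, N)."""
        Na, M, N = proj.shape
        # (i) Noise: convert to photon counts, apply Poisson statistics.
        counts = rng.poisson(I0 * np.exp(-proj))
        noisy = -np.log(np.maximum(counts, 1) / I0)
        # (ii) Ring artifacts: a fixed offset d_ring on a random subset
        # of detector pixels, identical for every projection (eqs 6-7).
        mask = rng.random((M, N)) < P_ring
        noisy += mask * rng.normal(0.0, sigma_ring, (M, N))
        # (iii) Zinger artifacts: excessive values at random pixels in a
        # random subset of the projections.
        for i in rng.choice(Na, int(P_proj * Na), replace=False):
            spots = rng.random((M, N)) < P_zinger
            noisy[i][spots] = v
        return noisy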

4.3. Preparation of the TomoBank dataset

For the TomoBank dataset, we processed the data to generate both high-quality and corresponding low-quality versions. High-quality data were refined through flat-field correction, utilizing the median values from all ten available flat and dark fields. Despite these initial corrections, slight ring and zinger artifacts remained in the reconstructed images. To mitigate these, we employed TomoPy (Gürsoy et al., 2014[Gürsoy, D., De Carlo, F., Xiao, X. & Jacobsen, C. (2014). J. Synchrotron Rad. 21, 1188-1193.]) for further refinement of high-quality data: applying a median filter to remove zinger artifacts from projection images and utilizing a polar coordinate system for ring artifact removal in the reconstructed images (Sijbers & Postnov, 2004[Sijbers, J. & Postnov, A. (2004). Phys. Med. Biol. 49, N247-N253.]). The parameters for these additional steps were determined through visual inspection. For low-quality data creation, we selected a single flat field and dark field at random for correction and reduced the set to 500 equally spaced projection images from the original 1500. No additional zinger or ring removal was conducted for low-quality data.

4.4. Comparison methods

Our method's effectiveness was benchmarked against various established approaches to validate its performance in artifact reduction:

(i) Post-proc. Involves deep learning based post-processing techniques applied directly to reconstructed images to enhance their quality.

(ii) Sinogram proc. A deep learning approach for processing sinogram data before reconstruction, as introduced by Nauwynck et al. (2020[Nauwynck, M., Bazrafkan, S., Van Heteren, A., De Beenhouwer, J., Sijbers, J., München, Z. & Sammlungen, S. (2020). The International Conference on Image Formation in X-ray Computed Tomography. Regensburg, Germany.]), which employs a line suppression loss function to address artifacts.

(iii) MBIR/Kazantsev. Employs a model-based regularized iterative reconstruction (MBIR) method that incorporates a data fitting term based on the Student's t-distribution (https://github.com/dkazanc/ToMoBAR), as proposed by Kazantsev et al. (2017[Kazantsev, D., Bleichrodt, F., van Leeuwen, T., Kaestner, A., Withers, P. J., Batenburg, K. J. & Lee, P. D. (2017). IEEE Trans. Comput. Imaging, 3, 682-693.]). This method aims to suppress outliers and enhance reconstruction from limited data, complemented by subsequent deep learning based post-processing.

(iv) Median filter + Münch. Utilizes a ring artifact removal technique based on wavelet-Fourier filtering, suggested by Münch et al. (2009[Münch, B., Trtik, P., Marone, F. & Stampanoni, M. (2009). Opt. Express, 17, 8567-8591.]), and is paired with median filtering to specifically target zinger artifacts. Deep learning based post-processing is applied after initial pre-processing and reconstruction to further refine image quality.

(v) Median filter + Miqueles. Applies a generalized Titarenko's algorithm for ring artifact reduction, as developed by Miqueles et al. (2014[Miqueles, E. X., Rinkel, J., O'Dowd, F. & Bermúdez, J. S. V. (2014). J. Synchrotron Rad. 21, 1333-1346.]), and integrates median filtering. Following pre-processing and reconstruction, deep learning based post-processing is employed to enhance the final image quality.

(vi) Vo. Implements a ring reduction algorithm by Vo et al. (2018[Vo, N. T., Atwood, R. C. & Drakopoulos, M. (2018). Opt. Express, 26, 28396-28412.]), designed to address various types of striping artifacts in sinograms. This method includes a median filtering step, making it effective against both ring and zinger artifacts. Similarly, deep learning based post-processing is adopted after pre-processing and reconstruction.

For the classical methods, we optimized parameters through a grid search on the training dataset: we iterated over parameter combinations, computed reconstructions for each, and compared them against artifact-free reference reconstructions from the training dataset, selecting the parameter set with the highest SSIM value. This optimization was performed separately for each dataset and each level of simulated artifacts.
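
In sketch form, this grid search can be written as follows, using the SSIM implementation from scikit-image; the reconstruct callable and the parameter grids are placeholders for the method-specific pre-processing and reconstruction chain.

    from itertools import product
    from skimage.metrics import structural_similarity as ssim

    def grid_search(reconstruct, reference, param_grid):
        """Return the parameter combination whose reconstruction scores
        the highest SSIM against the artifact-free training reference."""
        names = list(param_grid)
        best, best_score = None, -1.0
        for values in product(*(param_grid[n] for n in names)):
            params = dict(zip(names, values))
            rec = reconstruct(**params)
            score = ssim(reference, rec,
                         data_range=reference.max() - reference.min())
            if score > best_score:
                best, best_score = params, score
        return best, best_score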

4.5. Implementation details

In this study, we employed the MS-D network (https://github.com/ahendriksen/msd_pytorch) (Pelt & Sethian, 2018[Pelt, D. M. & Sethian, J. A. (2018). Proc. Natl Acad. Sci. USA, 115, 254-259.]) for all three stages of our method, setting the network depth to 100. The total number of trainable parameters for each stage is detailed in Table 1[link]. To ensure a fair comparison with deep learning based post-processing methods, we adjusted the MS-D network's depth to 180 in those cases, aligning the parameter count across different approaches, as Table 1[link] illustrates.

Table 1
Number of trainable parameters of used neural networks in our proposed method (total) and the deep learning based post-processing method (post-proc.)

Network Stage 1 Stage 2 Stage 3 Total Post-proc.
MS-D 45652 46553 47454 139659 146972

Training of the networks occurred over 200, 200 and 500 epochs for the first, second and third stages, respectively, using the ADAM optimizer (Kingma & Ba, 2015[Kingma, D. P. & Ba, J. (2015). 3rd International Conference on Learning Representations (ICLR2015), San Diego, CA, USA, 7-9 May 2015.]) with an L2 loss function and an initial learning rate of 10−3. The training for the final stage is longer as there are more complicated image features in reconstructed images. Comparable total training epochs (900) were assigned for the post-processing networks to match the combined duration. Training was subject to early stopping, triggered by ten epochs without validation loss improvement or exceeding 14 days.

Performance evaluation utilized peak signal to noise ratio (PSNR) and structural similarity index measure (SSIM) (Wang et al., 2004[Wang, Z., Bovik, A. C., Sheikh, H. R. & Simoncelli, E. P. (2004). IEEE Trans. Image Process. 13, 600-612.]), computed against high-quality reference data based on the reference data's range.
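
With scikit-image, this evaluation can be written as below, where the data_range argument is derived from the high-quality reference as described.

    from skimage.metrics import peak_signal_noise_ratio, structural_similarity

    def evaluate(reference, result):
        """PSNR and SSIM against the high-quality reference, with the
        intensity range taken from the reference data."""
        data_range = reference.max() - reference.min()
        psnr = peak_signal_noise_ratio(reference, result, data_range=data_range)
        ssim = structural_similarity(reference, result, data_range=data_range)
        return psnr, ssim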

5. Results

5.1. Artifact reduction performance

Table 2[link] presents a quantitative comparison between our method and other artifact reduction approaches. Our method consistently surpasses all others in PSNR and SSIM across datasets with both simulated and real-world artifacts. Classical artifact reduction methods, when followed by deep learning based post-processing, generally achieved better PSNR and SSIM values than strategies relying solely on post-processing, which highlights the utility of classical artifact reduction techniques. Among the evaluated methods, those focusing exclusively on sinogram images were the least effective in the compared metrics, primarily because they omit the crucial image enhancement step applied to reconstructed images. Fig. 6[link] visually corroborates these findings: our approach not only delivers superior image clarity but also more effectively addresses areas heavily affected by global artifacts, unlike the competing methods, which frequently struggle in such regions.

Table 2
Evaluative comparison across deep learning based, classical and our proposed methods on the simulated foam phantom (Pelt et al., 2022[Pelt, D. M., Hendriksen, A. A. & Batenburg, K. J. (2022). J. Synchrotron Rad. 29, 254-265.]), LoDoInd (Shi et al., 2024[Shi, J., Elkilany, O., Fischer, A., Suppes, A., Pelt, D. & Batenburg, K. (2024). eJNDT, 29, https://doi.org/10.58286/29228.]) and the real-world experimental dataset TomoBank (De Carlo et al., 2018[De Carlo, F., Gürsoy, D., Ching, D. J., Batenburg, K. J., Ludwig, W., Mancini, L., Marone, F., Mokso, R., Pelt, D. M., Sijbers, J. & Rivers, M. (2018). Meas. Sci. Technol. 29, 034004.]). Performance metrics, specifically PSNR / SSIM, are averaged across all slices and highlighted in bold for the best outcomes. Classical methods are followed by an additional deep learning based post-processing step after reconstruction, with parameters optimized on training data. All methods were designed to have similar numbers of trainable parameters for the same network architecture and were trained for a consistent number of epochs to ensure fairness in comparison

I0 / Pring / Pzinger   Corrupted   Post-proc.   Sinogram proc.   MBIR/Kazantsev   Münch   Miqueles   Vo   Ours
(The MBIR/Kazantsev, Münch, Miqueles and Vo columns each report the classical method followed by deep learning based post-processing.)
Dataset: foam (512, 512, 512)
30 / 0.1 / 0.001 1.14 / 0.23 19.55 / 0.70 17.97 / 0.44 19.24 / 0.70 19.72 / 0.71 20.34 / 0.72 20.37 / 0.71 21.80 / 0.76
100 / 0.1 / 0.001 4.07 / 0.27 21.10 / 0.69 19.58 / 0.51 21.81 / 0.75 22.26 / 0.76 23.70 / 0.77 23.57 / 0.78 24.24 / 0.79
100 / 0.1 / 0 4.97 / 0.29 22.67 / 0.77 19.63 / 0.51 23.00 / 0.77 22.39 / 0.76 23.32 / 0.77 23.38 / 0.77 24.41 / 0.79
100 / 0.2 / 0 3.04 / 0.26 22.21 / 0.76 19.59 / 0.51 22.50 / 0.76 22.25 / 0.75 22.78 / 0.76 23.20 / 0.77 24.09 / 0.78
100 / 0 / 0.002 5.58 / 0.29 23.10 / 0.77 19.66 / 0.51 24.26 / 0.78 23.86 / 0.77 24.87 / 0.79
 
Dataset: LoDoInd (2000, 1250, 1250)
500 / 0.1 / 0.001 5.91 / 0.21 36.30 / 0.91 36.22 / 0.90 36.11 / 0.91 36.93 / 0.92 36.93 / 0.92 36.50 / 0.91 38.65 / 0.93
 
Dataset: TomoBank (2160, 2560, 2560)
NA 17.77 / 0.24 35.64 / 0.77 35.33 / 0.77 35.53 / 0.77 35.77 / 0.78 35.79 / 0.78 35.81 / 0.78 36.55 / 0.79
[Figure 6]
Figure 6
Comparative visualization on three different datasets of our proposed artifact reduction method against various established techniques, encompassing strategies that initially apply classical pre-processing followed by deep learning based post-processing, as well as methods employing deep learning based processing across distinct stages. Red insets highlight magnified sections for closer inspection. Red arrows point out the remaining artifacts.

(i) Artifact reduction in each stage. The impact of our multi-stage strategy on artifact reduction was evaluated by examining results at each stage, as shown in Fig. 7[link]. Initial projection image processing effectively reduced most zinger artifacts and some ring artifacts, with PSNR and SSIM improving to 18.17 dB and 0.45, respectively. Subsequent sinogram processing reduced most ring artifacts, though at the expense of some high-resolution details, further enhancing PSNR to 18.80 dB and SSIM to 0.47. The final stage of processing the reconstructed images restored numerous image details, smoothed the image, and significantly elevated the PSNR to 22.25 dB and SSIM to 0.76.

[Figure 7]
Figure 7
Demonstration of artifact reduction across various processing stages. In the projection stage, zinger artifacts and portions of ring artifacts are mitigated. Subsequent processing in the sinogram stage eliminates remaining ring artifacts, while the reconstruction stage focuses on noise reduction to further improve image quality. For illustrative clarity, sinogram images are cropped into square formats.

(ii) Classical methods. After optimizing parameters on the training dataset, MBIR/Kazantsev (Kazantsev et al., 2017[Kazantsev, D., Bleichrodt, F., van Leeuwen, T., Kaestner, A., Withers, P. J., Batenburg, K. J. & Lee, P. D. (2017). IEEE Trans. Comput. Imaging, 3, 682-693.]) demonstrated effective ring artifact reduction, as shown in Fig. 8[link]. However, this method only partially removed zinger artifacts and caused oversmoothing of image details, resulting in some denoising effect. Although MBIR/Kazantsev (Kazantsev et al., 2017[Kazantsev, D., Bleichrodt, F., van Leeuwen, T., Kaestner, A., Withers, P. J., Batenburg, K. J. & Lee, P. D. (2017). IEEE Trans. Comput. Imaging, 3, 682-693.]) produced images with the highest PSNR and SSIM values among all classical pre-processing methods (first row in Fig. 8[link]), it resulted in the lowest PSNR and SSIM when followed by deep learning post-processing (second row in Fig. 8[link]). This indicates that successful ring artifact reduction by methods like that of Kazantsev et al. (2017[Kazantsev, D., Bleichrodt, F., van Leeuwen, T., Kaestner, A., Withers, P. J., Batenburg, K. J. & Lee, P. D. (2017). IEEE Trans. Comput. Imaging, 3, 682-693.]) does not necessarily improve final image detail and quality after post-processing. Techniques combining median filtering and wavelet-based ring reduction (Münch et al., 2009[Münch, B., Trtik, P., Marone, F. & Stampanoni, M. (2009). Opt. Express, 17, 8567-8591.]) effectively addressed ring and zinger artifacts but introduced new artifacts, complicating post-processing. The optimized parameters for median filtering and wavelet-based ring reduction are detailed in Appendix A[link], provided as examples of classical approaches. Median filtering in combination with Miqueles (Miqueles et al., 2014[Miqueles, E. X., Rinkel, J., O'Dowd, F. & Bermúdez, J. S. V. (2014). J. Synchrotron Rad. 21, 1333-1346.]) and Vo (Vo et al., 2018[Vo, N. T., Atwood, R. C. & Drakopoulos, M. (2018). Opt. Express, 26, 28396-28412.]) managed to reduce artifacts without introducing new ones. Nonetheless, they were unable to deliver precise image details after post-processing. In contrast, our method bypasses traditional artifact reduction steps, eliminating the need for manual parameter selection. Parameters are instead determined through a data-driven approach, yielding superior artifact reduction performance compared with classical methods followed by post-processing.

Figure 8
Comparative visualization of artifact reduction efficacy using classical methods, where the pre-processed reconstructed images serve as inputs for further post-processing. Despite the efficacy of classical methods in mitigating specific artifacts, the simultaneous presence of various global artifacts together with noise often results in inferior performance relative to our approach. Red insets highlight magnified sections for detailed examination. Quantitative assessments, given as PSNR and SSIM metrics, are displayed in the top-left and bottom-left corners of each image, respectively.

5.2. Ablation analysis

An ablation study was performed to assess the contribution and efficacy of the multi-stage strategy. Central to our design are the bypass connections, which aim to enhance artifact reduction and maintain data integrity. Using only the processed data from previous stages risks propagating errors across the pipeline, with limited opportunity for rectification. By feeding both raw and processed data to subsequent stages, we enrich the information available at each stage, potentially improving artifact reduction while preserving fidelity to the original dataset.
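In practice, a bypass connection amounts to channel-wise concatenation of raw and processed data before the next stage's network. The sketch below illustrates the idea in PyTorch with a toy two-layer CNN; StageCNN and all tensor shapes are illustrative assumptions, not the networks used in our experiments.

import torch
import torch.nn as nn

class StageCNN(nn.Module):
    """Toy stand-in for a per-stage artifact reduction network."""
    def __init__(self, in_channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

# Sinogram stage with a bypass connection: two input channels, one for the
# projection-stage output and one for the raw data.
sino_net = StageCNN(in_channels=2)
raw_sino = torch.randn(1, 1, 256, 512)    # raw sinogram (batch, ch, H, W)
proc_sino = torch.randn(1, 1, 256, 512)   # output of the projection stage
x = torch.cat([proc_sino, raw_sino], dim=1)  # bypass: concatenate channels
out = sino_net(x)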

Our analysis included training single-stage models on projection, sinogram or reconstruction images alone and comparing the outcomes with those of our full multi-stage approach. Furthermore, we examined a variant of our multi-stage method that omits the bypass connections, directly linking each stage without feeding raw inputs to subsequent stages (green flow in Fig. 4). To ensure a fair comparison, the single-stage models were adjusted to match the total number of trainable parameters of our multi-stage framework. As shown in Table 3, our multi-stage method consistently outperformed the alternatives, achieving significantly higher PSNR/SSIM values, which underscores the effectiveness of the multi-stage strategy. Notably, including the raw data as an additional input source yielded further gains in PSNR/SSIM compared with the multi-stage variant without raw data, reinforcing the value of integrating raw data for superior artifact reduction.

Table 3
Ablation study evaluating the impact of different stages of deep learning processing within our multi-stage method

This includes comparisons with scenarios processing only a single stage (projection, sinogram or reconstruction) and a variant of our method that does not incorporate raw data but relies solely on the processed output from the previous stage. The assessments were performed on the simulated foam dataset, configured with I0, Pring and Pzinger parameters set to 30, 0.1 and 0.001, respectively. The PSNR/SSIM values for the corrupted reconstructed images were 1.14 and 0.23.

Stage of processing    PSNR / SSIM
Projection 17.74 / 0.44
Sinogram 17.97 / 0.44
Reconstruction 19.55 / 0.70
Multi-stage without bypass connections 21.03 / 0.74
Multi-stage with bypass connections (ours) 21.80 / 0.76

5.3. Computation analysis

A computational efficiency comparison was conducted between our method and classical methods followed by post-processing techniques, using a simulated foam phantom dataset. This dataset comprised 256 corrupted projection images, each of 512 × 512 pixels. All computational analyses were performed on a workstation equipped with an Intel i7-11700KF CPU and an NVIDIA RTX 3070 GPU.

Our approach processes the raw projections through three distinct stages, utilizing a separate CNN for each stage. Notably, in the sinogram stage, processed sinogram images underwent a fourfold upsampling. Training durations for the individual stages of our multi-stage approach were 9 min, 6 h 50 min and 3 h 38 min, respectively, culminating in a total training period of 10 h 37 min. By comparison, the training time for the post-processing technique totaled approximately 14 h 31 min.
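To make the data flow concrete, the sketch below outlines the three-stage inference path with bypass connections, assuming TomoPy's gridrec algorithm for the reconstruction steps. Here net_proj, net_sino and net_rec are hypothetical callables wrapping the trained per-stage CNNs, and the identity stubs in the usage example merely stand in for real networks.

import numpy as np
import tomopy

def multi_stage_inference(raw_proj, theta, net_proj, net_sino, net_rec):
    # Stage 1: artifact reduction in the projection domain
    proc_proj = net_proj(raw_proj)
    # Stage 2: sinogram-domain network, with the raw data as bypass input
    proc_sino = net_sino(proc_proj, raw_proj)
    # Reconstruct the raw data, the processed projections and the
    # processed sinograms (three reconstruction passes, cf. Fig. 9)
    rec_raw = tomopy.recon(raw_proj, theta, algorithm='gridrec')
    rec_proj = tomopy.recon(proc_proj, theta, algorithm='gridrec')
    rec_sino = tomopy.recon(proc_sino, theta, algorithm='gridrec')
    # Stage 3: reconstruction-domain network with bypass inputs
    return net_rec(rec_sino, rec_proj, rec_raw)

# Usage with identity stubs in place of trained networks
theta = tomopy.angles(256)                        # 256 angles over 180 degrees
raw = np.random.rand(256, 8, 512).astype('float32')  # (angles, rows, detector)
identity = lambda *inputs: inputs[0]
rec = multi_stage_inference(raw, theta, identity, identity, identity)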

As depicted in Fig. 9, the aggregate inference time of our method was 56 s, identical to that of the wavelet-based method followed by post-processing, taken here as an example. The computation times required for the pre-processing stages of the various classical artifact reduction methods are detailed in Table 4. Among these, MBIR/Kazantsev (Kazantsev et al., 2017), which employs a regularized model-based iterative reconstruction for artifact reduction, required by far the longest computation time, potentially hindering its practical application. In general, the computation time of our integrated multi-stage method aligns closely with the combined computation time of classical methods and their subsequent post-processing.

Table 4
Inference times of classical artifact reduction methods followed by post-processing on 512³ volumes with 256 projection angles

Method    Pre-processing    Reconstruction    Post-processing
Median filter + Münch (Münch et al., 2009)    15 s    1 s    40 s
Median filter + Miqueles (Miqueles et al., 2014)    15 s
Vo (Vo et al., 2018)    31 s
MBIR/Kazantsev (Kazantsev et al., 2017)    10 min
Figure 9
Comparison of inference time between our method and a classical method followed by post-processing. Our method reconstructs the raw projections, the processed projections and the processed sinograms, and passes the data through a CNN at each of the three stages. In contrast, the post-processing route pre-processes the raw projections, reconstructs and then applies an image-domain CNN, exemplified here with the wavelet-based technique for clarity. Notably, when compared with the methods of Vo et al. (2018) and MBIR/Kazantsev (Kazantsev et al., 2017), our approach demonstrates improved computational efficiency.

6. Discussion

The experiments described in this paper indicate that the proposed multi-stage method effectively reduces artifacts in CT images, outperforming both classical methods combined with post-processing and deep learning based post-processing in terms of PSNR and SSIM. Our method achieves accurate artifact reduction on both simulated and experimental datasets. In particular, it shows the greatest advantage over post-processing when severe ring and zinger artifacts are present in the reconstructed images.

Our method has several advantages over existing methods. First, it employs a multi-stage approach that reduces each artifact in its natural domain, where it is easier to reduce than in other domains; by processing data in the projection, sinogram and reconstruction domains, our method can effectively reduce different artifacts jointly. This multi-stage strategy also holds potential for addressing challenges in extended-field-of-view (FOV) CT, as well as in propagation-based phase-contrast imaging, where artifacts from phase retrieval could likewise be addressed in their respective domains. Second, the neural network for each stage can be selected and trained independently to optimize performance for specific artifacts, enabling computationally efficient training. Third, our method can be easily integrated into existing CT pipelines without reducing their throughput, and it is conceptually applicable to cone-beam CT, particularly for small cone angles where the geometry closely approximates parallel-beam CT. Overall, our method provides an efficient and effective solution for reducing artifacts in CT images.

Although high-quality reference data can be acquired in various ways, as explained above, such as scanning at a higher dose, using more sophisticated reconstruction methods or applying additional post-processing steps, our proposed method still relies on such extra steps to obtain its reference data, and these may require additional time and effort in real-world scenarios. It is therefore worth exploring the integration of our multi-stage strategy with self-supervised methods, for example by applying Noise2Inverse (Hendriksen et al., 2020) in a multi-stage manner. This could reduce the reliance on high-quality reference data and improve the applicability of our method in real-world scenarios.

In principle, the proposed method could be extended to other computational imaging modalities with similar processing pipelines. In many settings, reconstructed images are computed through a series of processing steps in different domains, with artifacts in the measurements propagating through the pipeline and producing artifacts in the reconstructed images. The key idea of applying deep learning within the pipeline, instead of only at its end, could therefore benefit the image quality of other imaging modalities, e.g. magnetic resonance imaging (MRI) or positron emission tomography (PET). By incorporating our multi-stage strategy into these modalities, artifact reduction can be performed in each artifact's natural domain, potentially leading to improved image quality.

7. Conclusion

In this work, we have presented a novel multi-stage artifact reduction method for CT images. Our approach involves three stages, each targeting a different type of image artifact in its corresponding domain: projection, sinogram and reconstruction. We employ three separate neural networks, one per stage, to jointly reduce artifacts in their respective domains, and we incorporate bypass connections between stages to reduce the risk of error propagation typical of conventional processing pipelines. The networks are trained independently of each other in a sequential manner, ensuring computationally efficient training. Our experimental results demonstrate that the method outperforms deep learning based post-processing techniques in artifact reduction accuracy, achieving superior PSNR and SSIM values for both simulated and real-world experimental data. Moreover, our approach is designed to integrate seamlessly into existing CT pipelines to enhance image quality.

APPENDIX A

Parameter optimization of classical methods

This appendix details the optimization of parameters for classical artifact reduction methods, using median filtering and wavelet-based ring reduction (Münch et al., 2009) as an example, on a simulated foam phantom dataset with settings I0 = 100, Pring = 0.1 and Pzinger = 0.001.

Through grid search, a parameter set was identified that, while reducing artifacts, introduced new ones into the reconstructed images. A subsequent visual inspection allowed the parameters to be refined, striking a balance between minimizing ring artifacts and avoiding the introduction of new ones. The parameters determined through grid search and refined via visual inspection are listed in Table 5. This tailored parameter set facilitated enhanced artifact reduction in subsequent post-processing steps. Despite these improvements, our proposed method significantly outperforms these classical approaches, as shown both visually in Fig. 10 and quantitatively in Table 6.
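For illustration, such a grid search can be scripted in a few lines. The following is a minimal sketch assuming TomoPy for the filters and scikit-image for the PSNR score; the parameter grid shown is illustrative, and the scoring loop is our assumption rather than the exact optimization script used for this paper.

import itertools
import tomopy
from skimage.metrics import peak_signal_noise_ratio

def score(params, proj, theta, reference):
    """Apply the classical pipeline with the given parameters and score it."""
    dif, size, level, wname, sigma = params
    p = tomopy.remove_outlier(proj, dif=dif, size=size)
    p = tomopy.remove_stripe_fw(p, level=level, wname=wname, sigma=sigma)
    rec = tomopy.recon(p, theta, algorithm='gridrec')
    rng = float(reference.max() - reference.min())
    return peak_signal_noise_ratio(reference, rec, data_range=rng)

# Illustrative grid over the five parameters of Table 5
grid = itertools.product(
    [0.3, 0.5, 1.0],        # (i) Dif
    [3, 5],                 # (ii) Size
    [3, 4, 5],              # (iii) Level
    ['db5', 'sym5'],        # (iv) Wname
    [1, 2, 4, 8],           # (v) Sigma
)
# best = max(grid, key=lambda p: score(p, proj, theta, reference))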

Table 5
Determined parameters of the classical methods: (i) the expected difference between an outlier value and the median value of the image (Dif), (ii) the median filter size for the outlier removal function (Size), (iii) the number of discrete wavelet transform levels (Level), (iv) the type of wavelet filter (Wname), and (v) the damping parameter in Fourier space (Sigma)

Approach (i) Dif (ii) Size (iii) Level (iv) Wname (v) Sigma
Grid 0.5 3 4 sym5 8
Grid + visual 0.5 3 4 sym5 1

Table 6
Comparison of the average PSNR and SSIM values of our proposed method with those of classical methods combined with deep learning based post-processing on the simulated foam phantom dataset (values are averaged over all slices)

Method    Pre-processed PSNR/SSIM    Result PSNR/SSIM
No artifact removal    4.07 / 0.27
Grid    8.58 / 0.33    22.76 / 0.77
Grid + visual    8.10 / 0.33    23.41 / 0.77
Ours        24.50 / 0.79
Figure 10
Visual and quantitative comparison of artifact reduction outcomes using classical methods versus our proposed approach. Parameters selected automatically through grid search introduced unintended artifacts, complicating subsequent post-processing. Parameters refined through visual inspection gave better post-processing outcomes at the cost of slightly weaker ring artifact reduction. Nonetheless, our method substantially surpasses both approaches in visual quality and in the metric evaluations.

Footnotes

1The end-to-end workflow includes six stages: data input, projection network processing, conversion to sinogram images, sinogram network processing, reconstruction and final network processing of the reconstruction output. For end-to-end training, the 3D volume must be held in memory multiple times as input, output and intermediate data between stages. The memory calculation considers hosting the 3D volume five times to account for these requirements, along with three sets of depth-100 2D image features processed by the MS-D network at each stage. The primary memory consumption comes from hosting the 2000³ volume five times (5 × 32.96 GB), which significantly exceeds the memory required for storing the network's intermediate features three times (3 × 1.6 GB).

Acknowledgements

We acknowledge the use of the large language model, ChatGPT, to assist in refining the text. The tool was utilized at the sentence level for tasks such as correcting grammar and rephrasing sentences.

Funding information

This research was co-financed by the European Union H2020-MSCA-ITN-2020 under grant agreement No. 956172 (xCTing).

References

Aarle, W. van, Palenstijn, W. J., De Beenhouwer, J., Altantzis, T., Bals, S., Batenburg, K. J. & Sijbers, J. (2015). Ultramicroscopy, 157, 35–47.
Abu Anas, E. M., Lee, S. Y. & Hasan, M. K. (2010). Phys. Med. Biol. 55, 6911–6930.
Biguri, A., Dosanjh, M., Hancock, S. & Soleimani, M. (2016). Biomed. Phys. Eng. Expr. 2, 055010.
Boas, F. E. & Fleischmann, D. (2012). Imaging Med. 4, 229–240.
Brun, F., Kourousias, G., Dreossi, D. & Mancini, L. (2009). World Congress on Medical Physics and Biomedical Engineering, 7–12 September 2009, Munich, Germany, Vol. 25/4, Image Processing, Biosignal Processing, Modelling and Simulation, Biomechanics, pp. 926–929. Springer.
Bührer, M., Stampanoni, M., Rochet, X., Büchi, F., Eller, J. & Marone, F. (2019). J. Synchrotron Rad. 26, 1161–1172.
Burger, H. C., Schuler, C. J. & Harmeling, S. (2012). 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2392–2399. IEEE.
Chao, Z. & Kim, H.-J. (2019). Phys. Med. Biol. 64, 235015.
Chen, C., Xing, Y., Gao, H., Zhang, L. & Chen, Z. (2022). IEEE Trans. Med. Imaging, 41, 2912–2924.
Chen, H., Zhang, Y., Zhang, W., Liao, P., Li, K., Zhou, J. & Wang, G. (2017). Biomed. Opt. Expr. 8, 679–694.
Dabov, K., Foi, A., Katkovnik, V. & Egiazarian, K. (2007). IEEE Trans. Image Process. 16, 2080–2095.
De Carlo, F., Gürsoy, D., Ching, D. J., Batenburg, K. J., Ludwig, W., Mancini, L., Marone, F., Mokso, R., Pelt, D. M., Sijbers, J. & Rivers, M. (2018). Meas. Sci. Technol. 29, 034004.
Faragó, T., Gasilov, S., Emslie, I., Zuber, M., Helfen, L., Vogelgesang, M. & Baumbach, T. (2022). J. Synchrotron Rad. 29, 916–927.
Fu, T., Wang, Y., Zhang, K., Zhang, J., Wang, S., Huang, W., Wang, Y., Yao, C., Zhou, C. & Yuan, Q. (2023). J. Synchrotron Rad. 30, 620–626.
Ganguly, P. S., Pelt, D. M., Gürsoy, D., de Carlo, F. & Batenburg, K. J. (2021). J. Synchrotron Rad. 28, 1583–1597.
Ghani, M. U. & Karl, W. C. (2019). IEEE Trans. Comput. Imaging, 6, 181–193.
Gholizadeh-Ansari, M., Alirezaie, J. & Babyn, P. (2020). J. Digit. Imaging, 33, 504–515.
Gürsoy, D., De Carlo, F., Xiao, X. & Jacobsen, C. (2014). J. Synchrotron Rad. 21, 1188–1193.
Hansen, P. C., Jørgensen, J. & Lionheart, W. R. (2021). Computed Tomography: Algorithms, Insight, and Just Enough Theory. SIAM.
Hendriksen, A. A., Pelt, D. M. & Batenburg, K. J. (2020). IEEE Trans. Comput. Imaging, 6, 1320–1335.
Hintermüller, C., Marone, F., Isenegger, A. & Stampanoni, M. (2010). J. Synchrotron Rad. 17, 550–559.
Kazantsev, D., Bleichrodt, F., van Leeuwen, T., Kaestner, A., Withers, P. J., Batenburg, K. J. & Lee, P. D. (2017). IEEE Trans. Comput. Imaging, 3, 682–693.
Kazantsev, D., Wadeson, N. & Basham, M. (2022). SoftwareX, 19, 101157.
Ketcham, R. A. (2006). Proc. SPIE, 6318, 216–222.
Kim, H. & Champley, K. (2023). arXiv:2307.05801.
Kingma, D. P. & Ba, J. (2015). 3rd International Conference on Learning Representations (ICLR2015), San Diego, CA, USA, 7–9 May 2015.
Labriet, H., Nemoz, C., Renier, M., Berkvens, P., Brochard, T., Cassagne, R., Elleaume, H., Estève, F., Verry, C., Balosso, J., Adam, J. F. & Brun, E. (2018). Sci. Rep. 8, 12491.
Liang, J., Cao, J., Sun, G., Zhang, K., Van Gool, L. & Timofte, R. (2021). Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1833–1844.
Liu, Y., Wei, C. & Xu, Q. (2023). Med. Phys. 50, 4308–4324.
Mäkinen, Y., Marchesini, S. & Foi, A. (2021). J. Synchrotron Rad. 28, 876–888.
Mertens, J., Williams, J. & Chawla, N. (2015). Nucl. Instrum. Methods Phys. Res. A, 800, 82–92.
Miqueles, E. X., Rinkel, J., O'Dowd, F. & Bermúdez, J. S. V. (2014). J. Synchrotron Rad. 21, 1333–1346.
Mohan, K. A., Venkatakrishnan, S., Drummy, L. F., Simmons, J., Parkinson, D. Y. & Bouman, C. A. (2014). 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6909–6913. IEEE.
Mokso, R., Schlepütz, C. M., Theidel, G., Billich, H., Schmid, E., Celcer, T., Mikuljan, G., Sala, L., Marone, F., Schlumpf, N. & Stampanoni, M. (2017). J. Synchrotron Rad. 24, 1250–1259.
Münch, B., Trtik, P., Marone, F. & Stampanoni, M. (2009). Opt. Express, 17, 8567–8591.
Nauwynck, M., Bazrafkan, S., Van Heteren, A., De Beenhouwer, J., Sijbers, J., München, Z. & Sammlungen, S. (2020). The International Conference on Image Formation in X-ray Computed Tomography, Regensburg, Germany.
Paleo, P. & Mirone, A. (2015). J. Synchrotron Rad. 22, 1268–1278.
Pan, J., Sun, D., Pfister, H. & Yang, M.-H. (2016). Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1628–1636.
Pelt, D. M., Batenburg, K. J. & Sethian, J. A. (2018). J. Imaging, 4, 128.
Pelt, D. M., Hendriksen, A. A. & Batenburg, K. J. (2022). J. Synchrotron Rad. 29, 254–265.
Pelt, D. M. & Parkinson, D. Y. (2018). Meas. Sci. Technol. 29, 034002.
Pelt, D. M. & Sethian, J. A. (2018). Proc. Natl Acad. Sci. USA, 115, 254–259.
Prell, D., Kyriakou, Y. & Kalender, W. A. (2009). Phys. Med. Biol. 54, 3881–3895.
Raufaste, C., Dollet, B., Mader, K., Santucci, S. & Mokso, R. (2015). Europhys. Lett. 111, 38004.
Rivers, M. (1998). Tutorial: Introduction to X-ray Computed Microtomography Data Processing, University of Chicago, USA.
Rudin, L. I., Osher, S. & Fatemi, E. (1992). Physica D, 60, 259–268.
Shi, J., Elkilany, O., Fischer, A., Suppes, A., Pelt, D. & Batenburg, K. (2024). eJNDT, 29, https://doi.org/10.58286/29228.
Shi, J., Pelt, D. M. & Batenburg, K. J. (2023). International Workshop on Machine Learning in Medical Imaging, pp. 52–61. Springer.
Sieverts, M., Obata, Y., Rosenberg, J. L., Woolley, W., Parkinson, D. Y., Barnard, H. S., Pelt, D. M. & Acevedo, C. (2022). Commun. Mater. 3, 78.
Sijbers, J. & Postnov, A. (2004). Phys. Med. Biol. 49, N247–N253.
Sun, Z., Ng, C. K. & Sá Dos Reis, C. (2018). Quant. Imaging Med. Surg. 8, 609–620.
Thompson, A., Llacer, J., Campbell Finman, L., Hughes, E., Otis, J., Wilson, S. & Zeman, H. (1984). Nucl. Instrum. Methods Phys. Res. 222, 319–323.
Titarenko, S., Withers, P. J. & Yagola, A. (2010). Appl. Math. Lett. 23, 1489–1495.
Titarenko, V. (2016). J. Synchrotron Rad. 23, 1447–1461.
Van Nieuwenhove, V., De Beenhouwer, J., De Carlo, F., Mancini, L., Marone, F. & Sijbers, J. (2015). Opt. Express, 23, 27975–27989.
Vo, N. T., Atwood, R. C. & Drakopoulos, M. (2018). Opt. Express, 26, 28396–28412.
Wang, W., Xia, X.-G., He, C., Ren, Z., Lu, J., Wang, T. & Lei, B. (2020). IEEE Trans. Comput. Imaging, 6, 1548–1560.
Wang, Z., Bovik, A. C., Sheikh, H. R. & Simoncelli, E. P. (2004). IEEE Trans. Image Process. 13, 600–612.
Westneat, M. W., Socha, J. J. & Lee, W.-K. (2008). Annu. Rev. Physiol. 70, 119–142.
Yan, R., Liu, Y., Liu, Y., Wang, L., Zhao, R., Bai, Y. & Gui, Z. (2023). IEEE Trans. Comput. Imaging, 9, 83–93.
Yang, Q., Yan, P., Zhang, Y., Yu, H., Shi, Y., Mou, X., Kalra, M. K., Zhang, Y., Sun, L. & Wang, G. (2018). IEEE Trans. Med. Imaging, 37, 1348–1357.
Yuan, H., Jia, J. & Zhu, Z. (2018). 15th International Symposium on Biomedical Imaging (ISBI2018), pp. 1521–1524. IEEE.
Zhang, K., Zuo, W., Chen, Y., Meng, D. & Zhang, L. (2017). IEEE Trans. Image Process. 26, 3142–3155.
Zhu, Y., Zhao, H., Wang, T., Deng, L., Yang, Y., Jiang, Y., Li, N., Chan, Y., Dai, J., Zhang, C., Li, Y., Xie, Y. & Liang, X. (2023). Comput. Biol. Med. 155, 106710.
Zhu, Y., Zhao, M., Li, H. & Zhang, P. (2013). Med. Phys. 40, 031114.

This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.
