Deep learning to overcome Zernike phase-contrast nanoCT artifacts for automated micro-nano porosity segmentation in bone

A deep-learning processing approach is proposed to assess in three dimensions the micro- and nano-porosity in bone imaged by Zernike nano-computed tomography.


Fig. 1. 3D representation of a zebrafish vertebra from available micro-CT data, acquired on a laboratory micro-computed tomography scanner (Skyscan 1172, Bruker-microCT, Kontich, Belgium) with a pixel size of 5 µm, as previously reported (Silveira et al., 2022). Part of the hemal spine (orange dashed rectangle) was imaged with Zernike nanoCT.

Fig. 2. Schematic representation of the experimental setup of the Zernike nanoCT experiment. Each spine of the zebrafish vertebrae was mounted on the rotation stage. A parallel monochromatic X-ray beam is first converged onto the sample by the condenser lens; it then divides into diffracted and undiffracted waves that pass through a Fresnel zone plate, which magnifies the sample image. The undiffracted wave is phase-shifted by the phase ring on the way to the detector. The image detector measures the sum of these two waves.

Fig. 3. Schematic diagram of the U-Net model architecture, extracted from the Dragonfly interface. The architecture is based on Ronneberger et al. (2015) and is a widely implemented CNN. Briefly, the workflow contains encoder layers, which capture contextual information in the data and reduce the spatial resolution of the input, and decoder layers, which generate the segmentation maps.
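The encoder/decoder behaviour described in the Fig. 3 caption can be illustrated with a minimal sketch that tracks only the spatial size of the feature maps. It assumes size-preserving convolutions and a factor-of-two downsampling per encoder level, which is a common U-Net configuration but is not confirmed here for the Dragonfly model; the function name `unet_feature_map_sizes` and the example input size and depth are hypothetical.

```python
def unet_feature_map_sizes(input_size, depth):
    """Return the feature-map side length at each encoder and decoder level.

    Assumption (not taken from the paper): each encoder level halves the
    spatial size, and each decoder level doubles it again.
    """
    if input_size % (2 ** depth) != 0:
        raise ValueError("input_size must be divisible by 2**depth")
    # Encoder: the spatial resolution is reduced level by level.
    encoder = [input_size // (2 ** level) for level in range(depth + 1)]
    # Decoder: the resolution is restored in reverse order, ending at the
    # input size so that the segmentation map matches the input image.
    decoder = encoder[-2::-1]
    return encoder, decoder


# Hypothetical example: a 256-pixel-wide patch and four downsampling steps.
encoder_sizes, decoder_sizes = unet_feature_map_sizes(256, 4)
print(encoder_sizes)
print(decoder_sizes)
```

The final decoder size equals the input size, which is what allows the network to output one class label per input pixel.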

Fig. 4. Schematic diagram of the Sensor3D model architecture, extracted from the Dragonfly interface. The architecture is based on details provided in Novikov et al. (2019), with modules that can be adapted in the Dragonfly interface. Briefly, this CNN combines time-distributed layers and bidirectional ConvLSTM (convolutional long short-term memory) layers in an end-to-end U-Net-like hybrid architecture, yielding improved performance compared to the U-Net architecture.

Fig. 5. Correlation between the training parameters (training data size and batch size) and the metrics used to evaluate the training and validation of the CNN models: accuracy (Dice coefficient) and loss (categorical cross-entropy loss function). The training data size is strongly correlated with both metrics: accuracy increases with training data size (positive correlation), and loss decreases with training data size (negative correlation). The batch size has a low correlation coefficient with both metrics, indicating little influence on the accuracy and loss of the trained models.
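For reference, the two metrics named in the Fig. 5 caption can be written out in a few lines. The following is a minimal sketch using the standard textbook definitions of the Dice coefficient (for one binary class mask) and the categorical cross-entropy; it is not the Dragonfly implementation, and the function names and toy values are hypothetical.

```python
import math


def dice_coefficient(true_mask, pred_mask):
    """Dice = 2 * |A and B| / (|A| + |B|) for two flat binary masks (0/1)."""
    intersection = sum(t * p for t, p in zip(true_mask, pred_mask))
    total = sum(true_mask) + sum(pred_mask)
    if total == 0:
        # Both masks empty: treated as perfect agreement (a convention).
        return 1.0
    return 2.0 * intersection / total


def categorical_cross_entropy(true_onehot, pred_prob, eps=1e-7):
    """Mean over pixels of -sum_c t_c * log(p_c).

    true_onehot: per-pixel one-hot class labels.
    pred_prob:   per-pixel predicted class probabilities.
    eps:         lower bound on probabilities to avoid log(0).
    """
    losses = []
    for t_row, p_row in zip(true_onehot, pred_prob):
        losses.append(-sum(t * math.log(max(p, eps)) for t, p in zip(t_row, p_row)))
    return sum(losses) / len(losses)


# Toy example: four pixels, one of the two true foreground pixels is found.
dice = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])

# Toy example: one pixel, three classes, true class predicted with p = 0.5.
loss = categorical_cross_entropy([[1, 0, 0]], [[0.5, 0.25, 0.25]])
print(dice, loss)
```

A Dice coefficient closer to 1 and a cross-entropy closer to 0 both indicate better agreement with the ground truth, which is why the two metrics correlate with training data size in opposite directions.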

Fig. 6. Analysis of the time required to train the Sensor3D and U-Net models, and the impact of the number of training images and of the batch size. Most of the models with a batch size of 64 took on average ∼150% less time to train than the models with a batch size of 32. In some cases, however, models trained with a batch size of 32 required fewer epochs and a shorter training duration than models with a batch size of 64.

Fig. 7. Example slice of the classification into shade-off (brown), bone (green) and LCN (blue) classes, comparing the results obtained for Sensor3D and U-Net models trained with 20, 50 and 70 ground truth images (training data size) at different stages. The models trained with 20 and 50 ground truth images perform worst, mislabeling the bone class with the background or shade-off value. Both the Sensor3D and U-Net models trained with 70 ground truth images correctly segment the voids, bone and shade-off regions.

Fig. 8. Example renders comparing the resulting classification obtained with a) Otsu thresholding, b) the Sensor3D model and c) the U-Net model for the shade-off (brown), bone (green) and LCN (blue) classes. Standard Otsu thresholding gives the worst outcome, and the trained Sensor3D model gives the best. Black arrows indicate bone-class pixels mislabeled with background gray values.
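As background for the Otsu baseline in Fig. 8, the following is a minimal sketch of the classic two-class Otsu method, which picks the gray level that maximizes the between-class variance of the image histogram. It assumes integer gray values in the range 0 to `levels - 1`; the function name and the toy data are hypothetical, and how the thresholding was configured for the three classes in the study is not stated in this section.

```python
def otsu_threshold(values, levels=256):
    """Return the threshold t that maximizes the between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    `values` is a flat sequence of integer gray levels in [0, levels - 1].
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1

    total = len(values)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight0 = 0  # number of pixels in the lower class
    sum0 = 0     # sum of gray values in the lower class
    for t in range(levels):
        weight0 += hist[t]
        sum0 += t * hist[t]
        if weight0 == 0:
            continue  # lower class still empty
        weight1 = total - weight0
        if weight1 == 0:
            break     # upper class empty from here on
        mean0 = sum0 / weight0
        mean1 = (sum_all - sum0) / weight1
        between_var = weight0 * weight1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


# Toy bimodal data: dark pixels at gray level 10, bright pixels at 200.
pixels = [10] * 50 + [200] * 50
t = otsu_threshold(pixels)
labels = [1 if p > t else 0 for p in pixels]
print(t, sum(labels))
```

Because the decision depends only on each pixel's gray value, a global threshold of this kind cannot separate regions whose gray values overlap, which is consistent with the mislabeled bone pixels highlighted in the figure.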

Table 1. Model training for Anatomix dataset

Table 2. Model validation for Anatomix dataset