Deep learning-based hybrid reconstruction algorithm for fibre instance segmentation from 3D X-ray tomographic images

3D X-ray tomography is a powerful scanning technique for generating images of complex fibre structures. A novel machine-learning algorithm to identify and separate individual fibres in 3D images is proposed in this article. The developed four-step hybrid 3D fibre segmentation algorithm involves deep-learning aided semantic segmentation, in which 3D images are sliced into 2D images for fibre extraction; elliptical contour estimation combined with the marker-controlled watershed algorithm for separating fibres from each other; identification of individual fibres through 3D reconstruction; and, lastly, a 3D object refining approach based on outlier object detection and replacement. The proposed methodology is validated on a real sample of a nylon fibre bundle under compression and its 3D X-ray image volume. The results show superior performance compared to off-the-shelf image processing algorithms in terms of precision, that is, a validation accuracy greater than 90%, and efficiency, that is, avoiding the need for a huge data set and reducing the


INTRODUCTION
The quality of paper handsheets manufactured in the pulp and paper industry depends on the characteristic properties of the pulp fibre and the operating conditions of the pulping process [1]. Understanding the effect of the pulping process operating conditions on the fibrous structure of the final product would enable the manufacture of paper with the desired properties. In this work, images from 3D X-ray tomography are used in conjunction with advanced deep learning-based image processing techniques to correlate the micro-structure of the fibres in paper handsheets with the process operating conditions. An efficient 3D image segmentation algorithm is proposed and implemented to solve this problem. As a preliminary study, the focus is on nylon-fibre bundles. Nylon fibres are synthetic polymer fibres whose image contrast is close to that of wood fibres, so they exhibit similar levels of noise and artifacts to the paper handsheets but with simpler shapes.
In recent years, there have been significant advances in image processing algorithms and tools, particularly those that use advanced deep learning algorithms.
These algorithms often provide highly accurate models for image processing. Our goal is to use these advanced algorithms for image segmentation to identify specific properties of fibres in paper handsheets. This information can eventually be used to determine appropriate process operating conditions. For example, a soft sensor can be developed to infer the correlations between process operating conditions and paper handsheet properties using handsheet images. This way, process operating conditions can be optimized to obtain the desired product qualities. Image segmentation comprises semantic and instance segmentation, where the latter extends the former by further separating individual objects [2]. Many existing deep learning segmentation algorithms are built on basic convolutional neural networks (CNNs), such as the widely known AlexNet [3], ResNet [4], GoogLeNet [5], and so on. Moreover, in order to incorporate specific practical considerations, various extensions of the basic CNN structure have been implemented; examples include the recurrent neural network (RNN) [6] and its improved version, the long short-term memory (LSTM) network [7], generative adversarial networks [8], fully convolutional networks [9], and so on. Most recently, motivated by the successful application of transformers in natural language processing, the vision transformer [10] and its variants [11] have been applied to image recognition tasks and have demonstrated better performance than convolutional models on very large data sets. Many of these algorithms can be adapted to different environments and perform well in describing complicated real-world scenarios. However, they generally result in extreme model complexity and require huge data sets and significant maintenance effort after a model is built. Some existing literature focuses on fibre segmentation and 3D micro-structure reconstruction from tomographic image volumes. For example, Agyei et al.
[12] proposed a four-step sequential 2D segmentation approach, followed by a 3D volume rendering algorithm for object matching. Viguie et al. [13] designed an image analysis method to identify the fibres with irregular cross sections and quantify the fibre contacts. Emerson et al. [14] developed a centre point detection and tracking approach to segment individual fibres from X-ray 3D tomography. To the best of the authors' knowledge, deep learning-based segmentation algorithms have not yet been used in the X-ray based nylon fibre segmentation problems, motivating us to develop a hybrid 3D reconstruction algorithm.
Using efficiency and simplicity as criteria, the encoder-decoder structured deep neural network called U-Net [15] is chosen to perform semantic segmentation. Other deep learning-based segmentation algorithms, such as Mask R-CNN [16] and Faster R-CNN [17], require heavier annotation, training, and tuning workloads. For post-processing the semantic segmentation results, traditional instance segmentation algorithms, such as connected component analysis [18] and watershed segmentation [19], are considered. The proposed work therefore aims to develop an efficient segmentation algorithm that addresses the challenges described in the next section.
A four-step hybrid 3D fibre segmentation algorithm that makes use of both deep learning and conventional image processing algorithms for image segmentation is proposed in this article. Each of these steps has challenges associated with training data set creation, instance segmentation, 3D reconstruction, and refining. To address these challenges, some improvements to the existing algorithms are proposed that include augmentation of the U-Net training data set and development of an improved version based on a conventional marker-controlled watershed algorithm. More importantly, motivated by the centroid object tracking algorithm, a novel multivariate Gaussian modelling and Kullback-Leibler (KL) divergence oriented fibre tracking algorithm is developed, followed by a novel 3D object refining approach based on outlier detection and replacement.
The rest of this paper is organized as follows. Section 2 highlights the 3D image data set and the challenges associated with fibre segmentation. Section 3 illustrates the four steps in the proposed algorithm for segmentation, and Section 4 demonstrates the superior performance of the proposed algorithm on a compressed nylon fibre bundle sample.

3D individual nylon fibre segmentation
A typical nylon fibre bundle sample studied in this work is shown in Figure 1, which presents both the integrated 3D volume and the 2D slice representations of the same bundle. These images are generated using a non-destructive X-ray tomography technique to reveal the internal fibre microstructure [12]. A 3D image can be decomposed into a series of 2D slices through any of the x, y, or z axes; Figure 1B shows such a 2D slice. In the X-ray tomography images, the nylon fibres are embedded in a grey background, from which they must first be extracted and then separated from each other. Good-quality images with high contrast (distinct grey values for the different phases, e.g., air and fibres) are challenging to obtain for such light and porous structures. Moreover, the effect of beam hardening creates black rings and shadows on the fibres' cross sections, preventing the use of a simple classical intensity-based segmentation. The nylon fibres are uniform in composition; the variation in grayscale across their diameter results from beam hardening, an artifact of image acquisition. In order to accurately identify and segment the nylon fibres, a 2D fibre segmentation is performed on every single 2D slice, followed by a 3D reconstruction using the 2D segmentation results.

Challenges
As mentioned in the introduction, deep learning algorithms can provide highly automatic and accurate instance segmentation on some real-world problems. However, in this fibre segmentation problem, the efficiency of advanced deep learning algorithms is restricted by several factors listed below, resulting in degraded performance. Therefore, instead of relying solely on deep learning approaches, a hybrid algorithm combining both deep learning and traditional machine learning algorithms is proposed. The developed methodology addresses several challenges unique to the segmentation of fibres in tomographic images:
• Lack of labelled samples: Each 3D image is sliced into hundreds of 2D tomograms, and therefore manual annotation of any 3D sample requires significant effort. This restricts the use of deep learning algorithms that need a large, fully annotated training data set.
• Limited number of 3D tomographic samples: There is no sufficiently large data set to train a full-fledged deep learning model. This challenge could be addressed through transfer learning with networks such as VGGNet [20]. However, such networks must first be trained on large open-source data sets and then fine-tuned with relevant smaller data sets, and no large labelled data sets of tomographic images exist that could be used to train such a network.
• Low accuracy of deep learning-based instance segmentation: The dense occurrence, arbitrary orientations, and varying locations of fibres make the training of deep neural networks, such as Mask R-CNN [16], extremely difficult. Such networks take several hours to train and provide lower segmentation accuracy than traditional approaches, such as watershed segmentation [21].
• 3D reconstruction errors: The errors in 2D tomogram instance segmentation will also impact the 3D reconstruction performance.
We address the above challenges through a hybrid 3D fibre instance segmentation algorithm that integrates deep learning and traditional image segmentation algorithms. The detailed workflow of this algorithm is explained and illustrated in the next section.

THE PROPOSED METHODOLOGY
Our ultimate objective is to automatically segment and label the individual nylon fibres in the 3D images. Starting with the 2D tomogram slices as inputs, four sequential image processing blocks are developed, as shown in Figure 2.

U-Net aided semantic segmentation
U-Net is a relatively simple convolutional neural network (CNN) that has been widely used to segment biomedical images, which share similar texture with the 2D nylon fibre tomograms. The U-Net model is composed of two major components, the encoder and the decoder. The encoder half identifies image features through a down-sampling process, creating a lower-dimensional feature embedding of the input image. The decoder half performs an up-sampling operation to recover the spatial information of the input image and builds the model output from these features. Owing to its simple structure, efficient training, and good accuracy, U-Net is selected as the front-end semantic segmentation algorithm for 2D fibre extraction. In this problem, the compressed nylon fibre bundle is the target. Here, X_n ∈ R^(m1×m2) denotes the n-th 2D tomogram slice, and Y_n ∈ R^(m1×m2) is the corresponding pixel-wise label map, with 1 indicating nylon and 0 indicating the background.
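The encoder-decoder path with a skip connection can be illustrated at the shape level with the numpy sketch below. This is only an illustration of the down-sampling/up-sampling symmetry described above: a real U-Net learns convolutional filters, which are omitted here, and the names `max_pool2`, `upsample2`, and `toy_unet_pass` are ours, not from the original work.

```python
import numpy as np

def max_pool2(x):
    """2x2 max pooling: halves each spatial dimension (encoder down-sampling)."""
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample2(x):
    """Nearest-neighbour upsampling: doubles each spatial dimension (decoder)."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def toy_unet_pass(image):
    """Shape-level sketch of a U-Net forward pass (no learned weights):
    encode by down-sampling, decode by up-sampling, fuse the skip connection,
    and map to a per-pixel probability of the 'nylon' class."""
    skip = image                          # feature map saved for the skip connection
    encoded = max_pool2(image)            # encoder: spatial down-sampling
    decoded = upsample2(encoded)          # decoder: spatial up-sampling
    fused = 0.5 * (decoded + skip)        # skip connection (concatenation stand-in)
    return 1.0 / (1.0 + np.exp(-fused))   # sigmoid -> pixel-wise label probability
```

The key property preserved from the real architecture is that the output label map has the same resolution as the input tomogram slice.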
The characteristic features of X-ray tomographic images depend on the experimental conditions and the sampling procedure used. As such, 2D tomogram images of only certain nylon fibre specimens can be easily segmented by naive thresholding approaches. The carefully selected data set D_s belongs to this category, and its labels Y_{1:N_s} are generated using a simple thresholding method. In contrast, the target data set D_t is difficult to segment with any thresholding algorithm. The following two steps are used to create an effective training data set for the U-Net model:
• Step 1: Scale the images in D_s based on the mean and standard deviation of the image grayscale intensity in D_t, as shown in Equation (1):

Z_{n_s} = [X_{n_s} − E(X_{n_s})] / std(X_{n_s}) · std(X_{n_t}) + E(X_{n_t}),    (1)

where X_{n_s} ∈ D_s and Z_{n_s} ∈ D_S, n_s = 1, ..., N_s, represent the original and converted image grayscale intensities, respectively; X_{n_t} ∈ D_t, n_t = 1, ..., N_t, denotes the image grayscale intensity in the target data set; and E(·) and std(·) denote the mean and standard deviation operations, respectively. The advantage of scaling D_s is to guarantee that the training and test data sets follow the same distribution.
• Step 2: Manually annotate n_t images, n_t ≪ N_t, in the target data set, and perform image augmentation on these labelled images, forming a new data set D_T. Then, an integrated data set for U-Net training is created by combining D_S and D_T, after which the trained U-Net model is employed to label the entire target data set D_t.
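The two data set creation steps can be sketched with the minimal numpy fragment below. Here `match_intensity` is one plausible reading of Equation (1), and the flip/rotation augmentations in `augment` are illustrative choices (the original article does not specify which augmentations were used); both function names are ours.

```python
import numpy as np

def match_intensity(x_s, mean_t, std_t):
    """Step 1 (one reading of Equation (1)): rescale a source image from D_s
    so its grayscale statistics match the target data set D_t."""
    z = (x_s - x_s.mean()) / x_s.std()   # standardize the source image
    return z * std_t + mean_t            # map onto the target statistics

def augment(image, label):
    """Step 2: simple flip/rotation augmentation, applied identically to the
    image and its pixel-wise label map so the pairs stay aligned."""
    pairs = []
    for k in range(4):                   # 0, 90, 180, 270 degree rotations
        im, lb = np.rot90(image, k), np.rot90(label, k)
        pairs.append((im, lb))
        pairs.append((np.fliplr(im), np.fliplr(lb)))  # mirrored copy
    return pairs
```

Each annotated slice thus yields eight image/label pairs, which helps offset the small number of manually labelled target images.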

Elliptical contour estimation with watershed algorithm for instance segmentation
The U-Net model takes the grayscale 2D image slices and generates a categorical output that separates fibres from the background areas. Once the binarized images are created, the next step is to separate the fibres from each other through instance segmentation. Fused elliptical objects are considerably more difficult to separate than fused circular objects using the conventional marker-controlled watershed algorithm [22], as illustrated in the first two sub-figures of Figure 3. As an alternative, the concave-point supported elliptical contour estimation algorithm [23], which provides better separation between two closely connected elliptical fibres, is used in this methodology. However, the estimated elliptical contours sometimes share small overlapping areas. In order to obtain a smoother instance segmentation, a simple but effective combination of three steps is proposed, improving upon existing non-elliptical contour-based segmentation algorithms [14]. Although the elliptical contour estimation recognizes shapes such as circles and ovals, it cannot differentiate cross sections that are close to each other. Therefore, steps 3 and 4 of the overall workflow are designed to address this concern.
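A simplified stand-in for the elliptical contour idea is a direct least-squares conic fit to boundary points. The sketch below is illustrative only: it does not reproduce the concave-point detection of [23], and `fit_ellipse_center` is a name of our choosing.

```python
import numpy as np

def fit_ellipse_center(xs, ys):
    """Least-squares conic fit A x^2 + B xy + C y^2 + D x + E y = 1 to boundary
    points of one fibre cross section, returning the ellipse centre."""
    M = np.column_stack([xs**2, xs * ys, ys**2, xs, ys])
    coeffs, *_ = np.linalg.lstsq(M, np.ones_like(xs), rcond=None)
    A, B, C, D, E = coeffs
    # The centre is where the conic's gradient vanishes:
    # [2A x + B y + D, B x + 2C y + E] = 0.
    return np.linalg.solve(np.array([[2 * A, B], [B, 2 * C]]),
                           -np.array([D, E]))
```

Fitting one ellipse per candidate cross section gives the contour estimates that the watershed markers are then built from.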

Gaussian modelling based 3D reconstruction
In the 3D reconstruction functional block of Figure 2, the 2D slices with separated nylon fibres are integrated into 3D space by matching and linking the labelled cross sections belonging to the same nylon fibre. The segmented 2D image slices are stitched together to generate a 3D image while accounting for the 3D spatial and geometric layout of the nylon fibre bundle. The 3D reconstruction involves tracking the location of every fibre that appears in two adjacent 2D slices and identifying new fibres in those slices. There are several existing object-tracking algorithms that can be used to track the location of the same fibres between adjacent 2D slices. Among them, the centre point matching algorithm has been successfully applied to glass and carbon fibre tracking due to its simplicity, high processing speed, and satisfactory accuracy [14]. However, without considering the shape and size of individual nylon fibres and focusing only on the centre, it tends to lose its accuracy when the nylon fibres on each slice are densely distributed or when there exist instance segmentation biases from the previous watershed segmentation block.
In order to compensate for the weaknesses of the centroid matching algorithm, a novel fibre tracking approach based on a Gaussian probability distribution model is proposed. Samples from a two-dimensional Gaussian distribution naturally fall within an elliptical region: the centre of the ellipse is the mean of the distribution, and the covariance matrix defines the shape and orientation of the ellipse. Therefore, each separated fibre object is modelled with a Gaussian distribution N(μ^(l), Σ^(l)), where l indexes the detected nylon fibre objects on the current slice and z_i^(l) = [x_i^(l), y_i^(l)]^T, i = 1, ..., m, denotes the m pixel coordinates of this fibre object. The parameters μ^(l) and Σ^(l) are estimated by maximizing the log-likelihood of the pixel coordinates, which yields the standard estimates

μ^(l) = (1/m) Σ_{i=1}^{m} z_i^(l),   Σ^(l) = (1/m) Σ_{i=1}^{m} (z_i^(l) − μ^(l)) (z_i^(l) − μ^(l))^T.
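The maximum-likelihood fit therefore reduces to the sample mean and (biased) sample covariance of the object's pixel coordinates; a minimal numpy sketch, with `fit_fibre_gaussian` as an illustrative name:

```python
import numpy as np

def fit_fibre_gaussian(coords):
    """Maximum-likelihood Gaussian fit of one segmented fibre object.
    coords is an (m, 2) array of the object's pixel coordinates."""
    mu = coords.mean(axis=0)                   # MLE mean = object centroid
    centred = coords - mu
    sigma = centred.T @ centred / len(coords)  # MLE covariance (1/m, not 1/(m-1))
    return mu, sigma
```

The resulting (mu, sigma) pair is the compact representation of the fibre cross section used by the tracking step.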
As a result, every individual fibre object on each 2D slice is compressed into a Gaussian distribution. When performing object tracking in a subsequent slice, instead of using the Euclidean distance between centroids, the KL divergence is used as the matching criterion, measuring the similarity between two Gaussian distributions. Given two fibre objects with distributions q_1(x^(1), y^(1)) ∼ N(μ^(1), Σ^(1)) and q_2(x^(2), y^(2)) ∼ N(μ^(2), Σ^(2)), the KL divergence has the closed form

D_KL(q_1 ∥ q_2) = (1/2) [ tr((Σ^(2))^{-1} Σ^(1)) + (μ^(2) − μ^(1))^T (Σ^(2))^{-1} (μ^(2) − μ^(1)) − 2 + ln( det Σ^(2) / det Σ^(1) ) ].

While traversing the fibre objects in the current 2D slice, the KL divergence between the current object's distribution and the distributions of candidate objects in the previous slices is computed. When the target object with the minimal KL divergence is identified and the corresponding D_KL is less than a predefined small threshold τ, this identified fibre object is connected to the one on the current slice.
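The closed-form KL divergence between the two Gaussians representing the q_1 and q_2 objects can be computed directly; a minimal numpy sketch (generic in the dimension, which is 2 here):

```python
import numpy as np

def kl_gaussians(mu1, sigma1, mu2, sigma2):
    """Closed-form KL divergence D_KL(N(mu1, sigma1) || N(mu2, sigma2))."""
    d = len(mu1)
    inv2 = np.linalg.inv(sigma2)
    diff = mu2 - mu1
    return 0.5 * (np.trace(inv2 @ sigma1)       # covariance mismatch
                  + diff @ inv2 @ diff          # centroid displacement term
                  - d                           # dimension offset
                  + np.log(np.linalg.det(sigma2) / np.linalg.det(sigma1)))
```

Unlike a plain centroid distance, this criterion also penalizes mismatches in the size and orientation of the fibre cross sections through the covariance terms.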
Therefore, the q_1 and q_2 fibre objects come from the same fibre, as shown in step 3 of Figure 2. For the subsequent refining step, each element A^(l)_{k_v} in the indicator array A^(l) is computed from Σ^(l)_{k_v}, the estimated covariance matrix of the fibre cross section on the k_v-th 2D slice.
Normally, considering the continuous extension of a nylon fibre, the variation of the fibre cross-section area is expected to be smooth. Any abrupt increase in the area implies the potential presence of additional fibre pieces introduced by erroneous instance segmentation. Compared with the actual area of the fibre cross section, the above indicator better highlights abnormal increases in the fibre cross-section areas and is therefore selected for outlier detection. Based on the indicator A^(l), the upper quartile, denoted by τ_u, is used to formulate the detection threshold.
Because A^(l) sometimes exhibits non-stationary behaviour, the outliers in A^(l) are detected by requiring two conditions: the indicator must exceed the detection threshold, and its slice-to-slice increment ΔA_{k_v} must be abnormally large. This identifies fibre cross sections that change rapidly because they contain redundant components of adjacent fibres. The detection threshold is chosen as 1.5 times the upper quartile, based on empirical analysis. Denoting the centroid of the fibre cross section on slice k_v by c_{k_v} = [c_{k_v,x}, c_{k_v,y}]^T, for v = 1, ..., V, a linear model can be formulated as in Equation (7):

c_{k_v,x} = a_x k_v + b_x,   c_{k_v,y} = a_y k_v + b_y,    (7)
where the model parameters a_x, b_x, a_y, and b_y are estimated from the normal data. Based on the above linear model, the centroids of the wrongly segmented fibre pieces can be estimated using their slice identities {k_{v_m}}, v_m ∈ V_abnormal, together with the corresponding covariance matrices.
The proposed methodology for segmenting and labelling the individual nylon fibres from the 3D X-ray tomographic images is thus as follows. First, U-Net aided semantic segmentation is used for 2D fibre extraction, labelling the target data set. Second, instance segmentation is performed through elliptical contour estimation combined with a watershed algorithm to accurately separate the fibres from each other. Third, the cross sections belonging to the same nylon fibre are linked in 3D space through the novel Gaussian modelling and KL divergence-based 3D reconstruction. Last, the 3D object refining method is applied to the fibres to eliminate attached parts of adjacent fibres. In summary, each step in the developed hybrid workflow combining deep-learning and image-processing algorithms enhances the precision of fibre segmentation from 3D X-ray tomographic images.
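The refining step (quartile-based outlier detection followed by linear centroid replacement) can be sketched as below. The sketch simplifies the covariance-based indicator to a raw area array and omits the second (increment) detection condition; the function names are ours.

```python
import numpy as np

def detect_area_outliers(areas):
    """Flag slices whose area indicator exceeds 1.5 times the upper quartile,
    the empirical detection threshold described in the text."""
    tau_u = np.percentile(areas, 75)   # upper quartile of A^(l)
    return np.where(areas > 1.5 * tau_u)[0]

def repair_centroids(slice_ids, centroids, abnormal):
    """Fit the linear model c_x = a_x k + b_x, c_y = a_y k + b_y on the
    normal slices and predict replacement centroids for abnormal ones."""
    normal = np.setdiff1d(np.arange(len(slice_ids)), abnormal)
    k = slice_ids[normal]
    ax, bx = np.polyfit(k, centroids[normal, 0], 1)  # x-direction line
    ay, by = np.polyfit(k, centroids[normal, 1], 1)  # y-direction line
    kk = slice_ids[abnormal]
    return np.column_stack([ax * kk + bx, ay * kk + by])
```

The predicted centroids (together with the corresponding covariances) then replace the wrongly segmented cross sections on the flagged slices.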

PERFORMANCE VALIDATION ON A CASE STUDY
In this section, a nylon fibre bundle under compression is selected as the sample, and its corresponding 3D X-ray image volume is processed according to the proposed methodology for performance validation. The generated 3D image volume has dimensions of 1013 × 964 × 500 voxels. The resolution is 1 micron, so the physical volume is about 1 × 1 × 0.5 mm³. After slicing along the z axis, 500 2D tomogram slices of 1013 × 964 pixels are produced. Initially, a randomly selected slice, shown in Figure 4A, is analyzed, and the grayscale intensity histograms of the nylon fibres and the background, excluding the black area, are depicted in Figure 4B. This comparison shows that the grayscale intensity distribution of the nylon fibres overlaps significantly with that of the background, making it difficult to find an appropriate threshold for segmentation. The annotated data are divided for training and validation, with 15% held out for validation [24; 25]. The training and validation performances are summarized in Table 1. As shown in Figure 5, the proposed watershed algorithm provides higher and more stable segmentation accuracy than the conventional one [26]. Moreover, over all 500 slices, the proposed watershed algorithm achieves more than 90% accuracy on 77.6% of the 2D slices, and only 6.2% of the slices have segmentation accuracy below 85%. Given the inevitable manual annotation error, the proposed watershed approach gives satisfactory accuracy at this step, and the remaining instance segmentation errors can be compensated in the 3D reconstruction and refining procedures.
In the 3D reconstruction step, because of the fibre orientation, fibre tracking along the X direction is more efficient than along the Z direction. Therefore, in this particular case, the binarized 3D image volume is re-sliced along the X direction, generating 1013 slices of 964 × 500 pixels, and the proposed watershed algorithm is re-applied to each slice to perform instance segmentation. On each slice, every individual object is first represented by a multivariate Gaussian distribution, and the KL divergence between the current object and all adjacent objects in the previous six slices is then computed. The object in the previous slices with the minimal KL divergence is selected as a candidate for connection if that divergence is less than 1; otherwise, the current object is assigned a new label and treated as a newly emerged fibre.
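Under the stated settings (a look-back of six slices and a KL threshold of 1), the tracking loop can be sketched as follows. The greedy matching and the flat object lists are simplifications of the actual implementation, and all names here are illustrative.

```python
import numpy as np

def kl(mu1, s1, mu2, s2):
    """Closed-form KL divergence between two 2D Gaussians."""
    inv2 = np.linalg.inv(s2)
    diff = mu2 - mu1
    return 0.5 * (np.trace(inv2 @ s1) + diff @ inv2 @ diff - 2
                  + np.log(np.linalg.det(s2) / np.linalg.det(s1)))

def track_fibres(slices, lookback=6, tau=1.0):
    """Slice-by-slice matching: each object (mu, sigma) connects to the previous
    object with minimal KL divergence if that divergence is below tau;
    otherwise it starts a new fibre label."""
    next_label = 0
    history = []                 # (slice_idx, label, mu, sigma) of past objects
    labels_per_slice = []
    for idx, objects in enumerate(slices):
        labels = []
        for mu, sigma in objects:
            candidates = [(kl(mu, sigma, m2, s2), lab)
                          for (i2, lab, m2, s2) in history
                          if idx - lookback <= i2 < idx]
            if candidates and min(candidates)[0] < tau:
                lab = min(candidates)[1]      # connect to the closest past object
            else:
                lab = next_label              # newly emerged fibre
                next_label += 1
            history.append((idx, lab, mu, sigma))
            labels.append(lab)
        labels_per_slice.append(labels)
    return labels_per_slice
```

A nearby object in the next slice inherits the existing fibre label, while a distant object (large KL divergence) is registered as a new fibre.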
After the 3D reconstruction, several extracted fibres can have extra attached pieces due to erroneous watershed instance segmentation; Figure 6A is a typical example. After detecting the abrupt area change and trimming based on the probability density function value, the redundant piece is removed, as shown in Figure 6B. Finally, after running through the four sequential functional blocks in Figure 2, the 3D nylon fibre separation result shown in Figure 7 is obtained, where the separated nylon fibres can be clearly observed without redundant attached pieces.
Integrating deep-learning methods with image processing techniques provided accurate fibre segmentation even with a smaller training data set.
In summary, since only a limited number of samples exist, none of the existing 3D instance segmentation approaches can be employed directly; the proposed model therefore has its own novelty and efficacy. For model improvement, the conventional marker-based watershed segmentation is modified by utilizing elliptical contour estimation. The performance has been compared with the conventional algorithms, and Figure 5 illustrates the efficacy of the proposed approach.
Even though there are some similar works on glass fibre segmentation, many of those models start from a thresholding approach to obtain the region of interest, which is not achievable in our case because the nylon fibres and background cannot be separated this way. Moreover, glass fibre orientation is much more organized than that of nylon fibres, making such models unusable in this scenario. Therefore, the multivariate Gaussian model is employed for 3D reconstruction, followed by the novel 3D object refining, which is another novelty resulting in good reconstruction performance, as shown in Figures 6 and 7.
In the future, we plan to investigate two potential avenues for further improvement: (1) a mixture of Gaussian distributions can be used to identify fibres with irregular shapes instead of modelling each object using a single Gaussian distribution; and (2) for better fibre tracking and object refining, fused information can be used from multiple views instead of conducting the reconstruction from a single view of the image.

CONCLUSIONS
This paper proposes a 3D hybrid instance segmentation approach to process the 3D tomographic images of nylon fibres. Without loss of generality, we have