Self‐adaption and texture generation: A hybrid loss function for low‐dose CT denoising

Abstract Background Deep learning has been successfully applied to low-dose CT (LDCT) denoising, but model training depends heavily on an appropriate loss function. Existing denoising models often use per-pixel losses, including mean absolute error (MAE) and mean squared error (MSE). These ignore the difference in denoising difficulty between regions of a CT image and lead to the loss of texture information in the generated image. Purpose In this paper, we propose a new hybrid loss function that adapts to the noise in different regions of CT images to balance the denoising difficulty and preserve texture details, thus obtaining CT images of high diagnostic quality from LDCT images and providing strong support for clinical diagnosis. Methods We propose a hybrid loss function consisting of weighted patch loss (WPLoss) and high-frequency information loss (HFLoss). To enhance the model's ability to denoise local areas that are difficult to denoise, we extend the MAE to obtain WPLoss: after the generated image and the target image are divided into several patches, the loss weight of each patch is adaptively and dynamically adjusted according to its loss ratio. In addition, because texture details are contained in the high-frequency information of an image, we use HFLoss to measure the difference between CT images in the high-frequency domain. Results Our hybrid loss function improves the denoising performance of several models in our experiments, yielding higher peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM). Moreover, visual inspection of the results of the comparison experiments shows that the proposed hybrid loss effectively suppresses noise while retaining image details. Conclusions We propose a hybrid loss function for LDCT image denoising that has good interpretability and can improve the denoising performance of existing models.
Validation results of multiple models on different datasets show that it generalizes well. With this loss function, high-quality CT images are obtained at low radiation doses, avoiding radiation hazards while ensuring reliable disease diagnosis for patients.


INTRODUCTION
X-ray computed tomography (CT)1 is widely used in clinical screening, diagnosis, and intervention because of its fast scanning time and clear images. However, its high radiation dose puts patients at risk of genetic damage and malignant tumors.2 Epidemiological studies have shown that even the cumulative radiation dose of two or three CT scans can significantly increase the risk of cancer, especially in children.3 Therefore, following the ALARA (as low as reasonably achievable) principle, low-dose CT examination is the future trend in clinical scanning. However, when the X-ray flux is reduced,4,5 fewer photons penetrate the scanned object, which leads to noise and artifacts in the reconstructed CT images6 and may confound the identification and analysis of diseases, damaging diagnostic performance. This creates the need to remove noise from LDCT images.
Various denoising algorithms have been proposed to obtain high-quality denoised images, which can be broadly divided into sinogram domain denoising, iterative reconstruction (IR),[20][21][22] and post-reconstruction image processing. IR can significantly improve the quality of low-dose CT reconstructions, but it is time-consuming and complex in practical applications, which is an unavoidable problem. In contrast, sinogram domain denoising and post-reconstruction image processing are faster, but they require appropriate models, which may be difficult to obtain, and they often suffer from resolution loss and edge blur.
With the recent rapid development of deep neural networks,[23][24][25][26][27] the combination of tomography and deep learning or machine learning enables not only image analysis but also image reconstruction, providing a new research direction for LDCT denoising. Owing to the powerful feature extraction ability of the convolutional neural network (CNN),28 it is increasingly applied to image denoising. These methods mainly use advanced deep learning techniques to optimize the neural network structure. Han et al.29 proposed a new deep residual learning method combined with the U-net model for sparse-view CT reconstruction. Jin et al.30 demonstrated a CT reconstruction method using filtered back projection (FBP) and a CNN combining residual learning with the U-net architecture. Kang et al.31 proposed a deep CNN applied to the wavelet transform coefficients of low-dose CT images to suppress specific noise in LDCT. The content-noise complementary learning (CNCL)32 strategy uses two deep learning predictors to learn the content and the noise of image datasets, respectively.
A deep learning method includes two main parts: the neural network model structure and the loss function used to measure the error between the reconstruction result and the target. During training, the network model is updated by backpropagation of the loss function until the error is minimized. In other words, the loss function directly affects the performance of the model. Although existing deep learning methods have introduced various innovations in model structure and achieved certain results, they usually use a per-pixel loss between the generated image and the target image as the training objective, either MAE33 or MSE,34 which leads to two problems. First, per-pixel loss ignores the difference in denoising difficulty across regions of the LDCT image; the neural network model may neglect the regions that are too difficult to handle in order to obtain a higher average visual quality.35 In addition, per-pixel loss makes the reconstructed image lose a lot of texture information that is crucial for human observers. Texture resides in the high-frequency information of an image, whereas per-pixel differences capture only low-level features, so per-pixel loss causes the generated image to become over-smoothed.36
To solve these problems, we propose a hybrid loss function in this study. It consists of two parts: WPLoss and HFLoss. The noise distribution in CT images is not uniform, and experiments show that the denoising difficulty varies across regions; for example, noise is more likely to remain in texture-complex regions, which may be the diagnostic regions of interest to doctors. Therefore, we use WPLoss to perform adaptive noise reduction in different regions of the LDCT image, avoiding uneven denoising across the image. First, the generated images and the target images are uniformly divided into corresponding non-overlapping patches, and the MAE of each patch is calculated. The larger the patch loss, the worse the generation effect and the more training is needed, so that patch is assigned a larger weighting factor. Meanwhile, the weights of patches with small losses are reduced to avoid forgetting the learned information through over-updating. HFLoss measures the difference between the high-frequency parts of the generated image and the target image; the high-frequency part is extracted by the Fourier transform and a high-pass filter.
In this study, we propose a new hybrid loss function that improves the denoising effect of existing denoising models, as verified on several models: RED-CNN,37 pix2pix,38 SPARNet,39 and ldct_nonlocal.40 We demonstrated this on the AAPM-Mayo Low-Dose CT Grand Challenge dataset and on a simulation dataset of 34 patients from The Cancer Imaging Archive (TCIA). With the proposed hybrid loss function, LDCT images can be denoised into high-quality images, achieving accurate diagnostic information at low doses and greatly reducing the radiation risk to patients.

FIGURE 1 Noise reduction model. The model establishes a mapping from input to output. We use LDCT as input, and the goal is to find a model that makes the output closest to the corresponding NDCT, that is, to minimize Distance(NDCT, output).

Noise reduction model
As shown in Figure 1, our workflow starts from the images reconstructed from scanned slices, and the denoising problem is restricted to the image domain.41 Since deep-learning-based methods are independent of the statistical distribution of the image noise, the LDCT denoising problem can be simplified as follows. Suppose that an NDCT image y ∈ R^(w×h) is transformed into an LDCT image x ∈ R^(w×h) after adding quantum noise, where w and h are the length and width of the 2D CT image. Their relationship can then be expressed as

x = σ(y),  (1)

where σ: R^(w×h) → R^(w×h) denotes the mapping from NDCT to LDCT. A sufficiently complex deep neural network can approximate any function, so by using a series of paired LDCT-NDCT images (x, y) to train a CNN model φ_θ with parameters θ that approximates σ^(−1), we can make φ_θ(x) as close to y as possible.
Then, the noise reduction problem is equivalent to finding the parameters θ that minimize the Euclidean distance between φ_θ(x) and y:

θ* = argmin_θ ‖φ_θ(x) − y‖²,  (2)

and the backpropagation formula during model updating is

θ_(t+1) = θ_t − lr · ∂L/∂θ_t,  (3)

where θ_t denotes the parameters of the model at the t-th iteration, ∂L/∂θ_t are the gradients of the loss L with respect to θ_t, and lr is the learning rate.
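The update in Equation (3) can be sketched with a toy example; the function name and the scalar "denoiser" below are hypothetical, chosen only to make the update rule concrete rather than to reproduce the paper's training code.

```python
import numpy as np

def gd_step(theta, grad, lr):
    """Plain gradient-descent update: theta_(t+1) = theta_t - lr * grad."""
    return theta - lr * grad

# Toy example: fit a scalar "denoiser" phi(x) = theta * x to paired data,
# minimizing the Euclidean distance ||theta * x - y||^2 as in Equation (2).
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])                  # consistent with theta = 2
theta = 0.0
for _ in range(200):
    grad = 2.0 * np.sum((theta * x - y) * x)   # d/dtheta of the squared distance
    theta = gd_step(theta, grad, lr=0.01)
# theta converges to 2
```

In a real model, theta is the full set of network weights and grad comes from backpropagation; the structure of the update is the same.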

Hybrid loss function
The hybrid loss function proposed in this paper is expressed mathematically as

L_hybrid(x, y) = λ₁ · L_WP(x, y) + λ₂ · L_HF(x, y),  (4)

where L_WP(x, y) and L_HF(x, y) denote WPLoss and HFLoss, calculated from Equation (6) and Equation (10) respectively, and λ₁, λ₂ are their weighting coefficients.

WPLoss
Considering that different regions of the image contain different amounts of information and differ in denoising difficulty, we fine-tune the backpropagation process from a new perspective and propose a new loss function that adjusts the loss weight of different regions of the image. In the parts that are harder to denoise, there is a large gap between the generated image and the target image during training, so we assign a larger weight to increase the model's attention to these areas. To achieve this, the generated image and the corresponding NDCT image are first divided into N non-overlapping patch pairs, and the loss weight of the i-th pair is a constant w_i defined as

w_i = ℓ(φ(x)_i, y_i) / I,  (5)

where φ(x)_i and y_i denote the i-th patches of the generated image and the NDCT image, ℓ(φ(x)_i, y_i) = |φ(x)_i − y_i| is the MAE of the patch pair, and I is a value from the set {ℓ(φ(x)_i, y_i)}, 1 ≤ i ≤ N, set to the median in our study. It should be noted that, to prevent model instability caused by excessive local loss weights, we limit w_i to a range, for example [0.25, 4]. The WPLoss we propose is therefore defined as

L_WP(x, y) = (1/N) Σ_(i=1..N) w_i · ℓ(φ(x)_i, y_i).  (6)

Then, we can replace Equation (3) with

θ_(t+1) = θ_t − lr · (1/N) Σ_(i=1..N) w_i · ∂ℓ(φ(x)_i, y_i)/∂θ_t.  (7)

In CT images, different areas contain different amounts of information. In experiments, we find that the denoising strength of a model is too weak in some places and may be too strong in others, and the more complex the texture, the harder it is to denoise; yet these are often the locations of key pathological information. By adjusting the weight of the local MAE, we make the model pay more attention to areas with complex textures, with almost no additional computational cost.
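As a minimal NumPy sketch of Equations (5) and (6) — the function name, patch size, and clipping range below are illustrative choices, not the authors' exact implementation:

```python
import numpy as np

def wploss(gen, target, patch=56, w_min=0.25, w_max=4.0):
    """Weighted patch loss (sketch): split both images into non-overlapping
    patches, compute each patch's MAE, then weight each patch loss by its
    ratio to the median patch loss, clipped to [w_min, w_max]."""
    h, w = gen.shape
    losses = []
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            diff = np.abs(gen[i:i + patch, j:j + patch]
                          - target[i:i + patch, j:j + patch])
            losses.append(diff.mean())
    losses = np.array(losses)
    median = max(np.median(losses), 1e-12)          # the reference value I
    weights = np.clip(losses / median, w_min, w_max)
    return float(np.mean(weights * losses))
```

When every patch has the same loss, all weights equal 1 and WPLoss reduces to the plain MAE; patches with above-median loss are emphasized up to w_max.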

HFLoss
Noise reduction models generally use a per-pixel loss to minimize the pixel-wise error between the denoised image and the full-dose image. Although this yields a high PSNR, it may produce blurred images and lead to structural and texture distortion. The high-frequency information of an image contains more edge and texture information than the low-frequency information, so we propose a loss function named HFLoss based on the high-frequency information.
The discrete Fourier transform42 of a two-dimensional image f of length m and width n is defined as

F(u, v) = Σ_(a=0..m−1) Σ_(b=0..n−1) f(a, b) · e^(−j2π(ua/m + vb/n)),  (8)

and the corresponding inverse Fourier transform is

f(a, b) = (1/(mn)) Σ_(u=0..m−1) Σ_(v=0..n−1) F(u, v) · e^(j2π(ua/m + vb/n)).  (9)

Through the Fourier transform, we map the image from the spatial domain to the frequency domain. We then apply a high-pass filter to keep only the high-frequency part. Finally, through the inverse Fourier transform, we obtain the high-frequency component of the image, which we regard as its edges and texture.
Therefore, the HFLoss we define is calculated as

L_HF(x, y) = ‖ℱ^(−1)(G · ℱ(φ(x))) − ℱ^(−1)(G · ℱ(y))‖₁,  (10)

where ℱ denotes the Fourier transform and G denotes the high-pass filter. HFLoss extracts the high-frequency information of the generated image and the target image and calculates their distance in the data domain, improving the model's ability to generate detailed textures.
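A minimal NumPy sketch of Equation (10), assuming an ideal (hard-cutoff) high-pass filter for G; the function names and cutoff radius are illustrative assumptions:

```python
import numpy as np

def high_pass(img, radius=20):
    """Zero out all frequencies within `radius` of the DC component in the
    centered 2D Fourier domain, then transform back (ideal high-pass filter)."""
    m, n = img.shape
    F = np.fft.fftshift(np.fft.fft2(img))
    yy, xx = np.ogrid[:m, :n]
    dist = np.sqrt((yy - m // 2) ** 2 + (xx - n // 2) ** 2)
    F[dist <= radius] = 0.0                       # suppress low frequencies
    return np.real(np.fft.ifft2(np.fft.ifftshift(F)))

def hfloss(gen, target, radius=20):
    """L1 distance between the high-frequency parts of two images."""
    return float(np.mean(np.abs(high_pass(gen, radius)
                                - high_pass(target, radius))))
```

A constant image has no high-frequency content, so its filtered result is numerically zero, and identical images give an HFLoss of zero; a smoother filter (e.g., Gaussian) could be substituted without changing the structure of the loss.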

EXPERIMENTS AND RESULTS
In this section, we first describe the dataset preparation, implementation details, and important parameter settings of our experiments. Then, we conduct qualitative and quantitative experiments to compare the noise reduction of multiple models before and after adding the hybrid loss, and we verify the role of each loss component through ablation experiments. The datasets include the Mayo dataset and a simulation dataset produced from public NDCT data. The quantitative evaluation metrics are PSNR and SSIM.
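The two metrics can be sketched as follows; note that `ssim_global` is a simplified single-window version of SSIM (the standard metric averages the same statistic over local Gaussian windows), and all names here are illustrative:

```python
import numpy as np

def psnr(a, b, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((a - b) ** 2)
    return float(10.0 * np.log10(data_range ** 2 / mse))

def ssim_global(a, b, data_range=1.0):
    """SSIM computed once over the whole image (simplified sketch)."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = np.mean((a - mu_a) * (b - mu_b))
    return float(((2 * mu_a * mu_b + c1) * (2 * cov + c2))
                 / ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)))
```

Identical images give an SSIM of 1, and a uniform error of 0.1 on a [0, 1] scale gives a PSNR of 20 dB.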

Mayo data
To verify the effectiveness of the proposed loss function on clinical tasks, we used a real clinical dataset authorized by the Mayo Clinic for the 2016 NIH-AAPM-Mayo Clinic Low Dose CT Grand Challenge43,44 for training and testing.

Simulated data
For the simulation dataset, we used 4000 normal-dose CT images of 34 patients downloaded from TCIA45 as the NDCT. All images are 512×512 pixels; 3600 images are used for training and the rest as the testing dataset. The corresponding LDCT dataset is obtained by converting the NDCT images to the projection domain via the ray transform46,47 and adding Poisson noise.48 The Poisson noise model can be expressed as

Î_i ∼ K · Poisson(N₀ · e^(−p_i)),  (11)

where p_i denotes the measurement along the i-th X-ray path, K represents the conversion gain from X-ray photons to electrons, and N₀ is the initial incident X-ray intensity, which constrains the noise level. Relative to the Mayo dataset, the TCIA dataset covers more body parts. Moreover, in the simulation dataset we can freely control the noise level.
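This simulation step can be sketched in NumPy as follows, applied to an already-computed sinogram of line integrals; the function name, default photon count, and the exact handling of the gain K are assumptions for illustration:

```python
import numpy as np

def add_poisson_noise(sinogram, n0=1e5, k=1.0, seed=0):
    """Simulate low-dose measurements from noise-free line integrals p_i:
    draw photon counts from Poisson(N0 * exp(-p_i)), apply the gain K,
    then log-convert back to noisy line integrals. Smaller n0 = noisier."""
    rng = np.random.default_rng(seed)
    expected = n0 * np.exp(-sinogram)               # expected photon counts
    photons = rng.poisson(expected).astype(np.float64)
    signal = np.maximum(k * photons, 1e-8)          # avoid log(0)
    return -np.log(signal / (k * n0))
```

With a large N₀ the noisy sinogram stays close to the input; lowering N₀ increases the relative noise roughly as one over the square root of the detected counts.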

Parameter setting
In the experiments, all images are uniformly resized to 224×224, and random cropping is used for data augmentation. During training, the Adam optimizer with hyperparameters ε = 1 × 10⁻⁵, β₁ = 0.5, β₂ = 0.9 is used to update the network parameters of all models. In addition, the initial learning rate is set to 10
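A minimal NumPy sketch of one Adam update with the hyperparameters above; the learning rate, function name, and the toy objective in the demo are illustrative only:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-4, beta1=0.5, beta2=0.9, eps=1e-5):
    """One Adam update: exponential moving averages of the gradient and its
    square, bias correction, then a scaled parameter step."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Demo: minimize f(theta) = theta^2 (gradient 2*theta) starting from theta = 1.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 301):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t, lr=0.05)
```

The low β₁ = 0.5 (compared with the common default of 0.9) reduces momentum, a choice often used when training GAN-style models such as pix2pix.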

Mayo dataset
For the Mayo dataset, the data of nine patients are first selected as the training dataset and the data of the remaining patient as the testing dataset. Table 1 shows the average measurements over all images in the testing dataset.
The proposed hybrid loss function in Equation (4) significantly improves the denoising effect of all models. We selected one representative abdominal CT slice from the clinical testing dataset to demonstrate the impact of the hybrid loss function on each model. Figure 2 shows the denoising results of each model before and after adding the hybrid loss function. There is strong quantum noise in the LDCT, especially near structures with high attenuation coefficients (such as bones). All models denoise to varying degrees. However, in the first row, RED-CNN leads to over-smoothing and loses a large amount of texture information because it uses only the average per-pixel loss as its loss function. In contrast, pix2pix and SPARNet better retain the structural features, but considerable noise and artifact residue remains. Although ldct-nonlocal eliminates most of the noise and artifacts with good structural fidelity, some blur remains near the bone. The third row shows the denoising results of all models after adding the hybrid loss function. Compared with the results of the original models in the first row, all models show a better noise reduction effect and retain more details after adding our hybrid loss function.
The second and fourth rows in Figure 2 further show enlarged images of the regions of interest (ROIs) in the LDCT. From the arrow marks, it is evident that RED-CNN smooths away some important structures. Pix2pix, SPARNet, and ldct-nonlocal retain these details to varying degrees without blurring, but blocky artifacts appear locally, such as near bones. For the low-contrast liver lesion highlighted in the red circle, all models reduce artifacts and better preserve the edges after adding the loss function, improving lesion detectability.
For quantitative evaluation, we selected the three ROIs shown in the red dotted boxes of the NDCT image in Figure 2 and calculated PSNR and SSIM for each. The results are shown in Table 2. The quantitative results are consistent with the visual inspection, and all ROIs have higher PSNR/SSIM after adding the hybrid loss. To verify the effect of each part of the proposed hybrid loss, we used another abdominal CT slice from the same patient for ablation experiments. Figure 3 shows the noise reduction results of pix2pix with different loss functions, with the second row showing enlarged images of the content indicated by the red rectangle. From the regions indicated by the red arrows, the original pix2pix suffers from serious blocky artifacts and blurring near the bone. Although adding WPLoss to pix2pix reduces artifacts in the image, the problem of missing texture remains. In contrast, pix2pix with the hybrid loss has the smallest difference from the NDCT and retains most of the details. Table 3 lists the quantitative results for the whole abdominal slice in Figure 3. Both PSNR and SSIM improve after adding our proposed hybrid loss.

Considering that using the data of only one patient as the testing dataset cannot effectively demonstrate the effect of the proposed loss, we further divided the dataset in a ratio of 6:2:2. In addition, to analyze the combined effect on the validation dataset, we performed an analysis of variance on the quantitative metrics. The results are shown in Table 4.
To compensate for the limited data volume of the Mayo dataset, we used an external testing dataset of 20 patients, for which only LDCT data are provided by the Mayo Clinic, for further validation. We selected a representative CT image for demonstration; the results are shown in Figure 4. The proposed hybrid loss obtained good denoising results on all four models.

Simulation dataset
Table 5 shows the quantitative results on the simulation testing dataset for the different models. The results show that both the proposed hybrid loss and each of its components improve the quantitative indicators of every model, providing quantitative support for the visual observations.
Figure 5 shows the influence of the hybrid loss function on the denoising effect of each model. The second and fourth rows are zoomed images of the ROI. From the parts marked by the red arrows, it is evident that all models improve in noise suppression and structure preservation after adding the hybrid loss. In contrast, among the original models, RED-CNN and pix2pix smoothed away some details, and although SPARNet and ldct-nonlocal retained more details, blocky artifacts appeared locally.
Figure 6 shows the denoising results of another abdominal slice in the testing dataset with different loss functions on ldct-nonlocal. Obvious blocky artifacts and blurring appear after denoising with the original ldct-nonlocal. By visually inspecting the locally magnified images, the denoising result has a clearer appearance after adding WPLoss, but the edges remain unclear; adding HFLoss compensates for the blurred edge texture.
To analyze the combined effect of the models on the testing dataset, we performed an analysis of variance on the quantitative metrics. The results are shown in Table 6.

DISCUSSION
As a multi-level imaging technology, CT can provide high-resolution, high-quality three-dimensional images.
In practical applications, CT is commonly used to diagnose a variety of diseases, such as lung diseases, neurological diseases, and bone diseases; it provides accurate diagnostic information and helps doctors develop better treatment plans. However, the X-rays used in CT produce radiation, and long-term exposure to radiation may increase the risk of developing cancer, which greatly limits the use of CT in clinical applications.
Various loss functions have been proposed by researchers to improve the denoising performance of models. Perceptual loss49 uses neural networks to extract high-level image features, effectively bringing the denoising results closer to the real images in terms of deep information. However, perceptual loss may struggle with images of higher complexity, for example ignoring more complex textures and shapes. In contrast, our proposed HFLoss converts images from the spatial domain to the frequency domain through the Fourier transform and filters out low-frequency information, effectively retaining the texture details carried by the high-frequency information.
Jiang et al.50 proposed the focal frequency loss, which down-weights easy frequency components to focus adaptively on the frequency components that are difficult to synthesize, effectively promoting the generation of texture details. However, the frequency domain consists of frequency components with complex relationships, and low-frequency information can interfere with the generation of high-frequency information. Therefore, the HFLoss proposed in this paper uses a high-pass filter to remove low-frequency information and focus on the generation of high-frequency information, while WPLoss calculates the image pixel loss, which ensures the retention of low-frequency information. The advantages of our proposed hybrid loss are its interpretability and simplicity. WPLoss requires no additional computational operations beyond reassigning the loss weights of the regions in the image; however, the weights need an appropriate threshold to avoid model oscillation caused by excessively large weights. HFLoss computes the difference between the high-frequency part of the generated image and that of the corresponding NDCT image, penalizing the model when texture details are not well generated. We combine these two complementary loss functions into a new hybrid loss function. Since the two losses play different roles, their effects must be balanced in practical applications; we found widely applicable parameters through extensive experiments.
One of the main limitations of our study is the size of the dataset. The clinical dataset contains data from 10 patients, 9 of which were used for training. Given this small number, the model may lack generalization ability. However, we believe that with large-scale internal validation and the random nature of the data, our hybrid loss function can achieve good results. We validated it on several models and demonstrated that it is a useful component to add to existing models and can meet clinical requirements. In addition, the dataset is not clean enough, containing a number of blurred or low-quality images; we used manual screening to mitigate this problem, and we are collecting more data.
In summary, we propose a hybrid loss function for LDCT image noise reduction, and we have demonstrated that the loss function can effectively improve the denoising performance of the model.

FIGURE 2 Impact of the hybrid loss function on each model for the abdominal slice in the testing dataset. The images from (a1) to (e1) are the LDCT and the denoising results of RED-CNN, pix2pix, SPARNet, and ldct-nonlocal before adding the hybrid loss function, respectively. The two red rectangles in the LDCT mark the regions of interest (ROIs), whose amplified results are shown in the second row. The images from (a2) to (e2) are the NDCT and the denoising results of each model after adding the hybrid loss, respectively; the fourth row shows the enlarged images of the corresponding ROIs. The arrows indicate the areas where the visual effects are observed. The CT display window is [−160, 240] HU. After adding the hybrid loss function, every model retains more details and reduces artifacts and blur.

FIGURE 3 Denoising performance comparison of pix2pix with different loss functions for the abdominal slice in the testing dataset. (a) LDCT, (b) pix2pix with the original loss function, (c) adding WPLoss, (d) adding HFLoss, (e) adding the hybrid loss function, and (f) NDCT. The CT display window is [−160, 240] HU. The results show that both WPLoss and HFLoss improve the denoising effect of pix2pix, and the image generated with the hybrid loss is the closest to the NDCT.

TABLE 3 Quantitative results associated with pix2pix using different loss functions for the abdominal slice.

FIGURE 4 Denoising effect of each model using the hybrid loss function on an LDCT image from the external testing dataset. The images from (a) to (e) are the LDCT image and the denoising results of RED-CNN, pix2pix, SPARNet, and ldct-nonlocal using the hybrid loss function, respectively. The CT display window is [−160, 240] HU. Each model shows a good denoising effect.

FIGURE 5 Impact of the hybrid loss function on each model for the abdominal slice in the simulation testing dataset. The images from (a1) to (e1) are the LDCT and the denoising results of RED-CNN, pix2pix, SPARNet, and ldct-nonlocal before adding the hybrid loss function, respectively. The images from (a2) to (e2) are the NDCT and the denoising results of each model after adding the hybrid loss, respectively. The red rectangle in the LDCT marks the ROI, whose amplified results are shown in the second and fourth rows. The CT display window is [−160, 240] HU. After adding the hybrid loss function, every model retains more details and reduces artifacts and blur.

TABLE 4 Quantitative and analysis of variance results of each model with different loss functions for the images in the Mayo testing dataset.

FIGURE 6 Denoising performance comparison of ldct-nonlocal with different loss functions. (a) LDCT, (b) NDCT, (c) ldct-nonlocal, (d) ldct-nonlocal adding WPLoss, (e) ldct-nonlocal adding HFLoss, and (f) ldct-nonlocal adding the hybrid loss. The display window is [−160, 240] HU.

TABLE 5 Quantitative results of each model with different loss functions for the images in the simulation testing dataset.
There are means to reduce the radiation dose and still obtain low-dose CT results, thus reducing the associated risks. Low-dose CT images have important medical significance and diagnostic value, for example in liver, pancreas, and breast examinations, and can promote disease prevention, early diagnosis, early detection of lesions, and improved treatment outcomes. However, low-dose CT contains a large amount of quantum noise, which affects doctors' diagnosis and thus reduces its diagnostic value. With the explosion of deep learning in recent years, various deep neural network models have been developed to obtain high-quality images from low-dose CT. However, most of them optimize the neural network structure to achieve better denoising. An appropriate loss function is important for a neural network model, yet most models use per-pixel losses such as MAE and MSE. Per-pixel loss ignores the regional differences in denoising difficulty across CT images, and the generated images often suffer from loss of texture detail.
The Mayo dataset comprises 2378 NDCT images of 3-mm slice thickness and the corresponding LDCT images, each 512 × 512, from 10 patients. The NDCTs were produced with a tube potential of 120 kV and a quality reference of 200 effective mAs, and Poisson noise was inserted into the NDCTs to obtain LDCTs corresponding to a noise level of 25% of the full dose. To validate the effect of the proposed loss, we divide the Mayo dataset into training and testing sets.

TABLE 1 Quantitative results of each model with different loss functions for the images in the testing dataset.
TABLE 2 Performance comparison of pix2pix before and after adding the hybrid loss over the ROIs marked in Figure 2 in terms of the selected metrics.

TABLE 6 Analysis of variance results for quantitative metrics on the testing dataset.