Adaptive 3D noise level‐guided restoration network for low‐dose positron emission tomography imaging

Many deep learning methods have been proposed to improve the quality of low‐dose PET images (LPET); these usually construct end‐to‐end networks for inputs at a fixed radiation dose. However, such approaches overlook the noise disparity among PET images, which may differ across manufacturers or populations. We therefore seek to exploit these noise differences among PET images to achieve adaptive restoration. We propose a 3D noise level‐guided PET restoration network for LPET comprising (1) an adaptive noise level‐aware subnetwork and (2) an LPET restoration subnetwork. The first subnetwork predicts the noise level of a given LPET, while the second treats the estimated noise level as prior information to guide the restoration process from LPET to standard‐dose PET images. Experiments were performed on real human head and neck datasets, and the peak signal‐to‐noise ratio and structural similarity index measure were used to evaluate LPET recovery performance. Moreover, we also compared the proposed network with several deep‐learning approaches. Experimental results demonstrate that our network with its dual‐stage design can perform adaptive restoration for LPET, yielding better visual and quantitative results. In future work, we will attempt to apply our method to other imaging tasks and adapt it for clinical practice.

the early diagnosis of neurological diseases. 4 To obtain images for diagnostic or research use, the utilized scanner must register a large number of events, which requires either a relatively high radiation dose or a long scan time. High radiation doses inevitably raise concerns about potential health hazards, while long scan times increase the burden imposed on patients because they need to be motionless during the scanning process. 2 As a result, it is desirable to reduce the required tracer injection and image acquisition time. However, a shorter scan duration and a lower injection activity level degrade the resulting image quality and make the obtained image contain considerable noise and artifacts, 5 which increases the difficulty of detecting low-contrast lesions.
To improve the quality of low-dose PET images (LPET), a range of methods have been proposed, which fall into three main categories 6 : (1) iterative reconstruction algorithms, (2) post-reconstruction image filtering methods, and (3) machine learning methods. Iterative reconstruction algorithms focus on processing emission data and usually treat LPET reconstruction as an optimization problem, trying to find the set of parameters under which the actual data are most likely to be observed. A widely used method in clinical practice is maximum-likelihood expectation-maximization. 7 Different types of prior information, such as Markov random field, dictionary learning, total variation-based priors, and other variants, are also often incorporated in the reconstruction process to penalize noise. 8 Recently, images from different modalities have also been treated as prior information to improve the quality of the reconstructed images. 9 Although iterative reconstruction can sometimes achieve good performance, the reconstructed images still suffer from the loss of image details and blurred edges. 10 In addition, the computational cost of iterative reconstruction tends to be high, which hinders practical application.
LPET images suffer from low signal-to-noise ratio (SNR) and poor image resolution, so many post-reconstruction methods use image filters to achieve enhancement while preserving details. 11 For instance, Hofheinz et al. proposed bilateral filters to improve the SNR of PET data while preserving the spatial resolution at the boundaries. 12 Huerga et al. used context modeling and spatially adaptive wavelet shrinkage to maintain edges in important regions. 13 Jomaa et al. presented a new method based on the multiscale and nonlocal means approach to reduce noise in dynamic PET sequences. 14 Although the methods mentioned above can achieve higher visual quality, image filters involve several parameters that may vary significantly between image-processing tasks. Therefore, a great deal of experimentation and experience is required to determine the optimal parameter settings.
Machine learning methods for PET denoising have achieved good performance in both objective and subjective assessments. Kang et al. proposed a framework based on a regression forest to estimate standard-dose brain PET images from their low-dose counterparts. 15 Wang et al. proposed a mapping-based sparse representation framework to predict standard-dose PET images. 16 With the rapid development of artificial intelligence, deep learning (DL) methods that can learn noise distributions and image prior information have revealed great potential in LPET image processing. 17 Spuhler et al. proposed a novel dilated convolutional neural network (CNN) to recover standard-dose brain images from low-dose images. 18 Wang et al. trained a CNN to recover standard-dose PET images from 6.25% ultralow-dose 18F-fludeoxyglucose (18F-FDG) whole-body PET images. 19 Several works based on generative adversarial networks (GANs) have also been used in this context. 20 Lu applied a GAN to recover whole-body 18F-FDG PET images with an input consisting of 10%-dose PET images. 21 Lei et al. proposed a Cycle GAN model to estimate diagnostic-quality PET images using 1/8th count data. 22 Likewise, U-shaped network architectures with skip connections have been introduced for the restoration task. Schaefferkoetter et al. employed the 3D U-net model to denoise the reconstructed PET images for lesion detection. 23 Dutta et al. applied residual U-Net- and dilated U-Net-based architectures to generate high-count preclinical PET images from a low-count form. 24 However, the abovementioned studies merely considered a single dose level of LPET data as the input to the DL-based neural network; in other words, they all simply fed large numbers of images at one noise level into the network to find the mapping relationship between LPET and their corresponding standard-dose PET images.
Furthermore, they all neglected the effects of different radiation doses on the final PET images, which result in PET images with noise differences. 25 Even when PET images are collected from the same patient and reconstructed using the same algorithm, their noise levels can vary greatly due to the use of different scanners, scanning times, and so on. 10,26 Moreover, PET image noise levels may differ across individuals and medical devices. 20 For example, the injection doses of PET scans for children and the elderly are typically lower than those for adolescents. Therefore, it is often insufficient to focus only on a specific noise level; more attention needs to be paid to PET images with different noise levels. We can also easily observe that PET images contaminated with higher noise levels have worse image quality and diverge more from the ground truth. In this light, the varying noise levels of PET images could well provide prior and/or additional knowledge for the restoration task.
Considering the noise-level disparity, we propose a 3D noise level-guided PET restoration network to adaptively restore LPET. Different from other DL-based methods that directly construct end-to-end networks for PET restoration, our network works as a dual-stage structure containing two subnetworks with multiple noise-level inputs: (1) an adaptive noise level-aware subnetwork and (2) an LPET restoration subnetwork. The first subnetwork is used to estimate the unknown noise level of the input LPET image, and the estimated noise level is then used as prior information to guide the restoration process in the second subnetwork. Moreover, to enrich the features of the reconstruction process, feature maps derived from the adaptive noise level-aware subnetwork are integrated into the LPET restoration subnetwork.
We summarize our contributions as follows: (1) We take noise differences among LPET into consideration. PET images have a wide range of noise levels, which cannot be fully covered by single-dose PET images. Our work translates this noise discrepancy into a prior to guide the reconstruction of LPET, and the obtained quantitative and visual results reflect the superiority of the proposed method. (2) The proposed method has the potential to normalize PET images with different noise levels. After obtaining the predicted noise levels, we transform them into parameter pairs using an affine transformation and optimize them during the training process, which matches an adaptive weight to images at different noise levels and makes it easy to perform normalization. (3) Our method can be applied to other imaging tasks, such as PET/CT denoising. Significant noise intensity differences can also be observed in low-dose CT images obtained from medical devices produced by different manufacturers. This phenomenon suggests that our method may also migrate to other modalities while achieving good performance.
The remainder of this article is arranged as follows. Section 2 presents the details of our proposed network. Section 3 describes the utilized datasets and evaluation metrics. Next, we show the experimental results and an evaluation of our proposed method in Section 4. Ablation experiments are presented in Section 5. Section 6 discusses the findings and plans for future work. Finally, we report our conclusions in Section 7.

2 | METHODS
In this section, the details of our proposed method are presented. We provide an overview of the framework, including the adaptive noise level-aware subnetwork, the fusion block, and the LPET restoration subnetwork.

2.1 | Method overview
We propose a 3D noise level-guided PET restoration network to solve the LPET problem with an additional noise-level estimation strategy. Figure 1 describes the overall structure of our method. The adaptive noise level-aware subnetwork aims to estimate the closest defined noise level for the given LPET. The fusion block operates on the estimated noise level and converts it into two affine transformation parameters that are able to scale and shift the input feature map. Through the noise level-aware subnetwork and the fusion block, we expect to learn an adaptive weight for images with different noise distributions to facilitate the subsequent reconstruction process. Finally, pairs of LPET (inputs) and standard-dose PET images (labels) are used to train the LPET restoration subnetwork. The individual components of the algorithm are outlined in further detail in the following sections.

2.2 | Adaptive noise level-aware subnetwork
In real PET scanning situations, radiologists may apply different tracer doses for different patients, which leads to uncertainty in the noise levels of PET images. To estimate the unknown noise levels to which LPET belong, we introduce the adaptive noise level-aware subnetwork. The basic architecture of this subnetwork includes a convolution layer, an activation layer, and a max pooling layer. The input and output of the $i$th unit can be expressed as follows:

$$x_i = g(w_i \otimes x_{i-1} + b_i)$$

where $x_{i-1}$ denotes the unit input and $x_i$ denotes the output, $w_i$ is the weight of the convolution filter, $b_i$ represents its bias, $\otimes$ indicates the convolution operation, and $g$ is a nonlinear activation function. In this paper, we choose the rectified linear unit (ReLU) activation function, which can be expressed as

$$g(x) = \max(0, x)$$

To speed up the convergence of the model, instance normalization (IN) is applied after each convolutional layer. After $L$ basic units, we set up two fully connected layers and a softmax layer. In this way, the output of the subnetwork can be represented as

$$x_{\text{output}} = R(x_L), \qquad x_0 = x_{\text{input}}$$

where $x_{\text{input}}$ indicates the LPET volumes, $x_{\text{output}}$ indicates the estimated noise level, $R$ denotes the fully connected and softmax operations, and $L$ is the number of basic units.
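As an illustration, the basic-unit stack and classification head described above might look like the following minimal PyTorch sketch. The layer count, channel width, and the global pooling inserted before the fully connected layers are our assumptions for a self-contained example, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class NoiseLevelAwareNet(nn.Module):
    """Sketch of the adaptive noise level-aware subnetwork: L basic units
    (Conv3d -> InstanceNorm -> ReLU -> MaxPool), then two fully connected
    layers and a softmax over the defined noise levels (assumed here to be
    the four simulated doses: 5%, 10%, 20%, 50%)."""

    def __init__(self, channels=32, num_units=4, num_levels=4):
        super().__init__()
        units, in_ch = [], 1
        for _ in range(num_units):
            units += [
                nn.Conv3d(in_ch, channels, kernel_size=3, padding=1),
                nn.InstanceNorm3d(channels),   # IN after each conv layer
                nn.ReLU(inplace=True),
                nn.MaxPool3d(2),
            ]
            in_ch = channels
        self.features = nn.Sequential(*units)
        self.pool = nn.AdaptiveAvgPool3d(1)    # assumption: global pooling before FC
        self.head = nn.Sequential(
            nn.Linear(channels, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, num_levels),
            nn.Softmax(dim=1),                 # probability over noise levels
        )

    def forward(self, x):
        f = self.features(x)
        return self.head(self.pool(f).flatten(1))

net = NoiseLevelAwareNet()
probs = net(torch.randn(2, 1, 32, 32, 32))   # (batch, channel, D, H, W)
print(probs.shape)                           # torch.Size([2, 4]); rows sum to 1
```

The softmax output can be read either as a hard class prediction (for the cross-entropy pretraining) or passed on as the soft noise-level prior.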

2.3 | Fusion block
Through the adaptive noise level-aware subnetwork, we obtain noise information from the input images. In this section, we demonstrate how this prior information can be efficiently incorporated into the reconstruction process. Wang et al. proposed a spatial feature transform layer that exploits segmentation probability maps as prior conditions and converts them into a pair of affine transformation parameters through a mapping function. 27 To make the mapping function available for end-to-end optimization with the super-resolution branch, they implemented it with a neural network. Huang et al. proposed a similar approach that transforms an anatomical prior into a channel weight mask via DL methods. 28,29 Inspired by these works, we attempt to retrofit the intermediate feature map with an affine transformation over the predicted noise level in the channel dimension. This can be expressed as

$$(\gamma, \beta) = H(x_{\text{output}})$$

where $(\gamma, \beta)$ is the modulation parameter pair and $H$ is the mapping function. $x_{\text{output}}$ represents the output of the adaptive noise level-aware subnetwork, which is treated as a prior. To make the mapping function available for end-to-end optimization during standard-dose PET image reconstruction, we also employed a neural network to obtain the affine parameter pairs. $\gamma$ is derived from a convolutional layer with a kernel size of 3 × 3 × 3 followed by a sigmoid activation function, so its values range from 0 to 1. $\beta$ is produced by a convolutional layer with a kernel size of 3 × 3 × 3 and a stride of 1. Since we intend to retrofit the intermediate feature maps in the channel dimension, the number of channels in $\gamma$ and $\beta$ is the same as that of the feature maps, and their matrix size is 16 × 32 × 1 × 1 × 1. After obtaining $(\gamma, \beta)$, the transformation is carried out by scaling and shifting the feature maps, which is formulated as follows:

$$\hat{y} = \gamma \odot y + \beta$$

where $y$ and $\hat{y}$ denote the input and output feature maps, respectively, and $\odot$ represents the element-wise multiplication operation.
In particular, these operations are based on the channel dimension, allowing for better spatial information preservation.
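A minimal sketch of such a channel-wise affine modulation is shown below. We assume the noise-level prior has already been broadcast to a `(N, C, 1, 1, 1)` tensor; the class name and shapes are illustrative, not the authors' exact implementation:

```python
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    """Maps a noise-level prior to a channel-wise (gamma, beta) pair and
    modulates the feature map as y_hat = gamma * y + beta."""

    def __init__(self, channels=32):
        super().__init__()
        # gamma: 3x3x3 conv + sigmoid, so scaling weights lie in (0, 1)
        self.to_gamma = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )
        # beta: plain 3x3x3 conv with stride 1
        self.to_beta = nn.Conv3d(channels, channels, kernel_size=3, padding=1)

    def forward(self, y, prior):
        # prior: noise-level prior broadcast to shape (N, C, 1, 1, 1)
        gamma = self.to_gamma(prior)
        beta = self.to_beta(prior)
        # broadcasting applies the same scale/shift across each channel
        return gamma * y + beta

block = FusionBlock(channels=32)
y = torch.randn(2, 32, 8, 8, 8)          # intermediate feature map
prior = torch.randn(2, 32, 1, 1, 1)      # hypothetical expanded noise prior
out = block(y, prior)
print(out.shape)                         # torch.Size([2, 32, 8, 8, 8])
```

Because `gamma` and `beta` have singleton spatial dimensions, the modulation acts per channel and leaves spatial detail untouched, matching the motivation above.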

2.4 | LPET restoration subnetwork
The U-Net architecture with skip connections combines the appearance feature representation obtained from the shallow encoding layers with the advanced feature representation derived from the deep decoding layers. 30 This innovative idea excels in many applications, such as image translation. In this paper, we also employed a U-shaped network to estimate standard-dose images. The LPET restoration subnetwork is divided into two parts: encoding and decoding. Five down-sampling units are contained in the encoding part. Each down-sampling unit includes a 3D convolution layer with a kernel size of 3 × 3 × 3 and a stride of 2, an activation layer, and another 3D convolution layer. The kernel size of this second convolution layer is the same as before, but its stride is 1. To speed up the convergence of the network, we add IN after each convolution layer.
In addition, to diversify the features of the reconstruction process, feature map x i obtained from the adaptive noise level-aware subnetwork and feature map y i derived from the down-sampling units are concatenated along the channel dimension. Then, a 3D convolution layer is used to restore the original number of channels. The output of the encoding part is delivered to the fusion block. The decoding part also contains five units for up-sampling. Each up-sampling unit contains a 3D transpose convolution layer, an activation layer, and another 3D convolution layer. The transpose convolution layer has a 3 × 3 × 3 filter size applied with a stride of 2, and the convolution layer has a 3 × 3 × 3 filter size applied with a stride of 1. Additionally, IN follows each convolution layer. The use of skip connections can be beneficial to the estimation of standard-dose PET images; therefore, the feature maps obtained from the encoding part are copied and concatenated with the feature maps derived from the decoding part. Here, we chose a Leaky ReLU function as the activation function for the LPET restoration subnetwork.
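The down- and up-sampling units described above can be sketched in PyTorch as follows. For brevity this illustrative version uses two levels instead of five, and the unit definitions are our reading of the text rather than the authors' released code:

```python
import torch
import torch.nn as nn

def down_unit(in_ch, out_ch):
    """Down-sampling unit: strided 3x3x3 conv, then a stride-1 3x3x3 conv,
    each followed by instance normalization and LeakyReLU."""
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, 3, stride=2, padding=1),
        nn.InstanceNorm3d(out_ch), nn.LeakyReLU(0.2, inplace=True),
        nn.Conv3d(out_ch, out_ch, 3, stride=1, padding=1),
        nn.InstanceNorm3d(out_ch), nn.LeakyReLU(0.2, inplace=True),
    )

def up_unit(in_ch, out_ch):
    """Up-sampling unit: 3x3x3 transpose conv with stride 2 (doubles the
    spatial size), then a stride-1 3x3x3 conv, with IN and LeakyReLU."""
    return nn.Sequential(
        nn.ConvTranspose3d(in_ch, out_ch, 3, stride=2, padding=1, output_padding=1),
        nn.InstanceNorm3d(out_ch), nn.LeakyReLU(0.2, inplace=True),
        nn.Conv3d(out_ch, out_ch, 3, stride=1, padding=1),
        nn.InstanceNorm3d(out_ch), nn.LeakyReLU(0.2, inplace=True),
    )

class TinyRestorationUNet(nn.Module):
    """Two-level illustrative stand-in for the five-level subnetwork."""

    def __init__(self, ch=32):
        super().__init__()
        self.d1, self.d2 = down_unit(1, ch), down_unit(ch, ch)
        self.u2 = up_unit(ch, ch)
        self.u1 = up_unit(2 * ch, ch)   # skip concatenation doubles input channels
        self.out = nn.Conv3d(ch, 1, 3, padding=1)

    def forward(self, x):
        e1 = self.d1(x)                           # encoder features
        e2 = self.d2(e1)
        d2 = self.u2(e2)
        d1 = self.u1(torch.cat([d2, e1], dim=1))  # skip connection
        return self.out(d1)

net = TinyRestorationUNet()
restored = net(torch.randn(1, 1, 16, 16, 16))
print(restored.shape)   # torch.Size([1, 1, 16, 16, 16])
```

In the full network, the fusion block would modulate the bottleneck features and the noise-aware subnetwork's feature maps would be concatenated into the encoder, as described above.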

3 | EXPERIMENTS
In this section, we describe the details of the datasets used in the experiments. After that, we state the data processing approach, which transformed DICOM-format images to standardized uptake value (SUV) units. Finally, the experimental parameters and the evaluation metrics are elaborated.

3.1 | Patient studies
Our datasets included 50 subjects, each of which consisted of LPET and standard-dose PET images. Each LPET had an image size of 192 × 192. The subjects were randomly divided into training (n = 40), validation (n = 5) and test (n = 5) groups. The anatomical site was the head and neck. All data were taken from a uPMR 790 PET/MR scanner (Shanghai United Imaging Medical Technology Company). The patient information is summarized in Table 1.
Before PET scanning, each subject was injected with an average of 234 MBq (range, 152 to 365 MBq) of the radioactive tracer 18F-fluorodeoxyglucose (18F-FDG), and the PET raw data were recorded in list-mode format. Corrections for attenuation, scatter, and random events were also included in the reconstruction process. To simulate the noise distribution in PET images, we roughly divided the noise distribution into four categories, denoted as 5%, 10%, 20%, and 50% dose. The four simulated doses of PET images were extracted from the standard raw data by randomly selecting the corresponding percentage of the count events. All these low-dose images were then reconstructed by the ordered subset expectation maximization method and were calibrated and decay-corrected.
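The count-thinning step described above can be sketched as Bernoulli subsampling of list-mode events. This is a simplified illustration; real list-mode records carry detector, energy, and timing fields rather than a bare index array:

```python
import numpy as np

def simulate_low_dose(events, fraction, rng=None):
    """Keep a random subset of list-mode coincidence events.
    `events` stands in for any per-event record array;
    `fraction` is the retained dose fraction (e.g. 0.05 for a 5% dose)."""
    rng = np.random.default_rng(rng)
    keep = rng.random(len(events)) < fraction   # independent Bernoulli draw per event
    return events[keep]

events = np.arange(1_000_000)                   # hypothetical event records
low = simulate_low_dose(events, 0.10, rng=0)    # simulate a 10% dose
print(len(low) / len(events))                   # close to 0.10
```

Because each event is kept independently, the retained counts follow a binomial distribution, which preserves the Poisson character of the underlying count statistics at the reduced dose.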

3.2 | Data preprocessing
The SUV reflects the ratio of the 18F-FDG radiation activity in the region of interest (ROI) to the body-averaged radiation activity. 31 It is often used for clinical differentiation between malignant and benign lesions and indicates the degree of tumor malignancy. The formula for the SUV is defined as follows:

$$\mathrm{SUV} = \frac{r}{a / \text{weight}}$$

where $r$ indicates the radioactivity concentration, weight represents the weight of the subject, and $a$ indicates the decay-corrected amount of the injected tracer dose. The SUV is a relative uptake measure and is beneficial for standardization and memory saving. 32 Therefore, the DICOM-format images we obtained were normalized into SUV units.
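The SUV normalization can be illustrated with a small helper. The unit convention used here (kBq/mL activity, MBq injected dose, and 1 kg of tissue approximated as 1000 mL) is an assumption for the example, not a statement of the authors' pipeline:

```python
def suv(activity_kbq_ml, injected_dose_mbq, weight_kg):
    """SUV = radioactivity concentration / (decay-corrected injected dose
    per unit body weight). Assumes kBq/mL activity, MBq dose, kg weight,
    and a tissue density of ~1 g/mL so that 1 kg ~ 1000 mL."""
    dose_kbq = injected_dose_mbq * 1000.0           # MBq -> kBq
    dose_per_ml = dose_kbq / (weight_kg * 1000.0)   # kBq per mL of body
    return activity_kbq_ml / dose_per_ml

# Example: 5 kBq/mL uptake, 234 MBq injected (the cohort average), 70 kg patient
print(round(suv(5.0, 234.0, 70.0), 2))   # 1.5
```

Applying this voxel-wise converts a DICOM activity volume into dimensionless SUV units, which is what makes images from differently dosed scans comparable.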

3.3 | Implementation details
Our 3D noise level-guided PET restoration network is a dual-stage structure which includes two subnetworks: the adaptive noise level-aware subnetwork and the LPET restoration subnetwork. Instead of training the entire model in an end-to-end manner, we trained the two subnetworks separately during the training process and fixed the number of channels to 32. The adaptive noise level-aware subnetwork, offering noise-level prior information, was pretrained for 200 epochs with the cross-entropy loss function; its learning rate was halved every 50 epochs, and the batch size was 16. Mean absolute error (MAE), which originated as a measure of mean error, has often been used to evaluate vector-to-vector regression models and is robust to various types of noise. 33 Therefore, we chose MAE as the loss function for the LPET restoration subnetwork, trained with a batch size of 16 for 500 epochs. After every 200 epochs, the learning rate was halved. Both subnetworks had an initial learning rate of 0.0001 and utilized the adaptive moment estimation (Adam) optimizer. All experiments were performed on a TITAN 3090Ti GPU with the PyTorch neural network framework.

TABLE 1 Demographic information of the subjects.
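The restoration subnetwork's optimizer and learning-rate schedule described above can be sketched as follows. The model here is a one-layer stand-in; only the Adam optimizer, initial rate of 0.0001, halving every 200 of 500 epochs, MAE (L1) loss, and batch size of 16 come from the text:

```python
import torch
import torch.nn as nn

model = nn.Conv3d(1, 1, 3, padding=1)        # stand-in for the restoration subnetwork
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # initial LR 0.0001
# halve the learning rate every 200 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=200, gamma=0.5)
criterion = nn.L1Loss()                      # MAE loss

for epoch in range(500):
    x = torch.randn(16, 1, 8, 8, 8)          # stand-in batch (batch size 16)
    target = torch.randn_like(x)             # stand-in standard-dose labels
    optimizer.zero_grad()
    loss = criterion(model(x), target)
    loss.backward()
    optimizer.step()
    scheduler.step()                         # once per epoch

print(optimizer.param_groups[0]["lr"])       # halved at epochs 200 and 400 -> 2.5e-05
```

The noise level-aware subnetwork would use the same pattern with `nn.CrossEntropyLoss()`, 200 epochs, and `step_size=50`.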

3.4 | Evaluation metrics
To evaluate the performance of the predicted standard-dose PET images, we selected two common evaluation metrics: the peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM). The PSNR is formulated as follows:

$$\mathrm{PSNR} = 10 \log_{10} \frac{M^2}{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \hat{x}_i\right)^2}$$

where $x$ is the standard-dose PET image, $\hat{x}$ is the estimated standard-dose PET image, $M$ is the maximum intensity range of the real standard-dose image $x$ and the estimated result $\hat{x}$, and $N$ is the total number of voxels in the image. The SSIM is defined as

$$\mathrm{SSIM} = \frac{(2\mu_x\mu_y + c_1)(2\delta_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\delta_x + \delta_y + c_2)}$$

where $\mu_x$, $\delta_x$ indicate the mean and variance of $x$; $\mu_y$, $\delta_y$ indicate the mean and variance of $y$; $\delta_{xy}$ is the covariance of $x$ and $y$; and $c_1$, $c_2$ are constants for stability. Apart from that, we also compare the proposed approach with several other DL-based methods, namely CNN(3D), U-Net(3D), and RED-CNN(3D). RED-CNN(3D) applies shortcut connections between the encoder and decoder to compensate for the distortion of structural information caused by upsampling. Compared to other classical methods, RED-CNN(3D) provides a certain degree of improvement in reconstructed image quality. 34 To be fair, these DL-based methods are trained on the same dataset using the same batch size and number of epochs. For CNN(3D), we use 10 convolution layers with a kernel size of 3 × 3 × 3. In particular, we perform a comparison not only with these 3D networks but also with the 2D U-Net model. 30 All these methods share the same experimental environment as our model.
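A straightforward NumPy implementation of the PSNR as defined above, taking $M$ as the intensity range of the reference volume (one common reading of "maximum intensity range"):

```python
import numpy as np

def psnr(x, x_hat):
    """PSNR (dB) between a reference volume x and an estimate x_hat:
    10 * log10(M^2 / MSE), with M taken as the reference intensity range."""
    x = np.asarray(x, dtype=np.float64)
    x_hat = np.asarray(x_hat, dtype=np.float64)
    mse = np.mean((x - x_hat) ** 2)          # (1/N) * sum of squared errors
    m = float(x.max() - x.min())             # intensity range M
    return 10.0 * np.log10(m ** 2 / mse)

rng = np.random.default_rng(0)
x = rng.random((16, 16, 16))                       # stand-in reference volume
mildly_noisy = x + 0.01 * rng.standard_normal(x.shape)
very_noisy = x + 0.10 * rng.standard_normal(x.shape)
print(psnr(x, mildly_noisy) > psnr(x, very_noisy))  # True: less noise, higher PSNR
```

For SSIM, an off-the-shelf implementation such as `skimage.metrics.structural_similarity` would typically be used rather than hand-rolling the windowed statistics.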

4 | RESULTS
In this section, we present the performance of the proposed 3D noise level-guided PET restoration network in terms of its noise reduction capability for LPET.

4.1 | Noise level estimation
To acquire the noise-level information of the input LPET, we constructed an adaptive noise level-aware subnetwork and pretrained it. When PET images pass through this subnetwork, the corresponding estimated noise level can be obtained. In this way, we can compare the estimates against the actual noise levels of the low-dose images and calculate the accuracy. We then integrated the estimated level into the subsequent reconstruction process as a prior to improve model performance. Figure 2 shows the accuracy of noise-level prediction on the training and validation datasets. We find that the accuracy achieved on the training dataset gradually stabilizes at approximately 100%, while the accuracy attained on the validation dataset is stable at 94%. We also applied this subnetwork to the test dataset, producing an accuracy rate of 93%. It can be seen that our adaptive noise level-aware subnetwork is able to correctly predict the noise level of PET images, facilitating the subsequent reconstruction process.

FIGURE 2 Accuracy results achieved on the training and validation datasets. The blue curve denotes the accuracy of the training dataset, and the red curve denotes the accuracy of the validation dataset.

4.2 | Standard-dose PET estimation
In this section, we present the visual results produced using different methods together with their quantitative results. Figure 3 shows the results of different methods for the 5% dose and 10% dose cases. For the estimated PET images corresponding to the 5% dose, within the areas indicated by black circles, the other methods show lighter colors in their pseudocolor maps than the standard-dose PET images, indicating that the SUV obtained by these methods is lower, whereas our method is closer to the ground truth. For the estimated results corresponding to the 10% dose, the results predicted using our method have a complete gray matter structure, while the other methods show discontinuities at the spots indicated by arrows.

FIGURE 3 Estimated results of different DL methods for the 5% dose and 10% dose cases. The black circles denote the ROIs, and the yellow arrows denote the differences. DL, deep learning; ROIs, regions of interest.

Figure 4 shows the results obtained using different methods for the 20% dose and 50% dose cases. In the presented pseudocolor map, a darker color corresponds to a higher SUV. Our method renders a darker color at each location marked with a yellow arrow, which is consistent with the ground truth for both the 20%-dose and 50%-dose PET images. In actual tumor diagnosis scenarios, these highlighted parts may suggest cancerous lesions. A large SUV difference may cause interpretation difficulties and a potentially higher risk of misdiagnosis.
It is clear that the images reconstructed using our method are visually closer to the ground truth. In addition to this qualitative evaluation, we performed a quantitative assessment of the estimated PET images. To reduce the impact of the background on the evaluation metrics, we cropped 128 × 128 patches from the image center to calculate the PSNR and SSIM values; the results are shown in Table 2. Based on the quantitative results, we find that our method performs especially well at higher noise levels, such as the 5% dose. Compared to the original LPET, the restored 5%-dose PET images yield a 12% PSNR improvement and a 7% SSIM improvement. However, RED-CNN(3D), the best-performing of the other methods, obtains only a 10% PSNR improvement and a 5% SSIM improvement. In addition, our method still outperforms this technique in terms of the PSNR and SSIM at all four noise levels.

4.3 | Evaluation for specific ROIs
In some practical applications, we need to focus on the uptake of tracers in specific areas. For example, during the postoperative review of a tumor, doctors observe whether there are metabolic abnormalities at the lesion site. On this basis, we compared the performance of the five tested methods on specific ROIs with relatively high SUVs as a way to benchmark the details of the estimated standard-dose PET images. Figure 5 shows three selected ROIs from the PET images at the 50% dose level, and the calculated PSNR and SSIM results are shown in Figure 6.
As shown in Figure 6, our method yields superior performance in specific ROIs. For the three selected ROIs, our method achieves significant improvements over the original LPET. In particular, in ROI 1, our method improves the PSNR from 22 to 28 dB and the SSIM from 0.88 to 0.95, while the other methods cannot achieve these improvements. Furthermore, we find that our approach can also perform well on edges. Figure 7 exhibits shoulder slices with the highest noise levels. The selected ROIs are placed in the bottom-right corner of each subfigure. We also present the corresponding statistical PSNR and SSIM indicators in Figure 8. Based on our observations, the proposed method achieves better statistical results than the competing approaches.

FIGURE 4 Estimated results of different DL methods for the 20% dose and 50% dose cases. The black circles denote the ROIs, and the yellow arrows denote the differences. DL, deep learning; ROIs, regions of interest.

TABLE 2 Quantitative results (mean ± std) obtained on simulated datasets in terms of the PSNR and SSIM. The best quantitative results are marked in bold.

FIGURE 5 Estimated results obtained using the different methods on simulated datasets at a 50% dose. Three ROIs are marked in black boxes. ROIs, regions of interest.

FIGURE 6 Statistical results on the three selected ROIs in terms of the PSNR and SSIM; the three ROIs are marked in Figure 5. LPET, low-dose PET; PET, positron emission tomography; PSNR, peak signal-to-noise ratio; ROIs, regions of interest; SSIM, structural similarity index measure.

5 | ABLATION STUDIES
In this section, we conduct several ablation studies on our datasets to validate the effectiveness of the proposed method.

5.1 | The impact of inaccurate noise-level estimation
We showed the noise-level estimation results of the proposed method in Section 4.1, where the accuracy reaches 100% on the training dataset but not on the validation dataset; this may raise concerns about the impact on the model when the noise level is incorrectly predicted. To explore the effect of wrong noise-level estimation, we fed both random and correct noise-level labels into the trained network. The quantitative results, shown in Figure 9, indicate that the PSNR decreased with the random noise-level input, suggesting that the noise prior plays a role in improving performance.
The ablation studies described above explore model performance when the noise level is random. However, a concern remains that our training and validation datasets share the same noise levels, which means that the validity of the model against unseen noise levels has not been verified. To explore the effect of unknown noise levels, we conducted experiments on a new dataset; the corresponding quantitative results are shown in Figure 10. This dataset is independent of the dataset used to develop our model; the scan site is also the head and neck, with the difference that the dose level of the included LPET is unknown.
As illustrated in Figure 10, when faced with an unknown noise-level input, our method still performs satisfactorily with improved image quality compared to the original LPET, suggesting the outstanding generalization ability of the proposed network.

5.2 | The effectiveness of the prior information
Our LPET restoration subnetwork combines noise levels and feature maps from the adaptive noise level-aware subnetwork. To verify the effectiveness of adding these two priors, we designed four ablation experiments. The "Baseline" model does not incorporate any prior information; it is a simple version of our restoration subnetwork. The "Baseline + noise level" model adds only the noise-level prior to the baseline. Similarly, the "Baseline + feature map" model adds only the feature map to the baseline while omitting the noise-level estimation step. "Proposed" represents our full LPET restoration subnetwork, which incorporates both noise levels and feature maps.

FIGURE 7 The selected regions of interest are marked with red lines, and the corresponding zoomed-in images are placed in the bottom-right corner of each subfigure.
We trained the four models with the same batch size of 16 for 200 epochs, recording the MAE loss. Quantitative PSNR results calculated over the entire image on the validation and test datasets are shown in Table 3. We find that our method, with the two priors integrated, achieves higher quantitative results. The PSNR achieved using the proposed method is much higher than that of the "Baseline" and "Baseline + noise level" models. Although the "Baseline + feature map" model obtains a relatively high PSNR, our method still performs better than this version. These quantitative results validate the effectiveness of incorporating the two types of prior information to improve model performance. Comparing the quantitative results across the different noise levels, our method also yields the highest PSNR, which provides even stronger evidence for the need to incorporate both noise levels and feature maps.

6 | DISCUSSION
In this paper, we rethought the low-dose problem by considering the impact of noise differences in image restoration quality and proposed a 3D noise level-guided PET restoration network to estimate standard-dose PET images from LPET.
To investigate the impact of the noise level, we coarsely divided common noise into four levels with different radiation doses. A decreasing radiation dose implies an increasing noise intensity in PET images. Additionally, PET image noise variances are widely distributed and may differ across individuals, medical devices, and other factors. Motivated by these considerations, we proposed a 3D noise level-guided PET restoration network for LPET imaging, which treats the estimated noise level as prior information for guiding the standard-dose PET image restoration process. However, our current noise-level definition does not fully simulate the PET image noise distribution. In future work, the division and estimation of noise levels with more uniform and smaller intervals will be carried out to improve the robustness and generalization of the model.
On the other hand, we discuss the network design. When predicting standard-dose PET images from LPET, we employed a U-shaped structure to facilitate information sharing between images from the same modality at different doses. However, other network structures are also promising, such as GANs, which have demonstrated superior performance in generative image modeling. 35,36 In addition, some semi-supervised and unsupervised methods have also been used to solve this problem. 37 Therefore, we will consider incorporating noise priors into other types of network architectures in future work. Moreover, our method is based on a 3D structure, which aims to preserve the spatial information contained in PET images. Compared with the U-Net (2D) model, our method also obtained higher quantitative results, demonstrating the advantage of the 3D structure.
Furthermore, the anatomical sites of the subjects are also worth considering. We keep targeting the head and neck information of the subjects. However, in clinical practice, physicians also scan different anatomical sites, such as the chest, the pelvis, and even the whole body, depending on the patient's symptoms. 38 Therefore, our future work will introduce more anatomical sites and also explore the performance of our method on whole-body PET images at low doses. In addition, our datasets contained only 50 subjects, and we expect to include more patients in future work.
The proposed method has the potential to complete other imaging tasks. In this study, we considered noise differences in PET images; similarly, significant noise intensity differences are present among low-dose CT images obtained from medical devices produced by different manufacturers. 28 This suggests that our method may also be able to migrate to other modalities while achieving good performance. Meanwhile, PET images with less than a quarter dose are usually employed for dynamic PET studies. In the future, we may apply our method to denoise single-frame low-dose dynamic PET images.
TABLE 3 Quantitative PSNR results (mean ± std) obtained on the validation and test datasets. The best PSNR (dB) results are marked in bold.


FIGURE 10 Statistical results on the new datasets with unknown noise level in terms of the PSNR. PSNR, peak signal-to-noise ratio.
7 | CONCLUSION

In conclusion, we considered the noise differences among PET images and proposed a 3D noise level-guided PET restoration network for LPET. Different from other DL-based methods that directly construct end-to-end networks to predict standard-dose PET images, our method consists of two subnetworks with multiple noise-level inputs. The adaptive noise level-aware subnetwork aims to predict the unknown noise level of the input images, while the LPET restoration subnetwork employs the estimated noise level as prior information to guide the restoration process from the original LPET. Ablation studies demonstrated the effectiveness of incorporating the noise prior, and both visual and quantitative results showed the superiority of our method over several DL methods. In future work, we will try to apply the developed method to other low-dose imaging tasks and implement more uniform and smaller noise-level intervals to improve the robustness of the model.