Improving synthetic CT accuracy by combining the benefits of multiple normalized preprocesses

Abstract

Purpose: To investigate the effect of different normalization preprocesses in deep learning on the accuracy of different tissues in synthetic computed tomography (sCT) and to combine their advantages to improve the accuracy of all tissues.

Methods: The cycle-consistent adversarial network (CycleGAN) model was used to generate sCT images from megavolt cone-beam CT (MVCBCT) images. In this study, 2639 head MVCBCT and CT image pairs from 203 patients were collected as a training set, and 249 image pairs from 29 patients were collected as a test set. We normalized the voxel values in the images to 0 to 1 or −1 to 1 using two linear and five nonlinear normalization preprocessing methods to obtain seven data sets and compared the accuracy of different tissues in the sCT images obtained by training on these data sets. Finally, to combine the advantages of the different normalization preprocessing methods, we obtained sCT_Blur by cropping, stitching, and smoothing (OpenCV's cv2.medianBlur, kernel size 5) each group of sCT images and evaluated its image quality and the accuracy of organs at risk (OARs).

Results: Different normalization preprocesses made the sCT images more accurate in different tissues. The proposed sCT_Blur took advantage of multiple normalization preprocessing methods, and all tissues were more accurate than in the sCT images obtained using a single conventional normalization method. Compared with the other sCT images, the structural similarity of sCT_Blur versus CT was improved to 0.906 ± 0.019. The mean absolute errors of the CT numbers were reduced to 15.7 ± 4.1 HU, 23.2 ± 7.1 HU, 11.5 ± 4.1 HU, 212.8 ± 104.6 HU, 219.4 ± 35.1 HU, and 268.8 ± 88.8 HU for the oral cavity, parotid, spinal cord, cavity, mandible, and teeth, respectively.

Conclusion: The proposed approach combines the advantages of several normalization preprocessing methods to improve the accuracy of all tissues in sCT images, which is promising for improving the accuracy of dose calculations based on CBCT images in adaptive radiotherapy.

INTRODUCTION

During a course of fractionated radiotherapy, anatomical variations can cause the delivered dose to deviate from the calculated value. These anatomical variations have led to the development of adaptive radiotherapy (ART), [1][2][3][4] in which plans are adjusted according to the actual clinical anatomical structures. Cone-beam computed tomography (CBCT) images are widely used to monitor setup errors and anatomical variations. However, radiotherapy dose calculations based on CBCT images can be inaccurate due to scattering and other artifacts, 5 which limits the applicability of CBCT images in ART. Recent studies have investigated converting CBCT images into synthetic CT (sCT) images with higher image quality through deep learning methods. [6][7][8][9] The sCT images have accurate HU values similar to those of CT images and anatomical structures similar to those observed in CBCT images and may thus enable accurate dose calculations in ART. Megavolt CBCT (MVCBCT) images have more noise, lower image contrast, and poorer image quality than kilovolt CBCT (kVCBCT) images. However, the higher X-ray energy greatly reduces photon starvation and beam-hardening effects, making the streak and metal artifacts commonly seen in kVCBCT almost negligible. 10 This has led many researchers to explore the feasibility of acquiring sCT images based on MVCBCT images. [10][11][12][13][14] Deep learning models are trained with gradient descent, which requires feature scaling before training. The CT numbers of different tissue types vary greatly in CT images: the CT numbers of soft tissues are approximately 0 HU, those of cavities reach as low as −1000 HU, those of bony tissues can exceed 1000 HU, and those of teeth can reach 3000 HU. When training deep learning networks, different feature values should be treated as equally important; however, the numerical calculations are dominated by larger feature values, even if these are not the most important features.
Thus, feature scaling is needed to normalize feature values of different magnitudes to similar small ranges, such as 0 to 1 or −1 to 1. 15,16 In addition, large differences in feature values produce large differences in gradient values, which would require very different learning rates to guarantee model convergence. Feature scaling can accelerate model training by reducing this spread in the required learning rate. 15,17 Normalization is a common feature scaling method. Several research groups have applied normalization in the preprocessing step before deep learning to generate sCT images based on CBCT images. The normalization range is typically 0 to 1 or −1 to 1, and the normalization methods are mainly linear normalization 11,[18][19][20][21][22][23] or nonlinear normalization using the hyperbolic tangent function (Tanh). 24,25 CT images have unequal proportions of voxels for different tissues, with soft tissues being the most dominant (more than seventy percent in the head). Before normalization, the voxel values in CT images are equal to the CT numbers, and the distribution of voxel values differs after normalization by different methods in the preprocessing stage. The voxel values corresponding to soft tissues after linear (min-max) normalization preprocessing occupy only approximately ten percent of the total voxel value range, so the soft tissues may not be trained effectively. In contrast, nonlinear normalization preprocessing can greatly widen the post-normalization distribution of voxel values corresponding to soft tissues and improve their training, but the cavities and bony tissues then become inaccurate due to insufficient training. CT images are the basis for calculating the three-dimensional dose distribution in patients, which affects the efficacy and safety of radiotherapy, so the accuracy of all body tissues should be guaranteed.
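The compression described above can be made concrete with a quick numerical sketch (ours, not from the paper) showing what fraction of the normalized output range soft tissue occupies under min-max normalization of the full CT number range:

```python
# Fraction of the normalized output range occupied by soft tissue
# ([-400, 150] HU) under min-max normalization of [-1000, 3000] HU.
# Illustrative sketch; the paper's exact normalization equations are
# defined later as methods I to VII.

HU_MIN, HU_MAX = -1000.0, 3000.0

def minmax(hu, lo=0.0, hi=1.0):
    """Linearly map a CT number from [HU_MIN, HU_MAX] to [lo, hi]."""
    return lo + (hu - HU_MIN) / (HU_MAX - HU_MIN) * (hi - lo)

soft_lo, soft_hi = minmax(-400), minmax(150)
fraction = soft_hi - soft_lo  # output range width is 1
print(f"soft tissue maps to [{soft_lo:.4f}, {soft_hi:.4f}], "
      f"{fraction:.2%} of the range")
```

This reproduces the 13.75% figure quoted later in the text: soft tissue spans 550 HU of a 4000 HU range, regardless of the linear target interval.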
However, the traditional single image normalization preprocessing method does not seem to ensure that all body tissues are adequately trained simultaneously. To the best of our knowledge, no studies have compared the impact of different normalization preprocessing methods on the quality of generated sCT images, which is explored in this study. In this study, the cycle-consistent adversarial network (CycleGAN) model was used to generate sCT images based on MVCBCT images. We demonstrate that different image normalization preprocesses can lead to different accuracies in different CT number ranges for each set of sCT images obtained from training. We then select, crop, stitch, and smooth the above different sCT images to obtain a new set of images that combines the advantages of multiple normalization preprocessing methods to ensure the accuracy of all body tissues.

Research process
As shown in Figure 1, the research procedure in this study can be divided into four steps. First (Figure 1a), the paired MVCBCT and CT images were preprocessed, including elastic registration, resampling, cropping, and normalization. Among these preprocessing steps, normalization is the most critical and is a crucial step in the entire study. We used different linear and nonlinear normalization preprocessing methods to normalize the voxel values of the CT and MVCBCT images from the original CT number range to 0 to 1 or −1 to 1. Different normalization preprocessing methods change the post-normalization distribution of the voxel values of different body tissues, which affects the accuracy of those tissues in the sCT images. Second (Figure 1b), the CycleGAN model was trained on the paired data normalized by seven different methods. Third (Figure 1c), the quality of the images generated by the seven trained models was evaluated on the test cases. Finally (Figure 1d), several data normalization preprocessing methods were combined to generate the final sCT images, and the image quality and CT numbers of organs at risk (OARs) were evaluated.

Data acquisition and preprocessing
In this study, paired CT and MVCBCT images were collected from 232 patients who underwent head radiotherapy. A total of 2639 image pairs from 203 patients were randomly selected as the training set, and the remaining 249 image pairs from 29 patients were used as the test set. The CT numbers of dental tissues are much larger than those of other bony tissues in the head, so including dental tissues in the images helps to investigate the effect of different normalization preprocessing methods on the accuracy of high CT number tissues in sCT images. Therefore, we selected only head CT images containing dental tissues for this study. The CT images were obtained with a Siemens SOMATOM Spirit helical CT scanner (tube voltage 130 kV, slice thickness 3 mm). Paired MVCBCT images were obtained in the first fraction of treatment (Siemens Artiste medical electron linear accelerator, 6 MV, 0.54 mm × 0.54 mm × 0.54 mm). The maximum CT number in the CT images did not exceed 3000 HU, and the maximum CT number in the MVCBCT images did not exceed 1400 HU. The MVCBCT images were elastically deformed according to the planning CT images so that the spatial positions of the two images corresponded as closely as possible, and the deformed images were resampled to a spacing of 1 mm × 1 mm × 1 mm and cropped to 256 × 256 voxels.
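The resampling and cropping steps can be sketched as follows. This is a minimal illustration using SciPy, not the paper's actual pipeline: the elastic registration step is omitted, and the function name and example spacing are our own.

```python
import numpy as np
from scipy import ndimage

def resample_and_crop(volume, spacing, target_spacing=(1.0, 1.0, 1.0),
                      crop_hw=(256, 256)):
    """Resample a 3D volume (z, y, x) to the target voxel spacing, then
    center-crop each axial slice to crop_hw. Sketch only: the paper's
    pipeline also performs elastic registration, which is omitted here."""
    zoom = [s / t for s, t in zip(spacing, target_spacing)]
    vol = ndimage.zoom(volume, zoom, order=1)  # trilinear interpolation
    h, w = vol.shape[1], vol.shape[2]
    ch, cw = crop_hw
    top = max((h - ch) // 2, 0)
    left = max((w - cw) // 2, 0)
    return vol[:, top:top + ch, left:left + cw]

# Example: an MVCBCT-like volume at 0.54 mm isotropic spacing
vol = np.zeros((10, 512, 512), dtype=np.float32)
out = resample_and_crop(vol, spacing=(0.54, 0.54, 0.54))
print(out.shape)  # in-plane size becomes 256 x 256
```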

Different normalization preprocessing methods
Various research groups have described linear normalization 11,18-23 and nonlinear normalization preprocessing methods using the hyperbolic tangent function (Tanh), 24,25 where Tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)). With reference to the above literature, 11,18-25 we defined the two linear normalization and one nonlinear normalization preprocessing methods employed in this work as follows, where N_img represents the normalized voxel value and HU_img represents the CT number of a voxel in the image. The subscript img indicates the image type, corresponding to the CT or MVCBCT images.
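Since methods I to III are given as equations in the original article, the following is only an illustrative reconstruction: min-max normalization to [0, 1] and [−1, 1], and a tanh normalization. The scale b = 400 is our inference from the worked values quoted later in the Discussion (tanh(500/400) ≈ 0.848), not a value stated in this section.

```python
import math

HU_MIN, HU_MAX = -1000.0, 3000.0

def linear_01(hu):
    """Min-max normalization to [0, 1] (method I-style; illustrative)."""
    return (hu - HU_MIN) / (HU_MAX - HU_MIN)

def linear_11(hu):
    """Min-max normalization to [-1, 1] (method II-style; illustrative)."""
    return 2.0 * linear_01(hu) - 1.0

def tanh_norm(hu, a=0.0, b=400.0):
    """Tanh normalization to (-1, 1), centered at a with scale b
    (method III-style; a and b are our inferred placeholders)."""
    return math.tanh((hu - a) / b)

# Soft tissue bounds under each mapping
print(linear_01(-400), linear_01(150))   # ~0.15 to ~0.29
print(linear_11(-400), linear_11(150))   # ~-0.70 to ~-0.43
print(tanh_norm(500))                    # ~0.848
```

The printed linear bounds match the mapped intervals reported in Table 2 of the article, which supports the reconstruction, but the exact published equations should be consulted.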
In the image set used in this paper, values between −400 and 150 HU (soft tissues) make up only 13.75% of the total voxel value range, but the corresponding voxels account for approximately 75.55% (Table 1) of the total voxels. After the two linear normalizations (I, II), values between −400 and 150 HU are mapped to 0.15 to 0.29 and −0.70 to −0.43, respectively (Table 2), still spanning only 13.75% of the normalized range (Table 3), which may result in 75.55% of the image voxels (soft tissue) not being effectively trained. The nonlinear normalization preprocessing method III increased the proportion of the range occupied by soft tissue voxel values to 56% (Table 3). Thus, the soft tissue portions of the images can be trained more thoroughly, leading to improved accuracy. However, the proportion occupied by the 1500 to 3000 HU mapping interval decreased from 37.5% to 0.06% (Table 3) after normalization with method III, so high CT number tissues (e.g., dental tissue) are not effectively trained. In radiotherapy, CT images are used to accurately calculate dose distributions, impacting the efficacy and safety of treatment. It is therefore worthwhile to improve the post-normalization voxel value distribution of each tissue separately by changing the normalization preprocessing method so that each tissue can be trained better.
We developed four other normalization preprocessing methods by changing the scaling center (a_img) and scaling proportion (b_img) in method III. These methods ensure that the voxel values corresponding to different tissues (cavities, soft tissues, bony tissues excluding teeth, and dental tissues) occupy the main part of the −1 to 1 range after image normalization preprocessing. These four normalization preprocessing methods can be expressed as IV to VII. Table 2 shows the post-normalization voxel values of four representative CT numbers (−1000 HU, −150 HU, 400 HU, and 3000 HU) for the different normalization preprocessing methods. Methods IV to VII modify the voxel values after normalization preprocessing by changing the values of a_img and b_img in III, altering the range of voxel values for different tissues. As shown in Table 3, after normalization preprocessing with methods IV to VII, the widest distributions of voxel values are obtained for cavities (−1000 to −150 HU), soft tissue (−150 to 400 HU), bony tissue (excluding teeth, 400 to 1500 HU), and dental tissue (1500 to 3000 HU), respectively, in the CT images.
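The effect of moving a_img and b_img can be sketched numerically. The parameter pairs below are hypothetical (the paper's fitted values for methods IV to VII are given in its equations, which we do not reproduce); the sketch only shows that centering the tanh on a tissue's HU interval widens that interval's share of the (−1, 1) output range.

```python
import math

def tanh_norm(hu, a, b):
    """Tanh normalization to (-1, 1): center a, scale b (sketch)."""
    return math.tanh((hu - a) / b)

def mapped_fraction(lo_hu, hi_hu, a, b):
    """Fraction of the (-1, 1) output range occupied by the HU
    interval [lo_hu, hi_hu] after tanh normalization."""
    return (tanh_norm(hi_hu, a, b) - tanh_norm(lo_hu, a, b)) / 2.0

soft = (-150, 400)     # soft tissue interval from Table 3
teeth = (1500, 3000)   # dental tissue interval from Table 3
for a, b, label in [(125, 300, "soft-centered"),
                    (2250, 800, "teeth-centered")]:
    print(label,
          f"soft={mapped_fraction(*soft, a, b):.3f}",
          f"teeth={mapped_fraction(*teeth, a, b):.3f}")
```

With the soft-centered parameters most of the output range is spent on soft tissue and almost none on teeth; the teeth-centered parameters reverse this, mirroring the trade-off the text describes for methods IV to VII.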
The synthetic images generated by the CycleGAN models after normalizing the CT and MVCBCT images according to the above seven normalization methods in the preprocessing are named Linear_1, Linear_2, Tanh_0, Tanh_1, Tanh_2, Tanh_3 and Tanh_4. These images as a whole are referred to as "uncombined sCT images" in later sections.

Structure and training of the CycleGAN model
The CycleGAN model is an unsupervised network that has been widely used to generate synthetic CT images. 5,7,10,[26][27][28] The CycleGAN architecture is consistent with that of the original paper by Zhu et al. (2017) and includes two generators (Gmvcbct-ct and Gct-mvcbct) and two discriminators (Dct and Dmvcbct), using the ResUNet of Xiao et al. (2018) and the 70 × 70 PatchGAN described by Zhu et al. (2017) as the generators and discriminators, respectively. 29,30 Gmvcbct-ct converts MVCBCT to CT images, and Gct-mvcbct converts CT to MVCBCT images. Dct discriminates sCT from real CT images, and Dmvcbct discriminates sMVCBCT from real MVCBCT images. During training, Gct-mvcbct also converts the sCT images generated by Gmvcbct-ct into cycle MVCBCT images and ensures that the cycle MVCBCT images are as consistent as possible with the real MVCBCT images. The activation function in the last ResUNet layer is ReLU when the normalization range is 0 to 1 and tanh when it is −1 to 1. The CycleGAN architecture was implemented with the Python deep learning library PyTorch. An Nvidia Quadro RTX 6000 GPU with 24 GB of memory was used to train the model. The initial learning rate was set to 0.002, the number of epochs was set to 200, and the learning rate decreased linearly from 0.002 to 0 over the last 100 epochs.

Cropping and combining the sCT images
As discussed in the "Different normalization preprocessing methods" section, different normalization preprocessing methods alter the distribution of voxel values for different tissues. The more widely distributed a tissue's voxel values are after normalization, the more thoroughly that tissue is trained and the more accurate its synthetic image will be. Therefore, the sCT images obtained with different normalization preprocessing methods have higher accuracy in different body tissues. We chose the MVCBCT images (within the skin) as the benchmark for segmenting image regions, as the sCT images in this study were all generated from MVCBCT images with consistent anatomical structures. We selected the most accurate sCT images in the different image regions by comparison with the planning CT images. To verify the generalization ability of the model, the cropping and stitching of the sCT images in the test set were based on the selection results from the training set. For example, as shown in Figure 1(d), assume the training set MVCBCT images (the part inside the skin) are divided into five image regions (r1-r5) according to the CT number from smallest to largest, and the most accurate sCT images in these five regions are sCT_1 to sCT_5. Then, in the test set, the MVCBCT images are segmented into five regions according to the same criteria, and the r1 part of sCT_1, the r2 part of sCT_2, the r3 part of sCT_3, the r4 part of sCT_4, and the r5 part of sCT_5 are each cropped and reassembled into one image. The final combined images are expected to be more accurate than any single prior sCT image. In this study, the MVCBCT images were first smoothed with an average filter (OpenCV's cv2.blur, kernel size 5) and then segmented into 48 regions by CT number in 50 HU units. The structural similarity index (SSIM), which is typically used to compare the similarity of two images, was chosen to select the most accurate sCT image in each of the 48 regions in the training set.
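The per-region selection step can be sketched as follows. This is a simplified stand-in for the paper's procedure: it scores each region with MAE instead of SSIM (to stay self-contained), operates on single arrays rather than a whole training set, and the function name and bin bounds are ours.

```python
import numpy as np

def best_model_per_bin(ct, scts, mvcbct, bin_width=50, lo=-1000, hi=1400):
    """For each CT-number bin of the (smoothed) MVCBCT image, pick the
    index of the sCT variant closest to the planning CT in that region.
    Sketch: per-region MAE stands in for the paper's SSIM score."""
    edges = np.arange(lo, hi + bin_width, bin_width)
    choice = {}
    for b0, b1 in zip(edges[:-1], edges[1:]):
        mask = (mvcbct >= b0) & (mvcbct < b1)
        if not mask.any():
            continue  # no voxels fall in this bin
        errors = [np.abs(s[mask] - ct[mask]).mean() for s in scts]
        choice[(b0, b1)] = int(np.argmin(errors))
    return choice
```

The returned mapping (bin interval to winning sCT index) is what the stitching step then applies to unseen test images.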
Based on the selection results in the training set, we cropped the corresponding sCT images in the test set from each of the 48 regions and stitched them together to create the combined image, sCT_Combine. Because the stitching edges between the different sCT images in sCT_Combine appeared unnatural and unsmooth to the human eye, we filtered sCT_Combine with a median filter (OpenCV's cv2.medianBlur, kernel size 5) to obtain sCT_Blur.
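The stitching and smoothing can be sketched as below, assuming a per-bin selection mapping is already available; SciPy's `ndimage.median_filter` stands in for OpenCV's `cv2.medianBlur`, and the function name is ours.

```python
import numpy as np
from scipy import ndimage

def combine_and_blur(scts, mvcbct, choice, median_size=5):
    """Stitch the per-bin winners from `choice` (bin interval -> sCT
    index) into sCT_Combine, then median-filter to obtain sCT_Blur.
    Sketch: scipy's median filter replaces cv2.medianBlur."""
    combined = np.zeros_like(scts[0])
    for (b0, b1), idx in choice.items():
        mask = (mvcbct >= b0) & (mvcbct < b1)
        combined[mask] = scts[idx][mask]
    blurred = ndimage.median_filter(combined, size=median_size)
    return combined, blurred

# Toy example: one bin, winner is the first sCT variant
scts = [np.full((5, 5), 1.0), np.full((5, 5), 2.0)]
mv = np.full((5, 5), -980.0)
combined, blurred = combine_and_blur(scts, mv, {(-1000, -950): 0})
```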

Image accuracy evaluation
The relative electron density (RED) is one of the main bases for radiotherapy dose calculation. 12 To compare the differences in the RED distributions due to noise and artifacts between the MVCBCT and CT images, the MVCBCT images were converted to CT images using CT number-RED correction curves. The original MVCBCT images are denoted as MVCBCT_1, and the converted images are denoted as MVCBCT_2. We evaluated the accuracy of the sCT images against the planning CT images qualitatively by eye and quantitatively using the mean absolute error (MAE), root mean square error (RMSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM), as described by Chen et al. 19 Lower MAE and RMSE and higher PSNR and SSIM indicate better image quality. The MAE is the average absolute difference of all voxel values between the two CT images.
The RMSE measures the deviation between the model-generated sCT images and the planning CT images and is more sensitive to outliers in the deviation of voxel values.
PSNR is the most common and widely used objective measure of image distortion.
SSIM considers three key features of an image: brightness, contrast, and structure, and measures the similarity of two images. SSIM is a perceptual model that is more in line with the intuition of the human eye, unlike MAE, RMSE, and PSNR, which measure the absolute error between images.
In Equations (1) to (4), MAX is the maximum CT number in the image. I_1 and I_2 represent the two images to be compared, and I_1,i and I_2,i refer to the HU values of the i-th voxel in I_1 and I_2. μ_1 and μ_2 represent the means of the voxel values in I_1 and I_2, σ_1 and σ_2 represent the corresponding standard deviations, and σ_12 represents the covariance of the voxel values of I_1 and I_2. c_1 and c_2 are constants.
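The four metrics can be sketched directly from the definitions above. Note one simplification: SSIM is computed here over the whole image from global means, standard deviations, and covariance, whereas standard implementations average the same expression over local windows; the constants c_1 and c_2 are placeholders.

```python
import numpy as np

def mae(i1, i2):
    """Mean absolute error of the voxel values (Equation 1-style)."""
    return np.abs(i1 - i2).mean()

def rmse(i1, i2):
    """Root mean square error, more sensitive to outliers."""
    return np.sqrt(((i1 - i2) ** 2).mean())

def psnr(i1, i2, max_val=3000.0):
    """Peak signal-to-noise ratio; MAX is the maximum CT number."""
    return 10.0 * np.log10(max_val ** 2 / ((i1 - i2) ** 2).mean())

def ssim_global(i1, i2, c1=1e-4, c2=9e-4):
    """Single-window SSIM from global means, standard deviations, and
    covariance (Equation 4-style); real implementations average this
    over local windows."""
    m1, m2 = i1.mean(), i2.mean()
    s1, s2 = i1.std(), i2.std()
    cov = ((i1 - m1) * (i2 - m2)).mean()
    return ((2 * m1 * m2 + c1) * (2 * cov + c2)) / \
           ((m1 ** 2 + m2 ** 2 + c1) * (s1 ** 2 + s2 ** 2 + c2))
```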
In the study by Vinas et al. (2021), regions with CT numbers within [−400, 150] HU were defined as soft tissue, and regions with CT numbers larger than 150 HU were defined as bony tissue. 10 The same division is adopted in this study, and regions with [−1000, −400] HU are defined as cavities. The voxels outside the skin were excluded when evaluating the overall image. When images of different tissues were evaluated, only the voxels with CT numbers within the corresponding range were assessed. We used integrated two-dimensional histograms and quantile-quantile (Q-Q) plots to analyze how similar the CT number distributions of the different sCT images were to those of the planning CT images (Figure 3). The more accurate the CT numbers of the sCT image, the brighter and more concentrated the hexagonal bins around the 45-degree line. The closer the Q-Q plot is to a 45-degree straight line, the more similar the CT number distributions of the two sets of CT images. Two associate chief physicians of radiotherapy outlined and reviewed the contours of the major OARs (oral cavity, parotid, spinal cord, cavity, mandible, and dental tissue) on the planning CT images and reverified the accuracy of the contour reproduction after we copied the contours onto each set of sCT images. These contours were used to evaluate the differences between the CT numbers of the corresponding voxels in the different images. Wilcoxon signed-rank tests for two related samples were performed with IBM SPSS Statistics software 31 to analyze whether the differences between the accuracies of the various sCT images were statistically significant.
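The Q-Q comparison amounts to plotting paired quantiles of the two CT-number samples; a minimal sketch (our helper, not the paper's plotting code):

```python
import numpy as np

def qq_points(sct_vals, ct_vals, n_quantiles=101):
    """Paired quantiles of two CT-number samples. Plotting the first
    array against the second gives the Q-Q plot, which lies on the
    45-degree line when the two distributions match."""
    q = np.linspace(0.0, 1.0, n_quantiles)
    return np.quantile(sct_vals, q), np.quantile(ct_vals, q)
```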

RESULTS
The most accurate sCT image differed across CT number ranges (e.g., Tanh_4 in the highest CT number range), and the sCT_Combine image was composed of the sCT images selected within these ranges. Figure 2 compares the sCT, MVCBCT, and planning CT images, with similarity judged qualitatively by eye. Although MVCBCT_2 is electron density corrected, its appearance, like that of MVCBCT_1, differs greatly from the CT image. The cavity shapes of Tanh_1 and Tanh_2 in the uncombined sCT images look more consistent with the MVCBCT images, and sCT_Combine and sCT_Blur inherit this advantage (as shown by the red boxed line in Figure 2). The teeth and mandibles in Tanh_4, Linear_1, and Linear_2 look closer to the planning CT. The teeth in Tanh_3 are darker than in the planning CT, but the mandible looks accurate. The soft tissue structures in the Tanh_0, Tanh_2, Linear_1, and Linear_2 images appear finer than those in the other sCT images and are consistent with the planning CT. The sCT_Combine images perform well for both soft and bony tissues but appear less smooth at the edges of some anatomical structures (such as the edges of the teeth and mandible in the blue boxed line in Figure 2) than the sCT_Blur images processed with the median filter. Figures S1 and S2 demonstrate the generalizability of the model in this work. All sCT_Blur images appear close to the planning CT images, for both soft tissue (Figure S1) and bony tissue (Figure S2). Additionally, the anatomical structures in the sCT_Blur images are consistent with MVCBCT_1 (shown by the red boxed line in the figure). Figure 3 shows the consistency of the CT number distributions of sCT_Combine, sCT_Blur, and each group of uncombined sCT images with the planning CT images. Each group of uncombined sCT images had different accuracies in different CT number intervals, consistent with Figure 2. The soft tissue in Tanh_0 and Tanh_2 is relatively accurate, but the bony tissue (CT number > 400 HU) differs from the planning CT images (shown by the red ellipse).
Both soft tissue and bony tissue in Tanh_1 are inaccurate (shown by the red ellipse). Tanh_3 and Tanh_4 are relatively inaccurate in the soft tissue (shown by the red ellipse) but accurate in the bony tissue. Linear_1 and Linear_2 are accurate in both soft and bony tissues, but the highest CT numbers are obviously lower than in the planning CT (white area shown by the blue arrow). sCT_Combine images are accurate in both soft and bony tissues. In addition, the brighter color blocks in sCT_Combine versus CT are basically concentrated within the green dashed line compared to the uncombined CT images (shown by the white arrow), indicating that the dispersion of CT number errors in sCT_Combine images is lower, while the dispersion of CT number errors in sCT_Blur images is even lower after smoothing (shown by the white arrow in sCT_Combine vs. CT).

FIGURE 3 Two-dimensional histograms and Q-Q plots of the sCT_Combine, sCT_Blur, and uncombined sCT images relative to the planning CT images in the range of [−1000, 3000] HU. The brighter the color near the 45-degree line and the more concentrated the bright color blocks, the more accurate the CT numbers of the sCT image and the smaller the dispersion of the CT number error. The white curve is the Q-Q plot. If the Q-Q plot deviates greatly from the 45-degree line, the CT number distributions of the two images differ substantially. To better compare the soft tissue regions of the images, the 2D histograms were redrawn and magnified in the lower right corner of each subplot in the range of −150 to 100 HU. The green dashed lines in the subfigures indicate the main concentration area of brighter blocks in sCT_Blur. CT, computed tomography; sCT, synthetic computed tomography.
It is necessary to quantitatively compare the accuracy of each group of sCT images in different CT number intervals. Tables 4 and 5 quantitatively show the accuracy of the uncombined sCT images in different CT number intervals and the image quality improvement of sCT_Blur relative to the uncombined sCT images, where all sCT images are compared to the planning CT images using the MAE, RMSE, PSNR, and SSIM metrics. The quantitative analysis shows the variability in image quality of the different uncombined sCT images across CT number intervals; no single uncombined sCT image provided the best accuracy for all tissues. Among the uncombined sCT images, Tanh_2 and Tanh_4 are the most accurate in the ranges of −400 to 150 HU (soft tissue) and 150 to 3000 HU (bony tissue), respectively, and Tanh_1 has the best MAE, RMSE, and PSNR in the cavity region. Tables 4 and 5 show that the sCT_Blur image quality metrics are better than those of the other sCT images (Linear_1, Linear_2, and Tanh_0) obtained by training the CycleGAN model after preprocessing the images with the three common normalization methods (methods I to III). Figure 4 provides a more visual comparison of image accuracy. sCT_Blur preserves the anatomical structure information of MVCBCT_1, while the image is closer to the planning CT image (Figure 4a). The absolute CT number error maps in Figure 4(b) compare the CT number accuracy of MVCBCT_1, sCT_Blur, and the other sCT images (Linear_1, Linear_2, and Tanh_0) with the planning CT as the reference image. The results show that all sCT images are more accurate than MVCBCT_1. In the dental tissue, the blue color in the |sCT_Blur-CT| map is lighter than in the absolute CT number error maps of the other sCT images, representing a more accurate distribution of CT numbers in the dental tissue. We created Figure 4(c) to compare the CT number accuracy of sCT_Blur with the other three sCT images more directly.
The darker blue a point in a map is, the smaller the absolute CT number error of sCT_Blur at that point and the more accurate sCT_Blur is; the opposite is true for red, and white means that the absolute CT number errors of the two images at that point are equal. In the two difference maps "|sCT_Blur-CT|-|Linear_1-CT|" and "|sCT_Blur-CT|-|Linear_2-CT|", red and blue are mixed throughout, but overall there are more blue points, indicating that sCT_Blur is more accurate than Linear_1 and Linear_2. In the map "|sCT_Blur-CT|-|Tanh_0-CT|", it is hard to tell by eye which color dominates in the soft tissues, but the blue is clearly more prevalent and darker in the dental part, showing that the accuracy of the dental tissue is improved in sCT_Blur compared to Tanh_0. Figure 4(d) shows the CT number profiles along the blue line through the teeth in Figure 4(a). Some of the teeth in Tanh_0 obviously have higher CT numbers than in the planning CT (shown by the red arrows). All three uncombined sCT images have some voxels in the teeth with much lower CT numbers than in the planning CT images (shown by the blue dashed box). The positions of the tooth edges in Linear_1 and Linear_2 differ greatly from the planning CT images (shown by the blue arrows). Overall, the CT number profile of sCT_Blur is closest to the planning CT, indicating that the dental tissue of sCT_Blur is more accurate than in the other three groups of sCT images. Figure 5(a-f) shows the CT number error distributions between the sCT images and planning CT images in the OARs at the voxel level, and Figure 5(g) shows the MAEs of the CT numbers for the different OARs at the slice level.
The CT numbers of all OARs in sCT_Blur were more accurate than those of Linear_1, Linear_2, and Tanh_0 obtained by training after normalization using the conventional methods (methods I to III) in the image preprocessing phase, and the differences were statistically significant (p < 0.001). Relative to the planning CT images, the MAE in the CT number for each OAR in the sCT_Blur images was 15.7 ± 4.1 HU (oral cavity), 23.2 ± 7.1 HU (parotid), 11.5 ± 4.1 HU (spinal cord), 212.8 ± 104.6 HU (cavity), 219.4 ± 35.1 HU (mandible), and 268.8 ± 88.8 HU (dental tissue).

DISCUSSION
The main purpose of this paper is to address the problem that the traditional single image normalization methods (methods I to III) used in the image preprocessing stage cannot allow the different tissues in CT images to be trained adequately at the same time, which limits the accuracy of sCT images and thus the accuracy of 3D dose distribution calculations in radiotherapy. We compared the accuracy in different CT number intervals of seven groups of sCT images obtained using different normalization methods (methods I to VII) in the image preprocessing stage. The linear normalization preprocessing methods map the voxel values of the soft tissues to a narrow range (Table 3 and Figure 6c,d), resulting in poor image accuracy for the soft tissues in the sCT images (Table 4), whereas the nonlinear normalization preprocessing methods with the hyperbolic tangent function allow the voxel values of the soft tissues to fit a wider distribution after normalization (Table 3 and Figure 6c,d), thus improving the accuracy of the CT numbers of the soft tissues in the sCT images (Table 4). However, after nonlinear normalization preprocessing, the voxel values of tissues with low and high CT numbers are normalized to a very narrow range (Table 3) that cannot be effectively learned. For example, after normalization preprocessing with the tanh function, the CT numbers of 500 HU, 1400 HU, and 3000 HU become 0.848, 0.998, and 0.999, respectively. As a result, the model cannot identify bones and teeth effectively. In this study, we changed the post-normalization distribution of voxel values in different CT number intervals by changing a_img and b_img in the nonlinear normalization equation (Tanh, methods III to VII) in the image preprocessing stage, thus making the CycleGAN model focus more on training the image regions in specific CT number intervals. As shown in Table 3 and Figure 6(c,d), the normalization preprocessing methods used to generate the Tanh_1, Tanh_2, and Tanh_4 images allow the cavity, soft tissue, and bony tissue, respectively, to fit large distribution ranges.
Thus, the Tanh_1, Tanh_2, and Tanh_4 images showed the best results in the cavity, soft tissue, and bony tissue, respectively (Table 4). The combination of multiple sCT images exploits the advantages of the various nonlinear normalization preprocessing methods to improve the accuracy of all tissues in the sCT images. Our results show that sCT_Blur image accuracy is better than that of the Tanh_0, Linear_1, and Linear_2 images (Tables 4 and 5). For the OARs investigated in this study, the accuracy of the CT numbers in the sCT_Blur images was higher than that in all the sCT images obtained via conventional normalization preprocessing methods, and the differences were all statistically significant (p < 0.001).
In a previous study, Liang et al. 24 generated sCT images based on head and neck kVCBCT images using the CycleGAN model with a normalization preprocessing method involving the hyperbolic tangent function. Vinas et al. 10 generated sCT images based on head and neck MVCBCT images using the CycleGAN model with an unspecified image normalization preprocessing method. In this study, the SSIM values inside the body were recalculated according to the image quality evaluation methods proposed in those two studies. The SSIM value is 0.928 ± 0.014 in our study, which is higher than the results of Liang et al. and Vinas et al. As shown in Figure 2 and Figure S3, our results indicate that simply calibrating the MVCBCT images with CT number-RED calibration curves is not accurate in most soft tissue regions. This problem may occur because the CT numbers in the center of the MVCBCT images are underestimated during the reconstruction process, leading to cup artifacts in the image centers. The existence of cup artifacts leads to nonuniformity in the MVCBCT images. 12 Therefore, the calibration error is large when the CT number-RED calibration curve obtained from a standard electron density phantom at the center of the MVCBCT beam field is applied directly before the nonuniformity is corrected. Morin et al. 12 showed that uniformity calibration of MVCBCT images can correct cup artifacts and greatly improve the CT number accuracy of MVCBCT images. To some extent, the accuracy of the generated sCT images could be improved by training the model on CT and MVCBCT images with the artifacts removed. We will study this issue further in future work.
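The CT number-RED calibration that MVCBCT_2 relies on can be sketched as a pair of piecewise-linear lookups. The calibration points below are placeholders for illustration only, not measured phantom data, and the helper name is ours; real curves come from an electron density phantom.

```python
import numpy as np

# Hypothetical kV CT calibration points (CT number -> RED);
# placeholders, not measured values.
cal_hu = np.array([-1000.0, 0.0, 1000.0, 3000.0])
cal_red = np.array([0.0, 1.0, 1.6, 2.6])

def mvcbct_to_ct(mvcbct_hu, mv_curve_hu, mv_curve_red):
    """Map MVCBCT numbers to RED via the MV calibration curve, then map
    RED back to kV CT numbers via the kV curve (sketch of how the
    MVCBCT_2 images could be produced)."""
    red = np.interp(mvcbct_hu, mv_curve_hu, mv_curve_red)
    return np.interp(red, cal_red, cal_hu)

# Hypothetical MV calibration curve for the example
mv_hu = np.array([-1000.0, 0.0, 1400.0])
mv_red = np.array([0.0, 1.0, 1.8])
```

As the Discussion notes, applying such a curve directly is inaccurate wherever cup artifacts make the MVCBCT numbers nonuniform, since a single curve cannot model a spatially varying bias.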
Our proposed method still has some limitations. We needed to train the CycleGAN model on each of the seven data sets obtained from the seven normalization preprocessing methods to compare the effects of the different methods on the accuracy of different tissues in the final sCT images, which is very time-consuming. In addition, we cannot ensure that each normalization method is optimal, because the different normalization methods were obtained by manually changing the parameters in the normalization equation. Since the seven sets of sCT images are trained independently, the final combined sCT can be obtained only after the training of each data set is completed and cannot be integrated into the CycleGAN model for automatic training. In future work, we plan to modify the CycleGAN model to achieve automatic optimization of multiple normalization preprocesses, simultaneous training of the different normalized data, and automatic combination and optimization of the different sCT images.

CONCLUSION
In this study, the CycleGAN model was used to generate synthetic CT images from head and neck MVCBCT images containing dental tissue. We demonstrate that changing the image normalization preprocessing method can adjust the accuracy of different tissues in sCT, and that a single image normalization preprocessing method cannot make all tissues sufficiently accurate. Our proposed method of combining sCT can take advantage of different normalization preprocessing methods to improve the accuracy of all tissues and all OARs in sCT images, which has considerable clinical value for improving the accuracy of dose calculation in ART based on MVCBCT images.

CONFLICT OF INTEREST STATEMENT
None.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.

ETHICS STATEMENT
The studies were reviewed and approved by the Ethics Committee of the First People's Hospital of Hefei (2022-037).