Assessing the variability and correlation between SUV and ADC parameters of head and neck cancers derived from simultaneous PET/MRI: A single‐center study

Abstract
Objective: Intratumoral heterogeneity is associated with poor outcomes in patients with head and neck cancer (HNC) owing to chemoradiotherapy resistance. [18F]-FDG positron emission tomography (PET)/magnetic resonance imaging (MRI) provides spatial information about the tumor mass, which allows intratumoral heterogeneity to be assessed through histogram analysis. However, variability in quantitative PET/MRI parameter measurements could affect their reliability for assessing patient prognosis. Therefore, to support the use of standardized uptake value (SUV) and apparent diffusion coefficient (ADC) parameters for assessing tumor response, this study aimed to measure the variability of SUV and ADC and to assess their relationship in HNC.
Methods: First, ADC variability was measured in an in-house diffusion phantom and in five healthy volunteers. SUV variability was measured only in the NEMA phantom, using a clinical imaging protocol. Simultaneous PET/MRI data of 11 patients with HNC were retrospectively collected from the National Cyclotron and PET center in Chulabhorn Hospital. Tumor contours were manually drawn on PET images by an experienced nuclear medicine radiologist before tumor volume segmentation. Next, the SUV and ADC histograms were used to extract the following statistical variables: mean, median, minimum, maximum, skewness, kurtosis, and the 5th, 10th, 25th, 50th, 75th, 90th, and 95th percentiles. Finally, the correlations among the ADC and SUV statistical variables and the metabolic tumor volume (MTV) and total lesion glycolysis (TLG) parameters were assessed using Pearson's correlation.
Results: In this pilot study, the maximum coefficient of variation of the two parameters was 13.9% in the phantom and 9.8% in vivo. Furthermore, we found a strong negative correlation between SUVmax and ADCmed (r = −0.75, P = 0.01).
Conclusion: The SUV and ADC obtained by simultaneous PET/MRI could potentially be used as imaging biomarkers for assessing intratumoral heterogeneity in patients with HNC. Their low variability and mutual relationship could enable multimodal prediction of tumor response in future studies.


INTRODUCTION
Head and neck cancer (HNC), including oral cavity, oropharyngeal, hypopharyngeal, and laryngeal tumors, is the seventh most common type of cancer worldwide. Recent studies have reported a 5-year survival rate of approximately 40-50% for HNC after treatment, and HNC accounts for 3% of cancer-related deaths. [1][2][3] The main treatment options for HNC include surgery, chemotherapy, radiation therapy, targeted therapy, and immunotherapy, and advances in these treatments have improved HNC outcomes over the past decade. 4,5 However, recurrence remains common and can limit curative treatment. 6 Eckardt et al. reported that most patients with HNC experienced recurrence within 2 years after primary treatment. 7 Intratumoral heterogeneity is associated with cellular and molecular characteristics such as cellular proliferation, necrosis, fibrosis, differences in blood flow and angiogenesis, cellular metabolism, tissue stiffness, hypoxia, and specific receptor expression. 8 It poses a major challenge for the clinical management of patients with HNC owing to persistent drug tolerance within heterogeneous cell populations, 9 which could lead to tumor recurrence after initial treatment.
Medical imaging plays a vital role in assessing intratumoral heterogeneity because it provides spatial information on tissue characteristics within the tumor mass, thereby enabling heterogeneity analysis. For instance, [18F]-fluorodeoxyglucose (FDG) positron emission tomography ([18F]-FDG PET) is used to measure therapeutic response relatively early in the course of treatment through various FDG parameters, such as the mean standardized uptake value (SUVmean), the maximum and peak standardized uptake values (SUVmax and SUVpeak), total lesion glycolysis (TLG), and metabolic tumor volume (MTV). TLG is the product of SUVmean and MTV (TLG = SUVmean × MTV).
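The FDG parameters defined above can be illustrated with a minimal sketch. It assumes the tumor is given as a flat list of voxel SUVs with a matching binary mask and a known voxel volume in mL; all names and values are hypothetical. MTV is taken here as the number of masked voxels times the voxel volume, whereas in practice MTV is often defined with an SUV threshold, which this sketch does not apply.

```python
from dataclasses import dataclass


@dataclass
class PetMetrics:
    suv_mean: float
    suv_max: float
    mtv_ml: float  # metabolic tumor volume in mL
    tlg: float     # total lesion glycolysis = SUVmean * MTV


def pet_metrics(suv_voxels, mask, voxel_volume_ml):
    """Compute SUVmean, SUVmax, MTV and TLG inside a binary tumor mask.

    Assumes `suv_voxels` and `mask` are flat sequences of equal length and
    that every masked voxel belongs to the tumor (no SUV threshold applied).
    """
    if len(suv_voxels) != len(mask):
        raise ValueError("image and mask must have the same number of voxels")
    tumor = [s for s, m in zip(suv_voxels, mask) if m]
    if not tumor:
        raise ValueError("mask contains no tumor voxels")
    suv_mean = sum(tumor) / len(tumor)
    mtv_ml = len(tumor) * voxel_volume_ml
    return PetMetrics(suv_mean, max(tumor), mtv_ml, suv_mean * mtv_ml)


# Hypothetical 6-voxel example with 0.5 mL voxels; 4 voxels lie inside the mask.
m = pet_metrics([1.0, 6.0, 8.0, 10.0, 4.0, 0.5], [0, 1, 1, 1, 1, 0], 0.5)
print(m)  # PetMetrics(suv_mean=7.0, suv_max=10.0, mtv_ml=2.0, tlg=14.0)
```

Because TLG scales with both uptake and volume, a large tumor with modest uptake can have the same TLG as a small, intensely avid one.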
In addition to [18F]-FDG PET, diffusion-weighted magnetic resonance imaging (DWI-MRI) is also used to investigate tumor heterogeneity. DWI-MRI quantification is based on the microscopic random translational motion of water molecules in biological tissues. The magnitude of this motion is described by the apparent diffusion coefficient (ADC), derived from DWI-MRI images, so tissues can be characterized by their variation in ADC values. 10 Previous studies showed an inverse correlation between ADC values and tumor cellularity, indicating that ADC can reflect tumor microstructure. 11,12 ADC values have also been proposed for characterizing head and neck tumors. [13][14][15] Simultaneous PET/MRI is a relatively new hybrid modality for functional and morphological imaging in which [18F]-FDG PET and MRI images are acquired simultaneously, allowing accurately aligned multiparametric imaging to characterize intratumoral tissue. 18 It is also a powerful tool for evaluating tumor biology and pathology and shows great potential for assessing intratumoral heterogeneity in HNC. Meyer et al. 16 applied histogram analysis, a novel technique for radiological image analysis, to PET and MRI images to assess the correlation between histogram-based ADC parameters and FDG-PET parameters, including SUVmax, SUVmean, TLG, and MTV, in patients with HNC. They found that ADC entropy correlated well with MTV and TLG.
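As an illustration of how an ADC value is derived from DWI signals, the sketch below assumes the standard monoexponential model S(b) = S0·exp(−b·ADC) with two b-values. The b-values of this study's protocol (Table 2) are not reproduced here, so b = 0 and 1000 s/mm² and the signal intensities are illustrative assumptions.

```python
import math


def adc_two_point(s_low, s_high, b_low, b_high):
    """Apparent diffusion coefficient (mm^2/s) from two DWI signal intensities.

    Assumes monoexponential decay S(b) = S0 * exp(-b * ADC), with b in s/mm^2.
    """
    if s_low <= 0 or s_high <= 0:
        raise ValueError("signal intensities must be positive")
    if b_high <= b_low:
        raise ValueError("b_high must be greater than b_low")
    return math.log(s_low / s_high) / (b_high - b_low)


# Illustrative voxel: signal 1000 at b = 0, decaying by a factor of e at b = 1000.
s_b1000 = 1000 * math.exp(-1.0)
adc = adc_two_point(1000.0, s_b1000, 0, 1000)
print(f"ADC = {adc * 1e3:.2f} x 10^-3 mm^2/s")  # ADC = 1.00 x 10^-3 mm^2/s
```

A smaller drop in signal between the two b-values yields a lower ADC, i.e., more restricted diffusion, which is the behavior linked above to higher cellularity.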
Although ADC and FDG-based parameters have been reported to have potential clinical use in assessing tumor response, the reliability of SUV parameters for this purpose is still limited by variability in the reported accuracy of [18F]-FDG PET and by confounding factors, such as an early reduction in activity in the presence of viable tumor or increased uptake secondary to inflammation after chemotherapy and radiotherapy; [17][18][19] consequently, their clinical use for HNC prognosis remains debatable. Because SUV and ADC parameters vary when obtained at a single imaging center or across multiple centers, their reliability should be investigated before they are adopted in clinical research. [20][21][22] Schlett et al. (2016) reported the interscanner and intrascanner variability of quantitative parameters, in particular fractional anisotropy, across different 3T MR scanners (seven models from four vendors) at nine imaging centers in 30 healthy volunteers. 23 They found higher reproducibility for intrascanner than for interscanner comparisons, with interscanner variability ranging from 1.0% to 53.2%, and suggested that the differences decreased when identical MRI models from a single vendor were used. In addition, a phantom study across 10 imaging centers found significant differences in SUVmax values. 24 The authors reported SUV variability of 10-25% and suggested that SUV variability could be even higher in practice owing to biological and protocol factors.
Therefore, as a first step toward assessing HNC response after initial treatment using SUV and ADC parameters derived from simultaneous PET/MRI at our center as multimodal predictors, we aimed to measure SUV and ADC variability and to identify correlations between SUV and ADC parameters in patients with HNC, with the ultimate goal of developing multimodal prediction of head and neck tumor response in future studies.

METHODS
As summarized in Figure 3, the study's methods were performed as follows.

SUV and ADC variability measurement in phantoms
In this study, we measured the variability of SUV and ADC parameters derived from the PET/MRI scanner both in phantoms and in vivo, using the coefficient of variation (CV) as the variability index throughout this work, because it is a commonly used measure of repeatability in medical imaging. 25
Megabecquerel (MBq) on average at the start of the measurement, except for the spheres with diameters of 28 and 32 mm, which were empty. The phantom was placed at the center of the MRI table of the PET/MRI scanner (3T Biograph mMR, Siemens Healthineers, Erlangen, Germany). Images were acquired on 3 consecutive days (one session per day) using PET-optimized body radiofrequency (RF) receive coils and the standard-of-care OSEM reconstruction with point spread function modeling (Table 2), with the same imaging protocol as that used in routine simultaneous PET/MRI scans of patients with HNC, as shown in Figure 1(a).
The three datasets were imported into 3D Slicer (www.slicer.org) for SUV measurement. Before measurement, image noise was reduced using median noise filtering with a neighborhood size of 2 × 2 × 2. A region of interest (ROI) with an area of 0.19 mm² was then manually drawn at the center of each sphere to measure SUV. For each of the six spheres, the mean and standard deviation (SD) of the SUV across sessions were used to compute the coefficient of variation (CV = SD/mean), which served as the intersession variability.
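The intersession CV described above can be sketched as follows. This assumes one SUV reading per sphere per session and uses the sample standard deviation (n − 1 denominator); whether the study used the sample or population SD is not stated, and the SUV values below are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """CV (%) = SD / mean * 100, using the sample standard deviation."""
    mean = statistics.mean(values)
    return statistics.stdev(values) / mean * 100.0


# Hypothetical SUVs: one list per sphere, one value per session (3 sessions).
suv_by_sphere = {
    "sphere 1": [3.2, 3.4, 3.0],
    "sphere 2": [2.8, 3.1, 2.9],
}
intersession_cv = {
    name: coefficient_of_variation(values) for name, values in suv_by_sphere.items()
}
for name, cv in intersession_cv.items():
    print(f"{name}: intersession CV = {cv:.1f}%")
```

Normalizing the SD by the mean makes spheres with different absolute uptake directly comparable, which is why the CV rather than the raw SD is reported per sphere.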

2.1.2 The variability of ADC parameter measurement using an in-house diffusion phantom
ADC variability was assessed using an in-house diffusion phantom modified from the phantom developed by Hara et al. 28 The phantom comprises eight tubes: seven contain sucrose solutions of 0, 0.2, 0.4, 0.6, 0.8, 1.0, and 1.2 mol/L with NaN3 (0.03% w/w), and the eighth contains ultrasound gel, as shown in Figure 1. The phantom was placed inside a PET-optimized head/neck RF coil (16 channels). DWI images were acquired on 3 consecutive days (sessions), with three scans per session without removing the phantom. All data were acquired in the axial plane using the same imaging protocol as in routine clinical scanning. The acquired datasets were imported into 3D Slicer for median noise filtering (neighborhood size of 2 × 2 × 2) and N4ITK MRI bias field correction, after which ADC was measured by manually drawing a single ROI (0.61 ± 0.03 mm²) at the center of each tube in all datasets. The measured ADC values were averaged to compute both the intrasession (within session 1) and intersession variability for each tube.
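The distinction between intra- and intersession variability for a single tube can be sketched as below. This assumes a 3 × 3 layout (three sessions, three repeated scans per session) of ROI-mean ADC values in 10^-3 mm²/s, with the intersession CV computed from the per-session means; the values and the helper names are hypothetical.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation in percent (sample SD / mean)."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0


def adc_variability(sessions):
    """Return (intrasession CV of session 1, intersession CV) for one tube.

    `sessions` is a list of sessions, each a list of repeated ADC readings.
    Intrasession CV follows the text in using only the first session;
    intersession CV is computed from the mean ADC of each session.
    """
    intra = cv_percent(sessions[0])
    session_means = [statistics.mean(s) for s in sessions]
    inter = cv_percent(session_means)
    return intra, inter


# Hypothetical ADC readings (x 10^-3 mm^2/s) for one tube: 3 sessions x 3 scans.
tube = [[2.00, 2.10, 1.90], [2.20, 2.30, 2.10], [1.90, 2.00, 1.80]]
intra_cv, inter_cv = adc_variability(tube)
print(f"intrasession CV = {intra_cv:.1f}%, intersession CV = {inter_cv:.1f}%")
```

In this toy example the day-to-day shift in session means dominates, so the intersession CV exceeds the intrasession CV, the same ordering seen for most tubes in the results.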

ADC variability measurement in healthy volunteers
Five healthy volunteers (aged 21-22 years) were prospectively recruited to assess the variability of ADC parameters. Signed informed consent was obtained from all volunteers before MRI scanning. DWI-MRI examinations of the head and neck were performed twice in each volunteer, 1 month apart, using the same imaging protocol as for patients with HNC. ADC images of all volunteers were imported into 3D Slicer for N4ITK MRI bias field correction and median noise filtering (neighborhood size of 2 × 2 × 2). Median filtering sets each output pixel to the statistical median of the neighborhood of values around the corresponding input pixel, and it is an efficient method for removing "salt-and-pepper" noise.
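The median filtering step can be illustrated with a small pure-Python 2D sketch. It assumes a (2r + 1) × (2r + 1) window that is clipped at the image border; how 3D Slicer's "neighborhood size of 2 × 2 × 2" maps onto window extent is not specified here, so the radius below is only an illustrative choice, and real ADC volumes would be filtered in 3D.

```python
import statistics


def median_filter_2d(image, radius=1):
    """Replace each pixel with the median of its (2r+1) x (2r+1) neighborhood.

    The window is clipped at the image borders, so edge pixels use fewer
    neighbors. `image` is a list of equal-length rows of numbers.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            window = [
                image[y][x]
                for y in range(max(0, i - radius), min(rows, i + radius + 1))
                for x in range(max(0, j - radius), min(cols, j + radius + 1))
            ]
            out_row.append(statistics.median(window))
        out.append(out_row)
    return out


# A uniform patch corrupted by one "salt" (bright) and one "pepper" (zero) pixel.
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 0, 10],
    [10, 10, 10, 10],
]
filtered = median_filter_2d(noisy)
print(filtered)
```

Unlike a mean filter, the median ignores isolated extreme values entirely, so both outliers are replaced by the surrounding value of 10 instead of being smeared into their neighbors.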
The right and left parotid glands, used as normal reference tissues, were contoured with semi-automatic drawing tools to measure ADC values in each volunteer in all datasets. The ADC values were then averaged to compute the intersubject variability of each parotid gland.

Simultaneous [18F]-FDG PET/MRI scans
Simultaneous head and neck PET/MRI was performed using a fully integrated whole-body PET/MRI scanner (3T Biograph mMR, Siemens Healthineers, Erlangen, Germany). As per routine practice, patients fasted for 4-6 h, and blood glucose (dextrostix) was measured before the examination. Before the examination, 158.10 ± 59.55 MBq of [18F]-FDG in a volume of approximately 3-5 mL was injected intravenously into an arm vein. Patients were then instructed to avoid strenuous movement during the 60 min before the scan, to minimize radiotracer uptake in muscles and other tissues, including uptake due to brain activation. During the examination, patients were positioned head-first supine, and static PET/MRI images were acquired simultaneously using the routine head and neck imaging protocol. Single-shot spin-echo echo-planar (EPI) diffusion MRI was acquired using a dedicated PET/MRI head/neck RF coil. A T1-weighted (T1w) turbo spin-echo Dixon MR sequence was used for attenuation correction (AC) by segmenting tissue into four classes (adipose tissue, soft tissue, lung (adaptive), and bone) plus air-filled spaces to synthesize the head and neck attenuation correction map (µ-map). PET images were corrected for normalization, dead time, attenuation, scatter (relative model-based scatter correction), and decay. Image acquisition and reconstruction parameters are provided in Table 2.
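To make the decay correction and SUV normalization concrete, a minimal sketch follows. It assumes body-weight SUV (SUVbw), a tissue density of 1 g/mL, and an [18F] physical half-life of 109.77 min; the scanner's exact implementation is not described in the text, and the uptake time, body weight, and tissue concentration in the example are hypothetical (only the injected activity matches the cohort mean reported above).

```python
import math

F18_HALF_LIFE_MIN = 109.77  # physical half-life of fluorine-18, minutes


def decay_corrected_activity(injected_mbq, elapsed_min, half_life_min=F18_HALF_LIFE_MIN):
    """Activity remaining after `elapsed_min` minutes of radioactive decay."""
    return injected_mbq * math.exp(-math.log(2) * elapsed_min / half_life_min)


def suv_bw(tissue_kbq_per_ml, injected_mbq, body_weight_kg, elapsed_min):
    """Body-weight SUV, assuming 1 mL of tissue weighs 1 g.

    SUV = tissue concentration / (decay-corrected dose / body weight).
    kBq/mL divided by MBq/kg (equivalent to kBq/g) gives a dimensionless ratio.
    """
    dose_mbq = decay_corrected_activity(injected_mbq, elapsed_min)
    return tissue_kbq_per_ml / (dose_mbq / body_weight_kg)


# Hypothetical: 158.1 MBq injected, 60 min uptake, 60 kg patient, 10 kBq/mL in a voxel.
print(round(decay_corrected_activity(158.1, 60), 1))
print(round(suv_bw(10.0, 158.1, 60.0, 60), 2))
```

The example shows why uptake time matters: after 60 min, roughly a third of the injected activity has decayed, so omitting decay correction would noticeably underestimate SUV.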

Correlation between ADC and SUV parameters in patients with HNC
We used histogram analysis because it has been reported to be an effective method for assessing tumor heterogeneity. First, the tumor mass was contoured by a nuclear medicine radiologist on the [18F]-FDG PET attenuation-corrected (AC) image. Then, the PET AC and ADC images and the tumor contours were imported into 3D Slicer for image post-processing, including median noise filtering (neighborhood size of 2 × 2 × 2) and, for ADC images, N4ITK MRI bias field correction. The tumor contour was converted into a binary mask volume. Next, all post-processed images and the binary mask volume were exported as NIfTI files and imported into MATLAB (R2022a) for image segmentation (Figure 2). Finally, the histogram of the segmented tumor mass was used to extract the following variables: for ADC, the mean (ADCmean), median (ADCmed), maximum (ADCmax), minimum (ADCmin), kurtosis (ADCkur), skewness (ADCske), and the 5th, 10th, 25th, 50th, 75th, 90th, and 95th percentiles (ADC5, ADC10, ADC25, ADC50, ADC75, ADC90, and ADC95); and, for SUV, the corresponding variables SUVmean, SUVmed, SUVmax, SUVmin, SUVkur, SUVske, and SUV5 to SUV95. Correlations among these extracted variables were then assessed. In addition, TLG was calculated for each patient by multiplying MTV by SUVmean.
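The histogram feature extraction can be sketched in Python as follows; the study itself used MATLAB, so this is an illustration rather than the authors' code. It assumes flat voxel and mask lists, linear-interpolation percentiles (MATLAB's `prctile` uses a different interpolation rule, so small-sample values may differ), and population-moment skewness and non-excess kurtosis, which is our understanding of the MATLAB `skewness`/`kurtosis` defaults. The function names and example values are hypothetical.

```python
import math
import statistics


def percentile(sorted_vals, p):
    """Linear-interpolation percentile (0-100) of an already sorted list."""
    k = (len(sorted_vals) - 1) * p / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def histogram_features(image, mask, pcts=(5, 10, 25, 50, 75, 90, 95)):
    """First-order histogram features of the voxels inside a binary mask."""
    vals = sorted(v for v, m in zip(image, mask) if m)
    n = len(vals)
    if n == 0:
        raise ValueError("mask contains no voxels")
    mean = sum(vals) / n
    # Population (biased) central moments.
    m2 = sum((v - mean) ** 2 for v in vals) / n
    m3 = sum((v - mean) ** 3 for v in vals) / n
    m4 = sum((v - mean) ** 4 for v in vals) / n
    if m2 == 0:
        raise ValueError("constant values: skewness and kurtosis are undefined")
    feats = {
        "mean": mean,
        "median": statistics.median(vals),
        "min": vals[0],
        "max": vals[-1],
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,  # non-excess: a normal distribution gives 3
    }
    for p in pcts:
        feats[f"p{p}"] = percentile(vals, p)
    return feats


# Hypothetical ADC voxels (x 10^-3 mm^2/s); the last voxel lies outside the mask.
feats = histogram_features([0.5, 0.8, 1.0, 1.2, 1.5, 9.9], [1, 1, 1, 1, 1, 0])
for name, value in feats.items():
    print(f"{name}: {value:.3f}")
```

Because minimum, maximum, and the extreme percentiles depend on the few voxels in the histogram tails, they are the features most sensitive to artifacts or nontumor voxels included in the mask.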

FIGURE 3
Workflow diagram of the methods for assessing variability in phantoms and in vivo, and the correlation between ADC and SUV in patients with head and neck cancer.

Statistical analyses
Descriptive statistics were calculated using Microsoft Excel. SUV values measured in the NEMA phantom were expressed as mean and SD to compute intersession variability (SD/mean). ADC values measured in the in-house diffusion MR phantom were likewise expressed as mean and SD to compute the intrasession CV, whereas the intersession CV of ADC was computed from the averaged ADC values and SD across sessions. For in vivo ADC variability, the average ADC values and SD of the right and left parotid glands across the two scanning sessions were used to compute intersubject variability. The normality of the patient data was tested with the Shapiro-Wilk test in SPSS (ver. 29). Pearson's correlation was used to test for linear relationships between pairs of variables, namely the average values of the extracted ADC and SUV variables and the standard TLG and MTV parameters, using the MATLAB function [R,P] = corrcoef(), which returns the Pearson correlation coefficient (r) and the corresponding P-value. P-values < 0.05 were considered statistically significant. Correlation coefficients near ±1, between ±0.50 and ±1, between ±0.30 and ±0.49, and below ±0.29 were considered to indicate perfect, strong, moderate, and low correlation, respectively.
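The correlation analysis and the strength categories above can be sketched as follows. This is an illustrative Python counterpart to MATLAB's `corrcoef`, not the study's code: it computes r and the t statistic with n − 2 degrees of freedom from which a two-sided P-value is obtained (for example with `scipy.stats.pearsonr`, not used here to keep the sketch dependency-free). The cutoff for "near ±1" (`perfect_tol`) and treating the stated bands as contiguous are assumptions, and the data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two samples of equal length >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)); compare with t(n - 2) for the P-value."""
    if abs(r) >= 1:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))


def correlation_strength(r, perfect_tol=0.01):
    """Label |r| with the bands used in the text (perfect_tol is an assumed cutoff)."""
    a = abs(r)
    if a >= 1 - perfect_tol:
        return "perfect"
    if a >= 0.5:
        return "strong"
    if a >= 0.30:
        return "moderate"
    return "low"


# Hypothetical paired values for five subjects.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)
print(f"r = {r:.2f}, t = {t_statistic(r, len(x)):.3f}, strength = {correlation_strength(r)}")
```

With only 11 patients, the t statistic has 9 degrees of freedom, so a fairly large |r| is needed to reach P < 0.05; this is consistent with moderate correlations in the study not reaching significance.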

SUV and ADC variability measurements in phantoms
The intersession SUV variabilities of the first to sixth spheres in the NEMA phantom were 5.8%, 8.7%, 13.9%, 5.5%, 9.7%, and 8.6%, respectively. Intersession variability was < 10% in all spheres except the third (13.9%). Figure 4(a) compares the intersession CV across all spheres.

ADC variability measurement in the in-house diffusion phantom
The intrasession ADC variabilities of the first to eighth tubes were 7.9%, 8.7%, 5.8%, 4.1%, 9.3%, 7.9%, 7.9%, and 8.0%, respectively, whereas the corresponding intersession variabilities were 10.0%, 10.4%, 8.4%, 9.0%, 5.6%, 4.0%, 7.0%, and 9.8%. Figure 4(b) compares the intra- and intersession variabilities of the tubes containing different sucrose concentrations and of the last tube, which contained ultrasound gel. Figure 4(c) shows that the intersubject variability of ADC in healthy volunteers was 9.7% for the right parotid gland and 9.8% for the left parotid gland; that is, intersubject variability was < 10% in both parotid glands.

Correlation between ADC and SUV parameters in patients with HNC
Shapiro-Wilk tests indicated that the average SUV and ADC variables of the 11 patients were normally distributed (P = 0.247 and P = 0.181, respectively). Table 3 shows the correlation coefficients (r-values) among the SUV and ADC variables. The r-value indicates the strength of the linear relationship between ADC and SUV variables: r > 0 indicates a positive relationship, r < 0 a negative relationship, and r = 0 no linear relationship. Asterisks indicate significant correlations (P < 0.05). MTV and TLG were negatively correlated with ADCmean, ADCmed, ADCmin, ADC5, ADC10, ADC25, and ADC75, but, given the small sample, these correlations were not significant. ADCmean and ADCmed had good, significant negative correlations with SUVmean, SUVmed, SUVmax, SUV25, SUV50, SUV75, SUV90, and SUV95. ADC5, ADC10, ADC25, and ADC50 had good, significant negative correlations with SUVmean, SUVmed, SUVmax, SUV25, SUV50, SUV75, SUV90, and SUV95. ADC75 had good, significant negative correlations with SUVmean, SUVmed, SUV25, and SUV50.
TABLE 3 Correlation coefficients (r-values) between SUV and ADC variables.
FIGURE 5 Inverse linear correlation observed between ADCmed and SUVmax. SUVmax tended to increase as ADCmed decreased in HNC.
No significant correlation was observed between ADC90, ADC95, ADCske, or ADCkur and any SUV variable. Figure 5 shows an example of a linear correlation between ADC and SUV variables (ADCmed and SUVmax): SUVmax tended to increase as ADC decreased in patients with HNC.

DISCUSSION
This study investigated the variability of quantitative SUV and ADC parameters obtained with a head and neck imaging protocol on simultaneous [18F]-FDG PET/MRI. The maximum SUV variability was 13.9%, in the third sphere, which might be caused by motion or attenuation correction problems in the NEMA phantom. Further investigation of this issue, using PET/CT for comparison, is recommended to confirm the reliability of the attenuation correction method in the phantom. The maximum intrasession variability of ADC was 9.3%, compared with a maximum intersession variability of 10.4%. Finally, in healthy volunteers, ADC variability did not differ between the right (9.7%) and left (9.8%) parotid glands, which served as reference normal tissue in the head and neck region.
The diagnostic accuracy of SUVmax as an independent parameter for tumor prognosis is controversial owing to concerns about its reliability. 29 It is therefore reasonable to seek to improve its diagnostic accuracy with multimodal imaging, in particular PET/MRI, which provides both SUV and ADC parameters in a single examination. ADC in particular has been widely reported as a strong predictor of tumor prognosis. [30][31][32] This study found that most ADC and SUV variables of HNC acquired using simultaneous [18F]-FDG PET/MRI had negative linear correlations; most SUV variables increased significantly as ADCmean, ADCmed, ADC5, ADC10, ADC25, and ADC50 decreased. SUVmax had a strong correlation with ADCmean, ADCmed, ADC5, ADC10, ADC25, and ADC50, and a moderate but nonsignificant correlation with ADCmin. These results are similar to those of Brandmaier et al., 33 who reported an inverse correlation between SUV and ADC in cervical cancer (r = −0.532, P = 0.05). However, we found no significant correlation between any SUV variable and ADCmin, ADCmax, ADC90, ADC95, ADCkur, or ADCske. This may be because artifacts or nontumor voxels typically lie in the tails of the histogram.
When the extracted ADC variables were compared with the standard FDG-based TLG and MTV parameters, MTV and TLG tended to increase as ADC values decreased, except for ADCmax. However, these trends were not statistically significant, likely owing to the small sample size. ADCmax increased with MTV, possibly reflecting tissue necrosis within the tumor.
Only a few published studies of the correlation between ADC and SUV in HNC are available for comparison. Varoquaux et al. (2013) found that SUV increased as ADC decreased, but the correlation was not significant. 34 They stated that ADC and SUV could be used as independent prognostic biomarkers in head and neck squamous cell carcinoma, although without clinical outcome assessment. However, this conclusion requires further investigation, because relying on SUV alone remains unreliable in current practice. Our study found the same trend, but we observed statistically significant correlations between most ADC and SUV statistical variables. This difference may be attributable to simultaneous image acquisition, which reduces confounding factors related to different scanners and scan intervals; in their study, images were obtained on different scanners (PET/CT and MRI) on different days.
Furthermore, the results of this study could help address the limitations of using quantitative SUV alone for tumor interpretation in routine clinical practice. Injection time, serum blood glucose, volume averaging due to motion, attenuation correction, and differences between scanners can all alter SUVmax values, leading to erroneous interpretation, especially when SUVmax is used to assess tumor response. The significant correlation between ADC and SUV in simultaneous PET/MRI could enable the development of a multimodal predictive model of tumor response.
This study has several limitations. First, we investigated a limited number of patients with HNC who underwent simultaneous PET/MRI; a larger sample size is recommended to improve statistical power in future studies. Second, this study measured ADC and SUV in patients with heterogeneous types of HNC, which may have affected the results; measuring ADC and SUV in a single type of HNC is suggested to improve the reliability of these parameters. Third, the collected image data and radiological reports lacked TNM classification information, so tumor stage could not be analyzed in this study. Lastly, we applied only the vendor-provided 2D distortion correction, median noise filtering, and bias field correction, without correcting for Gibbs ringing artifacts, motion artifacts, or eddy-current effects, which might have altered the ADC values. However, DWI-MRI data affected by patient motion were excluded from data collection, and ADC variability was acceptable even though some diffusion image corrections were not performed. Dedicated correction of Gibbs ringing, motion, and eddy-current artifacts would improve the reproducibility of ADC values in simultaneous PET/MRI in future studies.

CONCLUSION
This study showed the feasibility of using SUV and ADC parameters derived from simultaneous PET/MRI as quantitative parameters for assessing heterogeneity in HNC. The SUV and ADC parameters were reproducible, and their significant correlation could allow the development of a multimodal prediction model of tumor response in future clinical studies.

AUTHOR CONTRIBUTION
PW (research project leader) and MP contributed to the conception and design of the work, management, analysis, data interpretation, critical revision of the work for important intellectual content, agreement to be accountable for all aspects of the work, final approval of the version to be published, and manuscript preparation. MN, SS, and PP contributed equally to data collection, diffusion phantom development, image acquisition, image post-processing, preparation of documents for ethics submission, healthy volunteer recruitment, and data analysis. AJ contributed to image data acquisition and data collection and assisted with manuscript preparation. DS contributed to patient data collection and tumor contouring and assisted with manuscript preparation. PL contributed to diffusion phantom development and assisted with manuscript preparation (Supporting Information).