Raman Spectroscopy Reveals Abnormal Changes in the Urine Composition of Prostate Cancer: An Application of an Intelligent Diagnostic Model with a Deep Learning Algorithm

Early diagnosis of prostate cancer (PCa) is always a great challenge in clinical practice, especially in distinguishing benign prostatic hyperplasia (BPH) from early cancer, due to the high similarity in pathology from the prostate‐specific antigen (PSA) test and radiological detection. The conventional diagnostic methods are often less efficient in specificity and accuracy, leading to quite a few unnecessary biopsies. This work establishes a noninvasive diagnostic method for PCa by investigating urine samples using Raman spectroscopy and convolutional neural network (CNN) algorithm. The results of urine Raman spectra show the intensities of characteristic peaks for lipids, nucleic acids, and some amino acids are distinguishable between PCa and BPH, suggesting an abnormal metabolism caused by PCa, which can be detected by Raman spectroscopy. These data are then used to train an intelligent diagnostic model with CNN algorithm. The cross‐validation results show the mean diagnostic accuracy, sensitivity, and specificity for PCa are 74.95%, 77.32%, and 72.46%, respectively. This noninvasive diagnostic method is a promising method for the early diagnosis of PCa, and the idea of using urine Raman spectroscopy with deep learning techniques for diagnosing PCa provides a reference for the application of artificial intelligence in the field of clinical medicine research.


Introduction
Prostate cancer (PCa) is one of the most common malignancies of the male urinary system and is one of the leading causes of cancer death among elderly men worldwide. With an estimation of more than prostate biopsy. [3] Even with the improved rate of biopsy by the use of a nomogram, the detection rate is still less than 40%. [4] Therefore, there is an urgent need for new noninvasive, highly sensitive, and specific diagnostic tools for the early diagnosis of PCa, which may help to reduce unnecessary biopsies and avoid overdiagnosis and overtreatment.
In recent years, the development of liquid biopsy technology in the early detection of tumors has attracted wide attention. Before the use of liquid biopsy, traditional surgery biopsy was the gold standard for cancer diagnosis. With the application of advanced optical technology, such as third harmonic generation (THG) microscopy, surgical biopsy has improved in speed and accuracy. [5][6][7][8][9][10][11] However, surgical biopsy is invasive and causes pain for patients, so it is not suitable for early cancer screening during physical examination. Liquid biopsy is a noninvasive cancer diagnosis method that detects highly sensitive biomarkers, such as circulating tumor cells (CTCs) and exosome microRNA, that could be used in early cancer detection [12] and postoperative monitoring. [13,14] However, the current liquid biopsy technologies were less efficient due to the cell loss or other cell contamination in the enrichment process of CTCs and potential uncertainties in the subsequent immunoassay. Studies have shown that the detection of urine exosome microRNAs has a high accuracy and stable effect in the early diagnosis of PCa, [15] which is conducive to the correct evaluation of the necessity of prostate biopsy to reduce overdiagnosis. [16] Unfortunately, exosomal microRNA sequencing involves a complicated extraction process and reverse transcription-polymerase chain reaction (RT-PCR), which is an inefficient and potentially costly method.
Raman spectroscopy, which has been widely used in noninvasive tumor detection in recent years, could be used in liquid biopsy and is expected to be a new method for early cancer diagnosis. The Raman scattering effect proposed by C. V. Raman in 1928 is an inelastic scattering process of light. [17] Based on this theory, Raman spectroscopy technology was established to explore the overall molecular composition and relative content of substances by analyzing the spectrum formed by a series of wavelength offsets and scattered light intensities. However, the intensity of Raman scattering is usually weak. In practical applications, the surface-enhanced Raman scattering (SERS) technique is generally used to enhance the conventional Raman scattering signal. With the help of silver or gold nanoparticles as enhancers, the excited SERS spectrum could be strong enough for biochemical detection. [18] For biological samples with complex components, the detected SERS spectrum is usually the superposition of a series of molecular spectra. Therefore, the SERS spectra of proteins, sugars, lipids, nucleic acids, and other biochemical substances are varied due to their complex composition, which could be used to distinguish and identify the molecular fingerprints of samples. [19] In recent years, SERS has been widely used in the diagnosis of colorectal cancer, bladder cancer, breast cancer, brain cancer, oral squamous cell cancer, and other cancers. [20][21][22][23][24] Although the SERS spectrum shows information on the substance composition and relative content, it is not easy for researchers to identify and understand this information as an abstract data representation form. At present, statistical methods and machine learning algorithms are often adopted for analyzing SERS spectral data. Statistical methods are able to analyze the differences in some positions of SERS spectra of objects in multiple groups, indicating composition changes. 
Machine learning algorithms can learn the potential rules and characteristics of SERS spectral data from multiple groups for research objects, as well as predict and classify the unknown data.
Machine learning algorithms used for analyzing SERS spectral data include principal component analysis (PCA), back propagation neural networks (BPNNs), logistic regression (LR), linear discriminant analysis (LDA), and support vector machine (SVM). In recent years, as a branch of machine learning, the emerging deep learning technologies for computer vision and natural language processing have also gradually been applied to spectral analyses. Deep learning uses neural networks to learn useful feature representations directly from raw data. As the development of deep learning algorithms was inspired by the working principle of the biological nervous system, they have intelligence that is similar to the human brain and can achieve state-of-the-art accuracy in object classification. In particular, the convolutional neural network (CNN) can extract advanced characteristic information that is easily understood by the human brain layer by layer from complex and dense data. If applied to SERS spectrum analyses, theoretically, CNN could also extract the characteristic information for substance identification and classification from SERS spectra.
This research proposes a noninvasive method for prostate liquid biopsy based on urine Raman spectroscopy and deep learning algorithms, which may provide a reference for the application of artificial intelligence in the field of clinical medicine research.

Difference Analysis of Urine SERS Spectra in the BPH and PCa Groups
The mean Raman spectrum for each patient was calculated by averaging all the repeated measurements, and the mean Raman spectrum for each group (PCa and BPH) was calculated by averaging the mean Raman spectra of its patients. According to the data, the mean spectra of the PCa and BPH groups showed obvious characteristic peaks at Raman shifts of 496, 591, 640, 684, 725, 812, 850, 890, 959, 1014, 1094, 1134, 1204, 1247, 1333, 1357, 1401, 1467, 1549, 1622, and 1703 cm⁻¹ (Figure 1). There were visible differences in the intensities of the characteristic peaks of the BPH and PCa spectra at 725, 812, 959, 1014, 1094, 1134, 1247, 1333, 1357, and 1467 cm⁻¹. To analyze the significance of the differences between the two groups, t-test and z-test analyses were conducted for the mean intensity of each characteristic peak. The results showed that there was a significant difference in the intensity of the urine SERS spectra near 725 cm⁻¹ between the BPH and PCa groups (p < 0.05 for t-test or p < 0.01 for z-test), and there was also a slight difference near 959 cm⁻¹ (p < 0.1 for t-test or p < 0.05 for z-test; Table 1 and Figure 2).
The Raman spectrum is a kind of molecular fingerprint spectrum that can specifically characterize the components and relative content of substances in biological samples. In recent decades, a large number of studies in related fields have reported correspondences between Raman spectral characteristic peaks and biological substances. Drawing on the published data in these studies, the possible correspondences between urine Raman spectral characteristic peaks and biological substances were tentatively formulated (Table 2).
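The averaging and peak-wise comparison described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function names and the toy intensities are hypothetical, and the z-test is written as a normal-approximation test on the difference of two means using sample standard deviations.

```python
import math
from statistics import mean, stdev


def mean_spectrum(spectra):
    """Average equal-length spectra point by point."""
    return [mean(points) for points in zip(*spectra)]


def two_sample_z(a, b):
    """Two-sided z-test for a difference in means (normal approximation)."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    z = (mean(a) - mean(b)) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p


# Hypothetical repeated measurements for one patient (three spectral points).
repeats = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
patient_mean = mean_spectrum(repeats)

# Hypothetical per-patient mean intensities at a single characteristic peak.
bph_peak = [0.52, 0.48, 0.55, 0.50, 0.53]
pca_peak = [0.41, 0.45, 0.39, 0.44, 0.42]
z, p = two_sample_z(bph_peak, pca_peak)
print(f"z = {z:.2f}, two-sided p = {p:.2g}")
```

The same `mean_spectrum` helper can be applied a second time to the per-patient means to obtain a group mean spectrum. With few patients per group, a t-test is the more conservative of the two tests, which matches the pattern of p-values reported above.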

Differential Analysis for Lipid Metabolism-Related Characteristic Peaks
The characteristic peaks with significant differences in the intensities between the two groups were distributed at approximately 725, 959, 1094, 1333, and 1467 cm⁻¹ (p < 0.1, z-test). To further study the relationship between PCa and the change in lipid metabolism with the urine SERS spectrum, the peaks at 959 cm⁻¹ (cholesterol), 1094 cm⁻¹ (lipid), and 1467 cm⁻¹ (lipid) were analyzed (Figure 3A a-c). The results showed that the intensities of urine SERS spectra in the PCa group were lower than those in the BPH group at the three lipid-related peaks of 959, 1094, and 1467 cm⁻¹, indicating that the urine lipid content of PCa patients might be lower than that of BPH patients (p > 0.05 for t-test, p < 0.1 for z-test; Figure 3A d).
urine nucleic acid content in this article (Figure 3B a-c,e-g). The results showed that the intensities of urine SERS spectra in the PCa group were slightly higher than those in the BPH group at the six peaks of 890, 1014, 1204, 1247, 1333, and 1357 cm⁻¹ (Figure 3B d,h), indicating that the content of nucleic acids and nucleotide- or base-related substances in the urine of PCa patients might be higher than that in the urine of BPH patients (p > 0.05).

Differential Analysis for Amino Acid Metabolism-Related Characteristic Peaks
During the genesis and progression of cancer, the stress response triggered by the rapid proliferation of tumor cells could lead to abnormal amino acid metabolism in the human body, and the concentration of some small molecules (such as tyrosine) in these abnormal metabolites may exceed the normal level. [25] Therefore, the changes in specific amino acid concentrations in urine may be closely related to the genesis and progression of tumors. Studies have shown that urine tyrosine levels in patients with lung cancer [26] and bladder cancer [27] are higher than those in healthy people. Moreover, some studies have successfully developed a urine tyrosine detection kit and applied it to the clinical detection of early cancer. [28] Therefore, it could be inferred that the changes in urine tyrosine contents should be reflected in the urine spectra of PCa patients. By analyzing urine SERS spectra, the results showed that the intensities of tyrosine characteristic peaks at 640, 812, and 1204 cm⁻¹ (Table 2) in the PCa group were slightly higher than those in the BPH group (Figure 3C d), indicating that the urine tyrosine content in PCa patients might be higher than that in BPH patients (p > 0.05). Like tyrosine, phenylalanine and tryptophan have also been reported to be elevated in the urine of patients with various tumors. [26,29] We further extracted and analyzed the SERS spectral characteristic peaks corresponding to phenylalanine and tryptophan. According to the published data, 640 and 1204 cm⁻¹ may not only be characteristic peaks of tyrosine but also correspond to the vibration of phenylalanine molecules, while the peaks of 1204, 1333, 1357, 1549, and 1622 cm⁻¹ are characteristic of and correspond to tryptophan (Table 2).
By analyzing these characteristic peaks (Figure 3C a, B c,f,g, 3D a and b), the results showed that all the intensities of urine SERS spectra at these six peaks in the PCa group were slightly higher than those in the BPH group (Figure 3D c,d), indicating that the urine phenylalanine and tryptophan content in PCa patients might be slightly higher than that in BPH patients (p > 0.05). In addition, the spectral intensity of the characteristic peak at 1204 cm⁻¹ (corresponding to proline) was slightly higher in the PCa group than in the BPH group (p > 0.05), which was consistent with the increase in the proline concentration reported in the urine of lung cancer patients. [29]

Differential Analysis for Characteristic Peaks of Urine Erythrocytes Porphyrin and NADH
Clinical painless hematuria is one of the signs of PCa, and most patients with early, middle, and advanced PCa have hematuria symptoms. We analyzed the SERS spectral characteristic peaks related to hematuria and found that the characteristic peaks located at 1549 and 1622 cm⁻¹ were not only related to red blood cells but also corresponded to porphyrin and reduced nicotinamide adenine dinucleotide (NADH) (Table 2). Porphyrin is an important component of hemoglobin, and its concentration is directly related to erythrocyte content. Moreover, relevant studies have shown that porphyrin [30] and NADH [31] concentrations in the urine of patients with various tumors were higher than those of healthy people. Therefore, the content changes of erythrocytes, porphyrin, and NADH in urine could be studied by extracting and analyzing the characteristic peaks of 1549 and 1622 cm⁻¹ (Figure 3D a,b). The results showed that the intensities of urine SERS spectra at the two peaks in the PCa group were slightly higher than those in the BPH group (Figure 3D c,d), indicating that the content of erythrocytes, porphyrin, and NADH in the urine of PCa patients might be higher than that of BPH patients (p > 0.05).
In addition, it should be noted that the characteristic peak intensity of 725 cm⁻¹ in the urine spectrum of the PCa group was significantly lower than that of the BPH group (p < 0.05 for t-test, p < 0.01 for z-test; Figure 3E). This characteristic peak may correspond to hypoxanthine (Table 2), which is an intermediate product of purine nucleotide catabolism and is distributed in the liver, blood, and urine of animals. In recent years, studies have found hypoxanthine in human tears by Raman spectroscopy, and hypoxanthine is expected to be an indicator for disease screening. [32]

PCa Diagnosis Model Based on the CNN Algorithm
The CNN is a kind of deep neural network model commonly used in the field of deep learning and is mainly used for image recognition and analysis in computer vision research. CNN can process pixel information of an original image through multiple convolution layers to extract the advanced features that are easy to understand by the human brain. The CNN model used in this article was equipped with a structure generally similar to that of LeNet-5, [33] which has two convolution layers and two pooling layers for feature extraction and data reduction, and finally used the fully connected layers for classification ( Figure 4).
Different from the classic LeNet-5, the input layer, convolution layer, and pooling layer were all changed into a 1D linear structure to adapt to the input and processing of spectral data (Figure 5A). In detail, the dimension of the input layer was set as 1400 × 1 to receive the input of spectral data. In the first convolution layer, 50 kernels with a size of 1 × 12 were used to preliminarily extract the data features. A following max-pooling layer of 1 × 2 was designed to compress the data size to 700 × 50. Similarly, the second convolution layer with 100 kernels of 1 × 12 and the second max-pooling layer of 1 × 2 were assembled subsequently. In addition, a flatten layer was stacked behind the second max-pooling layer to reshape the output data into a single vector for connection to the next fully connected layer. The number of nodes in the fully connected layer and output layer was set as 1024 and 2, respectively, for classification (Figure 5B).
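The layer sizes above can be checked with a short shape calculation. The sketch below assumes stride-1 convolutions with "same" padding (which is what the stated 700 × 50 size after the first pooling layer implies), non-overlapping pooling, and one bias per kernel or node. None of these details are stated explicitly in the text, and the helper names are hypothetical.

```python
def conv1d_same(length, channels_in, kernels, kernel_size):
    """Shape and parameter count of a stride-1, 'same'-padded 1D convolution."""
    params = kernel_size * channels_in * kernels + kernels  # weights + biases
    return (length, kernels), params


def maxpool1d(length, channels, pool):
    """Shape after non-overlapping 1D max pooling."""
    return (length // pool, channels)


def dense(inputs, units):
    """Output width and parameter count of a fully connected layer."""
    return units, inputs * units + units


shape = (1400, 1)                                   # input spectrum
shape, p_conv1 = conv1d_same(*shape, kernels=50, kernel_size=12)
shape = maxpool1d(*shape, pool=2)                   # (700, 50), as in the text
after_pool1 = shape
shape, p_conv2 = conv1d_same(*shape, kernels=100, kernel_size=12)
shape = maxpool1d(*shape, pool=2)                   # (350, 100)
flat = shape[0] * shape[1]                          # flatten layer
hidden, p_fc = dense(flat, 1024)
outputs, p_out = dense(hidden, 2)

total_params = p_conv1 + p_conv2 + p_fc + p_out
print(after_pool1, shape, flat, total_params)
```

Under these assumptions almost all trainable parameters sit in the first fully connected layer, while the two convolution layers together contribute fewer than 61 000.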
In this article, fivefold cross-validation was used to train and evaluate the performance of the CNN model for the urine Raman spectrum classification of patients with BPH and PCa. Fivefold cross-validation is a commonly used method for dataset partitioning and model training, in which the dataset is randomly divided into five disjoint subsets of the same size. In every fold, four subsets are used to train the model, and the remaining subset is used to test the model. This procedure is repeated 5 times, with a different subset held out for testing each time. The loss function was cross-entropy, computed from the model output response O, the label value Y, and the total number of observations N. The optimization algorithm was Adam (derived from adaptive moment estimation), [34] a very commonly used optimizer in machine learning, in which θ denotes the model parameters, E(θ) is the loss function, β1 is the gradient decay factor, β2 is the squared gradient decay factor, ϵ is a constant value, α is the learning rate, and l is the iteration number. The values m_l and ν_l are the first-order and second-order moment estimations of the gradient, respectively, and are used to dynamically adjust each parameter so that it changes smoothly. As CNN has the ability of automatic feature extraction, the full-dimensional spectral data were fed in directly with no additional feature extraction algorithms. During the training process, to prevent the training from falling into a local optimal solution, the input dataset was divided into several batches with a size of 100, and the training stopped when the number of iterations exceeded 200.
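The three training ingredients named above (fold partitioning, cross-entropy loss, and the Adam update) can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the study's implementation: the update is written without bias correction, which is one common form consistent with the symbols defined above (the original Adam paper additionally applies bias correction to the moment estimates). The default values of α, β1, β2, and ϵ are widely used defaults, not values taken from this article, and all function names are hypothetical.

```python
import math
import random


def five_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and yield (train, test) index lists per fold."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint, near-equal subsets
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def cross_entropy(outputs, labels):
    """Mean cross-entropy: -(1/N) * sum_n sum_c Y[n][c] * ln(O[n][c])."""
    total = 0.0
    for o_row, y_row in zip(outputs, labels):
        for o, y in zip(o_row, y_row):
            if y > 0:
                total += y * math.log(o)
    return -total / len(outputs)


def adam_step(theta, grad, m, v, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (assumed form, no bias correction):
    m_l = beta1 * m_{l-1} + (1 - beta1) * g
    v_l = beta2 * v_{l-1} + (1 - beta2) * g**2
    theta_{l+1} = theta_l - alpha * m_l / (sqrt(v_l) + eps)
    """
    m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grad)]
    v = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(v, grad)]
    theta = [t - alpha * mi / (math.sqrt(vi) + eps)
             for t, mi, vi in zip(theta, m, v)]
    return theta, m, v


folds = list(five_fold_indices(84))  # 84 samples, as in this study
loss = cross_entropy([[0.5, 0.5]], [[1, 0]])
theta, m, v = adam_step([1.0], [2.0], [0.0], [0.0])
```

With 84 samples the five subsets cannot be exactly equal, so this split yields four test folds of 17 samples and one of 16.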
The results showed that the average accuracy of the CNN model by fivefold cross-validation reached the peak value of 74.95 ± 1.51% at the 169th iteration (Figure 6), and the average PCa diagnosis sensitivity, specificity, negative likelihood ratio, and positive likelihood ratio were 77.32 [...] (Figure 6C). Based on the experimental results, a noninvasive liquid biopsy method for PCa by Raman spectroscopy using a deep learning technique can be envisaged as follows: collecting urine from the subject, preparing the testing sample by mixing urine with a Raman surface enhancer (concentrated silver sol), measuring Raman spectra with standard parameters by Raman spectrometry, making baseline corrections and normalizations for spectral data, making predictions by feeding the data to the trained diagnostic model and calculating the probability of PCa, repeating the above procedure several times, and obtaining the average probability of PCa. According to the results calculated by the diagnosis model and other examination results, the clinician can reach a comprehensive diagnostic conclusion.

Discussion

Liquid Biopsy Technique for Early PCa
PCa is a common malignant tumor in the urinary system, mostly occurring in elderly men. At present, PSA is commonly used in clinical practice as a marker for early screening of PCa. [35] Generally, a serum PSA concentration below 4 ng mL⁻¹ is within the normal range, while the cancer risk is increased if it exceeds 10 ng mL⁻¹. However, some benign prostate diseases, such as acute prostatitis, BPH, and urinary tract infection, can also cause elevated PSA levels. [36] In particular, BPH is often similar to PCa in both PSA levels and medical images, so it is relatively difficult to distinguish, and it can only be diagnosed clearly through multiple biopsies, which increases the physical and mental pain of patients. In recent years, substantial studies have found that in the body fluids of some cancer patients, genetic material (such as ctDNA and exosome microRNA) that is released by tumors during development and progression can be detected, so these specific genetic materials can be used as markers not only for the detection and tracking of tumor progression but also for the early diagnosis of malignant tumors. [37] This method is highly specific because it relies on the support of genetic technology and only requires a small amount of liquid samples, such as peripheral blood, to complete the test without traditional biopsy, such as puncture or surgery; [38] therefore, it is called liquid biopsy technology. It was found in a recent study that the CTC test could be used as a marker to predict progression-free survival of metastatic PCa in antiandrogenic therapy. [39] In addition, according to relevant reports, patients in the gray area of PCa diagnosis (4 ng mL⁻¹ < PSA ≤ 10 ng mL⁻¹) had a significantly increased risk of PCa when their ctDNA content exceeded 180 ng mL⁻¹. [40]

Cancer Detection by Urine Sample
In addition to using blood for liquid biopsies, in recent years, a team of researchers has also developed liquid biopsy techniques using urine samples. A study found that by detecting changes in the relevant ctDNA in the urine of patients with nonsmall cell lung cancer (NSCLC), the tumor response to targeted therapy could be monitored. [41] In another relevant report, the content of miR-618 and miR-1255b-5p in the urine of bladder cancer patients was significantly increased compared with that in the control group (p < 0.05), suggesting that these two microRNAs could be used as specific genetic markers for the early diagnosis of bladder cancer. [42] In addition to the detection of genetic markers in urine that can be used in liquid biopsies, numerous studies have also been reported on the use of nongenetic markers in urine for tumor diagnosis. According to a relevant study, survivin in urine could be used to diagnose bladder cancer with high sensitivity and specificity. [43] In addition, urine metabolomics analysis has also been applied in the detection of renal cancer, and potential molecular markers have been identified. [44]

Application of Raman Spectroscopy in Cancer Detection
Similarly, the purpose of this article was to try to use urine as a test sample to explore the early diagnosis of PCa. In contrast to the detection of the tumor genetic marker ctDNA in urine, this article obtained comprehensive information on the chemical substances contained in urine by detecting the Raman spectrum of urine and made a diagnosis by using spectral information, providing a new technique for liquid biopsy of PCa. The Raman spectrum is a kind of molecular fingerprint spectrum that can specifically represent the molecular vibration information of the substance contained in the sample. The peaks on the spectrum are different due to the vibration modes of the molecular covalent bonds, and these peaks or combinations of peaks can be used as evidence to support the existence of some corresponding molecules to some extent. In addition, the superposition of characteristic peaks of all molecules with Raman scattering activity in the sample constitutes an important part of the Raman spectrum. Therefore, the Raman spectrum contains comprehensive information on the components in the sample, which may be a prominent advantage of this method compared with the conventional molecular chemical analysis method. According to a study, by detecting the Raman spectra of human plasma samples and using the PCA and LDA algorithms, nasopharyngeal carcinoma patients and healthy populations could be distinguished with a sensitivity and specificity of 90.7% and 100%, respectively. [45] Another study used the serum and prostatic fluid of PCa and BPH patients and analyzed their Raman spectra with the LDA algorithm to obtain a PCa serum Raman spectral diagnosis model with a sensitivity of 75%, specificity of 75%, and accuracy of 75% as well as a PCa prostatic fluid Raman spectral diagnosis model with a sensitivity of 60%, specificity of 76.5%, and accuracy of 68%. [46]

Urine Raman Spectroscopy Liquid Biopsy for Early PCa
Unlike most PCa Raman spectroscopy studies, the test samples used in this article were urine, a liquid biological sample more readily available than serum, prostatic fluid, and prostate tissue. As urine is generated by the filtration of blood by the kidney, it contains a variety of human metabolites and thus, to some extent, contains information about human health or disease status. [47] The genesis and progression of tumors often cause changes in human metabolism, such as amino acid metabolism and nucleic acid metabolism, [25,48,49] which may be reflected in changes in the urine composition. In addition, the urethra passes through the prostate in the male urinary system, so certain molecular markers produced by abnormal prostate tissue may be released into the urine, leading to changes in the urine composition. Therefore, theoretically, the changes in substances in the urine could be reflected in Raman spectra. This article measured and compared the differences in the Raman spectra between the urine of patients with BPH and PCa to explore a noninvasive method for the early diagnosis of PCa. Multiple teams have found potential tumor diagnosis markers using urine metabolomics research. [44,50,51] Similarly, this article also referred to the corresponding relationship between the Raman peak and substance based on published data, summarized the urine Raman characteristic peaks measured by experiment into lipids, nucleic acids, amino acids, and four other categories according to substance types (Table 2), and discussed the changes in PCa and BPH in urine metabolomics according to different metabolite types. Among the urine metabolites, the change in amino acid content caught our attention and interest. At present, many relevant studies focus on the relationship between changes in certain amino acid levels in urine and tumors. 
[26,29,52] Accordingly, this article also analyzed the differences in amino acid content in urine in the BPH and PCa groups by Raman spectroscopy. The results showed an elevation in urine tyrosine, phenylalanine, tryptophan, and proline contents in PCa patients, and this finding was consistent with the studies of urine tyrosine in multiple kinds of tumors. [25,26,29] This change in the urine amino acid concentration might be related to the selective uptake of amino acids by tumor cells. In addition to analyzing the changes in amino acid metabolites in PCa urine, the characteristic peak differences related to urine erythrocytes, porphyrins, NADH, and hypoxanthine were also found in Raman spectra in this article. The results showed a slightly rising trend for NADH and porphyrins in the urine of the PCa group, which was consistent with the findings in studies of other cancers. [30,31] In addition, this article speculated that the increased intensity of characteristic peaks of urine red blood cells and hypoxanthine in the Raman spectrum was also related to the canceration of prostate tissue, which may be a potential diagnostic marker for PCa.

PCa Diagnostic Method Based on the CNN Algorithm with Urine Raman Spectroscopy
Although Raman spectra can reveal certain molecular changes in urine for PCa, it is still difficult to detect the disease directly depending on the spectral data because the Raman spectral data are abstract, complex, and hard to parse. Another important reason might be that the profile shapes of the spectra are very similar and the intensity differences of characteristic peaks are too low to be clearly distinguished manually. Therefore, a proper algorithm is essential to analyze the spectral data and classify the PCa and BPH patients through the underlying data differences. To establish an intelligent PCa urine diagnosis model based on Raman spectral data, this article preliminarily explored the application of a deep learning algorithm in Raman spectral diagnosis. The algorithm used in this study was CNN, a deep learning algorithm often used for image recognition in computer vision. In image recognition applications, the CNN algorithm extracts and compresses the features of the original input image layer by layer through multiple convolutional layers and pooling layers, summarizes the advanced features that are easy for humans to understand, and then inputs the full connection layers for classification. The back-propagation algorithm based on gradient descent is often used in the training of CNN, but compared with BPNN, CNN greatly reduces the number of parameters by sharing the filter parameters in the convolution layer, thus making CNN faster than BPNN in dealing with complex problems and effectively inhibits the occurrence of overfitting. The spectral data obtained in this study were vectors (1D tensor) rather than matrices, but the adjacent elements contained in them had strong correlations, which were similar to the property of image data composed of pixel matrices. Therefore, it is feasible to use the CNN algorithm to study and model spectral data. 
To make CNN suitable for processing spectral data, this study modified the LeNet-5 model, trained and tested the model with spectral data, and obtained a test accuracy of 74.95 ± 1.51%, sensitivity of 77.32 ± 12.11%, and specificity of 72.46 ± 15.13%. Compared with traditional machine learning algorithms such as LR, LDA, and SVM, one of the most prominent advantages of the CNN method is probably that it does not need to specially design the feature extraction methods. With regard to traditional machine learning, there is no best feature extraction method or a one-size-fits-all method. Finding the right algorithm is partly based on trial and error, and perhaps even highly experienced data scientists cannot tell whether an algorithm will work without trying it. However, the feature extraction of the CNN convolutional layer is adaptive to data and is constantly adjusted by the optimization function to the appropriate direction in the iterative process. Therefore, researchers do not need to pay special attention to the problem of data feature extraction but only need to design the appropriate number of convolutional layers according to the dimension and characteristics of the data. In addition, CNN is a deep neural network stacked by multiple convolutional layers, which can extract deeper features hidden in the data, and as the number of layers increases, the extracted features become increasingly abstract and advanced. This may also be one of the reasons why the CNN has the ability to recognize and classify data like a human. Therefore, the CNN algorithm combined with urine Raman spectroscopy is expected to be a new method for PCa liquid biopsy.
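For readers less familiar with the diagnostic metrics quoted above, the following sketch shows how accuracy, sensitivity, specificity, and the two likelihood ratios follow from the confusion counts of one test fold. The counts are hypothetical and are not taken from this study. PCa is treated as the positive class, and the function name is an assumption.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Diagnostic metrics with PCa as the positive class.

    tp/fn: PCa samples classified as PCa/BPH
    tn/fp: BPH samples classified as BPH/PCa
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Undefined (division by zero) when specificity is exactly 1.
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
    }


# Hypothetical counts for a single test fold of 17 samples.
metrics = diagnostic_metrics(tp=6, fn=2, tn=7, fp=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a fivefold setting these quantities would be computed per fold and then averaged, which is why each reported mean carries its own standard deviation.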

Prospect of Urine Raman Spectroscopy Diagnosis for PCa
According to the latest version of the prostate cancer diagnosis and treatment guidelines, [53] the current standard procedures for the clinical diagnosis of PCa can be summarized as follows: PSA test, digital rectal examination (DRE), medical imaging examination [mainly prostate-enhanced magnetic resonance imaging (MRI)], prostate biopsy, bone scan, or positron emission tomography-computed tomography (PET-CT) scan. The PSA, DRE, and MRI examinations are conducted before the biopsy, and the bone/PET-CT scan is conducted after confirmation of PCa by biopsy to estimate the metastasis in the whole body.
As a noninvasive diagnostic method using only urine samples, urine Raman spectroscopy can be used prior to PSA testing for a preliminary diagnosis. The diagnostic algorithm model can calculate the probability of PCa based on urine Raman spectrum data. If the model indicates that the risk of PCa is high, the subject should be advised to undergo PSA testing. In fact, due to its advantages of being noninvasive, rapid, and inexpensive, the urine Raman spectroscopy diagnostic method is more suitable for preliminary PCa screening in the health examination of the general population. If a man is over 50 years old and/or has a family history of PCa, urine Raman spectroscopy should be used as part of a regular routine physical examination.
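The screening step described above, in which the model's PCa probability is averaged over repeated measurements and then used to decide whether to recommend PSA testing, can be sketched as follows. The 0.5 decision threshold and the function name are purely illustrative assumptions; the article does not specify a cutoff.

```python
from statistics import mean


def screen(probabilities, threshold=0.5):
    """Average per-measurement PCa probabilities and flag high-risk subjects.

    `threshold` is a hypothetical cutoff, not a value from the study.
    Returns (average probability, whether PSA testing would be advised).
    """
    average = mean(probabilities)
    return average, average >= threshold


# Hypothetical model outputs for three repeated measurements of one subject.
avg_prob, advise_psa = screen([0.8, 0.6, 0.7])
print(f"average PCa probability = {avg_prob:.2f}, advise PSA test: {advise_psa}")
```

In practice the cutoff would be chosen from the trade-off between sensitivity and specificity that is acceptable for a screening setting.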
This study proposes a noninvasive method for prostate liquid biopsy based on urine Raman spectroscopy and a deep learning algorithm, which could provide a reference for the application of artificial intelligence in the field of clinical medicine research.

Experimental Section
Subject Sample Data: A total of 84 human urine samples used in this study were provided by the Urology Experimental Center of Renji Hospital affiliated with Shanghai Jiaotong University Medical School and were collected from 45 BPH patients (with an age of 67.95 ± 6.37 years and PSA of 10.12 ± 3.47 ng mL⁻¹) and 39 early PCa patients (with an age of 68.51 ± 8.96 years and PSA of 11.00 ± 4.09 ng mL⁻¹). All the patients included in this study had no other tumor diseases or drug abuse, and there were no significant differences between the two groups for age and PSA level. Patients diagnosed with PCa after prostate biopsy underwent MRI and PET-CT examinations to assess the malignancy and clinical stage of the tumor. The results showed that the PCa group only included T1 and T2 stages (tumor node metastasis stage), and no lymph node invasion was found.
Written informed consent was obtained from the subjects or their next of kin prior to the study. This study was approved by the Institutional Ethics Committee of the Ren Ji Hospital affiliated to the Shanghai Jiao Tong University, School of Medicine.
Sample Preparation: Urine samples were collected from subjects who had fasted for 12 h, centrifuged at a low speed (1000 RPM) for 10 min, and immediately frozen in an ultralow temperature refrigerator at −80 °C after removal of sediment. During the experiment, the urine samples were first thawed at room temperature, and then the SERS procedure was carried out.
Raman Enhancement Substrate Preparation: A 2 mL solution of silver nitrate (AgNO₃, 0.1 mol L⁻¹, AP, Aladdin) was added to 198 mL of ultrapure water (18.3 MΩ cm, 25 °C), and the mixed solution was heated to boiling. Then, 3.6 mL of 1% trisodium citrate (Na₃C₆H₅O₇, AP, Aladdin) solution was added to the mixture and stirred evenly. The mixture was slowly heated and kept simmering gently until it turned gray-green. The heating was then stopped and the silver sol was cooled to room temperature. To prepare a highly concentrated silver sol, the prepared silver sol was centrifuged at a high speed (7500 RPM) for 10 min to separate the supernatant and obtain the viscous colloid in the lower layer. Then, a portion of the separated supernatant was used to dilute the viscous colloid to a constant volume of 1.0 mL, yielding a highly concentrated silver sol with a 200 times higher concentration.
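The "200 times" concentration factor can be checked with simple arithmetic. The short Python sketch below is only a rough estimate under stated assumptions: volumes are treated as additive, the 3.6 mL of citrate solution and any evaporation during boiling are ignored, and all silver is assumed to end up in the final 1.0 mL. The function name and its parameters are hypothetical.

```python
def silver_sol_summary(agno3_ml=2.0, agno3_molar=0.1, water_ml=198.0, final_ml=1.0):
    """Estimate silver amount, starting concentration, and enrichment factor.

    Assumptions (not stated in the source): additive volumes, citrate volume
    and evaporation ignored, complete recovery of silver after centrifugation.
    """
    moles_ag = agno3_molar * agno3_ml / 1000.0          # mol of Ag added
    start_volume_ml = agno3_ml + water_ml               # 2 mL + 198 mL
    start_molar = moles_ag / (start_volume_ml / 1000.0) # mol/L before heating
    enrichment = start_volume_ml / final_ml             # nominal concentration factor
    return moles_ag, start_molar, enrichment

moles_ag, start_molar, enrichment = silver_sol_summary()
print(f"Ag added: {moles_ag * 1e3:.2f} mmol")
print(f"Starting concentration: {start_molar * 1e3:.2f} mmol/L")
print(f"Nominal enrichment: {enrichment:.0f}x")
```

Under these assumptions, about 200 mL of sol reduced to 1.0 mL gives a nominal 200-fold enrichment, consistent with the stated factor. Including the citrate volume would change the estimate only slightly.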
SERS Spectrum Detection: Ten microliters of urine and 20 μL of highly concentrated silver sol were evenly mixed together (volume ratio 1:2). A Horiba HR Evolution LabRAM laser confocal micro-Raman spectrometer was used for the detection of Raman spectra. The experimental conditions and parameters were set as follows: the laser wavelength was 785 nm; the laser power was 3.3 mW; the wavenumber resolution was 1 cm⁻¹; the photon acquisition time was 10 s; and the scanning range was 400-1800 cm⁻¹. The instrument was calibrated before every measurement, and all the operation procedures followed the manufacturer's instrument instructions. The laser was focused on the surface of the sample through an L50× long focal lens, and Raman spectra of five to seven different positions were randomly collected for each sample to obtain as much data as possible and reduce the measurement error.
Data Processing and Analysis: All original SERS spectral data were obtained by using LabSpec software (Horiba, Japan). A total of 261 spectra were obtained from 45 BPH urine samples, and 240 spectra were obtained from 39 PCa urine samples. HyStudio Subase V2.16 was used to conduct baseline correction of the original data. The intensities of Raman characteristic peaks for the two groups were compared by t-test and z-test. The deep learning algorithm was implemented by using Keras with the TensorFlow back-end. All 261 BPH and 240 PCa Raman spectra were included in the dataset. The training and validation sets were selected by fivefold cross-validation, so that for every fold the dataset was randomly divided into validation and training sets at a ratio of 1:4. The dataset was divided at the patient level so that the repeated measurements for the same patient were kept in the same set.
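The patient-level fivefold split described above can be sketched as follows. This is a minimal standard-library Python illustration, not the authors' implementation. The function name, the patient identifiers, and the toy data are hypothetical, and the sketch does not stratify by diagnosis, which the source neither confirms nor excludes.

```python
import random
from collections import defaultdict

def patient_level_folds(patient_ids, n_folds=5, seed=0):
    """Yield (train_idx, val_idx) pairs in which no patient spans both sets.

    patient_ids[i] is the patient that spectrum i was measured from.
    """
    by_patient = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        by_patient[pid].append(idx)

    # Shuffle patients (not spectra) so all repeats of a patient stay together.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    for k in range(n_folds):
        held_out = set(patients[k::n_folds])  # roughly 1/5 of the patients
        val_idx = [i for p in patients if p in held_out for i in by_patient[p]]
        train_idx = [i for p in patients if p not in held_out for i in by_patient[p]]
        yield train_idx, val_idx

# Toy data: 10 hypothetical patients with five to seven spectra each.
rng = random.Random(1)
patient_ids = [f"P{p:02d}" for p in range(10) for _ in range(rng.randint(5, 7))]

folds = list(patient_level_folds(patient_ids))
for k, (train_idx, val_idx) in enumerate(folds):
    print(f"fold {k}: {len(train_idx)} training spectra, {len(val_idx)} validation spectra")
```

Because patients contribute different numbers of spectra, the spectrum-level ratio in each fold is only approximately 1:4 even when the patients themselves are split exactly one to four.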