Development of a portable Raman device with artificial intelligence method for the detection and staging of endometrial cancer

The success of a Raman spectroscopy device in cancer detection lies in its ability to acquire high‐quality Raman signals from samples and to employ efficient classification algorithms in analysing spectral data. Portable Raman systems equipped with artificial intelligence tools adapt well to clinical settings and can offer the accuracy needed for rapid community‐level screening. Here, we developed a robotic Raman device with a high‐efficiency Raman probe and validated it on endometrial tissues by discriminating high‐grade cancer, low‐grade cancer and normal classes. Conventional algorithms, namely principal component analysis‐discriminant analysis and support vector machine, were compared against a deep learning methodology, the convolutional neural network (CNN), with and without data augmentation. The system could classify high‐grade, low‐grade and normal tissues with F1‐scores of 91%, 94% and 97%, respectively. CNN with data augmentation proved to be the most dependable classifier, working well even in the presence of high background noise. Thus, we demonstrate a unique portable Raman device with AI tools for high‐sensitivity Raman analysis of endometrial cancer.

reach 28.4 million cases in 2040, a 47% increase from 2020 [1]. Hence, the need for timely, accurate diagnosis and advanced therapeutic options to address the cancer burden is increasingly relevant. Endometrial cancer is becoming the most common type of gynaecological cancer in emerging nations. Treatment usually consists of a hysterectomy with bilateral salpingo-oophorectomy. Depending on the subtype, patients may also have to undergo pelvic and para-aortic lymphadenectomy. Currently, intra-operative subtyping is achieved through macroscopic examination, touch imprint cytology and frozen section biopsy, which are usually time-consuming. Also, lymph node removal may lead to lymphedema, an irreversible chronic condition, at later stages of life. Hence, quick intra-operative subtyping of endometrial cancer is critical for surgeons to decide the extent of surgery, reduce its duration, and thus alleviate morbidity factors [2]. Consequently, a point-of-care biopsy device based on the inherent bio-molecular signatures of the tissue can be immensely helpful for clinicians in obtaining near real-time results.
Recent advances in Raman spectroscopic instrumentation make it a sensitive methodology for early detection and rapid (within minutes) staging of cancers [3][4][5]. In Raman spectroscopy, chemical bonds in a sample are exposed to electromagnetic radiation, and the vibrational response of these bonds is evaluated to draw information regarding the molecular constituents of the sample [6]. As a medical diagnostic methodology, Raman spectroscopy has specific advantages such as fast processing, no need for staining or labelling, and the non-destructive nature of the measurement [4][7][8][9]. However, the intrinsically weak nature of Raman scattering, noise from tissues, minimal spectral variations between diseased and healthy tissues, auto-fluorescence, the inability to perform wide-field analysis, and so forth pose significant limitations to the clinical translation of these systems [10]. In tissues in particular, there exists a heterogeneous multitude of molecules, and any subtle disease-related changes in the structural, functional, and biochemical patterns should be precisely recorded and evaluated to draw an inference about the exact pathological state of the tissue. Sophisticated AI-based spectral analysis tools can help unveil the subtle features and hidden dependencies within a Raman spectral database [11]. Probe-based analysis, as used in our portable set-up, allows quicker analysis in patients and precise positioning over targeted tissue lesions, and can be extended in the future to an in-situ robotic endoscopic biopsy device. Thus, a probe-based portable Raman device equipped with advanced AI spectral analysis tools has much better prospects as a point-of-care device than conventional Raman microscopes.
Several portable Raman systems have been reported earlier. Zuniga et al. compared NIR excitation wavelengths of 785 nm and 1064 nm and found that classification could be attained with an overall accuracy of 91% using a 1064 nm laser and 96% with 785 nm [12]. In breast cancer detection using Raman spectroscopy, Kothari et al. reported an accuracy of 93.2 to 94.6% [13], while Ma et al. recorded values of 86.5% with support vector machine (SVM), 88.5% with Fisher's discriminant analysis (FDA) and 92% with convolutional neural network (CNN) [5]. Comparing algorithms, Dingari et al. reported values of 75.3% with logistic regression, 69.5% with random forest, 80.8% with k-nearest neighbours, and 81.5% with SVM in breast biopsies [14]. A portable Raman-based "Marginbot" was developed by Thomas et al. to detect cancer margins and reported 93% sensitivity and 85% specificity in ex-vivo breast tissue samples [15]. Serzhantov et al. obtained a sensitivity of 88%, a specificity of 93%, and an F1-score of 90% in classifying skin cancers [16]. Depciuch et al. utilised conventional ML algorithms like principal component analysis and hierarchical cluster analysis to assess carcinogenesis in the endometrium and reported that Raman spectroscopy performs better than Fourier transform infrared spectroscopy [17]. These studies indicate the potential of Raman spectroscopy in combination with machine learning algorithms to detect cancer. However, though many of the algorithms were individually tested, a proper understanding and one-to-one comparison of their performance and clinical utility in real-time settings is lacking. This work also attempts to explain the concepts of AI/ML in Raman spectra-pathology for clinicians and medical researchers.
Here, we report the fabrication of a robotic, portable Raman spectroscopy-based point-of-care biopsy device, enabled with AI algorithms to identify and partly stage endometrial cancer within 15 minutes. We have implemented significant improvements in the design of the Raman probe, sample stage, and micro-manipulator for 360° tissue analysis with in-situ microscopic monitoring. The Raman data recorded from 52 endometrial tissues were examined using PCA-DA, SVM, and CNN with and without data augmentation. We found that the data-augmented CNN gave a dependable performance, with an F1-score of 96.8% in cancer screening and F1-scores of 91% and 93.8% in detecting Hg and Lg tissues (cancer subtyping).

| Design of portable Raman device
Raman spectra were recorded using a custom-developed setup, as schematically depicted in Figure 1A. The system used a 785 nm diode laser with a sub-zero thermoelectric cooling feature to ensure spectral stability. It further comprised a spectrograph-CCD unit (Anton Paar Cora 5001 Fibre) with a resolution of 6 to 9 cm⁻¹, a custom-developed multi-fibre Raman probe, a 360° rotatable sample bay, a 3-dimensional micro-manipulator probe holder to facilitate stable and vibration-free translocation of the probe from point to point, a microscope capable of up to 85× magnification, an optical camera for live viewing of samples, and a computer for precise control of the micro-manipulator, microscopic analysis of the sample and signal processing. For better signal quality from tissues without external noise, the whole analysis unit was encased in a stealth-black box.

| Design of the multi-fibre Raman probe
To compensate for the weak nature of the Raman phenomenon, photon collection and transfer have to be improved by increasing the number of collection fibres in such a way that collection is maximised and the collected photons are transferred to the spectrometer with minimal loss. As given in Figures A1 and A2, to assess the signal transfer through the spatial confinements of the spectrometer, 12 optical fibres were attached vertically and horizontally at the spectrometer entrance, separated from each other by a distance of 125 μm. The Raman signal was passed through each fibre one at a time to assess its measured intensity in the spectrometer. By analysing the region of fibres delivering good signal intensity, a high-sensitivity region in the spectrometer optical inlet path could be identified. Restricting the fibre bundle to this spatial confinement ensured maximised signal collection. Further, two probe configurations were compared for spectral collection efficiency: a bundle of seven step-index fibres of 200 μm core size and a bundle of 25 graded-index fibres of 50 μm core size. A schematic of the multi-fibre Raman probe is given in Figure 1B. The final probe used the filter sets LP02-785RE (Semrock Inc.) as the long-pass filter and LL01-785 (Semrock Inc.) as the narrowband filter, and was designed with the respective number of peripheral collection fibres arranged around the central collection fibre, with minimal spacing in between.

| Study samples
The patient samples were taken with the approval of the Institute Human Ethics Committee of Amrita Institute of Medical Sciences, Kochi, India. The tissues were collected after hysterectomy and were grouped into high-grade tumour (Hg), low-grade tumour (Lg), and normal adjacent to tumour (normal) types, which was confirmed later by correlating with the post-operative findings of the histopathology department. The sample set comprised 52 tissues: 18 Lg tumours, 8 Hg tumours and 26 adjacent normal tissues, all graded as per the FIGO grading system. In the Hg class, type 1 (five samples) and type 2 (three samples) were included, as both are clinically classified as Hg. Excised tissues were gently rinsed with saline to wash off any blood. The mean age of the patients involved in the study was 60 years, with a minimum of 33 years and a maximum of 80 years. The mean ages of Lg and Hg patients were 59 years and 62 years, respectively; the details can be found in Table 1.

| Raman spectral acquisition
Microscopic observation of the tissue can reveal useful features such as changes in colour or texture along the surface. For example, areas of myometrial invasion appear softer and granular compared to the normal myometrium, which appears firm. Such features can be picked up by general microscopic observation without any histopathological staining. The optical microscope (without any labelling of the tissue) was therefore used to inspect the tissue for such lesions, which, if found, were marked and given priority during the ensuing Raman analysis. For the analysis, the tissue samples were loaded onto the sample bay. With the help of the probe-holding robotic 3D micro-manipulator, the Raman probe was stationed above any suspected lesion, and spectra from the lesion and nearby regions were recorded. By combining the 3D movement of the probe with the 360° rotational movement of the sample holder, wider cross-sections of the tissue could be reached. To monitor the movements and positioning, a live optical camera was mounted on the probe holder.
Raman spectra were recorded at a laser power of 200 mW and an integration time of 15 seconds. Basic spectral processing steps, namely cosmic ray removal, baseline correction, smoothing, and normalisation, were performed before spectral analysis. The original spectral dataset included 513 spectra, of which 75% were used as the training dataset and 25% as the test dataset.
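The processing chain and the 75/25 split can be sketched as follows. This is only an illustration under stated assumptions: the moving-average smoother, the minimum-subtraction baseline step and the unit-maximum normalisation are stand-ins chosen for brevity (the specific methods are not given in the text), cosmic ray removal is omitted, and all function names are hypothetical.

```python
import random

def smooth(spectrum, window=5):
    """Moving-average smoothing (assumed method); edges use a shorter window."""
    half = window // 2
    out = []
    for i in range(len(spectrum)):
        lo, hi = max(0, i - half), min(len(spectrum), i + half + 1)
        out.append(sum(spectrum[lo:hi]) / (hi - lo))
    return out

def remove_offset(spectrum):
    """Crude baseline correction (assumed): subtract the minimum intensity."""
    floor = min(spectrum)
    return [v - floor for v in spectrum]

def normalise(spectrum):
    """Scale so that the maximum intensity is 1 (assumed normalisation)."""
    peak = max(spectrum)
    return [v / peak for v in spectrum] if peak else list(spectrum)

def preprocess(spectrum):
    return normalise(remove_offset(smooth(spectrum)))

def train_test_split(items, train_fraction=0.75, seed=0):
    """Shuffle reproducibly, then split into training and test subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# 513 spectra split 75/25 gives the 385 training spectra quoted in the text.
train, test = train_test_split(range(513))
print(len(train), len(test))  # 385 128
```

Rounding 0.75 × 513 = 384.75 to the nearest integer gives 385 training spectra, which agrees with the training-set size reported for the augmentation step.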

| Machine learning algorithms
AI/ML is a family of statistics-based methods that extract relevant features and patterns from a raw dataset by referring to past experiences or examples [16,18]. Pattern recognition can be attained by learning (a) from pre-defined, engineered features (conventional machine learning) or (b) automatically from data representations via multiple levels of feature abstraction (deep learning) [19]. Conventional machine learning requires considerably more user intervention, while deep learning is mostly automated [20,21]. Here, the conventional machine learning algorithms PCA-DA and SVM were compared with two instances of the deep learning methodology CNN, that is, with and without data augmentation.

| Principal component analysis-discriminant analysis
PCA is an unsupervised method generally used for dimensionality reduction, which is the process of finding correlations between the original features of the spectra and representing them in a lower-dimensional space. Dimensionality reduction lowers computational complexity and removes noise from the dataset. Discriminant analysis (DA) is a supervised method that works here on the outputs of PCA by creating a new variable, the discriminant function score, which maximises the separability between the classes involved. In this study, a type of DA called linear discriminant analysis (LDA) was used due to its better compatibility with multiple classes and because it operates on a linear combination of variables, which gives it some operational similarities with PCA.
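As an illustration of the dimensionality-reduction step, the sketch below extracts only the first principal component of a small toy dataset by power iteration and reports the fraction of variance it explains. It is not the implementation used in the study: the toy data and the function name are hypothetical, only PC1 is computed, and the subsequent LDA step on the PC scores is omitted.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (unit direction, explained-variance fraction) of PC1."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the mean-centred features.
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # arbitrary starting vector
    for _ in range(iterations):
        # Repeated multiplication by the covariance matrix converges
        # to its dominant eigenvector, i.e. the PC1 direction.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The Rayleigh quotient gives the variance captured along PC1.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    total_variance = sum(cov[i][i] for i in range(d))
    return v, eigenvalue / total_variance

# Hypothetical two-feature toy data lying almost on a straight line.
toy = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2]]
direction, explained = first_principal_component(toy)
print(f"PC1 explains {explained:.1%} of the variance")
```

When the features are strongly correlated, as neighbouring Raman intensities are, a single component can capture most of the variance, which is why a few PCs are usually sufficient as inputs to the discriminant step.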

| Support vector machine
SVM has become a first-choice classifier in many biomedical applications. SVM attempts to identify the optimal hyperplane that maximally separates the classes in the dataset. The optimal hyperplane is specified by the subset of observations that fall close enough to the decision surface to pose a classification challenge. This subset is termed the "support vectors". The processing is straightforward if the data are linearly separable. Otherwise, SVM employs mathematical functions called kernels to transform the dataset while maintaining its original architecture. In the training phase, SVM identifies important features of the spectra and assigns proportional weights, which are retrieved during classification tasks. Despite the success of SVMs in solving a wide range of pattern recognition problems, their main limitation is the immense time and memory requirement for high-cardinality data [22].
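To make the role of the kernel and the support vectors concrete, the following minimal sketch evaluates a radial basis function (RBF) kernel and the resulting decision value for a two-class case. The support vectors, coefficients and bias are hypothetical hand-picked numbers, not values from a trained model, and the three-class problem of this study would need several such binary decisions combined.

```python
import math

def rbf_kernel(x, y, gamma):
    """K(x, y) = exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def decision_value(x, support_vectors, dual_coefs, bias, gamma):
    """Weighted sum of kernel similarities to the support vectors plus a bias.

    The sign of the result indicates on which side of the decision
    surface the observation x falls.
    """
    return sum(coef * rbf_kernel(sv, x, gamma)
               for sv, coef in zip(support_vectors, dual_coefs)) + bias

# Hypothetical model: one support vector per class.
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
dual_coefs = [-1.0, 1.0]  # negative class first, positive class second
bias = 0.0

print(decision_value([0.9, 0.9], support_vectors, dual_coefs, bias, gamma=0.1) > 0)
print(decision_value([0.1, 0.1], support_vectors, dual_coefs, bias, gamma=0.1) > 0)
```

Only the support vectors enter the sum, which is why prediction cost and memory grow with the number of support vectors rather than with the raw feature count.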

| Convolutional neural network
CNNs are deep-learning-based pattern-recognition machines composed of numerous layers of artificial neurons, as represented in Figure 2. The functioning of a CNN is analogous to the working pattern of the visual cortex of the human brain [23,24]. The visual cortex is a network of neurons arranged in layers of subcortexes, and information is passed from one cortical area to the next, each more specialised than the previous one. This network of activated neurons combines to produce the sensation of vision. CNN emulates this functional pattern: artificial neurons are arranged in layers, and data are passed through these layers. The function of the convolution network is to filter in the critical features of the spectral dataset while reducing the spectral data into a form that is easier to process. The network has four convolution layers, each followed by a max-pooling layer, for extraction of relevant features. Finally, a fully connected layer is placed to consolidate all extracted features. The final 'softmax' layer gives, for each entry, a value corresponding to the probability of that entry falling into each class. In CNN training, an 'epoch' refers to one full cycle of passing the entire dataset through the network. Several epochs may be needed to fully train the network. Over these epochs, the network learns the critical properties specific to each class, stored as its parameters. At the end of each epoch, the system evaluates the accuracy it could attain and the associated error, termed 'loss'. The goal of the training process is to keep the loss value as low as possible.
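The softmax output and the loss can be illustrated with a few lines of code. The sketch assumes a cross-entropy loss, which is a common choice for softmax classifiers but is not stated in the text, and the raw layer outputs (logits) are hypothetical numbers.

```python
import math

CLASSES = ["Hg", "Lg", "normal"]

def softmax(logits):
    """Convert raw outputs of the last layer into class probabilities."""
    shift = max(logits)  # subtracting the maximum avoids overflow
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probabilities, true_index):
    """Loss for one spectrum (assumed loss function): -log p(true class)."""
    return -math.log(probabilities[true_index])

# Hypothetical logits for one spectrum; the third class scores highest.
probs = softmax([0.5, 1.0, 3.0])
predicted = CLASSES[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])

# The loss is small when the true class receives a high probability
# and large when it receives a low one.
print(cross_entropy(probs, 2) < cross_entropy(probs, 0))
```

Averaging this per-spectrum loss over the dataset at the end of each epoch would give loss curves of the kind discussed for training and validation.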

| CNN with data augmentation
For a medical diagnostic device, it is generally not possible to generate extensive datasets of each tissue type and each pathological condition, especially if a particular subtype is very rare. In our study, high-grade tumours were scarce compared to low-grade tumours. This scarcity of data can severely hamper the performance of any deep learning methodology. In such situations, 'data augmentation' becomes relevant; it has already been used to process medical data such as CT, skin melanoma, breast MRI, and histopathological images [25]. Data augmentation involves a set of techniques that enlarge the training data by adding slightly modified copies of existing data or synthetic data newly created from existing data. In theory, this method can impart two practically useful characteristics: (I) it improves the accuracy of the deep learning method, and (II) it helps the system cope better with the minor variations and transformations of input data that can be associated with recording in a clinical set-up. On the flip side, if not used carefully, the method may induce overfitting, a scenario where machine learning models become so customised to the training set that they fail to generalise when encountering an unseen dataset. Here, the spectral dataset was first split into training and test datasets. Data augmentation was applied only to the training dataset, which comprised 75% of the data, and the model was assessed against the remaining 25% as the test dataset. For augmentation, the spectra were shifted by 1 and 2 data points to both the right and the left, and noise was introduced at levels of 5%, 15%, and 25%, eventually boosting the number of training spectra from 385 to 3080.
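One reading of this scheme that reproduces the reported count is that each training spectrum yields eight spectra: the original, four shifted copies and three noisy copies (385 × 8 = 3080). The sketch below follows that reading. The edge-padding used for the shift and the Gaussian noise scaled to the peak intensity are assumptions, since the text does not specify either, and the function names are hypothetical.

```python
import random

SHIFTS = (-2, -1, 1, 2)            # data points; negative = left, positive = right
NOISE_LEVELS = (0.05, 0.15, 0.25)  # 5%, 15% and 25%

def shift_spectrum(spectrum, k):
    """Shift by k points, repeating the edge value to keep the length."""
    s = list(spectrum)
    n = len(s)
    if k > 0:
        return [s[0]] * k + s[:n - k]
    return s[-k:] + [s[-1]] * (-k)

def add_noise(spectrum, level, rng):
    """Add Gaussian noise with std = level * peak intensity (assumed model)."""
    sigma = level * max(spectrum)
    return [v + rng.gauss(0.0, sigma) for v in spectrum]

def augment(spectrum, rng):
    """Return the original spectrum plus its shifted and noisy copies."""
    copies = [list(spectrum)]
    copies += [shift_spectrum(spectrum, k) for k in SHIFTS]
    copies += [add_noise(spectrum, level, rng) for level in NOISE_LEVELS]
    return copies

rng = random.Random(0)
augmented = augment([0.0, 1.0, 4.0, 1.0, 0.0], rng)
print(len(augmented))        # 8 spectra per original spectrum
print(385 * len(augmented))  # 3080
```

Applying `augment` only to the training spectra, after the split, keeps augmented copies of a test spectrum out of the training set.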

| Performance metrics
The performance of all algorithms was evaluated in terms of their ability to discriminate between normal, Lg, and Hg. The performance metrics used in our case are accuracy, precision, recall, and F1-score.
Accuracy is the ratio of the correctly classified samples to the total number of samples under analysis:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

where TP is true positive, TN is true negative, FP is false positive and FN is false negative. Precision is the fraction of relevant instances among the retrieved ones:

$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$

Recall, also termed sensitivity, refers to the fraction of relevant instances that are successfully retrieved:

$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$

F1-score is the harmonic mean of precision and recall and can be denoted as

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$

F1-score combines precision and recall in such a manner that extreme values of either are penalised, so it is a better metric for imbalanced datasets [26].
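For a three-class problem, these metrics can be computed per class by treating each class in turn as "positive" and the other two as "negative" (one-vs-rest). The sketch below assumes that convention, which the text does not state explicitly, and uses hypothetical confusion-matrix counts that are not results from this study.

```python
def per_class_metrics(confusion, labels):
    """One-vs-rest metrics; confusion[i][j] counts true class i predicted as j."""
    total = sum(sum(row) for row in confusion)
    results = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                  # missed members of class k
        fp = sum(row[k] for row in confusion) - tp   # others predicted as k
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results[label] = {
            "accuracy": (tp + tn) / total,
            "precision": precision,
            "recall": recall,
            "f1": f1,
        }
    return results

# Hypothetical counts for a 128-spectrum test set (illustration only).
confusion = [
    [16, 3, 1],   # true Hg
    [2, 40, 2],   # true Lg
    [0, 2, 62],   # true normal
]
metrics = per_class_metrics(confusion, ["Hg", "Lg", "normal"])
for label, values in metrics.items():
    print(label, {name: round(v, 3) for name, v in values.items()})
```

In this toy example the minority Hg class reaches an accuracy of about 0.95 but an F1-score of only about 0.84, because the many true negatives inflate accuracy while F1 depends only on TP, FP and FN.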

| Device development
To overcome the challenges of clinical application, developing a high-efficiency probe was essential. This was attained by incorporating multiple collection fibres into the probe. The primary consideration was to optimise the feeding of light into the spectrometer using an arrangement of optical fibres as in Figure A1. The schematic arrangement of the device components and sample chamber is shown in Figure 1A, and the probe configuration is depicted in Figure 1B. In the experiments, it was observed that the spectrometer responded well to signals arriving within a confinement of 625 μm horizontally and 875 μm vertically about the central optical axis of the spectrometer input port (Figure 3A). Based on this, Figure 3B indicates the shape of the spectrometer light feed. Further, two fibre bundle configurations were tested to assess the signal collection efficiency. We compared 50 and 200 μm fibre bundles consisting of 25 and 7 fibres, respectively, filling the area identified. Figure 3C shows the cross-sectional view of the 50 and 200 μm core fibres used in these two probe configurations. The bundle of 7 step-index fibres of 200 μm core size was found to be much more efficient than the bundle of 25 graded-index fibres of 50 μm core size. As in Figure 3D, the peak intensity in the 1000 cm⁻¹ region of the reference sample, benzonitrile, was 3.3-fold higher for the 7-fibre bundle. The probe also contained focusing lenses to confine the collected signals into the collection fibres and double Rayleigh filters to cut noise levels down to a bare minimum.
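A back-of-the-envelope comparison of the total core area of the two bundles gives a sense of scale for this result. The calculation below is a purely geometric sketch: it ignores numerical aperture, index profile, packing and coupling losses, so it is not expected to reproduce the measured 3.3-fold gain, and the function name is hypothetical.

```python
import math

def total_core_area_um2(n_fibres, core_diameter_um):
    """Summed cross-sectional core area of a fibre bundle, in square micrometres."""
    radius = core_diameter_um / 2
    return n_fibres * math.pi * radius ** 2

area_7x200 = total_core_area_um2(7, 200)
area_25x50 = total_core_area_um2(25, 50)
ratio = area_7x200 / area_25x50
print(round(ratio, 2))  # 4.48
```

The geometric ratio of about 4.5 is of the same order as the measured 3.3-fold intensity gain, which suggests that the larger total collection area is a plausible contributor to the higher efficiency of the 7-fibre bundle.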

| Raman spectral features of endometrial cancer
The 3D plots of a few spectra from each class and the average spectra from the dataset are given in Figure 4A, Figure 4B, Figure 4C, and Figure 4D, respectively. For a visual understanding, the spectral feature corresponding to each tissue class was represented by the mean value of the data points acquired within the region between 600 and 1800 cm⁻¹, as in Figure 5. Among the numerous subtle featural variations, the conspicuous ones are 720 cm⁻¹ corresponding to nucleic acids, 854 and 936 cm⁻¹ of collagen/proline, 1004 cm⁻¹ of phenylalanine, 1096 and 1327 cm⁻¹ of DNA, amide III at 1247 cm⁻¹, collagen/phospholipid bands at 1450 cm⁻¹, amide II at 1554 cm⁻¹ and amide I (protein/lipid) at 1654 cm⁻¹ [27,28]. The list of relevant peaks is in Table 2.

3.3 | Analysis using conventional machine-learning algorithms

| Principal component analysis-discriminant analysis
PCA transforms the spectral dataset into linearly uncorrelated variables called principal components, which retain most of the information within the original dataset. Principal components (PCs) represent proportions of the variance of the original dataset in decreasing order [29]. The first principal component, PC1, accounted for 91.79% of the variance, while PC2 and PC3 accounted for 2.21% and 2.0%, respectively; the spectral loadings can be observed in Figure 6. The scatter plot of the classification, shown in Figure A3, indicates that the Hg and Lg classes are clustered relatively close together while the normal class is clustered in a different quadrant. The Hg-Lg overlap indicates a classification struggle: the system effectively distinguishes between normal and tumour, but with limitations in subtyping. As in Table 3, the system could identify each class with F1-score values of 71.6 ± 11%, 85.6 ± 4%, and 92 ± 2% for Hg, Lg, and normal, respectively, while the accuracies were 91.52 ± 2.5%, 89.86 ± 2.73%, and 93.12 ± 2.3%, respectively. Hg is classified with the lowest precision-recall values, around 70%, while the corresponding values for normal were much better. The system has a dependable classification ability between normal and tumour classes, but subtyping posed a challenge.

| Support vector machine
SVM has many adjustable parameters, broadly termed hyperparameters, that must be tuned to obtain good classification performance. The major ones are the penalty parameter (C), which determines the penalty for misclassification, and the scale factor (γ), which decides the extent of influence of each observation on the classification. SVM also offers different kernels whose parameters (like the degree (D) of the polynomial kernel) must be optimised to deal with the dimensionality of the data to be analysed. The optimal values of these parameters were found from a pool of candidate values. The C values were 0.1, 1, 10, 100 and 1000; the gamma values were 1, 0.1, 0.01, 0.001 and 0.0001; and the polynomial degrees ranged from 0 to 9. The best performance of SVM was obtained with the RBF kernel with C = 100 and gamma = 0.1. As in Table 4, the resultant F1-scores were 89 ± 4.4%, 92 ± 2.0% and 95 ± 2.0% for Hg, Lg, and normal, respectively. The accuracies were 97.12 ± 1.33%, 94.24 ± 2.0% and 95.84 ± 2.2%, respectively. The F1-scores and accuracies were better than those of PCA-DA, especially for the Hg class. SVM classified Hg with an F1-score of 89 ± 4.4%, but it picked out the Lg and normal classes with better efficiency. The receiver operating characteristic (ROC) curve is shown in Figure A4; the area under the curve (AUC) is 0.997, which indicates good classification ability.
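The search over the pool of candidate values can be pictured as a plain grid search. In the sketch below, the C list is read as 0.1, 1, 10, 100 and 1000, the set of kernels (RBF and polynomial) is an assumption, and `toy_score` is a hypothetical placeholder for a real validation score such as a cross-validated F1-score; it is constructed only so that the example runs and is not derived from the study's data.

```python
import math
from itertools import product

C_VALUES = [0.1, 1, 10, 100, 1000]
GAMMA_VALUES = [1, 0.1, 0.01, 0.001, 0.0001]
DEGREES = range(0, 10)  # polynomial degrees 0 to 9

def candidate_settings():
    """Yield every hyperparameter combination in the assumed search pool."""
    for c, gamma in product(C_VALUES, GAMMA_VALUES):
        yield {"kernel": "rbf", "C": c, "gamma": gamma}
    for c, gamma, degree in product(C_VALUES, GAMMA_VALUES, DEGREES):
        yield {"kernel": "poly", "C": c, "gamma": gamma, "degree": degree}

def grid_search(score):
    """Return the setting for which the supplied score function is highest."""
    return max(candidate_settings(), key=score)

def toy_score(setting):
    """Placeholder score peaking at the reported optimum (illustration only)."""
    return (-abs(math.log10(setting["C"]) - 2)
            - abs(math.log10(setting["gamma"]) + 1)
            - (setting["kernel"] != "rbf"))

best = grid_search(toy_score)
print(sum(1 for _ in candidate_settings()), best)
```

In practice the score function would train an SVM with each setting on the training folds and evaluate it on held-out folds, so that the test set is never used for tuning.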

| Convolutional neural network
The accuracy characteristics over 100 training epochs are plotted in Figure 7A. The validation accuracy started below 50% and gradually climbed above 90%. It then stays consistently around 95% through the epochs, indicating that the learning process has stagnated beyond that point. Ideally, the validation and training accuracies should converge towards a point closer to 100%, indicating constant improvement in the learning process, efficient learning and the ability to generalise learned features. The loss, shown in Figure 7B, is on the other hand the summation of the errors and is not represented as a percentage. Ideally, both the training and validation losses start at a higher value, decrease during training and gradually converge into each other near a value of 0. As seen in Figure 7B, the validation loss remains considerable during the training process.
A dataset consisting of the original 513 spectra was processed using CNN. As can be inferred from Table 5, the prediction performance of the CNN is better for normal than for Lg and Hg. The classification ability of CNN without augmentation is inferior to that of SVM and almost on par with that of PCA-DA. The accuracy values for Hg, Lg, and normal were 91.31 ± 0.64%, 90.07 ± 1.27% and 94.10 ± 0.69%, respectively, while the F1-scores were 70 ± 3%, 85 ± 1% and 94%. The bigger the training dataset, the better a deep learning method learns. A dataset of 513 observations is suboptimal for a deep learning methodology, so the network fails to pick up the critical features and to generalise the learned features. This leads to diminished classification performance compared with conventional ML algorithms.

| CNN with data augmentation
The training dataset was boosted from 385 to 3080 spectra using the data augmentation methodology. The training run was extended to 250 epochs to analyse the process. The accuracy curves are plotted in Figure 7C. The validation and training accuracy plots converge through the epochs and remain high throughout the process. The loss curves, plotted in Figure 7D, converge into each other at a low loss value, overall indicating a more efficient training process. It is observed from Table 6 that the system has improved classification performance when operating with the augmented dataset. The Hg class was classified with an F1-score of 91 ± 0.01%, with comparatively higher recall and precision values of 91 ± 0.02% and 91.02%, respectively. The accuracies for Hg and Lg, though, are lower than those of SVM, at 91.02% and 93.3 ± 0.151%, respectively. However, normal is classified with an improved accuracy of 96.9 ± 0.01%. The F1-scores of Lg and normal were also comparatively higher, at 93.8% and 96.8%, respectively. Overall, CNN with augmentation was ahead of all others in terms of F1-scores, but SVM surpassed it in accuracy for the Hg and Lg classes. The recall/precision values were also better than the others, with Hg, Lg and normal at 91 ± 0.02%/91.02%, 94.02%/93.2 ± 0.01% and 96.2 ± 0.01%/97 ± 0.01%, respectively.
CNN is a complex network involving numerous hidden layers containing millions of parameters; the larger dataset is demonstrated to have optimised these parameters enabling better classification abilities. The induced subtle changes in original spectra must have acted to improve the variance of features.

| COMPARISON OF METHODOLOGIES
As per Equation 1, 'accuracy' is the ratio of correctly classified samples to the total number of samples; it gives an overall picture of the performance of the methodology but is prone to be affected by class imbalance in the supplied data. The F1-score, as per Equation 4, is more robust against imbalanced datasets. Accuracy serves better when the priority of the analysis is on true positives and true negatives. The F1-score is affected in proportion to the false positives and false negatives and is hence a more dependable classification parameter, especially for a medical diagnostic device. The violin plots show the kernel density estimates of the probability distribution of the models. The accuracy obtained for each class by the various methodologies is plotted in Figure 8A. The accuracy value is generally highest for normal, followed by Hg and Lg. The Hg class was pooled from two types of endometrial cancer, type 1 and type 2, which could have increased the within-class variability and thereby affected predictor performance. The availability of a larger dataset to train from also helped the accuracy values of the normal class. SVM gives the best accuracy values, followed by CNN with augmentation. As evident from both the accuracy and F1-score plots, SVM with RBF kernel classifies better than PCA-DA and CNN without data augmentation. It even closely matches the performance of CNN with augmentation, owing to the strength of the methodology. Though it attains better classification precision, SVM with a non-linear kernel like RBF is prone to inferior generalisation, leaving a higher chance of misclassification under practical conditions. As can be observed from Figure 8B, the F1-score distribution shows that CNN with data augmentation performs better than the rest, with values for all classes of 91% or above and with the probability distribution falling into a tighter grouping.
From the above results, it can be deduced that CNN with optimally augmented data is the best-performing and most dependable classification methodology.

| PERFORMANCE OF ALGORITHMS AGAINST NOISE BACKGROUNDS
Since the system is developed as a point-of-care clinical diagnostic device, it has to be evaluated against practical anomalies in the input spectra that are associated with a real-time clinical setup, such as noisy spectra. As a representative case of such variations, a set of spectra was generated that the algorithms had not encountered previously. An average spectrum was derived from the data of each class, such as average normal and average Lg. This ensures that the generated spectra reflect all class-specific parameters but are previously unencountered by the system, avoiding any bias. Afterwards, noise was added to the generated normal spectrum at levels of 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80% and 90%. Further, the average Lg spectrum was set to act as a completely non-normal one. Thus, this evaluation set consists of 11 spectra: 10 normal spectra (one noise-free and nine with added noise) and one non-normal spectrum.
FIGURE 9 The classification results of normal, normal with added noise and non-normal endometrial tissues by CNN model with and without data augmentation
PCA-DA, SVM, CNN, and CNN with data augmentation were made to predict the class of these 11 spectra (n = 5). The results are depicted in Figure 9. All systems are capable of predicting the noise-free normal and the non-normal spectra, but noise creates performance variations between the models. PCA-DA gets confused right from 10% of added noise, erring at 30%, 60%, and 70% while predicting correctly at 20%, 40%, 50%, 80% and 90%, and eventually identifies the non-normal spectrum correctly. SVM predicts correctly up to 30% of added noise and falters afterwards, but predicts the non-normal spectrum correctly. CNN without augmentation identified normal correctly up to 50% of noise but erred afterwards, only to predict the non-normal spectrum correctly at the end. CNN with data augmentation, however, could correctly predict the normal spectrum, the normal spectra with added noise and the non-normal spectrum in a majority of trials. This can be attributed to the ability of CNN (with augmentation) to identify even the minute features and the intricate connections and patterns behind a classification, so that these could still be recognised in the presence of noise.
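The construction of the 11-spectrum evaluation set described above can be sketched as follows. The noise model (uniform noise bounded by the stated fraction of the peak intensity) is an assumption, as the text does not define how the noise percentage is applied, and the function names and toy spectra are hypothetical.

```python
import random

NOISE_LEVELS = [i / 10 for i in range(1, 10)]  # 10%, 20%, ..., 90%

def mean_spectrum(spectra):
    """Point-by-point average of a list of equally long spectra."""
    n = len(spectra)
    return [sum(column) / n for column in zip(*spectra)]

def with_noise(spectrum, level, rng):
    """Add uniform noise within ±level × peak intensity (assumed model)."""
    amplitude = level * max(spectrum)
    return [v + rng.uniform(-amplitude, amplitude) for v in spectrum]

def build_evaluation_set(normal_spectra, lg_spectra, seed=0):
    """Return (label, spectrum) pairs: 10 normal variants and 1 non-normal."""
    rng = random.Random(seed)
    average_normal = mean_spectrum(normal_spectra)
    cases = [("normal, 0% noise", average_normal)]
    cases += [(f"normal, {round(level * 100)}% noise",
               with_noise(average_normal, level, rng))
              for level in NOISE_LEVELS]
    cases.append(("non-normal (average Lg)", mean_spectrum(lg_spectra)))
    return cases

# Tiny hypothetical spectra, two data points each, for illustration only.
cases = build_evaluation_set([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]])
print(len(cases))  # 11
```

Each trained model would then be asked to classify all 11 spectra, and the exercise repeated with fresh noise for each of the five trials.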

| CONCLUSIONS
This manuscript showcases the design and validation of a portable Raman spectroscopy-based tissue analyser embedded with AI tools for assessing endometrial cancer tissues. The device is designed to enhance Raman signal recording with an advanced multi-fibre Raman probe and a customised recording environment that keeps external noise out and allows data to be recorded from even difficult-to-reach points of the tissue. In addition to the developments in hardware, we assessed the performance of various AI algorithms, including conventional machine learning and deep learning methodologies, to attain fine precision in detecting cancer. The study shows that prediction models based on the conventional machine learning algorithms PCA-DA and SVM can classify spectral data with appreciable efficiency. In terms of accuracy, SVM recorded the best values, classifying Hg, Lg and normal at 97%, 94% and 95%; CNN with data augmentation gave the second-best accuracy values of 91%, 93% and 97%, respectively. In terms of precision, CNN with augmentation produced the best values for Hg, Lg and normal, with mean values of 91%, 93%, and 97%, followed by SVM with 91%, 91%, and 95%, respectively. In terms of recall, CNN with augmentation produced the best values for Hg, Lg and normal, with 91%, 94% and 96%, followed by SVM with 88%, 92% and 96%, respectively. Since it combines recall and precision, the F1-score is a better classification parameter than accuracy, especially for diagnostic devices. CNN with augmentation provided the best F1-scores for Hg, Lg, and normal, with 91%, 93.8%, and 96.8%, followed by SVM with 89%, 92%, and 95.8%, respectively. Thus, across the majority of the chosen metrics, CNN with data augmentation was found to be the best-performing algorithm. We further analysed which methodology could correctly predict noise-ridden spectra, to emulate real-life clinical situations. CNN with augmentation performed best, correctly identifying classes even when the spectrum was laden with high levels of noise.
It can be concluded that SVM is a very efficient classifier when the dataset remains moderate in size. As the dataset grows, SVM needs considerably more processing time and system resources. For a clinical device, where the dataset has to be extensive, CNN trained on a sufficiently large dataset is therefore the better performer in terms of classification ability, dependability and speed of analysis. Studies also indicate that CNN performs better with no spectral pre-processing, which can save time during live analyses [30,31].
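This section does not restate how the data augmentation or the noise test was implemented. One common approach, assumed here purely for illustration, is to add zero-mean Gaussian noise to each recorded spectrum. In the sketch below, the helper names `add_gaussian_noise` and `augment`, the `noise_fraction` parameter, and the toy spectrum are all hypothetical.

```python
import random


def add_gaussian_noise(spectrum, noise_fraction, rng):
    """Return a noisy copy of `spectrum`.

    Assumption: the noise is zero-mean Gaussian with a standard deviation
    equal to `noise_fraction` times the peak intensity of the spectrum.
    """
    sigma = noise_fraction * max(spectrum)
    return [value + rng.gauss(0.0, sigma) for value in spectrum]


def augment(spectrum, copies, noise_fraction, seed=0):
    """Generate `copies` noisy variants of one spectrum (seeded, so reproducible)."""
    rng = random.Random(seed)
    return [add_gaussian_noise(spectrum, noise_fraction, rng) for _ in range(copies)]


# Hypothetical toy spectrum: a flat baseline with a single peak.
clean = [1.0] * 50
clean[25] = 10.0

augmented = augment(clean, copies=5, noise_fraction=0.05)
print(len(augmented), "noisy variants of a", len(clean), "point spectrum")
```

Raising `noise_fraction` gives progressively noisier test spectra, which is one way to probe how a trained classifier behaves under the high-noise conditions described above.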
Therefore, our study indicates that, by incorporating optimised hardware and AI-based classification algorithms, this portable device can be used for quick diagnosis and staging of endometrial cancers. Further, with F1-scores of 91% or higher across high-grade, low-grade and normal tissues, the device will be a promising tool for the rapid screening of patients at a community level. The shortcoming of the methodology is its inability to scan large surface areas in a single run, which can be overcome in the future by using Raman hyperspectral imaging methodologies. The device can further be extended into a minimally invasive endoscopic diagnostic device for in vivo cancer detection in cases such as urinary bladder cancer. The methodology also has the potential to be used for detecting cancer margins during excision surgeries.
APPENDIX
FIGURE A1 Arrangement of optical fibres used to detect the spectrometer signal acceptance pattern. The fibres were placed at a spacing of 125 μm. The same arrangement was turned 90° and profiled vertically too.
The arrangement consists of 12 optical fibres arranged in a row, each placed at a distance of 125 μm from the next. Pre-estimated photons were passed through each fibre, one at a time. Except for the transmitting fibre, the remaining 11 fibres were closed at the other end to avoid external noise. The spectroscopic response was evaluated for each fibre. Completing the 12 fibres in a row generated a full profile of the spectroscopic acceptance. The fibre panel was then turned 90° and placed vertically, and the same profiling was repeated. The whole process derived the region within which light, if fed in, will be registered by the spectrometer. The result of the profiling is given in Figure 3A.
FIGURE A2 H&E images of endometrial cancer and normal tissues. A, High-grade carcinoma with solid areas and high-grade nuclear atypia. B, Low-grade endometrial carcinoma at 10× magnification with glandular pattern. C, Normal endometrium, 4×.
FIGURE A3 Scatter plot of classifications obtained from PCA-DA.
FIGURE A4 ROC curve of SVM classification.