Rapid Identification of Species, Antimicrobial‐Resistance Genotypes and Phenotypes of Gram‐Positive Cocci Using Long Short‐Term Memory Raman Spectra Methods

Antimicrobial resistance is an aggravating public health problem worldwide, with more than 700 000 deaths attributable to infections caused by antibiotic‐resistant bacteria annually. To tackle this challenge, appropriate treatment regimens should be designed on the basis of the species identity of the bacterial pathogen concerned, as well as its antimicrobial‐resistance genotype and phenotype. Herein, a novel method is developed that uses artificial intelligence to analyze Raman spectra in order to identify microbes and their susceptibility to commonly used antibiotics at both the genotype and phenotype levels. A total of 130 strains of Enterococcus spp. and Staphylococcus capitis with known minimum inhibitory concentrations (MICs) of commonly used antimicrobial agents are included in this study. After the models are configured and trained, the resulting long short‐term memory (LSTM)‐based Raman platform achieves accuracies of 89.9 ± 1.1%, 82.4 ± 0.6%, and 60.4–89.2% in bacterial species classification, identification of antimicrobial‐resistance genes (ARGs), and prediction of resistance phenotypes, respectively. This method is more accurate than traditional machine learning algorithms. The results indicate that Raman spectroscopy combined with LSTM analysis can be used for rapid bacterial species identification, detection of ARGs, and assessment of drug‐resistance phenotypes.

To date, several acquired linezolid-resistance genes have been described, including cfr, cfr(B), cfr(D), optrA, and poxtA. [4][5][6][7] More importantly, the rate of isolation of linezolid-resistant Enterococci (LRE) and linezolid-resistant Staphylococci (LRS) in clinical settings has increased year by year. [8][9][10] Rapid, accurate, and preferably automated methods for identifying LRE and LRS are urgently needed to guide the choice of treatment regimen according to the bacterial species concerned and its drug susceptibility profile. Current diagnostic methods require culturing to detect and identify the species and to assess its antibiotic susceptibility. Because comprehensive diagnostic results take a day or two to obtain, empirical antimicrobial therapy is often preferred, but this approach is commonly associated with excessive and irrational use of antimicrobials in clinical settings, as well as selection of drug-resistant strains. [11] Genotypic analysis of the infecting agent without a tedious culturing process is rapid and highly sensitive. However, these techniques require well-trained and experienced personnel and still cannot provide sufficient information on the resistance phenotype of the organisms concerned.
In recent years, physicochemical methods for whole-organism fingerprinting have attracted significant attention in the medical field because of their potential for rapid disease diagnosis. [12][13][14] Raman spectroscopy is a noninvasive, fast, and sensitive optical technology based on inelastic light scattering. Because a Raman spectrum is an ensemble of molecular vibrations, it provides rich but complex information reflecting the unique biological characteristics of the cell and its structures. [15] However, because the spectra of different bacteria share highly similar features, they are difficult to distinguish by the naked eye. [16] In addition, Raman scattering is inefficient (scattering probability of ~10⁻⁶ to 10⁻⁸), [14] so these subtle spectral differences are easily masked by background noise. Machine learning and deep learning methods have been introduced for Raman spectral analysis and can overcome these shortcomings of Raman detection. [17][18][19][20][21][22][23] In this study, we aimed to investigate the potential of LSTM-based techniques for rapidly classifying bacterial Raman spectra according to the antibiotic susceptibility profiles of the organisms in the test sample.

Bacterial Isolates
A total of 130 Gram-positive cocci collected from 2011 to 2021 in a clinical microbiology laboratory were tested in this study, comprising fifty Staphylococcus capitis, twenty-seven Enterococcus faecium, and fifty-three Enterococcus faecalis strains. Thirty Enterococcus spp. and twenty-five S. capitis strains carried the transferable linezolid-resistance genes optrA and cfr, and the presence of these two antimicrobial-resistance genes was verified by PCR. The remaining seventy-five S. capitis, E. faecium, and E. faecalis strains did not carry any linezolid-resistance genes. All Gram-positive cocci were recovered on Columbia blood agar (Oxoid, Basingstoke, UK) after incubation for 24 ± 2 h at 35°C as previously described. [18] The species identity of the 130 isolates was confirmed by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI Biotyper; Bruker Daltonik GmbH, Bremen, Germany), followed by antimicrobial susceptibility testing and Raman spectroscopy analysis.

Antimicrobial Susceptibility Tests
The minimum inhibitory concentrations (MICs) of eight commonly used antimicrobial agents (benzylpenicillin, ampicillin, ciprofloxacin, levofloxacin, quinupristin, linezolid, tetracycline, and nitrofurantoin) against the E. faecium and E. faecalis strains were determined by the broth microdilution method. The antibiotic susceptibility of S. capitis strains to eight antibiotics (gentamicin, ciprofloxacin, levofloxacin, moxifloxacin, clindamycin, quinupristin, linezolid, and erythromycin) was evaluated using the same method. All the results were interpreted according to CLSI guidelines (www.clsi.org).
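Each MIC has to be mapped to an interpretive category before it can serve as a training label for the binary resistant/nonresistant classifiers used for phenotype prediction. A minimal sketch of that step follows. The breakpoint values and drug names are hypothetical placeholders, not CLSI values (the current CLSI tables must be used in practice), and counting intermediate results as nonresistant is an assumption, since the text does not say how they were handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Breakpoint:
    """MIC breakpoints in mg/L for one drug and organism group."""
    susceptible_max: float  # MIC <= this value -> susceptible (S)
    resistant_min: float    # MIC >= this value -> resistant (R)


# HYPOTHETICAL placeholder values for illustration only; use the current CLSI tables in practice.
PLACEHOLDER_BREAKPOINTS = {
    "drug_x": Breakpoint(susceptible_max=2.0, resistant_min=8.0),
    "drug_y": Breakpoint(susceptible_max=0.5, resistant_min=4.0),
}


def interpret_mic(mic, bp):
    """Return 'S', 'I' or 'R' for a broth-microdilution MIC."""
    if mic <= bp.susceptible_max:
        return "S"
    if mic >= bp.resistant_min:
        return "R"
    return "I"


def binary_label(mic, drug):
    """1 = resistant, 0 = nonresistant; treating 'I' as nonresistant is an assumption."""
    return 1 if interpret_mic(mic, PLACEHOLDER_BREAKPOINTS[drug]) == "R" else 0
```

With these placeholders, an MIC of 16 mg/L for drug_x is labeled resistant (1), whereas 4 mg/L falls in the intermediate band and is labeled nonresistant (0).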

Raman Microscopy
The experiments were performed using an inVia Reflex confocal Raman microscope (Renishaw; Wotton-under-Edge, UK). The excitation wavelength was 785 nm (from a near-infrared diode laser), spectra were recorded over 390.79-1552.14 cm⁻¹, and the laser power was ~150 mW. Wavenumber calibration was performed using a silicon wafer by setting the silicon peak to 520 cm⁻¹. A 50× microscope objective (Leica, Wetzlar, Germany) was used to focus the excitation light onto the sample. A diffraction grating with 1200 lines mm⁻¹ was used to maximize spectral resolution (<1 cm⁻¹), and the integration time was 1.0 s. The 130 Gram-positive strains were split into two groups: 104 isolates were used to obtain the original data set for deep learning model training, and the remaining 26 isolates were used to obtain an independent test data set for model evaluation. For each isolate, three "biological replicates" were analyzed; that is, samples were taken from three freshly prepared bacterial culture plates on different days. Under these measurement parameters, 120-135 spectra were collected from each isolate. Hence, the original data set consisted of 13 811 spectra in total, and the independent data set consisted of 3308 spectra.
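Because the 104 training isolates and the 26 test isolates are disjoint, every spectrum from a given strain, including all three biological replicates, falls on one side of the split. The sketch below shows one way to enforce this. It assumes hypothetical isolate IDs and a seeded random shuffle, because the text does not state how isolates were assigned to the two groups.

```python
import random


def split_isolates(isolate_ids, n_train, seed=0):
    """Randomly assign whole isolates to a training group and an independent test group."""
    ids = sorted(isolate_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    return set(ids[:n_train]), set(ids[n_train:])


def split_spectra(spectra, train_ids, test_ids):
    """Route each (isolate_id, spectrum) pair to the group that owns its isolate."""
    train = [s for iso, s in spectra if iso in train_ids]
    test = [s for iso, s in spectra if iso in test_ids]
    return train, test


if __name__ == "__main__":
    isolates = [f"iso{i:03d}" for i in range(130)]  # hypothetical IDs
    train_ids, test_ids = split_isolates(isolates, n_train=104)
    print(len(train_ids), len(test_ids))  # 104 26
```

Splitting by isolate rather than by spectrum prevents near-identical replicate spectra from the same strain from appearing in both training and test data, which would inflate the measured accuracy.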

Raman Spectral Preprocessing
Preprocessing of Raman spectra is crucial, because appropriate preprocessing can substantially improve the applicability of the deep learning model. The spectra were preprocessed in three steps, following previously described methods: [20] 1) background subtraction, 2) smoothing, and 3) normalization. Polynomial baseline fitting was performed to remove the fluorescence background, and the Savitzky-Golay filter was used for smoothing. The spectral data were then normalized by zero-mean (Z-score) normalization to reduce the effect of intensity variability caused by laser power fluctuation. R packages (R v3.6.2), including "prospect" and "baseline", were used throughout the process. Finally, spectra affected by cosmic rays were removed using the nearest neighbor algorithm.
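The steps above can be sketched in Python with NumPy. This is an analogue of the R workflow rather than a reproduction of it, and several details are assumptions. The baseline is an iterative (ModPoly-style) degree-5 polynomial fit. The Savitzky-Golay window is 11 points with a cubic polynomial. The cosmic-ray screen compares each spectrum with the point-wise median of its three nearest neighbours, which is only one hypothetical reading of "the nearest neighbor algorithm". The brute-force distance matrix is suitable for small batches only.

```python
import numpy as np


def polynomial_baseline(x, y, degree=5, n_iter=100):
    """Iterative polynomial baseline (ModPoly-style): after each fit, points lying above
    the curve are clipped to it, so the fit sinks beneath the Raman peaks."""
    work = np.asarray(y, dtype=float).copy()
    fit = work
    for _ in range(n_iter):
        fit = np.polynomial.Polynomial.fit(x, work, degree)(x)
        work = np.minimum(work, fit)
    return fit


def savitzky_golay(y, window=11, polyorder=3):
    """Least-squares polynomial fit in a sliding window, evaluated at the window centre.
    Edges are handled by repeating the end values."""
    half = window // 2
    offsets = np.arange(-half, half + 1)
    design = np.vander(offsets, polyorder + 1, increasing=True)  # columns 1, k, k^2, ...
    weights = np.linalg.pinv(design)[0]  # row that yields the fitted value at offset 0
    padded = np.pad(y, half, mode="edge")
    # np.convolve flips its kernel, so flip the weights back to get a correlation
    return np.convolve(padded, weights[::-1], mode="valid")


def z_score(y):
    """Zero-mean, unit-variance normalization of one spectrum."""
    return (y - y.mean()) / y.std()


def preprocess(x, y):
    """Baseline subtraction, then smoothing, then Z-score, in the order given in the text."""
    corrected = y - polynomial_baseline(x, y)
    return z_score(savitzky_golay(corrected))


def flag_cosmic_rays(spectra, k=3, threshold=10.0):
    """Flag spectra (rows) containing a narrow spike absent from their k nearest neighbours.
    The reference is the point-wise median of the neighbours, so a single spiky
    neighbour does not contaminate it."""
    sq = (spectra ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * spectra @ spectra.T
    np.fill_diagonal(dist, np.inf)  # exclude each spectrum from its own neighbours
    neighbours = np.argsort(dist, axis=1)[:, :k]
    reference = np.median(spectra[neighbours], axis=1)
    residual = spectra - reference
    robust_sd = np.median(np.abs(residual), axis=1) * 1.4826 + 1e-12
    score = np.abs(residual).max(axis=1) / robust_sd
    return score > threshold
```

A polynomial of degree at most `polyorder` passes through the Savitzky-Golay filter unchanged away from the edges. This is a quick sanity check that the coefficients are correct.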

Model Architecture and Training Details
The LSTM network used in this paper is shown in Figure 1a. It comprises an input layer, an LSTM layer, a fully connected layer, and an output layer. The LSTM layer and the first fully connected layer contained 50 and 100 units, respectively. Each LSTM unit has a memory cell regulated by three gates: the input gate i_t, the forget gate f_t, and the output gate o_t. Together, these gates determine how much of its previous memory the LSTM retains and how much current information it extracts. Because LSTMs are prone to overfitting, a dropout layer with a dropout probability of 0.5 was added after the LSTM layer. The LSTM unit structure is shown in Figure 1b.
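The forward pass of this architecture can be sketched in NumPy as follows. The authors used PyTorch, so this re-expression is illustrative only. It rests on several assumptions not stated in the text: each of the 1015 intensity values is fed to the LSTM as one time step with a single feature, the last hidden state is passed to the fully connected layer, tanh is used in that layer, and the weights are random placeholders rather than trained values. The gate order (input, forget, candidate, output) follows the convention of PyTorch's nn.LSTM.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class LSTMClassifierSketch:
    """Inference-only sketch: input -> LSTM(50) -> dropout -> FC(100) -> output."""

    def __init__(self, n_features=1, n_hidden=50, n_fc=100, n_classes=3, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(n_hidden)
        # Stacked gate weights, in the order: input, forget, candidate, output
        self.W = rng.uniform(-s, s, (4 * n_hidden, n_features))
        self.U = rng.uniform(-s, s, (4 * n_hidden, n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.W_fc = rng.uniform(-s, s, (n_fc, n_hidden))
        self.b_fc = np.zeros(n_fc)
        self.W_out = rng.uniform(-0.1, 0.1, (n_classes, n_fc))
        self.b_out = np.zeros(n_classes)
        self.n_hidden = n_hidden

    def lstm(self, seq):
        """Run the LSTM over seq with shape (T, n_features); return the last hidden state."""
        H = self.n_hidden
        h = np.zeros(H)
        c = np.zeros(H)
        for x_t in seq:
            z = self.W @ x_t + self.U @ h + self.b
            i_t = sigmoid(z[:H])           # input gate
            f_t = sigmoid(z[H:2 * H])      # forget gate
            g_t = np.tanh(z[2 * H:3 * H])  # candidate cell values
            o_t = sigmoid(z[3 * H:])       # output gate
            c = f_t * c + i_t * g_t
            h = o_t * np.tanh(c)
        return h

    def predict_proba(self, spectrum):
        """Class probabilities for one preprocessed spectrum (1-D array)."""
        h = self.lstm(spectrum.reshape(-1, 1))
        # Dropout (p = 0.5) is active only during training, so it is the identity here.
        a = np.tanh(self.W_fc @ h + self.b_fc)
        logits = self.W_out @ a + self.b_out
        e = np.exp(logits - logits.max())  # numerically stable softmax
        return e / e.sum()
```

For example, `LSTMClassifierSketch().predict_proba(np.random.default_rng(1).standard_normal(1015))` returns three probabilities that sum to 1, one per species. With untrained weights these probabilities carry no meaning.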
Stratified cluster sampling was used to partition the original data; briefly, the data were stratified by bacterial species and resistance gene (Figure 2). Each Raman spectrum, comprising 1015 data points, was used as the input. Tanh was used as the activation function for LSTM training. The cross-entropy loss function and the Adam optimizer were used to calculate the loss and to perform backpropagation-based optimization, respectively. The parameters of each layer were initialized randomly, and the learning rate and weight decay were set to 0.001 and 1e-4, respectively. During training, the model outputs were compared with the species labels to measure the error, and the parameters of the whole model were updated by backpropagation. Over multiple rounds of training, the error between the LSTM outputs and the true species labels was minimized. In each resampling round, the latest model with the minimal loss was taken as the optimal model, and its prediction accuracy on the test set was calculated. Finally, the LSTM taxonomic model with the highest accuracy was selected from the five optimal models.
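Stratified cluster sampling can be read as drawing whole clusters within each stratum. The sketch below assumes that clusters are isolates and strata are (species, resistance gene) combinations; the text does not specify either exactly. It draws a 75/25 split (the proportions given in Figure 2) without replacement, once per resampling round.

```python
import random
from collections import defaultdict


def stratified_cluster_split(spectrum_isolates, isolate_stratum, test_frac=0.25, seed=0):
    """Split spectrum indices into train/test by drawing whole isolates within each stratum.

    spectrum_isolates: isolate ID of every spectrum, in spectrum order.
    isolate_stratum: isolate ID -> stratum key, e.g. (species, resistance gene).
    """
    isolates_by_stratum = defaultdict(list)
    for iso in sorted(set(spectrum_isolates)):
        isolates_by_stratum[isolate_stratum[iso]].append(iso)
    rng = random.Random(seed)
    test_isolates = set()
    for members in isolates_by_stratum.values():
        rng.shuffle(members)
        n_test = round(test_frac * len(members))  # about 25% of this stratum's isolates
        test_isolates.update(members[:n_test])
    train_idx = [i for i, iso in enumerate(spectrum_isolates) if iso not in test_isolates]
    test_idx = [i for i, iso in enumerate(spectrum_isolates) if iso in test_isolates]
    return train_idx, test_idx


def resampling_rounds(spectrum_isolates, isolate_stratum, n_rounds=5, test_frac=0.25):
    """One independent stratified split per resampling round (five in the text)."""
    return [
        stratified_cluster_split(spectrum_isolates, isolate_stratum, test_frac, seed=r)
        for r in range(n_rounds)
    ]
```

Drawing whole isolates keeps replicate spectra of one strain together, and stratifying ensures that every species and gene combination is represented in both parts of each split.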
We then used the trained LSTM models to identify bacterial species and antimicrobial-resistance genes (ARGs), and to determine drug-resistance phenotypes, in an independent data set. The diagnostic performance of these models was assessed using confusion matrices, from which the accuracy, sensitivity, and specificity of the deep learning models were calculated. All procedures were implemented with the PyTorch deep learning framework in Python on an NVIDIA GeForce RTX 3070 Ti GPU.
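For a multi-class confusion matrix, sensitivity and specificity are usually computed one class versus the rest. The text does not state which convention was used, so the sketch below assumes the one-vs-rest definitions, with rows as true classes and columns as predicted classes.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes and columns are predicted classes, in the order of `labels`."""
    index = {label: k for k, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def summarize(matrix):
    """Overall accuracy plus one-vs-rest sensitivity and specificity for each class."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[k][k] for k in range(n)) / total
    per_class = []
    for k in range(n):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                        # class k predicted as something else
        fp = sum(matrix[r][k] for r in range(n)) - tp   # other classes predicted as k
        tn = total - tp - fn - fp
        per_class.append({
            "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
            "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        })
    return accuracy, per_class


if __name__ == "__main__":
    labels = ["E. faecium", "E. faecalis", "S. capitis"]
    truth = ["E. faecium", "E. faecium", "E. faecalis", "E. faecalis", "S. capitis"]
    preds = ["E. faecium", "E. faecalis", "E. faecalis", "E. faecalis", "S. capitis"]
    m = confusion_matrix(truth, preds, labels)
    acc, stats = summarize(m)
    print(m)    # [[1, 1, 0], [0, 2, 0], [0, 0, 1]]
    print(acc)  # 0.8
```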

Identification by ML Algorithms
Identification of bacterial species and of the transferable linezolid-resistance genes (cfr, optrA), as well as determination of the drug-resistance phenotypes of the test strains, was also performed using traditional machine learning algorithms: decision tree (DT), support vector machine (SVM), k-nearest neighbors (KNN), and logistic regression (LR). The robustness of each model was assessed by the bootstrap method (five replicates). The predictive power of these machine learning models was evaluated by their accuracy, sensitivity, and specificity on the test data set.
Figure 1. Construction of the LSTM taxonomic model. a) Schematic of the structure of the long short-term memory (LSTM) network, which contains an input layer, an LSTM layer, a fully connected layer, and an output layer. b) LSTM unit structure. After data input, the forget gate decides which information to discard: it examines the previous hidden state and the current input and outputs a value between 0 and 1 for each element of the cell state. The input gate decides which values will be updated and creates a vector of new candidate values to be added to the state; in this way, the LSTM decides what new information to store in the cell state. Finally, the LSTM unit determines the sequential output based on the current cell state, with the sigmoid and hyperbolic tangent activation functions determining which parts of the cell state are output.
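Of the four baselines, KNN is the simplest to state: a spectrum receives the majority label of its k closest training spectra. A minimal pure-Python sketch follows. It assumes Euclidean distance and k = 5, since the text gives neither. In practice, library implementations such as scikit-learn's KNeighborsClassifier are the usual choice.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=5):
    """Label one spectrum by majority vote among its k nearest training spectra (Euclidean)."""
    neighbours = sorted(
        ((math.dist(x, query), label) for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )[:k]
    # Counter keeps first-seen order, so a tied vote goes to the label met nearest first
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def knn_predict_many(train_X, train_y, queries, k=5):
    """Apply knn_predict to every query spectrum."""
    return [knn_predict(train_X, train_y, q, k) for q in queries]
```

Because KNN compares raw distances between full spectra, it is sensitive to the normalization step; this may partly explain why it is the weakest baseline reported later.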

Statistical Analysis
Equality of variances of the accuracies of the LSTM, DT, SVM, KNN, and LR models was tested using Levene's test. Student's t-test (for equal variances) or Welch's t-test (for unequal variances) was then used to test whether the differences in mean accuracy between the LSTM and machine learning (ML) models were statistically significant. P < 0.05 was considered statistically significant. Correction for multiple comparisons was performed using Tukey's method.
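The test statistics behind this procedure can be computed as shown below. This standard-library sketch returns statistics and degrees of freedom only. P-values come from the F and t distributions, for example via scipy.stats.levene and scipy.stats.ttest_ind, with equal_var set according to the Levene result. Note that the default center='median' of scipy.stats.levene gives the Brown-Forsythe variant, so center='mean' matches the classic Levene test. Tukey's correction is not shown.

```python
from statistics import mean, variance


def levene_w(*groups):
    """Classic (mean-centred) Levene W statistic; compare with F(k - 1, N - k)."""
    z = [[abs(x - mean(g)) for x in g] for g in groups]  # absolute deviations
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    z_bar = mean([v for zi in z for v in zi])
    between = sum(len(zi) * (mean(zi) - z_bar) ** 2 for zi in z)
    within = sum((v - mean(zi)) ** 2 for zi in z for v in zi)
    return (n_total - k) / (k - 1) * between / within


def student_t(a, b):
    """Two-sample t with pooled variance (equal variances assumed); df = n1 + n2 - 2."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5
    return t, n1 + n2 - 2


def welch_t(a, b):
    """Welch's t with Welch-Satterthwaite degrees of freedom (unequal variances)."""
    v1, v2 = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / (v1 + v2) ** 0.5
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(a) - 1) + v2 ** 2 / (len(b) - 1))
    return t, df
```

With equal group sizes, the two t statistics coincide and only the degrees of freedom differ. The choice between them therefore matters most when the compared models were evaluated on different numbers of resampling rounds.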

Preprocessing of Raman Spectra Data
Raman spectra contain useful information about a sample but are often accompanied by interfering signals such as background fluorescence, cosmic rays, and other random noise. Spectral preprocessing to remove noise and background fluorescence is therefore necessary to extract meaningful information. We removed the spectral background, reduced noise using a smoothing filter, and performed baseline correction and area normalization in R (see Figure S1, Supporting Information, for an example comparison between raw and corrected spectra). The corrected spectra are displayed in Figure 4. Molecules such as proteins, phospholipids, polysaccharides, and nucleic acids in different bacterial species give rise to specific fingerprint signatures in the Raman spectra. Tentative assignments of the main Raman bands relevant to this study, based on previous reports, [24][25][26][27][28][29][30][31] are provided in Table 1.

Deep Learning and Traditional Classifiers for Bacterial Classification Based on Raman Spectra
In total, we measured fifty S. capitis, twenty-seven E. faecium, and fifty-three E. faecalis strains with a confocal Raman microscope using short measurement times, and constructed data sets of 13 811 spectra after preprocessing the data as described above. The relative standard deviations (RSDs) ranged from 9.7% to 13.2%, which was considered acceptable. [32,33] We then trained the species identification network, which outputs a probability distribution over the three species (S. capitis, E. faecium, and E. faecalis); the species with the highest probability was taken as the prediction. We used the bootstrap method with five resampling rounds to evaluate the classification accuracy of the LSTM taxonomic model. Accuracy, error rate, and validation loss were monitored during training. Training was stopped at 50 epochs because the prediction accuracy did not increase significantly with more epochs. The latest model with the minimal loss was taken as the optimal model, and its prediction accuracy on the test set was calculated. Finally, the LSTM taxonomic model with the highest accuracy was selected from the five optimal models. We next used the trained LSTM taxonomic model to identify the species in the independent data set on the basis of their ramanomes; the model made a prediction for each ramanome and assigned it to a species category. As shown in Figure 5a, the identification accuracies for E. faecalis and S. capitis were 98.8% and 92.4%, respectively, and the identification accuracy for E. faecium was more than 80%. Overall, the LSTM taxonomic model identified the different microbial species with an average accuracy of 89.9 ± 1.1% (Figure 5b, Table S1, Supporting Information).
Figure 2. The process of resampling using the bootstrap method. The original Raman data were split into training and test sets by stratified cluster sampling. One set (25% of the data) was used to test the LSTM model, and the remaining sets (75% of the data) were used to train it. The robustness of the model was assessed by resampling 5 times using the bootstrap method. The best-performing model was selected as the optimal taxonomic model. The final LSTM taxonomic model was evaluated on the independent test data set.
www.advancedsciencenews.com www.advintellsyst.com
For comparison, we also predicted the species of these strains from their ramanomes using four traditional machine learning algorithms: DT, SVM, KNN, and LR. The average prediction accuracies of the DT, SVM, KNN, and LR algorithms on the independent data set were 69.0%, 81.4%, 46.7%, and 63.36%, respectively (Figure 5b).
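The checkpoint and model selection used for the taxonomic model can be sketched as follows. Reading "the latest model that has minimal loss" as the last epoch that attains the minimum validation loss is an interpretation. The per-epoch histories are hypothetical inputs that a training loop would record.

```python
def select_checkpoint(val_losses):
    """Index of the last epoch whose validation loss equals the minimum over all epochs."""
    best = min(val_losses)
    return max(i for i, loss in enumerate(val_losses) if loss == best)


def select_final_model(rounds):
    """Pick the resampling round whose optimal checkpoint has the highest test accuracy.

    rounds: one dict per round with per-epoch lists "val_loss" and "test_acc"
    (hypothetical records kept by the training loop). Returns (round, epoch, accuracy).
    """
    candidates = []
    for r, history in enumerate(rounds):
        epoch = select_checkpoint(history["val_loss"])
        candidates.append((history["test_acc"][epoch], r, epoch))
    accuracy, r, epoch = max(candidates)
    return r, epoch, accuracy
```

With 50 epochs and five rounds, this yields five optimal checkpoints, one per round, and the final model is the one among them with the best test accuracy.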

Deep Learning and Traditional Classifiers for Antimicrobial-Resistance Genes Prediction from Raman Spectra
Since linezolid is a last-resort antimicrobial agent for the treatment of serious infections caused by Gram-positive pathogens, timely detection of linezolid-resistance genes may help to control the dissemination of linezolid-resistant Gram-positive cocci. In this work, an ARG prediction model was constructed based on two linezolid-resistance genes (optrA and cfr). The performance breakdown of the LSTM model for ARGs is displayed in the confusion matrices in Figure 6a. For isolates that harbored neither the cfr gene nor the optrA gene, the identification accuracy was 82.4%. The prediction accuracy for cfr-positive isolates was higher than 97%. However, the classification accuracy for the optrA gene was only 73.1%, the lowest among the ARG categories. We next used the same strategy to build four common machine learning models, whose accuracies were 70.8% (DT), 80.0% (SVM), 58.7% (KNN), and 78.5% (LR) (Figure 6b, Table S1, Supporting Information). As shown in Figure 6, the LSTM model achieved a higher identification accuracy (82.4 ± 0.6%) than the other four machine learning models.

Deep Learning and Traditional Classifiers for Determination of Antibiotic Susceptibility Phenotypes
To devise a rapid culture-free antibiotic susceptibility test using Raman spectroscopy, we trained a binary LSTM classifier to differentiate between resistant strains and nonresistant strains according to the susceptibility profiles toward commonly used antibiotics. The LSTM model for the drug-resistance phenotypes

Discussion
The Raman spectrum of a cell contains rich molecular information. [15] For bacteria, different morphological or physiological features are associated with distinct molecular profiles. Therefore, Raman spectroscopy can readily identify bacteria at the species and genus levels. [16][19][20][21] Several recent studies have demonstrated the ability of Raman spectroscopy to discriminate between sensitive and resistant strains, [18,34,35] especially when combined with deep learning methods. [17,22,23] In this work, we developed a large data set comprising a total of 13 811 spectra acquired from eighty Enterococcus spp. and fifty S. capitis strains. After preprocessing, only subtle differences in the Raman peaks between Enterococcus spp. and S. capitis can be observed by the naked eye (Figure 4). The band assignments (Table 1) were based on the literature. Some spectral bands are common to both Enterococcus spp. and S. capitis, such as those at 786, 1005, 1158, and 1445 cm⁻¹. However, the band at 786 cm⁻¹ is markedly weaker in Enterococcus spp. than in S. capitis, whereas the band at 1005 cm⁻¹ is more dominant in Enterococcus spp. than in S. capitis. The band at 1158 cm⁻¹ is attributed to C-C/C-N stretching in both Enterococcus spp. and S. capitis, [27] but it is much more prominent in S. capitis. The Raman spectrum of S. capitis exhibits a characteristic band at 1516 cm⁻¹ that is absent from the spectra of both E. faecium and E. faecalis. The cfr gene is a transferable oxazolidinone-resistance gene encoding a methyltransferase that methylates A2503 in 23S rRNA, which may affect the binding of antibiotics to the active site of the ribosomal peptidyl transferase. [5] The optrA gene encodes an ATP-binding cassette F (ABC-F) protein that protects the bacterial ribosome from antibiotic binding. [7] These genes affect the nucleic acid and protein composition of the cell both qualitatively and quantitatively, as resistant strains may contain genetic material that differs from that of sensitive strains.
[Figure: accuracy boxplots for the five types of models; *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001.]
Since the spectroscopic vibrations are primarily related to the skeletal structure of nucleic acids and the encoded proteins, it is reasonable to assume that strains harboring different linezolid-resistance genes can be allocated into different categories on the basis of their ramanomes. Furthermore, the classification result is likely related not only to changes in genetic content but also to the many metabolic changes that resistant strains undergo. [36] Resistance can impose a fitness cost on bacteria, associated with loss of enzyme efficiency and changes in cell wall thickness and membrane porin content, even under optimal growth conditions. [37] Although each of these changes is important for resistance, they are slight changes in the overall metabolic profile of the cell and may not affect the spectra dramatically. [38] Hence, the presence or absence of linezolid-resistance genes (optrA and cfr) was difficult to determine by the naked eye, and visual interpretation of such minor spectral differences may lead to misdiagnosis. A robust data analysis algorithm was therefore required. The LSTM architecture, an improvement on the recurrent neural network (RNN), was first proposed in 1997. It is well suited to sequential (time-series) data and mitigates the difficulty that standard RNNs have in learning long-range dependencies. [39] Given these advantages, LSTM has been widely used in tasks such as signal processing and disease prediction. [40][41][42] The LSTM layer has three gating mechanisms: the input gate, the forget gate, and the output gate. The forget gate determines how much of the cell state c_{t-1} from the previous time step is retained at the current time step. The input gate determines how much of the network's current input x_t is saved to the cell state c_t. The output gate determines how much information from the cell state propagates to the next time step.
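In equation form, the gating described above corresponds to the standard LSTM update with a forget gate, where σ is the logistic sigmoid and ⊙ denotes element-wise multiplication. The text does not write out its equations, so the weight matrices W and U and the biases b below are notation introduced here for illustration:

```latex
\begin{aligned}
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```

Because f_t and i_t lie between 0 and 1 element-wise, the cell state c_t is a learned blend of retained memory and new candidate information. This additive path is what allows information from distant positions in the spectrum to persist.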
Certain characteristics of bacteria, such as nucleic acid and protein information, appear in the Raman spectrum not only as individual peaks but also as distribution patterns across different regions of the spectrum. When the Raman data are input into the LSTM network, the model can analyze feature information distributed across the full-length spectrum by combining the features retained from earlier positions with the features at the current position. A previous study showed that LSTM networks could classify different species faster and better than convolutional network methods. [43] We therefore combined LSTM with Raman spectroscopy to identify Enterococcus spp. and S. capitis strains at both the genotype and phenotype levels. Furthermore, we compared the LSTM model with classical machine learning algorithms, namely DT, SVM, KNN, and LR, which are the models most commonly used in Raman-based bacterial identification studies. [18,22,44] The bacterial discrimination experiments showed that the LSTM model identified E. faecalis and S. capitis with higher accuracy (98.8% and 92.4%, respectively) than the four machine learning algorithms, and these differences were statistically significant (P < 0.01) (Figure 5). The accuracy for E. faecium was lower (80.9%). Of the E. faecium Raman spectra, 12.7% were misclassified as E. faecalis by the LSTM model, indicating that the discriminative power of the LSTM-based Raman system at the species level was limited. This result appears to differ from the observations of Ho et al. [17] The difference might be due to 1) the use of more nonduplicated clinical E. faecium and E. faecalis strains for model construction and verification in our study, 2) the use of different samples for model building and testing, and 3) not using surface-enhanced Raman spectroscopy (SERS) to increase signal intensity. Regarding detection of the linezolid-resistance genes (optrA and cfr), the LSTM model exhibited better performance and higher discriminative power than the other four machine learning methods (Figure 6).
Accurate phenotype-based identification is important for guiding effective therapy of clinical infections. Uysal et al. reported accuracies of 97.8 ± 0.63%, 92.3 ± 0.38%, 88.9 ± 1.51%, and 82.8 ± 0.69% for KNN, SVM, DT, and naive Bayes (NB), respectively, in the detection and classification of MRSA. [18] We likewise used our spectra to train an LSTM model for rapid prediction of resistance phenotypes. However, the prediction accuracies for resistance to gentamicin (GN), levofloxacin (LEV), moxifloxacin (MXF), and erythromycin (E) were lower than 70%, probably because of insufficient training samples. Overall, the accuracy of predicting phenotypic resistance to the commonly used antimicrobial agents differed significantly between the LSTM models and the ML algorithms. Taken together, the high accuracy of the deep-learning-based Raman system suggests that Raman spectroscopy can provide fine-grained and reliable genotype and phenotype identification for both Enterococcus spp. and S. capitis strains. The deep-learning-based Raman system applied herein for identifying bacterial species and antibiotic-resistant and susceptible bacteria is simple, label-free, and rapid. It could support rapid and accurate diagnosis that helps reduce the morbidity and mortality of clinical infections. Although these deep learning algorithms rely on sophisticated computing tools, the models can be used by laboratory personnel who are not experts in the field. With further training and fine-tuning, their clinical application potential can be further enhanced, allowing standard clinical microbiology laboratories to analyze data using pretrained networks.

Conclusions
In conclusion, this study developed a novel method for predicting bacterial species, ARGs, and drug-resistance phenotypes using an LSTM network combined with Raman spectroscopy. To the best of our knowledge, this is the first study to use an LSTM-based Raman system to predict antimicrobial resistance at both the genotype and phenotype levels. Compared with traditional classifiers, the LSTM-based Raman platform achieves higher accuracy in bacterial classification (89.9 ± 1.1%), ARG identification (82.4 ± 0.6%), and resistance phenotype prediction (60.4-89.2%). By providing more accurate diagnosis, this LSTM-based Raman platform could facilitate more rapid and effective treatment of bacterial infections, thereby reducing healthcare costs and the misuse of antibiotics.

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.