In vivo diagnosis of gastric cancer using Raman endoscopy and ant colony optimization techniques

Authors

  • Mads Sylvest Bergholt,

    1. Optical Bioimaging Laboratory, Department of Bioengineering, Faculty of Engineering, National University of Singapore, Singapore
  • Wei Zheng,

    1. Optical Bioimaging Laboratory, Department of Bioengineering, Faculty of Engineering, National University of Singapore, Singapore
  • Kan Lin,

    1. Optical Bioimaging Laboratory, Department of Bioengineering, Faculty of Engineering, National University of Singapore, Singapore
  • Khek Yu Ho,

    1. Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore and National University Hospital, Singapore
  • Ming Teh,

    1. Department of Pathology, Yong Loo Lin School of Medicine, National University of Singapore and National University Hospital, Singapore
  • Khay Guan Yeoh,

    1. Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore and National University Hospital, Singapore
  • Jimmy Bok Yan So,

    1. Department of Surgery, Yong Loo Lin School of Medicine, National University of Singapore and National University Hospital, Singapore
  • Zhiwei Huang

    Corresponding author
    1. Optical Bioimaging Laboratory, Department of Bioengineering, Faculty of Engineering, National University of Singapore, Singapore
    • Optical Bioimaging Laboratory, Department of Bioengineering, Faculty of Engineering, National University of Singapore, 9 Engineering Drive 1, Singapore 117576
    • Tel.: +65-6516-8856, Fax: +65-6872-3069


Abstract

This study aims to evaluate the clinical utility of image-guided Raman endoscopy for in vivo diagnosis of neoplastic lesions in the stomach at gastroscopy. A rapid-acquisition image-guided Raman endoscopy system with 785-nm excitation was developed to acquire in vivo gastric tissue Raman spectra within 0.5 sec during clinical gastroscopic examinations. A total of 1,063 in vivo Raman spectra were acquired from 238 tissue sites of 67 gastric patients, of which 934 spectra were from normal tissue and 129 spectra were from neoplastic gastric tissue. A swarm intelligence-based algorithm [i.e., ant colony optimization (ACO) integrated with linear discriminant analysis (LDA)] was developed for spectral variable selection to identify the biochemically important Raman bands for differentiation between normal and neoplastic gastric tissue. The ACO-LDA algorithms, together with leave-one tissue site-out cross-validation, identified seven diagnostically important Raman bands in the regions of 850–875, 1,090–1,110, 1,120–1,130, 1,170–1,190, 1,320–1,340, 1,655–1,665 and 1,730–1,745 cm−1, related to proteins, nucleic acids and lipids of tissue, and provided a diagnostic sensitivity of 94.6% and specificity of 94.6% for distinction of gastric neoplasia. A predictive sensitivity of 89.3% and specificity of 97.8% were also achieved for an independent validation dataset (20% of the total dataset). This work demonstrates for the first time that real-time image-guided Raman endoscopy combined with ACO-LDA diagnostic algorithms has potential for the noninvasive, in vivo diagnosis and detection of gastric neoplasia during clinical gastroscopy.

Gastric cancer remains a serious health problem and has one of the worst 5-year survival rates among malignancies.1, 2 Currently, white-light reflectance (WLR) endoscopic imaging combined with excisional biopsy is the standard approach for gastric cancer diagnosis, but this method is invasive and impractical for real-time screening of high-risk patients who may have multiple lesions. Further, subtle changes of early lesions [e.g., dysplasia, carcinoma in situ (CIS)] may not be apparent, limiting the diagnostic sensitivity of the WLR imaging technique. Gastric malignancies are often diagnosed at an advanced stage because symptoms are delayed, and thus it is of great clinical value to develop noninvasive and sensitive optical diagnostic technologies that can guide endoscopists toward targeted biopsies of suspicious gastric lesions during gastroscopic inspections. In the past decade, autofluorescence imaging (AFI) and narrow-band imaging (NBI) techniques have considerably improved the diagnostic sensitivity,3 but the diagnosis of gastric precancer or early cancer using AFI or NBI still suffers from poor diagnostic specificity because these techniques cannot reveal more specific biochemical information for tissue diagnosis and characterization. Therefore, a rapid and noninvasive optical diagnostic technique enabling the direct assessment of specific biochemical and biomolecular information about suspicious lesions in vivo, to complement WLR/AFI/NBI imaging, would represent a significant advance in the early diagnosis of gastric malignancies during clinical endoscopic examinations.

Optical spectroscopic methods such as fluorescence spectroscopy, light scattering spectroscopy and Raman spectroscopy have been comprehensively investigated for the diagnosis and evaluation of malignancies in humans.4–15 Raman spectroscopy is an inelastic light scattering technique that is capable of probing the highly specific vibrational frequencies of biomolecules, revealing the surface and subsurface cellular structures and conformations of tissue. Under near-infrared (NIR) laser excitation (e.g., 785 nm), NIR Raman spectroscopy has shown great promise for detecting alterations in diseased gastric tissue in vitro with high biomolecular specificity.8–10 For instance, diagnostic sensitivities and specificities in the range of ∼85–95% and ∼90–98%, respectively, have been reported for differentiation between different pathologic types (e.g., intestinal metaplasia, Helicobacter pylori infection, dysplasia and adenocarcinoma) of gastric tissue using NIR Raman spectroscopy associated with multivariate analysis.8–13

In general, tissue Raman spectra are complex. To convert the subtle molecular differences of Raman spectra between different tissue types into valuable diagnostic information, sophisticated multivariate statistical analyses such as principal components analysis (PCA) have been widely applied, utilizing the entire Raman spectra for tissue diagnosis and characterization.13, 16 PCA reduces the dimension of the Raman spectra by decomposing them into linear combinations of orthogonal components [principal components (PCs)], such that the spectral variations in the dataset are maximized. Thus, PCA has been integrated with effective classification algorithms such as support vector machines (SVM), logistic regression and linear discriminant analysis (LDA) for classification of biomedical Raman spectra. For instance, Huang et al. applied PCA-LDA to distinguish dysplasia from normal gastric tissue in vivo with an accuracy of ∼92–95%.8, 13, 15 PCA is efficient for data reduction and analysis, but the physical meaning of its component spectra is usually difficult to interpret, as the PCs that account for most of the variance in spectroscopic data are not necessarily the spectral parameters with the most diagnostic utility.16, 17

As most multivariate algorithms were originally not designed to cope with large amounts of irrelevant spectral variables,18–20 combining feature selection techniques with multivariate analysis has been widely practiced.18–22 The advantages of incorporating feature selection techniques are manifold: improving the predictive performance, reducing model complexity and gaining insights into the underlying spectroscopic process (i.e., the importance of features/variables).18–22 To overcome the prohibitive complexity of the large spectral space, different feature selection strategies (e.g., filter, embedded and wrapper) have been developed, which differ in how they combine the feature search and selection with the construction of the classification models.18, 19 Ant colony optimization (ACO),23 a swarm intelligence technique that mimics the behavior of a real ant colony searching for the shortest path between the nest and a food source, has emerged as a robust wrapper-based search algorithm for efficient feature selection that uses few parameters while still allowing parallel implementation to generate optimal solutions rapidly.24–28 ACO algorithms have shown promise for efficient feature selection in many applications such as genomics,24 mass spectroscopy25 and ultraviolet (UV), visible (vis) and NIR spectral analysis.26 However, to date, ACO for feature selection in tissue Raman spectroscopy has not been reported in the literature.

The main aim of this study was to evaluate the feasibility of applying our recently developed image-guided Raman endoscopy technique associated with ACO-LDA algorithms for real-time, in vivo diagnosis and characterization of epithelial neoplasia in the stomach during clinical gastroscopy. The ACO technique was employed to search for the diagnostically significant features contained in tissue Raman spectra, and the ACO-LDA algorithms were further developed to differentiate cancer from normal gastric tissue in vivo.

Material and Methods

Raman endoscopy instrumentation

The integrated Raman spectroscopy and trimodal wide-field imaging system that we developed for in vivo tissue measurements at endoscopy has been described in detail elsewhere.29, 30 Briefly, the Raman spectroscopy system consists of a spectrum-stabilized 785-nm diode laser (maximum output: 300 mW, B&W TEK, Newark, DE), a transmissive imaging spectrograph (Holospec f/1.8, Kaiser Optical Systems Inc., Ann Arbor, MI) equipped with a liquid nitrogen-cooled, NIR-optimized, back-illuminated and deep depletion charge-coupled device (CCD) camera (1,340 × 400 pixels at 20 × 20 μm2 per pixel; Spec-10: 400BR/LN, Princeton Instruments), and a specially designed Raman endoscopic probe for both laser light delivery and in vivo tissue Raman signal collection. The Raman probe is composed of 32 collection fibers surrounding the central light delivery fiber, with two stages of optical filtering incorporated at the proximal and distal ends of the probe to maximize the collection of tissue Raman signals while reducing the interference of Rayleigh scattered light, fiber fluorescence and silica Raman signals. The Raman probe can pass down the instrument channel of medical endoscopes and be directed to suspicious tissue sites under the guidance of wide-field endoscopic imaging (WLR/AFI/NBI) modalities.29 The in vivo Raman endoscopy system is controlled by a personal computer (PC) running custom-designed software that triggers online data acquisition and analysis (e.g., CCD dark-noise subtraction, wavelength calibration, system spectral response calibration, signal saturation detection and cosmic ray rejection), as well as real-time display of in vivo tissue Raman spectra during clinical endoscopic measurements. The system acquires Raman spectra in the wavenumber range of 800–1,800 cm−1 from in vivo gastric tissue within 0.5 sec using 785-nm excitation at a power density of 1.5 W/cm2 with a beam size of ∼200 μm on the tissue surface. A signal-to-noise ratio of >30 (e.g., for the Raman peak at 1,450 cm−1) can be obtained for in vivo gastric Raman spectra under these acquisition conditions.29, 30 The spectral resolution of the system is ∼9 cm−1. All wavelength-calibrated spectra were corrected for the wavelength dependence of the system using a standard lamp (RS-10, EG&G Gamma Scientific, San Diego, CA). The trimodal endoscopy imaging system primarily comprises a 300-W short-arc xenon light source, a gastrointestinal (GI) videoscope (GIF-FQ260Z, Olympus) and a video system processor (CV-260SL, Olympus). The light reflected or the autofluorescence emitted from tissue is detected by two monochrome CCD chips mounted behind two objective lenses placed next to each other at the distal tip of the GI videoscope: one CCD for WLR/NBI and the other for AFI. With this image-guided Raman endoscopy system, wide-field endoscopic images (WLR/AFI/NBI) and the corresponding in vivo Raman spectra of the tissue imaged can be simultaneously acquired, displayed and recorded in the video system processor and the PC, respectively.

Patients

The present study is part of an ongoing nationwide program aiming at early diagnosis and treatment of gastric malignancies run by the Singapore gastric cancer epidemiology, clinical and genetic program (GCEP).31 This study was conducted with approval by the Institutional Review Board (IRB) of the National Healthcare Group (NHG) of Singapore. All patients signed an informed consent permitting the investigative collection of in vivo gastric Raman spectra in the endoscope centre at the National University Hospital (NUH), Singapore. In this study, in vivo Raman spectra were acquired from 67 gastric patients (36 men and 31 women with a median age of 61 years old) under the guidance of multimodal wide-field endoscopic imaging (WLR/AFI/NBI) modalities.29 The Raman probe was placed in gentle contact with the gastric mucosa surface, and the positioning of the Raman probe against the tissue sites was verified on the endoscopy monitor by the endoscopists in-charge during gastroscopic examinations. To include the intrasubject/intersubject variations for data analysis, multiple Raman spectra (∼8–10) were obtained from each tissue site. As a result, a total of 1,063 in vivo Raman spectra were collected from 238 tissue sites, in which 934 in vivo Raman spectra were acquired from 121 normal tissue sites (confirmed by histopathology) in 63 patients, while 129 in vivo Raman spectra were obtained from 117 neoplastic tissue sites (i.e., adenocarcinoma as confirmed by histopathology) in 61 gastric patients. Immediately after Raman acquisitions, the biopsy samples were taken from the tissue sites measured (with suction markings) and then fixed in 10% formalin solution for histopathological examination by a senior gastrointestinal pathologist. For the assessment of diagnostic sensitivity and specificity of Raman endoscopy for tissue classification in the stomach, histopathological results served as the gold standard.

Data preprocessing

The raw Raman spectra (800–1,800 cm−1) measured from in vivo gastric tissue comprise a weak Raman signal, an intense autofluorescence background and noise. Thus, the raw spectra were preprocessed by a first-order Savitzky-Golay filter (window width of 3 pixels, which corresponded to the system spectral resolution) to reduce noise.30, 32 A fifth-order polynomial was found to be optimal for fitting the autofluorescence background in the noise-smoothed spectrum,33 and this polynomial was then subtracted from the raw spectrum to yield the tissue Raman spectrum alone. Each background-subtracted Raman spectrum was also normalized to the integrated area under the curve from 800 to 1,800 cm−1, enabling a better comparison of the spectral shapes and relative Raman band intensities among different gastric tissues.
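The preprocessing chain described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' software: all function names are hypothetical; the three-point moving average is used because a first-order Savitzky-Golay filter with a window of 3 reduces to it; and the iterative clipping that keeps the fitted polynomial beneath the Raman peaks is an assumption borrowed from common modified-polynomial baseline methods (the text states only that a fifth-order polynomial was fitted to the smoothed spectrum and subtracted from the raw spectrum).

```python
def smooth3(y):
    """Three-point moving average (first-order Savitzky-Golay, window 3)."""
    out = [float(v) for v in y]
    for i in range(1, len(y) - 1):
        out[i] = (y[i - 1] + y[i] + y[i + 1]) / 3.0
    return out


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_polynomial(x, y, order=5):
    """Least-squares polynomial fit; returns the fitted values at x."""
    lo, hi = min(x), max(x)
    # Rescale the wavenumber axis to [-1, 1] for numerical stability.
    u = [2.0 * (xi - lo) / (hi - lo) - 1.0 for xi in x]
    powers = [[ui ** k for k in range(order + 1)] for ui in u]
    n = order + 1
    ata = [[sum(p[j] * p[k] for p in powers) for k in range(n)] for j in range(n)]
    aty = [sum(p[j] * yi for p, yi in zip(powers, y)) for j in range(n)]
    coef = solve_linear(ata, aty)
    return [sum(c * pk for c, pk in zip(coef, p)) for p in powers]


def estimate_background(x, smoothed, order=5, n_iter=30):
    """Assumed variant: clip the spectrum to the fit so peaks stop pulling it up."""
    work = list(smoothed)
    for _ in range(n_iter):
        fit = fit_polynomial(x, work, order)
        work = [min(w, f) for w, f in zip(work, fit)]
    return fit_polynomial(x, work, order)


def area_normalize(x, y):
    """Divide by the trapezoidal area under the curve."""
    area = sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1))
    return [yi / area for yi in y]


def preprocess(x, raw):
    """Smooth, fit the background on the smoothed data, subtract it from the raw data."""
    background = estimate_background(x, smooth3(raw))
    raman = [r - b for r, b in zip(raw, background)]
    return area_normalize(x, raman)
```

Calling `preprocess(wavenumbers, raw_counts)` on one spectrum returns a background-subtracted, area-normalized list of the same length. A production pipeline would more likely rely on a numerical library for the fitting step.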

Ant colony optimization

The ACO creates a large population of artificial ants that search the high-dimensional spectral variable space in parallel for the features best suited to a particular discrimination problem. In this study, each artificial ant is assigned its own subset of variables in the Raman spectrum, and the artificial ants communicate through a virtual chemical pheromone trail distributed over the Raman variables; the trail changes dynamically at each iteration and reinforces itself via positive feedback. To exclude inefficient variables, an evaporation constant is imposed that uniformly decreases the pheromone trail over time. The ACO algorithm iteratively performs a loop containing three central elements: (i) ants are generated and stochastically assigned to spectral variables based on the global pheromone trail; (ii) each ant's performance is evaluated (i.e., the classification accuracy of its subset of Raman shifts); and (iii) the global pheromone trail is updated using the evaporation constant and the classification performance of the ants. These three steps are repeated iteratively in search of the global optimum of the classification accuracy. The assignment of spectral variables to the artificial ants is driven by a transition probability function23:

$$p_i(t) = \frac{[\tau_i(t)]^{\alpha}\,\eta_i^{\beta}}{\sum_{j}[\tau_j(t)]^{\alpha}\,\eta_j^{\beta}} \qquad (1)$$

where τi(t) is the amount of pheromone for the ith spectral variable at time t, ηi represents local information (ACO algorithms allow local search information to be implemented to improve the quality of computed solutions),23 and α and β are the parameters that weight the pheromone and local information, respectively. Thus, if the local information or pheromone values are strong, ants will have a higher probability of including that particular spectral variable. The local information for each spectral variable was chosen as a weighting factor ηi computed from x̄i,1 and x̄i,2, the means of the ith Raman shift in the two groups, and from SDi,1 and SDi,2, the corresponding standard deviations.25 In the first iteration, all pheromone values are set to one, so each ant chooses variables with probabilities determined by the local information alone. In this study, LDA served as the classification algorithm, and the classification performance rendered by each ant was validated in an unbiased manner using leave-one tissue site-out cross-validation. That is, the Raman spectra of one tissue site were left out and the LDA model was redeveloped using the remaining Raman spectra. The model was then used to classify the withheld Raman spectra. This process was repeated until the spectra of every tissue site had been withheld and classified. The pheromone value τi for each spectral variable was updated accordingly.23

$$\tau_i(t+1) = (1-\rho)\,\tau_i(t) + \Delta\tau_i \qquad (2)$$

where ρ is a constant between 0 and 1 representing the evaporation rate of pheromone, and Δτi is related to the classification accuracy of the ants. Note that slightly different varieties of ACO algorithms exist; they differ mainly in the updating procedure in Eq. (2) and in the constraints imposed on the pheromone trail.23 During the ACO iterations, the best 20% of ants (“elitist ants”) are chosen to update the pheromone trail [Eq. (2)], so that Raman shifts yielding good classification accuracies have their pheromone increased while the pheromone of the others gradually evaporates. The parameters of the ACO algorithm are the population size, the evaporation rate ρ and the weights α and β of the pheromone and prior information, respectively. Because of the extensive collinearity of neighboring variables in Raman spectra, a group of good solutions is likely to exist.26 The diagnostic information that correlates well with tissue pathology is expected to spread over whole spectral regions rather than being restricted to unique variables. In this work, to improve the selection of important features, the ACO search was executed multiple times to assess the relative importance of each spectral variable.34 The most frequently selected Raman shifts are presumed to be the most useful features for classification of in vivo gastric Raman spectra.
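The loop described in this section (stochastic assignment of variables, evaluation of each ant, and an elitist pheromone update with evaporation) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the implementation used in the study: the names `transition_probabilities`, `draw_subset` and `aco_select` are hypothetical, the pheromone deposit Δτi is assumed to be the sum of the fitness values of the elite ants that selected variable i, and the `fitness` callable is a placeholder for the cross-validated LDA accuracy.

```python
import random


def transition_probabilities(tau, eta, alpha=1.0, beta=1.0):
    """Selection probability of each variable from pheromone and local information."""
    weights = [(t ** alpha) * (e ** beta) for t, e in zip(tau, eta)]
    total = sum(weights)
    return [w / total for w in weights]


def draw_subset(probs, k, rng):
    """Sample k distinct variable indices, weighted by probs, without replacement."""
    remaining = list(range(len(probs)))
    weights = list(probs)
    chosen = []
    for _ in range(k):
        pick = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen.append(remaining.pop(pick))
        weights.pop(pick)
    return sorted(chosen)


def aco_select(n_vars, k, fitness, eta=None, n_ants=100, n_iter=30,
               alpha=1.0, beta=1.0, rho=0.2, elite_frac=0.2, seed=0):
    """Return (best subset, best fitness, final pheromone trail)."""
    rng = random.Random(seed)
    eta = eta or [1.0] * n_vars      # uniform local information if none is given
    tau = [1.0] * n_vars             # all pheromone values start at one
    best_subset, best_score = None, float("-inf")
    n_elite = max(1, int(elite_frac * n_ants))
    for _ in range(n_iter):
        probs = transition_probabilities(tau, eta, alpha, beta)
        ants = [draw_subset(probs, k, rng) for _ in range(n_ants)]
        scored = sorted(((fitness(s), s) for s in ants), reverse=True)
        if scored[0][0] > best_score:
            best_score, best_subset = scored[0]
        # Assumed deposit rule: elite ants add their fitness to their variables.
        delta = [0.0] * n_vars
        for score, subset in scored[:n_elite]:
            for i in subset:
                delta[i] += score
        # Evaporation followed by deposit.
        tau = [(1.0 - rho) * t + d for t, d in zip(tau, delta)]
    return best_subset, best_score, tau
```

In use, `fitness` would take a list of variable indices and return the classification accuracy obtained with those Raman shifts; any callable returning a larger value for a better subset fits this sketch.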

Results

Figure 1a shows in vivo mean Raman spectra ±1 standard deviation (SD) of normal (n = 934) and cancer (n = 129) gastric tissues. Prominent Raman bands are observed in both normal and cancer gastric tissue at the following peak positions with tentative biochemical assignments8–16, 33: 875 cm−1 [ν(C–C) of hydroxyproline], 1,004 cm−1 [ν(C–C) ring breathing of phenylalanine], 1,080 cm−1 [ν(C–C) or ν(C–O) of phospholipids], 1,265 cm−1 [amide III ν(C–N) and δ(N–H) of proteins], 1,302 and 1,335 cm−1 [CH3CH2 twisting of proteins and nucleic acids], 1,450 cm−1 [δ(CH2) of proteins and lipids], 1,620 cm−1 [C=C stretching mode of porphyrins], 1,655 cm−1 [amide I ν(C=O) of proteins] and 1,745 cm−1 [ν(C=O) of phospholipids]. The corresponding difference spectrum (i.e., cancer − normal) in Figure 1b reveals significant Raman spectral changes (e.g., in Raman peak intensities, Raman peak positions and spectral bandwidth broadening or narrowing) in gastric cancer tissue, particularly in the spectral ranges of 820–920, 1,300–1,500 and 1,600–1,800 cm−1, which contain signals related to proteins, nucleic acids and lipids, respectively. For instance, compared with normal tissue, cancer gastric tissue shows lower intensities at 875 and 1,745 cm−1 and higher intensities at 1,004, 1,335, 1,450 and 1,655 cm−1. This indicates an increase or decrease in the percentage of certain types of biomolecules relative to the total Raman-active constituents in gastric tissue associated with neoplastic transformation. The significant differences in Raman spectra between normal and neoplastic gastric tissue indicate the utility of Raman endoscopy for in vivo gastric cancer diagnosis.

Figure 1.

(a) In vivo mean Raman spectra ±1 standard deviation (SD) of normal (n = 934) and cancer (n = 129) gastric tissues. Note that the mean Raman spectrum of gastric cancer tissue is vertically shifted for better visualization. (b) The corresponding difference spectrum ±1 SD calculated from the mean Raman spectra of normal and neoplastic gastric tissue. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

The ACO-LDA algorithms were developed for spectral feature selection to identify the diagnostically useful Raman features of in vivo gastric tissue Raman spectra for correlation with gastric tissue pathologies. The algorithm parameters balanced the exploitation–exploration trade-off and were adjusted by trial and error (100 ants, α = 1, β = 1 and ρ = 0.2) to a high convergence rate for obtaining the best subset of Raman variables for tissue distinction. Figure 2 shows the relationship of the error rate with respect to the evolution of the best ant and the mean ± 1 SD of the respective ants generated, indicating a significant improvement in the diagnostic accuracy of gastric malignancies as the ACO algorithm progresses. Consequently, the termination criterion was chosen at 30 iterations as the best ant in most runs had converged to a high diagnostic accuracy. To extract characteristic biochemical diagnostic information contained in tissue Raman spectra, the ACO-LDA algorithm was repeated over 100 independent runs for the selection of 15 spectral variables. Figure 3a displays the variables importance distributions associated with in vivo mean tissue Raman spectra, revealing that distinctive spectral features in the regions of 850–875, 1,090–1,110, 1,120–1,130, 1,170–1,190, 1,320–1,340, 1,655–1,665 and 1,730–1,745 cm−1 are the most useful Raman signals (biochemical assignments are listed in Table 1) for distinguishing cancer from normal tissue (Fig. 3b). The intensity differences among these Raman features were also verified to be significant (p ≪ 0.001, unpaired Student's t-test (two-sided, equal variance)) for gastric tissue diagnosis and characterization.
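The variable-importance analysis described above (counting how often each Raman shift is selected across independent runs, then reading off the most frequently selected regions) can be illustrated with a short sketch. The function names, the `min_count` threshold and the `max_gap` parameter used to merge neighboring shifts into bands are assumptions made for illustration; the text does not state how the band boundaries were drawn.

```python
from collections import Counter


def selection_counts(runs):
    """Count in how many runs each Raman shift (in cm^-1) was selected."""
    counts = Counter()
    for subset in runs:
        counts.update(set(subset))   # a shift counts at most once per run
    return counts


def frequent_bands(counts, min_count, max_gap):
    """Merge frequently selected shifts that lie within max_gap of each other."""
    kept = sorted(w for w, c in counts.items() if c >= min_count)
    bands = []
    for w in kept:
        if bands and w - bands[-1][1] <= max_gap:
            bands[-1][1] = w         # extend the current band
        else:
            bands.append([w, w])     # start a new band
    return [tuple(b) for b in bands]


# Toy example with three runs of three selected shifts each.
runs = [[850, 860, 1655], [850, 1655, 1740], [860, 1655, 1200]]
counts = selection_counts(runs)
print(frequent_bands(counts, min_count=2, max_gap=10))
# prints [(850, 860), (1655, 1655)]
```

With 100 runs of 15 variables each, as in the study, `runs` would hold 100 lists and the cumulative counts would correspond to the histogram overlaid on the mean spectra in Figure 3a.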

Figure 2.

The mean classification error rates of the ants ±1 SD versus the best performing ants for 50 iterations in ACO-LDA algorithms. Parameters: 100 ants, α = 1, β = 1 and ρ = 0.2.

Figure 3.

(a) The in vivo mean Raman spectra of normal and neoplastic gastric tissue associated with the cumulative counts of significant Raman shifts (i.e., important variables) identified using ACO-LDA algorithm. (b) The most significant Raman bands identified by the ACO-LDA algorithms for in vivo gastric tissue diagnosis.

Table 1. Tentative assignments of significant Raman bands identified for ACO-LDA modeling

Figure 4 shows the classification results of the ACO-LDA diagnostic model based on the seven prominent Raman bands identified (850–875, 1,090–1,110, 1,120–1,130, 1,170–1,190, 1,320–1,340, 1,655–1,665 and 1,730–1,745 cm−1) together with leave-one tissue site-out cross-validation. The ACO-LDA diagnostic algorithm yielded an overall accuracy of 94.6% [i.e., sensitivity of 94.6% (122/129) and specificity of 94.6% (884/934)] for identifying cancerous lesions from normal gastric tissues. Hence, ACO-LDA provides a novel way to determine the diagnostically significant Raman spectral features that are important for constructing a final diagnostic model for in vivo gastric cancer diagnosis. To further evaluate the performance of the ACO-LDA-based diagnostic algorithms together with leave-one tissue site-out cross-validation, the receiver operating characteristic (ROC) curve (Fig. 5) was also generated. The area under the ROC curve is 0.980, confirming that the ACO-LDA diagnostic model based on in vivo Raman spectroscopy is powerful for clinical diagnosis of neoplastic lesions in the stomach at the molecular level. We also evaluated the performance of the ACO-LDA diagnostic algorithms using 80% of the total dataset for ACO-LDA model construction, while the remaining 20% of the total dataset (184 spectra from 24 normal tissue sites and 28 spectra from 23 neoplastic tissue sites) served as an independent prediction dataset. A predictive accuracy of 96.7% [i.e., sensitivity of 89.3% (25/28) and specificity of 97.8% (180/184)] was achieved for the independent validation dataset, substantiating the robustness of the ACO-LDA algorithms developed for Raman endoscopic diagnosis of gastric cancer in vivo.
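The reported percentages follow directly from the spectrum counts given above. As a quick arithmetic check, the sketch below recomputes them; the helper name `diagnostic_metrics` is hypothetical, and the false-negative and false-positive counts are derived from the stated totals (129 − 122 = 7 and 934 − 884 = 50 for cross-validation; 28 − 25 = 3 and 184 − 180 = 4 for the independent dataset).

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) as fractions."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Leave-one tissue site-out cross-validation: 122/129 cancer, 884/934 normal.
cv = diagnostic_metrics(tp=122, fn=7, tn=884, fp=50)
print(", ".join(f"{100 * v:.1f}%" for v in cv))
# prints 94.6%, 94.6%, 94.6%

# Independent validation dataset: 25/28 cancer, 180/184 normal.
test = diagnostic_metrics(tp=25, fn=3, tn=180, fp=4)
print(", ".join(f"{100 * v:.1f}%" for v in test))
# prints 89.3%, 97.8%, 96.7%
```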

Figure 4.

Scatter plot of the linear discriminant scores of normal and cancer gastric tissue using the ACO-LDA algorithms together with leave-one tissue site-out cross-validation. The separation line yields a diagnostic sensitivity of 94.6% (122/129) and specificity of 94.6% (884/934) for discriminating cancer from normal gastric tissue in vivo.
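The leave-one tissue site-out cross-validation behind these scores holds out all spectra from one tissue site at a time, so that spectra from the same site never appear in both the training and test sets. A minimal sketch of that splitting logic is given below. The function names are hypothetical, and the nearest-class-mean classifier is only a simple stand-in for LDA so that the example runs with the standard library alone.

```python
from collections import defaultdict


def leave_one_site_out(site_ids):
    """Yield (train_idx, test_idx), holding out every spectrum of one site per fold."""
    by_site = defaultdict(list)
    for idx, site in enumerate(site_ids):
        by_site[site].append(idx)
    for test_idx in by_site.values():
        held = set(test_idx)
        train_idx = [i for i in range(len(site_ids)) if i not in held]
        yield train_idx, test_idx


def fit_nearest_mean(spectra, labels):
    """Stand-in classifier (not LDA): store the mean spectrum of each class."""
    grouped = defaultdict(list)
    for s, y in zip(spectra, labels):
        grouped[y].append(s)
    return {y: [sum(col) / len(col) for col in zip(*rows)] for y, rows in grouped.items()}


def predict_nearest_mean(model, spectrum):
    """Assign the class whose mean spectrum is closest in squared distance."""
    def dist(mean):
        return sum((a - b) ** 2 for a, b in zip(spectrum, mean))
    return min(model, key=lambda y: dist(model[y]))


def cross_validate(spectra, labels, site_ids, fit, predict):
    """Return one held-out prediction per spectrum."""
    predictions = [None] * len(labels)
    for train_idx, test_idx in leave_one_site_out(site_ids):
        model = fit([spectra[i] for i in train_idx], [labels[i] for i in train_idx])
        for i in test_idx:
            predictions[i] = predict(model, spectra[i])
    return predictions
```

Passing a different `fit`/`predict` pair (for example, wrappers around an LDA implementation) changes the classifier without touching the site-wise splitting.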

Figure 5.

Receiver operating characteristic (ROC) curves of discrimination results of in vivo Raman spectra of normal and cancer gastric tissue using ACO-LDA and PCA-LDA algorithms together with leave-one tissue site-out cross-validation. The areas under the ROC curves for ACO-LDA and PCA-LDA are 0.980 and 0.955, respectively, illustrating the efficacy of Raman endoscopy together with ACO-LDA algorithms for in vivo gastric cancer diagnosis.
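The area under a ROC curve equals the probability that a randomly chosen cancer spectrum receives a higher discriminant score than a randomly chosen normal spectrum. The sketch below computes it by direct pairwise comparison; the function name `roc_auc` is hypothetical, and the convention that higher scores indicate cancer is an assumption made for illustration.

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy example: three of the four positive/negative pairs are ranked correctly.
print(roc_auc([0.8, 0.4], [0.5, 0.1]))
# prints 0.75
```

For the dataset sizes in this study (129 cancer and 934 normal spectra), the double loop covers about 120,000 pairs, which is small enough that no rank-based shortcut is needed.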

Discussion

The optical diagnostic technology which can provide biochemical information for identifying cancer tissue in situ could be of great clinical value to the clinicians during endoscopic examination. Raman spectroscopy is a unique vibrational spectroscopic technique that can noninvasively capture specific biomolecular information for tissue diagnosis and characterization, allowing the study of tissue in its native state without requiring tissue preparation or treatment. In addition, the good compatibility of fiber-optic technology with the Raman technique increases the possibility of Raman spectroscopy to be endoscopically utilized in clinic. Our very recent development of the novel miniaturized Raman endoscopic probe29 that can fit into the instrument channel of most medical endoscopes for effective collection of tissue Raman scattering in internal organs, now enables the endoscopists to directly assess the biochemical information of suspicious lesion sites under the wide-field endoscopic imaging guidance.29 This greatly facilitates analysis of in situ gastric tissue Raman signals of different pathologies, thereby bringing Raman technology into clinical endoscopic applications.

In this work, we investigate, for the first time, the in vivo Raman spectral properties of normal and cancer gastric tissues and explore the potential of translating in vivo Raman spectral differences between normal and malignant gastric tissue into diagnostically useful ACO-LDA algorithms for in vivo diagnosis of gastric cancer at gastroscopy. We have observed the distinctive differences in in vivo Raman spectra between normal and malignant gastric tissue (Fig. 1b), indicating the utility of Raman endoscopy for in vivo diagnosis of neoplastic lesions in the stomach during clinical gastroscopic examinations. To convert the subtle Raman differences of different tissue into valuable diagnostic information, the efficient but robust ACO-LDA diagnostic algorithms are developed to correlate in vivo Raman spectra with tissue histopathology. The ACO-LDA results showed that the valuable diagnostic information (i.e., the variables importance in Fig. 3) can be confined to a much reduced number of Raman bands primarily related to proteins, nucleic acids and lipids (Table 1)8–16, 33 for effective tissue classification (diagnostic accuracy of 94.6% can be achieved for gastric cancer identification as shown in Fig. 4). 
For instance, the intensity of the Raman peak identified by ACO-LDA at 850–875 cm−1 (C–C stretching of hydroxyproline in collagen) was found to decrease significantly with malignancy, indicating a reduction in the percentage of collagen content relative to the total Raman-active components in malignant tissue.9, 11 This observation is in agreement with reports that cancerous cells proliferate, invade the underlying stromal layer and express a class of metalloproteases, resulting in an overall reduction of collagen content in cancer tissue.35 In addition, the increased thickness of the gastric epithelial layer due to proliferation of the malignant cells may attenuate the excitation laser power and obscure the collagen Raman emission from the lamina propria, leading to the reduction of collagen Raman signals. This phenomenon has also been observed using autofluorescence spectroscopy36 and in our previous Raman studies of gastric cancer.9, 11 Moreover, the decrease in phospholipid-related signals (e.g., 1,745 cm−1, C=O stretching) associated with malignancy was also found to be of fundamental importance in discriminating neoplastic from normal tissues. 
This Raman band reveals the diagnostic significance of the lower concentration of phospholipids relative to the total Raman-active constituents and is consistent with the finding of an increased nucleic acid to lipid ratio in neoplastic tissue and cells.9, 11, 33, 37 Some of the diagnostically important Raman signals (e.g., 1,123 cm−1) identified by ACO-LDA are highly associated with biochemical changes in the extracellular matrix as well as different proteomic activity in the cytoplasm and nucleus in gastric tissue.8–10, 33, 37, 38 The prominent peak increase identified at 1,655 cm−1 (amide I, C=O stretching of proteins in the α-helix conformation) is likely related to an elevated percentage of histones with respect to the total Raman-active constituents in cancer tissue.8–11, 15 This finding accords with gastric cytologic studies that grade malignancy by the indication of nuclear hyperchromasia.38 The existence of proteins in the β-pleated sheet conformation (e.g., 1,665 cm−1) may reflect more chemical interaction between the proteins and the microenvironment in cancer cells, which could be related to an increase in mitotic activity.38, 39 Further, Raman peaks assigned to nucleic acids (1,093 and 1,335 cm−1 of PO2− stretching and CH3CH2 wagging, respectively) were also found to be of essential importance for tissue diagnosis, revealing an increased nucleic acid/cytoplasm ratio in the gastric neoplastic cells, which is one of the characteristics of cell carcinogenesis.8–11, 33, 38 All these findings reinforce that ACO-LDA algorithms can provide a novel way of better understanding the most vital biochemical transformations of gastric carcinogenesis by identifying a comprehensive range of diagnostically important biochemical information contained in tissue Raman spectra for effective tissue diagnosis and characterization.

We utilized, for the first time, the swarm intelligence ACO-LDA technique to determine diagnostically important features of tissue Raman spectra for construction of robust diagnostic algorithms (Figs. 4 and 5), yielding a diagnostic sensitivity of 94.6% (122/129), specificity of 94.6% (884/934) and an overall accuracy of 94.6% (1,006/1,063) for in vivo gastric cancer diagnosis. The ROC curve with an area under the curve of 0.980 (Fig. 5) further verifies the diagnostic efficacy of Raman endoscopy integrated with ACO-LDA algorithms for in vivo detection of gastric malignancies. A predictive accuracy of 96.7% [i.e., sensitivity of 89.3% (25/28) and specificity of 97.8% (180/184)] was achieved for the randomly selected validation dataset, substantiating the generalization ability of the ACO-LDA algorithms for Raman diagnosis of gastric cancer in vivo. For comparison purposes, we also applied the widely practiced PCA-LDA technique to the same in vivo tissue Raman dataset for gastric tissue classification. The PCA-LDA algorithms developed utilizing five significant PCs (PC1, PC2, PC3, PC5 and PC6, accounting for 78.3% of the total variance; unpaired two-sided Student's t-test, p < 0.05) together with leave-one tissue site-out cross-validation yielded a diagnostic accuracy of 91.6% [i.e., sensitivity of 92.2% (119/129) and specificity of 91.5% (855/934)] for in vivo diagnosis of gastric malignancies. The area under the ROC curve for the PCA-LDA algorithms is 0.955 (Fig. 5). Thus, the ACO-LDA algorithms employing the specific Raman bands identified (850–875, 1,090–1,110, 1,120–1,130, 1,170–1,190, 1,320–1,340, 1,655–1,665 and 1,730–1,745 cm−1), which carry direct physical meaning as biochemical diagnostic information, provide a better diagnostic accuracy than PCA-LDA modeling (94.6% versus 91.6%). 
The improved diagnostic performance of ACO-LDA model may be attributed to the fact that ACO is a wrapper-based search algorithm that possesses the inherent ability of exploiting the mutual interactions among spectral variables (i.e., the joint ability to discriminate) according to their variables importance.18–20, 39, 40 Hence, the ACO-LDA feature selection technique in Raman spectroscopy can provide new insights into the diagnostically significant biochemical changes associated with carcinogenesis transformation in tissue. The substantial reduction in the spectral dimension of the swarm intelligence-based ACO-LDA diagnostic model is particularly appealing for realizing real-time, in vivo tissue diagnosis and characterization in clinical settings.

In summary, we have acquired for the first time in vivo Raman spectra from normal and cancer gastric tissue within 0.5 sec during gastroscopic examination. The swarm-intelligence ACO-LDA algorithms are developed to identify the most distinctive biochemical changes associated with neoplastic transformation, providing good differentiation between normal and cancer gastric tissue. The image-guided Raman endoscopy integrated with robust ACO-LDA algorithms for efficient Raman features selection has potential for rapid, noninvasive, in vivo diagnosis and detection of gastric cancer at the molecular level during clinical endoscopic inspections.
