Near-infrared Raman spectroscopy for early diagnosis and typing of adenocarcinoma in the stomach


  • S. K. Teh,

    1. Optical Bioimaging Laboratory, Department of Bioengineering, Faculty of Engineering, National University of Singapore, Singapore
  • W. Zheng,

    1. Optical Bioimaging Laboratory, Department of Bioengineering, Faculty of Engineering, National University of Singapore, Singapore
  • K. Y. Ho,

    1. Departments of Medicine, Yong Loo Lin School of Medicine, National University of Singapore and National University Hospital, Singapore
  • M. Teh,

    1. Departments of Pathology, Yong Loo Lin School of Medicine, National University of Singapore and National University Hospital, Singapore
  • K. G. Yeoh,

    1. Departments of Medicine, Yong Loo Lin School of Medicine, National University of Singapore and National University Hospital, Singapore
  • Z. Huang

    Corresponding author
    1. Optical Bioimaging Laboratory, Department of Bioengineering, Faculty of Engineering, National University of Singapore, Singapore
    • Optical Bioimaging Laboratory, Department of Bioengineering, Faculty of Engineering, National University of Singapore, 9 Engineering Drive 1, Singapore 117576



The aim of this study was to evaluate the feasibility of using near-infrared (NIR) Raman spectroscopy for early diagnosis and typing of intestinal and diffuse adenocarcinoma of the stomach.


A dispersive-type NIR Raman system was used for tissue measurements. One hundred gastric tissue samples from 62 patients who underwent endoscopy or gastrectomy were used (70 normal tissue specimens and 30 adenocarcinomas). Principal components analysis (PCA) and multinomial logistic regression (MNLR) were used to develop diagnostic algorithms for tissue classification.


High-quality Raman spectra ranging from 800 to 1800 cm−1 were acquired from gastric tissue within 5 s. There were significant differences in Raman spectra between normal stomach and the two gastric adenocarcinoma subtypes, particularly in the spectral ranges 850–1150, 1200–1500 and 1600–1750 cm−1, which contain signals related to proteins, nucleic acids and lipids. PCA–MNLR achieved predictive accuracies of 88, 92 and 94 per cent for normal stomach, and intestinal- and diffuse-type gastric adenocarcinomas respectively.


NIR Raman spectroscopy can detect gastric malignancy and identify the subtype of gastric adenocarcinoma. Copyright © 2010 British Journal of Surgery Society Ltd. Published by John Wiley & Sons, Ltd.


Gastric cancer is the second leading cause of cancer-associated death, accounting for approximately 600 000 annual deaths worldwide1. Early diagnosis and localization with appropriate curative treatment (for example endoscopic submucosal dissection or gastrectomy) is critical in decreasing mortality2. Identification of early cancer, however, can be difficult. Conventional white-light endoscopy relies on visual identification of morphological tissue changes. Subtle changes may not be apparent, limiting diagnostic accuracy. Positive endoscopic biopsy is the standard criterion for gastric cancer diagnosis, but is invasive and impractical for screening high-risk patients who may have multiple suspicious lesions3. In addition, endoscopic biopsies are often small (about 3 mm in diameter) and may be inconclusive both in making the diagnosis and in determining the type of gastric adenocarcinoma4. A non-invasive optical diagnostic technique providing a direct assessment of biochemical information from suspicious lesions would represent a significant advance in the endoscopic detection of early gastric cancer.

Optical spectroscopic methods, including light scattering spectroscopy, fluorescence spectroscopy and Raman spectroscopy, have all been investigated for the diagnosis and evaluation of cancer and precancer5–13. Raman spectroscopy is a vibrational spectroscopic technique that is capable of probing specific biochemical aspects in biological tissues based on inelastic light-scattering processes. The technique has shown promise in detecting biomolecular alterations associated with disease13. With the use of near-infrared (NIR) lasers as excitation light sources, NIR Raman spectroscopy holds significant advantages over previous techniques in that water exhibits very low absorption at the working wavelength range, and tissues exhibit far less autofluorescence than visible light excitation7. Less water absorption makes it easy to detect other tissue components and results in deeper light penetration into the tissue. As a result, NIR Raman spectroscopy has been studied for the early detection of malignant tumours in a number of organs, including the stomach7–9. High diagnostic accuracies (around 90 per cent) have been achieved for early detection of gastric cancer and precancer (dysplasia)14–16.

The clinical potential of NIR Raman spectroscopy for identification of different subtypes of gastric adenocarcinoma (for example intestinal versus diffuse17) has not, however, been evaluated in detail. The distinction between intestinal and diffuse types of gastric adenocarcinoma is clinically relevant and may influence treatment strategy1, 2. Laparoscopic or endoscopic resections tend to be more applicable to intestinal-type adenocarcinoma, whereas patients with diffuse-type adenocarcinoma are more likely to require gastrectomy1, 2. The aim of the present study was to investigate the clinical potential of NIR Raman spectroscopy for the early detection and typing of intestinal and diffuse adenocarcinoma in the stomach. Multivariable statistical techniques, including principal components analysis (PCA) and multinomial logistic regression (MNLR), were employed to develop effective diagnostic algorithms for classification of Raman spectra among different types of gastric tissue.


The present study is part of an ongoing nationwide project focusing on early diagnosis and treatment of gastric cancer run by the Singapore Gastric Cancer Epidemiology, Clinical and Genetics Programme (GCEP). This project has been described in detail elsewhere18. From 2006, 125 gastric tissue samples were collected from 72 patients (40 men and 32 women with a median age of 62 years) undergoing endoscopy for clinically suspicious lesions under the GCEP protocol, or gastrectomy for histopathologically confirmed gastric cancer in the Endoscope Centre at the National University Hospital, Singapore. All patients signed informed consent permitting the investigative use of tissues (before endoscopy or surgery), and the study was approved by the ethics committee of the National Healthcare Group of Singapore.

After biopsy or surgical resection, tissue samples were immediately sent for Raman measurements. The instrument used for tissue Raman spectroscopic studies has been described in detail elsewhere19. Incident laser light, with a beam size of 1 mm, was focused on the mucosal surface of gastric tissue samples measuring approximately 3 × 3 × 2 mm to mimic in vivo clinical measurements. After measuring and marking the tissue surface, samples were fixed in 10 per cent formalin solution and submitted for histopathological examination. Only spectra that were correctly acquired from the surfaces of tissues, as verified against the histopathological results, were used for data analysis.

Raw spectra acquired from gastric tissue in the range 800–1800 cm−1 represent a combination of prominent tissue autofluorescence, weak tissue Raman scattering signals and noise. These raw spectra were preprocessed by a first-order Savitzky–Golay filter (window width of 3 pixels, corresponding to the system spectral resolution) to reduce noise20. A fifth-order polynomial7 was found to be optimal for fitting the broad autofluorescence background in the noise-smoothed spectrum, and this polynomial was then subtracted from the raw spectrum to yield the tissue Raman spectrum alone14. Each background-subtracted Raman spectrum was also normalized to the integrated area under the curve from 800 to 1800 cm−1, enabling better comparison of the spectral shapes and relative peak intensities among different tissue samples14.
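The preprocessing chain described above (smoothing, polynomial background removal, area normalization) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the iterative clipping inside `fit_background` is an assumption introduced here, because a plain least-squares polynomial fit leaves a residual with near-zero mean, which would make area normalization ill-defined. A first-order Savitzky–Golay filter with a 3-point window reduces to a 3-point moving average at interior points, which is what `smooth3` computes.

```python
import numpy as np

def smooth3(y):
    """3-point moving average (first-order Savitzky-Golay, window 3)."""
    y = np.asarray(y, dtype=float)
    out = y.copy()
    out[1:-1] = (y[:-2] + y[1:-1] + y[2:]) / 3.0  # end points left unchanged
    return out

def fit_background(x, y, order=5, n_iter=50):
    """Fifth-order polynomial background estimate.

    ASSUMPTION: the iterative clipping (keep the lower of data and fit,
    then refit) is not described in the text; it is used here so that the
    fitted curve settles under the Raman peaks.
    """
    t = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0  # rescale for conditioning
    work = np.asarray(y, dtype=float).copy()
    for _ in range(n_iter):
        fit = np.polyval(np.polyfit(t, work, order), t)
        work = np.minimum(work, fit)
    return fit

def preprocess(x, raw):
    """Smooth, fit background on the smoothed spectrum, subtract it from
    the raw spectrum, then normalize to unit integrated area."""
    x = np.asarray(x, dtype=float)
    raw = np.asarray(raw, dtype=float)
    background = fit_background(x, smooth3(raw))
    raman = raw - background
    area = np.sum((raman[1:] + raman[:-1]) / 2.0 * np.diff(x))  # trapezoid rule
    return raman / area
```

Calling `preprocess(wavenumbers, intensities)` on a 544-point spectrum spanning 800–1800 cm−1 returns a background-subtracted spectrum whose integrated area is 1.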

Statistical analysis

The high dimension of Raman spectral space (each Raman spectrum ranges from 800 to 1800 cm−1 with a set of 544 intensities) results in computational complexity and inefficiency in optimization and implementation of the MNLR algorithms21. As such, PCA was performed on the tissue Raman data set to reduce the dimension of Raman spectral space while retaining diagnostically significant information for tissue classification. To eliminate the influence of intersubject and/or intrasubject spectral variability on PCA, all spectra were standardized so that the mean of the spectra was zero. Mean centring ensures that the principal components (PCs) form an orthogonal basis13. Standardized Raman data sets were assembled into data matrices with wavenumber columns and individual case rows. Thus, PCA was performed on the standardized spectral data matrices to generate PCs comprising a reduced number of orthogonal variables that accounted for most of the total variance in the original spectra. Each loading vector is related to the original spectrum by a variable called the PC score, which represents the weight of that particular component against the basis spectrum. PC scores reflect the differences between different classes.
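As an illustration of the dimension reduction described above, the following sketch mean-centres a data matrix (rows are cases, columns are wavenumbers) and derives loadings and PC scores by singular value decomposition. It is a generic PCA sketch rather than the software used in the study; the function name is hypothetical, and column-wise mean centring is an assumed reading of the standardization step.

```python
import numpy as np

def pca_scores(spectra, n_components):
    """Return PC scores, loading vectors and explained-variance fractions.

    spectra: array of shape (n_cases, n_wavenumbers).
    """
    X = np.asarray(spectra, dtype=float)
    centred = X - X.mean(axis=0)            # mean centring per wavenumber
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    loadings = vt[:n_components]            # orthonormal rows
    scores = centred @ loadings.T           # weight of each loading per case
    explained = (s ** 2) / np.sum(s ** 2)   # fraction of total variance
    return scores, loadings, explained[:n_components]
```

For a 100 × 544 matrix of normalized spectra, `pca_scores(matrix, 10)` would return a 100 × 10 score matrix whose columns can then be tested for differences between tissue classes.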

One-way ANOVA22 was used to identify the most diagnostically significant PCs (P < 0·050) for separation of the three different tissue classes. These significant PC scores were selected as input for the development of MNLR algorithms for multiclass classification, with the restriction that the number of PC scores used for the MNLR model was at least twofold smaller than the number of spectra in the smallest model group (number of PC scores used ⩽ Nsmallest/2, where Nsmallest is the number of spectra in the smallest diagnostic group (diffuse-type adenocarcinoma; n = 12)), to avoid overfitting10. MNLR determines the posterior probabilities associated with the different data for each diagnostic category and assigns the data to the diagnostic category with the highest posterior probability23. The performance of the diagnostic algorithms rendered by the MNLR models for correctly predicting the tissue groups (for example normal versus intestinal-type adenocarcinoma) was estimated in an unbiased manner using the leave-one-sample-out, cross-validation method on all model spectra14. In this method, one sample (one spectrum) was held out from the data set and the entire algorithm including PCA and MNLR was redeveloped using the remaining tissue spectra. The algorithm was then used to classify the withheld spectrum. This process was repeated until all withheld spectra were classified.
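The cross-validation scheme described above can be sketched as below. This is a simplified illustration under several assumptions: all names are hypothetical, the multinomial logistic regression is fitted by plain gradient descent with a small ridge penalty, PC scores are rescaled to unit variance for numerical stability, and the ANOVA-based selection of PCs is omitted (the first `n_pcs` components are used instead). What it does preserve is the key point of the text: PCA and the classifier are both rebuilt without the held-out spectrum in every round.

```python
import numpy as np

# Cap stated in the text: number of PC scores <= N_smallest / 2, with n = 12.
MAX_PCS = 12 // 2

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_mnlr(scores, labels, n_classes, lr=0.5, n_iter=500, l2=1e-3):
    """Multinomial logistic regression by gradient descent (illustrative)."""
    n = scores.shape[0]
    xb = np.hstack([scores, np.ones((n, 1))])      # append intercept column
    y = np.eye(n_classes)[labels]                  # one-hot targets
    w = np.zeros((xb.shape[1], n_classes))
    for _ in range(n_iter):
        p = softmax(xb @ w)
        w -= lr * (xb.T @ (p - y) / n + l2 * w)
    return w

def posteriors(w, scores):
    xb = np.hstack([scores, np.ones((scores.shape[0], 1))])
    return softmax(xb @ w)

def leave_one_out(spectra, labels, n_classes, n_pcs):
    """Hold out one spectrum at a time; refit PCA and MNLR on the rest."""
    spectra = np.asarray(spectra, dtype=float)
    labels = np.asarray(labels)
    predicted = np.empty(len(labels), dtype=int)
    for i in range(len(labels)):
        train = np.arange(len(labels)) != i
        mean = spectra[train].mean(axis=0)
        _, _, vt = np.linalg.svd(spectra[train] - mean, full_matrices=False)
        loadings = vt[:n_pcs]
        train_scores = (spectra[train] - mean) @ loadings.T
        scale = train_scores.std(axis=0)
        w = fit_mnlr(train_scores / scale, labels[train], n_classes)
        held_out = ((spectra[i:i + 1] - mean) @ loadings.T) / scale
        # Assign the class with the highest posterior probability.
        predicted[i] = posteriors(w, held_out).argmax(axis=1)[0]
    return predicted
```

Comparing the returned predictions with the histopathological labels gives the cross-validated classification results.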

For assessment of the diagnostic sensitivity and specificity of the NIR Raman spectroscopy technique, histopathological results served as the standard. Post hoc comparison using Fisher's least significant differences test24 was also used to analyse the pairwise differences between different tissue types. Statistical analysis was performed using the statistical software package STATISTICA version 7.0 (StatSoft, Tulsa, Oklahoma, USA). P < 0·050 was considered significant in all statistical tests.


A total of 25 spectra (samples) were excluded from analysis owing to difficulties in orientating the gastric tissue samples in a way that mimicked in vivo tissue Raman collection, or where histology indicated heterogeneous pathologies within the gastric tissue sample. The remaining 100 homogeneous gastric tissue samples with clearly defined pathologies, from 62 patients (34 men and 28 women with a median age of 61 years), were therefore analysed.

Of these 100 gastric tissue samples, 70 were histologically normal and 30 were adenocarcinomas. According to Laurén's classification17, 18 cancers were of intestinal type and 12 diffuse type. Fig. 1a shows the mean NIR Raman spectra from normal, intestinal-type adenocarcinoma and diffuse-type adenocarcinoma gastric tissues. Prominent Raman peaks were observed in both normal and neoplastic tissues at the following locations with their respective tentative biochemical assignments7–8, 14–16: 875 cm−1 (C–C stretching modes of collagen), 1004 cm−1 (C–C symmetrical stretch ring breathing of phenylalanine), 1100 cm−1 (C–C stretching modes of phospholipids), 1230 cm−1 (C–C6H5 stretching mode of tryptophan and phenylalanine), 1265 cm−1 (C–N stretching and N–H bending modes of amide III of proteins), 1335 cm−1 (CH3CH2 twisting of proteins and nucleic acids), 1450 cm−1 (CH2 bending of proteins and lipids), 1655 cm−1 (C=O stretching of amide I of proteins) and 1745 cm−1 (C=O stretching mode of phospholipids). Adenocarcinomas (both intestinal and diffuse) were markedly different in spectral shapes and intensities compared with normal tissue (Fig. 1b). Adenocarcinomas had lower intensities at 875, 1004, 1100, 1230 and 1745 cm−1, but higher intensities at 1265, 1335, 1450 and 1655 cm−1. Comparison of Raman spectra between the two subtypes of adenocarcinoma (intestinal versus diffuse) showed that the diffuse type had notably higher Raman peak intensities at 875, 1100 and 1450 cm−1, but lower at 1655 cm−1.

Figure 1.

a Comparison of mean near-infrared Raman spectra of 70 normal stomach samples, 18 intestinal-type adenocarcinomas and 12 diffuse-type adenocarcinomas. b Difference spectra were calculated from the mean Raman spectra among the three gastric tissue types

ANOVA on the obtained PC scores showed that six PCs (PC1, PC5, PC6, PC7, PC8 and PC10) were diagnostically significant for classification of the three different types of gastric tissue. Fig. 2 displays the six significant PCs calculated from PCA–ANOVA on the Raman spectra. The first PC accounted for the largest variance (54·9 per cent of the total), and generally represents variations in the major tissue Raman peaks (i.e. 875, 1004, 1100, 1230, 1265, 1335, 1450, 1655 and 1745 cm−1). Successive PCs (PC5, PC6, PC7, PC8 and PC10) describe the spectral features that contribute progressively smaller variances (PC5, 3·5 per cent; PC6, 1·9 per cent; PC7, 1·5 per cent; PC8, 1·0 per cent; PC10, 0·7 per cent), containing a wide range of diagnostic information that includes features at the shoulders of the different prominent Raman peaks (for example PC5 loading reveals a peak at 846 cm−1 which is at the shoulder of the prominent Raman peak at 875 cm−1).
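For readers unfamiliar with how a one-way ANOVA screens PC scores, the F statistic for one PC across the tissue groups can be computed as in the sketch below. The function name is hypothetical and the code is a textbook formulation, not the STATISTICA routine used in the study. Converting F to a P value additionally requires the F distribution, which is not implemented here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists,
    one list per tissue group."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

With three groups of 70, 18 and 12 spectra, the degrees of freedom are 2 and 97; a PC is retained when the resulting P value falls below 0·050.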

Figure 2.

The first six diagnostically significant principal components (PCs) (P < 0·050, one-way ANOVA), accounting for about 63·4 per cent of the total variance calculated from the Raman spectral data set, revealing the diagnostically significant spectral features for tissue classification
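The combined variance quoted in the caption can be checked against the individual percentages reported in the text. The short calculation below uses only those published one-decimal values; the variable names are illustrative.

```python
# Percentage of total variance per significant PC, as reported in the text.
variance_pct = {
    "PC1": 54.9, "PC5": 3.5, "PC6": 1.9,
    "PC7": 1.5, "PC8": 1.0, "PC10": 0.7,
}

total_pct = sum(variance_pct.values())
print(f"Combined variance of the six PCs: {total_pct:.1f} per cent")
```

The rounded values sum to 63·5 per cent, which agrees with the quoted figure of about 63·4 per cent to within the rounding of the individual terms.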

Fig. 3 shows the relationship between the diagnostically significant PC scores and the different gastric tissue types. Fisher's least significant difference tests showed that different PC scores had different degrees of diagnostic utility for classification of the gastric tissue types (normal tissue, intestinal-type adenocarcinoma, diffuse-type adenocarcinoma). For instance, PC1 is optimal in discriminating normal tissue from intestinal and diffuse adenocarcinoma; PC5 shows efficacy in classification of the three different gastric tissue types; PC6 can be used for differentiating intestinal-type adenocarcinoma from normal tissue and diffuse-type adenocarcinoma; PC7 can be used to distinguish diffuse-type adenocarcinoma from normal tissue and intestinal-type adenocarcinoma; and PC8 and PC10 can be used to separate intestinal-type adenocarcinoma from normal tissue.

Figure 3.

Box charts of the six significant principal component (PC) scores for the three gastric tissue types (normal, intestinal-type adenocarcinoma and diffuse-type adenocarcinoma): a PC1, b PC5, c PC6, d PC7, e PC8 and f PC10. The line within each notch box represents the median, and the lower and upper boundaries of the box indicate the first (25th percentile) and third (75th percentile) quartiles respectively. Error bars (whiskers) represent the 1·5-fold interquartile range. *P < 0·001, †P < 0·010, ‡P < 0·050 (pairwise comparison of tissue types with post hoc multiple comparison tests (Fisher's least significant differences))

Fig. 4 is a ternary plot25 derived when all six diagnostically significant PCs were loaded into the MNLR model to generate effective diagnostic algorithms for tissue classification. The plot depicts the posterior probabilities of each data point for the three tissue types, providing a three-class diagnostic model for classification. Each vertex of the ternary plot represents a 100 per cent posterior probability of belonging to normal tissue, intestinal-type adenocarcinoma or diffuse-type adenocarcinoma, and the final diagnostic category of each data point is the one whose vertex lies nearest to it. The clustering of the normal, intestinal-type adenocarcinoma and diffuse-type adenocarcinoma gastric tissues at the respective vertices (Fig. 4) demonstrated the efficacy of the PCA–MNLR algorithms for identification of different types of gastric tissue.
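The mapping from three posterior probabilities to a point on a ternary plot, and the nearest-vertex rule, can be sketched as follows. The vertex layout (normal at bottom left, intestinal type at bottom right, diffuse type at the top) and all names are assumptions chosen for illustration; the published figure may be oriented differently. Within the triangle, the nearest vertex always corresponds to the class with the highest posterior probability.

```python
import math

# Assumed vertex positions of an equilateral triangle with unit side.
VERTICES = {
    "normal": (0.0, 0.0),
    "intestinal": (1.0, 0.0),
    "diffuse": (0.5, math.sqrt(3) / 2),
}

def ternary_xy(p_normal, p_intestinal, p_diffuse):
    """Convert three posterior probabilities to 2-D plot coordinates."""
    total = p_normal + p_intestinal + p_diffuse   # guards against rounding drift
    pi, pd = p_intestinal / total, p_diffuse / total
    return pi + 0.5 * pd, (math.sqrt(3) / 2) * pd

def nearest_vertex(point):
    """Name of the vertex closest to a plotted point."""
    return min(VERTICES, key=lambda name: math.dist(point, VERTICES[name]))

# A spectrum with posteriors (0.2, 0.7, 0.1) is assigned to the intestinal type.
example_class = nearest_vertex(ternary_xy(0.2, 0.7, 0.1))
```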

Figure 4.

Two-dimensional ternary plot of the posterior probabilities belonging to normal tissue, intestinal-type and diffuse-type adenocarcinoma, illustrating the good clustering of the three different gastric tissue types achieved by the principal components analysis–multinomial logistic regression diagnostic algorithms

Table 1 summarizes the diagnostic indices for Raman spectra using PCA–MNLR together with the leave-one-spectrum-out, cross-validation method in classifying the three different types of gastric tissue. Diagnostic sensitivities of 91, 78 and 75 per cent, specificities of 80, 95 and 96 per cent, and accuracies of 88, 92 and 94 per cent respectively, were achieved for differentiation between normal tissue, intestinal-type and diffuse-type gastric adenocarcinomas.

Table 1. Classification results of Raman prediction of the three gastric tissue groups using principal components analysis–multinomial logistic regression together with a leave-one-sample-out, cross-validation method

                             Raman prediction
Tissue type         Normal   Intestinal type   Diffuse type   Total
Intestinal type        4           14                0          18
Diffuse type           2            1                9          12
Sensitivity (%)       91           78               75
Specificity (%)       80           95               96
Accuracy (%)          88           92               94
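The indices in Table 1 follow from a one-versus-rest reading of the confusion matrix, which can be sketched as below. The function name is hypothetical. Because the row for histologically normal tissue is not present in the table as reproduced here, only the sensitivities of the two adenocarcinoma rows are recomputed from the published counts; specificity and accuracy need the full matrix.

```python
def diagnostic_indices(confusion):
    """One-versus-rest sensitivity, specificity and accuracy per class.

    confusion[r][c] = number of class-r samples predicted as class c.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    results = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp
        fp = sum(confusion[r][k] for r in range(n)) - tp
        tn = total - tp - fn - fp
        results.append({
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "accuracy": (tp + tn) / total,
        })
    return results

# Rows taken from Table 1 (columns: normal, intestinal type, diffuse type).
intestinal_row = [4, 14, 0]
diffuse_row = [2, 1, 9]
intestinal_sensitivity = 100 * intestinal_row[1] / sum(intestinal_row)
diffuse_sensitivity = 100 * diffuse_row[2] / sum(diffuse_row)
```

These evaluate to 77·8 and 75·0 per cent, matching the reported sensitivities of 78 and 75 per cent.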


Tissue Raman spectra provide biochemical and biomolecular information for tissue diagnosis and characterization13, 15. Raman spectra from different pathological types of gastric tissue are, however, very similar (Fig. 1a), and sophisticated statistical analytical procedures are required to elucidate these subtle spectral changes14, 15. Teh and colleagues14 applied PCA to distinguish Raman spectra in dysplastic and normal gastric tissue. Stone and co-workers9 used PCA to identify different diagnostic features contained in the Raman spectrum for epithelial tissue analysis. The present study also used PCA to identify discriminatory Raman signals for diagnosis of gastric adenocarcinoma. This analysis showed that the maximum variance (PC1) in the gastric Raman data set generally represented the variation between adenocarcinoma and normal tissue. This provides strong evidence that significant molecular differences between normal and cancer gastric tissue can be revealed with NIR Raman spectroscopy. The spectral features, showing differences between different types of adenocarcinoma and normal gastric tissue, are in agreement with other Raman studies on gastric malignancies14–16. The five Raman peak intensities that decrease significantly with malignancy indicate a reduction in the percentage of different biochemical components such as collagen, phenylalanine, tryptophan and lipids related to the total Raman-active components in gastric cancer tissue. Raman peak intensities at 1265, 1335, 1450 and 1655 cm−1, which increase in adenocarcinoma tissues, mainly represent increases in amide III, DNA, proteins/lipids and amide I. Changes in different protein-related Raman signals might relate to different proteomic activities in the cytoplasm7, 8, 14–16, 26 and nucleus as well as changes in the extracellular matrix15, 16. Significant variations in the Raman features representing lipids and nucleic acids probably relate to increases in metabolic activities and nucleic acid/cytoplasm ratio7, 12, 15.

Significant discrepancies between histological classification of cancer subtype based on endoscopic biopsy and resection specimens from the same patient have frequently been reported in gastric adenocarcinoma27. The present results show that there are specific Raman spectral differences between intestinal- and diffuse-type adenocarcinoma, confirming the utility of NIR Raman spectroscopy for subtyping. Changes in Raman spectra for the two subtypes of gastric adenocarcinoma are consistent with histopathological features. Diffuse-type adenocarcinoma is usually accompanied by marked stromal reactions such as desmoplasia28, which increases the collagen content of the tissue compared with intestinal-type adenocarcinoma. In all gastric cancers, proliferation of malignant cells thickens the gastric mucosal layer, which attenuates the excitation laser power and obscures collagen Raman emission (for example at 875 cm−1) from the deep collagen basal membrane. Even so, the relatively greater amount of stromal collagenous fibrous protein in diffuse-type gastric adenocarcinoma still gave rise to higher collagen signals in the Raman spectra for diffuse-type cancers (Fig. 1). This phenomenon has also been observed in autofluorescence spectroscopy for gastric cancer detection29. The prominent Raman peak at 1450 cm−1 (generally associated with cellular lipids and proteins) was also found to be significantly higher for diffuse-type cancer than for intestinal-type cancer and normal tissues. This probably reflects higher concentrations of intracytoplasmic mucin in diffuse-type gastric adenocarcinoma28.

It should be noted that the tissue Raman spectra acquired from homogeneous ex vivo gastric tissues with clearly defined pathologies may not truly reflect the in vivo clinical conditions during gastroscopic inspection, because the in vivo Raman signal may contain a mixture of spectral information from both the neoplastic lesion and the surrounding ‘normal’ gastric tissue. This might apply to very small early gastric cancer (less than 1 mm in diameter)2–4 with respect to the tissue volume probed by the Raman detection system (approximately 2 mm). Further development of a confocal fibreoptic Raman probe with micron-scale probing volumes may solve the problem.

The PCA–MNLR algorithms (Figs 2–4) showed that Raman spectra provide critical diagnostic information (approximately 7 per cent of the total variation in tissue Raman signals in the entire Raman spectrum from 800 to 1800 cm−1) for distinguishing between diffuse- and intestinal-type adenocarcinomas (Figs 2 and 3). The consistency in identifying significant PC scores (PC1, PC5, PC6, PC7, PC8 and PC10) from run to run, during the leave-one-spectrum-out, cross-validation testing suggests that the diagnostic algorithms developed are robust for classification of Raman spectra in different gastric tissues. Diagnostic accuracies of 88, 92 and 94 per cent were achieved for differentiation between normal, intestinal-type and diffuse-type adenocarcinoma tissues respectively.

NIR Raman spectroscopy in conjunction with PCA–MNLR can be used to characterize biomolecular differences between normal and adenocarcinoma gastric tissues. Good differentiation between intestinal and diffuse types of gastric adenocarcinoma can be achieved. This might influence strategies for intervention in patients with gastric cancer. The successful development and implementation of a Raman endoscopic system with different wide-field imaging modalities (white-light reflectance, autofluorescence and narrow-band imaging)30, 31 has led to in vivo Raman endoscopic measurements in patients with gastric cancer during clinical gastroscopy at this centre. It is hoped that this technology will determine the clinical merit of Raman spectroscopy for the early detection and subtyping of gastric adenocarcinoma.


This research was supported by the National Medical Research Council, the Biomedical Research Council, the Academic Research Fund from the Ministry of Education and the Faculty Research Fund from the National University of Singapore. The authors declare no conflict of interest.