Early Detection and Diagnosis
Detection of epithelial ovarian cancer using 1H-NMR-based metabonomics
Article first published online: 21 OCT 2004
Copyright © 2004 Wiley-Liss, Inc.
International Journal of Cancer
Volume 113, Issue 5, pages 782–788, 20 February 2005
How to Cite
Odunsi, K., Wollman, R. M., Ambrosone, C. B., Hutson, A., McCann, S. E., Tammela, J., Geisler, J. P., Miller, G., Sellers, T., Cliby, W., Qian, F., Keitz, B., Intengan, M., Lele, S. and Alderfer, J. L. (2005), Detection of epithelial ovarian cancer using 1H-NMR-based metabonomics. Int. J. Cancer, 113: 782–788. doi: 10.1002/ijc.20651
- Issue published online: 8 DEC 2004
- Manuscript Accepted: 1 JUL 2004
- Manuscript Received: 24 OCT 2003
- Cancer Research Institute/Ludwig Institute for Cancer Research Cancer Vaccine Collaborative Grant
- Roswell Park Cancer Center Support. Grant Number: P30CA16056
- NYSTAR faculty development grant
- NCI. Grant Number: KO7CA89123
Keywords:
- ovarian cancer
- early diagnosis
- 1H-NMR spectroscopy
- pattern recognition
Currently available serum biomarkers are insufficiently reliable to distinguish patients with epithelial ovarian cancer (EOC) from healthy individuals. Metabonomics, the study of metabolic processes in biologic systems, is based on the use of 1H-NMR spectroscopy and multivariate statistics for biochemical data generation and interpretation and may provide a characteristic fingerprint in disease. In an effort to examine the utility of the metabonomic approach for discriminating sera from women with EOC from healthy controls, we performed 1H-NMR spectroscopic analysis on preoperative serum specimens obtained from 38 patients with EOC, 12 patients with benign ovarian cysts and 53 healthy women. After data reduction, we applied both unsupervised Principal Component Analysis (PCA) and supervised Soft Independent Modeling of Class Analogy (SIMCA) for pattern recognition. The sensitivity and specificity tradeoffs were summarized for each variable using the area under the receiver-operating characteristic (ROC) curve. In addition, we analyzed the regions of NMR spectra that most strongly influence separation of sera of EOC patients from healthy controls. PCA analysis allowed correct separation of all serum specimens from 38 patients with EOC (100%) from all of the 21 premenopausal normal samples (100%) and from all the sera from patients with benign ovarian disease (100%). In addition, it was possible to correctly separate 37 of 38 (97.4%) cancer specimens from 31 of 32 (97%) postmenopausal control sera. SIMCA analysis using the Cooman's plot demonstrated that sera classes from patients with EOC, benign ovarian cysts and the postmenopausal healthy controls did not share multivariate space, providing validation for the class separation. ROC analysis indicated that the sera from patients with and without disease could be identified with 100% sensitivity and specificity at the 1H-NMR regions 2.77 parts per million (ppm) and 2.04 ppm from the origin (AUC of ROC curve = 1.0). 
In addition, the regression coefficients most influential for the EOC samples compared to postmenopausal controls lie around δ3.7 ppm (due mainly to sugar hydrogens). Other loadings most influential for the EOC samples lie around δ2.25 ppm and δ1.18 ppm. These findings indicate that 1H-NMR metabonomic analysis of serum achieves complete separation of EOC patients from healthy controls. The metabonomic approach deserves further evaluation as a potential novel strategy for the early detection of epithelial ovarian cancer. © 2004 Wiley-Liss, Inc.
Epithelial ovarian cancer (EOC) is the leading cause of death from gynecologic malignancies. There are more than 23,000 cases annually in the United States, and 14,000 women can be expected to die from the disease in 2003.1 Despite important advances in surgery and chemotherapy over the past 20 years, the overall survival for patients with EOC has not changed significantly. The high mortality rate of EOC occurs primarily because most women are diagnosed with advanced disease (stage III/IV), which has a 5-year survival rate of 15–20%.1 In contrast, the small proportion of patients with accurately diagnosed stage I disease have 5-year survival rates in excess of 90%.2 Current candidate strategies for the detection of EOC are based on biochemical tumor markers, such as CA125, and biophysical markers assessed by ultrasound and/or Doppler imaging of the ovaries. Unfortunately, the positive predictive values (PPV) of these strategies for the early detection of EOC have been consistently less than 10%.3, 4 Attempts to improve the performance characteristics of these early detection strategies have met with limited success; they include the use of complex longitudinal algorithms for CA125,5, 6, 7 sequential testing8, 9 and the addition of newer markers such as OVX-1,10 M-CSF,11 lysophosphatidic acid12 and osteopontin.13 In light of these considerations, novel approaches such as functional genomics14 and proteomics15 have been used to identify differential expression of proteins between cases and controls. However, these assays are, as yet, laborious, expensive and not well validated.
An alternative approach for early detection of EOC is to utilize a novel strategy that provides a coherent perspective of the complete metabolic response of organisms to pathophysiologic insult or genetic modification.16 This approach to the study of metabolic processes in biologic systems has been termed metabonomics.16 Metabonomics is based on the use of NMR (and other spectroscopic methods) and multivariate statistics for biochemical data generation and interpretation. NMR spectroscopy is based on the behavior of atoms placed in a static external magnetic field. Atomic nuclei with nonzero spin can give rise to NMR signals. Nuclei possessing this property include 1H, 13C, 15N and 31P. Since protons are present in almost all metabolites in body fluids, a 1H-NMR spectrum allows the simultaneous detection and quantification of thousands of proton-containing, low-molecular-weight species within a biologic matrix, resulting in the generation of an endogenous profile that may be altered in disease to provide a characteristic “fingerprint.”16, 17, 18, 19 Therefore, we hypothesized that the analysis of a global view of metabolites in serum would enhance the possibility of identifying metabonomic signatures for EOC. To test this hypothesis, we investigated the ability of 1H-NMR metabonomics to completely separate patients with EOC from healthy age-matched controls in the general population. In addition, we studied the regions of the NMR spectrum that most strongly influence separation between EOC and healthy controls.
Material and methods
Collection and preparation of specimens
From July 2001 to December 2002, preoperative serum samples of patients undergoing surgery for EOC at the Roswell Park Cancer Institute (RPCI) were collected under an approved IRB protocol. The verification of tumor tissue pathology was performed by a gynecologic pathologist (M.E.I.). For controls, the sera of normal healthy women (pre- and postmenopausal controls) and from patients with benign ovarian cysts were collected under 2 additional IRB protocols. Sera from healthy premenopausal women were collected from women attending the gynecology clinics at the Roswell Park Cancer Institute. For postmenopausal women, sera were collected from participants in a nutrition intervention study. Within 2 hr of collection of blood by venipuncture, the sera were separated by centrifugation, and aliquots were stored at −80°C until assayed.
1H-NMR spectroscopic analysis of the serum samples
The serum samples were thawed immediately before use, and 100 μl of each was diluted with 99.9% D2O (450 μl) in 5 mm precision NMR tubes (Norell, Landisville, NJ) to provide field-frequency lock. 1H chemical shifts were referenced internally to the residual water (HOD) resonance at δ4.9802, measured relative to the primary internal chemical shift reference trimethylsilyl-2,2,3,3-tetradeuteropropionic acid at δ0.00. Conventional 1H-NMR spectra of the serum samples were measured on a Bruker AMX-600 spectrometer (Billerica, MA) operating at a 1H frequency of 600.13 MHz at 278K. To suppress the large water signal, the 1H-NMR spectra were acquired using the NOESYPR1D pulse sequence: RD - 90° - t1 - 90° - tm - 90° - acquire free induction decay (FID), where RD represents a relaxation delay of 1.5 sec during which the water resonance is selectively irradiated; t1 represents the first increment in a NOESY experiment and corresponds to a fixed interval of 4 μs; and tm is the mixing time in the NOESY sequence and has a value of 100 ms, during which the water resonance is again selectively irradiated. For each sample, 128 FIDs were collected into 64K data points using a spectral width of 12.2 kHz and an acquisition time of 2.69 sec. The FIDs were multiplied by an exponential weighting function corresponding to a line broadening of 0.25 Hz before Fourier transformation (FT).
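The acquisition and processing parameters above can be illustrated with a minimal Python sketch. It assumes that the 64K data points are 65,536 real plus imaginary values (32,768 complex points), which is consistent with the stated 2.69 sec acquisition time at a 12.2 kHz spectral width, and that the exponential weighting has the usual form exp(-π·LB·t). The function names are hypothetical and this is not the authors' processing software.

```python
import numpy as np

def acquisition_time(n_points_total, spectral_width_hz):
    """Acquisition time under quadrature detection: complex points / spectral width."""
    n_complex = n_points_total // 2  # assumption: 64K counts real + imaginary points
    return n_complex / spectral_width_hz

def process_fid(fid, spectral_width_hz, line_broadening_hz=0.25):
    """Apply exponential line broadening to a complex FID, then Fourier transform it."""
    n = fid.shape[0]
    t = np.arange(n) / spectral_width_hz              # sampling times in seconds
    window = np.exp(-np.pi * line_broadening_hz * t)  # exponential weighting function
    spectrum = np.fft.fftshift(np.fft.fft(fid * window))
    freqs = np.fft.fftshift(np.fft.fftfreq(n, d=1.0 / spectral_width_hz))
    return freqs, spectrum

if __name__ == "__main__":
    print(round(acquisition_time(64 * 1024, 12200.0), 2))
```

Larger line-broadening values improve signal-to-noise at the cost of resolution; the 0.25 Hz used here is a mild weighting.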
Data reduction of NMR data
Each 1H-NMR spectrum was corrected for phase and baseline distortion using NutsPro (version 20021122, Acorn NMR Inc., Livermore, CA) and the 9.5–0.0 parts per million (ppm) spectral region was reduced to 200–250 integral segments of equal width of δ0.04. This optimal width of segmented regions is based on previous studies,20, 21 which found that regions of 0.04 ppm accommodated any small pH-related shifts in signals and variation in shimming quality. To eliminate any spurious effects of variability in the suppression of water resonance, the region containing the water resonance (δ5.5 to 4.75) was set to zero integral. Subsequently, all remaining frequency regions of the spectra were scaled to the total integrated area of the spectra, mean-centered and pareto-scaled.22 Pareto scaling gives each variable a variance numerically equal to its standard deviation.
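The data reduction steps described above (segmentation into 0.04 ppm regions, zeroing of the water region, scaling to total area, mean-centering and pareto scaling) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, segment integrals are approximated by summing uniformly sampled intensities, and the exact segment boundaries used by the authors' software are not known.

```python
import numpy as np

def bin_spectrum(ppm, intensity, lo=0.0, hi=9.5, width=0.04, water=(4.75, 5.5)):
    """Sum intensities into equal-width ppm segments and zero the water region."""
    n_bins = int((hi - lo) / width + 1e-9)          # 237 full segments for 0.0-9.5 ppm
    edges = lo + width * np.arange(n_bins + 1)
    integrals, _ = np.histogram(ppm, bins=edges,
                                weights=np.asarray(intensity, dtype=float))
    centers = 0.5 * (edges[:-1] + edges[1:])
    integrals[(centers >= water[0]) & (centers <= water[1])] = 0.0
    return centers, integrals

def normalize_total_area(X):
    """Scale each row (spectrum) so that its segments sum to 1."""
    return X / X.sum(axis=1, keepdims=True)

def pareto_scale(X):
    """Mean-center each column and divide by the square root of its standard deviation."""
    centered = X - X.mean(axis=0)
    scale = np.sqrt(X.std(axis=0, ddof=1))
    scale[scale == 0] = 1.0                          # e.g. the zeroed water segments
    return centered / scale
```

After pareto scaling, the variance of each non-constant column equals its original standard deviation, which is the property stated in the text.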
Principal component analysis (PCA) of the 1H-NMR spectra
Principal component analysis (PCA) is an unsupervised method (i.e., analysis performed without use of knowledge of the sample class) that reduces the dimensionality of the data input while expressing much of the original n-dimensional variance in a 2- or 3-D map.23 This is accomplished essentially through a statistical “grouping” of variables (metabolic signals) that have strong correlations with one another into a smaller set of variables known as factors. The factors themselves are not intercorrelated and thus represent distinct patterns of metabolic signals. The variables associated with a specific factor produce factor loadings that are quantified as the correlation of that variable with the factor. Individuals are then assigned a score for each factor calculated as the sum of the weighted factor loadings for each factor. Prior to PCA analysis, all NMR data were mean-centered and pareto-scaled22 to give each variable a variance numerically equal to its standard deviation. PCA was carried out on the 1H-NMR data from the sera of EOC patients and controls to plot data in order to indicate relationships between samples in the multidimensional space. The principal components were displayed as a set of scores (t), which highlight clustering or outliers, and a set of loadings (p), which highlight the influence of input variables on t.
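The scores (t) and loadings (p) mentioned above can be illustrated with a minimal PCA sketch based on the singular value decomposition. It assumes the input matrix has already been mean-centered and scaled, with samples in rows and spectral segments in columns; the function name `pca` is hypothetical and the sketch is not the software used in the study.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA of a centered matrix X (samples x variables) via SVD.

    Returns scores t (samples x components), loadings p (variables x components)
    and the fraction of total variance explained by each retained component.
    """
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    loadings = Vt[:n_components].T
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, loadings, explained[:n_components]
```

Plotting the first two columns of the scores against each other gives the kind of 2-D map in which clustering and outliers become visible, while the loadings show which spectral segments drive each component.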
Soft independent modeling of class analogy (SIMCA)
To provide validation of the results, a supervised analysis of the data was performed based on Soft Independent Modeling of Class Analogy (SIMCA). SIMCA utilizes the features of PCA to construct significance limits for specified classes of samples in the scores and the residual direction. Mapping of unknown samples onto the calculated models provides the class identity based on similarity between the unknown samples and the samples in the predefined class models. A method of visualizing the SIMCA approach is the Cooman's plot,24 which plots class distances against each other. We built separate PCA models for the sera of EOC patients, patients with benign ovarian cysts and postmenopausal healthy controls. SIMCA was then applied to the models using the Cooman's plot to assess the classification performance by predicting class membership in terms of distance from the model. The critical distance from the model used corresponded to a 0.05 level and defined a 95% tolerance interval.
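The class-distance idea behind SIMCA can be sketched in a few lines. The sketch below assumes one common formulation, in which the distance of a sample to a class is its residual standard deviation after projection onto that class's PCA model, divided by the pooled residual standard deviation of the class. The names `ClassModel` and `coomans_coordinates` are hypothetical, and the F-based critical distance at the 0.05 level is deliberately omitted.

```python
import numpy as np

class ClassModel:
    """PCA model of one class, used to measure how far a sample lies from it."""

    def __init__(self, X, n_components=2):
        n, k = X.shape
        self.mean = X.mean(axis=0)
        centered = X - self.mean
        _, _, Vt = np.linalg.svd(centered, full_matrices=False)
        self.P = Vt[:n_components].T                      # class loadings
        residual = centered - centered @ self.P @ self.P.T
        dof = (n - n_components - 1) * (k - n_components)
        self.s0 = np.sqrt((residual ** 2).sum() / dof)    # pooled residual SD of the class
        self.k_minus_a = k - n_components

    def distance(self, X_new):
        """Residual SD of each new sample relative to the class residual SD."""
        centered = np.atleast_2d(X_new) - self.mean
        residual = centered - centered @ self.P @ self.P.T
        s = np.sqrt((residual ** 2).sum(axis=1) / self.k_minus_a)
        return s / self.s0

def coomans_coordinates(model_a, model_b, X_new):
    """Distances to two class models, i.e. the two axes of a Cooman's plot."""
    return np.column_stack([model_a.distance(X_new), model_b.distance(X_new)])
```

In such a plot, samples that fall below the critical distance for one model and above it for the other are assigned to the first class; samples below both limits share multivariate space between the classes.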
Receiver operating characteristic curve (ROC) analysis
Although principal component analysis is an excellent tool for data reduction and hence graphical display, it does not lend itself to the development of a diagnostic model to predict presence or absence of disease. To address this, we performed univariate ROC analyses via individual logistic regressions for each of 219 1H-NMR regions in order to examine their utility for predicting EOC. The sensitivity and specificity trade-offs were summarized for each variable using the area under the ROC curve, denoted AUC and calculated using the trapezoidal rule. An AUC value of 1.0 corresponds to a prediction model with 100% sensitivity and 100% specificity, whereas an AUC value of 0.5 corresponds to a model that discriminates no better than chance (see Pepe et al.25 for an overview of ROC analyses via logistic regression modeling). The best 2-variable models were then fit starting from the univariate information via a forward stepwise selection using the AUC as the criterion for a variable's entry into the model. Due to the high degree of accuracy for a 2-variable model, we did not find it necessary to proceed to a 3-variable model.
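The trapezoidal AUC calculation described above can be sketched as follows. The sketch scores each sample directly with a single variable rather than with a fitted logistic regression; for a univariate model this gives the same ROC curve whenever the fitted probability increases with the variable, because a monotone transformation does not change the ranking of samples. The function names are hypothetical.

```python
import numpy as np

def roc_points(scores, labels):
    """False and true positive rates at every threshold (higher score = case)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    fpr, tpr = [0.0], [0.0]
    for threshold in np.unique(scores)[::-1]:       # from strictest to most lenient
        predicted = scores >= threshold
        tpr.append((predicted & labels).sum() / labels.sum())
        fpr.append((predicted & ~labels).sum() / (~labels).sum())
    return np.array(fpr), np.array(tpr)

def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```

A forward stepwise search of the kind described in the text would compute this AUC for each candidate variable and add the one that raises it most.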
Analysis of spectral pattern differences
Based on the results of unsupervised PCA, supervised SIMCA and ROC analyses, we proceeded to identify the molecules responsible for the differences in spectral patterns utilizing a previously described methodology.26 The regions of the NMR spectrum that most strongly influence separation between EOC and healthy controls were identified by the regression coefficients. The coefficients were derived from the PCA models such that each bar represents a spectral region covering 0.04 ppm, showing how the 1H-NMR profile of the EOC samples differed from the 1H-NMR profile of the healthy serum samples. A negative value indicates a relatively greater concentration of metabolite (assigned using NMR chemical shift assignment tables) present in EOC samples, and a positive value indicates a relatively lower concentration, with respect to EOC samples.
Results
Characteristics of the patients
The stage distribution of the 38 EOC patients was as follows: stage I, 2 patients; stage IIIC, 34 patients; stage IV, 2 patients (Table I). Among patients with advanced disease (stages IIIC and IV), 4 (11%) had normal preoperative serum CA125 levels (<35 units/ml). In addition, preoperative CA125 was normal in 1 of the 2 patients with stage I disease. The age range of the study patients was 46–86 years. The age range of the 19 healthy premenopausal controls was 22–44 years, whereas the remaining 32 postmenopausal controls had an age range of 51–69 years. The age range of the 12 patients with benign ovarian cysts was 22–68 years. There were no significant differences between the EOC patients and postmenopausal controls with respect to age, parity and use of oral contraceptives.
| Characteristics | Epithelial ovarian cancer patients | Patients with benign ovarian disease | Postmenopausal controls | Premenopausal controls |
|---|---|---|---|---|
| No. of subjects | 38 | 12 | 32 | 19 |
| Age, years (median/range) | 61 (46–86) | 50 (22–68) | 57 (51–69) | 28 (22–44) |
| Others (transitional, mixed) | 2 | - | - | - |
| Hemorrhagic functional cyst | - | 3 | - | - |
| Use of oral contraceptives | Undocumented | | | |
1H-NMR spectra of sera from EOC patients and controls
Once the NMR spectra were obtained, the parameters of chemical shift and signal intensity were analyzed. The chemical shift is the difference between a specific resonance frequency and the frequency of a chosen reference. As indicated above, chemical shifts are referenced to trimethylsilyl-2,2,3,3-tetradeuteropropionic acid and expressed in parts per million (ppm). Methyl group protons resonate from 0.7–2.0 ppm in C-CH3 groups but from 2.1–3.5 ppm in N-CH3 groups. Aromatic ring protons resonate from 6.0–9.0 ppm. Thus, the chemical shift itself carries information on molecular structure and can be used to discriminate the 1H-NMR spectra of molecules, even when their chemical structures differ only slightly. Detailed chemical shift information for different chemical groups has been published,27, 28 and chemical components were assigned to the spectra on the basis of these published data.29, 30 Figure 1a shows the 600 MHz 1H-NMR spectrum of serum from a postmenopausal patient with stage I EOC, Figure 1b shows the spectrum from a healthy postmenopausal woman, Figure 1c shows the spectrum from a healthy premenopausal woman, and Figure 1d shows the spectrum from a patient with a benign ovarian cyst (ovarian endometriosis).
To remove any ambiguity in assigned chemical shift values, a sample was “spiked” with small amounts of 3 reference compounds to test whether perfect superposition of the signals could be achieved. Alanine was added first, followed by valine and then glucose, with spectra acquired after each addition. In each case, the resonances of the reference compound fell directly on top of the assigned resonances in the biofluid.
PCA analysis of 1H-NMR spectra of sera from EOC patients and controls
The data-reduced 1H-NMR-spectra are shown in Figure 2. Subsequent PCA analysis of the dataset indicated good discrimination between EOC patients and controls. Thus, we were able to correctly separate all of the 38 cancer specimens (100%) and all of the 21 premenopausal normal samples (100%) (Fig. 3a). In addition, it was possible to correctly separate 37 of 38 (97.4%) cancer specimens and 31 of 32 (97%) postmenopausal control serum specimens (Fig. 3b). When patients with benign ovarian disease were included in the PCA analysis, it was still possible to correctly separate all of 38 cancer specimens (100%) from the sera of all 12 patients with benign ovarian disease (Fig. 3c). Although sera from patients with benign disease overlapped with sera from the healthy controls, it was possible to achieve separation of cancer vs. noncancer cases. All PCA plots indicated that most of the variation occurred in the first 2 principal components. Since the majority of EOC patients in our study and in clinical practice are postmenopausal, we chose to perform further analysis by comparing the benign and cancer patients with healthy postmenopausal controls.
In addition to PCA, supervised analysis consisting of Soft Independent Modeling of Class Analogy (SIMCA) was applied to the dataset. The resulting Cooman's plot demonstrated that sera classes from patients with EOC, benign ovarian cysts and the postmenopausal healthy controls did not share multivariate space, providing validation for the class separation (Fig. 4). Therefore, it should be possible to predict whether future samples can be classified as cancer or noncancer. These preliminary data demonstrated that 1H-NMR-based metabonomic analysis of serum samples could achieve a clinically useful performance for the identification of serum samples of patients with EOC.
ROC analysis using the raw location data showed that a 2-variable model consisting of the 1H-NMR descriptors at 2.77 and 2.04 ppm provided a perfect fitting model, i.e., AUC = 1.0. A scatter plot is provided in Figure 5, which clearly illustrates the delineation between the 2 groups. Of note, the univariate model that considered only region 2.04 ppm gave an AUC = 0.942, whereas the univariate model for region 2.77 ppm gave an AUC = 0.689, i.e., prediction based upon region 2.04 ppm is enhanced conditional upon the information contained in region 2.77 ppm.
Influential loadings responsible for spectral differences between sera from EOC patients and controls
Based on the promising results showing complete separation of patients with EOC and controls using unsupervised PCA, supervised SIMCA and ROC analyses applied to 1H-NMR spectra of sera, we proceeded to identify the molecules responsible for the differences in spectral patterns. In general, the regression coefficients, or loadings, most influential for the EOC samples compared to postmenopausal controls lie around δ3.7 ppm (due to various sugar hydrogens), whereas another influential loading for the EOC samples compared to premenopausal controls is located around 2.25 ppm (unassigned) (Fig. 6). Other loadings (e.g., in the regions 2.4–2.3 and 1.18 ppm) suggest greater amounts of 3-hydroxybutyrate in the sera of EOC patients compared to pre- and postmenopausal controls. However, the loadings at 2.25 ppm and between 2.35 and 2.40 ppm are characterized by underlying broad peaks that could stem from macromolecules rather than 3-hydroxybutyrate. To validate these observations, additional reference compounds were utilized in spiking experiments. These included 3-hydroxybutyrate, acetoacetate, acetone and isobutyrate. We were able to demonstrate that 3-hydroxybutyrate was correctly identified. This compound was clearly visible in half of the spectra from patients with EOC and in none of the spectra from the pre- or postmenopausal controls, nor was it seen in the spectra of patients with benign ovarian disease. In contrast, acetoacetate, acetone and isobutyrate did not contribute to the observed spectral pattern differences.
Discussion
Significant obstacles to the early detection of EOC exist. The ovaries are small, relatively inaccessible organs that lie in the peritoneal cavity, and most masses that arise in the ovaries are benign. Because of the relative rarity of EOC, a cost-effective early detection program will require either a very accurate screening method or a method of triaging women by risk or both. Therefore, an effective, clinically useful test for early detection of EOC should be measurable in a readily accessible body fluid, such as blood, urine or saliva. In addition, such marker(s) should provide high sensitivity, specificity and positive predictive value. Until recently, the search for EOC-related biomarkers for early disease detection has been a one-at-a-time approach to look for proteins that are overexpressed as a consequence of the disease process and are shed into body fluids.10 Unfortunately, this approach is laborious and time-consuming, as each candidate biomarker must be identified from among the thousands of intact and cleaved proteins in human serum. Antibodies would then need to be developed to validate and check the protein marker for specificity and sensitivity. However, the novel approach of 1H-NMR metabonomics is especially well suited to the discovery and implementation of an early detection strategy, as body fluids are an acellular, metabolite-rich information reservoir that contains the end products of homeostatic alterations during disease processes. Our data indicate that the sera from patients with EOC and healthy postmenopausal women could be identified with 100% sensitivity and specificity at the 1H-NMR regions 2.77 ppm and 2.04 ppm from the origin (AUC of ROC curve = 1.0). The 2 patients in our study with stage I EOC were also correctly classified as cases. In addition, the sera of patients with EOC were correctly separated from those with benign ovarian disease. 
These results suggest that high-resolution 1H-NMR-based metabonomic analysis of serum deserves further evaluation as a potential novel strategy for the early detection of epithelial ovarian cancer.
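The dependence of positive predictive value on disease rarity, raised at the start of this discussion, follows from Bayes' rule and can be sketched in a few lines. The prevalence and specificity values in the usage example are hypothetical numbers chosen only for illustration; they are not estimates from this study.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV = true positives / (true positives + false positives), per screened woman."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

if __name__ == "__main__":
    # Hypothetical screening scenario: 4 cases per 10,000 women screened.
    print(positive_predictive_value(sensitivity=1.0, specificity=0.99, prevalence=0.0004))
```

Under these assumed inputs the PPV stays below 10% even with perfect sensitivity, which is why very high specificity, risk triage or both are needed for a rare disease.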
NMR-based metabonomics offers several distinct advantages in a clinical setting. First, it can be carried out on standard preparations of serum, plasma or urine, circumventing the need for specialist preparations of cellular RNA and protein required for genomics and proteomics, respectively.19, 31, 32, 33 Second, since cancer is now known to be a product of the tumor-host microenvironment,34 the organ-specific milieu can generate, and enzymatically modify, multiple proteins, peptides, metabolites and cleavage products at much higher concentrations than for molecules derived only from the tumor cells. However, biologic NMR spectra are extremely complex and much information can be lost, even in rigorous statistical analysis of quantitative data, as the essential diagnostic parameters are carried in the overall patterns of the spectra. Therefore, to reduce NMR data complexity and facilitate analysis, automatic data reduction followed by chemometric methods such as principal components analysis (PCA) and partial least squares-discriminant analysis (PLS-DA) can be applied.17 Recently, an integrated metabonomic approach was applied to investigation of the presence and severity of coronary heart disease (CHD).35 It was possible to completely separate CHD patients with stenosis of all 3 major arteries from subjects with normal coronary arteries using both unsupervised PCA and supervised PLS-DA applied to 1H-NMR spectra of human serum.35 In another study, Brindle et al.36 were able to distinguish serum samples from subjects with low/normal systolic blood pressure from borderline and high systolic blood pressures using NMR spectroscopy. These studies demonstrate the potential ability of 1H-NMR-based metabonomics to distinguish serum samples of individuals affected and unaffected by disease, without requiring preselection of measurable analytes.
The initial report indicating that 1H-NMR spectroscopy of plasma might be useful for cancer detection was published in 1986 by Fossel et al.37 The study was based on the measurements of 1H-NMR spectra of plasma at either 360 or 400 MHz at 20–22°C on 331 subjects including controls, patients with various types of malignant and benign tumors and pregnant women, and examination of the spectra by applying a parameter, Fossel Index, FI, which is calculated as a mean of the approximate widths at half-height of the methylene and methyl resonance envelopes. Although it appeared possible to clearly and reliably distinguish between normal controls (FI = 39.5 ± 1.6 Hz) and patients with malignant tumors (FI = 29.9 ± 2.5 Hz), in many subsequent studies, a remarkable overlap between cancer patients and controls was noted. This led to an intensive inter-laboratory evaluation of the reproducibility and accuracy of the NMR test for cancer by Chmurny et al.38 This test was found to be reproducible but not accurate for screening a general asymptomatic population for cancer. There are several limitations of these early studies. First, affected subjects in these studies had cancer of different organ sites and histologies. Clearly, there is great variability in the biology, invasiveness and metastatic potential of different tumors, and it would be surprising to find a single test that could reliably detect all or even a large number of cancers.38 Second, and most important, the early NMR studies are different from metabonomics because of the significant improvements in computationally intense and robust analytic methods.
At the present time, there is no clearly defined precancerous phase of EOC. However, it is clear that even the detection of early asymptomatic invasive stage I/II disease could have a profound impact on clinical outcome. Therefore, our encouraging results will need to be validated in a larger cohort of women with well-defined clinical correlates such as histologic type, tumor grade and response to therapy. In addition, since novel spectroscopic methods are not really required to diagnose advanced EOC (stages III/IV), our study lays the framework for future studies to assess the performance characteristics of 1H-NMR metabonomics alone, or in combination with other markers for the detection of stage I/II EOC, before conducting prospective screening and cancer control studies.39 In this regard, we have initiated a larger study aimed at validating this metabonomic model in the general and high-risk populations. In addition, it will be important to evaluate the utility of the metabonomic approach in monitoring the course of EOC during treatment. Finally, the elucidation of molecules responsible for 1H-NMR spectral differences in the sera of patients with early EOC compared to healthy controls could lead to the identification of a panel of novel biomarkers and/or targets for therapeutic intervention.
A.H. is supported by a NYSTAR faculty development grant. S.E.M. is supported by NCI grant KO7CA89123.
- 3. NHS Centre for Reviews and Dissemination. Screening for ovarian cancer. Database of Abstracts of Reviews of Effectiveness, 2003.
- 23. Introduction to multi- and megavariate data analysis using projection methods (PCA & PLS). Umea, Sweden: Umetrics, 1999.
- 27. Spectral data for structure determination of organic compounds. Berlin: Springer-Verlag, 1989.