Susceptibility to false discovery in biomarker research using liquid chromatography–high resolution mass spectrometry based untargeted metabolomics profiling

Dear Editor,
Our study demonstrates that biomarker research using liquid chromatography (LC)-high resolution (HR) mass spectrometry (MS) based untargeted metabolomics profiling is susceptible to the discovery of false positive biomarkers.
LC-MS, especially LC-HRMS, is widely used to discover putative biomarkers by comparing untargeted metabolomic profiles between a patient group and a control group. 1,2 This approach is susceptible to various pre-analytical, analytical, and post-analytical biases. 3 Moreover, isotopes, adducts, in-source fragmentation products of some metabolites, artifacts, and contaminants could be wrongly considered as unique metabolomic features. 4,5 The extent to which putative metabolomic biomarkers could be false remains unknown.
We attempted to identify putative biomarkers for differentiating two artificial groups of plasma samples (12 samples in each group) with well-defined differences in their metabolome contents (Figure 1A, Table 1; Tables S1 and S2). By design, a maximum of 22 putative biomarkers can be true, and all other putative biomarkers must be false.
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2021 The Authors. Clinical and Translational Medicine published by John Wiley & Sons Australia, Ltd on behalf of Shanghai Institute of Clinical Bioinformatics
The number of metabolomic features depended on the signal-to-noise ratio threshold (snthresh) used for feature extraction (Table 2). A snthresh of 5 has been widely used (Table S3). Using a snthresh of 5 and a false discovery rate (FDR) cutoff of 5% for data mining, 22 true biomarkers (i.e., true positives) and 165 false positive biomarkers were observed (Table 3; Table S4). Therefore, the actual FDR (i.e., the proportion of putative biomarkers that were false positives) was 88% instead of 5% (Table 2). Increasing the snthresh and reducing the sample size (e.g., n = 6 for each group) could decrease the number of false positive biomarkers. However, the actual FDR remained > 60% (Table 2; Table S5). We also performed a negative control experiment using two groups of plasma samples (n = 12 for each group) having identical metabolome contents. No differential metabolomic features were found (Table S6). This indicated that our experimental methods did not suffer from any obvious pre-analytical or analytical biases.
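The arithmetic behind the 88% figure can be reproduced in a few lines. This is only a sketch: the function name is an illustrative assumption and not part of the study's pipeline, while the counts (22 true positives, 165 false positives) are those reported above.

```python
def false_discovery_proportion(true_positives: int, false_positives: int) -> float:
    """Return the fraction of reported discoveries that are false.

    Hypothetical helper for illustration; not taken from the study's code.
    """
    total = true_positives + false_positives
    if total == 0:
        # No discoveries were reported, so nothing can be a false discovery.
        return 0.0
    return false_positives / total


# Counts reported for snthresh = 5 and a nominal FDR cutoff of 5%.
observed = false_discovery_proportion(true_positives=22, false_positives=165)
print(f"Observed false discovery proportion: {observed:.0%}")  # 88%
```

The nominal 5% cutoff controls the expected proportion of features whose intensities do not truly differ between groups. The redundant features counted here do differ between groups, which is presumably why they pass the statistical filter even though they are not independent biomarkers.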
We revealed the identities of 93% (154) of the false positive biomarkers (Table 4; Table S4). Eighty-five percent (141) of the false positive biomarkers were contributed by in-source fragmentation products (29), in-source complexes (23), adducts (79), and isotopes (10) of the true biomarkers (Figure 1B-1G), whereas 8% (13) were contributed by metabolites irrelevant to the true biomarkers (Figure S1). We named these two types "relevant false positive biomarkers" and "irrelevant false positive biomarkers," respectively. Adducts, in-source complexes, and similar species should be considered false positives because their signal intensities cannot be accurately measured in biological specimens. 8 Metabolomic features corresponding to 88 (62%) of the relevant false positive biomarkers were observed in the LC-HRMS profiles of the respective mixtures of the pure metabolites (i.e., Set A and Set B, Table S7). This confirmed that they originated from the true biomarkers. Although CAMERA and MS-FLO were used to annotate and remove redundant peaks, redundant features remained in the final list of metabolomic features (Table S4). Similar annotation mistakes can be found in previous studies. 6,7 It was not possible to differentiate the relevant false positive biomarkers from the true biomarkers according to signal intensity change (Figure 1H). However, they could be identified through manual inspection of the MS and MS/MS spectra. A true biomarker and its relevant false positive biomarkers shared nearly the same retention time and common MS/MS fragmentation products (Figures 1B-1G).
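Part of this manual inspection, namely checking whether co-eluting features differ by a well-known adduct or isotope mass difference, can be sketched as follows. This is a simplified illustration and not the algorithm used by CAMERA or MS-FLO. The function name, the tolerances, and the feature list are assumptions, and the m/z values are approximate theoretical values for tryptophan ions rather than measurements from the study. MS/MS comparison is not covered.

```python
from itertools import combinations

# Standard monoisotopic mass differences (Da) between related ions, positive mode.
KNOWN_DELTAS = {
    "13C isotope": 1.003355,
    "[M+Na]+ vs [M+H]+": 21.981944,
    "[M+K]+ vs [M+H]+": 37.955881,
}


def flag_redundant_pairs(features, rt_tol=0.05, mz_tol=0.005):
    """Flag co-eluting feature pairs whose m/z difference matches a known delta.

    features: list of (feature_id, mz, retention_time_min) tuples.
    rt_tol and mz_tol are illustrative tolerances, not values from the study.
    Returns a list of (lower_mz_id, higher_mz_id, label) tuples.
    """
    flagged = []
    for (id_a, mz_a, rt_a), (id_b, mz_b, rt_b) in combinations(features, 2):
        if abs(rt_a - rt_b) > rt_tol:
            continue  # not co-eluting, so unlikely to come from the same metabolite
        delta = abs(mz_a - mz_b)
        for label, expected in KNOWN_DELTAS.items():
            if abs(delta - expected) <= mz_tol:
                low, high = (id_a, id_b) if mz_a < mz_b else (id_b, id_a)
                flagged.append((low, high, label))
    return flagged


# Hypothetical feature list: approximate theoretical tryptophan ions plus
# one unrelated feature eluting at a different retention time.
features = [
    ("F1", 205.0972, 3.20),  # [M+H]+
    ("F2", 206.1005, 3.20),  # 13C isotope of [M+H]+
    ("F3", 227.0791, 3.21),  # [M+Na]+
    ("F4", 180.0634, 5.80),  # unrelated feature
]
pairs = flag_redundant_pairs(features)
for low, high, label in pairs:
    print(f"{high} is likely redundant with {low}: {label}")
```

A match on retention time and mass difference only makes a feature a candidate for redundancy. As noted above, confirmation still relies on inspecting the MS and MS/MS spectra.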

FIGURE 1 (A) When using real patient samples for biomarker discovery, it is not possible to tell which putative biomarkers are false, and it is also not possible to avoid pre-analytical biases. To overcome these two problems, two groups of plasma samples (12 samples in each group) mimicking those collected from a disease group and a non-disease group were created by spiking separately with two different sets of 11 metabolite standards into a preparation of pooled human plasma. Each of the 24 plasma samples (i.e., 12 from the disease group and 12 from the non-disease group) was treated as a unique patient sample and subjected to separate metabolite extraction and LC-HRMS based untargeted metabolomics profiling. XCMS was employed for feature extraction, grouping and retention time alignment. CAMERA and MS-FLO were used to annotate and remove the redundant peaks. At a false discovery rate of ≤ 5%, metabolomic features with statistically significant differential intensities were regarded as putative biomarkers. By the design, a maximum of 22 putative biomarkers could be true, and the rest of the putative biomarkers were false. Six relevant false positive biomarkers which were identified as the in-source fragmentation products (B, C) and adducts (D, E) of a true biomarker (e.g., Tryptophan, Trp). The retention times of the in-source fragmentation products (B) and adducts (D) was the same as that of

For the irrelevant false positive biomarkers, one possible cause for their presence was matrix effect. 9 In electrospray ionization MS, matrix effect is characterized as an alteration of signal response by co-eluting substances. Nine (69%) of 13 irrelevant false positive biomarkers had their retention times overlapping with the spiked metabolites (Table 4).
Since the negative control experiment did not find any differential metabolomic features, our data suggested that increasing the plasma levels of 22 metabolites (11 in each study group) by about two-fold was sufficient to alter the "matrix" significantly. Fortunately, the fold-changes of the irrelevant false positive biomarkers were significantly lower than those of the true biomarkers and the relevant false positive biomarkers (Figure 1H). A fold-change cutoff of 1.5 could reject 11 of 13 irrelevant false positive biomarkers in the present study, thus suggesting that any differential metabolomic features with a fold-change < 1.
Unlike real biomarker discovery studies, in the present study the "true" biomarkers and other metabolites were identical among the plasma samples within each study group. This "homogeneous" design allowed us to identify artifacts unambiguously using a small sample size. Given the complexity of metabolomes among patient specimens in real life, the false positive rate of putative metabolomic biomarkers in a real study could be higher or lower than the value we report. The number of false positive biomarkers is expected to depend on various factors, such as the number, quantity, and properties of the true biomarkers, complex interactions between the true biomarkers and background metabolites in individual specimens, the detection sensitivity of the MS platform, and the study sample size. We showed that a larger sample size (n = 12 vs. n = 6 for each group) resulted in more false positive biomarkers because of higher statistical power. In general, about 50 cases or more for each study group are needed for reliable biomarker discovery. 10 We speculate that, with sufficient statistical power, similar types of false positive biomarkers could be identified.
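The fold-change cutoff of 1.5 described above can be applied as a simple filter after statistical testing. The sketch below assumes that each feature is summarized by its mean intensity in each group. The function name, feature names, and intensity values are hypothetical and are not data from the study.

```python
def passes_fold_change(mean_a: float, mean_b: float, cutoff: float = 1.5) -> bool:
    """Return True if the group means differ by at least `cutoff`-fold.

    The ratio is taken in whichever direction is larger, so both increases
    and decreases are treated symmetrically. Hypothetical helper.
    """
    if mean_a <= 0 or mean_b <= 0:
        raise ValueError("mean intensities must be positive")
    ratio = max(mean_a, mean_b) / min(mean_a, mean_b)
    return ratio >= cutoff


# Hypothetical mean intensities (group A, group B) for three differential features.
candidates = {
    "feature_1": (2000.0, 1000.0),  # 2.0-fold higher in group A
    "feature_2": (1200.0, 1000.0),  # 1.2-fold, below the cutoff
    "feature_3": (1000.0, 1600.0),  # 1.6-fold higher in group B
}
retained = [name for name, (a, b) in candidates.items() if passes_fold_change(a, b)]
print(retained)  # ['feature_1', 'feature_3']
```

Such a filter addresses only the low fold-change, irrelevant false positives. It does not remove relevant false positives, whose fold-changes were comparable to those of the true biomarkers (Figure 1H).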
To our knowledge, this study is the first to demonstrate that the use of LC-HRMS based metabolomic profiling for metabolite biomarker discovery is susceptible to false discovery. Without knowing their identities, it is risky to treat any statistically differential features as putative biomarkers. Relevant false positive biomarkers could be identified through careful manual inspection of the retention time, MS, and MS/MS information. Informatics tools that can accurately and effectively annotate LC-HRMS features need to be developed. Irrelevant false positive biomarkers could be minimized by using an appropriate fold-change cutoff (Figure 1I). Other approaches, such as multicenter studies, validation using an independent patient cohort, and orthogonal chemical analysis, should also help eliminate false positive biomarkers.

CONFLICT OF INTEREST
The authors declare no conflict of interest.

AVAILABILITY OF DATA AND MATERIAL
All supplementary tables and figures are provided with the manuscript. The LC-HRMS raw data files of the present study are available at the NIH Common Fund's National Metabolomics Data Repository (NMDR) website, the Metabolomics Workbench,

TABLE 3 Summary of 22 true biomarkers among the 187 putative biomarkers discovered by comparing the metabolomic profiles of plasma samples mimicking those collected from diseased subjects (n = 12) and non-diseased subjects (n = 12). The metabolomic features were extracted using a snthresh value