Limitations of using 16S rRNA microbiome sequencing to predict oral squamous cell carcinoma

A new era of next‐generation sequencing has changed our perception of the oral microbiome in health and disease, and with this there is a growing understanding that the oral microbiome is a contributing factor to oral squamous cell carcinoma (OSCC), a malignancy of the oral cavity. This study aimed to analyse the trends and relevant literature based on the 16S rRNA oral microbiome in head and neck cancer using next‐generation sequencing technologies, and to conduct a meta‐analysis of the studies with OSCC cases and healthy controls. A literature search using the databases Web of Science and PubMed was conducted in a scoping‐like review to collect information based on the study design, and plots were generated using RStudio. We selected case–control studies using 16S rRNA oral microbiome sequencing analysis in OSCC cases versus healthy controls for re‐analysis. Statistical analyses were conducted using R. Out of 916 original articles, we filtered and selected 58 studies for review, and 11 studies for meta‐analysis. Differences between sampling type, DNA extraction methods, next‐generation sequencing technology and region of the 16S rRNA were identified. No significant differences in the α‐ and β‐diversity between health and oral squamous cell carcinoma were observed (p < 0.05). Random Forest classification marginally improved predictability of four studies (training set) when split 80/20. We found an increase in Selenomonas, Leptotrichia and Prevotella species to be indicative of disease. A number of technological advances have been accomplished to study oral microbial dysbiosis in oral squamous cell carcinoma. There is a clear need for standardization of study design and methodology to ensure 16S rRNA outputs are comparable across the discipline in the hope of identifying ‘biomarker’ organisms for designing screening or diagnostic tools.

thought to reside in the oral cavity (5). There has been a marked upward trend in oral microbiome research over the past 10 years, and this is also true in relation to oral cancer (10). Oral microbes can induce carcinogenic changes leading to enrichment of lipopolysaccharide (LPS) biosynthesis and epigenetic modulation causing pro-inflammatory changes in the local tumour microenvironment (5,11). These changes are often influenced by foreign carcinogenic substances induced through smoking and tobacco by-products, which can be broken down by microbial metabolites (12).
Studies have implicated Fusobacterium, Pseudomonas, Porphyromonas, Prevotella, Campylobacter, Rothia and Leptotrichia in the progression of OSCC (10,11,13). These bacteria are often present in the surrounding tumour environment; however, recent evidence has shown that a few species, such as Fusobacterium nucleatum and Porphyromonas gingivalis, can reside within the tumour itself (14). These 'intratumoural' bacteria play a key role in modulating immune-related changes leading to a more aggressive, enhanced tumour form (15,16). Several studies have been published utilizing amplicon sequencing of the oral microbiome in OSCC; however, study design, including sampling technique, nucleic acid extraction methods, sequencing technique and region of 16S rRNA selected for analysis, varies dramatically between studies, ultimately hindering the ability to compare findings (13,14,17). Nevertheless, it is now thought that specific members of the oral microbial community promote genetic instability, tumour proliferation and changes to the host metabolism contributing to resistance to therapy (18).
In a systematic review conducted in 2021, Mun et al. (10) concluded that there is evidence for the functional properties of the oral microbiome in OSCC, and analysis of the oral microbiome with meta-transcriptomics could further improve our understanding. The study of the oral microbiome remains a potential resource in diagnostic and therapeutic clinical intervention of OSCC. Although revolutionary, NGS has its own set of limitations, hence we hypothesised that a re-analysis of oral microbiome datasets could resolve gaps within the current research. The aim of the study was to collect and analyse publicly available datasets on 16S rRNA sequencing of the oral microbiome in OSCC using a reproducible, standardized pipeline for downloading, processing, and interpreting the data (19). Additionally, this study aims to profile the functional potential to identify and classify key organisms that may act as predictors of disease within the OSCC microbiome.

Search criteria
We utilized an analysis pipeline recently developed by our group, as illustrated in Fig. 1. A full methodology has been included as supplementary material and significant deviations have been briefly listed here (19). Studies were collected by using keyword searches on PubMed and Web of Science (Clarivate Analytics, Philadelphia, Pennsylvania, USA) to select for microbiome and NGS studies, performed on the oral cavity, and specifically related to oral squamous cell carcinoma (Table S1). These were filtered to exclude any study published before 2012 to coincide with the advancement of microbiome sequencing platforms (20). Remaining studies were exported to the reference manager EndNote X9 (Clarivate Analytics, Philadelphia, Pennsylvania, USA).

Study inclusion and exclusion criteria
Studies were excluded if they were not oral microbiome amplicon studies, had no data accession number, were not mappable to individual samples, or had no available metadata for individuals within the study. We also excluded studies with data 'available upon reasonable request', in vitro studies, metagenomic studies and transcriptomic studies.
Shortlisting of studies and retention of key information was carried out by CLRV. These then underwent a two-step process for verification and inclusion by another laboratory-based clinician (SS). Shortlisted articles were then assigned a score from 1 to 5 based on data access and cohort metadata inclusion as outlined previously (19).

Data retrieval and processing
Available data were downloaded from the European Nucleotide Archive (ENA) database and processed as described previously unless otherwise stated (19). In brief, quality control protocols were carried out in Qiime2 (21). Primers and barcodes were removed and reads trimmed if below 100 bp. Paired-end reads were merged and prefiltered using SortMeRNA against the Silva-bac-id90 database (22) before being assigned to operational taxonomic units (OTUs) by mapping to the human oral microbiome database (HOMD) and GreenGenes.
Operational taxonomic unit clustering was performed in closed-reference mode using the Vsearch (https://github.com/qiime2/q2-vsearch) package within Qiime2 (v 2019.10). Phylogenetic trees were constructed for all representative OTUs using the FastTree algorithm within Qiime2 (23). OTU tables were exported and combined with study metadata tables and imported into R for manipulation and visualization.

Microbiome diversity and composition analyses
Diversity and compositional analyses were carried out as previously described (19). Briefly, α- and β-diversity analyses were calculated using phyloseq. All samples within the case-control studies were normalized to the geometric mean; significance was determined using Welch's t-test, and the mean log fold change was calculated for each of the disease vs health samples.
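As an illustration of the significance test and fold-change calculation named above, the following is a minimal Python sketch and not the R code used in this study. The function names, the pseudocount of 1 and the choice of log base 2 are assumptions made for the example. Only the t statistic and the Welch-Satterthwaite degrees of freedom are computed; a p-value would then be read from the t distribution with those degrees of freedom.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom).

    Unlike Student's t-test, the two groups are not assumed to share a variance.
    """
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def mean_log2_fold_change(disease, health, pseudocount=1.0):
    """Mean log2 value in disease minus mean log2 value in health.

    The pseudocount (an assumption of this sketch) avoids log(0).
    """
    log_disease = mean(math.log2(x + pseudocount) for x in disease)
    log_health = mean(math.log2(x + pseudocount) for x in health)
    return log_disease - log_health
```

A positive fold change in this sketch means the feature is, on average, more abundant in the disease group.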
Fig. 1. Study design, collection of metadata and data processing. (A) Based on our inclusion criteria, a total of 916 studies were screened, 58 studies were included in the review, and 19 eligible studies with publicly available data were included in the re-analysis. (B) Overview of study characteristics including countries, sex, sequencing technique, region of 16S rRNA, sample type, DNA extraction method/kit, smoking and alcohol status of the healthy and diseased cohort. Illumina MiSeq and the V4 region of the 16S rRNA were most popular for sequencing. Saliva was the most common sampling method, and the number of samples from males, non-smokers and alcoholics was higher.
The centred log-ratio (CLR) Euclidean distance matrix was calculated using the make.CLR function within MicrobeR, with replacement of 0 counts using the zCompositions function, and then calculating the distance matrix in base R. The Phylogenetic Isometric Log-Ratio Transform (PhILR) from the phylogenetic trees, created as described above, was visualized using the R package PhiLR (24).
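The CLR Euclidean distance can be sketched in a few lines of Python. This is an illustrative stand-in rather than the MicrobeR or zCompositions implementation: zero counts are handled here with a simple additive pseudocount (an assumption of the sketch), and all function names are hypothetical.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform of one sample's feature counts.

    Each log count is centred on the mean log count, which equals the
    log of the sample's geometric mean.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    centre = sum(logs) / len(logs)
    return [v - centre for v in logs]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def clr_distance_matrix(samples, pseudocount=0.5):
    """Pairwise Euclidean distances between CLR-transformed samples."""
    transformed = [clr(s, pseudocount) for s in samples]
    return [[euclidean(a, b) for b in transformed] for a in transformed]
```

Because the transform is centred on the geometric mean, two samples with the same relative composition but different sequencing depth sit at (near) zero distance, which is the motivation for using CLR on compositional count data.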
The function ape::pcoa was then used to ordinate the distance matrices into a 2D plot (25). Ggplot2 was used for visualization and combining of plots within R. Permutational multivariate analysis of variance was performed and assessed for statistical significance using the ADONIS function within the vegan package (https://github.com/vegandevs/vegan), run individually on each distance matrix with 999 permutations.
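For readers unfamiliar with permutational multivariate analysis of variance, the sketch below shows the one-factor calculation in plain Python: a pseudo-F statistic and R² derived from a distance matrix, with significance estimated by shuffling the group labels. It is a simplified illustration and not the vegan implementation; the function names, the fixed random seed and the small numerical tolerance are assumptions of the example.

```python
import random


def _ss_within(dist, groups):
    """Within-group sum of squares computed from pairwise distances."""
    members = {}
    for idx, g in enumerate(groups):
        members.setdefault(g, []).append(idx)
    total = 0.0
    for idxs in members.values():
        squared = sum(dist[i][j] ** 2
                      for k, i in enumerate(idxs) for j in idxs[k + 1:])
        total += squared / len(idxs)
    return total


def permanova(dist, groups, n_perm=999, seed=0):
    """Return (pseudo-F, R2, permutation p-value) for a one-factor design."""
    n = len(groups)
    n_groups = len(set(groups))
    ss_total = sum(dist[i][j] ** 2
                   for i in range(n) for j in range(i + 1, n)) / n

    def pseudo_f(labels):
        ss_w = _ss_within(dist, labels)
        return ((ss_total - ss_w) / (n_groups - 1)) / (ss_w / (n - n_groups))

    f_obs = pseudo_f(groups)
    r2 = (ss_total - _ss_within(dist, groups)) / ss_total

    rng = random.Random(seed)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between labels and samples
        if pseudo_f(labels) >= f_obs - 1e-9:
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)  # the observed labelling counts once
    return f_obs, r2, p_value
```

R² here is the fraction of total variation attributable to the grouping, so a result can be statistically significant while still explaining only a small share of the variation between samples.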

Random forest classifiers
As described previously, four high-impact case-controlled studies were selected as training datasets for random forest classifiers (19,26). The data were randomly split in an 80:20 ratio to create training and validation datasets. Centred log-ratio normalized OTUs were used as predictor variables for healthy/disease cases. CLR-normalized features from the PICRUSt-derived KEGG orthology (KO) feature abundances and PhiLR abundances were additionally added.
Random forest models were built using the randomForest package, and receiver operating characteristic (ROC) curves were built based on random forest models using the pROC and ggROC packages (27,28). Prediction and performance metrics were extracted using the predict and performance functions from ROCR (29). The most important features were extracted from random forest models by ranking MeanDecreaseGini scores and plotted using ggplot2. Normalized OTUs from the PICRUSt-derived database were used to inform on the functionality of microbiome datasets, and the most important features of this model were used to identify enriched metabolic pathways by matching to the KEGG database using the clusterProfiler package in R (30). Random forest models based upon the random 80:20 assignment (80% training, 20% validation) were built iteratively for each individual case-controlled study. The area under the curve was produced using the predict and performance functions for assessment of each study.
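The two evaluation steps described above, a random 80:20 split and the area under the ROC curve, can be illustrated with a short Python sketch. It is not the randomForest, pROC or ROCR code used in this study; the function names and the seed are hypothetical, and the AUC is computed through its rank interpretation (the probability that a randomly chosen disease sample scores higher than a randomly chosen healthy sample).

```python
import random


def split_80_20(items, seed=0):
    """Shuffle the items and return (training 80%, validation 20%)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.8 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """Area under the ROC curve for binary labels (1 = disease, 0 = health).

    Every disease/health pair scores 1 if the disease sample is ranked
    higher, 0.5 for a tie and 0 otherwise; the AUC is the mean pair score.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

Under this reading, an AUC of 0.5 corresponds to a classifier that ranks samples no better than chance, and an AUC of 1 to perfect separation of the two classes.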

Data collection and review of publications
The initial search conducted on Web of Science yielded 770 articles, and an additional search on PubMed produced another 146 studies. The study design, as summarized in Fig. 1A, yielded a total of 58 studies relevant to the oral microbiome in OSCC, and subsequently 19 were selected for the meta-analysis of publicly available data. A detailed summary of the study design parameters of all 58 studies is included in Data S1. Out of the 19 studies, 11 case-controlled studies were included in the re-analysis if they had sufficient reads (>1000). The study parameters, as well as key findings, of the included 11 studies are summarized in Table 1.
We found nine studies that did not provide accessible sequencing data and were graded 0 for data access. Seventeen studies were graded 1 since the data provided could not be mapped to individual health/disease samples. Eleven studies had publicly available data with the healthy and diseased cohort described within the published article but not individually mappable to the sequencing data and were given a score of 2. A total of 21 studies had publicly available data, out of which five were given a score of 3 as metadata was both available and mappable to sequencing data. Sixteen studies were given a score of 4/5 if data were available, mappable, and additional metadata was provided (Data S1). Within the original search criteria we allowed for the inclusion of ITS or fungal amplicon-based data; only one such dataset remained, which was insufficient for re-analysis.
Overall, the metadata showed wide variances in sampling site, sampling technique, DNA extraction method, sequencing technique and region of 16S rRNA selected for analysis, as summarized in Fig. 1B. Out of the total number of samples (n = 1197) included, the most common sampling technique was saliva (48%). Around 55% of the studies had selected the V4 hypervariable region of the 16S rRNA, with the Illumina MiSeq sequencer used for 95% of studies. DNA extraction methods varied among the individual studies; however, 31% used the QIAamp DNA blood mini kit. We collected additional information regarding the alcohol consumption and smoking status among the studies (Fig. S1A,B), and other relevant information such as tumour size, stage and immune status of the enrolled OSCC subjects. Around 4% of the samples were from smokers, and 10% were from alcohol consumers.

Bacterial diversity analysis
Individual samples with <1000 total passed reads were removed, and the data were plotted to show the average total reads per sample, per study. Alpha diversity indexes were used to represent genus- or species-level diversity within individual samples, which was statistically compared between the cohort groups. The data were analysed using the observed, Shannon, Simpson and Chao1 diversity indexes on all samples. Minor differences were observed between the healthy and OSCC samples, which were not statistically significant, as seen in Fig. 2. The overall alpha diversity was slightly lower in the OSCC group, which was also non-significant. When comparing smokers with non-smokers, alpha diversity was slightly lower in the former group, although this was not significant. In alcoholics vs non-alcoholics, it was  S1).
Beta diversity is a metric used to derive differences on a sample-to-sample basis, in which diversity can be observed by clustering samples and analysing their level of dissimilarity. The resulting diversity is represented in a distance matrix from which ordination plots are generated to view patterns. Additionally, permutational multivariate analysis of variance (PERMANOVA) was tested via the ADONIS function (Table S2). UniFrac, Bray proportion, PhiLR Euclidean and Euclidean distance matrices were used to determine variation in the microbiome between collated samples. Ordination plots based upon principal coordinate analysis (PCoA) were drawn and coloured to show healthy (blue) vs OSCC (pink) samples in Fig. 3A. There was some clustering observed in the PhiLR distance matrix, while statistically significant differences were observed (p < 0.001) across all diversity matrices when comparing health and disease groups via ADONIS testing. The PhiLR, weighted UniFrac and Bray-Curtis data types had R² values of 0.024, 0.011 and 0.013, respectively. Individual studies clustered together (Fig. 3B), but became less clustered in the PhiLR Euclidean (R² = 0.042) and UniFrac proportion matrices (R² = 0.046).
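The four alpha diversity indexes named in this section can be written out compactly. The Python sketch below is illustrative only and is not the phyloseq code used for the analysis. It assumes that the Simpson index is reported as 1 minus the sum of squared proportions and that Chao1 takes its bias-corrected form; both are assumptions of the example, as is the function name.

```python
import math


def alpha_diversity(counts):
    """Observed richness, Shannon, Simpson and Chao1 for one sample.

    `counts` holds per-taxon read counts; at least one must be non-zero.
    """
    present = [c for c in counts if c > 0]
    total = sum(present)
    props = [c / total for c in present]

    observed = len(present)                          # taxa seen at least once
    shannon = -sum(p * math.log(p) for p in props)   # natural-log Shannon entropy
    simpson = 1.0 - sum(p * p for p in props)        # 1 - dominance

    singletons = sum(1 for c in present if c == 1)   # taxa seen exactly once
    doubletons = sum(1 for c in present if c == 2)   # taxa seen exactly twice
    # Bias-corrected Chao1: richness plus an estimate of unseen taxa.
    chao1 = observed + singletons * (singletons - 1) / (2 * (doubletons + 1))

    return {"observed": observed, "shannon": shannon,
            "simpson": simpson, "chao1": chao1}
```

Observed richness and Chao1 respond mainly to rare taxa, whereas Shannon and Simpson weight evenness, which is one reason the four indexes need not agree within a single study.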
Additionally, we analysed the 16S rRNA sequencing region and sampling methods chosen by individual studies and observed clustering within the V4 region and saliva (Fig. S2A,B). ADONIS analysis determined these to be statistically significant with a p < 0.001 (Table S2).

Predictability of OSCC diagnosis among the metadata
The 11 studies were analysed to test the predictability of OSCC samples using the receiver operating characteristic (ROC) curve from random forest classifiers. Four high-impact case-controlled studies, determined by citation number and sample volume, were selected as a training set (Fig. 4). The studies were tested across genus, species, KEGG Orthology (KO) assignment, OTUs and PhiLR. We also applied an 80/20 train-test split and calculated the area under the curve (AUC) for the training test set. The overall predictability slightly improved upon applying the 80/20 split. We found that the predictability of the KO group and the PhiLR group in the training set had an approximate AUC of 0.75. An AUC approaching 1 is a good measure of predictability, and the true positive rate of our health and OSCC samples in the training set was higher than in the complete case-controlled study set. In the genus, species and OTU groups, the AUC was around 0.5, and similar in both the complete and training study sets.

Predictive capacity of individual studies
The 11 individual studies were then subjected to random forest classification to test whether they could accurately predict OSCC or health. Overall, the studies Wolf et al.  (48) showed an AUC less than 0.75 for genus, but over 0.75 for the other parameters. These studies were the best predictors of health and disease. The change in alpha diversity, expressed as a log2 fold change, was also calculated. Four studies showed a significant increase in the Simpson index, while one study showed an increase across all four indexes (Fig. 5B). Two studies showed a significant increase in the Shannon and observed indexes, while one study showed an increase in the Chao1 index. However, no significant difference was observed among the other seven studies. Significance was calculated using Welch's t-test, with a p-value <0.05 considered statistically significant. Overall, only 3 of the 11 studies had good predictability, while Granato et al. (50) had an AUC of 1, which could be due to overfitting of data. The overall alpha diversity was also not consistently significantly higher or lower across the various studies, which may mean that the probability of these studies accurately predicting health and disease is quite low.

Distribution of organisms across health and disease
From the random forest classifier based upon our case-controlled studies, the variable importance was determined for the individual features from the models built from species-level OTUs and KEGG orthology-level features (Fig. 6). The Gini coefficient was used to rank each variable by importance; this metric indicates how much each variable contributed to creating a strong classification tree. From the HOMD-identified species, we see that there are a varying number of genera within our important features. Our top three species from the HOMD database with a correspondingly higher abundance in disease, according to the mean decrease in Gini, were unclassified Selenomonas sp. HMT 126, unclassified Leptotrichia sp. HMT 223 and Prevotella denticola. The three species Acidovorax caeni, unclassified Actinomyces sp. HMT 175 and unclassified Stomatobaculum sp. HMT 373 were the most important features with a higher abundance in health compared to disease. LogFC values were low when comparing our two groups, with no features exhibiting larger than 1.5 LogFC. Similarly, our models were not able to accurately classify between health and disease using bacterial features. This resulted in low values for our mean decrease in Gini, as none of our variables were accurate predictors between health and OSCC. A similar finding was observed when using KEGG orthologies derived from our PICRUSt analysis. Many of the important variables exhibited only a low mean decrease in Gini, and additionally the corresponding fold changes were small between our two groups of health and disease. Important features were grouped into pathways to elucidate any discriminating pathways based upon our highest scoring variables of interest (Fig. 7). We observed some significant clustering of metabolic pathways related to the two-component system, nitrogen metabolism and starch and sucrose metabolism, with overrepresentation of metabolites in our disease cohort illustrated by a negative fold change in metabolic potential.
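To make the mean decrease in Gini more concrete, the following Python sketch computes the Gini impurity of a set of class labels and the impurity decrease produced by a single threshold split on one feature. It is a simplified illustration, not the randomForest implementation: in a random forest, decreases of this kind are accumulated over every split that uses a variable and averaged across trees. The function names are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(values, labels, threshold):
    """Impurity decrease from splitting samples at `values <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted
```

A feature whose abundance separates health from disease cleanly produces a large decrease, whereas a feature with overlapping abundance in both groups produces a decrease near zero, consistent with the low importance scores reported here.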

DISCUSSION
Oral squamous cell carcinoma is an increasingly important area of oral health, so improving our diagnostic capabilities is essential. A microbiological focus is one critical line of enquiry. The purpose of our meta-analysis was to review and analyse the current trends in study design and processing of oral microbiome 16S rRNA datasets based on healthy and OSCC patient studies published so far. Overall, study design seems to be lacking in rationale, with widespread variation in the selection of the 16S hypervariable region for sequencing, DNA extraction methods and sampling method for OSCC lesions. Such differences can impact the overall microbiome under study, thus causing a misinterpretation of the oral microbial community, as evidenced by previous systematic reviews (10,13). A key aim of this study was to identify potential biomarker organisms in OSCC, which could help map early development of the disease as an aid to diagnosis, as evidenced by previous studies conducted with the gut (31). The data for the 19 studies finalized for the re-analysis were successfully downloaded and processed; however, we only included 11 case-controlled studies of health vs OSCC and filtered out all other samples, including those with low median reads.
The study characteristics recorded in our data collection included countries, sex, region of 16S rRNA, sample type, DNA extraction method/kit, sequencing technique, smoking and alcohol status of the healthy and diseased cohort. The selection of DNA extraction kit is crucial for production of good quality genomic DNA (gDNA). When extracting DNA from oral samples, mechanical cell lysis can improve the overall bacterial yield from saliva (32). One study observed that enzymatic digestion increases the amount of DNA, particularly with phenol-chloroform extraction (33). Our results have shown that QIAGEN® kits have gained popularity; these kits have been found to yield a higher bacterial diversity, although they may underestimate the oral microbiome (34). Several studies have found that each kit has its own flaws, and bias can be introduced at any point during processing (35). Therefore, the method of DNA extraction should be considered carefully in conjunction with the hypervariable region selected for analysis and should ideally be standardized for oral microbiome studies. NGS technologies have shorter read lengths; therefore, it is crucial to select the appropriate region of the 16S rRNA for widespread and diverse bacterial detection (36-38).
Genetic and molecular research is largely shifting to digital databases, with decreased manual handling of raw data (39). Studies have also found that, despite the discovery of new and unknown species, it is not always possible for taxonomists to provide species names or phylogenetic mapping (40,41). This could be one of the reasons why our meta-analysis showed many unclassified species of microorganisms. Few studies have explored the possibility of direct shotgun sequencing of the whole oral microbial community (metagenome) to reduce some of the bias associated with cloning and PCR (42,
Fig. 6. Species of interest from random forest model. Features were selected by mean decrease in accuracy of the Gini coefficient (MeanDecreaseGini) that distinguish between health (green) and disease (red). The differential abundance between health and disease is also represented as a log fold change and shown for the top 30 features.
It had longer sequence reads, higher species richness and can identify organisms at a more advanced taxonomic and phylogenetic range (40). However, NGS is dependable, with lower cost when performed in-house; due to its high sensitivity and specificity, the need for additional reference tests or orthogonal validation assays is avoided (44). Therefore, it is overall cost-effective, and a valuable tool, which with a few modifications can significantly improve the future of personalized management of oral cancer.
Although we utilized a standardized pipeline, we failed to find any significant differences in the overall bacterial diversity between health and OSCC. The alpha diversity was lower in the OSCC group, though the difference was small and not statistically  (48). Our results showed that the V4 region, followed by the V3-V4 region, was selected most often. Studies have shown through the Chao1 and ACE indexes that the V2-V3 region has higher richness and genetic differences when compared to other regions (36). In our meta-analysis, the V4 region was the most frequently used, yielded the most tightly clustered results, and clustered distinctly from other regions in the beta diversity analysis. Because the majority of studies chose the V4 region, future studies should investigate whether its species coverage is high and whether this region is a good predictor for microbiome studies. However, we did not observe any specific improvement in classification based upon the V4 region compared to other regions. Indeed, work such as that by Johnson et al. (45) has highlighted the potential of full-length 16S sequencing to produce accurate, high-resolution taxonomic classification of organisms.
Our re-analysis of publicly available data returned interesting results. Random forest analysis to generate ROC curves improved when the data were randomly split 80/20. These randomly selected variables are classified by creating decision trees, comparing the predicted values to the actual values. This can help determine the true positive/false negative values, and the AUC is then measured to distinguish between the classes (46). The AUC was higher upon the 80/20 split, which shows a higher true positive rate of the randomly split data. Three of the 11 studies had an AUC over 0.75 (47-49). A significantly increased log2 fold change was also observed in 4 studies (48-51).
The beta diversity analysis showed scarce clustering between health and OSCC, which signifies that there was little to no difference between the microbiome of health and disease, which is consistent with the original results of the studies included in our meta-analysis. We have shown, however, that ADONIS analysis of PhiLR and UniFrac beta diversity matrices produced statistically significant differences between the health and OSCC groups in individual studies. This may indicate that pooling of samples from different studies for the meta-analysis altered the overall results.
As evidenced by our results, saliva samples have gained popularity over the last few years. The beta diversity analysis also showed clustering of saliva samples, which was distinctly separate from the biopsy and swab samples. Some studies have found that tissue biopsies have a high concentration of F. nucleatum localized in both pre-cancerous and cancerous tissues (52,53). Gopinath et al. performed an extensive study in 2021 on the different sampling types in oral cancer vs healthy tissues and found increased levels of the Prevotella, Campylobacter, Capnocytophaga, Solobacterium, Peptostreptococcus and Catonella genera in oral cancer patients. They found significant differences in bacterial composition between tumour biopsies and swabs (14). These differences could be attributed to the presence of biofilms or co-aggregation of bacterial pathogens on the diseased oral mucosa. Our meta-analysis showed an increase toward selection of saliva for microbiome sequencing, owing to the ease of saliva collection, storage and non-invasiveness. However, our top three species in the diseased group were Selenomonas sp. HMT 126, unclassified Leptotrichia sp. HMT 223 and Prevotella denticola. A few studies in the meta-analysis had found similar results, with Prevotella sp. being the most significant overall. Since these studies have found that saliva is a partial indicator of the cancer tissue microbiome, more research is needed to consider it as a conventional method of sampling in OSCC (14).
Studies have also found that tobacco consumption in the form of chewing and smoking could alter the oral microbiome, leading to tumour progression. These parameters are extremely important and need to be included during patient data collection, which seemed to be lacking among most studies (13). Five of 11 studies had included information about smokers and alcoholics; however, these accounted for only 4% and 10%, respectively, of the overall sample size included in our meta-analysis. Our results showed a slightly higher alpha diversity in smokers in the Chao1 and observed indexes, though this was not statistically significant. Upon further analysis, we found significant differences between alcoholics and non-alcoholics. Alcoholics may have an altered microbiome due to an overproduction of acetaldehyde. High levels of acetaldehyde-producing bacteria like Actinomyces, Rothia, Streptococcus and Prevotella have been isolated from the oral cavity of chronic alcoholics (13). Indeed, studies such as that conducted by Mizumoto et al. in 2017 highlight the relationship between acetaldehyde and tumour mutagenesis (54). This would imply that selection for these acetaldehyde-producing bacteria could induce further mutations and progression of the tumour. This is further reinforced by our identification of Prevotella as a top-three organism present in disease samples during our meta-analysis. Notably, the majority of studies did not consider the yeast Candida albicans, an important determinant of oral cancer (55). From our literature screening we also identified one ITS-focused paper with available data. We found significantly more 16S rRNA microbiome studies within the literature search, with the majority of studies focusing on the bacterial involvement in OSCC. Therefore, future studies should examine the mycobiome as well as the microbiome in any microbiological investigative studies.
Although not included in our search criteria, it is noteworthy that there are additional considerations other than the microbiome and mycobiome, including the phageome, virome and meta-transcriptome within the oral environment (56).
Microbiome sequencing utilizing 16S rRNA amplicons also has limitations compared to the more holistic metagenomic shotgun sequencing. Due to limitations of integration methods within our standardized protocol for analysing microbiome data, we were unable to include these data types within our study design. However, at the time of writing, studies utilizing the shotgun approach were few in comparison to those that used the 16S microbiome. Within this study we highlight that, although this is the predominant form of profiling the oral microenvironment in OSCC, there are other technological developments to be considered. Shotgun metagenomics has been applied to profile other medically relevant conditions within the human body, including the gut and vaginal microbial communities and other inflammatory oral diseases (57-59). Due to its cost effectiveness, speed of preparation and comparative ease of analysis, amplicon sequencing remains the most popular platform for microbial profiling. In time, and with increasing processing simplicity, it may become preferable to utilize shotgun sequencing. However, for 16S amplicon sequencing to become a useful clinical tool, standardization is urgently needed.

CONCLUSION
Despite these advancements, the future of oral microbiome research in OSCC is highly dependent on study design characterization. Several systematic reviews conducted over the past few years have concluded that oral microbiome studies must widely focus on a standard study design, collect essential and adequate metadata, and follow proper pipelines for analyses of the data (10,13). We have determined that it may be difficult to rely upon microbiome studies for this purpose, due to a lack of specificity and wide variation between individual study designs and outcomes. A consensus in approach is a basic requirement before these studies can be collectively useful. However, in understanding the essential marriage of microbiome and clinical metadata when conducting these analyses, we can also postulate that a targeted multi-omic approach to sample analysis may provide a more promising outcome for early diagnosis of OSCC (60). The oral microbiome has great potential in classification of diseases, including diagnosis and prevention. Future studies could include a clearer and more standardized technique for analysis, utilizing the vast number of technological advances in the databases available as a predictive tool in OSCC by applying our knowledge of the microbiome into useful clinical applications.

SUPPORTING INFORMATION
Additional supporting information may be found online in the Supporting Information section at the end of the article.
Data S1. Meta-data table of clinical studies and study specific information.