We are entering an era of personalized medicine in which risk prediction models and molecular diagnostics are emerging to guide clinical decision-making and the personalized management of cancer. After Gail et al published what to our knowledge is the first risk prediction model for the absolute risk of breast cancer, many risk prediction models were developed, including those for bladder cancer, breast cancer,[3, 4] colorectal cancer,[5, 6] liver cancer,[7, 8] lung cancer,[9, 10] and melanoma.[11, 12] With advances in technology, several molecular biomarkers and corresponding diagnostic assays have been developed. In this review, we summarize the development, evaluation, and validation of risk prediction models and contributions from modern molecular diagnostics, and discuss some of the challenges for clinical translation.
RISK PREDICTION MODELS
Risk prediction models enable clinicians and public health professionals to assess individual risk using known epidemiological and clinical risk factors, an ability with important clinical and public health applications. They allow for recommendations and changes in clinical care to manage these risks across the cancer continuum (Fig. 1), including behavioral changes and preventive interventions, chemoprevention, evidence-based screening recommendations, risk-stratified clinical trials, personalized therapy, and risk-stratified follow-up care.
To facilitate risk stratification and the identification of high-risk individuals for cost-effective screening, surveillance, chemoprevention, and early detection of cancer, increasing efforts have been made to develop cancer risk prediction models with better prediction accuracy and good calibration. Currently, a majority of the models have only moderate discriminatory ability, with areas under the curve (AUCs) that are typically within the range of 0.55 to 0.70,[14, 15] which limits their use in the clinic. One approach to address this is the identification and incorporation of novel risk factors, including genetic and phenotypic biomarkers, to improve discriminatory power. In addition, Web-based tools are needed to disseminate these models so that they can have an impact in the field.
We performed a systematic review on breast cancer risk prediction models as an example. Since the late 1970s, more than 27 articles have been published regarding breast cancer risk prediction models or modified models. Ottman et al and Anderson et al developed prediction models of familial breast cancer based on empirical data (age and family history) using nonmodel strategies such as the life table approach. Later, a series of model-based predictions were proposed. The best-known model was developed by Gail et al and was generated to select participants for the Breast Cancer Prevention Trial. The Gail model, which was built using unconditional logistic regression, could project 5-year and lifetime risks of developing invasive breast cancer based on risk factors including age, age at menarche, age at first live birth, family history of breast cancer, and personal history of breast biopsies. The Gail model was very popular due to its use of a traditional statistical strategy, a wider spectrum of risk factors, and ease of use. However, to our knowledge, no evaluation of the model performance was included in the initial report.
Several studies later modified the Gail model with the inclusion of additional risk factors, such as breast density, nipple aspirate fluid cytology, weight, history of breast inflammation, body mass index, parity, breastfeeding history, smoking history, drinking history, physical activity, and use of hormonal replacement therapy, and evaluated model performance. However, the improvement in model performance was not obvious, with a concordance statistic (c-index) that was generally < 0.7. Other studies have applied the Gail model to external populations. Decarli et al applied the Gail model in Italian women with some modification of the categorization of risk factors and demonstrated a small improvement in the model performance. Gail et al modified and applied their model in an African American population, with poor model performance (c-index, 0.56). Two studies[24, 25] have applied a modified Gail model in Asian women; however, no validation was included.
With the advent of the Human Genome Project, there has been much enthusiasm regarding genetic markers as predictive risk biomarkers, and several studies have incorporated genetic information into risk prediction models. Such information includes genetic loci identified from genome-wide association studies (GWAS); however, only a small improvement was obtained. For example, Gail et al[26, 27] examined the prediction benefit from 7 GWAS-identified single nucleotide polymorphisms (SNPs) compared with the baseline Gail model, in which only a modest improvement was observed, less than when adding mammographic density alone. Wacholder et al added 10 published GWAS-identified SNPs into a risk model based on age, study, entrance time, and 4 factors from the Gail model (family history, age at menarche, age at first live birth, and previous biopsies), and observed a modest improvement in the AUC from 0.580 to 0.618.
For patients with familial breast cancer, improvement has been shown with the addition of BRCA1/BRCA2 germline mutations. However, the addition of phenotypic biomarkers into a breast cancer model provided a greater improvement than the addition of SNPs. For example, previous work has shown that adding mammographic density to epidemiologic risk factors increases the AUC for the Gail breast cancer model even more than adding SNPs. A recent prospective study performed methylation profiling of blood DNA and identified differentially methylated CpG sites that are predictive of future breast cancer risk; more importantly, the AUC estimated for methylation markers (65.8%) was much larger than that for the Gail model (56.0%) or the Gail model plus 9 GWAS-identified SNPs (58.8%). Other models incorporated the genetic model for the disease, such as the Claus model, which assumes the prevalence of high-penetrance genes for susceptibility to breast cancer, and the BRCAPRO[32, 33] model, which is based on breast cancer susceptibility genes (BRCA1/BRCA2).
As with breast cancer, incorporating genetic markers into risk models for other cancers has yielded limited benefit. The inclusion of susceptibility loci identified in GWAS introduced only moderate improvement in the model's discriminatory ability for colorectal cancer and lung cancer models. In contrast, intermediate phenotypic markers and molecular biomarkers could have potential for greater discriminatory power. We have also shown that the addition of a phenotypic marker, mutagen sensitivity, in the bladder cancer risk prediction model increased the prediction power by nearly 10% and pushed the model to a level at which it could potentially have clinical relevance. Future efforts should be devoted to identifying strongly predictive genetic and phenotypic biomarkers to increase prediction ability at the individual level for risk prediction models.
The development of an efficient risk prediction model requires a series of steps that involve definition of the target population and candidate covariates, followed by model construction, assessment, and validation, which will be covered in this section.
Two of the most commonly used study designs for risk model development are case-control and cohort. The case-control study is widely used in developing risk prediction models for a binary outcome. Its advantage is that it requires a shorter time period to conduct the study. Its major weakness is a high susceptibility to bias, such as recall and selection biases. However, well-designed and properly conducted case-control studies minimize these potential biases. The cohort study defines participants based on prespecified criteria and follows them over time for the endpoint of interest. It allows for the calculation of baseline hazards of incidence and competing risks, and the estimation of relative risk. However, the study often requires a lengthy follow-up period and is generally more expensive to conduct and maintain. In addition, assaying molecular markers in the entire cohort is often infeasible due to cost and the large sample size involved. Case-cohort and nested case-control studies are alternatives that allow for the estimation of relative risks and cumulative hazards by sampling from the existing cohort.[36, 37]
For models that are developed for cancer risk, a logistic model is most commonly used. For time-to-event or censored survival outcomes, a Cox proportional hazards model is often applied.
Model construction typically begins with univariate analysis to evaluate the main effect of individual variables. A spectrum of variables should be identified based on prior knowledge of disease etiology or sound reasoning on the relationship between the variable and the outcome. This association usually does not need to be causal. Commonly used variables in cancer risk prediction models include demographics (age, sex, and ethnicity); smoking and alcohol use; exposures; medical and family history; and, more recently, biomarkers. The variables should be defined in a consistent, standardized, and reproducible manner. The significant variables in the univariate analysis are candidates to be considered for the final multivariate model. A common strategy is to build a final risk model by applying forward selection, backward elimination, or another selection procedure to the statistically significant variables from the univariate analysis and to clinically relevant variables. Moreover, when selecting the predictors to be included, between-variable or higher-order interactions should be considered by testing pairwise product terms or applying more advanced statistical approaches.
Several approaches, including machine learning tools, have been developed to address the complex underlying networks among variables. For example, tree-based methods[40, 41] are well suited for binary and censored survival data, with the advantage that they require few distributional assumptions. An artificial neural network is another method that is capable of modeling complex relationships such as nonlinear effects and interactions. In addition, several methods such as random forest and mixture of experts construct a series of submodels, with the final model composed of the combination that provides the optimal fit to the outcome of interest.
To prevent overfitting of the model, several guidelines have been proposed such as the Harrell guideline (10 events per variable). Variable selection may also be affected by other factors such as confounders, interaction effects, and transformation of the predictors. The possibility of confounding effects due to high collinearity between predictor variables could be assessed by the variance inflation factor for each variable.
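As an illustration of the two checks mentioned above, the following minimal sketch computes the number of events per candidate variable and the variance inflation factor (VIF) for each predictor. The function names and the toy data are hypothetical, and the VIF is obtained here as the diagonal of the inverse correlation matrix, which is one standard way to calculate it; a statistical package would normally be used in practice.

```python
from statistics import mean, pstdev

def events_per_variable(n_events, n_candidate_predictors):
    """Events per candidate predictor; the guideline cited above suggests at least 10."""
    return n_events / n_candidate_predictors

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length, non-constant predictor columns."""
    n = len(columns[0])
    z = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        z.append([(v - m) / s for v in col])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(k)] for i, row in enumerate(matrix)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[k:] for row in aug]

def variance_inflation_factors(columns):
    """VIF for predictor j is the j-th diagonal element of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]

# Hypothetical toy data: two predictors with a correlation of 0.6.
x1 = [1, 2, 3, 4]
x2 = [2, 1, 4, 3]
vifs = variance_inflation_factors([x1, x2])
epv = events_per_variable(n_events=120, n_candidate_predictors=12)
```

A VIF close to 1 indicates little collinearity; larger values indicate that a predictor is largely explained by the others. The cutoff at which a VIF becomes a concern is a judgment call that this sketch does not fix.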
Another technical issue is missing values. Simply removing subjects with missing data may lead to biased estimates unless the values are missing completely at random. Multiple imputation is a statistical method that replaces each missing value with several plausible values, analyzes each completed data set, and combines the results.
Model performance assessment
The Brier score measures the accuracy of probabilistic predictions. A smaller value indicates better prediction. For binary outcomes, the Brier score is a traditional sum of squares measure that can be expressed as the mean squared difference between the outcome and the prediction. For survival outcomes, the Brier score is calculated as a function of time using a weight function.
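For the binary case, the calculation can be sketched in a few lines. The function name and the example values are hypothetical; outcomes are assumed to be coded 1 (event) and 0 (no event), and the score is taken as the mean of the squared differences.

```python
def brier_score(outcomes, predicted_probs):
    """Mean squared difference between 0/1 outcomes and predicted probabilities."""
    if len(outcomes) != len(predicted_probs):
        raise ValueError("outcomes and predictions must have the same length")
    squared_errors = [(p - y) ** 2 for y, p in zip(outcomes, predicted_probs)]
    return sum(squared_errors) / len(squared_errors)

# Hypothetical example: four subjects, two of whom developed the outcome.
score = brier_score([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3])
```

A model that always predicts 0.5 obtains a score of 0.25 on any binary data set, which gives a rough reference point for interpreting the value.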
Discrimination is the ability to differentiate between subjects with or without an endpoint event. Discrimination is calculated based on sensitivity (the proportion of individuals with an outcome who are correctly identified) and specificity (the proportion of individuals without an outcome who are correctly identified). For binary risk outcomes, the receiver operating characteristic (ROC) curve analysis and calculation of the AUC are the most commonly used measures. For censored survival data, the c-index, which serves as an extension of the AUC, is used. As a rank-ordered statistic, the AUC or c-index is not sensitive to systematic errors in calibration.[51, 52] Other measures such as the discrimination slope can be used to evaluate the separation of subjects with or without an endpoint event for binary outcomes.
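The rank-based nature of the AUC can be made concrete with a short sketch. Here the AUC is computed as the proportion of case-control pairs in which the case receives the higher predicted risk, with ties counted as one-half; the function name and data are hypothetical, and the pairwise loop is chosen for clarity rather than speed.

```python
def auc(outcomes, scores):
    """Probability that a randomly chosen case scores higher than a randomly chosen control."""
    cases = [s for y, s in zip(outcomes, scores) if y == 1]
    controls = [s for y, s in zip(outcomes, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))

# Hypothetical predicted risks for two cases and two controls.
y = [1, 1, 0, 0]
risk = [0.9, 0.4, 0.5, 0.2]
auc_original = auc(y, risk)
# Halving every predicted risk is a systematic error, but the ranking is unchanged.
auc_halved = auc(y, [r / 2 for r in risk])
```

Because only the ordering of the predictions matters, both calls return the same value, which is why a model can discriminate well and still be poorly calibrated.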
Calibration provides a measure of goodness of fit to determine the degree of agreement between the predicted probabilities and the observed outcomes. Poor calibration will lead to a systematic error in the model performance. A graphic assessment of calibration is generally used, which plots the predicted value against the observed value. A perfect prediction presents as a diagonal line.
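The points of such a calibration plot can be obtained by grouping subjects by predicted risk and comparing the mean predicted risk with the observed event rate in each group. The sketch below assumes equal-sized groups formed after sorting by predicted risk (deciles are a common choice); the function name, the grouping rule, and the data are illustrative assumptions.

```python
def calibration_table(outcomes, predicted_probs, n_groups=10):
    """Return (mean predicted risk, observed event rate) for each risk group."""
    pairs = sorted(zip(predicted_probs, outcomes))
    n = len(pairs)
    table = []
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not group:
            continue  # more groups requested than subjects available
        mean_predicted = sum(p for p, _ in group) / len(group)
        observed_rate = sum(y for _, y in group) / len(group)
        table.append((mean_predicted, observed_rate))
    return table

# Hypothetical data in which predictions and observations agree within each group.
probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
events = [0, 0, 0, 1, 0, 1, 1, 1]
table = calibration_table(events, probs, n_groups=2)
```

Plotting the first element of each pair against the second gives the calibration curve; points on the diagonal correspond to groups in which the predicted and observed risks coincide.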
Recently, several new measurements have been proposed. The decision curve uses the probability threshold to represent the relative harms of false-positive and false-negative results. The “reclassification table” indicates how many subjects could be reclassified into different risk groups by adding a new covariate, with an extension called net reclassification improvement (NRI). However, recent examination of the appropriateness of the NRI demonstrated that it did not provide a proper scoring rule, and the use of a miscalibrated risk model could also result in a large NRI statistic.
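A decision curve plots the net benefit of a model across a range of probability thresholds. As a minimal sketch, the net benefit at a single threshold can be computed with the commonly used formulation in which false-positive results are weighted by the odds of the threshold; the function name and data below are hypothetical, and subjects with a predicted risk at or above the threshold are assumed to be classified as positive.

```python
def net_benefit(outcomes, predicted_probs, threshold):
    """Net benefit = TP/n - (FP/n) * threshold / (1 - threshold)."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(outcomes)
    true_pos = sum(1 for y, p in zip(outcomes, predicted_probs) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(outcomes, predicted_probs) if p >= threshold and y == 0)
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)

# Hypothetical predicted risks for two cases and three controls.
y = [1, 1, 0, 0, 0]
risk = [0.8, 0.3, 0.6, 0.2, 0.1]
nb_at_25 = net_benefit(y, risk, threshold=0.25)
nb_at_50 = net_benefit(y, risk, threshold=0.50)
```

Evaluating the function over a grid of thresholds and plotting the results, alongside the "treat all" and "treat none" strategies, yields the decision curve.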
The final model generated should be presented in a simple and clear manner. A regression model equation should include regression coefficients and intercept for logistic models. In this way, the model can be applied to predict risk in specific subjects. A simplified approach is to convert the predictor estimate to a rounded number, or “risk index.” Moreover, the model may be converted into a user-friendly online calculator or nomogram.
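To show how a published logistic equation is applied to an individual, the sketch below converts an intercept and regression coefficients into a predicted probability. The coefficient values and predictor names are invented for illustration only and do not correspond to any published model.

```python
import math

def predicted_risk(intercept, coefficients, subject):
    """Apply a logistic model: risk = 1 / (1 + exp(-(intercept + sum of coefficient * value)))."""
    linear_predictor = intercept + sum(
        beta * subject[name] for name, beta in coefficients.items()
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Hypothetical model with two binary predictors (illustrative values only).
intercept = -2.0
coefficients = {"age_over_50": 1.0, "family_history": 1.0}

high_risk_subject = {"age_over_50": 1, "family_history": 1}
low_risk_subject = {"age_over_50": 0, "family_history": 0}

risk_high = predicted_risk(intercept, coefficients, high_risk_subject)
risk_low = predicted_risk(intercept, coefficients, low_risk_subject)
```

An online calculator or nomogram is essentially a user-friendly front end to this calculation, which is why reporting the intercept together with the coefficients matters.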
The performance of the prediction model needs to be assessed both internally and externally to determine the model performance, generalization, and clinical usefulness. Internal validation uses the original population with repeated sampling performed many times. The average of the output from these repetitions is often referred to as the bias-corrected value. Several internal validation techniques are used, including cross-validation and bootstrap validation.[38, 60]
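One common form of bootstrap validation estimates the "optimism" of the apparent performance and subtracts it. The sketch below is a generic outline under stated assumptions: `fit` and `score` are hypothetical user-supplied functions (for example, fitting a logistic model and computing its AUC), and the optimism-corrected estimator shown is one bootstrap variant among several, not necessarily the procedure used in the studies cited.

```python
import random

def bootstrap_corrected_performance(data, fit, score, n_boot=200, seed=0):
    """Apparent performance minus the average bootstrap optimism.

    fit(data) -> model and score(model, data) -> float are supplied by the user.
    """
    rng = random.Random(seed)
    apparent = score(fit(data), data)
    total_optimism = 0.0
    for _ in range(n_boot):
        # Resample subjects with replacement to mimic drawing a new study sample.
        sample = [rng.choice(data) for _ in data]
        model = fit(sample)
        # Optimism: performance on the data used to fit minus performance on the original data.
        total_optimism += score(model, sample) - score(model, data)
    return apparent - total_optimism / n_boot

# Hypothetical toy usage: the "model" is the mean outcome and the score is negative squared error.
def fit_mean(rows):
    return sum(rows) / len(rows)

def negative_mse(model, rows):
    return -sum((r - model) ** 2 for r in rows) / len(rows)

toy_data = [0, 0, 1, 0, 1, 1, 0, 1, 0, 0]
corrected = bootstrap_corrected_performance(toy_data, fit_mean, negative_mse)
```

Fixing the random seed makes the result reproducible; the number of bootstrap repetitions (200 here) is an arbitrary illustrative choice.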
External validation is preferred to provide measures of model performance, external usefulness, and generalizability to the general population. The generalizability of the model to a new population is a key concern. However, due to data availability, external validation is usually difficult to conduct, especially when it requires a series of reevaluations. Therefore, collaborative efforts and international consortiums should be encouraged to facilitate external validations of prediction models.
Individuals in the same risk group defined by risk prediction models often exhibit different risks of cancer development and prognosis. Therefore, a molecular diagnosis based on biomarkers may reveal hidden differences not captured in the risk prediction model. Traditionally, clinicians have relied on tumor pathological characteristics to categorize and treat patients with initial testing focused on known mutations. With the evolution of molecular biology and laboratory techniques, especially the emergence of large-scale “omics” platforms (genomics, epigenomics, transcriptomics, proteomics, metabolomics, and others) and whole-genome sequencing, more attention and efforts have been devoted to the development of novel molecular biomarkers and companion diagnostic tools. A well-defined molecular diagnosis could maximize treatment efficiency and the cost-benefit ratio. We are facing unprecedented opportunities for the integration of molecular diagnostics into the clinic, which promotes progress toward the personalized management of patients with cancer.
The general term “molecular diagnosis” includes 3 categories of molecular tests: diagnostic, prognostic, and predictive. Based on current knowledge of cancer initiation, a diagnostic test focusing on specific alterations involved in carcinogenesis can lead to a more accurate description of the cancer subtype and characteristics. For example, testing for the genomic rearrangements creating the BCR-ABL1 and PML-RARA gene fusion events has been applied to the diagnosis of acute leukemia. It is preferred that diagnostic testing is conducted in a noninvasive or minimally invasive manner, such as the testing of blood or urine samples. Prognostic testing is another category of molecular diagnosis. For example, the occurrence of TP53 mutations in patients with chronic lymphocytic leukemia and the FLT3-ITD mutation in those with acute myeloid leukemia are markers of poor clinical outcomes. Due to the complexity of tumor progression, it is reasonable to use a panel of markers to predict the patient's prognosis. The Oncotype DX and MammaPrint gene expression tests have been used for prognosis stratification in patients with breast cancer using tumor tissue.[68, 69] Recent research has focused on predictive, or pharmacogenetic, biomarkers to select treatments and to predict responses. The biomarker should be well characterized, relatively homogeneous, and occur at a high frequency within the population. For example, germline polymorphisms in TPMT and UGT1A1 alter response and toxicities to thiopurine drugs and irinotecan, respectively, thereby guiding the dose delivered. Among tumor-based tests, EGFR kinase domain mutation and HER2 expression are used to identify patients who might benefit from treatment with gefitinib and trastuzumab, respectively.
A more recent example is BRCA1/2 mutation testing for poly (ADP-ribose) polymerase inhibitor treatment in breast cancer, which has been found to greatly enhance treatment efficacy in a phase 1 trial and was subsequently validated in phase 2 trials.
Blood-based diagnostics are minimally invasive methods that require only a blood draw. The most common application is genetic testing of individuals with a family history of cancer, such as BRCA1/2 for patients with ovarian/breast cancer and APC in those with colorectal cancer. In these families, testing for rare genetic events can provide risk assessment for nonaffected individuals and information for cancer surveillance, risk reduction interventions, and early detection. Recently, whole-genome sequencing and other “omics” technologies have been able to better evaluate risk through the identification of novel alterations beyond known mutations. Another type of genetic testing is based on a panel of common germline genetic variants, or SNPs for risk of sporadic cancer. The clinical usefulness of these variants has been hampered by the lack of a strong effect on risk, limited information regarding interpretation of the testing results (typically for whole-genome sequencing), and a lack of risk reduction strategies for many cancers. Intensive efforts are still needed to address these issues to enable clinical translation.
As discussed earlier, phenotypic blood-based biomarkers, such as methylation markers in breast cancer and mutagen sensitivity in bladder cancer, have great potential in improving risk prediction. Novel approaches that identify biomarkers in the serum or plasma from patients are emerging, including circulating tumor cells, tumor DNA, and microRNAs. The clinical application of these biomarkers will become apparent in the coming years.
Tumor-based molecular diagnostic tests are those that test for alterations in DNA, gene transcription, or protein expression within the tumor tissue. Mutation testing is typically conducted by targeted sequencing of previously identified mutational “hot spots” or of functional domains (eg, kinase domains or binding pockets). Some of the mutations tend to be recurrent mutations observed in many tumor cells. These mutations can serve as predictive biomarkers to provide information for guiding proper treatment regimens and the selection of targeted agents.[79, 80] Moreover, the prevalence of mutations in the same gene or different genes also demonstrates the ability to distinguish between prognostic groups. Genome-wide mutation analysis provides mutation information for hundreds of cancer-related genes in tumors.[63, 74] Although the technique has high validity, it is limited with regard to the interpretation and usefulness of the data. For example, a large number of mutations may be present in the sample due to clonal heterogeneity and expansion, which increases the complexity of identifying causal mutations. Many of the mutations may be “passengers” with benign or nontumorigenic effects. Moreover, a negative result on the mutation analysis does not completely rule out mutations, and the presence of mutations in a gene is not equivalent to a cancer phenotype. In addition, not all mutational changes are “druggable,” making the number of actionable mutations limited.
Tests that focus on patterns of RNA or protein expression provide information concerning the specific tumor phenotype, thereby guiding molecular characterization of tumors that can inform diagnosis and prognosis as well as guide treatment. Previously, RNA expression profiling was performed by hybridization arrays of the protein-coding genes in the genome and validated with real-time polymerase chain reaction. RNA sequencing analysis (RNA-seq) can now generate detailed transcriptome information that goes beyond the protein-coding genes to include small RNA species, noncoding transcripts, and other previously uncharacterized transcripts in the genome. RNA-seq can also identify gene fusions. However, the volume of information generated from RNA-seq can be difficult to analyze, which complicates the identification of actionable changes. A molecular diagnostic test that focuses on a specific known gene is preferred for ease of clinical application.
Characterization of protein biomarkers in tumors is often performed by immunohistochemistry. This method assesses not only expression, but also tissue and subcellular localization. However, the requirements for specific antibodies and tissue handling often make immunohistochemistry cumbersome and unsuitable for proteins that lack a suitable antibody or require precise quantification. Profiling of global protein expression in tumor samples can be performed by mass spectrometry analysis. Large-scale matrix-assisted laser desorption ionization mass spectrometry analysis is used to analyze protein content. It enables the potential development of techniques to study protein biomarkers by imaging, identifying, mapping, and quantifying them in the tumor samples. Other tools are also important for the analysis of low-abundance proteins and for the identification of protein interactions, which are not captured in mass spectrometry analysis. For example, enzyme-linked immunosorbent assay has been used for the detection of low-abundance proteins in samples. Fluorescence resonance energy transfer, a mechanism describing energy transfer between chromophores, has been applied to imaging microscopy to record protein interactions using tumor samples.
Assessment and Validation
Given the increasing importance of molecular diagnostic testing in patient management, it is critical to develop robust and reliable diagnostic testing tools, which requires stringent technical assessment and validation. Accuracy is assessed by comparing the results of the test with a reference standard in the same “study context.”[83, 84] Sensitivity and specificity provide information regarding the accuracy of the test, whereas the positive predictive value (percentage of patients with a positive test result who have a real outcome) and negative predictive value (percentage of patients with a negative test result who are without a real outcome) measure the clinical usefulness of the test. Validity, reliability, and reproducibility need to be assessed before commercialization and clinical application. Validity of testing should be evaluated in the following aspects: content validity (biomarkers reflect the real biological phenomenon), construct validity (relevant characterization of the cancer), and criterion validity (extent of biomarker correlation with specific cancers). Reproducibility and reliability are also critical for the assessment of diagnostic testing. Variations in laboratory conditions could lead to misclassifications and result in a biased estimation of risk assessment. Therefore, even pilot studies should be performed in a consistent, accurate, and precise manner, and all laboratory procedures (such as personnel, methods, and storage) must be standardized.
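The four accuracy measures defined above follow directly from the 2 × 2 table of test results against the reference standard. The sketch below uses hypothetical counts and a hypothetical function name; note that the predictive values, unlike sensitivity and specificity, depend on how common the outcome is in the population tested.

```python
def diagnostic_accuracy(true_pos, false_neg, false_pos, true_neg):
    """Sensitivity, specificity, PPV, and NPV from the counts of a 2 x 2 table."""
    return {
        "sensitivity": true_pos / (true_pos + false_neg),
        "specificity": true_neg / (true_neg + false_pos),
        "ppv": true_pos / (true_pos + false_pos),
        "npv": true_neg / (true_neg + false_neg),
    }

# Hypothetical study: 100 patients with the outcome and 900 without.
metrics = diagnostic_accuracy(true_pos=90, false_neg=10, false_pos=50, true_neg=850)
```

In this invented example the test has a sensitivity of 0.90 and a specificity of approximately 0.94, yet just over one-third of positive results are false positives, because only 10% of those tested have the outcome.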
Clinical validation is another critical aspect to be considered and a randomized clinical trial is required to determine whether stratification by molecular biomarker is able to improve clinical care. It is essential to validate and assess the diagnostic test within a targeted population or cancer site due to the cancer site specificity of the markers.
However, even with structured and well-characterized processes for molecular diagnostic development, assessment and validation may still face great challenges. This is particularly the case when the biomarker is not directly associated with cancer pathogenesis or disease progression, or multiple pathways have resulted in the same effect as the observed molecular markers. More sophisticated analytical approaches and knowledge obtained from cancer biology and clinical experiences are needed to evaluate diagnostic testing of molecular biomarkers with complex phenotypes.
SUMMARY AND FUTURE PERSPECTIVES
Risk prediction models based only on epidemiological variables do not provide sufficient prediction power for clinical use. Genetic susceptibility SNPs only modestly improve prediction accuracy, and it is unlikely that adding more SNPs would improve prediction enough for clinical usefulness. Other novel risk factors and biomarkers are needed. Intermediate phenotypes or biomarkers, such as mutagen sensitivity, methylation profiling of germline DNA, or mammographic density, have shown promise in increasing predictive power.[26, 30] We are entering an era of personalized “omics,” whereby multiple layers of data will be available to identify potential biomarkers for improving risk prediction. The major challenge associated with these high-throughput data is validation with adequate statistical power to avoid false-positive findings, particularly in the prospective setting to address reverse causality for biomarkers that are changeable by disease state. Nevertheless, the future of risk prediction models lies in the full use of “omics” data and the incorporation of well-validated biomarkers.
The rapid technical advances in modern biology and the arrival of the sequencing era have also facilitated the development of molecular diagnostics. With their increasing clinical significance in diagnosis and treatment, the molecular diagnostic tests should be carefully designed, assessed, and validated for their reliability and clinical usefulness. A series of rigorous clinical assessment and validation procedures should be adopted to ensure that accurate medical information is delivered to assist in informed decision-making. In summary, we should take an integrative approach that incorporates multiple layers of information including genetic and environmental influences, host characteristics, clinical data, and molecular alterations for risk assessments. The information is then used for the development of personalized risk prediction models that through clinical trials and hopefully industry partnerships can lead to implementation and risk communication with the patient. Finally, this can guide risk management strategies for personalized prevention and personalized therapy. Ultimately, the goal of epidemiology is to influence public attitudes, practices, and policies in medicine and public health and to improve the lives of patients predisposed to or living with cancer.
Supported by The University of Texas MD Anderson Cancer Center Research Trust to Xifeng Wu as a Senior Fellow and the Center for Translational and Public Health Genomics, Duncan Family Institute for Cancer Prevention and Risk Assessment, The University of Texas MD Anderson Cancer Center.