To facilitate risk stratification and the identification of high-risk individuals for cost-effective screening, surveillance, chemoprevention, and early detection of cancer, considerable effort has gone into developing cancer risk prediction models with better prediction accuracy and good calibration. Currently, the majority of models have only moderate discriminatory ability, with areas under the curve (AUCs) typically in the range of 0.55 to 0.70,[14, 15] which limits their use in the clinic. One approach to address this is the identification and incorporation of novel risk factors, including genetic and phenotypic biomarkers, to improve discriminatory power. In addition, Web-based tools need to be developed to disseminate these models if they are to have an impact in the field. We performed a systematic review of breast cancer risk prediction models as an example. Since the late 1970s, more than 27 articles have been published on breast cancer risk prediction models or modified models. Ottman et al and Anderson et al developed prediction models of familial breast cancer based on empirical data (age and family history) using non-model-based strategies such as the life table approach. Later, a series of model-based predictions were proposed. The best-known model was developed by Gail et al and was generated to select participants for the Breast Cancer Prevention Trial. The Gail model projects 5-year and lifetime risks of developing invasive breast cancer from risk factors including age, age at first live birth, age at menarche, family history of breast cancer, and personal history of breast biopsies, using unconditional logistic regression. The Gail model became very popular because of its use of a traditional statistical strategy, its wider spectrum of risk factors, and its ease of use. However, to our knowledge, no evaluation of model performance was included in the initial report.
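To make the AUC values quoted above concrete: the AUC (equivalent to the c-index for a binary outcome) can be read as the probability that a randomly chosen case is assigned a higher predicted risk than a randomly chosen control. The following is a minimal sketch of that definition, not the method used by any of the cited studies; the function name `concordance_auc` and the predicted risks are hypothetical and chosen only for illustration.

```python
from itertools import product


def concordance_auc(case_scores, control_scores):
    """Fraction of case/control pairs in which the case has the higher
    predicted risk; tied pairs count as one half."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for case, control in product(case_scores, control_scores):
        if case > control:
            concordant += 1.0
        elif case == control:
            concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks, for illustration only.
cases = [0.30, 0.22, 0.15, 0.10]
controls = [0.25, 0.12, 0.10, 0.05]
print(concordance_auc(cases, controls))  # 0.71875
```

A value of 0.5 corresponds to a model that ranks cases and controls no better than chance, which is why AUCs of 0.55 to 0.70 are described as only moderate discrimination.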
Several studies later modified the Gail model by including additional risk factors, such as breast density, nipple aspirate fluid cytology, weight, history of breast inflammation, body mass index, parity, breastfeeding history, smoking history, drinking history, physical activity, and use of hormone replacement therapy, and evaluated model performance. However, the improvement in model performance was modest, with a concordance statistic (c-index) generally < 0.7. Other studies have applied the Gail model to external populations. Decarli et al applied the Gail model to Italian women with some modification of the categorization of risk factors and demonstrated a small improvement in model performance. Gail et al modified their model and applied it to an African American population, in which model performance was poor (c-index, 0.56). Two studies[24, 25] applied a modified Gail model to Asian women; however, no validation was included. With the advent of the Human Genome Project, there has been much enthusiasm for genetic markers as predictive risk biomarkers, and several studies have incorporated genetic information into risk prediction models. Such information includes genetic loci identified from genome-wide association studies (GWAS); however, only a small improvement was obtained. For example, Gail et al[26, 27] examined the prediction benefit of adding 7 GWAS-identified single nucleotide polymorphisms (SNPs) to the baseline Gail model and observed only a modest improvement, less than that obtained by adding mammographic density alone. Wacholder et al added 10 published GWAS-identified SNPs to a risk model based on age, study, entrance time, and 4 factors from the Gail model (family history, age at menarche, age at first live birth, and previous biopsies), and observed a modest improvement in the AUC from 0.580 to 0.618.
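One common way to add SNPs to a baseline risk model is to treat each risk allele as multiplying the odds of disease by a per-allele odds ratio (a log-additive assumption). The sketch below illustrates that idea only; it is not the procedure used by Gail et al or Wacholder et al, it omits the usual centering on the population-average genotype, and the function name `add_snps_to_risk`, the baseline risk, and the odds ratios are all hypothetical.

```python
import math


def add_snps_to_risk(baseline_risk, allele_counts, per_allele_odds_ratios):
    """Update a baseline risk (a probability) with SNP genotypes, assuming
    each risk allele multiplies the odds by its per-allele odds ratio."""
    if not 0.0 < baseline_risk < 1.0:
        raise ValueError("baseline_risk must be strictly between 0 and 1")
    if len(allele_counts) != len(per_allele_odds_ratios):
        raise ValueError("one odds ratio is required per SNP")
    # Work on the log-odds scale, where SNP effects are additive.
    log_odds = math.log(baseline_risk / (1.0 - baseline_risk))
    for count, odds_ratio in zip(allele_counts, per_allele_odds_ratios):
        log_odds += count * math.log(odds_ratio)
    # Convert the updated log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical inputs: a 2% baseline risk and three SNPs with
# 0, 1, or 2 risk alleles and small per-allele odds ratios.
updated = add_snps_to_risk(0.02, [2, 1, 0], [1.20, 1.10, 1.15])
print(round(updated, 4))
```

Because per-allele odds ratios from GWAS are typically close to 1, each SNP shifts an individual's risk only slightly, which is consistent with the small gains in AUC described above.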
For patients with familial breast cancer, improvement has been shown with the addition of BRCA1/BRCA2 germline mutations. In contrast to SNPs, the addition of phenotypic biomarkers to a breast cancer model provided a significant improvement. For example, previous work has shown that adding mammographic density to epidemiologic risk factors increases the AUC for the Gail breast cancer model even more than adding SNPs. A recent prospective study performed methylation profiling of blood DNA and identified differentially methylated CpG sites that are predictive of future breast cancer risk; more importantly, the AUC estimated for methylation markers (65.8%) was much larger than that for the Gail model (56.0%) or the Gail model plus 9 GWAS-identified SNPs (58.8%). Other models incorporate a genetic model for the disease, such as the Claus model, which assumes the prevalence of high-penetrance genes for susceptibility to breast cancer, and the BRCAPRO[32, 33] model, which is based on the breast cancer susceptibility genes BRCA1 and BRCA2.
As with breast cancer, incorporating genetic markers into risk models for other cancers has yielded limited benefit. The inclusion of susceptibility loci identified in GWAS produced only a moderate improvement in the discriminatory ability of colorectal cancer and lung cancer models. In contrast, intermediate phenotypic markers and molecular biomarkers may offer greater discriminatory power. We have also shown that adding a phenotypic marker, mutagen sensitivity, to the bladder cancer risk prediction model increased the prediction power by nearly 10% and pushed the model to a level at which it could potentially have clinical relevance. Future efforts should be devoted to identifying strongly predictive genetic and phenotypic biomarkers that increase the individual-level predictive ability of risk prediction models.