Volume 15, Issue 4
Tutorial in Biostatistics

MULTIVARIABLE PROGNOSTIC MODELS: ISSUES IN DEVELOPING MODELS, EVALUATING ASSUMPTIONS AND ADEQUACY, AND MEASURING AND REDUCING ERRORS

FRANK E. HARRELL Jr.

Corresponding Author

Division of Biometry and Cardiology, Box 3363, Duke University Medical Center, Durham, North Carolina 27710, U.S.A.

KERRY L. LEE

Division of Biometry and Cardiology, Box 3363, Duke University Medical Center, Durham, North Carolina 27710, U.S.A.

DANIEL B. MARK

Division of Biometry and Cardiology, Box 3363, Duke University Medical Center, Durham, North Carolina 27710, U.S.A.


Abstract

Multivariable regression models are powerful tools that are used frequently in studies of clinical outcomes. These models can use a mixture of categorical and continuous variables and can handle partially observed (censored) responses. However, uncritical application of modelling techniques can result in models that poorly fit the dataset at hand, or, even more likely, inaccurately predict outcomes on new subjects. One must know how to measure qualities of a model's fit in order to avoid poorly fitted or overfitted models. Measurement of predictive accuracy can be difficult for survival time data in the presence of censoring. We discuss an easily interpretable index of predictive discrimination as well as methods for assessing calibration of predicted survival probabilities. Both types of predictive accuracy should be unbiasedly validated using bootstrapping or cross‐validation, before using predictions in a new data series. We discuss some of the hazards of poorly fitted and overfitted regression models and present one modelling strategy that avoids many of the problems discussed. The methods described are applicable to all regression models, but are particularly needed for binary, ordinal, and time‐to‐event outcomes. Methods are illustrated with a survival analysis in prostate cancer using Cox regression.
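The abstract does not name its index of predictive discrimination. Assuming it refers to a concordance (c) index for right-censored survival data, the idea can be sketched as follows. The function name `concordance_index`, its arguments, and the simplified handling of tied times are illustrative assumptions, not the authors' implementation.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Proportion of usable pairs in which the subject with the higher
    predicted risk is the one observed to fail first.

    times       -- follow-up times (event or censoring time)
    events      -- 1/True if the event was observed, 0/False if censored
    risk_scores -- model output where a larger value means higher risk
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the subject with the shorter follow-up time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        if times[a] == times[b]:
            # Simplifying assumption: pairs with tied times are skipped.
            continue
        if not events[a]:
            # The shorter time is censored, so the ordering of the two
            # true survival times is unknown; the pair is not usable.
            continue
        usable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            # Tied predictions count as half concordant.
            concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

Under this definition a value of 0.5 corresponds to predictions no better than chance and 1.0 to perfect ordering of the usable pairs. An index computed on the same data used to fit the model is optimistic, which is why the abstract calls for bootstrap or cross-validated estimates.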

