A General Method for Dealing with Misclassification in Regression: The Misclassification SIMEX
Abstract
We have developed a new general approach for handling misclassification in discrete covariates or responses in regression models. The simulation and extrapolation (SIMEX) method, originally designed for additive measurement error in covariates, is applied to the case of misclassification. Misclassification is characterized by the transition matrix Π from the true to the observed variable, whose entries are the probabilities of observing each category given the true category. We exploit the relationship between the amount of misclassification and the bias in estimates of the parameters of interest. Assuming that Π is known or can be estimated from validation data, we simulate data with increasing misclassification and extrapolate back to the case of no misclassification. We show that the method is general and applies to models with a misclassified response, misclassified discrete regressors, or both. For a misclassified binary response, we compare our method with the approach of Neuhaus (1999, Biometrika 86, 843–855); for a misclassified binary regressor, we compare it with the matrix method of Morrissey and Spiegelman (1999, Biometrics 55, 338–344). Finally, we apply our method to a caries study with a misclassified longitudinal response.
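To make the simulation and extrapolation steps concrete, here is a minimal Python sketch for the simplest case covered above: a misclassified binary regressor in a simple linear regression. It assumes a known 2 × 2 matrix Π with entries π_ij = P(X* = i | X = j) and nondifferential misclassification (the observed X* depends on the response only through the true X). Because Π has eigenvalues 1 and θ = π_00 + π_11 - 1, real powers Π^λ exist when θ > 0. Re-misclassifying the already misclassified data with Π^λ gives data whose misclassification relative to the truth is Π^(1+λ), so λ = -1 corresponds to Π^0 = I, that is, no misclassification. The function names, the λ grid (0.5, 1, 1.5, 2), the number of simulated data sets, and the quadratic extrapolant are illustrative assumptions, not the authors' implementation.

```python
import random


def pi_power(a, b, lam):
    """Matrix power Pi**lam of a 2x2 misclassification matrix.

    Pi[i][j] = P(X* = i | X = j), with a = P(X* = 0 | X = 0) and
    b = P(X* = 1 | X = 1). Pi has eigenvalues 1 and theta = a + b - 1, so
    Pi**lam = E1 + theta**lam * (I - E1), where E1 projects onto the
    stationary distribution of Pi. A real power needs theta > 0.
    """
    theta = a + b - 1.0
    if theta <= 0.0:
        raise ValueError("need a + b > 1 for a real matrix power")
    if theta == 1.0:  # a = b = 1: no misclassification, Pi is the identity
        return [[1.0, 0.0], [0.0, 1.0]]
    v0 = (1.0 - b) / (2.0 - a - b)  # stationary probability of category 0
    v1 = 1.0 - v0
    t = theta ** lam
    return [[v0 + t * (1.0 - v0), v0 - t * v0],
            [v1 - t * v1, v1 + t * (1.0 - v1)]]


def ols_slope(x, y):
    """Least-squares slope of y on x (simple linear regression)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def quad_extrapolate(lams, ests, at=-1.0):
    """Fit est = c0 + c1*lam + c2*lam**2 by least squares, evaluate at `at`."""
    rows = [[1.0, lam, lam * lam] for lam in lams]
    # Augmented normal equations [A^T A | A^T y], solved by Gauss-Jordan.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * e for r, e in zip(rows, ests))] for i in range(3)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        for k in range(3):
            if k != col:
                f = m[k][col] / m[col][col]
                m[k] = [mk - f * mc for mk, mc in zip(m[k], m[col])]
    c = [m[i][3] / m[i][i] for i in range(3)]
    return c[0] + c[1] * at + c[2] * at * at


def mc_simex_slope(x_obs, y, a, b, lams=(0.5, 1.0, 1.5, 2.0), n_sim=25, seed=0):
    """Hypothetical helper: MC-SIMEX slope of y on a misclassified 0/1 regressor."""
    rng = random.Random(seed)
    grid = [0.0]
    ests = [ols_slope(x_obs, y)]  # lambda = 0: the naive estimate
    for lam in lams:
        p = pi_power(a, b, lam)
        reps = []
        for _ in range(n_sim):
            # Redraw each value from column x* of Pi**lam, so the simulated
            # regressor has misclassification Pi**(1 + lam) w.r.t. the truth.
            x_sim = [1 if rng.random() < p[1][xs] else 0 for xs in x_obs]
            reps.append(ols_slope(x_sim, y))
        grid.append(lam)
        ests.append(sum(reps) / n_sim)
    # lambda = -1 corresponds to Pi**0 = I, i.e. no misclassification.
    return ests[0], quad_extrapolate(grid, ests, at=-1.0)


if __name__ == "__main__":
    rng = random.Random(1)
    a, b = 0.9, 0.8  # assumed known specificity and sensitivity
    x = [1 if rng.random() < 0.5 else 0 for _ in range(5000)]
    y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]  # true slope 2
    x_obs = [xi if rng.random() < (b if xi else a) else 1 - xi for xi in x]
    naive, corrected = mc_simex_slope(x_obs, y, a, b)
    print(f"naive={naive:.3f}  mc-simex={corrected:.3f}  (true slope 2)")
```

With these simulated data (all parameter values are assumptions), the naive slope is attenuated toward zero and the extrapolated estimate should land much closer to the true value 2. A quadratic extrapolant usually removes only part of the bias, so comparing it with other extrapolation functions is a sensible check.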