Volume 62, Issue 1

A General Method for Dealing with Misclassification in Regression: The Misclassification SIMEX

Helmut Küchenhoff

Corresponding Author

Department of Statistics, Ludwig‐Maximilians‐Universität München, D‐80799 München, Germany

email: kuechenhoff@stat.uni-muenchen.de
Samuel M. Mwalili

Biostatistical Centre, Catholic University of Leuven, B‐3000 Leuven, Belgium

Emmanuel Lesaffre

Biostatistical Centre, Catholic University of Leuven, B‐3000 Leuven, Belgium

First published: 07 July 2005
Citations: 78

Abstract

Summary: We have developed a new general approach for handling misclassification in discrete covariates or responses in regression models. The simulation and extrapolation (SIMEX) method, which was originally designed for handling additive covariate measurement error, is applied to the case of misclassification. The statistical model for characterizing misclassification is given by the transition matrix Π from the true to the observed variable. We exploit the relationship between the size of misclassification and bias in estimating the parameters of interest. Assuming that Π is known or can be estimated from validation data, we simulate data with higher misclassification and extrapolate back to the case of no misclassification. We show that our method is quite general and applicable to models with misclassified response and/or misclassified discrete regressors. For a misclassified binary response, we compare our method with the approach of Neuhaus (1999, Biometrika 86, 843–855); for a misclassified binary regressor, we compare it with the matrix method of Morrissey and Spiegelman (1999, Biometrics 55, 338–344). We apply our method to a study on caries with a misclassified longitudinal response.
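The simulate-and-extrapolate idea in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation. The setting is an assumption made for illustration only: a binary regressor in a simple linear model, a known symmetric Π with 10% error, a grid of extra-misclassification levels λ, a quadratic extrapolant, and fractional powers of Π computed by eigendecomposition. All function names (`matrix_power_frac`, `misclassify`, `mc_simex`) are hypothetical.

```python
import numpy as np

def matrix_power_frac(pi, lam):
    """Pi**lam via eigendecomposition (assumes positive real eigenvalues)."""
    w, e = np.linalg.eig(pi)
    return np.real(e @ np.diag(w ** lam) @ np.linalg.inv(e))

def misclassify(x, p, rng):
    """Redraw labels; column j of p holds P(new = i | current = j)."""
    cum = np.cumsum(p[:, x], axis=0)          # shape (K, n)
    u = rng.random(x.shape[0])
    # Count thresholds below u; clip guards against rounding in cum[-1].
    return np.minimum((u > cum).sum(axis=0), p.shape[0] - 1)

def ols_slope(x, y):
    """Naive estimator: least-squares slope of y on x."""
    return np.polyfit(x, y, 1)[0]

def mc_simex(x_star, y, pi, lambdas=(0.5, 1.0, 1.5, 2.0), b=50, seed=0):
    """Return (naive, corrected) slope estimates."""
    rng = np.random.default_rng(seed)
    lam_grid = [0.0]
    est = [ols_slope(x_star, y)]              # lambda = 0: observed data
    for lam in lambdas:
        p_lam = matrix_power_frac(pi, lam)    # extra misclassification
        sims = [ols_slope(misclassify(x_star, p_lam, rng), y)
                for _ in range(b)]
        lam_grid.append(lam)
        est.append(np.mean(sims))             # average over b replicates
    coef = np.polyfit(lam_grid, est, 2)       # quadratic extrapolant (assumed)
    return est[0], np.polyval(coef, -1.0)     # lambda = -1: no misclassification

# Simulated example (all values are illustrative assumptions).
rng = np.random.default_rng(1)
n = 5000
x = rng.integers(0, 2, n)                     # true binary regressor
pi = np.array([[0.9, 0.1],
               [0.1, 0.9]])                   # P(observed = i | true = j)
x_star = misclassify(x, pi, rng)              # observed, error-prone regressor
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)   # true slope is 2.0
naive, corrected = mc_simex(x_star, y, pi)
print(f"naive slope: {naive:.3f}, extrapolated slope: {corrected:.3f}")
```

In this setup the naive slope is attenuated toward zero, and the extrapolated value should lie closer to the true slope of 2.0. The quadratic extrapolant is only an approximation, so some residual bias is expected.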

