
The impact of dichotomization in longitudinal data analysis: a simulation study


  • Bongin Yoo

    Corresponding author
    1. Global Biometric Sciences, Bristol–Myers Squibb Company, Wallingford, CT, USA
    • Global Biometric Sciences, Bristol–Myers Squibb Company, 5 Research Parkway, Wallingford, CT 06492, USA


This paper reports a simulation study that systematically investigates the impact of dichotomizing longitudinal continuous outcome variables under various missing data mechanisms. Generalized linear models (GLM) fitted with standard generalized estimating equations (GEE) are widely used for longitudinal outcome analysis, but these semi-parametric approaches are valid only when data are missing completely at random (MCAR). Weighted GEE (WGEE) and multiple imputation GEE (MI-GEE) were developed as alternatives that remain valid when data are missing at random (MAR). The simulation evaluates the performance of standard GEE, WGEE and MI-GEE in the analysis of incomplete longitudinal dichotomized outcomes. For comparison, likelihood-based linear mixed effects models (LMM) are applied to the incomplete longitudinal outcomes on their original continuous scale. For the dichotomized outcome analysis, MI-GEE with missing data imputed on the original continuous scale provides well-controlled test sizes and more stable power estimates than the other GEE-based approaches. The study also shows that dichotomizing a longitudinal continuous outcome results in a substantial loss of power compared with LMM. Copyright © 2009 John Wiley & Sons, Ltd.
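The data-generating step of such a simulation can be sketched as follows. This is a minimal illustration and not the design used in the paper: the random-intercept model, the logistic dropout model, the function name `simulate_trial` and every parameter value (sample size, number of visits, effect size, cutoff, dropout coefficients) are assumptions chosen only to show how a continuous longitudinal outcome can be given monotone MAR dropout and then dichotomized.

```python
import numpy as np

def simulate_trial(n=200, n_visits=4, effect=0.5, cutoff=0.0, seed=0):
    """Simulate one two-arm longitudinal trial (all settings hypothetical).

    Returns the arm indicator, the continuous outcome with NaN for missing
    visits, and the dichotomized outcome with the same missingness pattern.
    """
    rng = np.random.default_rng(seed)
    group = rng.integers(0, 2, size=n)           # 0 = control, 1 = treatment
    time = np.arange(n_visits) / (n_visits - 1)  # visit times scaled to [0, 1]

    # Random-intercept model: the treatment effect grows linearly over time.
    intercept = rng.normal(0.0, 1.0, size=(n, 1))
    noise = rng.normal(0.0, 1.0, size=(n, n_visits))
    y = intercept + effect * group[:, None] * time[None, :] + noise

    # Monotone MAR dropout: the chance of leaving before visit j depends only
    # on the outcome at visit j-1, which is observed for anyone still enrolled.
    observed = np.ones((n, n_visits), dtype=bool)
    for j in range(1, n_visits):
        p_drop = 1.0 / (1.0 + np.exp(2.0 + 0.5 * y[:, j - 1]))
        drop_now = rng.random(n) < p_drop
        observed[:, j] = observed[:, j - 1] & ~drop_now

    y_obs = np.where(observed, y, np.nan)
    # Dichotomize at the cutoff; missing visits stay missing.
    z_obs = np.where(observed, (y > cutoff).astype(float), np.nan)
    return group, y_obs, z_obs
```

In a full study, each simulated data set would then be analyzed with LMM on `y_obs` and with the GEE-based methods on `z_obs`, and test size and power would be estimated over many replicates; those analysis steps are not shown here.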