Special Issue Paper
Comparison of methods for imputing ordinal data using multivariate normal imputation: a case study of non-linear effects in a large cohort study
Multiple imputation is becoming increasingly popular for handling missing data, and Markov chain Monte Carlo imputation under an assumption of multivariate normality (MVN) is a commonly used approach. Imputing categorical variables, which are clearly non-normal, using MVN imputation is challenging. Several approaches have been suggested, but it remains unclear which should be preferred.
We explore methods for imputing ordinal variables using MVN imputation when the aim is to estimate a non-linear association between an ordinal exposure and a binary outcome. The methods include imputing the ordinal variable as continuous or as a set of indicators, combined with various methods for assigning imputed values to the possible categories (rounding). We introduce a new approach in which we impute as continuous and assign imputed values to categories based on the means of the indicators imputed in a separate round of imputation. We compare these approaches in a simple setting in which we make 50% of the values of an ordinal exposure missing completely at random within an otherwise complete real dataset.
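As a rough illustration of the setting described above, the following Python sketch shows how values might be set missing completely at random and how continuous imputed values might then be assigned to ordinal categories. It is a minimal sketch under stated assumptions, not the procedure used in the study: the function names (`mcar_mask`, `round_to_nearest`, `assign_by_proportions`) are hypothetical, the MVN imputation step itself is not shown, and the proportion-matching rule is only one possible reading of assigning categories "based on the means of the indicators", in which those means are taken as target category proportions.

```python
import numpy as np


def mcar_mask(n, frac, rng):
    """Boolean mask marking round(frac * n) randomly chosen entries as missing (MCAR)."""
    mask = np.zeros(n, dtype=bool)
    n_missing = int(round(frac * n))
    mask[rng.choice(n, size=n_missing, replace=False)] = True
    return mask


def round_to_nearest(values, categories):
    """Simple rounding: assign each continuous imputed value to the nearest category."""
    values = np.asarray(values, dtype=float)
    categories = np.asarray(categories)
    idx = np.abs(values[:, None] - categories[None, :]).argmin(axis=1)
    return categories[idx]


def assign_by_proportions(values, categories, proportions):
    """Assign categories by rank so that category shares match target proportions.

    Assumption: `proportions` stands in for the means of indicator variables
    imputed in a separate round, one per category, summing to about 1.
    """
    values = np.asarray(values, dtype=float)
    categories = np.asarray(categories)
    n = len(values)
    # Cumulative number of observations allowed up to and including each category.
    bounds = np.round(np.cumsum(proportions) * n).astype(int)
    bounds[-1] = n  # guard against rounding drift in the last boundary
    # Rank of each value (0 = smallest) in the original order.
    order = np.argsort(values, kind="stable")
    ranks = np.empty(n, dtype=int)
    ranks[order] = np.arange(n)
    # The category index is the first boundary strictly greater than the rank.
    idx = np.searchsorted(bounds, ranks, side="right")
    return categories[idx]


if __name__ == "__main__":
    rng = np.random.default_rng(2012)
    mask = mcar_mask(1000, 0.5, rng)            # 50% missing, as in the comparison
    imputed = rng.normal(1.5, 1.0, mask.sum())  # placeholder for continuous imputations
    print(np.bincount(round_to_nearest(imputed, [0, 1, 2, 3]), minlength=4))
    print(np.bincount(assign_by_proportions(imputed, [0, 1, 2, 3], [0.1, 0.4, 0.3, 0.2]), minlength=4))
```

Rounding to the nearest category depends only on each imputed value, whereas the proportion-matching rule fixes the marginal distribution of the imputed categories in advance and uses the continuous imputations only to rank observations.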
Methods that impute the ordinal exposure as continuous distorted the non-linear exposure–outcome association by biasing the relationship towards linearity irrespective of the rounding method. In contrast, imputing using indicators preserved the non-linear association but not the marginal distribution of the ordinal variable.
Imputing ordinal variables as continuous can bias the estimation of the exposure–outcome association in the presence of non-linear relationships. Further work is needed to develop optimal methods for handling ordinal (and nominal) variables when using MVN imputation. Copyright © 2012 John Wiley & Sons, Ltd.