Examining missing data mechanisms via homogeneity of parameters, homogeneity of distributions, and multivariate normality

Abstract

This paper reviews methods for identifying missing data mechanisms. The three well-known mechanisms, missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR), are considered. A number of tests treat rejection of homogeneity of means and/or covariances (HMC) among observed data patterns as grounds for rejecting MCAR. The utility of these tests, as well as their shortcomings, is discussed. In particular, examples of MAR and MNAR data with homogeneous means and covariances across their observed data patterns are provided, for which tests of HMC fail to reject MCAR. More generally, tests of homogeneity of parameter estimates between various subsets of data are reviewed, and their utility as tests of MCAR and, in special cases, MAR is pointed out. Since many tests of MCAR assume multinormality, methods to assess this assumption in the context of incomplete data are reviewed. Tests of homogeneity of distributions among observed data patterns for MCAR are also considered. A new nonparametric test of this type is proposed on the basis of pairwise comparison of marginal distributions. Finally, methods of examining the missing data mechanism based on sensitivity analysis are reviewed, including methods that model the mechanism through logistic, probit, and latent variable regression models, as well as methods that do not require modeling it. The paper concludes with some practical comments about the validity and utility of tests of the missing data mechanism.

WIREs Comput Stat 2014, 6:56–73. doi: 10.1002/wics.1287
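As a rough illustration of the homogeneity-of-means idea, the sketch below compares the mean of a fully observed variable between the cases where a second variable is missing and the cases where it is observed. It is not the nonparametric test proposed in the paper: a simple permutation test on the mean gap is swapped in, and the simulated data, the function names `mean_gap` and `permutation_pvalue`, and the missingness rules are all assumptions made for the example. A small p-value is evidence against MCAR; a large one does not establish MCAR, since MAR and MNAR data can still have homogeneous means across patterns.

```python
import random
import statistics


def mean_gap(values, is_missing):
    """Absolute difference in the mean of `values` between the two patterns."""
    miss = [v for v, m in zip(values, is_missing) if m]
    obs = [v for v, m in zip(values, is_missing) if not m]
    return abs(statistics.fmean(miss) - statistics.fmean(obs))


def permutation_pvalue(values, is_missing, n_perm=999, seed=0):
    """Permutation p-value for homogeneity of means across the two patterns."""
    rng = random.Random(seed)
    observed = mean_gap(values, is_missing)
    labels = list(is_missing)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between values and pattern
        if mean_gap(values, labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: y is always observed; a second variable x is sometimes missing.
rng = random.Random(42)
y = [rng.gauss(0.0, 1.0) for _ in range(200)]

# MCAR: x is missing with probability 0.3, independently of everything.
mcar_mask = [rng.random() < 0.3 for _ in y]
# MAR (assumed rule): x is missing whenever the observed y is positive.
mar_mask = [v > 0.0 for v in y]

p_mcar = permutation_pvalue(y, mcar_mask)
p_mar = permutation_pvalue(y, mar_mask)
print(f"MCAR mask: p = {p_mcar:.3f}")
print(f"MAR mask:  p = {p_mar:.3f}")
```

Under the assumed MAR rule the two patterns have clearly different means of `y`, so the test rejects homogeneity; the MCAR mask gives no systematic difference.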

Conflict of interest: The authors have declared no conflicts of interest for this article.
