Nonparametric multiple imputation for receiver operating characteristics analysis when some biomarker values are missing at random


Qi Long, Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA.



The receiver operating characteristics (ROC) curve is a widely used tool for evaluating the discriminative and diagnostic power of a biomarker. When the biomarker value is missing for some observations, ROC analysis based solely on complete cases loses efficiency because of the reduced sample size and, more importantly, is subject to potential bias. In this paper, we investigate nonparametric multiple imputation methods for ROC analysis when some biomarker values are missing at random and there are auxiliary variables that are fully observed and predictive of the biomarker values and/or their missingness. Although a direct application of standard nonparametric imputation is robust to model misspecification, its finite sample performance suffers from the curse of dimensionality as the number of auxiliary variables increases. To address this problem, we propose new nonparametric imputation methods that achieve dimension reduction through the use of one or two working models, namely, a model for predicting the biomarker value and a model for the propensity score (the probability that the biomarker value is missing). The proposed imputation methods provide a platform for a full range of ROC analyses and hence are more flexible than existing methods, which primarily focus on estimating the area under the ROC curve. We conduct simulation studies to evaluate the finite sample performance of the proposed methods and find that they are robust to various types of model misspecification and outperform the standard nonparametric approach even when the number of auxiliary variables is moderate. We further illustrate the proposed methods by using an observational study of maternal depression during pregnancy. Copyright © 2011 John Wiley & Sons, Ltd.
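
The general idea of imputing through two working-model scores can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the distance, the neighborhood size, or the resampling scheme, so the weighted Euclidean distance on standardized scores, the values of `K`, `M`, and `w_pred`, the bootstrap step, the restriction of donors to the same disease status, and the simulated data are all assumptions made here for illustration. Every function name is hypothetical.

```python
import numpy as np

def fit_linear(Z, y):
    """Least-squares coefficients for y ~ Z (Z already contains an intercept)."""
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return beta

def fit_logistic(Z, r, n_iter=25):
    """Logistic regression by Newton-Raphson (Z already contains an intercept)."""
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        w = p * (1.0 - p) + 1e-8              # small constant keeps the Hessian invertible
        beta = beta + np.linalg.solve((Z * w[:, None]).T @ Z, Z.T @ (r - p))
    return beta

def auc(y, d):
    """Empirical AUC (Mann-Whitney): P(case > control) + 0.5 * P(tie)."""
    diff = y[d == 1][:, None] - y[d == 0][None, :]
    return np.mean(diff > 0) + 0.5 * np.mean(diff == 0)

def nn_multiple_imputation(y, d, X, M=10, K=5, w_pred=0.8, rng=None):
    """Return M completed copies of y (NaN = missing). Assumed scheme, see lead-in."""
    rng = np.random.default_rng(rng)
    n, obs = len(y), ~np.isnan(y)
    Z = np.column_stack([np.ones(n), d, X])
    completed = []
    for _ in range(M):
        b = rng.integers(0, n, size=n)        # bootstrap sample (assumed step)
        ob = obs[b]
        beta = fit_linear(Z[b][ob], y[b][ob])         # working model 1: prediction
        gamma = fit_logistic(Z[b], ob.astype(float))  # working model 2: propensity
        s1, s2 = Z @ beta, Z @ gamma          # two scalar scores replace all of X
        s1 = (s1 - s1.mean()) / s1.std()
        s2 = (s2 - s2.mean()) / s2.std()
        donors = b[ob]                        # observed subjects in the bootstrap sample
        y_imp = y.copy()
        for i in np.where(~obs)[0]:
            cand = donors[d[donors] == d[i]]  # donors with the same disease status
            dist = w_pred * (s1[cand] - s1[i]) ** 2 + (1 - w_pred) * (s2[cand] - s2[i]) ** 2
            nearest = cand[np.argsort(dist)[:K]]
            y_imp[i] = y[rng.choice(nearest)] # draw one donor value at random
        completed.append(y_imp)
    return completed

# Simulated example (all numbers are made up for illustration).
rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 3))                   # fully observed auxiliary variables
d = rng.binomial(1, 0.5, size=n)              # disease status
y_full = 1.0 * d + X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)
p_obs = 1.0 / (1.0 + np.exp(-(0.5 + X[:, 0])))  # missingness depends on X only (MAR)
y = np.where(rng.random(n) < p_obs, y_full, np.nan)

imputations = nn_multiple_imputation(y, d, X, M=10, K=5, rng=1)
auc_mi = float(np.mean([auc(yi, d) for yi in imputations]))
auc_cc = float(auc(y[~np.isnan(y)], d[~np.isnan(y)]))
print(f"complete-case AUC: {auc_cc:.3f}, multiple-imputation AUC: {auc_mi:.3f}")
```

Because each completed data set contains a biomarker value for every subject, any ROC summary (the full curve, a partial AUC, sensitivity at a fixed specificity) could be computed on it in place of `auc` and then combined across the `M` imputations. The sketch averages the point estimates only and does not compute a variance.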