A stochastic multiple imputation algorithm for missing covariate data in tree-structured survival analysis

Authors

  • Meredith L. Wallace,

    Corresponding author
    1. University of Pittsburgh School of Medicine, Department of Psychiatry, Western Psychiatric Institute and Clinic, Pittsburgh, PA, U.S.A.
    2. Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, U.S.A.
    • University of Pittsburgh School of Medicine, Department of Psychiatry, Western Psychiatric Institute and Clinic, Pittsburgh, PA 15213, U.S.A.
  • Stewart J. Anderson,

    1. Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, U.S.A.
  • Sati Mazumdar

    1. University of Pittsburgh School of Medicine, Department of Psychiatry, Western Psychiatric Institute and Clinic, Pittsburgh, PA, U.S.A.
    2. Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, U.S.A.

Abstract

Missing covariate data present a challenge to tree-structured methodology because a single tree model, as opposed to an estimated parameter value, may be desired for use in a clinical setting. To address this problem, we propose a multiple imputation algorithm that adds draws of stochastic error to a tree-based single imputation method presented by Conversano and Siciliano (Technical Report, University of Naples, 2003). Unlike previously proposed techniques for accommodating missing covariate data in tree-structured analyses, our methodology allows the modeling of complex and nonlinear covariate structures while still resulting in a single tree model. We perform a simulation study to evaluate our stochastic multiple imputation algorithm when covariate data are missing at random and compare it with other currently used methods. Our algorithm is advantageous for identifying the true underlying covariate structure when the data are complex and a larger percentage of covariate observations is missing. It is competitive with other current methods with respect to prediction accuracy. To illustrate our algorithm, we create a tree-structured survival model for predicting time to treatment response in older, depressed adults. Copyright © 2010 John Wiley & Sons, Ltd.
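
The core idea of adding stochastic error to a tree-based imputation can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the tree is reduced to a single-split regression stump, the stochastic error is drawn by resampling observed residuals, and the names `fit_stump`, `predict`, `stochastic_impute`, `aux`, and `cov` are hypothetical. The sketch stops at producing several completed copies of the covariate; how those copies lead to a single tree model is not described in the abstract and is not shown here.

```python
import random
from statistics import mean


def fit_stump(x, y):
    """Fit a one-split regression tree (stump) predicting y from x.

    Returns (threshold, left_mean, right_mean), chosen to minimise the
    within-node sum of squared errors. Assumes x has at least two
    distinct values.
    """
    best = None
    for t in sorted(set(x))[:-1]:  # every candidate split point
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        ml, mr = mean(left), mean(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return t, ml, mr


def predict(stump, xi):
    """Return the node mean for a single value of the predictor."""
    t, ml, mr = stump
    return ml if xi <= t else mr


def stochastic_impute(aux, cov, n_draws=5, seed=0):
    """Return n_draws completed copies of cov (None marks a missing value).

    aux is a fully observed covariate used to predict cov. Each missing
    entry is filled with the stump prediction plus one residual resampled
    from the observed cases (an assumed form of the stochastic error).
    """
    rng = random.Random(seed)
    observed = [(a, c) for a, c in zip(aux, cov) if c is not None]
    stump = fit_stump([a for a, _ in observed], [c for _, c in observed])
    residuals = [c - predict(stump, a) for a, c in observed]
    completed = []
    for _ in range(n_draws):
        completed.append([
            c if c is not None else predict(stump, a) + rng.choice(residuals)
            for a, c in zip(aux, cov)
        ])
    return completed


# Toy data: two covariate entries are missing.
aux = [1, 2, 3, 4, 10, 11, 12, 13]
cov = [1.0, 1.2, None, 0.9, 5.1, None, 4.8, 5.0]
draws = stochastic_impute(aux, cov, n_draws=3)
```

Each element of `draws` is a full copy of `cov` in which the observed values are unchanged and the two missing entries vary from draw to draw, which is what distinguishes this approach from a single deterministic imputation.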
