The performance of multiple imputation for missing covariate data within the context of regression relative survival analysis

Authors

  • Roch Giorgi,

    Corresponding author
    1. Laboratoire d'Enseignement et de Recherche sur le Traitement de l'Information Médicale, EA 3283, Faculté de Médecine, Université de la Méditerranée, Marseille, France
    • Laboratoire d'Enseignement et de Recherche sur le Traitement de l'Information Médicale, EA 3283, Faculté de Médecine, Université de la Méditerranée, 27 Boulevard Jean Moulin, F-13385 Marseille Cedex, France
  • Aurélien Belot,

    1. Hospices Civils de Lyon, Service de Biostatistique, Lyon, Université de Lyon, Université Lyon I, CNRS UMR 5558, Laboratoire Biostatistique Santé, Pierre-Bénite, France
    2. Institut de Veille Sanitaire, Département des Maladies Chroniques et des Traumatismes, Saint-Maurice, France
  • Jean Gaudart,

    1. Laboratoire d'Enseignement et de Recherche sur le Traitement de l'Information Médicale, EA 3283, Faculté de Médecine, Université de la Méditerranée, Marseille, France
  • Guy Launoy

    1. ERI3 INSERM ‘Cancers & Populations’, FRANCIM, Caen, France

Abstract

Relative survival assesses the effects of prognostic factors on disease-specific mortality when the cause of death is uncertain or unavailable. It provides an estimate of patients' survival, allowing for the effects of other independent causes of death. Regression-based relative survival models are commonly used in population-based studies to model the effects of prognostic factors and to estimate net survival. Most often, studies focus on routinely collected prognostic factors, for which the proportion of missing values is usually low (around 5 per cent). In some cases, however, additional factors are collected with a greater proportion of missing values. In the present article, we systematically assess the performance of multiple imputation in regression analysis of relative survival through a series of simulation experiments. Under different assumptions about the missingness mechanism (missing completely at random, missing at random, and missing not at random) and the missingness pattern (monotone, non-monotone), several strategies were considered and compared: all-cases analysis, complete-case analysis, missing-data indicator analysis, and multiple imputation by chained equations (MICE). We showed that MICE performs well in estimating the hazard ratios and the baseline hazard function when the missingness mechanism is missing at random (MAR) conditionally on the vital status. In situations where the missingness mechanism was not MAR conditionally on the vital status, complete-case analysis behaved consistently. As an illustration, we used data from the French Cancer Registries on the relative survival of patients with colorectal cancer. Copyright © 2008 John Wiley & Sons, Ltd.
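The three missingness mechanisms named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' simulation design: the function name `make_missing`, the base rate of 0.3, and the 1.5x/0.5x multipliers are hypothetical choices made only to show how the mechanisms differ. Under MCAR the probability of a missing covariate value is constant, under MAR (here, conditional on the vital status) it depends only on an observed variable, and under MNAR it depends on the unobserved value itself.

```python
import numpy as np


def make_missing(x, status, mechanism, rng, base_rate=0.3):
    """Return a copy of covariate x with NaN inserted under one mechanism.

    All rates are illustrative assumptions, not values from the article.
    """
    x = np.asarray(x, dtype=float)
    status = np.asarray(status)
    if mechanism == "MCAR":
        # Same probability of missingness for every subject.
        p = np.full(x.shape, base_rate)
    elif mechanism == "MAR":
        # Probability depends only on the observed vital status (1 = dead).
        p = np.where(status == 1, 1.5 * base_rate, 0.5 * base_rate)
    elif mechanism == "MNAR":
        # Probability depends on the value that ends up missing.
        p = np.where(x > np.median(x), 1.5 * base_rate, 0.5 * base_rate)
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    out = x.copy()
    out[rng.random(x.shape) < p] = np.nan
    return out


rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)                # hypothetical continuous covariate
status = rng.integers(0, 2, size=n)   # hypothetical vital status indicator

x_mcar = make_missing(x, status, "MCAR", rng)
x_mar = make_missing(x, status, "MAR", rng)
x_mnar = make_missing(x, status, "MNAR", rng)
```

Comparing the proportion of NaN values by vital status in `x_mar`, or by the true value of `x` in `x_mnar`, shows why a complete-case analysis and an imputation model that conditions on vital status can behave differently across mechanisms.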
