Missing data assumptions and methods in a smoking cessation study

Authors

  • Sunni A. Barnes,

    Corresponding author
    1. Baylor Health Care System, Institute for Health Care Research and Improvement, Dallas, TX, USA,
      Sunni A. Barnes, Baylor Health Care System, Institute for Health Care Research and Improvement, 8080 North Central Expressway, Suite 500, LB 81, Dallas, TX 75206, USA. E-mail: sunni.barnes@baylorhealth.edu
  • Michael D. Larsen,

    1. Iowa State University, Department of Statistics and Center for Survey Statistics and Methodology, Ames, IA, USA and
  • Darrell Schroeder,

    1. Mayo Clinic College of Medicine, Division of Biostatistics, Rochester, MN, USA
  • Andrew Hanson,

    1. Mayo Clinic College of Medicine, Division of Biostatistics, Rochester, MN, USA
  • Paul A. Decker

    1. Mayo Clinic College of Medicine, Division of Biostatistics, Rochester, MN, USA

ABSTRACT

Aim  In smoking cessation studies, a sizable percentage of subjects do not respond to follow-up attempts. The usual practice in the smoking cessation literature is to assume that non-respondents have resumed smoking. This analysis used data from a study with a high follow-up rate to assess how much bias different methods of imputing missing data may introduce.
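A minimal sketch of the usual "missing = smoking" convention, assuming follow-up outcomes are coded as True (abstinent), False (smoking) or None (no response). The function name is hypothetical, and the complete-case rate is included only for contrast; it is not one of the methods the study compared.

```python
from typing import Optional, Sequence


def abstinence_rates(outcomes: Sequence[Optional[bool]]) -> tuple[float, float]:
    """Return (missing_as_smoking_rate, complete_case_rate).

    outcomes: True = abstinent, False = smoking, None = no follow-up response.
    """
    n_total = len(outcomes)
    n_abstinent = sum(1 for o in outcomes if o is True)
    n_observed = sum(1 for o in outcomes if o is not None)
    # Missing = smoking: non-respondents stay in the denominator
    # but can never count toward abstinence.
    missing_as_smoking = n_abstinent / n_total
    # Complete-case (for contrast only): non-respondents are dropped.
    complete_case = n_abstinent / n_observed
    return missing_as_smoking, complete_case


outcomes = [True] * 30 + [False] * 50 + [None] * 20
print(abstinence_rates(outcomes))  # (0.3, 0.375)
```

The missing = smoking estimate can only be lower than or equal to the complete-case estimate, because every non-respondent is treated as a treatment failure.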

Design and methods  A simulation study based on a large data set with very little missing 12-month follow-up information was undertaken to compare missing data imputation methods (assuming non-respondents smoke, propensity score matching and optimal matching) under various assumptions about how the missing data arose (randomly generated missing values, increased non-response among smokers, and a hybrid of the two).
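The simulation design can be sketched roughly as follows: generate complete outcomes, delete some of them under a chosen missingness mechanism, impute, and compare with the known truth. This is a hedged illustration only. It covers just the assume-smoking rule (not propensity score or optimal matching), and the per-group non-response probabilities in `missing_probs` are hypothetical choices, not the study's parameters.

```python
import random


def missing_probs(mechanism: str, m: float) -> tuple[float, float]:
    """Hypothetical (abstainer, smoker) non-response probabilities.

    m is a per-group probability, so the overall missing fraction
    differs between mechanisms.
    """
    if mechanism == "random":   # missing completely at random
        return m, m
    if mechanism == "smokers":  # extreme case: only smokers fail to respond
        return 0.0, m
    if mechanism == "hybrid":   # assumed mix: abstainers respond more often
        return m / 2, m
    raise ValueError(f"unknown mechanism: {mechanism}")


def simulate_bias(n: int, p_abstinent: float, m: float,
                  mechanism: str, seed: int = 0) -> float:
    """Bias of the missing = smoking abstinence rate in one simulated sample."""
    rng = random.Random(seed)
    p_miss_abst, p_miss_smoker = missing_probs(mechanism, m)
    true_abst = 0
    imputed_abst = 0
    for _ in range(n):
        abstinent = rng.random() < p_abstinent
        p_miss = p_miss_abst if abstinent else p_miss_smoker
        missing = rng.random() < p_miss
        true_abst += abstinent
        # Missing = smoking: a missing abstainer is recorded as a smoker.
        imputed_abst += abstinent and not missing
    return (imputed_abst - true_abst) / n


for mech in ("random", "smokers", "hybrid"):
    print(mech, round(simulate_bias(5000, 0.3, 0.2, mech), 3))
```

Under the extreme "smokers only" assumption the rule is exactly unbiased, which is why the convention is attractive; once abstainers also go missing, it underestimates abstinence.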

Findings  All missing data imputation methods resulted in some degree of bias, which increased with the amount of missing data.
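As a hedged illustration of why bias grows with the amount of missing data, consider only the assume-smoking rule under purely random non-response. This is an assumption chosen for tractability; it does not describe the study's matching methods or its other missingness mechanisms.

```latex
Let $p$ be the true abstinence rate and $m$ the probability that a subject
is missing at follow-up, independent of smoking status. Under the
missing = smoking rule an abstainer is counted as abstinent only if observed, so
\[
\mathbb{E}\bigl[\hat{p}_{\mathrm{MS}}\bigr] = p(1-m),
\qquad
\operatorname{Bias}\bigl(\hat{p}_{\mathrm{MS}}\bigr) = p(1-m) - p = -pm .
\]
For example, $p = 0.3$ and $m = 0.2$ give a bias of $-0.06$.
```

In this simple case the bias is linear in $m$: doubling the missing fraction doubles the underestimate.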

Conclusion  None of the missing data imputation methods currently available can compensate for bias when there are substantial amounts of missing data.
