Standard Article

Missing Data

Statistical Theory and Methods

  1. Philip K. Hopke1,
  2. Chuanhai Liu2,
  3. Donald B. Rubin3

Published Online: 15 SEP 2006

DOI: 10.1002/9780470057339.vam023

Encyclopedia of Environmetrics

How to Cite

Hopke, P. K., Liu, C. and Rubin, D. B. 2006. Missing Data. Encyclopedia of Environmetrics. 4.

Author Information

  1. Clarkson University, Potsdam, NY, USA

  2. Bell Laboratories, NJ, USA

  3. Harvard University, MA, USA

Abstract

A common problem with environmental data is that no data exist for some samples or sampling intervals. Environmental studies cannot be designed as experiments that can be reproduced, so a sample that was not taken or was lost cannot be recovered. Such a loss represents a total loss of information about the content of the species that were not directly determined for that sample. In addition, resources are often sufficient only for partial sampling (every xth time interval) rather than contiguous measurements, or for analyzing only some of the samples collected. Either way, values are not obtained, and no information is available about the variables of interest during those sampling intervals. Incomplete data make it impossible to apply standard complete-data methods, such as distribution fitting, directly. Filling in the missing values therefore has strong appeal: standard complete-data methods can then be applied, and existing software can be used without modification. This general strategy greatly reduces the burden of developing methods and computer code for analyzing incomplete data.
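As a rough illustration of the fill-in strategy, the sketch below completes a partially sampled series and then applies an ordinary complete-data summary to it. The data values, the `mean_impute` helper, and the choice of mean imputation are assumptions made for illustration only; they are not taken from the article, and mean imputation is merely the simplest possible fill-in rule.

```python
import statistics

# Hypothetical concentrations sampled every third interval;
# None marks intervals for which no value was obtained.
observed = [4.0, None, None, 6.0, None, None, 5.0, None, None, 9.0]


def mean_impute(values):
    """Fill each missing entry with the mean of the observed entries."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("no observed values to impute from")
    fill = statistics.mean(present)
    return [fill if v is None else v for v in values]


completed = mean_impute(observed)

# Standard complete-data summaries now run unchanged on the filled-in series.
print(statistics.mean(completed))
print(statistics.stdev(completed))
```

Note that the standard deviation of the completed series is smaller than that of the observed values alone, because every filled-in value sits exactly at the mean. A single naive fill-in of this kind therefore understates variability, which is one reason more careful fill-in procedures may be preferred.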