Conflicts of interest: the authors have declared no conflicts of interest.
Multiple imputation for national public-use datasets and its possible application for gestational age in United States Natality files
Version of Record online: 30 AUG 2007
Paediatric and Perinatal Epidemiology
Special Issue: Addressing Gestational Age Measurement Using Birth Certificate Data
Volume 21, Issue Supplement s2, pages 97–105, September 2007
How to Cite
Parker, J. D. and Schenker, N. (2007), Multiple imputation for national public-use datasets and its possible application for gestational age in United States Natality files. Paediatric and Perinatal Epidemiology, 21: 97–105. doi: 10.1111/j.1365-3016.2007.00866.x
- Issue online: 30 AUG 2007
Keywords
- missing data;
- multiple imputation;
- birth records
Multiple imputation (MI) is a technique for handling missing data in a public-use dataset. With MI, two or more completed versions of the dataset are created, each containing possibly different but reasonable replacements for the missing data. Users analyse the completed datasets separately with standard techniques and then combine the results using simple formulae in a way that allows the extra uncertainty due to missing data to be assessed. An advantage of this approach is that the resulting public-use data can be analysed by a variety of users for a variety of purposes, without each user needing to devise a method to deal with the missing data. A recent example for a large public-use dataset is the MI of the family income and personal earnings variables in the National Health Interview Survey. We propose using MI to handle the problems of missing gestational ages and implausible birthweight–gestational age combinations in national vital statistics datasets. This paper describes MI and gives examples of MI for public-use datasets, summarises methods that have been used for identifying implausible gestational age values on birth records, and combines these ideas by setting forth scenarios for identifying and then multiply imputing missing and implausible gestational age values. Because missing and implausible gestational age values are not missing completely at random, multiple imputation, which incorporates both the existing relationships among the variables and the uncertainty added by the imputation, may lead to more valid inferences in some analytical studies than simply excluding birth records with inadequate data.
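
The abstract does not spell out the "simple formulae" used to combine results from the completed datasets. A minimal sketch follows, assuming they are the standard combining rules usually attributed to Rubin: the pooled estimate is the mean of the per-dataset estimates, and the total variance is the average within-imputation variance plus the between-imputation variance inflated by (1 + 1/m), where m is the number of completed datasets. The function name `combine_rubin`, the `Pooled` container, and the numbers in the usage note are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from statistics import mean, variance
from typing import NamedTuple, Sequence


class Pooled(NamedTuple):
    """Pooled results across m completed datasets (hypothetical container)."""
    estimate: float  # mean of the per-dataset point estimates
    within: float    # average of the per-dataset variances
    between: float   # sample variance of the per-dataset estimates
    total: float     # within + (1 + 1/m) * between


def combine_rubin(estimates: Sequence[float], variances: Sequence[float]) -> Pooled:
    """Combine per-dataset estimates and their variances.

    Assumes the standard MI combining rules; the abstract itself does not
    state which formulae the authors use.
    """
    m = len(estimates)
    if m < 2:
        raise ValueError("at least two completed datasets are required")
    if len(variances) != m:
        raise ValueError("need one variance per estimate")

    q_bar = mean(estimates)   # pooled point estimate
    u_bar = mean(variances)   # within-imputation variance
    b = variance(estimates)   # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b  # total variance, reflecting imputation uncertainty
    return Pooled(q_bar, u_bar, b, t)
```

For example, with three completed datasets giving illustrative estimates 1.0, 2.0 and 3.0, each with variance 0.5, `combine_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])` returns a pooled estimate of 2.0 and a total variance of 0.5 + (4/3) × 1.0 ≈ 1.83. The between-imputation term is what carries the extra uncertainty due to missing data; analysing a single completed dataset would report only the 0.5.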