Imputing unobserved values with the EM algorithm under left and right-truncation, and interval censoring for estimating the size of hidden populations



Capture–recapture techniques have long been used to estimate population size. Estimators usually rely on frequency counts for the number of trappings; however, these counts may not be available for a particular problem, for example if the original data set has been lost and only a summary table remains. Here, we investigate techniques for such cases through specific examples; the motivating example is an epidemiological study by Mosley et al., which focussed on a cholera outbreak in East Pakistan. To demonstrate the wider applicability of the technique, we also look at a study predicting the long-term outlook of the AIDS epidemic using information on the number of sexual partners. A new estimator is developed which uses the EM algorithm to impute the unobserved values and then uses these imputed values in a similar way to the existing estimators. The results show that a truncated approach – mimicking the Chao lower bound approach – gives an improved estimate when population homogeneity is violated.
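To make the two ingredients named above concrete, the following is a minimal sketch of the general idea, not the estimator developed in this work. It assumes a homogeneous Poisson model for the capture counts, in which only the zero class is unobserved, and uses EM to impute the number of units never captured. It also computes the standard Chao lower bound, n + f1^2 / (2 f2), for comparison. The function names and the frequency table are hypothetical and chosen purely for illustration.

```python
import math


def chao_lower_bound(freq):
    """Chao's lower bound for population size: n + f1^2 / (2 * f2).

    `freq` maps a capture count k >= 1 to the number of units seen k times.
    """
    n = sum(freq.values())
    f1, f2 = freq.get(1, 0), freq.get(2, 0)
    if f2 == 0:
        raise ValueError("f2 must be positive for this form of the bound")
    return n + f1 * f1 / (2 * f2)


def em_zero_truncated_poisson(freq, tol=1e-10, max_iter=10000):
    """EM for a Poisson model where the zero count is unobserved.

    Assumption: capture counts are i.i.d. Poisson(lam) (homogeneity).
    E-step: impute f0, the expected number of units with zero captures.
    M-step: update lam as the mean over the completed data.
    Returns (estimated population size, estimated lam).
    """
    n = sum(freq.values())                          # observed units
    total = sum(k * f for k, f in freq.items())     # total captures
    f0 = 0.0                                        # initial imputed zero count
    lam = total / n
    for _ in range(max_iter):
        lam = total / (n + f0)                      # M-step
        p0 = math.exp(-lam)
        new_f0 = n * p0 / (1.0 - p0)                # E-step
        converged = abs(new_f0 - f0) < tol
        f0 = new_f0
        if converged:
            break
    return n + f0, lam


# Hypothetical frequency table: 50 units seen once, 30 twice, and so on.
example_freq = {1: 50, 2: 30, 3: 15, 4: 5}
n_hat_em, lam_hat = em_zero_truncated_poisson(example_freq)
n_hat_chao = chao_lower_bound(example_freq)
print(f"EM (zero-truncated Poisson): {n_hat_em:.1f}, lambda = {lam_hat:.3f}")
print(f"Chao lower bound: {n_hat_chao:.1f}")
```

At convergence the EM iteration reproduces the zero-truncated Poisson maximum likelihood estimate, so this sketch inherits the homogeneity assumption. The Chao bound uses only the counts of units seen once and twice, which is the property a truncated approach would mimic.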