• Binary data
• Confounding
• Cost efficiency
• Diagnostic screening
• Experimental design
• Generalized linear models
• Group testing
• HIV

Summary. Whether the aim is to diagnose individuals or to estimate prevalence, many epidemiological studies have demonstrated the successful use of tests on pooled sera. These tests detect whether at least one sample in the pool is positive. Although pooling was originally designed to reduce diagnostic costs, it also lowers false positive and false negative rates in low-prevalence settings and yields more precise prevalence estimates. Current methods aim to estimate the average population risk from diagnostic tests on pools. In this article, we extend the original class of risk estimators to adjust for covariates recorded on individual pool members. Maximum likelihood theory provides a flexible estimation method that handles different covariate values within a pool, different pool sizes, and errors in test results. In special cases, software for generalized linear models can be used. Pool design has a strong impact on precision and cost efficiency, with covariate-homogeneous pools carrying the largest amount of information. We perform joint pool and sample size calculations using information from individual contributors to the pool and show that a good design can substantially reduce cost and yet increase precision. The methods are illustrated using data from a Kenyan surveillance study of HIV. Compared to individual testing, age-homogeneous, optimally sized pools of average size seven reduce cost to 44% of the original price with virtually no loss in precision.
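
As a rough illustration of the kind of likelihood described above, the following Python sketch computes the probability that a pool tests positive when each member has its own covariate-dependent risk, and sums the resulting log-likelihood over pools. It is a minimal sketch under stated assumptions, not the estimator developed in the article: the logistic link, the function names, and the treatment of sensitivity and specificity as constants that do not depend on pool size are all assumptions made here for illustration.

```python
import math


def individual_risk(beta, x):
    """Assumed logistic risk p = 1 / (1 + exp(-beta . x)) for one pool member."""
    eta = sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-eta))


def pool_positive_prob(beta, pool_covariates, sensitivity=1.0, specificity=1.0):
    """Probability that a pooled test reads positive.

    A pool is truly positive when at least one member is positive, so
    P(truly negative) is the product of (1 - p_i) over the members.
    Test error is folded in via assumed constant sensitivity and specificity.
    """
    all_negative = 1.0
    for x in pool_covariates:
        all_negative *= 1.0 - individual_risk(beta, x)
    return sensitivity * (1.0 - all_negative) + (1.0 - specificity) * all_negative


def log_likelihood(beta, pools, results, sensitivity=1.0, specificity=1.0):
    """Bernoulli log-likelihood over pools; pools may differ in size."""
    total = 0.0
    for pool, z in zip(pools, results):
        q = pool_positive_prob(beta, pool, sensitivity, specificity)
        total += math.log(q) if z else math.log(1.0 - q)
    return total


# Hypothetical toy data: intercept-only model, two pools of different sizes.
pools = [[[1.0], [1.0]], [[1.0], [1.0], [1.0]]]
results = [1, 0]
example_ll = log_likelihood([0.0], pools, results)
```

Maximizing `log_likelihood` over `beta` with any general-purpose optimizer would give a covariate-adjusted estimate under these assumptions. Because each member enters through its own risk, pools with mixed covariate values and unequal sizes need no special handling.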