Sequential Analysis of Longitudinal Data in a Prospective Nested Case–Control Study



Summary The nested case–control design is a relatively new type of observational study whereby a case–control approach is employed within an established cohort. In this design, we observe cases and controls longitudinally by sampling all cases whenever they occur but controls at certain time points. Controls can be obtained at time points randomly scheduled or prefixed for operational convenience. This design with longitudinal observations is efficient in terms of cost and duration, especially when the disease is rare and the assessment of exposure levels is difficult. In our design, we propose sequential sampling methods and study both (group) sequential testing and estimation methods so that the study can be stopped as soon as the stopping rule is satisfied. To make such a longitudinal sampling more efficient in terms of both numbers of subjects and replications, we propose applying sequential sampling methods to subjects and replications, simultaneously, until the information criterion is fulfilled. This simultaneous sequential sampling on subjects and replicates is more flexible for practitioners designing their sampling schemes, and is different from the classical approaches used in longitudinal studies. We newly define the σ-field to accommodate our proposed sampling scheme, which contains mixtures of independent and correlated observations, and prove the asymptotic optimality of sequential estimation based on the martingale theories. We also prove that the independent increment structure is retained so that the group sequential method is applicable. Finally, we present results by employing sequential estimation and group sequential testing on both simulated data and real data on children's diarrhea.