Get access

Conditional Generalized Estimating Equations for the Analysis of Clustered and Longitudinal Data



Summary A common and important problem in clustered sampling designs is that the effect of within-cluster exposures (i.e., exposures that vary within clusters) on outcome may be confounded by both measured and unmeasured cluster-level factors (i.e., measurements that do not vary within clusters). When some of these are ill/not accounted for, estimation of this effect through population-averaged models or random-effects models may introduce bias. We accommodate this by developing a general theory for the analysis of clustered data, which enables consistent and asymptotically normal estimation of the effects of within-cluster exposures in the presence of cluster-level confounders. Semiparametric efficient estimators are obtained by solving so-called conditional generalized estimating equations. We compare this approach with a popular proposal by Neuhaus and Kalbfleisch (1998, Biometrics54, 638–645) who separate the exposure effect into a within- and a between-cluster component within a random intercept model. We find that the latter approach yields consistent and efficient estimators when the model is linear, but is less flexible in terms of model specification. Under nonlinear models, this approach may yield inconsistent and inefficient estimators, though with little bias in most practical settings.