A Semiparametric Empirical Likelihood Method for Biased Sampling Schemes with Auxiliary Covariates

Summary

We consider a semiparametric inference procedure for data from epidemiologic studies that use a two-component sampling scheme, in which both a simple random sample and multiple samples that depend on the outcome, or on both the outcome and an auxiliary covariate, are observed. This sampling scheme allows investigators to oversample subpopulations believed to carry more information about the regression model, while the simple random sample still provides insight into the underlying population. We focus on settings where no additional information about the parent cohort is available and the sampling probability is nonidentifiable. Our work is motivated by an ongoing study of the association between the mutation level of the epidermal growth factor receptor (EGFR) and the antitumor response to EGFR-targeted therapy in patients with non-small cell lung cancer. The proposed method applies to both binary and multicategory outcome data and allows an arbitrary link function within the framework of generalized linear models. Simulation studies show that the proposed estimator has good small-sample properties. The proposed method is illustrated with a data example.