Volume 37, Issue 15
RESEARCH ARTICLE

Secondary outcome analysis for data from an outcome‐dependent sampling design

Yinghao Pan

Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA

Jianwen Cai

Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA

Matthew P. Longnecker

Epidemiology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, NC, USA

Haibo Zhou

Corresponding Author

E-mail address: zhou@bios.unc.edu

Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA

Correspondence

Haibo Zhou, Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.


First published: 22 April 2018

Abstract

An outcome‐dependent sampling (ODS) scheme is a cost‐effective way to conduct a study. For a study with a continuous primary outcome, an ODS scheme can be implemented in which the expensive exposure is measured only on a simple random sample and on supplemental samples selected from the 2 tails of the primary outcome variable. Given the tremendous cost invested in collecting the primary exposure information, investigators often would like to use the available data to study the relationship between a secondary outcome and the obtained exposure variable. This is referred to as secondary analysis. Secondary analysis in ODS designs can be tricky, as the ODS sample is not a random sample from the general population. In this article, we use the inverse probability weighted and augmented inverse probability weighted estimating equations to analyze the secondary outcome for data obtained from the ODS design. We do not make any parametric assumptions on the primary and secondary outcomes and only specify the form of the regression mean models, thus allowing an arbitrary error distribution. Our approach is robust to second‐ and higher‐order moment misspecification. It also leads to more precise estimates of the parameters by effectively using all the available participants. Through simulation studies, we show that the proposed estimator is consistent and asymptotically normal. Data from the Collaborative Perinatal Project are analyzed to illustrate our method.
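
To make the weighting idea concrete, the following is a minimal sketch of an inverse probability weighted fit for a secondary outcome under an ODS-type design. It is not the estimator from the article, whose exact estimating equations are not reproduced on this page. Everything specific here is an assumption made for illustration: the linear mean model, the simulated data, Bernoulli (rather than fixed-size) sampling, the 10th and 90th percentile cut points, the sampling probabilities, and the hypothetical function names `simulate_ods` and `ipw_linear_fit`. The augmented version mentioned in the abstract is not shown.

```python
import numpy as np


def simulate_ods(n=200_000, p_srs=0.1, p_tail=0.5, rho=0.7, seed=0):
    """Simulate a cohort and an ODS-type selection (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                      # expensive exposure
    e1 = rng.normal(size=n)
    # Secondary-outcome error correlated with the primary-outcome error.
    e2 = rho * e1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    y1 = 0.5 * x + e1                           # primary outcome (drives sampling)
    y2 = 1.0 + 0.3 * x + e2                     # secondary outcome; true beta = (1.0, 0.3)

    # Tails of the primary outcome: below the 10th or above the 90th percentile.
    lo, hi = np.quantile(y1, [0.1, 0.9])
    in_tail = (y1 < lo) | (y1 > hi)

    # Simple random sample, plus a supplemental sample drawn from the tails.
    in_srs = rng.random(n) < p_srs
    in_supp = ~in_srs & in_tail & (rng.random(n) < p_tail)
    selected = in_srs | in_supp

    # Known inclusion probability for every cohort member.
    pi = p_srs + (1 - p_srs) * p_tail * in_tail
    return x, y2, selected, pi


def ipw_linear_fit(x, y, selected, pi):
    """Solve sum_i (R_i / pi_i) * x_i * (y_i - x_i' beta) = 0 for a linear mean model."""
    design = np.column_stack([np.ones(selected.sum()), x[selected]])
    w = 1.0 / pi[selected]                      # inverse probability weights
    xtw = design.T * w                          # X' W without forming a diagonal matrix
    return np.linalg.solve(xtw @ design, xtw @ y[selected])


x, y2, selected, pi = simulate_ods()
beta_ipw = ipw_linear_fit(x, y2, selected, pi)
print("IPW estimate of (intercept, slope):", beta_ipw)
```

Only the mean model for the secondary outcome is specified in this sketch; no distribution is assumed for the errors, which mirrors the abstract's description in spirit. The exposure is used only for selected participants, as it would be in a real ODS study.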

