Complex Samples and Regression-Based Inference: Considerations for Consumer Researchers




  • Robert B. Nielsen is an Associate Professor in the Department of Financial Planning, Housing and Consumer Economics, University of Georgia. Martin C. Seay is an Assistant Professor in the School of Family Studies and Human Services, Kansas State University. Both authors contributed equally.


This article demonstrates that researchers who treat data collected through complex sampling procedures as if they came from a simple random sample (SRS) may draw improper inferences when estimating regression models. Using complex sample data from the 2004 panel of the Survey of Income and Program Participation (SIPP), two models, an ordinary least squares (OLS) regression and a logistic regression, were each estimated using three methods: SRS with and without population weights, Taylor series linearization, and Fay's Balanced Repeated Replication (BRR). The results of the alternative models show that, depending on the variables of interest, authors who fail to incorporate sample design information or to consider the effects of weighting may draw improper inferences from their regression models. Reasons why researchers continue to neglect design-based variance estimation for complex samples are proposed and discussed, and example SAS and Stata code is offered to encourage adoption by the consumer research community.
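To make the contrast in the abstract concrete, here is a minimal Python sketch (numpy only) that compares naive SRS standard errors from unweighted OLS with Fay's BRR standard errors for a population-weighted regression. It assumes a simplified design with exactly two PSUs per stratum. It builds balanced half-sample replicates from a Sylvester-type Hadamard matrix, scales the selected half by 2 - k and the other half by k, and estimates the variance as the sum of squared replicate deviations divided by R(1 - k)^2. The Fay factor k = 0.5, all function names (`fay_brr_se`, `wls`, `naive_srs_se`, and so on), and the synthetic data are illustrative assumptions. This is not the SIPP replicate-weight file and not the authors' SAS or Stata code.

```python
import numpy as np


def sylvester_hadamard(n):
    """Return an n x n Hadamard matrix of +1/-1 entries (n must be a power of 2)."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h


def fay_brr_replicate_weights(w, stratum, psu, k=0.5):
    """Fay BRR replicate weights; assumes exactly two PSUs (coded 0/1) per stratum."""
    strata = np.unique(stratum)
    n_rep = 2
    # Need at least H + 1 rows so the all-ones first column can be skipped,
    # which keeps the half-sample assignments fully balanced across replicates.
    while n_rep < len(strata) + 1:
        n_rep *= 2
    had = sylvester_hadamard(n_rep)
    reps = np.empty((n_rep, len(w)))
    for r in range(n_rep):
        factor = np.empty(len(w))
        for j, h in enumerate(strata):
            in_h = stratum == h
            chosen = psu == (0 if had[r, j + 1] > 0 else 1)
            factor[in_h & chosen] = 2.0 - k  # selected half-sample is up-weighted
            factor[in_h & ~chosen] = k       # with k > 0 the other half is kept, down-weighted
        reps[r] = w * factor
    return reps


def fay_brr_se(estimator, w, stratum, psu, k=0.5):
    """Point estimate and Fay BRR SE: V = sum_r (t_r - t)^2 / (R * (1 - k)^2)."""
    theta = np.atleast_1d(estimator(w))
    reps = fay_brr_replicate_weights(w, stratum, psu, k)
    dev = np.array([np.atleast_1d(estimator(wr)) - theta for wr in reps])
    var = (dev ** 2).sum(axis=0) / (len(reps) * (1.0 - k) ** 2)
    return theta, np.sqrt(var)


def wls(X, y, w):
    """Population-weighted least squares coefficients."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


def naive_srs_se(X, y):
    """Unweighted OLS that treats the sample as SRS (ignores strata, PSUs, weights)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])
    return beta, np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))


if __name__ == "__main__":
    # Synthetic two-PSU-per-stratum design with a shared PSU-level effect.
    rng = np.random.default_rng(0)
    n_strata, per_psu = 10, 50
    stratum = np.repeat(np.arange(n_strata), 2 * per_psu)
    psu = np.tile(np.repeat([0, 1], per_psu), n_strata)
    psu_effect = rng.normal(size=2 * n_strata)[2 * stratum + psu]
    x = rng.normal(size=stratum.size) + psu_effect
    y = 1.0 + 0.5 * x + psu_effect + rng.normal(size=stratum.size)
    w = rng.uniform(0.5, 2.0, size=stratum.size)
    X = np.column_stack([np.ones_like(x), x])

    beta_srs, se_srs = naive_srs_se(X, y)
    beta_brr, se_brr = fay_brr_se(lambda wt: wls(X, y, wt), w, stratum, psu, k=0.5)
    print("naive SRS:", beta_srs, se_srs)
    print("Fay BRR:  ", beta_brr, se_brr)
```

The estimator is passed in as a function. The same replicate machinery therefore works for a weighted total, a weighted regression, or (with a suitable fitting routine) a logistic model. For a linear statistic such as a weighted total, the Fay variance does not depend on k and reduces to the familiar two-PSU-per-stratum formula. In applied work one would normally use the replicate weights supplied with the survey, where available, rather than constructing them.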