  • case–cohort;
  • estimated weights;
  • failure time;
  • inverse probability weights;
  • missing data



Abstract.  We consider semiparametric models for which solution of Horvitz–Thompson or inverse probability weighted (IPW) likelihood equations with two-phase stratified samples leads to √N-consistent and asymptotically Gaussian estimators of both Euclidean and non-parametric parameters. For Bernoulli (independent and identically distributed) sampling, standard theory shows that the Euclidean parameter estimator is asymptotically linear in the IPW influence function. By proving weak convergence of the IPW empirical process, and borrowing results on weighted bootstrap empirical processes, we derive a parallel asymptotic expansion for finite population stratified sampling. Several of our key results have been derived already for Cox regression with stratified case–cohort and more general survey designs. This paper is intended to help interpret this previous work and to pave the way towards a general Horvitz–Thompson approach to semiparametric inference with data from complex probability samples.
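
As a rough illustration of the inverse probability weighting idea only, the sketch below solves the simplest weighted estimating equation, sum_i w_i (y_i - mu) = 0, for a population mean under finite population stratified sampling without replacement. It is not the semiparametric estimator studied in the paper. The function names, the stratum labels, the sample sizes and the simulated data are all hypothetical choices made for this example.

```python
import random


def stratified_sample(strata, n_per_stratum, rng):
    """Phase two: draw n_j of the N_j units in each stratum without replacement.

    Returns (y, weight) pairs, where weight = N_j / n_j is the inverse of the
    sampling probability for every unit in stratum j.
    """
    sample = []
    for label, units in strata.items():
        n_j = n_per_stratum[label]
        weight = len(units) / n_j
        for y in rng.sample(units, n_j):
            sample.append((y, weight))
    return sample


def ipw_mean(sample):
    """Solve the weighted estimating equation sum_i w_i * (y_i - mu) = 0 for mu."""
    total_weight = sum(w for _, w in sample)
    return sum(w * y for y, w in sample) / total_weight


# Hypothetical phase-one cohort: a small "cases" stratum and a large
# "controls" stratum (simulated values, for illustration only).
rng = random.Random(0)
strata = {
    "cases": [rng.gauss(2.0, 1.0) for _ in range(200)],
    "controls": [rng.gauss(0.0, 1.0) for _ in range(1800)],
}
# Assumed design: keep every case, subsample 300 of the 1800 controls.
n_per_stratum = {"cases": 200, "controls": 300}

sample = stratified_sample(strata, n_per_stratum, rng)
estimate = ipw_mean(sample)
population_size = sum(len(units) for units in strata.values())
population_mean = sum(sum(units) for units in strata.values()) / population_size

print(f"IPW estimate: {estimate:.3f}, full-cohort mean: {population_mean:.3f}")
```

With fixed stratum sample sizes the weights sum to the cohort size N, so the solution of this weighted equation coincides with the Horvitz–Thompson estimate of the total divided by N.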



Acknowledgements.  The second author owes thanks to Galen Shorack for a helpful discussion concerning the representation in appendix A. This work was supported in part by grants from the US National Institutes of Health and National Science Foundation.

