• Collinearity;
  • Diagnosis;
  • Imputation;
  • Nonparametric maximum likelihood;
  • Survival analysis

Summary.  Multiple categorical variables are commonly used in medical and epidemiological research to measure specific aspects of human health and functioning. To analyze such data, models have been developed considering these categorical variables as imperfect indicators of an individual's “true” status of health or functioning. In this article, the latent class regression model is used to model the relationship between covariates, a latent class variable (the unobserved status of health or functioning), and the observed indicators (e.g., variables from a questionnaire). The Cox model is extended to encompass a latent class variable as predictor of time-to-event, while using information about latent class membership available from multiple categorical indicators. The expectation-maximization (EM) algorithm is employed to obtain maximum likelihood estimates, and standard errors are calculated based on the profile likelihood, treating the nonparametric baseline hazard as a nuisance parameter. A sampling-based method for model checking is proposed. It allows for graphical investigation of the assumption of proportional hazards across latent classes. It may also be used for checking other model assumptions, such as no additional effect of the observed indicators given latent class. The usefulness of the model framework and the proposed techniques are illustrated in an analysis of data from the Women's Health and Aging Study concerning the effect of severe mobility disability on time-to-death for elderly women.