  • Latent class;
  • mixture model;
  • psychometrics;
  • cigarette smoking patterns;
  • menthol

Recent advances in computation have greatly broadened the range of statistical models researchers may consider for their work. To the extent that a statistical model can represent both the phenomena of interest and the data-generating mechanism, this expansion of the range of specifiable models also opens new opportunities for theory development, modeling and testing. That is, new statistical models may lead to new research questions. Scientific areas that have proven difficult to clarify with standard approaches may be well suited to examination with new methodologies. The role of menthol in tobacco use and associated outcomes is one such domain [1]. This commentary introduces a large class of statistical approaches that fits well with existing models of tobacco use yet also introduces important flexibility, potentially leading to a clearer understanding of cigarette smoking and menthol's role in it.

Standard analytical models such as regression or analysis of variance rely upon multiple assumptions, one of which is that the model applies equally well to everyone in the sample. That is, when a single regression model is estimated, it is assumed that the model and its estimates describe everyone in the data (and in the population, if one is generalizing). Authors then often extrapolate from sample-based results to individual-level processes. For example, finding a positive association between menthol preference and nicotine dependence in a sample could lead to a discussion of menthol exacerbating dependence symptoms at the individual level. This might then be followed by an effort to remove menthol from tobacco products in the hope that doing so would reduce nicotine dependence, increase cessation success and improve health outcomes. The association between menthol and nicotine dependence is estimated at the sample level, but the theoretically oriented discussion and the potential intervention strategy operate at the individual level.

Historically, this single-population assumption was untestable. However, a broad class of statistical models that allows one to examine it has been, and continues to be, developed. Latent class and mixture models (abbreviated here as LC models for brevity) [2–4] are statistical models that allow qualitative differences among population subgroups (i.e. the latent classes or mixture components). Essentially, the classes capture different patterns of association among measured items. Rather than assuming that menthol preference and nicotine dependence are positively associated among all members of a population, one could instead test a model with two subgroups: one in which menthol preference and nicotine dependence are positively associated and another in which they are unrelated.
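To make the idea concrete, the following is a minimal sketch of the standard latent class model for binary items under the usual local independence assumption (items are unrelated within each class). The two items, labelled here as menthol preference and high nicotine dependence, and all parameter values are hypothetical, chosen only for illustration; the function names `pattern_prob` and `posterior` are likewise illustrative. Even though the items are independent within each class, mixing the two classes produces a positive association in the pooled population, and class membership for a given response pattern can only be inferred probabilistically.

```python
import math
from itertools import product


def pattern_prob(y, gamma, rho):
    """P(response pattern y) under a latent class model with binary items.

    gamma[c] is the prevalence of class c; rho[c][j] = P(item j = 1 | class c).
    Local independence: within a class, item probabilities simply multiply.
    """
    return sum(
        g * math.prod(r[j] if y[j] else 1.0 - r[j] for j in range(len(y)))
        for g, r in zip(gamma, rho)
    )


def posterior(y, gamma, rho):
    """Posterior class-membership probabilities for pattern y (Bayes' rule)."""
    joint = [
        g * math.prod(r[j] if y[j] else 1.0 - r[j] for j in range(len(y)))
        for g, r in zip(gamma, rho)
    ]
    total = sum(joint)
    return [p / total for p in joint]


# Hypothetical two-class model; items are (menthol preference, high dependence).
gamma = [0.4, 0.6]
rho = [[0.8, 0.7],   # class 1: both endorsements likely
       [0.2, 0.3]]   # class 2: both endorsements unlikely

probs = {y: pattern_prob(y, gamma, rho) for y in product((0, 1), repeat=2)}
p_menthol = probs[(1, 0)] + probs[(1, 1)]
p_dependent = probs[(0, 1)] + probs[(1, 1)]
# Pooled covariance is positive even though it is zero within each class.
covariance = probs[(1, 1)] - p_menthol * p_dependent
print(f"P(both) = {probs[(1, 1)]:.4f}, covariance = {covariance:.4f}")
print("posterior for pattern (1, 1):",
      [round(p, 3) for p in posterior((1, 1), gamma, rho)])
```

With these made-up values the pooled covariance is 0.0576, and a respondent endorsing both items has a posterior probability of about 0.86 of belonging to class 1: membership is inferred, not observed.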

If the latter situation were true, i.e. there were two subpopulations with different associations between menthol preference and nicotine dependence, and that class structure were ignored, then the sample-based parameter estimates would be unlikely to represent anyone. Sample-averaged estimates would describe the overall sample, but they would be useful only for describing the sample; they would not help to clarify the connection between menthol and an individual's nicotine dependence. Indeed, estimates from models that ignore class structure may be quite misleading. How misleading depends upon the sizes of the classes and on how much the associations differ across classes.

If the two classes in this example were the same size in the population (50% each), then the sample-averaged estimate would be roughly the average of the two within-class associations, a value that describes neither class. Conversely, if 90% of the sample were from class 1 and 10% from class 2, the sample-averaged estimate would be much closer to the class 1 association, although it would probably still be attenuated. In either case, ignoring the class structure may make accurate substantive interpretation far more difficult.
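The arithmetic in the preceding paragraph can be checked with a small simulation. This is a sketch under simplifying assumptions that are not from the original text: the predictor x (standing in for menthol preference) is standard normal in both classes, the outcome y (standing in for dependence) has a slope of 0.6 in class 1 and 0 in class 2, and the noise is standard normal. The helper name `pooled_slope` is hypothetical. A single regression is then fitted to the pooled data while ignoring class membership.

```python
import random


def pooled_slope(p_class1, b1=0.6, b2=0.0, n=100_000, seed=1):
    """OLS slope of y on x in data pooled over two latent classes.

    Hypothetical setup: x ~ N(0, 1) in both classes; y = b * x + N(0, 1) noise,
    with b = b1 in class 1 and b = b2 in class 2. Class membership is
    simulated but never used in the fit, mimicking an analysis that ignores it.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        b = b1 if rng.random() < p_class1 else b2
        x = rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(b * x + rng.gauss(0.0, 1.0))
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


for share in (0.5, 0.9):
    print(f"class 1 share {share:.0%}: pooled slope = {pooled_slope(share):.3f}")
```

Under these assumptions the pooled slope converges to the prevalence-weighted average of the class slopes, so the two printed values should be close to 0.30 and 0.54: the second is nearer to class 1's 0.6 but attenuated, and neither describes class 2. If the classes also differed in their mean levels of x or y, the pooled slope would no longer be a simple weighted average and could be distorted further.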

Tobacco researchers have long discussed different types of smokers. Class or categorical models of tobacco use progression are popular [5], and different types of smokers have also been discussed in the adult smoking literature [6–9]. The notion of qualitative differences underlies these group labels. For example, light smokers are often described in terms qualitatively different from those applied to regular or stereotypical smokers: non-dependent, low-rate, driven by external factors, social smokers, etc. The patterns of association among items implied by qualitatively different types of smokers map clearly onto the LC statistical model.

In addition to these different types of smokers, group differences in smoking have been identified across demographic and other social categories [10]. For example, African American and Hispanic smokers have historically had lower smoking rates and lower lifetime consumption. Many of these same demographic and social groups also experience tobacco-related health disparities [11]. This combination of lower typical use and relatively poorer health outcomes has motivated some of the interest in, and research on, menthol. For example, African Americans are more likely to prefer mentholated cigarettes [12], and these smokers also experience some of the worst tobacco-related health disparities [13]. However, these differences among demographic and social groups raise questions about the structure of smoking within these groups. Do the different types of smokers mentioned above look the same across demographic groups? Are there simply fewer heavy smokers among African Americans (for example), or does the profile of heavy smokers among African Americans differ from the stereotypical profile? Are light smokers similar across demographic groups, or do they vary? Finally, does menthol play the same role in smoking across demographic groups and across qualitatively different types of smokers?

LC models conform well to the research situation described. Qualitatively different types are discussed, yet a precise description of these groups is unavailable. In LC models, unlike discriminant function analysis or logistic regression, subgroup membership is unobservable, hence the ‘latent’ in latent class. LC models may be used for either exploratory or confirmatory analysis, similar to factor analysis [14]. In the absence of theory indicating how many and what classes to expect, LC analysis is exploratory. What are the prevalent cigarette smoking patterns? Are they comparable across demographic and other social groups? If not, similarities and differences in these smoking types may be important theoretically. LC models may also be used as psychometric latent variable models, paralleling how the factor model is used as a psychometric model for continuous latent dimensions [15]. This is a necessary step in developing a replicable approach to measuring smoking patterns. As knowledge and theory accrue, theories may be tested in a confirmatory manner. Does a parallel set of classes hold across demographic groups? Do the patterns of effects implied by a particular theory hold for tobacco classes in other demographic groups?
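For the exploratory question of how many classes to retain, one common approach is to fit models with an increasing number of classes and compare an information criterion such as the Bayesian information criterion (BIC). The sketch below illustrates this with a minimal expectation-maximization (EM) fit for binary items on simulated data. The helper names (`fit_lca`, `bic`, `simulate`), the item probabilities, the sample size and the reliance on BIC alone are all assumptions for illustration, not the author's procedure; a real analysis would use one of the dedicated programs, more starting values and additional fit criteria.

```python
import math
import random
from collections import Counter


def fit_lca(data, n_classes, n_starts=5, max_iter=500, tol=1e-7, seed=0):
    """Fit a latent class model to binary items by EM (best of n_starts).

    data: sequence of 0/1 response tuples. Returns (loglik, gamma, rho) where
    gamma[c] is the class prevalence and rho[c][j] = P(item j = 1 | class c).
    loglik is the value from the final E-step.
    """
    counts = Counter(map(tuple, data))  # work with response-pattern frequencies
    patterns, weights = list(counts), list(counts.values())
    n_obs, n_items = sum(weights), len(patterns[0])
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        # Random starting values break the symmetry between classes.
        gamma = [1.0 / n_classes] * n_classes
        rho = [[rng.uniform(0.2, 0.8) for _ in range(n_items)]
               for _ in range(n_classes)]
        prev_ll = -math.inf
        for _ in range(max_iter):
            # E-step: expected class counts for every response pattern.
            expected, ll = [], 0.0
            for y, w in zip(patterns, weights):
                joint = [gamma[c] * math.prod(rho[c][j] if y[j] else 1.0 - rho[c][j]
                                              for j in range(n_items))
                         for c in range(n_classes)]
                total = sum(joint)
                ll += w * math.log(total)
                expected.append([w * p / total for p in joint])
            # M-step: closed-form updates; rho is clamped away from 0 and 1.
            sizes = [max(sum(e[c] for e in expected), 1e-12) for c in range(n_classes)]
            gamma = [s / n_obs for s in sizes]
            rho = [[min(max(sum(e[c] for e, y in zip(expected, patterns) if y[j])
                            / sizes[c], 1e-6), 1.0 - 1e-6)
                    for j in range(n_items)]
                   for c in range(n_classes)]
            converged = ll - prev_ll < tol
            prev_ll = ll
            if converged:
                break
        if best is None or prev_ll > best[0]:
            best = (prev_ll, gamma, rho)
    return best


def bic(loglik, n_classes, n_items, n_obs):
    """Bayesian information criterion; lower values are preferred."""
    n_params = (n_classes - 1) + n_classes * n_items
    return -2.0 * loglik + n_params * math.log(n_obs)


def simulate(n, gamma, rho, seed=42):
    """Draw n response tuples from a latent class model (hypothetical helper)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        c = rng.choices(range(len(gamma)), weights=gamma)[0]
        data.append(tuple(int(rng.random() < p) for p in rho[c]))
    return data


# Hypothetical population: two classes, five binary smoking-pattern items.
true_gamma = [0.6, 0.4]
true_rho = [[0.9, 0.85, 0.8, 0.85, 0.9],
            [0.15, 0.2, 0.1, 0.2, 0.15]]
data = simulate(1000, true_gamma, true_rho)
bics = {}
for k in (1, 2, 3):
    ll, gamma_hat, rho_hat = fit_lca(data, k)
    bics[k] = bic(ll, k, len(data[0]), len(data))
    print(f"{k} class(es): loglik = {ll:.1f}, BIC = {bics[k]:.1f}")
print("number of classes with lowest BIC:", min(bics, key=bics.get))
```

With these well-separated hypothetical classes, the two-class model should have the lowest BIC. Multiple random starts are used because EM can converge to local maxima, a known practical issue with these models.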

LC models may provide a more refined characterization of tobacco use, empirically identifying common patterns between and within populations. If associations between covariates and outcomes do differ across demographic groups and/or smoking patterns, LC models are a natural step forward. Researchers interested in these models should consult the references included here and other peer-reviewed literature to familiarize themselves with model use and performance. Free programs available to estimate these models include PROC LCA and PROC LTA for SAS, poLCA for R and GLLAMM for STATA (listed in order of ease of use, in the author's opinion). Commercial software options include Mplus and LatentGold. These models provide a rich and flexible framework for descriptive as well as theory-building research.


Acknowledgements

This work was supported by NIH grant R37DA18673.


References
  1. Hebert J. R. Invited commentary. Menthol cigarettes and risk of lung cancer. Am J Epidemiol 2003; 158: 617–20.
  2. McCutcheon A. L. Latent Class Analysis. Thousand Oaks, CA: Sage; 1987.
  3. Clogg C. C. Latent class models. In: Arminger G., Clogg C. C., Sobel M. E., editors. Handbook of Statistical Modeling for the Social and Behavioral Sciences. New York: Plenum Press; 1995, p. 311–59.
  4. McLachlan G., Peel D. Finite Mixture Models. New York, NY: Wiley; 2000.
  5. Flay B. Youth tobacco use: risks, patterns, and control. In: Orleans C. T., Slade J. D., editors. Nicotine Addiction: Principles and Management. New York, NY: Oxford University Press; 1993, p. 365–84.
  6. Shiffman S. Tobacco ‘chippers’: individual differences in tobacco dependence. Psychopharmacology (Berl) 1989; 97: 539–47.
  7. Wortley P. M., Husten C. G., Trosclair A., Chrismon J., Pederson L. L. Nondaily smokers: a descriptive analysis. Nicotine Tob Res 2003; 5: 755–9.
  8. Hyland A., Rezaishiraz H., Bauer J., Giovino G. A., Cummings K. M. Characteristics of low-level smokers. Nicotine Tob Res 2005; 7: 461–8.
  9. Zhu S. H., Pulvers K., Zhuang Y., Báezconde-Garbanati L. Most Latino smokers in California are low-frequency smokers. Addiction 2007; 102: 104–11.
  10. US Department of Health and Human Services. Tobacco Use Among U.S. Racial/Ethnic Minority Groups. Atlanta, GA: US Department of Health and Human Services, Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, Office on Smoking and Health; 1998.
  11. Flaherty B. P., Clayton R. R., Alexander L. A., editors. Conceptual and methodological issues for research on tobacco-related health disparities. Addiction 2007; 102 (Suppl 2).
  12. Giovino G. A., Sidney S., Gfroerer J. C., O'Malley P. M., Allen J. A., Richter P. A. et al. Epidemiology of menthol cigarette use. Nicotine Tob Res 2004; 6: S67–S81.
  13. Fagan P., Moolchan E. T., Lawrence D., Fernander A., Ponder P. K. Identifying health disparities across the tobacco continuum. Addiction 2007; 102 (Suppl 2): 5–29.
  14. Comrey A. L., Lee H. B. A First Course in Factor Analysis, 2nd edn. Hillsdale, NJ: Lawrence Erlbaum Associates; 1992.
  15. Flaherty B. P. Assessing reliability of categorical substance use measures with latent class analysis. Drug Alcohol Depend 2002; 68: S7–20.