A Monte Carlo investigation of factors influencing latent class analysis: An application to eating disorder research




Latent class analysis (LCA) has frequently been used to identify qualitatively distinct phenotypes of disordered eating. However, little consideration has been given to methodological factors that may influence the accuracy of these results.


Monte Carlo simulations were used to evaluate methodological factors that may influence the accuracy of LCA under scenarios similar to those seen in previous eating disorder research.


Under these scenarios, the aBIC provided the best overall performance among the information criteria, requiring sample sizes of 300 in both balanced and unbalanced structures to achieve accuracy proportions of at least 80%. The BIC and cAIC required larger samples to achieve comparable performance, while the AIC performed poorly across all conditions. Accuracy was generally lower with unbalanced classes, fewer indicators, greater or nonrandom missing data, violations of the conditional independence assumption, and lower base rates of indicator endorsement.


These results provide critical information for interpreting previous LCA research and designing future classification studies. © 2011 by Wiley Periodicals, Inc. (Int J Eat Disord 2011)


Latent class analysis (LCA) and latent profile analysis (LPA) are frequently used in psychopathology research to identify qualitatively distinct phenotypes. These techniques have particular relevance for the field of eating disorder research, where much recent attention has focused on establishing the validity of systems for classifying eating disorders1 and revising diagnostic criteria for DSM-5.2 A number of studies have utilized LCA3–11 or LPA12–26 to identify eating disorder phenotypes. However, despite the recent popularity of these methods in the field of eating disorders, little consideration has been given to methodological factors that may influence the accuracy of these results. The incorrect specification of eating disorder phenotypes may adversely influence both future research efforts and clinical practice.

Several previous studies have examined methodological factors that may influence the accuracy of LCA results. Lin and Dayton27 evaluated three information criteria for model selection, including the Akaike Information Criterion (AIC28), the Bayesian Information Criterion (BIC29), and the Consistent Akaike Information Criterion (cAIC30). Yang31 also evaluated the accuracy of LCA as a function of several information criteria, including the AIC, BIC, cAIC, and the Sample Size Adjusted Bayesian Information Criterion (aBIC32). These studies also considered the effects of sample size, number of true latent classes, and model complexity. Other simulations have addressed similar concerns.33 However, no simulation studies to date have evaluated the effects of unbalanced class sizes. In addition, to our knowledge, no simulation studies have assessed violations of the local independence assumption, although prior theoretical work has suggested that such violations lead to biased results and the identification of spurious classes.34–36 Finally, no studies have evaluated the impact of sample composition in terms of the ratio of symptomatic-to-asymptomatic cases.

The purpose of this study is to evaluate several methodological factors that may influence the accuracy of LCA results in eating disorder research. These results will provide important information to those designing future studies utilizing LCA. The issues considered in this paper are based on those identified in previous eating disorder studies. As such, this article should serve as a guide for future eating disorder classification research. However, the information provided has relevance to those involved in classification research outside the area of eating disorders, particularly in studying other psychopathologies,37, 38 complex medical conditions such as metabolic syndrome39 and migraine,40 behavioral economics and consumer preferences,41 and genetics research.42, 43


Simulating an Underlying Model

The goal of this study was to explore the factors influencing LCA under conditions commonly seen in eating disorder research. As such, we sought to specify an initial “true model” that was similar in sample composition and structure to those seen in the eating disorder literature. To our knowledge, 24 studies have been published (or are in press) that have used either LCA or LPA to empirically define eating disorder subtypes.3–26 Of these, nine have used LCA.3–11 However, these nine studies vary considerably in terms of sample composition and analytic strategy. Two of the studies7, 9 used unique sets of indicators that were not eating disorder symptoms, while two additional studies8, 10 involved community samples that included a majority of healthy subjects with minimal endorsement of symptoms; these four studies were not used in defining the true model. The remaining five studies focused on either treatment-seeking samples5, 11 or subsets of community samples that exhibited at least some disordered eating symptoms.3, 4, 6 These five reference studies were used in constructing the true model.

Number of Classes

Two of the reference LCA studies identified three classes, two identified four classes, and the remaining study identified five classes. On the basis of these studies, we used a four-class structure to define the true model.

Number of Indicators

The number of indicators in the reference studies ranged from 5 to 15. On the basis of this observed range, we elected to define our model based on 10 indicators.

Conditional Probabilities

For each of the reference studies, the conditional probabilities of indicators for each class were initially categorized as low (0.00–0.19), medium low (0.20–0.39), medium (0.40–0.59), medium high (0.60–0.79), and high (0.80–1.00). Four common patterns were identified. For each of these patterns, we established the probability range of indicators (e.g., 0.00–0.19, 0.20–0.39, etc.) and used a random number generator to determine the exact conditional probability used in the true model for each indicator. We combined the indicators based upon the observed combinations in the reference studies. The final conditional probabilities for the true model are presented in Table 1. Thus, the true model for this simulation has four classes and ten indicators, with conditional probabilities as specified in Table 1.

Table 1. Conditional probabilities of each latent class for “true model”
              Class 1  Class 2  Class 3  Class 4
Indicator 1     0.81     0.44     0.95     0.50
Indicator 2     0.03     0.96     0.46     0.01
Indicator 3     0.51     0.91     0.26     0.02
Indicator 4     0.74     0.92     0.08     0.67
Indicator 5     0.38     0.95     0.61     0.50
Indicator 6     0.25     0.75     0.94     0.21
Indicator 7     0.08     0.82     0.13     0.92
Indicator 8     0.12     0.73     0.89     0.03
Indicator 9     0.06     0.88     0.07     0.01
Indicator 10    0.03     0.59     0.99     0.81
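The data-generating step described above can be sketched as follows. This is an illustrative Python re-implementation (the actual datasets were generated with the Monte Carlo procedures in Mplus); the function and variable names are our own, but the conditional probabilities are those of Table 1 and the unbalanced class proportions are those used later in the simulations.

```python
import random

# Conditional probabilities from Table 1: P(indicator = 1 | class).
# One row per class, one column per indicator (1-10).
COND_PROBS = [
    [0.81, 0.03, 0.51, 0.74, 0.38, 0.25, 0.08, 0.12, 0.06, 0.03],  # Class 1
    [0.44, 0.96, 0.91, 0.92, 0.95, 0.75, 0.82, 0.73, 0.88, 0.59],  # Class 2
    [0.95, 0.46, 0.26, 0.08, 0.61, 0.94, 0.13, 0.89, 0.07, 0.99],  # Class 3
    [0.50, 0.01, 0.02, 0.67, 0.50, 0.21, 0.92, 0.03, 0.01, 0.81],  # Class 4
]

def simulate_lca(n, class_props, rng):
    """Draw n observations: sample a latent class from class_props,
    then draw each binary indicator from its conditional probability."""
    data, labels = [], []
    for _ in range(n):
        u, k, cum = rng.random(), 0, 0.0
        for j, prop in enumerate(class_props):
            cum += prop
            if u < cum:
                k = j
                break
        labels.append(k)
        data.append([1 if rng.random() < p else 0 for p in COND_PROBS[k]])
    return data, labels

rng = random.Random(42)
# Unbalanced structure used in the simulations: 20%, 10%, 45%, 25%.
data, labels = simulate_lca(500, [0.20, 0.10, 0.45, 0.25], rng)
```

In practice the class labels would be discarded before analysis; they are retained here only so that recovery of the true structure can be checked.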

Factors Varied in the Analytic Stage

Seven factors influencing the analyses were varied: sample size, information criterion used, class balance, missing data pattern, local independence violations, number of indicators, and sample composition.

Sample Size

The range of sample sizes across the five reference studies was 429 (11) to 1,179 (5). However, across all LCA and LPA eating disorders studies, sample sizes varied from as low as 55 (14) to as high as 3,723 (17). Therefore, datasets of varying sample sizes were generated and analyzed, with samples of 100, 200, 300, 400, 500, 750, 1,000, and 2,000.

Information Criteria

Although some of the early LCA studies determined the number of classes based upon log-likelihood statistics, most of the later LCA and LPA studies in eating disorders research focused on one or more information criteria, including AIC, BIC, cAIC, and aBIC. The majority of these studies relied heavily on BIC. Based upon these studies, the AIC, BIC, cAIC, and aBIC were used to evaluate models in this study.
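Each of these criteria is a simple function of the maximized log-likelihood, the number of free parameters, and the sample size. The sketch below uses the standard definitions; for a K-class model with M binary indicators, the free parameters are the K·M conditional probabilities plus K−1 class proportions. The numeric inputs in the example are illustrative, not values from the study.

```python
import math

def lca_free_params(n_classes, n_indicators):
    """K*M conditional probabilities plus K-1 class proportions."""
    return n_classes * n_indicators + (n_classes - 1)

def information_criteria(log_lik, n_params, n_obs):
    """Standard definitions of the four criteria compared in the text."""
    return {
        "AIC":  -2 * log_lik + 2 * n_params,
        "BIC":  -2 * log_lik + n_params * math.log(n_obs),
        "cAIC": -2 * log_lik + n_params * (math.log(n_obs) + 1),
        # aBIC replaces n with the Sclove (1987) adjustment (n + 2) / 24.
        "aBIC": -2 * log_lik + n_params * math.log((n_obs + 2) / 24),
    }

# Illustrative values: a 4-class, 10-indicator model fit to n = 500.
ics = information_criteria(log_lik=-2500.0,
                           n_params=lca_free_params(4, 10),
                           n_obs=500)
```

For any realistic n, the per-parameter penalty satisfies AIC < aBIC < BIC < cAIC, which is why the aBIC sits between the lenient AIC and the strict BIC/cAIC.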

Class Balance

Previous simulation studies have used balanced (i.e., equal sample size) classes. However, no eating disorder studies to date have identified latent classes of comparable size. Therefore, the current simulation compared accuracy for balanced classes to unbalanced classes with proportions of 20%, 10%, 45%, and 25% based upon the distribution of latent classes observed in the reference studies.

Missing Data

Four missing data patterns were simulated. First, the overall amount of missing data across indicators was set at either 10% or 20%. Next, the missing data mechanism was specified as either missing completely at random (MCAR; equal probability of missing data for every observation) or not missing at random (NMAR; higher probability of missing data when a symptom is endorsed). NMAR was implemented by assigning a lower probability of missingness to non-endorsed observations and a higher probability to endorsed items, chosen to achieve an overall missing-data probability of 10% or 20%.
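The two mechanisms can be sketched as follows. This is an illustrative Python version, not the study's generation code, and the NMAR probabilities (5% for non-endorsed, 15% for endorsed) are hypothetical values chosen so that endorsed items go missing more often while the overall rate lands near 10%.

```python
import random

MISSING = None  # sentinel for a missing response

def apply_mcar(data, rate, rng):
    """MCAR: every cell has the same probability of being deleted."""
    return [[MISSING if rng.random() < rate else v for v in row]
            for row in data]

def apply_nmar(data, p_if_0, p_if_1, rng):
    """NMAR: endorsed items (1) are more likely to be missing than
    non-endorsed items (0); p_if_0 < p_if_1 are chosen so the overall
    missing rate comes out near the target (e.g., 10% or 20%)."""
    return [[MISSING if rng.random() < (p_if_1 if v == 1 else p_if_0) else v
             for v in row] for row in data]

rng = random.Random(0)
complete = [[rng.randint(0, 1) for _ in range(10)] for _ in range(1000)]
mcar = apply_mcar(complete, 0.10, rng)
nmar = apply_nmar(complete, 0.05, 0.15, rng)
```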

Local Dependence

To evaluate whether undetected local dependence influences accuracy, a second parallel underlying latent class model was generated in which two indicators were strongly locally dependent. The partial local dependence was simulated by incorporating a factor associated with two indicators within a latent class,44 resulting in a correlation of approximately 0.40 between these indicators. The goal of creating this model was to assess the accuracy of analyses conducted without relaxing the local independence assumption.
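One simple way to induce such within-class dependence is a shared binary factor that shifts the endorsement probability of both indicators. The sketch below is illustrative; the probabilities (0.2 vs. 0.8, shared factor rate 0.5) are our assumptions rather than the study's exact values, and they yield a positive correlation of roughly 0.35–0.40 between the pair.

```python
import random

def draw_dependent_pair(rng, p_factor=0.5, p_low=0.2, p_high=0.8):
    """Within one class, a shared binary factor raises or lowers the
    endorsement probability of BOTH indicators, inducing a positive
    correlation even though each draw is independent given the factor."""
    high = rng.random() < p_factor
    p = p_high if high else p_low
    return (1 if rng.random() < p else 0,
            1 if rng.random() < p else 0)

def correlation(pairs):
    """Pearson correlation of the two binary indicators."""
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    cov = sum((a - mx) * (b - my) for a, b in pairs) / n
    vx = sum((a - mx) ** 2 for a, _ in pairs) / n
    vy = sum((b - my) ** 2 for _, b in pairs) / n
    return cov / (vx * vy) ** 0.5

rng = random.Random(1)
pairs = [draw_dependent_pair(rng) for _ in range(20000)]
r = correlation(pairs)
```

An LCA that assumes local independence must absorb this residual correlation somewhere, typically by splitting a true class in two, which is the overestimation pattern reported in the Results.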

Number of Indicators

While the number of “true” indicators defining the underlying model was kept constant at 10, it was of interest to vary the number of indicators actually used to analyze the generated datasets. This is analogous to work in regression modeling showing the effects on model fit of over-fitting or under-fitting such models with too many or too few parameters.45 Thus, models were fit using the correct 10 indicators but also with 12 or 14 indicators, where the excess indicators theoretically should add no value to the class structures identified. To generate these excess indicators, values were drawn from a binomial distribution with one draw and equal probabilities for each observation. Models were also fit with too few indicators, using only six or eight of the 10. To select these, we eliminated the indicators that least differentiated the classes, as indicated by the lowest partial eta-squared values from one-way analyses of variance comparing classes on each of the 10 indicators.
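The elimination rule can be sketched as follows: compute a one-way ANOVA effect size (for a one-way design, partial eta-squared equals SS_between / SS_total) for each indicator across the known classes, then keep only the most differentiating indicators. The function names and toy data below are illustrative.

```python
def eta_squared(values, groups):
    """One-way ANOVA effect size: SS_between / SS_total
    (identical to partial eta-squared in the one-way case)."""
    n = len(values)
    grand = sum(values) / n
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_between = 0.0
    for g in set(groups):
        gv = [v for v, gr in zip(values, groups) if gr == g]
        ss_between += len(gv) * (sum(gv) / len(gv) - grand) ** 2
    return ss_between / ss_total if ss_total else 0.0

def select_indicators(data, labels, keep):
    """Rank indicators by eta-squared across classes and retain the
    `keep` most differentiating (i.e., drop the least differentiating)."""
    n_ind = len(data[0])
    scores = [(eta_squared([row[j] for row in data], labels), j)
              for j in range(n_ind)]
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:keep])

# Toy example: indicator 0 separates the classes; indicator 1 is constant.
data = [[1, 1], [1, 1], [0, 1], [0, 1]]
labels = [0, 0, 1, 1]
kept = select_indicators(data, labels, keep=1)
```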

Sample Composition

Based upon previous studies showing rates of eating disordered symptoms in adolescent school-based samples ranging from 14% to 22%,46, 47 a latent class representing unaffected individuals in a “community sample” was added to the true model so as to represent 80% of the new total sample. This analysis was performed only for a sample size of 2,500, reflecting the large sample sizes frequently obtained in community-based samples: 2,000 unaffected cases were added as a new latent class (80%; 2,000 of 2,500) to the four true classes comprising 500 observations, with the structure and sample size of the “affected” subsample remaining unchanged. The probability of endorsing each indicator in the unaffected class was set at 0.05.

Simulating Data and Measuring Accuracy

For each scenario described earlier, 100 datasets were generated using the Monte Carlo procedures in Mplus.48 All LCA analyses were then completed in R using poLCA Version 1.1.49 For each model, the accuracy of the LCA results in terms of the number of classes identified (based upon the minimum value of the information criterion) over the 100 datasets was compared to the actual number of classes in the true model.
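The accuracy measure itself is straightforward: for each replication, select the class count that minimizes the information criterion and tally agreement with the true count. A sketch with hypothetical BIC values (not results from the study):

```python
def selected_classes(ic_by_k):
    """Return the class count whose information criterion is smallest."""
    return min(ic_by_k, key=ic_by_k.get)

def accuracy(ic_tables, true_k):
    """Proportion of replications whose minimum-IC model has true_k classes."""
    hits = sum(selected_classes(t) == true_k for t in ic_tables)
    return hits / len(ic_tables)

# Hypothetical example: three replications' BIC values for 2-6 class models.
reps = [
    {2: 5300.0, 3: 5210.0, 4: 5190.0, 5: 5205.0, 6: 5230.0},
    {2: 5310.0, 3: 5195.0, 4: 5200.0, 5: 5215.0, 6: 5240.0},
    {2: 5290.0, 3: 5220.0, 4: 5185.0, 5: 5195.0, 6: 5225.0},
]
acc = accuracy(reps, true_k=4)
```

In the study this tally was taken over 100 generated datasets per condition, giving the accuracy proportions reported in the tables.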


Sample Size, Information Criteria, and Class Balance

The number of classes identified for balanced and unbalanced structures by information criterion and sample size is presented in Table 2. The AIC performed poorly across all conditions, typically identifying the correct number of classes only slightly more than half the time or less, and did not improve with increasing sample size; most often, the AIC overestimated the number of classes. The accuracy of the BIC, aBIC, and cAIC improved with increasing sample size. However, the accuracy of both the BIC and cAIC did not approach 80% until the sample size was 750 for balanced designs and 1000 for unbalanced designs, while for the aBIC only a sample size of 300 was required in either design. When the correct number of classes was not identified, the BIC and cAIC always underestimated the true number of classes; the aBIC showed a tendency to overestimate the number of classes when the sample size was 200 or less, but both over- and underestimated in larger sample sizes when it was incorrect. Overall, the accuracy of models was somewhat lower for unbalanced versus balanced structures. For the following sections, results are presented only for the unbalanced structure, given that this is the likely scenario in most eating disorder studies.

Table 2. LCA accuracy by latent class balance
Columns: information criterion, sample size, and number of classes found (table body omitted). Notes: BAL = balanced classes; UNB = unbalanced classes.


Missing Data

The accuracy of LCA models as a function of the amount (10% vs. 20%) and type (MCAR vs. NMAR) of missing data is presented in Table 3. As before, the accuracy of the AIC was poor across all conditions and did not change systematically due to the amount or type of missing data. As expected, the accuracy for the remaining information criteria was generally lower both when the amount of missing data was greater and when data were NMAR.

Table 3. LCA accuracy by type and amount of missing data
Columns: information criterion, sample size, and number of times the correct number of classes was identified (table body omitted).

Local Dependence

Table 4 presents the accuracy of models with conditional dependence. All information criteria had markedly lower accuracy when an undetected conditional dependence was simulated than when no such dependence existed (from Table 2, unbalanced, 4 classes). Of note, overestimation of the number of classes was observed, especially with larger sample sizes: for example, the aBIC, which had identified the correct number of classes in all 100 simulations with a sample size of 2,000 when there was no conditional dependence (Table 2), estimated a five-class solution for all of these simulations when a conditional dependence existed.

Table 4. LCA accuracy for local dependence
Columns: information criterion, sample size, and number of classes found (table body omitted).

Number of Indicators

Figure 1 presents the accuracy of LCA models as a function of information criterion, sample size, and number of indicators. The accuracy of AIC was best for sample sizes of 500 or more when only six indicators were included, and worsened as the number of indicators increased. The aBIC was less accurate when too few or too many indicators were used across most sample sizes, although accuracy was most dramatically affected for sample sizes of 300–750. A similar pattern of accuracy can be seen for BIC and cAIC for sample sizes above 300.

Figure 1.

LCA accuracy by number of indicators. Accuracy of LCA by number of indicators used (n = 10 in “true model”) as evaluated by the AIC, aBIC, BIC, and cAIC, for sample sizes ranging from N = 100 to N = 2,000.

Sample Composition

When a fifth class simulating an unaffected class of 2,000 was added to the 500 affected cases (mimicking an overall community sample), the accuracy of detecting all five classes correctly was relatively low even when using the aBIC. For the 100 simulations with an unbalanced structure, the aBIC identified a three-class solution 12 times, a four-class solution 34 times, and a five-class solution 54 times. All estimated models (regardless of the number of classes identified) tended to have one large class consisting primarily of the truly asymptomatic class cases, with the remainder of the affected cases divided into the remaining classes. The correspondence between true classes and empirical classes was far from perfect, with most empirical classes containing cases from at least four true classes; further information on how the true and empirical classes map onto one another in these simulations regarding sample composition is available upon request to the authors.


This study provides important information on factors influencing latent class analyses typically seen in eating disorders research and other medical fields. The effect of sample size on accuracy was consistent with previous simulation studies,27, 31 and the superior performance of the aBIC under most conditions replicates the findings by Yang.31 Missing data patterns, over-fitting or under-fitting the model with too many or too few indicators, violations of local independence, unbalanced class sizes, and sample composition all negatively affected the accuracy of detecting the correct number of classes.

These results have important implications for interpreting previous studies. Attention should be paid to what decision rules are used by each study, as AIC performs poorly and aBIC may be an optimal choice in most relevant settings. Prior studies that have smaller sample sizes6, 7, 9, 12, 14, 15, 22 should be interpreted with appropriate caution. Similarly, previous analyses of datasets with missing data, particularly when NMAR is suspected, are prone to bias as well. The findings from the conditional dependence simulations may be relevant to prior analyses that use both a measure of body weight and amenorrhea as indicators,4, 17 as it is likely that these are conditionally dependent due to their biologic correlation;50 such studies most likely overestimated the number of classes. Without knowledge of the true number of underlying classes, one cannot know which prior studies are prone to under- or overfitting; however, one may speculate that studies using very few or large numbers of indicators may be less accurate, particularly when using an information criterion other than aBIC. Studies that included all individuals in a community sample8, 13, 17 most likely identified an asymptomatic class correctly but perhaps underestimated the number and/or misidentified the composition of affected classes.

Regarding the design and analysis of future studies, these simulations serve as a guide for what aspects of study design and analytic approaches may influence the accuracy of the results. Sample size is certainly important, and based on these results a sample size of under 300 would not be recommended. Accuracy for the aBIC was superior to that of other information criteria assessed in this study. As with any research, missing data should be minimized. In the analytic stage, future studies should evaluate models for violations of the conditional independence assumption, and if such violations are found, steps should be taken to address this analytically; ignoring the violation will most likely result in overestimating the true number of classes. The BMI-amenorrhea correlation is one straightforward exemplar of a plausible violation, but many others may exist within eating disorder research. The results of the simulations varying the number of indicators show that there are problems with both over-fitting and under-fitting such models; without knowledge of the true, underlying class structure, future studies should carefully evaluate the contributions of indicators in specifying the models. Finally, sample composition is important, and in future studies with community samples, researchers should consider the study goals when proceeding. If a goal is to help disentangle affected from unaffected individuals, the classes derived from LCA on the full sample should succeed. However, if the goal is to find meaningful classes within the affected subsample, performing LCA on only those endorsing symptoms may more accurately find these classes.

This is the first study to examine the effects of unbalanced class sizes, amount and type of missing data, number of indicators, conditional independence assumption violations, and sample composition on LCA; in particular, it is the only study to our knowledge to examine such issues in the context of eating disorders research. However, this study is not without limitations. The current simulation study is limited to the model structure we chose, and although this structure is particularly relevant to previous eating disorder studies, it is unknown whether other choices of parameter values (e.g., a different number of true classes, different conditional probabilities) would have yielded similar results. In addition, factors other than those evaluated in the current simulation study will likely influence the results of LCA (e.g., other missing data configurations). Finally, it is unclear to what extent these results will generalize to studies that use indicators that are not dichotomous.

This study provides evidence of the importance of several factors influencing the use of LCA to identify qualitatively distinct phenotypes of eating disorders specifically and classification schemes generally. When evaluating past work, researchers should use these findings to interpret results with appropriate skepticism. In designing future studies, researchers should use these findings to guide study design and analytic approaches in LCA.
