Abstract
- Top of page
- Abstract
- 1. Introduction
- 2. Data problems
- 3. Growth mixture model with known classes
- 4. Model specification and estimation
- 5. Monte Carlo simulation
- 6. Real data analysis
- 7. Discussion and conclusion
- Acknowledgements
- References
- Appendix 1:: Mplus code for real data example
Multi-group latent growth modelling in the structural equation modelling framework has been widely utilized for examining differences in growth trajectories across multiple manifest groups. Despite its usefulness, the traditional maximum likelihood estimation for multi-group latent growth modelling is not feasible when one of the groups has no response at any given data collection point, or when all participants within a group have the same response at one of the time points. In other words, multi-group latent growth modelling requires a complete covariance structure for each observed group. The primary purpose of the present study is to show how to circumvent these data problems by developing a simple but creative approach using an existing estimation procedure for growth mixture modelling. A Monte Carlo simulation study was carried out to see whether the modified estimation approach provided tangible results and to see how these results were comparable to the standard multi-group results. The proposed approach produced results that were valid and reliable under the mentioned problematic data conditions. We also present a real data example and demonstrate that the proposed estimation approach can be used for the chi-square difference test to check various types of measurement invariance as conducted in a standard multi-group analysis.
1. Introduction
There are many situations where we want to know if a measurement or structural equation model for one group has the same parameter values as in other groups (Bollen, 1989). This question can be addressed using a multi-group approach in which various forms of invariance are tested across groups, with or without latent variables, in the structural equation modelling (SEM) framework (Jöreskog, 1973; Sörbom, 1974). There has been a great deal of multi-group SEM research on various methodological and substantive topics (e.g., see Byrne, Shavelson & Muthén, 1989; Cheung & Rensvold, 2002; Cole, Martin & Steiger, 2005; LaGrange et al., 2011; Mun, Fitzgerald, von Eye, Puttler & Zucker, 1989; Muthén, 2001a; Rivera & Satorra, 2002; Vandenberg & Lance, 2000). In recent years, multi-group SEM has been extended to latent growth modelling (LGM) to examine differences in growth trajectories across multiple manifest (observed) groups (McArdle, 1989; Meredith & Tisak, 1990, 2001). For substantive, as well as methodological, examples, see Little, Schnabel and Baumert (2000), McArdle (2000), Muthén and Asparouhov (2002), Palardy (2008), and Wang, Siegal, Falck, Carlson and Rahman (1999).
Despite the usefulness of a multi-group LGM approach, a couple of data problems may arise especially when one of the known, manifest groups is small. For example, Figure 1 shows a hypothetical situation in which heterogeneity in depression trajectories is examined using LGM across several race groups, with Native American and Asian groups having small sample sizes. If any one of these small groups has completely missing responses at a single time point, due to either study design (no planned follow-up) or empirical missingness, then the subsequent estimation fails because the traditional maximum likelihood (ML) estimation for multi-group analysis in the SEM framework initiates its estimation procedures with complete covariance structures for all groups. That is, the estimation fails because a covariance structure for one group cannot be fully specified (i.e., an indicator variable has neither variance nor covariance within the group). Similarly, if all participants within a group have the same response or if only one participant within a group has a response on an indicator, the traditional estimation method also fails for the same reason – neither variance nor covariance can be determined.
These problematic data situations in multi-group analysis are a serious barrier for anyone who wants to implement a multi-group growth model in the SEM framework. The simplest option is to exclude the indicator variable that has no variance from the data. However, such an action has several unattractive implications. First, this approach will result in not fully utilizing existing data for all other groups. Second, depending on the model, removing a critical indicator variable may result in less optimal estimation of the entire model. For example, removing a final follow-up time point could lead to biased growth factor estimates for all groups. Third, in a more complex model, such as piecewise LGM (Bollen & Curran, 2006; Muthén & Muthén, 1999; Raudenbush & Bryk, 2002), reducing the number of indicator variables may not be a viable option in terms of identification, especially when there is only a minimal number of time points within a single phase or when higher-order polynomials, such as quadratic growth models, have to be specified with only a few available time points.
This estimation problem can be circumvented, however, using a simple but creative adjustment approach that takes advantage of an existing estimation procedure for finite mixture modelling with known classes (Muthén & Muthén, 1999). In this adjustment approach, a mixture estimation procedure is employed with a single latent class that encompasses multiple manifest groups. Since there is only one latent class with multiple manifest groups, the model specification is essentially the same as the standard multi-group analysis that has multiple manifest groups. However, unlike the standard multi-group SEM estimation procedure that begins with the premise that the covariance structure for each manifest group must be complete, the mixture estimation approach does not have that requirement. Mixture modelling with known classes in Mplus (Muthén & Muthén, 1999) technically treats manifest groups as a special case of latent classes in the sense that the membership of latent classes is known beforehand (i.e., known classes).1 This alternative to multi-group SEM, the mixture approach with known classes, does not check whether all a priori known classes (i.e., manifest groups) have complete covariance structures. Theoretically, it is unreasonable to check the individual covariance structure for each known class prior to estimating model parameters, because the known classes in a mixture model, as opposed to manifest groups in a standard multi-group LGM, are technically ‘latent’ classes. Latent class membership is determined based on posterior probabilities that are assigned during the estimation process. By regarding the manifest groups as latent classes whose membership is known in this mixture estimation approach, the procedure checks the covariance structure of the entire data set as a whole, not group-specific covariance structures.
The differing approaches to data between these two estimation procedures (the standard multi-group LGM and the mixture multi-group LGM) make a critical difference when estimating a model using data with incomplete covariance structures for some groups. It is not estimable in the former but estimable in the latter.
Mixture multi-group LGM has been utilized as an alternative to multi-group LGM when analysing data with some of these challenging characteristics in recent applied research. For example, supplemental figures available in the online version of the recent article by White, Lee, Mun and Loeber (2012) were drawn with the estimates produced by using this mixture multi-group LGM approach. While these two approaches are considered as equivalent by many for practical reasons, a couple of differences exist conceptually and procedurally. Most important, there is a need to examine these two procedures methodologically and systematically, and to empirically examine whether the mixture estimation approach with known classes produces valid estimates under these problematic data conditions.
The present study describes the estimation procedures of these two approaches in depth, and reports findings from both a simulation study and a real data example. We conducted a Monte Carlo simulation study to examine whether the mixture multi-group estimation provides tangible results, as opposed to the standard multi-group estimation, when a group has no variability on an indicator variable. In addition, we examined how comparable the known class mixture estimation results are to the standard estimation results when there are no data problems. To show these, the present study applied the two estimation procedures to simulated data sets with or without the data problems across several select conditions. Details of the simulations are provided in Section 'Monte Carlo simulation'. A real data example from a smoking cessation clinical trial (Bolt, Piper, Theobald & Baker, 2012; Piper et al., 2009, 2011) is also provided to show the feasibility of the mixture estimation with known classes in the presence of one of the specified data problems, and to show how to test invariance of growth factors using likelihood ratio tests in the context of multi-group LGM analysis.
2. Data problems
A couple of data characteristics for which standard SEM estimation cannot give results for a multi-group analysis are presented in this section. To begin, consider a simple, typical type of multi-group data structure in the context of a longitudinal study design. Suppose a researcher is interested in the efficacy of a depression medication for individuals who have a history of alcohol dependence. Depression symptom levels after the pharmacological intervention are collected through hand-held PCs or Palm Pilots daily for 5 days using a seven-point Likert scale. Let the group variable be race: Caucasian (65%), African American (20%), Asian (10%), and Native American (5%). These kinds of real-time ecological momentary assessment (EMA) data tend to have a substantial portion of missing responses (Stone & Shiffman, 1994). Thus, we suppose that the depression symptom levels are available from 400 individuals with 30% of all possible responses missing. A brief illustration is provided in Figure 1.
By fitting a multi-group latent growth model (Bollen & Curran, 2006; McArdle, 2000; Muthén & Muthén, 1999), we would like to see not only the change in depression after the intervention but also whether there are significant differences in those changes across the four different race groups. Suppose that a small group has only one response or even no response at one time point. For example, only one participant in the Native American group responds at T5, or responses by the Native American group are completely missing at T5 as shown in Figure 1. In this case, the standard multi-group SEM procedure fails because a covariance matrix involving T5 data is incomplete for that group, which means an incomplete covariance structure exists for the Native American group. Another situation in which every subject in a group has the same response at least for one time point also results in an estimation problem for the same reason as before, namely no variance. For example, suppose that all subjects in the Asian group rate their depression symptom levels as 2 on a seven-point scale at T4 as shown in Figure 1. In this case, covariances or correlations involving the fourth indicator cannot be calculated for the Asian group, resulting in an incomplete covariance structure.
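The two problem cells described above can be screened for before fitting a model. The following is a minimal sketch, not part of the original analysis: the function name `find_problem_cells`, the nested-dictionary data layout, and the toy responses are all assumptions made for illustration. It flags any group-by-time cell that is completely missing, has a single response, or has identical responses, which are the cases in which a group-specific variance or covariance cannot be computed.

```python
def find_problem_cells(data):
    """Flag group-by-time cells with no usable variance.

    `data` is a hypothetical layout: {group: {time_label: [responses]}},
    where a missing response is stored as None.
    """
    problems = []
    for group, waves in data.items():
        for wave, values in waves.items():
            observed = [v for v in values if v is not None]
            if len(observed) == 0:
                problems.append((group, wave, "completely missing"))
            elif len(observed) == 1:
                problems.append((group, wave, "single response"))
            elif len(set(observed)) == 1:
                # Every observed participant gave the same rating.
                problems.append((group, wave, "same response"))
    return problems


# Toy data loosely mirroring the hypothetical Figure 1 scenario.
toy = {
    "Asian": {"T4": [2, 2, 2, None], "T5": [1, 3, 2, 2]},
    "Native American": {"T4": [3, 1, None, 2], "T5": [None, None, None, None]},
}

for cell in find_problem_cells(toy):
    print(cell)
```

A screen of this kind only identifies where the standard multi-group estimation would stop; it says nothing about why the data are missing, which matters for interpretation.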
As discussed previously, one possible solution to this estimation failure due to the completely missing data cell or the same response data cell in Figure 1 would be to eliminate these data at T4 or T5 for all groups from analysis. However, valuable post-intervention outcome data for the majority of the sample will not be utilized, and any resulting growth trajectories may not be very trustworthy because the growth trajectories are based on only three or four time points in this particular hypothetical example. The validity of a latent growth curve model is directly related to the number of indicator variables, that is, the number of time points in growth models (Kim, 2011). Data from four time points are normally acceptable for a linear growth model, but they are not enough, for example, when the sample size is small or when a quadratic slope needs to be estimated. Moreover, when both the missing data problem and the same response data problem simultaneously happen at different time points or when there are a limited number of time points, it may not be feasible to exclude multiple time points in analysis. For example, with four assessment time points, we cannot eliminate data from two waves because it will prevent us from fitting a latent growth curve model.
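The point about the number of time points can be illustrated with a simple counting rule. The sketch below rests on assumptions that are not stated in the text: a single-group growth model with fixed loadings, freely estimated growth factor means, variances and covariances, and one freely estimated residual variance per time point with uncorrelated residuals. The function name `growth_model_df` is hypothetical. Non-negative degrees of freedom are a necessary condition for identification, not a sufficient one.

```python
def growth_model_df(n_timepoints, n_growth_factors):
    """Degrees of freedom under the counting rule for a basic growth model.

    Assumes (illustratively) free factor means, a free factor covariance
    matrix, and one free residual variance per time point.
    """
    t, k = n_timepoints, n_growth_factors
    moments = t * (t + 3) // 2           # t means + t(t+1)/2 (co)variances
    free = k + k * (k + 1) // 2 + t      # factor means + factor (co)variances + residuals
    return moments - free


for label, k in (("linear", 2), ("quadratic", 3)):
    for t in (2, 3, 4, 5):
        print(f"{label:9s} T={t}: df = {growth_model_df(t, k)}")
```

Under these assumptions, a linear model has negative degrees of freedom with two time points, and a quadratic model has negative degrees of freedom with three, which is consistent with the concern about dropping waves when few are available.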
These situations are not uncommon, especially for cohort sequential longitudinal data. A cohort sequential longitudinal design is often recommended as an economical way to assess a behaviour of interest over a long period of time (Duncan, Duncan & Hops, 1996). Assuming there is sufficient overlap in assessment time periods across cohorts, we can draw valid inference about developmental trajectories from multiple cohorts. For example, White et al. (2012) conducted a multi-group, four piecewise linear growth curve model and examined alcohol use trajectories during the transition from adolescence to adulthood for the following five violence groups: non-violent (n = 580; 65%), late-onsetters (n = 51; 6%), desisters (n = 76; 9%), persisters (n = 103; 12%), and one-time offenders (n = 84; 9%). The sample was made up of two different cohorts: youngest and oldest cohorts who were followed up from the first and seventh grade, respectively (Loeber, Farrington, Stouthamer-Loeber & White, 1986). Thus, this cohort sequential longitudinal design made it possible to examine alcohol trajectories from ages 12 to 24–25 years, a much larger developmental window than using data from either cohort alone. However, this also created a situation where data were sparse at both ends of the age range and even sparser or completely missing when examined separately for each cohort. More specifically, the covariance (data) coverage between some of the time points was low, and there were either zero valid observations or only one valid observation (no variance in either case) for some of the violence groups at a couple of time points. We also provide a real data example of the same response data problem (Bolt et al., 2012; Piper et al., 2009, 2011) to further examine the mixture multi-group procedure with known classes for the tricky data problems, in Section 'Real data analysis'.
3. Growth mixture model with known classes
Mixture modelling with known classes (Muthén & Muthén, 1999) can be used when one wants to perform a mixture analysis while taking manifest group membership, such as gender, into consideration. In the mixture model with known classes, there are two types of categorical latent variables: one is a latent class variable, whose values are unknown and estimated by the model; and the other is a known class variable that corresponds to manifest group membership, such as boys and girls or intervention and control groups. Therefore, this model is a combination of latent class analysis (i.e., mixture models) and multi-group analysis. For example, if two latent classes are specified along with four known classes (i.e., four manifest groups), a total of eight (2 × 4) class patterns are formed in the model: ‘1 and 1’ (first known class and first latent class), ‘1 and 2’ (first known class and second latent class), and so on up to ‘4 and 2’ (fourth known class and second latent class).
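The class patterns described above are simply the cross-classification of the known class variable with the latent class variable. As a small illustration (the variable names are assumptions, and this is not how any SEM program represents the patterns internally), they can be enumerated as follows.

```python
from itertools import product

known_classes = [1, 2, 3, 4]   # four manifest groups
latent_classes = [1, 2]        # two latent classes

# Each pattern is a (known class, latent class) pair.
patterns = list(product(known_classes, latent_classes))

for known, latent in patterns:
    print(f"'{known} and {latent}'")
print(len(patterns), "class patterns")
```

With a single latent class, the same enumeration yields one pattern per manifest group.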
For the purpose of the present study, mixture modelling with known classes is applied to a latent growth model in this section, resulting in growth mixture modelling with known classes (Muthén & Muthén, 1999). A path diagram is provided in Figure 2 for a graphical illustration of the model. A thorough model specification is omitted here because growth mixture modelling (GMM; Muthén, 2001a,b, 2002; Muthén & Shedden, 1992) and multiple group analysis (e.g., Jöreskog, 1973; Sörbom, 1974; Vandenberg & Lance, 2000) are well documented elsewhere, and because the specifications of these models for the purpose of the estimation are explained in the next section. Notice that the path diagram is similar to that of GMM, with the difference being the introduction of a known class variable (manifest group variable in the form of a categorical latent variable). Both latent class and known class variables are technically categorical latent variables. However, while latent classes are truly latent, known classes, in fact, correspond to manifest groups.
The present study utilizes this special extension of GMM that includes one latent class variable and one a priori known class variable. We identified some critical data problems in a multi-group longitudinal data analysis mentioned previously and applied one special case of the GMM with a known class variable to data to circumvent these problems. This approach involves specifying one latent class variable with a single category and the other latent class variable (i.e., known class variable) to indicate multiple manifest groups.2 As a result, the GMM with one latent class and multiple known classes is equivalent to the standard multi-group LGM because the known classes of this mixture approach are fundamentally the manifest groups. The two approaches, the GMM with one latent class and multiple known classes and the standard multi-group LGM, can be used interchangeably when data across all manifest groups have complete covariance structures.
6. Real data analysis
In this section, the mixture multi-group approach is applied to a real data set with one of the identified data problems as an alternative to the standard multi-group approach. We show a case of the same response problem in this example, having briefly described an example of the missing data problem in the previous section (White et al., 2012). We present this analysis example to show the feasibility of the mixture multi-group procedure under these problematic data situations and to show that the log-likelihood values provided in the results can be used to calculate a χ^{2} difference test statistic for the invariance tests that are typically implemented in a standard multi-group analysis. It should be noted that the example provided is for demonstration purposes and thus no serious substantive conclusions should be drawn from the findings. The Mplus code is provided in the Appendix.
The data used in this analysis are from a large placebo-controlled, comparative effectiveness smoking cessation clinical trial conducted at the University of Wisconsin Center for Tobacco Research and Intervention (Bolt et al., 2012; Piper et al., 2009, 2011). This study was designed to test the efficacy of five cessation pharmacotherapy treatments (nicotine lozenge, nicotine patch, sustained-release bupropion, nicotine patch plus nicotine lozenge, and bupropion plus nicotine lozenge) versus placebo (see Piper et al., 2009, 2011; for more details on study methods and main results). As part of the study assessment, intensive longitudinal data were collected via EMA. Study participants completed four daily EMA reports (just after waking, prior to going to bed, and two additional reports timed to occur randomly during the day) for one week prior to making a quit attempt and for 2 weeks after the quit day. Participants made ratings of nicotine withdrawal symptoms, self-efficacy, motivation, cessation fatigue, smoking, alcohol use, stress, and context (situational factors that may increase risk of smoking). The EMA methodology is described in more detail in Bolt et al. (2012) and Piper et al. (2011).
For a growth model, we utilized seven waves of daily negative affect (NA) ratings in the cessation clinical trial, from quit day to 1 week post-quit. The main outcome measure, NA, was an average score of two five-point (1 to 5) Likert-type scale items: one item was ‘upset’ and the other was ‘distressed.’ Therefore, NA ranged from 1 to 5, in increments of 0.5. The group variable of interest was marital status, assessed using six categories: married, n = 565 (46.3%); divorced, n = 263 (21.5%); widowed, n = 34 (2.8%); separated, n = 29 (2.4%); never married, n = 222 (18.2%); and domestic partner, n = 108 (8.8%). Descriptive statistics for the indicator variables and frequencies of responses are presented in Table 5.
Table 5. Descriptive statistics of negative affect by marital group

| Marital status | Statistic | T1 | T2 | T3 | T4 | T5 | T6 | T7 |
|---|---|---|---|---|---|---|---|---|
| Married | M | 1.313 | 1.320 | 1.321 | 1.254 | 1.327 | 1.276 | 1.264 |
| | SD | 0.620 | 0.675 | 0.686 | 0.554 | 0.700 | 0.622 | 0.615 |
| | n | 534 | 493 | 479 | 475 | 451 | 449 | 453 |
| Divorced | M | 1.406 | 1.464 | 1.424 | 1.429 | 1.367 | 1.448 | 1.403 |
| | SD | 0.638 | 0.766 | 0.731 | 0.868 | 0.721 | 0.790 | 0.752 |
| | n | 245 | 235 | 230 | 219 | 210 | 201 | 206 |
| Widowed | M | 1.288 | 1.300 | 1.350 | 1.161 | 1.148 | 1.250 | 1.286 |
| | SD | 0.468 | 0.726 | 0.559 | 0.351 | 0.477 | 0.553 | 0.615 |
| | n | 33 | 30 | 30 | 31 | 27 | 28 | 28 |
| Separated | M | 1.173 | 1.273 | 1.250 | 1.068 | 1.023 | 1.048 | 1.000 |
| | SD | 0.468 | 0.650 | 0.511 | 0.234 | 0.107 | 0.218 | 0.000 |
| | n | 26 | 22 | 24 | 22 | 22 | 21 | 19 |
| Never married | M | 1.552 | 1.500 | 1.398 | 1.377 | 1.387 | 1.389 | 1.457 |
| | SD | 0.864 | 0.780 | 0.687 | 0.749 | 0.738 | 0.675 | 0.771 |
| | n | 212 | 185 | 182 | 175 | 173 | 166 | 163 |
| Not married, but living with domestic partner | M | 1.500 | 1.355 | 1.340 | 1.272 | 1.356 | 1.283 | 1.328 |
| | SD | 0.883 | 0.719 | 0.657 | 0.572 | 0.654 | 0.566 | 0.655 |
| | n | 103 | 100 | 97 | 101 | 94 | 90 | 87 |
The objective of this multi-group analysis was to examine whether or not the six growth trajectories corresponding to the six groups were comparable to one another. One problem in this typical multi-group latent growth model was that all subjects in the separated group had the same response at T7 (i.e., all 1s; see Table 5). Substantively or conceptually, the fact that all participants had the same response is not a problem. However, with this sameness in a data set, SEM programs, including Mplus, will not initiate the estimation process. For example, Mplus outputs an error message: ‘One or more variables have a variance of zero. Check your data and format statement.’ Thus, we implemented the mixture multi-group procedure with one latent class and six known classes, which then estimated all different growth factor means and variances across the six marital groups. The results are provided in Table 6(a), and the growth trajectories are shown in Figure 3(a). One of the important purposes of estimating a typical multi-group latent growth model is to test whether some of the growth factors are invariant across groups. Thus, we also ran the same multi-group model with the constraint of the same slope means across the six groups using the mixture procedure.6 The results of the restricted model are presented in Table 6(b), and the growth trajectories are shown in Figure 3(b).
Table 6. Results of mixture multi-group analysis with seven waves of negative affect real-time data

| Marital status | Frequency | Proportion | Intercept | Slope | Parameters | Log-likelihood |
|---|---|---|---|---|---|---|
| (a) Without any constraint | | | | | | |
| Married | 565 | 46.3% | 1.323 | −0.008 | 27 | −8111.019 |
| Divorced | 263 | 21.5% | 1.446 | −0.004 | | |
| Widowed | 34 | 2.8% | 1.277 | −0.004 | | |
| Separated | 29 | 2.4% | 1.284 | −0.032a | | |
| Never married | 222 | 18.2% | 1.513 | −0.017a | | |
| Domestic partner | 108 | 8.8% | 1.451 | −0.026a | | |
| (b) With a constraint of the same slope estimates | | | | | | |
| Married | 565 | 46.3% | 1.331 | −0.011a | 22 | −8113.509 |
| Divorced | 263 | 21.5% | 1.466 | −0.011a | | |
| Widowed | 34 | 2.8% | 1.296 | −0.011a | | |
| Separated | 29 | 2.4% | 1.224 | −0.011a | | |
| Never married | 222 | 18.2% | 1.494 | −0.011a | | |
| Domestic partner | 108 | 8.8% | 1.404 | −0.011a | | |
A likelihood ratio test was then performed to test the invariance of the slopes (H_{0} : The six marital groups have the same slopes, vs. H_{1} : At least one slope is different from the others). Since ML estimation with robust standard errors (Muthén & Muthén, 1999) was used in Mplus 6, scaling correction factors were used to adjust the χ^{2} difference statistic (see Satorra, 2000; Satorra & Bentler, 2001). Given the simpler model's log-likelihood (ll_{s}), scaling correction factor (scf_{s}), and number of parameters (p_{s}), and given the more complex model's log-likelihood (ll_{c}), scaling correction factor (scf_{c}), and number of parameters (p_{c}), the χ^{2} difference statistic is calculated as
χ^{2}_{diff} = −2 (ll_{s} − ll_{c}) (p_{s} − p_{c}) / (p_{s} · scf_{s} − p_{c} · scf_{c})   (9)
which follows the χ^{2} distribution with p_{c} − p_{s} degrees of freedom. In our particular example,
- (10)
and this statistic was compared to the χ^{2} distribution with 5 degrees of freedom (i.e., 27 − 22 = 5). The p-value was .3173, suggesting that growth slopes were not different across the six groups. Thus, when the standard multi-group procedure was not feasible because of problematic data situations, the mixture multi-group procedure provided not only the trajectory estimates across the groups but also the χ^{2} difference statistic for invariance tests, just like a standard multi-group analysis without any problematic data conditions.
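The scaled difference test described above can be sketched in a few lines. The following is an illustrative sketch only: the function names `scaled_lr_difference` and `chi2_sf` are hypothetical, and the χ^{2} tail probability is computed with a closed-form series for integer degrees of freedom rather than with a statistics library. The scaling correction factors for the two models are not reported in this text, so the usage example sets both to 1.0 as a placeholder. That makes the result the unscaled likelihood ratio statistic from the Table 6 log-likelihoods, which is not expected to reproduce the reported p-value.

```python
import math


def scaled_lr_difference(ll_s, scf_s, p_s, ll_c, scf_c, p_c):
    """Scaled chi-square difference from log-likelihoods of nested models.

    s = simpler (restricted) model, c = more complex model.
    Returns (statistic, degrees of freedom).
    """
    cd = (p_s * scf_s - p_c * scf_c) / (p_s - p_c)  # difference-test scaling correction
    stat = -2.0 * (ll_s - ll_c) / cd
    return stat, p_c - p_s


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    term, total = 1.0, 0.0
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


# Log-likelihoods and parameter counts from Table 6; scf values of 1.0 are
# placeholders (assumption), because the actual factors are not given here.
stat, df = scaled_lr_difference(-8113.509, 1.0, 22, -8111.019, 1.0, 27)
print(f"unscaled statistic = {stat:.3f}, df = {df}, p = {chi2_sf(stat, df):.4f}")
```

With the actual scaling correction factors from the two Mplus outputs substituted for the placeholders, the same function would give the scaled statistic that is compared with the χ^{2} distribution with 5 degrees of freedom.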
7. Discussion and conclusion
The purpose of the present study was to show how to circumvent an estimation problem for a multi-group latent growth model when an indicator variable or variables had no variance in at least one of the groups examined. Since a multi-group analysis in the SEM framework initiates its estimation process with a check of complete covariance structures for all groups, the parameters for a multi-group model are not estimable when a group has completely missing data (or just one response) or same response data on an indicator variable or variables. This situation can be quite common in cohort sequential longitudinal studies (or accelerated longitudinal studies) or in a complex longitudinal model with multiple distinct phases, because data are likely to be sparser as time moves farther from a baseline or an intervention point. If a target group of interest is small in size, these data problems can occur more often than in other groups with a larger number of observations because participants in a small, homogeneous group are more likely to have a similar experience at a given time point. The mixture multi-group approach provided tangible results with problematic data sets by applying a creative, straightforward adjustment to an existing mixture modelling approach.
Theoretically and empirically, the mixture multi-group procedure can provide valid and reliable results when used as an alternative to the standard multi-group procedure in problematic data situations. However, without a Monte Carlo simulation study, it is hard to know how closely those estimates from the mixture approach match the results from the standard multi-group LGM. When there was no data problem, the mixture multi-group estimation procedure showed exactly the same results, in terms of means and variances of growth factors, as the standard multi-group estimation procedure. When the generated data sets were manipulated to simulate problematic data examples, the mixture multi-group approach provided quite reliable results. Although the Monte Carlo study showed very reassuring results of the mixture estimation procedure, one should also note that this was a simulation study and the scope and thoroughness of the conditions simulated were limited.
Having verified that the mixture multi-group procedure showed valid and reliable results in the Monte Carlo study, we checked whether this mixture approach could be used as an alternative to the standard multi-group approach in a real data analysis. In other words, we demonstrated that the mixture approach could be used for the χ^{2} difference test to check various types of measurement invariance as conducted in a standard multi-group analysis. Through the results of the real data example, utilizing negative affect data collected over 1 week post-quit in a real smoking cessation clinical trial, we demonstrated that the log-likelihood values from the two models, one of which was the more restricted model (i.e., the model with the same slopes), could be used to test the slope invariance across the six marital groups.
The mixture estimation procedure appears to be useful in the presence of the data problems described. Of the two data problems, however, one needs to differentiate the completely missing data problem from the same response data problem. The fact that a group has the same response on an indicator variable by chance is not a substantive or design problem but an estimation problem. By comparison, completely missing data can be a substantive problem because actual responses for an indicator variable in a group have never been observed. If this missingness occurred by a research design as in cohort sequential longitudinal studies, it is reasonable to assume that the missing at random assumption is satisfied (Graham, Hofer & MacKinnon, 1971). Thus, in this mixture multi-group analysis, it is assumed that the potential responses in a completely missing data cell could lie on an extension of the growth trajectory based on the other valid indicators. If data are missing not at random (e.g., non-ignorable dropouts of patients from a treatment programme), then, needless to say, the mixture multi-group approach will not provide valid results over unobserved data points. Although, in the simulation study, the results showed quite good growth parameter recovery with completely missing data, one should carefully check the growth estimates with the completely missing data problem for interpretation.
In line with this cautionary note, researchers should proceed with caution when using the mixture estimation approach for a factor-analytic model. A latent growth model is fundamentally a factor-analytic model, and therefore this mixture approach can also be used for a factor model under the same kinds of data problems. However, in a latent growth model, one characteristic (e.g., depression) is measured on multiple occasions across time, whereas in a factor-analytic model, multiple characteristics (e.g., depression, craving, and negative affect) are measured only once. It may or may not be relevant to assume that the potential responses of completely missing depression scores are comparable realizations of the other indicators (e.g., craving and negative affect),7 and that this missing pattern is missing at random. Thus, one should be careful when using the mixture multi-group approach with the completely missing data problem, especially in a common factor model.
The present study introduced and demonstrated a modified estimation procedure to circumvent some problematic data situations which hinder estimation in a multi-group longitudinal data analysis. More specifically, the mixture multi-group procedure was shown to reliably estimate a multi-group latent growth model with completely missing data or the same response data on an indicator variable(s). Furthermore, the validity of invariance tests using likelihood ratios from the mixture analysis output was demonstrated. In the current research environment where limited resources are maximized to produce valid inference using efficient study designs – for example, accelerated longitudinal or cohort sequential longitudinal designs (Duncan et al., 1996) or planned missing follow-ups (Brown, Indurkhya & Kellam, 2000) – the mixture approach maximizes the use of the existing data to answer often critical questions in the literature. Thus, this modified mixture approach to a multi-group analysis can have important implications for applied research.