Embedding latent class regression and latent class distal outcome models into cluster‐weighted latent class analysis: a detailed simulation experiment

Usually in latent class (LC) analysis, external predictors are taken to be cluster conditional probability predictors (LC models with external predictors), and/or score conditional probability predictors (LC regression models). In such cases, their distribution is not of interest. Class‐specific distribution is of interest in the distal outcome model, when the distribution of the external variables is assumed to depend on LC membership. In this paper, we consider a more general formulation, that embeds both the LC regression and the distal outcome models, as is typically done in cluster‐weighted modelling. This allows us to investigate (1) whether the distribution of the external variables differs across classes, (2) whether there are significant direct effects of the external variables on the indicators, by modelling jointly the relationship between the external and the latent variables. We show the advantages of the proposed modelling approach through a set of artificial examples, an extensive simulation study and an empirical application about psychological contracts among employees and employers in Belgium and the Netherlands.


Introduction
Latent class (LC) analysis (McCutcheon 1985) is a model-based clustering technique that is very popular in the social and behavioural sciences, economics and health sciences alike.The approach is used to cluster a set of observed categorical variables (known as indicators) based on an underlying latent variable.Some example applications include patterns of mobile internet usage for travelling (Okazaki et al. 2015), or, from health sciences, types of treatment engagements among adolescents with severe psychiatric problems, or change over time in nursing patterns (Roberts & Ward 2011).
In most instances, similarly to other latent variable models (such as factor analysis or item response theory), the interest of the researchers is not merely in clustering, but also in relating the clustering to antecedents and consequences in larger models.Some examples of modelling the consequences of the clustering include assessing recidivism rate among clusters of juvenile offenders (Mulder et al. 2012), or predicting distal pain outcomes from clusters of pain management (Roberts & Ward 2011).
Historically, in LC analysis, predictors are added to the model using simultaneous estimation based on the approach introduced by Dayton & Macready (1988).The inclusion of consequences, usually called distal outcomes (for a graphical representation see Figure 1a), is more problematic due to often strong distributional assumptions about the outcome variables.When such assumptions are violated, the LC solution can be distorted, and thus comparison with other models may become meaningless (Vermunt 2010;Bakk, Tekle, & Vermunt 2013).Such circumstances may as well lead to over-extraction of classes (Bauer & Curran 2003).
To overcome these problems, stepwise estimators have been proposed in literature, which allow separating the estimation of the measurement from structural model (Bakk, Tekle, & Vermunt 2013;Lanza, Tan, & Bray 2013;Asparouhov & Muthén 2014).While stepwise approaches are currently the best practice in literature, they hinder detection and testing of direct effects between the indicators of the LC and external variables.
Typically, in distal outcome models (LCdist), the distal outcome Z and Y are assumed to be conditionally independent given the latent variable X (Bakk, Tekle, & Vermunt 2013;Lanza, Tan, & Bray 2013).Direct effects of Z on Y are therefore not allowed for-neither their presence tested.Notably, maximum likelihood (ML) estimation of latent variable models is subject to severe bias when unmodelled direct effects are present in LC and latent trait models (Asparouhov & Muthén 2014), regression mixture models (Kim et al. 2016;Nylund-Gibson & Masyn 2016), and latent Markov models (Di Mari & Bakk 2018;Di Mari et al. 2022) when unmodelled direct effects are present in LC and latent trait models, and these are not accounted for.
Given the restrictiveness of the conditional independence assumption, and the possible severity of its violation, we showcase a more general model that can account for complex interdependencies between the external variable, LC membership and the indicators of the LC model.In regression mixtures, a 'circular' relation among Y-X -Z is typically considered in the cluster-weighted modelling approach (Ingrassia, Minotti, & Vittadini 2012;Ingrassia et al. 2015;Mazza, Punzo, & Ingrassia 2018).That is, a more general model is specified, where next to modelling the class-specific distribution of Z (distal outcome situation), also the direct effect of Z on Y is modelled (LC regression).Then, standard inference tools can assess the statistical significance of each effect.
In LC regression models (LCreg; Kamakura & Russel 1989;Wedel & DeSarbo 1994), although the assumption of conditional independence of Y and Z can be relaxed (see Fig. 1b), the distal outcome's distribution is not of interest, and hence not modelled.Therefore, in the traditional latent class analysis (LCA) approach, an external variable enters the model either as a predictor (LC regression) or as a distal outcome, but never as both at the same time.The cluster-weighted approach, in the context of LCA, allows embedding the models in Figure 1a,b into a single one, as depicted in Figure 1c.Although this idea is not new in LCA (Di Mari, Bakk, & Punzo 2020), the present paper demonstrates that the assumptions of both distal outcome and LC regression models can be tested within LC cluster-weighted (LCcw) umbrella.
Di Mari, Bakk, & Punzo (2020) already proposed the LC cluster-weighted approach, presenting both simultaneous as well as two-step estimators.They provided a detailed simulation study under different levels of violation of the assumption of local independence, and of the distributional assumptions of the distal outcome (that are known to affect the distal outcome model estimates).The simulation results showed the superiority of the LCcw approach when compared to the distal outcome and the LC model with covariates, when the underlying model assumptions of the latter two models are violated.
In this paper, we broaden the scope of cluster-weighted modelling in LCA by bringing out a significant theoretical and practical connection with the LC regression model-the primary competitor for assessing the relationship between Y and Z .Furthermore, we deal with the important issue of model selection, especially relevant when the underlying model assumptions are violated.We focus on how to handle direct effects with the different approaches-whereby recommending readers to Di Mari, Bakk, & Punzo (2020) for the treatment of violations of distributional assumptions of Z in distal outcome models.
We will show evidence, based on a set of artificial examples, an extensive simulation study and an empirical application, that (1) if direct effects are present, our approach, contrary to the distal outcome model, yields unbiased estimates of the distal outcome cluster specific means and variances; and (2) if the most suitable model is one between the distal outcome model or the LC regression model, the relative class sizes and compositions will be the same as the ones delivered under the proposed cluster-weighted modelling approach.
The paper proceeds as follows.In Section 2, we give model definitions and details on the parameterisations.We illustrate the proposed modelling approach through three artificial examples for both one and two external variable cases, for a total of six artificial examples, in Section 3. We present the design and the results of the simulation study in Section 4. In Section 5, we analyse data from the PSYCONES survey (Isaksson et al. 2003), and conclude with some final remarks in Section 6.

The latent class cluster-weighted model
Let Y = (Y 1 , . . ., Y J ) be the vector of the full response pattern and y its realisation.Let us assume also one continuous external variable Z is available, and we denote as z its realisation.Let us denote as X the categorical latent variable, with LCs s = 1, . . ., S .A general form of association between Y, X and Z , involves modelling the following joint probability where the common assumption in LCA of Y and Z being conditionally independent given the latent process is relaxed.From (1), several submodels can be specified (predictor model, distal outcome model, etc).If substantive theoretical arguments postulate the latent variable to be a predictor of the external variable Z , the LC cluster-weighted model specifies the probability of observing a response pattern y as which is defined by three components: the structural component (a), describing the LC variable, the external variable model (b), modelling the LC-specific distribution of Z and a measurement component (c), connecting the LC to the observed responses with a direct effect of Z .Under the assumption of local independence of response variables given the class membership and Z , the conditional distribution of the responses can be written as For estimating the model in (2), we assume each Y j to be conditionally Bernoulli distributed, with success probability π sj , and parametrise the conditional response probabilities through the following log-odds whereby Z is assumed to be conditionally Gaussian with mean μ s and variance σ 2 s , for 1 ≤ s ≤ S .
The model in (2) can be used to assign observations to clusters based on the posterior membership probabilities according to, for instance, modal or proportional assignment rules.
The LC unconditional probabilities can as well be parametrised using logistic regressions.We opt for the following parametrisation log Pr (X = s) for 1 < s ≤ S , where we take the first category as reference, and we set to zero the related parameter.The total number of free parameters to be estimated is therefore J × S (random intercepts) + J × S (random slopes) for the measurement model, S (means) + S (variances) for the external variable model, and S − 1 random intercepts for the structural model.Notice that, by setting the β js s of (4) to zero, the LC cluster-weighted model reduces to a standard LC with distal outcome model.In contrast, given that the external variable component is completely missing, the LC regression is not formally nested in the LC clusterweighted model, although it can be thought of as a sub-model in which the conditional distribution of Y|Z is modelled, and Z is taken as fixed value rather than a random variable to be modelled as well.

The LC with distal outcome model
It is common, in LCA, to consider a less general version of the joint distribution of (1), by assuming the responses and Z to be conditionally independent given the latent process.If, again, the LC variable is taken to be a predictor of the external variable Z , this yields the following LC with distal outcome model Under the local independence assumption of the indicators given the LC variable, the response conditional probabilities can be written, similarly to (3), as and parametrised through the following log-odds The model of ( 7) can be used to cluster observations, according to modal or proportional assignment rules, based on the following posterior membership probabilities The external variable Z is assumed, conditional to the LC, to be Gaussian with mean μ s and variance σ 2 s , s = 1, . . ., S , whereby the LC unconditional probabilities are parametrised as in (6).This yields J × S + 2S + S − 1 free parameters to be estimated.The only difference with model ( 2) is that Y, in the measurement component, is conditional only on X , not on Z .That is, Y is assumed to be independent of Z given X , which is a standard, and rather very strong, assumption of LCA.

The LC regression model
Rather than modelling the joint distribution Pr (Z , X , Y) in (1), the LC regression models the conditional distribution of Y given Z and the LC variable, specifying the following model for Y: In this case, the conditional response probabilities depend on the external variable Z and, under local independence of the responses given the latent variable, the measurement model can be written as in (3), and parametrised as in ( 4).The posterior membership probabilities, computed based on the model in ( 11), are as follows With LC unconditional probabilities parametrised as in ( 6), the total number of free parameters to be estimated is 2(J × S ) + S − 1. Table 1 summarises how Z enters each of the three models, and the total number of free parameters to be estimated.Intuitively, this shows that the first two models can be seen as special cases of the third model, which therefore models the relationship between the three sets of variables in the most exhaustive manner.

Maximum likelihood estimation
From (2), ( 7) and (11), given an observed sample of n units, it is possible to specify the log-likelihood functions of the cluster-weighted, the distal outcome and the LC regression models.Iterative procedures like the expectation-maximisation or Newton-type algorithms, or a combination of both as implemented in Latent GOLD 5.1 (Vermunt & Magidson 2013), can be used to maximise each lo-likelihood function with respect to the model parameters in order to find the maximum likelihood estimates.
Specifically, Latent GOLD starts the model fitting with the EM algorithm, and then switches to Newton-Raphson for the last few iterations.This is done to take the best of both worlds.Namely, to exploit the stability of the EM, while retaining the speed of Newton-Raphson when close to convergence (Vermunt & Magidson 2016).
Latent GOLD's EM algorithm does posterior mode estimation with iterative proportional fitting.The M step is instead a Newton-type step, therefore the log-likelihood is increased (and not maximised).In models including multivariate Gaussian variables (such as the distal outcome model), the Fisher scoring algorithm is only used for M steps when the covariance-structure parameters have no closed-form solution.Readers interested in more details of the estimation approach used by Latent GOLD can refer to (Vermunt 1997;Vermunt & Magidson 2013).Alternative EM implementations are also available for LC models with external variables.See, for instance, Durante, Canale, & Rigon (2019).
The log-likelihood surface, as in all mixture models, presents many local maxima (McLachlan & Peel 2004).Thus, appropriate starting values can be crucial to find a meaningful maximiser.We use Latent GOLD 5.1 for model fitting, which implements a well-refined initialisation strategy (details can be found in Vermunt & Magidson 2016, Sec. 7.8).

Evaluating the latent class solution
A commonly used measure for class separation in LCA, and thus also for classification error, is the entropy-based R 2 (Magidson 1981).For response pattern i , this quantifies how much the posterior membership probabilities deviate from uniformity by using the principle of entropy as follows where Pr (X i = s|Y i = y i ) are the posterior class membership probabilities for the simple LC model.An analogous measure of entropy can be computed based on the posteriors from the LC regression, the LC with distal outcome and the cluster-weighted LC models.The total entropy, when no information on the response (and potentially the external variables) is used to predict X , is defined as where p s is an estimate of Pr (X = s).Equation ( 14) considers only the estimated marginal class probabilities rather than the posterior class membership probabilities as in (13).The proportional reduction of entropy when y is available compared to the situation in which y is unknown is an (entropy based) R 2 measure for class separation-as well as for the quality of the classification of a sample-and is defined as where

Artificial data examples
To substantiate the benefits of the cluster-weighted modelling approach in LCA, in this Section we propose an artificial data analysis on some exemplary LCA contexts.In particular, we analyse six large data sets (30,000 sample units), three with 1 (block 1) and the other three with two external variables (block 2).Under both blocks, we generate data from each of the three models in Figure 1a-c.
In either block, we set the number of LCs to S = 2, and to begin with we fit all three models assuming this value to be known.Then, we also show results on estimation of the number of LCs based on BIC.The data were generated in R (R Core Team 2017), and parameter estimation was carried out with Latent GOLD 5.1 (Vermunt & Magidson 2016).
To get approximately equal (realistic) conditions on class separation, we generated the data such that the entropy-based R 2 (Magidson 1981) for the correctly specified model is  about 0.7 in all the three data sets-which is the minimum class separation to get a good LC model (Vermunt 2010;Asparouhov & Muthén 2014).Below we present results from Block 1. Results obtained with two external variables are comparable, and reported in the Appendix.

LC regression (LCreg) data
The LCreg data set was generated from a two-class LCreg model, with class memberships of 0.7 and 0.3, six dichotomous indicators (J = 6) and one continuous Z -drawn from a standard normal distribution-loaded on all six indicators.The external variable Z is loaded on the indicators with a coefficient of −0.5, if the most likely response is on the first class, or 1, if the most likely response is on the second class, giving a large effect size.In classical LCA this is known as differential item functioning (DIF), a violation of the conditional independence assumption often ignored (Masyn 2017; see also Kankaraš, Moors, & Vermunt 2010;Lee, Bulut, & Suh 2017;Suh & Cho 2014;Woods 2009).
We observe in Table 2 that the LCdist model overinflates the mixing proportion on the bigger class, whereas the LCcw model yields nearly equivalent class proportions as in the correctly specified case.This at the cost of four more parameters to be estimated.
Table 3 reports estimated means and variances for the variable Z based on the LCdist and LCcw models, along with standard errors and P -values of the Wald tests of equality of the means and the variances.Nothing is reported for LCreg, as Z is not modelled.In the LCdist model, both means are wrongly estimated to be statistically different from zero.Moreover, based on the reported Wald tests, we reject the nulls of equal means and equal variances (with P -values smaller than 0.01).These findings for the LCdist model can be explained by the fact that it wrongly predicts a clustered distribution on Z in order to accommodate for a direct effect of Z on the indicators which is not accounted for.This creates an additional source of entropy in the class solution (as displayed by the relatively higher value of the entropy-based R 2 ).

LC distal outcome (LCdist) data
The LCdist data set was generated from a two-class LCdist model, with class memberships of 0.7 and 0.3, six dichotomous indicators (J = 6) and one continuous Z , drawn from a mixture of two normal distributions with means of −1 and 1 and common variance of 1.
The LCreg model yields a completely distorted class solution, whereby both the LCdist and LCcw models yield almost identical (correct) solutions (Table 4).Interestingly, the misspecified response-Z relation in the LCreg model yields a solution with relatively smaller class separation (as measured by the entropy-based R 2 ).
Next, we compare estimates of class-specific means and variances of Z as obtained by the LCdist and LCcw models (Table 5).
The LCdist and the LCcw models estimate almost identical means and variances of Z , both correctly not rejecting the null of common variance across LCs.We observe that the SE's for the less parsimonious LCcw model are systematically larger than those of the correctly specified model: this is not surprising, as having less degrees of freedom corresponds, all else equal, to slightly more variable estimates.

LC cluster-weighted (LCcw )data
The LCcw data were generated from a two-class LCcw model, with class memberships of 0.7 and 0.3, six dichotomous indicators (J = 6) and one continuous Z , drawn from a mixture of two normal distributions with means of −1 and 1 and common variance of 1.
Both the LCreg and the LCdist models deliver distorted class solutions (Table 6).Although with a higher entropy-based R 2 , the residual dependence among the indicators due to the exclusion of the direct effect causes a more severe distortion in the LCdist model compared to the LCreg model.Also relative class sizes are overturned for LCdist.Equally we observe (Table 7) that the means and variance(s) of Z are both biased in the LCdist    (Hubert & Arabie 1985), arranged in a three-by-three table, comparing the hard partitions obtained with each fitted model under the three data generating model scenarios.The results are in line with what was observed above.When the data are generated with a LCreg model, the LCcw model delivers an almost identical partition to that of the correctly specified model, followed close up by the LCdist model-with an ARI of ≈ 0.97.In the LCdist data set as well, the LCcw model's partition is nearly as in the correctly specified model (ARI of ≈ 0.99), whereby the ARI drops to ≈ 0.21 when the comparison is with the LCreg partition.In the latest scenario-LCcw data set-fitting both the LCreg and the LCdist models delivers in both cases quite different partitions (≈ 0.16 and ≈ 0.31 ARIs) compared to the correctly specified model.
Based on the above data sets, in Table 9 we report also results on BIC values for the three models in all three scenarios, for S = 1, . . ., 5.Although BIC values can be compared for LCdist and LCcw, selecting a model among the three with BIC cannot be done as Z in LCreg is not modelled and the model likelihoods are therefore not comparable.In both the LCreg and LCdist data sets, BIC for the LCcw model selects, together with the correctly specified model in the first two scenarios, the correct number of classes.

Number of components
Interestingly, however, misspecifying the indicators-Z relation causes, in both the LCreg and LCdist models, a severe overstatement of the number of classes (LCcw data set).

Simulation study 4.1. Design
This simulation study is designed to assess the cluster-weighted LC approach under varying conditions on sample size, class size and class separation on both the indicators and the continuous variable means.Using the same terminology of the artificial examples, we generate data from a two-class LC model with six indicators (J = 6) and one external variable under the three model specifications LRreg, LCdist and LCcw .This allows us to give a fair account of each model's performance both in the correct and in the incorrect model specification cases.
As target measures, we consider estimated class proportions to assess the clustering performance, and estimated means for the external variable (only for LCdist and LCcw).We manipulate class separation through two channels: (1) by altering the response probabilities that control the strength of the relationship between indicators and LCs; and (2) by varying the class conditional means of the external variable.
Regarding (1), we consider two levels, 0.65 and 0.9, corresponding to entropy-based R 2 of about 0.7 and 0.9.Related to (2), the external variable, under the LCdist and LCcw data generating models, is sampled from a clustered normal distribution with unit variance and mean depending on class membership: the values are set to −1 and 1, −2 and 2 and −3 and 3, corresponding to an increasing separation level.
Under the LCreg data generating model, Z is sampled from a standard normal.The sample sizes used are 500 and 2000, and the relative class sizes equal to 0.5 (equal class sizes) and 0.3 (unequal class sizes).The regression coefficient β js (LCreg and LCcw data generating models) is set to 0.5, which corresponds to a moderate-strong magnitude on the logistic scale.For the resulting 56 simulation crossed scenarios, obtained by combining the conditions on sample and class sizes, class separation on the indicators and the external variable and data generating models, we generate 250 data sets.Data generation is done within R (R Core Team 2022), whereas model estimation is carried out using Latent GOLD 5.1 (Vermunt & Magidson 2016) in combination with R.

Summary of the results
Under LCreg, the correctly specified model and LCcw do well (in terms of estimated class sizes) in all conditions.LCdist has the worst performance, exacerbated by small separation and equal class sizes.
With the LCdist data generating model, again the correctly specified model and LCcw do well in all conditions.LCreg reaches its own best performance in conditions with small separation on both indicators and external variable means (conditions 1-6).Larger separation and equal cluster size conditions is where LCreg does worst.Both LCdist and LCcw estimate well the external variable means in all conditions, with SEs recovering well the true underlying variability.
Under LCcw , both classical approaches LCdist and LCreg are misspecified.Interestingly, with the exception of the first four conditions (small separation on both indicators and external variable means), LCdist performs relatively better (smallest bias in estimated class sizes and smaller variance) than LCreg.LCdist estimates well the class conditional means of the external variable only in large separation conditions.More generally, we observe that class separation on the indicators has a relatively stronger impact on LCdist outcome.
In Table 10 we present relative absolute bias in the class proportion for the first (smallest) class, with average Monte Carlo standard deviations in parentheses, averaged over all the levels of separation between classes and sample size per data generating model for the three models.Except for the correctly specified case, LCreg delivers the most distorted class composition overall, with up to 30% bias under the LCcw data generating process.When a LCdist model is fitted, under LCreg and LCcw , the bias is relatively lower-between 10% and 12%.The aggregate numbers show that, on average, class distortion for the clusterweighted model specification is below 3%.In contrast, the distal outcome model's class proportion output is estimated with relatively smaller variance than its competitors.This is not surprising, as LCdist's specification with a single distal outcome requires estimating a smaller number of free parameters than LCreg and LCcw.
Similarly, Table 11 presents the aggregate relative absolute bias for the first-class estimated distal outcome mean, averaged over simulation rounds and conditions, per data generating process.In line with what we commented above, the distal outcome's means are estimated, on average, with bias by LCdist when there is model misspecification.In our settings, this bias is, on average, of the order of 8% relative to the true parameter value.The detailed simulation output (stacked barplots and linegraphs) can be found in the Data S1.

Real data application: relating LC model of psychological contract types to job insecurity
We analysed data from the Dutch and Belgian sample of the Psychological Contracts across Employment Situation (PSYCONES) project (European Commission, 2006).The sample consisted of n = 1353 respondents.We selected J = 8 indicator variables, measuring psychological contract type: the first four refer to employee obligations (whether a promise was made or not), and the last 4 to employer obligations, where each set of four indicators contained two items for relational and two for transactional obligations.Examples of the wording of indicators are: 'This organization promised me a reasonably secure job' and 'This organization promised me a good pay for the work I do'.The typology on these indicators was first proposed by Cuyper et al. (2008), who identified a LC model with four classes corresponding to mutual high obligations, employee over-obligation, employee under-obligation and mutual low obligations in the Belgian and German sample of the PSYCONES data.The four-class model was replicated on the Dutch and Belgian sample by Bakk, Tekle, & Vermunt (2013).We replicated the LC model proposed by Bakk, Tekle, & Vermunt (2013) and related it to perceived job insecurity measured using a scale developed by De Witte et al. (2000), that consists of four indicators with five categories and had a Cronbach's alpha value of 0.88.While Bakk, Tekle, & Vermunt (2013) used a stepwise distal outcome model, and thus ignored possible direct effects between the eight indicators of the LC membership and job insecurity, we re-analyse the data using cluster-weighted LC and LC regression models that allow also for possible direct effects.
In Table 12 we report the simple LC model with four classes.The respondents belonging to the mutual high obligation class have a high probability to have all employee and employer obligations, while in the over-obligation class employees have a high probability to have all obligations, while they perceive that the employer scores low on their side.The under-obligation class shows a reverse pattern: employers score high on all obligations while employees are disengaged.In the mutual low class both parties have low obligations.More detailed description of the model is available in Cuyper et al. (2008).
Subsequently, we investigated whether there is a difference with regard to perceived job insecurity among the four classes.To verify if the same number of classes would be  As Figure 2 shows, the definition of the class-specific response pattern on the eight indicators is comparable between the simple LCA, LCreg and LCcw, while using LCdist we obtain three classes that split up the mutual high class to better model the distribution of the distal outcome.
In Table 14, the direct effects as modelled with LCcw are reported.While these effects are mild, still modelling them improves the estimation of the relationship between the LC variable and the distal outcome.This leads to a model with four classes that, differently to the distal outcome model, has comparable classes to the simple LC model.Most noticeable is the direct effects on the openness to volunteer and the perception of a safe working environment.Higher job insecurity is associated with lower levels of volunteering in all classes but the mutual low class, while higher level of job insecurity is also associated with lower levels of perceived safe environment in the under obligation and mutual high classes.Indeed, both direct effects show relevant substantive insights.
Furthermore, in Table 15 we report the class-specific means and variances obtained using the LCcw approach.The job insecurity is the highest in the over-obligation class (with also the highest variance), while lowest in the under-obligation class (with also the lowest variance, mostly due to within-class homogeneity).The main conclusions are comparable to the initial results by Bakk, Tekle, & Vermunt (2013), where the authors used a three-step distal outcome model.That is job insecurity is the largest using both approaches in the over-obligation class and lowest in the under-obligation class, but effect sizes differ.Nevertheless using the stepwise distal outcome approach the direct effects were ignored.

Conclusion
In this paper, we have brought modelling ideas from the regression mixture literature into LCA.Our focus has been to motivate the use of the cluster-weighted modelling approach as a general specification for the joint relationship of the response variables, the external variable and the latent variable.Six artificial data analyses, and a comprehensive simulation study have been used to illustrate this idea, and the actual advantage of the proposed approach was showed through an application using data from the Dutch and Belgian sample of the PSYCONES project (European Commission, 2006).
The cluster-weighted modelling approach, contrary to the simpler distal outcome model, was able to bring to light, meaningful insights about the relationship between job insecurity and some indicators of the psychological contract type LCs.By allowing for a more general LCcw type of specification, we were able to show the presence of substantively interesting direct effect of job insecurity on volunteering and on perceived safe work environment in some of the classes.
In the applied researcher perspective, the proposed approach has several advantages.By relaxing the conditional independence assumption, cluster-weighted modelling allows to model direct effects when these are of substantive interest, as well as when they are not.In other words, if direct effects are present, our approach, contrary to the distal outcome model, is able to correctly estimate the distal outcome cluster-specific means and variances.Practically, by starting from the most general model and then testing the model assumptions of both the distal outcome and the LC regression models, it guarantees a more flexible option.
The approach we suggest has some limitations as well.First, it can be unstable if some of the response patterns are unobserved, or their sample frequency is too small (for standard sufficient conditions for identifiability of LCA see, for instance, Bandeen-Roche et al. 1997).In such cases, simpler models can be more attractive.Second, in exploratory contexts where the goal of the analysis is not always clear in mind, the circularity of clusterweighted modelling might be relatively harder to interpret and simpler models might be preferable.

Appendix: Results of data analysis on Block 2 of section 3 with two external variables
Data were generated with entropy-based R 2 for the correctly specified model is about 0.7 in all the three data sets.To avoid the well-known issue of degeneracy of Gaussian mixtures (McLachlan & Peel 2004; see also Di Mari, Rocci, & Gattone 2017;García-Escudero et al. 2018 and references therein), we impose homoscedasticity in the class conditional variances for both external variables.

A.1. LCreg data
The LCreg data set was generated like in 3.1 for the LC model with two continuous external variables-each drawn from a standard normal distribution-loaded on all six indicators.The external variables Z 1 and Z 2 are loaded on the indicators with a coefficient of 0.5, if the most likely response is on the first class, or −0.5, if the most likely response is on the second class, giving a moderate/large effect size (Tables A1 and A2).

A.2. LC distal outcome (LCdist) data
The LCdist data set was generated from a two-class LCdist model, with class memberships of 0.7 and 0.3, six dichotomous indicators (J = 6) and two continuous external variables, both drawn from a normal distribution with unit variance and mean depending on class membership of either −1 or 1 (Tables A3 and A4).

A.3. LCcw data
The LCcw data set was generated from a two-class LCdist model, with class memberships of 0.7 and 0.3, six dichotomous indicators (J = 6) and two continuous external variables, both drawn from a normal distribution with unit variance and mean depending on class membership of either −1 or 1.Both Z 1 and Z 2 are loaded on the indicators with a coefficient of 0.5, if the most likely response is on the first class, or −0.5, if the most likely response is on the second class, giving a moderate/large effect size (Tables A5-A8).

Table 1 .
Summary of different modelling assumptions and number of free parameters to be estimated.

Table 2 .
LCreg data.Estimated class proportions, entropy-based R 2 and number of parameters for each of the three estimated models.Results from correctly specified model in bold font. Notes:
Notes: Estimated means (***P -value<0.001,**P -value<0.01,*P -value<0.05)and variances, and P -values from Wald test of equality of component means and variances for the LCdist model and the LCcw model.Standard errors in parentheses.Reported P -values are function of the unobserved LC variable, and are therefore approximate.

Table 4 .
LCdist data.Estimated class proportions, entropy R 2 and number of parameters for each of the three estimated models.Results from correctly specified model in bold font. Notes:

Table 5 .
LCdist data.Estimated means (***P -value<0.001,**P -value<0.01,*P -value<0.05)and variances, and P -values from Wald test of equality of component means and variances for the LCdist model and the LCcw model.Standard errors in parentheses.Results from correctly specified model in bold font.Reported P -values are function of the unobserved LC variable, and are therefore approximate. Notes:

Table 6 .
LCcw data.Estimated class proportions, entropy-based R 2 and number of parameters for each of the three estimated models.Results from correctly specified model in bold font. Notes:
Notes: Estimated means (***P -value<0.001,**P -value<0.01,*P -value<0.05)and variances, and P -values from Wald test of equality of component means and variances for the LCdist model and the LCcw model.Standard errors in parentheses.Results from correctly specified model in bold font.Reported P -values are function of the unobserved LC variable, and are therefore approximate.

Table 8 .
Adjusted Rand indexes, computed between clustering with correctly specified models-LCreg, LCdist and LCcw models-and clustering with the other two models.
model.Contrary to the correctly specified model, in LCdist the Wald test cannot reject the equal variances hypothesis (at 1 % level).Table8reports adjusted Rand indexes (ARIs)

Table 10 .
Relative absolute bias for the class proportion of the first (smallest) class, with average Monte Carlo standard deviations in parentheses, per data generating model, averaged over simulation condition.

Table 11 .
Relative absolute bias for the first (smallest) class distal outcome mean, with average Monte Carlo standard deviations in parentheses, per data generating model, averaged over simulation condition.

Table 12 .
Class proportions and class-specific probabilities of a positive response for the four-class model estimated for the PSYCONES data (Belgian and Dutch combined sample).

Table 13 .
Number of components (Ncomp), BIC, number of parameters (#par), expected classification error (Class.Err.), and entropy-based (Entr.)R 2 for each of the three models, from 1 to 6 (seven in case of LCdist) components.differentapproaches,first we looked at model fit indices.In Table13we report overall fit statistics for the LCreg, LCdist and LCcw models with one to six classes for LCcw and LCreg, and till seven classes for LCdist.While with LCcw and LCreg the best fitting model would be the four-class model (with class definitions very similar to the simple LC model), with LCdist a six-class model is selected by BIC as best fitting.The six-class model breaks down the mutual high class into three smaller classes to account for the unmodelled direct effects of job insecurity and to better model its classspecific distribution.However, when using LCcw we allow for modelling the mild direct effects between job insecurity and some of the indicators, and the four-class model (that is validated in literature) stays the best fit.

Table 15 .
The class-specific means, variances and corresponding Wald test statistics for the LCcw model on the PSYCON data.

Table A1 .
LCreg data.Two external variables.Estimated class proportions, entropy-based R 2 and number of parameters for each of the three estimated models.Results from correctly specified model in bold font. Notes:
Notes: Two external variables.Estimated means (***P -value<0.001,**P -value<0.01,*P -value<0.05)and variances, and P -values from Wald test of equality of component means for the LCdist model and the LCcw model.Standard errors in parentheses.Reported P -values are function of the unobserved LC variable, and are therefore approximate.

Table A3 .
LCdist data.Two external variables.Estimated class proportions, entropy R 2 and number of parameters for each of the three estimated models.Results from correctly specified model in bold font. Notes:

Table A4 .
LCdist data.Two external variables.Estimated means (***P -value<0.001,**P -value<0.01,*P -value<0.05)and variances, and P -values from Wald test of equality of component means for the LCdist model and the LCcw model.Standard errors in parentheses.Reported P -values are function of the unobserved LC variable, and are therefore approximate. Notes:

Table A5 .
LCcw data.Two external variables.Estimated class proportions, entropy R 2 and number of parameters for each of the three estimated models.Results from correctly specified model in bold font. Notes:

Table A6 .
LCcw data.Two external variables.Estimated means (***P -value<0.001,**P -value<0.01,*P -value<0.05)and variances, and P -values from Wald test of equality of component means for the LCdist model and the LCcw model.Standard errors in parentheses.Reported P -values are function of the unobserved LC variable, and are therefore approximate. Notes:

Table A7 .
Adjusted Rand indexes, computed between clustering with correctly specified models-LCreg, LCdist and LCcw models-and clustering with the other two models.Two external variables.

Table A8 .
Model selection with BIC computed for each model at each data generating model-LCreg, LCdist and LCcw-for S = 1, . . ., 5. Data generating model and minimum BIC value, for each model at each scenario, in bold.Two external variables.