Abstract 

  1. Top of page
  2. Abstract 
  3. 1. Introduction
  4. 2. The dichotomous factor analysis model
  5. 3. Method
  6. 4. Results
  7. 5. Discussion
  8. Acknowledgements
  9. References
  10. Supporting Information

We conducted a Monte Carlo study to investigate the performance of the polychoric instrumental variable estimator (PIV) in comparison to unweighted least squares (ULS) and diagonally weighted least squares (DWLS) in the estimation of a confirmatory factor analysis model with dichotomous indicators. The simulation involved 144 conditions (1,000 replications per condition) that were defined by a combination of (a) two types of latent factor models, (b) four sample sizes (100, 250, 500, 1,000), (c) three factor loadings (low, moderate, strong), (d) three levels of non-normality (normal, moderately, and extremely non-normal), and (e) whether the factor model was correctly specified or misspecified. The results showed that when the model was correctly specified, PIV produced estimates that were as accurate as ULS and DWLS. Furthermore, the simulation showed that PIV was more robust to structural misspecifications than ULS and DWLS.


1. Introduction

Confirmatory factor analysis (CFA) is a widely used statistical tool in test development which allows researchers to test hypotheses about the structure of a scale. Typically, model parameters within CFA are estimated using maximum likelihood (ML) estimation (Browne, 1984; Bollen, 1989). ML assumes that the observed variables are continuous and that they follow a multivariate normal distribution, or equivalently that their covariance matrix has a Wishart distribution. However, test developers often work with items that have a binary response format (e.g. Nestler, Back, & Egloff, 2011). In this case, the assumptions behind ML are typically not met, leading to significant estimation problems, including, for instance, that the model parameters as well as their standard errors are inaccurately estimated (e.g. Babakus, Ferguson, & Jöreskog, 1987; Muthén & Kaplan, 1992).

Given these findings, a number of alternative estimation methods have been developed (Christofferson, 1975; Muthén, 1984; Jöreskog & Sörbom, 1996). All of these methods assume that a continuous latent variable underlies the observed responses to a binary item, and that the specified CFA model holds for these latent continuous variables and not the observed binary variables (see Bollen, 1989, pp. 439–445, for a general introduction). Model parameters are estimated by employing refined versions of a weighted least squares (WLS) approach (Browne, 1984); these versions draw on the tetrachoric correlations among the dichotomous items and on their asymptotic covariance matrix. Overall, robustness studies have shown that these estimation methods perform better than ML (e.g. Muthén & Kaplan, 1992; Beauducel & Herzberg, 2006). Simulations, however, have also found that they are only partially robust to misspecified factor models as they are system-wide estimators: they estimate all model parameters in one step (Bollen & Maydeu-Olivares, 2007).

The aim of the present paper is to compare the performance of a recently suggested alternative equation-by-equation estimator to the established approaches in the estimation of binary CFA models. Specifically, we compared the polychoric instrumental variable (PIV) estimator suggested by Bollen and Maydeu-Olivares (2007) to the unweighted least squares (ULS) estimator and the diagonally weighted least squares (DWLS) estimator in a Monte Carlo study. In the next section we introduce the dichotomous factor analysis model. We will then give a brief description of the standard estimation methods (e.g. ULS, DWLS) and introduce the PIV estimator.

2. The dichotomous factor analysis model

Consider a questionnaire consisting of n items $y_j$ (j = 1, …, n) that individuals rated by using two response alternatives. The dichotomous factor analysis model assumes (a) that a latent variable $y_j^{*}$ underlies the observed binary variable $y_j$, and (b) that both are related via a threshold relationship: $y_j = 0$ if $\tau_{j0} < y_j^{*} \le \tau_{j1}$, and $y_j = 1$ if $\tau_{j1} < y_j^{*} < \tau_{j2}$, respectively, where $\tau_{j0} = -\infty$ and $\tau_{j2} = \infty$.

Furthermore, the latent responses y* are linear functions of m latent factors η:

$$y^{*} = \Lambda\eta + \varepsilon \qquad (1)$$

where y* denotes the n × 1 vector of latent response variables, Λ is an n × m matrix of factor loadings, η is an m × 1 vector of factors, and ɛ denotes an n × 1 vector of error terms. By assumption, the factors and the errors are normally distributed, the expectation of both is zero, and the factors and the error terms are uncorrelated.

It follows, then, that the latent distributions are normally distributed, that they have an expectation of zero, and that their covariance matrix Σ is given by:

$$\Sigma = \Lambda\Psi\Lambda^{T} + \Theta \qquad (2)$$

where Ψ is the covariance matrix of the latent factors η and Θ is the covariance matrix of the errors ɛ. The covariance structure hypothesis thus holds for the latent distributions and not for the observed binary responses. To estimate the model parameters, Σ is substituted by the matrix of tetrachoric correlations between the binary responses, P, and Θ is set to $\Theta = I - \operatorname{diag}(\Lambda\Psi\Lambda^{T})$. The latter is done to identify the model but has the consequence that the error variances are no longer model parameters, and, more generally, that a correlation structure is analysed (see Bentler & Savalei, 2010, for an overview of the analysis of correlation structures; see also Muthén & Asparouhov, 2002, for an approach for analysing covariance structures when there are binary indicators). Furthermore, it can be shown that the dichotomous factor analysis model is mathematically equivalent to the normal ogive version of the two-parameter item response model and to the graded response model (see McDonald, 1999; Bollen, Bauer, Christ, & Edwards, 2010).
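The equivalence with the normal ogive two-parameter model mentioned above can be illustrated with a short sketch. It assumes an item that loads on a single factor with unit variance and uses the textbook conversion a = λ / √(1 − λ²) and b = τ / λ; the function name and the example values are illustrative and are not taken from the paper.

```python
import math


def normal_ogive_parameters(loading, threshold):
    """Convert a standardized loading and threshold into normal-ogive
    discrimination (a) and difficulty (b).

    Assumes a single factor with unit variance, so that
    P(y = 1 | eta) = Phi(a * (eta - b)).
    """
    if not 0 < abs(loading) < 1:
        raise ValueError("loading must be non-zero and strictly between -1 and 1")
    uniqueness = 1.0 - loading ** 2        # error variance of the latent response
    a = loading / math.sqrt(uniqueness)    # discrimination
    b = threshold / loading                # difficulty
    return a, b


# Illustrative values: a loading of .70 and a threshold of 0.
a, b = normal_ogive_parameters(0.70, 0.0)
print(round(a, 4), b)
```

Under these assumptions, a higher loading translates into a steeper item characteristic curve, and a threshold of zero corresponds to an item of average difficulty.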

2.1. Standard estimation and testing approach

Imagine that we have drawn a random sample of size N from the dichotomous factor model. Computing model parameters starts with the estimation of the thresholds, $\tau_{j1}$, and the tetrachoric correlations among the items (Olsson, 1979). Then the asymptotic covariance matrix of the tetrachoric correlations, $\hat{\Gamma}$, is estimated. Finally, a least squares function F is minimized to estimate the model parameters (Muthén, 1978):

$$F(\theta) = [\hat{\rho} - \rho(\theta)]^{T}\, W\, [\hat{\rho} - \rho(\theta)] \qquad (3)$$

where θ is the vector of (independent) model parameters, $\hat{\rho}$ is the vector of estimated tetrachoric correlations, ρ(θ) denotes the restrictions imposed on the population tetrachoric correlations by the parameter vector θ, and W is a positive definite weight matrix.

The standard approaches to estimating model parameters differ in their choice of the weight matrix W. In ULS (Muthén, 1978, 1984), to begin with, W is set to an identity matrix (i.e. W = I). In WLS (Browne, 1984), by contrast, W is the inverse of the estimated asymptotic covariance matrix of the tetrachoric correlations (i.e. $W = \hat{\Gamma}^{-1}$, where $\hat{\Gamma}$ denotes that estimated matrix). Finally, in DWLS (Jöreskog & Sörbom, 1996), W is set to $[\operatorname{diag}(\hat{\Gamma})]^{-1}$. Note that the name that Flora and Curran (2004) and Muthén, du Toit, and Spisic (1997) use for DWLS is robust WLS. Also, they use a diagonal matrix V instead of W to estimate the model parameters. In contrast to W, V includes not only the asymptotic variances of the tetrachoric correlations, but also the asymptotic variances of the thresholds. However, all approaches (ULS, WLS, DWLS, and robust WLS) use the full asymptotic covariance matrix, $\hat{\Gamma}$, to obtain standard errors and the chi-square test statistic.
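How the choice of W changes the value of the fit function can be seen in a minimal sketch. It evaluates the quadratic form for the ULS and DWLS weight matrices only; the correlations and the asymptotic covariance matrix below are made-up numbers for illustration, and the helper names are hypothetical.

```python
def fit_function(r_hat, rho_model, weights):
    """Quadratic-form discrepancy (r_hat - rho)' W (r_hat - rho)."""
    resid = [r - m for r, m in zip(r_hat, rho_model)]
    k = len(resid)
    return sum(resid[i] * weights[i][j] * resid[j]
               for i in range(k) for j in range(k))


def uls_weights(k):
    """ULS: identity weight matrix."""
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]


def dwls_weights(acov):
    """DWLS: inverse of the diagonal of the asymptotic covariance matrix."""
    k = len(acov)
    return [[1.0 / acov[i][i] if i == j else 0.0 for j in range(k)]
            for i in range(k)]


# Hypothetical inputs: three estimated tetrachoric correlations, the
# correlations implied by some parameter vector, and an invented
# asymptotic covariance matrix of the estimates.
r_hat = [0.50, 0.45, 0.52]
rho_model = [0.49, 0.49, 0.49]
acov = [[0.004, 0.001, 0.001],
        [0.001, 0.005, 0.001],
        [0.001, 0.001, 0.004]]

f_uls = fit_function(r_hat, rho_model, uls_weights(3))
f_dwls = fit_function(r_hat, rho_model, dwls_weights(acov))
print(f_uls, f_dwls)
```

In this sketch DWLS ignores the off-diagonal elements of the covariance matrix when weighting the residuals, whereas full WLS would require inverting the whole matrix.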

All three estimators can be employed to obtain model parameters in popular software packages such as LISREL (Jöreskog & Sörbom, 1996) or Mplus (Muthén & Muthén, 1998). In both programs, ULS and WLS are termed accordingly. DWLS, however, is called robust DWLS in LISREL and WLSMV in Mplus. Additionally, although both programs use the same method to estimate the tetrachoric correlations, they differ in their estimation of the asymptotic covariance matrix. Specifically, whereas LISREL employs the procedure described in Jöreskog (1994), Mplus uses the method described in Muthén (1984). The two methods differ in their treatment of the threshold parameters, but are asymptotically equivalent (Muthén & Satorra, 1995). Also, Dolan (1994) showed that they produced similar results in the case of CFA even with a sample size of N = 200.

ULS, WLS and DWLS yield consistent estimates of the model parameters that are asymptotically normal, and asymptotically correct standard errors can be computed (see Bollen & Maydeu-Olivares, 2007). Furthermore, simulation studies have found that whereas WLS performs adequately only for large sample sizes, DWLS and ULS perform well for small ones too (e.g. Dolan, 1994; Flora & Curran, 2004; Beauducel & Herzberg, 2006; Forero, Maydeu-Olivares, & Gallardo-Pujol, 2009). Therefore, both are typically used to estimate the parameters of a CFA model. However, recent simulations have found that they are only partially robust to structural misspecifications (Bollen & Maydeu-Olivares, 2007), and both lead to biased parameter estimates and standard errors when the factor loadings are small, when the latent continuous variables are skewed, and when there are only a few factor indicators (Forero et al., 2009).

2.2. Polychoric instrumental variable estimator

Given these results, and given that these conditions are typically found in applied settings, the performance of alternative estimation methods should be investigated. One of the alternatives is the polychoric instrumental variable estimator recently proposed by Bollen and Maydeu-Olivares (2007). The basic idea behind the PIV estimator is to compute the factor loadings in a first step and to estimate the variance and covariance model parameters – based on the values of the factor loadings – in a second step. Specifically, whereas the factor loadings are obtained using a non-iterative procedure employing instrumental variables (IVs), the variance–covariance parameters are estimated using an iterative procedure.

IV estimation is a special case of generalized method of moments estimation (Baum, Schaffer, & Stillman, 2003; Hall, 2005) and is typically employed in econometrics (e.g. Angrist, Imbens, & Rubin, 1996) to investigate the effect of a predictor on a criterion in a linear regression when the predictor is measured with error or is systematically related to other determinants of the criterion. In this case, the assumptions of ordinary least squares regression are not met, and the model parameters cannot be consistently estimated. A solution to this problem is to find one or more exogenous variables that affect the predictor but not the criterion (cf. Morgan & Winship, 2007), and to use these IVs to compute consistent versions of the regression parameters.

Bollen (1996) extended this approach to structural equation models (SEMs) for continuous variables.¹ Applied to a CFA model, he showed that each indicator is a linear function of the factor loadings of this indicator, the indicator used to scale the latent factor, and a composite disturbance term containing the error of the scaling indicator and the error of the actual indicator. Because the scaling indicator is correlated with the composite disturbance term (the disturbance contains the scaling indicator's own error), IVs are used to compute the factor loadings. Specifically, in two-stage least squares (2SLS) regression, the scaling indicator is first regressed on the IVs. The regression coefficients obtained are then used to compute predicted values of the scaling indicator, and the actual indicator is then regressed on these predicted values. The regression coefficient of this second regression is the desired factor loading.
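The two regression stages described above can be sketched directly on a correlation matrix. The example below is a simplified illustration, not the authors' implementation: it uses a hypothetical one-factor population with invented loadings, treats the first indicator as the scaling indicator, and recovers the loading of the second indicator relative to the scaling indicator. The helper names `solve` and `iv_loading` are assumptions.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def iv_loading(corr, y, z, ivs):
    """2SLS coefficient of indicator y on scaling indicator z, given IVs."""
    s_vv = [[corr[i][j] for j in ivs] for i in ivs]
    s_vz = [corr[i][z] for i in ivs]
    s_vy = [corr[i][y] for i in ivs]
    first_stage = solve(s_vv, s_vz)                      # regress z on the IVs
    num = sum(b * s for b, s in zip(first_stage, s_vy))  # cov(z_hat, y)
    den = sum(b * s for b, s in zip(first_stage, s_vz))  # var(z_hat)
    return num / den


# Hypothetical one-factor population; loadings are invented for illustration.
lam = [0.70, 0.55, 0.60, 0.65]
corr = [[1.0 if i == j else lam[i] * lam[j] for j in range(4)] for i in range(4)]

beta = iv_loading(corr, y=1, z=0, ivs=[2, 3])
print(beta)  # loading of indicator 2 in the metric of the scaling indicator
```

Because the scaling indicator fixes the metric of the factor, the coefficient equals the ratio of the two population loadings in this setup rather than the standardized loading itself.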

The 2SLS/IV estimator yields consistent estimates of the factor loadings when the IVs meet certain requirements. The IVs must be (a) correlated with the scaling indicator, (b) unrelated to the errors in the composite disturbance term, and (c) sufficient in number so that there are at least as many IVs as there are scaling indicators (cf. Bollen, 1996; Bollen & Maydeu-Olivares, 2007).² If the CFA model is identified, the potential IVs are a subset of the other indicators, and the structure of the CFA model can be used to determine which of these indicators satisfy conditions (a) and (b). The IVs are thus model implied, and it is this feature that differentiates Bollen's 2SLS/IV approach from the typical usage of IVs in econometric contexts where the IVs are taken from outside of the model. To indicate this conceptual difference, the 2SLS/IV approach is called model-implied instrumental variables (MIIV) in more recent publications (e.g. Bollen & Bauer, 2004).

Recently, Bollen and Maydeu-Olivares (2007) generalized the MIIV estimator to categorical variables. As in the case of continuous indicators, factor loadings are first computed using IVs in a two-stage regression, whereby these computations draw on the tetrachoric correlations among the indicators. Once the factor loadings have been obtained, variances of the factors and the covariance among them are computed. To this end, the factor loadings are entered into equation (3), and the ULS variant of F is used to estimate the remaining model parameters. Finally, Bollen and Maydeu-Olivares (2007) also provided formulae to compute standard errors for the factor loadings and for the variance and covariance model parameters.

2.3. The current simulation

To date, only Bollen and Maydeu-Olivares (2007) have done a simulation study in which the performance of PIV relative to ULS has been investigated. In their study, they used a two-factor model with high factor loadings (λ = .80), and the model was either correctly or incorrectly specified. They found that PIV provided parameter estimates that were as accurate as estimates obtained using ULS when the model was correctly specified. When the CFA model was incorrectly specified (the cross-loading of an item was set to zero), PIV produced more accurate parameter estimates compared to ULS. The aim of the present research was to replicate and extend these findings. Specifically, we examined the performance of PIV under different settings of factor model size, factor loading magnitude (i.e. indicator reliability), sample size, and non-normality of the latent distributions (see Table 1 for an overview of design factors).³ Furthermore, we investigated the effect of a structural misspecification. Finally, we compared the performance of PIV not only to ULS but also to DWLS, as the latter is also widely used in estimating dichotomous CFA models.

Table 1. Factors investigated in the current simulation

| Factor | Levels |
|---|---|
| Model size | 2 levels: two-factor model or three-factor model |
| Factor loading | 3 levels: λ = .40, .55, .70 |
| Sample size | 4 levels: N = 100, 250, 500, 1,000 |
| Latent distributions | 3 levels: normal, moderately or extremely non-normal |
| Misspecification | 2 levels: no or yes |
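
As a quick check on the size of the design, the fully crossed conditions in Table 1 can be enumerated as follows. This is only a sketch of the design grid; the variable names and level labels are illustrative.

```python
from itertools import product

# Levels of the five design factors as listed in Table 1.
model_sizes = ["two-factor", "three-factor"]
loadings = [0.40, 0.55, 0.70]
sample_sizes = [100, 250, 500, 1000]
distributions = ["normal", "moderately non-normal", "extremely non-normal"]
specifications = ["correct", "misspecified"]

# Fully crossed design: one tuple per simulation condition.
conditions = list(product(model_sizes, loadings, sample_sizes,
                          distributions, specifications))

print(len(conditions))  # 2 * 3 * 4 * 3 * 2 conditions
```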

3. Method

3.1. Outcome variables and data generation

3.1.1. Continuous latent responses

In each of the factor loading conditions, three continuous latent response population distributions (N = 20,000) were generated using the R package mvtnorm (Genz et al., 2011). The first distribution was multivariate normal with skewness and kurtosis of zero. The second distribution differed moderately from normality by having a skewness of 2 and a kurtosis of 7, and the final distribution was extremely non-normal with a univariate skewness of 3 and kurtosis of 20. These parameters were chosen as they are representative of levels of non-normality typically encountered in applied research (see Flora & Curran, 2004; Yu, 2002). To simulate the latent distributions with non-zero skewness and kurtosis, we used the procedures suggested by Fleishman (1978) and Vale and Maurelli (1983). For each of the three distributions, 1,000 random samples of four different sample sizes were drawn: 100, 250, 500, and 1,000.
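The univariate part of this procedure, Fleishman's power transformation Y = a + bZ + cZ² + dZ³ with a = −c, can be sketched as follows. The function below evaluates Fleishman's moment equations for a given set of coefficients; the coefficients in the usage lines satisfy those equations for a skewness of 2 and a kurtosis of 7 up to rounding, but they are supplied here for illustration and are not taken from the paper. The multivariate step of Vale and Maurelli (1983), which adjusts the intermediate correlations, is not covered.

```python
def fleishman_moments(b, c, d):
    """Variance, skewness and excess kurtosis implied by Fleishman's equations
    for Y = -c + b*Z + c*Z**2 + d*Z**3 with Z ~ N(0, 1).

    The skewness and kurtosis expressions presuppose that the coefficients
    satisfy the unit-variance constraint (first returned value equal to 1).
    """
    variance = b**2 + 6*b*d + 2*c**2 + 15*d**2
    skewness = 2*c*(b**2 + 24*b*d + 105*d**2 + 2)
    kurtosis = 24*(b*d + c**2*(1 + b**2 + 28*b*d)
                   + d**2*(12 + 48*b*d + 141*c**2 + 225*d**2))
    return variance, skewness, kurtosis


def fleishman_transform(z, b, c, d):
    """Apply the power transformation to a standard normal draw z."""
    return -c + b*z + c*z**2 + d*z**3


# Illustrative coefficients for the moderately non-normal condition.
b, c, d = 0.76158545, 0.26002257, 0.05307227
variance, skewness, kurtosis = fleishman_moments(b, c, d)
print(round(variance, 3), round(skewness, 3), round(kurtosis, 3))
```

With b = 1 and c = d = 0 the transformation leaves a standard normal variable unchanged, which gives a simple sanity check on the equations.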

3.1.2. Binary observed responses

After drawing (continuous) samples from the latent distributions, we transformed each latent response variable into a dichotomous variable by dichotomizing it at a threshold of τ = 0.
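
The dichotomization step, together with the recovery of a threshold and a tetrachoric correlation from the resulting binary data, can be sketched for the bivariate normal case. The closed-form expression for the correlation used below holds only when both population thresholds are exactly zero, which matches the design described here; it is a shortcut for illustration and not the two-stage procedure of Olsson (1979). The seed, the sample size, and the correlation of .49 are arbitrary choices.

```python
import math
import random
from statistics import NormalDist


def dichotomize(values, threshold=0.0):
    """Return 1 for latent responses above the threshold, else 0."""
    return [1 if v > threshold else 0 for v in values]


def estimate_threshold(binary):
    """Threshold estimate: normal quantile of the proportion of zeros."""
    p_zero = binary.count(0) / len(binary)
    return NormalDist().inv_cdf(p_zero)


def tetrachoric_zero_thresholds(y1, y2):
    """Closed-form tetrachoric correlation, valid only for zero thresholds."""
    p11 = sum(a * b for a, b in zip(y1, y2)) / len(y1)
    return math.sin(2 * math.pi * (p11 - 0.25))


rng = random.Random(2024)   # arbitrary seed for reproducibility
rho = 0.49                  # illustrative latent correlation
n = 20000

# Two correlated standard normal latent responses.
z1 = [rng.gauss(0, 1) for _ in range(n)]
z2 = [rho * a + math.sqrt(1 - rho**2) * rng.gauss(0, 1) for a in z1]

y1, y2 = dichotomize(z1), dichotomize(z2)
tau_hat = estimate_threshold(y1)
rho_hat = tetrachoric_zero_thresholds(y1, y2)
print(round(tau_hat, 3), round(rho_hat, 3))
```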

3.2. Model specification

The population correlation matrix on which samples were based conformed either to a model consisting of two correlated factors each measured by four indicators (see Figure 1(a)) or to a model consisting of three correlated factors each measured by four indicators (see Figure 1(b)). Depending on the factor loading condition, factor loadings on the true latent factor were set to either λ = .70, λ = .55, or λ = .40; otherwise they were set to λ = 0. The uniquenesses were all set to θ = .51 in the high factor loading condition, θ = .69 in the moderate condition, or θ = .84 in the low condition. Finally, the population covariance between the latent factors was set to 0.3, and the variances of the factors were set to 1.
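The population correlation matrix implied by this specification can be computed from the loadings and the factor covariance matrix. The sketch below does this for the two-factor model in the high factor loading condition; the function name is illustrative, and the uniquenesses are obtained as one minus the common variance so that every latent response has unit variance.

```python
def implied_correlations(loadings, factor_cov):
    """Model-implied correlation matrix and uniquenesses for a CFA model."""
    n, m = len(loadings), len(factor_cov)
    sigma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Element (i, j) of Lambda Psi Lambda'.
            sigma[i][j] = sum(loadings[i][k] * factor_cov[k][l] * loadings[j][l]
                              for k in range(m) for l in range(m))
    uniquenesses = [1.0 - sigma[i][i] for i in range(n)]
    for i in range(n):
        sigma[i][i] = 1.0   # common variance plus uniqueness
    return sigma, uniquenesses


# Two-factor model, four indicators per factor, high loading condition.
lam = 0.70
loadings = [[lam, 0.0]] * 4 + [[0.0, lam]] * 4
factor_cov = [[1.0, 0.3], [0.3, 1.0]]

sigma, theta = implied_correlations(loadings, factor_cov)
print(round(sigma[0][1], 3), round(sigma[0][4], 3), round(theta[0], 2))
```

Indicators of the same factor correlate at λ², and indicators of different factors at λ² times the factor covariance.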

Figure 1. (a) Two-factor model and (b) three-factor model simulated in the present study. Variables in circles refer to latent variables (i.e. latent factors, ηi, or latent continuous response variables, yi*, respectively), and variables in rectangles to dichotomous response variables. In the case of the misspecified two-factor model, the fourth and eighth indicators were set to load on the second and the first latent factors, respectively (see dotted lines). For the misspecified three-factor model, the fourth (eighth, twelfth) indicator was set to load on the third (first, second) latent factor (again see dotted lines).

In all replications, two model specifications were tested (again see Figure 1). The first model was correctly specified such that all indicators were allowed to load on their respective true factor and the two or three factors, respectively, were allowed to correlate. In the case of the incorrectly specified factor model, one observed indicator of each factor in the true model was specified to load on the other, wrong, factor, even though the factor loadings on the respective incorrect factors were zero in the population. With respect to the two-factor model, for example, item 4 was set to load on factor 2, and item 8 was set to load on factor 1.

3.3. Data analysis

3.3.1. Parameter estimation

Parameter estimates and standard errors were computed using R (R Development Core Team, 2011; the R code can be found in the supplementary materials, available with the online version of this paper). Specifically, thresholds and tetrachoric correlations were obtained by employing the two-stage procedure described in Olsson (1979). Then, an estimate of the asymptotic covariance matrix of the tetrachoric correlations, $\hat{\Gamma}$, was computed by implementing the procedures of Jöreskog (1994) and Christofferson and Gunsjö (1996). In the case of ULS, DWLS, and PIV (when the variances of the factors and the covariance between them had to be estimated), the function F was minimized using the R function nlminb(). Factor loadings were estimated with a starting value of 1.00, variance parameters with a starting value of 0.05, and for covariance parameters, the starting value was set to 0.⁴

Finally, to obtain the factor loadings according to the PIV estimator, we selected IVs that satisfied all three requirements mentioned earlier. Consider, for example, the estimation of the factor loading of the second indicator in the case of the two-factor model (see Figure 1(a)). To estimate it, IVs are needed to predict the scaling indicator of the respective factor (i.e. $y_1^{*}$). These IVs must be (a) correlated with $y_1^{*}$ and (b) uncorrelated with the disturbance term that contains the error terms of $y_1^{*}$ and $y_2^{*}$. All other indicators, $y_3^{*}, \ldots, y_8^{*}$, meet these two requirements as either they load on the same latent factor or the two latent factors are correlated, and all errors are uncorrelated. Of this set of potential IVs, finally, we used the IVs that, according to the CFA model, should be most strongly related to the scaling indicator. In the case of the second indicator, $y_3^{*}$ and $y_4^{*}$ were chosen to compute the factor loading (see Table 2 for the IVs for the other indicators).
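The selection rule described above can be written as a small sketch: for a target indicator, take the scaling indicator of the factor it is specified to load on, and use the remaining indicators of that factor as IVs. The dictionaries encode the correctly specified and the misspecified two-factor model; the function and variable names are illustrative and not taken from the authors' R code.

```python
def select_ivs(target, factor_of, scaling_of):
    """Return the scaling indicator and the IVs for a target indicator.

    IVs are the other indicators specified to load on the same factor,
    i.e. those expected to relate most strongly to the scaling indicator.
    """
    factor = factor_of[target]
    z = scaling_of[factor]
    ivs = sorted(i for i, f in factor_of.items()
                 if f == factor and i not in (target, z))
    return z, ivs


# Indicator -> factor assignment in the correctly specified two-factor model.
correct = {1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2, 7: 2, 8: 2}

# Misspecified model: item 4 is moved to factor 2 and item 8 to factor 1.
misspecified = dict(correct)
misspecified[4] = 2
misspecified[8] = 1

# Scaling indicators: item 1 for factor 1 and item 5 for factor 2.
scaling = {1: 1, 2: 5}

print(select_ivs(2, correct, scaling))
print(select_ivs(4, misspecified, scaling))
```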

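Table 2. Instrumental variables (IV) used to estimate the factor loading of the indicator y* in the two-factor model

| y* (correctly specified) | z | IV | y* (incorrectly specified) | z | IV |
|---|---|---|---|---|---|
| 2 | 1 | 3, 4 | 2 | 1 | 3, 8 |
| 3 | 1 | 2, 4 | 3 | 1 | 2, 8 |
| 4 | 1 | 2, 3 | 4 | 5 | 6, 7 |
| 6 | 5 | 7, 8 | 6 | 5 | 4, 7 |
| 7 | 5 | 6, 8 | 7 | 5 | 4, 6 |
| 8 | 5 | 6, 7 | 8 | 1 | 2, 3 |

3.3.2. Goodness-of-fit statistic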

In addition to examining the accuracy of the three estimators, we also investigated how well an estimator-specific goodness-of-fit statistic performed in the different conditions. These goodness-of-fit tests were based on the test statistic $T = N\hat{F}$, where N denotes the size of the sample, and $\hat{F}$ is the value of the fit function with the estimated parameter values inserted (see equation 3). When the weight matrix W is correctly specified, that is, $W = \hat{\Gamma}^{-1}$ (the inverse of the estimated asymptotic covariance matrix of the tetrachoric correlations), then T is asymptotically χ²-distributed with d degrees of freedom. If this assumption is not met, as in the case of ULS and DWLS, T is asymptotically distributed as a weighted sum (mixture) of d independent one-degree-of-freedom chi-square variables. A goodness-of-fit test can then be obtained by rescaling T so that, for example, its mean and variance equal those of a chi-square variate with d degrees of freedom. In the latter case, the rescaled statistic is generally not χ²-distributed, but can be well approximated by a χ² distribution with d degrees of freedom (Bentler & Savalei, 2010, p. 14). We obtained these mean-and-variance-adjusted statistics for the ULS and the DWLS estimators by implementing the formula as suggested, for instance, in Bentler and Savalei (2010, equation 1.25); for the PIV estimator, we used the equations given in Bollen and Maydeu-Olivares (2007). As before, we used R to compute the test statistics.

4. Results

The results for the two- and three-factor models were very similar. Therefore, we present the results of the two-factor model only (see Figure 1(a)). Tables that report the results for the three-factor model can be found in the supplementary materials. Also, the results in each simulated condition are included in the supplementary materials. In Section 4.1 we present the results concerning convergence failures. Then we go on to report the results concerning the performance of the three estimation methods in parameter estimation (Section 4.2) and standard error estimation (Section 4.3). Finally, in Section 4.4, we report the results concerning the test statistic.

4.1. Convergence failures

Table 3 displays the percentage of convergence failures depending upon estimator, magnitude of true factor loading, and sample size. When the factor model was correctly specified, most convergence failures emerged in conditions of a small sample size and a small true factor loading. This effect was more pronounced for ULS and DWLS than for PIV. Overall, PIV yielded a very small rate of convergence failures. A similar result pattern emerged for the misspecified model, although failures to converge arose here also in conditions of moderate and high true factor loadings. Again, PIV yielded the smallest rate of convergence failures. Finally, non-normality had no systematic effect in either model specification condition.

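Table 3. Rates (%) of convergence failures for correctly and incorrectly specified models depending on estimator, magnitude of true factor loading (λ) and sample size (N)

| λ | N | ULS, correct | ULS, incorrect | DWLS, correct | DWLS, incorrect | PIV, correct | PIV, incorrect |
|---|---|---|---|---|---|---|---|
| .40 | 100 | 21.9 | 26.7 | 21.4 | 25.9 | 0.1 | 0.1 |
| .40 | 250 | 3.9 | 11.5 | 3.7 | 10.7 | 0.1 | 0.1 |
| .40 | 500 | 0.2 | 2.8 | 0.2 | 2.4 | 0.0 | 0.1 |
| .40 | 1,000 | 0.0 | 0.3 | 0.0 | 0.3 | 0.0 | 0.1 |
| .55 | 100 | 1.8 | 8.5 | 2.1 | 8.0 | 0.1 | 0.1 |
| .55 | 250 | 0.0 | 0.8 | 0.0 | 1.2 | 0.0 | 0.0 |
| .55 | 500 | 0.0 | 0.1 | 0.0 | 0.1 | 0.0 | 0.0 |
| .55 | 1,000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| .70 | 100 | 0.0 | 2.7 | 0.4 | 1.9 | 0.0 | 0.0 |
| .70 | 250 | 0.0 | 0.2 | 0.0 | 0.2 | 0.0 | 0.0 |
| .70 | 500 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| .70 | 1,000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |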

In accordance with earlier research, cases in which the estimator did not converge were removed from the analysis. In contrast to earlier research, however, we kept replications in which Heywood cases occurred in the analysis. This was done as Heywood cases reflect natural variability in parameter estimates, and excluding them thus leads to biased parameter means and parameter variances (see Savalei & Kolenikov, 2008).

4.2. Relative bias of parameter estimates

To compare the performance of the three estimators, we computed the relative bias of the parameter estimates. To this end, a mean factor loading per replication was obtained by averaging the estimated factor loadings across the indicators. For the misspecified factor model, two mean factor loadings per condition were computed: one for the factor loadings of the correctly specified indicators and one for the incorrectly specified indicators. The relative bias of these mean estimates, and of the covariance, was then computed using $100 \times (\hat{\theta} - \theta)/\theta$, where $\hat{\theta}$ denotes the estimated parameter and θ the true parameter value. Relative bias below 10% was considered acceptable. Relative bias of 10–20% was considered substantial, and above 20% unacceptable (see Forero et al., 2009).
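The relative bias measure and the cut-offs described above can be expressed in a few lines. The sketch assumes that the estimates are averaged across replications before the bias is computed and that the cut-offs apply to the absolute value of the bias; the estimates in the usage line are invented.

```python
from statistics import mean


def relative_bias(estimates, true_value):
    """Relative bias in percent of the mean estimate."""
    return 100.0 * (mean(estimates) - true_value) / true_value


def classify_bias(bias_percent):
    """Apply the cut-offs: below 10% acceptable, 10-20% substantial,
    above 20% unacceptable (absolute value)."""
    size = abs(bias_percent)
    if size < 10:
        return "acceptable"
    if size <= 20:
        return "substantial"
    return "unacceptable"


# Hypothetical loading estimates from three replications, true value .40.
bias = relative_bias([0.42, 0.38, 0.46], 0.40)
print(round(bias, 1), classify_bias(bias))
```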

First, examining the relative bias of factor loadings in the correctly specified models (see Table 4), we found that PIV tended to underestimate the true factor loading, whereas the other two estimators tended to overestimate it. Overall, PIV produced less bias (−4.4%) than ULS (10.6%) and DWLS (10.4%). As can be seen in Table 4, PIV yielded more accurate factor loading estimates in almost all conditions of low or moderate true factor loadings. When the factor loading was high, ULS and DWLS outperformed PIV. Besides these effects, relative bias decreased as the magnitude of the true factor loading increased (low, 13.6%; moderate, 3.3%; high, −0.2%) and as sample size increased (N = 100, 11.9%; N = 250, 6.5%; N = 500, 2.6%; N = 1,000, 1.1%). Importantly, the negative effect of a small sample size was stronger the lower the true factor loading. This result pattern held for all estimators alike, although it was more pronounced for ULS and DWLS. Finally, non-normality had negligible effects on the accuracy of factor loading estimates.

Table 4. Average relative bias (%) of factor loading estimates for correctly and incorrectly specified models depending on estimator, magnitude of true factor loading (λ) and sample size (N)

| λ | N | ULS, correct | ULS, incorrect | DWLS, correct | DWLS, incorrect | PIV, correct | PIV, incorrect |
|---|---|---|---|---|---|---|---|
| .40 | 100 | 66 | 45 | 63 | 44 | −31 | −49 |
| .40 | 250 | 29 | 26 | 31 | 28 | −6.1 | −13 |
| .40 | 500 | 11 | 12 | 11 | 11 | 0.3 | 0.4 |
| .40 | 1,000 | 4.9 | 5.2 | 4.9 | 5.1 | 1.6 | 3.9 |
| .55 | 100 | 20 | 15.9 | 17 | 16 | −6.0 | −10.2 |
| .55 | 250 | 4.3 | 6.0 | 4.1 | 5.5 | −1.9 | −3.3 |
| .55 | 500 | 1.8 | 1.9 | 1.8 | 1.7 | −0.8 | −2.5 |
| .55 | 1,000 | 0.4 | −0.1 | 0.4 | −0.1 | −0.7 | −2.8 |
| .70 | 100 | 2.9 | 6.2 | 2.4 | 4.5 | −3.9 | −3.2 |
| .70 | 250 | 0.8 | 2.1 | 0.6 | 1.4 | −1.9 | −1.3 |
| .70 | 500 | 0.1 | 1.0 | −0.1 | 0.4 | −1.5 | −1.4 |
| .70 | 1,000 | −0.1 | 0.3 | −0.3 | −0.1 | −1.1 | −1.2 |

Note. In the case of incorrect model specification, average relative bias refers to correctly specified factor loadings.

For the covariance estimate, we found that all three estimators yielded substantial biases, although PIV was more accurate (12.1%) than ULS (17.1%) and DWLS (19.5%). Also, the magnitude of the true factor loading (low, 19.5%; moderate, 14.7%; high, 14.6%) and the non-normality of the latent distributions (normal, −0.1%; moderately non-normal, 15.2%; extremely non-normal, 34.2%) had an influence on the quality of parameter estimates. As can be seen in Table 5, PIV produced high biases in all conditions of a low true factor loading. In the other two factor loading conditions, biases were greater the more non-normal the latent distributions were. For ULS and DWLS, the negative effect of the non-normality of the latent distributions held in all true factor loading conditions alike. Finally, biases were smaller the larger the size of the samples (N = 100, 19.9%; N = 250, 16.2%; N = 500, 14.8%; N = 1,000, 14.1%).

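Table 5. Average relative bias (%) of covariance estimates for correctly and incorrectly specified models depending on estimator, magnitude of true factor loading (λ) and non-normality (NN)

| λ | NN | ULS, correct | ULS, incorrect | DWLS, correct | DWLS, incorrect | PIV, correct | PIV, incorrect |
|---|---|---|---|---|---|---|---|
| .40 | 1 | −0.4 | 76 | 1.1 | 78 | −17 | 10.1 |
| .40 | 2 | 31 | 114 | 33 | 120 | 16 | 63 |
| .40 | 3 | 41 | 144 | 45 | 153 | 26 | 74 |
| .55 | 1 | 0.5 | 80 | 2.1 | 90 | −1.3 | 57 |
| .55 | 2 | 9.7 | 96 | 12 | 110 | 8.4 | 62 |
| .55 | 3 | 32 | 129 | 35 | 149 | 33 | 106 |
| .70 | 1 | 2.0 | 87 | 3.3 | 113 | 1.9 | 58 |
| .70 | 2 | 7.8 | 94 | 10.3 | 127 | 10.1 | 70 |
| .70 | 3 | 31 | 121 | 34 | 156 | 32 | 99 |

Note. Levels of factor non-normality are 1 = multivariate normal, 2 = moderately non-normal, and 3 = extremely non-normal.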

Concerning the incorrectly specified model (Tables 4 and 5), ULS as well as DWLS overestimated the loadings of the correctly specified items (9.1% and 8.8%, respectively), whereas PIV underestimated them (−6.9%). Furthermore, as in the case of the correctly specified model, relative biases were larger the smaller the true factor loading (low, 7.9%; moderate, 2.2%; high, 0.7%), and this effect was larger the smaller the size of the sample. When the true factor loading was moderate or high, all three estimators yielded acceptable biases at N = 250. Finally, non-normality of the latent distributions again had a negligible effect. For the falsely specified items (see Table 6), PIV produced more accurate estimates than ULS and DWLS. As can be seen, larger samples reduced the bias for ULS and DWLS but not for PIV, and PIV estimates got larger, and hence more biased, the greater the magnitude of the true factor loading. Finally, for the covariance estimate, all three estimators produced unacceptable average biases in almost all conditions. However, it is noteworthy that this bias was lower, although still unacceptable, when PIV was used (see Table 5).

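Table 6. Means of the factor loading estimates of the incorrectly specified indicators depending on estimator, magnitude of true factor loading (λ), and sample size (N)

| λ | N | ULS | DWLS | PIV |
|---|---|---|---|---|
| .40 | 100 | .84 | .83 | .08 |
| .40 | 250 | .67 | .66 | .12 |
| .40 | 500 | .50 | .49 | .14 |
| .40 | 1,000 | .40 | .40 | .13 |
| .55 | 100 | .85 | .79 | .18 |
| .55 | 250 | .65 | .61 | .19 |
| .55 | 500 | .52 | .53 | .18 |
| .55 | 1,000 | .48 | .49 | .18 |
| .70 | 100 | .81 | .76 | .23 |
| .70 | 250 | .63 | .65 | .23 |
| .70 | 500 | .59 | .63 | .23 |
| .70 | 1,000 | .58 | .62 | .22 |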

4.3. Relative bias of standard errors

We computed the relative bias of standard errors for the factor loadings and the covariance using $100 \times (\overline{SE} - SD_{\hat{\theta}})/SD_{\hat{\theta}}$. Here, $\overline{SE}$ is the average standard error of the parameter estimates across valid replications, and $SD_{\hat{\theta}}$ is the standard deviation of the respective parameter estimate.
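
A minimal sketch of this computation compares the mean of the estimated standard errors with the empirical standard deviation of the estimates across replications. Whether the empirical standard deviation uses n or n − 1 in the denominator is not stated in the text; the sample version (n − 1) is assumed here, and the input values are invented.

```python
from statistics import mean, stdev


def se_relative_bias(standard_errors, estimates):
    """Relative bias (%) of the average standard error with respect to the
    empirical standard deviation of the parameter estimates."""
    empirical_sd = stdev(estimates)   # assumption: sample SD (n - 1)
    return 100.0 * (mean(standard_errors) - empirical_sd) / empirical_sd


# Hypothetical results from three replications.
se_bias = se_relative_bias([0.11, 0.11, 0.11], [0.5, 0.6, 0.7])
print(round(se_bias, 1))
```

A positive value indicates that the estimated standard errors are, on average, larger than the observed sampling variability of the estimates.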

Concerning the standard errors of the factor loadings in the correctly specified factor model, PIV yielded worse estimates (10.7%) than ULS (−4.8%) and DWLS (−5.7%). As a closer inspection of Table 7 reveals, this result is mainly due to unacceptable and substantial standard error biases in conditions of a low true factor loading and a small sample size (i.e. N = 100 and N = 250); in all other conditions, PIV produced acceptable average standard errors. For ULS and DWLS, acceptable average standard errors emerged at N = 1,000 when the true factor loading was low and at N = 250 when the true factor loading was moderate. For the covariance, similar biases in standard errors occurred for all three estimators. Specifically, these were substantial and unacceptable when the true factor loading was low or moderate and sample size was small (N = 100 or N = 250). Finally, for both parameters, non-normality had no systematic effects on standard error estimates.

Table 7. Average relative bias (%) of standard errors depending on estimator, correctly or incorrectly specified models (MSP), magnitude of true factor loading (λ), and sample size (N)
MSP  λ    N      ULS                     DWLS                    PIV
                 λT      λF      ψ       λT      λF      ψ       λT      λF      ψ
C    .40  100    25              −23     22              −25     75              34
          250    −19             −15     −19             −16     12              14
          500    −15             −7.9    −19             −8.6    3.6             5.8
          1,000  −3.6            −3.5    −3.8            −3.8    4.7             4.3
     .55  100    −25             −13     −27             −17     9.0             6.5
          250    −5.6            −6.2    −6.3            −7.6    3.8             −0.3
          500    −1.8            −0.6    −2.1            −1.3    2.7             2.5
          1,000  1.6             0.4     1.3             0.1     3.4             2.1
     .70  100    −8.1            −5.5    −9.1            −9.9    5.6             −2.3
          250    −1.7            −2.8    −2.1            −4.4    2.8             −0.3
          500    −0.1            1.5     −0.1            0.1     2.9             1.9
          1,000  1.5             3.3     1.6             2.8     3.1             3.7
IC   .40  100    34      36      −8.7    31      29      −10.1   79      70      32
          250    7.0     −2.6    −6.4    6.5     0.1     −8.4    43      32      12
          500    −11     −35     −5.1    −13     −38     −6.0    6.2     13      5.1
          1,000  −3.8    −49     −1.8    −3.1    −49     −1.8    5.6     3.6     −4.2
     .55  100    5.9     −5.2    −8.0    −6.0    −18     −10.9   22      11      −2.1
          250    −9.2    −44     −6.7    −7.0    −49     −6.3    4.1     3.5     −13
          500    −3.0    −61     1.1     −1.1    −58     1.7     3.3     3.7     −13
          1,000  11.4    −48     6.1     8.6     −45     6.3     4.8     3.1     −14
     .70  100    −6.2    −41     −8.7    −22     −49     −10     4.5     3.7     −19
          250    2.4     −57     −5.5    −4.0    −45     −4.0    2.0     2.2     −23
          500    8.6     −47     6.3     −0.1    −41     2.7     1.3     2.3     −22
          1,000  10.1    −42     9.1     1.5     −39     3.4     2.1     4.1     −20
Note. λT and λF, respectively, refer to the average standard error bias concerning the correctly specified or falsely specified indicators. Levels of the factor model misspecification are C = correct, and IC = incorrect.

Regarding the misspecified factor model (Table 7), PIV produced acceptable standard errors for both factor loading estimates at N = 1,000 when the true factor loading was low and at N = 250 when the true factor loading was moderate. For ULS and DWLS standard error bias was unacceptable at N = 100 for the correctly specified indicators. In the case of the falsely specified items both estimators produced unacceptable standard errors in almost all conditions. Concerning the covariance, whereas standard errors were substantial or unacceptable in almost all conditions for PIV, they were acceptable in almost all conditions for the other two estimators.

4.4. Chi-square test statistic

Table 8 presents means and rejection rates (at α = .05) for the chi-square test statistics depending upon estimator, magnitude of true factor loading, and sample size (non-normality had no systematic effects). For correctly specified models, overall rejection rates were near the cut-off value for ULS (4.8%) and DWLS (5.7%), and slightly lower for PIV (3.1%). As can be seen in Table 8, the test statistic based on the PIV estimates tended to under-reject models in almost all conditions. For ULS and DWLS, by contrast, under-rejections only occurred in conditions with low factor loadings and small sample sizes. Regarding the misspecified factor models, the test statistic yielded reasonable rejection rates for all three estimators except when the true factor loading was small and the sample size was small (N = 100 or N = 250) or when it was moderate and sample size was small (N = 100).

Table 8. Mean of the chi-square test statistic and type 1 error rate (at α = 5%) depending on estimator, correctly or incorrectly specified models, magnitude of true factor loading (λ), and sample size (N)
λ    N      ULS                         DWLS                        PIV
            Correct       Incorrect     Correct       Incorrect     Correct       Incorrect
            χ²     5%     χ²     5%     χ²     5%     χ²     5%     χ²     5%     χ²     5%
.40  100    13.9   2.2    15.9   5.4    14.7   4.4    16.8   9.4    6.4    0.7    6.9    1.6
     250    16.6   3.9    25.0   35     16.9   4.7    25.7   38     9.8    1.2    12.9   18
     500    17.6   4.9    38.1   78     17.8   5.3    38.8   79     13.3   1.7    25.3   58
     1,000  17.9   5.0    63.2   99     18.0   5.2    64.1   99     15.9   2.8    53.5   96
.55  100    14.1   4.3    25.3   53     14.9   7.0    27.0   60     9.5    1.8    16.2   35
     250    15.8   4.0    53.8   98     16.2   5.0    55.6   98     13.4   2.1    44.4   92
     500    16.6   5.1    100    100    16.9   5.6    102    100    15.3   3.7    93.2   99
     1,000  16.8   5.4    192    100    17.0   5.6    194    100    15.8   4.4    188    100
.70  100    12.7   5.2    45.5   98     13.5   7.8    47.2   98     10.9   3.2    40.9   93
     250    13.9   5.0    112    100    14.5   5.5    112    100    12.9   4.1    112    100
     500    14.6   5.2    223    100    15.1   5.6    221    100    13.6   4.9    230    100
     1,000  14.9   6.2    441    100    15.4   6.3    434    100    13.9   6.1    463    100
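
The rejection rates reported in this section are simple proportions over replications. A minimal Python sketch of that bookkeeping follows; the function name `rejection_rate` and the toy p-values are hypothetical, and how each p-value is obtained (reference distribution and degrees of freedom) depends on the estimator and is not shown here.

```python
def rejection_rate(p_values, alpha=0.05):
    """Percentage of replications in which the test rejects the model at level alpha.

    p_values: one p-value of the chi-square test per valid replication
    (Hypothetical helper for illustration.)
    """
    if not p_values:
        raise ValueError("need at least one valid replication")
    rejected = sum(1 for p in p_values if p < alpha)
    return 100.0 * rejected / len(p_values)


# Toy p-values for illustration only (not taken from the study)
p_values = [0.01, 0.20, 0.03, 0.64, 0.47, 0.04, 0.88, 0.09, 0.51, 0.30]
rate = rejection_rate(p_values)
```

For a correctly specified model, a rate near 100 × α indicates a well-calibrated test and a clearly lower rate indicates under-rejection; for a misspecified model, the same quantity is the empirical power of the test.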

5. Discussion

The present simulation study was implemented to investigate the performance of the polychoric instrumental variable estimator in the estimation of a dichotomous confirmatory factor analysis. Specifically, we examined how (a) the factor model size, (b) the magnitude of the true factor loading, (c) the non-normality of the latent distributions, and (d) the size of the sample affected the performance of PIV. Also, its robustness to a model misspecification was tested. Finally, we compared the performance of PIV to two other well-established system-wide estimators, namely, ULS and DWLS.

The results showed, first, that PIV (like ULS and DWLS) provided accurate parameter estimates in most conditions when the model was correctly specified. In some of these conditions, for instance, when the true factor loading was small or moderate, PIV even provided more accurate estimates than ULS and DWLS. Furthermore, we found that the magnitude of the true factor loading had an impact on PIV factor loading estimates and PIV covariance estimates. Non-normality of the latent distributions, by contrast, only affected PIV covariance estimates. Importantly, both of these effects also arose for ULS and DWLS. Thus, overall, PIV produced estimates that were as accurate as ULS and DWLS, and it even outperformed ULS and DWLS in estimation accuracy in some conditions.

Second, when the model was misspecified, the quality of the parameter estimates for the correctly specified factor loadings was comparable across all three estimators. For the falsely specified factor loadings, however, PIV was more robust to the structural misspecification than ULS and DWLS. This result is to be expected given that ULS and DWLS use all the available information to estimate the model parameters, including not only the assignment of an indicator to a latent factor but also whether the latent factors are correlated. PIV, by contrast, employs only instrumental variables (IVs) – the other indicators – to compute the factor loadings. Also, given that the magnitude of an IV estimate depends on the strength of the relation between the IV and the predictor, the bias of the falsely specified factor loading is expected to increase the higher the correlations between the falsely assigned indicator and the other indicators used to compute the factor loading estimate. Note that this not only explains why the bias in parameter estimates got larger the higher the magnitude of the true factor loading, but also suggests that PIV will perform even better the more the correlation between the latent factors approaches zero. Finally, although PIV provided more accurate covariance estimates than ULS and DWLS, these were nevertheless unacceptable in almost all conditions. Again, this result is to be expected as the PIV estimator is most probably robust for the factor loadings but need not be robust for the covariance parameter (see Bollen & Maydeu-Olivares, 2007, for more information concerning the conditions on the robustness of variance–covariance parameters).
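
The IV logic invoked here can be illustrated with the textbook single-instrument ratio, in which the loading of a target indicator relative to the scaling indicator is the covariance of the instrument with the target divided by the covariance of the instrument with the scaling indicator. The Python sketch below uses model-implied correlations from a one-factor model with made-up standardized loadings; the function names and numbers are assumptions for illustration, and the actual PIV estimator works with estimated polychoric correlations and a 2SLS step that can combine several instruments.

```python
def implied_correlations(loadings):
    """Model-implied correlations for a one-factor model with standardized loadings."""
    k = len(loadings)
    return [[1.0 if i == j else loadings[i] * loadings[j] for j in range(k)]
            for i in range(k)]


def iv_loading(corr, target, scaling, instrument):
    """Single-instrument IV estimate of the target's loading relative to the scaling indicator."""
    return corr[instrument][target] / corr[instrument][scaling]


# Hypothetical loadings: indicator 0 is the scaling indicator
loadings = [0.7, 0.7, 0.55, 0.4]
corr = implied_correlations(loadings)

# Loading of indicator 3, using indicator 2 as the instrument
estimate = iv_loading(corr, target=3, scaling=0, instrument=2)
```

With population correlations the ratio recovers the loading of the target relative to the scaling indicator (here 0.4 / 0.7). The estimate involves only the correlations of the chosen instrument, which is why a misspecification elsewhere in the model need not contaminate it.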

The present findings are thus consistent with earlier results showing that PIV as well as MIIV are more robust against structural misspecifications than other system-wide estimators. Although these results are important, they may be of limited value in applied contexts, as it may be more crucial to locate the structural misspecification in applied work (Saris, Satorra, & van der Veld, 2009). We believe, however, that even for this question an application of the MIIV approach is suitable. Specifically, recent evidence suggests that the over-identification tests (see footnote 2) used to test the appropriateness of IVs are a good means for diagnosing the source of a misspecification in the case of structural equation models with continuous indicators (Kirby & Bollen, 2009). The idea behind this approach is that for MIIV, the IVs are unambiguously determined by the hypothesized model structure. When an over-identification test concerning a set of IVs thus fails, this implies that the model structure that suggested these IVs must be misspecified. We think that it is an interesting task for future research to investigate whether this approach is also suitable for SEMs with ordinal indicators.

Third, concerning standard error bias, the present simulation found that ULS and DWLS provided more accurate estimates than PIV when the model was correctly specified. This was, however, mainly due to biased standard errors in cases of a small true factor loading. A potential explanation of this might be that the correlations between the two indicators that served as IVs and the indicator for which the factor loading had to be estimated were too low. This weakness of the IV has been found to influence the asymptotic behaviour of IV estimators before (see Baum et al., 2003; Bound, Jaeger, & Baker, 1995).
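
The role of instrument strength can be made concrete with the standard large-sample variance of a single-instrument IV estimator in a simple linear model $y = \beta x + u$ with instrument $z$ and homoskedastic errors. This is a general textbook result under those assumptions, not a derivation specific to PIV, and the symbols are introduced here for illustration only.

```latex
\[
\operatorname{Avar}\bigl(\hat{\beta}_{IV}\bigr) \approx
\frac{\sigma_u^{2}}{N \, \sigma_x^{2} \, \rho_{xz}^{2}}
\]
```

Here $\sigma_u^2$ is the error variance, $\sigma_x^2$ the variance of the predictor, and $\rho_{xz}$ the correlation between predictor and instrument. The variance grows as $\rho_{xz}$ approaches zero. In a one-factor model, two indicators with standardized loadings of .40 correlate at only .16, so instruments of this kind are weak and a larger $N$ is needed to offset them.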

Finally, the test statistics for all three estimators performed well in the case of the misspecified factor model, although all three tended to under-reject the wrong factor model when the true factor loading was small and the size of the sample was smaller than N = 1,000. PIV also tended to under-reject models when the model was correctly specified. Here, better results emerged for the other two estimation methods.

In short, then, the present work shows that the PIV estimator is an interesting alternative method for the estimation of the parameters of a dichotomous confirmatory factor analysis. When the factor model was correctly specified, PIV yielded results similar to ULS and DWLS in most cases, and it was more robust to structural misspecifications than ULS and DWLS. However, the present simulation is just a first step in investigating the performance of the PIV estimator. It would be a worthwhile task for future research to examine PIV in other contexts, such as full structural equation models or models with exogenous observed covariates, to conclusively answer whether it is a true alternative to the more common system-wide estimators.

Acknowledgements

We thank Boris Egloff, Stanislav Kolenikov, Stefan Schmukle and three anonymous reviewers for very helpful comments on an earlier version of this article.

Footnotes
  • 1

    A number of earlier accounts used a two-stage least squares (2SLS)/IV approach to obtain model parameters. Madansky (1964) used IVs to estimate the factor loadings in an exploratory factor analysis. Hägglund (1982) and Jöreskog (1983) extended this approach to factor analysis models with uncorrelated errors. Finally, Jöreskog and Sörbom (1996) suggested a 2SLS estimator for the parameters of the structural part of an SEM: first, a 2SLS/IV approach is used to estimate the factor loadings of the measurement models together with the covariance matrix of the latent factors; second, this matrix is used by a 2SLS estimator to estimate the parameters of the structural part of the SEM. Bollen's (1996) 2SLS/IV approach differs from these earlier approaches in that it allows for correlated errors and provides the asymptotic covariance matrix of the model parameters. It further differs from Jöreskog and Sörbom's (1996) approach in that it estimates the parameters of the structural part of the SEM without estimating the measurement model first.

  • 2

    If there are as many IVs as there are scaling indicators, the equation is said to be exactly identified; if there are more IVs than scaling indicators, the equation is called over-identified. In the latter case, a validity test can be performed to determine whether a set of IVs is uncorrelated with the composite disturbance term (see Sargan, 1958, for the test statistic; or Baum et al., 2003, for variants of this test).

  • 3

    We ignored different conditions of model size (i.e. the number of indicators) in our simulation as other simulation research showed that the effect of model size on parameter estimates and standard errors is notably small unless the number of indicators is fewer than five (Flora & Curran, 2004; Forero et al., 2009).

  • 4

    To obtain starting values for the factor loading estimates, other software packages such as Mplus (Version 5 and higher), LISREL, or the Stata confa module (Kolenikov, 2009) use IV-based approaches. We did not use such an approach here as we wanted to compare the performance of the three estimators given the same initial (estimation) situation.

References

  • Angrist, J. D., Imbens, G. W., & Rubin, D. B. (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association, 91, 444–455. doi:10.2307/2291629
  • Babakus, E., Ferguson, C. E., & Jöreskog, K. G. (1987). The sensitivity of confirmatory maximum likelihood factor analysis to violations of measurement scale and distributional assumptions. Journal of Marketing Research, 24, 222–228. doi:10.2307/3151512
  • Baum, C. F., Schaffer, M. E., & Stillman, S. (2003). Instrumental variables and GMM: Estimation and testing. Stata Journal, 3, 1–31.
  • Beauducel, A., & Herzberg, P. Y. (2006). On the performance of maximum likelihood versus means and variance adjusted weighted least squares estimation in CFA. Structural Equation Modeling, 13, 186–203. doi:10.1207/s15328007sem1302_2
  • Bentler, P. M., & Savalei, V. (2010). Analysis of correlation structures: Current status and open problems. In S. Kolenikov, D. Steinley, & L. Thombs (Eds.), Statistics in the social sciences: Current methodological developments. Hoboken, NJ: Wiley.
  • Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.
  • Bollen, K. A. (1996). An alternative two stage least square (2SLS) estimator for latent variables. Psychometrika, 61, 109–121. doi:10.1007/BF02296961
  • Bollen, K. A., & Bauer, D. J. (2004). Automating the selection of model-implied instrumental variables. Sociological Methods and Research, 32, 425–452. doi:10.1177/0049124103260341
  • Bollen, K. A., Bauer, D. J., Christ, S. L., & Edwards, M. C. (2010). An overview of structural equation models and recent extensions. In S. Kolenikov, D. Steinley, & L. Thombs (Eds.), Statistics in the social sciences: Current methodological developments. Hoboken, NJ: Wiley.
  • Bollen, K. A., & Maydeu-Olivares, A. (2007). Polychoric instrumental variable (PIV) estimator for structural equations with categorical variables. Psychometrika, 72, 309–326. doi:10.1007/s11336-007-9006-3
  • Bound, J., Jaeger, D. A., & Baker, R. M. (1995). Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variables is weak. Journal of the American Statistical Association, 90, 443–450. doi:10.2307/2291055
  • Browne, M. W. (1984). Asymptotically distribution-free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37, 62–83.
  • Christoffersson, A. (1975). Factor analysis of dichotomized variables. Psychometrika, 40, 5–32. doi:10.1007/BF02291477
  • Christoffersson, A., & Gunsjö, A. (1996). A short note on the estimation of the asymptotic covariance matrix for polychoric correlations. Psychometrika, 61, 173–175. doi:10.1007/BF02296965
  • Dolan, C. V. (1994). Factor analysis of variables with 2, 3, 5 and 7 response categories: A comparison of categorical variable estimators using simulated data. British Journal of Mathematical and Statistical Psychology, 47, 309–326.
  • Fleishman, A. I. (1978). A method for simulating non-normal distributions. Psychometrika, 43, 521–532. doi:10.1007/BF02293811
  • Flora, D., & Curran, P. (2004). An empirical evaluation of alternative methods of estimation for confirmatory factor analysis with ordinal data. Psychological Methods, 9, 466–491. doi:10.1037/1082-989X.9.4.466
  • Forero, C. G., Maydeu-Olivares, A., & Gallardo-Pujol, D. (2009). Factor analysis with ordinal indicators: A Monte-Carlo study comparing DWLS and ULS estimation. Structural Equation Modeling, 16, 625–641. doi:10.1080/10705510903203573
  • Genz, A., Bretz, F., Miwa, T., Mi, X., Leisch, F., Scheipl, F., & Hothorn, T. (2011). mvtnorm: Multivariate normal and t distributions. R package version 0.9-9991.
  • Hägglund, G. (1982). Factor analysis by instrumental variables methods. Psychometrika, 47, 209–222. doi:10.1007/BF02296276
  • Hall, A. R. (2005). Generalized method of moments. Oxford: Oxford University Press.
  • Jöreskog, K. G. (1983). Factor analysis as an error-in-variables model. In H. Wainer & S. Messick (Eds.), Principles of modern psychological measurement. Hillsdale, NJ: Lawrence Erlbaum.
  • Jöreskog, K. G. (1994). On the estimation of polychoric correlations and their asymptotic covariance matrix. Psychometrika, 59, 381–389. doi:10.1007/BF02296131
  • Jöreskog, K. G., & Sörbom, D. (1996). LISREL 8: User's reference guide. Chicago: Scientific Software.
  • Kirby, J. B., & Bollen, K. A. (2009). Using instrumental variables to evaluate model specification in latent variable structural equation models. Sociological Methodology, 39, 327–355.
  • Kolenikov, S. (2009). Confirmatory factor analysis using confa. Stata Journal, 9, 329–373.
  • Madansky, A. (1964). Instrumental variables in factor analysis. Psychometrika, 29, 105–113. doi:10.1007/BF02289693
  • McDonald, R. P. (1999). Test theory: A unified approach. Mahwah, NJ: Lawrence Erlbaum Associates.
  • Morgan, S. L., & Winship, C. (2007). Counterfactuals and causal inference: Methods and principles for social research. Cambridge: Cambridge University Press.
  • Muthén, B. (1978). Contributions to factor analysis of dichotomous variables. Psychometrika, 43, 551–560. doi:10.1007/BF02293813
  • Muthén, B. (1984). A general structural equation model with dichotomous, ordered categorical and continuous latent variable indicators. Psychometrika, 49, 115–132. doi:10.1007/BF02294210
  • Muthén, B., & Asparouhov, T. (2002). Latent variable analysis with categorical outcomes: Multiple-group and growth modeling in Mplus. Retrieved January 06, 2012, from http://www.statmodel.com/download/webnotes/CatMGLong.pdf
  • Muthén, B., du Toit, S. H. C., & Spisic, D. (1997). Robust inference using weighted least squares and quadratic estimating equations in latent variable modeling with categorical and continuous outcomes. Unpublished manuscript.
  • Muthén, B., & Kaplan, D. (1992). A comparison of some methodologies for the factor-analysis of non-normal Likert variables: A note on the size of the model. British Journal of Mathematical and Statistical Psychology, 45, 19–30.
  • Muthén, B., & Satorra, A. (1995). Technical aspects of Muthén's LISCOMP approach to estimation of latent variable relations with a comprehensive measurement model. Psychometrika, 60, 489–503. doi:10.1007/BF02294325
  • Muthén, L., & Muthén, B. (1998). Mplus user guide (Version 3.1). Los Angeles: Muthén & Muthén.
  • Nestler, S., Back, M. D., & Egloff, B. (2011). Psychometrische Eigenschaften zweier Skalen zur Erfassung interindividueller Unterschiede in der Präferenz zum Alleinsein. Diagnostica, 57, 57–67. doi:10.1026/0012-1924/a000032
  • Olsson, U. (1979). Maximum likelihood estimation of the polychoric correlation coefficient. Psychometrika, 44, 443–460. doi:10.1007/BF02296207
  • R Development Core Team (2011). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.
  • Sargan, J. D. (1958). The estimation of economic relationships using instrumental variables. Econometrica, 26, 393–415.
  • Saris, W. E., Satorra, A., & van der Veld, W. (2009). Testing structural equation models or detection of misspecifications? Structural Equation Modeling, 16, 561–582. doi:10.1080/10705510903203433
  • Savalei, V., & Kolenikov, S. (2008). Constrained vs. unconstrained estimation in structural equation modeling. Psychological Methods, 13, 150–170. doi:10.1037/1082-989X.13.2.150
  • Vale, C. D., & Maurelli, V. A. (1983). Simulating multivariate nonnormal distributions. Psychometrika, 48, 465–471. doi:10.1007/BF02293687
  • Yu, C. Y. (2002). Evaluating cutoff criteria of model fit indices for latent variable models with binary and continuous outcomes (doctoral dissertation). University of California, Los Angeles.

Supporting Information

The following supporting information may be found in the online edition of this article:

Summary statistics and additional results of the simulation

R-Code for the three estimators compared in the simulation: DWLS.txt, ULS.txt, and PIV.txt

Examples: Sample.txt, Corr.txt, and ACM.txt

Filename                  Format  Size  Description
bmsp2044_sm_ACM.txt               5K    Supporting info item
bmsp2044_sm_Corr.txt              1K    Supporting info item
bmsp2044_sm_DWLS.txt              6K    Supporting info item
bmsp2044_sm_PIV.txt               8K    Supporting info item
bmsp2044_sm_Sample.txt            17K   Supporting info item
bmsp2044_sm_ULS.txt               5K    Supporting info item
bmsp2044_sm_suppmat1.xls          190K  Supporting info item

Please note: Wiley-Blackwell is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.