#### 5.1. Convergence failures and improper solutions

While the focus of this paper is on test statistics rather than parameter estimates, rates of non-convergence and improper solutions remain relevant. When comparing rejection rates of test statistics, particularly across different estimators, results may depend on how convergence failures and improper solutions are treated. Even within the same estimator, different test statistics may ‘win’ depending on whether improper solutions are included in or excluded from the comparison. We first describe the observed rates of convergence failures and improper solutions, and then address how they should be treated when comparing the test statistics.

Table 2 (left panel) shows the number of convergence failures for model 1. At *N*= 600, there are no convergence failures. Note that convergence rates differ only by the type of estimator (cat-ULS vs. cat-DWLS); within a given estimator, they are not affected by the choice of test statistic. Most convergence failures occur when the sample size is small and the data have few categories; convergence rates for binary data are the worst. However, the number of convergence failures is negligible in the S, MA-I, and MA-II conditions. The highest observed rate of convergence failures is 11.6%, corresponding to the cat-DWLS estimator. ULS almost always produces better convergence rates than DWLS; its highest convergence failure rate is 8%. Across all conditions, 94 more replications converged via ULS than via DWLS. The ULS fit function is simpler and thus may be computationally more stable under difficult conditions.

Table 2. Number of convergence failures (CF) and of convergence failures plus improper solutions (CF + IS) out of 1,000 replications in each cell of the design: model 1. At *N*= 600, no convergence failures occurred.

| Threshold condition | Categories | CF, *N*= 100: ULS | DWLS | CF, *N*= 150: ULS | DWLS | CF, *N*= 350: ULS | DWLS | CF, *N*= 600: ULS | DWLS | CF + IS, *N*= 100: ULS | DWLS | CF + IS, *N*= 150: ULS | DWLS | CF + IS, *N*= 350: ULS | DWLS | CF + IS, *N*= 600: ULS | DWLS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| S | 2 | 10 | 11 | 0 | 0 | 0 | 0 | 0 | 0 | 148 | 143 | 76 | 74 | 1 | 0 | 0 | 0 |
| | 3 | 3 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 46 | 47 | 23 | 19 | 0 | 0 | 0 | 0 |
| | 4 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 19 | 19 | 1 | 2 | 0 | 0 | 0 | 0 |
| | 5 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 20 | 21 | 7 | 7 | 0 | 0 | 0 | 0 |
| | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13 | 13 | 1 | 1 | 0 | 0 | 0 | 0 |
| | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 15 | 20 | 1 | 1 | 0 | 0 | 0 | 0 |
| MA-I | 2 | 13 | 16 | 1 | 1 | 0 | 0 | 0 | 0 | 173 | 171 | 74 | 71 | 5 | 5 | 0 | 0 |
| | 3 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 44 | 43 | 16 | 21 | 0 | 0 | 0 | 0 |
| | 4 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 38 | 37 | 9 | 7 | 0 | 0 | 0 | 0 |
| | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 12 | 13 | 4 | 3 | 0 | 0 | 0 | 0 |
| | 6 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 21 | 2 | 3 | 0 | 0 | 0 | 0 |
| | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 11 | 11 | 2 | 3 | 0 | 0 | 0 | 0 |
| MA-II | 2 | 9 | 6 | 2 | 2 | 0 | 0 | 0 | 0 | 192 | 185 | 77 | 82 | 6 | 6 | 2 | 2 |
| | 3 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 37 | 31 | 4 | 4 | 0 | 0 | 0 | 0 |
| | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 39 | 39 | 12 | 12 | 0 | 0 | 0 | 0 |
| | 5 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 29 | 25 | 4 | 4 | 0 | 0 | 0 | 0 |
| | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 18 | 18 | 1 | 1 | 0 | 0 | 0 | 0 |
| | 7 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 16 | 19 | 3 | 2 | 0 | 0 | 0 | 0 |
| EA-I | 2 | 80 | 116 | 48 | 69 | 2 | 1 | 0 | 0 | 463 | 457 | 330 | 316 | 53 | 43 | 9 | 6 |
| | 3 | 21 | 18 | 2 | 2 | 0 | 0 | 0 | 0 | 173 | 156 | 82 | 79 | 4 | 3 | 0 | 0 |
| | 4 | 7 | 6 | 1 | 0 | 0 | 0 | 0 | 0 | 88 | 76 | 20 | 21 | 0 | 1 | 0 | 0 |
| | 5 | 3 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 43 | 44 | 16 | 12 | 0 | 0 | 0 | 0 |
| | 6 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 32 | 28 | 4 | 7 | 0 | 0 | 0 | 0 |
| | 7 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 27 | 23 | 6 | 5 | 0 | 0 | 0 | 0 |
| EA-II | 2 | 78 | 79 | 31 | 66 | 1 | 1 | 0 | 0 | 357 | 281 | 235 | 285 | 84 | 114 | 23 | 27 |
| | 3 | 12 | 13 | 2 | 4 | 0 | 0 | 0 | 0 | 255 | 245 | 116 | 113 | 6 | 7 | 0 | 0 |
| | 4 | 5 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 96 | 97 | 41 | 41 | 2 | 1 | 0 | 0 |
| | 5 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 61 | 57 | 11 | 10 | 0 | 0 | 0 | 0 |
| | 6 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 36 | 33 | 6 | 7 | 0 | 0 | 0 | 0 |
| | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 26 | 6 | 8 | 0 | 0 | 0 | 0 |

Somewhat surprisingly, convergence rates in all conditions are much better for the larger model 2 (these data are not presented). It appears that a greater number of indicators per factor (10 rather than 5) increases the stability of estimation. The number of convergence failures is less than 5 out of 1,000 in all but three cells; in these three cells, all corresponding to the DWLS estimator, the number of failures is 7, 7, and 11. These values are too small to make any difference for the rejection rates.

The right panel of Table 2 shows the *total* number of convergence failures and improper solutions for model 1. That is, the numbers in the right panel include the convergence failures in the left panel plus any additional problematic cases. A replication was said to have an improper solution if at least one residual variance parameter took on a negative value (because the polychoric correlation matrix has 1s on the diagonal, this is equivalent to excluding cases where at least one factor loading was estimated to be greater than 1). Additionally, all replications were checked for outlying estimates of standard errors (SEs), namely SEs greater than 1. However, with the exception of a single replication in a single cell, all SE outliers occurred in replications that also contained improper solutions.
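The equivalence noted above can be made concrete with a small sketch. In the standardized parameterization used here (polychoric correlation matrix with 1s on the diagonal, unit-variance factors), each residual variance equals 1 minus the squared standardized loading, so checking loadings against 1 is the same as checking residual variances against 0. The function name and example loadings below are hypothetical, not taken from the study:

```python
def is_improper(std_loadings):
    """Flag a replication as an improper solution: with a polychoric
    correlation matrix (1s on the diagonal), each residual variance is
    1 - loading**2, so any |loading| > 1 implies a negative residual
    variance (a Heywood case)."""
    residual_variances = [1.0 - lam ** 2 for lam in std_loadings]
    return any(theta < 0 for theta in residual_variances)

# Hypothetical standardized loading estimates for two replications:
print(is_improper([0.7, 0.8, 1.04, 0.6]))  # prints True (1 - 1.04**2 < 0)
print(is_improper([0.7, 0.8, 0.9, 0.6]))   # prints False
```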

The pattern here is similar, in that the combination of a small sample size and binary data creates the most troublesome conditions in terms of the number of problematic cases. The most difficult conditions correspond to the two extreme asymmetry threshold conditions, where in some cells almost half of all replications produce improper solutions or fail to converge. It is now the case that cat-DWLS leads to slightly lower combined rates of convergence failures and improper solutions than does cat-ULS: a total of 91 more cases are considered acceptable under cat-DWLS than under cat-ULS. This advantage is mostly due to improper solutions in the two extreme asymmetry conditions.

The number of improper solutions is much smaller for the larger model 2 (these data are not presented). The total number of convergence failures and improper solutions across S, MA-I, and MA-II threshold conditions was between 0 and 4 for data with 3–7 categories, and between 0 and 2 for the largest three sample sizes for data with any number of categories. The only conditions with a greater number of problematic cases were at the intersection of 2-category data and *N*= 100, where the greatest number of improper solutions was 24. In the EA-I and EA-II threshold conditions, the greatest number of problematic cases was 129. In general, the number of problematic cases for model 2 was at least three times smaller than the corresponding number for model 1.

One way to summarize the results of Table 2 is as follows: ULS is more likely to produce *any* output, while DWLS is more likely to produce “clean” output. These findings replicate those of Forero *et al*. (2009), who found that cat-DWLS produced more cases that converged without outliers, and of Yang-Wallentin *et al*. (2010), who found that ULS converged more frequently. However, the difference among the methods in the number of acceptable cases, defined either way, is never greater than 6% of all cases, and is typically much smaller. It is not clear that one method should be preferred over the other on the basis of convergence rates and improper solutions alone.

In order to meaningfully compare Type I error rates for the five test statistics, a decision must be made about how to treat convergence failures and improper solutions in the computations of the Type I error rates. There is some disagreement among methodologists as to the best strategy. From a statistical point of view, Type I error rates are only meaningful if they are computed across *all* replications in a cell, that is, out of 1,000 cases. Conditioning the choice of replications to be kept in the analysis in any way ruins the statistical rationale for expecting a 5% rejection rate at α= .05. This is because exclusion criteria are typically correlated with the size of the test statistic itself. Some programs, including M*plus*, do not produce any output when a case fails to converge; it is thus impossible to use the inclusive strategy of evaluating rejection rates across all cases. Because researchers frequently interpret lack of convergence as indicative of poor model fit, another approach is to count non-converged cases as rejections of the model (Yuan & Hayashi, 2003). This strategy has the potential to produce strongly biased rejection rates in difficult conditions (e.g., small *N*, asymmetric threshold distributions), and it is not a very common strategy in practice. An intermediate strategy would be to simply exclude convergence failures from the analysis. We follow this strategy.^{2}
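The three strategies can be summarized in a short sketch; the function and strategy labels below are illustrative, not from the paper:

```python
def rejection_rate(results, strategy):
    """Compute a rejection rate under one of three treatments of
    convergence failures.  Each entry of `results` is one replication:
    'reject', 'retain', or 'fail' (a convergence failure)."""
    n_total = len(results)
    n_fail = sum(r == "fail" for r in results)
    n_reject = sum(r == "reject" for r in results)
    if strategy == "all":             # rate out of all replications
        return n_reject / n_total
    if strategy == "fail_as_reject":  # failures counted as rejections
        return (n_reject + n_fail) / n_total
    if strategy == "exclude":         # failures dropped (strategy used here)
        return n_reject / (n_total - n_fail)
    raise ValueError(strategy)

# A difficult cell: 1,000 replications, 80 convergence failures,
# 50 rejections among the converged cases (hypothetical counts).
cell = ["reject"] * 50 + ["fail"] * 80 + ["retain"] * 870
print(rejection_rate(cell, "exclude"))         # 50/920, about .054
print(rejection_rate(cell, "fail_as_reject"))  # 130/1000 = .13
```

The example illustrates why the choice matters: counting failures as rejections more than doubles the apparent Type I error rate in this cell, while excluding them leaves the rate close to nominal.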

The case of improper solutions is more complicated, and the decision has the potential to skew the results since many such cases were observed. Chen, Bollen, Paxton, Curran, & Kirby (2001) conducted a simulation study investigating the rate of improper solutions as a function of model misspecification and did not find a clear relationship, concluding that “researchers should not use negative error variance estimates as an indicator of model misspecification” (p. 501). Improper solutions are in fact to be expected in small samples and do not represent a statistical anomaly (Savalei & Kolenikov, 2008). Thus, unlike with convergence failures, replications with improper solutions probably should *not* be counted as cases where the model is rejected. In fact, because such cases typically produce full model output, one can simply include them in the study, which is the strategy employed here. We believe it would be statistically unwise to exclude them from the computation of rejection rates, because as much as 46% of all replications in some cells would have to be excluded. However, results were compared with and without the inclusion of improper solutions, and only minor differences were found (see also Chen *et al*., 2001). The largest of these differences are noted in this text.

#### 5.2. Type I error rates

Tables 3–8 present Type I error rates at α= .05 for data with 2 to 7 categories, respectively. Data for both models are included in each table. Rejection rates are based on all converged cases. The 95% confidence interval for the rejection rate when the population value is .05 runs from .0365 to .0635, based on 1,000 replications; rates outside this interval are statistically different from .05. Rejection rates in Tables 3–8 are printed in bold if they fall outside the bounds of Bradley's liberal criterion, which run from .025 to .075 (Bradley, 1978). In the few difficult conditions where virtually every cell is in bold, test statistics can still be compared on their absolute rejection rates – the extent of inflation matters, in that a rejection rate of 10% indicates better performance in difficult conditions than a rejection rate of 20%.
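The interval and criterion just described follow from standard formulas: the Monte Carlo confidence interval is the normal approximation to a binomial proportion, and Bradley's liberal criterion brackets the nominal α by a factor of one half. A quick check (nothing here is specific to this study beyond α = .05 and 1,000 replications):

```python
import math

def monte_carlo_ci(p, n_reps, z=1.96):
    """Normal-approximation 95% CI for an observed rejection rate
    when the true rate is p, over n_reps Monte Carlo replications."""
    half_width = z * math.sqrt(p * (1 - p) / n_reps)
    return p - half_width, p + half_width

lo, hi = monte_carlo_ci(0.05, 1000)
print(round(lo, 4), round(hi, 4))  # prints 0.0365 0.0635

# Bradley's (1978) liberal criterion: 0.5*alpha to 1.5*alpha.
alpha = 0.05
print(round(0.5 * alpha, 3), round(1.5 * alpha, 3))  # prints 0.025 0.075
```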

Table 3. Rejection rates of five test statistics at α= .05 when the number of categories is 2. The rates are out of the number of all converged cases. Statistics (1), (2), and (4) are cat-DWLS; statistics (3) and (5) are cat-ULS. Rates outside the interval (.0365, .0635) are statistically different from .05 (based on 1,000 replications); rates in bold additionally fall outside Bradley's liberal criterion (between .025 and .075).

| Threshold condition | Sample size, *N* | Model 1: (1) | (2) | (4) | (3) | (5) | Model 2: (1) | (2) | (4) | (3) | (5) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| S | 100 | **.090** | .047 | .051 | **.021** | **.024** | **.238** | .043 | .048 | **.012** | **.013** |
| | 150 | **.079** | .051 | .054 | .036 | .037 | **.131** | .034 | .035 | **.013** | **.013** |
| | 350 | .065 | .044 | .044 | .037 | .037 | **.096** | .042 | .042 | .028 | .030 |
| | 600 | .072 | .058 | .059 | .055 | .055 | **.077** | .046 | .047 | .037 | .039 |
| MA-I | 100 | **.105** | .063 | .064 | .026 | .027 | **.238** | .048 | .054 | **.016** | **.016** |
| | 150 | **.089** | .056 | .058 | .037 | .040 | **.175** | .047 | .050 | **.016** | **.020** |
| | 350 | .073 | .057 | .057 | .046 | .048 | **.101** | .036 | .037 | **.024** | .025 |
| | 600 | **.085** | .068 | .068 | .061 | .063 | **.095** | .051 | .053 | .040 | .043 |
| MA-II | 100 | **.099** | .057 | .059 | .031 | .033 | **.231** | .047 | .059 | **.005** | **.008** |
| | 150 | **.096** | .058 | .062 | .037 | .040 | **.181** | .052 | .054 | **.016** | **.018** |
| | 350 | .055 | .041 | .041 | .035 | .035 | **.101** | .048 | .049 | .029 | .034 |
| | 600 | .060 | .049 | .049 | .046 | .046 | **.087** | .062 | .063 | .048 | .050 |
| EA-I | 100 | **.390** | **.231** | **.244** | **.010** | **.013** | **.942** | **.709** | **.736** | **.003** | **.003** |
| | 150 | **.276** | **.207** | **.218** | .025 | .027 | **.768** | **.578** | **.587** | **.012** | **.016** |
| | 350 | **.075** | .051 | .053 | .042 | .042 | **.156** | .044 | .045 | .027 | .033 |
| | 600 | **.080** | .059 | .060 | .056 | .058 | **.106** | .050 | .051 | .044 | .046 |
| EA-II | 100 | **.457** | **.342** | **.355** | **.008** | **.010** | **.953** | **.835** | **.849** | **.001** | **.001** |
| | 150 | **.352** | **.284** | **.287** | .030 | .031 | **.922** | **.789** | **.796** | **.010** | **.012** |
| | 350 | **.108** | **.083** | **.084** | .055 | .056 | **.328** | **.218** | **.220** | .047 | .049 |
| | 600 | **.078** | .060 | .061 | .060 | .062 | **.161** | **.092** | **.092** | .063 | .065 |

Table 4. Rejection rates of five test statistics at α= .05 when the number of categories is 3. The rates are out of the number of all converged cases. Statistics (1), (2), and (4) are cat-DWLS; statistics (3) and (5) are cat-ULS. Rates outside the interval (.0365, .0635) are statistically different from .05; rates in bold additionally fall outside Bradley's liberal criterion (between .025 and .075).

| Threshold condition | Sample size, *N* | Model 1: (1) | (2) | (4) | (3) | (5) | Model 2: (1) | (2) | (4) | (3) | (5) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| S | 100 | **.103** | .059 | .062 | .025 | .027 | **.218** | .029 | .031 | **.005** | **.007** |
| | 150 | **.097** | .054 | .054 | .034 | .035 | **.169** | .029 | .035 | **.017** | **.017** |
| | 350 | **.080** | .058 | .059 | .044 | .048 | **.098** | .036 | .037 | .027 | .030 |
| | 600 | .061 | .048 | .049 | .039 | .040 | **.098** | .044 | .045 | .032 | .033 |
| MA-I | 100 | **.102** | .057 | .059 | .038 | .041 | **.229** | .039 | .042 | **.010** | **.012** |
| | 150 | **.103** | .070 | .074 | .054 | .054 | **.168** | .033 | .042 | **.017** | **.017** |
| | 350 | .068 | .047 | .047 | .039 | .039 | **.100** | .044 | .045 | .028 | .033 |
| | 600 | .069 | .050 | .051 | .049 | .050 | **.105** | .054 | .055 | .040 | .043 |
| MA-II | 100 | **.112** | .056 | .063 | .032 | .035 | **.243** | .046 | .051 | **.021** | .025 |
| | 150 | **.096** | .067 | .069 | .048 | .052 | **.145** | .033 | .036 | **.014** | **.015** |
| | 350 | .066 | .046 | .046 | .039 | .039 | **.121** | .047 | .050 | .037 | .037 |
| | 600 | **.082** | .066 | .066 | .059 | .059 | **.084** | .047 | .050 | .039 | .042 |
| EA-I | 100 | **.150** | **.082** | **.086** | .052 | .057 | **.433** | **.101** | **.112** | .031 | .033 |
| | 150 | **.116** | .069 | .071 | .044 | .045 | **.291** | .068 | .075 | **.024** | .026 |
| | 350 | **.090** | .067 | .068 | .050 | .054 | **.126** | .047 | .048 | .032 | .034 |
| | 600 | **.076** | .051 | .052 | .049 | .050 | **.107** | .052 | .055 | .042 | .044 |
| EA-II | 100 | **.178** | **.098** | **.106** | .054 | .059 | **.443** | **.138** | **.145** | .053 | .062 |
| | 150 | **.145** | **.098** | **.101** | .061 | .065 | **.271** | **.092** | **.095** | .043 | .048 |
| | 350 | .064 | .046 | .046 | .034 | .037 | **.147** | .072 | .074 | .052 | .054 |
| | 600 | .065 | .057 | .058 | .052 | .052 | **.105** | .052 | .053 | .045 | .045 |

Table 5. Rejection rates of five test statistics at α= .05 when the number of categories is 4. The rates are out of the number of all converged cases. Statistics (1), (2), and (4) are cat-DWLS; statistics (3) and (5) are cat-ULS. Rates outside the interval (.0365, .0635) are statistically different from .05; rates in bold additionally fall outside Bradley's liberal criterion (between .025 and .075).

| Threshold condition | Sample size, *N* | Model 1: (1) | (2) | (4) | (3) | (5) | Model 2: (1) | (2) | (4) | (3) | (5) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| S | 100 | **.156** | **.085** | **.089** | .051 | .054 | **.368** | **.081** | **.089** | **.019** | **.021** |
| | 150 | **.120** | .068 | .069 | .046 | .049 | **.225** | .072 | **.082** | .031 | .033 |
| | 350 | **.095** | .059 | .060 | .045 | .046 | **.144** | .058 | .061 | .041 | .043 |
| | 600 | .060 | .047 | .048 | .043 | .044 | **.109** | .045 | .046 | .039 | .039 |
| MA-I | 100 | **.155** | **.077** | **.081** | .050 | .053 | **.422** | **.112** | **.126** | .038 | .042 |
| | 150 | **.134** | **.084** | **.089** | .065 | .069 | **.281** | **.098** | **.105** | .041 | .045 |
| | 350 | **.094** | .068 | .072 | .060 | .061 | **.162** | .063 | .068 | .054 | .056 |
| | 600 | **.087** | .064 | .064 | .062 | .063 | **.111** | .048 | .048 | .036 | .038 |
| MA-II | 100 | **.173** | **.099** | **.106** | .061 | .065 | **.418** | **.127** | **.140** | .040 | .046 |
| | 150 | **.117** | .075 | **.076** | .054 | .056 | **.261** | .073 | **.083** | .036 | .036 |
| | 350 | **.078** | .057 | .059 | .043 | .047 | **.144** | .067 | .066 | .046 | .051 |
| | 600 | .068 | .051 | .051 | .046 | .047 | **.121** | .074 | .074 | .062 | .064 |
| EA-I | 100 | **.156** | **.077** | **.084** | .032 | .038 | **.366** | **.083** | **.098** | **.022** | .033 |
| | 150 | **.117** | **.076** | **.078** | .055 | .056 | **.248** | .074 | **.080** | .035 | .039 |
| | 350 | **.080** | .057 | .062 | .045 | .046 | **.142** | .050 | .051 | .037 | .040 |
| | 600 | **.091** | .061 | .061 | .057 | .059 | **.087** | .041 | .045 | .036 | .036 |
| EA-II | 100 | **.175** | **.091** | **.097** | .050 | .050 | **.377** | **.106** | **.115** | .030 | .036 |
| | 150 | **.121** | .069 | .071 | .051 | .052 | **.242** | **.081** | **.087** | .037 | .038 |
| | 350 | **.092** | .066 | .069 | .053 | .056 | **.123** | .056 | .058 | .038 | .040 |
| | 600 | .064 | .055 | .056 | .049 | .052 | **.103** | .050 | .053 | .039 | .040 |

Table 6. Rejection rates of five test statistics at α= .05 when the number of categories is 5. The rates are out of the number of all converged cases. Statistics (1), (2), and (4) are cat-DWLS; statistics (3) and (5) are cat-ULS. Rates outside the interval (.0365, .0635) are statistically different from .05; rates in bold additionally fall outside Bradley's liberal criterion (between .025 and .075).

| Threshold condition | Sample size, *N* | Model 1: (1) | (2) | (4) | (3) | (5) | Model 2: (1) | (2) | (4) | (3) | (5) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| S | 100 | **.205** | **.116** | **.124** | .067 | .074 | **.444** | **.120** | **.132** | .049 | .054 |
| | 150 | **.139** | **.095** | **.095** | .070 | .071 | **.292** | **.088** | **.093** | .037 | .038 |
| | 350 | **.104** | **.079** | **.080** | .065 | .067 | **.147** | .052 | .054 | .039 | .039 |
| | 600 | **.085** | .062 | .063 | .056 | .056 | **.132** | .052 | .054 | .043 | .044 |
| MA-I | 100 | **.228** | **.135** | **.140** | **.080** | **.081** | **.525** | **.172** | **.187** | **.082** | **.089** |
| | 150 | **.158** | **.100** | **.104** | .072 | .074 | **.360** | **.140** | **.145** | .072 | **.082** |
| | 350 | **.107** | .073 | .074 | .057 | .061 | **.162** | **.078** | **.081** | .054 | .055 |
| | 600 | **.095** | .074 | **.077** | .069 | .070 | **.114** | .040 | .043 | .036 | .037 |
| MA-II | 100 | **.225** | **.127** | **.134** | .068 | .073 | **.500** | **.160** | **.176** | .058 | .068 |
| | 150 | **.158** | **.112** | **.114** | **.080** | **.087** | **.348** | **.126** | **.134** | .064 | .070 |
| | 350 | **.077** | .054 | .055 | .041 | .044 | **.156** | .071 | .074 | .051 | .053 |
| | 600 | **.082** | .060 | .061 | .054 | .055 | **.125** | .050 | .052 | .041 | .041 |
| EA-I | 100 | **.137** | **.079** | **.081** | .052 | .053 | **.378** | **.098** | **.108** | .036 | .039 |
| | 150 | **.120** | **.077** | **.081** | .056 | .059 | **.272** | .071 | .074 | .033 | .042 |
| | 350 | **.088** | .062 | .062 | .058 | .058 | **.135** | .043 | .045 | .029 | .029 |
| | 600 | **.085** | .065 | .067 | .059 | .064 | **.115** | .048 | .051 | .037 | .039 |
| EA-II | 100 | **.172** | **.099** | **.103** | .059 | .062 | **.404** | **.107** | **.112** | .041 | .048 |
| | 150 | **.112** | .070 | .075 | .053 | .055 | **.274** | **.090** | **.093** | .049 | .054 |
| | 350 | **.093** | .062 | .064 | .058 | .060 | **.135** | .056 | .060 | .041 | .044 |
| | 600 | .069 | .057 | .057 | .055 | .055 | **.100** | .053 | .056 | .039 | .039 |

Table 7. Rejection rates of five test statistics at α= .05 when the number of categories is 6. The rates are out of the number of all converged cases. Statistics (1), (2), and (4) are cat-DWLS; statistics (3) and (5) are cat-ULS. Rates outside the interval (.0365, .0635) are statistically different from .05; rates in bold additionally fall outside Bradley's liberal criterion (between .025 and .075).

| Threshold condition | Sample size, *N* | Model 1: (1) | (2) | (4) | (3) | (5) | Model 2: (1) | (2) | (4) | (3) | (5) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| S | 100 | **.242** | **.138** | **.145** | **.091** | **.096** | **.559** | **.201** | **.220** | **.079** | **.087** |
| | 150 | **.166** | **.103** | **.108** | .068 | .068 | **.358** | **.128** | **.133** | .055 | .060 |
| | 350 | **.109** | .072 | .074 | .061 | .064 | **.187** | **.081** | **.085** | .062 | .067 |
| | 600 | **.077** | .064 | .064 | .055 | .056 | **.126** | .060 | .061 | .047 | .050 |
| MA-I | 100 | **.237** | **.155** | **.160** | **.093** | **.101** | **.563** | **.208** | **.224** | **.088** | **.097** |
| | 150 | **.172** | **.115** | **.116** | **.085** | **.088** | **.416** | **.182** | **.187** | **.092** | **.096** |
| | 350 | **.124** | **.090** | **.093** | .074 | **.077** | **.199** | **.092** | **.096** | .067 | .069 |
| | 600 | **.095** | .073 | .073 | .063 | .065 | **.132** | .064 | .067 | .053 | .055 |
| MA-II | 100 | **.239** | **.158** | **.162** | **.082** | **.088** | **.577** | **.236** | **.251** | **.096** | **.101** |
| | 150 | **.168** | **.114** | **.115** | **.079** | **.083** | **.348** | **.126** | **.134** | .064 | .070 |
| | 350 | **.112** | .075 | **.076** | .057 | .062 | **.179** | **.086** | **.090** | .065 | .065 |
| | 600 | **.081** | .068 | .071 | .052 | .055 | **.140** | .072 | .072 | .062 | .063 |
| EA-I | 100 | **.183** | **.101** | **.107** | .052 | .056 | **.435** | **.117** | **.128** | .049 | .054 |
| | 150 | **.141** | **.094** | **.097** | .072 | .074 | **.284** | **.079** | **.082** | .049 | .052 |
| | 350 | **.093** | .061 | .065 | .052 | .053 | **.159** | .063 | .065 | .054 | .056 |
| | 600 | **.079** | .067 | .067 | .062 | .063 | **.116** | .064 | .066 | .048 | .051 |
| EA-II | 100 | **.177** | **.110** | **.113** | .069 | .075 | **.428** | **.131** | **.145** | .059 | .066 |
| | 150 | **.139** | **.086** | **.087** | .071 | .072 | **.287** | **.097** | **.101** | .048 | .051 |
| | 350 | **.104** | **.081** | **.084** | .072 | .074 | **.146** | .067 | .071 | .052 | .053 |
| | 600 | **.078** | .061 | .062 | .055 | .056 | **.119** | .064 | .064 | .054 | .056 |

Table 8. Rejection rates of five test statistics at α= .05 when the number of categories is 7. The rates are out of the number of all converged cases. Statistics (1), (2), and (4) are cat-DWLS; statistics (3) and (5) are cat-ULS. Rates outside the interval (.0365, .0635) are statistically different from .05; rates in bold additionally fall outside Bradley's liberal criterion (between .025 and .075).

| Threshold condition | Sample size, *N* | Model 1: (1) | (2) | (4) | (3) | (5) | Model 2: (1) | (2) | (4) | (3) | (5) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| S | 100 | **.291** | **.193** | **.204** | **.126** | **.131** | **.665** | **.290** | **.315** | **.121** | **.134** |
| | 150 | **.193** | **.134** | **.138** | **.092** | **.095** | **.463** | **.190** | **.211** | **.095** | **.102** |
| | 350 | **.114** | **.079** | **.081** | .061 | .063 | **.199** | **.096** | **.097** | .070 | .073 |
| | 600 | **.104** | **.078** | **.082** | .073 | **.077** | **.152** | **.075** | **.076** | .060 | .061 |
| MA-I | 100 | **.261** | **.172** | **.177** | **.097** | **.100** | **.620** | **.252** | **.271** | **.098** | **.113** |
| | 150 | **.179** | **.127** | **.130** | **.090** | **.094** | **.429** | **.160** | **.170** | .071 | **.080** |
| | 350 | **.114** | **.089** | **.091** | **.078** | **.081** | **.213** | **.090** | **.093** | .065 | .068 |
| | 600 | **.097** | .072 | .074 | .067 | .070 | **.149** | **.084** | **.085** | .066 | .068 |
| MA-II | 100 | **.218** | **.154** | **.156** | **.094** | **.099** | **.593** | **.238** | **.261** | **.091** | **.103** |
| | 150 | **.174** | **.116** | **.120** | **.078** | **.082** | **.450** | **.185** | **.192** | **.097** | **.102** |
| | 350 | **.115** | **.077** | **.078** | .063 | .064 | **.262** | **.114** | **.121** | **.078** | **.083** |
| | 600 | **.090** | .066 | .067 | .060 | .061 | **.155** | **.080** | **.082** | .070 | .071 |
| EA-I | 100 | **.208** | **.129** | **.135** | **.081** | **.083** | **.534** | **.172** | **.185** | **.079** | **.083** |
| | 150 | **.144** | **.093** | **.093** | .072 | .072 | **.351** | **.107** | **.117** | .060 | .065 |
| | 350 | **.094** | .070 | .072 | .061 | .061 | **.165** | .065 | .068 | .051 | .054 |
| | 600 | **.080** | .063 | .063 | .058 | .059 | **.128** | .051 | .052 | .039 | .043 |
| EA-II | 100 | **.203** | **.126** | **.131** | **.079** | **.085** | **.521** | **.191** | **.208** | **.087** | **.098** |
| | 150 | **.146** | **.099** | **.107** | **.076** | **.077** | **.319** | **.098** | **.103** | .054 | .061 |
| | 350 | **.085** | .061 | .063 | .056 | .056 | **.179** | .074 | .074 | .053 | .058 |
| | 600 | **.080** | .049 | .050 | .045 | .046 | **.132** | .069 | .071 | .062 | .063 |

Across all numbers of categories (all tables), the original and the new versions of the mean- and variance-adjusted statistics perform very similarly for both estimation methods. The new versions exhibit slightly higher rejection rates. The cat-ULS mean- and variance-adjusted statistics (equations (3) and (5)) are particularly similar, with the maximum difference never exceeding 1% for any pair of cells corresponding to model 1, and with the maximum difference never exceeding 1.5% for any pair of cells corresponding to model 2. In the vast majority of conditions, the differences are much smaller. The cat-DWLS statistics (equations (2) and (4)) are also very similar but the differences are slightly larger. For model 1, the difference between statistics (2) and (4) exceeds 1% only in two cells across all tables. For model 2, the difference between statistics (2) and (4) exceeds 1% in many cells corresponding to the smallest sample size (*N*= 100), but it remains less than 2.5%. The largest differences occur for data with 7 categories. Thus, the original versions of the mean- and variance-adjusted statistics perform uniformly better, but the difference is typically small. The difference between old and new mean- and variance-adjusted statistics is not emphasized in the remainder of this section, and only the behaviour of the original mean- and variance-adjusted statistics (2) and (3) will be discussed.

Table 3 presents the rejection rates for binary data. Test statistics generally do best with symmetric (S) thresholds, followed by the moderate asymmetry (MA) conditions, followed by the extreme asymmetry (EA) conditions. The cat-DWLS mean-adjusted statistic *T*_{DWLS-M} (equation (1)) performs the worst, exhibiting inflated rejection rates across almost all conditions, particularly in small samples (*N*= 100 and 150) and in the EA conditions, where its rejection rates are abysmal, exceeding 20%; they are even higher for model 2. These rejection rates become somewhat smaller (by .013 to .035) when improper solutions are excluded, but this improvement is not very helpful (these data are not presented). The mean- and variance-adjusted statistics *T*_{DWLS-MV1} and *T*_{ULS-MV1} (equations (2) and (3), respectively) perform well in the S and both MA conditions, even in small samples. However, *T*_{ULS-MV1} tends to under-reject models somewhat in small samples, particularly for the larger model 2, and *T*_{DWLS-MV1} produces better rejection rates there. In the EA conditions, however, the performance of *T*_{DWLS-MV1} becomes extremely poor at the small sample sizes (*N*= 100 and 150). These rejection rates are up to 2.3% smaller when improper solutions are excluded, but again, this decrease is inconsequential (these data are not presented). The performance of *T*_{ULS-MV1} remains quite good even in the EA conditions, although this statistic continues to under-reject at smaller sample sizes, particularly with model 2. Overall, because under-rejection is typically considered less of a problem than over-rejection, it can be concluded that *T*_{ULS-MV1} outperforms *T*_{DWLS-MV1} with binary data, and that *T*_{DWLS-M} should not be used.

Table 4 presents the results for data with 3 categories. The patterns of results are generally similar to those for binary data. Test statistics again do best in S and MA conditions. The cat-DWLS mean-adjusted statistic *T*_{DWLS-M} again does not do well, particularly in the two smaller sample sizes. This statistic will not be discussed for the rest of this section. The mean- and variance-adjusted statistics *T*_{DWLS-MV1} and *T*_{ULS-MV1} perform well in S and both MA conditions. In the EA conditions, *T*_{DWLS-MV1} again exhibits inflated rejection rates in smaller sample sizes, but the extent of this over-rejection is not nearly as dramatic as it was with binary data. Interestingly, *T*_{ULS-MV1} performs best in the EA conditions, but in the S and MA conditions tends to under-reject in the smaller sample sizes. It is difficult to recommend one mean- and variance-adjusted statistic over the other from these data. There are virtually no differences in the results when improper solutions are excluded; only in two cells do the results change by more than 1%, and this change does not affect the conclusions. Removing improper solutions has virtually no effect on data with more than 3 categories, and will not be discussed further.

Table 5 presents the results for data with 4 categories. The main change in the pattern of the results is that, relative to the data with fewer categories, *T*_{DWLS-MV1} now performs worse, exhibiting inflated rejection rates, in S and MA conditions when the sample size is *N*= 100 or 150. However, relative to data with fewer categories, *T*_{DWLS-MV1} performs better in the two EA conditions. *T*_{ULS-MV1} performs better than *T*_{DWLS-MV1} in almost all conditions. It is worth noting that as the number of categories has increased from 2 to 4, the results for all test statistics have become less differentiated as a function of the threshold conditions. Thresholds matter less as the data approach continuity.

Table 6 presents the results for data with 5 categories. The main change in the pattern of results is that the rejection rates in the S and both MA threshold conditions are uniformly higher. Even *T*_{ULS-MV1}, which tended to under-reject models with fewer categories, now exhibits slightly inflated rejection rates, particularly in smaller samples. Its performance in the S and MA conditions is still better than that of *T*_{DWLS-MV1}, however. Additionally, in the EA conditions, *T*_{ULS-MV1} does very well, while *T*_{DWLS-MV1} does poorly in small samples. Overall, the performance of all statistics is now worse in the MA conditions than in the EA conditions. Table 7, which presents data for 6 categories, exhibits similar patterns, except that the performance of all statistics deteriorates slightly. This pattern continues in Table 8, which presents data for 7 categories. All test statistics over-reject at the smallest two sample sizes, but *T*_{ULS-MV1} does much better than *T*_{DWLS-MV1}. The performance with EA thresholds is slightly better than the performance with MA or S thresholds.

Overall, the two mean- and variance-adjusted statistics followed somewhat different patterns. The cat-DWLS statistic *T*_{DWLS-MV1} performed fairly well in S and the two MA conditions when the number of categories was 2 or 3, and then deteriorated for these conditions when the number of categories was 4–7. The cat-ULS statistic *T*_{ULS-MV1} performed well or under-rejected in the S and the MA conditions when the number of categories was 2–4. In the EA conditions, *T*_{DWLS-MV1} performed very poorly when the number of categories was 2, then showed increasing improvement as the number of categories increased from 3 to 4, then slowly began to deteriorate as the number of categories further increased from 5 to 7. In the EA conditions, *T*_{ULS-MV1} performed well with 3–7 categories, but under-rejected a little with 2 categories.

#### 5.3. Power

Table 9 presents selected power results for *T*_{ULS-MV1} and *T*_{DWLS-MV1}. Only the smallest two sample sizes are presented. Power results are not interpretable when Type I error is not controlled, because inflated Type I error will always lead to artificially high power. Similarly, extremely low Type I error rates can lead to artificially low power. Because, in many conditions, *T*_{DWLS-MV1} tended to exhibit inflated rejection rates (e.g., two-category data, EA thresholds, small samples), while *T*_{ULS-MV1} tended to exhibit rejection rates below nominal, the power comparison of the two statistics is not very meaningful. To get around this problem, Table 9 simply highlights any cell that exhibits power less than .9, and additionally shows in bold any cell that exhibits power less than .8. Given that a grossly misspecified model is fitted to data (a one-factor model is fitted to two-factor data with a factor correlation of .3), it is reasonable to wish that power be at least .8 in such a situation.
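The bookkeeping behind Table 9 can be made concrete with a short sketch. This is not the authors' code; it simply illustrates, under our own naming, how an empirical rejection rate for one cell of the design is computed from the p-values of the converged replications and then flagged against the .9 and .8 power thresholds used in the table.

```python
# Minimal sketch (our own helper names, not from the paper): compute the
# empirical power in one simulation cell and mark it the way Table 9 does.

def rejection_rate(p_values, alpha=0.05):
    """Proportion of converged replications whose p-value falls below alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)

def flag(power):
    """Mimic the table's marking: bold below .8, highlighted below .9."""
    if power < 0.8:
        return "bold"
    if power < 0.9:
        return "highlighted"
    return "ok"

# Hypothetical cell: 1,000 converged replications, 760 of which reject.
p_values = [0.01] * 760 + [0.50] * 240
power = rejection_rate(p_values)
print(power, flag(power))  # 0.76 bold
```

Note that the denominator is the number of *converged* cases, not the number of attempted replications, matching how the rejection rates in Table 9 are defined.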

Table 9. Power of the new mean- and variance-adjusted test statistics (equations (4) and (5)) at α= .05 at *N*= 100 and 150. Rejection rates are out of the number of all converged cases. Values less than .9 are in italics; values less than .8 are in bold.

| Threshold condition | Categories | Model 1 DWLS, *N*= 100 | Model 1 ULS, *N*= 100 | Model 1 DWLS, *N*= 150 | Model 1 ULS, *N*= 150 | Model 2 DWLS, *N*= 100 | Model 2 ULS, *N*= 100 | Model 2 DWLS, *N*= 150 | Model 2 ULS, *N*= 150 |
|---|---|---|---|---|---|---|---|---|---|
| S | 2 | **.532** | **.422** | **.763** | **.706** | *.881* | **.764** | .988 | .967 |
| | 3 | **.730** | **.622** | .938 | *.897* | .978 | .922 | .999 | .997 |
| | 4 | *.883* | *.827* | .981 | .969 | .997 | .992 | 1.000 | .999 |
| | 5 | .929 | *.889* | .997 | .989 | 1.000 | .997 | 1.000 | 1.000 |
| | 6 | .962 | .938 | .996 | .991 | 1.000 | 1.000 | 1.000 | 1.000 |
| | 7 | .971 | .944 | .998 | .997 | 1.000 | 1.000 | 1.000 | 1.000 |
| MA-I | 2 | **.479** | **.358** | **.693** | **.612** | *.857* | **.711** | .970 | .947 |
| | 3 | **.790** | **.726** | .948 | .928 | .988 | .961 | 1.000 | .997 |
| | 4 | *.867* | *.812* | .972 | .955 | .995 | .985 | 1.000 | 1.000 |
| | 5 | .919 | *.882* | .989 | .982 | 1.000 | .995 | 1.000 | 1.000 |
| | 6 | .955 | .916 | .995 | .987 | 1.000 | 1.000 | 1.000 | 1.000 |
| | 7 | .962 | .942 | .999 | .997 | 1.000 | 1.000 | 1.000 | 1.000 |
| MA-II | 2 | **.504** | **.396** | **.690** | **.634** | *.857* | **.723** | .974 | .949 |
| | 3 | **.782** | **.713** | .948 | .922 | .983 | .965 | .999 | .999 |
| | 4 | *.864* | *.815* | .973 | .955 | .998 | .995 | .999 | .999 |
| | 5 | .949 | .907 | .983 | .970 | .999 | .997 | 1.000 | 1.000 |
| | 6 | .941 | *.898* | .992 | .989 | 1.000 | 1.000 | 1.000 | 1.000 |
| | 7 | .966 | .941 | .997 | .995 | 1.000 | 1.000 | 1.000 | 1.000 |
| EA-I | 2 | **.400** | **.075** | **.444** | **.186** | .917 | **.135** | .917 | **.498** |
| | 3 | **.508** | **.378** | **.729** | **.631** | *.884* | **.758** | .974 | .950 |
| | 4 | **.713** | **.626** | *.889* | *.846* | .980 | .931 | 1.000 | .999 |
| | 5 | *.818* | **.747** | .952 | .932 | .986 | .970 | 1.000 | .999 |
| | 6 | *.888* | *.834* | .977 | .969 | 1.000 | 1.000 | 1.000 | 1.000 |
| | 7 | .905 | *.881* | .985 | .981 | 1.000 | 1.000 | 1.000 | 1.000 |
| EA-II | 2 | **.511** | **.040** | **.606** | **.162** | .956 | **.041** | .983 | **.388** |
| | 3 | **.536** | **.427** | **.720** | **.654** | .921 | *.826* | .981 | .955 |
| | 4 | **.703** | **.621** | *.884* | *.857* | .970 | .941 | .999 | .999 |
| | 5 | *.838* | **.786** | .948 | .927 | .994 | .984 | 1.000 | .999 |
| | 6 | *.882* | *.835* | .979 | .966 | 1.000 | .990 | 1.000 | 1.000 |
| | 7 | .925 | *.889* | .979 | .973 | 1.000 | 1.000 | 1.000 | 1.000 |

Table 9 reveals that power is much better for the larger model (model 2) than for the smaller model (model 1). When a one-factor model is fitted to the two-factor data with 10 indicators per factor (model 2), power is always greater than .8 for data with 4–7 categories. For the S and the two MA conditions, power is greater than .9 for data with 3–7 categories, and it is reasonably high even for data with 2 categories, never falling below .7. The problematic conditions are the EA conditions with binary data, particularly when *N*= 100. Here, power is extremely high for *T*_{DWLS-MV1} and extremely low for *T*_{ULS-MV1}. For instance, in the EA-II condition, power is .96 for *T*_{DWLS-MV1} and an abysmal .04 for *T*_{ULS-MV1}. A comparison to Type I error rates is necessary to reveal the uselessness of both statistics in this situation. Type I error rates in this condition are .835 for *T*_{DWLS-MV1} and .001 for *T*_{ULS-MV1} (see Table 2). Thus, *T*_{DWLS-MV1} tends to reject all models regardless of whether or not they are correct, and *T*_{ULS-MV1} tends to accept all models regardless of whether or not they are correct. Thus, a combination of binary data, small sample size, and extreme thresholds creates a situation where model evaluation is not possible using *any* test statistic.
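The diagnostic applied above — a rejection rate is only meaningful relative to the corresponding Type I error rate — can be expressed as a simple check. The sketch below is our own illustration (the helper name and the `slack` tolerance are assumptions, not from the paper): a statistic is uninformative both when its Type I error rate is far above the nominal α (it rejects everything) and when its power barely exceeds α (it accepts everything). The numeric cases reuse the EA-II, binary-data, *N*= 100 values quoted in the text; the "well-behaved" case is hypothetical.

```python
# Sketch (our own helper, hypothetical slack tolerance): a test statistic is
# informative only if it separates correct from misspecified models, i.e.
# its Type I error rate is near the nominal alpha AND its power clearly
# exceeds that alpha.

def is_informative(power, type1, alpha=0.05, slack=0.05):
    rejects_everything = type1 > alpha + slack  # e.g. T_DWLS-MV1: Type I = .835
    rejects_nothing = power < alpha + slack     # e.g. T_ULS-MV1: power = .041
    return not (rejects_everything or rejects_nothing)

# EA-II thresholds, binary data, N = 100 (values quoted in the text):
print(is_informative(power=0.956, type1=0.835))  # False: rejects all models
print(is_informative(power=0.041, type1=0.001))  # False: accepts all models
# A hypothetical well-behaved cell for contrast:
print(is_informative(power=0.967, type1=0.05))   # True
```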

When a one-factor model is fitted to the two-factor data with 5 indicators per factor (model 1), power is generally worse. In the S and the two MA conditions, power is greater than .8 for data with 4–7 categories. In the EA conditions, power for data with 4–7 categories is lower, falling as low as .62. Binary and 3-category data present the most problems for power. In the S and the two MA conditions, the two statistics have similar power in this situation. In the EA conditions, particularly with binary data, the test statistics again diverge, and both are useless: power for *T*_{DWLS-MV1} is high only because its Type I error rate is equally high, while power for *T*_{ULS-MV1} is as low as its Type I error rate. Overall, one cannot recommend one statistic over the other on the basis of power, because either both perform fairly well or, in the most difficult conditions, both fail.

Data for *N*= 350 are not presented. For model 2, power is at least .99 in all conditions and for both test statistics. For model 1, power is at least .99 for 3–7 categories across all conditions and for both test statistics. For binary data in the S and the MA conditions, power is at least .99. For binary data in the EA conditions, power is between .74 and .81. Data for *N*= 600 are also not presented. When *N*= 600, power is at least .99 for 3–7 categories, and at least .95 for binary data.