Abstract

  1. Top of page
  2. Abstract
  3. 1. Introduction
  4. 2. Factorial invariance in single-level confirmatory factor analysis
  5. 3. Multilevel confirmatory factor analysis
  6. 4. Multilevel factorial invariance
  7. 5.  Illustration
  8. 6. Summary and conclusion
  9. References
  10. Supporting Information

This paper presents a procedure to test factorial invariance in multilevel confirmatory factor analysis. When the group membership is at level 2, multilevel factorial invariance can be tested by a simple extension of the standard procedure. However, level-1 group membership raises problems that cannot be appropriately handled by the standard procedure, because the dependency between members of different level-1 groups is not appropriately taken into account. The procedure presented in this article provides a solution to this problem. This paper also shows that Muthén's maximum likelihood (MUML) estimation is a viable alternative to maximum likelihood estimation for testing multilevel factorial invariance across level-1 groups. Testing multilevel factorial invariance across level-2 groups and across level-1 groups is illustrated using empirical examples. A SAS macro and Mplus syntax are provided.

1. Introduction

With the development of multilevel structural equation modelling (Goldstein & McDonald, 1988; Lee, 1990; Longford & Muthén, 1992; Muthén, 1990, 1994; Muthén & Satorra, 1995), multilevel confirmatory factor analysis is increasingly used in behavioural and social sciences (e.g., Cheung, Leung, & Au, 2006; Reise, Ventura, Nuechterlein, & Kim, 2005; Zimprich, Perren, & Hornung, 2005). Multilevel confirmatory factor analysis provides an approach to estimating and evaluating measurement models with multilevel data. Like other measurement models, issues of measurement invariance potentially arise in multilevel measurement models. Measurement invariance concerns whether the relationship between an observed measure and underlying latent construct is the same across different groups (Mellenbergh, 1989; Meredith & Millsap, 1992; Millsap, 1997). When the same measure is collected from qualitatively distinct groups (e.g., males and females; public and private school students) or at different time points from the same individuals in repeated measures designs (e.g., first grade and third grade), it is critical to establish measurement invariance in order for comparisons of the constructs to be meaningful. The admissible comparison that may be made across groups depends critically on the level of measurement invariance that can be achieved (Widaman & Reise, 1997).

Within the structural equation modelling tradition, measurement invariance is established by testing a hierarchical series of models that impose increasingly strict constraints on the hypothesized confirmatory factor analysis (CFA) model. Factorial invariance in CFA is typically examined at four different levels: configural, weak, strong, and strict invariance (described below). For single-level CFA models, a standard procedure for testing factorial invariance has become well established (e.g., Cheung & Rensvold, 1999; Meredith, 1993; Reise, Widaman, & Pugh, 1993). Numerous applications testing factorial invariance can be found in a variety of areas of psychological (e.g., Atienza, Balaguer, & García-Merita, 2003; Dauphinee, Schau, & Stevens, 1997; Thill et al., 2003), cross-cultural (e.g., Ang et al., 2009; Steenkamp & Baumgartner, 1998), organizational (e.g., Schaufeli & Bakker, 2004; Torkzadeh, Koufteros, & Doll, 2005), health (e.g., Ang, Shen, & Monahan, 2008; Gregorich, 2006; Malcarne, Fernandez, & Flores, 2005), and educational research (e.g., Edwards & Oakland, 2006; Green-Demers, Legault, Pelletier, & Pelletier, 2008).

Factorial invariance in multilevel CFA requires examining invariance of parameters in the level-1 model and that in the level-2 model. Multilevel data have a hierarchical structure such that individual observations are nested within clusters. In multilevel modelling, level 1 indicates the lowest level in the nested structure, level 2 indicates the next level within which the level-1 observations are nested, and so on. Multilevel factorial invariance introduces additional complexities that are beyond a simple extension of the well-established procedures for testing factorial invariance in single-level CFA. First, the group membership may exist at level 1 or at level 2. To illustrate, consider a two-level model in which the data are collected from students nested within schools. The researcher may be interested in testing factorial invariance at the school level (level 2), for example, between public and private or between religious and non-religious schools. Alternatively, the researcher may be interested in testing factorial invariance at the student level (level 1), for example, between boys and girls or between first- and third-grade students. Second, when group membership is at the lower level (level 1, here students), the level-1 group membership intersects the clustered structure of multilevel data. In this case, a methodological challenge arises: how can factorial invariance be tested across level-1 groups without losing the capability of multilevel modelling to appropriately adjust for the potential dependency arising from the clustering in multilevel data?

Despite the development of multilevel structural equation modelling, testing factorial invariance in multilevel CFA across multiple groups has not been fully established in the literature. Only a few studies have considered factorial invariance in multilevel CFA across level-2 groups (Davidov, Dülmer, Schlüter, Schmidt, & Meuleman, 2012; Kim, Kwok, & Yoon, 2012; Muthén, Khoo, & Gustafsson, 1997). Multilevel factorial invariance across level-1 groups has rarely been addressed, except in one recent study by Jak, Oort, and Dolan (2013a). Jak et al. proposed a five-step procedure that tests both invariance across level-1 groups and invariance across level-2 groups. For testing invariance across level-1 groups, their five-step procedure uses the level-1 group membership as a covariate (referred to as a restricted factor analysis model in their paper), instead of taking a multiple group analysis approach.

The goal of this paper is to present and illustrate a procedure for testing factorial invariance in multilevel CFA models. The presented procedure takes a multiple group analysis approach. Therefore, it is within the same framework as the well-known standard procedure for testing factorial invariance in single-level CFA. Also, as will be shown later, testing invariance in factor loadings is straightforward in the procedure presented in this paper, whereas it is less straightforward in the restricted factor analysis model, particularly when the invariance is to be tested for a large number of factor loadings simultaneously (see Jak et al., 2013a, for more details about the restricted factor analysis model).

The procedure for testing multilevel factorial invariance across level-2 groups is parallel to the procedure for factorial invariance in single-level CFA. Testing multilevel factorial invariance across level-1 groups, however, raises methodological challenges. This paper identifies two challenges and presents solutions to them. For both cases, testing multilevel factorial invariance is illustrated using an empirical data set. Given limitations of space, the presentation is limited to the contexts that are most widely used in the literature: two-level CFA models and factorial invariance between two groups.

2. Factorial invariance in single-level confirmatory factor analysis

Let y be a vector of observed measures and η be a vector of underlying latent factors. The measurement model in CFA specifies a set of linear relations between observed measures and underlying latent constructs:

  • $y = \tau + \Lambda\eta + \varepsilon$  (1)

where τ is a vector of measurement intercepts, Λ is factor loading matrix, and ɛ is a vector of residuals. It is assumed that inline image and inline image. For the multiple group case, the measurement model can be written as

  • $y_k = \tau_k + \Lambda_k\eta_k + \varepsilon_k$  (2)

where the subscript k indicates group membership. Assuming that the latent factors are uncorrelated with the residuals, the mean and covariance structure of yk are reproduced respectively by

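  • $\mu_k = \tau_k + \Lambda_k\alpha_k$  (3)
  • $\Sigma_k = \Lambda_k\Psi_k\Lambda_k' + \Theta_k$  (4)

where $\alpha_k = E(\eta_k)$, $\Psi_k = \mathrm{Cov}(\eta_k)$, and $\Theta_k = \mathrm{Cov}(\varepsilon_k)$.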

Factorial invariance is typically tested at the following four levels (Widaman & Reise, 1997):

  1. configural invariance – the dimension and the pattern of zero and non-zero loadings in Λk are the same across groups;
  2. weak invariance – Λk is invariant across groups (Λk = Λ for all k);
  3. strong invariance – Λk and τk are invariant across groups (Λk = Λ and τk = τ for all k);
  4. strict invariance – Λk, τk, and Θk are invariant across groups (Λk = Λ, τk = τ, and Θk = Θ for all k).

Figure 1 illustrates linear relations between an observed variable and a latent factor at four levels of invariance between two groups. Solid lines represent the linear relation between an observed variable and a latent factor in group 1; dashed lines represent the linear relation in group 2.

Figure 1. Illustrated linear relations between observed variable and latent factor at four levels of factorial invariance between two groups. Crosses indicate group 1, circles indicate group 2. Solid lines show a linear relation in group 1; dashed lines show a linear relation in group 2. For strong and strict invariance, the two lines overlap.

With configural invariance, no parameter associated with latent factors is comparable across groups. When weak invariance holds, the relations between observed and latent variables in two groups are depicted by parallel but non-overlapping lines, as in Figure 1(b). In this case, the covariance matrix of latent factors Ψk is comparable across groups (directly if the same identification method is used; indirectly if different identification methods are used across groups), because the variance and covariance are based on the deviation from the mean. When strong invariance holds, the two lines exactly overlap each other, as in Figure 1(c). Under strong invariance, (3) and (4) are simplified to

  • $\mu_k = \tau + \Lambda\alpha_k$  (5)
  • $\Sigma_k = \Lambda\Psi_k\Lambda' + \Theta_k$  (6)

In this case, the means (αk) and covariance matrix (Ψk) of latent factors are comparable across groups. In (5), any difference in the means of observed scores y between groups is due to the difference in latent factor means αk, and therefore the means of observed variables are comparable across groups. But group differences in the variances and covariances of observed scores y are due not only to differences in Ψk but also to differences in Θk. In Figure 1(c), the two lines exactly overlap each other but group 1 (indicated by crosses) shows larger residual variance than group 2 (indicated by circles). Finally, when strict invariance holds (Figure 1(d)), the two lines exactly overlap each other and the residual variance is also the same in both groups. In this case, (6) is simplified to

  • $\Sigma_k = \Lambda\Psi_k\Lambda' + \Theta$  (7)

With strict invariance, the means (αk) and covariance matrix (Ψk) of latent factors are comparable across groups. As shown in (5) and (7), the group difference in both means and variances of observed scores is due to the group difference in means and variances of latent factors. In other words, all group differences on the observed scores are attributable to group difference on the latent factors. The means and variances of observed variables are comparable too.
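A small numerical sketch may make this concrete. It assumes a one-factor model, so that the implied mean of each observed variable is its intercept plus loading times factor mean, and the implied variance is the squared loading times factor variance plus residual variance. All parameter values and the helper name `implied_moments` are hypothetical and chosen only for illustration.

```python
def implied_moments(tau, lam, alpha, psi, theta):
    """Model-implied means and variances for a one-factor CFA model.

    tau, lam, theta: per-indicator intercepts, loadings, residual variances.
    alpha, psi: factor mean and factor variance (scalars).
    """
    means = [t + l * alpha for t, l in zip(tau, lam)]
    variances = [l * l * psi + th for l, th in zip(lam, theta)]
    return means, variances


# Hypothetical measurement parameters, identical in both groups
# (strict invariance: same loadings, intercepts, residual variances).
tau = [2.0, 1.5, 1.8]
lam = [1.0, 0.8, 1.2]
theta = [0.5, 0.6, 0.4]

# The groups differ only in the latent factor mean and variance.
means_1, vars_1 = implied_moments(tau, lam, alpha=0.0, psi=1.0, theta=theta)
means_2, vars_2 = implied_moments(tau, lam, alpha=0.5, psi=1.5, theta=theta)

mean_diff = [m2 - m1 for m1, m2 in zip(means_1, means_2)]
var_diff = [v2 - v1 for v1, v2 in zip(vars_1, vars_2)]

print("difference in observed means:    ", mean_diff)
print("difference in observed variances:", var_diff)
```

Under these assumed values, each observed mean difference equals the loading times the difference in factor means, and each observed variance difference equals the squared loading times the difference in factor variances. If the residual variances were allowed to differ between groups (strong but not strict invariance), the variance differences would no longer reflect the latent factor alone.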

3. Multilevel confirmatory factor analysis

Throughout this paper, I use individual to refer to the level-1 unit and cluster to refer to the level-2 unit in multilevel data. Group indicates the group membership across which the invariance is tested. Suppose that the data are from N individuals clustered within J clusters. The clusters are a simple random sample from a population of clusters, and the individuals are a simple random sample within each cluster. The number of individuals in the jth cluster is nj, where $N = \sum_{j=1}^{J} n_j$. For the balanced case, nj = n for all j. Let yij denote a data vector for individual i in cluster j. In multilevel CFA, the data vector yij is decomposed into two latent random components reflecting two sources of random variation in multilevel data: between-cluster random components (yBj) and within-cluster, between-individual random components (yWij):

  • $y_{ij} = y_{Bj} + y_{Wij}$  (8)

where inline image and inline image. Note that yBj and yWij are latent (i.e., not directly observed) components. All level-1 variables are subject to implicit, model-based decomposition in multilevel CFA.

The two-level CFA model specifies linear relations between yBj and underlying latent factors ηBj at level 2, and linear relations between yWij and underlying latent factors ηWij at level 1:

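  • $y_{Bj} = \tau_B + \Lambda_B\eta_{Bj} + \varepsilon_{Bj}$, $\quad y_{Wij} = \Lambda_W\eta_{Wij} + \varepsilon_{Wij}$  (9)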

where inline image, inline image, inline image, inline image, and inline image.The two-level CFA model shown in (9) assumes the following:(a) inline image, therefore inline image(b) ΣWj = ΣW for all j;(c) inline image

In (9), the level-1 parameters (ΛW, ΨW, ΘW) do not have subscript j because the within-cluster covariance structure is assumed to be homogeneous across clusters (assumption (b)).1 With these assumptions, the mean and covariance structure of yij are reproduced by (10) and (11), respectively:

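  • $\mu = \tau_B + \Lambda_B\alpha_B$  (10)
  • $\Sigma = \Sigma_B + \Sigma_W = (\Lambda_B\Psi_B\Lambda_B' + \Theta_B) + (\Lambda_W\Psi_W\Lambda_W' + \Theta_W)$  (11)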

In two-level CFA, the mean structure is captured at level 2 and the level-1 model has no mean structure, as shown in (9) and (10).
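As a rough illustration of how the two levels combine, the sketch below computes model-implied between-cluster and within-cluster variances for a one-factor model at each level and the resulting intraclass correlation (between-cluster variance divided by total variance). The one-factor structure, the parameter values, and the helper name `level_variances` are assumptions made only for this example.

```python
def level_variances(lam, psi, theta):
    """Implied variance of each indicator at one level: lambda^2 * psi + theta."""
    return [l * l * psi + th for l, th in zip(lam, theta)]


# Hypothetical level-2 (between-cluster) parameters
lam_b, psi_b, theta_b = [1.0, 1.0], 0.05, [0.01, 0.02]
# Hypothetical level-1 (within-cluster) parameters
lam_w, psi_w, theta_w = [1.0, 0.9], 0.40, [0.30, 0.35]

between = level_variances(lam_b, psi_b, theta_b)
within = level_variances(lam_w, psi_w, theta_w)

# Total variance is the sum of the two independent components;
# the intraclass correlation is the between-cluster share of it.
total = [b + w for b, w in zip(between, within)]
icc = [b / t for b, t in zip(between, total)]

for k, (b, w, r) in enumerate(zip(between, within, icc), start=1):
    print(f"indicator {k}: between={b:.3f} within={w:.3f} ICC={r:.3f}")
```

Because the between and within components are assumed independent, their variances simply add, and only the between-cluster part carries the mean structure.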

4. Multilevel factorial invariance

I now consider establishing factorial invariance in multilevel CFA, first when the group membership is at level 2 and second when the group membership is at level 1.

4.1 Level-2 group membership

The procedure for testing multilevel factorial invariance across level-2 groups is parallel to the standard procedure for testing single-level factorial invariance. Because the group membership is at level 2, the data can be separated by group membership without altering the clustering in multilevel data. With level-2 group membership (k = 1, …, K), the two-level CFA model can be written as

  • display math(12)

where the subscript (k) indicates the level-2 group membership, inline image, inline image,inline image, inline image, and inline image

Assuming multivariate normality, the maximum likelihood (ML) solution can be obtained by the following fitting function:

  • display math(13)

where inline image is the number of clusters in group k, θ is a vector of parameters, inline image, inline image in which inline image, inline image is the total number of individuals in group k, inline image, and inline image.

All four levels of invariance are testable in the level-2 model: configural, weak (ΛB(k) = ΛB), strong (ΛB(k) = ΛB, τB(k) = τB), and strict (ΛB(k) = ΛB, τB(k) = τB, ΘB(k) = ΘB). At level 1, weak invariance (ΛW(k) = ΛW) is testable, but strong invariance is not because the level-1 model has no mean structure. A model testing invariance in ΘW(k) between level-2 groups can also be specified and tested.

4.2 Level-1 group membership

Suppose the standard procedure is adopted to test multilevel factorial invariance across level-1 groups (e.g., boys vs. girls in typical mixed gender schools that have both male and female students). The standard procedure first separates the data into level-1 groups. Then multilevel CFA models are specified in each group and equality of parameters is tested across the groups. In this case, the individuals are separated into groups within each cluster because the level-1 group membership intersects the level-2 cluster. Therefore the decomposition of level-1 variables occurs separately in each level-1 group:

  • $y_{ij|k} = y_{Bj|k} + y_{Wij|k}$  (14)

where the subscript |k indicates that the decomposition occurs separately within each level-1 group. The between-cluster component yBj|k in (14) is not constant for all individuals within the same cluster (e.g., the between components for boys are different from the between components for girls within the same school). The decomposition shown in (14) fails to capture the dependency between members of different level-1 groups who belong to the same cluster. In order to maintain the capability of multilevel modelling to take dependency due to the clustered structure into account, the decomposition should follow (8), not (14).

When the group membership is at level 1, it is critical that the decomposition of the level-1 variables is not conditional on the level-1 group membership. Therefore the decomposition must occur before the data are separated into level-1 groups:

  • $y_{ij(k)} = y_{Bj(k)} + y_{Wij(k)}$  (15)

where the subscript (k) within parentheses indicates that the data are separated after the level-1 variables are decomposed. The between-cluster component is constant for all individuals regardless of level-1 group membership as long as they belong to the same cluster (i.e., inline image for all k). The level-2 model is equivalent regardless of level-1 group membership.
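The difference between the two decompositions can be sketched with observed cluster means standing in for the latent between-cluster components. This is only an analogy: in multilevel CFA the decomposition is model-based, not computed from sample means. The toy data and variable names below are hypothetical.

```python
from statistics import mean

# Hypothetical scores: (cluster, level-1 group, score)
data = [
    ("s1", "boy", 3.0), ("s1", "boy", 4.0), ("s1", "girl", 1.0), ("s1", "girl", 2.0),
    ("s2", "boy", 5.0), ("s2", "girl", 3.0), ("s2", "girl", 4.0),
]
groups = ("boy", "girl")


def cluster_means(rows):
    """Mean score per cluster for the given rows."""
    by_cluster = {}
    for cluster, _, score in rows:
        by_cluster.setdefault(cluster, []).append(score)
    return {c: mean(scores) for c, scores in by_cluster.items()}


# Decompose first, then split: one between component per cluster,
# shared by boys and girls in that cluster.
shared = cluster_means(data)

# Split first, then decompose: a separate between component per
# cluster within each group, so boys and girls in the same school differ.
split = {g: cluster_means([r for r in data if r[1] == g]) for g in groups}

# Within-cluster deviations from the shared component, by group.
within = {g: [s - shared[c] for c, grp, s in data if grp == g] for g in groups}
within_means = {g: mean(devs) for g, devs in within.items()}

print("shared between components:", shared)
print("group-specific components:", split)
print("mean within deviation by group:", within_means)
```

With the shared decomposition, the within-cluster deviations no longer average to zero inside each level-1 group, which is why a level-1 mean structure becomes necessary once the data are separated after decomposition.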

In (15), inline image in each level-1 group is not necessarily zero. The level-1 model needs mean structure to represent the relative difference in means between level-1 groups. The two-level CFA model with multiple level-1 groups can be written as

  • display math(16)

where inline image, inline image, inline image, inline image, inline image, and inline image. In the level-1 model, all four levels of invariance are testable: configural, weak (inline image), strong (inline image,inline image), and strict (inline image,inline image,inline image).The level-2 model is equivalent regardless of level-1 group membership.

Another methodological challenge in testing multilevel factorial invariance across level-1 groups is that an appropriate effective sample size should be used for the level-2 model. In (15) the effective sample size is J for all inline image across all level-1 groups, and the effective sample size for inline image in each level-1 group is smaller than J. In the simplest case, suppose that the school sizes are equal across all J schools and there are equal numbers of boys and girls in each school. The effective sample size would be 0.5J for inline image and 0.5J for inline image so that the effective sample size for the between-cluster model (which is equivalent between boys and girls) would be J. For a general case, the effective level-2 sample size can be obtained using weights that are based on the relative sizes of level-1 groups in each cluster. The weight in level-1 group k is obtained by

  • $w_{j(k)} = n_{j(k)} / n_j$  (17)

where $n_{j(k)}$ is the number of individuals in level-1 group k in cluster j and nj is the total number of individuals in cluster j. The effective level-2 sample size for each level-1 group is obtained by summing the weights over clusters, $\sum_{j=1}^{J} w_{j(k)}$. Because the weights within each cluster sum to 1, the sum of the effective level-2 sample sizes across all level-1 groups is always J.
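The weighting can be sketched in a few lines of Python, assuming that the weight for a level-1 group in a cluster is that group's share of the cluster's members, and that the effective level-2 sample size for a group is the sum of its weights over clusters. The function name and the cluster counts are hypothetical.

```python
def effective_level2_sizes(counts):
    """Effective level-2 sample size per level-1 group.

    counts: one dict per cluster, mapping group label -> number of
    individuals from that group in the cluster.
    """
    sizes = {}
    for cluster in counts:
        n_j = sum(cluster.values())          # total cluster size
        for group, n_jk in cluster.items():
            weight = n_jk / n_j              # group's share of the cluster
            sizes[group] = sizes.get(group, 0.0) + weight
    return sizes


# Hypothetical composition of four schools; the last school has girls only.
counts = [
    {"boy": 10, "girl": 10},
    {"boy": 5, "girl": 15},
    {"boy": 12, "girl": 8},
    {"girl": 20},
]

sizes = effective_level2_sizes(counts)
print(sizes)
print("sum across groups:", sum(sizes.values()), "clusters:", len(counts))
```

In this made-up example the boys contribute an effective level-2 sample size of about 1.35 and the girls about 2.65, which together recover the four schools.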

Assuming multivariate normality, the ML fitting function for two-level CFA for multiple level-1 groups can be written as2

  • display math(18)

where θ is a vector of parameters, inline image, inline image, inline image in which inline image is the number of individuals in cluster j in group k, and inline image. Note that (18) includes an additional term for within-cluster mean structure inline image, which did not appear in (13). Also in (18), the between-cluster model [inline image, inline image] does not depend on the level-1 group membership, that is, there is no subscript (k).

4.3 MUML estimation for testing multilevel factorial invariance

To illustrate testing multilevel factorial invariance across level-1 groups, I use Muthén's maximum likelihood (MUML: Muthén, 1989, 1990) estimation via a manual set-up for two reasons. First, MUML serves as a better vehicle for a didactic presentation, showing the decomposition of yij more explicitly. Second, there is currently no software package available to obtain ML solutions using the ML fitting function shown in (18). When the cluster sizes are equal for all clusters, the MUML estimates are equivalent to ML estimates. When the cluster sizes are not equal (i.e., unbalanced cluster sizes), MUML provides an approximated solution. The performance of MUML approximation theoretically depends on the level-2 sample size (i.e., number of clusters) and the variability of cluster sizes (Yuan & Hayashi, 2005). It has been empirically shown that MUML provides a good approximation when the number of clusters is 100 or larger (Hox, 1993; Hox & Maas, 2001).

MUML via multiple group analysis requires means of observed variables (inline image) weighted by the square root of the scaling parameter (c, described below), between-cluster covariance matrix (SB), and pooled within-cluster (SPW) covariance matrix as input data:

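  • $\bar{y} = N^{-1}\sum_{j=1}^{J}\sum_{i=1}^{n_j} y_{ij}$  (19)
  • $S_B = (J-1)^{-1}\sum_{j=1}^{J} n_j(\bar{y}_j - \bar{y})(\bar{y}_j - \bar{y})'$  (20)
  • $S_{PW} = (N-J)^{-1}\sum_{j=1}^{J}\sum_{i=1}^{n_j}(y_{ij} - \bar{y}_j)(y_{ij} - \bar{y}_j)'$  (21)

where $\bar{y}_j$ is the sample mean vector in cluster j.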

SB is an unbiased estimator of the weighted composite of the population within and between covariance matrices (ΣW + cΣB), where c is a scaling parameter, defined by $c = [N^2 - \sum_{j=1}^{J} n_j^2]/[N(J-1)]$, or c = n for balanced cases. SPW is an unbiased estimator of the population within covariance matrix (ΣW). The ‘trick’ in MUML via manual set-up is to use a multiple group analysis of single-level CFA models with two ‘groups’ (Mgroup hereafter).3 In the first Mgroup, the between-cluster mean structure and the composite covariance structure ΣW + cΣB are fitted to the weighted means and SB. In the second Mgroup, the within-cluster covariance structure ΣW is fitted to SPW. The mean structure in the second Mgroup is zero. The effective sample sizes are J and N - J for Mgroup 1 and Mgroup 2, respectively. The within-cluster covariance structure ΣW is constrained to be equal between the two Mgroups. The input means are weighted by the square root of c so that the mean and covariance structure for the between-cluster model are on the same scale. To extend the MUML approach to testing factorial invariance in two-level CFA, a multiple group analysis needs to be set up for twice as many Mgroups as the number of groups.
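The input statistics for the single-population case can be sketched as follows. The sketch assumes the commonly used MUML definitions: a between-cluster covariance matrix built from cluster-size-weighted deviations of cluster means from the grand mean with divisor J - 1, a pooled within-cluster covariance matrix with divisor N - J, and the scaling parameter c = [N² - Σ nj²] / [N(J - 1)]. The function name, helper names, and toy data are hypothetical, and the code is not the SAS macro that accompanies the paper.

```python
from math import sqrt


def outer(u, v):
    """Outer product of two vectors as a list of rows."""
    return [[a * b for b in v] for a in u]


def add_scaled(acc, mat, weight):
    """Add weight * mat to acc in place."""
    for r, row in enumerate(mat):
        for c, val in enumerate(row):
            acc[r][c] += weight * val


def muml_inputs(clusters):
    """clusters: list of clusters, each a list of equal-length score vectors."""
    p = len(clusters[0][0])
    J = len(clusters)
    sizes = [len(cl) for cl in clusters]
    N = sum(sizes)
    grand = [sum(y[v] for cl in clusters for y in cl) / N for v in range(p)]
    s_b = [[0.0] * p for _ in range(p)]
    s_pw = [[0.0] * p for _ in range(p)]
    for cl in clusters:
        n_j = len(cl)
        cm = [sum(y[v] for y in cl) / n_j for v in range(p)]   # cluster mean
        dev_b = [m - g for m, g in zip(cm, grand)]
        add_scaled(s_b, outer(dev_b, dev_b), n_j / (J - 1))
        for y in cl:
            dev_w = [a - m for a, m in zip(y, cm)]
            add_scaled(s_pw, outer(dev_w, dev_w), 1.0 / (N - J))
    c = (N * N - sum(n * n for n in sizes)) / (N * (J - 1))
    weighted_means = [sqrt(c) * g for g in grand]
    return weighted_means, s_b, s_pw, c


# Hypothetical balanced data: three clusters, two individuals each, two variables.
clusters = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[2.0, 2.0], [4.0, 6.0]],
    [[5.0, 5.0], [7.0, 9.0]],
]
weighted_means, s_b, s_pw, c = muml_inputs(clusters)
print("c =", c)
print("S_B =", s_b)
print("S_PW =", s_pw)
```

For this balanced toy data set the scaling parameter equals the common cluster size, as stated in the text. In a manual set-up, the weighted means and S_B would be supplied to the first Mgroup with sample size J, and S_PW to the second Mgroup with sample size N - J.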

The following input statistics are required in order to use MUML estimation for testing multilevel factorial invariance across level-1 groups:

  • display math(22)
  • display math(23)
  • display math(24)
  • display math(25)

where inline image is a scaling parameter in group k, inline image is the total number of individuals in group k, inline image the number of individuals in cluster j in group k, and inline image the effective level-2 sample size in group k. A SAS macro to compute the weights for level-2 effective sample size, scaling parameters, and the input statistics can be found in the supporting information, available with the online version of this paper.

For MUML estimation via a manual set-up, 2k Mgroups are required. For example, in order to test multilevel factorial invariance between two level-1 groups, a multiple group analysis with four Mgroups needs to be specified. For group = 1, inline image and inline image are submitted to Mgroup 1; inline image and inline image are submitted to Mgroup 2. For group = 2, inline image and inline image are submitted to Mgroup 3; inline image and inline image are submitted to Mgroup 4. The effective sample sizes for Mgroups 1–4 are inline image, inline image, inline image, and inline image, respectively. Three sets of constraints are required. The first set is equality constraints for inline image between two Mgroups within each level-1 group membership (e.g., Mgroup 1 and Mgroup 2 for group k). The second is for inline image to be equivalent across groups (i.e., constrained to be equal between Mgroups 1 and 3). The third is a set of constraints necessary for model identification and test of factorial invariance in the hierarchy. The necessary constraints are shown in the example below.

5.  Illustration

In this section, I illustrate testing multilevel factorial invariance using PISA (Programme for International Student Assessment) 2003 data (Lee, 2009; OECD, 2004, 2005). A two-level CFA model for five mathematics self-efficacy items (see Table 1)4 is shown in Figure 2. I first illustrate testing multilevel factorial invariance between two countries, New Zealand and Turkey (level-2 group membership), and then testing multilevel factorial invariance between male and female students in Turkey (level-1 group membership).

Table 1. PISA 2003 mathematics self-efficacy items

Item | ICC | ICC in NZL | ICC in TUR
Q31c Calculating how many square metres of tiles you need to cover a floor | .080 | .035 | .096
Q31b Calculating how much cheaper a TV would be after a 30% discount | .088 | .043 | .109
Q31a Using a train timetable to work out how long it would take to get from one place to another | .164 | .056 | .115
Q31d Understanding graphs presented in newspapers | .132 | .039 | .092
Q31h Calculating the petrol consumption rate of a car | .063 | .037 | .083

Notes. The self-efficacy items asked “How confident do you feel about having to do the following mathematics tasks?” on a 1–4 Likert scale of 1 = very confident, 2 = confident, 3 = not very confident, and 4 = not at all confident. NZL = New Zealand; TUR = Turkey. ICC = intraclass correlation.

Figure 2. Two-level factor model for five mathematics self-efficacy items (PISA 2003).

5.1 Example: Multilevel factorial invariance between New Zealand and Turkey (level-2 group membership)

In New Zealand (NZL), 4,189 students were nested within 173 schools. School size ranged from 1 to 54 (mean = 24.21, standard deviation = 10.031). In Turkey (TUR), 4,030 students were nested within 159 schools. School size ranged from 2 to 35 (mean = 25.35, standard deviation = 7.865). ML estimation in Mplus 6.12 (Muthén & Muthén, 1998) was used.

The goodness of fit was examined for the overall model (i.e., the fit of the school-level and student-level models was examined simultaneously) and also for each level separately using the level-specific approach described in Ryu and West (2009). In multilevel CFA, the model fit statistics for the overall model may be dominated by the level-1 model because the sample size is typically larger at level 1, and therefore the lack of fit in the level-2 model may not be detected by the fit statistics for the overall model (see Hox, 2010; Ryu & West, 2009). The level-specific approach by Ryu and West utilizes partially saturated models to evaluate the model fit at each level separately (e.g., a partially saturated model in which the level-1 model is saturated and the level-2 model is specified as hypothesized is used to evaluate the fit of the level-2 model). In both countries, the model fitted well at both levels (see Table 2). In NZL, the proportion of variance accounted for by the common factor (i.e., communality) ranged from .348 to .589 at student level, and from .531 to .934 at school level. In TUR, the communality ranged from .261 to .465 at student level, and from .785 to .989 at school level.

Table 2. Model fit statistics of the two-level CFA model for mathematics self-efficacy

Model | New Zealand | Turkey
Overall model | χ2 = 69.467, df = 10, p < .001; CFI = .989; RMSEA = .038; SRMRB = .034, SRMRW = .018 | χ2 = 59.223, df = 10, p < .001; CFI = .990; RMSEA = .035; SRMRB = .022, SRMRW = .017
School-level model | χ2PS_B = 4.707, df = 5, p = .453; CFIPS_B = 1.000; RMSEAPS_B = .000 | χ2PS_B = 10.839, df = 5, p = .055; CFIPS_B = .986; RMSEAPS_B = .086
Student-level model | χ2PS_W = 61.036, df = 5, p < .001; CFIPS_W = .989; RMSEAPS_W = .037 | χ2PS_W = 46.876, df = 5, p < .001; CFIPS_W = .990; RMSEAPS_W = .032

Note. CFI = comparative fit index; RMSEA = root mean square error of approximation; SRMR = standardized root mean square residual. The fit statistics for school-level and student-level models were obtained by the level-specific approach using partially saturated models (see Ryu & West, 2009).

For the configural invariance model, the following constraints were imposed for model identification.5 In the school-level model, Q31c was chosen as the reference variable.6 For Q31c, the school-level factor loading was fixed to 1 in both countries, and the measurement intercept was constrained to be equal between NZL and TUR. The school-level factor mean was fixed to zero in one group (NZL here) and freely estimated in the other group (TUR). In the student-level model, the factor loading for the reference variable Q31c was fixed to 1 in both countries. The mean structure does not exist in the student-level model and therefore no mean-structure specification was needed.

Factorial invariance at school level was tested by a series of models with increasingly added constraints in the school-level model. The model fit statistics and likelihood ratio test statistics (LR or Δχ2: Bentler & Bonett, 1980) are summarized in Table 3a.7 Once again, the model fit was assessed in two ways: the overall model (i.e., school-level and student-level model simultaneously) and the school-level model (denoted by subscript PS_B in Table 3a) using the level-specific approach by Ryu and West (2009). In Table 3a, the strong invariance did not hold at the school level. The model was modified by removing the equality constraints on measurement intercepts for Q31a, Q31d, and Q31h (StrongPartial). The strict invariance model with partial strong invariance (non-invariant intercepts for Q31a, Q31d, and Q31h) yielded acceptable model fit statistics and this model was selected as an appropriate school-level model.

Table 3a. Model fit statistics and likelihood ratio (LR) tests to test factorial invariance at school level between New Zealand and Turkey

| Model | χ2 (df), p | LR test: Δχ2 (df), p | CFI | RMSEA | SRMR_W | SRMR_B | CFI_PS_B | RMSEA_PS_B |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Configural | 128.706 (20), p < .001 | | .990 | .036 | .018 | .029 | .989 | .041 |
| Weak | 133.916 (24), p < .001 | 5.210 (4), p = .266 | .990 | .033 | .018 | .037 | .986 | .039 |
| Strong | 354.927 (28), p < .001 | 221.011 (4), p < .001 | .973 | .053 | .019 | .248 | .772 | .195 |
| StrongPartial (a) | 136.581 (25), p < .001 | 2.665 (1), p = .103 | .990 | .033 | .018 | .037 | .986 | .041 |
| Strict | 151.965 (30), p < .001 | 15.384 (5), p = .009 | .989 | .031 | .018 | .064 | .968 | .054 |

Notes
  1. Degrees of freedom are shown in parentheses. CFI_PS_B and RMSEA_PS_B are fit indices for the school-level model obtained by the level-specific approach by Ryu and West (2009). For the Strong, StrongPartial, and Strict models, CFI and CFI_PS_B were obtained using more restricted null models that are nested within the hypothesized model (Widaman & Thompson, 2003).
  2. (a) No equality constraint on the school-level intercepts of Q31a, Q31d, and Q31h. For StrongPartial, the LR test compares the StrongPartial model to the weak invariance model.
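
The LR tests in Table 3a can be checked by hand: each Δχ2 is the difference between the χ2 values of two nested models, referred to a chi-square distribution with degrees of freedom equal to the difference in df. The following is a minimal sketch, not the author's SAS or Mplus code. The helper names `chi2_sf` and `lr_test` are hypothetical, and the tail probability uses the standard closed-form series for integer degrees of freedom so that only the Python standard library is needed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variate with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        total = sum(half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * total
    # Odd df: complementary error function plus a finite correction sum.
    total = sum(half ** (k - 0.5) / math.gamma(k + 0.5)
                for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total

def lr_test(restricted, baseline):
    """Return (delta chi-square, delta df, p) for two nested models."""
    d_chi = restricted[1] - baseline[1]
    d_df = restricted[2] - baseline[2]
    return d_chi, d_df, chi2_sf(d_chi, d_df)

# (label, chi-square, df) as reported in Table 3a, in nesting order.
models = [
    ("Configural", 128.706, 20),
    ("Weak", 133.916, 24),
    ("StrongPartial", 136.581, 25),
    ("Strict", 151.965, 30),
]

for baseline, restricted in zip(models, models[1:]):
    d_chi, d_df, p = lr_test(restricted, baseline)
    print(f"{restricted[0]} vs {baseline[0]}: "
          f"delta chi2 = {d_chi:.3f}, df = {d_df}, p = {p:.3f}")
```

Under these assumptions the three comparisons reproduce the Δχ2 values, degrees of freedom, and p-values reported for the Weak, StrongPartial, and Strict rows.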

Factorial invariance at student level was tested by imposing increasingly restrictive constraints in the student-level model. The model fit statistics and LR test statistics are summarized in Table 3b. For student-level factorial invariance, the model fit was assessed for the overall model and for the student-level model (denoted by subscript PS_W in Table 3b). Note that the chi-square test and the LR test were significant at p < .05 for all models and all model comparisons at the student level, because the sample size was large (8,219 students). As mentioned earlier, strong invariance is not considered at the student level as there is no mean structure in the level-1 model. The fit indices were not satisfactory for the strict invariance model. The model was modified by removing the equality constraint on the student-level residual variance of Q31d (StrictPartial). The partial strict invariance model (non-invariant residual variance for Q31d) was selected as an appropriate student-level model.

Table 3b. Model fit statistics and likelihood ratio (LR) tests to test factorial invariance at student level between New Zealand and Turkey

| Model | χ2 (df), p | LR test: Δχ2 (df), p | CFI | RMSEA | SRMR_W | SRMR_B | CFI_PS_W | RMSEA_PS_W |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Configural | 128.706 (20), p < .001 | | .990 | .036 | .018 | .029 | .990 | .035 |
| Weak | 191.966 (24), p < .001 | 63.260 (4), p < .001 | .984 | .041 | .023 | .031 | .983 | .037 |
| Strict | 677.705 (29), p < .001 | 485.739 (5), p < .001 | .941 | .074 | .041 | .050 | .935 | .054 |
| StrictPartial (a) | 328.468 (28), p < .001 | 136.502 (4), p < .001 | .972 | .051 | .030 | .039 | .970 | .044 |

Notes
  1. Degrees of freedom are shown in parentheses. CFI_PS_W and RMSEA_PS_W are fit indices for the student-level model obtained by the level-specific approach by Ryu and West (2009). For the Strict and StrictPartial models, CFI and CFI_PS_W were obtained using more restricted null models that are nested within the hypothesized model (Widaman & Thompson, 2003).
  2. (a) No equality constraint on the student-level residual variance of Q31d. For StrictPartial, the LR test compares the StrictPartial model to the weak invariance model.

Finally, the selected school-level and student-level models were combined. For the final model, χ2 = 354.675 (df = 39, p < .001), CFI = .974, RMSEA = .045, SRMR_W = .030, SRMR_B = .067. The estimates are shown in Figure 3. At school level, the non-invariant intercepts are interpreted as follows: the school-level components of Q31a and Q31d would be higher for schools in Turkey than for schools with the same level of mathematics self-efficacy in New Zealand. In other words, given the same level of mathematics self-efficacy, the observed scores of Q31a and Q31d would be higher for students who belong to schools in Turkey than for students who belong to schools in New Zealand. For Q31h, the direction is the opposite: given the same level of mathematics self-efficacy, the observed scores of Q31h would be lower for students who belong to schools in Turkey than for students who belong to schools in New Zealand. Therefore the means of Q31a, Q31d, and Q31h are not comparable across the two countries. If strong invariance had held for all five items at school level, the difference in the common factor (SELFEFFB) mean (0.193, as shown in Figure 3) would have been interpreted as the difference in school-level mathematics self-efficacy between NZL and TUR.

Figure 3. Multilevel factorial invariance of mathematics self-efficacy items between New Zealand and Turkey. Non-invariant parameter estimates are shown in bold. N: New Zealand; T: Turkey.

At student level, strict invariance did not hold for Q31d. The non-invariant residual variance is interpreted as follows: the difference in the student-level component of Q31d is larger for students in Turkey compared to the difference in the student-level component of Q31d associated with the same difference of mathematics self-efficacy for students in New Zealand. In other words, the same difference in the student-level component of Q31d does not reflect the same difference in the common factor SELFEFFW between the two countries. Therefore the variance and covariance associated with Q31d cannot be compared between the two countries (i.e., the estimated variances of SELFEFFW, 0.311 for NZL and 0.375 for TUR in Figure 3, are not comparable).

5.2 Example: Multilevel factorial invariance between male and female students (level-1 group membership)

The data used for this example were from 4,030 students – 1,748 females and 2,282 males – nested within 159 schools in Turkey. School size ranged from 2 to 35 (mean = 25.35, standard deviation = 7.865). The proportion of female students ranged from 0 to 1, the mean proportion was .4215, and the standard deviation was .2263. Before testing multilevel factorial invariance using MUML estimation, the estimates and standard errors obtained by ML were compared to those obtained by MUML. As shown in Table 4, the estimates and standard errors obtained by MUML were comparable to those obtained by ML, even though the school size was not balanced.

Table 4. ML and MUML estimates and standard errors for the two-level factor model

| Student-level parameter | ML | MUML | School-level parameter | ML | MUML |
| --- | --- | --- | --- | --- | --- |
| inline image | Fixed to 1 | Fixed to 1 | inline image | Fixed to 1 | Fixed to 1 |
| inline image | 0.970 (0.030) | 0.970 (0.030) | inline image | 1.052 (0.063) | 1.051 (0.062) |
| inline image | 0.955 (0.030) | 0.958 (0.030) | inline image | 1.065 (0.064) | 1.052 (0.062) |
| inline image | 0.760 (0.029) | 0.760 (0.029) | inline image | 0.950 (0.071) | 0.950 (0.069) |
| inline image | 0.944 (0.031) | 0.946 (0.032) | inline image | 0.874 (0.072) | 0.875 (0.071) |
| | | | inline image | 2.162 (0.026) | 2.137 (0.026) |
| | | | inline image | 2.039 (0.026) | 2.013 (0.026) |
| | | | inline image | 2.284 (0.026) | 2.257 (0.025) |
| | | | inline image | 2.158 (0.025) | 2.132 (0.025) |
| | | | inline image | 2.401 (0.025) | 2.382 (0.025) |
| inline image | 0.394 (0.012) | 0.395 (0.012) | inline image | 0.004 (0.003) | 0.004 (0.003) |
| inline image | 0.354 (0.011) | 0.354 (0.011) | inline image | 0.001 (0.002) | 0.001 (0.002) |
| inline image | 0.334 (0.010) | 0.333 (0.010) | inline image | 0.001 (0.002) | 0.002 (0.003) |
| inline image | 0.522 (0.013) | 0.521 (0.013) | inline image | 0.007 (0.004) | 0.008 (0.004) |
| inline image | 0.486 (0.013) | 0.486 (0.013) | inline image | 0.015 (0.004) | 0.015 (0.004) |
| ψ SELFEFFW | 0.319 (0.016) | 0.318 (0.016) | ψ SELFEFFB | 0.071 (0.012) | 0.075 (0.012) |
| | | | α SELFEFFB | Fixed to 0 | Fixed to 0 |

Note
  1. Standard errors are shown in parentheses.

The weights for effective level-2 sample size were .4215 and .5785 for females and males, respectively. The effective level-2 sample sizes were inline image = .4215 × 159 = 67.0185 and inline image = .5785 × 159 = 91.9815. The scaling parameters were inline image = 5.1237 and inline image = 4.9888. Four Mgroups are required to test factorial invariance between females (Mgroups 1 and 2) and males (Mgroups 3 and 4; see Figure 4). The effective sample sizes for Mgroups 1–4 were 67.0185, 1680.9815, 91.9815, and 2190.0185, respectively. Note that the sum of school-level effective sample sizes is equal to the number of schools (67.0185 + 91.9815 = 159).
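
The arithmetic behind the effective sample sizes can be sketched as follows. This is an illustration only: the function name `effective_sizes` is hypothetical, and the rule for the second Mgroup of each sex (the group's student count minus its effective level-2 size) is inferred from the reported values rather than stated as a formula in the text.

```python
n_schools = 159
n_students = {"female": 1748, "male": 2282}
# Weights reported in the text (the mean proportion of each group).
weights = {"female": 0.4215, "male": 0.5785}

def effective_sizes(n_schools, n_students, weights):
    """Return {group: (level-2 effective size, remaining student-level size)}."""
    sizes = {}
    for group, weight in weights.items():
        between = weight * n_schools           # effective level-2 sample size
        within = n_students[group] - between   # assumed: students minus level-2 size
        sizes[group] = (between, within)
    return sizes

sizes = effective_sizes(n_schools, n_students, weights)
for group, (between, within) in sizes.items():
    print(f"{group}: level-2 = {between:.4f}, student-level = {within:.4f}")

# The level-2 effective sizes add up to the number of schools.
total_between = sum(between for between, _ in sizes.values())
print(f"sum of level-2 effective sizes = {total_between:.4f}")
```

With the reported inputs this gives 67.0185 and 1680.9815 for females (Mgroups 1 and 2) and 91.9815 and 2190.0185 for males (Mgroups 3 and 4).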

Figure 4. Multiple group analysis for MUML estimation to test multilevel factorial invariance between females and males.

For model specification, three sets of constraints were imposed. First, the student-level model was set equal between the two Mgroups within each level-1 group (i.e., between Mgroups 1 and 2 for females, and between Mgroups 3 and 4 for males). Second, the school-level model was set equal between females and males (e.g., [inline image, inline image] constrained to be equal between Mgroups 1 and 3). Third, a set of constraints necessary for model identification and for testing factorial invariance in the hierarchy was imposed. The constraints for the configural invariance model are summarized in Table 5. Mplus syntax for the configural invariance model can be found in the online supporting information.

Table 5. Constraints for testing multilevel factorial invariance between level-1 groups (females and males)

| | Females: Mgroup 1 | Females: Mgroup 2 | Males: Mgroup 3 | Males: Mgroup 4 |
| --- | --- | --- | --- | --- |
| School level | School-level model constrained to be equal between Mgroup 1 and Mgroup 3 | | | |
| | inline image = 1 | | inline image = 1 | |
| | inline image = 0 | | inline image = 0 | |
| Student level | Student-level model for females constrained to be equal (Mgroups 1 and 2) | | Student-level model for males constrained to be equal (Mgroups 3 and 4) | |
| | inline image = 1 | inline image = 1 | inline image = 1 | inline image = 1 |
| | inline image = inline image | inline image = inline image | inline image = inline image | inline image = inline image |
| | inline image = 0 | inline image = 0 | | |

Notes
  1. Mgroup indicates groups in multiple group analysis for MUML estimation. An equivalent school-level model can be specified with inline image = inline image = 0 and no constraint on inline image = inline image.

Once again, the school-level model is equivalent between females and males, so factorial invariance between females and males at the school level is not of concern. Configural, weak, strong, and strict invariance were tested at student level. The model fit statistics and LR test statistics are summarized in Table 6. The configural invariance model fitted well. The weak invariance model also fitted well: the LR test was not significant even with the large sample size (Δχ2 = 4.580 for df = 4, p = .333). The strong invariance model yielded a significant LR test, but the fit statistics indicated good model fit (CFI = .974, RMSEA = .057, and SRMR = .056). Finally, for the strict invariance model, the LR test was significant (Δχ2 = 20.193 for df = 5, p = .001), but the fit statistics indicated acceptable fit (CFI = .971, RMSEA = .057, and SRMR = .058). Considering the sensitivity of the LR test statistic to large sample size and the other fit indices indicating good model fit, the strict invariance model was selected as an appropriate student-level model.

Table 6. Model fit statistics and likelihood ratio (LR) tests to test factorial invariance at student level between females and males

| Model | χ2 (df), p | LR test: Δχ2 (df), p | CFI | RMSEA | SRMR |
| --- | --- | --- | --- | --- | --- |
| Configural | 132.449 (35), p < .001 | | .981 | .053 | .051 |
| Weak | 137.029 (39), p < .001 | 4.580 (4), p = .333 | .981 | .050 | .052 |
| Strong | 185.306 (43), p < .001 | 48.277 (4), p < .001 | .974 | .057 | .056 |
| Strict | 205.499 (48), p < .001 | 20.193 (5), p = .001 | .971 | .057 | .058 |

Notes
  1. Degrees of freedom are shown in parentheses. For the strong and strict models, CFI was obtained using more restricted null models that are nested within the hypothesized model (Widaman & Thompson, 2003).

The estimated strict invariance model is shown in Figure 5. Both the means and (co)variances of the common factor (SELFEFFW) are comparable between females and males. The estimated mean of SELFEFFW was lower for males than for females (difference = –0.234, p < .001). The estimated variance of SELFEFFW was 0.292 and 0.331 for females and males, respectively. The equality of the variance between the two groups can be tested by adding an equality constraint on the variance of SELFEFFW. The fit statistics for this constrained model were χ2 = 209.822 (df = 49, p < .001), CFI = .970, RMSEA = .057, and SRMR = .062. The LR test statistic (compared to the strict invariance model) was Δχ2 = 4.463 for df = 1, p = .035. Considering the sensitivity of the LR test to large sample size, it was concluded that the variance of SELFEFFW was not different between females and males.

Figure 5. Multilevel factorial invariance of mathematics self-efficacy items between females (F) and males (M). Strict invariance holds at student level.

6. Summary and conclusion

This paper has presented and illustrated a procedure to test factorial invariance in multilevel CFA. To emphasize the key idea again, a simple extension of the standard procedure is not appropriate for testing multilevel factorial invariance across level-1 groups, because the dependency between members of different level-1 groups is ignored. In the procedure presented, the decomposition of level-1 variables is not altered by level-1 group membership, and therefore the dependency is appropriately taken into account among all individuals within the same cluster regardless of their level-1 group membership.

In testing factorial invariance in multilevel CFA, it is critical to identify the level at which group membership is defined. When the group membership is at level 2, multilevel factorial invariance can be tested using a simple extension of the standard procedure. Weak, strong, and strict invariance can be tested at level 2, and weak and strict invariance can be tested at level 1. Strong invariance is not of concern at level 1 because the mean structure is zero in the level-1 model. When the group membership is at level 1, multilevel factorial invariance can be tested using the procedure presented in this paper. The level-2 model is equivalent regardless of level-1 group membership. Weak, strong, and strict invariance can be tested in the level-1 model.

Factorial invariance is a question of whether the relationship between an observed measure and the underlying latent factor is the same across different groups. In a single-level confirmatory factor model with no cross-loading, one latent factor underlies each observed variable. The parameters (e.g., factor loadings, measurement intercepts) describe the relationships between observed measures and latent factors. In a multilevel confirmatory factor model with no cross-loading, typically two latent factors underlie each level-1 observed variable: one for the within-cluster random component and the other for the between-cluster random component of the level-1 observed variable. For example, in the two-level factor model shown in Figure 2, the observed variable Q31c is hypothesized to load onto the within-cluster factor SELFEFFW and onto the between-cluster factor SELFEFFB. For typical level-1 observed variables in multilevel data, which have non-zero within-cluster and between-cluster variances, the relationship between the observed variables and underlying common factors consists of two parts. At level 1, the parameters describe the relationships between the within-cluster latent random components of the observed variables (e.g., Q31cW) and the common factor (e.g., SELFEFFW). At level 2, the parameters describe the relationships between the between-cluster latent random components of the observed variables (e.g., Q31cB) and the common factor (e.g., SELFEFFB).

When the group membership is at level 2, there are four possible outcomes: (a) an observed measure is measurement invariant (at the desired level of hierarchy) with both level-1 and level-2 common factors (e.g., Q31c is measurement invariant with both SELFEFFW and SELFEFFB); (b) a measure is measurement invariant with the level-1 common factor but not with the level-2 factor (e.g., Q31c is measurement invariant with SELFEFFW but not with SELFEFFB); (c) a measure is measurement invariant with the level-2 common factor but not with the level-1 factor (e.g., Q31c is measurement invariant with SELFEFFB but not with SELFEFFW); and (d) a measure is not measurement invariant with any of the underlying common factors. When the group membership is at level 1, the level-2 model is equivalent regardless of level-1 group membership. Therefore in this case there are two possible outcomes: a measure is measurement invariant with the level-1 common factor or not (e.g., Q31c is measurement invariant with SELFEFFW or not). Since multilevel CFA assumes that the within-cluster and between-cluster models are uncorrelated, the required level of invariance depends on the latent factor on which selection or inference is based. Measurement invariance must be established at the level at which an inference is made. For example, in Figure 3, an inference based on SELFEFFB at school level would be invalidated by the non-invariance in the measurement intercepts of Q31aB, Q31dB, and Q31hB. But the non-invariance in the measurement intercepts of Q31aB, Q31dB, and Q31hB would not be critical for an inference regarding SELFEFFW at student level.

In conclusion, the procedure for testing multilevel factorial invariance across level-1 groups can be summarized as follows: the decomposition of level-1 variables should follow the clustered structure of multilevel data before the data are separated into level-1 groups; the level-2 model is equivalent regardless of level-1 group membership; the effective sample size should be used for the level-2 model; the level-1 model needs mean structure to represent the relative difference in means between level-1 groups; and weak, strong, and strict invariance can be tested in the level-1 model. The procedure presented can easily be implemented using standard SEM software packages. I hope this paper will motivate researchers and provide them with a viable tool to consider the issue of factorial invariance in multilevel CFA models.

Footnotes
  1. Jak, Oort, and Dolan (2013b) present a test for the homogeneity of the within-cluster measurement model across clusters in multilevel data (referred to as ‘cluster bias’ in their paper).
  2. Both ML and MUML estimation are based on the assumption of multivariate normality. Alternative test statistics and model evaluation procedures that do not rely on normal theory are available for multilevel structural equation models (e.g., Yuan & Bentler, 2002, 2003).
  3. To avoid confusion, I use ‘Mgroup’ to indicate ‘groups’ in multiple group analysis for MUML estimation, ‘group’ to indicate the group membership between which the invariance is tested, and ‘cluster’ to indicate the level-2 unit in multilevel data.
  4. Although the items were measured using 4-point scales, the distribution of the measures did not deviate severely from normality. The skewness ranged from 0.003 to 0.609, and the excess kurtosis ranged from –0.831 to 0.316. The illustrated examples used MUML estimation to obtain an approximate ML solution assuming multivariate normality. When the normality assumption is violated, an alternative estimation method (e.g., maximum likelihood estimation with robust standard errors [MLR]) is recommended.
  5. There are alternative ways to identify the model. The interpretation of the results is easiest if the reference variable method is used. Alternative identifications do not affect the likelihood ratio statistic.
  6. The reference variable method relies on the assumption that the factor loading for the reference variable is truly invariant across groups. In this example, the weak invariance model in which all factor loadings were constrained to be equal across groups was acceptable at both school and student levels. If weak invariance does not hold for any of the items, the test of invariance may yield different results depending on the choice of reference variable. In this case it is crucial that the loading for the reference variable is truly invariant across groups (see Cheung & Lau, 2012; Johnson & Meade, 2007; Johnson, Meade, & DuVernet, 2009; Rensvold & Cheung, 1998).
  7. CFI was obtained using a more restricted null model that is nested within the hypothesized model (Widaman & Thompson, 2003).

Supporting Information

| Filename | Format | Size | Description |
| --- | --- | --- | --- |
| bmsp12014-sup-0001.pdf | application/PDF | 35K | Dataset 1: SAS macro to compute the input statistics for MUML estimation. Dataset 2: Mplus syntax for testing factorial invariance in multilevel confirmatory factor analysis. |