Measurement Invariance of the Wong and Law Emotional Intelligence Scale Scores: Does the Measurement Structure Hold across Far Eastern and European Countries?


  • We would like to thank Eveline Schollaert for her help in collecting the data.


In recent years, emotional intelligence and emotional intelligence measures have been used in a plethora of countries and cultures. This is also the case for the Wong and Law Emotional Intelligence Scale (WLEIS), highlighting the importance of examining whether the WLEIS is invariant across regions other than the Far Eastern region (China) where it was originally developed. This study investigated the measurement invariance (MI) of the WLEIS scores across two countries, namely Singapore (N = 505) and Belgium (N = 339). Apart from items measuring the factor “use of emotion”, the measurement structure underlying the WLEIS ratings was generally invariant across both countries as there was no departure from MI in terms of factor form and factor loadings. The scalar invariance model (imposing an identical threshold structure) was partially supported. Factor intercorrelations (not involving the factor “use of emotion”) were also identical across countries. These results show promise for the invariance of the WLEIS scores across different countries, yet warn of the non-invariance of the dimension “use of emotion”. Reducing the motivation-oriented nature of these items may be needed to achieve exact model fit in cross-cultural comparisons.


Since the first publication on emotional intelligence (EI) in 1990 (Salovey & Mayer, 1990), EI has been studied as an individual difference construct in a variety of contexts (e.g. employment, education, and clinical contexts) and countries. Paralleling this growing interest of researchers and practitioners in determining individuals' EI, various approaches for measuring EI have been proposed and scrutinised during the last two decades (Zeidner, Matthews, & Roberts, 2004). One possible EI measurement approach consists of the use of self-report questionnaires wherein individuals are asked to indicate how well the scale items describe their emotion-related abilities and dispositions. These self-report EI scales are typically used to assess global trait EI as “a constellation of emotional self-perceptions located at the lower levels of personality hierarchies” (Petrides, 2010, p. 137).

One of the most popular self-report EI instruments is the Wong and Law Emotional Intelligence Scale (WLEIS; Wong & Law, 2002) because it relies on the revised four-branch ability EI model (i.e. self emotion appraisal, others' emotion appraisal, use of emotion, and regulation of emotion) of Mayer and Salovey (1997) for measuring individuals' self-perceptions about EI. The WLEIS was originally developed in the Far East (Hong Kong in China) and its four-factor structure has been supported in China (Huang, Chan, Lam, & Nan, 2010; Law, Wong, Huang, & Li, 2008; Law, Wong, & Song, 2004; Shi & Wang, 2007; Wong & Law, 2002) and other countries.1

Despite the proliferation of the WLEIS in international contexts, we do not know whether the measurement structure of the WLEIS is invariant across cultures (Law et al., 2004; Shi & Wang, 2007; Whitman, Van Rooy, Viswesvaran, & Kraus, 2009). In fact, a problem is that many use the WLEIS across countries without investigating the measurement invariance (MI) of the WLEIS scores. When an instrument such as the WLEIS is used in an international context, it is imperative to establish that the measurement structure underlying the scores is invariant cross-culturally (F.M. Cheung, 2004; Hoyle & Smith, 1994; Whitman et al., 2009). Only when the MI of an instrument such as the WLEIS is established can WLEIS scores be compared across countries.

Conceptually, there are at least three reasons to expect that the MI of the WLEIS across Far Eastern and West European countries should not be taken for granted. First, differences in the importance that cultures ascribe to specific values (e.g. protecting public image and self-discipline; Schwartz, 1999) give rise to differences in response styles across these cultures (Harzing, 2006). For example, Chen, Lee, and Stevenson (1995) showed that respondents in collectivistic (Far Eastern) cultures are more likely to use the midpoint values and less likely to use the extreme values on a scale than respondents in individualistic (West European) cultures. Such differences in response style could produce differences in factor loadings (i.e. the width of the response interval). Similarly, Harzing (2006) reports higher levels of acquiescent responding (i.e. yes-saying) in Far Eastern countries as compared to West European countries. Such differences may affect threshold values (i.e. the mean score within the response interval) across cultures, resulting in scalar non-invariance (G.W. Cheung & Rensvold, 2000, 2002).

Second, Far Eastern and Western European countries differ in the degree to which they value emotional expressiveness (Trompenaars & Hampden-Turner, 1998). Members of Western European cultures that are emotionally expressive tend to more overtly and visibly demonstrate their feelings through laughing, gesturing, body posture, and facial expressions (Hammer, 2005). In contrast, members of emotionally restrained Far Eastern cultures tend to contain, hide, mask, or otherwise minimise more overt emotional expression (Ting-Toomey, 1999). The expressive–restraint distinction may particularly influence MI across cultures because of its relevance to emotional functioning.

A final reason relates specifically to the phrasing of some WLEIS items (see Appendix, e.g. “I always tell myself I am a competent person”, “I am a self-motivated person” as items of the “use of emotion” scale). Such items measure not only EI but also reflect one's motivation and self-efficacy. Motivationally oriented measures are particularly sensitive to cultural differences in the dimension of individualism–collectivism (Heine & Buchtel, 2009) which distinguishes Western European from Far Eastern countries (Hofstede, 2001; Parkes, Bochner, & Schneider, 2001). Thus, it is plausible that at least some scales of the WLEIS lack MI across these countries.

Taken together, there is a clear need to establish the MI of the WLEIS across different cultures. Though previous studies have examined cross-cultural validity of other EI measures (Ng, Wang, Kim, & Bodenhorn, 2010; Ekermans, Saklofske, Austin, & Stough, 2011; Karim & Weisz, 2010; Gignac & Ekermans, 2010), we believe an examination of the WLEIS is important given its wide popularity and international use in—thus far—13 countries. Therefore, this study examines whether the measurement structure of the WLEIS is invariant across a Far Eastern country (Singapore) and a Western European country (Belgium).


Participants and Procedure

Belgian Sample.  Graduate students from one particular university in the Dutch-speaking part of Belgium were recruited on campus to complete the survey. In total, 339 students provided useful data for analysis (response rate = 95.8%). The Belgian sample comprised 38.9 per cent male students and 61.1 per cent female students. The mean age of these students was 22.7 years (SD= 1.66). All graduate students completed a Dutch translation of the original English WLEIS. The original WLEIS was translated into Dutch/Flemish by a Belgian colleague and this translated version was checked by one of the authors. No modifications needed to be made.

Singaporean Sample.  In Singapore, graduate students were also recruited on campus. To ensure cultural homogeneity of the Singaporean sample, we contacted only ethnic Chinese students (N= 600) that were part of a university-wide subject pool via email. Eventually, 505 students provided data suitable for further analysis (response rate = 84.2%). The Singaporean sample consisted of 48.5 per cent males and 51.5 per cent females. The mean age of these students was 22.0 years (SD= 1.79). All graduate students completed the original English version of the WLEIS (Wong & Law, 2002). They all had a high proficiency in English.


All graduate students completed the 16-item WLEIS (Wong & Law, 2002). They indicated their level of agreement with each individual survey item on a 5-point Likert-scale, with the following labels describing specific (but not all) answer categories: 1 =“strongly disagree”, 3 =“neutral”, 5 =“strongly agree”. The WLEIS comprises four “dimensions” (or, alternatively, referred to as lower-order “factors”): self emotion appraisal, others' emotion appraisal, use of emotion, and regulation of emotion. Each dimension contains four survey items (see Appendix for all items). As shown in Table 1, these dimensions of the WLEIS exhibited high internal consistency (lowest Cronbach's α value was .74).
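The internal consistency coefficient reported for each four-item dimension can be illustrated with a minimal sketch. It assumes the standard formula for Cronbach's α (ratio of summed item variances to total-score variance); the function name and the demo ratings are hypothetical and are not the study data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])                  # number of items in the scale
    items = list(zip(*responses))          # transpose: one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical 5-point ratings of four items by five respondents (not study data)
demo = [
    [4, 4, 5, 4],
    [3, 3, 3, 2],
    [5, 4, 5, 5],
    [2, 3, 2, 2],
    [4, 5, 4, 4],
]
print(round(cronbach_alpha(demo), 3))
```

Applied per country to the four items of one dimension, a function of this kind would yield one α value per cell of Table 1.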

Table 1. Descriptive Statistics and Cronbach's Alphas of WLEIS within each Country
Scale-level summary statistics    Self emotion appraisal    Others' emotion appraisal    Use of emotion    Regulation of emotion
Belgium (N = 339)
  Cronbach's alpha
Singapore (N = 505)
  Cronbach's alpha


The WLEIS scores in both countries did not follow a multivariate normal distribution (i.e. Mardia's skewness test [i.e. b1p]: b1p= .930, p < .001 [in Belgium]; b1p= 2.173, p < .001 [in Singapore]; and Mardia's kurtosis test [i.e. b2p]: b2p= 27.133, p < .001 [in Belgium]; b2p= 31.043, p < .001 [in Singapore]). To deal with multivariate non-normality of the WLEIS items and their ordered categorical nature, we used the mean- and variance-adjusted weighted least squares estimation approach (i.e. WLSMV in Mplus 5.2; see Kaplan, 2000, p. 85).
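For readers unfamiliar with Mardia's coefficients, the following sketch shows how b1p (multivariate skewness) and b2p (multivariate kurtosis) can be computed from raw scores. It assumes the usual definitions based on the maximum-likelihood covariance matrix (divisor n) and omits the significance tests; the helper names and the four-point demo data are hypothetical.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    p = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(p)]
           for i, row in enumerate(matrix)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]


def mardia(data):
    """Return (b1p, b2p) for a list of observations, each a list of p scores."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    # Maximum-likelihood covariance matrix (divisor n, not n - 1)
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(p)]
           for a in range(p)]
    inv = invert(cov)
    # Pre-multiply each centered observation by the inverse covariance matrix
    scaled = [[sum(inv[a][b] * c[b] for b in range(p)) for a in range(p)]
              for c in centered]

    def dist(i, j):
        return sum(centered[i][a] * scaled[j][a] for a in range(p))

    b1p = sum(dist(i, j) ** 3 for i in range(n) for j in range(n)) / n ** 2
    b2p = sum(dist(i, i) ** 2 for i in range(n)) / n
    return b1p, b2p


# Hypothetical two-variable data: the four corners of a square (not study data)
corners = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
print(mardia(corners))
```

Under multivariate normality, b1p is expected to be close to zero and b2p close to p(p + 2), which is why clearly larger values are read as departures from normality.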

As prior research has provided empirical evidence for the theoretical four-factor model underlying the WLEIS (see Law et al., 2004; Shi & Wang, 2007; Wong & Law, 2002), our first analyses re-examined the factor structure of the WLEIS in the Belgian and Singaporean samples. To this end, we assessed model fit of alternative measurement models comprising minimally one and maximally five factors (k = 1, 2, . . . , 5) underlying the WLEIS items. To test the fit of these alternative models, we used exploratory factor analyses with an oblique (promax) rotation (see Woods, 2002). Inspection of model fit statistics (see Table 2) and the pattern of rotated factor loadings revealed that the four-factor solution outperformed the one-, two-, and three-factor solutions in both Singapore and Belgium. Moreover, all WLEIS items loaded onto their designated EI dimension, with factor loadings above .45 (surpassing the cutoff level of .32 of Tabachnick and Fidell, 2001). In addition, cross-loadings (i.e. loadings of items on WLEIS dimensions other than their designated one) stayed below .15. Although a five-factor solution showed a better fit in both countries, the additional factor was conceptually meaningless as none of the WLEIS items loaded substantially on this factor. In sum, our exploratory factor analyses provided evidence for the theoretically derived four-factor structure in both Singapore and Belgium.

Table 2. Summary of Goodness-of-Fit Indices for Within-Group Exploratory Factor Analyses (WLSMV estimation; see note a)

                     Belgium (N = 339)        Singapore (N = 505)
χ2 (df) [χ2/df]
  k = 1              1684.65 (26) [64.79]     1246.18 (33) [37.76]
  k = 2              1306.32 (29) [45.05]     705.02 (33) [21.36]
  k = 3              678.87 (27) [25.14]      449.48 (34) [13.22]
  k = 4              126.14 (40) [3.15]       80.11 (36) [2.23]
  k = 5              95.17 (34) [2.80]        49.43 (32) [1.54]
RMSEA
  k = 1              .355                     .329
  k = 2              .295                     .245
  k = 3              .219                     .190
  k = 4              .065                     .060
  k = 5              .060                     .040
SRMR
  k = 1              .221                     .210
  k = 2              .152                     .143
  k = 3              .093                     .096
  k = 4              .022                     .029
  k = 5              .018                     .028

  • Note: k = number of factors extracted; RMSEA = root mean square error of approximation; SRMR = standardised root mean square residual.

  • (a) With WLSMV estimation χ2 values are mean- and variance-adjusted. In addition, degrees-of-freedom are estimated and not derived from the model structure.

Our further analyses examined the cross-country MI structure of the WLEIS. To this end, we conducted several statistical comparisons between alternative multigroup mean and covariance structure (MACS) models (Sörbom, 1974, 1978). MACS models differ from the commonly used covariance-based SEM models in that they model not only a covariance structure but also a mean structure. Initially, two hierarchically nested invariance models (see L.K. Muthén & Muthén, 2007, p. 346) were evaluated across the Belgian and Singaporean data: the “least restrictive” form invariance model and a “highly restrictive” scalar invariance model. The form invariance model imposes the same theoretical four-factor structure across both countries. That means that the same indicators (i.e. items) measure the four factors in both countries. As the form invariance model does not impose any measurement parameters to be identical across both countries, it serves as a baseline model to evaluate subsequent more restrictive invariance models.
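One common way to write such models for ordered categorical items is the latent response formulation sketched below. The notation is an assumption introduced here for illustration (it does not appear in the original analyses): i indexes respondents, j items, g countries (B = Belgium, S = Singapore), and c the five response categories.

```latex
% Latent response variable for item j, driven by the one factor the item is assigned to
\begin{align}
y^{*}_{ijg} &= \lambda_{jg}\,\eta^{(j)}_{ig} + \varepsilon_{ijg}, \\
y_{ijg} &= c \quad \text{if} \quad \tau_{jg,c-1} < y^{*}_{ijg} \le \tau_{jg,c},
  \qquad c = 1,\dots,5, \quad \tau_{jg,0} = -\infty, \quad \tau_{jg,5} = +\infty .
\end{align}
% Form invariance: each item j is assigned to the same factor in both countries,
% while loadings and thresholds are free to differ between countries.
% Scalar invariance adds equality of loadings and thresholds across countries:
\begin{equation}
\lambda_{jB} = \lambda_{jS}, \qquad \tau_{jB,c} = \tau_{jS,c}
  \quad \text{for all items } j \text{ and } c = 1,\dots,4 .
\end{equation}
```

In this formulation, a partial scalar invariance model would simply drop the equality constraints for a subset of items while retaining them for the rest.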

Given the debate on appropriate fit indices (i.e. use of ordinary χ2 statistics that are rooted in asymptotic statistical theory, as proposed by Antonakis, Bendahan, Jacquart, and Lalive (2010) and Kline (2010), versus alternative fit indices with simulation-based cutoffs (Brannick, 1995; Kelloway, 1995) such as the Comparative Fit Index, CFI, and the Root Mean Square Error of Approximation, RMSEA, as proposed by Cheung and Rensvold (2002) and Hu and Bentler (1999)) and for pragmatic reasons, we decided to rely on both fit index approaches to evaluate model fit. So, we inspected the χ2 and we also categorised model fit as reasonable/good if: (a) the CFI and TLI values exceed a cutoff point of .90 (rather liberal) / .95 (rather conservative); (b) the RMSEA falls below .08 (rather liberal) / .05 (rather conservative) (see Davidov, Datler, Schmidt, & Schwartz, 2011; Hu & Bentler, 1999; Marsh, Hau, & Wen, 2004); and (c) the χ2/df ratio is smaller than 3.0 (Kline, 2010). Regarding nested MI model comparisons, we relied on recent simulation results by Chen (2007) as well as the size of the Δχ2/Δdf ratio. Specifically, the maximum deterioration in CFI between alternative MI models is set at .010 (i.e. two times .005)2 between the form invariance model and the scalar invariance model, whereas this is .020 for RMSEA (i.e. two times .010; see F.F. Chen, 2007).
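The absolute fit criteria (a) to (c) can be expressed as a small helper. This is only an illustrative sketch: the function name and the returned labels are hypothetical, and the demo values are those reported for the form invariance model in Table 3.

```python
def label_fit(chi2, df, cfi, tli, rmsea):
    """Label each fit index using the liberal and conservative cutoffs in the text."""

    def incremental(value):
        # CFI and TLI: above .95 is conservative ("good"), above .90 liberal ("reasonable")
        if value > 0.95:
            return "good"
        return "reasonable" if value > 0.90 else "poor"

    # RMSEA: below .05 is conservative ("good"), below .08 liberal ("reasonable")
    if rmsea < 0.05:
        rmsea_label = "good"
    elif rmsea < 0.08:
        rmsea_label = "reasonable"
    else:
        rmsea_label = "poor"

    return {
        "CFI": incremental(cfi),
        "TLI": incremental(tli),
        "RMSEA": rmsea_label,
        "chi2/df < 3.0": chi2 / df < 3.0,
    }


# Values reported for the form invariance model (model A in Table 3)
print(label_fit(chi2=252.24, df=83, cfi=0.977, tli=0.986, rmsea=0.070))
```

Reporting one label per index, rather than a single verdict, mirrors the way the indices are weighed against each other in the text.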

Table 3 shows that the form invariance model produces a significant χ2 value (p < .01), indicating that the model fails to fit exactly. However, the other indices (CFI, TLI, and RMSEA) satisfy our criteria of adequate model fit. Note too that the χ2/df ratio only marginally exceeds the cutoff point of 3.0 proposed by Kline (2010). Thus, we consider our baseline model as being supported on both theoretical and empirical grounds. Next, we compared this baseline model with the highly restrictive scalar invariance model that imposes invariant factor loadings and an invariant threshold structure (i.e. structure of underlying threshold values of an ordered categorical variable that indicates the specific point at which respondents make a transition from a particular response category to a higher response category) across countries. The statistical comparison between the scalar invariance model and the form invariance model reveals that the difference in χ2 is highly significant (see Table 3). Moreover, the deterioration in CFI (.014) exceeds the cutoff point of .010 mentioned earlier. In contrast, the deterioration in RMSEA (.008) does not exceed the respective cutoff point of .020. In sum, we conclude that the scalar invariance model is overly restrictive and should therefore be rejected. To find out which (restricted) measurement parameters were responsible for the misfit of the scalar invariance model, we continued our search for an optimal fitting partial scalar invariance model.

Table 3. Tests of Measurement Invariance for the Multigroup Model of the WLEIS across Both Countries

Model    χ2        df(a)    χ2/df    CFI     TLI     RMSEA    Model comparison(b)
A        252.24    83       3.04     .977    .986    .070
B        372.63    105      3.55     .963    .983    .078     B versus A: Δχ2 = 175.81, Δdf = 36, Δχ2/Δdf = 4.88, p = .000
C        298.91    101      2.96     .973    .987    .068     C versus A: Δχ2 = 80.90, Δdf = 30, Δχ2/Δdf = 2.70, p = .000
D        204.00    73       2.79     .982    .988    .065     D versus C: Δχ2 = 4.78, Δdf = 3, Δχ2/Δdf = 1.59, p = .189

A: Form invariance (baseline model)
B: Scalar invariance
C: Partial scalar invariance(c) (non-invariance of three survey items measuring “use of emotion”)
D: Partial scalar invariance (model C) and equal factor correlations between all pairs of factors not involving “use of emotion”

  • Note: CFI = comparative fit index; TLI = Tucker-Lewis index; RMSEA = root mean square error of approximation.

  • (a) With WLSMV estimation (we used “delta parametrisation” unless mentioned otherwise; see Muthén & Asparouhov, 2002), χ2 values are mean- and variance-adjusted. In addition, degrees-of-freedom are not derived from the model structure but estimated.

  • (b) We relied on an adequate procedure to compare χ2 values of (nested) MI models; this procedure is implemented in Mplus as the “DIFFTEST” procedure. As such, differences in χ2 are estimated (and not directly derived from the two models).

  • (c) In the partial scalar invariance model C, the factor loadings of the second, third, and fourth survey items measuring “use of emotion” were not constrained to be equal across countries, and neither were their threshold values.

Inspection of the modification indices revealed that three (out of the four) WLEIS items measuring “use of emotion” (see Appendix) exhibited non-invariance in terms of their factor loadings as well as some of the thresholds. In particular, the difference lies in higher response scores and thus easier transitions to higher response categories in the Singaporean sample (see note under Table 3 for the specific non-invariant measurement parameters). Hence, a partial scalar invariance model that released the restriction of these parameters (i.e. model C in Table 3) showed an improved model fit (χ2/df ratio of 2.96 vs. the χ2/df ratio of 3.55 for the scalar invariance model), and an improved Δχ2/Δdf ratio (2.70 vs. 4.88). Compared to our baseline model (i.e. the form invariance model), both CFI and RMSEA decreased by .004 and .002, respectively. Therefore, the partial scalar invariance model C seems to be an adequate model indicating the most critical sources of non-invariance in measurement parameters of WLEIS across Belgium and Singapore.3
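The nested comparisons against the baseline model can be retraced with a short sketch. It assumes that a restricted model is retained only if both the CFI and RMSEA changes stay within the cutoffs stated earlier (.010 and .020); the function and variable names are hypothetical, and the fit values are taken from Table 3.

```python
MAX_CFI_DROP = 0.010    # maximum tolerated deterioration in CFI
MAX_RMSEA_RISE = 0.020  # maximum tolerated deterioration in RMSEA


def compare_nested(restricted, baseline):
    """Compare a restricted invariance model with a less restricted baseline."""
    cfi_drop = round(baseline["cfi"] - restricted["cfi"], 3)
    rmsea_rise = round(restricted["rmsea"] - baseline["rmsea"], 3)
    return {
        "cfi_drop": cfi_drop,
        "rmsea_rise": rmsea_rise,  # negative values mean RMSEA improved
        "within_cutoffs": cfi_drop <= MAX_CFI_DROP and rmsea_rise <= MAX_RMSEA_RISE,
    }


# CFI and RMSEA values as reported in Table 3
model_a = {"cfi": 0.977, "rmsea": 0.070}  # form invariance
model_b = {"cfi": 0.963, "rmsea": 0.078}  # scalar invariance
model_c = {"cfi": 0.973, "rmsea": 0.068}  # partial scalar invariance

print("B versus A:", compare_nested(model_b, model_a))
print("C versus A:", compare_nested(model_c, model_a))
```

The χ2 difference tests are deliberately left out of this sketch, because with WLSMV estimation they cannot be obtained by simple subtraction of the two χ2 values.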

Given reasonable model fit of the partial scalar invariance model C, we also examined whether factor correlations were invariant across Belgium and Singapore. So, we compared a partial scalar invariance model with equal factor correlations across countries (model D in Table 3) with the partial scalar invariance model not assuming equal factor correlations (model C). Note that model D does not constrain factor correlations involving the factor “use of emotion” to be equal across both samples, as there were non-invariant parameters for this factor. Statistical comparison of both models indicates that imposing equal factor correlations across groups did not significantly worsen model fit (Δχ2 = 4.78, Δdf = 3, p = .189). Compared to model C, the CFI increased by .009 and the RMSEA decreased by .005. This result suggests that the three intercorrelations among the factors self emotion appraisal, others' emotion appraisal, and regulation of emotion are identical across countries. The constrained factor correlations varied between .28 and .40 (see Table 4). The strength (Huang et al., 2010; Law et al., 2008; Wong & Law, 2002) and ranking of these correlations (Christie, Jordan, Troth, & Lawrence, 2007; Joseph & Newman, 2010) were in line with previous findings.

Table 4. Factor Correlations derived from the Measurement Invariance Model D

                                Others' emotion appraisal    Use of emotion                        Regulation of emotion
1. Self emotion appraisal       .40                          .10 (Belgium), .29 (Singapore)        .36
2. Others' emotion appraisal                                 .04 (Belgium), .21 (Singapore)        .28
3. Use of emotion                                                                                  .32(a) (Belgium), .32(a) (Singapore)

  • Note: (a) Not constrained to be equal across countries, but (rounded) correlations turned out to be identical.

To examine whether the difference in gender composition in Belgium and Singapore (i.e. sample heterogeneity) affected the results of our MI analyses, we conducted an extra set of MI analyses in which we added a direct effect of gender on all four factor scores (i.e. a multiple indicator multiple cause model). Results (available from the authors) revealed a non-significant gender effect and model fit statistics very similar to the ones in Table 3.


Recently, Whitman et al. (2009) warned: “as multinational organizations increasingly adopt EI as a predictor for personnel selection, establishing measurement equivalence across cultures will be necessary to meaningfully interpret multinational data” (p. 1072). Conceptually, this call for MI research on EI instruments is supported by various arguments (i.e. differences in response styles, differences in the valence placed on emotional expressiveness, and the use of motivation-like items in some EI scales). Therefore, this study investigated the MI of the WLEIS across Singapore and Belgium. Results showed that the WLEIS was form invariant, so Singaporean and Belgian respondents use a comparable frame of reference when completing the WLEIS. However, the scalar invariance model was only partially supported, as higher factor loadings and lower thresholds (and thus higher response scores) were found for Singaporean respondents on three items assessing the dimension “use of emotion”. As this dimension is the only dimension with motivation-like items, this result indicates that the last conceptual reason mentioned above (i.e. use of motivation-like items in some EI scales) might be responsible for the non-invariance.

Where do we go from here? On the one hand, these results bode well for the invariance of WLEIS ratings on the dimensions “self emotion appraisal”, “others' emotion appraisal”, and “regulation of emotion” across different countries. On the other hand, we suggest that cross-cultural comparisons using the dimension “use of emotion” should proceed cautiously. Given the well-documented influence of culture on motivationally oriented constructs (Heine & Buchtel, 2009), a rephrasing of the “use of emotion” items in the WLEIS is likely to improve the cross-cultural viability of this dimension. This admonition might also be relevant for other EI measures that include motivation-like items, such as Bar-On's EQ-i (Bar-On, 1997) and Schutte's EI scale (Schutte, Malouff, Hall, Haggerty, Cooper, Golden, & Dornheim, 1998). Generally, the domain would greatly benefit from further research endeavors resulting in an improved EI questionnaire which (a) has a strong theoretical basis (e.g. one based on the four-branch EI model), and (b) exhibits exact model fit regardless of the context (e.g. country) in which the questionnaire is used.

Future research should further test the invariance of the WLEIS across different contexts. Now that this study has established the MI of the WLEIS in cultures that value collectivism and emotional restraint versus individualism and emotional expressiveness, future research should examine whether the nomological network of EI differs across these different cultural regions. In other words, does EI foster the preservation of social harmony in collectivistic cultures but facilitate autonomy in individualistic cultures? Or does it serve similar functions in both cultural contexts?



Overview of WLEIS Items

Dimension                     Item
Self emotion appraisal        I have a good sense of why I have certain feelings most of the time.
                              I have good understanding of my own emotions.
                              I really understand what I feel.
                              I always know whether or not I am happy.
Others' emotion appraisal     I always know my friends' emotions from their behavior.
                              I am a good observer of others' emotions.
                              I am sensitive to the feelings and emotions of others.
                              I have good understanding of the emotions of people around me.
Use of emotion                I always set goals for myself and then try my best to achieve them.
                              I always tell myself I am a competent person.*
                              I am a self-motivated person.*
                              I would always encourage myself to try my best.*
Regulation of emotion         I am able to control my temper and handle difficulties rationally.
                              I am quite capable of controlling my own emotions.
                              I can always calm down quickly when I am very angry.
                              I have good control of my own emotions.

  • Note: Items with an asterisk were non-invariant.