Measurement invariance of the Kessler Psychological Distress Scale (K10) among children of Chinese rural‐to‐urban migrant workers

Abstract
Introduction
The Kessler Psychological Distress Scale (K10) is a 10‐item screening tool designed for nonspecific psychological distress. The current study aims to identify the best‐fitting factor structure of the K10 and to test the cross‐gender measurement invariance of that structure.
Methods
Using convenience sampling, we recruited 339 children of Chinese rural‐to‐urban migrant workers in Hangzhou, China; the analytic sample comprised 327 children (192 boys and 135 girls).
Results
Confirmatory factor analysis for ordered‐categorical measures revealed a two‐factor structure as the best‐fitting model, in which five items (hopeless, depressed, effort, severely depressed, and worthless) loaded on depression and the other five items (tired, nervous, severely nervous, restless, and severely restless) loaded on anxiety. The model held at every level of measurement invariance testing; that is, full measurement invariance was not rejected in our sample, suggesting that gender differences as assessed with the K10 reflect true differences. Structural invariance testing indicated that girls in our sample had significantly higher levels of depression and anxiety than boys.
Conclusion
These findings support the suitability of the K10 for gender‐comparative research among children of Chinese migrant workers. Using the K10 as a screening tool among this population should be promoted. Limitations and directions for future research are discussed.

as China (Bu et al., 2017), Japan (Sakurai et al., 2011), South Korea (Min & Lee, 2015), and India (Fernandes et al., 2011), as well as African countries such as South Africa (Andersen et al., 2011) and Tanzania (Vissoci et al., 2018). In mainland China, the first validation study on the K10 was conducted among Chinese college students, which suggested K10 had good reliability and validity (Zhou et al., 2008).
The evidence on the factor structure of the K10 is inconclusive. A unidimensional factor structure of the K10 has been reported among community samples of the United States (Kessler et al., 2002), Australia (Sunderland et al., 2012), and the Netherlands (Fassaert et al., 2009). Researchers have also uncovered various two-factor structures of the K10. First, three studies (Lace et al., 2019; O'Connor et al., 2012; Pereira et al., 2019) revealed a two-factor structure with six items loading on depression (depressed, severely depressed, worthless, effort, hopeless, and tired) and four items on anxiety (restless, severely restless, nervous, and severely nervous). Second, Sunderland et al. (2012) suggested a different two-factor structure with four items loading on depression (depressed, severely depressed, worthless, and hopeless) and six items on anxiety (restless, severely restless, nervous, severely nervous, effort, and tired). Third, Bu et al. (2017) found that five items (hopeless, depressed, effort, severely depressed, and worthless) loaded on depression and five items (tired, nervous, severely nervous, restless, and severely restless) loaded on anxiety. In addition, Brooks et al. (2006) found a second-order factor structure of the K10 among Austrian adults. The four first-order factors included negative affect, fatigue, nervousness, and agitation. As for the second-order factors, negative affect and fatigue loaded on depression and nervousness and agitation loaded on anxiety. This second-order factor structure was also reported in Zhou et al.'s study (2008) among Chinese college students. Figures of these alternative factor structures are provided in the Supporting Information of this study.
The issue of measurement invariance (MI) has been discussed by many researchers over the years (e.g., Brown, 2015; Byrne & Watkins, 2003; Meredith, 1993; Vandenberg & Lance, 2000). In general, a measure that has MI measures the same construct across groups (such as different races, genders, ages, or schools), which implies that scores on the measure can be interpreted in the same way across groups. MI of the K10 across groups therefore indicates that the K10 measures psychological distress comparably across those groups. To establish that its measurement properties are equivalent across male and female respondents, for example, researchers need to ensure that a given score on the K10 represents the same level of psychological distress in male and female respondents. Unfortunately, we did not find any studies that have systematically tested cross-gender invariance of the K10.
However, we found one study that examined the MI of the K6 (an abbreviated version of the K10) among young, middle-aged, and older adults from Canada (Drapeau et al., 2010). They reported full MI across gender among middle-aged adults and partial MI across gender among the other two age groups (Drapeau et al., 2010). Another study based in Canada demonstrated K6's MI between youth and adults and between male and female youth (Ferro, 2019). Conversely, researchers have identified measurement noninvariance across gender in the K6 item thresholds in a large sample of Australian adolescents, indicating that reporting biases may be present in the K6 items (Mewton et al., 2016).
We also found some evidence on the MI across gender using different measures. One study, for example, reported full MI across gender among Chinese college students using the Depression Anxiety Stress Scales-21 (Lu et al., 2018).
Based on our literature review, we identified two gaps to be filled in this study. First, the evidence on the factor structure of the K10 is largely inconsistent. Thus, there is a need to compare these structures for better validation of the K10. Second, based on our review, studies on the K10's MI among adolescents are not available. Therefore, the role of gender biases in using the K10 to assess psychological distress in adolescence remains unknown. These gaps confound the findings of subgroup differences in psychological distress among adolescents: it is unclear whether observed differences reflect true differences in psychological distress across genders, gender biases in the measurement instrument, or some combination of both. To bridge these gaps, this study aimed to identify the best-fitting factor structure of the K10 and investigate whether this structure is invariant across gender.
We distributed the parent consent forms through six teachers who agreed to administer the questionnaire in their classes. Students were instructed by their teachers to bring the forms back after their parents signed the consent form for their children to participate in the study.
A total of 339 students were able to bring the form back before our scheduled date for the data collection. Among the 339 parents who signed the consent form, all agreed to have their children participate in our study. On the day when we collected data in six different classrooms, students were asked to assent to participate before they filled in a paper-and-pencil survey. They were told that their participation in the study was completely voluntary, and they may withdraw at any time.
All students assented to participate. In addition to the demographic information, the questionnaire included questions on their psychological stress, school environment, and family relationships.
Our initial sample size was 339. After excluding observations with missing values on gender, the analytic sample comprised 327 students. The ratio of the number of participants to the number of variables has been proposed as a way to determine whether a sample size is adequate (Kline, 1994). Given that previous researchers have recommended a ratio of 20 to 1 as the cutoff (Hair et al., 1998), we determined that our analytic sample size was more than adequate, with a ratio of 32.7 to 1 (327 participants to 10 K10 items).

Measure
The present study focused on psychological distress. In our survey, it was measured by the Chinese version of the K10 (Bu et al., 2017), which was translated from the original K10 developed by Kessler et al. (2002) in the United States. The questionnaire starts with a prompt: "These questions concern how you have been feeling over the past 30 days. Tick a box below each question that best represents how you have [...]. Since our participants' ratings of some K10 items did not cover all five categories, we collapsed the responses into three categories by combining "a little of the time" with "some of the time" and "most of the time" with "all of the time." Also, theta parameterization was specified to allow for the residual variances of the factor indicators as parameters.
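The collapsing of response categories described above can be sketched as follows. This is only an illustration and not the authors' code: the 1 to 5 numeric coding of the five response options, the assumption that the lowest category stays on its own, and the names COLLAPSE, collapse_response, and collapse_items are all hypothetical.

```python
# Hypothetical coding (not given in the text): 1 = lowest category,
# 2 = "a little of the time", 3 = "some of the time",
# 4 = "most of the time", 5 = "all of the time".
COLLAPSE = {1: 0, 2: 1, 3: 1, 4: 2, 5: 2}


def collapse_response(raw):
    """Map one five-category response to three ordered categories.

    Missing responses (None) are passed through unchanged so that the
    estimator can handle them later.
    """
    if raw is None:
        return None
    return COLLAPSE[raw]


def collapse_items(responses):
    """Apply the recoding to a list of responses for one respondent."""
    return [collapse_response(r) for r in responses]


if __name__ == "__main__":
    print(collapse_items([1, 2, 3, 4, 5, None]))
```

With three ordered categories per item, each item carries two thresholds in the later categorical models.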

DATA ANALYSIS STRATEGIES
To identify a best-fitting model, we used CFA to compare alternative factor structures of the K10. To compare factor structures, we used the model proposed by Brooks et al. (2006) as the baseline, in which the K10 included four first-order factors and two second-order factors. Nested in the baseline model, two additional factor structures indicated in previous research were tested subsequently: a two-factor model (Bu et al., 2017) and a one-factor model (Kessler et al., 2002).
Subsequently, we used CFA for ordered-categorical measures (CFA-OCM) to test K10's cross-gender MI and structural invariance. Following recommendations in the literature (Brown, 2015), we conducted the analyses in the following order: (1) separate CFA in each group; (2) CFA with equal form (configural invariance); (3) CFA with equal factor loadings (metric or weak invariance); (4) CFA with equal indicator intercepts (scalar or strong invariance); (5) CFA with equal indicator residual variances (strict invariance). In Mplus, categorical indicator thresholds are modeled instead of intercepts, with the number of thresholds equal to the number of categories minus one (Muthén & Muthén, 2017). We also tested population heterogeneity by constraining factor variances, covariances, and means to be equal.
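For readers less familiar with CFA-OCM, the model behind these steps can be written in the usual latent-response form. The notation below is a standard textbook formulation introduced here for illustration, not notation taken from the study: i indexes respondents, j items, g the two gender groups, and C the number of response categories.

```latex
\begin{align}
y^{*}_{ijg} &= \lambda_{jg}\,\eta_{ig} + \varepsilon_{ijg},
  \qquad \operatorname{Var}(\varepsilon_{ijg}) = \theta_{jg}, \\
y_{ijg} &= c \quad \text{if} \quad \tau_{jg,c} < y^{*}_{ijg} \le \tau_{jg,c+1},
  \qquad c = 0, 1, \dots, C-1, \\
\tau_{jg,0} &= -\infty, \qquad \tau_{jg,C} = +\infty .
\end{align}
% Nested equality constraints across groups g = 1, 2:
%   metric:  \lambda_{j1} = \lambda_{j2}
%   scalar:  metric, plus \tau_{j1,c} = \tau_{j2,c} for c = 1, \dots, C-1
%   strict:  scalar, plus \theta_{j1} = \theta_{j2}
```

Here the latent response y* is cut into observed categories by the thresholds, so an item with C categories has C - 1 finite thresholds, and the factor in the first equation is the one on which item j loads.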
Measures of model fit included the χ² goodness of fit, the root-mean-square error of approximation (RMSEA), the comparative fit index (CFI), and the Tucker-Lewis index (TLI). A nonsignificant χ² goodness-of-fit test was used to define sufficient model fit. Also, as recommended in previous research (Brown, 2015; Hu & Bentler, 1999), an RMSEA value close to 0.06 or below, a standardized root-mean-square residual (SRMR) value close to 0.08 or below, and CFI and TLI values close to 0.95 or greater were used to define good fit. Finally, to compare model fit across nested models, the DIFFTEST option was used to calculate χ² differences in all Mplus models.
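As a rough illustration of how these guidelines might be applied, the sketch below checks a set of fit indices against the cutoffs named above. It treats the cutoffs as hard limits, which is a simplification of the "close to" wording; the function name and the example values are hypothetical and are not taken from Table 2 or Table 3.

```python
def meets_fit_guidelines(rmsea, cfi, tli, srmr=None):
    """Return a dict of pass/fail flags for each fit index.

    Cutoffs follow the guidelines cited in the text, applied strictly:
    RMSEA <= 0.06, SRMR <= 0.08, CFI >= 0.95, TLI >= 0.95.
    SRMR is optional because it is not always reported.
    """
    checks = {
        "RMSEA": rmsea <= 0.06,
        "CFI": cfi >= 0.95,
        "TLI": tli >= 0.95,
    }
    if srmr is not None:
        checks["SRMR"] = srmr <= 0.08
    return checks


if __name__ == "__main__":
    # Made-up values for illustration only.
    result = meets_fit_guidelines(rmsea=0.04, cfi=0.98, tli=0.97, srmr=0.05)
    print(result, "good fit" if all(result.values()) else "review fit")
```

In practice these indices are read together with the χ² test rather than as independent pass/fail rules.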
Six K10 items had one missing observation each (items 1, 4, 7, 8, 9, and 10), and item 3 had two missing observations. Missingness on the K10 was handled by the WLSMV estimator using pairwise deletion (WLSMV-PD) in Mplus. Research has shown that WLSMV-PD produces unbiased model estimates and is far more efficient than the WLSMV estimator using listwise deletion (Asparouhov & Muthén, 2010).
Though this method has been criticized for inflating type I error, this inflation is only prominent when the rate of missing data is high (Chen et al., 2020).

RESULTS
The mean age of the total sample was 13.39 years (SD = 1.02; range = 11-16). There was no significant difference in age between male and female adolescents (p = .58). Students were almost equally distributed across grade levels: 36% were sixth graders, 33% seventh graders, and 31% eighth graders. Gender was not significantly associated with grade level: χ²(2) = 0.48, p = .79. We conducted t-tests to examine the differences in K10 item scores across gender. As shown in Table 1, female adolescents were more likely to feel nervous (t = 2.21, p = .01), hopeless (t = 1.84, p = .03), depressed (t = 2.54, p = .006), severely depressed (t = 1.91, p = .03), and worthless (t = 1.71, p = .04) than their male counterparts. However, there were no differences in students' reports on the other five items: tired, severely nervous, restless, severely restless, and effort.

CFA model comparison
To determine the best-fitting model, we compared three CFA models: the second-order model identified by Brooks et al. (2006), the two-factor model validated by Bu et al. (2017), and the unidimensional one-factor model reported by Kessler et al. (2002). As shown in Table 2, all three models fit the data well. The two-factor model did not significantly worsen the fit of the second-order model (Δχ² = 1.60, df = 4, p = .81). The unidimensional model, on the other hand, significantly worsened the fit of the two-factor model (Δχ² = 8.10, df = 1, p = .004). We decided to retain the two-factor model for further analyses because it was the best-fitting model with the fewest parameters. This model is also consistent with substantive theory, which suggests that common anxiety symptoms include agitation and nervousness and common depressive symptoms include negative affect. Previous research has indicated that fatigue is associated with both anxiety and depression disorders (Sharpe & Wilks, 2002). Our model captured fatigue on both anxiety and depression, with the item of tired loading on anxiety and the item of effort loading on depression.
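The p values for the two model comparisons can be recovered from the reported test statistics and degrees of freedom. The sketch below is a standard-library implementation of the upper-tail probability of a chi-square distribution for integer degrees of freedom; the function name chi2_sf is hypothetical. It only converts a reported statistic into a p value and does not reproduce the Mplus DIFFTEST computation itself, which is not a simple subtraction of two χ² values.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with
    integer degrees of freedom df, using closed-form series."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = 1.0
        total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail plus a finite correction series.
    root = math.sqrt(x)
    tail = math.erfc(root / math.sqrt(2.0))
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)
    term = root
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return tail + 2.0 * density * total


if __name__ == "__main__":
    # Statistics reported for the two nested comparisons in the text.
    print(round(chi2_sf(1.60, 4), 2))
    print(round(chi2_sf(8.10, 1), 3))
```

The same function can be applied to the DIFFTEST statistics reported for the invariance models.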

Measurement invariance
To examine cross-gender MI, we first tested the baseline model in both gender groups concurrently (Model 1). This equal form model is also referred to as the configural invariance model in the literature. The factor variances were fixed to 1 and the factor means were fixed to 0 in both groups. The residual variances were all constrained to 1 in both groups. All item factor loadings (one per item) and thresholds (two per item given three response options) were estimated. This model showed excellent fit (Table 3), suggesting that configural invariance was supported.
In subsequent models, we proceeded by applying more stringent parameter constraints to examine potential decreases in fit resulting from measurement or structural noninvariance between boys and girls, with boys as the reference group. Figure 1 explains how we constrained different parameters in four steps (i.e., models). In addition, model comparison statistics, as well as model fit information for all these models, are presented in Table 3.

FIGURE 1
Model parameters constrained in testing the MI of the K10. Note. Parameters l1-l10 stand for factor loadings, which were constrained to be equal across the two groups in Model 2 (metric invariance); t1a and t1b through t10a and t10b stand for item thresholds (two thresholds for each item), which were constrained to be equal across groups in Model 3 (scalar invariance); r1-r10 stand for item residual variances, which were fixed to 1 for both groups in Model 4 (strict invariance).
Equality of the unstandardized item factor loadings between groups was examined in a metric invariance model (Model 2). The factor variances were fixed to 1 in boys for identification but were freely estimated in girls; the factor means were fixed to 0 in both groups for identification. All factor loadings were constrained equal across groups, all item thresholds were estimated, and all residual variances were constrained to 1 across groups. Model 2 did not fit significantly worse than Model 1: DIFFTEST (8) = 7.72, p = .46. Metric invariance was thus supported, indicating that the same latent factors (i.e., depression and anxiety) were being measured in each gender group by the K10.
Equality of the unstandardized item thresholds across groups was examined in a scalar invariance model (Model 3). The factor variances and means were fixed to 1 and 0, respectively, in boys for identification, but they were freely estimated for girls. All factor loadings and item thresholds were constrained equal across groups; all residual variances were constrained equal to 1 in both groups. Model 3 did not significantly worsen the fit of Model 2: DIFFTEST (18) = 14.66, p = .69.
Thus, scalar invariance was supported, suggesting that the observed difference in the proportion of responses in each category for all K10 items was due to factor mean differences only.
Next, we tested the invariance of the unstandardized residual variances across groups (i.e., strict invariance). The model comparison at this step proceeded backward. In other words, a model (Model 4A) with all residual variances freely estimated in girls was estimated first.
It was then compared with a model in which all residual variances were fixed to 1 in girls (Model 4B). The residual variances in boys were all fixed to 1 for identification in both models, and the rest of the model parameters were estimated as described in Model 3. Model 4B did not fit significantly worse than Model 4A: DIFFTEST (10) = 9.40, p = .49. Thus, strict invariance was supported, indicating that the amount of item variance not accounted for by the two factors was the same in all K10 items across groups.
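The degrees of freedom of the three DIFFTEST comparisons follow from counting the parameters constrained and freed at each step. The sketch below is one reading of the model descriptions above, not output from Mplus; the constant and function names are hypothetical.

```python
N_ITEMS = 10               # K10 items
THRESHOLDS_PER_ITEM = 2    # three response categories minus one
N_FACTORS = 2              # depression and anxiety


def df_difference(newly_constrained, newly_freed):
    """Net change in degrees of freedom between two nested models."""
    return newly_constrained - newly_freed


# Metric vs. configural: 10 loadings set equal across groups, while the
# 2 factor variances are freed in girls.
metric_df = df_difference(N_ITEMS, N_FACTORS)

# Scalar vs. metric: 20 thresholds set equal across groups, while the
# 2 factor means are freed in girls.
scalar_df = df_difference(N_ITEMS * THRESHOLDS_PER_ITEM, N_FACTORS)

# Strict (Model 4B) vs. Model 4A: 10 residual variances fixed in girls,
# nothing freed.
strict_df = df_difference(N_ITEMS, 0)

if __name__ == "__main__":
    print(metric_df, scalar_df, strict_df)
```

These counts agree with the degrees of freedom reported for Models 2, 3, and 4B (8, 18, and 10).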

Structural invariance
Based on the strict invariance model (i.e., Model 4B), we tested an addi- These results indicated that, on average, girls experienced higher levels of depression and anxiety than boys, whose factor means were 0. We also used t-tests to examine the mean difference in anxiety and depression between girls and boys based on the composite score of each subscale. For anxiety, the mean difference was 0.07 (t = 2.07, p = .04, Cohen's d = 0.23); for depression, the difference was 0.08 (t = 2.28, p = .02, Cohen's d = 0.26). Thus, it was corroborated that girls displayed higher levels of psychological distress than boys, though the effect size was relatively small.
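The reported effect sizes are consistent with the usual conversion from an independent-samples t statistic to Cohen's d. The sketch below assumes a pooled-variance t-test and the group sizes given in the abstract (192 boys and 135 girls); both are assumptions made for illustration, and the function name is hypothetical.

```python
import math


def cohens_d_from_t(t, n1, n2):
    """Cohen's d implied by a pooled-variance independent-samples t
    statistic: d = t * sqrt(1/n1 + 1/n2)."""
    return t * math.sqrt(1.0 / n1 + 1.0 / n2)


if __name__ == "__main__":
    n_boys, n_girls = 192, 135  # group sizes taken from the abstract
    print(round(cohens_d_from_t(2.07, n_boys, n_girls), 2))  # anxiety
    print(round(cohens_d_from_t(2.28, n_boys, n_girls), 2))  # depression
```

Under these assumptions the conversion gives values that round to the reported 0.23 and 0.26, which are small effects by conventional benchmarks.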

DISCUSSION
Based on a convenience sample of children of Chinese internal migrants living in Hangzhou, China, this study had two aims: (1) to identify the best-fitting factor structure of the K10, and (2) to examine the cross-gender MI of the K10 using the best-fitting model. Our findings indicated that the two-factor model validated by Bu et al. (2017) fit our data best, and this model was retained for further data analysis in our study.
This best-fitting model includes anxiety and depression as two factors, with five items loading on each factor. Specifically, depression includes items of hopeless, depressed, effort, severely depressed, and worthless, whereas anxiety includes items of tired, nervous, severely nervous, restless, and severely restless.
In terms of the testing of MI for the K10, our analyses showed that full MI of the K10 was obtained across boys and girls, that is, the relationships of the K10 items to the latent factors of depression and anxiety were equivalent between boys and girls. Specifically, boys and girls in our sample appear to use the same conceptual framework, as assessed with the K10, in their perception of psychological distress (equal form); the items of the K10 scale have similar meanings for boys and girls (equal factor loadings); the scaling of the K10 items is similar for the compared groups in our sample (equal item thresholds); and the K10 items demonstrate similar amounts of unique variance across boys and girls (equal item residual variances). However, structural invariance was not obtained in full, such that girls experienced higher levels of depression and anxiety than boys. Taken together, our findings suggested that gender differences in anxiety and depression among Chinese migrant children, as measured by the K10, can be construed as meaningful and true, and not due to measurement error or gender biases associated with the use of the K10.
Our findings of invariance are encouraging because MI is a prerequisite for any subgroup comparison using the K10. Our study also highlights the necessity of establishing MI when comparing subgroups. Future researchers would be prudent to examine MI before conducting subgroup analyses. This is an important lesson considering that ample research has studied gender differences in anxiety and depression symptomatologies, but few studies have established MI as a precondition (e.g., Lai, 2011; Parker et al., 2014; Schuch et al., 2014; Silverstein et al., 2013). Our findings of invariance lend some support to the validity of these comparative studies, but only to a limited extent, because the measures used in those studies are not necessarily the same as the one used in ours.
Although previous research on the MI of psychological distress is rare, existing evidence reveals MI of depressive symptoms (as measured by the Children's Depression Inventory) across male and female adolescents (Carle et al., 2008). Also, a few studies have provided evidence on the invariance (or partial invariance) of the K6 (a shorter version of the K10) across genders and different age groups (Drapeau et al., 2010; Ferro, 2019; Mewton et al., 2016). To better illuminate the role of gender biases in the measurement of psychological distress, future researchers may need to consider testing the MI of the K10 across age groups (especially between children and adults). In addition, previous research (e.g., Bender et al., 2012; Derdikman-Eiron et al., 2011; Negriff & Susman, 2011) has repeatedly reported that girls experience higher levels of psychological distress than boys. The higher factor means of anxiety and depression among girls in our study are consistent with this observation.
Both factors validated in our study are common mental illnesses familiar to mental health clinicians. Five K10 items fit the common characterization of anxiety (including two items of agitation, two items of nervousness, and one item of fatigue) and the other five items fit that of depression (including four items of negative affect and one item of fatigue). Probably due to the inclusion of these commonly understood mental health symptomatologies, K10 has been widely used by mental health professionals as a screening tool for common mental health problems in both practice and research (e.g., Cornelius et al., 2013; Furukawa et al., 2003; Kessler, Berglund, et al., 2003; Thelin et al., 2017). Berle et al. (2010) have argued that it makes more sense to understand the K10 as a specific measure of anxiety and depression, despite Kessler et al.'s (2002) intent to use the K10 to measure "nonspecific" psychological distress. They contended that a complete model of psychological distress should encompass other aspects such as somatic complaints and even psychotic symptoms, which are not assessed with the K10 (Berle et al., 2010). However, this argument does not preclude the use of the K10 as an effective tool for common mental health problems such as anxiety and depression.
By demonstrating MI of the K10, the present study supports the notion that girls experience higher levels of anxiety and depression than boys. More importantly, the differences are not due to measurement variability, thus reflecting the true differences between genders. Logically, the next step in mental health practice involves eliminating gender disparities by reducing anxiety and depression symptomatologies among girls. Interventions are often closely monitored to achieve the best outcomes with clients. Our findings indicate that the K10 could be used to evaluate the effectiveness of such interventions.

Implications for practice
In mainland China, where school-based mental health interventions are not always readily available, it is more practical to consider using the K10 as a screening tool among school-aged children. The brevity and validity of the K10 make it well suited to regular, widespread use as a mental health assessment tool. We believe that middle school students are worthy of more attention compared to other school-aged children because research has shown that mental health problems developed in childhood can amplify in early adolescence or the middle school years (Stormshak et al., 2011) and that gender differences in depression symptomatologies often emerge reliably around age 15 (Carle et al., 2008). For Chinese mental health professionals, it may be more practical to assess students with migrant parents, considering their relatively vulnerable status. Based on our findings, female children of migrants are at a higher risk than their male counterparts.
Therefore, the assessment of female students should also be prioritized.

Limitations and strengths
The current study has several limitations. First, our sample was lim- were not included in our study. We also left out nonmigrant children who were born and lived with their parents in rural or urban areas.
Future research could tap into the experiences of psychological distress among these populations and examine MI of the K10 across left-behind children and non-left-behind children and across migrant and nonmigrant children. Third, our study did not test gender differences in the K10's criterion validity (e.g., its ability to predict some psychiatric disorders). Fourth, the current study was based on Classical Test Theory (CTT), and future researchers may wish to examine the K10's invariance based on Item Response Theory (IRT), which can provide additional information, such as item difficulty, discriminative ability, and differential item functioning (Ye et al., 2018).
Nevertheless, this study adds to the K10 literature in several ways. It provides a confirmatory test of the factor structure of the K10 and identifies a best-fitting factor structure among a sample of middle school students. By conducting MI analyses, it explicitly examines whether the subgroup difference in psychological distress, as assessed with the K10, is a result of measurement bias. Finally, it uses a community sample of children of rural-to-urban migrants in China, who are considered as a vulnerable population, not only due to their lower socioeconomic status but also due to the discrimination associated with their migrant status. These conditions may predispose migrant children to experience higher levels of psychological distress.
Based on our results, it is our sincere hope that Chinese schools with migrant children will provide more screening services using the K10.

CONCLUSIONS
This study confirms that the two-factor structure (five items on depression and five on anxiety) is the best-fitting model for the K10 among Chinese migrant children. It also supports full MI across gender, suggesting that gender differences observed with the K10 in this population are unlikely to result from measurement error or gender biases associated with the use of the K10.
These findings provide important implications for future research and practice.

ACKNOWLEDGMENTS
This study was funded by Zhejiang Academy of Social Sciences (Grant #: 14JDDF01YB).

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.

PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1002/brb3.2417