Reliability and structural validity of the Norwegian version of the TeamSTEPPS Teamwork Attitudes Questionnaire: A cross‐sectional study among Bachelor of Nursing students

Abstract Aim To test the reliability and structural validity of the Norwegian version of the TeamSTEPPS® Teamwork Attitudes Questionnaire (T-TAQ) among Bachelor of Nursing students. Design Cross-sectional study. Methods Bachelor of Nursing students (N = 1,624) at three campuses in different regions of Norway were invited to complete the survey. The data were collected in September 2018 and May-June 2019 and analysed with descriptive statistics, Cronbach's alpha and confirmatory factor analysis (CFA). Three models were tested. Model 3 was a post hoc modification with a correlation between four negatively worded items. Results A total of 509 students were included in the study. Cronbach's alpha ranged from 0.44-0.70 for the dimensions and was 0.79 for the total questionnaire. The fit indexes of Model 3 were as follows: RMSEA = 0.043, chi-square = 724.3 (p < .001), normed chi-square = 1.862, TLI = 0.812 and CFI = 0.832. The questionnaire shows some potential to display attitudes towards teamwork in health care among Bachelor of Nursing students. The low Cronbach's alpha in the dimensions might indicate that the questionnaire should be considered for use as a unidimensional questionnaire.

programme. Changes in attitudes are a frequently used measure of learning outcomes in team training (LaMothe et al., 2016; Reeves et al., 2016; Sweigart et al., 2016; Vertino, 2014); thus, high validity and reliability are essential for questionnaires measuring changes in attitudes (Polit & Yang, 2016).

| BACKGROUND
Team Strategies and Tools to Enhance Performance and Patient Safety (TeamSTEPPS®) is a team training programme based on more than 20 years of research examining elements that are essential for providing effective and safe health care, including the principles of sustainable implementation (King et al., 2008; Salas et al., 2018). The programme, developed by the Agency for Healthcare Research and Quality (AHRQ), consists of lectures, reinforcement in simulation-based scenarios, low-fidelity training and role play, feedback and reflection in clinical settings (AHRQ, 2012; Chen et al., 2019). The TeamSTEPPS® team training programme has been used in various healthcare educational settings, such as in nursing education (Gaston, 2018; Goliat et al., 2013; Maguire et al., 2015; Robinson et al., 2018) and in interprofessional educational settings (Chen et al., 2019; Welsch et al., 2018). Previous research has shown positive outcomes of the TeamSTEPPS® team training programme, including reduced patient complications and mortality (Forse et al., 2011) and a reduced risk of falls (Spiva et al., 2014). Positive organizational outcomes include an increase in effective patient treatment (Capella et al., 2010) and improved patient safety culture (Aaberg et al., 2019). Learning outcomes show a positive change in attitudes towards teamwork among students (Maguire et al., 2015; Sweigart et al., 2016) and healthcare professionals (Vertino, 2014; Wadsworth, 2019) after the implementation of TeamSTEPPS®. Participants also seem to enjoy attending the team training programme (Thomas & Galla, 2013; Welsch et al., 2018). These outcomes motivated the research team to design a study to implement TeamSTEPPS® in Bachelor of Nursing education. To our knowledge, no Bachelor of Nursing programme in Europe has implemented the TeamSTEPPS® team training programme.
Methods used to measure attitudes can provide useful information regarding the perception of teamwork behaviour (Frager, 2014; Manser, 2009). According to Ajzen (1991), intentions to perform behaviours can be predicted by attitudes towards the behaviour, subjective norms and perceived behavioural control. These intentions, in turn, account for considerable variance in actual behaviour (Ajzen, 1991). The content of the T-TAQ was developed based on extensive research on essential teamwork attributes. According to Baker et al. (2010), the TeamSTEPPS® Teamwork Attitudes Questionnaire (T-TAQ) was designed to measure attitudes towards the core components of teamwork aligned with the TeamSTEPPS® team training programme. Data from the questionnaire can be used to assess changes in participants' attitudes towards teamwork as a result of training, as attitudes are an aspect of learning. The questionnaire may also support quality improvement activities associated with teamwork. The T-TAQ is the most frequently used instrument to measure changes in attitude following intervention with the TeamSTEPPS® programme in interprofessional education settings (Welsch et al., 2018). The Norwegian version of the T-TAQ has been validated in a population of healthcare professionals.
Previous studies have used the T-TAQ to evaluate team training with interprofessional students (Chen et al., 2019; Welsch et al., 2018), nursing students (Gaston, 2018; Godin et al., 2017; LaMothe et al., 2016; Maguire et al., 2015) and healthcare professionals (Grapensteter, 2017; Vertino, 2014). Bachelor's students are a different population from experienced healthcare professionals with respect to knowledge, teamwork and healthcare experience. Therefore, it was essential to validate the questionnaire among Bachelor of Nursing students, as they were the population of interest in this project. According to Wooding et al. (2019), questionnaires should not be reused without consideration of the population studied. Structural validity should be reassessed to obtain valid and reliable results in a new target population (Polit & Yang, 2016).
Previous T-TAQ studies in nursing education have been conducted with relatively small samples (N = 7-182) (Gaston, 2018; Goliat et al., 2013; LaMothe et al., 2016; Maguire et al., 2015), which makes it challenging to conduct powerful studies of the validity and reliability of a questionnaire (Polit & Yang, 2016). At this point, we have not found any studies examining the reliability and validity of the T-TAQ within a population of Bachelor of Nursing students.

| Aim of the study
This study aimed to test the reliability and structural validity of the Norwegian version of the T-TAQ among Bachelor of Nursing students.

| Setting and sample
The study was conducted at a Norwegian university, which offers a Bachelor of Nursing programme at three campuses in three different regions. All students (N = 1,624) were invited to participate; 408 were first-year students, 532 were second-year students and 684 were third-year students. According to Polit and Yang (2016), an estimated minimum sample size of ten individuals per item on the questionnaire is necessary for confirmatory factor analysis (CFA), but a larger sample is desirable.

| The questionnaire
The T-TAQ was designed to evaluate the TeamSTEPPS® team training programme. The questionnaire comprises 30 items in five dimensions (Table 2).
The questionnaire was cross-culturally translated as recommended (cf. Brislin, 1970), and some semantic and conceptual changes were made after a pilot test. The analysis showed Cronbach's alpha values from 0.53-0.76, a normed chi-square of 1.896, an RMSEA of 0.061, a TLI of 0.773 and a CFI of 0.794. The respondents score each item on a five-point Likert scale to indicate their level of agreement with the statement, from strongly disagree (1) to strongly agree (5). Central teamwork constructs were explained on the first page of the questionnaire. The students were asked to complete background data on sex, age, study progression, campus, former higher education and work experience in health care.

| Face validity
We invited a convenience sample of final-year Bachelor of Nursing students (N = 40) who did not participate in the main study to take part in an email pilot survey to evaluate the face validity of the T-TAQ. The students were asked to respond to each item and to answer additional questions about the extent to which they perceived the items as clear and understandable and how easy it was to choose an option on the Likert scale. The respondents had the opportunity to suggest improvements to the questionnaire. Based on the responses (N = 10), we added supplementary information to items 13 and 14.

| Data collection
The data collection took place in September 2018 and May-June 2019. A paper version of the T-TAQ (paper survey) was distributed to first-year students (N = 408) who were present during a class. The survey took place after their first clinical placement. The students who wanted to participate answered the survey and returned the questionnaire as they left the class.
Because second- and third-year students in clinical placements were spread over a large geographic area, an electronic survey was administered by email to these students (N = 1,216). For the students who accepted the invitation, a hyperlink directed them to the questionnaire. Reminders were sent after 3 and 7 days.

| Analysis
The statistical software IBM SPSS version 26 (2019) and SPSS AMOS version 25 were used to analyse the data. Before the analysis, the scores of the four negatively worded items were reversed.
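The reversal step can be illustrated with a short sketch. It assumes the five-point scale described for the questionnaire (1 to 5) and uses the item codes that appear later in this section for the four negatively worded items (MS20, MS21, MS24 and C30); the dictionary layout and the function names are hypothetical and are not taken from the SPSS workflow used in the study.

```python
# Item codes as named in the analysis section; the data layout is an assumption.
REVERSED_ITEMS = {"MS20", "MS21", "MS24", "C30"}
SCALE_MIN, SCALE_MAX = 1, 5  # five-point Likert scale


def reverse_score(score: int) -> int:
    """Mirror a Likert score around the scale midpoint (1<->5, 2<->4, 3 stays 3)."""
    if not SCALE_MIN <= score <= SCALE_MAX:
        raise ValueError(f"score {score} is outside {SCALE_MIN}-{SCALE_MAX}")
    return SCALE_MIN + SCALE_MAX - score


def recode_response(response: dict) -> dict:
    """Return a copy of one respondent's answers with negatively worded items reversed."""
    return {
        item: reverse_score(value) if item in REVERSED_ITEMS else value
        for item, value in response.items()
    }


# "ITEM1" is a hypothetical code for a regularly worded item.
print(recode_response({"ITEM1": 4, "MS20": 2, "C30": 5}))
```

After recoding, a high score on every item points in the same direction, which is what the internal consistency and CFA analyses require.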
Descriptive statistics were used to analyse the background data, teamwork dimensions and items. Cronbach's alpha was used to calculate internal consistency; a value above 0.70 was considered acceptable (Polit & Yang, 2016; Tavakol & Dennick, 2011).
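As a reminder of what the coefficient measures, the sketch below computes Cronbach's alpha from its usual variance form, alpha = k/(k - 1) × (1 - sum of item variances / variance of the total score). It is a minimal illustration on made-up toy data, not the SPSS procedure used in the study; the function name and the data are assumptions.

```python
from statistics import variance


def cronbach_alpha(rows: list) -> float:
    """Cronbach's alpha for complete cases.

    rows: one list of item scores per respondent, all of equal length.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data: three respondents, two items (illustrative only).
toy = [[1, 1], [2, 3], [3, 2]]
print(round(cronbach_alpha(toy), 3))  # 0.667
```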
We examined the data for missing item responses before the CFA. The analysis of missing data led to listwise deletion of 32 respondents, so the CFA was conducted with a sample of 477. A rule of thumb is a sample size of at least 10 individuals per item for the analysis (Polit & Yang, 2016).
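The two steps in this paragraph, listwise deletion and the ten-per-item check, can be sketched as follows. The sketch assumes missing answers are stored as `None` and uses the 30-item length of the questionnaire stated in the discussion; the function names are hypothetical.

```python
N_ITEMS = 30          # number of T-TAQ items, as stated in the discussion
MIN_PER_ITEM = 10     # rule of thumb cited from Polit and Yang (2016)


def listwise_delete(rows: list) -> list:
    """Keep only respondents with no missing item response (None marks missing)."""
    return [row for row in rows if all(value is not None for value in row)]


def meets_rule_of_thumb(n_complete: int, n_items: int = N_ITEMS,
                        per_item: int = MIN_PER_ITEM) -> bool:
    """True if the complete-case sample reaches n_items * per_item respondents."""
    return n_complete >= n_items * per_item


# 509 respondents minus 32 with missing item responses leaves 477,
# which exceeds the 30 * 10 = 300 suggested by the rule of thumb.
print(509 - 32, meets_rule_of_thumb(509 - 32))  # 477 True
```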
A CFA makes it possible to test how well each item measures the dimension that it is supposed to measure and whether the items explain the variance in the latent dimensions (Brown & Moore, 2012).
The structure of the Norwegian version of the questionnaire is based on the original instrument developed by Baker et al. (2010) and hypothesizes that the variance in the responses to the items reflects the variance in the latent dimensions on which the manifest items are loaded (Brown, 2006; Polit & Yang, 2016). The regression coefficient between the first variable and the latent construct in each dimension was fixed to 1, and the unstandardized regression coefficients from the error terms to the measured variables were also fixed to 1 (Polit & Yang, 2016). The error (e) variance for each item indicates the reliability of the observed variables and is influenced by the random measurement error (Byrne, 2010).
We tested the goodness-of-fit of three models. Model 1 was based on the unmodified T-TAQ questionnaire structure and Model 2 tested the same model with the sample randomly split in half to examine the stability of the results in Model 1 (Schreiber et al., 2006).
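A random split of the sample, as used for Model 2, could look like the sketch below. The seed, the use of respondent indices and the function name are assumptions for illustration; the text does not describe how the split was generated in SPSS.

```python
import random


def split_half(respondent_ids, seed: int = 0):
    """Shuffle the respondent ids reproducibly and cut the list in two halves."""
    ids = list(respondent_ids)
    random.Random(seed).shuffle(ids)  # local generator, global state untouched
    mid = len(ids) // 2
    return ids[:mid], ids[mid:]


half_a, half_b = split_half(range(477))
print(len(half_a), len(half_b))  # 238 239
```

Fitting the same model to one or both halves and comparing the fit indexes with those from the full sample gives a rough check on how stable the Model 1 results are.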
Model 3 calculated the model fit with a post hoc modification. We wanted to test whether an intercorrelation between error variances among the four negatively worded items (MS20, MS21, MS24 and C30) could result in a better model fit. This was based on the poor factor loadings of these items and a hypothesis of intercorrelation arising from their shared reverse wording.
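Freeing error covariances changes the degrees of freedom of the model, which in turn feeds into the fit indexes. The sketch below counts degrees of freedom for a simple-structure CFA under stated assumptions: 30 items and five correlated dimensions, each item loading on one dimension, one loading per dimension fixed to 1, and no mean structure. The degrees of freedom are not reported in the text, and the count of six freed covariances assumes that every pair among the four items was allowed to correlate.

```python
def cfa_degrees_of_freedom(n_items: int, n_factors: int,
                           n_error_covariances: int = 0) -> int:
    """Degrees of freedom for a simple-structure CFA with correlated factors."""
    moments = n_items * (n_items + 1) // 2            # unique variances and covariances
    free_loadings = n_items - n_factors               # one loading per factor fixed to 1
    error_variances = n_items
    factor_variances = n_factors
    factor_covariances = n_factors * (n_factors - 1) // 2
    free_parameters = (free_loadings + error_variances + factor_variances
                       + factor_covariances + n_error_covariances)
    return moments - free_parameters


print(cfa_degrees_of_freedom(30, 5))      # 395, unmodified structure
print(cfa_degrees_of_freedom(30, 5, 6))   # 389, six error covariances freed
```

Under these assumptions the modified model has 389 degrees of freedom, which agrees with the ratio of the chi-square (724.3) to the normed chi-square (1.862) reported for Model 3 in the abstract.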
The model fit was estimated with four recommended fit indexes in all three models (Polit & Yang, 2016; Schreiber et al., 2006). Absolute fit indexes indicate how well the T-TAQ model fitted the data and were calculated with the chi-square, normed chi-square and root mean square error of approximation (RMSEA). The chi-square statistic should be nonsignificant with a p-value > .05.
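The normed chi-square and the RMSEA both follow directly from the chi-square statistic. The sketch below assumes the common RMSEA definition with N - 1 in the denominator and a degrees-of-freedom value of 389, which is not reported in the text but is implied by the Model 3 chi-square (724.3) and normed chi-square (1.862) given in the abstract. The function names are hypothetical.

```python
from math import sqrt


def normed_chi_square(chi2: float, df: int) -> float:
    """Chi-square divided by its degrees of freedom."""
    return chi2 / df


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation (point estimate)."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Model 3 values from the abstract; df = 389 is an inferred assumption.
chi2, df, n = 724.3, 389, 477
print(round(normed_chi_square(chi2, df), 3))  # 1.862
print(round(rmsea(chi2, df, n), 3))           # 0.043
```

Both results match the values reported for Model 3, which suggests the assumed degrees of freedom and sample size are consistent with the analysis.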
Comparative fit indexes compare the model with a null model where all of the variables are uncorrelated (Polit & Yang, 2016). These indexes were calculated with the comparative fit index (CFI) and the Tucker-Lewis fit index (TLI). The CFI and TLI should have values close to 1.0, and threshold values are ≥0.95 (Hu & Bentler, 1999; Polit & Yang, 2016).
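For reference, the two comparative indexes are commonly defined as follows. The notation is an assumption introduced here: subscript M denotes the tested model and subscript B the null (baseline) model in which all variables are uncorrelated. The baseline chi-square is not reported in the text, so no numerical check is attempted.

```latex
\mathrm{CFI} = 1 - \frac{\max\left(\chi^2_M - df_M,\ 0\right)}
                        {\max\left(\chi^2_M - df_M,\ \chi^2_B - df_B,\ 0\right)}
\qquad
\mathrm{TLI} = \frac{\chi^2_B / df_B - \chi^2_M / df_M}
                    {\chi^2_B / df_B - 1}
```

Both indexes approach 1.0 as the tested model's chi-square per degree of freedom falls far below that of the null model.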
As a part of the CFA, correlations between the latent dimensions were analysed. Since all dimensions address aspects of teamwork, a positive correlation between the latent dimensions was hypothesized (Polit & Yang, 2016).

| Ethics
The study was conducted according to the Helsinki Declaration for ethical principles of research (WMA, 2013). The study was approved by the Norwegian Social Science Data Service (NSD ID: 738592) and by the university involved. The invited students obtained written information about the aim of the study and were informed that responding to the questionnaire was voluntary and had no consequences for their educational progression. Returning the questionnaire was considered to indicate consent to participate in the study.

| RESULTS
A total of 509 students answered the questionnaire (31.3%). The email survey had a response rate of 15.3% and the paper survey had a response rate of 76.2%. The sample characteristics are displayed in Table 1. In short, 61.1% of the respondents were first-year students, 84.1% were female, the median age was 22 years with a range from 18-55 years and 75.2% had work experience in health care. Table 2 shows the mean scores and the standard deviations of the T-TAQ total scale, the five dimensions and the individual items.
Cronbach's alpha coefficient for the total questionnaire was 0.79, and the coefficients for each dimension varied from 0.44-0.70, as shown in Table 3. Table 4 shows the fit indexes for the three models.
Model 1 had a significant chi-square value. The normed chi-square was 2.24. The RMSEA was 0.051, and the TLI and CFI were

| DISCUSSION
This study aimed to test the reliability and structural validity of the Norwegian version of the T-TAQ among Bachelor of Nursing students. Cronbach's alpha indicated that the reliability of the total questionnaire was acceptable, although Cronbach's alpha within dimensions ranged from 0.44-0.70. The analysis of goodness-of-fit indexes showed acceptable values in two absolute fit indexes (RMSEA, normed chi-square) and below-threshold values for the comparative fit indexes (CFI, TLI) and the chi-square index.

| Reliability
The total questionnaire showed acceptable internal consistency, with a Cronbach's alpha of 0.79. The questionnaire has 30 items, and Cronbach's alpha tends to increase with a higher number of items (Tavakol & Dennick, 2011).
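The dependence of alpha on the number of items can be illustrated with the standardized form of the coefficient, alpha = k·r / (1 + (k - 1)·r), where k is the number of items and r the mean inter-item correlation. The value r = 0.11 below is a hypothetical figure chosen for illustration, not an estimate from the study data, and the six-item case stands in for a single subscale.

```python
def standardized_alpha(k: int, mean_r: float) -> float:
    """Standardized Cronbach's alpha from item count and mean inter-item correlation."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# Same hypothetical mean inter-item correlation, different scale lengths.
for k in (6, 30):
    print(k, round(standardized_alpha(k, 0.11), 2))
# 6 0.43
# 30 0.79
```

With the same modest inter-item correlation, a 30-item scale reaches an alpha near 0.79 while a six-item subscale stays well below 0.70, which is one reason a total-scale alpha can look acceptable when the dimension-level values do not.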

| Validity
The RMSEA values were acceptable and indicated a good fit, as the values were below the threshold value and had narrow confidence intervals (Byrne, 2010). This index is considered one of the most informative fit indexes and is widely used to measure how well the correlations of the theoretical model match the observed correlations (Byrne, 2010; Meyers et al., 2016). The RMSEA may be vulnerable with a small sample size (Hu & Bentler, 1999), but the sample size in this study (N = 477) is considered acceptable to calculate a valid RMSEA. The number of participants needed is not an exact rule, but ten individuals per estimated item seems to be the consensus (Polit & Yang, 2016; Schreiber et al., 2006). The sample size in our study was equivalent to 70% of the typical sample size in structural equation modelling (SEM) studies in nursing research (Sharif et al., 2018).
A perfect fit for a model would be indicated by a nonsignificant chi-square value (Polit & Yang, 2016). However, for most empirical SEM studies, this has been proven to be unrealistic (Byrne, 2010).
The chi-square test is highly sensitive to sample size, a high correlation between the dimensions in the questionnaire and error variance in the model (Kline, 2011). Thus, other fit indexes often receive more attention (Mishra, 2016; Polit & Yang, 2016).
We considered the normed chi-square acceptable with a value <3 in all three models. There is no consensus regarding whether the cut-off value should be below 2 or 3 (Polit & Yang, 2016; Schreiber et al., 2006). The normed chi-square in our study was <2 in two out of three models. The goodness-of-fit indexes improved from Model 1 to Model 3 (Polit & Yang, 2016; Schreiber et al., 2006).
The comparative fit indexes (TLI and CFI) were below the threshold values, but these thresholds are, to some degree, considered too strict, especially for complex models (Marsh et al., 2004). The CFI compares the targeted model with a model that has no correlation between the variables, which is unlikely in most models (Rigdon, 1996). Rigdon (1996) claims that the CFI is more suited for explorative factor analyses and small samples, whereas the RMSEA is more suited for confirmatory, large-sample cases, as in our study. Absolute fit indexes and comparative fit indexes represent the data from different perspectives. A model with inconsistent indexes may be neither "good" nor "bad" but may have limitations, and the results must be interpreted with this in mind (Lai & Green, 2016).
The structural validity of a model demonstrates whether the model measures what it is described to measure and is indicated by the factor loading and associated error variances (Byrne, 2010).
Twenty-five out of 30 items loaded on the targeted latent dimensions with a factor loading above 0.30, which should be considered acceptable, according to Kääriäinen et al. (2011). Situation monitoring shows a factor loading for all items >0.40 and reveals the highest internal consistency. The mutual support dimension has three items with acceptable factor loading and three with low factor loading and shows a low Cronbach's alpha. Negatively worded items loading on the mutual support and communication dimensions may explain why not all fit indexes are within threshold values in this model (Fan & Sivo, 2005). The items with low factor loadings showed similarly high error variances, which indicates that there is a bias that is not a result of variation in the respondents' attitudes towards the targeted dimension. A model should have an appropriate factor loading of items to the latent dimension to be a valid instrument (Byrne, 2010).
According to Mishra (2016), some plausible explanations of error variances might be that respondents have limited experience with the construct, that they might not have understood the meaning of the items, or that they respond according to social desirability. Cote and Buckley (1987) claim that abstract constructs may be more challenging to measure than concrete constructs and that measurement error in social science research within the education discipline accounts for 30.5% of the variance. We conducted our study in the context of education and measured an abstract construct; thus, variance as a result of measurement error may be plausible.
Model 3 (after post hoc modification) shows that a correlation between error variances of the reversed items strengthens the fit indexes of the model. This confirms that there is a substantial correlation between the error variances for item MS20 and item MS21. These items pertain to seeking and offering assistance and are some of the core elements of mutual support in teamwork (King et al., 2008); furthermore, these two items have both low factor loading and high error variance and the error variance is correlated.
Negatively worded items may correct for agreement bias, mainly if the scale comprises equal numbers of regular and negatively worded items (Baumgartner & Steenkamp, 2001). However, they may affect the reliability, goodness-of-fit and factor loading of questionnaires (Baumgartner & Steenkamp, 2001). A problem in the T-TAQ was that the negatively worded items were not balanced through the questionnaire, as all the negatively worded items were in the last two-thirds of the questionnaire. This location may make the respondents more relaxed and more careless in interpreting and responding to the items (Baumgartner & Steenkamp, 2001). Baker et al. (2008, p. 7) state in their T-TAQ manual that "items on the T-TAQ should not be modified." The modification of a model should be theoretically justified (Polit & Yang, 2016) as well, and the T-TAQ is built on a thorough theoretical base. Our results indicate that the reversed items are troublesome for factor loading and affect the reliability of the dimensions. (Brock et al., 2013; Goliat et al., 2013; Maguire et al., 2015).

TABLE 2 Mean score and standard deviation for T-TAQ items and dimensions (N = 509). Columns: items description, mean, SD.

This might indicate that the questionnaire is suitable for measuring a change in attitudes among healthcare students.

| Limitations
A limitation of this study is that more than 60% of the sample comprised first-year students. First-year students are presumed to be both the youngest and the least experienced segment of the sample with respect to teamwork experience and professional knowledge.
Another limitation is the use of two different methods of data collection. The email survey invited most of the available students but resulted in a response rate of only 15.3%. It is a known challenge to researchers that email surveys may have lower response rates than other survey methods (Manfreda et al., 2008). Regarding data collection by pen and paper, the number of students responding was limited to the students present in the class. On the other hand, the range and median age and sex of the respondents seem to be representative of the target population in Norway (Statistics, 2018).

TABLE 3 Cronbach's alpha of T-TAQ in the current study and previous studies

ACKNOWLEDGEMENTS
We would like to extend our sincere appreciation to the Bachelor of Nursing students who participated in the study. Grateful thanks go to Nils Rui for help with distribution of the email survey and to Jari Appelgren at Karlstad University, Sweden, for his great support during statistical analysis.

CONFLICT OF INTEREST
All authors declare no conflict of interest.

AUTHOR CONTRIBUTION
TK, MH, SW, RB: responsible for the conception and study design.
TK: performed the data collection. TK, MH, SW, RB: contributed to the analysis of the data. TK, MH, SW, RB: involved in drafting the manuscript and revising it critically for important intellectual content. All authors have read and approved the final manuscript.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author, [TK], upon reasonable request.