Rasch analysis of Stamps's Index of Work Satisfaction in nursing population

Abstract
Aim: One of the most commonly used tools for measuring job satisfaction in nursing is the Stamps Index of Work Satisfaction. Several studies have reported on the reliability of Stamps's tool based on the traditional statistical model. The aim of this study was to apply the Rasch model to examine the adequacy of Stamps's Index of Work Satisfaction for measuring nurses' job satisfaction cross-culturally and to determine the validity and reliability of the instrument using the Rasch criteria.
Design: A secondary data analysis was conducted on a sample of 556 registered nurses from two countries.
Methods: The RUMM 2030 software was used to analyse the psychometric properties of the Index of Work Satisfaction.
Results: The persons mean location of −0.018 approximated the items mean of 0.00, suggesting good alignment of the measure and the traits being measured. However, at the item level, some items misfitted the Rasch model.


Nora Ahmad 1 | Nelson Ositadimma Oranye 2 | Alyona Danilov 3

| BACKGROUND
Job satisfaction remains an important topic in organizational studies and has been extensively studied in many fields, including nursing.
Studies on job satisfaction date back as early as 1920 (Snarr & Krochalk, 1996), and job satisfaction has been studied with numerous tools and in different populations. The existing evidence shows that job satisfaction is influenced by multiple factors operating at the level of the job, the individual, the profession, the organization and the general work environment (Pittman, 2007; Ravari, Bazargan, Vanaki, & Mirzaei, 2012). Specific factors that have been found to affect nurses' job satisfaction include job stress (Flanagan & Flanagan, 2002), the management style of nursing leadership (Pietersen, 2005; Yamashita, Takase, Wakabayshi, Kuroda, & Owatari, 2009), empowerment (Cicolini, Comparcini, & Simonetti, 2014; Manojlovich & Laschinger, 2002), nursing autonomy (Castaneda & Scanlan, 2014; Hayes, Bonner, & Pryor, 2010), co-worker interactions, group cohesion and salary (Curtis & Glacken, 2014; Wielenga, Smit, & Unk, 2008). The multiplicity of factors that impinge on nurses' job satisfaction has made the development of measurement tools that are valid and reliable across different work and cultural environments very challenging. Nevertheless, several measurement tools have emerged over time, most of which have demonstrated high reliability and validity.
There are several reasons why job satisfaction among nurses has remained a persistent and prominent topic in the nursing literature. Many researchers recognize the need to monitor nurses' job satisfaction because dissatisfaction could disrupt patient care delivery and reduce the effectiveness of healthcare organizations (Cheung & Ching, 2014; Curtis, 2007; Djukic, Kovner, Budin, & Norman, 2010; Taunton et al., 2004). Job satisfaction has also been linked to different outcomes for nurses, including their perceived ability to express caring behaviours with patients (Amendolair, 2012), the acculturation of newly immigrated nurses (Ea, Griffin, L'Eplattenier, & Fitzpatrick, 2008) and 'lower levels of job-stress, burnout and career abandonment among nurses' (Foley, Lee, Wilson, Cureton, & Canham, 2004, p. 94). Nurses' job satisfaction has also been associated with positive patient outcomes, such as reduced patient falls (Alvarez & Fitzpatrick, 2007). It is therefore important that the tools used to measure job satisfaction are constantly reviewed to ensure that they measure what they are intended to measure and that users are made aware of any pitfalls, should they choose to use such tools.

| Measurement tools for job satisfaction
A large body of research on job satisfaction has accumulated, either using or attempting to validate well-known or new tools that assess nurses' job satisfaction. Our search of the literature on nurses' job satisfaction from 1986 to May 2015 identified 100 studies that reported measurement of job satisfaction in nursing. Among these studies, 20 different instruments were used to measure nurses' job satisfaction. The most commonly used tools that showed good reliability and validity include: the Minnesota Satisfaction Questionnaire, developed by Weiss and colleagues in 1967 (Kaplan, Boshoff, & Kellerman, 1991; Lamarche & Tullai-McGuinness, 2009; Stamps, 1997; Weiss, Dawis, & England, 1967); the Index of Work Satisfaction (IWS), developed by Stamps and Piedmont in the 1970s (Slavitt, Stamps, Piedmont, & Haase, 1978; Stamps & Piedmonte, 1986); Quinn and Staines's Facet-free Job Satisfaction Scale, developed by Quinn and Staines in 1979 (Djukic et al., 2010; Kovner, Brewer, Wu, Cheng, & Suzuki, 2006); and Mueller and McCloskey's Satisfaction Scale (MMSS), developed by Mueller and McCloskey in 1990 (Misener, Haddock, Gleaton, & Abuajamieh, 1996; Mueller & McCloskey, 1990; Price, 2002; Tourangeau, Hall, Doran, & Petch, 2006).
The Minnesota Satisfaction Questionnaire has a Cronbach's α of 0.83-0.84 and validity coefficients of 0.32-0.75 (Lamarche & Tullai-McGuinness, 2009). Zurmehly (2008) noted that Hoyt reliability coefficients between 0.59 and 0.97 have been reported for the Minnesota Satisfaction Questionnaire, while Kaplan et al. (1991) reported Cronbach's α values of 0.82-0.90 for its different components, demonstrating adequate reliability. The internal consistency of the MMSS was reported as 0.89 by Mueller and McCloskey (1990) and 0.90 by Misener et al. (1996). The test-retest reliability of its subscales ranged from 0.08 to 0.64 (Misener et al., 1996; Mueller & McCloskey, 1990). For Quinn and Staines's Facet-free Job Satisfaction Scale, Kovner et al. (2006) reported reliability coefficients ranging from 0.70 to 0.95. The psychometric properties of the IWS, including its internal consistency reliability and validity, have been reported in multiple studies (Ahmad & Oranye, 2010; Huber et al., 2000; Stamps, 1997; Wade et al., 2008). Zangaro and Soeken (2005) explored the reliability and validity of the IWS through a meta-analysis of 14 studies that used the IWS to measure nurses' job satisfaction. Their meta-analysis included only articles that reported the reliability of part B of the IWS and concluded that part B was reliable and valid in different settings, including university, community and acute care hospitals, and for multisite studies. Cronbach's α for the IWS and its subscales has ranged from 0.50 to 0.92 (Bjork, Samdal, Hansen, Torstad, & Hamilton, 2007; Hayes, Douglas, & Bonner, 2015; Itzhaki, Ea, Ehrenfeld, & Fitzpatrick, 2013; Manojlovich & Laschinger, 2007; Penz, Stewart, D'Arcy, & Morgan, 2008; Zangaro & Soeken, 2005).
The highest subscale coefficient of 0.92 was reported by Manojlovich and Laschinger (2007), while the lowest Cronbach's α was reported by Medley and Larochelle (1995). The Cronbach's α values originally reported by Stamps (1997) ranged from 0.82 to 0.91. Content validity (Kovner, Hendrickson, Knickman, & Finkler, 1994) and construct validity through factor analysis (Stamps, 1997) have been established.
Among these job satisfaction measurement tools, the IWS has been one of the most widely used. The IWS measures 'the extent to which people like their jobs' (Stamps, 1997, p. 13) and provides a quantitative estimate of nurses' job satisfaction. The tool was further developed in 1986 by Stamps and Piedmonte, drawing on a critical review of occupational theories in the social sciences (Amendolair, 2012; Kovner et al., 1994; Slavitt et al., 1978; Stamps & Piedmonte, 1986).
The strong theoretical foundation of Stamps's IWS was intended to address the apparently atheoretical basis of many existing job satisfaction measurement tools. Stamps and Piedmonte (1986, p. 19) noted that they '…proceeded to develop a valid and reliable scale for measuring nurses' work satisfaction, one general enough to be used in many settings…' The IWS assesses nurses' level of professional satisfaction in six work dimensions: pay, professional status, task requirements, interaction, organizational policies and autonomy (Stamps & Piedmonte, 1986), and is rated on a seven-point Likert scale. The level of professional satisfaction for each of the six dimensions (subscale scores) and the overall professional satisfaction level (entire IWS score) have been reported in previous studies.
Until recently, the statistical methods typically used for psychometric measurement in nursing research were based on the traditional statistical model (classical test theory). The Rasch model provides a sophisticated and comprehensive alternative to traditional psychometric measurement and is based on item response theory (Belvedere & de Morton, 2010; Hagquist, Bruce, & Gustavsson, 2009). The Rasch model was originally developed for examining the psychometric properties of educational tests (Andrich, 2005) but is now increasingly used in the health sciences and many other disciplines. A successful implementation of Rasch measurement requires that the assumptions of local independence and unidimensionality are satisfied (Brentari & Golia, 2008). In addition to unidimensionality and local independence, Rasch analysis uses differential item functioning (DIF), the person separation index (PSI) and fit statistics as criteria to determine the reliability and validity of a measurement tool.
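RUMM 2030 fits polytomous Rasch models to ordered-category items such as the seven-point IWS responses. As a minimal sketch, assuming the partial credit form of the model (the text does not state whether a rating scale or unrestricted threshold parametrization was used) and responses coded 0 to 6, the probability of each response category depends only on the difference between the person location θ and the item location δ, shifted by item thresholds τ. The function names and the threshold values are hypothetical.

```python
import math
from typing import List, Sequence


def category_probabilities(theta: float, delta: float, taus: Sequence[float]) -> List[float]:
    """P(X = x) for x = 0..m under a polytomous Rasch (partial credit) model.

    theta: person location in logits
    delta: item location in logits
    taus:  threshold offsets tau_1..tau_m (m = number of categories - 1)
    """
    # Category 0 has an empty sum; category x accumulates (theta - delta - tau_k) for k <= x.
    log_numerators = [0.0]
    for tau in taus:
        log_numerators.append(log_numerators[-1] + (theta - delta - tau))
    # Subtract the maximum before exponentiating to avoid overflow (log-sum-exp trick).
    top = max(log_numerators)
    weights = [math.exp(v - top) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(theta: float, delta: float, taus: Sequence[float]) -> float:
    """Model-expected item score: sum over x of x * P(X = x)."""
    probs = category_probabilities(theta, delta, taus)
    return sum(x * p for x, p in enumerate(probs))


if __name__ == "__main__":
    # Hypothetical seven-category item (responses coded 0..6) with symmetric thresholds.
    taus = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
    print([round(p, 3) for p in category_probabilities(0.0, 0.0, taus)])
    print(round(expected_score(0.0, 0.0, taus), 3))  # 3.0: the middle category, by symmetry
```

With a single threshold at 0 the model reduces to the familiar dichotomous Rasch form, exp(θ − δ) / (1 + exp(θ − δ)).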
Few studies have been undertaken using the Rasch model in the health sciences (Hagquist et al., 2009), and very few have used it to measure nurses' job satisfaction. We identified three articles that applied the Rasch model in nursing (Clinton, Dumit, & El-Jardali, 2015; Flannery, Resnick, Galik, Lipscomb, & McPhaul, 2012; Hagquist et al., 2009); however, despite the wide use of Stamps's IWS in nursing research and in diverse environments, no study has applied the Rasch model to evaluate its reliability and validity. The purpose of this study was to apply the Rasch model to examine the adequacy of Stamps's Index of Work Satisfaction for measuring nurses' job satisfaction cross-culturally and to determine the validity and reliability of the IWS using the Rasch criteria.

| METHODOLOGY
This is a secondary data analysis that uses data from Ahmad and Oranye (2010).

| Ethics
The study complies with international human research ethics guidelines and the Declaration of Helsinki. Ethical approval for the study was obtained from the University of Sheffield Ethics Committee, the NHS and the directors of the two hospitals in England and Malaysia (Ahmad & Oranye, 2010).

| Procedure
The current descriptive study uses the data from part B of Stamps's (1997) IWS to determine, by applying the Rasch model, the adequacy of the IWS in measuring job satisfaction cross-culturally. The IWS contains 44 items covering six components: pay, autonomy, task requirements, professional status, interaction and organizational policies. There are six items in the pay subscale, eight in autonomy, six in task requirements, seven in professional status, 10 in interaction and seven in organizational policies (Ahmad & Oranye, 2010; Stamps & Piedmonte, 1986). The reliability of the IWS has been reported in previous studies (Ahmad & Oranye, 2010; Medley & Larochelle, 1995; Wade et al., 2008) and in the manual (Stamps & Piedmonte, 1986).
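The subscale sizes listed above add up to the 44 items of part B. The sketch below checks that arithmetic and sums one respondent's raw ratings per subscale. The item-to-subscale mapping it builds is hypothetical (consecutive blocks of items); the real assignment of IWS items to subscales should be taken from the IWS manual, and the function names are illustrative only.

```python
from typing import Dict, List

# Subscale sizes for IWS part B, as reported in the text.
SUBSCALE_SIZES: Dict[str, int] = {
    "pay": 6,
    "autonomy": 8,
    "task requirements": 6,
    "professional status": 7,
    "interaction": 10,
    "organizational policies": 7,
}


def contiguous_item_map(sizes: Dict[str, int]) -> Dict[str, List[int]]:
    """HYPOTHETICAL item map: assigns item indices to subscales in consecutive blocks.

    The published IWS does not necessarily order its items this way; replace this
    with the item-to-subscale key from the IWS manual before using real data.
    """
    item_map: Dict[str, List[int]] = {}
    start = 0
    for name, size in sizes.items():
        item_map[name] = list(range(start, start + size))
        start += size
    return item_map


def subscale_scores(responses: List[int], item_map: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum one respondent's item ratings within each subscale."""
    return {name: sum(responses[i] for i in items) for name, items in item_map.items()}


if __name__ == "__main__":
    assert sum(SUBSCALE_SIZES.values()) == 44  # 6 + 8 + 6 + 7 + 10 + 7
    item_map = contiguous_item_map(SUBSCALE_SIZES)
    responses = [4] * 44  # hypothetical respondent rating every item 4
    print(subscale_scores(responses, item_map))
```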
This study conducted a systematic search of the literature on nurses' job satisfaction in four major databases (PubMed, CINAHL, PsycINFO and SCOPUS) from 1986 to May 2015, to find relevant studies and to ascertain whether the Rasch model had been applied to the IWS. The following inclusion and exclusion criteria were applied: (1) papers published in English; (2) publications with a study sample that included nurses; (3) job satisfaction measured using the IWS; (4) reliability and validity of the IWS reported for the study sample; (5) papers that applied the Rasch analysis model to measures of job satisfaction. The search resulted in 100 papers, which were further screened for relevance. Finally, 53 of these papers and four other papers on the Rasch model were included in this study.

| Analysis
The Rasch analysis model was applied using the RUMM 2030 software.
Data from the two countries were stacked for comparative analysis. RUMM 2030 performs an item-by-item analysis, making it possible to examine each item at different levels, including the individual, the country and other groupings such as age and work status. Stacking the data enables simultaneous analysis of variables across these levels. The fit statistics were analysed to determine whether the IWS fits the Rasch model expectations, and whether the IWS satisfies the criteria of unidimensionality, differential item functioning (DIF) and person separation index (PSI) (Brentari & Golia, 2008).
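One plausible reading of "stacking" here is that respondents from both countries are placed in a single person-by-item file, each row carrying its country as a person factor, so that all items are calibrated on one common scale while group comparisons remain possible. The sketch below illustrates that layout under this assumption; the `Respondent` record, the `stack` function and the sample values are hypothetical, not the RUMM 2030 file format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Respondent:
    person_id: str
    country: str          # person factor, kept for later DIF and group comparisons
    responses: List[int]  # item ratings in a fixed item order shared by all countries


def stack(groups: Dict[str, List[List[int]]]) -> List[Respondent]:
    """Stack respondents from several countries into one person-by-item data set.

    All rows share the same item columns, so item parameters can be estimated on one
    common scale while each row keeps its country label.
    """
    n_items = {len(row) for rows in groups.values() for row in rows}
    if len(n_items) > 1:
        raise ValueError("all respondents must answer the same set of items")
    stacked: List[Respondent] = []
    for country, rows in groups.items():
        for idx, row in enumerate(rows):
            stacked.append(Respondent(f"{country}-{idx}", country, list(row)))
    return stacked


if __name__ == "__main__":
    # Hypothetical two-item responses from each country.
    data = stack({"England": [[3, 5], [6, 4]], "Malaysia": [[2, 2]]})
    for r in data:
        print(r.person_id, r.country, r.responses)
```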

| Descriptive analysis
Of the 554 subjects in the data, 70% were Malaysian and 30% were British. The majority of the subjects were female (96.4%), which reflects the gender composition of the nursing profession in many environments. The majority were married (62.5%), while 34.5% were single and the remainder were divorced or widowed, or their marital status was unknown.
A small proportion had a university degree (11%); the most common qualification was a diploma (65.9%), followed by a certificate in nursing (23.1%). Most of the nurses worked full time (90.4%).
In Rasch analysis, one way to assess the adequacy of a tool is how well it targets the trait of interest in the population. In Fig. 1, the spread of the items in the scale vis-à-vis the persons' locations shows that the tool targets the person characteristics in the sample well. The persons mean location of −0.018 logits is approximately equal to the items mean of 0.00. However, the slightly negative persons mean suggests that a small number of participants scored below the theoretically expected average level of job satisfaction. It could also indicate a slightly lower level of job satisfaction in the population.

Figure 1. Person-item threshold distribution
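Targeting can be summarized numerically by comparing the mean person location with the mean item location on the shared logit scale. The sketch below assumes, as is typical in RUMM 2030 output, that item locations are centred at a mean of 0, so the offset equals the mean person location. The function name and the person and item values are hypothetical, chosen only so that the offset matches the −0.018 reported above.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def targeting_summary(person_locations: Sequence[float],
                      item_locations: Sequence[float]) -> Dict[str, float]:
    """Summarize how well items target persons on the shared logit scale.

    offset = mean person location - mean item location; values near 0 indicate
    good targeting, negative values mean persons sit slightly below the items.
    """
    person_mean = mean(person_locations)
    item_mean = mean(item_locations)
    return {
        "person_mean": person_mean,
        "item_mean": item_mean,
        "offset": person_mean - item_mean,
        "person_sd": stdev(person_locations),
        "item_sd": stdev(item_locations),
    }


if __name__ == "__main__":
    # Hypothetical estimates; the item mean is centred at 0 as the scale origin.
    persons = [-1.0, 0.0, 0.946]
    items = [-0.5, 0.5]
    print(targeting_summary(persons, items))
```

Comparing `person_sd` with `item_sd` gives a rough numeric counterpart to the visual impression of items spreading more widely than persons in a person-item threshold map.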

| Reliability indices
The Rasch model provides two estimates that indicate the reliability of a tool and the precision of the estimate of each person's trait level in the sample. The person separation index (PSI = 0.8578) was approximately equal to Cronbach's α (0.851); both indicate very good reliability and internal consistency of the IWS (Table 1). Table 2 shows the fit statistics for the item-person interaction. The power of the analysis of fit was excellent, which means that the analysis was sensitive enough to detect misfit where it existed. The standard deviations of the fit residuals for the items, at both the subscale and total-scale levels, were high, suggesting poor fit to the Rasch model. Generally, this suggests the presence of some misfitting items and individuals in the data set whose response patterns deviated substantially from the expectations of the Rasch model (Tennant & Conaghan, 2007). In all six subscales, the residual standard deviations for the items were higher than those for persons. The person residual standard deviations for the Professional status, Task requirement and Pay subscales were below 1.4, suggesting that it is very unlikely that any persons' responses deviated significantly from the Rasch model expectation in those subscales. The significant chi-square (p < .0001) also indicates misfit to the Rasch model. Cummings, Hayduk, and Estabrooks (2006)
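The two reliability indices compared above can be written out directly. As a minimal sketch, assuming the commonly described definition of the PSI (the share of the observed variance of person estimates that is not measurement error) and the standard formula for Cronbach's α, the functions below compute both; RUMM 2030's exact computation may differ in detail, for example in using sample rather than population variance. Function names and inputs are hypothetical.

```python
from statistics import mean, pvariance
from typing import Sequence


def person_separation_index(locations: Sequence[float],
                            standard_errors: Sequence[float]) -> float:
    """PSI = (variance of person estimates - mean squared standard error) / variance."""
    observed = pvariance(locations)
    error = mean(se ** 2 for se in standard_errors)
    return (observed - error) / observed


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a persons-by-items matrix of raw scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(scores[0])
    item_variances = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical person estimates with a common standard error of 0.5 logits.
    print(person_separation_index([-1.0, 0.0, 1.0], [0.5, 0.5, 0.5]))
    # Two perfectly parallel items give alpha = 1.
    print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))
```

Both indices share the same logic of true variance over observed variance, which is why they tend to be close when raw scores and Rasch person estimates rank respondents similarly, as the values of 0.8578 and 0.851 above illustrate.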

| Unidimensionality test
Another important statistic considered in this study is the unidimensionality test, based on paired t tests comparing person estimates. The paired t test (t = −2.8) shows that 187 of the sample estimates were significantly different at p < .05 and 110 at p < .01. The proportion of significant tests was much higher than the 5% permitted for Rasch unidimensionality. This analysis supports the multidimensionality of the IWS, which was originally designed to measure six dimensions: pay, autonomy, task requirements, organizational policies, professional status and interaction.
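A procedure commonly used with RUMM 2030 to test unidimensionality estimates each person's location twice, from two subsets of items (typically those loading positively and negatively on the first principal component of the residuals), and tests the difference for each person. The sketch below assumes that procedure and a normal approximation (|t| > 1.96); the function name and example values are hypothetical. Unidimensionality is usually accepted when the proportion of significant tests, or the lower bound of its confidence interval, does not exceed 5%.

```python
import math
from typing import Sequence, Tuple


def subset_t_tests(theta_a: Sequence[float], se_a: Sequence[float],
                   theta_b: Sequence[float], se_b: Sequence[float],
                   z_crit: float = 1.96) -> Tuple[int, float, Tuple[float, float]]:
    """Compare each person's locations estimated from two item subsets.

    For each person t = (theta_a - theta_b) / sqrt(se_a**2 + se_b**2); |t| > z_crit
    counts as significant (normal approximation). Returns the count, the proportion
    and a normal-approximation 95% confidence interval for that proportion.
    """
    n = len(theta_a)
    significant = 0
    for ta, sa, tb, sb in zip(theta_a, se_a, theta_b, se_b):
        t = (ta - tb) / math.sqrt(sa ** 2 + sb ** 2)
        if abs(t) > z_crit:
            significant += 1
    p = significant / n
    half_width = 1.96 * math.sqrt(p * (1 - p) / n)
    return significant, p, (p - half_width, p + half_width)


if __name__ == "__main__":
    # Hypothetical estimates for four persons; only the third differs markedly.
    count, prop, (low, high) = subset_t_tests(
        [0.0, 1.0, 2.0, 0.0], [0.3] * 4, [0.0, 1.0, 0.0, 0.0], [0.3] * 4)
    print(count, prop, round(low, 3), round(high, 3))
```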
The individual item fit residuals were examined to identify the items that may be causing the model misfit. The subscale results show that items 7, 10, 14, 18, 32, 36 and 43 had extreme fit residual values. The fit residuals for items 7 and 32 were consistently high at both the subscale and combined-scale levels. Table 3 also shows that a total of 12 items had a significant Bonferroni-adjusted χ² probability (< 0.00125), indicating that these items misfit the Rasch model.
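A Bonferroni adjustment divides the nominal significance level by the number of tests. The reported threshold of 0.00125 equals 0.05/40; applying the same rule to all 44 items would give about 0.00114. Because the number of tests adjusted for is not stated here, the sketch below takes it as a parameter. The function name and p-values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def bonferroni_flags(p_values: Dict[str, float], alpha: float = 0.05,
                     n_tests: Optional[int] = None) -> Tuple[float, List[str]]:
    """Flag items whose item-trait chi-square p-value falls below alpha / n_tests.

    n_tests defaults to the number of items supplied.
    """
    if n_tests is None:
        n_tests = len(p_values)
    threshold = alpha / n_tests
    flagged = [item for item, p in p_values.items() if p < threshold]
    return threshold, flagged


if __name__ == "__main__":
    # Hypothetical p-values for three items.
    threshold, flagged = bonferroni_flags(
        {"item 7": 0.0001, "item 10": 0.01, "item 32": 0.0009}, n_tests=40)
    print(threshold, flagged)
```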

| Analysis of Differential Item Functioning (DIF)
DIF is a test of item bias: it examines whether an item functions differently for different groups of persons who have the same level of the trait being measured ('ability level'). In this study, DIF was examined with respect to age, gender, years of experience, work status (full time or part time) and country. A primary factor of interest is country, that is, whether participants were British or Malaysian nurses, which by extension implies socioeconomic and cultural differences.
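RUMM 2030 is commonly described as assessing DIF through a two-way analysis of variance of standardized item residuals, with the person factor (here, country) and class intervals along the trait as factors. The sketch below is a deliberate simplification: it computes standardized residuals and the F statistic for the one-way main effect of the person factor only, ignores class intervals and their interaction, and does not compute a p-value. The expected score and variance of each response would come from a fitted model; the function names and residual values are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def standardized_residual(observed: float, expected: float, variance: float) -> float:
    """z = (x - E[x]) / sqrt(Var[x]) for one person-item response."""
    return (observed - expected) / variance ** 0.5


def one_way_f(residuals_by_group: Dict[str, List[float]]) -> float:
    """One-way ANOVA F statistic for one item's residuals grouped by a person factor.

    Simplification: omits the class-interval factor and the interaction term of the
    two-way ANOVA, and does not compute a p-value.
    """
    groups = list(residuals_by_group.values())
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


if __name__ == "__main__":
    # Hypothetical standardized residuals on one item for each country.
    residuals = {"England": [1.0, 2.0, 3.0], "Malaysia": [4.0, 5.0, 6.0]}
    print(one_way_f(residuals))  # 13.5
```

A large F for the country factor suggests that, after accounting for overall job satisfaction, the item is systematically easier or harder to endorse in one country than the other.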

| DISCUSSION
The PSI and Cronbach's α for the Index of Work Satisfaction in this study are consistent with previous studies, which reported good reliability, ranging from 0.54 to 0.92 (Bjork et al., 2007; Curtis & Glacken, 2014; Manojlovich & Laschinger, 2007; Oermann, 1995; Stamps & Piedmonte, 1986). There is strong evidence, from both this and previous studies, that supports the reliability of the IWS for assessing job satisfaction among nurses. However, the variation in the reliability reported across studies may indicate that the meanings or values of some items are not always consistent across populations. For instance, Karanikola and Papathanassoglou (2015) found that two items in the IWS were not consistent with the other items and consequently affected the internal consistency of the tool. Essentially, the reliability of a measurement tool concerns the consistency with which it measures what it is intended to measure. However, what is measured, especially in the social world, is often not equivalent across social environments, because meanings and values vary from one place to another. Therefore, attention should be paid not just to the consistency of a scale, but also to the meanings and values of the concept or construct being measured across cultures.
The high standard deviations of the fit residuals for the items (range: 1.57-3.36) point to the possibility of misfitting items, while the lower standard deviations of the fit residuals for the persons (range: 1.08-1.58) suggest that misfitting persons are less likely. The IWS items align very well with the persons measure, but overall, the items were spread more widely on both tails of the graph (Fig. 1) than the person traits being measured. The spread suggests that some items were located either above or below the respondents' 'ability' level. In the context of the measurement of job satisfaction, the items at the extremes were possibly measuring traits that may not be directly relevant to understanding participants' job satisfaction. Again, it is important to note that what makes for job satisfaction is likely to vary with time, place and people. A detailed individual item-response analysis will be required to identify the items in the tool that are probably irrelevant or contribute very little to the measurement of the construct of job satisfaction or the underlying concepts of pay, professional status, interaction, task requirements and organizational policies.
The Rasch model expects a good measurement tool to be invariant across the sample and traits being measured. In other words, each item on a measurement scale is expected to measure the attribute of interest between different participants without any bias. Linacre and Wright (1987) have emphasized the importance of identifying and quantifying differential item functioning for contrasting groups and to clearly understand the differences between groups. For a measurement tool, such as the IWS that is designed to be used in different environments, it is important to understand how the different items in the tool function for different participants and groups. The presence of a substantial number of items with

| Limitations
There were 51 extreme cases in the study sample; however, their removal did not significantly change the results. The findings from this single study may not be sufficient to draw definitive conclusions on the misfit of the IWS to the Rasch model. Further studies across countries and work environments that apply the Rasch model, together with a review of local dependency and item difficulty levels, are needed.

| CONCLUSION
The IWS is a very reliable tool, especially at the composite level, as indicated by this study and several others. The IWS has been used in several nursing studies, but its cross-cultural validity has not been well evaluated using item response theory, and in particular using the Rasch model to test for DIF across cultural groups. Given that the IWS is a multidimensional tool, it may not be realistic to sum the scores from the different dimensions into an index for comparison between significantly different groups, since the issues they measure may vary over time and place. Equally, it may not be meaningful to compare the total job satisfaction score between two different groups, since the meaning of job satisfaction or any of its components may differ significantly between groups.

FUNDING
This study received no funding.