Development and psychometric evaluation of the Job Demands in Nursing Scale and Job Resources in Nursing Scale: Results from a national study

Abstract
Aim: To develop and test the psychometric properties of the Job Resources in Nursing (JRIN) Scale and the Job Demands in Nursing (JDIN) Scale.
Design: Cross-sectional survey.
Methods: A three-phase process of instrument development and psychometric evaluation was employed. Phase 1: development of a 42-item JRIN Scale and 60-item JDIN Scale through extensive literature review, expert consultation and an iterative content evaluation. Phase 2: pilot survey of 89 nurses and use of item discrimination analysis to estimate the internal consistency reliability of each subscale and reduce the length of each scale. Phase 3: modified scales were tested in a nationwide survey of 3,822 rural/remote nurses, including use of exploratory factor analysis.
Results: The 24 items related to job resources favoured a six-factor structure, accounting for 63% of the variance, Cronbach's alpha 0.88. The 22 items related to job demands favoured a six-factor structure, accounting for 59% of the variance, Cronbach's alpha 0.84.

cal, social and organizational aspects of employment that contribute to sustained cognitive and emotional effort, leading to job-related stress. Job resources are viewed as the physical, psychological, social and organizational aspects of employment that are thought to foster motivation to achieve work-related goals and personal growth and buffer the impact of job-related demands (Bakker, Hakanen, Demerouti, & Xanthopoulou, 2007; Llorens, Bakker, Schaufeli, & Salanova, 2006).
Research involving nurses also suggests that having a sufficient amount of varied job resources can predict lower distress and safeguard against high job demands in the workplace (Lavoie-Tremblay et al., 2013). Job resources in nursing that have been shown to potentially affect outcomes are broad-ranging and include those at the organizational level (e.g., staffing, career opportunity and professional development; Carter & Tourangeau, 2012), the interpersonal level (e.g., collegial support; Van den Tooren & Jonge, 2008) and the position/task level (e.g., autonomy and performance feedback; Mauno et al., 2007).
Most studies exploring job demands and job resources in the context of nursing practice have used various combinations of single items, portions of standardized scales and/or instruments focused primarily on the acute care, urban and organizational level such as the Nursing Work Index-Revised (Aiken & Patrician, 2000). To improve important occupational outcomes such as retention and work engagement and reduce levels of burnout, it is necessary to develop a multidimensional conceptualization of the most salient job demands and job resources across areas of nursing practice. These instruments should have the ability to conceptually measure various job demands and job resources applicable to nursing practice as a whole, while allowing for potential comparisons to be made across nursing designations (e.g., registered nurses [RNs], nurse practitioners [NPs], licensed practical nurses [LPNs], registered psychiatric nurses [RPNs]), practice settings (e.g., acute care and public health) and geographical locations (e.g., rural and urban).

| Aim
The overall aim of this project was to develop and test the psycho-

| Design
In accordance with our aim and stated objectives, we employed a cross-sectional design with a three-phase process of instrument development and evaluation. Phase 1 involved an extensive literature review, expert consultation and an iterative content evaluation to develop the initial scale items. Phase 2 involved a pilot survey of 89 nurses and use of item discrimination analysis to estimate the internal consistency reliability of each subscale and reduce the length of each scale (data collected between September and December 2013).
Phase 3 consisted of a nationwide cross-sectional survey of 3,822 rural and remote nurses to further test the modified scales and use of exploratory factor analysis to finalize the factor structure of both scales (data collected between April 2014 and September 2015).

| Sample/participants
Those eligible to participate in the pilot survey included RNs, NPs, LPNs and RPNs practicing in Canada with recent/current experience in rural or remote settings. Based on power analyses (Bonett, 2002), the target sample size for internal consistency reliability testing of a 42- to 60-item scale (desired Cronbach's alpha [α] of 0.80) was 100 participants. Snowball sampling was used to recruit Phase 2 study participants (N = 89) through emails with study information and a direct link to the online survey. For the Phase 3 national cross-sectional survey, a stratified, systematic sample of 10,072 RNs, LPNs, RPNs and NPs was initially targeted, with 9,622 eligible participants in rural and remote communities across all Canadian provinces and territories. The remaining 450 were ineligible because of incorrect addresses, duplicate registrations or retirement. A total of 3,822 participants completed surveys via mailed paper version or secure online access, for a response rate of 40%.
A total of 2,774 participants completed all 24 items related to job resources, and 2,341 participants completed all 24 items related to job demands. Only participants with valid responses on each item in both scales were included in the exploratory factor analysis to reduce artificially high correlations resulting from imputation of missing values (Tabachnick & Fidell, 2013). For the job resources analysis, there were no significant differences between those included and excluded (N = 1,081) in the analysis based on gender or nursing registration status (i.e., RN, NP, LPN and RPN), with included participants being slightly younger (mean: 46.6, SD: 11.7) than those excluded (mean: 48.9, SD: 11.9, p < 0.0001). For the job demands analysis, there were no differences based on nursing registration status and slight differences in gender (93.1% women in the included vs. 94.8% in the excluded group [N = 1,481]), with included participants being slightly younger (mean: 46.9, SD: 11.8) than those excluded (mean: 48.2, SD: 11.5, p < 0.001).

| Construct validity testing
Although three distinct types of validity (i.e., content, criterion and construct) have traditionally been identified, Cook and Beckman (2006) questioned these distinctions and proposed the conceptualization of "construct validity" as an overarching framework. Five evidence sources that support construct validity are content, response process, internal structure, relation to other variables and consequences (Messick, 1989 as cited in Cook & Beckman, 2006).
Four sources of construct validity evidence were drawn on through the psychometric testing process. Phase 1 instrument development and Phase 2 pilot testing provided validity evidence of both content (e.g., evaluation of the process for developing and selecting items) and response process (e.g., methods for scoring and reporting results). Internal structure and relation to other variables were further tested in Phase 3 using exploratory factor analysis and examination of the correlation of scale scores with the scores from other concepts measured (e.g., work engagement and burnout) in the Phase 3 national survey.

| Reliability testing
Internal consistency reliability was examined during Phase 2 pilot testing using Cronbach's alpha estimates, to assess how well each individual item related to the other items, and "Cronbach's alpha if deleted" estimates, to assess each item's contribution to its subscale and to the full scales.
Cronbach's alpha estimates were repeated for the Phase 3 national survey data to confirm the internal consistency reliability for the final versions of both scales.
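As an illustration of the two reliability statistics named above, the following minimal Python sketch computes Cronbach's alpha and the alpha obtained when each item is dropped in turn. It assumes complete responses (no missing values) and the standard formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of the total score). The function names and the small data set are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete data.

    rows: list of respondents, each a list of k item scores.
    """
    k = len(rows[0])
    item_variances = [variance([r[j] for r in rows]) for j in range(k)]
    total_variance = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def alpha_if_deleted(rows):
    """Alpha recomputed with each item removed in turn."""
    k = len(rows[0])
    return [
        cronbach_alpha([[v for j, v in enumerate(r) if j != drop] for r in rows])
        for drop in range(k)
    ]


# Hypothetical responses: 5 respondents x 4 Likert items (1-5).
# The fourth item runs opposite to the other three.
pilot = [
    [4, 5, 4, 2],
    [2, 2, 3, 4],
    [5, 4, 5, 1],
    [3, 3, 3, 3],
    [1, 2, 2, 5],
]
alpha_full = cronbach_alpha(pilot)
alpha_drop = alpha_if_deleted(pilot)
```

In this made-up data the fourth item behaves like a negatively worded item that has not been reverse scored, so the "alpha if deleted" value for that item is much higher than the full-scale alpha. That pattern is the kind of signal the item analysis looks for.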

| Ethical considerations
Research Ethics Committee approval was obtained from the separate research ethics boards of each of the research team members' institutions prior to the Phase 2 pilot survey testing and Phase 3 national survey.

| Phase 1: Instrument development
The first phase included an extensive literature review and discussions with a national 16-member research team (including nine registered nurses, four nurse practitioners, a geographer, a statistician and a psychiatric epidemiologist) to determine key dimensions related to job resources and job demands in nursing. Essential research that guided this process was a national survey of the nature of nursing practice in rural and remote Canada (Stewart et al., 2005) and the work completed in Australia exploring resources and demands in rural and remote nursing practice (Lenthall et al., 2009, 2011). Next, the scale developers (K.L.P. & J.G.K.) generated six items for each of the seven job resources (42 items total) and 10 job demands (60 items total) subscales, with each subscale representing a key dimension. Item development was informed by studies that addressed the key dimensions identified (Crowden, 2010; Delobelle et al., 2011; DesRoches, Miralles, Buerhaus, Hess, & Donelan, 2011; Hanvey, 2005; Hayes et al., 2006; Hunsberger, Baumann, Blythe, & Crea, 2009; Lenthall et al., 2009, 2011; Nelson, Pomerantz, Howard, & Bushy, 2007; Orchard, King, Khalili, & Bezzina, 2012; Penz et al., 2007; Penz, Stewart, D'Arcy, & Morgan, 2008; Stewart et al., 2005; Thompson, 2004). A draft version of both scales was then provided to our 19-member national advisory team (i.e., nursing leaders and policymakers in each province and territory) who worked with the research team through an iterative process (i.e., items revised and added/excluded) to further verify the content validity of both scales.
This was achieved through a combination of two full-day sessions and seven teleconferences, with the research and advisory teams providing conceptual and item-specific feedback, until consensus was reached on the content, phrasing and format of three positively worded (reverse scored for the JDIN) and three negatively worded items (reverse scored for the JRIN) distributed randomly in each subscale. Both scales were scored on a five-point Likert scale: 1 (strongly disagree), 2 (disagree), 3 (neutral), 4 (agree) and 5 (strongly agree), with an additional response option of 97 (not applicable). "Not applicable" responses were coded as "missing." Higher JRIN scores indicated a higher level of job resources, and higher JDIN scores indicated a higher level of job demands.

(Williams et al., 1999) and a newly developed 60-item Primary Health Care Engagement Scale (the results of which are published elsewhere; see Kosteniuk et al., 2016, 2017). Demographic characteristics of the pilot survey sample were analysed using IBM SPSS, v23.0. Cronbach's alpha for each subscale. If more than 25% of the items were missing, the subscale score was discarded (El-Masri & Fox-Wasylyshyn, 2005). The item discrimination method (Furr & Bacharach, 2008) and the conceptual judgement of our research data team were used to evaluate the corrected item-total correlation (r) for each item. Item-total correlations were classified as weak (lower than 0.30), moderate (0.30-0.49) or substantial (≥0.50; Kellar & Kelvin, 2013). The items with the lowest item-total correlation were removed, with reliability estimates calculated after each removal. The aim was to identify the four items with the strongest association to represent each construct. The internal consistency reliability was then estimated for the shortened version of each subscale. A Cronbach's alpha of 0.70 was taken to indicate modest or acceptable internal consistency reliability for each subscale (Nunnally & Bernstein, 1994).
Through the item analysis process described in the pilot testing results, decisions were also made to remove one JRIN subscale and four JDIN subscales, leaving a 24-item JRIN Scale and a 24-item JDIN Scale for further evaluation.
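The statistical part of the item reduction procedure described above can be sketched as follows. This is a hedged illustration only: it computes the corrected item-total correlation (each item against the sum of the remaining items), labels it with the weak/moderate/substantial bands cited in the text, and removes the lowest-correlating item one at a time until four remain. It assumes complete data with no constant items, it does not model the conceptual judgement the team also applied, and all function names and data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(rows):
    """Correlate each item with the sum of all *other* items."""
    k = len(rows[0])
    result = []
    for j in range(k):
        item = [r[j] for r in rows]
        rest = [sum(r) - r[j] for r in rows]
        result.append(pearson(item, rest))
    return result


def strength(r):
    """Bands reported in the text: <0.30, 0.30-0.49, >=0.50."""
    if r < 0.30:
        return "weak"
    if r < 0.50:
        return "moderate"
    return "substantial"


def reduce_items(rows, names, target=4):
    """Drop the lowest item-total correlation until `target` items remain."""
    rows = [list(r) for r in rows]
    names = list(names)
    while len(names) > target:
        r_it = corrected_item_total(rows)
        worst = r_it.index(min(r_it))
        names.pop(worst)
        rows = [[v for j, v in enumerate(r) if j != worst] for r in rows]
        # A reliability estimate would be recomputed here after each removal.
    return names


# Hypothetical 5-item subscale, 6 respondents; item "e" runs against the rest.
data = [
    [1, 1, 2, 1, 5],
    [2, 2, 2, 2, 3],
    [3, 3, 3, 3, 4],
    [4, 4, 4, 4, 2],
    [5, 5, 4, 5, 1],
    [3, 2, 3, 3, 3],
]
kept = reduce_items(data, ["a", "b", "c", "d", "e"])
```

Recomputing the correlations after every removal, rather than ranking the items once, matters because each removal changes the "rest of scale" total that the remaining items are compared against.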

| Phase 3: National cross-sectional survey: exploratory factor analysis
The data used to analyse the factor structure of the JRIN and JDIN scales were from a nationwide cross-sectional study, the Nature of 2009), which was the case in the pilot testing phase of this study.
The sample sizes of 2,774 (JRIN analysis) and 2,341 (JDIN analysis) were more than adequate (Comrey & Lee, 1992). IBM SPSS v23.0 was used to conduct exploratory factor analysis of the 24 items related to job resources and the 24 items related to job demands. Principal

Note. Scoring: 1 = strongly disagree; 2 = disagree; 3 = neutral; 4 = agree; 5 = strongly agree (including a "not applicable" response option is not recommended). JRIN 6-factor structure explaining 63% of the variance. Total Cronbach's alpha across all JRIN factors, National Survey: α = 0.88. Case mean imputation guidelines: For each 4-item subscale, the case mean may be imputed where 25% or less of items is missing (i.e., one item; El-Masri & Fox-Wasylyshyn, 2005); if the participant's subscale is missing 2 or more items, then that participant's subscale should be discarded.
a Reverse-scored items: 4, 5, 7, 8, 11, 12, 14, 16, 17, 20, 21, 22, 23.

Cook & Beckman, 2006), correlation of scale scores with the scores from other concepts measured (e.g., work engagement and burnout) was also calculated using the national survey data.

Most respondents were direct care providers (67.4%), held positions of leadership (11.2%) or worked in nursing education/other (17.9%).

In the remaining six JRIN subscales, the items with the lowest item-total correlations were removed one at a time, with reliability estimates repeated after each removal to achieve the highest alpha with the smallest number of items.

Although the "Scheduling" subscale returned strong item-total correlations, a team decision was made to remove this subscale due to theoretical overlap with national survey items and to broaden the applicability (e.g., not all nurses work shift work). The steps of removing the items with the lowest item-total correlation one at a time and examining reliability estimates after each removal were performed for the remaining six JDIN subscales.

Table 3 lists the refined JRIN Scale dimensions following pilot testing, namely supervision, collegial support, staffing, technology, professional development and autonomy and control.

The majority of participants were female and ranged in age from 19 to 84 years (study included those who were retired but occasionally employed in nursing), with an average age of 46.8 years.

| National survey data: factor analysis sample characteristics
Just over half of each sample were RNs, and about one in three participants were LPNs. Participants were distributed across Canada, with the majority indicating "staff nurse" as their primary position. A diversity of practice settings was represented, with the majority of nurses working in acute care, long-term care, community health and primary care.

| Exploratory factor analysis and mean subscale scoring
The JRIN and JDIN scales were examined further for construct validity with exploratory factor analysis of data from the full national survey. Table 6

Note. Scoring: 1 = strongly disagree; 2 = disagree; 3 = neutral; 4 = agree; 5 = strongly agree (including a "not applicable" response option is not recommended). JDIN 6-factor structure explaining 59% of the variance. Total Cronbach's alpha across all JDIN factors, National Survey: α = 0.84. Case mean imputation guidelines: For each 4-item subscale, the case mean may be imputed where 25% or less of items is missing (i.e., one item; El-Masri & Fox-Wasylyshyn, 2005); if the participant's subscale is missing 2 or more items, then that participant's subscale should be discarded. Case mean imputation should not be performed on the 3-item subscales; if a participant is missing 1 or more items on the 3-item subscales, then that participant's subscale should be discarded.
a Reverse-scored items: 2, 4, 5, 6, 8, 9, 10, 12, 15, 16, 17, 18, 19, 20, 22.

internal consistency reliability was noted for the JRIN subscales (α = 0.74-0.88) and the full 24-item JRIN scale (α = 0.88) for the national sample analysis.

analysis on the items related to job demands originally clustered into six factors with three to four items per factor and a seventh two-item factor. Due to a low eigenvalue of 1.02, a changing Scree plot slope between the sixth and seventh factor, moderate to low correlation between the two items (r = 0.48) and minimal contribution to the total variance, the seventh factor was removed from the
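The scoring and case mean imputation guidelines in the table note can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the code 97 stands for "not applicable" and is treated as missing (as in the scale description), reverse scoring maps 1-5 to 5-1, and a subscale is scored only when 25% or fewer of its items are missing. The function names, the example responses and the example reverse-scoring flags are hypothetical; the actual reverse-scored item numbers are those listed in the note.

```python
LIKERT = (1, 2, 3, 4, 5)


def clean(value):
    """Return a valid Likert response, or None for 97 / anything else."""
    return value if value in LIKERT else None


def reverse(value):
    """Reverse score a 1-5 response; missing stays missing."""
    return None if value is None else 6 - value


def subscale_score(responses, reverse_flags, max_missing_share=0.25):
    """Summated subscale score with case mean imputation.

    Returns None when more than 25% of the items are missing. For a
    3-item subscale one missing item is already 33%, so no imputation
    is ever performed there, matching the guideline.
    """
    items = [
        reverse(clean(v)) if flag else clean(v)
        for v, flag in zip(responses, reverse_flags)
    ]
    valid = [v for v in items if v is not None]
    n_missing = len(items) - len(valid)
    if n_missing / len(items) > max_missing_share:
        return None
    case_mean = sum(valid) / len(valid)  # participant's own mean on this subscale
    return sum(case_mean if v is None else v for v in items)


# Hypothetical 4-item subscale with one "not applicable" response.
one_missing = subscale_score([4, 5, 97, 3], [False, False, False, False])
# Two missing items: the subscale is discarded.
two_missing = subscale_score([4, 97, 97, 3], [False, False, False, False])
# First item reverse scored: 1 becomes 5.
reversed_first = subscale_score([1, 2, 3, 4], [True, False, False, False])
```

Expressing the rule as a single share threshold keeps the 3-item and 4-item cases in one code path, which is a design choice of this sketch rather than something specified by the authors.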

| Summated scores and relationships to other variables
The mean total score for the final 24-item JRIN scale was 79.6 (SD: 13.1), with scores ranging from 27 to 120 for the full national survey sample, indicating a medium to high level of perceived work-related resources. The perceived work-related demands for the full national sample were low to medium, with a mean total score of 51.1 (SD: 9.9) for the 22-item JDIN scale and scores ranging from 22 to 99. Pearson's product moment correlations among other concepts measured in the national survey and the JRIN and JDIN are presented in Table 8.
The JRIN scale and JDIN scale scores correlated as predicted with weak to moderate significant (p < 0.001) correlations with work engagement, burnout, organizational commitment and job satisfaction.
Participants with higher scores on the JRIN scale tended to exhibit higher work engagement (r = 0.37), higher organizational commitment (r = 0.28), higher job satisfaction (r = 0.51) and lower burn-

was evident that they perceived themselves to have relatively low to medium job demands and medium to high job resources related to their work. We were encouraged to find that the total scores on the JRIN were positively correlated with work engagement and organizational commitment and inversely correlated with burnout. Further, the total scores on the JDIN were inversely correlated with organizational commitment and work engagement and positively correlated with burnout, suggesting the potential use of these scales to further explore these occupational outcomes. Although comparisons between nursing groups or predicting occupational outcomes were not the purpose of this analysis, the value of these scales is that they may assist in identifying some of the multidimensional resource gaps and demand pressures that require priority attention across a wide variety of settings. For instance, a review of job demands among remote area nurses concluded that the literature in this field was scarce and that further empirical studies would give health service planners much-needed information for policy purposes (Lenthall et al., 2009).
It is evident that due to budgetary constraints, higher patient acuity, understaffing and increased workloads, many areas of nursing practice are facing higher demands (Montgomery, Spânu, Băban, & Panagopoulo, 2015). Exposure to chronic work strain is especially concerning for the nursing profession, as it has negative consequences for both the physiological health and the psychological well-being of nurses (Aiken et al., 2013). Both the JRIN and JDIN scales could be used in a variety of ways to assist managers and researchers to better understand some of the factors that may positively or negatively influence the physiological or psychological well-being of nurses in practice and to identify specific areas to target in terms of developing interventions to increase resources and reduce demands in organizations. An assumption of the job demand-resource model is that strategies that increase perceived resources actually have a protective effect on employees' occupational well-being, even in the context of demanding working conditions (Bakker et al., 2005). This is important to note, as it may not always be feasible to directly intervene in reducing overall job demands in nursing practice and it may be helpful to explore whether the presence of higher resources (total summated score on the JRIN) has a protective effect on nurses, even in the face of higher demands (total summated score on JDIN). The scales could also be used to identify and compare specific gaps in resources and/or areas where demands are at their highest. For example, the summated mean item score can be calculated and compared for each of the six subscales in both scales (i.e., the summated subscale score divided by the number of items in that subscale); this accounts for the differences in the number of items (e.g., 3-4) in each subscale and allows standard comparisons to be made across subscales.
Comparisons indicating a low degree of agreement (1.0-3.0) to a high degree of agreement (> 3.0) on that particular factor or subscale for either the JRIN Scale or JDIN Scale are therefore straightforward to perform.
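The subscale comparison described above amounts to a small piece of arithmetic, sketched here in Python. The mean item score is the summated subscale score divided by the number of items in that subscale, and the bands follow the cut-offs given in the text (1.0-3.0 low agreement, above 3.0 high agreement). The function names and the example totals are hypothetical and do not reproduce any study results.

```python
def mean_item_score(subscale_total, n_items):
    """Summated subscale score divided by its number of items (range 1-5)."""
    return subscale_total / n_items


def agreement_band(score):
    """Cut-offs from the text: 1.0-3.0 is low agreement, above 3.0 is high."""
    return "low" if score <= 3.0 else "high"


# Hypothetical totals: a 4-item subscale summing to 14 and a
# 3-item subscale summing to 9 for the same participant.
four_item = mean_item_score(14, 4)   # 3.5
three_item = mean_item_score(9, 3)   # 3.0
bands = (agreement_band(four_item), agreement_band(three_item))
```

Dividing by the item count puts 3-item and 4-item subscales on the same 1-5 metric, so a raw total of 14 and a raw total of 9 can be compared directly.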
To support nurses better in their roles and reduce attrition or nursing turnover, the mean subscale scores could be also used to explore the predictive effect that specific job resources (e.g., staffing, collegial support and technology) and job demands (e.g., safety and working conditions) have on important occupational outcomes such as work engagement, burnout and job satisfaction.
As well, the total scores for both scales could be used in more complex multivariate analyses, where models of work engagement and burnout would be explored as potential mediators between higher demands/lower resources and other key outcomes such as psychological health status and organizational commitment (Boudrias et al., 2011).
The lowest job resources mean subscale score found in our analysis was for "Staffing and Time," indicating that nurse participants had a lower level of agreement about having an appropriate mix of support staff, or adequate time to provide comprehensive care, an area of concern commonly identified across the nursing literature, both rural and urban (Shamian, Kerr, Laschinger, & Thomas, 2002;Twigg, Cramer, & Pugh, 2016). When exploring demands in our analysis, "Safety" demonstrated the highest level of agreement for the participants, indicating that their greatest demand was related to their safety being at risk both in their workplace and when off-duty. The rising rates of physical violence and verbal abuse against nurses and other healthcare workers (Phillips, 2016) highlight how crucial it is to include measures of "safety" in exploring current demands faced by the nursing profession. We believe that targeting policy change to nursing-specific areas of concern identified through use of the JRIN and JDIN could assist in developing strategies to address unsafe and/or difficult working conditions (e.g., safety, supervision, isolation and preparedness) and ultimately improve the quality of care provided in rural and urban practice settings.
Historically, nursing workforce studies have focused on homogeneous samples of urban nurses to access larger sample sizes (Molinari & Monserud, 2008). Unique to this study is that the development, refinement and modification of the JRIN Scale and JDIN Scale evolved throughout a three-phase process over 3 years and the concepts that were retained (e.g., collegial support, staffing and time and comfort with working conditions) were determined to have applicability across both rural and urban nursing practice settings. Further evidence for the validity of the scales is necessary in samples of urban nurses, with a key priority to replicate the study using a strictly urban sample of nurses and to examine the reliability of both scales in large tertiary hospitals and/or primary care settings. Ongoing assessment of JRIN and JDIN would also include exploring the international relevance of these scales in countries with similar diversity as Canada in rural and urban practice such as Australia, New Zealand and the USA. In these geographic contexts, similar issues have been identified including being under-resourced (Lenthall et al., 2009) and having lower numbers of nurses practicing in sparsely populated geographical locations (Glasser, Peters, & McDowell, 2006). Finally, identifying the occupational roles of nurses across geographic contexts and comparing the experiences of those in strictly urban practice settings with those in a diversity of rural settings will further strengthen our understanding of key occupational outcomes (e.g., work engagement, burnout and occupational commitment) predicted by high demands and low resources in nursing practice while also assisting with health human resource planning.

| Limitations
In the pilot survey phase of this study, the authors acknowledge that the sample size of 89 respondents was slightly smaller than the target sample of 100. Instead of calculating a content validity index, the decisions to retain/revise items and subscales involved a content evaluation process (i.e., 16-member research team and 19-member advisory team) based on theoretical analysis, item analysis, reliability testing and exploratory factor analysis. The authors also recognize the limitation of developing new instruments in a large national survey while attending to the overlap of related constructs (e.g., removal of the Scheduling subscale). Our inclusion of a "not applicable" category on the items in each scale may have inflated the number of missing cases in both factor analyses, because our listwise deletion criteria excluded any case with at least one missing item in a scale. Between 5% and 10% of participants responded "not applicable" on four items of the job resources technology subscale and 18%-20% on four items of the job demands travel subscale. Case mean imputation for the "not applicable" responses was not calculated due to the potential for artificially high correlations (Tabachnick & Fidell, 2013). Finally, we acknowledge we could have taken a broader approach to developing scales focused on the resources and demands for healthcare professionals as a whole (e.g., focusing on a diversity of disciplines), but we have chosen to develop tools specifically for assessing the resources and demands for regulated nurses.

| CONCLUSION
The 24-item Job Resources in Nursing (JRIN) Scale and 22-item Job Demands in Nursing (JDIN) Scale offer researchers short, simple to administer instruments with acceptable factor structures and good internal consistency reliability as tested in a nationwide cross-sectional survey. Although developed in the context of rural/remote nursing practice, these scales examine specific job demands and resources that a majority of nurses may experience and may mediate or moderate the impact on specific outcomes (e.g., work engagement, organizational commitment and burnout). Use of these scales could assist researchers and managers to better understand the perceived safety and working conditions of regulated nurses and identify specific resources and demands that require action to improve occupational outcomes. Given that our sample is representative of the population of rural and remote nurses throughout Canada, these scales would allow for comparisons to be made across nursing designations, practice settings and geographical areas. Further psychometric testing with urban samples and across different countries to explore their international relevance and make comparisons among various rural/remote and urban practice settings is necessary.

CONFLICT OF INTEREST
No conflict of interest has been declared by any of the authors.
Patient consent: not applicable in this study.

ETHICS STATEMENT
The overall project adhered to the ethical principles outlined in