Testing the validity of the Networked Hazard Analysis and Risk Management System (Net‐HARMS)

Testing the validity of newly developed methods is a critical component of human factors and ergonomics (HFE) practice. The Networked Hazard Analysis and Risk Management System (Net-HARMS) is a recently developed systems thinking-based risk assessment method that supports the identification of task and emergent risks across overall work systems. This article reports on a validity study of the Net-HARMS method in which participant outputs were compared to an expert analysis developed by the first two authors of this paper and reviewed by subject matter experts. The findings show that individual participant performance was poor for both groups, yet when both groups' analyses were pooled, validity improved significantly. Further, a subject matter expert analysis of the false alarms identified by participants showed that they may in fact represent credible risks. It is concluded that the Net-HARMS method achieves high levels of validity when participants' analyses are pooled. The implications for risk assessment and the validity of HFE methods are discussed.


| INTRODUCTION
There are increasing calls for a systems approach to risk assessment (Dallat et al., 2017b; Hulme, McLean, et al., 2021; Leveson, 2011; Salmon et al., 2017; Stanton et al., 2013), yet these calls have yet to make a significant impact in terms of the availability of such methods (Dallat et al., 2017b; Eidesen et al., 2009; Escande et al., 2016; Leveson, 2011; Pasquini et al., 2011; Stanton & Harvey, 2017). The majority of current risk assessment methods applied across safety-critical domains are focused on individual performance (e.g., pilot, control room operator, driver), are component based, and largely view accidents as linear or chain-of-event in their trajectory (Dallat et al., 2017b). Additionally, most risk assessment methods fail to consider that safety is an emergent property of work systems and therefore not a property of individual components (Leveson, 2004). As part of a wider program of research, these authors developed the Networked Hazard Analysis and Risk Management System (Net-HARMS; Dallat et al., 2017a). Net-HARMS is a systems thinking-based risk assessment method that supports the identification of systemic risks by enabling analysts to describe the system under analysis and identify task and emergent risks via a risk mode taxonomy. Dallat et al. (2017a) applied Net-HARMS to a 5-day hiking and rafting program. Although the application provided encouraging evidence of its utility, Dallat et al. (2017a) concluded that further testing of the method is required. In particular, reliability and validity testing were identified as an important step in the method's development. An essential requirement for Human Factors and Ergonomics (HFE) methods is that they produce valid outputs when used by analysts other than the method developers, and that the same outputs are produced when the method is applied by different analysts on different occasions (Stanton, 2016; Stanton & Young, 1999, 2003). Hulme et al. (2021a) recently compared the reliability and validity of Net-HARMS, STPA, and EAST-BL when used to identify risks across the railway level crossing design lifecycle.
Based on a comparison of the identified risks with a gold standard analysis, using the Signal Detection Theory (SDT) paradigm (Green & Swets, 1966), Hulme et al. (2021a) found a weak to moderate positive correlation coefficient for Net-HARMS. Whilst this provides positive evidence of the capacity of Net-HARMS to support valid identification of risks, further testing in different contexts is required.
This study involved an assessment of the concurrent validity (a measure of current performance sampled) of Net-HARMS, using the SDT paradigm (Green & Swets, 1966). An important feature of risk assessment methods generally is that they tend to be heavily reliant on the expertise of the analyst (Dallat et al., 2017a). It is quite likely, then, that expertise or prior knowledge will influence the validity of a risk assessment method. An interesting recent finding from reliability and validity studies is that HFE methods tend to achieve higher levels of validity when multiple analyses are undertaken by groups of analysts, as opposed to an individual analysis (Cornelissen et al., 2014; Harris et al., 2005; Stanton et al., 2009). This pooling of analysts' work, where independent analysts from across the work system conduct their own analysis and the results are then combined, has been suggested as a tactic that both recognises and addresses the deficits in predictions made by individual analysts (Stanton et al., 2009). Put another way, the increased accuracy achieved by a team of analysts can outperform predictions made by single analysts (Stanton et al., 2009). Previously, this has only been tested in error prediction (Harris et al., 2005) and formative systems modelling (Cornelissen et al., 2014).
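The pooling tactic described above can be sketched computationally. The following is a minimal illustration, assuming each analyst's output and the gold-standard (expert) analysis are represented as sets of risk identifiers; the identifiers and counts below are hypothetical examples, not data from this or any prior study.

```python
# Minimal sketch of pooling analysts' outputs before scoring them
# against a gold-standard analysis. Identifiers are hypothetical.

def pooled_hit_rate(individual_predictions, gold_standard):
    """Union the risks predicted by each analyst, then score the
    pooled set against the gold-standard (expert) analysis."""
    pooled = set().union(*individual_predictions)
    hits = pooled & gold_standard          # risks both pooled set and experts found
    misses = gold_standard - pooled        # expert risks nobody predicted
    return len(hits) / (len(hits) + len(misses))

analyst_a = {"T2.4-T1", "T2.4-T3"}                       # risks found by analyst A
analyst_b = {"T2.4-T3", "T3.3-I1"}                       # risks found by analyst B
gold = {"T2.4-T1", "T2.4-T3", "T3.3-I1", "T4.11-C2"}     # expert analysis

print(pooled_hit_rate([analyst_a, analyst_b], gold))     # 0.75
```

Neither analyst alone covers more than half of the gold standard here, but their union recovers three of the four expert risks, which is the mechanism by which pooling can raise the hit rate.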
The aim of this study was to: (a) test the validity of the Net-HARMS risk assessment method when it is applied by domain and HFE experts, and (b) evaluate whether the validity of Net-HARMS increases significantly when individual analysts' work is pooled. The following sections provide an overview of the case study context, followed by a brief overview of the Net-HARMS methodology. This paper reports on the findings of a study conducted to evaluate the validity of this newly developed systems risk assessment method when applied by novice users.

| Net-HARMS
Net-HARMS (Dallat et al., 2017a) was designed to enable the identification and assessment of both task and emergent risks across work systems. Net-HARMS combines Hierarchical Task Analysis (HTA; Annett et al., 1971) with principles of the Event Analysis of Systemic Teamwork (EAST; Stanton et al., 2013; Walker et al., 2006) and the Systematic Human Error Reduction and Prediction Approach (SHERPA; Embrey, 1986). Net-HARMS has since been applied in various contexts, including elite women's cycling, rail level crossing safety (Hulme, McLean, et al., 2021), and automated vehicles. Figure 1 provides a flowchart describing how to apply the method. The method involves first developing an HTA that describes the system under analysis. To identify task risks, a risk mode taxonomy is applied to each subgoal within the HTA (see Figure 2, Net-HARMS taxonomy). To identify emergent risks, a task network depicting the interactions between different subgoals and tasks is developed based on the HTA. The risk mode taxonomy is then applied to the task network to identify emergent risks that arise when the identified task risks and networked tasks interact with one another. The output includes descriptions of the task and emergent risks and their consequences, optional ratings of probability and criticality, and suggested risk management strategies.
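As a rough illustration of the two stages described above, the analyst's prompts can be thought of as the cross-product of the risk modes with the HTA subgoals (stage 1, task risks) and with the task-network links (stage 2, emergent risks). The modes, subgoals, and links below are simplified placeholders, not the published Net-HARMS taxonomy or the study's HTA.

```python
# Illustrative sketch of the two Net-HARMS stages. All names are
# simplified placeholders, not the published taxonomy.

RISK_MODES = ["Task mistimed", "Task omitted", "Inadequate information"]

subgoals = {"2.4": "Choose location", "2.7": "Develop program outline"}
task_network = [("2.4", "2.7")]  # links between subgoals in the task network

# Stage 1: apply each risk mode to each HTA subgoal to prompt for task risks.
task_risks = [(sid, mode) for sid in subgoals for mode in RISK_MODES]

# Stage 2: apply each risk mode to each link in the task network to prompt
# for emergent risks arising from interacting tasks.
emergent_prompts = [(src, dst, mode) for src, dst in task_network
                    for mode in RISK_MODES]

print(len(task_risks))      # 2 subgoals x 3 modes = 6 prompts
print(emergent_prompts[0])  # ('2.4', '2.7', 'Task mistimed')
```

In practice the analyst judges, for each prompt, whether a credible risk exists and then records its description and consequences; the sketch only shows how the taxonomy systematically enumerates the prompts.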

| Validity test case study context
The context used for the validity study is a 5-day led outdoor activity (LOA) program in Australia, involving the activities of camping and rafting for novice school students. LOAs are formally defined as facilitated or instructed activities in outdoor settings (Salmon et al., 2010), and may include, for example, single to multiday programs involving activities such as camping, canoeing, rafting, bushwalking, teamwork activities, and cycling (Dallat, 2009). Schools conducting LOAs possess a statutory duty of care to those involved to ensure their safety, as far as is reasonably practicable. Consequently, governing bodies within the LOA sector mandate that risk assessments are conducted before the commencement of any LOA (Dallat et al., 2017).
LOA programs have been shown to exhibit the features of a complex sociotechnical system (Carden et al., 2017). Further, a significant number of analyses of both fatal and relatively minor injuries occurring within LOA programs have confirmed that injury incidents are a systems phenomenon, underpinned by a network of interacting contributory factors that span the entire LOA system (see Goode et al., 2013; van Mulken et al., 2017; McLean et al., 2022; Salmon et al., 2017). Accordingly, there is a need for systems thinking risk assessment methods to adequately predict and assess the systemic risks associated with LOA programs (Dallat et al., 2018).
Participants in the study, comprising five LOA professionals and five HFE researchers, were asked to use Net-HARMS to identify the risks associated with the design, development, and delivery of a 5-day camping and rafting LOA program. The specific context for the study was described as follows:
1. Five-day led outdoor education school program.
2. Three groups of 12 participants and two adults.
3. Activities are camping and rafting (expected Grade 2 water level).
4. The school is subcontracting the rafting component to an external provider.
5. Program will be conducted in late November in Eastern Victoria, Australia.
6. Participants are year 9 novices who have never been rafting on a school program before.

The school is also required to follow Department of Education Guidelines. These cover supervision ratios, staff qualifications, communications equipment, emergency procedures, and so forth, and require that schools obtain documented informed consent from parents for their child's participation.
The case study HTA (see Figure 3) was initially developed by the authors of this paper and reviewed by three LOA subject matter experts. The LOA subject matter experts identified two additional tasks.
The HTA identified 54 subgoals associated with the 5-day camping and rafting LOA program (Dallat et al., 2017a). Five main plans were identified: "Initiate program design" (1), "Design program" (2), "Program planning and preparation" (3), "Program delivery" (4), and "Post-program review" (5). For example, Plan 1 describes the subgoals associated with "Initiate program design" (1); operations in this subgoal included "Determine staffing model" (1.5), which refers to the task of identifying whether the school would choose to conduct the rafting and camping program with its own staffing resources, or contract specific expertise from subcontractors. An example from Plan 2 ("Design program") is the subgoal "Develop program outline" (2.7). This subgoal involves the task of planning and documenting the program outline for the group of participants, including planned activities, campsites, and additional external actor involvement (e.g., bus shuttles, food drops). The accuracy of the program outline is subsequently an important component of the subgoal "Establish parent consent" (3.3) in Plan 3 ("Program planning and preparation"). Department of Education policy mandates that, in order for parents to provide informed consent, they must be provided with a complete understanding of the nature of the activities, supervision, and the foreseeable risks involved. An accurate program outline is an important component of this (Dallat, 2009). Notably, the LOA system HTA identified that it is not until Plan 4 ("Program delivery") and the subgoal "Commence and complete activity" (4.11) that participants actually commence the activity of rafting or walking.

Figure 1. Net-HARMS method flowchart (Dallat et al., 2017a). Net-HARMS, Networked Hazard Analysis and Risk Management System.

Figure 2. Net-HARMS taxonomy (Dallat et al., 2017a). Net-HARMS, Networked Hazard Analysis and Risk Management System.
Typically, this is the stage at which existing risk assessments commence (Dallat et al., 2015, 2017).
All plans and subgoals identified in the HTA before the activity commences are directly associated with the tasks of design, program planning, and preparation.

| Design
A test-retest study design was used to evaluate the validity of the method when used by LOA professionals and HFE researchers (split into an LOA group and an HFE group) across two occasions (time 1 and time 2). Participants were asked to apply Net-HARMS to the HTA to identify the risks associated with the design, development, and delivery of the 5-day rafting and camping LOA program. Ethics approval for the study was granted by the University of the Sunshine Coast Human Ethics Committee (S/16/938).
A full assessment of all the task and emergent risks related to the 54 subgoals within the HTA (Figure 3) would be very time-intensive and require significant commitment from all participants. This was not feasible within this specific study. However, to provide a good overall representation of the HTA and maintain a sufficient level of variability of subgoals (i.e., the tasks selected were quite different), tasks from across the system were selected as the basis for the study. For the task risk assessment exercise (stage 1, see Figure 1), participants were allocated five tasks to assess (see Table 1). These tasks were situated within the "Initial Program Design" (1), "Design Program" (2), and "Program Planning and Preparation" (3) phases of the HTA. For the emergent risk assessment exercise (stage 2, see Figure 1), two tasks and five linked tasks were allocated. These tasks were situated within the "Design Program" (2), "Program Planning and Preparation" (3), and "Program Delivery" (4) phases of the HTA (see Table 1). A total of 12 tasks (23%) were analysed from the complete HTA.

Figure 3. HTA of 5-day camping and rafting program (Dallat et al., 2017a). HTA, Hierarchical Task Analysis.

| Participants
Ten participants took part in the study, including five HFE researchers and five LOA professionals; however, the data from one LOA professional were excluded from the results as they were unavailable for time 2, so only the results from nine participants are reported here. The mean age of the LOA professionals was 35.7 years (SD = 7.7) compared to 40 years (SD = 6.1) for the HFE researchers. Table 2 presents a summary of relevant participant characteristics. LOA participants were recruited from a large LOA organisation currently operating in the Australian states and territories of Victoria, New South Wales, South Australia, Tasmania, and the Northern Territory. The HFE researchers were recruited from an HFE research centre within a university in Australia. Participants attended a 1-h training session followed by 2 h to complete the exercise. Participation was entirely voluntary, and no compensation was offered.

| Training material
The training materials for the risk assessment exercise consisted of the case study HTA (see Figure 3), the Net-HARMS taxonomy (see Figure 2), and the task networks (see Figure 4). Participants also received the Net-HARMS method flowchart (see Figure 1) and a description of the specific context on which they would be conducting their analysis. This included the activities, time of year, LOA participant experience, staffing structure, group numbers, and school type.

| Response booklet
A response booklet was developed to provide a standardised approach to collecting information from participants and to provide further instructions on completing the analysis tasks. The first section of the response booklet requested basic demographic information from participants (e.g., age, gender, role, experience in their current role), an estimate of how many risk assessments they had previously conducted, and their previous experience with the three HFE methods underpinning Net-HARMS (i.e., HTA, SHERPA, EAST). The second section of the response booklet provided instruction on assessing the relevant task risks for the analysis. A table was provided for recording the task being assessed, the identified risk mode from the Net-HARMS taxonomy, the task risk description, and the task risk consequences. The third section of the response booklet provided instruction on assessing the relevant emergent risks for the analysis. The booklet provided a table for recording the initial identified task risk, the linked task identified from the task network, the identified risk mode from the Net-HARMS taxonomy, the emergent risk description, and the emergent risk consequences.

Table 1. Summary of the tasks presented to participants to predict task and emergent risks using Net-HARMS.

| Procedure
Both groups (the LOA professionals and the HFE researchers) undertook the risk assessment exercise twice, on different occasions from one another, but followed the same procedure. At both time 1 and time 2 the risk assessment exercise was limited to 3 h in total (1 h of training and 2 h of applying the method). The following procedure was used for both groups at both time 1 and time 2. After informed consent was gained and demographic data collected from all participants, the training material was presented. A brief overview of the systems thinking approach and the theoretical framework (e.g., Rasmussen, 1997) underpinning the development of Net-HARMS was provided. The HTA (Annett et al., 1971) method was described and participants were introduced to the specific LOA HTA that would be used for the study (Dallat et al., 2017a).
Participants were then familiarised with the Net-HARMS taxonomy and provided with a demonstration using a small case study example. How the Net-HARMS method is used to predict task risks was then explained and demonstrated.
An opportunity to practice identifying and assessing task risks on a specific example task, "Choose Location" (2.4), was then provided. Participants were asked to consider each risk mode within the Net-HARMS taxonomy (see Figure 2) and identify credible risks associated with the task of "Choose Location" (2.4). Next, they were asked to record the risk description and consequences of any credible risks associated with this task.
When the participants had completed the task risk identification and assessment, they were then provided with training on identifying and assessing emergent risks. For practice, they were provided with a task network from the case study (see Figure 4) and its identified linked tasks (e.g., "Develop program outline" [2.7] and "Determine resources and staffing requirements" [2.5]). They were then provided with example identified task risks associated with that task (e.g., "Location choice is unsuitable as it did not consider specific needs of participants"). An explanation was then provided as to how to identify emergent risks using the Net-HARMS taxonomy and the task network (see Figure 1, stage 2). Following this explanation, and to complete the practice example, participants were asked to consider each risk mode within the Net-HARMS taxonomy (see Figure 2) and consider what additional foreseeable emergent risks were likely to originate principally from the interaction of the initial task risks associated with "Choose Location" (2.4) and its linked tasks (see Figure 4). If participants identified foreseeable emergent risks, they then recorded their descriptions and consequences.

Figure 4. Case study example of task network for the task of "Choose Location" (2.4).
Before the completion of the training session, participants were offered the opportunity to ask questions or receive clarification, after which they were instructed to commence the risk assessment exercise on their own. This procedure was repeated exactly on the second occasion by the same researcher. No feedback was provided between the two occasions. The period between analyses was 38 days for the HFE researchers and 47 days for the LOA professionals.

| Expert analysis
Before the study, the lead and second author used Net-HARMS and the HTA to identify the task and emergent risks associated with the 5-day rafting and camping LOA program. The lead author possessed 21 years of experience in LOA risk assessment (Dallat, 2009, 2011; Dallat et al., 2017, 2017a, 2017b).

| Data analysis
The validity of Net-HARMS was assessed by comparing the risk modes and risk descriptions identified by participants with those identified in the expert analysis. Specifically, the signal detection paradigm was used to assess the sensitivity of the participants' risk assessments when compared to the expert analysis (Baber & Stanton, 1994; Green & Swets, 1966; Harris et al., 2005; Stanton & Young, 2003; Stanton et al., 2009). This approach has been applied extensively in previous validity analyses of HFE methods (Harris et al., 2005; Stanton & Stevenage, 1998; Stanton et al., 2009; Stanton, 2016).
For each participant, the frequency of hits, misses, correct rejections, and false alarms was calculated. A risk identified by a participant was considered a "hit" if the same risk mode and risk description were identified in the expert analysis. A "miss" was recorded if the participant did not identify a risk mode or risk description identified in the expert analysis. A "false alarm" was recorded if the participant identified a risk mode or risk description not identified in the expert analysis. Finally, a "correct rejection" was recorded when a risk mode from the Net-HARMS taxonomy was identified in neither the expert nor the participant analysis. Hits, misses, false alarms, and correct rejections were then used to calculate hit rates and false alarm rates, as given in Equation (1):

Hit rate = hits / (hits + misses); False alarm rate = false alarms / (false alarms + correct rejections). (1)

According to the literature, acceptable levels of validity sit around 0.7 for hit rates (Baber & Stanton, 1996; Harris et al., 2005; Stanton & Stevenage, 1998; Stanton et al., 2009) and between 0.3 and 0.4 for false alarm rates (Stanton et al., 2009). Individual, group pooled, and combined group pooled hit and false alarm rates for the analyst groups were calculated for aggregated task and emergent risks using the formulas in Equation (1).
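The scoring in Equation (1) can be sketched as follows. The counts used here are illustrative examples only, not results from the study.

```python
# A minimal sketch of the signal-detection scoring described above.
# The counts below are illustrative, not results from the study.

def sdt_rates(hits, misses, false_alarms, correct_rejections):
    """Return (hit rate, false alarm rate) per Equation (1):
    hit rate = hits / (hits + misses);
    false alarm rate = false alarms / (false alarms + correct rejections)."""
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    return hit_rate, fa_rate

# Example: 14 hits, 6 misses, 9 false alarms, 21 correct rejections.
hr, fa = sdt_rates(14, 6, 9, 21)
print(f"hit rate = {hr:.2f}, false alarm rate = {fa:.2f}")  # 0.70 and 0.30
```

Note that the two rates have different denominators: the hit rate is scored against everything the experts identified, while the false alarm rate is scored against everything they (correctly, by assumption) did not.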
The initial analysis of the data revealed a high number of false alarms and unacceptable false alarm rates in both groups. Previous studies have indicated that it may be imprudent to dismiss false alarms, and rather to treat them as errors/risks that have not yet eventuated or been experienced by subject matter experts (see Harris et al., 2005; Stanton et al., 2009). Consequently, the false alarms identified by participants were evaluated to identify whether they could represent credible or foreseeable risks that were not identified in the expert analysis. This analysis was undertaken by an LOA subject matter expert with over 20 years of experience who had not been involved with earlier aspects of the study. Each participant false alarm (risk mode and risk description) was judged as either credible or not credible based on the subject matter expert's opinion as to whether the risk could conceivably eventuate. Credible was defined as "reasonably likely", meaning a probability of greater than five on a scale of one to ten. Noncredible was defined as "not reasonably likely", meaning a probability of less than five on a scale of one to ten. For example, for the task of "Choose Activities" (2.3), one participant identified the false alarm, "Culture of the school impacts the choice of activities, e.g., school has a 'hardcore,' 'macho' culture and expects its students to paddle 20 km a day" (Net-HARMS taxonomy mode, E1). This was judged as credible by the subject matter expert. In contrast, for the task "On program review of pre-existing medical and dietary needs" (4.8) and the linked task "Incident Response" (4.15), an emergent risk false alarm identified was, "Staff give information about medical condition to emergency services too late, once inappropriate treatment has begun" (Net-HARMS taxonomy mode, C4). This was judged as noncredible by the subject matter expert.

| False alarm rate: Task risks

| Individual participants
Individual false alarm rate scores for the task risks identified at time 1 and at time 2 are presented in Figure 6, along with the pooled LOA, pooled HFE, and pooled LOA and HFE groups.
For the LOA group, the false alarm rates ranged from 0.20 to 0.27 at time 1, and from 0.11 to 0.29 at time 2. The false alarm rates for the HFE group ranged from 0.15 to 0.38 at time 1, and from 0.15 to 0.25 at time 2.

| Pooled participants
Again, similar to the pooled task hit rate scores, the pooled task risk false alarm rates were higher than the individual scores. For the LOA group, the pooled false alarm rate was 0.86 and 0.64 at time 1 and time 2, respectively. The HFE group had pooled false alarm rates for task risks of 1.00 and 0.85 at time 1 and time 2, respectively. When the LOA and HFE groups' scores were pooled together, the false alarm rates were 0.95 at time 1 and 0.98 at time 2.

| Individual participants
Participants' hit rate scores for the emergent risks identified at time 1 and at time 2 are presented in Figure 7, along with the pooled LOA, pooled HFE, and pooled LOA and HFE groups. When the LOA and HFE groups' scores were pooled together, the false alarm rate increased at both time 1 (0.9) and time 2 (0.92).

| Task and emergent risks
Due to the high number of false alarms identified by both groups, a subject matter expert was asked to evaluate the extent to which they represented credible risks that could potentially occur. The number of task and emergent risk false alarms deemed to be credible risks by the subject matter expert at time 1 and at time 2 is presented in Figure 9. This shows that a high proportion of the false alarms were considered credible by the subject matter expert, across both groups, for both task and emergent false alarms.

| DISCUSSION
Testing the validity of HFE methods is a critical but often overlooked component of methods development. The aim of this study was to test the validity of a recently developed systems thinking risk assessment method when used by both domain and HFE experts. The study also sought to determine whether pooling the results of individual participants improved the validity of the method (Cornelissen et al., 2014).
The findings are compelling and have significant implications for risk assessment and for the HFE discipline. First, although the individual hit rate scores for task and emergent risks were unsatisfactory, the pooled results showed a marked improvement, with very high levels of validity when compared to the expert analysis. Combined with Net-HARMS' previous moderate performance in validity studies (Hulme et al., 2021a), this suggests that pooling individual analyses can enhance the sensitivity and validity of the Net-HARMS method. This finding mirrors that of Cornelissen et al. (2014) and may have broader implications for risk assessment generally. Notably, this Net-HARMS study was not conducted in a team-based environment where all members identified potential risks together; rather, the study was completed individually and the results were then pooled. This adds support to a body of work in both accident analysis studies (e.g., Leveson, 2004) and formative systems analysis methods studies (Cornelissen et al., 2014) surrounding the benefits of a multi-analyst approach.
These findings are also significant for the LOA domain. Previous studies have indicated that inadequate risk assessment is an often-featured contributory factor in deaths and injuries on LOA programs (Salmon, 2016; Salmon et al., 2014; White, 2014), and further studies have reported that a common practice within the LOA domain is for practitioners to complete risk assessments largely on their own, or with peers working in similar roles (Dallat et al., 2015, 2017; Parkin & Blades, 1998).
Moreover, an interesting feature of this study was that the LOA group comprised participants from across the levels of an LOA organisation; the findings therefore highlight the importance of gaining perspective, consideration, and input from all levels of the organisation when undertaking risk assessments. Accordingly, these findings question the appropriateness of conducting risk assessments in isolation within organisations, as is often the case when a safety manager is employed (Dallat et al., 2017; Rae & Alexander, 2017). Leveson (2004) has argued, within an accident analysis context, that different people situated across the multiple hierarchical levels of a work system will possess different views of the system and its safety-related processes. These multiple perspectives should be solicited to provide a more accurate understanding and picture of the whole system under analysis (Leveson, 2004). The present findings lend support to this position from a risk assessment context.
The results also indicate that the group pooled results were largely consistent across both task and emergent risks, for both groups, and across time 1 and time 2. This is encouraging, as it demonstrates that the LOA professionals, who had no previous experience or expertise in systems methodologies, were generally as consistent in their ability to use the Net-HARMS method to predict emergent risks as the HFE researchers who possessed method expertise. Of interest, the LOA group recorded a higher hit rate for emergent risks than the HFE group. This perhaps speaks to their domain-specific expertise and higher level of understanding of interacting risks and tasks. Conversely, the results show that the HFE group, all of whom possessed prior experience in using either SHERPA (Embrey, 1986), HTA (Annett et al., 1971), or EAST (Stanton et al., 2013; Walker et al., 2006), predicted similar numbers of correct risks to the domain-specific practitioners.
Specific prior expertise in either the LOA domain or in systems thinking may have influenced the improvement from time 1 to time 2. All LOA analysts improved their task risk predictions with increased familiarity with the method (Stanton & Stevenage, 1998); however, for emergent risks, this was not the case. The HFE researchers showed the opposite pattern, with improvement from time 1 to time 2 occurring across the emergent risks only. Of interest, emergence, and particularly identifying and assessing emergent risks, is a new concept for the LOA domain. To date, only one publication has specifically addressed this subject within risk assessment (Dallat et al., 2017a). Conversely, the importance of understanding emergence has been discussed within the HFE literature for several decades (e.g., Donovan et al., 2017; Hollnagel, 2012; Leveson, 2011; Rasmussen, 1997; Salmon et al., 2017).
An additional interesting finding emerged when pooling the LOA and HFE researchers' results together. Across both task and emergent risks, and at both time 1 and time 2, this "multiple domain pooling" improved on the individual group pooled scores of both the LOA and HFE groups. This finding suggests that the combination of domain expertise and systems thinking expertise should be considered in order to identify as many foreseeable risks as possible across the entire work system when conducting risk assessment activities.
While pooling the results across the groups improved the hit rate, the false alarm rate also increased; however, most of the false alarms were considered credible by an independent domain subject matter expert. The high number of false alarms deemed to be credible supports the view that false alarms in error prediction and risk assessments may in fact represent errors, risks, or contributory factors waiting to happen (Harris et al., 2005; Stanton et al., 2006). This suggests that dismissing them as inaccurate would be unwise and, at the extreme, dangerous. The results suggest that many of the false alarms represent credible and important risks that must be considered in the risk assessment process. These findings question the treatment of false alarms when using "expert" performance as a reference point in assessing the validity of HFE methods.
There were some notable study limitations. The small sample size means that the results should be treated with some caution, although Wallace and Ross (2006) argue that six to eight participants are sufficient for a method evaluation, as more tend to produce diminishing returns. The results do suggest, however, that participant characteristics may have a significant impact on the outputs produced when using Net-HARMS. Further testing with different samples and variable expertise is therefore required to determine whether the results are replicable. This would also be beneficial because, although the study included a representative set of tasks from across the system, the method was only applied to a small subset of the HTA. The replication of validity results is almost entirely overlooked in the HFE literature. Finally, there was a difference between the two groups in the interval between time 1 and time 2 testing due to availability; however, this difference was not extensive (9 days).
Figure 9. Credible false alarms for task and emergent risks.