Decision‐making accuracy of soccer referees in relation to markers of internal and external load

Abstract This study examined the relationships between the decision‐making performances of soccer referees and markers of physiological load. Following baseline measurements and habituation procedures, 13 national‐level male referees completed a novel Soccer Referee Simulation whilst simultaneously adjudicating on a series of video‐based decision‐making clips. The correctness of each decision was assessed in relation to the mean heart rate (HR), respiratory rate (RR), minute ventilation (VE), perceptions of breathlessness (RPE‐B) and local muscular (RPE‐M) exertion and running speeds recorded in the 10‐s and 60‐s preceding decisions. There was a significant association between decision‐making accuracy and the mean HR (p = 0.042; VC = 0.272) and RR (p = 0.024, VC = 0.239) in the 10‐s preceding decisions, with significantly more errors observed when HR ≥ 90% of HRmax (OR, 5.39) and RR ≥ 80% of RRpeak (OR, 3.34). Decision‐making accuracy was also significantly associated with the mean running speeds performed in the 10‐s (p = 0.003; VC = 0.320) and 60‐s (p = 0.016; VC = 0.253) preceding decisions, with workloads of ≥250 m·min−1 associated with an increased occurrence of decisional errors (OR, 3.84). Finally, there was a significant association between decision‐making accuracy and RPE‐B (p = 0.021; VC = 0.287), with a disproportionate number of errors occurring when RPE‐B was rated as “very strong” to “maximal” (OR, 7.19). Collectively, the current data offer novel insights into the detrimental effects that high workloads may have upon the decision‐making performances of soccer referees. Such information may be useful in designing combined physical and decision‐making training programmes that prepare soccer referees for the periods of match play that prove most problematic to their decision‐making.

"maximal" (75-100 au); and (4) running speeds ≥250 m·min−1 were performed in the 10-s prior to a decision.
• In contrast, the accuracy of the referees' decisions was not related to mean measures of HR, RR, or V̇E obtained in the 60-s preceding decisions.
• The relationship between the physiological and decision-making aspects of refereeing performance therefore appears to be a transient one, whereby high internal and external loads experienced immediately prior to infringements may compromise the correctness of the decision.
• Such information may be used to guide the design and delivery of training programmes aimed at preparing soccer referees for the periods of match play that prove most problematic to their decision-making.

INTRODUCTION
Soccer referees are tasked with ensuring that match play is contested in accordance with the Laws of the Game (IFAB, 2022). A key aspect of this responsibility is to identify incidents of foul play, with ~26 fouls awarded per match (Mallo et al., 2012). To enhance their perception of potential infringements, referees must remain close to play, adopting suitable viewing positions without obstructing the ball or players (Mallo et al., 2012). Elite referees therefore cover total distances (TD) of 9-12 km during competitive matches, with high-speed running (HSR) ≥18 km·h−1 accounting for ~1800 m (Krustrup et al., 2009; Mallo et al., 2009). The internal loads elicited during match play are also considerable, with elite referees attaining mean match heart rates (HR) of ~85% of their maximum (HRmax; Krustrup et al., 2009; Mallo et al., 2009). A key challenge facing referees is therefore the necessity to undertake complex perceptual-cognitive processes in combination with elevated levels of physiological stress. In considering the implications that high physiological loads may have on the decision-making process, the relationship between these facets of performance requires investigation (Weston et al., 2012).
In the broad domain of sport and exercise science, many studies have explored the impact of acute exercise upon perceptual-cognitive performance (Basso & Suzuki, 2017; McMorris & Hale, 2012). Synthesising the available evidence, narrative and meta-analytic reviews have generally concluded that moderate-intensity exercise can increase processing speeds, with little effect on response accuracy (Basso & Suzuki, 2017; McMorris & Hale, 2012).
The findings of individual studies do however display notable variability, with reports of both positive and negative effects. Such variability is likely due to differing moderating variables such as the type, intensity, and complexity of the exercise stimulus and perceptual-cognitive task (McMorris & Hale, 2012). For instance, enhancements in a simple multiple-choice reaction time test were previously noted amongst elite soccer players during a simulated match (Wiśnik et al., 2011). Recent data do however indicate that performance during more complex, soccer-specific tasks may decline with increasing physical loads (Alder et al., 2021). Considering the intricate nature of soccer refereeing and the need to make complex judgements during high-intensity, intermittent activity, the ability to extend findings obtained from other domains to soccer officials is perhaps limited.
Efforts to explore the impact of elevated levels of physiological load upon the decision-making of team sport officials have thus far been relatively scarce (Bloß et al., 2020). Recently, Pizzera and colleagues (2022) examined the performances of soccer referees during a video-based decision-making task administered at rest and whilst running at 60% and 80% of maximal oxygen uptake (V̇O2max). Physical exertion reportedly had no influence upon decision-making, with performance remaining comparable between conditions. Several studies have also explored this association in situ, often ascertaining the officials' decision-making accuracy across distinct fixed-time epochs and correlating this against the physiological loads imposed during that same period. In an investigation by Mascarenhas and colleagues (2009), for example, the decision-making accuracy of seven soccer referees was unrelated to their average running speed and TD across 15-min match segments.
A similar investigation amongst Rugby League referees found no association between decision-making performance and mean HR or HSR distance when explored across 10-min intervals (Emmonds et al., 2015). While these data may indicate that decision-making is unrelated to physical load, the analytical approaches adopted are likely insufficient to assess the intricacies of such a relationship. Indeed, current evidence suggests that discrete fixed-time epochs may underestimate physical match demands as compared to rolling averages (Fereday et al., 2020). Aggregating data over prolonged epochs may also conceal acute periods of high physiological stress that could impact decision-making. Although high-intensity episodes last only seconds (Barbero-Álvarez et al., 2012), it is plausible that such exertions may temporarily impair an official's decision-making.
Several theoretical frameworks indeed exist to support a possible negative effect of physical stress on decision-making. One line of thought relates to the attentional control theory, which posits that increasing task demands may place an additional strain on an official's limited attentional resources (Eysenck et al., 2007). That is, at higher levels of physical exertion, referees may be inclined to direct greater attention towards addressing physiological perturbations and ensuring their own postural stability, thereby diverting attention away from the incident at the critical moment. Examining the correctness of decisions immediately following shorter, acute periods of high physical stress may therefore permit a better understanding of the relationship that exists between the physical and decision-making performances of match officials.
A previous investigation amongst soccer referees found no association between the officials' decision-making accuracy and the TD covered 30 s before an incident (Riiser et al., 2019). Similar observations were noted amongst Australian Football umpires, with decision-making accuracy found to be unrelated to the mean running speeds recorded 30 s to 5 min prior to the decision (Elsworthy et al., 2014). Higher relative running speeds performed 5 s before a decision were however associated with a greater occurrence of decisional errors (Elsworthy et al., 2014). While an enhanced physiological demand could explain such findings, this remains speculative as no measures of internal load were obtained. As isolated and repeated bouts of HSR frequently precede crucial match moments like goals and goal-scoring opportunities (Martínez-Hernández et al., 2022), additional research is required to examine their impact on the decision-making accuracy of soccer referees. Indeed, as decisional errors can greatly impact match outcomes, deepening our understanding of the relationship between these facets of performance represents an important step if researchers and practitioners are to positively influence the decision-making performances of soccer referees. That is, by understanding how the physical demands of match play influence the judgements made by soccer officials, training interventions that replicate the types of situations and contexts that prove problematic to their decision-making may be developed. The purpose of the present study was therefore to assess the decision-making performances of soccer referees with respect to the internal and external loads recorded in the moments immediately preceding decisions. Considering the literature discussed, the following hypotheses are proposed: (1) higher internal and external loads recorded in the 10-s and 60-s preceding decisions will be associated with a greater occurrence of decisional errors than lower loads; and (2) stronger associations will be identified using smaller (10-s) than larger (60-s) epochs.

Participants
Thirteen soccer referees (age: 30.4 ± 4.1 years; stature: 177.5 ± 7.5 cm; body mass: 76.9 ± 10.2 kg; and V̇O2max: 53.5 ± 3.5 mL·kg−1·min−1) from the Scottish Football Association (SFA) participated in the current investigation. Participants possessed 7.9 ± 2.0 years of officiating experience and had officiated at a national level for 4.5 ± 1.9 years. On average, participants trained for 3.7 ± 0.8 h/week and officiated 1-2 matches per week within the Scottish Championship and/or Scottish League One. Informed written consent was obtained from all participants prior to testing, and the study received institutional ethical approval.

Preliminary measurements and habituation
Referees attended the laboratory twice during the in-season (October to December), with a maximum of 7 days separating trials. Participants abstained from strenuous exercise in the 48-h preceding each session. Trials were conducted at a similar time of day (±1 h) under standardised environmental conditions (temperature: 19°C; relative humidity: 40%).
Participants' V̇O2max and HRmax were established during a ramp incremental test on a motorised treadmill (Woodway PPS 55sport-I, USA). Following a standardised warm-up, participants commenced running at 8 km·h−1 for 2 minutes, with the speed increased by 1 km·h−1 every minute until 15 km·h−1. Thereafter, speed remained constant with the gradient increased by 1% every minute until volitional exhaustion (Sperlich et al., 2015). Participants were instructed to perform to the best of their ability and received verbal encouragement throughout. Participants' HR and respiratory variables were monitored throughout via HR telemetry (Polar H10, Finland) and breath-by-breath gas analysis (Jaeger Oxycon Pro, Germany), respectively. V̇O2max was considered the highest V̇O2 value recorded using 15-breath rolling averages, with HRmax defined as the highest value recorded. Achievement of at least two of the following criteria confirmed attainment of V̇O2max: (1) plateau in V̇O2 despite an increase in speed; (2) HR within ±10 beats·min−1 of age-predicted HRmax; and (3) a respiratory exchange ratio ≥1.10. Additionally, respiratory rate (RR) and minute ventilation (V̇E) were averaged on a 5-s basis, with the highest value recorded in a 20-s period retained as RRpeak and V̇Epeak, respectively (Buchheit et al., 2009).
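The rolling-average and criteria logic described above can be illustrated with a minimal Python sketch. The function names, the example values, and the choice to pass age-predicted HRmax in as a parameter (the prediction equation is not stated here) are assumptions made for illustration only, not the authors' processing pipeline.

```python
def highest_rolling_mean(values, window):
    """Return the highest mean over any `window` consecutive values.

    With breath-by-breath VO2 values and window=15 this mirrors the
    '15-breath rolling average' definition of VO2max described in the text.
    """
    if window <= 0 or len(values) < window:
        raise ValueError("need at least `window` values")
    return max(
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    )


def vo2max_attained(plateau, hr_peak, predicted_hr_max, rer):
    """Check whether at least two of the three stated criteria are met.

    `predicted_hr_max` is supplied by the caller (assumption: the
    age-prediction formula is not specified in the text).
    """
    criteria = [
        plateau,                                # (1) plateau in VO2
        abs(hr_peak - predicted_hr_max) <= 10,  # (2) HR within 10 beats/min
        rer >= 1.10,                            # (3) respiratory exchange ratio
    ]
    return sum(criteria) >= 2


# Hypothetical illustration with a short window for readability.
example_peak = highest_rolling_mean([1, 2, 3, 4, 5], window=3)
```

A short window is used in the example only to keep the arithmetic easy to follow; the study's definition would correspond to `window=15`.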

Soccer referee simulation
During the main trial, the SRS was performed on a programmable motorised treadmill (Woodway PPS 55sport-I, USA). The validity and reliability of the SRS have been described previously (McEwan et al., 2023). Briefly, two ~16-min blocks interspersed with a 90-s passive recovery period were performed (see Supplemental Figure that resulted in an activity change every 6-23 s. This resulted in 145 activity changes, with the rate of acceleration/deceleration between changes set at 2 m·s−2. Previous data suggest that ~36% of the accelerations/decelerations performed by elite referees during matches are performed at rates of 1.5-2.5 m·s−2 (Castillo et al., 2018). Activity changes were communicated to participants via a visual countdown displayed on a large monitor positioned in front of the treadmill.
Throughout the SRS, each referee was presented with 10 soccer-specific decision-making clips on a 40-inch monitor (NEC MultiSync LCD4010, Japan) positioned ~2 m in front of the treadmill. The number of clips presented to each participant reflected the relative frequency of fouls that occur during a match (Mallo et al., 2012). Video clips were sourced from the Refereeing Department of the Scottish Football Association and represented potential foul-play incidents from club and international European matches. Video clips were selected following consultation with two former international referees (combined total of 26 years on the FIFA list) and were deemed acceptable if they: (1) depicted a foul-play scenario in the central area of the field whereby input from the assistant referee would be limited (Mallo et al., 2012); (2) omitted the in-game official's decision; and (3) were of high visual quality and presented from an in-game perspective. Regarding this latter criterion, whilst clips were sourced from match broadcast footage, they were presented from a vantage point that closely resembled a referee's perspective on the field, rather than a distant grandstand position. We chose not to include potential penalty decisions as these are one of the four moments in which referees may receive assistance from the Video Assistant Referee (VAR). To control for the impact of contextual factors upon participants' decisions, the elapsed time of the match, the score, and the background sound were removed.
Reference decisions were determined by the two former referees, who independently assessed each clip as per Law 12 (Fouls and Misconduct) of the Laws of the Game (IFAB, 2022). To facilitate their decision-making, the experts could view clips multiple times in both real time and slow motion (Spitz et al., 2018). Complete agreement was exhibited between experts for each clip, with decisions deemed to be of a similar level of difficulty. The following reference decisions were reached: no foul (n = 26), foul without caution (n = 39), foul with yellow card (n = 39) and foul with red card (n = 26). Of the 10 video clips administered during each SRS, 5 were administered during a stand phase and 5 were administered during a jog phase, with each of the five different locomotor activities immediately preceding a clip on two occasions. Upon viewing, referees made one of the following four decisions: no foul, foul, foul with yellow card, or foul with red card (Spitz et al., 2018). Participants' decisions were assessed against the reference decisions and were categorised as correct or incorrect.
Throughout each trial, participants' HR and respiratory variables were measured as previously described. As fixed epochs can underestimate physiological demands (Fereday et al., 2020), HR data were processed using 10-s and 60-s rolling averages, with the mean HR recorded prior to each decision retained for analysis. Mean HR data were expressed in relative terms as a percentage of participants' HRmax and were classified into four distinct zones: 60%-69% HRmax, 70%-79% HRmax, 80%-89% HRmax, and ≥90% HRmax (Edwards, 1993). The mean RR and V̇E recorded in the 10-s and 60-s preceding each decision were also calculated and retained as a percentage of participants' RRpeak and V̇Epeak, respectively (Buchheit et al., 2009).
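As a rough illustration of this windowing step, the following Python sketch averages HR samples over the 10 s and 60 s before a decision and assigns the result to a zone. The 1 Hz sampling, the HRmax of 190 beats·min−1, the sample values, and the function names are all hypothetical; the handling of values below 60% or between zone boundaries (e.g., 89.5%) is likewise an assumption.

```python
from statistics import mean


def mean_before(samples, decision_time, window_s):
    """Mean of (time_s, value) samples in the window_s seconds before a decision."""
    window = [v for t, v in samples
              if decision_time - window_s <= t < decision_time]
    return mean(window)


def hr_zone(percent_hr_max):
    """Assign a relative HR value to one of the zones named in the text."""
    if percent_hr_max >= 90:
        return "≥90% HRmax"
    if percent_hr_max >= 80:
        return "80%-89% HRmax"
    if percent_hr_max >= 70:
        return "70%-79% HRmax"
    if percent_hr_max >= 60:
        return "60%-69% HRmax"
    return "<60% HRmax"  # assumption: not a zone defined in the text


# Hypothetical 1 Hz trace: 50 s at 150 beats/min, then a 10 s spike to 180.
samples = [(t, 150) for t in range(50)] + [(t, 180) for t in range(50, 60)]
hr_max = 190  # hypothetical individual HRmax

zone_10s = hr_zone(100 * mean_before(samples, 60, 10) / hr_max)
zone_60s = hr_zone(100 * mean_before(samples, 60, 60) / hr_max)
```

In this invented trace the same decision falls in different zones depending on the window, which is the kind of difference between short and long epochs that the two window lengths are intended to capture.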
Using the CR100 scale, participants provided differential ratings of perceived exertion (RPE) to delineate between perceptions of breathlessness (RPE-B) and local muscular (RPE-M) exertion (Weston et al., 2015). To control for the potential influence of acute fatigue upon RPE, ratings were collected during the jog phase preceding each decision and were obtained in a counterbalanced manner to eliminate order effects (Weston et al., 2015). Participants were habituated with this scale during the preliminary session and received instruction on how to appraise differential RPE. Specifically, participants were informed that RPE-B depends mainly on the breathing rate and/or heart effort, and RPE-M depends mainly on the strain and exertion in the lower limbs (McLaren et al., 2020). Differential RPE were subsequently classified into four arbitrary categories: "nothing at all to moderate" (0-25 au), "moderate to strong" (26-50 au), "strong to very strong" (51-75 au), and "very strong to maximal" (76-100 au) (Lovell et al., 2020).
The average running speeds performed in the 10-s and 60-s preceding decisions were also calculated. To aid the practical application of the findings, running speeds were expressed as a relative measure (m·min−1) and were categorised as: <150 m·min−1, 150-199 m·min−1, 200-249 m·min−1, and ≥250 m·min−1.
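For orientation, a speed in km·h−1 converts to m·min−1 by multiplying by 1000/60, so the SRS speeds of 6, 11, 15 and 18 km·h−1 correspond to 100, ~183, 250 and 300 m·min−1. The Python sketch below shows the conversion and the binning; the function names and the treatment of values falling between the stated bands (e.g., 199.5 m·min−1) are illustrative assumptions.

```python
def kmh_to_m_per_min(speed_kmh):
    """Convert km/h to m/min (1 km/h = 1000 m per 60 min)."""
    return speed_kmh * 1000 / 60


def speed_category(speed_m_min):
    """Assign a mean running speed (m/min) to one of the four bands."""
    if speed_m_min < 150:
        return "<150"
    if speed_m_min < 200:
        return "150-199"
    if speed_m_min < 250:
        return "200-249"
    return "≥250"


# SRS locomotor speeds from the protocol description, in km/h.
srs_speeds_kmh = {"walking": 6, "jogging": 11, "cruising": 15, "HSR": 18}
srs_categories = {
    name: speed_category(kmh_to_m_per_min(kmh))
    for name, kmh in srs_speeds_kmh.items()
}
```

Under this conversion, a mean of ≥250 m·min−1 over a 10-s window implies an average of at least 15 km·h−1 across that window.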

Data analysis
To explore associations between decision-making accuracy and physiological load, the number of correct and incorrect decisions was calculated for each of the intensity bandwidths of the physiological variables. For measures of HR, for example, we established the number of correct and incorrect decisions made within each HR zone (i.e., 60%-69%, 70%-79%, 80%-89%, and ≥90% HRmax). Chi-squared tests of independence were subsequently performed to examine if decision-making accuracy was uniformly distributed across the predefined intensity categories (Nevill et al., 2002). The magnitudes of associations were assessed using Cramér's V (VC) effect sizes and interpreted as: trivial (<0.1), small (0.1-0.29), moderate (0.3-0.49), or large (≥0.5) (Cohen, 1992). To identify cells where observed counts deviated from an expected equal distribution, standardised residuals (SR) were calculated, with values of less than −2 and greater than 2 considered statistically significant (Agresti, 2007). Odds ratios (OR), with uncertainty expressed as 95% CI, were subsequently calculated to examine the likelihood of making a decisional error associated with each categorical variable. Statistical procedures were completed using SPSS 26.0 (IBM, USA), with statistical significance set at p < 0.05.
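The quantities named above can be sketched in plain Python as follows. This is an illustrative outline rather than the SPSS procedure used in the study: the counts are invented, the residual is taken as (observed − expected)/√expected, and the 95% CI for the OR uses the Wald (log) method, both of which are assumptions about definitions not spelled out in the text.

```python
import math


def chi_squared(table):
    """Pearson chi-squared, Cramér's V and standardised residuals.

    `table` is a list of rows of counts, e.g. [correct, incorrect],
    each row holding one count per intensity category.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    residuals = [
        [(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]
    k = min(len(table), len(col_totals)) - 1
    cramers_v = math.sqrt(chi2 / (n * k))
    return chi2, cramers_v, residuals


def odds_ratio(err_high, ok_high, err_low, ok_low, z=1.96):
    """Odds of an error in the high-load category versus all others,
    with a Wald 95% CI computed on the log scale (assumed method)."""
    estimate = (err_high * ok_low) / (ok_high * err_low)
    se = math.sqrt(1 / err_high + 1 / ok_high + 1 / err_low + 1 / ok_low)
    lower = math.exp(math.log(estimate) - z * se)
    upper = math.exp(math.log(estimate) + z * se)
    return estimate, lower, upper


# Invented counts across four intensity zones (NOT the study's data).
correct = [20, 30, 30, 10]
incorrect = [5, 10, 10, 15]
chi2, v, sr = chi_squared([correct, incorrect])

# Highest zone versus the three lower zones combined.
or_est, or_low, or_high = odds_ratio(15, 10, 25, 80)
```

Cells whose residual exceeds 2 in absolute value would be flagged under the threshold stated above; the p-value itself would come from the chi-squared distribution with (rows − 1) × (columns − 1) degrees of freedom, which is omitted here to keep the sketch dependency-free.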

referees remained unaltered by increasing levels of physical exertion (Pizzera et al., 2022). An earlier study by Emmonds et al. (2015) also found no association between the decision-making accuracy of Rugby League officials and metrics of mean and peak HR. While speculative, such discrepancies might stem from differences in the exercise stimulus and the analytical approaches adopted. Firstly, the protocol distinct fixed-time epochs, contextualising these against the physiological loads recorded during that same period. A more nuanced analysis was however performed in the current investigation, assessing the correctness of each decision in relation to the internal loads exhibited immediately preceding the decision. While further research is warranted to identify the underlying mechanisms of this relationship, the disparities noted between the 10-s and 60-s epochs may suggest that the relationship between the physiological and decision-making aspects of refereeing performance is a transient one, whereby high internal loads at the time of an infringement may compromise decision accuracy. Regarding the dual-task paradigm, it has been posited that cognitive performance is likely impaired during high-intensity exercise when both cognitive and physiological demands peak and converge simultaneously (Sudo et al., 2022). As referees attain HRs ≥90% of HRmax during competitive matches (Mallo et al., 2009), such findings hold important implications for soccer referees' decision-making, particularly when the most intense periods of match play align with the requirement to make a decision.
Another novel aspect of the present study concerns the assessment of decision-making in relation to perceived levels of central and peripheral exertion. When RPE-B was rated as very strong to maximal (75-100 au), officials were 7.2 times more likely to make a decisional error. As RPE-B is driven by sensory and affective cues like RR and HR (Borg et al., 2010), these findings are perhaps unsurprising given the reduced decision-making accuracy observed when RR and HR were ≥80% and ≥90% of RRpeak and HRmax, respectively. Moreover, earlier studies have shown that sensations of breathlessness can induce heightened anxiety levels amongst highly trained athletes, particularly at greater levels of ventilation (Faull et al., 2016). Thus, it appears plausible that under high levels of cardiorespiratory stress, attention might have shifted away from the decision at the critical moment, with officials focusing more on stabilising their breathing rate (McEwan et al., 2018). Alternatively, it has been suggested that the onset of hypocapnia, induced by hyperventilation during high-intensity exercise, may lead to a reduction in cerebral blood flow, thus impacting cognitive performance (Smith & Ainslie, 2017).
Nevertheless, this remains speculative, and additional research is required to identify the mechanisms through which high internal loads may compromise the decision-making performances of soccer officials. In comparison, the correctness of the officials' decisions was unrelated to measures of RPE-M. It should however be acknowledged that only one decision was made when RPE-M was rated as very strong to maximal. While soccer refereeing is typically associated with a greater perceived peripheral demand (Castillo et al., 2018), the current cohort reported higher levels of perceived respiratory exertion. Since RPE-M depends mainly on the strain and exertion in the lower limbs, these findings may partly reflect the exclusion of referee-specific movements and changes of direction (Hader et al., 2014). Moreover, moderate correlations have previously been reported between RPE-M and GPS-derived measures of external load such as HSR (Weston et al., 2015). The lower RPE-M observed in the current study may therefore simply be a result of the shorter duration of the SRS protocol compared to match play. Whilst HR measures remain relatively stable across 15-min periods of match play (Krustrup et al., 2009), levels of neuromuscular fatigue develop progressively throughout a match (Goodall et al., 2017). Although it was beyond the scope of the present study, future research may wish to explore the effect that match-related fatigue has on the decision-making performances of soccer officials, particularly towards the latter stages of match play.
Another key finding concerns the significant associations observed between decision-making accuracy and the mean running speeds performed 10 and 60 s prior to a decision. Specifically, officials were 3.8 times more likely to make an error having performed external workloads of ≥250 m·min−1 in the 10-s preceding the decision. These findings align with those of Elsworthy and colleagues (2014), who noted that higher running speeds performed by Australian Football umpires in the 5 s before a decision increased the likelihood of an error being made. Conversely, decision accuracy amongst 11 Norwegian soccer referees appeared unrelated to the TD covered 10 and 30 s before a decision (Riiser et al., 2019). Nonetheless, only 6 of the foul-play decisions made by these officials were deemed incorrect, representing an error rate of only 1.7%. While the reasons for this low error rate remain unclear, the failure of Riiser et al. (2019) to observe any associations between the physical and decision-making performances of their cohort perhaps indicates a lack of statistical power.
The current study is not without limitations. Most important is the use of a simulated match protocol and therefore a reduced level of ecological validity. Another limitation relates to the relatively small sample of preselected video clips, although the frequency of clips was consistent with the relative number of fouls that occur during match play (Mallo et al., 2012). Additionally, video-based testing has been shown to possess high levels of construct and discriminant validity amongst team sport officials (Spitz et al., 2018). While enhanced ecological validity could be achieved within naturalistic settings, the current approach allowed for the physical and decision-making performances of soccer officials to be assessed in isolation from contextual match factors that may further affect decision-making accuracy. Lastly, although officials in the current study were of a national level, future research is required to confirm whether our findings are reproducible within elite populations.

CONCLUSIONS AND PRACTICAL IMPLICATIONS
The present findings suggest that the decision-making performances of soccer referees may be compromised under high cardiorespiratory stress, as evidenced by a marked increase in decisional errors when HR was ≥90% of HRmax and RR was ≥80% of RRpeak in the 10-s preceding the decision. Errors were also more frequent when RPE-B was rated as very strong to maximal (75-100 au) and when high running speeds (≥250 m·min−1) were sustained in the 10 s prior to the decision. These observations therefore reaffirm the importance of high levels of aerobic fitness amongst soccer referees (Castagna et al., 2019). Whilst well-developed physical capacities enable soccer referees to keep up with play and adopt suitable viewing positions, our findings suggest that high levels of physical fitness may also be crucial in managing or mitigating the sudden spikes in physiological stress that can hinder decision-making. It therefore stands to reason that soccer officials should continue to apportion ample time to their physical conditioning. However, in considering the direct impact that acute periods of high physiological stress had on the correctness of the referees' decisions, it would also appear important that officials prepare for such challenges during training (Kittel et al., 2019, 2023). Yet, as highlighted recently (McEwan et al., 2022), the physical and decision-making abilities of soccer referees are typically developed in isolation. Considering this, practitioners may wish to use the present data to inform the design of combined physical and decision-making training sessions that help to prepare soccer officials for the periods of match play that prove most problematic to their decision-making.
Upon completion, participants received ~15 min recovery before being habituated to the Soccer Referee Simulation (SRS) and main experimental procedures. During the habituation process, participants were acclimated to the stochastic, intermittent velocity profile of the SRS and completed 5 practice video clips. Crucially, all participants were familiar with video-based testing, having performed this regularly as part of their training within the previous 2 years. Participants had not however been exposed to the specific clips used within the present study during either training or previous research.
for schematic of SRS). The protocol incorporated varying periods of standing, walking (6 km·h−1), jogging (11 km·h−1), cruising (15 km·h−1), and HSR (18 km·h−1), with the frequency and duration of activities reflecting previous literature (Barbero-Álvarez et al., 2012; Krustrup et al., 2009). Considering the impracticalities of changing speed every few seconds on a motorised treadmill, the frequency and duration of occurrences were manipulated by a factor

FIGURE 1 Decision-making accuracy in relation to mean heart rate (HR; A), respiratory rate (RR; B) and running speed (C) in the 10 s preceding each decision.

employed by Pizzera et al. (2022) sought to replicate mean match play intensities, with referees running at steady intensities of 60% and 80% of V̇O2max. Given this approach, it is unlikely that these officials reached the upper intensities observed in the current study to influence decision-making (i.e., ≥90% of HRmax). Secondly, both studies assessed the accuracy of the official's decisions across

FIGURE 2 Decision-making accuracy in relation to mean heart rate (HR; A), respiratory rate (RR; B) and running speed (C) in the 60 s preceding each decision.

FIGURE 3 Decision-making accuracy in relation to perceptions of breathlessness (RPE-B; A) and local muscular (RPE-M; B) exertion.