Evaluating mental workload during multitasking in simulated flight

Abstract

Background: Pilots must process multiple streams of information simultaneously. Mental workload is one of the main issues in man–machine interactive mode when dealing with multiple tasks. This study aimed to combine functional near‐infrared spectroscopy (fNIRS) and electrocardiogram (ECG) to detect changes in mental workload during multitasking in a simulated flight.

Methods: Twenty‐six participants performed three multitasking tasks at different mental workload levels. These mental workload levels were set by varying the number of subtasks. fNIRS and ECG signals were recorded during tasks. Participants filled in the National Aeronautics and Space Administration Task Load Index (NASA‐TLX) scale after each task. The effects of mental workload on scores of NASA‐TLX, performance of tasks, heart rate (HR), heart rate variability (HRV), and the prefrontal cortex (PFC) activation were analyzed.

Results: Compared to multitasking in lower mental workload conditions, participants exhibited higher scores of NASA‐TLX, HR, and PFC activation when multitasking in high mental workload conditions. Their performance was worse during the high mental workload multitasking condition, as evidenced by the higher average tracking distance, smaller number of response times, and longer response time of the meter. The standard deviation of the RR intervals (SDNN) was negatively correlated with subjective mental workload in the low task load condition and PFC activation was positively correlated with HR and subjective mental workload in the medium task load condition.

Conclusion: HR and PFC activation can be used to detect changes in mental workload during simulated flight multitasking tasks.

particular task at the moment (O'Donnell & Eggemeier, 1986). Mental workload usually results from tasks requiring lower physical demand but higher demand on cognition, thinking, and judgment of the operator (Wickens et al., 2016). However, the cognitive resources of humans are fundamentally limited (Wickens, 2002). In some conditions, pilots must perform several tasks simultaneously, each with a different priority. The simultaneous appearance of multiple task information leads to a high mental workload (Hsu et al., 2015). In fact, high mental workload does not inherently have bad consequences, but high mental workload during multitasking can leave fewer residual resources for concurrent tasks, which may result in task management errors (Wickens et al., 2016). In turn, task management errors may influence task switching. In that case, pilots may omit vital information due to the cognitive tunneling phenomenon, which is defined as "the allocation of attention to a particular channel of information, diagnostic hypothesis or task goal, for a duration that is longer than optimal, given the expected cost of neglecting events on other channels, failing to consider other hypotheses, or failing to perform other tasks" (Wickens, 2005). Many studies have highlighted that performance declines when cognitive demands exceed the operator's cognitive resources (Chenot et al., 2021; Fallahi et al., 2016; Puma et al., 2018; Stojan & Voelcker-Rehage, 2021). Therefore, it is of theoretical and practical significance to evaluate the mental workload of pilots during multitasking. According to Dehais et al. (2020), the objective of measuring the mental workload of pilots during multitasking is to predict the probability of performance impairment.
Mental workload is an abstract attribute of man-machine interaction, which cannot be directly observed (Matthews et al., 2015).
The evaluation of mental workload is generally conducted through various methods such as subjective reports, performance evaluations, and physiological measurements (O'Donnell & Eggemeier, 1986).
One of the most widely used tools for subjective reports is the National Aeronautics and Space Administration Task Load Index (NASA-TLX) scale (Hart, 2006). Performance evaluation can be divided into primary and secondary task performance evaluations. The primary task is the task that has processing priority when the operator needs to complete multiple tasks at the same time. While giving priority to the primary task, the operator uses their residual capacity to complete the other task, the secondary task (Liu & Wickens, 1994). Methods of physiological measurement to evaluate mental workload include electroencephalogram (EEG), electrocardiogram (ECG), eye movement, and functional near-infrared spectroscopy (fNIRS). Mental workload cannot be estimated precisely with a single index or method because individual and environmental factors will affect the mental effort deployed to perform a given task (Wanyan et al., 2014). Therefore, a comprehensive evaluation method is necessary. Lehrer et al. (2010) combined heart rate variability (HRV), NASA-TLX, and task performance to evaluate mental workload during a simulated flight with a Boeing 737-800 Level D flight simulator. They found cardiac assessment to be a useful addition to self-reported measures for determining flight task mental workload and risk for performance decrements. HRV is widely applied in the evaluation of mental workload because the recording is noninvasive and the ECG signal is easy to extract and analyze. HRV describes beat-to-beat variation in heart rate (HR) or small differences in RR intervals, thus reflecting the function of the autonomic nervous system (ANS). HRV comprises two components, namely, sympathetic and parasympathetic components.
When humans are in a state of heavy mental workload, cardiac activity is controlled mainly by sympathetic nerves, whereas cardiac activity is normally controlled mainly by the vagal nerves (Lean & Shan, 2012).
HR increases and the parasympathetic components of HRV decrease in situations of higher mental workload (Lehrer et al., 2010). ECG is one of the earliest physiological methods used to evaluate pilots' mental workload (Roscoe, 1992). In an early study, HR and HR irregularity (HI) were used to assess mental workload at different flight phases in a simulated flight and were found to differ from phase to phase (Opmeer & Krol, 1973) (Mansikka et al., 2016). Many recent studies showed that HR and HRV are sensitive to different task demands and can distinguish between levels of mental workload in simulated flights (De Rivecourta et al., 2008; Lehrer et al., 2010; Mansikka et al., 2016) or actual flights (Bonner & Wilson, 2002; Skibniewski et al., 2015; Veltman, 2002; Wilson, 2002). However, some studies showed that HR was not sensitive to mental workload (Lee & Liu, 2003; Wanyan et al., 2014) and others showed the same about HRV (Gentili et al., 2014; Hidalgo-Muñoz et al., 2018).
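The cardiac indices discussed above can be illustrated with a minimal sketch. It assumes the RR intervals have already been extracted from the ECG as a list of values in milliseconds and computes mean HR, SDNN (standard deviation of the RR intervals), and RMSSD (root mean square of the successive differences of the RR intervals). The function names and sample values are hypothetical and are not taken from the study's analysis pipeline.

```python
import math
from statistics import mean, stdev


def heart_rate_bpm(rr_ms):
    """Mean heart rate in beats per minute from RR intervals in milliseconds."""
    return 60000.0 / mean(rr_ms)


def sdnn(rr_ms):
    """Standard deviation of the RR intervals (sample SD, in ms)."""
    return stdev(rr_ms)


def rmssd(rr_ms):
    """Root mean square of successive RR-interval differences (in ms)."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(mean([d * d for d in diffs]))


# Hypothetical RR series (ms), for illustration only.
rr = [800, 810, 790, 805, 795]
print(heart_rate_bpm(rr), sdnn(rr), rmssd(rr))
```

The sample standard deviation (n − 1 denominator) is used for SDNN here; some tools use the population form instead, so the convention should be checked before comparing values across software.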
Compared with ANS, mental workload is more correlated with the central nervous system (CNS) (Miura et al., 2016). fNIRS is a functional brain imaging method that can be used to evaluate mental workload by assessing CNS activity. Compared with other imaging modalities, it has the advantages of safety, portability, low cost, and high temporal resolution, and it is applicable in procedures involving mobility and interactivity (Boas et al., 2014). It sends out near-infrared light of 700−900 nm into the cerebral cortex tissue. The near-infrared light is refracted and absorbed by tissue and then passes through the cerebral cortex. The changes in the hemodynamic indices of the cerebral cortex are then measured by calculating the spectral changes of the near-infrared light through the cortex. The main chromophores of near-infrared light in the tissue are oxygenated hemoglobin (HbO) and deoxygenated hemoglobin (HbR) (Ferrari & Quaresima, 2012). Therefore, it can convert the spectral changes to the changes in hemoglobin concentration using the modified version of the Beer-Lambert law (Boas et al., 2014). The changes in hemoglobin concentration can reflect the metabolism of oxygen in the brain. The prefrontal cortex (PFC) has a functional relationship with mental work, and the allocation of attention resources across multiple tasks has also been associated with activity in the PFC (Mckendrick et al., 2016). Previous fNIRS-based mental workload studies suggested that fNIRS was sensitive to mental workload (Liu et al., 2017). The fNIRS technology has been used in simulated-flight- and actual-flight-based assessments to evaluate mental workload. In a simulated flight study applying fNIRS, the difficulty of a secondary task was used to manipulate mental workload, and a main effect of mental workload, with a higher HbO level under the high mental workload condition, was found (Mouratille et al., 2020). Verdière et al. (2018) used fNIRS connectivity metrics to discriminate between two different landing conditions (manual vs. automated). Besides offline analysis, Gateau et al. (2015) adopted an online fNIRS classifier based on support vector machine to detect working memory load accurately during interaction tasks between a pilot and air traffic control (ATC) within a simulated flight. In addition to simulated flight settings, fNIRS has also been applied in actual flights. Gateau et al. (2018) contrasted changes in the concentrations of oxygenated hemoglobin in PFC between an actual and a simulated flight, and the results showed that pilots in the actual flight condition had higher anterior PFC activation than pilots in the simulator.
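The conversion from optical changes to hemoglobin changes via the modified Beer-Lambert law, mentioned above, can be sketched as follows. In its usual two-wavelength form, the change in optical density at each wavelength is modelled as (ε_HbO · ΔHbO + ε_HbR · ΔHbR) · d · DPF, where d is the source-detector distance and DPF is the differential pathlength factor, so two wavelengths give a 2 × 2 linear system. The extinction coefficients, distance, and DPF values below are illustrative placeholders, not tabulated constants and not the settings used in this study.

```python
def forward_delta_od(d_hbo, d_hbr, extinction, distance_cm, dpf):
    """Optical-density changes predicted by the modified Beer-Lambert law."""
    return tuple(
        (e_hbo * d_hbo + e_hbr * d_hbr) * distance_cm * dpf_i
        for (e_hbo, e_hbr), dpf_i in zip(extinction, dpf)
    )


def mbll_two_wavelengths(delta_od, extinction, distance_cm, dpf):
    """Solve the two-wavelength system for (delta HbO, delta HbR).

    delta_od:   (dOD at wavelength 1, dOD at wavelength 2)
    extinction: ((eHbO_1, eHbR_1), (eHbO_2, eHbR_2))
    dpf:        (DPF at wavelength 1, DPF at wavelength 2)
    """
    # Normalise each optical-density change by its effective path length.
    y1 = delta_od[0] / (distance_cm * dpf[0])
    y2 = delta_od[1] / (distance_cm * dpf[1])
    (a, b), (c, d) = extinction
    det = a * d - b * c
    if det == 0:
        raise ValueError("extinction matrix is singular; wavelengths not separable")
    # Cramer's rule for the 2 x 2 system.
    d_hbo = (y1 * d - b * y2) / det
    d_hbr = (a * y2 - c * y1) / det
    return d_hbo, d_hbr


# Placeholder coefficients (assumed values, for illustration only).
EXT = ((1.0, 3.0), (2.5, 1.5))
od = forward_delta_od(0.002, -0.001, EXT, distance_cm=3.0, dpf=(6.0, 6.0))
print(mbll_two_wavelengths(od, EXT, distance_cm=3.0, dpf=(6.0, 6.0)))
```

The units of the recovered concentration changes follow from the units of the extinction coefficients and the path length, so real analyses need published coefficients for the device's actual wavelengths.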
These studies suggest that ECG and fNIRS are suitable physiological assessment methods for studying mental workload in flight environments. In some flight phases, pilots need to deal with multiple subtasks at the same time, so multitasking becomes more important. Therefore, it is necessary to evaluate the mental workload of multitasking in a specific flight environment. Hsu et al. (2015) assessed mental workload during the National Aeronautics and Space Administration Multi-Attribute Task Battery, which includes four subtasks: system monitoring, resource management, communication, and tracking. The subtasks were completed at the same time and three levels were set according to the frequency of event occurrence.
They found that the low frequency over high frequency ratio (LF/HF) was sensitive to different high mental workload levels. However, To test this hypothesis, the changes in subjects' PFC hemodynamics during an engaging flight multitasking task were monitored using fNIRS, and PFC activation was discussed. The changes in the NASA-TLX scores, multitasking performance, mean HR, and HRV were analyzed at the same time. The MATB-II task is a classical simulated flight multitasking model, but to eliminate the learning effect, the substantial training given to the participants in the MATB-II task took more than 80 min (Hsu et al., 2015). Classical psychological tasks have little learning effect but are quite different from flight activity. For this purpose, we designed a semiecological multitask model that simulates flight operations abstractly and has smaller learning effects. The model simulates the content of multiple tasks in actual flight such as flight operation, instrument monitoring, and emergency handling.

Flight task
The task model simulates the multitasking during flight. The multitasking task consists of four subtasks: the flight target tracking task, the meter monitoring task, the emergencies handling task, and the residual capacity task. The residual capacity task is a secondary task, and the other three tasks are primary tasks. The program was run on a computer and the task interface is shown in Figure 1. The upper part of the interface is the task selection area, the middle is the flight target tracking task area, the bottom is the meter monitoring task area, the left side is the emergencies handling task area, and the right side is the residual capacity task area.

Flight target tracking task
Participants were required to track an aircraft-shaped moving target by keeping a circular cursor aimed at it. The cursor was controlled with a joystick on the right-hand side.
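Tracking performance in this kind of task is commonly summarized as an average cursor-to-target distance. The sketch below assumes that the cursor and target positions are sampled at the same time points as (x, y) screen coordinates and that the distance is Euclidean; the source does not state how the average tracking distance was computed, so both the function name and these choices are assumptions.

```python
import math


def mean_tracking_distance(cursor_xy, target_xy):
    """Average Euclidean distance between paired cursor and target samples."""
    if not cursor_xy or len(cursor_xy) != len(target_xy):
        raise ValueError("need two non-empty position lists of equal length")
    total = sum(math.dist(c, t) for c, t in zip(cursor_xy, target_xy))
    return total / len(cursor_xy)


# Hypothetical samples: the cursor is on target once and 5 units away once.
print(mean_tracking_distance([(0, 0), (3, 4)], [(0, 0), (0, 0)]))
```

A larger value indicates poorer tracking, which matches the direction of the performance measure reported in the Results.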

Meter monitoring task
There were four round dashboards in the meter monitoring task area.
Each dashboard had a red area defined as a warning area, and the location of each warning area was different. At the beginning of the task, the pointer of each dashboard began to rise clockwise from the bottom at different speeds. Participants were asked to respond by pressing a key when the pointer reached the warning area. Once the subjects responded correctly, the pointer immediately fell back and began to rise clockwise again. If subjects did not respond in time, the pointer continued to rise, until it was over the warning area and then fell back automatically and began to rise clockwise again.

FIGURE 1 The interface of the multitask model

Emergencies handling task
During the task, several red dots appeared sporadically in the emergency handling task area. The number of red dots was a random number between 10 and 20. Participants were asked to count the number of red dots and press the digit of the number. The red dots disappeared once subjects gave the correct response and the next group of red dots appeared 30 s later. If participants had not responded correctly in time, the red dots disappeared automatically 30 s after their appearance, and the next group of red dots would appear.

Residual capacity task
There was always a number in the residual capacity task area during the task. Participants were asked to prioritize primary tasks and then respond by pressing the same number key. The number disappeared once participants gave the correct response, and the next number appeared immediately. If participants did not respond or they responded incorrectly, the number remained until the end of the task.

Subjective ratings
The National Aeronautics and Space Administration-task load index (NASA-TLX), developed by NASA in 1988, is a multidimensional mental workload assessment scale (Hart & Staveland, 1988).

Procedure
Before the experiment, participants were introduced to the content and process of the experiment and were then asked to read and sign the informed consent form. Participants read the description of the Edinburgh Handedness Inventory scale and completed the scale. They then went through a practice round of the multitasking task, which lasted about 10 min. During the practice, the participants' operations were observed to ensure that they learned how to complete the task. After the practice, the participants wore the fNIRS device head band and ECG device chest band. Care was taken that the light sources and the detectors were completely touching the forehead skin, and hair from the eyebrows or side of the head was kept away from the sources and detectors.
Participants rested for 1 min and then began the multitasking task.
Three multitasking tasks of different task load levels had to be completed. In the low load condition phase, participants were asked to accomplish the flight target tracking and residual capacity tasks simultaneously. In the medium load condition phase, they were asked to accomplish three tasks simultaneously, that is, all subtasks except the emergencies handling task. In the high load condition phase, they were asked to accomplish all four tasks at the same time. Each task lasted 180 s, and the sequence was counterbalanced among the participants. After the completion of each multitasking task, participants completed the NASA-TLX scale and rested for 1 min. After completing the last multitasking task, the head and chest bands were removed and the experiment was complete. The experiment lasted approximately 1 h.
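The three load conditions and a counterbalanced presentation order can be written down as a small configuration sketch. The subtask composition of each condition follows the description above; the counterbalancing scheme itself (cycling through all six possible orders) is an assumption, since the source states only that the sequence was counterbalanced among participants.

```python
from itertools import permutations

# Subtasks active in each load condition, as described in the Procedure.
CONDITIONS = {
    "low": ("flight target tracking", "residual capacity"),
    "medium": ("flight target tracking", "meter monitoring", "residual capacity"),
    "high": (
        "flight target tracking",
        "meter monitoring",
        "emergencies handling",
        "residual capacity",
    ),
}


def counterbalanced_orders(n_participants):
    """Assign each participant one of the 3! = 6 condition orders in rotation."""
    orders = list(permutations(CONDITIONS))
    return [orders[i % len(orders)] for i in range(n_participants)]


for participant, order in enumerate(counterbalanced_orders(26), start=1):
    print(participant, order)
```

With 26 participants, the six orders cannot be used equally often; under this rotation two orders occur five times and the other four occur four times.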

Statistical analysis
Raw fNIRS data (8 channels × 2 wavelengths) were preprocessed were not performed for correlation analysis. A 5% significance level was adopted in all tests and Bonferroni correction was used for multiple comparisons.

Subjective load
One-way repeated measures ANOVA conducted on NASA-TLX scores revealed significant main effects of task load [F (2, 26) = 56.96, p < .001]. High task load had the highest NASA-TLX score (p < .016 with Bonferroni correction for multiple testing). Figure 3a shows the total scores of NASA-TLX for the three different multitasking tasks.
The scores of each dimension are shown in Figure 3b. High task load had a higher score for mental demand than low task load and medium task load (p < .016 with Bonferroni correction for multiple testing). The scores of temporal demand increased gradually with the increase in task load (p < .016 with Bonferroni correction for multiple testing).
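The corrected threshold quoted above can be reproduced with a short calculation. Assuming the post-hoc tests are the pairwise comparisons among the three task load levels, there are three comparisons, and dividing the 5% significance level by three gives about .0167, which the text appears to report truncated as .016. The helper names below are hypothetical.

```python
def n_pairwise_comparisons(n_levels):
    """Number of distinct pairs among n_levels conditions."""
    return n_levels * (n_levels - 1) // 2


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold under Bonferroni correction."""
    return alpha / n_comparisons


k = n_pairwise_comparisons(3)      # low vs. medium, low vs. high, medium vs. high
print(k, bonferroni_alpha(0.05, k))
```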

PFC activation
The PFC activation (
Abbreviations: LF/HF, ratio of the low frequency over the high frequency; RMSSD, root mean square of the successive difference of the RR intervals; SDNN, standard deviation of the RR intervals.

Correlation analysis
The analysis showed significant correlations in three tests. Figure 5a shows a negative correlation between SDNN and NASA-TLX in low task load [R (26) = −0.43, p = .029]: the higher subjective mental workload was, the smaller the SDNN became. Figure 5b shows a positive correlation between brain activation and NASA-TLX in medium task load [R (26) = 0.44, p = .024]: the higher subjective mental workload was, the more the PFC was activated. Figure 5c shows a positive correlation between brain activation and HR in medium task load [R (26) = 0.44, p = .025]: the higher the PFC activation was, the higher HR became.

FIGURE 5 The scatterplot with regression lines
None of the other correlations were significant (R < 0.40 and p > .05 in all cases).
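For readers who want to check correlation statistics of this kind, the sketch below computes a Pearson correlation coefficient and the t statistic used to test it against zero. It assumes that the reported coefficients are Pearson correlations and that the 26 in "R (26)" is the sample size; the source does not state either point explicitly. Comparing the t statistic with a t distribution on n − 2 degrees of freedom gives the two-sided p value; that step needs a statistics library and is left out here.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two samples of equal length, n >= 3")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# t statistic implied by the reported SDNN correlation, assuming n = 26.
print(t_statistic(-0.43, 26))
```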

DISCUSSION
The purpose of this experiment was to use fNIRS and ECG to measure the functional activation of the PFC and HR variability to evaluate the mental workload of multitasking during a simulated flight. The program simulated flight characteristics such as target tracking, meter monitoring, and emergency handling. The difficulty of the multitasking tasks was set by the number of subtasks completed at the same time.
NASA-TLX scores increased as the number of subtasks increased, indicating that different levels of cognitive demand in multitasking tasks were successfully elicited. As one of the most used subjective assessment scales, NASA-TLX can reflect mental workload from six dimensions, so it can diagnose the source of mental workload (Rubio et al., 2004). In this experiment, the differences in NASA-TLX scores in different tasks were reflected in mental demand and temporal demand.
This indicated that the high mental workload of participants was mainly a result of time pressure, which differs from task models in which mental workload results from difficulty of operation or memory (Fallahi et al., 2016). In the experiment, there was no significant difference in subjective effort or frustration with the increase in task difficulty. This may be because the participants felt that they had fully devoted their remaining cognitive resources to the secondary task after completing the primary tasks.
As indicated by an increase in the average tracking distance and number of alarms, the performance of the flight target tracking worsened as the number of subtasks increased. This showed that other subtasks affected the tracking task performance. Other subtasks took up the participants' cognitive resources, and the cognitive resources available to the tracking task were reduced, leading to a decline in their tracking task performance. However, the performance of the meter monitoring task was not affected by the emergency handling task. This may be because subjects had a resource allocation policy when completing the emergencies handling task. The resource allocation policy is a propensity adopted by the performer regarding which task is favored (Wickens, 2002). The average reaction time of the emergency handling task was 18.16 s. The long response time indicated that subjects chose to prioritize the meter monitoring task over the emergencies handling task. In fact, emergency handling is often prioritized over meter monitoring in real-world situations. However, multiple task demands may influence operators' strategies, such as delaying priority tasks (Wickens et al., 2016). Reversed priorities, for example, prioritizing ATC communications over maintaining flight stability, can lead to an accident (Schutte & Trujillo, 1996). As a secondary task, the residual capacity task performance had the same result as that of the flight target tracking task. The theoretical basis of the secondary task was the operator's limited attention resources, which can be used to assess the mental workload associated with simulated flight or driving (Heine et al., 2017). In the experiment, participants were asked to complete the primary tasks preferentially and complete the secondary task with their remaining cognitive resources. As the task load increased, their residual capacity decreased, leading to a decrease in the number of numeral responses in the secondary task.
This is consistent with the observation of Wickens et al. (2016) that the higher the mental workload is, the less residual capacity from the primary task is available for the secondary task, which leads to worse secondary task performance.
fNIRS has been widely used in the field of brain research including cerebral structure and function research, brain-computer interface, adaptive interface, mental workload assessment, etc. (Boas et al., 2014; Pan & Jiao, 2013). fNIRS is sensitive to changes in mental workload (Liu et al., 2017; Mouratille et al., 2020). Changes in task difficulty, including difficulty with memory or information processing load, can lead to changes in the activation degree in relevant brain regions (Foy et al., 2016). Brain areas sensitive to mental workload have been shown to elicit activations during time-limited cognitive activities (Barch et al., 1997). In this experiment, with the increase in the number of subtasks, the information load increased and the PFC activation increased.
These findings were similar to those of previous studies, showing that the PFC was particularly sensitive to mental workload (Ayaz et al., 2012). As a collection of interconnected neocortical areas, the PFC is mainly related to high cognitive functions such as working memory, decision making, and executive control (Mckendrick et al., 2016). According to the location of the regions, the PFC can be divided into the rostral PFC, dorsolateral PFC, ventrolateral PFC, medial PFC, and orbitofrontal PFC (Miller & Cohen, 2001). Although different regions of the PFC have different and specific functions (Miller, 2000), mental workload, especially in flight, influences selective attention, spatial attention, episodic memory, cognitive control, task switching, attention allocation, and decision-making (Chenot et al., 2021), which require close interaction among different regions of the PFC. Therefore, the activation of the whole PFC, instead of only specific areas, was usually used for assessing mental workload (Causse et al., 2017; Causse et al., 2019; Gateau et al., 2015). The results of this experiment showed that PFC activation can be used to evaluate the mental workload caused by the number of subtasks in multitasking tasks. The theoretical basis is the multiple resources model. According to Wickens (2008), the multiple resources model has four dimensions (processing stages, perceptual modalities, visual channels, and processing codes) and each dimension has separate and distinct pools of attentional resources.
Tasks completed at the same time received resources from the same attentional resources pool, which generated resource competition.
In the multitasking task, each subtask needed the resources of focal vision and the competition of cognitive resources existed between each subtask. Therefore, the increase in subtasks can mobilize more cognitive resources of subjects, which may lead to an increase in the activation of the PFC. Further work is required to investigate whether activations observed in the PFC were induced sustainably or transiently.
SDNN and RMSSD are time-domain features of HRV. SDNN is treated as a reflection of the overall autonomic nervous system function while RMSSD is treated as a reflection of the parasympathetic function (Malik, 1996). As frequency domain features of HRV, LF is mainly associated with sympathetic activity and HF is associated with parasympathetic activity (Kamath & Fallen, 1993). When mental workload increased, sympathetic activities increased while parasympathetic activities decreased (Fallahi et al., 2016). Regrettably, as noted above, the sensitivity of HRV and HR to mental workload in flight has not been consistent across studies. In this experiment, there were no significant differences in HRV among different multitasking tasks, but the mean HR increased with the increase in mental workload, which was consistent with some simulated flight studies (e.g., Gentili et al., 2014; Hidalgo-Muñoz et al., 2018). A possible explanation for these changes is that HRV is less sensitive in certain practical circumstances (Veltman, 2002). PFC activation at high task load was higher than PFC activation at low task load in the post-hoc analysis, while no significant differences in PFC activation and HR were found for the other contrasts.
This could be explained by the theory of reserve capacity (Wickens et al., 2016). In this study, the subjects were required to use all their remaining capacity to complete the secondary task so as to maintain performance in that task. In fact, an increase in task load can still generate more cognitive resources, but the participants subjectively used up their mental effort during each multitasking task, which was reflected by the subjective effort scores. Therefore, the gradual changes in the generated cognitive resources shrank across the different multitasking tasks. PFC activation and indices of ECG were not correlated with task performance in the study. This dissociation between performance and nervous system activity was also found by other studies (Ayaz et al., 2012; Causse et al., 2017) and can be explained by the neural efficiency hypothesis of intelligence, which states that intelligent individuals show higher brain activation efficiency when performing cognitive tasks (Neubauer & Fink, 2009). However, SDNN was negatively correlated with subjective mental workload in the low task load condition, while PFC activation was positively correlated with HR and subjective mental workload in the medium task load condition. Correlational relationships were not found in the high task load condition. That may be related to a plateau in high task load, which represents a processing limit (Causse et al., 2019); that is, physiological indices can predict subjective mental workload across participants to a certain extent. In addition, the PFC activation and HR showed effects of task load, suggesting that combining fNIRS and ECG to evaluate mental workload during multitasking in flight may be an effective approach. Of course, future studies on in-flight multitasking should explore more advanced methods of fNIRS, such as wavelet coherence analysis and methods of graph theory, which could provide a rich picture of large scale distributed neural sources (Leff et al., 2015).

CONCLUSIONS
In this study, we found that task performance decreased with increasing mental workload during multitasking. The increased mental workload was primarily due to time pressure. HR and the PFC activation can be used to detect changes in mental workload during multitasking in simulated flight. These results can provide an effective reference for evaluating or quantifying the mental workload of pilots during multitasking. Future work should be carried out to provide a criterion for evaluating mental workload.

ACKNOWLEDGMENTS
This research was funded by the Reserve Talent Innovation Fund, guided by a special project of guidance for construction of first-class discipline in aerospace medicine [2020HBRC05]. We would like to thank Editage (www.editage.cn) for English language editing.