Validating Operator Event Sequence Diagrams: The case of automated vehicle to human driver handovers

Predicting what drivers will do as vehicle control is handed over to them from automation is a relatively new challenge for the motor vehicle industry. Operator Event Sequence Diagrams (OESDs) offer a way of modeling the interactions between the driver and vehicle automation in the handover of control. In this paper, two studies are presented in which a range of handover strategies are tested. The anticipated driver strategies were modeled using OESDs to serve as predictions of driver behavior. Drivers were then observed in two separate studies: (1) using a Lower-Fidelity (vehicle seat and controls) simulator and (2) using a Higher-Fidelity (whole vehicle) simulator. Driver behavior during a takeover task was categorized according to the signal detection paradigm into hits, misses, false alarms, and correct rejections. The results showed that for all strategies in both sets of studies, the median validity statistic exceeded the criterion (φ > 0.8), suggesting that OESDs made good predictions of driver behavior during the handover of the vehicle from automation to manual control.

Drivers resuming control need to be aware of other vehicles, hazards on the road, and the status of their own vehicle, as well as immediate actions required, such as vehicle navigation and guidance (Walker et al., 2015).
The design of the handover interface requires some form of representation of the human and machine activities that are anticipated so that the strategy can be implemented in the vehicle. There is a range of Human Factors and Ergonomics methods that could be employed to represent these activities (Stanton et al., 2013). Modeling of the interaction between people and technology is becoming increasingly popular in Human Factors and Ergonomics research and practice (Moray et al., 2016). These methods enable the modeling of tasks, processes, timings, and potential errors. Choosing the most appropriate methods depends on the intended purpose of the modeling (Stanton et al., 2013). Modeling the structure of tasks can be useful for understanding the nature of work design (Stanton, 2006).
Modeling processes helps in the design of interaction between humans and machines (Banks & Stanton, 2017). Modeling timings can be used to understand time-critical interventions (Stanton & Walker, 2011). Modeling errors can be helpful for anticipating nonnormative behavior. For the purposes of modeling the process of handover from an automated vehicle to a human driver, Operator Event Sequence Diagrams (OESDs) were selected, as they have been successfully used to model the interactions between drivers and vehicles previously (Banks & Stanton, 2017; Banks et al., 2014).
OESDs have been used to represent the interaction between humans and technology in a graphical manner (Stanton et al., 2013). They are based on the engineering technique for describing technical processes in the form of a flowchart with the use of standard symbols for each defined process (see Table 1). Each separate system has its own column, colloquially called a "swim-lane." OESDs have additional "swim-lanes" for the human operator(s). The output of an OESD graphically depicts activities, including the tasks performed and the interaction between humans and machines over time, using standardized symbols. There are numerous forms of OESDs, ranging from a simple flow diagram representing task order to more complex OESDs, which account for team interaction and communication and often include a timeline of the scenario under analysis and potential sources of error.
Previous applications of OESDs include modeling single-pilot operations in commercial aviation (Harris et al., 2015), aircraft landing (Sorensen et al., 2011), air traffic control, electrical energy distribution (Salmon et al., 2008), and automatic emergency braking systems in automobiles (Banks et al., 2014). The latter of these applications used OESDs to compare four different levels of automation for pedestrian detection and avoidance, from manual control, through decision support and automated decision-making, to full automation. This analysis showed that both the decision support and automated decision-making systems actually involved the driver in more work than driving manually. All of the applications of OESDs were able to delineate between the processes undertaken by different human actors and machine agents in their respective systems. For example, Harris et al. (2015) showed the effects of reducing the crew of two pilots to a single pilot.
Single-pilot operations reduced many of the crew communication activities but did result in more work overall for the remaining pilot, as might be expected.
In the other applications of OESDs, researchers have been able to show how work is distributed across multiple actors and agents. In particular, Sorensen et al. (2011) used OESDs to illustrate how distributed cognition (Hutchins, 1995a, 1995b) worked in their analysis of an aircraft cockpit and crew for a landing task. Inspired by the original work of Edwin Hutchins (1995b), they used OESDs to show how the cognition of the cockpit is distributed among the artifacts and two human pilots as the descent of the aircraft is managed. Similarly, Walker et al. (2010) and Salmon et al. (2008) show how cognition is distributed among the artifacts and people (who are themselves in different physical locations) in the aviation and energy distribution industries, respectively. In a recent review of distributed cognition and situation awareness (SA), Stanton et al. (2017) argue that automated driving systems provide an excellent case study for distributed cognition. Clearly, the vehicle automation is performing some of the cognitive functions on behalf of the human driver, the extent of which depends upon the level of automation involved.
OESDs can make the distribution of these cognitive functions more explicit, as well as identifying the interactions between the human driver and automated systems when conducting the handover (Banks et al., 2014).
While modeling of the interaction may be useful for designing the strategy for the handover, it is only a prediction of the behavior of the system. As such, it requires validation. Unfortunately, validation evidence is rarely reported in the literature (Stanton, 2014; Stanton & Young, 1999a, 1999b). There are a few notable exceptions, however, such as the development of the human error prediction methods (Stanton & Baber, 2005; Stanton & Stevenage, 1998). Stanton et al. (2013) proposed validating the predictions of methods using the signal detection paradigm (Stanton & Young, 1999a, 1999b). This approach has enabled analysts to distinguish between predictions of behaviors that are observed (the hit rate) and those that are not (the false alarm rate). It is also possible to calculate the overall sensitivity of the methods, taking both the hit rate and the false alarm rate into account. In summary, the purpose of this paper is to develop OESDs for the vehicle to driver (VtD) handover process and then to validate them in studies using driving simulators. It was anticipated that the models of handover would offer good predictions of actual handover behavior, based on modeling evidence from other methods and domains (e.g., Stanton et al., 2013, 2014).

| OESD DEVELOPMENT
A set of interaction concepts to facilitate VtD handover were derived following two design workshops. The workshops comprised psychologists, human factors engineers, computer scientists, and automotive engineers. During the course of the workshops, discussions were held on the role of the driver and the role of vehicle automation.
These discussions were refined into storyboards that revealed the processes, tasks, and agents involved. From the storyboards, the OESD swim-lanes were drawn up. This was an iterative process, entailing drawing up the tasks and processes in the OESDs, comparing them with the storyboards, and discussing them with the research team. OESDs were selected for their ability to illustrate the tasks and processes with respect to time (Banks et al., 2014; Harris et al., 2015; Sorensen et al., 2011). Events that occur within the system are modeled using a set of task unit elements; the type of element depends on the event type. Table 1 illustrates the set of task unit elements that correspond to different events. Time is represented on the vertical axis of OESDs; therefore, two events occurring simultaneously would be at the same level, whereas sequential events would be vertically offset.
Four interaction designs were selected from the set of concepts, and each represented using an OESD. Each OESD modeled the interactions between the agents throughout a complete cycle of the automation system. The cycle started with the driver in manual mode undertaking a driver to vehicle handover (DtV). During the automated period (when the vehicle is being driven by automation), the OESD modeled the driver interacting with a secondary task (which was engaging in a tablet-based memory game) followed by the VtD handover that returned the driver to manual control. Figure 1 illustrates a part of an OESD, the four agents, in this case, being driver, vehicle, handover interaction, and environment.
Phase A represents the start of the handover of control, where the driver is informed of a forthcoming mode change, and they begin to prepare. The vehicle is currently automated, however in processes not shown in the diagram, the vehicle's sensors have detected the need for the driver to take control, and this has resulted in the vehicle making the handover interaction vocalization. The six driver processes of Phase A then follow. The driver immediately receives the information then performs a sequence of manual actions, ceasing interaction with a secondary task device, putting down the secondary task device, resuming a driving position, and paying attention to the road. The driver then vocalizes that they are ready to take control of the vehicle, followed by the vehicle detecting the vocalization. In the event that the driver makes an unintelligible or negative response, the initial vocalization is repeated.
Phase B, which contains the handover protocol processes, then begins with the vehicle generating hazard information from navigational data created by on-board sensors (see Politis et al., 2018).
The vehicle vocalizes the hazard information, which results in the first two driver processes of Phase B; the driver receives the hazard information and then vocalizes it. In the event that the vocalization is unintelligible, the driver is asked to repeat. If the driver answers incorrectly, they are informed of their incorrect answer and asked to try again. Once the driver has provided the correct response, the next part of the protocol starts, in this case, lane information.
To determine how well handover behavior was modeled, simulator studies were conducted in both a Lower-Fidelity (vehicle seat and controls) simulator and a Higher-Fidelity (whole vehicle) simulator.

| Experiment design
The experiment employed a repeated measures design with four VtD control transfer conditions representing the single independent variable. Conditions were counterbalanced using a Latin square design. All conditions featured an initial stage whereby the system would verbally ask if the driver was ready to resume control; following a verbal confirmation, the protocol would start. The first condition (Timer) was based on a simple timer that appeared on the dashboard display when the automation detected a need to hand control back to the driver; on confirmation that the driver was ready, it counted down in 10-s intervals from 60 s, and the driver was required to take control by pressing a button on the steering wheel before the countdown reached zero. This was based on an existing design currently undergoing testing by Volvo (Volvo Cars, 2015), with an auditory rather than visual countdown.
The second condition, "HazLan" used a "readback" to raise SA, in which the system would vocalize five elements of SA: potential hazards, current lane, current speed, the next required exit, and the next required action. Following each of the system's vocalizations, the participant was required to repeat back. Incorrect or missing readbacks resulted in the system repeating the original vocalization. Once all of the readbacks were complete, the participant was able to resume manual control.
F I G U R E 1 Example OESD section showing all six driver processes for Phase A (Start of VtD) and the first two driver processes of Phase B (Handover protocol phase). This OESD is an example from Study 2 (Higher-Fidelity Simulator study).

OESD, Operator Event Sequence Diagram
The third condition, "VAA" was response-based using the same element sequence as the second condition; however, the system provided the participants with a question regarding each element. If the participant answered the question correctly, the next question was presented; upon completion of the sequence, the participant was able to resume manual control.
The fourth condition was an augmented version of the HazLan condition. In addition to the HazLan SA aspect sequence, it incorporated multiple modalities in the form of audio-driven seat-based haptics (whereby audio signals, including vocalizations, were transmitted to the driver via a pad on the driver's seat) and two light-emitting diode (LED) strips mounted on each side of the driving position. These presented constant information on the longitudinal positions of cars in neighboring lanes, thus providing a dynamic blind-spot warning system.
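The Latin square counterbalancing of the four conditions can be sketched as below. This is the standard balanced Latin square construction for an even number of conditions, not the authors' own procedure, and the condition labels are illustrative:

```python
def balanced_latin_square(n):
    """Return n participant orders over n conditions such that each
    condition appears once per position and each ordered pair of
    adjacent conditions occurs exactly once (requires even n)."""
    # Standard first row: 0, 1, n-1, 2, n-2, ...
    first, left, right = [0], 1, n - 1
    for i in range(1, n):
        if i % 2 == 1:
            first.append(left)
            left += 1
        else:
            first.append(right)
            right -= 1
    # Each subsequent participant shifts every condition by one (mod n).
    return [[(c + r) % n for c in first] for r in range(n)]

# Hypothetical labels for the four conditions described above
conditions = ["Timer", "HazLan", "VAA", "Augmented"]
orders = [[conditions[i] for i in row] for row in balanced_latin_square(4)]
```

With four conditions this yields four presentation orders, which would be cycled across participants; the construction also balances immediate carry-over effects, since every condition precedes every other condition exactly once.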
Data on multiple aspects were collected; however, for the validation experiment presented in this paper, the dependent variable was driver behavior, in terms of the processes carried out by participants during VtD control transitions. This driver behavior data was collected using four webcams, generating footage from multiple angles within the vehicle. The processes consisted of actions, inactions, and vocalizations. Actions included those expected from the driver as predicted by the associated OESD, as well as unexpected actions, such as placing a finger on a button early or making an exaggerated glance. Inactions consisted of the failure to carry out a process predicted by the associated OESD. Vocalizations included those expected by the system as specified in the associated OESD.

| Equipment
Experiments were carried out using a lower-fidelity driving simulator consisting of a gaming seat, a Logitech G25 steering wheel and pedal set, and three screens to provide a wide field of view (as shown in Figure 2). An additional tablet was employed to act as a pseudo-dashboard to illustrate speed, lane positioning relative to other cars, fuel level, automation mode, ideal lane, and the next required exit.
The driving scenario featured a route approximately 10 miles long, consisting of a combination of highways with gentle bends and urban roads without corners. A Java-based memory game application was installed on a tablet-based PC to provide the participant with a secondary task when automation was enabled. Participant behavior was recorded using a camera with a wide-angle lens. Two Arduino-based LED strips were fitted to the wall on each side of the driving position; C code was written to enable them to perform as blind spot indicators. A TAD (tactile acoustic device) was placed on the driving seat to provide sound-based haptic information.

| Procedure
After being welcomed, participants were briefed on the experiment and presented with a demographics questionnaire. The simulator was then presented to the participants, and they embarked on a short introductory test drive. No other information was provided to the participants, other than a brief overview of the vocalization system, to avoid training effects. The driving scenarios were then run, using a counterbalanced design to mitigate order effects. During each automation phase, participants were requested to play the tablet-based memory game; a total of three VtD transitions were performed per scenario, at approximately 10%, 40%, and 70% progress through the route. VtD transition dialogues used synthetic vocalizations; the participants were expected to respond vocally before switching to manual control using a button on the steering wheel. A Wizard of Oz-based system was employed to manage the synthetic vocalizations in response to the participant. At the end of the study, participants were provided with remuneration in the form of a £20 web voucher for their time.

| Analysis
Signal detection theory's primary use is to discern between "signals" and noise (Abdi, 2007); four stimulus-response events exist: Hits, Misses, False Alarms (FAs), and Correct Rejections (CRs) (Nevin, 1969). In the context of this experiment, it provided a method by which to compare participant behavior, observed during experiments, with predicted driver behavior illustrated on OESDs. To aid the analysis, the driver processes were split into three phases, A, B, and C, representing the participant preparing to take control, proceeding through the protocol, and taking control back from the automation, respectively. The Timer condition was particularly short and therefore had no requirement for Phase B. Figure 1 shows the six driver-based processes of Phase A in the driver column.
A "perfect" SDT score was attained when a participant carried out all of the predicted processes, and only those processes, as part of the VtD control transfer; in this case, the SDT matrix would have as many "Hits" as there were predicted driver processes. For each predicted process that a participant failed to carry out, a "False Alarm" was recorded. In the event that a driver exhibited behavior in addition to that which was predicted, a "Miss" was recorded. CRs were calculated at the end of the SDT analysis by subtracting each participant's number of FAs from the total pool of FAs across all participants. In summary, the data were processed as follows.

| Hits
Present in OESD and present in the video of automation-driver handover (e.g., the driver vocalizing their readiness to begin the handover process).

| Misses
Not present in the OESD but present in the video (e.g., the driver resuming control of the vehicle early and not progressing through all of the handover protocol).

| False alarms
Present in the OESD but not present in video (e.g., the driver does not vocalize hazard information).
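Using this mapping, the per-participant classification can be sketched as simple set operations. This is a minimal illustration; the process names are hypothetical, and the pooled calculation of CRs described above is not reproduced here:

```python
def classify_processes(predicted, observed):
    """Compare OESD-predicted processes with video-observed behavior.

    hit  -> present in the OESD and present in the video
    FA   -> present in the OESD but not present in the video
    miss -> present in the video but not present in the OESD
    """
    predicted, observed = set(predicted), set(observed)
    hits = len(predicted & observed)
    false_alarms = len(predicted - observed)
    misses = len(observed - predicted)
    return hits, misses, false_alarms

# Hypothetical Phase A processes for one participant
oesd = {"receive_info", "cease_secondary_task", "put_down_device",
        "resume_driving_position", "attend_to_road", "vocalize_ready"}
video = {"receive_info", "put_down_device", "resume_driving_position",
         "attend_to_road", "vocalize_ready",
         "early_button_press"}  # unpredicted extra behavior -> miss

hits, misses, false_alarms = classify_processes(oesd, video)  # 5, 1, 1
```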

| Inter-Rater Reliability (IRR) Method
IRR testing was carried out due to the subjective nature of analyzing, interpreting, and categorizing driver behavior. An analyst was provided with approximately 10% of the video footage files, together with associated SDT analysis forms, a list of exceptions, and a list of driver processes split across the three phases. The analyst watched the footage and compared the driver behavior to that which was expected as specified in the list of driver processes. SDT results were recorded on the SDT analysis forms, together with any exceptions that occurred. This was identical to the method used by the original analysts. Equal weighted Cohen's κ values were calculated and are reported in Section 3.7.
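For nominal categories such as hit/miss/FA/CR, Cohen's κ can be computed as below. This is a sketch: with equal weights over nominal categories the statistic reduces to unweighted κ, and the analyst classifications shown are invented for illustration:

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Unweighted Cohen's kappa for two raters labeling the same items."""
    n = len(rater1)
    # Observed proportion of agreement
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement expected from each rater's marginal frequencies
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / n ** 2
    return (observed - expected) / (1 - expected)

# Illustrative classifications of four driver processes by two analysts
analyst_a = ["hit", "hit", "miss", "CR"]
analyst_b = ["hit", "hit", "miss", "miss"]
kappa = cohens_kappa(analyst_a, analyst_b)  # 0.6
```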

| Results
Equal weighted Cohen's κ values were calculated, resulting in a value of 0.773 for the Lower-Fidelity simulator. This represents substantial agreement between the analysts in their classification of hits, misses, FAs, and CRs (Landis & Koch, 1977).
As shown in the box plots of the SDT results, median values were 12 across conditions and 13 for the Timer condition; all interquartile ranges were identical (1). At the end of every road section of conditions 2, 3, and 4, the participant was presented with an icon, accompanied by a synthesized vocalization, asking them to take manual control of the car by pressing the "M" button on the steering wheel. This completed the condition by transferring control from the Vehicle to the Driver (VtD).

| Experiment design
The dependent variable for the Higher-Fidelity study was driver behavior, captured via video footage, in an identical way to that of the Lower-Fidelity study. A wide range of additional data relating to performance and the subjective experience was also collected as part of this study but only data relevant to the validation focus of this paper will be discussed.

| Equipment
The driving simulator consisted of a Land Rover Discovery Sport in combination with three frontal screens, providing a 140° field of view.
Door mirrors employed LCD screens, and the rearview mirror reflected a rear-mounted projection. Vehicle control data was extracted from the Controller Area Network using a hardware interface and software in C#. STISim Drive® Version 3 simulator software provided the virtual environment. The vehicle's cluster was replaced by a Microsoft Surface tablet running a custom dashboard in C#; this displayed speed, vehicle position, and icons/graphics (as shown in Figure 6). An HUD was generated by displaying 2D icons and 3D objects in the driver's field of view using STISim's Open Module. An Arduino-based haptic system was fitted to the driver's seat. Arduino-based LED strips were fitted to the A-pillars and coded to act as blind spot indicators. A Microsoft Surface tablet was employed to run a C# Wizard of Oz application, allowing control over HUD and HDD icon graphics, audio vocalizations, and haptics as part of the handover protocol. Four webcams were fitted within the vehicle's interior, the footage from which was saved using a standalone PC. A mini tablet was employed for the secondary task, running a Java-based memory game. All audio outputs were fed into the vehicle's line-in, providing the driver with stereo sound via the internal speakers.

| Procedure
Before the start of the experiment, participants were welcomed, briefed on health and safety issues, and provided with a basic overview of the experiment and hardware before being presented with a consent form, patient information sheet, and biographical sheet. The principles of SAE level 3 automation were explained, together with the purpose of the study and the forms of technology in each of the four conditions. Participants were shown a map of the route, indicating the placement of roundabouts necessitating transition of control back to manual, and they were also informed that GPS-style verbal directions would be supplied. The tablet-based secondary task was explained, and they were instructed to use it when the automation mode was enabled. The participants were then asked to enter the vehicle before being introduced to the systems.

| Analysis
Analysis of the Higher-Fidelity experimental data was identical to that described for the Lower-Fidelity in Study 1.

| Results
Equal weighted Cohen's κ values were calculated, resulting in a value of 0.819 for the Higher-Fidelity simulator. This indicates almost perfect agreement between analysts in the classification of hits, misses, FAs, and CRs (Landis & Koch, 1977).
Results from the SDT analysis were used to generate box plot graphs in R. Figure 7 illustrates the hit/miss/FA and CR outputs of the SDT matrices ordered by condition. The results show that in both lower and higher fidelity driving simulators, the median φ statistic exceeded 0.8, which is the standard criterion for acceptable validation (Landis & Koch, 1977). It is interesting to note that the difference in the fidelity of the simulators did not affect the performance of the OESDs. This means that OESDs can be used to make predictions about vehicle-driver handover behaviors with some confidence. Over 100 drivers took part in the studies, with an age range of 17-86. For both studies, there was a relatively low frequency of FAs (i.e., predicting behaviors that did not occur) and a high frequency of hits (i.e., predicting behaviors that did actually occur). This is coupled with a relatively low frequency of misses (i.e., failing to predict behaviors that actually occurred) and a high frequency of CRs (i.e., not predicting behaviors that did not occur). In addition, the reliability of the classification of behavior into the four categories of hits, misses, FAs, and CRs from the video data was acceptable for both studies. There is often a disappointing lack of evidence on the reliability and validity of human factors and ergonomics methods, but this does not have to be the case. This and other studies have shown the way in which the evidence can be collected, analyzed, and presented.
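The φ statistic can be derived from each 2×2 SDT matrix. The sketch below assumes the standard phi (Matthews) correlation over hits, misses, FAs, and CRs, as the paper does not reproduce its exact formula, and the counts shown are illustrative:

```python
import math

def phi_coefficient(hits, misses, fas, crs):
    """Phi correlation for a 2x2 matrix of predicted vs. observed
    driver processes (1.0 = perfect agreement with the OESD)."""
    numerator = hits * crs - fas * misses
    denominator = math.sqrt((hits + fas) * (hits + misses) *
                            (crs + fas) * (crs + misses))
    return numerator / denominator if denominator else 0.0

# Example: all six predicted Phase A processes observed, no extra
# behavior, and four pooled correct rejections
phi = phi_coefficient(hits=6, misses=0, fas=0, crs=4)  # 1.0
```

A participant who skipped one predicted process and added one unpredicted behavior would score lower (e.g., `phi_coefficient(5, 1, 1, 3)` ≈ 0.58), which is how the per-condition distributions compared against the 0.8 criterion were obtained.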
The findings from the current studies are promising for the continued use of OESDs, and the method is certainly comparable with the better-performing methods in the discipline of ergonomics and human factors (Stanton et al., 2013). Other studies on the prediction of task performance time (Harvey & Stanton, 2013a; Stanton & Young, 1999a) and human error (Stanton & Baber, 2005; Stanton & Stevenage, 1998) have all produced good levels of validity. This is not to say that all methods perform as well. In one of the first studies of its type, Stanton and Young (1999a, 1999b) compared a range of methods, and some performed quite poorly. It has been noted that the better-performing methods are generally quite focused in terms of their predictions, such as time and error (and now activities). Nevertheless, it is important to understand the limitations of any method before using it.
The misses are probably the most interesting category of behaviors in this study, as they represent behaviors conducted by the driver in the handover of vehicle control that were not anticipated in the OESD modeling. These were unexpected behaviors that may have occurred due to the drivers being impatient or over-eager to take manual control of the vehicle. With hindsight, the early takeover behaviors could have been modeled in the OESDs. Vocalizing readiness before carrying out earlier processes, such as ceasing use of the secondary task or resuming the driving position, was not modeled, as it was assumed that the driver would only vocalize readiness when they were actually ready.
However, the incidence of this behavior suggests impatience with the duration of the verbal feedback interaction or a lack of perceived value in completing the full procedure. Equally, behavior such as removing hands from the wheel during the protocol and having to replace them during or after the protocol was not modeled, as this was not anticipated in the design of the interaction. It was assumed (wrongly in some cases) that once the driving position was adopted, it would be maintained. The duration of the handover protocol may have encouraged drivers to take their hands off the wheel and feet off the pedals (after first placing them there), as they did not need to manually steer until the handover procedure was complete. A design solution to overcome this could include the use of steering wheel sensors to detect the correct driving position and provide feedback if hands are removed after the commencement of the handover procedure. This demonstrates the utility of the OESD method to test design assumptions and highlight short-cuts or process failures as a means for generating mitigation strategies through design or training.
Although these studies focused on the use of OESDs to predict handover of vehicle control, the approach could be extended to other aspects of driver behavior (Banks et al., 2014). It has shown utility in other domains, such as aviation (Sorensen et al., 2011; Harris et al., 2015; Walker et al., 2010). As a cautionary note, however, validity generalization cannot be assumed and must be tested. OESDs are good at modeling discrete events, such as the stages in the handover of vehicle control, but modeling continuous events (such as manual control of a vehicle) is more challenging. This may require new notation and procedures for OESDs to handle the continuity of nondiscrete activities, such as steering, maintaining speed, maintaining lane position, searching for hazards, anticipating the behavior of other road users, reading the road ahead, and so on. The nomenclature would also require some notation for the interruption of continuous activities (such as making an emergency stop) as well as some way of representing the duality of activity in driving (such as operating the climate system while also driving the vehicle). Harvey and Stanton (2013b) presented the multimodal critical path analysis method to show how the driver could engage in activities simultaneously (separating cognitive and physical activities by modality). This would mean that OESDs need more than one "swim-lane" for the representation of driver behavior. Huddlestone and Stanton (2016) borrowed notation from computer science to represent continuous activity. So, potentially at least, there are methods for extending OESDs to cope with both continuous and multiple driver activities.
Future OESD modeling research could consider the incorporation of time and error data within the analysis to predict VtD handover times as well as the exceptions.
Data on human performance time are available in the literature and could be extended to this domain (Harvey & Stanton, 2013b;Stanton et al., 2014). This has the advantage of helping vehicle designers budget time allocation for the VtD handovers, a topic of much debate (Eriksson & Stanton, 2017).
OESDs appear to be able to make good predictions about driver activity during the simulated handover of vehicle control from automation in both lower and higher fidelity simulators. Over 100 drivers were tested in both studies with different interaction designs, and the median validity statistics were all above the criterion value.
Consequently, the OESD method may be used with some confidence in modeling driver interaction for discrete events, although validity generalization requires testing. Further development work is needed to incorporate continuous activities into OESD modeling.
The debate on the reliability and validity of Human Factors and Ergonomics methods has been continuing for some time (Stanton & Young, 1999a, 1999b). Very little work has been conducted to validate methods and, consequently, very little is reported in the literature (Stanton et al., 2013). Examples include most of the methods used for task analysis, process charting, error and time prediction, team performance, and system design (Stanton et al., 2013). Much more could be done, and surely needs to be, for Human Factors and Ergonomics to be a credible engineering discipline.
OESDs could be applied in any domain where a description of the interactions between human operators and technology (including automation) would be useful. Examples of transportation domains include aviation, maritime, and rail. For example, the relationship between the autopilot and human pilots could be explored to improve handovers and reduce the problems encountered in the Air France 447 crash (Salmon et al., 2016). Another example could be using OESDs to explore the interactions between the signalman, the alarm system, and system control displays in the Ladbroke Grove rail collision. In this latter case, the complexity of the interactions appeared to have slowed the reactions of the signalman. The OESD method is not restricted to transportation domains, however, and could equally well be applied to control rooms in energy production and distribution, defense, manufacturing, medicine, nuclear, oil and gas, and security.
With the rise of interest in multiagent systems, OESDs could prove useful in representing and exploring the relationships and interactions within and between different agents.

ACKNOWLEDGMENTS
This study was supported by Jaguar Land Rover and the UK-EPSRC grant EP/N011899/1 as part of the jointly funded Towards Autonomy: Smart and Connected Control (TASCC) Program.

DATA AVAILABILITY STATEMENT
Due to the nature of this study, participants of this study did not agree for their data to be shared publicly, so supporting data is not available.