Automotive Augmented Reality: User Experience and Enabling Technology

AR HUD systems offer compelling benefits to drivers because of improvements in situational awareness and an increase in comfort with autonomous systems.

by Kai-Han Chang and Thomas Seder

The emergence of mixed reality (MR) and virtual reality (VR) applications challenges the display community to deliver new display solutions that meet the demands of the consumer electronics, medical, and transportation markets. MR and VR can be understood in the context of Milgram's reality-virtuality continuum.1 As Fig. 1 shows, the continuum is partitioned into four segments that span from reality to augmented reality (AR), through augmented virtuality (AV), and culminate at VR. In a non-augmented, non-virtual, real environment, displays are used to convey information about the world to a user, who must acquire and mentally process the information and then act upon it to achieve their goals. For example, in the automotive domain, LCD or OLED displays are used to convey geospatial information to aid the driver in navigational tasks. In AR, details are added to the real world to increase the saliency of important features in the actual surroundings, by overlaying virtual images or annotating the environment with missing information. Reusing the navigation example, an AR head-up display (HUD) can annotate the world with a virtual arrow "painted" directly on the road, obviating the need for the user to go through the task sequence described for navigation. AV lies in the MR regime, wherein real objects are brought into a virtual environment; a driving simulation environment is one example of AV. In VR, the user is fully immersed in, and can interact with, a synthetic world that contains no real objects. Use cases for VR range from gaming through visualization of forms to interaction design in the product development cycle.
In this article, we focus on AR in the automotive domain by:
• describing how to achieve the illusion of fusing virtual images with real objects;
• describing an automotive AR system's functional block diagram;
• describing a few use cases for automotive AR;
• illustrating some of the challenges for implementing an AR system in the automotive domain;
• providing an overview of holography; and
• showing how holographic technology may address implementation challenges.

Creating the AR Illusion in Automotive
Automotive AR solutions are more difficult because head-worn devices are not desired in the vehicle. Instead, the automotive approach is to scale the consumer electronics solutions, effectively growing the optics, embedding them under the instrument panel, and using the windshield as a combiner to create an AR HUD. Further, in automotive, the gaming consoles or PCs often used in consumer electronics (CE) are replaced with the vehicle infotainment system and associated sensors. To create the illusion of AR using a HUD, it is necessary to trick the human visual system into perceiving the displayed virtual images as part of the real external world. The typical non-AR HUD virtual image distance (VID) is approximately 2.3 meters, which requires a user to reaccommodate (refocus) upon switching gaze and attention from the virtual image to objects in the real world that are well beyond this distance. To answer the question, "How far away from a user does a virtual image need to be placed to trick the user's perceptual system into perceiving the virtual image as part of the real world?," we must address the linked triad of physiological cues for depth perception: accommodation, convergence, and binocular disparity. The first physiological depth cue arises when the ciliary muscles signal the brain on the crystalline lens contortion required to focus the eye on a particular object. As Fig. 2 shows, the thin lens equation reveals that the lens largely ceases to change shape when focusing on objects beyond approximately 7 meters (Fig. 2d). The second physiological depth cue arises when the ocular motor muscles used to rotate the eyes signal the brain on the degree of rotation required to shift gaze to a particular object. The convergence angle diminishes as the object distance increases and largely ceases to change beyond approximately 7 meters (Fig. 2e).
The third physiological depth cue signals the brain on how far from the fovea an image is formed on the retina. Far objects have less disparity than near objects, and the binocular disparity cue ceases to be effective beyond 7 meters. For these reasons, the AR VID needs to be greater than 7 meters so that the perceptual cues created by the virtual image and the external world allow the two to appear fused.
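To make the 7-meter figure concrete, the flattening of these cues can be checked with a short calculation. The Python sketch below (assuming a typical 63 mm interpupillary distance, a value not given in the article) compares the cue changes involved in refocusing from a 2.3 m virtual image to a 7 m object with the remaining change from 7 m outward:

```python
import math

def accommodation_diopters(distance_m: float) -> float:
    """Accommodative demand in diopters (1/m) for an object at distance_m."""
    return 1.0 / distance_m

def convergence_angle_deg(distance_m: float, ipd_m: float = 0.063) -> float:
    """Binocular convergence angle for an object at distance_m,
    assuming a typical 63 mm interpupillary distance (IPD)."""
    return math.degrees(2.0 * math.atan((ipd_m / 2.0) / distance_m))

# Refocusing from a 2.3 m HUD image to a 7 m object demands a larger
# accommodation change than the entire remaining change from 7 m to
# optical infinity (0 diopters).
delta_near = accommodation_diopters(2.3) - accommodation_diopters(7.0)
delta_far = accommodation_diopters(7.0)  # 7 m -> infinity
```

The accommodation change from 2.3 m to 7 m (~0.29 diopters) is roughly double the entire remaining change from 7 m to optical infinity (~0.14 diopters), which is why images placed beyond 7 meters read as part of the far scene.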
To further support the AR illusion, the look-down angle (LDA) of the virtual image (the angle between the driver's line of vision and the center of the vertical field of view (FoV) of the virtual image plane) is adjusted to approximately 2 degrees. For a VID beyond 7 meters, this LDA makes the virtual image appear as if it is fused with the road surface, with the bottom portion of the virtual image appearing fused with the road nearest the driver and the top appearing fused farther down the roadway. We use the term road coverage to denote the area of the road that can be "painted" with virtual images that appear fused with the road surface. The road coverage is determined by the FoV of the AR HUD and the LDA. The typical AR HUD FoV is 10° horizontal × 4° vertical, and the LDA is a vehicle-specific parameter. More sophisticated AR HUD designs will employ dual fixed image planes, with one beyond 7 meters for AR content and a smaller image plane at 2.3 meters for non-AR, traditional content. Finally, the AR illusion is bolstered by using perspective in the design of the AR HUD graphics and by the assistive technologies discussed subsequently, which keep the graphics pinned to a specific road location even as the vehicle's orientation in space and the driver's head position change.

Automotive AR System Description and Use Cases

The AR HUD system is defined by three functional blocks: sense, register, and display (Fig. 3). The sense functional block, as the name suggests, comprises all the sensors needed to surveil the external world for the presence of threats that could be highlighted on the AR HUD. This sensor suite could include traditional forward-looking cameras, light detection and ranging (LIDAR), RADAR, and night vision. The sense block also would include ambient light sensors that allow control of the AR HUD image luminance against the background to achieve image discriminability goals, as well as signals from the nascent vehicle V2X information systems (X = another vehicle, the infrastructure, satellites) that bring information into the vehicle to aid the driving task. Finally, the sense block includes sensors used to compensate for changes in the vehicle's orientation in space and the driver's eye position, which otherwise could cause the virtual image to become unpinned from the target location and begin to move about in space, destroying the AR illusion. Orientation and position sensors include GPS, an inertial measurement unit (IMU) for yaw, pitch, and roll, and the speedometer. The eye location reported by an inward-looking camera, part of a driver monitoring system, is used to compensate for perspective changes as the driver's head moves. All these signals are presented to the register block, where they are acted upon to perform a variety of tasks.

While not obvious from the name of the functional block, AR HUD applications are executed within the register block, using all the input of the sense block to generate AR graphics. In addition, the register block contains image distortion compensation and luminance control algorithms, as well as priority management algorithms that favor presentation of highly important graphics to elicit urgent driver action. The eponymous register block also executes algorithms that not only register the virtual image at the proper location in the real world but also temporally maintain that registration, enabling a conformal AR HUD system.

The display block takes the graphical output from the register block and creates and displays the virtual images. Before describing the viable AR HUD display variants, we first describe the AR HUD use cases and the types of graphical output that must be displayed.

The overarching goal of an AR HUD system is to enhance the driver's situational awareness, and to that end, a plethora of applications have been, or will be, developed. The applications fall into one of three broad categories: graphics that beneficially augment information in the real world, graphics that annotate the world with useful information, and graphics that provide a compelling user experience (UX). Examples of augmentation applications include using a night vision system to identify living creatures in the roadway and highlighting those animals. A driver's vision also can be enhanced through the use of RADAR and LIDAR to detect a barely visible vehicle ahead or the edges of the road, respectively, increasing the saliency of those features using virtual image overlays (Fig. 3); similar overlays assist with seeing through fog, pedestrian detection and highlighting, and highlighting threats in a forward-collision scenario. Examples of annotation applications include "painting" the road with a direction arrow to aid navigation, presenting virtual signs (which is particularly beneficial to novice drivers and foreign drivers who may have difficulty reading real signs presented in the local language),2 and displaying the automated cruise-control gap setting. Examples of AR HUD-enhanced UX include building trust and confidence in autonomous systems by highlighting system-detected threats and displaying the vehicle trajectory to the occupant in autonomous mode, as well as gaze orchestration.3 Gaze orchestration will leverage ongoing work in camera vision and scene analysis and recognition to show the driver not only where to look, but where to look next in rich, complex driving environments.4

Challenges to Create an Automotive AR System

Beyond the engineering challenges presented by the AR system complexity lie many obstacles that must be overcome to realize a safe, beneficial, and affordable automotive solution. Among the chief human factors challenges is avoidance of cognitive capture. The driver must continue to drive the road and not "drive the display." The AR graphics must not capture the attention of the driver, only aid in accentuating the road scene. Thus, graphical content must be sparse, devoid of visual clutter, and presented at a relatively low luminance contrast, or color difference ΔE*, relative to the external environment. This may seem counterintuitive to the display engineering community, but the goal of AR graphics (with the exception of annotations such as virtual signs) is to enhance the saliency of important real objects, not to be "read." A related concern is risk homeostasis, wherein a driver may expand their risk envelope when aided by assistive technologies. In the case of AR HUDs, a valid question deserving of deep exploration is, "Will a driver in low-visibility, foggy conditions, who would normally slow down to reduce collision risk, drive faster when aided by a system that highlights road edges, lane markings, and in-lane vehicles ahead?" Additionally, there is a host of vehicle integration challenges to overcome to realize an AR system implementation. Primarily, these challenges center on the design, build, and packaging of the AR HUD component into the vehicle. Because, by automotive standards, this component must offer a rather large 10° horizontal FoV, it also must have abundant resolution to produce sufficient pixels per degree (ppd) over that FoV. We require at least ~100 ppd, which is beyond the Snellen acuity threshold of 60 ppd, thereby avoiding a "screen door" effect in the graphics. Higher-resolution display elements also are desirable for the appearance of smooth, moving pointers and other objects.
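The resolution requirement translates directly into a native pixel count; a minimal sketch (the 10° × 4° FoV, 100 ppd target, and 60 ppd Snellen threshold are taken from the text):

```python
def required_pixels(fov_deg: float, ppd: float) -> int:
    """Native display pixels needed along one axis for a target pixel density."""
    return int(fov_deg * ppd)

# A 10 x 4 degree AR HUD at the ~100 ppd target needs a 1000 x 400 pixel
# image, well above the ~60 ppd Snellen threshold (600 x 240) at which
# individual pixels ("screen door" artifacts) become resolvable.
ar_hud = (required_pixels(10, 100), required_pixels(4, 100))
snellen_floor = (required_pixels(10, 60), required_pixels(4, 60))
```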
In traditional technologies, many pixels on the native display element are reserved for distortion compensation, which reduces the 100 ppd to something often considerably less. We find that slightly greater than 100 ppd is required to produce aesthetically pleasing graphics that exhibit smooth luminance gradients in shaded objects and smooth, unbroken movement of objects such as gauge pointers.
For AR HUDs, the question is, "How many pixels must be reserved for distortion compensation?" The unsatisfying answer is, "It depends." Distortion may move an image beyond the boundaries of the matrix display, so border pixels are reserved for the pixel movement that corrects distortion. The answer is complicated because, in traditional design solutions, an extraordinarily large freeform mirror is used to highly magnify the image from the display element, compensate for distortions caused by the windshield shape, and cover a 10° FoV. The mirror delivers a ray bundle to a relatively large area of the windshield: the HUD zone patch. Because the shape of the patch varies from vehicle to vehicle, is large, and is part of a high-magnification optical system, AR HUD windshields can impart significant distortion that must be compensated for via end-of-production-line calibration. Distortion compensation calibration imposes a pixel-movement warp map on the display element that reduces its resolution. The magnitude of the reduction is highly dependent upon control of the windshield and freeform mirror fabrication processes, as well as the precision with which the windshield and HUD can be installed in the vehicle.
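As a rough illustration of the warp-map idea (a simplified sketch, not any production HUD's algorithm), distortion compensation amounts to resampling the rendered image through a per-pixel displacement map; content displaced by up to k pixels requires a k-pixel border reserved on the native panel, which is where the resolution loss comes from:

```python
import numpy as np

def apply_warp(image: np.ndarray, dx: np.ndarray, dy: np.ndarray) -> np.ndarray:
    """Resample `image` through a per-pixel displacement (warp) map.

    dx/dy give, for each output pixel, the source-pixel offset that
    pre-distorts the image so the windshield/mirror optics undo it.
    Nearest-neighbor sampling keeps the sketch short; a real system
    would use filtered interpolation.
    """
    h, w = image.shape
    ys, xs = np.indices((h, w))
    # Offsets that point outside the panel are clamped here; in practice
    # those are the reserved border pixels that cost resolution.
    src_x = np.clip(np.round(xs + dx).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + dy).astype(int), 0, h - 1)
    return image[src_y, src_x]
```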
Packaging of the AR HUD into the vehicle is perhaps one of the largest hurdles that currently limits market adoption of AR HUDs. Simply taking the existing thin-film transistor (TFT) LCD or digital light processing (DLP) display elements already used in traditional automotive HUDs and scaling the optics to achieve the larger VID and FoV requirements leads to solutions that today are at least 15 liters in volume. Packaging such a large component requires the vehicle to be built around the HUD, with significant accommodations required in designing the plenum (firewall), cross-car beam, steering column support structures, and instrument panel (dashboard). Additional packaging challenges presented by such large-volume designs include routing wiring harnesses, HVAC, and windshield defrost ducts. These, along with the integration challenges mentioned, argue for the need for ultra-compact AR HUD design solutions.
Multiple advanced optical technologies that enable more compact AR HUD designs are in development or production. Among these are phase-only holography; windshield-embedded, holographically produced diffractive optical elements (HOEs); and HOE-waveguide solutions. Because holographic technology occupies a preeminent position among the technologies enabling AR HUDs, we now delve deeper into this fascinating technology.

Using Holographic Technologies to Create Compact AR HUDs
Holography was first introduced by Dennis Gabor in 1948 to improve the image quality of electron microscopes.5 The word "holography" is composed of the Greek words holos, meaning "the whole," because the technology is capable of capturing the whole of light (amplitude and phase), and graphe, meaning "writing" or "drawing." In traditional static holography, the amplitude and phase of the light reflecting from an object are recorded into a photosensitive material (e.g., dichromated gelatin) by interfering the object beam with the reference beam (Fig. 4a).6 The photosensitive material with the recorded interference pattern is referred to as a hologram. To reconstruct the object beam, a coherent beam having the same wavelength as the reference beam is used to illuminate the hologram from the same direction as the recording reference beam. This produces a virtual object that is visible to the viewer, with both amplitude and phase information being reconstructed.
With the same technique, the characteristics of an optical component, such as a lens or wedge, also can be recorded in the photosensitive material. This component is referred to as a holographic optical element (HOE). Brown and Lohmann7 proposed computer-generated holography (CGH) in 1969 (Fig. 4b), replacing the recording process of traditional holography with a digitally generated hologram. The hologram is encoded onto a spatial light modulator (SLM). When the hologram-encoded SLM is irradiated with a coherent reference wave, each SLM pixel acts as a "Young's slit": the pixels individually produce a unique phase modulation, and their superimposed output creates the desired wavefront. Theoretically, to reconstruct the virtual object with minimal artifacts, the SLM must be capable of modulating both amplitude and phase. However, commercially available devices provide amplitude-only or phase-only modulation. Phase-only modulation is favored in applications for its high efficiency, with a trade-off in image quality. Liquid crystal on silicon (LCoS) devices often are used as SLMs for phase-only modulation.
The computation of a hologram is based on the mathematical approximation of free-space wave propagation. The pixelated SLM can be considered as the source plane in Fig. 5,8 and the reconstructed image at a specific distance lies at the observer plane.
Based on Huygens's principle, each point at the source plane can be considered a point source emitting a spherical wavefront. The electromagnetic field U(x,y,z) at the observer plane at distance z from the source plane U(ξ,η,0) is the superposition of the spherical wavefronts, and the real image that the observer sees is the intensity |U(x,y,z)|². When the distance z between the source plane and the observer plane is on the order of the wavelength of light, full wave equations are required to derive the field at the observer plane. As the distance z increases (z ≫ λ, where λ is the wavelength of light), a scalar approximation can be implemented, which is referred to as Rayleigh-Sommerfeld diffraction. As the distance z further increases, the diffraction can be categorized into a Fresnel (near-field) or Fraunhofer (far-field) regime, which simplifies the hologram computation.

Fig. 5. Hologram calculation based on a diffraction regime.8

In the Fraunhofer (far-field) regime, the spherical wavefronts from the point sources can be approximated as plane wavefronts, represented in mathematical form as e^(−i(2π/λz)(xξ + yη)). This leads to a fortuitous result: the field at the observer plane is the Fourier transform of the field at the source plane, which reduces the complexity of the hologram computation dramatically. Several phase retrieval algorithms that compute the phase-only hologram at the source plane have been proposed, with the Fourier transform as the core of the computation.
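The free-space propagation just described can be sketched numerically. Below is a minimal Fresnel transfer-function propagator in NumPy (the sampling values in the test are illustrative, not from the article; the constant phase factor e^(ikz) is omitted because it does not affect the observed intensity):

```python
import numpy as np

def fresnel_propagate(u0: np.ndarray, wavelength: float, z: float,
                      pitch: float) -> np.ndarray:
    """Propagate a sampled field u0 (pixel pitch `pitch`) a distance z
    by applying the Fresnel transfer function
    H(fx, fy) = exp(-i * pi * wavelength * z * (fx^2 + fy^2))
    in the spatial-frequency domain."""
    n, m = u0.shape
    fy = np.fft.fftfreq(n, d=pitch)[:, None]  # spatial frequencies (1/m)
    fx = np.fft.fftfreq(m, d=pitch)[None, :]
    h = np.exp(-1j * np.pi * wavelength * z * (fx**2 + fy**2))
    return np.fft.ifft2(np.fft.fft2(u0) * h)
```

Because |H| = 1 everywhere, the propagator conserves total power, as free-space propagation must.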
The Gerchberg-Saxton (GS) algorithm, also referred to in the literature as an error-reduction algorithm, was the first proposed algorithm using the Fourier transform relation between the observer plane (real image) and the source plane (SLM).9 Fig. 6 illustrates the flowchart. The algorithm starts with a random phase generator with values ranging from −π to π. The initial random phase seed is multiplied by the square root of the target image intensity to form a new function. A fast Fourier transform (FFT) is performed on the new function, and only the phase function from the outcome of the FFT is retained. An inverse FFT (iFFT) is performed on the phase function and, again, only the phase function is retained. The retained phase function is multiplied by the square root of the target image intensity, and the iteration continues. After multiple iterations, the image quality converges; the square of the amplitude function from the FFT is the resultant image. In some applications, such as displays and real-time imaging, the available computation time is limited by the frame rate, and the reconstructed image quality of an iteratively obtained phase-only hologram depends on the number of iterations. The computation time may not be sufficient to allow the image quality to converge with enough iterations. Therefore, non-iterative phase retrieval algorithms, including approaches based on machine learning and deep learning, are being investigated intensively.
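The GS loop described above can be sketched in a few lines of NumPy (FFT direction conventions vary between implementations; this sketch treats the SLM-to-image propagation as a forward FFT):

```python
import numpy as np

def gerchberg_saxton(target: np.ndarray, iterations: int = 50,
                     seed: int = 0) -> np.ndarray:
    """Iteratively retrieve a phase-only hologram (SLM plane) whose
    Fourier-plane intensity approximates `target`."""
    rng = np.random.default_rng(seed)
    amplitude = np.sqrt(target)
    # Start from the target amplitude with a random phase seed in [-pi, pi).
    field = amplitude * np.exp(1j * rng.uniform(-np.pi, np.pi, target.shape))
    for _ in range(iterations):
        slm_phase = np.angle(np.fft.ifft2(field))         # back to SLM plane, keep phase only
        image = np.fft.fft2(np.exp(1j * slm_phase))       # forward through phase-only SLM
        field = amplitude * np.exp(1j * np.angle(image))  # re-impose target amplitude
    return slm_phase
```

Illuminating a phase-only SLM displaying the returned phase and taking the far-field intensity |FFT|² reproduces the target pattern to within the algorithm's convergence.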
The digital approach to creating the wavefront provides enhanced flexibility for manipulating light. In automotive displays, many startup companies have shown the use of CGH in a HUD as the picture-generation unit (PGU),10,11 replacing the DLP projector or LCD. Compared to traditional HUDs with a DLP or LCD and freeform mirror(s), a CGH-based HUD has the advantages of efficient light use, high pixel density, wide color gamut, and the potential for a variable virtual image distance. The high pixel density yields higher-fidelity graphics and enables the use of higher magnification in the optical design. Furthermore, CGH provides an advantage in compact packaging over conventional DLP- or LCD-based HUDs, and hence, CGH-based HUDs can potentially be adopted in more vehicle programs.
The major challenge of implementing a CGH-based HUD in a vehicle is achieving sufficient FoV and eyebox. The eyebox is the space in which the eyes must be positioned for a driver to see the full graphic of a HUD without vignetting. Based on the conservation of étendue, there is a trade-off between the size of the eyebox (exit pupil) and the FoV.12,13 For a digital holographic display, the eyebox is determined by the size of the SLM, and the FoV is determined by the diffraction angle of the SLM (defined by the SLM's pixel pitch). The diagonal size of commercially available SLMs is ~10-20 mm, which is 5 to 10× smaller than the eyebox required for a HUD. The FoV can be expanded by manipulating the divergence of the incident beam,14 but that approach reduces the size of the eyebox. To date, the popular eyebox-expansion methods are steering the exit pupil to the user's eye with eye tracking11,15 and exit pupil replication.16,17 HOEs, in which a specific optical property is recorded into a photosensitive material via the traditional holographic method, have attracted wide interest for pupil replication because of their compact form factor, efficiency, and design flexibility. The HOE is part of the diffractive optical element family, which manipulates light through diffraction. Photopolymers, whose locally varying refractive index is determined by exposure intensity, are widely used as the base material of HOEs.18 With proper design of the thickness and refractive index modulation, an HOE can direct light to the desired location at a designed efficiency. Pupil expansion based on a holographic waveguide combines total internal reflection within the waveguide with diffraction of light into the designed direction, at a specific efficiency, at each bounce in the waveguide. A waveguide HUD requires a set of nine holographically produced gratings (three for each color) that are affixed to the waveguide.
Each color has an input grating that couples light into the waveguide and expands the image in one dimension, a second grating that expands the image in the orthogonal dimension, and a third grating (the output grating) that extracts the image over the whole area of the waveguide. The extracted image is directed to the windshield and, ultimately, to the driver's HUD eyebox. This technology, which is agnostic to the display technology used to generate the image, can produce wide-FoV images at a large VID, as required of an AR HUD. It also offers vehicle packaging advantages, particularly in the up-down (z-axis) dimension.
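Underlying all of these expansion schemes is the étendue limit mentioned above. The sketch below computes the diffraction-limited FoV of a phase SLM from its pixel pitch (the 532 nm wavelength, 3.74 µm pitch, 15 mm panel size, and 2× magnification are illustrative assumptions, not values from the article) and shows how optics that enlarge the eyebox shrink the FoV by the same factor:

```python
import math

def slm_fov_deg(wavelength_m: float, pixel_pitch_m: float) -> float:
    """Full diffraction-limited FoV of a phase SLM. The finest grating it
    can encode has a two-pixel period, so the diffraction half-angle is
    asin(wavelength / (2 * pitch))."""
    return 2.0 * math.degrees(math.asin(wavelength_m / (2.0 * pixel_pitch_m)))

# Representative (hypothetical) LCoS numbers: 532 nm green, 3.74 um pitch.
fov = slm_fov_deg(532e-9, 3.74e-6)   # ~8 degrees full FoV
# Etendue conservation: magnifying the ~15 mm SLM into a 2x larger eyebox
# divides the delivered FoV by the same factor, and vice versa.
eyebox_mm, magnification = 15.0, 2.0
traded_fov = fov / magnification     # ~4 degrees for a ~30 mm eyebox
```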
In addition to pupil expansion, many companies have demonstrated conference prototypes that embed reflective HOEs in the windshield to provide optical power and redirect the light to the driver.19-21 Ideally, HOEs diffract only the wavelength of light emanating from the AR HUD and not the surrounding ambient light, so this solution works best with narrowband illumination sources. The advantage of HOEs is that they add optical power to the windshield, relieving design constraints within the AR HUD module. It should be noted that windshield HOE technology is largely agnostic to the type of image-generating technology used within the AR HUD module and can be viewed as a complementary technology.

Summary
AR has great potential for public acceptance when implemented in HUDs. AR HUDs can improve situational awareness, trust, and comfort for drivers and passengers. To achieve the best AR HUD experience, all three functional blocks (sense, register, and display) must meet human perceptual requirements. Holographic technology shows superior capability for enabling the implementation of AR HUDs. While challenges still exist, we look forward to more creative solutions and disruptive technologies that can overcome these obstacles.

Kai-Han Chang is a senior researcher at General Motors in the Information Display and Simulation group within GM R&D. Her research specialty is liquid crystal optics and device development. She is responsible for developing novel optical solutions for both head-up displays and head-down displays. Chang holds a PhD in chemical physics from Kent State University. She can be reached at kai-han.chang@gm.com.
Thomas Seder is a GM Technical Fellow and Chief Technologist-HMI working within the Vehicle Systems Research Lab of GM R&D. His research interests are in the areas of information architecture and information display, display device physics, and the use of holographic methods to create compelling HUD-based AR and enhanced vision systems. Seder holds a PhD in physical chemistry from Northwestern University and an MBA from the University of Iowa. He can be reached at thomas.seder@gm.com.