The State of the Art of Spatial Interfaces for 3D Visualization

We survey the state of the art of spatial interfaces for 3D visualization. Interaction techniques are crucial to data visualization processes, and the visualization research community has been calling for more research on interaction for years. Yet, research papers focusing on interaction techniques, in particular for 3D visualization purposes, are not always published in visualization venues, sometimes making it challenging to synthesize the latest interaction and visualization results. We therefore introduce a taxonomy of interaction techniques for 3D visualization. The taxonomy is organized along two axes: the primary source of input on the one hand and the visualization task they support on the other hand. Surveying the state of the art allows us to highlight specific challenges and missed opportunities for research in 3D visualization. In particular, we call for additional research in: (1) controlling 3D visualization widgets to help scientists better understand their data, (2) 3D interaction techniques for dissemination, which are under-explored yet show great promise for helping museums and science centers in their mission to share recent knowledge, and (3) developing new measures that move beyond traditional time and error metrics for evaluating visualizations that include spatial interaction.


Introduction
The visualization research community has long recognized the importance of user interface research and the special role that interactive techniques can play in data visualization processes. Over the years, calls for additional research on interactive techniques have been raised repeatedly, highlighting the critical and foundational role of interaction within the visualization communities that focus on both non-spatial data (e.g. [Rhe02,TM04,CT05,YKSJ07]) and spatial (often 3D) data (e.g. [Sut66, Hib99, Rhe02, Joh04, TM04, Kee10, BDP11, LK11, KI13, Mun14, CSVBS15, FCC*15]). However, more study of vis-centric interaction is needed. Our specific interest in this survey is spatial 3D data. While interactive systems and techniques are certainly published at visualization venues, we have noticed that research papers that introduce new interaction techniques for exploring, filtering, selecting or otherwise manipulating 3D data are frequently published at non-visualization venues, so that visualization researchers may not always learn about them. We hope to bridge this gap, paying special attention to spatial user interfaces. We believe there is significant potential to make 3D interactive visualization systems more effective by leveraging new readily available sensing technologies [Bes17, LKM*17] and adapting 3D interaction techniques developed in other contexts [JH13] to work for the special needs of interactive data visualization tasks. Such an approach would make use of the skills to interact with the physical 3D world that people naturally possess, and, thereby have potential for great positive impact since so many important datasets have an inherent 3D structure: data acquired from simulations as well as spatial data, medical data or biological data. 
To contribute to this future, this state of the art report surveys the spatial 3D interaction techniques that have been presented in the literature, presents a task-based framework for guiding new research on vis-specific spatial 3D interaction techniques, and repeats the call for additional research on spatial 3D interfaces specifically to support 3D visualization tasks. Spatial 3D datasets are particularly challenging to visualize. Unlike general 3D interaction, visualization of 3D datasets is less focused on creation than it is on sense-making. Making sense of 3D datasets requires an ability to manipulate the data or the view, to select specific regions of interest in 3D, and to place and manipulate visualization widgets to better understand the inherent structure of the dataset or some of its internal properties. While 3D interaction techniques address some of these challenges on pre-defined objects, 3D visualization techniques should enable users to achieve all operations on non-predefined structures. This additional requirement is not satisfied by most of the classical 3D interaction techniques when used in a spatial visualization context. Moreover, 3D interaction, such as manipulation, selection and annotation, becomes more challenging when applied to complex features or structures of 3D visualization datasets, especially when more precise interaction is needed. For instance, selection of neural fibers becomes more difficult since they are much thinner and denser than the objects that are used to develop more generic 3D selection techniques. Similarly, annotation is more challenging when those annotations need to be linked precisely to a 3D volumetric context rather than just recorded as a voice-over.
For the purpose of this report, we characterize spatial 3D interaction techniques for data visualization as post-WIMP user interaction techniques that employ tangible interaction proxies, tracked gestures, and/or 3D input devices to enable users to better leverage natural, human skills for working with data visualized in 3D spaces. This notion closely relates to the term 3D interaction, which is included as a keyword in several past surveys. Our report is unique in its combination of (1) focusing on interaction techniques to support data exploration tasks and (2) surveying multiple classes of spatial interaction techniques. We discuss prior work on both visualization-specific interaction techniques and more generic 3D interaction techniques. The latter have traditionally appeared at venues such as IEEE VR (which merged with IEEE 3DUI), ACM CHI, ACM I3D (especially, in the early years of the conference), ACM SUI, ACM UIST, IEEE ISMAR, and sometimes also ACM ISS, which only a small portion of the visualization research community regularly attends. Thus, an important contribution of our report is to bring the results from these communities together within a single document.
Past surveys on 3D interaction techniques [Han97, JH13, JH14, LKM*17, MCG*19] have focused on the generic tasks for 3D interactions (namely selection, manipulation, navigation and system control) but not on specific tasks that are of paramount importance for visualization applications, such as 3D picking/selections [Wil96b], concurrent manipulation of data and exploration objects, specification of 3D primitives for seeding or path planning, temporal navigation, etc. Hand's survey [Han97], written in 1997, covers just the important early work in this area, while Christie et al. [CON08] focused exclusively on camera control. Other reviews focus on specific interaction paradigms. For example, Paneels and Roberts' review of haptic data visualization [PR10] discussed solely how data can be visualized or perceived through haptic interaction. The survey by Groenewald et al. [GAI*16] only covered 3D control with mid-air gestures. Another relatively recent survey of 3D interaction techniques by Jankowski and Hachet [JH13,JH14] placed the focus on generic 3D manipulation with mouse-based and touch-based systems. This is also the focus of the more recent work from Mendes et al. [MCG*19]. Finally, some authors focused on 3D data visualization but did not address 3D interaction. For instance, the survey from Oeltze-Jafra et al. [OJMN*19] focuses on medical data generation and its analysis without highlighting the large body of work done on interactive visualization tasks for 3D datasets.
We organized our state of the art report as follows. In Section 2, we define the interactive tasks users must perform with visualizations. In Section 3, we present the actual survey of the literature, using the tasks defined in Section 2 as an organizing principle. Finally, in Section 4, we discuss opportunities for future work that result from our review.

Defining a Classification System
Before surveying the spatial interaction literature, it is important to have a common understanding of both the interactive tasks users need to perform with visualizations (e.g. view manipulation, working with widgets, data selection) and the major interaction paradigms (e.g. tactile, tangible, and mid-air) that are possible with spatial interaction techniques. These two topics form the two axes of the classification system used for the survey presented in Section 3.

Axis 1: Spatial interaction paradigms
The first axis is the spatial interaction paradigm. A variety of spatial interaction paradigms have been investigated for both 3D manipulations and visualization-specific interaction techniques; we focus in particular on tactile/touch interaction, tangible interaction, mid-air gestural interaction, and hybrid interaction, i.e. interaction techniques combining several interaction paradigms, since these paradigms are most readily supported by current spatial interface hardware. While voice input could also be considered, using voice for direct manipulation is generally discouraged [KI13] and it is seldom used alone. Consequently, voice input falls under our category of hybrid interaction paradigms.

Paradigm 1: Tactile and pen-based interaction
Sutherland's Sketchpad [Sut64], created in the 1960s, used a lightpen to interact on a screen, demonstrating an early form of the direct-manipulation interactions that are now common in pen and touch-based interfaces. Research on interacting with touch screens followed with different sensing strategies: capacitive sensing [Joh65,Joh67], optical tracking [EJG73], or resistive sensing [CJWC75]. The first multi-touch screen followed in 1976: the keyboard with variable graphics [KM76]. Since then, multiple sensing systems and configurations have been explored. With the widespread adoption of mobile touch-enabled smartphones, horizontal projection surfaces integrated into a tabletop soon also became touch-enabled. Shortly thereafter, tabletops became possible desktop surrogates.
The benefits of tactile interaction over other forms of interaction have been deeply studied for a variety of tasks and parameters. Studies have compared mouse and tactile interaction for speed [SS91,FWSB07,GBC13], error rate [SS91,FWSB07], minimum target size [AZ03], etc. Similarly, studies have compared tactile with tangible interaction for tasks as varied as puzzle solving [TKSI07,Wan10], layout-creation [LJZD10], photo-sorting [TKSI07], selecting/pointing [RGB*10], 3D manipulations [BIAI17b] and tracking [JDF12]. To summarize, tactile interaction appears to be a good compromise between fast and precise input. Tactile interaction also lends itself to a direct style of interaction [BIRW19] where users place their fingers right on top of the 2D or 3D representations of the data they wish to manipulate. The directness of tactile interaction has been studied in previous work [SS91, MCN94, PM03, SBG09, KH11, LOM*11, SG15,BIRW19]. Studies confirm that it increases the user's impression that they are directly manipulating [Shn83] the data they are visualizing, which can make the interaction more engaging and can encourage further manipulations. Despite these interesting advantages, tactile interaction is often limited and limiting. It is limited because it is often used as a discrete interaction mechanism, while our human interaction mechanisms are continuous [FTW12]. It is also limiting because many complex tasks (in particular for 3D manipulations) require input/control with more than three degrees of freedom. Providing them using tactile input usually requires multiple fingers, thus leading to occlusion issues.
It is possible to distinguish two main types of devices offering tactile interaction. First, there are touch-enabled tabletops or wall displays, which are fixed and usually facilitate the viewing of large data with a possibility to carry out co-located cooperative work. Second, virtually all mobile devices today offer a multi-touch interface; they are easy to transport and affordable. These two types of devices, because of their inherent size, lead to different interaction designs. Indeed, while a large display can easily support more than three fingers without much occlusion, mobile devices are not that permissive. Similarly, large screens allow designers to add widgets on the screen, but this is not possible on mobile devices due to their much smaller screens where widgets could waste some precious visualization space. Other forms of touch interaction can also be found in the literature (e.g. skinput [HTM10]), but to the best of our knowledge, are not used for 3D spatial visualization applications.

Paradigm 2: Tangible and haptic interaction
The first prototypes and platforms for tangible interaction were developed and studied as early as 1976 with the Slot Machine [PMIoT76], designed to help children discover programming languages. Other prototypes followed [Ais79,FFF80]. In 1996, Fitzmaurice introduced the concept of graspable user interfaces [Fit96]: an interaction paradigm that used physical objects to synchronously manipulate digital counterparts. In his work, the graspable props were associated with specific functions and allowed users to interact with both hands simultaneously. This concept evolved and expanded into Tangible User Interfaces (TUIs) [IU97]. Tangible User Interfaces aim to leverage people's natural skills for manipulating the surrounding physical environment [Fit96,IU97,Ish08]. Tangible input inherently offers six integrated DOF per prop. Several studies have investigated the benefits of TUIs when compared to other interaction paradigms for different tasks (e.g. [CMS88, HTP*97, TKSI07, Wan10, RGB*10, TKI10,BIAI17b]). Overall, tangible interactions have been proven to be useful for 3D rotations [CMS88, HTP*97] and, more generally, for fast and precise 3D manipulations [BIAI17b], collaboration [MFH*09, OAWH11], and entertainment [XAM08,BIAI17b].
Tangible interaction is promising for visualization tasks and purposes: it allows users to achieve complex 3D manipulations with simple real-world style gestures [Fit96,IU97]. Consequently, tangible interaction is perceived as more flexible than other interaction paradigms usually are (e.g. [HPGK94,BIAI17a]).
Tangible props may take the form of the data, serving as both a physical representation of data and a means of interacting with the data, or their physical form may be more abstract, providing passive haptic context or support for the interface, but without much visual feedback on the prop itself, e.g. [HPGK94, GHP*95, FBZ*99, IGA14a, JLS*13, IGA14b]. Extending beyond passive haptic aids, we also include active haptics in this paradigm. Haptic devices enable 3D manipulation and tactile feedback within a restricted interaction space (due to the limited range of robotic arms, cables, etc.). Manipulations with these devices can be programmed to feel realistic, as they would in the real world, or 'extra' effects can be added using programmatically controlled vibrations and forces. This property has been used for visualizing 3D datasets. One possible advantage of feeling the data through these output forces is the ability to explore and understand dense 3D datasets where occlusion or cluttering prevent clear visual-only displays (e.g. [TRC*93, AS96, LPYG02, LPGY05, LCP*07, COPG15, YJC15]).

Paradigm 3: Mid-air gestural interaction
Mid-air gestural interaction is often traced back to the Ultimate Display concept introduced by Ivan Sutherland [Sut65], although concrete implementations are only recent, with a first step taken by the commercialization of the Wii controller [KV19]. Like tangible interaction, mid-air gestural interaction mimics the physical actions we make in the real world [FKK07] and, thus, has been studied as a promising approach to 3D manipulation [KTY97, HIW*09, WPP11, SGH*12], including for the purpose of increasing accuracy [FKK07,Osa08]. While it is possible to manipulate and track tangible objects in the air, the research notes significant differences in mid-air gestures made using only the hands; thus, this paradigm focuses exclusively on inputs made in mid-air without the need to hold an object. Such gestures can be tracked via wearable technologies, such as a glove [FMHR87], or optically. Optical tracking sometimes requires placing markers on the body (e.g. [FH00]). Solutions for precise tracking of the fingers have traditionally been elusive or expensive, but recent devices, such as the Microsoft Kinect and Leap Motion, now support precise hand and finger tracking (e.g. see [CPLCPFR14,SKR*15]), enabling a richer set of hand or body gestures. While this passive optical tracking helps free users from the need to wear any markers or devices [MS16,EAG17,Iss17], the accuracy is not quite at the same level [EAG17].
In the medical field, the need to maintain a sterile environment naturally leads to an interest in touchless interaction [ [HPSH00] included adding low-cost components to mobile devices; the authors concluded that the resulting hybrids 'may prove to be the most practical approach.' Others also argued for the benefits of hybrid approaches to augment a limited interaction space [KR09], overcome the inherent limitations of a device (e.g. augmenting the number of DOF that can be manipulated [KRG*12], reducing the occlusion limitation with tactile interaction [BIH08]), combine the benefits of two interaction paradigms [BIAI17b], or simply tackle complicated tasks (e.g. seeding point placement in 3D [BIAI17a]). The resulting hybrid interaction paradigms can be used to support tasks ranging from abstract visualization tasks [AMR16,CVLB18] to 3D manipulations [LODI16,BIAI17a,BAI17,Bes17], and the combinations of paradigms are varied: pressure and tactile interaction (e.g. [CVLB18]), tactile and tangible interaction (e.g. [JGAK07, BIAI17a, BSY*19]), pressure and tangible interaction [BAI17], mid-air gestural interaction and tactile interaction (e.g. [WB03, HIW*09]), mid-air gestural interaction with tangible interaction (e.g. [SLM*03]) or vocal interaction with others [TFK*02]. We, however, limit our review to hybrid paradigms that specifically address 3D visualization problems and tasks.

Axis 2: Interaction tasks for 3D visualization
The second axis categorizes the 3D visualization tasks users must accomplish using the various interaction paradigms identified in Axis 1. Formal task taxonomies have been developed previously in both the visualization and 3D user interface research communities. Accordingly, our classification combines aspects from both areas of related work. The tasks involved in data visualization have been studied extensively, and task classifications have been proposed, both in early work [WL90,CR96,Shn96] and more recently [YKSJ07, BM13, RCDD13, SNHS13, Mun14, RAW*15, KK17,LTM18]. Most of these visualization task classifications are generic in the sense that they can apply to any type of visualization, including our focus on 3D visualization. Likewise, classifications also exist for understanding 3D interaction. LaViola et al. [LKM*17] identify the major 3D user interface task categories as selection, manipulation, navigation and system control. These categories are similarly generic: they can apply to any 3D user interface, and therefore also to the focus of this survey, interactive visualizations.
Although task taxonomies from both areas clearly apply, our work also builds upon the arguments laid out by Keefe and Isenberg [KI13] who suggest that 3D visualization does introduce special requirements for interaction tasks. One example is exploring dense data within 3D neural pathway visualizations; the precision required for making 3D selections in this visualization context is far greater than in the scenarios typically studied within more generic 3D user interface research (e.g. quick 3D modelling, selecting items on a shelf during a virtual shopping experience). In addition to selection, other generic 3D interaction tasks, such as manipulation and navigation, also have special requirements in the context of 3D visualization. To emphasize these and connect as closely as possible to earlier classification systems (sometimes a direct 1-to-1 mapping is impossible), we organize this axis of the taxonomy around three high-level task groups: (1) volumetric view and object manipulation; (2) defining, placing and manipulating visualization widgets; and (3) 3D data selection and annotation. In the following discussion we place these task groups as closely as possible within the context of earlier classifications and describe the special 3D visualization challenges these tasks present and how they can be addressed by spatial interaction techniques.

Task Group 1: Volumetric view and object manipulation including clipping
Volumetric view and clipping manipulation tasks are fundamental to visualizing spatial 3D data effectively because it is rare that a single viewpoint can be found where all of the important aspects of the data may be analyzed. This issue is most often addressed via interaction to adjust the viewpoint of the rendering(s) or to manipulate clipping planes within the data. As a category, Volumetric View and Object Manipulation corresponds to 3D data space/view navigation and temporal navigation in Keefe and Isenberg's taxonomy [KI13] and relates closely to the more general VIS tasks of explore and reconfigure; the closest link in the 3D tasks extracted from LaViola et al. [LKM*17] is Manipulation and Navigation.
3D manipulations are often studied in human computer interaction to allow users to translate objects, rotate around the three axes, and perform uniform (or non-uniform) scaling. Considering that any manipulation of an axis requires 1 Degree of Freedom, this translates to providing at least 7 Degrees of Freedom (DOF), and possibly up to 9. A wide variety of techniques have been proposed (e.g. [Han97,HCC07, LGK*13, IBIA16, LKM*17]) and most have also already been surveyed [JH13].
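As a worked illustration of the count above: three translations, three rotations, and one uniform scale factor give 3 + 3 + 1 = 7 DOF, while per-axis (non-uniform) scaling gives 3 + 3 + 3 = 9 DOF. The Python sketch below packs such a parameter set into a 4x4 homogeneous matrix. The names (`Transform9DOF`, `dof_count`, `apply`) and the rotation order (about x, then y, then z) are assumptions made for illustration; they are not taken from any of the surveyed techniques.

```python
import math
from dataclasses import dataclass


@dataclass
class Transform9DOF:
    """Hypothetical parameter set: 3 translations, 3 rotations, 3 scale factors."""
    tx: float = 0.0
    ty: float = 0.0
    tz: float = 0.0
    rx: float = 0.0  # rotation about the x axis, in radians
    ry: float = 0.0  # rotation about the y axis, in radians
    rz: float = 0.0  # rotation about the z axis, in radians
    sx: float = 1.0
    sy: float = 1.0
    sz: float = 1.0

    def to_matrix(self):
        """Return the 4x4 row-major matrix T * Rz * Ry * Rx * S."""
        cx, sinx = math.cos(self.rx), math.sin(self.rx)
        cy, siny = math.cos(self.ry), math.sin(self.ry)
        cz, sinz = math.cos(self.rz), math.sin(self.rz)
        rot = [
            [cz * cy, cz * siny * sinx - sinz * cx, cz * siny * cx + sinz * sinx],
            [sinz * cy, sinz * siny * sinx + cz * cx, sinz * siny * cx - cz * sinx],
            [-siny, cy * sinx, cy * cx],
        ]
        scale = (self.sx, self.sy, self.sz)
        trans = (self.tx, self.ty, self.tz)
        # Scale each rotation column, then append the translation column.
        rows = [[rot[r][c] * scale[c] for c in range(3)] + [trans[r]] for r in range(3)]
        rows.append([0.0, 0.0, 0.0, 1.0])
        return rows


def dof_count(uniform_scale):
    """3 translations + 3 rotations + 1 (uniform) or 3 (per-axis) scale factors."""
    return 3 + 3 + (1 if uniform_scale else 3)


def apply(matrix, point):
    """Transform a 3D point by a 4x4 homogeneous matrix."""
    x, y, z = point
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in matrix[:3])
```

With uniform scaling, a technique would tie `sx`, `sy`, and `sz` to a single value, which is why only one DOF is counted for scale in the 7-DOF case.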
However, volumetric view and object manipulation goes beyond 'simple' 3D manipulations and raises specific challenges that are not typically present in general spatial interfaces for navigation (e.g. redirected walking techniques, WIMs, wand-based flying). While simple rotations and/or translations make it possible to view 3D data externally, many 3D spatial datasets are dense, and relevant internal aspects of the data are, therefore, naturally occluded. Interactions for volumetric view and object manipulation should directly address this need. Cutting planes or transfer function editors are often used for this purpose, and since these are widget-based, one might consider these as falling under a visualization widget manipulation task. However, from the standpoint of the user's cognitive approach they are tied so tightly to view manipulations (e.g. moving the camera inside volumetric data necessarily involves clipping) that it can be useful to think of these as integral volumetric view and object manipulation tasks. In fact, we argue that this is the type of insight that is useful when determining the best ways to translate 3D user interfaces created for more generic 3D environments (e.g. architectural walkthroughs, simulations) to 3D visualization applications.
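To make the role of a cutting plane concrete, the following minimal Python sketch keeps or discards data samples according to their signed distance to a plane, which is one common way to expose occluded interior data. The plane representation (a point plus a normal) and the function names are assumptions for illustration only.

```python
import math


def signed_distance(point, plane_point, plane_normal):
    """Signed distance from a point to a plane; positive on the side the normal points to."""
    length = math.sqrt(sum(c * c for c in plane_normal))
    if length == 0.0:
        raise ValueError("plane normal must be non-zero")
    dot = sum((p - q) * n for p, q, n in zip(point, plane_point, plane_normal))
    return dot / length


def clip(points, plane_point, plane_normal):
    """Keep only the points on the non-negative side of the cutting plane."""
    return [p for p in points if signed_distance(p, plane_point, plane_normal) >= 0.0]
```

In an interactive system, `plane_point` and `plane_normal` would be driven by the input device (for example a tracked prop or a touch gesture), so every movement of the device re-evaluates which samples remain visible.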
Many 3D visualization view manipulations consider only cutting planes to slice through the data, but it is interesting to notice that some experts might need non-planar or free-form surface slicing of their data to provide an easier and more natural analysis of some datasets [PTH98, GPB99, PSOP01, MFOF02, MFF03, SGH03, KVLP04, RRRP08, REM11, KGP*12, LSG*16, PCE*17]. These approaches can be linked to techniques such as peeling, which can be useful for surgical planning [SGH03, KVLP04, BHWB07, MRH08, REM11, HMP*12, PCE*17], but can also be used in other domains, such as reservoir visualizations [SSSS11]. Non-planar slices are often defined relative to the data but can also be specified with significant user input (e.g. Beyer et al. [BHWB07] combine 2D mouse input with medical scan data). Specifying, modifying, and positioning non-planar slicing objects without relying on underlying data poses an interesting challenge for spatial interfaces, and this is a topic that we return to in later sections (see Section 3.1 and Section 3.2).
Another challenge is the manipulation of data and cutting planes with axis-based constraints [BKBN12]. 3D visualization users must often manipulate/zoom heterogeneous datasets, including many manipulations along a single axis [FGN10], and more generic 3D manipulation techniques, as typically studied in the user interface literature, do not often address this latter point.
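One simple way to realize an axis-based constraint is to project the free 3D displacement reported by the input device onto the chosen axis, so that only the component along that axis is applied. The sketch below illustrates this idea; the function name `constrain_to_axis` is hypothetical, and this projection is only one of several possible constraint schemes.

```python
def constrain_to_axis(displacement, axis):
    """Project a free 3D displacement onto a single (not necessarily unit) axis."""
    norm_sq = sum(a * a for a in axis)
    if norm_sq == 0.0:
        raise ValueError("axis must be non-zero")
    # Scalar projection coefficient of the displacement along the axis.
    factor = sum(d * a for d, a in zip(displacement, axis)) / norm_sq
    return tuple(factor * a for a in axis)
```

For example, a hand movement of (1, 2, 3) constrained to the y axis moves the data or cutting plane by (0, 2, 0) only.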

Task Group 2: Defining, placing and manipulating visualization widgets
Spatial 3D data may be analyzed simply by looking, but interacting with filters, probes, and other visualization widgets is required to more deeply explore and interrogate the data. Visualization widgets are virtual tools that are manipulable by users in much the same way as any traditional 2D or 3D user interface widget but that have a primary purpose of displaying data. A cutting plane that users can grab and manipulate relative to volume data is one example that fits well within this category. As a category, Defining, Placing, & Manipulating Visualization Widgets corresponds to positioning/manipulating data exploration objects or probes such as drilling cores (2 DOF) and specifying/manipulating 3D points and other primitives for particle seeding, picking, or path planning in Keefe and Isenberg's taxonomy [KI13]. Like task group 1, this task group relates most closely to the more general VIS tasks of explore and reconfigure and LaViola et al.'s [LKM*17] Manipulation and Navigation task. Like Keefe and Isenberg, we believe it is important to highlight this as a separate task category because of its longstanding importance in exploratory 3D visualization systems.
Visualization widgets are extensively used in 3D flow visualization. For instance, aerodynamicists studying fluid flows might begin a visualization session by manipulating cutting planes to understand the internal structure of the visualized data. Then, they often need to rely on placing and manipulating widgets (e.g. particle emitters, streamline rakes) to further explore and understand the data or create useful pictures for communicating their findings. Many flow visualization widgets rely upon particle tracing and appropriate particle seeding. Weightless particles are placed within a vector field and then advected with the flow. It is then possible to integrate the path of particles along the flow as a function of time [Man01] and to visualize the resulting path with lines, ribbons or stream surfaces [PvW94, SFL*04]. The quality of the resulting visualization often relates to the quality of the original particle seeds. Thus, controlling this seeding interactively using a widget is often a major benefit. Semi-automated techniques are also available, for example, specifying a single 3D origin from which several particles are generated with randomly jittered 3D offsets. This technique has proven useful for analysing reservoir data [Wil96a] or other forms of flow visualization [SBPM98,Man01,Sch07,KI13]. Aerodynamicists also make use of streakline or filament line visualizations [Fre93, BJS*98], which can be implemented as virtual smoke emitter widgets. The results help to visualize vortices more directly [Fre93], and, again, 3D placement of the emitter benefits from interactive control. Particle seeding is also used in medical visualization to depict pulsatile blood flow [Ste00]. Similarly, traces can help meteorological visualization of typhoons [LGY15]. While it is possible to display all streamlines simultaneously for each field in the data, this can lead to occlusion. Automated algorithms have been developed to minimize occlusion (see e.g. [TEC*16]), but the issue can be avoided altogether with the help of interactive placement.
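The seeding and advection steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: the vector field is steady (time-independent), integration uses forward Euler (practical systems often prefer higher-order schemes such as Runge-Kutta), and all names (`jittered_seeds`, `advect`, `swirl`) are hypothetical.

```python
import random


def jittered_seeds(origin, count, radius, rng):
    """Generate `count` seeds around one origin with uniform per-axis jitter in [-radius, radius]."""
    return [tuple(c + rng.uniform(-radius, radius) for c in origin) for _ in range(count)]


def advect(seed, velocity, dt, steps):
    """Trace a weightless particle through a steady vector field with forward Euler steps."""
    pos = tuple(seed)
    path = [pos]
    for _ in range(steps):
        v = velocity(pos)
        pos = tuple(p + dt * c for p, c in zip(pos, v))
        path.append(pos)
    return path


def swirl(pos):
    """Hypothetical example field: rotation about the z axis plus a slow upward drift."""
    x, y, _z = pos
    return (-y, x, 0.1)


if __name__ == "__main__":
    rng = random.Random(0)
    # One interactively placed origin yields several nearby streamlines.
    for seed in jittered_seeds((1.0, 0.0, 0.0), count=3, radius=0.05, rng=rng):
        print(advect(seed, swirl, dt=0.01, steps=100)[-1])
```

In a spatial interface, the `origin` argument is the value a seeding widget would supply, for instance the tracked 3D position of a hand or prop.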
Interactive visualization widgets have also been used in other contexts. The Glyph Lens technique uses a magic lens effect to overcome issues of occlusion for viewing volumetric tensor fields [TLS17]. A full overview of lenses and their use in visualization is available in the survey from Tominski et al. [TGK*17]. In addition to these primitives, domain experts sometimes need to assess the values of specific points in their datasets, a feature that is often implemented with a probe widget. Interactively positioned 3D probe widgets have been used to facilitate the computationally-heavy inspection of 4D MRI Blood-Flow [vPOBB*11] and other complex data [MEV*06, KGP*12]. Filter widgets have been explored [GNBP11], as have measurement widgets for assessing spatial relations to help, for example, with surgical planning [PTSP02,RSBB06].
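As an illustration of what a point probe might compute, the sketch below samples a scalar volume at a continuous 3D position using trilinear interpolation. Trilinear interpolation and the nested-list volume layout `volume[z][y][x]` are assumptions made for this example; the cited probe widgets may sample their data differently.

```python
def probe(volume, x, y, z):
    """Trilinearly interpolate volume[z][y][x] at a continuous position given in voxel units."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    if not (0 <= x <= nx - 1 and 0 <= y <= ny - 1 and 0 <= z <= nz - 1):
        raise ValueError("probe position lies outside the volume")
    x0, y0, z0 = int(x), int(y), int(z)
    # Clamp the upper neighbours so probing exactly on the last voxel stays in range.
    x1, y1, z1 = min(x0 + 1, nx - 1), min(y0 + 1, ny - 1), min(z0 + 1, nz - 1)
    fx, fy, fz = x - x0, y - y0, z - z0

    def lerp(a, b, t):
        return a + (b - a) * t

    c00 = lerp(volume[z0][y0][x0], volume[z0][y0][x1], fx)
    c01 = lerp(volume[z0][y1][x0], volume[z0][y1][x1], fx)
    c10 = lerp(volume[z1][y0][x0], volume[z1][y0][x1], fx)
    c11 = lerp(volume[z1][y1][x0], volume[z1][y1][x1], fx)
    return lerp(lerp(c00, c01, fy), lerp(c10, c11, fy), fz)
```

The probe position `(x, y, z)` is the quantity a spatial interface would control, for example through a tracked stylus mapped into voxel coordinates.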

Task Group 3: 3D data selection and annotation
Selection is the first step in accessing deeper information about some subset or feature of the 3D spatial data, annotating these data to include insights or questions, and many other operations that are critical to interactive data analysis. Selection can take many forms depending upon the data involved. Dense data with small features of interest and/or features that are not well defined often make this task a significant challenge. As a category, 3D Data Selection and Annotation corresponds to 3D picking or selection of data subsets for further analysis in Keefe and Isenberg's taxonomy [KI13]. 3D Selection maps to the more general VIS tasks of abstract/elaborate and filter. The equivalent 3D task from LaViola et al. [LKM*17] depends upon the implementation but can fall under Selection, Manipulation and Navigation, System Control, or even Symbolic Input.
Selecting specific regions of interest is essential for revealing interesting patterns, properties, or internal structures in 3D data [Wil96b]; thus, selection is a critical task to support for data visualization [Ban14]. 2D regions are usually defined using picking, brushing or lassos, often achieved with a mouse/pen or on a tactile screen (both modalities provide the needed 2 DOFs). Many generic 3D object selection techniques in virtual environments rely on 3D ray-casting [AA13]: a ray, cast from the user's hand, selects the first object it hits. A number of variations on ray-casting are possible, and it is probably the most widely used 3D selection technique [TJ00, CSD03, dHKP05a, OF03, DHKP05b, GB06, VGC07, AAT08, KGDR09, BPC19, BS19, RBP*19, LYS20]. A major limitation of ray-based selections is, of course, the difficulty of selecting small and/or far-away objects, which is often complicated by hand jitter. Expanding the ray to a cone helps with this [LG94, FHZ96, OBF03, SBB*06, SP04,Ste06], and other primitives may also be used [ZBM94,WHB06,VGC07].
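The basic ray-casting rule described above (select the first object the ray hits) can be sketched as follows. Objects are approximated by bounding spheres purely to keep the intersection test short; that simplification and all names (`ray_sphere_hit`, `pick_first`) are assumptions for illustration.

```python
import math


def ray_sphere_hit(origin, unit_dir, center, radius):
    """Return the distance along the ray to the first hit on a sphere, or None if it misses."""
    oc = tuple(o - c for o, c in zip(origin, center))
    b = sum(d * e for d, e in zip(unit_dir, oc))
    c = sum(e * e for e in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    # Try the nearer intersection first; fall back to the farther one if the
    # ray starts inside the sphere.
    for t in (-b - root, -b + root):
        if t >= 0.0:
            return t
    return None


def pick_first(origin, direction, spheres):
    """Return the name of the nearest sphere hit by the ray, or None.

    `spheres` maps a name to a (center, radius) pair.
    """
    length = math.sqrt(sum(d * d for d in direction))
    if length == 0.0:
        raise ValueError("ray direction must be non-zero")
    unit_dir = tuple(d / length for d in direction)
    best_name, best_t = None, math.inf
    for name, (center, radius) in spheres.items():
        t = ray_sphere_hit(origin, unit_dir, center, radius)
        if t is not None and t < best_t:
            best_name, best_t = name, t
    return best_name
```

The sketch also makes the stated limitation visible: as `radius` shrinks or the distance grows, a small angular error in `direction` (for example from hand jitter) is enough to make the discriminant negative and miss the target.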
Unfortunately, many of these classic 3D selection techniques do not translate directly to 3D spatial visualization. The level of precision needed to make useful 3D selections for scientific or medical analysis tasks is one factor. Another factor is that spatial data are often volumetric, without clearly defined or discrete objects or structure; this makes it difficult to apply 3D object selection techniques that commonly rely on 3D intersection tests.
Annotation does not appear by name in Keefe and Isenberg's 3D visualization taxonomy [KI13] but is mentioned in general VIS tasks [BM13]. Depending on the implementation, it may include or require 3D picking or selection of data subsets for further analysis. For that reason, we grouped it here with 3D selection, even though it requires an additional input (which is often categorized as System Control or Symbolic Input). The need to integrate annotation into visualization systems has been highlighted by many different researchers in the literature [SBM92,HPRB96]. Springmeyer et al. state, 'while images may be the goal of visualization, insight is the goal of the analysis' [SBM92]. Annotation is essential to sharing these insights. Scientists use annotation to keep track of their own findings and points of interest or to easily share findings with collaborators or lay people. Providing a good context-aware annotation system fosters knowledge-sharing, teaching, and remote collaboration. Annotation can take the form of textual notes, drawings, voice recordings, and other records input by users. In the research context, the contextual information needed to place annotations within the context of the data is typically also included [HPRB96]. Thus, supporting interactive 3D annotation for visualization means that users must be able to record insights and other information within the spatial context provided by the 3D data. Automated positioning algorithms can assist with this challenge (e.g. [PHTP*10]), but defining the proper interface for annotating 3D visualizations remains a major challenge. Indeed, annotation within virtual environments, even outside of the visualization context, is a longstanding topic of research that continues to be actively studied today [AS95, BHMN95, MBJS97, CL17, CG17, PMMG17].

Survey of the State of the Art
Now that the major spatial interaction paradigms (Axis 1) and visualization tasks (Axis 2) are defined, this section presents a survey of the state of the art of spatial interaction for visualization organized according to these two axes. To find relevant papers, we followed a semi-systematic approach. We used Google Scholar to find papers with specific keywords (e.g. '3D visualization', 'spatial interaction', '3D interaction'). Once we found a relevant paper, we followed the trail of citations: we looked at the references in that specific paper and the papers citing that specific paper. We also included papers suggested by reviewers of our manuscript. Finally, we classified all of the papers using the two axes. Figure 1 provides an overview of the entire collection of papers. The four major sections below correspond to the four interaction paradigms of Axis 1 and, within each section, we further divide the discussion into three subsections to correspond to the three task groups of Axis 2.

Visualization with tactile and pen-based interaction paradigms
This first group of techniques covers approaches that provide spatial input directly on a screen surface, via touch or pen input.

Volumetric view and object manipulation tasks with tactile input
3D object manipulation on tactile screens has been widely researched in general (e.g. [Han97, LKM*17, HCC07, RDH09, LAFT12, JH13, LWS*13, PBD*16, KKKF18]). Researchers have also explored 3D user interfaces for touch-based control using spherical or cubic screens (e.g. [GWB04,dlRKOD08]). However, none of these approaches address tasks that are specific to 3D spatial data visualization, such as ways to see through the data with cutting planes, or axis-aligned manipulations. One of the key design decisions in implementing tactile manipulations of 3D content is whether to control all DOFs simultaneously (e.g. [RDH09,LAFT12]) or to separate them using constraints or some other method. The trade-offs have been discussed in the non-visualization-specific literature (e.g. [ZS97a, ZS97b, VCB09, MCG10]), but some researchers note a special benefit to separating DOFs in visualization-specific cases [Ise16, CML*12]. In the remainder of this section, we limit the discussion to tactile interactions that have been designed explicitly with visualization purposes in mind.
The most common tactile 3D manipulation techniques from non-visualization applications have also been used for data visualization. For instance, Lundström et al. [LRF*11] implemented a 3D RST technique (one finger for x-/y-rotations; two fingers for z-rotations, panning and zooming) for medical data visualization. To provide axis-constrained cutting plane manipulations, they added GUI-based pucks. The 3D RST technique was estimated to be the most widely implemented technique for manipulating and visualizing 3D data in software for mobile devices in 2016 [BIAI17b].
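As a rough illustration of the two-finger part of an RST mapping, the following sketch derives a z-rotation angle, a zoom factor, and a pan offset from the start and current positions of two touch points. The function name `two_finger_rst` and its tuple conventions are assumptions made for this example; they are not taken from [LRF*11] or from any particular toolkit.

```python
import math

def two_finger_rst(p0, p1, q0, q1):
    """Derive rotation, scale, and translation from a two-finger drag.

    p0, p1: start positions (x, y) of the first and second finger.
    q0, q1: current positions (x, y) of the same two fingers.
    Returns (angle_rad, scale, (dx, dy)).
    """
    # Vector from the first to the second finger, before and after the drag.
    vx0, vy0 = p1[0] - p0[0], p1[1] - p0[1]
    vx1, vy1 = q1[0] - q0[0], q1[1] - q0[1]
    d0 = math.hypot(vx0, vy0)
    d1 = math.hypot(vx1, vy1)
    if d0 == 0.0:
        raise ValueError("start touch points must be distinct")

    # Rotation about the view (z) axis: change in direction of that vector,
    # wrapped to the interval (-pi, pi].
    angle = math.atan2(vy1, vx1) - math.atan2(vy0, vx0)
    angle = math.atan2(math.sin(angle), math.cos(angle))

    # Zoom: ratio of the finger distances.
    scale = d1 / d0

    # Pan: displacement of the midpoint between the two fingers.
    dx = (q0[0] + q1[0]) / 2.0 - (p0[0] + p1[0]) / 2.0
    dy = (q0[1] + q1[1]) / 2.0 - (p0[1] + p1[1]) / 2.0
    return angle, scale, (dx, dy)
```

In a complete interface, a single-finger drag would be handled separately (for example, by a trackball rotation about the x- and y-axes), and the three values returned here would be applied to the view transform each frame.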
Tactile 3D manipulation techniques have also been designed, from the start, specifically to address the needs of visualizations [Ise16]. Au et al. [ATF12], for example, proposed to use multi-touch gestures on a large display for camera control, object selection, uniform scaling, axis-constrained rotation and translation (two-finger gestures on a specific axis), and object duplication (three-finger gestures). They compared their approach to a traditional widget-based interface and concluded that a gesture-based approach can be just as efficient. One limitation of this approach is that users must discover and learn the set of tactile gestures before they can be used.
To overcome the discoverability issue, Yu et al. [YSI*10] developed FI3D. The FI3D widget surrounds the data visualization like a rectangular frame, and each edge of the frame is used to activate a different 3D manipulation. Translations along the x-/y-axes are initiated with a single-finger interaction in the central space. Arcball (x-/y-) rotations are initiated with a single-finger touch on the frame and a drag into the central visualization region. Touching the frame with a second finger during this interaction constrains the rotation to a single axis (depending on the frame). Rotations about the z-axis are controlled by dragging a single finger along a frame (as opposed to perpendicular to it). Widgets in the corners of the frame activate zooming operations, and two additional horizontal bars along the top and bottom of the frame provide z-translations. Yu et al. also mention that the mapping could be changed to adapt to other datasets which might require different manipulations based on their inherent properties, as exemplified in the implementation of FI3D for the exploration of fluid flow data [KGP*12].
The principle of widget-controlled interaction was also used by Cohé et al. [CDH11], who developed tBox (see Figure 2a) to provide users with easy control over 9 DOF based on the context set by the location of their touches and the number of fingers used. The technique can easily be applied to 3D data views and relies on a cube-shaped widget overlaid on the scene. The widget contains multiple interaction zones and is oriented to match the orientation of the scene being viewed. One-finger manipulations along an edge of the cube translate along the parallel axis. One-finger manipulations on the sides of the cube control single-axis rotations. Scaling is controlled by pinching the cube: a pinch on the sides of the cube produces uniform scaling, while a pinch on opposite edges initiates non-uniform scaling. In another study [LODI16], the tBox technique was found to increase the feeling of precision for 3D interaction.
As noted by Yu et al. [YSI*10], dataset-specific interaction techniques are sometimes needed. Fu et al. [FGN10] present an example, combining trackball rotations with a custom 'powers-of-ten-ladder' (see Figure 2b). The technique facilitates exploration of astronomical datasets, which require rotations and scaling operations that span large magnitudes (translations are less useful in this scenario). Arcball rotations are controlled using a single finger, panning operations are controlled with five-finger gestures, and zooming operations are controlled using a bimanual two-finger pinch. Two-finger inputs activate the ladder widget, where each region corresponds to a power of ten zoom level. Fu et al. created this technique to alleviate the strain on users' hands when performing large zooming operations in astronomical datasets. While Fu et al. extend more traditional tactile interaction to support the special needs of astronomical spatial data, Kim et al. designed a tactile interface to support the special needs of navigating through and comparing spatial datasets that change over time [KJK*15]. The approach, applied to historical architectural reconstructions across different time periods, combines a timeline widget, multi-layer map-based navigation, and immersive visualization with staged, animated transitions between datasets. Other dataset-specific, or dataset-inspired, interfaces include the work by Sultanum et al. [SVBCS13], which addressed the challenge of navigating within geological outcrops via a two-step technique; users first indicate a navigation surface onto which the camera will be constrained, and touch gestures are then used to tilt, zoom, or pan the camera with respect to the x-/y axes.
Finally, some tactile interactions for visualization take the approach of augmenting tactile 2D input with additional inputs. This has most frequently been done on tabletops and in hybrid virtual environments (e.g. [BI05, SAL06, HIW*09, MJGJ11]) with hand tracking to augment touch input. Jackson et al. [JSK12] applied this concept to 3D data visualization, using the posture of the hand above a 3D stereoscopic table to allow users to tilt, bend or twist datasets within the 3D space (see Figure 2c). Song et al. [SYG*16] also augmented touch input with hand-posture sensing to help manipulation and exploration of 3D visualizations. They distinguish between the left and right hands, the thumb and other fingers, and hand tilting versus finger movement to provide methods for manipulating 3D data and cutting planes. Several of these techniques rely upon gestures with more than two fingers or screen-space widgets that are appropriate for large displays but may not translate well to smaller, mobile displays. For smaller displays, pressure has been used to augment tactile interaction, in particular to separate DOFs when manipulating 3D objects [WBAI17,WBAI19]. In this work, a combination of light and hard touches with one or two fingers was used to independently manipulate translation and rotation along the x- and y-axes or the z-axis. Panchaphongsaphak et al. [PBR07] also use pressure-augmented touch, but for the purpose of orienting and translating a cutting plane within medical data. Pressure beyond a given threshold was used to translate the slicing plane in the direction of its normal.
Several other tactile input techniques have been developed for manipulating cutting planes. For example, Song et al. [SGF*11] enabled users to move cutting planes with one- and two-finger motions on a mobile phone. Klein et al. [KGP*12] used a three-finger technique to control a cutting plane within a FI3D widget: two fingers on the cutting plane specified a rotation axis, and a third finger somewhere else in the data view specified the amount of rotation. Alternatively, by moving the third finger along one of the FI3D frames, the cutting plane was translated in the direction of its normal. Sultanum et al.'s [SSSS11] splitting and peeling techniques also relate to the use of cutting planes when the cutting operations are constrained to the data's axes. The tactile input is used to either separate the data into two sub-parts or perform a local distortion that helps geologists explore the data's spatial structure. Recently, Sousa et al. [SMP*17] used a VR setup and touch sensing on a table with gesture-based control of cutting planes to enable radiologists to explore 3D data. By placing the touch surface on the desk before the users and, thus, explicitly separating the 2D display from the stereoscopic 3D data display, Sousa et al. avoid the disconnect between 2D surface input and 3D graphical displays cited as a concern by other researchers in previous work [VSB*10, SHSK08, VSBH11].
To summarize, a myriad of touch techniques and platforms have been explored to support volumetric view and object manipulations. Overall, tactile input has been shown to be useful for 3D visualization, especially when combined with axis-constrained interaction [BKBN12,Ise16]. Researchers have adapted tactile interactions for visualization to different computing platforms (e.g. small displays), in part, by augmenting touch with additional inputs, such as pressure [WBAI19]. Researchers have also shown the utility of dataset-specific tactile interfaces (e.g. [FGN10,SVBCS13]). While the work in this area covers a broad range of topics, the community has yet to establish platform- or dataset-specific interface guidelines or standards that might help developers to follow best practices for 3D visualization with touch input [BIAI17b].

Visualization widget tasks with tactile input
Interactive seed point selection and manipulation is an important task for 3D visualization, especially for fluid flow data. Particle tracing based on these seed points helps researchers understand the motion of the fluid, and is one of the most common 3D flow visualization strategies (as explained previously in Section 2.2.2).
Using touch input and a dedicated widget, Butkiewicz and Ware [BW11], for example, facilitate the seeding of particles at various depths to explore ocean currents (see Figure 3a). Their setup is quite unique: they combine a stereoscopic screen that displays the 3D data with touch input, a setup which usually creates problems [SHSK08,VSBH11]. In their specific case, however, they place the physical touch surface (stereoscopic display) at an angle and render the data such that it is displayed at a similar angle, with the ocean surface coinciding with the physical touch surface. Butkiewicz and Ware then use data exploration widgets called 'dye poles' placed at the surface, with controls to create and manipulate seed point placement at varying ocean depths. Other widgets can be used to specify points or paths in 3D space, e.g. using Butkiewicz   display, but treat the cutting plane that they place in the projected view of a generic flow dataset as a proxy to specify 3D locations for seed point placement. This is combined with an unprojected view of the same cutting plane that acts as a widget. Using this widget, users can place particles around a small region (single-finger input), along a line embedded into the cutting plane (two-finger input), or around a larger circular volume (input from three or more fingers). In addition to particle seeding, they also make it possible for users to place drilling cores as columns oriented perpendicular to the cutting plane for data read-out. Coffey et al. [CML*12] also use a stereoscopic data projection, but in contrast to the two previously described techniques, they separate the stereoscopic (and vertical) data display from a monoscopic horizontal touch-sensitive surface, which is used for input. Their SliceWIM technique reinterprets the classic VR World-in-Miniature (WIM) interface technique to apply to volumetric data. 
Touch input is used to manipulate the WIM widget, which includes features for controlling slicing planes and selecting flow lines that pass through these planes as well as defining 3D points and curves relative to the volume data. The ability to touch with many fingers simultaneously enables users to specify and rapidly adjust complex selection shapes on the slicing planes and the linked 3D visualization displays the results in real time.
To summarize, researchers have used touch input to control visualization widgets in a variety of ways, introducing creative solutions to manipulate 3D contexts through this type of 2D input, and provide features impossible to implement using single-cursor techniques.

3D data selection and annotation tasks with tactile input
In addition to view changes and data object manipulation, one of the most essential tasks in visualization is data selection and picking. While these tasks can be achieved with established techniques such as ray casting (we review some of the main selection metaphors in Section 2.2.3) for datasets that consist of explicit objects, additional techniques are needed for continuous data, such as volumetric scalar fields, particle clouds or flow fields. Tactile input selection techniques often mirror selection techniques developed for more traditional input modalities [Wil96b, AA09, AA13, Ban14], but researchers have shown that the more direct style of control often possible with tactile input, as compared to mouse input, benefits visualization [BIRW19]. To provide direct manipulation of 3D content, such interactions are often designed to be view-dependent and possibly structure-aware, to help users specify depth.
For picking in volumetric data, Wiebel et al. [WVFH12], for example, introduced WYSIWYP, a technique that can easily be applied in a tactile input context. Given a selected 2D point on the film plane, they analyse the corresponding view ray passing through the volume data, take the current transfer function into account, and select the largest jump in accumulated opacity. This typically denotes a feature that is locally visually dominant. The picking technique is thus view-dependent and structure-aware. Shen et al. [SLC*15] later described a variation of WYSIWYP which computes a saliency measure and picks the 3D point accordingly. Yet, picking single 3D points is often insufficient for preparing for further data analysis; in such cases, users have to be able to specify spatial subsets of the 3D data.
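The idea of selecting the largest jump in accumulated opacity along a view ray can be sketched in a few lines. The code below is a simplified illustration and not the algorithm as published in [WVFH12]: it assumes the ray has already been sampled and passed through the transfer function, so that `alphas` holds one opacity value per sample, and it treats a "jump" as a maximal run of consecutive samples that each add more than `eps` to the accumulated opacity. The function name and the `eps` parameter are hypothetical.

```python
def pick_along_ray(alphas, eps=1e-3):
    """Return the index of the sample where the largest opacity jump begins.

    alphas: per-sample opacities in [0, 1], ordered front to back along the ray.
    eps:    minimum per-sample contribution that counts as part of a jump.
    Returns None if no sample contributes more than eps.
    """
    acc = 0.0                        # accumulated opacity so far
    best_start, best_jump = None, 0.0
    run_start, run_jump = None, 0.0  # current run of contributing samples

    for i, a in enumerate(alphas):
        gain = (1.0 - acc) * a       # front-to-back compositing increment
        acc += gain
        if gain > eps:
            if run_start is None:
                run_start, run_jump = i, 0.0
            run_jump += gain
        else:
            # The current run (if any) ends here; keep it if it is the largest.
            if run_start is not None and run_jump > best_jump:
                best_start, best_jump = run_start, run_jump
            run_start = None

    # Handle a run that extends to the last sample.
    if run_start is not None and run_jump > best_jump:
        best_start, best_jump = run_start, run_jump
    return best_start
```

Because opacity is accumulated front to back, a structure hidden behind an opaque one contributes little and is unlikely to be picked, which is what makes this kind of picking view-dependent.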
Structure-aware selection techniques that support selecting subvolumes of interest were pioneered by Owada et al. [ONI05]. Their Volume Catcher relies on a user-drawn stroke on the visible contour of a subset of the volumetric data, which Owada et al. then use to segment the underlying data to return the intended volume of interest. Inspired by this technique, Yu et al. [YEII12] presented CloudLasso, which used a user-drawn 2D lasso shape, extended it as a generalized cylinder into 3D space, and then used kernel density estimation to select the subset within the cylinder whose scalar property surpassed a given threshold (Figure 3b). This approach had the added benefit that the threshold could be adjusted after the lasso had been drawn, which enables users to adjust their selection. Shan et al. [SXL*14] presented a further extension, which makes it possible to select only the largest connected component of the data rather than all components within the generalized cylinder, arguing that this is likely to better match the user's intent. Finally, to make it possible for users to better control which connected component is finally selected, Yu et al. [YEII16] later extended their work and introduced three CAST techniques, two of which used the shape of the drawn lasso to control the single component to select, while the third technique, named PointCAST, only relied on a single 2D input point to specify a 3D region of interest.
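A lasso-plus-density selection of the kind described above can be sketched as follows. This is a deliberately simplified illustration under several assumptions that are not taken from [YEII12]: the view is orthographic along the z-axis (so projecting a particle means dropping its z coordinate), density is estimated with an unnormalised Gaussian kernel directly at the particle positions rather than on a grid, and all function names and the bandwidth parameter `h` are hypothetical.

```python
import math

def inside_polygon(pt, poly):
    """Even-odd ray-casting test of a 2D point against a closed polygon."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does this edge cross the horizontal line through pt?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def density(p, particles, h):
    """Unnormalised Gaussian kernel density estimate at p (bandwidth h)."""
    total = 0.0
    for q in particles:
        d2 = sum((a - b) ** 2 for a, b in zip(p, q))
        total += math.exp(-d2 / (2.0 * h * h))
    return total

def lasso_select(particles, lasso, threshold, h=1.0):
    """Indices of particles inside the extruded lasso with density >= threshold.

    particles: list of (x, y, z) positions.
    lasso:     list of (x, y) screen-space vertices of the drawn lasso.
    """
    selected = []
    for i, p in enumerate(particles):
        in_cylinder = inside_polygon((p[0], p[1]), lasso)
        if in_cylinder and density(p, particles, h) >= threshold:
            selected.append(i)
    return selected
```

Since the lasso test and the density test are independent, the threshold can be changed after the lasso has been drawn by calling `lasso_select` again with the same lasso, which mirrors the post-hoc adjustment mentioned in the text.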
Selection techniques for other 3D spatial data have been explored. For line data, Akers [Ake06], e.g. described the CINCH fibretract pen-based selection technique, which uses sketched 2D paths to lines that pass through a lasso shape, but instead of sketching the lasso, the lasso is defined as the convex hull that surrounds the fingers touching the visualization, making it possible to rapidly change the shape of the selection in real time.
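The convex-hull lasso mentioned above can be illustrated with a short sketch. It builds the hull of the current touch points with Andrew's monotone chain algorithm and then tests 2D points against it. How the surveyed system decides whether a 3D line "passes through" the lasso is not specified here, so the example stops at the 2D containment test; the function names are hypothetical.

```python
def convex_hull(points):
    """Convex hull of 2D points (Andrew's monotone chain).

    Returns the hull vertices in counter-clockwise order.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a left turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]

def inside_hull(pt, hull):
    """True if pt lies inside or on a counter-clockwise convex polygon."""
    if len(hull) < 3:
        return False
    for i in range(len(hull)):
        a, b = hull[i], hull[(i + 1) % len(hull)]
        # pt must not lie to the right of any edge.
        if (b[0] - a[0]) * (pt[1] - a[1]) - (b[1] - a[1]) * (pt[0] - a[0]) < 0:
            return False
    return True
```

Recomputing the hull whenever a finger moves, lands, or lifts is cheap for a handful of touch points, which is consistent with the real-time reshaping of the selection described in the text.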
Although not as precise as pen input, touch input is also well suited for annotating data visualizations through writing and sketching, in particular for supporting collaborative data exploration. For example, Song et al. [SGF*11] make it possible for users to annotate 3D medical data on a mobile device (see Figure 3c). The annotation was created by drawing on the cutting plane shown on the mobile device, which then updates a larger, linked medical data visualization. This combination of a small mobile display with a static larger display also facilitates several hybrid techniques which we describe later in Section 3.4. Ohnishi et al. [OKKT12], in contrast, facilitate the annotation of 3D objects using a tablet placed statically on a table, but again visualize the main data on an additional large vertical display. Users annotate the data by drawing on flattened 3D surfaces displayed on the tablet. Sultanum et al. [SVBCS13], in contrast, use a single, combined display and tactile input device. With their system, users can annotate 3D surfaces of geological outcrops by projecting touches onto the displayed surface.
Using 2D tactile input for selection in 3D visualization seems challenging given the loss of one DOF, but research has shown that this challenge can be overcome. Solutions often involve interpreting input relative to data values or features, using multiple selection steps, or combining selection with other tasks and widgets, for example, specifying a 3D selection via interaction on a 2D cutting plane. 3D data selection and annotation is clearly feasible with tactile input, and could have advantages over alternatives when considering the ease of sketching and writing and the importance of these traditional styles of input for annotation.

Visualization with tangible and haptic paradigms
This second group of techniques works with input that relies on additional sensing and/or feedback that relates to our haptic sense.

Volumetric view and object manipulation tasks with tangible input
Interactions via tangible props, proxies, and devices are appealing because they tend to mimic the way we have learned to work in the real, physical world [Fit96,IU97]. Consequently, many tangible visualization interfaces provide full 6-DOF tracking and input. One of the first systems was from Hinckley et al. [HPGK94], who designed passive props for neurosurgeons to manipulate and inspect their data using cutting planes. In addition to laying out the requirements for and use of tangible props for scientific visualization, Kruszynski and van Liere [KL09] proposed to use a printed tangible prop that physically visualizes the data (see Figure 4a). In this way, the props can act as a physical world-in-miniature, with any manipulations of the props in the 3D physical space being reproduced in the virtual world visualized on a large stereoscopic display. Couture et al.'s [CRR08] GeoTUI makes use of tangible props within a tabletop visualization of geo-data; the authors compared their tangible interface to a more traditional mouse-based alternative. The props were used to indicate slicing planes, and three alternative props were compared (a 1-puck prop, a 2-puck prop, and a ruler). They found that the ruler was the most appropriate input device for the geophysicists. Rick et al. [RvKC*11] used a spatially tracked prop in a CAVE to facilitate visualization of probabilistic fibre tracts. The prop supported 3D data manipulation and a virtual slicing-cone interaction with a flashlight metaphor. They also provided ways for the users to constrain the slicing plane to specific axes.
Picking up on the importance of constrained manipulation for data visualization that we mentioned in previous sections, other researchers have also combined tangible interaction with constraints. Bonanni et al.'s Handsaw [BAC*08] prototype made it possible to obtain slices of the data by interacting with hand-held objects (such as a laser). Despite the physical ability to move the hand-held object in any direction, the virtual slices were restricted to move only along a normal direction. Spindler and Dachselt [SSD09] make the slicing plane itself physical by supporting interaction with a tracked, physical, paper-like prop (called PaperLens). Their hardware includes a 2D tabletop augmented with a projector and sensors. Multiple interactions are possible and are visualized by projecting imagery directly onto the paper. For example, users can select which layer of multi-layered data to view simply by changing the height of the paper with respect to the table. This constitutes another interesting example of constraining tangible interaction (or at least the interpretation of the users' interaction) rather than treating the 6-DOF manipulation of tangible objects quite so literally.
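One simple way to constrain free 6-DOF hand motion so that a slice moves only along its normal is to project the hand displacement onto that normal and discard the rest. The sketch below illustrates this idea; it is an assumption about how such a constraint could be realized, not a description of how Handsaw or PaperLens are implemented, and the function name is hypothetical.

```python
import math

def constrain_to_normal(plane_point, normal, hand_delta):
    """Move a slicing plane only along its normal.

    plane_point: (x, y, z) point on the current plane.
    normal:      (x, y, z) plane normal (need not be unit length).
    hand_delta:  (x, y, z) unconstrained displacement of the tracked prop.
    Returns the new point on the plane.
    """
    length = math.sqrt(sum(c * c for c in normal))
    if length == 0.0:
        raise ValueError("normal must be non-zero")
    n = [c / length for c in normal]

    # Signed distance travelled along the normal; sideways motion is ignored.
    t = sum(d * c for d, c in zip(hand_delta, n))
    return tuple(p + t * c for p, c in zip(plane_point, n))
```

The same projection can serve a layer-selection interaction: the signed distance `t` (for example, the height of a prop above a table) can be quantized to choose among discrete data layers.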
Another interesting use of tangible interaction for visualization is to use multiple tangible objects to represent different portions of the data. For example, the tangible system developed by Reuter et al. [RCR08] used props to help archaeologists virtually reassemble fractured artefacts, like a 3D puzzle. Following a similar motivation, Khadka et al. [KMB18] use hollow tangible props worn around the wrist to represent individual slices or fields of data. Users can add or remove these from the visualization by manipulating the props.
Interaction using generic tracked VR controllers, AR markers, and the like can also be viewed as a form of tangible interaction, as the shape of the controllers or the surface the markers are printed on conveys some tangible information, even if it is not dataset- or task-specific. For interaction in AR, Tawara and Ono [TO10] relied on a simple visual marker to enable users to manipulate medical data with 6 DOF (see Figure 4b). In a desktop AR context, markers have also been used as metaphors for cutting planes to provide arbitrary slicing position and orientation of volumetric datasets, such as tomographies. Moving beyond a flat marker, while still acting as a generic prop, Chakraborty et al. [CGM*14] used a physical wireframe cube prop in AR for 3D manipulation of chemistry data. The cube is used as a container for the visualized dataset. Issartel et al. [IGA14b] used a cuboctahedron to manipulate fluid dynamic data with 6 DOF in AR, also proposing different slicing techniques for use with hand-held AR visualization. The manipulated cuboctahedron is covered with markers and tracked with a tablet's camera. Their approach enables slicing through the data by treating the tablet as a cutting plane or by using an optically-tracked stylus. Interaction with generic VR controllers is also common, and the research includes techniques for simultaneously manipulating views of multiple volumetric datasets or 3D scenes in order to support comparative visualization. Bento Box [JOR*19] accomplishes this via a bimanual interface for quickly selecting and arranging sub-volumes of interest in a grid. Another approach, Worlds-in-Wedges [NMT*19], accomplishes a similar task by combining a custom world-in-miniature interface with a pie-slice view of several worlds at once. In both cases, generic VR controllers provide 6 DOF pointing and grabbing inputs that are interpreted relative to the data.
As mentioned in Section 2.2.1, non-planar slicing of volumetric data is often useful, and one of the interesting ways to achieve this using tangible interaction is with a pile of modular blocks, sand, or clay [PRI02, RWP*04, Lue13,LFOI15]. The data slice can be projected directly onto the material, and, optionally, an extra monitor can be used to provide a contextual visualization. This concept has been applied to landscape models [PRI02] and biological, seismic, and air temperature simulations [RWP*04]. It is also possible to implement a similar approach using optical see-through displays [LFO*13].
In summary, the research on tangible interaction for visualization demonstrates how physical props may be used as intuitive proxies for manipulating data and slicing planes and how constraining the interaction (not utilizing all 6-DOF simultaneously) can often be useful. Additionally, some of the most creative work in this area involves concurrent manipulation of multiple tangible objects or even piles of sand; these provide a decidedly different and potentially useful means of interacting with spatial 3D data.

Visualization widget tasks with tangible input
Tangible interaction can also be very helpful to specify and manipulate visualization widgets, for example, virtual probes, which are often controlled with a handheld stylus or controller. De Haan et al. [dHKP02] use a tracked stylus in head-worn VR to read specific data point values. Kruszynski and van Liere [KL09] use a stylus together with a 3D printed physical visualization to interactively select and measure data properties (data read-outs) of marine coral (see Figure 4a). The data and results are visualized on a large stereoscopic screen. Following a similar strategy of using one tangible prop for the data and one tangible prop to specify a 3D point, Issartel et al. [IGA14a] employ a stylus to generate particle seeds within volumetric data (see Figure 4c). The stylus and the dataset prop are tracked using visual markers, and a see-through tablet is used to provide Augmented Reality. Because the data are represented by a physical volume, the seed point origin must be offset from the stylus tip so that points can be placed inside the volume, but users still benefit from the tangible aspects and can push a button on the stylus to start emitting particles from the point of origin. A similar approach is used by Tawara and Ono [TO10], who make use of a Wiimote augmented with visual markers to provide a seeding point origin visualized in AR with a head-mounted display. Similar to Issartel et al.'s approach, the seeding point is not located directly on the Wiimote, though in this case this is not because of physical limitations, as the data are simply manipulated through a flat 2D marker. Finally, in the context of augmented reality visualization for structural design, Prieto et al. [PSZ*12] use a specially designed tool to input the 3D locations where pressure will be applied to a structure in order to visualize its deformation.
Virtual probes are most tangible when implemented using active haptic devices. For example, direct haptic interaction with volumetric data was demonstrated by Lundin et al. [LPYG02]. They avoid using explicit geometry, while maintaining stable haptic feedback, by using proxies. This makes it possible to represent various data attributes and manipulate the orientation of visualized data based on additional attributes and channels. Their work was later extended [LPGY05, LPCP*07] to define haptic primitives for volume exploration, such as lines, planes, and attractive forces based on data attributes. Similarly, van Reimersdahl et al. [vRBKB03] present haptic rendering techniques for interactive exploration of computational fluid dynamics data, such as scalar and vector fields, that promote an intuitive understanding of the data. Direct haptic interaction has also been used to simulate palpation in medical simulators to assist in medical training procedures [UK12]. Finally, Prouzeau et al. [PCR*19] used haptic-augmented VR controllers to explore the density of 3D scatterplots and manipulate cutting planes.
Haptic feedback can also help to guide the placement of probes. [...] stereotactic gamma knife brain surgery by using haptic feedback to convey dose distribution in a brain tumor and guide placement of gamma ray 'shots.' Related to surgery planning, Reitinger et al. [RSBB06] used jug and ruler widgets to provide volumetric and distance calculations and assist medical staff in their diagnosis and treatment planning.
Figure 5: [...] (b) two tangible props used to identify tracts of interest in brain data [GJL10] (reprinted from the publication with permission by Springer), and (c) selection in dense 3D-line datasets with a haptic-augmented tool [JCK12a] (image © The Eurographics Association, used with permission).
Beyond virtual probes, 3D magic lenses (also called magic boxes) are another form of visualization widget that can be controlled with tangible input. For instance, Fuhrmann and Gröller [FG98] used a tracked pen to place a 3D magic lens that provides a more focused view of the data or constrains streamlines. The cutting planes discussed in previous sections are also interactive visualization widgets; we chose to group them with volumetric view manipulation tasks, but they can be thought of as fitting here as well.
Tangible interaction is used routinely to place and manipulate 3D visualization widgets like virtual probes and magic lenses. These interactions can be successfully guided and/or convey additional data back to the user when they are coupled with active haptics.

3D data selection and annotation tasks with tangible input
Tangible interaction can be particularly useful for the problem of 3D data selection within volumetric data. Indeed, specific devices can be used and tracked in order to allow users to specify the 3D bounds of a subset of the data. Taking advantage of this, research projects have focused on designing and testing specific hardware for this task. For instance, Harders et al. [HWS02] use 3D haptic force feedback to facilitate the segmentation of linear structures. Similarly, Malmberg et al. [MVN06] use a haptic device and stereoscopic rendering to allow users to draw 3D curves based on the 2D live-wire method (see Figure 5a). This idea was improved with Spotlight [THA10], which adds visual guidance to improve the quality of the segmentation. A similar setup is used by Nyström et al. [NMVB09]. Gomez et al. [GJL10] propose to facilitate selections with two tracked props, a pen-like probe to brush in a 3D volume and a cube to manipulate the data (see Figure 5b). Their technique allows users to select tracts in a DTI fibre tract dataset. De Haan et al. [dHKP02] proposed to combine a tracked stylus and a tracked transparent acrylic plane to facilitate 3D selections of regions of interest in head-worn VR. The position of the plane is used to specify the extents of a selection box while the stylus is used to specify a point of origin. Jackson et al. [JLS*13] use a rolled piece of paper as a tangible prop to facilitate selection of thin fibre structures and manipulate views of the data. Schkolne et al. [SIS04] use custom tangible devices to interact with and select DNA parts in an immersive VR environment with a headset. In particular, they use the metaphor of a raygun to select distant parts without having to physically move to these parts. Pahud et al. [POR*18] imagined that a spatially-aware mobile device could be used as the origin of a projection of different selection shapes onto a 3D volume to provide a volume selection mechanism. Finally, based on the haptic-aided drawing on air technique [KZL07], Keefe developed a free-form 3D lasso selection technique that can be used in fishtank VR environments [KZL08,Kee08].
In addition to guiding 3D drawing, haptic devices can use data-driven feedback to further assist with making accurate 3D selections, helping to overcome the challenges of occlusion and cluttering. Zhou et al. [ZCL08] use a Phantom force feedback device with stereoscopic glasses to draw 2D lassos that are then connected to select DTI fibre tracts. Jackson et al. [JCK12b] introduced Force Brushes, which uses progressive data-driven haptics provided by a Phantom to select subsets of 3D lines in dense datasets.
To facilitate 3D data annotation, tangible interfaces have also been used as tracked note-taking devices/screens to first specify the 3D position and then input annotations. One of the first prototypes to provide annotation through a spatially tracked device is the Virtual Notepad [PTW98]. Users could navigate in their 3D environment by walking and annotate specific places within the virtual scene. Cassinelli and Ishikawa [CI09] also use a tracked screen but with cutting planes of medical data; once fixed by activating a clutching mechanism, data are annotated on the screen at the position of the slice. Song et al. [SGF*11] use a similar approach, combining an iPod Touch and a large vertical display.
Other approaches enable note taking through the manipulation of the device itself and support additional visual annotations beyond text that can be used to add visual notes, showing connections in graphs, etc. For example, Kukimoto et al. [KNEK05] use a tracked PDA in a collaborative VR environment to capture annotations. A single button press activates the note taking that is done by moving the PDA in the 3D space. Similar to this, Benko and Wilson [BW10] use an infra-red laser-pointer with dome-based visualizations of 3D graphs or astronomical data. In their system, a presenter can use the laser to write annotations directly on the dome surface, which are captured by a camera and integrated into the computer graphics renderings shown during public presentations.
In Scientific Sketching [KAM*08], 3D sketches created with tracked 3D paintbrushes or generic VR controllers are used not just to annotate 3D datasets but also as a visualization design tool. The system aims at involving artists and other visual experts in the task of designing the most effective uses of color, texture, form, and metaphor for multivariate VR visualizations.
Tangible devices provide some valuable affordances for 3D selection and annotation tasks, including a natural support for writing and sketching (in both 2D and 3D). The research also points to strong potential to augment these capabilities with active haptics that help to guide the interactions, helping to reduce the natural hand jitter that is sometimes a problem with mid-air interfaces and/or to increase the precision of 3D inputs relative to the underlying data.

Visualization with mid-air gestural interaction paradigms
Next we discuss techniques that capture input in mid air.

Volumetric view and object manipulation tasks with mid-air input
Similar to tangible interaction, mid-air gestural interaction is motivated by the potential to provide natural 3D manipulations to users, though these are arguably less like real-world manipulations, since users cannot actually hold the object they are manipulating. A pioneering work in mid-air gestural interaction is the responsive workbench [CFH97] which provided, in a tabletop VR environment, mid-air gestural interaction to rotate and translate data and cutting planes, specify axis- or plane-constrained manipulations, and pick specific artefacts. The hand gestures were recognized with the help of worn gloves. The potential of mid-air gestural interaction has been demonstrated by Kirmizibayrak et al. [KRW*11] who compared bimanual mid-air gestural interaction tracked with a Kinect and mouse-control of 3D medical data in two different experiments. The first study consisted of an orientation-matching task for which gestural interaction showed strong evidence for outperforming mouse input. The second study focused on slicing techniques for which their data support that mouse control was more accurate but slower than mid-air gestural interaction. Similarly, Theart et al. [TLN17] compared several interaction modalities in VR and suggest that a Leap-motion-based hand tracking system for microscopic data analysis is a good tool for data scaling and rotating.
Ruppert et al. [RRA*12] proposed two different prototypes for 3D rotation, scaling, slicing, and contrast adjustment (which is not directly linked to 3D data manipulation but rather to system control). They successfully implemented their system, which relies on single-hand manipulation in the Operating Room, and tested it during real procedures (see Figure 6a). Also in the Operating Room, Gallo et al. [GPC11,Gal13] focused on the possibility to provide many different 3D visualization tasks through bimanual gestures tracked by a Kinect (see Figure 6b). The visualized data is visible on a traditional 2D display. Since they implemented many different interaction techniques, they defined translations as two-handed (palm facing forward) concurrent manipulations. Zooming is achieved through the same posture but by moving both hands closer or further away. Rotations are also achieved with both hands, but with clenched fists. Similarly, Lubos et al. [LBLS14] proposed a set of bimanual gestures to support 3D manipulation of point clouds for users wearing HMDs (see Figure 6c). Laha et al. [LB13] proposed Volume Cracker, which consists of bimanual gestures to crack open a dataset to explore its internal structure. The authors aimed to replace traditional slicing plane manipulations with this approach. Users have to close both hands to crack the visualized volume in two and then can manipulate each individual cracked part or iteratively crack them into smaller subsets. While domain experts are more familiar with traditional axis-aligned slicing plane techniques, Volume Cracker illustrates the potential mid-air gestural interaction techniques have for helping us to rethink the way we analyse data. This work can help us develop new metaphors that could be more efficient or insightful once mastered.
While mid-air gestural interaction leads naturally to unconstrained, gestural interactions, several techniques have specifically addressed the need to provide axis-constrained interaction for visualization. Malkawi et al. [MS05] proposed a set of gestures, tracked with a glove, to provide constrained zooming operations and constrained translations of isoplanes in 3D data rendered with a HMD (see Figure 7a). Bonanni et al.'s Handsaw [BAC*08] includes tangible slicing of volumetric (medical and urban) data (see Section 3.2.1) and also supports single hand gestures to perform the slicing of the data. The slicing-plane is constrained to specific axes, and hand gestures allow users to translate the plane along the axis. The data and slices are visualized on a desktop display. Finally, Botero et al. [BODVOGHV17] proposed a set of gestures to manipulate data obtained from medical imaging. A pointing finger can move the data in 3D while gesturing with the whole hand allows users to translate the three axis-aligned slicing planes to analyse internal structures.
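To illustrate what an axis constraint means computationally, the following minimal Python sketch projects a free-hand displacement onto a chosen axis, so that only the component along that axis moves the slicing plane. This is a generic formulation and not the implementation of any of the cited systems; the function name `constrain_to_axis` and the tuple-based vector representation are assumptions made for illustration.

```python
def dot(a, b):
    """Dot product of two 3D vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))


def constrain_to_axis(displacement, axis):
    """Project a tracked hand displacement onto a constraint axis.

    Hypothetical helper: returns the part of `displacement` that lies
    along `axis` (which does not need to be normalized).
    """
    length_sq = dot(axis, axis)
    if length_sq == 0:
        raise ValueError("constraint axis must be non-zero")
    scale = dot(displacement, axis) / length_sq
    return tuple(scale * a for a in axis)


# A hand movement that drifts in x and y only translates the plane along z.
plane_offset = constrain_to_axis((3, 1, -2), (0, 0, 1))
```

Discarding the off-axis components in this way is one simple means of filtering out unintended hand drift; real systems may additionally smooth or scale the input.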
The work by Fleury et al. [FDGS12] (see Figure 7b) brings mid-air gestural interaction for manipulation into collaborative spaces. In their collaborative studies, two users work together to define a cutting plane. The first uses both hands, and the second uses a single hand to provide the three points that will define the cutting plane. They can interactively manipulate the plane's position and orientation by moving their hands together.
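The geometry behind such a three-point definition is standard: three non-collinear points determine a plane whose normal is the cross product of two edge vectors. The short Python sketch below shows this under the assumption that the three tracked hand positions are available as 3D tuples; the function names are hypothetical and the code is not taken from the cited system.

```python
from math import sqrt


def plane_from_points(p0, p1, p2, eps=1e-9):
    """Return (unit_normal, d) with dot(unit_normal, x) == d on the plane."""
    u = tuple(b - a for a, b in zip(p0, p1))
    v = tuple(b - a for a, b in zip(p0, p2))
    # Cross product u x v gives a vector perpendicular to the plane.
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    length = sqrt(sum(c * c for c in n))
    if length < eps:
        raise ValueError("points are (nearly) collinear; plane is undefined")
    n = tuple(c / length for c in n)
    d = sum(a * b for a, b in zip(n, p0))
    return n, d


def signed_distance(point, plane):
    """Signed distance of a point to the plane (positive on the normal side)."""
    n, d = plane
    return sum(a * b for a, b in zip(n, point)) - d


# Three hand positions spanning the z = 0 plane.
plane = plane_from_points((0, 0, 0), (1, 0, 0), (0, 1, 0))
```

The sign of `signed_distance` can then be used to decide which side of the cutting plane a data sample lies on, for example to hide everything on the positive side.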
From the surveyed work, we take away that one of the most important design decisions is the choice of uni- or bimanual interaction. Sometimes the right choice is constrained by the application domain, for example, surgeons in the operating room might need to keep one of their hands on the task they are performing while doctors performing a diagnosis may be able to use both of their hands.

Figure 7: Examples of mid-air gestural interaction for 3D spatial visualization. From left to right: (a) a user interacting with an isoplane [MS05] (reprinted from the publication with permission by Elsevier), (b) a user manipulating the centre of a cutting plane [FDGS12] (image courtesy of Fleury et al.), and (c) mid-air hand gestures to facilitate 3D selection [TLN17] (Theart et al.).

Visualization widget tasks with mid-air input
Our survey uncovered far less on mid-air gestural interaction with visualization widgets. Indeed, as explained in Section 2.1.3, to date, most of the applications of mid-air gestural interaction involve 2D image browsing. An exception is the work of Gallo et al. [GPC11] who proposed a specific gesture to facilitate measurements between points in 3D (see Figure 6b). Users have to keep a hand outside of the interaction space and use the other hand to point at specific locations. The locations are then recorded, and the distance is measured and shown to the users. Malkawi et al. [MS05] also proposed to use mid-air gestural interaction to provide data read-outs (see Figure 7a). Among the many commands they propose to support with their system, they explored a specific gesture to allow users to obtain detailed information of a specific 3D point in space, visualized through an AR headset. The point is indicated by making a pointing gesture at the desired 3D position using a tracked glove.

3D data selection and annotation tasks with mid-air input
3D data selection has also been explored through mid-air gestural interaction. Focusing first on object selection for archaeological purposes, Allen et al. [AFT*04] had users wear a glove to select archaeological objects in a head-worn VR environment. Similarly, focusing on selection of objects or subparts of data that can easily be isolated, Benko and Wilson [BW10] proposed, in a dome environment, to use pinch gestures in mid-air to provide a selection mechanism. Direct feedback is provided thanks to shadows that are cast on the dome screen while performing the movement. Pinch gestures with shadows as depth cues have also been explored by Wang et al. [WL14] in the context of Computer Aided Design. However, in their work, the authors concluded that the shadows provided insufficient depth cues for 3D selections.
Using bimanual gestures, Gallo et al. [Gal13] proposed an interesting way to specify ROIs. They focused on the possibility to offer 9 DoF manipulations for clipping manipulation. These clipping operations allow users to select a specific subset of the data by clipping-out the unwanted parts of the data. Users translate and rotate a clipping box but also rotate the data. One version of their prototype allows users to separate the control of the box and the data, while the other integrates all three possible manipulations. They compared their approach with a mouse-based approach and found that more simultaneous degrees of freedom could lead to more precise manipulations. Theart et al. [TLN17] proposed, in a HMD VR environment, to manipulate several selection shapes (box or cylinder, see Figure 7c) or to let users trace Regions of Interest with one finger and then let users scale this region into 3D through other gestures. This kind of selection gives more freedom and control to the users when compared to manipulating selection primitives, but other approaches can give even more control to the user. For instance, Lubos et al. [LBLS14], in addition to the gesture set they use to provide 3D manipulations, also proposed to support selection in 3D Point Cloud data with a brushing technique. Users, wearing a HMD, can brush in the 3D space to select points. Schönborn et al. [SHLPF14] used tracking of mid-air gestures to interact with 3D representations of nanotubes in public settings.
Similar to the manipulation of widgets with mid-air gestural interaction surveyed in Section 3.3.2, it appears that not a lot of work has focused on annotation with mid-air gestural interaction. Bacim et al. [BNB14] proposed to iteratively remove parts of the data that should not be annotated via gestures in mid-air tracked by a Leap Motion in a desktop environment. Annotations are then added by typing on the keyboard attached to the workstation. Also employing mid-air gestures to specify the area/volume to annotate, Lubos et al. [LBLS14] proposed to annotate the data that have been selected through the use of their mid-air brush. Similar to tangible interaction, one could envision using hand movements to directly annotate and write in the air (e.g. [AGLW16]), but, to the best of our knowledge, such a system has not been used to annotate 3D environments.
Several selection metaphors (e.g. selection shapes or brushing) have been proposed to support 3D selection and annotations with mid-air gestural interaction, but overall this task group has received little attention with this interaction paradigm.

Visualization with hybrid interaction paradigms
Finally, we survey techniques that combine the previously mentioned interaction paradigms.

Volumetric view and object manipulation tasks with hybrid input
For 3D data and cutting plane manipulation, several approaches rely on a combination of tactile and tangible interaction. In an attempt to 'leverage the benefits of precise 2D manipulations combined with fast 3D manipulations', Bornik et al. [BBK*06] designed a custom device on a tablet PC for use in a VR environment (see Figure 8a). Their system allowed users to manipulate the data (rotation, translation) and place a cutting plane by using the device on the tablet or moving the device directly in 3D space. Particularly relevant for our review, the authors conclude that combining interaction paradigms did not appear to be difficult for users. Cordeil et al. [CBL*17] developed a touch-sensitive cube, augmented with a gyroscope and accelerometer; movement of the cube itself provides a tangible interface for 3D manipulation of a VR visualization registered to the cube, while the touch input is used for other visualization-oriented tasks. Designing custom devices is less common than utilizing the mobile devices that have, for more than a decade now, been able to provide both tactile and tangible input. The Natural Material Browser [FFH13] combines spatial/tangible manipulation of a tablet and multitouch gestures for volumetric material science datasets. The rotations tracked with the inner sensors of the tablet allow domain scientists to visualize different slices of the data on the tablet, while touch-interaction is simply used to either provide rotation information or change datasets and parameters (a system control task). Pushing this approach further, López et al. [LODI16] investigated touch input for 3D manipulations on a tablet, in the context of an additional vertical stereoscopic screen. They provide 3D manipulations through tactile input using the tBox method [CDH11] on the tablet, but also add the possibility to rotate the data by directly rotating the tablet. They propose a discussion on conflicts in perception between the two scenes and how to keep both displays synchronized.
To go beyond the limitations of internal sensors, other researchers have used visual tracking of multi-touch devices. For instance, Song et al. [SGF*11] proposed to manipulate slicing planes through medical data using a visually-tracked iPod Touch. The data is visualized on a large vertical display and slices can be transferred to the iPod for annotations. Katzakis et al. [KTKT15] use a mobile device with touch input and rotation tracking to support multiple interactions depending on the mapping specified; their system includes support for manipulating cutting planes and data. With the availability of spatially-aware tablets such as the Google Tango, Besançon et al. [BIAI17a] decided to focus on the specific needs of fluid dynamics researchers and combined tangible manipulations of the tablet with tactile interaction to provide full 3D data and cutting plane manipulation. Their interaction mapping was deemed more flexible than the current state-of-the-art tools by fluid dynamics experts.
Since they are complementary and widely accessible in today's devices, it is unsurprising that most of the hybrid work surveyed thus far combines tactile and tangible interaction paradigms. However, the literature also includes examples that combine spatial interaction paradigms with traditional mouse and keyboard interaction. An interesting recent approach was presented by Mandalika et al. [MCB*18]. They combined a traditional 2D desktop and its usual interaction mechanism with the zSpace fishtank VR system for radiological analysis purposes. The mouse and keyboard and the zSpace can be used sequentially or simultaneously and the authors report that it integrates very well in radiologists' workflow.

Visualization widget tasks with hybrid input
Hybrid interaction paradigms have also been used to help with the 3D positioning of points of interest and visualization widgets. For instance, Sultanum et al. [SSSS11] combined a multi-touch table system with tangibles for exploring geologic reservoir data. They used tangible props to control detailed data read-outs and a focus+context view, while tactile input was used for regular data navigation, including data-specific techniques such as splitting and layer peeling. The spatially-aware approach of Besançon et al. [BIAI17a] allowed fluid dynamics researchers to position seeding points in 3D by, for instance, combining touch and tangible input. They could start by placing a cutting plane with tangible manipulations of the tablet. Then, they could use touch input on the tablet to specify the x- and y-position of the seeding point, and the z-position was derived from the intersection of a ray originating from the finger with the cutting plane (see Figure 8b). Cordeil et al.'s [CBL*17] touch-sensitive cube, on the other hand, allowed users to manipulate the data with direct spatial manipulations of the cube itself and use ray-casting from finger positions onto the cube's side to, among other things, specify parameters for data readouts.
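The depth derivation described for seeding point placement amounts to a ray-plane intersection. The following Python sketch shows the standard computation, assuming that the touch position has already been converted into a 3D ray origin and that the ray direction (for instance the viewing direction) is known. The function name and parameters are hypothetical and only illustrate the principle, not the cited implementation.

```python
def dot(a, b):
    """Dot product of two 3D vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))


def ray_plane_intersection(origin, direction, plane_point, plane_normal,
                           eps=1e-9):
    """Return the point where a ray hits a plane, or None if it misses.

    The ray starts at `origin` (assumed: the touch position mapped into
    3D) and travels along `direction`. The plane is given by any point on
    it and its normal.
    """
    denom = dot(direction, plane_normal)
    if abs(denom) < eps:
        return None  # ray is parallel to the cutting plane
    offset = tuple(p - o for p, o in zip(plane_point, origin))
    t = dot(offset, plane_normal) / denom
    if t < 0:
        return None  # plane lies behind the ray origin
    return tuple(o + t * d for o, d in zip(origin, direction))


# Touch at (0.2, 0.3) on the screen plane z = 0, cutting plane at z = 5.
seed = ray_plane_intersection((0.2, 0.3, 0.0), (0.0, 0.0, 1.0),
                              (0.0, 0.0, 5.0), (0.0, 0.0, 1.0))
```

In this example the x- and y-coordinates come directly from the touch, and only the z-coordinate is supplied by the cutting plane, which mirrors the division of labour between tactile and tangible input described above.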

3D data selection and annotation tasks with hybrid input
Similar to 3D data manipulation, most approaches for 3D data selection rely on a combination of tactile and tangible interaction. For this task, some approaches have created custom interaction devices to provide both interaction mechanisms. The 3D selection mappings proposed by Katzakis et al. [KTKT15] enable manipulating the selection volume with either tactile or tangible input. However, while they showcase part of their work with visualization data, their 3D selection is limited to an object selection interaction and is not suited for the needs of volume visualization as we previously argued in Section 2.2.3. The touch-sensitive cube developed by Cordeil et al. [CBL*17] goes further and implements a selection mechanism based on touch input of the tangibly-manipulated data (see Figure 8c). For instance, they use pinch gestures to define a rectangular subspace that passes through the entire data volume. Alternatively, free-form drawing on either side of the cube can be translated through the complete volume and multiple side drawings can together iteratively define a cross selection volume.
Moving away from custom-designed hardware, Veit et al. [VC14] proposed 3D selection and annotation of the selected subsets with a spatially tracked multi-touch device in a 3D stereoscopic setting. Their 3D selection technique focuses on point cloud data and proposes to manipulate a selection sphere that is attached to a ray controlled by the tablet's movement. Tactile manipulation changes the size of the selection sphere on the fly. This approach is versatile and can adapt to different datasets and regions of interest. Pushing the versatility even further, Besançon et al. [SAIB16,BSY*19,GSBI20] proposed to use a spatially tracked tablet to provide free-form 3D selection of volumetric data. Their approach is not data-dependent and allows users to first trace a lasso on the tablet to obtain a 2D shape that is then extruded in 3D using 6 DOF tablet manipulations. This approach relies on a combined tablet and large screen and enables entirely user-controlled free-form selection. Although their research focuses more on the specific workflow of annotations and their review and modifications, Pick et al. [PWHK16] combine custom CAVE hardware with a smartphone to support annotation tasks.
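As a rough illustration of lasso extrusion, the Python sketch below selects the points of a point cloud that fall inside a 2D lasso swept along a single straight direction. This is a deliberate simplification: the cited technique extrudes the shape along arbitrary 6 DOF tablet motion, whereas here the tablet pose is assumed fixed and described by an origin and three orthonormal axes (`right`, `up`, `forward`). All names are hypothetical.

```python
def dot(a, b):
    """Dot product of two 3D vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))


def point_in_polygon(x, y, polygon):
    """Even-odd ray casting test for a 2D point against a closed polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def select_by_extruded_lasso(points, lasso, origin, right, up, forward, depth):
    """Return the points inside the lasso extruded `depth` units forward.

    Assumption: `right`, `up`, `forward` are orthonormal tablet axes and
    `lasso` is a list of (x, y) vertices in the tablet's own 2D frame.
    """
    selected = []
    for p in points:
        rel = tuple(a - b for a, b in zip(p, origin))
        x, y, z = dot(rel, right), dot(rel, up), dot(rel, forward)
        if 0.0 <= z <= depth and point_in_polygon(x, y, lasso):
            selected.append(p)
    return selected


lasso = [(0, 0), (1, 0), (1, 1), (0, 1)]
cloud = [(0.5, 0.5, 1.0), (0.5, 0.5, 3.0), (2.0, 0.5, 1.0), (0.5, 0.5, -1.0)]
picked = select_by_extruded_lasso(cloud, lasso, (0, 0, 0),
                                  (1, 0, 0), (0, 1, 0), (0, 0, 1), 2.0)
```

Extending this sketch to a moving tablet would require accumulating the swept volume over successive poses, which is beyond the scope of this illustration.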
Many different visualization systems rely on immersive displays, such as VR headsets. Taking notes can be particularly challenging in these environments, and hybrid interaction paradigms often provide a solution. The early work of Harmon et al. [HPRB96] and Poupyrev et al. [PTW98] was pioneering in this regard. They used a spatially tracked device to annotate (medical) data for VR environments (with headsets). The users could navigate the environment by walking inside of it, position themselves and the tracked-tablet at a specific location, and start their note-taking process with a pen on the tablet. Tsang et al. [TFK*02] extend this style of spatial annotation in the Boom Chameleon interface, which combines tactile and tangible interaction. It is a mechanically tracked display, augmented with a tactile overlay to capture touch gestures and speech via a microphone. Manipulations of the display make it possible to capture a specific view that then can be annotated. The device developed by Bornik et al. [BBK*06] also allowed users to take notes by combining tactile and tangible input to select where notes should be placed. Interestingly, Kukimoto et al. [KNEK05] proposed to use a tracked PDA to take notes in a VR environment by using it solely as a tangible note-taking device or combining its location with the pen input. Finally, Lubos et al. [LBLS14] augmented their mid-air gestural interaction-based selection gestures with voice-recognition to create a VR annotation tool for 3D point cloud data.
The most commonly found hybrid paradigm for annotation and selection combines tactile and tangible interaction. For example, touch input can be used to sketch annotations or selection marks while tangible input is used to specify a 3D location or extrude 2D selections into 3D space.

Opportunities for Future Research
We would like to highlight three challenges and opportunities for future research that emerge from the survey.

A need to focus on visualization widgets
To facilitate discussions of the state of the art of 3D spatial visualization, we have gathered the approaches mentioned in our survey in Table 1, following the same classification as in Section 3 (an aggregated version was presented in Figure 1). To obtain this table, we have included all past work in Section 3, excluding papers that do not focus on proposing a new technique for visualization. Our survey highlights clear areas for future work and might also explain the lack of adoption of spatial 3D visualization techniques by domain experts, as pointed out by previous work [WBG*19]. In Table 1, we clearly see that HCI and VIS researchers have rightfully investigated several interaction paradigms, each of which is valuable, especially considering that experts in different domains have different needs. For instance, surgeons are more likely to be interested in mid-air gestural interaction, while researchers in geology, fluid dynamics, or archaeology can make use of props or screens. Our final report on hybrid interaction paradigms (see Section 3.4) and Table 1 also clearly highlight that researchers in VIS and HCI have been leveraging the potential of hybrid interaction paradigm for improving visual exploration of 3D datasets.
Despite these positive take-away messages, Table 1 also clearly shows an imbalance. Task groups 1 and 3 (volumetric view and object manipulation, and 3D data selection or annotation) have been investigated by numerous authors, covering all the interaction paradigms we discussed. Task group 2 (defining, placing, and manipulating visualization widgets), however, has barely been investigated, to the best of our knowledge and considering the very large body of work we reviewed. Furthermore, Section 3 highlighted that the few approaches that did consider this task narrowly focus on one or two visualization widgets. These widgets have been identified by domain experts to be essential to conduct a proper 3D data analysis (e.g. [BIAI17a,PTSP02]). The lack of adoption of 3D spatial techniques developed by HCI and VIS researchers could, hence, be explained by the fact that most prototypes do not have all the widgets and features necessary to conduct a complete analysis. In addition, past work has highlighted the need to provide a link between HCI/VIS prototypes and traditional analysis software (e.g. Matlab, Python) run on desktop computers [WBG*19]. On the other hand, research on 2D visualization widgets is not rare (see visual analytics contributions, e.g. [HS04, KHPA12, YS20]), which suggests that such research is valued in the visualization research community. The lack of focus on 3D visualization widgets could therefore be explained either by the fact that the challenges of manipulating virtual probes, cutting planes and the like are still not addressed, or the fact that 3D visualization communities might not be interested in or aware of the need for 3D visualization widgets beyond these. Either way, we hope that this survey will contribute to highlighting this specific need and eventually foster more work that supports the manipulation of visualization widgets for 3D data.
Among the four spatial interaction paradigms we explored, we also found relatively few works on mid-air gestural input for all three task groups. One possible reason is that, compared to other paradigms, mid-air gestural interaction is less precise due to unstable gestures. Another possibility is that tactile and tangible interaction paradigms feel more comfortable to users. In some special cases, however, the use of mid-air gestural interaction can be essential, such as in operating rooms where hygiene is paramount (see, e.g. Section 2.1.3). Furthermore, mid-air gestural interaction also seems to be the de-facto expectation of some experts [WDB*20], making it a potentially more intuitive solution in some cases. Consequently, we believe that more research is needed in this direction.

Interaction in public spaces
Science communication is rapidly embracing the possibility of using 3D data visualization as a way of telling engaging data-driven stories to a broad audience at public venues, such as science centres and museums. As intuitive, reliable and robust interaction is a hard requirement at public venues, research is needed to identify specific interaction challenges posed when dealing with a large number of users exhibiting diversity in age, language, culture and knowledge. As previously postulated by Sundén et al. [SBJ*14], interaction techniques can help in the dissemination of scientific discoveries using interactive installations. Also, in large-scale immersive theaters such as planetariums, interaction becomes a central part of live programs, often with a facilitator [KHE*10, BHY18] (see Figure 9a) carrying an interactive and non-linear narrative. Despite being already used for 2D data exploration tasks or discovery of ancient objects (e.g. [PPMW06, HdlRL*13, MSLF15]), the potential for interaction techniques in immersive visualization environments [BCD*18] to foster engagement and learning of complex 3D scientific subject matter remains under-explored and under-studied [YRA*16, YLT18] (see Figure 9b). Here, we provide a few examples of the related work that does exist. Tangible interaction lends itself to natural use in public dissemination, and a multitude of examples are used on a regular basis. We do, however, find that many of these approaches are not documented in terms of research papers and there is room for systematic studies of tangible interaction in public spaces. One available example is the use of physical objects to steer the behavior of visual objects on a touch surface [HCM*16]. Another interesting example used in public dissemination is the use of haptic interfaces for communication of complex molecular interfaces [PCT*07].
In public dissemination work, gesture-based interaction is commonly used by visitors to interact with content and select options in exploration scenarios. However, most of the developed techniques focus on interaction with 2D content or slide-based presentations (e.g. [CFM*12, RS13]). Schönborn et al. [SHLPF14] used gesture-based interaction with 3D representations of nanotubes together with stereoscopic viewing (see Figure 9c). A recent study on different gesture paradigms for 3D interaction in mediated presentations was published by Krekhov et al. [KEBK17]. They highlight the lack of focus on presentation of 3D data and applications and propose three gesture sets for such presentations that they also evaluate.
Finally, hybrid interaction has been used in the context of cultural heritage visualization in public dissemination. For example, a combined touch interface and HMD installation was produced by Sundén et al. to visualize 3D reconstructions of shipwrecks [SLY17].

The challenge of evaluation
Our state of the art report did not detail how each of the techniques we surveyed has been evaluated, although doing so would also highlight how challenging the evaluation of such new techniques and interfaces is. Indeed, these techniques are designed mostly for experts and researchers in specific domains. Domain experts using data visualization for sense-making rarely evaluate visualizations based on completion time or error-based measures but rather on the potential to turn data into insights. The visualization community has, therefore, sought to develop more holistic strategies to assess the quality of interactive visualization techniques. However, as we found in this survey, it is still rare to see published work that integrates an evaluation not based on traditional metrics. Since 3D visualization techniques often target a specific audience of domain experts, the alternative evaluation strategies that are most common focus on feedback from these users. Recruiting experts can be difficult, but the feedback they provide in user studies is invaluable.
The surveyed papers are nonetheless promising in this respect, as some of these approaches are evaluated without relying on traditional statistical analyses of time and errors. Some past work has relied on case studies instead [SYG*16, JJB18] or reports of use by real end users (i.e. domain experts) [TFK*02, RRA*12, MCB*18]. Others have relied on qualitative feedback from a small pool of experts, e.g. [GJL10, LRF*11, SMP*17, BIAI17a,Koe18]. It would therefore seem that the 3D spatial visualization community is open to such evaluation strategies. Despite these positive aspects, most evaluations remain focused on understanding how the techniques work in a research lab environment. The lack of adoption of 3D spatial visualization techniques, as mentioned in Section 4.1, could also be explained by the lack of focus on how newly developed techniques can integrate within the workflows and environments used by experts, as indicated in previous surveys and position papers [MHWH17,WBG*19]. The degree to which experts adopt visualization tools and integrate them into their workflows is a clear measure of the impact of the work, but this is only sometimes brought to the forefront in research publications (e.g. [LRF*11, RRA*12, BIAI17a, MCB*18, WDB*20]).
While some of the papers surveyed are accompanied by open-source implementations, which helps with adoption and follows current recommended practices [BPSS*20], it is not clear how many of these have been or will eventually be adopted outside of a lab environment. We know of some success stories [YRA*16, Hol19, KAM*08, CML*12], but others may also exist. 3D visualization interaction techniques are increasingly being adopted in public venues to disseminate scientific findings (see [YRA*16]), or in the planning of complex and newsworthy surgical procedures [Hol19]. These are highly visible outlets, but no specific 3D visualization research is cited. All in all, although we thoroughly examined all surveyed papers and tried to find which of the techniques have been adopted by domain experts, finding specific examples remains particularly challenging. One can question whether this is because most techniques are not eventually adopted by experts, or because such adoption is not seen as being as relevant to the visualization community as the basic research. To disambiguate this in the future, the visualization community could, much like case reports in the medical fields, propose a venue or track to report on the successful (or failed) adoption of visualization techniques by domain experts.

The potential of hybrid interaction paradigms for 3D Visualization
Combining different interaction paradigms to leverage their inherent benefits and mitigate their limitations has been the focus of multiple research projects. For instance, [LBLS14] combine mid-air gestural interaction with voice recognition to create an annotation tool in VR. Research outside of 3D visualization suggests a number of other possible combinations: pressure and tactile interaction (e.g. [CVLB18]), pressure and tangible interaction (e.g. [BAI17]), mid-air gestural interaction and tactile interaction (e.g. [WB03, HIW*09]), and mid-air gestural interaction and tangible interaction (e.g. [SLM*03]). Our survey, therefore, suggests that such other combinations should also be investigated for use with 3D visualization.
Looking at Table 1, we can clearly see that tangible interaction has been used more than other paradigms to provide Visualization Widget placement and manipulation. As described in Section 4.1, additional studies with other paradigms are needed, but the current trend could also be explained by the inherent and natural 3D positioning that tangible interaction can provide to users. It is likely that there is good reason to consider this interaction paradigm when implementing a new 3D visualization technique, and to potentially combine it with other interaction techniques.
Another take-away from this survey is the lack of focus on voice input for 3D visualization purposes. In particular, our survey only revealed examples of voice input for annotation. While we have noted in Section 2.1 that using voice for direct manipulations is generally discouraged [KI13], voice input has been and can be used for visual analytics purposes. Natural language has been used to query data via speech (e.g. [CGH*01, GDA*15, DMN*17, SS18, YS20]) and typing (e.g. [YS20]). Past work [SS18] has highlighted the potential to combine natural language with other input modalities and called for more research on this topic in visualization. Our survey highlights that such work also has to be conducted in the specific case of 3D visualization, beyond the more classical and straightforward use of speech for capturing annotations.

Conclusion
In our report we have surveyed interaction techniques designed to assist domain experts who rely on 3D spatial visualization. We have discussed techniques relying on tactile, tangible, or gestural input, or on hybrid combinations of interaction paradigms, and classified each technique based on the 3D visualization task it supports. We used three high-level task groups: (1) volumetric view and object manipulation, (2) defining, placing and manipulating visualization widgets and (3) 3D data selection and annotation. Our classification highlighted the tremendous amount of effort put into two of these tasks. In particular, volumetric view and object manipulation and 3D data selection have been well covered with all interaction paradigms. A possible explanation lies in the overlap of interest in these tasks between the visualization community and other HCI and VR communities. However, the tasks within visualization widget manipulation, while also being essential (as highlighted in Section 2.2), appear to have received less attention. Similarly, but perhaps to a lesser extent, while we grouped annotation with 3D data selection, we can see from our survey that it has been investigated less frequently than 3D data selection. We hypothesize that placing an increased emphasis on interactive techniques for manipulating visualization widgets and making annotations may lead to a better adoption rate for visualization techniques.
Another pertinent take-away message from our report is the lack of studies on how to best use the various interaction paradigms to disseminate scientific knowledge in public venues. While much of the existing literature has focused on the dissemination of and interaction with 2D data or slide-based presentations (see our previous discussion in Section 4), only a few studies have investigated these topics with 3D data. We believe, however, that the potential of these interaction paradigms goes beyond supporting scientific discoveries to also include engaged learning [XAM08, HCB12, BIAI17b].
Our survey also highlighted the benefits of each interaction paradigm that are most applicable to addressing the challenges of 3D spatial visualization. Our final report on hybrid interaction paradigms (see Section 3.4) highlighted the potential of leveraging the benefits of multiple interaction paradigms to address the challenges of 3D spatial data visualization. Moreover, it is essential to consider the unique affordances of interaction paradigms in different scenarios or contexts. For instance, when analysing spatial data in a CAVE (wearing a VR/AR HMD or a tracked pair of glasses) it is easy and effective to change viewpoint to view the visualization from different perspectives [LBS14, BSB*18, MSD*18, WSS20]. However, when the task is to analyse 3D volume data in the operating room, the space limitations, the lighting conditions and the convenience of head-worn cameras need to be taken into consideration.
Our survey also discusses the challenges of evaluating 3D visualization techniques as well as the difficulty of knowing which techniques have been successfully adopted by domain experts in their workflows. To address this challenge, we recommend that the visualization community create additional opportunities for contributing such reports, much like the tradition within the medical research community of publishing case reports.
Finally, our survey can be used to support the evaluation of future work: researchers can use our table to identify relevant related work to compare to new approaches.