As the volume of data grows, computer technology becomes more powerful and simpler to use, and end-user sophistication increases, the question of what information retrieval (IR) systems ought to supply end-users changes. Should IR systems be used solely to locate objects, or should they be tailored to a broader conception of user groups' information needs? More information about the records in an IR system might help end-users both locate and interpret the data more directly. For example, when searching for visual resources (images representing 2D and 3D art objects), end-users see a representation of the art but, without further knowledge, cannot see the value of an object in society. 3D representations of retrieval sets capitalize on the visual orientation of the user and of the objects in a collection. In some domains that rely on scientific visualization, such as biology and medicine, users expect a tighter integration of the visual representation of information and the supporting literature (Bergeron, 2002; Benoit, 2004). How might a visual emphasis for general users, or for users in specific domains, make existing collections more useful?
This paper describes a project that originally intended to create a 3D interactive interface that could overlay existing collections to help a general user population search records. It became clear during an early Joint Application Development session that the concept of a "better" system was hollow: end-users were not interested in performing to traditional IR measures, such as faster searches, and wanted a system that supported their real goal, to learn more about a topic. In light of this, as described below, the project moved to a second phase, focusing on a user population that is visually oriented but not scientific: art historians.
Metadata records for visual materials follow the Visual Resources Association's Core, version 4.0. VRA Core is an XML implementation that provides the usual descriptive and subject access, such as artist name, title, and descriptors from LCSH and the Art and Architecture Thesaurus (AAT), in other words, the "facts" of the object being described. But in general, visual information retrieval (VisIR) systems either focus on searching by properties of the image itself (such as color, texture, or estimated shape) or operate as full-text systems that display sets of pictures, instead of text, as references to items in the collection. These efforts lack the integration of expert knowledge by which the metadata record could contribute to the information seeker's understanding of the value of the object; there is a kind of fact/value dichotomy. Fortunately, XML is pliant enough to address this.
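To illustrate the pliancy just mentioned, the sketch below extends a minimal VRA-Core-style record with an interpretive element. The record is abbreviated, and the `<interpretation>` element and its `perspective` attribute are hypothetical additions for illustration, not part of the VRA Core 4.0 schema.

```python
# Sketch: enriching a minimal VRA-Core-style XML record with expert
# commentary. The <interpretation> element and "perspective" attribute
# are invented here for illustration; they are NOT defined by VRA Core.
import xml.etree.ElementTree as ET

record_xml = """
<work>
  <agent><name type="personal">Kahlo, Frida</name></agent>
  <title type="descriptive">Self-Portrait with Thorn Necklace</title>
  <subject><term vocab="AAT">self-portraits</term></subject>
</work>
"""

work = ET.fromstring(record_xml)

# A domain expert adds contextualizing "value" data alongside the "facts".
interp = ET.SubElement(work, "interpretation")
interp.set("perspective", "art-historical")  # hypothetical attribute
interp.text = ("The iconography fuses folk tradition with "
               "personal symbolism.")

# Both the descriptive facts and the interpretive commentary are now
# retrievable from the same record.
print(work.find("title").text)
print(work.find("interpretation").get("perspective"))
```

Because the added element lives in the same tree as the descriptive fields, an IR process can index or display it without any change to how the original "facts" are stored.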
What is needed, perhaps, is a VisIR system that effectively permits and encourages domain experts to enrich the record by storing contextualizing data that reflect the values that help one understand the images. This requires a VisIR system designed to use those data to supplement the IR processes and complement what viewers extract from their information-seeking session. The interface, of course, must support users at the point where these richer records meet their need for interpretation and knowledge. Unlike other VisIR applications, this project's search engine uses (a) an innovative resource file of fuzzy associations between concepts in art history and (b) expanded metadata records that capture the values and interpretive perspectives that art educators value.
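The fuzzy-association resource file might work along the following lines: graded links between art-historical concepts that a search engine consults to expand a query. The concepts, weights, and threshold below are invented for illustration and are not drawn from the project's actual resource file.

```python
# Sketch of a fuzzy-association lookup, assuming the resource file maps
# concept pairs to association strengths in [0, 1]. All entries here
# are illustrative placeholders, not the project's real data.

ASSOCIATIONS = {
    ("impressionism", "plein-air painting"): 0.9,
    ("impressionism", "japonisme"): 0.6,
    ("impressionism", "academic art"): 0.2,
}

def expand_query(term, threshold=0.5):
    """Return concepts whose association with `term` meets the threshold."""
    related = []
    for (a, b), weight in ASSOCIATIONS.items():
        if term in (a, b) and weight >= threshold:
            related.append(b if term == a else a)
    return related

print(expand_query("impressionism"))
```

With the weights above, a search on "impressionism" would also retrieve records indexed under "plein-air painting" and "japonisme", while the weakly associated "academic art" falls below the threshold.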
The rest of this paper describes how the project started, why it was altered, and how the mechanics of the redirected 3D interface work. It closes with some conclusions and plans for future work.
There are long-standing concerns in IR related to mapping queries to document collection representations and the source of mismatches, such as the semantics of query representation and documents' metadata contents. A review of the literature suggests there are identifiable streams of responses in visual information retrieval (VisIR) systems research. The first pursues the traditional IR approach of improving query/collection matching algorithms. The second focuses on the interface and the affective or emotional responses evoked.
The following two projects, as do many described in Ware (2004), lay out a variety of affective measures for evaluating facets of VisIR systems. For example, Haik et al. (2002) focus on measuring task performance and user attitudes towards the display by enumerating what the researchers believe the end-users value:

|Navigation:| | |
|Display:|Too dense|Not dense|
|Effectiveness:|Weak sense|Strong sense|
|Control:|Not in control|In control of session|
Chen & Czerwinski (n.d.) consider user satisfaction and the speed of retrieval, and add spatial ability and the sense of immediacy of using the system in a suite of qualitative evaluative measures:

Design satisfaction ratings for the user interface:
- "The purpose of software immediately clear"
- "It was easy to get what I wanted"
- "I knew what to do"
- "Each area of the software was clearly marked to indicate my location"

Usability satisfaction ratings of the interface:
- simplicity, ability to zoom and walk around topics, navigation of topical clusters

Online appeal:
- software feels unique or different
- responsive (not too slow)
- provides valuable information
- easy to use
- "cutting edge" technology used
- software provided a detailed environment to interact with
- software is timely
- personalized or customizable
- shared experience (community)
These papers demonstrate the aesthetic attraction of visual IR, but no widely adopted evaluative criterion dominates (Morse, Lewis, & Olsen, 2002) that would provide empirical results applicable to a recognized user group's needs.
Moreover, they imply that affective measures are useful predictors of VisIR success, but this is not substantiated. Some groups (e.g., biologists, scientists in general; Benoit, 2002, 2004) are less swayed by emotional considerations, being more willing to endure uncertainty and complexity in order to explore hidden properties of the data in contextualizing retrieval results (van der Eijk et al., 2004).
This leads to a considerable and growing body of work, mostly in proceedings, about individual visualization projects, explaining the designers' motivation and system architecture without detailed empirical components. For example, Santini and Jain (2000) created "El Niño" as a suite of tools addressing what they term the "semantic gap" between the query and searching visual collections. Santini, Gupta and Jain (1999) discuss the semantics of features such as image color, structure, and texture, emphasizing that the search "engine must be able to understand the placement of images in the display space … [and] be able to create a similarity criterion 'on the fly'" (p. 8). Urban and Jose (2006) describe "Ego: a personalized multimedia management tool" that also "should address the designing of a system that supports a variety of interactions and personalizing the support of information interaction" (p. 1). They create a product to support "retrieval in context", learning from the user's personal organization. Others, such as Kang & Shneiderman's PhotoFinder (2000) and Shen, Lesh, Vernier, Forlines & Frost's "Personal Digital Historian" (2002), collectively support the idea that visual retrieval is bound either to the text in the metadata record accompanying an image file or to some kind of automatic classification based on the image's color, structure, and texture. Borlund (2003) also argues for more flexible modes of evaluating interactive information retrieval. She believes a "set of components" should be identified:
"the involvement of potential users as test persons;
the application of individual and potentially dynamic information need interpretations deriving from, e.g., the sub-component of a simulated work task situation; and
the assignment of multidimensional and dynamic relevance judgments."
Recent work (Giereth et al., 2007; Spoerri, 2007; Luboschik & Schumann, 2007; Lau & Vande Moere, 2007) also values aesthetics, interactivity, and end-user understanding. The literature in general supports Chen's list of unsolved problems (2005): usability, understanding elementary perceptual-cognitive tasks, the impact of prior knowledge, education and training, quality measures, scalability, aesthetics, shifting to dynamic interfaces, visual inference, and knowledge domain visualization. Chen and Geroimenko advance the question by integrating XML (2005) and the Web (2006) into the challenge of making IV more useful.
From 2D to 3D:
Some researchers appeal to 3D interfaces to address the issues of volume, similarity, and navigation (Cockburn & McKenzie, 2001; Wiza, Walczak & Cellary, 2004; Benoit, 2004). 3D has long been considered a way to address large sets of documents: "interactive 3D graphics techniques can be used to help the user comprehend and filter such [large] result sets" (Cugini, Piatko, and Laskowski, 1996, p. 1). Mackinlay, Robertson and Card (1992) and Calitz and Munro (2001) recognized the usefulness of visualization for hierarchical relationships. These and other studies examine how end-user spatial reasoning and memory affect the user's interaction in full-text retrieval (Hemmje, 1993, 1995). Newby (2002, p. 50) drew interesting conclusions from his Yavi system. He proposes that 3D interfaces should:
limit rotation, especially about the y axis, to lessen disorientation;
provide visual reference points (backgrounds, terrain, etc) to help enhance depth perception;
enable viewing of relationships beyond similarity (e.g., by selecting a document and seeing all terms in it);
de-emphasize keystroke commands in favor of on-screen heads-up menus or pull-down menus; and
support zooming and expanding/exploding the space.
He explains that this type of visualization is “more suited for visualizing several hundred, or perhaps a few thousand, items” (p. 49).
Eidenberger and Breiteneder's (2003) application "VizIR" integrated a 3D interface of icons but on a single plane, with items receding from the user, similar to a stage, using a head-up display ("media panel"). This application parses XML records to provide the data. Risden et al. (2004) go a step further, examining 2D and 3D information visualizations that integrate XML and 3D (XML3D), using colored balls with edges superimposed over a globe with longitude and latitude lines.