Digital libraries and supporting technologies have now matured to the point where their contents are incorporating complex and dynamic resources and services. Given the proliferation of scholarly digital content, research in almost every discipline is becoming more data-intensive, and users now demand access to a variety of formats regardless of time, place, or the type of device used.
Advancement of digital technologies is shaping creation, access, use and preservation of information resources in ways that are so profound that traditional methods and concepts of access and organization are becoming less effective.
Various emerging (Web 3.0) applications, driven by semantic web technologies such as OWL, RDF, SPARQL, and SWRL, offer powerful data organization, combination, and query capabilities. Some of the trends that are already taking shape include:
Huge multimedia digital libraries instead of documents
Complex retrieval systems instead of matching queries and document representations
Visualization of the information space instead of a ranked list of search results
Human information behavior instead of information need
Users as both creators and consumers of information
Bottom-up, user-assisted indexing instead of the “top-down” approach to indexing by trained professionals
As more users move into the more self-structured digital environment, a new paradigm for user experience will be required.
Indexing and Representation
An index term is simply a systematic representation of an information-bearing object (text, images, audio, video, etc.) which points users to specific items on topics of interest. In other words, it is an information retrieval tool. As noted by Caropreso, Matwin and Sebastiani (2001), one of the key issues for information retrieval (IR) and all other content-based text management applications is document indexing.
The generation of an accurate index term, i.e. a representation of an information-bearing object, is fundamental to the discovery, use, and reuse of digital resources. Indexing enhances the accessibility and value of a resource, provided that it is based on a thorough analysis of the resource. A good index helps users find what they need, even when they are not sure what they need or are looking for.
To fully understand what a good index is, it is necessary to be both micro- and macro-minded. On the micro level, we concern ourselves with the specific mechanics of creating an index term. On the macro level, indexing can be thought of as part of the larger context of an information retrieval system. At the basic level, retrieval of information involves the user expressing an information need as a retrieval request, using terms from the common vocabulary, and the system matching that request against stored records.
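The basic matching step described above can be illustrated with a minimal sketch of an inverted index. It assumes that each stored record has already been assigned a set of index terms and that a request matches a record only when every request term appears among that record's terms. The record ids, the terms, and the function names build_index and search are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_index(records):
    """Map each index term to the set of record ids it points to."""
    index = defaultdict(set)
    for record_id, terms in records.items():
        for term in terms:
            index[term.lower()].add(record_id)
    return index


def search(index, request):
    """Return the ids of records indexed under every term in the request."""
    terms = [t.lower() for t in request.split()]
    if not terms:
        return set()
    # Start from a copy of the first term's postings, then intersect.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical stored records: record id -> index terms assigned by an indexer.
records = {
    "doc1": ["digital", "libraries", "metadata"],
    "doc2": ["indexing", "metadata", "taxonomy"],
    "doc3": ["digital", "preservation"],
}
index = build_index(records)

print(search(index, "metadata"))
print(search(index, "digital metadata"))
print(search(index, "folksonomy"))
```

A request for a term outside the vocabulary used at indexing time, such as "folksonomy" here, retrieves nothing even if a relevant record exists. This is one reason the choice of index terms matters so much for retrieval.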
Considering the growing interdisciplinarity of information and knowledge management and its constant change and development, digital resources demand a more specialized treatment and characterization that can better capture the semantics and relations of the underlying concepts.
Different metadata elements describe different characteristics or aspects of an object or digital resource. However, users are more interested in the contents and subjects of objects than in what the objects are. The most useful metadata about a digital object are its subjects (or keywords), since they explicitly describe what it is about. To describe digital resources accurately, metadata creators and catalogers try to follow the thinking of the creator or author as closely as possible, and also to anticipate what users might look for and how they might want to discover and retrieve it. Otherwise, the descriptions or subject headings will be ineffective. Wichowski (2009) noted that in the rapidly growing information environment, unidentified and unorganized content, however useful it may be, is at risk of being rendered unfindable, and thus obsolete.
A number of researchers, including Bates (1998), Peterson (2006) and Spiteri (2007), analyzed content indexing (especially subject indexing) and described the general behavior of users' information seeking and their queries. Many agreed that the two major reasons why users experience problems with subject access are the quality and application of subject indexes on the one hand, and the complexity of knowledge as well as the information literacy skills required for successful subject access on the other. To maintain the consistency of search results and high recall of available resources, it is critical to ensure the quality of the keywords and taxonomies used to index heterogeneous digital resources within digital libraries.
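The link between keyword quality and recall can be made concrete with a small worked sketch. It uses the standard definition of recall, the fraction of relevant resources that a search actually retrieves. The collection, the keyword spellings, and the function name recall are hypothetical and serve only to illustrate the effect of inconsistent indexing.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant resources that were retrieved."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


# Hypothetical collection: four resources are relevant to the topic.
relevant = {"doc1", "doc2", "doc3", "doc4"}

# Inconsistent indexing: the same subject was entered under two spellings.
inconsistent_index = {
    "cataloging": {"doc1", "doc2"},
    "cataloguing": {"doc3", "doc4"},
}

# Consistent indexing: one controlled keyword covers all four resources.
consistent_index = {
    "cataloging": {"doc1", "doc2", "doc3", "doc4"},
}

low = recall(inconsistent_index["cataloging"], relevant)
high = recall(consistent_index["cataloging"], relevant)
print(low, high)  # 0.5 1.0
```

Under these assumptions, a user searching for "cataloging" retrieves only half of the relevant resources when the keyword is applied inconsistently, and all of them when a single controlled term is used.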