An analysis of users' image queries of a photojournalism image database: A Web analytics approach

Introduction

Research in image retrieval attracts researchers and practitioners from a variety of fields, and one major research challenge is meeting the needs of the end users of those images (Jorgensen, 2003). Jorgensen (2003) also suggests that a variety of image collections is needed for system evaluation and that to “ensure both reliability and validity of testing results, the content should be grounded in the reality of the image community who make heavy use of digital images in their jobs, as well as providing suitable testing material for a variety of techniques” (p. 267). Historical photographs are on Jorgensen's list of desirable testbeds. Pictures of the Year International (POYi) describes itself as

“the oldest and one of the most prestigious photojournalism programs in the world. The mission of POYi is to empower the world's best documentary photography to provide a visual portrayal of society and foster an understanding of the issues facing our civilization. POYi presents public exhibitions, promotes professional development, and cultivates its vast 38,000-image archive of historic photographs. These programs have enormous potential to engage and educate citizens about their world and the role of a free press in a democratic society. Since 1944, POYi has set the gold standard for documentary photography. Now in its 66th year, POYi continues to reflect the news events, social issues, and cultural influences of our world by recognizing excellence in photojournalism” (http://archive.poyi.org/about/).

A prototype of the POYi image collection, called the POYi Archive, has been developed using the Omeka.org content management system. The collection contains over 38,000 photographs and lets end users search the photographic collection. The original and main Pictures of the Year website (http://www.poyi.org/) was created as a static website using a proprietary web development environment. The original website still exists; the new prototype was created using open source solutions and launched on a subdomain. Visitors to the main website were encouraged to browse the website's various collections by Year of Award. There was little or no search capability in this initial incarnation, and only the last several years were available online.

The new version, which operates side by side with the existing site, was built as a prototype for further development. The first set of features was based on a database-driven design, which supports both searching and browsing the collections and the photographs. Building on the metadata indexing established by the Photojournalism faculty and staff at Missouri, the designers sought to provide a more dynamic approach to locating relevant content within and across the various collections. See Figure 1 for a complete list of metadata elements. Figure 2 depicts the Basic and Advanced Search interface, which supports interactive end user searching by Specific fields, by Collection, by Type, and by metadata tag. Dropdowns are included for ease of selection within those categories. Figure 3 depicts a typical search result, which includes some of the fielded information, including Publisher, Creator, Relation, Spatial Coverage, Rights, and Source. Additional fields can be displayed by scrolling down the selected results page. Figure 4 depicts the metadata description of a specific image item.

The current project focuses on how well the metadata used within this new implementation meets the needs of the target audience. The results are expected to inform improvements to the metadata elements, system functions, and POYi's outreach program. The browsable and searchable POYi Archive is available on the open Web at http://archive.poyi.org/.

The investigators address several key issues recommended by the Library of Congress's Future of Bibliographic Control report (2008). These issues include: testing the use of metadata elements for news photos on the Web; collaborating with end users on organizing and accessing news photos; and collecting evidence to support system enhancement of the POYi testbed. The research questions of the project are:

  • 1. Do end users search or browse to find images?
  • 2. What are the most frequently used and least frequently used metadata elements searched by users?
  • 3. What are the common characteristics of search terms selected for use by end users?

This poster reports preliminary results on how users employed the search functions provided by the POYi Archive website and on their image queries.

Data collection

The investigators installed Google Analytics (http://www.google.com/analytics/) on the POYi Archive website on November 12, 2008. Google Analytics captures users' search keywords, visit length, viewed pages, exit pages, and other pertinent end user actions. The Web Analytics Association (2008) defines Web analytics as “the measurement, collection, analysis and reporting of Internet data for the purposes of understanding and optimizing Web usage.” In contrast to log analysis, which mainly collects data from the Web server, Web analytics collects data about the interaction from the client side (the user's computer) and therefore has the potential to provide more accurate and specific data on end user behavior. This is generally accomplished by placing a small piece of JavaScript code on every page of the website to be analyzed. In this instance, the POYi Archive website uses a common PHP footer for all its pages, so the JavaScript needs to be added only to that footer. Each time a user visits a page, the code is activated and that user's actions are recorded. This Web analytics tool integrates data collection with data analysis and reporting.

Preliminary data analysis

The investigators collected data between Nov. 16, 2008 and Feb. 7, 2009 and reported the data in two-week increments, as depicted in Table 1. The total number of visits gradually increased from weeks 1 to 10 but fell slightly in weeks 11 and 12. The total number of visits was 3,674 in 12 weeks. Most visitors browsed the images instead of conducting search queries. Over the 12-week period, basic and advanced searches represented only 8.7% of the total visits. When conducting searches, visitors mostly used the basic search box to enter their image queries. The number of queries from basic search increased from weeks 1 to 10 but fell slightly in weeks 11 and 12. Overall, the numbers of single-word queries and multiple-word queries were reasonably close during the 12 weeks. By comparison, visitors seldom used advanced search functions to construct their image queries. The percentage of advanced search usage was little more than 0.57%. ‘Description’ and ‘Creator’ were the most used metadata elements, representing slightly more than 0.40% of total advanced search usage.
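The tabulation described above could be approximated along the following lines. This is a minimal sketch, not the investigators' actual procedure (the reports came from Google Analytics itself). The function names are hypothetical, splitting queries on whitespace is an assumed definition of single-word versus multiple-word queries, and the conversion from percentages to counts assumes that the reported percentages are shares of the 3,674 total visits and that the advanced search figure reads as 0.57%.

```python
from datetime import date

# Values taken from the text above.
STUDY_START = date(2008, 11, 16)  # first day of reported data
STUDY_END = date(2009, 2, 7)      # last day of reported data
TOTAL_VISITS = 3674               # total visits over the 12 weeks


def two_week_period(day: date) -> int:
    """Return the 1-based two-week period (1 to 6) that a visit date falls in."""
    if not STUDY_START <= day <= STUDY_END:
        raise ValueError(f"{day} is outside the study window")
    return (day - STUDY_START).days // 14 + 1


def query_type(query: str) -> str:
    """Label a query by its number of whitespace-separated words (assumed rule)."""
    words = query.split()
    if not words:
        return "empty"
    return "single-word" if len(words) == 1 else "multiple-word"


def approx_visits(percent: float, total: int = TOTAL_VISITS) -> int:
    """Convert a percentage of total visits into an approximate visit count."""
    return round(total * percent / 100)


if __name__ == "__main__":
    # The query strings below are hypothetical examples, not data from the study.
    for q in ["war", "civil rights march"]:
        print(q, "->", query_type(q))
    print("Period of Feb. 7, 2009:", two_week_period(date(2009, 2, 7)))
    print("Visits with any search (8.7%):", approx_visits(8.7))
    print("Visits with advanced search (0.57%):", approx_visits(0.57))
```

Under these assumptions, 8.7% of 3,674 visits corresponds to roughly 320 visits that involved a search, and 0.57% corresponds to roughly 21 visits that involved advanced search. These are approximations derived from the reported percentages, not figures stated in Table 1.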

Conclusions

According to the preliminary results, visitors during this twelve-week period tended to browse the POYi collection rather than launch a specific search query. When conducting image searches, they mostly used the basic search function with single- and multiple-word queries. Advanced search functions were rarely used, representing a little more than 0.57% of total visits. Based on these preliminary results, the investigators intend to continue collecting data on end users' searching behavior and their queries. Further analysis of the relationship between the image queries and metadata elements will be conducted in the second phase of the project. In the meantime, the investigators will use Google Analytics to analyze users' image browsing habits and other behavioral factors when visiting the POYi website. Data from Google Analytics could include the popularity of various images and their corresponding categories, the number of viewed images, the time of the visit, etc. Based on such data, the investigators would be able to discover in greater detail how visitors browse the POYi image archives. We will also be able to gauge whether their patterns of searching and browsing change over time. It may be that existing users, who became accustomed to browsing the original collection (which had no search capabilities) over the past few years, have carried those habits over to the new Archive site, which is now searchable. Since this is a new site with minimal advertising and promotion, we expect that future users may make greater use of its new search and browse features.