Image categorization in theory and practice
There exist several typologies and frameworks for image content which serve as a starting point when discussing image categorization. Some of these are born out of content analysis of image descriptions, while others are more theoretical constructs on image semantics. In studies focused on image descriptions and the attributes of images, one commonly distinguishes between generic (pre-iconographical), specific (iconographical) and abstract/affective (iconological) image descriptions based on the work of Erwin Panofsky. Shatford (1986) categorized the subjects of images further as generic of, specific of and about, based on Panofsky's levels of analysis. Here, ofness refers to the factual content of the image (“what is the image of?”) and aboutness refers to its expressional content (“what is the image about?”). Based on data from image description tasks, Jörgensen (1998) distinguished between 12 image attribute classes, which she classified as perceptual (Object, People, Color, Visual elements, Location, Description), interpretive (People-related attributes, Art historical information, Content/story, Abstract concepts, External relationships), and reactive (Viewer response). Approaching the subject of image content from yet another viewpoint, Burford et al (2003) and Eakins et al (2004) proposed a taxonomy of image content based on literature in computer science, art history and psychology. In addition to metadata, they list nine hierarchical categories of information associated with an image: Perceptual primitives, Geometric primitives, Visual relationships, Visual extension, Semantic units, Contextual abstraction, Cultural abstraction, Technical abstraction and Emotional abstraction.
But how do humans judge the similarity of images and what types of image categories result from these subjective evaluations? Prior research has found that people evaluate image similarity based on the presence of people in the photographs, distinguishing e.g. between people, animals and inanimate objects, and on whether the scenes and objects in the images are man-made or natural, e.g. buildings vs. landscapes (Mojsilovic & Rogowitz 2001; Teeselink et al 2000). Other studies have concentrated on finding which types of image attributes could be used in creating efficient and effective image groupings for retrieval and browsing. Using photographs of people with the backgrounds removed, the participants of Rorissa and Hastings (2004) formed the following main categories: exercising, single men/women, working/busy, couples, poses, entertainment/fun, costume and facial expression. Vailaya et al (1998) used outdoor images which subjects sorted into the following categories: forests and farmlands, natural scenery and mountains, beach and water scenes, pathways, sunset/sunrise scenes, city scenes, bridges and city scenes with water, monuments, scenes of Washington DC, a mixed class of city and natural scenes, and face images. The above-mentioned studies indicate that people mostly evaluate the similarity of images at a high conceptual level, constructing semantic image categories. This is also the level mainly employed in image description tasks as shown by e.g. Jörgensen (1998) and Hollink et al (2004).
It has, however, been noted that people do not always form the above types of generic content categories. For example, image categories are also created based on abstract concepts related to emotions or atmosphere, cultural references and visual elements (Laine-Hernandez & Westman 2006). Sormunen et al (1999) found that journalists evaluate image similarity based on several criteria, including shooting distance and angle, colors, composition, cropping, direction (horizontal/vertical), background, direction of movement, objects in the image, number of people in the image, action, facial expressions and gestures, and abstract theme. The nature of these criteria varies from purely syntactic to highly abstract. And while Greisdorf and O'Connor (2002) report that in their study the most utilized categorical descriptions were animals, art, sports, nature, transportation and people, it is obvious from the complete list of categories that the subjects categorized images on various levels of detail, such as texture/material (e.g. plastic, metal), objects (e.g. horse, eye, vegetables), generic scene (e.g. urban landscapes, water scene), specific setting (e.g. California, Texas) and abstract concepts (e.g. love, death).
When constructing image categories, test participants have been found to evaluate similarity by considering overall image similarity across all dimensions rather than maximal similarity on one dimension (Greisdorf & O'Connor 2002). This means that the categorization may be based on using multiple similarity criteria at once. This was reflected in Jörgensen's (1995) categorization study, where a third of the image group names provided by participants consisted of multiple words. In these group names one attribute seemed to modify the “main” attribute, creating image groupings such as Fantasy Landscape and City Landscape. In a similar vein, Enser (1993) found while analyzing requests for visual content that over half (52%) of all topical requests were further refined in terms of time, location, action, event, or technical specification such as image orientation.
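The difference between overall similarity across all dimensions and maximal similarity on one dimension can be illustrated with a minimal sketch. The dimension names, the scores and the use of an unweighted mean as the "overall" rule are assumptions made only for illustration; they are not taken from Greisdorf and O'Connor (2002) or any other cited study.

```python
from statistics import mean

# Hypothetical similarity scores (0..1) between a reference image and two
# candidate images, given separately for three assumed dimensions.
candidates = {
    "A": {"color": 0.9, "objects": 0.2, "theme": 0.1},
    "B": {"color": 0.6, "objects": 0.6, "theme": 0.6},
}


def overall_similarity(scores):
    """Assumed 'overall' rule: unweighted mean over all dimensions."""
    return mean(scores.values())


def maximal_similarity(scores):
    """Alternative rule: the single best-matching dimension."""
    return max(scores.values())


def best_match(candidates, rule):
    """Return the name of the candidate that scores highest under `rule`."""
    return max(candidates, key=lambda name: rule(candidates[name]))


print(best_match(candidates, overall_similarity))
print(best_match(candidates, maximal_similarity))
```

With these invented scores the two rules pick different candidates: image B is moderately similar on every dimension and wins under the overall rule, while image A wins only when a single dimension (color) is allowed to decide.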
Based on these previous studies, one can conclude that photograph categories should not be restricted to general semantic categories (e.g. cityscape), but should also take into account the more affective and abstract concepts people use while describing image content (e.g. a photograph of a busy city street represents a hectic atmosphere or stress) as well as the visual impressions images make (e.g. associations from the colors, textures and orientations of buildings). Depending on the situation, these may reflect more accurately the mental strategies of the users sorting images or searching for them. The different hierarchical levels of image content also need to be considered in order to take advantage of theories of image semantics and to reflect the multiple categorization criteria which may be present in image category names and similarity evaluations by subjects.
Methodological approaches to image categorization experiments
Various procedures have been used in subjective image similarity experiments. Most subjective categorization studies have used some type of free sorting procedure (Jörgensen 1995; Laine-Hernandez & Westman 2006; Rorissa & Hastings 2004; Vailaya et al 1998). In some studies, the number of categories has been predefined (Teeselink et al 2000). Some researchers (Mojsilovic & Rogowitz 2001) have opted to anchor the categorization by placing a set of images onto the table by category and then asking subjects to categorize another set of images onto the same surface. In one study, images did not need to be placed into distinct categories but subjects could arrange them along two dimensions according to similarity (Rogowitz et al 1998). The same study also had subjects evaluate similarity by selecting images most similar to reference images shown, i.e. using a modified paired comparison procedure.
The above-mentioned techniques have yielded various types of results and important information about how people interpret and categorize images. However, the basis for selecting the test images was not always stated (e.g. Greisdorf & O'Connor 2002) or only a thematically and/or visually narrow set of images was used (e.g. Jörgensen 1995; Rorissa & Hastings 2004). Greisdorf and O'Connor (2002) comment on another shortcoming in subjective image categorization experiments present in their own study as well as other categorization studies (e.g. Laine-Hernandez & Westman 2006): the subjects were allowed to assign a particular image to only one category. If the subjects were allowed to assign images to several categories, they would not be forced to choose only the strongest criterion for categorization, but could express several categorization criteria at once. This is consistent with people's naturally multidimensional manner of evaluating images, be it from the viewpoint of content semantics, quality or something else. Offering the subjects the possibility to assign an image to several categories would yield more fine-tuned results, assuming that the task is not too demanding cognitively.
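One way to see why multiple category assignment could yield more fine-tuned results is to compare the similarity information recoverable from the two kinds of sorting data. The sketch below is only an illustration: the image identifiers and category labels are invented, and the Jaccard index is an assumed choice of overlap measure, not a method reported in the studies cited above.

```python
def jaccard(a, b):
    """Jaccard index of two category sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical sorting data for the same two images from one subject.
# Single assignment: each image may be placed in exactly one category.
single = {"img1": {"city"}, "img2": {"stress"}}
# Multiple assignment: each image may be placed in several categories.
multi = {"img1": {"city", "stress"}, "img2": {"stress", "people"}}

print(jaccard(single["img1"], single["img2"]))
print(jaccard(multi["img1"], multi["img2"]))
```

In the single-assignment data the two images share no category and appear entirely dissimilar, whereas the multiple-assignment data reveal a partial overlap (one shared category out of three used), which is the kind of graded information a forced single choice discards.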
In addition to image content, the functions of images (i.e. their purposes of use) are a central aspect in image retrieval, because the intended use of the image affects both selection strategies and results (Jaimes 2006). Image functions should also be considered from the viewpoint of image categorization, as the function or intended purpose of use is a possible categorization criterion. Several researchers have reviewed image functions based on e.g. studies on image-searching behavior (Conniss et al 2000) and informational and pedagogical functions of images in professional use (Pettersson 1998). These studies have shown that images may have several types of functions in different contexts:
Illustrative: representing what is being referred to, e.g. depicting situations, elaborating text
Informative: disseminating or processing information, e.g. conveying or organizing information
Pedagogical: uses in teaching, e.g. stimulating learning, facilitating understanding
Persuasive: influencing or controlling viewers, e.g. convincing or seducing them
Attention-related: roles in attentive processes, e.g. attracting and holding attention
Aesthetic: decorative purposes, e.g. appealing to the eye, adorning something
Affective: stimulating emotions, e.g. enhancing enjoyment, establishing a mood.
Pettersson (1998) also reports on a study where subjects evaluated the presumed intentions for utilizing visuals in newspapers, magazines and brochures. In half (51%) of the instances, the subjects felt the sender's intention was to induce receivers to take a stand for some person or issue. In this category, common functions were to sell products, services or a lifestyle, convey or create associations and convince viewers. In just under a third of the cases (30%), the subjects felt the sender was attempting to convey objective information. Here the top functions were to convey factual information, illustrate actual circumstances, document and instruct. In a few instances (11%), the interpreted intention was to induce receivers to take a stand against some person or issue. In the remaining instances, the subjects felt the sender was attempting to provide entertainment (5%) or that images were used as decoration (3%).