Multifaceted image similarity criteria as revealed by sorting tasks


Abstract

This paper reports a study on the types of image categories constructed from magazine photographs. A novel sorting procedure was tested with the aim of providing more data on image similarity and possible category overlap. Expert and non-expert participants were compared in their categorizations. The new similarity sorting procedure resulted in an average increase of 67%–111% in the similarity data gathered compared to basic free sorting. Categories were constructed on various levels of similarity: image Function, main visual content (People, Objects and Scene), conceptual content (Theme) and descriptors (Story, Affective, Description, Photography and Visual). Most categories were based on the theme and people portrayed in the photograph and, in the case of the expert subjects, on image function. Abstract and syntactic similarity criteria were also employed by the subjects. The categories created by each subject showed on average a 36%–53% overlap. Participants also demonstrated a tendency to use multiple similarity criteria simultaneously and to combine terms from different levels in a single category name. These results indicate a need for a multifaceted approach in image categorization.

Introduction

Image categorization is a central subject in the field of image retrieval, due to the need to index and represent meaningful groupings of images. Several user studies have aimed to discover the natural image categorization behavior and criteria people apply when evaluating image similarity. Previous image categorization studies are limited in that they focused on classifying images into single categories. This is the case despite the widespread acceptance of the need to categorize information and information objects on multiple facets (i.e. multifaceted categorization), instead of just one. Also, prior studies have used generic test materials and participant samples due to the aim of gaining knowledge about image similarity evaluations for use in content-based image retrieval systems. The tasks employed in previous studies have often demonstrated little connection to real life image tasks.

The aim of the present study was to find subjective image sorting criteria using a carefully selected image type, i.e. magazine photographs. This study also has a clear connection to an actual work context, journalism, where image categories are constructed for the purposes of archival, search and selection tasks (Westman & Oittinen 2006). Another goal was to test a novel categorization procedure which would enable multifaceted and overlapping categorization of images. Two subject groups, experts and non-experts on journalistic images, were compared in their categorizations.

Past research on image categorization

Image categorization in theory and practice

There exist several typologies and frameworks for image content which serve as a starting point when discussing image categorization. Some of these are born out of content analysis of image descriptions while others are more theoretical constructs on image semantics. In studies focused on image descriptions and the attributes of images, one commonly distinguishes between generic (pre-iconographical), specific (iconographical) and abstract/affective (iconological) image descriptions based on the work of Erwin Panofsky. Shatford (1986) categorized the subjects of images further as generic of, specific of and about based on Panofsky's levels of analysis. Here, ofness refers to the factual content of the image (“what is the image of?”) and aboutness refers to its expressional content (“what is the image about?”). Based on data from image description tasks Jörgensen (1998) distinguished between 12 image attribute classes which she classifies as perceptual (Object, People, Color, Visual elements, Location, Description), interpretive (People-related attributes, Art historical information, Content/story, Abstract concepts, External relationships), and reactive (Viewer response). Approaching the subject of image content from yet another viewpoint, Burford et al (2003) and Eakins et al (2004) proposed a taxonomy of image content based on literature in computer science, art history and psychology. In addition to metadata they list nine hierarchical categories of information associated with an image: Perceptual primitives, Geometric primitives, Visual relationships, Visual extension, Semantic units, Contextual abstraction, Cultural abstraction, Technical abstraction and Emotional abstraction.

But how do humans judge the similarity of images and what types of image categories result from these subjective evaluations? Prior research has discovered that people evaluate image similarity based on the presence of people in the photographs, distinguishing e.g. between people, animals and inanimate objects, as well as according to whether the scenes and objects in the images are man-made or natural, e.g. buildings vs. landscapes (Mojsilovic & Rogowitz 2001; Teeselink et al 2000). Other studies have concentrated on finding which types of image attributes could be used in creating efficient and effective image groupings for retrieval and browsing. Using photographs of people with the backgrounds removed, the participants of Rorissa and Hastings (2004) formed the following main categories: exercising, single men/women, working/busy, couples, poses, entertainment/fun, costume and facial expression. Vailaya et al (1998) used outdoor images which subjects sorted into the following categories: forests and farmlands, natural scenery and mountains, beach and water scenes, pathways, sunset/sunrise scenes, city scenes, bridges and city scenes with water, monuments, scenes of Washington DC, a mixed class of city and natural scenes, and face images. The above-mentioned studies indicate that people mostly evaluate the similarity of images at a high conceptual level, constructing semantic image categories. This is also the level mainly employed in image description tasks as shown by e.g. Jörgensen (1998) and Hollink et al (2004).

It has, however, been noted that people do not always form the above types of generic content categories. For example, image categories are also created based on abstract concepts related to emotions or atmosphere, cultural references and visual elements (Laine-Hernandez & Westman 2006). Sormunen et al. (1999) found that journalists evaluate image similarity based on several criteria, including shooting distance and angle, colors, composition, cropping, direction (horizontal/vertical), background, direction of movement, objects in the image, number of people in the image, action, facial expressions and gestures, and abstract theme. The nature of these criteria varies from purely syntactic to highly abstract. And while Greisdorf and O'Connor (2002) report that in their study the most utilized categorical descriptions were animals, art, sports, nature, transportation and people it is obvious from the complete list of categories that the subjects categorized images on various levels of detail, such as texture/material (e.g. plastic, metal), objects (e.g. horse, eye, vegetables), generic scene (e.g. urban landscapes, water scene), specific setting (e.g. California, Texas) and abstract concepts (e.g. love, death).

When constructing image categories, test participants have been found to evaluate similarity by considering overall image similarity across all dimensions rather than maximal similarity on one dimension (Greisdorf & O'Connor 2002). This means that the categorization may be based on using multiple similarity criteria at once. This was reflected in Jörgensen's (1995) categorization study, where a third of the image group names provided by participants consisted of multiple words. In these group names one attribute seemed to modify the “main” attribute, creating image groupings such as Fantasy Landscape and City Landscape. In a similar vein, Enser (1993) found while analyzing requests for visual content that over half (52%) of all topical requests were further refined in terms of time, location, action, event, or technical specification such as image orientation.

Based on these previous studies, one can conclude that photograph categories should not be restricted to general semantic categories (e.g. cityscape), but should also take into account the more affective and abstract concepts people use while describing image content (e.g. a photograph of a busy city street represents a hectic atmosphere or stress) as well as the visual impressions images make (e.g. associations from the colors, textures and orientations of buildings). Depending on the situation, these may reflect more accurately the mental strategies of the users sorting images or searching for them. The different hierarchical levels of image content also need to be considered in order to take advantage of theories of image semantics and to reflect the multiple categorization criteria which may be present in image category names and similarity evaluations by subjects.

Methodological approaches to image categorization experiments

Various procedures have been used in subjective image similarity experiments. Most subjective categorization studies have used some type of free sorting procedure (Jörgensen 1995; Laine-Hernandez & Westman 2006; Rorissa & Hastings 2004; Vailaya et al 1998). In some studies, the number of categories has been predefined (Teeselink et al 2000). Some researchers (Mojsilovic & Rogowitz 2001) have opted to anchor the categorization by placing a set of images onto the table by category and then asking subjects to categorize another set of images onto the same surface. In one study, images did not need to be placed into distinct categories but subjects could arrange them along two dimensions according to similarity (Rogowitz et al 1998). The same study also had subjects evaluate similarity by selecting images most similar to reference images shown, i.e. using a modified paired comparison procedure.

The above-mentioned techniques have yielded various types of results and important information about how people interpret and categorize images. However, the basis for selecting the test images was not always stated (e.g. Greisdorf & O'Connor 2002) or only a thematically and/or visually narrow set of images was used (e.g. Jörgensen 1995; Rorissa & Hastings 2004). Greisdorf and O'Connor (2002) comment on another shortcoming in subjective image categorization experiments present in their own study as well as other categorization studies (e.g. Laine-Hernandez & Westman 2006): the subjects were allowed to assign a particular image to only one category. If the subjects were allowed to assign images into several categories, they would not be forced to choose only the strongest criteria for categorization, but express several categorization criteria at once. This is consistent with people's naturally multidimensional manner of evaluating images, be it from the viewpoint of content semantics, quality or something else. Offering the subjects the possibility to assign an image to several categories would yield more fine-tuned results, assuming that the task is not too demanding cognitively.

Image functions

In addition to image content, the functions of images (i.e. their purposes of use) are a central aspect in image retrieval, because the intended use of the image affects both selection strategies and results (Jaimes 2006). Image functions should also be considered from the viewpoint of image categorization, as the function or intended purpose of use is a possible categorization criterion. Several researchers have reviewed image functions based on e.g. studies on image-searching behavior (Conniss et al 2000) and informational and pedagogical functions of images in professional use (Pettersson 1998). These studies have shown that images may have several types of functions in different contexts:

  • Illustrative—representing what is being referred to, e.g. depict situations, elaborate text

  • Informative—disseminating or processing information, e.g. convey or organize information

  • Pedagogical—uses in teaching, e.g. stimulating learning, facilitating understanding

  • Persuasive—influencing or controlling viewers, e.g. convincing or seducing them

  • Attention-related—roles in attentive processes, e.g. attracting and holding attention

  • Aesthetic—decorative purposes, e.g. appealing to the eye, adorning something

  • Affective—stimulating emotions, e.g. enhancing enjoyment, establishing a mood.

Pettersson (1998) also reports on a study where subjects evaluated the presumed intentions for utilizing visuals in newspapers, magazines and brochures. In half (51%) of the instances, the subjects felt the sender's intention was to induce receivers to take a stand for some person or issue. In this category common functions were to sell products, services or a lifestyle, convey or create associations and convince viewers. In about a third of the cases (30%), the subjects felt the sender was attempting to convey objective information. The top functions were to convey factual information, illustrate actual circumstances, document and instruct. In a few instances (11%), the interpreted intention was to induce receivers to take a stand against some person or issue. In some instances, the subjects felt the sender was attempting to provide entertainment (5%) or that images were used as decoration (3%).

Methodology

We conducted an empirical study on image categorization in order to discover which image attributes are employed in the categorization of magazine images. In addition to content attributes we wanted to see if image functions are reflected in categorization. We also tested a novel two-phase categorization procedure aimed at providing more information about image similarity and at showing whether the categories overlap. The procedure was based on the principles of free sorting (no constraints on the number of categories or the time taken for categorization) but allowed for multiple categorizations of a single image. We also set out to analyze the relationships between the categories with hierarchical clustering and a qualitative analysis of the category names. The following research questions were posed: On what level does similarity evaluation of magazine photographs occur (e.g. function, syntax, semantics)? Are photographs categorized according to multiple criteria or into overlapping categories? Do non-expert and expert participants categorize magazine photographs in the same way?

Material

The test material consisted of photographs from five Finnish magazines of different genres. The magazines were chosen based on their wide circulation as well as varied photographic content and photojournalistic style. We used the issues sold during week 7 in 2006. The genres and typical photographic content of the magazines were as follows:

  • Women's magazine with portraits, product photographs, food, interiors and scenery.

  • General weekly magazine, predominantly photographs of people.

  • Economy magazine with people, product and other object photographs.

  • Travel magazine with scenery, interiors, portraits and object photographs.

  • Magazine for individuals, topics ranging from pop culture to politics. Various types of portraits, scenery and artistic photographs. Emphasis on aesthetically high visual quality.

All the editorial photographs (i.e. excluding advertisements) in the magazines were numbered. We excluded photographs that had text written or graphics drawn over them covering more than approximately 20% of the photograph, photographs which spanned over two pages, or were smaller than 4 cm in any dimension. Twenty photographs were chosen at random from the numbered photographs of each magazine, resulting in a total of 100 test photographs. No two photographs were chosen from the same page. The selected photographs were cut out and glued onto grey cardboard. If the photograph was partially covered by text or graphics, this additional element was hidden by covering it with black ink. The photographs were then numbered in random order.

Participants and procedure

The participants (n=30) were divided into two groups, non-expert and expert, based on their knowledge and experience of image categorization. The 18 non-expert participants (8 female, 10 male) were students of technology or university employees. Their average age was 24. The 12 expert subjects (11 female, 1 male) were staff members at magazines (4), newspapers (2), picture agencies (2) and museum photograph archives (4). Their average age was 44.

The experimental procedure was the same for both subject groups and consisted of two phases. In phase 1 all the photographs were handed to the subjects in a randomly ordered pile. The subjects were instructed to go through the photographs and sort them into an unrestricted number of piles according to their similarity so that photographs similar to each other would be in the same pile. They were told to decide on their own the basis for evaluating similarity. There was no time limit for the task. At the end of phase 1, the subjects were asked to describe the similarity element in each pile, i.e. to name the piles. The experimenter wrote down these category names. The photographs were then removed from the categories, placed into a single pile and shuffled.

In phase 2 the subjects were shown the category list they had generated during phase 1. The subjects were asked to go through the photographs again one at a time and to write the number of each photograph next to every category name in the list to which the photograph could belong. Each photograph could be assigned to one or more categories. The subjects were told they need not remember how they categorized the photographs in phase 1. Again, there was no time limit. The purpose of the procedure was to provide some flexibility for the categorization and to enable, even if in a limited manner, the categorization of a single photograph in multiple ways.

Data analysis

Data were analyzed both quantitatively and qualitatively. The category names given by the subjects were analyzed qualitatively using grounded theory methodology. If the category name included more than one term and thus possibly more than one basis for categorization, two (or more) instances were created from the category name. For example, in the case of a category named successful women the instances successful and women were separated. All of the resulting instances were grouped iteratively by placing similar instances together. For example, the following category names were placed together: cars, vehicles, means of transportation, photographs of cars, means of transport and transportation, passenger car. Gradually top-level classes (e.g. Object) containing identifiable sub-level classes (e.g. Vehicles) emerged from the data. The data were then coded according to these classes. The reader should note that in this paper, the terms category and class carry distinct meanings. Category refers to a photograph group created by a subject in the experiment. Class refers to an instance of category names coded according to the class structure. A single category may include references to multiple classes. For example, the category successful women references both classes Description-Property and People-Gender. Categories which in their name include more than one term and thus may reference more than one class are referred to as multi-class categories.

The reported statistically significant differences are based on two-tailed t-tests with unequal variances or, when specified, chi-square tests on frequency data. The categorization data were analyzed using hierarchical cluster analysis in Matlab based on the pairwise occurrences of photographs in the same categories. Due to the procedure in phase 2, no common similarity measure such as percent overlap or a co-occurrence measure could be used, as these could have resulted in distances smaller than 0 in cases where two photographs co-occurred in several groups. Therefore the similarity of two photographs was calculated using a modified percent overlap measure. The similarity P of two photographs i and j (for each pair of photographs i and j) was the ratio of the number of common placements of both i and j in the same category, p(i, j), to the total number of placements of i, p(i), where the total number of placements of i was smaller than or equal to the total number of placements of j. The formula for P thus became P = p(i, j)/p(i), where p(i) ≤ p(j). The modified percent overlap gave a measure of similarity, which was converted to a measure of dissimilarity or distance D = 1 − P.
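As an illustration only, the modified percent overlap measure described above can be sketched in Python. The analysis in the study was carried out in Matlab; the function name, the input layout (a flat list of categories, each a set of photograph numbers, such as the pooled phase 2 categories of one subject group) and the example data are assumptions made for this sketch.

```python
from itertools import combinations


def modified_percent_overlap(categories, photos):
    """Return {(i, j): (P, D)} for every unordered pair of photographs.

    categories: list of sets of photograph ids; a photograph may appear
    in several sets (assumed layout, not taken from the study).
    """
    # p(i): total number of placements of photograph i.
    placements = {i: sum(i in cat for cat in categories) for i in photos}
    result = {}
    for i, j in combinations(photos, 2):
        # p(i, j): number of categories containing both photographs.
        common = sum(i in cat and j in cat for cat in categories)
        # Dividing by the smaller placement count keeps P within [0, 1].
        denom = min(placements[i], placements[j])
        similarity = common / denom if denom else 0.0
        result[(i, j)] = (similarity, 1.0 - similarity)
    return result


# Hypothetical data: three categories over four photographs.
categories = [{1, 2, 3}, {2, 4}, {3, 4}]
pairs = modified_percent_overlap(categories, photos=[1, 2, 3, 4])
print(pairs[(2, 3)])  # photographs 2 and 3 share one of their two placements
```

Because the denominator is the smaller of the two placement counts, the distance D = 1 − P cannot fall below 0 even when two photographs co-occur in several categories.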

Results

Time spent and categories created

Non-experts created on average 17 categories while experts created an average of 12 categories. This difference was statistically significant (p<0.05). The number of categories created by individual subjects varied widely, as can be seen from Table 1. The categories of non-experts contained between 1 and 46 images (mean=6.0, std=6.8) and those of experts between 1 and 48 images (mean=8.3, std=9.0). Altogether 145 categories were collected from the 12 experts and 298 categories from the 18 non-experts. Non-experts seemed to take longer to complete the experiment but the difference was not statistically significant (p=0.08).

Table 1. Number of categories in the two subject groups and time used in the experiment
Subject group    No. of categories           Time [min]
                 min   max   mean   std      min   max   mean   std
Non-expert         4    35     17     7       45   139     73    25
Expert             5    20     12     4       41    85     61    12

The number of category placements (minimum of 100) in phase 2 of the experiment is shown in Table 2. In phase 2, non-expert subjects averaged 153 photograph placements while experts placed the 100 photographs into categories 136 times on average. The difference was not statistically significant (p = 0.17). A single image was placed into 1 to 6 categories. For non-experts, in 59% of cases the photograph was placed into a single category, in 31% of cases into two, in 9% into three and in 1% into four or more. For experts, in 73% of cases there was a single placement, in 22% two, in 4% three and in 1% four or more placements. The test photographs averaged between 1 and 2.17 placements per photograph across all test subjects.

Table 2. Number of category placements in phase 2
Subject group    No. of category placements in phase 2
                 min   max   mean   std
Non-expert       102   224    153    37
Expert           103   208    136    30

Types of categories

The non-expert participants often formed categories on various semantic levels instead of a controlled categorization on a single level. They named objects present in the images (e.g. in the category cars or in the category statues) or described the scene portrayed in the image (e.g. street photographs, European cities). Photographs of people often prompted categories based on gender, social status (e.g. politicians, public figures), and whether or not the people were posing for the photograph. Also the context in which a person was photographed (e.g. work environment, in the city) was used as a categorization criterion. Non-expert subjects also created categories based on the emotional impact images had on them or the mood they interpreted from the image (e.g. neutral images of people, casual atmosphere).

Experts typically formed thematic categories such as culture, travel, fashion, food and drink, symbols, technology, work, sports and transportation (vs. the category “cars” often formed by non-expert participants). Categories were created based on the objects appearing in the photographs (e.g. several objects, signs). Photographs of people were further categorized according to various properties such as number of people, social status (e.g. youth/students), relationships (e.g. partnership) and what the person represented (e.g. ordinary people). The experts often created categories reflecting on the photographs' purpose of use (e.g. product photographs, reportage photographs).

The vast majority of categories by both groups were formed based on semantic image content. However, syntactic similarity criteria expressed through photographic vocabulary were present in some category names, with references to e.g. image size, distance to the subject, black and white photographs, cropping and separation of foreground and background.

The category names provided by the subjects were analyzed in order to study the sorting criteria employed. From the 443 categories collected, 586 class instances (409 non-experts; 177 experts) were created and analyzed. Each instance refers to a single term/categorization criterion. Three categories were left out of the content analysis due to their ambiguous nature. The percentage distributions of the top- and sub-level classes of image similarity criteria found appear in Table 3.

Table 3. Percentage distribution of classes in category names provided by subjects

A chi square test was conducted to find out if there was a statistically significant difference as to what types of categories the two subject groups constructed. The expected frequencies at the sub-level classes were too low for the requirements of the analysis but at the level of top-level classes the difference was significant (p<0.001). Figure 1 illustrates the use of the top-level classes by the two subject groups. Experts created more categories referring to the Function of the photograph (20% vs. 3.2%) while non-experts employed the People, Object and Scene levels slightly more (25% vs. 24%; 12% vs. 11% and 10% vs. 6.8%). The class Story was used nearly three times as much by the non-expert group as by the experts (7.6% vs. 2.8%). Non-expert subjects used all of the descriptive classes (Description, Visual, and Photography) more than experts in their category names.

Figure 1. Percentage distribution of top-level classes in categories created by subject group (p<0.001, chi square=54.88, df=9)
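The chi square comparison of the two subject groups can be illustrated with a short Python sketch of Pearson's test of independence. The counts are the class-instance frequencies given for all categories in Table 5; the function name is hypothetical, and the sketch is not the authors' own analysis code. Under these assumptions the statistic should come out close to the value reported in the Figure 1 caption.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            # Expected count under independence of group and class.
            expected = row_totals[r] * col_totals[c] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Order: Function, People, Object, Scene, Theme, Story, Affective,
# Description, Visual, Photography ("All categories" columns of Table 5).
non_expert = [13, 103, 50, 42, 91, 31, 17, 31, 8, 23]
expert = [35, 42, 19, 12, 44, 5, 2, 9, 1, 8]

statistic, dof = chi_square_independence([non_expert, expert])
print(round(statistic, 2), dof)
```

Most of the statistic comes from the Function class, which matches the observation that experts referred to image function far more often than non-experts.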

Clustering results

Hierarchical clustering was conducted on the categorization data from phase 2 using the average-linkage method also employed by Rorissa and Hastings (2004). It produced better results on the current data sets than the complete-linkage method used by Laine-Hernandez and Westman (2006), Lohse et al (1990), Teeselink et al (2000), and Vailaya et al (1998). The quality of the solution was evaluated with the cophenetic correlation coefficient, which should be close to 1 for a high-quality solution. The coefficient value was 0.86 for non-experts and 0.81 for experts.
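To make the evaluation step concrete, the following pure-Python sketch runs a naive average-linkage clustering on a small distance table and computes the cophenetic correlation coefficient, i.e. the Pearson correlation between the original distances and the heights at which each pair of items is first merged. It is a simplified illustration rather than the Matlab procedure used in the study; the function names and the example distances are assumptions.

```python
from itertools import combinations
from math import sqrt


def average_linkage_cophenetic(dist):
    """Run average-linkage clustering and return the cophenetic distances.

    dist maps frozenset({i, j}) to the distance between items i and j.
    """
    items = sorted({item for pair in dist for item in pair})
    clusters = [frozenset([item]) for item in items]
    cophenetic = {}

    def linkage(a, b):
        # Average linkage: mean distance over all cross-cluster item pairs.
        total = sum(dist[frozenset((i, j))] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda ab: linkage(*ab))
        height = linkage(a, b)
        # Every pair first joined by this merge gets the merge height.
        for i in a:
            for j in b:
                cophenetic[frozenset((i, j))] = height
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return cophenetic


def cophenetic_correlation(dist):
    """Pearson correlation between original and cophenetic distances."""
    coph = average_linkage_cophenetic(dist)
    pairs = list(dist)
    xs = [dist[p] for p in pairs]
    ys = [coph[p] for p in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical distances between four photographs.
example = {
    frozenset((1, 2)): 0.1, frozenset((3, 4)): 0.2,
    frozenset((1, 3)): 0.8, frozenset((1, 4)): 0.9,
    frozenset((2, 3)): 0.7, frozenset((2, 4)): 0.8,
}
coefficient = cophenetic_correlation(example)
print(round(coefficient, 2))
```

A coefficient near 1 indicates that the tree preserves the original pairwise distances well, which is the sense in which the values 0.86 and 0.81 are read above.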

The dendrograms in Figures 2 and 3 have been drawn to show 17 nodes for non-experts and 12 for experts, corresponding to the mean number of categories constructed in each group. The node labels have been extracted from the category names provided by the subjects. Nodes in italics only contain a single image. Jaccard's coefficient was used to measure whether the groups categorized the images in a similar manner. Lohse et al (1994) and Rorissa and Hastings (2004) have used it to test the consistency of subjects in categorization tasks. The coefficient value for phase 1 categorization data was 0.623 and for phase 2 data it was 0.809. It may be concluded that there was similarity between the groups as to which images were categorized together.
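The text does not spell out how Jaccard's coefficient was computed. One common way of comparing two groupings of the same items, assumed here purely for illustration, is to count the pairs of photographs placed together in both groupings and divide by the number of pairs placed together in at least one of them. The function names, labels and data below are hypothetical.

```python
from itertools import combinations


def co_clustered_pairs(labels):
    """Set of photograph pairs that share a cluster label."""
    return {
        (i, j)
        for i, j in combinations(sorted(labels), 2)
        if labels[i] == labels[j]
    }


def jaccard(labels_a, labels_b):
    """Pairwise Jaccard coefficient between two groupings of the same items."""
    pairs_a = co_clustered_pairs(labels_a)
    pairs_b = co_clustered_pairs(labels_b)
    union = pairs_a | pairs_b
    # Two groupings with no co-clustered pairs at all are treated as identical.
    return len(pairs_a & pairs_b) / len(union) if union else 1.0


# Hypothetical cluster labels for five photographs in each subject group.
non_expert_clusters = {1: "people", 2: "people", 3: "people", 4: "scenery", 5: "scenery"}
expert_clusters = {1: "portraits", 2: "portraits", 3: "travel", 4: "travel", 5: "travel"}

agreement = jaccard(non_expert_clusters, expert_clusters)
print(round(agreement, 3))
```

The coefficient ranges from 0 (no pair is grouped together by both groups) to 1 (identical groupings), so higher values mean the two groups agreed more on which photographs belong together.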

Figure 2. A dendrogram of the photographs for the categorization data of the non-expert group.

Figure 3. A dendrogram of the photographs for the categorization data of the expert group.

Multi-class categories

Out of the 443 categories collected from the participants, 131 categories included more than one term and thus received multiple class codes. The share of these multi-class categories out of all categories formed was 33.8% for non-experts and 21.5% for experts. That is, one fifth to one third of all category names included more than one term and possibly more than one basis for categorization. The difference between the two subject groups was statistically significant (p<0.01, chi square=6.95, df=1). The 131 category names were divided into 277 class instances. On average, these multi-class categories consisted of 2.13 terms/classes (non-experts) and 2.06 terms/classes (experts).

The emergence of multi-class categories enabled analysis on the types of similarity criteria combined in creating a category. Most multi-class categories (70%) included the class People. It was most often combined with Description (e.g. category images with more than one person) and Photography (black and white images of children). Over 19% of multi-class categories included the class Theme combined with e.g. People (people in a studio/fashion). The class Story was used in 18% of multi-class categories, most often combined with the class People (shots of groups in action). Nearly 17% of multi-class categories included the class Scene, again most often combined with People (people in an urban environment). The class Object was included in 13% of multi-class categories, most commonly combined with Description (masculine objects). The class Description was used to modify 24% of multi-class categories and Photography was used in 18% of them.

Table 4 shows the frequency of each top-level class in the multi-class categories (i.e. how many times a class was referenced in the 131 multi-class category names) and how each class was combined with other classes by the subjects to create a multi-class category. The proportion of class references in multi-class categories to all class references varied greatly for the ten top-level classes. This reflects the different natures of the classes, as some appeared to be standalone descriptors of image categories and others mostly modified other classes. Almost 72% of all mentions of the class People occurred in multi-class categories. Theme was mostly used on its own as a single-term name (only 19% of its mentions occurred in multi-class categories) while descriptive classes such as Photography and Description were predominantly used in multi-class categories (81% and 80%). The total number of combinations in which a certain class was used may be larger than the frequency of the class since a single class could be combined with multiple other classes in one category name. The number of combinations may also appear smaller than the frequency since any combination with another instance of the same class (e.g. People-People) was counted as a single combination only.

Table 4. Combining classes in multiclass categories
Class          n     Combined with other classes (no. of times)
Function       15    People(6), Theme(4), Object(2), Scene(2), Description(1), Story(1), Photography(1)
People         104   Description(22), Photography(16), Story(15), Theme(12), People(12), Scene(11), Function(6), Affective(4), Object(3), Visual(1)
Object         17    Description(6), Scene(4), People(3), Function(2), Affective(1), Visual(1), Photography(1)
Scene          22    People(11), Object(4), Function(2), Theme(2), Story(1), Visual(1), Photography(1)
Theme          26    People(12), Function(4), Story(3), Scene(2), Photography(2), Theme(1), Visual(1)
Story          24    People(15), Theme(3), Photography(2), Function(1), Scene(1), Visual(1)
Affective      6     People(4), Object(1), Description(1)
Description    32    People(22), Object(6), Description(1), Function(1), Affective(1), Visual(1)
Visual         6     People(1), Object(1), Scene(1), Theme(1), Story(1), Description(1)
Photography    25    People(16), Theme(2), Story(2), Photography(2), Function(1), Object(1), Scene(1)

The use of classes in multi-class category names differs from that found overall in category names. The differences illustrated in Table 5 were statistically significant for both non-experts and experts. The difference in the classes People and Theme is particularly noteworthy.

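Table 5. Frequency and share of top-level classes in all categories and in multi-class categories

| Class | Non-expert, all categories: n | % | Non-expert, multi-class categories: n | % | Expert, all categories: n | % | Expert, multi-class categories: n | % |
|---|---|---|---|---|---|---|---|---|
| Function | 13 | 3.2 | 8 | 3.8 | 35 | 19.8 | 7 | 10.9 |
| People | 103 | 25.2 | 79 | 37.1 | 42 | 23.7 | 25 | 39.1 |
| Object | 50 | 12.2 | 11 | 5.2 | 19 | 10.7 | 6 | 9.4 |
| Scene | 42 | 10.3 | 18 | 8.5 | 12 | 6.8 | 4 | 6.3 |
| Theme | 91 | 22.3 | 20 | 9.4 | 44 | 24.9 | 6 | 9.4 |
| Story | 31 | 7.6 | 23 | 10.8 | 5 | 2.8 | 1 | 1.6 |
| Affective | 17 | 4.2 | 6 | 2.8 | 2 | 1.1 | 0 | 0.0 |
| Description | 31 | 7.6 | 23 | 10.8 | 9 | 5.1 | 9 | 14.1 |
| Visual | 8 | 2.0 | 5 | 2.4 | 1 | 0.6 | 1 | 1.6 |
| Photography | 23 | 5.6 | 20 | 9.4 | 8 | 4.5 | 5 | 7.8 |

Non-experts: p<0.001, chi square=34.41, df=9. Experts: p<0.05, chi square=19.22, df=9.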
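
The chi-square comparison reported with Table 5 can be sketched as follows. This is an illustration under an assumption, not the authors' documented procedure: it treats the "all categories" and "multi-class categories" counts of each group as the two rows of a 2 x 10 contingency table and computes the Pearson statistic. The function name `chi_square_2xk` is hypothetical.

```python
def chi_square_2xk(row_a, row_b):
    """Pearson chi-square statistic and degrees of freedom for a 2 x k table."""
    total_a = sum(row_a)
    total_b = sum(row_b)
    grand_total = total_a + total_b
    statistic = 0.0
    for a, b in zip(row_a, row_b):
        column_total = a + b
        for observed, row_total in ((a, total_a), (b, total_b)):
            # Expected count if class use were the same in both rows.
            expected = row_total * column_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    # (rows - 1) * (columns - 1) with two rows reduces to k - 1.
    return statistic, len(row_a) - 1


# Counts from Table 5, in the order Function, People, Object, Scene, Theme,
# Story, Affective, Description, Visual, Photography.
nonexpert_all = [13, 103, 50, 42, 91, 31, 17, 31, 8, 23]
nonexpert_multi = [8, 79, 11, 18, 20, 23, 6, 23, 5, 20]
expert_all = [35, 42, 19, 12, 44, 5, 2, 9, 1, 8]
expert_multi = [7, 25, 6, 4, 6, 1, 0, 9, 1, 5]

nonexpert_stat, nonexpert_df = chi_square_2xk(nonexpert_all, nonexpert_multi)
expert_stat, expert_df = chi_square_2xk(expert_all, expert_multi)
print(f"non-experts: chi square = {nonexpert_stat:.2f}, df = {nonexpert_df}")
print(f"experts:     chi square = {expert_stat:.2f}, df = {expert_df}")
```

With the Table 5 counts this setup gives roughly 34.5 for non-experts and 19.2 for experts, close to the reported values, which suggests but does not confirm that a test of this form was used. A p-value would additionally require the chi-square distribution with 9 degrees of freedom, for instance from a statistics library.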

Discussion

The novel categorization procedure functioned well. Participants were able to follow the instructions and complete the tasks, resulting in a 67%–111% increase in the data gathered on image similarity and on the classes to which photographs could belong. First, participants could name their categories according to multiple classification criteria (multi-class categories). On average, categories constructed by non-experts contained 1.38 classes and those constructed by experts 1.23 classes. Second, participants could assign a photograph to multiple categories in phase 2. Non-experts placed an image into an average of 1.53 categories, while experts used 1.36 categories per photograph. Multiplying these figures (the number of classes per category times the number of category placements) means that one image belonged to an average of 2.11 classes in the non-expert group and 1.67 classes in the expert group. One possible alternative to the categorization procedure used here would be a multiple sort method, in which subjects are asked to sort on different bases of categorization instead of performing a single sort, so that they provide several different categorization outputs. With 100 test images, however, the actual sorting might take too long. Further development and testing of the procedure described here is needed in order to verify its usability and the results obtained.
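
The arithmetic behind these figures can be written out as a short sketch. It assumes that basic free sorting yields exactly one class per image (one category with a single-term name), which is how the percentage increase is interpreted here; the dictionary keys and function names are illustrative.

```python
# Reported averages per participant group.
groups = {
    "non-expert": {"classes_per_category": 1.38, "categories_per_image": 1.53},
    "expert": {"classes_per_category": 1.23, "categories_per_image": 1.36},
}


def classes_per_image(group: dict) -> float:
    """Average number of classes one image belongs to."""
    return group["classes_per_category"] * group["categories_per_image"]


def increase_over_free_sort(group: dict) -> float:
    """Percentage increase over an assumed baseline of one class per image."""
    return (classes_per_image(group) - 1.0) * 100.0


for name, group in groups.items():
    print(
        f"{name}: {classes_per_image(group):.2f} classes per image, "
        f"{increase_over_free_sort(group):.0f}% more than the baseline"
    )
```

Under the one-class baseline assumption, the two products correspond to the lower (expert) and upper (non-expert) ends of the reported increase in similarity data.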

Non-experts categorized the 100 test photographs into a larger number of categories than experts (17 vs. 12 categories) and their categories overlapped more (53% vs. 36% overlap). Clustering results show mainly general semantic content clusters, which is in line with previous results on image categorization (Mojsilovic & Rogowitz 2001; Rorissa & Hastings 2004; Teeselink et al. 2000). However, in several of the named nodes the similarity criteria are abstract. The results from experts show a functional cluster named symbolic, containing photographs to be used as symbolic illustrations of a topic (a key; two sets of hands on an office table; a technical close-up from a factory). The dendrograms for both non-expert and expert subjects include a node containing fictional images (e.g. shots from a movie scene). These were recognized based on their visual appearance and were clearly separated from real-life scenes by most subjects. The results from the non-expert group also show syntactical clusters, as black and white images were separated from color photographs in two cases. The number of people present in an image and their cultural/ethnic background were also dividing factors between clusters. The division between portraits of a single person and images of groups seemed particularly clear.

The category names (i.e. the sorting criteria that the participants chose to communicate) provided by subjects were of several types. They referred to the Function of the images, their main visual content (e.g. People, Objects and Scene recognized), conceptual content (e.g. Theme interpreted) and descriptors (Story, Affective, Description, Photography and Visual). The descriptors were mainly used in conjunction with other classes, in most cases People and Object. Roughly 50% of all class instances analyzed from the category names given by both non-experts and experts referred to People or Theme. These were thus the prevalent facets on which magazine images were categorized. Non-experts categorized images on the level of scene and story more than experts, who in turn created more functional categories. The finding that the Story class was more pronounced in the non-expert group is interesting, since descriptions of image content often include a story connected with the image (Jörgensen 1995; Laine-Hernandez & Westman 2006). Our results suggest that, when categorizing images, expert subjects are more prone to summarizing these story aspects (e.g. time, activity) into thematic descriptions.

The possible intended uses and functions of the image in publishing were mentioned in nearly 20% of the instances from categories created by experts, and were also employed by the non-experts. The functions of images included illustrative (e.g. illustrations about things, objects, places), informative (e.g. documentary images) and persuasive functions (e.g. product images, function to sell), which have also been reported in the literature. Many of the individual functions (e.g. profile piece) seemed akin to image genres, which are important in image journalism in that shoot, search and selection tasks are often typified through these. Also, some of our Function classes are similar to the uses of Type among the Art historical attributes of Jörgensen (1995). This type of functional categorization was based on the idea that images of a particular type function similarly when used, rather than on their subject matter necessarily being similar.

The categories which subjects created were not mutually exclusive. This was shown by the 35%–53% overlap in category placements in phase 2. Images depicting people in action in rich contexts (e.g. foreign cultures, work environment) in particular received multiple placements from many subjects. Subjects also explicitly combined several terms in 22%–34% of all category names, thus creating multi-class categories. Participants used these multifaceted category names to distinguish between two of their groups on a sublevel (e.g. women - posing vs. women - at work), but such names could also result from the participant combining images matching several alternative criteria into a joint category (e.g. work, workplace and worker) or simply describing the similarity elements with multiple terms (e.g. successful women). Jörgensen (1995) found that one third of the group names were composed of multiple terms. She recorded an average of 1.51 codes (classes) per group name (category). Our figures were 1.38 and 1.23 classes for non-experts and experts, respectively.

The category names composed of multiple terms differed from category names overall. For example, although the class Theme accounted for roughly 23% of all class instances, it accounted for only 9% of the terms present in multi-class category names. Other classes became more pronounced in the multi-class categories. The class People accounted for roughly 25% of all coded terms but its share grew to roughly 38% in multi-class categories. Multi-class category names were in fact most frequently constructed when naming categories of photographs of people. Almost three out of four People categories were further typified with some description of the person(s), a photographical attribute, the story or theme of the photograph, or another people-related attribute. Some of the classes identified seem to be descriptive in nature and were used to modify other classes. Between 67% and 80% of all terms in the classes Story, Visual and Photography (in addition to Description) were found in multi-class category names, that is, in combination with another class.
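
The pooled shares quoted in this paragraph for People and Theme can be checked against Table 5. The sketch below assumes that "all coded terms" pools non-experts and experts; the names `all_counts`, `multi_counts` and `pooled_share` are illustrative.

```python
# Table 5 counts as (non-expert, expert) pairs for every top-level class.
all_counts = {
    "Function": (13, 35), "People": (103, 42), "Object": (50, 19),
    "Scene": (42, 12), "Theme": (91, 44), "Story": (31, 5),
    "Affective": (17, 2), "Description": (31, 9), "Visual": (8, 1),
    "Photography": (23, 8),
}
multi_counts = {
    "Function": (8, 7), "People": (79, 25), "Object": (11, 6),
    "Scene": (18, 4), "Theme": (20, 6), "Story": (23, 1),
    "Affective": (6, 0), "Description": (23, 9), "Visual": (5, 1),
    "Photography": (20, 5),
}


def pooled_share(counts: dict, cls: str) -> float:
    """Share (%) of `cls` among all class instances, pooled over both groups."""
    total = sum(sum(pair) for pair in counts.values())
    return 100.0 * sum(counts[cls]) / total


for cls in ("People", "Theme"):
    print(
        f"{cls}: {pooled_share(all_counts, cls):.1f}% of all class instances, "
        f"{pooled_share(multi_counts, cls):.1f}% in multi-class names"
    )
```

Pooling the groups in this way is only one reading of the text, but it yields shares that round to the percentages quoted for both classes.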

Conclusions

As Mojsilovic and Rogowitz (2001) note, the image categories created in a single experiment are not deterministic but depend on the task, user and environment of the interaction with images. Naturally, the categories also depend on the test images used. The present study has identified important levels on which non-experts and experts evaluate image similarity and describe their image groupings in the context of magazine photographs and our novel categorization procedure. Among these, different types of image attributes (image function, main content, content descriptors) were identified together with strategies of combining them to create multifaceted image categories. These are valuable, for example, when evaluating the suitability of image indexing frameworks for use in journalistic annotation tasks and when selecting the types of concepts to be automatically detected from images via feature analysis algorithms. Our results as a whole speak for the need for a multifaceted approach to image categorization both in research and practice.

Acknowledgements

The authors wish to acknowledge the support of The National Technology Agency of Finland for this research project. We would also like to thank Professor Pirkko Oittinen for her constructive comments.
