Computational Exploration of Metaphor Comprehension Processes Using a Semantic Space Model

Akira Utsumi

Correspondence should be sent to Akira Utsumi, Department of Informatics, The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu-shi, Tokyo 182-8585, Japan. E-mail: utsumi@inf.uec.ac.jp

Abstract

Recent metaphor research has revealed that metaphor comprehension involves both categorization and comparison processes. This finding has raised a central question: Which property determines the choice between these two processes in metaphor comprehension? Three competing views have been proposed to answer this question: the conventionality view (Bowdle & Gentner, 2005), the aptness view (Glucksberg & Haught, 2006b), and the interpretive diversity view (Utsumi, 2007); these views argue, respectively, that vehicle conventionality, metaphor aptness, and interpretive diversity determine the choice between the categorization and comparison processes. This article attempts to determine which of these views is plausible by means of cognitive modeling and computer simulation based on a semantic space model. In the simulation experiment, the categorization and comparison processes are modeled in a semantic space constructed by latent semantic analysis. The two models receive word vectors for the constituent words of a metaphor and compute a vector for the metaphorical meaning. The resulting vectors are evaluated according to the degree to which they mimic the human interpretation of the same metaphor; maximum likelihood estimation determines which of the two models better explains the human interpretation. The result of this model selection is then predicted from three metaphor properties (i.e., vehicle conventionality, aptness, and interpretive diversity) to test the three views. A simulation experiment with Japanese metaphors demonstrates that both interpretive diversity and vehicle conventionality affect the choice between the two processes, whereas metaphor aptness does not. This result can be treated as computational evidence supporting the interpretive diversity and conventionality views.

1. Introduction

Metaphors pervade language, in both spoken and written discourse. For example, in an analysis of different types of discourse, Cameron (2008) demonstrated that 20 metaphors were used per 1,000 words in college lectures, 50 in ordinary discourse, and 60 in discourse by teachers. Hence, it is no exaggeration to say that people cannot verbally communicate with each other without using metaphors. Furthermore, an increasing number of studies have revealed that metaphors are essentially involved in our everyday thought (e.g., Gibbs, 2006; Kövecses, 2002; Lakoff & Johnson, 1980).

The prevalence of metaphor in language and thought has motivated a considerable number of cognitive studies on metaphor, particularly on the cognitive mechanism of metaphor comprehension. These studies have focused on how people comprehend metaphors and have discovered that two different processes, namely, comparison (Gentner, 1983; Gentner, Bowdle, Wolff, & Boronat, 2001; Gentner & Markman, 1997) and categorization (Glucksberg, 2001; Glucksberg & Keysar, 1990), are involved in metaphor comprehension. Therefore, recent psycholinguistic studies have explored the metaphor property that determines the choice between the two processes and proposed different views: the conventionality view (Bowdle & Gentner, 2005; Gentner & Bowdle, 2008), aptness view (Glucksberg & Haught, 2006a, 2006b; Jones & Estes, 2006), and interpretive diversity view (Utsumi, 2007); these views, respectively, argue that vehicle conventionality, metaphor aptness, and interpretive diversity determine the choice. However, the studies that have empirically tested these three hybrid views show different results. Researchers have hitherto not reached a consensus, and there is a heated debate regarding the kind of metaphors that are processed as comparisons and as categorizations (e.g., Bowdle & Gentner, 2005; Gibbs, 2008; Glucksberg & Haught, 2006a, 2006b; Jones & Estes, 2006; Utsumi, 2007).

To answer the question regarding which of these metaphor views is most plausible, this study adopts an approach that is different from that of existing studies, namely, computational modeling and simulation. In this approach, this study employs a semantic space model (e.g., Landauer, McNamara, Dennis, & Kintsch, 2007; Padó & Lapata, 2007; Widdows, 2004). It models two processes of metaphor comprehension (categorization and comparison) using a semantic space model and determines which of the two models better explains the human interpretation of each Japanese metaphor obtained in a psychological experiment (Utsumi, 2007). The result of the model selection procedure is then predicted by three metaphor properties, namely, conventionality, aptness, and interpretive diversity. The best predictor can be treated as the most plausible view of metaphor comprehension that determines a shift between comparison and categorization.

The rest of this article is organized as follows. Section 2 illustrates the comparison and categorization views of metaphor comprehension, and then presents three hybrid views for reconciling the categorization and comparison views, which I intend to compare in this study. Other metaphor views are also reviewed in this framework and discussed again in Section 5 with reference to the implications of the result of this article. Section 3 explains the semantic space model as a computational framework used for the simulation experiment, and two algorithms as models of the comparison and categorization processes. Furthermore, this section shows that these algorithms are plausible models of the psychological processes of comparison and categorization by demonstrating that they are consistent with two processing phenomena that the existing empirical studies of metaphor have employed to determine the dominant process in metaphor comprehension, that is, grammatical concordance between form and function and directionality in the metaphor comprehension process. Section 4 presents the procedure for model selection and theory testing performed in the simulation experiment and the result of the simulation experiment. Section 5 explains the implications of the simulation results as well as some issues concerning the computational methodology based on the semantic space model. Finally, Section 6 provides concluding remarks for this study.

2. Metaphor theories

A metaphor consists of two concepts, which are referred to as topic and vehicle. A topic is the concept described using a metaphor, and a vehicle is the concept employed to describe a topic in a metaphor. For example, a metaphor ‘‘An X is a Y’’ has the topic X and the vehicle Y. Note that analogy researchers often refer to the topic as the target and the vehicle as the base (or the source).

2.1. Disagreement between comparison and categorization

Metaphor comprehension is a process of establishing correspondences between the non-overlapping domains of the vehicle and the topic of a metaphor. For example, consider the metaphor ‘‘Socrates is a midwife.’’ This metaphor implies a correspondence between the topic Socrates and the vehicle midwife such that, just as a midwife helps women to bring children into the world, Socrates helps his students to bring interesting ideas into the world. Hence, the manner in which such correspondences are established is a central topic for metaphor research. Although a considerable number of studies have been conducted on this topic, a consensus has not been reached. The approaches of such studies to the mechanism of metaphorical mapping are divided into two categories, namely, the comparison view (e.g., Gentner, 1983; Gentner & Markman, 1997; Gentner et al., 2001) and the categorization view (e.g., Glucksberg, 2001; Glucksberg & Keysar, 1990).

2.1.1. Comparison view

The comparison view has argued that metaphors are comprehended via a comparison process, wherein metaphorical correspondences are established by finding commonalities between conceptual representations of the vehicle and the topic. According to the structure mapping theory proposed by Gentner et al. (Gentner, 1983, 1989; Gentner & Markman, 1997; Gentner et al., 2001), which is a representative theory of the comparison view, metaphorical commonalities are found by a process of structural alignment between two representations. In the alignment process, all identical elements (i.e., predicates and arguments) in the topic and the vehicle (e.g., the predicates help and bring in the case of ‘‘Socrates is a midwife’’) are matched. In the later process of alignment, these local matches are collected to form structurally consistent mappings, and these mappings are then merged into the common structure. Next, the elements connected to the common structure in the vehicle but not initially present in the topic (e.g., the predicates specifying the gradual development of the child within the mother) are projected as candidate inferences into the topic. Structural alignment and inference projection constitute a process of comparison, which Gentner et al. propose as a general cognitive mechanism for analogy, metaphor, and similarity.

The basic concept of the structure mapping theory is shared by dominant theories of analogy (e.g., Holyoak & Thagard, 1989; Hummel & Holyoak, 1997; Larkey & Love, 2003). They accept the process of comparison, comprising alignment and projection, as a mechanism of metaphorical mapping. (In particular, Hummel and Holyoak's [1997] Learning and Inference with Schemas and Analogies [LISA] is basically the structure mapping view.) The difference among these theories lies primarily in the kind of similarities that are preferentially included in the common structure. The structure mapping theory emphasizes relational similarities, particularly similarities between higher order relations according to the systematicity principle. On the other hand, Holyoak and Thagard's (1989) Analogical Constraint Mapping Engine (ACME) and Hummel and Holyoak's (1997) LISA argue that semantic and pragmatic similarities are also required in analogical mapping. Despite such differences, these theories of analogy can be classified into the comparison view (for the same treatment, see Gentner & Bowdle, 2008).

The conceptual metaphor theory (Clausner & Croft, 1997; Grady, 1997; Kövecses, 2002; Lakoff & Johnson, 1980, 1999) is related to the comparison view. The basic tenet of the conceptual metaphor theory is that our conceptual system is structured by preexisting cross-domain mappings, the so-called conceptual metaphors, which are grounded in embodied experiences. Verbal metaphors (i.e., metaphorical expressions) are assumed to be comprehended merely on the basis of such conceptual metaphors. Conceptual metaphors differ in the way they are experientially grounded; Grady (1997, 2005) distinguished primary metaphors (e.g., Happy Is Up), which are directly motivated by experiential correlation, from complex metaphors (e.g., Theories Are Buildings), which do not appear to be directly motivated by experiential correlation but are constructed by combining primary metaphors. Although primary metaphors are embodied and not based on analogical mappings, some complex metaphors are based on analogical mappings (Grady, 2005). Hence, the conceptual metaphor theory can be regarded as relevant to the comparison view, rather than to the categorization view. Note that such an embodied view of metaphors leads to the critical argument that metaphor comprehension cannot be simulated using non-embodied computational methods, such as the semantic space model (e.g., Louwerse & Van Peer, 2009), which appears to be incompatible with this study. This issue will be discussed in Section 5.2.

2.1.2. Categorization view

The categorization view by Glucksberg et al. (Glucksberg, 2001, 2003; Glucksberg & Keysar, 1990; Glucksberg, McGlone, & Manfredi, 1997) has claimed that metaphors are comprehended via a categorization process, by which the topic is treated as a member of an abstract superordinate category exemplified by the vehicle. The topic and the vehicle play different roles in this comprehension process; the vehicle provides a superordinate category that can be used to characterize the topic, whereas the topic constrains the dimensions by which it can be characterized. For example, the metaphor ‘‘My job is a jail’’ is comprehended so that the topic my job is categorized as an ad hoc category like ‘‘unpleasant and confining things’’ to which the vehicle jail typically belongs. In evoking the ad hoc category, my job facilitates the attribution of features related to tasks and jobs, while blocking out irrelevant features such as those related to jail building.

The recent development of Sperber and Wilson's (1995) relevance theory adopts a view of metaphor comprehension that is similar to the categorization view. Relevance theory argues that metaphor comprehension can be regarded as the online construction of ad hoc concepts by broadening or narrowing the lexical (or literal) meanings of the vehicle (Carston, 2002; Sperber & Wilson, 2008). The role of the topic assumed in relevance theory differs from that in Glucksberg's attributive categorization view. Relevance theory assumes that the topic influences the process of concept construction through pragmatic inferencing according to the principle of relevance, rather than by restricting the dimensions along which features are mapped. Despite this difference, the relevance-theoretic view of metaphor is very similar to the categorization view; thus, it can reasonably be classified as a categorization view (Carston, 2002).

2.1.3. Which view is better?

The two views of categorization and comparison encounter serious problems of their own; this makes it difficult to decide which view is better for a plausible theory of metaphor comprehension.

The main problem of the categorization view is that it downplays the role of the topic in metaphor comprehension (Bowdle & Gentner, 2005). This view assumes that multiple ad hoc categories are initially derived in parallel from the vehicle. This assumption implies that in many cases, an unlimited number of categories have to be generated and stored in working memory until the topic determines the relevant category. Such a process seems to be resource demanding and therefore is intuitively less plausible. An adequate theory of metaphor should allow more interaction between the topic and the vehicle.

In contrast, the comparison view has a serious problem of dealing with unspecified topics (e.g., Glucksberg et al., 1997). When hearers are unfamiliar with the topic, they cannot align the topic with the vehicle and thus cannot derive common structures sufficient to yield a proper interpretation. Given that a metaphor serves as an efficient way of describing the topic from a novel perspective, an adequate theory of metaphor should provide a way of deriving a metaphorical interpretation primarily from the vehicle.

2.2. Reconciliation between comparison and categorization

The comparison and categorization views have their respective limitations and advantages; this has led metaphor research to reconcile these two opposite views. Thus, recent studies (e.g., Bowdle & Gentner, 2005; Glucksberg & Haught, 2006b; Jones & Estes, 2005, 2006; Utsumi, 2007) have provoked a heated debate on how these two views can be reconciled; that is, they debate over the metaphor properties that determine the choice of comprehension strategy between categorization and comparison. Three different views, namely, the conventionality view, aptness view, and interpretive diversity view, have been proposed for reconciling the categorization and comparison views; these three views are summarized in Table 1. In this section, using the metaphor examples listed in Table 2, I explain how these views predict the comprehension process.

Table 1.
Comparison of three hybrid metaphor views that attempt to reconcile the categorization view and the comparison view

Metaphor View | Initial Process | Alternative Process | What Kind of Metaphor Should Activate the Alternative?
Conventionality view (Bowdle & Gentner, 2005) | Comparison | Categorization | Conventional metaphors (metaphors referring to a lexically encoded metaphoric category that can be attributed to the topic)
Aptness view (Glucksberg & Haught, 2006a, 2006b) | Categorization | Comparison | Less apt metaphors (metaphors that cannot evoke any metaphoric categories relevant to the important feature of the topic)
Interpretive diversity view (Utsumi, 2007) | Categorization | Comparison | Less diverse metaphors (metaphors that cannot evoke any rich metaphoric categories for the topic)

Note. Each view predicts that the comprehension of all metaphors starts with the initial process (the second column), but a specific kind of metaphor (the last column) is comprehended later by the alternative process (the third column).
Table 2.
Examples of metaphors showing how the three metaphor views make the same or different predictions on the comprehension process

Metaphor Example | Conventionality View (Conventionality / Predicted Process) | Aptness View (Aptness / Predicted Process) | Interpretive Diversity View (Diversity / Predicted Process)
My job is a jail | Conventional / Categorization | Apt / Categorization | Diverse / Categorization
A gene is a blueprint | Conventional / Categorization | Apt / Categorization | Less diverse / Comparison
My memories are money | Conventional / Categorization | Less apt / Comparison | Diverse / Categorization
Birds are airplanes | Conventional / Categorization | Less apt / Comparison | Less diverse / Comparison
A goalie is a spider | Novel / Comparison | Apt / Categorization | Diverse / Categorization
That supermodel is a rail | Novel / Comparison | Apt / Categorization | Less diverse / Comparison
A child is a snowflake | Novel / Comparison | Less apt / Comparison | Diverse / Categorization
A fisherman is a spider | Novel / Comparison | Less apt / Comparison | Less diverse / Comparison

Note. Each metaphor example can be characterized by the distinctive properties (i.e., conventionality, aptness, diversity) of the three metaphor views, and its comprehension process is predicted on the basis of the characterized properties.

2.2.1. Conventionality view

Bowdle and Gentner (2005) have claimed that although metaphors are initially processed as comparisons, conventional metaphors are processed as categorizations by accessing the stored metaphoric categories that are conventionalized by repeated figurative use of the vehicle term.1 According to this view, it is vehicle conventionality that determines the choice of comprehension strategy. Vehicle conventionality (or simply, conventionality) refers to the degree of association between the figurative meaning of a metaphor and the vehicle of that metaphor (Bowdle & Gentner, 2005; Gentner & Wolff, 1997; Jones & Estes, 2006). Consider, for example, that the term snowflake is used as the vehicle of metaphors. As it is rarely used in metaphors, this term seems to have no conventional metaphoric categories, and it only refers to a crystal of snow. The metaphorical meaning of the metaphor ‘‘A child is a snowflake,’’ shown in Table 2, cannot be associated with the vehicle snowflake before the comprehension of that metaphor. Hence, according to the conventionality view, the metaphorical meaning should be derived online by the process of comparison between child and snowflake. (Therefore, this novel metaphor may mean that all children are unique and delicate.) Likewise, the terms spider and rail have no salient metaphorical meanings. Hence, metaphors with these vehicle terms (e.g., ‘‘A goalie is a spider,’’ ‘‘That supermodel is a rail,’’ and ‘‘A fisherman is a spider’’ in Table 2) are novel (or less conventional) and are processed as comparisons.

On the other hand, the term jail refers to the metaphoric category ‘‘unpleasant and confining things,’’ which is conventionalized by the repeated figurative use of the term. Hence, the metaphorical meaning of the metaphor ‘‘My job is a jail’’ is highly associated with the vehicle jail, and the comparison process can be bypassed. As a result, the metaphor is comprehended directly by the categorization process, in which the topic is regarded as a member of the metaphoric category ‘‘unpleasant and confining things.’’ Other terms in Table 2 such as blueprint, money, and airplane also refer to a conventional metaphoric category relevant to the metaphors ‘‘A gene is a blueprint,’’ ‘‘My memories are money,’’ and ‘‘Birds are airplanes.’’ Thus, they are comprehended as categorizations. It must be noted here that when the vehicle of a metaphor has a conventional metaphoric category but that category cannot be ascribed to the topic (e.g., the metaphor ‘‘My team is a jail’’ is less likely to mean that my team is unpleasant and confining), the metaphor is not conventional and should be comprehended via the comparison process.

2.2.2. Aptness view

Against Bowdle and Gentner's (2005) conventionality view, the recent development of the categorization view (Glucksberg & Haught, 2006a, 2006b; Jones & Estes, 2005, 2006) has advocated that metaphor aptness, not vehicle conventionality, mediates both the processes. This view argues that while metaphors are initially processed as categorizations, less apt metaphors are processed as comparisons, as shown in Table 1. Metaphor aptness (or simply, aptness) refers to the extent to which the vehicle's metaphoric category captures an important feature of the topic (Blasko & Connine, 1993; Chiappe & Kennedy, 1999; Jones & Estes, 2006). If the vehicle of a metaphor indicates a metaphoric category that is relevant to an important feature of the topic during a categorization process, the metaphor is apt and the comprehension process is completed regardless of whether the metaphoric category is constructed online or lexically encoded in the vehicle. A typical example is the metaphor ‘‘My job is a jail,’’ whose metaphorical meaning is relevant to the important feature of the topic (e.g., pleasantness and the degree of confinement are important characteristics of all jobs). Although some conventional metaphors such as the job-jail example are apt, some novel metaphors are also apt. For example, the term spider may have no conventional metaphoric categories, but the metaphor ‘‘A goalie is a spider’’ in Table 2 is nevertheless highly apt because the ad hoc metaphoric category ‘‘waiting for something in the net and shooting it down in a quick manner,’’ which is newly created by this metaphor, expresses an important feature of goalies. The metaphor ‘‘That supermodel is a rail’’ is also apt and eventually processed as categorization because the property implied by the vehicle (i.e., being extremely thin) is highly relevant to fashion models. As a result, in the case of novel but apt metaphors, the aptness and conventionality views make different predictions about the comprehension process.

According to the aptness view, less apt (or low-apt) metaphors may be processed as comparisons because a categorization does not make sense. For example, although the metaphor ‘‘A goalie is a spider’’ is highly apt, a different metaphor with the same vehicle, ‘‘A fisherman is a spider,’’ is not apt because the vehicle spider cannot evoke any metaphoric categories that include the topic fisherman as a typical member. Therefore, it should be reinterpreted by the process of comparison, yielding an interpretation such as ‘‘a fisherman uses a net to catch fish like a spider.’’ Conventional metaphors are also less apt when the conventional metaphoric category is not relevant or informative. The term money refers to the conventional metaphoric category of precious things, and thus, the metaphor ‘‘My memories are money’’ in Table 2 can mean that my memories are precious to me. However, preciousness is not necessarily a salient feature of memory. Hence, the metaphor is less apt and may be reinterpreted as a comparison.

2.2.3. Interpretive diversity view

Utsumi (2007) has recently argued that interpretive diversity determines whether metaphors are processed as comparisons or categorizations. Although metaphors are initially processed as categorizations, less diverse metaphors fail to be processed as categorizations and must be reinterpreted as comparisons, because their vehicles do not readily exemplify a metaphorical category to which the topic might belong. Interpretive diversity (or simply, diversity) refers to the semantic richness of metaphors and depends on the following two factors: the number of features or interpretations that constitute the figurative meaning, and the uniformity of the salience distribution of those features (Utsumi, 2005, 2007). A higher value of interpretive diversity means that the metaphor has a larger number of meanings (or interpretations) and that the salience of those meanings is more uniformly distributed. For example, the interpretive diversity view explains that the metaphor ‘‘My job is a jail’’ is processed as categorization not because it is either conventional or apt, but because it is interpretively diverse; the evoked metaphoric category implies many equally salient properties (e.g., unpleasant, involuntary, confining, unrewarding) that are applicable to the topic. Novel metaphors (e.g., ‘‘A child is a snowflake’’) are also interpretively diverse when an ad hoc category evokes many equally salient meanings (e.g., ‘‘A child is unique, delicate, and likely to change the atmosphere just as snowfall changes the landscape’’). Similarly, some less apt metaphors such as ‘‘My memories are money’’ may be interpretively diverse because a number of less relevant but equally salient meanings are evoked from the vehicle money as potential metaphorical meanings (e.g., ‘‘My memories are precious to me,’’ ‘‘I keep my memories in the soul so as not to lose them,’’ and ‘‘I cannot live without memory’’). Therefore, all these metaphors are predicted to be comprehended as categorizations.
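The two determinants of diversity named above, the number of interpretations and the uniformity of their salience, are jointly captured by an entropy-style measure. The following Python sketch illustrates that idea; the interpretations and salience weights are invented for illustration, and the function is not claimed to reproduce Utsumi's exact formulation.

```python
import math

def interpretive_diversity(saliences):
    """Entropy-style diversity of a metaphor's interpretations.

    `saliences` maps each interpretation to a non-negative salience
    weight. The value grows both with the number of interpretations
    and with how evenly salience is spread across them.
    """
    total = sum(saliences.values())
    probs = [s / total for s in saliences.values() if s > 0]
    return -sum(p * math.log2(p) for p in probs)

# Many equally salient interpretations (as for "My job is a jail")
# yield high diversity; one dominant interpretation (as for
# "That supermodel is a rail") yields low diversity.
diverse = interpretive_diversity(
    {"unpleasant": 1, "involuntary": 1, "confining": 1, "unrewarding": 1})
narrow = interpretive_diversity({"thin": 8, "long": 1})
assert diverse > narrow
```

Four equally salient interpretations give the maximum diversity for that number of interpretations (log2 4 = 2 bits), whereas a skewed salience distribution lowers the value even when several interpretations exist.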

On the other hand, metaphors are interpretively less diverse when the vehicle cannot evoke a metaphoric category with many potential features, regardless of whether they are conventional or apt. For example, the metaphor ‘‘A fisherman is a spider’’ is interpretively much less diverse because the vehicle spider cannot evoke any rich categories for the topic fisherman, although ‘‘A goalie is a spider’’ may imply diverse meanings. For the same reason, some apt metaphors (e.g., ‘‘That supermodel is a rail’’) are also less diverse. These less diverse metaphors can be reinterpreted via the comparison process. The interpretive diversity view also predicts that even conventional metaphors (e.g., ‘‘A gene is a blueprint’’) are comprehended via the comparison process if the conventional metaphoric category associated with the vehicle is semantically less rich (or semantically ‘‘narrow’’).

2.3. Disagreement among three hybrid views

These three hybrid views have empirically demonstrated the superiority of their own views. In these experiments, the metaphor–simile distinction was used as a valuable tool for examining the use of comparison and categorization during metaphor comprehension. The basic assumption underlying this method is that the linguistic form of a figurative statement invokes a specific comprehension process. Metaphors of the form ‘‘An X is a Y’’ should invite categorization because they are grammatically identical to literal categorization statements, whereas similes of the form ‘‘An X is like a Y’’ should invite comparison because they are grammatically identical to literal comparison statements. Therefore, if the process initially invoked by the form is different from the process eventually used for comprehension, such figurative statements should be reinterpreted; thus, such statements are comprehended more slowly and are less preferred. Following Bowdle and Gentner (2005), I refer to this link between form and process as grammatical concordance.

Bowdle and Gentner's (2005) conventionality view predicts that novel topic–vehicle pairs should be comprehended as comparisons, and according to grammatical concordance, it follows that they should be more comprehensible when presented in the simile form ‘‘An X is like a Y’’ than in the metaphor form ‘‘An X is a Y.’’ This is because the metaphor form initially invites an inappropriate process of categorization, whereas similes are comprehended as comparisons from the very beginning. In contrast, if topic–vehicle pairs are conventional, both forms should be equally comprehensible. Bowdle and Gentner (2005) demonstrated that the experimental results were consistent with this prediction; moreover, they showed that these results could not be explained in terms of metaphor aptness.

On the other hand, Glucksberg and Haught (2006b) demonstrated that novel but apt figurative statements (e.g., ‘‘My lawyer is (like) a well-paid shark’’) were easier to comprehend in the metaphor form than in the simile form. This finding is obviously inconsistent with the prediction of the conventionality view, and therefore, they concluded that the aptness or the quality of metaphors determines the choice of comprehension strategy. Furthermore, Jones and Estes (2005, 2006) reported that apt metaphors were more likely to be processed as categorizations and were comprehended faster than less apt metaphors; however, no such differences were observed between novel and conventional metaphors.

Against these two hybrid views, Utsumi (2007) demonstrated that only the interpretive diversity of topic–vehicle pairs was positively correlated with the relative comprehensibility of the metaphor form, as compared with the simile form; however, neither vehicle conventionality nor metaphor aptness showed a correlation with the relative comprehensibility. Although diverse pairs were equally comprehensible in both forms, less diverse pairs were more comprehensible when presented in the simile form than in the metaphor form. In addition, less diverse metaphors shared more meanings with the corresponding similes than diverse metaphors, suggesting that less diverse metaphors and similes are likely to be understood by the same process, namely, a comparison process. Again, such a difference was not observed for either vehicle conventionality or aptness.

As described earlier, recent metaphor studies have provoked a heated debate with regard to which metaphor properties determine the choice between the categorization and comparison processes for comprehending metaphors. However, the question of which view is the most plausible remains unresolved. This study thus employs a different approach to this issue, namely, computational modeling and a simulation experiment. I attempt to provide a computational or theoretical solution to the problem by identifying which of vehicle conventionality, metaphor aptness, and interpretive diversity best explains the result of model selection between the comparison and categorization models. Given the lack of metaphor research using a model comparison technique, this study can be regarded as a pioneering attempt to derive evidence about the mechanism of metaphor comprehension through a computational method.

3. Computational model

3.1. Semantic space model

Recently, vector-based semantic space models have been frequently used to represent lexical meanings and have proved highly useful for a variety of natural language processing (NLP) tasks, such as word sense disambiguation (Schütze, 1998), information retrieval (Deerwester, Dumais, Furnas, Landauer, & Harshman, 1990; Widdows, 2004), thesaurus construction (Lin, 1998), document clustering (Shahnaz, Berry, Pauca, & Plemmons, 2006), and essay scoring (Landauer, Laham, & Foltz, 2003). More important, semantic space models have also provided a useful framework for cognitive modeling, for example, of similarity judgment (Landauer & Dumais, 1997), semantic priming (Jones, Kintsch, & Mewhort, 2006; Lowe & McDonald, 2000), text comprehension (Foltz, Kintsch, & Landauer, 1998; Kintsch, 2001), and language-mediated eye movement (Huettig, Quinlan, McDonald, & Altmann, 2006). There are also good reasons for using semantic space models for cognitive modeling and NLP. First, semantic space models are cost-effective in that it takes less time and effort to construct large-scale geometric representations of word meanings than to construct other types of lexical knowledge, such as dictionaries or thesauri. Second, they can represent implicit knowledge of word meanings that dictionaries and thesauri cannot. Lastly, semantic spaces are easy to revise and extend.

Semantic space models are based on two main assumptions. One assumption is that the meaning of each word wi can be represented by a high-dimensional vector v(wi) = (wi1,wi2,…,wiD), that is, a word vector. These D real-valued components define the lexical meaning of the word. The second assumption is that the degree of semantic similarity sim(wi,wj) between any two words wi and wj can be computed using the similarity function of their word vectors. Among the variety of functions that can be used to compute the similarity between two word vectors in semantic space models, the cosine  cos (v(wi),v(wj)) is the most widely used. Using the similarity measure, one can easily compute the degree to which two words are semantically related.
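These two assumptions can be made concrete with a small sketch. The words and vector values below are invented purely for illustration (real word vectors come from corpus statistics, as described next); the point is only the vector representation and the cosine similarity function:

```python
import numpy as np

# Toy semantic space: each word is a D-dimensional vector (D = 3 here).
# These vectors are invented for illustration; real word vectors are
# derived from distributional statistics of a corpus.
space = {
    "virus":   np.array([0.9, 0.1, 0.3]),
    "disease": np.array([0.8, 0.2, 0.3]),
    "rumor":   np.array([0.2, 0.8, 0.4]),
}

def cos(v, w):
    """sim(wi, wj): the cosine of the angle between two word vectors."""
    return float(np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w)))

# Semantically related words should yield a higher cosine.
print(cos(space["virus"], space["disease"]))  # high (about 0.99)
print(cos(space["virus"], space["rumor"]))    # lower (about 0.43)
```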

Semantic spaces (or word vectors) are constructed from large bodies of text by observing the distributional statistics of word occurrence. The method for constructing word vectors generally comprises the following two steps. First, M content words in a given corpus are represented as R-dimensional initial word vectors, and an M × R matrix A is constructed using the M word vectors as rows.2 Then, the dimension of A’s row vectors is reduced from the initial dimension R to D and, as a result, a D-dimensional semantic space containing M words is generated.

Numerous methods have been proposed for computing initial word vectors and for reducing the dimensionality (for an overview, see Padó & Lapata, 2007). Among them, latent semantic analysis (henceforth, LSA; Landauer & Dumais, 1997; Landauer et al., 2007) is the most popular. LSA uses the frequency of words in a document (e.g., paragraph) to compute initial vectors, whose dimension R is equal to the number of documents.3 LSA then reduces the number of dimensions using singular value decomposition (henceforth, SVD). Many studies (e.g., Kintsch, 2001; Landauer & Dumais, 1997; Landauer et al., 2007) have demonstrated that LSA successfully mimics a variety of human behaviors, particularly those associated with semantic processing. Hence, this study uses LSA to construct a semantic space for computer simulation.
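The two construction steps can be sketched as follows. The toy term-document matrix below is invented, and the term-weighting scheme that real LSA applies before decomposition (e.g., log-entropy weighting) is omitted for brevity:

```python
import numpy as np

# Toy M x R term-document matrix A (M = 4 words, R = 4 documents);
# each entry is a word's frequency in a document. Invented data;
# real LSA also applies a term-weighting scheme, omitted here.
A = np.array([
    [2.0, 0.0, 1.0, 0.0],   # word 0 (occurs in docs 0 and 2)
    [1.0, 0.0, 2.0, 0.0],   # word 1 (occurs in docs 0 and 2)
    [0.0, 3.0, 0.0, 1.0],   # word 2 (occurs in docs 1 and 3)
    [0.0, 1.0, 0.0, 2.0],   # word 3 (occurs in docs 1 and 3)
])

# Reduce the row vectors from R = 4 to D = 2 dimensions via singular
# value decomposition, keeping the D largest singular values.
D = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
word_vectors = U[:, :D] * s[:D]   # the D-dimensional semantic space

def cos(v, w):
    return float(np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w)))

# Words with similar document distributions end up close together.
print(word_vectors.shape)                      # (4, 2)
print(cos(word_vectors[0], word_vectors[1]))   # near 1
print(cos(word_vectors[0], word_vectors[2]))   # near 0
```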

3.2. Semantics in a semantic space model

To simulate human sentence processing including metaphor comprehension, it is necessary to devise a method for generating a vector representation of a piece of text (e.g., phrase, sentence) from its constituent words. Formally, given a piece of text S comprising a sequence of words w1,…,wn, a generation function f(v(w1),…,v(wn)) that computes a vector representation v(S) of S must be defined.

The standard method widely used in LSA research is the computation of the centroid of the constituent word vectors; in other words, the generation function is defined as f(v(w1),…,v(wn)) = (v(w1) + ⋯ + v(wn))/n. However, such a representation is not intuitively plausible because word order (and thus, semantic roles) is completely ignored; the meaning of ‘‘People eat fishes’’ is obviously different from that of ‘‘Fishes eat people.’’ This drawback is particularly harmful when LSA is used for simulating metaphor comprehension; when the topic and the vehicle of a metaphor are reversed, its metaphorical meaning is drastically altered.
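A minimal sketch of this centroid-based generation function, using invented vectors, makes the order-insensitivity explicit:

```python
import numpy as np

# The standard generation function: the centroid (average) of the
# constituent word vectors. Toy vectors for illustration only.
def centroid(*word_vectors):
    return np.mean(word_vectors, axis=0)

people = np.array([0.1, 0.9, 0.2])
eat    = np.array([0.5, 0.4, 0.6])
fish   = np.array([0.8, 0.1, 0.3])

# The drawback noted above: word order is ignored, so "People eat
# fishes" and "Fishes eat people" receive the very same vector.
v1 = centroid(people, eat, fish)
v2 = centroid(fish, eat, people)
print(np.allclose(v1, v2))  # True
```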

Kintsch (2001) proposed a predication algorithm for generating intuitively plausible and contextually dependent vectors of the proposition with the predicate argument structure. Given a proposition P(A), where P is a predicate and A is an argument, the predication algorithm first chooses m nearest neighbors of a predicate P (i.e., m words with the highest similarity to P). The algorithm then picks up k neighbors of P that are also related to A.4 Finally, the algorithm computes the centroid vector of P, A, and the k neighbors of P as a vector representation of P(A). The essence of the algorithm lies in the set of neighbors of P relevant to A. It can be interpreted that this set of words represents the meaning of a predicate P that is appropriate for describing or predicating an argument A. For example, consider the sentence ‘‘The computer works’’ with the predicate work and the argument computer. The verb work is semantically ambiguous because it has many different meanings, such as to perform a task for money, to operate correctly, or to have an effect. In LSA, all these meanings of work are represented together as a single vector. However, each use of work in a sentence does not represent all the meanings together; it represents a specific meaning appropriate for a given context, such as work in ‘‘The computer works’’ is used to represent the meaning to operate correctly. This is why the simple combination of P and A cannot represent the appropriate meaning of the whole sentence. In the predication algorithm, such contextually dependent meaning, for example, features of work that are relevant to the argument computer, can be represented as the set of neighbors of P relevant to A. Kintsch (2001) demonstrated that the predication algorithm performs in the way it is supposed to perform using several semantic problems, such as causal inference, similarity judgment, and metaphor comprehension.

3.3. Modeling the metaphor comprehension processes

As described in Section 2, metaphor comprehension involves two different processes, namely, categorization and comparison. In this section, these two comprehension processes are modeled in the LSA framework.

3.3.1. Modeling the process of categorization

As a computational model of the categorization process, this study employs Kintsch's (2001) predication algorithm without modification because it is reasonable to assume that a set of P’s neighbors relevant to A characterizes an abstract metaphorical category exemplified by the vehicle. The word virus in the metaphor ‘‘A rumor is a virus’’ refers to the abstract category of contagious things that are spreading, preventable, and harmful. These features of the abstract category can be included in the set of P’s neighbors because they are closely related to the literal meaning of virus (P) and are also related to rumor (A). On the other hand, the metaphorical category of contagious things does not include diseases literally caused by a virus, such as influenza and tuberculosis, and they will be excluded from the set of k neighbors because they are not relevant to rumor (A). Glucksberg's categorization theory also argues that literal categorization statements can be comprehended in the same way as metaphorical assertions. The word virus in the literal statement ‘‘Influenza is a virus’’ refers to the literal contagious virus, and the literal category can be represented as the set of k neighbors; in this case, the set of k neighbors can include meanings that are not relevant to the metaphorical category, because the argument influenza is also a virus. It must be noted that Kintsch (2000) briefly describes the relationship between the predication algorithm and Glucksberg's categorization theory. He suggests that the predication algorithm is consistent with the categorization theory, although he does not argue that it can be considered as a computational model of categorization. Furthermore, Glucksberg (2003) points out that the predication algorithm is very similar to the categorization process.

Let M be a given nominal metaphor with the vehicle wV (i.e., predicate) and the topic wT (i.e., argument), and Ni(x) be a set of i neighbors of the word x (i.e., a set of words with i highest similarity to x). The algorithm Categ(v(wT),v(wV);θcat) of computing a metaphor vector vcat(M) for M by the process of categorization is given as follows: (Note here that the algorithm Categ is identical to Kintsch's [2001] predication algorithm.)

Categ(v(wT),v(wV);θcat)
  • 1 Compute Nm(wV), that is, m neighbors of the vehicle wV.
  • 2 Choose k words with the highest similarity to the topic wT from among Nm(wV).
  • 3 Compute a vector vcat(M) as the centroid of v(wT), v(wV), and k vectors of the words chosen in Step 2.

The parameter m denotes the number of vehicle neighbors that should be searched for in the algorithm and the parameter k denotes the number of vehicle neighbors that should be selected to be similar to the topic. The notation θcat denotes a list of these parameter values (m, k) for the Categ algorithm.
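The three steps above can be sketched directly in code. The toy space below is invented (the actual simulation uses an LSA space built from a Japanese corpus), so the words merely echo the rumor/virus example; the point is the control flow of Categ:

```python
import numpy as np

def cos(v, w):
    return float(np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w)))

def neighbors(space, x, i):
    """N_i(x): the i words most similar to word x (x itself excluded)."""
    ranked = sorted((w for w in space if w != x),
                    key=lambda w: cos(space[w], space[x]), reverse=True)
    return ranked[:i]

def categ(space, topic, vehicle, m, k):
    """Sketch of Categ(v(wT), v(wV); theta_cat) with theta_cat = (m, k)."""
    nm = neighbors(space, vehicle, m)                        # Step 1
    chosen = sorted(nm, key=lambda w: cos(space[w], space[topic]),
                    reverse=True)[:k]                        # Step 2
    vecs = [space[topic], space[vehicle]] + [space[w] for w in chosen]
    return np.mean(vecs, axis=0)                             # Step 3

# Invented toy vectors; the words only echo the rumor/virus example.
space = {
    "rumor":        np.array([0.1, 0.9, 0.2]),
    "virus":        np.array([0.9, 0.1, 0.2]),
    "spread":       np.array([0.6, 0.6, 0.1]),
    "contagion":    np.array([0.85, 0.2, 0.25]),
    "tuberculosis": np.array([0.9, 0.0, 0.3]),
    "scandal":      np.array([0.1, 0.8, 0.5]),
}

v = categ(space, "rumor", "virus", m=4, k=2)
# Vehicle neighbors irrelevant to the topic (here, tuberculosis) are
# filtered out at Step 2, so the metaphor vector stays closer to
# topic-relevant vehicle neighbors than to tuberculosis.
print(cos(v, space["spread"]) > cos(v, space["tuberculosis"]))  # True
```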

For example, Table 3 shows the step-by-step behavior of the Categ algorithm when it computes a vector for the metaphor ‘‘A rumor is a virus’’ with the parameters θcat = (m,k) = (500,5). The first column of Table 3 lists the top 10 and last 2 neighbors of the vehicle virus included in the set N500(wV) computed at Step 1.5 (Hence, the words capsule and estimation are the 499th and the 500th nearest neighbors.) These 500 neighbors are sorted in descending order of cosine similarity to the topic rumor and are listed in the second column. As k = 5, the top five words (i.e., doubt, trigger, recency, spread/get about, and guess) are chosen for characterizing a metaphoric category of ‘‘contagious things.’’ (Note that vehicle neighbors are generally not the nearest neighbors of the topic, although they move closer to the topic as m grows larger. For example, the top five nearest neighbors of the topic rumor are disclosure, prosecutor, expose, report, and conjecture.) These chosen words do not seem to represent the metaphoric category of contagious things on their own, but their centroid vector is close to the features of contagious things. In Step 3, these five vectors of the chosen words are averaged with the topic and the vehicle vectors to obtain a metaphor vector vcat(M). The rightmost column of Table 3 lists the top 10 nearest neighbors of the metaphor vector. They can be regarded as representing the meanings of the metaphor because, as mentioned in Section 3.1, the cosine similarity between two vectors is used as a measure of semantic relatedness. Some nearest neighbors of the vehicle (i.e., contagion and take effect), which are also relevant to the topic, are attributed to the topic rumor. On the other hand, some nearest neighbors of the vehicle that are not relevant to the topic, such as tuberculosis, are downplayed. 
More important, some emergent words, such as recency and epidemic/spread, which are not close to the vehicle or the topic, but appropriate as a metaphorical meaning, are also attributed to the topic; a rumor spreads rapidly, just like a virus is epidemic, and recency is an important factor for the spread of both a virus and a rumor.

Table 3. 
An example of the step-by-step behavior of the Categ algorithm in comprehending the metaphor ‘‘A rumor is a virus’’

| Nearest Neighbors of the Vehicle virus and Cosines with the Vehicle (Step 1) | Sorted List of the 500 Vehicle Neighbors and Cosines with the Topic (Step 2) | Nearest Neighbors of the Metaphor Vector and Their Cosines (Step 3) |
|---|---|---|
| contagion 0.94 | doubt 0.28 | recency 0.86 |
| fungus 0.86 | trigger 0.19 | virus 0.63 |
| tolerance 0.81 | recency 0.18 | epidemic/spread b 0.63 |
| disease onset 0.80 | spread/get about a 0.18 | contagion 0.60 |
| tuberculosis 0.80 | guess 0.17 | fungus 0.59 |
| bacteria 0.80 | ovulation 0.17 | take effect 0.57 |
| antibiotic 0.77 | sneeze 0.17 | bacteria 0.57 |
| heated 0.75 | pregnancy 0.14 | tolerance 0.55 |
| drug disaster 0.75 | efficacy 0.14 | of a kind 0.55 |
| blood sampling 0.75 | appearance 0.14 | disease onset 0.55 |
| … | … | |
| capsule 0.18 | trachea −0.11 | |
| estimation 0.18 | pulse −0.11 | |

a. The original Japanese word hiromaru means both ‘‘spread’’ and ‘‘get about.’’
b. The original Japanese word ryuko means both ‘‘epidemic’’ and ‘‘spread.’’

3.3.2. Modeling the process of comparison

For a computational model of the comparison process, I propose the following algorithm Compa(v(wT),v(wV);θcom) that computes a metaphor vector vcom(M), where θcom = (k) is a list containing the single parameter value k.

Compa(v(wT),v(wV);θcom)
  • 1 Compute k common neighbors Ni(wT) ∩ Ni(wV) of wT and wV by finding the smallest i that satisfies |Ni(wT) ∩ Ni(wV)| ≥ k.
  • 2 Compute a metaphor vector vcom(M) as the centroid of v(wT) and k vectors of the words chosen in Step 1.
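The two steps above can be sketched as follows; as before, the toy space is invented and its words only echo the fisherman/spider example:

```python
import numpy as np

def cos(v, w):
    return float(np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w)))

def neighbors(space, x, i):
    """N_i(x): the i words most similar to x (x itself excluded)."""
    ranked = sorted((w for w in space if w != x),
                    key=lambda w: cos(space[w], space[x]), reverse=True)
    return ranked[:i]

def compa(space, topic, vehicle, k):
    """Sketch of Compa(v(wT), v(wV); theta_com) with theta_com = (k)."""
    # Step 1: grow i until at least k common neighbors are found.
    common = set()
    for i in range(1, len(space)):
        common = (set(neighbors(space, topic, i))
                  & set(neighbors(space, vehicle, i)))
        if len(common) >= k:
            break
    # Step 2: centroid of the topic vector and the common neighbors.
    vecs = [space[topic]] + [space[w] for w in sorted(common)]
    return np.mean(vecs, axis=0)

# Invented toy vectors; the words only echo the fisherman/spider example.
space = {
    "fisherman": np.array([0.2, 0.9, 0.1]),
    "spider":    np.array([0.9, 0.2, 0.1]),
    "net":       np.array([0.6, 0.6, 0.1]),
    "prey":      np.array([0.55, 0.55, 0.3]),
    "boat":      np.array([0.1, 0.8, 0.2]),
    "web":       np.array([0.9, 0.1, 0.2]),
}

v = compa(space, "fisherman", "spider", k=2)
# The metaphor vector is pulled toward the common neighbors (net, prey)
# rather than toward vehicle-only neighbors such as web.
print(cos(v, space["net"]) > cos(v, space["web"]))  # True
```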

Table 4 shows an example of how the Compa algorithm works for the metaphor ‘‘A fisherman is a spider’’ with the parameter θcom = (k) = (3). First, the algorithm finds k common neighbors of the topic fisherman and the vehicle spider. In this example, the first common neighbor prey/bait is found when i = 23 (in other words, Ni(wT) ∩ Ni(wV) first becomes non-empty when i = 23). As a result, three common neighbors prey/bait, net, and fishing are found when i = 67. These common neighbors represent a correspondence between fisherman and spider, in that just as a spider waits for and catches its prey in a net (web), a fisherman waits for and catches fish (as prey) using a net. (Note that the common term fishing means the act of catching fish.) Then in Step 2, the vectors of these common neighbors are averaged with the topic vector such that the resulting metaphor vector is close to both the original topic vector and common neighbors. As shown in Table 4, the resulting metaphor vector is indeed close to the three common vectors, as well as the topic fisherman itself and some of the topic properties, such as fishery and catch landing, which are included in the top 10 nearest neighbors of the metaphor vector. In particular, the common neighbors are highlighted, because their cosines with the metaphor vector are higher than those with the original topic vector. Furthermore, for example, the term wait, which is not initially salient for fisherman, is also highlighted, although it is not ranked among the top 10 nearest neighbors; wait has a higher cosine with the metaphor vector (0.35) than with fisherman (0.24). Taken together, these results mean that the Compa algorithm produces an appropriate metaphor vector that represents an intuitively sensible interpretation that the fisherman's specific property of ‘‘waiting for and catching fish using a net’’ is emphasized by this metaphor.

Table 4. 
An example of the step-by-step behavior of the Compa algorithm in comprehending the metaphor ‘‘A fisherman is a spider’’

| Common Neighbors of fisherman and spider Computed at Step 1 | Cosine with fisherman and Its Rank in Parentheses | Cosine with spider and Its Rank in Parentheses |
|---|---|---|
| prey/bait a | 0.55 (18) | 0.31 (23) |
| net | 0.56 (16) | 0.28 (58) |
| fishing | 0.46 (67) | 0.30 (26) |

| Top 10 Nearest Neighbors of the Metaphor Vector Computed at Step 2 | Cosine with the Metaphor Vector |
|---|---|
| prey/bait | 0.92 |
| net | 0.85 |
| small fish | 0.79 |
| fishing | 0.79 |
| fishery | 0.76 |
| water temperature | 0.75 |
| fisherman | 0.75 |
| angler | 0.72 |
| migration | 0.71 |
| catch landing | 0.70 |

a. The original Japanese word esa means both ‘‘prey’’ and ‘‘bait.’’

This algorithm can be regarded as a simplified model of the comparison process that comprises alignment and projection (Gentner, 1983; Gentner et al., 2001). The computation of common neighbors Ni(wT) ∩ Ni(wV) of topic and vehicle at Step 1 can be reasonably regarded as the alignment process. It is likely that the set of common neighbors includes identical elements found in the early stage of alignment (e.g., the arguments net and prey in the case of ‘‘A fisherman is a spider’’ and the predicates bring and help in the case of ‘‘Socrates is a midwife’’). It must be noted that according to Gentner (1983), the alignment process is governed by the systematicity principle: a system of relations connected by higher order relations is preferred over one with an equal number of independent matches. The Compa algorithm does not explicitly take into account the later stage of alignment in which structurally consistent mappings are derived from local matches according to the systematicity principle; however, it implicitly deals with some aspects of this process. Predicates for higher order relations are likely to be expressed by ambiguous words, and such ambiguous words are likely to be common neighbors because they are similar to many words in a semantic space. These predicates constitute consistent mappings, and as a result, the Compa algorithm seems to prefer higher order relations in the alignment process.

Of course, I do not argue that the Compa algorithm completely embodies the later stage of alignment (and the systematicity principle). This limitation is common to any algorithm built on semantic space models rather than specific to the Compa algorithm, because semantic space models at present lack the ability to represent the relational knowledge of concepts expressed by words (Kintsch, 2008a). However, I do not consider this to be a serious limitation of the Compa algorithm for the present purpose because most empirical findings regarding the debate between comparison and categorization (e.g., Bowdle & Gentner, 2005; Chiappe & Kennedy, 1999; Glucksberg & Haught, 2006b; Jones & Estes, 2006; Utsumi, 2007; Wolff & Gentner, 2000) have been obtained for simple nominal metaphors, and understanding these metaphors does not require many alignments of higher order relations.

On the other hand, the computation of the centroid of k common neighbors and the topic vector in Step 2 can be reasonably regarded as the projection process. Projection is a process of transferring to the topic predicates and arguments connected to the common structure found in the alignment process. As a result, projected predicates and arguments that are included in the aligned structure (i.e., those common to both concepts or unique to the vehicle) are highlighted, whereas the original salient properties of the topic are retained (e.g., Gentner & Bowdle, 2008; Gentner et al., 2001). This can be modeled as the centroid computation of the vectors of k common neighbors and the topic vector, because the centroid of multiple vectors is generally close to those vectors. It implies that the resulting centroid vector is close to the elements in the aligned structure (represented by k common neighbors), as well as to the salient properties of the topic (represented by the topic vector). In fact, as shown in the fisherman–spider example, the elements connected to the common structure, that is, net and prey, which are common to both concepts, and wait, which are unique to the vehicle but not initially salient in the topic, are highlighted by the Compa algorithm. In addition, the original salient properties of the topic, such as fishery, are also included in the nearest neighbors of the metaphor vector.

3.4. Justifying the plausibility of two models

Before presenting the results of the simulation experiment, I must demonstrate the plausibility of the two algorithms Categ and Compa as models of the categorization and comparison processes, so that the simulation result constitutes a valid test of metaphor theories. In this regard, to demonstrate the empirical adequacy of the models (McClelland, 2009), I show that these algorithms are consistent with the following two distinctive processing phenomena: (a) grammatical concordance, that is, the link between the linguistic form of a statement and the comprehension process it invites, and (b) directionality, that is, the asymmetry of metaphors and the processing stage at which it arises.

Psychological studies of metaphor have employed these phenomena as a test to determine whether people comprehend metaphors as comparisons or categorizations and as a means of encouraging them to comprehend metaphors as comparisons or categorizations. Hence, if the algorithms can explain these distinctive phenomena, they are plausible models at least for reproducing the findings obtained in these psychological studies; this is sufficient for the present purpose.

As described in Section 2.3, grammatical concordance refers to the link between linguistic form and function. Literal statements of the form ‘‘An X is a Y’’ are interpreted as categorizations (e.g., ‘‘A whale is a mammal’’), whereas literal statements of the form ‘‘An X is like a Y’’ are interpreted as comparisons (e.g., ‘‘A whale is like a dolphin’’). Therefore, the algorithms Categ and Compa are shown to be plausible models of the categorization and comparison processes if they can produce the sentence vector for literal categorization and comparison sentences that fits our intuitions about the meaning of these sentences. Specifically, the Categ algorithm would be expected to produce intuitively more plausible results for a literal categorization statement ‘‘An X is a Y,’’ whereas the Compa algorithm would produce more plausible results for a literal comparison statement ‘‘An X is like a Y.’’

Directionality is concerned with the asymmetry of metaphors and its processing stage. When the topic and the vehicle of a metaphor are reversed, the resulting statement expresses a different meaning from the original metaphor or, in many cases, does not make sense. For example, reversing the terms of the metaphor ‘‘A rumor is a virus’’ produces a different metaphor ‘‘A virus is a rumor,’’ which seems meaningless. Categorization and comparison processes make a different prediction with regard to when this asymmetry appears during metaphor comprehension. Categorization is initially asymmetrical (or role specific) because a metaphoric category is constructed primarily from the vehicle. On the other hand, the comparison process begins with a symmetrical (i.e., role neutral) alignment process and becomes asymmetrical in the later projection process. Hence, the plausibility of the algorithms can be tested by examining whether and when these algorithms can yield the result that is consistent with the asymmetry of metaphors.

3.4.1. Products of literal comparison and categorization statements

In general, a categorization statement ‘‘An X is a Y’’ is processed so that, owing to default inheritance, the features characterizing Y-ness are highlighted unless they are irrelevant to X, and other salient features of X are downplayed. For example, consider a literal categorization statement ‘‘A whale is a mammal.’’ Our intuition says that this statement modifies our knowledge of the whale by emphasizing its typical mammalian features such as having animal nature and suckling and de-emphasizing its distinctive, whale-specific features that many (land-living) mammals do not have, for example, living in the sea and swimming.

Fig. 1 shows the simulation result of computing the sentence vectors of the literal categorization statement ‘‘A whale is a mammal’’ by the Categ algorithm and the Compa algorithm. Fig. 1 depicts one bar chart and two line graphs; the bar chart shows the cosine similarity between the original vector of topic whale and the vectors for the relevant landmark features, whereas the line graphs show the cosine similarity between the vectors vcat(S) or vcom(S) for the literal categorization statement (S) ‘‘A whale is a mammal’’ and the landmark features.6 Note that, as mentioned in Section 3.1, the cosine similarity between two vectors is used as a measure of semantic relatedness; a higher cosine implies that two words or sentences are semantically more related. Hence, when the vector of a landmark feature has a higher cosine with the sentence vector (i.e., vcat(S) or vcom(S) denoted by line graphs) than with the vector of the topic whale (denoted by bars), the algorithm determines that the sentence emphasizes or highlights that feature.
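This highlight/downplay criterion amounts to a sign test on the change in cosine similarity between a landmark feature and the sentence vector versus the topic vector alone. A minimal sketch with invented vectors:

```python
import numpy as np

def cos(v, w):
    return float(np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w)))

def delta_cosine(sentence_vec, topic_vec, feature_vec):
    """Positive: the sentence highlights the feature relative to the
    topic alone; negative: the sentence downplays it."""
    return cos(sentence_vec, feature_vec) - cos(topic_vec, feature_vec)

# Invented landmark vectors echoing the whale/mammal example.
whale    = np.array([0.8, 0.2, 0.1])
suckle   = np.array([0.2, 0.9, 0.1])   # mammalian feature
sea      = np.array([0.9, 0.1, 0.2])   # whale-specific feature
sentence = np.array([0.5, 0.6, 0.1])   # stand-in for v_cat(S)

print(delta_cosine(sentence, whale, suckle) > 0)  # True: highlighted
print(delta_cosine(sentence, whale, sea) < 0)     # True: downplayed
```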

Figure 1.

 An illustrative example showing that the Categ algorithm generates a more plausible vector than the Compa algorithm for a literal categorization statement: The case of ‘‘A whale is a mammal.’’ The bar chart shows the cosine similarity between the original vector of whale and the vectors for the relevant landmark features. Two line graphs denote the cosine similarity between the vectors vcat(S) or vcom(S) for the literal categorization statement ‘‘A whale is a mammal’’ and the landmark features. It is preferable that mammalian features are more highlighted (i.e., graphs are on the right of the bars) and whale-specific features are more downplayed (i.e., graphs are on the left of the bars). The Categ algorithm increases the cosine similarity of the mammalian features and decreases the cosine similarity of the whale-specific features to a greater extent than the Compa algorithm.

The algorithm Categ mimics the intuition (that the categorization statement emphasizes mammalian features and de-emphasizes whale-specific features) more appropriately than the algorithm Compa, as shown in Fig. 1. First, although the typical mammalian features animal nature and suckle are highlighted by both algorithms, the Categ algorithm highlights these mammalian features to a greater extent than the Compa algorithm; the increase in cosine similarity from the original whale vector (denoted by the gray bars) is greater for the sentence vector vcat(S) computed by the Categ algorithm (denoted by filled circles) than for the sentence vector vcom(S) computed by the Compa algorithm (denoted by filled triangles). The mean increase in cosine similarity for the two mammalian features is 0.27 for the vector vcat(S) and 0.17 for the vector vcom(S). Second, whale-specific but non-mammalian features such as sea and swim are downplayed by the Categ algorithm, but they are not downplayed (or are even somewhat highlighted) by the Compa algorithm; the mean decrease in cosine similarity from the original whale vector is 0.08 for the vector vcat(S) by Categ, whereas the mean change is −0.01 (i.e., a slight increase) for the vector vcom(S) by Compa. Finally, owing to these differences, the sentence vector vcat(S) of ‘‘A whale is a mammal’’ computed by the Categ algorithm behaves like a mammal more appropriately than the vector vcom(S) by the Compa algorithm; the vector vcat(S) by Categ has a higher cosine with the mammalian features than with the whale-specific features, but the vector vcom(S) by Compa inappropriately has a higher cosine with the whale-specific feature sea than with the mammalian feature suckle. These simulation results indicate that the algorithm Categ works better as a model of categorization than the algorithm Compa.7

In contrast, it is reasonable to assume that people comprehend a literal comparison statement ‘‘An X is like a Y,’’ such that only the common features shared by X and Y are highlighted without other Y-ness features being highlighted. For example, in comprehending ‘‘A whale is like a dolphin,’’ people would try to seek commonality between whale and dolphin and arrive at the interpretation in which common features (e.g., living in the sea, swimming) are emphasized but dolphin-specific features (e.g., therapy, intelligence) are not highlighted. Fig. 2 shows that such a pattern of interpretation can be replicated more appropriately by the algorithm Compa than by the algorithm Categ. (Note that the bar chart of Fig. 2 represents the cosine between the whale vector and feature vectors, and two graphs represent the cosines between the sentence vector vcat(S) or vcom(S) for the literal comparison statement (S) ‘‘A whale is like a dolphin’’ and feature vectors.) First, the common features sea and swim are highlighted by the Compa algorithm (i.e., they are closer to the sentence vector vcom(S) than to the original whale vector), but the Categ algorithm undesirably downplays the common feature sea. Moreover, the increase in the cosine similarity of the feature swim is greater for the Compa algorithm than for the Categ algorithm. Second, the unshared dolphin-specific features therapy and intelligence are less highlighted by the Compa algorithm than by the Categ algorithm. The mean increase in cosine similarity of the two dolphin-specific features is 0.06 for the vector vcom(S) and is smaller than the mean increase of 0.21 for the vector vcat(S). Finally, as a result, the Compa algorithm generates an intuitively plausible sentence vector that is more similar (and thus, semantically more related) to the common features than to the dolphin-specific features. 
The Categ algorithm, however, does not generate such a plausible vector; the generated sentence vector is less similar to the common features than to the dolphin-specific feature therapy. These simulation results indicate that the Compa algorithm works better as a model of comparison than the Categ algorithm.

Figure 2.

 An illustrative example showing that the Compa algorithm generates a more plausible vector than the Categ algorithm for a literal comparison statement: The case of ‘‘A whale is like a dolphin.’’ The bar chart shows the cosine similarity between the original vector of whale and the vectors for the relevant landmark features. Two line graphs denote the cosine similarity between the vectors vcat(S) or vcom(S) for the literal comparison statement ‘‘A whale is like a dolphin’’ and the landmark features. It is preferable that the common features are more highlighted (i.e., graphs are on the right of the bars) and dolphin-specific features are not highlighted (i.e., graphs are located near the bars). The Compa algorithm increases the cosine similarity of the common features to a greater extent and the cosine similarity of the dolphin-specific features to a lesser extent than the Categ algorithm.

3.4.2. Directionality in the metaphor comprehension process

In this section, I test the plausibility of the algorithms Categ and Compa by showing whether and when these algorithms can yield the asymmetry of metaphors.

Concerning whether the algorithms Categ and Compa are consistent with the asymmetry of metaphors, they can indeed compute different meanings (i.e., vectors) for an original metaphor and its reversed metaphor, as shown in Tables 5 and 6. Table 5 illustrates that the sentence vector of ‘‘A rumor is a virus’’ computed by the Categ algorithm (with the parameters θcat = (m,k) = (500,5)) highlights the feature contagion that is typical of virus (i.e., contagion has higher cosine with ‘‘A rumor is a virus’’ than with rumor alone, which is shown by ΔCosine = 0.60). Further, this sentence vector downplays scandal that is typical of rumor but irrelevant to the metaphor (i.e., scandal has lower cosine with ‘‘A rumor is a virus’’ than with rumor, which is shown by ΔCosine = −0.22). On the other hand, the vector for ‘‘A virus is a rumor’’ shows a different result; it highlights the feature scandal (ΔCosine = 0.19) and downplays the feature contagion (ΔCosine = −0.42). Although the cosine similarity of contagion is still higher than the cosine of scandal, it may reflect the intuition that the reversed metaphor ‘‘A virus is a rumor’’ does not make sense. This meaningless metaphor cannot appropriately describe the relevant features of a virus, and thus, the originally salient features of a virus may be still salient in the metaphor. Likewise, Table 6 shows that the Compa algorithm (with the parameter θcom = (k) = (3)) reflects the asymmetry of the metaphor; two metaphor vectors differ in that the vector for ‘‘Deserts are ovens’’ highlights the two features dry and dish, whereas the vector for ‘‘Ovens are deserts’’ highlights different features dry and vast. Note that the Compa algorithm may highlight common properties (e.g., dry in this case) shared by the topic and the vehicle regardless of their order. 
This tendency is consistent with the existing findings that reversed similes (i.e., reversed figurative comparisons) preserved the original interpretation better and lowered the comprehensibility to a lesser extent than the reversed metaphors (Chiappe, Kennedy, & Smykowski, 2003).

Table 5. 
Asymmetry between the metaphor ‘‘A rumor is a virus’’ and its reversed metaphor ‘‘A virus is a rumor’’ generated by the Categ algorithm

| | contagion: Cosine | contagion: ΔCosine | scandal: Cosine | scandal: ΔCosine |
|---|---|---|---|---|
| Rumor | 0.00 | | 0.32 | |
| Virus | 0.94 | | 0.12 | |
| A rumor is a virus | 0.60 | 0.60 | 0.10 | −0.22 |
| A virus is a rumor | 0.52 | −0.42 | 0.31 | 0.19 |

Notes. Cosine denotes the cosine similarity to the two landmarks contagion and scandal. ΔCosine denotes the increase in cosine similarity by metaphorization, which is equal to (Cosine of the metaphor) − (Cosine of the topic alone).
Table 6. 
Asymmetry between the metaphor ‘‘Deserts are ovens’’ and its reversed metaphor ‘‘Ovens are deserts’’ generated by the Compa algorithm

| | dry: Cosine | dry: ΔCosine | vast: Cosine | vast: ΔCosine | dish: Cosine | dish: ΔCosine |
|---|---|---|---|---|---|---|
| Deserts | 0.28 | | 0.56 | | −0.02 | |
| Ovens | 0.35 | | −0.02 | | 0.70 | |
| Deserts are ovens | 0.78 | 0.50 | 0.41 | −0.15 | 0.29 | 0.31 |
| Ovens are deserts | 0.78 | 0.43 | 0.37 | 0.39 | 0.32 | −0.38 |

Notes. Cosine denotes the cosine similarity to the three landmarks. ΔCosine denotes the increase in cosine similarity by metaphorization, which is equal to (Cosine of the metaphor) − (Cosine of the topic alone).

With regard to when the algorithms Categ and Compa generate the asymmetry of metaphors, they differ in the stage at which directionality arises during computation, just as the categorization and comparison processes differ as to when the asymmetry appears during metaphor comprehension. In general, the algorithm Categ is asymmetrical from the outset, in the same way as the categorization process. The first step (Step 1) of the algorithm Categ computes a set of m neighbors of the vehicle, that is, the set of Y's neighbors for the original metaphor ‘‘An X is a Y’’ and the set of X's neighbors for the reversed metaphor ‘‘A Y is an X.’’ It is very likely that these two sets are not only different but also overlap only slightly, unless X and Y have very similar vectors. Particularly in the case of metaphors, X and Y are not semantically similar, and thus, it is much less likely that the two sets of neighbors would be identical; in many cases, they do not overlap at all (especially when the number of vehicle neighbors m is small). As a result, the second step (Step 2) chooses different sets of k neighbors for the original metaphor and its reversed metaphor. For example, Table 7 lists 20 neighbors of the vehicle for the metaphors ‘‘A rumor is a virus’’ and ‘‘A virus is a rumor’’ computed in Step 1 of the Categ algorithm. In this example, the two sets of vehicle neighbors do not overlap. Hence, when m ≤ 20, the sets of k neighbors chosen at Step 2 are inevitably different. Even in the case of m = 500 and k = 5, the sets of vehicle neighbors N500(virus) and N500(rumor) share only two words (i.e., doubt, trigger). These two common words are chosen at Step 2 for both metaphors, but the other three words differ between them.
The words recency, spread/get about, and guess are chosen for ‘‘A rumor is a virus,’’ as shown in Table 3, whereas fear, topic, and emerge/show up are chosen for the reversed metaphor ‘‘A virus is a rumor.’’ Furthermore, when the reversed versions of all the metaphors used in the simulation of Section 4 are computed by the Categ algorithm with the optimal parameters, none of the reversed metaphors produce the same set of m neighbors in Step 1 and the same set of k neighbors in Step 2 as the original metaphors. These results show that the Categ algorithm is basically asymmetrical from the beginning.

Table 7.
Twenty neighbors of the vehicle computed at Step 1 of the Categ algorithm in comprehending the metaphor ‘‘A rumor is a virus’’ and its reversed metaphor ‘‘A virus is a rumor’’

‘‘A rumor is a virus’’        ‘‘A virus is a rumor’’
(neighbors of virus)          (neighbors of rumor)
contagion                     disclosure
fungus                        prosecutor
tolerance                     expose
disease onset                 report
tuberculosis                  conjecture
bacteria                      lady
antibiotic                    surprised
heated                        trouble
drug disaster                 public prosecutors office
blood sampling                aide
blood donation                resignation
administration                scandal
prevention                    tale
take effect                   fact
blood transfusion             business trip
blood                         reveal
immunity                      monthly
vaccine                       illegitimate child
chronic                       mass media
side-effect                   disavow

Note. The words are listed in descending order of cosine similarity to the vehicle.

On the other hand, following the same steps as the comparison process, the algorithm Compa is initially symmetrical and becomes asymmetrical later. The first step (Step 1) of the algorithm computes k common neighbors of the topic and the vehicle, and thus it is obviously symmetrical; the Compa algorithm computes the same set of common neighbors for the reversed metaphor as for the original metaphor. The second step (Step 2) computes the centroid of the topic and the k common words as the metaphor vector, and the topic X of the original metaphor ‘‘An X is a Y’’ differs from the topic Y of the reversed metaphor ‘‘A Y is an X.’’ Hence, Step 2 produces different metaphor vectors when the word order is reversed, which means that Step 2 is asymmetrical.
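The two algorithms, as described above, can be sketched in code. This is a minimal illustration, not the article's actual implementation: the semantic space is a plain dict of NumPy vectors, Step 2 of Categ is assumed here to pick, among the m vehicle neighbors, the k words most similar to the topic, the Categ metaphor vector is assumed to be the centroid of the topic, the vehicle, and those k words, and the ‘‘k common neighbors’’ of Compa are approximated as the k words maximizing the smaller of their two cosines to the topic and the vehicle.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def neighbors(space, word, m):
    """The m words (other than `word`) with the highest cosine to `word`."""
    cand = [w for w in space if w != word]
    return sorted(cand, key=lambda w: cosine(space[w], space[word]),
                  reverse=True)[:m]

def categ(space, topic, vehicle, m, k):
    """Categorization sketch: Step 1 takes m neighbors of the vehicle
    (asymmetrical from the start); Step 2 keeps the k of them most
    similar to the topic (assumed criterion); the metaphor vector is
    assumed to be the centroid of topic, vehicle, and those k words."""
    step1 = neighbors(space, vehicle, m)
    step2 = sorted(step1, key=lambda w: cosine(space[w], space[topic]),
                   reverse=True)[:k]
    vecs = [space[topic], space[vehicle]] + [space[w] for w in step2]
    return np.mean(vecs, axis=0)

def compa(space, topic, vehicle, k):
    """Comparison sketch: Step 1 takes k common neighbors of topic and
    vehicle (symmetrical; approximated by the smaller of the two cosines);
    Step 2 returns the centroid of the topic and those k words, which is
    asymmetrical because it depends on which word is the topic."""
    cand = [w for w in space if w not in (topic, vehicle)]
    step1 = sorted(cand,
                   key=lambda w: min(cosine(space[w], space[topic]),
                                     cosine(space[w], space[vehicle])),
                   reverse=True)[:k]
    vecs = [space[topic]] + [space[w] for w in step1]
    return np.mean(vecs, axis=0)
```

On a toy space, Compa's Step 1 returns the same common-neighbor set for a metaphor and its reversal, while the Step 2 centroid differs, mirroring the late asymmetry described above.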

Note that, in almost all cases (particularly in the case of a small m), the k common neighbors computed at Step 1 of the Compa algorithm differ from the k neighbors computed at Step 2 of the Categ algorithm. Words that are highly similar to both the vehicle and the topic are likely to be included in both sets of k neighbors (i.e., k neighbors computed by Compa and those computed by Categ); however, such words are quite rare, given that the topic and the vehicle of metaphors are usually semantically dissimilar. Hence, most of the vehicle neighbors are not neighbors of the topic.

To sum up the discussion, both algorithms can produce the asymmetry of metaphor. Furthermore, the Categ algorithm behaves like categorization in that the computation is asymmetrical from the beginning, whereas the Compa algorithm behaves like comparison in that the computation is initially symmetrical and asymmetrical later. This consistency strengthens the plausibility of the algorithms as a model of categorization and comparison.

4. Simulation experiment

This section presents the details and the results of the simulation experiment comprising model selection and theory testing. The overall procedure of model selection and theory testing is summarized as follows and is illustrated in Fig. 3.

Figure 3.

 An illustration of the model selection and theory testing procedure. The numbers in parentheses denote the section in which the corresponding procedure is explained.

  • 1 Forty Japanese metaphors of the form ‘‘An X is a Y’’ were used for the simulation experiment, as listed in Table 8. The human interpretation data (i.e., a list of meanings W(M) and its salience distribution p in Fig. 3) of these metaphors, their ratings of vehicle conventionality and metaphor aptness, and their interpretive diversity values were obtained beforehand in a previous experiment (Utsumi, 2007). Section 4.1 and Appendix A explain how these data were obtained.
  • 2 For each of the 40 metaphors, the optimal parameters θ̂cat and θ̂com of the two algorithms Categ and Compa were estimated by the maximum likelihood method as follows.
    • (a) For given parameter values θ, an algorithm (Categ or Compa) computed the similarity distribution q(θ) (i.e., qcat(θcat) or qcom(θcom) in Fig. 3) for the list of meanings W(M).8 The method for computing the similarity distribution is described in Section 4.2.
    • (b) Kullback–Leibler divergence D(p || q(θ)) (henceforth, KL-divergence) between the computed similarity distribution q(θ) and the salience distribution p is computed as a measure of the match between the model and data. KL-divergence and its relation to the maximum likelihood method will be described in Section 4.3.
    • (c) The optimal parameter θ̂ (i.e., θ̂cat or θ̂com) is obtained by finding the parameter values that minimize the KL-divergence, as described in Section 4.4.
  • 3 For each metaphor, the two algorithms (i.e., models) were compared using Akaike's information criterion (henceforth, AIC), which is a measure of statistical model selection considering the tradeoff between the model's precision (i.e., the maximum log-likelihood for the model computed by the minimum KL-divergence) and complexity (i.e., the number of free parameters of the model). The model with a smaller AIC is selected as the best one. Hence, if the AIC of the Categ algorithm (denoted by AICcat in Fig. 3) is smaller than the AIC of the Compa algorithm (denoted by AICcom), the categorization model is selected as the best one. Likewise, if AICcat is greater than AICcom, the comparison model is selected as the best one. The details of AIC and the result of model selection are described in Section 4.5.
  • 4 For each metaphor and its selected model, whose optimal similarity distribution is denoted by q(θ̂) in Fig. 3, the goodness-of-fit between the model and data is tested using a chi-square test, owing to the well-known fact that KL-divergence can be approximated by chi-square. Metaphors that exhibit a significant discrepancy between the model and data are excluded from the subsequent analysis. The goodness-of-fit test is described in Section 4.6.
  • 5 A linear discriminant analysis is conducted with the selected model (i.e., categorization or comparison) as the dependent variable and three metaphor properties (vehicle conventionality, metaphor aptness, and interpretive diversity) as the independent variables. The method and the result of the discriminant analysis are described in Section 4.7.
Table 8. 
Metaphors used in the simulation experiment
  1. Note. The original Japanese expressions used in the experiment are shown in parentheses, preceded by their literal English translations.

 1. Life is a journey. (Jinsei ha tabi da)
 2. Life is a game. (Jinsei ha ge-mu da)
 3. Love is a journey. (Ai ha tabi da)
 4. Love is a game. (Ai ha ge-mu da)
 5. Anger is the sea. (Ikari ha umi da)
 6. Anger is a storm. (Ikari ha arashi da)
 7. Sleep is the sea. (Nemuri ha umi da)
 8. Sleep is a storm. (Nemuri ha arashi da)
 9. Perfume is a bouquet. (Ko-sui ha hanataba da)
10. Perfume is ice. (Ko-sui ha koori da)
11. A star is a bouquet. (Hoshi ha hanataba da)
12. A star is ice. (Hoshi ha koori da)
13. A sky is a mirror. (Sora ha kagami da)
14. A sky is a lake. (Sora ha mizuumi da)
15. An eye is a mirror. (Me ha kagami da)
16. An eye is a lake. (Me ha mizuumi da)
17. A lover is the sun. (Koibito ha taiyo da)
18. A lover is a rainbow. (Koibito ha niji da)
19. One's hope is the sun. (Kibou ha taiyo da)
20. One's hope is a rainbow. (Kibou ha niji da)
21. A child is water. (Kodomo ha mizu da)
22. A child is a jewel. (Kodomo ha houseki da)
23. Words are water. (Kotoba ha mizu da)
24. Words are jewels. (Kotoba ha houseki da)
25. An elderly person is a doll. (Roujin ha ningyou da)
26. An elderly person is a deadwood. (Roujin ha kareki da)
27. One's voice is a doll. (Koe ha ningyou da)
28. One's voice is a deadwood. (Koe ha kareki da)
29. One's character is fire. (Seikaku ha hi da)
30. One's character is a stone. (Seikaku ha ishi da)
31. A marriage is fire. (Kekkon ha hi da)
32. A marriage is a stone. (Kekkon ha ishi da)
33. Death is the night. (Shi ha yoru da)
34. Death is the fog. (Shi ha kiri da)
35. Anxiety is the night. (Fuan ha yoru da)
36. Anxiety is the fog. (Fuan ha kiri da)
37. Time is money. (Jikan ha okane da)
38. Time is an arrow. (Jikan ha ya da)
39. Memory is money. (Kioku ha okane da)
40. Memory is an arrow. (Kioku ha ya da)

4.1. Metaphors, human interpretation data, and metaphor properties

This study employed 40 metaphors, as shown in Table 8. They were created from 10 groups, each of which comprised two topic words and two vehicle words. For each group, four metaphors were created from all possible pairings of the two topic words with the two vehicle words. For example, from the two topics, anger and sleep, and the two vehicles, sea and storm, the following four metaphors were created: ‘‘Anger is the sea,’’ ‘‘Anger is a storm,’’ ‘‘Sleep is the sea,’’ and ‘‘Sleep is a storm.’’ Topic and vehicle words were selected from an experimental study on Japanese metaphor and a list of words frequently used in Japanese metaphors so that they would be highly frequent and familiar.

For human interpretation data of metaphors, this study employed the results of the psychological experiment (Experiment 2) that Utsumi (2007) conducted using the same 40 metaphors. This experiment addressed the difference in comprehensibility between the metaphor form and the simile form of a topic–vehicle pair and demonstrated that, among the three hybrid metaphor theories, the interpretive diversity view best explained the observed comprehensibility difference. This study used some of the results obtained in this experiment, namely, the listed meanings of the metaphors (with the number of participants who listed each meaning), ratings of vehicle conventionality and metaphor aptness, and interpretive diversity values. A detailed procedure for obtaining these results is provided in Appendix A.

For each metaphor M, a list W(M) of metaphorical meanings wi, together with the number of participants xi who listed each meaning, was provided by Utsumi's (2007) experiment. These meanings were used as landmarks with respect to which the computational model's interpretation and the human interpretation were compared for evaluation. Note that in the experiment, participants were instructed to write down three or more meanings, using single words wherever possible; as a result, the final list of meanings included only single words, which corresponds to the unit of representation of the semantic space model. For example, the list of meanings for the metaphor ‘‘Anger is the sea’’ includes eight features, such as fearful/dreadful, rage/stormy, and deep, as shown in Fig. 4.

Figure 4.

 Simulation results for the metaphor ‘‘Anger is the sea.’’ The bar chart indicates the degree of salience pi of the human interpretation. The two line graphs indicate the normalized degree of similarity qi,cat(θ̂cat) or qi,com(θ̂com) computed by the Categ or Compa algorithm. The closer a line graph is to the bar chart, the better the match between its corresponding model and the human data. For six of the eight features (i.e., fearful/dreadful, rage/stormy, deep, surge, wave, strong), the normalized degree of similarity qi,com(θ̂com) computed by the Compa algorithm is closer to the human data p than the degree of similarity qi,cat(θ̂cat) computed by the Categ algorithm. This result indicates that the metaphor ‘‘Anger is the sea’’ is comprehended by the comparison process.

Using these data, the degree of salience pi for each meaning wi in the list W(M), where n = |W(M)|, is defined as the ratio of the number of participants xi to the total number of tokens N:

pi = xi / N,  where N = Σj xj (j = 1, …, n)   (1)

This definition of salience is almost identical to Smith, Osherson, Rips, and Keane's (1988) definition of salience for the prototype representation of concepts, and it reflects the subjective frequency with which a feature (i.e., meaning) occurs in people's interpretations of the metaphor. This definition is psychologically plausible because it has been pointed out that frequency is closely related to salience (Giora, 2003; Tversky, 1977). For example, the bar chart of Fig. 4 indicates the degrees of salience of the eight features that the participants listed as meanings of the metaphor ‘‘Anger is the sea.’’ The meaning fearful/dreadful had the highest salience of 0.25, indicating that the number of participants who listed it was the largest.
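Equation 1 amounts to a one-line computation. A minimal sketch, assuming the listing data are given as a dict mapping each listed meaning to the number of participants who produced it (the counts below are made up for illustration, chosen so that the most frequent meaning reproduces the 0.25 reported for fearful/dreadful):

```python
def salience(counts):
    """Degree of salience p_i = x_i / N (Eq. 1), where x_i is the number
    of participants who listed meaning w_i and N is the total number of
    listed tokens."""
    N = sum(counts.values())
    return {w: x / N for w, x in counts.items()}

# Hypothetical counts: 10 of 40 listed tokens were "fearful/dreadful".
p = salience({"fearful/dreadful": 10, "rage/stormy": 8, "deep": 6,
              "surge": 5, "wave": 4, "strong": 3, "wide": 2, "calm": 2})
```

By construction, the salience values form a probability distribution over the listed meanings.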

The interpretive diversity of each metaphor M was calculated using Shannon's entropy H(p), defined by the following formula (Utsumi, 2005, 2007):

H(p) = −Σi pi log2 pi (i = 1, …, n)   (2)

For example, the interpretive diversity of the metaphor ‘‘Anger is the sea’’ in Fig. 4 was calculated as 2.71, given that the bar length for a feature wi corresponds to pi. The mean interpretive diversity across the 40 metaphors used in the experiment was 3.01 (SD = 0.42), ranging from 2.09 (‘‘One's character is a stone,’’ numbered 30 in Table 8) to 3.76 (‘‘An eye is a mirror,’’ numbered 15).
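Interpretive diversity is then simply the entropy of the salience distribution. A small sketch, assuming the base-2 logarithm (consistent with the reported value of 2.71 for eight features, whose maximum possible entropy is log2 8 = 3):

```python
import math

def interpretive_diversity(p):
    """Shannon entropy H(p) = -sum_i p_i * log2(p_i) (Eq. 2), in bits.
    `p` maps each listed meaning to its degree of salience."""
    return -sum(pi * math.log2(pi) for pi in p.values() if pi > 0)
```

A uniform distribution over eight features yields exactly 3 bits; any less even distribution over the same features yields less, which is the sense in which low-diversity metaphors concentrate their interpretation on a few meanings.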

For vehicle conventionality and metaphor aptness, this study used the mean ratings for each metaphor obtained in the previous experiment (Utsumi, 2007). The metaphors were rated on a 7-point scale ranging from 1 (very novel) to 7 (very conventional) for conventionality, or from 1 (not at all apt) to 7 (extremely apt) for aptness. The details of the rating experiment are provided in the Appendix A. The mean conventionality rating across the 40 metaphors was 4.46 (SD = 1.19), ranging from 1.83 (‘‘Memory is an arrow,’’ numbered 40 in Table 8) to 6.28 (‘‘A lover is the sun,’’ numbered 17). The mean aptness rating was 3.70 (SD = 1.07), ranging from 1.83 (‘‘One's voice is a doll,’’ numbered 27) to 6.00 (‘‘Life is a journey,’’ numbered 1).

4.2. Computer interpretation

To generate a semantic space for computer simulation, a term–paragraph matrix A was constructed from a Japanese corpus of 523,249 paragraphs containing 62,712 different words, which were derived from a CD-ROM of Mainichi newspaper articles published in 1999. The dimension of the row vectors of A was then reduced using SVD. The number of dimensions D of the semantic space was determined to be 300 because a 300-dimensional space usually yields the best performance for simulating human behavior (e.g., Kintsch, 2001; Landauer & Dumais, 1997).
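The space construction can be sketched as follows. This is a generic LSA-style reduction, not the exact preprocessing used for the Mainichi corpus; in particular, any term weighting applied before the SVD is omitted here.

```python
import numpy as np

def build_semantic_space(A, D=300):
    """Reduce a term-by-paragraph matrix A to D dimensions via SVD.
    Each row of the result is the D-dimensional vector of one word
    (the corresponding row of U_D, scaled by the singular values)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    D = min(D, len(s))
    return U[:, :D] * s[:D]
```

For a real 62,712 × 523,249 sparse matrix, one would use a sparse truncated SVD (e.g., `scipy.sparse.linalg.svds`) rather than this dense decomposition; the dense version suffices to illustrate the construction.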

Using the constructed semantic space, the metaphorical interpretation (i.e., similarity distribution q(θ)) of a metaphor M was computed as follows:

  • 1 For a given list of parameter values θ (i.e., θcat or θcom presented in Section 3.3), an algorithm (Categ or Compa) computed the metaphor vectors v(M;θ) (i.e., vcat(M;θcat) or vcom(M;θcom)).
  • 2 For each of the features wi listed for a metaphor M, the cosine similarity  cos (v(wi),v(M;θ)) between the feature wi and the metaphor vector v(M;θ) was computed. Features with higher cosine similarity to the metaphor vector were more appropriate as a metaphorical meaning, or in other words, more relevant to the metaphorical interpretation.
  • 3 Finally, the similarity distribution q(θ) (i.e., qcat(θcat) or qcom(θcom)) was calculated by the following formulas:

qi(θ) = di(M;θ) / Σj dj(M;θ) (j = 1, …, n)   (3)

di(M;θ) = cos(v(wi), v(M;θ)) − minx∈Ω{cos(v(x), v(M;θ))}   (4)

In Eq. 4, Ω denotes the set of all words in the semantic space, and thus, minx∈Ω{cos(v(x), v(M;θ))} denotes the smallest (minimum) cosine value between the metaphor vector v(M;θ) and any word vector v(x) in the semantic space. Therefore, di(M;θ) expresses the deviation of wi's cosine similarity from the minimum cosine. Equation 3 shows that the normalized degree of similarity qi(θ) for the feature wi is defined as the ratio of this deviation of cosine, so that it is analogous to the degree of salience p defined in Eq. 1. The reason for using the deviation of cosine similarity instead of the cosine similarity itself is that cosine can take negative values, and thus, the absolute cosine value does not necessarily reflect the degree of similarity; for example, a zero cosine does not imply that there is no similarity.
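Equations 3 and 4 can be sketched directly. Here, `feature_vecs` are the vectors of the listed meanings and `space_vecs` stands in for all word vectors in the space (just a handful in the toy example, for illustration):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def similarity_distribution(metaphor_vec, feature_vecs, space_vecs):
    """q_i (Eq. 3): each feature's cosine deviation from the minimum
    cosine over all words in the space (Eq. 4), renormalized to sum
    to one so that it is comparable to the salience distribution p."""
    min_cos = min(cosine(v, metaphor_vec) for v in space_vecs)
    d = [cosine(v, metaphor_vec) - min_cos for v in feature_vecs]
    total = sum(d)
    return [di / total for di in d]
```

Because every deviation is measured from the space-wide minimum cosine, all qi(θ) are nonnegative even when some raw cosines are negative.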

For instance, the two line graphs in Fig. 4 illustrate the similarity distribution computed using two metaphor vectors for the metaphor ‘‘Anger is the sea.’’ The solid line with filled circles depicts the similarity distribution qcat(θcat) computed by the categorization algorithm Categ with θcat = (m,k) = (10,7), and the dotted line with filled triangles depicts the similarity distribution qcom(θcom) computed by the comparison algorithm Compa with θcom = (k) = (1). This figure shows, for example, that the normalized degree of similarity qi(θ) for the feature fearful/dreadful is qi,cat(θcat) = 0.157 for the Categ algorithm and qi,com(θcom) = 0.247 for the Compa algorithm.

4.3. Assessing the match between model and data

As mentioned in the beginning of this section, the match between the model and data can be assessed as the degree of similarity between the computed distribution q(θ) and the human salience distribution p. The greater the similarity between the two distributions, the better is the algorithm's simulation of human interpretation.

To quantitatively evaluate similarity or dissimilarity between the two distributions, this study used KL-divergence, which is also known as relative entropy. KL-divergence is the most popular measure of dissimilarity between two probability distributions and has been applied in computational semantics as a semantic similarity measure (e.g., Hu et al., 2006; Manning & Schütze, 1999). The KL-divergence D(p || q(θ)) of the computed similarity distribution q(θ) relative to human salience distribution p is given by the following formula:

D(p || q(θ)) = Σi pi log (pi / qi(θ))   (5)

As it measures how badly the model's distribution q(θ) approximates the observed distribution p, a lower divergence implies better performance.

Minimizing the KL-divergence is equivalent to maximizing the likelihood function (Bishop, 2006), because Eq. 5 can be written as:

D(p || q(θ)) = −H(p) − Σi pi log qi(θ) = −H(p) − (1/N) log L(θ)   (6)

The first term −H(p) on the right-hand side of Eq. 6 is the (negative) interpretive diversity defined by Eq. 2 and is independent of θ; the second term is the (negative) log-likelihood function log L(θ) = Σi xi log qi(θ) (divided by N) for θ under the distribution q(θ). Therefore, minimizing D(p || q(θ)) is equivalent to maximizing the log-likelihood function.
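The relation between Eqs. 5 and 6 can be checked numerically. A sketch using the natural logarithm throughout (whatever base is chosen, it must be the same in H(p) and the log-likelihood for the identity to hold):

```python
import math

def kl_divergence(p, q):
    """D(p || q) = sum_i p_i * log(p_i / q_i) (Eq. 5)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def log_likelihood(x, q):
    """log L(theta) = sum_i x_i * log q_i for observed counts x_i."""
    return sum(xi * math.log(qi) for xi, qi in zip(x, q))
```

With pi = xi/N, D(p || q) equals −H(p) − (1/N) log L, so minimizing the divergence and maximizing the likelihood select the same parameter values.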

4.4. Estimating optimal parameters by maximum likelihood method

For each metaphor, optimal parameters inline image and inline image of the two algorithms (i.e., categorization and comparison models) were computed by finding the parameter values that minimize the KL-divergence, that is, maximize the likelihood function. The parameter space was given such that the parameter m varied between 10 and 45 in steps of 5 and between 50 and 500 in steps of 50, and the parameter k varied between 1 and 10.

In the case of the metaphor ‘‘Anger is the sea,’’ for example, the parameter θcat = (m,k) = (10,7) minimized the KL-divergence of the categorization model (i.e., Categ algorithm), and the parameter θcom = (k) = (1) minimized the KL-divergence of the comparison model (i.e., Compa algorithm). (Two line graphs in Fig. 4 show the similarity distributions at these optimal parameters.) The minimum KL-divergence was 0.147 for the categorization model and 0.0854 for the comparison model. These KL-divergence values indicate that the similarity distribution computed by the comparison model (i.e., the dotted line with filled triangles) is more similar to the human salience distribution (i.e., bar chart) than the similarity distribution computed by the categorization model (i.e., the solid line with filled circles).

4.5. Selecting the best comprehension model by AIC

To compare the two models while considering the tradeoff between the model's precision and complexity, this study used AIC, which has been widely used as a tool for statistical model selection (e.g., Wagenmakers & Farrell, 2004). In general, AIC is given by:

AIC = −2 log L̂ + 2K   (7)

where L̂ is the maximum value of the likelihood function for the model, and K is the number of free parameters in the model. Smaller AIC values represent more plausible models. Hence, the model with the smallest AIC can be selected as the best one.

In this study, the AIC value can be calculated by:

AIC = 2N{D(p || q(θ̂)) + H(p)} + 2K   (8)

where K = 2 for the categorization model (because the algorithm Categ has two parameters m and k), and K = 1 for the comparison model (because the algorithm Compa has only one parameter k). For each metaphor, ΔAIC was calculated as the difference between the AIC value of the categorization model (AICcat) and the AIC value of the comparison model (AICcom).

ΔAIC = AICcat − AICcom   (9)

If ΔAIC > 0 (i.e., AICcom is less than AICcat), the comparison model (Compa) was selected as the one that best approximated the underlying comprehension process; conversely, if ΔAIC < 0 (i.e., AICcat is less than AICcom), then the categorization model (Categ) was selected.
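Equations 8 and 9 translate directly into code. A sketch; note that because H(p) is identical for both models, ΔAIC reduces to 2N(Dcat − Dcom) + 2(Kcat − Kcom), so the entropy term never affects the selection:

```python
def aic(min_kl, entropy, N, K):
    """AIC = 2N{D(p || q(theta_hat)) + H(p)} + 2K (Eq. 8); H(p) must
    use the same log base as the KL-divergence."""
    return 2 * N * (min_kl + entropy) + 2 * K

def select_model(kl_cat, kl_com, entropy, N):
    """Pick the model with the smaller AIC (Eq. 9): Categ has K = 2
    parameters (m, k), while Compa has K = 1 parameter (k)."""
    delta = aic(kl_cat, entropy, N, K=2) - aic(kl_com, entropy, N, K=1)
    return ("comparison" if delta > 0 else "categorization"), delta
```

For ‘‘Anger is the sea’’ (Dcat = 0.147, Dcom = 0.0854, N = 40), ΔAIC = 80 × 0.0616 + 2 ≈ 6.93, which is positive, so the comparison model is selected, in line with the values reported below (small differences reflect rounding of the reported divergences).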

The result of the model selection for all 40 metaphors was that the categorization model was selected for 11 metaphors (the metaphors numbered 3, 17, 18, 25, 28, 31, 32, 33, 37, 38, and 39 in Table 8), and the comparison model was selected for the remaining 29 metaphors. The mean ΔAIC for the 11 metaphors judged to be comprehended as categorizations was −2.71 (SD = 2.64), and that for the 29 metaphors judged to be comprehended as comparisons was 1.87 (SD = 1.38). For example, in the case of ‘‘Anger is the sea’’ in Fig. 4, the AIC value of the categorization model was 165.98, and that of the comparison model was 159.07. Because ΔAIC = 6.91 was positive, the comparison model was selected as the best model, suggesting that the metaphor ‘‘Anger is the sea’’ is likely to be comprehended as a comparison rather than as a categorization. Indeed, Fig. 4 shows that for six of the eight features, the normalized degree of similarity qi,com(θ̂com) computed by the comparison model is closer to the human data than the degree of similarity qi,cat(θ̂cat) computed by the categorization model. Furthermore, the comparison model correctly distinguishes the three most salient features (i.e., fearful, rage, and deep) from the other, less salient features by the computed degree of similarity, whereas the categorization model does not.

4.6. Testing the goodness-of-fit between data and model

For each metaphor and its selected model, the chi-square goodness-of-fit test was conducted to examine whether the match between the model and data was significant. Metaphors that displayed a significant discrepancy between the model and data would be excluded from the subsequent analysis. (Note that the goodness-of-fit test was not applied to the model that was not selected by the AIC model selection procedure because the goodness-of-fit for the model not selected has no influence on the subsequent analysis.)

It is well known that the KL-divergence can be approximated by chi-square (divided by 2N) because it is identical up to the third order (Cover & Thomas, 2006).

D(p || q(θ)) ≈ χ²/(2N),  where χ² = Σi (xi − Nqi(θ))² / (Nqi(θ))   (10)

Hence, the discrepancy between the model distribution and the observed human distribution is significant (i.e., the null hypothesis that the data distribution follows the model distribution is rejected) if the KL-divergence multiplied by 2N exceeds the critical value of the chi-square distribution with n − 1 degrees of freedom.

2N D(p || q(θ̂)) > χ²n−1(α)   (11)

The result of the goodness-of-fit test for all the metaphors was that none of the fits of the selected models to the data were rejected at a significance level of α = 0.05; thus, all 40 metaphors were included in the subsequent analysis. For example, in the case of the metaphor ‘‘Anger is the sea’’ in Fig. 4, the selected model (i.e., comparison) was accepted as a good fit to the human data; the minimum KL-divergence (= 0.0854) multiplied by 2N (= 2 × 40) equals 6.832, which did not exceed the critical value 14.07 (d.f. = 7).
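The test in Eqs. 10 and 11 can be sketched with SciPy's chi-square quantile function:

```python
from scipy.stats import chi2

def goodness_of_fit(min_kl, N, n, alpha=0.05):
    """Reject the fit if 2N * D(p || q(theta_hat)) exceeds the critical
    value of the chi-square distribution with n - 1 degrees of freedom
    (Eqs. 10-11). Returns (statistic, critical value, rejected?)."""
    stat = 2 * N * min_kl
    critical = chi2.ppf(1 - alpha, df=n - 1)
    return stat, critical, stat > critical
```

For ‘‘Anger is the sea’’ (D = 0.0854, N = 40, n = 8 features), the statistic 6.832 stays below the critical value 14.07, so the comparison model's fit is not rejected.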

4.7. Evaluating metaphor theories by discriminant analysis

A linear discriminant analysis was conducted to reveal the metaphor properties that determine the choice of comprehension process. The dependent variable was whether the selected model (i.e., the comprehension process) is categorization or comparison. The independent variables comprised three metaphor properties, namely, vehicle conventionality, metaphor aptness, and interpretive diversity, whose correlations were r = .36 between conventionality and aptness, r = −.30 between conventionality and diversity, and r = −.14 between aptness and diversity.

Table 9 shows the result of the discriminant analysis based on all the 40 metaphors. The analysis yielded a significant discrimination function, Wilk's lambda = 0.70, F(3,36) = 5.08, p < .005. The function correctly classified 32 of 40 metaphors (80.0%), and the kappa coefficient of agreement κ = 0.55 was significant, Z = 3.10, p = .002. The metaphors numbered 15, 16, 23, 24, 29, 35, 37, and 39 in Table 8 were not correctly classified. The left table in Table 10 shows the classification table of the analysis. For the class of categorization, recall was 0.82 (=9/11) and precision was 0.60 (=9/15). For the class of comparison, recall was 0.79 (=23/29) and precision was 0.92 (=23/25). Therefore, the balanced F-score (i.e., the harmonic mean of recall and precision) was 0.69 for the class of categorization and 0.85 for that of comparison.
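The recall, precision, and F-score figures above follow directly from the left classification table. A quick check, with tp, fp, and fn read off Table 10 for the categorization class (9 correctly predicted, 6 comparisons predicted as categorizations, 2 categorizations predicted as comparisons):

```python
def prf(tp, fp, fn):
    """Recall, precision, and balanced F-score (harmonic mean) for one class."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f = 2 * precision * recall / (precision + recall)
    return recall, precision, f
```

Here prf(9, 6, 2) yields approximately (0.82, 0.60, 0.69) for categorization, and prf(23, 2, 6) yields approximately (0.79, 0.92, 0.85) for comparison, reproducing the figures in the text.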

Table 9.
Result of the non-cross-validated discriminant analysis for predicting the choice between categorization and comparison models

Standardized coefficients
  Vehicle conventionality           1.37 (p = .0062)
  Metaphor aptness                 −0.28 (p = .54)
  Interpretive diversity            1.47 (p = .0022)
Wilk's lambda                       0.70
R²                                  0.30
Accuracy (correctly predicted)      0.80
Cohen's kappa                       0.55
Table 10.
Classification tables of non-cross-validated and cross-validated discriminant analyses

          Predicted categories
          Non-cross-validated          Cross-validated
Actual    Cat    Com    Total          Cat    Com    Total
Cat         9      2      11             9      2      11
Com         6     23      29             7     22      29
Total      15     25      40            16     24      40

Note. Cat, categorization; Com, comparison.

Concerning the standardized discriminant coefficient for the three metaphor properties, Table 9 demonstrates that interpretive diversity had the highest discriminant coefficient and was significantly associated with the choice of comprehension process, F(1,36) = 10.89, p < .005. This result is consistent with the interpretive diversity view, which argues that high-diversity metaphors are processed as categorizations and low-diversity metaphors are processed as comparisons. Vehicle conventionality had the second-highest coefficient and also reached statistical significance, F(1,36) = 8.47, p < .01. This result is consistent with the conventionality view, which predicts that conventional metaphors are processed as categorizations, whereas novel metaphors are processed as comparisons. On the other hand, metaphor aptness did not affect the choice of comprehension process; its standardized coefficient −0.28 was not significant. This result is not consistent with the aptness view, suggesting that the choice of comprehension strategy may not depend on metaphor aptness. Hence, the result of the discriminant analysis indicates that both the interpretive diversity view and the conventionality view are plausible theories of metaphor comprehension.

This result was replicated by the cross-validated discriminant analysis, suggesting that the finding on the importance of interpretive diversity and conventionality may be independent of the training data. The leave-one-out cross-validation method was used for the cross-validated analysis, in which each metaphor was classified using a discriminant function derived from the remaining 39 metaphors. The classification table for the cross-validated analysis (the right table of Table 10) shows that 31 metaphors (77.5%) were classified correctly and the kappa coefficient of agreement κ = 0.51 was significant, Z = 2.92, p < .005. All the eight misclassified metaphors in the non-cross-validated analysis were also misclassified in the cross-validated analysis, and additionally, the metaphor numbered 2 was misclassified in the cross-validated analysis.
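Leave-one-out cross-validation of a linear discriminant function can be sketched with scikit-learn. This is a generic illustration on synthetic data, not the article's statistical package or the actual metaphor features:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_predict

def loo_accuracy(X, y):
    """Classify each item with a discriminant function fitted to the
    remaining items, and return the proportion classified correctly."""
    pred = cross_val_predict(LinearDiscriminantAnalysis(), X, y,
                             cv=LeaveOneOut())
    return float((pred == y).mean())

# Two well-separated synthetic classes in a 3-feature space.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 3)), rng.normal(3.0, 0.1, (20, 3))])
y = np.array([0] * 20 + [1] * 20)
```

In the article's analysis, X would hold the three metaphor properties for the 40 metaphors and y the model selected by AIC; each metaphor is then classified by a function derived from the remaining 39.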

To compare the predictive ability of interpretive diversity and vehicle conventionality, I conducted a commonality analysis. Commonality analysis is a method of variance partitioning by which one can calculate the proportions of variance in the dependent variable associated uniquely with each of the independent variables (i.e., the unique contributions of the independent variables to the prediction of the discriminant analysis), as well as the proportions of variance attributed to various combinations of independent variables (i.e., the common contributions of those combinations). Table 11 shows the result of the commonality analysis. Interpretive diversity made a larger unique contribution (0.212) to predicting model selection than vehicle conventionality (0.165), which suggests that interpretive diversity may be a more important factor in explaining the choice of comprehension process. The negative common contribution (−0.080) of interpretive diversity and vehicle conventionality indicates that they have no joint effect, or more concretely, that the two variables are competitive in the sense that one variable hinders the contribution of the other (Legendre & Legendre, 1998). In addition, I conducted two separate discriminant analyses: one considered only interpretive diversity as the independent variable, and the other considered only conventionality. The discriminant analysis with interpretive diversity yielded a significant discrimination function, Wilk's lambda = 0.87, F(1,38) = 5.64, p < .05, which correctly classified 29 (72.5%) metaphors. The kappa coefficient κ = 0.42 showed moderate agreement and was significant, Z = 2.56, p = .01. On the other hand, the discriminant analysis with conventionality did not yield a significant discrimination function, Wilk's lambda = 0.92, F(1,38) = 3.08, p = .09.
The function correctly classified only 47.5% of the metaphors and the kappa coefficient was negative, which indicated that there was no agreement between the prediction and the simulation experiment. These findings suggest that interpretive diversity may be a better predictor of the metaphor comprehension process.

Table 11.
Unique and common contributions of three metaphor properties in accounting for the variance in the choice of the metaphor comprehension process

              Unique Contributions          Common Contributions                                  Sum
              ID      VC      AP            ID and VC   ID and AP   VC and AP   ID, VC, and AP
              0.212   0.165   0.007         −0.080      0.002       −0.004      −0.006            0.298

Note. ID, interpretive diversity; VC, vehicle conventionality; AP, aptness.

Furthermore, to corroborate the finding of the simulation, I examined the effect of the imageability of metaphors (Marschark, Katz, & Paivio, 1983), that is, the ease with which a metaphorical sentence evokes mental imagery. Various metaphor studies (e.g., Marschark & Hunt, 1985; Marschark et al., 1983; Paivio & Walsh, 1993) have addressed the role of mental imagery in metaphor comprehension since the very beginning of metaphor research. If metaphor imageability accounted for a significant portion of the variance in the discriminant analysis, it would weaken the validity of the finding that both the interpretive diversity view and the conventionality view are plausible. The imageability of the 40 metaphors was rated by 21 participants on a 7-point scale ranging from 1 (difficult to evoke mental imagery) to 7 (easy to evoke), and the mean imageability rating for each metaphor was used as an independent variable of the discriminant analysis. Lexical properties (i.e., vehicle frequency, topic frequency, vehicle concreteness, topic concreteness, vehicle familiarity, and topic familiarity) were also used as independent variables. Word concreteness was obtained from a rating study in which 11 participants rated the 40 words used in the 40 metaphors on a 7-point scale of concreteness (1: abstract, 7: concrete), whereas word frequency and familiarity values were derived from the database of Japanese lexical properties ‘‘Nihongo No Goi Tokusei.’’ The result of the (non-cross-validated) discriminant analysis was that none of these seven properties was significantly associated with the dependent variable. This result indicates that the explanatory power of interpretive diversity and conventionality is not attributable to these factors.
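With a single predictor and two groups, Wilks's lambda reduces to the ratio of the within-group sum of squares to the total sum of squares, and the associated F test has (1, N − 2) degrees of freedom. A minimal sketch of this computation, on simulated data rather than the study's ratings:

```python
import numpy as np

def wilks_lambda_one_predictor(x, group):
    """Wilks's lambda and F statistic for one predictor and two groups.

    Lambda is the ratio of the within-group to the total sum of squares;
    F = (N - 2) * (1 - lambda) / lambda with df = (1, N - 2).
    """
    x, group = np.asarray(x, float), np.asarray(group)
    ss_total = np.sum((x - x.mean()) ** 2)
    ss_within = sum(np.sum((x[group == g] - x[group == g].mean()) ** 2)
                    for g in np.unique(group))
    lam = ss_within / ss_total
    n = len(x)
    f = (n - 2) * (1 - lam) / lam
    return lam, f

# Toy example: 40 observations in two groups whose means differ by 1.
rng = np.random.default_rng(1)
g = np.repeat([0, 1], 20)
x = rng.normal(loc=g, scale=1.0)
lam, f = wilks_lambda_one_predictor(x, g)
# Stronger group separation yields a smaller lambda and a larger F.
```

A lambda near 1 means the predictor barely separates the two groups, which is why the conventionality-only analysis (lambda = 0.92) fell short of significance.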

In sum, these results support the conclusion that both the interpretive diversity view and the conventionality view are plausible theories of metaphor comprehension. Additionally, interpretive diversity emerged as the better predictor of the metaphor comprehension process in this study, mirroring the experimental results. However, simulations with other metaphor data may yield different conclusions.

5. General discussion

5.1. Implications of the simulation results

The simulation experiment reported in this article demonstrated that the interpretive diversity view and the conventionality view are plausible, but it did not provide evidence supporting the aptness view. This computational finding is consistent with the empirical finding obtained by Utsumi (2007); in his psychological experiment (Experiment 1), both interpretive diversity and vehicle conventionality were found to be significant predictors of the choice of comprehension process. Therefore, theoretical and experimental findings converge on the conclusion that the interpretive diversity view and the conventionality view are plausible theories of metaphor comprehension. The observed consistency between empirical and theoretical findings also indicates that the computational methodology of this study is potentially useful for providing new insight into the cognitive processes in metaphor comprehension, and possibly language comprehension in general. If the cognitive processes being explored can be appropriately modeled in the semantic-space-based framework, the maximum likelihood method can determine which processes are plausible.

Why does interpretive diversity (or semantic richness) affect metaphor comprehension? Utsumi (2007) has provided one possible answer in terms of the nature of categorization. When people interpret an entity X as a member of, or as classified into, a category Y, entity X is expected to share many salient features with category Y because the members of a category inherit many features of the category by default. In other words, a semantically rich entity is easy to categorize. Hence, the categorization process proceeds more easily when more features of category Y can be attributed to X, that is, when a pairing of X and Y is more interpretively diverse. As a result, interpretively diverse metaphors are comprehended via a categorization process, whereas less diverse metaphors fail to be processed as categorizations and thus must be reinterpreted via a comparison process.

Empirical evidence for the effects of semantic richness or diversity has been established by a number of studies on language comprehension. Rodd, Gaskell, and Marslen-Wilson (2002) demonstrated that semantically rich words with many related senses facilitated word recognition. Similarly, Pexman, Lupker, and Hino (2002) found a number-of-features effect, that is, faster lexical decision responses for words with many semantic features than for words with fewer semantic features. Pexman, Holyk, and Monfils (2003) demonstrated that the number-of-features effect was also observed in a semantic categorization task; semantically richer words were more quickly judged to be members of a given category, and this effect was greater when the category was broader, in other words, semantically richer. Furthermore, Adelman et al. (Adelman & Brown, 2008; Adelman, Brown, & Quesada, 2006) have recently demonstrated that contextual diversity—the number of contexts in which a word appears—affects lexical decision. Because semantically richer words will be used in more varied contexts, contextual diversity can also be considered a measure of semantic richness.

5.2. Semantic space model and the embodied theory of metaphor

As mentioned in Section 2, cognitive linguists have proposed that metaphor comprehension is fundamentally embodied (e.g., Gibbs, 2006; Kövecses, 2002; Lakoff & Johnson, 1980, 1999). According to the embodied theory of metaphor, metaphorical expressions are comprehended on the basis of conceptual metaphors, which are grounded in embodied experiences. For example, to comprehend the verbal metaphor ‘‘I'm feeling up today,’’ people must know the conceptual metaphor Happy Is Up, which is acquired from the experiential correlation between an affective state of happiness and an upright posture (Lakoff & Johnson, 1999). Many abstract concepts are comprehended in the same way; love can be understood as an act of traveling (e.g., Love Is A Journey), and theories can be understood as physical structures (e.g., Theories Are Buildings). In addition, recent research in cognitive science has also argued that language comprehension in general is embodied (e.g., Barsalou, 1999, 2008; Pecher & Zwaan, 2005); the meaning of linguistic symbols is captured by grounding them in human perceptual experiences with the environment. The embodied theory of language comprehension naturally implies that semantic space models such as LSA cannot simulate language comprehension in general (Glenberg & Robertson, 2000; Zwaan & Yaxley, 2003), or metaphor comprehension in particular, because semantic space models are not embodied. If this is true, then this study cannot provide any evidence concerning the cognitive mechanism of metaphor comprehension.

This article responds in two ways to the embodied theory's criticism that semantic space models lack this ability. One way of defending the position that metaphor comprehension can be computationally simulated by semantic space models is to demonstrate that semantic space models can explain linguistic phenomena that, according to the embodied theory, they should not be able to explain. Although it is still controversial whether semantic space models can represent knowledge based on embodied experiences and whether they can explain embodied comprehension (de Vega, Glenberg, & Graesser, 2008), many recent studies have demonstrated that semantic space models such as LSA (or co-occurrence statistics) are capable of doing so (e.g., Kintsch, 2007, 2008b; Louwerse, 2007, 2008; Louwerse & Van Peer, 2009). For example, Louwerse (2007) demonstrated that LSA can successfully distinguish non-afforded sentences (e.g., ‘‘He used his glasses to dry his feet.’’) from afforded sentences (e.g., ‘‘He used his shirt to dry his feet.’’) and related sentences (e.g., ‘‘He used his towel to dry his feet.’’); this contradicts Glenberg and Robertson's (2000) claim that LSA cannot capture such an embodied distinction. Furthermore, the assumption of the embodied metaphor theory that metaphorical expressions are inevitably linguistic realizations of conceptual metaphors would imply that linguistic co-occurrence can capture conceptual metaphors. For example, it is highly likely that Happy Is Up encourages words expressing affective states and words expressing vertical positions to co-occur in text. It follows that LSA should be able to capture embodied metaphors. Indeed, Mason (2004) revealed that many conceptual metaphors can be extracted automatically from a large corpus.

Another way of defending the position that semantic space models can simulate metaphor comprehension is to demonstrate that the role of embodiment in metaphor comprehension is more limited than the embodied theory of metaphor assumes. As mentioned in Section 2.1.1, although there is little doubt that primary metaphors are embodied, it is highly unclear whether complex metaphors are embodied or whether conceptual metaphors are necessary for metaphor comprehension. Concerning the need for conceptual metaphors, several negative findings have established that people do not necessarily rely on conceptual metaphors to comprehend metaphors (Glucksberg & McGlone, 1999; Keysar, Shen, Glucksberg, & Horton, 2000; Murphy, 1996). Surprisingly, even Barsalou (1999), who adopts an embodied view of cognition, pointed out that abstract concepts such as anger are directly grounded in perceptual experience without being mediated by conceptual metaphors. He also suggested that conceptual metaphors are not required in this case; familiar or conventional metaphors may bypass conceptual metaphors.

From these discussions, it can be concluded that metaphor comprehension can be adequately simulated by the computational models presented in this article. It can be asserted that the criticism made by the embodied theories does not apply to the framework of this study.

5.3. Computational approaches to metaphor comprehension

Over the past few decades, a number of computational studies on metaphor comprehension have been conducted. Computational studies from the 1990s include computational discrimination among metaphor, metonymy, anomaly, and literalness using lexical semantics (Fass, 1991); comprehension of predicative metaphors using knowledge about conceptual metaphors (Martin, 1992); and connectionist implementations of nominal metaphor comprehension (Thomas & Mareschal, 2001) and adjective metaphor comprehension (Weber, 1991).

This study essentially differs from these computational studies in that they did not test the validity of their computational models in a systematic way; they provided only a small number of examples, whose plausibility was judged on the basis of the researchers' insight. The reason behind this drawback was that the lexical, semantic, or metaphorical knowledge used in these studies had to be manually coded by the researchers and was therefore small in size.

In recent years, however, very large corpora have become easily available and corpus-based computational studies on metaphor have been conducted. One corpus-based approach to metaphor is to automatically build a large knowledge base on conceptual metaphor, which is used for comprehending predicative metaphors (Martin, 1994; Mason, 2004), particularly for the technical purpose of dealing with metaphors in an NLP system.

A more important and promising corpus-based approach is to develop a computational model of metaphor comprehension using a semantic space model constructed from the statistical analysis of a huge corpus. A pioneering work that follows this approach is Kintsch's (2000, 2008a) computational model of metaphor comprehension based on LSA. Kintsch applied his predication algorithm to metaphor comprehension and demonstrated that the model can not only compute intuitively reasonable interpretations of metaphors but also account for some of the phenomena observed in metaphor comprehension experiments, such as the nonreversibility of metaphors. However, he did not test the model's psychological plausibility in a direct or systematic fashion; in other words, he did not clarify how well the computed interpretation fits human data for metaphor interpretation. Lemaire and Bianco (2003) also employed LSA to develop a computational model of referential metaphors for simulating the processing time difference between a metaphorical reference and a literal reference. They modeled the processing time of referential expressions as the depth of the search for those neighbors of a referential expression that are also related to a given context. Using this model, they showed that the simulation result was consistent with empirical data on the processing time difference between a metaphorical reference and a literal reference in different (literally supportive or metaphorically supportive) contexts. However, their model has a major limitation: It cannot compute the meaning of referential metaphors (i.e., the referent of metaphorical references). Thus, Lemaire and Bianco have not addressed how well their model mimics human interpretations.
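The approximated predication algorithm discussed here (and in footnote 4) can be sketched as follows. The toy semantic space, its dimensionality, and the parameter values are illustrative only; a real LSA space would have a few hundred dimensions and tens of thousands of words.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def predication(A, P, vocab, m=20, k=3):
    """Approximate predication: take the m nearest neighbors of the
    predicate P, keep the k of them most similar to the argument A,
    and average them together with A and P.

    vocab: dict mapping words to their vectors (the semantic space);
    P is assumed not to be among the candidate neighbors.
    """
    words = list(vocab)
    by_p = sorted(words, key=lambda w: cosine(vocab[w], P), reverse=True)[:m]
    by_a = sorted(by_p, key=lambda w: cosine(vocab[w], A), reverse=True)[:k]
    return np.mean([A, P] + [vocab[w] for w in by_a], axis=0)

# Tiny random space (10 dimensions, 50 words) for illustration.
rng = np.random.default_rng(2)
vocab = {f"w{i}": rng.normal(size=10) for i in range(50)}
A, P = rng.normal(size=10), rng.normal(size=10)
vec = predication(A, P, vocab, m=20, k=3)
```

The selected neighbors bias the resulting vector toward those senses of the predicate that are relevant to the argument, which is what lets the algorithm produce context-sensitive metaphor interpretations.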

In contrast, the LSA-based approach to metaphor presented in this article differs from these studies in two ways. First, this study employs a quantitative measure of the fit between the model (i.e., computer interpretations of metaphors) and data (i.e., human interpretations of the same metaphors) to evaluate the degree to which the computational model imitates human behavior concerning metaphor comprehension. Second, this study uses a computational methodology to provide an original contribution to the understanding of the cognitive mechanisms of metaphor comprehension, rather than to simply retest or confirm existing empirical findings. In other words, this study determines which of the metaphor views is more plausible by identifying the view that best explains the result of a simulation in which human behavior is simulated by models embodying the hypothesized comprehension processes. In contrast, other LSA-based studies only examined whether human behavior could be simulated by a model that does not necessarily embody existing metaphor views. As mentioned in Section 5.1, the observed consistency between the existing empirical findings and the computational finding of this study provides some support for the usefulness of the computational methodology of this study for metaphor research.

5.4. Limitations of the simulation method

The semantic-space-based methodology presented in this article has its limitations; these limitations delimit the extent to which the simulation results bear on metaphor views other than those tested directly in the simulation experiment.

One important limitation is that the finding obtained in this study does not address the subtle but crucial differences among the various views on the comparison process described in Section 2.1.1. A crucial dimension along which these views differ is the kind of similarities that are preferentially included in the common structure obtained during the comparison process. For example, Gentner's structure mapping theory argues that the comparison process primarily focuses on relational similarities, whereas Holyoak's ACME and LISA argue that semantic and pragmatic similarities are required in the comparison process. At present, the semantic-space-based methodology is unlikely to provide an appropriate technique for comparing the plausibility of these views; Ramscar and Yarlett (2003) suggested that a semantic space model such as LSA does not have sufficient modeling ability for analogical mapping, although it simulates appropriate patterns of analogical retrieval. However, I am somewhat optimistic about this issue in that Kintsch (2008a) and Mangalath, Quesada, and Kintsch (2004) have shown the possibility of LSA-based modeling of analogical mapping.

Another limitation of the semantic-space-based methodology concerns the time-course of metaphor comprehension. The semantic space framework is not suitable for simulating the temporal behavior of the comprehension process, because it does not provide a method for representing time. Although some product measures such as comprehension speed can be simulated in this framework (e.g., Lemaire & Bianco, 2003), a fine-grained analysis of the time-course using eye movements or functional brain mapping cannot be simulated. (Note that this does not mean that the semantic space model cannot model cognitive processes; rather, the semantic space model specifies the time-course at a coarser grain than models, such as connectionist models [e.g., recurrent networks], that are much more adequate for representing time.) This limitation may be serious for metaphor research, given that a considerable number of metaphor studies (e.g., Gibbs, 1994; Giora, 2003) have examined the time-course of literal and metaphorical comprehension. In particular, it is impossible to examine in the semantic space framework whether two processes run serially or in parallel. An interesting and efficient way to deal with the time-course in the semantic space model is to integrate connectionist models with the semantic space model. Recurrent neural networks can exhibit dynamic temporal behavior of activation computation, which can be analyzed as the time-course of language comprehension (e.g., Kawamoto, 1993; McRae, de Sa, & Seidenberg, 1997). Such neural networks generally employ distributed representations, and the vector representation of word meaning is a good implementation of a distributed representation. If a computational method for constructing a metaphor vector (or a sentence vector in general), such as the Categ and Compa algorithms, can be implemented on a recurrent network, it would provide an effective way to computationally analyze the time-course of metaphor comprehension.

Examining whether the two metaphor comprehension processes (i.e., categorization and comparison) run serially or in parallel is a very interesting topic for further research. All three hybrid views of metaphor tested in this article assume serial processing, but it is likely that there is a race between two processes running in parallel.9 One possible version of a race model would be that metaphor comprehension starts with both processes, and the comparison process is suppressed later when the categorization process works properly because of the high diversity of a metaphor or its fast access to conventional metaphoric categories; the comparison process wins the race only when the categorization process does not work. Although the present semantic-space-based methodology cannot provide a useful tool for investigating such a race model, this issue is worth pursuing both as a metaphor study and as a methodological study of semantic space models. The integration of the semantic space model and the connectionist model mentioned earlier may offer an effective approach to evaluating the race model.
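As a purely illustrative sketch of how such a race model might be simulated, the following toy code assumes (without empirical grounding) that categorization latency decreases with interpretive diversity and vehicle conventionality, whereas comparison latency is constant on average; on each trial the faster process determines the interpretation.

```python
import numpy as np

def race_trial(diversity, conventionality, rng):
    """One toy trial of a race between categorization and comparison.

    Categorization is assumed to finish faster for interpretively
    diverse metaphors or conventional vehicles; comparison latency is
    constant on average. The faster process wins the trial.
    """
    t_categ = rng.exponential(1.0 / (0.5 + diversity + conventionality))
    t_compa = rng.exponential(1.0)
    return "categorization" if t_categ < t_compa else "comparison"

rng = np.random.default_rng(3)
# Count categorization wins for a high- and a low-diversity metaphor.
high = sum(race_trial(2.0, 1.0, rng) == "categorization" for _ in range(2000))
low = sum(race_trial(0.1, 0.1, rng) == "categorization" for _ in range(2000))
```

Under these assumptions, categorization wins more often for diverse or conventional metaphors, reproducing the qualitative pattern the race account predicts.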

6. Conclusion

A semantic space model, such as LSA, can provide an effective technique to simulate metaphor comprehension processes, such as categorization and comparison. Using a semantic space model, this study has attempted to determine which of the existing metaphor views, namely, the conventionality view (Bowdle & Gentner, 2005), aptness view (Glucksberg & Haught, 2006b), or interpretive diversity view (Utsumi, 2007), is most plausible. The simulation experiment, which comprised model selection and theory testing, has shown that the interpretive diversity and conventionality views significantly account for which of the categorization and comparison models fits better with the empirical data; this finding indicates that both views are plausible. These results are consistent with Utsumi's (2007) empirical findings, and thus, they strengthen the validity of the interpretive diversity and conventionality views. At the same time, these results indicate the potential of the semantic-space-based computational methodology for the cognitive study of language comprehension.

Footnotes

  • 1

    Bowdle and Gentner (2005) refer to this view as the ‘‘career of metaphor’’ hypothesis. This term emphasizes an evolutionary aspect of metaphor comprehension. When a metaphor is first used (i.e., it is novel), it is comprehended strictly as comparison. However, if this metaphor is repeatedly used to convey the same meaning, then this repeated mapping process gives rise to the creation of an abstract category that becomes associated with the vehicle. They refer to the process through which a vehicle term becomes associated with a metaphoric category as conventionalization. Hence, conventionalization results in an evolutionary shift from comparison to categorization. Note that in this article I do not use the term ‘‘career of metaphor’’ to refer to their view, because this study is not directly concerned with an evolutionary aspect of metaphor comprehension.

  • 2

     Content words are words that primarily express lexical meanings; therefore, they can be represented as vectors in a semantic space. On the other hand, function words, such as articles, auxiliary verbs, and pronouns, which primarily express grammatical relationships, are not represented because they have little lexical meaning. Grammatical functions should not be attributed to the vector representation; they should be considered in the method for generating a vector representation of a sentence.

  • 3

     Formally, given that tfij is the frequency of the ith word wi in the jth document (e.g., paragraph) and R is the number of documents, the jth element wij of the word vector for the word wi is computed by the following formulas:

    w_ij = log2(tf_ij + 1) · (1 + Σ_{j'=1..R} (p_ij' log2 p_ij') / log2 R)
    p_ij = tf_ij / Σ_{j'=1..R} tf_ij'
  • 4

 The step of picking the k neighbors with the highest cosine to A is an approximation of the original predication algorithm. In the original predication algorithm, a spreading activation network along the lines of the construction–integration model (Kintsch, 1998) is constructed; this network comprises A, P, and the m nearest neighbors of P (or all other terms in the semantic space). In this network, each term is connected to A and P with the cosine similarity between the two nodes as a weight and is also connected to every other term by an inhibitory link. However, Kintsch (2000, 2001) suggested that this approximation yields essentially the same result as such a self-inhibitory network but reduces the processing cost of spreading activation. Therefore, like Kintsch (2000, 2001), I have used this approximation. Note that this approximation is also supported by the recent finding (Rowe & McNamara, 2008) that inhibition requires no negative links in the construction–integration model.

  • 5

    Table 3 (and other tables) lists some phrases comprising multiple words (e.g., ‘‘disease onset,’’‘‘drug disaster,’’‘‘blood sampling’’), which appear inconsistent with the assumption that the semantic space model can only represent vectors for single words. However, the Japanese translations of these phrases are single words (e.g., ‘‘hatsubyo,’’‘‘yakugai,’’‘‘saiketsu’’), and thus, an inconsistency does not actually occur.

  • 6

     These values are computed by using the semantic space employed in the simulation experiment, which will be presented in Section 4. The Categ algorithm computed a sentence vector in Fig. 1 (and also in Fig. 2) with m = 20 and k = 3, whereas the Compa algorithm computed a sentence vector with k = 3. Kintsch (2001) suggests that these parameter values work effectively for literal sentences.

  • 7

     Using some examples, Kintsch (2001) also showed that the predication algorithm works well for a model of categorization. For example, he demonstrated that the vector for ‘‘A pelican is a bird’’ computed by the predication algorithm became more similar to the features related to bird (e.g., sing beautifully) and less similar to the features irrelevant to bird (e.g., eat fish and sea) than the original vector of pelican.

  • 8

     In this article, I use a generic notation without a subscript indicating the algorithm (e.g., θ instead of θcat and θcom, and q instead of qcat and qcom), if the description is applicable to both algorithms or models.

  • 9

     I thank one of the reviewers for suggesting the possibility of a race between categorization and comparison.

Acknowledgments

This study was supported by a Grant-in-Aid for Scientific Research C (No. 17500171 and No. 20500234), The Ministry of Education, Culture, Sports, Science and Technology. I thank the associate editor Danielle S. McNamara and four anonymous reviewers for their insightful comments and suggestions, which helped me improve the article.

Appendix

Appendix A: Procedure for obtaining human interpretation data

In this appendix, I describe in detail the procedure of the psychological experiment (Experiment 2) conducted by Utsumi (2007), which is the source of human interpretation data for this study.

The experiment comprised metaphor comprehension, simile comprehension, and the rating of topic–vehicle pairs (Utsumi, 2007). (Simile comprehension is not described here because this study did not use any results related to it.) In the metaphor comprehension experiment, 42 undergraduate students of Japan Women's University, all native speakers of Japanese, were assigned two metaphors that shared neither vehicles nor topics (e.g., ‘‘Anger is the sea’’ and ‘‘Sleep is a storm’’) from each of the 10 groups; therefore, each participant comprehended 20 of the 40 metaphors. The metaphors of each group were counterbalanced such that each metaphor was assigned to 21 participants. Participants performed three subtasks, namely, a feature listing task, a free description task, and a comprehensibility rating task; however, this study used only the data obtained in the feature listing task. In the feature listing task, participants were asked to consider the meaning of each metaphor and list, as words or phrases, three or more features (i.e., meanings) of the topic that they thought were involved in the interpretation of the metaphor.

After the metaphor comprehension experiment, the following preprocessing was conducted for each metaphor M to obtain the final list of metaphorical meanings W(M). First, a list of the features generated in the metaphor comprehension experiment was compiled for each metaphor M. Then, closely related words or phrases in the generated list of features were treated as the same feature if they met any of the following four criteria: (a) they belonged to the same deepest category of the Japanese thesaurus Bunrui Goi Hyo (e.g., kakasenai and hitsuyoufukaketsu in Japanese, both of which mean being indispensable); (b) they shared the same root form (e.g., red [akai in Japanese] and redness [akasa in Japanese]); (c) they differed only in degree because of an intensifying modifier (e.g., frightened and quite frightened); or (d) a dictionary description of one word included the other word or phrase (e.g., lie and not true). After this feature combination process, any feature mentioned by only one participant was eliminated from the list of features. The list of features amended by this preprocessing was used as the list of meanings W(M) in this study.

The rating experiment comprised three rating tasks (vehicle conventionality, metaphor aptness, and similarity) for metaphors and similes (Utsumi, 2007). The simulation experiment in this study required only the conventionality and aptness ratings of metaphors. For vehicle conventionality and metaphor aptness, 144 Japanese undergraduate students of the University of Electro-Communications were recruited, and each was assigned 10 metaphors. One half of these students performed the conventionality rating task, whereas the other half performed the aptness rating task. In the conventionality rating task, participants were given a list of the vehicle terms of the metaphors, each paired with its most salient meaning, and asked to rate how conventional each meaning was as an alternative sense of the vehicle term on a scale of 1 (very novel) to 7 (very conventional). For example, as the meaning ephemeral was listed by the largest number of participants for ‘‘Death is the fog,’’ the participants of this task were asked the following question: ‘‘When we say that something (X) is the fog, how conventional is the interpretation that this is something (X) that is ephemeral?’’ This method of assessing vehicle conventionality was identical to the method used by Bowdle and Gentner (2005). In the aptness rating task, participants were asked to rate how apt each metaphor was, on a 7-point scale ranging from 1 (not at all apt) to 7 (extremely apt). Following previous research (Jones & Estes, 2006), this study defined aptness as the extent to which the metaphor captures the important features of the topic. These ratings were then averaged across participants for each metaphor.

Ancillary