This research is based in part on the first author's PhD dissertation and was supported by a Standard Research Grant from the Social Sciences and Humanities Research Council (SSHRC) of Canada to Heine and an SSHRC post-doctoral fellowship to Falk. Correspondence should be addressed to Carl F. Falk, University of California, Los Angeles, Graduate School of Education & Information Studies, Los Angeles, CA 90095; email: email@example.com.
Are Implicit Self-Esteem Measures Valid for Assessing Individual and Cultural Differences?
Article first published online: 8 FEB 2014
© 2013 Wiley Periodicals, Inc.
Journal of Personality
Volume 83, Issue 1, pages 56–68, February 2015
How to Cite
Falk, C. F., Heine, S. J., Takemura, K., Zhang, C. X. J. and Hsu, C.-W. (2015), Are Implicit Self-Esteem Measures Valid for Assessing Individual and Cultural Differences? Journal of Personality, 83: 56–68. doi: 10.1111/jopy.12082
- Issue published online: 7 JAN 2015
- Accepted manuscript online: 3 DEC 2013 05:27AM EST
- Standard Research Grant from the Social Sciences and Humanities Research Council (SSHRC) of Canada
- SSHRC post-doctoral fellowship
- Implicit Attitudes;
- Cross-Cultural Psychology;
Our research drew on two popular theoretical conceptualizations of implicit self-esteem: 1) implicit self-esteem as a global automatic reaction to the self; and 2) implicit self-esteem as a context- or domain-specific construct. Within this framework, we conducted an extensive search for valid implicit self-esteem measures across different cultural groups (Study 1) and under several experimental manipulations (Study 2).
In Study 1, Euro-Canadians (N = 107), Asian-Canadians (N = 187), and Japanese (N = 112) completed a battery of implicit self-esteem, explicit self-esteem, and criterion measures. The implicit self-esteem measures were chosen because they were either popular or offered methodological improvements over older methods. Criterion measures were sampled from previous research on implicit self-esteem and included both self-reports and independent ratings. In Study 2, Americans (N = 582) completed a shorter battery of the same types of measures under one of three conditions: a control condition, an explicit prime meant to activate the self-concept in a particular context, or a prime meant to activate self-competence-related implicit attitudes.
Across both studies, explicit self-esteem measures far outperformed implicit self-esteem measures in all cultural groups and under all experimental manipulations.
Implicit self-esteem measures are not valid for individual or cross-cultural comparisons. We speculate that individuals may not form implicit associations with the self as an attitudinal object.
With the recent advent of measures that assess implicit processes, much research has targeted people's nonconscious and automatic attitudes. In the past decade or so, hundreds of studies have been published using just one of these measures: the Implicit Association Test (IAT; Greenwald, Poehlman, Uhlmann, & Banaji, 2009). At the same time, self-esteem remains one of the most popular research topics in psychology (Scheff & Fearon, 2004). At the intersection of these two influential research topics lies implicit self-esteem (ISE), which may be defined as “a global self-evaluation that people are unable or unwilling to report” (Buhrmester, Blanton, & Swann, 2011, p. 366). Researchers have developed a variety of ways to measure ISE, and such measures are thought to circumvent explicit attempts at impression management. These measures have offered the promise that scientists can peel away the layers of self-presentation motives to discover one's “true” self-feelings in the raw.
Most implicit attitude measures show respectable evidence of validity. For example, across more than 100 studies the IAT correlates with various outcomes at r = .27 on average and often predicts outcomes above and beyond self-report (Greenwald et al., 2009), and other implicit attitude measures perform similarly well (Rooke, Hine, & Thorsteinsson, 2008). The IAT tends to correlate more strongly with self-reported outcomes than with observer ratings (Greenwald et al., 2009), and most research (about two-thirds) has used self-reported outcomes rather than observer ratings (about one-third). However, despite the evidence for implicit measures in general, the evidence for the validity of ISE measures has been strikingly limited.
Bosson, Swann, and Pennebaker (2000) found that four self-report measures of explicit self-esteem (ESE) far outperformed seven different ISE measures in terms of convergent and criterion validity. Despite this poor performance, the IAT self-esteem measure and the name-letter test (NLT; Bosson et al., 2000) emerged as widely used indices of ISE. Two meta-analyses have shown that the IAT self-esteem measure and the NLT have the lowest implicit-explicit correlations of any kind of implicit attitude (Hofmann, Gawronski, Gschwendner, Le, & Schmitt, 2005; Krizan & Suls, 2008). More recently, Buhrmester and colleagues’ (2011) review and meta-analysis of these measures concluded that the IAT self-esteem measure and the NLT lacked construct validity. More specifically, these measures displayed poor convergent and predictive validity across a wide range of phenomena and empirical studies, and did not exhibit properties thought to be characteristic of ISE (e.g., stability over time and under manipulations). Many research findings establishing the “validity” of these measures are isolated or rarely replicated. In general, these measures were widely outperformed by ESE measures. Given the widespread popularity of the IAT and NLT, together with journals’ penchant for publishing significant findings, a file-drawer effect may mean that the true validity evidence for these measures is even weaker than the published record suggests. In sum, the existing validity evidence for ISE, at least for the methods examined in these meta-analyses, is surprisingly lacking for such a widely used construct.
What Could Be Wrong with a Measurement Procedure?
A single measurement procedure does not work well for all psychological constructs. For example, although self-report measures constitute the most widely used method for assessing personality, individuals are unaware of or are unwilling to report about some aspects of their personality. In these cases, we cannot directly ask participants about the construct of interest. Similarly, implicit attitude measures may not work well for all types of implicit constructs. For example, Karpinski (2004) noted that because many implicit attitude measures require a reference category by which to compare oneself (e.g., “self” versus “other”), individuals could earn a high implicit attitude score by having either strong self-positive associations or strong other-negative associations. It has also been argued that fast reaction times—a feature of most implicit measures—limit the amount of time available for self-reflection and evaluating one's self-esteem (Buhrmester et al., 2011). In addition, there have been numerous critiques regarding the sources of method variance of implicit attitude measures (e.g., De Houwer, Teige-Mocigemba, Spruyt, & Moors, 2009). Since implicit attitude measures are relatively new, other kinds of unknown methodological artifacts may obscure the measurement of ISE.
Furthermore, there may be limitations to the very notion that the global self-concept can be evaluated implicitly. Two conceivable reasons why self-esteem might be particularly resistant to implicit representation are that 1) the self is a highly multifaceted construct (Markus & Wurf, 1987), so people might not hold an implicit global evaluation of the self; and 2) given that implicit associations develop slowly over time through evaluative conditioning (e.g., Gawronski & Bodenhausen, 2006), and that people more often experience the self as a subject than as an object (Duval & Wicklund, 1972), people may rarely form implicit associations with the self as an attitudinal object.
Against this backdrop of a lack of validity evidence for ISE measures, the primary goal of our research was to find a valid measure of ISE. A secondary goal was to conduct this search among individuals from multiple cultural backgrounds. Although there is much converging evidence that individuals from East Asian cultures self-enhance far less than Westerners (e.g., Heine & Hamamura, 2007; but see Sedikides, Gaertner, & Vevea, 2007), it is sometimes claimed that self-presentational biases are responsible for this cultural difference (e.g., East Asians being modest) and that cultural variability in ISE does not exist (Kobayashi & Greenwald, 2003; Yamaguchi et al., 2007; but see Falk, Heine, Yuki, & Takemura, 2009). Yet, there are few tests of ISE measure validity among East Asian populations. In Study 1 we tested the validity and cultural variability of the newest and most popular ISE measures among three cultural groups. In Study 2, we sought to improve the validity of ISE measures via two experimental manipulations.
A plethora of implicit attitude measures has recently emerged, each with potential methodological improvements over previously existing ones. The go/no-go association test (Nosek & Banaji, 2001) and the single-category IAT (Karpinski & Steinman, 2006) can assess associations between the self and positive/negative concepts without the need for comparative reference categories. The affect misattribution procedure does not require fast reaction times (Payne, Cheng, Govorun, & Stewart, 2005), and the single-block IAT reduces some of the method variance associated with the IAT (Teige-Mocigemba et al., 2008). However, since Bosson et al.'s (2000) study, there has been no systematic comparison of the convergent and criterion validity of new ISE measures versus ESE measures. Rudolph and colleagues (2008) assessed several new ISE measures and found poor convergent validity, but did not assess criterion validity.
In Study 1 we compared ISE and ESE measures, together with criterion variables, among Euro-Canadians, Asian-Canadians, and Japanese. ISE measures were chosen based on their previous popularity, potential methodological improvements, and portability across cultures (e.g., the NLT was omitted to avoid comparing cultures that use different alphabets). We sampled criterion measures from a range of possible options in an attempt to replicate previous research findings (for a review, see Buhrmester et al., 2011).
A total of 107 Euro-Canadian (77.57% female; M age = 21.66; SD = 4.67) and 187 Asian-Canadian students (74.33% female; M age = 19.98; SD = 1.89) from the University of British Columbia (UBC) participated for extra course credit or monetary compensation. An additional 112 Japanese students (32.14% female; M age = 20.96; SD = 2.30) from Kyoto University participated for monetary compensation. The Japanese data collection was interrupted by the March 11, 2011 earthquake and tsunami; data from 30 participants were collected before this date, and data from the remaining 82 were collected after April 28, 2011. See the online supplementary materials for additional sample demographic information.
Design and Procedure
On average, participants spent 63 minutes (SD = 22.89) completing the study via the Internet; 95% of participants took 94 minutes or less (not including two scheduled 5-minute breaks). UBC students participated in English, whereas Japanese participants completed the study in Japanese. All study materials were translated into Japanese by a Japanese researcher involved in the study and independently checked for accuracy by two bilingual research assistants.
Participants first answered demographic questions about age, gender, and cultural background, and provided idiographic information later used as stimuli for some ISE tasks: their first and last names, a place they identify with (e.g., hometown), their birthdate (month and day), and the same information for their best friend. Participants also provided contact information for a friend who could provide an independent rating of them. Next, participants completed a battery of measures: 1) explicit self-esteem, 2) implicit self-esteem, and 3) criteria. To reduce the possibility that fatigue could explain the relative performance of the measures, the order of the ISE and ESE measures was counterbalanced, and the criterion measures always appeared last. Concise descriptions of each measure appear below. Readers interested in more detailed descriptions, scoring procedures, and psychometric properties may consult the supplementary online materials.
Self-report measures of self-esteem included the Rosenberg self-esteem scale (RSES; Rosenberg, 1965), self-liking (SL) and self-competence (SC) scales (Tafarodi & Swann, 2001), feeling differentials (FD; e.g., Kobayashi & Greenwald, 2003), a feeling thermometer (FT; e.g., Kobayashi & Greenwald, 2003), the self-attributes questionnaire (SAQ; Pelham & Swann, 1989) and a measure of self-enhancement via a false uniqueness effect (FU; e.g., Heine & Lehman, 1997).
Three measures derived from the IAT were included in this study. The self-esteem version of the IAT (Greenwald & Farnham, 2000) is a categorization task in which participants sort words into two pairs of categories. To test the validity of IAT variants previously used to explore cross-cultural differences, we used “Self” versus “Best Friend” and “Pleasant” versus “Unpleasant” as the categories (Yamaguchi et al., 2007). The idiographic stimuli (i.e., the self's and best friend's names, hometowns, and birthdates) served as words for the first pair of categories, and the pleasant and unpleasant words were the same as those used by Kobayashi and Greenwald (2003). Conceptually, IAT scores compare response latencies in a “compatible” block, in which “pleasant” and “self” (and “unpleasant” and “best friend”) share the same response keys, with those in an “incompatible” block, in which “unpleasant” and “self” (and “pleasant” and “best friend”) share the same response keys. Resulting scores are typically interpreted as an implicit preference for the self (vs. the best friend). Two IAT variants using the same categories and stimuli were also included: the single-block IAT (SB-IAT; Teige-Mocigemba et al., 2008), in which compatible and incompatible trials occur within the same block and are determined by the position of the target word on the screen, and the single-category IAT (SC-IAT; Karpinski & Steinman, 2006), which assesses the association between “self” and valence attributes without a comparison category.
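For illustration only, the latency contrast just described can be summarized as a D-type score. The Python sketch below is a simplified version of the commonly used IAT D score: it drops latencies of 10,000 ms or longer and divides the difference between the incompatible and compatible mean latencies by the standard deviation of all retained trials from both blocks. The function name and trimming threshold are illustrative assumptions; error penalties and practice blocks are omitted, and the scoring actually used in this study is described in the supplementary online materials.

```python
from statistics import fmean, stdev


def iat_d_score(compatible_ms, incompatible_ms, max_latency_ms=10_000):
    """Simplified IAT D score (hypothetical helper, not the study's exact scoring).

    compatible_ms: latencies when "self" and "pleasant" share a response key.
    incompatible_ms: latencies when "self" and "unpleasant" share a key.
    Positive values mean faster responding in the compatible block, read as a
    relative implicit preference for the self over the best friend.
    """
    # Drop implausibly slow trials (threshold is an assumption)
    comp = [t for t in compatible_ms if t < max_latency_ms]
    incomp = [t for t in incompatible_ms if t < max_latency_ms]
    # Standard deviation of all retained trials from both blocks
    pooled_sd = stdev(comp + incomp)
    return (fmean(incomp) - fmean(comp)) / pooled_sd


print(round(iat_d_score([600, 650, 700], [800, 850, 900]), 2))  # 1.69
```

Dividing by a within-person standard deviation, rather than reporting the raw millisecond difference, keeps participants who are simply slower overall from receiving inflated scores.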
The go/no-go association test (GNAT; Nosek & Banaji, 2001) is a word identification task that assesses pairs of associations by analyzing response errors using signal detection theory. The GNAT consisted of four blocks, each of which featured two target categories (self-pleasant, self-unpleasant, best friend-pleasant, and best friend-unpleasant). We examined scores for the implicit self-pleasant (GNAT-SP) and self-unpleasant (GNAT-SU) relationships.
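In signal detection terms, each GNAT block yields a sensitivity index d′, the difference between the z-transformed hit rate (responding to target items) and false-alarm rate (responding to distractors). The sketch below applies a log-linear correction (adding 0.5 to each count and 1 to each total) so that perfect rates do not produce infinite z scores; that correction and the function name are assumptions for illustration, not necessarily the procedure used here.

```python
from statistics import NormalDist


def gnat_d_prime(hits, n_targets, false_alarms, n_distractors):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    The log-linear correction keeps rates of exactly 0 or 1 from
    producing infinite z scores (an assumed, commonly used choice).
    """
    z = NormalDist().inv_cdf
    hit_rate = (hits + 0.5) / (n_targets + 1)
    fa_rate = (false_alarms + 0.5) / (n_distractors + 1)
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts for one "self + pleasant" block
print(round(gnat_d_prime(hits=45, n_targets=50, false_alarms=8, n_distractors=50), 2))
```

Higher d′ in the self-pleasant block indicates that self and pleasant items are easier to treat as a single category, which is the sense in which the GNAT-SP indexes a self-pleasant association.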
We included two methods that rely on the influence of self-primes. In the affect misattribution procedure (AMP; Payne et al., 2005), participants were primed with self or best-friend idiographic stimuli or a neutral (blank) prime and then rated the pleasantness of an ambiguous target. Because we anticipated that many of our participants would be familiar with the Chinese ideographs used by Payne et al. (2005), we used a set of 48 Tibetan characters as the targets. Conceptually, those with high ISE should report higher liking of the target after receiving a self-relevant prime. In the affective priming task (APT; Hetts, Sakuma, & Pelham, 1999), participants identified the words “good” or “bad” after being primed. Conceptually, fast identification of “good” (versus “bad”) after a self-prime is thought to be indicative of high ISE.
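The two priming logics above can be expressed as simple difference scores. The sketch assumes binary pleasant/unpleasant AMP judgments coded 1/0 and compares self primes with neutral primes; for the APT it subtracts mean latency to “good” from mean latency to “bad” after self primes. These codings, the baseline choice, and the function names are illustrative assumptions rather than the study's exact scoring.

```python
from statistics import fmean


def amp_score(trials, baseline="neutral"):
    """trials: (prime_type, judged_pleasant) pairs, with prime_type in
    {"self", "friend", "neutral"} and judged_pleasant coded 1 or 0.
    Returns P(pleasant | self prime) - P(pleasant | baseline prime)."""
    def pleasant_rate(kind):
        return fmean(p for k, p in trials if k == kind)
    return pleasant_rate("self") - pleasant_rate(baseline)


def apt_score(trials):
    """trials: (target_word, rt_ms) pairs recorded after self primes.
    Positive values mean "good" was identified faster than "bad"."""
    good = fmean(rt for word, rt in trials if word == "good")
    bad = fmean(rt for word, rt in trials if word == "bad")
    return bad - good


amp = [("self", 1), ("self", 1), ("self", 0), ("self", 1), ("neutral", 1), ("neutral", 0)]
print(amp_score(amp))  # 0.25
```

Using a neutral-prime baseline removes a participant's general tendency to call the Tibetan characters pleasant, so only the shift attributable to the self-prime remains.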
Two methods using a less explicit approach to ISE measurement were included. In the birthday number task (BNT; e.g., Bosson et al., 2000), participants rated their liking of the numbers 1 through 40 and their responses were compared to their actual birthday and birth month. More liking of one's own birth month and day is thought to indicate higher ISE. This task was chosen in lieu of the NLT as the different writing systems between Japanese and English make cross-cultural comparisons of the NLT problematic. Participants also completed a self-evaluation under load task (SEL) in which they rated 30 personality traits as characteristic of “me” or “not me” while remembering an 8-digit number (Falk et al., 2009). At the end of the study, participants rated the social desirability of each trait. A tendency to claim to possess highly desirable traits while under cognitive load is thought to be indicative of high ISE.
Peers rated the participants on rephrased versions of the Rosenberg self-esteem scale (FR-RSES), self-competence scale (FR-SC), and self-liking scale (FR-SL). An example is: “My friend feels that s/he is a person of worth, at least on an equal basis with others.” Peers answered these items from their own perspective and were not instructed to predict how their friend would answer. To the extent that peers know how participants tend to behave and feel, we expected these ratings to correlate positively with ISE. Some evidence suggests that independent raters can pick up on nonverbal behavior indicative of ISE (Spalding & Hardin, 1999). Perhaps due to the interruption in data collection, response rates were higher for Euro-Canadians (71.96%) and Asian-Canadians (62.57%) than for Japanese (34.82%).
Participants also completed several self-report measures, including the ambiguous statements task (AST; Tafarodi, 1998), the parental bonding instrument (Parker, Tupling, & Brown, 1979) assessing retrospective reports of mother's caring (PBI-MC), mother's over-protectiveness (PBI-MO), father's caring (PBI-FC), and father's over-protectiveness (PBI-FO), the positive and negative affect scales (PA and NA; Watson, Clark, & Tellegen, 1988), authentic and hubristic pride (PRIDE-A and PRIDE-H; Tracy, Cheng, Robins, & Trzesniewski, 2009), and the narcissistic personality inventory (NPI; Raskin & Terry, 1988). The AST was the only measure flagged for difficulty in translation to Japanese.
Each of these measures has been linked to ISE in previous research. To the extent that ISE acts as a filter for ambiguous information, we would expect it to correlate positively with the AST (Bosson et al., 2000). DeHart, Pelham, and Tennen (2006) found that retrospective reports of parental “nurturance” were positively related to university students’ implicit self-esteem, whereas reports of parental over-protectiveness were negatively related. If ISE reflects an affective reaction toward the self, we might expect participants with high ISE to typically feel more PA than NA (Koole & DeHart, 2007). Tracy and colleagues (2009) have argued that ISE is positively related to PRIDE-A and negatively related to PRIDE-H. Finally, previous research suggests that narcissism is either negatively related to ISE or characterized by the combination of high ESE and low ISE (e.g., Bosson et al., 2008).
Overview and Data Analysis Strategy
Results are divided into two sections concerning: 1) the validity of each type of measure, and 2) mean differences across cultures. Due to the large number of statistical tests, we adopted α = .01 as our significance threshold and focused on overall patterns in the data.1 We report exact p-values where possible so that readers may draw their own conclusions. In addition to the incomplete peer-rating data noted above, some participants were missing data on at least one other measure (9.34% of Euro-Canadians, 8.56% of Asian-Canadians, and 7.14% of Japanese). Because retaining only cases with complete data would have meant discarding a substantial proportion of our sample, we used a combination of the Expectation-Maximization algorithm and bootstrapping for point estimates and inferences (e.g., Little & Rubin, 2002).
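To make the general idea concrete, the sketch below estimates a single correlation by Expectation-Maximization when one variable (y, with None marking missing values) is incomplete and the other (x) is fully observed, assuming bivariate normality and data missing at random, and then wraps it in a percentile bootstrap that returns a 99% interval to match α = .01. This bivariate case is a deliberately simplified stand-in: the actual analyses involved many variables with missingness, and all function names and settings here are hypothetical.

```python
import random
from statistics import fmean


def em_correlation(x, y, max_iter=500, tol=1e-10):
    """ML correlation between x (fully observed) and y (None = missing),
    assuming bivariate normality and data missing at random."""
    mx = fmean(x)
    sxx = fmean((a - mx) ** 2 for a in x)  # ML variance of x
    obs = [(a, b) for a, b in zip(x, y) if b is not None]
    # Starting values: regression of y on x among the complete cases
    ox, oy = fmean(a for a, _ in obs), fmean(b for _, b in obs)
    beta = fmean((a - ox) * (b - oy) for a, b in obs) / fmean((a - ox) ** 2 for a, _ in obs)
    intercept = oy - beta * ox
    resid_var = fmean((b - intercept - beta * a) ** 2 for a, b in obs)
    prev = None
    for _ in range(max_iter):
        # E-step: expected y and y^2 for cases whose y is missing
        ey, ey2 = [], []
        for a, b in zip(x, y):
            if b is None:
                m = intercept + beta * a
                ey.append(m)
                ey2.append(m * m + resid_var)
            else:
                ey.append(b)
                ey2.append(b * b)
        # M-step: ML mean, variance, and covariance for y
        my = fmean(ey)
        syy = fmean(ey2) - my ** 2
        sxy = fmean(a * b for a, b in zip(x, ey)) - mx * my
        # Regression of y on x implied by the updated estimates
        beta = sxy / sxx
        intercept = my - beta * mx
        resid_var = max(syy - beta * sxy, 0.0)
        cur = (my, syy, sxy)
        if prev is not None and max(abs(c - p) for c, p in zip(cur, prev)) < tol:
            break
        prev = cur
    return sxy / (sxx * syy) ** 0.5


def bootstrap_ci(x, y, n_boot=1000, alpha=0.01, seed=0):
    """Percentile bootstrap interval for em_correlation (99% when alpha=.01)."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        xs, ys = [x[i] for i in idx], [y[i] for i in idx]
        observed = [(a, b) for a, b in zip(xs, ys) if b is not None]
        # Skip degenerate resamples with no variance among complete cases
        if len({a for a, _ in observed}) < 2 or len({b for _, b in observed}) < 2:
            continue
        estimates.append(em_correlation(xs, ys))
    estimates.sort()
    k = len(estimates)
    return estimates[int(alpha / 2 * k)], estimates[int((1 - alpha / 2) * k) - 1]
```

With no missing values the EM estimate reduces to the ordinary Pearson correlation; with missing values it borrows information from x instead of discarding incomplete cases, which is the motivation given above for avoiding listwise deletion.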
Validity of Explicit Versus Implicit Self-Esteem Measures
All ESE measures positively correlated with each other among Euro-Canadians (Mean r = .49, p < .001; range: .21 to .82), Asian-Canadians (Mean r = .50, p < .001; range: .27 to .72), and Japanese (Mean r = .41, p < .001; range: .13 to .77), with most correlations reaching statistical significance. Figure 1 displays these correlations visually: positive correlations are shown in blue and negative correlations in red.
To ease interpretation of ISE convergent validity, the sign of the GNAT-SU was reversed so that high scores indicate a weak self-unpleasant relationship. Convergent validity is indicated by positive (blue) correlations in Figure 2. There was a striking lack of positive relationships among ISE measures for Euro-Canadians (Mean r = .005, p = .80; range: −.85 to .29), Asian-Canadians (Mean r = .002, p = .87; range: −.85 to .23), and Japanese (Mean r = −.02, p = .23; range: −.79 to .29). The GNAT-SP and the reversed GNAT-SU were strongly related, but in the direction opposite to that expected. Relationships among measures other than the GNAT-SU were close to 0 and tended to be positive, but also included many negative correlations.
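Summaries of the form “Mean r … ; range: …” can be computed directly from a correlation matrix. Reversing a measure's sign (as with the GNAT-SU) flips the sign of every correlation involving it, so the sketch applies that flip before averaging the unique off-diagonal entries. It averages raw r values; whether the reported means used raw r or Fisher z transformations is not stated here, so that choice and the function name are assumptions.

```python
from itertools import combinations
from statistics import fmean


def mean_off_diagonal(corr, reversed_idx=frozenset()):
    """Mean, minimum, and maximum of the unique off-diagonal correlations.

    corr: square correlation matrix (list of lists).
    reversed_idx: indices of measures whose sign is reversed; a reversed
    measure flips the sign of each of its correlations.
    """
    values = []
    for i, j in combinations(range(len(corr)), 2):
        sign = (-1 if i in reversed_idx else 1) * (-1 if j in reversed_idx else 1)
        values.append(sign * corr[i][j])
    return fmean(values), min(values), max(values)


# Hypothetical 3-measure matrix; measure 2 plays the role of a reversed score
corr = [[1.0, 0.2, -0.4],
        [0.2, 1.0, -0.1],
        [-0.4, -0.1, 1.0]]
print(mean_off_diagonal(corr, reversed_idx={2}))
```

Note that if two measures are both reversed, their mutual correlation keeps its original sign, which is why the sign is computed per pair rather than per measure.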
Assuming that the default explicit response is in agreement with one's implicit feelings towards the self, we expected small positive correlations between ISE and ESE (Dijksterhuis, Albers, & Bongers, 2009; Gawronski & Bodenhausen, 2006). For example, Epstein (2006) argues that “most people's experientially and rationally determined beliefs are mainly congruent, or else they would be in a continuous state of conflict and stress” (p. 71). We briefly note, however, that some theoretical positions state that ISE and ESE ought to be independent (e.g., Hetts & Pelham, 2001). ISE correlations with ESE hovered near 0 for Euro-Canadians (Mean r = .03, p = .01; range: −.22 to .27), Asian-Canadians (Mean r = −.003, p = .12; range: −.22 to .32), and Japanese (Mean r = .08, p < .001; range: −.24 to .61), with a few exceptions (see supplementary online materials). The SEL tended to have moderate correlations with ESE measures among Japanese (Mean r = .47, p < .001; range: .21 to .61) and Asian-Canadians (Mean r = .23, p < .001; range: .10 to .32). Finally, the BNT (Mean r = −.15, p < .01; range: −.22 to −.09) tended to have negative correlations with ESE among Asian-Canadians.
Scales theoretically negatively related to ESE (NA, PBI-MO, and PBI-FO) or ISE (NPI, PRIDE-H, NA, PBI-MO, and PBI-FO) were reverse-scored when examining criterion validity. Thus, criterion validity is indicated by positive (blue) correlations in Figures 3 and 4. To formally compare the predictive power of ESE and ISE measures, we also averaged correlations in two ways: across the criterion measures for each self-esteem measure, and across all ESE (or all ISE) measures and all criterion measures. That is, a single index reflected the criterion validity of each ISE and ESE measure (see the supplementary online materials for individual values), and a single index reflected the predictive validity of all ISE or ESE measures within each cultural group.
Overall, ESE measures moderately predicted the criterion measures for Euro-Canadians (Mean r = .26, p < .001; range: −.10 to .62), Asian-Canadians (Mean r = .29, p < .001; range: .06 to .58), and Japanese (Mean r = .20, p < .001; range: −.16 to .54) and demonstrated mostly positive correlations within each cultural group. With few exceptions (the FU for Euro-Canadians and the FT for Japanese), all ESE measures were significant predictors of the set of criteria with average correlations ranging from .04 to .35 (see supplementary materials).
Overall, ISE measures weakly predicted the criterion measures for Euro-Canadians (Mean r = .04, p < .01; range: −.22 to .27), Asian-Canadians (Mean r = .02, p = .14; range: −.21 to .30), and Japanese (Mean r = .04, p < .01; range: −.63 to .63). Many correlations between ISE measures and criteria were either negative or near zero. Turning to individual ISE measures, only the SC-IAT was significant among Euro-Canadians (Mean r = .13, p < .001; range: −.09 to .26); the IAT might have reached significance under a more liberal α level (Mean r = .08, p = .05; range: −.15 to .27). Only the SEL was significant for Asian-Canadians (Mean r = .13, p < .001; range: −.15 to .30) and Japanese (Mean r = .19, p < .001; range: −.38 to .42).
ISE by ESE Interactions
The final test of the validity of the ISE measures involved the discrepancy between ESE and ISE in predicting narcissism (e.g., Bosson et al., 2008). For each ISE measure and within each cultural group, we regressed narcissism on ESE, ISE, and their interaction. Because of its popularity, the RSES served as the explicit measure in these analyses. Not a single interaction term approached significance, and the average standardized regression coefficients were .03 for Euro-Canadians, −.03 for Asian-Canadians, and .01 for Japanese.2
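The moderated regression just described can be sketched as ordinary least squares on standardized variables with a product term. In this illustration ESE, ISE, and narcissism are z-scored, the interaction is the product of the two predictor z-scores, and the normal equations are solved by Gaussian elimination. This is one common way to obtain standardized coefficients for an interaction, assumed here for illustration; the function names are hypothetical and no significance tests are computed.

```python
from statistics import fmean, pstdev


def zscore(values):
    """Standardize to mean 0 and (population) SD 1."""
    m, s = fmean(values), pstdev(values)
    return [(v - m) / s for v in values]


def solve(a_matrix, b_vector):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(a_matrix)
    m = [row[:] + [b] for row, b in zip(a_matrix, b_vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def moderated_regression(ese, ise, outcome):
    """Standardized OLS coefficients for outcome ~ ESE + ISE + ESE x ISE."""
    ze, zi, zy = zscore(ese), zscore(ise), zscore(outcome)
    rows = [[1.0, e, i, e * i] for e, i in zip(ze, zi)]
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(4)] for a in range(4)]
    xty = [sum(r[a] * y for r, y in zip(rows, zy)) for a in range(4)]
    return dict(zip(["intercept", "ese", "ise", "ese_x_ise"], solve(xtx, xty)))
```

Under the discrepancy hypothesis, narcissism would be predicted by a reliable ESE × ISE term (high ESE combined with low ISE); an interaction coefficient near zero, as reported above, is evidence against it.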
Cultural Variability in Self-Esteem
Comparisons of self-esteem revealed that Euro-Canadians tended to have higher ESE than both Asian-Canadians and Japanese, and Asian-Canadians tended to have higher ESE than Japanese (see supplementary online materials), replicating past research. Most of the cultural differences in ESE were of non-trivial magnitude (e.g., the effect size for Euro-Canadians vs. Japanese ranged from d = .55 to 1.48, all with p < .001, and the average effect size was d = 1.10). In contrast, the pattern of cultural variability in ISE was inconsistent (e.g., the effect size for Euro-Canadians vs. Japanese ranged from d = −.38 to 1.70, with only three measures reaching p < .01, and the average effect size was d = .25; see supplementary online materials).
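For readers unfamiliar with the metric, the d values above are standardized mean differences. The sketch computes Cohen's d with the conventional pooled standard deviation; whether the reported values used exactly this pooling is an assumption, and the function name is illustrative.

```python
from statistics import fmean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled SD.
    Positive when group_a scores higher than group_b."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (fmean(group_a) - fmean(group_b)) / pooled_var ** 0.5


print(cohens_d([2, 3, 4], [1, 2, 3]))  # 1.0
```

A d of 1.10, the average ESE difference between Euro-Canadians and Japanese, therefore means the group means differ by about one pooled standard deviation.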
In Study 1 we found that ESE measures outperformed ISE measures. Nearly all ESE measures correlated positively with each other and with each criterion, and cultural variability in ESE was consistent with previous research (Heine & Hamamura, 2007). In contrast, the convergent and criterion validity of ISE measures was nearly non-existent. This result is unlikely to be due to fatigue (since ISE and ESE were counterbalanced) and converges with findings from previous research (Bosson et al., 2000; Buhrmester et al., 2011; Rudolph et al., 2008). No single ISE measure stood out across all cultural groups and consistent cultural variability in ISE measures was not evident. Why did ISE measures display such poor validity evidence? Given that measures such as the IAT show good predictive validity in other domains and some new measurement procedures examined have provided methodological improvements, it seems implausible that none of the measurement procedures we examined are good candidates for assessing ISE. Instead, we now turn to how ISE is often conceptualized and operationalized.
Conceptualizing and Measuring Implicit Self-Esteem
Many agree that the implicit processing system is associative (Bosson, 2006; Epstein, 2006; Gawronski & Bodenhausen, 2006; Greenwald et al., 2002). For example, the self may be associated with multiple other valenced concepts, and possibly exists as a network of associations or a schema in memory (e.g., Epstein, 2006; Gawronski & Bodenhausen, 2006). Consistent with these ideas, Greenwald and colleagues (2002) have defined ISE as: “… the association of the concept of self with a valence attribute” (p. 5). It is clear from this definition, and from the way many ISE measures are implemented (with the self-concept typically primed with self-related pronouns or idiographically generated items; e.g., Yamaguchi et al., 2007), that ISE is viewed as a high-level global construct rather than as something specific or multifaceted. Furthermore, the positive and negative stimuli used often do not form a well-defined positive or negative concept (e.g., warm, ugly, happy, filthy).
Self-concepts may be multidimensional, and individuals may have multiple different self-representations (Markus & Wurf, 1987). ISE may be similarly multidimensional or highly complex (see Bosson, 2006; Epstein, 2006). For example, Koole and DeHart (2007) argue that implicit representations of the self possibly encompass “the totality of the person's needs, motives, and autobiographical experiences” (p. 25). Just as global ESE may be hierarchically structured and encompassing of self-worth, self-liking, self-competence, and feelings that the self is moral, strong, valued, and accepted by others, so might ISE (Epstein, 2006). Measures of ISE that encompass the multidimensionality of the self might thus evince greater criterion validity. Alternatively, we may have implicit attitudes towards ourselves for different social contexts. For example, Bosson (2006) argues that different facets of ISE may correspond to different domains such as the “social self” or “academic or intellectual self” (p. 55). The predictive validity of ISE may thus be enhanced by targeting the self in different contexts.
The above review suggests that there are two prominent alternative ways of conceptualizing ISE that are currently not reflected in typical measurement instruments: 1) ISE as a multifaceted construct (e.g., self-competence, self-liking, etc.), and 2) ISE as a domain-specific construct (e.g., implicit feelings toward one's academic self, social self, etc.). In Study 2, we attempted to increase the validity of some ISE measures using two manipulations meant to tap these alternative conceptualizations. To allow for some diversity in criterion measures, we included some new criteria and retained some from Study 1. Because one critique of ESE measures is their contamination by response biases, we included measures of self-deception, impression management, and modesty. Finally, to reduce possible fatigue, we greatly reduced the total number of measures.
We recruited 623 individuals via Amazon's Mechanical Turk and paid each $0.50. A single attention-check question was used to screen participants: “Answer six for this question so that we know you are paying attention.” A total of 582 people answered this question correctly (65.12% female; M age = 32.25; SD = 11.93; see supplementary online materials for additional demographic information).
Design and Procedure
After completing a demographics form, participants were randomly assigned to one of three conditions: 1) Control, 2) Explicit Prime, or 3) Task Prime. All participants completed ISE, ESE, and criterion measures in the order presented below. As in Study 1, participants in the control condition did not receive any prime before or during completion of these measures.
Before completing these measures, participants in the Explicit Prime condition wrote for 5 minutes about the following prompt: “Please think for a moment about how you feel about yourself when at work or school. Do you feel good about yourself or bad about yourself? Do you often do a good job at work/school? Or do you perform poorly compared to others? Do you get along with others? Or do you have a hard time making friends at work/school?” The purpose of this task was to tap ISE as a domain-specific construct by focusing participants’ attention on a context that typically takes up a large proportion of individuals’ lives (i.e., work/school) and on one of two domains (i.e., competence in work performance or relationships). Given the diversity of our sample, allowing a limited degree of choice in the exact topic was necessary, as no single life activity would be equally important for everyone.
Participants in the Task Prime condition completed ISE measures with stimuli intended to tap implicit self-competence, which may be considered one major facet of self-esteem (Tafarodi & Swann, 2001). Specifically, the pleasant and unpleasant category labels and stimulus words for the IAT and SC-IAT were replaced with words corresponding to competent (competent, capable, skilled, qualified, smart, and intelligent) and incompetent (incompetent, incapable, clumsy, unqualified, stupid, and dumb). Participants in the other conditions saw the same pleasant and unpleasant stimulus words as in Study 1.
Because the sample was primarily of Western cultural background, we included the ISE measures that were most promising in the Euro-Canadian sample from Study 1, namely the IAT and SC-IAT. For self- and best-friend-related stimuli, participants saw pronouns and related words (e.g., I, me, mine, friend, bud, companion). In addition, the name letter test (NLT; Bosson et al., 2000) was included to make up for its absence from Study 1. Analogous to the BNT, participants who tend to like the initials of their own name are thought to have high ISE.
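The scoring of the NLT is not described here. The sketch below assumes that participants rated how much they like every letter of the alphabet. Under that assumption, one simple rule is the mean rating of one's own initials minus the mean rating of all other letters. The function name and this scoring rule are assumptions for illustration. Published scoring algorithms often also correct for how much people in general like each letter.

```python
import string
from statistics import mean


def name_letter_score(ratings: dict[str, float], first_initial: str, last_initial: str) -> float:
    """Hypothetical NLT score: liking of own initials minus liking of other letters.

    `ratings` maps each uppercase letter A-Z to one participant's liking rating.
    Positive scores indicate a preference for one's own initials, which is
    interpreted as higher ISE.
    """
    initials = {first_initial.upper(), last_initial.upper()}
    own = [ratings[letter] for letter in initials]
    others = [r for letter, r in ratings.items() if letter not in initials]
    return mean(own) - mean(others)


# Hypothetical participant with initials "A. B." who rates most letters 4.0.
ratings = {letter: 4.0 for letter in string.ascii_uppercase}
ratings["A"], ratings["B"] = 6.0, 5.0
print(name_letter_score(ratings, "a", "b"))  # 1.5
```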
Since many ESE measures performed similarly in Study 1, we included the widely used Rosenberg (1965) self-esteem scale (RSES) in Study 2.
As in Study 1, participants completed the authentic pride (PRIDE-A), hubristic pride (PRIDE-H), and positive and negative affect scales (PA and NA; Tracy et al., 2009; Watson et al., 1988). These measures were retained to assess whether any ISE measure could predict self-reported affect. Self-report measures new to Study 2 were the habit index of negative thinking (HINT; Verplanken et al., 2007), self-deception and impression management (SD and IM; Paulhus, 1991), modesty (MOD; Whetstone et al., 1992), and the feedback-seeking questionnaire (FSQ; Swann et al., 1992). The HINT is intended to measure the tendency to automatically have negative self-thoughts and should theoretically be negatively related to ISE. SD, IM, and MOD were measured to test whether such response tendencies would be unrelated to ISE measures but entangled with ESE measures. Finally, the FSQ asks participants to choose, from a list of questions, which ones they would like a friend to answer. The available questions are designed to elicit either favorable or unfavorable information about the participant, and participants with high ISE are expected to seek more positive feedback to reinforce their pre-existing schemas. This measure was chosen as an alternative to the AST for tapping participants’ tendencies to seek and interpret self-relevant information. In addition, analogous to Bosson et al. (2000), two research assistants rated the essays from the Explicit Prime condition on the writer's self-competence (ES-SC), self-liking (ES-SL), and global self-esteem (ES-GL). We also used the Linguistic Inquiry and Word Count program (LIWC; Pennebaker, Booth, & Francis, 2007) to calculate the percentage of positive emotion (ES-PA) and negative emotion (ES-NA) words in each essay.
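LIWC scores a text by the percentage of its words that fall into each dictionary category. The sketch below reproduces only that arithmetic, using tiny hypothetical word lists. It is not LIWC itself. The real LIWC 2007 dictionaries are much larger and also match word stems, which this sketch ignores. The names `POSITIVE_EMOTION`, `NEGATIVE_EMOTION`, and `category_percentage` are illustrative.

```python
import re

# Hypothetical mini-dictionaries for illustration only. The actual LIWC 2007
# dictionaries are far larger and also match word stems, which this ignores.
POSITIVE_EMOTION = {"good", "happy", "proud", "enjoy", "glad"}
NEGATIVE_EMOTION = {"bad", "poorly", "worried", "sad", "lonely"}


def category_percentage(text: str, category: set[str]) -> float:
    """Percentage of word tokens in `text` that appear in `category`."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(word in category for word in words)
    return 100.0 * hits / len(words)


essay = "I feel good at work but sometimes I do poorly."
print(category_percentage(essay, POSITIVE_EMOTION))  # 10.0
print(category_percentage(essay, NEGATIVE_EMOTION))  # 10.0
```

Because the score is a percentage of all words, longer and shorter essays are placed on a common scale. This is what makes ES-PA and ES-NA comparable across participants.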
Although we had hoped that our experimental manipulations would boost the validity of ISE measures, in general this was not the case. Nevertheless, we present results separately for each experimental condition (Control, Task Prime, and Explicit Prime). Because some data were missing, we again used the same analytic techniques as in Study 1.
Convergent and Divergent Validity
The RSES did not strongly correlate with any ISE measure (range: r = −.05 to r = .17, all p's > .01; see online supplementary materials). Average intercorrelations among ISE measures were r = .04, p = .36 (Explicit Prime), r = .04, p = .31 (Task Prime), and r = .10, p = .02 (Control). The last of these was driven primarily by a positive relationship between the IAT and SC-IAT (range: .28 to .33). This relationship could reflect shared method variance, as these two measures did not tend to correlate with the NLT (range: −.16 to .01).
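An average intercorrelation is the mean of the off-diagonal entries of a correlation matrix. As a result, a single sizable pair (such as the IAT and SC-IAT) can be diluted by near-zero pairs involving the NLT. The sketch below uses hypothetical correlations for illustration only. It does not reproduce the missing-data estimation or the significance tests used in the article. The function name `average_intercorrelation` is illustrative.

```python
from itertools import combinations
from statistics import mean


def average_intercorrelation(matrix: list[list[float]]) -> float:
    """Mean of the upper-triangle (off-diagonal) entries of a correlation matrix."""
    k = len(matrix)
    pairs = [matrix[i][j] for i, j in combinations(range(k), 2)]
    return mean(pairs)


# Hypothetical correlations among IAT, SC-IAT, and NLT (illustrative only).
R = [
    [1.00, 0.30, -0.05],
    [0.30, 1.00, 0.00],
    [-0.05, 0.00, 1.00],
]
print(round(average_intercorrelation(R), 2))  # 0.08
```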
The RSES demonstrated small to large positive correlations with IM (range: .12 to .51) and moderate negative correlations with MOD (range: −.22 to −.47). Weaker relationships were present between the RSES and SD (range: −.01 to .28). In contrast, ISE measures did not display a consistent pattern of correlations with any measure of response bias or modesty (range: −.08 to .15; all p's > .01; see online supplementary materials). These results suggest that ISE measures are unrelated to response biases, whereas ESE measures may be contaminated with them.
As in Study 1, all ISE measures exhibited poor criterion validity. To make Figure 5 (left panel) easier to interpret, several criteria were reverse coded: NA, PRIDE-H, HINT, and ES-NA. Some correlations were in the direction consistent with criterion validity, but they were small and inconsistent across experimental conditions, and many were in the opposite direction from that expected. We again computed average correlations between the criteria and each ISE measure (see the supplementary materials). Only the SC-IAT among Control participants (mean r = .10, p < .01) significantly predicted the criterion measures at the α = .01 level, although the IAT in the Control condition (mean r = .07, p = .04) and the NLT in the Task Prime condition (mean r = .11, p = .03) came close. Note also that none of the individual correlations in the left panel of Figure 5 met the p < .01 threshold, despite relatively large sample sizes in each cell (n's > 190).
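Two computations underlie this paragraph. The first aligns the sign of each criterion so that a positive correlation always indicates the expected direction before averaging. The second finds the size of correlation needed to reach p < .01 with roughly 190 participants per cell. The Python sketch below illustrates both under stated assumptions. The correlations are hypothetical, and averaging is a plain arithmetic mean. The critical value uses the Fisher z normal approximation rather than an exact t test. The function names are illustrative.

```python
import math
from statistics import NormalDist, mean

# Criteria reverse coded for Figure 5 so that a positive r always means
# "in the direction expected if the ISE measure were valid".
REVERSED = {"NA", "PRIDE-H", "HINT", "ES-NA"}


def mean_validity(r_by_criterion: dict[str, float]) -> float:
    """Arithmetic mean of ISE-criterion correlations after sign alignment."""
    aligned = [-r if name in REVERSED else r for name, r in r_by_criterion.items()]
    return mean(aligned)


def critical_r(n: int, alpha: float = 0.01) -> float:
    """Smallest |r| reaching two-tailed significance (Fisher z approximation)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return math.tanh(z_crit / math.sqrt(n - 3))


# Hypothetical correlations for one ISE measure in one condition.
example = {"PA": 0.06, "NA": -0.02, "HINT": 0.04}
print(round(mean_validity(example), 2))  # 0.01
print(round(critical_r(190), 2))         # 0.19
```

With about 190 participants per cell, a correlation would need an absolute value of roughly .19 to pass the .01 threshold. The statement that no individual correlation met that threshold therefore implies that every one was smaller than about .19 in absolute value.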
As shown in Figure 5 (right panel), the RSES displayed good criterion validity. Correlations ranged from .20 to .77 (all p < .01) across all measures and conditions. Average correlations between the RSES and criteria ranged from .49 to .53 (all p < .001) across conditions (see the supplementary materials). Even after controlling for response styles, ESE had much better criterion validity than ISE (see partial correlations in supplementary materials).
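A partial correlation removes the variance that a response-style measure shares with both the RSES and a criterion. For a single covariate z, the first-order partial correlation is r_xy·z = (r_xy − r_xz·r_yz) / sqrt((1 − r_xz²)(1 − r_yz²)). The sketch below implements that formula with hypothetical inputs. The article does not say here whether SD, IM, and MOD were partialled out jointly. Doing so would require a higher-order partial correlation or a regression approach rather than this one-covariate formula.

```python
import math


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation between x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Hypothetical values: x = RSES, y = a criterion, z = impression management.
print(round(partial_r(r_xy=0.50, r_xz=0.40, r_yz=0.20), 2))  # 0.47
```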
Study 2 provided an initial test of whether ISE is a unidimensional, global construct or is better conceptualized as context or domain specific. Neither manipulation implemented in Study 2 boosted the criterion validity of ISE measures. These manipulations were using self-competence stimuli instead of general positive-negative stimuli for the IAT and SC-IAT, and asking participants to write a self-esteem related essay before completing ISE measures. Overall, the RSES again outperformed ISE measures. One possible explanation for this pattern of results is that our manipulations were ineffective or too vague. However, even a weak manipulation should have produced at least a trend toward the expected effect in such a large sample (N = 582). The data we have presented thus suggest that increased self-reflection and depth of processing about the self do not increase the validity of ISE measures.
Our two studies constitute an extensive search for ISE measure validity. In all cases, ESE measures outperformed ISE measures, and we found scant evidence that ISE measures assessed anything related to self-esteem. One possible rebuttal is that our criteria were unrelated to ISE, but this seems implausible because we covered a wide range of phenomena thought to be linked to ISE. These include affective experiences (Conner & Barrett, 2005), parents’ care and over-protectiveness (DeHart et al., 2006), seeking and interpreting information about the self (Hetts & Pelham, 2001), and automatic negative self-thoughts (Verplanken et al., 2007). ISE is commonly thought to predict behavior and LIWC-coded essays (Peterson & DeHart, 2013; Rudolph et al., 2010; Spalding & Hardin, 1999). Yet the ISE measures in our studies did poorly at predicting peer ratings of self-esteem (Study 1) and ratings of participants’ essays (Study 2; using LIWC and independent raters), whereas ESE predicted these criteria. To the extent that any of these three criteria could pick up on such unconsciously driven behavior, we should have seen at least some evidence for ISE validity. Admittedly, previous research is inconsistent as to whether ISE should have a unique effect (Rudolph et al., 2010; Spalding & Hardin, 1999) or should interact with other variables (Peterson & DeHart, 2013). Moreover, the notion that implicit measures ought to correlate better with independent ratings than with self-reports is not supported by a recent meta-analysis (Greenwald et al., 2009).
Our findings are consistent with previous research, but our studies also included advanced implicit attitude measurement procedures that were not available to Bosson et al. (2000) and not reviewed by Buhrmester et al. (2011). Despite the methodological advances of the past decade, we saw no improvement in the validity of these newer measures. Some have suggested alternative measurement procedures based on in-depth interviews and coding schemes, which avoid the quick, intuitive judgments required by most ISE measures in our studies (Buhrmester et al., 2011). However, such measures have not yet been developed and so cannot be evaluated at this time. Given the existing measures and the available evidence, our research points to a replication problem. To highlight this problem, consider that including unpublished data causes one of the most “replicated” ISE effects to vanish or become very weak (Bosson et al., 2008). Consider also that many IAT and NLT findings have few or no replications (Buhrmester et al., 2011). Thus, we suggest that the weight of the validity evidence in support of ISE is weak, and the burden of proof now lies with those who wish to consider ISE a viable construct.
In conclusion, one plausible remaining explanation for the lack of ISE validity is that there may be problems with the way ISE is conceptualized. We speculate that individuals may not form implicit associations with the self as an attitudinal object (cf. Duval & Wicklund, 1972). Because our attention is typically directed at objects separate from the self, we would be more likely to form such implicit associations with those objects themselves. This possibility remains to be addressed in future research, along with other possible measurement procedures for ISE (e.g., Buhrmester et al., 2011). Elsewhere we elaborate on this explanation and on the utility of considering ISE as a process whereby individuals project their self-feelings onto self-associated objects (Falk & Heine, 2013; see also Greenwald & Banaji, 1995). For instance, people may unknowingly apply their self-feelings to various extensions of the self: the objects they own, the groups they belong to, the decisions they make, and so on. This alternative conceptualization could allow self-feelings to be measured indirectly through these self-associated objects. Because conceptualizing ISE as a domain-specific construct failed to improve ISE validity, we offer this position as an alternative to the idea that there may be too many implicit associations with the self to be reliably activated and measured by current procedures.
Because we wished to err on the side of high power for the benefit of ISE measures, this alpha level (α = .01) was chosen, somewhat arbitrarily, as a compromise between no Type I error control and Bonferroni corrections adjusting for all tests. The latter may be overly conservative. For example, a Bonferroni correction on just the 117 ISE–criterion correlations for a single cultural group would yield a significance threshold of p = .0004 (i.e., .05/117). Incidentally, a .01 threshold corresponds roughly to what would be used had we implemented the Benjamini-Hochberg procedure for controlling the false discovery rate. That procedure has been recommended as a higher-power replacement for the Bonferroni procedure (see Thissen, Steinberg, & Kuang, 2002). For instance, applying this procedure to all correlations among the ESE, ISE, and criterion measures for Euro-Canadians in Study 1 would adjust the p-value for the relationship between PBI-MO and the SC-IAT from .011 to .049.
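The thresholds discussed in this note can be computed directly. The Python sketch below derives the Bonferroni cutoff for 117 tests. It also implements Benjamini-Hochberg adjusted p-values using the step-up procedure, in which each sorted p-value is multiplied by m/rank and the results are then made monotone from the largest rank down. The p-values in the example are hypothetical. The article's .011 to .049 adjustment depends on the full set of Study 1 correlations, which is not reproduced here. The function names are illustrative.

```python
def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / m


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """BH-adjusted p-values (step-up procedure), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0  # caps adjusted values at 1 and enforces monotonicity
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(round(bonferroni_threshold(0.05, 117), 4))  # 0.0004
# Hypothetical p-values, for illustration only.
print([round(p, 3) for p in benjamini_hochberg([0.001, 0.011, 0.03, 0.20])])
# [0.004, 0.022, 0.04, 0.2]
```

For real analyses, `statsmodels.stats.multitest.multipletests` with `method="fdr_bh"` provides the same adjustment.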
Comparable results are obtained regardless of the choice of ESE measure and are reported in the supplementary materials; a negative coefficient is the expected direction.
- (2006). Conceptualization, measurement, and functioning of nonconscious self-esteem. In M. H. Kernis (Ed.), Self-esteem Issues and Answers: A Sourcebook of Current Perspectives (pp. 53–59). New York: Psychology Press.
- (2008). Untangling the links between narcissism and self-esteem: A theoretical and empirical review. Social and Personality Psychology Compass, 2, 1415–1439.
- (2000). Stalking the perfect measure of implicit self-esteem: The blind men and the elephant revisited? Journal of Personality and Social Psychology, 79, 631–643.
- (2011). Implicit self-esteem: Nature, measurement, and a new way forward. Journal of Personality and Social Psychology, 100, 365–385.
- (2005). Implicit self-attitudes predict spontaneous affect in daily life. Emotion, 5, 476–488.
- (2006). What lies beneath: Parenting style and implicit self-esteem. Journal of Experimental Social Psychology, 42, 1–17.
- (2009). Implicit measures: A normative analysis and review. Psychological Bulletin, 135, 347–368.
- (2009). Digging for the real attitude: Lessons from research on implicit and explicit self-esteem. In R. E. Petty, R. H. Fazio, & P. Briñol (Eds.), Attitudes: Insights from the New Implicit Measures (pp. 229–250). New York: Psychology Press.
- (1972). A theory of self awareness. Oxford: Academic Press.
- (2006). Conscious and unconscious self-esteem from the perspective of cognitive-experiential self-theory. In M. H. Kernis (Ed.), Self-esteem Issues and Answers: A Sourcebook of Current Perspectives (pp. 69–76). New York: Psychology Press.
- (2013). What is implicit self-esteem and does it vary across cultures? Unpublished manuscript. Vancouver, Canada: Department of Psychology, University of British Columbia.
- (2009). Why do Westerners self-enhance more than East Asians? European Journal of Personality, 23, 183–203.
- (2006). Associative and propositional processes in evaluation: An integrative review of implicit and explicit attitude change. Psychological Bulletin, 132, 692–731.
- (1995). Implicit social cognition: Attitudes, self-esteem, and stereotypes. Psychological Review, 102, 4–27.
- (2000). Using the implicit association test to measure self-esteem and self-concept. Journal of Personality and Social Psychology, 79, 1022–1038.
- (2002). A unified theory of implicit attitudes, stereotypes, self-esteem, and self-concept. Psychological Review, 109, 3–25.
- (2009). Understanding and using the implicit association test: III. Meta-analysis of predictive validity. Journal of Personality and Social Psychology, 97, 17–41.
- (2007). In search of East Asian self-enhancement. Personality and Social Psychology Review, 11, 4–27.
- (1997). The cultural construction of self-enhancement: An examination of group-serving biases. Journal of Personality and Social Psychology, 72, 1268–1283.
- (2001). A case for the nonconscious self-concept. In G. B. Moskowitz (Ed.), Cognitive Social Psychology: The Princeton Symposium on the Legacy and Future of Social Cognition (pp. 105–123). Mahwah, NJ: Erlbaum.
- (1999). Two roads to positive regard: Implicit and explicit self-evaluation and culture. Journal of Experimental Social Psychology, 35, 512–559.
- (2005). A meta-analysis on the correlation between implicit association test and explicit self-report measures. Personality and Social Psychology Bulletin, 31, 1369–1385.
- (2004). Measuring self-esteem using the implicit association test: The role of the other. Personality and Social Psychology Bulletin, 30, 22–34.
- (2006). The single category implicit association test as a measure of implicit social cognition. Journal of Personality and Social Psychology, 91, 16–32.
- (2003). Implicit-explicit differences in self-enhancement for Americans and Japanese. Journal of Cross-Cultural Psychology, 34, 522–541.
- (2007). Self-affection without self-reflection: Origins, models, and consequences of implicit self-esteem. In C. Sedikides & S. Spencer (Eds.), The Self in Social Psychology (pp. 36–86). New York: Psychology Press.
- (2008). Are implicit and explicit measures of self-esteem related? A meta-analysis for the name-letter test. Personality and Individual Differences, 44, 521–531.
- (2002). Statistical Analysis with Missing Data, 2nd edition. New York: John Wiley.
- (1987). The dynamic self-concept: A social psychological perspective. Annual Review of Psychology, 38, 299–337.
- (2001). The go/no-go association task. Social Cognition, 19, 625–664.
- (1979). A parental bonding instrument. British Journal of Medical Psychology, 52, 1–10.
- (1991). Measurement and control of response bias. In J. P. Robinson, P. Shaver, & L. S. Wrightsman (Eds.), Measures of Personality and Social Psychological Attitudes (pp. 17–59). San Diego: Academic Press.
- (2005). An inkblot for attitudes: Affect misattribution as implicit measurement. Journal of Personality and Social Psychology, 89, 277–293.
- (1989). From self-conceptions to self-worth: On the sources and structure of global self-esteem. Journal of Personality and Social Psychology, 57, 672–680.
- (2007). Linguistic Inquiry and Word Count: LIWC [Computer software]. Austin, TX: LIWC.net.
- (2013). Regulating connection: Implicit self-esteem predicts positive non-verbal behavior during romantic relationship-threat. Journal of Experimental Social Psychology, 49, 99–105.
- (1988). A principal-components analysis of the narcissistic personality inventory and further evidence of its construct validity. Journal of Personality and Social Psychology, 54, 890–902.
- (2008). Implicit cognition and substance use: A meta-analysis. Addictive Behaviors, 33, 1314–1328.
- (1965). Society and the Adolescent Self-image. Princeton, NJ: Princeton University Press.
- (2010). Easier when done than said! Implicit self-esteem predicts observed or spontaneous behavior, but not self-reported or controlled behavior. Zeitschrift für Psychologie, 218, 12–19.
- (2008). Through a glass, less darkly? Reassessing convergent and discriminant validity in measures of implicit self-esteem. European Journal of Psychological Assessment, 24, 273–281.
- (2004). Cognition and emotion? The dead end in self-esteem research. Journal for the Theory of Social Behavior, 34, 73–90.
- (2007). Evaluating the evidence for pancultural self-enhancement. Asian Journal of Social Psychology, 10, 201–203.
- (1999). Unconscious unease and self-handicapping: Behavioral consequences of individual differences in implicit and explicit self-esteem. Psychological Science, 10, 535–539.
- (1992). The allure of negative feedback: Self-verification strivings among depressed persons. Journal of Abnormal Psychology, 101, 293–306.
- (1998). Paradoxical self-esteem and selectivity in the processing of social information. Journal of Personality and Social Psychology, 74, 1181–1196.
- (2001). Two-dimensional self-esteem: Theory and measurement. Personality and Individual Differences, 31, 653–673.
- (2002). Quick and easy implementation of the Benjamini-Hochberg procedure for controlling the false positive rate in multiple comparisons. Journal of Educational and Behavioral Statistics, 27, 77–83.
- (2008). Minimizing method-specific variance in the IAT: A single block IAT. European Journal of Psychological Assessment, 24, 237–245.
- (2009). Authentic and hubristic pride: The affective core of self-esteem and narcissism. Self and Identity, 8, 196–213.
- (2007). Mental habits: Metacognitive reflection on negative self-thinking. Journal of Personality and Social Psychology, 92, 526–541.
- (1988). Development and validation of brief measures of positive and negative affect: The PANAS scale. Journal of Personality and Social Psychology, 54, 1063–1070.
- (1992). The modest responding scale. Paper presented at the convention of the American Psychological Society, San Diego.
- (2007). Apparent universality of positive implicit self-esteem. Psychological Science, 18, 498–500.
Appendix S1 Supplementary materials for Falk, Heine, Takemura, Zhang, & Hsu, 2014, Journal of Personality.