Contract grant sponsor: Social Sciences and Humanities Research Council of Canada; Contract grant sponsor: Canadian Institutes of Health Research.
Research Article
Students Aggress Against Professors in Reaction to Receiving Poor Grades: An Effect Moderated by Student Narcissism and Self-Esteem
Article first published online: 19 SEP 2012
DOI: 10.1002/ab.21450
© 2012 Wiley Periodicals, Inc.
Additional Information
How to Cite
Vaillancourt, T. (2013), Students Aggress Against Professors in Reaction to Receiving Poor Grades: An Effect Moderated by Student Narcissism and Self-Esteem. Aggr. Behav., 39: 71–84. doi: 10.1002/ab.21450
Publication History
- Issue published online: 22 JAN 2013
- Manuscript Accepted: 7 AUG 2012
- Manuscript Received: 27 FEB 2012
Funded by
- Social Sciences and Humanities Research Council of Canada
- Canadian Institutes of Health Research
Keywords:
- aggression;
- narcissism;
- self-esteem;
- experimental design;
- teaching evaluations
Abstract
Laboratory evidence about whether students’ evaluations of teaching (SETs) are valid is lacking. Results from three independent studies strongly confirm that "professors" who were generous with their grades were rewarded for their favor with higher SETs, while professors who were frugal were punished with lower SETs (Study 1, d = 1.51; Study 2, d = 1.59; Study 3, partial η² = .26). This result was found even when the feedback was manipulated to be more or less insulting (Study 3). Consistent with laboratory findings on direct aggression, results also indicated that, when participants were given poorer feedback, higher self-esteem (Study 1 and Study 2) and higher narcissism (Study 1) were associated with lower (more aggressive) evaluations of the "professor." Moreover, consistent with findings on self-serving biases, participants higher in self-esteem who were in the positive grade/feedback condition exhibited a self-enhancing bias by giving their "professor" higher evaluations (Study 1 and Study 2). The aforementioned relationships were not moderated by the professor's sex or rank (teaching assistant vs. professor). Results provide evidence that (1) students do aggress against professors through poor teaching evaluations, (2) threatened egotism among individuals with high self-esteem is associated with more aggression, especially when coupled with high narcissism, and (3) self-enhancing biases are robust among those with high self-esteem. Aggr. Behav. 39:71-84, 2013. © 2012 Wiley Periodicals, Inc.
“My evaluator must not mark hastily and should re-read key points to offer better feedback. It is actually the poorest job in terms of evaluation that I've ever seen. It is easy for the evaluator to talk crap like “This is the worst essay I have ever read,” but when offering feedback about how it may be improved say nothing except “needs clarification.” How is this supposed to help me improve? This evaluator is a complete dumb ass, excuse my French.” Male, negative feedback condition.
INTRODUCTION
It is common practice to have students evaluate the performance of college/university instructors and professors. It is also common for faculty to express concern over student evaluations. Indeed, many feel that student ratings are not valid, but rather reflect a popularity contest (Aleamoni, 1987), which is largely driven by ease of marking (reviewed below) and which potentially influences the trend of grade inflation seen over the past few decades (Anglin & Meng, 2000; Kuh & Hu, 1999; Redding, 1998). Concerns over the validity of student evaluations are reasonable, given that decisions about promotion, tenure, and annual raises are, in part, based on these assessments.
There are many published studies, almost all nonexperimental, on the topic of students’ evaluations of teaching (SETs), which contribute to the concern about the soundness of such assessments. In theory, a student's appraisal of a professor's teaching ability should be based on competence and performance and not on other factors that are beyond the scope of practice. Unfortunately, research points to the fact that there are many nonmerit-based factors that are related to students’ evaluations. For example, professors’ level of physical attractiveness is one source of bias, with so-called “hot” professors receiving better teaching evaluations than their presumably less attractive colleagues (Felton, Mitchell, & Stinson, 2004; Freng & Webber, 2009). Sex also seems to matter, but not always. Several researchers have found that students tend to rate their female professors lower than their male professors (Basow & Silberg, 1987; Bennett, 1982; Kierstead, D'Agostino, & Dill, 1988), although others have found no relationship (see review by Laube, Massoni, Sprague, & Ferber, 2007). The extent to which these sex differences still hold true is unclear, as recent work in this area is lacking. SETs are also influenced by other factors such as perceived course difficulty (Addison, Best, & Warrington, 2006), personal interest in the subject matter (Marsh & Roche, 1997), faculty offering the course (Beran & Violato, 2005), school's ranking (Bowling, 2008), and first impression of the professor (Ambady & Rosenthal, 1993).
The most robust correlate of favorable student ratings is, however, expected or received grades (e.g. Addison et al., 2006; Beran & Violato, 2005; Bowling, 2008; Brown, 1976; Coladarci and Kronfield, 2007; Felton et al., 2004; Greenwald & Gillmore, 1997; Griffin, 2004; Isely & Singh, 2005; Kennedy, 1975; Krautmann & Sanders, 1999; Langbein, 2008; Olivares, 2001). There are many factors that could explain this relationship (see Greenwald & Gillmore, 1997; Marsh, 1987). For example, it could be that effective teachers have a positive influence on students’ performance, which in turn positively impacts students’ evaluations (validity hypothesis). It could also be that academically motivated students are more likely to do well and appreciate their instructors’ efforts (prior characteristics hypothesis). It is also suggested that professors who are generous with their grades are rewarded for their favor with more positive ratings (of instructor and/or course) than professors who are more parsimonious (grading leniency hypothesis).
The grading leniency hypothesis is controversial in the field of adult education. Critics tend to cite, among other things, the positive, albeit low, correlation obtained between expected/received grades and SETs as evidence that the bias does not really exist (e.g. Marsh & Roche, 1997, 2000). The central problem, however, for those arguing for or against the grading leniency hypothesis is that causality cannot be inferred from correlational research. Curiously, the few experimental studies that can provide information about causality have been vilified by opponents of the grading leniency hypothesis. In particular, Marsh and Roche (1997) have stated that the experimental work in this area is “methodologically weak, ethically indefensible, unrepresentative of naturally occurring differences in grading leniency (to the extent the manipulations represent grading leniency at all), and weak in terms of results” (p. 1192).
So what has been found using experimental methods? In all the studies that have manipulated the type of grade received, students have responded by evaluating their supposed instructor more favorably (higher SETs) in the high grade condition and more negatively (lower SETs) in the low grade condition (e.g. Blunt, 1991; Holmes, 1972; Powell, 1977; Snyder & Clair, 1976; Worthington & Wong, 1979). In other words, experimental studies show support for the grading leniency hypothesis. However, as Marsh and Roche (1997) point out, these studies do share a number of methodological weaknesses, including randomly assigning students who were selected from courses taught by the researcher into different grading groups (issue of dependence), and not reporting effect sizes or providing relevant information so that an effect size could be calculated.
Experimental studies involving grade manipulation have also shown that students who received a bogus high grade tend to attribute it to their own ability, while students who received a bogus low grade attribute their poor performance to the instructor's incompetence (e.g. Snyder & Clair, 1976). Although this type of self-serving bias has been found across many studies (Arkin & Maruyama, 1979; Arnold, 2009; Davis and Stephan, 1980; Gilmor & Reid, 1979), it is interesting to note that lower SETs are often not viewed as a form of revenge (e.g. Arnold, 2009; Boysen, 2008). This opinion tends to diverge from what is reported in the aggression literature. In this area of study, experimental paradigms that involve provoking undergraduate students by manipulating their “test” results or using negative “intelligence” feedback as the instigator tend to result in an aggressive response (see Bettencourt & Miller, 1996 for review). Indeed, threats to ego are often met with aggression, particularly among individuals who are high on narcissism (e.g. Barry, Chaplin, & Grafeman, 2006; Bushman et al., 2009; Jones & Paulhus, 2010; Konrath, Bushman, & Campbell, 2006; Stucke & Sporer, 2002; Thomaes, Bushman, Stegg, & Olthof, 2008; Twenge & Campbell, 2003). Narcissism is characterized by a grandiose sense of self-importance, uniqueness, and entitlement, a need for admiration, a lack of empathy, arrogance, the exploitation of others, and envy (APA, 2000).
In a seminal study by Bushman and Baumeister (1998), the relationship between narcissism and aggression was convincingly illustrated. Participants were given the opportunity to aggress against an innocent third party or against a person they presumed had insulted them (“This is one of the worst essays I have read!”) or praised them (“No suggestions, great essay!”) on an essay they wrote about the controversial topic of abortion. Results indicated that narcissism and insult “led to exceptionally high levels of aggression toward the source of insult” (p. 219), whereas self-esteem held no relationship. More recently, Bushman et al. (2009) reanalyzed the data from this study and found that in fact, self-esteem did moderate the relationship between narcissism and aggression. At high levels of self-esteem, there was a positive relationship between narcissism and aggression, especially in the context of a threat to ego, whereas at low levels of self-esteem, narcissism was not found to be related to aggression (see also Thomaes et al., 2008).
The present studies were designed to examine the hypothesis that poor teaching evaluations are a form of revenge. This hypothesis was examined by replicating Bushman and Baumeister's (1998) findings, using a more ecologically valid measure of aggression (i.e. SETs). Bushman and Baumeister's aggression measure was the delivery of a noise blast to the person who had ostensibly insulted them. Participants were able to set the intensity and duration of the blast of noise. Although these findings have been replicated across several other studies (e.g. Jones and Paulhus, 2010; Thomaes et al., 2008), one issue with these studies is that in the real world, people do not aggress against others by delivering white noise. Rather, most adults aggress against others using the more nuanced and sophisticated form of indirect aggression, which would include assessing someone as less competent in reaction to a threat (Vaillancourt, 2005). In the present studies, aggression, defined as an act that is intended to harm or thwart another person (Mischel, 1993), was conceptualized as debased SETs (numerical and written), following the same threat to ego that was used in Bushman and Baumeister's study (1998).
Because SETs have been shown (in some studies) to be influenced by a professor's sex, the moderating role of sex was examined in Study 1. One of the hallmark characteristics of narcissism is the belief that one is special and unique. Accordingly, narcissists believe that they can “only be understood by, or should be associated with, other special or high-status people” (APA, 2000). Given this characteristic, the rank of the evaluator was manipulated in Study 2 to see if it influenced SETs. Receiving a poor grade from a lower ranking TA likely represents a greater threat to ego for a narcissist than receiving a poor grade from a professor. In Studies 1 and 2, the feedback in the negative condition was insulting, and so a "tit-for-tat" response was expected. In Study 3, the degree to which participants were insulted was manipulated using five different grading conditions. It was expected that the greater the threat to ego (i.e. the greater the insult), the poorer the SETs would be.
Across Studies 1, 2, and 3, it was hypothesized that students randomly assigned to the positive feedback condition would reward their evaluator with high/positive SETs (numeric and written), and those randomly assigned to the negative feedback condition would punish their assessor with low/negative SETs (numeric and written). This relationship was expected to be moderated by the professor's sex (lower SETs for female professors; Study 1), rank (lower SETs for TAs; Study 2), and by strength of ego-threat (lowest SETs for most negative feedback; Study 3). In all of the studies, self-esteem and narcissism were also expected to moderate the relationship between threat and aggression. Specifically, consistent with recent findings by Bushman et al. (2009), those high on self-esteem and narcissism were expected to provide the lowest SETs (i.e. highest aggression), especially in the negative feedback condition.
STUDY 1
Study 1 examined the effect of receiving a negative or positive evaluation on students’ appraisals of their professor (SETs) and whether their evaluations would be moderated by (1) the supposed sex of their professor and (2) students’ own level of self-esteem and narcissism.
METHOD
Participants
Participants included 176 students (96 women), predominantly first-year (80%) university students (Mage = 18.78, SD = 1.80), recruited from the psychology participant pool and given course credit for participation. The main faculties in which students were enrolled were Science (44.3%), Social Science (26.7%), and Business (10.2%). Most students were not psychology majors (76.1%). In terms of cultural/ethnic heritage, the largest group of participants was Caucasian (44.3%), followed by Asian (21%) and South Asian (15.3%). Participants were excluded if English was not the language in which they most often communicated.
Measures
Self-esteem was measured using Rosenberg's (1965) well-validated scale that asks participants to rate ten items on a 4-point scale with responses averaged to create a global self-esteem score (α = .85). Sample items include “I feel that I have a number of good qualities” and “I am able to do things as well as most people.”
Narcissism was measured using the Narcissistic Personality Inventory (NPI; Raskin and Terry, 1988), which is a 40-item, well-validated, binary true–false format scale that is averaged into a total narcissism score (α = .79). Sample items include “I want to amount to something in the eyes of the world” and “Everybody likes to hear my stories.”
Students’ Numeric Evaluations of Teaching (Numeric SETs) were assessed by having participants rate their evaluator's (1) marking ability, (2) fairness, (3) helpfulness, and (4) general competence along a scale from 1 (very poor/unfair/unhelpful/incompetent) to 7 (excellent/very fair/very helpful/very competent), following SETs standards used at the university from which students were recruited. Participants were also asked to provide written feedback to their evaluator. A marking ability composite was created by averaging the four evaluation questions into a mean score, with higher scores indicating a more positive appraisal of the professor's competence (high SETs; low aggression) and lower scores indicating a more negative appraisal of the professor's competence (low SETs; high aggression). The SETs composite (α = .91) was supported by factor analytic results, which showed a one-factor solution accounting for 79.46% of variance.
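As a rough illustration of how a composite of this kind and its internal consistency could be computed, the following Python sketch averages four 1 to 7 ratings per participant and computes Cronbach's α. The ratings are made-up values for illustration only (not study data), and the function names are hypothetical; the original analysis was presumably run in a statistics package.

```python
from statistics import mean, variance


def composite_score(ratings):
    """Average one participant's item ratings into a single SETs composite."""
    return mean(ratings)


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of participants, each a list of k item ratings."""
    k = len(rows[0])
    # Sample variance of each item (column) across participants.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the participants' summed scores.
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings (marking ability, fairness, helpfulness, competence).
example = [
    [6, 7, 6, 6],
    [2, 3, 2, 1],
    [5, 5, 4, 5],
    [3, 2, 3, 3],
    [7, 6, 7, 7],
]
composites = [composite_score(row) for row in example]
alpha = cronbach_alpha(example)
```

Higher composite values correspond to a more favorable appraisal (lower aggression), matching the scoring direction described above.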
Students’ Written Evaluations of Teaching (Written SETs). Across all studies, participants’ written feedback to their evaluator was analyzed by having two independent raters, blind to study design and condition, code the written responses as either positive or negative. For example, the following were rated as positive:
“I think you marked this very well!! Thanks! No suggestions? Wow! I should be an English teacher. Haha! :) Thxs again!” (Study 1; Female, positive feedback condition)
“The marker not only graded my paper but he/she provided comments which made me proud of what I had written and the points, I had made. If only we had more profs like him/her students would be encouraged more to do better at university.” (Study 2; Female, positive feedback condition)
“Overall I was pleased knowing I could write well, I think it would be a little more helpful if I had some more feedback as to ways I could improve, in this case, my organization.” (Study 1; Female, positive feedback condition)
And these examples were rated as negative:
“Hope that I never find you!” (Study 2; Male, negative feedback condition)
“….The other problem I felt this T.A had is the final comment he left on the evaluation sheet. “Worst essays I have ever read!”… is inappropriate. I didn't give a flying fuck whether it was the best essay you have ever read. I'm not looking for the T.A to share my work in the New York Times. I'm simply asking him to Unbiasedly evaluate my style of writing by giving me advice on how to fix it. By telling me “I suck” doesn't help… Jerks!!! Maybe this T.A needs to think about helping students and not giving personal comments. “This T.A is the worst T.A. I have ever had!” Jerks!” (Study 2; Female, negative feedback condition)
For discrepant scores, a third rater was used. Cohen's κ was .76 for Study 1, indicating good agreement between raters.
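For readers unfamiliar with the agreement statistic, here is a minimal Python sketch of Cohen's κ for two raters assigning positive/negative codes. The codes below are invented for illustration and the function name is hypothetical; this is the textbook formula, not necessarily the software routine used in the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(rater_a)
    # Proportion of items on which the raters gave the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance from each rater's marginal proportions.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical code throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical codes from two raters for four written responses.
codes_a = ["pos", "pos", "neg", "neg"]
codes_b = ["pos", "neg", "neg", "neg"]
kappa = cohen_kappa(codes_a, codes_b)
```

In this toy case the raters agree on three of four responses, but half of that agreement is expected by chance, so κ is well below the raw agreement rate.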
Procedure
Participants were told they would be taking part in two related studies. The first study involved investigating the evaluation skills of professors and the second study involved examining whether “people with different personalities” held different opinions about euthanasia. This cover story was used so that students did not become suspicious¹ about why their personal information was being collected and to avoid unduly influencing SETs ratings.
Participants were tested individually in the laboratory. Informed consent was obtained after they were told the purpose of the study and before they completed any of the tasks or measures. Participants were given up to 20 min to compose a one-paragraph essay on a laptop computer that was provided to them. The topic of euthanasia was chosen as it represented a topic that people could write about without having much prior knowledge. A definition of euthanasia was given and participants were told not to worry about spelling or to use the spell check program if they wanted. Participants were told their essays would be numerically evaluated (from A+ to F; following marking conventions of their university) on organization, originality, writing style, clarity of expression, persuasiveness of arguments, and overall quality.
Once participants completed their essays, they were told the essays would then be evaluated (graded) by the professor. Participants were randomized into four conditions. In the positive feedback condition (n = 20 men; n = 24 women), they were given standardized positive numeric feedback on their essays that ranged from B+ on organization to A+ on originality, and were given the written comment “No suggestions, great essay!” In the negative feedback condition (n = 20 men; n = 24 women), participants were given standardized negative numeric feedback on their essays that ranged from a C- on originality and persuasiveness of arguments to a D on all other evaluation criteria. They were also given the written comment “This is one of the worst essays I have ever read!” The lowest grade was given for organization to make the feedback more believable.² In both conditions, comments were added to each essay at around the same location to convey the impression that the essay was indeed read by the professor (it was not). The sex of the professor was mentioned several times to the participants before they wrote their essays, and again before they provided their feedback. To ensure the written feedback did not influence students’ impression of their professor's sex, the handwriting of the evaluator was assessed by independent raters who were asked to indicate on a 3-point scale if the handwriting was feminine, masculine, or gender-neutral. All ten respondents indicated that the handwriting in the female professor condition was feminine, and eight of ten respondents indicated that the handwriting was masculine for the male professor condition (the other two indicated that the handwriting was gender-neutral).
During this marking time (10 min), participants completed the measures on self-esteem, narcissism, and a demographic background questionnaire. Once the essays were “marked,” each participant's essay was returned to that participant in an envelope and they were asked to provide feedback to their evaluator about his/her marking ability, fairness, helpfulness, and general competence. Written feedback was also solicited.
After completion of the evaluation, participants were asked to place their SETs and their questionnaires into the envelope. They were then debriefed about the true purpose of the study, and later dismissed.
RESULTS AND DISCUSSION
Numeric SETs
The means, standard deviations, and correlations are presented separately for each feedback condition in Table I. The effect size for SETs by feedback condition was large (d = 1.51; see Figure 1). Self-esteem and narcissism were correlated positively in both the positive (r = .34) and negative (r = .32) feedback conditions, consistent with the magnitude reported in Bushman et al. (2009). Narcissism was negatively correlated with SETs at r = –.26 (higher narcissism was associated with lower SETs) in the negative condition, a finding consistent with other studies of aggression (e.g. Bushman and Baumeister, 1998; Bushman et al., 2009).
| Variable | Correlation with self-esteem | Correlation with narcissism | Correlation with SETs | Mean | SD |
|---|---|---|---|---|---|
| Study 1 professor's sex | | | | | |
| Positive feedback (N = 88) | | | | | |
| Self-esteem | 1.00 | | | 2.13 | 0.38 |
| Narcissism | .34* | 1.00 | | 0.42 | 0.14 |
| Students’ evaluations of teaching | .21 | .09 | 1.00 | 5.20a | 1.35 |
| Negative feedback (N = 88) | | | | | |
| Self-esteem | 1.00 | | | 2.15 | 0.32 |
| Narcissism | .32* | 1.00 | | 0.41 | 0.16 |
| Students’ evaluations of teaching | –.18 | –.26** | 1.00 | 3.10a | 1.42 |
| Study 2 professor's rank | | | | | |
| Positive feedback (N = 80) | | | | | |
| Self-esteem | 1.00 | | | 2.12 | 0.35 |
| Narcissism | .41* | 1.00 | | 0.41 | 0.14 |
| Students’ evaluations of teaching | .26** | .00 | 1.00 | 5.13b | 1.37 |
| Negative feedback (N = 80) | | | | | |
| Self-esteem | 1.00 | | | 2.14 | 0.37 |
| Narcissism | .36* | 1.00 | | 0.40 | 0.15 |
| Students’ evaluations of teaching | –.07 | –.07 | 1.00 | 3.06b | 1.24 |
Figure 1. Mean differences, post-hoc comparisons, and effect sizes by feedback condition: Studies 1, 2, and 3. Notes. Higher aggression = lower students’ evaluations of teaching; Study 1 d = 1.51; Study 2 d = 1.59; Study 3 partial η² = .26, Student Newman–Keuls post-hoc comparison 1 < 2, 3 < 4, 5.
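The reported effect sizes can be roughly checked against the tabled means and standard deviations. The sketch below assumes the conventional pooled-SD formula for Cohen's d with equal group sizes; whether the original analysis used exactly this formula is an assumption, and the function name is hypothetical.

```python
from math import sqrt


def cohens_d(mean_1, sd_1, mean_2, sd_2):
    """Cohen's d using a pooled SD, assuming equal group sizes."""
    pooled_sd = sqrt((sd_1 ** 2 + sd_2 ** 2) / 2)
    return (mean_1 - mean_2) / pooled_sd


# SETs means and SDs (positive vs. negative feedback) from the table above.
d_study1 = cohens_d(5.20, 1.35, 3.10, 1.42)
d_study2 = cohens_d(5.13, 1.37, 3.06, 1.24)
```

Under these assumptions both values come out near 1.5 to 1.6, in line with the reported d = 1.51 and d = 1.59; small discrepancies would be expected from rounding in the tabled values.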

Following procedures used by Bushman et al. (2009), hierarchical multiple regression analysis was conducted to examine predictors of SETs (i.e. aggression). Predictor variables were centered to increase interpretability. The positive feedback condition and male professor conditions were coded as –1 and negative feedback condition and female professor conditions were coded as 1. Statistically significant moderating effects were examined by comparing simple slopes as per recommendations by Aiken and West (1991). Because aggression has been shown to differ as a function of sex (men tend to be more aggressive than women, a difference that tends to be attenuated as a function of provocation; Bettencourt & Miller, 1996), the sex of participants was controlled for in Step 1 of the analysis (dummy coded as –1 for males and 1 for females).
Results indicated a statistically significant effect for feedback condition, b = 1.06, t(164) = 10.58, P < .0001. Participants in the negative feedback condition provided lower SETs than those in the positive feedback condition.
An interaction between self-esteem × feedback condition, b = 0.77, t(164) = 2.44, P < .02, was found and further examined by testing the simple slopes. Results indicated that at both high (b = 1.33, P < .0001) and low (b = 0.79, P < .0001) levels of self-esteem, the slopes were different from zero. Specifically, in the negative feedback condition, high self-esteem was associated with the lowest SETs, and in the positive feedback condition, high self-esteem was associated with the highest SETs.
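The simple slopes follow from the regression coefficients in the usual Aiken and West (1991) manner: the slope for feedback condition at a given level of the centered moderator is the condition coefficient plus the interaction coefficient times that moderator value. The sketch below uses the reported coefficients; the self-esteem SD of 0.35 is an assumption (roughly the average of the two tabled condition SDs, 0.38 and 0.32), and the function name is hypothetical.

```python
def simple_slope(b_condition, b_interaction, moderator_value):
    """Slope of the condition effect at a given value of a centered moderator."""
    return b_condition + b_interaction * moderator_value


B_FEEDBACK = 1.06       # reported coefficient for feedback condition
B_INTERACTION = 0.77    # reported self-esteem x feedback coefficient
SD_SELF_ESTEEM = 0.35   # assumption: approximate overall SD of self-esteem

# Slopes at one SD above and one SD below the (centered) self-esteem mean.
slope_high = simple_slope(B_FEEDBACK, B_INTERACTION, +SD_SELF_ESTEEM)
slope_low = simple_slope(B_FEEDBACK, B_INTERACTION, -SD_SELF_ESTEEM)
```

With these inputs the two slopes land at approximately 1.33 and 0.79, which agrees with the values reported for high and low self-esteem.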
There is debate in the literature about whether it is high or low self-esteem that is associated with aggression. Baumeister, Smart, and Boden (1996) have argued that high self-esteem contributes to aggression (see also Bushman and Baumeister, 1998; Bushman et al., 2009), whereas Donnellan, Trzesniewski, Robins, Moffitt, and Caspi (2005) have argued that in fact it is low self-esteem that predicts aggression. These results support the view that high self-esteem is related to aggression. However, it is worthwhile to mention that in the context of receiving a high grade and being praised, high self-esteem was associated with the highest SETs. This finding is consistent with studies showing that self-enhancing and self-protective biases are more pronounced in people with high self-esteem than those with low self-esteem. In particular, when people with high self-esteem are given positive feedback, they tend to think the evaluation is more credible and that the evaluator is more competent than people with low self-esteem (e.g. Shrauger & Lund, 1975; Swann, Griffin, Predmore, & Gaines, 1987). In the present study, participants with high self-esteem rated their professor the highest when the feedback was positive. By rating the professor highly they were, in essence, suggesting that the positive grade and feedback they received was indeed valid and was given by a competent individual who would be skilled enough to recognize their talent. The caveat of course is that the essay was never really evaluated.
Donnellan et al. (2005) pointed out that the small and fluctuating effect sizes for self-esteem and aggression likely indicate the “presence of moderator variables” (p. 334). In Study 1, a self-esteem × narcissism interaction was also found, b = –6.70, t(164) = –3.36, P < .001, consistent with Donnellan et al.'s point. The simple slope analysis indicated that the lowest SETs scores were associated with high self-esteem and high narcissism (b = –1.24, P < .02). The simple slopes relating narcissism and SETs (aggression) at high and low levels of self-esteem were also tested by feedback condition, following procedures by Bushman et al. (2009). Results indicated that, in the negative feedback condition, high self-esteem combined with high narcissism was associated with the lowest SETs (b = −3.62, P < .004). This finding replicates findings reported by Bushman et al. using low SETs as a measure of aggression rather than a noise blast.
It was hypothesized that the sex of the professor would also moderate the relationship between feedback condition and SETs. Specifically, it was expected that female professors would be rated lower than male professors, and in particular, in the negative feedback condition. Results indicated no moderating effect of professors’ sex. Although several researchers have reported that students tend to rate their female professors lower than their male professors, the evidence is far from conclusive—many have also reported no sex differences (see review by Laube et al., 2007).
Written SETs
Most (92.6%) participants provided written feedback to their professor. Because the written SETs were coded as positive or negative, a hierarchical logistic regression was used. The covariates were participants’ sex (female = 1, male = 0) entered at Step 1, professor's sex (female = 1, male = 0), feedback condition (positive feedback = 0; negative feedback = 1), self-esteem, and narcissism (centered) entered at Step 2, and the interaction between self-esteem and narcissism entered at Step 3. The full model containing all predictors was statistically significant, χ²(df = 6) = 49.53, P < .0001. The model explained between 26.6% (Cox and Snell R²) and 35.5% (Nagelkerke R²) of the variance in SETs (aggression outcome). The Wald criterion demonstrated that only feedback condition was statistically significant (Wald = 35.04, β = 2.29, P < .0001; Exp(β) = 9.87). Specifically, the odds of providing a negative written response to the evaluator were almost ten times higher for participants in the negative feedback condition than for those in the positive feedback condition.
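The "almost ten times" figure is the odds ratio obtained by exponentiating the logistic regression coefficient. A minimal Python check, using only the reported β (the function name is hypothetical):

```python
from math import exp


def odds_ratio(beta):
    """Convert a logistic regression coefficient into an odds ratio."""
    return exp(beta)


# Reported coefficient for feedback condition (negative = 1, positive = 0).
or_feedback = odds_ratio(2.29)
```

The result is approximately 9.87, matching the reported Exp(β). Note that this is a ratio of odds, which is not the same as a ratio of probabilities when the outcome is common.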
STUDY 2
Study 2 examined the effect of receiving a negative or positive numeric and written evaluation on students’ appraisal of their evaluator and whether their evaluations would be moderated by (1) the supposed rank of their evaluator (TA vs. professor), and (2) students’ level of self-esteem and narcissism.
METHOD
Participants
Participants included 160 students (80 women), predominantly first-year (85%) university students (Mage = 19.16, SD = 3.18), recruited from the psychology participant pool and given course credit for their psychology course. The main faculties in which students were enrolled were Science (43.4%), Social Science (25.7%), and Business (12.5%), and 80% were not psychology majors. Half of the students were Caucasian (50%), followed by South Asian (20.3%) and Asian (15.2%).
Measures
Rosenberg's self-esteem measure (α = .86) and the NPI (α = .76) were again used in this study. SETs were compiled in the same manner as in Study 1 (α = .91; one-factor solution accounting for 80.33% of variance; κ = .85 for written SETs).
Procedure
Procedures were similar to those described in Study 1; however, in this study, participants were cued about the rank of their evaluator. Specifically, participants were randomly assigned to one of four conditions: (1) positive feedback/professor (n = 20 men; n = 20 women), (2) negative feedback/professor (n = 20 men; n = 20 women), (3) positive feedback/TA (n = 20 men; n = 20 women), (4) negative feedback/TA (n = 20 men; n = 20 women). Participants were told on several occasions that they were being evaluated by either a professor or a graduate TA. The sex of their evaluator was never mentioned to the participant, and caution was taken to ensure the handwriting of the evaluator did not inadvertently lead participants to think about the sex of their evaluator. The handwriting of the evaluator was assessed by ten independent raters who were asked to indicate on a 3-point scale if the handwriting was feminine, masculine, or gender-neutral. All ten responded gender-neutral.
RESULTS AND DISCUSSION
Numeric SETs
The means, standard deviations, and correlations are presented separately for each feedback condition in Table II. The effect size for SETs by feedback condition was again large (d = 1.59; see Figure 1). Self-esteem and narcissism were again positively correlated in the positive (r = .41) and negative (r = .36) feedback conditions. In this study, narcissism was not related to SETs; however, self-esteem was positively correlated with SETs in the positive feedback condition (r = .26).
| Measure | Negative numeric with insult (1) Mean (SD) | Negative numeric only (2) Mean (SD) | Negative numeric with praise (3) Mean (SD) | Positive numeric only (4) Mean (SD) | Positive numeric with praise (5) Mean (SD) | SNK Post-Hoc Comparison |
|---|---|---|---|---|---|---|
| Main effect | | | | | | |
| Numeric SETs | 2.94 (1.21) | 3.94 (1.24) | 4.03 (1.13) | 4.86 (1.70) | 5.17 (1.41) | 1 < 2, 3 < 4, 5 |
| Manipulation validation | | | | | | |
| How would you describe your experience in receiving the feedback on your essay? | | | | | | |
| Negative (1) to positive (7) | 1.81 (1.42) | 2.38 (1.26) | 3.50 (1.26) | 6.56 (0.51) | 6.69 (0.79) | 1, 2 < 3 < 4, 5 |
| Threatening (1) to nonthreatening (7) | 3.00 (1.83) | 3.88 (1.67) | 5.25 (1.48) | 6.56 (0.73) | 6.63 (1.02) | 1, 2 < 3 < 4, 5 |
| Malicious (1) to nonmalicious (7) | 2.44 (1.26) | 4.63 (1.69) | 5.12 (1.50) | 6.69 (0.79) | 6.75 (0.77) | 1 < 2, 3 < 4, 5 |
| Unfair (1) to fair (7) | 3.00 (1.77) | 3.63 (1.50) | 4.31 (1.40) | 6.25 (1.12) | 5.94 (1.52) | 1, 2 < 3 < 4, 5 |
| How did the feedback make you feel? | | | | | | |
| Angry (1) to calm (7) | 2.38 (1.09) | 4.00 (1.97) | 3.87 (1.23) | 6.25 (0.86) | 6.75 (0.58) | 1, 2 < 3 < 4, 5 |
| Unhappy (1) to happy (7) | 2.19 (1.47) | 2.88 (1.15) | 3.19 (0.83) | 6.44 (0.73) | 6.62 (0.81) | 1, 2 < 3 < 4, 5 |
| Embarrassed (1) to proud (7) | 2.50 (1.21) | 3.00 (1.26) | 2.88 (0.96) | 6.10 (1.06) | 6.10 (1.29) | 1, 2, 3 < 4, 5 |
| Stupid (1) to smart (7) | 2.94 (1.57) | 3.19 (1.64) | 3.31 (1.64) | 5.81 (1.05) | 6.25 (0.86) | 1, 2, 3 < 4, 5 |
| Discouraged (1) to encouraged (7) | 3.31 (2.06) | 3.56 (1.86) | 3.50 (1.32) | 6.13 (0.72) | 6.44 (0.89) | 1, 2, 3 < 4, 5 |
Hierarchical multiple regression analysis was conducted using the same analytic procedures described in Study 1, including controlling for sex of participants at Step 1. Results indicated a statistically significant effect of feedback condition, b = 1.04, t(148) = 10.02, P < .0001. As in Study 1, individuals in the negative feedback condition provided lower SETs than those in the positive feedback condition, consistent with results obtained using noise blasts as an aggression outcome (e.g. Bushman & Baumeister, 1998; Bushman et al., 2009). These results indicate that poor SETs are likely a revenge response, providing experimental evidence in favor of the grading leniency hypothesis.
It was hypothesized that self-esteem and narcissism, as well as the rank of the evaluator, would moderate the relationship between feedback condition and SETs. Results indicated no moderating effects of rank and narcissism. It is likely that rank did not moderate the relationship because the university from which students were recruited relies heavily on the use of TAs for marking papers and grading exams. Accordingly, students may not think much about the status difference between TAs and professors in that both wield considerable power in assigning grades.
The fact that narcissism did not emerge as a moderator is surprising, given the results of Study 1, and considering that narcissism has been shown to be a more consistent predictor of direct aggression than high self-esteem in the context of being slighted or rejected (e.g. Bushman & Baumeister, 1998). It is likely that the study was underpowered to detect a moderating effect of narcissism (i.e. the magnitude of the effect for feedback was very large, d = 1.59). Nevertheless, although the effect was nonsignificant, it was in the predicted direction (narcissism was negatively associated with SETs, as it was in Study 1).
A self-esteem × feedback condition interaction was found, b = 0.66, t(148) = 2.11, P < .04. Consistent with Study 1, the slopes were different from zero at both high (b = 1.28, P < .0001) and low (b = 0.80, P < .0001) levels of self-esteem. In the negative feedback condition, high self-esteem was associated with the lowest SETs, and in the positive feedback condition, high self-esteem was associated with the highest SETs.
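The simple slopes reported for this interaction can be related to the regression coefficients with a short calculation. The Python sketch below assumes the conventional approach for probing interactions with a centered moderator (e.g. as described in the Aiken and West text listed in the references): the slope for feedback at a given level of self-esteem equals the feedback coefficient plus the interaction coefficient times that level. It further assumes that "high" and "low" mean one standard deviation above and below the mean and that b = 1.04 is the feedback effect at mean self-esteem. Neither assumption is confirmed in this section, so the implied standard deviation is illustrative only.

```python
def simple_slope(b_feedback, b_interaction, moderator_value):
    """Slope of feedback condition at a given (centered) moderator value."""
    return b_feedback + b_interaction * moderator_value


# Coefficients reported in the text.
B_FEEDBACK = 1.04     # effect of feedback condition
B_INTERACTION = 0.66  # self-esteem x feedback condition
B_HIGH = 1.28         # reported slope at high self-esteem
B_LOW = 0.80          # reported slope at low self-esteem

# ASSUMPTION: high/low are +1 SD and -1 SD on centered self-esteem.
# The distance between the two reported slopes then implies the SD.
implied_sd = (B_HIGH - B_LOW) / (2 * B_INTERACTION)

slope_high = simple_slope(B_FEEDBACK, B_INTERACTION, +implied_sd)
slope_low = simple_slope(B_FEEDBACK, B_INTERACTION, -implied_sd)

print(f"Implied SD of self-esteem: {implied_sd:.2f}")
print(f"Slope at +1 SD: {slope_high:.2f}; slope at -1 SD: {slope_low:.2f}")
```

Under these assumptions the two reported slopes sit symmetrically around the feedback coefficient (their average is 1.04), which is the pattern expected when the moderator is centered.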
Written SETs
Most (91.9%) participants provided written feedback to their evaluator. Hierarchical logistic regression was again used to predict the valence of the written feedback (positive or negative). The covariates were participants’ sex (female = 1, male = 0) entered at Step 1, evaluator's status (professor = 1, TA = 0), feedback condition (positive feedback = 0; negative feedback = 1), self-esteem, and narcissism (centered) entered at Step 2, and the interaction between self-esteem and narcissism entered at Step 3. The full model containing all predictors was statistically significant, χ2 (df = 6) = 43.06, P < .0001. The model explained between 25.4 (Cox and Snell R2) and 33.9% of the variance (Nagelkerke R2) in SETs (aggression outcome). The Wald criterion demonstrated that only the feedback condition was statistically significant (Wald = 33.00, β = 2.28, P < .0001; Exp(β) = 9.77). Specifically, participants in the negative feedback condition were close to ten times more likely to provide a negative written response to their evaluator than those in the positive feedback condition.
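The "close to ten times more likely" statement follows from exponentiating the logistic regression coefficient, since Exp(β) is the odds ratio. A minimal Python sketch of that conversion, using the coefficient reported above (the function name `odds_ratio` is only illustrative):

```python
import math


def odds_ratio(beta):
    """Convert a logistic regression coefficient (log-odds) to an odds ratio."""
    return math.exp(beta)


BETA_FEEDBACK = 2.28  # coefficient for feedback condition reported in the text

or_feedback = odds_ratio(BETA_FEEDBACK)
print(f"Odds ratio for negative vs. positive feedback: {or_feedback:.2f}")
```

With the rounded coefficient this gives approximately 9.78, in line with the reported Exp(β) = 9.77; the small difference presumably reflects rounding of β. Strictly speaking, the value is a ratio of odds rather than of probabilities.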
STUDY 3
Given the possibility that students’ negative appraisals in Studies 1 and 2 were influenced by the particularly harsh written statement “This is one of the worst essays I have ever read!,” Study 3 examined the effect of receiving a poor or good numeric evaluation, coupled with a less offensive written appraisal in the negative feedback condition, or no feedback at all in the positive and negative feedback conditions.
METHOD
Participants
Participants were 150 university students (80 women), predominantly in their first year (87%; Mage = 18.81, SD = 3.02). Students were recruited from the psychology participant pool and given course credit for participation. The main faculties in which students were enrolled were Science (46%), Social Science (28.7%), and Business (11.3%). Most students were not psychology majors (77.2%), and the largest group of students was Caucasian (46.7%), followed by Asian (18.7%) and South Asian (17.3%).
Measures
Rosenberg's self-esteem measure (α = .86) and the NPI (α = .77) were again used in this study. SETs were compiled in the same manner as in Studies 1 and 2 (α = .86; one-factor solution explaining 77.23% of the variance; κ = .73 for written SETs).
Procedure
Participants were randomized into one of five feedback conditions, following the numeric marking convention and procedures described in Study 1. In this study, the written feedback was manipulated. Specifically, the five feedback conditions were (1) negative numeric with insult (“This is one of the worst essays I have ever read!”), plus negative comments on side margin (see Study 1; n = 15 men; n = 15 women), (2) negative numeric only (no comments given; n = 13 men; n = 17 women), (3) negative numeric with praise (“Good start, but more work is needed to clearly explain your ideas”), plus positive comments on side margin (see Study 1; n = 15 men; n = 15 women), (4) positive numeric only (no comments given; n = 15 men; n = 15 women), and (5) positive numeric with praise (“No suggestions, great essay!”), plus positive comments on side margin (see Study 1; n = 15 men; n = 15 women).
The feedback for condition 3 (negative numeric with praise) was developed with the input from 157 third-year students (76% women; Mage = 20.56, SD = 1.39), who were asked to indicate which statement they thought was “the most effective at telling a student, in a nice way, that he/she has composed a poorly written essay.” The four options were (1) “Requires more work to create a more effective structure and organization,” (2) “Good start, but more work is needed to clearly explain your ideas,” (3) “You are getting there but some effort is needed to further develop this essay,” and (4) “Your point would have been made clearer with more detailed arguments.” Most students (52%) endorsed option number 2 and thus it was used in this study.
RESULTS AND DISCUSSION
Manipulation validation
The five levels of feedback were examined to verify the impact of the ego threat manipulation. Participants were 80 university students (50 women), predominantly in their first year (66%; Mage = 18.74, SD = 1.30). Although participants were recruited from the psychology participant pool and were given course credit for their psychology course, most (77.5%) were not psychology majors. The main faculties in which students were enrolled were Science (50%), Social Science (18.8%), and Business (7.5%). The largest group of students was Caucasian (42.5%), followed by South Asian (21.3%) and Asian (16.3%).
Following procedures used in Studies 1 and 2, participants were randomized into one of five feedback conditions (see procedures above); however, in this phase of Study 3, participants did not proceed to the SETs phase. Instead, participants completed a survey about their experience and feelings regarding the feedback they received. Specifically, participants were asked to describe how negative/positive, threatening/nonthreatening, malicious/nonmalicious, and unfair/fair their experience was, and how they felt in terms of being angry/calm, unhappy/happy, embarrassed/proud, stupid/smart, and discouraged/encouraged. These questions were presented to students in a counter-balanced fashion along a 7-point rating scale ranging from 1 to 7.
Given the number of validation questions asked, two separate multivariate analyses of variance (MANOVAs) were conducted to control for Type I error, with type of feedback as the IV (five levels) and students’ experiences or feelings as the DVs. Results confirmed that the more negative the feedback participants received, the more they found it to be negative, threatening, malicious, and unfair (Wilks’ Λ = .14, F(16, 220.60) = 12.17, P < .0001), and the more they felt angry, unhappy, embarrassed, stupid, and discouraged (Wilks’ Λ = .14, F(20, 236.43) = 8.89, P < .0001). The means, standard deviations, and post-hoc comparisons are presented in Table II. Interestingly, receiving a low numeric evaluation that was coupled with an insult did not always engender more negative feelings or more negative impressions than receiving a low numeric evaluation alone.
Numeric SETs
Because the feedback predictor had five levels, a multiple variable analysis of variance (ANOVA) conducted in a backward hierarchical fashion was used (see footnote 3). As in Studies 1 and 2, participants’ sex was controlled for and SETs were predicted from centered scores of self-esteem, narcissism, and type of feedback. Results indicated that only feedback condition was a statistically significant predictor of SETs, F(4, 141) = 12.41, P < .0001, partial η2 = .26. The means, standard deviations, and post-hoc comparisons are presented in Table II and mean differences are represented in Figure 1. As seen in Figure 1, participants in the negative numeric with insult condition rated their professor the lowest of all five feedback conditions and participants in the positive numeric with praise condition rated their professor the highest. Post-hoc analyses revealed no difference in SETs between those in the negative numeric only and negative numeric with praise conditions, and no difference between those in the positive numeric only and positive numeric with praise conditions.
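The reported partial η2 can be checked against the F statistic and its degrees of freedom using the standard identity partial η2 = (F × df_effect) / (F × df_effect + df_error). The Python sketch below applies that identity to the values reported above; the function name is only illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    explained = f_value * df_effect
    return explained / (explained + df_error)


# Values reported for the feedback condition effect in Study 3.
eta_p2 = partial_eta_squared(f_value=12.41, df_effect=4, df_error=141)
print(f"partial eta squared = {eta_p2:.2f}")
```

The result rounds to .26, which agrees with the value given in the text.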
Taken together, these findings suggest that buffering the blow to ego (poor grade) with praise did not change the way students rated their professor. All things considered, it is grades that appear on students’ transcripts and not comments and therefore the most salient (and likely ego threatening) aspect of the feedback for this population seems to be the grade received, consistent with the grading leniency hypothesis.
Written SETs
Most (98%) participants provided written feedback to their evaluator. Hierarchical logistic regression was again used to predict the valence of the written feedback (positive or negative). The covariates were participants’ sex (female = 1, male = 0) entered at Step 1, feedback condition (reference category = positive numeric with praise), self-esteem, and narcissism (centered) entered at Step 2, and the interaction between self-esteem and narcissism entered at Step 3. The full model containing all predictors was statistically significant, χ2(df = 8) = 31.49, P < .0001. The model explained between 19.6 (Cox and Snell R2) and 26.2% of the variance (Nagelkerke R2) in SETs (aggression outcome). The Wald criterion demonstrated that only threat (i.e. feedback condition) was statistically significant (Wald = 23.76, P < .0001). Specifically, compared to the positive numeric with praise condition, participants in the negative numeric with praise condition were 4.27 times more likely to rate the professor negatively (β = 1.51, P < .02), participants in the negative numeric only condition were 6.73 times more likely to rate the professor negatively (β = 1.91, P < .02), and those in the negative numeric with insult condition were 19.3 times more likely to rate the professor negatively (β = 2.95, P < .0001). There was no difference between those who were given a good grade with and without praise. This “dose effect” was similar to the one found for numeric SETs.
These results suggest that the "tit-for-tat" effect on SETs was not necessarily a function of being insulted. Rather, low SETs were related to receiving a low grade. Indeed, even though the lowest SETs were given to "professors" who gave a low grade and insulted the participant, a low grade coupled with an encouraging comment or with no comment still engendered a negative reaction. Moreover, participants’ SETs were not differentiated on the basis of whether or not a high grade was tied to a positive written comment. These results seem to imply that comments (insult or praise) matter less than the grade received.
GENERAL DISCUSSION
The goal of this research was to examine the grading leniency hypothesis using an experimental design. It was hypothesized that poor teaching evaluations are a form of aggressive behavior motivated by revenge for poor appraisals of the student by the professor, and accordingly, it was expected that students would reward professors for good grades with high SETs and would punish professors for bad grades with low SETs. It was also hypothesized that the relationship between grade feedback and SETs would be moderated by professor's sex (poorer SETs for female professors), rank (poorer SETs for TAs), strength of ego-threat (poorest SETs for most negative feedback), and by participants’ level of self-esteem and narcissism.
Experimental studies on aggression provide compelling evidence that students do aggress against others who have slighted them by providing negative feedback about their writing competence (e.g. Bushman & Baumeister, 1998; Bushman et al., 2009; Jones & Paulhus, 2010; Konrath et al., 2006). Oddly, researchers examining the real world affront of receiving a poor grade have argued that low SETs are not a form of revenge (e.g. Arnold, 2009; Boysen, 2008). Results of the present studies diverge from this point of view. Specifically, results indicated that students did seem to reward professors for good grades with high SETs and did seem to punish them for low grades with low SETs. The effect sizes across all studies were very large. Moreover, the means and standard deviations were almost identical across feedback conditions in all three studies even though different factors were manipulated in each study (see Tables I and II). In addition to providing favorable or unfavorable numeric SETs, students also provided professors with flattering (e.g. “I wish all evaluators were like this one.”) or harsh (e.g. “The evaluator is a complete dumb ass.”) written feedback, based on the grade they received. In fact, students were between 10 and 19 times more likely to provide negative feedback to their evaluator when they were in the negative feedback condition than in the positive feedback condition.
Aggression is defined as an act that is intended to harm or thwart another person (Mischel, 1993). Providing low SETs/negative comments to professors on the basis of receiving a poor grade is a form of aggression (i.e. revenge) as SETs are used by universities for decisions about tenure, promotion, and annual raises. Many students are aware of this fact (Marlin, 1987) and those who are not, likely implicitly know that the professor will be hurt by receiving negative feedback.
The fact that self-esteem (Studies 1 and 2) and narcissism (Study 1) moderated the relationship between feedback and SETs further emphasizes the point that low SETs are a form of revenge. In Study 1, students high on self-esteem and narcissism provided the lowest SETs when their ego was threatened, a finding that replicates the results from numerous studies examining narcissism and/or self-esteem on direct aggression (e.g. Barry et al., 2006; Bushman et al., 2009; Jones & Paulhus, 2010; Konrath et al., 2006; Stucke & Sporer, 2002; Thomaes et al., 2008; Twenge & Campbell, 2003). Results from Studies 1 and 2 also showed that high self-esteem was associated with the lowest SETs in the negative feedback condition but also the highest SETs in the positive feedback condition. As mentioned previously, there is debate in the literature about whether high or low self-esteem is related to aggression. Experimental studies tend to find support for the high self-esteem–aggression hypothesis (e.g. Bushman et al., 2009) whereas nonexperimental studies have shown that low self-esteem is related to aggression (e.g. Donnellan et al., 2005). The results of the present studies suggest that high self-esteem is problematic when negative feedback is given (i.e. a threat to ego). However, when praise is given, people with high self-esteem seem to be munificent with their evaluations, consistent with studies showing that self-enhancing biases are more distinct in people with high self-esteem than in those with low self-esteem (Shrauger & Lund, 1975; Swann et al., 1987; see also Blaine & Crocker, 1993).
As Crocker, Karpinski, Quinn, & Chase (2003) point out, for university students, “few events are as important as receiving grades for their course work” (p. 507). Students’ self-worth is strongly linked to their academic performance: poor grades are associated with significant drops in self-esteem (Crocker et al., 2003). The importance of doing well academically is highlighted by results of Study 3. Although participants in the negative numeric with insult condition rated their professor the lowest, there was no difference in SETs between those in the negative numeric only and negative numeric with praise conditions, as well as no difference between those in the positive numeric only and the positive numeric with praise conditions. This finding suggests that cushioning the impact of a poor grade with praise did not seem to change the way students rated their professor. Rather, it appears that students were much more focused on the grades they received, and how those grades were justified by the instructor seemed inconsequential. For example, a participant from Study 1 wrote the following: “The evaluator seemed to not have much feedback about the essay. Also, it was very unclear as to what specific sections of the essay the evaluator was referring to in his comments. Rather than simply say my essay was great (I already know this), specific improvements that could be made would be helpful. His marking style seems rather arbitrary if he grades by my organization with a B+ but then offers no suggestions for improvement.” It is noteworthy that this student focused on the one grade given in the positive feedback condition that was not an A. This type of comment was not unique. For example, another male participant from Study 1 wrote “While I did appreciate my grade, I would've liked an explanation of my B+ for organization + a little blurb stating why he though what he did. Overall, just suggest something if you're not going to give someone all A's.”
It was hypothesized that the evaluator's sex and rank would moderate the relationship between type of feedback and SETs; however, no moderating effect was found. Results on the impact of the professor's sex on SETs are mixed, with some studies showing that women are rated lower than men and others finding no sex differences (see review by Laube et al., 2007). Given these varied results, it is perhaps not surprising that no relationship was found. It could be that this null finding reflects societal changes in gender equity that include increases in the number of female professors now employed at universities (Statistics Canada, 2012). Research showing sex differences whereby female professors are rated lower than male professors is dated. It is also likely that the threat to ego was so pronounced that it trumped any other instructor-related variable. The same may be true for evaluators’ rank, which did not influence the magnitude of the feedback effect despite the hypothesized interaction between rank, narcissism, and feedback condition (narcissists were expected to rate TAs the lowest in the negative feedback condition). Participants were recruited from a top-tiered Canadian university that relies heavily on TAs for marking exams and papers. It could be that at other academic institutions, where grading is at the full discretion of the professor, an effect of rank may be found.
Limitations
It is expected that those who oppose the grading leniency hypothesis will argue that the results from the present studies do not generalize to the real world for three main reasons. One, in the real world, students have an expectation that the grade they receive will reflect their past and current efforts. Two, in the real world, SETs are typically given before students receive their final grade. According to Marsh and Roche (1997), giving students a manipulated grade before collecting SETs likely enhances the “saliency of the grades or violations of reasonable grade expectations” (p. 1192). Three, in the real world, SETs are not just a function of the grades students have received; they are also based on other factors such as course interest and characteristics of the professor.
Although students randomized to the negative feedback condition received a grade that very likely disconfirmed their expectation, it is worth noting that this would be true of most poor grades received at university, especially those given to students in their first year who have yet to calibrate their expectations with the reality of bell-curve marking schemes. One of the largest effect sizes (d = 0.96) demonstrated on human cognition is the positivity bias in attributions (Mezulis, Abramson, Hyde, & Hankin, 2004). College/university students think well of themselves and their intellectual ability. They enter university with the expectation that the high grades they received in high school will continue throughout their post-secondary schooling. This inflated self-assessment means that most low grades will represent an expectation violation because most students overvalue their ability and performance. As one example, Kruger and Dunning (1999) found that students’ perceptions of their general grammar ability were unrelated to their actual performance. Of relevance to higher educational practice is the finding by Holmes (1972), who demonstrated that when students’ grades disconfirmed their expectations, they tended to “deprecate the instructor's teaching performance” beyond the area of his/her grading system (p. 130). Disconfirming expectations in higher education characteristically means students fared worse than they expected, not better, and this likely occurs more often among students who are enrolled in top-tiered universities. After all, these students are admitted precisely because of their high grade point average. The students who participated in Studies 1, 2, and 3 were enrolled in a university that has an average entrance grade of A across faculties (McMaster University Office of Institutional Research and Analysis, 2011).
And yet, despite their exceptional academic performance in high school, students at this university do not uniformly receive As. In Studies 1, 2, and 3, participants were drawn from the introductory psychology participant pool, which means they were enrolled in one of the introductory psychology courses. In the years from which participants were recruited, less than a third of students (M = 27.8%; SD = 3.76) enrolled in these courses received an A (R. Day, personal communications, January 24, 2012).
The argument made by Marsh and Roche (1997), that the timing of feedback enhances the saliency of the grades and hence unnaturally influences SETs, is a reasonable criticism. Historically, academic institutions did administer SETs before the final examination. However, many academic institutions now have their SETs on-line and these forms can be accessed before and after the final examination. When SETs are completed does seem to matter. Arnold (2009) examined SETs obtained from Dutch students enrolled in the Erasmus School of Economics before and after the final examination. In this faculty, between 70 and 100% of the course grade comes from the final exam. Results showed that among students who passed the final exam, there were few differences in SETs between those who had filled out the teaching evaluation form before or after the exam. However, among students who failed the final exam, SETs were significantly lower. Interestingly, Arnold concluded that these findings were consistent with a “self-serving bias in student evaluations” and did not indicate that “students [sought] revenge on instructors through lower ratings” (p. 2009). The results of the present studies support a self-enhancing bias among those who were evaluated positively (particularly if they were high on self-esteem), but they also suggest that students sought revenge. Again, further evidence that SETs are used as a form of aggression comes from the finding that high self-esteem and high narcissism were associated with the lowest SETs when these participants were given a poor grade. This finding replicates what has been shown using direct aggression as an outcome (reviewed above).
It is true that there are many factors such as course interest and characteristics of the professor that influence SETs. This important point notwithstanding, the purpose of the present studies was to demonstrate that grades matter a lot. This was clearly demonstrated by the very large effect sizes found across all three studies. The extent to which grades matter the most (as they very likely do) beyond other course- and student-related factors needs to be further examined using an experimental approach so that causality can be inferred. The quality of the written essays may have influenced the interaction between feedback and SETs. For example, it could be that a student who wrote a poor quality essay and was given negative feedback would be less critical of his/her evaluator than a student who wrote a good quality essay but was given negative feedback. Conversely, a student who wrote a poor quality essay in the positive feedback condition may be less impressed with his/her evaluator's competence than a student who wrote a good quality essay in the positive feedback condition. Assessing the quality of participants’ work and how it relates to the type of feedback they give their evaluator is an important next step in this research.
Implications
The findings from the present studies have important implications for professors in that they demonstrate that SETs are biased and that favorable SETs can be bought with high grades. Given the trend toward grade inflation seen over the past few decades (Anglin & Meng, 2000; Kuh & Hu, 1999; Redding, 1998), it is likely that many professors have already capitalized on this relationship without the benefit of these results. The results also seem to suggest that grades matter more than feedback. This is problematic because many professors and TAs put a concerted effort into providing feedback that students can learn from. The findings from Study 3 imply that once a poor grade is received, the feedback is all but ignored and the focus is on the grade. However, it is possible that this effect was driven by the fact that the feedback given was not extensive. Indeed, students may have responded differently had the feedback been more thorough, with specific instances where improvements could be made. Accordingly, these results need to be replicated and expanded as the implications for teaching practice are significant. The moderating roles of self-esteem and narcissism have not been considered in the SETs literature (notwithstanding the enormous amount of research conducted in this area). Social psychologists have long considered the importance of person in context interactions. To date, the focus of SETs studies has been on instructor and course-related factors, with little consideration of student characteristics (beyond their academic interests). Future research in this area ought to consider personality and psychopathology as explanatory variables as they appear to have a significant impact on SETs.
ACKNOWLEDGEMENTS
I thank Heather Brittain, Amanda Krygsman, Eric Duku, Justin Mattina, Graham Trull, Mohamed Al-Hakim, Kristen Hamilton, Richard Day, and Brad Bushman for their help with this research. I also thank Charles Cunningham, Jessica Whitley, Louis Schmidt, Rebecca Lloyd, and the graduate students in the Brain and Behaviour Laboratory at the University of Ottawa for their helpful comments on this manuscript.
- 1
A total of four participants (two women and two men) were excluded from Studies 1, 2, and 3 because they indicated in the debriefing that they figured out the true purpose of the experiment. Another participant was excluded based on her written feedback to the evaluator that indicated she was suspicious (“Obviously the marker has no experience marking essays or wants everyone to fail the course. While it is likely that the mark was predetermined based on the experiment, if this was the real mark, I'd be very upset”).
- 2
Because students were only given 20 min to complete the essay, organization was given the lowest grade to increase the face validity of the feedback.
- 3
Results from the multiple regression analysis indicated that only feedback condition was statistically significant (the same pattern found with the ANOVA, thus data are not shown).
REFERENCES
- , , & (2006). Students' perceptions of course difficulty and their ratings of the instructor. College Student Journal, 40, 409-416.
- , & (1991). Multiple regression: Testing and interpreting interactions. Newbury Park, CA: Sage.
- (1987). Typical faculty concerns about student evaluation of teaching. New Directions for Teaching and Learning, 31, 25-31.
- , & (1993). Half a minute: Predicting teacher evaluations from thin slices of nonverbal behavior and physical attractiveness. Journal of Personality and Social Psychology, 64, 431-441.
- Anglin & Meng (2000). Evidence on grades and grade inflation at Ontario's universities. Canadian Public Policy, 26, 361-368.
- , & (1979). Attribution, affect, and college exam performance. Journal of Educational Psychology, 71, 85-93.
- (2009). Do examinations influence student evaluations? International Journal of Educational Research, 48, 215-224.
- , , & (2006). Aggression following performance feedback: The influences of narcissism, feedback valence, and comparative standard. Personality and Individual Differences, 41, 177-187.
- , & (1987). Student evaluations of college professors: Are female and male professors rated differently? Journal of Educational Psychology, 79, 308-314.
- , , (1996). Relation of threatened egotism to violence and aggression: The dark side of high self-esteem. Psychological Review, 103, 5-33.
- (1982). Student perceptions of and expectations for male and female instructors: Evidence relating to the question of gender bias in teaching evaluation. Journal of Educational Psychology, 74, 170-179.
- , & (2005). Ratings of university teacher instruction: How much do student and course characteristics really matter? Assessment & Evaluation in Higher Education, 30, 593-601.
- , & (1996). Gender differences in aggression as a function of provocation: A meta-analysis. Psychological Bulletin, 119, 422-447.
- , & (1993). Self-esteem and self-serving biases in reactions to positive and negative events: An integrative review. In R. F. Baumeister (Ed.), Self-esteem: The puzzle of low self-regard (pp. 55-85). Hillsdale, NJ: Erlbaum.
- (1991). The effects of anonymity and manipulated grades on student ratings of instructors. Community College Review, 18, 48-54.
- (2008). Does the relationship between student ratings of course easiness and course quality vary across schools? The role of school academic rankings. Assessment & Evaluation in Higher Education, 33, 455-464.
- (2008). Revenge and student evaluations of teaching. Teaching of Psychology, 35, 218-222.
- (1976). Faculty ratings and student grades: A university-wide multiple regression analysis. Journal of Educational Psychology, 68, 573-578.
- , & (1998). Threatened egotism, narcissism, self-esteem, and direct and displaced aggression: Does self-love or self-hate lead to violence? Journal of Personality and Social Psychology, 75, 219-229.
- , , , , , & (2009). Looking again, and harder, for a link between low self-esteem and aggression. Journal of Personality, 77, 427-446.
- , & (2007). Ratemyprofessors.com versus formal in class student evaluations of teaching. Practical Assessment, Research and Evaluation, 12, 1-15.
- , , , & (2003). When grades determine self-worth: Consequences of contingent self-worth for male and female engineering and psychology majors. Journal of Personality and Social Psychology, 85, 507-516.
- , & (1980). Attributions for exam performance. Journal of Applied Social Psychology, 10, 235-248.
- , , , , & (2005). Low self-esteem is related to aggression, antisocial behavior, and delinquency. Psychological Science, 16, 328-335.
- American Psychiatric Association. (2000). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.
- , , & (2004). Web-based student evaluations of professors: The relations between perceived quality, easiness and sexiness. Assessment & Evaluation in Higher Education, 29, 91-108.
- , & (2009). Turning up the heat on online teaching evaluations: Does “hotness” matter? Teaching of Psychology, 36, 189-193.
- , & (1979). Locus of control and causal attribution for positive and negative outcomes on university examinations. Journal of Research in Personality, 13, 154-160.
- , & (1997). Grading leniency is a removable contaminant of student ratings. American Psychologist, 52, 1209-1217.
- (2004). Grading leniency, grade discrepancy, and student ratings of instruction. Contemporary Educational Psychology, 29, 410-425.
- (1972). Effects of grades and disconfirmed grade expectancies on students' evaluations of their instructor. Journal of Educational Psychology, 63, 130-133.
- , & (2005). Do higher grades lead to favorable student evaluations? The Journal of Economic Education, 36, 29-42.
- , & (2010). Different provocations trigger aggression in narcissists and psychopaths. Social Psychological and Personality Science, 1, 12-18.
- (1975). Grades expected and grades received: Their relationship to students' evaluations of faculty performance. Journal of Educational Psychology, 67, 109-115.
- , , & (1988). Sex role stereotyping of college professors: Bias in students' ratings of instructors. Journal of Educational Psychology, 80, 342-344.
- , , & (2006). Attenuating the link between threatened egotism and aggression. Psychological Science, 17, 995-1001.
- , & (1999). Grades and student evaluations of teachers. Economics of Education Review, 18, 59-63.
- , & (1999). Unskilled and unaware of it: How difficulties in recognizing one's own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology, 77, 1121-1134.
- Kuh & Hu (1999). Unraveling the complexity of the increase in college grades from the mid-1980s to the mid-1990s. Educational Evaluation and Policy Analysis, 21, 297-320.
- (2008). Management by results: Student evaluation of faculty teaching and the mis-measurement of performance. Economics of Education Review, 27, 417-428.
- , , , & (2007). The impact of gender on the evaluation of teaching: What we know and what we can do. NWSA Journal, 19, 87-104.
- McMaster University Office of Institutional Research and Analysis. (2011). Common University Data Ontario (CUDO). Retrieved from http://www.mcmaster.ca/avpira/cudo.html
- (1987). Student perception of end-of-course evaluations. Journal of Higher Education, 58, 704-716.
- (1987). Students' evaluations of university teaching: Research findings, methodological issues, and directions for future research. International Journal of Educational Research, 11, 253-388.
- , & (1997). Making students' evaluations of teaching effectiveness effective: The critical issues of validity, bias, and utility. American Psychologist, 52, 1187-1197.
- , & (2000). Effects of grading leniency and low workload on students' evaluations of teaching: Popular myth, bias, validity, or innocent bystanders? Journal of Educational Psychology, 92, 202-228.
- , , , & (2004). Is there a universal positivity bias in attributions? A meta-analytic review of individual, developmental, and cultural differences in the self-serving attributional bias. Psychological Bulletin, 130, 711-747.
- (1993). Introduction to personality (5th ed.). Fort Worth, TX: Harcourt Brace Jovanovich.
- (2001). Student interest, grading leniency, and teacher ratings: A conceptual analysis. Contemporary Educational Psychology, 26, 382-399.
- (1977). Grades, learning, and student evaluation of instruction. Research in Higher Education, 7, 193-205.
- , & (1988). A principal-components analysis of the Narcissistic Personality Inventory and further evidence of its construct validity. Journal of Personality and Social Psychology, 54, 890-902.
- Redding (1998). Students' evaluations of teaching fuel grade inflation. American Psychologist, 53, 1227-1228.
- , & (1975). Self-evaluation and reactions to evaluations from others. Journal of Personality, 43, 94-108.
- , & (1976). Effects of expected and obtained grades on teacher evaluation and attribution of performance. Journal of Educational Psychology, 68, 75-82.
- Statistics Canada. (2012). Full-time teaching staff at Canadian universities, by rank and sex. Available online at: http://www.statcan.gc.ca/tables-tableaux/sum-som/l01/cst01/educ68a-eng.htm [accessed July 26, 2012].
- , & (2002). When a grandiose self-image is threatened: Narcissism and self-concept clarity as predictors of negative emotions and aggression following ego-threat. Journal of Personality, 70, 509-532.
- , , , & (1987). The cognitive-affective crossfire: When self-consistency confronts self-enhancement. Journal of Personality and Social Psychology, 52, 881-889.
- , , , & (2008). Trumping shame by blasts of noise: Narcissism, self-esteem, shame, and aggression in young adolescents. Child Development, 79, 1792-1801.
- , & (2003). Isn't it fun to get the respect that we're going to deserve? Narcissism, social rejection, and aggression. Personality and Social Psychology Bulletin, 29, 261-272.
- (2005). Indirect aggression among humans: Social construct or evolutionary adaptation? In R. E. Tremblay, W. H. Hartup, & J. Archer (Eds.), Developmental origins of aggression (pp. 158-177). New York: Guilford Press.
- , & (1979). Effects of earned and assigned grades on student evaluations of an instructor. Journal of Educational Psychology, 71, 764-775.

