An earlier version of this paper was presented at the 1986 national convention of the American Psychological Association, Washington, D.C. The authors would like to thank Daniel Ilgen and Angelo DeNisi for their helpful comments on an earlier draft of this paper.
MULTIPLE ASSESSMENT OF MANAGERIAL EFFECTIVENESS: INTERRATER AGREEMENT AND CONSENSUS IN EFFECTIVENESS MODELS
Article first published online: 7 DEC 2006
Volume 41, Issue 4, pages 779–803, December 1988
How to Cite
TSUI, A. S. and OHLOTT, P. (1988), MULTIPLE ASSESSMENT OF MANAGERIAL EFFECTIVENESS: INTERRATER AGREEMENT AND CONSENSUS IN EFFECTIVENESS MODELS. Personnel Psychology, 41: 779–803. doi: 10.1111/j.1744-6570.1988.tb00654.x
Research has consistently identified poor interrater agreement among multiple assessments of managerial performance. Three alternative sources of dissensus in effectiveness ratings were examined: rating errors, selective perceptions, and variations in criterion type or weight. The available empirical evidence and theoretical analysis show that all three causes provide plausible reasons, though in varying degrees, for the low agreement coefficient. However, an empirical study designed to test three specific hypotheses on criterion type and criterion weights found consensus in the effectiveness models of superiors, subordinates, and peers. Consensus among different raters was high on both the role behaviors and the personal traits of the managers as criteria for effectiveness. While these findings supported Biddle's (1979) role theory, disagreement on the relative weights of these criteria was evident. These observations underscore the need for further conceptualization of raters' preference functions as a primary source of the low convergent validity coefficients among multiple raters. Further research is also desirable on contextual and cognitive factors that may lead to shifts in criterion type and criterion weight, as well as on actual rating error tendencies among different raters.