Although the interpretation of entheses has undergone considerable discussion recently, the role of interobserver error, especially in comparative contexts, has been only sporadically addressed. Rates of reproducibility were evaluated in two prehistoric North American skeletal series using the standards developed by Hawkey and Merbs, currently the most widely used system. Eight observers of varying experience levels scored 17 long bone entheses, representing both fibrous and fibrocartilaginous attachment types, on 58 individuals. Results showed rates of reproducibility to be only marginally higher than would be expected by chance alone.
Neither observer experience level nor attachment type appeared to be a factor. As might be predicted, those entheses with the highest rates of reproducibility exhibited relatively smooth attachment morphology and/or less defined boundaries, whereas those with the lowest rates displayed the greatest range of surface morphology expression. Possible explanations for the levels of interobserver error observed include difficulties in reducing the highly variable enthesis morphology to a few discrete categories, categories that encompass too many criteria, and the use of vague terminology in describing morphological features. Consequently, comparison of data across studies by different observers, especially those not trained by the developer of a given scoring method, must be undertaken with great caution. Copyright © 2012 John Wiley & Sons, Ltd.
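
The comparison of observed agreement against chance can be illustrated with a chance-corrected agreement coefficient. The abstract does not say which statistic the study used, so the sketch below assumes Fleiss' kappa, a common choice when more than two observers assign categorical scores. The function name `fleiss_kappa`, the four score categories, and the toy counts are hypothetical and are not data from the study.

```python
from typing import List


def fleiss_kappa(counts: List[List[int]]) -> float:
    """Fleiss' kappa for a subjects-by-categories table of rating counts.

    counts[i][j] is the number of observers who assigned subject i
    (here, one enthesis on one individual) to score category j.
    Every row must sum to the same number of observers.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two observers are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject must be scored by the same number of observers")

    n_categories = len(counts[0])
    total = n_subjects * n_raters

    # Overall proportion of all ratings that fall in each category.
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_categories)]

    # Per-subject agreement: share of observer pairs that gave the same score.
    p_subj = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    p_observed = sum(p_subj) / n_subjects   # mean observed agreement
    p_chance = sum(p * p for p in p_cat)    # agreement expected by chance
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical toy table: 4 entheses, 8 observers, 4 score categories.
toy_counts = [
    [5, 2, 1, 0],
    [1, 3, 3, 1],
    [0, 2, 4, 2],
    [2, 2, 2, 2],
]
print(round(fleiss_kappa(toy_counts), 3))
```

A value near 0 means agreement is about what chance alone would produce, and a value of 1 means every observer assigned the same score to every enthesis. Kappa also depends on how evenly the scores are spread across categories, so a single coefficient should be read alongside the raw score distribution.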