Interobserver variation in visual evaluation was analyzed for 10 cranial traits in a homogeneous archaeological series. Two observers independently scored cranial traits commonly used for sex determination. Although sex determinations did not differ significantly between the two observers, individual traits showed different levels of interobserver reliability. In addition, the indices of relative "maleness" and "femaleness" derived by the two observers differed significantly. Because such indices are used in cross-population comparisons of the relative gracility and robusticity of diverse samples, these comparisons should be interpreted with caution when assessments were made by more than one investigator. Most instances of interobserver discordance in our study arose from traits described in subjective terms without accompanying diagrams. Clarity of definition, rather than the number of traits scored, was found to be critical for effective sex determination by the visual assessment method. Using fewer, more precisely defined traits can improve interobserver reliability. Am J Phys Anthropol, 2004. © 2004 Wiley-Liss, Inc.