How many raters? Toward the most reliable diagnostic consensus

When a binary decision, such as whether to treat a patient or whether to enter a patient into or withdraw a patient from a clinical trial, must rest on a diagnosis of unsatisfactory reliability, can a consensus diagnosis be used to improve that reliability? If so, exactly how? That is the question I address here. I compare and contrast the known results for an interval consensus with those for a binary consensus, and I suggest tactics for a pilot study designed to answer these questions.
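The contrast between interval and binary consensus can be made concrete with a minimal Python sketch. The sketch takes an interval consensus to be the mean of k parallel ratings, each with reliability `rho`. Under that assumption the classical Spearman-Brown formula gives the consensus reliability. For a binary consensus it assumes a hypothetical latent-class model, which is not necessarily the model analyzed here. In that model each patient is truly positive with probability `prevalence`, and raters err independently given true status with a common sensitivity `sens` and specificity `spec`. The consensus is positive when at least `m` of the `k` raters say positive, and reliability is the intraclass kappa between two independent panels. All function names and parameter values are illustrative.

```python
from math import comb


def spearman_brown(rho: float, k: int) -> float:
    """Reliability of the mean of k parallel interval ratings, each with reliability rho."""
    return k * rho / (1 + (k - 1) * rho)


def prob_at_least(m: int, k: int, q: float) -> float:
    """P(at least m of k independent raters say 'positive'), each with probability q."""
    return sum(comb(k, j) * q**j * (1 - q) ** (k - j) for j in range(m, k + 1))


def binary_consensus_kappa(prevalence: float, sens: float, spec: float, k: int, m: int) -> float:
    """Intraclass kappa between two independent k-rater panels using an 'at least m of k' rule.

    ASSUMED (hypothetical) latent-class model: true status is positive with
    probability `prevalence`; given true status, raters call independently with
    common sensitivity `sens` and specificity `spec`.
    """
    p1 = prob_at_least(m, k, sens)      # P(consensus positive | truly positive)
    p0 = prob_at_least(m, k, 1 - spec)  # P(consensus positive | truly negative)
    pbar = prevalence * p1 + (1 - prevalence) * p0
    if not 0 < pbar < 1:
        raise ValueError("consensus is constant; kappa is undefined")
    # Kappa = Var(p) / (pbar * (1 - pbar)), where p is a patient's chance of a positive
    # consensus; under the two-class model Var(p) = prevalence * (1 - prevalence) * (p1 - p0)**2.
    return prevalence * (1 - prevalence) * (p1 - p0) ** 2 / (pbar * (1 - pbar))


if __name__ == "__main__":
    # Interval consensus: reliability rises with every added rater.
    for k in (1, 2, 3, 5):
        print(f"interval, k={k}: {spearman_brown(0.36, k):.3f}")
    # Binary consensus: the threshold m matters, not just the number of raters k.
    for k, m in ((1, 1), (3, 2), (3, 3), (5, 3)):
        print(f"binary, {m} of {k}: {binary_consensus_kappa(0.5, 0.8, 0.8, k, m):.3f}")
```

The example uses the illustrative values prevalence 0.5 and sensitivity = specificity = 0.8. With these values a single rater gives kappa 0.36 and a 2-of-3 majority raises it to about 0.63. A 3-of-3 unanimity rule, however, lowers it to about 0.33, below a single rater. Under the parallel-ratings assumption, the interval consensus cannot show such a reversal, because Spearman-Brown reliability increases with k whenever 0 < rho < 1.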