“PATHOLOGIST SUSPENDED FOR FAILING TO DIAGNOSE CERVICAL CANCER” screams the tabloid headline, signalling yet another witch-hunt against the medical profession. The implication is that the pathologist was incompetent, and that if only proper standards had been in place women would not have had to suffer the discomforts of radical surgery and radiation therapy. This excess of emotion results in new guidelines being formulated by hospital trusts, health boards or government agencies, guidelines which may have little basis in scientific research and which are increasingly prescriptive.
For the truth is that government agencies have little notion of the frailty of histological diagnosis. It will never be possible to identify every woman with cervical intraepithelial neoplasia or invasive carcinoma of the cervix, because of the inherent unreliability of cervical cytology and histological diagnosis. The reliability of a diagnostic test is its ability to give the same result when it is performed more than once in the same woman, the tests being performed independently of each other. It is because of this need for independence that two or more observers are required to measure reliability, the observers being unaware of each other's assessments. Reliability is an inherent property of the clinical test, and so is not influenced by the observers, provided they are fully trained in the technique of assessment. Reliability is measured by the agreement between the observers and is expressed as a quantity, enabling the investigators to make a judgement about the value of the clinical test. Formal measurement of reliability should be the first investigation in the evaluation of a clinical test, yet all too often a test is introduced into routine clinical practice without this basic requirement having been met.
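As an illustration of how agreement between two independent observers can be expressed as a quantity, the following minimal sketch computes the simple proportion of agreement and Cohen's kappa, which corrects that proportion for the agreement expected by chance. The function names and the ten ratings are invented for the example and are not taken from any study.

```python
from collections import Counter


def observed_agreement(ratings_a, ratings_b):
    """Proportion of cases on which two observers give the same rating."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa: agreement beyond that expected by chance alone."""
    n = len(ratings_a)
    p_observed = observed_agreement(ratings_a, ratings_b)
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    # Chance agreement: product of the two observers' marginal proportions,
    # summed over every category either observer used.
    categories = set(ratings_a) | set(ratings_b)
    p_chance = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical ratings of ten slides by two observers (illustrative only).
observer_1 = ["normal", "abnormal", "abnormal", "normal", "abnormal",
              "normal", "normal", "abnormal", "normal", "normal"]
observer_2 = ["normal", "abnormal", "normal", "normal", "abnormal",
              "normal", "abnormal", "abnormal", "normal", "normal"]

agreement = observed_agreement(observer_1, observer_2)
kappa = cohen_kappa(observer_1, observer_2)
print(f"proportion of agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

In this invented example the observers agree on eight of the ten slides, yet kappa is lower than the raw proportion because some of that agreement would be expected by chance alone.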
It was this consideration which led Mario Preti and his colleagues (pages 594–599) to undertake their investigation of the reliability of the histological diagnosis of vulval intraepithelial neoplasia (VIN). Sixty-six slides were sent to six fully trained pathologists in five European countries. The slides carried no identifying marks and were sent in random order to each pathologist, who reached a diagnosis without consulting the others. No clinical details were supplied. Thus all the requirements were fulfilled for a rigorous study of reliability.
There was poor agreement between the observers concerning the adequacy of the specimen and the histological features of infection with the human papillomavirus, and moderate agreement concerning atypical cytological patterns, neoplastic architectural patterns and the grade of vulval intraepithelial neoplasia. The observers assessed histological grade in five ordered categories: no VIN; VIN grades I to III; and superficial invasive carcinoma of the vulva. Agreement was measured by the kappa statistic, but from the information in Table 6 it is also possible to calculate a weighted proportion of agreement: it is 0.86 (95% confidence interval 0.78, 0.94). The advantage of the proportion of agreement is that it expresses agreement in terms that can be interpreted directly. Here the interpretation is that, of every 100 slides, the pathologists will agree that the histological features are normal or abnormal, however normal and abnormal are defined, in 86 and will disagree in 14. This is good agreement, but it is not perfect. It is the test which is not perfect, not the pathologists. No amount of additional training of the pathologists will improve the reliability of the test.
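One way a weighted proportion of agreement can be calculated for five ordered categories is sketched below. The weighting scheme behind the figure of 0.86 is not stated here, so the linear weights are an assumption: full credit for identical grades and partial credit that falls off with the distance between grades. The table of counts is invented for illustration and is not the data of Table 6.

```python
def linear_weights(k):
    """Agreement weights for k ordered categories.

    Assumed scheme: 1 for identical grades, decreasing linearly
    to 0 for grades at opposite ends of the scale.
    """
    return [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]


def weighted_agreement(table):
    """Weighted proportion of agreement from a square table of counts.

    table[i][j] is the number of slides placed in category i by one
    observer and in category j by the other.
    """
    k = len(table)
    if k < 2 or any(len(row) != k for row in table):
        raise ValueError("table must be square with at least two categories")
    total = sum(sum(row) for row in table)
    if total == 0:
        raise ValueError("table contains no observations")
    weights = linear_weights(k)
    weighted_sum = sum(weights[i][j] * table[i][j]
                       for i in range(k) for j in range(k))
    return weighted_sum / total


# Hypothetical counts for 66 slides graded by two observers.
# Categories in order: no VIN, VIN I, VIN II, VIN III, superficial invasion.
example_table = [
    [10, 2, 0, 0, 0],
    [2, 8, 2, 0, 0],
    [0, 2, 10, 2, 0],
    [0, 0, 2, 12, 2],
    [0, 0, 0, 2, 10],
]

example_result = weighted_agreement(example_table)
print(f"weighted proportion of agreement = {example_result:.2f}")
```

With these made-up counts every disagreement is between adjacent grades, so each earns a weight of 0.75 and the weighted proportion is high. The result says nothing about the published figure; it only shows the arithmetic.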
The study by Preti and colleagues is important, for based on these results realistic standards can be set for the histological diagnosis of vulval intraepithelial neoplasia. These standards are not arbitrary; instead they are founded on the results of this rigorous scientific research which has quantified the reliability of a diagnostic test. Only when government agencies understand the concept of reliability will they be less likely to react inappropriately to tabloid headlines of “missed cancer”, and instead agree to guidelines which are based on realistic standards.