Validation and application of verbal autopsies in a rural area of South Africa


correspondence K. Kahn, Department of Community Health, University of the Witwatersrand, Johannesburg, South Africa. E-mail:


Summary

objective  To validate the causes of death determined with a single verbal autopsy instrument covering all age groups in the Agincourt subdistrict of rural South Africa.

methods  Verbal autopsies (VAs) were conducted on all deaths recorded during annual demographic and health surveillance over a 3-year period (1992–95) in a population of about 63 000 people. Trained fieldworkers elicited signs and symptoms of the terminal illness from a close caregiver, using a comprehensive questionnaire written in the local language. Questionnaires were assessed blind by three clinicians, who assigned a probable cause of death using a stepwise consensus process. Validation involved comparing VA diagnoses with hospital reference diagnoses obtained for those who died in a district hospital, and calculating sensitivity, specificity and positive predictive value (PPV) for children under 5 years and adults 15 years and older.

results  A total of 127 hospital diagnoses satisfied the criteria for inclusion as reference diagnoses. For communicable diseases, sensitivity of VA diagnoses among children was 69%, specificity 96%, and PPV 90%; among adults the values were 89, 93 and 76%. Lower values were found for non-communicable diseases: 75, 91 and 86% among children; and 64, 50 and 80% among adults. Most misclassification occurred within the category itself. For deaths due to accidents or violence, sensitivity was 100%, specificity 97%, and PPV 80% among children; and 75, 98 and 60% among adults. Since causes of death were largely age-specific, few differences in sensitivity, specificity and PPV were found between adults and children. The frequency distribution of causes of death based on VAs closely approximated that of the hospital records used for validation.

conclusion  VA findings need to be validated before they can be applied to district health planning. In Agincourt, a single verbal autopsy instrument provided a reasonable estimate of the frequency of causes of death among adults and children. Findings can be reliably used to inform local health planning and evaluation.


Information on cause of death is needed by policy-makers, planners and managers at every level: local and district, provincial, national and international (WHO/UNICEF 1994). Health and development policies and programmes are influenced by it. Programme monitoring and evaluation depend on it. Yet for much of the developing world, where scarce resources need to be carefully allocated, data on cause of death are incomplete or absent.

The verbal autopsy (VA) is a tool to determine probable cause of death in areas lacking a vital registration system (Fortney et al. 1986; Garenne & Fontaine 1990; Gray et al. 1990; Zimicki 1990; Bang et al. 1992; Ross 1992; Snow & Marsh 1992; Snow et al. 1993; Chandramohan et al. 1994). The technique relies on clinical assessment of signs and symptoms during the terminal illness, reported retrospectively by a close caregiver of the deceased. It is based on the assumption that most causes of death can be distinguished by their signs and symptoms, and that these can be accurately recognized, recalled and reported by lay respondents (Snow & Marsh 1992). Findings are determined in part by the causes of death in the community, and in part by the questionnaire, field procedures, and the analytic process used (Chandramohan et al. 1994). Because so many factors influence the outcome of a VA study, it is critical to validate the findings. Without this, reliability of the data cannot be established. Yet few validation studies have been conducted (Datta et al. 1988; Kalter et al. 1990; Mirza et al. 1990; Snow et al. 1992; Todd et al. 1994).

Recent initiatives include validation of a VA tool for adult deaths and a WHO/UNICEF multicentre initiative to validate a tool for childhood deaths (Daniel Chandramohan, David Ross: personal communication). These are testing the validity of standardized VA instruments and diagnostic algorithms in a variety of settings. Achieving this, however, lies in the future. In the interim, a feasible validation process, to determine whether VA findings can be used as part of a health information system, should be sought and incorporated, where possible, into every VA study.

This paper reports on such a process: the validation of VA findings compared with hospital diagnoses, produced as part of an ongoing demographic surveillance system established to support district health planning. A single verbal autopsy instrument was introduced with the aim of providing a reasonable estimate of the distribution of causes of death, for all ages combined, in order to establish priorities for action in the local population. Given that most causes of death are age-specific, results are presented for adults and children separately.

Study area

The Agincourt field site was established in South Africa's rural north-east in 1992 and covers a population of some 63 000 people in 20 villages. As part of the Bushbuckridge demonstration health district (Tollman et al. 1993), the site had two major purposes: to pilot and evaluate subdistrict health centre programmes; and to address the complete absence of valid population-based data to inform health programme planning and evaluation. The study area contains four clinics and a larger health centre which refer patients to one of three district hospitals. Details of the area and baseline findings are fully described elsewhere (Tollman et al. 1999).


Demographic and health surveillance

This comprises the systematic recording, on an annual basis, of all births, deaths, and migration events. Data are collected by a team of 10 local fieldworkers, and entered onto a personal computer via a custom-designed data entry program. Birth and death registration are virtually complete.

Verbal autopsy

Specially trained fieldworkers conducted a verbal autopsy in each household where a death had been recorded, selecting as respondent the person most closely associated with the deceased during the terminal illness. The interview was carried out in her/his mother tongue (Tsonga) once its purpose had been fully explained and consent obtained. Fieldworkers were recruited locally to ensure a common cultural background with the local community. All had completed secondary school, were experienced in conducting surveys, and had demonstrated the ability to conduct a VA interview with insight and empathy. The first several interviews of each fieldworker were carefully monitored and supervised. Thereafter, feedback sessions were held on a regular basis, providing an opportunity to appraise the quality of information recorded.

The interview schedule, adapted from a questionnaire first used in Niakhar, Senegal (Garenne & Fontaine 1990), is divided into two main parts: an open section, in which the informant freely describes the symptoms and signs preceding death and their sequence; followed by a closed section, in which a basic filtering question (such as the presence of fever or diarrhoea), when answered positively, leads to a more detailed enquiry into that symptom. Further sections address use of modern and traditional treatments, and lifestyle practices. The questionnaire was translated into Tsonga, the local language, back-translated into English, and modified to reflect culturally recognized and accepted terms.

Clinician assessment was the method used to determine VA diagnoses (predefined algorithms were not applied), an approach similar to that of other studies (Garenne & Fontaine 1990; Snow et al. 1992; Dowell et al. 1993; Todd et al. 1994). Each completed questionnaire was reviewed independently by two medical practitioners. If the same diagnosis was reached, this was accepted as the ‘probable cause of death’. Where not, a third practitioner made a further blind and independent assessment. If two out of three diagnoses corresponded, the three medical reviewers discussed the case. Where consensus was achieved, the diagnosis was accepted. Where not, the cause of death was described as ‘undetermined’.
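The stepwise review can be summarised as a small decision procedure. The Python sketch below is illustrative only: the function `assign_cause` and its callback parameters are hypothetical names, and the handling of cases in which all three reviewers give different diagnoses (treated here as undetermined) is an assumption, since the text does not describe that branch explicitly.

```python
from typing import Callable, Optional

UNDETERMINED = "undetermined"

def assign_cause(
    first: str,
    second: str,
    third_review: Callable[[], str],
    discuss: Callable[[str, str, str], Optional[str]],
) -> str:
    """Hypothetical sketch of the stepwise clinician consensus for one VA."""
    # Step 1: two independent reviews; if they agree, accept the diagnosis.
    if first == second:
        return first
    # Step 2: disagreement triggers a blind, independent third review.
    third = third_review()
    # Step 3 (assumption): if no two of the three diagnoses correspond,
    # the case is treated as undetermined.
    if third not in (first, second):
        return UNDETERMINED
    # Step 4: two of three agree, so all three reviewers discuss the case;
    # `discuss` returns the consensus diagnosis, or None if none is reached.
    consensus = discuss(first, second, third)
    return consensus if consensus is not None else UNDETERMINED
```

The callbacks stand in for human judgement, so the third review is only requested when the first two reviewers disagree, mirroring the order of steps in the text.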

Validation study

The validation consisted of a comparison of VA final diagnoses with hospital reference diagnoses, taken as a gold standard, followed by calculation of their sensitivity, specificity and positive predictive value (PPV).

All persons reported to have died in one of the three local hospitals were included in the validation study and their hospital records sought. This involved identifying the hospital number from the mortuary register, then locating the patient's record from the hospital filing system. Only those records meeting a series of inclusion criteria were accepted as ‘gold standard’ or reference diagnosis. Hospital records were assessed by a medical practitioner (KK) blind to the VA diagnoses to be used for comparison. Records were included if a diagnosis was substantiated by radiological or laboratory reports; or if it was consistent with a full recorded history and examination, despite the absence of radiological or laboratory reports. Note that this approach excluded undetermined cases from the hospital sample. Although it is difficult to distinguish cases genuinely undetermined from those with an incomplete assessment, where possible such cases should be included in future work.

The main cause of death determined by verbal autopsy was compared with the corresponding hospital diagnosis. Where the VA and hospital diagnoses agreed, the pair was categorized as ‘the same’. Pairs that were not ‘the same’ were categorized as either ‘different’ (VA diagnosis wrong, i.e. a false negative) or ‘undetermined’ (no diagnosis had been assigned on VA). False positives occurred when a diagnosis appeared on VA but not in the corresponding hospital record. Twenty-two causes of death were identified and classified consistently with categories in the International Classification of Diseases, 9th revision (WHO 1978). For practical purposes, pneumonia was included in the infectious and parasitic disease category. Sensitivity, specificity and positive predictive value were computed according to standard formulas (Hennekens & Buring 1987).
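The standard definitions can be made concrete with a minimal Python sketch (the class `Validation` and helper `from_totals` are hypothetical names). For a given cause, each VA and hospital diagnosis pair falls into one cell of a 2 × 2 table, with VA diagnoses classed as ‘different’ or ‘undetermined’ both counting as false negatives. The totals used (127 reference diagnoses, 33 hospital cases of infectious and parasitic disease, 7 false positives) are taken from the text, but the 27 true positives are back-calculated from the reported 82% sensitivity rather than read from Table 3, so this is an illustration, not a reanalysis.

```python
from dataclasses import dataclass

@dataclass
class Validation:
    tp: int  # hospital diagnosis is the cause, and VA agrees
    fn: int  # hospital diagnosis is the cause, VA 'different' or 'undetermined'
    fp: int  # VA gives the cause, hospital records another diagnosis
    tn: int  # neither VA nor hospital gives the cause

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

def from_totals(n_total: int, n_cases: int, tp: int, fp: int) -> Validation:
    """Fill the 2 x 2 table from overall totals and the VA positives."""
    fn = n_cases - tp                 # hospital cases the VA missed
    tn = (n_total - n_cases) - fp     # all other reference diagnoses, minus FPs
    return Validation(tp, fn, fp, tn)

# Infectious and parasitic diseases, all ages (tp = 27 is back-calculated).
v = from_totals(n_total=127, n_cases=33, tp=27, fp=7)
print(f"sens {v.sensitivity:.0%}, spec {v.specificity:.0%}, PPV {v.ppv:.0%}")
# prints: sens 82%, spec 93%, PPV 79%
```

With these counts the sketch recovers the reported 82%, 93% and 79%, which is consistent with specificity being computed relative to all reference diagnoses other than the condition investigated.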


Loss to follow-up

During the 3-year period 1992–95, 1001 deaths were recorded in Agincourt. A verbal autopsy was attempted in all cases and successfully conducted in 93% for all ages combined (Table 1). The proportion of successful interviews was slightly lower for deaths of children (90%) than for deaths of adults (94%). No interviews were refused. Despite two return visits for the remaining 69 deaths, no suitable respondent could be located, due largely to high mobility in the area.

Table 1.  Deaths in Agincourt 1992–95, and attrition of hospital cases available for study

Table 1 details the selection of deaths. For all ages combined, 60% (n = 604) occurred at home or at the site of an injury. The remaining 40% (n = 397) occurred in hospital, with 320 occurring at one of the three district hospitals receiving referrals from the Agincourt subdistrict. Hospital records were found for 58% of these, and two-thirds of the records found were suitable for inclusion in the study; poor-quality clinical information precluded use of the remainder as reference diagnoses. Compared with adults, a slightly higher proportion of children died in the three hospitals (39% vs. 33%) and a higher proportion of their records was found (76% vs. 53%), but a lower proportion of diagnoses could be ascertained (57% vs. 72%). Altogether, the proportion of VAs that could be validated was only slightly higher for children (17%) than for adults (13%).
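The attrition described above compounds across stages. A minimal sketch, assuming each reported percentage is conditional on the preceding stage (the text implies this but does not state it), shows that the product of the three proportions reproduces, to rounding, the 17% and 13% of VAs validated. The names `STAGES` and `validated_share` are illustrative.

```python
# Proportions reported in the text for each selection stage:
# (died in one of the three district hospitals, record found, diagnosis usable)
STAGES = {
    "children": (0.39, 0.76, 0.57),
    "adults": (0.33, 0.53, 0.72),
}

def validated_share(stages: dict[str, tuple[float, float, float]]) -> dict[str, float]:
    """Multiply successive proportions, assuming each is conditional on the stage before."""
    return {group: a * b * c for group, (a, b, c) in stages.items()}

for group, share in validated_share(STAGES).items():
    print(f"{group}: about {share:.0%} of VAs validated")
# prints "children: about 17% of VAs validated"
# and    "adults: about 13% of VAs validated"
```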

Frequency distribution of cause of death from VA and hospital records

Considering only those VAs included in the validation study, the frequency distribution of death by cause from the verbal autopsy, compared with hospital records, for both children and adults, was similar for most disease categories (Table 2). No difference was statistically significant. An exception was the ‘unknown’ category, involving 14 VA diagnoses, which did not appear among the selected hospital diagnoses.

Table 2.  Frequency distribution of causes of death from verbal autopsy and from hospital records

Comparing the frequency distribution of VA diagnoses in the population with the hospital records, however, revealed certain selection biases inherent in a hospital sample (Table 2). As anticipated, these include an under-representation of accidents and violence, and an over-representation of infectious and parasitic disease, together with chronic illness, including circulatory, liver and endocrine diseases. Among children, the representativeness of the hospital sample was somewhat better, although showing an over-representation of kwashiorkor. Biases were quite strong among adults, with over-representation of infectious and parasitic, circulatory, digestive, endocrine (primarily diabetes), and malignant causes, but with a strong under-representation of accidents and violence.

Infectious and parasitic diseases

Among the 33 hospital cases of infectious and parasitic disease (all ages), sensitivity of the VA diagnoses was 82%, specificity 93% and PPV 79%. High sensitivity was achieved for pulmonary tuberculosis in adults (92%), diarrhoeal diseases (86% in children and 100% in adults), and AIDS (although numbers were small for the latter category, n = 4) (Table 3). Specificity for these diseases reached 99%. The seven false positive diagnoses were distributed amongst the different diseases in the category. Specificity was consistently high for both adults (93%) and children (96%). Sensitivity was higher among adults (89%) than among children (69%), and was lowest for certain childhood diseases such as meningitis and typhoid.

Table 3.  Validation of deaths due to selected causes, Agincourt 1992–95

Non-communicable diseases

For the 86 cases of non-communicable diseases (all ages), sensitivity of the VA diagnoses was 65%, specificity 66% and PPV 81%. Sensitivity, specificity and PPV among children were 75, 91 and 86%, respectively; and among adults 64, 50 and 80% (Table 3). Among children, sensitivity and specificity were highest for kwashiorkor (100 and 100%), and lower for other non-communicable diseases (67 and 97%). Among adults, sensitivity was highest for cerebrovascular accident (87%), and lower for diabetes, malignant neoplasm, chronic liver disease, cardiac disease, and other non-communicable diseases (75, 64, 64, 50 and 63%, respectively). None of the renal conditions were identified in the verbal autopsies. The numbers for other specific causes were too small to validate adequately.

Since many signs and symptoms are shared by the different non-communicable diseases, most cases of misclassification occurred within this category (11 of the 18 false negatives, and 11 of the 13 false positives). Some examples are given in Table 4. For instance, malignant diseases could be mistaken for a terminal condition (liver disease, renal disease, cerebrovascular accident, diarrhoea); conversely, congestive cardiac failure, chronic liver disease and renal diseases could be misdiagnosed as other chronic conditions, and in a few cases, as acute infection.

Table 4.  Misclassification of non-communicable diseases

Violent and accidental deaths

For the eight hospital cases of accidental and violent deaths, sensitivity of the VA diagnoses was 88%, specificity 98% and PPV 70%. Sensitivity and PPV were higher among children (100 and 80%; specificity 97%) than among adults (75 and 60%; specificity 98%) (Table 3). These deaths included household accident, accidental injury, motor vehicle accident, suicide and homicide; however, the numbers were too small to validate each separately.


Despite problems of historical inequity and rural neglect, South Africa has a public health infrastructure which is used, to a varying extent, by local people. In Agincourt, 40% of deaths occurred in a hospital, with 80% of these in district referral hospitals. This provides the opportunity to validate VA findings for all ages. Few other sub-Saharan African countries, where VA studies are likely to be conducted, have this degree of access to western health services.

Hospital diagnoses were used as the ‘gold standard’. Several biases are inherent in this. First, hospital deaths do not represent all deaths in the community, a typical example being those due to accidents and violence. However, this should not affect the sensitivity and specificity of those causes of death investigated. Second, the total sample was small. Thus, confidence intervals around specificity and sensitivity were relatively wide and less frequent causes of death could not be adequately validated.

Third, respondents of hospitalized patients may not be representative of those in the community. However, there were only minor biases with respect to level of education: the proportion of adults with no education was 68% in the community, 65% among hospital deaths, and 50% among the validated cases. Mean level of education was identical in these three groups: 5.8 years among those with at least 1 year of formal schooling. The only noticeable bias was the proportion of persons of Mozambican origin: 13% among validated cases and 18% among hospital deaths, as opposed to 27% in the whole population. While the quality of information reported may be better following exposure to modern medical care, knowledge of signs and symptoms may be impaired following separation from the deceased during hospitalization.

Fourth, the ‘gold standard’ provided by hospital documentation is not absolute. Of 23 false negative cases, at least five could be considered ‘compatible’, that is, they reflect somewhat different interpretations of essentially the same disease. For instance, the classification of two VA diagnoses of AIDS as false positives is open to doubt: the hospital recorded them as diarrhoea and dementia, respectively, but serology was not done, so it is possible that the VA diagnoses were correct and the hospital records incomplete.
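The effect of the small validation sample on precision can be illustrated numerically. The paper does not report confidence intervals or name a method; the sketch below uses the Wilson score interval, one common choice for binomial proportions, applied to counts back-calculated from the reported percentages (27/33 for infectious and parasitic diseases, 7/8 for accidents and violence). The function name `wilson_interval` is hypothetical.

```python
from math import sqrt

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Sensitivities with counts back-calculated from the text (illustrative only).
for label, k, n in [("infectious/parasitic", 27, 33), ("accidents/violence", 7, 8)]:
    lo, hi = wilson_interval(k, n)
    print(f"{label}: {k}/{n} = {k / n:.0%} (95% CI {lo:.0%} to {hi:.0%})")
```

With only eight accidental and violent deaths, the interval spans roughly half the unit range, which is why the less frequent causes could not be validated separately.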

Findings in Agincourt, as elsewhere, are influenced by the local environment. Sensitivity for infectious and parasitic diseases was 69% in children, 89% in adults, and 82% for all ages combined, a relatively high value. This would not be so were there more cases of typhoid and malaria in the area, diseases that are difficult to diagnose from postmortem interviews (Todd et al. 1994). Conversely, more cases of measles or neonatal tetanus, diseases with distinctive signs, would probably have produced a higher figure. In Bushbuckridge, deaths due to vaccine-preventable diseases have been low for some time, probably because of annual mass immunization campaigns conducted by the local health service.

Deaths from accidents or violence were frequent: 140 of 932 deaths in the population, contributing 42% of deaths among males aged 15–49 years (Kahn et al. 1999). Yet only 11 were available for validation since many occurred at the site of the assault or accident, or never reached the local hospitals. Although sensitivity in this study was high (100% in children, 75% in adults, and 88% for all ages), it is not as high as that found in other studies (97% in the adult validation study conducted by Chandramohan et al. 1998). A close look at the false positive diagnoses reveals that a family's explanatory construct—particularly when death is sudden and unexpected—affects both recall and reporting of symptoms and signs leading up to the terminal event (Gray et al. 1990; Gray 1992). This was evident in the case of an acute pancreatitis, well diagnosed in the hospital, which was interpreted as poisoning by the family.

Sensitivities for the non-communicable disease category are generally lower than for infectious and parasitic diseases or for accidents and violence. This was anticipated since few of these diseases have unique features, as demonstrated for cardiovascular disease elsewhere (Lopez 1993). Highest sensitivities were obtained for kwashiorkor among children and for cerebrovascular accident among adults. The latter was generally described as a sudden onset of hemiplegia and often attributed, by the family, to bewitchment.

The less reliable VA diagnoses for non-communicable diseases are more useful than they initially appear: most misclassification occurs within the non-communicable disease category itself, and the specific diseases tend to have common risk factors. Examples are hypertension (probably the most important factor underlying cerebrovascular accident and cardiac failure) and alcohol consumption (contributing to chronic liver disease, cardiomyopathy, and peptic ulcer disease). Interventions directed at these factors will address a number of the non-communicable diseases simultaneously.

Specificities, computed relative to all diseases other than the one investigated, were high for most conditions. This was partly due to the good performance of the VA questionnaire and careful assessment of reported information by clinicians. However, the high values were also a consequence of the large number of disease categories used, and therefore the correspondingly low prevalence of each condition. An analysis restricted to defined age groups could be expected to increase the share of deaths attributable to certain causes, and consequently lower their specificity. However, when children and adults were analysed separately, specificities remained high.
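The arithmetic behind this point can be sketched briefly. Specificity is TN/(TN + FP), and its denominator is every reference diagnosis that is not the condition in question, so the same number of false positives weighs more heavily as the condition's share of deaths grows. The sketch holds false positives fixed at 7 among 127 reference diagnoses; this is a simplifying, hypothetical assumption, since in practice the false-positive count would also change.

```python
def specificity(n_total: int, n_cases: int, false_pos: int) -> float:
    """Specificity = TN / (TN + FP), where TN + FP = all non-cases."""
    non_cases = n_total - n_cases
    return (non_cases - false_pos) / non_cases

# Hypothetical: 127 reference diagnoses and a fixed 7 false positives,
# while the number of true cases of the condition grows.
for n_cases in (10, 33, 60, 90):
    print(n_cases, f"{specificity(127, n_cases, 7):.0%}")
# 10 -> 94%, 33 -> 93%, 60 -> 90%, 90 -> 81%
```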

In the separate analysis of children (age < 5 years) and adults (15 years and older), sensitivity, specificity and positive predictive value were quite similar for the selected causes of death. This finding was not surprising, since most causes of death are largely age-specific. The only significant difference was for all non-communicable diseases combined, where specificity in children (91%) was significantly higher than in adults (50%) (P = 0.001). This reflected very few false positives in children, since verbal autopsy reviewers would not readily code non-communicable diseases in this age group.
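The text does not state which test produced P = 0.001. For small 2 × 2 tables, such as a cross-tabulation of true negatives and false positives by age group, Fisher's exact test is one reasonable choice; the sketch below implements its two-sided form with the standard library only. The function name and the counts in the usage line are hypothetical, chosen only to give specificities of about 91% and 50%, and are not the study's data.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same row
    and column totals that is no more probable than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # A small relative tolerance keeps tables tied with the observed one.
    return sum(p for p in probs if p <= p_obs * (1 + 1e-9))

# Hypothetical counts (not the study's data): rows = children, adults;
# columns = true negatives, false positives for non-communicable disease.
# Children: 30/33 = 91% specificity; adults: 6/12 = 50% specificity.
print(f"P = {fisher_exact_two_sided(30, 3, 6, 6):.3f}")
```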

Agincourt findings can be compared with those of other studies, most of which target children. The relatively low sensitivities obtained for meningitis, septicaemia and pneumonia are consistent with those determined elsewhere (Kalter et al. 1990; Snow et al. 1992). Diarrhoea among the under-5s achieves considerably higher sensitivity in Agincourt (86%) than in Kilifi, Kenya (36%) (Snow et al. 1992) and is comparable with that in Cebu, Philippines (78%) (Kalter et al. 1990). Diseases without unique features tend to have low sensitivity, while those with specific features (e.g. neonatal tetanus) can reach sensitivities of 100% (Kalter et al. 1990). Diarrhoea may be a symptom of many illnesses, though not necessarily the underlying cause. The high sensitivity demonstrated here may be related to the prevalence of diarrhoeal illness in the area. It could also reflect the emphasis, in fieldworker training, on the duration and sequence of symptoms and signs.

A key finding in this study was the similarity between the cause of death distribution derived from the validated sample of verbal autopsies, and that from the hospital sample. Thus, in this particular setting, the frequency of cause of death in the overall population, derived by verbal autopsy, is likely to be valid and can therefore be used to support planning and resource allocation. This substantially improves on decisions based solely on the distribution of hospital deaths.

In Agincourt, verbal autopsies formed a routine element of a surveillance system introduced to support district and provincial health systems planning. Demographic surveillance provided trends in mortality by age and sex, while the VAs provided a reasonable estimate of cause of death. Unlike most other studies, the Agincourt work demonstrates that a single verbal autopsy instrument, covering deaths in all age-groups, can be satisfactorily validated for children under 5 years and adults 15 years and older. In sum, this experience demonstrates that verbal autopsy findings, from a single instrument, can be effectively used in areas without vital registration, to prioritize public health problems, inform resource allocation, and target and evaluate community-based interventions.


This work could not have been conducted without the support of the study communities, and the interest of health providers to respond to findings. Ethical clearance was granted by the University of the Witwatersrand's Committee for Research on Human Subjects (Medical) (No. M960720). We acknowledge the roles of Obed Mokoena, Evangeline Shivambu, Julia Moorman, Paul Moxey and the Northern Province Department of Health, and thank David Ross for valuable comments on an earlier draft. The European Union and Kagiso Trust, Henry J. Kaiser Family Foundation and Trust for Health Systems Planning and Evaluation have all taken an interest in this work. Kathleen Kahn was supported by the British Council, and Stephen Tollman by the British Council and a Wellcome Trust Travelling Research Fellowship (049336/Z/96/Z) during the analysis and write-up of this work.