Corresponding Author Heribert Ramroth, University of Heidelberg, Institute of Public Health, Im Neuenheimer Feld 324, 69120 Heidelberg, Germany. Tel.: +49 6221 56-5215; Fax: +49 6221 56-5948. E-mail: firstname.lastname@example.org
Objectives To compare the cause-of-death distributions obtained with the Physician Coded Verbal Autopsy approach and with the Interpreting Verbal Autopsy model, based on information from a French-language verbal autopsy questionnaire, in rural north-western Burkina Faso.
Methods Data from 5649 verbal autopsy (VA) questionnaires reviewed by local physicians at the Nouna Health and Demographic Surveillance Site between 1998 and 2007 were considered for analysis. Information from the VA interviews was extracted to create the set of standard indicators needed to run the Interpreting Verbal Autopsy model. Cause-specific mortality fractions were used to compare Physician Coded Verbal Autopsy and Interpreting Verbal Autopsy results.
Results At the population level, 62.5% of causes of death assigned by the Interpreting Verbal Autopsy model corresponded to those determined by two or three physicians. Although seven of the 10 main causes of death were present in both approaches, the percentages for individual causes of death showed discrepancies, dominated by the higher malaria rates found with the Physician Coded Verbal Autopsy approach.
Conclusion Our results confirm that national mortality statistics, which are partly based on verbal autopsies, must be carefully interpreted. Difficulties in determining malaria as cause of death in holoendemic malaria regions might result in higher discrepancies than in non-endemic areas. As neither Physician Coded Verbal Autopsy nor Interpreting Verbal Autopsy results represent a gold standard, the uncertainty associated with either procedure is high.
In many developing countries, most deaths occur outside hospitals. Reliable cause-specific mortality data based on clinical records with complete population coverage, as available in industrialised countries, do not exist. Therefore, information on numbers and causes of death (COD) is normally based on health facility data, Health and Demographic Surveillance Systems (HDSS), or sample surveys such as the Demographic and Health Surveys, which cover only a small proportion of the target area. While local registries may hold clinical data, they often lack a well-defined sampling frame, so estimates of national rates or total numbers of deaths may be questionable (Winkler et al. 2011). Surveys and HDSS data, on the other hand, lack clinical data, so CODs may be misdiagnosed.
Probable CODs are often obtained using the Verbal Autopsy (VA) method, in which trained field workers interview one of the closest relatives of the deceased about the signs, symptoms and circumstances preceding the death (Soleman et al. 2004). VA is a health surveillance technique used to quantify causes of mortality in settings where vital registration systems are weak or absent. Despite well-known limitations, VA is a useful, and often the only feasible, method for deriving COD statistics in developing countries (Joshi 2009).
In the past decade, computerised models have been developed for determining COD, in the hope of minimising errors associated with physician coding of VAs (physician-coded verbal autopsy, PCVA) and of making COD tracking at the population level more consistent (Byass et al. 2006; Murray et al. 2007a,b). One such publicly available model, InterVA (Interpreting Verbal Autopsy), has begun to be used, most notably within INDEPTH, a global network of member sites that conduct longitudinal health and demographic evaluation of populations in low- and middle-income countries (LMICs). The aim of InterVA is to obtain posterior probabilities for CODs, given an a priori distribution of CODs in the population and conditional probabilities for circumstances leading to death (Byass et al. 2003). This study presents a comparison between the PCVA and InterVA methods in the Nouna HDSS in Burkina Faso, a malaria-holoendemic region. We focus on the agreement between both methods at the population level, as from a public health point of view the population distribution of CODs is more important than individual outcomes.
Study area and population
Burkina Faso is one of the poorest countries in the world; 46% of the population live in poverty, and of those, 92% live in rural areas (D’Ambruoso et al. 2010). The official language is French. The HDSS site of the Nouna Health Research Center (CRSN, Centre de Recherche en Santé de Nouna) is located in north-western Burkina Faso and covers 59 villages over 1756 km² with about 81 500 inhabitants in 2008 (Sié et al. 2010). The area has a sub-Saharan climate with two very distinct seasons, wet and dry.
The Nouna HDSS VA questionnaire
In addition to the routine data collection rounds, the Nouna HDSS integrates the VA process for COD ascertainment. Up to 2007, four rounds of routine data collection were conducted per year. Trained field staff (without a medical background and not specifically trained in health care) visit the household some time after a registered death and interview the person who cared for the deceased before death. The Nouna questionnaire records background characteristics of the deceased and uses structured filter questions on specific signs and symptoms experienced up to the point of death (including diarrhoea, vomiting, convulsions, neck stiffness, epilepsy, difficulty breathing, cough, rashes, measles, wounds, lesions, bleeding, etc.). Additionally, free text provides an opportunity to describe conditions not catered for in the structured questions. The questionnaire is a locally adapted version of a standard INDEPTH VA questionnaire in French, which is translated into local languages (mainly Dioula, but also Bwamu, Moore and Fulfulde) (Sié et al. 2010). The method for ascertaining CODs in Nouna was consistent over the period 1998–2007, during which 7445 deaths were registered.
Application of the InterVA model
A detailed description of the InterVA model has been given elsewhere (Byass et al. 2006). In short, InterVA applies Bayes’ theorem, combining conditional and marginal probabilities between causes and indicators, to estimate the probability of each of a set of CODs given a set of 106 indicators. As a standard model designed for COD determination in LMICs, it has the advantage of consistency over time and place (Fottrell et al. 2010; Byass et al. 2011). The model has since been evaluated in a number of settings (Byass et al. 2010; Fantahun et al. 2006). Recently, InterVA was shown to be robust to different a priori distributions of CODs (Fottrell et al. 2011); thus, a country-specific a priori distribution is not needed to apply the model to the local situation. Given the malaria and HIV/AIDS prevalence in the Nouna region, we set the malaria prevalence to ‘high’ and the HIV/AIDS prevalence to ‘low’. The current version of the model (InterVA-3) calculates, for each individual, the posterior probabilities of 35 COD groups given individual disease indicators built from the French VA questionnaire and country-specific information on HIV/AIDS and malaria. It displays up to three probable CODs with their corresponding likelihoods. Fewer than three causes are displayed if the probability of the second or third cause is less than 50% of the probability of the preceding cause (Fottrell et al. 2010). Consequently, the displayed likelihoods for a given individual need not sum to 100%.
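The Bayesian step and the three-cause display rule described above can be sketched in a few lines of Python. This is a toy illustration, not InterVA-3 itself: the cause names, prior values and conditional probabilities are hypothetical, and the sketch assumes, as a simplification, that each indicator reported as present updates the cause probabilities in turn (a naive-Bayes-style sequential update).

```python
# Toy sketch of an InterVA-style calculation. All causes, indicators and
# probabilities are HYPOTHETICAL illustration values, not InterVA-3 inputs.
PRIORS = {"malaria": 0.3, "meningitis": 0.1, "diarrhoea": 0.6}
P_INDICATOR_GIVEN_CAUSE = {
    "fever": {"malaria": 0.9, "meningitis": 0.8, "diarrhoea": 0.3},
    "stiff_neck": {"malaria": 0.05, "meningitis": 0.7, "diarrhoea": 0.01},
}


def posterior(priors, cond, present_indicators):
    """Apply Bayes' theorem once per indicator reported as present."""
    probs = dict(priors)
    for indicator in present_indicators:
        # Multiply each cause by P(indicator | cause), then renormalise to sum to 1.
        unnormalised = {c: p * cond[indicator][c] for c, p in probs.items()}
        total = sum(unnormalised.values())
        probs = {c: p / total for c, p in unnormalised.items()}
    return probs


def displayed_causes(probs, max_causes=3, ratio=0.5):
    """Keep up to three causes; stop once a cause is below 50% of the one before it."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    shown = [ranked[0]]
    for cause, p in ranked[1:max_causes]:
        if p < ratio * shown[-1][1]:
            break
        shown.append((cause, p))
    return shown


if __name__ == "__main__":
    probs = posterior(PRIORS, P_INDICATOR_GIVEN_CAUSE, ["fever"])
    for cause, p in displayed_causes(probs):
        print(f"{cause}: {p:.1%}")
```

With fever as the only indicator, the sketch displays malaria (about 50.9%) and diarrhoea (about 34.0%); meningitis (about 15.1%) is dropped because it falls below half the probability of diarrhoea, so the displayed likelihoods sum to less than 100%.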
To our knowledge, our study is the first to apply the InterVA model using a French-language VA questionnaire for all ages. Ethical approval for this study was granted by the Ethics Committee of the University of Heidelberg and by the local ethics committee in Nouna.
Electronically captured VA information was allocated to the InterVA indicators by an international team of physicians and epidemiologists. After running InterVA, we summed the likelihoods for each cause category to derive population-level cause-specific mortality fractions (CSMFs). Where an individual’s likelihoods did not sum to 100%, the remaining percentage was added to the proportion of indeterminates, following a previously described concept (Fottrell et al. 2007). The weighted InterVA CODs and the final physician CODs were compared by determining the agreement in each category, as in the following example: if a COD C accounted for 15% of deaths in PCVA and 10% in InterVA, the minimum of the two (i.e. 10%) was regarded as the agreement at the population level, regardless of whether physicians and InterVA included the same records in that 10%. The sum of all minima was regarded as the ‘concordance’ of the two methods. Physicians in the Nouna HDSS used a more detailed COD list with 148 categories, whereas the model has only 35 possible diagnoses. To allow comparison, we narrowed the two COD lists to 24 common categories (Table 2). Data preparation was performed using STATA 9, and all analyses were performed using SAS 9.2.
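The aggregation and comparison steps above can be sketched in Python, assuming each death’s InterVA output is available as a list of (cause, likelihood) pairs already mapped to the common categories, and each PCVA death has one final cause. The function names and the toy data are hypothetical; the study itself used SAS.

```python
from collections import Counter, defaultdict


def interva_csmf(displayed):
    """Population CSMFs from per-death displayed likelihoods.

    displayed: one list of (cause, likelihood) pairs per death.
    Any shortfall from 100% for a death is added to 'indeterminate'.
    """
    totals = defaultdict(float)
    for causes in displayed:
        shown = 0.0
        for cause, likelihood in causes:
            totals[cause] += likelihood
            shown += likelihood
        # Residual likelihood goes to indeterminate (clamped at 0 for rounding noise).
        totals["indeterminate"] += max(0.0, 1.0 - shown)
    n = len(displayed)
    return {cause: value / n for cause, value in totals.items()}


def pcva_csmf(final_causes):
    """Population CSMFs from one final physician COD per death."""
    counts = Counter(final_causes)
    n = len(final_causes)
    return {cause: k / n for cause, k in counts.items()}


def concordance(csmf_a, csmf_b):
    """Sum over all categories of the smaller of the two CSMFs."""
    categories = set(csmf_a) | set(csmf_b)
    return sum(min(csmf_a.get(c, 0.0), csmf_b.get(c, 0.0)) for c in categories)


if __name__ == "__main__":
    interva = interva_csmf([
        [("malaria", 0.6), ("meningitis", 0.3)],
        [("diarrhoea", 1.0)],
        [("malaria", 0.5)],
        [("pneumonia/sepsis", 0.8)],
    ])
    pcva = pcva_csmf(["malaria", "malaria", "diarrhoea", "pneumonia/sepsis"])
    print(round(concordance(interva, pcva), 3))
```

In the toy data, indeterminates receive (0.1 + 0 + 0.5 + 0.2)/4 = 20% of deaths, and the concordance is 0.275 + 0.25 + 0.2 = 0.725. When both sets of CSMFs sum to 1, the sum of minima equals one minus half the sum of absolute differences between them, so it can also be read as a distance-based agreement measure.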
We considered all deaths that occurred in the Nouna HDSS from 1998 to 2007 for which VA questionnaires were available (5649 of 7445 deaths, 75.9%). The study population comprised 51.5% males and 48.5% females. Mean age at death was 26.1 years, ranging from birth to 83 years (Table 1). The majority of the deceased (60.3%) died during the dry season (November–May).
Table 1. Study population characteristics for 5649 deaths in the Nouna Health and Demographic Surveillance System
4 weeks–1 year
15–49 years, male
15–49 years, female
Information from the Nouna VA questionnaire could be extracted for 69 (64.5%) InterVA indicators. Sixteen (15.1%) of the indicators could not be allocated because they rely on clinical diagnosis, information that cannot reliably be obtained from household informants and is not available in the VA questionnaire for the many individuals who did not seek clinical care before death. The concordance between the two methods was 62.5% (Table 2). Differences were observed in the estimated proportions of deaths attributed to malaria, meningitis, diarrhoea, pneumonia/sepsis and tuberculosis. The InterVA model attributed 11.1% of deaths to malaria, whereas physicians attributed nearly three times as many (31.9%). The model attributed 12.3% of deaths to meningitis, whereas physicians designated meningitis as the COD only about one-fourth as often (2.7%). Diarrhoea was designated as the COD in 15.0% of cases by the model, compared with 10.2% by physicians. No notable differences were observed in the proportions of deaths owing to malnutrition, non-communicable disease, injury and external causes.
Table 2. List of the 21 causes of death (COD), frequency of deaths by physician and interpreting verbal autopsy (InterVA), and agreement
Other acute infection
HIV/AIDS related death
Other chronic infection
Acute respiratory disease not pneumonia
Chronic respiratory disease
Disease of nervous system
External cause of death
Kidney or urinary disease
Maternity related death
Figure 1 shows CODs by season; concordance rates were nearly identical (wet season, 62.7%; dry season, 62.8%). Both approaches attributed a higher proportion of deaths to malaria during the wet season, whereas the proportions of pneumonia/sepsis, tuberculosis (pulmonary) and diarrhoea increased slightly during the dry season.
Causes of death by age group are shown in Figure 2. In each age group, at least three or four CODs appear among the top five diagnoses of both methods. However, the relative importance of each COD differs greatly between the two methods in all age categories. Comparing agreement across age groups, the lowest concordance was observed for those below 1 year of age (nearly 50%) and the highest for those aged 15–49 years (males, 69.9%; females, 67.8%). The discrepancies in the CSMFs were mainly due to malaria, meningitis and diarrhoea for children under five, and to HIV/AIDS, tuberculosis and pneumonia/sepsis for those aged 15 and older. Cardiovascular diseases played an increasing role with increasing age (Figure 2). According to the physician diagnoses, malaria increased again for adults above 65 years, in contrast to the InterVA model. Over the years, InterVA showed a decrease in malaria, in contrast to the PCVA method, for which the percentage of malaria remained stable or decreased only slightly (data not shown).
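Concordance by age group (or by season, as in Figure 1) amounts to computing the CSMFs and the sum of minima separately within each stratum. The sketch below assumes one hypothetical record per death holding a stratum label, the final physician cause and the InterVA likelihoods; it treats a physician cause as a likelihood of 1 so that both methods can share one CSMF function.

```python
from collections import defaultdict


def csmf(rows):
    """CSMFs from per-death {cause: likelihood} dicts; any shortfall from 1 is indeterminate."""
    totals = defaultdict(float)
    for likelihoods in rows:
        for cause, value in likelihoods.items():
            totals[cause] += value
        totals["indeterminate"] += max(0.0, 1.0 - sum(likelihoods.values()))
    return {cause: value / len(rows) for cause, value in totals.items()}


def concordance(a, b):
    """Sum over all categories of the smaller CSMF."""
    return sum(min(a.get(c, 0.0), b.get(c, 0.0)) for c in set(a) | set(b))


def concordance_by_stratum(deaths):
    """deaths: iterable of (stratum, pcva_cause, interva_likelihoods) tuples."""
    pcva, interva = defaultdict(list), defaultdict(list)
    for stratum, pcva_cause, interva_likelihoods in deaths:
        pcva[stratum].append({pcva_cause: 1.0})  # one definite physician cause
        interva[stratum].append(interva_likelihoods)
    return {s: concordance(csmf(pcva[s]), csmf(interva[s])) for s in pcva}


if __name__ == "__main__":
    toy_deaths = [
        ("under 1 year", "malaria", {"meningitis": 0.7}),
        ("under 1 year", "diarrhoea", {"diarrhoea": 0.9}),
        ("15-49 years", "malaria", {"malaria": 0.8}),
        ("15-49 years", "tuberculosis", {"tuberculosis": 0.6, "HIV/AIDS": 0.3}),
    ]
    for stratum, value in concordance_by_stratum(toy_deaths).items():
        print(f"{stratum}: {value:.1%}")
```

For these toy records the youngest stratum reaches 45.0% concordance (only diarrhoea overlaps) and the adult stratum 70.0%, illustrating how a single cause such as malaria can drive differences between strata.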
Figure 3 shows the yearly averaged percentage of malaria deaths per month for both approaches. The curves have very similar shapes, but the solid curve of physicians’ diagnoses lies consistently above the dashed curve of InterVA. During the dry season, physicians attribute on average 25% of all deaths to malaria, and about 35% during the wet season, in contrast to the InterVA model (dry season, 5%; wet season, 15%). As a sensitivity analysis, we set the model’s malaria indicator to ‘yes’ whenever any fever was present; the result is shown by the dotted curve, which falls midway between the two curves described above.
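The monthly curves can be derived by computing, for each calendar month of each year, the share of deaths attributed to malaria and then averaging these shares across years. This is one possible reading of ‘yearly averaged’ (an assumption); pooling all years before dividing would give slightly different values when yearly death counts differ. For physician coding the malaria weight of a death is 1 or 0; for InterVA it can be the likelihood assigned to malaria. The function and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def monthly_malaria_share(deaths):
    """Mean over years of the monthly share of deaths attributed to malaria.

    deaths: iterable of (year, month, malaria_weight) tuples, where malaria_weight
    is 1 or 0 for a physician-coded death, or InterVA's malaria likelihood.
    """
    weight_sum = defaultdict(float)
    n_deaths = defaultdict(int)
    for year, month, malaria_weight in deaths:
        weight_sum[(year, month)] += malaria_weight
        n_deaths[(year, month)] += 1
    # Share per (year, month) first, then average each calendar month over years.
    shares_by_month = defaultdict(list)
    for (year, month), n in n_deaths.items():
        shares_by_month[month].append(weight_sum[(year, month)] / n)
    return {month: mean(shares) for month, shares in sorted(shares_by_month.items())}


if __name__ == "__main__":
    toy = [(1998, 8, 1), (1998, 8, 0), (1999, 8, 1),
           (1998, 2, 0), (1999, 2, 1), (1999, 2, 0), (1999, 2, 0)]
    print(monthly_malaria_share(toy))
```

In the toy data, August averages 50% (1998) and 100% (1999) to give 75%. Year-months without any recorded death are skipped rather than counted as zero.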
Physician-coded verbal autopsy has been used to obtain CODs for decades, and only recently has InterVA been applied at different sites of the INDEPTH network. A recent review concluded that InterVA represents an effective way forward for standardised interpretation of VA data (Byass et al. 2010; Kinyanjui & Timæus 2010). The aim of our study was to apply the InterVA model to a French-language questionnaire and to compare InterVA results with those obtained by physician coding in a holoendemic malaria setting. Seven of the 10 main CODs matched in both approaches, covering 66.6% of all physician CODs and 60.1% of the InterVA CODs. The estimated concordance between PCVA and InterVA was 62.5%, roughly midway within the range reported in other studies (40–80%) (Byass et al. 2003, 2006; Fantahun et al. 2006; Oti & Kyobutungi 2010), although these studies used different comparison methods.
Regarding overall and age-specific distributions, physicians identified malaria as COD more frequently than did the InterVA model. Physicians are generally thought to overdiagnose malaria in highly endemic malaria areas (Chandler et al. 2008; Gwer et al. 2007), just as HIV/AIDS tends to be overdiagnosed in HIV-endemic areas (Anglewicz & Kohler 2009). This might be especially true where questionnaire information is sparse, and VA has generally been shown to be a poor diagnostic tool for malaria (Snow et al. 1992). In contrast, InterVA most frequently identified diarrhoea and pneumonia/sepsis, followed by malaria (diagnosed only one-third as frequently as by PCVA) and meningitis (diagnosed four times as frequently as by PCVA). These results are not entirely surprising, as meningitis and malaria overlap considerably in clinical symptoms and signs (Kallander et al. 2004). Burkina Faso is one of the core countries of the meningitis belt, where more than 90% of all meningitis cases occur during calendar weeks 1–20 (Tall et al. 2012). As malaria is endemic throughout the year, both diseases are present during the meningitis season, with potential for misdiagnosis. Moreover, from a public health perspective, control and prevention of either disease cannot be considered without regard to the other, and treating for malaria appears acceptable even when the diagnosis may be a false positive (White 2009). Nevertheless, it is important to be able to distinguish between the conditions in order to evaluate the effectiveness of control programs.
InterVA showed a decreasing trend in malaria over the study period, slightly more pronounced than that shown by PCVA (Figure 4). A strength of the model approach is that it follows consistent rules in categorising malaria-related CODs. This may not hold for PCVA, where multiple physicians are involved in the coding process over the years. Several arguments support some of the InterVA malaria patterns: (i) the ratio of deaths in the wet versus the dry season observed previously (Greenwood et al. 1987), (ii) the decreasing proportion of malaria in older age groups (Lopez et al. 2006) and (iii) high levels of malaria in adult women (but not adult men) (Lemma et al. 2010). However, studies examining malaria in adults are scarce, as malaria poses a greater threat to children under 5 years of age.
The InterVA indicator ‘any diagnosis of malaria’ is the only indicator that uniquely captures malaria. Few people in the Nouna region access health care for their ailments, and many die without having consulted a physician who might have diagnosed malaria. Symptoms such as ‘acute fever’ are not malaria-specific, as they are present in many other diseases as well (Delley et al. 2000; Nwuzo et al. 2009). Thus, for our analyses, we decided to set this indicator when acute fever was present and the deceased had taken chloroquine sulphate, the medication most commonly prescribed against malaria in the Nouna region between 1998 and 2007. Hypothesising that physicians’ malaria diagnoses may be heavily influenced by any presence of fever, we set the malaria indicator in all cases with fever in a subsequent analysis (dotted line, Figure 3). This resulted in a higher malaria proportion (21.6% vs. 11.1% of all CODs) and a higher overall concordance between the methods (70.2% vs. 63.0%), supporting this hypothesis.
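The two indicator rules described above, the main rule and the fever-only sensitivity rule, can be written as a small function. The field names (`acute_fever`, `any_fever`, `took_chloroquine`) are hypothetical stand-ins for the corresponding Nouna questionnaire items, and the function only sets the single InterVA indicator ‘any diagnosis of malaria’.

```python
def any_diagnosis_of_malaria(record, sensitivity=False):
    """Set the InterVA indicator 'any diagnosis of malaria' for one death.

    Main analysis: acute fever AND chloroquine taken before death.
    Sensitivity analysis: any fever at all.
    Field names are HYPOTHETICAL stand-ins for questionnaire items.
    """
    if sensitivity:
        return bool(record.get("any_fever", False))
    return bool(record.get("acute_fever", False) and record.get("took_chloroquine", False))


if __name__ == "__main__":
    toy_records = [
        {"acute_fever": True, "any_fever": True, "took_chloroquine": True},
        {"acute_fever": True, "any_fever": True, "took_chloroquine": False},
        {"acute_fever": False, "any_fever": True, "took_chloroquine": False},
        {"acute_fever": False, "any_fever": False, "took_chloroquine": True},
    ]
    main_rule = sum(any_diagnosis_of_malaria(r) for r in toy_records)
    fever_rule = sum(any_diagnosis_of_malaria(r, sensitivity=True) for r in toy_records)
    print(main_rule, fever_rule)
```

On the four toy records, the main rule sets the indicator once and the fever-only rule three times, mirroring in miniature how the broader rule raised the malaria proportion.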
It appears plausible that malaria strongly influences the concordance rates in all age groups except those between 15 and 64 years of age. The high concordance in the 15–49 age group might arise because both methods capture the main characteristics of the circumstances leading to death: here, the proportions of diarrhoea, meningitis, pneumonia/sepsis, malnutrition and external CODs show similar patterns. This holds true for malaria in males, too, but not in females; in females, however, this is compensated by the similar proportion of maternity-related CODs. The concordance in the age group below 4 weeks requires closer examination: both methods agreed on the percentage of preterm births, but the model found a higher percentage of intrapartum deaths (mainly identified as asphyxia) than PCVA (mainly declared as stillbirths). Here, stillbirth and asphyxia were grouped into intrapartum death, following a series of papers in the International Journal of Gynecology and Obstetrics (Lawn et al. 2009) and reflecting the difficulty of defining a COD exactly on the basis of VA data for this age group. Discrepancies were observed for meningitis, diarrhoea, pneumonia/sepsis and malaria, the last of which is by definition not a possible COD of the model for neonates. The biggest gap was found in the ‘indeterminate’ category: the model could not allocate any COD for 12 cases (5.5%). The remaining 14.5% of indeterminates in the CSMFs result from adding to this category the residual percentage of cases whose COD likelihoods do not sum to 100% (see Methods; Fottrell et al. 2007). In contrast, physicians did not agree on a final COD in 7.3% of the 218 deaths, and for another 34.8%, the final COD agreed upon was ‘unspecific neonatal death’, which likewise represents an indeterminate. However, these results are based on only a small group of neonatal deaths.
The seasonal pattern for meningitis in PCVA and InterVA had a similar shape during the dry season but diverged during the wet season, when meningitis deaths are less likely (Hart & Cuevas 2009). We suspect that InterVA overestimates meningitis deaths because indicators that strongly influence the posterior probability of meningitis (e.g. stiff neck, convulsions, rigidity) could not be adequately allocated. In general, the quality of the model’s results depends on how well the VA questionnaire captures the information necessary for setting the InterVA indicators. Some of the indicators included in the probabilistic model were not available in the data from the Nouna VA questionnaire (e.g. transport, smoking, suicide, alcohol consumption, poisoning, excessive food intake and indicators beginning with ‘any diagnosis of’). The latter remain a problem in regions where the population does not regularly attend clinics. However, the wording ‘diagnosis’ is intended to distinguish facts from guesses by the interviewed relative.
In some instances, multiple indicators are covered in the Nouna questionnaire by a single dichotomous question, which complicates their allocation to the InterVA indicators. These mismatches likely affected the model’s performance, although they would also have affected physicians’ judgement. However, little is known about how questionnaire design and data completeness affect the reliability and utility of VA data, whichever method of interpretation is used (Fottrell et al. 2010). Additionally, free text might provide an opportunity to describe conditions not catered for in the structured questions. Free-text information was not included in the InterVA model here, as it was not electronically available. In separate analyses, we considered the role of narrative free text in discrepancies between physician coding and InterVA regarding the determination of malaria as COD. In a representative subsample, the concordance rate between the two methods increased by about 2%, with a similar pattern for the CSMFs (Rankin et al. 2012).
InterVA and PCVA share the limitation of a relatively long time between date of death and date of interview. The longer the time since death, the greater the expected loss of information for both methods. In our study, around 60% of the interviews were conducted within 1 year of death. However, no differences were observed in the number of indicators set for the model when stratifying by time period (data not shown). Further discrepancies in COD coding might have been due to the Nouna questionnaire’s particular focus on motherhood/childhood, with reduced emphasis on non-communicable diseases and HIV/AIDS, which was also reflected in the respective CSMFs. However, HIV/AIDS is unlikely to be a major source of discrepancy, as it represents only 3.7% and 1.1% of all CODs for PCVA and InterVA, respectively (Figure 1) (Tollman et al. 2008).
HIV’s relative importance is higher in the 15–49 age group, with a much higher proportion in the CSMF distribution for PCVA (men, 16.5%; women, 18.4%) than for InterVA (men, 4.7%; women, 5.3%). Here, we suspect that some HIV/AIDS deaths might be contained within the 6.5% diagnosed as tuberculosis by InterVA, as shown in other settings (Tensou et al. 2010).
Byass et al. (2010) grouped 250 CODs used by physicians into categories comparable with the 35 COD codes used by InterVA. To compare InterVA with PCVA, it was likewise necessary to group CODs into 24 common categories. Such a categorisation provides only a broad picture of the full range of CODs, but it is nonetheless useful for representing major COD burdens that offer opportunities for intervention. Consolidating causes, with multiple related causes falling into a single category, is consistent with the philosophy of focusing on cause definitions and broad care needs of public health importance rather than on traditional clinical and pathological approaches (Fottrell et al. 2010).
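Harmonising the detailed physician list with the common categories amounts to a crosswalk from each detailed label to one category. The sketch below is a hypothetical fragment (the study’s actual 148-to-24 mapping is not reproduced here), using the grouping of stillbirth and asphyxia into intrapartum death described above as one entry. It raises an error for unmapped labels rather than silently assigning them, so gaps in the crosswalk surface before CSMFs are computed.

```python
from collections import Counter

# HYPOTHETICAL fragment of a crosswalk from detailed physician labels to common
# categories; not the study's actual 148-to-24 mapping.
PHYSICIAN_TO_COMMON = {
    "stillbirth": "intrapartum death",
    "birth asphyxia": "intrapartum death",
    "cerebral malaria": "malaria",
    "pulmonary tuberculosis": "tuberculosis",
}


def to_common_categories(causes, crosswalk):
    """Count deaths per common category, refusing labels the crosswalk does not cover."""
    unmapped = sorted({c for c in causes if c not in crosswalk})
    if unmapped:
        raise ValueError(f"no common category for: {unmapped}")
    return Counter(crosswalk[c] for c in causes)


if __name__ == "__main__":
    print(to_common_categories(
        ["stillbirth", "birth asphyxia", "cerebral malaria"], PHYSICIAN_TO_COMMON))
```

The same function would apply to the InterVA side with a second, equally hypothetical crosswalk from the 35 model causes to the common categories.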
InterVA provides the possibility of identifying outcomes of major health concern both for routine surveillance and for focused research. However, our results confirm that national mortality statistics, which are partly based on VAs, must be carefully interpreted. InterVA did not replicate physicians’ coding in this setting; but in the absence of a gold standard, one might ask whether replication should be the aim. Difficulties in determining malaria as COD in holoendemic malaria regions, a problem relevant for both physicians and mathematical models, might result in higher discrepancies than in non-endemic areas. This question might best be addressed by a study in a malaria-endemic setting that compares the outputs of both methods with clinical data as a gold standard. Standardising VA questionnaires and continuously improving InterVA would lead to better comparability across HDSS sites and countries (Vergnano et al. 2011).
Sources of financial support: Collaborative research grant ‘SFB 544’ of the German Research Foundation. INDEPTH network supported data entry for VA questionnaires. EF and PB were supported by the Swedish Council for Working Life and Social Research. The authors would like to extend their appreciation to the team of the Centre de Recherche en Santé de Nouna and to the children and families who participated in the various studies.