Accounting for misclassified and unknown cause of death data in vital registration systems for estimating trends in HIV mortality

Abstract
Introduction: Misclassification of HIV deaths can substantially diminish the usefulness of cause of death data for decision‐making. In this study, we describe the methods developed by the Global Burden of Disease Study to account for misclassified cause of death data from vital registration systems when estimating HIV mortality in 132 countries and territories.
Methods: The cause of death data were obtained from the World Health Organization Mortality Database and official country‐specific mortality databases. We implemented two steps to adjust the raw cause of death data: (1) redistributing garbage codes to underlying causes of death, including HIV/AIDS, by applying methods such as analysis of multiple cause data and proportional redistribution, and (2) reassigning HIV deaths misclassified as other causes to HIV/AIDS by examining the age patterns of underlying causes in locations and years with and without HIV epidemics.
Results: In 132 countries, during the period from 1990 to 2018, 1,848,761 deaths were reported as caused by HIV/AIDS. After garbage code redistribution in these 132 countries, this number increased to 4,165,015 deaths. An additional 1,944,291 deaths were added through correction of HIV deaths misclassified as other causes in 44 countries. The proportion of HIV deaths derived from garbage code redistribution decreased over time, from 0.4 in 1990 to 0.1 in 2018. The proportion of deaths derived from HIV misclassification correction peaked at 0.4 in 2006 and declined afterwards to 0.08 in 2018. The greatest contributors to garbage code redistribution were “immunodeficiency antibody” (ICD 9: 279‐279.1; ICD 10: D80‐D80.9) and “immunodeficiency other” (ICD 9: 279, 279.5‐279.9; ICD 10: D83‐D84.9, D89, D89.8‐D89.9), which together contributed 77% of all redistributed deaths at their peak in 1995. Respiratory tuberculosis (ICD 9: 010–012.9; ICD 10: A10‐A14, A15‐A16.9) contributed the greatest proportion of all HIV misclassified deaths (25–62% per year) over the most years.
Conclusions: Correcting for miscoding and misclassification of cause of death data can enhance the utility of the data for analyzing trends in HIV mortality and tracking progress toward the Sustainable Development Goal targets.


INTRODUCTION
Accurate measurement of mortality and causes of death is an essential basis of policy planning and prioritizing interventions. To facilitate systematic recording and comparison of morbidity and mortality data, the International Classification of Diseases (ICD) is used globally as the standard diagnostic classification tool [1]. The ICD is periodically revised to reflect new scientific knowledge of health and disease [2]. HIV/AIDS is a relatively recent addition to the ICD with changes in coding between the 9th and 10th revisions of the ICD [1,3]. However, despite the availability of ICD coding rules on systematic selection of the underlying cause of death [1,4], misclassification of deaths in ICD-coded vital registration data is common [5]. For instance, ICD codes for immediate causes of death (e.g. respiratory failure and cardiac arrest) or intermediate causes of death (e.g. sepsis and heart failure) are frequently classified as the underlying cause of death. ICD codes for immediate causes, intermediate causes or ill-defined causes of death (e.g. sequelae of unspecified infectious and parasitic diseases) are often referred to as "garbage codes" [5] as they do not represent the underlying cause of death that triggered the sequence of events leading to death. The use of garbage codes varies substantially across countries and over time, resulting in incomparability of the cause-of-death data. Certification and coding of certain causes, such as HIV/AIDS, is especially challenging because of multiple factors, including the substantial stigma associated with the disease, confidentiality-related concerns and the similarity of numerous signs and symptoms of HIV/AIDS with those of other diseases [6]. Additionally, people with HIV infection are vulnerable to opportunistic infections (e.g. cryptococcal meningitis and cerebral toxoplasmosis) [7], co-infections (e.g. HIV-tuberculosis co-infection and HIV-hepatitis B virus co-infection) [8], certain malignant neoplasms (e.g. Kaposi sarcoma and non-Hodgkin lymphoma) [9] and other comorbid conditions (e.g. endocrine disorders) [10], and these conditions are often incorrectly assigned as the underlying cause of death [11]. Four types of misclassification of HIV deaths can occur: (1) incorrectly assigning intermediate causes or ill-defined causes as the underlying cause of death; (2) assignment of HIV deaths to relevant garbage codes, such as unspecified immunodeficiency; (3) allocation of HIV deaths to diseases that can mimic HIV infection (e.g. inflammatory bowel disease and some skin diseases); and (4) misassigning HIV deaths to other underlying causes of death, such as tuberculosis and meningitis.
To enable a meaningful comparison of cause-of-death data, it is essential to reallocate deaths assigned to garbage codes or misclassified as other causes to the target underlying causes. Attaining the Sustainable Development Goal target 3 to end the HIV/AIDS epidemic by 2030 requires a reduction in new HIV infections and HIV/AIDS-related deaths by 90% between 2010 and 2030 [12,13]. Robust and valid data on causes of death are thus crucial for measuring progress towards this target and for tracking the success of programs aimed at reducing HIV/AIDS mortality. A comprehensive framework for adjusting vital registration data to enhance the accuracy and comparability of cause-of-death data for the Global Burden of Disease study (GBD) has been published elsewhere [14]. The objective of this paper is to describe the methods used to account for misclassified cause of death data in 132 countries and territories when estimating trends in HIV mortality.

Overview
The GBD provides a standardized approach to addressing the challenges of measuring causes of death, including variability in the completeness of vital registration data, deaths attributed to garbage codes and misclassification of HIV deaths [11,14,15]. The GBD geographic hierarchy comprises 204 countries and territories grouped within 21 regions (based on epidemiological commonality and geographic closeness) and seven super-regions (groupings of GBD regions based on cause-of-death patterns) (Appendix S1). GBD 2019 evaluated the overall quality of the cause-of-death data from each country based on factors, including completeness, garbage coding and time periods covered, and gave a quality rating of 0 stars (the poorest) to 5 stars (the best) (Appendix S1) [11]. The detailed methods used to assess the quality of the data and adjust vital registration completeness have been published elsewhere [11,15]. Here, we describe the methods used for correcting misclassified cause of death data from vital registration systems in 132 countries and territories for estimating HIV mortality.

Input data
Every year, member states report to the World Health Organization (WHO) the number of deaths by sex, age group and cause of death registered through national civil registration. WHO harmonizes these data in the WHO Mortality Database and publicly releases the data for research purposes. Additional cause-of-death data were obtained from official country-specific mortality databases. Country-specific data sources are provided in Appendix S1. We adjusted all the raw data, in any format, for garbage coding and HIV misclassification by (1) redistributing garbage codes to underlying causes of death, including HIV/AIDS, and (2) reassigning misclassified HIV deaths to HIV. These adjustments were implemented by age, sex, year and location using the methods described below.

Redistribution of HIV/AIDS-related garbage codes
To determine the proportion of HIV/AIDS-related garbage codes (Table 1) to be reassigned to HIV/AIDS and other underlying causes of death, we first generated target proportions for each garbage group by 5-year time interval and sex for the following age categories: under 1 month, 1-59 months, 5-19 years, 20-49 years, 50-59 years, 60-69 years, 70-79 years and 80+ years. The assignment of deaths to HIV or other underlying causes is based on the level of regional increase in the mortality rate for ICD codes shown in Table 1 in each group relative to the rates observed during the period from 1980 to 1984. We assumed that an increase of more than 5% is HIV/AIDS-related, and reassigned the proportion of those excess deaths over 5% to HIV/AIDS. For an increase of 5% or less, the excess deaths were assigned to other underlying causes of death. The 5% cut-off was arbitrarily chosen based on available data on the pattern of age-specific mortality rates.
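The threshold rule above can be illustrated with a minimal sketch. It reflects one reading of the text, not the GBD implementation: the function name, its arguments and the treatment of the 5% band are assumptions made for illustration, and the rates in the test values are invented.

```python
def split_excess_deaths(baseline_rate, current_rate, population, threshold=0.05):
    """Split excess garbage-coded deaths between HIV/AIDS and other causes.

    baseline_rate: regional mortality rate for the garbage group in 1980-1984.
    current_rate:  rate for the same group in the period being adjusted.
    Assumption (one reading of the text): only the part of the excess rate
    above baseline * (1 + threshold) goes to HIV/AIDS; the rest of the
    excess goes to other underlying causes.
    """
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")

    relative_increase = (current_rate - baseline_rate) / baseline_rate
    excess_rate = max(current_rate - baseline_rate, 0.0)

    if relative_increase > threshold:
        # Portion of the increase beyond the 5% band is treated as HIV-related.
        hiv_rate = current_rate - baseline_rate * (1.0 + threshold)
    else:
        # An increase of 5% or less is assigned entirely to other causes.
        hiv_rate = 0.0

    other_rate = excess_rate - hiv_rate
    return {
        "hiv_deaths": hiv_rate * population,
        "other_deaths": other_rate * population,
    }
```

In practice such a rule would be applied separately for each garbage group, sex, age category and 5-year interval, as described above.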

Redistribution of other garbage codes
We used two methods to redistribute other garbage codes (Appendix S1) to HIV/AIDS. For ill-defined causes (Appendix S1), proportional redistribution was used. We generated the proportions based on the distribution of the target ICD codes in the data by age, sex, location and year. These target codes were defined based on multiple cause data, pathology and evidence from the literature. Deaths from the ill-defined causes were then split proportionally over all target underlying causes, including HIV/AIDS. For HIV deaths assigned to intermediate causes, such as sepsis, we determined the fraction of deaths due to HIV/AIDS based on our analysis of the multiple causes of death data. Multiple causes of death data include a combination of an underlying cause of death and other causes, such as immediate and intermediate causes, that were included in the series of events leading to death. Analyzing multiple causes of death data can provide insight into identifying the true underlying cause of death in data from other sources where the underlying cause is incorrectly assigned to a garbage code. Multiple causes of death data from the United States, Mexico, Brazil, Taiwan, Italy and Colombia were used to inform the redistribution of the following intermediate causes to HIV/AIDS and other target underlying causes: acute renal failure; acute respiratory failure; cachexia; chronic respiratory failure; empyema; fluid, electrolyte and acid-base disorders; hepatic failure; unspecified central nervous system disorders; osteomyelitis; peritonitis; pneumonitis; pulmonary embolism; sepsis (excluding maternal and neonatal sepsis); and shock, cardiac arrest and coma. 
We first ran a generalized linear model to estimate the fraction of deaths related to intermediate causes for each underlying cause as a function of covariates (Appendix S1). Below is an example for sepsis, where $a$, $s$, $l$, $y$ and $c$ represent a given age group, sex, location, year and underlying cause of death:

$$\text{sepsis deaths}_{a,s,l,y,c} = \text{sepsis fraction}_{a,s,l,y,c} \times \text{cause-specific deaths}_{a,s,l,y,c}$$

$$\text{total sepsis deaths}_{a,s,l,y} = \sum_{c} \text{sepsis deaths}_{a,s,l,y,c}$$

$$\text{fraction of sepsis to redistribute}_{a,s,l,y} = \frac{\text{sepsis deaths}_{a,s,l,y,c}}{\text{total sepsis deaths}_{a,s,l,y}}$$

where sepsis fraction is the proportion of deaths related to sepsis and cause-specific deaths is GBD 2019 estimated cause-specific deaths. In the model, $\beta_0$ is the global intercept, HAQ is the Healthcare Access and Quality Index (a measure on a scale of 0-100 created based on 32 causes for which mortality is amenable to healthcare) [16], sex is an indicator variable on sex, and $Y_{\text{cause}}$ is the random effect on the underlying cause of death, such as HIV. Separate generalized linear models were run for each age group. Due to the limited availability of data, we applied the country-specific fractions from this analysis to corresponding super-regions, and used global fractions for sub-Saharan Africa.
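The sepsis example can be traced numerically with a short sketch for a single age-sex-location-year cell. The function names, cause labels and all numbers are hypothetical; the model-estimated sepsis fractions are simply taken as given inputs here, since the fitting step is not reproduced.

```python
def sepsis_redistribution_fractions(sepsis_fraction, cause_deaths):
    """Return, per underlying cause, the share of sepsis-coded deaths it receives.

    sepsis_fraction: dict cause -> estimated proportion of that cause's deaths
                     that involve sepsis (taken as given from the model).
    cause_deaths:    dict cause -> estimated cause-specific deaths.
    Both dicts describe one age-sex-location-year cell.
    """
    # sepsis deaths_c = sepsis fraction_c * cause-specific deaths_c
    sepsis_deaths = {c: sepsis_fraction[c] * cause_deaths[c] for c in cause_deaths}
    # total sepsis deaths = sum over causes
    total = sum(sepsis_deaths.values())
    if total == 0:
        return {c: 0.0 for c in sepsis_deaths}
    # fraction to redistribute to cause c = sepsis deaths_c / total sepsis deaths
    return {c: d / total for c, d in sepsis_deaths.items()}


def redistribute(garbage_coded_deaths, fractions):
    """Allocate deaths coded to sepsis across target causes by their fractions."""
    return {c: garbage_coded_deaths * f for c, f in fractions.items()}


# Hypothetical inputs for one cell (illustrative values only).
fractions = sepsis_redistribution_fractions(
    sepsis_fraction={"HIV/AIDS": 0.10, "pneumonia": 0.05, "diabetes": 0.02},
    cause_deaths={"HIV/AIDS": 200, "pneumonia": 600, "diabetes": 2500},
)
allocated = redistribute(40, fractions)
```

The same proportional allocation step applies to any intermediate cause once its cause-specific fractions are available.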

HIV/AIDS misclassification correction
To correct for HIV deaths that were incorrectly assigned to other underlying causes (Appendix S1), we implemented the following steps: (1) examine the age patterns of underlying causes in locations and years with and without HIV epidemics and isolate the causes with age-pattern shifts during the epidemic years; (2) compute the expected deaths by location, year and sex for each underlying cause with an age-pattern shift; (3) attribute expected deaths to the corresponding underlying cause; and (4) compute the difference between observed and expected deaths and reallocate the difference to HIV/AIDS [17].
To identify the age pattern in years without HIV epidemics, we generated a global standard relative mortality age pattern based on vital registration data from countries where HIV prevalence was less than 1% using the following equation:

$$RR_{asc} = \frac{R_{asc}}{\bar{x}(R_{65sc}, R_{70sc}, R_{75sc})}$$

where $RR_{asc}$ is the relative death rate for age group $a$, sex $s$ and cause $c$; $R_{asc}$ is the death rate for that age-sex-cause group; and $\bar{x}(R_{65sc}, R_{70sc}, R_{75sc})$ is the average mortality rate in the 65-69, 70-74 and 75-79 age groups for that sex and cause.
Expected deaths for an identified underlying cause were computed using the following:

$$ED_{lyasc} = \bar{x}(R_{ly65sc}, R_{ly70sc}, R_{ly75sc}) \times RR_{asc} \times p_{lasc}$$

where $ED_{lyasc}$ are expected deaths for location $l$, year $y$, age group $a$, sex $s$ and cause $c$; $\bar{x}(R_{ly65sc}, R_{ly70sc}, R_{ly75sc})$ is the observed average cause-specific mortality rate for the 65-69, 70-74 and 75-79 age groups for that location, year, sex and cause; $p_{lasc}$ is the population for that location, year, age, sex and cause; and $RR_{asc}$ is the global standard relative mortality rate determined during the preceding step.
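The correction steps can be sketched for a single cause, sex, location and year as follows. This is an illustrative sketch under stated assumptions, not the GBD code: the function names and example rates are hypothetical, age groups are keyed by their starting age, and setting negative differences to zero is an assumption that the text does not state.

```python
from statistics import mean

# Reference age groups (65-69, 70-74, 75-79), keyed by starting age.
REFERENCE_AGES = (65, 70, 75)


def relative_rates(standard_rates):
    """Global standard relative age pattern: rate / mean rate at ages 65-79."""
    reference = mean(standard_rates[a] for a in REFERENCE_AGES)
    return {age: rate / reference for age, rate in standard_rates.items()}


def expected_deaths(observed_rates, population, rr):
    """Expected deaths = observed mean rate at ages 65-79 * RR * population."""
    reference = mean(observed_rates[a] for a in REFERENCE_AGES)
    return {age: reference * rr[age] * population[age] for age in population}


def reallocate_to_hiv(observed_deaths, expected):
    """Split observed deaths into those kept by the cause and those moved to HIV.

    Assumption: when observed deaths fall below expected deaths, nothing is
    moved (the difference is clipped at zero).
    """
    moved = {a: max(observed_deaths[a] - expected[a], 0.0) for a in observed_deaths}
    kept = {a: observed_deaths[a] - moved[a] for a in observed_deaths}
    return kept, moved


# Hypothetical rates per person-year for one cause and sex.
rr = relative_rates({30: 0.0002, 65: 0.001, 70: 0.002, 75: 0.003})
expected = expected_deaths(
    observed_rates={30: 0.0012, 65: 0.002, 70: 0.004, 75: 0.006},
    population={30: 100_000},
    rr=rr,
)
kept, moved = reallocate_to_hiv({30: 120.0}, expected)
```

In this invented example, the 30-34 age group has three times as many observed deaths as the standard age pattern would predict, so the surplus is reallocated to HIV/AIDS.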

Overall data description
Data on HIV deaths from 132 countries were subject to garbage code redistribution (Appendix S1). During redistribution, garbage codes that contributed deaths to HIV included data from all age groups, and 29 years by sex, yielding 646,528 unique records. During the period from 1990 to 2018, 1,848,761 deaths were reported as caused by HIV. After redistribution, this number increased to 4,165,015. Countries from seven super-regions had redistributed deaths; the high-income super-region (n = 35) had the most countries with redistributed data, while South Asia had the fewest (n = 1). Results adjusted for vital registration completeness are available in Appendix S1. Countries with good quality vital registration systems were not substantially affected by accounting for completeness.
HIV misclassification correction was more restricted than garbage code redistribution. The causes subject to HIV misclassification are shown in Appendix S1. These misclassified HIV deaths were located in 44 countries, primarily in central Europe, eastern Europe and central Asia (n = 16) and Latin America and the Caribbean (n = 16). Misclassification affected age groups up to age 65 and both sexes, for a total of 88,104 unique records. A total of 1,944,291 deaths were added through misclassification correction (Table 2). The proportion of deaths derived from HIV misclassification peaked at 0.4 in 2006 and gradually declined afterwards. Recent years had fewer countries with data, which tended to be of higher quality, leading to lower proportions contributed by both redistribution and HIV correction in 2017 and 2018.
The greatest contributors to garbage code redistribution were "immunodeficiency antibody" and "immunodeficiency other," whereas the largest contributor to misclassified HIV deaths was "respiratory tuberculosis" over the most years (Figure 1).
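The pooled totals reported in this section can be decomposed with a few lines of arithmetic. The sketch below assumes, as the text indicates, that the misclassification-corrected deaths are added on top of the post-redistribution count. The resulting shares are pooled over 1990 to 2018 and are therefore not comparable to the year-specific proportions quoted above; the function and key names are illustrative.

```python
REPORTED = 1_848_761                    # deaths reported as HIV, 1990-2018
AFTER_REDISTRIBUTION = 4_165_015        # after garbage code redistribution
ADDED_BY_MISCLASSIFICATION = 1_944_291  # added by misclassification correction


def decompose(reported, after_redistribution, added_by_misclassification):
    """Break final HIV deaths into reported, redistributed and corrected parts."""
    added_by_redistribution = after_redistribution - reported
    final = after_redistribution + added_by_misclassification
    return {
        "final": final,
        "added_by_redistribution": added_by_redistribution,
        "share_reported": reported / final,
        "share_redistribution": added_by_redistribution / final,
        "share_misclassification": added_by_misclassification / final,
    }


totals = decompose(REPORTED, AFTER_REDISTRIBUTION, ADDED_BY_MISCLASSIFICATION)
for name, value in totals.items():
    print(name, value)
```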

Sources of misclassification and redistribution
The number of countries with deaths from misclassification correction and redistribution, and contributing packages and causes, varied over time (Figure 2). The "mycobacterial skin infection" garbage code package contributed deaths in fewer than 10 country-years in any given 5-year period, while the "urogenital candidiasis" package contributed deaths to 15 country-years between 1990 and 1994, but only one country-year in 2010-2014. In contrast, only 10 causes contributed misclassified deaths in more than 10 country-years. Finally, "tuberculosis, any," representing any form of tuberculosis, contributed deaths in over 100 country-years in all 5-year periods.

Patterns in redistribution and HIV misclassification correction with illustrative examples
Fifty-seven countries had at least 1 year between 1990 and 2018 when redistributed deaths contributed 100% of final deaths. The "immunodeficiency, other" ICD garbage code package (Table 1)
HIV misclassification correction contributed between 8.5% and 39.7% of total deaths across all included countries, depending on the year, between 1990 and 2018. The deaths were mainly reassigned from "respiratory tuberculosis," which contributed between 25% and 62% of all corrected deaths. The impact of HIV misclassification correction was more evident in some countries. For example, in Russia, more than 80% of final deaths were from HIV misclassification until 2004. This adjustment contributed an average of 6530 deaths per year to Russia's total deaths during this time period, which were primarily from "respiratory tuberculosis." Misclassification correction was also common in Thailand, where it contributed between 34.7% and 79.3% of final deaths. This country had the most deaths from misclassification correction in most years prior to 1997; the adjustment contributed an average of 7087 deaths per year during this time. Similar to redistribution, South Africa then had the most deaths from misclassification correction between 1997 and 2016, with an average of 70,294 deaths per year from misclassification (Figure 3). Also mirroring trends overall, these deaths were mostly from "respiratory tuberculosis."

Effect of redistribution and HIV correction on age and sex distributions in HIV deaths
Redistribution and HIV misclassification correction altered the sex and age distributions of HIV/AIDS deaths significantly in some countries. The effect of redistribution and misclassification correction on changing deaths between sexes is seen clearly in Australia and Hungary (Figure 4). In raw data from Australia, females had between 0 and 15 deaths between 1990 and 2017 compared to 0 to 463 deaths for males during the same period. After redistribution and misclassification correction, the proportion of deaths in females relative to males increased in some years. For example, in 2010, 64 out of 71 raw deaths were in males (90%) and 7 out of 71 raw deaths were in females (10%); after redistribution and misclassification correction, 79 out of 98 deaths were in males (81%) and the remaining were in females (19%). Hungary displayed this pattern as well. In 2010, 10 out of 10 raw deaths were in males (100%) and none were in females (0%); after redistribution and misclassification correction, 56 out of 67 deaths were in males (84%) and the remaining were in females (16%) (Figure 4).
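The sex shares quoted for Australia and Hungary in 2010 follow directly from the death counts. A small sketch, with a hypothetical helper name, reproduces the rounded percentages:

```python
def sex_shares(male_deaths, total_deaths):
    """Return (male %, female %) of deaths, rounded to whole percentages."""
    if total_deaths <= 0:
        raise ValueError("total_deaths must be positive")
    female_deaths = total_deaths - male_deaths
    return (
        round(100 * male_deaths / total_deaths),
        round(100 * female_deaths / total_deaths),
    )


# Counts taken from the text (2010).
australia_raw = sex_shares(64, 71)       # before adjustment
australia_adjusted = sex_shares(79, 98)  # after adjustment
hungary_raw = sex_shares(10, 10)
hungary_adjusted = sex_shares(56, 67)
```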
Changes in age distribution were less pronounced and less common than changes in sex distribution. Reported versus adjusted deaths due to HIV in Russia illustrate the change in age distribution. Garbage code redistribution and HIV misclassification correction together shifted peak deaths in 1995 to older age groups, peaking only in 40- to 44-year-olds instead of having a second, smaller peak in 15- to 24-year-olds. The shift to older ages was a pattern in Russia; these adjusted data were more consistent with the reported data in later years after 2009 (Figure 5).

Effect of redistribution and HIV correction on time distributions in HIV deaths
More than sex and age, the time distributions were significantly altered by redistribution and/or misclassification, mirroring the distribution of underlying garbage codes and causes that contributed deaths. In several high-income countries, the peak in HIV/AIDS deaths was shifted earlier from the mid-1990s. This is seen in France's pre- and post-adjustment death distributions, driven entirely by redistribution (Figure 6). Counter to expected trends, zero deaths were reported prior to 2000. Redistribution increased the mortality rate in the 1990s, primarily redistributed from the "immunodeficiency" garbage codes. In addition to the skewness of the death distribution, redistribution and misclassification also affected the shape of the mortality curve. This was evident in the Philippines, where the distribution of deaths was flattened over a longer period (Figure 6). The reported deaths suggested a sharp increase in HIV deaths between 2010 and 2017. The redistribution adjustment, driven by "non-maternal and neonatal sepsis" and "shock, cardiac arrest and coma," flattened the slope considerably. Misclassified deaths further flattened this distribution (Figure 6).

DISCUSSION
In this study, we have demonstrated the algorithms used to correct for garbage coding and HIV misclassification. These methods can be applied to country-specific vital registration data to enhance the comparability of data within a country, between countries and over time. Our results showed that taking into account garbage coding and potential HIV misclassification could substantially enhance our understanding of the actual level and trend of country-specific HIV-related mortality.
Our results showed that the proportion of HIV deaths derived from redistribution of garbage codes has declined over time, indicating improvements in cause of death certification and coding. Yet, there is considerable variation in the fraction of deaths assigned to garbage codes across countries globally. With recent investments and initiatives, more training opportunities have become available in low- and middle-income countries. As a part of the Bloomberg Philanthropies Data for Health Initiative Project [18], an evaluation of a strategy called "training of trainers" showed a decrease in incorrect completion of medical certificates of 28% in Sri Lanka and 40% in the Philippines [19]. The same study, evaluating a different training strategy in Peru, found a decrease in incorrect medical certificates of 43% after training of physicians on an online certification system and on how to complete the certificate of cause of death [19].
Misclassification of HIV deaths as other underlying causes of death is common, especially in non-high-income countries. Consistent   Thailand estimated that about two-thirds of HIV deaths in the death registry during 1996-2009 did not record HIV as the underlying cause of death [20]. This finding is in line with our results, which showed that during the same period, garbage code redistribution and HIV misclassification correction together contributed between 49.9% and 96.2% of HIV deaths in Thailand. According to studies conducted in South Africa in 2000 and 2003-2004, HIV/AIDS was not recorded as the underlying cause of death for 61-73% of HIV/AIDS deaths [22,23]. Bradshaw and colleagues estimated that 93% of HIV deaths were incorrectly assigned to other causes during the period from 1997 to 2010 in South Africa [24]. Consistently, our results showed that between 89.4% (in 1997) and 94.6% (in 2003) of HIV-related deaths in South Africa were incorrectly assigned to garbage codes or other causes during the same period.
This study has some limitations. The quality of vital registration data varies across countries but we did not weight the data by data quality when calculating the global relative death rates. Additionally, multiple causes of death data were available only for a subset of countries. Due to the limited availability of data, we used the global redistribution proportions for sub-Saharan Africa. Availability of additional multiple causes of death data will help to produce more accurate location-specific redistribution proportions. In this study, we followed the ICD rule and considered certain infectious diseases associated with death, such as tuberculosis, among HIV-positive people as a direct consequence of HIV/AIDS. It is possible that tuberculosis is not always a direct consequence of HIV/AIDS; for example, an HIV-negative tuberculosis patient can be newly infected with HIV. As such data become available, they could be considered in future revisions of our methods. The cause of death data corrected for garbage coding and HIV misclassification (using the algorithms described in this paper) are used as inputs to the GBD modelling exercises, which produce results separately for overall HIV deaths and HIV-tuberculosis deaths to facilitate an understanding of the mortality burden due to HIV and tuberculosis comorbidity.

CONCLUSIONS
Monitoring and tracking countries' progress towards the global target of reducing HIV/AIDS-related deaths by 90% between 2010 and 2030 [12,13] requires reliable and valid HIV cause of death data. Whereas strengthening vital registration systems to accurately measure HIV-related mortality is a principal public health goal for every country, until such a goal is achieved, statistical methods, such as those presented in this paper, will be required to enhance the utility of available cause-of-death data.