Validity of ICD‐9 and ICD‐10 codes used to identify acute liver injury: A study in three European data sources

Abstract
Purpose: Validating cases of acute liver injury (ALI) in health care data sources is challenging. Previous validation studies reported low positive predictive values (PPVs).
Methods: Case validation was undertaken in a study conducted from 2009 to 2014 assessing the risk of ALI in antidepressant users in databases in Spain (EpiChron and SIDIAP) and the Danish National Health Registers. Three ALI definitions were evaluated: primary (specific hospital discharge codes), secondary (specific and nonspecific hospital discharge codes), and tertiary (specific and nonspecific hospital and outpatient codes). The validation included review of patient profiles (EpiChron and SIDIAP) and of clinical data from medical records (EpiChron and Denmark). ALI cases were confirmed when liver enzyme values met a definition by an international working group.
Results: Overall PPVs (95% CIs) for the study ALI definitions were, for the primary ALI definition, 84% (60%‐97%) (EpiChron), 60% (26%‐88%) (SIDIAP), and 74% (60%‐85%) (Denmark); for the secondary ALI definition, 65% (45%‐81%) (EpiChron), 40% (19%‐64%) (SIDIAP), and 70% (64%‐77%) (Denmark); and for the tertiary ALI definition, 25% (18%‐34%) (EpiChron), 8% (7%‐9%) (SIDIAP), and 47% (42%‐52%) (Denmark). The overall PPVs were higher for specific than for nonspecific codes and for hospital discharge than for outpatient codes. The nonspecific code “unspecified jaundice” had high PPVs in Denmark.
Conclusions: PPVs obtained apply to patients using antidepressants without preexisting liver disease or ALI risk factors. To maximize validity, studies on ALI should prioritize hospital specific discharge codes and should include hospital codes for unspecified jaundice. Case validation is required when ALI outpatient cases are considered.


| INTRODUCTION
Acute liver injury (ALI) is defined as a sudden appearance of liver test abnormalities and includes a broad spectrum of clinical scenarios, ranging from mild abnormal biochemical liver values to acute liver failure. 1,2 Previous validation studies have shown that identification of potential ALI events through diagnosis and procedural codes is challenging and that most validated algorithms have positive predictive values (PPVs) below 60%, [3][4][5] except in one study, which reported PPVs >75%. 6 All previous studies highlight the need for validation by medical record review when conducting studies of ALI based on automated health care data sources. This is especially important in drug safety studies, in which reliance on algorithms alone for automated case identification will most likely result in misclassification and overestimation of the true incidence of ALI and biased effect estimates.
As part of a recent post-authorization safety study (PASS) conducted in five European data sources investigating the potential risk of ALI associated with the use of agomelatine and nine other antidepressant drugs, 7 validation of the algorithms used to identify ALI cases was conducted. This was done via medical record review in three of those data sources: two Spanish health care databases and the Danish National Health Registers.

| METHODS
The objective of this study was to determine the ability of three ALI definitions to correctly identify ALI cases in three automated health care data sources. Specifically, we aimed to validate the following:
• An ALI definition including only main hospital discharge diagnosis specific codes
• An ALI definition including main hospital discharge diagnosis specific and nonspecific codes
• An ALI definition including main hospital discharge and also outpatient diagnosis codes (both specific and nonspecific)

| Study setting
Five automated health care databases were used in the agomelatine PASS. 7 Three of these were used to conduct a validation study: in Spain, the EpiChron Cohort Study from Aragon Health Sciences Institute (Aragón, Spain) 8 and the Information System for Research in Primary Care (SIDIAP) (Catalonia, Spain) 9; and in Denmark, the Danish National Health Registers (Denmark). 10,11 The main characteristics of each database are included in Supplementary eTable S1. Of the two databases that were not used, validation by review of medical records is not an option in the German Pharmacoepidemiological Research Database (GePaRD) (Germany) [12][13][14] and was not feasible within the study timeframe in the Swedish National Registers (Sweden). 15,16 Nevertheless, an external validation study was conducted in Germany, 17 the results of which will be presented in a separate publication.

| Identification and definition of ALI
Cases of ALI were identified in cohorts of new users of the 10 study antidepressants evaluated in the agomelatine PASS between 2009 and 2014 7: citalopram, agomelatine, fluoxetine, paroxetine, sertraline, escitalopram, duloxetine, venlafaxine, mirtazapine, and amitriptyline. Individuals aged 18 years or older at the date of their first recorded prescription fill of any of the study antidepressants during the study period(s) entered the cohort if they (a) had not received a prescription fill for the same study antidepressant within the prior 12 months (new users) and (b) had at least 12 months of continuous enrolment in the data source before the first prescription fill. Absence of pregnancy at the start date of antidepressant use was an additional inclusion criterion for women. Patients with a history of liver disease or risk factors for liver disease (eg, alcohol and drug abuse and dependence-related disorders), chronic biliary or pancreatic disease, malignancy, or other life-threatening conditions (eg, HIV infection) were excluded from the study cohort (Supplementary eMethods).
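The age, new-user, and enrolment criteria above can be sketched as a simple eligibility check. This is a minimal illustration, not the study's actual code: the function name, its parameters, and the approximation of "12 months" as 365 days are all assumptions, and the pregnancy and clinical exclusion criteria are not covered.

```python
from datetime import date, timedelta

LOOKBACK = timedelta(days=365)  # assumption: "12 months" approximated as 365 days


def is_eligible_new_user(birth_date, enrolment_start, index_fill, prior_fills_same_drug):
    """Return True if the age, new-user, and enrolment criteria are all met.

    index_fill: date of the first recorded fill of a study antidepressant.
    prior_fills_same_drug: dates of earlier fills of the same antidepressant.
    """
    # Age in completed years at the index fill.
    age = index_fill.year - birth_date.year - (
        (index_fill.month, index_fill.day) < (birth_date.month, birth_date.day)
    )
    if age < 18:
        return False
    # (a) New user: no fill of the same drug in the 12 months before the index fill.
    window_start = index_fill - LOOKBACK
    if any(window_start <= d < index_fill for d in prior_fills_same_drug):
        return False
    # (b) At least 12 months of continuous enrolment before the index fill.
    return enrolment_start <= window_start
```

The sketch treats enrolment as continuous from `enrolment_start` onward; a data source with enrolment gaps would need the gap dates as an additional input.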

KEY POINTS
• Case validation of acute liver injury (ALI) was conducted in two Spanish databases, EpiChron and SIDIAP, and in the Danish national registers.
• Validation of potential cases included patient profiles review and adjudication based on clinical data extracted from medical records.
• The overall PPVs obtained were higher for specific than for nonspecific codes and for hospital discharge than for outpatient codes.
• The nonspecific code "unspecified jaundice" had high PPVs for all ALI definitions in Denmark but not in the Spanish databases.
• To maximize validity, studies on ALI should prioritize hospital specific discharge codes.

Three algorithms corresponding to three ALI definitions were used in the agomelatine PASS to automatically identify potential ALI cases based on diagnosis codes (Table 1). 7,18 These definitions include combinations of codes that have shown higher (specific) or lower (nonspecific) PPVs in previous validation studies. [3][4][5][6] The primary ALI definition was defined as any patient with a specific main hospital discharge  Table 2). The primary ALI definition was not validated per se, but the specific codes identifying the primary ALI definition were included in the secondary ALI definition, which underwent validation. The algorithm used to identify potential cases of the secondary study ALI definition was defined as any patient with a main hospital specific or nonspecific discharge code (ICD-9-CM or ICD-10) for ALI. Finally, the algorithm for the tertiary ALI definition was assessed using specific and nonspecific codes from either ICD-9-CM or ICD-10 identified in both hospital and outpatient settings.
In EpiChron, International Classification of Primary Care (ICPC) codes were used to identify outpatient cases of the tertiary ALI definition and ICD-9-CM to identify hospital cases. In SIDIAP, ICD-10-CM was used to identify primary care diagnoses and ICD-9-CM to identify hospital cases. In Denmark, primary care codes were not available, and therefore only hospital ICD-10 codes were used both for case identification and to apply exclusion criteria. The interplay between the three ALI definitions is displayed in Figure 1.
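The nesting of the three ALI definitions can be made explicit with a short sketch. It assumes each potential case has already been labelled with its care setting and with whether its code is on the specific or the nonspecific list; the function name and the two input labels are hypothetical, and no actual code lists are reproduced here.

```python
def ali_definitions_met(setting, specific):
    """Return the set of study ALI definitions a coded diagnosis satisfies.

    setting: "hospital" (main hospital discharge diagnosis) or "outpatient".
    specific: True for a specific ALI code, False for a nonspecific one.
    """
    if setting not in ("hospital", "outpatient"):
        raise ValueError(f"unknown setting: {setting!r}")
    met = {"tertiary"}  # tertiary accepts both settings and both code types
    if setting == "hospital":
        met.add("secondary")  # hospital discharge, specific or nonspecific
        if specific:
            met.add("primary")  # hospital discharge, specific codes only
    return met
```

Every primary case is therefore also a secondary and a tertiary case, which is why validating the specific hospital codes within the secondary definition indirectly validates the primary definition.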

| Diagnostic criteria for ALI
Potential cases of ALI identified with the electronic algorithms and reviewed by adjudicators were considered confirmed (true positives) 19 if any of the following three qualifying criteria for increases in serum levels with <1 year of persistence were met (aspartate transaminase
The requirement of less than 1 year of persistence of the liver function test abnormalities was introduced to ensure that cases had ALI and not chronic liver injury. 19 This criterion was evaluated using the most recent liver enzyme results from the period 12 to 24 months before the index date to check that they were not elevated beyond 10% of the upper limit of normal (ULN); if no results were available, the criterion was considered met.
A false-positive case of ALI was defined as a potential case with enough data to be evaluated but that did not meet the criteria to be classified as a confirmed case of ALI. A nonevaluable case of ALI was defined as a potential case that lacked some of the required liver enzyme results to be evaluated.
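The persistence check described above can be sketched as follows. This is an illustrative reading, not the study's code: "elevated beyond 10% of the ULN" is interpreted here as a value above 1.10 × ULN, the 12 to 24 month window is approximated as 365 to 730 days, and the function name and the tuple layout are assumptions. The function handles one liver enzyme at a time.

```python
from datetime import date, timedelta


def persistence_criterion_met(index_date, results, tolerance=0.10):
    """Check the <1 year persistence requirement for one liver enzyme.

    results: list of (test_date, value, uln) tuples for that enzyme.
    """
    window_start = index_date - timedelta(days=730)  # about 24 months before
    window_end = index_date - timedelta(days=365)    # about 12 months before
    in_window = [r for r in results if window_start <= r[0] <= window_end]
    if not in_window:
        return True  # no earlier results: criterion considered met
    # Use only the most recent result inside the window.
    _, value, uln = max(in_window, key=lambda r: r[0])
    return value <= uln * (1 + tolerance)
```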

| Validation steps
The strategy for validating potential cases identified by automated algorithms across the three data sources included up to three steps: review of patient profiles, medical record abstraction of relevant clinical data by trained health care professionals, and review of abstracted data and case adjudication by trained physicians. A patient profile is a deidentified chronological listing of medical events and drug prescriptions; it is used to detect exclusion diagnoses missed by the electronic algorithm and to provide an initial assignment of case status. However, the validation process required local adaptations in Denmark and SIDIAP to reflect the data available. In EpiChron, for quality control purposes, patient profiles of a random sample of 10 potential cases were reviewed independently by a second physician, as were a random sample of 25% of the confirmed cases and of 10 inpatient nonevaluable cases. In SIDIAP, for the tertiary ALI definition, an electronic algorithm evaluated all potential cases, and 10% of them were also evaluated manually by trained professionals blinded to the study exposure. A very high level of agreement (kappa statistic ≥0.95) between the algorithm and the manual reviewers was obtained before the algorithm was generalized; agreement between the two clinician reviewers was also assessed (kappa statistic = 1). Similarly, in Denmark, an algorithm was created to evaluate potential cases. Trained physicians manually reviewed 50 potential cases, all of which were also reviewed using the automated algorithm. The automated algorithm was applied to all potential cases only after the kappa measuring the agreement between manual review and the algorithm reached 1.
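The agreement checks above rely on a kappa statistic. The text does not state which variant was used; the sketch below assumes the standard two-rater Cohen's kappa, with case-status labels chosen purely for illustration.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed proportion of cases on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


manual = ["confirmed", "false positive", "confirmed", "nonevaluable"]
algorithm = ["confirmed", "false positive", "confirmed", "nonevaluable"]
kappa = cohen_kappa(manual, algorithm)
```

A kappa of 1 corresponds to identical labels on every case, which is the condition the Danish algorithm had to reach before being applied to all potential cases.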

| Statistical analyses
Validity of the electronic algorithms and individual codes used to identify potential cases of ALI for the secondary and tertiary ALI defini-   Figure 2).
Regarding the tertiary ALI definition, which includes the total number of cases for all ALI definitions (see Figure 1), more than 70% of true positives in Denmark and SIDIAP and 56% of true positives in EpiChron were females. Overall, the age group with the highest number of true positives was patients 80 years and older, followed by patients aged 50 to 79 years (Supplementary eTable S4).
Note: In each cell, the first number refers to secondary ALI definitions, and the second number refers to tertiary ALI definitions.
a One hundred fifteen patients did not undergo further validation due to the lack of additional hospital data for those cases. Among them, eight patients were excluded based on the presence of exclusion or censoring criteria and did not undergo further validation.
b One hundred seven patients identified on ambulatory codes and with lack of additional hospital data were directly adjudicated during the patient profile phase. Among them, three were classified as true positives, 69 as false positives, and 35 were considered nonevaluable.
c Patients with study exclusion criteria not identified by hospital codes were excluded during the abstraction or review of medical records.
The overall PPVs for the algorithm used to identify potential cases of the secondary ALI definition were 65% (95% CI, 45%-81%) in EpiChron, 40% (95% CI, 19%-64%) in SIDIAP, and 70% (95% CI, 64%-77%) in Denmark (Table 2). As discussed in the Methods section, the primary ALI definition was indirectly validated through the specific hospital discharge codes used in the secondary ALI definition, for which the overall PPVs were 84% (95% CI, 60%-97%) in EpiChron, 60% (95% CI, 26%-88%) in SIDIAP, and 74% (95% CI, 60%-85%) in Denmark. The overall PPVs for the specific codes were higher than those for the nonspecific codes in all data sources (  (Table 3). In Denmark, the individual specific codes K71.2 (toxic liver disease with acute hepatitis) and K71.6 (toxic liver disease with hepatitis, not elsewhere specified) obtained the highest PPVs and captured the highest proportion of true positives (Table 4). None of the nonspecific codes captured more than two true positives in EpiChron and SIDIAP (Table 3).
For the tertiary ALI definition, the overall PPVs were 25% (95% CI, 18%-34%) in EpiChron, 8% (95% CI, 7%-9%) in SIDIAP, and 47% (95% CI, 42%-52%) in Denmark. As observed for the secondary ALI definition, we observed higher PPVs for specific than nonspecific codes in all data sources (Table 2). Among the individual specific codes, 570.x (acute and subacute necrosis of liver) had the highest PPV in EpiChron and SIDIAP (Table 3 and Supplementary eTable S5). In Denmark, code K71.2 (toxic liver disease with acute hepatitis) had the highest PPV among specific codes (Table 4). Among the nonspecific codes, 782.4

TABLE 3
Positive predictive values (PPVs) of specific and nonspecific codes used to identify potential acute liver injury (ALI) cases: Secondary (regular font) and tertiary (italics) ALI definitions in data sources using ICD-9-CM codes (nonevaluable cases not included)

In the sensitivity analysis including nonevaluable cases in the denominator of the PPV calculation, the overall PPVs for all study ALI definitions and for both specific and nonspecific codes were smaller than those for the main PPV analysis in all data sources (see Supplementary eTables S6 and S7).
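The difference between the main and the sensitivity PPV calculations can be sketched as below. The main analysis divides true positives by true plus false positives; the sensitivity analysis adds nonevaluable cases to the denominator. The confidence interval method is an assumption: the surviving text does not name it, and an exact (Clopper-Pearson) binomial interval is used here for illustration. The counts in the example (3 true positives, 69 false positives, 35 nonevaluable) are those reported for the 107 patients adjudicated directly at the patient profile phase and serve only as sample input.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_ci(x, n, alpha=0.05):
    """Clopper-Pearson interval for x successes out of n, found by bisection."""

    def solve(upper_tail, target):
        # upper_tail(p) increases with p, so bisect on [0, 1].
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if upper_tail(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound solves P(X >= x | p) = alpha / 2.
    lower = 0.0 if x == 0 else solve(lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper bound solves P(X <= x | p) = alpha / 2, i.e. P(X > x | p) = 1 - alpha / 2.
    upper = 1.0 if x == n else solve(lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


def ppv(true_pos, false_pos, nonevaluable=0, include_nonevaluable=False):
    """Return (PPV, (lower, upper)); nonevaluable cases enter the denominator only on request."""
    denom = true_pos + false_pos + (nonevaluable if include_nonevaluable else 0)
    if denom == 0:
        raise ValueError("no cases in the denominator")
    return true_pos / denom, exact_ci(true_pos, denom)


main = ppv(3, 69, 35)
sensitivity = ppv(3, 69, 35, include_nonevaluable=True)
print(f"main: {main[0]:.1%}, sensitivity: {sensitivity[0]:.1%}")
```

Because nonevaluable cases can only enlarge the denominator, the sensitivity PPV is never larger than the main PPV, consistent with the pattern reported above. The direct binomial sum is adequate for small counts; for samples in the thousands, a statistics library routine would be a safer choice.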

| DISCUSSION
We observed consistently higher overall PPVs for specific ALI codes versus nonspecific codes and higher overall PPVs for hospital discharge codes versus outpatient codes. The identification of ALI cases based on hospital discharge specific codes, considered as the primary ALI definition in this study, resulted in higher PPVs when compared with most previously described algorithms. [3][4][5][6] In contrast to the present study, previous studies conducted to validate ALI cases have reported PPVs below 60%, 3-5 or around 75%. 6 A recently published systematic review and meta-analysis including 29 studies validating ALI or drug-induced liver injury (DILI) (25 of them presenting PPVs) showed a pooled PPV estimate for ALI of 13.4% (95% CI, 6.1%-22.8%) and for DILI of 15.3% (95% CI, 9.5%-22.2%). 21 The authors of that study suggested that the low PPVs observed in the studies might be explained by the low prevalence of ALI or DILI.
In addition, a different list of diagnosis codes, laboratory threshold criteria, and study drugs might be the cause of the differences between studies. When we compared our study with previous studies validating ALI definitions, we observed that our study differed from these previous studies in different ways: Bui et al 6

| Strengths and limitations
In terms of number of validated cases, the present validation study

| CONCLUSIONS
The PPVs obtained in this study apply to patients using antidepressants without preexisting liver disease or risk factors for ALI. Future studies evaluating ALI in these and similar data sources should prioritize use of hospital discharge and specific codes to maximize validity.
Moreover, case-identifying algorithms should include hospital ICD codes for unspecified jaundice. In studies including nonspecific codes and outpatient cases, case validation is essential.

FUNDING
This study was funded by Les Laboratoires Servier under a contract granting independent publication rights to the research team.

DATA AVAILABILITY
The data sets used for this study are owned by each of the individual research centers or by the government data custodians from which the research centers obtained access to the data at IACS (Spain), SIDIAP (Spain), BIPS (Germany), Karolinska Institutet (Sweden), and Southern Denmark University (Denmark). Researchers desiring access to the data sets would be required to obtain permission from the research center and/or data custodians in each country. Researchers desiring access to the code used to analyze the data would be required to obtain permission from the research centers and the study sponsor.