The objectives of this study were to characterize the validity of algorithms to identify AF from electronic health data through a systematic review of the literature and to identify gaps needing further research.
Two reviewers examined publications during 1997–2008 that identified patients with atrial fibrillation (AF) from electronic health data and provided validation information. We abstracted information including algorithm sensitivity, specificity, and positive predictive value (PPV).
We reviewed 544 abstracts and 281 full-text articles, of which 18 provided validation information from 16 unique studies. Most used data from before 2000, and 10 of 16 used only inpatient data. Three studies incorporated electronic ECG data for case identification or validation. A large proportion of prevalent AF cases identified by ICD-9 code 427.31 were valid (PPV 70%–96%, median 89%). Seven studies reported algorithm sensitivity (range, 57%–95%, median 79%). One study validated an algorithm for incident AF and reported a PPV of 77%.
The ICD-9 code 427.31 performed relatively well, but conclusions about algorithm validity are hindered by few recent data, use of nonrepresentative populations, and a disproportionate focus on inpatient data. An optimal contemporary algorithm would likely draw on inpatient and outpatient codes and electronic ECG data. Additional research is needed in representative, contemporary populations regarding algorithms that identify incident AF and incorporate electronic ECG data. Copyright © 2012 John Wiley & Sons, Ltd.
Atrial fibrillation (AF) is an increasingly common arrhythmia that increases the risk of stroke and death.[1, 2] Electronic health data can be used to study the epidemiology of AF, assess quality of care, and monitor for AF as an adverse event related to newly approved medications. The latter is of interest to the Food and Drug Administration (FDA), which through its Sentinel Initiative seeks to monitor prospectively the safety of medical products using electronic health data for over 100 million people.
These efforts depend on the accuracy of electronic health data to identify AF. To date, there has been no systematic review of the validity of algorithms used to identify AF. The primary objectives of this project were to review all studies that have validated algorithms to identify AF from electronic health data, summarize their results, and identify gaps in knowledge for future research.
Detailed methods of this systematic review can be found in the accompanying manuscript (see “Mini-Sentinel's systematic reviews of validated methods for identifying health outcomes using administrative and claims data: methods and lessons learned” by Carnahan and Moores on page 82). Briefly, as part of the FDA's Mini-Sentinel pilot, systematic reviews including this one were conducted for 20 health outcomes. The search strategy was developed by investigators at the University of Iowa in collaboration with the FDA, based on prior work by the Observational Medical Outcomes Partnership. Details of the search strategy are available online at the Mini-Sentinel website. PubMed was searched on 14 May 2010 and 6 July 2010, and the Iowa Drug Information Service database on 11 June 2010. Mini-Sentinel collaborators were asked to identify relevant unpublished studies or other validation studies of which they were aware.
Two authors independently reviewed each abstract. Articles were selected for full review if AF was studied using electronic health data from the USA or Canada. If the reviewers disagreed or there was insufficient information, the article was selected for full-text review.
Two authors independently reviewed the articles to identify validation studies described in the article or its references. Articles were excluded if they did not meet the abstract inclusion criteria (previously mentioned), if the algorithm for identifying AF was inadequately described, or if validity statistics were not provided or could not be calculated. If there was a disagreement, reviewers attempted to reach consensus, and if they could not, a third author was consulted. One author (PJ) extracted information from articles for the evidence tables, and a second author (SD) checked it for accuracy.
Inter-rater agreement about inclusion of abstracts and articles was calculated using Cohen's kappa.
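For readers unfamiliar with the statistic, Cohen's kappa compares observed agreement with the agreement expected by chance given each rater's marginal rates. A minimal sketch with hypothetical inclusion decisions (the data and function are illustrative only, not the study's actual computation):

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters making binary include/exclude decisions."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed proportion of decisions on which the raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal inclusion rate
    p_a = sum(rater_a) / n
    p_b = sum(rater_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    return (observed - expected) / (1 - expected)

# Hypothetical decisions (1 = include, 0 = exclude) for ten abstracts
a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
b = [1, 0, 0, 0, 1, 0, 1, 1, 0, 1]
print(round(cohens_kappa(a, b), 2))  # → 0.6
```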
PubMed searches identified 527 citations and Iowa Drug Information Service searches 49, for a total of 544 unique citations. Of these, 249 were selected for full-text review (Figure 1). Cohen's kappa for agreement among reviewers regarding article selection for full-text review was 0.62 (95% confidence interval [CI] 0.55–0.69). A total of 281 full-text articles were reviewed: 249 from the original search and 32 more identified mainly from these articles' references. From the 281 articles, we identified 18 reporting 16 unique validation studies. Cohen's kappa for agreement on the presence of a validation study meeting inclusion criteria before consensus discussions was 0.83 (95%CI 0.68–0.98).
The methods and data sources used to identify AF varied across studies. Of the 16 unique studies, 10 used only inpatient data,[6-15] two used only outpatient data,[16, 17] and four used both.[18-21] Two studies described algorithms incorporating electronic ECG data in addition to administrative data but did not actually validate these algorithms.[17, 21] Studies differed according to which ICD-9 codes were used, which coding positions were searched, and the number of codes required; algorithm details are provided in Tables 1 and 2. Most studies used the code 427.31 (AF). Four studies explicitly included atrial flutter (ICD-9 code 427.32).[11, 12, 19, 20] Three others[15, 16, 18] probably included some cases of atrial flutter because they used a four-digit ICD-9 code, 427.3.
| Citation | Population and time period | Outcome(s) studied | ICD-9 code(s) | Validation/adjudication procedure and definition | Validation statistics |
|---|---|---|---|---|---|
| Alonso (2009) | Atherosclerosis Risk in Communities cohort study; age 45–65 years in 1987–1989; excluded if baseline ECG showed AF. Substudy 1: participants with a first inpatient ICD-9 code for AF; substudy 2: suspected stroke cases | Prevalent and incident AF† | 427.31 | 1. Physician review of hospital discharge summaries and inpatient ECGs; 2. medical record review | 1. PPV 89% (111/125) for any AF; PPV 62% (78/125) for incident AF**; 2. SN 84% (135/161), SP 98% (1351/1385) for prevalent AF |
| Antani (1996) | Inpatients, two teaching hospitals | Prevalent AF | 427.31 | Review of medical record including ECGs | PPV 90.8% (178/196) |
| Brass (1997) | Medicare patients age ≥65 years from acute care hospitals; 50% with ischemic stroke; none with principal diagnosis of acute MI or nonstroke embolic event; 1994 | Prevalent AF | 427.31 in primary or secondary position | Medical record review seeking ECG(s) interpreted as showing AF; if none found, cardiologist reviewed medical record. ‘Detailed’ history: notes included dates or specific treatments for AF | PPV 97% (686/707) for any history of AF; PPV 89.9% (635/707) for ‘detailed’ history |
| Flaker (1998)‡ | Hospitalized Medicare patients aged ≥65 years, without rheumatic heart disease or a recent cardiothoracic procedure, who did not die and were not transferred during the admission | Prevalent AF | 427.31 in primary or secondary position | Medical record review. Looser criterion: physician acknowledgement of AF in notes; stricter criterion: AF on ECG or rhythm strip | PPV 90.2% (1035/1147) for looser criterion; PPV 70% (800/1147) for stricter criterion |
| Hravnak (2001) | CABG patients age >18 years; exclusions: prior AF, other or prior cardiac surgery, perioperative or postoperative MI, death within 12 hours of surgery; 1996–1998 | New-onset AF | 427.31 | Word search run on the clinical database and pharmacy data (receipt of procainamide); medical records reviewed to verify AF and determine if new onset | SN 56.9% (148/260) |
| Psaty (1997) | Cardiovascular Health Study (age ≥65 years, four geographic areas); excluded if had a pacemaker or AF at baseline (from ECG, Holter, or self-report). Validation study: participants with potential cerebrovascular or cardiovascular events§; 1992 and 1996 | Prevalent AF or atrial flutter | 427.3, 427.31, 427.32 | Physician review of all hospital ECGs | SN 70.7% (29/41) |
| Shen (2008) | HMO inpatients (Kaiser Permanente Southern California) | Prevalent AF or atrial flutter | 427.31 in any position or 427.32 among the first three codes | Review of medical records including ECGs | PPV 96% (96/100) |
| Shireman (2004) | Medicare patients discharged from acute-care hospitals; 750 with primary or secondary diagnosis of AF from each US state | Prevalent AF | 427.31 in primary or secondary position | Review of inpatient records for index AF hospitalization | PPV 71.1% (27 674/38 924) |
| Whittle (1997) | Medicare patients age ≥65 years discharged from five small hospitals; no open heart surgery | AF present during hospitalization | 427.31 | Medical record review to confirm whether AF occurred during hospitalization | PPV 85% (242/322) |
| Yuan (1998) | Medicare patients age ≥65 years at one teaching hospital; exclusions: stroke or venous thrombosis in the prior year; cancer; race unknown or ‘other’ | Prevalent AF | 427.3 in one of five diagnosis fields | Hospital discharge records containing up to 27 diagnosis fields†† | SN 87.7%** |
| Citation | Population and time period | Outcome | ICD-9 code(s) (inpatient or outpatient unless otherwise specified) | Validation/adjudication procedure and definition | Validation statistics |
|---|---|---|---|---|---|
| Borzecki (2004) | VA outpatients with ≥2 clinic visits at least 6 months apart; 100 with hypertension and 20 without from each of 10 sites | Prevalent AF | 427.3: ≥1 code in 1 year; examined impact of varying number of codes required and number of years of claims data used | Review of outpatient medical record; AF considered confirmed if mentioned in the chart | ≥1 claims diagnosis in 1 year: SN 80%, SP 99%, PPV 84%; ≥2 claims diagnoses in 1 year: SN 67%, SP 99%; ≥1 diagnosis in 2 years: SN 86%, SP 97%; ≥2 diagnoses in 2 years: SN 74%, SP 99% |
| Brophy (2004) | Patients with ≥1 encounter at VA Boston healthcare system and ≥1 ECG showing AF in the electronic ECG database | Prevalent AF | 427.3, 427.31 | ECG showing AF in electronic database | SN 77.8% (2619/3366) |
| Dublin (2006)‡ | HMO members; average age 73, 38% male, and 73% with treated hypertension | Prevalent AF or atrial flutter | 427.31 or 427.32 in any position from an inpatient, outpatient, or ED encounter | Medical record review | SN 95% (236/248); SP 99% (244/247) |
| Glazer (2007) | HMO enrollees without a previous ICD-9 code for AF | Incident AF or atrial flutter | 427.31 or 427.32 | Medical record review | PPV 76.8% (1105/1438) |
| Go (2000)† | HMO members (Kaiser Permanente Northern California) | Prevalent AF | Validation study: >1 outpatient 427.31 but no electronic ECG showing AF | ECG showing AF in medical record | PPV 78% (39/50) |
| Go (2001) | HMO members (Kaiser Permanente Northern California) ≥20 years old | Prevalent AF | Validation study: one outpatient 427.31 but no electronic ECG showing AF§ | ECG showing AF in medical record | PPV 56%** (28/50) |
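The code-based case definitions summarized above share a common shape: a set of qualifying ICD-9 codes, a limit on which coding positions count, and a minimum number of qualifying claims. A minimal sketch of this shape, with a record layout and thresholds that are illustrative assumptions rather than any one study's algorithm:

```python
# Illustrative claims-based AF case identification; not any specific
# study's validated algorithm.
AF_CODES = {"427.3", "427.31"}
AF_OR_FLUTTER_CODES = AF_CODES | {"427.32"}  # some studies add atrial flutter

def has_af(claims, codes=AF_OR_FLUTTER_CODES, positions=2, min_codes=1):
    """claims: list of dicts whose 'dx' key holds ICD-9 codes ordered by
    coding position (primary first). Counts claims with a qualifying code
    in an eligible position and applies a minimum-claims threshold."""
    hits = sum(
        1 for claim in claims
        if any(dx in codes for dx in claim["dx"][:positions])
    )
    return hits >= min_codes

claims = [
    {"dx": ["428.0", "427.31"]},   # CHF primary, AF secondary
    {"dx": ["427.32", "401.9"]},   # atrial flutter primary
]
print(has_af(claims, positions=2, min_codes=2))  # → True
```

Tightening `min_codes` or restricting `positions` trades sensitivity for specificity, the pattern Borzecki's results in Table 2 illustrate empirically.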
One study specifically validated an algorithm for incident AF.[6, 20] To ensure that people were free of AF prior to the event of interest, this study excluded people with any prior ICD-9 code for AF or atrial flutter during their health maintenance organization (HMO) enrollment. Another study defined incident AF as the first study ECG showing AF or a hospital ICD-9 code for AF during follow-up and excluded people whose baseline study ECG showed AF, but it lacked other information about AF history before baseline. Because information about prevalent AF was very limited, we did not classify this study as a relevant validation study of incident AF.
Two studies used a population or gold standard that made their findings less relevant to the aims of this review. Go et al. reviewed medical records to validate the presence of AF in a population not likely to have AF. Yuan et al. used as a gold standard a hospital database of diagnosis codes, rather than medical record review. Their results shed light on how the number of diagnosis codes in a database impacts sensitivity (not surprisingly, databases retaining a limited number of codes per hospitalization have lower sensitivity than those with a larger number) but are not informative about the actual validity of AF diagnosis codes. These studies are described in Tables 1 and 2, but their results are omitted from the summary statistics for positive predictive value (PPV) and sensitivity.
Fourteen of 16 studies validated the diagnosis of AF by medical record review, including four studies[6-8, 11] in which a physician reviewed ECGs from the medical record. The definition used for the gold standard varied markedly, ranging from lax (any mention of AF in the chart) to strict (requiring that the chart contain an ECG or rhythm strip showing AF). One study used as a gold standard a teaching hospital database with up to 27 diagnosis codes for each hospitalization (versus five in the primary database). The remaining study provided information from which we could calculate the sensitivity of its algorithm compared with the institution's electronic ECG database.
Validity statistics for individual studies are shown in Tables 1 and 2. These summary measures include all PPVs and sensitivities that were reported in, or could be calculated from, each study we classified as a relevant validation study. In summary, the PPV of algorithms ranged from 70% to 96% (median 87%). Sensitivity was reported in seven studies[6, 10, 11, 15, 16, 18, 19] and ranged from 57% to 95% (median 79%).
Algorithm validity varied by characteristics of the algorithm, the data source, and the gold standard chosen for comparison. Borzecki found that increasing the required number of AF codes from one to two decreased the sensitivity from 80% to 67%, whereas specificity was unchanged. Extending the time period searched for these codes from 1 to 2 years increased the sensitivity from 80% to 86%, whereas specificity decreased from 99% to 97%.
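For reference, the validity statistics discussed throughout this review derive from a 2×2 comparison of the algorithm against the chosen gold standard. A sketch using the prevalent-AF counts Alonso (2009) reported (SN 84% = 135/161, SP 98% = 1351/1385); the four-cell split shown follows directly from those fractions:

```python
def validity_stats(tp, fp, fn, tn):
    """Sensitivity, specificity, and PPV from a 2x2 table comparing an
    algorithm against a gold standard (e.g., medical record review)."""
    return {
        "sensitivity": tp / (tp + fn),  # true cases the algorithm finds
        "specificity": tn / (tn + fp),  # non-cases correctly excluded
        "ppv": tp / (tp + fp),          # flagged cases that are true cases
    }

# 161 gold-standard AF cases, 135 detected; 1385 non-cases, 1351 excluded
stats = validity_stats(tp=135, fp=34, fn=26, tn=1351)
print({k: round(v, 2) for k, v in stats.items()})
```

Note that, unlike sensitivity and specificity, the PPV cell depends on how common AF is among those flagged, which is why PPV varies so strongly with the prevalence of AF in the source population.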
Two studies examined the impact of different validation criteria on algorithm validity. Brass reported a PPV of 97% using any documented history of AF within the medical record as the validation criterion, compared with a PPV of 90% when validation required dates and specific treatments for AF. Similarly, Flaker et al. reported that using a validation criterion of any physician acknowledgement of AF yielded a PPV of 90.2%, whereas requiring an ECG or rhythm strip showing AF yielded a lower PPV of 70%.
One study reported that sensitivity was slightly lower for African Americans than for whites (80% vs. 85%), whereas specificity was similar. Most studies included predominantly white populations.
We identified 16 studies that validated algorithms to identify AF from electronic health data. Validation statistics varied by algorithm and data source, with the PPV from the 14 most relevant and comparable studies ranging from 70% to 96% and sensitivity from 57% to 95%. The PPV was lower for incident than for prevalent AF and was affected by characteristics of the algorithm, the database, and the validation criteria.
The PPV of a test is highly influenced by the prevalence of the disease in the source population. Many studies focused on older individuals[8, 9, 11, 13-15] or other high-risk populations. Results from these studies may overestimate the true PPV in the general population. Variability in the racial makeup of the population between studies may have also played a role; studies performed on populations with a larger percentage of nonwhites may result in a lower PPV, given that overall, the prevalence and incidence of AF are lower in nonwhite populations.
Validation criteria varied substantially across studies. In clinical practice, the gold standard for diagnosing AF is a 12-lead ECG. Few studies required an ECG showing AF as part of their validation criteria. Instead, most considered AF confirmed based on any mention in the medical record. The use of this looser criterion will identify people with paroxysmal AF or a remote history of AF who would be incorrectly considered free of AF if validation criteria required a positive ECG. On the other hand, relying on any mention of AF in the chart could lead to misclassification, as some patients reported to have AF may not truly have such a history. The use of stricter validation criteria would be expected to result in lower apparent PPV for the algorithm being considered.
Limitations on the number of diagnosis codes per hospitalization in some databases may also limit the ability to identify AF, lowering the sensitivity of algorithms. This characteristic of the source database was not explicitly discussed in the majority of studies, so we could not assess its contribution to cross-study variation.
No study specifically examined what was added by including 427.32 (atrial flutter). AF and atrial flutter share many features. They often occur within the same individual, and their potential complications and clinical management are similar. Physician documentation and encounter coding may not distinguish atrial flutter from AF when both arrhythmias have been present. For many purposes, an algorithm that includes codes for both AF and atrial flutter will be appropriate.
Some clinical trials monitor ‘serious’ or symptomatic AF and do not investigate or monitor asymptomatic AF. The articles we reviewed did not distinguish between symptomatic and asymptomatic AF, so separate analyses of these AF subtypes were not possible. We believe that for the purposes of post-marketing surveillance, this distinction is not particularly helpful. Both symptomatic and asymptomatic AF are of interest, because both confer an increased risk of stroke and death and thus are clinically relevant. In addition, the decision to treat AF (e.g., with antithrombotic agents) is not guided by symptoms alone but by overall stroke risk; thus, for future studies seeking to identify cohorts of AF patients, the distinction between ‘serious’ and ‘nonserious’ AF will not be relevant.
Only one study specifically examined the PPV of an algorithm for incident, rather than prevalent, AF. Incident AF is of particular interest because incident outcomes are more useful than prevalent conditions when conducting surveillance for adverse effects of new medications. Also, comparative effectiveness studies may seek to include only newly diagnosed, treatment-naïve individuals to decrease some types of bias that arise in observational studies of treatment effectiveness. The few data available suggest that algorithms have considerably lower PPV for incident than for prevalent AF.
Our search strategy may have failed to identify some validation studies. Most studies provided limited information about the characteristics of the source population, which may influence risk of AF and thus algorithm PPV. Also, the use of different gold standards across studies inhibited our ability to compare directly the validation statistics for different algorithms. No included studies examined the validity of algorithms using ICD-10 codes, although we expect that algorithm validity may be similar in that setting because the general approach to categorizing these arrhythmias has not changed from ICD-9 to ICD-10.
Most of the studies we identified used data from 10 to 15 years ago. The PPV of these algorithms may be higher in the current era because the prevalence of AF has increased. In addition, the number of codes per encounter retained in electronic databases has generally increased over time. Thus, the sensitivity of algorithms may be higher in contemporary data than our results suggest.
No studies compared the validity of algorithms using only outpatient or only inpatient diagnosis codes to algorithms using both. Algorithms using data from only one setting are likely to be less sensitive than algorithms using data from both settings. Many of the studies we identified included only inpatient data, and their findings may not be generalizable to databases including both inpatient and outpatient codes.
Overall, the included studies had considerable heterogeneity in their aims and methods. Most provided little information about their validation methods, which prevented us from ranking the studies according to quality. Only one study set out specifically to validate an algorithm to identify AF from electronic medical data. Additionally, the populations studied may not be representative of the general US population today in terms of many characteristics, including comorbidity and race/ethnicity. Given the increasing use of electronic medical records, as well as the development of electronic ECG databases, the lack of recent data inhibits the generalizability of these results to contemporary studies. Further study is urgently needed given recent changes in the available data.
Across a broad range of electronic health databases, an inpatient or outpatient ICD-9 diagnosis code of 427.31 correctly identifies prevalent AF in a substantial proportion of patients. Including code 427.32 (atrial flutter) may be desirable in many settings. Combining inpatient and outpatient AF diagnosis codes with AF diagnoses from an electronic ECG database appears especially promising for identifying prevalent AF. However, no study has validated this approach using medical record review as the gold standard.
Although available data are limited, we propose that the best algorithm to identify incident AF would use inpatient and outpatient diagnosis codes and also electronic ECG data, with the availability of data for a multiyear lead-in period (for example, 2–5 years) to demonstrate that individuals were free of AF previously. Whether both an ICD-9 code and an ECG should be required needs further study. The validity of algorithms for incident AF warrants further study, because this outcome is of particular interest for purposes including prospective drug safety surveillance and comparative effectiveness research.
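As a sketch of the proposed incident-AF logic (the 2-year window, field names, and event structure are illustrative assumptions, not a validated algorithm):

```python
from datetime import date, timedelta

# An AF-qualifying event (diagnosis code or electronic ECG) counts as
# incident only if the patient has a multiyear lead-in period that is
# fully observable (continuous enrollment) and free of prior AF evidence.
LEAD_IN = timedelta(days=2 * 365)  # illustrative; the text suggests 2-5 years

def first_incident_af(events, enrollment_start):
    """events: date-sorted list of (date, kind) tuples with kind in
    {'dx', 'ecg'} marking AF evidence from claims or an ECG database."""
    for i, (event_date, kind) in enumerate(events):
        window_start = event_date - LEAD_IN
        if window_start < enrollment_start:
            continue  # not enough observable history to rule out prior AF
        prior_af = any(d >= window_start for d, _ in events[:i])
        if not prior_af:
            return event_date
    return None

events = [(date(2010, 3, 1), "dx"), (date(2011, 6, 1), "ecg")]
print(first_incident_af(events, enrollment_start=date(2007, 1, 1)))
# → 2010-03-01
```

With a later enrollment start (e.g., 2009), the first event falls inside an incompletely observed window and the second is preceded by AF evidence, so no event qualifies as incident, which is the conservative behavior the lead-in requirement is meant to enforce.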
Dr. Dublin received a Merck/American Geriatrics Society New Investigator Award. No other authors reported conflicts of interest. This manuscript was derived from a longer report which is published on the Mini-Sentinel website.
Mini-Sentinel is funded by the Food and Drug Administration (FDA) through the Department of Health and Human Services (HHS) contract number HHSF223200910006I. Additional support was provided by NHLBI grants HL068986 and T32HL007902, NIA grant K23AG028954, and Group Health Research Institute internal funds.