Novel methodology for the evaluation of symptoms reported by patients with newly diagnosed atrial fibrillation: Application of natural language processing to electronic medical records data

Understanding symptom patterns in atrial fibrillation (AF) can help in disease management. We report on the application of natural language processing (NLP) to electronic medical records (EMRs) to capture symptom reports in patients with newly diagnosed (incident) AF.


| INTRODUCTION
Symptoms and quality of life experienced by patients with atrial fibrillation (AF) play a central role in choosing AF therapies and assessing treatment response. As symptomatic improvement is the primary indication for rhythm control in AF, 1,2 understanding symptom patterns in specific subgroups of patients with AF would be clinically valuable to guide treatment decisions and improve insight into patient experience.
Broad AF cohorts have reported that the majority of patients with AF are symptomatic and that symptoms correlate with risk of hospitalization. 3 Symptoms commonly reported among patients with AF include palpitations, fatigue, dyspnea, syncope, lightheadedness, dizziness, exercise intolerance, and chest pain or tightness. 3,4 Electronic medical records (EMRs) represent an enormous repository of data, which has driven advances in medical research; however, there is a large, untapped source of information within EMRs in the form of nonstructured fields and/or narrative notes.
Using these EMR data for clinical analyses has proven to be challenging. 5,6 Advances in machine learning algorithms have the potential to improve the efficiency and accuracy of using unstructured EMR data for large-scale clinical research. In the present study, we used natural language processing (NLP) to evaluate information contained in EMRs of patients with AF to provide clinically relevant data that could be used to inform research and treatment decisions. This study was designed to explore the feasibility of using unstructured data within EMR to characterize symptomatic status among patients with AF in the United States, evaluate temporal AF symptom profiles in these patients, and determine whether the NLP methodology can be used to differentiate outcomes between antiarrhythmic drug (AAD) treatment groups.

| Data source
This observational retrospective study used the Optum-Humedica deidentified EMR database, linked to the Optum® Clinformatics® claims data set (i.e., Optum Integrated), to identify symptom reports in patients with AF. Optum Integrated allows for the assessment of treatment patterns using information from pharmacy and medical insurance claims, EMR, demographic data, and related documentation. Optum's EMR database encompasses over 80 million patients from all four census regions in the United States, with at least 5 million patients from each region. Data are derived from more than 140 000 physicians at more than 600 hospitals and 6500 clinics. On average, patients contributed 4 years of medical history each to the database.

| Study design
The study design is summarized in Figure 1. Adults diagnosed with incident AF in the United States in the period from January 1, 2016 through June 30, 2018 (identification period) were identified, as described below, in the Optum Integrated datasets. The index date was the earliest AF diagnosis during the identification period. A 1-year "lookback" before the index date (baseline period) was included for the determination of concomitant medications, comorbidities, and demographic characteristics and for patient selection. Patients were tracked longitudinally for up to 12 months of follow-up from the index date for assessment of symptom reports. Thus, data from the period from January 1, 2015 through June 30, 2019 (study period) were used in this analysis.
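The relationship between the identification period and the overall study period can be checked with a short sketch. The function names (`shift_years`, `study_windows`) and the handling of February 29 are illustrative assumptions and are not taken from the study protocol.

```python
from datetime import date

# Identification period as stated in the text.
ID_START = date(2016, 1, 1)
ID_END = date(2018, 6, 30)


def shift_years(d, years):
    """Shift a date by whole years (assumption: Feb 29 maps to Feb 28)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def study_windows(index_date):
    """Return the baseline start and follow-up end for one index date."""
    if not (ID_START <= index_date <= ID_END):
        raise ValueError("index date outside the identification period")
    return {
        "baseline_start": shift_years(index_date, -1),  # 1-year lookback
        "index_date": index_date,
        "follow_up_end": shift_years(index_date, 1),    # up to 12 months
    }


earliest = study_windows(ID_START)
latest = study_windows(ID_END)
print(earliest["baseline_start"], latest["follow_up_end"])
```

The earliest possible baseline start and the latest possible follow-up end reproduce the stated study period of January 1, 2015 through June 30, 2019.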

| Patient selection
Patients included in the study were at least 18 years of age at the beginning of the baseline period, had continuous enrollment in a health plan during the baseline period, and received an initial diagnosis of AF during the identification period. AF was diagnosed based on the availability of at least one inpatient claim or two outpatient claims with a primary or secondary diagnosis of paroxysmal AF (International Classification of Diseases, 10th Revision

| Study objectives
The primary objectives of the study were to characterize AF-related symptoms among patients with incident AF, derived from the Optum EMR database of structured NLP terms, and to describe the characteristics (demographics and medical history, including procedures, comorbidities [Supporting Information: Table 1], CHA2DS2-VASc score, 7 concomitant medications, and AAD use) of patients in the overall incident AF cohort and those with symptom reports.
To evaluate the utility of NLP to identify differences between treatments, an exploratory analysis was also performed in which patient characteristics and symptom reports were evaluated for patients with prescription claims for dronedarone or sotalol as their first AAD following AF diagnosis. Dronedarone and sotalol were chosen for this purpose as they have overlapping pharmacologic properties and contraindications. As a result, patients prescribed these two drugs are more likely to be similar than patients prescribed Class I-C drugs (contraindicated in coronary artery disease) or amiodarone, which is more often used in heart failure and is to some degree discouraged as a first-line agent.
To further reduce potential confounding, adjustment was performed using 1:1 propensity-score matching between patients receiving dronedarone and those receiving sotalol. In addition, incidence rates for first cardiovascular hospitalization among patients with prescription claims for dronedarone or sotalol were assessed for the study period. Cardiovascular hospitalization was defined as a composite of hospitalizations for AF, heart failure, myocardial infarction, and stroke. Patients were censored after first observed event, end of enrollment, or death.
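The censoring rule described above can be illustrated with a minimal sketch. The text does not state how incidence rates were expressed, so the events per 100 person-years formulation, the `Patient` fields, and the 365-day follow-up cap used here are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Patient:
    index_date: date
    end_of_enrollment: date
    death_date: Optional[date] = None
    first_cv_hosp: Optional[date] = None  # first composite CV hospitalization


def follow_up(p, max_days=365):
    """Return (days at risk, event observed) for one patient.

    Follow-up stops at the first event, end of enrollment, death, or the
    assumed 12-month cap, whichever comes first.
    """
    stops = [p.end_of_enrollment, p.index_date + timedelta(days=max_days)]
    if p.death_date is not None:
        stops.append(p.death_date)
    censor = min(stops)
    if p.first_cv_hosp is not None and p.index_date <= p.first_cv_hosp <= censor:
        return (p.first_cv_hosp - p.index_date).days, True
    return max((censor - p.index_date).days, 0), False


def incidence_rate_per_100py(patients):
    """Events per 100 person-years (assumed unit, not stated in the source)."""
    total_days = 0
    events = 0
    for p in patients:
        days, event = follow_up(p)
        total_days += days
        events += int(event)
    if total_days == 0:
        return 0.0
    return 100 * events / (total_days / 365.25)
```

A patient hospitalized 100 days after the index date contributes 100 days at risk and one event; a patient whose enrollment ends after 200 days without an event contributes 200 days and no event.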
Results by AF type at diagnosis (paroxysmal vs. persistent), determined based on ICD-10 codes, were also evaluated in the study as an exploratory analysis.

| NLP data collection
NLP is a form of artificial intelligence that, in part, can scan free-text fields of EMRs to identify and index clinical information with contextual information. The current study "trained" an NLP algorithm to identify the existence of predefined symptoms using Optum EMR free-text data (Figure 2). The algorithm used the following predefined terms: chest pain, palpitations, fatigue, dyspnea, shortness of breath, syncope, presyncope, lightheadedness, and dizziness. A few of the terms were grouped based on overlapping symptomatology: dyspnea and shortness of breath; syncope, presyncope, lightheadedness, and dizziness.
To further characterize symptoms, the NLP algorithm (Figure 2) was used to search unstructured text data within the EMR to capture information provided by the patient to the healthcare provider during consultation. To capture true reports of symptoms, the algorithm was designed to capture and apply additional context, including physician sentiments. For example, "shortness of breath" identified in the free-text fields does not necessarily indicate that the patient has reported that symptom. The full field is processed to determine whether the patient "denied having shortness of breath" and other considerations to characterize whether the patient had the symptom or not.
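The idea of context-aware symptom capture can be illustrated with a simple rule-based sketch. This is not the trained algorithm used in the study: it is a stand-in that uses the predefined terms and groupings listed above, together with hypothetical negation and scope-termination cue lists chosen for illustration.

```python
import re

# Predefined terms, grouped as described in the text.
SYMPTOM_GROUPS = {
    "chest pain": ["chest pain"],
    "palpitations": ["palpitations"],
    "fatigue": ["fatigue"],
    "dyspnea or shortness of breath": ["dyspnea", "shortness of breath"],
    "syncope, presyncope, lightheadedness, or dizziness": [
        "syncope", "presyncope", "lightheadedness", "dizziness",
    ],
}

# Hypothetical cue lists (assumptions, not from the study).
NEGATION = re.compile(r"\b(?:denies|denied|no|not|without|negative for)\b")
TERMINATION = re.compile(r"\b(?:but|however|reports|reported|complains of)\b|;")


def reported_symptoms(note):
    """Return the symptom groups affirmed (not negated) in a free-text note."""
    found = set()
    for sentence in re.split(r"[.!?\n]+", note.lower()):
        for group, terms in SYMPTOM_GROUPS.items():
            for term in terms:
                for m in re.finditer(rf"\b{re.escape(term)}\b", sentence):
                    before = sentence[:m.start()]
                    cues = list(NEGATION.finditer(before))
                    # Negated only if a cue precedes the term and no
                    # termination word ends the cue's scope in between.
                    negated = bool(cues) and not TERMINATION.search(
                        before[cues[-1].end():]
                    )
                    if not negated:
                        found.add(group)
    return found


note = "Patient denied having shortness of breath. Reports palpitations and dizziness."
print(sorted(reported_symptoms(note)))
```

In this sketch, "denied having shortness of breath" is not counted as a symptom report, whereas the palpitations and dizziness in the second sentence are. A production system would need far richer handling of context (history, family history, hypotheticals) than these few cues provide.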

| Statistical analysis
The analyses for this study were descriptive in nature and were not powered to detect statistical differences between groups. For all analyses, the index date was the date of the earliest AF diagnosis. For propensity matching of the dronedarone and sotalol cohorts, greedy matching was employed with a caliper of 0.10. 12
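Greedy 1:1 matching with a caliper can be sketched as follows. The text does not specify the scale of the caliper, the matching order, or how propensity scores were estimated, so this example assumes precomputed scores, an absolute difference on the raw propensity-score scale, and processing in input order; the function name and identifiers are hypothetical.

```python
def greedy_match(group_a, group_b, caliper=0.10):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    group_a, group_b: dicts mapping patient id -> propensity score.
    Returns a list of (id_a, id_b) pairs whose scores differ by at most
    `caliper` (assumed here to be on the raw propensity-score scale).
    """
    available = dict(group_b)
    pairs = []
    for a_id, a_score in group_a.items():
        if not available:
            break
        # Nearest remaining candidate by absolute score difference.
        b_id = min(available, key=lambda k: abs(available[k] - a_score))
        if abs(available[b_id] - a_score) <= caliper:
            pairs.append((a_id, b_id))
            del available[b_id]  # each patient is matched at most once
    return pairs


dronedarone = {"d1": 0.30, "d2": 0.62, "d3": 0.95}
sotalol = {"s1": 0.33, "s2": 0.28, "s3": 0.60, "s4": 0.70}
print(greedy_match(dronedarone, sotalol))
```

In this toy example, `d3` remains unmatched because its nearest remaining candidate lies outside the caliper, which is how a caliper trades sample size for closer matches.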

| Patient characteristics
Of 143 625 patients identified from the EMR database with an index diagnosis of AF, 30 447 were found to meet the eligibility criteria for inclusion in the study analyses, as shown in Figure 3. Characteristics in the 12-month baseline period (i.e., before index date) are shown in Table 1. Prescriptions for direct-acting oral anticoagulants and warfarin were claimed by 4.7% and 3.8% of the incident AF population, respectively, during the year before AF diagnosis, indicating possible indications other than AF for anticoagulation during the baseline period.
Of the 30 447 patients with an index diagnosis of AF, 9734 (31.9%) had a documentation of at least one of the nine predefined symptoms. The characteristics of these patients were similar to those of the overall incident AF population in the baseline period (Table 1).

| Characterization of AF-related symptoms in patients with incident AF
The predefined symptoms reported by patients with incident AF are summarized in Figure 4. The incidence rate of symptom reports was highest soon after diagnosis (0−3 months) and lower during the >3 to 6- and >6 to 12-month time periods. Across all time periods, the most common of the predefined symptoms reported were dyspnea or shortness of breath, followed by syncope, presyncope, lightheadedness, or dizziness.
Baseline characteristics of the propensity-matched first-line dronedarone and first-line sotalol cohorts are shown in Table 2. Characteristics of patients in the first-line dronedarone and first-line sotalol cohorts were comparable to those in the overall incident AF cohort except for prescription claim rates for direct-acting anticoagulants (8.3% in the dronedarone and 8.6% in the sotalol cohorts vs. 4.7% in the overall incident AF cohort), rate-control medications (56.7% and 56.5% vs. 41.9%, respectively), and warfarin (2.2% and 1.7% vs. 3.8%, respectively) (Tables 1 and 2).
TABLE 1 Baseline characteristics of patients in the overall incident AF cohort (n = 30 447) and of patients in the incident AF cohort reporting a predefined symptom (n = 9734)
The incidence of predefined symptom reports in patients treated with dronedarone or sotalol during the follow-up period is summarized in Figure 5. As in the overall incident AF cohort, the incidence of symptom reports was highest in the first 3 months postdiagnosis and decreased thereafter in both the dronedarone and sotalol groups.
Numerically lower rates of nearly every symptom during all three time periods were reported in the dronedarone cohort compared with the sotalol cohort. Rates of first cardiovascular hospitalization at 12 months were higher in patients in the sotalol cohort (50.6%) compared with the dronedarone cohort (33.7%).

| Characterization of symptoms by AF type
Of the 9734 patients with an index diagnosis of AF and symptom reports, 8621 had an initial diagnostic code of paroxysmal AF, and 1113 of persistent AF. However, ICD-10 coding of AF pattern was inconsistent over time. For example, 950 patients had codes for paroxysmal AF followed by persistent AF within 6 months, yet 899 of these patients (94.6%) later had an encounter again coded as paroxysmal AF, complicating any analysis of AF progression using ICD coding alone.
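The coding inconsistency described above can be expressed as a check on each patient's sequence of coded encounters. In this sketch the encounters are assumed to be already mapped from ICD-10 codes to the labels "paroxysmal" and "persistent", the 6-month window is approximated as 183 days from the first paroxysmal code, and the function name is hypothetical.

```python
from datetime import date


def pattern_flags(encounters, window_days=183):
    """Flag apparent progression and later reversion in AF pattern coding.

    encounters: list of (date, pattern) tuples, where pattern is
    "paroxysmal" or "persistent".
    """
    ordered = sorted(encounters)
    first_parox = next((d for d, p in ordered if p == "paroxysmal"), None)
    if first_parox is None:
        return {"progressed": False, "reverted": False}
    # Persistent code recorded within the window after the first paroxysmal code.
    persistent = next(
        (d for d, p in ordered
         if p == "persistent" and 0 < (d - first_parox).days <= window_days),
        None,
    )
    progressed = persistent is not None
    # Any later encounter coded as paroxysmal again.
    reverted = progressed and any(
        p == "paroxysmal" and d > persistent for d, p in ordered
    )
    return {"progressed": progressed, "reverted": reverted}


history = [
    (date(2017, 1, 10), "paroxysmal"),
    (date(2017, 3, 1), "persistent"),
    (date(2017, 9, 1), "paroxysmal"),
]
print(pattern_flags(history))
```

A patient flagged as both progressed and reverted corresponds to the pattern seen in 899 of the 950 patients above, which is why coded AF pattern alone is a weak basis for studying progression.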

| DISCUSSION
In this proof-of-concept study, we found that NLP successfully identified the presence of one or more predefined symptoms in 32% of patients with new-onset AF identified in a linked EMR-claims database.
The most frequently documented symptoms were dyspnea/shortness of breath, followed by syncope, presyncope, lightheadedness, or dizziness. It is also possible that the current NLP methodology may underestimate symptom burden. For example, prior studies reported the presence of palpitations in roughly 30%−50% or more of patients, compared with 10% in our sample. The term "palpitation" may to some degree represent medical shorthand for a number of sensations that patients experience and describe in various words (e.g., racing, skipping, or pounding). The NLP approach studied herein will benefit from additional refinement and validation in other data sources.
Despite these potential limitations, we believe NLP has the potential to augment the type of information available for research using administrative data, enabling investigation into symptoms and response to treatment in large EMR databases. The main alternative would be to use disease-specific surveys to prospectively assess patient-reported outcomes. Although this would be more precise, it is costly and limited in its reach, whereas NLP would provide population-level information on symptoms during antiarrhythmic drug therapy, thereby allowing electrophysiologists who use these therapies to measure their effectiveness.
Until recently, EMRs have proven challenging to use beyond diagnostic and procedural code analyses, in large part because aspects of these records, such as patient notes, do not easily lend themselves to extractable and consistent, large-scale analysis. 6,14 Advances in machine learning algorithms have made considerable progress in overcoming some of these challenges through the development of semi-structured and structured fields that "read" information on patients' symptomology. Recent studies have demonstrated the potential utility of EMR/NLP in identifying incident stroke and in assessing changes in patient risk for stroke following changes in treatment patterns. 15,16 In this analysis, we illustrate the feasibility of using NLP to characterize symptom reports in patients with incident AF. Similar methods could potentially be used in future "pragmatic trials" in which documentation from routine clinical care is leveraged, instead of the much more laborious and costly approach of requiring dedicated study visits with research personnel, and attendant data entry. As EMR use continues to increase nationally, and NLP algorithms become refined, we feel that these approaches will become a useful tool for outcomes researchers in electrophysiology and other fields. However, more work is likely needed to refine the algorithms for optimal performance.
TABLE 2 Baseline characteristics of patients with incident AF and reporting a predefined symptom who were prescribed first-line dronedarone or sotalol postdiagnosis (propensity-matched cohorts)
AF symptom burden plays a central role in planning both rhythm- and rate-based treatment strategies. AF management is founded upon treatments to reduce the risks of stroke, heart failure, and death, as well as to reduce arrhythmia-related symptoms and improve quality of life. 1,2 The present study supports the use of NLP methodology to conduct studies using symptomology as an endpoint using large, unstructured datasets. It also shows that the same analytic tools can be used to ascertain symptom-benefit over time with defined rhythm-based treatment approaches that can complement traditional clinical outcomes-based analyses.
FIGURE 5 Incidence rates of predefined symptoms reported in patients with incident AF prescribed first-line dronedarone or sotalol post-diagnosis (propensity-matched cohorts; n = 409 each). AF, atrial fibrillation; CI, confidence interval.
In the current study, among propensity-matched patients with AF prescribed dronedarone or sotalol, the incidence of symptom reports for dyspnea or shortness of breath, chest pain, and fatigue was lower in the dronedarone cohort compared with the sotalol cohort at all time periods assessed. The incidence of first cardiovascular hospitalization over the study period was also lower in the dronedarone cohort compared with the sotalol cohort. Given that the index date for this analysis was AF diagnosis, which was included in the 0−3 month interval but may have predated treatment with either drug, it is not possible to determine whether the lower incidence rates of symptom reports and cardiovascular hospitalization were due to AAD treatment effects or whether less symptomatic versus more symptomatic patients received dronedarone versus sotalol, respectively. Nonetheless, these results suggest that NLP could potentially be applied to identify symptomatic outcomes between treatment groups and pave the way for conducting future studies investigating comparative treatment benefits.
In this study, we also attempted to evaluate results by AF pattern (i.e., paroxysmal vs. persistent AF) using ICD-10 codes. We found ICD-10 coding of AF pattern to be highly inconsistent over time.
Thus, these data do not allow for further analysis of symptoms for assessment of AF progression. We believe this is the first report to demonstrate the lack of accuracy of ICD-10 diagnosis codes for AF pattern, which has important implications for future clinical research using ICD-10 diagnosis codes.

| Limitations
The claims database used for our study, like all administrative databases, may be prone to coding inaccuracies that could have affected the identification of AF patients or the ascertainment of their baseline characteristics. In particular, the accuracy of ICD-10 coding to distinguish between clinically defined AF patterns is not well-validated. For this reason, we excluded patients for whom selected codes suggested likely permanent or long-standing persistent AF, as described above, and chose not to stratify our results according to AF sub-types. Nonetheless, potential coding errors remain a limitation.
Our study was not intended to assess quality of life in AF. Existing, validated quality of life instruments are better suited for this purpose. 17,18 The European Heart Rhythm Association symptom scale alternatively provides clinician ratings on the impact of AF on patients' symptoms and functioning. 19 While the use of NLP allows for an examination of symptoms and their relationship to treatment, it is unlikely to facilitate drawing conclusions about symptom severity or functional impairment in the absence of systematic patient-reported outcome data collection. 20 A hierarchy or weighting was not applied to assessment of physician notes in the EMR; so, for example, a note from a cardiology clinic, more likely to evaluate cardiopulmonary symptoms, would have the same weight as a note from a urology clinic, much less likely to evaluate these symptoms. The comparative analysis of two AADs with class II and III properties was exploratory. Analysis of symptom reports and clinical outcomes in propensity-matched groups was performed based on date of AF diagnosis (and not treatment initiation). Differences in symptom report rates and outcomes between patients receiving dronedarone versus sotalol reflect relative effects between study groups and should not be directly attributed to treatment effects.

| CONCLUSIONS
In this study, we characterized symptom reports among patients with incident AF by applying NLP to EMR data. Our findings support the feasibility of this approach and the potential of incorporating NLP into the design of future observational AF studies, but further validation and refinement of these methods will be needed.

DATA AVAILABILITY STATEMENT
Qualified researchers may request access to patient level data and related study documents including the clinical study report, study protocol with any amendments, blank case report form, statistical analysis plan, and data set specifications. Patient level data will be anonymized and study documents will be redacted to protect the privacy of our trial participants. Further details on Sanofi's data sharing criteria, eligible studies, and process for requesting access can be found at: https://www.vivli.org/.