Validation of venous thromboembolism diagnoses in patients receiving rivaroxaban or warfarin in The Health Improvement Network

Abstract
Purpose: To describe the effect that validation of venous thromboembolism (VTE) coded entries in The Health Improvement Network (THIN) has on incidence rates of VTE among a cohort of rivaroxaban/warfarin users.
Methods: Among 36 701 individuals with a first prescription for rivaroxaban/warfarin between 2012 and 2015, we performed a two‐step VTE case identification process followed by a two‐step case validation process involving manual review of patient records. A valid case required a coded entry for VTE at some point after the first rivaroxaban/warfarin prescription, with evidence of referral/hospitalization recorded either as a coded entry or as free text. Positive predictive values (PPVs) with 95% confidence intervals (CIs) were calculated using validated cases as the gold standard. Incidence rates were calculated per 1000 person‐years with 95% CIs.
Results: We identified 2166 patients with a coded entry of VTE after their initial rivaroxaban/warfarin prescription, giving an incidence rate of 45.31 per 1000 person‐years (95% CI: 43.49‐47.22). After manual review of patient records, including the free text, there were 712 incident VTE cases, giving an incidence rate of 14.90 per 1000 person‐years (95% CI: 13.85‐16.02). The PPV was 32.9% for coded entries of VTE alone and 39.8% for coded entries of VTE with a coded entry of referral/hospitalization; it increased to 69.6% after manual review of the coded clinical entries in patient records.
Conclusions: Among rivaroxaban/warfarin users in THIN, valid VTE case identification requires manual review of patient records, including the free text, to prevent outcome misclassification and substantial overestimation of VTE incidence rates.

surgery, and prophylaxis of recurrent VTE. 1 Since 2016, rivaroxaban has been the most common oral anticoagulant prescribed to patients in England with incident VTE. 2 Approval of rivaroxaban for VTE indications was based on data from randomized controlled trials (RCTs) [3][4][5] with strict inclusion/exclusion criteria. Thus far, data on the effectiveness of rivaroxaban for VTE indications among the broad spectrum of patients receiving the drug in routine clinical practice have come from patient registries, 6,7 claims database studies, 8 and observational field studies. 9,10 Databases of electronic health records (EHRs) are another appropriate source for conducting rivaroxaban effectiveness studies efficiently.
One such database, The Health Improvement Network (THIN), has been used extensively for pharmacoepidemiological research. It holds the pseudo-anonymized primary care EHRs of approximately 6% of the UK population, 11 who are broadly representative of the UK population in terms of demographics. 12 As of September 3, 2017, 1 million patients were actively contributing data through their THIN participating practices. 13 Data are recorded by the primary care practitioner (PCP) and other practice staff using Vision software, either during or after each consultation or retrospectively after information is received from secondary care by post or email. Diagnoses are entered as Read codes, the clinical classification system used by the UK's National Health Service. 14 When a Read code is entered, a comment box opens in which the PCP can freely enter associated details, such as a referral to hospital, symptoms, or factors relating to the diagnostic work-up; some of these details can also be entered as Read codes if an appropriate code exists.
It is recommended that outcome identification in primary care databases such as THIN include supportive evidence to validate the recorded diagnosis and avoid misclassification. 15,16 False negatives may arise when searches for Read codes supporting the diagnosis (eg, a code for a hospitalization) cover an overly restricted time interval, or when supporting information is recorded in the free text and these data are not accessed and reviewed. Conversely, false positives will occur if a coded entry is accepted as a case when the free text shows that it refers to a previous (historical) episode or confirms the absence of the event; this is an important consideration given the need for high case specificity in drug effectiveness/safety studies. However, access to free-text comments in THIN incurs an additional cost, and scrutiny of the comments is labor intensive. This study explored a stepwise validation process for VTE Read code entries in THIN among a cohort of oral anticoagulant users (new users of warfarin or rivaroxaban). The primary objective was to validate cases of VTE through a process involving review of coded clinical entries and free-text comments. The secondary objective was to describe the effect that including a validation step based on review of free-text comments (vs Read code entries only) has on incidence rates of major VTE events in this cohort.

| Study cohort and follow-up
Details of the study cohort including the age and sex distribution have been published previously. 17 Briefly, we included 36 701 individuals in THIN aged between 2 and 89 years with a first prescription for rivaroxaban or warfarin between January 1, 2012 and May 31, 2015.
Individuals were followed up from the date of their first rivaroxaban/warfarin prescription (start date) to identify the first VTE recorded after this prescription (individuals may therefore have had a VTE before the start of follow-up). Follow-up ended at the earliest of the following: the date of a Read code indicative or suggestive of deep vein thrombosis (DVT) or pulmonary embolism (PE) (see Table S1 for the code list), death, the date of the last data collection from their practice, or the end of the study period (May 31, 2015).
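The cohort entry criteria and follow-up rules above can be expressed as a short Python sketch. The `Patient` record and its fields are hypothetical and do not reflect THIN's actual table layout; the sketch also assumes that the age limits are inclusive and that person-time is counted in 365.25-day years, neither of which the text specifies.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

STUDY_START = date(2012, 1, 1)   # start of the entry period stated in the text
STUDY_END = date(2015, 5, 31)    # end of the study period stated in the text


@dataclass
class Patient:
    # Hypothetical fields for illustration only; not THIN's actual schema.
    age_at_first_rx: int
    first_rx: date                  # first rivaroxaban/warfarin prescription (start date)
    first_vte_code: Optional[date]  # first DVT/PE Read code recorded after first_rx, if any
    death: Optional[date]
    last_collection: date           # date of the last data collection from the practice


def is_eligible(p: Patient) -> bool:
    """Aged 2 to 89 (assumed inclusive) with a first prescription in the entry period."""
    return 2 <= p.age_at_first_rx <= 89 and STUDY_START <= p.first_rx <= STUDY_END


def end_of_follow_up(p: Patient) -> date:
    """Earliest of: first VTE Read code, death, last data collection, study end."""
    candidates = [p.first_vte_code, p.death, p.last_collection, STUDY_END]
    return min(d for d in candidates if d is not None)


def person_years(p: Patient) -> float:
    """Observed follow-up in years, assuming 365.25-day years."""
    return (end_of_follow_up(p) - p.first_rx).days / 365.25
```

Because follow-up stops at the first VTE Read code, each patient contributes at most one event, and the same person-time denominator serves every later validation step.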

| Case identification
Our operational definition of the first VTE event recorded during follow-up was a VTE event that led to referral to a specialist or to hospitalization, or that was recorded in the primary care record as the cause of death. We did not restrict the definition to cases with evidence of hospital admission because a previous VTE validation study in the Clinical Practice Research Datalink (CPRD; formerly the General Practice Research Database, and highly similar to THIN) found that approximately 20% of VTE cases confirmed by the PCP via paper questionnaires had no database entry indicative of hospital admission for the event. 18 As shown in Figure 1, VTE case identification and validation involved a four-step sequential process.
KEY POINTS
• Using only coded clinical information in the THIN database is insufficient to accurately identify incident cases of major venous thromboembolism (VTE).
• Manual review of patient records substantially increases the validity of VTE cases identified through algorithmic searches for coded diagnoses.
• Free-text comments in THIN commonly provide clinical information important for valid case identification.
• A manual review process including scrutiny of the free-text comments is a valid method to identify cases of VTE and avoid misclassification, especially to reduce false positives.
• Without manual review of free-text comments in THIN, incidence rates of VTE will be substantially overestimated.

Step 1 of the case identification process involved an automated computer search to identify patients with a Read code for VTE during follow-up. In Step 2, we performed automated computer searches of the EHRs of patients identified in Step 1 to identify those with a specific entry or Read code for a referral to secondary care and/or a hospitalization in the 15 days before or the 30 days after the VTE record. This window was chosen to maximize the sensitivity of our case definition, because referrals and hospitalizations are not always recorded on the same day as the clinical diagnosis in a patient's EHR. All patients identified in this step were considered potential cases of VTE.
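As a sketch of the automated Steps 1 and 2, the Python below keeps patients with a VTE Read code during follow-up and then retains those with a referral or hospitalization entry between 15 days before and 30 days after the VTE record. The input format (a mapping from a hypothetical patient identifier to the VTE code date and a list of referral/hospitalization entry dates) is an assumption, as is the treatment of both window boundaries as inclusive.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional, Sequence, Tuple

DAYS_BEFORE = 15  # window start relative to the VTE record (from the text)
DAYS_AFTER = 30   # window end relative to the VTE record (from the text)

# Hypothetical input: patient id -> (date of VTE Read code during follow-up or None,
#                                    dates of referral/hospitalization entries)
Records = Dict[str, Tuple[Optional[date], Sequence[date]]]


def in_window(vte_date: date, support_date: date) -> bool:
    """True if the entry falls 15 days before to 30 days after the VTE record (inclusive)."""
    start = vte_date - timedelta(days=DAYS_BEFORE)
    end = vte_date + timedelta(days=DAYS_AFTER)
    return start <= support_date <= end


def step1(records: Records) -> List[str]:
    """Step 1: patients with a VTE Read code during follow-up."""
    return sorted(pid for pid, (vte, _) in records.items() if vte is not None)


def step2(records: Records) -> List[str]:
    """Step 2: Step 1 patients with a referral/hospitalization entry in the window."""
    potential = []
    for pid in step1(records):
        vte_date, support = records[pid]
        if any(in_window(vte_date, d) for d in support):
            potential.append(pid)
    return potential
```

An asymmetric window (longer after than before the VTE record) reflects the text's point that referral and hospitalization entries often lag the clinical diagnosis.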

| Validation of VTE
In Step 3, we manually reviewed the coded entries in the primary care records of potential VTE cases identified in Step 2 to confirm the diagnosis and to confirm that the referral/hospitalization was related to the VTE event. For potential cases retained after Step 3, we then requested and accessed the free-text comments in the patients' primary care records for further manual review and final confirmation of the VTE event (Step 4). To undertake this review efficiently, we accessed only the free-text comments entered within 15 days either side of the event, plus any comments attached to an entry of DVT/PE, hospitalization, or referral in the 16 to 180 days after the event. These comments often contain information on referrals and details from hospital discharge letters describing the clinical evaluation and tests performed (eg, radiology tests and reports), as well as information from death certificates. A first manual review of these free-text comments was performed independently by one researcher (AR) to confirm VTE case status and to establish whether the text referred to a current or a previous event. Potential cases were then grouped according to three main characteristics: (a) the interval between the date of the first rivaroxaban/warfarin prescription and the date of the recorded VTE, (b) the indication for anticoagulation (VTE or other), and (c) whether a recorded hospitalization or a referral to a specialist led to the case being confirmed. For cases whose status remained unclear after this review, a second independent review was performed by another researcher (LAGR), and consensus on case status was reached through discussion.
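The rule for selecting which free-text comments to request can be sketched as a simple filter. The `Comment` record and the labels in `ATTACHED_TYPES` are hypothetical, and the sketch assumes that the 16-to-180-day window applies to comments attached to DVT/PE entries as well as to hospitalization or referral entries. It only narrows which comments are read; confirming case status remained a manual judgment.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Hypothetical category labels for the coded entry a comment is attached to.
ATTACHED_TYPES = {"DVT/PE", "hospitalization", "referral"}


@dataclass
class Comment:
    entered: date
    attached_to: Optional[str]  # category of the attached coded entry, or None
    text: str


def comments_to_review(event: date, comments: List[Comment]) -> List[Comment]:
    """Select free-text comments for manual review around a potential VTE event."""
    selected = []
    for c in comments:
        offset = (c.entered - event).days  # negative = entered before the event
        if abs(offset) <= 15:
            # Any comment within 15 days either side of the event.
            selected.append(c)
        elif 16 <= offset <= 180 and c.attached_to in ATTACHED_TYPES:
            # Later comments only if attached to a DVT/PE, hospitalization, or referral entry.
            selected.append(c)
    return selected
```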

| Statistical analysis
For cases identified at each step, we calculated the incidence rate of VTE per 1000 person-years with a 95% confidence interval (CI) as the number of first VTE cases identified during follow-up divided by the corresponding observed person-years, multiplied by 1000. The incidence rate of final confirmed hospitalized/referred VTE cases was also calculated according to referral/hospitalization case status. We calculated the positive predictive value (PPV) of each step as the number of patients with a diagnosis confirmed at final confirmation (Step 4) divided by the number of patients with a VTE diagnosis identified at that step, expressed as a percentage with a 95% CI. As a post-hoc analysis, we performed stratified analyses calculating the confirmation rate according to age at the start of follow-up and other patient characteristics.

In stratified analyses, the confirmation rate varied across patient characteristics (Table 3). Among all potential VTE cases reviewed, the confirmation rate was higher among males, older patients (aged ≥60 years), those with a history of VTE (ie, recurrent events), and those whose first rivaroxaban/warfarin prescription was for VTE rather than another indication (eg, atrial fibrillation). A higher confirmation rate was also seen among patients whose VTE event was a PE rather than a DVT, among patients hospitalized for their VTE, and when the time between the first rivaroxaban/warfarin prescription and the VTE event was >90 days. Two Read codes, G801.11 and G401.00, were responsible for identifying close to 85% of all confirmed VTE cases.

We have previously shown the benefit of accessing and reviewing the data in free-text comments to validate cases of major gastrointestinal and urogenital bleeding events in our cohort of anticoagulant users, with incidence rates overestimated more than two-fold when this process was not undertaken. 17 Studies of other clinical conditions in THIN have similarly highlighted the benefit of manually reviewing patient records, especially the free text, for case validation. [25][26][27][28] A strength of our study is its broad study population: inclusion was based on a first prescription for rivaroxaban or warfarin, and all VTE events, whether first or recurrent, were included. Manual review of the free-text comments not only recovered previously "hidden" data but also avoided missing crucial information, as could happen with algorithmic searches of the free text for specific text strings. Furthermore, our operational code lists for DVT/PE in Step 1 and for referral/hospitalization in Step 2 were broad in order to maximize the sensitivity of our case identification process.
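The incidence rate, PPV, and stratified confirmation-rate calculations described under Statistical analysis can be sketched in Python as follows. The text does not state which confidence-interval methods were used, so the log-scale (Wald) interval for rates and the Wilson score interval for proportions are assumptions, and the function names are illustrative.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Z = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def incidence_rate_per_1000(cases: int, person_years: float) -> Tuple[float, float, float]:
    """Rate per 1000 person-years with an assumed log-scale (Wald) 95% CI."""
    if cases <= 0 or person_years <= 0:
        raise ValueError("needs at least one case and positive person-time")
    rate = cases / person_years * 1000
    se_log = 1 / math.sqrt(cases)  # SE of log(rate) for a Poisson count
    return rate, rate * math.exp(-Z * se_log), rate * math.exp(Z * se_log)


def ppv_percent(confirmed: int, identified: int) -> Tuple[float, float, float]:
    """PPV (%) of one step with an assumed Wilson score 95% CI."""
    n = identified
    p = confirmed / n
    denom = 1 + Z ** 2 / n
    centre = (p + Z ** 2 / (2 * n)) / denom
    half = Z * math.sqrt(p * (1 - p) / n + Z ** 2 / (4 * n ** 2)) / denom
    return 100 * p, 100 * (centre - half), 100 * (centre + half)


def confirmation_rate_by_stratum(reviewed: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Percentage of reviewed potential cases confirmed within each stratum label."""
    totals: Dict[str, int] = defaultdict(int)
    confirmed: Dict[str, int] = defaultdict(int)
    for stratum, is_confirmed in reviewed:
        totals[stratum] += 1
        confirmed[stratum] += int(is_confirmed)
    return {s: 100 * confirmed[s] / totals[s] for s in totals}
```

For example, with the counts from the abstract (712 confirmed cases out of 2166 patients with a coded VTE entry), `ppv_percent(712, 2166)` returns a PPV of about 32.9%, in line with the reported PPV for coded entries alone; the interval it returns may differ from the published one because the CI method here is assumed.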

| DISCUSSION
Manual review of patient records including the free-text comments is costly and labor intensive, and requires the reviewer to have

ETHICS STATEMENT
The study protocol was approved by an Independent Scientific Research Committee for THIN (reference THIN14-018). No individual patient consent was required because the study used de-identified data provided by patients as part of their routine primary care.