Novel analysis of clinically relevant diagnostic errors in point-of-care devices

Authors


Kenneth M. Shermock, Center for Pharmaceutical Outcomes and Policy, 600 N. Wolfe Street/Carnegie 180, Baltimore, MD 21287, USA.
Tel.: +1 410 502 7674; fax: +1 410 502 0287.
E-mail: kenneth@jhmi.edu

Abstract

Summary. Background: To ensure proper clinical decision-making and avoid preventable harm, the quality of point-of-care (POC) device measures is routinely assessed. Traditional analyses may not reveal clinically important diagnostic errors. Objectives: To compare results between a novel analytic framework and traditional analyses. Methods: Patients in four anticoagulation clinics provided two measures of the International Normalized Ratio (INR) at the same visit as part of routine quality assurance: one via a venous sample assessed by the core laboratory, and one via a fingerstick sample assessed with Hemochron POC devices. Traditional, quarterly, quality assurance assessments emphasized correlation analysis. The novel analysis used enhanced graphics and a validated assessment of clinical decision-making. Results: 1518 paired INRs were analyzed. The correlation between the POC and laboratory assessments ranged between 0.84 and 0.91. Traditional quality assurance showed that the Hemochron devices were acceptable for continued use in each quarterly analysis. Enhanced graphical analysis demonstrated that the Hemochron devices never reported seven common INR values. The Hemochron devices systematically inflated values < 3 and deflated values > 4, biasing results towards the target INR range. Consequently, the Hemochron devices led to a different clinical decision than the clinical laboratory measure in 31% of cases (458/1466; 95% confidence interval [CI] 29–34). When the reference INR was low, the Hemochron devices would not result in appropriate dose increases in 52% of cases (95% CI 48–56), placing these patients at risk for a significant adverse drug event. Conclusions: Our novel, clinically relevant analysis revealed previously undetected deficiencies in our POC INR devices, and our approach should be adopted by industry, regulators, and institutions.

Introduction

To ensure proper clinical decision-making and avoid preventable harm, the quality of clinical measures is routinely evaluated with the use of method comparison studies [1–4]. These assessments determine the level of agreement between different measures of the same clinical parameter, typically comparing a reference standard with an alternative measure of the same analyte. Regulatory requirements and authoritative guidance documents recommend a variety of statistical methods for comparison of measures, such as linear regression, correlation analysis, and assessment of bias [4–9]. The authoritative Stockholm Conference hierarchy suggests that the optimal way to evaluate the quality of a clinical measure is to determine its impact on clinical decision-making [2,3,10,11]. However, no current regulations require analyses to incorporate clinical decision-making in their assessments of alternative measures.

Clinical decisions for patients treated with warfarin are commonly determined on the basis of measurements of the International Normalized Ratio (INR) from point-of-care (POC) devices. The performance of these devices is routinely assessed by determining the level of agreement between the POC devices and a local reference standard. Published papers have used a diverse array of techniques to assess agreement between INR measures. A standard developed in the UK determines that INR measures do not agree if they are not within 15% of each other [12–15]. However, methods that use percentage difference to assess agreement have been shown to correspond poorly with clinical decision-making [16]. Our group has conducted an assessment of the impact of differences in INR measures on clinical decision-making [17]. In that analysis, 61% of INR measures that were > 15% apart actually led to the same clinical decision [16,17]. This highlights the need for alternative methods that correspond more closely to clinical decision-making.

Recently, we developed and validated a method to assess when alternative measures lead to different clinical decisions for oral anticoagulation therapy [16,18]. The method has been advocated by methodologic and regulatory leaders [3,19,20]. However, the extent to which the new method produces different inferences regarding the quality of INR measures is unknown. Therefore, we used this new method, combined with other simple analytic improvements, to perform an enhanced quality assurance analysis of POC INR devices. Our aim was to compare inferences and overall conclusions about the quality of POC INR measures, using two analytic frameworks for quality assurance: one based on current regulation and common practice; and the other developed by our team to provide clinically relevant information.

Methods

Patients and study measurements

Patients were enrolled in one of four outpatient oral anticoagulation clinics affiliated with our institution from January 2006 to March 2008. The INR measurements used in this study were previously collected as part of an ongoing quality assurance program. In this program, a convenience sample composed of a majority of patients in their first 4 weeks of oral anticoagulation therapy were asked to provide one venous blood sample assessed by our core laboratory and one fingerstick sample assessed by a POC device during the same clinic visit. Prothrombin times were measured on venous blood samples with a highly sensitive (International Sensitivity Index 1.0) recombinant human thromboplastin (Innovin; Siemens Healthcare Diagnostics, Deerfield, IL, USA) on the BCS coagulation analyzer (Siemens Healthcare Diagnostics). Calibration of the reference laboratory was carried out according to World Health Organization (WHO) guidelines [21]. Annual external calibration of the reference laboratory was conducted with certified plasmas supplied by Siemens Healthcare Diagnostics that are traceable to the 3rd WHO international reference preparation, recombinant human plain rTF/95 thromboplastin. Quarterly proficiency testing was conducted by the reference laboratory by using analyte supplied as part of the College of American Pathologists quality assurance program for coagulation testing. The Hemochron Jr and Hemochron Signature Elite (International Technidyne Corporation, Edison, NJ, USA) were used to perform POC INR assessment. Most of the anticoagulation clinics in our system switched from the Hemochron Jr to the Hemochron Signature Elite in early 2007. In accordance with standard operating procedure, all of the clinics performed daily electronic quality control procedures on the POC devices. Additionally, whole blood normal and abnormal controls were conducted weekly. 
Our anticoagulation clinics followed the Hemochron Jr and Hemochron Signature Elite Operator’s Manual procedures for performing POC INR tests. All clinic personnel were required to pass annual proficiency tests on the POC device to work in the clinic. In addition, proficiency testing with whole blood samples from an independent Clinical Laboratory Improvement Amendments (CLIA)-approved proficiency survey program was performed three times annually.

Traditional quality analysis

The core laboratory conducted quarterly, traditional quality assurance analysis with data collected during that quarter. These analyses featured linear regression, correlation, analysis of average bias, and Bland–Altman-style difference plots, consistent with current regulations and guidance [4,5,22]. The core laboratory used EP Evaluator software (Data Innovations, LLC, South Burlington, VT, USA) to conduct these analyses. The proportion of POC INR measures that would result in the same clinical decision as the core laboratory INR was estimated according to the criteria in Table 1.

Table 1.   Differences between INR measures used to define agreement by the core laboratory

| Laboratory INR | Difference between INRs that defined clinical agreement |
| --- | --- |
| < 2.5 | < 0.5 INR units |
| 2.5–4.5 | < 1.0 INR unit |
| > 4.5 | < 2.0 INR units |

INR, International Normalized Ratio.
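The criteria in Table 1 can be expressed as a simple rule. The following is a minimal sketch, not the core laboratory's actual software: the function name `core_lab_agreement` is hypothetical, and rounding the difference to one decimal place assumes that INR values are reported to one decimal.

```python
def core_lab_agreement(lab_inr: float, poc_inr: float) -> bool:
    """Return True if a lab/POC pair meets the Table 1 agreement criteria."""
    # Assumption: INRs are reported to one decimal, so the difference is
    # rounded to one decimal to avoid floating-point noise at the boundaries.
    diff = round(abs(poc_inr - lab_inr), 1)
    if lab_inr < 2.5:
        limit = 0.5
    elif lab_inr <= 4.5:
        limit = 1.0
    else:
        limit = 2.0
    # Table 1 defines agreement as a difference strictly below the limit.
    return diff < limit


# Illustrative pairs (made-up values, not study data).
print(core_lab_agreement(1.6, 2.0))  # difference 0.4, limit 0.5
print(core_lab_agreement(1.6, 2.1))  # difference 0.5, limit 0.5
```

Because the allowed difference depends only on the magnitude of the laboratory INR, a pair can satisfy this rule even when the two values sit on opposite sides of a dosing threshold.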

Clinically relevant quality analysis

Our enhanced assessment of the paired INR values began by including all available data in a single analysis. Once these data were assembled and checked for completeness and accuracy, we applied principles of excellent graph and table design [23,24]. Overarching principles guiding our displays were to visualize all the data, to eliminate non-information in our graphs, and to show clinically relevant comparisons between the different INR pair measures [24].

Graphical analysis included XY scatter plots of the INR measures, difference plots, and construction of histograms of the frequency with which each INR value was observed by the POC device and the laboratory. These histograms were then oriented in the same graphical space to create ‘adjacent histograms’ to allow for a direct comparison of the frequency of all observed INR values.

Statistical analyses were conducted with Shermock’s method, a previously validated method that corresponds closely to oral anticoagulation clinical decision-making [18]. Shermock’s method predicts that clinical decisions will agree if INR measures fall within the same clinically relevant range (Table 2).

Table 2.   Clinically important INR ranges and associated clinical decisions used to define agreement between INR measures according to Shermock's method

| INR range | Associated clinical decision |
| --- | --- |
| INR below 1.9 | Increase warfarin dose |
| INR 1.9–3.3 | Maintain warfarin dose |
| INR 3.4–5.5 | Decrease warfarin dose |
| INR 5.6–9.0 | Decrease/hold warfarin; consider vitamin K if significant bleeding |
| INR > 9.0 | Hold warfarin and administer vitamin K |

INR, International Normalized Ratio.

INR measures were said to disagree if they were not located in the same INR range. We calculated risk ratios (RRs) with 95% confidence intervals (CIs) of the proportion of POC device measures that would lead to different clinical decisions than the laboratory measures.
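The range-based definition of agreement can be sketched as follows. This is an illustration of the rule in Table 2 rather than the authors' Stata code; the function names and decision labels are hypothetical, and the sketch assumes that INR values are reported to one decimal place, so that the ranges in Table 2 leave no gaps.

```python
def inr_decision(inr: float) -> str:
    """Map an INR value to its clinically relevant range (Table 2)."""
    value = round(inr, 1)  # assumption: INRs are reported to one decimal
    if value < 1.9:
        return "increase dose"
    if value <= 3.3:
        return "maintain dose"
    if value <= 5.5:
        return "decrease dose"
    if value <= 9.0:
        return "decrease/hold; consider vitamin K"
    return "hold; administer vitamin K"


def decisions_agree(lab_inr: float, poc_inr: float) -> bool:
    """Two measures agree if they fall within the same INR range."""
    return inr_decision(lab_inr) == inr_decision(poc_inr)


# Illustrative pairs (made-up values, not study data).
print(decisions_agree(2.0, 2.6))  # both in the 1.9-3.3 range
print(decisions_agree(1.8, 1.9))  # ranges differ
```

The two illustrative pairs show why numeric distance and clinical agreement can diverge: 2.0 and 2.6 differ by more than 15% yet fall in the same range, whereas 1.8 and 1.9 differ by only 0.1 INR units but straddle a dosing threshold.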

Analyses and graphics were produced by author K.M.S. with Stata, version 11 (StataCorp, College Station, TX, USA). A P-value of < 0.05 was considered to be statistically significant. We received approval from our local Institutional Review Board to conduct this research.

Results

A total of 1518 paired INRs were collected in the four anticoagulation clinics affiliated with our institution from January 2006 to March 2008. Of these, 1433 paired observations were included in linear regression and correlation analyses, and 1466 were included in assessments of clinical agreement (1381 were included in both, 85 only in the clinical agreement analyses, and 52 only in the linear regression and correlation analyses).

Traditional quality analysis

Figure 1 shows the nine linear regression plots that were generated during the quarterly quality assurance analyses. Regression parameters from these analyses are listed in Table 3. Notably, the correlation coefficients between the Hemochron Jr and Hemochron Signature Elite and the laboratory ranged between 0.84 and 0.91. Six of the nine calculated correlation coefficients were 0.89 or greater. All regression slopes were significantly < 1 (P < 0.05), and y-intercepts were significantly different from zero (P < 0.05). The clinical decision analyses produced agreement estimates of between 66% and 94%. Agreement exceeded 83% for seven of the nine quarterly analyses, and was at least 86% for five of the analyses. The conclusion by the core laboratory after each of the nine quarterly analyses was that the POC devices were acceptable for continued use in our anticoagulation clinics.

Figure 1.

 Linear regression plots of the core laboratory International Normalized Ratio (INR) and the Hemochron point-of-care (POC) INR values. The solid line in each scatterplot represents the line of equality, the line where POC INR = core laboratory INR. The dashed line in each scatter plot represents the best-fit straight line from linear regression analysis.

Table 3.   Linear regression parameters from plots in Fig. 1

| Quarter and year (n) | Coefficient of correlation (95% CI) | Slope (95% CI) | Y-intercept (95% CI) |
| --- | --- | --- | --- |
| 1Q2006 (243) | 0.91 (0.88–0.93) | 0.94 (0.89–1.0) | 0.63 (0.49–0.77) |
| 2Q2006 (139) | 0.84 (0.79–0.89) | 0.78 (0.7–0.86) | 0.90 (0.69–1.12) |
| 3Q2006 (163) | 0.87 (0.83–0.91) | 0.85 (0.78–0.92) | 0.71 (0.5–0.91) |
| 4Q2006 (132) | 0.91 (0.87–0.93) | 0.75 (0.69–0.81) | 0.91 (0.74–1.07) |
| 1Q2007 (135) | 0.9 (0.86–0.93) | 0.81 (0.74–0.88) | 0.82 (0.64–1.01) |
| 2Q2007 (161) | 0.85 (0.8–0.89) | 0.56 (0.51–0.62) | 1.34 (1.15–1.52) |
| 3Q2007 (145) | 0.89 (0.85–0.92) | 0.66 (0.6–0.71) | 1.09 (0.92–1.26) |
| 4Q2007 (154) | 0.89 (0.85–0.92) | 0.67 (0.61–0.73) | 1.09 (0.94–1.24) |
| 1Q2008 (161) | 0.91 (0.88–0.93) | 0.66 (0.61–0.71) | 0.89 (0.75–1.03) |

CI, confidence interval.

Clinically relevant quality analysis

In our enhanced analysis, all 1433 paired observations used in the previous regression analyses were combined in the same analysis for the first time. Linear regression analysis produced a slope of 0.71 (95% CI 0.69–0.73; P < 0.0001, test that slope = 1), and a y-intercept of 1.01 (95% CI 0.95–1.07; P < 0.0001, test that y-intercept = 0). The correlation between the POC device and core laboratory INR measures was 0.87 (95% CI 0.86–0.88). In an XY scatterplot of the data, horizontal white lines were noted in the cloud of data points that were not observed in the traditional quality assurance analyses.

The plot of the differences between the measures against the laboratory value of all 1433 paired observations demonstrated that the POC devices systematically inflated INR values < 3 and deflated values > 4, biasing results towards the target INR range (Fig. 2). This plot also revealed parallel white lines traversing the cloud of data points, similar to those observed in the XY scatterplot.
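The direction of this bias follows from the pooled regression estimates reported above (slope 0.71, y-intercept 1.01). The sketch below assumes that the POC INR was regressed on the laboratory INR, which is consistent with the reported pattern but is not stated explicitly; the function names are hypothetical.

```python
SLOPE = 0.71      # pooled regression slope reported in the text
INTERCEPT = 1.01  # pooled regression y-intercept reported in the text


def predicted_poc(lab_inr: float) -> float:
    """POC INR predicted by the pooled fit (assumes POC regressed on lab)."""
    return INTERCEPT + SLOPE * lab_inr


def predicted_bias(lab_inr: float) -> float:
    """Predicted POC minus laboratory INR; positive means inflation."""
    return predicted_poc(lab_inr) - lab_inr


# The fitted line crosses the line of equality where
# INTERCEPT + SLOPE * x = x, i.e. x = INTERCEPT / (1 - SLOPE).
crossover = INTERCEPT / (1 - SLOPE)

print(round(crossover, 2))
for lab in (1.5, 2.5, 3.5, 5.0):
    print(lab, round(predicted_bias(lab), 2))
```

Under this fit, the predicted bias is positive below a laboratory INR of roughly 3.5 and negative above it, which matches the observed pull of POC values towards the target range.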

Figure 2.

 Plot of the difference between the point-of-care (POC) and standard (core laboratory) International Normalized Ratio (INR) measures against the standard (core laboratory) INR measures for all 1433 paired INR measures. The data points are reduced in size to increase the data/ink ratio. The white stripes that correspond to INR values that are never reported by the Hemochron Jr and Hemochron Signature Elite POC devices can be seen. A systematic pattern of bias is also observed: the Hemochron devices tend to inflate INR values < 3 and deflate INR values > 4.5.

The adjacent histogram distributions were not mirror-like; low INR values appeared much more commonly when measured by the core laboratory, whereas the POC device INR values were skewed towards the target INR range of 2–3 (Fig. 3). The midpoint of the target INR range, 2.5, was the most common value produced by the Hemochron Jr and Hemochron Signature Elite, accounting for 8.3% of values produced. Several bars appeared to be ‘missing’ in the POC device histogram. These bars corresponded to the white lines observed in the XY scatterplot and difference plot; they indicated that the Hemochron POC devices never reported seven common INR values: 2.1, 2.7, 3.1, 3.5, 3.8, 4.1, and 4.4.
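The same check can be made numerically by tabulating how often each one-decimal INR value is reported by each source. The following sketch uses made-up values, not study data, and hypothetical function names; it lists values that the laboratory reported but the POC device never did, within the span of values the POC device produced.

```python
from collections import Counter


def to_tenths(inr: float) -> int:
    """Represent a one-decimal INR as an integer number of tenths."""
    return int(round(inr * 10))


def missing_poc_values(lab_inrs, poc_inrs):
    """INR values seen by the laboratory but never reported by the POC device."""
    lab_counts = Counter(to_tenths(v) for v in lab_inrs)
    poc_counts = Counter(to_tenths(v) for v in poc_inrs)
    lowest, highest = min(poc_counts), max(poc_counts)
    return [
        tenths / 10
        for tenths in range(lowest, highest + 1)
        if poc_counts[tenths] == 0 and lab_counts[tenths] > 0
    ]


# Made-up illustrative data.
lab = [2.0, 2.1, 2.1, 2.2, 2.7, 3.0]
poc = [2.0, 2.2, 2.2, 2.6, 2.8, 3.0]
print(missing_poc_values(lab, poc))
```

Counting in integer tenths avoids floating-point mismatches when values such as 2.1 are used as keys. With a large sample, a value that is common in the laboratory data but absent from the POC data is unlikely to be missing by chance.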

Figure 3.

 Adjacent histograms of International Normalized Ratio (INR) values from the core laboratory and Hemochron point-of-care (POC) devices. Seven commonly occurring INR values were never reported by the Hemochron devices (2.1, 2.7, 3.1, 3.5, 3.8, 4.1, and 4.4). Additionally, the histograms reveal markedly different distributions of INR values. The Hemochron devices tended to report more INR values in the target INR range of 2–3; the laboratory reported more INR values below 2.

On assessment of the 1466 paired INR measures used in the traditional analysis of agreement, the core laboratory was twice as likely as the POC devices to report an INR suggesting that a dose increase was required (n = 522 vs. n = 256; RR 2.0; 95% CI 1.8–2.3; Table 4). The Hemochron Jr and Hemochron Signature Elite were 30% more likely than the laboratory to report an INR in the target range (n = 877 vs. n = 679; RR 1.3; 95% CI 1.2–1.4). The POC devices were also significantly more likely than the laboratory to report an INR above the target range (n = 333 vs. n = 265; RR 1.3; 95% CI 1.1–1.5).
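The risk ratios above can be approximated from the reported counts. The text does not state which interval formula was used; the sketch below assumes the standard log-scale (Katz) interval for two independent proportions, which ignores the paired design and should therefore be read as an approximation. The function name is hypothetical.

```python
import math


def risk_ratio(events_a: int, events_b: int, n: int, z: float = 1.96):
    """Risk ratio of a vs. b with a log-scale 95% CI.

    Assumption: both groups have n observations and are treated as
    independent, which ignores the pairing of the INR measures.
    """
    rr = (events_a / n) / (events_b / n)
    se_log = math.sqrt(1 / events_a - 1 / n + 1 / events_b - 1 / n)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Counts taken from the text: laboratory vs. POC for dose increase,
# then POC vs. laboratory for in-range and above-range values.
for label, a, b in [
    ("dose increase (lab vs. POC)", 522, 256),
    ("in range (POC vs. lab)", 877, 679),
    ("above range (POC vs. lab)", 333, 265),
]:
    rr, lo, hi = risk_ratio(a, b, 1466)
    print(f"{label}: RR {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Under these assumptions the computed intervals round to the values reported in the text, although a method that accounts for pairing could yield somewhat different limits.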

Table 4.   Cross-tabulation of the number and percentage of International Normalized Ratio (INR) values reported by the Hemochron point-of-care (POC) devices and the clinical laboratory, by clinically relevant INR range as defined by Shermock’s method

Rows show the Hemochron POC INR; columns show the laboratory INR.

| Hemochron POC INR | Suggest warfarin dose increase (INR < 1.9) | Suggest no change in warfarin dose (INR 1.9–3.3) | Suggest warfarin dose decrease (INR 3.4–5.5) | Suggest warfarin dose decrease and consider vitamin K (INR 5.6–9) | Hold warfarin and administer vitamin K (INR > 9) | Total, no. |
| --- | --- | --- | --- | --- | --- | --- |
| Suggest warfarin dose increase (INR < 1.9) | 248 (48)* | 7 (1) | 1 (< 1) | 0 (0) | 0 (0) | 256 (17) |
| Suggest no change in warfarin dose (INR 1.9–3.3) | 273 (52) | 571 (84)* | 32 (15) | 1 (2) | 0 (0) | 877 (60) |
| Suggest warfarin dose decrease (INR 3.4–5.5) | 1 (< 1) | 100 (15) | 171 (81)* | 28 (65) | 1 (10) | 301 (22) |
| Suggest warfarin dose decrease and consider vitamin K (INR 5.6–9) | 0 (0) | 1 (< 1) | 8 (4) | 14 (33)* | 5 (50) | 28 (1) |
| Hold warfarin and administer vitamin K (INR > 9) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 4 (40)* | 4 (< 1) |
| Total | 522 (100) | 679 (100) | 212 (100) | 43 (100) | 10 (100) | 1466 |

Data are shown as no. (%). Percentages indicate the proportion of values within each column. For example, when the laboratory INR suggested an increase in the warfarin dose, the POC devices suggested that no dose change was required in 52% of cases. Values marked * (the diagonal) indicate instances where clinical decisions were predicted to agree.

We estimated that the Hemochron Jr and Hemochron Signature Elite would lead to a different clinical decision than that of the clinical laboratory in 31% of cases (458/1466; 95% CI 29–34). Most (60%, 273/458) instances of disagreement were cases when the POC devices suggested that no dose change was necessary, and the core laboratory INR suggested that a dose increase was required (Table 4). In another 22% (100/458) of cases of disagreement, the POC devices’ INR suggested a decrease in dose and the clinical laboratory no dose change. When the core laboratory reported an INR suggesting that a dose increase was required (n = 522), the Hemochron Jr and Hemochron Signature Elite suggested no change in dose in over half of cases (n = 273; 52%; 95% CI 48–56).
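The overall disagreement estimate can be recovered from the cross-tabulation in Table 4 by summing the off-diagonal cells. In the sketch below, the cell for a POC dose decrease with a laboratory value in the maintenance range is entered as 100, following the 100/458 stated in the text and the column total of 679. The Wald interval is an assumption, as the text does not state which interval method was used, and the variable and function names are hypothetical.

```python
import math

# Rows: POC decision category; columns: laboratory decision category.
# Order: increase, maintain, decrease, decrease + consider vitamin K, hold.
CROSSTAB = [
    [248,   7,   1,  0, 0],
    [273, 571,  32,  1, 0],
    [  1, 100, 171, 28, 1],  # 100 per the text (100/458) and column total 679
    [  0,   1,   8, 14, 5],
    [  0,   0,   0,  0, 4],
]


def wald_ci(events: int, n: int, z: float = 1.96):
    """Normal-approximation (Wald) 95% CI for a proportion (assumed method)."""
    p = events / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


total = sum(sum(row) for row in CROSSTAB)
agree = sum(CROSSTAB[i][i] for i in range(len(CROSSTAB)))
disagree = total - agree
low, high = wald_ci(disagree, total)

print(total, agree, disagree)
print(f"{disagree / total:.1%} (95% CI {low:.1%} to {high:.1%})")
```

The diagonal holds the pairs for which both measures fall in the same range, so every other cell represents a pair predicted to lead to different clinical decisions.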

The conclusion of this research team and, ultimately, the institution was that the performance of the Hemochron Jr and Hemochron Signature Elite was unacceptable. The institution decided to find replacement POC devices.

Discussion

Our enhanced analysis revealed previously undetected deficiencies in the POC INR devices used in our anticoagulation clinics that could result in preventable patient harm. First, we estimated that the Hemochron Jr and Hemochron Signature Elite led clinicians to an incorrect clinical decision in nearly one-third of cases. Failure to increase the warfarin dose when the INR is low could result in a preventable thromboembolic event, and failure to reduce the warfarin dose when the INR is high could result in preventable bleeding. Additionally, unbeknown even to the clinicians who used the devices daily, the POC devices failed to produce seven common INR values. These findings constituted powerful evidence that led to the reversal of the core laboratory’s previous conclusions – the POC devices were declared to be unacceptable for continued use in our anticoagulation clinics. We removed these POC devices from all anticoagulation clinics at our institution.

It is noteworthy that markedly different inferences were drawn from the two analytic frameworks when the same set of data was analyzed. Traditional quality assurance supported continued use of the POC devices, whereas our novel analysis led us to remove the devices from our clinics. The dramatic difference in conclusions is based on the disparate analytic frameworks. Despite warnings in the medical literature against using correlation as a measure of agreement, the traditional analysis, consistent with regulatory guidance, focused on correlation [1,25,26]. This analytic approach is so ingrained in our culture that clinicians and laboratorians call the process of taking paired INR measures from patients for quality assurance assessment ‘doing our correlations’. Consequently, it is not surprising that, although there were contrary signals in the traditional analyses (e.g. slopes significantly different from 1, y-intercepts significantly different from 0, positive overall bias), the high correlation between the Hemochron Jr and Hemochron Signature Elite and our laboratory (r = 0.87) probably drove the decision to endorse the devices.

This study builds on work that has been ongoing for decades around the issue of quality assessment of INR values [14,16,18,27–33]. Our techniques add to this work by providing explicit, valid information about how measures impact on clinical decisions [18]. Previous authors have advocated that clinical decision-making should be the basis for comparing measures [2,3,11]. However, to our knowledge, this approach is exceedingly rare. This is the first report that our new analytical method can result in dramatically different conclusions regarding the quality and acceptability of a clinical measure. We agree strongly with recent publications that have advocated increased use of our method at the regulatory and institutional levels [3,19,20].

Our enhanced analytic framework can be applied to any case where agreement needs to be assessed between alternative measures of INR – even disagreement between laboratories. It uses the same set of assumptions as any method comparison analysis – there is no requirement for one measure to be considered a ‘gold standard’. For any set of alternative INR measures, it estimates the proportion of clinical decisions that will be identical. If a gold standard, or true INR, is the reference value, then such an analysis provides information about true bias. In most cases, there is no gold standard at the institutional level. In these analyses, as is the case in our current analysis, the alternative measure is compared with an accepted local standard measure. Method comparison assessments are conducted to assess agreement between alternative measures available within the clinical setting. As the fundamental expectation is that the alternative measures will lead to the same clinical decision as the accepted standard, our method can be just as easily applied in these situations.

There was a sharp contrast between the core laboratory’s estimate of agreement (approximately 83% overall) and ours (69%). Our method is the only one that we know that has been validated against actual clinical decisions. We have previously demonstrated that our method is superior to alternative techniques that were developed without stated rationales or were based solely on the magnitude of numeric differences between the measures [16,34]. Other elements of our analysis were also decisive. The simple act of bringing all available data together into one analysis enhanced the visualization of patterns and increased the statistical power. Finally, the intensive, multistage graphical analysis improved our understanding of the data; each image produced distinct insights.

Our findings also have implications for clinical researchers. The clinical laboratory was twice as likely to report an INR below the target range, and the POC devices were much more likely to report a value within the range. Therefore, clinical trials that use different sources for INR measures between study groups risk introducing a significant source of measurement bias.

There are caveats and limitations that should be considered regarding this research. We did not assess the impact of differences in decision-making on clinical outcomes. Given the strong, well-known relationship between INR and clinical events [35,36], we view such a study to be, perhaps, unnecessary. Our study population consisted primarily of individuals in the early stages of oral anticoagulation therapy; however, the data collected during our routine quality assessment do not allow us to identify them in this analysis. Evidence exists that INR measures stabilize over the course of many weeks. Although the INR measures were considered to be valid by clinicians at our institution, these results may not apply to institutions that would somehow adjust INR values during the initial stages of therapy. The impact that the Hemochron Jr and Hemochron Signature Elite bias pattern would have on a population with a different distribution of core laboratory INR values is unknown. However, the bias pattern seen in Fig. 2 suggests that there would be substantial amounts of disagreement virtually anywhere on the INR scale. Our anticoagulation clinics switched from the Hemochron Jr to the Hemochron Signature Elite in early 2007. We conducted separate analyses for each device, and found no substantive differences in our conclusions. If one is astute enough to find it, the fact that the Hemochron devices do not report seven INR values can be found in the product labeling, following several foreign language sections. Nevertheless, the company product information representative whom we contacted denied knowledge of this phenomenon. Furthermore, no clinician that we have encountered is aware of this issue. However, once they are made aware, clinicians become very uncomfortable with the device (a phenomenon that is unacceptable), especially as there is no clear evidence in the histograms that a simple rounding phenomenon is occurring (i.e. in Fig. 3, there is no bar for 2.1, but neither the 2.0 nor the 2.2 bar appears to be overrepresented). Finally, this analysis was limited to the assessment of two POC devices and one laboratory. Other studies have used multiple laboratories, and found greater discrepancy between laboratories than between POC devices and laboratories [37]. No attempt to do this was made in this assessment.

In conclusion, our novel, clinically relevant analytic approach revealed previously undetected, but serious, deficiencies in our POC INR devices. This approach is simple to implement, and provides unique, clinically relevant insight. Our approach should be adopted by industry, regulators, and local institutions.

Addendum

K. M. Shermock was responsible for the conception and design of this study, analysis of the data and interpretation of the results, drafting and revising the manuscript and approval of the final version. M. B. Streiff, B. L. Pinto, P. Kraus, and P. J. Pronovost were responsible for acquisition of study data, analysis of the data and interpretation of the results, drafting and revising the manuscript and approval of the final version.

Acknowledgements

There were no external sponsors for this work. The Johns Hopkins Hospital ‘funded’ this work by paying the salaries of several authors (K. M. Shermock, B. L. Pinto and P. Kraus). This author team maintained independence and made all decisions regarding study design, data collection, data analysis, interpretation of results, writing this research report, and deciding to submit it for publication. The corresponding author, K. M. Shermock, had full access to all of the study data, and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Disclosure of Conflict of Interests

The authors state that they have no conflict of interest.
