I read with interest the recent article by Shermock concerning point-of-care (POC) INR testing. Given the information presented, I agree with their ultimate conclusion: the devices were not an acceptable replacement for laboratory measurement. However, their ‘novel’ method of error analysis is no more complete than the ‘traditional quality analysis’ methods they criticize.
A complete analysis of any clinical instrument requires consideration both of how results are interpreted around clinical decision points and of the analytic performance of the device(s) being used. A previous study of neonatal POC bilirubin measurement serves as a good model for this approach.
Categorizing results solely by clinical interpretation implicitly assumes that an instrument can be analytically perfect. In reality, even when a sample is run repeatedly on the same instrument, the results will not all be identical. Though it varies between laboratories, a typical coefficient of variation (CV) for patients with moderately prolonged INRs (not quality control material) is around 3.2%. The CV at higher INR values is typically even larger.
Given this amount of variability, a sample whose INR is (magically) known to be exactly 1.8 will give a reported result of 1.8 (to one decimal place) only about 60% of the time, with roughly 20% of results falling higher and 20% lower. In terms of the authors’ analysis, the 20% of results above 1.8 would be interpreted as a diagnostic error.
The problem is that these results are not failures per se, but unavoidable error: even the most modern, well-maintained instruments are not perfectly reproducible. Further complicating the analysis, as INR increases, the likelihood of a ‘wrong’ clinical categorization for values at cut-points rises to at least 33% at an INR of 3.4 and at least 40% at 5.6.
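The percentages above can be reproduced with a short calculation. This sketch assumes Gaussian analytic variation at the quoted 3.2% CV and INR reported to one decimal place (both are my assumptions about how the figures were derived, not statements from the original analysis), and checks a true value of 1.8 against its rounding band and true values of 3.4 and 5.6 against the 3.3 and 5.5 cut-points:

```python
from math import erf, sqrt

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

CV = 0.032  # analytic CV for moderately prolonged INRs (from the text)

def rounding_split(true_inr: float, cv: float = CV):
    """P(result rounds below / to / above the true one-decimal value)."""
    sd = cv * true_inr
    half = 0.05  # half-width of the one-decimal rounding band
    p_below = norm_cdf(-half / sd)
    p_at = norm_cdf(half / sd) - p_below
    return p_below, p_at, 1.0 - p_below - p_at

def p_wrong_side(true_inr: float, cut: float, cv: float = CV):
    """P(a result from a true value just above `cut` is reported <= cut)."""
    sd = cv * true_inr
    # a reported value <= cut requires the raw result to round down,
    # i.e. to fall below cut + 0.05
    return norm_cdf((cut + 0.05 - true_inr) / sd)

below, at, above = rounding_split(1.8)
print(f"true 1.8 -> below {below:.0%}, at 1.8 {at:.0%}, above {above:.0%}")
print(f"true 3.4 vs cut 3.3 -> {p_wrong_side(3.4, 3.3):.0%} misclassified")
print(f"true 5.6 vs cut 5.5 -> {p_wrong_side(5.6, 5.5):.0%} misclassified")
```

Under these assumptions the split at 1.8 comes out near 60/20/20 and the cut-point misclassification rates land near one third and two fifths; since the CV only grows at higher INRs, the quoted figures are indeed lower bounds.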
When comparing two results, a true diagnostic failure occurs only when a clinical decision is changed and the difference between the results exceeds the minimal, unavoidable analytic error. The magnitude of this error can be estimated using the significant change limit (SCL), which for laboratory INR measurement is around 9%. Therefore, when comparing paired INR results, analytic variation alone may explain a difference of ≤ 0.2 units at an INR of 1.9, ≤ 0.5 units at 5.5, and so on.
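If the SCL is constructed as a 95% reference change value from analytic variation alone, sqrt(2) × 1.96 × CV (an assumption about how the ~9% figure was derived, not stated in the letter), the unit thresholds above fall out directly:

```python
from math import sqrt

CV = 0.032  # analytic CV quoted earlier in the letter

# Assumed construction: 95% two-sided reference change value for a
# paired comparison, analytic CV only.
scl = sqrt(2) * 1.96 * CV
print(f"SCL ~ {scl:.1%}")  # close to the ~9% quoted

for inr in (1.9, 5.5):
    print(f"INR {inr}: a difference of <= {scl * inr:.1f} units "
          "may be analytic noise alone")
```

Multiplying the ~9% limit by the INR level gives roughly 0.2 units at 1.9 and 0.5 units at 5.5, matching the text.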
A final consideration is that, in most instances, POC testing is not meant to be a complete replacement for laboratory testing. Extreme values and those near critical decision points should be confirmed by laboratory analysis. Nevertheless, a good POC instrument can significantly reduce the number of blood draws needed. In this respect, the value of POC INR measurement may lie in distinguishing patients who most likely need no therapeutic change (laboratory INR will be 1.9–3.3) from those who likely do (laboratory INR will be < 1.9 or > 3.3).
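That triage role can be sketched as a simple decision rule. The 1.9–3.3 band and the confirm-near-decision-points policy come from the text; the function name and the 0.2-unit margin (chosen to echo the analytic noise at the low cut-point) are illustrative assumptions:

```python
def triage_poc_inr(poc_inr: float,
                   low: float = 1.9, high: float = 3.3,
                   margin: float = 0.2) -> str:
    """Decide whether a POC INR result needs laboratory confirmation.
    Values well inside the therapeutic band suggest no change is needed;
    extreme values, or values near the decision points, get a
    confirmatory laboratory draw."""
    if low + margin <= poc_inr <= high - margin:
        return "likely no therapeutic change; no blood draw needed"
    return "confirm with laboratory INR"

print(triage_poc_inr(2.5))  # well inside the band
print(triage_poc_inr(3.3))  # at a decision point
print(triage_poc_inr(5.0))  # extreme value
```

The point of the sketch is only that a POC device too imprecise to replace the laboratory can still safely rule patients out of a confirmatory draw.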
Performing a ‘complete’ analysis of the authors’ data is not likely to change the conclusions of the study. Indeed, around 25% of paired results are probably both clinically and analytically different, still too high for the devices to replace laboratory measurement. As such, this discussion may seem academic. However, even for a test like INR, which has a relatively low CV compared with many laboratory tests, 20–40% of results near cut-points could be misclassified due to nothing more than unavoidable analytic variation.
For tests where the ‘best possible’ assay has a larger CV, the rate of apparent clinical disagreements will be inflated even further, without any definite relationship to outcomes. When comparing multiple measurements on the same or different instruments, one must acknowledge and account for the uncertainty in analytic measurement that exists even under the best circumstances.