LETTERS TO THE EDITOR
Prospective evaluation of the interobserver reliability of the 4Ts score in patients with suspected heparin-induced thrombocytopenia
Version of Record online: 4 JAN 2012
© 2011 International Society on Thrombosis and Haemostasis
Journal of Thrombosis and Haemostasis
Volume 10, Issue 1, pages 151–152, January 2012
How to Cite
NAGLER, M., FABBRO, T. and WUILLEMIN, W. A. (2012), Prospective evaluation of the interobserver reliability of the 4Ts score in patients with suspected heparin-induced thrombocytopenia. Journal of Thrombosis and Haemostasis, 10: 151–152. doi: 10.1111/j.1538-7836.2011.04552.x
- Issue online: 4 JAN 2012
- Accepted manuscript online: 7 NOV 2011 10:16AM EST
- Received 4 September 2011, accepted 25 October 2011
Suspicion of heparin-induced thrombocytopenia (HIT) necessitates an immediate diagnostic workup to prevent severe complications. The diagnostic strategy must comprise a clinical pretest scoring system to increase the low specificity of high-sensitivity antibody tests, and the 4Ts score is probably the score most often used in this context. Recent studies have questioned the interobserver reliability of the 4Ts score [2,3]. However, the way 4Ts scores were determined in previous studies does not reflect real-life clinical practice. In these studies, the score was usually calculated retrospectively, by laboratory staff receiving specimens for antibody testing, by pharmacists, by specially trained physicians, or by investigators familiar with HIT. As no prospective evaluation in a real-life setting exists, the applicability of the results of validation studies to clinical practice may be limited.
We conducted a single-center, prospective observational study to investigate the interobserver reliability of the 4Ts score in patients with suspected HIT in a real-life setting of a tertiary hospital. The study was approved by the local ethical review board of our institution (Kantonale Ethikkommission Luzern). Between June 2010 and March 2011, we included all consecutive patients evaluated for suspected HIT (n = 40; median age, 63.4 years [range, 14–86 years]; 23 females and 17 males). Sixteen of the subjects were surgical patients (six cardiothoracic and three orthopedic), and 21 patients were evaluated on the intensive care unit. In eight of the patients, antibody testing was positive (ID-H/PF4-PaGIA; DiaMed SA, Cressier sur Morat, Switzerland). Three physicians rated the 4Ts score independently for every patient: (i) the attending physician who asked for antibody testing (n = 36; a specialist registrar in intensive care medicine, internal medicine, or general surgery); (ii) the hematologist on duty (n = 6); and (iii) the principal investigator (PI) (M.N.). The clinical and laboratory data were obtained by chart review and by visiting the patient, and each observer calculated the score blinded to the other observers’ ratings. The attending physician and the hematologist determined the score before antibody testing, and the PI usually did so on the same day. Only data obtained before the time of antibody testing were considered, and the results of antibody testing were not known. A structured, standardized questionnaire was developed, taking previously published details and specifications into account (Table S1) [4,5]. This questionnaire was incorporated into the hospital’s local patient management. Before the study was started, a training lecture was given to hematology staff members. Attending physicians were supported in case of questions regarding particular items of the 4Ts score. For data analysis, we used raw agreement to quantify the interobserver reliability.
Raw agreement describes the proportion of possible agreement that was realized (80% agreement means that the observers agreed in 80% of the cases and disagreed in 20%). It was calculated for the three individual risk categories – low risk (0–3 points), intermediate risk (4–5 points), and high risk (6–8 points) – as well as overall. All statistical analyses were performed and all plots constructed with the statistical software R (version 2.11). The distribution of the patients regarding clinical risk category is given in Fig. 1. We observed very limited agreement between the different observers for all three categories of the 4Ts score (Fig. 1). Agreement was 48% (95% confidence interval [CI] 25–68%) for the low-risk category, 65% (95% CI 52–79%) for the intermediate-risk category, and 47% (95% CI 22–68%) for the high-risk category. Overall agreement was 58% (95% CI 45–69%). Interobserver agreement in real-life clinical practice appears to be considerably lower than expected.
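The raw agreement described above can be sketched in a few lines of Python. The letter does not state the exact formula that was used, so this sketch assumes one common definition: agreement is counted over all pairs of observers for each patient, and category-specific agreement is the share of those pairs involving a given category in which both observers chose it. The function names and the example scores are hypothetical and are not study data.

```python
from collections import Counter


def risk_category(score):
    """Map a 4Ts total (0-8) to the risk categories given in the text."""
    if not 0 <= score <= 8:
        raise ValueError("4Ts total must be between 0 and 8")
    if score <= 3:
        return "low"
    if score <= 5:
        return "intermediate"
    return "high"


def raw_agreement(ratings_per_patient):
    """Return (overall, per-category) raw agreement.

    ratings_per_patient holds one list of category labels per patient,
    with one label per observer. Pairs of observers are counted in both
    orders, which cancels out in the ratios.
    """
    agree = Counter()     # agreeing observer pairs, per category
    possible = Counter()  # pairs in which agreement was possible, per category
    total_agree = 0
    total_possible = 0
    for ratings in ratings_per_patient:
        n = len(ratings)
        if n < 2:
            continue  # a single rating cannot agree or disagree
        for category, n_cat in Counter(ratings).items():
            agree[category] += n_cat * (n_cat - 1)
            possible[category] += n_cat * (n - 1)
            total_agree += n_cat * (n_cat - 1)
        total_possible += n * (n - 1)
    overall = total_agree / total_possible
    specific = {c: agree[c] / possible[c] for c in possible}
    return overall, specific


# Hypothetical 4Ts totals from three observers for four patients.
scores = [[2, 3, 1], [4, 5, 6], [6, 7, 6], [3, 4, 4]]
categories = [[risk_category(s) for s in patient] for patient in scores]
overall, specific = raw_agreement(categories)
print(f"overall: {overall:.2f}")
for name, value in sorted(specific.items()):
    print(f"{name}: {value:.2f}")
```

Note that agreement is assessed on the risk category and not on the exact total, so observers who give 2 and 3 points still count as agreeing. Confidence intervals are not computed here because the letter does not say how they were derived.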
Although previous investigations have not primarily addressed reliability in real-life situations, our observations are in line with their findings of limited agreement between different observers. In a recent investigation, an intraclass correlation coefficient (r) of 0.71 (95% CI 0.54–0.83) was found. In another investigation, discrepancies in up to 50% of the patients were described. Even Lo et al. proposed problems of interobserver agreement as a possible reason for variable results in evaluation of the 4Ts score.
Additionally, we calculated the raw agreement for the individual items of the 4Ts score (T1 = magnitude of thrombocytopenia; T2 = timing of thrombocytopenia; T3 = presence of thrombosis; T4 = existence of other causes of thrombocytopenia). Good agreement was seen regarding T1 (78%; 95% CI 68–89%) and T3 (89%; 95% CI 81–96%). Clearly lower agreement was seen regarding T2 (55%; 95% CI 43–66%) and T4 (62%; 95% CI 50–73%). This reflects our experience of daily clinical practice. Most discussions are about time of onset of thrombocytopenia and possible other causes. Discrepancies in these items were also noticed in an observational trial discussed above.
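To illustrate why disagreement on single items matters, the sketch below sums the four items into a total and compares two observers. It assumes that each item is scored 0, 1 or 2, as in the published 4Ts score (the letter itself only gives the 0–8 total range). The two ratings are hypothetical and are not study data.

```python
ITEMS = ("T1", "T2", "T3", "T4")


def total_score(items):
    """Sum the four item scores; each item is assumed to be 0, 1 or 2."""
    for name in ITEMS:
        if items[name] not in (0, 1, 2):
            raise ValueError(f"{name} must be 0, 1 or 2")
    return sum(items[name] for name in ITEMS)


def risk_category(score):
    """Map a 4Ts total to the risk categories given in the text."""
    if score <= 3:
        return "low"
    if score <= 5:
        return "intermediate"
    return "high"


def differing_items(rating_a, rating_b):
    """List the items on which two observers gave different scores."""
    return [name for name in ITEMS if rating_a[name] != rating_b[name]]


# Hypothetical ratings of the same patient by two observers who agree on
# T1 and T3 but judge timing (T2) and other causes (T4) differently.
observer_a = {"T1": 2, "T2": 2, "T3": 0, "T4": 1}
observer_b = {"T1": 2, "T2": 1, "T3": 0, "T4": 0}

for label, rating in (("A", observer_a), ("B", observer_b)):
    total = total_score(rating)
    print(f"observer {label}: total {total}, {risk_category(total)} risk")
print("items that differ:", differing_items(observer_a, observer_b))
```

In this constructed case, a one-point difference on each of T2 and T4 is enough to move the patient from the intermediate-risk to the low-risk category.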
We also tested for a systematic shift between our three observer groups (attending physician, hematologist, and PI). With the use of a generalized estimating equation that takes the relationship of the three ratings per patient into account, average ratings were estimated as 4.46, 4.65 and 4.15 for attending physicians, hematologists and the PI, respectively, and differed significantly (P = 0.026). The hematologists on duty scored systematically higher and the PI lower than the attending physicians. This phenomenon is perhaps a reflection of the duty hematologists being careful not to miss the diagnosis of HIT, and the PI systematically avoiding overdiagnosis of HIT, among other things to prevent unnecessary side effects of alternative anticoagulants. Owing to the design of the study, there are no data that enable the correctness of observer judgements to be compared.
In conclusion, our study shows that the interobserver reliability of the 4Ts score is limited in a real-life setting. Discrepancies arise particularly for T2 (timing of thrombocytopenia) and T4 (existence of other causes of thrombocytopenia). Furthermore, there were small but significant differences when the 4Ts score was rated by an attending physician, a hematologist, or the PI, suggesting that, even with a standardized approach, real-life differences in awareness of the complex aspects of HIT undermine the 4Ts score’s reliability. This may limit the application of results of validation studies of the 4Ts score to clinical practice. Determination of the items of the 4Ts score should be further specified, as suggested by Warkentin and Linkins, to overcome this limitation. Furthermore, evaluation of interobserver agreement in real-life settings is a prerequisite for developing future clinical assessment tools.
Disclosure of Conflict of Interests
The authors state that they have no conflict of interest.
- 6. R Development Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing, 2011.
Table S1. Questionnaire for calculation of the 4Ts score.