Comparison of methodological data measurement limits in CD4+ T lymphocyte flow cytometric enumeration and their clinical impact on HIV management

Correspondence to: Liam Whitby, UK NEQAS for Leucocyte Immunophenotyping, 4th Floor, Pegasus House, 463a Glossop Road, Sheffield S10 2QD, United Kingdom. E-mail: liam.whitby@ukneqasli.co.uk

Abstract

UK NEQAS for Leucocyte Immunophenotyping, an ILAC G13:2000 accredited External Quality Assessment (EQA) organization, with over 3000 international laboratories participating in 14 programmes, issues two proficiency testing samples of stabilized whole blood to 824 participants in the Immune Monitoring (lymphocyte subset) programme every two months. We have undertaken a study of 58,626 flow cytometric absolute CD4+ T lymphocyte count data sets from these laboratories over a 12-year period (2001–2012) to determine counting method variation in data measurement limits and how this could influence the clinical management of HIV patients.

Comparison of relative error and 99.9% confidence limits for absolute CD4+ T lymphocyte values was undertaken using dual platform (DP) and single platform (SP) data and showed that SP consistently outperformed DP, giving lower relative errors and narrower confidence limits at clinically significant absolute CD4+ T lymphocyte counts. Our data show that absolute CD4+ T lymphocyte counts should be obtained using single platform technology to reduce the variability at clinically relevant levels. For data where results (irrespective of platform) were below the international treatment threshold of 350 cells/μl, there was no significant misclassification between SP and DP techniques, meaning most patients would receive the correct treatment at the correct time. However, results that were above the treatment level of 350 cells/μl showed a significant difference (P = 0.04) between DP and SP platforms, suggesting patients monitored using DP technology were 20% more likely to start therapy prematurely than those monitored with SP technology. © 2013 International Clinical Cytometry Society

In individuals infected with human immunodeficiency virus (HIV), the enumeration of absolute CD4+ T lymphocytes is an essential parameter for monitoring and reducing the risk of disease progression [1]. In order to effectively manage the patient, it is important to accurately define absolute CD4+ T lymphocyte levels that have clinical and therapeutic relevance [2-5]. This is particularly important in the decision to begin therapeutic interventions and to provide prognosis [1]. Guidelines for HIV infected adults advise that highly active anti-retroviral therapy (HAART) be commenced once the absolute CD4+ T lymphocyte count falls below 350 cells per microlitre [3-5] (UK and EU guidelines) or below 500 [2] (USA). Other CD4+ T lymphocyte counts are used to predict higher risk of specific opportunistic infections and suggest the need for antibiotic prophylaxis [4]. For example, the risk of Pneumocystis jirovecii pneumonia is increased when the absolute CD4+ T lymphocyte count falls below 200 cells per microlitre. Therefore, identifying the limitations and inherent variations of a technique used in measuring such levels is important to avoid incorrect treatment intervention and ensure that international studies are comparable when dealing with cases of adult HIV where the use of absolute numbers of CD4+ lymphocytes is essential.

Historically, laboratories enumerating absolute CD4+ T lymphocytes have used the dual platform (DP) [6]. However, UK NEQAS and others have highlighted the limitations of this technique and have recommended that the single platform (SP) approach is the method of choice [7-11]. The difference between the DP and SP techniques has been summarized in detail elsewhere [7]. A study undertaken by Kunkl and colleagues identified and reported measurement error for 18 Italian laboratories using the DP approach [12]. However, as the SP approach has long been established as the method of choice [7, 10, 11], a similar evaluation needs to be undertaken.

We have previously reported the External Quality Assessment (EQA) results from laboratories that undertake absolute CD4+ T lymphocyte enumeration in the UK NEQAS Immune Monitoring programme. Whilst the Immune Monitoring programme has been operational since 1988, the present study is restricted to data collected over a 12-year period (2001–2012), as this gave the highest mix of DP versus SP data to enable the robust calculation of confidence limits and relative error and to define the boundaries of acceptability for each method. Furthermore, we have examined in detail the use of SP technology. In addition, we have studied how these variations impact upon clinical decisions at the therapeutic level of 350 CD4+ T lymphocytes per microlitre.

Currently the British HIV Association (BHIVA (UK)), National Institutes of Health (NIH (USA)), European AIDS Clinical Society and World Health Organization (WHO) recommend commencing HAART at CD4+ T lymphocyte levels of less than 350 cells/μl [2-5]. For each sample analyzed during this period the CD4+ T lymphocyte count was categorized as <350 cells/μl (treat) or >350 cells/μl (don't treat) based on the overall mean of results returned during the EQA send out.

METHODS AND MATERIALS

Two stabilized blood samples, prepared in the manner previously described [13], were dispatched bimonthly to between 260 (January 2001) and 824 (October 2012) centres worldwide participating in the UK NEQAS Immune Monitoring programme over the 12-year period 2001 to 2012 (2006–2012 for FACSCount data). The absolute CD4+ T lymphocyte count of the samples ranged from 60 to 1759 cells/μl. All samples were supplied by the National Blood Service and consisted of surplus donated material following informed consent.

Laboratories were required to determine the absolute CD4+ T lymphocyte count (cells per microlitre) for each sample and submit results to UK NEQAS for Leucocyte Immunophenotyping via a dedicated website (www.ukneqasli.org/sampleentry). Results were analyzed using previously reported methods [14]. Each centre was monitored for performance; satisfactory performance was based upon the closeness of a participant's result to the trimmed mean for the sample, as defined by the range of the standard deviation. The trimmed mean and standard deviation were derived using the method developed by Healy [15], which was designed to eliminate outliers in EQA exercises.
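As a rough illustration of outlier-resistant consensus statistics, the sketch below computes a simple symmetric trimmed mean and standard deviation. It is not Healy's procedure [15]: the 5% trim per tail, the function name and the example counts are all assumptions made for illustration only.

```python
from statistics import mean, stdev


def trimmed_stats(results, trim_percent=5):
    """Return (mean, sd) after dropping trim_percent of values from each tail.

    Illustrative only: a plain symmetric trim, not Healy's method [15].
    """
    ordered = sorted(results)
    k = len(ordered) * trim_percent // 100  # values dropped from each tail
    kept = ordered[k:len(ordered) - k] if k else ordered
    return mean(kept), stdev(kept)


# Hypothetical counts (cells/μl) returned for one sample; 900 is a gross outlier.
counts = [340, 345, 348, 350, 350, 352, 355, 358, 360, 362,
          341, 346, 349, 351, 353, 356, 357, 359, 344, 900]
consensus, sd = trimmed_stats(counts)
untrimmed = mean(counts)
```

With these made-up values the single gross outlier pulls the untrimmed mean well above the trimmed consensus, which is the behaviour a trimmed estimator is intended to suppress.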

For this study, 58,626 individual data sets from a total of 143 individual send out samples issued over the 12-year period were analyzed. To ensure statistical robustness each data set contained a minimum of 30 participants in each platform subgroup. From the data, we calculated and compared the 99.9% confidence limits and the relative error at all levels of absolute CD4+ T lymphocyte count for each technology. Following the comparison of relative errors we undertook an additional analysis of the data to assess how the identified variations would impact upon the clinical decision making processes. For each sample analyzed during this period the CD4+ T lymphocyte count was categorized as <350 cells/μl (treat) or >350 cells/μl (don't treat) based on the overall mean of results returned during the EQA send out.
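The exact formulas used for the confidence limits and relative error are not restated here, so the following is only a sketch under stated assumptions: limits are taken as the mean plus or minus z standard deviations (with z the two-sided normal quantile for 99.9%), and relative error is taken as the half-width of those limits expressed as a percentage of the mean. The function name and example values are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def limits_and_relative_error(results, confidence=0.999):
    """Return (lower, upper, relative_error_percent) for a set of counts.

    Assumption: limits = mean +/- z * SD and
    relative error = 100 * half-width / mean. The study's own
    definitions may differ.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 3.29 for 99.9%
    m, s = mean(results), stdev(results)
    half_width = z * s
    return m - half_width, m + half_width, 100 * half_width / m


# Hypothetical counts (cells/μl) from one platform subgroup.
lower, upper, rel_err = limits_and_relative_error([340, 350, 360])
```

Under these assumptions a subgroup with a smaller standard deviation gives both a narrower limit range and a lower relative error, which is the sense in which the two measures are compared between technologies.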

RESULTS

The results were stratified in relation to the trimmed mean CD4+ T lymphocyte count (obtained from the SP user group within each EQA send out), and separated into groups depending on whether DP or SP methodology was used, with the SP approach further subcategorized by specific technology (TruCount, Flow-Count, or FACSCount). These were analyzed to calculate the relative errors and 99.9% confidence limits for absolute CD4+ T lymphocyte counts by both approach and technology.

A comparison of 99.9% confidence limits for dual and the main single platform technologies over the 12-year period at CD4+ T lymphocyte levels that correspond to HAART initiation is shown in Figure 1 (note that FACSCount is not shown on the graph because of its shorter analysis period). All three SP technologies studied have significantly smaller 99.9% confidence limits when compared to the dual platform technique (P < 0.0001), indicating that an SP-derived result is more precise than one obtained using the DP method. Furthermore, it would appear that the technique with the least amount of sample manipulation (FACSCount) had a consistent relative error of 7% across the full range of CD4+ T lymphocyte levels studied for the period 2006 onwards.

Figure 1.

99.9% Confidence limits for CD4+ T lymphocyte counts by single and dual platform technologies.

Results at the clinically significant absolute CD4+ T lymphocyte level of 350 cells per microlitre (defined by results of all participants), below which guidelines recommend commencing antiretroviral therapy [2-5], were examined in more detail. The confidence limit range for each technique was 96 cells per microlitre for dual platform, 59 cells per microlitre for Flow-Count, 40 cells per microlitre for FACSCount and 37 cells per microlitre for TruCount. This means that for a given result of 350 cells per microlitre when using dual platform the ‘true’ result would lie between 302 and 398 cells per microlitre, between 304 and 363 cells per microlitre when using Flow-Count, and between 330 and 370 and between 341 and 378 cells per microlitre for the FACSCount and TruCount systems, respectively. The 99.9% confidence limits for each of the techniques at this level are shown in Table 1.

Table 1. 99.9% Confidence Limits for CD4+ T Lymphocyte Counts at a Level of 350 Cells per Microlitre (all values in cells per microlitre)

Method                          Lower 99.9% confidence limit   Upper 99.9% confidence limit   Confidence limit range
Dual platform                   302                            398                            96
Flow-Count (single platform)    304                            363                            59
TruCount (single platform)      341                            378                            37
FACSCount (single platform)     330                            370                            40
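The range column of Table 1 is simply the upper limit minus the lower limit, which the short sketch below reproduces. The limits are copied from Table 1; the dictionary and function names are hypothetical.

```python
# Lower and upper 99.9% confidence limits (cells/μl), copied from Table 1.
TABLE_1 = {
    "Dual platform": (302, 398),
    "Flow-Count": (304, 363),
    "TruCount": (341, 378),
    "FACSCount": (330, 370),
}


def limit_range(method):
    """Width of the 99.9% confidence interval for one method."""
    lower, upper = TABLE_1[method]
    return upper - lower


ranges = {method: limit_range(method) for method in TABLE_1}
narrowest = min(ranges, key=ranges.get)  # method with the tightest limits
widest = max(ranges, key=ranges.get)     # method with the broadest limits
```

On these figures the dual platform range is roughly two and a half times that of the narrowest single platform technology.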

We then compared our data with a previously published study that reported the relative error for CD4+ T lymphocyte enumeration for dual platform methods, albeit using a significantly smaller data set (18 sites and 24 samples) [12]. The relative errors for the data returned to UK NEQAS and this previous study [12] are shown in Figure 2. Both studies are comparable over the entire range of CD4+ T lymphocyte counts examined, having a Spearman correlation value (r) of 0.95. However, this comparison also highlighted that single platform techniques have a lower relative error compared to the dual platform approach. Following the comparison of relative errors, an additional analysis of the data was performed to determine the effect that the identified variations would have on clinical decision making processes. The results of this analysis are shown in Figure 3.
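For readers unfamiliar with the Spearman coefficient quoted above, the sketch below shows one way it can be computed: as the Pearson correlation of the ranks, with tied values sharing an average rank. The paired relative errors in the example are hypothetical and are not the study data.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical relative errors (%) at five matched CD4+ levels in two studies.
study_a = [30, 20, 15, 10, 8]
study_b = [28, 22, 12, 13, 7]
r = spearman(study_a, study_b)
```

A value close to 1 means the two studies rank the CD4+ levels in nearly the same order of relative error, even if the absolute magnitudes differ.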

Figure 2.

Relative errors for CD4+ T lymphocyte counts using differing technologies.

Figure 3.

Treatment administration for different absolute counting techniques at differing levels of CD4+ T lymphocytes.

Below the level of 350 cells/μl there was no significant difference (P = 0.2) in the cases where treatment would be given, irrespective of whether the analysis was performed using dual platform or single platform techniques. However, for samples above the level of 350 cells/μl, laboratories using dual platform techniques would classify 20% of samples incorrectly as being below the 350 cells/μl level, significantly more than laboratories using single platform (P = 0.04), which could therefore result in patients receiving treatment too early.
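The classification step behind this comparison can be sketched as follows. This is an illustration only: the function names and example results are hypothetical, and assigning a count of exactly 350 cells/μl to "don't treat" is an assumption, since the text defines only the <350 and >350 categories.

```python
THRESHOLD = 350  # cells/μl, the treatment threshold examined in this study


def category(count, threshold=THRESHOLD):
    """Classify a count; exactly 350 is treated as "don't treat" (assumption)."""
    return "treat" if count < threshold else "don't treat"


def misclassification_rate(consensus, lab_results, threshold=THRESHOLD):
    """Fraction of laboratory results falling on the other side of the
    threshold from the consensus value for the sample."""
    expected = category(consensus, threshold)
    wrong = sum(1 for result in lab_results
                if category(result, threshold) != expected)
    return wrong / len(lab_results)


# Hypothetical sample with a consensus of 380 cells/μl (don't treat).
rate = misclassification_rate(380, [330, 345, 360, 390, 400])
```

In this made-up example two of the five laboratories report a value below 350 cells/μl for a sample whose consensus is above it, which is the direction of error that would lead to premature treatment.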

DISCUSSION

Previous studies have repeatedly shown that single platform technologies consistently outperform dual platform technology in the enumeration of CD4+ T lymphocytes [7, 9-11]. However, what is not known is the effect these technologies have upon the clinical and therapeutic use of CD4+ T lymphocytes and how misclassification of the results impacts upon its use. A previous study by Kunkl et al. [12] reported the relative error from a cohort of 18 laboratories using dual platform in the determination of CD4+ T lymphocytes; however, there are no data assessing this with single platform technologies. To address this we have reviewed 12 years of UK NEQAS data collected from up to 824 international laboratories that yielded 58,626 data points. We then compared single platform with dual platform technology and examined the effect upon relative error and technology confidence limits, and the impact that results obtained by these different technologies have upon misclassification at the clinical therapeutic threshold of 350 CD4+ T lymphocytes/μl. We used the same 99.9% confidence limits as used in the earlier work by Kunkl et al. [12] to enable direct comparison to this previous study.

This current study shows, for the first time, that the 99.9% confidence limits obtained using single platform technologies are superior to those obtained with dual platform methods. All of the single platform technologies examined gave a smaller range of confidence limits across all CD4+ T lymphocyte counts when compared to the dual platform method. In addition, the data from this study show a high degree of concordance with the previous study of Kunkl et al. [12] for dual platform approaches, despite the fact that they used fresh samples and our study used stabilized whole blood. This confirms that stabilized blood behaves in an identical manner to fresh blood, but without the degradation effects (note that the study conducted by Kunkl et al. was performed within 24 hours of sample draw). However, and of significance, the difference in confidence limits at clinically significant levels was such that results derived using dual platform methods were more likely to adversely influence patient treatment than results generated by single platform technology, by potentially causing an earlier initiation of HAART.

A divergence of confidence limits was observed for the Flow-Count system and was especially noticeable above the 1000 cells/μl CD4+ T lymphocyte level (not shown), but these levels are above those used in clinical management of HIV+ individuals and thus will not be discussed further. However, at clinically relevant levels of CD4+ T lymphocytes the confidence limits of Flow-Count are within those generated by the dual platform users. It is most likely that the explanation lies in the difference between the two single platform techniques. TruCount features a lyophilized pellet in the tube, to which a set volume of blood is added. This method has only two manipulation steps of the sample: adequate mixing and accurate blood aliquoting. However, Flow-Count uses a vial of beads that must be premixed, and a set volume of these must then be added to a set volume of blood. There are four manipulation steps of the sample using this method: sample mixing, accurate blood aliquoting, bead mixing, and accurate bead aliquoting. In any analytical method every step has an inherent error that cannot be avoided, and the error of every step in a process combines to create the compound error for the method. It is, therefore, logical to keep the manipulation steps to a minimum to avoid generating a larger compound error. It is interesting to note that the relative error of TruCount is lower than that seen with FACSCount at the 350 CD4+ T lymphocyte cells/μl level. This could be explained by the fact that the FACSCount requires no operator intervention compared to the TruCount and that the FACSCount instrument is probably being used as a point of care instrument by individuals who have a lower skill level. Thus, TruCount operators are able to intervene to reanalyze any suspect data, which is particularly important at critical decision making points. With four manipulation steps Flow-Count has a larger potential compound error than FACSCount and TruCount, which have only two manipulation steps. It seems likely that this is the cause of the slightly higher confidence limits observed in Flow-Count users and the divergence of Flow-Count results from FACSCount and TruCount users at higher CD4+ T lymphocyte levels. However, targeted training of Flow-Count users has been shown to reduce this variability (particularly in CD34+ stem cell counting) and to give results comparable to those from sites using the TruCount approach [16].
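The compound error argument can be made concrete with a small sketch. It assumes that the error of each manipulation step is independent and can be expressed as a coefficient of variation (CV), in which case the step CVs combine in quadrature. The 2% per-step figure is purely hypothetical and is not a measured value for any of the technologies discussed.

```python
from math import sqrt


def compound_cv(step_cvs):
    """Combine per-step CVs (%) in quadrature.

    Assumption: step errors are independent of one another.
    """
    return sqrt(sum(cv ** 2 for cv in step_cvs))


# Hypothetical 2% CV per manipulation step.
two_step = compound_cv([2.0, 2.0])             # sample mixing, blood aliquoting
four_step = compound_cv([2.0, 2.0, 2.0, 2.0])  # plus bead mixing, bead aliquoting
```

Under these assumptions, doubling the number of equally imprecise steps increases the compound error by a factor of about 1.4 rather than 2, but the four-step method is still necessarily the less precise of the two.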

The observed differences in the confidence limits highlight that there is significant variability of the absolute CD4+ T lymphocyte counts generated using different technologies and that this may then have an impact on patient treatment. Therefore, centres should tailor their analytical acceptable limits on the techniques in use, a factor that should also be taken into account when performing comparison studies between centres using different technologies to generate absolute counts. In addition centres taking part in clinical trials may have to adjust their target ranges appropriately to correct for the effect that their technique variation is having on results. External quality assurance agencies will have to be aware of the differences in results that can be produced by different technologies, and any performance monitoring performed by EQA organizations will have to take this into account.

When comparing the relative errors inherent to dual and single platform technologies, an additional comparison was made to a previous study, where the relative errors for dual platform users generating CD4+ T lymphocyte counts in an EQA programme were calculated [12]. Data obtained from both of the single platform methods studied and from dual platform users were compared to these previously published data. The initial finding from this comparison was that there was no significant difference between the relative errors of dual platform users, obtained using the results returned by UK NEQAS participants, and the relative errors obtained previously for dual platform users [12]. As our studies used stabilized blood and the earlier study used fresh blood, these data show that there is no significant difference (P = 0.0214) in the choice of material used in the EQA scheme operation. However, the use of stabilized material does have significant benefits over fresh material, such as increased antigen stability over time and easier logistics of sample shipment [13]. In addition, the correlation of the dual platform users in this study to the previous study also shows that since 2002, when the initial study was published, there has been no improvement in the generation of absolute CD4+ T lymphocyte counts by dual platform technologies. This is an expected finding because, although new haematology analyzers have been launched, the problems inherent in dual platform technologies are still in place whatever the age of the haematology analyzer. It should be noted that the samples produced by UK NEQAS are optimized for flow cytometry but not for use in routine haematology analyzers. As such, there is the potential for bias in this aspect, as suboptimal performance of the samples on a DP methodology could be the reason for the apparent difference between this and SP techniques. However, given that the results (expressed as confidence limits) for the UK NEQAS DP data show no significant difference from the DP data of the previous study by Kunkl et al. [12], it would suggest that in this study the use of a stabilized material has made no difference to the outcomes and findings.

The third major finding from the comparison was that much lower relative errors are obtained by users of SP methods when compared to DP users. This was irrespective of the level of the absolute CD4+ T lymphocyte cell count; this study shows that single platform consistently outperformed dual platform methods, returning relative errors over half those seen in dual platform. The lowest relative errors seen were obtained with TruCount users, and although the relative errors were higher for users of Flow-Count, they were still lower than those seen by users of dual platform techniques. Interestingly, the relative error for FACSCount was a consistent 7% across the entire range of CD4+ T lymphocyte levels examined for the six years of the FACSCount data sets (60 cells/μl to 1368 cells/μl).

When analysis was performed on the data where consensus results (irrespective of platform) were below the international treatment threshold of 350 cells/μl, we found that there was no significant misclassification between either SP or DP technologies meaning most patients would receive the correct treatment at the correct time. However, when we examined results that were above the treatment level of 350 cells/μl a significant difference in misclassifications (P = 0.04) was seen between the DP and SP results. This meant that patients being monitored using DP technology were more likely to start HAART therapy prematurely than those being monitored using SP technology.

Whilst several international organizations [3-5] have recommended the use of a 350 cells/μl cut-off for the initiation of HAART, others have recommended beginning earlier at a level of 500 cells/μl [2]. Also, limitations in the availability of drugs due to cost factors have led several resource-poor countries to routinely use a limit of less than 350 cells/μl, thus delaying initiation of therapy. Furthermore, where the possibility of coinfection exists with either hepatitis B or hepatitis C, treatment before the 350 cells/μl limit is reached has been recommended [4]. Thus, whilst the focus of this study has been on the main therapeutic cut-off limit of 350 cells/μl, it should be noted that the factors relating to confidence limits in DP and SP methods exist across the entire range of CD4+ T lymphocyte levels examined (Fig. 1). As such, the findings of this study can be applied to any CD4+ T lymphocyte level experienced in routine testing of patients. Whilst the degree of variability will alter, the underlying message of SP methods having better confidence limits than the DP methods remains unchanged. One interesting point to note here is that if a laboratory or clinician is using a DP approach to monitor patient CD4+ T lymphocyte levels, then it is possible that the rate of progression will remain undetected, because the variability of DP is wider than that of SP and the method is therefore less sensitive to change. This would then have the effect of making the monitoring of disease progression more difficult for clinicians and may have adverse outcomes for any patients so affected. Alternatively, the wider confidence limits of DP technologies could show changes in CD4+ T lymphocyte levels when no such alteration was occurring (e.g. a false fall in CD4+ T lymphocyte values), leading to unnecessary clinical interventions. As such, centres performing CD4+ T lymphocyte monitoring using DP methods will experience the limitations of this technique throughout the course of monitoring patients.

Finally, this study has focussed upon the use of absolute values and the uncertainty for commencement of treatment based upon the therapeutic decision making points of such values. It is acknowledged that in certain instances, for example paediatrics, it is often advisable to use percentage values. Whilst this study has not examined the variability of percentage values, which is the subject of a further study, early anecdotal evidence from our EQA programme (unpublished observations and data not shown) would suggest that there is very little difference between percentage values irrespective of which platform (SP or DP) is used. This is due to the fact that determination of percentage values is undertaken only on the flow cytometer and does not have any external influence, other than the gating strategy used. We have also previously shown that the variability in absolute counts derived by SP versus DP is due to the use of the total leucocyte count used by DP approaches to generate the absolute count [7] and not the determination of the percentage value, as this is an independent variable.
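The dependence of the DP result on the haematology analyzer can be illustrated with a minimal sketch, assuming the conventional forms of the two calculations: DP multiplies the flow cytometric percentage by a lymphocyte count derived from the leucocyte count, whereas bead-based SP scales the ratio of cell events to bead events by the known bead concentration. The function names, parameter names and input values are hypothetical.

```python
def dual_platform_count(wbc_per_ul, lymphocyte_pct, cd4_pct_of_lymphocytes):
    """Absolute CD4+ count (cells/μl) from a haematology analyzer leucocyte
    count and differential plus a flow cytometric CD4+ percentage."""
    lymphocytes_per_ul = wbc_per_ul * lymphocyte_pct / 100
    return lymphocytes_per_ul * cd4_pct_of_lymphocytes / 100


def single_platform_count(cd4_events, bead_events, beads_per_tube, blood_volume_ul):
    """Absolute CD4+ count (cells/μl) from a bead-based single tube assay."""
    return cd4_events / bead_events * beads_per_tube / blood_volume_ul


# Hypothetical DP inputs: the same CD4+ percentage, with and without a
# 5% overestimate of the leucocyte count on the haematology analyzer.
dp_true = dual_platform_count(6000, 30, 20)
dp_biased = dual_platform_count(6300, 30, 20)

# Hypothetical SP inputs: 50,000 beads added to 50 μl of blood.
sp = single_platform_count(1800, 10000, 50000, 50)
```

In this made-up case the percentage value is identical in both DP calculations, yet a 5% error in the leucocyte count moves the absolute count from below to above 370 cells/μl, while the SP calculation involves no leucocyte count at all.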

In summary it is recommended that laboratories consider changing their technologies to utilize a single platform approach to obtain the benefits outlined and decrease the impact that dual platform will have on treatment modalities.

Acknowledgments

The authors would like to acknowledge the participants in the UK NEQAS Immune Monitoring programme, as without their continued support and EQA results this paper would not have been possible.
