CD4 criteria improves the sensitivity of a clinical algorithm developed to identify viral failure in HIV-positive patients on antiretroviral therapy

Introduction Several studies from resource-limited settings have demonstrated that clinical and immunologic criteria are poor predictors of virologic failure, confirming the need for viral load monitoring or at least an algorithm to target viral load testing. We used data from an electronic patient management system to develop an algorithm to identify patients at risk of viral failure using a combination of accessible and inexpensive markers. Methods We analyzed data from HIV-positive adults initiated on antiretroviral therapy (ART) in Johannesburg, South Africa, between April 2004 and February 2010. Viral failure was defined as ≥2 consecutive HIV-RNA viral loads >400 copies/ml following suppression ≤400 copies/ml. We used Cox-proportional hazards models to calculate hazard ratios (HR) and 95% confidence intervals (CI). Weights for each predictor associated with virologic failure were created as the sum of the natural logarithm of the adjusted HR and dichotomized with the optimal cut-off at the point with the highest sensitivity and specificity (i.e. ≤4 vs. >4). We assessed the diagnostic accuracy of predictor scores cut-offs, with and without CD4 criteria (CD4 <100 cells/mm3; CD4 < baseline; >30% drop in CD4), by calculating the proportion with the outcome and the observed sensitivity, specificity, positive and negative predictive value of the predictor score compared to the gold standard of virologic failure. Results We matched 919 patients with virologic failure (1:3) to 2756 patients without. Our predictor score included variables at ART initiation (i.e. gender, age, CD4 count <100 cells/mm3, WHO stage III/IV and albumin) and laboratory and clinical follow-up data (drop in haemoglobin, mean cell volume (MCV) <100 fl, CD4 count <200 cells/mm3, new or recurrent WHO stage III/IV condition, diagnosis of new condition or symptom and regimen change). Overall, 51.4% had a score 51.4% had a score ≥4 and 48.6% had a score <4. A predictor score including CD4 criteria performed better than a score without CD4 criteria and better than WHO clinico-immunological criteria or WHO clinical staging to predict virologic failure (sensitivity 57.1% vs. 40.9%, 25.2% and 20.9%, respectively). Conclusions Predictor scores or risk categories, with CD4 criteria, could be used to identify patients at risk of virologic failure in resource-limited settings so that these patients may be targeted for focused interventions to improve HIV treatment outcomes.


Introduction
In 2012, 9.7 million people in low-and middle-income countries received antiretroviral therapy (ART), representing 61% of all who were eligible under the 2010 WHO HIV treatment guidelines [1]. When patients start ART, mortality is highest during the first months of therapy [2]. This high-risk period requires careful monitoring of patients for disease progression, potential drug toxicity and treatment failure. As more people in resource-limited settings receive ART, treatment failure and the need to switch drug regimens will increase [3].
Measurement of HIV viral RNA (viral load) has been shown to be one of the best predictors of clinical disease progression, as well as being the main parameter to assess treatment response in HIV-positive patients [4]. The diagnosis of treatment failure in many resource-constrained settings is difficult because of limited access to plasma HIV-RNA testing and because few laboratories can afford to perform these tests routinely [5Á9]. Not only are the laboratory assays (e.g. CD4 cell counts and HIV-RNA testing) for monitoring treatment expensive, but they are technically demanding, requiring high levels of expertise and equipment Á thereby limiting widespread use in resource-poor settings [10]. Monitoring of ART with dual HIV-RNA and CD4 count testing is often unsustainable within national health programmes in resource-constrained settings. Currently, there is no real alternative to CD4 count testing for monitoring immunologic responses to treatment [10]. The World Health Organization (WHO) recommends clinico-immunological criteria to identify first-line treatment failure in resource-constrained settings; however, this has limited sensitivity and positive predictive value which can lead to both delayed and inappropriate premature switching to more expensive second-line agents [11Á15]. Several studies from resource-limited settings have demonstrated that clinical and immunologic criteria are poor predictors of virologic failure and, therefore, the need for a viral load monitoring algorithm exists [16,17]. Results of work on predictive markers has been conflicting [7,9,10,18,19]. Thus, the need for low-cost surrogate markers of virologic failure that are widely available, even in resource-limited settings, is essential. Numerous attempts at identifying such markers have been made with the intent of alleviating the need for frequent CD4 count or viral load testing [10]. For example, Meya and colleagues reported that an algorithm including data on patient adherence to drug regimens, CD4 cell count and clinical criteria may identify those at risk for virologic failure and this can be used to allocate viral load testing in resource-limited settings more efficiently and costeffectively [6].
With the expansion of ART, there is an urgent need for alternative low-cost predictors of virologic failure in resourcelimited settings. Monitoring the efficacy of ART in resourceconstrained settings remains a critical challenge and current research priorities include optimizing monitoring strategies and developing simpler, cheaper assays that can be administered by minimally trained clinic personnel [5]. Targeted viral load testing in a subgroup of patients with an increased risk of treatment failure may be a feasible and effective strategy for resource-limited settings. As the South African guidelines for monitoring HIV-patients on ART require regular CD4 counts and viral load testing [20Á22], we set out to use data from routine care in a large-volume ART clinic to identify low-cost predictors of treatment failure in order to identify patients in need of targeted viral load testing.

Study site and subjects
The study was a retrospective analysis of data collected from a cohort of HIV-positive adults initiated on ART at the Themba Lethu Clinic (TLC), Johannesburg. This cohort has been described elsewhere [23]. By the end of 2011, TLC has provided care for over 29,000 patients with HIV, including 20,600 on ART. Patients at Themba Lethu Clinic are initiated on ART and receive routine follow-up visits according to South African National Department of Health ART treatment guidelines [20Á22]. Longitudinal, clinical and demographic data are collected and stored on an electronic patient management system, TherapyEdge-HIV (Associated Biological Systems, South Africa), and exported into SAS 9.1 (SAS Institute Inc., Cary, NC) for the analysis. This study was approved by the Human Research Ethics Committee of the University of the Witwatersrand.

Inclusion criteria
Eligible subjects included HIV-positive ART naïve adults ( ]18 years of age) who were initiated according to South African National Department of Health ART treatment guidelines onto a standard first-line regimen of stavudine (d4T) or zidovudine (AZT) with lamivudine (3TC) and either efavirenz (EFV) or nevirapine (NVP) between April 2004 and February 2010. The analysis was limited to patients who had two or more viral load results recorded and more than six months on ART. We excluded those who were not ART naïve, those who transferred in from another antiretroviral (ARV) treatment site, those with a baseline CD4 count 350 cells/mm 3 , those with baseline viral load 5400 copies/ml and those receiving second-line ART (regimen containing either didanosine or a protease inhibitor such as lopinavir-ritonavir) at ART initiation. We also excluded pregnant women as they are initiated using different criteria and are typically placed on lopinavir and ritonavir rather than EFV.

Study variables
Since 2004, patient data have been captured electronically. Between 2004 and 2010, monitoring tests were done every six months but after 2010 this changed to 6 and 12 months during the first year of treatment and yearly thereafter [23]. After ART initiation, ARV drug collection visits are scheduled monthly for the first 6Á12 months on treatment and every two months thereafter once stable whereas medical followup visits are scheduled at months one, three, six, and twelve thereafter depending on the regimen.
Our primary outcome was virologic failure defined as sustained or two consecutive HIV-RNA (]400 copies/ml) following suppression below this level [24]. Viral failure was assessed at three time points: 1) ever, 2) in the first 18 months after ART initiation and 3) after the first 18 months. We developed the monitoring algorithm in patients with virologic failure and time-matched controls (91 month) who did not fail, using greedy matching (1:3).
Candidate predictors for the monitoring algorithm were divided into baseline and follow-up variables. Over 27 inexpensive, routinely collected variables were included in a preliminary analysis [25,26]. These included body mass index (BMI; weight in kg/height in m 2 ), haemoglobin (Hb), haematocrit (%), to mean corpuscular haemoglobin (MCH), mean corpuscular haemoglobin concentration (MCHC), mean cell volume (MCV), platelet count, total lymphocyte count, white blood cell (WBC) count, systolic and diastolic blood pressure, oral candidiasis, WHO stage III or IV condition and serum albumin. Of the 27, 12 were included in the final predictor score.
Follow-up variables were defined as 1) the clinical and laboratory parameter at the visit prior to the outcome if the parameter was a single measurement (i.e. Hb B10 g/dl, MCV B100 fl, CD4 count B200 cells/mm 3 , current weight below baseline or new/recurrent WHO stage III/IV condition) and 2) the difference in value between the two visits prior to the outcome (i.e. drop in MCV or haematocrit by more than 20% or drop in weight by more than 5 kg). The cut-offs used for these variables were obtained from literature and a dichotomous outcome was obtained for each variable (0 for No and 1 for Yes) [5,6,18,27]. For other variables, including measures of adherence, missed medical visits or missed ARV drug pickups (defined as missing a scheduled visit by more than seven days), the (re)appearance or diagnosis of a new symptom or opportunistic infection and regimen change or substitution, if these were reported within 6Á9 months prior to the outcome then the variable was dichotomized as Yes (1) Where there were no missed medical visits or missed ARV drug pickups, no (re)appearance or diagnosis of a new symptom or opportunistic infection and no regimen change during the 6Á9 months prior to the outcome, the variable was dichotomized as No (0). While CD4 testing may be available in most settings, this is not the case in every country [28]. Therefore, two scenarios were considered: 1) including clinical factors alone and 2) including both clinical factors and CD4.

Statistical analysis
Patient demographics and baseline characteristics were summarized using means with standard deviation for normally distributed variables, medians with interquartile range for not normally distributed variables and proportions for categorical variables. We assessed the association between the different variables or predictors and the risk of virologic failure using Cox-proportional hazard models to estimate the hazard ratio (HR) and 95% confidence interval (CI). Person time accrued from ART initiation until the earliest of 1) date of virologic failure, 2) date of death, 3) date of loss to followup (defined as missing their last scheduled visit by more than 122 days), 4) date of transfer or 5) administrative censoring (close of dataset). Proportional hazard assumptions were formally checked using logÁlog survival curves and goodness of fit tests for predictor variables.
We then developed a scoring system following the approach of Spiegelhalter and Knill-Jones [29,30]. We created a predictor score for each patient defined as the sum of the natural logarithm of the adjusted HR for each predictor rounded to the nearest integer [5]. The predictor score was dichotomized using different cut-offs ranging from ]1 vs. B1 to ]7 vs. B7. We assessed the diagnostic accuracy of each cut-off (i.e. ]4 vs. B4) by calculating the proportion with the outcome and the observed sensitivity (Se), specificity (Sp), PPV and NPV of predictor scores compared to the gold standard of virologic failure. The optimal cut-off, as guided by Receiver operator curves (ROC) to calculate the area under curve (AUC), was identified as the point with the highest sensitivity and acceptable specificity. Logistic regression was used to estimate the odds ratio to determine associations between each cut-off and virologic failure.

Results
Of a total of 11,582 patients that initiated ART, 32% (2632/ 8225) never achieved an undetectable viral load. A total of 3675 patients were included in the analysis, 919 with virologic failure (rebound after initial virologic suppression) matched (1:3) to 2756 patients who did not experience virologic failure during follow-up. Baseline demographics and clinical characteristics were similar between the two groups, although those with virologic failure appeared somewhat more immunosuppressed (Table 1).
In crude analysis, numerous baseline and follow-up variables were associated (pB0.1) with virologic failure. These variables were entered into a multivariate Coxproportional hazards model and characteristics that maintained statistical significance are shown in Table 2. Twelve variables were included in the final predictor score. We adjusted for gender in the Cox-proportional hazards model although this variable did not contribute to the overall score. We assessed the diagnostic accuracy of the total predictor score to identify virologic failure, defined by viral load, as the gold standard. The sensitivity and specificity of the total predictor score (including CD4 criteria) was 57.1% and 50.5%, respectively (Figure 1). Using the same cut-off as that of the algorithm with CD4 criteria (]4 vs. B4), the sensitivity increased to 62.0% but the specificity decreased to 34.6% when we ignored the information about patient CD4 count. When using a cut-off of ]3 vs. B3, the sensitivity was 40.9% and the specificity was 52.7% while the sensitivity and specificity of a cut-off of ]5 vs. B5 was 78.6% and 20.7%, respectively (Figure 2).
When virologic failure was stratified by time on ART, that is, 1) virologic failure in the first 18 months after ART initiation and 2) virologic failure after 18 months on ART, the sensitivity decreased from 67.3% to 51.8% while the specificity increased from 41.7% to 55.1%, respectively (Table 3). When restricting the analysis to patients on second-line ART at time of virologic failure, (220/919; 23.9%), the sensitivity increased to 58.6%, the specificity remained at 50.7% and the PPV improved from 27.8% to 41.0%.
We included two definitions of the WHO criteria for treatment failure in a sensitivity analysis, that is, 1) clinical and immunological WHO criteria which included a new or recurrent stage 3 or 4 condition, a CD4 count below baseline and persistent CD4 count below 100 cells/mm 3 and 2) new or recurrent stage 3 or 4 condition [5,6,11]. Using the new or Figure 1. Algorithm for targeted viral load testing using CD4 criteria. Baseline (n 05) and follow-up (n07) variables are used to calculate the total score. The diagram shows how a cut-off score of B4 and ]4 can be used to manage patients at low-risk (reassess at next medical visit) or medium to high-risk (refer for viral load testing) for virologic failure. recurrent stage 3 or 4 condition, the sensitivity dropped to 20.9% compared to 25.2% for the clinical and immunological WHO criteria while the specificity decreased from 73.7% to 70.6%, respectively.

Discussion
Many reports have shown that viral load is the single most important variable that should be used for long-term optimization of ART in resource-limited settings [32]. Several studies from resource-limited settings have provided evidence that supports the need for viral load monitoring of ART or at least the use of an algorithm to target viral load testing [16,17,33Á35].
Our findings support others from resource-limited settings that report a poor sensitivity of WHO clinical and CD4 criteria to identify treatment failure (sensitivity 15.2Á36.4%) [6,13,16,17]. We found that a predictor score with CD4 criteria performed better than WHO clinico-immunological criteria or WHO clinical staging (sensitivity 57.1% vs. 25.2% and 20.9%, respectively). The low PPV is not atypical of those previously reported [6,13]. Our predictor score includes WHO clinical staging and the diagnosis of a new condition or symptom which highlights the value of clinical judgment in diagnosing treatment failure. The AUC to detect virologic failure was calculated as 0.62 (for the score with CD4 criteria) which is less than others have reported for derivation (0.70) and validation (0.75) populations but is similar to what has been reported in routine care (0.69 95% CI 0.60Á0.77) [36,37].
The sensitivity of the predictor score presented here is similar to the sensitivity of an algorithm developed in Kampala, Uganda (Se 57.1% vs. 67%) although this algorithm would result in more false positives (Sp 50.5% vs. 82%) [6]. In their analysis, virologic failure was defined as a single viral load 400 copies/ml. This has been identified by others as a possible weakness since the majority of patients who have intermediate or high viraemia achieve viral suppression at a second measurement [37Á39]. Thus, a single measurement may overestimate the failure rate. In our analysis, we defined virologic failure as sustained or two consecutive HIV-RNA ( ]400 copies/ml) following suppression below this level [24]. Two consecutive episodes is regarded as a more reliable definition of virologic failure [14,40]. . Algorithm for targeted viral load testing without CD4 criteria. Baseline (n04) and follow-up (n 06) variables are used to calculate the total score. The diagram shows how a cut-off score of B3 and ]3 can be used to manage patients at low-risk (reassess at next medical visit) or medium to high-risk (refer for viral load testing) for virologic failure.
Another strength of the predictor score is that the laboratory and clinical information was obtained from the visit preceding virologic failure (median 225 days IQR 176Á 255) and not at the visit, the same day blood was drawn for the viral load [31,37]. This has the additional advantage over other scoring systems in that it can target high-risk patients for interventions (e.g. intensive adherence counselling or incentivized visit attendance) prior to virologic failure. As van Griensven and colleagues highlight, the exact time points during ART when these scores should be applied, for different populations and settings, is unknown [37]. Therefore, we considered two time points, that is, 1) virologic failure in the first 18 months after ART initiation and 2) virologic failure after 18 months on ART. Although the sensitivity decreased from 67.3 to 51.8% and the specificity increased from 41.7 to 55.1%, we identified the same variables for the score in the first 18 months and after 18 months as were identified for the combined score. This suggests that the score may be applied at routine clinic visits while on first-or second-line ART (providing that the patient has been on treatment for more than six months) and combined with targeted viral load testing.
In agreement with Labhardt and colleagues who reported an increase in mortality in patients with a score ]5, we showed an increase in mortality in those with a score ]4 vs. B4 [31]. Those with a score ]4 (229/1889; 12.1%) had an increased risk of mortality compared to those with a score B4 (136/1786; 7.6%) (HR 1.71; 95% CI 1.38Á2.11). Discordant responses, including incomplete viral suppression in the presence of immunologic response, known to occur in resource-limited settings, may be associated with increased mortality [41].
These findings should be interpreted in light of some limitations. First, since the score is dependent on multiple study variables, the analysis is limited by missing data. As mentioned, self-reported adherence had too few observations and could not be included in the model, therefore missed medical visits or missed ARV drug pickup were used as a measure of adherence [27]. Second, the sensitivity of the predictor score with CD4 criteria performed better than the predictor score without CD4 criteria (57.1% vs. 40.9%). Although the usefulness of the predictor score would be limited by access to CD4 count measurements, Marinucci and colleagues reported that by 2010 over 60% of rural ART clinic had access to CD4 count measurements on site [28]. Therefore, the predictor score could be applicable to most resource-limited settings. Third, consistent with other studies, we did not report on individual conditions but rather grouped them into new or recurrent WHO stage III/IV condition and diagnosis of new condition or symptom [6,37]. Fourth, conditions or symptoms can be reported at scheduled and unscheduled medical visits while viral load is only measured every 6Á12 months. It is therefore possible that virologic failure precedes the report of a new condition or symptom resulting in temporal ambiguity. Also, we do not distinguish between immune reconstitution inflammatory syndrome (IRIS) and opportunistic infections. Finally, as the ART treatment guidelines change to initiate patients at a CD4 threshold B350 cells/mm 3 or B500 cells/mm 3 , it is not clear how this will affect the predictor score Á a point that was highlighted by van Griensven and colleagues [37].
WHO recognizes the importance of regular monitoring, including viral load, for reinforcing adherence to ART [42,43]. This algorithm may be useful in settings, which lack the capacity to implement routine viral load testing, to discriminate between patients in need of adherence support and those that may require testing to confirm virologic failure (Figures 1 and 2). We are currently testing the scoring algorithm prospectively, in the same setting, to determine if a changing (worsening) score can predict virologic failure or perhaps identify poor adherence in patients on ART. Since monitoring strategies for resource-constrained settings should include simpler, cheaper assays or methods that can be administered by minimally trained clinic personnel, it will be interesting to see how the predictor score administered by a primary health care nurse compares to ART monitoring performed by clinicians, and used to derive the predictor score.

Conclusions
We identified a predictor score that had an acceptable diagnostic performance Á comparable to other studies Á and performs better than WHO clinic-immunological criteria or WHO staging. This algorithm has a number of strengths, including virologic failure based on two consecutive viral loads, the score uses clinical and laboratory values at the visit preceding virologic failure and that the score can be applied at any point six months after the start of the treatment. The advantage of scoring systems is that they are more flexible than WHO criteria in allowing ART providers to choose the proportion of the population that undergo targeted viral load testing (i.e. the cut-off at which to allocate viral load assays is flexible) [34]. For example, a lower cut-off ( B4 vs. B3) could be used to increase sensitivity (57.1% vs. 75.7%) at the cost of relying on more viral loads. Practical application of the algorithm developed with CD4 criteria could mean that approximately 40% of true viral failure will be misclassified as viral suppression by the algorithm. Perhaps further categorizing patients into low-, medium-and high-risk for virologic failure where medium-risk patients are targeted for intensive support including more frequent monitoring or adherence counselling may help avoid late switching from a failing regimen. Importantly, as many recommend, where possible the viral load should be confirmed before switching therapy [5,6,34,44]. This avoids premature or inappropriate switching to more costly and complicated second-line regimens.
Although studies indicate that the predictive value of predictor scores can be improved by adding laboratory and clinical information Á to date, none have shown to be accurate enough to replace viral load testing [31].