Updated systematic review and meta‐analysis of the predictive value of serum biomarkers in the assessment and management of fever during neutropenia in children with cancer

Routinely measurable biomarkers as predictors for adverse outcomes in febrile neutropenia could improve management through risk stratification. This systematic review assesses the predictive role of biomarkers in identifying events such as bacteraemia, clinically documented infection, microbiologically documented infection, severe sepsis requiring intensive care or high dependency care, and death. This review collates 8319 episodes from 4843 patients. C‐reactive protein (CRP), interleukin (IL)‐6, IL‐8 and procalcitonin (PCT) consistently predict bacteraemia and severe sepsis; other outcomes have highly heterogeneous results. Performance of the biomarkers at admission using different thresholds demonstrates that PCT > 0.5 ng/mL offers the best compromise between sensitivity and specificity: sensitivity 0.67 (confidence interval [CI] 0.53‐0.79) and specificity 0.73 (CI 0.66‐0.77). Seventeen studies describe the use of serial biomarkers, with PCT having the greatest discriminatory role. Biomarkers, potentially with serial measurements, may predict adverse outcomes in paediatric febrile neutropenia and their role in risk stratification is promising.


INTRODUCTION
Neutropenic sepsis, or febrile neutropenia (FN), remains a serious complication of childhood cancer therapy, with bacteraemia in 11-24% of cases, paediatric intensive care unit (PICU) admission in 0.9-11% of cases, and death in 0.2-3% of cases. [1][2][3][4][5][6][7][8] For this reason, children receiving anticancer treatment are frequently required to present to hospital if they have a fever. Subsequently, they experience long hospital admissions for treatment of FN despite data supporting safe and effective use of risk-stratified early discharge. [9][10][11] There are at least 25 different paediatric clinical decision rules for the assessment of FN. These require local calibration before
Abbreviations: CDI, clinically documented infection; CI, confidence interval; CRP, C-reactive protein; FN, febrile neutropenia; FUO, fever of unknown origin; HSCT, hematopoietic stem cell transplant; IL, interleukin; LOS, length of stay; MDI, microbiologically documented infection; PCT, procalcitonin; PICU, paediatric intensive care unit; QUADAS-2, Quality Assessment of Diagnostic Accuracy Studies; ROC, receiver operating characteristic.
Two previous systematic reviews have assessed the use of biomarkers in predicting adverse outcomes in febrile neutropenic episodes in children and young people with cancer. 22,23 These reviews showed marked variation in terms of the quality of individual studies, biomarkers used and outcomes measured, which made it difficult to make comparisons between the biomarkers. Further studies have been published since the last review in 2011, necessitating an updated systematic review to evaluate the sensitivity and specificity of serum biomarkers in predicting adverse outcomes in paediatric FN.

MATERIALS AND METHODS
This is an update of two preceding systematic reviews of the predictive value of serum biomarkers in the assessment and management of fever during neutropenia in children with cancer. 22,23 The review protocol was registered with the International Prospective Register of Systematic Reviews (PROSPERO) in March 2016: CRD42016036350 (https://www.crd.york.ac.uk/prospero/).

Search strategy and selection criteria
The update search strategy mirrored the preceding reviews.

Data extraction and risk of bias assessment
Data were extracted by one reviewer using a standardised data extraction form, which had been used in the preceding systematic reviews, and checked for accuracy independently by a second reviewer.

Methods of data synthesis
Quantitative pooling was performed for the commonest biomarkers if there were sufficient data for meta-analysis, that is, where the same biomarker was reported for equivalent clinical outcomes. Where possible, the groups were analysed for sources of heterogeneity. The mada package in R was used to undertake the data pooling. The results are displayed using cross-hairs plots in receiver operating characteristic (ROC) space. This graphical approach combines the forest plot with the ROC curve, showing study weight, the bivariate relationship of sensitivity and specificity, and the confidence intervals around each individual study as well as the overall pooled estimate.
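As an illustration of the quantities that such pooling works with, the following is a minimal Python sketch, not the bivariate model fitted in R for this review. It assumes a 2 × 2 table per study, a 0.5 continuity correction, logit-scale Wald confidence intervals and a simple fixed-effect inverse-variance pool; the function names and the example counts are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def inv_logit(x):
    return 1 / (1 + math.exp(-x))


def study_accuracy(tp, fp, fn, tn, z=1.96):
    """Return {measure: (estimate, lower, upper)} for one study's 2 x 2 table.

    Assumption: a 0.5 continuity correction is added to every cell and the
    95% CI is a Wald interval computed on the logit scale.
    """
    tp, fp, fn, tn = (v + 0.5 for v in (tp, fp, fn, tn))
    result = {}
    for name, hit, miss in (("sensitivity", tp, fn), ("specificity", tn, fp)):
        centre = logit(hit / (hit + miss))
        se = math.sqrt(1 / hit + 1 / miss)
        result[name] = (
            inv_logit(centre),
            inv_logit(centre - z * se),
            inv_logit(centre + z * se),
        )
    return result


def roc_point(accuracy):
    """Coordinates of a study in ROC space: (1 - specificity, sensitivity)."""
    return 1 - accuracy["specificity"][0], accuracy["sensitivity"][0]


def pool_logit_fixed(pairs):
    """Fixed-effect inverse-variance pool of proportions on the logit scale.

    pairs: list of (hits, misses), e.g. (TP, FN) to pool sensitivity.
    This is a univariate simplification; a bivariate model additionally
    accounts for the correlation between sensitivity and specificity.
    """
    weighted_sum = total_weight = 0.0
    for hit, miss in pairs:
        hit, miss = hit + 0.5, miss + 0.5
        weight = 1 / (1 / hit + 1 / miss)  # inverse of the logit variance
        weighted_sum += weight * logit(hit / (hit + miss))
        total_weight += weight
    return inv_logit(weighted_sum / total_weight)


# Hypothetical studies as (TP, FP, FN, TN); these are not data from the review.
studies = [(20, 10, 10, 60), (15, 25, 5, 80), (30, 12, 18, 90)]
for table in studies:
    print(roc_point(study_accuracy(*table)))
print("pooled sensitivity:", pool_logit_fixed([(tp, fn) for tp, _, fn, _ in studies]))
```

Each study's point and its two confidence intervals correspond to one set of cross-hairs in ROC space; the pooled value here is only a rough stand-in for the bivariate summary estimate.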

RESULTS
The search strategy identified 509 articles, of which 38 new articles were included. Sixteen studies were excluded because data were not extractable either by a 2 × 2 table of dichotomized data or by a measure of central tendency plus spread. Twenty-two remaining studies with suitable quantitative data were combined with 21 studies from the preceding two systematic reviews (Figure 1). Two further studies were excluded before quantitative synthesis because there were insufficient studies looking at similar outcomes using interleukin (IL)-10 24 or insufficient numbers examining adrenomedullin. 25

Study and population characteristics
Twenty-two new studies with quantitative data were identified in this update, comprising 1851 patients and 3060 episodes. The new studies were geographically diverse (11 different countries) with an appropriate range of paediatric malignant diagnoses (Table 1). The mean age of the patients within these studies was 6.7 years, with an age range between 0.3 and 23 years. One study did not provide data on patient age. 26

Risk of bias assessment
The summarised QUADAS-2 assessment of the 22 new studies is shown in Figure 2; the quality assessment of individual studies is provided in Supporting Information S2.
The selection process was inadequately described in 12 (55%) studies and three included studies were not cohort designed. Two of these studies were case-control 40,44 and one was a clinical trial. 38 FN data relating to biomarkers and outcomes were extractable from these studies but case-control studies have been shown to exaggerate diagnostic accuracy estimates. 48 The timings of tests were not given in two studies. 19,26 'Admission' samples were reported as taken before the commencement of treatment. When serial biomarkers were used, timings varied between studies and, where mean results were given, the handling of missing values was unclear.

Quantitative synthesis
Bivariate meta-analysis of the biomarkers at different cutoff levels reiterates the expected relationship found in the previous reviews: low biomarker cutoff levels predict adverse outcomes with great sensitivity but poor specificity, and high biomarker cutoff levels predict adverse outcomes with poor sensitivity but good specificity (Table 2).
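The relationship between cutoff level, sensitivity and specificity can be sketched with a small Python example. The biomarker values and outcomes below are invented purely for illustration and are not data from any included study; the function name is hypothetical, and a test is assumed positive when the value exceeds the cutoff.

```python
def sens_spec_at_cutoff(values, outcomes, cutoff):
    """Sensitivity and specificity when 'value > cutoff' counts as test positive."""
    tp = sum(1 for v, o in zip(values, outcomes) if o and v > cutoff)
    fn = sum(1 for v, o in zip(values, outcomes) if o and v <= cutoff)
    tn = sum(1 for v, o in zip(values, outcomes) if not o and v <= cutoff)
    fp = sum(1 for v, o in zip(values, outcomes) if not o and v > cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical admission biomarker values (arbitrary units) and whether an
# adverse outcome occurred in that episode.
values = [0.1, 0.2, 0.3, 0.4, 0.6, 0.8, 1.5, 2.5, 0.15, 3.0]
outcomes = [False, False, False, True, False, True, True, True, False, True]

for cutoff in (0.2, 0.5, 2.0):
    sens, spec = sens_spec_at_cutoff(values, outcomes, cutoff)
    print(f"cutoff {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

In this toy data set, raising the cutoff from 0.2 to 2.0 lowers sensitivity and raises specificity, which is the same direction of trade-off as described above.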

Comparison of biomarkers
Thirteen studies in this review used more than one biomarker and gave comparative descriptions of performance. [26][27][28][29][30][31][32]37,38,42,44,45,47 Three of the four studies comparing the performance of CRP and PCT 31,32,37,44 found the latter to be better at predicting adverse outcomes. Such comparisons can be affected by the choice of cutoff, but the following findings are consistent across thresholds. PCT appeared to be more discriminatory at admission, whereas CRP was more discriminatory after 48 h. In one of these comparative studies, CRP was more sensitive but not more specific than PCT. CRP was also found to be more sensitive but less specific in a study where its performance was compared with IL-8. 42 There were seven studies evaluating CRP and IL-8 or IL-6 26,28-30,37,38,42 but only two compared their predictive capacities, finding that the ILs added greater predictive value than CRP. 29,30 Five studies explored the predictive role of IL-6 and IL-8: one found IL-8 to perform better, 30 one found them to be equivalent, 45 and the other three did not make any comparisons. 26,27,37 There were no direct comparisons of PCT with IL-8 but one study compared the predictive value of PCT with that of IL-6, 47 finding that IL-6 demonstrated better discriminatory power both at admission and at 12-24 h after admission, particularly at admission. The authors also found that combining PCT (>0.25 ng/L) with IL-6 (>60 ng/L) significantly increased the likelihood of identifying a bacterial infection at both time points.

Use of serial biomarkers
Eleven new studies in this systematic review assessed the four commonest biomarkers at more than one time point. Serial CRP levels were evaluated in seven studies, PCT in six studies, IL-8 in four studies and IL-6 in four studies (Table 3). The description of the timings was often unclear or varied; for example, studies describe the timing of the initial biomarker as 'admission', 'day 0' or 'day 1'. Meta-analysis was not possible due to varying time points and outcome measures, and insufficient data.
Four of seven studies evaluating serial CRPs described a better predictive value after 48 h than at admission, echoing five out of six studies showing serial PCTs were likely to be more useful than single PCTs.
The claimed benefit of serial IL-8 levels was inconsistent, and two of the four IL-6 studies showed no benefit in serial assessment.
This review did not find adequate data to perform meta-analyses on LOS in hospital or community-based treatment (i.e. treatment duration) but the available data for such outcomes are likely to be confounded by centre-specific FN policy.

DISCUSSION
The biomarkers' predictive ability decreased in sensitivity and increased in specificity as the cutoff level increased. The potential use of different biomarker assays between studies for a given threshold may affect the reliability of the pooled sensitivity (pSn) and pooled specificity (pSp) results obtained, especially where fewer studies contributed to the pSn/pSp of a threshold.
The trade-off between acceptable levels of sensitivity and specificity is a clinical decision, and factors such as study/episode numbers and heterogeneity of data should be considered when deciding which threshold to use in clinical practice.
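One way to make this trade-off concrete is to convert a sensitivity and specificity pair into likelihood ratios and post-test probabilities. The Python sketch below uses the pooled admission estimates reported in this review for PCT > 0.5 ng/mL (sensitivity 0.67, specificity 0.73). The pre-test probability of 0.15 is an assumption chosen only for illustration, and the function names are hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios of a dichotomised test."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a probability via odds: post-test odds = pre-test odds x LR."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Pooled admission estimates for PCT > 0.5 ng/mL as reported in this review.
SENSITIVITY, SPECIFICITY = 0.67, 0.73
# Assumption for illustration only: 15% of episodes have an adverse outcome.
PRE_TEST = 0.15

lr_pos, lr_neg = likelihood_ratios(SENSITIVITY, SPECIFICITY)
after_positive = post_test_probability(PRE_TEST, lr_pos)
after_negative = post_test_probability(PRE_TEST, lr_neg)

print(f"LR+ {lr_pos:.2f}, LR- {lr_neg:.2f}")
print(f"probability after a positive test: {after_positive:.2f}")
print(f"probability after a negative test: {after_negative:.2f}")
```

Whether the residual probability after a negative test is low enough to support a reduced-intensity strategy remains a clinical judgement that depends on the local pre-test risk.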
Comparative descriptions of the biomarkers found CRP to be the poorest performing biomarker. The ILs possibly have a predictive role within 24 h of admission but greater patient numbers and more studies are required to strengthen this finding. PCT is more discriminatory at admission and performs better than CRP, but its performance against the ILs has been explored in only one study, against IL-6.

CONCLUSIONS
Biomarkers have been used to fortify existing clinical decision rules in the management of FN. The choice of biomarker for predicting an adverse outcome and the choice of optimal threshold remain inconclusive due to the variability within and between studies. However, based on this review, PCT at a threshold of 0.5 ng/mL appears the most suitable admission biomarker to predict adverse outcomes. There may be additional benefit in using serial PCT measurements. This needs to be validated through a larger multicentre study, using consistent biomarker timings, assays and outcome definitions, before widespread clinical recommendation and use.