Impact of centralized diagnostic review on quality of initial staging in Hodgkin lymphoma: experience of the German Hodgkin Study Group

Accurate clinical staging is crucial for adequate risk‐adapted treatment in Hodgkin lymphoma (HL) to prevent patients from under‐ or over‐treatment. Within the latest German Hodgkin Study Group trial generation, diagnostic findings such as histopathology, computerized tomography imaging and clinical risk factors were re‐evaluated by expert panels. Here, we retrospectively analysed 5965 patients and identified 399 in whom major discordant findings changed their first‐line treatment allocation. Histopathology review did not confirm the initial diagnosis of HL in 87 patients. Treatment allocation was revised in 312 of the remaining 5878 patients: 176 were assigned to a higher and 128 to a lower risk group, respectively; the correct treatment group remained unclear in 8 patients. Cases of revised treatment allocation accounted for 9·8%, 6·0%, 0·8%, and 14·8% of patients initially assigned to the HD13, HD14, HD15 trials and stage IA lymphocyte‐predominant HL project, respectively. Most revisions were due to wrong application of clinical stage (20·5% of 312 patients with revised treatment group), histological subtype (9·0%) or the risk factors ≥3 involved areas (46·8%) or large mediastinal mass (9·3%). In conclusion, centralized review by experienced experts changed risk‐adapted first‐line treatment in a relevant proportion of HL patients. Quality control measures clearly improve the accuracy of treatment and should be implemented in clinical practice.

Despite substantial progress in the treatment of Hodgkin lymphoma (HL) (von Tresckow et al, 2012;Behringer et al, 2014), therapy-related side effects, such as secondary malignancies, organ toxicities, infertility, fatigue and impaired quality of life are of major concern (Aleman et al, 2003;Heutte et al, 2009;Swerdlow et al, 2011;Sasse et al, 2012;Behringer et al, 2013). Recent activities in this field thus focus on more individualized therapeutic concepts to further reduce treatment-related side effects while maintaining the current outcome levels (Hay et al, 2013;Klimm et al, 2013). To implement truly customized therapeutic approaches that prevent patients from being under- or over-treated, correct treatment group allocation (TGA) based on detailed assessment of histopathological profile and potential targetable structures is essential (Eichenauer et al, 2014).
Discrepancies in the interpretation of diagnostic findings between local institutions and centralized review with impact on TGA and radiotherapy (RT) volume have been observed in a relevant proportion of patients within several studies (Eich et al, 2004;Stevens et al, 2012). Within the German Hodgkin Study Group (GHSG) HD13, HD14 and HD15 trials and the observational project for stage IA lymphocyte-predominant HL (NLPHL-IA), diagnostic findings, including histopathology and images from contrast-enhanced computerized tomography (CT) scans, were centrally reviewed by GHSG multidisciplinary quality panels. To better understand the impact of our quality control measures for initial staging, we analysed the proportion of patients with revised histopathological diagnosis or treatment group re-allocation according to the GHSG risk stratification.

Patients
Between 28 January 2003 and 30 September 2009, over 400 trial centres in Germany, Czech Republic, Switzerland, Austria, and the Netherlands recruited patients for the concurrent GHSG trials HD13 (n = 1694), HD14 (n = 2032), HD15 (n = 2173) and NLPHL-IA (n = 81). Patients with newly diagnosed HL were allocated to one of the trials according to their treatment group as described below. All patients provided written informed consent before study entry. Fifteen patients withdrew consent and were thus excluded, resulting in a total of 5965 analysed patients.
Patients received risk-adapted treatment consisting of multi-agent chemotherapy, RT or both modalities. Treatment groups were defined by clinical stage and the presence of risk factors including (a) large mediastinal mass (≥1/3 of the maximal thoracic diameter), (b) extranodal disease, (c) elevated erythrocyte sedimentation rate (≥50 mm/h without B symptoms, ≥30 mm/h with B symptoms), and (d) involvement of three or more nodal areas. Patients with early-stage favourable HL, i.e. those in clinical stage I-II without risk factors (a-d), qualified for the HD13 trial, except for patients diagnosed with stage IA nodular lymphocyte-predominant HL (NLPHL-IA). Patients with stage I-II and at least one of the risk factors (a-d) were classified as early-stage unfavourable HL and qualified for the HD14 trial, except for patients having stage IIB and at least one of the risk factors (a-b). These patients, as well as those in clinical stage III-IV, qualified for the HD15 trial for advanced-stage HL. Except for the NLPHL-IA, all trials were registered at Current Controlled Trials (ISRCTN63474366, ISRCTN04761296, ISRCTN32443041) and recently published (von Tresckow et al, 2012;Behringer et al, 2014). In NLPHL-IA, patients were treated with 30 Gy involved-field RT (IFRT) and followed up for tumour status and survival (Eichenauer 2015).
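The allocation rules in the preceding paragraph can be written down as a small decision function. The following is a minimal sketch only, not the GHSG database logic: the `Staging` record, its field names and the `allocate` function are hypothetical names introduced here for illustration, and the sketch assumes that the clinical stage and all four risk factors have already been determined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Staging:
    """Hypothetical record of initial staging findings (illustrative only)."""
    stage: int                          # clinical stage I-IV encoded as 1-4
    b_symptoms: bool = False
    nlphl: bool = False                 # nodular lymphocyte-predominant histology
    large_mediastinal_mass: bool = False  # risk factor (a)
    extranodal_disease: bool = False      # risk factor (b)
    esr_mm_per_h: float = 0.0             # used for risk factor (c)
    involved_nodal_areas: int = 1         # used for risk factor (d)


def allocate(s: Staging) -> str:
    """Return the trial a patient qualifies for under the rules in the text."""
    if s.stage not in (1, 2, 3, 4):
        raise ValueError("stage must be 1, 2, 3 or 4")
    if s.stage >= 3:
        return "HD15"  # stage III-IV: advanced-stage HL

    rf_a = s.large_mediastinal_mass
    rf_b = s.extranodal_disease
    # ESR threshold is 30 mm/h with B symptoms, 50 mm/h without
    rf_c = s.esr_mm_per_h >= (30 if s.b_symptoms else 50)
    rf_d = s.involved_nodal_areas >= 3

    # Stage IIB with risk factor (a) or (b) counts as advanced stage
    if s.stage == 2 and s.b_symptoms and (rf_a or rf_b):
        return "HD15"
    if rf_a or rf_b or rf_c or rf_d:
        return "HD14"  # early-stage unfavourable HL
    # No risk factors: stage IA NLPHL goes to the observational project
    if s.stage == 1 and not s.b_symptoms and s.nlphl:
        return "NLPHL-IA"
    return "HD13"      # early-stage favourable HL


print(allocate(Staging(stage=2, involved_nodal_areas=3)))  # HD14
print(allocate(Staging(stage=2, b_symptoms=True, large_mediastinal_mass=True)))  # HD15
```

Written this way, it is apparent why a single corrected finding, such as a third involved nodal area or a change of histological subtype in stage IA, is enough to move a patient to a different trial.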
The procedures for diagnosis, staging, and trial allocation were similar in all studies. There was no screening phase and patients were randomized without prior centralized review. Patients for whom HL was diagnosed by a primary care pathologist had their tissue samples sent to one of six reference pathology review centres within the GHSG network. The patient was removed from the trial analysis set if HL diagnosis was disconfirmed.
Initial clinical staging was conducted by the treating physician and included contrast-enhanced CT imaging performed and interpreted locally. All diagnostic findings were reported to the GHSG Trial Coordination Centre (TCC). Trial allocation followed by randomization was then performed within the GHSG database. After randomization, a GHSG trial physician reviewed disease extent documentation. In addition, for all patients enrolled into HD13, HD14 and NLPHL-IA, initial CT scans were sent to the University Hospital of Cologne, where a panel of experts in imaging and radiation oncology re-evaluated the initial findings and developed an IFRT plan for each individual patient. In the HD15 trial, initial imaging was only reviewed in patients with a residual mass ≥2·5 cm after completion of chemotherapy (≈1/3 of patients), because the protocol treatment did not routinely include IFRT.
Incorrect allocation to a trial either revealed by the GHSG trial physician or the imaging panel, or based on a correction of initial findings reported to the TCC by the local physician, was documented in the trial database. If protocol treatment had not been started by then, it was at that time still possible to remove the patient from the original trial and register them for the appropriate trial. If the correction was found after treatment had been initiated, the treating physician was informed immediately to stop protocol treatment and treat the patient according to the correct risk group.
In preparation for interim and final analyses, comprehensive data checks were performed by the TCC. Some additional cases with incorrect TGA were identified during this procedure; these were mainly stage IA patients assigned to HD13 who were diagnosed with NLPHL by histopathological review. However, given that checks were mostly performed for patients who had already finished protocol treatment, therapy could not be adjusted.

Statistical analysis
Patients from HD13, HD14, HD15 and NLPHL-IA with documented correction of the initial histopathological diagnosis or TGA were identified within the GHSG database and analysed descriptively for details regarding the correction, including initial and corrected treatment group, decisive corrections and localizations, and time to correction. Patients enrolled into a second GHSG trial after revision of TGA as described above were here analysed according to the trial they were incorrectly first registered to.
Potential risk factors for up or down revision of TGA were assessed by performing multivariate logistic regression analyses; P-values <0·05 were considered significant. Patients initially assigned to NLPHL-IA were excluded from the model evaluating down-staging, and patients assigned to HD15 were excluded from the model for up-staging.
Statistical analyses were performed using SAS software (version 9.4 for Microsoft Windows; SAS Institute, Cary, NC, USA).
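For orientation, the multivariate logistic regression mentioned above has the standard form sketched below. The notation is introduced here for illustration and does not appear in the trial documentation: $p_i$ denotes the probability that the TGA of patient $i$ is revised upwards (or downwards), $x_{ij}$ the value of covariate $j$ for that patient, and $\beta_j$ the fitted coefficient.

```latex
\log\frac{p_i}{1-p_i} \;=\; \beta_0 + \sum_{j=1}^{k} \beta_j\, x_{ij},
\qquad
\mathrm{OR}_j \;=\; e^{\beta_j}
```

An odds ratio above 1 thus indicates higher odds of revision for a one-unit increase in the covariate (or for the named category versus its reference category), and an odds ratio below 1 indicates lower odds, with all other covariates held fixed.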

Results
Of the 5965 analysed cases, 87 were removed from the trials' analysis sets based on disconfirmation of HL diagnosis. Of the remaining 5878 patients (see Table I for patient characteristics), 312 had been initially assigned to an inappropriate treatment group (Fig 1). Of these patients, 176 (56·4%) were then upstaged and 128 (41·0%) were downstaged while the correct risk group remained unclear due to insufficient diagnostics or documentation in eight patients (2·6%). The proportion of HL patients initially assigned to an inappropriate treatment group was 9·8% in HD13, 6·0% in HD14, 0·8% in HD15 and 14·8% in NLPHL-IA.
In total, 399 of 5965 patients (6·7%) had an incorrect histopathological diagnosis or TGA. Details are provided in the following paragraphs and in Tables II and SI.
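The patient flow behind these figures can be retraced with a few lines of arithmetic. The sketch below uses only counts stated in the text; the variable names and the `pct` helper are illustrative and not part of the original analysis, which was performed in SAS.

```python
# Counts as reported in the Patients and Results sections
enrolled = {"HD13": 1694, "HD14": 2032, "HD15": 2173, "NLPHL-IA": 81}
withdrew_consent = 15
hl_not_confirmed = 87
revised = {"upstaged": 176, "downstaged": 128, "unclear": 8}


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * part / whole, 1)


analysed = sum(enrolled.values()) - withdrew_consent   # 5965
confirmed_hl = analysed - hl_not_confirmed             # 5878
n_revised = sum(revised.values())                      # 312
major_discordance = hl_not_confirmed + n_revised       # 399

print(analysed, confirmed_hl, n_revised, major_discordance)
for label, n in revised.items():
    print(label, n, pct(n, n_revised))
print("overall", pct(major_discordance, analysed))
```

Note that the percentages for upstaging and downstaging use the 312 revised patients as denominator, whereas the overall figure of 6·7% uses all 5965 analysed patients.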

Histopathology
Histopathology review was documented for 5269 patients (88·3%) and accounted for disconfirmation of HL diagnosis in 87 of them (1·7%). In most cases a non-HL was diagnosed (70 cases), but expert review also revealed no malignant disease in 17 patients, of whom the majority had initially been allocated to the early-stage favourable HL risk group.
Additionally, correction of histological subtype from classical HL to NLPHL or vice versa resulted in different TGA for 26 HD13 and two NLPHL-IA patients in stage IA without any risk factors and thus accounted for 9·0% of corrections of TGA (Table II). In HD13, more males than females were down-staged due to histological subtype [21 of 982 male (2·1%) and 5 of 680 female (0·7%) patients].
Table I. Baseline patient demographics and clinical characteristics of HL patients according to study.

Extent of disease and risk factors
The review of diagnostic findings, including contrast-enhanced CT scans for initial staging, resulted in a correction of TGA in 254 patients and thus accounted for 81·4% of corrections of TGA. In particular, review of imaging led to the correction of clinical stage in 64 patients (20·5% of corrections of TGA) and correction of the GHSG risk factors ≥3 nodal areas in 146 (46·8%), large mediastinal mass in 29 (9·3%), and extranodal involvement in 15 patients (4·8%), respectively (Table II). Of those patients with different TGA due to correction of clinical stage, 26 switched from stage I or II to stage III or IV and 11 vice versa, representing a major change in disease extent. Critical nodal localizations were mostly found below the diaphragm and more often resulted in down-staging, whereas upstaging of patients was frequently caused by correction of organ involvement (Table SII). In the remaining 27 patients, changes from stage I to stage II or vice versa accounted for altered TGA due to the simultaneous presence of risk factors or NLPHL histology.
Decisive locations for down-staging due to rejection of the risk factor ≥3 nodal areas were usually localized above the diaphragm in both male and female patients; they were found across all supradiaphragmatic regions (Fig 2).
Upstaging due to this risk factor was mostly related to supradiaphragmatic lymph nodes in women; in men, upstaging was often due to additional lesions detected below the diaphragm (Fig 2).
Correction of the risk factor extranodal involvement nearly exclusively resulted in upstaging and was caused by infiltration of varying localizations, such as thoracic wall, nasopharyngeal tissue, bones or lung.
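The breakdown of the 312 revised allocations by decisive correction can be checked in the same way. The sketch below is illustrative only and uses the counts given in the text; the variable names are assumptions made here, and the remainder it computes is simply the number of revisions not itemised in the running text (see Table II for the full breakdown).

```python
total_revised = 312

# Corrections of histological subtype (26 HD13 + 2 NLPHL-IA patients)
histology = 26 + 2

# Corrections arising from review of staging findings and imaging
staging_review = {
    "clinical stage": 64,
    ">=3 nodal areas": 146,
    "large mediastinal mass": 29,
    "extranodal involvement": 15,
}
staging_total = sum(staging_review.values())  # 254

# Direction of the 64 clinical-stage corrections
stage_changes = {"I/II to III/IV": 26, "III/IV to I/II": 11, "I to II or II to I": 27}


def share(n: int) -> float:
    """Share of all revised allocations, in percent with one decimal."""
    return round(100 * n / total_revised, 1)


for reason, n in staging_review.items():
    print(reason, n, share(n))
print("staging review total", staging_total, share(staging_total))
print("histological subtype", histology, share(histology))

# Revisions not attributed to a specific correction in the running text
not_itemised = total_revised - staging_total - histology
print("not itemised in text", not_itemised)
```

The check confirms that the four imaging-related categories add up to the 254 patients reported, and that the three types of stage change add up to the 64 clinical-stage corrections.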

Risk factors for incorrect TGA
Significant risk factors for downstaging included male sex [odds ratio (OR) 1·61; P = 0·02] and the trial the patient was initially assigned to (OR 2·91 for HD14 versus HD13 and 0·59 for HD15 versus HD13; P < 0·0001). A non-significant trend towards an increased risk for downstaging was observed in patients of older age and higher body mass index (Table III).
Risk of upstaging was significantly decreased in male patients (OR 0·71; P = 0·03) and older patients (OR 0·98; P < 0·0001), and increased in patients with infradiaphragmatic disease (OR 1·99; P = 0·005). The initial trial also had a significant impact (OR 0·23 for HD14 versus HD13 and 1·94 for NLPHL-IA versus HD13; P < 0·0001) (Table III).

Time until correction & treatment
Median time from enrolment to correction of TGA was 1·6 months (range 0·0-121·1); correction based on review of diagnostic findings and imaging took a median of 1·4 months (range 0·0-121·1). Information on treatment was available for 301 of 312 patients. Treatment according to the revised treatment group was administered in 75 patients (24·9%) and adjusted early in 106 patients (35·2%). Therefore, a total of 181 patients (60·1%) received appropriate treatment due to timely detection of incorrect staging. No adjustment of treatment was reported in the remaining 120 patients, resulting in potential over- or undertreatment in 74 patients (24·6%) and 46 patients (15·3%), respectively (Table II).

Discussion
Although the impact of centralized quality control on the accuracy of RT has been studied in detail (Bijker et al, 2001;Kouloulias et al, 2003;Eich et al, 2004;Fairchild et al, 2012;Kriz et al, 2012), less is known regarding the overall quality of initial staging of HL patients. We thus analysed the effect of our centralized quality control measures on correct allocation to a treatment group in a total of 5965 HL patients treated in the GHSG trials HD13, HD14, HD15 and NLPHL-IA. We found that centralized, multidisciplinary review of diagnostic procedures clearly changed the recommended treatment in a relevant proportion of patients. Revision of the initial treatment group was most frequently caused by a different histopathological diagnosis or revised interpretation of CT imaging.
In the present analysis, 87 patients (1·7%) had a different histology upon expert pathology revision, including no malignant disease in 17 patients and a different lymphoma in 70 patients. Overall, these numbers are substantially lower than observed in an earlier analysis reflecting implementation of consistent classifications and immunohistochemical staining in malignant lymphoma: whereas 8·3% of cases were revised in 1993 (Georgii et al, 1993), this figure was down to 3·0% in 2001 (Glaser et al, 2001). More recently, a report on expert review within a UK network on various lymphoid malignancies revealed discordant histopathological findings relevant for TGA in 2·1% of cases (Proctor et al, 2011). The distinction between different disease entities is highly relevant to every single patient given that treatment approaches may differ fundamentally, especially in those patients not suffering from any malignant disease. Reliable identification of targetable structures expressed within the tumour additionally represents a challenging task in the era of targeted therapies.
Corrections of clinical stage or the risk factor ≥3 nodal areas accounted for the majority of cases re-allocated due to revised CT imaging. Of note, downstaging due to correction of clinical stage might be underreported in our analysis, because initial imaging was not reviewed systematically in HD15. Relevant changes in clinical stage were mainly due to additionally detected organ involvement or absence of infradiaphragmatic lesions. In contrast, a change of the risk factor ≥3 nodal areas mainly occurred in the cervico-thoracic junction and paraaortic region. Discrimination of adjacent lymph nodes and assignment to one of the predefined areas seems to be difficult in these regions of complex and dense anatomic structures and hence constitutes a well-known challenge to clinicians (Dechow et al, 2002). Uncertainty regarding the exact definition of this by now well-established GHSG risk factor (Sieber et al, 2002) and distinction of the different areas may account for some of the revisions.
To define a subgroup of patients at risk for incorrect TGA, we performed multivariate analyses for potential risk factors. These analyses showed that initial treatment group, age, sex and infradiaphragmatic lesions played a decisive role. Interestingly, centre type did not affect the risk of incorrect TGA. This is in accordance with a previous analysis by the GHSG that found no difference in the treatment outcome with regard to centre type or size. The GHSG quality control programme as well as prompt consulting by study physicians via telephone in case of uncertainties prior to enrolment may contribute to this observation.
The quality of initial staging has been addressed in other analyses with a focus on RT, as successful reduction of the RT fields requires precise staging (Girinsky et al, 2006). As previously reported (Kriz et al, 2012), centralized review revealed minor corrections of disease involvement in 56% and 72% of patients allocated to HD13 and HD14, respectively, which are also included in the present analysis. These corrections resulted in revised IFRT fields in 38%, which also contributes to the adequate application of risk-adapted therapy. A more recent analysis in 124 adult HL patients treated at Dutch hospitals outside clinical trials reported necessary corrections of treatment including RT in 19% of cases (Stevens et al, 2012). A similar analysis in 578 paediatric HL patients revealed incorrect staging in 20% of patients, resulting in a correction of treatment group in 13% of patients (Dieckmann et al, 2002). Centralized review by a paediatric Hodgkin study group had to be omitted for 2 years due to insufficient funding which allowed Luders et al (2014) to compare 142 paediatric patients treated outside clinical trials to those treated within the GPOH-HD95 trial. Incorrect TGA and RT-fields resulted in inadequate treatment in 39% of patients leading to a significantly poorer progression-free survival (PFS) in this group of patients (Luders et al, 2014). In most patients included in our analysis, timely adjustment of treatment was possible due to prompt review of diagnostic findings. Given that all review procedures, including shipment of imaging and histopathological material, were only carried out after registration and start of treatment in our studies, time until correction might be further reduced by implementation of a screening phase prior to enrolment. Interestingly, we did not observe a sequential improvement in the quality of TGA over the course of the study durations. 
This might reflect the existing challenge posed by certain aspects of initial staging of HL and emphasizes the need for additional modalities and/or expert review. All diagnostic findings should be interpreted carefully, especially in patients with limited stage or infradiaphragmatic disease, to facilitate adequate TGA also without centralized review.
As a potential limitation of our analysis, we were not able to assess the impact of incorrect TGA on PFS or overall survival (OS). The group of patients at risk for impaired outcome comprises those potentially undertreated due to an upstaging without adjustment of treatment. Due to a timely correction of risk group in the majority of cases, this applied to only 46 patients in our analysis. As some of those patients responded well to the initial treatment approach, additional treatment according to the revised risk group might not have been deemed necessary by the treating physician. Furthermore, due to incomplete documentation after revision of risk group and exclusion from the trial, it remains unclear whether some of these patients were actually undertreated. Thus, it is not possible to draw sound conclusions regarding PFS or OS from this small and heterogeneous subset.
From an oncologist's perspective, correct TGA is especially important when pursuing modern risk-adapted approaches to ensure high quality of care. It is also essential for patients treated within clinical trials to help improve treatment strategies based on data collected in such a trial. Differentiation of risk groups has demonstrated significant impact on PFS and OS, even in early-stage HL. Hence, inappropriate treatment may result in refractory disease and premature death, which is a disaster for the individual patient and results in a loss of productivity in the society (Hanly et al, 2015). Whereas the costs of centralized review of staging images that have already been generated are difficult to measure, the costs for detection of one patient with histopathological misdiagnosis of malignant lymphoma in the UK were 690€ (Proctor et al, 2011). In comparison to the relevant socioeconomic burden in case of relapsed disease or premature death, reasonable costs for centralized review in all HL patients could hence be a useful investment, also for treatment outside clinical trials and with no sponsored funding.
Furthermore, the routine use of additional modalities with high sensitivity and specificity, such as an 18F-fludeoxyglucose positron emission tomography (18F-FDG-PET) scan (Isasi et al, 2005;Cheson et al, 2014), which frequently leads to a correction of disease involvement (Moog et al, 1998;Jerusalem et al, 2001), could improve the quality of initial staging. Despite increased sensitivity, correct interpretation and assessment of PET scans is challenging due to false-positive lesions and heterogeneity in criteria: whereas divergence is high among observers with varying experience (Zijlstra et al, 2007), inter-observer agreement seems to be high in experienced physicians (Hofman et al, 2009) using pre-defined guidelines (Barrington et al, 2010). These findings reflect the importance of qualified evaluation and consistent classifications in any type of imaging.
In summary, we identified 399 cases (6·7%) of incorrect diagnosis or staging resulting in major changes to first-line treatment among 5965 patients registered for clinical HL trials in this largest analysis on quality assessment of initial staging conducted so far. Given that exact initial staging of HL is crucial for risk-adapted treatment, multidisciplinary expert review and more sensitive modalities, such as PET, should be implemented in initial staging for all HL patients to ensure high quality of care. Within clinical trials, a screening phase and centralized review could be implemented to assure high-quality data.