98% IGHV gene identity is the optimal cutoff to dichotomize the prognosis of Chinese patients with chronic lymphocytic leukemia

Abstract Immunoglobulin heavy chain variable region (IGHV) mutational status has been an important prognostic factor for chronic lymphocytic leukemia (CLL) for decades. Patients with unmutated IGHV (≥98% identity to the germline sequence) have an inferior prognosis and tend to carry unfavorable genetic markers compared to those with mutated IGHV (<98% identity to the germline sequence). However, 98% as the cutoff for IGHV mutational status is a mathematical choice and remains controversial. We have previously reported distinct IGHV repertoire features between Chinese and western CLL populations. Here, we retrospectively studied 595 Chinese CLL patients to determine the best cutoff value for IGHV in the Chinese CLL population. Using 1% as the interval for IGHV identity, we divided the studied cohort into seven subgroups ranging from <95% to 100%. Shorter time to first treatment (TTFT) and overall survival (OS) were observed in cases with ≥98% identity compared to those with <98%, while the differences were obscure within the ≥98% subgroups (98%‐98.99%, 99%‐99.99%, and 100%) and within the <98% subgroups (<95%, 95%‐95.99%, 96%‐96.99%, and 97%‐97.99%). Multivariate analysis confirmed the independent prognostic value of 98% as the cutoff for IGHV identity in terms of both TTFT and OS. All the prognostic factors, including del(17p13), del(11q22.3), TP53 mutation, MYD88 mutation, NOTCH1 mutation, SF3B1 mutation, CD38, ZAP‐70, Binet staging, gender, and β2‐microglobulin, were significantly different in distribution between the <98% and ≥98% groups, but not among the 98%‐98.99%, 99%‐99.99%, and 100% subgroups. In conclusion, 98% is the optimal cutoff of IGHV identity for the prognosis evaluation of Chinese CLL patients.


| INTRODUCTION
Chronic lymphocytic leukemia (CLL) is the most common type of adult leukemia in western countries. 1 The mutational status of the immunoglobulin heavy chain variable region (IGHV) gene is one of the most important prognostic factors for CLL patients, with high identity to the germline IGHV sequence corresponding to poor prognosis. [2][3][4][5][6][7][8][9] The criteria for differentiating IGHV mutational status have varied from 95% to 98%. The 98% cutoff value to dichotomize IGHV mutational status, recommended by the European Research Initiative on CLL (ERIC), is now widely used 10 : IGHV sequences with <98% identity to the germline sequence are termed mutated (M), while those with ≥98% identity to the germline sequence are termed unmutated (UM). However, the "borderline" cases, defined as 97%-97.99% identity to the germline, represent a mixed group of both indolent and aggressive cases, 10,11 which has prompted discussion regarding the optimal cutoff value for IGHV. Davis et al reported that trichotomy by 97% and 99% could better predict time to first treatment (TTFT) (in a stage A cohort) and overall survival (OS) for the whole cohort (after ruling out cases with subset#2). Progression-free survival (PFS) was also different between the <97% cohort and the ≥97% cohort in the clinical trial cohort. 12 Another team reported that the continuous IGHV mutational rate, rather than a point value, was prognostically significant in patients treated with fludarabine, cyclophosphamide, and rituximab (FCR). 13 Since all of the previous studies on IGHV cutoff values were from western countries and predominantly comprised Caucasians, who differ from Asian CLL patients in terms of both clinical features and IGHV gene usage, 14,15 we conducted this retrospective study of 595 cases in order to fill the gap in IGHV cutoff value studies in Chinese CLL patients.
Through clinical correlation and survival analysis, we found that 98% IGHV gene identity is still the optimal choice for prognosis prediction in our study cohort.
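The ERIC dichotomy and the "borderline" range described above can be written out as a minimal sketch. The function names `ighv_status` and `is_borderline` are hypothetical and introduced only for illustration; the thresholds are the ones stated in the text (UM at ≥98% identity, borderline at 97%-97.99%).

```python
def ighv_status(identity_pct, cutoff=98.0):
    """Classify an IGHV sequence by percent identity to germline.

    Returns "UM" (unmutated) when identity >= cutoff, otherwise "M" (mutated).
    The default cutoff of 98.0 follows the ERIC recommendation cited in the text.
    """
    if not 0.0 <= identity_pct <= 100.0:
        raise ValueError("identity must be a percentage between 0 and 100")
    return "UM" if identity_pct >= cutoff else "M"


def is_borderline(identity_pct):
    """True for the 97%-97.99% 'borderline' range mentioned in the text."""
    return 97.0 <= identity_pct < 98.0


if __name__ == "__main__":
    # Hypothetical identity values, not patient data.
    for value in (94.2, 97.5, 98.0, 100.0):
        print(value, ighv_status(value), is_borderline(value))
```

Passing a different `cutoff` (for example 97.0) makes it possible to compare alternative dichotomies with the same helper.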

| Patients
The study included 595 patients with newly diagnosed CLL/small lymphocytic lymphoma (SLL) based on the criteria of the International Workshop on CLL-National Cancer Institute (IWCLL-NCI). 16 All patients were from our center (diagnosed from 2002 to 2017). The study was approved by the Ethics Committee of the First Affiliated Hospital of Nanjing Medical University with the reference number 2014-SR-204. Informed consent was provided according to the Declaration of Helsinki before the samples were collected.

| Analysis of immunoglobulin rearrangements
Mononuclear cells were isolated from peripheral blood using lymphocyte separation medium. Then gDNA or cDNA was subjected to polymerase chain reaction (PCR) amplification following the IGH Somatic Hypermutation Assay v2.0 protocol (InVivoScribe); PCR was performed on a Veriti 96-well thermal cycler (Applied Biosystems). The kit provides both leader primers and FR1 primers. The latter were used when detection failed with the leader primers. In most cases, we used the leader primers to determine the IGHV somatic hypermutation status of clonotypic IGHV-IGHD-IGHJ gene rearrangements. IGHV-D-J rearrangements were sequenced on a 3130 Genetic Analyzer (Life Technologies, Carlsbad, CA).
Sequences were aligned against the ImMunoGeneTics/V-QUEry and Standardization (IMGT/V-QUEST) database using the IMGT/V-QUEST tool (version 3.3.0). IGHV usage and rates of somatic hypermutation of productive rearrangements were recorded. IGHV identity was adjusted only when the option "search for insertions/deletions" was shown.

| Immunophenotyping
The procedures for immunophenotyping of CD38 and ZAP-70 by flow cytometry have been described previously. 17 The positive cutoff values for CD38 and ZAP-70 were 30% and 20%, respectively.

| Statistical analysis
OS was calculated from diagnosis to death or last follow-up. TTFT was calculated as the time between diagnosis and first treatment. Survival curves were generated by the Kaplan-Meier method. The log-rank test was used to assess the significance of differences between survival curves. Categorical variables were compared by the Chi-square test. Cox regression analysis was used to determine the hazard ratio (HR). Variables that were significant in univariate analysis were included in the multivariate Cox proportional hazards model. Statistical analyses were performed with IBM SPSS Statistics 23 (IBM Corporation, Armonk, NY, USA). Tables and figures were drawn with Microsoft Office 2016 and GraphPad Prism 7.0 (GraphPad Software, San Diego, CA). P values were two-sided, and P values < 0.05 were considered significant.
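For readers unfamiliar with the two survival methods named above, the following is a minimal pure-Python sketch of the Kaplan-Meier product-limit estimator and the two-group log-rank test. It is not the SPSS procedure used in this study; the function names and the toy follow-up times are hypothetical, and the P value relies on the chi-square distribution with one degree of freedom.

```python
import math
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : follow-up durations (e.g. months from diagnosis)
    events : 1 if the event (death or first treatment) was observed, 0 if censored
    """
    event_counts = Counter(t for t, e in zip(times, events) if e)
    exit_counts = Counter(times)  # each subject leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk  # product-limit step
            curve.append((t, survival))
        at_risk -= exit_counts[t]
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, two-sided P value)."""
    times = list(times_a) + list(times_b)
    events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(times, events) if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if e and x == t)
        d_b = sum(1 for x, e in zip(times_b, events_b) if e and x == t)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square survival function, 1 df
    return chi2, p_value


if __name__ == "__main__":
    # Toy data only, not from the study cohort.
    print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
    print(logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1]))
```

In practice a dedicated statistics package would normally be preferred; the sketch only makes the arithmetic behind the curves and the test explicit.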

| Subjects
The characteristics of the 595 patients are summarized in Table 1. The median age was 61.4 years (range 16-92) and 60.9% were male. Twenty-eight patients were diagnosed with SLL and the remaining 567 were CLL patients. A total of 214 (39.9%) patients were in Binet A, 148 (25.8%) in Binet B, and 184 (34.3%) in Binet C. Sixteen (2.7%) patients developed Richter's syndrome.

| Influence of mutational load on clinical outcomes
A total of 352 (58.7%) cases were M, while 248 (41.3%) cases were UM under the classical 98% classification by ERIC. To determine the optimal cutoff value, we used 1% as the interval to divide the entire cohort into seven groups according to IGHV identity: <95%, 95%-95.99%, 96%-96.99%, 97%-97.99%, 98%-98.99%, 99%-99.99%, and 100%. First, we investigated the best cutoff value in Binet A patients (n = 213, with one patient lost to follow-up). Cox regression analysis showed that only the 100% group (hazard ratio (HR): 2.46, P = .001) differed significantly in TTFT from the <95% group (Table 2A). Then, we compared TTFT in the whole cohort (n = 586, with nine patients lost to follow-up  Table 2C).
It has been reported that up to 30% of CLL patients carry stereotyped B cell receptors (BCRs), with some subsets conferring specific clinical outcomes, especially subset#2, which is characterized by IGHV3-21 usage and predominantly mutated IGHV status. 11,19,20 The lack of CDR3 information in 131 patients limited further identification of subsets. However, we still identified subsets among the remaining 469 sequences. Only one patient belonged to subset#2 among the 469 evaluable sequences, a result consistent with our previous study. 15 Given the rarity of subset#2 in our cohort, we considered that this isolated case could not affect the result of the cutoff analysis. As a compromise, we excluded all IGHV3-21 cases, which tend to have a poor prognosis, although their prognosis remains controversial. 21,22 The 98% cutoff value still predicted TTFT and OS better than any other cutoff value in the cohort without IGHV3-21 cases (Table S2).
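The seven 1% identity intervals used for grouping can be illustrated with a short sketch. The helper name `identity_group` and the example identity values are hypothetical; only the interval boundaries come from the text.

```python
import math
from collections import Counter

# Group labels in ascending order of identity, as listed in the text.
GROUP_LABELS = [
    "<95%", "95%-95.99%", "96%-96.99%", "97%-97.99%",
    "98%-98.99%", "99%-99.99%", "100%",
]


def identity_group(identity_pct):
    """Map a percent identity to germline onto one of the seven subgroups."""
    if not 0.0 <= identity_pct <= 100.0:
        raise ValueError("identity must be a percentage between 0 and 100")
    if identity_pct == 100.0:
        return "100%"
    if identity_pct < 95.0:
        return "<95%"
    lower = math.floor(identity_pct)  # 95, 96, 97, 98 or 99
    return f"{lower}%-{lower}.99%"


if __name__ == "__main__":
    # Hypothetical identity values, not patient data.
    identities = [93.1, 95.4, 97.99, 98.0, 99.6, 100.0, 100.0]
    counts = Counter(identity_group(v) for v in identities)
    for label in GROUP_LABELS:
        print(f"{label:>11}: {counts.get(label, 0)}")
```

Each subgroup can then be compared against a reference group (the <95% group in the analysis described here) in a Cox model.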

| Clinical correlations
Although 98% was the appropriate cutoff for TTFT and OS in our study, we still wanted to know whether other prognostic factors were unevenly distributed between the M and UM groups, and whether an accumulation of these poor prognostic factors explained the gradual increase in the HR of OS across the three intervals ≥98% (2.94, 3.44, 4.25, respectively).
There were no statistically significant differences in the distribution of these prognostic factors among the three subgroups ≥98%, with the exception of SF3B1 mutation. The frequency of SF3B1 mutation in the 98%-98.99% group was significantly higher than that in the 99%-99.99% and 100% groups (P = .032, P = .004, respectively, Table 3). However, given that SF3B1 mutation was not a prognostic factor for TTFT or OS in our study, this did not change our conclusion. In addition, because SF3B1 mutation was infrequent, the theoretical frequencies were less than 5 in some groups.
Therefore, we conclude that within the ≥98% group, the gradual increase in HR was more likely due to the decrease in IGHV mutational rate rather than other concurrent effects of poor prognostic factors we have known. The lower the mutational rate of IGHV is, the higher the HR of OS is.
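The caveat above about theoretical (expected) frequencies below 5 refers to a standard check on the Chi-square test for contingency tables. A minimal sketch of that check follows; the function names and the counts are hypothetical and are not taken from Table 3.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Pearson chi-square statistic for an r x c table of observed counts."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )


def has_small_expected(table, threshold=5.0):
    """True if any expected count falls below the threshold (conventionally 5)."""
    return any(exp < threshold for row in expected_counts(table) for exp in row)


if __name__ == "__main__":
    # Hypothetical 2 x 2 table: rows = two identity subgroups,
    # columns = marker mutated / marker wild type.
    sparse = [[1, 9], [2, 88]]
    print(expected_counts(sparse))
    print(has_small_expected(sparse))
```

When the check returns True, the Chi-square approximation is less reliable and the corresponding P values should be read with caution, which is the point made in the text.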
Then we conducted multivariate Cox regression analyses containing the prognostic factors above. UM-IGHV (HR:  Table 4). β 2 -MG showed marginal significance in multivariate analysis of TTFT (P = .053). It should be noted that neither TP53 mutation nor del(17p13) was an independent prognostic factor for TTFT in this cohort, probably due to their weak power as indications for treatment of CLL.   27 Perhaps the current debate on the best IGHV cutoff values is due to the effect of this group of cells with complex behavior. There are differences in IGHV usage and BCR stereotypy between CLL patients from East Asia and those from western countries. IGHV1-69, IGHV3-07, IGHV3-23, and IGHV4-34 are the most frequently used genes in CLL patients from the West, while IGHV4-34, IGHV3-23, IGHV3-07, and IGHV4-39 are the most frequently used genes in patients from East Asia. [4][5][6]14,15 In western CLL patients, almost one third of the IGHV sequences belong to stereotyped BCRs. However, in East Asian patients, the proportion of stereotyped BCRs is significantly lower; in our previous research, the proportion was 22.4%. 14,15 Of note, patients in East Asia seem to have a significantly higher proportion of subset #8, while subset #2, which is common in western patients, is rare. 15 These differences in IGHV sequences between ethnic groups possibly originate from ethnic genetic diversity and environmental effects, which urged us to seek the best cutoff value for IGHV in Chinese patients.

| DISCUSSION
In this study, we verified that 98% is the optimal cutoff for IGHV in Chinese CLL patients for prognosis evaluation. At the same time, ERIC also notes that caution is warranted in borderline cases and that the clinical implications remain to be elucidated. Multivariate analysis showed that the 98% cutoff value was an independent prognostic factor for TTFT and OS. All the prognostic factors involved in our study were significantly different between the two groups dichotomized at 98%, indicating the high efficiency of 98% as a cutoff value for IGHV in assessing patients. We also found that, within the ≥98% subgroups, the increased HR was associated with an increased IGHV identity to the germline sequence and not with other prognostic factors. However, the IGHV mutational status seemed to have limited effects on TTFT and OS in our Binet A cohort (accounting for 40% of the whole cohort), which may have resulted from the uneven distribution of case numbers across intervals (the five groups from 95% to 99% had no more than 20 cases each). A larger CLL cohort is needed to verify this result.
Our study also had some limitations. First, the follow-up time in our study was shorter than that in similar studies. 12,13 In addition, due to the heterogeneity of therapy, we did not take the effects of treatment into consideration in either univariate or multivariate analysis. Therefore, it cannot be completely ruled out that the increase in HR within the ≥98% group was partially affected by treatment strategies, whereas PFS was adopted as an important index for assessing the efficacy of treatment in the other two recently published studies regarding the IGHV cutoff value. 12,13 Among them, Jain et al suggested that IGHV as a continuous variable can accurately predict PFS in patients treated with FCR. We hope that an appropriate Chinese patient cohort will allow us to explore whether PFS should be assessed using different IGHV cutoff criteria.
In conclusion, we show that the 98% cutoff value for IGHV is still the optimal choice for clinical applications. However, this may change. In the current "new agent" era, targeted drugs, such as ibrutinib, idelalisib, and venetoclax (ABT-199), have shown favorable effects in UM patients. [28][29][30] When ibrutinib was used in relapsed/refractory UM patients, the median PFS was 43 months after a median follow-up of 5 years. 31 However, targeted drugs are generally expensive and the follow-up time of clinical trials is not yet long enough. IGHV mutational status together with FISH could still better predict TTFT in newly diagnosed patients, which could guide follow-up intervals and treatment strategies. 30 On the other hand, with the development of new technologies such as next-generation sequencing (NGS), up to 25% of CLL patients have been found to have ≥2 IGHV rearrangements (the proportion was only 5% by Sanger sequencing). 32,33 There are also patients who have multiple sequences that fall into different mutational status categories according to current standards. Once NGS is widely used, will 98% remain the appropriate cutoff value for IGHV, or will the definition of IGHV mutational status be fundamentally revised? These questions require further research.