A reduced panel of eight genes (ATM, SF3B1, NOTCH1, BIRC3, XPO1, MYD88, TNFAIP3, and TP53) as an estimator of the tumor mutational burden in chronic lymphocytic leukemia

Abstract
Introduction: Mutational complexity, or tumor mutational burden (TMB), influences the course of chronic lymphocytic leukemia (CLL). However, this information is not routinely used because TMB is usually obtained from whole genome or exome sequencing, or from high‐throughput sequencing of large gene panels.
Methods: Here, we used the C‐Harrel concordance index to determine the minimum panel of genes whose mutations predict treatment‐free survival (TFS) as well as large resequencing panels do.
Results: An eight gene estimator was defined, encompassing ATM, SF3B1, NOTCH1, BIRC3, XPO1, MYD88, TNFAIP3, and TP53. TMB estimated from either a large panel of genes or the eight gene estimator was increased in treated patients and in those with a short TFS (<2 years), an unmutated IGHV gene, or an unfavorable karyotype. As an independent prognostic parameter, the presence of any mutation in the eight gene estimator predicted a shorter TFS better than Binet stage and IGHV mutational status among patients with an apparently non‐progressive disease (TFS >6 months). Strikingly, the eight gene estimator was also highly informative for patients with Binet stage A CLL or with a good prognosis karyotype.
Conclusion: These results suggest that the eight gene estimator, which is easily achievable by high‐throughput resequencing, brings robust and valuable information that predicts the evolution of untreated patients at diagnosis better than any other parameter.

any treatment, to rapidly progressive forms, leading to death, and occasionally undergoing transformation to aggressive lymphoma known as Richter syndrome. From a clinical point of view, the Binet and Rai classifications are the most reliable staging systems to predict overall survival. 3,4 However, evolution remains rather heterogeneous within each prognostic group. Among the numerous other published prognostic factors, only IGHV mutational status and the presence of TP53 mutations or deletions are widely accepted as predictive of CLL progression and resistance to therapy, respectively. 5-7 Unmutated IGHV (UM-CLL) genes are strongly associated with poor overall survival, while Binet stage A patients with mutated IGHV (M-CLL) genes have a very good prognosis, especially when serum protein electrophoresis is normal. 5,6,8 Other prognostic cytogenetic markers include isolated del(13q), which is associated with a good prognosis; del(11q), which is thought to correspond to ATM inactivation; and trisomy 12, which defines a CLL group with an intermediate prognosis. 9,10 Description of the mutational landscape by whole exome or whole genome sequencing identified new mutations in CLL, such as those of SF3B1, NOTCH1, and BIRC3, which are associated with a poor prognosis, as well as CLL driver mutations such as those of ATM or MYD88. 11-14 Puente et al 15 and Burns et al 16 showed that the number of mutations in driver genes is higher in UM-CLL than in M-CLL patients. Puente et al 15 also reported that an increased number of driver mutations is associated with shorter treatment-free survival (TFS) in CLL patients. The prognostic value of the number of accumulated mutations has also been evaluated from large panels of resequenced genes. 17,18
With the aim of obtaining the maximum information while sequencing the minimum number of genes, we asked whether the prognostic information carried by the number of accumulated mutations at diagnosis could be estimated from a reduced panel of genes. We first evaluated the number of accumulated mutations, or tumor mutational burden (TMB), from two resequencing panels of 70 and 65 genes in two completely independent series of 80 and 70 CLL patients, respectively. The C-Harrel concordance index was used to search for the most informative genes and to define a reduced panel of genes to predict TFS. This TFS predictor was compared to Binet stage, IGHV mutational status, and cytogenetic abnormalities.

| Patients
We analyzed 150 samples from patients with typical CLL, diagnosed between 1980 and 2018, from two University Hospital Centers (series 1, n = 80 and series 2, n = 70). The Matutes score was 4 or 5 by flow cytometry in all cases. Inclusion criteria were based on the availability of biological samples and cytogenetic results. A flowchart of patient inclusion is presented in Figure S1. TMB, the eight gene estimator (see Section 3), Binet stage, cytogenetic abnormalities, and IGHV mutational status were analyzed.

| DNA extraction
Genomic DNA was extracted from peripheral blood mononuclear cells using the QIAamp DNA Blood Mini Kit (Qiagen) according to the manufacturer's instructions.
Sequences of primers are listed in Supplementary Information.
Cases from series 1 were sequenced using a panel of targeted regions known to be involved in lymphoma (see Table S1). This panel spanned 221.6 kb and was designed on the AmpliSeq designer platform (www.ampliseq.com). Libraries were constructed using the Ion AmpliSeq Library kit 2.0 (Thermo Fisher Scientific) according to the manufacturer's instructions and sequenced on a Proton sequencer (IonTorrent, Thermo Fisher). Cases from series 2 were sequenced using a custom panel of 122.2 kb targeting 65 genes known to be involved in hematological neoplasms (see Table S2). Libraries were generated in duplicate for each patient, using the Advanta NGS library prep kit on the 48.48 Access Array system (Fluidigm), and sequenced on a NextSeq550 (Illumina).

| Variant annotation
Because the germline counterpart was not available, variants were filtered to retain exonic and pathogenic acquired mutations, following a previously described methodology 19 implemented in an in-house pipeline (Supplementary material and methods).
Briefly, we first restricted the analysis to variants with a sequencing depth ≥100× and supported by ≥5 mutated reads. Minimum variant allele frequency (VAF) was set to 2%. Highly recurrent hot-
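The read-level thresholds stated above (depth ≥100×, ≥5 mutated reads, VAF ≥2%) can be illustrated with a minimal Python sketch. The `Variant` record and its field names are hypothetical and are not taken from the in-house pipeline; the later filtering steps described in the Supplementary material are not reproduced here.

```python
from dataclasses import dataclass

# Thresholds quoted in the text above.
MIN_DEPTH = 100      # total reads covering the position
MIN_ALT_READS = 5    # reads supporting the variant
MIN_VAF = 0.02       # minimum variant allele frequency (2%)


@dataclass
class Variant:
    # Hypothetical minimal record; the field names are illustrative only.
    gene: str
    depth: int
    alt_reads: int

    @property
    def vaf(self) -> float:
        # Fraction of reads carrying the variant; 0.0 if the position is uncovered.
        return self.alt_reads / self.depth if self.depth else 0.0


def passes_quality_filter(v: Variant) -> bool:
    """Apply the three read-level thresholds; all must be satisfied."""
    return (v.depth >= MIN_DEPTH
            and v.alt_reads >= MIN_ALT_READS
            and v.vaf >= MIN_VAF)
```

Note that the three criteria are not redundant: at high depth, a variant supported by five reads can still fall below the 2% VAF threshold.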

| Immunoglobulin gene sequence analysis
Amplification of IGH gene rearrangements was performed using VH Leader or FR1 primers and JH or CH primers as described in ERIC guidelines. 20 VDJ sequences were analyzed with IMGT/V-QUEST (http://www.imgt.org/IMGT_vquest/analysis) or, for complex rearrangements, with IgBLAST (https://www.ncbi.nlm.nih.gov/igblast).

| Cytogenetics
Conventional cytogenetics and fluorescence in situ hybridization were performed after IL2 plus DSP30 stimulation, according to the recommendations of the French Group for Hematological Cytogenetics. 21,22 Quantitative multiplex PCR of short fluorescent fragments (QMPSF) was performed as described elsewhere. 23

| Statistics
The chi-square test was used to evaluate the difference between categorical covariates for TMB or the eight gene estimator subgroups when sufficient patient numbers were achieved. For small sample sizes, the chi-square test was replaced by Fisher's exact test. The Mann-Whitney test was used to compare lymphocyte blood counts be-
Here, xi and xj take the value 0 or 1 according to the criterion studied (mutational status of a given gene, mutational complexity, Binet stage A vs B…). The C-Harrel concordance index can also be interpreted as a summary measure of the area(s) under the time-dependent ROC curve(s). 26 With each variable being coded as 1 (present) or 0 (absent), the C-Harrel concordance index was calculated with the "rcorr.cens" command of the Hmisc package (https://github.com/harrelfe/Hmisc).
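To make the pairwise logic of a concordance index concrete, the following Python sketch computes a Harrell-type C-index for a 0/1 predictor and right-censored follow-up times. It is an illustration only and is not the Hmisc implementation used in the study: the function name is hypothetical, pairs with tied times are simply skipped, and ties in the predictor are assumed to count for one half. The convention assumed here is that values below 0.5 indicate that the presence of the marker goes with shorter event-free time.

```python
from itertools import combinations


def concordance_index(times, events, x):
    """Harrell-type C-index for a predictor x and right-censored times.

    times  : follow-up time per patient
    events : 1 if the event (start of treatment) was observed, 0 if censored
    x      : predictor, here coded 1 (present) or 0 (absent)
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        short, long_ = (i, j) if times[i] < times[j] else (j, i)
        if not events[short]:
            continue  # ordering is unknown when the shorter time is censored
        usable += 1
        if x[long_] > x[short]:
            concordant += 1.0  # larger x goes with the longer time
        elif x[long_] == x[short]:
            concordant += 0.5  # ties in x count for one half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

With a binary predictor, many pairs are tied on x and contribute one half each, which pulls the index toward 0.5; this is one reason why the distance from 0.5, rather than the raw value, is the quantity of interest.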

| Population characteristics and distribution of accumulated mutations
The impact of TMB was studied in a series of 150 CLL patients from two hospital centers (n = 80 and 70 cases, respectively). Clinical, biological, and cytogenetic characteristics are shown in Table S3. From the first

TABLE 1 Impact of the main genetic markers on TFS among 80 Binet stage A and B patients with a TFS over 6 months: number of cases, C-Harrel concordance index (C-index), and logrank test P-value are given for each parameter

| Relationship between TMB and TFS in CLL patients
We then studied the impact of TMB on TFS for the 110/116 (95%) Binet stage A and B untreated patients of the entire series. As shown in Figure S6, and consistent with previous reports, 15,17 TFS decreased with the accumulation of mutations, with a similarly poor prognosis for patients with two or more mutations. Patients were thus separated into two categories: those with a TMB below 2 and those with a TMB equal to or greater than 2. TFS was significantly decreased among patients with a TMB equal to or greater than 2 (Figure S7, mean TFS = 1.5 and 4.3 years for high and low TMB, respectively, logrank test P = .0001). This was also true when both series were analyzed separately (Figure S8). Thus, we confirm that TMB is an informative prognostic parameter in CLL.
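The dichotomization at a TMB of 2 and the two-group logrank comparison described above can be sketched in Python as follows. This is a minimal illustration of the standard logrank statistic, not the software used in the study; the function names `high_tmb` and `logrank_test` are hypothetical, and the P-value uses the chi-square distribution with one degree of freedom.

```python
import math


def high_tmb(n_mutations, threshold=2):
    """Return 1 for TMB >= threshold (high), 0 otherwise (low)."""
    return 1 if n_mutations >= threshold else 0


def logrank_test(times, events, group):
    """Two-group logrank test; returns (chi2, p_value).

    times  : treatment-free follow-up per patient
    events : 1 if treatment was started, 0 if censored
    group  : 1 for high TMB, 0 for low TMB
    """
    obs_minus_exp = 0.0
    variance = 0.0
    # Loop over the distinct times at which at least one event occurred.
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [g for ti, g in zip(times, group) if ti >= t]
        n = len(at_risk)          # patients still at risk at time t
        n1 = sum(at_risk)         # of which in the high-TMB group
        d = sum(1 for ti, e in zip(times, events) if e and ti == t)
        d1 = sum(1 for ti, e, g in zip(times, events, group)
                 if e and ti == t and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:  # hypergeometric variance is undefined for a single patient
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("groups cannot be compared (zero variance)")
    chi2 = obs_minus_exp ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square upper tail, 1 df
    return chi2, p_value
```

For example, `logrank_test(times, events, [high_tmb(m) for m in mutation_counts])` compares the high and low TMB groups, where `mutation_counts` is a hypothetical list of per-patient mutation counts.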

| Definition of the eight gene estimator
We next used the C-Harrel concordance index to identify the most informative genes. 27 The farther the C-Harrel concordance index is from 0.5, the higher the predictive survival informativeness of a given cri- (Table S5). With a C-Harrel concordance index of 0.408, the prognostic informativeness of these eight genes was superior to that of any individual gene. Individually, ATM, NOTCH1, and SF3B1 were the most informative. The C-Harrel concordance index of these three genes together was 0.412, close to that of the eight genes. By contrast, the C-Harrel concordance index of TP53 was weak, in accordance with the fact that TP53 mutations mainly predict resistance to fludarabine.
As shown in Figure S9, any mutation in the genes of this panel strongly influenced the survival rate of Binet stage A or B patients, with no additional effect of the number of accumulated mutations. By contrast, mutations in genes other than the eight selected had no impact on TFS (Figure S10). This panel of eight genes was then referred to as the eight gene estimator.
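Because the estimator depends only on the presence of at least one mutation in the panel, its scoring rule is simple to express. The Python sketch below is an illustration under that reading of the text; the function name is hypothetical, and "KRAS" in the usage note is only a placeholder for a gene outside the panel.

```python
# The eight genes named in the title and abstract.
EIGHT_GENE_PANEL = frozenset(
    {"ATM", "SF3B1", "NOTCH1", "BIRC3", "XPO1", "MYD88", "TNFAIP3", "TP53"}
)


def eight_gene_estimator(mutated_genes):
    """Return 1 if at least one gene of the panel is mutated, else 0.

    mutated_genes : iterable of gene symbols carrying a retained
                    (filtered, pathogenic) mutation in one patient
    """
    return int(any(gene in EIGHT_GENE_PANEL for gene in mutated_genes))
```

For instance, `eight_gene_estimator(["ATM", "KRAS"])` gives 1, whereas `eight_gene_estimator(["KRAS"])` and `eight_gene_estimator([])` give 0.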

| Relationship between TMB and the eight gene estimator with the CLL short-term progressiveness
With a threshold of at least one mutation, we compared the C-Harrel concordance index of the eight gene estimator to that of the main cytogenetic abnormalities and IGHV mutational status. Among poor prognosis criteria, Binet stage B and unmutated IGHV genes had the most informative C-Harrel concordance index, followed by the eight gene estimator (Table S5). However, the eight gene estimator had the best C-Harrel concordance index for patients with a TFS of at least 6 months, while the Binet stage was in eighth position (Table 1). In the analysis of the whole series, the weight of Binet stage on TFS is artefactually overestimated, since it is the main decision criterion to treat patients. Therefore, this result underlines the importance of genetics in predicting the prognosis of patients with an apparently non-progressive disease.

| Relationship between TMB and the eight gene estimator with the CLL mid-term progressiveness
Although it is nearly impossible to estimate, the duration of disease progression (the silent history of the cancer) at the time of diagnosis is very heterogeneous among patients seen in specialized hospital centers, and this should influence the TFS. Therefore, we separated the entire series of untreated patients into two categories: those with a TFS <2 years and the others (short and long TFS, respectively). As shown in Figure 1A, TMB estimated from the whole panel of genes was significantly higher in both previously treated patients and untreated patients with a short TFS when compared to those with a long TFS (chi-square test, P = 4 × 10⁻⁶ and .004, respectively). TMB was greater in previously treated patients than in untreated patients with a short TFS. As with the whole gene panel, TMB from the eight gene estimator was lower in patients with a long TFS and higher in previously treated patients than in those with a short TFS (Figure 1B).
These results indicate that, together with Binet stage and IGHV mutational status, TMB estimated from either the whole panel or the eight gene estimator may identify patients at diagnosis who will need to be rapidly treated despite a clinically non-progressive disease.

| Relationship between TMB and the eight gene estimator with cytogenetics and IGHV mutational status
We also examined the relationships between TMB and cytogenetics combined with IGHV mutational status. TMB evaluated from the whole panel or from the eight gene estimator was significantly higher in high risk (del(11q), del(17p) or complex karyotype) than in low risk (isolated del(13q) or normal karyotype) cytogenetic categories and was strongly associated with cytogenetic complexity (Figure 2). TMB from the entire panel was also higher

| The eight gene estimator harbors all the TFS informativeness of TMB for patients with isolated del(13q) or Binet stage A CLL
To further evaluate the value of the eight gene estimator, the impact of TMB on patients with Binet stage A CLL or with a normal karyotype or isolated del(13q) was studied (Figure 4). TFS was significantly shorter for patients who were mutated for the eight gene estimator (Figure 4A, Binet stage A, logrank test, P = .0097; Figure 4B, isolated del(13q) or normal karyotype, logrank test, P < .0001). By contrast, mutations not included in the eight gene estimator had no effect on TFS. This indicates that TMB given by the eight gene estimator could help to predict the evolution of patients in Binet stage A or with good prognosis cytogenetics.
Apart from TNFAIP3, the genes of the eight gene estimator are all well known to be recurrently mutated in CLL. As cited above, the mutational status of TP53 is required to predict the response to fludarabine. 28
High-throughput sequencing and analysis of the mutational status of this limited set of genes is easily achievable in hospital centers where molecular diagnosis and follow-up of CLL patients are routinely done.

| Univariate and multivariate survival analysis of TMB and the eight gene estimator among Binet stage A patients
Whether the eight gene estimator could predict the response to chemotherapy and/or the time to next treatment is an open question that needs future prospective studies.

CONFLICT OF INTEREST
The authors have no conflict of interest to declare.

AUTHOR'S CONTRIBUTIONS
JC and DR performed high-throughput sequencing and data analysis for center 1. CP and TF performed high-throughput sequencing and data analysis for center 2. LD performed part of the high-throughput sequencing. NG analyzed IGHV mutational status. JF supervised the statistical analysis and performed the C-Harrel analysis.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.