Investigating genotype-to-phenotype correlation in CHARGE syndrome by deep phenotyping and multiparametric clustering

CHARGE syndrome, due to CHD7 pathogenic variations, is an autosomal dominant disorder characterized by a large spectrum of severity. Despite the great number of variations reported, no clear genotype-to-phenotype correlation has been established. Unsupervised machine learning clustering was undertaken using a retrospective cohort of 42 patients, after deep radiologic and clinical phenotyping, to establish genotype-phenotype correlation for CHD7-related CHARGE syndrome. It resulted in three clusters showing phenotypes of different severities. While no clear genotype-phenotype correlation appeared within the first two clusters, a single patient was an outlier in the cohort data (cluster 3), with the most atypical phenotype and the most distal frameshift variant in the gene. We added two other patients with similar distal pathogenic variants and observed a tendency toward mild and/or atypical phenotypes. We hypothesized that this finding could be related to escape from nonsense-mediated RNA decay, but found no evidence of such decay in vivo for any of the CHD7 pathogenic variations tested. This indicates that this milder phenotype may instead result from the production of a protein retaining all functional domains.

Almost 90% of CHD7 pathogenic variants identified in patients with CHARGE syndrome are nonsense, frameshift, or splice variants predicted to lead either to synthesis of a truncated protein or to absence of protein synthesis secondary to nonsense-mediated mRNA decay (NMD). 4,5 Haploinsufficiency following NMD is commonly considered the most likely major disease-causing mechanism in CHARGE syndrome. 6 Missense pathogenic variations have been suggested to be more frequent in CHD7-related disorders with a milder clinical expression. 7 However, no genotype-to-phenotype correlation has been established in CHARGE syndrome despite about 1000 CHD7 variations published to date. Such correlations would be valuable for a better understanding of the disease and its prognosis, for phenotype prediction and postnatal outcomes, and for genetic counseling when CHARGE syndrome is diagnosed using prenatal imaging.
Unsupervised machine learning techniques are a promising means of discovering hidden patterns in unstructured multiparametric data, including genotype-to-phenotype correlations. 8 This study aimed to perform clinical and radiological deep phenotyping in a cohort of patients previously diagnosed with CHD7-related CHARGE syndrome and to investigate the hypothesis of a genotype-to-phenotype correlation using unsupervised machine learning clustering.

| Patients
We conducted an international dual-center retrospective cohort study at Hôpital Necker (AP-HP, Paris, France) and McGill University Health Center (MUHC, Montréal, Canada). Written consent was obtained for genetic testing for patients and fetuses. The research project was approved by the respective research ethics boards.
We included all patients (n = 42, born between 1992 and 2018) with a diagnosis of CHARGE syndrome according to Hale criteria,9,10 confirmed by the presence of a de novo CHD7 pathogenic or likely pathogenic variation (ACMG class 4 or 5), with available CT-scan and/or MRI of the head and temporal bones.

| Clinical, radiological, and genetic assessment
We collected demographic data, medical history, and physical examination findings (mediastinal malformations; hypothalamo-hypophyseal dysfunction; genital hypoplasia; intellectual disability in patients at least 3 years of age, classified as absent to mild, moderate, or severe; and chorioretinal colobomas). The retrospective review of the available CT/MRI was performed independently by 2 specialized radiologists at each center (RL and KB at Hôpital Necker; CSM and JD at MUHC) using a reading map based on existing diagnostic criteria as well as on a systematic evaluation of anatomic landmarks (Table S1). Genomic analyses were performed for diagnostic purposes over the years and followed the technological evolution of sequencing, from standard CHD7 Sanger sequencing to the use of a dedicated Agilent® capture panel and next-generation sequencing on an Illumina® Next-seq or Hi-seq sequencer. Details of the pathogenic variations identified for each patient are provided in Table S2.

| RESULTS
Phenotypes are summarized in Table 1. Apart from the well-described features of CHARGE syndrome, we also noted a unilateral hypoplastic internal carotid artery in 21% of patients and missing or hypoplastic parotid glands in 55%.
Three clusters were identified based on the clinical and radiological phenotypes (Figure 1). The 26 patients in cluster 1 had a typical and severe phenotype, including 31% death (median age [Q1-Q3] = 36 days [13.5-175]; Table 1). The 15 patients in cluster 2 showed less severe involvement, with only 1 case (7%) of death (21 days), a tendency for a less severe intellectual disability, absence of mediastinal and cardiovascular abnormalities and slightly fewer malformations (median [...] Table S2). Non-truncating variants were equally represented (15.3% and 26.6% in clusters 1 and 2, respectively, p = 0.43), although their small number precluded further interpretation, and the variants predicted to truncate the CHD7 protein were distributed along the gene in both clusters.

TABLE 1 Demographic characteristics in each cluster

FIGURE 1 K-means clustering (n = 3) of clinical profiles of CHD7-mutated patients after dimensionality reduction using principal component analysis.

We reproduced the same clustering analysis on the extended cohort including these two additional patients, which resulted in one of these patients being classified in cluster 2 and the other in cluster 3, while the clustering of the initial patients remained unchanged. This suggested that the most distal variants tend to be associated with a mild phenotype.
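The comparison of non-truncating variant frequencies between clusters 1 and 2 can be illustrated with a short sketch. It rests on two assumptions that are not stated in the text: the counts (4 of 26 patients in cluster 1 and 4 of 15 in cluster 2) are back-calculated from the reported percentages, and the test shown is a two-sided Fisher exact test, which may or may not be the test that was actually used. Under these assumptions the result is close to the reported p = 0.43.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def weight(k):
        # Unnormalized hypergeometric weight of a table whose top-left cell is k.
        return comb(row1, k) * comb(row2, col1 - k)

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    observed = weight(a)
    # Sum the weights of every table that is at most as likely as the observed one.
    # Integer weights avoid floating-point ties in the comparison.
    extreme = sum(weight(k) for k in range(k_min, k_max + 1) if weight(k) <= observed)
    return extreme / comb(total, col1)


# ASSUMED counts, back-calculated from 15.3% of 26 and 26.6% of 15 patients:
# rows = cluster 1, cluster 2; columns = non-truncating, truncating.
p_value = fisher_exact_two_sided(4, 22, 4, 11)
print(round(p_value, 2))
```

With groups this small, a difference between roughly 15% and 27% is well within what chance alone would produce, which is why the text refrains from further interpretation.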
To date, the assumption of a pathogenic mechanism by haploinsufficiency due to CHD7 mutant mRNA degradation by NMD has not been experimentally proven. This gene is widely expressed during embryogenesis. 12 To support the discussion about

| DISCUSSION
Machine learning-based clustering is an innovative approach that can reveal patterns of phenotype expression across data and thereby help formulate genetic hypotheses. Indeed, it allows high-throughput information extraction from complex multilevel data. In this study, the clinical and radiological findings highlighted phenotypic heterogeneity in patients with CHARGE syndrome, from mild to life-threatening expression, as already largely reported. 4 We found no genotype-to-phenotype correlation for most patients carrying a CHD7 pathogenic variation, that is, within the two largest clusters. However, we observed a one-patient cluster with the most distal variant, a frameshift located at the distal end of the penultimate exon. This region is proposed to escape degradation by NMD, thereby allowing the synthesis of an altered protein.

[Figure panels: Forward and Reverse reads, reference (Ref.) versus mutant (Mut.) sequence.]
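The positional reasoning behind the proposed escape from NMD can be sketched with the commonly cited rule of thumb that a premature termination codon located in the last exon, or within roughly 50 to 55 nucleotides upstream of the last exon-exon junction, tends to escape NMD. The function below is only a minimal illustration of that rule: the exon lengths and stop codon positions are hypothetical and are not CHD7 coordinates, and the 50-nucleotide threshold is an assumed parameter.

```python
def predicted_to_escape_nmd(exon_lengths, ptc_position, threshold=50):
    """Apply the 50-nucleotide rule of thumb to a premature stop codon.

    exon_lengths: exon lengths in transcript order (hypothetical values).
    ptc_position: 0-based transcript coordinate of the premature stop codon.
    threshold: assumed distance (nt) upstream of the last exon-exon junction
               within which a stop codon is predicted to escape NMD.
    """
    if not 0 <= ptc_position < sum(exon_lengths):
        raise ValueError("ptc_position lies outside the transcript")
    if len(exon_lengths) < 2:
        return True  # No exon-exon junction, so the rule predicts no NMD.
    last_junction = sum(exon_lengths[:-1])  # First coordinate of the last exon.
    # Escape if the stop lies in the last exon or close enough upstream of it.
    return ptc_position >= last_junction - threshold


# Hypothetical four-exon transcript; the last junction is at coordinate 650.
toy_exons = [300, 200, 150, 400]
early_stop = predicted_to_escape_nmd(toy_exons, 120)    # stop in exon 1
distal_stop = predicted_to_escape_nmd(toy_exons, 620)   # distal end of penultimate exon
last_exon_stop = predicted_to_escape_nmd(toy_exons, 700)
print(early_stop, distal_stop, last_exon_stop)
```

Such a rule gives a prediction only; whether a given transcript is actually degraded has to be tested experimentally.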
Haploinsufficiency has been proposed as the most likely pathogenic mechanism underlying CHARGE syndrome, which would result from NMD in the case of truncating variations. NMD is probably a tissue- or cell-dependent mechanism. 14 For accessibility reasons, and to better reflect the pathophysiological mechanism, we chose to study RNA decay in fetal tissues. We found no evidence of degradation of the pathogenic allele in any of the fetuses or tissues tested. These results led us to rule out the hypothesis of systematic degradation of truncating variants by NMD during fetal development.
Of note, the existence of a long 3′ UTR in this gene may be related to this observation. 15 Therefore, the opposite hypothesis of a mild phenotype in the one-patient cluster by preservation of all CHD7 functional domains should be considered.
In this study, the population size is relatively small, and we cannot exclude that a larger population would have allowed detection of more subtle genotype-to-phenotype correlations. Our inclusion criteria most likely led to under-inclusion of patients with very mild phenotypes (CHD7-related disorders), and the assessment of intellectual disability was limited by the retrospective nature of the study.
In addition, our machine learning approach required replacement of missing data by the calculated mean of each criterion, which could have resulted in misclassification at the individual level but still allows investigation of genotype-to-phenotype correlations at the group level.
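The mean-replacement step described above, followed by k-means clustering, can be sketched as follows. This is a minimal illustration under stated assumptions and not the pipeline used in the study: the feature values are invented, the principal component analysis step is omitted for brevity, and the deterministic farthest-point initialization is a choice made here so that the example is reproducible.

```python
def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in that column.

    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if row[j] is None else row[j] for j in range(n_cols)]
            for row in rows]


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points, k):
    """Deterministic start (an assumption): first point, then farthest points."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points,
                       key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def k_means(points, k, n_iter=100):
    """Lloyd's algorithm; returns one cluster label per point."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # Assignments are stable: converged.
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # Keep the previous centroid if a cluster becomes empty.
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# INVENTED severity scores for 7 hypothetical patients; None marks missing data.
toy_profiles = [
    [9.0, 9.0, 8.0], [8.0, None, 9.0], [9.0, 8.0, 9.0],   # severe-like profiles
    [1.0, 1.0, 2.0], [2.0, 1.0, None], [1.0, 2.0, 1.0],   # mild-like profiles
    [30.0, 30.0, 30.0],                                   # single outlying profile
]
imputed = impute_column_means(toy_profiles)
labels = k_means(imputed, k=3)
print(labels)
```

In the toy data, the imputed value for the fifth profile is pulled upward by the outlying profile, which illustrates how mean replacement can distort an individual profile even when the group structure is still recovered.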
The broad phenotypic spectrum of CHARGE syndrome and the lack of genotype-to-phenotype correlation make genetic counseling [...]

[...] Boucher-Brischoux reviewed clinical data.

CONFLICT OF INTEREST
The authors declare no conflict of interest.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.