The utility of DNA methylation signatures in directing genome sequencing workflow: Kabuki syndrome and CDK13‐related disorder

Abstract Kabuki syndrome (KS) is a neurodevelopmental disorder characterized by hypotonia, intellectual disability, skeletal anomalies, and postnatal growth restriction. The characteristic facial appearance is not pathognomonic for KS as several other conditions demonstrate overlapping features. For 20‐30% of children with a clinical diagnosis of KS, no causal variant is identified by conventional genetic testing of the two associated genes, KMT2D and KDM6A. Here, we describe two cases of suspected KS that met clinical diagnostic criteria and had a high gestalt match on the artificial intelligence platform Face2Gene. Although initial KS testing was negative, genome‐wide DNA methylation (DNAm) was instrumental in guiding genome sequencing workflow to establish definitive molecular diagnoses. In one case, a positive DNAm signature for KMT2D led to the identification of a cryptic variant in KDM6A by genome sequencing; for the other case, a DNAm signature different from KS led to the detection of another diagnosis in the KS differential, CDK13‐related disorder. This approach illustrates the clinical utility of DNAm signatures in the diagnostic workflow for the genome analyst or clinical geneticist—especially for disorders with overlapping clinical phenotypes.

as a recognizable syndrome with the cardinal features of facial dysmorphology, skeletal anomalies, dermatoglyphic abnormalities, intellectual disability, and postnatal growth retardation. In 2018, an international consensus was reached on diagnostic criteria (Adam et al., 2019). The authors proposed that a clinical diagnosis be based on the presence of infantile hypotonia, developmental delay, and typical dysmorphic features (arched and broad eyebrows with lateral notching/sparseness, short columella with depressed nasal tip, large prominent or cupped ears, and persistent fingertip pads). Pathogenic variants in the genes KMT2D and KDM6A are known to be causal in Kabuki syndrome KS1 [MIM: 147920] and KS2 [MIM: 300867] cases, respectively (Banka et al., 2012; Lederer et al., 2012; Miyake et al., 2013; Ng et al., 2010).
Among patients with a clinical diagnosis of KS, 75% of cases are attributable to pathogenic variants in KMT2D and 3%-5% to pathogenic variants in KDM6A (Adam et al., 1993; Bogershausen et al., 2016). These two genes have opposite functions: KMT2D encodes a histone methyltransferase, whereas KDM6A encodes a lysine demethylase. The proteins encoded by these two genes form a functional complex that underpins the pathogenic mechanisms of both types of KS through the developmental epigenetic dysregulation of multiple genes. For 20%-30% of children with a clinical diagnosis of KS, the genetic cause remains unknown (Adam et al., 1993; Bogershausen & Wollnik, 2013; Bogershausen et al., 2016). Reduced sensitivity of genetic testing could result from: locus heterogeneity due to unidentified novel genes; variants beyond the detection limits of current gene panel testing (deep intronic, structural, promoter, or regulatory variants); or the existence of syndromes with overlapping phenotypic features.
Functional assays, such as genome-wide DNA methylation (DNAm) analysis, can help clarify some of these diagnostic challenges. We and others have previously described a gene-specific KMT2D DNAm "signature," defined as specific sites of differential DNAm in peripheral blood of individuals with pathogenic variants in KMT2D (Butcher et al., 2017). These signatures have been used to build models to classify variants of uncertain significance (VUS) in KMT2D as pathogenic (overlapping the KS DNAm profile) or benign (overlapping the control DNAm profile) (Aref-Eshghi et al., 2017). We also found that individuals with KS due to a KDM6A pathogenic variant had a DNAm signature overlapping that of KMT2D (Butcher et al., 2017). This is not surprising, as the proteins KDM6A and KMT2D form a functional complex. Overlapping DNAm signatures have previously been identified for genes encoding proteins that form complexes, for example, BAF (Aref-Eshghi et al., 2018) and PRC2. Therefore, the KMT2D signature can be used to identify patients with KS secondary to pathogenic variants in the KDM6A gene (Sadikovic et al., 2021).
It is important to determine the exact genetic etiology in patients who meet the clinical diagnostic criteria for KS because it informs recurrence risk, prenatal testing options, and potential gene-targeted therapies (Zhang et al., 2021). Here, we describe two patients who were initially given a working clinical diagnosis of KS despite negative genetic testing (sequencing and deletion/duplication analysis of

| DNA methylation array processing
Genome-wide DNAm profiling was completed for typically developing controls (n = 45), the two patients, and individuals with pathogenic KMT2D (n = 9) and KDM6A (n = 1) variants at The Centre for Applied Genomics, SickKids Research Institute. Whole-blood genomic DNA from each subject was sodium bisulfite converted using the EpiTect PLUS Bisulfite Kit (QIAGEN), according to the manufacturer's protocol. Modified genomic DNA was then processed and analyzed on the Infinium HumanMethylationEPIC BeadChip (Illumina 850K) according to the manufacturer's protocol. The raw IDAT files were converted into beta values, which represent DNAm levels as a proportion (between 0 and 1), using the minfi Bioconductor package in R as previously reported. All samples passed standard quality control metrics in minfi.
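For readers unfamiliar with beta values, the following minimal sketch shows how a beta value is conventionally derived from the methylated and unmethylated signal intensities of a single probe. It is an illustration only and not the minfi implementation used in this study. The offset of 100 is a commonly used stabilizing constant and is an assumption here (the offset used in the study is not stated), and the probe names and intensities are hypothetical.

```python
import math


def beta_value(meth, unmeth, offset=100.0):
    """Return the beta value M / (M + U + offset) for one CpG probe.

    meth and unmeth are the methylated and unmethylated signal intensities.
    The offset (an assumed default, not taken from the study) stabilizes
    the ratio when both intensities are low.
    """
    if meth < 0 or unmeth < 0:
        raise ValueError("signal intensities must be non-negative")
    denominator = meth + unmeth + offset
    if denominator == 0:
        # No signal and no offset: the methylation level is undefined.
        return float("nan")
    return meth / denominator


# Hypothetical probe intensities as (methylated, unmethylated) pairs.
example_probes = {
    "probe_A": (9000.0, 900.0),   # mostly methylated
    "probe_B": (450.0, 9450.0),   # mostly unmethylated
}

for name, (m, u) in example_probes.items():
    print(name, round(beta_value(m, u), 3))
```

Because the result is a ratio of intensities, it always falls between 0 (unmethylated) and 1 (fully methylated), which is why beta values from different samples can be compared directly.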

| Generation of machine learning scores for variant classification
Using our established DNAm signature for KMT2D and the support vector machine (SVM) model as previously described (Butcher et al., 2017; Turinsky et al., 2020), beta values were imported into EpigenCentral (https://epigen.ccm.sickkids.ca) to compute SVM classification scores for all samples tested using the KMT2D SVM model. The model was set to the "probability" mode to generate SVM scores ranging between 0 and 1 (or 0% and 100%), thus classifying samples as "KS" (high scores) or "not-KS" (low scores).

This SVM model was built as a tool for the classification of variants in KMT2D and KDM6A as previously described (Butcher et al., 2017).
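The idea of an SVM score between 0 and 1 can be illustrated with a short sketch. One common way for an SVM to report probabilities is to pass its raw decision value through a fitted sigmoid (Platt scaling). Whether the KMT2D model uses exactly this approach is an assumption, and the sigmoid coefficients `a` and `b`, the 0.5 threshold, and the decision values below are all hypothetical.

```python
import math


def platt_probability(decision_value, a=-1.0, b=0.0):
    """Map an SVM decision value f to a score 1 / (1 + exp(a * f + b)).

    a and b are placeholder sigmoid coefficients (assumed, not from the
    study); in practice they are fitted on training data.
    """
    z = a * decision_value + b
    # Evaluate the sigmoid in a numerically stable way for large |z|.
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def classify(score, threshold=0.5):
    """Label a sample "KS" or "not-KS" using an assumed score threshold."""
    return "KS" if score >= threshold else "not-KS"


# Hypothetical decision values: positive lies on the KS side of the margin.
for f in (2.0, 0.0, -2.0):
    score = platt_probability(f)
    print(f, round(score, 3), classify(score))
```

A sample far from the decision boundary receives a score near 0 or 1, whereas a sample close to the boundary receives an intermediate score that should be interpreted with more caution.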

| Genome sequencing
Genome sequencing was performed at the Centre for Applied Genomics with high-quality DNA extracted from whole blood using established methods (Costain et al., 2020;Lionel et al., 2018). Patient 1 was included as individual "CMC 16" in a previous cohort study (Costain et al., 2020). Sequence data were analyzed to identify putative disease-associated variants as previously described (Costain et al., 2020;Lionel et al., 2018). Variants were confirmed by an orthogonal method in a CLIA/CAP approved clinical laboratory and returned to the families accompanied by genetic counseling. We performed trio genome sequencing, which identified an 8 kb duplication encompassing exon 3 in KDM6A predicted to result in a frameshift due to a 109 bp insertion as shown in Figure 3 (NM_001291421.2:g.44818001_44826000dup) (Costain et al., 2020).
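The arithmetic behind the variant description can be checked with a brief sketch: the span of the duplicated interval follows from its coordinates, and an insertion shifts the reading frame whenever its length is not a multiple of three. The helper names are hypothetical, and the parser handles only the simple coordinate-range form quoted above; it is not a general HGVS parser.

```python
import re


def dup_span(description):
    """Return the length in bp of a 'g.START_ENDdup' interval (inclusive)."""
    match = re.search(r"g\.(\d+)_(\d+)dup$", description)
    if match is None:
        raise ValueError("expected a description ending in g.START_ENDdup")
    start, end = int(match.group(1)), int(match.group(2))
    if end < start:
        raise ValueError("end coordinate precedes start coordinate")
    return end - start + 1


def shifts_frame(inserted_bp):
    """An insertion disrupts the reading frame unless it is a multiple of 3."""
    return inserted_bp % 3 != 0


variant = "NM_001291421.2:g.44818001_44826000dup"
print(dup_span(variant))   # length of the duplicated interval in bp
print(shifts_frame(109))   # whether a 109 bp insertion shifts the frame
```

The reported coordinates span 8,000 bp, consistent with the 8 kb duplication, and 109 is not divisible by three, consistent with the predicted frameshift.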

| Consent
This small copy number variation was not detected by chromosomal microarray analysis, targeted gene testing (including sequencing and MLPA), or clinical exome sequencing. Calling single-exon level copy number variants (CNVs) by exome sequencing remains technically challenging, especially for duplications. In this case, exome sequencing was performed in a large, experienced CLIA/CAP-approved laboratory. Data published by this laboratory indicate incomplete sensitivity for detection of clinically significant CNVs by exome sequencing (Dharmadhikari et al., 2019;Gambin et al., 2017). We would have expected the duplication to be detected by MLPA. After informing the original testing lab of the duplication detected on genome sequencing, they reviewed their results and told us that the exon 3 probe was "top normal" (just less than their cut-off point for calling a duplication). The testing lab subsequently re-ran the MLPA using a new kit which did detect the duplication.
The variant was determined to be maternally inherited and therefore had important implications for future pregnancy planning (Figure 3). The facial recognition platform Face2Gene was used retrospectively to provide a list of syndromic matches based on his picture at 10 months of age (Gurovich et al., 2019; Marwaha et al., 2021). A gestalt score over 0.5 is highly indicative of a genuine phenotype match (Marwaha et al., 2021). Face2Gene analysis placed KS as the top match with a gestalt score of 0.61.

| Patient 2
A 9-year-old girl of Indian descent, born to non-consanguineous parents, was referred for a genetics assessment due to a history of severe developmental delay (non-verbal), microcephaly, hypotonia, severe feeding issues in infancy, postnatal growth retardation, and esotropia.
FIGURE 1 Facial gestalt of patient 1 (a) and patient 2 (b). Facial raw images are shown aligned to the composite image produced by the Face2Gene software for Kabuki syndrome (KS). Similarity scale reflective of the gestalt score for a match to KS is also shown.
CDK13-related disorder (MIM #617360), also known as congenital heart defects, dysmorphic facial features, and intellectual developmental disorder, has previously been noted to present with a KS-like phenotype (Bostwick, 1993).
for genome sequencing as this was available to us through a research study and provided the more comprehensive testing option. Genome sequencing has the added benefit that analytical detection of CNVs is at least equivalent to chromosomal microarray analysis, which is not the case for exome sequencing.
the former has a separate pathogenic mechanism. This suggests that CDK13-related disorder is likely to be a distinct clinical syndrome as opposed to "Kabuki syndrome 3." As we learn more about the basic biology of conditions with overlapping phenotypes, the inclusion of both genotype and phenotype data into nomenclature, as recently proposed by Biesecker et al. (2021), will allow for more accurate definition of complex syndromic disorders.
We suggest that in cases of suspected KS with negative targeted panel testing, utilization of genome-wide DNAm and consideration of facial phenotyping tools could help provide direction for further genetic testing to improve the efficiency of the diagnostic workflow.
DNAm can be used to streamline further genetic testing, in some cases suggesting more detailed analysis of a specific gene and in other cases expanding to genome-wide sequencing, which is more costly and not universally available in the clinical setting. The utility of DNAm profiling in classifying VUS arising from genomic sequencing has already been demonstrated for a large number of rare neurodevelopmental disorders caused by pathogenic variants in genes that affect epigenetic regulation (Cytrynbaum et al., 2019; Rots et al., 2021; Sadikovic et al., 2021).

CONFLICT OF INTEREST
The authors have no conflicts of interest to declare.

AUTHOR CONTRIBUTIONS
Ashish Marwaha is the primary author of the manuscript, gathered clinical report data, and helped direct data analysis. Gregory Costain performed the whole genome sequencing analysis and reviewed the manuscript. Cheryl Cytrynbaum and Roberto Mendoza-Londono contributed to the clinical data reported and reviewed the manuscript. Lauren Chad reviewed the manuscript. Zain Awamleh, Eric Chater-Diehl, and Sanaa Choufani performed methylation sequencing and analysis and reviewed the manuscript. Rosanna Weksberg secured ethics approval and funding for the data collection and analysis; she also co-wrote the manuscript, directed data analysis/presentation, and is the corresponding author.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.