Diagnostic yield and clinical impact of chromosomal microarray analysis in autism spectrum disorder

Abstract Background Autism spectrum disorder (ASD) is characterized by high heritability estimates and recurrence rates; its genetic underpinnings are very heterogeneous and include variable combinations of common and rare variants. Array‐comparative genomic hybridization (aCGH) offers significant sensitivity for the identification of copy number variants (CNVs), which can act as susceptibility or causal factors for ASD. Methods The aim of this study was to evaluate both diagnostic yield and clinical impact of aCGH in 329 ASD patients of Italian descent. Results Pathogenic/likely pathogenic CNVs were identified in 50/329 (15.2%) patients, whereas 89/329 (27.1%) carry variants of uncertain significance. The 10 most enriched gene sets identified by Gene Ontology Enrichment Analysis are primarily involved in neuronal function and synaptic connectivity. In 13/50 (26.0%) patients with pathogenic/likely pathogenic CNVs, the outcome of array‐CGH led to the request of 25 additional medical exams which would not have otherwise been prescribed, mainly including brain MRI, EEG, EKG, and/or cardiac ultrasound. A positive outcome was obtained in 12/25 (48.0%) of these additional tests. Conclusions This study confirms the satisfactory diagnostic yield of aCGH, underscoring its potential for better, more in‐depth care of children with autism when genetic results are analyzed also with a focus on patient management.


| INTRODUCTION
Autism spectrum disorder (ASD) is a heterogeneous collection of neurodevelopmental conditions with onset in early childhood, characterized by impairment in social interaction and communication, as well as at least two among repetitive behaviors, insistence on sameness, restricted interests, and abnormal sensory processing (American Psychiatric Association, 2013). ASD patients display impressive interindividual differences in clinical symptoms, developmental trajectories, and treatment response. Despite its high prevalence, no pharmacological treatment effective on the core symptoms of ASD has yet been found (Persico et al., 2021).
Autism spectrum disorder is considered one of the most "genetic" neuropsychiatric disorders: concordance in monozygotic twins is consistently higher than that observed in dizygotic twins (Huguet et al., 2016). Similarly, family studies show elevated recurrence rates among siblings and first-degree relatives of affected children, confirming high heritability, which has been estimated at approximately 80% in cohorts from five different countries (Bai et al., 2019). A specific genetic etiology is identifiable in up to 40% of individuals, including known genetic syndromes, mitochondrial disorders, chromosomal deletions or duplications of largely variable sizes, and disruptive mutations detected by exome and genome sequencing (Genovese & Butler, 2020; Schaefer & Mendelsohn, 2013). The majority of cases display complex gene × gene interactions involving multiple common and rare variants, the former endowed with variable penetrance (Bai et al., 2019; Genovese & Butler, 2020; Schaefer & Mendelsohn, 2013). For many patients, gene-environment interactions involving a genetic predisposition conferred by common variants are also plausible (Fernandez & Scherer, 2017). In addition, genetic variants can contribute to explaining interindividual variability in clinical phenotype, developmental trajectories, and responsiveness to behavioral or pharmacological treatment (Vorstman et al., 2014). Collectively, genetics can thus provide precious information above and beyond "what caused the disorder", ultimately promoting better care for children with ASD (Butler et al., 2022).
The advent of microarray-based comparative genomic hybridization (aCGH) technology has unveiled many submicroscopic copy number variations (CNVs) associated with ASD (Devlin & Scherer, 2012). Research studies have shown that clinically relevant CNVs are detected in 9.3-29.0% of patients with idiopathic ASD (Battaglia et al., 2013; Nicholl et al., 2014; Pellanda et al., 2015; Rosenfeld et al., 2010; Tammimies et al., 2015), a substantially higher diagnostic yield compared to conventional karyotyping. Research data from array-CGH are important, on the one hand, to define the etiology of autism, since both rare and common CNVs can contribute to cause the disorder, and on the other hand, to outline the functional gene networks involved in the underlying pathophysiology (Gaugler et al., 2014; Grove et al., 2019; Pinto et al., 2014). Accordingly, since 2010 the International Standards for Cytogenomic Arrays (ISCA) Consortium has recommended chromosomal microarray as the first-tier clinical diagnostic test for children with ASD and various developmental disorders. However, moving beyond the diagnostic yield, the potential roles of genetic testing by array-CGH in promoting better clinical management of ASD patients have not yet been directly assessed.
The aim of the present study is twofold: on the one hand, we wish to identify and characterize pathogenetically relevant CNVs in a reasonably sized cohort of Italian ASD patients; on the other hand, we aim to explore whether and to what extent array-CGH results can contribute to improve the clinical management of autistic patients.
were recruited at the Service for Neurodevelopmental Disorders at Campus Bio-Medico University Hospital in Rome (Italy) and at the Interdepartmental Program "Autism 0-90" of the "G. Martino" University Hospital (Messina, Italy) between the years 2012 and 2019. All patients fulfilled DSM-5 criteria for a clinical diagnosis of ASD (American Psychiatric Association, 2013). Developmental, clinical, and family history variables were characterized using an ad hoc questionnaire. Patients with a known genetic syndrome or a positive karyotype were excluded. Patients with major dysmorphisms and malformations were also excluded, even in the absence of a genetic diagnosis. Patients with sporadic seizures (<1 every 6 months) were included, whereas epileptic encephalopathy or severe perinatal brain damage documented by MRI were causes for exclusion. The clinical diagnosis of ASD was confirmed in all patients using both the Autism Diagnostic Observation Schedule (ADOS, ADOS-2) (Lord et al., 2012) and the Autism Diagnostic Interview-Revised (ADI-R) (Rutter et al., 2003); cognitive level was assessed using either the Wechsler Intelligence Scales for Children (WISC-III, WISC-IV) (Wechsler, 2003), Griffith Mental Developmental Scales II (Huntley, 1996), Colored Raven Matrices (Heinz Wiedl & Carlson, 1976), Leiter International Performance Scale R, or Leiter International Scale-third edition (Roid & Koch, 2017), depending on age and language development. Adaptive behaviors were assessed using the Vineland Adaptive Behavior Scales (Sparrow et al., 1984). All parents gave written informed consent for themselves and for their children. The consent form and all the methods of the study were approved by the Institutional Review Board of University "Campus Bio-Medico" of Rome, Italy (prot. n. 14/98, first approval on April 28, 1998 and subsequent amendments) and the Ethics Committee of Messina, Italy (prot. n. 22/17, approved on June 19, 2017).
All methods were carried out in accordance with relevant guidelines and regulations.

| Microarray-based CGH and data analysis
Blood was drawn into EDTA-anticoagulated tubes from the autistic proband, both parents and unaffected siblings, whenever available. Genomic DNA was extracted and array-CGH was performed as previously described (Lintas et al., 2017), using the Human Genome CGH SurePrint G3 Microarray 4 × 180 K Kit (Agilent), consisting of ∼170,000 60-mer oligonucleotide probes which span the whole genome with an average spatial resolution of ∼50 Kb. Following the manufacturer's instructions, 200 ng aliquots of genomic DNA from the test and the sex-matched reference samples were digested with the restriction enzymes AluI and RsaI. DNA aliquots were then labeled with fluorescent nucleotides (Cy3 and Cy5, respectively), and equivalent amounts of Cy3- and Cy5-labeled DNA were hybridized to the microarrays for 24 h. Slides were finally washed according to the manufacturer's instructions and scanned immediately using the DNA Microarray Scanner (Agilent). Quality control was performed using the Agilent Feature Extraction v10.7, and CNV calling was performed using the ADM-2 algorithm, as implemented in the Agilent Cytogenomic Software v.4.0.3.12, considering aberrations with at least three consecutive probes. All calls were visually inspected to remove possible false positives characterized by irregular log2 ratios. In order to ensure reliability, CNVs were defined by applying the following parameters: minimum number of probes = 3; with a log2 ratio of 0 corresponding to two alleles, mean deletion log2 ratio < −0.60 and mean duplication log2 ratio > +0.54. De novo CNVs and potentially relevant inherited CNVs with ambiguous log2 ratio profiles were validated by RT-PCR using TaqMan assays, whenever available, or selective PCR amplification and SybrGreen.
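The filtering parameters listed above can be illustrated with a minimal Python sketch. It covers only the stated thresholds (at least three probes, mean log2 ratio < −0.60 for deletions, > +0.54 for duplications); it is not the ADM-2 algorithm and does not reproduce the Agilent software. The `Segment` class, the `classify_segment` function, and the example coordinates are hypothetical names and values introduced here for illustration only.

```python
from dataclasses import dataclass

# Thresholds taken from the text; everything else is an illustrative assumption.
MIN_PROBES = 3                 # minimum number of consecutive probes
DELETION_MAX_LOG2 = -0.60      # mean log2 ratio must be below this for a deletion
DUPLICATION_MIN_LOG2 = 0.54    # mean log2 ratio must be above this for a duplication


@dataclass
class Segment:
    """A candidate aberrant interval (hypothetical structure, not an Agilent format)."""
    chrom: str
    start: int
    end: int
    n_probes: int
    mean_log2: float


def classify_segment(seg: Segment) -> str:
    """Return 'deletion', 'duplication', or 'no_call' using the stated thresholds."""
    if seg.n_probes < MIN_PROBES:
        return "no_call"
    if seg.mean_log2 < DELETION_MAX_LOG2:
        return "deletion"
    if seg.mean_log2 > DUPLICATION_MIN_LOG2:
        return "duplication"
    # A log2 ratio near 0 corresponds to two alleles, so no CNV is called.
    return "no_call"


if __name__ == "__main__":
    # Made-up example segments, not data from the study.
    examples = [
        Segment("chrA", 1_000_000, 1_065_000, 4, -0.92),
        Segment("chrB", 5_000_000, 5_400_000, 12, 0.58),
        Segment("chrC", 2_000_000, 2_030_000, 2, -1.10),
        Segment("chrD", 7_000_000, 7_200_000, 6, 0.10),
    ]
    for seg in examples:
        print(seg.chrom, seg.n_probes, seg.mean_log2, classify_segment(seg))
```

In this sketch a segment with fewer than three probes is never called, whatever its log2 ratio. Visual inspection and PCR validation of ambiguous profiles, as described in the text, remain separate manual steps.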
Following genetic testing, patients were clinically reassessed and further medical testing based on the outcome of array-CGH was prescribed, whenever appropriate.

| Gene set enrichment analysis (GSEA) and gene ontology
All genes spanning rare CNVs classified as either "pathogenic", "likely pathogenic", or "uncertain clinical significance" were selected, in addition to all genes spanning common CNVs and listed in the SFARI Gene database ("autism genes"). The open-access web platform Gene Set Enrichment Analysis (GSEA) (http://software.broadinstitute.org/gsea/index.jsp) was then used to perform enrichment analysis with the Gene Ontology Functional database (Subramanian et al., 2005), applying hypergeometric statistics. The FDR method was used to correct for multiple testing, setting statistical significance at FDR < 0.05, and the C5 collection from the Molecular Signature Database v7.2 (https://www.gsea-msigdb.org/gsea/msigdb/) was explored to select the top 10 most significant categories. In addition, pathway analysis was performed with R, using the groupGO() function implemented in the Bioconductor package clusterProfiler version 4.6.2. In this analysis, we considered the more restrictive Gene Ontology levels 4 and 5. Other statistical analyses were performed using the IBM Statistical Package for Social Science (SPSS), version 19.0.
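As an illustration of the statistics named above, the following Python sketch computes a one-sided hypergeometric overlap p-value for each gene set and then applies a Benjamini-Hochberg correction. It is a simplified stand-in, not the GSEA/MSigDB web implementation: the function names, the toy gene identifiers, and the background size are assumptions made here for illustration, and Benjamini-Hochberg is assumed as the FDR procedure because the text does not specify which one was used.

```python
from math import comb


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(M genes in background,
    n of them in the gene set, N genes drawn as the query list)."""
    upper = min(n, N)
    total = sum(comb(n, i) * comb(M - n, N - i) for i in range(max(k, 0), upper + 1))
    return total / comb(M, N)


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def enrich(query: set[str], gene_sets: dict[str, set[str]], background_size: int):
    """Return (set name, overlap, p, adjusted p) rows sorted by adjusted p.
    Assumes the query and every gene set are subsets of the same background."""
    names = list(gene_sets)
    overlaps = [len(query & gene_sets[name]) for name in names]
    pvals = [
        hypergeom_sf(k, background_size, len(gene_sets[name]), len(query))
        for name, k in zip(names, overlaps)
    ]
    adj = benjamini_hochberg(pvals)
    return sorted(zip(names, overlaps, pvals, adj), key=lambda row: row[3])


if __name__ == "__main__":
    # Toy identifiers only; these are not genes or gene sets from the study.
    query = {"G1", "G2", "G3", "G4"}
    gene_sets = {
        "toy_set_A": {"G1", "G2", "G3", "G10"},
        "toy_set_B": {"G4", "G11", "G12", "G13", "G14"},
        "toy_set_C": {"G20", "G21"},
    }
    for name, overlap, p, adj_p in enrich(query, gene_sets, background_size=1000):
        flag = "significant" if adj_p < 0.05 else "not significant"
        print(f"{name}: overlap={overlap}, p={p:.3g}, FDR={adj_p:.3g} ({flag})")
```

The choice of background (here a single integer) strongly affects the p-values; which gene universe the web platform uses is not stated in the text, so the value above is purely a placeholder.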

| Gene ontology enrichment analysis
Gene ontology enrichment analysis was performed using 436 unique genes, spanning CNVs scored as "pathogenic", "likely pathogenic", or of "uncertain clinical significance" in 134 of the 139 ASD cases carrying these variants (Table 3). Five cases carrying a chr. 15q11.2-q13.1 duplication were excluded from this analysis, because they alone produced a spurious, extreme enrichment in the "Nucleolus" (adj-p = 1.33e−41) and "RNA processing" (adj-p = 3.08e−34) gene sets, essentially due to the SNORD gene cluster spanned by these five duplications (Table S2). The top 10 most significant gene ontology categories identified by enrichment analysis in the remaining 134 cases encompassed genes involved in neuronal function and synaptic connectivity, such as neuron projection (adj-p = 9.26e−8), synapse (adj-p = 3.4e−7), and cell-cell signaling (adj-p = 4.75e−7). Many of the genes spanned by these CNVs are already associated with ASD and/or neurodevelopmental disorders.
FIGURE 1 Copy number variant (CNV) classification in accordance with the American College of Medical Genetics and Genomics (ACMG) and the Clinical Genome Resource (ClinGen) recommendations (Riggs et al., 2020). N (%) of patients in each CNV class is specified.
FIGURE 2 Significantly higher frequency of rare copy number variants (CNVs) in the "pathogenic" and "likely pathogenic" classes, as compared to "variants of uncertain significance", where rare and common variants are equally distributed (Fisher's exact p < 0.00001).

FIGURE 3 Partitioning of rare and common duplications and deletions among "pathogenic", "likely pathogenic", or "uncertain significance" copy number variants (CNVs).
In addition, we conducted a complementary gene ontology analysis with clusterProfiler, using more restrictive levels for each class. The results for the first 30 classes obtained at levels 4 and 5 essentially confirm our initial results, also underscoring the importance of calcium-binding intracellular proteins, as well as proteins involved in DNA/RNA binding and transcriptional regulation (Tables S3-S8).

| DISCUSSION
This paper reports the results of array-CGH analysis conducted on a sample of 329 Italian children with ASD. To ensure the reliability of our CNV scoring method, we adopted a two-step approach, first classifying blindly CNVs in accordance with the ACMG and the ClinGen recommendations (Riggs et al., 2020), and then reanalyzing these results using publicly available software. Using this approach, we reached a total detection rate of 15.2% "pathogenic" and "likely pathogenic" variants, which is fully comparable with previously reported diagnostic yields ranging between 9.3% and 29% (Battaglia et al., 2013; Nicholl et al., 2014; Pellanda et al., 2015; Rosenfeld et al., 2010; Tammimies et al., 2015). Predictably, rare and de novo CNVs are associated with greater pathogenicity, as compared to common and inherited variants, while neither deletions nor duplications are significantly predominant. This may partly stem from the methodological approach, whereby CNVs inherited from an apparently unaffected parent or overlapping with common population variation receive a lower score, according to ACMG recommendation (Riggs et al., 2020). However, we tried as much as possible to determine ACMG scores based on the intrinsic features of the CNV, rather than relying largely on the "de novo" vs. "inherited" criterion, because neurodevelopmental disorders are enriched with inherited pathogenic variants with reduced penetrance and no clear parental expression, as well as with pathogenic epimutations in the proband. At the same time, it is biologically plausible that these variants may be endowed with lower penetrance, while rare and de novo variants, especially those affecting neuronal genes, in different samples typically explain the presence of ASD in ∼5-10% of cases (Autism Genome Project Consortium et al., 2007; Marshall et al., 2008; Pinto et al., 2010). Unfortunately, follow-up information regarding additional genetic testing performed using NGS is not available, so we do not know how many cases were explained by variants uncovered performing whole exome sequencing. Among rare variants found in this study, several represent recurrent CNVs in the autism literature or variants of clinical interest.

TABLE 1 Inheritance patterns among: (a) rare and common CNVs, and (b) rare deletions and duplications, defined "pathogenic", "likely pathogenic", or of "uncertain significance" based on ACMG criteria (Riggs et al., 2020).

FIGURE 4 Maternal and paternal inheritance among rare and common "pathogenic", "likely pathogenic", or "uncertain significance" copy number variants (CNVs).

TABLE 3 Gene set enrichment analysis (GSEA) performed using 436 unique genes spanning CNVs scored as "pathogenic," "likely pathogenic," or of "uncertain clinical significance."
Note: The analysis was performed using the MSigDB database v7.5.1, updated January 2022. Gene set names refer to gene ontology cellular components (GOCC) and biological processes (GOBP).

The 15q11.2-q13.1 duplication syndrome involves several genes implicated in autism, playing key roles in neurodevelopment and specifically expressed in the central nervous system, for example, ATP10A (OMIM #605855), UBE3A (OMIM #601623), and the GABRB3 (OMIM #137192), GABRG3 (OMIM #600233), and GABRA5 genes (Urraca et al., 2013). Other genes perform basic cellular functions known to be involved in ASD, such as RNA processing (SNRPN) and protein degradation (UBE3A, HERC2). Clinically, our five patients with the 15q11.2-q13.1 duplication syndrome all show severe deficits in social communication, mild intellectual disability, and sex ratio M:F = 4:1, in line with clinical descriptions of this syndrome (OMIM #608636) (Urraca et al., 2013). The 16p11.2 microdeletion syndrome (OMIM #611913) (Shinawi et al., 2010) and the 17q11.2 deletion syndrome (OMIM #613675) (Osio et al., 2018) both confer high susceptibility to ASD, developmental delay, and minor craniofacial dysmorphisms (Osio et al., 2018; Shinawi et al., 2010), all features present in our two patients. Finally, patient n. 376 is a 5-year-old boy with severe ASD, verbal language impairment and developmental delay, who inherited from an apparently unaffected parent a 65 kb deletion in chr. 13q32.2, involving the FARP1 gene.

We have recently described this case in detail, because he did not respond to the same early intensive behavioral intervention which was successful in bringing his older brother, who does not carry this deletion, out of the autism spectrum. Although this CNV neither overlaps nor appears similar to CNVs identified in other autistic patients, this genetic variation was classified as "likely pathogenic" because FARP1 hemizygosity may represent a plausible candidate to influence neuroplastic responses to therapeutic environmental stimulation. Farp1 is a synaptic scaffolding protein which regulates synapse function and morphology and promotes actin assembly, dendritic growth, and synaptogenesis (Cheadle & Biederer, 2014). Common variants collectively have been shown to provide large contributions to ASD susceptibility, with each variant exerting a small effect (Devlin & Scherer, 2012; Gaugler et al., 2014; Huguet et al., 2016). However, some common variants provide more sizable contributions, although their penetrance remains relatively low and clinical expression is variable. The 15q11.2 BP1-BP2 CNV encompassing TUBGCP5, CYFIP1, NIPA2, and NIPA1, presented in a previous report (Picinelli et al., 2016), is a paradigmatic example. In another patient (n. 10), as many as 11 CNVs not present in the parental genomes were detected, suggesting a strong tendency to genomic instability. Many of these CNVs encompass genes included in the best-known lists of candidate genes for autism, including CTNNA3 (OMIM #607667), MACROD2 (OMIM #611567), IMMP2L (OMIM #605977) (Maestrini et al., 2010), PARK2 (OMIM #600116), LZTS2 (OMIM #610454), and LRP1 (OMIM #107770) (De Rubeis et al., 2014). In addition, TBX1 (OMIM #602054) (Paylor et al., 2006), which is located in the center of the region associated with DiGeorge syndrome (OMIM #188400), is partially deleted (six deleted exons out of a total of nine exons) in patient n. 10.
The Decipher database lists 16 variations containing the TBX1 gene and associated with autistic disorder; in the SFARI Gene database, TBX1 is a known "syndromic" gene; moreover, a linkage study (International Molecular Genetic Study of Autism Consortium, 1998) associates the chr. 22q11.21 region with autism.
Gene Ontology enrichment analysis is aimed at identifying and ranking functionally related groups of genes obtained from high-throughput experiments. In our study, this analysis fully confirms the importance of neuronal genes, especially structural and functional genes involved in neuronal connectivity (Table 3). This outcome fits well with electrophysiological and functional imaging evidence supporting autism as a mainly "developmental disconnection syndrome" characterized by reduced connectivity among distant brain regions, paired with increased local connectivity (Geschwind & Levitt, 2007). In addition to neuronal gene sets, "transcriptional regulation", "chromatin structure", and "immune" genes have also been reported in several other genomic and transcriptomic studies (De Rubeis et al., 2014; He et al., 2019; Satterstrom et al., 2020; Voineagu et al., 2011). In our sample, the "nucleolus" and "RNA processing" gene sets yielded the most impressive p-values when we analyzed all 139 subjects carrying CNVs with scores 3-5 (i.e., VUS, likely pathogenic, certainly pathogenic; Table S2). If we exclude the five cases carrying the chr. 15q11.2-q13.1 duplication, these two gene sets disappear from the top 10 list (Table 3). We believe this discrepancy documents that in our sample, the association with transcriptional regulation gene sets was being spuriously boosted by chr 15q duplications, in particular due to the entire SNORD gene cluster being consistently duplicated in all five cases (Table S1). However, using more stringent levels of analysis, we find a very complex and mixed set of GO categories, which primarily encompass genes encoding calcium-binding proteins, as well as DNA- or RNA-binding proteins and transcriptional regulators (Tables S3-S8).
This more stringent analysis confirms that transcriptional regulation and chromatin management are involved in ASD genetics, as documented by GSEAs performed in large exome-sequencing studies (De Rubeis et al., 2014; Satterstrom et al., 2020). Finally, we do not find "immune" genes spanned by putatively pathogenic CNVs, although there is ample evidence of overexpression of immune genes, especially in ASD brains, according to the vast majority of genome-wide transcriptomic studies (He et al., 2019; Voineagu et al., 2011). This overexpression likely represents one of the convergent functional consequences shared by many different autism-causing gene variants not directly related to immune function per se, although an additional modulation by common genetic and epigenetic variants located in transcriptional regulatory regions of immune genes is quite plausible.
Another aim of the present work was to verify whether and to what extent genetic testing by array-CGH can contribute to improve the clinical management of autistic patients. Among the 50 patients carrying pathogenic/likely pathogenic CNVs, 13 (26%) underwent additional medical exams spurred by array-CGH results (Table 4). These ranged from relatively common exams in the neurodevelopmental disorders clinic, like EEG and EKG, to more specific tests, like cardiac or neck ultrasound. These exams were prescribed only because of array-CGH results. A positive outcome was obtained in almost half of these diagnostic tests and the more specialized exams almost always yielded positive results. This further step in diagnostic sensitivity arising from array-CGH analysis, which goes beyond the mere identification of a plausible etiology and provides information able to improve the clinical management of ASD patients, represents an excellent example of "actionable genomics in clinical practice" (Butler et al., 2022). At this moment, CNV-based or etiology-based treatment for ASD is still scarce and this is a major limitation of our current medical management of ASD. This upgrade requires at least two components: a genetic analysis of the results performed also with this clinical aim in mind, and a close collaboration between the clinical/molecular geneticist and the child psychiatrist, who are primarily responsible for the genetic testing and for the clinical management of ASD patients, respectively. In the near future, the complexity of merging genetic and molecular information with structural neurodevelopment, neuropsychological and executive functions, cognitive level, emotional reactivity, social adaptation, and the existential trajectory of an autistic person will represent an increasingly exciting challenge.
This perspective will likely require novel teaching and training strategies able to reduce the gap between molecules, neural circuits, and the human mind in order to provide more effective and targeted support to individuals with ASD.