Targeted resequencing of 358 candidate genes for autism spectrum disorder in a Chinese cohort reveals diagnostic potential and genotype–phenotype correlations

Abstract Autism spectrum disorder (ASD) is a childhood neuropsychiatric disorder with a complex genetic architecture. The diagnostic potential of a targeted panel of ASD genes has only been evaluated in small cohorts to date and is especially understudied in the Chinese population. Here, we designed a capture panel with 358 genes (111 syndromic and 247 nonsyndromic) for ASD and sequenced a Chinese cohort of 539 cases evaluated with the Autism Diagnostic Interview‐Revised (ADI‐R) and the Autism Diagnostic Observation Schedule (ADOS) as well as 512 controls. ASD cases were found to carry significantly more ultra‐rare functional variants than controls. A subset of 78 syndromic and 54 nonsyndromic genes was the most significantly associated and should be given high priority in the future screening of ASD patients. Pathogenic and likely pathogenic variants were detected in 9.5% of cases. Variants in SHANK3 and SHANK2 were the most frequent, especially in females, and occurred in 1.2% of cases. Duplications of 15q11–13 were detected in 0.8% of cases. Variants in CNTNAP2 and MEF2C were correlated with epilepsy/tics in cases. Our findings reveal the diagnostic potential of ASD genetic panel testing and new insights regarding the variant spectrum. Genotype–phenotype correlations may facilitate the diagnosis and management of ASD.

for disorders with high genetic heterogeneity like ASD, targeted resequencing could increase the molecular diagnostic sensitivity and significantly reduce cost by focusing on genes with available evidence associated with ASD. Alvarez-Mora et al. (2016) designed an ASD panel comprised of 44 candidate genes and conducted a pilot study in 50 patients.
However, they reported a low diagnostic yield, possibly because of the small size of the panel and cohort. There is a need for a more comprehensive panel and a larger ASD cohort to evaluate the diagnostic potential and study the variant spectrum and genotypephenotype correlations.
Most known ASD genes have been identified in Caucasian populations, and their association with ASD in Chinese populations has not been established beyond specific candidate gene association studies (de la Torre-Ubieta et al., 2005). The correlations between genetic variants and clinical features also remain unexplored in Chinese patients. These gaps may hinder the development of genetic testing and management of ASD in the Chinese population. Here, we developed a comprehensive gene panel covering 358 ASD genes and sequenced an extensively phenotyped Chinese cohort (539 cases and 512 controls). We sought to test the panel, ascertain the variant spectrum for potential future genetic diagnoses in Chinese patients, and understand genotype-phenotype correlations.

| Editorial policies and ethical considerations
This study was approved by the Peking University Institutional Review Board (IRB00001052-11043 and IRB00001052-14055). All subjects or their legal guardians completed written informed consent.

| Subjects
We recruited ASD families from training centers in Beijing and Tsingdao, China. All patients had a clinical diagnosis of ASD and underwent our assessments. In particular, they were evaluated with the current "gold standard" diagnostic tools, the Autism Diagnostic Interview-Revised (ADI-R; Lord, Rutter, & Le Couteur, 1994) and the Autism Diagnostic Observation Schedule (ADOS; Lord et al., 1989) by assessors with certified reliability. We also conducted assessments and questionnaires regarding the patients' developmental, medical, and family histories. A full list of the assessment tools used in this study is available in Table S1. To be included in our study as a case, the patient was required to satisfy the same ADI-R criteria for ASD as used by the Simons Simplex Collection (SSC; Fischbach & Lord, 2010). Namely, a child was classified as having ASD if he/she met the ADI-R cutoffs in the Social and Communication domains, scored within two points of the cutoffs in either the Social or Communication domain, or scored within one point in both domains.
We collected control samples from blood donors at blood donation stations in Beijing, China. All participants, aged 18-55 years old, were born before 2000. Considering that the first Chinese autism cases were reported in 1982 (Tao, 1982), and in China, there were very few hospitals that could diagnose ASD in the 1990s (Zhou et al., 2014), these donors likely had little chance of being diagnosed if they suffered from ASD. To reduce the influence of unrecognized ASD in the donors, we adopted the Adult Autism Spectrum Quotient (AQ; Baron-Cohen, Wheelwright, Skinner, Martin, & Clubley, 2001) to screen for ASD in the donors. The recommended cutoff (an AQ of 32) was used (Baron-Cohen et al., 2001). Meanwhile, we also designed the questionnaire to collect other information, including personal and family medical history, as well as pregnancy history, to exclude any risk of ASD-related diseases. A participant was used as a control if he/she had an AQ below 32, did not have any personal or family history of neurological disorders, psychiatric illness, or adverse pregnancy outcomes such as fetal loss, and had completed education through at least middle school to exclude any risk of low intellectual functioning.

| Panel design
We selected syndromic genes as follows and grouped them by the strength of existing evidence, from strongest to weakest: Group 1, high-confidence syndromic genes labeled as Levels 3 or 4 in AutismKB that were generally acknowledged to be related to ASD or reported in more than one family with ASD (Xu et al., 2012); Group 2, genes summarized by Neale et al. (2012) that were only reported in a single family or a single case with ASD; and Group 3, genes reported by recent studies of new syndromes possibly related to ASD (Hoppman-Chaney, Wain, Seger, Superneau, & Hodge, 2013;Schaaf et al., 2013;Sweatt, 2013;Williams et al., 2010).
We selected nonsyndromic genes by ranking all candidate genes in AutismKB (Xu et al., 2012) using an improved multidimensional evidence-based candidate gene prioritization approach. Specifically, we used a new benchmark gene set that included genes that had been reported more than three times in high-quality literature studies, and we used the "Nelder-Mead" algorithm (Nelder & Mead, 1965) to optimize the weight matrix to ensure that 95% of the benchmark genes were ranked in the top 2% of all candidate genes. The genes with weighted combined scores greater than or equal to those of benchmark genes were classified as the core gene set. We then selected and grouped the nonsyndromic genes as follows: Level 1, genes belonging to the core gene set and supported by more than one genetic study; Level 2, the top 300 genes ranked by the above weighted combined scores or genes collected from AutDB (Basu, Kollu, & Banerjee-Basu, 2009) and supported by more than one genetic study; Level 3, genes recently predicted by the TADA model (He et al., 2013) and genes reported by two genetic studies focused on the effect of homozygous variants on ASD Yu et al., 2013). These three groups were further classified into "Level 1-Association only", "Level 1-Association and other", "Level 2-Association only", "Level 2-Association and other", and "Level 3-Association and other" according to whether the gene was supported only by association studies or by both association studies and other studies. In addition, 26 syndromic genes were also recurrently reported in nonsyndromic ASD patients and were selected.
To control for population structure, we added to the panel the top 300 highly differentiated ancestry informative markers (AIMs) from the published panel of AIMs for Han Chinese (Qin et al., 2014).
Probes were designed using the SSAHA algorithm to capture all exons of the selected genes, 50 bp of the intronic sequences flanking the exons, 2 kb of upstream sequences, and the 300 AIMs (Roche NimbleGen, Inc., Madison, WI).

| Variant calling
Adapter sequences were removed from raw reads using cutadapt.

| Quality control
We excluded individuals whose recorded gender did not match that estimated by PLINK (Purcell et al., 2007); individuals with a crosssample contamination level >2%, as estimated by verifyBamID (Jun et al., 2012); and individuals with high levels of pairwise identity by ZHOU ET AL.
| 803 descent (IBD)) with others, as estimated by PLINK (PI_HAT > 0.2); to remove contaminated samples and cryptic related samples. Additionally, we removed nucleotide positions that had >10% missing genotypes and positions that failed a Hardy-Weinberg equilibrium test (p < 0.001).

| Validation of SNVs, indels, and CNVs
Single nucleotide variants (SNVs) and indels were validated via polymerase chain reaction (PCR)-Sanger sequencing. Rare exonic autosomal CNVs were validated by SYBR-based quantitative PCR (qPCR). Reference genes chosen from COBL, GUSB, PPIA, and SNCA were included based on the minimal coefficient of variation, and the normal control was assigned a value of 1 to normalize the data. The −ΔΔ 2 Ct relative quantification method (Livak & Schmittgen, 2001) (Richards et al., 2015), to confirm the classification. (c) the inherited LoF variant is in a gene where LoF is a known mechanism of the syndrome. Furthermore, only when the above variants conformed to the inheritance pattern of the syndrome was the variant considered to be pathogenic or likely pathogenic.
Second, ACMG-AMP guidelines were applied to further confirm the pathogenic and likely pathogenic variants as classified by the above in-house criteria. First, we used the semiautomated tool InterVar (Q. Li & Wang, 2017 Nomenclature was checked using xMutalyzer 2.0.29 (Wildeman, van Ophuizen, den Dunnen, & Taschner, 2008).

| Statistical analysis
After performing the quality control described above, 521 cases and 483 controls remained for subsequent analyses. As population structure has been reported to have an impact on the result of rare variant analyses of case-control studies (L. , we conducted a principal components analysis to detect ancestry differences between cases and controls using EIGENSTRAT (Price et al., 2006) based on the genotype data for AIMs. Only the first eigenvector was statistically significant (p = 2.31 × 10 −8 ), and the first two eigenvectors are plotted in Figure S1.
To better minimize spurious associations, we still adjusted for population structure along the first 10 eigenvectors in the analyses. First, we investigated differences in the variant spectrum stratified by gene groups and MAF between cases and controls. We built the logistic regression models using the phenotype (case: 1 and control: 0) as the dependent variable, and whether the patient carried the variant (he/she carried the variant: 1 and he/she did not carry the variant: 0), gender, and the first 10 eigenvectors from EIGENSTRAT as independent variables to test the difference in the fraction of samples carrying specific variant sets.
Furthermore, we studied the clinical features of the cases with SHANK3 and SHANK2 variants and 15q11-13 duplications to explore their common characteristics and then conducted genotype-phenotype correlation analyses, focusing on loss of language skills, minimal verbal skills, unusual sensory interests, self-injurious behavior, epilepsy or tics, gastrointestinal problems, hypotonia, and insensitivity to pain. By comparing the variant spectra between cases with and without specific features using Fisher's exact test with a 2 × m contingency table, where "m" indicates the number of genes, we selected the feature shown to be marginally statistically significant to investigate the correlation with the gene by Fisher's exact test with a 2 × 2 contingency table.
All statistical analyses were conducted using R 3.2.2 (R Core Team, 2015). For all analyses, p < 0.05 were considered statistically significant. However, when multiple tests were performed, the Bonferroni correction was used to adjust p values by multiplying each p value by the total number of tests. In this study, the following three questions utilized multiple testing: The difference between cases and controls in the fraction of cases carrying variants in five nonsyndromic and three syndromic gene subgroups and the difference in variant spectra between cases with and without specific features. Therefore, only these with adjusted p < 0.05 were considered statistically significant.

| 805
Specifically, the panel contained 111 syndromic genes, including 78 Group 1 genes, 29 Group 2 genes, and 4 Group 3 genes (Table S2), as well as 247 nonsyndromic genes, including 58 "Level 1-Association only" genes, 54 "Level 1-Association and other" genes, 26 "Level 2-Association only" genes, 36 "Level 2-Association and other" genes, and 73 "Level 3-Association and other" genes (Table S3).  The AQ scores for all controls were below 32, and their distribution is shown in Figure 1d. Furthermore, according to their self-report, the controls did not have any personal or family history of neurological disorders or psychiatric illness related to ASD, such  Figure 1e).

|
F I G U R E 2 Variant spectra and differences between cases and controls. (a) The distribution of different types of variants for all targeted regions (left pie chart) and the targeted coding regions (right pie chart). (b) The percentage of cases (red bar) with ultra-rare functional variants in 358 genes was significantly higher than that in controls (blue bar) (p = 0.043 by logistic regression). (c) The differences in the proportion of rare variants between cases and controls for five subgroups of nonsyndromic genes. The bars represent the proportions of cases (solid bars) and controls (hollow bars) that carried the corresponding variants, as grouped by subgroups and MAF. Only rare functional variants in the "Level 1-Association and other" gene set were significantly enriched in cases when compared with the controls. The adjusted p values after Bonferroni correction are shown on the right of the bars. (d) Comparison of the percentage of samples with rare LoF variants in syndromic genes following their inheritance patterns in cases (red bar) and controls (blue bar) grouped by subgroups. Only the variants in the Group 1 gene set were detected in more cases than controls, and the difference was statistically significant (adjusted p = 0.024 by logistic regression after Bonferroni correction). (e) Comparison of rare autosomal exonic CNVs detected in cases and controls. The proportion of samples with deletions is shown in the upper bar and that of duplications is shown in the lower bar. CNV: copy number variant; MAF: minor allele frequency 3.4 | Statistically significant differences in ultrarare functional variants between cases and controls We identified a total of 45,847 variants, whose distribution is shown in Figure 2a. Ultra-rare (i.e., MAF < 0.1%) functional variants were found in significantly more cases than controls, as shown by logistic regression after controlling for gender and population structure by including the first 10 eigenvectors (p = 0.043; Figure 2b), whereas the total number of all variants, the total number of functional variants, and the number of functional variants with a MAF ≥ 0.1% were similar between the cases and controls (p = 1, 1, and 1, respectively).
Details about all SNVs and indels are provided in Table S5.
For nonsyndromic genes, functional variants with a MAF < 0.1% were significantly enriched in cases when compared with controls, as estimated by the proportion of carriers among cases and controls using logistic regression, correcting for gender and population structure (p = 0.039). We then tested the enrichment in five subgroups. After Bonferroni correction, only rare functional variants in the "Level 1-Association and other" group were significantly enriched in cases compared with controls ( Figure 2c).
On the contrary, there was no significant difference in the spectrum of "Level 1-Association only" variants between cases and controls, although this gene set also belonged to the "Level 1" class.
Additionally, controls carried even more rare functional variants in the genes of the "Level 1-Association only" class than cases. The result of this comparison with the "Level 2-Association only" gene set was similar to that of the "Level 1-Association only" set, and the number of individuals with rare functional variants in both of these two gene sets was less than those in the other three subgroups.
Considering that the designation of these two gene sets was only supported by genome-wide or candidate gene association studies, these genes may influence the risk of ASD mainly through common variants, not rare variants. The variant spectra of the "Level 2-Association and other" and "Level 3-Association and other" subgroups showed very weak differentiation between cases and controls, suggesting that the relationship between these genes and ASD requires further confirmation. Thus, among all of the subgroups, "Level 1-Association only" displayed the highest diagnostic potential and should be given first priority in genetic testing for ASD patients.
For syndromic genes, cases harbored significantly more rare LoF variants (MAF < 1%) following the inheritance pattern of the associated syndrome than the controls (19 cases vs. 6 controls, logistic regression correcting for gender and population structure, p = 0.023). All of these variants were validated by Sanger sequencing. This significant difference was mainly due to "Group 1" genes, indicating that this subgroup of genes was the most discriminative ( Figure 2d). The other subgroups, "Group 2" and "Group 3," displayed limited abilities to differentiate cases and controls. This suggested that "Group 1" genes have higher diagnostic potential and could be used to simultaneously screen for multiple ASD-related syndromes in a clinical setting.
In addition to SNVs, 42 rare autosomal exonic CNVs were identified in 27 cases and 12 controls and were validated by qPCR.
There were two cases that carried two CNVs. A larger percentage of cases harbored CNVs than controls, as estimated by the logistic regression correcting for gender and population structure (p = 0.060), but did not reach the significance threshold (Figure 2e). Eight cases and two controls carried CNVs encompassing syndromic genes.
Previous studies have shown that the most frequently and consistently reported chromosomal abnormalities in ASD patients, as detected by karyotyping, fluorescence in situ hybridization (FISH), and CMA, are 15q11-13, 16p11.2, and 22q11.2 CNVs (Abrahams et al., 2013;Schaefer & Guidelines, 2008Yang et al., 2018). In our cohort, five cases and only one control carried CNVs located in these regions. Among them, 15q11-13 duplications were detected most frequently (four cases).
3.5 | Pathogenic or likely pathogenic variants with potential diagnostic value were found in 9.5% of cases  (Tables 1 and S6). These variants accounted for 3.5% of cases ( Figure 3a). All but one of these variants belonged to the Group  (Figure 3a).
In addition to SNVs, 28 rare exonic CNVs were also classified as pathogenic or likely pathogenic, accounting for another 5% of cases (Table 2 and Figure 3a). Of these, seven CNVs (25%) were DN.
Duplications of 15q11-13 occurred in four cases, especially those in 15q13.3, which were found in three cases. GSAMD and CytoScan HD further confirmed and refined the boundaries and size of the 15q13.3 duplications (Table S7). The 15q13.3 duplications were found to be 375-514 kb, only encompassing CHRNA7 and the first exon of OTUD7A ( Figure S2). Except for AU096503, whose maternal sample CNVs (Gillentine & Schaaf, 2015). Although 15q13.3 duplications have decreased penetrance when compared with deletions, there is evidence suggesting that they may be pathogenic (Szafranski et al., 2010). Furthermore, ASD occurs more frequently in individuals with CHRNA7 duplications than in those with deletions (Gillentine & Schaaf, 2015 (Nowakowska et al., 2010). Some MRD20 cases also display hypotonia, delayed motor development, variable dysmorphic features, and variable brain anomalies on imaging. These two cases displayed the typical symptoms described above. Additionally, we identified a pathogenic variant of TSC2 in AU039303, who was rediagnosed with tuberous sclerosis and epilepsy. For commonly reported CNVs, such as 15q11-13 duplication, 2q37 deletion, and 22q11.2 deletion, we also found that many features of our carriers were consistent with previously reported phenotypes of the cases carrying these CNVs (Table S8). Thus, considering the severity of the variants and the clinical phenotypes of carriers, our panel detected pathogenic and likely pathogenic variants in approximately 9.5% of ASD cases (Figure 3a).

| Genotype-phenotype correlations in ASD cases
In this study, pathogenic variants of SHANK3 and SHANK2 showed the highest incidence. In addition to the four above LoF variants, another two missense variants of SHANK3, c.593C>G (p.Ala198Gly) and c.898C>T (p.Pro300Ser), which have been reported in Caucasian ASD patients, were found in two cases (Figure 3c). In total, these six variants explained 1.2% of cases. Although the association between ASD and SHANK genes has been established, the phenotype of the carriers requires further exploration. We conducted a clinical investigation of the six carriers described above (Table S9). All showed severe language and social deficits, which were consistent with previous findings in Caucasian patients (Moessner et al., 2007).  In addition to our comprehensive analysis of the most frequently occurring genes and CNV regions in our samples, we also aimed to identify common genetic factors underlying eight comorbid conditions, including loss of language skills, minimal verbal skills, unusual sensory interests, self-injurious behavior, epilepsy or tics, gastrointestinal problems, hypotonia, and insensitivity to pain, via a twostep method. First, we explored whether cases having specific comorbidity displayed a difference in their variant spectrum by Fisher's exact test with a 2 × m contingency table counting the number of variants, where "m" was the number of genes. As a result, of eight comorbidities, only "epilepsy/tics" showed marginal significance (Table S11). Next, we further explored which genes were correlated with "epilepsy/tics" using Fisher's exact test with a 2 × 2 contingency table. After Bonferroni correction, variants in two genes, CNTNAP2 and MEF2C, were found to be significantly correlated with this phenotype (Figure 3d). In ASD cases with epilepsy or tics, 25.81% and 9.68% carried rare functional variants in CNTNAP2 and MEF2C,

| DISCUSSION
In this study, we investigated the variant spectra, diagnostic yields, and genotype-phenotype correlations of 358 ASD candidate genes in a Chinese case-control cohort. Moreover, we updated the contribution estimate of these targeted genes to ASD. "Group 1" syndromic genes and "Level 1-Association and other" nonsyndromic genes were found to be most significantly associated with ASD and should be given high priority when screening ASD patients. Pathogenic and likely pathogenic variants were identified in 9.5% of cases. Variants in members of the SHANK gene family and 15q11-13 duplications were the most frequent abnormalities found in our Chinese ASD patients.
New phenotype-genotype correlations were also identified. ASD patients carrying rare functional variants of CNTNAP2 or MEF2C were more likely to have epilepsy/tics and require monitoring of these comorbidities. Our findings may facilitate genetic testing for ASD, especially in Chinese patients.
Our sample sets included the only Chinese ASD cohort assessed with ADI-R and ADOS by evaluators with certified reliability; these assessments are regarded as the "gold standard" of ASD diagnosis. sets between cases and controls. We observed a greater enrichment of rare variants of "Group 1" syndromic genes and "Level 1-Association and other" nonsyndromic genes in cases than in other subgroups. This further confirmed their strong associations with ASD in Chinese patients and suggested that they have a higher diagnostic potential.
With the development of research on ASD genetics and the increased attention paid to this disorder, there has been an increase in the number of patients referred for clinical genetic evaluation to identify the genetic etiology. A tiered genetic evaluation is recommended for ASD patients and CMA has been suggested as the first-tier testing approach for CNVs (Schaefer et al., 2013;Shen et al., 2010). However, there is currently still no high-throughput second-tier testing to efficiently and economically screen for diseasecausing SNVs, although genomic technologies have advanced rapidly, and multiple genetic factors associated with ASD have been reported. Here, we used targeted resequencing to enrich for the highest-confidence ASD candidate genes with clinical potential.
Compared with whole-genome and whole-exome sequencing, targeted resequencing can more effectively discover variants of genes of interest at a lower cost and higher sequencing depth, although at the expense of the ability to identify variants outside of the target regions. We assessed the possible pathogenicity of the variants found in our cases using our criteria in combination with ACMG-AMP variant interpretation guidelines. Finally, pathogenic or likely pathogenic variants of targeted genes were identified in 9.5% of cases.
As our panel and analysis pipeline reliably detected rare SNVs, indels, and exonic CNVs of various gene sets simultaneously, we were able to further evaluate the contribution of different types of genetic variation to ASD. Pathogenic and likely pathogenic SNVs of syndromic ASD genes were identified in 3.5% of cases. However, this detection rate is lower than the expected yield. None of the ZHOU ET AL.

| 811
syndromic ASD genes could explain more than 1-2% of the ASD cases, but collectively, they are estimated to be found in approxi- For identifying the actual causal variants from those detected in the panel, we performed extensive analyses for variant interpretation and classification. We used our in-house criteria to identify putative pathogenic and likely pathogenic variants of syndromic and nonsyndromic genes and CNVs. This criteria integrated variant determination protocols primarily based on those described in previously published studies or those collected from two databases, HGMD (Stenson et al., 2017) and ClinVar (Landrum et al., 2014). The ACMG-AMP guidelines (Richards et al., 2015) are more stringent when used to interpret the variants identified in genes that cause Mendelian disorders. As ASD-related syndromes are all Mendelian disorders, we used this guideline to further confirm the pathogenic and likely pathogenic variants in syndromic genes. Seven variants were differentially classified by our criteria and the ACMG-AMP guidelines. We double-checked the reasons for different classifications of these seven variants.
The pathogenicity of three variants, IQSEC2 c.1229delC (p.Pro410fs), PTEN c.404dupG (p.Gly136fs), and PTEN c.460dupC (p.Arg154fs), was upgraded. This was mainly because the population allele frequencies for these variants were extremely low, and the corresponding clinical features of the proband matched to the syndrome, which strengthened the evidence of pathogenicity.
There were two variants, HEPACAM c.803+1G>A and NF1 c.1742dupT (p.Leu581fs), that were downgraded from likely pathogenic to uncertain significance, and one variant, RNF135 c.1015delG (p.Val339fs), was downgraded from pathogenic to uncertain significance. For the splice variant of HEPACAM, null variants in this gene are not a known mechanism of pathogenicity, which explains the loss of PVS (pathogenic very strong) evidence in the ACMG-AMP guidelines. Actually, HEPACAM was in the Group 2 gene set, which had less genetic evidence as the ASD candidate genes. For two other variants, we did not observe similar conditions in the parents who passed the variants on to their children, and they were considered healthy adults, which weakened the evidence of pathogenicity. The CDKL5 c.2854C>T (p.Arg952*) variant was downgraded from likely pathogenic to likely benign. When applying our initial criteria, we collected the HGMD evidence with the tag of "DM?" and estimated it to be likely pathogenic. When applying the ACMG-AMP guidelines, we referred to the original study (Intusoma et al., 2011)

ACKNOWLEDGMENTS
We thank all of the ASD families and normal controls who participated in this study. We also would like to thank the Beijing