Association of copy number variation across the genome with neuropsychiatric traits in the general population

Copy number variants (CNVs) are associated with psychiatric conditions in clinical populations. The relationship between rare CNV burden and neuropsychiatric traits in young, general populations is underexplored. A total of 6,807 children from the Avon Longitudinal Study of Parents and Children (ALSPAC) were studied. CNVs were inferred from single nucleotide polymorphism‐array data using PennCNV. After excluding children with known candidate CNVs for schizophrenia (SCZ), rare (<1%) CNV burden (total number of genes affected by CNVs, total length of CNVs, and largest CNV carried) was analyzed in relation to: psychotic experiences (PEs) and anxiety/depression in adolescence; autism spectrum disorder (ASD) and attention‐deficit hyperactivity disorder (ADHD), ASD and ADHD traits, and cognitive measures during childhood. Outcomes were also assessed in relation to known SCZ CNVs. The number of genes affected by rare CNVs was associated with a continuous measure of ASD: the standardized mean difference [SMD] per gene affected was increased by 0.018 [95%CI 0.011,0.025], p = 3e‐07 for duplications and by 0.021 [95%CI 0.010, 0.032], p = 1e‐04 for deletions. In line with our published results on educational attainment in ALSPAC, intelligence quotient (IQ) was associated with CNV burden: the SMD per gene affected was −0.017 [95%CI −0.025, −0.008] p = 1e‐04 for duplications and −0.023 [95%CI −0.037, −0.009], p = .002 for deletions. Associations were also observed for measures of coherence, attention, memory, and social cognition. SCZ‐associated deletions were associated with IQ (SMD: −0.617 [95%CI −0.936, −0.298], p = 2e‐04), but not with PEs or other traits. We found that rare CNV burden and known SCZ candidate CNVs are associated with neuropsychiatric phenotypes in a nonclinically ascertained sample of young people.

Compared to phenotype associations with single nucleotide variants, CNVs tend to confer larger effects (Girirajan et al., 2011; Thapar & Cooper, 2013). A recent survey of rare CNVs in 60,000 human exomes found that 70% of individuals carry at least one rare, genic CNV, and that on average, copy number gains are more common than losses (Ruderfer et al., 2016).
There is substantial pleiotropy between many of these psychiatric phenotypes: in a study of five disorders (SCZ, BPD, MDD, ASD, ADHD), consistent genetic correlations (rG) estimated from single nucleotide polymorphism (SNP) data by both restricted maximum likelihood and linkage disequilibrium-score regression were found, including the highest rG of 0.79 for SCZ and BPD (Bulik- . This same study observed more modest (yet still substantial) rG values of 0.14 and 0.23 for SCZ/ASD and SCZ/ADHD. Cross-phenotypic overlap is also observed with CNV associations: perhaps the strongest evidence comes from CNVs robustly associated with SCZ, which are also associated with developmental delay, ASD, and congenital malformations (O'Donovan & Owen, 2016). CNVs implicated in ASD and SCZ have also shown enrichment in children with ADHD (Williams et al., 2010). While many studies have examined the association of genetic variation with clinical psychiatric diagnoses, genetic association studies of neuropsychiatric traits in population-based samples of younger individuals (i.e., prior to the age of onset of many psychiatric conditions) are less numerous. Within the Avon Longitudinal Study of Parents and Children (ALSPAC) cohort, the use of common genetic variation has provided insight into the shared etiology of psychiatric disorders and traits in the general population. Salient examples include: the genetic correlation between ADHD traits and ADHD diagnosis (Stergiakouli et al., 2015), the shared genetic risk of ASD and population-level variance in social and communication ability (Robinson et al., 2016), and the association of a polygenic risk score for SCZ with adolescent measures of negative symptoms and anxiety (but not psychotic experiences [PEs]) (Jones et al., 2016). SCZ also shows a strong genetic correlation with ADHD in ALSPAC (Nivard et al., 2017).
Concerning rarer variation, in a large study of the burden of rare CNVs and cognitive phenotypes in unselected populations, including ALSPAC, we previously found that these variants were negatively associated with educational attainment (Männik et al., 2015). In UK Biobank, carriers of known pathogenic CNVs have also been observed to have reduced cognitive performance. However, associations between rare CNVs and other neuropsychiatric traits are underexplored.
Using data from a young, general population sample (ALSPAC), we first sought to study the burden of large, rare CNVs on a range of neuropsychiatric phenotypes, including ADHD, ASD, depression, anxiety, PEs, and neurocognition. The secondary objective was to test the relationship of known SCZ candidate CNVs with the traits studied.

| Cohort details
ALSPAC is a prospective cohort of mothers and children. Between 1991 and 1992, 14,541 women living in the former county of Avon, UK were recruited during pregnancy, of whom 13,761 were enrolled into the study. Participants have been followed up longitudinally since recruitment. Further details are available in the cohort profile papers (Fraser et al., 2013), and the study website contains details of available data through a fully searchable data dictionary (http://www.bris.ac.uk/alspac/researchers/data-access/data-dictionary/). Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees.

| Genotyping and quality control
The ALSPAC children were genotyped on the Illumina HumanHap550-Quad platform, by the Wellcome Trust Sanger Institute, Cambridge, UK, and the Laboratory Corporation of America, Burlington, NC, using support from 23andMe. Samples were removed if there were gender mismatches, disproportionate heterozygosity, >3% missingness, and insufficient sample replication (IBD <0.8). A total of 8,365 participants who were unrelated at an IBD of >0.125 were included, and all were of European ancestry (non-Europeans were removed after multidimensional scaling, and comparison with the HapMap CEU population).
Genomic locations described in this paper relate to NCBI build 37/ hg19, unless otherwise stated.

| CNV calling
CNVs were called on 7,572 participants using PennCNV (Wang et al., 2007), and the default libraries provided with the package for the After calling, all 7,572 subjects had at least one CNV (with 150,657 CNVs among this group in total). These individuals were retained for quality control (QC). Figure 1 shows a flowchart of the QC and filtering process for the CNV data. First, CNVs were merged if they were separated by a gap less than half of their combined length using the "clean_cnv.pl" script provided within PennCNV (Wang et al., 2007). Individuals were then dropped from the analysis if they had >30 CNV calls (Nag et al., 2013), if they had a LRR standard deviation (SD) of >0.3, a B Allele Frequency drift (BAF drift) of >0.002, or an absolute waviness factor of >0.05. The filter values for LRR SD, BAF drift and waviness factor are all as recommended by PennCNV developers (Wang et al., 2007). Next, CNVs were removed from the analysis if at least 50% of the CNV call overlapped with telomeric, centromeric, or immunoglobulin regions (see Supporting Information Notes 1, 2 and 3 for details of coordinates). Known regions of segmental duplications were also removed (as downloaded from the UCSC Genome Browser [https://genome.ucsc.edu/] [Kent et al., 2002]). CNVs were removed if they fulfilled any of the following criteria: if they spanned <10 probes, were <5 kb or >5 Mb in length, or had a confidence score of <10.
Finally, those CNVs that had a density of less than one probe per 20 kb were removed.
After filtering for those with no outcome data, but complete confounder data, 21,678 CNVs remained (11,750 with a frequency <1%), split among 6,807 subjects. CNVs were mapped to genes using the "refGene" database, also downloaded from UCSC Genome Browser (Kent et al., 2002).
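The sample-level and CNV-level thresholds described above can be expressed as two simple predicates. The following is a minimal Python sketch rather than the pipeline actually used: the function and argument names are hypothetical, and the region-based filters (telomeric, centromeric, immunoglobulin, and segmental duplication overlap) are omitted because they depend on coordinate files not reproduced here.

```python
def sample_passes_qc(n_calls, lrr_sd, baf_drift, waviness):
    """Return True if a sample survives the per-individual filters.

    Samples are dropped for >30 CNV calls, LRR SD >0.3,
    BAF drift >0.002, or absolute waviness factor >0.05.
    """
    return (
        n_calls <= 30
        and lrr_sd <= 0.3
        and baf_drift <= 0.002
        and abs(waviness) <= 0.05
    )


def cnv_passes_qc(n_probes, length_bp, confidence):
    """Return True if a single CNV call survives the per-call filters."""
    if n_probes < 10:
        return False  # spans fewer than 10 probes
    if length_bp < 5_000 or length_bp > 5_000_000:
        return False  # shorter than 5 kb or longer than 5 Mb
    if confidence < 10:
        return False  # PennCNV confidence score below 10
    if n_probes * 20_000 < length_bp:
        return False  # density below one probe per 20 kb
    return True
```

For example, a 100 kb call covering 12 probes with confidence 15 would be retained, whereas a 400 kb call covering only 10 probes would be removed for low probe density.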

| Phenotypes
Descriptive statistics of each variable, as well as the numbers in the risk group (or means/medians for continuous variables) are summarized in Table 1.

Psychotic experiences
The psychosis-like symptoms (PLIKS) semi-structured interview was carried out at both 12 and 18 years. The content and reliability of the interview has been described previously (Zammit et al., 2013), but briefly, it covers the occurrence of hallucinations, delusions, and experiences of thought interference, with ratings based on the Schedule for Clinical Assessment in Neuropsychiatry. For this study, the primary outcome was a binary definition of suspected or definite PEs at either age 12 or 18, versus no PEs at these ages.

Anxiety and depression
Binary variables for anxiety and depression were derived using scores from the Computerized Interview Schedule-Revised (CIS-R), carried out at 18 years (Lewis, Pelosi, Araya, & Dunn, 1992). This interview establishes the type and severity of neurotic symptoms, and categorizes depressive episodes according to ICD-10 criteria (including mild, moderate, or severe depression). Anxiety was defined as the presence of ICD-10 concordant diagnoses of generalized anxiety disorder, social phobia, specific phobia, agoraphobia, or panic disorder, using the CIS-R (Jones et al., 2016).

ASDs and traits
ASD diagnosis
Diagnoses of ASD have been recorded in ALSPAC via several methods, as described previously (Golding et al., 2017): all children given a statement of special educational needs in the Avon area were reviewed to identify those diagnosed as having ASD according to ICD-10 criteria. Maternal reports were also used to source cases, according to responses to the question (asked when children were 9 years): "Have you ever been told that your child has autism, Asperger's syndrome or autistic spectrum disorder?" Additional sources of cases included: children diagnosed by age 16, due to classification by the educational system as requiring special educational needs due to ASD; text responses to ALSPAC questionnaires relating to ASD diagnosis between 6 months and 11 years; and finally, letters from parents to the ALSPAC study director.

ASD traits
Mean of seven ASD factors generated previously in ALSPAC
Steer, Golding, and Bolton (2010) report a factor analysis of 93 individual measures related to ASD in ALSPAC. They generated seven factors. In the current paper, the mean of these seven factors was used as a global measure of ASD on a continuous scale. While initially very skewed, this variable was readily transformable to approximate normality after reflection and log-transformation (i.e., after transformation, a higher score is associated with ASD). Histograms of this variable before and after transformation are shown in Supporting Information Figure 1.

Table 1 footnotes: (-) = lower score is indicative of reduced performance on these metrics (for all other traits, higher scores indicate a reduced performance, or the trait has been dichotomized so that the risk group is coded as "1," control group as "0"). Gray = binary. White = continuous. Abbreviations in this Table are used in subsequent Tables/Figures.

Specific measures of ASD that predict ASD diagnosis in ALSPAC
Four ASD traits that have previously been noted to form a predictive model of ASD in ALSPAC (pseudo-r² reported as .48) (Steer et al., 2010) were also used in this analysis. The coherence subscale of the Children's Communication Checklist (CCC) was scored at 9 years (Bishop, 1998), and includes questions such as whether the child could explain the rules of a simple game to a younger child. The Social and Communication Disorders Checklist (SCDC, 91 months) (Sebat et al., 2007) includes questions such as whether the child was able to realize if they had offended people, and whether they responded to instructions. Another measure assessed the presence of repetitive behaviors (RB, at 69 months) (Rutter, Tizard, & Whitmore, 1971).
Finally, the sociability subscale of the Emotionality Activity and Sociability (EAS) temperament scale included measures of whether the child enjoyed the company of people at age 38 months (Buss & Plomin, 1984).
The RB, SCDC, and CCC measures were highly skewed and were therefore dichotomized, defining 10% of the sample for each measure as the "risk" group for ASD traits. The SCDC has previously been dichotomized using a cutoff of 9 (Barona, Kothari, Skuse, & Micali, 2015), and in this paper, the dichotomization method (using a 10% risk group) supported a similar cutoff of 8. All traits were based on responses to questionnaires filled out by mothers. The EAS was approximately normally distributed, and thus was analyzed continuously, after reflecting the variable, so that a higher score was associated with increased risk of ASD. Histograms of these variables before and after transformation are shown in Supporting Information Figure 2.

FIGURE 2 Heatmap of phenotypic correlations between neuropsychiatric traits. This heatmap shows the correlations between all outcomes studied, computed using the "mixed.cor" function provided in the R package "psych." Pearson correlations are computed for pairs of continuous variables, tetrachoric correlations for dichotomous variables (denoted with a "*"), and biserial correlations for mixed pairs. Positive correlations are given in red, and negative correlations in blue, with intensity indicating the magnitude of the correlation. (-) = lower score is indicative of reduced performance on these metrics (for all other traits, higher scores indicate a reduced performance, or the trait has been dichotomized so that the risk group is coded as "1," control group as "0"). For expansion of abbreviations, see Table 1.

GUYATT ET AL.

Hyperactivity
Hyperactivity was assessed using the hyperactivity score from the Strengths and Difficulties Questionnaire (Goodman, 1997). This measure was derived from the results of a maternal questionnaire administered when children were 81 months old, and provides a quantitative measure of ADHD that can also be used to generate a categorical definition (Stergiakouli, Thapar, Davey Smith, & Pediatr, 2016). Since the measure was zero-inflated in the participants included in this study (see Supporting Information Figure 3), this variable was analyzed as a dichotomous variable, defining a 10% risk group (equivalent to using a cutoff of 7, as previously used in ALSPAC).

Inhibition and impulse control
Inhibition and impulsivity were measured using the "Stop-Signal Inhibition" task (see Supporting Information Methods) (Logan, 1994; Logan, Cowan, & Davis, 1984; Pindus et al., 2015). The number of trials correct at the 250 ms delay and the 150 ms delay were used in this analysis. Both variables were dichotomized into 10% risk groups, since they were negatively skewed, as shown in Supporting Information Figure 4.

Attention
Two measures of attention, capturing selective attention and attentional control, were used (both measured at 8 years), and were assessed using the "Sky Search" and "Opposite Worlds" subtasks of the Test of Everyday Attention for Children ("TEACh") task (see Supporting Information Methods) (Manly & Thames Valley Test Company, 1999). Both measures were log-transformed, as this better approximated a normal distribution (three observations on the selective attention measure were dropped to facilitate this). For both attentional measures, a higher score indicates reduced attention, since the measures were based on response times. Histograms of these variables before and after transformation are shown in Supporting Information

| Neurocognitive traits
See Supporting Information Figure 6 for histograms of cognitive variables. All of these variables were analyzed continuously.

Memory
Working memory was assessed using the digit span subtest of the Wechsler Intelligence Scale for Children (WISC), administered at 8 years (Wechsler, Golombok, & Rust, 1992). Phonological memory was assessed using a nonword repetition task that was an adaptation of a previously published task (Gathercole, Willis, Baddeley, & Emslie, 1994). For both of these measures, a higher score indicates better performance. See Supporting Information Methods for details.

Social cognition
Social cognition (nonverbal recognition) was measured using the number of errors made on the Diagnostic Analysis of Non-verbal Accuracy (DANVA) face recognition task (Nowicki & Duke, 1994).
This variable was reflected before analysis, so that a higher score indicates a better performance. See Supporting Information Methods for task details.

Intelligence quotient
IQ was measured at 8 years by the WISC (Wechsler et al., 1992). A higher score indicates a higher IQ.

| Statistical analysis
All analyses were carried out using R. Analyses wherein the number of subjects in a given cell was <5 were censored to protect confidentiality, concordant with ALSPAC policy (https://www.bristol.ac.uk/medialibrary/sites/alspac/documents/alspac-publications-checklist.pdf).

| Transformations of phenotypes and covariates
Psychiatric diagnoses of PEs, anxiety, depression, ASD, and ADHD were binary variables. Due to severe non-normality, many of the measures were dichotomized as detailed above. To summarize, the transformed mean ASD factor score, the measures of cognition, and attention were analyzed as continuous variables. Continuous outcomes were standardized before analyses, to give mean = 0 and SD = 1. All analyses were adjusted for sex and population ancestry, using the first two principal components generated from the SNP genotype data used to call CNVs.
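The three kinds of transformation mentioned in the Methods (standardization, reflection with log-transformation, and dichotomization into a 10% risk group) can be illustrated with a short Python sketch. This is an illustration under stated assumptions, not the code used for the analyses (which were run in R): the function names are hypothetical, the exact reflection formula is assumed to be log(max + 1 − x), and ties at the cutoff are assumed to fall in the risk group.

```python
import math
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to mean 0 and SD 1 (population SD is an assumption)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def reflect_and_log(values):
    """Reflect about (max + 1), then take the natural log.

    Assumed form of "reflection and log-transformation": after it,
    a low raw score becomes a high transformed score.
    """
    top = max(values) + 1
    return [math.log(top - v) for v in values]


def dichotomize_top_fraction(values, fraction=0.10):
    """Code roughly the top `fraction` of scores as 1 (risk group), else 0.

    All scores tied with the cutoff enter the risk group, so on coarse
    scales the group can be larger than the target fraction.
    """
    n_risk = max(1, round(fraction * len(values)))
    cutoff = sorted(values, reverse=True)[n_risk - 1]
    return [1 if v >= cutoff else 0 for v in values], cutoff
```

Returning the cutoff alongside the 0/1 codes makes it easy to compare the data-driven threshold with published cutoffs, as is done for the SCDC and hyperactivity scores in the text.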

| Genome-wide CNV burden
For all analyses, deletions and duplications were analyzed separately.
Effect sizes are presented as SD changes (continuous outcomes) and log odds ratios (binary outcomes). In burden analyses, individuals carrying one of the known pathogenic CNVs (described below) were dropped (n = 85). Analyses retaining these subjects are available in the Supporting Information.
Analyses were undertaken to assess CNV burden in several ways: in the first analysis, the total number of genes affected by rare CNVs (frequency <1%) was computed for each subject. This variable was used as a measure of rare CNV burden, and each outcome was regressed in turn against it, using either logistic or linear models for binary and continuous outcomes, respectively. We also ran sensitivity analyses, in which the main analysis was repeated, but with individuals carrying known pathogenic CNVs retained.
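For a continuous, already standardized outcome, the adjusted burden model described above amounts to an ordinary least-squares fit of the outcome on the gene count, sex, and two principal components. The sketch below solves the normal equations in plain Python to show what the per-gene coefficient represents. It is only an illustration: the function names are hypothetical, no standard errors or p-values are computed, and the logistic models used for binary outcomes are not shown.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def burden_effect(genes_hit, outcome_z, sex, pc1, pc2):
    """Change in the standardized outcome per additional gene affected,
    adjusted for sex and the first two ancestry principal components."""
    X = [
        [1.0, g, s, p1, p2]
        for g, s, p1, p2 in zip(genes_hit, sex, pc1, pc2)
    ]
    return ols(X, outcome_z)[1]  # index 1 = coefficient on the gene count
```

Because the outcome is on the SD scale, the returned coefficient is directly comparable to the SMD-per-gene estimates reported in the Results.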
In addition, we performed two other types of burden analyses, relating to the length of CNVs carried. First, we computed the total length (in kilobases, kb) of deletions and duplications of a frequency <1% per subject. Next, based on the methods of Szatkiewicz et al.
(2014) and Männik et al. (2015), the length of the largest rare deletion and duplication carried was noted for each individual. These variables were then categorized into size: the reference group consisted of individuals carrying no rare (<1%) CNVs of >100 kb (or common CNVs only). Three other categories were defined, in which the largest rare CNV carried was >100 kb to 500 kb, >500 kb to 1 Mb, or >1 Mb.
These categories span the breadth of CNV sizes studied in two recent papers of CNV burden in large cohorts (Männik et al., 2015;Szatkiewicz et al., 2014). For each outcome, presence of a CNV in each size category was compared to the reference category.
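The three burden measures (number of genes affected, total length, and size category of the largest CNV) could be derived per subject roughly as follows. This Python sketch assumes a hypothetical record layout (one dict per CNV with `length_bp`, `genes`, and `frequency` keys), assumes that genes hit by more than one CNV are counted once, and expects the caller to pass deletions and duplications separately, as in the analyses described above.

```python
def largest_cnv_category(lengths_bp):
    """Size category of the largest rare CNV carried by one subject."""
    largest = max(lengths_bp, default=0)
    if largest <= 100_000:
        return "reference"      # no rare CNV of >100 kb
    if largest <= 500_000:
        return ">100kb-500kb"
    if largest <= 1_000_000:
        return ">500kb-1Mb"
    return ">1Mb"


def burden_summary(cnvs, max_frequency=0.01):
    """Summarize rare-CNV burden for one subject and one CNV type.

    `cnvs` is a list of dicts with keys 'length_bp', 'genes' (a set of
    gene names) and 'frequency' (hypothetical record layout).
    """
    rare = [c for c in cnvs if c["frequency"] < max_frequency]
    genes = set().union(*(c["genes"] for c in rare))
    lengths = [c["length_bp"] for c in rare]
    return {
        "n_genes": len(genes),
        "total_kb": sum(lengths) / 1000,
        "largest_category": largest_cnv_category(lengths),
    }
```

A subject with only common CNVs, or with no CNVs at all, falls into the reference category with zero genes and zero total length.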

| Candidate CNVs analysis
A number of CNVs have been identified as being associated with SCZ. The association between presence of at least one of these 12 rare CNVs (7 deletions, 5 duplications), and each outcome, was assessed separately for deletions, duplications, and any CNV (see Table 2). For the analysis assessing the effect of pathogenic deletions only, those individuals carrying pathogenic duplications were dropped, and vice versa. For coordinates of candidate CNVs, critical regions (defined as stated in Kendall et al. [2016]), and criteria for defining CNVs, see Table 3. To establish the presence or absence of these CNVs, the package "bedtools" was used to compute overlaps between observed CNVs and critical regions (Quinlan & Hall, 2010). For neurexin-1 (NRXN1) deletions, exon coordinates were downloaded from "TableBrowser" and according to transcript NM_004801 (Kent et al., 2002). Results are summarized in Table 4.
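The overlap step can be pictured as a simple interval calculation. The Python sketch below is a stand-in for the bedtools step, not a reproduction of it: the function names are hypothetical, intervals are assumed to be half-open (chromosome, start, end) tuples as in BED files, and the 50% coverage threshold is only a placeholder, since the actual criteria for each locus are those given in Table 3.

```python
def overlap_fraction(cnv, region):
    """Fraction of a critical region covered by one CNV call.

    Both arguments are (chrom, start, end) tuples using half-open
    coordinates, as in BED files.
    """
    if cnv[0] != region[0]:
        return 0.0
    overlap = min(cnv[2], region[2]) - max(cnv[1], region[1])
    return max(0, overlap) / (region[2] - region[1])


def carries_candidate(cnvs, region, min_fraction=0.5):
    """True if any of a subject's CNVs covers at least `min_fraction`
    of the critical region (placeholder threshold, see lead-in)."""
    return any(overlap_fraction(c, region) >= min_fraction for c in cnvs)
```

Calls on a different chromosome, or calls that merely abut the region, contribute an overlap of zero.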

| Multiple testing
Correlations between phenotypes were quantified using the "mixed.cor" function of the "psych" R package (Revelle, 2015). Pearson correlations are computed for pairs of continuous variables, tetrachoric correlations for dichotomous variables, and biserial correlations for mixed pairs. After computing the correlation matrix, the effective number of tests was calculated using Nyholt's "matSpDLite" method of spectral decomposition. This method performs spectral decomposition of a correlation matrix, and then examines the ratio of observed eigenvalue variance to its possible maximum (Nyholt, 2004).
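For reference, the effective number of tests in Nyholt's (2004) approach can be written as below, where M is the number of correlated outcomes and the λ are the eigenvalues of their correlation matrix. The second expression is a Bonferroni-style threshold; its use here is inferred from the values reported later in the paper (18 tests, p = 2.78e-03 for alpha = .05) rather than stated explicitly in the Methods.

```latex
M_{\mathrm{eff}} = 1 + (M - 1)\left(1 - \frac{\operatorname{Var}(\lambda_{\mathrm{obs}})}{M}\right),
\qquad
\alpha_{\mathrm{adj}} = \frac{\alpha}{M_{\mathrm{eff}}}
```

When all outcomes are uncorrelated, every eigenvalue equals 1, the variance term vanishes, and M_eff = M; when all outcomes are perfectly correlated, the eigenvalue variance reaches its maximum of M and M_eff = 1.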

| RESULTS
| Descriptive statistics

| Correlations between phenotypes
Table 1 summarizes the descriptive data for the phenotypes studied, including sample numbers, and the numbers in the risk group (for binary/dichotomous outcomes) and means/medians (plus SDs and IQRs) for continuous outcomes. Several of the phenotypes were correlated. Figure 2 shows a heat map of Pearson's correlation coefficients between all outcome variables. As expected, the ASD and ADHD measures showed correlations both between and within groups. Anxiety and depression were also correlated with one another (r = .61), and moderately with PEs (r ≈ .35). Better performances on the cognitive measures were negatively related to the majority of the psychiatric risk traits studied, but most strongly to ASD and ADHD.
Nyholt's method of spectral decomposition calculated that there were 18 independent tests among all of the outcome variables studied (Sterne & Davey Smith, 2001).  Table 3, and so were excluded from the burden analyses, and descriptive data. Three pathogenic CNVs were not observed in this particular study population: the 7q11.23 duplication, the 15q11.2-q13.1 duplication, and the 22q11.2 deletion. These CNVs were all observed at frequencies of <0.01% in UK Biobank (n = 151,169), so it is unsurprising that we did not observe them in our study population of n = 6,807.

| CNVs
About 3,526/6,722 subjects in the burden analysis (52%) carried no rare CNVs (deletions or duplications with frequency <1%) greater than 100 kb in length. These individuals formed the reference category in the "largest carried" CNV burden analysis. The median (IQR) length of rare deletions across the genome (36 kb [0-131]) was greater than the median length of rare duplications (11 kb [0-136]). Large deletions were less common than large duplications when considering CNVs >500 kb, but >100 kb to 500 kb deletions were slightly more common than duplications of the same size.
FIGURE 3 Number of genes affected by rare CNVs in relation to neuropsychiatric traits. This analysis is a regression (logistic for binary traits, linear for continuous traits) of the number of genes affected by rare CNVs (exposure) in relation to neuropsychiatric traits (outcome). This is done separately for deletions (a) and duplications (b). For expansion of trait abbreviations, see Table 1. Other abbreviations: logOR|SMD = effect size (logOR for binary traits [denoted with a *], SD change for continuous traits); LCI/UCI = lower and upper bounds of 95% confidence interval; (-) = lower score is indicative of reduced performance on these metrics (for all other traits, higher scores indicate a reduced performance, or the trait has been dichotomized so that the risk group is coded as "1," control group as "0"). All analyses were adjusted for sex and population ancestry, using the first two principal components generated from the SNP genotype data used to call CNVs.

Table footnotes: IQR = interquartile range; kb = kilobase; Mb = megabase; rare = frequency <1%; SCZ = schizophrenia.
a Total N for whole paper = 6,807, N = 6,722 after excluding 85 carriers of pathogenic CNVs (61 deletion carriers, 24 duplication carriers).
b See Table 3 for coordinates of these CNVs.

Table footnotes: (-) = lower score is indicative of reduced performance on these metrics (for all other traits, higher scores indicate a reduced performance, or the trait has been dichotomized so that the risk group is coded as "1," control group as "0"). Gray = binary. White = continuous. See Table 1 for expansion of abbreviations and phenotype descriptions. See Table 3 for coordinates of candidate CNVs.
a Binary variable (highlighted in gray), log odds ratio (logOR) given (as opposed to SMD, given for continuous variables). LCI/UCI = lower and upper bounds of 95% confidence interval. Results for split deletion/duplication results are censored in cases where there are fewer than five individuals in one of the risk groups (N1), to protect ALSPAC participants' confidentiality.
b N1 = numbers of individuals carrying at least one CNV for continuous variables, number of individuals carrying at least one CNV and in risk group for binary variables.

| Total length of rare CNVs
The associations for this analysis are generally similar to those of the "number of genes" analysis, above, which may be explained by the fact that the total length of rare CNV regions and the number of genes affected by rare CNVs are correlated at r [Spearman] = .76 (r [Pearson] = .64).
There was strongest evidence for an increase in total length of deletions being associated with both an increase in the mean ASD factor, and a decrease in IQ at age 8. There was evidence for similar patterns of association for the ASD factor for duplications.
See Supporting Information Figure 8 for graphical representations of the association of increasing length of total rare deletions and duplications across the genome with the phenotypes studied. These analyses excluded individuals carrying known pathogenic CNVs; Supporting Information Figure 9 shows the results of the analysis in which carriers of these known CNVs were retained.

| Largest CNV carried
Supporting Information Figures 10 and 11  Plots of analyses wherein individuals with any of the pathogenic CNVs were retained (as a sensitivity analysis) are available in Supporting Information Figures 12 and 13 (for the deletion and duplication analysis, respectively).

| Candidate CNV analysis
Since CNVs associated with SCZ may be related to other neurodevelopmental traits (Szatkiewicz et al., 2014), presence of CNVs robustly associated with SCZ (Rees et al., 2014, 2016) was tested against the neuropsychiatric traits discussed.
These results are summarized in Table 4. There were insufficient case numbers to assess the association between the pathogenic CNVs and the four psychiatric variables (anxiety, depression, ASD, and ADHD), as well as the repetitive behaviors and social communication trait.
There was no strong evidence that the SCZ candidate CNVs were associated with PEs in ALSPAC (OR 0.858 [95%CI 0.403, 1.827], p = .691), although confidence intervals were wide. There were insufficient cases to assess the effect of deletions and duplications separately.
The low numbers of cases for the dichotomized measure of attention meant that power for these associations was low, but there was a trend toward reduced attention being associated with presence of the pathogenic CNVs.
Overall, these results provide some suggestion that the candidate deletions for SCZ are associated with reduced cognition and ASD traits, even after considering the number of comparisons made.

| DISCUSSION
This study sought to analyze the relationship between the burden of rare copy number variation across the genome in relation to a wide variety of neuropsychiatric phenotypes, measured in a young (<18 years), population-based cohort. The secondary aim was to examine the association of known SCZ CNVs with these traits.
We have previously found a relationship between the presence of rare CNVs of increasing sizes and educational attainment in ALSPAC (Männik et al., 2015), and a very recent study in a young Swedish population also confirmed associations between presence of large CNVs and neurodevelopmental problems. In line with this work, an association in the current analysis was observed between IQ at 8 years and the number of genes affected by rare CNVs, as well as the presence of rare deletions >500 kb. We also confirm that even in a young, unselected population, it is still possible to detect associations between genome-wide CNVs and neuropsychiatric traits, with the association of a continuous measure of ASD also observed. Concordant with observations reported in an Icelandic sample (Stefansson et al., 2014) and in UK Biobank, in which carriers of known pathogenic CNVs had impaired performance on cognitive tests, we also observed associations between known SCZ CNVs and IQ, with slightly weaker evidence of associations for the continuous measure of ASD. When interpreting these results, it should be noted that there was a moderate (r ≈ −.3) negative correlation between the ASD/ADHD trait measures and IQ. It is therefore possible that the associations with the ASD/ADHD trait measures could be driven by larger effect sizes in lower functioning subgroups of these individuals.
With the exception of a measure of coherence (which is also correlated with IQ at −0.36), there were few clear associations observed between presence of rare CNVs and the binary traits studied. This is likely to be in part because there was less statistical power in these analyses. In addition, general cognitive ability is associated with psychiatric disorders very broadly, which may explain why the deleterious effect of CNV burden across the genome was observed most strongly for IQ (Koenen et al., 2009).
The strengths and novelty of this work include the use of an unselected population, and the use of measures that capture both clinical and sub-threshold levels of psychopathology in a general population.
However, for traits that were primarily ascertained by maternal report (e.g., ASD traits), measurement error is more likely, and the nature of the measurement error would determine the type of any subsequent bias (i.e., if measurement error is related to CNV burden, then this could affect associations in either direction, but if it is unrelated to CNV burden, then the error would attenuate associations toward the null). Nevertheless, using questionnaire- or interview-based population data is helpful, since it may be more readily attainable than clinical records of psychiatric diagnoses, which are by their nature sensitive and require specific data linkage approval to use and access. For ASD, we had access to both clinical and research measures of ASD. Given that autism represents a spectrum, it is also possible that the genetic architecture of clinical and subclinical diagnoses may vary (as has been observed for psychosis) (Jones et al., 2016), in which case studying these measures may be of value not only as a proxy but also as a means of understanding the genetic determinants of the "Broader Autism Phenotype" (Losh, Childress, Lam, & Piven, 2008). There is increasing evidence that ASD is heterogeneous, and that individual trait measures may have distinct etiologies (Happé & Ronald, 2008). However, for ADHD, it has been observed that the genetic correlation of ADHD as a disorder and ADHD traits approaches 1 (Demontis et al., 2017), and there is also genetic correlation between ASD and its associated traits (in ALSPAC [Robinson et al., 2016], and the Psychiatric Genetics Consortium [Bralten et al., 2017]). This suggests that studying ADHD/ASD traits in general populations is likely to be a powerful method that will provide useful insights into these disorders.
The lack of association between PEs and CNVs is notable because of the recent observation in ALSPAC that common genetic variation predisposing to SCZ (a polygenic risk score) also showed no strong correlation with PEs in this study (Jones et al., 2016). It has been suggested that PEs in late childhood and adolescence may be more strongly attributable to environmental factors, such as childhood trauma and substance abuse, than to genetic predisposition to SCZ (Jones et al., 2016). In addition to a lack of power, this could be another explanation for the lack of association between CNVs and PEs in this work. Other papers have also postulated that CNVs may modify the psychosis phenotype: those with CNVs may be more likely to develop SCZ, whereas those without may have a trajectory toward affective disorders with psychotic symptoms. If the PEs measured in ALSPAC are more representative of those with affective disorders than prodromal features of SCZ, this could also explain the apparent lack of a clear increased burden of rare CNVs among those with PEs (Grozeva et al., 2010).
We considered the possibility of type I and type II error in our results. The associations detected may be true positives, suggesting that the presence of rare CNVs is truly associated with cognition and ASD/ADHD traits in a general population. This fits with the results from ALSPAC published previously as part of a replication effort to study the relationship between CNVs and educational attainment in an Estonian cohort (Männik et al., 2015) as well as with the results of a large study of neurodevelopmental CNVs in relation to cognition in UK Biobank. However, it is also possible that at least some of the associations are due to chance. Type I error was considered by computing the number of independent tests using a previously validated method (Nyholt, 2004). Our strongest results (the association of IQ and the composite ASD factor measure) are likely to be robust to the number of comparisons made (18 tests, equivalent p-value for alpha .05: p = 2.78e-03), but there is a greater possibility of false positives among the results for which there was weaker evidence. However, alongside the risk of false positives arising from multiple testing, the converse problem is also possible: false negatives (type II error) are likely to have been a problem for many of the binary traits, because of the relatively small numbers in the risk groups.
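The threshold quoted above can be reproduced with a short calculation. The sketch below is illustrative only and is not the code used in this study. It assumes that the 18 tests are the effective number of independent tests and that the threshold is a Bonferroni-style alpha divided by that number, which matches the reported value. The function names and the example eigenvalues are hypothetical. The effective-number formula follows Nyholt (2004), using the variance of the eigenvalues of the correlation matrix among the tested variables.

```python
from statistics import variance


def nyholt_meff(eigenvalues):
    """Effective number of independent tests (Nyholt, 2004).

    `eigenvalues` are the eigenvalues of the correlation matrix among the
    M tested variables (assumed to be computed elsewhere).
    Meff = 1 + (M - 1) * (1 - Var(eigenvalues) / M), using the sample variance.
    """
    m = len(eigenvalues)
    return 1 + (m - 1) * (1 - variance(eigenvalues) / m)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for a given number of tests."""
    return alpha / n_tests


# Hypothetical boundary cases for four variables:
# fully independent variables have eigenvalues all equal to 1,
# perfectly correlated variables have a single eigenvalue equal to M.
print(nyholt_meff([1.0, 1.0, 1.0, 1.0]))   # 4.0
print(nyholt_meff([4.0, 0.0, 0.0, 0.0]))   # 1.0

# Threshold for alpha = .05 and 18 independent tests, as quoted in the text.
print(f"{bonferroni_threshold(0.05, 18):.2e}")  # 2.78e-03
```

The two boundary cases show why the method is useful: correlated outcomes count for fewer than M tests, so the correction is less conservative than dividing alpha by the raw number of outcomes.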
Another important limitation of this study is the effect of attrition: selective attrition patterned by outcome may lead to dilution of effect sizes (Wolke et al., 2009). In ALSPAC, a polygenic risk score for SCZ has been found to be strongly associated with participant dropout, indicating that individuals at risk for SCZ may be underrepresented in ALSPAC. This may have an impact on statistical power, and may also reduce power for genetically correlated disorders, such as depression and ADHD (Martin et al., 2016). This phenomenon might be explained by common factors lying on the causal pathway from genetic risk for SCZ, which could be responsible for attrition. Given that IQ was found to be related to rare SCZ CNVs, and educational attainment is known to be related to dropout in ALSPAC, this could be one mechanism by which study children are lost to follow-up.
In addition, study participation can act as a collider, as it may be a common effect of the exposures and outcomes being studied. In this case, spurious correlations ("collider bias") may be induced between the genetic exposure and outcomes under study (Martin et al., 2016; Munafò, Tilling, Taylor, Evans, & Davey Smith, 2016). This bias is likely to be worst when performing complete-case analyses; since the analyses were not complete-case analyses (individuals were included if they had at least one CNV passing QC, all confounding variables, plus at least one, but not necessarily all, of the outcomes), this may lessen the effect of collider bias on the results.
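The mechanism of collider bias can be illustrated with a toy simulation. The sketch below is purely hypothetical and uses no ALSPAC data: a simulated "burden" and a simulated "trait" are generated independently, and participation is assumed to be less likely when either is high. All variable names, the selection rule, and the sample size are assumptions chosen for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_collider(n=20000, seed=1):
    """Return (correlation in everyone, correlation among participants)."""
    rng = random.Random(seed)
    # Exposure and outcome are generated independently (true correlation is 0).
    burden = [rng.gauss(0, 1) for _ in range(n)]
    trait = [rng.gauss(0, 1) for _ in range(n)]
    # Assumed selection rule: participation depends on both variables plus
    # noise, so participation is a common effect (a collider) of the two.
    participants = [
        (b, t) for b, t in zip(burden, trait) if b + t + rng.gauss(0, 1) < 0
    ]
    kept_burden, kept_trait = zip(*participants)
    return pearson(burden, trait), pearson(kept_burden, kept_trait)


full_r, participant_r = simulate_collider()
print(f"correlation in full simulated sample:  {full_r:.3f}")
print(f"correlation among participants only:   {participant_r:.3f}")
```

Under these assumptions, the correlation in the full simulated sample is close to zero, whereas restricting to participants induces a negative correlation between two variables that are in fact unrelated.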
In conclusion, we have found some evidence of an association between the burden of rare CNVs, IQ, and psychiatric traits in a population-based cohort of UK children, and associations between candidate CNVs for SCZ and cognition. Having shown that CNV burden and known SCZ CNVs are associated with these traits, a logical extension of this work could seek to map these associations to specific regions of the genome using a hypothesis-free approach. The results demonstrate the utility of studying population-based samples and nonclinical outcomes to better understand the genetic architecture of cognitive and psychiatric phenotypes.