Effects of copy number variations on brain structure and risk for psychiatric illness: Large‐scale studies from the ENIGMA working groups on CNVs

Abstract The Enhancing NeuroImaging Genetics through Meta‐Analysis copy number variant (ENIGMA‐CNV) and 22q11.2 Deletion Syndrome Working Groups (22q‐ENIGMA WGs) were created to gain insight into the involvement of genetic factors in human brain development and related cognitive, psychiatric and behavioral manifestations. To that end, the ENIGMA‐CNV WG has collated CNV and magnetic resonance imaging (MRI) data from ~49,000 individuals across 38 global research sites, yielding one of the largest studies to date on the effects of CNVs on brain structures in the general population. The 22q‐ENIGMA WG includes 12 international research centers that assessed over 533 individuals with a confirmed 22q11.2 deletion syndrome, 40 with 22q11.2 duplications, and 333 typically developing controls, creating the largest‐ever 22q11.2 CNV neuroimaging data set. In this review, we outline the ENIGMA infrastructure and procedures for multi‐site analysis of CNVs and MRI data. So far, ENIGMA has identified effects of the 22q11.2, 16p11.2 distal, 15q11.2, and 1q21.1 distal CNVs on subcortical and cortical brain structures. Each CNV is associated with differences in cognitive, neurodevelopmental and neuropsychiatric traits, with characteristic patterns of brain structural abnormalities. Evidence of gene‐dosage effects on distinct brain regions also emerged, providing further insight into genotype–phenotype relationships. Taken together, these results offer a more comprehensive picture of molecular mechanisms involved in typical and atypical brain development. This “genotype‐first” approach also contributes to our understanding of the etiopathogenesis of brain disorders. Finally, we outline future directions to better understand effects of CNVs on brain structure and behavior.


KEYWORDS
brain structural imaging, copy number variant, diffusion tensor imaging, evolution, genetics-first approach, neurodevelopmental disorders, psychiatric disorders

| INTRODUCTION
Classical twin and family studies show that most complex human traits are moderately to highly heritable, including brain structure and function (Hilker et al., 2018; Jansen, Mous, White, Posthuma, & Polderman, 2015; Teeuw et al., 2019). Since 2009, the Enhancing NeuroImaging Genetics through Meta-Analysis (ENIGMA) Consortium (Thompson et al., 2014; Thompson et al., 2020) and other large-scale consortia such as Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) (Psaty & Sitlani, 2013) have made significant progress in identifying common genetic variants associated with variability in brain structure (Adams et al., 2016; Grasby et al., 2020; Hibar et al., 2015; Hibar et al., 2017; Knol et al., 2020; Satizabal et al., 2019; Stein et al., 2012) and function (Smit et al., 2018) through genome-wide association studies (GWAS). The relatively common variants in these studies (genotyped in large numbers on single nucleotide polymorphism [SNP] arrays) are typically associated with minor variations in magnetic resonance imaging (MRI)-derived brain measures, thus highlighting the polygenic nature of structural neuroanatomy. So far, our understanding of the underlying biology, including the impact of individual variants, is limited. Therefore, identifying genetic variants with larger effects on MRI-derived measures of brain structure or function may provide a path to help deduce molecular mechanisms contributing to brain development and diseases.
Copy number variants (CNVs) (Figure 1) result from the deletion or duplication of a segment of the genome (Feuk, Marshall, Wintle, & Scherer, 2006) (a glossary of genetic terms is found in Table 1). CNVs represent a promising approach to study neurogenetic mechanisms shaping human behavior, cognition, and development. There are several rationales for this: certain rare, recurrent CNVs are associated with high risk (odds ratio up to 67.7) for a wide range of medical and behavioral consequences including brain disorders (Hastings et al., 2009) and some display large macroscopic effects on brain structure. The same CNV may confer elevated risk for several different (brain) disorders while reciprocal CNVs (Figure 1) at each end of the gene dosage response may be associated with the same disorder. Such clues gleaned from CNV research suggest that brain disorders are highly interlinked. Consequently, the study of rare CNV carriers may help us to understand the mechanisms behind not only rare isolated syndromes, but also of interrelated disorders, including the interaction between rare and common variants in shaping brain and disease as well as the intersection between somatic and brain disorders. This may allow us to identify both resilience and risk factors in common variants with potential to improve individual disease management.
Despite their clinical relevance and evolutionary importance (Lauer & Gresham, 2019), the effects of rare CNVs on human brain structure are poorly understood, partly because the rarity of these CNVs poses challenges for data collection. Several consortia, including the 16p11.2 European consortium (Maillard et al., 2015) and Simons VIP/Searchlight (Qureshi et al., 2014), as well as individual projects (Meda, Pryweller, & Thornton-Wells, 2012; Reiss et al., 2004; Stefansson et al., 2014; Ulfarsson et al., 2017), have studied the effects of specific CNVs on the brain. The ENIGMA-CNV and 22q-ENIGMA WGs aim to address some of the core limitations of prior brain imaging CNV studies, especially those relating to low power and replicability, and to foster collaborative discovery.
In this review, we focus on the work done by the ENIGMA WGs on CNVs. We first outline the significance of CNVs for elucidating genetic mechanisms underlying brain development and disease. We then describe the data collection, study design, and analytical methods used by the two WGs. Next, we review key findings of the 22q-ENIGMA and ENIGMA-CNV WGs on the 22q11.2, 16p11.2 distal, 15q11.2, and 1q21.1 CNVs and include results from other relevant work that has helped us to understand effects of CNVs on brain structure. We then discuss emerging principles that may govern how rare CNVs affect the brain. Finally, we summarize future plans to understand the neurobiology of CNVs for a broader range of brain phenotypes.

| The role of CNVs in neurodevelopmental disorders
CNVs may account for up to 13% of the genome (Stankiewicz & Lupski, 2010), with the vast proportion being common across individuals and without any known negative effects (Iafrate et al., 2004).
However, some CNVs can disrupt normal function in humans, causing psychiatric and neurodevelopmental disorders (NDs), somatic and neurological diseases, as well as cancer (Hastings et al., 2009). For instance, individuals with rare, recurrent CNVs are at much higher risk of NDs, including autism spectrum disorders (ASD), epilepsy, schizophrenia (SCZ), and intellectual disability (ID) (Kirov, Rees, & Walters, 2015), as well as Alzheimer's disease and other neurodegenerative diseases (Cervera-Carles et al., 2016; Cuccaro, De Marco, Cittadella, & Cavallaro, 2017). De novo and inherited CNVs combined have been estimated to explain 15% of neurodevelopmental disorder cases (Wilfert, Sulovari, Turner, Coe, & Eichler, 2017). Likewise, at least 9% of all ASD (Munnich et al., 2019) and 2.5% of SCZ cases carry a known pathogenic CNV. CNV carriers also have high rates of additional comorbid medical conditions (Crawford et al., 2018) and some display altered anthropometric traits (Mace et al., 2017; Owen et al., 2018). This high disease rate is often mirrored by reduced fecundity (Stefansson et al., 2014). Altogether, a high-impact CNV may represent a lifelong burden for the affected individuals and their caregivers, leading to substantial personal and societal costs.
The high odds ratio (>10) (Marshall et al., 2017) for neurodevelopmental disorders associated with specific CNVs is in con-  Wray et al., 2018) and attention deficit hyperactivity disorder (ADHD; OR = 1.12; Demontis et al., 2019). This has spurred considerable interest in studying CNVs as a genetics-first approach to understand mechanisms of abnormal brain development as well as risk for disorders such as SCZ (Kirov et al., 2015), ASD (Stessman, Bernier, & Eichler, 2014) in addition to other medical comorbidities (Pierpont et al., 2018).
Such interest has been further encouraged by the diversity of CNVs: there are at least 93 known clinically relevant recurrent rare CNVs (Kendall et al., 2016), each with its own clinical profile and consequences (Girirajan et al., 2012; Rosenfeld & Patel, 2017). Some recurrent CNVs have moderate to small effects, for example, the more common 15q11.2 deletion, while others have large effects with near-complete penetrance, such as the very rare Williams syndrome/7q11.23 deletion. Such high penetrance is positively correlated with the proportion of de novo occurrence in the population (Rosenfeld, Coe, Eichler, Cuckle, & Shaffer, 2013). In contrast, CNVs with small effects tend to be inherited more often, and may be identified in seemingly asymptomatic parents. Thus, different CNVs allow insight into different clinical risk profiles and their potential mechanisms.
Likewise, a specific CNV lacks diagnostic specificity and shows highly diverse pleiotropic outcomes. For instance, the same CNV may be associated with congenital defects, SCZ, ASD, ID, epilepsy, or early-onset Parkinson's disease (Bijlsma et al., 2009; Shen et al., 2011; Stefansson et al., 2014; Tabet et al., 2012), as exemplified by the 22q11.2 deletion syndrome (22q11DS), which has been associated with all the above conditions (Boot et al., 2018; Butcher et al., 2013; Gudmundsson et al., 2019; Marshall et al., 2017). Furthermore, CNVs may lead to multiple disorders in the same individual (known as multimorbidity). For example, individuals with 22q11DS who have a psychiatric disorder are at increased risk for other psychiatric disorders, as well as motor coordination problems (Cunningham et al., 2018) and sleep problems (Moulding et al., 2020). Thus, few traits show evidence of genotypic specificity (Cunningham, Hall, Einfeld, Owen, & van den Bree, 2020; Girirajan et al., 2012; Rosenfeld & Patel, 2017).
FIGURE 1 Copy number variants. CNV carriers may have a deletion (one copy of region D, red) or duplication (three copies of region D, blue) compared with the normal copy number (two copies of region D, black). Reciprocal CNVs are a deletion and duplication occurring at the same locus.

TABLE 1 Glossary table
Term Definition

Aneuploidy
The presence of an abnormal number of chromosomes in a cell. Examples are Down syndrome and monosomy X (Turner syndrome).

Anthropometric trait
A trait that describes body dimensions, such as head circumference, height, weight, girth, or body fat composition.

Array comparative genomic hybridization (aCGH)
A molecular cytogenetic method to detect copy number variants (CNVs) by comparing large fragments of DNA from a test individual to those from a reference sample.

Breakpoints (BP), chromosomal
A specific site of breakage, usually associated with a recurrent chromosomal abnormality, as in the 16p11.2 distal CNV BP2-BP3, where BP2-BP3 refers to the interval from BP 2 to BP 3. For some CNVs, several low copy repeats (LCRs) in the region allow for multiple such BPs.

Copy number variant (CNV)
A type of structural genomic variation (Figure 1; a class that also includes insertions, inversions, and translocations) in which segments of the genome are either deleted or duplicated. "Pathogenic" recurrent CNVs are of vastly different sizes and can span many genes (up to 90 for 22q11.2; McDonald-McGinn et al., 2015) or just one (as in the case of NRXN1 CNVs; Lowther et al., 2017). Differences in BPs within the same locus add to the complexity of CNVs (e.g., in the 16p11.2 or 1q21.1 regions). In addition to recurrent CNVs, numerous ultra-rare nonrecurrent, "one-hit," or single CNVs may also disrupt normal function.

CNV naming
A CNV is named based on its locus, that is, its specific position on the chromosome. The shorter arm of a chromosome is termed the p-arm (petite = French for small), while the longer arm is the q-arm. For example, in 16p11.2: 16 = chromosome 16; p = p-arm; 11 = region 1, band 1; 2 = sub-band 2. Distal and proximal are used when two CNVs are present at the same locus (e.g., the 16p11.2 distal and proximal CNVs). The distal CNV is situated farther from the centre of the chromosome (called the centromere) than the proximal CNV, which is closer to the centromere.

de novo
A genomic variation that occurs spontaneously in the offspring and thus is not inherited from the parents.

Fluorescence in situ hybridization (FISH)
A targeted molecular cytogenetic method used to detect and localize a chromosomal deletion or duplication using fluorescent probes corresponding to the DNA sequence targeted.

Gene dosage effect
The relationship between the number of copies of a gene and, for example, gene expression or brain volume.

Gene dose response
The effect of altering the amount of genetic material in a region/the magnitude of the response of an organism to changes in gene presence.

Genome assembly/build
A reference genome assembly is a string of digital ATCG nucleotides representing the complete set of genes from an organism. It is assembled through a consensus of the genomes of different donors. The most recent human genome assembly, termed GRCh38 (also called "build 38"), was released in 2013 and is derived from 13 anonymous donors. Earlier human reference genome versions include GRCh37 or hg19, NCBI36 or hg18 (2006), NCBI35 or hg17 (2004), and NCBI34 or hg16 (2003).

Genetics-first approach
A strategy used in epidemiological studies to associate specific genotypes (such as a specific CNV) with apparent clinical phenotypes of a complex disease or trait. Also called "genotype-first."

Genotyping
The process of determining differences in the genetic make-up (genotype) of an individual by examining the individual's DNA sequence using biological assays. The term is often used to refer to the identification of SNPs through (SNP) genotyping arrays.

Genetic heterogeneity
The same or similar phenotypes caused by different genetic mechanisms.

Idiopathic
Any disease or condition for which the cause is unknown.

Insertion
A structural variant that involves a mutation through the addition of genetic material to a chromosome.

Inversion
A structural variant in which a segment of a chromosome is reversed end to end.

Low copy repeats (LCRs)
Highly homologous sequence elements within the eukaryotic genome arising from segmental duplication and predisposing the genome to nonallelic homologous recombination (NAHR). LCRs mediate many of the chromosomal rearrangements that underlie genomic disorders by predisposition to recombination errors.

Multiplex ligation-dependent probe amplification (MLPA)
A molecular cytogenetic method used to identify copy number variants. It is a variation of the multiplex polymerase chain reaction that permits amplification of multiple targets with only a single primer pair.

Nonallelic homologous recombination (NAHR)
A form of homologous recombination that occurs in two pieces of DNA that have similar sequences, often as a result of the presence of low copy repeats (LCRs). NAHR can occur within the same LCR or in an alternative LCR, and can result in a variety of chromosomal rearrangements, including deletion, duplication, translocation, and inversion. The presence of LCRs and resultant NAHR is believed to play a key role in molecular evolution in primates, as a mechanism involved in rapidly changing gene dosage (which may be advantageous) and even in the creation of new genes.

Noncarrier
In the context of CNVs, this is usually defined as an individual who does not carry the particular CNV being studied.

Penetrance
The proportion of people with a particular genotype/CNV who have any signs or symptoms of the disease.

Pleiotropy
The phenomenon whereby one allele (or a pair of alleles) influences multiple, independent phenotypes.

Polygenic trait
A phenotype that is influenced by multiple genetic variants at different genomic sites.

Rare CNV
Typically defined as a CNV with <1% frequency in the population.

Reciprocal CNVs
Deletions and duplications that occur at the same locus, usually flanked by LCRs.

Recurrent CNVs
CNVs that occur as spontaneous de novo events at the same sites in the genome repeatedly in unrelated individuals due to the presence of flanking low copy repeats (LCRs) (Hastings, Lupski, Rosenberg, & Ira, 2009). In other words, they occur de novo in the first individual, and hence are not observed in the CNV carrier's parents but are potentially inherited in subsequent generations.

Single nucleotide polymorphism (SNP)
The substitution of a single base (A, T, C, or G) for another base at a specific genetic location that occurs in at least 1% of the population. A SNP may or may not have functional consequences on gene expression.

SNP genotyping array
DNA microarray used to detect SNPs within a population.

Somatic disease
A disease relating to the body, especially as distinct from the mind.

Translocation
A structural variant in which a portion of a chromosome breaks from its original location and reattaches to a different location in the genome.

FIGURE 2 World map of the ENIGMA-CNV and 22q-ENIGMA WG study sites. A full list of participating cohorts and members for ENIGMA-CNV and 22q-ENIGMA may be found at the respective webpages: http://enigma.ini.usc.edu/ongoing/enigma-cnv/enigma-cnv-co-authors/ and http://enigma.ini.usc.edu/ongoing/enigma-22q-working-group/22qwg/. Both working groups consist of international teams of clinicians, neuroscientists, engineers, bioinformaticians, statisticians, computer scientists, and geneticists who pool their resources to conduct large-scale neuroimaging studies of CNVs.

Harmful effects of CNVs may be partially explained by altered expression of genes in the affected region due to the difference in gene copy number, leading to higher or lower transcription levels (Hastings et al., 2009). This phenomenon is sometimes referred to as the "gene dosage effect" or "dose response per copy number." CNVs can also modulate expression of genes outside of the region deleted or duplicated, either by addition or removal of regulatory elements, or by modifications of the 3D structure of the genome (Spielmann, Lupianez, & Mundlos, 2018). Thus, CNVs are a means for studying the effects of gene dosage alterations for many genes at a time and how these shape neurodevelopmental disease and brain structure.
The 22q11.2 region is of particular interest in this regard, as it displays a dose response with regard to SCZ risk: the deletion is associated with increased risk, but the duplication appears to be associated with decreased risk (Marshall et al., 2017; Rees et al., 2014). In contrast, reciprocal CNVs may also carry risk for related disorders. For instance, the 22q11.2 deletion and duplication both confer high risk of ADHD (Gudmundsson et al., 2019). Likewise, the reciprocal 16p11.2 distal and proximal (Loviglio et al., 2016; Niarchou et al., 2019), 1q21.1 distal (Bernier et al., 2016; Mefford et al., 2008) and 22q11.2 loci all confer risk of ASD.
In this context, it is noteworthy that population-based studies overall suggest milder effects of duplication (vs. deletion) CNVs on cognition (Männik et al., 2015), which could suggest differences in the severity of, for example, ID in the reciprocal CNVs. Thus, CNVs allow investigations into how reciprocal CNVs at each end of the gene dosage response can produce both a "gene dose response" for disease risk and similar disease risk.
The ultimate phenotype of a CNV likely depends on both environmental impacts and genetic background (Cleynen et al., 2020;Huguet et al., 2018;Kirov et al., 2014). Such influencing genetic factors likely include protective or disease-enhancing genes located within the CNV region, or elsewhere in the genome. Educational attainment as a proxy for parental intelligence, for example, seems to modulate intellectual impairment related to a 22q11.2 deletion (Klaassen et al., 2016), indicating interplay of the CNV with common variants. The interactions between genetic factors as well as the environment will be key to a better understanding of CNV-mediated disease risk. Investigations of interactions between CNVs and polygenic risk score as a proxy for common variants have already been initiated in disorders such as SCZ (Bergen et al., 2018;Davies et al., 2020;Tansey et al., 2016) and ADHD (Martin, O'Donovan, Thapar, Langley, & Williams, 2015). Thus, studies of CNV carriers may help disentangle the effects of the combination of rare and common variants as well as environment in shaping neurodevelopmental disease risk.

| The role of CNVs in brain evolution
Changes in DNA, including CNVs, occur naturally and are a part of the evolutionary process and adaptation (Hastings et al., 2009) in all living organisms including animals and plants (Lauer & Gresham, 2019).
Gene duplications provide a driving force in evolution (Bailey & Eichler, 2006) by allowing for the adaptation of new gene copies while maintaining the function of the old gene copy (Innan & Kondrashov, 2010). Even so, they also put the next generation at risk for rearrangements due to the presence of low copy repeats (LCRs), long clusters of related gene sequences with high sequence identity, that arise via duplication (Harel & Lupski, 2018). Interestingly, in the human and great ape lineage, there are proportionately more deletions and duplications observed in comparison to other mammals (Hahn, Demuth, & Han, 2007).
Some of these duplications have been hypothesized to be major driving forces in the rapid evolution of the human and great ape lineages (Dennis & Eichler, 2016), including brain enlargement, and have given rise to entirely new human-specific genes with novel characteristics. Examples include SRGAP2 (three copies in humans, one in nonhuman primates) (Dennis et al., 2012), NOTCH2NL (three to four copies in humans, one in primates) (Fiddes et al., 2018; Suzuki et al., 2018) and BOLA2 (Giannuzzi et al., 2019). The NOTCH2NL and SRGAP2 genes are particularly interesting in the context of brain development. The NOTCH2NL genes confer delayed neuronal differentiation and increased progenitor self-renewal (Fiddes et al., 2018; Suzuki et al., 2018). Their occurrence coincides with a time just before or during the early stages of the expansion of the human cortex, and they have thus been hypothesized to have contributed to the rapid evolution of the human neocortex. Likewise, transient overexpression of SRGAP2C in culture and in vivo leads to human-specific features, including neoteny of dendritic spine maturation, promotion of longer spines at a greater density, and sustained radial migration in the developing mouse neocortex. Thus, duplications in human evolution appear to have shaped the formation of the human brain.
To date, discoveries on CNV-related phenotypes have been hindered by the low frequency of each single pathogenic CNV in the general population (from 1 in 400 to 1 in 50,000 for recurrent CNVs; Kendall et al., 2016;Stefansson et al., 2014), making it challenging to collect sufficiently large, well-powered samples. Even so, new technologies have moved the field forward considerably during the last 10 years.

| Genotyping and CNV calling
Among the earliest genetic syndromes to be detected were those caused by aneuploidies, such as trisomy 21 (Down syndrome) and monosomy X (Turner syndrome). Testing for such genetic syndromes was incorporated into clinical practice in the 1950s and involved counting the number of chromosomes per cell, a technique known as karyotyping (Durmaz et al., 2015). Since then, a number of techniques, including targeted fluorescence in situ hybridization (FISH), genome-wide array comparative genomic hybridization (aCGH) and SNP arrays, have allowed detection of smaller aberrations, including CNVs down to 10 kb (Nowakowska, 2017).
In 2004, two landmark studies (Iafrate et al., 2004;Sebat et al., 2004) showed that submicroscopic variations (<500 kb in size) in DNA copy number are widespread across the human genome. In the last 10-15 years, it has become possible to obtain genome-wide CNV "calls" for many individuals through massive population-scale SNP genotyping followed by demanding computational analyses. Likewise, clinical investigations and detection have become standard for some disorders. These new developments in technology have been vital for the increased knowledge of CNV carriers obtained in recent years.
3.2 | Neuroimaging as a tool to study CNV effects on the brain

| ENIGMA-standardized image processing
A prerequisite for large imaging studies is the standardization of approaches. The publicly available ENIGMA imaging processing and analysis protocols make it possible to consistently extract brain measures, and perform quality assessment and statistical modeling across many international research centers (http://enigma.ini.usc.edu/protocols).
These standardized feature extraction pipelines lead to more unbiased investigations of brain metrics, in that they are consistently applied across many data sets and cohorts. This approach improves upon traditional meta-analyses, which often attempt to combine published effect sizes derived from different processing and analysis protocols. By pooling data derived using standard image processing pipelines in a coordinated effort, the ENIGMA-CNV and 22q-ENIGMA WG studies boost statistical power by incorporating data sets that may have been underpowered to detect brain effects on their own. The standardization of protocols, now being applied in large prospective studies such as UK Biobank (Alfaro-Almagro et al., 2018), allows large-scale comparison of brain measures and profiles of disease effects across studies to better characterize common and distinct brain signatures across CNVs and major brain disorders from independently collected study samples.
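For contrast with the pooled approach described above, the following is a minimal sketch of what a conventional fixed-effect meta-analysis does with per-site effect sizes: each site's estimate is weighted by the inverse of its variance. The function name and the effect sizes are hypothetical and chosen only for illustration; this is not taken from the ENIGMA protocols.

```python
import math

def fixed_effect_meta(effects, std_errors):
    """Inverse-variance weighted pooled effect and its standard error."""
    if not effects or len(effects) != len(std_errors):
        raise ValueError("need one standard error per effect size")
    weights = [1.0 / se ** 2 for se in std_errors]  # precise sites count more
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se

# Hypothetical standardized effects of a CNV on one brain measure at three sites.
site_effects = [-0.40, -0.25, -0.55]
site_ses = [0.20, 0.10, 0.40]
pooled, pooled_se = fixed_effect_meta(site_effects, site_ses)
print(f"pooled effect = {pooled:.3f}, SE = {pooled_se:.3f}")
```

The pooled standard error is smaller than that of any single site, which is the gain in power that motivates combining cohorts. When sites differ in how the measures were extracted, however, the per-site effects are not strictly comparable, which is the problem that standardized pipelines are meant to reduce.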

| Collection of CNV information
A molecularly confirmed diagnosis of 22q11.2 deletion is necessary for study inclusion. The most common deletion subtype, known as the LCR22A-LCR22D or A-D deletion, is found in 85% of cases and involves the loss of 2.6 megabases (Mb) of DNA. A smaller 1.5 Mb deletion, called the LCR22A-LCR22B or A-B deletion, is the next most common subtype, found in 10% of cases (McDonald-McGinn et al., 2015).

| Demographic data harmonization
History of psychotic disorder is established by a trained mental health professional at each 22q-ENIGMA site via a structured diagnostic interview, collateral information, and medical records. A cross-site reliability procedure is conducted by two investigators to independently review representative cases from each site and to ensure diagnostic reliability across sites (Gur et al., 2017).

| THE ENIGMA-CNV WORKING GROUP: STANDARDIZED DATA COLLATION, PROCESSING AND ANALYSIS TO EMPOWER LARGE-SCALE STUDIES
The primary goal of the ENIGMA-CNV WG is to identify CNVs that significantly influence the brain globally and regionally to gain insight into the neurobiology of CNVs. The WG follows the main philosophy of the wider ENIGMA Consortium, which is to leverage existing legacy data sets to their full potential by combining samples using standardized processing. Notably, few of the research groups in ENIGMA-CNV could have performed well-powered CNV-brain imaging studies on their own due to the low prevalence of individual CNVs.

| Data collection and coordination
The large-scale international nature of ENIGMA requires coordination of data originally collected with vastly different study designs, so initial analyses tend to be simple, followed by more complex analyses. From the beginning, ENIGMA-CNV, rather than focusing on a predefined selection of CNVs, chose a pragmatic approach driven by data availability.
[Figure: The overall procedure for participation in ENIGMA-CNV and 22q-ENIGMA]
One key to success is a unified approach across studies for CNV calling, imaging analysis and quality control (Figure 3), given the differences in original cohort data collection and study design.

| Standardized CNV calling and visualization across cohorts
The low frequency of recurrent CNVs makes a mega-analysis approach preferable to the original ENIGMA meta-analysis approach (Boedhoe et al., 2018). Because many participating cohorts had little experience with genetic analysis, in particular CNV calling, ENIGMA-CNV first developed an easy-to-follow protocol for CNV calling. Many SNP genotyping arrays exist that vary in the number of SNPs included and their coverage of the genome. The often nonuniform distribution of tagged SNPs across the genome means that there may be limited coverage in regions with segmental duplications or complex CNVs (Carter, 2007). Consequently, larger CNVs (>500 kb) can be reliably detected by microarrays from most platforms, whereas variability between platforms is greater for smaller CNVs (10-100 kb). A number of different CNV calling methods exist (Pinto et al., 2011). PennCNV, a widely used CNV calling software platform (Macé et al., 2016), was chosen since it accommodates a wide selection of SNP-based arrays (e.g., Affymetrix and Illumina) and is user friendly and fast (Macé et al., 2016), a key advantage at a time when the number of available samples increases at an unprecedented rate.
Most participating cohorts call CNVs themselves. Alternatively, the ENIGMA-CNV WG does the calling on their behalf based on raw genotype information provided by the respective participating cohort.
To address regulatory issues, the CNV calling protocol includes a de-

| Demographic data
A minimal number of demographic metrics are collected, including age at brain scan, sex, diagnosis (if applicable), scanner site, and multidimensional scaling (MDS) factors (when available) from the analysis of population structure in the genome-wide data.

| Study and analysis design
In disease studies, controls are typically defined at the outset of the individual studies. This contrasts with ENIGMA-CNV, where controls, dubbed noncarriers, are individuals who carry neither the particular CNV being studied nor any other potentially pathogenic CNV (as defined by a precompiled list; Kendall et al., 2016). The latter allows truly blinded sampling, as neither the recruiters nor the participants knew CNV status at the time of the analysis, except for the few clinically ascertained carriers.
For primary data analyses, ENIGMA-CNV applies both a linear regression, to test the effect of the CNV per copy number of the region in question (that is, the dose response), and a t test to compare pairs of groups (deletion or duplication vs. noncarriers, or deletion vs. duplication carriers). Imaging data are adjusted for age at brain scan, sex, and scanner site, both with and without adjusting for intracranial volume (ICV).
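The two primary tests can be sketched as follows. This is an illustrative example on simulated data, not the working group's analysis code: the function names, the simulated effect sizes, and the choice of a Welch t statistic on covariate-adjusted residuals are all assumptions made for the sketch. Copy number is coded as 1 (deletion), 2 (noncarrier), or 3 (duplication).

```python
import numpy as np

def covariate_matrix(age, sex, site):
    """Intercept, age, sex, and dummy-coded scanner site (first site as reference)."""
    site = np.asarray(site)
    dummies = [(site == s).astype(float) for s in np.unique(site)[1:]]
    return np.column_stack([np.ones(len(site)), age, sex, *dummies])

def dose_response_slope(measure, copies, covars):
    """OLS change in the brain measure per copy of the region, adjusted for covariates."""
    X = np.column_stack([covars, copies])
    beta, *_ = np.linalg.lstsq(X, measure, rcond=None)
    return beta[-1]

def adjusted_welch_t(measure, covars, group_a, group_b):
    """Welch t statistic between two groups after regressing out the covariates."""
    beta, *_ = np.linalg.lstsq(covars, measure, rcond=None)
    resid = measure - covars @ beta
    a, b = resid[group_a], resid[group_b]
    return (a.mean() - b.mean()) / np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)

# Simulated cohort (all values hypothetical).
rng = np.random.default_rng(0)
n = 2000
copies = rng.choice([1, 2, 3], size=n, p=[0.05, 0.90, 0.05])
age = rng.uniform(18, 80, n)
sex = rng.integers(0, 2, n)
site = rng.integers(0, 3, n)
true_slope = -0.5  # simulated change in the measure per additional copy
measure = (10 + true_slope * copies - 0.02 * age + 0.3 * sex
           + 0.2 * site + rng.normal(0, 0.5, n))

covars = covariate_matrix(age, sex, site)
slope = dose_response_slope(measure, copies, covars)
t_del_vs_non = adjusted_welch_t(measure, covars, copies == 1, copies == 2)
print(f"estimated dose-response slope: {slope:.2f}")
print(f"deletion vs. noncarrier t: {t_del_vs_non:.1f}")
```

The version "with ICV" would add ICV as one more column in the covariate matrix. A negative slope here means the measure decreases with each additional copy, so deletion carriers show larger values than noncarriers and the group comparison is positive.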
The number of noncarriers in ENIGMA-CNV is an order of magnitude larger than the number of carriers. This provides the opportunity to estimate the effect of the CNVs in comparison to the overall population. Separate "sensitivity" analyses are performed, including a matched analysis (matching each carrier with a noncarrier based on, e.g., age, sex, affection status, and ICV) as well as separate analyses that take into account ancestry information (MDS factors) and diagnoses (if known). These sensitivity analyses allow testing of the robustness of the results in selected subsets of the sample.
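One simple way to build the matched subset is greedy one-to-one matching, sketched below. The algorithm, the `match_carriers` helper, and the record fields are assumptions for illustration: the text above names the matching variables but not the procedure, and this sketch matches on sex (exactly) and age (nearest) only, leaving out affection status and ICV.

```python
def match_carriers(carriers, noncarriers):
    """Greedy 1:1 matching: same sex, closest age, each noncarrier used at most once."""
    available = list(noncarriers)
    pairs = []
    for c in carriers:
        candidates = [n for n in available if n["sex"] == c["sex"]]
        if not candidates:
            continue  # no eligible noncarrier left for this carrier
        best = min(candidates, key=lambda n: abs(n["age"] - c["age"]))
        available.remove(best)
        pairs.append((c["id"], best["id"]))
    return pairs


# Toy records with hypothetical identifiers.
carriers = [
    {"id": "C1", "sex": "F", "age": 30},
    {"id": "C2", "sex": "M", "age": 52},
]
noncarriers = [
    {"id": "N1", "sex": "F", "age": 45},
    {"id": "N2", "sex": "F", "age": 31},
    {"id": "N3", "sex": "M", "age": 50},
    {"id": "N4", "sex": "M", "age": 20},
]

pairs = match_carriers(carriers, noncarriers)
print(pairs)
```

Greedy matching depends on the order in which carriers are processed; with noncarriers far outnumbering carriers, as here, close matches are usually available for every carrier.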

| Overview of the ENIGMA-CNV working groups
The ENIGMA-CNV sample currently comprises a total of 38 cohorts (Figure 2) with genotyping and MRI data comprised of core ENIGMA-CNV based on clinical (mostly case-control) and population studies as well as publicly available data sets (currently the UK Biobank) and represent a broad spectrum of CNVs (

| White matter structure
The first study of white matter microstructure from the 22q-ENIGMA WG was a mega-analysis of 334 deletion carriers and 260 healthy controls (age: 6-52 years) from 10 international sites
The minimal core segment of the 16p11.2 distal CNV is 200 kb in length and contains nine genes. A study in zebrafish found that, of the genes in the 16p11.2 distal region, only over-expression of LAT induced a decrease in cell proliferation in the brain with a concomitant microcephalic phenotype (Loviglio et al., 2017). LAT knockout mice also showed anatomical brain abnormalities (Loviglio et al., 2017), and the brain regions expressing the highest levels of the LAT gene include the basal ganglia (Hawrylycz et al., 2012), providing overlap with the brain structural changes identified in the ENIGMA-CNV study. These findings provide converging evidence that LAT, an immune signaling adaptor, is a possible dosage-dependent driver of the CNV-associated brain phenotypes, including the alterations in the basal ganglia. They also fit well with a proposed role of the immune system in the development of psychiatric disorders such as SCZ (Khandaker et al., 2015). Notably, a recent GWAS on subcortical volumes identified a hit for the caudate nucleus, rs1987471, in the 16p11.2 distal region upstream of the ATXN2L gene (Satizabal et al., 2019), indicating that several genes in the interval may be involved in the brain structural changes.

| 15q11.2 CNV
In this study, ENIGMA-CNV targeted a more frequent CNV, the 15q11.2 CNV (BP1-BP2, 20.3-20.8 Mb, hg18 genome assembly), with a population prevalence around 0.3% (Crawford et al., 2018; Stefansson et al., 2014). The deletion has unequivocally been associated with an increased risk for SCZ (OR = 1.6; Marshall et al., 2017). Overall, the effects of the duplication on disease, cognitive and behavioral phenotypes are absent or small: the duplication has not been clearly associated with psychiatric or neurodevelopmental disorders, and its carriers perform on par with controls on cognitive tests (Abdellaoui et al., 2015; Stefansson et al., 2014). In contrast, the deletion is associated with a reduction in IQ of 4 points (Huguet et al., 2018; Jønch et al., 2019), while deletion carriers unaffected by psychiatric or neurodevelopmental disorders have an increased risk of dyslexia and dyscalculia (Stefansson et al., 2014). Reflecting these small effects, the vast majority of carriers are not clinically affected, and the deletion is inherited in >90% of the cases (Cox & Butler, 2015; Jønch et al., 2019; Mohan et al., 2019). Finally, 15q11.2 dosage has been reported to be associated with white matter alterations (Silva et al., 2019; Stefansson et al., 2014; Ulfarsson et al., 2017).
We assessed the association of the 15q11.  Chai et al., 2003). The first three of these genes have known roles in neurodevelopment and contain polymorphisms associated with several brain disorders (Goytain, Hines, El-Husseini, & Quamme, 2007; Goytain, Hines, & Quamme, 2008; Napoli et al., 2008; van der Zwaag et al., 2010). CYFIP1 and NIPA1 are highly expressed in the developing brain (van der Zwaag et al., 2010) and are key players in a number of processes contributing to brain plasticity, including axon outgrowth and dendritic spine formation (De Rubeis et al., 2013; Schenck et al., 2003; Wang, Shaw, Tsang, Reid, & O'Kane, 2007). Likewise, common CYFIP1 polymorphisms that influence its expression levels have been linked to variation in cortical surface area (Woo et al., 2016). Thus, the pattern of results fits well with known molecular functions of the genes in the 15q11.2 region, in particular CYFIP1, and suggests involvement of these genes in neuronal plasticity and cortical development.

| 1q21.1 distal CNV
Carriers of the 1q21.1 distal CNVs (BP3-BP4, 145-145.8 Mb, hg18 genome assembly) are at higher risk for several disorders including SCZ, ID, developmental delay, speech problems, ASD, motor impairment and epilepsy (Bernier et al., 2016; Chawner et al., 2019; Gourari, Schubert, & Prasad, 2018; Haldeman-Englert & Jewett, 1993; Mefford et al., 2008), with duplication carriers at additional risk for ADHD (Gudmundsson et al., 2019), bipolar disorder and major depressive disorder (Green et al., 2016). The CNV has a frequency of 0.03% and 0.05% for the deletion and duplication, respectively (Kendall et al., 2016; Stefansson et al., 2014). The 1q21.1 distal CNV has an effect on head circumference, as evident from a high prevalence of micro- and macrocephaly in deletion and duplication carriers, respectively (Bernier et al., 2016; Brunetti-Pierri et al., 2008; Rosenfeld et al., 2012). Despite the large effect sizes identified, at the time of writing, GWAS based on the hg19 genome assembly have not identified hits in the 1q21.1 genomic region for ICV (Adams et al., 2016; Knol et al., 2020) or for total cortical or regional surface area (Hofer et al., 2020). Because of the many LCRs in the region (Brunetti-Pierri et al., 2008), assembly of the 1q21.1 region was faulty until version GRCh38, which likely inhibited gene discovery and may explain the current lack of GWAS hits in the region.

ENIGMA-CNV systematically assessed brain structural MRI varia
Given the different breakpoints of the 1q21.1 distal CNVs, estimates of the number of affected genes vary, but the core interval encompasses at least 12 protein-coding genes including several human-specific genes, such as HYDIN2 (Dolcetti et al., 2013; Rosenfeld et al., 2012), NOTCH2NLs (Fiddes et al., 2018; Suzuki et al., 2018) and the DUF1220/Olduvai domain-containing NBPF-encoding genes. The recently characterized NOTCH2NL genes are particularly interesting in the context of brain development, and as candidates for a dosage-dependent amplifier of the CNV-associated brain phenotypes. They are absent in humans' closest living relatives (nonhuman primates) and confer delayed neuronal differentiation and increased progenitor self-renewal (Fiddes et al., 2018; Suzuki et al., 2018), in line with the radial unit hypothesis of cortical development (Rakic, 1995). A neurodevelopmental effect on cell proliferation fits well with the overall directional effect of this CNV on cortical surface area and ICV.

| Summary and implications of the findings
A common finding across all four CNVs studied by the 22q-ENIGMA and ENIGMA-CNV WGs is the presence of a gene dosage response on several brain structures (Sønderby et al., 2021; Lin et al., 2017; Sønderby et al., 2018; van der Meer et al., 2020). The general rule seems to be that the effect on brain structure fits an additive model for gene dosage formed by, for example, gene expression, whereas an inverted U-shaped effect curve is observed for the phenotype (Deshpande & Weiss, 2018).
Importantly, brain structural findings in CNV carriers appear to overlap, to some extent, with patterns of brain alterations found in several major brain disorders including ADHD (Hoogman et al., 2017), ASD (van Rooij et al., 2018), SCZ (van Erp et al., 2016), bipolar disorder, major depressive disorder (Schmaal et al., 2016), and epilepsy (Whelan et al., 2018). However, several CNVs clearly have effect sizes far greater than those of the idiopathic diseases (Figures 4 and 5). Likewise, it has so far been difficult to find a direct pattern in the overlap between known disease susceptibility for CNV carriers and brain structural effects, both in terms of specificity (actual overlap) and effect sizes. It is thus increasingly evident that vastly different brain alterations, from large macroscopic effects (e.g., in 16p11.2 and 1q21.1 CNVs) to small subtle effects (e.g., 15q11.2 CNVs), can lead to similar phenotypes. This underlines the heterogeneity in brain structure within diseases such as SCZ and a putative potential to stratify based on brain structure within specific diseases. Improved understanding of these types of relationships may prove important for understanding disease susceptibility and outcome.

| ONGOING PROJECTS AND FUTURE DIRECTIONS
Results from the 22q-ENIGMA and ENIGMA-CNV WGs confirm that multiple CNVs are associated with differences in brain morphology. This effort is providing information on genetically determined variation in brain development and its relation to neurodevelopmental, psychiatric, and neurological disorders. Current and future CNV studies will benefit both from even larger samples with more ethnicities represented, based on broad collaborations and standardized data collection across samples, and from smaller, more flexible studies with deeper phenotyping.

| Increasing sample sizes
There is great potential to include additional samples in ENIGMA WGs on CNVs. First, additional cohorts can easily be incorporated, and additional measures such as the corpus callosum, cerebellum, brain stem, and ventricles (not yet targeted in ENIGMA-CNV) can be added to the protocol. Second, independent research projects performing targeted recruitment and MRI brain scans on CNV carriers can easily join. Third, clinical scans of CNV carriers may be leveraged, provided the MRI quality is sufficient for accurate morphometry and a number of appropriate genotyped controls from the same scanner site are provided: part of the standard evaluation for children with developmental delay or ID may include brain MRI, in particular for cases where additional clinical indications such as epilepsy, head circumference abnormalities or focal neurological signs are present (Mithyantha, Kneen, McCann, & Gladstone, 2017).
Finally, such independent samples can be supplemented with an increasing amount of data from open data sets such as the ABCD study (Casey et al., 2018) and UK Biobank (Littlejohns et al., 2020).
There are notably analytical challenges and interpretational risks involved in large-scale neuroimaging studies (Smith & Nichols, 2018

| Lifespan trajectories
Current investigations have focused on the overall impact of CNVs on the brain, mostly disregarding a potential dynamic or interactive effect of age on brain maturation. Gross structural brain alterations are likely present from an early stage of development, as exemplified by the macrocephaly observed in utero in a 1q21.1 distal duplication carrier (Verhagen et al., 2015). However, detailed knowledge on the development of brain structure in CNV carriers over the lifespan is lacking.
Recently, the first study to address this, targeting the 16p11.2 proximal CNV, suggested that differences in brain structure in deletion and duplication carriers in comparison to noncarriers remain stable throughout childhood, adolescence and at least until around

| Expanding phenotypic information

| Expanding to brain connectivity: Inclusion of DTI and resting state MRI
Accumulating evidence converges on brain dysconnectivity as a transdiagnostic phenotype in mental illness, based on aberrations in the "wiring" of the brain in individuals suffering from mental illness. To date, the 22q-ENIGMA WG has targeted brain measures more broadly (including diffusion-weighted MRI and surface-based shape analysis), whereas the ENIGMA-CNV WG has focused on ROI-based measures of brain structure. Workflows for diffusion MRI analyses have already been developed and tested in various ENIGMA WGs (Jahanshad et al., 2013; Kochunov et al., 2016) Moreau et al., 2020), the 16p11.2 proximal (Chang et al., 2016; Moreau et al., 2020), or the most abundant low impact recurrent CNV, 15q11.2 (Silva et al., 2018). A combined effort within this arena would be likely to move the field forward.

| Structural covariation and spatial gene expression
The primary ENIGMA-CNV and 22q-ENIGMA imaging studies have provided a better understanding of the spatial distribution of CNV-related brain alterations across the brain (localized to lobes, gyri, sulci, etc.). Methods such as structural covariance may help to better understand CNV-related disruptions in the developmental coordination between maturing brain regions (Alexander-Bloch, Raznahan, Bullmore, & Giedd, 2013). Techniques such as "virtual histology," which relate group differences in MRI-derived cortical measures (e.g., 22q11.2 carrier vs. healthy control) to gradients in cell-specific gene expression from the Allen Human Brain Atlas, may provide a step toward bridging the gap between MRI-brain alterations and underlying cell-specific pathophysiology (Patel et al., 2018; Shin et al., 2017).

| Adding clinical, cognitive, and behavioral data
The strength of the 22q-ENIGMA and ENIGMA-CNV WGs is combining large-scale data on both CNV calls and imaging data. Adding deeper phenotyping information such as cognitive, mental, and behavioral data would be highly beneficial. The challenge with incorporating such data from independent studies is that the standardization of phenotypic information across the independently collected cohorts is lacking; other ENIGMA WGs have begun to deal with this challenge by harmonizing cognitive endpoints (Tate et al., 2021). Through organization and standardization of samples, we have the potential to deepen our knowledge regarding the relationships between CNV carriers, imaging measures and other phenotypes.
Publicly available data sets, such as the UK Biobank, already allow for large-scale analysis of cross-phenotypic traits, such as brain-cognition mediation, by combining cognition and brain imaging data.
Given the continued recruitment of individuals for MRI studies, and the future availability of other large-scale harmonized brain imaging data sets such as in the ABCD study (Casey et al., 2018), the potential of this type of analysis is continuously expanding. Future studies aim to move beyond case-control comparisons to better understand the underlying brain structure and functional mechanisms related to behavioral phenotype heterogeneity in CNV carriers (Marquand, Rezek, Buitelaar, & Beckmann, 2016). These types of analysis, taking advantage of "big data" samples, call for robust data-driven approaches that do not simply maximize prediction accuracy but improve our interpretation and understanding of underlying mechanisms (Bzdok, Nichols, & Smith, 2019).

| Identifying causal mechanisms: Mechanistic follow-up studies
Candidate genes within the CNV regions can be linked to neuronal differentiation, axon outgrowth and dendritic spine formation, neurotransmitter and immune signaling, and other processes important for brain development. Moving forward, interest will no doubt focus on identifying and characterizing these driver genes causing the phenotypes or factors modifying the phenotypes. Many approaches can aid in this effort.
Studies of variable CNV breakpoints can narrow down potential driver genes. This is exemplified by the recent dismissal of HYDIN2 as a driver gene for the head circumference phenotype in 1q21.1 distal CNV: several atypical 1q21.1 distal deletion and duplication carriers with a normal copy number of HYDIN2, but still exhibiting the microcephaly or macrocephaly phenotype, were identified (Dougherty et al., 2017). A recent exome sequencing study of people with autism identified BCL11A as a potential driver gene for autism in the 2p15-p16.1 CNV (Satterstrom et al., 2020). Likewise, gene expression studies have helped to narrow down the gene behind the SNP association for brain structure. Another option is expression of individual genes in model organisms such as mouse (Dominguez-Iturza et al., 2019; Nielsen et al., 2017) or zebrafish (Loviglio et al., 2017). Such approaches may identify driver genes for brain phenotypes associated with CNVs, pinpointing the biological mechanisms involved.
Finally, future studies should try to disentangle the role of common genetic variants in moderating the phenotype caused by CNVs, through common and rare variant interplay analysis as done, for instance, in SCZ (Bergen et al., 2018; Tansey et al., 2016) and ADHD (Martin et al., 2015).

| Secondary proposals
Several secondary projects are ongoing in the 22q-ENIGMA and ENIGMA-CNV WGs. In the 22q-ENIGMA WG, a secondary project led by Fidel Vila-Rodriguez (The University of British Columbia) is now investigating the structural covariance of gray matter volume using source-based morphometry, and another study led by Jennifer Forsyth at UCLA is investigating specific 22q11.2 genes driving cortical surface area and thickness alterations. In addition to recurrent CNVs, numerous single, nonrecurrent CNVs disrupting one or more genes may have a large impact on the brain and behavior, but determining the impact and clinical interpretation of these single CNVs is even more challenging (Nowakowska, 2017). In ENIGMA-CNV, "The effect of very rare CNVs on brain structure and function," headed by the group of Sébastien Jacquemont at University of Montreal, investigates very rare CNVs by analyzing all CNVs in the genome (even single hit CNVs) and their effect on brain structure. Another secondary project in the ENIGMA-CNV WG, headed by David Linden (Cardiff University/Maastricht University), addresses how brain changes across different pathogenic CNVs correlate with penetrance scores for SCZ and developmental delay, aiming to find common brain phenotypes that are most related to risk for disorders.

| CONCLUSION
CNV analysis offers a genetics-first approach to studying neu-

DATA AVAILABILITY STATEMENT
Data sharing is not applicable to this article as no new data were created or analyzed in this study.