The impact of human copy number variation on a new era of genetic testing

Authors


Dr KW Choy, 1E, Department of Obstetrics & Gynaecology, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, Hong Kong, China. Email richardchoy@cuhk.edu.hk

Abstract

Please cite this paper as: Choy K, Setlur S, Lee C, Lau T. The impact of human copy number variation on a new era of genetic testing. BJOG 2010;117:391–397.

Cytogenetic studies have demonstrated that duplications or deletions of entire chromosomes or microscopically visible aberrations are associated with specific congenital disorders. The subsequent development and application of microarray-based assays have established the importance of copy number variants (CNV) as a substantial source of genetic diversity in the human genome. Pathogenic CNVs are associated not only with birth defects and cancers, but also with neurodevelopmental disorders at birth or neurodegenerative diseases in adulthood. Unfortunately, the limited knowledge of the phenotypic effects of most CNVs has led to the classification of many CNVs as genomic imbalances of unknown clinical significance. This has caused many clinicians to resist the introduction of microarray technologies in detecting CNVs in a genome-wide manner for prenatal applications. This review summarises our current understanding of CNVs, the common detection methods, and the implications for human health and prenatal diagnosis.

Introduction

Diploid organisms (such as humans) have two copies of each autosomal region: one per chromosome. What is less well known is that there are many genetic regions in which the number of copies varies (i.e. fewer or more than the expected two). This phenomenon was first discovered during the human genome project; it may occur as a result of deletions or duplications, and may be inherited or acquired.

Copy number variants (CNVs) are defined as stretches of DNA larger than 1000 base pairs (bp) that are normally found only once on each chromosome in each person, but that in some individuals are duplicated or triplicated; i.e. the number of copies of this section of DNA varies from one individual to another.1 With advances in molecular-based techniques, several microarray platforms have been developed. These are based on a chip carrying many segments of DNA (a microarray), and they enable the detection of submicroscopic deletions or duplications in segments as small as tens to hundreds of kilobases (kb) in size, which is well below the level of discrimination of conventional G-banded karyotype analysis.2–5 It is these methods that have led to the discovery that CNVs are widespread throughout the genome of healthy individuals.6–12

At present, microarray-based comparative genomic hybridization (array CGH or aCGH), or array genomic hybridisation (AGH), is rapidly becoming the method of choice to detect very small chromosomal imbalances associated with postnatal syndromic malformations.13,14 The ability to provide a comprehensive profile of all known and clinically relevant genomic gains and losses in one single test could therefore make array CGH/AGH the most attractive tool for prenatal genetic diagnosis. However, the practicality of array CGH/AGH for prenatal diagnosis in cases with abnormal cytogenetics (including apparently balanced or new translocations) has only been demonstrated in a few studies.4,15–20 Although such high-resolution analysis enables the detection of genome changes previously not observed because of technology limitations, it also creates challenges in data interpretation as many new CNVs of doubtful clinical significance become apparent. This review summarises our current understanding of CNVs, the latest developments in detection methods, and the current implications of CNVs for human health, including prenatal diagnosis.

What are copy number variations?

A CNV is a form of structural variation in the genome that consists of differences in the number of copies of a particular region in the genome. Hence, depending on the nature and resolution of the microarrays, array CGH/AGH is capable of detecting copy number changes of varying sizes, from as small as a few bp to submicroscopic deletions, or duplications that are up to multimillion bp in size,21,22 in a single hybridization. There are now over 29 133 CNVs catalogued in the Database of Genomic Variants.7

Depending on the method used for assessment, up to about 12% of the human genome is CNV,12,23–26 with CNVs contributing between 0.12% and 7.3% of the genomic variability seen within humans.12,25 The fact that over 41% of all CNVs identified overlap with known genes20,26 suggests that CNVs may play a substantial role in modulating how genes are expressed.

Currently, the proportion of genetic disease caused by CNVs is unknown, but this may be substantial. A comprehensive map of CNVs and SNPs from 1000 phenotypically normal individuals is now being constructed as part of an international sequencing effort (http://www.1000genomes.org). The data generated from this project will help us to address this question.

The broad definition for CNVs makes no reference to the clinical impact of a given genomic imbalance, and it can therefore be confusing for clinicians, clinical geneticists and cytogeneticists, who have traditionally taken the view that chromosomal ‘variants’ are alterations that are not clinically significant. Hence, what was once thought of as clinically insignificant might later turn out to be a CNV that confers differential susceptibility to a disease, or could even be causative of a genomic disorder with late onset or variable penetrance. Therefore, CNV is now used to describe copy number differences in studies of both disease and normal controls.

Mechanisms of CNV formation

Copy number variants are produced by germline genomic rearrangements that result in gains or losses of DNA segments. The mechanisms that cause CNVs can be broadly categorised into three pathways.27 Please refer to the cited references for detailed explanations.

  •  Non-allelic homologous recombination (NAHR), where homologous recombination between non-allelic homologous sequences in the genome results in gains and losses of DNA. NAHR events occur at a rate of up to 10⁻⁴ per generation, much more often than the rate for the creation of single nucleotide polymorphisms (SNPs) (10⁻⁸ per generation).26,28
  •  Non-homologous end joining (NHEJ) and microhomology-mediated end joining (MMEJ) mechanisms.29
  •  The replication Fork Stalling and Template Switching (FoSTeS) model.30,31

Characteristics of CNVs

Copy number variants were first identified by Lee et al.7 and Wigler et al.8 using two independent approaches. Current estimates are that more than 240 Mb of the reference genome is CNV,1,24 representing up to 12% of the euchromatic genome, and present in at least 6% of each chromosome.12 However, these estimates are based on many CNV discovery projects that used large insert clone-based array CGH/AGH experiments, resulting in poorly defined boundaries for the CNVs and possibly an overestimation of the size of many currently identified CNVs. Several databases have now been established to collate CNV data. CNV data from healthy individuals are being catalogued in the Database of Genomic Variants (http://www.projects.tcag.ca/variation), whereas ‘pathogenic’ CNVs from patients affected with a neurodevelopmental disorder are being collated in databases such as DECIPHER (http://www.sanger.ac.uk/PostGenomic/decipher/), the chromosome abnormality database (http://www.ukcad.org.uk/cocoon/ukcad/), the International Standard Cytogenomic Array (ISCA) Consortium (http://www.isca.genetics.emory.edu/), and the European Cytogenetics Association Register of Unbalanced Chromosome Aberrations (http://www.ecaruca.net).

Presence of somatic CNV in human tissues

Somatic mosaicism is defined by the presence of genetically distinct populations of somatic cells in a single organism. Although somatic chromosomal copy number alterations have long been known to be connected with a disease phenotype, generally we have assumed that normal cells are genetically identical, including CNVs. This concept has been challenged by recent studies in humans and cattle.32 Piotrowski et al.33 detected the presence of somatic CNVs ranging from 82 to 176 kb, frequently encompassing known genes in adult human tissues. This study concurs well with another monozygotic twins study,34 suggesting that somatic mosaicism for CNVs is relatively common in normal adult and even fetal tissues (our unpubl. obs.). These data suggest that some CNVs previously reported as germline might represent somatic events. The frequency and spectrum of CNVs among somatic cells are still largely unexplored.

Technologies detecting CNVs in a genome-wide manner

Array-based comparative genomic hybridization (array CGH)

Initially, arrays had several hundred or even up to a thousand large-insert DNA clones (multiple copies of a DNA sequence). For example, a bacterial artificial chromosome clone is derived from a bacterial DNA sequence that allows a bacterium to produce the sex pilus necessary for conjugation, and contains partition genes that can be used to separate out particular sections of DNA for amplification and study. Array CGH was first introduced as ‘matrix CGH’,35 and then later as ‘array CGH’.36 In array CGH, the microarray contains probes that can be short oligonucleotides (25–60-mer probe length) or genomic fragments (up to 1 Mb in length). Nimblegen Inc. (http://www.nimblegen.com) uses a programmable mirror array to synthesise a Whole Genome Tiling Array (containing 2.1 million target oligonucleotides) directly on a glass surface using photolithography. Agilent Technologies, Inc. (http://www.chem.agilent.com) uses ink-jet technology to synthesise 1 million oligonucleotides, on a spot-by-spot basis, to produce human genomic arrays.

To identify copy number gain or loss in an individual patient, whole genomic DNA from the patient and from a reference (normal) control are differentially labelled with two different fluorophores. After the samples are co-hybridised to the same array, the fluorescence ratio generated from control versus patient DNA for each probe represents the average copy-number ratio between the patient’s and the control’s DNA (Figure 1). A gain or loss of fluorescence signal intensity from the patient’s DNA indicates a corresponding gain or loss in the patient’s DNA copy number. For example, the presence of an extra copy of chromosome 21 in the genome of a patient with Down syndrome will be represented as a single copy number gain of DNA at chromosome 21. In addition, array CGH/AGH platforms are capable of mapping rearrangement breakpoints and analysing the genome at resolutions down to a few kb, or even less, in a single hybridisation.
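The relationship between copy number and the expected signal ratio can be illustrated with a short calculation. The sketch below assumes that fluorescence intensity is proportional to copy number and expresses the patient-to-reference ratio on a log2 scale; the log2 convention and the function name `log2_ratio` are assumptions made for illustration and are not taken from the text above.

```python
import math


def log2_ratio(patient_copies: int, reference_copies: int = 2) -> float:
    """Idealised log2(patient / reference) ratio for one probe.

    Assumes signal intensity scales linearly with copy number, which
    real hybridisations only approximate.
    """
    if patient_copies <= 0 or reference_copies <= 0:
        raise ValueError("copy numbers must be positive to form a log ratio")
    return math.log2(patient_copies / reference_copies)


if __name__ == "__main__":
    examples = [
        ("single-copy loss (1 copy)", 1),
        ("normal (2 copies)", 2),
        ("single-copy gain, e.g. trisomy 21 (3 copies)", 3),
    ]
    for label, copies in examples:
        print(f"{label}: {log2_ratio(copies):+.2f}")
```

Under these idealised assumptions a single-copy loss gives a ratio of −1 and a single-copy gain, such as trisomy 21, gives about +0.58. Measured ratios are usually compressed towards zero by noise and mosaicism.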

Figure 1.

 Schematic representation of array CGH technology. Whole-genomic DNA from a normal control (reference) and genomic DNA from a patient sample are differentially labelled with two different fluorophores (normal sample, Cy3, versus patient sample, Cy5). The green colour (Cy3) peak (green arrow) indicates a loss of red signals and a deletion in the patient DNA sample compared with the reference (normal) DNA. On the other hand, if there is a gain of Cy5 signal from the patient sample, this results in a red peak (red arrow), indicating an increase in copy number of that specific genomic region in the patient’s DNA sample. Array CGH can also map DNA copy-number alterations according to the probe positions in the genome.

Single nucleotide polymorphism arrays

High-throughput array technologies for identifying SNPs can also be used to identify CNVs. The Affymetrix SNP arrays (http://www.affymetrix.com) and the Human 1M-Duo BeadChip by Illumina Inc. (http://www.illumina.com) use mismatch hybridisation and single-base extension, respectively. The SNP arrays consist of probes for the detection of both SNPs and CNVs.8–10 In general, these arrays contain short oligonucleotides (20–30 bases) that make them ideal for detecting single-base alterations, but less ideal for identifying CNVs, when compared with long oligonucleotide-based arrays. Hence, an SNP array can interrogate both loss of one allele and loss of heterozygosity (LOH) events. In addition, this platform is able to detect copy-neutral LOH events, uniparental disomy, non-paternity and consanguinity. Affymetrix 500 K arrays were recently used to identify CNVs in the 270 HapMap (Haplotype Map) individuals.12

The ‘effective resolution’ of array CGH

This depends largely on the ‘cut-off’ established for CNV identification (i.e. the number of consecutive probes needed to define a CNV region). For example, most array CGH platforms require at least three consecutive probes to make a confident CNV call. If a higher noise level is encountered the array may require more consecutive probes to make a CNV call.
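The consecutive-probe rule can be sketched as a simple run-length filter over per-probe log2 ratios. This is a minimal illustration and not the algorithm of any particular platform. The cut-off values of ±0.3 and the function name `call_cnvs` are hypothetical, and only the default of three consecutive probes comes from the text above.

```python
from typing import List, Optional, Tuple


def call_cnvs(
    log2_ratios: List[float],
    gain_cutoff: float = 0.3,    # hypothetical threshold for a gain
    loss_cutoff: float = -0.3,   # hypothetical threshold for a loss
    min_probes: int = 3,         # consecutive probes required for a call
) -> List[Tuple[int, int, str]]:
    """Return (first_probe, last_probe, 'gain' | 'loss') for every run of
    at least `min_probes` consecutive probes beyond the same cut-off."""

    def state(value: float) -> Optional[str]:
        if value >= gain_cutoff:
            return "gain"
        if value <= loss_cutoff:
            return "loss"
        return None

    calls: List[Tuple[int, int, str]] = []
    run_start = 0
    run_state: Optional[str] = None
    n = len(log2_ratios)
    # Iterate one step past the end so that the final run is flushed.
    for i in range(n + 1):
        current = state(log2_ratios[i]) if i < n else None
        if current != run_state:
            if run_state is not None and i - run_start >= min_probes:
                calls.append((run_start, i - 1, run_state))
            run_start, run_state = i, current
    return calls


if __name__ == "__main__":
    ratios = [0.0, 0.1, 0.5, 0.6, 0.55, 0.0, -0.9, -1.0, 0.05,
              -1.1, -0.9, -1.0, -0.95]
    print(call_cnvs(ratios))                # three-probe rule
    print(call_cnvs(ratios, min_probes=5))  # stricter rule for noisy data
```

In the example, the two-probe loss at probes 6 and 7 is not called under the three-probe rule. Raising `min_probes` for a noisy array removes more calls, which is why the effective resolution is coarser than the probe spacing alone suggests.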

Limitations of the current genome-wide platforms for CNV detection

By using independent platforms (Agilent 244 K and Affymetrix 6.0 SNP array) for analyses of genome-wide CNVs in ten humans, we found that even these newer platforms miss a large portion of CNV regions present in any given individual. A study comparing different genome-wide SNP genotyping platforms, with reference to the independent sequence-based CNV maps by Cooper et al.,37 concurred with our findings. The low detection rate across different platforms mainly results from the fact that commonly used SNP and array CGH/AGH platforms have limited or no probe coverage for a large fraction of CNVs.38 Therefore, it is important to be aware of the pros and cons of having common polymorphic CNVs within the prenatal chip design.

Copy number variation and human diseases

Copy number variants can be divided into three groups: germline inherited and present in a parent; newly acquired but not present in a parent; and somatic (meaning that the CNV arose after the single-cell stage of the embryo). Inherited CNVs and new CNVs are increasingly being found to be associated with risk for various diseases (Table 1).

Table 1.   Copy-number variants associated with human diseases

| Disorder | CNV | Gene | Effect | Risk associated | Study type | Significance | References |
|---|---|---|---|---|---|---|---|
| Infectious disease | | | | | | | |
| HIV-1/AIDS susceptibility | Common | CCL3L1 | Dosage | Low CNV | Case control | Varies in populations | [54] |
| Autoimmune disorder | | | | | | | |
| Systemic lupus erythematosus (SLE) | Common | FCGR3B | Dosage | Low CNV | Case control | P = 2.7 × 10⁻⁸ | [40,42] |
| Psoriasis | Common | DEFB | Dosage | High CNV | Case control | P = 1.65 × 10⁻⁶ | [48] |
| Crohn’s disease | Common | HBD-2 | Dosage | Low CNV | Case control | P = 0.002 | [47] |
| Neurological disorders | | | | | | | |
| Autism spectrum disorders | Unknown | Multiple | Unknown | De novo CNVs; multiple CNVs | Familial | P = 0.043 | [50] |
| Parkinson’s disease | Rare | SNCA | Dosage | Duplication/triplication | Familial | NA | [44–46] |
| Bipolar disorder | Rare | GSK3β | Dosage | Deletions and duplications | Case control | P = 0.002 | [43] |
| Schizophrenia | Rare | Multiple | Positional | Deletions and duplications; de novo CNVs | Case control | Varies in CNV types | [51,52] |
| Cancers | | | | | | | |
| Breast cancer | Rare | MTUS1 (exon 4) | Positional | Exon deletion (decreased risk) | Familial case | P = 0.01, OR = 0.58 | [59] |
| Prostate cancer | Common | UGT2B17 | Positional | Gene deletion | Case control | Varies in study subjects | [60–62] |
| Neuroblastoma | Common | NBPF23 | Dosage | Deletions and duplications | Case control | P = 1.7 × 10⁻¹¹, OR = 2.23 | [58] |

New studies suggest that germline CNVs can also predispose an individual to syndromic malformations.39,41,44–55 For example, autosomal-dominant microtia was linked to five tandem copies of a CNV region at chromosome 4p16,56 whereas germline genomic rearrangements and gene copy-number alterations have been associated with nervous system disorders.57 Prenatal assessment is possible, and we have shown that a newly acquired CNV at 16p13.11 is associated with increased nuchal translucency (NT), which is a prenatal ultrasound abnormality indicative of increased risk of a fetal chromosomal abnormality.16

Recent studies have also shown that germline CNVs may play a significant role in cancer risk, including neuroblastoma,58 breast59 and prostate cancers,60–62 revealing a previously unknown role of CNV in carcinogenesis.

Challenges of CNVs in the application of array CGH in prenatal diagnosis

The use of array CGH in prenatal diagnosis is a comparatively new concept, which has been advocated in two opposite scenarios: the presence of sonographic abnormalities or the presence of risk factors. As illustrated earlier in postnatal applications, the use of array CGH may actually detect unanticipated chromosomal microdeletions/microduplications, but these may be merely normal variants, and might carry no clinical significance.

Assessment of pathogenic CNV

A detailed assessment of pathogenic CNVs has been previously reported by Lee et al.63 In principle, a pathogenic CNV usually overlaps critical regions of known microdeletion or microduplication syndromes. In addition, pathogenic CNVs not only account for major birth defects but are also associated with mental retardation in children and adults, and with cancer in adulthood. Many of the associated phenotypes would be expected to be relatively mild or even undetectable during a fetal ultrasound morphology scan, such as in the case of Y-chromosome deletions and infertility. Another example is Charcot-Marie-Tooth disease type 1A (CMT1A), which is a late-onset neuropathy associated with CNV duplication at 17p12 that can be inherited, but which is not as ‘serious’ as the typical mental retardation syndrome.64 Although these studies are beginning to clarify the role of CNVs in relation to neurological deficit, prenatal genetic diagnosis remains complex. Genetic counselling should be provided to the patient prior to array CGH testing, so as to maximise the benefit of array CGH and to minimise the false-positive rate, as well as the risk of missing a serious condition attributable to a CNV. In addition, parental DNA samples are needed for comparison, and to assist in interpretation and counselling.

Which array should be used for prenatal diagnosis?

Many cytogenetic laboratories offer genome-wide array CGH/AGH testing for CNVs as analogous to banded chromosome analysis (karyotypic analysis), but with the added ability to obtain more genomic information in a single test. Indeed, it has been estimated that array CGH-based assays are now detecting apparently pathogenic genomic imbalances in as much as 15% of cases that have had normal results from chromosome-banded karyotyping tests.65,66 It has been estimated that in the USA alone, more than 10 000 array CGH/AGH tests are now performed clinically every year. Genome-wide arrays give an overview of all of the alterations in a given genome at a high resolution.12–15

Although array CGH has been widely accepted for postnatal diagnosis, there has been some hesitation in using genome-wide arrays in prenatal diagnosis. The major difficulty is how to accurately discriminate benign from pathogenic CNVs. The particular concern is with genomic CNVs that have not been observed among healthy individuals, and do not coincide with recurrent CNVs observed in individuals with similar clinical presentations.21,56 All CNVs detected in a prenatal array CGH/AGH test need to be validated by another method, and assessed for their potential pathogenicity contributing to the clinical presentation, on the basis of genome size and gene content.

Strategies to avoid the clinical uncertainty of CNVs identified

The identification of an increased number of non-pathogenic CNVs or CNVs with unclear clinical significance21,67 has made the interpretation of array CGH-based assays more difficult. For example, it is difficult to rule out a causative or a modifier role of these genomic variants or polymorphisms in developmental disorders. To address this issue, the vast majority of prenatal testing is performed using ‘targeted’ array platforms that specifically assess relative copy number for critical regions of well-defined pathogenic associations.10,11 Targeted array CGH platforms, which primarily only use probes with well-defined associations with genetic disorders, have begun to be used for specific prenatal diagnoses.68,69 One group using a non-targeted array recommends collecting blood from both parents at the time of the prenatal procedure, so that whether a CNV is newly acquired or inherited can be determined without delay, and before issuing a report.20
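The comparison of a fetal CNV with parental samples can be sketched as an interval-overlap check. This is a minimal sketch under stated assumptions and not a validated clinical rule: the `Cnv` record, the 50% reciprocal-overlap threshold and the genomic coordinates in the usage example are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Cnv:
    chrom: str
    start: int  # first base pair, inclusive
    end: int    # last base pair, inclusive
    kind: str   # "gain" or "loss"


def reciprocal_overlap(a: Cnv, b: Cnv) -> float:
    """Fraction of the shorter-covered CNV shared by both, or 0.0 if the
    CNVs differ in chromosome or type or do not overlap."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    if shared <= 0:
        return 0.0
    len_a = a.end - a.start + 1
    len_b = b.end - b.start + 1
    return min(shared / len_a, shared / len_b)


def classify(
    fetal: Cnv,
    maternal: Iterable[Cnv],
    paternal: Iterable[Cnv],
    min_overlap: float = 0.5,  # hypothetical matching threshold
) -> str:
    """Label a fetal CNV by whether a matching CNV is seen in a parent."""
    parental = list(maternal) + list(paternal)
    if any(reciprocal_overlap(fetal, p) >= min_overlap for p in parental):
        return "inherited"
    return "apparently newly acquired"


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    fetal_cnv = Cnv("16", 15_000_000, 16_200_000, "loss")
    mother = [Cnv("16", 15_100_000, 16_300_000, "loss")]
    print(classify(fetal_cnv, mother, []))
    print(classify(fetal_cnv, [], []))
```

A result of "apparently newly acquired" in such a scheme is only as reliable as the parental data; low-level parental mosaicism or a region not covered by the parental assay would not be detected by this comparison.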

The need for database development

In the future, the building of a genotype–phenotype database of disease like DECIPHER will be of paramount importance. Such a database will be useful in helping to narrow down diagnosis to a single syndrome when a comprehensive prenatal assessment of the fetus cannot be completed before delivery.

Cost issues

In a cost-effectiveness analysis of patients with idiopathic learning disability in the UK, Wordsworth et al.70 showed that the average cost of array CGH was £442, and that the average cost of karyotyping was £117. However, when there was need for further diagnostic tests, such as multi-telomere multiplex ligation-dependent probe amplification (MLPA) (£245) or targeted fluorescence in situ hybridisation (FISH) (£214), in addition to karyotyping, the cost per diagnosis for array CGH would be comparable or even less than karyotyping plus targeted FISH or MLPA. No studies assessing prenatal testing have been reported.

Conclusion

With the completion of the human genome project, we have learnt that the genomes of healthy individuals are 99.9% similar. It has also become obvious that there is more genetic variation than had been previously appreciated: a substantial portion is in the form of structural genomic variation, including CNVs, which has contributed up to 7.3% of the genome-wide variation. However, our knowledge of the phenotypic effects of most CNVs is still limited, hence the classification of many CNVs as ‘genomic imbalance of unknown clinical significance’. Therefore, further work is warranted in the following areas.

  •  The identification of small copy-number changes.
  •  Phenotype–genotype correlation of microdeletion/microduplication syndromes.
  •  Understanding of the locations, structures and frequencies of CNVs.

These will greatly accelerate the accurate clinical interpretation of CNVs, enhance the potential of array CGH for clinical and prenatal applications, and improve our knowledge of the structure and function of the human genome.

Disclosure of interests

None.

Contribution to authorship

SRS performed research; KWC, CL and TKL analyzed data and wrote the paper.

Details of ethics approval

Not applicable.

Funding

TKL and KWC are supported by the Hong Kong Research Grants Council, and by a direct grant for research at the Chinese University of Hong Kong. KWC is also supported by the Global Scholarship Programme for Research Excellence: CUHK.

Acknowledgement

We would like to thank Drs Larry Baum and Terence Lao for comments and useful discussion.
