Copy number variation (CNV) has been revealed as a significant contributor to the genetic variation in humans. Although CNV has been reported in several model animal and plant species, the presence of CNV and its biological impact in polyploid species has not yet been documented. We conducted a fluorescence in situ hybridization (FISH)-based CNV survey in potato, a vegetatively propagated autotetraploid species (2n = 4x = 48). We conducted FISH analysis using 18 randomly selected potato bacterial artificial chromosome (BAC) clones in a set of 16 potato cultivars with diverse breeding backgrounds. Six BACs (33%) with insert sizes of 137–145 kb were found to be associated with large CNV events detectable at the cytological level. We demonstrate that the large CNVs associated with two specific BACs (RH102I10 and RH83C08) were widespread among potato cultivars developed in North America and Europe. We measured the transcript abundance of four genes associated with the CNV spanned by BAC RH102I10. All four genes displayed a dosage effect in transcription. Although potato is vegetatively propagated, we observed that female gametes lacking the RH102I10-associated CNV were inferior to those with at least one copy of this CNV, indicating that the RH102I10-associated CNV can impact on the growth and development of the potato plants. Our results show that CNV is highly abundant in the potato genome and may play a significant role in genetic variation of this important food crop.
Copy number variation (CNV) is defined as stretches of DNA from 1 kilobase (kb) to several megabases (Mb) that display different copy numbers in populations (Feuk et al., 2006). Presence/absence variation (PAV) is an extreme example of CNV, in which DNA sequences are present in one genome yet completely absent in another genome (Springer et al., 2009). CNV was discovered initially in the human genome (Iafrate et al., 2004; Sebat et al., 2004), and has been documented in other model animal species, including Drosophila melanogaster (Emerson et al., 2008) and mouse (Yalcin et al., 2011). The human database of genomic variants (http://projects.tcag.ca/variation) currently contains a collection of >15 900 CNV loci, altogether covering 35% of the human genome. More importantly, many human CNVs are associated with diseases or susceptibility to diseases, either through dosage of a single gene, a contiguous set of genes, or in the case of complex diseases, allelic combinations (Henrichsen et al., 2009; Craddock et al., 2012). Thus, CNV has been demonstrated to be an important contributor to genetic variation in humans (Henrichsen et al., 2009; Stankiewicz and Lupski, 2010).
Copy number variation has recently been reported in several plant species, including maize (Springer et al., 2009; Swanson-Wagner et al., 2010), Arabidopsis thaliana (DeBolt, 2010; Cao et al., 2011), rice (Yu et al., 2011), and soybean (Haun et al., 2011; McHale et al., 2012). Analyses of multiple genotypes in Arabidopsis and maize suggest that CNV may play a significant role in phenotypic diversity and hybrid heterosis in plant species (Swanson-Wagner et al., 2010; Cao et al., 2011). In addition, several recent reports showed that duplication of a single or an array of multiple genes in plants can have a dramatic impact on growth and development (Pearce et al., 2011; Li et al., 2012; Wingen et al., 2012) or generate novel resistance to pests (Cook et al., 2012). These results suggest that CNV may contribute to genetic variation in plants similarly to humans.
Polyploidization plays a more significant role in the diversification and evolution of plants than of animals, as approximately 70% of species within the angiosperms are polyploids (Masterson, 1994). The presence of multiple copies of homologous and/or homoeologous DNA sequences in both autopolyploids and allopolyploids presents a technical challenge for CNV analysis. CNV has not been documented in any of the classical polyploid plant species. Here, we report a FISH-based CNV survey in potato (Solanum tuberosum, 2n = 4x = 48), a vegetatively propagated autotetraploid species. We demonstrate that CNV is highly abundant in the potato genome. Cultivated potato, due to its vegetative propagation through tubers and autopolyploid nature, has accumulated a large number of CNV loci. Understanding the role of CNVs in potato genetic variation may hold the key for future breeding of this important food crop.
Copy number variation of BAC-sized DNA fragments in potato revealed by FISH
We used a BAC FISH-based approach to explore potential CNV in potato. A set of 18 BAC clones with insert sizes ranging from 110 to 163 kb that mapped to the euchromatic regions of potato chromosome 6 (Iovene et al., 2008) were hybridized to the metaphase chromosomes of two tetraploid potato cultivars, Atlantic and Katahdin (Table S1). If the potato genomic DNA within the BAC clone does not contain duplicated sequences, it will hybridize to a specific position on all four copies of chromosome 6. Indeed, most BACs generated the expected four FISH foci on all four chromosomes in both Atlantic and Katahdin (Figure 1). However, five BACs generated only 1–3 FISH foci in Atlantic and/or Katahdin (Table 1), suggesting a deletion of the corresponding fragment on the chromosome(s) that lack the FISH signal. One additional BAC, RH94G20, hybridized to the long arm of all chromosome 6 homologs, yet this BAC also consistently hybridized to the pericentromeric region of a second chromosome (Figure S1), suggesting a duplication of the RH94G20-associated sequences elsewhere in the genome.
Table 1. Potato chromosome 6-specific BACs associated with copy number variations (CNVs) in Katahdin and Atlantic
Potato linkage group 6 contains a total of 53 cM (Iovene et al., 2008).
Numbers of signals reported as 2 + 2, 1 + 3, etc. stand for two major signals plus two minor ones, one major signal plus three minor ones, and so on.
2 + 2
2 + 2
1 + 3
1 + 3
1 + 1
Three of these six BACs generated FISH signals that were consistently different in size and intensity on the four chromosome 6 homologs (Table 1), suggesting that a partial deletion of the fragment is associated with the chromosome having a weak/low FISH signal (see below for BAC RH83C08). However, quantification of the size and intensity of the minor FISH signals is difficult as most BACs contain some repetitive sequences that generate background FISH signals. Thus, a weak/low FISH signal may not be unambiguously and/or consistently distinguished from the background signals.
CNVs associated with BACs RH102I10 and RH83C08 are widespread in potato cultivars
BAC RH102I10 (138 kb) generated only two FISH foci in both Atlantic and Katahdin. To determine the extent of this CNV within cultivated potato, we conducted FISH analysis of RH102I10 in 16 potato cultivars developed by different breeding programs in North America and Europe and in three diploid potato clones (2n = 2x = 24) (Table 2). Only one of the 16 potato cultivars, Juanita, showed four FISH foci (Figure 2) with 1–3 FISH foci in the other 15 cultivars and two foci in all three diploid clones (Table 2 and Figure 2).
Table 2. FISH survey of BACs RH102I10 and RH83C08 among 16 potato cultivars
A second BAC clone, RH83C08 (148 kb), generated one normal and three very weak signals in Katahdin, and one normal and one very weak signal in Atlantic (Table 1). RH83C08 generated high background FISH signals and as a consequence, the size and intensity of the minor signals on some chromosome 6 homologs were similar to background signals. However, many of these minor signals can be identified based on the reference FISH signals derived from RH60H14, an overlapping BAC from the chromosome 6 BAC tiling path (Figure 3). A partial deletion of the sequences within RH83C08 is most likely associated with the chromosome 6 homologs that exhibit a minor FISH signal. Only a single potato cultivar, Kennebec, had four major foci with RH83C08 with the other cultivars exhibiting combinations of 1–3 major signals and 1–3 minor signals (Figure 3 and Table 2). In the two diploid clones, RH and SH, only a single major FISH signal was observed (Table 2), whereas in the doubled monoploid, DM, two foci were observed. These results show that the CNV loci spanned by BACs RH102I10 and RH83C08 are present widely in tetraploid potato germplasm.
Copy number variant associated with BAC RH160C14 spans >200 kb DNA
BACs RH69B12, RH160C14, RH102I10 and RH83C08 were associated with CNVs in Katahdin and Atlantic (Table 1). We selected several BACs that partially overlap with these four BACs in the chromosome 6 tiling path to investigate if the overlapping BACs extend the original CNVs. BACs RH6L15 (overlapping with RH69B12), RH147M20 (overlapping with RH160C14 on one end), RH134F01 (overlapping with RH102I10), and RH127J15 (flanking RH83C08 with an estimated gap of 23 kb) produced four FISH foci in Katahdin and Atlantic (Table S1). Thus, these BACs do not extend the CNVs (Figure 4(a)). However, BAC RH87P14 (overlapping with RH160C14 on the opposite end) showed a FISH pattern identical to RH160C14 (Figure S2), indicating that RH87P14 and RH160C14 span the same CNV (Figure 4(b)). Based on the overlap between RH87P14 and RH160C14, this CNV spans approximately 216 kb. Notably, RH87P14 generated only a single FISH signal in the diploid clone RH (Figure 4(b)), whereas RH160C14 produced the expected two FISH signals in RH, suggesting that the two chromosome 6 homologs of the RH clone contain different variants of this CNV locus.
Transmission of RH102I10-associated CNV in potato
We were interested in the transmission of the CNVs identified in Katahdin and Atlantic. The BAC clone RH102I10 hybridized to only two of the four chromosomes in both cultivars. We conducted FISH analysis with this BAC in 17 Atlantic haploids (2n = 2x = 24) and four Katahdin haploids. These haploid clones were derived from the female gametes of the corresponding tetraploid potato genotypes by crossing with ‘haploid extraction clones’ of diploid Solanum species (Hougas and Peloquin, 1957; Hougas et al., 1964). Thus, each haploid receives two of the four copies of chromosome 6 from its parental clone. If the four chromosome 6 homologs segregate randomly and are transmitted to the female gametes, then there are three different types of haploids that will have 0, 1, or 2 RH102I10 FISH signals, respectively. The expected ratio of these three types of haploids is 1:4:1. Strikingly, the ratio of these three types of FISH signal patterns among the 21 haploids was 0:16:5 (Table 3). Thus, the two chromosome 6 homologs lacking the RH102I10 DNA fragment were not simultaneously transmitted to any of the 21 haploids analyzed, which is significantly fewer than expected under the predicted ratio of 1 (haploids with no signal):5 (haploids with signal) (binomial test, P < 0.02174).
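The expected 1:4:1 ratio and the probability of seeing no zero-signal haploid can be reproduced with a short calculation. The following is a minimal sketch, assuming purely random pairing of the four homologs and a one-sided probability that none of the 21 haploids falls in the zero-signal class; the variable names are illustrative and are not taken from the original analysis.

```python
from fractions import Fraction
from itertools import combinations

# Four chromosome 6 homologs in the tetraploid parent:
# two carry the RH102I10 fragment (1) and two lack it (0).
homologs = [1, 1, 0, 0]

# Each haploid receives two of the four homologs; enumerate every possible pair.
pairs = list(combinations(range(len(homologs)), 2))
counts = {0: 0, 1: 0, 2: 0}
for i, j in pairs:
    counts[homologs[i] + homologs[j]] += 1

# Expected ratio of haploids with 0, 1 or 2 FISH signals.
expected_ratio = (counts[0], counts[1], counts[2])

# Probability that a single haploid has no signal under random segregation.
p_zero = Fraction(counts[0], len(pairs))

# Probability that none of 21 independent haploids has zero signals.
n_haploids = 21
p_none_observed = float((1 - p_zero) ** n_haploids)

print(expected_ratio, round(p_none_observed, 5))
```

Under these assumptions the enumeration yields six equally likely pairs, one of which carries no copy of the fragment, and (5/6) raised to the 21st power is about 0.0217, in line with the value reported for the binomial test.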
Table 3. Transmission of RH102I10-associated copy number variants to haploid Atlantic and Katahdin clones
Numbers of haploids with
0 FISH signals
1 FISH signal
2 FISH signals
Atlantic haploids (17)
Katahdin haploids (4)
We also analyzed the pedigrees of four of the cultivars examined by FISH using RH102I10. For Kennebec, MegaChip, Ranger Russet, and White Pearl, FISH data were available for the parental or other clones within the pedigree (Figure S3). All sampled clones had at least one copy of RH102I10, suggesting that a minimum copy number of one is a requisite for the level of vigor permitted in cultivated tetraploid potato.
Transcriptional analysis of genes associated with the CNV locus spanned by BAC RH102I10
The BAC clone RH102I10 aligned to superscaffold PGSC0003DMB000000461 of the DM reference genome (The Potato Genome Sequencing Consortium, 2011) and a total of 19 genes were annotated within the region spanning the 138-kb insert of RH102I10. We selected four single copy genes from this region (Table S2, PGSC0003DMG400017574 (P574), PGSC0003DMG400017575 (P575), PGSC0003DMG400017577 (P577), PGSC0003DMG400017582 (P582)) to examine the impact of CNV on gene expression. All four of these genes are broadly expressed across a wide range of developmental, abiotic stress, and biotic stress tissues (The Potato Genome Sequencing Consortium, 2011) and we used quantitative RT-PCR (qRT-PCR) of leaf tissues from 18 haploid/diploid (2x) potato clones and 16 tetraploid (4x) potato cultivars to assess the impact of CNV on transcript abundance. Transcripts of the α-tubulin and actin-97 genes were used separately to normalize the relative abundance of each target transcript in each potato line using six replications (two biological replications, each with three technical replications). Normalized transcript levels for the four genes are shown in Figures 5, S4 and S5. The non-parametric Kolmogorov–Smirnov (K–S) test was used to evaluate differences in the gene expression levels between genotypes with the same ploidy level (2x or 4x) but with different copy numbers of RH102I10. Specifically, within the 2x genotypes, the comparison was performed between genotypes with two copies of RH102I10 and genotypes with only a single copy of RH102I10. Similarly, for 4x lines, comparisons were made between genotypes with four copies versus three copies, three copies versus two copies, and two copies versus one copy of RH102I10. The results were considered significant when the P-values were less than 0.01 in both datasets that were normalized by the α-tubulin and actin-97 genes, respectively.
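The comparison scheme can be illustrated with a minimal sketch of the two-sample K–S statistic and the two-reference significance rule. The expression values below are hypothetical and the helper names are assumptions made for illustration; the analysis in this study was run in R, and the sketch computes only the test statistic, not its P-value.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: the largest gap between the two empirical CDFs."""
    points = sorted(set(sample_a) | set(sample_b))
    d_max = 0.0
    for x in points:
        cdf_a = sum(v <= x for v in sample_a) / len(sample_a)
        cdf_b = sum(v <= x for v in sample_b) / len(sample_b)
        d_max = max(d_max, abs(cdf_a - cdf_b))
    return d_max


def significant_in_both(p_tub, p_act, alpha=0.01):
    """Apply the rule from the text: P < alpha under both normalizations."""
    return p_tub < alpha and p_act < alpha


# Hypothetical normalized transcript levels for two copy-number groups.
one_copy = [0.80, 0.90, 1.00, 1.10, 0.95, 1.05]
two_copies = [1.90, 2.10, 2.00, 2.20, 1.80, 2.05]

d = ks_statistic(one_copy, two_copies)
print(d)
```

Requiring significance under both the α-tubulin and the actin-97 normalization is a conservative choice: a difference that appears with only one reference gene is not counted.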
The K–S tests support that, at each ploidy level, genotypes with more copies of RH102I10 consistently showed significantly higher transcript levels of all four genes than genotypes with fewer copies. For example, 2x potato clones with two copies of RH102I10 showed a significantly higher expression level of P582 compared to the 2x clones with only a single copy of RH102I10 (Figure 5(a)). Similarly, the 4x potato cultivars with a single copy of RH102I10 showed the lowest expression level of P582 compared to the 4x potato cultivars with multiple copies of RH102I10 (Figure 5(b)). Similar results were obtained for the other three genes (Figures S4 and S5, see Table 4 for a list of P-values). Based on linear regression analysis, we found that the normalized transcript levels of these four genes correlated positively and significantly with RH102I10 copy number in different genotypes (Figure 6). This pattern was consistent for both reference genes used for normalization (Table 4).
Table 4. P-values from Kolmogorov–Smirnov (K–S) statistical tests
Comparisons of the level of gene expression were made between groups of potato lines with the same ploidy but different copy numbers of the RH102I10 DNA fragment. The first four genes (P574 to P582) are located within the RH102I10 DNA fragment. The last three genes, which mapped to a non-CNV region (BAC RH60H14), were used as controls. Expression levels were obtained using alpha-tubulin and actin-97 as reference genes. Differences in the expression level of the different groups were considered statistically significant when the obtained P-values were <0.01 for both delta Ct data sets calculated using alpha-tubulin and actin-97 for normalization.
2x = diploid lines; 4x = tetraploid lines; 2 vs. 1, …, 4 vs. 1 = two copies vs. one copy of the RH102I10 DNA fragment, …, four copies vs. one copy of the RH102I10 DNA fragment; tub = Ct values normalized using alpha-tubulin; act = Ct values normalized using actin-97.
As a control, we also analyzed the expression level of three genes (Table S2, PGSC0003DMG400026963 (P963), PGSC0003DMG400027047 (P047), PGSC0003DMG401026960 (P960)) that mapped to the genomic region corresponding to RH60H14, a BAC clone that hybridized to all four chromosome 6 homologs in all potato cultivars examined and therefore lacks CNV (Figure 1). The same genotypes were used to perform similar K–S tests within each ploidy level (Figures 5(c,d), S6 and S7) and no significant differences were detected in the 2x clones (Table 4). Among the 4x cultivars, a single case of significant difference was found for the expression level of P960 between cultivars that had a single copy and those that had two copies of RH102I10 (P < 0.015; Table 4). No significant differences were detected in any other comparisons (Table 4).
There is limited information on the extent of CNVs in plant genomes. Sequencing of multiple A. thaliana lines revealed 1059 CNV loci covering 2.2 Mb of the reference Col-0 genome (Cao et al., 2011). These CNV loci account for approximately 2% of the A. thaliana genome, with only 393 CNVs overlapping with coding sequences (Cao et al., 2011). A microarray-based survey among 19 diverse maize inbreds and 14 genotypes of teosinte, the progenitor of maize, revealed that approximately 8% of the maize genes are associated with CNVs (Swanson-Wagner et al., 2010). Our FISH-based survey revealed that six of the 18 (33%) randomly selected potato BACs were associated with CNVs. However, this survey most likely underestimates the extent of CNVs in the potato genome for several reasons. First, the FISH technique is limited in both resolution and sensitivity. As a consequence, a CNV associated with only a portion of the cloned BAC fragment may not be revealed by the FISH method. Small CNVs, such as those detected in A. thaliana using re-sequencing approaches, will be missed by the FISH method. Second, our estimates of the length of each CNV locus most likely underestimate the true CNV size, as the FISH probes were limited to the fragment length within the BAC clones. Indeed, examination of an overlapping BAC from the chromosome 6 minimal tiling path extended the size of one estimated CNV locus. Third, the 18 BACs used in the CNV survey are all located within the euchromatic regions of potato chromosome 6 (Iovene et al., 2008). Low-recombination regions in maize showed elevated frequencies of CNV (Swanson-Wagner et al., 2010), and the pericentromeric heterochromatin of potato, in which recombination is suppressed, may contain more CNVs than we observed by FISH using BACs derived from euchromatic regions.
Thus, it is evident that while we have demonstrated an extensive degree of CNV in terms of frequency and size of the CNV loci, we most likely have under-estimated the number and diversity of CNV in the potato genome.
A haploid-based transmission study revealed that female gametes with fewer copies of the RH102I10-associated CNV may be inferior compared with those with more copies of the CNV (Table 3). The Atlantic and Katahdin haploids used in the analysis were selected by the Wisconsin potato breeding program from a large number of haploids based mainly on their vigor and fertility. All 21 Atlantic and Katahdin haploids analyzed contain at least one copy of RH102I10. Thus, haploids lacking both copies of RH102I10 may never be produced. The RH102I10-associated CNV may contain genes important to female gamete function; thus, gametes may need at least one copy of RH102I10 to be functional. Alternatively, haploids without a single copy of RH102I10 may be inferior in vigor, and these haploids may all have been eliminated during selection. Interestingly, qRT-PCR analysis showed that all four genes associated with this CNV display a dosage effect in transcript levels (Figure 6). In general, more copies of these genes result in more transcripts, which may ultimately affect gamete fitness or plant vigor. If a gene(s) in such large CNVs plays a regulatory role, then the CNV, which is similar to a segmental aneuploid, may have a stronger impact on growth and development because the CNV will affect the expression of other genes in the same regulatory pathway (Birchler and Veitia, 2007).
The CNVs detected in A. thaliana and maize by DNA sequencing and microarrays were primarily a few kb in length (Swanson-Wagner et al., 2010; Cao et al., 2011). In contrast, the CNVs detected by FISH using BAC clones in potato are large, greater than 100 kb. The autotetraploid nature of the potato genome provides great potential for retention of deleterious and dysfunctional mutations and deletions compared to diploid species. Furthermore, vegetative propagation enhances the retention rate of mutations that negatively impact gamete development and transmission. The accumulation of mutations in potato is also reflected by the fact that inbreeding of most potato cultivars results in dramatic yield depression and, primarily, in sterile and weak progeny. We hypothesize that large CNVs are less likely to survive in sexually reproducing diploid species and predict that vegetatively propagated, autopolyploid species such as potato have a higher frequency of CNVs than the majority of diploid and allopolyploid plant species.
Cultivar improvement was the most important contribution in the twentieth century to yield increases in the USA for most major crops, including maize, wheat, barley, and cotton (Fehr, 1984). However, an assessment of potato cultivars developed in the nineteenth century compared to those developed in the USA from 1932 to 1991 revealed no yield difference under modern field management practices (Douches et al., 1996). The six-fold yield increase of potato in the USA during the twentieth century was attributed to the use of disease-free seed, nitrogen fertilizer, irrigation, and pest management (Douches et al., 1996). It is intriguing that modern plant breeding did not result in significant yield improvement in potato during the same period while it led to advances for nearly every other major crop. In fact, Russet Burbank, a clonal selection of ‘Burbank’ released in 1874, still accounts for nearly half of the United States potato acreage.
The vigor and yield of potato cultivars rely on breeders' efforts to maintain the ‘maximum heterozygosity’ in the genome (Carputo et al., 2003). However, the molecular basis of the maximum heterozygosity has been obscure. An early study of enzyme polymorphism in 94 North American potato cultivars revealed an average of 2.13 alleles per locus (Douches et al., 1989); this allelic complexity was supported in subsequent sequence-based analyses (Simko et al., 2004; Kuang et al., 2005). These early studies suggest that the maximum heterozygosity is probably composed of a maximum number of alleles at each locus and possibly non-additive genetic variance. Our current study shows that CNV, including extreme CNV in which three of the four copies of a locus are absent, may be another significant contributor to potato maximum heterozygosity. A significant number of deletions have accumulated in the potato genome, as shown at the genome level in analysis of the doubled monoploid DM and the heterozygous diploid RH (The Potato Genome Sequencing Consortium, 2011) and now in this study by FISH with cultivated potato. These deletions, although not lethal, can have a negative effect on plant vigor, especially if genes associated with CNV have dosage effects. Thus, a high yielding potato cultivar may need to combine not only superior alleles, but also advantageous CNV loci. This finding may explain why modern potato breeding has been very ineffective in developing high yielding cultivars. Genome-wide mapping and characterization of CNVs will play an important role in future breeding of potato.
In total, 16 potato cultivars developed by different breeding programs in North America and Europe (Table 2) were chosen for a FISH-based CNV survey. Haploids of Atlantic and Katahdin were developed and maintained by the University of Wisconsin potato breeding program. The homozygous doubled monoploid clone DM was derived from anther culture (Lightbourn and Veilleux, 2007) and is the source for the potato reference genome (The Potato Genome Sequencing Consortium, 2011). Diploid clones RH and SH were parental clones for a potato genetic mapping population (van Os et al., 2006). The BAC clones (Table S1) used in the CNV survey have been mapped previously to the euchromatic regions of potato chromosome 6 (Iovene et al., 2008).
Chromosome preparation and FISH were performed following published protocols (Dong et al., 2000; Iovene et al., 2008). BAC DNA was labeled with either biotin-16-UTP or digoxigenin-11-dUTP (Roche Diagnostics, Indianapolis, IN, USA) using a standard nick translation reaction. Chromosomes were counterstained with 4′,6-diamidino-2-phenylindole (DAPI) in VectaShield antifade solution (Vector Laboratories, Burlingame, CA, USA). The FISH images were processed using Meta Imaging Series 7.5 software. The final contrast of the images was adjusted using Adobe Photoshop CS3 software.
Quantification of transcripts of genes associated with CNVs
Using annotation of the DM genome (The Potato Genome Sequencing Consortium, 2011) (http://solanaceae.plantbiology.msu.edu/), primers were designed for four annotated DM genes that were collinear with RH102I10 and three genes collinear with RH60H14 (Table S2). All primer efficiencies were estimated using template dilutions and the equation E = 10^(−1/slope); all were in the 1.8–2.2 range. qRT-PCR was performed using two biological replicates and three technical replicates. For each genotype, two or three tubers were planted in the greenhouse. Seven to 12 days after emergence, one or two fully expanded terminal leaflets from the top of each plant were collected and immediately frozen in liquid nitrogen. Leaf tissue from two or three plants of the same genotype was pooled and RNA was extracted using the Qiagen plant RNeasy kit (Valencia, CA, USA) and the on-column DNase digestion according to the manufacturer's instructions. RNA was then treated with Turbo DNA-free (Ambion/Applied Biosystems, Houston, TX, USA) and RNA quality/quantity and integrity were evaluated using a NanoDrop spectrophotometer and agarose gel electrophoresis, respectively. SuperScript III reverse transcriptase (Invitrogen, Carlsbad, CA, USA) and oligo(dT) primers were used to generate the first-strand cDNA. All RT-PCR reactions and subsequent amplicon melting curves were performed in triplicate using Dynamo SYBR Green master on the Opticon 2 of MJ Research (Waltham, MA, USA).
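The efficiency estimate can be sketched as follows. This is a minimal illustration, assuming that the slope is obtained by least-squares regression of Ct on the log10 of the relative template amount; the dilution series and Ct values are hypothetical and the function name is an assumption made for this example.

```python
import math


def primer_efficiency(dilutions, ct_values):
    """Estimate amplification efficiency E = 10^(-1/slope) from a dilution series.

    The slope comes from a least-squares fit of Ct against log10(relative template).
    """
    xs = [math.log10(d) for d in dilutions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return 10 ** (-1 / slope)


# Hypothetical ten-fold dilution series and the Ct measured at each step.
dilutions = [1.0, 0.1, 0.01, 0.001]
ct_values = [20.0, 23.32, 26.64, 29.96]

efficiency = primer_efficiency(dilutions, ct_values)
print(round(efficiency, 3))
```

A slope near −3.32 cycles per ten-fold dilution corresponds to an efficiency close to 2, that is, a doubling of product in each cycle, which falls inside the 1.8–2.2 range accepted here.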
The comparative Ct method and the reference genes α-tubulin and actin-97 were used to calculate the normalized expression level, according to the formula 2^(−ΔCT), where ΔCT = CT (target gene, using the CT of a single technical replicate) − CT (actin-97 or α-tubulin; for each genotype the CT value of each reference gene was obtained by averaging the CT values of the three technical replicates). Thus, for each genotype, six ΔCT values (three for each biological replicate), normalized either by α-tubulin or actin-97, were obtained and used in the subsequent analysis. Statistical analysis was performed in the R statistical analysis environment (Yuan et al., 2006). Within each ploidy group (2x and 4x), the non-parametric K–S statistical test was used to test differences in the gene expression pattern between groups of genotypes with different copy numbers of RH102I10. Specifically, within the diploid lines, the comparison was carried out between the expression pattern of the genotypes with two copies of the RH102I10 DNA fragment and genotypes with a single copy. Similarly, for the tetraploids, comparisons were made between genotypes with four copies versus three copies, three copies versus two copies, and two copies versus one copy. Differences in the expression pattern of different ploidy-CNV groups were considered statistically significant when the obtained P-values were <0.01 for both ΔCT datasets calculated using α-tubulin and actin-97 for normalization. As a control, a similar analysis using the same ploidy-CNV groups was performed with expression patterns of genes mapping outside the CNV region.
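The normalization step can be illustrated with a minimal sketch for a single biological replicate of one genotype. The Ct values are hypothetical and the function name is an assumption made for this example; each technical replicate of the target gene is compared with the mean Ct of the reference gene, as described above.

```python
from statistics import mean


def normalized_expression(target_cts, reference_cts):
    """Return 2^(-dCT) for each technical replicate of the target gene.

    dCT = CT(target, single technical replicate) - mean CT(reference gene).
    """
    reference_mean = mean(reference_cts)
    return [2 ** -(ct - reference_mean) for ct in target_cts]


# Hypothetical Ct values: three technical replicates each.
target_cts = [24.0, 24.5, 23.5]     # target gene, e.g. one gene within the CNV
reference_cts = [21.5, 22.0, 22.5]  # reference gene, e.g. alpha-tubulin

values = normalized_expression(target_cts, reference_cts)
print([round(v, 4) for v in values])
```

Repeating the calculation for the second biological replicate gives the six values per genotype used in the K–S comparisons, and repeating it with the other reference gene gives the second normalized dataset.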
This work was supported by grants DBI-0923640 and ISO-1237969 from the National Science Foundation to C.R.B. and J.J.