A multivariate analysis of variation in genome size and endoreduplication in angiosperms reveals strong phylogenetic signal and association with phenotypic traits


Author for correspondence:

Jillian D. Bainard

Tel: +1 519 824 4120 extn. 56002

Email: jillian.bainard@gmail.com


  • Genome size (C-value) and endopolyploidy (endoreduplication index, EI) are known to correlate with various morphological and ecological traits, in addition to phylogenetic placement. A phylogenetically controlled multivariate analysis was used to explore the relationships between DNA content and phenotype in angiosperms.
  • Seeds from 41 angiosperm species (17 families) were grown in a common glasshouse experiment. Genome size (2C-value and 1Cx-value) and EI (in four tissues: leaf, stem, root, petal) were determined using flow cytometry. The phylogenetic signal was calculated for each measure of DNA content, and phylogenetic canonical correlation analysis (PCCA) explored how the variation in genome size and EI was correlated with 18 morphological and ecological traits.
  • Phylogenetic signal (λ) was strongest for EI in all tissues, and λ was stronger for the 2C-value than the 1Cx-value. PCCA revealed that EI was correlated with pollen length, stem height, seed mass, dispersal mechanism, arbuscular mycorrhizal association, life history and flowering time, and EI and genome size were both correlated with stem height and life history.
  • PCCA provided an effective way to explore multiple factors of DNA content variation and phenotypic traits in a phylogenetic context. Traits that were correlated significantly with DNA content were linked to plant competitive ability.


The amount of DNA present in nuclei varies widely across extant organisms, but is not correlated with organismal complexity (the ‘C-value enigma’; Gregory, 2001). Several factors can contribute to the variation in DNA content, including both gains and losses of DNA (Hawkins et al., 2008; Grover & Wendel, 2010). Genome size is often measured as the 1Cx-value (DNA present in one chromosome complement) or the 2C-value (DNA present in the sporophytic or somatic phase of the life cycle). DNA content can also vary within an organism as a result of endoreduplication, in which DNA replication is not followed by nuclear or cell division (Comai, 2005). This results in nuclei at various ploidy levels in certain cells and tissues (Nagl, 1978), a situation known as endopolyploidy (or somatic polyploidy). The degree of endopolyploidy in a tissue sample is quantified as the mean number of endoreduplication cycles undergone per nucleus, called the endoreduplication index (EI) or cycle value (Barow & Meister, 2003).

The influence of DNA content on the phenotype, independent of the information encoded by the DNA, is attributed to the ‘nucleotype’ (Bennett, 1971, 1972). The nucleotypic theory states that, at the most basic level, nuclei must be large enough to contain the DNA present in a cell, and the cell must be large enough to contain the nucleus. In plants, this creates a direct relationship between genome size and meristematic cell size, which, in turn, can extend to other aspects of organism morphology and ecology. Knight et al. (2005) reviewed the literature on the many factors that have been linked to DNA content variation in plants, such as species diversity, altitude, latitude, temperature, precipitation, seed mass, various leaf anatomical traits, generation time and growth rate. On the basis of this literature survey, together with new analyses presented in the paper, the authors proposed the ‘large genome constraint’ hypothesis (Knight et al., 2005). This hypothesis suggests that there are costs associated with having a large genome saturated with noncoding DNA, and predicts that organisms with small genomes will have a wide range of morphological characteristics and be found in varied habitats, whereas organisms with large genomes will be comparatively constrained.

Several recent large-scale analyses have explored the relationships between genome size and other plant traits. Beaulieu et al. (2007) examined the correlation between seed size and genome size using a dataset of 1222 seed plants. They found a quadratic relationship, in which species with small 2C-values (and 1Cx-values) had a range of seed sizes, whereas species with large DNA content did not have small seeds. Beaulieu et al. (2008) confirmed the correlation between genome size and both guard and epidermal cell size in a large set of angiosperms. Knight et al. (2010) found a slight positive relationship between pollen size and gametic DNA content in a study of 464 species, but this relationship was lost once phylogenetic independent contrasts were applied. Gruner et al. (2010) found a significant negative correlation between genome size and root meristem growth rate.

If DNA content is correlated with these various aspects of plant morphology, it is reasonable to expect that endopolyploid nuclei have similar effects. In fact, genome size itself is correlated with endopolyploidy, as species with smaller genomes are more likely to have endopolyploid nuclei (Nagl, 1978; Barow & Meister, 2003). As with genome size, the degree of endopolyploidy often correlates positively with cell size (e.g. Melaragno et al., 1993), but it does not entirely control final cell size (Sugimoto-Shirasu & Roberts, 2003; Cookson et al., 2006). This relationship with cell size may, in turn, influence organ size, although this pattern is also not strict (Mizukami, 2001). Endoreduplication can be initiated at various stages of plant development (De Veylder et al., 2011), and endopolyploid nuclei are often associated with specific organs and tissues (Joubès & Chevalier, 2000). Various environmental factors have been suggested to influence endopolyploidization in a plant, including temperature (Engelen-Eigles et al., 2000; Jovtchev et al., 2007), light (Kinoshita et al., 2008), drought (Setter & Flannigan, 2001) and salinity (Ceccarelli et al., 2006).

Endopolyploidy is also correlated with life history strategy in angiosperms (Barow & Meister, 2003). One hypothesis for this relationship derives from the observation that organisms with small genomes have a shorter cell cycle and may have a shorter time to anthesis, which is beneficial in short growing seasons. Larger genomes can result in larger cells, which favor fast plant growth by cell expansion early in the growing season (Barow & Meister, 2003). Endopolyploidy may confer a benefit under certain life history strategies by combining both of these features. Angiosperm annuals and biennials are more likely than perennials to be endopolyploid. The effect of endopolyploidy, however, is confounded by shared evolutionary history; the strongest predictor of the presence of endoreduplication is familial placement, or phylogenetic relatedness (Barow & Meister, 2003). Endopolyploidy occurs predominantly in some plant groups and families, whereas it is almost absent in others.

It is likely that many factors influence DNA content; however, most previous approaches have used univariate analyses (e.g. Knight et al., 2010), which cannot compare the relative influence of many traits and other factors simultaneously. Multivariate analysis allows us to determine the relative contributions of many morphological and ecological factors in relation to a variable such as DNA content. Balao et al. (2011) used such a comparative approach to interpret the role of polyploidy and genome size in phenotypic trait variation in Dianthus broteri. Furthermore, because closely related species share both similar genome sizes and similar morphological traits as a result of recent common ancestry, phylogenetic analyses allow researchers to interpret how shared evolutionary history might shape these factors.

The objective of this study was to explore the relationships between DNA content and phenotypic traits in a phylogenetic context. Specifically, this research: (1) quantifies variation among species with respect to two measures of DNA content: genome size (1Cx-value and 2C-value) and EI (in four tissue types: leaves, stems, roots and petals); (2) determines the phylogenetic signal in each measure of DNA content alone and in a combined multivariate analysis; and (3) uses phylogenetic canonical correlation analysis (PCCA) to observe the relationships between DNA content and a suite of 18 morphological and ecological traits in a phylogenetic context. To accomplish these objectives, we used flow cytometry to estimate genome size and the degree of endopolyploidy in 41 angiosperm species from 17 families.

Materials and Methods

Experimental design

Seeds from three individuals each of 41 herbaceous plant species, representing 17 families, were collected from local populations in Guelph, ON, Canada. The 41 species were all common plants readily available for collection (and germinated successfully), and were selected to have a range in morphology and habitat. Seeds were surface sterilized (incubated for 1 min in 10% bleach and 95% ethanol solutions) and added to pots (Deepots, Stuewe and Sons, Tangent, OR, USA) with potting soil (Sunshine Mix #4, Sun Gro Horticulture, Vancouver, BC, Canada) that had been sterilized by autoclaving twice (45 min at 121°C). After addition of the seeds, a layer of sterile Turface (Profile Products LLC, Buffalo Grove, IL, USA) was placed over the seeds. Pots were placed randomly on a bench at the University of Guelph Phytotron, watered as required and fertilized every 2 wk. Growing temperatures were maintained at 22–24°C during the day and 16–18°C at night, with a 16-h photoperiod. After emergence, seedlings were thinned to one plant per pot, resulting in six individuals per species, two from each parent plant from which the seeds were collected. Plants were grown until the flowering stage, which ranged between 12 and 36 wk, depending on the species. Several perennial species did not mature beyond the rosette stage and were harvested before flowering. A voucher of each plant specimen was deposited in the University of Guelph OAC Herbarium.

Flow cytometric analyses

Determination of genome size

The genome size (2C-value) was estimated using healthy tissue from young leaves. Of the six individuals growing for each species, three plants were randomly selected, and the genome size was estimated for each plant on two separate days. To determine the genome size, fresh leaf tissue from a standard with known DNA content was co-prepared with the sample tissue. Seeds for standards were acquired from the Laboratory of Molecular Cytogenetics and Cytometry, Olomouc, Czech Republic, and the standards were grown in the University of Guelph Phytotron. The sample and standard tissues were co-chopped in 1.2 ml cold LB01 buffer (Doležel et al., 1989) in the presence of 100 μg ml−1 propidium iodide and 0.5 μg ml−1 RNase (methods determined by preliminary tests). The resulting homogenate was filtered through a 30-μm mesh, resulting in c. 1.0 ml of sample. Samples were incubated on ice for 20 min.

Flow cytometric analysis was completed on a Partec CyFlow SL (Partec GmbH, Münster, Germany) equipped with a blue solid-state laser tuned at 20 mW and operating at 488 nm. Before each use, the instrument was calibrated using 3-μm calibration beads (Partec). Relative nuclear fluorescence was measured at 590/50 nm on a linear scale. Fluorescence was also plotted against side scatter (a measure of particle granularity or internal complexity), and polygons were drawn around the clusters of nuclei to further isolate the nuclei of interest. This was carried out using FloMax software (version 2.52; Partec). More than 1000 nuclei were acquired for both the standard and sample peaks, and coefficients of variation averaged below 5% for both peaks. Genome size was calculated as:

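$$\text{Sample 2C-value (pg)} = \frac{\text{sample G}_1\text{ peak mean}}{\text{standard G}_1\text{ peak mean}} \times \text{standard 2C-value (pg)}$$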
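The 2C-value of a sample follows from the ratio of its G1 peak position to that of the co-prepared standard, multiplied by the standard's known 2C-value. The Python sketch below is a minimal illustration rather than the FloMax workflow: it assumes the gated G1 events of each peak have been exported as arrays of fluorescence channel values, and the peak positions in the usage line are hypothetical numbers chosen only to show the arithmetic.

```python
import numpy as np

def peak_stats(fluorescence):
    """Mean channel position and coefficient of variation (%) of one gated G1 peak."""
    f = np.asarray(fluorescence, dtype=float)
    mean = f.mean()
    cv_percent = 100.0 * f.std(ddof=1) / mean
    return mean, cv_percent

def sample_2c(sample_peak_mean, standard_peak_mean, standard_2c_pg):
    """2C-value of the sample (pg), proportional to the co-prepared internal standard."""
    return sample_peak_mean / standard_peak_mean * standard_2c_pg

# Hypothetical peak positions; Glycine max 'Polanka' (2C = 2.50 pg) as the standard
print(round(sample_2c(304.0, 200.0, 2.50), 2))  # prints 3.8
```

Using the peak means (rather than modes) is a choice made for the sketch; the CV returned by `peak_stats` is the quantity the text reports as averaging below 5%.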

The six genome size estimates were averaged to produce one estimate for each species. To calculate the 1Cx-value, an extensive literature review was conducted to determine the ploidy level of each species and the base chromosome number (x) of each genus. For species with multiple recorded ploidy levels, published genome size data and distribution maps were used to infer the most likely level. The measured 2C-value was then divided by the inferred ploidy level of the plant: for example, the 2C-value of a diploid was divided by two and that of a tetraploid by four. We recognize that our 1Cx-values are likely to contain errors because we did not have original chromosome counts; however, this rough estimate is still useful in our exploratory analysis.
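Averaging the replicate estimates and converting to a 1Cx-value is simple arithmetic; the Python sketch below makes it explicit, with illustrative function names. Because the 1Cx-value is a constant fraction of the 2C-value, its standard error scales by the same factor, which is consistent with the SEs in Table 1 (e.g. 0.064/4 = 0.016 for Achillea millefolium).

```python
from math import sqrt
from statistics import mean, stdev

def species_mean_se(estimates):
    """Mean and standard error of replicate 2C estimates (pg) for one species."""
    return mean(estimates), stdev(estimates) / sqrt(len(estimates))

def monoploid_1cx(two_c_mean, two_c_se, ploidy):
    """1Cx-value: 2C divided by the inferred ploidy level (2 for diploids, 4 for tetraploids).

    Dividing by a constant scales the standard error by the same factor.
    """
    if ploidy < 1:
        raise ValueError("ploidy level must be a positive integer")
    return two_c_mean / ploidy, two_c_se / ploidy

# Table 1 lists Achillea millefolium at 2C = 11.72 +/- 0.064 pg and 1Cx = 2.93 +/- 0.016 pg,
# which corresponds to an inferred ploidy level of 4
print(monoploid_1cx(11.72, 0.064, 4))  # (2.93, 0.016)
```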

Determination of EI

The plant tissues used for the analysis of endoreduplication were selected to be representative of each plant and comparable across species. Although endopolyploidy is known to vary across different regions of the same tissue or organ (Barow & Jovtchev, 2007), an average estimate of the degree of endopolyploidy (over tissues that were relatively comparable between species) was suitable for this study. Most importantly, the tissue selected was mature enough to express final levels of endoreduplication, but still fresh and healthy enough for flow cytometry. Leaf tissue was selected from the mid-region of the plant, unless the plant did not bolt. When species had small leaves, several leaves were chopped together; when species had large leaves, fresh tissue was cut from the middle of the leaf blade. Healthy stem tissue was selected from the middle of the plant; however, for several species that only had rosettes (e.g. Plantago, Taraxacum), peduncle tissue was used instead of stem, and, for plants that did not bolt, petiole tissue was used. Although endopolyploidy differs among these tissues, their values are not expected to differ greatly from those of stems (Barow & Jovtchev, 2007). Petals were collected from all plants that flowered; to acquire sufficient nuclei, petals from multiple flowers were used if necessary. In the case of extremely small flowers (e.g. Chenopodium), the entire flower head was chopped. Root tissue was collected from throughout the soil depth, avoiding the root tips. All root samples from an individual plant were analyzed on the same day, as the plant had to be harvested completely to sample the root tissue.

To determine the degree of endopolyploidy, two samples of each tissue were collected from each of six individuals on separate days, giving 12 samples per tissue type; the exception was root tissue, for which two samples were taken from each of three individuals. Flow cytometric analysis followed the procedure given above, except that no standard tissue was included in the nuclei suspension and at least 1500 nuclei were analyzed in each sample. An example of a flow cytometry histogram demonstrating endoreduplication is shown in Supporting Information Fig. S1. To determine the number of nuclei (n) at each ploidy level, debris was first gated out on the side scatter vs fluorescence (log) plot, and gates were then drawn around each cluster of nuclei to obtain an accurate count. The degree of endopolyploidy was quantified by calculating the cycle value, or EI, which is the mean number of endoreduplication cycles per nucleus among the nuclei measured. It is calculated according to the following formula (Barow & Meister, 2003):

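$$\mathrm{EI} = \frac{0 \cdot n_{2C} + 1 \cdot n_{4C} + 2 \cdot n_{8C} + 3 \cdot n_{16C} + \cdots}{n_{2C} + n_{4C} + n_{8C} + n_{16C} + \cdots}$$

where n_{2C}, n_{4C}, n_{8C}, … are the numbers of nuclei at the 2C, 4C, 8C, … ploidy levels.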
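A Python sketch of the cycle-value calculation, assuming nuclei have already been gated into peaks and the counts are supplied as a dictionary keyed by ploidy level in C units (a hypothetical input format). Each nucleus contributes log2(C/2) cycles, so a 2C nucleus contributes 0, a 4C nucleus 1, an 8C nucleus 2, and so on.

```python
def endocycles(c_level):
    """Endoreduplication cycles implied by a ploidy peak: 2C -> 0, 4C -> 1, 8C -> 2, ..."""
    if c_level < 2 or c_level & (c_level - 1):
        raise ValueError(f"{c_level}C is not a power-of-two ploidy level")
    return c_level.bit_length() - 2

def endoreduplication_index(counts):
    """EI (cycle value): mean number of endocycles per nucleus.

    counts maps each ploidy level in C units (2, 4, 8, ...) to the number of
    nuclei gated in that peak.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no nuclei counted")
    return sum(endocycles(c) * n for c, n in counts.items()) / total

# Hypothetical gate counts for one tissue sample (1500 nuclei in total)
print(endoreduplication_index({2: 900, 4: 450, 8: 150}))  # 0.5
```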

Histograms with nuclei at more than two ploidy levels clearly exhibit endopolyploidy. However, samples with only two peaks of nuclei could reflect nuclei at different stages of the cell cycle (that is, 4C nuclei in the G2 phase) or doublets (two nuclei stuck together as they pass through the flow channel of the cytometer). To distinguish endopolyploidy from doublets, Barow & Meister (2003) proposed a threshold of EI = 0.1, above which values indicate endopolyploid nuclei. In addition, all tissues selected for flow cytometric analysis in this study were fully differentiated and should not have been undergoing mitosis. This is supported by the lack of nuclei in the S phase of the cell cycle, which would appear between the first and second peaks in the flow histogram (Fig. S1).
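The decision rule for two-peak samples can be written as a small function. This is one reading of the criteria in the paragraph above, not code from Barow & Meister (2003); the number of ploidy peaks is an input the analyst supplies after gating, and the peak counts in the usage example are hypothetical because Table 2 does not report them.

```python
EI_THRESHOLD = 0.1  # cut-off proposed by Barow & Meister (2003)

def shows_endopolyploidy(ei, n_ploidy_peaks, threshold=EI_THRESHOLD):
    """Classify one sample.

    More than two ploidy peaks is taken as unambiguous endopolyploidy. With two
    peaks, the second could be G2 nuclei or doublets, so EI must exceed the threshold.
    """
    if n_ploidy_peaks > 2:
        return True
    if n_ploidy_peaks == 2:
        return ei > threshold
    return False

# EI values from Table 2; the peak counts are hypothetical (Table 2 does not report them)
samples = {
    "Lotus corniculatus petal": (0.00, 1),
    "Chenopodium album leaf": (0.07, 2),
    "Medicago sativa root": (0.10, 2),
    "Sisymbrium altissimum stem": (1.13, 4),
}
for name, (ei, peaks) in samples.items():
    print(f"{name}: {shows_endopolyploidy(ei, peaks)}")
```

Note that a value exactly at the threshold (0.10) is not classed as endopolyploid under a strict "above" reading of the cut-off.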

Phylogenetic reconstruction

To construct a phylogeny of the 41 taxa studied, sequence data were obtained from GenBank and supplemented by de novo sequencing. Sequence data for portions of the plastid genes coding for rbcL, matK and rpoC1 of many of the species of interest were available and downloaded from GenBank (Table S1). Almost all of the data were derived from populations close to where the plants were sampled. Gaps in the species × region matrix were filled by de novo sequencing of these plastid regions. DNA was isolated from c. 10 mg of dried leaf material using the PlantII DNA extraction kit (Macherey-Nagel, Düren, Germany). The three regions (rbcL, matK and rpoC1) were PCR-amplified using primers with broad taxonomic coverage (Fazekas et al., 2008). The 20-μl PCR mixtures contained 2 μl of 10 × buffer, 2.0 mM magnesium chloride, 0.2 mM deoxynucleoside triphosphates (dNTPs), 0.5 μM of each primer, 0.5 U AmpliTaq Gold polymerase (Applied Biosystems, Carlsbad, CA, USA) and 1.0 μl of template DNA (15–30 ng μl−1). Amplification of each gene region was performed using the following thermal cycling protocol: initial denaturation at 95°C for 1 min, 35 cycles of 95°C for 30 s, annealing at 52–55°C for 40 s and extension at 72°C for 1 min, followed by a final extension at 72°C for 5 min and a final hold at 4°C. The resulting amplicons were sequenced directly on both strands using the same primers as employed in the PCR. The cycle sequencing reactions were performed in 10-μl volumes containing 1.9 μl of 5 × sequencing buffer, 1 μM primer, 0.5 μl of BigDye terminator mix v3.1 and 0.5 μl of PCR product. These reactions were run using the following thermal cycling protocol: initial denaturation at 96°C for 2 min, 30 cycles of 96°C for 30 s, annealing at 55°C for 15 s and extension at 60°C for 4 min, followed by a final hold at 4°C.
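The PCR recipe can be turned into pipetting volumes with C1V1 = C2V2. The text gives final concentrations but not stock concentrations, so the stocks in the sketch below (25 mM MgCl2, 10 mM dNTPs, 10 μM primers, 5 U μl−1 polymerase) and the 10% master-mix overage are assumptions chosen for illustration. Under these assumptions, 2 μl of 10 × buffer gives the 1× final concentration stated in the text and the remaining 12.9 μl is water.

```python
FINAL_VOLUME_UL = 20.0

# (final concentration, stock concentration) in matching units.
# Final concentrations are from the text; the STOCK concentrations are assumptions.
STOCKS = {
    "10x buffer": (1.0, 10.0),       # x
    "MgCl2": (2.0, 25.0),            # mM, assumed 25 mM stock
    "dNTPs": (0.2, 10.0),            # mM, assumed 10 mM stock
    "forward primer": (0.5, 10.0),   # uM, assumed 10 uM stock
    "reverse primer": (0.5, 10.0),   # uM, assumed 10 uM stock
}

def stock_volume(final_conc, stock_conc, final_volume=FINAL_VOLUME_UL):
    """Solve C1 * V1 = C2 * V2 for the stock volume V1 (ul)."""
    return final_conc * final_volume / stock_conc

def pcr_recipe(polymerase_units=0.5, polymerase_u_per_ul=5.0, template_ul=1.0):
    """Per-reaction volumes (ul) for one 20-ul PCR, topped up with water."""
    recipe = {name: stock_volume(final, stock) for name, (final, stock) in STOCKS.items()}
    recipe["AmpliTaq Gold"] = polymerase_units / polymerase_u_per_ul  # assumed 5 U/ul stock
    recipe["template DNA"] = template_ul
    water = FINAL_VOLUME_UL - sum(recipe.values())
    if water < 0:
        raise ValueError("components exceed the final reaction volume")
    recipe["water"] = water
    return recipe

def master_mix(n_reactions, overage=0.10):
    """Scale everything except template to n reactions plus a (hypothetical) pipetting overage."""
    per_reaction = pcr_recipe()
    per_reaction.pop("template DNA")  # template is added to each tube separately
    return {name: vol * n_reactions * (1 + overage) for name, vol in per_reaction.items()}

for name, vol in pcr_recipe().items():
    print(f"{name:>15}: {vol:.2f} ul")
```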

We obtained bidirectional sequence reads from most PCR products, but, in a small number of cases, either the forward or reverse sequencing reaction consistently failed. For these samples, a minimum of two-fold coverage was obtained by sequencing twice in one direction. Sequence contigs were assembled and edited using Sequencher 4.8 (Gene Codes Corporation, Ann Arbor, MI, USA). DNA sequences were aligned using the default settings of the ClustalW algorithm (Thompson et al., 1994) as implemented in BioEdit (Hall, 2007) and adjusted manually if required. Sequence data from each of the three loci were concatenated before phylogenetic analysis.

Bayesian inference of phylogeny was performed using a Metropolis-coupled Markov chain Monte Carlo (MCMC) approach in MrBayes 3.2 (Ronquist & Huelsenbeck, 2003). For each region, the best-fitting nucleotide substitution model was determined using the Akaike Information Criterion (AIC) as implemented in MrModeltest 2.3 (Nylander, 2004). In the Bayesian MCMC model, each region was assigned its own model of nucleotide substitution (GTR + Γ + I for all three regions). Model parameters (including overall rate) were unlinked across partitions. Two independent Bayesian MCMC analyses were conducted using flat priors and four chains (three heated and one cold). Each analysis was run for 5 000 000 generations and trees were sampled from the cold chain every 500 generations. Consensus trees from both independent runs were nearly identical, and so a 70% consensus tree with posterior node probabilities and median branch lengths was calculated from the pooled sample trees (discarding the first 5% as burn-in) using sumTrees 3.3.1 (DendroPy 3.11.0; Sukumaran & Holder, 2010). This is the tree that was used in subsequent analyses.
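With 5 000 000 generations sampled every 500 generations, each run yields about 10 000 trees, of which the first 5% (about 500) are discarded, leaving roughly 19 000 pooled trees. The Python sketch below shows burn-in removal and the 70% clade-frequency rule in simplified form: each sampled tree is represented as a set of clades (frozensets of tip labels), which assumes rooted trees, and branch lengths are ignored. SumTrees itself works on split bipartitions read from tree files; this is not its implementation.

```python
from collections import Counter

def pool_after_burnin(runs, burnin_fraction=0.05):
    """Discard the first `burnin_fraction` of the samples in each run, then pool the rest."""
    pooled = []
    for trees in runs:
        n_burn = int(len(trees) * burnin_fraction)
        pooled.extend(trees[n_burn:])
    return pooled

def consensus_clades(trees, threshold=0.70):
    """Clades found in at least `threshold` of the sampled trees, mapped to their
    frequency (the clade's posterior probability). Each tree is a set of clades
    and each clade is a frozenset of tip labels, so rooted trees are assumed.
    Any two clades above 50% are compatible, so the result always defines a tree."""
    n = len(trees)
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade: k / n for clade, k in counts.items() if k / n >= threshold}

# Toy posterior samples for four tips A, B, C, D (hypothetical)
AB, ABC, ABD = frozenset("AB"), frozenset("ABC"), frozenset("ABD")
run1 = [{ABD}] + [{AB, ABC}] * 19
run2 = [{ABD}] + [{AB, ABC}] * 15 + [{AB, ABD}] * 4
pooled = pool_after_burnin([run1, run2])
for clade, pp in sorted(consensus_clades(pooled).items(), key=lambda kv: -kv[1]):
    print("".join(sorted(clade)), round(pp, 3))
```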

Morphological and ecological traits

To determine the relationship between morphological and ecological traits and DNA content, 18 traits were selected in addition to the six measures of DNA content. These traits included seven continuous variables (pollen length, leaf length, cotyledon length, stem height, seed length, seed mass and seed number) and 11 categorical variables (native or introduced, distribution, dispersal mechanism, root type, arbuscular mycorrhizal (AM) association, reproductive strategy, life history, flowering time, floral biology, light preference and soil preference). These traits were selected to reflect characters that have previously been found to correlate with genome size or endoreduplication, or that might correlate with DNA content through their association with broader characters (e.g. flowering time is related to minimum generation time, which is known to be associated with genome size). The trait matrix was compiled from various resources, including scientific papers, the Flora of North America, the Biology of Canadian Weeds series, FOIBIS (Newmaster & Ragupathy, 2012) and other peer-reviewed data sources. Where literature sources gave different values, they were averaged. Numerical traits were log-transformed before analysis to better meet the assumptions of linearity and homoscedasticity. All trait data can be found in Table S2. Missing data were imputed using the Bayesian principal component analysis estimation method from the Bioconductor package ‘pcaMethods’ (Stacklies et al., 2007), with the full set of traits as predictors.
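The log transformation and imputation steps can be sketched in Python with NumPy. Note the swap: the paper used Bayesian PCA (BPCA) from the Bioconductor package ‘pcaMethods’, whereas the sketch below uses simple iterative PCA (SVD) imputation, which alternates between fitting a low-rank PCA model and replacing missing cells with their reconstruction. It illustrates the idea of PCA-based imputation; it is not BPCA. The trait values in the usage example are hypothetical.

```python
import numpy as np

def log_transform(X):
    """log10-transform strictly positive numerical traits; NaN marks a missing value."""
    X = np.asarray(X, dtype=float)
    observed = X[~np.isnan(X)]
    if np.any(observed <= 0):
        raise ValueError("log transform needs strictly positive values")
    return np.log10(X)

def iterative_pca_impute(X, n_components=2, max_iter=500, tol=1e-10):
    """Iterative PCA (SVD) imputation, a stand-in for BPCA.

    Missing cells start at their column means, then are repeatedly replaced by
    the rank-`n_components` PCA reconstruction until they stop changing.
    Observed cells are never modified.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(max_iter):
        mu = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        k = n_components
        recon = mu + (U[:, :k] * s[:k]) @ Vt[:k]
        updated = np.where(missing, recon, X)
        change = np.max(np.abs(updated - filled))
        filled = updated
        if change < tol:
            break
    return filled

# Hypothetical traits: stem height (cm), seed mass (mg), pollen length (um); one value missing
traits = np.array([
    [45.0, 0.8, 30.0],
    [120.0, 2.5, np.nan],
    [30.0, 0.3, 22.0],
    [80.0, 1.9, 35.0],
    [60.0, 1.1, 28.0],
])
print(np.round(iterative_pca_impute(log_transform(traits), n_components=1), 3))
```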

Phylogenetic signal

The phylogenetic signal (Pagel's λ) was estimated for DNA content and for the suite of morphological and ecological traits. The univariate phylogenetic signal was estimated separately for each of the DNA content measures (2C-value, 1Cx-value and four measures of EI). In addition, Revell & Harrison (2008) described a method for the maximum likelihood estimation of a multivariate version of Pagel's λ (Pagel, 1999; Freckleton et al., 2002), which can be used as a general scalar for the interior branch lengths of a tree to accommodate deviations from Brownian motion. The multivariate λ cannot be estimated when the dataset contains more variables than observations. However, the estimate of λ is invariant to rotational and scalar transformations, so the maximum likelihood estimate of the multivariate λ for any dataset is identical to the estimate obtained from its full set of principal components. For a dataset with more variables than observations, an approximate estimate of the multivariate λ can therefore be obtained from the subset of components with the largest eigenvalues, as was done here. Multivariate λ was estimated separately for the DNA content matrix, the trait matrix and the full dataset using a subset of the principal components (16 components for each; c. 85% of the variation explained).
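A minimal Python sketch of maximum likelihood estimation of a multivariate Pagel's λ, assuming an ultrametric tree summarized by its phylogenetic covariance matrix C (shared root-to-ancestor path lengths). Here λ multiplies the off-diagonal elements of C, the likelihood is that of multivariate Brownian motion with a GLS mean and an ML rate matrix, and λ is found by grid search rather than the optimizer used by Revell & Harrison (2008). The balanced 16-tip tree and the simulated traits are hypothetical. The last usage lines check the invariance property stated above: rotating the data, as a PCA does, leaves the estimate unchanged.

```python
import numpy as np

def lambda_transform(C, lam):
    """Pagel's lambda: multiply the off-diagonal (shared-history) entries of the
    phylogenetic covariance matrix C by lam, leaving the diagonal unchanged."""
    C = np.asarray(C, dtype=float)
    C_lam = lam * C
    np.fill_diagonal(C_lam, np.diag(C))
    return C_lam

def mv_bm_loglik(X, C):
    """Maximized log-likelihood of multivariate Brownian motion, vec(X) ~ N(1a, R kron C)."""
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    C_inv = np.linalg.inv(C)
    ones = np.ones((n, 1))
    a = np.linalg.solve(ones.T @ C_inv @ ones, ones.T @ C_inv @ X)  # phylogenetic (GLS) means
    E = X - ones @ a
    R = E.T @ C_inv @ E / n                                       # ML evolutionary rate matrix
    logdet_C = np.linalg.slogdet(C)[1]
    logdet_R = np.linalg.slogdet(R)[1]
    return -0.5 * (n * m * np.log(2 * np.pi) + m * logdet_C + n * logdet_R + n * m)

def estimate_lambda(X, C, grid=np.linspace(0.0, 1.0, 101)):
    """Grid-search ML estimate of a single lambda shared by all columns of X."""
    logliks = np.array([mv_bm_loglik(X, lambda_transform(C, lam)) for lam in grid])
    best = int(np.argmax(logliks))
    return grid[best], logliks[best]

def balanced_tree_vcv(depth):
    """Hypothetical test tree: 2**depth tips, every branch of length 1.
    Entry (i, j) is the path length from the root to the MRCA of tips i and j."""
    n = 2 ** depth
    return np.array([[depth - (i ^ j).bit_length() for j in range(n)] for i in range(n)],
                    dtype=float)

rng = np.random.default_rng(1)
C = balanced_tree_vcv(4)                                  # 16 tips
X = np.linalg.cholesky(C) @ rng.standard_normal((16, 3))  # 3 traits simulated under BM (lambda = 1)
lam_hat, ll = estimate_lambda(X, C)
print(f"lambda = {lam_hat:.2f}, logL = {ll:.2f}")

Q = np.linalg.qr(rng.standard_normal((3, 3)))[0]          # random rotation of trait space
lam_rot, ll_rot = estimate_lambda(X @ Q, C)               # same lambda as for X
```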

Multivariate analysis

Canonical correlation analysis (CCA) is a multivariate technique that identifies the pairs of canonical axes with the highest correlation between two data matrices. However, CCA cannot be performed when one of the matrices has more variables than observations (as in the dataset analyzed here), because its covariance matrix is then singular and cannot be inverted. This problem can be overcome by adding a regularization step, in which a constant is added to the diagonal of the trait covariance matrices and the regularization parameters are optimized using a ‘leave-one-out’ cross-validation procedure (González et al., 2008). This type of regularization has been used to analyze large gene expression datasets, which often contain many more variables than observations (e.g. González et al., 2009; Soneson et al., 2011).
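A minimal NumPy sketch of ridge-regularized CCA, in which λx and λy are added to the diagonals of the two within-set covariance matrices before whitening; the canonical correlations are then the singular values of the whitened cross-covariance. We assume this matches the regularization in the R package ‘CCA’ in spirit, but it has not been checked against that package. The random data have more X variables than observations, the case in which unregularized CCA fails; the values λx = 0.35 and λy = 0.48 reported for the final model are reused purely for illustration.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse symmetric square root of a positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

def regularized_cca(X, Y, lam_x, lam_y):
    """Ridge-regularized CCA. Returns canonical correlations (descending) and the
    coefficient matrices A (for X) and B (for Y)."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    n = X.shape[0]
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1) + lam_x * np.eye(X.shape[1])  # singular without lam_x when p > n
    Syy = Yc.T @ Yc / (n - 1) + lam_y * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    U, rho, Vt = np.linalg.svd(Kx @ Sxy @ Ky, full_matrices=False)
    return rho, Kx @ U, Ky @ Vt.T

rng = np.random.default_rng(0)
n = 20
latent = rng.standard_normal(n)  # shared signal linking the two simulated matrices
X = np.outer(latent, rng.standard_normal(30)) + rng.standard_normal((n, 30))  # 30 variables > 20 observations
Y = np.outer(latent, rng.standard_normal(6)) + rng.standard_normal((n, 6))
rho, A, B = regularized_cca(X, Y, lam_x=0.35, lam_y=0.48)
print(np.round(rho, 3))
```

With λx = λy = 0 and more observations than variables, the same function reduces to ordinary CCA.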

In comparative analyses, it is also necessary to incorporate information on the evolutionary relationships between species (e.g. Felsenstein, 1985) because species data (usually) do not represent independent observations. The phylogenetic generalized least-squares (PGLS) method (e.g. Rohlf, 2001) is a flexible approach for including phylogenetic information in comparative analyses, and this method has been incorporated into several multivariate techniques, including PCCA (Revell & Harrison, 2008). PCCA cannot be implemented when there are more variables than observations, but, here, PCCA is adapted to include a regularization step as described above to permit an analysis of the correlation between variation in DNA content and trait variation.
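PGLS-based PCCA can be viewed as ordinary CCA computed from evolutionary covariance matrices, (X − 1a)ᵀC⁻¹(X − 1a)/(n − 1), where a holds the GLS (phylogenetic) means and C is the phylogenetic covariance matrix. The Python sketch below forms these blocks by removing the GLS mean and premultiplying by C^(−1/2); it is our illustration of the idea in Revell & Harrison (2008), not their code. The divisor (n − 1) cancels in the canonical correlations, and the resulting blocks would replace the ordinary covariance matrices in a (regularized) CCA without re-centering. The 4-species tree is hypothetical.

```python
import numpy as np

def gls_whiten(X, C):
    """Subtract the phylogenetic (GLS) mean and premultiply by C^(-1/2), so that
    Z.T @ Z equals (X - 1a)' C^-1 (X - 1a)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    C_inv = np.linalg.inv(C)
    ones = np.ones((n, 1))
    a = np.linalg.solve(ones.T @ C_inv @ ones, ones.T @ C_inv @ X)  # 1 x m GLS means
    w, V = np.linalg.eigh(C)
    C_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
    return C_inv_sqrt @ (X - ones @ a)

def evolutionary_covariances(X, Y, C):
    """Evolutionary covariance blocks Sxx, Syy, Sxy for PCCA (no re-centering of Z)."""
    Zx, Zy = gls_whiten(X, C), gls_whiten(Y, C)
    d = Zx.shape[0] - 1
    return Zx.T @ Zx / d, Zy.T @ Zy / d, Zx.T @ Zy / d

# Hypothetical 4-species tree ((A,B),(C,D)) written as its covariance matrix
C = np.array([[2.0, 1.0, 0.0, 0.0],
              [1.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 2.0, 0.5],
              [0.0, 0.0, 0.5, 2.0]])
rng = np.random.default_rng(2)
X, Y = rng.standard_normal((4, 2)), rng.standard_normal((4, 3))
Sxx, Syy, Sxy = evolutionary_covariances(X, Y, C)
print(np.round(Sxy, 3))
```

With C equal to the identity (a star phylogeny with equal branch lengths), the blocks reduce to the ordinary sample covariances.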

A phylogenetically informed multivariate analysis of the relationships between the variation in DNA content (genome size: 2C-value and 1Cx-value; EI in four tissue types: leaf, stem, root and petal) and the remaining morphological and ecological traits was performed in R version 2.15.0 (R Development Core Team, 2012) using the R packages ‘ape’ (Paradis et al., 2004), ‘ade4’ (Dray & Dufour, 2007), ‘caper’ (Orme et al., 2012), ‘CCA’ (González et al., 2008), ‘geiger’ (Harmon et al., 2008) and ‘phytools’ (Revell, 2012). To obtain an estimate of the multivariate version of Pagel's λ (Revell & Harrison, 2008), λ was estimated on the first 16 principal components of the standardized data, using the Hill–Smith method for scaling categorical indicator variables (Hill & Smith, 1976). These 16 components captured c. 85% of the variation in the original traits. This estimate of λ was then used as an interior branch-length scalar for the tree in subsequent analyses. A regularized PCCA was performed to explore the relationships between the variation in DNA content and trait variation. The regularization parameters (λx and λy, the constants added to the covariance diagonals, which are distinct from Pagel's λ) were estimated using the ‘leave-one-out’ cross-validation procedure described in González et al. (2008). The cross-validation statistic formed a smooth surface over the explored parameter space (λx and λy allowed to vary independently between 0 and 1 at intervals of 0.01), with a single maximum at λy = 0.48 and λx = 0.35. These regularization parameters were used in the final regularized PCCA model. Species scores are presented in the phylogenetically dependent species space (Revell, 2009). Axis loadings were explored by PGLS correlation analysis (for numerical traits) and PGLS ANOVA (for categorical traits), that is, in the phylogenetically independent species space.
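A sketch of a leave-one-out search for the regularization parameters. It assumes, as we understand the ‘CCA’ package's cross-validation to work, that the score is the correlation between the first canonical scores of each left-out observation, computed from a model fitted to the remaining observations; the sign alignment across folds is our own addition. The grid starts above 0 because, with more variables than observations, a zero ridge leaves that covariance matrix singular. The paper's 0.01 grid has 101 × 101 = 10 201 combinations; the coarse grid and simulated data here only keep the example fast.

```python
import numpy as np

def _inv_sqrt(S):
    """Inverse symmetric square root of a positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

def first_canonical_pair(X, Y, lam_x, lam_y):
    """Coefficient vectors of the first pair of ridge-regularized canonical variates."""
    n = X.shape[0]
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1) + lam_x * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + lam_y * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = _inv_sqrt(Sxx), _inv_sqrt(Syy)
    U, _, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    return Kx @ U[:, 0], Ky @ Vt[0]

def loo_cv_score(X, Y, lam_x, lam_y):
    """Correlation between left-out first canonical scores across all n folds."""
    n = X.shape[0]
    a_ref, _ = first_canonical_pair(X, Y, lam_x, lam_y)
    x_scores, y_scores = np.empty(n), np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        a, b = first_canonical_pair(X[keep], Y[keep], lam_x, lam_y)
        if a @ a_ref < 0:  # SVD signs are arbitrary: flip the pair jointly to match the full fit
            a, b = -a, -b
        x_scores[i] = (X[i] - X[keep].mean(axis=0)) @ a
        y_scores[i] = (Y[i] - Y[keep].mean(axis=0)) @ b
    return np.corrcoef(x_scores, y_scores)[0, 1]

def grid_search(X, Y, grid):
    """Return (best score, lam_x, lam_y) over all pairs of grid values."""
    return max((loo_cv_score(X, Y, lx, ly), lx, ly) for lx in grid for ly in grid)

rng = np.random.default_rng(3)
n = 15
latent = rng.standard_normal(n)  # shared signal linking the two simulated matrices
X = np.outer(latent, rng.standard_normal(20)) + rng.standard_normal((n, 20))  # 20 variables > 15 observations
Y = np.outer(latent, rng.standard_normal(4)) + rng.standard_normal((n, 4))
grid = np.linspace(0.1, 1.0, 4)
score, lam_x, lam_y = grid_search(X, Y, grid)
print(f"best CV score {score:.3f} at lambda_x = {lam_x:.2f}, lambda_y = {lam_y:.2f}")
```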


In the 41 species analyzed, DNA content estimates varied considerably. 2C-values ranged from 0.43 pg (Erysimum cheiranthoides) to 11.72 pg (Achillea millefolium), and 1Cx-values from 0.21 pg (E. cheiranthoides) to 3.25 pg (Crepis tectorum) (Table 1). Leaf, stem and root EIs were obtained for all 41 species (Table 2). Petal EI was calculated for 29 species, as not all plants flowered during the study. The highest EI was found in the stem tissue of Sisymbrium altissimum (1.13), and the lowest in the petal tissue of Lotus corniculatus (0.00) (Table 2). The phylogenetic reconstruction of the 41 species is shown in Fig. S2.

Table 1. Genome size estimates: 2C-value and 1Cx-value for 41 species from 17 families

Species | Family | 2C-value ± SE (pg) | 1Cx-value ± SE (pg) | Standard (a)
Chenopodium album | Amaranthaceae | 3.80 ± 0.007 | 0.63 ± 0.001 | Glycine max
Daucus carota | Apiaceae | 1.04 ± 0.010 | 0.52 ± 0.005 | Solanum lycopersicum
Asclepias syriaca | Apocynaceae | 0.86 ± 0.003 | 0.43 ± 0.002 | G. max
Achillea millefolium | Asteraceae | 11.72 ± 0.064 | 2.93 ± 0.016 | Pisum sativum
Arctium minus | Asteraceae | 4.41 ± 0.045 | 2.20 ± 0.023 | P. sativum
Cichorium intybus | Asteraceae | 2.76 ± 0.040 | 1.38 ± 0.020 | Zea mays
Cirsium arvense | Asteraceae | 3.02 ± 0.006 | 1.51 ± 0.003 | G. max
Cirsium vulgare | Asteraceae | 5.45 ± 0.010 | 1.36 ± 0.003 | P. sativum
Crepis tectorum | Asteraceae | 6.51 ± 0.020 | 3.25 ± 0.010 | G. max
Erigeron philadelphicus | Asteraceae | 4.61 ± 0.007 | 2.31 ± 0.003 | Z. mays
Lactuca serriola | Asteraceae | 5.91 ± 0.019 | 2.96 ± 0.010 | P. sativum
Solidago flexicaulis | Asteraceae | 4.00 ± 0.021 | 2.00 ± 0.011 | G. max
Symphyotrichum lanceolatum | Asteraceae | 4.17 ± 0.035 | 0.69 ± 0.006 | Z. mays
Taraxacum officinale | Asteraceae | 2.64 ± 0.019 | 0.88 ± 0.006 | S. lycopersicum
Tripleurospermum inodorum | Asteraceae | 9.72 ± 0.079 | 2.43 ± 0.020 | Z. mays
Cynoglossum officinale | Boraginaceae | 1.25 ± 0.007 | 0.62 ± 0.004 | Z. mays
Brassica nigra | Brassicaceae | 1.11 ± 0.006 | 0.56 ± 0.003 | Z. mays
Erucastrum gallicum | Brassicaceae | 2.12 ± 0.019 | 0.53 ± 0.005 | Z. mays
Erysimum cheiranthoides | Brassicaceae | 0.43 ± 0.003 | 0.21 ± 0.001 | Raphanus sativus
Lepidium campestre | Brassicaceae | 0.71 ± 0.003 | 0.35 ± 0.002 | R. sativus
Sisymbrium altissimum | Brassicaceae | 0.56 ± 0.006 | 0.28 ± 0.003 | S. lycopersicum
Thlaspi arvense | Brassicaceae | 1.10 ± 0.007 | 0.55 ± 0.004 | Z. mays
Cerastium fontanum | Caryophyllaceae | 5.69 ± 0.017 | 0.71 ± 0.002 | P. sativum
Silene latifolia | Caryophyllaceae | 5.97 ± 0.008 | 2.99 ± 0.004 | P. sativum
Lotus corniculatus | Fabaceae | 2.46 ± 0.025 | 0.62 ± 0.006 | S. lycopersicum
Medicago lupulina | Fabaceae | 1.41 ± 0.007 | 0.71 ± 0.003 | S. lycopersicum
Medicago sativa | Fabaceae | 3.73 ± 0.017 | 0.93 ± 0.004 | G. max
Trifolium pratense | Fabaceae | 1.00 ± 0.003 | 0.50 ± 0.002 | G. max
Geranium robertianum | Geraniaceae | 2.44 ± 0.021 (b) | 1.22 ± 0.011 | Z. mays
Leonurus cardiaca | Lamiaceae | 1.68 ± 0.004 | 0.84 ± 0.002 | G. max
Nepeta cataria | Lamiaceae | 1.23 ± 0.005 | 0.61 ± 0.003 | S. lycopersicum
Epilobium parviflorum | Onagraceae | 0.75 ± 0.020 | 0.37 ± 0.010 | R. sativus
Oenothera biennis | Onagraceae | 2.28 ± 0.006 | 1.14 ± 0.003 | Z. mays
Plantago lanceolata | Plantaginaceae | 2.85 ± 0.050 | 1.42 ± 0.025 | S. lycopersicum
Plantago major | Plantaginaceae | 1.46 ± 0.015 | 0.73 ± 0.007 | S. lycopersicum
Setaria viridis | Poaceae | 1.10 ± 0.006 | 0.55 ± 0.003 | G. max
Persicaria maculosa | Polygonaceae | 3.71 ± 0.023 | 0.93 ± 0.006 | G. max
Rumex crispus | Polygonaceae | 4.59 ± 0.009 | 0.77 ± 0.001 | Z. mays
Geum aleppicum | Rosaceae | 3.03 ± 0.019 | 1.51 ± 0.010 | S. lycopersicum
Linaria vulgaris | Scrophulariaceae | 1.97 ± 0.010 | 0.98 ± 0.005 | G. max
Urtica dioica | Urticaceae | 1.17 ± 0.007 (b) | 0.58 ± 0.004 (b) | S. lycopersicum

Taxa are arranged alphabetically within families and values reported are averages over all replicates ± standard error (SE) of the mean.
(a) 2C-values for the standards are: Raphanus sativus ‘Saxa’, 1.11 pg (Doležel et al., 1998); Solanum lycopersicum ‘Stupicke polni tyckove rane’, 1.96 pg (Doležel et al., 1992); Glycine max ‘Polanka’, 2.50 pg (Doležel et al., 1994); Zea mays ‘CE-777’, 5.43 pg (Lysák & Doležel, 1998); Pisum sativum ‘Citrad’, 9.09 pg (Doležel et al., 1998).
(b) Estimate based on less than full replicates.
Table 2. Endoreduplication index (EI) for four tissue types (leaf, stem, root and petal) from 41 species
Species | Leaf EI ± SE | Stem EI ± SE | Root EI ± SE | Petal EI ± SE
1. Chenopodium album | 0.07 ± 0.003 | 0.34 ± 0.016 | 0.24 ± 0.010 | 0.33 ± 0.008ᵃ
2. Daucus carota | 0.02 ± 0.004ᵇ | 0.01 ± 0.003ᶜ | 0.01 ± 0.001 | –
3. Asclepias syriaca | 0.24 ± 0.016 | 0.57 ± 0.027 | 0.87 ± 0.008 | –
4. Achillea millefolium | 0.02 ± 0.002ᵇ | 0.01 ± 0.002ᵇ | 0.02 ± 0.001ᵇ | 0.02 ± 0.001ᵇ
5. Arctium minus | 0.04 ± 0.009ᵇ | 0.03 ± 0.004ᶜ | 0.02 ± 0.001 | –
6. Cichorium intybus | 0.01 ± 0.001ᵇ | 0.01 ± 0.001ᶜ | 0.02 ± 0.002 | –
7. Cirsium arvense | 0.03 ± 0.001 | 0.02 ± 0.001 | 0.02 ± 0.002 | 0.04 ± 0.004
8. Cirsium vulgare | 0.02 ± 0.002 | 0.01 ± 0.001ᶜ | 0.01 ± 0.001 | –
9. Crepis tectorum | 0.02 ± 0.003 | 0.02 ± 0.001 | 0.02 ± 0.002 | 0.02 ± 0.004
10. Erigeron philadelphicus | 0.03 ± 0.004 | 0.01 ± 0.001ᶜ | 0.02 ± 0.002 | –
11. Lactuca serriola | 0.01 ± 0.002 | 0.02 ± 0.002 | 0.02 ± 0.001 | 0.03 ± 0.003
12. Solidago flexicaulis | 0.05 ± 0.008 | 0.02 ± 0.002 | 0.02 ± 0.002 | 0.07 ± 0.011ᵃ,ᵇ
13. Symphyotrichum lanceolatum | 0.05 ± 0.005 | 0.03 ± 0.002 | 0.01 ± 0.001 | 0.01 ± 0.001
14. Taraxacum officinale | 0.03 ± 0.008 | 0.04 ± 0.006ᶜ | 0.02 ± 0.002 | 0.01 ± 0.001
15. Tripleurospermum inodorum | 0.02 ± 0.002 | 0.02 ± 0.003ᶜ | 0.01 ± 0.001 | –
16. Cynoglossum officinale | 0.20 ± 0.010 | 1.04 ± 0.031ᶜ | 0.57 ± 0.022 | –
17. Brassica nigra | 0.53 ± 0.114 | 0.66 ± 0.048 | 0.42 ± 0.024ᵇ | 0.43 ± 0.047
18. Erucastrum gallicum | 0.26 ± 0.045 | 0.80 ± 0.050 | 0.40 ± 0.017 | 0.31 ± 0.039
19. Erysimum cheiranthoides | 0.59 ± 0.052 | 0.86 ± 0.044 | 0.67 ± 0.013 | 0.40 ± 0.027
20. Lepidium campestre | 0.23 ± 0.033ᵇ | 0.56 ± 0.033 | 0.26 ± 0.016 | 0.20 ± 0.012ᵃ,ᵇ
21. Sisymbrium altissimum | 0.50 ± 0.030 | 1.13 ± 0.032 | 0.46 ± 0.023 | 0.80 ± 0.030ᵇ
22. Thlaspi arvense | 0.65 ± 0.094 | 1.08 ± 0.042 | 0.69 ± 0.011 | 0.42 ± 0.034
23. Cerastium fontanum | 0.12 ± 0.018 | 0.38 ± 0.021 | 0.12 ± 0.006 | 0.15 ± 0.010
24. Silene latifolia | 0.26 ± 0.021 | 0.86 ± 0.076 | 0.28 ± 0.022 | 1.11 ± 0.044
25. Lotus corniculatus | 0.05 ± 0.005 | 0.08 ± 0.009 | 0.21 ± 0.016 | 0.00 ± 0.000
26. Medicago lupulina | 0.04 ± 0.002 | 0.35 ± 0.013 | 0.36 ± 0.016 | 0.12 ± 0.008ᵃ
27. Medicago sativa | 0.05 ± 0.007 | 0.13 ± 0.014 | 0.10 ± 0.013 | 0.15 ± 0.021ᵇ
28. Trifolium pratense | 0.13 ± 0.023 | 0.59 ± 0.032 | 0.47 ± 0.006 | 0.47 ± 0.029ᵃ
29. Geranium robertianum | 0.19 ± 0.025 | 0.98 ± 0.057ᶜ | 0.15 ± 0.026 | 0.05 ± 0.007
30. Leonurus cardiaca | 0.09 ± 0.018 | 0.04 ± 0.003 | 0.57 ± 0.005 | –
31. Nepeta cataria | 0.05 ± 0.003 | 0.03 ± 0.003 | 0.08 ± 0.005 | 0.02 ± 0.003
32. Epilobium parviflorum | 0.07 ± 0.025 | 0.04 ± 0.004 | 0.02 ± 0.002 | 0.01 ± 0.002
33. Oenothera biennis | 0.04 ± 0.007 | 0.01 ± 0.002ᶜ | 0.02 ± 0.001 | –
34. Plantago lanceolata | 0.05 ± 0.003 | 0.09 ± 0.009ᶜ | 0.14 ± 0.015 | 0.65 ± 0.051ᵃ
35. Plantago major | 0.04 ± 0.009 | 0.01 ± 0.002ᶜ | 0.01 ± 0.001 | 0.25 ± 0.032ᵃ
36. Setaria viridis | 0.07 ± 0.010 | 0.50 ± 0.022 | 0.62 ± 0.012 | 0.27 ± 0.013ᵃ
37. Persicaria maculosa | 0.01 ± 0.003 | 0.03 ± 0.009 | 0.11 ± 0.008 | 0.03 ± 0.015ᵃ
38. Rumex crispus | 0.06 ± 0.003 | 0.48 ± 0.032ᶜ | 0.14 ± 0.008 | –
39. Geum aleppicum | 0.11 ± 0.014 | 0.38 ± 0.014ᶜ | 0.09 ± 0.004 | –
40. Linaria vulgaris | 0.06 ± 0.005 | 0.05 ± 0.012 | 0.02 ± 0.001 | 0.03 ± 0.004
41. Urtica dioica | 0.20 ± 0.008 | 0.41 ± 0.015 | 0.23 ± 0.018 | 0.34 ± 0.031ᵃ

Taxa are arranged alphabetically within families (see Table 1), and values reported are averages over all replicates ± SE of the mean; –, no value reported.
ᵃ Flower heads were used instead of petals.
ᵇ Estimate based on fewer than full replicates.
ᶜ Peduncles or petioles were used instead of stems (no stem as a result of plant morphology or lack of bolting).
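The EI values in Table 2 follow the definition used throughout this study: the mean number of endoreduplication cycles undergone per nucleus (Barow & Meister, 2003). A minimal sketch of that arithmetic is given below, assuming nucleus counts have already been gated into successive flow-cytometry peaks (2C, 4C, 8C, ...). The function names and the counts are hypothetical and are not the authors' code or data; the 0.1 cut-off is the one used to separate endopolyploid from nonendopolyploid species in Fig. 3.

```python
def endoreduplication_index(peak_counts):
    """Mean number of endoreduplication cycles per nucleus.

    peak_counts[i] is the number of nuclei in the peak at 2 * 2**i C
    (index 0 = 2C, 1 = 4C, 2 = 8C, ...), i.e. nuclei that have gone
    through i cycles. G2 nuclei in the 4C peak are not separated out.
    """
    total = sum(peak_counts)
    if total == 0:
        raise ValueError("no nuclei counted")
    cycles = sum(i * count for i, count in enumerate(peak_counts))
    return cycles / total


def is_endopolyploid(ei, cutoff=0.1):
    """Classification used for the closed symbols in Fig. 3 (EI > 0.1)."""
    return ei > cutoff


# Hypothetical nucleus counts for the 2C, 4C, 8C and 16C peaks of one sample
counts = [6000, 3000, 800, 200]
ei = endoreduplication_index(counts)
print(f"EI = {ei:.2f}, endopolyploid: {is_endopolyploid(ei)}")
# prints: EI = 0.52, endopolyploid: True
```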

Phylogenetic signal

Estimates of λ varied among the six univariate measures of DNA content (Table 3). A λ of zero indicates no phylogenetic signal and a λ of one indicates trait evolution consistent with Brownian motion; the high λ estimates for EI in all tissue types (0.85–0.94) therefore indicated strong phylogenetic signal, although petals had a lower λ (0.85) than the other tissues (0.93–0.94). For genome size, the 2C-value had stronger phylogenetic signal (λ = 0.74) than the 1Cx-value (λ = 0.54), although both were lower than the EI estimates. The multivariate estimate of λ incorporating all measures of DNA content indicated strong phylogenetic signal (λ = 0.71), whereas the multivariate λ for the trait data was lower (λ = 0.52; Table 3).
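Pagel's λ acts on the phylogenetic variance–covariance matrix C: the off-diagonal elements (branch lengths shared by pairs of species) are multiplied by λ while the diagonal is left unchanged, so λ = 0 removes all shared history and λ = 1 recovers Brownian motion. The sketch below illustrates how λ enters the likelihood, assuming a Brownian-motion model with a GLS (phylogenetic) mean and a maximum-likelihood rate, and finds the MLE by a simple grid search. It is a hedged illustration rather than the authors' code; the four-species tree and trait values are made up, and it assumes NumPy is available.

```python
import numpy as np


def pagel_lambda_transform(C, lam):
    """Multiply shared-branch (off-diagonal) covariances by lambda; keep the diagonal."""
    C = np.asarray(C, dtype=float)
    C_lam = lam * C
    np.fill_diagonal(C_lam, np.diag(C))
    return C_lam


def loglik_brownian(y, C):
    """Log-likelihood of trait values y under Brownian motion with covariance C.

    The mean is the GLS (phylogenetic) mean and the rate sigma^2 is set to its
    maximum-likelihood value, so the overall scale of C cancels out.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    C_inv = np.linalg.inv(C)
    ones = np.ones(n)
    mu = (ones @ C_inv @ y) / (ones @ C_inv @ ones)
    resid = y - mu
    sigma2 = (resid @ C_inv @ resid) / n
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (n * np.log(2.0 * np.pi * sigma2) + logdet + n)


def fit_lambda(y, C, grid=None):
    """Grid-search maximum-likelihood estimate of lambda on [0, 1]."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 101)
    lls = [loglik_brownian(y, pagel_lambda_transform(C, lam)) for lam in grid]
    best = int(np.argmax(lls))
    return float(grid[best]), float(lls[best])


# Hypothetical four-species tree ((A:1,B:1):1,(C:1,D:1):1): C[i, j] is the
# branch length shared by species i and j from the root.
C = np.array([[2, 1, 0, 0],
              [1, 2, 0, 0],
              [0, 0, 2, 1],
              [0, 0, 1, 2]], dtype=float)
y = np.array([1.0, 1.2, 3.0, 3.1])  # made-up trait values
lam_hat, ll_hat = fit_lambda(y, C)
print(f"lambda MLE = {lam_hat:.2f}, logL = {ll_hat:.2f}")
print(f"logL(lambda=0) = {loglik_brownian(y, pagel_lambda_transform(C, 0.0)):.2f}")
print(f"logL(lambda=1) = {loglik_brownian(y, C):.2f}")
```

A grid search is used only to keep the sketch short; a bounded one-dimensional optimizer would normally replace it.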

Table 3. Phylogenetic signal (Pagel's λ) of 41 angiosperm species in relation to DNA content (genome size and endopolyploidy) and phenotypic traits
Measure | Pagel's λ | logₑL (λ = MLE) | logₑL (λ = 0) | logₑL (λ = 1)
2C genome size | 0.74 | −37.69 | −48.29 | −56.30
1Cx genome size | 0.54 | −34.99 | −42.52 | −46.94
Leaf EI | 0.93 | −45.80 | −62.91 | −52.28
Stem EI | 0.94 | −62.97 | −80.65 | −78.80
Root EI | 0.93 | −58.49 | −75.42 | −73.42
Petal EI | 0.85 | −61.09 | −69.15 | −74.81
DNA content alone | 0.71 | −244.80 | −263.98 | −295.02
Traits aloneᵃ | 0.52 | −1098.49 | −1107.42 | −1201.93
DNA content + traitsᵃ | 0.61 | −1125.09 | −1138.41 | −1213.65

The logₑ likelihood (logₑL) scores are the log-likelihoods of the data under each of three models: λ = 0, where traits are assumed to have evolved independently of phylogeny; λ = 1, where traits are assumed to have evolved following a model of Brownian motion; and λ = maximum likelihood estimate (MLE; 0 < λ < 1), where traits are less similar among related species than expected under a model of Brownian motion. EI, endoreduplication index.
ᵃ Approximated using the first 16 principal components.
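The three log-likelihoods in each row of Table 3 can be compared directly. Assuming the usual likelihood-ratio test with one degree of freedom (λ is a single parameter), the statistic for λ = MLE against λ = 0 or λ = 1 is twice the difference in logₑL. The sketch below applies this to the six univariate rows; the test itself is not reported in the table, so it is an illustration of reading the values rather than the authors' analysis.

```python
import math

# Log-likelihoods reported in Table 3: (lambda = MLE, lambda = 0, lambda = 1)
TABLE3 = {
    "2C genome size": (-37.69, -48.29, -56.30),
    "1Cx genome size": (-34.99, -42.52, -46.94),
    "Leaf EI": (-45.80, -62.91, -52.28),
    "Stem EI": (-62.97, -80.65, -78.80),
    "Root EI": (-58.49, -75.42, -73.42),
    "Petal EI": (-61.09, -69.15, -74.81),
}


def chi2_sf_1df(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


def lr_tests(rows):
    """Likelihood-ratio statistics of the MLE model against lambda = 0 and lambda = 1."""
    results = {}
    for name, (ll_mle, ll_0, ll_1) in rows.items():
        lr_0 = 2.0 * (ll_mle - ll_0)  # evidence against "no phylogenetic signal"
        lr_1 = 2.0 * (ll_mle - ll_1)  # evidence against pure Brownian motion
        results[name] = (lr_0, chi2_sf_1df(lr_0), lr_1, chi2_sf_1df(lr_1))
    return results


for name, (lr0, p0, lr1, p1) in lr_tests(TABLE3).items():
    print(f"{name}: LR vs 0 = {lr0:.2f} (P = {p0:.1e}); LR vs 1 = {lr1:.2f} (P = {p1:.1e})")
```

The chi-square(1) tail probability is computed with `math.erfc`, which avoids a SciPy dependency.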

Multivariate analysis

The PCCA results indicated that axis 1 explained 34.7% and axis 2 explained 23.7% of the total variation captured in the analysis (Table 4). The canonical correlation values for axes 1 and 2 were 0.728 and 0.601, respectively, and the eigenvalues dropped off sharply after the first two axes; therefore, only the variation in axes 1 and 2 was interpreted. Genome size was not correlated with axis 1 (1Cx-value, P = 0.080; 2C-value, P = 0.608), but EI in all four tissue types was correlated positively with this axis (P < 0.001; Table 5). All measures of genome size and EI were correlated with axis 2 (P < 0.001), except for EI in petals. In addition, genome size was correlated negatively with EI in axis 2, as indicated by the negative axis loadings for the 2C-value and 1Cx-value (P < 0.001; Table 5).
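The core of a phylogenetic CCA can be sketched as ordinary CCA carried out on phylogenetically corrected covariance matrices: each variable is centred on its GLS (phylogenetic) mean, covariances are weighted by the inverse of the phylogenetic variance–covariance matrix C, and the canonical correlations are the singular values of the whitened cross-covariance matrix. The NumPy sketch below assumes that general formulation and a known C; it is not a reproduction of the software used in this study, and `phylo_cca` and the example data are hypothetical.

```python
import numpy as np


def phylo_cca(X, Y, C):
    """Canonical correlations between X (n x p) and Y (n x q) given phylogenetic covariance C (n x n)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = X.shape[0]
    C_inv = np.linalg.inv(C)
    ones = np.ones((n, 1))
    # GLS weights: the phylogenetic mean of a column v is w @ v
    w = (ones.T @ C_inv) / (ones.T @ C_inv @ ones)
    Xc = X - ones @ (w @ X)
    Yc = Y - ones @ (w @ Y)
    # Evolutionary covariance blocks, weighted by C^-1
    Sxx = Xc.T @ C_inv @ Xc / (n - 1)
    Syy = Yc.T @ C_inv @ Yc / (n - 1)
    Sxy = Xc.T @ C_inv @ Yc / (n - 1)
    # Whiten each block with its Cholesky factor; singular values are the canonical correlations
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    K = np.linalg.solve(Lx, Sxy) @ np.linalg.inv(Ly).T
    return np.linalg.svd(K, compute_uv=False)


# Hypothetical example: 12 species in two clades of six, three "DNA content"
# variables and two "trait" variables (random numbers, not data from this study)
rng = np.random.default_rng(1)
n = 12
C = 0.5 * np.eye(n) + 0.5 * np.kron(np.eye(2), np.ones((6, 6)))
X = rng.normal(size=(n, 3))
Y = np.column_stack([X[:, 0] + rng.normal(scale=0.5, size=n), rng.normal(size=n)])
print(np.round(phylo_cca(X, Y, C), 3))
```

With C equal to the identity matrix the function reduces to ordinary CCA, which is a convenient sanity check.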

Table 4. Summary of the phylogenetic canonical correlation analysis (PCCA)
Statistic | Axis 1 | Axis 2 | Axis 3 | Axis 4 | Axis 5 | Axis 6
Canonical correlation | 0.728 | 0.601 | 0.447 | 0.427 | 0.373 | 0.337
Cumulative proportion of total variation explained | 0.347 | 0.584 | 0.715 | 0.834 | 0.925 | 1.000
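The second row of Table 4 accumulates across axes: differencing it gives the per-axis shares quoted in the text (34.7% for axis 1 and 23.7% for axis 2). The sketch below does that subtraction and also checks one possible origin of the numbers. Assuming, as our own inference rather than something the paper states, that each axis's share is its squared canonical correlation divided by the sum of all squared canonical correlations, the reported cumulative values are reproduced to within rounding of the three-decimal correlations.

```python
from itertools import accumulate

canonical_r = [0.728, 0.601, 0.447, 0.427, 0.373, 0.337]
cumulative_reported = [0.347, 0.584, 0.715, 0.834, 0.925, 1.000]

# Per-axis share recovered by differencing the cumulative row
previous = [0.0] + cumulative_reported[:-1]
per_axis = [round(b - a, 3) for a, b in zip(previous, cumulative_reported)]

# Assumed formula: share of axis i = r_i**2 / sum of all r**2
r2 = [r * r for r in canonical_r]
total = sum(r2)
cumulative_from_r2 = [c / total for c in accumulate(r2)]

print("per-axis share:", per_axis)
print("cumulative from r^2:", [round(c, 3) for c in cumulative_from_r2])
```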
Table 5. Canonical correlations of DNA content (genome size and endopolyploidy) and phenotypic traits in relation to 41 angiosperm species from the phylogenetic canonical correlation analysis (PCCA)
Variable | Axis 1 r/R² | Axis 1 P | Axis 2 r/R² | Axis 2 P
DNA content
2C-value | 0.08ᵃ | 0.608 | −0.59ᵃ | < 0.001*
1Cx-value | 0.28ᵃ | 0.080 | −0.71ᵃ | < 0.001*
Leaf EI | 0.58ᵃ | < 0.001* | 0.59ᵃ | < 0.001*
Stem EI | 0.72ᵃ | < 0.001* | 0.64ᵃ | < 0.001*
Root EI | 0.74ᵃ | < 0.001* | 0.60ᵃ | < 0.001*
Petal EI | 0.86ᵃ | < 0.001* | −0.12ᵃ | 0.469
Pollen length | −0.47ᵃ | 0.002* | −0.20ᵃ | 0.219
Leaf length | 0.15ᵃ | 0.357 | −0.23ᵃ | 0.153
Cotyledon length | −0.01ᵃ | 0.961 | −0.32ᵃ | 0.043
Stem height | 0.22ᵃ | 0.170 | −0.39ᵃ | 0.010*
Seed length | 0.21ᵃ | 0.197 | 0.10ᵃ | 0.535
Seed mass | 0.43ᵃ | 0.005* | −0.04ᵃ | 0.821
Seed number | 0.02ᵃ | 0.891 | −0.22ᵃ | 0.172
Native or introduced | 0.00ᵇ | 0.758 | 0.00ᵇ | 0.684
Dispersal mechanism | 0.40ᵇ | 0.001* | 0.09ᵇ | 0.476
Root type | 0.12ᵇ | 0.087 | 0.15ᵇ | 0.046
AM association | 0.19ᵇ | 0.004* | 0.05ᵇ | 0.149
Reproductive strategy | 0.03ᵇ | 0.518 | 0.15ᵇ | 0.043
Life history | 0.19ᵇ | 0.164 | 0.45ᵇ | 0.001*
Flowering time | 0.29ᵇ | 0.005* | 0.17ᵇ | 0.068
Floral biology | 0.17ᵇ | 0.026 | 0.17ᵇ | 0.031
Light preference | 0.08ᵇ | 0.189 | 0.08ᵇ | 0.223
Soil preference | 0.29ᵇ | 0.013 | 0.26ᵇ | 0.024

Values significant at P ≤ 0.01 are marked with an asterisk (*). Axis loadings were determined using phylogenetic generalized least-squares (PGLS) correlation analysis (for numerical traits) and PGLS ANOVA (for categorical traits). For numerical traits, r values indicate the PGLS Pearson's product-moment correlation between the original trait and the axis scores. For categorical traits, R² values indicate the coefficient of determination from a PGLS ANOVA model with the axis scores as the response variable and the original trait as the grouping variable. AM, arbuscular mycorrhizal; EI, endoreduplication index.
ᵃ Continuous variables: r, calculated using PGLS correlation analysis.
ᵇ Categorical variables: R², calculated using PGLS ANOVA.
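For categorical traits, the R² values in Table 5 come from a PGLS ANOVA with the axis scores as the response and the trait as the grouping variable. One common way to compute such an R², sketched below under that assumption, compares the generalized residual sum of squares of an intercept-only GLS model with that of a model containing one 0/1 dummy column per additional category. The function names and example values are hypothetical; with C equal to the identity matrix the result reduces to the ordinary one-way ANOVA R². It assumes NumPy.

```python
import numpy as np


def gls_rss(y, X, C_inv):
    """Generalized residual sum of squares r' C^-1 r after a GLS fit of y on X."""
    beta = np.linalg.solve(X.T @ C_inv @ X, X.T @ C_inv @ y)
    resid = y - X @ beta
    return float(resid @ C_inv @ resid)


def pgls_anova_r2(scores, groups, C):
    """R^2 for axis scores ~ categorical trait, under phylogenetic covariance C."""
    scores = np.asarray(scores, dtype=float)
    n = scores.size
    C_inv = np.linalg.inv(C)
    levels = sorted(set(groups))
    intercept = np.ones((n, 1))
    # Treatment coding: one 0/1 column for each category after the first
    dummies = np.array([[float(g == lev) for lev in levels[1:]] for g in groups])
    X_full = np.hstack([intercept, dummies])
    rss_null = gls_rss(scores, intercept, C_inv)
    rss_full = gls_rss(scores, X_full, C_inv)
    return 1.0 - rss_full / rss_null


scores = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]          # hypothetical axis scores
life_history = ["annual"] * 3 + ["perennial"] * 3   # hypothetical categories
C = 0.5 * np.eye(6) + 0.5 * np.kron(np.eye(2), np.ones((3, 3)))
print(f"PGLS R^2 = {pgls_anova_r2(scores, life_history, C):.3f}")
print(f"OLS R^2  = {pgls_anova_r2(scores, life_history, np.eye(6)):.3f}")
```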

On axis 1, five traits were correlated significantly with EI (Table 5). Pollen length was correlated negatively with EI (P = 0.002), and seed mass was correlated positively with EI (P = 0.005). Three categorical variables were also correlated with EI on axis 1: dispersal mechanism (P = 0.001), AM association (P = 0.004) and flowering time (P = 0.005). On axis 2, two traits were correlated significantly with both genome size and EI (Table 5). Stem height was correlated positively with the 1Cx-value and 2C-value and negatively with EI in leaf, stem and root tissue (P = 0.010). Life history was correlated significantly (P = 0.001) with both genome size and EI (except in petal tissue). The biplot in Fig. 1 shows the strength and direction of the correlations between these traits and the variation in DNA content. Variation in traits resulted in clusters of species in the ordinations (Figs 2, 3). In the genome size ordinations, species cluster along axis 2 (Fig. 2), and this clustering is more pronounced for the 1Cx-value. Species also clustered according to high or low EI, primarily along axis 1 (Fig. 3).
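The selection of traits described above can be checked against the P values in Table 5 with a short filter. The sketch treats P = 0.010 (stem height on axis 2) as significant, i.e. a threshold of P ≤ 0.01, and also lists the weakly significant traits (0.01 < P ≤ 0.05); `traits_between` is a hypothetical helper name.

```python
# P values (axis 1, axis 2) for the phenotypic traits in Table 5
TABLE5_P = {
    "Pollen length": (0.002, 0.219),
    "Leaf length": (0.357, 0.153),
    "Cotyledon length": (0.961, 0.043),
    "Stem height": (0.170, 0.010),
    "Seed length": (0.197, 0.535),
    "Seed mass": (0.005, 0.821),
    "Seed number": (0.891, 0.172),
    "Native or introduced": (0.758, 0.684),
    "Dispersal mechanism": (0.001, 0.476),
    "Root type": (0.087, 0.046),
    "AM association": (0.004, 0.149),
    "Reproductive strategy": (0.518, 0.043),
    "Life history": (0.164, 0.001),
    "Flowering time": (0.005, 0.068),
    "Floral biology": (0.026, 0.031),
    "Light preference": (0.189, 0.223),
    "Soil preference": (0.013, 0.024),
}


def traits_between(table, axis, lower, upper):
    """Traits with lower < P <= upper on the given axis (0 = axis 1, 1 = axis 2)."""
    return [trait for trait, p in table.items() if lower < p[axis] <= upper]


for axis in (0, 1):
    strong = traits_between(TABLE5_P, axis, 0.0, 0.01)
    weak = traits_between(TABLE5_P, axis, 0.01, 0.05)
    print(f"Axis {axis + 1}: P <= 0.01 -> {strong}; 0.01 < P <= 0.05 -> {weak}")
```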

Figure 1.

Biplot of the significant phenotypic traits (P ≤ 0.01) from the phylogenetic canonical correlation analysis (PCCA). The P value for each trait is given in Table 5. For categorical traits, more than one category within the same trait could drive the ordination: life history (a), perennials; life history (b), species with multiple life history strategies (annual, biennial, perennial). AM, arbuscular mycorrhizal.

Figure 2.

Phylogenetic canonical correlation analysis (PCCA) ordination showing the relationship between phenotypic traits and genome size ((a) 2C-value, (b) 1Cx-value) of 41 angiosperms. Species with a ‘small’ genome size (2C-value < 2 pg in (a); 1Cx-value < 1 pg in (b)) are represented by open circles, and species with ‘larger’ genome sizes (2C-value > 2 pg in (a); 1Cx-value > 1 pg in (b)) are represented by closed circles. Numbers correspond to the species identifications given in Table 1. Species scores are presented in the phylogenetically dependent species space.

Figure 3.

Phylogenetic canonical correlation analysis (PCCA) ordination showing the relationship between phenotypic traits and endopolyploidy in (a) leaf, (b) stem, (c) root and (d) petal tissues of 41 angiosperms. Species that are nonendopolyploid (EI value < 0.1) are represented by open circles and species that are endopolyploid (EI value > 0.1) are represented by closed circles. Numbers correspond to the species identifications given in Table 1. Species scores are presented in the phylogenetically dependent species space.
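The cut-offs in the captions of Figs 2 and 3 can be applied directly to the species 34–41 listed in Tables 1 and 2 in this section. The sketch shows how a species can be ‘large’ on the 2C scale but ‘small’ on the 1Cx scale (e.g. Rumex crispus, 4.59 vs 0.77 pg). Values exactly at a cut-off are not covered by the captions; here they are assigned to ‘large’ (genome size) or ‘nonendopolyploid’ (EI), which is an assumption. Only leaf EI is classified, for brevity.

```python
# Species 34-41: (2C-value pg, 1Cx-value pg, leaf EI) from Tables 1 and 2
SPECIES = {
    "Plantago lanceolata": (2.85, 1.42, 0.05),
    "Plantago major": (1.46, 0.73, 0.04),
    "Setaria viridis": (1.10, 0.55, 0.07),
    "Persicaria maculosa": (3.71, 0.93, 0.01),
    "Rumex crispus": (4.59, 0.77, 0.06),
    "Geum aleppicum": (3.03, 1.51, 0.11),
    "Linaria vulgaris": (1.97, 0.98, 0.06),
    "Urtica dioica": (1.17, 0.58, 0.20),
}


def size_class(value, cutoff):
    """'small' below the cut-off (open circles in Fig. 2), otherwise 'large'."""
    return "small" if value < cutoff else "large"


def ei_class(ei, cutoff=0.1):
    """'endopolyploid' above the cut-off (closed circles in Fig. 3)."""
    return "endopolyploid" if ei > cutoff else "nonendopolyploid"


for name, (c2, c1x, leaf_ei) in SPECIES.items():
    print(f"{name}: 2C {size_class(c2, 2.0)}, 1Cx {size_class(c1x, 1.0)}, leaf {ei_class(leaf_ei)}")
```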


The phylogenetic signal was stronger for DNA content than for traits (multivariate λ = 0.71 vs 0.52; Table 3). The strong phylogenetic signal for endoreduplication was not unexpected, as it has long been understood that endopolyploidy is limited to certain groups or families (Tschermak-Woess, 1956; D'Amato, 1964; Nagl, 1978; Barow & Meister, 2003). For example, species in the Brassicaceae had high degrees of endopolyploidy (leaf EI 0.23–0.65), whereas species in the Asteraceae had low degrees of endopolyploidy (leaf EI 0.01–0.05; Table 2). The lower phylogenetic signal for petal EI could be explained by the fact that several perennial species did not flower during the experiment, leaving some families with substantially reduced representation (e.g. five species in the Asteraceae did not bloom). This is probably also the cause of the lack of correlation between petal EI and axis 2 (Table 5).

The weakest phylogenetic signal for DNA content was found for the 1Cx-value (λ = 0.54; Table 3). By contrast, the phylogenetic signal for the 2C-value was strong (λ = 0.74; Table 3), and species with ‘large’ 2C-values were often polyploids belonging to the same families (such as the Asteraceae and Caryophyllaceae; Table 1). This suggests that, although changes in holoploid genome size are tightly linked to phylogenetic relatedness among species, changes in monoploid genome size may not be. This has interesting implications for the evolution of monoploid genome size, which is more likely to be affected by the amount of transposable elements or other repetitive sequences. Admittedly, our determination of the 1Cx-value was subject to error, as we did not have original chromosome counts. Moreover, ancestral polyploidization events are still poorly understood, and how a polyploid is defined would probably also affect the outcome.
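Because the 1Cx-value is the DNA content of one chromosome complement, dividing the 2C-value by it returns the ploidy level (number of monoploid sets) that was assumed for each species. The sketch applies this to species 34–41 from Table 1. It only recovers the assumed ploidy level and cannot correct it, which is the limitation noted above when original chromosome counts are unavailable; `implied_ploidy` is a hypothetical helper name.

```python
# 2C- and 1Cx-values (pg) for species 34-41, from Table 1
TABLE1 = {
    "Plantago lanceolata": (2.85, 1.42),
    "Plantago major": (1.46, 0.73),
    "Setaria viridis": (1.10, 0.55),
    "Persicaria maculosa": (3.71, 0.93),
    "Rumex crispus": (4.59, 0.77),
    "Geum aleppicum": (3.03, 1.51),
    "Linaria vulgaris": (1.97, 0.98),
    "Urtica dioica": (1.17, 0.58),
}


def implied_ploidy(c2, c1x):
    """Number of monoploid chromosome sets implied by 2C / 1Cx (rounded,
    because both values are reported to two decimals)."""
    return round(c2 / c1x)


for name, (c2, c1x) in TABLE1.items():
    x = implied_ploidy(c2, c1x)
    print(f"{name}: 2C / 1Cx = {c2 / c1x:.2f} -> {x}x; 2C / {x} = {c2 / x:.3f} pg")
```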

The PCCA indicated that EI is correlated negatively with genome size (Table 5). The ordinations showed a tight cluster of species with smaller genome sizes and higher EI (Figs 2, 3). In general, low EI values were found in species across a range of genome sizes, but high EI was largely limited to species with small genomes, which is consistent with the expectation that endoreduplicating a large genome is time-consuming and costly. The PCCA also revealed several phenotypic traits that were correlated significantly with EI, two of which were also correlated with genome size (Table 5). These traits drive the variation expressed in the species ordinations (Fig. 1).

Pollen length and seed mass were correlated significantly with EI but, contrary to expectation, neither was correlated significantly with genome size. Pollen nuclei have been documented to undergo endoreduplication (Sunderland et al., 1974), but it is unclear how common this is in nature. Seeds can also contain endopolyploid nuclei (Nagl, 1978). Although seed EI was not measured in this study, the presence or absence of endoreduplication in other tissues may indicate whether it also occurs in seeds, which might explain the positive relationship between EI and seed mass. The lack of correlation between genome size and seed mass (and seed length) could be a result of low trait variation in this relatively small dataset of common herbaceous species.

At first glance, the results for life history seem to contradict earlier reports of a relationship with genome size, as some of the variation on axis 2 (Table 5, Fig. 1) is attributable to perennials at the top of the ordination (smaller genome sizes) and species exhibiting a range of life history strategies at the bottom (larger genome sizes). On closer inspection, however, the open circles in the genome size ordinations (‘smaller’ genome sizes, Fig. 2) and the closed circles in the EI ordinations (‘higher’ EIs, Fig. 3) are mostly annuals and biennials, which matches previous research. In addition, several of the species in the life history category of multiple strategies most often grow as perennials. In general, the perennial species in this study had larger genomes; this is not to say that all perennials have large genomes, but rather that species with large genomes are less likely to be annuals (Bennett, 1972; Grime & Mowforth, 1982). Barow & Meister (2003) found that endoreduplication was associated with life history, with annuals being more likely to have high degrees of endopolyploidy, and this was corroborated in this study. Barow & Meister (2003) suggested that endoreduplication allows plants to exploit habitats from which they might otherwise be temporally excluded, by combining the advantages of a small genome (rapid cell cycle) with those of a large genome (growth by cell expansion).

When all of the strongly significant relationships between traits and DNA content (P ≤ 0.01) are considered together, most of these traits can be seen as aspects of plant growth strategies that are linked to plant competitive ability (Grime, 1977; Bonser & Ladd, 2011). This suggests a relationship between DNA content and several aspects of plant form and function. For example, species with lower EI values are more likely to have taller stems, to disperse via wind and to flower later in the year (Figs 1, 3). Species with lower EI values are also more likely to associate with AM fungi, and other traits relating to soil biology (soil preference, root type) were weakly significant (P < 0.05) on one or both axes. Although the relationship between EI and the association with AM fungi is negative, plants that are colonized by AM fungi actually experience an increase in the degree of endopolyploidy in root cells, regardless of whether or not the plant is known to have endopolyploid nuclei (Bainard et al., 2011).

Although there is a strong phylogenetic component to variation in both genome size and endoreduplication, these measures of DNA content are still significantly correlated with various traits relating to plant success. PCCA provided an effective way to measure the relationship between DNA content and many traits at once. Further research should extend this approach to species with broader phylogenetic coverage and to a wider range of morphological and ecological traits. In addition, although the current body of literature makes clear that plant DNA content is correlated with phenotype, more research is needed to understand the functional significance of these relationships.


We are grateful for comments on an earlier draft of the manuscript from three anonymous reviewers. We would also like to thank K. O'Brien and B. Yim for laboratory assistance, C. A. Lacroix for plant identification assistance, and M. Mucci and T. Slimmon at the University of Guelph Phytotron. Funding support was provided to J.D.B. (Natural Sciences and Engineering Research Council of Canada (NSERC) Postgraduate Scholarship (PGS)), L.D.B. (NSERC PGS) and S.G.N. (grants from NSERC and the Canada Foundation for Innovation).