The contrasting effects of genome size, chromosome number and ploidy level on plant invasiveness: a global analysis


  • Maharaj K. Pandit,

    1. Department of Environmental Studies, Centre for Inter-disciplinary Studies of Mountain & Hill Environment, University of Delhi, Delhi, India
  • Steven M. White,

    1. Centre for Ecology & Hydrology, Crowmarsh Gifford, Wallingford, Oxfordshire, UK
    2. Wolfson Centre for Mathematical Biology, Mathematical Institute, University of Oxford, Radcliffe Observatory Quarter, Oxford, Oxfordshire, UK
  • Michael J. O. Pocock

    Corresponding author
    1. Centre for Ecology & Hydrology, Crowmarsh Gifford, Wallingford, Oxfordshire, UK


  • Understanding how species' traits relate to their status (e.g. invasiveness or rarity) is important because it can help to efficiently focus conservation and management effort and infer mechanisms affecting plant status. This is particularly important for invasiveness, in which proactive action is needed to restrict the establishment of potentially invasive plants.
  • We tested the ability of genome size (DNA 1C-values) to explain invasiveness and compared it with cytogenetic traits (chromosome number and ploidy level). We considered 890 species from 62 genera, from across the angiosperm phylogeny and distributed from tropical to boreal latitudes.
  • We show that invasiveness was negatively related to genome size and positively related to chromosome number (and ploidy level), yet there was a positive relationship between genome size and chromosome number; that is, our result was not caused by collinearity between the traits. Including both traits in explanatory models greatly increased the explanatory power of each.
  • This demonstrates the potential unifying role that genome size, chromosome number and ploidy have as species' traits, despite the diverse impacts they have on plant physiology. It provides support for the continued cataloguing of cytogenetic traits and genome size of the world's flora.


Introduction

Analyses of how the traits of different species relate to aspects of their status have long been considered a tool in conservation biology (Fisher & Owens, 2004). From these relationships, it is possible to infer the mechanisms that promote or permit species' status, for example, their rarity, invasiveness or population trends. However, although such approaches have been widely used, they have had mixed success, sometimes giving inconsistent results across taxonomic groups or geographic regions (Williamson & Fitter, 1996; Kunin & Gaston, 1997; Pyšek & Richardson, 2007).

Invasiveness is an especially valuable trait to consider in cross-species analyses, because identifying species that are likely to become invasive matters greatly, given the huge difference in the cost of managing invasive species at different stages of their establishment (Pyšek & Richardson, 2010). Of course, invasiveness is, to an extent, context-specific (Van Kleunen et al., 2010a). However, if invasive species could be predicted from their traits, this would support governments' efforts to fulfil their obligation to ‘as far as possible and as appropriate, prevent the introduction of, control or eradicate those alien species which threaten ecosystems, habitats or species' (Article 8(h) of the Convention on Biological Diversity (CBD)). Several biological traits have been shown to be important in explaining plant invasiveness, for example, short generation time, high growth rate and high fitness (Pyšek & Richardson, 2007; Ordonez et al., 2010; Van Kleunen et al., 2010a,b; Schmidt & Drake, 2011). Cytogenetic traits such as chromosome number and ploidy level have also shown potential in explaining invasiveness (Soltis & Soltis, 2000; Pandit, 2006; Pandit et al., 2006, 2011). In addition, genome size has been used successfully to explain extinction risk (Vinogradov, 2003), and although it has a variable effect on invasiveness in individual taxa (Gallagher et al., 2011; Varela-Álvarez et al., 2012), there has been no attempt to assess its effect at a large scale across the plant phylogeny.

Genome size is an invariant characteristic of an individual and is usually invariant within a species; the amount of nuclear DNA follows a set of simple multiples of its basic quantity, designated as ‘C-values’ (e.g. 1C, 2C, 4C, 8C). 1C is the amount of DNA in the unreplicated gametic nucleus of an organism (i.e. the holoploid genome size; Greilhuber et al., 2005), and C-values have since been used as reference values in genome size studies. Nuclear DNA content varies c. 2400-fold in angiosperms as a result of changes in the amount of noncoding DNA sequences and of genome duplication (Bennett & Leitch, 2011). Contrary to what was once thought, it is unrelated to an organism's phenotypic complexity (Gregory, 2001), but it does influence a wide array of characteristics, for example, rate of cell division, sensitivity to radiation and ecological behaviour in plant communities (reviewed in Bennett, 1987; Bennett & Smith, 1991). Genome size has been described as a trait that ‘uniquely lies at the intersection of phenotype and genotype’ (Oliver et al., 2007) and, for this reason, as an ‘important biodiversity character, whose study provides a strong unifying element in biology with practical and predictive uses’ (Bennett & Leitch, 2005). In plants, comparative studies have suggested that large genome size is maladaptive because of the constraints it imposes on plant physiology (Vinogradov, 2003; Knight et al., 2005). However, others have suggested that large genome size may be beneficial; for example, in some fish, high DNA C-values (resulting from the accumulation of noncoding DNA) are associated with lower basal metabolic rates, which appear to allow them to occupy environmental niches with a lower energy supply (Szarski, 1983). It is also possible that variation in genome size has little adaptive value, as proposed by neutral explanations of genome size evolution (Oliver et al., 2007).

Genome size influences a wide range of plant physiological and evolutionary traits (Bennett & Leitch, 2005), which have individually been shown to relate to invasiveness (Van Kleunen et al., 2010b), so we expected invasive plants to have relatively small genomes. This fits with the conjecture that large genome size is maladaptive (Orgel & Crick, 1980; Rejmánek, 1996). Because genome duplication and polyploidization increase the amount of nuclear DNA, we expected genome size to be positively correlated with the cytogenetic traits (ploidy and chromosome number). However, we also expected ploidy and chromosome number to have a positive effect on invasiveness (Pandit, 2006; Pandit et al., 2006, 2011; Schmidt & Drake, 2011), because chromosome number is positively related to rates of adaptation (te Beest et al., 2012) and polyploidy confers an evolutionary advantage through heterosis and gene redundancy (Comai, 2005). That these expectations contradict each other was noted by Rejmánek (1996), who also observed that ‘research on this subject seems to be very scanty’.

In the current study, we tested for a relationship between genome size and invasiveness in angiosperms, using a global dataset of species from across the angiosperm phylogeny. We compared these results with the relationships of the cytogenetic traits (chromosome number and ploidy) with invasiveness. Throughout, we accounted for phylogeny and for the latitude of each species, given the evidence that both affect genome size (Bennett et al., 1998; Knight et al., 2005).

Materials and Methods

Data on genome size, chromosome number and invasiveness

Holoploid genome size (DNA 1C-values of species, in pg) and chromosome numbers were collated from the Royal Botanic Gardens, Kew, Plant C-values database, release 5.0 (Bennett & Leitch, 2010). We undertook analyses on a balanced subset of the species for which there was information on genome size, ploidy level and chromosome number (described later in the ‘Data analysis’ section). We defined invasive plants as those included in the Global Invasive Species Database (GISD) or the Pacific Island Ecosystems at Risk (PIER) list. Together, these two databases provide a global perspective on invasiveness in plants, so our dataset had a similar scope and global geographic coverage to our previous study (Pandit et al., 2011).
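The selection and labelling rules above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes trait records have already been parsed into a hypothetical `SpeciesRecord` type, and that species names in GISD and PIER match the database names exactly (real matching would need to handle synonyms and spelling variants). Passing a single list instead of both gives the one-source variants used in the sensitivity checks reported later.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class SpeciesRecord:
    """Hypothetical container for one species' entries in the C-values data."""
    name: str
    c_value_1c_pg: Optional[float]      # holoploid genome size (DNA 1C-value, pg)
    chromosome_number: Optional[int]
    ploidy_level: Optional[int]


def label_invasive(records: Iterable[SpeciesRecord],
                   *invasive_lists: Iterable[str]) -> Dict[str, bool]:
    """Return {species: is_invasive} for species with all three traits recorded.

    A species counts as invasive if it appears in any supplied list
    (e.g. GISD and PIER); pass a single list for a one-source analysis.
    """
    listed = set()
    for names in invasive_lists:
        listed.update(names)
    labels = {}
    for rec in records:
        # Drop species lacking genome size, chromosome number or ploidy level.
        if None in (rec.c_value_1c_pg, rec.chromosome_number, rec.ploidy_level):
            continue
        labels[rec.name] = rec.name in listed  # exact name match (an assumption)
    return labels
```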

Latitudinal data

It has previously been suggested that genome size and cytogenetic traits vary with latitude, peaking at temperate latitudes (Bennett, 1987; Knight et al., 2005). We therefore extracted information on the distribution of each species from the Global Biodiversity Information Facility (accessed through the GBIF Data Portal, 2013-02-04) and calculated the average latitude of the centres of the one-degree latitude/longitude grid cells in which the species had been recorded. The extraction of these data from GBIF was automated with the rgbif package (Chamberlain et al., 2012) in R 2.15.2 (R Development Core Team, 2012), with additional code written by us to gather data on all the synonyms of each taxon under consideration (as listed by GBIF). We considered the distribution of occupied cells rather than that of individual records because it is more robust to spatial variation in recording intensity, and we used the absolute value of latitude because it gives a better assessment of latitude for species introduced from the southern to the northern hemisphere, and vice versa. A small number of records may have been wrongly geolocated, but our inspection of the location data suggests that these have a negligible influence on the average absolute latitude.
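The latitude summary can be expressed as a short function. This is a sketch under stated assumptions: occurrence records arrive as (latitude, longitude) pairs in decimal degrees, cells are indexed by flooring the coordinates, and the absolute value is taken for each cell centre before averaging (our reading of ‘absolute value of latitude’). The function name `mean_abs_latitude` is hypothetical, and the GBIF download and synonym handling are not shown.

```python
import math
from typing import Iterable, Tuple


def mean_abs_latitude(occurrences: Iterable[Tuple[float, float]]) -> float:
    """Mean absolute latitude of the centres of occupied one-degree grid cells."""
    cells = set()
    for lat, lon in occurrences:
        # Index each record by its one-degree cell; clamp so that lat = 90 or
        # lon = 180 fall in the last cell instead of creating a spurious one.
        row = min(math.floor(lat), 89)
        col = min(math.floor(lon), 179)
        cells.add((row, col))
    if not cells:
        raise ValueError("no occurrence records")
    # Each occupied cell counts once, however many records it holds; taking the
    # absolute latitude stops the two hemispheres from cancelling out.
    centres = [abs(row + 0.5) for row, _ in cells]
    return sum(centres) / len(centres)
```

For example, two records in the same cell near 51.5° N and one near 33.5° S give two occupied cells, so the summary is their mean absolute centre latitude, 42.5.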

Phylogenetic data

We constructed the phylogenetic tree from a fully resolved family-level phylogeny in Phylomatic v3 (Webb & Donoghue, 2005), based on the Angiosperm Phylogeny Group classification (APG III, 2009). We calibrated the branch lengths in the tree using the BLADJ algorithm in Phylocom 4.2 (Webb et al., 2008). BLADJ assigns ages to the nodes whose dates are known from a dated tree (Wikström et al., 2001) and then spaces the remaining, undated, nodes evenly in time between them. Although simple, this is a widely used routine that improves on alternative methods for calibrating phylogenetic trees (Webb, 2000) and gives results similar to those of other methods in phylogenetically informed analyses (e.g. Davies et al., 2013). The minimum branch length from this analysis was 6.25 Myr, but because we wanted to include all aneuploids (chromosome number variants within a species; 63 instances across 52 species) in the analysis, we set their branch lengths to an arbitrary small value of 0.1 Myr.
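To illustrate the even-spacing step, the following simplified sketch interpolates undated node ages along a single root-to-tip lineage. It is not Phylocom's implementation: BLADJ works over the whole branching tree, whereas the hypothetical `bladj_chain` function only handles one ordered list of node ages.

```python
from typing import List, Optional, Tuple


def bladj_chain(ages: List[Optional[float]]) -> Tuple[List[float], List[float]]:
    """Evenly interpolate undated node ages along one root-to-tip lineage.

    ages: node ages (Myr) ordered from root to tip; None marks an undated node.
    The root and tip must be dated. Returns (filled ages, branch lengths).
    """
    if ages[0] is None or ages[-1] is None:
        raise ValueError("root and tip ages must be known")
    filled = list(ages)
    dated = [i for i, age in enumerate(filled) if age is not None]
    # Between each pair of consecutive dated nodes, split the time interval
    # into equal steps, one per intervening branch.
    for i, j in zip(dated, dated[1:]):
        step = (filled[i] - filled[j]) / (j - i)
        for k in range(i + 1, j):
            filled[k] = filled[i] - step * (k - i)
    branch_lengths = [filled[k] - filled[k + 1] for k in range(len(filled) - 1)]
    return filled, branch_lengths
```

For instance, a lineage dated at 100, 40 and 0 Myr with two and one undated nodes in between receives ages of 80, 60 and 20 Myr, so every branch is 20 Myr long.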

Data analysis

In our analysis, we tested the relationships of invasiveness with genome size and chromosome number, with and without latitude as a covariate. We encountered computational limitations in adopting a fully phylogenetically informed approach with the whole dataset; specifically, the highly unbalanced nature of the full dataset (90% of its genera had no invasive species) regularly led to a lack of model convergence, while run time was estimated at several weeks or more for each model (it scaled exponentially with sample size). We therefore undertook the analysis with the 62 genera that contained both invasive and noninvasive species. We thus excluded 854 genera that contained only noninvasive species and 35 that contained only invasive species, although the majority of these genera (61%) comprised only one species. We excluded a further 50 species for which GBIF held no distribution data, but this did not change the final number of genera. Overall, we reduced the sample size from 4504 to 890 species (see the ‘Results’ section), but we retained as many highly informative comparisons as possible (i.e. between congeners; Pandit et al., 2011) while creating a smaller, more balanced dataset suitable for analysis. This was therefore akin to a ‘sister pairs’ analysis. Importantly, because species within a genus tend to co-occur regionally, this analysis also helps to account for regional variation in the intensity of recording in GBIF (Yesson et al., 2007) and for the unbalanced geographical representation of the Kew Plant C-values database (Leong-Škorničková et al., 2007).
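The restriction to genera containing both invasive and noninvasive species amounts to a simple filter. In this sketch each species is assumed to be a (species, genus, is_invasive) tuple; the helper name `balanced_subset` is hypothetical.

```python
from collections import defaultdict
from typing import List, Tuple

Species = Tuple[str, str, bool]  # (species name, genus, is_invasive)


def balanced_subset(species: List[Species]) -> List[Species]:
    """Keep only species whose genus contains both invasive and noninvasive members."""
    statuses = defaultdict(set)
    for _, genus, invasive in species:
        statuses[genus].add(bool(invasive))
    # Genera with a single status (all invasive or all noninvasive) are dropped.
    mixed = {genus for genus, seen in statuses.items() if seen == {True, False}}
    return [s for s in species if s[1] in mixed]
```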

Given that species’ traits are often not randomly distributed across phylogenetic trees, we undertook analyses using a phylogenetically informed approach, thus incorporating an appropriate degree of phylogenetic signal (Revell, 2010). In our analyses, when the response trait was continuous, we used phylogenetic generalized least squares (PGLS) analyses with the function ‘pgls’ in ‘caper’ (Orme et al., 2011). When the response variable was binary (e.g. invasive or not), we used phylogenetic logistic regression (PLR; Ives & Garland, 2010), which is a logistic regression with the appropriate degree of phylogenetic signal, run in MATLAB (Release 2013a; The MathWorks Inc., Natick, MA, USA) with code available from T. Garland.
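PGLS estimates regression coefficients by generalized least squares, using a covariance matrix derived from the phylogeny. The sketch below is a NumPy illustration, not the ‘caper’ code: it assumes a tip-by-tip matrix `C` of shared branch lengths (root to most recent common ancestor), applies Pagel's λ by scaling the off-diagonal elements, and chooses λ by maximum likelihood over a simple grid. caper can estimate λ by maximum likelihood and also reports standard errors and test statistics, which are omitted here; all function names are hypothetical.

```python
import numpy as np


def pagel_lambda(C, lam):
    """Multiply off-diagonal (shared-history) covariances by lambda, keep the diagonal."""
    C = np.asarray(C, dtype=float)
    V = lam * C
    np.fill_diagonal(V, np.diag(C))
    return V


def gls_fit(X, y, V):
    """GLS coefficients beta = (X' V^-1 X)^-1 X' V^-1 y, plus the residuals."""
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    beta = np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)
    return beta, y - X @ beta


def pgls_loglik(X, y, C, lam):
    """Gaussian log-likelihood at a given lambda, with sigma^2 profiled out (ML)."""
    V = pagel_lambda(C, lam)
    _, resid = gls_fit(X, y, V)
    n = len(y)
    sigma2 = resid @ np.linalg.solve(V, resid) / n
    _, logdet = np.linalg.slogdet(V)
    return -0.5 * (n * np.log(2 * np.pi * sigma2) + logdet + n)


def pgls(X, y, C, lam_grid=np.linspace(0.0, 1.0, 101)):
    """Pick lambda by grid-search maximum likelihood; return (lambda, coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    lam = max(lam_grid, key=lambda candidate: pgls_loglik(X, y, C, candidate))
    beta, _ = gls_fit(X, y, pagel_lambda(C, lam))
    return float(lam), beta
```

With λ = 0 the tips are treated as independent (ordinary least squares on an ultrametric tree), and with λ = 1 the covariance follows the tree as given.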

For all analyses, we complemented the fully phylogenetically informed approaches with a generalized linear mixed model (GLMM) in which genus was treated as a random intercept, thus retaining within-genus comparisons. Although reporting both phylogenetically informed and cross-species analyses is not recommended (Freckleton, 2009), GLMMs allowed us to assess model fit (both absolute model fit with r² and relative model fit with Akaike's information criterion (AIC)), which is not currently possible for PLRs (Ives & Garland, 2010). Model fit was apportioned into the proportion of variance explained by the fixed effects (marginal r², R²m) and the proportion explained by the full model (conditional r², R²c) (Nakagawa & Schielzeth, 2013). These models were run with the function ‘lmer’, and the significance of the variables was estimated with ‘mcmcsamp’, both in ‘lme4’ (Bates et al., 2012) in R 2.15.2.
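For a binary response with a logit link, Nakagawa & Schielzeth (2013) partition variance into the fixed-effects variance, the random-effect variance and a distribution-specific variance of π²/3. A minimal sketch for a single random intercept (genus) follows; the function name is hypothetical, and its inputs would come from a fitted GLMM rather than being computed here. The fixed-effects variance is taken as the sample variance of the fixed-effects linear predictor.

```python
import math
from statistics import variance
from typing import Sequence, Tuple


def nakagawa_r2_binomial(fixed_linear_predictor: Sequence[float],
                         genus_variance: float) -> Tuple[float, float]:
    """Marginal (R2m) and conditional (R2c) r^2 for a logit-link binomial GLMM."""
    var_fixed = variance(fixed_linear_predictor)  # sample variance (n - 1 divisor)
    var_dist = math.pi ** 2 / 3                   # distribution-specific variance, logit link
    total = var_fixed + genus_variance + var_dist
    r2_marginal = var_fixed / total
    r2_conditional = (var_fixed + genus_variance) / total
    return r2_marginal, r2_conditional
```

Because the genus variance enters only the numerator of R²c, the conditional value is always at least as large as the marginal one, as in Table 1.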

We also tested for a positive relationship between genome size and cytogenetic traits (chromosome number and ploidy) by using PGLS models with genome size as the dependent variable and by considering the additive and interaction effects of latitude on the relationship.


Results

Our final dataset comprised the species for which we had chromosome numbers, genome size and distribution data, from all genera for which there were both invasive and noninvasive species: that is, 890 species from 62 genera in 27 families belonging to 21 orders. The species in the dataset were from across the angiosperm phylogeny (Supporting Information, Fig. S1) and were well distributed across latitudes, from tropical to northern temperate regions (Fig. S2).

We found that invasiveness was negatively related to holoploid genome size but positively related to chromosome number (Table 1; Figs 1, 2). The best-supported models included genome size and chromosome number together. In these models, the qualitative results were the same as for the traits individually, but the magnitude and significance of the effects increased (Table 1; Figs 1, 2). The models explaining invasiveness showed little phylogenetic signal (in the PLRs the measure of phylogenetic signal, a, was low: a < −2.7; Ives & Garland, 2010), as we expected, because ‘invasiveness’ is a complex trait that is not directly inherited. These findings confirmed our expectations, and the simplest explanation for them would be that the two traits are negatively associated with each other. However, the findings were particularly striking because genome size and chromosome number are actually positively related, as we predicted (Figs 2, S3; Table S1). This positive relationship showed strong phylogenetic signal (in the PGLS models, the measure of phylogenetic signal was high: λ > 0.92; Revell, 2010), which also confirmed our expectations, because both genome size and chromosome number are directly inherited.
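Because both traits enter the models on a log2 scale, a one-unit change in a predictor corresponds to a doubling of the trait, and in a logistic model exp(beta) gives the resulting multiplicative change in the odds of being invasive. The sketch below applies this to the model 3 PLR coefficients in Table 1. It treats those coefficients as ordinary log-odds coefficients, which is an approximation assumed here rather than a result reported in the analysis; the function name is hypothetical.

```python
import math


def odds_ratio_per_doubling(beta: float) -> float:
    """Multiplicative change in the odds of invasiveness when a log2-scaled trait doubles."""
    # Doubling x adds 1 to log2(x), which adds beta to the log-odds.
    return math.exp(beta)


# Model 3 PLR coefficients from Table 1 (both traits included together).
or_chromosome_number = odds_ratio_per_doubling(0.522)   # about 1.69
or_genome_size = odds_ratio_per_doubling(-0.311)        # about 0.73
```

On this reading, doubling chromosome number at a fixed genome size raises the odds of invasiveness by roughly two-thirds, while doubling genome size at a fixed chromosome number lowers them by roughly a quarter.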

Table 1. Effect sizes (unstandardized beta) from the relationship of plant invasiveness with genome size (DNA 1C-value; model 1) and chromosome number (model 2), both together (model 3) and together with an interaction (model 4), with the best supported model being model 3
Columns: PLR, phylogenetic logistic regressionᵃ; GLMM, generalized linear mixed model with genus as a random effect.

| Model | Parameter | PLR Beta | PLR P | PLR aᵇ | GLMM Beta | GLMM P | GLMM AIC | GLMM ΔAICᶜ | GLMM R²m (%)ᵈ | GLMM R²c (%)ᵈ |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | log2(DNA 1C-value) | −0.186 | 0.020 | −3.02 | −0.172 | 0.095 | 708.43 | 14.17 | 1.4 | 15.2 |
| 2 | log2(chromosome number) | 0.315 | 0.007 | −2.71 | 0.519 | < 0.001 | 699.88 | 5.63 | 3.7 | 17.1 |
| 3 | log2(chromosome number) | 0.522 | < 0.001 | −3.06 | 0.653 | < 0.001 | 694.26 | 0 | 9.0 | 18.6 |
| | log2(DNA 1C-value) | −0.311 | < 0.001 | | −0.299 | 0.005 | | | | |
| 4 | log2(chromosome number) | 0.469 | 0.013 | −3.02 | 0.609 | 0.006 | 696.18 | 1.92 | 9.3 | 19.0 |
| | log2(DNA 1C-value) | −0.440 | 0.326 | | −0.450 | 0.410 | | | | |
| | log2(chromosome number) : log2(DNA 1C-value) | 0.028 | 0.761 | | 0.032 | 0.776 | | | | |

ᵃ We were unable to perform model selection for the PLRs because there is no verified method for calculating model fit (AIC or r²) for these types of models, so we included GLMMs to provide an assessment of fit.
ᵇ a is a measure of the phylogenetic signal of the PLR; values < −2 indicate weak phylogenetic signal.
ᶜ ΔAIC is an assessment of relative model fit: the difference between the model's Akaike's information criterion (AIC) and the minimum AIC.
ᵈ R² is an assessment of the variance explained (i.e. absolute model fit) when considering (m) the fixed effects alone and (c) the fixed and random effects.
Figure 1.

The relationship of the probability that a species in our dataset is invasive with: (a) genome size (DNA 1C-value); (b) chromosome number; and (c) chromosome number and genome size. In panels (a) and (b) the results of the fully phylogenetically informed analyses (phylogenetic logistic regression; PLR) are shown in red, while, from the generalized linear mixed model, the overall average effect is shown in black and effects for individual genera are shown in grey. Individual data points are shown as translucent points and are jittered in the y-axis for clarity. These genus-level random effects and individual data points are omitted for clarity in panel (c). In (c), the additive effect of genome size is presented at low, medium and high values (DNA 1C-value = 0.5 (narrow solid lines), 2 (bold solid lines) and 8 (dashed lines), respectively). Relationships with ploidy level instead of chromosome number are very similar, and so are not shown.

Figure 2.

Standardized effect sizes of the phylogenetic logistic regressions (PLRs) between holoploid genome size (DNA 1C-value), chromosome number and invasiveness. Arrow widths are proportional to standardized effect sizes and significance is indicated as follows: *, P < 0.05; **, P < 0.001. Black arrows, negative relationships; white arrows, positive relationships. The joined arrows indicate the model in which the two traits are included as additive effects.

We used three lines of evidence to support the conclusion that genome size and chromosome number are best included together in models explaining invasiveness: absolute model fit (r²), relative model fit (AIC) and standardized effect sizes (the latter two as recommended by Freckleton (2009)). Because r² and AIC cannot currently be obtained for PLRs (Ives & Garland, 2010), we relied on the GLMMs for these measures. We were confident in doing so because the measure of phylogenetic signal in the PLRs was low (a < −2.7) and the model parameters were similar between the two approaches (Table 1). The fit of the fixed effects to the data (R²m) increased considerably when the two traits were included together (from < 4% with either univariate model to 9% with both traits; Table 1). The best-fitting candidate model (i.e. lowest AIC) included both traits, with some support for the model with an interaction between the two and decreasing support for the models with chromosome number alone and genome size alone (Table 1). The standardized model parameters showed that the effect sizes of genome size and chromosome number were similar in magnitude, albeit in opposite directions, and that when the two traits were included together, the magnitude of each almost doubled (Fig. 2). In other words, genome size explains not only variation in invasiveness but also, importantly, residual variation in the relationship of chromosome number with invasiveness.
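ΔAIC is each model's AIC minus the smallest AIC in the candidate set. Applying this to the rounded GLMM AIC values in Table 1 recovers the reported ΔAIC column to within rounding; the function name is hypothetical.

```python
from typing import Dict


def delta_aic(aic_by_model: Dict[int, float]) -> Dict[int, float]:
    """Difference between each model's AIC and the minimum AIC in the set."""
    best = min(aic_by_model.values())
    return {model: aic - best for model, aic in aic_by_model.items()}


# GLMM AIC values as reported (to two decimals) in Table 1.
table1_aic = {1: 708.43, 2: 699.88, 3: 694.26, 4: 696.18}
deltas = delta_aic(table1_aic)
```

Model 2 comes out at 5.62 here rather than the tabulated 5.63, presumably because the table was computed from unrounded AIC values.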

We present results for chromosome number because it is a directly observable trait, but all our reported results were very similar when ploidy level was used instead (Tables S1, S2). Latitude was not an important explanatory variable for invasiveness, chromosome number or ploidy level (Table S3). Genome size was significantly higher at higher latitudes, but there was no evidence of a unimodal (quadratic) relationship. Latitude was also not an important covariate in the models explaining invasiveness (Table S2). There was little phylogenetic signal in these results (the phylogenetic signal, a, in the PLR models was always < −2.7; Tables 1, S2; Ives & Garland, 2010). Finally, although we used information on invasive plants from two sources (the GISD and the PIER database), all our results were qualitatively similar whether we considered GISD alone, PIER alone or both (Table S4).

The simplest explanation for the opposing relationships of genome size and chromosome number with invasiveness would have been that the two traits are negatively associated with each other, but the data confirmed our expectation that genome size is significantly positively related to chromosome number. The simplest PGLS model was log2(DNA 1C-value) = −1.327 + 0.460 × log2(chromosome number), with both intercept and slope significantly different from zero (P = 0.047 and P < 0.001, respectively). Therefore, a doubling of chromosome number results in a 1.38-fold increase in genome size (because 2^0.460 ≈ 1.38). However, there was support for a more complex PGLS model in which genome size was a function of the interaction between chromosome number and latitude squared. The relationship between genome size and chromosome number was steepest at high latitudes (a doubling of chromosome number resulted in a 1.8-fold increase in genome size at a latitude of 55°, but only a 1.3-fold increase at a latitude of 30°; Fig. S3). In all PGLS models, the effect of phylogeny was substantial (λ > 0.925, indicating strong phylogenetic autocorrelation). There was a similarly strong relationship between genome size and ploidy level (Table S1).
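The back-transformation behind the 1.38-fold figure can be checked directly. The sketch uses the reported intercept (−1.327) and slope (0.460) of the simplest PGLS model and ignores the latitude interaction; the constant and function names are hypothetical.

```python
import math

# Reported coefficients of the simplest PGLS model (no latitude terms).
INTERCEPT = -1.327
SLOPE = 0.460


def predicted_c_value(chromosome_number: float) -> float:
    """Fitted DNA 1C-value (pg) from log2(1C) = INTERCEPT + SLOPE * log2(chromosome number)."""
    return 2 ** (INTERCEPT + SLOPE * math.log2(chromosome_number))


def fold_change_per_doubling(slope: float = SLOPE) -> float:
    """Ratio of fitted 1C-values when chromosome number doubles; the intercept cancels."""
    return 2 ** slope


print(round(fold_change_per_doubling(), 2))  # 1.38
```

Because the intercept cancels in the ratio, the fold-change per doubling depends only on the slope. Conversely, the 1.8-fold and 1.3-fold changes reported for 55° and 30° correspond to local slopes of about log2(1.8) ≈ 0.85 and log2(1.3) ≈ 0.38.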


Discussion

Our results provide strong evidence that invasiveness is associated with both smaller genome size and larger chromosome number (and higher ploidy level). They also show that the two traits together explain invasiveness better than either trait considered separately. The relationships with the individual traits hold despite the positive relationship between genome size and chromosome number (and ploidy), and so all three sets of relationships (Fig. 2) confirm the conjecture of Rejmánek (1996), using a global dataset of species from across the angiosperm phylogeny.

Our results raise two important questions. First, how can all three relationships be significant when they appear to conflict? A negative association (collinearity) between genome size and chromosome number would have been the simplest explanation, but these traits are positively related (Fig. 2), so collinearity is not the answer (Rejmánek, 1996). The effects of genome size and chromosome number are much stronger when both traits are considered together in an analysis (i.e. the standardized betas increase; Fig. 2), which shows the importance of accounting for genome size when considering the effect of chromosome number, and vice versa. One parsimonious interpretation is therefore that invasiveness is related both to changes in chromosome number/ploidy (and the consequent effect on genome size) and to changes in genome size for a given chromosome number/ploidy. Genome downsizing after whole-genome duplication (Ibarra-Laclette et al., 2013) also helps to explain these effects, and there may be interactions between the effects of genome size and ploidy on plant physiology, for example, increases in genome size being more important as ploidy level increases (Bennett & Smith, 1972).

The second important question raised by the results is: what causal mechanisms explain the relationship of invasiveness with genome size and chromosome number/ploidy? Genome size, chromosome number and ploidy each have effects on diverse aspects of plant physiology, and there are many mechanisms by which they may influence plant status, such as invasiveness. Genome size appears to affect the adaptability of plant species: species with large genomes fail to adapt to variable habitats, whereas plants with smaller genomes thrive and become invasive (Bennett, 1987; Bennett et al., 1998). This is possibly because smaller genomes are associated with smaller cell size (Cavalier-Smith, 1982), faster mitotic and meiotic divisions (Gregory, 2001; Francis et al., 2008; Knight & Beaulieu, 2008), faster germination (Minelli et al., 1996) and hence shorter generation times (Bennett, 1972; Grime et al., 1985; Mowforth & Grime, 1989). This is likely to be an adaptation to time-limited environments, thus preadapting the plant to invasiveness (Rejmánek, 1996). Smaller genome size is also associated with smaller seed mass (Bennett, 1987; Knight & Ackerly, 2002) and lower plant height (Minelli et al., 1996), which, through complex trade-offs among plant traits, could lead to either increased or decreased speed of spread and competitiveness (Thomson et al., 2011; Caplat et al., 2012). Even stronger evidence for these mechanisms comes from within-species studies, for example, the finding that genome downsizing leads to increased colonization potential (Lavergne et al., 2010). Polyploidy, and hence higher chromosome number, also contributes to increased invasiveness through the beneficial effects of heterosis, increased speed of cell division, gene redundancy and increased phenotypic variation (Bennett & Smith, 1972; Comai, 2005; te Beest et al., 2012), which can ‘preadapt’ taxa to be invasive or to evolve invasiveness (te Beest et al., 2012).
Empirical studies on individual invasive plant species such as Centaurea stoebe (=Centaurea maculosa; Treier et al., 2009; Hahn et al., 2012) and Claytonia perfoliata (McIntyre, 2012) have helped to elucidate these mechanisms and they have been discussed in previous cross-species studies on the effect of chromosome number and ploidy on plant status (Pandit, 2006; Pandit et al., 2011).

We found no effect of latitude on the relationship of chromosomal traits with invasiveness (Table S2), but, after accounting for chromosome number, genome size increases with latitude, and it increases more rapidly with chromosome number at higher latitudes (Table S1; Fig. S3). This relationship appeared linear rather than unimodal, as previously suggested (Bennett et al., 1998; Knight et al., 2005), probably because we had few high-latitude species in the dataset (the absolute latitude of the range of most species was < 60°); the omission of Arctic species may also explain the lack of an observed relationship between latitude and ploidy.

Plant traits such as genome size, ploidy and chromosome number show potential to be unifying characters explaining plant status, but we believe that there is important future work to further elucidate the mechanisms linking these traits to invasiveness and to discover how these relate to the different stages in the route to becoming invasive (Kubešová et al., 2010). Within this context, the intention to continue cataloguing the genome size of the world's flora (Bennett & Leitch, 2011; Galbraith et al., 2011) is to be welcomed. We note, however, that increasing representation of species within genera, where arguably it is most useful in conservation practice, is not a specific target of the Plant Genome Size workshops (Bennett & Leitch, 2011). Despite holoploid genome size being ‘less cumbersome’ to measure than chromosome number (Galbraith et al., 2011), our results show that both traits are important and data on both traits should be collected for maximum benefit to conservation practice.

Finally, the bigger evolutionary question that needs to be answered concerns the role and existence of ‘selfish’ DNA (Orgel & Crick, 1980). Whether or not genome size is under direct selection (Oliver et al., 2007), increased genome size does appear, through its diverse impacts on plant competitiveness, plasticity, speed of adaptation or dispersal, to be negatively related to plant ‘success’, whether that is the ability of species to become invasive (Figs 1, 2), to avoid becoming rare (Vinogradov, 2003) or to respond to climate change (Caplat et al., 2013). A holistic approach to understanding the status of species is therefore important (Van Kleunen & Richardson, 2007; Caplat et al., 2013). Mechanisms influencing genome size, apart from polyploidy, remain to be addressed; for example, if smaller genomes confer an adaptive advantage on plant species, is this because redundant or repetitive sequences are trimmed from the genome? Although this study does not answer these questions, the clear associations we have uncovered and their links with putative physiological mechanisms make the study of genome size a potentially powerful tool for conservation and evolutionary biologists.


Acknowledgements

M.J.O.P. was partly supported by a NERC postdoctoral fellowship (grant number NE/F014546/1). We thank Chan Wai Kit for help with data compilation of endangered species and Ray Callaway for constructive comments and suggestions.