Genome size and genomic GC content evolution in the miniature genome-sized family Lentibulariaceae



Since the first measurements of genome size in the early 1950s (Swift, 1950), researchers have tried to estimate the maximum capacity of plants for genome growth and the minimum DNA content essential for proper cell function. Plants with small genomes soon became important subjects of study because their genomes could be sequenced completely without the need to process the huge amount of uninformative, repetitive DNA (Flagel & Blackman, 2012) that makes up the bulk of plant genomes (Bennetzen et al., 2005; Ambrožová et al., 2011). Unsurprisingly, the first nearly complete plant genome sequence published was that of Arabidopsis thaliana (Arabidopsis Genome Initiative, 2000), which was then considered to be the plant with the smallest genome (Bennett & Leitch, 2005). Analysis of the Arabidopsis genome (1C ≈ 157 Mbp; Bennett et al., 2003) and the virtual removal of repetitive DNA and duplicated genes led to a theoretical estimate of 1C ≈ 50 Mbp for the minimum size of the gene complement needed for plant functioning (Bennett & Leitch, 2005).

Such small genomes were soon discovered by Greilhuber et al. (2006) in the carnivorous family Lentibulariaceae (Lamiales). They documented genome sizes as low as 1C = 63.4 Mbp in two samples of Genlisea aurea (one sample of G. aurea was originally misidentified as G. margaretae). In addition, relatively small genomes with 1C < 1000 Mbp were found to prevail in all three monophyletic lineages of the family, that is, the genera Genlisea, Pinguicula and Utricularia. Until recently, however, genome size was known for only c. 8% of the species of Lentibulariaceae, a family that contains 29 Genlisea, c. 233 Utricularia and c. 101 Pinguicula species. This leaves ample scope to search for other species with miniature genomes and for possible genomic models.

Detailed sequence analyses of G. aurea and Utricularia gibba published recently (Ibarra-Laclette et al., 2013; Leushkin et al., 2013) clearly confirm the expected minimalistic genome composition of these species and show that it is achieved by the removal of duplicated or otherwise redundant genes (e.g. genes relating to roots in the rootless U. gibba) and of virtually all noncoding repetitive DNA (transposable elements). This finding suggests a limited role of repetitive DNA in the regulation of complex eukaryotic genomes. However, it says nothing about the reasons and driving forces behind this extreme DNA shrinkage, which are important for understanding why variations in plant genome size and genome architecture exist. Clearly, answering this question will require future, targeted comparisons between species selected with regard to the evolutionary history of miniaturization events and the specific hypotheses addressed.

In order to extend the contemporary pool of suitable model species and to improve current knowledge on the history of miniaturization events in Lentibulariaceae, an extensive survey and phylogeny-based analysis of genome size evolution in 119 (c. 35%) of Lentibulariaceae species is presented. Genomic DNA base composition (GC content) is also reported for all taxa to add further to the knowledge of the process of genome miniaturization.

Materials and Methods

Samples for the measurements were mainly from the authors’ private and institutional collections, with a few species provided by other Czech carnivorous plant collections (Supporting Information Tables S1, S2). In most samples, the original species identification was verified on the basis of flower morphology. The genome size (referred to as the 1C value in this paper) and GC content were measured by flow cytometry on two CyFlow flow cytometers (Partec GmbH, Münster, Germany) using the base-unspecific, intercalating fluorochrome propidium iodide (PI) and the AT-selective fluorochrome DAPI (4’,6-diamidino-2-phenylindole). The details of the procedure and the concentrations of reagents followed Šmarda et al. (2008). The fully sequenced Oryza sativa subsp. japonica ‘Nipponbare’ (1C = 388.8 Mbp, GC = 43.6%; International Rice Genome Sequencing Project, 2005) served as the primary internal reference standard; four other internal standards, whose genome size and GC content were derived from comparison with this Oryza cultivar, were also used (Methods S1). Every sample was measured at least three times (on different days) and the replicate measurements were averaged (Table S3).
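The conversion from cytometer readings to a 1C value can be illustrated with a minimal sketch. It assumes the usual linear relation between PI fluorescence and DNA content when an internal standard is run together with the sample; the exact protocol of Šmarda et al. (2008) is not reproduced here, and the peak positions and function name below are hypothetical.

```python
from statistics import mean

# Reference standard quoted in the text: Oryza sativa 'Nipponbare', 1C = 388.8 Mbp.
STANDARD_1C_MBP = 388.8


def genome_size_mbp(sample_peak, standard_peak, standard_1c=STANDARD_1C_MBP):
    """Estimate the sample 1C value from the ratio of PI fluorescence peak means.

    Assumes fluorescence scales linearly with DNA content (an assumption of
    this sketch, not a statement of the authors' full procedure).
    """
    if sample_peak <= 0 or standard_peak <= 0:
        raise ValueError("peak positions must be positive")
    return standard_1c * sample_peak / standard_peak


# Hypothetical (sample peak, standard peak) pairs from three runs on different days.
replicates = [(52.0, 200.0), (51.0, 198.0), (53.5, 204.0)]

# Each run gives one estimate; the replicate estimates are then averaged.
estimates = [genome_size_mbp(sample, standard) for sample, standard in replicates]
average_1c = mean(estimates)
print(f"1C estimate: {average_1c:.1f} Mbp from {len(estimates)} runs")
```

Averaging per-run estimates, rather than pooling the raw peaks, keeps day-to-day instrument drift from weighting one run more heavily than another.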

In addition to the measured genomic characters, information on chromosome number, life-form, altitudinal and latitudinal distribution, and distributions on particular continents was compiled from the literature or based on personal experience (Table S2, Methods S1).

For the phylogeny-based analyses, we constructed a Bayesian, ultrametric phylogenetic tree for the measured species (Figs 1, S1). The tree is based on the concatenated alignment of available sequence data from one nuclear (ITS) and three plastid regions (rps16, matK, trnL-F) retrieved from the NCBI GenBank database (Benson et al., 2013; Table S1). Details of the tree construction are given in Methods S1.

Figure 1.

Ancestral state reconstruction of genome size in Lentibulariaceae. Significant decreases and increases of genome size (P < 0.05) are marked with blue and red arrows, respectively. Genome sizes referring to samples with probable recent polyploid origin are marked with grey asterisks.

The relationships between genome size, GC content and other trait variables were tested using phylogenetic generalized least squares (pgls; function pgls in the caper package; Orme et al., 2012) in R (R Core Team, 2013). Ancestral genome sizes were reconstructed using maximum likelihood (function ace from the R package ape v. 3.0-10; Paradis et al., 2004) and visualized on the tree with the contMap function of the R package phytools v. 0.2-80 (Revell, 2012). Significant increases or decreases in genome size (Fig. 1) or GC content (Fig. S2) were detected by comparing the actual ancestral node values with random node values obtained by the same procedure from randomly reshuffled tip values. The randomization was repeated 999 times. All statistics were calculated with log10-transformed genome sizes and logit-transformed GC contents (using the natural logarithm).
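The transformations and the tip-reshuffling test described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' R code: the node estimate is a plain mean of descendant tips standing in for the maximum-likelihood reconstruction done with ace, the grouping of tips into a clade is hypothetical, and the way the P-value is counted (observed value included among the 999 + 1 outcomes) is an assumption.

```python
import math
import random


def log10_size(mbp):
    """log10 transformation applied to genome sizes."""
    return math.log10(mbp)


def logit_gc(gc_percent):
    """Logit transformation (natural logarithm) applied to GC content in %."""
    p = gc_percent / 100.0
    return math.log(p / (1.0 - p))


def node_value(tip_values, descendants):
    """Stand-in node estimate: mean of the descendant tips (NOT an ML reconstruction)."""
    return sum(tip_values[i] for i in descendants) / len(descendants)


def randomization_p(tip_values, descendants, n_perm=999, seed=1):
    """Compare the observed node value with values from reshuffled tips.

    Returns (P for a decrease, P for an increase).
    """
    rng = random.Random(seed)
    observed = node_value(tip_values, descendants)
    shuffled = list(tip_values)
    lower = upper = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null = node_value(shuffled, descendants)
        lower += null <= observed
        upper += null >= observed
    # The +1 counts the observed value itself among the n_perm + 1 outcomes.
    return (lower + 1) / (n_perm + 1), (upper + 1) / (n_perm + 1)


# Genlisea 1C values (Mbp) from Table 1; the first four are treated as one
# clade purely for illustration (hypothetical grouping, not the published tree).
sizes = [168, 169, 73, 80, 1471, 1417, 1200, 1121, 460]
tips = [log10_size(v) for v in sizes]
clade = [0, 1, 2, 3]

p_decrease, p_increase = randomization_p(tips, clade)
print(f"P(decrease) = {p_decrease:.3f}, P(increase) = {p_increase:.3f}")
```

Reshuffling tip values while keeping the tree fixed asks whether a node's value is more extreme than expected if genome sizes were distributed over the tips without regard to phylogeny.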

Results and Discussion

Summary and reliability of the data

Lentibulariaceae species clearly have smaller genomes than the related families of the Lamiales (Fig. 2). Approximately 95% of the 119 measured taxa have a 1C value smaller than 1000 Mbp and 19 have a genome size smaller than that of Arabidopsis (Table 1). Our results mostly agree with those of Greilhuber et al. (2006), although some minor differences may arise from the slightly different genome sizes assumed for the genome size standards (cf. Methods S1). The species with the smallest known genome size in the Lentibulariaceae (and in all angiosperms) remains G. aurea (63.4 Mbp; Greilhuber et al., 2006). Our measurement of the genome size of this species (1C = 131 Mbp), however, is almost exactly double that reported by Greilhuber et al. (2006) and corresponds to a different ploidy level (‘tetraploid’) within this morphologically and karyologically variable species (Rivadavia, 2002; Albert et al., 2010). Similarly, in Pinguicula ehlersiae, the two-fold difference in the measured genome size (1C = 978 Mbp in our study vs 1C = 487 Mbp in Greilhuber et al., 2006) corresponds with the existence of two ploidy levels (2n = 22, 44; Casper & Stimper, 2009). Some other disagreements reported here, such as in Genlisea violacea, are perhaps due to unrecognized taxonomic diversity, given that the G. violacea complex has only recently been divided into five separate species (Fleischmann et al., 2011). Unrecognized karyological variability (aneuploidy), known in several Lentibulariaceae species (cf. Table 1), may cause further differences.
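The two-fold differences mentioned above are easy to check from the quoted values. The short sketch below computes the ratios; the 0.1 tolerance around a ratio of 2 is a hypothetical threshold chosen for illustration, not a criterion used in the study.

```python
# (this study, Greilhuber et al. 2006) 1C values in Mbp, as quoted in the text
pairs = {
    "Genlisea aurea": (131.0, 63.4),
    "Pinguicula ehlersiae": (978.0, 487.0),
}

# Fold difference between the two measurements of each species
ratios = {name: round(ours / theirs, 2) for name, (ours, theirs) in pairs.items()}

# Hypothetical tolerance for calling a ratio "about two-fold"
TOLERANCE = 0.1
about_twofold = {name: abs(r - 2.0) <= TOLERANCE for name, r in ratios.items()}

for name, r in ratios.items():
    print(f"{name}: ratio = {r}, about two-fold: {about_twofold[name]}")
```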

Table 1. Results of genome size and genomic DNA base composition (GC content) measurements together with published data on chromosome number

Species                      Note  1C (Mbp)  GC (%)  2n
Genlisea
  aurea                            131       38.9    (52G)
  flexuosa                         1121      44.3
  glandulossisima            A     169       34.1
  hispidula                        1417      41.5
  lobata                           1200      44.0    16G
  margaretae                 A     168       34.0
  nigrocaulis clone1               80        38.9
  nigrocaulis clone2               73
  pygmaea                          161       40.7
  repens                           77        38.8
  subglabra                        1471      41.7
  violacea                         460       43.7
Pinguicula
  agnata                           651       41.1    22H
  bohemica                         590       39.8    64H,(32H)
  caerulea                         1178      40.8    32H
  chilensis                        241       39.4    16H
  colimensis                       600       42.5    22H
  corsica                          344       39.9    16H
  hirtiflora                       529       40.7    28H
  cyclosecta                       500       40.0    22H
  dertosensis                A     708       38.9    64H
  ehlersiae                        978       40.4    44H,(22H)
  emarginata                       717       40.9    22H
  esseriana                        760       40.5    32H
  gigantea                         598       40.8    22H
  gracilis                         518       40.9    22H
  grandiflora                      424       39.1    32H
  gypsicola                        501       40.3    22H
  hemiepiphytica                   702       41.8    22H
  heterophylla                     522       39.7    22H
  ibarrae                          676       41.2    22H
  jarmilae                         173       42.4
  jaumavensis                      495       40.4    22H
  laueana                          789       41.6    22H
  longifolia ssp. caussensis A     623       39.2    32H
  lusitanica                       665       43.2    12H
  macroceras                 A     591       39.9    64H
  macrophylla                      627       41.1    22H
  mirandae                         663       41.2
  moctezumae                       572       41.6    22H
  moranensis                       713       41.8    22H,(44H)
  mundi                            616       39.9    64H
  planifolia                       583       43.1    32H
  primuliflora                     830       39.8    22H
  rectifolia                       676       41.5    22H
  reichenbachiana            A     469       38.7    32H
  rotundiflora                     547       40.8    22H
  vallisneriifolia                 344       39.4    32H
  vulgaris                         583       38.8    64H
Utricularia
  alpina                           159       39.9    18E
  amethystina                A     382       40.1
  asplundii                        202       41.1
  aurea                            193       38.3    42E,80D
  aureomaculata              A     104       35.5
  australis                        200       40.0    36E,38E,40E,44E
  bifida                           245       42.4
  biloba                           150       39.1
  bisquamata                       308       44.5
  blanchetii                       129       40.1
  bremii                           299       40.1    36F
  caerulea                         706       43.2    36E,40E
  calycifida                       287       43.9
  chrysantha                       404       40.3
  cornuta                          102       39.8    18E
  dichotoma                        246       41.4    28E
  dimorphanta                      187       38.6    44F
  endresii                         133       38.4
  flaccida                         349       42.1
  floridana                        100       39.9
  fulva                            120       38.4
  geminiloba                       287       38.4
  geminiscapa                A     191       39.1
  gibba                            103       39.9    28E
  graminifolia               A     377       40.8
  hirta                            152       41.3
  humboldtii                       228       41.6
  hydrocarpa                       107       36.8
  inflata                          313       40.1
  intermedia                       203       39.2    44E
  involvens                  A     287       41.2
  juncea                           106       39.4    18E
  laxa                             381       45.1
  livida                           239       42.0    36E
  longeciliata                     234       43.3
  longifolia                       97        41.1
  macrorhiza                       193       39.4    40E,42E,44E
  menziesii                        274       41.4
  microcalyx                       197       42.9
  minor                            190       38.8    36E,40E,44E
  minutissima                      203       42.1
  monanthos                        165       40.9
  nana                       A     561       40.5
  nelumbifolia                     349       39.7
  nephrophylla                     247       37.0
  ochroleuca                       203       39.2    40E,44E,46E,48E
  paulineae                        159       39.6
  praelonga                  A     162       42.4
  prehensilis                      526       42.8
  pubescens                        232       42.8
  purpurea                         79        34.4
  quelchii                         191       40.7
  radiata                          163       38.4
  reflexa                          270       38.8
  reniformis                       292       38.0
  resupinata                       169       39.0    36E,44C
  rostrata                         191       41.6
  sandersonii                      204       41.4
  stellaris                        192       39.5    40B,42E
  striata                          117       41.1
  stygia                           315       40.6
  subulata                         340       41.2    30E
  tenuicaulis                      183       38.5    40D
  tricolor                         262       41.4    28E
  tridentata                 A     142       39.3
  uliginosa                        116       39.6
  uniflora                         245       40.8    56E
  volubilis                        211       40.6
  vulgaris                         199       39.3    36E,40E,44E
  warburgii                        324       44.3
  welwitschii                      298       42.0

A, species for which flowering individuals were not available for identification. The letter after each chromosome number gives its source: B, Sarkar et al. (1980); C, Löve & Löve (1982); D, Tanaka & Uchiyama (1988); E, Taylor (1989); F, Rahman et al. (2001); G, Greilhuber et al. (2006); H, Casper & Stimper (2009). Chromosome counts that probably do not refer to the measured plants are in brackets.

Figure 2.

Comparison of the measured genome sizes of Lentibulariaceae genera with genome size data from other Lamiales families in the Plant DNA C-value Database (Bennett & Leitch, 2005). Boxplots show the median (thick horizontal line), interquartile range (boxes), nonoutlier range (whiskers) and outliers (circles). The red horizontal line indicates the predicted genome size of the common Lentibulariaceae ancestor. Sister relatives: Acanthaceae, Bignoniaceae, Martyniaceae, Pedaliaceae, Verbenaceae; near relatives: Lamiaceae, Orobanchaceae, Paulowniaceae, Phrymaceae. Numbers of species displayed per group are given in brackets. The Lentibulariaceae family has a significantly smaller genome size than both its sister relatives and near relatives (two-sample Wilcoxon test; P < 0.05 for both comparisons).

Our GC content estimate of U. gibba (39.9%) agrees well with that reported for the complete genome sequence (GC = 40.0%; Ibarra-Laclette et al., 2013). However, some difference is found between our GC content estimate of G. aurea (38.9%) and that reported from the partial genomic sequence (40.0%) by Leushkin et al. (2013). This difference might arise from gaps in the genomic data and/or may correspond to a different ploidy between races of G. aurea, with our sample possibly being tetraploid.

Genome size evolution

The genome size of the common ancestor of the family is estimated to be 414 Mbp (95% confidence interval: 284–603 Mbp), which is less than that of any of the close Lentibulariaceae relatives (Fig. 2). In spite of this relatively small ancestral genome size, further miniaturizations can be recognized in the evolution of the family. The exceptional tendency for genome miniaturization is most remarkable in Utricularia (Fig. 1), where ultra-small genomes (1C < 100 Mbp) have evolved independently in three clades: U. sect. Foliosa (U. longifolia), U. sect. Vesiculina (U. purpurea) and U. sect. Utricularia (U. floridana; not shown in the phylogenetic tree because of the absence of sequence data). Beyond Utricularia, another prominent miniaturization is found in Genlisea. Here, significant genome miniaturization accompanies the evolution of G. sect. Genlisea and G. sect. Recurvatae (Fig. 1). These sections typically contain species with very small genomes (all 1C < 170 Mbp; the smallest in our dataset is G. nigrocaulis clone 2, 1C = 73 Mbp). This contrasts with other Genlisea clades possessing larger genomes, with G. subglabra (1C = 1471 Mbp) having the largest genome in the whole family (Fig. 1).
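The asymmetry of the confidence interval around the ancestral estimate is what one would expect if the interval was computed on the log10 scale used for all statistics and then back-transformed; that reading is an assumption of this sketch, which only checks the quoted numbers for consistency.

```python
import math

# Values quoted in the text (Mbp)
estimate, ci_low, ci_high = 414.0, 284.0, 603.0

# Distance from the estimate to each bound on the log10 scale
lower_gap = math.log10(estimate) - math.log10(ci_low)
upper_gap = math.log10(ci_high) - math.log10(estimate)

# Back-transformed midpoint of the interval (geometric mean of the bounds)
midpoint_mbp = 10 ** ((math.log10(ci_low) + math.log10(ci_high)) / 2)

print(f"log10 gaps: {lower_gap:.3f} below, {upper_gap:.3f} above")
print(f"back-transformed midpoint: {midpoint_mbp:.0f} Mbp")
```

On the log10 scale the two bounds lie almost equally far from the estimate, and the back-transformed midpoint falls within 1 Mbp of the reported 414 Mbp.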

In contrast to Utricularia and Genlisea, genome size evolution in Pinguicula is less dramatic, showing a consistent tendency for genome expansion. The only miniaturizations appear in P. jarmilae and P. chilensis (Fig. 1). The quiet genome size evolution of Pinguicula allows some of the genome size differences to be ascribed to recent polyploidy, e.g. between the closely related P. jaumavensis (2n = 2x = 22, 1C = 495 Mbp) and P. ehlersiae (2n = 4x = 44, 1C = 978 Mbp). In Utricularia and Genlisea, the chromosome counts do not correlate with the observed genome sizes in any predictable way. This suggests that recent polyploidy has only a limited effect on the extreme size dynamics of Lentibulariaceae genomes. Consequently, this variation is most likely to be caused by differences in the content of noncoding repetitive DNA, as was indeed documented by the recent detailed genomic data (Ibarra-Laclette et al., 2013; Leushkin et al., 2013). Variation in repetitive DNA is the general reason for large-scale variation in plant genome sizes (Bennetzen et al., 2005; Grover & Wendel, 2010). In Genlisea and Utricularia, however, the turnover of noncoding DNA is unusually high, with large genome size differences generated relatively quickly, even among closely related species. This provides a unique opportunity for effective study of the principles of, and reasons for, genome size variation in plants.

While the outcome of genome miniaturization in Lentibulariaceae is recognized, the reasons for and driving forces behind this drastic genome miniaturization remain unclear. The obvious interest in Lentibulariaceae lies in carnivory, which is an adaptation to nutrient-poor environments. As proposed by Leitch & Leitch (2008), plants with larger genomes could be disadvantaged in such places, possibly because of phosphorus and/or nitrogen limitation. Members of the Lentibulariaceae usually grow under harsh conditions of nutrient-poor soils or waters. Here, the evolutionary pressure on genome size could be very strong, thus keeping the genome sizes of Lentibulariaceae species very low. However, species with miniaturized genomes did not show any common morphological or ecological features, and genome size showed no relationship with life-form or any of the ecological variables tested (pgls, P > 0.05). This indicates that nutrient availability or environmental selection plays perhaps only a minor role in driving the extreme genome miniaturizations. Nevertheless, nutrient limitation and associated carnivory may have been the actual reason for the initial genome size reduction in the Lentibulariaceae ancestor as well as the factor preventing excessive genome growth. This hypothesis needs further testing by comparing the genome sizes of carnivorous taxa with those of their noncarnivorous relatives.

Albert et al. (2010) and Ibarra-Laclette et al. (2011a,b) presented a unique mechanism of energy production which leads to the formation of reactive oxygen species. These can damage DNA molecules, possibly causing loss of the damaged DNA region. Utricularia and Genlisea might therefore be in an active process of genome downsizing without an external selection pressure. Both Utricularia and Genlisea (but not Pinguicula) are also known for extremely high substitution rates (Jobson & Albert, 2002; Müller et al., 2004; Ibarra-Laclette et al., 2011a,b), which could correspond with the influence of these reactive oxygen species. Such processes might indeed serve as a mechanistic explanation of the extremely high mutation rates and variable genome sizes observed in both genera. However, even with the data available on the complete sequence of U. gibba, the role of increased mutation rate in driving genome shrinkage in Lentibulariaceae genomes could not be verified (Ibarra-Laclette et al., 2013).

GC content

This survey of genomic GC contents in Lentibulariaceae has shown that both genome quantity and genome quality have a surprising pattern of variation within the group. The unusually wide variation of genomic GC contents appearing even within a genus (10.7% difference in Utricularia and 10.2% in Genlisea) is particularly interesting. This variation covers a substantial part of the entire known genomic GC content variation in vascular plants (ranging from 33% to 50%; Šmarda & Bureš, 2012) and represents the largest difference so far determined within a plant family or genus. Notably low GC contents are found in G. sect. Recurvatae (G. margaretae and G. glandulossisima with GC = 34.0% and 34.1%, respectively) and in U. purpurea (GC = 34.4%; Tables 1, S3, Fig. S2). Increased GC content is typical of G. sect. Tayloria (all GC > 43.7%) and also occurs in several clades of Utricularia, with the most GC-rich Lentibulariaceae genome found in U. laxa (GC = 45.1%; Tables 1, S3).

GC content correlates well with genome size in both GC-variable genera (Fig. 3), Utricularia (pgls, λ = 1, P < 0.001) and Genlisea (pgls, λ = 1, P = 0.019; excluding the outlying G. sect. Recurvatae). In Pinguicula, the phylogenetic trend between GC content and genome size is absent (pgls, λ = 1, P = 0.497; Fig. 3), perhaps because Pinguicula genomes are mostly shaped by polyploidy (whole genome duplication), which has no direct effect on the overall genomic GC content. The correlation between GC content and genome size in Genlisea and Utricularia indicates that the extreme GC content variation of their genomes primarily relates to the high genome size dynamics and to the processes of genome miniaturization and genome growth. Assuming that coding DNA would form only a minor part of the removed or amplified DNA (because of the direct effect of gene loss or duplication on plant fitness), the most intuitive explanation for this trend would be the preferential removal or amplification of GC-rich, noncoding DNA (Šmarda & Bureš, 2012; Veselý et al., 2012). However, direct proof of this from detailed sequence data is still lacking.

Figure 3.

Comparison of genome sizes with genomic DNA base composition (GC content) in particular Lentibulariaceae genomes. GC content is positively correlated with genome size in Utricularia (blue squares) and Genlisea (red circles) but not in Pinguicula (yellow circles) (pgls, α = 0.05).

Given that coding DNA is regularly the most GC-rich component of plant genomes and noncoding DNA is usually GC-poor when compared with genes (cf. Šmarda & Bureš, 2012), one would expect high GC-richness in the miniature Lentibulariaceae genomes. This work has, however, revealed several species whose very small genomes were surprisingly GC-poor (Genlisea margaretae, G. glandulossisima and Utricularia purpurea with 34.0%, 34.1% and 34.4%, respectively). These values approach the minimum genomic GC content yet known, found in some Cyperaceae and Juncaceae species (Šmarda & Bureš, 2012; Šmarda et al., 2012; Lipnerová et al., 2013; P. Šmarda et al., unpublished). These whole-genome GC contents are even lower than the GC content of the noncoding genome fraction of U. gibba (GC = 35.9%; Ibarra-Laclette et al., 2013), indicating a very different genome structure of the GC-poor species compared with the other miniature-sized genomes of Lentibulariaceae. Such a low GC content could result from a high proportion of AT-rich, noncoding DNA, but this is improbable given the minimal genome size of all three species and their expected high content of coding DNA. Therefore, the depletion of GC bases must also include the coding DNA and perhaps affects the structure of genes. This suggests the existence of an additional mechanism shaping the miniature Lentibulariaceae genomes, together with the removal and amplification of noncoding DNA. Sequencing any of the GC-poor miniature genomes of Lentibulariaceae and comparing it with the available genomic sequences of the GC-rich G. aurea and U. gibba (Ibarra-Laclette et al., 2013; Leushkin et al., 2013) now seems to be a promising way of detecting this mechanism, which might substantially improve our understanding of the reasons behind the evolution of the GC-poor genome architectures also found in other small-genomed plants.


This study was supported by the Czech Science Foundation (projects P505/11/0551, P506/11/0890, P504/11/0783, 13-29362S) and the Academy of Sciences of the Czech Republic (to L.A., project RVO 67985939). The authors thank Brian G. McMillan for language correction, Miroslav Macák, Jaroslav Neubauer, Michal Rubeš and David Švarc for providing fresh plant material of several species, and Andreas Fleischmann for help with identification of three Genlisea samples.