Measurements of genomic GC content in plant genomes with flow cytometry: a test for reliability


Author for correspondence:
Petr Šmarda
Tel: +420 532 146 275


  • Knowledge of the phylogenetic pattern and biological relevance of the base composition of large eukaryotic genomes (including those of plants) is poor. With the use of flow cytometry (FCM), the amount of available data on the guanine + cytosine (GC) content of plants has nearly doubled in the last decade. However, skepticism exists concerning the reliability of the method because of uncertainty in some input parameters.
  • Here, we tested the reliability of FCM for estimating GC content by comparison with the biochemical method of DNA temperature melting analysis (TMA). We conducted measurements in 14 plant species with a maximum currently known GC content range (33.6–47.5% as measured by FCM). We also compared the estimations of the GC content by FCM with genomic sequences in 11 Oryza species.
  • FCM and TMA data exhibited a high degree of correspondence which remained stable over the relatively wide range of binding lengths (3.39–4.09) assumed for the base-specific dye used. A high correlation was also observed between FCM results and the sequence data in Oryza, although the latter GC contents were consistently lower.
  • FCM provides reliable estimates of the genomic base composition in plants that are comparable with estimates obtained using other methods. These findings support the wider application of FCM in future plant genomic research, although such application would pose a challenge.


Research on DNA base composition (DNA quality) and DNA quantity began in the 1950s and has continued since (Chargaff, 1950; Swift, 1950; Lee, 1968; Bennett & Smith, 1976; Shapiro, 1976). Although there are genomic size data for several thousands of plant species (Bennett & Leitch, 2011), data on the base composition of plants have been published for only c. 300+ species (Shapiro, 1976; Meister & Barow, 2007; Šmarda et al., 2008). Moreover, these data are frequently inconsistent as a consequence of various methodological biases, and conclusions about the gross phylogenetic pattern of the base composition, its biological relevance, and underlying molecular forces cannot be drawn.

Knowledge concerning genome size has significantly accumulated since the introduction of flow cytometry (FCM), a method based on the fluorescence measurement of individual nuclei in a fluid suspension aligned in a linear stream using hydrodynamic focusing (Robinson & Grégori, 2007). In addition to the widespread nonspecific dyes used for DNA content quantification, this method also allows the use of base-specific dyes (Doležel et al., 1992; Marie & Brown, 1993; Vinogradov, 1994, 1998) to calculate the base composition in a manner similar to the routine FCM analysis of genome size (Barow & Meister, 2002; Meister & Barow, 2007). FCM requires only a small amount of material and is relatively easy to use, fast and cheap, which gives this method an advantage over many other previously used biochemical methods that were much more complicated and/or time-consuming (e.g. thermal denaturation, paper chromatography, and buoyant density centrifugation; reviewed in Shapiro, 1976; Meister & Barow, 2007). Unfortunately, the methodological uncertainty of FCM prevents its routine application and general acceptance in plant genomic research.

Knowledge of the concentration of any base in DNA allows calculation of the concentration of the remaining bases using universal base-pairing rules (cytosine (C)% = guanine (G)%; adenine (A)% = thymine (T)%; G% + C% = 100 – (A% + T%)). Conventionally, the base composition is expressed as the per cent proportion of G + C (i.e. GC content), which is assumed to have the most important influence on the resulting DNA quality and function. In a typical FCM experiment, the GC content is calculated from measurements of the sample and standard with two different dyes: an intercalating dye (e.g. propidium iodide (PI)) to measure the total DNA content and a base-specific dye (such as AT-specific 4′,6-diamidino-2-phenylindole (DAPI); Kapuscinski, 1995) to calculate the relative proportion of the estimated bases (Barow & Meister, 2002; Bureš et al., 2004; Ricroch et al., 2005; Meister & Barow, 2007; Šmarda et al., 2008). The final DNA content and base composition are calculated by comparing the fluorescence data to the simultaneously processed standard with a known genome size and GC content. Although the use of nonspecific intercalating dyes allows comparisons with the standard and the calculation of the DNA content of a sample in a linear fashion, the base-specific dyes usually require several consecutive base pairs of the same type to form fluorescent complexes with DNA (reviewed in Meister & Barow, 2007; Table 8.1). Thus, the resulting curvilinear relationship between the dye fluorescence and the actual GC content requires more sophisticated calculations (Langlois et al., 1980; Godelle et al., 1993).

For practical reasons, all calculations assume a random distribution of bases in the genome and consider the constant number of contiguous base pairs needed for a dye to bind the DNA molecule (i.e. ‘binding length’). The calculations assume that one molecule of a dye with binding length = 3 is bound to every group of three, four or five isolated base pairs of the same type (either AT or GC) and that two molecules of a dye are bound to groups of six, seven and eight consecutive base pairs. Binding length represents the minimum space required for the dye to bind to the DNA molecule, and the actual binding and fluorescence properties of the dye are further affected by preferences for certain types of base patterning (Abu-Daya et al., 1995; e.g. AATT is more likely to be bound by DAPI than TATA; Breusegem et al., 2002) and the interactions of the dye with other molecules (Larsen et al., 1989) or a molecule of dye itself (Manzini et al., 1983). Although binding lengths may be calculated in carefully controlled experiments with oligonucleotides of known base composition and patterning, binding length cannot be precisely calculated for unknown natural DNA where different types of nonrandom base patterning occur in different parts of the genome (e.g. in GC-rich exons and AT-rich repetitive DNA) or where complex interactions may exist with other molecules released during the sample preparation in an FCM experiment. Therefore, contemporary FCM calculations of GC content are only approximate, as many theoretical assumptions that are used in the calculations are not necessarily realized in the actual experiments.
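The counting rule described above can be illustrated with a minimal Python sketch. It counts one dye molecule per complete block of `binding_length` consecutive A/T bases in each uninterrupted AT run, and compares the count in a simulated random sequence with a closed-form expectation. The closed form, (1 − p)·p^n / (1 − p^n) for AT fraction p and binding length n, is derived here from the stated assumptions (random base order, constant binding length) and is not quoted from any of the cited papers; the function names are hypothetical.

```python
import random
import re


def count_dye_molecules(seq, binding_length):
    """Count dye molecules, one per full block of `binding_length` bases in each A/T run."""
    runs = re.findall(r"[AT]+", seq.upper())
    return sum(len(run) // binding_length for run in runs)


def expected_density(at_fraction, binding_length):
    """Expected dye molecules per base pair for a random sequence (sketch assumption)."""
    p, n = at_fraction, binding_length
    return (1.0 - p) * p**n / (1.0 - p**n)


def simulate_density(at_fraction, binding_length, length=200_000, seed=0):
    """Dye molecules per base pair counted in one simulated random sequence."""
    rng = random.Random(seed)
    seq = "".join(
        rng.choice("AT") if rng.random() < at_fraction else rng.choice("GC")
        for _ in range(length)
    )
    return count_dye_molecules(seq, binding_length) / length


# A run of five A/T bases binds one molecule and a run of six binds two (binding length 3).
example_count = count_dye_molecules("AATTTGCAAAAAA", 3)
```

Real genomes deviate from random base order, so the simulation only checks the internal consistency of the model, not its fit to natural DNA.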

There are a number of alternative base-specific dyes which are particularly useful for the measurements of base composition; for example, Hoechst 33258, Hoechst 33342, DAPI, chromomycin A3, mithramycin A and olivomycin (Doležel et al., 1992; Marie & Brown, 1993; Vinogradov, 1994, 1998), of which DAPI is the most frequently used for FCM measurements of GC content in plants (cf. Meister & Barow, 2007). Compared with other common dyes, the molecule of DAPI is relatively small, which may favor the use of DAPI in extremely large genomes with highly condensed chromatin, where accurate staining with larger dyes (such as olivomycin; cf. Vinogradov, 2005) may be problematic with standard plant FCM protocols (e.g. Doležel et al., 2007). DAPI is well known to form fluorescent complexes with at least three consecutive A or T bases in experiments with synthetic oligonucleotides (i.e. binding length = 3; Kapuscinski & Szer, 1979; Jansen et al., 1993; Trotta & Paci, 1998). In synthetic oligomers, such as CGCGAATTCGCG, it was also demonstrated that DAPI could form a complex with one molecule of water, and that this complex spans four AT bases (i.e. binding length = 4; Larsen et al., 1989). Practical FCM application indicated that the proper binding length may range from 2.77 to 4.22, with binding length = 4 recommended as the universal value for FCM experiments (Barow & Meister, 2002). However, with lower affinity and considerably lower fluorescence, DAPI might also form fluorescent complexes analogous to DAPI–DNA with some mucopolysaccharides, polyphosphates or proteins (tubulins) released during sample preparation (reviewed in Kapuscinski, 1995) that might affect the observed fluorescence and GC content estimates in real FCM experiments (cf. Noirot et al., 2002).
The possible deviations from random base patterning in a sample, the uncertainty about the effect of secondary compounds and the approximation of the binding lengths all raise the question of how reliably FCM can estimate the GC content across plant phylogeny and the diversity of the entire genome.

Comparisons between the FCM data on GC content and older estimates using biochemical methods yielded inconsistent results, with differences between FCM and older estimates of the GC content > 6% in some species (cf. Meister & Barow, 2007). It is still not clear to what extent these discrepancies between the results of FCM and other methods are a consequence of the reasons discussed above and to what extent they are attributable to the inaccuracy of the other methods and of other studies comparing them with FCM. Indeed, the practical limits of the determination of base composition by FCM are not well understood. To date, the only comparison of FCM-derived GC content data with the complete genomes of Arabidopsis thaliana and Oryza sativa showed large inconsistencies (Meister, 2005), because large regions of the GC-rich repetitive centromeric sequences in A. thaliana were not contained in the ‘complete’ genome sequence (Bennett et al., 2003), which makes any serious comparison with the much more complete genome of O. sativa impossible. To our knowledge, no other comparisons between GC estimates obtained from sequence and FCM data have been made.

Therefore, we tested the reliability and practical limits of measurements of base composition (GC content) in plant genomes by comparing the results of FCM analysis with measurements made by the alternative biochemical method of DNA temperature melting analysis (TMA). The carefully designed comparison was performed in 14 species across a phylogenetic spectrum of monocots with the highest GC content diversity discovered to date. TMA analysis has been widely used in microbiological research in past decades, and data obtained using this method have been found to exhibit a tight correlation with data obtained using other alternative biochemical methods (De Ley, 1970) and with genomic data from the complete genome sequences of numerous microorganisms (Fig. 1). In addition, we compared GC content data from our ongoing flow cytometry project in Oryza with the exact sequencing data from the Oryza Map Alignment Project (Wing et al., 2005; Ammiraju et al., 2006).

Figure 1.

Comparison of the GC content in bacteria calculated from historical melting temperature analysis (TMA) data (De Ley, 1970) with the corresponding GC content from complete genomic sequences (SEQ) revealed a high correspondence and high reliability of TMA. Only data for which the reported species (open circles) or strains (closed circles) clearly matched in both data sets were compared (see the Materials and Methods section for details).

Materials and Methods

Plant material

Fourteen species (including an internal standard) of monocot plants were selected as representatives of both the phylogeny and range of GC content (Table 1). The species were selected using the following criteria: the species had to have similar genome sizes to allow the use of a single FCM standard in the measurements (included within the compared data set); they had to contain no secondary compounds affecting the proper binding of the dye (evaluated based on the perfect peak quality); and plants had to be of sufficient size to permit a large quantity of fresh material to be sampled from a single individual for the TMA analysis (Table 1). The measurements using FCM and TMA analysis were made on a sample from a single individual of a species to prevent the eventual effect of individual variation. In the experiment with Oryza taxa, samples of the same accession number as those used in the Oryza Map Alignment Project (excluding Porteresia coarctata) were propagated from seeds kindly provided by the International Rice Research Institute (IRRI; Los Baños, Philippines), and one leaf per species (one individual) was used in all FCM measurements.

Table 1.   Results of flow cytometry (FCM) and DNA melting temperature (TMA) measurements

Species | FCM GC content mean (%) | FCM GC min–max range (%) | FCM 2C DNA content (Mbp) | FCM 2C CV (%) | TMA GC content mean (%) | TMA GC min–max range (%) | GC FCM–TMA (%)
Brachypodium pinnatum (L.) P. Beauv. | 46.4 | 0.77 | 1231 | 2.14 | 46.4 | 0.22 | 0.06
Canna indica L. | 39.7 | 0.50 | 2290 | 1.16 | 39.5 | 1.66 | 0.20
Carex acutiformis Ehrh. | 35.6 | 0.08 | 776 | 0.37 | 35.6 | 1.12 | −0.05
Dactylis polygama Horv. | 45.1 | 0.30 | 3800 | 0.66 | 45.0 | 1.59 | 0.05
Dioscorea caucasica Lipsky | 42.5 | 0.59 | 1119 | 1.15 | 42.6 | 0.88 | −0.07
Fosterella penduliflora (C. H. Wright) L. B. Sm. | 38.0 | 0.67 | 1608 | 2.04 | 40.0 | 0.46 | −1.95
Juncus inflexus L. | 33.7 | 0.73 | 657 | 1.86 | 34.4 | 0.46 | −0.77
Luzula sylvatica (Huds.) Gaudin | 33.6 | 0.62 | 954 | 1.03 | 35.8 | 1.10 | −2.19
Molineria capitulata (Lour.) Herb. | 42.3 | 0.26 | 2715 | 0.81 | 41.7 | 1.05 | 0.52
Oryza sativa L. ‘Nipponbare’ | 43.6 | – | 778 | – | 43.1 | 0.41 | 0.53
Schoenoplectus lacustris (L.) Palla | 35.8 | 0.21 | 1050 | 0.45 | 34.5 | 1.24 | 1.24
Stipa calamagrostis (L.) Wahlenb. | 47.5 | 0.49 | 1303 | 1.42 | 49.0 | 1.07 | −1.49
Strelitzia nicolai Regel & K. Koch | 41.5 | 0.39 | 1328 | 0.89 | 39.8 | 0.22 | 1.71
Zea mays L. | 47.4 | 0.64 | 4446 | 1.53 | 49.4 | 1.63 | −2.02

CV, coefficient of variation; min–max, absolute difference between minimum and maximum GC contents estimated in individual measurements; 2C, nuclear DNA content of non-endopolyploid sporophytic cells.


FCM analysis was performed according to the approach used by Barow & Meister (2002), and the methods were the same as those in Šmarda et al. (2008). The genome size and GC content were measured by FCM at the Department of Botany and Zoology, Masaryk University (Brno, Czech Republic). The measurements were conducted on two CyFlow ML flow cytometers (Partec, GmbH; Münster, Germany) using two different fluorochromes – propidium iodide (PI) for absolute DNA content estimations, and an AT-specific DAPI for calculation of the GC content. A two-step procedure (Otto, 1990) was used for sample preparation. Briefly, c. 0.5 cm2 pieces of young leaves of the sample and the standard were chopped together using a sharp razor blade in a Petri dish containing 1 ml of Otto I buffer (0.1 M citric acid and 0.5% Tween 20). Subsequently, an additional 1 ml of Otto I buffer was added. The crude nuclear suspension was filtered through a 50-μm nylon mesh. The filtered suspension was divided into two sample tubes to which either 1 ml of Otto II buffer (0.4 M Na2HPO4·12 H2O) supplemented with DAPI or 1 ml of Otto II buffer containing PI was added. The final concentrations of PI and DAPI were 50 and 2.0 μg ml−1, respectively.

Of the 14 selected species, Fosterella penduliflora was selected as an FCM internal standard because no confluence of its peaks with the peak of any of the other samples was observed and because its genome size was intermediate in the selected species pool, ensuring a < 2.8-fold difference between the sample and the standard peak for all measurements. All 13 test species were measured with F. penduliflora as the internal standard within 1 d. This series of measurements was repeated on five consecutive days, and the results for PIsample/PIstandard ratios and DAPI factors (= (DAPIsample/DAPIstandard)/(PIsample/PIstandard); Barow & Meister, 2002; eqn 6) from particular measurements/days were averaged consecutively. The genome size and GC content of F. penduliflora were calculated on the basis of comparisons with those of O. sativa ‘Nipponbare’, which has a nearly completely known genomic sequence (2C DNA content = 777.64 Mbp; GC = 43.6%; International Rice Genome Sequencing Project, 2005), and used to calculate genome size and GC content in the remaining species. The calculations of GC content were performed with the equations published in Barow & Meister (2002, eqns 7, 8) using mathematical approximation by the regula falsi method in an automated Microsoft Excel sheet (Šmarda et al., 2008). The standard calculations were performed with binding length of DAPI = 4, as recommended by Barow & Meister (2002).
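The calculation chain described above (DAPI factor, then a numerical solution for GC content by regula falsi) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Excel sheet: the dye-binding model `dye_density` is a closed form derived for random base order with a constant binding length, and it may differ term by term from eqns 7 and 8 of Barow & Meister (2002). All function names are hypothetical.

```python
def dye_density(at_fraction, binding_length):
    """Expected dye molecules per base pair under random base order (sketch assumption)."""
    p, n = at_fraction, binding_length
    return (1.0 - p) * p**n / (1.0 - p**n)


def dapi_factor(dapi_sample, dapi_standard, pi_sample, pi_standard):
    """DAPI factor as defined in the text: (DAPI ratio) / (PI ratio)."""
    return (dapi_sample / dapi_standard) / (pi_sample / pi_standard)


def gc_from_dapi_factor(factor, gc_standard, binding_length=4.0,
                        tol=1e-10, max_iter=10_000):
    """Solve dye_density(AT) / dye_density(AT_standard) = factor by regula falsi."""
    at_std = 1.0 - gc_standard / 100.0
    target = factor * dye_density(at_std, binding_length)

    def residual(at):
        return dye_density(at, binding_length) - target

    lo, hi = 0.01, 0.99  # bracket on the AT fraction (assumed wide enough for plants)
    r_lo, r_hi = residual(lo), residual(hi)
    if r_lo * r_hi > 0:
        raise ValueError("DAPI factor outside the bracketed AT range")
    at = lo
    for _ in range(max_iter):
        # False-position step: root of the secant through the two bracket ends.
        at = hi - r_hi * (hi - lo) / (r_hi - r_lo)
        r = residual(at)
        if abs(r) < tol:
            break
        if r_lo * r < 0:
            hi, r_hi = at, r
        else:
            lo, r_lo = at, r
    return 100.0 * (1.0 - at)


# Round trip with a standard of GC = 43.6% and a sample whose true AT fraction is 0.60.
example_factor = dye_density(0.60, 4.0) / dye_density(1.0 - 0.436, 4.0)
example_gc = gc_from_dapi_factor(example_factor, gc_standard=43.6)
```

Because the model is monotonic in the AT fraction for binding lengths of at least 1, the bracket always contains a single root, which is what makes a bracketing method such as regula falsi a natural choice here.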

The 11 wild-type Oryza taxa used for sequencing data comparisons were measured on the same instruments and by the same method as the selected monocots, with the exception of the internal standard, which was either the single individual of O. sativa ‘Nipponbare’ or a single individual of Zizania latifolia (Griseb.) Stapf in the case of any Oryza/Oryza peak interference. In that case, the GC content of the unknown Oryza sample was calculated from the genome size and GC content of Z. latifolia estimated from previous measurements with O. sativa ‘Nipponbare’ as the standard. The measurements of the Oryza species were repeated three times on different days and averaged consecutively. Assuming that inappropriate binding length was the only reason for a discrepancy between the FCM and either the TMA (monocots) or sequence (Oryza species) data, we also calculated new binding lengths for both data sets by searching for the binding length that minimized the sum of the squared distances between the FCM and TMA or FCM and sequence GC content data. This approximate search was performed manually with known DAPI factors on the previously mentioned automated Excel sheet.
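The search for the binding length that minimizes the sum of squared distances could be automated along the following lines. This Python sketch is illustrative only: it uses a simple grid search and bisection rather than the manual procedure described above, the dye-binding model is a closed form assumed for random base order, and the input values are synthetic numbers generated from that same model (with a hypothetical standard of GC = 40%), not measured data.

```python
def dye_density(at_fraction, binding_length):
    """Expected dye molecules per base pair under random base order (sketch assumption)."""
    p, n = at_fraction, binding_length
    return (1.0 - p) * p**n / (1.0 - p**n)


def gc_from_factor(factor, gc_standard, binding_length):
    """Invert the model for GC content (%) by bisection on the AT fraction."""
    target = factor * dye_density(1.0 - gc_standard / 100.0, binding_length)
    lo, hi = 1e-6, 1.0 - 1e-6
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if dye_density(mid, binding_length) < target:
            lo = mid
        else:
            hi = mid
    return 100.0 * (1.0 - 0.5 * (lo + hi))


def best_binding_length(dapi_factors, reference_gc, gc_standard, grid):
    """Return the grid value minimizing the sum of squared GC differences."""
    def sse(n):
        return sum(
            (gc_from_factor(f, gc_standard, n) - ref) ** 2
            for f, ref in zip(dapi_factors, reference_gc)
        )
    return min(grid, key=sse)


# Synthetic check: factors generated with a known binding length should recover it.
TRUE_N, GC_STD = 3.74, 40.0                      # hypothetical values
reference_gc = [34.0, 38.0, 42.0, 46.0, 49.0]    # synthetic reference GC contents (%)
factors = [
    dye_density(1.0 - g / 100.0, TRUE_N) / dye_density(1.0 - GC_STD / 100.0, TRUE_N)
    for g in reference_gc
]
grid = [3.0 + 0.01 * i for i in range(151)]      # binding lengths 3.00 to 4.50
best = best_binding_length(factors, reference_gc, GC_STD, grid)
```

With real measurements the minimum is typically shallow, so reporting the range of binding lengths over which the fit stays acceptable is more informative than the single best value.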

DNA melting

The TMA is based on the difference in stability between the AT (double hydrogen bond and weaker stacking interactions) and GC (triple hydrogen bond and stronger stacking interactions; Yakovchuk et al., 2006) base pairs. The differences in GC content result in different DNA melting (denaturation) temperatures. The Tm (i.e. the temperature at which 50% of DNA is melted) can be determined from experiments that measure DNA melting during continual heating (Marmur & Doty, 1962; Mandel & Marmur, 1968).
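As an illustration of how Tm can be read from a melting curve, the following Python sketch normalizes the absorbance between its minimum and maximum and finds, by linear interpolation, the temperature at which half of the transition is complete. This is a simplified midpoint estimate that ignores sloping baselines; it is not necessarily the formula used in the present study, and the example curve is a synthetic sigmoid rather than measured data.

```python
import math


def melting_temperature(temps, absorbances):
    """Temperature at which the normalized absorbance first reaches 0.5."""
    a_min, a_max = min(absorbances), max(absorbances)
    frac = [(a - a_min) / (a_max - a_min) for a in absorbances]
    for i in range(1, len(frac)):
        if frac[i - 1] < 0.5 <= frac[i]:
            t0, t1 = temps[i - 1], temps[i]
            # Linear interpolation between the two bracketing readings.
            return t0 + (0.5 - frac[i - 1]) * (t1 - t0) / (frac[i] - frac[i - 1])
    raise ValueError("melting curve never crosses the half-transition point")


# Synthetic curve: readings every 0.25 degrees C from 35 to 101, midpoint set at 70.
example_temps = [35.0 + 0.25 * i for i in range(265)]
example_abs = [1.0 + 0.35 / (1.0 + math.exp(-(t - 70.0) / 1.5)) for t in example_temps]
example_tm = melting_temperature(example_temps, example_abs)
```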

In our experiment, DNA was isolated from all 14 species using the DNeasy Plant Mini Kit (Qiagen) following the manufacturer’s instructions, with minor modifications to ensure the highest DNA quality possible, as indicated by the shape of the melting curves. Approximately 1 g of frozen plant tissue was disrupted in liquid nitrogen, and 4 ml of buffer AP1 was added; the volumes of buffer AP2 and AP3/E were increased correspondingly. Following DNA extraction and RNase A incubation (step 7), the samples were subjected to chloroform treatment. Chloroform was mixed 1 : 1 with AP1 buffer, added to the RNase A-digested samples, mixed by inversion and centrifuged for 5 min at maximum speed. The use of a QIAshredder Mini spin column (step 11) was omitted and substituted with an additional chloroform treatment (added at a ratio of 1 : 1 with the AP1/AP2 buffers, mixed by inversion and centrifuged for 5 min at maximum speed). The wash with AW buffer (step 17) was followed by an additional wash with 500 μl of ethanol (96%). The DNeasy Mini spin column was subsequently centrifuged for 3 min at maximum speed. Instead of AE buffer, the DNA was eluted into double-distilled water. Each sample was isolated twice, with TMA performed two or three times independently with each isolation (Supporting Information Table S1).

The melting curves were obtained by monitoring the UV absorption of DNA at 260 nm at a gradually increasing temperature on a Libra S22 (Biochrom; Cambridge, UK) spectrophotometer equipped with a heated cuvette holder and a temperature control unit, TK-10 (Keva; Brno, Czech Republic). The temperature program was as follows: 5 min at 35°C and a subsequent temperature increment of 0.5°C min−1 to a final temperature of 101°C. The DNA stock solution was prepared by dissolving DNA in 0.1× SSC (Sigma-Aldrich). A volume of 250 μl of the stock solution was placed into the quartz Suprasil microcuvette with a 10-mm light path length and covered with 300 μl of paraffin oil (Sigma-Aldrich) to prevent evaporation during the temperature ramp. The absorbance was measured every 30 s during the experiment.

The temperature data were collected using the data logger Testo 175-T3 (Testo AG; Lenzkirch, Germany) at 30-s intervals. The temperature calibration was performed four times using a temperature probe that was inserted into the center of a microcuvette filled with the same volume as the real sample. The four calibration measurements were averaged.

The melting temperatures were calculated from the absorbance and temperature data according to a standard formula (Wartell & Benight, 1985; eqn 1). The GC content was calculated from the melting temperatures according to the equations published in Schildkraut & Lifson (1965), including the calculation of GC percentage from Marmur & Doty (1962) [equation image not available].
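For orientation, the conversion between Tm and GC content can be sketched in Python with the widely quoted empirical relation Tm = 81.5 + 16.6 log10[Na+] + 0.41 (%GC), which combines a salt correction with a linear dependence on GC content. This relation is an assumption of the sketch; the constants used in the present study may differ. The sodium concentration of 0.0195 M for 0.1× SSC follows from the value of 0.195 M for 1.0× SSC.

```python
import math


def tm_from_gc(gc_percent, na_molar):
    """Melting temperature (degrees C) from GC content; assumed empirical relation."""
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent


def gc_from_tm(tm_celsius, na_molar):
    """GC content (%) from melting temperature; inverse of tm_from_gc."""
    return (tm_celsius - 81.5 - 16.6 * math.log10(na_molar)) / 0.41


NA_0_1X_SSC = 0.0195  # mol/l Na+ in 0.1x SSC, taking 1.0x SSC as 0.195 M

# Round trip: a genome with 40% GC melted in 0.1x SSC.
example_tm = tm_from_gc(40.0, NA_0_1X_SSC)
example_gc = gc_from_tm(example_tm, NA_0_1X_SSC)
```

Under this relation a shift of 1°C in Tm corresponds to about 2.4% GC, which is why careful temperature calibration matters for TMA.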

To demonstrate the reliability of the TMA in estimating GC content, we compared older data for the Tm values of bacteria with data from complete genome sequences (Fig. 1). A search of the review of older Tm data by De Ley (1970) and a list of complete bacterial genomes (version 8 July 2011) revealed 37 taxa present in both databases. The list published in De Ley (1970) was revised on the basis of the taxonomic status of taxa using the StrainInfo database, and only clearly corresponding taxa were included in the analysis. Occasionally, we found that TMA data were available for the same strains as were subsequently sequenced. When multiple data were available for one taxon from either sequencing or TMA, the data were averaged before comparison. Exceptions were made in cases of a complete strain match in the TMA and sequence data, where only these completely matching data were taken instead of calculating species averages. The GC contents from the TMA data review by De Ley (1970) were calculated in the same manner as in the present paper, assuming the former authors were consistently using 0.195 M Na+ (i.e. 1.0× SSC) concentrations in the TMA. Data from the two older studies carried out in the same laboratory (Saunders et al., 1964; Welker & Campbell, 1967) were removed from the data set because those studies consistently reported considerable overestimations of the GC content. The final data set comprised 32 taxa, including six with strain matches in TMA and sequence data (Fig. 1).

Statistical analyses

The statistical analyses were performed and their graphical outputs prepared using the Statistica 9.0 software package. The coefficients of determination for regression relationships with the intercept set to zero and the slope set to 1 were calculated manually in Microsoft Excel.
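The coefficient of determination for a relationship with the intercept fixed at zero and the slope fixed at 1 can be reproduced with a few lines of Python. The sketch below uses the rounded GC content means from Table 1 and treats the FCM values as the dependent variable, which is an assumption; it also uses the mean-centered total sum of squares, which is one of several conventions for models without a fitted intercept.

```python
# Rounded GC content means (%) from Table 1, in table order.
fcm = [46.4, 39.7, 35.6, 45.1, 42.5, 38.0, 33.7, 33.6, 42.3, 43.6, 35.8, 47.5, 41.5, 47.4]
tma = [46.4, 39.5, 35.6, 45.0, 42.6, 40.0, 34.4, 35.8, 41.7, 43.1, 34.5, 49.0, 39.8, 49.4]


def r_squared_identity(y, x):
    """R2 of the fixed line y = x: 1 - SS_residual / SS_total (mean-centered)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - xi) ** 2 for yi, xi in zip(y, x))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


r2_fcm_vs_tma = r_squared_identity(fcm, tma)
```

Because the table values are rounded to one decimal place, the result can differ slightly in the third decimal from a value computed on unrounded measurements.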


The GC contents measured by FCM in 14 selected monocot species ranged from 33.6% in the rush Luzula sylvatica to 47.5% in the grass Stipa calamagrostis (Table 1). The results of TMA were similar to those of FCM. The preliminary regression analysis of the results obtained using the two methods revealed a nonsignificant intercept, and the slope did not differ significantly from 1 (P > 0.05). When the intercept was removed and the slope set to 1, the final regression relationship remained tight, with a coefficient of determination of R2 = 0.935 (Fig. 2). The GC content estimates obtained for the same individuals using the two methods differed by no more than 2.2%, with a maximum difference of 2.19% in the case of L. sylvatica (Tables 1, S1). This difference is higher than that observed between the TMA and sequence data for the same bacterial strains (Fig. 1; maximum difference 0.66% in GC content), indicating that the bias introduced with FCM is perhaps higher than that of TMA. No systematic bias (under- or overestimation) was found with either method across the entire GC content range (Table 1, Fig. 2). Changing the binding length of DAPI had no substantial effect on the correspondence of FCM and TMA estimates, and the R2 of the relationship FCM = TMA remained > 0.93 for a wide range of DAPI binding lengths, from 3.39 to 4.09. The best match between FCM and TMA was with a binding length of DAPI = 3.74 (R2 = 0.941).

Figure 2.

Comparison of the GC contents measured in the same individuals of 14 monocot species with flow cytometry (FCM) and DNA temperature melting analysis (TMA) revealed a high correspondence between the two methods.

Our pilot FCM measurements of the GC content in 11 Oryza species were highly correlated with sequence data for the same accessions obtained by Ammiraju et al. (2006). When the nonsignificant intercept (P > 0.05) was removed, the coefficient of determination was R2 = 0.894 (Fig. 3). The GC content estimates obtained using FCM were consistently 1.028-fold higher (c. 1% GC content) than those from the sequence data (Fig. 3). Changing the binding length used in FCM had only a negligible effect in improving the correspondence between the FCM and sequence data, whereas the effect of changing the assumed GC content of the standard O. sativa ‘Nipponbare’ was found to be dramatic (Fig. 3). The best correspondence of the FCM and sequence data was obtained when the GC content of O. sativa ‘Nipponbare’ was calculated to be 42.3% (Fig. 3), for which the coefficient of determination of the regression line for FCM = sequence was R2 = 0.874 for a binding length of DAPI = 4, and the maximum was R2 = 0.894 for a binding length of DAPI = 3.52. These results indicated that GC-rich sequences were systematically underrepresented in the sequence data from Ammiraju et al. (2006). The alternative possibility that the GC content of O. sativa ‘Nipponbare’ reported by the International Rice Genome Sequencing Project (2005) was underestimated seems improbable, as the sequence is relatively complete (95% cover), and the GC contents calculated with FCM on the basis of the original O. sativa (GC = 43.6%) are consistent with the GC contents provided by TMA (see Fig. 2).

Figure 3.

Comparison of the GC contents from the partial genomic sequences (SEQ) of 11 species of Oryza (Ammiraju et al., 2006, table 4) with flow cytometry (FCM) estimates in the same accessions of the same species (closed circles). Despite the high correlation (upper solid line, intercept excluded), the FCM data were consistently 1.028-fold higher than the data from the partial genomic sequences (ideally, the FCM results should be equal to the SEQ results; lower solid line). The linear regression lines (including intercepts) for the FCM results with various inputs of 4′,6-diamidino-2-phenylindole (DAPI) binding lengths (BLs) and the GC content of the standard (Oryza sativa ‘Nipponbare’; Os) are shown as dot-dashed lines (point symbols are hidden).


Our analysis reveals that FCM provides accurate data on base composition, which are correlated with the results of independent biochemical methods. Although we were unable to ascertain whether the differences observed between the FCM and TMA estimates were caused by a methodological bias of either FCM or TMA, the observed difference of c. 2% in estimated GC content provides a practical estimate of the resolution limits of FCM data obtained from measurements of phylogenetically distant species with a previously unknown genome structure. Notably, this prediction was based on the results of a carefully designed experiment in which an individual standard was used for all measurements, and the measurements were performed with the same instruments and within a few consecutive days. With modifications of this design and the introduction of other sources of possible bias, the deviation might occasionally increase. Our present experience with GC content measurements in numerous taxa, including diverse accessions of different Oryza species (Šmarda et al., 2008; P. Šmarda et al., unpublished), indicates that this bias would be to a major extent of methodological origin, while biological factors, such as the possible presence of intraspecific variation, would have only a very minor effect on the accuracy and repeatability of species’ GC content estimates.

Our experiments revealed the most appropriate DAPI binding length for the FCM measurements of GC content to be slightly less than 4, which lies within the limits of DAPI binding lengths (between 3 and 4) generally predicted in experimental oligonucleotide studies (see the Introduction). Despite the general uncertainty regarding the exact value of the binding length of the AT-specific DAPI used, changing this parameter had a limited effect across the plant GC content ranges studied and on the general correspondence between the estimates obtained using FCM and other alternative methods. Hence, the DAPI binding length of 4 recommended by Barow & Meister (2002) might still be used as a universal and sufficiently robust approximation, enabling reliable FCM analysis of the GC content across a variety of plant species. Nevertheless, some care must still be taken when comparing distantly related taxa with extremely different genome compositions and GC contents, for which the importance of using the correct binding length may be substantial. Such differences might be expected, for example, when comparing plant genomes with the isochore structured genomes of warm-blooded vertebrates (e.g. human, mouse and chicken; Eyre-Walker & Hurst, 2001; Constantini et al., 2009), which cannot therefore currently be recommended as safe reference standards for calculating plant GC content with FCM. This is a critical point, because the standards currently used in FCM for the calculation of base composition are based on comparison with the sequence of human or chicken (Doležel et al., 1992; Marie & Brown, 1993; Barow & Meister, 2002; Doležel & Greilhuber, 2010), and the effect of the different base patterning on resulting GC content estimates for plant reference standards is not known. 
As the effect of the GC content of the standard appears to be more important than knowledge of binding length for the correct estimation of GC content in plants, we prefer the calculations of plant GC content with FCM to be based on O. sativa ‘Nipponbare’ (GC = 43.6%) as the primary reference standard. Four different groups (Vij et al., 2006) have independently sequenced O. sativa ‘Nipponbare’, and c. 95% of its genomic sequence is known, including a number of highly repetitive centromeric regions (International Rice Genome Sequencing Project, 2005), which are frequently absent in ‘complete’ genomic projects carried out in other plants (e.g. in Arabidopsis; Bennett et al., 2003; Meister, 2005).

In contrast to the large variation in FCM measurements obtained by comparing phylogenetically distant species with contrasting genome organizations, the experiment with Oryza and some of our former analyses with related grass species (Šmarda et al., 2008) clearly demonstrated that FCM is a sensitive method that enables the detection of small differences in GC content between closely related taxa with similar genome structures and probably similar base patterning. In comparisons between closely related species, with consistent usage of a single standard, the resolution of FCM might be on the order of tenths of GC %. Experiments in which a series of samples are compared with a single standard do not require any knowledge of the GC content of the standard, and estimates may simply be calculated using the dye factor only, enabling the use of any species as a standard (Barow & Meister, 2002; Meister & Barow, 2007). Thus, the calculation of the absolute GC value appears to be a technical problem that should not prevent a researcher from drawing conclusions about the meaning of the observed base composition variation, even if the exact GC content of the standard remains unknown.

Research on base composition has a long tradition in microbiological research, in which the base composition represents a significant classification criterion (Stackebrandt & Liesack, 1993; Stackebrandt et al., 2002), and, similar to genome size, it is an important predictor of species ecology (Rocha & Danchin, 2002; Musto et al., 2006; Mann & Phoebe-Chen, 2010). Similar research on the magnitude and dynamics of genome evolution with respect to base composition (DNA quality) and its significance for ecology and species distribution is widely lacking in plants because of the absence of sufficient DNA base composition data. Therefore, FCM provides a useful tool for the future completion of base composition data in plants and the elucidation of the evolutionary dynamics of base composition and its biological relevance.

Our present FCM data on GC content represent both the lowest and highest GC contents observed in vascular plants (Table 1) and indicate that the range of GC content in monocots is at least 14%. This wide range results partly from the extremely high GC content found for grasses, consistent with all previous FCM studies (Barow & Meister, 2002; Meister & Barow, 2007) and sequence data (cf. Šmarda & Bureš, 2012). The extremely high GC content of Poaceae is consistent with the unusual pattern of GC composition of grass genes (Kuhl et al., 2004; Guo et al., 2007), and elucidation of the selection forces that mediate the function of these genes might help to significantly improve our knowledge of the function of GC content (Šmarda & Bureš, 2012). Recent evidence indicates that these global GC content increases might coincide with intragenomic divergence in gene regulation and in overall improvement of the stress response (Tatarinova et al., 2010), which might be partly responsible for the extreme evolutionary success of grasses in modern geologic times (Šmarda & Bureš, 2012). Studies in bacterial and vertebrate genomes have suggested several hypotheses about the driving forces of DNA base composition and the effect that changing GC content may have on species ecology (reviewed in Šmarda & Bureš, 2012). One of the most widely discussed hypotheses is the thermostability hypothesis, which suggests an increased tolerance of GC-rich organisms for higher temperatures because of the higher thermal stability of GC- vs AT-rich DNA (Galtier et al., 1999; Vinogradov, 2003; Musto et al., 2006; Bernardi, 2007; Bucciarelli et al., 2009). 
Because the synthesis of GC base pairs is more energetically demanding (Rocha & Danchin, 2002), fast-growing organisms, for example, may preferentially use AT nucleotides, which are cheaper and easier to synthesize. A decrease in genomic GC content (as compared with close relatives) might therefore be expected in parasitic and fast-growing species, species with extremely large genomes or species living in resource-limited environments (e.g. limited by nutrients, light or competitors; Šmarda & Bureš, 2012). It would be useful to test these hypotheses in plants in the near future.

The measurement of GC content is easily accessible for laboratories carrying out FCM measurements of DNA content and possessing at least two independent sources of excitation light (included within the same or situated on different flow cytometers). The FCM measurement of GC content requires only minor modifications to the standard procedures for DNA content measurements and has the same requirements for tissue quality and quantity (cf. Doležel et al., 2007; Greilhuber et al., 2007). The only differences are the use of an additional base-specific dye (Barow & Meister, 2002) and sample measurement in a separate flow cytometer in parallel to the standard procedure of DNA content measurement. Our modification of this procedure for the purposes of GC content measurements consists only of chopping up the sample and the standard for both dyes together in a single Petri dish and calculating the corresponding results pairwise (rather than comparing the averages of DAPI and PI measurements over several days), which helps to limit the possible effects of secondary metabolites and the day-to-day variation in instruments/operators. In view of the simplicity of the modifications used, and the reductions achieved in the time required to carry out the measurements and in their associated costs, we hope that the measurement of GC content in plants by FCM will find a wider application in the near future.
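The difference between the pairwise calculation and the comparison of averages can be sketched as follows. This is a minimal illustration, assuming that each preparation yields one sample/standard fluorescence ratio with DAPI and one with PI. The function names are hypothetical and the code is not the software used in this study.

```python
from statistics import mean


def pairwise_dye_factors(dapi_ratios, pi_ratios):
    """Dye factor for each preparation: the DAPI sample/standard ratio divided
    by the PI sample/standard ratio obtained from the same chopped sample."""
    if len(dapi_ratios) != len(pi_ratios):
        raise ValueError("each DAPI ratio needs a matching PI ratio")
    return [dapi / pi for dapi, pi in zip(dapi_ratios, pi_ratios)]


def dye_factor_pairwise(dapi_ratios, pi_ratios):
    """Mean of the per-preparation dye factors (pairwise calculation)."""
    return mean(pairwise_dye_factors(dapi_ratios, pi_ratios))


def dye_factor_from_averages(dapi_ratios, pi_ratios):
    """Ratio of the averaged DAPI and PI ratios (comparison of averages)."""
    return mean(dapi_ratios) / mean(pi_ratios)
```

In the pairwise form, any disturbance that shifts both ratios of one preparation in the same proportion cancels within that preparation. In the averaged form, it remains in both means and can bias their quotient.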


We are obliged to the staff of the Botanical Garden of Masaryk University for kindly providing us with several experimental plants, the International Rice Research Institute (Philippines) for providing all rice seeds, and the Czech Ministry of Education (projects MSM0021622416 and LC06073) and the Czech Academy of Sciences (grants GAČR 206/08/P222, GAČR 206/09/1405, GAČR 506/10/1363, and GAČR 506/11/0890) for research funding.