Quantification of gene copy numbers is valuable in marine microbial ecology

Marine nitrogen (N₂) fixation is important in the global biogeochemical cycling of N and the interlinked cycling of carbon (C). Understanding and predicting global marine N₂ fixation requires information on diazotroph species-specific rates of growth and N₂ fixation, their biogeography, and the physiology of nutrient limitation in diazotrophs and nondiazotrophs (for the basis of competition in models). Because many diazotrophs are not easily visualized or quantified and many are as yet uncultivated, the application of polymerase chain reaction (PCR) methods to amplify the nifH gene, which encodes nitrogenase, the enzyme that catalyzes N₂ fixation, has led to multiple discoveries, including new microorganisms (Zehr et al. 1998; Zehr and Capone 2020). Moreover, the use of quantitative PCR (qPCR) has revealed unexpected biogeography of key marine diazotrophs regionally and globally (Church et al. 2005; Bentzon-Tilia et al. 2015; Langlois et al. 2015; Messer et al. 2015; Shiozaki et al. 2017; Harding et al. 2018; Mulholland et al. 2019). Biogeochemical modeling approaches have also been invaluable for providing hypothetical dynamic biogeography of the biomass associated with size classes of diazotrophs based on a number of assumptions (e.g., growth, nutrient uptake characteristics, and mortality; Dutkiewicz et al. 2014). It is, however, difficult to validate these models since there are few comprehensive datasets of N₂-fixing organism biogeography.
In a recent study, Meiler et al. (2022) attempted to use assembled qPCR nifH data to deduce the biogeography of key cyanobacterial diazotrophs using a derived currency of cell abundances and biomass, in order to assess the value of these data for validating numerical model biogeography. Using a few published studies (six publications that varied in species data and used different approaches) that had reported in situ data on nifH gene abundances, cell numbers, or biomass (e.g., Hynes et al. 2012; Foster et al. 2013; Krupke et al. 2013; White et al. 2018), they derived conversion factors from nifH gene abundances to cell numbers, and from cells to biomass. The maximum and minimum of the derived conversion factors (i.e., the extremes, not average values) for both steps were then applied to an existing global marine database on nifH gene abundances (Luo et al. 2012; Tang and Cassar 2019). This approach generated ranges for depth-integrated nifH gene abundance and nifH-inferred cell counts and biomass across oceanic regions and latitudinal gradients for comparison with biogeochemical model output. At the biome level, the range in biomass predicted from nifH gene measurements was approximately four orders of magnitude greater than the dissimilarity in output from four model simulations from the same group over a period of 6 years. It was concluded that the error associated with converting nifH gene abundances to the model "currency" (form and units of variables needed by models) was too large for utility in model validation (ranging from four orders of magnitude for Trichodesmium to one order for Crocosphaera, UCYN-B). They concluded that qPCR data for the nifH gene, and likely other genes, were best used as an indicator of presence/absence rather than a measure of abundance for evaluating models. Although the analysis of Meiler et al. (2022), based on the commonly used approach of converting data to presence/absence, is very useful and highlights the need for further research, their conclusion that "Despite its usefulness as an indicator of diazotrophic presence, nifH gene abundance may only weakly correlate with cell abundance and diazotrophic biomass" is worth revisiting.
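To make the effect of applying the extremes of both conversion steps concrete, the following is a minimal sketch, not the procedure of Meiler et al. (2022). The function names and all numerical values are hypothetical and chosen only for illustration. It shows that pairing the minimum and maximum of a gene-to-cell factor with the minimum and maximum of a cell-to-carbon factor multiplies the two ranges.

```python
import math


def propagate_extremes(gene_copies_per_l, copies_per_cell_range, carbon_per_cell_range):
    """Return (min, max) biomass obtained by pairing the extreme conversion factors."""
    low_copies, high_copies = copies_per_cell_range
    low_carbon, high_carbon = carbon_per_cell_range
    # The fewest cells follow from the highest copies-per-cell value, and vice versa.
    min_cells = gene_copies_per_l / high_copies
    max_cells = gene_copies_per_l / low_copies
    return min_cells * low_carbon, max_cells * high_carbon


def orders_of_magnitude(low, high):
    """Width of a range expressed in powers of ten."""
    return math.log10(high / low)


# Hypothetical values for illustration only (not taken from any dataset).
gene_copies = 1.0e6             # nifH copies per liter
copies_per_cell = (1.0, 100.0)  # assumed gene-to-cell range, two orders wide
carbon_per_cell = (1.0, 100.0)  # assumed pg C per cell range, two orders wide

low_biomass, high_biomass = propagate_extremes(gene_copies, copies_per_cell, carbon_per_cell)
spread = orders_of_magnitude(low_biomass, high_biomass)
print(f"biomass range: {low_biomass:.3g} to {high_biomass:.3g} pg C per L ({spread:.1f} orders)")
```

Under these assumed inputs, two factors that each span two orders of magnitude yield a biomass range spanning four, because the widths of the ranges add on a logarithmic scale. Using central estimates for each step instead of the extremes would not produce this widening.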
We disagree with the conclusion that qPCR data are only weakly correlated with cell abundance because the analysis was based on a very select few published datasets and (1) used assumptions about species and taxa biogeography based on size classes, (2) inflated the significance of polyploidy in undermining quantitation of diazotrophs, and (3) used the resulting unnecessarily high ranges from the combination of two conversion values as metrics, and because (4) the original nifH gene data reveal spatial patterns consistent with direct and indirect observations of diazotroph abundance and N₂ fixation rate, contradicting the Meiler et al. (2022) assertion that nifH gene abundance is only of utility in presence/absence assessment.

The problem with defined size classes for key diazotrophs
Meiler et al. (2022) argue that qPCR data are at the root of the currency disconnect between environmentally measured abundances and model outputs, yet there are additional factors that affect abundance measurements based on qPCR (or any other molecular or imaging methods) and models. One important factor is that models use defined size classes, whereas many genera and species bridge different size classes and have different physiological characteristics that are not defined by size. The models define up to five size classes of diazotrophs between 3 and 15 μm in diameter (Dutkiewicz et al. 2021). The four cyanobacterial diazotrophs treated in the Meiler et al. (2022) study, Trichodesmium, the UCYN-A symbiosis, diatoms with Richelia symbionts, and Crocosphaera (some of which form large aggregates), vary in size across these size ranges (Pierella Karlusich et al. 2021). qPCR primers generally target a narrowly defined taxonomic group, and thus can distinguish between abundant and rare taxa, and define different sizes and biogeographic ranges. However, the data that were used combined taxa (qPCR data) within genus and species that varied not only in size, but also in frequency of occurrence in the oceans and biogeography.
The four groups addressed by Meiler et al. (2022) each have their own challenges for quantification. Trichodesmium spp. are a group of filamentous cyanobacteria that do not form specialized N₂-fixing heterocyst cells (Zehr et al. 2023). Trichodesmium species vary greatly in size (6-16 μm in width and 4-23 μm long across very diverse species; Hynes et al. 2012) and other characteristics (Webb et al. 2009; Zehr et al. 2023; Hynes et al. 2012), but only a few species are typically abundant and found in wide areas of the ocean. Filamentous heterocyst-forming N₂-fixing cyanobacteria are associated with a number of genera of diatoms that vary not only in size and physical location of the symbiont, but also in the number of filaments per diatom, the number of heterocysts per filament, and the degree of aggregation or chain formation (Caputo et al. 2019). UCYN-A (Candidatus Atelocyanobacterium thalassa) is a collection of mostly uncultivated strains of a metabolically reduced cyanobacterium, endosymbiotic with the haptophyte Braarudosphaera bigelowii and relatives (Hagino et al. 2013; Farnelid et al. 2016). There are coastal and open ocean UCYN-A ecotypes that vary in size by at least a factor of 5 (Turk-Kubo et al. 2021). Crocosphaera watsonii is a marine planktonic species comprising a number of very closely related strains (Webb et al. 2009; Zehr et al. 2023) that include both free-living and symbiotic forms but vary in size, physiology, and aggregation. C. watsonii natural populations are composed of large and small forms and aggregates (Zehr et al. 2023). Most common is the small form (Webb et al. 2009; Bench et al. 2016). Hence, as illustrated here (see also below), there is a fundamental currency problem when defining a size for diazotrophic genera or species, yet these different genera were all binned by Meiler et al. (2022). This problem inevitably contributes to variability in calculated nifH : cell and C : cell values across the different species and symbioses, and will also make any comparison between model output and nifH gene copy data highly questionable. There are qPCR primers for large and small Crocosphaera, different strains of Trichodesmium, Richelia/Calothrix, and UCYN-A, but Tang and Cassar (2019) combined the data for different strains/species within each group; these combined data were then used by Meiler et al. (2022) and contributed to the unnecessarily high ranges of conversion factors.

Number of nifH genes per cell (polyploidy)
The conclusions of Meiler et al. (2022) are based on analysis of a very small number of studies that reported in situ data for nifH gene abundance as well as some other measure of cell abundance or biomass (Krupke et al. 2013; Wilson et al. 2017; White et al. 2018) to calculate gene-to-cell conversions for the four major cyanobacterial groups based on the qPCR abundances of taxa-specific nitrogenase genes (nifH) and cell abundances. The studies differed in the target species, the specific measurements and methods used, and biogeographic location. Although these are among the few data available for deriving the desired conversion factors, the studies were not adequate for this purpose due to limitations that lead to erroneous conclusions regarding the ranges of values.
The studies cited used different approaches to determine nifH-to-cell conversions: White et al. (2018) used microscopy vs. qPCR; Wilson et al. (2017) used flow cytometry-sorted cells coupled to qPCR; and Krupke et al. (2013) used catalyzed reporter deposition fluorescence in situ hybridization (CARD-FISH) vs. qPCR. These studies sometimes involved analyses from separate sample collections, as noted by Meiler et al. (2022), but this may be highly problematic for the conclusions drawn. For instance, Trichodesmium populations are composed of free filaments and aggregates (puffs and tufts) that are buoyant, both float and sink, and are notoriously heterogeneously distributed and difficult to sample (Rodier and Borgne 2008). Therefore, measurement comparisons need to be made on exactly the same water sample, not replicates from a CTD cast or replicate casts at the same station. Nonetheless, the cited study for Trichodesmium had an average nifH : cell that ranged by only a factor of 2 (130 ± 239 nifH : cell; White et al. 2018), compared to the total range Meiler et al. (2022) used for error propagation of 1.4-1405 nifH per cell. Furthermore, some of the studies also compared different sample types, such as different size fractionations. For example, Krupke et al. (2013) used different samples (pooled FISH filter size fractions compared to 0.2-μm filtered DNA samples) to measure abundances of UCYN-A by CARD-FISH and nifH gene abundance by qPCR, respectively, thus introducing another source of error for comparison purposes. A recent study shows that nifH gene abundance can be well correlated with cell abundance, certainly not varying by orders of magnitude (Fig. 1, panels A-D; Gradoville et al. 2022).
One of the fundamental limitations of the approach taken by Meiler et al. (2022) is the aggregation of species/strains that are known to vary greatly in size (or sizes of symbiotic hosts), physiology, and biogeography, which unfortunately were combined in the Tang and Cassar (2019) database used for the study. For example, data were combined from individual measurements for UCYN-A1 and UCYN-A2, which differ in cell size and biomass, biogeography, and physiology (Farnelid et al. 2016; Turk-Kubo et al. 2021), to represent UCYN-A in the models that address the global open ocean.
Similarly, the diatom symbiont species were combined, despite the fact that there are qPCR probes that target the individual symbioses that differ in size, physiology, and biogeography. In the case of Richelia there are a number of sources of variability of nifH : cell and cell : C, as Richelia and related strains form associations with several genera of diatoms of varying sizes, with differing numbers of symbiotic filaments and numbers of cells per filament (Pierella Karlusich et al. 2021; Flores et al. 2022).
A variety of issues pertain to quantifying cell numbers from any gene (DNA) measurement. Since many of the diazotroph species targeted by qPCR are uncultivated (e.g., UCYN-A1), there are no precise estimates of extraction efficiencies, as noted by Meiler et al. (2022), but they are likely relatively constant within species. Furthermore, extraction efficiency is also an issue with any DNA-based approach, such as metagenomics (Pierella Karlusich et al. 2021), so the implications are not unique to qPCR. Polyploidy, or multiple genome copies per cell, has long been known in cyanobacteria (and eukaryotes), but has only recently been recognized with respect to N₂-fixing cyanobacteria in oceans (Sargent et al. 2016; Pierella Karlusich et al. 2021). Although little is known, as there have been few direct studies, the potential for polyploidy to affect cell abundance estimates may be significant. However, even for Trichodesmium, field data suggest that the average copy number could be easily modeled based on the linear relationship of cell number vs. nifH gene copy number (Fig. 2). Thus, a more straightforward approach, until there are further defining studies, would be to make simple assumptions about the relationship between cell and nifH gene copies for common representative species, as exemplified for Trichodesmium (Fig. 2). Nonetheless, we do advocate that extreme caution should be taken when converting gene copies to cell number, and in many cases this conversion is not necessary or desirable (see below).
The example data provided in Figs. 1 and 2 show that qPCR data can be related to cell numbers and certainly are not consistent with the wide range of conversion factors for nifH gene copy : cell used by Meiler et al. (2022). Rather, we find that the choice by Meiler et al. (2022) of using range as the metric amplifies the currency problems and that the problems are not specifically related to qPCR per se (e.g., the biomass conversion problem would also apply to cell counts).

Cell-to-biomass conversions inflate the qPCR: Biomass error
The cell-to-biomass conversion calculations are perhaps the most problematic analysis in Meiler et al. (2022). In order to convert cell numbers to biomass (in C units), Meiler et al. (2022) used a few published datasets (Foster et al. 2011; Hynes et al. 2012; Wilson et al. 2017; Harding et al. 2018) and biomass : carbon conversion parameters (Strathman 1967; Verity et al. 1992). The estimated biomass is very sensitive to the measurements of size, assumed dimensions, and assumed elemental stoichiometry.
Only a few of the studies directly measured cell C on cultures (Strathman 1967; Verity et al. 1992). Much of the cell C data came from studies that were not designed to directly determine universal cell : C conversion factors, but were estimating cell C in order to determine isotope enrichment in labeled C and N uptake studies by nanoscale secondary ion mass spectrometry (nanoSIMS; Foster et al. 2011; Krupke et al. 2013; Harding et al. 2018). In nanoSIMS, cells are only partially analyzed as an ion beam ionizes successive layers of the cell, and the total biomass measured is dependent on the orientation of the cell and the fraction of the cell "burned" through. Total cell C is estimated by calculating volumes based on assumed 3D shapes from two-dimensional slice data. Although this approach is necessary for nanoSIMS isotope tracer experimental analysis, accurate estimates of cellular C are better derived from cultivated organisms and/or other techniques. Furthermore, in the case of UCYN-A, the study chosen for analysis is from the extremes of the biogeographic ranges for UCYN-A (the Arctic; Harding et al. 2018), which is not the best representative for the species, particularly for the open ocean that is the focus of the models.
The range of cell : C conversions is amplified by combining strains/species of known differences in size and biogeography. For example, Meiler et al. (2022) combined UCYN-A strains, even though the strain that inhabits the open ocean, UCYN-A1, is much smaller than the others and can be constrained in size. The size of the UCYN-A1 host, which is the relevant measure for modeling biomass, is only 2-3 μm in diameter, whereas the coastal UCYN-A2 strains can be over five times this size (Cabello et al. 2016).
The limitations of cell : C conversions apply when calculating biomass numbers from any cell data to biomass C or N, which are the typical currency of models, independent of the method used to derive the conversions (including microscopy or flow cytometry). Thus, such conversions do not apply only to qPCR data, but represent a general challenge in a wide variety of environmental studies that seek to extrapolate specific measurements to their biogeochemical impact at larger scales, including modeling approaches (Coles et al. 2017).

Diazotroph qPCR data are valuable
Meiler et al. (2022) describe the known limitations of qPCR data, including the issue of polyploidy and sources of variance associated with any DNA (or RNA) based methods. Combined with their handling of conversion factors and choice of range as a metric, this may leave the casual reader with the impression that qPCR is a flawed technique with no value. However, qPCR data can allow for quantitative comparisons in abundances even if there are limitations in converting to cell number and biomass. We stress that qPCR data have demonstrated important characteristics of the biogeography of diazotrophs, including seasonality (Church et al. 2009; Bentzon-Tilia et al. 2015; Cabello et al. 2016; Tang and Cassar 2019), biogeography (Shiozaki et al. 2017; Hallstrøm et al. 2022), and hotspots or blooms of uncultivated or difficult to detect organisms (Goebel et al. 2008; Moisander et al. 2010). This has led to invaluable insights into the environmental drivers of diazotrophic taxa now known to be of key importance to marine N cycling. Also, high abundances measured by qPCR made it possible to define samples for cell sorting and genome sequencing of UCYN-A (Bombar et al. 2014). qPCR is the only current method allowing quantification of uncultivated diazotrophs at sea, which is critical for obtaining more data for defining biogeography and validating models given the large spatial and temporal scales and patchiness of natural populations.
Finally, if we examine the nifH dataset used by Meiler et al. (2022), we find that not only do the spatial patterns of gene abundance closely match independent estimates of diazotroph biomass, but they also echo the distribution of N₂ fixation rates from compilations of observations, inverse models based on nutrient distributions, and ecosystem models (Luo et al. 2012; Aumont et al. 2015; Landolfi et al. 2015; Letscher et al. 2015; Wang et al. 2019). Indeed, the analysis of Luo et al. (2012), while lacking a number of newer samples in the more recent Tang and Cassar (2019) compilation, demonstrates the close relationship between depth-integrated diazotroph carbon biomass (their fig. 8a) and depth-integrated nifH counts converted to carbon biomass (their fig. 9a).

Summary
The main conclusion of Meiler et al. (2022) is that caution should be taken when extrapolating from gene counts (qPCR, but this also applies to metagenomic data) to cells or biomasses of organisms. This is an important point, but we argue that the ranges provided for gene : cell or cell : biomass relationships are poorly supported by current data. Moreover, we find that using such range extrema to evaluate empirical observations is inappropriate and unintentionally supports the conclusion that gene counts are a poor basis for understanding controls of marine diazotrophy.
Taken together, we find that the conversion factors calculated by Meiler et al. (2022) are based on too few data and that these in several cases were obtained with variable methods and sampling strategies unsuited for calculating gene : cell or cell : biomass relationships. Hence, we find that (1) the confidence in the ranges of conversion factors calculated by Meiler et al. (2022), which form the backbone of their study, is limited, which makes their conclusion on the restricted utility of qPCR data questionable; (2) their study emphasizes the importance of ranges, although methods typically use reasonable estimates or averages rather than ranges that represent the extremes; and (3) the basic premise for their study, the conversion of genes to cell abundance and biomass, is constrained to the purpose of integrating gene counts into biogeochemical models, whereas most studies using qPCR have a different goal.
In summary, there are several implications of the Meiler et al. (2022) study. First, the study combines errors associated with DNA measures with those of cell biomass. This second source of error, ranging over two orders of magnitude in Meiler et al. (2022), applies not only to qPCR studies but to many modeling applications using many sources of data for cell abundance; it is therefore not specific to qPCR data but relevant to virtually all oceanographic data. The discussion of Meiler et al. (2022) leads to the implication that there is no quantitative value in the spatial patterns observed, yet the extant patterns do agree well with what is known about diazotroph N₂ fixation biogeography. More generally, if gene abundance does not reflect cellular genome content, then many microbial oceanographic studies, not involving modeling, would need to be re-evaluated. In conclusion, the Meiler et al. (2022) study provided important and thought-provoking insights, but it should be noted that qPCR data, of other genes as well as nifH, are quantitative and valuable. We encourage continued use of qPCR enumeration of key genes to gain insights into the biogeography, controls, and ecology of microbes in the oceans.

Fig. 1. Correlation of nifH gene abundances vs. diazotroph cell concentrations from two cruises in 2017 and 2018. Solid lines show a 1 : 1 relationship; dashed lines show the slope from a simple linear regression model with a fixed zero intercept. Regression statistics are provided for each subplot. Gray dotted lines reflect detection limits, corresponding to 50 cells L⁻¹ for 20 mL binned samples (~200 cells L⁻¹) for IFCb measurements and 44-181 nifH genes L⁻¹ for ddPCR measurements (Gradoville et al. 2021). Note that data are plotted on a logarithmic scale but regressions were performed using untransformed data. From Gradoville et al. (2022), with permission.
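The zero-intercept regression named in the caption of Fig. 1 can be sketched as follows. This is an illustration under stated assumptions, not the analysis code of Gradoville et al. (2022). The function name, the choice of cell concentration as the predictor, and the paired values are all hypothetical. For a model y = b·x with no intercept, the least-squares slope is b = Σxy / Σx².

```python
def zero_intercept_slope(x, y):
    """Least-squares slope b for the model y = b * x (intercept fixed at zero)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    sum_xx = sum(xi * xi for xi in x)
    if sum_xx == 0:
        raise ValueError("x must contain at least one nonzero value")
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    return sum_xy / sum_xx


# Hypothetical paired measurements for illustration only.
cells_per_l = [100.0, 1000.0, 10000.0]  # assumed cell concentrations
nifh_per_l = [150.0, 900.0, 11000.0]    # assumed nifH gene copies

slope = zero_intercept_slope(cells_per_l, nifh_per_l)
print(f"estimated nifH copies per cell: {slope:.2f}")
```

Because the fit uses untransformed data, the samples with the highest abundances dominate the sums and therefore the slope, which is worth keeping in mind when the same data are displayed on logarithmic axes.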

Fig. 2. Relationship between Trichodesmium abundance and nifH gene copy number obtained from surface samples on two transects (ATM17, blue, and D361, black) in the tropical and subtropical Atlantic Ocean. Trichodesmium erythraeum IMS101 culture samples are shown in red. The proposed linear relationship is drawn to indicate the relationship between gene copy and cell numbers. Redrawn from Sargent et al. (2016), with permission.