Low base‐substitution mutation rate and predominance of insertion‐deletion events in the acidophilic bacterium Acidobacterium capsulatum

Abstract Analyses of spontaneous mutation have shown that total genome‐wide mutation rates are quantitatively similar for most prokaryotic organisms. However, this view is mainly based on organisms that grow best around neutral pH values (6.0–8.0). In particular, the whole‐genome mutation rate has not been determined for an acidophilic organism. Here, we have determined the genome‐wide rate of spontaneous mutation in the acidophilic Acidobacterium capsulatum using a direct and unbiased method: a mutation‐accumulation experiment followed by whole‐genome sequencing. Evaluation of 69 mutation accumulation lines of A. capsulatum after an average of ~2900 cell divisions yielded a base‐substitution mutation rate of 1.22 × 10⁻¹⁰ per site per generation or 4 × 10⁻⁴ per genome per generation, which is significantly lower than the consensus value (2.5–4.6 × 10⁻³) of mesothermophilic (~15–40°C) and neutrophilic (pH 6–8) prokaryotic organisms. However, the insertion‐deletion rate (0.43 × 10⁻¹⁰ per site per generation) is high relative to the base‐substitution mutation rate. Organisms with a similar effective population size and a similar expected effect of genetic drift should have similar mutation rates. Because selection operates on the total mutation rate, it is suggested that the relatively high insertion‐deletion rate may be balanced by a low base‐substitution rate in A. capsulatum.

, much less is known about the effects of acidic pH on the genome-wide spontaneous mutation rate in bacteria.
Microorganisms have evolved to grow in different ranges of environmental pH, from pH 0 to above pH 13 (Nordstrom et al., 2000; Roadcap et al., 2006; Slonczewski et al., 2009). The external pH has an effect on intracellular pH, which affects all biochemical activities and the structure and stability of nucleic acids and many other biological molecules. Thus, all biological processes depend on pH, and in any cell the intracellular pH must be maintained at a specific and constant value (within a narrower range than external pH and usually close to neutrality; Slonczewski et al., 2009). For example, under optimal growth conditions (pH = 3.5-3.7 and 75°C), the thermoacidophilic archaeon Sulfolobus acidocaldarius maintains its intracellular pH around 6.0, with a high pH homeostasis capacity at external acidic pH. Acidophilic microorganisms also have a number of adaptations that allow them to survive in acidic environmental conditions. For example, proteins may have an increased negative surface charge that stabilizes them at low pH (Baker-Austin & Dopson, 2007; Xia & Palidwor, 2005). Therefore, in order to understand how environmental factors (temperature, pH, etc.) and intrinsic mechanisms (DNA replication and repair) cooperate and determine the genome-wide mutation rate and spectrum across the tree of life, it is necessary to expand experimental assays to species living in extreme environments.
In this study, we performed a mutation accumulation (MA) experiment combined with whole-genome sequencing (WGS) on Acidobacterium capsulatum, a member of the phylum Acidobacteria. Members of the Acidobacteria are abundantly distributed worldwide in soil habitats with different physical and biogeochemical characteristics (Janssen, 2006; Tringe et al., 2005), and can represent 20% of the microbial community across diverse soil environments (Janssen, 2006). Thus, it is assumed that Acidobacteria are genetically and metabolically diverse, and that they have a significant role in biogeochemical processes because of their ubiquity and abundance in various ecosystems (Barns et al., 1999). The first recognized species of Acidobacteria was A. capsulatum, isolated from an acid-mine drainage in Japan (Kishimoto et al., 1991). A. capsulatum grows best at a pH of 3.0-6.0 (optimum 5.0) but does not grow at a pH below 3 or above 6.5. The genome sequence of A. capsulatum is 4.13 Mb in length with 60.5% GC content (Ward et al., 2009). By extending previous mutational analyses to the previously unexplored A. capsulatum, an organism with unusual environmental requirements, this work will enhance the understanding of how different genetic and environmental backgrounds contribute to the mutation process.

| Mutation accumulation
To estimate the mutation rate in A. capsulatum (ATCC 51196), 80 independent MA lines were initiated from a single colony of A. capsulatum. The recommended ATCC agar medium 1168 was used for the mutation-accumulation line transfers. All lines were incubated at 30°C under aerobic conditions. Every week, a single colony from each line was transferred onto a fresh plate, minimizing the effective population size. This bottlenecking procedure ensures that mutations accumulate in an effectively neutral fashion (Kibota & Lynch, 1996). Every month, single colonies from 10 randomly selected MA lines were used to count colony forming units (CFU), and the mean number of generations (n) was estimated by n = log₂(CFU). The total
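The generation estimate n = log₂(CFU) can be illustrated with a minimal sketch. The function names, the choice to average the per-colony estimates, and the `n_transfers` argument are assumptions made for illustration only; the source does not state how the per-transfer estimates were combined into a total.

```python
import math

def generations_per_transfer(cfu):
    """Cell divisions needed for one cell to grow into a colony of `cfu` cells."""
    return math.log2(cfu)

def total_generations(cfu_counts, n_transfers):
    """Mean per-transfer generations across sampled colonies, times the
    number of transfers (hypothetical way of accumulating the total)."""
    mean_n = sum(generations_per_transfer(c) for c in cfu_counts) / len(cfu_counts)
    return mean_n * n_transfers

# Illustrative values only, not measurements from the study.
example_total = total_generations([2**20, 2**22], n_transfers=10)
```

A colony of 2²⁰ cells corresponds to 20 divisions, so the two example colonies average 21 generations per transfer.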

| Mutation analyses
The median depth of coverage for the 69 MA lines was about 139×, and >85% of the genomic sites were covered with reads in all sequenced lines (Supporting Information File S1). All MA lines have at least 20× depth of coverage and no cross-line contamination. Adaptors of paired-end reads were removed with Trimmomatic 0.32 (Bolger et al., 2014), and the trimmed reads were mapped to the reference genome (NCBI accession number: NC_012483.1) using BWA 0.7.12. SAMtools 1.3.1 was then used to convert the SAM files to BAM format. Duplicate reads were removed using picardtools-1.141, and reads were realigned around insertion-deletions using GATK 3.5, before performing SNP and indel discovery with standard Van der Auwera et al., 2013) and only unique mutations were included. Base-pair substitutions and small indels were called using the HaplotypeCaller in GATK. Perl scripts were used to detect variants located in SSRs (https://cci-git.uncc.edu/wsung/ssrsearch). We also tested the mutation calls by using breseq 0.32.0 (Deatherage & Barrick, 2014). Read alignment for all mutation sites was validated visually with the Integrative Genomics Viewer v.2.3.5 (Thorvaldsdottir et al., 2013). More than 99% of the reads in a line were required to support the same nucleotide for it to be called the line-specific consensus at a candidate site; the 1% allowance accommodates aberrant reads originating from sequencing errors, impure indices during library construction, or barcode degeneracy during sequence demultiplexing.
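The >99% consensus criterion can be sketched as a simple per-site filter. This is a minimal illustration, not the pipeline used in the study: the function name `consensus_base` and its input (the bases observed in the reads covering one site of one line) are hypothetical.

```python
from collections import Counter

def consensus_base(read_bases, min_fraction=0.99):
    """Return the line-specific consensus base at a site, or None when no
    single base is supported by more than `min_fraction` of the reads."""
    counts = Counter(read_bases)
    if not counts:
        return None  # no coverage at this site
    base, n = counts.most_common(1)[0]
    return base if n / len(read_bases) > min_fraction else None

# 199 of 200 reads agree (99.5%), so the site passes the threshold.
passing = consensus_base("A" * 199 + "C")
# 95 of 100 reads agree (95%), so no consensus is called.
failing = consensus_base("A" * 95 + "C" * 5)
```

Sites that return None would be excluded from mutation calling under this criterion.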

| Calculations and statistics
The pooled mean mutation rate (μ) was calculated with the formula  (Table S1). The number of base-substitution mutations detected by the breseq pipeline was 97, and 84.5% of these mutations were also identified with the GATK method (Table S2). The base-pair substitution mutation rate from the breseq method is 1.36 × 10⁻¹⁰ per site per generation (95% confidence interval: 1.10 × 10⁻¹⁰, 1.67 × 10⁻¹⁰), which is not significantly different from the value calculated using GATK.
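The confidence interval quoted for the breseq estimate is consistent with an exact Poisson interval on the mutation count, rescaled to the rate. The sketch below assumes that approach (the text here does not spell out the method) and uses only the two numbers given above, 97 mutations and a rate of 1.36 × 10⁻¹⁰; the function names are hypothetical.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam); each term is computed in log space."""
    return sum(math.exp(i * math.log(lam) - lam - math.lgamma(i + 1))
               for i in range(k + 1))

def _solve_decreasing(f, target, lo, hi, iters=200):
    """Bisection for f(x) = target, where f is decreasing on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_poisson_ci(k, alpha=0.05):
    """Exact two-sided interval for a Poisson mean given k >= 1 events."""
    lower = _solve_decreasing(lambda lam: poisson_cdf(k - 1, lam),
                              1 - alpha / 2, 1e-9, float(k))
    upper = _solve_decreasing(lambda lam: poisson_cdf(k, lam),
                              alpha / 2, float(k), k + 10 * math.sqrt(k) + 10)
    return lower, upper

k = 97             # base substitutions detected by breseq
rate = 1.36e-10    # per site per generation, as reported
lo, hi = exact_poisson_ci(k)
# Rescale the interval on the count to an interval on the rate.
rate_lo, rate_hi = rate * lo / k, rate * hi / k
```

Under these assumptions the rescaled bounds land close to the reported interval; small differences are expected because the reported rate is rounded.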
Using the annotated A. capsulatum genome (NCBI accession:   (Table S1), which is lower than the G/C→A/T rate (95% Poisson confidence intervals: G/C→A/T, 0.86–1.53 × 10⁻¹⁰; A/T→G/C, 0.43–1.1 × 10⁻¹⁰). Given these conditional A/T↔G/C mutation rates, the expected genomic GC content if mutation alone is the driving process is 38%, significantly lower than the actual chromosomal GC content of 60.5%. Our results are consistent with a hypothesis of near-universal mutation bias toward A/T countered by selection for GC content (Hershberg & Petrov, 2010; Long et al., 2018).
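The expected GC content under mutation pressure alone can be sketched with the usual two-state equilibrium, GC = v / (u + v), where u is the conditional G/C→A/T rate and v the conditional A/T→G/C rate. That this is the formula used in the study is an assumption. The example rates below are hypothetical: they lie inside the reported confidence intervals and were chosen only so that the ratio reproduces 38%; they are not the point estimates from Table S1.

```python
def equilibrium_gc(at_to_gc, gc_to_at):
    """Equilibrium GC fraction when only mutation acts: v / (u + v)."""
    return at_to_gc / (at_to_gc + gc_to_at)

# Hypothetical conditional rates, in units of 1e-10 per site per generation.
v = 0.613  # A/T -> G/C
u = 1.0    # G/C -> A/T
expected_gc = equilibrium_gc(v, u)
```

An expected GC content of 38% implies that A/T→G/C changes arise at roughly 0.61 times the G/C→A/T rate, since 0.38 / 0.62 ≈ 0.61.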
Though A. capsulatum has a low base-substitution mutation rate, its indel rate is higher than those reported in previous analyses of genome-wide spontaneous mutation in prokaryotes (Long et al., 2018; Sung et al., 2016). While indel mutations range from 1.8% to 11.9% of total mutations in most organisms, we found that >26% of total mutations are indels in A. capsulatum. To understand the high indel rate observed in this organism, we further examined simple sequence repeats (SSRs), which are well known as mutational hotspots. The A. capsulatum genome has 2474 SSRs, located mainly in coding regions (92.4%). These regions cover 0.98% of the genome, similar to the proportion found in other prokaryotic genomes (Mrazek et al., 2007). We found that 38.7% (12/31) of the small indels occur in SSRs in A. capsulatum (Table S4), and that 8 of these 12 indels are in both SSRs and coding regions. We then surveyed bacterial genomes for a relationship between SSR abundance and the indel rate. However, we could not find such a relationship, except that the A. capsulatum indel rate does not differ from that of Staphylococcus aureus, which has the same SSR percentage (Figure 1).
FIGURE 1 (a) Base-substitution mutation rate and (b) insertion-deletion rate plotted against total haploid genome size. (c) Base-substitution mutation rate and (d) insertion-deletion rate plotted against the abundance of simple sequence repeats within the genome (% SSR). Mutation rates are per site per generation.
share two mutational characteristics. First, they have remarkably similar genome-wide base-substitution mutation rates ranging from 0.0025 to 0.0046 per genome per generation (Drake, 1991; Long et al., 2018; Lynch et al., 2016; Strauss et al., 2017).
Second, insertion-deletion mutations range from 1.8% to 11.9% of total mutations in an organism. Interestingly, we found a low genome-wide base-substitution rate of 0.0004 per genome per generation in A. capsulatum, compared to rates estimated in neutrophilic prokaryotes (those that commonly need pH 6-8 for optimum activity), and a relatively high indel rate compared to the base-substitution rate. The observed low base-substitution rate suggests that either A. capsulatum replication fidelity is higher than in other prokaryotes or alternative biochemical repair mechanisms are used to maintain a low mutation rate. However, the relatively high indel rate may be a consequence of differences in the underlying molecular mechanisms that generate and repair base substitutions and indels (Kunkel, 2009; Sung et al., 2015). While indels mainly derive from strand slippage or double-strand breaks, and are often repaired by nucleotide-excision repair, base substitutions mostly result from base misincorporation or biochemical alteration, and are primarily reversed by alkyl transferases or base-excision repair (Morita et al., 2010).
According to the drift barrier hypothesis (Sung et al., 2012), the efficacy of selection in reducing the mutation rate is determined by the power of random genetic drift, which is inversely proportional to the effective population size. Thus, different bacterial species are expected to have roughly similar per-genome mutation rates if they have similar effective population sizes. Consistent with this view, it is possible that the relatively low base-substitution mutation rate in A. capsulatum is balanced by a relatively high insertion-deletion rate, as selection operates primarily on the total genome-wide mutation rate, and less so on the detailed molecular spectrum of mutations (Sung et al., 2012, 2016; Figure 1). While this work contributes to our understanding of mutation rates and spectra, and how these factors may differ among organisms, further studies with other extremophiles will help provide a deeper understanding of how mutational processes are shaped by intrinsic and extrinsic conditions.
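The balance between the two mutation classes can be made concrete with the per-site rates reported in the abstract. The sketch below simply adds them and computes the indel share; treating the total rate as the plain sum of the two classes is an assumption made for illustration.

```python
# Per-site, per-generation rates reported for A. capsulatum.
base_substitution_rate = 1.22e-10
indel_rate = 0.43e-10

# Total rate on which selection is proposed to act (assumed to be the sum).
total_rate = base_substitution_rate + indel_rate

# Fraction of all mutations that are indels.
indel_share = indel_rate / total_rate
```

The resulting share is just above 26%, in line with the >26% indel fraction stated in the text and well above the 1.8% to 11.9% range cited for other organisms.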

This study is funded by a Multidisciplinary University Research
Initiative award (W911NF-09-1-0444) from the US Army Research Office and National Institutes of Health awards (R01GM036827 and R35-GM122566). This research was supported in part by Lilly

CONFLICTS OF INTEREST
No conflicts of interest.

DATA AVAILABILITY STATEMENT
Raw Illumina sequence data reported in this study has been de-