Matching population diversity of rhizobial nodA and legume NFR5 genes in plant–microbe symbiosis

Abstract We hypothesized that population diversities of partners in nitrogen‐fixing rhizobium–legume symbiosis can be matched for “interplaying” genes. We tested this hypothesis using data on nucleotide polymorphism of symbiotic genes encoding two components of the plant–bacteria signaling system: (a) the rhizobial nodA acyltransferase involved in the fatty acid tail decoration of the Nod factor (signaling molecule); (b) the plant NFR5 receptor required for Nod factor binding. We collected three wild‐growing legume species together with soil samples adjacent to the roots from one large 25‐year fallow: Vicia sativa, Lathyrus pratensis, and Trifolium hybridum nodulated by one of the two Rhizobium leguminosarum biovars (viciae and trifolii). For each plant species, we prepared three pools for DNA extraction and further sequencing: the plant pool (30 plant indiv.), the nodule pool (90 nodules), and the soil pool (30 samples). We observed the following statistically significant conclusions: (a) a monotonic relationship between the diversity in the plant NFR5 gene pools and the nodule rhizobial nodA gene pools; (b) higher topological similarity of the NFR5 gene tree with the nodA gene tree of the nodule pool, than with the nodA gene tree of the soil pool. Both nonsynonymous diversity and Tajima's D were increased in the nodule pools compared with the soil pools, consistent with relaxation of negative selection and/or admixture of balancing selection. We propose that the observed genetic concordance between NFR5 gene pools and nodule nodA gene pools arises from the selection of particular genotypes of the nodA gene by the host plant.

, which together with the GFG represent the opposite end of the same continuum of host-parasite specificity (Agrawal & Lively, 2002). These theoretical concepts, initially developed for antagonistic systems, found their reflection also in mutualistic symbiotic systems (Cregan, Sadowsky, & Keyser, 1991;Lewis-Henderson & Djordjevic, 1991;Parker, 1999;Sachs, Essenberg, & Turcotte, 2011;Sadowsky et al., 1991). One of the possible consequences of the above-mentioned concepts is that matching between symbionts could be observed not only on the gene sequence level but also on the population structure level. We proposed that coordinated population diversity of symbionts can be a significant aspect of symbiotic interactions in a row with the difference in evolutionary rates between interacting species as considered in the Red Queen and the Red King dynamics (Bergstrom & Lachmann, 2003;Pal, Maciá, Oliver, Schachar, & Buckling, 2007;Paterson et al., 2010;Van Valen, 1973).
Previous studies have demonstrated the matching population diversities of symbionts. For example, the analysis of the symbiosis between Neorhizobium galegae and its host plant Galega indicated correspondence of population diversity levels between microsymbionts and the host Galega species (Andronov et al., 2003;Österman et al., 2011). In particular, a more genetically diverse Galega orientalis population harbors a more diverse root nodule rhizobial population, while its less diverse sympatric counterpart Galega officinalis forms symbiosis with a less diverse rhizobial population. This observation is related to the well-studied phenomenon of shaping the genetic structure of the rhizobial population through the selection of specific rhizobial genotypes by the host plant (Depret & Laguerre, 2008;Heath & Tiffin, 2007;Laguerre, Louvrier, Allard, & Noelle, 2003;Paffetti et al., 1998). Moreover, it has been shown that the topology of the nodA gene tree follows the corresponding host plant tree more strictly than the 16S rRNA-based rhizobial phylogeny (Dobert, Breil, & Triplett, 1994;Suominen, Roos, Lortet, Paulin, & Lindstro, 2001). Therefore, we expect that the interplay of symbiotic populations leads to concordance between the diversity levels in their symbiotic genes.
In this study, we focused on two symbiotic genes that can be considered interacting as both encode the essential components of the rhizobium-legume signaling system; these are associated with each other through a lipochito-oligosaccharide called Nod factor (NF) (Figure 1). The first component is the rhizobial nodA gene which encodes an acyltransferase enzyme essential in NF biosynthesis, specifically in the attachment of the long-chain fatty acid tail to the oligosaccharide backbone (Dénarié, Debellé, & Promé, 1996;Esseling & Emons, 2004;Oldroyd, 2013). The second component is one of the plant symbiotic receptor genes, NFR5, which is a homologue of LjNFR5, MtNFP, PsSym10 genes. Its product recognizes NFs (signaling molecules) by three extracellular LysM domains and triggers the formation of root nodule primordia giving the green light to the process of bacterial infection (Oldroyd, 2013). The NFs are major determinants of host specificity: rhizobia produce NFs with different structures, and host plants percept only those NFs that have a certain composition (Mergaert & Montagu, 1997). The variation of NFs structure is observed not only between rhizobia species but also at the intra-species level (Spaink, 2000); one rhizobia species produces a mixture of NFs that vary in the fatty tail modifications. As proposed, the nodA product can vary in its fatty acid specificity, thus contributing to the bacterial host range (Dénarié et al., 1996;Moulin, Béna, & Stępkowski, 2004;Ritsema, Wijfjes, Lugtenberg, & Spaink, 1996;Roche et al., 1996). It is logical to assume that the nodA gene diversity in a rhizobial population can reflect the structural variation of NFs produced by this population. Indeed, it has been shown that minor differences in the structure of fatty acids tail can affect intraspecies host specificity (Li et al., 2011). On the host plant side, NFs are recognized by high-affinity legume receptors (Broghammer et al., 2012;Moulin et al., 2004). Studies on the model legumes revealed NFR5 as one of the major receptors to percept NFs (Radutoiu et al., 2007). Mutant analysis showed that single amino acid differences in one domain of the NFR5 receptor change recognition of NF variants (Broghammer et al., 2012;Radutoiu et al., 2007). Such mediation of NFs between rhizobial nodA and legume NFR5 genes make them good candidates for testing the hypothesis that population diversities of partners in nitrogen-fixing rhizobium-legume symbiosis are matched.
We tested the hypothesis on symbiotic systems of three wildgrowing legume species (Vicia sativa, Lathyrus pratensis and Trifolium hybridum) with their rhizobial microsymbionts. Sampling of the experimental material was performed uniformly on the one large natural fallow (more than 25 years) field in order to avoid the influence of ecological factors. We collected 30 plant individuals for each of three species and, for each individual, collected (a) leaves, (b) nodules, and (c) soil samples. After the pooling, we obtained nine samples (3 species × 3 types of materials) each containing aggregated information of 30 samples ( Figure S1). Vicia and Lathyrus species represent the F I G U R E 1 A part of the signal transduction system that governs the rhizobium-legume symbiosis. The rhizobial nodA gene encodes the acyltransferase that participates in the attachment of the hydrophobic long-chain fatty acid tail to the Nod factor core. Plant NFR5 gene encodes the symbiotic receptor recognizing the rhizobial Nod factor followed by symbiosis formation same cross-inoculation group nodulated with Rhizobium leguminosarum bv. viciae strains, while Trifolium belongs to a separate group nodulated with R. leguminosarum bv. trifolii. One of the important traits of the rhizobium-legume symbiosis is the annual cycle of rhizobia, consisting of nodule formation with consequent amplification of rhizobia inside of the nodule, followed by a release of the nodule rhizobia back into the soil after nodule degradation leading to an increase of the frequency of this rhizobial genotype in the soil (Spaink, Kondorosi, & Hooykaas, 1998). Therefore, we analyzed both soil and nodule populations of rhizobia, which affect each other.
Testing the hypothesis of matching population diversities required comparison of structural (topological) characteristics of plant and rhizobia populations. Traditionally, the topological similarity between two populations is estimated as the congruence of two respective labeled trees (Leigh, Lapointe, Lopez, & Bapteste, 2011).
Here, we propose a novel method to compare topologies of two gene trees with unlabeled leaves. The method is based on the gCEED approach (Choi & Gomez, 2009) that translates each population to the Gaussian mixture model in a K-dimensional space. This method can be classified as a kind of beta-diversity metric, which, by analogy with taxonomic (Jost, Chao, & Chazdon, 2011) and phylogenetic methods (e.g., UniFrac, Lozupone, Lladser, Knights, Stombaugh, & Knight, 2011), could be denoted as "topological beta-diversity." We apply it to show that the tree structures are concordant between the two symbiont species.

| Sampling
Three wild-growing legume species (30 samples per species) together with rhizosphere soil-the common vetch V. sativa, the meadow vetchling L. pratensis, and the alsike clover T. hybridum-were uniformly collected from the large natural fallow (more than 25 years) field near the town Vyritsa (Gatchinskii region of Leningradskaya oblast, Russia, 59°24′7.74′′N; 30°15′28.74′′E). All sampled plants had formed nitrogen-fixing symbiotic root nodules that were selected and thoroughly washed. Soil samples were collected from close proximity to the plant roots (1-5 cm). Despite the fact that a host plant has an influence on the rhizobial population within these soil samples, these samples are referred to as soil samples. Three nodules from each plant sample were picked from the main or the closest to the main lateral roots. For each legume species, we prepared three pools for DNA extraction: the plant pool (30 leaf pieces, 0.1 g each), the nodule pool (90 nodules, 3 nodules per plant individual), and the soil pool (30 soil samples, 0.2 g each).

RDACGAGBACRTCTTCRGT-3′)-produces a 217 bp amplicon
product. The reaction conditions in the first and the second round of the PCR consisted of the initial denaturation step at 94°C for 3 min followed by 35 cycles with denaturation at 94°C for 30 s, primer annealing at 50°C for 30 s, and extension at 72°C for 1 min.
The bar-coded PCR products from six nodA libraries (soil and nodule libraries from three plants species) were sequenced with a Roche 454 GS Junior (following the manufacturer's protocols) generating an average of 3,000-4,000 reads per library. All obtained sequences were subjected to filtration by quality (quality score higher than 25), length (longer than 170 bp), and separating into libraries according to barcodes in QIIME. We introduced the term "pool" to designate a set of nodA gene sequences from nodule or soil rhizobia population. We performed the rarefaction analysis of the π nucleotide diversity within rhizobial pools to demonstrate whether the number of resultant sequences was sufficient to estimate the diversity ( Figure S2).
The sequencing data were deposited at the NCBI short read archive under the bioproject number PRJNA297503. The multiple alignments of the remaining sequences within each pool were performed with ClustalW algorithm as implemented in MEGA (Tamura, Stecher, Peterson, Filipski, & Kumar, 2013). Sequences with frameshift errors were removed. The resultant multiple alignment for each pool did not contain gaps.

| Gene trees
The total nodA gene sequences from nodule and soil pools aligned with ClustalW were clustered into operational taxonomic units (OTUs) at 95% nucleotide sequence identity threshold using the UCLUST algorithm implemented in QIIME 1.9. A neighbor-joining (NJ) dendrogram based on the numbers of differences between representatives of each OTU was constructed in MEGA6 and rooted using the outgroup nodA gene sequence of Sinorhizobium meliloti (GenBank ID AZNW01000092.1).

| Diversity analysis
To compare the levels of nucleotide diversity π between plant and nodule rhizobia pools, we constructed the distributions of π statistics for each plant population subsampling with replacement of 70 sequences in 2,000 trials from each plant NFR5 pool that initially contained 100 sequences. Subsequently, we randomly formed pairs of π diversity values from the obtained distributions for the plant pools and the respective nodule pools. The total set of 6,000 pairs (2,000 per plant sp.) was taken as a paired sample data. The relationship between host plants (NFR5 gene) and nodule rhizobial (nodA gene) population diversity was assessed as the value of Spearman's rank correlation coefficient for the paired sample data. The values higher than 0.8 were taken to imply monotonic relationship in π between plant and nodule rhizobial pools.
All calculations were implemented in MATLAB. The link to the GitHub repository containing MATLAB scripts for the diversity analysis is provided in the Supporting information.

| Statistical methods for detecting selection
The dN/dS ratio is a widely used measure to quantify the selection selection in the other pool. We hypothesized that the pN value in a soil rhizobial pool is greater than or equal to the pN value in the respective nodule pool. We tested this hypothesis for each legume species separately, performing Welch's t test considering 0.01 level of significance as described above.
Tajima's D statistic represents the difference between the observed and the theoretically expected nucleotide diversity (Tajima, 1989). If the mutations are neutral and the population adheres to the Wright-Fisher assumptions (Hartl & Clark, 2007), Tajima's D equals zero. Values significantly higher than zero indicate the deficit of rare alleles (e.g., due to a recent decrease in population size or balancing selection), and values significantly lower than zero indicate the excess of rare alleles (e.g., due to a recent population size expansion or purifying selection).
In order to identify the selection type underlying the transformation of a soil pool to the respective nodule pool, we compared the values of Tajima's D between the pools. We considered the hypothesis that the Tajima's D in the soil pool is higher or equal to that in nodule pool. We tested this hypothesis using Welch's t test at 0.01 level of significance.
The calculations were performed using MATLAB PGEToolbox (Population Genetics Evolution Toolbox). The link to the GitHub repository containing MATLAB scripts for the analysis of selection is provided in the Supporting information. Let D p i,j be the symmetric distance matrix between each pair of haplotypes in a population p, i = 1,n p . At the second step, the hierarchical agglomerative clustering method merges haplotypes into clusters until the number of clusters equals to a predefined number m. We fix a population p and omit this index. The clustering algorithm starts with placing each haplotype in its own cluster that is described by two parameters: frequency f i and mean difference σ i (initialized to zero). Then, the following procedure is repeated until exactly m clusters occur. Two clusters, the i′-th and j′-th with the smallest pairwise distance

| Topological organization of diversity in plant and rhizobia pools
, are merged into one new cluster. A distance from this cluster to a k-th cluster is calculated as follows: .
The frequency of the new cluster is f(i) + f(j) and the mean difference of the new cluster is D i ′ ,j ′.

| nodA gene: OTUs and cluster analysis
NodA gene libraries (for root nodules and for soil samples) were The NJ tree constructed for OTU representatives showed that the rhizobium population consisted of two clusters (orange and blue in

| Relationship between the bacteria and host plant diversities
We calculated the diversity levels π in each plant population and found that the ranking of the three species was the same as the ranking based on the π nucleotide diversity values in corresponding nodule pools. By bootstrapping the plant and rhizobial nodule pools, we estimated the Spearman correlation between nucleotide diversities in these pools. The 0.89 value, which was higher than the predefined threshold, indicated that the monotonic relationship between diversities in pools of plant NFR5 gene sequences and in bacterial nodA nodule pools is statistically significant (Figure 3).

| Concordance of gene trees
A visual comparison of topologies of plant NFR5 trees with those of corresponding rhizobial nodA trees for nodule and soil pools revealed that the topology of clades in the plant trees was more similar to the topology of clades in the nodule trees than in the soil trees ( Figure S4, Appendix S1). Based on this observation, we proposed that the gene tree of the nodule rhizobia is more similar to that of the host plant than the gene tree of the soil rhizobia.
To formally test this, we developed a method for comparing structures (topologies) between two pools. We tested the null hypothesis that the topological similarity between a nodule rhizobia pool and a plant pool is not higher than the topological similarity between the soil rhizobia pool and the plant pool. This hypothesis was rejected at the 0.01 level of significance. We constructed tanglegrams that also illustrated a higher similarity of the topology of the nodule rhizobial nodA gene trees (Figure 4, left tanglegram) than the soil rhizobial nodA gene trees (Figure 4, right tanglegram) to that of plant NFR5 gene tree. The statistics ∆G, which reflects the topological difference between structures of plant and bacteria populations, can be referred as to "topological beta-diversity." The obtained tanglegrams demonstrated that nodule pools contained clades of genotypes that were not detected in the corresponding soil pools (Figure 4). The presence of these clades can be a result of host plant selection of rare soil genotypes and is responsible for the increased topological similarity between NFR5 gene pools with nodule nodA gene pools.

| Analysis of selection
Analysis of population structures revealed the significant linkage between synonymous and nonsynonymous polymorphic sites, suggesting that the pN and pS statistics were not independent (see

Appendix S2). Further comparison of pN/pS values between nodule
and soil pools did not establish the significant difference ( Figure S3).

| D ISCUSS I ON
Symbiotic interactions represent a special case of ecological interactions when one of the partners provides an "environment" for another. In On the Origin of Species (Darwin, 1872), Charles Darwin proposed that "the life of each species depends in a more important manner on the presence of other already defined organic forms, than on climate." This is particularly true for organisms in deeply integrated symbiotic systems.
Here, we traced the coordination in levels of population diversities between partners within the essential components of the rhizobium-legume signaling system: plant symbiotic receptor gene  (Table 1). This may be indicative of weaker negative selection in a nodule gene pool in a comparison with the respective soil gene pool, but is also consistent with a contribution of balancing selection or presence of stronger population structure in the former. In the previous study, it was shown that in symbiotic systems, besides the above-mentioned types of selection, negative frequency-dependent selection in favor of rare genotypes during the competition of rhizospheric bacteria for root nodulation (Amarger & Lobreau, 1982) may also play an important role (Andronov, Igolkina, Kimeklis, Vorobyov, & Provorov, 2015;Provorov & Vorobyov, 2000 2005; Depret & Laguerre, 2008;Österman et al., 2011;Paffetti et al., 1998;Rangin, Brunel, Cleyt-Marel, Perrineau, & Gilles, 2008;Vuong, Thrall, & Barrett, 2016). Finally, there are some suggestions on the molecular mechanism involved in this process. In our recent study, we modeled the 3D sandwich-like structures of the NFR5-K1 heterodimeric receptor with its ligand Nod factor and observed the mutually polymorphic areas in the contact zone between NFR5 and K1 that were overlapped with known structural variation of Nod factor (in the fatty acid part) produced by R. leguminosarum bv. viciae (Igolkina, Porozov, Chizhevskaya, & Andronov, 2018).
These results demonstrate the possible specificity of host plant receptors to variations in NF structure likely resulting in matching population diversities between nodA and NFR5 gene pools.
The observed matching between nodule rhizobial nodA gene pools and plant NFR5 receptor gene pools revealed the hierarchical organization of effective interaction: two symbionts should be genetically compatible at the single organism level and also at the population level. The process of forming this interaction could be explained metaphorically as an evolutionary molding: shaping the population structure of one symbiont using the population structure of another symbiont as a "matrix." The important point in this shaping is the difference between evolutionary rates in plants and bacteria. The bacteria have a significantly higher evolutionary rate than plants; therefore, the diversity of the nodA gene in bacterial populations, like the flexible genetic material in the evolutionary molding, reflected the shape of more "rigid" diversity of the NFR5 receptor gene in plant populations. We hypothesize that under the evolutionary molding effect, two symbiotic populations tend to relax the incoordination of genetic diversities between two parts of the symbiont-host signaling system, which is mostly achieved by a faster evolving partner, rhizobia in our case. We proposed that according to the effect described, the relationship in population diversity between rhizobia and host plant may be observed not only within the pair of nodA-NFR5 genes (which are related through the NF) but also within any pair of interplaying genes from plant and bacterial sides, and that genome-wide scanning for "matching" genes can be an extension to the traditional methods of functional analysis of genes.

CO N FLI C T O F I NTE R E S T
None declared.