Caught in the Act: Variation in plastid genome inverted repeat expansion within and between populations of Medicago minima

Abstract
The inverted repeat (IR) lacking clade (IRLC) is a monophyletic group within the Papilionoideae subfamily of Fabaceae whose plastid genomes (plastomes) do not contain the large IR typical of land plants. Recently, an IRLC legume, Medicago minima, was found to have regrown a ~9 kb IR that contains a number of canonical IR genes, and the closely related M. lupulina was found to contain an incomplete IR of ~425 bp. Complete plastomes were generated for seven additional species, putative members of the M. minima clade. Polymerase chain reaction (PCR) was employed to investigate the presence of the IR across M. minima and M. lupulina, including individuals from nine and eight Eurasian and North African accessions and from 15 and 14 Texas populations, respectively. While no sequence similar to the ~9 kb IR was detected among the seven newly sequenced plastomes, all Eurasian and North African accessions of M. minima contained the IR. Variation in IR extent was detected within and between the Texas populations. Expansions of 13 bp and 11 bp occurred at the boundaries between the IR and the small single‐copy (SSC) region; populations had one or the other expansion, but not both. Expansion of the IR was not detected in the accessions from Eurasia and North Africa, suggesting that recent mutations yielded at least two additional plastid haplotypes in M. minima.

[…] to ~88 kb. Likewise, some of the smallest plastomes among autotrophic plants lack the IR entirely (Blazier et al., 2016; Ruhlman et al., 2017; Sabir et al., 2014; Sanderson et al., 2015) or contain highly reduced remnants of the ancestral repeat (Guisinger et al., 2011). In at least two cases, IR reemergence has been documented within lineages in which the IR was lost.
The highly unusual long-branch clade (LBC) species of Erodium exhibit novel IR structures ranging from 25 kb to more than 47 kb, concomitant with plastome expansion (Blazier et al., 2016). Although it remains unclear whether Erodium experienced two independent IR losses or a single shared loss followed by reemergence exclusively in the LBC, the case in one clade of legumes is clear.
Loss of the IR is the predominant character that defines the IR lacking clade (IRLC; Wojciechowski et al., 2004) of papilionoid legumes. Since the earliest example of IR loss was uncovered (Kolodner & Tewari, 1979), direct DNA sequencing and, more recently, next-generation high-throughput sequencing have facilitated analyses that supported the monophyly of the IRLC (McMahon & Sanderson, 2006; Schwarz et al., 2017; Wojciechowski et al., 2000). The genus Medicago contains about 87 described species (Small, 2011) and belongs to the tribe Trifolieae, which is nested within the IRLC (Cardoso et al., 2015; The Legume Phylogeny Working Group, 2017). The best-known Medicago species is the food and forage crop alfalfa (M. sativa ssp. sativa). Also notable is the plant research model M. truncatula, which has high reproductive rates and is amenable to genetic manipulation. Medicago plastomes were not considered highly rearranged (Cai et al., 2008; Sveinsson & Cronk, 2016); however, this assumption was based on a single plastome, that of M. truncatula. Expanded sampling of 19 species representing all the major clades in the genus found only modest divergence in overall structural organization among congeners (Choi et al., 2019). However, one clade contained unique variation in repeat structure, including an incomplete IR in M. lupulina and a novel IR of ~9 kb in M. minima. The sequences that flank the novel IR differ from the canonical boundary sequences (Mower & Vickrey, 2018; Yamada, 1991), yet the IR of M. minima contains a portion of the ribosomal operon (rrn23 … trnN-GUU), with the remaining IR core genes (trnV-GAC … trnA-UGC) and other common IR sequences lying upstream of IRb (Choi et al., 2019).
The previous study included single individuals of just three taxa in the M. minima clade. To examine the evolution of IR reappearance in the clade, plastomes were completed for seven taxa that phylogenetic studies suggest are close relatives (see Small, 2011).
Additionally, populations of M. minima and M. lupulina from across the native range, including North Africa and Central Asia, as well as field-collected populations from across Texas, were assayed for the presence of the novel IR and for variation in its extent. Questions about the role of the IR in plastome recombination and replication have persisted since its discovery. Investigating variation in IR presence and extent within and between populations and closely related species may clarify the role of the IR and the mechanisms of plastome maintenance.

| Additional taxon selection and plastome sequencing
Phylogenetic studies (Small, 2011) suggested that the monophyletic group that includes M. minima contains additional taxa that were not sampled in the previous analyses. Accessions of seven taxa were acquired from USDA-GRIN (U.S. Department of Agriculture Germplasm Resources Information Network) (Table S1). Seeds were germinated and grown in the greenhouse at UT Austin, and emergent leaves were collected from single individuals in liquid nitrogen. All DNA protocols, including isolation, sequencing, assembly, and annotation of plastomes, followed Choi et al. (2019). Completed plastome sequences were submitted to NCBI, and GenBank accession numbers are given in Table S1. Repeat content of the newly sequenced plastomes was estimated following Choi et al. (2019). Gene sequences were extracted from the new taxa and combined with the phylogenetic data set from the previous analysis (Choi et al., 2019). All shared protein-coding sequences (69; see Table S2 and Choi et al., 2019) for 27 Medicago taxa and eight outgroups were used to infer relationships. All phylogenetic methods followed Choi et al. (2019).
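The phylogenetic workflow itself is deferred to Choi et al. (2019). Assuming, as the phrase "all shared protein-coding sequences" suggests but does not state, that the shared genes were concatenated, a minimal and hypothetical Python sketch of that generic step is shown below: it keeps only genes present in every taxon, joins their alignments into one supermatrix, and records per-gene partition coordinates. It is not the authors' pipeline; it assumes each gene is already aligned, and the function names, taxon labels, and toy sequences are illustrative only.

```python
from typing import Dict, List, Tuple

# taxon -> gene -> aligned sequence (all copies of a gene share one length)
Alignments = Dict[str, Dict[str, str]]


def shared_genes(data: Alignments) -> List[str]:
    """Genes present in every taxon, sorted for a reproducible column order."""
    gene_sets = [set(genes) for genes in data.values()]
    return sorted(set.intersection(*gene_sets)) if gene_sets else []


def concatenate(data: Alignments) -> Tuple[Dict[str, str], List[Tuple[str, int, int]]]:
    """Concatenate shared genes into one supermatrix.

    Returns each taxon's concatenated sequence and (gene, start, end)
    partitions using 1-based, inclusive coordinates.
    """
    genes = shared_genes(data)
    matrix = {taxon: "" for taxon in data}
    partitions = []
    start = 1
    for gene in genes:
        lengths = {len(data[taxon][gene]) for taxon in data}
        if len(lengths) != 1:
            raise ValueError(f"{gene} is not aligned: lengths {sorted(lengths)}")
        length = lengths.pop()
        for taxon in data:
            matrix[taxon] += data[taxon][gene]
        partitions.append((gene, start, start + length - 1))
        start += length
    return matrix, partitions


# Toy input; taxon labels and sequences are hypothetical.
toy = {
    "M_minima": {"rbcL": "ATG-C", "matK": "AAT", "ycf1": "GG"},
    "M_lupulina": {"rbcL": "ATGGC", "matK": "AAC"},
}
matrix, partitions = concatenate(toy)
print(partitions)  # [('matK', 1, 3), ('rbcL', 4, 8)]
```

In the toy input, ycf1 is dropped because one taxon lacks it, which is how a "shared genes only" criterion shrinks the gene set as taxa are added.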

| PCR screen of Medicago minima and M. lupulina accessions
Nine accessions of Medicago minima and eight of M. lupulina were acquired from USDA-GRIN (Table S2). […] See Table S3 for accessions, locations, and voucher information.
Individuals representing all included accessions, both those from USDA and those collected in the field, were vouchered and deposited at TEX-LL.
A PCR screen was applied to all accessions using primers from Choi et al. (2019) to amplify IR junction sites in M. minima and to assess a small inverted repeat of interest identified previously in M. lupulina. Initial screening of USDA accessions […] (Katoh & Standley, 2013), implemented in Geneious, using the default settings. GenBank accession numbers for sequenced amplification products are provided in Table S4.

| Sequencing and phylogenetic inference for seven new Medicago plastomes
Complete plastomes were assembled for seven Medicago taxa.
Sequencing statistics and overall plastome characteristics are reported in Table 1, while Table S1 […] Table S2), all contained the ~9 kb IR identified in the originally sequenced plastome (Choi et al., 2019) and were identical with respect to the IR/SC boundaries. While no variation was detected in the position of the IR/LSC junction, initial screens of the Texas populations did reveal variation in the junction between the IR and the SSC region (Table S3).

| PCR screening for IR distribution and extent
The remaining seven individuals from each Texas population were screened using primers to amplify the variable junctions at both IRA/SSC (JSA) and IRB/SSC (JSB).
The sequences at JSA and JSB were polymorphic both within and between the Texas populations. Eight of the 15 sampled populations were polymorphic for expansions at either JSA or JSB, but not both, yielding three plastome IR haplotypes: type O, representing the unexpanded IR identified in the originally sequenced plastome of M. minima (Choi et al., 2019); type A, derived from expansion that initiated at JSB; and type B, derived from expansion that initiated at JSA (Figure 3). The haplotype designation "A" or "B" indicates the IR junction at which single-copy sequence was overwritten, resulting in expansion of the IR. Figure 2 and Table S3 summarize the geographic distribution of IR boundary variation among the Texas populations.
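To make the population-level tally concrete, the hypothetical Python sketch below summarizes per-individual haplotype calls (O, A, or B) within each population. It assumes, as an interpretation of the text, that a population counts as polymorphic when more than one haplotype is observed among its screened individuals, and it flags any population carrying both expanded haplotypes, the pattern the study did not observe. Population names and calls are invented for illustration and are not the study's data.

```python
from collections import Counter
from typing import Dict, List

VALID = {"O", "A", "B"}  # haplotype labels as defined in the text


def summarize_populations(calls: Dict[str, List[str]]) -> Dict[str, dict]:
    """Tally per-individual IR haplotype calls within each population."""
    summary = {}
    for population, haplotypes in calls.items():
        counts = Counter(haplotypes)
        unknown = set(counts) - VALID
        if unknown:
            raise ValueError(f"{population}: unexpected labels {sorted(unknown)}")
        summary[population] = {
            "counts": dict(counts),
            # Polymorphic (assumed meaning): more than one haplotype observed.
            "polymorphic": len(counts) > 1,
            # True only if both expanded haplotypes occur in one population.
            "both_expansions": {"A", "B"} <= set(counts),
        }
    return summary


# Hypothetical calls for three unnamed populations (not the study's data).
calls = {
    "pop1": ["O", "O", "A", "O"],
    "pop2": ["O", "O", "O"],
    "pop3": ["B", "O", "B"],
}
summary = summarize_populations(calls)
n_poly = sum(s["polymorphic"] for s in summary.values())
print(f"{n_poly} of {len(summary)} populations polymorphic")  # 2 of 3 populations polymorphic
```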
While both polymorphisms extended the IR into the SSC, the less common was the expansion yielding the A haplotype. This expansion, which was identified in a single population, appeared to add 13 bp of single-copy sequence to the IR. The expansion includes 10 bp upstream of JSB that were single copy in M. minima plastomes lacking the expansion. Five base pairs were inserted at JSA (TTTAT), and five base pairs of formerly single-copy sequence have likely undergone gene conversion (ATAGA ➔ TATGA), homogenizing the two repeats. Coincidentally, extension of the IR places both SSC junctions adjacent to an existing three-base sequence (TGG) that is present near both ends of the SSC in M. minima plastomes that contain unexpanded IRs. This three-base sequence does not appear to have been duplicated through IR expansion, but given its identical sequence at both junctions, it may be considered part of the IR of the A haplotype. The polymorphic expansion yielding haplotype B added 11 bp to the M. minima IR relative to the unexpanded IR (Figure 3).
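The boundary shifts described above amount to the IR absorbing single-copy bases whose reverse complement sits at the opposite junction. A minimal Python sketch of that criterion follows, assuming a linear toy segment laid out as IRb, SSC, IRa with half-open coordinates: it extends the repeat pair one base at a time while the base after IRb remains complementary to the base before IRa. The function names and toy sequence are hypothetical; the study determined junctions from sequenced PCR amplicons and plastome assemblies, not with this code.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extend_ir_into_ssc(seq: str, irb: tuple, ira: tuple) -> tuple:
    """Extend an IR pair into the SSC lying between IRb and IRa.

    irb and ira are half-open (start, end) coordinates with IRb upstream of
    IRa and the SSC in between. Returns the new IRb end, the new IRa start,
    and the number of single-copy bases absorbed into each repeat copy.
    """
    b_start, b_end = irb
    a_start, a_end = ira
    if seq[b_start:b_end] != revcomp(seq[a_start:a_end]):
        raise ValueError("input coordinates do not delimit an inverted repeat")
    added = 0
    # The base just after IRb pairs with the base just before IRa. Keep
    # absorbing SSC bases while they stay complementary and distinct.
    while b_end < a_start - 1 and seq[b_end] == revcomp(seq[a_start - 1]):
        b_end += 1
        a_start -= 1
        added += 1
    return b_end, a_start, added


# Toy segment (hypothetical): IRb "AAGGC" | SSC "TGACGACA" | IRa "GCCTT"
segment = "AAGGC" + "TGACGACA" + "GCCTT"
print(extend_ir_into_ssc(segment, (0, 5), (13, 18)))  # (7, 11, 2)
```

Under this criterion, running the function on an unexpanded and an expanded copy of the same region and comparing the returned counts would give the size of an expansion.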

| DISCUSSION
Plastome transmission is primarily uniparental, and predominantly maternal in most groups, which supports the assumption that plastomes will be uniform among individuals owing to the lack of sexual recombination (Birky, 2001; Greiner et al., 2015). In biparental lineages, which inherit their plastomes from both the pollen and seed parents, the potential for plastome haplotype diversity is greatly increased.
Several studies have investigated the mode of organelle inheritance in different species of Medicago. Many efforts have been dedicated to characterizing alfalfa (Medicago sativa ssp. sativa) and the closely allied taxa of the M. sativa complex, which contains both tetraploid (2n = 4x = 32) and diploid (2n = 2x = 16) genotypes (Quiros & Bauchan, 1988) […] in larger individuals and increases amenability to hybrid production. Inheritance modes have been identified as predominantly uniparental paternal, biparental, or uniparental maternal, depending on a number of variables (Lee et al., 1988; Nagata et al., 1999; Schumann & Hancock, 1989; Smith, 1989; Zhu et al., 1993). […] inheritance have employed ecotypes that contain polymorphic sites so that the parental origin of the sequence marker can be followed in "hybrid" progeny (e.g., Matsushima et al., 2008). While both seed and pollen parental markers were detected in cotyledons of F1 progeny, leaves evaluated later in development tended to retain only one or the other marker. Populations of F2 progeny raised from the seed of a single mother uniformly showed the markers of either parent, supporting the notion that vegetative segregation (sorting out) resolves rapidly in this system (Birky, 1978; Johnson & Palmer, 1989).
The annual diploid Medicago minima is a predominantly self-pollinated inbreeder that yields highly uniform progeny and is naturalized the world over (Small, 2011). […] (Fernald, 1941), Oklahoma (Hopkins et al., 1943), Arizona, and California (Howell, 1949). Today, M. minima can be found on six continents and is particularly common in disturbed habitats (Small, 2011). […] Texas sometime in the early twentieth century (Diggs et al., 1999), in approximate agreement with the herbarium record from 1914.
A number of ecotypes/cultivars were collected from several locations in Texas for evaluation of tolerance to the colder and/or drier regions of the state (Ocumpaugh, 2001) […] to Texas A&M AgriLife foundation seed (Ocumpaugh et al., 2007). […] This mechanism has been invoked to explain small migrations of IR/SC boundaries in many land plants (Goulding et al., 1996; Wang et al., 2008; Zhu et al., 2016). […] conversion are thought to be random (Goulding et al., 1996).

ACKNOWLEDGMENT
This work was supported by grants from the Texas Ecological Laboratory Program to R.K.J., T.A.R., and I.C., the National Science Foundation (DEB-1853024) to R.K.J. and T.A.R., and the Sidney F. and Doris Blake Professorship in Systematic Botany to R.K.J. The authors thank TEX-LL for serving as a repository for voucher specimens and the United States Department of Agriculture Germplasm Resources Information Network for providing seed.

CONFLICT OF INTEREST
None declared.

DATA AVAILABILITY STATEMENT
DNA sequences were deposited in GenBank. GenBank and USDA-GRIN accession numbers, sample voucher information, and collection locations are provided in Appendix S1.