Attack of the clones: Population genetics reveals clonality of Colletotrichum lupini, the causal agent of lupin anthracnose

Abstract Colletotrichum lupini, the causative agent of lupin anthracnose, affects lupin cultivation worldwide. Understanding its population structure and evolutionary potential is crucial to design successful disease management strategies. The objective of this study was to employ population genetics to investigate the diversity, evolutionary dynamics, and molecular basis of the interaction of this notorious lupin pathogen with its host. A collection of globally representative C. lupini isolates was genotyped through triple digest restriction site‐associated DNA sequencing, resulting in a data set of unparalleled resolution. Phylogenetic and structural analysis could distinguish four independent lineages (I–IV). The strong population structure and high overall standardized index of association (r̅d) indicate that C. lupini reproduces clonally. Different morphologies and virulence patterns on white lupin (Lupinus albus) and Andean lupin (Lupinus mutabilis) were observed between and within clonal lineages. Isolates belonging to lineage II were shown to have a minichromosome that was also partly present in lineage III and IV, but not in lineage I isolates. Variation in the presence of this minichromosome could imply a role in host–pathogen interaction. All four lineages were present in the South American Andes region, which is suggested to be the centre of origin of this species. Only members of lineage II have been found outside South America since the 1990s, indicating it as the current pandemic population. As a seedborne pathogen, C. lupini has mainly spread through infected but symptomless seeds, stressing the importance of phytosanitary measures to prevent future outbreaks of strains that are as yet confined to South America.

phasing out of fungicides and the aim for a more sustainable agriculture require a better understanding of pathogen population dynamics to design adequate breeding and disease management strategies.
Population genetics offers great potential to gain insight into the evolutionary processes that affect genetic diversity and population structure of fungal plant pathogens (McDonald & Linde, 2002). In many important fungal and oomycete plant pathogens, such as Peronospora destructor on onion (Van der Heyden et al., 2022), Fusarium oxysporum on cotton (Halpern et al., 2020), and Colletotrichum graminicola on maize (Rogério et al., 2023), population genetics has provided essential information on pathogen evolution and diversity. Knowledge of pathogen population structure is crucial for developing long-term disease management strategies, as was pointed out by Wallace et al. (2020) for Pseudoperonospora cubensis on Cucurbitaceae.
The acute ends of its conidia are the most characteristic feature of the species complex (Damm et al., 2012), but differentiation based purely on morphology has proven to be extremely hard (Cannon et al., 2000).
Hemibiotrophy is the most common lifestyle within the complex, but purely biotrophic, necrotrophic, and endophytic lifestyles have been observed as well (De Silva et al., 2017; Peres et al., 2005). Besides being devastating plant pathogens, members of the C. acutatum species complex offer great potential to serve as model organisms to study host-pathogen evolution (Baroncelli et al., 2017).
The disease is seed- and airborne, with typical symptoms being stem twisting and necrotic lesions on stems and pods (Alkemade, Messmer, Voegele, et al., 2021). The most agriculturally important lupin species are blue lupin (Lupinus angustifolius), white lupin (Lupinus albus), and Andean lupin (Lupinus mutabilis) (Wolko et al., 2011). Andean lupin plays a major role in regional food security, while blue and white lupin mostly serve as feed for the livestock industry. Most lupin production currently takes place in Western Australia (47%; FAOSTAT, 2021), but initiatives of the European Union (EU) to reduce its dependency on imported soybean have renewed interest in lupin cultivation, and the EU currently accounts for 39% of global production. Lupin anthracnose, however, significantly hampers a further increase in lupin cultivation. As no effective sustainable treatment is available yet (Alkemade, Arncken, et al., 2022), host resistance would be the most desirable solution. Recent studies identified three resistance genes in blue lupin (Fischer et al., 2015; Yang et al., 2004, 2008) and one candidate gene and two major quantitative trait loci in white lupin (Alkemade, Nazzicari, et al., 2022; Książkiewicz et al., 2017).
To aid resistance breeding in lupin crops, a better understanding of C. lupini population structure and evolution is required as different populations may vary in geographic distribution and differ in pathogenicity, virulence, or other biological traits.
Characterization of a global C. lupini collection by combining isolate morphology and multilocus sequencing of four loci could distinguish six groups (I-VI; Alkemade, Messmer, Voegele, et al., 2021).
However, morphological characterization has often been shown to be inconsistent within Colletotrichum (Cannon et al., 2000), and a phylogeny based on four loci provides only limited information. To gain insight into population structure and evolution, population genetic analyses were performed on a global collection of C. lupini isolates and members of the C. acutatum species complex. The population was genotyped through triple digest restriction site-associated DNA sequencing (3D-RADseq), which has been shown to increase the number of markers at lower startup costs (Bayona-Vásquez et al., 2019), providing an unparalleled resolution to study lupin's worst pathogen.

| Four distinct lineages identified within C. lupini
A total of 76 Colletotrichum samples (Table S1), originating from North America, South America, Europe, South Africa, and Australia, were sequenced. Genotyping through 3D-RADseq resulted in 1,882,704 single-nucleotide polymorphisms (SNPs) spread over 11 chromosomes after variant calling. The 16 included technical replicates showed an overall pairwise similarity of 99.94%, indicating a sequencing error rate of approximately 0.06% (Table S2). The data set was split into two data sets: one complete data set containing all sampled Colletotrichum species (76) and a second data set containing only C. lupini (67) isolates. After filtering, a total of 9923 and 1863 biallelic and phylogenetically informative SNPs spread over 10 chromosomes were retained for each data set, respectively. The mean sequencing depth was 16 before filtering and 18 after filtering the complete data set (Table S1, Figure S1). Phylogenetic analysis based on maximum likelihood and Bayesian inference was performed on the complete data set and revealed four (I-IV) well-defined lineages within C. lupini (Figure 1). Within the C. lupini data set consisting of 1863 SNPs, 334 SNPs corresponded specifically to lineage I, 552 to lineage II, 134 to lineage III, and 147 to lineage IV (File S1). South African isolate JA10, which was previously grouped together with Peruvian isolate JA20 in group III based on multilocus sequencing and morphology (Alkemade, Messmer, Voegele, et al., 2021), firmly grouped within lineage II. Selected lineage II isolates, except for isolate JA10, showed high virulence (standardized area under the disease progress curve [sAUDPC] > 3) on white lupin, whereas selected isolates from lineage I, III, and IV showed low virulence (sAUDPC < 3) on white lupin (Figure 1; Table S3). On Andean lupin, observed virulence varied within lineages, with low and high virulence observed for selected isolates of lineage I, II, and IV.
Lineage III isolate JA20 showed low virulence on both tested Andean lupin accessions.
Besides differences in virulence, members of lineage I and IV also displayed a diverse intralineage morphology (Figure S2; Table S4).
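The replicate-based error estimate reported above treats the error rate as the complement of the mean pairwise similarity between technical replicates. A minimal Python sketch of that arithmetic is given here for illustration only; the function names and the toy genotype vectors are hypothetical and are not taken from the authors' pipeline, and missing calls are assumed to be encoded as None.

```python
def pairwise_similarity(calls_a, calls_b):
    """Fraction of jointly called SNPs with identical genotype calls.

    Missing calls (None) in either replicate are skipped.
    """
    shared = [(a, b) for a, b in zip(calls_a, calls_b)
              if a is not None and b is not None]
    if not shared:
        raise ValueError("replicates share no called SNPs")
    matches = sum(1 for a, b in shared if a == b)
    return matches / len(shared)


def replicate_error_rate(replicate_pairs):
    """Estimate the genotyping error rate as 1 - mean replicate similarity."""
    similarities = [pairwise_similarity(a, b) for a, b in replicate_pairs]
    return 1.0 - sum(similarities) / len(similarities)


# Toy example (hypothetical data): one replicate pair, four shared calls,
# one of which disagrees.
toy_pairs = [([0, 1, 1, None, 0], [0, 1, 0, 1, 0])]
toy_error = replicate_error_rate(toy_pairs)
```

Under this definition, a mean replicate similarity of s corresponds to an estimated error rate of 1 - s.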

| Highest C. lupini diversity found in South America
All isolates collected outside of South America since the 1990s were grouped in lineage II, whereas isolates collected in South America were grouped across the four identified lineages (Figures 1 and 2; Table S1). The global presence of lineage II isolates and their strong virulence indicate that this population is causing the current lupin anthracnose pandemic. In South America, the four distinct lineages appeared to be geographically separated, with isolates belonging to lineage I and III being found in Peru and Bolivia, lineage II isolates being found in Chile, and lineage IV isolates being found in Ecuador (Figure 2). Separating the C. lupini data set based on origin (South America, North America, Europe, South Africa, and Australia) allowed for pairwise comparison between regions. Analysis of molecular variance (AMOVA) showed that most variance (70%) is explained within regions (Table 1). The generated minimum spanning networks (clone-corrected and nonclone-corrected) based on multilocus genotypes visualized in Figure 3 highlight that the highest diversity is found in South America. The Simpson diversity index for the nonclone-corrected data showed Europe (0.87) and South America (0.85) to be the most diverse, compared to the other regions (0.56-0.61; Table S5).
However, the Simpson diversity index for the clone-corrected data indicated that the highest diversity was found in South America (0.72) compared to the other regions (0-0.24; Table S6). Altogether, these results contribute to the hypothesis that South America, and specifically the Andes region, is the centre of origin of C. lupini.
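The contrast between the nonclone-corrected and clone-corrected Simpson indices can be illustrated with a short Python sketch. It assumes the index is computed as 1 minus the sum of squared multilocus genotype (MLG) frequencies and that clone correction keeps a single representative per MLG; the labels below are hypothetical toy data, and the authors computed their values with the poppr package in R.

```python
from collections import Counter


def simpson_index(mlg_labels):
    """Simpson diversity: 1 - sum of squared MLG frequencies."""
    n = len(mlg_labels)
    if n == 0:
        raise ValueError("no isolates supplied")
    counts = Counter(mlg_labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def clone_correct(mlg_labels):
    """Keep one representative per multilocus genotype."""
    return sorted(set(mlg_labels))


# Hypothetical region: three isolates share one MLG, one isolate differs.
region = ["MLG1", "MLG1", "MLG1", "MLG2"]
raw_diversity = simpson_index(region)                       # all isolates
corrected_diversity = simpson_index(clone_correct(region))  # one per MLG
```

A region represented by a single MLG yields an index of 0 after clone correction, which is how a region with many sampled isolates can still score at the bottom of the clone-corrected range.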

| Low genetic exchange between and within observed C. lupini lineages
FIGURE 1 Phylogeny of Colletotrichum lupini. Bayesian analysis tree inferred from 9923 single-nucleotide polymorphisms of 76 Colletotrichum isolates used in this study. Maximum-likelihood bootstrap support values (>90) and Bayesian posterior probabilities (>0.95) are given at each node. The tree is rooted to Colletotrichum fioriniae (RB025 and PF). Isolate codes are followed by species and country of origin. For virulence (green = 1-2, blue = 2-3, orange = 3-4, and red >4) on white lupin (Lupinus albus) cultivar Feodora and Andean lupin (L. mutabilis) landraces LUP17 and LUP100, see Alkemade, Messmer, Voegele, et al. (2021) and Table S3. Isolates followed by "_2" were sequenced twice. Asterisk indicates reference genome isolate.
We examined the genetic structure of C. lupini by performing Bayesian model-based clustering analysis (Figure 4c) and found that three or four populations best matched the phylogenetic tree of [...] Figure S4). This strong separation was also observed through AMOVA (Table 1), revealing a significant differentiation between the four different lineages (p = 0.001), explaining 99.88% of the observed genetic variance, whereas the observed genetic variance within lineages only accounted for 0.14%. No large difference was observed between the clone-corrected (mlg.filter: 0.05) and nonclone-corrected (mlg.filter: 0.0008) data sets (Table 1 and Table S7). Estimates of the population differentiation statistic (FST) further support an almost complete genetic differentiation between the lineages (Table 2) [...] (Table S8), rejecting the null hypothesis of random mating and indicating clonal reproduction.
High r̅d values were also observed for the South American, North American, and European populations, but not for the Australian and African populations. The observed results consistently show very low genetic diversity within and rare genetic exchange between the observed lineages of C. lupini, suggesting clonal reproduction.

| Variability in the presence/absence of a minichromosome
A third data set was created exclusively containing markers on the 11th (mini)chromosome present in the reference genome RB221 (Baroncelli et al., 2022). A total of 57 proteins have been annotated on minichromosome 11 of the reference genome. Six of those proteins are predicted to be secreted, with three being potential effector candidates (CLUP02_18383, CLUP02_18406, and CLUP02_18404).

| DISCUSSION
In this study a 3D-RADseq approach was employed on a global collection of Colletotrichum isolates, resulting in a broader data set with a higher resolution compared to previous studies (Alkemade, Messmer, Voegele, et al., 2021; Dubrulle et al., 2020), with the aim to investigate the population structure and genetic diversity of the lupin pathogen C. lupini. Phylogenetics based on 9923 SNPs and population structure analysis based on 1863 SNPs clearly separated C. lupini into four independent lineages. In our previous study (Alkemade, Messmer, Voegele, et al., 2021), we suggested the presence of six distinct groups based on four loci and isolate morphology.
However, classification of Colletotrichum and other fungal species solely based on morphological traits and a few loci can be unreliable (Cannon et al., 2000; Lardner et al., 1999). Most diversity was found in South America, in agreement with previous suggestions that South America, and specifically the Andes region, is the centre of origin of C. lupini (Alkemade, Messmer, Voegele, et al., 2021; Nirenberg et al., 2002; Riegel et al., 2010). This is in line with the assumption that South America is the centre of origin for members of Clade 1 of the C. acutatum species complex (Baroncelli et al., 2017; Bragança et al., 2016). As only a limited number of South American isolates were analysed and many regions in South America remain unsampled, more extensive sampling may identify further C. lupini lineages.
To the best of our knowledge, only lineage I and II have escaped [...]
The Andes region is home to a vast diversity of wild lupin species and is the area of domestication of L. mutabilis. For Andean Lupinus, the estimated species diversification rate is among the fastest recorded for plants (Drummond et al., 2012). In general, the speed of plant evolution in the environmentally diverse Andes has been extraordinarily high (Hughes & Atchison, 2015; Madriñán et al., 2013). The two [...]
No sexual state has been recorded for C. lupini and other members of Clade 1 of the C. acutatum species complex (Damm et al., 2012). Vegetative compatibility groups have been described for C. lupini lineage I and II (Elmer et al., 2001; Shivas et al., 1998).
Together with our results, which show low genetic exchange and a high overall r̅d (0.662), this strongly indicates that at least C. lupini lineage II is a clonal population. It is likely that C. lupini as a species reproduces clonally, but because only limited samples were available from the presumed centre of origin, South America, and from lineages I, III, and IV, further research is needed to confirm this. In clonal species, recombination can occur but is scarce enough to maintain a pattern of a clonal population structure (Tibayrenc & Ayala, 2012).
Some genetic exchange among C. lupini lineages appears to have occurred, as admixture was observed between lineage I and IV, IV and III, and III and II. Examples of single genotypes causing significant crop losses are the broad host range Verticillium dahliae (Milgroom et al., 2016), F. oxysporum on banana (Ordonez et al., 2015), and Phytophthora infestans on potato (Maurice et al., 2019). Clonal populations have also been reported within Colletotrichum, such as Colletotrichum kahawae on coffee (Vieira et al., 2018) and the broad host range Colletotrichum fioriniae (Eaton et al., 2021). Other Colletotrichum species show high recombination rates, as observed for C. graminicola on maize (Rogério et al., 2023), Colletotrichum truncatum on soybean (Rogério et al., 2022), and Colletotrichum tanaceti on Australian pyrethrum (Lelwala et al., 2019). However, the most successful invasive fungal pathogens have both a clonal and a sexual reproductive phase.

FIGURE 4 Colletotrichum
FIGURE 5 Presence/absence matrix of single-nucleotide polymorphisms (SNPs) on minichromosome 11. A total of 5199 SNPs were called on chromosome 11 of reference genome RB221. Red indicates absence and blue indicates presence. Species are followed by lineage and strain code (see also Table S1).
Variation was observed in the presence or absence of a minichromosome (0.5 Mb) described in the reference genome RB221, representing lineage II. This minichromosome was present in all lineage II isolates, partly present in lineage III and IV, but completely absent in lineage I. Mini (dispensable/lineage-specific) chromosomes tend to be highly variable, accumulating mutations and structural rearrangements more rapidly compared to the core genome, and often contain genes involved in host-pathogen interactions (Bertazzoni et al., 2018; Croll & McDonald, 2012) [...] (Ma et al., 2010).
In C. lupini, the minichromosome could play a role in virulence and host adaptation but does not seem to be required for pathogenicity on lupins, given its absence in lineage I. As only three potential effector candidate genes were predicted on the minichromosome, it could also have a function unrelated to host-pathogen interaction. Further research is required to draw conclusions on its function in C. lupini.
Altogether, this study demonstrated the existence of four in-

| Culture collection, DNA isolation, and 3D-RAD sequencing
Colletotrichum isolates were collected from public culture collections and from lupin plants with symptoms by collaborators worldwide (Table S1). A total of 51 unique C. lupini isolates were collected from 17 countries across five continents. Nine Colletotrichum species representing the genetic diversity of the C. acutatum species complex were also included. All isolates were single-spored and maintained on potato dextrose agar (Carl Roth) at 22°C in the dark as working cultures. Isolates were stored in 25% glycerol at −80°C for long-term storage. DNA of 2-week-old single-spore cultures was obtained as described in Alkemade, Messmer, Voegele, et al. (2021) using a cetyltrimethylammonium bromide extraction protocol (Minas et al., 2011). The DNA concentration was adjusted to 10 ng/

| Phylogenetic analysis
The SNP data were organized in two data sets. The first data set included all isolates retained after filtering. The second data set included only C. lupini isolates. The informloci function of the R package poppr (Kamvar et al., 2014) was used in R v. 4.0.3 (R Core Team, 2020) to remove phylogenetically uninformative loci. For the phylogenetic analysis the complete data set was used. The VCF file was converted to a fasta file using the R package vcfR (Knaus & Grünwald, 2017) and multiple sequence alignment was performed with MAFFT v. 7.453 (Katoh & Standley, 2013). Phylogenetic analyses were based on maximum likelihood and Bayesian inference and were performed through the CIPRES science gateway portal (Miller et al., 2011). The maximum-likelihood analysis was performed using RAxML v. 869 (Stamatakis, 2014) [...] Bootstrap support values from the maximum-likelihood analysis were plotted on the Bayesian phylogeny. Trees were visualized using iTOL v. 6 (Letunic & Bork, 2007).

| Genetic differentiation and population structure
Population analysis was performed on the C. lupini data set. The number of multilocus genotypes was determined using the mlg.filter function with a threshold determined by the cutoff_predictor function based on Euclidean distance using poppr in R. Clone correction was performed using the mlg.filter option with a threshold of 0.05. Diversity statistics and minimum spanning networks of the clone-corrected and nonclone-corrected data sets were generated using the poppr and poppr.msn functions, respectively. The genetic variation among and within populations was estimated by AMOVA (Excoffier et al., 1992) using the amova function within poppr. Analysis was performed with 1000 permutations on the nonclone-corrected and clone-corrected data sets. Genetic differentiation among populations was assessed by calculating SNP FST values (Weir & Cockerham, 1984) using the genet.dist function of hierfstat (Goudet, 2005). To examine population structure, a principal component analysis (PCA) was performed using the glPca function of adegenet v. 2.1.3 (Jombart, 2008). An identity-by-state kinship matrix (Astle & Balding, 2009) was generated using statgenGWAS (van Rossum et al., 2020). Heatmaps were visualized using ComplexHeatmap in R.
Genetic structure was further assessed by performing a non-model-based discriminant analysis of principal components (DAPC) using the dapc function of adegenet (Jombart, 2008).

| Mode of reproduction
The index of association (IA) is a measure of multilocus linkage disequilibrium based on the variance of pairwise distances between genotypes to test the null hypothesis of random mating. The IA and standardized index of association (r̅d), taking into account the number of loci, were calculated for both the clone-corrected and the nonclone-corrected data set using the ia function of poppr with 999 permutations. An r̅d close to 0 indicates random mating, whereas an r̅d close to 1 indicates clonal reproduction.
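How the two statistics are derived from pairwise distances can be sketched in a few lines of Python. This is an illustrative implementation of the standard definitions for haploid data, where the observed variance of total pairwise distances is compared with the sum of per-locus variances; it is not the poppr code used in this study, the function names are hypothetical, and the permutation test is omitted.

```python
from itertools import combinations
from math import sqrt


def _variance(values):
    """Population variance (mean of squares minus squared mean)."""
    n = len(values)
    mean = sum(values) / n
    return sum(v * v for v in values) / n - mean * mean


def index_of_association(genotypes):
    """Return (I_A, rbar_d) for haploid genotypes.

    genotypes: list of equal-length allele lists, one per isolate.
    """
    pairs = list(combinations(range(len(genotypes)), 2))
    n_loci = len(genotypes[0])
    # per_locus[j][p] is 1 if the isolates of pair p differ at locus j.
    per_locus = [
        [int(genotypes[a][j] != genotypes[b][j]) for a, b in pairs]
        for j in range(n_loci)
    ]
    totals = [sum(column) for column in zip(*per_locus)]
    v_observed = _variance(totals)
    locus_vars = [_variance(d) for d in per_locus]
    v_expected = sum(locus_vars)
    denominator = 2 * sum(sqrt(vj * vk)
                          for vj, vk in combinations(locus_vars, 2))
    if v_expected == 0 or denominator == 0:
        raise ValueError("need at least two polymorphic loci")
    i_a = v_observed / v_expected - 1
    rbar_d = (v_observed - v_expected) / denominator
    return i_a, rbar_d


# Toy example (hypothetical): two clones, all three loci perfectly linked.
clonal = [[0, 0, 0], [0, 0, 0], [1, 1, 1], [1, 1, 1]]
i_a_clonal, rbar_d_clonal = index_of_association(clonal)
```

With perfectly linked loci the sketch returns an r̅d of 1, whereas IA grows with the number of loci, which is why the standardized form is preferred when comparing data sets with different marker counts.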

| Functional analysis
Annotated protein sequences on the minichromosome (11) of the C. lupini reference genome (RB221) were used to predict a potential role in host interaction. We used SignalP v. 6.0 (Teufel et al., 2022) and Phobius (Käll et al., 2004) to identify secreted proteins.

| Virulence and morphology
Virulence tests were performed on white lupin cultivar Feodora and Andean lupin landraces LUP17 and LUP100 with C. lupini isolates JA01 (II), CBS19225 (I), RB121 (I), and JA23 (IV) through stem-wound inoculation as described by [...]. The method was shown to correlate highly with 3-year field trials in Switzerland (Alkemade, Nazzicari, et al., 2022).
Disease scores ranging from 1 (nonpathogenic) and 2 (low virulence) to 9 (highly virulent; Alkemade, Messmer, Arncken, et al., 2021) were taken 3, 7, and 10 days postinoculation (dpi) and the sAUDPC was calculated (Jeger & Viljanen-Rollinson, 2001). All inoculations were performed in a growth chamber (25 ± 2°C, 16 h light, and 70% relative humidity) in a complete randomized block design with nine replicates. RB121 and JA23 morphologies were characterized as described in Alkemade, Messmer, Voegele, et al. (2021). Data were merged with available morphological and virulence data of Alkemade, Messmer, Voegele, et al. (2021). Statistical analyses were performed with R v. 4.0.3 using the packages lme4 (Bates et al., 2015), lmerTest (Kuznetsova et al., 2017), and emmeans (Lenth et al., 2020), following a mixed model with factors of interest (i.e., isolate, lupin accession) as fixed factors and replicated block as a random factor. Data sets that did not meet the assumptions of normality of residuals and homogeneity of variance were log10-transformed. Data are presented as estimated least-squares means using the aforementioned mixed model.
Tukey's honestly significant difference (HSD) test (p ≤ 0.05) was applied for pairwise mean comparisons of the different strains within each lupin accession.
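The sAUDPC used above can be illustrated with a short Python sketch. It assumes trapezoidal integration of the disease scores over the scoring days and standardization by dividing by the length of the observation period; the exact standardization applied in the study is not spelled out here, so this choice, the function name, and the example scores are assumptions for illustration.

```python
def saudpc(days, scores):
    """Standardized area under the disease progress curve.

    Trapezoidal area between consecutive scoring days, divided by the
    total observation period (assumed standardization).
    """
    if len(days) != len(scores) or len(days) < 2:
        raise ValueError("need matching days and scores, at least two each")
    area = 0.0
    for i in range(1, len(days)):
        interval = days[i] - days[i - 1]
        if interval <= 0:
            raise ValueError("scoring days must be strictly increasing")
        area += (scores[i] + scores[i - 1]) / 2 * interval
    return area / (days[-1] - days[0])


# Hypothetical plant scored at 3, 7, and 10 dpi on the 1-9 scale.
example_value = saudpc([3, 7, 10], [1, 3, 5])
```

With this standardization the sAUDPC stays on the same 1-9 scale as the individual disease scores, so a plant scored 4 at every time point receives an sAUDPC of 4.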

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are openly available