Identification of Avramr1 from Phytophthora infestans using long read and cDNA pathogen‐enrichment sequencing (PenSeq)

Abstract Potato late blight, caused by the oomycete pathogen Phytophthora infestans, significantly hampers potato production. Recently, a new Resistance to Phytophthora infestans (Rpi) gene, Rpi‐amr1, was cloned from a wild Solanum species, Solanum americanum. Identification of the corresponding recognized effector (Avirulence or Avr) genes from P. infestans is key to elucidating their naturally occurring sequence variation, which in turn informs the potential durability of the cognate late blight resistance. To identify the P. infestans effector recognized by Rpi‐amr1, we screened available RXLR effector libraries and used long read and cDNA pathogen‐enrichment sequencing (PenSeq) on four P. infestans isolates to explore the untested effectors. Using single‐molecule real‐time sequencing (SMRT) and cDNA PenSeq, we identified 47 highly expressed effectors from P. infestans, including PITG_07569, which triggers a highly specific cell death response when transiently coexpressed with Rpi‐amr1 in Nicotiana benthamiana, suggesting that PITG_07569 is Avramr1. Here we demonstrate that long read and cDNA PenSeq enables the identification of full‐length RXLR effector families and their expression profile. This study has revealed key insights into the evolution and polymorphism of a complex RXLR effector family that is associated with the recognition by Rpi‐amr1.


| INTRODUCTION
Potato late blight, caused by the hemibiotrophic oomycete pathogen Phytophthora infestans, triggered the Irish and European famine in the late 1840s, and still causes severe losses to world potato production.

TECHNICAL ADVANCE
Unlike wild potatoes, Solanum nigrum and Solanum americanum have been reported to be nonhosts for P. infestans (Colon et al., 1993).
Two Rpi genes encoding NLR proteins, Rpi-amr3 and Rpi-amr1, were cloned from S. americanum and confer late blight resistance in potato (Witek et al., 2016, 2020). Identification of the recognized effectors for these Rpi genes would open the way to investigate their virulence function and distribution in P. infestans populations. Moreover, it could also help to diagnose Rpi gene repertoires in resistant plants, and individually confirm their activity in genetically modified potatoes carrying multiple Rpi genes. In oomycetes, all the cloned Avr proteins contain a signal peptide and RXLR motif (Rehmany et al., 2005), and the genomic sequencing of P. infestans revealed 563 RXLR effectors in the T30-4 reference genome (Haas et al., 2009). This enabled a high-throughput effectoromics approach for functional screening of the candidate effectors in plants (Vleeshouwers et al., 2008, 2011), and many Avr genes were identified by this approach, including Avrblb1, Avrblb2, and Avrvnt1 (Vleeshouwers et al., 2008; Oh et al., 2009; Pel, 2010).
However, available RXLR effector libraries do not contain recombinant clones of all P. infestans RXLR effectors, and the effector candidates were defined on the basis of expression profile, motif analysis, and distribution between P. infestans races (Vleeshouwers et al., 2008; Haas et al., 2009; Oh et al., 2009). In total, c. 300 of the 563 RXLR effectors were previously cloned into expression vectors for functional screening (Rietman, 2011).
To further explore the diversity of RXLR effectors from P. infestans, a pathogen-enrichment sequencing (PenSeq) approach was adopted to study allelic variation of RXLR effectors and population genomics of oomycetes. A bait library of RXLR effectors and some other pathogen-related genes was synthesized and used for enrichment prior to sequencing (Thilliez et al., 2019). However, the previous PenSeq analyses used Illumina reads and genomic DNA (gDNA), making it difficult to differentiate individual effector alleles and closely related paralogs or to find out which effectors are expressed.
Here, to identify the recognized effector of the newly cloned Rpi-amr1 protein from S. americanum (Witek et al., 2020), we screened all currently available RXLR effectors for recognition but without success. Therefore, we adapted and improved PenSeq with long read (PacBio) and cDNA sequencing, and extended the list of candidate effectors that could be screened. Amongst these additional candidate RXLR genes, we identified Avramr1 and defined orthologs and paralogs from four different isolates of P. infestans.

| Available recombinant RXLR effector libraries do not contain Avramr1
To identify Avramr1, we tested 278 available RXLR effectors (Table S1) by coexpressing them with Rpi-amr1-2273 in Nicotiana benthamiana (Rietman, 2011). Rpi-amr1-2273 (hereafter Rpi-amr1) is a functional Rpi-amr1 homolog cloned from S. americanum SP2273 (Witek et al., 2020). However, we did not identify an effector that activated an Rpi-amr1-dependent hypersensitive response (HR) from the available RXLR effector libraries and concluded that these libraries are incomplete. Notably, Avr8 was not originally included in the core effector selection because Avr8 expression goes up earlier than 2 days postinoculation (dpi) (Jo, 2013), showing that the criteria adopted to define core effectors do not reveal all recognized effectors.
To find Avramr1, we proposed three hypotheses: (a) Avramr1 is an RXLR effector but the recognized allele is not present in the assembled version of P. infestans T30-4 reference genome,

| PacBio PenSeq of four P. infestans isolates, EU_13_A2, EC1_A1, EU_6_A1, and US23
PacBio gDNA PenSeq was performed on four P. infestans isolates of genotypes EU_13_A2, EC1_A1, EU_6_A1, and US23 (Figure 1a). To evaluate the enrichment efficiency, quantitative polymerase chain reaction (qPCR) was performed with the DNA pre- and postcapture. In general, the targeted genes of different lengths were well enriched at concentration × time (Cot) value <20, while the untargeted genes were almost undetectable, with Cot > 27 (Peterson et al., 2002) (Figure S1). Furthermore, we found that the capture efficiency was increased by including a 10-fold molar excess of nonadaptor-ligated fragmented P. infestans DNA (500-1,000 bp) in the reannealing reaction to reduce the extent to which sequences were recovered due to concatenation of transposon-containing sequences adjacent to RXLR genes. After sequence capture, enrichment of most effector genes was more efficient when nonadaptor-ligated P. infestans DNA was included (Figure S1).
Following the enrichment sequencing, circular consensus sequencing reads were assembled (Figure 1a) and contigs with fewer than 10 reads were removed. The average length of the contigs of coverage >10 reads was 7 kb (Table 1), and the size of the largest contig was over 50 kb. This suggests that the PacBio PenSeq successfully captured the target effector genes and the adjacent flanking DNA sequences. In total, 1,137, 1,054, 1,283, and 925 contigs were obtained from EU_13_A2, EC1_A1, EU_6_A1, and US23, respectively, of which 687, 650, 741, and 571 contigs contain RXLR effectors (Table 1 and Notes S1-S4). The remaining contigs contained non-RXLR effectors, which were included in the bait library design for other purposes (Thilliez et al., 2019).
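The read-support filter and the summary statistics reported above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the input layout (a dictionary of contig names mapped to a length and a supporting read count) and the function names `filter_contigs` and `summarize` are assumptions made for illustration only.

```python
from statistics import mean


def filter_contigs(contigs, min_reads=10):
    """Keep contigs supported by at least `min_reads` reads.

    `contigs` maps a contig name to {"length": int, "reads": int}.
    This layout is hypothetical, not the format used in the study.
    """
    return {name: rec for name, rec in contigs.items() if rec["reads"] >= min_reads}


def summarize(contigs):
    """Return the contig count, mean length and maximum length (in bp)."""
    lengths = [rec["length"] for rec in contigs.values()]
    return {
        "n_contigs": len(lengths),
        "mean_length": mean(lengths) if lengths else 0,
        "max_length": max(lengths) if lengths else 0,
    }


if __name__ == "__main__":
    # Toy values, not data from the paper.
    example = {
        "contig_1": {"length": 8000, "reads": 25},
        "contig_2": {"length": 3000, "reads": 4},   # removed: fewer than 10 reads
        "contig_3": {"length": 6000, "reads": 10},
    }
    print(summarize(filter_contigs(example)))
```

The threshold is exposed as a parameter so that the effect of stricter or looser read support on contig number and length can be checked quickly.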
The PacBio PenSeq data allowed us to detect new RXLR effector alleles from different haplotypes of various P. infestans isolates and even in polyploid genotypes like EU_13_A2 (Li et al., 2017). This data set can also be used to extensively study allelic variation, presence/absence (P/A) polymorphism, and effector evolution. For example, Avr1 (PITG_16663) and a paralogous Avr1-like gene (PITG_06432) are located on supercontigs 1.51 and 1.8 of the reference T30-4 genome, respectively. The R1-breaking clonal lineage EU_13_A2 was reported to have an 18 kb deletion comprising the Avr1 locus (Cooke et al., 2012). Also, the Illumina PenSeq data showed that the Avr1 locus is missing in EU_13_A2, EC1_A1, and US23 (Thilliez et al., 2019). We mapped the four Avr1 contigs from EU_13_A2 (contigs 192, 261, 296, and 329) to supercontigs 1.51 and 1.8, and found that all four contigs map to the Avr1-like supercontig 1.8. Two contigs (contig 261 and 286) mapped to the Avr1-like locus, and two other contigs (contig 192 and 329) mapped to a locus next to Avr1-like that was not previously annotated (Figure S2), although the genes in those two contigs might be pseudogenes as the signal peptide is missing in both of them. Additionally, in EU_6_A1 and US23, two Avr1 contigs did not map to Avr1 or Avr1-like loci of T30-4. Thus, our PacBio PenSeq data set can provide the means to detect novel RXLR effector paralogs absent from the reference genome.
As another example, our data set carries in total 504 of the 563 predicted RXLR effectors from the reference genome T30-4 (Haas et al., 2009). To investigate P/A polymorphism of RXLR effectors in the four sequenced isolates, we performed a basic local alignment search (BLAST) of the 504 effectors against the PacBio contigs, with hits with <50% coverage defined as absent (Table S2).

FIGURE 1 The pipelines of PacBio and cDNA pathogen-enrichment sequencing (PenSeq). (a) The pipeline of PacBio gDNA PenSeq. Briefly, the gDNA isolated from various Phytophthora infestans was enriched for RXLR effectors, sequenced by PacBio and de novo assembled for data mining. (b) The pipeline of cDNA PenSeq. The cDNA was synthesized using RNA sampled from various P. infestans at different stages (mycelium, zoospore, 12 hr postinoculation, 1, 2, and 3 days postinoculation). The libraries enriched for RXLR effectors were sequenced, reads were mapped to the RXLRome of the reference P. infestans genome T30-4 and the expression levels of samples were calculated and compared. Black lines with dots represent the baits, the enriched fragments are depicted in blue (EU_13_A2), yellow (EC1_A1), pink (EU_6_A1), and green (US23).

Taken together, we have generated a rich data set that could help to define full-length RXLR effector genes, deliver robust information on alleles and paralogs, and reveal conserved or race-specific effectors from different isolates. The data set is available in full in Notes S1-S4.
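The presence/absence call described above (a BLAST search of reference effectors against the contigs, with hits below 50% coverage treated as absent) can be sketched in Python as follows. This is only an illustration under stated assumptions: the input is taken to be BLAST tabular output with the standard 12 columns (`-outfmt 6`), coverage is approximated by the longest single alignment on the query, and the function name `call_presence` and the effector names in the usage example are hypothetical. The exact coverage definition used for Table S2 is not given here.

```python
def call_presence(blast_lines, query_lengths, min_coverage=0.5):
    """Call each query effector present (True) or absent (False).

    blast_lines   : iterable of tab-separated BLAST lines (standard 12 columns:
                    qseqid sseqid pident length mismatch gapopen qstart qend
                    sstart send evalue bitscore).
    query_lengths : dict mapping query name to query length.
    Assumption: coverage is the longest single alignment on the query divided
    by the query length; queries without any hit get coverage 0.
    """
    best = {query: 0.0 for query in query_lengths}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query = fields[0]
        if query not in query_lengths:
            continue
        qstart, qend = int(fields[6]), int(fields[7])
        coverage = (abs(qend - qstart) + 1) / query_lengths[query]
        best[query] = max(best[query], coverage)
    return {query: cov >= min_coverage for query, cov in best.items()}


if __name__ == "__main__":
    # Toy input with hypothetical effector and contig names.
    lengths = {"effector_A": 300, "effector_B": 400, "effector_C": 500}
    hits = [
        "effector_A\tcontig_1\t99.0\t300\t3\t0\t1\t300\t1000\t1299\t1e-150\t540",
        "effector_B\tcontig_7\t95.0\t120\t6\t0\t1\t120\t50\t169\t1e-40\t180",
    ]
    print(call_presence(hits, lengths))
```

In the toy example, `effector_A` is covered over its full length and is called present, `effector_B` is covered over 30% of its length and is called absent, and `effector_C` has no hit and is also called absent.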

| cDNA PenSeq enables effector expression detection in early stages of infection
To clarify whether the untested effectors might be putative Avr genes, we performed cDNA PenSeq for the four P. infestans isolates EU_13_A2, EC1_A1, EU_6_A1, and US23, at different time points after infection (12 hr postinoculation [hpi], 1, 2, and 3 dpi) and in mycelium and zoospores (Figure 1b). To analyse and visualize the cDNA PenSeq data, we built an artificial DNA sequence contig ("RXLRome") for the RXLR effectors. In addition, nine non-RXLR genes from the bait library were included as controls.

The cDNA PenSeq data for the RXLR effectors from four Phytophthora infestans isolates at different stages were mapped to an artificial contig (RXLRome) of 499 RXLR effectors and nine non-RXLR genes, demarcated by bright green arrows on the outer edge of the diagram. Black lines separate the previously tested RXLR effectors (grey bar), new effector candidates with differential expression (red bar), unexpressed effectors (blue bar) and non-RXLR controls (cyan). The concentric circles in blue, yellow, pink, and green represent data from P. infestans EU_13_A2, EC1_A1, EU_6_A1, and US23, respectively. The arrows on them indicate differential expression (red, up-regulation; blue, down-regulation; no fill, no difference), where the more intense the colour, the bigger the difference. The data are plotted as follows: a, mycelium vs. zoospores (for EU_13_A2 only); b, zoospores vs. 12 hr postinoculation (hpi); c, 12 hpi vs. 1 day postinoculation (dpi); d, 1 dpi vs. 2 dpi; e, 2 dpi vs. 3 dpi. Eleven known Avr genes, namely Avr4, AvrSmira1, Avr8, Avr10, Avr3a, Avrvnt1, Avr1, Avr3b, Avrblb2, Avrblb1, and Avr2, are indicated by black arrows. PITG_07569 is indicated by a black arrow.

might also represent additional potential Avr genes, while others are poorly expressed in some isolates. The details of the cDNA PenSeq are available in Table S3.

| Identification of Avramr1
To test if Avramr1 is among the untested effectors, we selected 47 highly expressed RXLR effectors (Figure 3) present in all tested lineages that had not previously been investigated. The effectors were synthesized, cloned into an expression vector with the CaMV 35S promoter, and transformed into Agrobacterium GV3101-pMP90 for agroinfiltration in N. benthamiana (Figure 4a). All the effectors were infiltrated alone or coinfiltrated with Rpi-amr1 (Witek et al., 2020). Among the 47 effectors, PITG_07569 was the only effector that triggered an HR when coexpressed with Rpi-amr1. Hence, we concluded that PITG_07569 is Avramr1. To verify that both proteins were expressed in planta, we cloned Avramr1 and Rpi-amr1 with C-terminal green fluorescent protein (GFP) and His-FLAG (HF), respectively. Both recombinant proteins were expressed and detected in N. benthamiana by western blot (Figure 4c). The same constructs were used for agroinfiltration in N. benthamiana, and HR was observed specifically after coexpression of Avramr1-GFP and Rpi-amr1-HF (Figure 4b).

| Avramr1 homologs in different P. infestans isolates and other Phytophthora species
Avramr1 is a canonical RXLR effector with RYLR and EER motifs and an N-terminal signal peptide (Figure 5b). Avramr1 is located on supercontig 1.11 of the P. infestans reference genome T30-4. Avramr1-like (hereafter Avramr1L), a truncated paralog (PITG_07566), maps adjacent to Avramr1 (Figure 5a,b). Two known Avr effectors, Avr8 (PITG_07558) and Avrsmira1 (PITG_07550), are physically close to the Avramr1 locus in the T30-4 genome (Figure 5a) (Rietman et al., 2012).

FIGURE 3 Raw transcript counts for the new candidate RXLR effectors. The 47 most differentially expressed RXLR effectors from the previously untested set were selected, and the raw transcript counts were visualized as a heat map across time points and treatments. Each square indicates a single data point derived from two independent biological replicates. The colours red, orange, yellow, and blue represent >100, 50-100, 0-50, or 0 raw transcripts, respectively.

To study the sequence polymorphism of Avramr1 homologs in P. infestans, we used BLAST to search for Avramr1 homologs in the PacBio PenSeq assemblies generated in this study. This revealed that EU_13_A2, EC1_A1, EU_6_A1, and US23 carry six, four, three, and six Avramr1 homologs, respectively (Note S5). Next, we aligned the corresponding Avramr1 amino acid sequences and generated a neighbour-joining (NJ) tree for phylogenetic analysis (Figure 6a). Two Avramr1 homologs from Phytophthora parasitica and Phytophthora cactorum were identified from public databases, and they were used as an out-group (Figures 5b and 6a). Based on the phylogenetic tree, we distinguished four Avramr1 clades, clade A (containing Avramr1 from T30-4) and clade C (with Avramr1L from T30-4), and two more clades, B and D (Figure 5a). For a more detailed analysis, we selected one Avramr1 homolog from clade B and one from D (Avramr1-13B1 and Avramr1-13D1 from EU_13_A2) and aligned them with Avramr1 homologs from clade A and C, and with P. parasitica and P. cactorum homologs. Significant sequence polymorphisms were observed between effectors from different clades (Figure 5b). Meanwhile, the Avramr1 homologs within the same clade were almost identical (Figure 6a).

| Differential expression of Avramr1 homologs in different P. infestans isolates
To investigate the expression patterns of Avramr1 homologs de- in expression at the zoospore stage, and at 1, 2, and 3 dpi.
In summary, our PacBio PenSeq analysis created a rich data set to reveal new Avr variants from different P. infestans isolates, and to quantify their expression profile individually. This facilitates the analysis of the polymorphism of pathogen effectors and their potential differential recognition patterns with the corresponding Rpi genes (Witek et al., 2020).

| DISCUSSION
The availability of the P. infestans genome sequence enabled a step-change in the rate of investigation of this pathogen, accelerating the discovery of recognized effectors, and of new Rpi genes (Vleeshouwers et al., 2008, 2011; Haas et al., 2009) (Jupe et al., 2013; Witek et al., 2016; Arora et al., 2019; Lin et al., 2020). Recently, the pan-NLRome of 65 diverse Arabidopsis thaliana accessions was determined by a similar strategy, revealing that any one accession lacks many of the NLRs found in the species pan-NLRome (Van de Weyer et al., 2019).
PenSeq was developed to facilitate cost-effective investigation of pathogen diversity on infected plants, and polymorphism of pathogen effectors (Thilliez et al., 2019). The first PenSeq studies, however, were conducted using Illumina short reads. This significantly limited their resolving power as many oomycete genomes are highly heterozygous, and some effectors belong to large gene families with multiple sequence-related paralogs that can lead to false assemblies (Gilroy et al., 2011; Oliva et al., 2015).

FIGURE 5 Genomic localization and amino acid alignment of Avramr1. (a) The localization of Avramr1 (PITG_07569, red arrow) on supercontig 1.11 of the reference Phytophthora infestans T30-4 genome. A paralog Avramr1L gene (PITG_07566, blue arrow) is located close to Avramr1. The supercontig contains two other known Avr genes, Avrsmira1 (PITG_07550, pink arrow) and Avr8 (PITG_07558, green arrow). (b) The alignment of protein sequences of Avramr1 and selected homologs from P. infestans, Phytophthora cactorum (Pc), and Phytophthora parasitica (Pp). The dark green bars on top of the alignment indicate 100% identity while olive green and red bars indicate various degrees of polymorphism between the sequences. RXLR and EER motifs are highlighted by red boxes.

In this study, we combined long read and cDNA PenSeq, enabling a detailed analysis of the RXLR genes and their expression patterns in different P. infestans isolates. The cDNA PenSeq data set allowed us to define an additional set of 47 RXLR genes expressed during infection that were not previously investigated. Amongst these, we identified Avramr1, which encodes the cognate recognized effector for Rpi-amr1 from S. americanum (Witek et al., 2020). It is noteworthy that PITG_07569 (Avramr1) was identified by an alternative splicing reporter system as a splicing regulatory effector; furthermore, it was shown to promote the colonization of P. infestans.
The long read PenSeq data helped us to obtain full-length RXLR effector haplotypes with their flanking sequences. This allowed us to distinguish individual alleles from polyploid isolates like EU_13_A2, and also distinct effector paralogs. The sequences flanking the RXLR genes enabled us to understand the possible translocation events and identify new RXLR loci. We were also able to identify multiple new Avramr1 homologs from different isolates, and identified a new Avramr1 clade D that is not present in T30-4. This data set allows us to study the differential recognition pattern of Rpi-amr1 and Avramr1 homologs.
Indeed, different Rpi-amr1 homologs could recognize different sets of Avramr1 homologs, including the Avramr1 homolog from the newly identified clade D (Witek et al., 2020). So far, no Rpi-amr1-breaking P. infestans isolates have been found (Witek et al., 2020), and therefore we propose that Avramr1 might be crucial for the virulence of P. infestans. The identification of Avramr1 will enable us to study its virulence function and its polymorphism in the P. infestans population, and study the effector-triggered immunity mediated by Rpi-amr1.

Collectively, the PenSeq data set constitutes a valuable community resource for investigating the allelic and expression diversity of multiple recognized effectors. The long read and cDNA PenSeq methods will contribute to understanding this fast-evolving and destructive oomycete pathogen, and to achieving durable late blight resistance in potato.

FIGURE 6 Phylogeny and expression profile of Avramr1 homologs from EU_13_A2, EC1_A1, EU_6_A1, and US23. (a) Maximum-likelihood phylogeny of the protein sequences of the Avramr1 homologs was made by IQ-TREE (Minh et al., 2020). PcAvramr1 and PpAvramr1 are Avramr1 homologs from Phytophthora cactorum and Phytophthora parasitica, and they were used as out-groups. (b) The expression profile of Avramr1 homologs at different stages and time points (zoospores, 12 hr postinoculation, 1, 2, and 3 days postinoculation). Transcripts per kilobase million (TPM) for each effector homolog were visualized as follows: black, data not available; red, 350,000-1,000,000 TPM; orange, 200,000-350,000 TPM; yellow, 0-200,000 TPM; beige, 0 TPM. Avramr1 and Avramr1L are from the reference genome T30-4, which was not included in the cDNA PenSeq.

| Sample preparation
To collect the mycelium of P. infestans for DNA extraction, P. infestans strains were grown on rye sucrose agar (RSA) for 7 days and then

| PacBio and Illumina PenSeq capture
The biotinylated RNA bait library of 120 nucleotides (nt) was designed for enriching RXLR effectors from P. infestans and some other genes of interest. The library contains 18,348 baits, as described previously (Thilliez et al., 2019).

| gDNA PenSeq assembly
PacBio raw reads were processed as described in Witek et al. (2016) to generate ROI reads and demultiplexed using custom script (Van

| Analysis of cDNA PenSeq
All RXLR effectors from the P. infestans reference genome T30-4 were used to generate an artificial "RXLRome" contig, where the RXLR effector sequences were separated by stretches of 500 "Ns". The contig also contained nine non-RXLR control genes. The cDNA PenSeq reads from all treatments were mapped to the T30-4 RXLRome, and the expression analyses were performed and visualized using Geneious R10 (Kearse et al., 2012).
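To illustrate the idea of the artificial contig, the following Python sketch concatenates gene sequences with spacers of 500 "N" characters and records where each gene lands. It is a minimal sketch rather than the procedure used in the study (the analysis itself was carried out in Geneious R10); the function name `build_rxlrome`, the coordinate convention (0-based, end-exclusive), and the toy sequences are assumptions made for illustration.

```python
def build_rxlrome(genes, spacer_len=500):
    """Concatenate gene sequences into one artificial contig.

    genes      : list of (name, sequence) tuples, in the desired order.
    spacer_len : number of "N" characters inserted between adjacent genes.
    Returns the contig string and a dict of 0-based, end-exclusive
    (start, end) coordinates for each gene on the contig.
    """
    spacer = "N" * spacer_len
    parts = []
    coords = {}
    position = 0
    for index, (name, sequence) in enumerate(genes):
        if index > 0:
            # Spacers go between genes only, not at the contig ends.
            parts.append(spacer)
            position += spacer_len
        coords[name] = (position, position + len(sequence))
        parts.append(sequence)
        position += len(sequence)
    return "".join(parts), coords


if __name__ == "__main__":
    # Toy sequences with hypothetical names, not real effector sequences.
    toy_genes = [("gene_1", "ATGC"), ("gene_2", "GGGTTT")]
    contig, coords = build_rxlrome(toy_genes, spacer_len=5)
    print(contig)   # ATGCNNNNNGGGTTT
    print(coords)   # {'gene_1': (0, 4), 'gene_2': (9, 15)}
```

Keeping the coordinates alongside the contig makes it straightforward to assign mapped reads back to individual effectors, and the "N" spacers discourage reads from aligning across the junction between two unrelated genes.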

| New candidate RXLR effectors
For the previously untested RXLR effectors, we first selected the effectors showing differential expression at different stages and ranked them based on the raw transcript counts. Next, local alignment searches (BLAST) were performed against the 563 predicted RXLR effectors (Haas et al., 2009) to remove the previously tested effectors. This analysis revealed 47 candidate RXLR effectors that were not included in the previous functional study. The 47 RXLR effectors were synthesized by Twist Bioscience. The signal peptides were removed, the sequences were domesticated for Golden Gate cloning, and overhangs containing BsaI restriction sites were added to both ends of all effector sequences.
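Domestication for Golden Gate cloning generally means removing internal recognition sites of the type IIS enzyme used for assembly, here BsaI, whose recognition sequence is GGTCTC (GAGACC on the opposite strand). The Python sketch below only locates such internal sites in a coding sequence so that they can be flagged for removal. It is an illustrative check, not the design procedure used for the synthesized effectors, and the function name `find_bsai_sites` and the example sequence are hypothetical.

```python
BSAI_FORWARD = "GGTCTC"   # BsaI recognition sequence on the given strand
BSAI_REVERSE = "GAGACC"   # the same site read from the opposite strand


def find_bsai_sites(sequence):
    """Return sorted (position, strand) tuples for BsaI sites in `sequence`.

    Positions are 0-based and refer to the first base of the 6-bp site on
    the given strand; strand is "+" for GGTCTC and "-" for GAGACC.
    """
    sequence = sequence.upper()
    sites = []
    for motif, strand in ((BSAI_FORWARD, "+"), (BSAI_REVERSE, "-")):
        start = sequence.find(motif)
        while start != -1:
            sites.append((start, strand))
            start = sequence.find(motif, start + 1)
    return sorted(sites)


if __name__ == "__main__":
    # Toy sequence, not an effector from this study.
    toy = "ATGGGTCTCAAAGAGACCTT"
    print(find_bsai_sites(toy))   # [(3, '+'), (12, '-')]
```

A sequence with an empty result needs no domestication for BsaI; any reported site would have to be removed, typically by a synonymous codon change, before BsaI-flanked overhangs are added.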
All the effectors were cloned into vector pICSL86977 (TSL SynBio) with the CaMV 35S promoter and OCS terminator. To further verify the expression of Avramr1 and Rpi-amr1, they were fused with C-terminal GFP and His-FLAG (HF) tags, respectively. Agrobacterium strain GV3101-pMP90 was transformed with the constructs for agroinfiltration.

| Cell death assay
Transient expression of RXLR effectors and Rpi-amr1 in N. benthamiana was performed as described previously (Bos et al., 2006).

| Protein extraction and immunoblot analysis
The p35S-Rpi-amr1-HF and p35S-Avramr1-GFP constructs were used to transiently express the fusion proteins in N. benthamiana.
The leaf tissue was harvested 2 days after infiltration and proteins were extracted as described in Guo et al. (2020). The expression of recombinant Avramr1-GFP and Rpi-amr1-HF was determined by SDS-PAGE as described in Guo et al. (2020). Horseradish peroxidase-conjugated antibodies (anti-FLAG M2, 1:10,000 dilution, Sigma; anti-GFP, 1:10,000 dilution, Santa Cruz Biotechnology) were used for the immunoblot. The chemiluminescence was detected by ImageQuant LAS 4000 (Life Sciences) after chemiluminescent substrate incubation (SuperSignal West Pico & West Femto).

| Sequence and phylogenetic analysis
All sequences were analysed in Geneious R10 (Kearse et al., 2012), MAFFT was used for sequence alignment (Katoh and Standley, 2013), and the signal peptides of Avramr1 homologs were removed manually for the phylogenetic analysis. IQ-TREE was used for the phylogenetic analysis and the JTTDCMut model was selected as best-fit model by IQ-TREE (Minh et al., 2020).