Comparative genomics among cyst nematodes reveals distinct evolutionary histories among effector families and an irregular distribution of effector‐associated promoter motifs

Abstract Potato cyst nematodes (PCNs), an umbrella term used for two species, Globodera pallida and G. rostochiensis, belong worldwide to the most harmful pathogens of potato. Pathotype‐specific host plant resistances are essential for PCN control. However, the poor delineation of G. pallida pathotypes has hampered the efficient use of available host plant resistances. Long‐read sequencing technology allowed us to generate a new reference genome of G. pallida population D383 and, as compared to the current reference, the new genome assembly is 42 times less fragmented. For comparison of diversification patterns of six effector families between G. pallida and G. rostochiensis, an additional reference genome was generated for an outgroup, the beet cyst nematode Heterodera schachtii (IRS population). Large evolutionary contrasts in effector family topologies were observed. While VAPs (venom allergen‐like proteins) diversified before the split between the three cyst nematode species, the families GLAND5 and GLAND13 only expanded in PCNs after their separation from the genus Heterodera. Although DNA motifs in the promoter regions thought to be involved in the orchestration of effector expression (“DOG boxes”) were present in all three cyst nematode species, their presence is not a necessity for dorsal gland‐produced effectors. Notably, DOG box dosage was only loosely correlated with the expression level of individual effector variants. Comparison of the G. pallida genome with those of two other cyst nematodes underlined the fundamental differences in evolutionary history between effector families. Resequencing of PCN populations with different virulence characteristics will allow for the linking of these characteristics to the composition of the effector repertoire as well as for the mapping of PCN diversification patterns resulting from extreme anthropogenic range expansion.


| INTRODUC TI ON
Worldwide, affordable food and feed production depends on largescale monocropping. For practical and economic reasons crop homogeneity in terms of yield quality and quantity is essential. At the same time, such systems are intrinsically vulnerable to damage by pests and pathogens. The highest susceptibility to biotic stressors is found in genetically homogeneous crops. Potato is the third most important staple food (Birch et al., 2012), and in most production systems clonally propagated seed potatoes are used as starting material. Such production systems need rigorous disease management.
Potato cyst nematodes (PCNs), the common name for two species, Globodera pallida and G. rostochiensis, are among the primary yieldlimiting potato pathogens worldwide. PCNs co-evolved with potato in the Andes in South America (see, e.g., Plantard et al., 2008), and proliferation of potato as a main crop outside of its native range was unintentionally paralleled by an enormous range expansion of PCNs. For decades, PCN control has mainly been dependent on the application of nematicides. Due to the nonspecific nature of these nematicides, they have a highly negative impact on the environment, and their use is therefore either banned or severely restricted in most parts of the world. Currently, crop rotation and the use of resistant potato varieties are the main means for PCN control. For economic reasons, the use of plant resistances is preferred over crop rotation. However, potato resistance genes such as H1 (Toxopeus & Huijsman, 1952), Gro1-4 (Paal et al., 2004), Gpa2 (Bakker et al., 2003) and H2 (Strachan et al., 2019) are only effective against specific pathotypes of one of these PCN species. Nevertheless, there is no robust (molecular) pathotyping scheme that would allow for matching of the genetic constitution of field populations with effective host plant resistance genes.
Effectors are proteins secreted by plant pathogens that allow manipulation of the physiology of the host plant and interfere with the host's innate immune response in favour of the invading organism (e.g., Stergiopoulos & De Wit, 2009). PCN effectors have some peculiar characteristics. With at least one known exception, HYPer variable proteins (HYPs) (Eves-van den , most effectors are produced in large single-celled glands referred to as the subventral and dorsal oesophageal glands. These glands empty into the pharynx lumen, and the lumen is connected to a hollow protrusible stylet with which nematodes pierce plant cell walls.
Via the orifice of the stylet, effector proteins are transferred to the apoplast or the cytoplasm of infected host plant cells. Notably, subventral gland effectors are functional during plant penetration.
Subsequently, dorsal gland secretions are responsible for feeding site induction and suppression of the host's innate immune system . It has been hypothesized that common transcription factors and/or common promoter motifs might facilitate coordinated expression of effectors during the infection process.
Such mechanisms have been identified to regulate effector expression in plant pathogenic fungi and oomycetes (Jones et al., 2019;Roy et al., 2013). Also, among plant-parasitic nematodes, promotor motifs have been identified upstream of effectors that could contribute to the orchestration of the infection process. In the case of the PCN G. rostochiensis, a DOrsal Gland motif ("DOG box") was identified by Eves-van den Akker et al. (2016). For the pinewood nematode Bursaphelenchus xylophilus, a regulatory promotor motif referred to as STATAWAARS was demonstrated to affect effector expression (Espada et al., 2018). Expression of several effectors of Clade I tropical root-knot nematodes (Tandingan De Ley et al., 2002) was suggested to be steered by a putative cis-regulatory motif "Mel-DOG" (Meloidogyne DOrsal Gland, Da Rocha et al., 2021).
Probably as a reflection of the co-evolution between nematodes and their host(s), effectors are typically encoded by multigene families showing family-specific levels of diversification (Masonbrink et al., 2019;Van Steenbrugge et al., 2021). Cyst nematodes harbour numerous effector families (see, e.g., Pogorelko et al., 2020), and genome (re-)sequencing is a rigorous approach to generate comprehensive overviews of PCN effector family compositions. The first genomes of G. pallida and G. rostochiensis were published by Cotton et al. (2014) and Eves-van den Akker et al. (2016). Although this constituted a major step forward, both genomes are highly fragmented, hampering effector family inventories. Recently, long-read technology allowed for the generation of a less fragmented and more complete reference genome for G. rostochiensis with-among other things-a 24-fold reduction in the number of scaffolds as compared to the initial reference genome (Van Steenbrugge et al., 2021).
Here, we present a novel reference genome for the other PCN, G. pallida, characterized by a 42-fold reduction in the number of scaffolds, together with a reference genome of the beet cyst nematode Heterodera schachtii. The H. schachtii genome was used to establish the polarity of effectorome contrasts between the two PCN species. Detailed knowledge of the nematode's effector repertoire, a complete overview of variants within effector families and insights in the evolutionary history of individual effector families are essential for a molecular pathotyping scheme. In addition to comparing effector diversification patterns, we investigated DOG box distribution and DOG box dosage (up to 16 DOG boxes were observed per putative promoter region) both within and among effector families.
Scrutinizing putative effector promoter regions in three reference genomes allowed us to pinpoint the distribution of this putative regulatory motif among cyst nematode species, as well as among and within effector families. Subsequently, the impact of these new, long-read technology-based reference genomes on ecological PCN diversification in general and on the development of effectoromebased pathotyping systems for PCNs in particular is discussed.

| DNA isolation and sequencing
Cysts from Globodera pallida line D383 were used as starting material for the collection of preparasitic second-stage juveniles (J2). Pa2-D383 is a relatively avirulent G. pallida population from The Netherlands (Rouppe van der Voort et al., 1998). J2s were concentrated, and sucrose centrifugation was used to purify the nematode suspension (Jenkins, 1964). After multiple rounds of washing the purified nematode suspension in 0.1 m NaCl, nematodes were resuspended in sterilized MQ water. Juveniles were lysed in a standard nematode lysis buffer with proteinase K and beta-mercaptoethanol at 60°C for 1 hr as described in Holterman et al. (2006). The lysate was mixed with an equal volume of phenol/chloroform/isoamyl alcohol (25:24:1) (pH 8.0) following a standard DNA purification procedure, and finally, DNA was precipitated with isopropanol. After washing the DNA pellet with 70% ethanol several times, it was resuspended in 10 mm Tris-HCl (pH 8.0). G. pallida D383 DNA (10-20 mg) was sequenced using was isolated with a procedure similar to the one used for G. pallida, but DNA was precipitated using an ice-cold ethanol precipitation step (Jain et al., 2018). DNA fragments of <10 kb were depleted using a short read eliminator kit (Circulomics SS-100-121-01) and H. schachtii DNA (15 mg) was sequenced using Oxford Nanopore technology at NexOmics.

| Genome assemblies and synteny
For G. pallida D383, raw PacBio reads, and for H. schachtii IRS, Oxford Nanopore reads were corrected to, in essence, merge haplotypes using the correction mode in canu (Koren et al., 2017), by reducing the error rate to a maximum of 15% and the corrected coverage to a minimum of 200. Using wtdgb2 version 2.3 (Ruan & Li, 2020), multiple initial genome assemblies were generated based on the corrected Nanopore reads while manually refining the parameters minimal read length, k-mer size and minimal read depth. These parameters were optimized to generate an assembly close to the expected genome size of G. pallida and H. schachtii.
After optimization, for G. pallida, a minimum read length cut-off of 5000, minimal read depth of 6 and a k-mer size of 18 were used.
To generate the assembly of H. schachtii, a minimum read length cut-off of 6000, minimal read depth of 8 and a k-mer size of 21 were used. The remaining haplotigs were pruned from the assemblies using purge haplotigs version 1.0.4 (Roach et al., 2018). The contigs from the assemblies were then improved using finishersc version 2.1 (Lam et al., 2015) at default settings and scaffolded using sspace-longread version 1.1 (Boetzer & Pirovano, 2014 (Darling et al., 2004). The resulting alignments of regions larger than 1 kb and larger than 3 kb were visualized in circos version 0.69-9 (Krzywinski et al., 2009).
Completeness of the assemblies was assessed using busco version 5.2.2 using the eukaryota_odb10 database (Manni et al., 2021).
busco was run in genome mode using metaeuk version 5.34c21f2 as the gene predictor (Levy Karin et al., 2020).

| Generation of RNAseq data from various life stages
Stem cuttings of potato genotype SH (Bakker et al., 2003) were grown in vitro and inoculated with infective second-stage juveniles of G. pallida population D383. Plants were kept at 18°C in the dark and nematode-infected potato root segments were collected 3 and 6 days after infection (three biological replicates).
Samples were collected and snap-frozen in liquid nitrogen. RNA extraction was performed using the Maxwell 16 LEV-plant RNA kit (Promega) following the manufacturer's protocol. RNA degradation and contamination were monitored on a 1% agarose gel.
Purification was checked by using a NanoPhotometer spectrophotometer (IMPLEN). RNA integrity and quantification were assessed by using the RNA Nano 6000 Assay Kit of the Bioanalyzer 2100 system (Agilent Technologies). RNA sequencing was done at Novogene by using a Novaseq 5000 PE150 platform, providing at least 50 million clean paired-end reads of 150 bp per sample. See section 2.2 for data accessibility.

| Identification of effector homologues
Effector gene families were identified in the genomes based on the predicted genes by braker2. Homologues for Gr-1106/Hg-GLAND4 were identified using hmmsearch ( Hg-GLAND5 and Hg-GLAND13 were identified with blastp searches with GenBank sequences KJ825716 and KJ825724, respectively, maintaining thresholds at 35% identity, 50% query coverage and an E-value of 0.0001. Each effector homologue was tested for the presence of a signal peptide for secretion by phobius version 1.01 (Käll et al., 2007) and the presence of one or multiple DOG box motifs in the promoter region using a custom script (available on GitHub https://github.com/Joris vanst eenbr ugge/Gros_Gpal_Hsch).
For G. rostochiensis, effector annotations were used as described in Van Steenbrugge et al. (2021), except for CLE-like proteins. CLE variant Gros19_g16105.t1 was excluded because the gene model probably contains errors, and the exact location of this variant in the phylogenetic tree is therefore uncertain. Furthermore, the HMM scoring cut-off was lowered to 300 to include two additional potential CLE variants.

| Phylogeny
A multiple sequence alignment was generated for each effector gene family using muscle version 3.8.1551 based on gene coding sequences to infer the phylogeny of effector gene families between species. Next, phylogenetic trees were produced with raxml ver-

| Use of long read sequence technologies to generate novel reference genomes
The mapping of diversification patterns of effector families requires a high-quality reference genome with preferably a low number of scaffolds and a minimal gap length. Cotton et al. (2014) were the first to present a reference genome of the PCN species Globodera pallida. For our specific purpose (i.e., the generation of complete inventories of effector families), this reference genome was too fragmented, and the TA B L E 1 Comparative genome statistics of four cyst nematode genome assemblies. In bold, data from the current paper; data on Globodera pallida Lindley and G. rostochiensis Line 19 genomes were published by respectively Cotton et al. (2014)  total gap length was too large. PacBio long-read technology allowed us to generate a new reference genome from the G. pallida population D383 with a 42-fold reduction of scaffolds and a 21-fold reduction of the total gap length. As one consequence, the number of predicted genes increased from 16,403 to 18,813, and the level of completeness as estimated by busco increased by more than 10% (Table 1).
In addition, we assembled the genome sequence of the IRS population of the beet cyst nematode Heterodera schachtii. This allowed us to compare effector family diversification among the two PCN species, G.
pallida and G. rostochiensis, and establish the polarity of these contrasts by using H. schachtii as an outgroup (both Globodera and Heterodera belong to the family Heteroderidae). The current genome size, 190 Mb, is slightly above the genome size estimated by flow cytometry, 160-170 Mb (Siddique et al., 2021). Notably, both the predicted number of genes and transcripts were about 50% higher in H. schachtii than in the two Globodera species (Table 1)

| Effector family selection
In our comparison between the three cyst nematode species, we concentrated on effectors. Cyst nematodes were shown to harbour numerous effector families (Pogorelko et al., 2020). Here we concentrated on six effector families. For four of these families, one or more representatives are known to be involved in the suppression of plant innate immune system: SPRYSEC (Diaz-Granados et al., 2016;Mei et al., 2018), GLAND4 (also referred to as Gr-1106) (Barnes et al., 2018), GLAND5 (also referred to as G11A06) (Yang, Pan, et al., 2019) and VAP (Wilbers et al., 2018). CLE  is an intriguing effector family involved in feeding site induction, and the GLAND13  family is essential in the hydrolysis of plant sugars once they are taken up by the nematode.
The phylogenetic analysis yields a tree with a well-supported backbone ( Figure 3)

| GLAND5
With 13 homologues, the GLAND5 effector family (also referred to as G11A06) was significantly less diversified in Gr-Line19 than in

| VAPs (venom allergen-like proteins)
The levels of diversification in the VAP effector family were highly comparable between the three cyst nematode species. In Gp-D383, Hs-IRS and Gr-Line19, respectively, eight, eight and 10 VAP paralogues were identified. Phylogenetic analysis resulted in a tree with a well-supported backbone ( Figure 5). The tree contains three clusters

| Effects of DOG box dosage on SPRYSEC expression
Although DOG box motifs in the promoter regions of effector variants are present in many effector families, their presence is not a necessity for the functioning of an effector family. For example, none of the variants of the CLE family contained DOG box motifs, regardless of the species (Figure 6). Dorsal gland-expressed effectors can therefore be expressed and secreted without the presence of DOG box motifs. This is further illustrated in Figure 8a

F I G U R E 3
Phylogeny of GLAND 4 (equivalent to 1106, see, e.g., Noon et al., 2015) effector genes of Globodera pallida (population D383) (ochre), Globodera rostochiensis (Gr-Line19) (green) and Heterodera schachtii (population IRS) (purple). A multiple sequence alignment was made using muscle on the coding sequence. A phylogenetic tree was made using raxml using a GTRGAMMA model, validated by 100 bootstrap replicates. Bootstrap values <50% are indicated by a dash. GenBank IDs in lighter shades of ochre, green or purple are used to indicate effector variants lacking a signal peptide for secretion (R 2 = .67 and .62 respectively) between the DOG box dosage and expression levels based on RNA abundance is present for this family.
In G. pallida, in particular, high expression levels of SPRYSEC variants can be reached in the absence of DOG boxes in its promoter region ( Figure 8b). For H. schachtii, a slightly negative correlation was found (R 2 = −.11) between DOG box dosage and expression levels.

| DISCUSS ION
In our attempt to fundamentally understand the interaction between plant-parasitic nematodes and their hosts, the usefulness of high-quality reference genomes of these pathogens is vital. Given the enormous impact of PCNs in all major potato production regions of the world, it is not surprising that high priority was given to the sequencing of both the Globodera pallida (Cotton et al., 2014) and the Globodera rostochiensis (Eves-van den Akker et al., 2016) genome. This was done before long-read sequencing technologies became available. Although some research questions can be well addressed with these reference genomes, less fragmented genomes are needed for studying effector diversification. Therefore, a new reference genome was generated from G. pallida population D383. As compared to the G. pallida Lindley genome assembly, this resulted in a 42-fold reduction in the number of scaffolds and a 24-fold increase in N50. In the comparison of the effectoromes of the two PCN species, we included a newly generated genome of the Heterodera schachtii population IRS as an outgroup. Note that reference genomes from these obligatory sexually reproducing pathogens are actually population-based consensus genomes.
Long read sequencing technologies require DNA from tens of thousands of genetically nonidentical nematodes. While an individual of these diploid species could theoretically carry a maximum of two haplotypes per locus, a population has the potential to carry many more. It is essential to mine these haplotypes and assemble them into a single haploid assembly to generate a proper reference. This is not a trivial process and requires specialized bioinformatics software (Roach et al., 2018). As the sizes of the current genome assemblies are comparable to the genome sizes assessed by flow cytometry, and as the busco duplication scores are relatively low, we assume that the current genome assemblies are a reasonable reflection of their actual constitution.

| Effector diversification
In our analyses we concentrated on six selected effector families, and this selection included relatively widespread effector families such as CLE, GLAND13 and VAP, as well as families that appear to F I G U R E 4 Phylogeny of GLAND 5 (equivalent to G11A06, see, e.g., Noon et al., 2015) effector genes of Globodera pallida (population D383) (ochre), Globodera rostochiensis (Gr-Line19) (green) and Heterodera schachtii (population IRS) (purple). A multiple sequence alignment was made using muscle on the coding sequence. A phylogenetic tree was made using raxml using a GTRGAMMA model, validated by 100 bootstrap replicates. Bootstrap values <50% are indicated by a dash. GenBank IDs in lighter shades of ochre, green or purple are used to indicate effector variants lacking a signal peptide for secretion be cyst nematode lineage-specific such as SPRYSEC, GLAND4 and GLAND5. Although the protein architecture is distinct between lineages (see, e.g., Mitchum et al., 2012), the CLE family-a category of effectors involved in feeding site induction-were shown to be present also in root-knot and reniform nematodes (Rutter et al., 2014;Wubben et al., 2015). GLAND13 effectors, members of the glycosyl hydrolase family 32, were shown to be present in a range of root-knot and cyst nematodes species as well as in other plant-parasitic nematodes such as Nacobbus aberrans and Rotylenchus reniformis . The distribution of VAPs within the phylum Nematoda is even broader (Wilbers et al., 2018). VAPs were discovered in the animal parasite Ancylostoma caninum (Hawdon et al., 1996). They were later isolated from the root-knot nematode Meloidogyne incognita (Ding et al., 2000), and subsequently in a wide range of obligatory plant-parasitic nematodes including various cyst nematode species.
A number of VAP variants were shown to be implicated in the suppression of both PAMP-and effector-triggered immunity (e.g., Li et al., 2021) for the burrowing nematode Radopholus similis. Our effector family selection also included families that (so far) appear to be specific to the cyst nematode lineages. These include SPRYSEC (Diaz-Granados et al., 2016), GLAND4 (also referred to as Gr-1106) and GLAND5 (also referred to as G11A06). For all of these effector families, at least a subset was shown to be involved in repression of the host plant immune system.
While comparing the overall diversification patterns of the six effector families under investigation, striking differences were observed. In SPRYSEC, GLAND5 (G11A06) and GLAND13 (GH32 members), virtually all H. schachtii paralogues appeared to be phylogenetically separate from the G. pallida and G. rostochiensis effector family variants, while representatives from the two PCN species were present in mixed clusters. These results should be taken with some caution as the backbone resolution of these phylogenetic trees ranges from poor (SPRYSEC) to robust (GLAND5, GLAND13). These patterns suggest that SPRYSEC, GLAND5 and GLAND13 effectors started to diversify after the split between Heterodera and Globodera.
Effector families GLAND4 (Gr-1106) and CLE showed distinct diversification patterns; by far most paralogues are grouped in species-specific clusters. As both effector families show a reasonable backbone resolution, we hypothesize that these effector families might have diverged after the split between G. pallida and G. rostochiensis.
Phylogenetic analysis of the VAP effector family in the three cyst nematode species revealed an opposite pattern, with almost complete mixtures of representative paralogues from the individual species. VAPs constitute an exceptionally widespread effector family within the phylum Nematoda (Wilbers et al., 2018), and our results indicate diversification of this family before the split between the cyst nematode genera Globodera and Heterodera.

F I G U R E 5
Phylogeny of VAP (venom allergen-like protein, see, e.g., Wilbers et al., 2018) effector genes of Globodera pallida (population D383) (ochre), Globodera rostochiensis (Gr-Line19) (green) and Heterodera schachtii (population IRS) (purple). A multiple sequence alignment was made using muscle on the coding sequence. A phylogenetic tree was made using raxml using a GTRGAMMA model, validated by 100 bootstrap replicates. Bootstrap values <50% are indicated by a dash. GenBank IDs in lighter shades of ochre, green or purple are used to indicate effector variants lacking a signal peptide for secretion Further investigation is necessary to elucidate the function of DOG boxes in effector regulation.
In plant-pathogenic fungi, a few transcription factors have been identified that were shown to steer effector expression. SIX Gene Expression 1 (Sge1), a conserved member of the Gti1/Pac2 protein family, was instrumental in the regulation of effector repression in a range of fungal pathogens including Verticillium dahlia (Santhanam & Thomma, 2013), Zymoseptoria tritici (Mirzadi Gohari et al., 2014) and Fusarium oxysporum f. sp. cubense (Hou et al., 2018). Another example is AbPf2, a zinc cluster transcription factor from the necrotrophic plant pathogen Alternaria brassicicola. Via a loss of function approach, this transcription factor was shown to regulate the expression of eight putative effectors (Cho et al., 2013

| CON CLUS IONS
The PCN Globodera pallida and its sibling species G. rostochiensis coevolved with potato in the Andes in South America. These pathogens have been introduced unintentionally in all major potato-growing regions in the world. Currently, PCNs are the most harmful pathogens in potato production systems, and as a result of this extreme anthropogenic range expansion potatoes worldwide cannot be grown without adequate PCN management. For both G. pallida and G. rostochiensis, host plant resistance are the best way to control these soil F I G U R E 6 Phylogeny of CLE (CLAVATA3/ESR-related peptides, see, e.g., Lu et al., 2009) effector genes of Globodera pallida (population D383) (ochre), Globodera rostochiensis (Gr-Line19) (green) and Heterodera schachtii (population IRS) (purple). A multiple sequence alignment was made using muscle on the coding sequence. A phylogenetic tree was made using raxml using a GTRGAMMA model, validated by 100 bootstrap replicates. Bootstrap values <50% are indicated by a dash. GenBank IDs in lighter shades of ochre, green or purple are used to indicate effector variants lacking a signal peptide for secretion F I G U R E 7 Phylogeny of GLAND13 (invertases, see, e.g., Danchin et al., 2016) effector genes of Globodera pallida (population D383) (ochre), Globodera rostochiensis (Gr-Line19) (green) and Heterodera schachtii (population IRS) (purple). A multiple sequence alignment was made using muscle on the coding sequence. A phylogenetic tree was made using raxml using a GTRGAMMA model, validated by 100 bootstrap replicates. Bootstrap values <50% are indicated by a dash. GenBank IDs in lighter shades of ochre, green or purple are used to indicate effector variants lacking a signal peptide for secretion pathogens. However, their effectiveness depends on proper matching between the genetic constitution of the PCN field population and the set of host pant resistances present in modern potato varieties. Niere et al. (2014) reported G. pallida populations that could no longer be controlled by any of the currently used potato cultivars.
This, combined with inherent imperfections of the current G. pallida pathotyping system (e.g., Phillips & Trudgill, 1983), underlines the need for a new pathotyping system. Such a system will be based on distinctive effector variations present in any given PCN population.
The availability of a high-quality reference genome is a prerequisite for the development of such a system. We have demonstrated that the quality of the G. pallida genome presented in this paper allows for the mapping of complete effector families. With this resource, resequencing data from pathotypically diverse G. pallida populations will provide insight into the ecological diversification of this extreme range expander, and enable the development of a new pathotyping system that will facilitate the targeted and durable use of precious host plant resistances against this notorious plant pathogen.

CO N FLI C T O F I NTE R E S T
All authors declare that they have no conflict of interest.