Molecular signatures of selection associated with host plant differences in Pieris butterflies

Adaptive traits that enable organisms to conquer novel niches and experience subsequent diversification are ecologically and evolutionarily important. The larvae of Pieris butterflies express nitrile‐specifier proteins (NSPs), a key innovation for overcoming the glucosinolate (GLS)‐myrosinase‐based defence system of their Brassicales host plants. Nitrile‐specifier proteins are a member of the NSP‐like gene family, which includes the major allergen (MA) protein, a paralog of NSP with a GLS‐disarming function, and a single domain major allergen (SDMA) protein, whose function is unknown. The arms‐race between GLS‐based defences and the NSP‐like gene family is suggested to mediate diversification in both Pierid butterflies and Brassicales plants. Here, we tested whether the expected strong selection on NSP‐like gene family correlates with shifts in host plant spectra among Pierid butterflies. We combined feeding experiments using 25 Brassicaceae plants and five Pieris species with larval transcriptome data to investigate the patterns of selection acting on NSP‐like gene family members. Although we observed significantly elevated nonsynonymous to synonymous substitution rate ratios in NSPs on branches associated with changes in patterns of host plant usage, no such pattern was observed in MAs or SDMAs. Furthermore, we found evidence for positive selection of NSP at a phylogenetic branch which reflects different host plant spectra. Our data indicate that the NSP‐related gene members have evolved differently: NSPs have accumulated more amino acid changes in response to shifting preferences for host plants, whereas MAs and SDMAs appear to be more conserved. Further detailed functional assays of these genes would provide important insights to understand their role in the chemical arms‐race between Pieris butterflies and their Brassicales host plants.

Previous studies indicated that the co-evolutionary diversification of Brassicales plants and Pierid butterflies was mediated by the chemical arms-race between the glucosinolate-myrosinase defence system and members of the NSP-like gene family (Edger et al., 2015).
Past increases of GLS complexity in Brassicales were followed by frequent gene birth-death events of NSP-like gene family members in Pierid butterflies. This suggests that members of the NSP-like gene family would potentially be under strong selection pressure, were Pieridae butterflies to expand or shift their host plants. Such a scenario is supported by recent findings of signatures of positive selection in partial NSP sequences of a pair of Pieris butterflies in comparison with the signatures of 70 randomly selected genes (Heidel-Fischer, Vogel, Heckel, & Wheat, 2010). However, the evolutionary forces acting on all NSP-like gene family members, especially when considering the associated host plant spectrum, remain unknown.
Besides NSP-like gene family members, a number of detoxification-related genes are either hypothesized or were shown to be directly involved in overcoming chemical challenges of host plants, such as glutathione S-transferases, UDP-glycosyltransferases or cytochrome P450 enzymes (Feyereisen, 2012;Krempl et al., 2016;Simon et al., 2015). The expression patterns of numerous putative detoxification-related genes in larvae feeding on different host plants have been broadly tested in both specialist and generalist herbivores (Celorio-Mancera et al., 2016;Heidel-Fischer et al., 2009;Mao et al., 2007;Nallu et al., 2018). However, in most of these cases there is a lack of data pertaining to field-observed host plant associations (or associations of the larvae with specific groups of secondary metabolites). We thus not only need reliable host plant data but also more in-depth analyses of enzymatic activities and patterns of selection of these host plant chemistry-induced genes.
Here, we focus on five Japanese butterfly species (Pieris napi, P. melete, P. rapae, P. brassicae and P. canidia) in the genus Pieris, which has both NSP and MA genes and feed on Brassicaceae plants F I G U R E 1 Field observations of primary habitat and larval host plant spectra of five Pieris butterflies in Japan. Pieris napi and Pieris melete tend to be found in montane habitat and rely mostly on Brassicaceae plants in forests; these include Arabis, Arabidopsis or Turritis. Pieris rapae and Pieris brassicae are known as Brassica crop pests. In Japan, Pieris canidia can only be found in a restricted area and uses Cardamine or Lepidium as host plants [Colour figure can be viewed at wileyonlinelibrary.com] with the highest GLS diversity among the Brassicales. The five Pieris species have different host plant spectra according to field observations (Figure 1), with P. napi and P. melete frequently using wild Brassicaceae plants (such as Arabis or Arabidopsis), whereas P. rapae and P. brassicae tend to feed on Brassicaceae crops and are known as major pests (Benson, Pasquale, Van Driesche, & Elkinton, 2003;Kitahara, 2016;Ohsaki & Sato, 1994;Ueno, 1997). In contrast, in Japan, P. canidia can be found only in the southern islands (Yonaguni Island, Okinawa), relying on the limited number of host plants, such as Cardamine or Lepidium, in their habitat range. We aim to identify patterns of selection of NSP-like gene family members correlating with different host plant spectra among the five Pieris species used in this study (Figure 1).
To this end, we conducted feeding experiments with 25 Brassicaceae plants to acquire patterns of host plant utilization in Pieris species. With larval transcriptome (RNA-seq) data from the five Pieris species, we analysed the divergence in amino acid sequences of orthologs based on nonsynonymous (dN) and synonymous substitution (dS) rates. We investigated signatures of selection on members of the NSP-like gene family compared with other larvalexpressed orthologs. We also performed tests to detect evidence of positive selection in NSP-like gene family members. Additionally, we searched for potential genes more generally related to host plant detoxification with signatures of selection which correlate with the observed larval performance based on gene ontology (GO) and dN/dS analyses. By combining these approaches, we were able to investigate whether there are correlations between host plant spectra and signatures of selection on ecologically important NSP-like gene family members or other detoxification-related genes in Pieris (Figure 2).
The obtained results provide important insights into the evolution of adaptive key innovations in Pieris butterflies.

| Feeding experiments
We used four Pieris butterfly species for the feeding assay, leaving out P. canidia, which is endemic and rather rare in Japan. We collected 7-10 female butterflies of three Pieris butterfly species (P. napi, P. melete, P. rapae) from wild populations in Chiba and Hokkaido, Japan. Most wild-caught female butterflies were already fertilized. We released the female butterflies into cages containing cabbage (Brassica oleracea var. capitata) or Cardamine leucantha under high-intensity light conditions and waited for eggs to be laid.
For P. brassicae, final-instar larvae were caught in the wild (Hokkaido, Japan), fed on cabbage and reared to the adult stage. After eclosion, 10 female butterflies were hand-paired with males and eggs were collected as they were from the other species. Eggs of the four Pieris butterfly species were incubated at 25°C until they hatched.
The seeds of 19 Brassicales plant species were collected from the F I G U R E 2 Analysis pipeline used to compare dN/dS ratios of NSP-like gene family members with all observed ortholog sets from the reciprocal best hit using BLAST across five Pieris butterflies. Signatures of selection on NSP-like gene family members were investigated in each phylogenetic branch and compared with the results of the feeding assay wild and the others were acquired from commercial suppliers (Table   S1). We grew the plants in the greenhouse at 25°C, with 60% relative humidity and L16:D8. Plants were watered and fertilized every week with a 2,000× diluted solution of Hyponex (N:P:K = 6:10:5; Hyponex, Osaka, Japan). After 2 months of cultivation, plants were used for the feeding experiments.
Neonate larvae were collected within 12 hr after they hatched for the feeding experiment. We transferred three neonate larvae to each of two plants per plant species using a soft-haired brush (n = 6).
To minimize changes in the condition of the experimental plants, experimental trials were carried out within 5 days for all four Pieris species. We conducted feeding experiments under the same temperature and light conditions used for plant growth. We measured the weight of each larva individually (within 0.1 mg) after 120 hr of feeding. Since there was no significant difference of larval performance between the two plants replicates (ANOVA; p ≥ .05), we used the average weight of larval individuals from each plant species as an index of the performance of each Pieris butterfly species.
Larval weights were standardized as z-scores to enable comparison between species. We calculated the mean scores of each plant treatment and used these for the comparative analysis. We conducted Pearson's correlation test and hierarchical clustering analysis to assess differences in larval performances among the four Pieris species. The possible clustering was evaluated with the gap statistics (Tibshirani, Walther, & Hastie, 2001). All of these analyses were performed on r studio ver. 1.1.453 (RStudioTeam, 2016).

| RNA sequencing
From four Pieris butterfly species (P. napi, P. melete, P. rapae and P. brassicae), excluding P. canidia, we collected larvae that we used for the feeding experiments for transcriptome analysis (Figures 1 and 2).
We used larvae that fed on Arabidopsis kamchatica and Cardamine occulta as representatives. The larvae were flash-frozen in liquid nitrogen and stored at −80°C until RNA extraction. We selected a single representative larva for each of the four Pieris and plant species combinations, and RNA was extracted using the RNeasy Mini

| De novo assembly, searching for reciprocal best hits (RBHs) using BLAST
Acquired reads of RNA-seq data were pooled for each species after filtering out bad quality reads by trimmomatic with the following options (LEADING:10 TRAILING:10 SLIDINGWINDOW:4:20 MINLEN:40) (Bolger, Lohse, & Usadel, 2014). The quality of reads was checked by FastQC (Andrews, 2010). Pooled reads were de novo assembled by Trinity ver. 2.0.6 (Grabherr et al., 2011). We used TransDecoder (http://trans decod er.github.io/) to predict open reading frames (ORFs) from the assembled contigs and subsequently looked for reciprocal best hits (RBHs) using BLAST alignment methods to analyse amino acid sequences (longer than 100 amino acids) predicted by TransDecoder (Camacho et al., 2009;Cock, Chilton, Grüning, Johnson, & Soranzo, 2015). We used RBH BLAST software with default settings (minimum percentage identity for BLAST matches = 0.7, minimum percentage query coverage for BLAST matches = 0.5) on all possible species pairs (10 pairs) and subsequently extracted P. rapae orthologs from this RBH result and ran blastp on the amino acid sequences against a P. rapae protein database to confirm the ORF prediction from TransDecoder.
Orthologs in the RBH result without any BLAST hits to the P. rapae protein database (Shen et al., 2016) were removed since these amino acid sequences may have resulted from wrong ORF predictions by TransDecoder. We used PRANK to conduct codon-based alignment of each ortholog set acquired from the RBH result (Loytynoja & Goldman, 2005). Since the P. canidia sample was obtained from dissected gut tissue only, the entire RBH result was likely biased to gutexpressed proteins, that is the interface between larvae and their plant diet.

| Phylogenetic tree construction
We reconstructed an unrooted phylogeny of the five Pieris species using the transcriptome data by concatenating all aligned orthologous nucleotide sequences into one sequence for each species, generating an maximum-likelihood (ML) phylogenetic tree by IQ tree (Nguyen, Schmidt, Von Haeseler, & Minh, 2015) after removing gaps with TrimAl (2,063,074 bp remaining) (Capella-Gutiérrez, Silla-Martínez, & Gabaldón, 2009). We used the GTR + gamma substitution model and set ultrafast bootstrap approximation iterations as 1,000, using -bnni options to construct a phylogeny of the five Pieris species (Hoang, Chernomor, Von Haeseler, Minh, & Vinh, 2018).

| Comparing patterns of divergence of NSP-like gene family members with other Pieris species orthologs
We used the acquired unrooted tree for estimating dN/dS ratios of all the orthologs at each branch using PAML 4.8 (Yang, 2007). We used runmode = 0, model = 1 and NSsites = 0 option in codeml implemented in PAML and estimated dN/dS ratios using the ML method.
The estimated dN/dS values of NSP-like gene family members were compared with the entire dN/dS distributions of all ortholog sets in each phylogenetic branch. We discarded orthologs which had estimated dS below 0.01 from this analysis, since too low dS values can cause unreliable dN/dS value estimation (Villanueva-Cañas, Laurie, & Alba, 2013).

| Tests for positive selection on NSP-like gene family members
We used the branch-site model test (Zhang, Nielsen, & Yang, 2005) to identify cases of positive selection on specific sites of NSP-like gene family members at a specific branch. We prepared ML molecular phylogeny of a combined ML gene tree of NSP and MA from our RNA-seq data with additional sequences of MAs from Anthocharis spp. and Pontia spp. (Accession nos: EU137117.1, EU137133.1, EU137132.1) using IQtree. Regarding SDMA, we added SDMA sequences from A. cardamines, Eucheira socialis, Dixeia pigea, Colias eurytheme, P. xylostella (Accession nos: EU137118.1, EU137122.1, EU137121.1, EU137119.1, EU137131.1) for generating an SDMA gene tree. We tested all the branches in Pieris using codeml model 2 with NSsites = 2 option and ran an alternative model: varied dN/dS ratios across sites as well as lineages were allowed (fixed_omega = 0), and null model: fixed dN/dS (fixed_omega = 1). We conducted a likelihood-ratio test (LRT) with the chi-square distribution to evaluate significant differences between the alternative and null models.
Acquired p values were corrected with false discovery rates (FDRs) in each analysis. Signs of positive selection on each site were identified by the Bayes empirical Bayes (BEB) analysis (.90 cut-offs).
Since the branch-site model in codeml can cause false positives in case of multinucleotide mutations (MNMs, Venkat, Hahn, & Thornton, 2018), we also performed more conservative branch-site model tests covering MNM situations (BS + MNM). In BS + MNM, the additional parameter δ represents the relative instantaneous rate of double mutations compared to that of single mutations. We ran null models and alternative models in BS + MNM and conducted LRTs to evaluate significance.

| GO categories with elevated dN/dS values at the branch highlighting host plant differences in Pieris
We used P. rapae contigs from the RBH result for GO annotation and ran these genes against the NCBI nonredundant protein sequence database in Galaxy (Blastx, e-value = 10e−4). We subsequently used the Blast2GO platform to load the resulting Blast-xml file and to conduct mapping and annotation steps based on the BLAST result for acquiring GO annotations for each contig (Götz et al., 2008).
To test significantly elevated dN/dS ratios among genes associated F I G U R E 3 Feeding assays of four Pieris butterfly larvae on 25 different Brassicaceae plants (n = 6). The four Pieris butterfly species generally grew better on Cardamine occulta but could not use B. incana or E. cheiranthoides as optimal hosts. Gap statistic (inbox) was highest at clustering number 2, suggesting overall larval performance patterns of the four Pieris species could be best clustered in two groups. The hierarchical clustering analysis suggested that the two groups are as follows: Pieris napi-Pieris melete and Pieris rapae-Pieris brassicae shown here as larval performance similarity cladogram. The largest performance differences were observed on Thlaspi arvense on which Pieris rapae and Pieris brassicae larvae grew better than the other two species [Colour figure can be viewed at wileyonlinelibrary  with specific GO terms, we selected those that contained at least 20 orthologs and tested their dN/dS distributions with those of all the observed orthologs (background) using a Wilcoxon test. We performed this analysis based on the estimated dN/dS at the two internal branches: (P. melete, P. napi) (P. brassicae, P. rapae, P. canidia) branch, which highlighted the larval performance differences, and (P. melete, P. napi, P. brassicae) (P. rapae, P. canidia), which does not explain differential host plant use. We compared the GO categories with elevated dN/dS between these two branches to potentially identify genes with signatures of selection which correlate with the observed larval performance. All statistical analyses were performed in r studio ver. 1.1.453, and p values acquired were adjusted by FDR (RStudioTeam, 2016).

| Performance of four Pieris butterflies on 25 Brassicaceae plants
We obtained larval weights for four Pieris butterfly species (P. napi, P. melete, P. rapae and P. brassicae) feeding on 25 different Brassicaceae plant species (Figure 3). The gap statistics for the given number of clusters were as follows: Gap 1 = 0.080, Gap 2 = 0.135, Gap 3 = 0.119, Gap 4 = 0.123 ( Figure 3). Our analysis showed that larval performance of the four Pieris species could be best clustered into two groups: the P. napi-P. melete group and the P. rapae-P. brassicae group, which was also expected from field observations. The largest performance differences were observed on Thlaspi arvense, on which P. rapae and P. brassicae performed better than P. napi and P. melete ( Figure 3). However, since each of the four species also has their species-specific host spectra but also has shared host (or nonhost) plants, Gap statistics of other cluster numbers were also higher (e.g., Gap 4 ).

| RNA-seq, reciprocal best hit (RBH) BLAST analysis of Pieris butterflies
We obtained 32-40 million Illumina 100 bp pair-end reads for the four species (P. napi, P. melete, P. rapae and P. brassicae) and F I G U R E 4 (a) Unrooted species phylogeny of five Pieris species used in this study. The tree was constructed based on all the aligned orthologs from reciprocal best blast hit analyses (2,063,074 bp total). Statistical supports from bootstrapping tests are shown at each node. The branch that reflects host plant difference is marked with colour. Each branch has estimated dN/dS (=ω) values of NSP-like gene family members (ωN: ωNSP, ωM: ωMA, ωS: ωSDMA). Values with "*" mean that the value is located in the top 5% of the entire dN/dS distribution. Values with " †" showed too low estimated dS (dS < 0.01) and were removed from the analysis.

| Phylogeny of Pieris and dN/dS ratios of NSPlike gene family members across Pieris branches
The unrooted tree from all the aligned orthologs of Pieris species displays solid statistical support for all major nodes (Figure 4a), with P. napi and P. melete forming a distinct clade while a different clade consists of P. rapae and P. canidia. We estimated dN/dS ratios for all ortholog sets at all phylogenetic branches with PAML 4.8 (Yang, 2007). We found NSP had a significantly elevated dN/ dS value at the (P. melete, P. napi) (P. brassicae, P. rapae, P. canidia) branch (dN/dS = 0.578), the branch that is consistent with major host plant differences (Figures 3 and 4a). The complete distribution of estimated dN/dS values at this branch is shown in Figure 4b (mean dN/dS = 0.105). We also observed that (P.

| Signatures of clade-specific positive selection on NSP-like gene family members correlating with larval performance differences
The ML gene trees of NSP-like gene family members are shown in  Table 1). The BEB analysis suggested that one codon site had signs of positive selection in NSPs at branch1 (Table 1, Table 1). Regarding MA and SDMA, we found no sign of positive selection even at the branches at which we found higher dN/dS values of these genes compared to other orthologs (Figure 5a,b).

| GO terms with elevated dN/dS ratios associated with differential host plant use
After GO annotations of all P. rapae RBH contigs, we obtained 1,457 GO terms in our data sets. These included 680 terms related to biological process, 540 to molecular function and 237 to cellular component GOs. We conducted the Wilcoxon test for the GO terms that had more than 20 assigned orthologs. Based on the estimated dN/dS values at (P. melete, P. napi) (P. brassicae, P. rapae, P. canidia) branch, which highlights the host plant differences (Figure 4a), we found that one biological process-"proteolysis"-two processes associated with molecular function-"hydrolase activity" and "serine-type endopeptidase activity"-and two cellular component terms-"extracellular region" and "membrane"-had significantly elevated dN/dS values when compared to the entire dN/dS distribution of all contigs ( Figure 6, Table 2). This test also showed that 9 GO terms had significantly lower dN/dS values in the three categories at this branch (Table 2). For the other internal branch (P. melete, P. napi, P. brassicae) (P. rapae, P. canidia), which does not reflect differential host plant use (Figure 4a)

| D ISCUSS I ON
Focusing on five Japanese Pieris butterflies, we tested host plant spectra and investigated signatures of selection on NSP-like genes, which are a key innovation of these butterflies to overcome the GLS defence system of their Brassicales host plants (Edger et al., 2015;Wheat et al., 2007). We acquired RBH ortholog sets expressed in larvae of the five Pieris species based on transcriptome data and compared the calculated dN/dS ratios of each ortholog or performed evolutionary tests in order to investigate the effect of evolutionary forces on NSP-like gene family members. We also combined ecological approaches for acquiring performance data on larvae of Pieris species by conducting a feeding experiment using 25 Brassicaceae plant species. These approaches yielded four major findings. First, we observed that Pieris species showed phylogenetically conserved differences in larval host performance. Second, we observed that NSP had significantly elevated dN/dS ratios compared to other genes at some phylogenetic branches; however, its sister gene MA did not show this trend. Third, evidence of positive selection on NSPs was observed at a phylogenetic branch which showed differences in larval performance according to our feeding assays, but no evidence of positive selection was found in MA or SDMA. Last, we observed significantly elevated dN/dS ratios in GO terms which are associated with potential detoxification-related genes and could correlate with larval performance differences at the branch.
According to our feeding experiments with four Japanese Pieris species (P. napi, melete, rapae and brassicae) and 25 Brassicaceae plant species, P. napi and P. melete larvae performed similarly, as did P. rapae and P. brassicae larvae (Figure 3). Observations in the field suggest that these four Pieris species have slightly different host preferences: P. napi and P. melete feed on wild and montane Brassicaceae plants, such as Arabis or Turritis, and P. rapae and P. brassicae use Brassicaceae crops more often than the other two species (Figure 1) (Harvey, Poelman, & Gols, 2010;Ohsaki & Sato, 1994). In addition, feeding assays also showed that P. napi and P. melete have similar larval performance trends on a set of Brassicaceae plants compared to P. rapae, supporting our results (Okamura, Tsuzuki, et al., 2019b). Although the clustering analysis did not fully support this grouping (Gap4 was also higher), this was expected since each Pieris species can also have its own, discrete host plant spectrum but at the same time are also known to share some host or nonhost plant species. The main objective of this larval performance analysis was to see which species tend to perform similar in more controlled feeding assays and how these results correlate with field observation. Thus, our feeding assay results confirm that P. napi and P. melete have more similar host plant spectra as do P. rapae and P. brassicae (Figure 3).
Overall, phylogenetic analysis showed that the species phylogeny seemed to correspond with larval performance (Figures 3 and 4), suggesting that the larval host preferences of the four Pieris butterflies are phylogenetically conserved. In this study, we did not perform any Comparing dN/dS ratios of each ortholog at all branches, we found that NSPs had higher dN/dS values at two branches as ranked in the top 5% among all tested orthologs (Figure 4a). Although we filtered out a number of genes by RBH processes and therefore compared only a subset of the entire orthologs, our findings suggest that NSPs show evidence for positive selection-or, strongly relaxed purifying selection-among the five Pieris butterfly species. Interestingly, we also found that MAs had lower dN/dS values compared to NSPs (Figures 4 and 5), and their dN/dS values did not reach the top 5% among all tested orthologs, suggesting that in this genus MAs are under stronger purifying selection than are NSPs. NSPs and MAs are known as paralogs, and only NSP was confirmed to have GLS-disarming activity in Pieris. However, MAs also disarm GLSs in another Brassicaceae-feeding Pierid genus, Anthocharis, which has only MAs (Edger et al., 2015); this overlap strongly suggests that in Pieris MAs act like NSPs. Our results show that selection on these two paralogous genes, both of which have a similar repeat domain structure and can potentially disarm GLSs, can differ strikingly. This could imply that these paralogs have been differentially (sub-)functionalized in Pieris, where NSPs have more derived functions, whereas MAs have more conserved functions. Finally, for SDMA, we also observed elevated dN/dS ratios at one branch (P. rapae branch). Expressed in the gut, SDMAs are known to be found in all Lepidoptera, supporting the hypothesis that their function is related to digestion and not to disarming GLS (Fischer et al., 2008;Randall et al., 2013). However, we still lack reliable information about their role in the Lepidopteran gut environment. Focusing on the SDMA branch with elevated dN/dS ratios might provide additional information to understand its function.
Using the branch-site test implemented in both PAML and BS + MNM, we detected evidence of positive selection only in NSP at branch1 and branch3 in the ML gene tree (Figure 5a). These branches highlight and support the results of our feeding experiment, in which we found that the P. napi and P. melete clade had different host preferences from P. rapae and P. brassicae (Figures 2 and 4). Interestingly, branch2, which did not reflect the result of our feeding assays, also had elevated dN/dS ratios of NSP compared to other orthologs  Significantly elevated dN/dS values were observed in "proteolysis" from biological process (red), and "hydrolase activity" and "serine-type endopeptidase activity" from molecular function (blue), and "extracellular region" and "membrane" from cellular component (yellow), as compared to the entire distribution of all the observed contigs. Comparisons with other enriched GO terms are shown in Table 2   Besides individual NSP-like gene family members, elevated dN/dS values were also more broadly observed at the two internal branches of the unrooted Pieris phylogeny (Figure 4a): the (P. melete, P. napi) (P. brassicae, P. rapae, P. canidia) branch highlighting differential host plant use, and the (P. melete, P. napi, P. brassicae) (P. rapae, P. canidia) branch which does not reflect host plant differences. Surprisingly, we could only find GO terms with elevated dN/dS at the branch highlighting differential host plant use, including "proteolysis" (biological process); "serine-type endopeptidase activity" and "hydrolase activity" (molecular function); "extracellular region" and "membrane" (cellular component). These GO categories with elevated dN/dS values were broadly consistent with potential candidates of positive selection or relaxed purifying selection along with differential host plant use in herbivorous insects in general.
In Lepidopteran larvae, most of the digestive enzymes are involved in proteolysis (Simon et al., 2015) and several classes of digestive enzymes are necessary for insect herbivores to acquire essential nutrients in appropriate amounts (Broadway, 1989). In Pieris, these proteolytic activities are dominated by serine endopeptidases (Broadway, 1996). Since plants also have varied species-specific protease inhibitors to inhibit protease activity in herbivores, herbivores need to have evolved inhibitor-resistant proteinases as a counter adaptation (Bolter & Jongsma, 1997 (Simon et al., 2015). In addition, several detoxification-related proteins, including NSP and MA, are secreted and thus display extracellular localization. Therefore, although the GO category "extracellular region" appears to be a very general term, the observed elevated dN/dS values for these genes would also be consistent with their potential role in interactions with host plant-derived compounds, including complex polysaccharides and proteins, but also toxic metabolites. Previous research has uncovered differential regulation of genes associated with this GO term in several herbivore species responding to different host plants (Schweizer, Heidel-Fischer, Vogel, & Reymond, 2017). To uncover the co-evolutionary diversification of plants and herbivores, it is important to understand the evolutionary interactions between all involved partners. We found evidence for positive selection on NSPs in Pieris, suggesting that the evolution of host plant adaptive genes is correlated with patterns of host plant usage in this butterfly genus. Moreover, we also observed that MAs, which are paralogs of NSPs, are subject to more strict purifying selection than NSPs. Our findings combine results from genetic and ecological assays to focus on how the evolution of these two paralogous genes may affect the arms-race between Brassicales and Pieris butterflies and their consequent diversification. Functional assays focusing on selected sites will increase our understanding of the evolution and functional differentiation of NSPs and MAs and how Pieris butterflies adapted evolutionarily to diverse glucosinolates in their host plants.

ACK N OWLED G EM ENTS
We are grateful to Takashi Tsuchimatsu (Chiba University) for useful discussions and comments on this study. We thank Emily Wheeler, Boston, for editorial assistance. We also thank Itsuzai Aoki for his help in P. canidia field collection. This work was supported by Note: ALL: all the orthologs with assigned GO term. N: number of orthologs in the GO term. p values are adjusted with false discovery rates. GO terms with elevated dN/dS values are in bold. FDR adjusted p value: "*" < .05, "**" < .01, "***" < .001.
TA B L E 2 (Continued) a Grant-in-Aid for Scientific Research from the Japanese Society for the Promotion of Science (15J00320 to Y.O.) and partially by Max-Planck-Gesellschaft.

DATA AVA I L A B I L I T Y S TAT E M E N T
The RNA-seq short read data have been deposited in the EBI short read archive (SRA) with the following sample Accession nos: