•Kunitz protease inhibitors (KPIs) feature prominently in poplar defense responses against insects. The increasing availability of genomics resources enabled a comprehensive analysis of the poplar (p)KPI family.
•Using genome analysis, expressed sequence tag (EST) mining and full-length (FL)cDNA cloning we established an inventory and phylogeny of pKPIs. Microarray and real-time PCR analyses were used to profile pKPI gene expression following real or simulated insect attack. Proteomics of insect midgut content was used to monitor stability of pKPI protein.
•We identified 31 pKPIs in the genome and validated gene models by EST mining and cloning of 41 unique FLcDNAs. Genome organization of the pKPI family, with six poplar-specific subfamilies, suggests that tandem duplications have played a major role in its expansion. pKPIs are expressed throughout the plant and many are strongly induced by insect attack, although insect-specific signals seem initially to suppress the tree pKPI response. We found substantial peptide coverage for a potentially intact pKPI protein in insect midgut after eating poplar leaves.
•These results highlight the complexity of an important defense gene family in poplar with regard to gene family size, differential constitutive and insect-induced gene expression, and resilience of at least one pKPI protein to digestion by herbivores.
Kunitz-type PIs (KPIs), first isolated from soybean (Glycine max) (Kunitz, 1945), are small proteins (c. 20–25 kDa) that possess a β-trefoil fold with 10–12 antiparallel β-strands connected by long loops (Song & Suh, 1998). A reactive region on one of these loops protrudes from the globular KPI protein and facilitates binding and competitive inhibition at the active site of serine proteases (Bode & Huber, 1992). Most KPIs have four conserved cysteine residues forming two disulfide bridges, although KPIs with only one or no disulfide bridges exist (Cavalcanti et al., 2002; Araújo et al., 2005; Macedo et al., 2007). The highly conserved disulfide bridge closest to the N-terminus supports the reactive loop region and seems to be necessary for KPI activity (Dibella & Liener, 1969; Cavalcanti et al., 2002).
Kunitz-type PIs are a prominent feature of local and systemic defense responses in poplars (Populus spp.) induced by insect feeding or mechanical wounding (Bradshaw et al., 1990; Haruta et al., 2001; Christopher et al., 2004; Ralph et al., 2006; Philippe & Bohlmann, 2007), and several stress-induced poplar KPI (pKPI) genes have been cloned. Evidence for a defense function of pKPIs has been obtained with transgenic tobacco (Nicotiana benthamiana) plants that produce pKPI proteins and negatively affect growth of tobacco budworm (Heliothis virescens) (Lawrence & Novak, 2001). Recent work also demonstrated that pKPIs inhibit proteases in vitro and in forest tent caterpillar (FTC, Malacosoma disstria) gut extracts (Major & Constabel, 2008). In addition, microarray profiling showed strong upregulation of many pKPI genes in leaves following insect feeding (Ralph et al., 2006) or simulated insect damage (Major & Constabel, 2006). The pKPI gene family appears to be under strong selective pressure and rapidly evolving (Miranda et al., 2004; Ingvarsson, 2005; Talyzina & Ingvarsson, 2006). However, a complete genome sequence and gene expression analysis of the pKPI family has not yet been reported. Populus trichocarpa is the first KPI producing tree species for which a genome sequence is available (Tuskan et al., 2006) providing a unique opportunity to explore genome-wide evolution of the KPI gene family.
Here we describe a comprehensive genome- and transcriptome-wide analysis of the pKPI gene family across multiple poplar species using a combination of data mining of the P. trichocarpa genome sequence (Tuskan et al., 2006), expressed sequence tags (ESTs), and extensive new full-length (FL)cDNA collections. By examining the organization of KPI genes in the poplar genome, we highlight gene duplications and expansion of the pKPI family. Patterns of constitutive and stress-inducible KPI gene expression are revealed by cDNA microarray and quantitative real-time PCR (qRT-PCR) analysis. We demonstrate that at least one pKPI protein is stable and resists digestion within the gut of the lepidopteran defoliating herbivore Choristoneura rosaceana.
Materials and Methods
Plant and insect materials
Although multiple poplar genotypes were used for FLcDNA cloning (see Supporting Information Table S1), experiments for transcriptome and protein analysis were done with hybrid poplar (P. trichocarpa × deltoides, H11-11). Saplings were propagated and maintained in the greenhouse as described in Ralph et al. (2006). Tissues for analysis of gene expression were collected from trees 150–200 cm high. For analysis of constitutive gene expression, juvenile leaves (leaf plastochron index LPI 0-5; Larson & Isebrands, 1971), intermediate leaves (LPI 6-9), mature leaves (LPI 9+), leaf petioles, bark, phloem, xylem, and primary and secondary roots were collected from untreated trees. Rearing conditions for forest tent caterpillar (FTC, Malacosoma disstria Hübner) larvae were as described in Ralph et al. (2006). For collection of FTC oral secretions (OS; regurgitated content of the foregut), 3rd and 4th instar FTC were fed on poplar saplings for 1 d and OS was collected under mild vacuum through a 50 μl microcapillary into a 1.5 ml microcentrifuge tube. The FTCs were gently squeezed behind the head to induce release of OS, which was stored at −80°C. Eggs for oblique-banded leaf rollers (LR, C. rosaceana Harris) were from the Pacific Agri-Food Research Centre, Summerland, BC, Canada. The LRs were reared on artificial diet (multipurpose lepidopteran diet; Bio-Serv, Frenchtown, NJ, USA) containing 0.36% choline chloride and 500 mg l−1 Oreomycin (Bio-Serv) in a growth chamber with an 18 h : 6 h and 23°C : 20°C day–night cycle.
A tblastn search of the Treenomix poplar EST and FLcDNA database (Ralph et al., 2006, 2008) using plant KPI protein sequences available from GenBank identified 365 putative pKPI ESTs. CAP3 sequence assembly (Huang & Madan, 1999) was used to group ESTs into singletons and contigs (40 bp overlap, 95% identity). The corresponding pKPI cDNA clones were identified in our cDNA library glycerol stocks, insert sizes were determined and inserts sequenced to high accuracy to capture all available unique FLcDNA sequences.
Genome and phylogenetic analyses
Using blastp analyses of the 45 555 protein-coding gene loci predicted from the poplar genome sequence (http://genome.jgi-psf.org/Poptr1_1/Poptr1_1.home.html) we identified pKPI genes in the P. trichocarpa Nisqually-1 genome (Tuskan et al., 2006). As query sequences, we used the 41 pKPI proteins identified in the Treenomix poplar EST collection and the plant KPI sequences available in NCBI GenBank. The blastp analyses were repeated with each newly identified KPI sequence from the poplar genome until no further sequences with significant similarity were identified. Predictions for protein sequence, pI and MW were for entire open reading frames (ORFs) using the pI/Mw tool at Expasy (http://www.expasy.org/tools/pi_tool.html). In cases where ORFs were predicted to be truncated, we used the genome sequence to design primers flanking the ORFs (Table S2), performed PCR using P. trichocarpa Nisqually-1 genomic DNA as template, and sequenced PCR products to validate the predicted gene model. Genomic DNA was isolated from mature leaves using the DNeasy plant mini kit from Qiagen (Valencia, CA, USA) according to manufacturer instructions.
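The iterative search described above (repeating the search with each newly identified sequence until nothing new turns up) can be illustrated with a short sketch. This is not the authors' pipeline: `iterative_search`, `find_hits` and the toy similarity table are hypothetical names introduced here, and `find_hits` stands in for whatever blastp run and significance cutoff were actually used.

```python
from typing import Callable, Iterable, Set


def iterative_search(seeds: Iterable[str],
                     find_hits: Callable[[str], Set[str]]) -> Set[str]:
    """Re-query with every newly found sequence until no new hits appear.

    `find_hits` is a hypothetical stand-in for one similarity search
    (e.g. a blastp run filtered for significant hits).
    """
    found: Set[str] = set()      # gene models identified so far
    queried: Set[str] = set()    # sequences already used as queries
    queue = list(seeds)
    while queue:
        query = queue.pop()
        if query in queried:
            continue
        queried.add(query)
        for hit in find_hits(query):
            if hit not in found:
                found.add(hit)
                queue.append(hit)  # each new hit becomes a query in turn
    return found


# Toy similarity table (made-up identifiers, not real gene models).
toy_hits = {
    "query1": {"geneA", "geneB"},
    "geneA": {"geneB", "geneC"},
    "geneB": {"geneA"},
    "geneC": {"geneD"},
    "geneD": set(),
}
result = iterative_search(["query1"], lambda q: toy_hits.get(q, set()))
print(sorted(result))
```

The loop ends when a full round of queries adds no new sequence, which is the stopping rule stated in the text.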
Multiple sequence alignments of pKPI proteins (Fig. S1) were performed using clustalw (http://www.ebi.ac.uk/clustalw/) and boxshade (http://bioweb.pasteur.fr/seqanal/interfaces/boxshade.html). For phylogenetic analysis, all pKPI sequences were aligned using dialign (threshold = 0) (Morgenstern et al., 1998; http://bioweb.pasteur.fr/seqanal/interfaces/dialign2-simple.html). Alignments were manually adjusted before maximum likelihood analysis using phyml v2.4.1 (Guindon & Gascuel, 2003) with the JTT (Jones et al., 1992) amino acid substitution matrix. The proportion of invariant sites and the alpha shape parameter were estimated by phyml. Trees were generated using bionj (Gascuel, 1997), a modified neighbor joining algorithm. The seqboot program of the phylip v3.62 package (Felsenstein, 1993; http://evolution.genetics.washington.edu/phylip.html) was used to generate 500 bootstrap replicates, which were then analysed using phyml and the previously estimated parameters. The consense program, also from phylip, was used to create a consensus tree, and treeview (Page, 1996) was used to visualize trees. Bootstrap values above 67% were indicated at the nodes of the maximum likelihood tree generated from the original data set. For the phylogenetic analysis of pKPIs relative to other representative plant KPIs, NCBI GenBank (http://www.ncbi.nlm.nih.gov) was searched using the keyword ‘plant Kunitz’, revealing 45 mRNA sequences and 11 protein sequences for nonpoplar plant species annotated as trypsin or Kunitz protease inhibitors. Predicted or validated protein sequences were aligned with the 31 pKPI genome models with clustalw using default parameters, and a phylogenetic tree was generated using the clustalw guide tree.
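As an illustration of the bootstrap step, the sketch below resamples alignment columns with replacement, which is the general idea behind generating bootstrap replicates of an alignment. It does not reproduce seqboot or its file formats; the function names and the toy alignment are assumptions made for this example.

```python
import random
from typing import Dict, List


def bootstrap_alignment(alignment: Dict[str, str],
                        rng: random.Random) -> Dict[str, str]:
    """Return one replicate: columns drawn with replacement, same length."""
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    # The same sampled column indices are applied to every sequence,
    # so each replicate column is an intact column of the original.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


def bootstrap_replicates(alignment: Dict[str, str],
                         n_replicates: int = 500,
                         seed: int = 0) -> List[Dict[str, str]]:
    """Generate `n_replicates` resampled alignments (500 in the text)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Toy alignment with made-up sequences.
toy_alignment = {"seq1": "MKLV-A", "seq2": "MKIVGA", "seq3": "MRLVGA"}
replicates = bootstrap_replicates(toy_alignment)
print(len(replicates), replicates[0])
```

Each replicate would then be passed to the tree-building step, and the fraction of replicate trees containing a given node gives its bootstrap support.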
For transcript profiling experiments, treatments were applied to the five lowest, fully-expanded, healthy leaves (LPI range of 13–18; surface area of c. 425 cm2) of a given tree. Treatments included FTC herbivory, mechanical wounding (MW), application of OS combined with MW (OS), and methyl jasmonate (MeJa) spraying (Fig. S2). Herbivory treatment was as described in Ralph et al. (2006) using 3rd to 5th instar FTCs caged in groups of five on leaves after larvae were starved for 40 h. To allow for comparisons with the effect of FTC treatment the MW, OS and untreated control treatments also had leaves caged in mesh bags. For MW four 10-cm pattern wheel strips were punctured parallel to the midrib on each leaf. The OS treatment was identical to MW except that 20 μl of FTC OS was applied with a paintbrush to the wounded leaf surface. For MeJa treatment leaves were sprayed with 25 ml of 0.01% v : v MeJa in 0.1% v : v Tween20. Control trees for MeJa treatments (see Fig. S2) were sprayed with 25 ml of 0.1% v : v Tween20. All treatments were performed with five biological replicate trees for each time-point. Leaves with petioles removed were harvested 2, 6, and 24 h after onset of treatment, separately flash frozen in liquid nitrogen and stored at −80°C. For protein analysis of gut contents, groups of four 3rd to 4th instar LR larvae were placed on each of four c. 125 cm poplar saplings in a cage to feed for 7 d, then collected and immediately frozen on dry ice and stored at −80°C before dissection.
For treated and control leaves, total RNA was individually isolated from each tree, quantified, and checked for integrity and purity as described in Kolosova et al. (2004). Microarray experiments were designed to comply with MIAME guidelines (Brazma et al., 2001). Details of the 15.5K poplar cDNA microarray platform (NCBI GEO platform number GPL5921) were described in Ralph et al. (2006). Microarray hybridizations, image capture and processing, data normalization and analysis were as described in Ralph et al. (2006). Scanned microarray TIF images, the gene identification file, and imagene quantified data files are available at the NCBI GEO database (series GSE13767, GSE14039 and GSE14081). Briefly, total RNA from treated and control leaves was compared using a total of 54 hybridizations for FTC/MW/OS vs untreated control comparisons, 18 for MeJa vs Tween control comparisons, and 20 for individual tree comparisons (Fig. S3). A detailed description of the genome-wide transcript profiling patterns will appear elsewhere (S. G. Ralph, D. Lippert, R. N. Philippe & J. Bohlmann, unpublished). Complete details of hybridization design and analysis can be found in the Supporting Information Methods.
Quantitative real-time PCR (qRT-PCR) analysis
The qRT-PCR analyses were performed as described in detail in Ralph et al. (2006). Complete details of experimental design and analysis can be found in the Supporting Information Methods. Primers used are listed in Table S2.
Analysis of KPI proteins in insect gut content
Proteomics methods were used to identify the midgut contents of poplar leaf-fed C. rosaceana (summarized in Fig. 6a). Briefly, midgut content was isolated from dissected caterpillars between abdominal segments 1 and 5 and immediately frozen in liquid nitrogen. Protein was extracted then chromatographically and gel separated. A band corresponding to 20–25 kDa (KPI molecular mass) was excised and trypsin digested. The resulting peptides were sequenced via LC-MS/MS, and KPI sequences were identified by searching against poplar genome and EST sequences. Complete details of experimental design and analysis can be found in the Supporting Information Methods.
Genome organization of the pKPI family
Searches of the predicted protein-coding gene loci of the P. trichocarpa Nisqually-1 genome using BLASTP resulted in the identification of 31 putative pKPI genes (Table 1). Of these, 19 are localized to linkage groups (LGs; i.e. chromosomes) and 12 are located on partly assembled contiguous sequence scaffolds (Table 1). Most pKPIs mapped to LGs are organized as gene clusters ranging in size from two to seven pKPIs (Fig. 1). Gene clusters are located on LG IV (PtiKPIs D2.1, D1.1 and D7), LG VII (PtiKPIs F1, F7, F6 and F2), LG X (PtiKPIs A1, A3, A2.1, A5, A4, B2 and PtiKPI-1), and LG XIX (PtiKPIs C2 and C1.1). Careful inspection of the predicted pKPI gene models suggests that several of the corresponding genome sequences are either incorrect (e.g. introducing a misplaced stop codon) or that these loci indeed code for truncated and potentially nonfunctional KPI. Suspicious or truncated gene models include PtiKPI-A2.1, PtiKPI-B1, PtiKPI-B2, PtiKPI-C3, PtiKPI-D6, PtiKPI-E4 and PtiKPI-1 (Table 1). Polymerase chain reaction amplification and sequencing of these genome regions confirmed that PtiKPI-1, PtiKPI-B1 and PtiKPI-E4 code for truncated pKPI ORFs. By contrast, we were able to obtain full-length ORFs for PtiKPI-A2.1 and PtiKPI-C3, suggesting the available genome sequences at these loci are incorrect (Table 1). Despite multiple attempts, we were unable to sequence the complete genome regions for PtiKPI-B2 and PtiKPI-D6. The qRT-PCR analysis and comparison of the 31 genomic pKPI sequences to poplar ESTs in GenBank identified five pKPIs (PtiKPI-1, PtiKPI-D4, PtiKPI-D6, PtiKPI-E3 and PtiKPI-E4; Table 1) that may represent pseudogenes since they did not have a match with any EST at ≥ 95% identity with a BLASTN match length of ≥ 400 nts and show no detectable transcript in any tissues we examined by qRT-PCR (data not shown). It is notable that five of the 12 gene models mapped to sequence scaffolds (Table 1; Fig. 1) fall into the group of KPIs without support for gene expression and/or code for truncated proteins.
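The pseudogene criterion used here can be written as a small predicate. The sketch below is one reading, following the paragraph above in which both conditions hold for the five candidates (no qualifying EST match and no detectable transcript); the Table 1 footnote could also be read as either condition being sufficient. The class and function names and the toy values are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class EstMatch:
    """One BLASTN hit of a gene model against a public EST (toy record)."""
    percent_identity: float
    match_length_nt: int


def has_est_support(matches: Iterable[EstMatch],
                    min_identity: float = 95.0,
                    min_length_nt: int = 400) -> bool:
    """True if at least one EST matches at >= 95% identity over >= 400 nt."""
    return any(m.percent_identity >= min_identity
               and m.match_length_nt >= min_length_nt
               for m in matches)


def is_predicted_pseudogene(matches: Iterable[EstMatch],
                            transcript_detected: bool) -> bool:
    """Assumed reading: no EST support AND no qRT-PCR-detectable transcript."""
    return not has_est_support(matches) and not transcript_detected


# Made-up examples, not data from Table 1.
short_hit = [EstMatch(percent_identity=99.0, match_length_nt=250)]
good_hit = [EstMatch(percent_identity=97.5, match_length_nt=520)]
print(is_predicted_pseudogene(short_hit, transcript_detected=False))
print(is_predicted_pseudogene(good_hit, transcript_detected=False))
```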
Table 1. Poplar Kunitz-type protease inhibitor (pKPI) inventory: gene name, species, gene model, genomic location, protein and transcript features of poplar KPI genes
Populus species and genotype
Gene model or GenBank accession
ORF length (aa)
ORF, open reading frames; n.a., not available.
a Predicted pseudogene. Poplar gene models were evaluated by quantitative real-time polymerase chain reaction (qRT-PCR) and compared by blastn to sequences in the dbEST database of GenBank to evaluate gene expression. Gene models that were not expressed in tissues examined or did not match at least one public expressed sequence tag (EST) at ≥ 95% identity with a match length of ≥ 400 nt were considered pseudogenes.
b Gene model represents a partial ORF.
c Gene model from genome sequence represents a partial ORF of 106 aa. However, PCR amplification and sequencing of this genome region revealed a full-length ORF of 202 aa.
d Gene model from genome sequence represents a partial ORF of 132 aa. However, PCR amplification and sequencing of this genome region revealed a full-length ORF of 207 aa.
To further validate the in silico genome sequence analysis of pKPIs and to extend the discovery of pKPIs, we performed tblastn searches of the FLcDNA-enriched Treenomix poplar EST database (http://www.treenomix.ca; Ralph et al., 2006, 2008) which identified 365 putative pKPI ESTs. Sequence comparisons and complete insert sequencing revealed 57 different full-length clones for 41 unique pKPI FLcDNAs (listed in Table 1 under GenBank accessions GQ184778–GQ184818). In addition to the 31 pKPIs predicted in the poplar genome sequence and the 41 unique pKPI FLcDNAs, Table 1 also includes KPIs of other poplar species described previously in the literature, giving a total of 61 pKPIs represented by 92 sequences. All pKPI have been assigned a unique gene identifier (e.g. PtiKPI-A1). For each identifier a species-specific prefix (e.g. Pti, Ptxd, Pal) has been assigned as described in Kumar et al. (2009), followed by the KPI subfamily designation (i.e. KPI-A to KPI-F; reflecting the phylogeny in Fig. 2). Each individual sequence within a subfamily is also numbered (e.g. A1, A2, A3). Sequences that are at least 98% identical (amino acids, aa) have the same gene number, unless the genes are derived from the P. trichocarpa genome sequence and are located at different loci. In cases where multiple sequences share at least 98% aa identity and are derived from the same genotype, the additional designation 0.1, 0.2 or 0.3 has been added.
Predicted pKPI ORFs range from 116 aa (PtiKPI-E3) to 219 aa (PtaKPI-B11) with predicted pIs from 3.98 (PtxdKPI-E1, PtxdKPI-E5.1 and PtxdKPI-E5.2) to 9.72 (PtxdKPI-D12) and Mw from 12.6 (PtiKPI-E3) to 23.8 (PtaKPI-B12) kDa (Table 1; except for those that appear to be truncated and for which no FLcDNAs are available: PtiKPI-1, PtiKPI-B1, PtiKPI-B2, PtiKPI-D6 and PtiKPI-E4). Several pKPIs share greater than 98% identity (aa) and may represent interspecies or intraspecies allelic variations of the same genes (Table S4). For example, FLcDNA PtiKPI-D2.2 is 100% identical at the aa level with the genome model of PtiKPI-D2.1 (both are from the P. trichocarpa Nisqually-1 genotype), but they are < 100% identical at the nucleotide level, suggesting that they are Nisqually-1 allelic variants (Tables 1, S4). In total, 14 potential allelic groups of pKPIs were identified. We also found three new FLcDNAs that are 100% identical at the nucleotide level with pKPI genome models (PtiKPI-D1.1, PtiKPI-D1.2; PtiKPI-A2.1, PtiKPI-A2.2) or a previously identified KPI gene (PtxdKPI-D10.1, PtxdKPI-D10.2) (Tables 1, S4).
Structure of the pKPI gene family
A phylogeny of the 61 unique pKPI proteins (represented by 92 sequences) identified six distinct subfamilies: A–F (Fig. 2). Overall sequence identity among subfamilies is low (Table S3); however, the conserved Kunitz motif (Fig. S1) supports the inclusion of all 61 proteins in the pKPI family, except for the truncated PtiKPI-1 and PtiKPI-E4 proteins. Sequence conservation is strongest among subfamilies A, B and C which contain KPIs from several poplar species and include protein-coding loci from the poplar genome and the majority (27/41) of the FLcDNAs identified in this study. Subfamily A contains 10 genes represented by 20 sequences and is the most conserved of the six subfamilies with aa sequence identity of 76.4% to 100% (Table S3). PtiKPI-A4 and PtiKPI-A5 share 100% aa identity and are predicted gene models from the poplar genome that are located adjacent to each other on LG X, and thus are likely the product of a localized tandem duplication. Subfamily B contains 13 genes represented by 15 sequences with overall sequence identity of 37.6–99.5%. A cluster within subfamily B containing PalKPI-B9, PtdKPI-B10, PtaKPI-B11, PtaKPI-B12 and PtdKPI-B13 does not include a representative from the P. trichocarpa genome. Subfamily C contains 10 genes represented by 20 sequences with overall sequence identity of 41.7–99.5%. A large cluster within subfamily C contains only a single KPI identified from the P. trichocarpa genome, PtiKPI-C1.1, yet four closely related but unique pKPI FLcDNAs were identified in both the H11-11 (PtxdKPI-C1, PtxdKPI-C4, PtxdKPI-C5.1, PtxdKPI-C5.3) and NxM6 (PtxnKPI-C1, PtxnKPI-C6.1, PtxnKPI-C6.2, PtxnKPI-C7) genotypes, suggesting that either PtiKPI-C1.1 has been duplicated multiple times within these lineages or that we have obtained multiple allelic variants of PtiKPI-C1.1 from these genotypes.
There are two notable features of subfamilies D, E and F. First, KPIs within these subfamilies are, in general, less conserved than those from subfamilies A, B and C (Table S3); and second, relatively few FLcDNA representatives exist for these subfamilies (Fig. 2). For example, subfamily D contains 12 genes represented by 17 sequences, including seven of the 31 predicted pKPI gene models in the P. trichocarpa genome, with overall aa sequence identity of 31.3–99.5%. Subfamily E consists of five genes represented by seven sequences, including four from the P. trichocarpa genome, with aa identity of 8.4–99.5%. Outside of the Kunitz motif there is very low sequence conservation between KPI proteins of subfamily E and the other subfamilies (Fig. S1). It is also notable that no FLcDNAs have been identified for the subfamily E members PtiKPI-E2, PtiKPI-E3, and PtiKPI-E4, suggesting these loci may represent pseudogenes or genes expressed at low abundance. Subfamily F contains nine genes represented by 11 sequences with sequence identity of 45.4–99.5%. Eight of the 31 pKPI predicted in the P. trichocarpa genome belong to subfamily F, with only three supporting FLcDNAs identified, again suggesting that the predicted protein-coding loci in the genome may represent pseudogenes, or genes expressed at low abundance or in tissues not examined here. In addition to the pKPIs of subfamilies A–F, we also identified two KPI predicted from the P. trichocarpa genome, PtiKPI-1 and PtiKPI-2, that could not be placed within any subfamily with confidence (Fig. 2). The gene model for PtiKPI-1 encodes for a truncated protein with no supporting FLcDNA, whereas the gene model for PtiKPI-2 likely encodes for a complete protein that has diverged considerably in sequence from other members of the poplar KPI family. 
Although no supporting FLcDNA for PtiKPI-2 has been identified, we found two ESTs (GenBank IDs CV263669 and CV261218) with > 99% nucleotide sequence identity, confirming that PtiKPI-2 is an expressed gene.
Conserved sequence motifs of pKPIs
Sequence alignment of the 92 sequences representing 61 pKPI proteins revealed several conserved features (Fig. S1). The Kunitz motif ((L,I,V,M)-X-D-(X2)-G-(X2)-(L,I,V,M)-(X5)-Y-X-(L,I,V,M)) is conserved across all pKPIs (aa 26 to 42 of PtiKPI-F6) except for PtiKPI-1 and PtiKPI-E4, which are predicted from the poplar genome sequence and appear to be truncated. Signal peptides were identified in all pKPIs except PtiKPI-E2, PtiKPI-E3 and PtiKPI-E4, with predicted cleavage sites 16–29 aa from the N-terminus. Cleavage of the signal peptide releases the active PI proteins (Song & Suh, 1998). Four cysteine residues are conserved in most pKPI sequences (aa 62, 109, 154, 168 of PtiKPI-F6), although their exact position varies slightly between subfamilies. These cysteine residues form two disulfide bridges that stabilize the mature protein (Richardson, 1977). Within KPI subfamily E only a few members possess the first conserved cysteine residue, and all proteins in this subfamily lack the third and fourth residues. A reactive region is highly conserved across all subfamilies (aa 81–90 of PtiKPI-F6), with some sequence divergence among subfamily E proteins (Fig. S1). The composition of amino acids at the reactive loop is known to strongly influence the inhibitory function of protease inhibitors (Song & Suh, 1998; Ravichandran et al., 1999). Within this loop, the P1 and P1′ residues (aa 86 and aa 87 of PtiKPI-F6) are considered the most critical for PI function (de Meester et al., 1998; Song & Suh, 1998). The P1 and P1′ residues appear to be more strongly conserved in subfamilies A, B and C compared with subfamilies D, F and especially E. This sequence divergence is perhaps indicative of functional divergence between these subfamilies.
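The Kunitz motif given above translates directly into a regular expression, since each bracketed group is a set of allowed residues and each X is any residue. The sketch below shows that translation; the function name and the toy peptide are made up for illustration and are not pKPI sequences.

```python
import re
from typing import Optional, Tuple

# (L,I,V,M)-X-D-(X2)-G-(X2)-(L,I,V,M)-(X5)-Y-X-(L,I,V,M), 17 residues in total
KUNITZ_MOTIF = re.compile(r"[LIVM].D.{2}G.{2}[LIVM].{5}Y.[LIVM]")


def find_kunitz_motif(protein: str) -> Optional[Tuple[int, int]]:
    """Return 1-based (start, end) of the first motif match, or None."""
    match = KUNITZ_MOTIF.search(protein.upper())
    if match is None:
        return None
    return match.start() + 1, match.end()


# Made-up peptide: a 4-residue prefix, a motif-conforming stretch, a suffix.
toy_peptide = "MAAA" + "LVDTNGKPLRSGGEYYI" + "KK"
print(find_kunitz_motif(toy_peptide))   # (5, 21)
print(find_kunitz_motif("MAAAKK"))      # None
```

A match always spans 17 residues, which agrees with the aa 26 to 42 interval reported for PtiKPI-F6.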
Genome-wide analysis of pKPI evolution
To understand how pKPIs may have evolved at the level of genome organization, we searched for patterns of tandem duplications among the 31 genomic pKPIs. The three KPIs clustered on LG IV (Fig. 1) all group within subfamily D (Fig. 2), with highest aa identity (60%) between PtiKPI-D1.1 and PtiKPI-D2.1, suggesting an ancient tandem duplication. The third member of the LG IV gene cluster, PtiKPI-D7, shares low sequence identity with PtiKPI-D1.1 and PtiKPI-D2.1 (38% and 44%, respectively). The four KPIs clustered on LG VII all belong to subfamily F (Figs 2,3). These proteins share pairwise aa identities of < 50%, except for PtiKPI-F1 and PtiKPI-F2, which display 92% identity, suggesting that PtiKPI-F1 and PtiKPI-F2 resulted from a recent tandem duplication. Among the seven KPIs on LG X, five group with subfamily A, whereas PtiKPI-B2 groups in subfamily B and PtiKPI-1, a predicted pseudogene, does not cluster within any subfamily (Fig. 2). Pairwise aa identity among PtiKPI-A1, PtiKPI-A2.1 and PtiKPI-A3 ranges from 91% to 96%, suggesting recent duplications of an ancestral KPI, whereas PtiKPI-A4 and PtiKPI-A5 are 100% identical at the nucleotide level and thus may represent a very recent tandem duplication. The two genes located on LG XIX, PtiKPI-C1.1 and PtiKPI-C2 (Fig. 2), share only 72% aa identity, suggesting an older tandem duplication.
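The pairwise identities quoted in this section are percentages of identical residues between two aligned proteins. The sketch below shows one common way to compute such a value; the choice of denominator (positions where neither sequence has a gap) is an assumption, since the text does not state how gaps were treated, and the toy sequences are made up.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical residues over positions without a gap in either row.

    Assumption: gapped columns are excluded from the denominator; other
    conventions (e.g. dividing by the shorter sequence length) also exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = [(x, y) for x, y in zip(aligned_a, aligned_b)
                if x != gap and y != gap]
    if not compared:
        return 0.0
    identical = sum(1 for x, y in compared if x == y)
    return 100.0 * identical / len(compared)


# Toy aligned pair: 7 gap-free columns, 6 of them identical.
toy_a = "MKLV-AGT"
toy_b = "MKIVGAGT"
print(round(percent_identity(toy_a, toy_b), 1))
```

Under this convention, higher values between neighbouring genes are read in the text as evidence of more recent tandem duplication.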
Examination of predicted genes located immediately upstream and downstream of KPI gene clusters revealed that the six genes adjacent to the large KPI cluster on LG X include two pairs of high sequence identity, genes 18 and 22 (aa 72%) and 20 and 21 (aa 72%) (all code for proteins with no similarity to sequences in GenBank), providing further support for the occurrence of localized gene duplications and rearrangements at this locus (Fig. 1). In addition to these likely tandem duplications of KPIs within LGs, there are also several examples of high sequence similarity between KPI proteins mapped to different LGs and sequence scaffolds. These include: PtiKPI-F5 and PtiKPI-F6 (77% aa identity), PtiKPI-B1 and PtiKPI-B2 (96% aa identity), PtiKPI-C2 and PtiKPI-C3 (98% aa identity), PtiKPI-E2 and PtiKPI-E3 (77% aa identity; both mapped to different scaffolds), and PtiKPI-D4 and PtiKPI-D5 (100% aa and 99% nucleotide identity; both mapped to different scaffolds).
In addition to tracing tandem duplications, we also searched for evidence of segmental duplications, as it is known that the ancestral poplar genome underwent a whole genome duplication c. 60–65 Myr ago (Tuskan et al., 2006). Comparisons among the four linkage groups with two or more KPI (i.e. LG IV, LG VII, LG XIX and LG X) show overall low aa identity among gene clusters ranging from 15% to 25%, with sequence homology confined to the Kunitz motif and reactive loop regions (Table S3, Fig. S1). One exception is the pair of KPI clusters on LG X and LG XIX, which share c. 50% aa identity. Curiously, analysis by Tuskan et al. (2006) of chromosomal rearrangements associated with the whole genome duplication did not identify segmental duplications between these linkage groups. Conversely, the rearrangements identified by Tuskan et al. (2006) do not associate with the patterns of pKPI duplication, suggesting that polyploidization and subsequent genome reorganization have not played a major role in the evolution of this family in poplars.
In order to uncover the relationship of poplar KPIs to other plant KPIs, we compared the aa sequences of the 31 KPI from the P. trichocarpa genome with 56 proteins annotated as trypsin inhibitor or KPI proteins from 25 other species. The phylogenetic tree (Fig. 3) suggests genus-specific or species-specific patterns of gene family expansion. Poplar KPI subfamilies clustered independently of other plant KPI lineages, as do the two KPI and patatin families from potato (Solanum tuberosum), the sporamin family from sweet potato (Ipomoea batatas), the Theobroma genus expansion and the soybean (Glycine max)-specific family encompassed in the larger legume group. Poplar KPIs do not group with any of the genes obtained from the fully-sequenced Arabidopsis thaliana genome, which are scattered across the tree, further supporting the concept of independent gene family expansion in different plant lineages. Given that poplar is the only tree species producing KPIs for which a complete genome sequence is available, this analysis may be refined in the future as other relevant plant genomes are sequenced. In summary, the available data suggest that expansion and evolution of the pKPI family involved tandem duplications on several LGs followed by sequence divergence.
Microarray profiling reveals some general patterns of pKPI responses
In order to broadly assess the KPI transcriptome response in poplar leaves upon simulated and real insect attack, we performed a set of initial microarray profiling experiments for a time-course of 2–24 h following stress treatment by FTC feeding, MW, application of OS or application of MeJa (Figs S2,S3). The poplar 15.5K cDNA microarray (Ralph et al., 2006) used in this study contains 29 KPI elements representing all six pKPI subfamilies A–F (Table 2).
Table 2. cDNA microarray analysis of poplar Kunitz-type protease inhibitor (pKPI) transcript abundance in response to stress treatments
We found two major patterns of KPI responses when we analysed data for (a) type of treatment and (b) association with subfamilies. First, KPIs that responded with increased transcript abundance generally displayed a similar magnitude of induction for all treatments (Table 2). However, temporal patterns of induction varied with the type of treatment. MeJa induced rapid upregulation, with increased KPI transcript levels at 2 h and peak levels at 6 h. By contrast, FTC caused an initial downregulation at 2 h and 6 h for several KPI transcripts, most prominently for members of subfamilies A and B, followed by a dramatic increase at 24 h. The initial downregulation was not found in response to MW or OS; instead, these treatments showed no change or a moderate upregulation of KPI transcripts at 2 h and 6 h. The addition of OS to MW-treated leaves accelerated the initial response.
Second, KPIs fell into three groups delineated by association with subfamilies and different magnitudes and temporal patterns of response (Table 2). First, a subset of genes that prominently includes all members of subfamilies A and B represented on the array, as well as some members of subfamilies C and D, showed a strong response to treatments with up to 30-fold induction. A second group of transcripts, covering most of the subfamily C representatives on the array, displayed a more moderate upregulation and did not show initial FTC-induced downregulation. The third group, which includes most of the subfamily D, E and F members on the array, showed no or only weak response to any of the stress treatments tested.
To accommodate the large number of samples (90 trees for 18 time-points and treatments), the initial profiling experiments were done with pooled RNA from five biological replicates using a total of 72 array hybridizations (Fig. S3). In order to validate these analyses with independent biological replicates, we performed 16 additional hybridizations with separate RNA from four replicate trees for each treatment at the time-point of strongest response (FTC 24 h; MW 6 h; OS 2 h; MeJa 2 h; Fig. S3). The results obtained from analysis of pooled and independent replicate samples were qualitatively the same with the independent replicate analysis revealing slightly stronger response profiles (Table 3).
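The sample and hybridization counts quoted here can be checked with a few lines of arithmetic. The sketch below assumes that the 18 treatment and time-point combinations are the six groups described in Materials and Methods (FTC, MW, OS, untreated control, MeJa and Tween control) at the three harvest times; that grouping is an interpretation, and the variable names are introduced only for this example.

```python
from itertools import product

# Assumed grouping: four treatments plus their two controls.
TREATMENTS = ["FTC", "MW", "OS", "untreated control", "MeJa", "Tween control"]
TIME_POINTS_H = [2, 6, 24]
REPLICATE_TREES = 5

conditions = list(product(TREATMENTS, TIME_POINTS_H))
n_trees = len(conditions) * REPLICATE_TREES

# Pooled-RNA hybridizations: FTC/MW/OS vs untreated, plus MeJa vs Tween.
pooled_hybridizations = 54 + 18

# Validation: four replicate trees for each of four treatments at peak response.
validation_hybridizations = 4 * 4

print(len(conditions), n_trees, pooled_hybridizations, validation_hybridizations)
```

Under this reading the numbers agree with the text: 18 conditions, 90 trees, 72 pooled hybridizations and 16 validation hybridizations.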
Table 3. Refined cDNA microarray analysis of poplar Kunitz-type protease inhibitor (pKPI) transcripts at peak abundance in response to stress treatments
pKPI gene expression analysis by qRT-PCR
In order to obtain a more refined picture of the KPI response in hybrid poplar (P. trichocarpa × deltoides H11-11) exposed to real and simulated herbivory, we measured, by qRT-PCR, transcripts for 12 KPIs for which gene-specific primers could be identified and verified by amplicon sequencing in this genotype. Before qRT-PCR analysis across all treatments and time-points, we first tested the performance of qRT-PCR with individual biological replicates and pooled replicates. For this purpose, we selected a subset of three genes representing low- (PtxdKPI-C1), mid- (PtxdKPI-C5.1) and highly-induced (PtxdKPI-B5) transcripts (Table 2) and measured their abundance in each of five individual OS-treated trees and controls. Our results for these three KPIs (Fig. S4, Table S5) demonstrated consistent responses to OS in each individual tree. The magnitude of response was comparable to that observed with pooled samples (Fig. 4, Table S6). The statistical significance of the observed induction, as determined with Student’s t-test, exceeded the 99.999% significance threshold for all three genes. Although it may have been preferable to use independent biological replicate analyses for the entire qRT-PCR study, the large number of samples (90 trees) required the use of pooled RNA samples. The experiments with PtxdKPI-B5, PtxdKPI-C1 and PtxdKPI-C5.1 showed that results with pooled replicates are accurate representations of those with independent replicates, at least when using clonal trees under glasshouse conditions.
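The comparison of individual OS-treated trees with controls can be illustrated with a minimal sketch of a pooled-variance (Student's) two-sample t statistic. The log2 expression values below are hypothetical placeholders, not data from this study, and the function name `student_t` is an assumption introduced only for illustration.

```python
from math import sqrt
from statistics import mean, variance


def student_t(treated, control):
    """Return (t statistic, degrees of freedom) for a pooled-variance
    two-sample Student's t-test."""
    n_t, n_c = len(treated), len(control)
    df = n_t + n_c - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled = ((n_t - 1) * variance(treated) + (n_c - 1) * variance(control)) / df
    t = (mean(treated) - mean(control)) / sqrt(pooled * (1 / n_t + 1 / n_c))
    return t, df


# Hypothetical log2 relative transcript levels for five treated and
# five control trees (illustrative values only).
os_treated = [5.1, 4.9, 5.3, 5.0, 4.7]
controls = [0.1, -0.2, 0.0, 0.2, -0.1]

t_stat, dof = student_t(os_treated, controls)
print(f"t = {t_stat:.2f}, df = {dof}")
```

The resulting t statistic would then be compared against the t distribution with the reported degrees of freedom to obtain a significance level.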
Details of the complete qRT-PCR KPI profiling are shown in Fig. 4 and Table S6. In summary, all 12 KPIs tested responded to most treatments (FTC, MW, OS, MeJa) relative to the controls. Upregulation of transcripts in response to FTC, MW and OS was usually highest at 24 h ranging from eightfold (PtxdKPI-D2; MW-24 h) to 480-fold (PtxdKPI-E5.1; MW-24 h) change over controls. Transcript responses to MeJa typically peaked at 6 h and remained significantly increased at 24 h with maximum induction of fivefold (PtxdKPI-F9; 24 h) to 320-fold (PtxdKPI-B3; 6 h) change. With a few exceptions (PtxdKPI-A1.2, PtxdKPI-B3, PtxdKPI-D2) transcript responses to MW exceeded those to FTC or OS at all time-points tested, which indicates suppression of KPI induction by FTC elicitors in the OS. The greater sensitivity and detection range of qRT-PCR allowed for a better resolution of suppression of early KPI responses after onset of FTC feeding than that observed by microarray hybridization (Table 2). The FTC feeding resulted in initial (2 h) downregulation (up to fivefold; PtxdKPI-E5.1) of several KPI transcripts, in some instances continuing on through to 6 h (Table S6).
Constitutive KPI expression levels in different organs of poplar
Beyond analysis of induced KPI expression, we also measured spatial expression patterns for the same 12 KPIs in untreated tissues: mature (LPI 9+), intermediate (LPI 6–9) and juvenile (LPI 0–5) leaves, and petioles, bark, phloem, xylem and primary and secondary roots from 1-yr-old saplings (Fig. 5). Transcript levels for a given KPI gene varied on the order of 10 000-fold among tissues. Most KPIs were expressed in all tissues examined, although PtxdKPI-D12 displayed root-specific activity and PtxdKPI-A6 was detected only at very low levels in juvenile sink leaves. Several KPIs were expressed at higher levels in leaves compared with stem and root tissues. PtxdKPI-A6, PtxdKPI-B3, PtxdKPI-B5, PtxnKPI-B7.1 and PtxdKPI-E5.1 were preferentially expressed in leaves with transcript abundances 10- to 1000-fold greater than in other tissues. In contrast to transcripts for PtxdKPI-E5.1, which showed lower abundance as leaves matured, PtxdKPI-C5.1 and PtxdKPI-D10.2 transcripts were more abundant in older leaves. The KPIs with more abundant transcripts in roots compared with stems and leaves included PtxdKPI-C1, PtxdKPI-D2, PtxdKPI-D12 and PtxdKPI-F9.
KPI proteins are found in the insect gut
In order for KPIs to successfully inhibit digestive proteases as demonstrated by Major & Constabel (2008), and thereby potentially affect insect herbivores, it may be necessary for KPI proteins to survive passage through the insect digestive system. To assess the stability of pKPI protein in the insect gut, LR larvae were allowed to feed on leaves of poplar saplings for 7 d before midgut contents were isolated for protein analysis (Fig. 6). Midgut proteins were fractionated by FPLC (Fast Protein Liquid Chromatography) ion exchange chromatography and sodium dodecyl sulfate–polyacrylamide gel electrophoresis (SDS–PAGE). A partly purified gel-excised protein fraction of 20–25 kDa corresponding to the predicted molecular mass of intact KPIs (Table 1) was proteolytically digested and peptides were analysed by LC-MS/MS. This analysis identified four peptides matching the predicted PtxdKPI-C1 protein, with a G-residue in the first peptide unambiguously identifying PtxdKPI-C1. The four peptides covered 27.7% of the PtxdKPI-C1 protein. When these peptides are mapped onto the predicted aa sequence for PtxdKPI-C1, the sequence between the most N-terminal and the most C-terminal peptides includes over 73% (149 of 202) of the full-length protein, including the reactive-site loop motif. Since these peptides were obtained from proteins of approx. 20–25 kDa, it is likely that the peptides measured by LC-MS/MS represent intact or near-intact PtxdKPI-C1 protein from the midgut of LR larvae.
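The two coverage figures reported above (residues directly covered by peptides, and the span from the most N-terminal to the most C-terminal peptide) can be reproduced with a short sketch. The peptide positions used here are hypothetical: the actual positions are not given in the text, and these were chosen only so that four peptides cover 56 residues (approx. 27.7% of 202) and span 149 residues. The function name `coverage_stats` is likewise an assumption for illustration.

```python
def coverage_stats(protein_length, peptides):
    """Summarise peptide coverage of a protein.

    peptides: list of (start, end) positions, 1-based and inclusive.
    """
    covered = set()
    for start, end in peptides:
        if not 1 <= start <= end <= protein_length:
            raise ValueError(f"peptide ({start}, {end}) lies outside the protein")
        covered.update(range(start, end + 1))  # overlapping residues count once
    first = min(start for start, _ in peptides)
    last = max(end for _, end in peptides)
    span = last - first + 1
    return {
        "covered_residues": len(covered),
        "coverage_pct": 100 * len(covered) / protein_length,
        "span_residues": span,
        "span_pct": 100 * span / protein_length,
    }


# Hypothetical peptide positions on a 202-residue protein (not from the study).
hypothetical_peptides = [(30, 43), (70, 83), (120, 133), (165, 178)]
stats = coverage_stats(202, hypothetical_peptides)
print(f"coverage: {stats['coverage_pct']:.1f}%")
print(f"span: {stats['span_residues']} residues ({stats['span_pct']:.1f}%)")
```

The distinction matters for interpretation: direct coverage reflects only the peptides detected, whereas the span indicates how much of the protein must have been present if the peptides derived from a single molecule.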
The poplar genome contains a large KPI gene family
Previous analysis of the poplar draft genome sequence showed that gene families involved in defense are typically larger in this long-lived tree relative to those in the short-lived A. thaliana (Tuskan et al., 2006). The pKPI gene family shows evidence of expansion as a result of repeated tandem gene duplications. We identified a total of 31 KPI genes and possible KPI pseudogenes in the P. trichocarpa Nisqually-1 genome. Localization of nearly half of the pKPIs to scaffolds, as opposed to complete LGs, illustrates the still-preliminary status of the P. trichocarpa genome sequence assembly, and the total number of pKPI genes may increase with improved coverage and assembly of the poplar genome sequence.
The large size of the pKPI family may be a signature obtained during evolution of poplars exposed during their long lifespans to a multitude of potential insect pests (Philippe & Bohlmann, 2007), each equipped with its own dynamic array of digestive proteases. Different proteases can be differentially expressed in response to the presence of KPIs in ingested plant materials and some insect proteases are less susceptible to KPIs than others (Bolter & Jongsma, 1995; Broadway, 1995; Jongsma et al., 1995; Mazumdar-Leighton & Broadway, 2001a,b; Volpicella et al., 2003; Major & Constabel, 2008). Differential constitutive and induced expression of members of the pKPI family may counter the effectiveness of changing profiles of insect proteases. Possessing a variety of KPIs may also be advantageous in interactions with a variable range of insect pests. Previous research suggests that pKPI genes are rapidly evolving, with considerable sequence diversity among genotypes that are spread across geographic clines (Ingvarsson, 2005; Talyzina & Ingvarsson, 2006). This capacity for rapid evolution may reflect the importance of KPI proteins in defense against insects, which have short generation times and a potential for rapid adaptation within their arsenal of digestive proteases. The origins of selection pressures on the pKPI family are difficult to demonstrate directly, but could involve pressure from a diversity of herbivore pests with multiple digestive proteases, as well as selective pressures owing to other roles played by KPIs in plant biology (Downing et al., 1992). The limited transcriptional response observed in subfamilies D/E/F and sequence divergence in the reactive loop relative to subfamilies A/B/C could indicate biological function beyond insect defense for these pKPIs.
Herbivore-induced response of pKPI transcripts
As with many plant defense gene families, the pKPI family displays patterns of differential induced expression across the members of most of the KPI subfamilies. Many pKPIs were upregulated in response to insect attack (mainly subfamilies A/B/C), including those that demonstrated inhibition of insect proteases in vitro (Major & Constabel, 2008). Despite the general pattern of increased KPI transcript abundance in poplar leaves in response to real and simulated insect attack, a striking feature of the response to actual herbivory by FTC is the initial downregulation of many KPI transcripts at 2 h and 6 h after the onset of FTC feeding. This initial downregulation is perhaps caused by the presence of insect-specific elicitors. For example, FTC, like other lepidopteran larvae, produce a fatty-acid conjugate defense response elicitor found in the OS (Major & Constabel, 2006). Although suppression of KPI induction by OS could not be resolved by microarray analyses when we compared MW and OS (Table 2), the higher resolution of qRT-PCR analysis with select gene-specific primers showed, in most cases, a substantially reduced KPI transcript response in leaves treated with OS compared with MW-treated leaves (Fig. 4). These results suggest insect-specific and possibly OS-elicitor-mediated modifications of the poplar KPI response. Since not all KPIs respond in the same manner, there may be interplay of multiple signals produced at the site of herbivory that are recognized by different KPI promoters responsible for patterns of initial downregulation, upregulation or no apparent response of the various pKPI genes. Since the early suppression of KPI induction is not observed in response to MeJa treatment, it is possible that the suppression is controlled by signals that are, at least in part, independent of, and possibly antagonistic to, those that control the subsequent upregulation.
A pattern of initial downregulation of a variety of putative defense genes in response to insect feeding was also observed in other plant species, such as tomato (Lawrence et al., 2007) and wild tobacco (Nicotiana attenuata) (Schittko et al., 2001). Using the poplar 15.5K array platform, Miranda et al. (2007) demonstrated that many herbivore defense genes, including pKPIs, were repressed following Melampsora medusae fungal infection of H11-11 hybrid poplar leaves. These same genes are also initially downregulated following FTC feeding in our study, but are strongly induced by 24 h after continuous FTC feeding. Recent work in N. attenuata demonstrated that small RNAs play a central role in regulating these OS-elicited changes in defense induction (Pandey et al., 2008). With the available poplar genomics resources, it will now be possible to test if patterns of initial downregulation and subsequent strong upregulation of KPIs in poplar are regulated by a similar mechanism involving stress-responsive microRNAs (Lu et al., 2008).
Stability of pKPI protein in the insect gut
For a plant protein to perform a defense function after ingestion by an insect, it may need to remain largely intact in the insect digestive system. Poplar polyphenol oxidase and tomato threonine deaminase have previously been shown to be stable or active in the guts of lepidopteran larvae (Wang & Constabel, 2004; Chen et al., 2005, 2007). Some plant KPIs are exceptionally resistant to reducing agents, boiling and extremes of pH (Garcia et al., 2004; Macedo et al., 2004, 2007). Major & Constabel (2008) found substantial variation in biochemical stability of several KPI proteins in in vitro stability assays, with subfamily E members being the most stable, retaining activity at high levels of DTT and high temperature, while members from subfamily B – the most strongly induced in this study – were the most sensitive to these stress conditions. Our result from mapping of peptides generated by proteolytic cleavage of partially purified proteins of 20–25 kDa suggests that at least one pKPI, PtxdKPI-C1 of subfamily C, is stable in the LR insect midgut.
The authors thank David Kaplan for greenhouse support, Sarah Martz for OS collection, Dana Aeschliman and Rick White for statistical support, Dustin Lippert and members of the UVic/Genome BC Proteomics Centre for help with protein analysis, Gary Judd for supply of oblique-banded leaf rollers, and Bob McCron for forest tent caterpillars. This project was supported with funding from the Natural Sciences and Engineering Research Council of Canada (NSERC grant to J.B.; graduate student fellowship to R.N.P.), Genome Canada and Genome British Columbia (grants to J.B.). Salary support for J.B. was provided, in part, by the UBC Distinguished University Scholar Program and an NSERC E.W.R. Steacie Memorial Fellowship.