K. G. Welinder, Department of Biotechnology, Chemistry and Environmental Engineering, Aalborg University, Sohngaardsholmsvej 49, DK-9000 Aalborg, Denmark Fax: +45 9814 1808 Tel: +45 2196 5333 E-mail: firstname.lastname@example.org
Potato tuber storage proteins were obtained from vacuoles isolated from field-grown starch potato tubers cv. Kuras. Vacuole sap proteins fractionated by gel filtration were studied by mass spectrometric analyses of trypsin and chymotrypsin digestions. The tuber vacuole appears to be a typical protein storage vacuole devoid of proteolytic and glycolytic enzymes. The major soluble storage proteins included 28 Kunitz protease inhibitors, nine protease inhibitors 1, eight protease inhibitors 2, two carboxypeptidase inhibitors, eight patatins and five lipoxygenases (lox), which all showed cultivar-specific sequence variations. These proteins, except for lox, have typical endoplasmic reticulum (ER) signal peptides and putative vacuolar sorting determinants of either the sequence or structure specific type or the C-terminal type, or both. Unexpectedly, sap protein variants imported via the ER showed multiple molecular forms because of extensive and unspecific proteolytic cleavage of exposed N- and C-terminal propeptides and surface loops, in spite of the abundance of protease inhibitors. Some propeptides are potential novel vacuolar targeting peptides. In the insoluble vacuole fraction two variants of phytepsin (aspartate protease) were identified. These are most probably the processing enzymes of potato tuber vacuolar proteins.
Database: Proteome data have been submitted to the PRIDE database under accession number 17707.
RMR, receptor homology-transmembrane-RING H2 domain protein
ss, sequence or structure specific
VPE, vacuolar processing enzyme
VSD, vacuolar sorting determinant
VSR, vacuolar sorting receptor
Potato (Solanum tuberosum) is the world’s third most important food crop providing starch, protein, fibres, vitamins and minerals. Depending on cultivar, the tuber content is ∼ 18% starch and 2% protein. Moreover, the protein has a nutritional value similar to hen egg protein. Most potato tuber protein is located in the vacuole and dominated by an abundance of patatin and protease inhibitor variants [1–4]. Understanding of tuber vacuolar protein targeting and processing is therefore of considerable interest to knowledge-based breeding for increased protein content or production of technical and medical protein products. Also the mature potato tuber vacuole might contribute to a general model of dicot vacuolar trafficking.
The plant literature on vacuolar trafficking has been extensively reviewed [5–7] and provides a complex picture pieced together from a variety of plant species and tissues. The present model of vacuolar protein import proposes that folded pro-proteins are recognized by specific membrane-spanning vacuolar sorting receptors (VSRs) or receptor homology-transmembrane-RING H2 domain proteins (RMRs), followed by vesicular transport to the vacuole. On their way, via multi-vesicular bodies or prevacuoles, or in the vacuole, processing by vacuolar processing enzymes (VPEs) takes place. This process appears to be linked to acidification. The documented extensive processing of potato tuber storage proteins is discussed in this context.
The present study represents the first study of a protein storage vacuole (PSV) from a tuber. Most PSV studies have been carried out on seeds using various imaging techniques [5,8]. Central vacuoles from Arabidopsis rosette leaf and cell culture [10–12] have been purified and subjected to extensive proteome studies and bioinformatics identification using the completed Arabidopsis genome sequence and functional annotation of genes. The soluble proteins were largely hydrolytic enzymes and stress related, i.e. typical of a lytic vacuole. However, the vacuolar proteins of a tuber storage organ derived from an underground stem include entirely different protein super families, whereas targeting and processing appear to be similar. We describe a simple method to release and prepare high quality vacuoles from the hard potato tuber tissue for the study of native storage proteins.
The promiscuous and sometimes autophagic nature of vacuoles [7,13,14] provides a challenge in classifying proteins as truly vacuolar, i.e. tonoplast and associated proteins, resident vacuolar proteins for storage and metabolic processes, versus those imported for degradation, in addition to contaminants of purification. Among approximately 450 putative vacuolar proteins identified by mass spectrometric sequencing (MS/MS) we analyse the post-translational proteolytic processing of the major families of potato tuber storage proteins and the presence of known vacuolar targeting signals and of novel propeptides. The results demonstrate an unexpected abundance of differential truncated forms of nearly all storage proteins originating from unspecific proteolytic processing of exposed protein termini and loops.
Preparation and purity of tuber vacuoles
Identification and in-depth amino acid sequence analyses of true storage proteins require isolation of intact storage vacuoles and gentle and fast fractionation of the proteins. A simple method was developed for releasing vacuoles from field-grown mature (post-flowering) potato tubers. The hard tissue was chopped briefly in a household blender at low power in a cutting buffer containing 0.5 M mannitol. After removal of uncut tissue (∼ 75%) by filtering, the vacuoles were collected by centrifugation and purified by density centrifugation in a Ficoll step gradient. Vacuoles appeared at the interface between 0% and 5% Ficoll, whereas more dense organelles remained in the 25% Ficoll phase, and starch granules precipitated. The quality of tuber vacuole preparations was confirmed by microscopy (Fig. 1) and reproducible chromatographic profiles of soluble vacuolar proteins (Fig. 2). Contamination with mitochondrial protein from intact or bruised mitochondria was 2.7%, as determined by the relative content of cytochrome c oxidase (Table S4).
The tuber storage vacuoles stained well with neutral red in the native tissue and after preparation, indicating a vacuolar pH value below 6.8, the pKa of this pH-indicator dye. Figure 1 illustrates the neutral red stained vacuoles in intact tuber tissue and a purified vacuole preparation. The tissue containing large vacuoles in Fig. 1B was the starting material for purified vacuoles (Fig. 1C). Vacuoles in freshly harvested tubers are small (Fig. 1A) and remain small for 3–4 weeks in tubers stored at 5 °C (not shown). Thus it appears that tuber storage gives rise to fusion of small vacuoles into larger ones.
Isolation of potato vacuoles was not achieved from all batches of tubers but depended on storage. Therefore, the effect of storage on vacuole isolation was studied in more detail. Vacuoles were produced reproducibly from tubers stored at room temperature for 3–4 weeks. Freshly harvested tubers or tubers stored cold for 3–4 weeks gave no purified vacuoles, presumably because of their small size (Fig. 1A). Storage for more than 3 months spread the vacuoles in the 5% Ficoll layer, most probably due to water loss.
Purified vacuoles, ruptured by osmotic shock and freezing in gel filtration buffer, were separated into native soluble protein (vacuole sap) and insoluble protein (pellet of membrane associated proteins and aggregates) by ultracentrifugation. The content of native proteins and protein complexes of vacuole sap and juice of total tuber was compared by Superdex 200 gel filtration (Fig. 2). The gel filtration shows an almost complete elimination of large size proteins and complexes in vacuolar sap compared with tuber juice. This is in agreement with our earlier finding that large size proteins of tuber juice were dominated by ribosomes, proteasomes, various chaperonins and enzymes of starch biosynthesis, i.e. non-vacuolar proteins. Thus the gel filtration profile serves as an indicator of vacuole sap purity, and the inset of Fig. 2 shows the excellent reproducibility of gel filtration profiles of vacuole sap produced in different years (2004, 2005 and 2008). We conclude that the vacuoles appear rather pure (Fig. 1C), are free of large cytosolic protein complexes (Fig. 2) and have very little mitochondrial impurity (cytochrome c oxidase), but may contain some Golgi impurities (mannosidase) (Table S4).
Proteome analysis of native vacuolar sap proteins
The supernatant of ultracentrifuged vacuoles was fractionated on a calibrated column of Superdex 200 (Fig. 2). Peaks were named according to the size of the native proteins, ranging from ∼ 340 to 5 kDa (vacuolar sap vs340, 140, 90, 45, 20, 15, 10 and 5), and a pool of protein retained by the column (vs0). Vs0 was eluted after phenolics (high absorption at 310 nm of the vs10 and vs5 peaks) and salt (conductivity not shown). In experiment 10-2005, ∼ 5 mg sap protein (from the pool of 35 mg tested for marker enzymes, Table S4) was fractionated, and the nine pools vs340–vs0 of Fig. 2 were reduced, carboxymethylated and digested by trypsin prior to LC-MS/MS collision induced dissociation (CID) analyses. In experiment 09-2008, 1 mg of a different preparation of vacuolar sap protein was fractionated and processed as above (Fig. 2, inset profile 09-2008). However, the results of experiment 10-2005 suggested that important additional information could be obtained by digesting samples of each 750 μL fraction (corresponding to the resolution of the gel filtration column) and with both trypsin and chymotrypsin, providing more complete information on sites of proteolytic processing and higher amino acid sequence coverage of mature vacuolar proteins.
MS/MS data from all trypsin in-solution digestions of experiments 10-2005 and 09-2008 were combined into one list and analysed by Mascot software (Table S2). Chymotrypsin digestions of fractions of experiment 09-2008 were analysed similarly (Table S3). Proteins were identified by searching an in-house potato protein database derived from version 12 of potato expressed sequence tag (EST) contigs (TC number) and our cv. Kuras full-length cDNA (GenBank DQ number) and EST contigs (K number). Mascot searches were filtered using a peptide ion cut-off of 10. False discovery rates for Tables S2 and S3 were 1.05% and 1.21%, respectively. Previous searches including the tomato and National Center for Biotechnology Information non-redundant plant protein databases of tryptic peptides from experiment 10-2005 gave very few additional hits (Table S5). Table S1 lists all proteins identified with a score ≥ 200 and lower scoring variants of these proteins. We consider the inhibitor, patatin and lipoxygenase families as storage proteins because of their high number of very similar sequence variants and their high abundance in tuber and in vacuoles. This includes high abundance in both the soluble and insoluble vacuolar fractions (Table S6). Soluble subunits of vacuolar membrane proteins are also abundant. Proteins involved in folding, metabolism and glycolysis may be either truly vacuolar (unlikely) or imported for degradation, or impurities of preparation. All identified proteins or protein subunits were subjected to bioinformatics analyses of signal peptides (SignalP), isoelectric point (pI) and molecular mass (Mr) of the translated proteins.
Table 1 comprises a subset of Table S1, i.e. the putative storage proteins. Four super families of protease inhibitors and two super families of lipid-transforming enzymes dominated among soluble vacuole proteins: 28 Kunitz protease inhibitors (KPIs), nine protease inhibitors 1 (PI 1), eight protease inhibitors 2 (PI 2), two carboxypeptidase inhibitors (CPIs), eight patatins (pats) and five lipoxygenases (lox). Many additional high scoring variants were present in the insoluble vacuolar pellet together with candidates of processing proteases (Table S6). Novel information derived from the present protein chemical study has been summarized below for each super family.
Table 1. Major storage/defence proteins identified in cv. Kuras potato tuber vacuole sap. Subset of the major proteins of vacuolar sap from Table S1, derived from the Mascot files of Tables S2 and S3. Settings: semi-trypsin or semi-chymotrypsin; variable modifications Met oxidation, Cys carboxymethylation, N-terminal pyroglutamate; peptide score ≥ 10.
The major proteins of potato tuber are stored in the vacuole and serve dual roles as storage proteins and stress-response proteins.
Kunitz protease inhibitors (KPIs)
The proteome data distinguished 28 KPI variants, which cluster into the seven clades A, B, C, J, K, M and N (Fig. 3). Figure S1 shows a complete alignment and the ∼ 90% amino acid sequence coverage derived from tryptic and chymotryptic peptides. The few potential N-linked glycosylation sites, Asn-X-Ser/Thr, were found in their non-glycosylated form only, Asn59 in KPI A variants (18 non-glycosylated peptides sequenced), Asn233 in KPI K-k1 (no sequence), Asn85 in KPI J-k1 (one non-glycosylated), whereas none of the five known glycan sites in pat variants were found in non-glycosylated form in the data (Tables S2 and S3). Thus it is highly unlikely that potential sites in KPIs are glycosylated. A complex glycan would mean passage to the vacuole via the Golgi apparatus as expected for the glycosylated pats. Because potato tuber KPIs appear to be without N-glycosylation they may follow a different path to the vacuole. KPIs have two conserved disulfide bridges in analogy to soybean trypsin inhibitor (STI). The amino acid sequence identity between KPI A-k1 and STI is 24%, and the inhibitor loop of STI binding to the active site of porcine trypsin appears to be a poor model for the KPI inhibitors of potato tuber.
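Locating potential N-linked glycosylation sites of the Asn-X-Ser/Thr type in a sequence can be sketched in a few lines of Python. This is only an illustration: the function name and the toy sequence are hypothetical, and the exclusion of proline at the X position is a commonly applied refinement that is assumed here rather than stated in the text.

```python
import re
from typing import List

# Sequon Asn-X-Ser/Thr. Excluding Pro at X is an assumption of this sketch;
# the text itself only specifies Asn-X-Ser/Thr.
# A lookahead is used so that overlapping sequons are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein: str) -> List[int]:
    """Return 1-based positions of Asn residues that start a sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


# Hypothetical toy sequence, not a potato KPI.
toy = "MKNATLLNPSGGNVTNNSS"
print(find_sequons(toy))
```

A predicted site only marks a candidate position. Whether it carries a glycan has to be decided from the peptide data, as was done above for Asn59, Asn233 and Asn85.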
Within each KPI clade the amino acid sequences are very similar, and variants may diverge by only a few amino acid replacements. Here we focus on the extensive post-translational proteolytic processing documented in the data (Tables S2 and S3). Unspecific peptide bond cleavage was assessed by manual counting of MS/MS spectra of semi-tryptic and semi-chymotryptic peptides with expect values < 0.05 only. Figure 4 shows that the N-terminal end of each KPI variant is truncated beyond the ER signal peptide, but by unspecific cleavage at a variety of peptide bonds, thus giving rise to many molecular forms of each KPI variant. Furthermore, all KPI clades, except for KPI B, have variable C-terminal truncations. In contrast, KPI B variants are also cleaved internally after position 206, as has been seen previously. In KPI B-k1, for example, we observed three different N-terminal semi-tryptic or semi-chymotryptic peptides; the number of times each was observed is indicated by the superscript number at the start of the N-terminal bracket in Fig. 4. The longest variant of the mature KPI B-k1 protein started right after the predicted ER signal peptide (shown as lower case letters). Other variants were shorter by 10 and 12 residues. The predominant variant, indicated by 43 observations of the N-terminal peptide, had a propeptide of 10 residues removed, which included the well-known NPIR-like sequence-specific vacuolar sorting determinant (ssVSD). If the peptide scoring criterion is loosened from peptide expect values < 0.05 to peptide ion scores ≥ 10 (given in Tables S2 and S3), the MS/MS spectral counting indicates even more sites of unspecific truncation than the sites shown for KPI in Fig. 4, PI 1 in Fig. 5, and the proteins in Fig. 6.
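The counting described above was done manually. As an illustration only, the same logic could be sketched in Python as follows. The cleavage rules are simplified assumptions (trypsin after Lys/Arg, chymotrypsin after Phe/Tyr/Trp/Leu, neither before Pro), and the function names, the toy protein and the toy peptide list are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def cleaves_after(residue: str, next_residue: str, enzyme: str) -> bool:
    """Simplified protease specificity (an assumption of this sketch)."""
    if next_residue == "P":
        return False
    if enzyme == "trypsin":
        return residue in "KR"
    if enzyme == "chymotrypsin":
        return residue in "FYWL"
    raise ValueError(f"unknown enzyme: {enzyme}")


def count_unspecific_n_termini(
    protein: str,
    peptides: Iterable[Tuple[str, float]],
    enzyme: str,
    max_expect: float = 0.05,
) -> Counter:
    """Count spectra whose peptide N-terminus is not explained by the enzyme.

    `peptides` holds (sequence, expect value) pairs, one per spectrum.
    Keys of the result are 1-based start positions in `protein`.
    """
    counts: Counter = Counter()
    for seq, expect in peptides:
        if expect >= max_expect:
            continue  # keep only confident spectra (expect < 0.05)
        idx = protein.find(seq)  # first occurrence only
        if idx <= 0:
            continue  # not found, or starts at residue 1 of the precursor
        if not cleaves_after(protein[idx - 1], protein[idx], enzyme):
            counts[idx + 1] += 1
    return counts


# Hypothetical toy data, not taken from Tables S2 or S3.
toy_protein = "MASLLKAGDEFRTTPK"
toy_spectra = [("AGDEFR", 0.001), ("GDEFR", 0.01), ("GDEFR", 0.02),
               ("DEFR", 0.03), ("EFR", 0.2)]
print(count_unspecific_n_termini(toy_protein, toy_spectra, "trypsin"))
```

Each key with a non-zero count marks a start position that the digestion enzyme cannot explain, i.e. a candidate site of in vivo truncation. C-terminal ends can be treated in the same way by testing the last residue of each peptide.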
The experimentally verified NPIR-like ssVSD sequences in sweet potato sporamin (homologous to potato KPI) and potato KPI B [18,19] are shown in grey in Fig. 4, although extended to similar sequences in all KPI subfamilies. In KPI C variants, the highly unspecific processing is observed within and around the NPIR motif, even extending to the first two residues of the Kunitz motif (framed in Fig. 4A).
In potato the Kunitz protease inhibitors are encoded by a multigene family, most members organized in a cluster mapped to chromosome 3 [20,51]. Interestingly, KPI proteins are also used as storage proteins in soybean seed (trypsin inhibitor), in sweet potato (sporamin) and in taro (taroG2, miraculin). The evolutionary origin of this preference of KPI for storage is unclear, as these storage organs derive from different botanical tissues, potato – stem, soybean – seed, sweet potato – root, and taro – corm.
Protease inhibitor 1 (PI 1)
Figure 5 summarizes the sequence coverage of nine PI 1 variants, the evidence for variable N-terminal truncations, and their phylogenetic relationship. PI 1 variants have a single disulfide bridge and no potential N-glycosylation sites. The data show significant unspecific post-translational N-terminal processing, but no indication of C-terminal propeptides (Fig. 5B). Propeptides ranging in size from 0 to 30 residues have been cleaved off. Most mature PI 1 proteins start at Gly30. A potential ssVSD function of the corresponding propeptide residues 24–29 has not been tested experimentally. PI 1 proteins eluted at 45 kDa and consist of six subunits, but the atomic structure of the hexamer is unknown. PI 1 proteins have been previously purified in harsh conditions (including pH 3 and 75 °C treatments) and sequenced on extensively purified preparations which, however, were still heterogeneous due to the presence of many highly similar variants [24,25]. Such preparations were truncated more than ours prepared in mild conditions and had N-termini at Glu44 or Lys38 (position numbers of Fig. 5). Figure 5A also contains information relevant to the structure and functions of PI 1 proteins, the subunits of which are homologous to the monomeric leech inhibitor eglin c. The similarity suggests that the protease inhibitory position 89 of eglin c can be extended to the PI 1 variants. Therefore, we predict that Lys89 in PI 1-k1 will bind to the substrate specificity pocket of trypsin-like proteases with specificity for the positive residues Lys and Arg. Other PI 1 variants with Leu or Met at position 89 will bind to and inhibit other proteases with binding specificity for hydrophobic side chains like chymotrypsin. In fact, the gel filtration pool of all PI 1 variants from potato juice inhibited alkalase (a subtilisin-type protease) more than trypsin. Alignment to five PI 1 variants from cv. Desiree and contigs (TCs) from a number of potato cultivars demonstrated that the PI 1 storage protein variants are also cultivar specific (not shown).
Carboxypeptidase inhibitors (CPIs)
Two CPI variants were identified, CPI-k1 with full sequence coverage (Fig. 6A). SignalP predicted that residues 1–17 were removed on import to the ER. However, residues 18–44 and the last six residues, 85–95, were also absent in the MS/MS data, hence indicating propeptide functions.
At the N-terminus, six differentially truncated forms were identified starting at position 45, 46, 47, 48, 49 or 50. The putative N-terminal propeptide 18–44 (up to 49) might function as an ssVSD. Two forms of CPI, starting at positions Gln45 and Gln46, had been processed further, with these glutamines fully converted to pyroglutamate. As spontaneous lactam formation of glutamine or glutamic acid is slow and incomplete, this means that cyclization of glutamine to pyroglutamate was catalysed by an enzyme (glutamine cyclotransferase). Hence truncation occurred first, and pyroglutamate formation followed later.
At the C-terminus, hydrophobic propeptides (G)GAMAIGL-COO− have been removed. The sequence is similar to experimentally verified ctVSDs.
The CPI-k1 molecule, residues 45–84, with N-terminal pyroglutamate was previously purified and its function and structure studied in detail in complex with carboxypeptidase A. In folded recombinant CPI, the C-terminal propeptide was accessible to unspecific proteolysis. We propose that CPI, like Arabidopsis 2S albumin and mung bean phaseolin, has functional ssVSD as well as ctVSD sequences (see Discussion).
Protease inhibitor 2 (PI 2)
Six PI 2 variants discriminated by at least one unique peptide were identified. Four full-length PI 2 proteins were fully sequenced for the first time. The data for PI 2-k2 and PI 2-k4 are the most diverse and illustrate the findings relevant to vacuolar import signals (Fig. 6B). Two C-terminal forms were seen. Seventeen chymotryptic peptide spectra defined Lys147 as the predominant C-terminus, whereas five tryptic and two chymotryptic peptide spectra were terminated by Ala148. The Ala148-Asn149 peptide bond is normally not cleaved by these proteases. Thus a seven or six residue propeptide has been removed at the C-terminus.
At the N-terminus the ER signal peptidase is predicted by SignalP to cleave the translated proteins between positions 26 and 32, depending on variant. For PI 2-k2, Met26 was a slightly more likely site of signal peptidase cleavage than Lys32. For PI 2-k4, Lys32 was more likely than Val29. However, the chymotryptic data showed no peptides starting after Lys32.
Tryptic and chymotryptic peptides started at Ala31 eight times and at Lys32 10 times, again demonstrating unspecific N-terminal processing (Tables S2 and S3). These peptide bonds are normally not cleaved by trypsin or chymotrypsin. In addition, extensive unspecific internal cleavages were observed. No known VSD is seen in the PI 2 family in the N-terminal part, whereas our data document, for the first time, the removal of a typical ctVSD, (A)NMYPAM-COO−.
Eight pat variants were identified (Table 1). An extended protein chemical analysis of the same eight mature pats purified from potato juice showed that all carried complex-type glycans and therefore must have passed the Golgi apparatus. Two characteristic vacuolar pat variants are shown in Fig. 6C. Pats have N-termini starting right after the predicted ER signal peptides, which are Thr24 of pat1-k1, pat3-k1 and pat4-k1 (26 tryptic and five chymotryptic peptides starting at Thr24 were sequenced) and Ser24 of pat1-k2 (20 chymotryptic peptides sequenced). SignalP predicts equal propensities of Ala23 and Thr24 as N-termini of mature pat2-k4 and the other members of the pat2 clade. Indeed, we sequenced five tryptic peptides starting at Ala23, in addition to the 26 tryptic and five chymotryptic peptides starting at position Thr24, shared among all pat variants, except pat1-k2. In the pat2 variants, the small side chain of Cys22 substitutes the large Phe22 present in the remaining pats, thus explaining the differential signal peptidase specificity (Fig. 6C). In conclusion, the N-termini of pats have been generated by ER signal peptidase activity, and pats contain no N-terminal VSD.
At the C-terminus of pats, the vacuolar proteome data confirm our previous finding that a well-defined ct-propeptide has been removed (20 chymotryptic peptides ending at Arg381), presumably by a vacuolar processing protease.
In addition, our data demonstrated significant unspecific processing at a few, presumably exposed, surface loops. Thus in pat1-k1 and pat2-k4 cleavages before Pro360 (13 tryptic peptide spectra), Glu361 (four tryptic peptide spectra) and Thr362 (one chymotryptic peptide spectrum) and of three peptide bonds within positions 149–155 (four peptide spectra) were seen. In pat3-k1, peptide bonds 102–116 are not protected by a complex N-linked glycan at position 115 as in most other pat variants, and unspecific nicking in this surface loop was documented by eight tryptic peptide spectra. This unspecific cleavage of an exposed loop we ascribe to the activity of a vacuolar processing protease.
Lox oxygenate certain unsaturated fatty acids, such as the products of pat hydrolysis, to hydroperoxy fatty acids, which are precursors of signalling molecules in growth regulation and wound response. In a previous study of total Kuras tuber juice we identified a variety of highly similar lox and estimated the lox content at 10% of total juice protein, i.e. properties typical of storage protein. Lox are monomeric 97 kDa proteins with no disulfide bridges or glycans, and eluted together with the dimeric pats in gel filtration pool vs90 (Fig. 2). At least five different lox were identified (Table 1). Figure S2 shows four full-length lox sequences and the sequence coverage from tryptic and chymotryptic peptides. A very large number of additional lox variants were identified in the insoluble vacuolar fraction (see below; Table S6).
Insoluble vacuolar proteins
Membranes and associated proteins, and insoluble aggregates were identified from tryptic in-gel digestions of 24 SDS/PAGE slices (Table S6). A large number of pat, lox and KPI variants remained in the vacuole pellet after washing. A variety of vacuolar proton ATPase subunits was identified among both insoluble and soluble proteins (Table S1), indicating acidification of the tuber vacuole, as seen by the neutral red staining (Fig. 1). Notably, no aquaporin/TIP variants were identified.
Here we keep the focus on vacuolar targeting and processing, and note that VSRs or RMRs and Asn-specific cysteine proteases (often referred to as vacuolar processing enzyme and aleurain) were not seen in the vacuole sap or the pellet. This indicates either that these proteins are of low abundance or that very little of them enters the vacuole. In contrast, two other classes of proteases were seen but only among insoluble proteins, i.e. the aspartate proteases (AP or phytepsin) DQ241852 and TC166915, and both the A and B subunits of two mitochondrial processing peptidases (MPPs). A sequence alignment and observed tryptic peptides of the two potato phytepsins are shown in Fig. S3. The data, although preliminary, indicate that the potato phytepsins have N-terminal and internal propeptides removed similarly to vacuolar barley phytepsin.
Proteolytic processing of soluble vacuolar proteins
We have used purified vacuoles to study the post-translational processing and integrity of potato tuber storage proteins. The proteome of purified potato tuber vacuoles is dominated by rather few stress-storage families typical of protein storage vacuoles (Table 1; Tables S1 and S6). There is a remarkable absence of the glycolytic and proteolytic enzymes typical of lytic vacuoles. We have chosen to focus on the six dominant protein super families of vacuole sap and identified a multitude of genetic variants (Table 1). The mass spectra of peptides from trypsin and chymotrypsin digestions with expect values < 0.05 matched to predicted semi-tryptic and semi-chymotryptic peptides of all known potato proteins documented an abundance of differentially truncated and nicked molecular protein forms of each genetic variant in the super families (Figs 4, 5 and 6A–C). These molecular forms are most likely the result of true post-translational proteolytic processing and not artefacts of the slow and harsh purification protocols applied to some of the protease inhibitors in the past. Here soluble vacuole proteins dominated by protease inhibitor families were prepared rapidly in gentle conditions. They stayed in their native environment, the vacuole, until cold centrifugation, followed immediately by gel filtration at physiological conditions. Protein fractions were denatured by reduction and carboxymethylation prior to specific in-solution digestion and mass spectrometric peptide sequencing. Therefore, our results reflect the true repertoire of post-translationally processed mature proteins of the potato tuber vacuole. This result was unanticipated knowing the very broad inhibitory activity and abundance of the protease inhibitor super families present in the tuber [1,4]. However, it might be understood in terms of prevacuolar recognition and processing.
The proteome data show that most vacuole pro-proteins (i.e. the translated protein after removal of the ER signal peptide) have N- and C-terminal segments and some loops accessible to proteolytic processing. The extensive nicking of the surface loop positions 102–116 in pat3-k2 only, which is not protected by a large complex N-linked glycan at position 115, illustrates this point. In all native folded proteins, unstructured or flexible termini and loops are exposed to proteolytic attack, whereas the internal well-packed polypeptide chain is protected against proteolysis. This means that such flexible segments will also be exposed to VSRs and can point to the location and characteristics of novel VSDs. On import to the ER, proteins have their ER signal peptides removed. Furthermore, in the ER disulfide bridges are formed and the proteins are fully folded. Glycan chains, if present in a protein, are also attached and preliminarily processed. Final processing of complex N-linked glycans by mannosidases and glycosyl transferases takes place in the Golgi compartments [31,32].
The data provided solid documentation of a multitude of cultivar-specific storage protein variants. In addition, many known vacuole proteins and novel proteins were identified (Tables S1 and S6). The novel proteins might be imported by an autophagic pathway or be non-vacuolar impurities. No vacuolar sorting receptors (VSR or RMR) were seen, which supports their recirculation to the Golgi from a prevacuole [33,34]. Two classes of proteases were significant among insoluble vacuole proteins and therefore are candidates for VPEs, i.e. two typical vacuolar phytepsins (Fig. S3), and mitochondrial processing peptidase A and B subunits.
C-terminal targeting sequences
The specific binding of a receptor and its complementary target VSD might be similar to specific protease–protease inhibitor or protease–protein substrate interactions. A ctVSR most probably has a binding site recognizing the negatively charged free alpha carboxylate of the pro-protein target and the particular biophysical properties of nearby ct residues. Figure 6 shows ct-propeptides terminated by -Ile-Gly-Leu-COO− (CPI), -Pro-Ala-Met-COO− (PI 2) and -Ala-Ser-Tyr-COO− (pat). The ct-peptide Ser-Phe-Lys-Gln-Val-Gln-COO− of KPI B proteins can also direct to vacuolar import in tobacco BY-2 cells. No C-terminal truncations were seen among KPI B proteins, however. The common properties of a putative C-terminal vacuolar recognition target in potato might be described as small–large hydrophobic-COO−. The last residue, hydrophobic-COO−, is the most likely core of recognition. A salt bridge between the carboxylate and a probable basic residue of the VSR will weaken on acidification and release the bound target. The crystal structure of the carboxypeptidase–CPI complex might provide a model for C-terminal target binding by a ctVSR, while the overall structures of carboxypeptidase and ctVSR are unrelated.
The potato ct-propeptides are similar to those documented to be functional ctVSD in the subunits of soybean β-conglycinin, i.e. exposed unstructured C-terminal sequences ending with either Ala-Phe-Tyr or Ala-Leu-Tyr residues, or bean β-phaseolin subunits ending with Phe-Val-Tyr. The potato ct-propeptides also share their large hydrophobic-COO− with vacuolar Arabidopsis peroxidases which have C-terminal Met or Ile [9,37]. Published data on ctVSD indicate some differences between monocots and dicots, and minor differences among dicots, probably reflecting minor structural differences and affinities of the binding sites of receptors.
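The 'small–large hydrophobic-COO−' description of the potato ct-propeptides can be written as a simple test on the last two residues. The sketch below is illustrative only: the residue classes (Gly, Ala, Ser as 'small'; Leu, Ile, Met, Phe, Tyr, Trp as 'large hydrophobic') are assumptions chosen to match the examples in the text, and the function name is hypothetical.

```python
# Residue classes are assumptions of this sketch, not definitions from the study.
SMALL = set("GAS")
LARGE_HYDROPHOBIC = set("LIMFYW")


def matches_ct_pattern(c_terminal_sequence: str) -> bool:
    """True if the sequence ends in small, then large hydrophobic, then COO-."""
    seq = c_terminal_sequence.upper()
    if len(seq) < 2:
        return False
    return seq[-2] in SMALL and seq[-1] in LARGE_HYDROPHOBIC


# C-terminal sequences quoted in the text (one-letter code).
examples = {
    "CPI": "GAMAIGL",
    "PI 2": "NMYPAM",
    "pat": "ASY",
    "KPI B": "SFKQVQ",
}
for name, seq in examples.items():
    print(name, seq, matches_ct_pattern(seq))
```

With these assumed classes the CPI, PI 2 and pat propeptides match, whereas the KPI B ct-peptide does not, which agrees with the absence of C-terminal truncation among KPI B proteins. The β-conglycinin and β-phaseolin endings share only the final large hydrophobic residue.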
Internal targeting sequences
The laboratory of Nakamura has demonstrated that internal NPIR-like sequences in sweet potato and potato tuber KPI proteins will target a pro-protein to the vacuole independently of location within the protein chain . This type of exposed target sequence will be recognized by an ssVSR different from the ctVSR recognizing ctVSD. Hydrophobic isoleucine or leucine has been shown to be the core of recognition of this type of ssVSD . In all KPI clades several variants of the NPIR-like target are seen near the N-terminus (Fig. 4A, residues 29–33) and, in addition, near the C-terminus in the KPI A and B clades (Fig. 4B, residues 234–238).
The ssVSR recognition of an NPIR-like loop or other exposed protein loops or turns might be comparable with the specific binding mode of a protease and protease inhibitor, e.g. the binding sites of chymotrypsin–eglin c or subtilisin–eglin c [39,40]. In these complexes the loop containing Leu89 of eglin c (position 89 in Fig. 5) is the core of recognition and has a nearby proline residue in eglin c and in all PI 1 variants. In general, the probability of a turn in a folded protein chain is increased by the presence of proline (P), glycine (G), aspartic acid (D), asparagine (N) and serine (S). These residues are predominant near the experimentally verified isoleucine or leucine core of the KPI vacuolar target sequences (Fig. 4). In fact, we propose that an NPIR-like ssVSD is just a special case of an exposed turn-shaped flexible vacuolar target sequence (often referred to as a ‘physical structure’ VSD), which has a hydrophobic core and is complementary in structure to the binding site of the corresponding receptor. (The ssVSD term should change its meaning from ‘sequence specific’ to ‘structure specific’.)
CPI might be imported to the vacuole via both the ctVSD (see above) and the ssVSD pathways. Figure 6 shows two forms of ct-propeptides of six or seven residues, and a multitude of nt-propeptides from 27 to 32 residues in size. We propose that the Leu37-Pro38 of CPI might function as the core of a novel type ssVSD in agreement with the analysis in the previous paragraph.
For PI 1 variants (Fig. 5) no sequence has been proven to function as a vacuolar target so far. In line with the above characteristics of an ssVSD, we propose that the hydrophobic core residue might be conserved Ile34 or Leu36, as there are Pro31 and other turn residues (S, D, G) nearby. The loop of PI 1 variants, residues 85–90, predicted to inhibit proteases in analogy with the homologous eglin c (Fig. 5), has similar characteristics and cannot be excluded. However, this is less likely as no proteolysis was seen in this loop in the data set, indicating insufficient flexibility and exposure also for ssVSR binding.
All PI 2 variants except PI 2-k7 have ct-propeptides compatible with ct targeting, as discussed above (Fig. 6). However, PI 2 variants are also cleaved in a variable manner at the N-terminus, before Ala31 or Lys32. The N-terminal propeptides have none of the turn or loop characteristics we ascribed to ssVSD. (Still, it cannot be excluded that a novel type of positively charged nt target sequences might be recognized by a specific ntVSR in potato in analogy to the negatively charged ct target sequences.)
Import from cytosol?
Lox are abundant among the soluble vacuolar proteins as indicated by the absorbance at 310 nm seen at 90 kDa in Fig. 2, which we assign to the iron-containing active site of lox. Several lox variants have high protein scores (Table 1; Table S6), although the sequence coverage was insufficient to evaluate protein nicking. Prediction algorithms found no ER signal peptide or targeting signals (Fig. S2). A ct-propeptide is excluded because the conserved C-terminal isoleucine of all lox variants provides the free alpha carboxylate as ligand to the active site iron. This site is buried in the structure. The present data support that potato tuber lox are associated with the vacuole and have abundance and genetic variability in common with other vacuolar storage proteins. Direct import of lox from the cytosol has been demonstrated in soybean leaves, where paraveinal mesophyll vacuoles accumulate vegetative storage proteins [44,45]. Immunocytochemical microscopy showed that lox was translocated directly from the cytosol at the vacuolar membrane and also stained protein clouds inside the vacuole. In contrast, the Golgi apparatus showed no lox antibody staining. We propose that potato vacuolar lox is imported similarly. The molecular mechanism of this translocation still requires clarification.
Properties and function of potato tuber vacuolar processing protease
The present data can be seen as an extensive study of the enzymatic properties of potato tuber vacuolar processing protease(s). First, the data show that the protease or proteases can associate with exposed protein termini and loops and cleave most peptide bonds. Thus, the substrate specificity must be very low. Second, either the catalytic rate or the enzyme concentration must be high to explain the observed extensive processing of all storage proteins, except maybe for lox variants. Significant proteolysis in the vacuole lumen appears to be unlikely as the combined protease inhibitor profile of the vacuole sap proteins KPI, PI 1, PI 2 and CPI is very broad and extends to all classes of endoproteases, including serine, cysteine and aspartate proteases, except maybe for metallo endoproteases. Thus the prevacuole or multi-vesicular bodies, where processing proteases and inhibitory storage pro-proteins might still be less mixed, appear to be the major site of proteolytic processing. Pyroglutaminylation of truncated CPIs (Fig. 6A) by glutamine cyclotransferase/glutaminyl cyclase (EC 2.3.2.5) appears to take place after proteolysis, and might occur in either the prevacuole or vacuolar lumen. However, we did not see glutamine cyclotransferase in our soluble or insoluble vacuole proteome data, indicating low abundance in the vacuole or presence in the prevacuole.
Aspartate proteases (phytepsins) and metallo proteases (MPP) scored with high significance among insoluble vacuole proteins (Table S6). Together with an Asn-specific cysteine protease referred to as VPE, which was not seen in the data, these three are candidates for truncating tuber storage proteins. Phytepsins have an acid pH-optimum and broad peptide bond specificity due to an extended substrate binding site [46–48], which match perfectly the unspecific cleavages observed in all protease inhibitor variants and pats. Thus potato phytepsins DQ241852 and TC166915 (Fig. S3) are the best candidates for processing. The potato MPPs were unexpected. However, an additional role as VPE deserves further analysis, because mitochondrial contamination of the vacuole preparations was very low (Table S4), and good protein scores were obtained even though some MPPs were only fragments in the potato protein database, thus giving fewer matching peptides and lower scores. The potato MPPs are very similar to other plant MPPs, consisting of an active B subunit and an inactive A subunit, and requiring an arginine residue (R) near the scissile peptide bond. This rather specific metallo protease might be active in the presence of vacuolar protease inhibitors. Although no Asn-specific cysteine proteases similar to Arabidopsis VPE were observed in the data, we cannot exclude that the favoured Asn-bond cleavage seen after Asn33 in KPI B variants and Asn42 in KPI C variants (Fig. 5A) might be the result of cysteine protease activity. But phytepsin also cleaves after Asn-residues and is the more likely vacuolar processing protease in developing potato tuber.
Our laboratory has extensive digital transcriptome data for Kuras potato tuber at different developmental stages. The numbers of transcripts per 100 000 of phytepsin TC166915 in mini tuber (8–9 weeks after planting), mature tuber (after flowering), tuber at harvest, dormant tuber, and tuber tissue under sprouting eyes were 11, 65, 16, 9 and 181, respectively, indicating that it plays an important role during energy storage and again during mobilization. Phytepsin DQ241852 has a similar expression profile of mRNA, although less abundant: 0, 9, 5, 2, 10. Also the MPP expression profiles are similar with rather low counts (DQ284488, 0, 14, 5, 6, 2; TC165357, 0, 3, 0, 0, 2). All VPE transcripts show increasing copy numbers with a ∼ 10-fold increase during sprouting over previous stages, suggesting a major functional role for VPE during tuber mobilization (TC160874, 2, 7, 40, 80, 966; TC166220, 0, 0, 7, 13, 101; TC164239, 0, 0, 1, 1, 6). The relationship between transcript number and protein concentration is unknown as mRNA stability and translational rates differ among transcripts. Altogether, we conclude that phytepsin is the major VPE in protein storage, possibly together with MPP, whereas the VPE cysteine protease seems more important in protein mobilization during sprouting.
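The comparison of sprouting with earlier stages can be made explicit with a short Python sketch. It simply tabulates the transcript counts quoted above and divides the sprouting count by the highest count at any earlier stage. The dictionary layout, the labels and the function name are illustrative assumptions, and the ratio chosen here is only one of several reasonable ways to express the increase.

```python
# Transcripts per 100 000 at five stages, copied from the text above.
STAGES = ("mini tuber", "mature tuber", "harvest", "dormant", "sprouting")

COUNTS = {
    "phytepsin TC166915": (11, 65, 16, 9, 181),
    "phytepsin DQ241852": (0, 9, 5, 2, 10),
    "MPP DQ284488": (0, 14, 5, 6, 2),
    "MPP TC165357": (0, 3, 0, 0, 2),
    "VPE TC160874": (2, 7, 40, 80, 966),
    "VPE TC166220": (0, 0, 7, 13, 101),
    "VPE TC164239": (0, 0, 1, 1, 6),
}


def sprouting_fold_change(counts):
    """Ratio of the sprouting count to the highest count at any earlier stage."""
    earlier_peak = max(counts[:-1])
    if earlier_peak == 0:
        return float("inf")  # no earlier expression to compare against
    return counts[-1] / earlier_peak


fold_changes = {name: sprouting_fold_change(c) for name, c in COUNTS.items()}
```

With these numbers the three VPE transcripts give ratios of roughly 12, 8 and 6, whereas phytepsin TC166915 gives a ratio below 3 and the MPP transcripts give ratios below 1.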
The extensive unspecific truncations documented in the present data on mature soluble vacuolar storage proteins raise the question of the biological purpose of VPEs during protein storage. Truncations cause a significant loss of storage protein mass. We propose that proteolysis works in conjunction with cargo release from the receptor, possibly at a lower pH, and that chopping up of a VSD will prevent recurrent binding to the receptor and thus permit the free receptor to return for a new round of vacuolar protein transfer.
The isolation of potato tuber vacuoles and protein chemical identification of the proteolytic maturation of protein storage families, in addition to the recent release of the first potato genome model, invite classic molecular imaging analyses of vacuolar targeting in potato tuber. The search for a common model of dicot vacuolar biogenesis and function should include tuber besides leaf, seed at various stages and root tissue.
Preparation of tuber vacuoles
Field-grown mature starch potato cv. Kuras was obtained from AKV-Langholt, Gravsholtvej 92, DK-9310 Vodskov, Denmark. Tubers were used either freshly harvested or stored at 5 °C or 20 °C in the dark before use. Tubers (100 g) were cut into 2 cm cubes, mixed 1 : 1 w/v with cutting buffer [0.5 M mannitol, 1 mM KCl, 1 mM EDTA, 0.25% (w/v) polyvinylpyrrolidone, 2.5 mM dithiothreitol (DTT), 10 mM Hepes, pH 7.5] and subjected to the shearing forces of a domestic rotary chopper (Krupp’s model 708A, 270 W) for 5–10 s at 10 °C. This mild treatment disrupted ∼ 25% of the plant cells with minor disruption of the organelles. The crude homogenate (125 mL) was filtered through a double layer of Miracloth (Merck-Biosciences, Nottingham, UK) and centrifuged at 2000 g for 10 min. The pellet was suspended in 5 mL cutting buffer containing 25% Ficoll 400 (Sigma-Aldrich, St Louis, MO, USA) and subjected to step density gradient centrifugation under layers of 3 mL 5% and 1 mL 0% Ficoll in the buffer. After centrifugation at 650 g for 30 min at 4 °C vacuoles emerged at the interface of 0% and 5% Ficoll. Isolated vacuoles were washed in the buffer without Ficoll and collected by centrifugation. They were easily identified by bright-field and phase-contrast microscopy (DMR, Leica Microsystems, Wetzlar, Germany) after staining with 0.1% (w/v) neutral red (Sigma-Aldrich) in NaCl/Pi at room temperature for 30 min. Yields were 1–2 mg protein per preparation from 100 g tuber.
Protein and enzyme assays
All analyses were carried out in triplicate, if necessary after desalting on 5 mL Sephadex G25M columns to remove interfering substances like DTT. Protein concentration was determined by the bicinchoninic acid assay using bovine serum albumin as reference. Alcohol dehydrogenase (EC 1.1.1.1) and cytochrome c oxidase (EC 1.9.3.1) were assayed according to MacDonald and Rees and Rasmussen and Møller, respectively. α-Mannosidase (EC 3.2.1.24) was assayed as previously described [4,55]. α-Mannosidase activity was calculated using ε405 = 18.5 mM−1·cm−1 for released p-nitrophenol.
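The conversion from absorbance to released p-nitrophenol follows the Beer–Lambert law, c = A/(ε·l). The Python sketch below illustrates the arithmetic, taking the extinction coefficient as 18.5 mM−1·cm−1 (that is, 18 500 M−1·cm−1), which is an assumption about the intended unit. The 1 cm path length, the assay volume, the incubation time and the function names are likewise hypothetical values chosen for illustration; they are not the assay conditions used in this work.

```python
EPSILON_405 = 18.5  # mM^-1 cm^-1 for p-nitrophenol (unit assumed, see lead-in)


def pnp_released_nmol(a405, volume_ml, path_cm=1.0):
    """Amount of p-nitrophenol (nmol) from absorbance at 405 nm via Beer-Lambert."""
    conc_mm = a405 / (EPSILON_405 * path_cm)  # mM, i.e. micromol per mL
    return conc_mm * volume_ml * 1000.0       # micromol -> nmol


def activity_nmol_per_min(a405, volume_ml, minutes, path_cm=1.0):
    """Average release rate (nmol per min) over the incubation time."""
    return pnp_released_nmol(a405, volume_ml, path_cm) / minutes


# Hypothetical reading: A405 = 0.37 in a 1 mL assay after 10 min incubation.
example_amount = pnp_released_nmol(0.37, 1.0)
example_rate = activity_nmol_per_min(0.37, 1.0, 10.0)
```

For the hypothetical reading above, an absorbance of 0.37 corresponds to 0.02 mM p-nitrophenol, i.e. about 20 nmol in 1 mL, or about 2 nmol·min−1.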
Fractionation of vacuolar proteins
Vacuoles were ruptured by osmotic shock and freezing in gel filtration buffer (50 mM sodium phosphate, 40 mM NaCl, pH 7.0) and then separated into soluble vacuolar sap and an insoluble pellet by ultracentrifugation at 100 000 g for 90 min at 4 °C. Supernatant, 1–5 mg protein in 250 μL in experiments 09-2008 and 10-2005 respectively, was subjected directly to Superdex 200 gel filtration, whereas in experiment 10-2004 the sap was concentrated by dry Sephadex G25. Superdex 200 gel filtration of soluble vacuolar sap was carried out as previously described. The insoluble pellet was washed in 50 mM sodium phosphate, 300 mM NaCl, pH 8.0, buffer and then suspended in 200 μL of this buffer containing 0.1% Triton X-100. Membrane associated proteins were released by gentle shaking for 1 h at 4 °C and collected in the supernatant after centrifugation at 20 000 g for 1 h at 4 °C. Protein was precipitated by seven volumes of ice-cold acetone overnight at −20 °C. Samples were not boiled before SDS/PAGE.
Samples from gel filtration pools were precipitated with seven volumes of ice-cold acetone overnight at −20 °C. The pellets were dissolved in 50 μL of reducing buffer (8 M urea, 20 mM EDTA, 20 mM DTT, 200 mM Tris, pH 8.5) by sonicating for 10 min and incubating for 30 min at 25 °C. Samples were then alkylated for 30 min in the dark after adding 14 μL 0.5 M iodoacetic acid in 0.5 M Tris, pH > 8. Carboxymethylated proteins were precipitated with six volumes of ice-cold ethanol overnight at −20 °C and washed in 70% ethanol. Pellets were dissolved in 100 μL 50 mM NH4HCO3, pH 8.0, and digested at 37 °C for 60–90 min by 8 μL of 10 ng·μL−1 of either modified sequencing grade porcine trypsin (Promega, Madison, WI, USA), or modified bovine chymotrypsin (Princeton Separation Inc., Adelphia, NJ, USA). The reactions were stopped by adding 10 μL of 5% formic acid prior to LC-MS/MS or storage at −20 °C. SDS/PAGE separated protein bands were in-gel digested by trypsin as previously described.
Nanoflow LC mass spectrometry
Aliquots of the peptide mixture were analysed by a nanoflow capillary liquid chromatography system (Ultimate/Switchos/Farmos, LC-Packings, Amsterdam, Netherlands) interfaced directly to an ESI Q-TOF tandem mass spectrometer (MicroQ-TOF, Bruker Daltonics, Bremen, Germany) using a vented column setup. Peptides were separated on a 10 cm long 50 μm id-custom-packed C18 reversed-phase column of Reprosil-Pur C18-AQ, 3 μm particle size (Dr Maisch GmbH, Ammerbuch-Entringen, Germany) using a 35 min gradient of 0%–45% acetonitrile (v/v) containing 0.6% (v/v) acetic acid and 0.005% (v/v) heptafluorobutyric acid in MilliQ water, at a flow rate of 200 nL·min−1. Mass spectra obtained by automated data-dependent acquisition modes were processed using DataAnalysis version 2.4 (Bruker Daltonics). A Waters Q-TOF Ultima API MS instrument was used for the analysis of tryptic peptides reported in Table S5.
Protein and peptide databases translated from all available potato EST sequences (DFCI Potato Gene Index, Release 12.0, 24 July 2008, at http://compbio.dfci.harvard.edu/tgi/cgi-bin/tgi/gimain.pl?gudb=potato) and from Kuras-specific EST sequences and ∼ 500 full-length cDNA submitted to GenBank with accessions starting with DQ (J. Emmersen, K. G. Welinder and K. L. Nielsen, unpublished) were created as described by Emmersen. The potato protein database StGI12-Kuras_29052009.txt and Mascot searches in Tables S2, S3 and S6 are accessible at http://www.solanumdata.dk and the original data at http://www.ebi.ac.uk/pride/ accession 17707. The Matrix Science Mascot results (dat format) were converted to PRIDE xml using the Pride converter and submitted to EBI-PRIDE according to current guidelines.
Mass lists were merged from several ESI Q-TOF experiments and analysed by Mascot 2.2 MS/MS ion search (Matrix Science, London, UK) against the potato protein database. The peptide and peptide fragment mass accuracy were set to 0.1 Da, the threshold score for proteins to 50 for Mascot, and peptides with significance P > 0.05 were rejected prior to further analysis in Mascot. Searches allowed mass spectra to be assigned to semi-tryptic and semi-chymotryptic peptides for unbiased identification of post-translational proteolytic processing.
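The value of semi-specific searches for detecting proteolytic processing can be illustrated by classifying a peptide according to whether each of its ends coincides with a protease cleavage site. The Python sketch below does this for trypsin under a deliberately simplified rule (cleavage after Lys or Arg, with no proline exception, and protein termini counted as valid ends). It is not the Mascot implementation; the protein sequence and the function names are hypothetical.

```python
def tryptic_ends(protein, peptide):
    """Return (n_ok, c_ok): whether each peptide end matches a simplified tryptic site."""
    start = protein.find(peptide)  # first occurrence only, sufficient for a sketch
    if start < 0:
        raise ValueError("peptide not found in protein")
    end = start + len(peptide)
    # N-terminal end is valid at the protein start or directly after K/R.
    n_ok = start == 0 or protein[start - 1] in "KR"
    # C-terminal end is valid at the protein end or if the peptide ends in K/R.
    c_ok = end == len(protein) or protein[end - 1] in "KR"
    return n_ok, c_ok


def classify_peptide(protein, peptide):
    """Label a peptide as fully tryptic, semi-tryptic or non-specific."""
    n_ok, c_ok = tryptic_ends(protein, peptide)
    if n_ok and c_ok:
        return "fully tryptic"
    if n_ok or c_ok:
        return "semi-tryptic"
    return "non-specific"


# Hypothetical protein sequence, used only to exercise the functions.
PROTEIN = "MASKLEDRGTTPKAAL"
labels = {p: classify_peptide(PROTEIN, p) for p in ("LEDR", "EDR", "EDRG", "AAL")}
```

In this toy case the peptide EDR has a tryptic C-terminus but a non-tryptic N-terminus, the kind of end that a fully tryptic search would miss and that may indicate cleavage by an endogenous protease.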
Protein masses and sequence coverage were calculated using Protein Coverage Summarizer software version 2.0 (http://omics.pnl.gov/software). ER signal and other known propeptides were not removed prior to calculation. BioEdit sequence alignment editor version 7.0.9 (Isis Pharmaceuticals Inc., Carlsbad, CA, USA) was used for sequence alignment. Phylogenetic trees were constructed with minimum evolution distance analysis of multiple alignments using Molecular Evolutionary Genetics Analysis (MEGA) software version 4.
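Sequence coverage is the percentage of residues in a protein sequence that fall within at least one identified peptide. The following Python sketch shows that calculation in its simplest form. It is not the algorithm of the software named above; the function name and the example sequence are hypothetical. Because signal peptides and propeptides were not removed, the denominator in such a calculation is the full precursor length.

```python
def sequence_coverage(protein, peptides):
    """Percentage of protein residues covered by at least one matching peptide."""
    covered = [False] * len(protein)
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:  # mark every occurrence of the peptide
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein)


# Hypothetical 16-residue sequence with two overlapping peptides.
example_coverage = sequence_coverage("MASKLEDRGTTPKAAL", ["LEDR", "EDRG"])
```

In this toy example the two overlapping peptides cover five of 16 residues, giving 31.25% coverage.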
The major gene or protein families of potato tuber have significant sequence variation among cultivars or varieties (present work). Extending established nomenclature for Kunitz protease inhibitor clades KPI A, KPI B, KPI C, we have previously identified the novel clades KPI K (identified first in cv. Kuras) and KPI M (miraculin-like). In the present work additional clades were found and named KPI J and KPI N (Fig. 3). Within most clades a number of genetic variants were distinguished by unique peptides. In the order of their identification in Kuras proteome studies, these sequence variants have been named by adding a variant name -k1, -k2 to the clade name, e.g. KPI C-k1, KPI C-k2, at present up to KPI C-k9 (Table 1). The unrelated potato tuber protease inhibitor super families, protease inhibitor 1 (PI 1), protease inhibitor 2 (PI 2) and carboxypeptidase inhibitor (CPI), have similar variant names added, e.g. CPI-k1. Four clades of patatin proteins have been identified: pat1, pat2, pat3 and pat4.
We thank Associate Professor Kåre Lehmann Nielsen and Dr Annabeth Høgh Petersen, Aalborg University, Aalborg, for calculating transcriptome data, Professor Ole Nørregaard Jensen, Southern Danish University, Odense, for access to a Waters Q-TOF Ultima API MS instrument, and Charlotte Sten for technical assistance. The work was supported by grant 274-06-0331 (to MJ) from the Danish Research Council for Technology and Production, and grant 2052-03-0022 (to KGW) from the Danish Research Agency.