Tail-anchored (TA) proteins function in key cellular processes in eukaryotic cells, such as vesicle trafficking, protein translocation and regulation of transcription. They anchor to internal cell membranes by a C-terminal transmembrane domain, which also serves as a targeting sequence. Targeting occurs post-translationally, via pathways that are specific to the precursor, which makes TA proteins a model system for investigating post-translational protein targeting. Bioinformatics approaches have previously been used to identify potential TA proteins in yeast and humans, yet little is known about TA proteins in plants. The identification of plant TA proteins is important for extending the post-translational model system to plastids, in addition to general proteome characterization, and the identification of functional homologues characterized in other organisms. We identified 454 loci that potentially encode TA proteins in Arabidopsis, and combined published data with new localization experiments to assign localizations to 130 proteins, including 29 associated with plastids. By analysing the tail anchor sequences of characterized proteins, we have developed a tool for predicting localization and estimate that 138 TA proteins are localized to plastids.
Distributing proteins to the correct cell compartment is a crucial function in eukaryotic cells, and is best understood in the case of endoplasmic reticulum (ER) proteins targeted co-translationally via the signal recognition particle (SRP)-mediated pathway. In this case, a signal sequence is recognized by SRP as it emerges from the ribosome, and specific delivery of the protein to the ER is ensured by a series of tightly coordinated interactions between the ribosome nascent chain complex, SRP, SRP receptor and Sec61 translocon (1–4). However, all other protein targeting appears to occur after translation is complete, and the mechanisms that ensure specific targeting are less well understood. For example, the N-terminal signal sequences that direct proteins into the chloroplast and mitochondria are remarkably similar, and some signal sequences are able to direct targeting to both organelles (5). We are focussing on a specific subset of membrane proteins that are post-translationally targeted, termed tail-anchored (TA) proteins, as a model system for understanding the mechanisms that determine organellar localization.
TA proteins are defined by a single transmembrane domain (TMD) at the C-terminus, which acts both as a membrane anchor and as a targeting sequence (6,7). Because of its position, the tail anchor emerges from the ribosome only after translation is complete, which precludes co-translational membrane insertion. The N-terminus remains in the cytosol and usually forms the main functional domain, which can act in diverse cellular processes; examples include SNARE proteins, translocase subunits and cytochrome b5. TA proteins are inserted into several different internal membranes, including those of the ER, mitochondria, chloroplasts and peroxisomes, yet have broadly similar tail anchors. Studies of specific TA proteins have identified features responsible for targeting to different organelles, including the length of the TMD and charged residues in the flanking regions (8–11). Specific motifs in the tail anchor of PEX26 are recognized by PEX19 to promote its peroxisomal targeting, but it is not known whether this mechanism is widespread, since few peroxisomal TA proteins are known (12). Despite these studies of the targeting sequences of specific TA proteins, it remains difficult to predict the localization of a novel TA protein.
Part of the difficulty underlying our limited predictive power is the complexity of the pathways responsible for TA protein targeting. For targeting to the ER, three distinct classes of targeting factor have been identified: SRP (13), TRC40/Asna1/Get3 (14–16) and Hsc70 (17,18). More hydrophobic tail anchors appear to be targeted by SRP or TRC40, which have cognate membrane receptors at the ER (13,15), suggesting a defined mechanism for specific targeting. However, Hsc70 is important for the ER targeting of less hydrophobic tail anchors, and has also been implicated in the targeting of proteins to mitochondria and chloroplasts. It is not clear whether a molecular chaperone such as Hsc70 could promote specific targeting, although chaperone receptors have been identified at the mitochondrial and chloroplast membranes. These receptors are Tom70 in mammalian and yeast mitochondria (19), and the Toc64 family in plants; one member, mtOM64, is located in the mitochondrion (20), while Toc64 is located at the chloroplast and is able to bind precursors complexed with Hsp90 (21,22). Whilst there is clear evidence that Tom70 and mtOM64 function in protein insertion at the mitochondrial outer membrane (19,23), gene knockouts of Toc64 do not appear to have any phenotypic effect (24). A further mechanism for achieving specificity is the sterol content of the phospholipid bilayer, which may play a crucial role in preventing the ER isoform of cytochrome b5 from targeting to the mitochondrial membrane (25). The diversity of pathways targeting TA proteins to the ER membrane alone suggests that identifying the tail anchor features responsible for localization will be difficult.
TA proteins can be identified via bioinformatics approaches using a simple definition of the tail anchor. This has been applied to yeast (26) and humans (27), but has not been attempted for plants. Apart from characterizing the plant proteome, identifying plant TA proteins is important because plastids, which are absent from yeast and animal cells, add complexity to targeting. Furthermore, using TA proteins as a model system in plants could help unravel the general mechanisms that make post-translational targeting to mitochondria and chloroplasts specific. In particular, the role of molecular chaperones in post-translational targeting can be addressed most effectively using plant TA proteins. An important step towards using plant TA proteins as a model system is to determine their localization. Data are currently very limited for plastids: the only documented plastidial TA proteins are an isoform of cytochrome b5 (10) and Toc34 (28,29), a receptor of the outer envelope translocase. Therefore, a key aim of our study was to identify plastidial TA proteins.
Our approach was to search the Arabidopsis protein database for TA proteins meeting a basic definition, and to match the results with new and pre-existing experimental evidence of localization. In this study, we identified 454 loci encoding TA proteins in Arabidopsis and were able to assign localizations to 29% of these, including 11 with strong evidence for plastid localization. From this dataset, we derived a localization prediction tool based on the tail anchor sequence and used it to predict the localization of the remaining uncharacterized proteins.
Results and Discussion
TA protein identification
The approach to identifying TA proteins in Arabidopsis was to use a robust definition based on the position of the tail anchor and its functional effect on targeting and membrane insertion. A post-translational mode of membrane insertion is obligatory if the tail anchor is still contained within the ribosomal exit tunnel during chain termination. As the ribosomal exit tunnel typically holds a chain of about 30 residues (30), a tail anchor remains at least partly shielded from external targeting factors until this much C-terminal sequence has been synthesized; the maximum sequence permitted C-terminal to the tail anchor was therefore set to 30 residues. A typical TMD is 20 residues long, so we defined a TA protein as one containing a TMD within the C-terminal 50 residues. TMDs were identified and their positions defined by TMHMM (Transmembrane prediction using Hidden Markov Model) analysis (31). Additional constraints were the absence of any other predicted TMD and of secretory signal peptides or N-terminal signal sequences that would direct the protein inside organelles; thus, we define TA proteins as those exposed to the cytosol, excluding any that insert into the internal membranes of organelles. Well-established algorithms were used to identify such signal sequences [Figure 1; (32–34)]. The list of potential TA proteins was ordered according to the confidence with which each protein fits the criteria. The search identified 454 loci encoding 520 splice variants, which represents over 1% of Arabidopsis loci (Figure 2A). This number is similar to the 411 TA proteins predicted in humans (27), but much greater than the 55 TA proteins predicted in Saccharomyces cerevisiae (26).
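The positional part of this definition reduces to a simple filter over predicted TMD coordinates. The sketch below is a minimal illustration in Python (3.9 or later), assuming TMD spans have already been predicted (for example by TMHMM) and parsed into 1-based inclusive (start, end) pairs. The `Candidate` record and its field names are hypothetical, and the separate signal-sequence predictions are collapsed into a single boolean for brevity.

```python
from dataclasses import dataclass

# Window from the definition in the text: a ~20-residue TMD plus up to
# ~30 residues that can remain inside the ribosomal exit tunnel.
C_TERMINAL_WINDOW = 50


@dataclass
class Candidate:
    """Hypothetical record for one predicted protein (field names are illustrative)."""
    locus: str
    length: int                    # number of residues in the protein
    tmds: list[tuple[int, int]]    # predicted TMD spans, 1-based inclusive (start, end)
    has_n_terminal_signal: bool    # secretory signal peptide or organellar presequence


def is_tail_anchored(c: Candidate, window: int = C_TERMINAL_WINDOW) -> bool:
    """True if the protein meets the basic positional TA definition."""
    if len(c.tmds) != 1:
        return False  # zero TMDs, or an additional TMD elsewhere
    start, end = c.tmds[0]
    window_start = c.length - window + 1  # first residue of the C-terminal window
    if start < window_start or end > c.length:
        return False  # TMD begins upstream of the C-terminal window
    return not c.has_n_terminal_signal  # must face the cytosol rather than be imported


def residues_after_tmd(c: Candidate) -> int:
    """Number of residues C-terminal to the single TMD."""
    return c.length - c.tmds[0][1]


if __name__ == "__main__":
    examples = [
        Candidate("protein_A", 150, [(120, 140)], False),            # passes
        Candidate("protein_B", 150, [(20, 40), (120, 140)], False),  # two TMDs
        Candidate("protein_C", 150, [(80, 100)], False),             # TMD too far upstream
        Candidate("protein_D", 150, [(125, 145)], True),             # N-terminal signal
    ]
    for c in examples:
        print(c.locus, is_tail_anchored(c))
```

Requiring the whole TMD to lie in the last 50 residues, rather than only its end, mirrors the wording "a TMD within the C-terminal 50 residues"; a shorter-than-typical TMD can therefore be followed by slightly more than 30 residues and still pass.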
Of the 520 splice variants, pre-existing experimental data indicate a localization for 130 proteins, based on either microscopy of green fluorescent protein (GFP)-tagged proteins or mass spectrometry of purified organelles [SUBA; (35)]. Experimental localization assigns most (71) of these to the ER and secretory membranes, 27 to mitochondria, 29 to plastids and 3 to peroxisomes. The mitochondrial and plastidial proteins are expected to target directly to these organelles, but there is evidence that two peroxisomal proteins, APX3 and APX5, target via ER membranes (36). The remaining 390 proteins lack experimental evidence of their localization, and available prediction tools are not designed to assign the localization of TA proteins, because they rely mostly on N-terminal signal sequences. The localization of most TA proteins therefore has yet to be determined.
The TA proteins were then grouped according to their function as annotated by SUBA (SUB-cellular location database for Arabidopsis proteins) and TAIR (The Arabidopsis Information Resource); a function could be assigned with confidence to 173 of the 520 proteins. Some of these functional classifications are based on extensive characterization, whilst others are based on homology to characterized proteins. Of the unclassified proteins, some have no significant homology to characterized proteins, whilst others have homology to specific domains that is not sufficient to predict a function. The largest functional class of TA proteins (52) are the SNAREs, which are involved in vesicle trafficking throughout the secretory system. The search identified most (52), but not all, of the 64 SNARE proteins identified in Arabidopsis (37). This is largely because some SNARE proteins, such as SNAP25 homologues, lack TMDs, but also because a second TMD or an N-terminal targeting sequence was predicted for some SNAREs.
Membrane localization is also important in two other classes of TA proteins: ubiquitination (36 proteins), which regulates proteolysis and vesicle trafficking (38); and disease resistance (22 proteins), which depends on the transmission of signals across membranes (39). The PEN2 proteins are known to be tail-anchored at peroxisomes, where they play an important role in disease resistance [At2g44490 and At3g60120; (40)], but they were not identified in the search because TMHMM did not recognize their TMDs. Another large group of TA proteins (30) are transcription factors, many of which belong to the NAC-type (NAM, ATAF and CUC) group (reviewed in 41). A previous survey identified 190 membrane-bound transcription factors in Arabidopsis. These generally possess either one or three TMDs, and are released from the membrane by regulated proteolysis, permitting their relocation to the nucleus. TA proteins play a major role in all known eukaryotic protein translocation complexes; the 16 identified in this search are localized at the ER (Sec61β and Sec61γ) and mitochondria (TOM proteins and metaxin). RAMP4, another protein known to be present in the mammalian translocon, has two homologues in Arabidopsis (loci At1g27330 and At1g27350), which are present in this search but are annotated only as containing a RAMP4 domain. The chloroplast translocase proteins Toc33 and Toc34 are known to be tail-anchored (53), but do not appear in this search because a potential TMD is predicted in their N-terminal region. The final major group of TA proteins identified (12) are the heat shock proteins (HSPs). Their functions may be diverse, as different classes of HSPs are represented, including Hsp40 (At3g62190) and Hsp20 (At1g54400). The remaining classified proteins highlight the range of functions that TA proteins perform within cells, including numerous classes of enzymes. One of the best-studied families of TA proteins is cytochrome b5, which is represented in this search by five proteins. Cytochrome b5 isoforms are found at the ER and in mitochondria in mammals (42), and also in plastids in plants (10).
Overall, about a third of the TA proteins identified by this search (173 of 520) can be classified by function; the major classes are involved in vesicle trafficking, cell signalling and protein translocation. However, the range of functions is diverse, as many cytosolic processes benefit from defined membrane localization. Figure 2B shows detailed information for representative examples of TA proteins with different localizations and functions.
The remaining unclassified proteins are a resource for identifying further proteins involved in cellular processes typical of TA proteins. For example, these proteins could be searched using low-stringency criteria to detect members of groups such as the SNAREs and transcription factors. Such layered searches could greatly accelerate characterization of the entire proteome.
Experimental tests of TA protein localization
Plastidial TA proteins were expected to be under-represented among the characterized proteins in our survey, because TA protein research has focussed on ER and mitochondrial localizations in mammalian systems. We therefore reasoned that a significant proportion of the unclassified proteins would be resident in plastids. Of the 29 potential TA proteins (stage D in Figure 1) with experimentally determined plastidial localization, 7 are predicted to have no chloroplast transit peptide at all, and a further 9 have only a weak prediction of an N-terminal targeting sequence. To identify novel plastidial TA proteins, we chose unclassified proteins with no significant N-terminal signal prediction and tested their localization in plant cells by microscopy, expressing each as a fusion protein with an N-terminal fluorescent tag. Transient expression in tobacco leaves gave good expression levels and allowed clear visualization of the localization (Figure 3).
Toc34 was chosen as a well-characterized TA protein localized to plastids (43), and its subcellular localization correlates closely with chloroplast autofluorescence (Figure 3, panel 1). The plastid localization of cyb26 [plastidial cyt b5; (10)] was also confirmed (Figure 3, panel 2). The assay system therefore yields authentic localization of known plastidial TA proteins. At1g17780, an unclassified protein predicted to be tail-anchored and ranked high in the list, was tested in the same way. At1g17780 colocalizes with chloroplast autofluorescence, showing that it is localized to plastids (Figure 3, panel 3). At1g17780 is therefore a novel plastidial TA protein, and this result suggests that other plastidial TA proteins are likely to be present within this search list. To assess the potential for identifying novel mitochondrial TA proteins in plants, we used microscopy to test the location of At3g58840, for which mass spectrometry data suggested residence in the mitochondrial outer membrane. At3g58840 tagged with mGFP5 overlaps with the mitochondrial marker CPN-60 [Figure 3, panel 4; (44)], confirming the predicted mitochondrial localization. The peroxisomal protein APX3 accumulated correctly in peroxisomes (Figure 3, panel 5), showing that the expression system gives reliable localization to each of the target membranes that accept TA proteins. Another unclassified TA protein localized exclusively to the ER membrane (At2g27140; Figure 3D), showing that novel ER TA proteins also remain to be discovered. In addition, a protein assigned to the ER membrane by mass spectrometry was tested by microscopy and its localization confirmed (At1g72090; Figure 3E). The ability to discriminate reliably between localizations to different organelles is shown in Figure S1.
Thus, the available mass spectrometry data for TA protein localization agree well with the microscopy data and can be considered reliable for ER proteins. Overall, the localizations of TA proteins determined by microscopy agreed in every case tested with the SUBA database, which draws on microscopy or mass spectrometry data. We have also identified novel TA proteins targeted to chloroplasts, mitochondria and the ER.
Cell-free targeting assays
To enable the targeting analysis of a larger number of proteins and to provide an alternative assay, we established a method of cell-free targeting to chloroplasts and mitochondria. In this assay, radiolabelled precursors translated in a cytosolic lysate compete for the two types of organelle, and localization is determined by fractionation to separate them (Figure 4). Membrane insertion was confirmed by including carbonate washes. A further advantage of this system is that protein tags are not required, so any native targeting sequences are preserved. To validate the cell-free system, we used the plastidial TA protein Toc33. This control protein was almost exclusively localized to chloroplasts, confirming authentic targeting (Figure 4). Mitochondrial localization was confirmed for the established mitochondrial TA protein TOM20-3. When uncharacterized TA proteins were tested, At1g17 (At1g17780.1) and At3g63 (At3g63160.1) localized predominantly to chloroplasts, whereas At3g58 (At3g58840.1) and At3g21 (At3g21710.1) localized predominantly to mitochondria. This agrees with the localization determined by microscopy of yellow fluorescent protein (YFP)-tagged At1g17780 and At3g58840, showing that the competitive cell-free targeting assay is a reliable test for the targeting of TA proteins.
An important characteristic of TA proteins is their topology in membranes: they are oriented with the N-terminal domain exposed to the cytosol. This topology was tested in the cell-free system by treating the organelles, after incubation with the putative TA proteins, with the protease thermolysin. When At1g17 and At3g63 were tested in this way, the observed loss of full-length protein was consistent with a tail-anchored topology (Figure 4). The thylakoid membrane protein Lhcb1 (45) was unaffected by protease, confirming that the chloroplast interior was not accessible to protease. Similarly, the mitochondrial proteins At3g58 and At3g21 behaved as TA proteins, whilst the inner membrane protein At5g01340 (46) was inaccessible to protease. Taken together with the microscopy analysis, these results confirm the TA status of four TA proteins predicted by our bioinformatics survey, and establish the reliability of published localization data for seven TA proteins.
Analysis of tail anchors to predict localization
TA proteins with reliable localizations were used to identify sequence characteristics that influence targeting to three destinations: the ER, mitochondria and chloroplasts. As the tail anchor is primarily responsible for targeting, we focussed on the length and hydrophobicity of the TMD. Regions flanking the TMD have also been shown to play important roles in targeting (8–11), so their amino acid composition was also quantified.
Although there are no simple correlations between targeting and tail anchor sequence, there are clear biases in many characteristics (Figure 5). For example, the hydropathy (47) of the TMD is lower in mitochondrial TA proteins than in plastidial and ER TA proteins, and the number of amino acids after the TMD is lower in ER proteins. We reasoned that combining these biases across several characteristics could yield a tool for predicting TA protein localization from the tail anchor sequence. The fit of each characteristic to the norm for each localization was calculated as the number of standard errors from the mean. Each characteristic was weighted equally, except for TMD hydrophobicity and the number of residues following the TMD, which were given higher weightings to reflect their biological importance and the fact that each summarizes a longer stretch of sequence than the composition measures. This weighting substantially increased the success of the predictive tool for plastidial proteins. The number of residues in the TMD was excluded because it is already accounted for by the hydropathy analysis. Each TA protein was assigned to the localization from whose norms it showed the smallest total difference.
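The tail anchor characteristics can be computed directly from a sequence once the TMD boundaries are known. The Python sketch below assumes 1-based inclusive TMD coordinates and the 10-residue flanking windows described in Materials and Methods. The Kyte–Doolittle values are the published hydropathy scale, but the membership of the positive, negative, polar and aromatic residue classes is an assumption, since the text does not list it. The feature names are hypothetical.

```python
# Kyte & Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Assumed residue classes; the source does not define them.
RESIDUE_CLASSES = {
    "positive": set("KR"),
    "negative": set("DE"),
    "polar": set("STNQ"),
    "aromatic": set("FWY"),
}


def tmd_hydropathy(tmd: str) -> float:
    """Summed (not averaged) hydropathy, so longer TMDs score higher.

    Raises KeyError for non-standard residue letters.
    """
    return sum(KYTE_DOOLITTLE[aa] for aa in tmd)


def class_counts(segment: str) -> dict[str, int]:
    """Count the residues of each class in a flanking segment."""
    return {name: sum(aa in members for aa in segment)
            for name, members in RESIDUE_CLASSES.items()}


def tail_anchor_features(seq: str, tmd_start: int, tmd_end: int,
                         flank: int = 10) -> dict[str, float]:
    """Features for one TA protein; tmd_start and tmd_end are 1-based inclusive."""
    seq = seq.upper()
    tmd = seq[tmd_start - 1:tmd_end]
    n_flank = seq[max(0, tmd_start - 1 - flank):tmd_start - 1]
    c_flank = seq[tmd_end:tmd_end + flank]  # may be shorter than `flank`
    features: dict[str, float] = {
        "tmd_hydropathy": tmd_hydropathy(tmd),
        "length_after_tmd": len(seq) - tmd_end,
    }
    for side, segment in (("n", n_flank), ("c", c_flank)):
        for name, count in class_counts(segment).items():
            features[f"{side}_{name}"] = count
    return features


# Toy sequence: 10 residues, a 20-residue Leu TMD (positions 11-30), then RRDE.
print(tail_anchor_features("AAAAAAAAAK" + "L" * 20 + "RRDE", 11, 30))
```

Summing rather than averaging the hydropathy values is what lets TMD length drop out as a separate characteristic, as noted above.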
The prediction tool was first tested on the training set of proteins: it predicted mitochondrial proteins reliably, with an 81% success rate, and predicted ER proteins correctly in 62% of cases (Figure 6). The success rate for plastidial proteins was 55%, which most likely reflects the small number of plastidial proteins available to set norms for the standard characteristics. This small training set is likely to skew the norms towards characteristics that may not be relevant to targeting, but predictions based on these calculations are expected to improve as the experimental dataset of plastidial proteins grows. The localization of all putative TA proteins was predicted, and the totals were adjusted to reflect the success rate for each localization. This yields a prediction of 239 ER-targeted, 143 mitochondrial and 138 plastidial proteins. Although the localization predicted for individual proteins needs to be tested experimentally, these data provide a valuable resource for identifying the proteins most likely to target to a particular membrane. The estimated totals suggest that only a small proportion of mitochondrial and plastidial TA proteins have so far been identified experimentally. As the localization of more TA proteins is verified experimentally, it should be possible to increase the predictive power of this tool, especially for plastidial TA proteins. In addition, the limited number of proteins with known localization has made it impractical to test prediction accuracy on an independent dataset; the identification of new plastidial proteins will allow the accuracy of the prediction tool to be tested independently of the training set.
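Reading each success rate as the fraction of training proteins with a given known localization that the tool assigns to that localization (our interpretation of the text), the calculation can be sketched in Python as below. The labels are hypothetical and do not reproduce the study's data.

```python
from collections import Counter


def per_class_success(true_labels: list[str], predicted: list[str]) -> dict[str, float]:
    """Fraction of proteins from each known localization that were predicted correctly."""
    if len(true_labels) != len(predicted):
        raise ValueError("label lists differ in length")
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted) if t == p)
    return {loc: correct[loc] / n for loc, n in totals.items()}


# Hypothetical labels for illustration only.
truth = ["ER", "ER", "mitochondria", "mitochondria", "plastid"]
guess = ["ER", "mitochondria", "mitochondria", "mitochondria", "ER"]
print(per_class_success(truth, guess))
# {'ER': 0.5, 'mitochondria': 1.0, 'plastid': 0.0}
```

Computing the rate per localization, rather than as a single overall accuracy, keeps the small plastidial class from being masked by the much larger ER class.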
We have identified potential plant TA proteins by using a robust definition of tail anchor features in a database search. Function and localization could be assigned to a large proportion of these proteins using database annotations, and several novel TA proteins were characterized experimentally. Known TA proteins were generally identified by this survey, showing that the methodology is effective and reliable. However, some anomalies arose because of the difficulty of classifying proteins from sequence alone. The most important filter was the identification of TMDs, which falsely identified a second TMD in Toc34 (28) and failed to identify a known TMD in PEN2 (40). It may be valuable to refine this filter in future by applying different parameters or by generating a probability output. The other key issue is determining whether a protein has an N-terminal targeting sequence, which we have ranked by probability. The likelihood of a protein being situated in the cytosol therefore varies with its ranking, and a definitive judgement is not possible without experimental evidence. We have validated the prediction of seven TA proteins by a combination of in vivo and cell-free assays, which provides confidence in the prediction of TA proteins with the highest rankings.
Experimental evidence from the SUBA database was used to assign localization, and both microscopy and cell-free assays showed it to be reliable. These two approaches complement each other: the fluorescent tag has the potential to disrupt native targeting sequences, whereas the cell-free assay offers only a limited selection of membranes. Together, these two sets of data offer a robust means of determining TA protein status and localization.
A key finding is that plastids possess a significant number of TA proteins: there is strong evidence for 11, experimentally determined plastidial localization for 29 that meet the basic TA definition, and an extrapolated total of 138. These estimated totals suggest that current proteomic approaches need significant improvement to identify TA proteins in all organelles.
Although the localization predictions for plastidial proteins are not highly accurate, experimental localization for additional TA proteins should enhance the power of the predictive tool. With more data it may also be valuable to optimize the weighting of tail anchor characteristics. The TA proteins identified here will be valuable tools for understanding post-translational protein targeting, and in particular how plants are able to segregate proteins to mitochondria and plastids. Overall, we expect that the TA proteins identified here can be combined with other searches to advance our characterization of the plant proteome.
Materials and Methods
Cloning of expression plasmids
Primers were obtained from MWG Biotech, and restriction enzymes were from New England Biolabs. Phusion high-fidelity DNA polymerase (New England Biolabs) was used for all polymerase chain reactions (PCR). Modified binary vectors pVKH18-EN6 containing an N-terminal GFP tag (48) or eYFP tag (49,50) were kindly provided by Professor Chris Hawes (Oxford Brookes University). Arabidopsis genes were obtained as cDNA clones from TAIR. Genes of interest were amplified with primers containing BamHI and SacI restriction sites and cloned into the vectors via these sites.
Plant material and transient expression system
For Agrobacterium-mediated transient expression (49), 4-week-old tobacco (Nicotiana tabacum SR1 cv Petit Havana) plants grown in a growth chamber (make) at 21°C (14 h light, 10 h dark) were used. Briefly, each expression vector was introduced into Agrobacterium strain GV3101 (pMP90) by heat shock. A single transformant colony was inoculated into 5 mL of YEB medium (per litre: 5 g of beef extract, 1 g of yeast extract, 5 g of sucrose and 0.5 g of MgSO4·7H2O) supplemented with 50 μg/mL kanamycin and rifampicin. After overnight shaking at 25°C, 1 mL of the bacterial culture was pelleted in a 1.5-mL tube by centrifugation at 2200 ×g for 5 min at room temperature. The pellet was washed twice with 1 mL of infiltration medium (50 mM MES, 2 mM Na3PO4·12H2O, 0.1 mM acetosyringone and 5 mg/mL glucose) and then resuspended in 1 mL of infiltration medium. The bacterial suspension was diluted with the same medium to a final OD600 of 0.1 and infiltrated through the stomata of the lower epidermal surface by gentle pressure with a 1-mL needleless syringe. For coinfiltration of more than one construct, bacterial strains containing the constructs were mixed before leaf infiltration, with each construct adjusted to a final OD600 of 0.1. Transformed plants were then incubated under normal growth conditions for 48 h.
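Adjusting each suspension to a final OD600 of 0.1 is a C1V1 = C2V2 dilution. The Python sketch below assumes OD600 readings taken within the spectrophotometer's linear range. The construct names, the readings and the 2-mL final volume are hypothetical. Under our reading of the coinfiltration step, each strain is diluted to 0.1 within the same final volume, so a two-strain mix has a combined OD600 of 0.2.

```python
def inoculum_volume_ml(measured_od600: float, final_volume_ml: float,
                       target_od600: float = 0.1) -> float:
    """Culture volume giving target_od600 in final_volume_ml (C1V1 = C2V2)."""
    if measured_od600 <= 0:
        raise ValueError("OD600 must be positive")
    return target_od600 * final_volume_ml / measured_od600


def coinfiltration_mix(measured: dict[str, float], final_volume_ml: float,
                       target_od600: float = 0.1) -> dict[str, float]:
    """Per-construct culture volumes plus medium, each construct at target_od600."""
    volumes = {name: inoculum_volume_ml(od, final_volume_ml, target_od600)
               for name, od in measured.items()}
    culture_total = sum(volumes.values())
    if culture_total > final_volume_ml:
        raise ValueError("suspensions too dilute to reach the target OD600 in this volume")
    volumes["infiltration medium"] = final_volume_ml - culture_total
    return volumes


# Hypothetical OD600 readings of two resuspended strains, mixed to 2 mL in total.
mix = coinfiltration_mix({"TA_construct": 1.0, "organelle_marker": 0.8}, final_volume_ml=2.0)
for component, ml in mix.items():
    print(f"{component}: {ml:.3f} mL")
```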
Imaging was conducted on a Zeiss LSM 510 laser scanning microscope using a 63× oil immersion objective (Zeiss). For imaging mGFP5 constructs, the 458-nm line of an argon ion laser was used for excitation, with a 475/525-nm bandpass filter in the single-track mode of the microscope. For imaging YFP, the 514-nm line of an argon ion laser was used for excitation, and fluorescence was detected using a 560/615-nm bandpass filter. The marker constructs were mitochondrial GFP-CPN-60 (44) and peroxisomal CFP-SKL.
Competitive targeting import assays
Chloroplasts were prepared from pea leaves: 2–3 g of leaves from pea plants grown for approximately 10 days in soil were harvested, cut into small pieces and placed into two 50-mL Falcon tubes with 30 mL of ice-cold grinding buffer [2 mM EDTA, pH 8.0, 1 mM MgCl2, 1 mM MnCl2, 50 mM HEPES, pH 7.5, 330 mM sorbitol, 0.1% (w/v) sodium ascorbate, 0.25% BSA]. After homogenization for 30 seconds with a Polytron homogenizer, the homogenate was filtered through four layers of cheesecloth and spun at 3000 ×g for 2 min. The resulting pellets were resuspended with a paintbrush in 0.5 mL grinding buffer, layered onto a Percoll step gradient (3 mL of 80% Percoll in PBF [30% (w/v) polyethylene glycol (PEG), 10% (w/v) BSA, 10% (w/v) Ficoll] overlaid with 9.5 mL of 40% Percoll in PBF) and spun in a swinging-bucket rotor at 9000 ×g and 4°C for 8 min. The lower band containing intact chloroplasts was transferred to a fresh tube, the volume adjusted to 25 mL with grinding buffer, and the chloroplasts pelleted at 3000 ×g for 4 min. The resulting pellet was resuspended in 0.5 mL 1× HSM (50 mM HEPES, pH 8.0, 330 mM sorbitol, 8.4 mM methionine).
Mitochondria were isolated from maize coleoptiles using a method based on Lennon et al. (51): 2–3 g of 5-day-old maize coleoptiles were cut into small pieces and homogenized for 30 seconds with a Polytron homogenizer in 30 mL of mitochondrial grinding buffer [50 mM TES, pH 7.5, 300 mM sucrose, 2 mM EDTA, 1 mM MgCl2, 1% (w/v) PVP-40, 0.4% (w/v) BSA, 4 mM cysteine]. The homogenate was filtered through four layers of prewetted muslin and centrifuged at 2960 ×g for 3 min; the pellets were discarded. The supernatant was centrifuged at 17 700 ×g for 20 min, and the pellet was resuspended in 0.5 mL of wash buffer [20 mM TES, pH 7.5, 300 mM sucrose, 0.1% (w/v) BSA] and loaded onto a continuous Percoll/PVP-40 gradient made from 20 mL of heavy solution [10 mL of solution II (600 mM sucrose, 20 mM KH2PO4, pH 7.5, 0.4% (w/v) BSA), 5.5 mL Percoll and 4.5 mL 40% (w/v) PVP-40] and 20 mL of light solution (10 mL of solution II, 5.5 mL Percoll and 4.5 mL H2O). The gradient was centrifuged at 39 200 ×g for 40 min. Intact mitochondria formed a straw-coloured band near the bottom of the gradient, which was transferred to a fresh tube. Approximately 30 mL of wash buffer was added to the mitochondrial fraction, which was then centrifuged at 12 100 ×g for 20 min. The pellet was gently resuspended in 0.5 mL 1× HSM (50 mM HEPES, pH 8.0, 330 mM sorbitol, 8.4 mM methionine).
Transcription and translation
Coding sequences of interest were fused to the SP6 promoter of pSPUTK by overlap extension PCR (54). Transcription was performed with 15 μg of PCR fusion product and SP6 RNA polymerase (New England Biolabs) according to the manufacturer's instructions. Proteins were translated in wheat germ extract (Promega) according to the manufacturer's instructions, using Easy Tag Express 35S (Perkin Elmer) for labelling.
Competitive import assays
Imports were performed based on the method of Rudhe et al. (52), using 8 μL of prespun translation product, 52 μL import buffer (50 mM HEPES, pH 8.0, 330 mM sorbitol, 8.4 mM methionine, 13 mM ATP, 13 mM MgCl2), 20 μL chloroplasts and 20 μL mitochondria, incubated at 30°C for 20 min. Organelles were repurified by two successive centrifugation steps (2 min at 3000 ×g to pellet chloroplasts, then 20 min at 17 000 ×g of the supernatant to pellet mitochondria) and washed with 0.1 M Na2CO3. Both fractions were analysed by SDS–PAGE, and the radiograms were scanned on a Cyclone phosphor screen system (Packard) to visualize the 35S-labelled proteins.
Identification of potential TA proteins
The TMHMM analysis for the complete TAIR8 genome was downloaded from TAIR's FTP server (ftp://ftp.arabidopsis.org/home/tair/Proteins/Properties/Membrane_Proteins.tair8). All sequences with either 0 or more than 1 predicted TMD were rejected. The retained sequences were reanalysed using TMHMM (http://www.cbs.dtu.dk/services/TMHMM/), and all loci with a TMD outside the C-terminal 50 amino acids were rejected. The 695 loci identified were further refined using SignalP v3.0 (http://www.cbs.dtu.dk/services/SignalP/), a program that predicts from properties of the primary sequence whether a protein is secreted in vivo, with high performance (98.2% sensitivity and 0.4% false positives for eukaryotes). The parameters were set as follows: eukaryotes; neural networks and Hidden Markov Model; truncation to the first 70 residues. For each locus, the mean of the D-score and Sprob was calculated: the top 20% were termed 1SPT, the middle 60% 2SPM and the bottom 20% 3SPB. The third program used was Protein Prowler v1.2 (http://pprowler.itee.uq.edu.au/pprowler_webapp_1-2/). The parameters were set as follows: plant; rejection when the probability of a secretory signal sequence was above 0.5. The top 20% were termed 1PPT, the middle 60% 2PPM and the bottom 20% 3PPB. TargetP v1.1 (http://www.cbs.dtu.dk/services/TargetP/) was used to detect secretory pathway signal peptides and N-terminal plastidial or mitochondrial targeting sequences (specificity of 80–100%, depending on the reliability class). The results of all predictive analyses were compared, and each locus was annotated according to the confidence with which it could be predicted to be both non-secretory and not N-terminally targeted. Based on the consensus of the three prediction tools, the 695 loci were split into 12 equal groups.
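The 20/60/20 tiering applied to the SignalP and Protein Prowler outputs can be sketched in Python as follows. The text does not say whether "top" means the highest or the lowest scores, so this sketch assumes highest-first. Parsing of the SignalP and Protein Prowler output files is not shown, and the locus names in any example are hypothetical.

```python
from statistics import mean


def signalp_score(d_score: float, s_prob: float) -> float:
    """Combined SignalP score: the mean of the D-score and Sprob."""
    return mean([d_score, s_prob])


def assign_tiers(scores: dict[str, float], prefix: str,
                 top: float = 0.2, bottom: float = 0.2) -> dict[str, str]:
    """Label loci as top 20% / middle 60% / bottom 20%, e.g. 1SPT, 2SPM, 3SPB.

    Assumption: "top" means the highest scores. Ties keep their input order,
    because sorted() is stable even with reverse=True.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    n = len(ordered)
    n_top, n_bottom = round(n * top), round(n * bottom)
    labels = {}
    for i, locus in enumerate(ordered):
        if i < n_top:
            labels[locus] = f"1{prefix}T"
        elif i >= n - n_bottom:
            labels[locus] = f"3{prefix}B"
        else:
            labels[locus] = f"2{prefix}M"
    return labels


# Ten hypothetical loci with evenly spaced scores.
tiers = assign_tiers({f"locus{i}": i / 10 for i in range(10)}, prefix="SP")
print(tiers["locus9"], tiers["locus5"], tiers["locus0"])
```

With the 695 loci in this search, the same rounding gives 139 loci in each outer tier and 417 in the middle tier.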
All proteins in the top nine ranks (520 proteins) were investigated further and ranked by the sum of their SignalP and TargetP rankings (combined ranks from 2 to 7). Detailed protein data were retrieved from SUBA2 (http://www.plantenergy.uwa.edu.au/suba2/); the list of loci is given in Table S1.
Prediction of localization
The training set for prediction comprised the C-terminal sequences of 11 plastidial, 21 mitochondrial and 53 ER-targeted proteins. The total hydrophobicity of each TMD was calculated by summing the Kyte–Doolittle hydropathy values (47) of its amino acids. The mean and standard error were calculated for the length of sequence C-terminal to the TMD, the hydrophobicity of the TMD, and the numbers of positively charged, negatively charged, polar and aromatic amino acids in the 10 residues both N-terminal and C-terminal to the TMD. For proteins with no known localization, the difference from the mean for each characteristic in each compartment (chloroplast, mitochondria and ER) was calculated as a number of standard errors. For the final prediction, the length after the TMD was weighted 2× and the K-D index 5× relative to the other values, to account for the different lengths of the features (average TMD length 21.2, length after TMD 9.8 and amino acid composition 1.4). The total sum for each compartment was converted into a percentage of the maximum value, and localization was assigned to the compartment with the lowest value (Table S2F).
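The weighted scoring can be sketched in Python as follows. This is an illustrative reconstruction, not the authors' implementation. It assumes absolute differences, a standard error computed as the sample standard deviation divided by √n, and that features with zero spread in a compartment are skipped. The feature names (`tmd_hydropathy`, `length_after_tmd`) are hypothetical keys, and the toy training values are invented.

```python
from math import sqrt
from statistics import mean, stdev

# Relative weights from the text; every other feature defaults to 1.
WEIGHTS = {"tmd_hydropathy": 5.0, "length_after_tmd": 2.0}

Features = dict[str, float]


def compartment_norms(training: dict[str, list[Features]]) -> dict[str, dict[str, tuple[float, float]]]:
    """Mean and standard error (sample SD / sqrt(n)) of each feature per compartment."""
    norms = {}
    for compartment, proteins in training.items():
        norms[compartment] = {}
        for feature in proteins[0]:
            values = [p[feature] for p in proteins]
            norms[compartment][feature] = (mean(values), stdev(values) / sqrt(len(values)))
    return norms


def predict_localization(features: Features, norms) -> tuple[str, dict[str, float]]:
    """Assign the compartment with the smallest weighted distance from its norms."""
    totals = {}
    for compartment, table in norms.items():
        total = 0.0
        for feature, (mu, sem) in table.items():
            if sem == 0:
                continue  # assumption: skip features with no spread in the training set
            total += WEIGHTS.get(feature, 1.0) * abs(features[feature] - mu) / sem
        totals[compartment] = total
    largest = max(totals.values()) or 1.0  # avoid dividing by zero
    percent = {c: 100 * (t / largest) for c, t in totals.items()}
    return min(percent, key=percent.get), percent


# Invented two-feature training data for two compartments.
training = {
    "ER": [{"tmd_hydropathy": 40, "length_after_tmd": 5},
           {"tmd_hydropathy": 44, "length_after_tmd": 7}],
    "mitochondria": [{"tmd_hydropathy": 20, "length_after_tmd": 10},
                     {"tmd_hydropathy": 24, "length_after_tmd": 12}],
}
label, percent = predict_localization({"tmd_hydropathy": 41, "length_after_tmd": 6},
                                      compartment_norms(training))
print(label, percent)
```

Dividing every total by the same maximum rescales the totals uniformly, so the percentage step changes how scores are reported but not which compartment is assigned.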
This work was supported by a Biotechnology and Biological Sciences Research Council (BBSRC) Research Grant awarded to B. M. A. We thank TAIR, Chris Hawes and Colin Robinson for supplying plasmids and Alison Baker for SKL tobacco seeds. Thanks to Chris Hawes' group for advice with in vivo localization and to Tom Smith for help with manuscript preparation.