Intrinsically disordered (ID) proteins (IDPs) are abundant in eukaryotes but scarce in prokaryotes. Mitochondria, cellular organelles that descended from Rickettsia-like α-proteobacteria, are at the intersection between prokaryotes and eukaryotes. Although IDPs are reportedly as rare in mitochondria as in bacteria, the details have remained unclear. Human mitochondrial proteins (n = 706) were obtained from the UniProt database, and information on ID regions of all human proteins was extracted from the DICHOT database. A BLAST search carried out against all α-proteobacterial proteins identified two types of mitochondrial proteins: those with (B) and without (E) bacterial homologues. The B-type proteins (n = 387) descended from a bacterial ancestor, whereas the E-type proteins (n = 319) were more recently added to the mitochondria via the host cell during the early evolution of eukaryotes. The average ID ratios of B-type and E-type proteins are 10.3% and 21.4%, respectively. The 706 proteins were further classified into four groups based on the mitochondrial subcompartment, namely, the matrix, intermembrane space, inner membrane, or outer membrane. The ID ratios in these different locations suggest that the frequency of IDPs in mitochondria might be due to the evolutionary origin (B-type/E-type) of the protein, rather than differences in its functional environment.
Intrinsically disordered proteins (IDPs) are proteins that contain nonfolding or unstructured regions in vivo or in their intrinsic state; such regions sometimes extend throughout the protein (Dunker et al. 2008; Nishikawa 2009). A key characteristic of IDPs is that they do not form any predetermined tertiary structure while in an isolated state in solution; however, in the presence of an interacting protein, an IDP coils itself around the partner protein to form a complex, which then exerts a specific molecular function (Wright & Dyson 1999). IDPs are more frequently observed in eukaryotes and often function in cellular regulatory processes, such as signal transduction (Iakoucheva et al. 2002) and transcriptional regulation (Liu et al. 2006; Minezaki et al. 2006). In general, IDPs are composed of structural domains, which adopt a globular folded structure, as well as intrinsically disordered (ID) regions, which are unstructured.
We previously developed a methodology, DICHOT (Fukuchi et al. 2011), for dividing or dichotomizing entire protein sequences into structural domains (SDs) and ID regions. A SD is then further subdivided into the following two domains: the known domains (KDs), whose structures are already known and registered in the PDB database (Berman et al. 2000), and the cryptic domains (CDs), whose structures have not yet been experimentally determined. DICHOT is unique among the various disorder-prediction programs in that it predicts not only ID regions but also CDs.
When we applied DICHOT to several model organisms, including humans, to estimate the ID ratio (the ratio of the length of the ID regions to the total sequence length), we found that the ID ratio is quite high in eukaryotes (30–41%) but low in prokaryotes (8–10%) (Fukuchi et al. 2011), which supports the findings by Ward et al. (2004) that IDPs are more frequently observed in eukaryotes than in prokaryotes. Moreover, we also categorized human proteins according to their subcellular localizations and obtained the ID ratio for each category. We found that the ID ratio was the highest for nuclear proteins (47%). One explanation for this is that all transcription factors are nuclear proteins, and the average ID ratio for transcription factors is as high as 63% (Fukuchi et al. 2009a). In contrast, the lowest ID ratio was obtained for mitochondrial proteins (18% for mitochondrial membrane proteins and 13% for mitochondrial nonmembrane proteins). These findings were in accord with the above differences between eukaryotes and prokaryotes, considering the origin of eukaryotic mitochondria from the endosymbiosis of α-proteobacteria.
A comparison of the DICHOT-determined ID ratios showed that prokaryotes and mitochondria differed significantly, indicating a possibility that not all mitochondrial proteins have their origin in ancestral bacteria. In fact, comparative analysis of the genome sequences of Rickettsia and the yeast Saccharomyces cerevisiae (Karlberg et al. 2000; Andersson et al. 2002) showed that approximately 10% of mitochondrial proteins have homologues in Rickettsia per se; approximately 40% have homologues in bacteria excluding Rickettsia; and approximately 30% have homologues only in eukaryotes, leaving approximately 20% with no known homologues. Among these four groups, the third group might comprise proteins that were transported to mitochondria from the host cells after the establishment of endosymbiosis. This then indicates that not all mitochondrial proteins have their origin in bacteria, and that a significant number have their origin in eukaryotes.
To address this question further, in this study, we focused on human mitochondrial proteins, first classifying them into groups as bacterial type (B-type) or eukaryotic type (E-type), depending on whether protein homologues were found in bacteria or not, respectively. We then calculated the ID ratio for each type of protein, in an attempt to explain the difference between the ID ratios of mitochondrial and bacterial proteins. We also compared the B-type and E-type mitochondrial proteins specifically to investigate whether there were any characteristic differences.
On the basis of a homology search against all proteins of α-proteobacteria, the 706 human mitochondrial proteins in our dataset were classified as proteins of bacterial type (B-type; n = 387) or proteins of eukaryotic type (E-type; n = 319). As described in the Methods section, the proteins with a clear homologue in α-proteobacteria were defined as B-type; those with weaker homology in α-proteobacteria were classified as B-type only if the same SCOP/Pfam domains were found. Although our criteria for defining bacterial homologues were not particularly strict, the ratio of the B-type proteins we obtained using these criteria (55%) was similar to that previously reported for bacterial homologues in yeast mitochondria (approximately 50%) (Karlberg et al. 2000).
In contrast, mitochondrial proteins without homology to proteins in α-proteobacteria were defined as E-type. We found 99 proteins with SCOP/Pfam domains unique to eukaryotes and defined these proteins as E-type. Some examples of E-type proteins are CPT2_HUMAN (UniProt Accession Number: P23786), DBLOH_HUMAN (Q9NR28), KCRS_HUMAN (P17540), LETM1_HUMAN (O95202), MPCP_HUMAN (Q00325), PTH2_HUMAN (Q9Y3E5), STAR_HUMAN (P49675), and TFAM_HUMAN (Q00059).
Fukuchi et al. (2011) previously conducted a DICHOT analysis of all human proteins to classify the protein molecules into KDs, CDs, and ID regions; these results were compiled in the DICHOT database (http://spock.genes.nig.ac.jp/~genome/DICHOT/). Although DICHOT analysis is merely a computational prediction method based on sequence information, it fully uses experimentally determined 3D structural information, that is, PDB/SCOP, to preferentially assign KDs. It then classifies the remaining regions as either ID regions or CDs, which results in remarkably high prediction accuracy on a residue-by-residue basis (Fukuchi et al. 2011).
Here, we mainly focused on ID regions of the proteins, while regarding the remaining parts as SDs by combining KDs and CDs (including those regions predicted as globular domains of unknown structure). Using the prediction data of 706 human mitochondrial proteins from the DICHOT database, we obtained an average ID ratio of 14.2% on a per-residue basis (Table 1). For proteins having a transfer signal sequence, the ID ratio was calculated for the mature protein, excluding the signal sequences. The average ID ratios for B-type and E-type proteins were 10.3% and 21.4%, respectively (Table 1); therefore, this result agrees with the general trend of B-type < E-type (Fukuchi et al. 2011). Moreover, when the DICHOT analysis was applied to two species of α-proteobacteria as a reference standard (see Methods), ID ratios of 10.2% for Rickettsia prowazekii and 10.5% for Rickettsia typhi were obtained, which were nearly identical to the above-mentioned average ID ratio for B-type proteins (i.e., 10.3%).
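The per-residue ID ratio described above can be illustrated with a minimal Python sketch. It is not the DICHOT implementation: the input layout (sequence length, a list of 1-based inclusive ID regions, and the length of the cleaved signal sequence) and the function names are assumptions made for illustration, and the ID regions are assumed not to overlap.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # hypothetical format: 1-based, inclusive (start, end)


def id_residues(length: int, id_regions: List[Region],
                signal_len: int = 0) -> Tuple[int, int]:
    """Return (ID residues, total residues) counted on the mature protein.

    Residues 1..signal_len are treated as the cleaved signal sequence
    and are excluded from both counts.
    """
    mature_start = signal_len + 1
    disordered = 0
    for start, end in id_regions:
        start = max(start, mature_start)  # clip to the mature protein
        end = min(end, length)
        if end >= start:
            disordered += end - start + 1
    return disordered, length - signal_len


def pooled_id_ratio(proteins: List[Tuple[int, List[Region], int]]) -> float:
    """Per-residue ID ratio: all ID residues divided by all residues."""
    dis = tot = 0
    for length, regions, signal_len in proteins:
        d, t = id_residues(length, regions, signal_len)
        dis += d
        tot += t
    return dis / tot if tot else 0.0


# Toy data (not from the paper): one protein with a 20-residue signal
# sequence and two ID regions, and one fully structured protein.
example = [(100, [(1, 30), (91, 100)], 20), (200, [], 0)]
ratio = pooled_id_ratio(example)
```

Pooling residues before dividing gives long proteins more weight than averaging per-protein ratios would; which of the two a reported average uses matters when protein lengths differ between groups.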
Table 1. Number of proteins and ID ratios for each localization category
Owing to the larger variance associated with small samples, the average ID ratio is omitted for sample sizes smaller than 20.
The average ID ratio values are shown in Fig. 1, as is the size distribution of the ID regions [the percentage of proteins that contain a consecutive ID region longer than n amino acids is plotted at n = 30, 40, …, 130 (Iakoucheva et al. 2002)]. Figure 1 clearly shows that ID ratios for E-type proteins are significantly higher than those for B-type proteins (P < 0.0001, Mann–Whitney U-test). Although the average ID ratios of B-type proteins and Rickettsia proteins nearly coincide, some differences between the two can be observed in Fig. 1. This apparent contradiction is resolved when the plot in Fig. 1 is extended to show a longer range of the x-axis (see Fig. S1 in Supporting Information).
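The size-distribution curve described for Fig. 1 can be sketched as follows. This is an illustrative reconstruction rather than the authors' code: each protein is assumed to be given as a per-residue sequence of booleans (True for an ID residue), and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence


def longest_run(flags: Sequence[bool]) -> int:
    """Length of the longest stretch of consecutive ID residues."""
    best = cur = 0
    for is_id in flags:
        cur = cur + 1 if is_id else 0
        best = max(best, cur)
    return best


def long_id_curve(proteins: List[Sequence[bool]],
                  thresholds: Iterable[int] = range(30, 131, 10)
                  ) -> Dict[int, float]:
    """Percentage of proteins whose longest ID region exceeds n residues."""
    longest = [longest_run(p) for p in proteins]
    if not longest:
        return {n: 0.0 for n in thresholds}
    return {n: 100.0 * sum(run > n for run in longest) / len(longest)
            for n in thresholds}


# Toy data (not from the paper): longest ID runs of 35, 0 and 45 residues.
toy = [
    [True] * 35 + [False] * 10,
    [False] * 50,
    [True] * 10 + [False] + [True] * 45,
]
curve = long_id_curve(toy)
```

Because the curve depends only on the longest ID stretch per protein, two groups with similar average ID ratios can still produce different curves if their ID residues are distributed into regions of different lengths.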
Dependence on protein localization in mitochondria
Next, all the human mitochondrial proteins were classified into four localization categories: mitochondrial matrix, inner membrane, intermembrane space, and outer membrane, according to the annotation given in UniProt. Proteins without localization annotation were treated as ‘unclassified’ (localization unknown). Table 1 shows the classification results, with matrix as the largest category (287 proteins) and intermembrane space as the smallest (34 proteins). In addition, the membrane protein fraction was high (36% of the total, excluding ‘unclassified’), which was considered characteristic of mitochondrial proteins. When considering B-type/E-type categories, this characteristic was more pronounced for E-type proteins, most of which were membrane proteins (135/211 = 64%; Table 1).
The ID ratio among the localization categories was the lowest for mitochondrial matrix (9%) and highest for mitochondrial inner membrane (16%) (excluding the ‘unclassified’ category) (Table 1). However, considering that matrix proteins are predominantly B-type proteins, and inner membrane proteins are predominantly E-type proteins, this difference in ID ratios simply reflects the above difference between B-type and E-type proteins. The lowest ID ratio category, B-type matrix proteins, is also included in Fig. 1 as a reference.
Comparisons of proteins with or without signal sequence and between enzymes or nonenzymes
We further investigated the ID ratios in more detail by dichotomizing proteins according to the presence or absence of a signal sequence, as well as between enzymatic and nonenzymatic function, and these results are summarized in Table S1 in Supporting Information. A localization signal is necessary for the transportation of mitochondrial proteins synthesized in the host cytoplasm into the mitochondria. The signal sequence is located at the N-terminus of proteins and varies in length from 10 to 110 amino acids (UniProt). This signal is cleaved after transportation. The translocon protein, embedded in the mitochondrial outer membrane, recognizes the localization signal and facilitates transportation of the protein into the intermembrane space, or further into the matrix. It is known that the translocon channel has a strong negative charge, whereas the signal sequence has a positive charge gradient toward the N-terminus, and the whole sequence shows a high ID ratio (Homma et al. 2012).
We investigated the existence of signal sequences in human mitochondrial proteins and found signals in 60% of the proteins. The majority (75%) of these were of the B-type (Table S1 in Supporting Information). Thus, the existence of a cleavable signal sequence does not directly correspond to B-type/E-type classification. However, it is significant that outer membrane proteins almost completely lack the cleavable signal sequence and many inner membrane proteins also do not have signal sequences. It is known that mitochondrial outer/inner membrane proteins are equipped with several types of noncleavable signal sequences (Schleiff & Becker 2011). It should be noted that none of the 13 proteins encoded by mitochondrial DNA (mtDNA), involved in the respiratory chain in the inner membrane, has a signal for insertion into the membrane; in fact, the insertion mechanism in the absence of a signal is well known (Ott & Herrmann 2010).
Table S1 in Supporting Information shows that ID ratios were low for proteins with a signal sequence (11%) and high for proteins without such a signal (19%). The ID ratios were also low for enzymes (9%) but high for nonenzymes (21%). When we further divided the proteins into B-type and E-type and classified them according to localization, the ID ratios were the lowest for B-type enzymes in the matrix (7%; 158 proteins) and highest for E-type nonenzymes in the outer membrane (24%; 32 proteins). Moreover, we found that many B-type proteins possessed a signal sequence and were enzymes, whereas many E-type proteins had no cleavable signal sequences and were nonenzymes (Table S1 in Supporting Information). Furthermore, all but 1 of the 53 outer membrane proteins lacked a signal sequence, whereas E-type inner-membrane proteins were almost always nonenzymatic (1 exception among 93 proteins).
Size distribution of mitochondrial proteins
The size distribution of mitochondrial proteins is shown in Fig. 2. B-type proteins had a single-peaked distribution with a maximum around 350 amino acids, whereas E-type proteins showed a more ragged distribution that was shifted to the left (i.e., toward smaller sizes), with the highest peak around 150 amino acids long. The average sequence lengths of B-type and E-type were 398 and 268 amino acids, respectively; thus, B-type proteins were longer than E-type proteins by an average ratio of 3:2. This was an unexpected result, as it is well known that eukaryotic proteins not only have longer ID regions than prokaryotic proteins but also generally contain multiple structural domains (Apic et al. 2001). Figures 1 and 2 together suggest that E-type proteins contain long ID regions (Fig. 1), but their total sequence lengths are frequently shorter (Fig. 2).
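As a rough consistency check, the group sizes, average lengths, and average ID ratios quoted in the text can be combined into a length-weighted overall ratio. The sketch below assumes that the 10.3% and 21.4% figures are per-residue averages and that the average full sequence lengths (398 and 268) are an adequate stand-in for mature-protein lengths; neither assumption is confirmed by the text, so the result is only indicative.

```python
from typing import List, Tuple


def pooled_ratio(groups: List[Tuple[int, float, float]]) -> float:
    """Length-weighted ID ratio over groups.

    Each group is (number of proteins, mean length, mean ID ratio).
    """
    id_res = sum(n * mean_len * ratio for n, mean_len, ratio in groups)
    total = sum(n * mean_len for n, mean_len, _ in groups)
    return id_res / total


# Values quoted in the text: B-type (387 proteins, 398 aa, 10.3%)
# and E-type (319 proteins, 268 aa, 21.4%).
groups = [(387, 398, 0.103), (319, 268, 0.214)]
overall = pooled_ratio(groups)
print(f"length-weighted overall ID ratio: {overall:.1%}")
```

Under these assumptions the weighted value comes out at roughly 14%, close to the 14.2% overall ratio reported for the 706 proteins. The shorter average length of E-type proteins is what pulls the pooled value toward the B-type figure, even though the two groups are similar in number.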
Table S1 in Supporting Information indicates that the E-type protein category is enriched for membrane proteins, especially those of the inner membrane, and that the majority of E-type proteins are nonenzymes rather than enzymes. Most of the small proteins, with a sequence length shorter than 200 amino acids, were E-type proteins and were nonenzymes (Fig. 2). Enzymes are typically not small, as they need to maintain globular domains for enzymatic activity, whereas nonenzymes are free from such constraints. As a result, ID regions are more frequently found in E-type, nonenzyme proteins than in B-type proteins (Table S1 in Supporting Information).
The goal of the present study was to determine why the ID ratio is significantly higher in mitochondrial proteins than in bacterial proteins, such as those of Escherichia coli and Bacillus subtilis. We considered that the answer lay in the fact that mitochondria did not necessarily inherit all of their proteins from the bacterial symbionts, which are assumed to be the progenitors of mitochondria. Indeed, it is well known that translocons embedded in the mitochondrial outer and/or inner membranes, which are indispensable for the transportation of proteins synthesized in the host cytoplasm into the mitochondria, are unique to mitochondria and never existed in ancestral bacteria (Schleiff & Becker 2011). Moreover, on the basis of the complete sequencing of the Rickettsia genome (considered the phylogenetically closest relative to the purported proto-mitochondrion), Karlberg et al. (2000) reported that not many clear orthologs between Rickettsia and yeast mitochondria exist. They concluded that about half of the mitochondrial proteins were newly added from the host cells (i.e., E-type proteins) early in the evolution of eukaryotes. Our results showed that 55% of the human mitochondrial proteins were classified as B-type, with homologues in α-proteobacteria, whereas the remaining 45% were classified as E-type, and also that ID regions were found in the eukaryotic type (E-type) and nonenzymatic proteins more frequently than in the bacterial type (B-type) and enzymatic proteins (Table S1 in Supporting Information).
Further analysis allowed us to determine the localization of the different types of proteins within the mitochondria. We found that the mitochondrial matrix contains more than half of the proteins with a localization annotation (n = 501, excluding unclassified proteins) (Table 1). As the matrix also contains mtDNA, some of these proteins are necessary for DNA replication and gene expression. In addition, the matrix is also the location of the tricarboxylic acid (TCA) cycle, which is responsible for ATP synthesis, the main function of mitochondria. All the enzymes involved in the TCA cycle, except 1 [succinate dehydrogenase, which resides in the inner membrane (Oyedotun & Lemire 2004)], as well as many other metabolic enzymes, are contained in the matrix. Most of these enzymes are commonly shared with α-proteobacteria and are therefore classified as B-type proteins (Table S1 in Supporting Information). However, major nonenzymes in the matrix include ribosomal proteins (n = 58), of which 33 are B-type, and the remaining 25 are specific to mitochondria and are therefore considered E-type by definition. The average ID ratios of the B-type and E-type ribosomal proteins are 25.4% and 20.7%, respectively. In addition, six subunits of the F1 component of ATP synthase are nonenzymes of B-type, and 15 accessory subunits of NADH dehydrogenase complex (Complex 1), which is composed of many more protein components, are nonenzymes of E-type. Although Complex 1 itself is embedded in the inner membrane, the accessory subunits above are not membrane proteins but are bound to Complex 1 from the matrix side.
The mitochondrial inner membrane contains all 13 proteins encoded on mtDNA, along with other membrane proteins that are involved in the oxidative phosphorylation of the respiratory chain and the F0 components of ATP synthase. Although the main members of these membrane protein groups are enzymes belonging to the class of oxidoreductases and are commonly found in bacteria (B-type), most of the constituent membrane proteins (subunits) of the respiratory chain complexes are nonenzymes. For instance, more than 10 accessory subunits of Complex 1 are single-pass membrane proteins of E-type. Moreover, cytochrome-c oxidase complex (COX) contains 5 or more accessory subunits similar to those in Complex 1. Inner membrane translocons TIM22 and TIM23 are composed of at least 8 typical E-type membrane proteins (Chacinska et al. 2009) of nonenzymes. Furthermore, there is a large protein family called substrate carrier proteins (n ≈ 50) embedded in the inner membrane. These are transport proteins that incorporate various compounds from cytoplasm into mitochondria and are characterized by a barrel structure comprising six transmembrane helices. As this type of transporter never exists in prokaryotes, they are E-type nonenzymes, and their average ID ratio is 24.0%. Among them, ADP/ATP translocase is of high importance because it exports ATP molecules produced from oxidative phosphorylation to the cytoplasm (Voet & Voet 1995). This implies that the mitochondrial symbiosis is maintained by the mitochondrion receiving various nutrients from the host cell and providing ATP produced in the matrix to the host cell in return.
Not many kinds of proteins are found in the intermembrane space (Table 1). The most prominent among them (eight in total) may be the inner membrane translocase subunits that associate with TIM22 from the intermembrane side and chaperone-like proteins that help target protein folding or insertion into the membrane. All of these are E-type nonenzymes. In addition, there are six subunits associating with respiratory chain complexes (Complex 1, COX, and cytochrome bc1 complex) from the intermembrane side, two of which have enzymatic activity (B-type), whereas the remaining four are all nonenzymes (E-type).
The mitochondrial outer membrane contains the translocon (TOM), through which mitochondrial proteins are transferred from the cytoplasm. TOM consists of 13 subunits of membrane proteins (Chacinska et al. 2009), all of which are eukaryote-specific (E-type) nonenzymes. Porin is a channel protein characteristic of the outer membrane, composed of a β-barrel structure embedded in the membrane, which can freely transfer relatively low-molecular-weight compounds or peptides up to MW 5,000 (Alberts et al. 2003). There are six kinds of porins with different channel diameters. In addition, the outer membrane also contains enzymes, namely long-chain acyl-CoA synthetase (LACS) and its derivatives (eight in total), all of which are single-pass transmembrane proteins (B-type). LACS catalyzes the prestep reaction for β-oxidation of fatty acids, coupled to the TCA cycle in the matrix (Soupene & Kuypers 2008). Yet another enzyme, mitofusin (E-type), which has GTPase activity and is involved in mitochondrial fusion, is found in the outer membrane (Santel et al. 2003).
Not all of the proteins described above are expressed in all human cells, and some proteins are expressed in a tissue-specific manner. For instance, E-type enzymes of carnitine O-palmitoyltransferase in the outer membrane and creatine kinase in the intermembrane space are mainly expressed in muscle mitochondria. According to the UniProt annotation, some of the substrate carrier proteins embedded in the inner membrane are specific to particular tissues such as the liver, kidney, brain, and brown adipose tissue.
Furthermore, the protein repertoire of a mitochondrion may differ across species. Although the accumulation of genome sequence data strongly supports the view that the mitochondria of all eukaryotes descend from a common ancestor, implying a monophyletic origin (Gray et al. 1999), significant variations are known among mitochondria of different eukaryotic lineages. For example, plant mitochondria have much longer (>200 kb) mtDNAs than mammals (≈17 kb). Furthermore, many kinds of protozoa, such as ciliates and amoebae, have cristae of a different shape, that is, tubular cristae instead of the flat cristae commonly observed in animals and fungi (Cavalier-Smith 1993). Comparing mitochondrial proteins between human and yeast (S. cerevisiae), we found that some proteins appear in humans but not in yeast (Table 2). They are all E-type nonenzymes involved in apoptosis, mainly residing in the outer membrane, and their average ID ratio is 28.5%. The Pfam analysis we carried out showed that at least the 12 proteins listed in Table 2 exist only in animals (metazoa) and not in fungi, plants, or protozoa. This implies that they were introduced to mitochondria at the appearance of animals, later in eukaryote evolution. According to this evolutionary order, mitochondrial proteins should then be classified not into two groups (B-type and E-type), but into three groups (B-type, E-type, and E′-type). Here, E′-type proteins are lineage-specific E-type proteins that were introduced to mitochondria from the host at a later stage.
Table 2. Human proteins specific to animal mitochondria
Bcl-2-like protein 1
Bcl-2-like protein 2
Bcl-2-like protein 10
Bcl-2-like protein 13
Bcl2 antagonist of cell death
Bcl-2 homologous antagonist/killer
Apoptosis regulator BAX
Apoptosis regulator Bcl-2
BH3-interacting domain death agonist
BH3-like motif-containing cell death inducer
BCL2/adenovirus E1B 19 kDa protein-interacting protein 3
Second mitochondria-derived activator of caspase
Questions remain as to why IDPs are more often found in eukaryotes than in prokaryotes and why IDPs are abundant in the nuclei but scarce in the ER/Golgi apparatus, as well as in the extracellular matrix, of eukaryotes (Fukuchi et al. 2011). Does the frequency of IDPs depend on the surrounding protein environment in the subcellular regions, or is it simply an evolutionary reflection of the genealogy of the individual protein families? If the former is true, the occurrence of IDPs would depend on the types of proteases that co-exist in the environment of a protein, because IDPs are more easily degraded by some types of proteases (Tompa 2002; Melo et al. 2011). Our analyses of the mitochondrial proteins categorized into four subcompartments (matrix, inner membrane, intermembrane space, and outer membrane) may provide the answer to this question. Among the four groups, we focused on proteins in the matrix group, which contained the largest number of proteins. If the occurrence of IDPs depended on the environment rather than on evolutionary origin (i.e., B-type or E-type), all matrix proteins would show equally low ID ratios, independent of whether the protein was of the B-type or the E-type. However, within the category of matrix proteins, Table 1 shows a low average ID ratio (8%) for B-type proteins and a high average ID ratio (15%) for E-type proteins. Therefore, these results suggest that the occurrence of IDPs is primarily determined by the evolutionary origin of the protein rather than by the in vivo protein environment.
Dataset of human mitochondrial proteins
In a keyword search for human mitochondrial proteins on the UniProt database (Release 2011.4) (Wu et al. 2006), we obtained more than 800 hits. The proteins with a signal sequence (transit peptide) were retained as mitochondrial proteins after checking the length of the signal sequence annotated in the FT lines of UniProt. However, there were cases where signals were unclear and annotated as ‘Transit peptide 1 – ?’; such entries (approximately 70 proteins) were removed. Proteins that did not possess a signal but were directly encoded by mtDNA (13 human proteins) were retained as a matter of course. We also removed proteins without a signal if they had multiple subcellular localizations, including mitochondria, but the main localization was outside the mitochondria (approximately 50 proteins). After these selection processes, we obtained a dataset of 706 human mitochondrial proteins.
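The selection rules above can be summarized in a short Python sketch. The `Entry` record and its fields are hypothetical stand-ins for information that would have to be parsed from UniProt annotations; no UniProt file format or API is reproduced here, and the handling of signal-less proteins whose main localization is mitochondrial (retained) is an assumption inferred from the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    """Hypothetical, pre-parsed view of one keyword-search hit."""
    accession: str
    has_transit_peptide: bool
    transit_end: Optional[int]        # None models an unknown end ('1 - ?')
    mtdna_encoded: bool
    main_location_mitochondrial: bool


def keep(entry: Entry) -> bool:
    """Apply the selection rules described in the text."""
    if entry.has_transit_peptide:
        # Retain only when the signal length is actually annotated.
        return entry.transit_end is not None
    if entry.mtdna_encoded:
        return True
    # No signal, nuclear-encoded: drop if mainly localized elsewhere.
    return entry.main_location_mitochondrial


def select(entries: List[Entry]) -> List[Entry]:
    return [e for e in entries if keep(e)]


# Toy entries with made-up accessions (not real UniProt records).
hits = [
    Entry("X0001", True, 35, False, True),      # clear transit peptide
    Entry("X0002", True, None, False, True),    # 'Transit peptide 1 - ?'
    Entry("X0003", False, None, True, True),    # encoded by mtDNA
    Entry("X0004", False, None, False, False),  # mainly non-mitochondrial
    Entry("X0005", False, None, False, True),   # no signal, mitochondrial
]
kept = [e.accession for e in select(hits)]
```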
According to the annotation given in UniProt, human mitochondrial proteins were classified into four categories based on localization, viz., in the matrix, inner membrane, intermembrane space, or outer membrane, whereas those proteins without a clear annotation were treated as ‘unclassified’ (localization unknown). However, there were some exceptions, such as mitochondrial ribosomal proteins; even if their localization was not annotated, they could obviously be assigned to the matrix category.
Classification of proteins into B-type/E-type
Mitochondrial proteins in the dataset selected above were classified into the bacterial type (B-type) or the eukaryotic type (E-type) depending on the presence or absence, respectively, of homologues in α-proteobacteria. We used all 124 species of α-proteobacteria with a known genome, registered in the GTOP database (released in October 2010) (Fukuchi et al. 2009b). Existence of a homologue was defined by a hit in a BLAST search (Altschul et al. 1997) with an E-value less than 10⁻¹⁰. For weaker hits, with E-values less than 10⁻³, we used an additional criterion: whether the same structural/functional domains, that is, SCOP (Murzin et al. 1995) and Pfam (Finn et al. 2008) domains provided by GTOP, were detected in both proteins of a pair. If the query protein contained multiple domains, the bacterial homologue was required to have the same types of domains, in the same order, along the sequence. The domain criterion also helped to separate B-type from E-type proteins. For instance, we reasoned that if a query protein had a SCOP/Pfam domain that was detected only in eukaryotic proteins, but never in prokaryotic proteins, the query protein must be of the E-type.
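The classification rule can be written down as a small decision function. The sketch below is one possible reading of the criteria, not the authors' pipeline: the `Hit` record, the function name, the exact-equality test for domain architecture, and the order in which the rules are applied (strong hit first, then eukaryote-only domains, then weak hits with matching domains) are all assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

STRONG_EVALUE = 1e-10  # a hit below this counts as a clear homologue
WEAK_EVALUE = 1e-3     # hits below this need matching domains as well


@dataclass
class Hit:
    """Hypothetical summary of one BLAST hit in an alpha-proteobacterium."""
    evalue: float
    domains: Tuple[str, ...]  # SCOP/Pfam domains in sequence order


def classify(query_domains: Tuple[str, ...], hits: List[Hit],
             eukaryote_only: FrozenSet[str] = frozenset()) -> str:
    """Return 'B' (bacterial type) or 'E' (eukaryotic type)."""
    if any(h.evalue < STRONG_EVALUE for h in hits):
        return "B"
    # A domain never seen in prokaryotes marks the query as E-type.
    if any(d in eukaryote_only for d in query_domains):
        return "E"
    for h in hits:
        same_architecture = bool(query_domains) and h.domains == query_domains
        if h.evalue < WEAK_EVALUE and same_architecture:
            return "B"
    return "E"


# Toy cases with made-up domain identifiers.
strong = classify(("D1",), [Hit(1e-20, ())])
weak_match = classify(("D1", "D2"), [Hit(1e-5, ("D1", "D2"))])
weak_wrong_order = classify(("D1", "D2"), [Hit(1e-5, ("D2", "D1"))])
euk_only = classify(("DX",), [Hit(1e-5, ("DX",))], frozenset({"DX"}))
no_hit = classify(("D1",), [])
```

Requiring identical domain tuples is the strictest reading of "the same types of domains, in the same order"; a looser variant would accept a bacterial protein whose architecture merely contains the query's domains as an ordered subsequence.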
Information on ID regions
Information on ID regions in human mitochondrial proteins was available from the DICHOT database (http://spock.genes.nig.ac.jp/~genome/DICHOT/). In addition, the DICHOT method (Fukuchi et al. 2011) was applied to all the proteins of two species of α-proteobacteria, namely R. prowazekii (Andersson et al. 1998) and R. typhi (McLeod et al. 2004), to establish whether each of their amino acid residues belonged either to ID regions or to structured regions.
This research was supported by a Grant-in-Aid for Scientific Research (C) (No. 2250027 and No. 23500372) and a grant of Strategic Research Foundation Grant-aided Project for Private Universities (No. S1001042) from the Ministry of Education, Culture, Sports, Science and Technology of Japan.