G. Schuster, Department of Biology, Technion – Israel Institute of Technology, Haifa 32000, Israel. Fax: + 972 4 8295587, Tel.: + 972 4 8293171, E-mail: firstname.lastname@example.org
The nonphotosynthetic Arabidopsis mutant hcf152 is impaired in the processing of the chloroplast polycistronic transcript psbB-psbT-psbH-petB-petD, so that the essential photosynthetic cytochrome b6f complex is not produced. The nucleus-encoded HCF152 gene encodes a pentatricopeptide repeat (PPR) protein composed primarily of 12 PPR motifs, similar to other proteins of this family that were identified in mutants defective in chloroplast gene expression. Understanding how HCF152 modulates chloroplast gene expression requires knowledge of its molecular and biochemical properties. To this end, HCF152 and several truncated versions were produced in bacteria and analyzed for RNA-binding and protein–protein interaction. Two HCF152 polypeptides were found to bind each other to form a homodimer, and this binding is impaired by a single amino acid substitution near the carboxyl terminus, replacing leucine with proline. Recombinant HCF152 bound with higher affinity to RNA molecules resembling the petB exon–intron junctions, as well as to several other molecules. The highest affinity was found for poly(A) RNA. When truncated proteins composed of different numbers of PPR motifs were analyzed for RNA-binding, two PPR motifs were found to be the minimum required for binding, albeit with very low affinity. The affinity to RNA increased significantly with the number of PPR motifs and was highest for the full-length protein composed of 12 PPR motifs. Together, our data characterize the nuclear-encoded HCF152 as a chloroplast RNA-binding protein that may be involved in the processing or stabilization of the petB transcript by binding to the exon–intron junctions.
Chloroplast genes are often transcribed in polycistronic units. Following transcription, the precursor RNA undergoes a variety of maturation events, including cis- and trans-splicing, cleavage, processing of 5′- and 3′-end termini, and editing. In response to development, light stimuli and environmental cues, the modulation of gene expression is controlled in multiple steps including transcription, splicing, RNA stability and translation [1–5]. Nuclear-encoded (but chloroplast located) proteins possibly involved in chloroplast RNA processing and translation were identified while analyzing mutants having impaired expression of certain genes required for photosynthesis [2,4,6–12]. Such mutants were identified mainly in Chlamydomonas, maize and Arabidopsis. These studies revealed a complex regulation of gene expression, that is coordinated and involves a large number of proteins [2,13]. For example, about 14 nuclear-encoded loci were identified as being involved in the trans-splicing of the psaA transcript in the Chlamydomonas chloroplast .
The nuclear-encoded proteins identified so far that are involved in chloroplast gene expression can be divided into two groups: the first group includes proteins displaying amino acid sequence homology to enzymes involved in RNA maturation processes (such as peptidyl-tRNA hydrolase, pyridoxamine 5′-phosphate oxidase and pseudouridine synthase [7,8,15]); the second group is characterized by two similar repeated motifs of several dozen amino acids. The first motif is the tetratricopeptide (TPR) motif, composed of about 34 amino acids and present 1–19 times in the proteins identified so far [9,16–18]. The second motif is the pentatricopeptide (PPR) motif, which is similar to, yet distinct from, the TPR motif and has been defined using a bioinformatics approach [19,20]. Proteins containing PPR motifs were identified upon analyzing RNA- and DNA-binding proteins, proteins that are involved in male sterility, and mutants impaired in RNA maturation [20–29]. The Arabidopsis genome encodes more than 400 proteins of this family, whereas the yeast and human genomes encode only a few . Indeed, most of the PPR proteins in Arabidopsis are believed to be imported into the chloroplast and mitochondria, taking part in the gene expression processes of these organelles. Recently, while this manuscript was under review, an additional group of chloroplast group II intron splicing factors has been reported . These proteins are characterized by similar repeated domains, termed CRM (chloroplast RNA splicing and ribosome maturation), that were proposed to be derived from an ancient RNA-binding module .
The Arabidopsis high chlorophyll fluorescent mutant, hcf152, is nonphotosynthetic and is impaired in the processing and accumulation of the psbB-psbT-psbH-petB-petD cotranscriptional unit that encodes subunits of the photosystem II and cytochrome b6f complexes. A more detailed analysis revealed that the processing of the petB transcript or the stabilization of the spliced transcript is impaired in the absence of the HCF152 protein . The nucleus-encoded HCF152 gene encodes a chloroplast-located protein composed primarily of 12 PPR motifs . In addition to the hcf152-1 mutant, in which the gene was not expressed, an ethyl methanesulfonate (EMS)-induced mutant (hcf152-2), in which a single amino acid substitution was observed, showed a similar but less pronounced phenotype . In previous work, we showed that HCF152 is not associated with a high molecular mass complex and that it is an RNA-binding protein displaying high binding affinities to synthetic RNA molecules representing the petB intron–exon junctions . Here, in order to better characterize the RNA-binding properties and the possible protein–protein interactions of HCF152, we produced the protein and several truncated versions in bacteria, and analyzed their protein–protein interactions and RNA-binding properties. HCF152 was found to form a homodimer, and dimer formation is impaired in the hcf152-2 mutant protein, in which one amino acid near the C-terminus is substituted. The affinity of the protein to RNA depends significantly on the number and nature of the PPR motifs.
Materials and methods
Production of recombinant HCF152 and its truncated versions
The expression and purification of the mature full-length protein in bacteria was performed as described previously . The truncated HCF152 proteins were prepared according to the same procedure using the primers indicated in Table 1 and in the supplementary material. In the case of T152-P2, P1a and P1b, the protein was further purified using a Mono Q column.
Table 1. The RNA probes and HCF152 truncated proteins used in this work. R, reverse; F, forward.
Size-exclusion chromatography was performed by applying the purified recombinant HCF152 onto a Superdex 200 column in buffer E (20 mM Hepes pH 7.9, 12.5 mM MgCl2, 60 mM KCl, 0.1 mM EDTA, 2 mM dithiothreitol and 17% glycerol) at a flow rate of 0.5 mL·min−1. Proteins were precipitated by cold acetone and analyzed by SDS/PAGE. For digestion of the RNA, the extract was incubated with RNase A (1 mg·mL−1) and 3 U·µL−1 of RNase T1 at 37 °C for 1 h. The Superdex 200 column was calibrated with the following protein standards: thyroglobulin, 669 kDa; catalase, 232 kDa; aldolase, 158 kDa; bovine serum albumin, 67 kDa and casein, 30 kDa.
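As an illustration of how such a calibration can be used, the following minimal sketch fits the logarithm of the standard masses against elution volume and reads off an apparent mass for a sample peak. The standard masses are those listed above; the elution volumes and the sample peak position are hypothetical placeholders (no elution volumes are reported here), and the log-linear fit is a common convention rather than the documented procedure of this study.

```python
import math

# Molecular masses (kDa) of the calibration standards named in the text.
STANDARDS_KDA = [669, 232, 158, 67, 30]
# HYPOTHETICAL elution volumes (mL) for those standards; illustration only.
HYPOTHETICAL_VOLUMES_ML = [9.0, 11.5, 12.4, 14.3, 16.0]


def fit_calibration(volumes_ml, masses_kda):
    """Least-squares fit of log10(mass) = slope * volume + intercept."""
    logs = [math.log10(m) for m in masses_kda]
    n = len(volumes_ml)
    mean_v = sum(volumes_ml) / n
    mean_l = sum(logs) / n
    sxx = sum((v - mean_v) ** 2 for v in volumes_ml)
    sxy = sum((v - mean_v) * (l - mean_l) for v, l in zip(volumes_ml, logs))
    slope = sxy / sxx
    intercept = mean_l - slope * mean_v
    return slope, intercept


def estimate_mass_kda(volume_ml, slope, intercept):
    """Apparent molecular mass (kDa) of a peak eluting at volume_ml."""
    return 10 ** (slope * volume_ml + intercept)


slope, intercept = fit_calibration(HYPOTHETICAL_VOLUMES_ML, STANDARDS_KDA)
# Hypothetical sample peak position, chosen only to exercise the function.
print(round(estimate_mass_kda(12.0, slope, intercept)), "kDa (apparent)")
```

An apparent mass obtained this way depends on molecular shape as well as mass, which is why an elongated monomer and a compact dimer can be difficult to distinguish by gel filtration alone.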
Analyzing the protein–protein interaction of HCF152
HCF152, HCF152-2 and luciferase were synthesized in vitro and 35S-labeled using the TNT T3 coupled reticulocyte lysate system with plasmid constructs containing HCF152 (pcAT152), HCF152-2 (pcAT152/119) and luciferase (luciferase control T3 DNA, Promega), respectively. The labeled protein was mixed with a His6-fused recombinant protein [Trx (thioredoxin), T152-F, T152-NH or T152-CH] in a binding buffer (50 mM Tris/HCl, pH 8.0, 2 mM imidazole, 0.1% Tween 20, 2 mM dithiothreitol and 10 mM MgCl2) containing 100 mM NaCl for 20 min at room temperature. Ni-nitrilotriacetic acid agarose resin (50 µL) was then added, and incubation continued for an additional 30 min. The resin was washed five times with binding buffer containing 100 mM NaCl, and the 35S-labeled proteins that bound the resin via the His6-fused proteins were eluted with binding buffer containing 500 mM NaCl. The bound proteins were then analyzed by SDS/PAGE and autoradiography.
Preparation of RNA probes
Selected fragments of Arabidopsis chloroplast DNA (Table 1) were PCR amplified using the appropriate primers. They were used as templates for the transcription of the corresponding RNA by T7 RNA polymerase, directed by the T7 promoter sequence (AATACGACTCACTATAG) attached to the 5′-end of the forward primer. The PCR product was purified from gel using a QIAquick gel extraction kit (Qiagen), and the radiolabeled RNA probe was transcribed as described previously . For the production of nonradioactive RNA, the transcription reaction mixture included 5 mM of each nucleotide.
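The primer design described above can be sketched as follows. The T7 promoter sequence is the one quoted in the text; the gene-specific primer is a hypothetical placeholder and is not one of the primers of Table 1.

```python
# T7 promoter sequence quoted in the text; it is attached to the 5' end
# of the gene-specific forward primer.
T7_PROMOTER = "AATACGACTCACTATAG"


def add_t7_promoter(forward_primer: str) -> str:
    """Return the forward primer with the T7 promoter at its 5' end."""
    primer = forward_primer.upper().replace(" ", "")
    if not primer or set(primer) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    return T7_PROMOTER + primer


# HYPOTHETICAL gene-specific primer, for illustration only.
example_primer = "GGTCTAGCATTCCATGAAGC"
t7_forward = add_t7_promoter(example_primer)
print(t7_forward)
```

Only the forward primer carries the promoter, so transcription of the PCR product yields the sense-strand RNA.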
UV-crosslinking of the protein to radiolabeled RNA was carried out as described previously . The proteins (1 pmol) were incubated with [32P]RNA (25 fmol) in buffer containing 10 mM Hepes/NaOH (pH 7.9), 30 mM KCl, 6 mM MgCl2, 0.05 mM EDTA, 2 mM dithiothreitol, 8% glycerol, 0.0067% Triton X-100 and 67 µg·mL−1 of yeast tRNA (Sigma) for 15 min. The protein and RNA were crosslinked by 1.8 J of UV irradiation in a UV-crosslinker (Hoefer Inc.), followed by digestion of the RNA with 10 µg of RNase A and 30 U of RNase T1 at 37 °C for 1 h, fractionation by SDS/PAGE and analysis by autoradiography. For the competition assay, the protein was mixed with nonradioactive RNA for 5 min and the radiolabeled RNA was then added. When ribohomopolymers were used as competitors, an average length of 400 nucleotides was used to calculate the molar amount. When ssDNA and dsDNA were used as competitors, the PCR fragment of BDd, described above, was used either denatured (ssDNA; 90 °C for 5 min) or not denatured (dsDNA).
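The conversion from competitor mass to molar excess over the 25 fmol probe can be sketched as below. The 400-nucleotide average length for ribohomopolymers and the 25 fmol probe amount are taken from the text; the average mass of about 330 g/mol per nucleotide is a rule-of-thumb value assumed here, not a figure given in this work, and the 165 ng example amount is purely illustrative.

```python
AVG_NT_MASS = 330.0  # g/mol per nucleotide; rule-of-thumb ASSUMPTION
PROBE_FMOL = 25.0    # amount of radiolabeled RNA per reaction (from the text)


def rna_fmol(mass_ng: float, length_nt: int, nt_mass: float = AVG_NT_MASS) -> float:
    """Convert an RNA mass in ng to fmol for a molecule of length_nt."""
    molar_mass = length_nt * nt_mass      # g/mol
    return mass_ng / molar_mass * 1e6     # ng / (g/mol) = nmol; x 1e6 = fmol


def fold_excess(competitor_fmol: float, probe_fmol: float = PROBE_FMOL) -> float:
    """Molar excess of competitor over the radiolabeled probe."""
    return competitor_fmol / probe_fmol


# Illustrative amount: 165 ng of a ribohomopolymer of average length 400 nt.
competitor = rna_fmol(165.0, 400)
print(competitor, "fmol;", fold_excess(competitor), "fold excess")
```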
Preparation of recombinant HCF152 and the different fragmented proteins
The molecular analysis of the high chlorophyll fluorescent mutant 152 (hcf152) revealed that the processing of the chloroplast petB transcript is impaired . The cloning and characterization of the HCF152 locus revealed that the nucleus-encoded protein contains 12 repetitions of the PPR motif (Fig. 1A). Another hcf mutant, generated by chemical EMS treatment, carries a point mutation in the same gene that results in the replacement of leucine with proline. This EMS mutant, termed hcf152-2, essentially displayed the phenotype of the hcf152-1 mutant, in which the gene is inactivated, albeit to a lesser extent. The single amino acid substitution is located near the C-terminus of the protein but in none of the 12 PPR motifs (Fig. 1A) . As proteins characterized in mutants affected in chloroplast gene expression were found to belong to the PPR family of nucleus-encoded proteins, and in order to explore the molecular mechanism by which these proteins affect gene expression post-transcriptionally, we decided to analyze the RNA-binding properties and the protein–protein interactions of a member of this family, HCF152. We therefore prepared the mature HCF152 recombinant protein, as well as several fragmented proteins containing different numbers of PPR motifs, using a bacterial expression system (Fig. 1). As HCF152 is a nuclear-encoded and chloroplast-located protein, it contains a transit peptide that is removed upon entering the chloroplast. This part of the protein could be defined using the ChloroP program . In this work, the recombinant proteins were produced without the predicted transit peptide in order to resemble the mature protein in the chloroplast. Most of the available bacterial expression systems that were examined produced insoluble proteins. 
Finally, production of the protein at 16 °C using the pBAD/Thio-Topo expression system, in which the protein is fused to an 18-kDa thioredoxin at the N-terminus and a His6 tag at the C-terminus, was found to be the most efficient way to obtain a significant amount of soluble and active protein, as well as the different truncated forms. All attempts to produce the HCF152-2 protein, which harbors a single amino acid substitution, as a soluble protein failed. In addition to the full-length protein, the N-terminus half (HCF152-NH) and the C-terminus half (HCF152-CH) of the protein, containing four and eight PPR motifs, respectively, were also produced (Fig. 1). To characterize the PPR motif further, proteins containing either one or two PPR motifs were also produced (Fig. 1).
The recombinant proteins were purified and analyzed by SDS/PAGE in order to determine the correct molecular mass (Fig. 1B). In addition, all proteins were verified by immunoblotting using antibodies against the His6 tag (not shown).
HCF152 forms a homodimer of about 180 kDa
Several regulatory proteins described previously to be involved in chloroplast gene expression were found to be associated in high molecular mass complexes of about 300–1700 kDa [8,15,17,18,20,36,37]. However, when chloroplast soluble proteins were fractionated through a size-exclusion column, and the presence of HCF152 was detected with specific antibodies, HCF152 eluted in one peak at about 180 kDa and was not associated with a high molecular mass complex . As the molecular mass of mature HCF152 is 85 kDa, this result could be explained in three ways. First, the protein is a monomer that is not associated in a complex but nevertheless fractionates at this molecular mass. Second, HCF152 is associated with other chloroplast proteins in a complex. Third, HCF152 forms a homodimer. As it was found that two molecules of HCF152 could interact with each other (see below), the homodimer option seemed feasible. In order to analyze this possibility, purified recombinant HCF152 (HCF152-F) was fractionated on the same column. Figure 2A shows that the purified recombinant protein eluted at a molecular mass of about 180 kDa. As no protein other than recombinant HCF152 was loaded on the column, we concluded that HCF152 either forms a homodimer of about 180 kDa or is a monomer eluting from the column at this position. In order to distinguish between these possibilities, the purified recombinant protein was fractionated by nondenaturing PAGE. Part of the protein population migrated at 180–200 kDa, while increasing the dithiothreitol concentration from 0.5 to 10 mM resulted in migration of the entire HCF152 population at 80–90 kDa (Fig. 2B). Taken together, these results suggested that HCF152 forms a homodimer and is not associated with other proteins.
In order to further characterize the homodimer formation of the HCF152 protein, we analyzed the single amino acid substitution mutant HCF152-2, and the C- and N-terminus halves of the protein. We had to use the in vitro translation system as HCF152-2 could not be produced in bacteria in a soluble form. First, HCF152, HCF152-2 and luciferase (as a negative control) were produced and 35S-labeled by the in vitro transcription/translation system. Each protein was then incubated with recombinant HCF152, HCF152-NH (N-terminus half) or HCF152-CH (C-terminus half), tagged with His6, followed by the addition of Ni-nitrilotriacetic acid/agarose and precipitation of the bound proteins. The results of this experiment are presented in Fig. 3. The 35S-labeled HCF152 bound the His6 HCF152, producing a signal five times greater than that of the background. The binding site was found to be located in the C-terminus half of the protein, although binding to the C-terminus half (CH) was only half of that to the full-length protein (Fig. 3). Interestingly, the binding efficiency of the full-length mutant HCF152-2 was also approximately half of that of the full-length HCF152, while the C-terminus half of HCF152 did not bind HCF152-2 at all (Fig. 3). No binding beyond the background level was observed for the luciferase protein that was used as a negative control (Fig. 3). In another control experiment, a His6-fused thioredoxin did not bind any of the 35S-labeled proteins beyond the background level (Fig. 3, lane marked ‘-’). These results confirmed the formation of an HCF152 dimer suggested by the size fractionation experiments. The C-terminus part of HCF152 is partially responsible for the intermolecular interaction. Furthermore, the results suggest that the single amino acid substitution in the EMS-generated hcf152-2 mutant produces a protein partially impaired in dimer formation, and therefore that dimer formation may be important for the biological activity of HCF152.
RNA-binding characteristics of HCF152
Analyzing the chloroplast transcript pattern of the hcf152-1 and hcf152-2 mutants by RNA gel blot revealed differences in the psbB-psbT-psbH-petB-petD polycistronic transcriptional unit and, more specifically, in the accumulation of the petB intron and the processing of the 3′-termini of psbH [9,32]. This observation led to the hypothesis that the HCF152 gene product is required, either directly or indirectly, for the correct 3′-end processing of psbH and splicing of the petB intron or, alternatively, for stabilization of the splicing products . As HCF152 was shown to bind RNA with a preference for the psbH 3′-end and petB intron–exon sequences , we first asked whether this protein also binds other RNA molecules with high affinity.
An RNA-binding UV-crosslinking experiment was performed analyzing several RNA molecules, as in previous work , spanning the psbB multicistronic transcript, as well as several other chloroplast transcripts. These included three RNA molecules resembling the psbA, rbcL and ribosomal 16S transcripts that do not contain introns and additional molecules of the petD transcript. In order to prevent nonspecific binding of HCF152 to RNA, an extra amount of about 330-fold yeast tRNA was included in the reaction mixture. As described in previous work , RNA molecules corresponding to the 5′- and 3′-borders of the petB intron, and to the corresponding parts of the related exons, were found to bind the recombinant HCF152 (BDd, BDe and BDf in Fig. 4). The UV-crosslinking assay gave very low signals with RNAs corresponding to the sequences of psbB and petD (BDa, and BDk). In addition, a high UV-crosslinking signal was also obtained with RNA corresponding to the psbA, but not with the 16S ribosomal RNA and rbcL or an RNA derived from the Bluescript plasmid (Fig. 4B). However, as the sequence of nucleotides differed within the RNA molecules, the lack of a UV-crosslinking signal does not necessarily imply that no binding takes place. In order to verify the binding properties of HCF152, we analyzed the binding of these RNAs using the UV-crosslinking competition method. In this method, only a single RNA is radioactively labeled to provide the UV-crosslinking signal when binding the protein, while extra amounts of the tested RNA molecule are added to compete with the binding of the radioactive RNA. An RNA that efficiently competes for the binding binds HCF152 with high affinity. The IC50 parameter was defined as the concentration of the competitor RNA that resulted in a 50% reduction in the radioactive UV-crosslinking signal (examples of competition curves are found below). 
The lower the IC50 value for a specific competitor RNA, the higher the affinity of this RNA to the protein.
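One simple way to read an IC50 value off a competition curve is sketched below. The interpolation scheme (linear in the logarithm of the competitor amount, between the two points that bracket 50% of the signal) is an assumption made for illustration and is not necessarily how the values in this study were derived; the example curve is hypothetical.

```python
import math


def ic50(amounts, signals):
    """Estimate the competitor amount that halves the crosslinking signal.

    amounts: increasing, positive competitor amounts (e.g. fold molar excess).
    signals: signal at each amount, as a fraction of the no-competitor control.
    Returns None when the signal never falls to 50% in the tested range.
    """
    points = list(zip(amounts, signals))
    for (a_lo, s_lo), (a_hi, s_hi) in zip(points, points[1:]):
        if s_lo >= 0.5 >= s_hi and s_lo != s_hi:
            # Fraction of the way from the upper point to the lower point.
            frac = (s_lo - 0.5) / (s_lo - s_hi)
            log_a = math.log10(a_lo) + frac * (math.log10(a_hi) - math.log10(a_lo))
            return 10 ** log_a
    return None


# HYPOTHETICAL competition curve, for illustration only.
fold = [1, 10, 100]
strong = [0.9, 0.7, 0.3]   # efficient competitor
weak = [1.0, 0.95, 0.9]    # poor competitor
print(ic50(fold, strong), ic50(fold, weak))
```

A competitor that never reduces the signal to 50% within the tested range returns None, corresponding to a very low affinity.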
The UV-crosslinking assay was repeated using [32P]BDd RNA, the protein, an approximately 330-fold excess of yeast tRNA, and the corresponding nonradioactive RNAs in molar excess as indicated in the figures. The results of this assay confirmed our previous result that the RNAs derived from the petB intron–exon junctions bound HCF152 with a relatively high affinity, while the other RNAs of the multicistronic transcript displayed very low affinities (Fig. 4, Table 2, ). In addition, two RNA probes derived from the boundaries of the second intron of this polycistronic transcription unit, that of petD (BDi and BDk), displayed a lower affinity than BDd and BDf but a significantly higher one than the low affinity probes (Table 2). Moreover, RNA derived from the petD intron (BDj) showed high binding affinity. In addition, RNA derived from the psbA gene, which does not contain an intron, displayed high binding affinity, while RNA derived from the ribosomal 16S RNA and rbcL displayed low binding affinity (Table 2). High binding affinity was obtained for ssDNA composed of the heat-denatured PCR fragment used to transcribe the BDd RNA. However, a very low binding affinity was observed with dsDNA composed of the same PCR fragment but not denatured (Table 2). ssDNA binding is a characteristic of many RNA-binding proteins [39,40]. Upon analyzing ribohomopolymers, a high binding affinity to poly(A) was found, and to a lesser extent also to poly(U). By contrast, very low affinities were found for poly(C) and poly(G) (Fig. 5, Table 2). Total RNA of the photosynthetic cyanobacterium Synechocystis, which is believed to be related to the evolutionary ancestor of the chloroplast, as well as yeast tRNA, displayed very low affinity to the recombinant HCF152 (Table 2).
Table 2. RNA-binding characteristics of HCF152. Competitive UV-crosslinking experiments were performed, as shown in Fig. 6, with radiolabeled BDd RNA and various in vitro synthesized competitor RNA probes (BDa–k, BA and 16S), as well as ribohomopolymers, Synechocystis total RNA, yeast tRNA and single- and double-stranded DNA. For each assay, the results were plotted as shown in Fig. 6. The concentration of the competitor that resulted in a 50% inhibition of the signal was defined as IC50 and is shown in the Table. Values represent at least three experiments. The amounts of Synechocystis RNA and yeast tRNA are given in ng.
Taken together, these results indicated that HCF152 is an RNA-binding protein binding certain RNA molecules with higher affinity than others. In addition to previously shown molecules resembling the petB intron–exon junctions, it also binds RNA molecules resembling the petD intron, psbA and poly(A). Therefore, in order to better define the RNA-binding site, we carried out a deletion analysis of the BDd RNA, the high affinity binding molecule.
Defining the binding site of HCF152 in the psbH-petB transcript
In order to further characterize the HCF152 binding site, we synthesized a series of deleted RNA probes. Each RNA probe was designed by successive deletion of the BDd and BDf sequences, to which HCF152 bound with high affinity. When these RNA molecules were analyzed in the competitive UV-crosslinking assay, all were found to bind HCF152 with high affinity (Fig. 6A,B). In order to define the target sequence better, the Dd120 sequence was successively shortened in steps of 20 nucleotides from the 5′- or 3′-ends. When the resulting seven RNA molecules were analyzed in the UV-crosslinking competition assay, five (D1–D5) were found to display high binding affinity and two (D6 and D7) low affinity (Fig. 6C,D). Therefore, the 21 nucleotides that differ between D5 and D6 are the putative target sequence for HCF152 in the psbH-petB intergenic region. In addition, it is possible that a secondary structure involving the interaction between the 21 nucleotides and neighboring sequences is involved in formation of the binding site.
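The logic of such a deletion series can be sketched in a few lines. The function below generates progressively shorter fragments by removing a fixed number of nucleotides from one end; the 120-nucleotide sequence is a hypothetical placeholder, and the exact boundaries of the D1–D7 probes used here are not reproduced.

```python
def deletion_series(seq: str, step: int = 20, end: str = "5", min_len: int = 20):
    """Return fragments obtained by repeatedly removing `step` nucleotides.

    end="5" deletes from the 5' end, end="3" from the 3' end. Deletion
    stops before a fragment would become shorter than min_len.
    """
    if end not in ("5", "3"):
        raise ValueError("end must be '5' or '3'")
    fragments = []
    current = seq
    while len(current) - step >= min_len:
        current = current[step:] if end == "5" else current[:-step]
        fragments.append(current)
    return fragments


# HYPOTHETICAL 120-nt RNA standing in for the Dd120 probe.
dd120 = "ACGU" * 30
for fragment in deletion_series(dd120, end="5"):
    print(len(fragment), fragment[:10] + "...")
```

Comparing the shortest fragment that still competes efficiently with the longest one that does not brackets the region required for binding.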
Together, these experiments defined several target sequences for high affinity RNA-binding of HCF152. These included the 21 nucleotides of the UTR between psbH and petB, the 82 nucleotides of petB intron (Df82), part of the psbA transcript (BA) and the petD intron (indicated by stars in Fig. 4A). Analyzing the secondary structure of these molecules revealed the ability to form a stem-loop structure (though with very short stems) with a single-stranded region of an adenosine-rich sequence (not shown). As HCF152 displayed high binding affinity to poly(A) (Table 2, Fig. 5), the single-stranded region of adenosine stretch could be a putative binding site for HCF152.
Contribution of the multiple PPR motifs for the affinity of HCF152 to RNA
As the major characteristic of HCF152 is the 12 PPR motifs, our next question related to their contribution to the RNA-binding phenomenon. First, the protein was divided into the C- and the N-terminus halves, containing eight and four PPR motifs, respectively (Fig. 1). Each part was analyzed for binding affinities to ribohomopolymers. While the full-length HCF152 bound poly(A) with the highest affinity of all molecules examined in this study, this affinity was drastically reduced in the C- and N-terminus halves of the proteins (Table 2, Fig. 5). On the other hand, while the full-length protein did not bind poly(G), the N-terminus half of the protein displayed affinity to this ribohomopolymer (Fig. 5B). The situation with poly(U) and poly(C) did not differ significantly between the full-length and parts of the protein. All bound poly(U) with a relatively high affinity and poly(C) with a very low affinity. Taken together, HCF152 as a full-length protein binds poly(A) with the highest affinity, but when divided into parts, each part displays a higher affinity to poly(U) than to poly(A). This experiment implies that the combinations of several PPR motifs, and probably the sequence of certain amino acids inside and perhaps outside the motifs, are responsible for the RNA-binding properties of the full-length protein.
To further characterize the RNA-binding properties of the proteins consisting of four, eight and 12 PPR motifs (HCF152-NH, -CH and -F, respectively), we performed a competitive UV-crosslinking assay using an Arabidopsis RNA sequence. As shown in Fig. 7, a 50-fold molar excess of BDd and BDf competitor RNAs, but not BDb, BDc and BDe RNAs, competed the RNA-binding of HCF152-F, the mature proteins containing 12 PPR motifs. However, binding of HCF152-CH, comprised of eight PPR motifs, to BDd RNA was competed efficiently by a 50-fold excess of BDe, BDd, and BDf (Fig. 7). These RNAs competed less efficiently with the binding of HCF152-NH, composed of four PPR motifs (Fig. 7). Therefore, the results of the experiments presented in Figs 5 and 7 showed that the number of PPR motifs in HCF152 is important for the specificity of RNA-binding. In addition, the results obtained so far strongly suggest that the combination of several PPR motifs determines the affinity and specificity of binding to RNA. However, as the PPR motifs differ in their sequence of amino acids, the specific amino acid sequence inside the PPR motifs might contribute significantly to their affinity and specificity. Furthermore, the number of PPR motifs seems to represent a critical parameter determining binding properties.
In order to obtain further details about this question, RNA-binding assays were performed using truncated proteins composed of one or two PPR motifs. As the UV-crosslinking signals for these proteins were very faint due to the low affinity of RNA-binding, the amount of protein in the reaction mixture was significantly increased (Fig. 8). The results of this experiment showed that the affinity of the truncated protein composed of four PPRs was significantly reduced compared to the full-length protein. Reducing the number of PPR motifs to two resulted in an additional decrease of more than 10-fold (Fig. 8). Moreover, a UV-crosslinking signal could be obtained with these proteins only when the yeast tRNA was omitted from the binding assay, indicating less specific binding to RNA. Finally, the two truncated proteins containing a single PPR domain did not show any binding to RNA (Fig. 8). Together, these experiments demonstrated that for HCF152, the PPR motif is indeed an RNA-binding domain, but for the particular domains tested here the binding activity requires at least two PPR motifs and is drastically increased by increasing the number of PPRs to four and 12, respectively. As each PPR is unique and distinct in sequence, it is possible that other PPR domains of this protein, as well as sequences between the PPR motifs, display higher affinity than the two tested here. Indeed, recent analysis of LRP130, a human PPR protein located mainly in the mitochondria, revealed RNA-binding activity of truncated proteins composed of only two PPRs or even one .
HCF152 is a specific RNA-binding protein
The results of this and the previous study  clearly show that HCF152 is an RNA-binding protein whose affinity and specificity are dependent upon the number and possibly the amino acid sequence of the PPR domains. One of the four high affinity RNA-binding targets identified has been narrowed down to 21 nucleotides of the untranslated region between psbH and petB. The high affinity sequences are characterized by an adenosine stretch placed between sequences potentially forming short double-stranded regions. Indeed, the highest binding affinity of HCF152 to RNA was observed for poly(A). However, the adenosine stretch alone cannot serve as the target sequence, as adenosine stretches are spread throughout the chloroplast genome and are easily found in most of the chloroplast transcripts as well as in the RNA probes used in this study. In addition, as the results showed that no simple nucleotide sequence forms the basis for high-affinity binding, it may be suggested, as for most specific RNA-binding proteins, that a combination of structural and sequence properties defines the binding site for HCF152 in the RNA molecule. Similar observations were reported for other PPR proteins, p67  and LRP130 . However, a detailed analysis of LRP130 (harboring nine PPR domains), published while this manuscript was under review, revealed that unlike HCF152, it displayed high binding affinity to poly(G) and poly(U) but not poly(A) . An additional major difference between HCF152 and LRP130 is that in LRP130, RNA-binding properties similar to those of the full-length protein could be obtained with a truncated part composed of only two PPR motifs . Therefore, because of the differences between the two proteins, the analysis of more proteins and PPR motifs is required to define the specificity and affinity of RNA-binding and the interaction with proteins. 
The target region identified in this study for HCF152 is located downstream (+36 to +56) of the psbH stop codon and upstream (−79 to −99) of the petB translation start codon, suggesting that HCF152 is not involved in the translation regulation of the petB gene. However, a PPR protein of maize containing 14 PPR motifs that clustered in a very similar manner to HCF152, CRP1, has been proposed to be involved in the translation of the petD mRNA in addition to RNA processing .
Function of HCF152 in petB RNA maturation
The hcf152 strain phenotype suggests that HCF152 functions in the processing of petB by possibly stabilizing the 3′psbH terminus and the splicing products . In the chemically induced EMS mutant hcf152-2, in which one amino acid not located in a PPR motif was substituted, a similar yet less significant phenotype was observed. Unlike the hcf152-1 mutant in which the HCF152 protein is not produced, the protein in the hcf152-2 seems to be produced and accumulated, albeit with one amino acid changed. Our protein–protein binding experiment suggests that this single amino acid substitution has weakened the dimer formation in comparison to the HCF152 (Fig. 3). This observation suggests that the dimer formation is important for the function of HCF152 in RNA processing, and the inability to form the dimer results in a loss of function. Interestingly, the dimer formation was found to be located at the C-terminus half of the protein, and the single amino acid substitution next to the C-terminus of the protein but not in a PPR motif. Therefore, the question still arises as to whether or not the PPR motif, of which most of the HCF152 is composed, functions in the dimer formation.
The petB intron is classified as a group II intron, a class of introns that may self-splice in vitro. However, the group II introns of higher plant chloroplasts have lost the ability to self-splice when incubated in vitro, and auxiliary factors are therefore required for correct splicing. Several auxiliary factors from several organisms that assist group II intron splicing have been identified, and the molecular mechanisms by which these proteins work are now under extensive study [31,42,43]. For example, the maize CRS1 and CRS2 proteins facilitate the splicing of group II introns in the chloroplast; CRS1 is required for only one, the atpF intron, while CRS2 is involved in the splicing of nine of the 10 chloroplast group IIB introns [6,15,37]. The expression of CRS2 in E. coli together with the corresponding RNA did not promote splicing, indicating that other protein(s) are also required . Indeed, while this manuscript was under review, the discovery of new group II splicing factors that bind CRS2 and harbor a newly characterized repeated domain, CRM, was reported . Both maize CRS2 and Arabidopsis HCF152 participate in the splicing of the petB intron. However, these two components are not engaged in the same protein complex. It will be interesting to explore whether CRS2 and HCF152 can interact with each other and/or promote the splicing. A possible model of how HCF152 is involved in the stabilization of the splicing products of the petB intron could be that the protein binds to the UTR region between psbH and petB, and to domain IV of the petB intron. The binding of the HCF152 homodimer in this region somehow stabilizes the splicing products, possibly by folding the RNA into the correct splicing structure.
PPR motif is a polynucleotide-binding domain
The PPR motif was first described by Small and Peeters as a special structural motif whereby six repeats create a tunnel that fits the size of one single strand of RNA. So far, several PPR proteins, including HCF152, each containing a number of PPR motifs, have been characterized as proteins involved in RNA and DNA metabolism [20–27]. Two are characterized as DNA-binding factors [24,25], and it therefore appears that the PPR motif could be involved in both DNA- and RNA-binding. So far, proteins of the PPR family have not been identified in prokaryotes or in Archaea, including cyanobacteria, which are believed to be closely related to the chloroplast ancestor (EMBL-EBI proteome database). Nevertheless, PPR proteins are very abundant in higher plants, whereas other eukaryotic organisms contain no more than five PPR proteins. This observation suggests that this nucleus-encoded protein family has evolved into the ‘tools’ by which factors required for organelle gene expression are encoded and controlled by the nuclear gene expression machinery. Indeed, when the 452 ‘members’ of the PPR family in Arabidopsis were analyzed for their location in the cell, 189 were predicted to be located in the mitochondria and 96 in the chloroplast (35; EMBL-EBI proteome database; Fig. 9).
In this study, we showed that the PPR motif is an RNA-binding domain. Yet high-affinity binding could not be obtained with one motif only, but was possible with several repetitions of the motif. Repetition of the motif seems to determine the specificity of binding to the target RNA sequence as well. Indeed, analysis of the PPR proteins of the Arabidopsis genome disclosed an average of 11 repetitions of this motif, and often 7–16 repetitions were found (Fig. 9; EMBL-EBI proteome database). Accordingly, the computationally predicted structure of PPR proteins implies that six PPR motifs form a tunnel that fits the size of one single-stranded RNA. In addition, the particular amino acid sequence in each PPR motif is variable and probably contributes to the RNA-binding properties. In contrast, as described above, the recent analysis of another PPR protein located mainly in human mitochondria, LRP130, revealed a very limited contribution of the PPR motifs to the RNA-binding properties, as the deletion of seven out of nine motifs did not change the RNA-binding properties. Defining the exact structure of the HCF152 homodimer together with the unspliced petB precursor RNA should uncover the molecular mechanism by which this protein specifically facilitates the processing of this transcript.
We would like to thank the members of our laboratories for their helpful discussions and encouragement, and Lior Rosner for technical assistance during the preliminary stages of this work. This research was supported by grants from the Deutsche Forschungsgemeinschaft to Karin Meierhoff through SFB 189 at the University of Düsseldorf, and by a grant from the German–Israeli Foundation for Scientific Research and Development (GIF). Takahiro Nakamura is a recipient of a VATAT postdoctoral fellowship.
Table S1. The oligonucleotides used for the production of RNA probes and HCF152 truncated proteins. Italicized letters show promoter sequences for T7 RNA polymerase. Underlined letters indicate the restriction enzyme site.