Organellar function is essential for eukaryotic life, and depends upon activities that are maintained by proteins either synthesized within the organelle (organelle-encoded) or imported from the cytosol (nucleus-encoded). Many of these functions are carried out by enzyme complexes with both organelle-encoded and nucleus-encoded subunits, and coordinating the expression of organelle and nuclear genes is therefore critical. For example, the plant mitochondrial proteome is estimated to consist of c. 2000 gene products (Millar et al., 2005). Of these, only c. 40 proteins are encoded within the mitochondrial genome, most of which are essential subunits of oxidative phosphorylation enzyme complexes or ribosomal proteins (Kubo et al., 2000; Notsu et al., 2002; Handa, 2003; Ogihara et al., 2005; Sugiyama et al., 2005; Tian et al., 2006; Allen et al., 2007; Kubo & Newton, 2008; Fujii et al., 2010). Although these represent only a small proportion of all mitochondrial proteins, their importance to mitochondrial function means that incorrect regulation of these mitochondrial genes would severely affect the whole system. As plant mitochondria encode no machinery to manage their own RNA expression and post-transcriptional RNA modification processes, these essential steps are entirely reliant on nuclear-encoded gene products (Binder & Brennicke, 2003).
RNA metabolism plays a particularly important role in organelle gene expression (Stern et al., 2010), and a wide array of RNA-binding proteins is found in organelles. Pentatricopeptide repeat (PPR) proteins are the most numerous of these. The first PPR protein to be described was the Saccharomyces cerevisiae mitochondrial protein Pet309, found to participate in translation of cox1 (Manthey & McEwen, 1995; Manthey et al., 1998; Tavares-Carreon et al., 2008). Subsequently, Pet309, as well as the protein P67 implicated in transcription in Triticum aestivum mitochondria (Ikeda & Gray, 1999) and CRP1 in Zea mays involved in translation of photosynthesis genes (Fisk et al., 1999; Schmitz-Linneweber et al., 2005), were recognized to be members of a large family of related proteins following the systematic analysis of the Arabidopsis thaliana (thale cress) genome (Aubourg et al., 2000; Small & Peeters, 2000). Since the discovery and definition of the PPR consensus sequence (Small & Peeters, 2000), predicted to form an antiparallel double alpha-helical motif, many studies have been conducted on PPR proteins covering biochemistry, molecular functions, cellular functions and roles in development (Schmitz-Linneweber & Small, 2008). The general picture that results from these studies is that PPR proteins form sequence-specific associations with RNA, and that these associations affect folding, processing and/or translation of the RNA, thus modulating expression of the transcript. For example, CRP1 in maize has been shown to associate with the 5′ UTRs of the photosynthetic genes psaC and petA, and mutants in which the CRP1 gene is disrupted fail to translate these transcripts, leading to defects in photosynthesis (Fisk et al., 1999; Schmitz-Linneweber et al., 2005). Arabidopsis CRR4 was the first gene found in plants to be directly involved in cytidine-to-uridine RNA editing (Kotera et al., 2005), and many other PPR editing factors have been identified since (Table 1).
These are only two examples; for further discussion of the molecular functions of PPR proteins, readers are referred to reviews that treat this topic comprehensively (Andres et al., 2007; Delannoy et al., 2007; Saha et al., 2007; Schmitz-Linneweber & Small, 2008; Chateigner-Boutin & Small, 2010). The PPR family is subdivided into two major classes, P and PLS. Whereas P-class PPR proteins consist of tandem arrays of canonical 35-amino-acid PPR (P) motifs, PLS-class proteins contain, in addition, slightly longer (L) or shorter (S) variant PPR motifs, arranged in tandem arrays of characteristic P-L-S triplets (Lurin et al., 2004; Rivals et al., 2006; O’Toole et al., 2008). PLS-class proteins can be divided into two further groups, the E subclass and the DYW subclass, based on their C-terminal domains (Lurin et al., 2004; Rivals et al., 2006; O’Toole et al., 2008).
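The classification scheme described above can be illustrated with a minimal sketch. It assumes that a protein has already been annotated with an ordered list of motif labels ("P", "L", "S") and an optional list of C-terminal domain labels ("E", "DYW"). The function name `classify_ppr` and this input format are hypothetical and chosen only for illustration; they do not correspond to any published annotation tool. The sketch also assumes that the DYW label takes precedence when both C-terminal labels are present, and it returns plain "PLS" when no C-terminal domain has been annotated.

```python
from typing import List, Optional


def classify_ppr(motifs: List[str], c_terminal: Optional[List[str]] = None) -> str:
    """Assign a PPR protein to a class from its motif annotation.

    motifs:     ordered motif labels, each "P", "L" or "S" (hypothetical format)
    c_terminal: optional C-terminal domain labels, e.g. ["E"] or ["E", "DYW"]
    """
    c_terminal = c_terminal or []
    if not motifs or any(m not in {"P", "L", "S"} for m in motifs):
        raise ValueError("motifs must be a non-empty list of 'P', 'L' or 'S'")

    # P class: tandem arrays made up only of canonical 35-amino-acid P motifs.
    if all(m == "P" for m in motifs):
        return "P"

    # PLS class: L and/or S variant motifs are present; the subclass is
    # decided by the C-terminal domain. Assumption: DYW takes precedence
    # when both labels are annotated.
    if "DYW" in c_terminal:
        return "PLS-DYW"
    if "E" in c_terminal:
        return "PLS-E"
    return "PLS"  # no C-terminal domain annotated


if __name__ == "__main__":
    print(classify_ppr(["P", "P", "P", "P"]))
    print(classify_ppr(["P", "L", "S", "P", "L", "S"], ["E"]))
    print(classify_ppr(["P", "L", "S", "P", "L", "S"], ["E", "DYW"]))
```

In practice the motif labels would come from a profile-based motif search over the protein sequence; that step is outside the scope of this sketch.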
The aim of this review is to integrate current knowledge about the evolution of the PPR gene family and to connect it with the changes thought to have taken place in the sequence and expression of the organelle genomes during the history of land plants. Given the rapid expansion of PPR proteins in land plants, it is of interest to understand which selective forces might have driven the increase in their number. Recent progress in genome sequencing and the consequent enrichment of comparative genomic databases are starting to reveal how and why the plant-specific expansion of the PPR family occurred.