Keywords:

  • ectopic recombination;
  • Hill–Robertson interference;
  • processed pseudogene;
  • recombination rate

Abstract

The aim of this article is to demonstrate possible recombination-associated evolutionary forces affecting the genomic distribution of processed pseudogenes. The relationship between recombination rate and the distribution of processed pseudogenes is analysed in the human genome. The results show that processed pseudogenes preferentially accumulate in regions of low recombination rate, and that this correlation cannot be explained by indirect relationships with GC content and gene density. Several explanatory models for the observation are discussed. A model of selection against ectopic recombination is tested based on the difference in distribution pattern between two classes of processed pseudogenes, which differ in their potential to stimulate ectopic recombination. Our results indicate that the correlation between processed pseudogene density and recombination rate probably results, in part, from selection against ectopic recombination between closely located homologous processed pseudogenes. We also found a length effect in processed pseudogene distribution: long processed pseudogenes are located more preferentially in regions of low recombination rate than short ones.


Introduction

Pseudogenes are noncoding copies of genes, which are created by genomic duplication or retrotransposition (Maestre et al., 1995; Balakirev & Ayala, 2003). The former type is referred to as duplicated pseudogenes and the latter as processed pseudogenes. Processed pseudogenes arise from the reverse transcription of mRNA transcripts followed by integration into the genome (Maestre et al., 1995). Because of their retropositional origin, processed pseudogenes are also referred to as retropseudogenes. Processed pseudogenes are nonautonomous retrotransposons which are probably mobilized by long interspersed elements (LINEs), a kind of autonomous retrotransposon in the genome (Esnault et al., 2000; Kazazian, 2004). Processed pseudogenes are known to occur in a great number of genomes, including prokaryotes and eukaryotes, and especially in mammalian genomes (Harrison et al., 2001; Homma et al., 2002; Torrents et al., 2003; Zhang et al., 2003).

Evolutionary pressures imposed on processed pseudogenes are reflected not only in their sequence variation, but also in their abundance and genomic distribution. Therefore, revealing the forces shaping the distribution pattern of processed pseudogenes along chromosomes has important implications for the population dynamics of pseudogenes and the evolutionary process of genomes. The population dynamics of retrotransposons, including processed pseudogenes, depends on two factors: transposition activity, and selection for (or against) the beneficial (or deleterious) effects of transposition activity. However, the concrete forms of the evolutionary forces affecting the genomic distribution and dynamics of processed pseudogenes in natural populations remain incompletely understood. It has been reported that processed pseudogenes are not evenly distributed across the genome, and that their distribution is correlated with many factors, such as local GC content (Torrents et al., 2003), DNA turnover rate (Harrison et al., 2003; Torrents et al., 2003), stability of chromosomal structure (Pavlicek et al., 2001; Zhang et al., 2002), the length of oogenesis (Drouin, 2006) and the distribution of functional elements (Chen et al., 2002).

A great many studies have shown that the distribution of transposons, including DNA transposons and retrotransposons, is associated with recombination in various genomes (Duret et al., 2000; Bartolome et al., 2002; Rizzon et al., 2002; Petrov et al., 2003; Wright et al., 2003; Hua-Van et al., 2005; Song & Boissinot, 2007). Thus, the recombination rate may also have an influence on the distribution of processed pseudogenes. Two models, referred to as the ectopic recombination model and the deleterious insertion model, have been proposed for the relationship between recombination rate and transposon distribution. Ectopic recombination is an event of illegitimate recombination between elements located at distinct sites in the genome, which can cause chromosomal rearrangements and deletions of functional elements. Under the ectopic recombination model, the frequency of ectopic recombination between transposons is supposed to be proportional to the meiotic recombination rate (Langley et al., 1988), and transposons are expected to accumulate in less recombining regions because selection against ectopic recombination between transposons is relatively strong in regions of high recombination rate (Charlesworth et al., 1994). The deleterious insertion model also predicts an accumulation of transposons in regions of reduced recombination, because Hill–Robertson interference (Hill & Robertson, 1966) reduces the efficiency of selection against deleterious transposon insertions in regions of low recombination rate. However, neither model can explain some exceptions, for example, those associated with the unusual reproductive system of Caenorhabditis elegans (Duret et al., 2000). Processed pseudogenes are a kind of retrotransposon, so it is reasonable to speculate that recombination has an impact on their distribution. However, to our knowledge, no comprehensive survey has yet been carried out. The genome-wide identification of pseudogenes (Torrents et al., 2003; Zhang et al., 2003) and the increasingly improved high-resolution map of recombination (Kong et al., 2002) provide an opportunity to study the relationship between processed pseudogene distribution and recombination in the human genome. This article reports our investigation of the relationship between recombination rate and the distribution of processed pseudogenes in the human genome. Our results show that processed pseudogenes preferentially accumulate in regions of low recombination rate. Several explanatory models for the observation are discussed.

Alu and LINE-1 (L1) retrotransposons resemble processed pseudogenes in some respects. Alu and L1 retrotransposons are the most abundant transposable elements in the human genome, with approximately 1.1 and 0.5 million copies, respectively (IHGSC, 2001). The spread of Alu, L1 and processed pseudogenes in a genome depends in each case on the retrotransposition machinery. Of these three kinds of retroelements, L1 is an autonomous retrotransposon, whereas Alu and processed pseudogenes are nonautonomous retrotransposons that are probably mobilized by LINEs (Esnault et al., 2000; Kazazian, 2004). The majority of both Alu and L1 elements are nonfunctional ‘fossils’ that either were already incapable of replication at the time of insertion or became inactive later because of mutations (Abrusan & Krambeck, 2006), which is similar to processed pseudogenes. Because of their similarity to processed pseudogenes in transposition process and nonfunctionality (Maestre et al., 1995; IHGSC, 2001), the Alu and L1 retrotransposons are included in our study for the purpose of comparison.

It is necessary to test the different models to fully understand the genomic distribution of processed pseudogenes. Discriminating between the different models is complex, because different mechanisms (i.e., insertion bias and the different types of selection) can have the same effect on transposon distribution, and the same observation can be explained by radically different mechanisms. For instance, the abundance of L1 in regions of low recombination can be explained by a bias of insertion into low GC content regions where the recombination rate is low, by a reduced efficiency of selection against ectopic recombination or insertional mutations (Rizzon et al., 2002; Song & Boissinot, 2007), or by a modifying effect of L1 elements on recombination (Jensen-Seaman et al., 2004). Unlike high copy number retroelements, such as L1 and Alu repeats, which are densely distributed at a megabase scale across the genome, processed pseudogenes are only moderately abundant in the human genome and can readily be separated into two classes, which differ in their potential to stimulate ectopic recombination according to the distance between homologous processed pseudogenes. The ectopic recombination model, therefore, can be tested using the two classes of processed pseudogenes that are at risk and not at risk for stimulation of ectopic recombination. Under the ectopic recombination model, the processed pseudogenes that are at risk for stimulation of ectopic recombination are expected to accumulate in regions of low recombination rate more preferentially than those that are not at risk. Our data indicate that processed pseudogenes can serve as an ideal tool to distinguish the ectopic recombination model from the others.

Materials and methods

Positional information of sequences

Various human processed pseudogene datasets, which differ in the number and structure of the pseudogene sequences, are available at http://www.pseudogene.org. Generally, the more stringent the criterion, the fewer pseudogenes are identified. In the present article, three processed pseudogene datasets with large samples were selected for the comparative analysis of the distribution of processed pseudogenes. These datasets are referred to as Gerstein (11 766 processed) (Zhang et al., 2003), Bork (13 990 processed) (Torrents et al., 2003) and Hoppsigen (5105, all processed) (Khelifi et al., 2005). Hoppsigen is a database of human and mouse processed pseudogenes with no duplicated pseudogenes, whereas the Gerstein and Bork datasets contain both processed and duplicated pseudogenes. There are some differences in the identification criteria between the Hoppsigen, Gerstein and Bork datasets (Torrents et al., 2003; Zhang et al., 2003; Khelifi et al., 2005). The processed pseudogene identification in all three datasets is initiated by a similarity search against the human genome using protein or coding sequences of genes as queries, but the similarity search in the Bork identification is less conservative than the others, which is responsible for the high number of pseudogenes identified. Another distinct criterion used in the Bork pseudogene identification is the test of nonfunctionality using the ratio of nonsynonymous to synonymous substitution rates (Ka/Ks). The positional information of the processed pseudogenes was retrieved from the pseudogene database (http://www.pseudogene.org) constructed by the Gerstein lab. The positional information of L1 and Alu repeats and Ensembl genes was retrieved from UCSC (http://www.genome.ucsc.edu). Note that some pseudogenes have been reported to be mis-annotated as genes in Ensembl (Zhang et al., 2003). Therefore, we excluded mis-annotated genes using the following criterion: if the coding region of an Ensembl gene overlaps at least partially with a processed pseudogene, the Ensembl gene is considered mis-annotated. In the calculation of gene density, only the first transcript of each gene was selected in the case of overlapping genes and alternatively spliced genes.

Estimation of recombination rate

The human genetic map (Kong et al., 2004) was obtained from MAP-O-MAT (Kong & Matise, 2005) (http://compgen.rutgers.edu/mapomat). It is a high-resolution genetic map that combines the genotype data from both the Centre d’Etude du Polymorphisme Humain (CEPH) and deCODE pedigrees and that incorporates single nucleotide polymorphisms (SNPs). There are a total of 14 759 genetic markers, and the physical positions of the markers on the human genome (human build 36) are given in the map. The sex-averaged recombination rate was estimated using the method described by Yu et al. (2001): a cubic spline was fitted to the genetic positions as a function of physical positions for each chromosome, and the first derivatives at the marker physical positions were taken as the recombination rates (cM/Mb) at those loci. Each base pair in the interval between adjacent markers was then assigned the recombination rate at the position of the second (or downstream) marker. The genome was split into nonoverlapping 5-Mb windows, and the recombination rate for each window was computed by dividing the sum of the recombination rates at each base in the window by the size of the window. Windows that contain no marker or more than 50% unknown bases (‘N’) were excluded.
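
The window-averaging step can be illustrated with a minimal Python sketch. It assumes that the per-marker rates (the spline derivatives) have already been computed, and it covers neither the spline fit nor the exclusion of windows rich in unknown bases. The function name, the input layout and the treatment of bases lying after the last marker (which have no downstream marker and are given the last marker's rate here) are assumptions made for illustration, not details taken from the method described above.

```python
def window_recombination_rates(markers, chrom_length, window=5_000_000):
    """Average recombination rate (cM/Mb) per nonoverlapping window.

    markers: list of (physical_position_bp, rate_cM_per_Mb) for one chromosome.
    Returns {window_index: mean_rate}; windows without any marker are omitted.
    """
    markers = sorted(markers)

    # Build half-open segments [start, end) of constant rate: every base
    # between two adjacent markers takes the rate of the downstream marker.
    segments = []
    prev = 0
    for pos, rate in markers:
        if pos > prev:
            segments.append((prev, pos, rate))
        prev = pos
    if markers and prev < chrom_length:
        # Assumption: bases after the last marker reuse the last marker's rate.
        segments.append((prev, chrom_length, markers[-1][1]))

    rates = {}
    for start in range(0, chrom_length, window):
        end = min(start + window, chrom_length)
        # Exclude windows that contain no marker.
        if not any(start <= pos < end for pos, _ in markers):
            continue
        # Sum of per-base rates = rate * number of overlapping bases.
        total = sum(rate * max(0, min(end, seg_end) - max(start, seg_start))
                    for seg_start, seg_end, rate in segments)
        rates[start // window] = total / (end - start)
    return rates
```

Because the rate is piecewise constant, the sum over bases reduces to a weighted sum over segments, which avoids iterating over individual base pairs.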

Density calculation

The distribution bias of processed pseudogenes in regions of different recombination rates can be investigated by correlation analysis between recombination rate and the absolute density of processed pseudogenes (number Mb−1). The distribution bias can also be evaluated by the relative density of processed pseudogenes in regions of different recombination rates. The relative density was calculated as follows: (i) the whole genome was divided into nonoverlapping fragments of 5 Mb, which were then classified according to their recombination rate into bins with 0.5 cM/Mb intervals; (ii) the proportion of the processed pseudogenes (Pppgene) in a given recombination rate bin was calculated as the number of processed pseudogenes in that bin divided by the total number of processed pseudogenes, and the proportion of the genome in a recombination rate bin (Prec) was calculated as the number of 5-Mb fragments in that bin divided by the total number of fragments in the genome; (iii) the relative density of processed pseudogenes was calculated as Pppgene divided by Prec for the same recombination rate bin. A similar method was applied to the calculation of the relative density of processed pseudogenes in regions of different gene density: (i) the whole genome was divided into nonoverlapping fragments of 2 Mb, which were then classified according to their gene density into bins with intervals of 5 genes per Mb; (ii) the relative density of processed pseudogenes in regions of different gene density was calculated as the proportion of the processed pseudogenes in a given gene density bin divided by the proportion of the genome in the same gene density bin.
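
Steps (i) to (iii) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name and the input layout (one pair of recombination rate and pseudogene count per equal-sized window) are hypothetical, and each bin is labelled by its lower bound.

```python
from collections import defaultdict


def relative_density(windows, bin_width=0.5):
    """Relative pseudogene density per recombination-rate bin.

    windows: iterable of (recombination_rate, n_pseudogenes), one entry per
             equal-sized genomic window (e.g. 5 Mb).
    Returns {bin_lower_bound: Pppgene / Prec}.
    """
    ppgenes_per_bin = defaultdict(int)
    windows_per_bin = defaultdict(int)
    for rate, count in windows:
        b = int(rate // bin_width)          # (i) assign the window to a bin
        ppgenes_per_bin[b] += count
        windows_per_bin[b] += 1

    total_ppgenes = sum(ppgenes_per_bin.values())
    total_windows = sum(windows_per_bin.values())

    result = {}
    for b in sorted(windows_per_bin):
        p_ppgene = ppgenes_per_bin[b] / total_ppgenes   # (ii) Pppgene
        p_rec = windows_per_bin[b] / total_windows      # (ii) Prec
        result[b * bin_width] = p_ppgene / p_rec        # (iii) relative density
    return result
```

A value above 1 for a bin means that the bin holds a larger share of the pseudogenes than of the genome; a value below 1 means the opposite. The same function applies to gene density bins if the first element of each pair is the gene density and `bin_width` is set to 5.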

Testing the ectopic recombination model

The ectopic recombination model states that processed pseudogenes will accumulate in regions of reduced recombination, where ectopic recombination, supposed to be less frequent, would have less deleterious effects. We tested the ectopic recombination model based on the hypothesis that the processed pseudogenes that are at risk for stimulation of ectopic recombination (denoted as at-risk ER ppgenes) should accumulate in regions of low recombination rate more preferentially than the processed pseudogenes that are not at risk for stimulation of ectopic recombination (denoted as nonrisk ER ppgenes). The at-risk ER ppgenes are defined as homologous processed pseudogenes (generated from the same gene) that are located in the same 5-Mb window, and the remaining processed pseudogenes are included in the group of nonrisk ER ppgenes. To test the model, the processed pseudogenes, excluding those located on chromosome Y, were classified into these two groups. The distribution patterns of the two groups in regions of different recombination rates were then compared using their relative densities and statistical analysis.
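
The classification rule can be sketched in Python as below. The record layout (keys `id`, `chrom`, `start`, `parent`), the chromosome label `chrY` and the use of the start coordinate to place a pseudogene in a window are assumptions made for this illustration.

```python
from collections import defaultdict


def classify_er_risk(ppgenes, window=5_000_000):
    """Split processed pseudogenes into at-risk and nonrisk ER groups.

    ppgenes: list of dicts with keys 'id', 'chrom', 'start' (bp) and
             'parent' (the gene the pseudogene derives from).
    Returns (at_risk_ids, nonrisk_ids) as two sets.
    """
    groups = defaultdict(list)
    for pg in ppgenes:
        if pg["chrom"] == "chrY":       # chromosome Y is excluded
            continue
        # Homologous pseudogenes share a parent gene; being in the same
        # window means same chromosome and same window index.
        key = (pg["chrom"], pg["start"] // window, pg["parent"])
        groups[key].append(pg["id"])

    at_risk, nonrisk = set(), set()
    for ids in groups.values():
        (at_risk if len(ids) > 1 else nonrisk).update(ids)
    return at_risk, nonrisk
```

With a fixed window grid, two homologous pseudogenes that lie close together but on either side of a window boundary fall into the nonrisk group, which is a consequence of the window-based definition rather than of this sketch.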

Results and discussion

Accumulation of processed pseudogenes in low recombination regions

We analysed the correlation between the absolute density of processed pseudogenes and recombination rate for nonoverlapping 5-Mb windows. The results are shown in Table 1. The pairwise correlations in the table show that (i) Gerstein pseudogene density is significantly negatively correlated with recombination rate; and (ii) Bork and Hoppsigen pseudogene densities are also negatively, but not significantly, correlated with recombination rate.

Table 1.   Spearman correlations of absolute density of retrotransposons with recombination rate and gene density in 5-Mb windows.

                   Retrotransposon vs. Recombination       Retrotransposon vs. Gene density
Dataset     N      Pairwise  P-value  Partial  P-value     Pairwise  P-value  Partial  P-value
Gerstein    582    −0.258    <0.0001  −0.233   <0.0001     0.365     <0.0001  0.409    <0.0001
Bork        582    −0.049    0.23     −0.200   <0.0001     0.657     <0.0001  0.507    <0.0001
Hoppsigen   582    −0.046    0.26     −0.089   0.03        0.476     <0.0001  0.395    <0.0001
L1          582    −0.342    <0.0001
Alu         582    0.295     <0.0001

The partial correlation coefficients between processed pseudogene density and recombination rate are obtained using gene density and GC content as a control, and the partial correlation coefficients between processed pseudogene density and gene density are obtained using recombination rate and GC content as a control.

Chromatin structure, gene richness and recombination rate are viewed as factors that influence the distribution of processed pseudogenes. Given that processed pseudogene density, recombination rate, GC content and gene density correlate with each other, the correlation of processed pseudogene density with recombination rate might be affected by the other variables. Partial correlation analysis (Table 1), however, shows that processed pseudogene density is still negatively correlated with recombination rate when both GC content and gene density are held constant, suggesting that the association of processed pseudogene density with recombination rate is mediated neither by GC content nor by gene density. Figure 1 also shows that processed pseudogenes, particularly the Gerstein and Bork pseudogenes, show a distribution bias towards regions of low recombination rate. We also found that processed pseudogene density is positively correlated with gene density (Table 1), which is in contrast with a model of selection against the insertion of processed pseudogenes into gene-dense regions. The effect of selection against the insertion of processed pseudogenes into gene-dense regions is detectable only in the regions of highest gene density in Fig. 2, and cannot account for the general increasing trend of processed pseudogene density with gene density. The relatively more frequent occurrence of repeat-associated ectopic recombination in gene-poor regions (Abrusan & Krambeck, 2006) might be responsible for the deficiency of processed pseudogenes in gene-poor regions. It has been shown that the shift of Alu and L1 repeat distributions towards gene-rich regions during evolution might be caused by ectopic recombination (Abrusan & Krambeck, 2006). Specifically, ectopic recombination between the repeats has a less deleterious effect in gene-poor regions than in gene-dense regions and therefore occurs there at a higher frequency, so that the repeats are deleted at a higher rate in gene-poor regions, leaving a higher abundance in gene-dense regions. For the same reason, processed pseudogenes might be deleted by repeat-associated ectopic recombination at a higher rate in gene-poor regions than in gene-dense regions, leading to the accumulation of processed pseudogenes in gene-dense regions as a passive consequence of their elimination from gene-poor regions.
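
For readers unfamiliar with partial rank correlation, the following Python sketch shows one standard way to compute a Spearman correlation between two variables while controlling for two others, using the recursive partial correlation formula on rank-transformed data. It is given for illustration only: the text does not state which procedure or software produced the partial coefficients in Table 1, the function names are hypothetical, and P-values are not computed.

```python
import math


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = mean_rank
        i = j + 1
    return r


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def spearman(a, b):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def first_order(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the effect of one control z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_spearman(x, y, z1, z2):
    """Spearman correlation of x and y controlling for z1 and z2."""
    r = spearman
    xy_1 = first_order(r(x, y), r(x, z1), r(y, z1))
    xz2_1 = first_order(r(x, z2), r(x, z1), r(z2, z1))
    yz2_1 = first_order(r(y, z2), r(y, z1), r(z2, z1))
    return first_order(xy_1, xz2_1, yz2_1)
```

In the setting of Table 1, `x` would be the pseudogene density per window, `y` the recombination rate, and `z1` and `z2` the gene density and GC content of the same windows.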

Figure 1.  The relative processed pseudogene density in regions of different recombination rates, calculated over 5-Mb windows across the human genome. Recombination rate bins with 0.5 cM Mb−1 intervals are used. See ‘Materials and methods’ for details of the calculation of the relative processed pseudogene density.

Figure 2.  The relative processed pseudogene density in regions of different gene density, calculated over 2-Mb windows across the human genome. Gene density bins with intervals of 5 genes Mb−1 are used. See ‘Materials and methods’ for details of the calculation of the relative processed pseudogene density.

Only a limited number of fine-scale recombination rate estimates are available for the human genome (Crawford et al., 2004); the current genome-wide recombination map is at megabase-scale resolution (Kong et al., 2002). The results of the correlation analysis given in Table 1 are based on 5-Mb windows. To test the effect of window size, we also analysed the correlations based on 3-Mb and 10-Mb windows using the Gerstein dataset. The results are similar to those based on 5-Mb windows (see Tables 1 and 2), indicating that differences in window size at the megabase scale have no bearing on our main conclusions. However, the correlations are stronger when the window size is larger, which may be suggestive of a cumulative effect of processed pseudogenes in large-scale regions with conserved recombination rate patterns (Serre et al., 2005).

Table 2.   Spearman correlations of Gerstein processed pseudogene density with recombination rate and gene density for different window sizes.

Variable 1       Variable 2      Window size   N     Pairwise  P-value  Partial  P-value
ppgene density   Recombination   3-Mb          959   −0.235    <0.0001  −0.222   <0.0001
                                 10-Mb         301   −0.274    <0.0001  −0.286   <0.0001
ppgene density   Gene density    3-Mb          959   0.298     <0.0001  0.333    <0.0001
                                 10-Mb         301   0.334     <0.0001  0.387    <0.0001

The partial correlation coefficients between processed pseudogene density and recombination rate are obtained using gene density and GC content as a control, and the partial correlation coefficients between processed pseudogene density and gene density are obtained using recombination rate and GC content as a control.

In recent years, it has been reported that recombination plays an important role in the distribution of transposons along chromosomes in a variety of organisms (Duret et al., 2000; Bartolome et al., 2002; Rizzon et al., 2002; Petrov et al., 2003; Wright et al., 2003; Song & Boissinot, 2007). The association between the distribution of transposons and recombination depends on both the type of transposon (DNA or mRNA transposon) and the organism, and can be affected by other factors, such as gene density and the insertion preference of the transposition process. For the true correlation between transposon distribution and recombination rate, two models, referred to as the ectopic recombination model and the deleterious insertion model, have been discussed (Bartolome et al., 2002; Hua-Van et al., 2005). We now discuss several explanatory models, including these two, for the negative correlation between processed pseudogene density and recombination rate.

Firstly, processed pseudogenes may act as suppressors of recombination by reducing the homology of homologous chromosomes. It has been shown that a decrease in the similarity between two homologous sequences caused by indels or SNPs can reduce the homologous recombination rate (Brenner et al., 1985; Balakirev & Ayala, 2003). For a similar reason, it is possible that the integration of processed pseudogenes into chromosomes, and the subsequent mutations occurring in the pseudogenes, could decrease the similarity of the homologous regions and thereby reduce the homologous recombination rate. As an evolutionary consequence, the recombination rate would be reduced in regions where processed pseudogenes have accumulated. The suppressor model seems possible at first glance. However, we do not think it is the reason for our observation. Both L1 and Alu retrotransposons have a transposition machinery similar to that of processed pseudogenes (Maestre et al., 1995; IHGSC, 2001), and the majority of them are also nonfunctional disabled copies characterized by a high decay rate. Hence, if the modifying effect of processed pseudogenes on recombination rate were strong enough to be detected over evolutionary history, a similar effect should also be detected for L1 and Alu. However, no such effect was observed for Alu (Table 1).

Secondly, similar to the effect described in the weak selection model proposed for the negative correlation between intron length and recombination rate (Comeron & Kreitman, 2000), the insertion of processed pseudogenes into regions of low recombination rate might be favoured by selection because it decreases the Hill–Robertson interference between weakly selected mutations, which allows the adjacent genes or exons to evolve more efficiently. The weak selection hypothesis could in theory explain the negative correlation between recombination rate and processed pseudogene density. However, we speculate that such positive selection is not strong or ubiquitous enough in the genome to explain the observation, because the majority of new insertions of processed pseudogenes are more likely to be either neutral or detrimental.

Thirdly, the deleterious insertion model refers to selection against deleterious insertions of processed pseudogenes into the genome, and predicts an accumulation of processed pseudogenes in regions of reduced recombination, where the efficiency of selection is decreased because of Hill–Robertson interference (Hill & Robertson, 1966). Our data show a positive correlation between processed pseudogene density and gene density, so it should be borne in mind that the selection against deleterious insertion described in the deleterious insertion model does not correspond to selection against insertion of processed pseudogenes into gene-dense regions.

Finally, ectopic recombination is known to be detrimental because it causes chromosomal rearrangements or deletions of functional elements. If the frequency of ectopic recombination between transposons is proportional to the meiotic recombination rate (Langley et al., 1988), insertions of transposons into regions of high recombination rate would have more deleterious effects (Charlesworth et al., 1994; Bartolome et al., 2002). Therefore, it is possible that the distribution preference of processed pseudogenes for regions of low recombination rate is caused by relatively strong selection against ectopic recombination between homologous processed pseudogenes in regions of high recombination rate. In summary, we favour the negative selection described in the deleterious insertion model and the ectopic recombination model as the explanation for the accumulation of processed pseudogenes in regions of low recombination rate.

It should be noted that the results from the different pseudogene databases do not resemble each other well (Table 1). Each of the models discussed above, regardless of the mechanism involved, predicts that processed pseudogenes would be more abundant in regions of reduced recombination. Why, then, is the correlation between recombination rate and pseudogene density stronger for the Gerstein dataset and weaker for the Bork and Hoppsigen datasets (see Table 1)? The Hoppsigen dataset contains a relatively small number of pseudogenes, although they are of high quality (Khelifi et al., 2005), so the weak correlation for the Hoppsigen dataset might be caused by the small sample size. The sample size of the Bork dataset is the largest, but the correlation is also weak. This might be caused by the relatively low quality of the pseudogenes in the dataset, because the method used in the Bork pseudogene identification is less conservative (Torrents et al., 2003; Khelifi et al., 2005), which might result in more contamination of the dataset by sequences that are not pseudogenes.

Test of the ectopic recombination model

In the previous section, we discussed some models for the accumulation of processed pseudogenes in regions of reduced recombination, but did not strictly test whether each of the models is responsible for the observation. Generally, it is hard to distinguish between models that lead to the same expectation. Here, we test the ectopic recombination model. Of the three processed pseudogene datasets, the Gerstein dataset shows the most significant relationship between processed pseudogene distribution and recombination rate, so it is used in this analysis in order to make any biological effect easier to detect.

The ectopic recombination model is based on selection against ectopic recombination, which has deleterious effects, between highly similar nonallelic sequences. That is to say, selection against ectopic recombination is absent if there is no risk of ectopic recombination between homologous processed pseudogenes. Ectopic recombination between two genomic sequences depends on two important, though not exclusive, conditions: high similarity, and a short distance between the two sequences, both located on the same (or homologous) chromosome(s). It is impossible (or extremely rare) for ectopic recombination to occur between two genomic sequences located far from each other on the same chromosome or located on different chromosomes. Therefore, the processed pseudogenes that are at risk and not at risk for stimulation of ectopic recombination can be separated to test the ectopic recombination model. The test can be based on the hypothesis that the processed pseudogenes that are at risk for stimulation of ectopic recombination (denoted as at-risk ER ppgenes) should accumulate in regions of low recombination rate more preferentially than the processed pseudogenes that are not at risk for stimulation of ectopic recombination (denoted as nonrisk ER ppgenes).

To test the ectopic recombination model, we classified the processed pseudogenes, excluding those located on chromosome Y, into two groups, at-risk ER ppgenes and nonrisk ER ppgenes, and compared the distribution patterns of the two groups in regions of different recombination rates. See ‘Materials and methods’ for the definitions of the at-risk and nonrisk ER ppgenes. The classification yields 2856 at-risk ER ppgenes and 8508 nonrisk ER ppgenes. The distribution patterns of the at-risk and nonrisk ER ppgenes in regions of different recombination rates are illustrated in Fig. 3a. As shown in the figure, the at-risk ER ppgenes preferentially accumulate in regions of low recombination rate (0.0–0.4 cM Mb−1), whereas the nonrisk ER ppgenes seem to have no observable distribution bias. The latter impression, however, is an artefact of the coordinate scale of the figure, because Fig. 3b shows that the nonrisk ER ppgenes also exhibit some distribution bias towards regions of low recombination rate. The correlation between nonrisk ER ppgene density and recombination rate (Spearman correlation rs = −0.20, P < 0.0001, N = 582) confirms this. Compared to the nonrisk ER ppgenes, the at-risk ER ppgenes show a much stronger distribution bias towards regions of reduced recombination rate, and a Wilcoxon test also shows that recombination rates at the at-risk ER ppgene loci are significantly lower than those at the nonrisk ER ppgene loci (Z = −14.2, P < 0.0001), suggesting an effect of selection against ectopic recombination on the distribution of the at-risk ER ppgenes. Of course, the selective effect associated with ectopic recombination is not the only one at work, because the nonrisk ER ppgenes also show a distribution bias towards regions of reduced recombination rate, although it is not as strong as that of the at-risk ER ppgenes.

Figure 3.  The relative densities of the at-risk ER ppgenes and nonrisk ER ppgenes in regions of different recombination rates, calculated over 5-Mb windows across the human genome. Recombination rate bins of 0.2 cM Mb−1 are used. Panel (b) is an enlargement of the distribution pattern of the nonrisk ER ppgenes in panel (a), given only to illustrate the distribution bias clearly.

We propose that the risk of ectopic recombination between homologous processed pseudogenes is inversely correlated with the distance between them on the same chromosome, and that closely located homologous processed pseudogenes would therefore preferentially accumulate in regions of low recombination rate, where the deleterious effect of ectopic recombination is avoided. For the same reason, recombination rate is expected to be positively correlated with the distance between homologous processed pseudogenes. To test this hypothesis, we investigated the relationship between the recombination rate at each processed pseudogene and the distance between that pseudogene and its nearest homologous counterpart. The result shows that recombination rate is significantly positively correlated with the distance between homologous processed pseudogenes (rs = 0.137, P < 0.0001, N = 6262), which is consistent with the expectation.
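The distance to the nearest homologous counterpart can be computed as in the sketch below. This is an illustration under stated assumptions rather than the procedure used here: homology is taken to mean sharing the same parent gene, each pseudogene is represented by a single coordinate, and the record layout and function name are hypothetical.

```python
from collections import defaultdict


def nearest_homolog_distance(ppgenes):
    """For each (parent_gene, chrom, position) record, return the distance to
    the nearest other pseudogene with the same parent gene on the same
    chromosome, or None when no such counterpart exists."""
    groups = defaultdict(list)
    for idx, (parent_gene, chrom, position) in enumerate(ppgenes):
        groups[(parent_gene, chrom)].append((position, idx))

    distances = [None] * len(ppgenes)
    for members in groups.values():
        members.sort()  # sort by position along the chromosome
        for k, (position, idx) in enumerate(members):
            gaps = []
            if k > 0:
                gaps.append(position - members[k - 1][0])
            if k + 1 < len(members):
                gaps.append(members[k + 1][0] - position)
            if gaps:
                distances[idx] = min(gaps)
    return distances


# Hypothetical toy records: (parent gene, chromosome, position in bp).
toy_ppgenes = [
    ("RPL21", "chr1", 100),
    ("RPL21", "chr1", 400),
    ("RPL21", "chr1", 1500),
    ("RPL21", "chr2", 120),
    ("GAPDH", "chr1", 130),
]
toy_distances = nearest_homolog_distance(toy_ppgenes)
```

Pseudogenes with no counterpart on the same chromosome receive `None` and would be left out of the correlation with recombination rate.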

We also analysed the correlation between the average length of processed pseudogenes and recombination rate in 5-Mb windows, separately for the at-risk ER ppgenes and the nonrisk ER ppgenes. The results show that the average length of processed pseudogenes is significantly negatively correlated with recombination rate for both the at-risk ER ppgenes (rs = −0.110, P = 0.025, N = 440) and the nonrisk ER ppgenes (rs = −0.214, P < 0.0001, N = 582). Under the ectopic recombination model, selection against the deleterious effects of ectopic recombination should affect longer elements more strongly than shorter ones, as they represent longer targets for homologous pairing (Dray & Gloor, 1997). Therefore, transposon length is expected to be negatively correlated with recombination rate (Petrov et al., 2003). The negative correlation between the length of the at-risk ER ppgenes and recombination rate is consistent with this expectation. However, ectopic recombination cannot be the only source of the length effect, because a similar effect is observed for the nonrisk ER ppgenes as well. Under the deleterious insertion model, the insertion of longer processed pseudogenes into chromosomes might be more deleterious, so that long processed pseudogenes accumulate in regions of low recombination rate, where the efficiency of selection is reduced by Hill–Robertson interference.
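The per-window average lengths used in this analysis could be obtained along the lines of the sketch below. It assumes that each pseudogene is assigned to the 5-Mb window containing its start coordinate and that length is `end - start`; both choices, like the function name and toy coordinates, are illustrative assumptions.

```python
from collections import defaultdict

WINDOW_SIZE = 5_000_000  # 5-Mb windows, as in the text


def mean_length_per_window(ppgenes, window_size=WINDOW_SIZE):
    """Map (chrom, window_index) to the mean length of the pseudogenes
    whose start coordinate falls inside that window."""
    totals = defaultdict(lambda: [0, 0])  # key -> [summed length, count]
    for chrom, start, end in ppgenes:
        key = (chrom, start // window_size)
        totals[key][0] += end - start
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Hypothetical toy records: (chromosome, start, end) in bp.
toy_records = [
    ("chr1", 1_000_000, 1_000_800),
    ("chr1", 4_000_000, 4_000_400),
    ("chr1", 6_000_000, 6_001_000),
]
toy_means = mean_length_per_window(toy_records)
```

Running this separately on the at-risk and nonrisk groups gives one mean length per occupied window, which can then be rank-correlated with the window recombination rates.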

In conclusion, the present study found a negative correlation between processed pseudogene density and recombination rate, which is consistent with selection against ectopic recombination between closely located homologous processed pseudogenes. We believe that selection against illegitimate recombination is a general, though not the only, mechanism modulating the distribution of repeated DNA fragments, such as transposons (Hua-Van et al., 2005; Song & Boissinot, 2007), pseudogenes and palindromes (Waldman et al., 1999), because closely located repeats can cause genetic instability via illegitimate recombination. We hope that this study offers deeper insight into genome evolution and has implications for genetic diseases and unstable transgenes.

Acknowledgments

We thank Liqing Zhang from Virginia Tech for her helpful discussions and the anonymous reviewers for their constructive comments on the manuscript. We also thank Hao Lin from University of Electronic Science and Technology of China, Wei Zhao from University of Science and Technology of China, and Prof. Jian-Ying Wang from Inner Mongolia University of Science and Technology of China for their help in correcting the English. This work was supported by National Natural Science Foundation grants 30660044 and 60761001.

References
