Transposable element (TE) insertion offers complex genomes a considerable advantage when new genes and splicing variants are required for adaptation: exonization, together with retrogene formation, is one of the major sources of this variability (Lin et al. 2008). TE insertions may also alter the expression of nearby genes by introducing new splice sites, polyadenylation signals, promoters and transcription factor binding sites, thereby building new transcription modules that can generate diversity among transcriptomes (Goodier & Kazazian 2008).
The most widespread family of autonomous TEs is the long interspersed nuclear element 1 (LINE-1 or L1), a retrotransposon lacking long terminal repeats (LTRs), which constitutes about half of the total number of TEs. Approximately 75% of mammalian genes contain at least one LINE-1 segment in their transcription unit, mainly within intronic regions and in poorly expressed genes (Han et al. 2004). Compelling evidence has implicated LINE-1 elements in the regulation of genome-wide gene expression, where they act as molecular fine-tuners of RNA and microRNA biology (Ramos 2009).
Under physiological conditions, L1 elements are predominantly expressed in germ cells, but they also seem to be abundantly transcribed in differentiated cells exposed to cellular stress (Li & Schmid 2001). This suggests a functional role for L1 as an integral component of the global genomic response to environmental stress and in the aetiology of cellular stress-related disease, with drastic effects on the mammalian transcriptome (Schulz 2006). In addition, physical stress is known to promptly activate neuroendocrine and immune responses, with the production of pro-inflammatory mediators that alter gene expression, especially in peripheral blood cells (Kawai et al. 2007).
In this work, we investigated the involvement of several LINE-derived sequences in exercise-induced stress through bioinformatics and gene expression analysis. We started from an already available EST dataset derived from peripheral blood mononuclear cells (PBMCs) of endurance horses (GenBank accessions from CO508721 to CO598770, published in Cappelli et al. 2007; GenBank accessions from FG341833 to FG341843 and from GH986483 to GH986492, unpublished). The dataset contains 74 transcript-derived fragments (TDFs) with different expression patterns at three time points: before the race, at the end of the race, and 24 h after the race.
As the retroelement content of the horse genome is consistent with that of other mammals (Wade et al. 2009), we assessed the mean level of TE expression in Equus caballus. RepeatMasker analysis of the horse EST database showed that repetitive elements account for only 4.76% of the sequences, with LINEs at 2.17%, as expected. Conversely, RepeatMasker analysis of our EST dataset revealed that 23 of 74 sequences (31.0% of the sequences, or 26.8% of the total sequence length in bp) belong to repetitive elements. LINEs are the most represented category, with 82.5% of the total, and L1 plays the major role among them, with about 95% of the total (Table 1). Moreover, by intersecting the genomic coordinates of L1 repeats with EquCab2.0 gene annotations, we found that 72% of these elements fall in intergenic regions, while 28% are included in at least one gene. We also calculated the coverage of this repeat within the coding sequence of L1-containing genes: more than 82% of these sequences were inside introns (Table S1).
| ||Endurance ESTs (% of 12,589 bp)||Equus caballus ESTs (% of 19,767,037 bp)|
|Total interspersed repeats||26.84||4.76|
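The intersection of repeat coordinates with gene annotations described above can be illustrated with a minimal sketch. It assumes half-open, 0-based intervals and uses toy coordinates; the `Interval` type, the function names and all the numbers are hypothetical and are not taken from the original analysis, which does not specify the tool used for this step.

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def classify_repeats(
    repeats: List[Interval], genes: List[Interval]
) -> Tuple[List[Interval], List[Interval]]:
    """Split repeats into those overlapping at least one gene and the rest."""
    genic, intergenic = [], []
    for repeat in repeats:
        if any(overlaps(repeat, gene) for gene in genes):
            genic.append(repeat)
        else:
            intergenic.append(repeat)
    return genic, intergenic


def percent(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`; 0.0 when `whole` is zero."""
    return 100.0 * part / whole if whole else 0.0


# Hypothetical toy coordinates, for illustration only.
genes = [Interval("chr1", 100, 500), Interval("chr2", 1000, 2000)]
repeats = [
    Interval("chr1", 50, 90),      # upstream of the chr1 gene
    Interval("chr1", 450, 600),    # overlaps the chr1 gene
    Interval("chr2", 2500, 2600),  # downstream of the chr2 gene
    Interval("chr3", 10, 20),      # chromosome without annotated genes
]

genic, intergenic = classify_repeats(repeats, genes)
print(f"genic: {percent(len(genic), len(repeats)):.1f}%")
print(f"intergenic: {percent(len(intergenic), len(repeats)):.1f}%")
```

The pairwise scan is quadratic and only suitable for small inputs; for genome-scale annotations an interval tree or a sorted sweep would be the usual choice.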
|GenBank Accession||Primers (Forward and Reverse)||Putative containing gene (ENSEMBL or N-SCAN naming)||Description||Repeat Type (class/family)||Real Time (basal vs race)|
|ENSECAG00000016412||Brefeldin A-inhibited guanine nucleotide-exchange protein||LINE/L1||Up-regulated||1.365||0.008|
|ENSECAG00000015342||Interleukin-8 Precursor (IL-8)||DNA/hAT||Up-regulated||2.507||0.0009|
|chrUn.282.1||Olfactory receptor 5AN1 (olfactory receptor OR11-244)||LINE/L1||– no amplification –|
|ENSECAG00000000381||DmX-like protein 2 (Rabconnectin-3)||LINE/L1||Up-regulated||1.663||0.003|
|chrUn.021.1||Tigger transposable element derived 1||LINE/L1||No change||0.494||0.41|
|ENSECAG00000005540||Putative RNA-binding protein 16||LINE/CR1||Up-regulated||1.412||0.009|
|ENSECAT00000011120||Syntaxin binding protein 5||LINE/CR1||Up-regulated||2.084||0.007|
|ENSECAG00000020278||Rap1 GTPase-activating protein 2 (Rap1GAP2)||LINE/L1||No change||1.054||0.5731|
|chr11.298.1||Connexin 45||LINE/L1||No change||0.865||0.091|
|chr11.974.1||RIKEN cDNA A530088H08 gene||LINE/L1||– no amplification –|
|ENSECAG00000015883||Aminopeptidase O (AP-O)||LINE/L1||– no amplification –|
|chr10.271.1||Zinc finger protein 420||LINE/L1||– no amplification –|
|chr16.576.1||Stromal antigen 1||LINE/L1||– no amplification –|
|ENSECAT00000025828||Uncharacterized protein KIAA0232||LINE/L1||Up-regulated||1.807||0.024|
|chr18.052.1||Erythrocyte membrane protein band 4.1 like 5||LINE/L1||Up-regulated||2.925||0.003|
An automated gene ontology (GO) analysis of our EST dataset was performed with BLAST2GO (Conesa et al. 2005) to annotate all sequences. Several ontological terms related to signal transduction and stress response were identified for the TE-derived TDFs (Tables S2–S4). Statistical analysis of the GO term distribution using Fisher's exact test supported the enrichment of terms associated with TE-derived TDFs, such as response to external stimulus (P = 0.012) and GTPase activator activity (P = 0.038).
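As an illustration of the enrichment test, the sketch below computes a one-sided Fisher's exact P-value from the hypergeometric distribution using only the Python standard library. The one-sided formulation, the function name `fisher_enrichment_p` and the example counts are assumptions made for illustration; the counts behind the reported P-values are not given in the text, and BLAST2GO may apply a different variant of the test.

```python
from math import comb


def fisher_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided enrichment P-value, P(X >= k).

    N: total number of sequences in the dataset
    K: sequences annotated with the GO term
    n: sequences in the subset of interest (e.g. TE-derived TDFs)
    k: sequences in the subset that carry the GO term
    X follows a hypergeometric distribution under the null hypothesis.
    """
    upper = min(n, K)
    # Sum the hypergeometric probabilities of observing k or more hits.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical counts: 74 TDFs in total, 23 TE-derived, 5 annotated
# with a given GO term, 4 of which are TE-derived.
p_value = fisher_enrichment_p(k=4, n=23, K=5, N=74)
print(f"P = {p_value:.4f}")
```

Working with exact integer binomial coefficients avoids floating-point underflow for datasets of this size.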
To verify that the expression of the TE-derived TDFs is modulated during exercise, qRT-PCR was carried out as described elsewhere (Cappelli et al. 2008) on PBMCs of six horses chosen from the high-level participants in national and international endurance races (90–160 km). Samples were collected at the three time points described previously. Primers were designed on the TE-derived TDFs using Primer3 software (Table 2). Ten of the 23 tested sequences showed significantly increased expression in horses at the end of the race. The fold-change, calculated as the ratio of race/basal values and expressed as log2 of the real value, ranged from 1.605X for CO508722 to 3.523X for CO508756 (Table 2).
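The fold-change calculation can be sketched as follows, assuming that relative expression values have already been normalized to a reference gene and that the group means are compared. The function names and the sample values are hypothetical; the actual quantification procedure is the one described in Cappelli et al. (2008) and is not reproduced here.

```python
from math import log2
from statistics import mean
from typing import Sequence


def fold_change(basal: Sequence[float], race: Sequence[float]) -> float:
    """Ratio of mean expression at the end of the race to mean basal expression."""
    return mean(race) / mean(basal)


def log2_fold_change(basal: Sequence[float], race: Sequence[float]) -> float:
    """log2 of the race/basal ratio; positive values indicate up-regulation."""
    return log2(fold_change(basal, race))


# Hypothetical normalized expression values for six horses.
basal_values = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8]
race_values = [2.1, 2.4, 1.7, 2.3, 2.0, 1.5]

print(f"fold change: {fold_change(basal_values, race_values):.3f}")
print(f"log2 fold change: {log2_fold_change(basal_values, race_values):.3f}")
```

On the log2 scale a value of 1 corresponds to a doubling of expression, and up- and down-regulation of equal magnitude are symmetric around zero.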
RACE experiments were successfully conducted for most of the TDFs showing increased expression. BLAT mapping of the extended cDNA sequences (GenBank accessions GU797237, GU797238 and GU797239) confirmed the genomic position of the original TDFs and revealed exercise-induced transcription of large segments of the LINEs contained in the introns of known or predicted genes.
Our results are consistent with the accepted concept that stressful environments increase TE retrotransposition as well as alternative splicing (Teneng et al. 2007), indicating that physical stress may play a role in modulating the activity of LINE-derived and/or LINE-linked sequences during strenuous exercise in horses. It is possible to hypothesize that these sequences represent relics of LINEs embedded in modern genes that are modulated by exercise and stimulated to exonize. The cellular context appears to be a crucial factor in TE exonization, as tissue- and disease-specific transposon-derived cDNA sequences have been identified (Mersch et al. 2007).
These integrated molecular and bioinformatics data provide new insight into the genomic factors and regulatory mechanisms involved in strenuous exercise in horses. They also suggest scenarios that might explain the transcription of L1-type retrotransposons in exercise-induced stress.