Transcription of LINE-derived sequences in exercise-induced stress in horses

Authors


Dr S. Capomaccio, University of Perugia, Faculty of Veterinary Medicine, Via San Costanzo, 4, 06124, Perugia, Italy.
E-mail: vete7@unipg.it

Summary

A large proportion of mammalian genomes is represented by transposable elements (TEs), most of them being long interspersed nuclear elements 1 (LINE-1 or L1). An increased expression of LINE-1 elements may play an important role in cellular stress–related conditions, exerting drastic effects on the mammalian transcriptome. To understand the impact of TEs on the known horse transcriptome, we masked the horse EST database with RepeatMasker and found that its TE content is consistent with that of other major vertebrates. A previously developed dataset of transcript-derived fragments (TDFs), deriving from exercise-stimulated horse peripheral blood mononuclear cells (PBMCs), was found to be enriched with repetitive elements (26.8% in terms of bp), most of them L1. We investigated the involvement of these TDFs in exercise-induced stress through bioinformatics and gene expression analysis. Results indicate that LINE-derived sequences are not only highly but also differentially expressed during physical effort, hinting at interesting scenarios in the regulation of gene expression in relation to exercise.

Complex genomes benefit greatly from transposable element (TE) insertion when new genes and splicing variants are required for adaptation: exonization, together with retrogene formation, is one of the major sources of this variability (Lin et al. 2008). TE insertions may also alter the expression of nearby genes by introducing new splice sites, polyadenylation signals, promoters and transcription factor binding sites, building new transcription modules that can generate diversity among transcriptomes (Goodier & Kazazian 2008).

The most widespread family of autonomous TEs is long interspersed nuclear element 1 (LINE-1 or L1), a retrotransposon lacking long terminal repeats (LTRs), which constitutes about half of the total number of TEs. Approximately 75% of mammalian genes contain at least one LINE-1 segment in their transcription unit, mainly within intronic regions and in poorly expressed genes (Han et al. 2004). Compelling evidence has implicated LINE-1 elements in the regulation of genome-wide gene expression, where they act as molecular fine-tuners of RNA and microRNA biology (Ramos 2009).

Under physiological conditions, L1 elements are predominantly found in germ cells, but they also seem to be abundantly transcribed in differentiated cells exposed to cellular stress (Li & Schmid 2001). This suggests a functional role as an integral component of the global genomic response to environmental stress and in the aetiology of cellular stress–related disease, with drastic effects on the mammalian transcriptome (Schulz 2006). In addition, it is known that physical stress promptly activates neuroendocrine and immune responses with the production of pro-inflammatory mediators that alter gene expression, especially in peripheral blood cells (Kawai et al. 2007).

In this work, we investigated the involvement of several LINE-derived sequences in exercise-induced stress through bioinformatics and gene expression analysis. We started from an already available EST dataset derived from peripheral blood mononuclear cells (PBMCs) of endurance horses (GenBank accessions from CO508721 to CO598770, published in Cappelli et al. 2007; GenBank accessions from FG341833 to FG341843 and from GH986483 to GH986492, unpublished). The dataset contains 74 transcript-derived fragments (TDFs) with different expression patterns at three time points: before the race, at the end of the race, and 24 h after the race.

As the horse genome retroelement content is consistent with that of other mammals (Wade et al. 2009), we assessed the mean level of TE expression in Equus caballus. RepeatMasker analysis of the horse EST database showed that repetitive elements account for only 4.76% of the sequences, with LINEs equal to 2.17%, as expected. Conversely, RepeatMasker analysis of our EST dataset revealed that 23 of 74 sequences (31.0% of the sequences, or 26.8% of their total length in bp) belong to repetitive elements. LINEs are the most represented category, with 82.5% of the total repeat content, and L1 plays the major role, with about 95% of the LINE content (Table 1). Moreover, intersecting L1 repeat genomic coordinates with the EquCab2.0 assembly, we found that 72% of these elements fall in intergenic regions, while 28% are included in at least one gene. We also calculated the coverage of this repeat within L1-containing genes: more than 82% of these sequences were inside introns (Table S1).
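The genic versus intergenic split described above amounts to an interval-overlap test between repeat coordinates and gene coordinates. The following is a minimal sketch of that idea, not the pipeline used in this study: the function name `classify_repeats`, the half-open coordinate convention and the toy coordinates are all assumptions made for illustration.

```python
from collections import defaultdict


def classify_repeats(repeats, genes):
    """Label each repeat 'genic' if it overlaps any gene on the same
    chromosome, otherwise 'intergenic'.

    repeats, genes: lists of (chrom, start, end) tuples; coordinates are
    assumed to be half-open intervals [start, end).
    """
    genes_by_chrom = defaultdict(list)
    for chrom, start, end in genes:
        genes_by_chrom[chrom].append((start, end))

    labels = []
    for chrom, r_start, r_end in repeats:
        # Two half-open intervals overlap when each starts before the other ends.
        overlaps = any(
            r_start < g_end and g_start < r_end
            for g_start, g_end in genes_by_chrom[chrom]
        )
        labels.append("genic" if overlaps else "intergenic")
    return labels


# Toy, made-up coordinates (NOT taken from EquCab2.0).
toy_genes = [("chr1", 100, 500), ("chr2", 1000, 2000)]
toy_repeats = [
    ("chr1", 450, 600),    # overlaps the chr1 gene
    ("chr1", 700, 800),    # between genes
    ("chr2", 1500, 1600),  # fully inside the chr2 gene
    ("chr3", 10, 20),      # chromosome with no annotated gene
]
labels = classify_repeats(toy_repeats, toy_genes)
genic_fraction = labels.count("genic") / len(labels)
```

A linear scan over the genes of each chromosome is enough for a toy example; for genome-wide repeat annotations, a sorted or indexed interval structure would be the more usual choice.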

Table 1.   RepeatMasker output on different groups of sequences.

| Repeat class/family       | Endurance ESTs (% on 12589 bp) | Equus caballus ESTs¹ (% on 19767037 bp) |
|---------------------------|--------------------------------|------------------------------------------|
| SINEs                     | 3.83                           | 1.07                                     |
|   Alu/B1                  | 0                              | 0                                        |
|   MIRs                    | 1.61                           | 0.41                                     |
| LINEs                     | 22.15                          | 2.17                                     |
|   LINE1                   | 20.94                          | 1.54                                     |
|   LINE2                   | 0                              | 0.56                                     |
|   L3/CR1                  | 1.21                           | 0.04                                     |
|   RTE                     | 0                              | 0.01                                     |
| LTR elements              | 0                              | 0.7                                      |
|   ERVL                    | 0                              | 0.16                                     |
|   ERVL-MaLRs              | 0                              | 0.33                                     |
|   ERV_classI              | 0                              | 0.18                                     |
|   ERV_classII             | 0                              | 0.02                                     |
| DNA elements              | 0.87                           | 0.81                                     |
|   hAT-Charlie             | 0                              | 0.5                                      |
|   TcMar-Tigger            | 0                              | 0.13                                     |
| Unclassified              | 0                              | 0.02                                     |
| Total interspersed repeats | 26.84                         | 4.76                                     |
| Small RNA                 | 2.22                           | 0.97                                     |
| Satellites                | 0                              | 0                                        |
| Simple repeats            | 0                              | 0.35                                     |
| Low complexity            | 0                              | 0.82                                     |

¹ Downloaded from GenBank, December 2009.

The query species was Equus caballus, RepeatMasker version open-3.2.7, sensitive mode. Run with blastp version 2.0MP-WashU RepBase Update 20090120, RM database version 20090120.
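The proportions quoted in the text can be re-derived from the Table 1 values. The short sketch below does only that arithmetic; the variable names and the `share` helper are illustrative, and the final L1 ratio between the two datasets is a derived figure that is not reported in the text.

```python
# Values copied from Table 1 (percentage of masked bp in each dataset).
endurance = {"LINEs": 22.15, "LINE1": 20.94, "total_interspersed": 26.84}
horse_est = {"LINEs": 2.17, "LINE1": 1.54, "total_interspersed": 4.76}


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# LINEs as a share of all interspersed repeats in the endurance ESTs.
line_share = share(endurance["LINEs"], endurance["total_interspersed"])

# L1 as a share of all LINE content in the endurance ESTs.
l1_share = share(endurance["LINE1"], endurance["LINEs"])

# Fraction of TDFs (23 of 74) that contain repetitive elements.
te_sequence_share = share(23, 74)

# Illustrative only: ratio of L1 content between the two datasets.
l1_ratio = endurance["LINE1"] / horse_est["LINE1"]

print(f"LINE share of repeats: {line_share:.1f}%")
print(f"L1 share of LINEs: {l1_share:.1f}%")
print(f"TE-containing TDFs: {te_sequence_share:.1f}%")
print(f"L1 content ratio (endurance / horse ESTs): {l1_ratio:.1f}")
```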

A similar behaviour is apparent for our TE-derived TDFs mapped to the horse genome using BLAT (Kent 2002): 17 of 23 sequences were contained in predicted or known genes (Table 2).

Table 2.   Details of TE-derived TDF expression and mapping. Exp. status, FC and P-value refer to the real-time PCR comparison (basal vs race).

| GenBank accession | Forward primer | Reverse primer | Putative containing gene (ENSEMBL or N-SCAN naming) | Description | Repeat type (class/family) | Exp. status | FC | P-value |
|---|---|---|---|---|---|---|---|---|
| CO508721 | GGCAATCTCACTTCTGGGTAG | CTTGAGTAGTTGCTATATCTTGGC | chr5.772.1 | Dihydropyrimidine dehydrogenase | LINE/L1 | Up-regulated | 1.424 | 0.004 |
| CO508722 | AGTCTCGTAGGCTTTCTTCAC | TTGATGATGCAGAAGGAGGAG | ENSECAG00000016412 | Brefeldin A-inhibited guanine nucleotide-exchange protein | LINE/L1 | Up-regulated | 1.365 | 0.008 |
| CO508730 | TAGATACTGCGTGGGACAATGGC | CAATGAAACTTCAAGCAAATC | ENSECAG00000015342 | Interleukin-8 Precursor (IL-8) | DNA/hAT | Up-regulated | 2.507 | 0.0009 |
| CO508733 | TTCGGGAAAGAGGGAGTGG | TTCATAGCACCAGCACCAAG | | | LINE/L1 | Up-regulated | 1.45 | 0.0213 |
| CO508740 | ATAATAATGCCCACTGCCAATG | AGAATCACTTGATAGGCTCTGG | chrUn.282.1 | Olfactory receptor 5an1 (olfactory receptor or11-244) | LINE/L1 | No amplification | | |
| CO508741 | AGAGCACTGAGGAGATAGAATTG | GAGTTGACAATATATAGGCAGCAG | ENSECAG00000000381 | DmX-like protein 2 (Rabconnectin-3) | LINE/L1 | Up-regulated | 1.663 | 0.003 |
| CO508743 | AAGAACAGAGCCTCAGGATATG | CCAATGACATGGGTGTTAGATC | chrUn.021.1 | Trigger transposable element derived 1 | LINE/L1 | No change | 0.494 | 0.41 |
| CO508751 | AACCGCAATGAAGTACCACTAC | TATGGTCAGTCTTTCAGTTTAGGC | | | LINE/L1 | No change | 0.764 | 0.756 |
| CO508752 | AGAATATCACTACACACCCATTAG | TCCAGTTCCACACCCAAATC | | | LINE/L1 | No change | 0.497 | 0.1158 |
| CO508753 | ATCCAGCCTCCTGACTATCC | CTCCTAACCCTGTGAGATTCC | | | SINE/MIR | No change | 0.635 | 0.3705 |
| CO508755 | CCCTTTAGGTTTATTTGCTCCAG | AATGGGCAATCAGGATGGTG | ENSECAG00000005540 | Putative RNA-binding protein 16 | LINE/CR1 | Up-regulated | 1.412 | 0.009 |
| CO508756 | CCAGGTGCTACAAGGTAAGAC | GGTTTATTTGCTCCAGGTGAAG | ENSECAT00000011120 | Syntaxin binding protein 5 | LINE/CR1 | Up-regulated | 2.084 | 0.007 |
| CO508760 | TAAGGCCAGTTTAGTTGTAATGAG | ATTCAGGGAGAGACAAAGAGTG | ENSECAG00000020278 | Rap1 GTPase-activating protein 2 (Rap1GAP2) | LINE/L1 | No change | 1.054 | 0.5731 |
| CO508763 | TCTTCACTGTAGATCCTATCTTCC | AAATCAGTGTGACAAAGAAGGC | chr11.298.1 | Connexin 45 | LINE/L1 | No change | 0.865 | 0.091 |
| CO508765 | CTAGTTCTTGTCCCTTCCTTGG | GGGCAGAGTACTGGAGAGG | | | LINE/L1 | No change | 0.627 | 0.955 |
| CO508768 | AGGCAAGTGAGGGAAGTTG | ACAGATATAACCACCAGAGTAACC | chr11.974.1 | Riken cdna a530088h08 gene | LINE/L1 | No amplification | | |
| FG341837 | TGACTGCGTACCAATTCCG | AGTGATTGTCTTACAGGTTTGTG | ENSECAG00000015883 | Aminopeptidase O (AP-O) | LINE/L1 | No amplification | | |
| FG341839 | TGAGCAATATCCACAAAGTTCTG | ACTGGCTGCGTATAATATTCTTG | chr10.271.1 | Zinc finger protein 420 | LINE/L1 | No amplification | | |
| FG341840 | CAGACCTAGACTGCTCATCAAG | GCTAACATCTGTGCCAATCTTC | | | SINE/tRNA | Up-regulated | 1.383 | 0.0494 |
| FG341845 | AAACCACCTTTGGGAGCAG | CTGAGTCTGAAGGATGCTTATAG | chr16.576.1 | Stromal antigen 1 | LINE/L1 | No amplification | | |
| GH986484 | GCTCTATATTCTATCCCATTGGTC | TGAGTAAAGGAAATCGCAACAC | ENSECAT00000025828 | Uncharacterized protein KIAA0232 | LINE/L1 | Up-regulated | 1.807 | 0.024 |
| GH986488 | ATTCCCACTAGGCAAGATGAG | CTACAATTGAGAGGAGGTGTATAG | chr18.052.1 | Erythrocyte membrane protein band like 5 | LINE/L1 | Up-regulated | 2.925 | 0.003 |
| GH986492 | ATAATACTGTGCCAATGTGACTG | CGCATCGCCTCTTCCATC | ENSECAG00000001651 | Dymeclin | LINE/L1 | No change | 1.218 | 0.338 |

An automated gene ontology (GO) analysis of our EST dataset was performed with BLAST2GO (Conesa et al. 2005) to annotate all sequences. Several ontological terms related to signal transduction and stress response were identified for the TE-derived TDFs (Tables S2–S4). Statistical analysis of the GO data distribution using Fisher's exact test supported the enrichment of terms associated with TE-derived TDFs, such as response to external stimulus (P = 0.012) and GTPase activator activity (P = 0.038).
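For readers unfamiliar with the enrichment test mentioned above, a one-sided Fisher's exact test can be written as a hypergeometric tail probability. The sketch below is a generic illustration, not BLAST2GO's implementation; the function name and the counts in the example are hypothetical and are not the counts behind the reported P-values.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test for over-representation.

    N: total number of annotated sequences (reference set)
    K: sequences in the reference set carrying the GO term
    n: size of the test set (e.g. TE-derived TDFs)
    k: test-set sequences carrying the GO term

    Returns P(X >= k) where X follows a hypergeometric distribution.
    """
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts, for illustration only: 74 sequences in total,
# 23 in the test set, 6 carrying the term overall, 5 of them in the test set.
p_value = fisher_enrichment_p(k=5, n=23, K=6, N=74)
print(f"one-sided P = {p_value:.4f}")
```

Whether the test is run one-sided or two-sided, and whether a multiple-testing correction is applied, changes the reported value, so the settings should be stated alongside any P-value.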

To verify that the expression of the TE-derived TDFs is modulated during exercise, qRT-PCR was carried out as described elsewhere (Cappelli et al. 2008) on PBMCs of six horses chosen from the high-level participants in national and international endurance races (90–160 km). Samples were collected at the three time points described previously. Primers were designed on the TE-derived TDFs using Primer3 software (Table 2). Ten of the 23 tested sequences significantly increased their expression in horses at the end of the race. The fold-change, calculated as the ratio of race/basal values and expressed as log2 of the real value, ranged from 1.605X for CO508722 to 3.523X for CO508756 (Table 2).
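The fold-change definition given above (ratio of race to basal values, expressed on a log2 scale) can be sketched as follows. The function name and the input values are hypothetical; the normalisation of the raw qRT-PCR signal is described in Cappelli et al. (2008) and is not reproduced here.

```python
from math import log2


def fold_change(basal, race):
    """Return (ratio, log2 ratio) of race versus basal expression.

    basal, race: normalised relative expression values (both > 0).
    """
    ratio = race / basal
    return ratio, log2(ratio)


# Hypothetical values: expression doubles at the end of the race.
ratio, log2_ratio = fold_change(basal=1.0, race=2.0)
print(f"ratio = {ratio:.3f}, log2 ratio = {log2_ratio:.3f}")
```

On the log2 scale, up- and down-regulation are symmetric around zero, which is why a ratio of 0.25 and a ratio of 4 give values of equal magnitude and opposite sign.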

RACE experiments were successfully conducted for most of the TDFs showing an increased expression. BLAT mapping of the extended cDNA sequences (GenBank accessions GU797237, GU797238 and GU797239) confirmed the genomic position of original TDFs and revealed an exercise-induced transcription of large segments of the LINEs contained in known or predicted gene introns.

Our results are consistent with the accepted concept that stressful environments increase TE retrotransposition as well as alternative splicing (Teneng et al. 2007), indicating that physical stress may play a role in modulating the activity of LINE-derived and/or LINE-linked sequences during strenuous exercise in horses. It is possible to hypothesize that these sequences represent relics of LINEs embedded in modern genes that are modulated by exercise and stimulated to exonize. The cellular context appears to be a crucial factor in TE exonization, as tissue- and disease-specific transposon-derived cDNA sequences have been identified (Mersch et al. 2007).

These integrated molecular and bioinformatics data shed light on the genomic factors and regulatory mechanisms involved in strenuous exercise in horses. They also suggest scenarios that might explain the transcription of L1-type retrotransposons in exercise-induced stress.

Acknowledgements

The authors thank Prof David Adelson for providing an improved version of LINE-1 annotation and Mr Gianluca Alunni for his valuable technical support. MIPAF SelMol supported this work.

Conflicts of interest

The authors have declared no conflicts of interest.
