Expression of an endogenous retroviral sequence from the HERV-H group in gastrointestinal cancers
Article first published online: 1 JUN 2007
Copyright © 2007 Wiley-Liss, Inc.
International Journal of Cancer
Volume 121, Issue 7, pages 1417–1423, 1 October 2007
How to Cite
Wentzensen, N., Coy, J. F., Knaebel, H.-P., Linnebacher, M., Wilz, B., Gebert, J. and von Knebel Doeberitz, M. (2007), Expression of an endogenous retroviral sequence from the HERV-H group in gastrointestinal cancers. Int. J. Cancer, 121: 1417–1423. doi: 10.1002/ijc.22826
- Issue published online: 24 JUL 2007
- Manuscript Accepted: 28 MAR 2007
- Manuscript Received: 13 DEC 2006
- colorectal cancer;
- gastric cancer;
- endogenous retrovirus;
Human endogenous retroviruses (HERVs) account for approximately 8% of the human genome. Since the majority of HERV elements have accumulated inactivating mutations in the viral genes, only a few expressed viral open reading frames (ORFs) have been described. In this study, we have analyzed the expression of a HERV-H copy located on Xp22.3 encompassing a potential ORF immediately downstream of the viral promoter. Conventional and real-time RT-PCR-based expression analysis of this specific HERV-H sequence showed overexpression in 16 of 34 (47%) colorectal, 25 of 63 (40%) gastric and 2 of 12 (17%) pancreatic cancers, whereas no overexpression was detected in bronchial and cervical cancers. Normal human testis, placenta and breast tissue did not show expression of this sequence. CpG methylation analysis of the viral promoter revealed a loss of methylation in cell lines expressing the HERV-H sequence as compared to nonexpressing cell lines and lymphocyte DNA derived from healthy individuals. Further investigations of the HERV-H long terminal repeat and the HERV-H RNA are necessary to assess the functional relevance of the HERV-H expression.
About 8% of the human genome consists of human endogenous retroviruses (HERVs).1 They arose in the course of evolution through integration of exogenous retroviruses into the germ line. The majority of integrations occurred 30–45 million years ago. Like their exogenous protoviruses, HERVs usually consist of 4 genes that are expressed under control of the long terminal repeat (LTR): gag codes for matrix and capsid proteins, pro codes for a viral protease, and pol codes for reverse transcriptase and integrase proteins that are involved in integration and replication processes. The env gene products build the viral envelope and transmembrane structures.2
It is assumed that HERVs might have provided an antiviral resistance for their hosts and thus retroviral integration could have conferred a selection advantage during evolution. It is well accepted that HERVs were repeatedly involved in the evolution of human genes3: A HERV element was found to be responsible for the tissue specific expression of a salivary amylase gene, other HERV copies were found to provide polyadenylation sites and were associated with alternative splicing of cellular genes.4
Because of their long persistence in the host genome without selective pressure, the majority of HERV genes have become inactivated by mutations. Still, several groups have described functional HERV open reading frames (ORFs), mainly derived from the env region of HERV-K,5 but also from HERV-H6 and HERV-W.7 Berkhout et al.8 have described a functional RT protein encoded by a HERV-K pol gene.
HERV LTRs are strong promoters that may influence the transcriptional regulation of adjacent genes. Some LTRs exhibit a cell type specific activity,9 others were shown to have binding sites for human gene regulatory factors, such as Myb10 and Sp1.11 LTR activity can be altered by the host genome through methylation, a common cellular defense mechanism against expression of integrated foreign DNA.12 Recently, Seifarth et al.13 have published a comprehensive microarray based analysis of HERV expression in different normal human tissues. They found a high variation of HERV gene expression dependent on both the HERV and the tissue type. The HERV-H family was found to be rarely expressed in normal tissues.
Several HERV elements have been associated with human diseases, including autoimmune disorders14, 15 and cancer. HERV sequences were found to be overexpressed mainly in germ cell tumors and teratocarcinomas.2 Recently, HERV-K copies were found to be activated in breast cancer, prostate cancer16, 17, 18 and melanoma.19 While normal germ cell tissue was shown to express HERV sequences at various levels, normal breast tissue and melanocytes usually either completely lack or show only low levels of HERV expression.
The expression of a HERV sequence in transformed cells could have tumor-promoting or tumor-repressive effects: The activation of HERV pol genes with possible integrase and reverse transcriptase activity might have a strong impact on the stability of the host cell genome and might thus be involved in the transformation process. Activation of HERV LTRs by epigenetic mechanisms such as demethylation might influence the expression of surrounding genes. On the other hand, the expression of usually silenced HERV proteins could stimulate a cellular or humoral immune response against HERV expressing cells.
In a differential gene expression analysis using suppressive subtractive hybridization (SSH), we recently identified an endogenous retroviral sequence expressed in colorectal adenomas and carcinomas.20 In the present study, we have analyzed the structure of the HERV sequence in detail and have discovered a partially disrupted HERV-H sequence encompassing a novel ORF directly downstream of the 5′ LTR. The HERV-H RNA sequence was found to be overexpressed in a series of gastrointestinal cancers, and its expression correlated with demethylation of the HERV-H LTR.
Material and methods
Cell lines and human tissue samples
Cell lines were obtained from the DKFZ tumor cell lines repository. All cell lines were grown in DMEM or RPMI with 10% fetal calf serum. Fresh tissue samples obtained from the Departments of Surgery and Gynecology at the University of Heidelberg upon written informed consent were frozen in liquid nitrogen immediately during surgery. Histological review was provided by a staff pathologist from the Institute of Pathology, University of Heidelberg. Total RNA from normal human testis, placenta and breast tissue was purchased from Stratagene (La Jolla, CA).
Amplification, cloning and sequencing of HERV-H on Xp
All primers used for RNA and DNA amplification of the HERV-H locus were designed using the Primer3 software (http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi) unless stated otherwise. Primer sequences are summarized in Table I. Repeated 5′-rapid amplification of cDNA ends (5′-RACE) was performed to obtain information about the transcription start of the previously identified HERV-H env transcript. One microgram of RNA isolated from the colorectal cancer cell line LS174T was digested with a DNase kit (Gibco BRL, Invitrogen) as described by the manufacturer. Reverse transcription was carried out using the Superscript II RT kit (Gibco BRL, Invitrogen) as recommended by the manufacturer. To add primer binding sites at the unknown 5′ region of the synthesized cDNA, a terminal transferase reaction was performed including 18 μl of the cDNA, 2.5 μl of 2 mM dATP and 6 U of terminal transferase (Roche Applied Science). The new product with poly A stretches at the 5′ and 3′ ends was amplified with a sequence-specific primer from the known 3′ sequence (RACE-rev1 to RACE-rev3) versus an oligo dT primer (RACE-fwd) using reagents from the Platinum Pfx polymerase kit (Invitrogen). Next, PCR products were cloned as described in the TA cloning kit manual (Invitrogen). Sequencing was performed using the Big Dye terminator kit (Applied Biosystems); products were run on the ABI310 sequence analyzer (Applied Biosystems). Long distance PCR was performed using the Expand Long Template PCR System (Roche) as described by the manufacturer, applying the conditions for system 1 and using 320 ng of genomic DNA from HeLa as template with the locus-specific primers Full-fwd and Full-rev. Shorter sequences were amplified using primers identified on the predicted HERV-H sequence (data not shown). Amplicons were cloned and sequenced.
Table I. Primer sequences
| Primer | Application | Sequence |
| RACE-fwd/RT | 5′-RACE/reverse transcription | TTT TTT TTT TTT TTT TT |
| RACE-rev1 | 5′-RACE | TGG GGC CTA ATA AAA AGG AG |
| RACE-rev2 | 5′-RACE | AAT GGG GGA ATG GTA AGG AG |
| RACE-rev3 | 5′-RACE | GCA TTA ACC TTG ACT ATG TCT T |
| Full-fwd | Long distance PCR | ATA GCA AAC ATG AGG ACA TAC AGA GC |
| Full-rev | Long distance PCR | TGA TTT CAC TGC ACA TAA TCC C |
| Reverse | Reverse transcription | ATG GGA CAC GGC TTA GGA G |
| PCR-fwd | Conventional PCR | TCA CAG ACT GGG AAG GCA G |
| PCR-rev | Conventional PCR | AGG GGT TTG GGG TTT CTT G |
| Real-fwd | Real time PCR | CAC GTT TTA TCC GTG GAC CC |
| Real-rev | Real time PCR | AGG CAT CCC TGC AAT GAT TAA |
| Meth-fwd1 | Methylation analysis | AAT TAT AGT TGT TTG ATG TGG GGT TA |
| Meth-rev1 | Methylation analysis | ACA ACC CAA TAC ACC CTT AAA AAA |
| Meth-fwd2 | Methylation analysis | ATA TGA GGA TAT ATA GAG TAG GTT AT |
| Meth-rev2 | Methylation analysis | ACC AAA TTT AAA ATT AAT AAA ATA TTT CTT |
| Meth-rev3 | Methylation analysis | AAT CAT AAC ACC AAA TTT CAT ATA C |
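As a quick sanity check on the primers in Table I, basic properties such as length, GC content and a rough melting temperature can be computed directly from the sequences. The following is a minimal sketch and not part of the published protocol: the function names are hypothetical, and the Wallace rule used here is only a crude approximation that the authors do not report using (their primers were designed with Primer3).

```python
# Illustrative only: basic properties of selected Table I primers.
# Function names are hypothetical; the Wallace rule is a rough estimate
# and is NOT the melting temperature model reported by the authors.

PRIMERS = {
    "PCR-fwd": "TCA CAG ACT GGG AAG GCA G",
    "PCR-rev": "AGG GGT TTG GGG TTT CTT G",
    "Real-fwd": "CAC GTT TTA TCC GTG GAC CC",
    "Real-rev": "AGG CAT CCC TGC AAT GAT TAA",
}


def clean(seq: str) -> str:
    """Remove the triplet spacing used in the table and upper-case the sequence."""
    return seq.replace(" ", "").upper()


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the primer."""
    s = clean(seq)
    return (s.count("G") + s.count("C")) / len(s)


def wallace_tm(seq: str) -> int:
    """Wallace rule: 2 degrees C per A/T plus 4 degrees C per G/C."""
    s = clean(seq)
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq: str) -> str:
    """Reverse complement, e.g. to locate a reverse primer on the top strand."""
    return clean(seq).translate(str.maketrans("ACGT", "TGCA"))[::-1]


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, len(clean(seq)), f"{gc_fraction(seq):.2f}", wallace_tm(seq))
```

Comparing the estimates for a forward and reverse primer pair shows whether the two are roughly matched, which is the usual reason for running such a check before choosing an annealing temperature.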
RNA expression analysis
Total RNA was isolated from cell lines and clinical samples with patient-matched normal tissue (for colorectal adenomas: pooled normal tissues) using the Qiagen RNeasy kit according to the manufacturer's instructions (Qiagen, Hilden, Germany). Northern blotting and hybridization were performed as described previously.20 To specifically amplify the expressed HERV-H sequence from Xp22.3, all samples were subjected to rigorous DNase digestion as described above. Several primer systems chosen based on sequence variation between different HERV-H copies were tested (data not shown); the most specific results were obtained using a HERV-H-specific RT primer and PCR primers including the transcribed HERV-H sequence and part of the 3′ untranslated region. As a positive control reaction, reverse transcription (using oligo dT priming) and amplification of the GAPDH mRNA were performed as described previously.20 Primers PCR-fwd and PCR-rev were used for conventional HERV-H RNA PCR with the following conditions: initial denaturation at 94°C for 3 min, followed by 35 cycles at 94°C for 30 sec, 58°C for 30 sec and 72°C for 1 min, and final elongation at 72°C for 5 min. To verify specific amplification of the expressed HERV-H RNA derived from Xp22.3, amplicons obtained from all cell lines as well as 10 patient samples were cloned and sequenced as described above. All sequences confirmed specific amplification of the RNA expressed from the HERV-H locus on Xp22.3.
For real-time RT-PCR analysis, a different PCR primer set (Real-fwd and Real-rev) suited for TaqMan analysis was designed using the Primer Express software (Applied Biosystems). Quantitative PCR was performed as described previously.20
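The cycling conditions for the conventional HERV-H RT-PCR can be written down as a small data structure, which makes the programmed hold time easy to check. This is a sketch under stated assumptions: the class and function names are hypothetical, and the total ignores ramping between temperatures, which depends on the thermocycler and is not given in the text.

```python
# Illustrative sketch: the conventional PCR program described above as data.
# Names (Step, programmed_seconds) are hypothetical; ramp times are ignored.
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: int   # block temperature in degrees Celsius
    seconds: int  # hold time in seconds


INITIAL_DENATURATION = Step(94, 180)                  # 94 C for 3 min
CYCLE = (Step(94, 30), Step(58, 30), Step(72, 60))    # repeated 35 times
N_CYCLES = 35
FINAL_ELONGATION = Step(72, 300)                      # 72 C for 5 min


def programmed_seconds(initial, cycle, n_cycles, final) -> int:
    """Sum of all hold times, excluding temperature ramps."""
    per_cycle = sum(step.seconds for step in cycle)
    return initial.seconds + n_cycles * per_cycle + final.seconds


if __name__ == "__main__":
    total = programmed_seconds(INITIAL_DENATURATION, CYCLE, N_CYCLES, FINAL_ELONGATION)
    print(f"{total} s = {total / 60:.0f} min of programmed hold time")
```

Each cycle holds for 120 sec, so 35 cycles plus the initial and final steps add up to 78 min of programmed hold time.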
Each data point represents the results of triplicate experiments. All PCR products were analyzed by agarose gel electrophoresis to confirm specificity. Exemplary PCR products were cloned and sequenced to confirm the amplified target. To exclude genomic DNA contamination, samples were generated in parallel to cDNA synthesis without adding reverse transcriptase and investigated independently. Data analysis was carried out as previously described.20
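The text defers the details of quantification to an earlier publication, so the exact model is not stated here. For orientation only, the sketch below shows one widely used approach to relative quantification from triplicate real-time PCR measurements, the comparative Ct (2^-ΔΔCt) calculation. That the authors used this model, and which reference gene they normalized to, are assumptions; all Ct values and names in the example are hypothetical.

```python
# Illustrative sketch of the comparative Ct (2^-ddCt) calculation.
# ASSUMPTIONS: the source does not state that this model was used, nor which
# reference gene was applied for real-time PCR; Ct values below are invented.
from statistics import mean


def fold_change(ct_target_tumor, ct_ref_tumor, ct_target_normal, ct_ref_normal) -> float:
    """Tumor-versus-normal fold change from replicate Ct values."""
    delta_tumor = mean(ct_target_tumor) - mean(ct_ref_tumor)     # normalize tumor sample
    delta_normal = mean(ct_target_normal) - mean(ct_ref_normal)  # normalize matched normal
    return 2 ** -(delta_tumor - delta_normal)


if __name__ == "__main__":
    # Hypothetical triplicates: the target crosses threshold 3 cycles earlier in tumor.
    print(fold_change([24.1, 23.9, 24.0], [18.0, 18.1, 17.9],
                      [27.0, 27.1, 26.9], [18.0, 17.9, 18.1]))
```

Under this model a difference of three cycles after normalization corresponds to an eightfold higher transcript level, assuming perfect doubling per cycle.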
LTR methylation analysis
DNA was extracted from cell lines, healthy donor PBMCs, and from microdissected paraffin embedded tumor tissue as well as corresponding normal tissue using the Qiagen DNA extraction kit (Qiagen, Hilden) according to the manufacturer's instructions. The HERV-H LTR and the upstream unique cellular sequence were analyzed for the presence of CpGs. MethPrimer (http://www.urogene.org/methprimer/) was used to design primers for bisulfite sequencing (Meth-fwd1 and Meth-rev1 for the upstream region, Meth-fwd2 and Meth-rev2 for the LTR region). Bisulfite modification was performed using the EZ DNA Methylation kit (Zymo Research) according to the manufacturer's conditions using 1 μg DNA from each sample to be analyzed for methylation. PCR was performed using the following conditions: initial denaturation at 94°C for 3 min, 35 cycles of 94°C for 30 sec, 55°C for 30 sec and 72°C for 1 min, followed by a 5 min elongation step at 72°C. Because of the low quality and limited integrity of DNA purified from paraffin embedded tissues, a semi-nested PCR covering a subset of the CpGs was performed on these samples. First, a PCR with identical conditions but only 30 cycles was performed using the primers Meth-fwd2 and Meth-rev2, followed by a PCR using Meth-fwd2 and Meth-rev3 under the same conditions. Amplicons were analyzed on agarose gels, excised, cloned and sequenced. Sequencing analysis of at least 10 different clones of each amplicon showed complete conversion of unmethylated cytosine residues to uracils by bisulfite modification.
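The principle behind bisulfite sequencing can be illustrated in silico: unmethylated cytosines are converted to uracil and read as thymine after PCR, whereas methylated cytosines are protected and still read as cytosine. The sketch below models only the top strand and uses a short made-up sequence; the function names and the example sequence are hypothetical and are not taken from the HERV-H locus.

```python
# Illustrative sketch of bisulfite conversion on the top strand.
# The example sequence and all names are hypothetical.

def cpg_positions(seq: str) -> list[int]:
    """0-based positions of the C in every CpG dinucleotide."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def bisulfite_convert(seq: str, methylated=frozenset()) -> str:
    """Sequence as read after bisulfite treatment and PCR.

    Every C becomes T unless its 0-based position is listed in `methylated`.
    """
    s = seq.upper()
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(s)
    )


if __name__ == "__main__":
    reference = "ACGTCCGA"
    print(cpg_positions(reference))                       # CpG cytosines
    print(bisulfite_convert(reference))                   # fully unmethylated
    print(bisulfite_convert(reference, methylated={1}))   # first CpG methylated
```

Cytosines outside CpGs are expected to convert in every clone, which is why their complete conversion serves as the internal control mentioned above.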
Bioinformatic analysis of the HERV-H sequence
To determine the sequence of the full length HERV-H copy, Blast analyses were performed using the online tools provided by the NCBI (http://www.ncbi.nlm.nih.gov/blast/Blast.cgi). A Blastn comparison to the nonredundant database and to the whole genome database was done with the previously identified SSH clone.20 To compare published HERV-H sequences to the sequence identified in this study, the Blast2sequences tool was used. The nucleotide sequence of the expressed HERV-H region was translated to a protein sequence using the standard genetic code and compared to public protein databases using Blastp. The putative protein was analyzed for known protein domains, peptide mass and the isoelectric point using the prosite and pepstats modules of the web-based W2H HUSAR sequence analysis software (http://www.dkfz-heidelberg.de/menu/cgi-bin/w2h/w2h.start).
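Translation with the standard genetic code and a simple search for open reading frames, as described above, can be sketched in a few lines. This is an illustration only: the authors used web-based tools, the function names are hypothetical, the search covers only the three forward frames, and the test sequence is invented.

```python
# Illustrative sketch: standard-code translation and a naive forward-strand
# ORF search. Names and the example sequence are hypothetical.
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str) -> str:
    """Translate from the first base until the first stop codon or the end."""
    s = seq.upper().replace("U", "T")
    protein = []
    for i in range(0, len(s) - 2, 3):
        aa = CODON_TABLE.get(s[i:i + 3], "X")  # X marks ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def longest_orf(seq: str) -> tuple[int, int, str]:
    """Longest ATG-to-stop ORF on the forward strand.

    Returns (start, end, protein) with 0-based start and exclusive end,
    where the end includes the stop codon.
    """
    s = seq.upper()
    best = (0, 0, "")
    for start in range(len(s) - 2):
        if s[start:start + 3] != "ATG":
            continue
        for pos in range(start, len(s) - 2, 3):
            if CODON_TABLE.get(s[pos:pos + 3]) == "*":
                if pos + 3 - start > best[1] - best[0]:
                    best = (start, pos + 3, translate(s[start:pos]))
                break
    return best


if __name__ == "__main__":
    print(longest_orf("CCATGGCTTGATAA"))
```

A protein sequence obtained this way is what would then be submitted to a protein database search such as Blastp.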
Results
Identification of an endogenous retroviral sequence on chromosome Xp
Recently, we identified a HERV-H env sequence upregulated in colorectal adenomas and cancers.20 To identify the complete endogenous retrovirus, a detailed analysis of the genomic surroundings was performed. Blast analysis of the env sequence yielded several HERV-H elements distributed over the whole genome, including a copy from Xp22.3 (position 4,479,152–4,484,999 on chromosome X contig, Homo sapiens build 36.2) showing 100% identity to the SSH clone. To obtain information about the initiation of the sequence expressed from this region, repeated 5′-RACE was performed on RNA isolated from the colorectal cancer cell line LS174T, which showed strong expression of the HERV-H env sequence. The assembly of overlapping 5′-RACE clones indicated that RNA expression was initiated from a site in the 5′ LTR that showed high similarity to transcription initiation sites described by Sjottem et al.11 The total length of the sequence obtained from 5′-RACE was 5.4 kb, corresponding to the main signal in Northern hybridization of the env probe with LS174T RNA (Fig. 1a; see also ref. 20). The 3′ and 5′ LTRs were identified by comparing chromosomal sequences flanking the SSH clone with a published HERV-H sequence (HUMRGH2) and an Xp-specific BAC clone (AC079264). The complete HERV-H sequence including the terminal repeats had a length of 5.9 kb. The predicted HERV sequence was confirmed by long distance PCR using primers placed in nonrepetitive sequences on Xp22.3, yielding a 6.5 kb amplimer (Fig. 1b).
Characterization of HERV-H on Xp
Sequence comparison of the HERV-H sequence with a putative HERV-H prototype21 showed high similarity of >95% throughout the whole viral genome despite many deletions in the viral genes, including a large deletion in the env region. None of the viral genes encoded an ORF (Fig. 1c). In contrast, the LTRs were highly conserved and had a high similarity to HERV-H LTRs with documented promoter activity.11 All domains relevant for promoter activity, including the TATA box and GC/GT boxes, were identified in the 5′ LTR. In addition, the transcription initiation site and the primer binding site were both present (Fig. 2). The 3′ LTR had a conserved poly A site, and the similarity between the 5′ and 3′ LTRs was 94%. The upstream and downstream chromosomal surroundings of the HERV-H sequence on Xp22.3 did not show any genes or predicted genes in close vicinity. The closest genes, PRKX and NLGN4X, were found at distances of >800,000 and >1,300,000 nucleotides, and 2 predicted genes, LOC729162 and LOC347381, at distances of >600,000 and >800,000 nucleotides, respectively.
Although all canonical retroviral ORFs were found to be inactivated, a potential ORF consisting of 822 nucleotides was identified immediately downstream of the 5′ LTR (Fig. 2). The start codon is located 5′ of a putative pre-gag ORF discovered by Jern et al.,21 but is not in frame with pre-gag. Database analysis of the sequence on Xp22.3 showed high similarity (>90%) to other HERV-H sequences, all of them lacking an ORF in this region. To determine whether the transcribed sequence was conserved between different individuals, genomic sequencing was performed using genomic DNA derived from a number of cell lines and from peripheral blood lymphocytes of healthy male and female Caucasians. In 2 cell lines (HeLa and HT29) and 2 healthy donors, 5 nucleotide alterations were found that resulted in altered amino acid residues in 3 cases, but did not disrupt the potential ORF (Fig. 2).
HERV-H expression analysis
The mRNA expression analysis of specific HERV-H copies is challenging because of the high similarity of almost 1,000 sequences from the HERV-H family distributed over the genome. To exclude amplification of the intronless sequence from genomic DNA, total cellular RNA was treated with DNase prior to RT-PCR, and control reactions without RT were performed. A conventional RT-PCR system was established to analyze cell lines and colorectal cancer samples with corresponding normal tissue. The HERV-H sequence was found to be strongly expressed in the colorectal cancer cell lines Colo60H, LS174T, LS180 and KM12. No or only weak expression was found in the colorectal cell lines SW480, SW48 and HT29, as well as in the bronchial carcinoma cell lines A427, A549, CaLU6 and H128 and the cervical carcinoma cell lines HeLa and SW756. Normal human testis, placenta and breast tissue did not show expression of the HERV-H sequence. Representative RT-PCR results of 14 paired colorectal cancer/normal colon samples are shown in Figure 3a. For quantitative HERV-H expression analysis, real-time RT-PCR using TaqMan-optimized primers was performed on a set of paired tumor/normal tissue samples from different entities including gastric, colorectal, pancreatic, bronchial and cervical carcinoma. In addition, a set of colorectal adenomas was compared to a pooled normal tissue control (Fig. 3b). Overexpression of the identified HERV-H sequence was found in 25 of 63 (39.7%) gastric, 18 of 34 (52.9%) colorectal and 2 of 12 (16.7%) pancreatic cancers as well as in 8 of 36 (22.2%) colorectal adenomas. In contrast, no overexpression was found in samples derived from 15 bronchial and 10 cervical carcinomas.
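The proportions quoted in this section follow directly from the sample counts, and given the modest sample sizes it can be useful to see how wide the uncertainty around each proportion is. The sketch below recomputes the percentages from the counts in this section and adds a Wilson score interval; the interval is an illustration added here and is not an analysis reported by the authors. Function names are hypothetical.

```python
# Illustrative sketch: percentages from the counts in this section, plus a
# Wilson score interval that is NOT part of the authors' reported analysis.
from math import sqrt

# (samples with overexpression, samples analyzed), as stated in this section
COUNTS = {
    "gastric cancer": (25, 63),
    "colorectal cancer": (18, 34),
    "pancreatic cancer": (2, 12),
    "colorectal adenoma": (8, 36),
    "bronchial carcinoma": (0, 15),
    "cervical carcinoma": (0, 10),
}


def percent(k: int, n: int) -> float:
    """Proportion as a percentage rounded to one decimal place."""
    return round(100 * k / n, 1)


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    for entity, (k, n) in COUNTS.items():
        low, high = wilson_interval(k, n)
        print(f"{entity}: {k}/{n} = {percent(k, n)}% "
              f"(Wilson 95% interval {100 * low:.1f} to {100 * high:.1f}%)")
```

For the entities with no overexpressing samples, the interval still extends above zero, which is a reminder that 0 of 10 or 0 of 15 does not rule out a low frequency of overexpression.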
Methylation analysis of the HERV-H LTR
To analyze whether the differential expression of the HERV-H sequence was related to differences in the methylation status of the HERV-H LTR, bisulfite sequencing of the LTR and the adjacent cellular sequence was performed in cell lines and PBMCs obtained from healthy individuals. In addition, tumor and corresponding normal tissue were microdissected from paraffin embedded tissue specimens of 2 cases with HERV-H expression. Primers were designed for bisulfite sequencing covering 12 CpGs in the region of the LTR and the start of the HERV-H sequence, and 14 CpGs in a 380 bp 5′ region of flanking cellular DNA (Fig. 4). The PBMC samples from healthy donors as well as 2 colorectal cancer cell lines lacking HERV-H expression showed CpG methylation patterns with at least 11 (SW480) and up to 23 methylated CpGs (N2). In contrast, 3 cell lines with strong HERV-H expression (Colo60H, KM12, LS174T) did not show any CpG methylation in the regions analyzed, suggesting that the differential expression of the HERV-H sequence might be related to different methylation of CpGs located upstream of and inside the HERV-H 5′ LTR (Fig. 4). Because of the low quality of DNA isolated from microdissected paraffin embedded material, only 6 of the 26 CpGs could be analyzed with a semi-nested PCR design. In each of the 2 tumor tissues derived from specimens that exhibited HERV-H expression, only 1 of the 6 CpGs was methylated, while both corresponding normal tissues showed CpG methylation in 5 of 6 CpGs, thus corroborating the findings on HERV-H LTR methylation in cell lines and healthy donor PBMCs.
Discussion
In this study, we have performed a detailed analysis of a HERV-H sequence located on chromosome Xp. The sequence was found to be strongly overexpressed in a subset of gastrointestinal cancers, whereas expression in other cancer entities such as bronchial and cervical carcinoma was low. The overexpression of the HERV-H sequence in cell lines correlated with the demethylation of the chromosomal region surrounding the 5′ LTR, while cell lines and normal PBMCs without HERV-H expression showed CpG methylation in this region.
The deregulated expression of endogenous retroviral sequences in human cancer might only reflect the increasing chromosomal and cellular instability rather than being of functional relevance for tumor development. However, several groups have analyzed the expression of endogenous retroviral sequences in human cancers and pointed to possible tumor-promoting or tumor-repressing effects of HERV expression.2 Most of the data about HERV expression in cancer are available for HERV-K, a HERV group that has entered the human genome more recently than HERV-H and that has repeatedly been shown to harbor intact viral ORFs. Wang-Johanning et al. have shown that spliced and unspliced variants of HERV-K env transcripts with coding potential are expressed in human breast cancer.16, 17 The same group showed HERV-E RNA expression in prostate cancer samples, but not in normal prostate controls.18 In a recent study, env protein expression of different HERV classes and antibody responses to HERV env were analyzed in a large number of ovarian cancer patients. In multiple tissue arrays, HERV-K env protein expression was found to be increased in ovarian cancers with various differentiations, and increased HERV-K antibody titers were observed in ovarian cancer patients.22 Similarly, it was reported that HERV-K sequences are expressed in leukemia cells and that leukemia patients show antibody responses to HERV-K.23 Buscher et al. demonstrated the expression of HERV-K proteins in primary melanomas and metastases as well as a serum response against HERV-K in 22% of melanoma patients.19
Schiavetti et al.24 have reported an epitope encoded by a HERV-K sequence that is recognized by cytotoxic T lymphocytes infiltrating melanoma. In human testicular cancers, immune reactions of patients against HERV-K gag and env proteins were observed.25 Kleiman et al.26 have analyzed the humoral immune response against HERV-K in patients with germ cell tumors and found a correlation between decreasing titers and successful chemotherapy. In the HERV-H group, the only ORF reported to be conserved is the env gene. It was suggested that HERV-H env may have immunosuppressive effects. In a tumor transplant experiment, murine fibrosarcoma cells were rejected after injection into allogeneic hosts, but grew out to tumors if HERV-H env was expressed in these cells.27 Likewise, Kershaw et al.28 showed a therapeutic effect of immune stimulation against env peptides of the endogenous murine leukemia virus in tumor-challenged mice. Recently, Jern et al. have described a prototype HERV-H sequence in detail and showed that HERV-H env sequences can be found expressed in gastrointestinal cancers.21
The HERV-H sequence found to be overexpressed in our study did not exhibit any functional viral ORFs. However, we identified a potential ORF located directly downstream of the 5′ LTR. To our knowledge, no similar ORF has been described for other exogenous or endogenous retroviruses. Database analysis of the sequence on Xp22.3 showed high similarity (>90%) to other HERV-H sequences, although all of them lack an ORF in this region due to single nucleotide alterations generating stop codons. Sequencing analysis of genomic DNA derived from a number of cell lines and PBLs from healthy Caucasians showed high conservation of the locus between different human individuals. Database comparison of the putative protein failed to show any sequence similarity to known human or viral proteins. A domain search revealed a leucine zipper motif, but no cellular localization signal or other protein domains (Fig. 2b).
Since the potential ORF is part of the standard HERV-H sequence and has very high similarity to it, it is conceivable that it has developed by chance after integration into the human genome and was conserved thereafter. Given the long persistence of HERV-H in the human genome and the presence of the HERV-H sequence in all samples analyzed from different sources, it seems to have been conserved for a long time. To get more clues about the evolutionary development of this sequence, the presence and similarity of the sequence in different ethnic groups as well as in closely related primates might be investigated.
Currently, there is no experimental evidence of a protein encoded by the HERV-H sequence. Even if the expression of a full length protein from this locus is abrogated by cellular factors that counteract nonsense protein expression, processing of protein fragments by the antigen presenting machinery and presentation of peptides could induce antitumoral immune reactions, since the HERV-H sequence was not found to be expressed in normal tissues.
On the basis of the analysis of the chromosomal surroundings of the HERV-H copy on Xp22.3, a tumor relevant alteration of nearby genes by the HERV LTR seems to be unlikely, since neither predicted nor known genes can be found close to the HERV locus.
In contrast to many other studies that have looked at the general expression of HERV families in various diseases, we here present a detailed analysis of the structure and expression of a specific HERV-H copy located on Xp22.3 in several cancer entities. The HERV-H sequence was found to encode an atypical ORF that is overexpressed in up to 30–50% of different gastrointestinal cancers. The expression of the HERV sequence correlated with a demethylation of the HERV LTR. Currently, there is no evidence that the mRNA expression from this locus is involved in gastrointestinal tumorigenesis. Functional analyses of the HERV promoter and the expressed sequence will be necessary to assess the relevance of a tumor-specific HERV-H expression.