A new exon in the 5′ untranslated region of the connexin32 gene


Tatjana Simonic, Istituto di Fisiologia Veterinaria e Biochimica, Via Celoria, 10, 20133 Milano, Italy. Fax: +39-2-2666301,
Tel.: +39-2-2664343, E-mail: tsimonic@imiucca.csi.unimi.it


The cloning and sequencing of two bovine connexin32 cDNAs are reported. Comparative analysis with known corresponding mammalian cDNA and protein sequences, besides confirming a high degree of similarity among these proteins, allowed us to identify some specific features of the bovine connexin32 gene. The latter include: the presence of a novel exon in the 5′ UTR which is alternatively spliced, giving rise to a new mRNA species; the presence of two potential hairpin loops in the 5′ and 3′ UTR; and the presence of an additional amino acid, glycine235, in the C-terminal domain of the 284 residue protein. Among the common features, the presence of polypyrimidine clusters within the 3′ UTR, containing a consensus sequence for a cis-acting element, is noteworthy. Expression of connexin32 mRNAs was analysed in 16 bovine tissues. Transcript analysis suggests the presence, in cattle, of an alternative downstream promoter.


Charcot-Marie-Tooth disease

Gap junctions are specialized clusters of transmembrane channels that mediate intercellular communications, allowing small molecules and ions to move between neighbouring cells without exposure to the extracellular space [1–3]. Gap junctions appear to be dynamic structures, exhibiting short half-lives (5–6 h). They are functionally controlled by channel opening and closing and therefore play an important role in the regulation of intercellular communications [1,3,4].

The major structural components of gap junctions are integral membrane proteins, connexins, which belong to a multigene family [1,3,5,6]. In rodents 13 different members of this family, as well as their complex and often overlapping expression patterns [7], have so far been described. Genes belonging to this family share a similar structure consisting of two exons separated by a large intron (5–11 kb in size) interrupting the 5′ UTR. The entire coding region and the 3′ UTR are, therefore, part of the same exon (exon 2). This feature allowed connexin genes in different species to be mapped using rather short cDNA probes [8–11].

The structure of the connexin32 gene appears to be more complex, because, in addition to the upstream promoter P1, an alternative promoter P2, close to the 3′ end of the large intron, is present [12–14]. Connexin32 messengers are therefore formed by exon 2 linked either to exon 1, transcribed from the upstream promoter P1, or to exon 1b transcribed from the downstream promoter P2. The latter transcript is at least partially specific for the nervous system both in rodents and in humans [12–14]. This is noteworthy since mutations in the human connexin32 gene have been associated with an X-linked form of Charcot-Marie-Tooth disease (CMTX 1) which shows nervous-tissue-specific symptoms without affecting other connexin32-expressing organs [12].

The cloning and sequencing of bovine connexin32 cDNA reported here led to the isolation of a new mRNA species derived, through alternative splicing, from the transcript controlled by the upstream promoter P1. We demonstrated that the 5′ UTR of this new mRNA comprises a new exon, located within the large intron of the gene. Expression-pattern analysis of connexin32 mRNAs in 16 bovine tissues, besides allowing the detection of a transcript originating from the promoter P2, confirmed the existence of two alternatively spliced transcripts under the control of promoter P1.

Materials and methods

Screening of cDNA library

A λgt10 bovine liver library (Clontech) was screened using a previously isolated human connexin32 cDNA fragment (722 bp) [8]. This probe, labelled with [α-32P]dCTP (Amersham), was used to hybridize the plaque-transferred filters (Hybond N, Amersham) according to standard procedures [15].

PCR assays on genomic DNA

Oligonucleotides, purchased from PE-Applied Biosystems UK, are listed in Table 1. Bovine genomic DNA was extracted according to standard procedures [15]. PCR reactions were performed in a MJ PTC 100 thermal cycler, using either a Taq DNA polymerase (Promega) or a Gene Amp XL PCR kit (Perkin–Elmer), depending on the size of the fragment to be amplified. The corresponding reaction conditions were: 30 s at 95 °C, 30 s at 52 °C, 1 min at 72 °C for 30 cycles, preceded by 5 min at 95 °C and followed by further 10 min elongation at 72 °C for the Taq DNA polymerase system; and 5 min at 95 °C followed by 30 cycles of 30 s at 95 °C and 5 min at 60 °C for the Gene Amp XL PCR kit.

Table 1. Primers used in PCR and RT–PCR assays. Nucleotide positions are numbered according to the sequences reported in Fig. 1. CDS, coding sequence; F, forward; R, reverse.

Primer    Sequence              Position in transcript
FR        TGACTAAGCTTGTCGACG    – (complementary to part of 5A)

RT–PCR assays and 5′ rapid amplification of cDNA ends

Total RNA (5 µg), extracted [16] from 16 bovine tissues, was used as the template for first-strand cDNA synthesis with 30 U of avian myeloblastosis virus (AMV) reverse transcriptase (Promega) at 42 °C and connexin32-specific primers.

The 5′ end of the cDNA was isolated from bovine liver by 5′ RACE using the single-strand ligation to ss-cDNA (SLIC) method [17]: ss-cDNA was ligated, using 10 U of T4 RNA ligase (New England Biolabs), to the 5′ anchor oligonucleotide 5A, previously phosphorylated at the 5′ end and blocked at the 3′ end by the addition of an amino group. The ligation products were used as templates for PCR with the primer couple FR/R2. Usually, two additional semi-nested PCR steps, using R3 and R4 as 3′ primers, were required to obtain suitable amounts of the expected product.

To evaluate the tissue distribution of connexin32 mRNAs, one-tenth of the ss-cDNA was submitted to PCR with primers F3/R6, F2/R6 and F5/R6 following this thermal profile: 15 s for each step at 95 °C, 56 °C and 72 °C for 35–40 cycles.

Amplification efficiency was evaluated by withdrawing aliquots of the PCR reaction every second cycle. Following electrophoresis, the intensity of the bands was measured using a computer densitometer (GS-700 Imaging Densitometer, Biorad). Densitograms were analysed with Molecular Analyst Software 1.4 (Biorad).

Cloning of DNA fragments and sequence analyses

The insert of the single positive λgt10 clone was excised with EcoRI (Promega) and subcloned into pUC9. PCR products were inserted into pCRII using the ‘TA cloning kit’ (Invitrogen).

DNA sequencing was performed on both strands, either on plasmids or directly on purified PCR products, by the Taq dye-deoxy terminator method on an automated 370A DNA sequencer (Applied Biosystems).

Northern blot analysis

Northern blots of total RNAs were hybridized to the purified 1035 base pair (bp) DNA fragment labelled with [α-32P]dCTP using a DNA labelling kit (Pharmacia Biotech) according to standard procedures [15]. Densitometric analyses of the developed films were performed using a computer densitometer.

Computer analyses

Eukaryotic nucleotide sequences in EMBL 46 Data Base and amino acid sequences in SWISS-PROT 33 Data Library were used for sequence comparisons. Other sequence analyses were carried out using several programs of the PcGene software package (Intelligenetics).


Results

Cloning and sequencing of bovine connexin32 cDNAs

The whole nucleotide sequences of two connexin32 mRNAs present in bovine liver were obtained by isolating overlapping DNA fragments through different approaches.

The first DNA fragment was isolated by library screening. A single positive clone, out of 5 × 10^5 screened plaques, contained a 481-bp insert whose sequence was very similar (84.3%) to the 3′ end of the human connexin32 cDNA [18] and, therefore, likely to correspond to the 3′ UTR end of the bovine connexin32 cDNA.

The coding region was isolated through amplification reactions performed on genomic DNA, since in other species the whole coding region is present in a single exon (exon 2) [13, 19, 20]. Oligonucleotide R1 (Table 1) was used as the 3′ primer, while the 5′ primer F1 was designed on the basis of a coding-region sequence conserved among human, rat and mouse cDNAs [18, 21]. Amplification led to the expected product, whose length was determined by sequencing to be 1035 bp. This fragment covered the coding sequence from codon 69, i.e. nucleotide 205 (with + 1 being the first nucleotide of the start codon), as judged by comparison with human, rat and mouse sequences [18, 21]. DNA-sequencing reactions starting from the 3′ end of this fragment stopped abruptly at position 1107, after the addition of only 132 nucleotides. However, the same segment was readily sequenced from the 5′ side and overlapped the 481-bp fragment obtained from the library screening.
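The coordinate bookkeeping in this paragraph can be checked with a short Python sketch. The helper names are illustrative and not part of the original analysis, and the sketch assumes that the 1035-bp fragment begins exactly at nucleotide 205; the only inputs are the figures quoted above.

```python
def codon_to_first_nt(codon: int) -> int:
    """First nucleotide of a codon, with +1 as the first base of the start codon."""
    return 3 * (codon - 1) + 1


def fragment_end(start: int, length: int) -> int:
    """Last nucleotide of a fragment of the given length that begins at `start`."""
    return start + length - 1


# Figures taken from the text: codon 69, a 1035-bp fragment,
# and a sequencing stop at position 1107 when reading from the 3' end.
start = codon_to_first_nt(69)
end = fragment_end(start, 1035)
nt_read_before_stop = end - 1107

print(start, end, nt_read_before_stop)
```

Under these assumptions the fragment spans positions 205 to 1239, and reading back from 1239 to the stop at 1107 gives the 132 nucleotides mentioned in the text.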

The 5′ region was obtained by 5′ RACE. Two products were obtained using the same primer couple FR/R4: a major band of 491 bp (transcript A) and a fainter band of 550 bp (transcript B). Alignment of their nucleotide sequences showed a perfect identity, the only difference being the presence in transcript B of 59 additional nucleotides, inserted between positions – 17 and – 16 of transcript A (Fig. 1).

Figure 1.

Nucleotide and deduced amino acid sequences of bovine connexin32 cDNAs. The first base of the start codon is designated + 1 and the stop codon is indicated by asterisks. The polyadenylation signal is underlined. The boxed amino acid, G235, is an additional residue, not found in other mammalian connexin32s. Putative phosphorylation sites are emboldened.

The whole cDNA sequences were derived from these overlapping DNA fragments. Possible errors introduced by Taq DNA polymerase activity in amplified products were checked by sequencing two or more fragments obtained from independent experiments for each PCR product. The nucleotide sequences of bovine liver connexin32 cDNAs and the deduced amino acid sequence are shown in Fig. 1. Transcript A is 1624 nucleotides long and comprises a 5′ UTR of 109 nucleotides, while transcript B spans 1683 nucleotides with a 5′ UTR of 168 nucleotides. Both contain the same coding region with an ORF of 852 nucleotides followed by the stop codon TGA (positions 853–855) and a 3′ UTR spanning 660 nucleotides with the polyadenylation signal ATTAAA (positions 1474–1479) located 19 nucleotides upstream of the poly(A) tail.
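As a consistency check on the lengths quoted above, the following Python sketch sums the reported segments for each transcript. The variable and function names are illustrative; the numbers are those given in the text.

```python
ORF_LEN = 852         # nucleotides, excluding the stop codon
STOP_CODON_LEN = 3    # TGA at positions 853-855
UTR3_LEN = 660        # 3' UTR
UTR5_LEN = {"A": 109, "B": 168}  # 5' UTR of each transcript


def transcript_length(utr5_len: int) -> int:
    """Total length as 5' UTR + ORF + stop codon + 3' UTR."""
    return utr5_len + ORF_LEN + STOP_CODON_LEN + UTR3_LEN


lengths = {name: transcript_length(utr5) for name, utr5 in UTR5_LEN.items()}
residues = ORF_LEN // 3                    # codons in the ORF
extra_in_b = lengths["B"] - lengths["A"]   # size of the insertion in transcript B

print(lengths, residues, extra_in_b)
```

The sums reproduce the reported totals of 1624 and 1683 nucleotides, the 284-residue protein, and the 59-nucleotide difference between the two transcripts.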

The encoded protein comprises 284 residues, corresponding to an estimated molecular mass of 32 082 Da. An additional amino acid, namely a glycine at position 235, was found to be present following comparison with the known connexin32 protein sequences.

Computer analysis gave evidence of the presence, in bovine connexin32 mRNAs only, of two potential hairpin loops. The first is located at the 5′ end (positions – 109 to – 87 in transcript A and positions – 168 to – 146 in transcript B) with an estimated free energy value of – 82.75 kJ (Fig. 2a); the second is within the 3′ UTR (positions 1076–1107) with an estimated free energy value of – 127.1 kJ (Fig. 2b). Furthermore, three extensive pyrimidine clusters were identified in the 3′ UTR, at positions 873–932, 974–1005 and 1104–1131. In the last of these, a sequence of 28 contiguous pyrimidines was found.
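A scan for runs of contiguous pyrimidines, such as the 28-nucleotide run mentioned above, could be sketched in Python as follows. This is not the PcGene procedure used in the study; the function name, the minimum run length and the toy sequence are all hypothetical and serve only to illustrate the idea.

```python
import re


def pyrimidine_runs(seq: str, min_len: int = 10):
    """Return (start, end, length) for each run of C/T/U of at least min_len.

    Coordinates are 1-based and inclusive, relative to the start of `seq`.
    """
    pattern = re.compile(r"[CTU]{%d,}" % min_len, re.IGNORECASE)
    return [(m.start() + 1, m.end(), m.end() - m.start())
            for m in pattern.finditer(seq)]


# Toy sequence (not bovine connexin32): a 12-nt pyrimidine run flanked by purines,
# followed by a 3-nt run that is too short to be reported.
toy = "AGGA" + "CTTCCTCTCCTT" + "GAGA" + "CCT"
runs = pyrimidine_runs(toy, min_len=10)
print(runs)
```

Applied to the real 3′ UTR, such a scan would report exact runs only; clusters interrupted by occasional purines, like the first two described above, would need a more tolerant criterion.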

Figure 2.

Potential secondary structures within the 5′ and 3′ UTRs of connexin32 mRNAs. Numbering as in Fig. 1. (a) Hairpin at the 5′ end (positions – 109 to – 87 in transcript A or positions – 168 to – 146 in transcript B), (b) hairpin present within the 3′ UTR.

Identification of a new exon within the large intron interrupting the 5′ UTR

To determine whether the additional 59 bp of transcript B result from the use of a different splicing donor site for exon 1 or represent a new exon, a PCR analysis on genomic DNA was performed. Two primers (F2 and R5), mapping in the 59 bp insertion in opposite orientations, were used in conjunction with primers R6 and F3, respectively. Because F3 is located in the 5′ UTR upstream of the 59 bp insertion and R6 is located, in reverse orientation, within the coding region (exon 2), the expected PCR products should allow evaluation of the length of the intron between exon 1 and exon 2, and mapping of the position of the possible new exon. PCR reactions yielded a fragment of ∼4.2 kb for the primer couple F3/R5 and ∼1.3 kb for F2/R6. Both fragments, partially sequenced from both ends, showed the presence of canonical splicing donor and acceptor sites (Fig. 3a). These results supported the hypothesis of the existence of a new exon in the bovine connexin32 gene, located ∼4.2 kb downstream of exon 1 and ∼1.3 kb upstream of exon 2 (Fig. 3b). This new exon will hereafter be referred to as exon 1a. The intron of ∼4.2 kb between exon 1 and exon 1a will be named intron 1a1, and the intron of ∼1.3 kb between exon 1a and exon 2, intron 1a2. For the large intron alternatively spliced between exon 1 and exon 2, spanning ≈5.5 kb, the original name intron 1 will be kept.

Figure 3.

Schematic outline of the connexin32 gene organization. (a) Location of the four exons (not to scale) within the gene; (b) structure and origin of the three alternative transcripts. Boxes represent exons: UTRs are dark grey and the coding sequence is light grey. Introns and gene flanking regions are represented as lines. Upper-case letters correspond to exons, while lower-case letters correspond to introns. Arrows underlie the primer sequences; the sequence corresponding to primer F4, derived from humans and rodents, is in italics.

Characterization of bovine connexin32 mRNA species

To verify the existence also in cattle of a transcript originating from a downstream promoter P2, we designed primer F4 on the basis of a conserved sequence of exon 1b [12–14]. The last two positions at the 3′ end of this primer were fully degenerate in order to allow a perfect 3′ match. RT–PCR assays were carried out on total RNA from bovine liver with the primer couple F4/R3 (the latter mapping in exon 2), followed by a semi-nested PCR with the inner R4 primer. They allowed amplification of a 440-bp fragment which contained 60 bp that were almost identical (96.6% identity) to the corresponding region of the human and rodent exon 1b, joined to exon 2.

F4/R4 amplification was also carried out on genomic DNA and yielded a larger fragment of ≈ 800 bp containing an intron of about 370 bp referred to as intron 1b. It shares its acceptor splicing site with the 5.5-kb intron 1 processed in transcript A and with the 1.3-kb intron 1a2 processed in transcript B (Fig. 3b).

Tissue distribution of connexin32 mRNAs

A single connexin32 mRNA band of ∼1.6 kb was detected in five of 15 bovine tissues assayed by Northern blot (Fig. 4). The highest levels of gene expression were found in liver and kidney; lower levels were detected in spinal cord, brain and intestine.

Figure 4.

Tissue-specific expression of bovine connexin32 mRNA as detected by Northern blot hybridization. (a) Total RNAs (15 µg) from the indicated bovine tissues probed with the 1035 bp homologous cDNA fragment. (b) Densitometric analyses. Expression levels of connexin32 mRNA were normalized by hybridization of the same filter to a β-actin probe. The amount of connexin32 mRNA in the liver was set to 100%.

The expression pattern of the three transcripts identified for the connexin32 gene (Fig. 3b) was studied using RT–PCR assays. Owing to the higher sensitivity of RT–PCR, connexin32 mRNAs were detected in more tissues (Fig. 5). Transcripts A and B are present in liver, kidney, uterus, intestine, abomasum, testis, ovary and pancreas (the last tissue was not included in the Northern blot because of the poor quality of the extracted RNA). Because exon 1, recognized by primer F3, is common to both transcripts, the primer couple F3/R6 amplified two fragments of 218 and 159 bp (transcripts B and A, respectively, Fig. 5b). Transcript C is expressed in brain and spinal cord; nevertheless, low levels are detectable in liver, provided that a further nested amplification step is performed.

Figure 5.

RT–PCR assays on total RNA from 16 bovine tissues. (a) Location of the primers with respect to the connexin32 gene (see also Fig. 3). RT–PCR products obtained with primer pairs: (b) F3/R6, identifying transcripts A and B; (c) F2/R6, identifying transcript B; (d) F5/R6, identifying transcript C. Arrows indicate the size of the amplification products verified by sequencing. The low molecular mass band in panel b was shown to be a PCR artefact by sequence analysis. Size marker, pUC8 HaeIII; negative control, no DNA.

The different intensity of bands in Fig. 5b might be due to differences in amplification efficiencies between the two templates, and/or to different levels of the two transcripts. In order to evaluate the amplification efficiency of transcripts A and B, a PCR assay was performed and samples were analysed at different amplification cycles. Amplification efficiency between the two templates was identical during the whole process (Fig. 6a). Therefore, the different intensity of the bands of transcripts A and B is considered to reflect their relative expression levels in tissues [22].
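One common way to compare amplification efficiencies from aliquots withdrawn at successive cycles is to fit the exponential-phase model I_c = I_0 (1 + E)^c to the band intensities. The Python sketch below illustrates that approach; it is an assumption about how such a comparison can be made, not a description of the procedure actually applied to Fig. 6a, and the intensity values are synthetic.

```python
import math


def amplification_efficiency(cycles, intensities):
    """Estimate per-cycle efficiency E from I_c = I_0 * (1 + E)**c.

    Fits log(intensity) against cycle number by ordinary least squares;
    the slope equals log(1 + E).
    """
    logs = [math.log(i) for i in intensities]
    n = len(cycles)
    mean_c = sum(cycles) / n
    mean_l = sum(logs) / n
    slope = (sum((c - mean_c) * (l - mean_l) for c, l in zip(cycles, logs))
             / sum((c - mean_c) ** 2 for c in cycles))
    return math.exp(slope) - 1.0


# Synthetic (hypothetical) densitometry readings sampled every second cycle.
# Both templates are generated with the same efficiency (E = 0.8) but
# different starting amounts.
cycles = list(range(20, 31, 2))
band_a = [5.0 * 1.8 ** c for c in cycles]
band_b = [0.3 * 1.8 ** c for c in cycles]

eff_a = amplification_efficiency(cycles, band_a)
eff_b = amplification_efficiency(cycles, band_b)
print(round(eff_a, 3), round(eff_b, 3))
```

When the two fitted efficiencies agree, as in this synthetic case, the ratio of band intensities stays constant from cycle to cycle and can be read as the ratio of starting template amounts.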

Figure 6.

Ratio of transcript A and B expression in different tissues. (a) Evaluation of amplification efficiency of transcripts A (▮) and B (▴) (primers F3/R6). (b) Relative abundance of transcript B with respect to transcript A (set to 100%), evaluated by RT–PCR (primers F3/R6) in different tissues. Size marker: pUC8 HaeIII; negative control: no DNA.

In order to quantify the relative abundance of each transcript in the expressing tissues, PCRs were extended to 40 cycles (Fig. 6b). Densitometric measurements of the relative band intensities showed that transcript B always represents less than one-tenth of transcript A. Tissue differences were detectable: kidney and ovary showed the highest relative expression level of transcript B (7–8%), testis and abomasum the lowest detectable levels (2–3%), while in the other tissues tested, with the exception of pancreas, intermediate values were observed. In pancreas transcript B is not measurable through F3/R6 amplification. As shown in Fig. 5c, this transcript is present, but can be detected only by specific PCR (primers F2/R6), as a consequence of the very low mRNA level present in this tissue.


Discussion

The bovine connexin32 cDNA sequences reported here were obtained from overlapping DNA fragments, isolated by different approaches from RNA and from genomic DNA.

The encoded 284-amino-acid connexin32 protein is almost identical to other known mammalian connexin32s, showing 97–98% identity [18,21]. The only noteworthy difference is the presence of an additional glycine at position 235 lying in a glycine-rich context (four glycines in a hexapeptide) within the C-terminal cytoplasmic domain [23]. A number of potential phosphorylation sites, conserved among mammals, were identified within this terminal portion of the protein (Fig. 1). Saez et al. [24] showed that in mouse and rat liver, two of these sites (Ser 229 and Ser 233) can be phosphorylated by protein kinase C and the latter (Ser 233) by protein kinase A as well.

The structure of the bovine gene for connexin32 appears to be more complex than in other mammals, since an additional exon (1a) was discovered. The bovine connexin32 gene consists of four exons (1, 1a, 1b and 2) and four introns (1, 1a1, 1a2 and 1b). Exons 1, 1a and 1b contain exclusively 5′ UTR sequences. The data reported here are consistent with the presence, also in the bovine gene, of two alternative promoters: P1, upstream of exon 1 and P2, upstream of exon 1b.

We isolated from bovine liver mRNAs two cDNAs derived, through alternative splicing, from promoter P1. Transcript A includes exon 1 and exon 2, while transcript B includes exon 1, the new exon 1a and exon 2. These two cDNAs show a high level of overall similarity (between 80% and 90%) with other mammalian connexin32 cDNAs [18,21]. The new exon in transcript B, which is discussed below, does not display any sequence similarity with the known connexin32 5′ UTRs [12–14]. It will be interesting to verify whether this exon is also present in other members of this gene family, whose intronic sequences are only partially known. Transcript C, originating from promoter P2, exhibited a high degree of identity (96.6%) with the corresponding sequences of the other mammalian species [12–14].

The 5′ UTRs of these transcripts do not show any similarity with a further connexin32 transcript (Dahl, E., accession number X84214), suggested to originate from a third promoter, whose presence in mouse has been reported, with no experimental data, by Söhl et al. [14].

A feature specific to bovine connexin32 cDNAs is the presence of two potential secondary structures within the 5′ and 3′ UTRs. The calculated free energy value (−82.75 kJ) for the hairpin loop at the very beginning of the 5′ UTR of transcripts A and B suggests that this structure might be stable enough to restrict translation [25]. The second hairpin loop is located within the 3′ UTR (positions 1076–1107). Its calculated free energy value (– 127.1 kJ) could explain the cessation of sequencing reactions from the 3′ end of the 1035 bp fragment (see Results) exactly at nucleotide 1107. It has been reported that stem-loop elements located within the 3′ UTR could be involved in the regulation of mRNA half-life, usually through binding of specific proteins [26].

A feature conserved among mammalian connexin32 cDNAs, including bovine, is the presence of an upstream open reading frame (uORF) beginning at position – 10 (i.e. in exon 2), out-of-frame with respect to the connexin32 start codon, extending 134 nucleotides into the coding region and encoding very similar peptides of unknown function constituted by 48 amino acids (43 residues in mouse). These peptides do not show sequence similarity with known proteins. This uORF lies in a context rather unfavourable for translation [27,28]. However, its constant presence in connexin32 cDNAs suggests a possible role, either disturbing translation start at the correct AUG or mediating regulatory effects through the encoded peptide [29].

Transcript B showed a unique feature, i.e. the presence of two additional in-frame overlapping uORFs starting at nucleotides – 63 and – 45, respectively, and comprising 18 and 12 codons. Their starting codons are located in functionally different contexts, only the first being favourable for translation [27,28]. They share the same stop codon (TGA), located six nucleotides upstream of the ATG start codon of connexin32 and partially overlapping the start codon of the common uORF; this stop signal is therefore present within the common exon 2. A functional role of short uORFs terminating before the main ATG codon has been demonstrated for some genes such as murine complement factor B and CCAAT/enhancer-binding protein, where the uORFs act in cis [30,31], and β2 adrenergic receptor mRNA, where the encoded peptide itself is able to inhibit protein expression [32].
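The uORF coordinates quoted in the last two paragraphs can be cross-checked with the Python sketch below. It assumes that the numbering has no position 0 (– 1 is immediately followed by + 1) and that the stated codon counts exclude the stop codon; the function names are illustrative.

```python
def to_linear(pos: int) -> int:
    """Map paper numbering (..., -2, -1, +1, +2, ...) to a gap-free offset."""
    if pos == 0:
        raise ValueError("position 0 does not exist in this numbering")
    return pos if pos < 0 else pos - 1


def from_linear(offset: int) -> int:
    """Inverse of to_linear."""
    return offset if offset < 0 else offset + 1


def last_sense_nt(start: int, codons: int) -> int:
    """Last nucleotide of the last sense codon of a uORF starting at `start`."""
    return from_linear(to_linear(start) + 3 * codons - 1)


def in_frame_with_main_orf(start: int) -> bool:
    """True if a uORF starting at `start` is in frame with the ORF starting at +1."""
    return to_linear(start) % 3 == 0


# Common uORF: starts at -10, 48 codons.
common_end = last_sense_nt(-10, 48)
# Transcript B uORFs: start at -63 (18 codons) and -45 (12 codons).
b_ends = (last_sense_nt(-63, 18), last_sense_nt(-45, 12))
# Their shared stop codon follows the last sense codon.
stop = [from_linear(to_linear(b_ends[0]) + k) for k in (1, 2, 3)]
gap_to_main_atg = to_linear(1) - (to_linear(stop[-1]) + 1)

print(common_end, b_ends, stop, gap_to_main_atg)
```

Under these assumptions the common uORF ends at position + 134, both transcript B uORFs end at – 10 with a shared stop codon at – 9 to – 7, six nucleotides separate that stop codon from the connexin32 ATG, and the stop codon shares positions – 9 and – 8 with the start codon of the common uORF at – 10 to – 8.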

Pyrimidine-rich sequences and polypyrimidine-binding proteins are important functional elements in RNA metabolism [33]. A putative cis-element, present within the first (positions 873–932) of the three pyrimidine clusters of bovine connexin32 mRNA, fits the consensus (C/U)CCANXCCC(U/A)YXUC(C/U)CC perfectly and is found in some stable eukaryotic mRNAs [34]. This sequence has been reported to bind cytosolic proteins able to form an RNA–protein complex (α-complex) involved in mRNA stabilization [34]. An almost identical sequence at corresponding positions is also present in the other mammalian connexin32 mRNAs, except for human where the consensus is only partially recognizable.
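A search for the consensus quoted above can be sketched as a regular expression in Python. The sketch assumes that NX and YX in the consensus stand for variable-length runs of any nucleotide and of pyrimidines, respectively (the subscripts appear to have been lost in the text); the pattern name, the function and the toy sequence are hypothetical and are not taken from the bovine sequence.

```python
import re

# (C/U)CCA N(x) CCC (U/A) Y(x) UC (C/U)CC, with N = any nucleotide, Y = C or U.
# The variable-length runs are matched lazily and are unbounded here; a real
# search would probably cap their length.
ALPHA_COMPLEX_CONSENSUS = re.compile(
    r"[CU]CCA[ACGU]*?CCC[UA][CU]*?UC[CU]CC"
)


def find_consensus(seq: str):
    """Return (start, end, match) for each hit; 1-based inclusive coordinates.

    DNA input is accepted: T is converted to U before matching.
    """
    rna = seq.upper().replace("T", "U")
    return [(m.start() + 1, m.end(), m.group())
            for m in ALPHA_COMPLEX_CONSENSUS.finditer(rna)]


# Toy sequence built to contain one instance of the consensus.
toy = "GGA" + "CCCA" + "TG" + "CCC" + "T" + "CT" + "TC" + "C" + "CC" + "AAG"
hits = find_consensus(toy)
print(hits)
```

Applied to the first pyrimidine cluster (positions 873–932), a search of this kind would be expected to return a single hit if the element matches the consensus as described; that expectation comes from the text, not from running the code on the actual sequence.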

Tissue distribution of the connexin32 mRNA is in substantial agreement with the expression patterns detected in other mammals [35–38], including its prevalence in liver and kidney. RT–PCR analyses with primers specific for the different connexin32 mRNA species were carried out in order to assess the tissue distribution of transcripts A, B and C.

Transcripts A and B share the same qualitative expression pattern: they are absent from nervous tissue and are detectable in liver, kidney, intestine, uterus, abomasum, testis, ovary and pancreas. As PCR efficiency was shown to be identical for these two transcripts, their co-expression was further investigated by densitometric measurements of the corresponding amplification bands. Although transcript B exhibits low expression levels, our results indicate tissue differences in its relative abundance. The meaning of these differences is unknown; however, they could be part of a subtle transcriptional and/or translational control in physiological and/or pathological conditions.

Transcript C, which is under the control of promoter P2, is present in nervous tissue (brain and spinal cord) thus confirming its expression specificity, already determined in humans and rodents [12–14].

The generation of transcripts differing only in their 5′ regions has been described for many unrelated genes and mostly interpreted as being an evolutionary gain to refine transcriptional and translational control. A relationship between these alternatively spliced transcripts and a particular function has been suggested for a minority of them. Examples are mouse heat shock protein 47 (HSP 47) gene [39] and human lactoferrin gene [40]. In the first case, one of the alternative transcripts acquires an increased translatability at high temperatures, while in the second case it might play an important role in the regulation of cell growth.

When alternative transcripts arise from distinct promoters, they usually show tissue-specific or cell-specific expression, as in the case of mouse leukaemia inhibitory factor receptor (LIF-R) [41] and mammalian connexin32 transcript controlled by promoter P2 ([12–14] and present paper). It is more difficult to understand the significance of 5′ UTR alternative transcripts originating from the same promoter, since they often do not display clear-cut tissue specificity. Published examples showing co-expression of such transcripts include muscle-specific enolase [42], human vigilin [43], human CC chemokine receptor 5 (CCR5) [44] and bovine prion protein [45]. The most widely shared hypothesis suggests that their functional role might be related to physiological or metabolic changes. Because no quantitation of the relative abundance between the transcripts has been performed in the above cases, the existence of some differences cannot be excluded, as reported here for connexin32. Subtle transcriptional and post-transcriptional regulations have already been argued for connexin32 and other connexin mRNAs in both normal and regenerating rat liver after partial hepatectomy [46, 47], but the mechanisms by which regulation is achieved have not been elucidated. The new connexin32 transcript reported here may represent a further step in identifying the components of such complex mechanisms. Additional information could be acquired through a better knowledge of the large intron 1 interrupting the 5′ UTR and harbouring both promoter P2 and exons 1a and 1b. Structural and functional characterization of this intron is in progress in our laboratory.


This work was supported by grants from the Ministero dell’Università e della Ricerca Scientifica e Tecnologica (MURST 40% and 60%). The authors thank Elena Pupella Rizzo for excellent technical assistance.


Note: The nucleotide sequence data reported in this paper have been submitted to the EMBL/GenBank/DDBJ Nucleotide Sequences Databases under accession numbers X95311 and AJ224440.