Ig Heavy Chains
Comparative studies of immunoglobulins in numerous species of chondrichthyans, teleost fishes, amphibians and reptiles have facilitated efforts to understand the nature of diverse antibody production (Rast et al., 1998; Anderson et al., 1999; Danilova et al., 2005; Saha et al., 2005). In mammals, the arrangement of the IgH locus is a “translocon” type, wherein multiple variable heavy chain (VH) segments are linked distantly to diversity (DH), joining (JH) and CH domains [(V)n-(D)n-(J)n-(C)n]. This translocon type arrangement also is conserved with limited variations in teleost fish, wherein C genes encode three distinct classes (IgM, IgD, and IgZ/T) as compared to as many as five major classes (IgM, IgD, IgG, IgE, and IgA) in mammals. In contrast, the most basal lineages of jawed fishes such as sharks and rays (elasmobranchs) possess IgH loci (IgM, IgW, IgNAR) of distinct “cluster” type arrangement comprised of repeated units of (V-Dn-J-C) or slight variants thereof, sometimes with germline-fused segments (Rast et al., 1989; Flajnik, 2002). A third and highly divergent IgH organization evolved within the avian lineage and consists of a single functional VH gene that undergoes gene conversion to generate antibody diversity (Reynaud et al., 1989). Outside of the invariant and close proximity (∼190 nt) of VH to DH segments in the coelacanth genome (Amemiya et al., 1993), nothing was known about its immunoglobulin loci prior to the acquisition of its genome sequence.
Characterization of IgH Isotypes in L. menadoensis from BAC Clones
A ≥7× coverage BAC library generated from the Indonesian coelacanth, L. menadoensis, was screened via colony hybridization using a mixed coelacanth VH probe and 50 positive clones initially were identified. Screening with a horn shark Cµ probe identified 20 clones (Supplementary Table S1). All VH+, CH+ and VH+CH+ clones were restriction fingerprinted using Internet Contig Explorer (iCE) (Fjell et al., 2003) and assembled into two contigs (Supplementary Fig. S1). Clones 189I9 (177 kb), 58E24 (183 kb), 130A21 (159 kb), 206D14 (167 kb), and 217L16 (167 kb) were selected strategically and sequenced to high draft coverage (10×) via Sanger shotgun sequencing. Overlaps between clones 189I9 and 58E24 (Fig. 1A), and between clones 130A21 and 206D14 (Contig 2) were confirmed (Fig. 1B); 217L16 did not show any appreciable overlaps with the other clones even though it had been placed in Contig 2 via automated restriction fingerprinting.
Figure 1. Immunoglobulin heavy chain genome organization in Latimeria menadoensis. Overlapping BAC clones encompassing the immunoglobulin heavy chain loci of L. menadoensis were isolated and sequenced. Locus 1, designated IgW1 (BAC clones 58E24 and 189I9; GenBank JX848736.1) contains 19 constant region exons (A), whereas locus 2, designated IgW2 (BAC clones 130A21 and 206D14; GenBank JX840472) encodes 16 constant region exons (B). Both IgH loci are comparatively large and organized in a different pattern in which V and D segments are in close (∼190 bp) V-D linkage and contain consensus RSSs. The respective genes appear to be most similar in overall sequence relatedness to IgW from the African lungfish and cartilaginous fishes. Both IgW isotypes encode a secretory tail at the end of the first seven C domains and possess two transmembrane exons at the end of last C domain. A separate BAC clone, 217L16, was identified in the same large restriction fingerprint set as were the clones encompassing the IgW2 locus (Supplementary Fig. S1); however, it contains only VH and DH segments; no JH segments or CH domains were identified.
VH, JH, and CH elements readily were identified via motif searches. DH elements were identified by inference using the positions of flanking RSS sequences. BLAST analyses using authentic IgM sequences as queries showed that neither contig contained an IgM-type heavy chain. Two distinct gene loci consisting of VH, DH, JH, and CH elements were identified (Fig. 1); however, their CH segments are clearly not of the Cµ type (Supplementary Table S2) and most are related closely to the IgW type reported in cartilaginous fish (Harding et al., 1990; Rumfelt et al., 2004) and lungfish (Ota et al., 2003b). Based on their genomic structures and notable sequence differences, the two contigs represent distinct loci and clearly are not allelic forms. A fifth VHDH-containing clone, 217L16 (Fig. 1C), lacks the downstream JH and CH exons and does not show sequence overlap with the two contigs. Overall, the gene segments are organized in a pattern intrinsically different from that observed in other vertebrate species: individual V and D gene segments are paired and in close proximity (Amemiya et al., 1993), with a large tract of J segments and several C region exons further downstream: [(VH-DH)n-(JH)n-(CH)16/19]. The internal duplication of CH exons is reminiscent of the IgD-encoding locus of pufferfish (Saha et al., 2004) (Supplementary Fig. S2). The patterns of organization of the loci preclude typical heavy chain class switching, although alternative splicing ostensibly could produce different isoforms of IgW.
IgW1 is predicted to contain 19 C domains (Fig. 1A), whereas IgW2 is predicted to contain 16 C domains (Fig. 1B). Each locus encodes two transmembrane domains and a termination codon followed by a 3′ untranslated region; a region that is predicted to represent a secretory tail (SecT) (Fig. 1 and Supplementary Fig. S2) is just downstream of CH7. Assuming that the primary transcript consists of seven CH domains (see below), these observations are consistent with predicted secretory and membrane-bound forms of the two IgW molecules for Latimeria. The evolutionary scenario and usage of the other CH domains largely are unknown at this time. These other downstream exons appear to be completely in-frame, devoid of stop codons and possess predicted splice donor/acceptor sites.
A total of 31 VH genes have been identified in the five sequenced BACs; of these, one is a partial sequence and five are pseudogenes (lacking one or more characteristic features such as an octamer and/or a TATA-box, or possessing stop codons and/or frameshifts in their coding sequences). All putatively functional VH sequences possess upstream regulatory sequences (an octamer that is separated from a TATA-box by 18 bp), a leader peptide sequence split by an intron with consensus splice sites, a reading frame that can be divided readily into framework regions (FRs) and complementarity determining regions (CDRs), and a 3′ RSS with a typical 23 bp spacer (Supplementary Table S3). Most of the VH genes span 291–300 nucleotides (97–100 amino acid residues from FR1 through FR3), typical for VH genes of other vertebrates. All DH segments exhibit conserved upstream and downstream RSSs as reported previously for L. chalumnae (Amemiya et al., 1993).
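As an illustration of how a 3′ RSS with a 23 bp spacer can be located by motif search, the following minimal sketch scans the forward strand for a heptamer, a fixed-length spacer and a nonamer. It is not the procedure used in this study. The consensus heptamer (CACAGTG) and nonamer (ACAAAAACC) are the canonical vertebrate motifs; the function names and the mismatch tolerances are assumptions chosen only for the example.

```python
HEPTAMER = "CACAGTG"    # canonical consensus heptamer
NONAMER = "ACAAAAACC"   # canonical consensus nonamer


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_rss(seq, spacer=23, max_hept_mm=1, max_non_mm=3):
    """Return (start, heptamer, nonamer) for each heptamer-spacer-nonamer hit.

    Only the forward strand is scanned and the spacer length is fixed;
    both tolerances are hypothetical values, not published thresholds.
    """
    seq = seq.upper()
    total = len(HEPTAMER) + spacer + len(NONAMER)
    hits = []
    for i in range(len(seq) - total + 1):
        hept = seq[i:i + len(HEPTAMER)]
        non = seq[i + len(HEPTAMER) + spacer:i + total]
        if (mismatches(hept, HEPTAMER) <= max_hept_mm
                and mismatches(non, NONAMER) <= max_non_mm):
            hits.append((i, hept, non))
    return hits


# Toy sequence: 3 nt of flank, a perfect heptamer, a 23 nt spacer, a perfect nonamer.
toy = "GGG" + HEPTAMER + "T" * 23 + NONAMER + "GG"
print(find_rss(toy, spacer=23))
```

Calling the same function with spacer=12 would search for the complementary RSS type; a real annotation would also scan the reverse strand and tolerate spacers of 12±1 or 23±1 bp.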
VH Repertoire in Latimeria
Multiple alignments of the deduced amino acid sequences of VH gene segments identified in this study indicate that the coelacanth VH germline repertoire is largely comparable to those characterized in other jawed vertebrates (Supplementary Fig. S3). The conserved GKGLEW and YYCAR motifs along with other canonical residues underscore the overall conservation of VH sequences in vertebrate phylogeny. Both VH genes and VH pseudogenes from the two IgW loci are in the same transcriptional orientation; no inverted sequences have been detected.
VH gene families are defined on the basis of percent nucleotide sequence identity. Sequences that are greater than 80% identical are categorized as representing a single family. At least five distinct phylogenetic lineages of VH gene families have been identified in vertebrates (Ota and Nei, 1994; Andersson and Matsunaga, 1995), although the number of VH gene families can vary widely among species. Mice and humans possess 14 and seven families, respectively (Tutter et al., 1991; Tomlinson et al., 1992). Rainbow trout and channel catfish possess 11 and six VH gene families, respectively (Ghaffari and Lobb, 1991; Tutter et al., 1991; Warr et al., 1991; Roman et al., 1996), whereas the horn shark, a cartilaginous fish, possesses only two VH gene families (Hinds-Frey et al., 1993). Based on our analysis, at least nine VH families can be recognized in coelacanth. In addition, several unique VHs were delineated that cannot be ascribed to any specific gene families. A phylogenetic tree indicating the relationships of 70 VH genes identified in the L. chalumnae genome is presented in Figure 4; pseudogenes and VH genes containing ambiguous sequences were eliminated from the comparison. Nineteen of the VH genes are located in a TCRαδ locus (see below).
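The 80% identity criterion can be illustrated with a short sketch that groups pre-aligned nucleotide sequences into families. This is not the analysis pipeline used here: the single-linkage grouping rule, the treatment of gap columns, the function names and the toy sequences are all assumptions made for the example.

```python
def percent_identity(a, b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def group_families(aligned, threshold=80.0):
    """Group sequence names into families by single linkage (an assumption).

    `aligned` maps a name to an aligned sequence; two sequences are linked
    when their identity is strictly greater than `threshold` percent.
    """
    names = list(aligned)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent pointers to the family representative, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if percent_identity(aligned[a], aligned[b]) > threshold:
                parent[find(a)] = find(b)

    families = {}
    for n in names:
        families.setdefault(find(n), []).append(n)
    return sorted(sorted(members) for members in families.values())


# Hypothetical toy alignment (10 columns), not real VH sequences.
toy = {"vh1": "ACGTACGTAC", "vh2": "ACGTACGTAA", "vh3": "TTTTTTTTTT"}
print(group_families(toy))
```

With single linkage, two sequences below the threshold can still end up in one family if a third sequence bridges them, which is one reason family counts depend on the grouping rule as well as on the cutoff.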
Figure 4. Phylogenetic analysis of VH segments in coelacanth. The relationships of VH genes were inferred using the Neighbor-Joining method. All ambiguous positions were removed for each sequence pair. The analysis involved 70 amino acid sequences; % bootstrap replicates are given on the tree. Coelacanth VH genes fall into nine bona fide VH families as indicated by brackets. VH elements that were identified within the TCR locus are in boxes.
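The pairwise removal of ambiguous positions described in the legend can be sketched as a distance calculation that feeds a Neighbor-Joining tree builder. The example below computes a simple proportion of differing residues (p-distance) and is only an illustration: the distance model actually used is not stated here, and the set of characters treated as ambiguous and the function names are assumptions.

```python
AMBIGUOUS = set("-X?")  # assumed markers for gaps and unresolved residues


def p_distance(a, b):
    """Proportion of differing residues between two aligned sequences.

    Columns that are ambiguous in either sequence are skipped for this pair
    only (pairwise deletion), so other pairs can still use those columns.
    """
    usable = [(x, y) for x, y in zip(a, b)
              if x not in AMBIGUOUS and y not in AMBIGUOUS]
    if not usable:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in usable) / len(usable)


def distance_matrix(aligned):
    """Return {(name1, name2): distance} for every ordered pair of names."""
    names = sorted(aligned)
    return {(p, q): p_distance(aligned[p], aligned[q])
            for p in names for q in names}


# Hypothetical toy alignment built around the conserved GKGLEW motif.
toy = {"a": "GKGLEW", "b": "GKGLDW", "c": "GK-LEW"}
for pair, dist in sorted(distance_matrix(toy).items()):
    print(pair, round(dist, 3))
```

Under complete deletion, by contrast, the third column would be dropped for every pair, including the comparison of "a" with "b".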
Analysis of IgH-Transcripts
Although the L. menadoensis transcriptome assembly was produced from non-haematopoietic tissues (liver and testis), a small percentage (0.7%) of the ∼13,000 annotated genes correspond to immune system processes (Gene Ontology term 0002376). This dataset was used to identify multiple hits encompassing IgW transcripts; these represent both IgW1 and IgW2 molecules although none were found to be full-length rearranged molecules (Fig. 5). One IgW1 transcript (contig106265) is rearranged with a unique V-D segment (truncated), a known J segment and seven C domains (CH1 to CH7) followed by a secretory tail composed of 20 amino acid residues. A second, truncated IgW1 transcript (contig26989) was identified that lacks a V region but contained CH5 spliced directly to the TM-encoding exons (located downstream of the 12 additional CH domains, Fig. 1A) followed by a 3′ untranslated region. This likely represents a membrane form of IgW. A similar feature is seen in teleost IgM whereby the transmembrane domain is spliced directly to a CH3 domain (Bengten et al., 1992; Hansen et al., 1994; Saha et al., 2005). The genomic sequence for the secretory tail is located at the 3′ end of the CH7 domain in both IgW1 and IgW2 loci (Fig. 1). The IgW genes that were identified in coelacanth structurally resemble IgW described previously in lungfish and cartilaginous fish, although a characteristic two-domain form (Harding et al., 1990; Ota et al., 2003b) has yet to be identified in the transcriptome. The finding of multiple, internally repeated CH domains (Supplementary Fig. S2) also is curious and it will be interesting to determine whether or not any of these other domains are utilized, and in what context. This may be challenging given the difficulty in procuring any coelacanth tissue, let alone a hematopoietic source; however, one alternative may be to use a surrogate in vitro B-cell system to assess the functionality.
Figure 5. IgW transcript validation in coelacanth. IgW1 and IgW2 heavy chain transcripts were identified in the L. menadoensis RNAseq combined assembly, despite the dataset being generated from non-lymphoid (liver and testes) sources. (A) A rearranged IgW1 heavy chain transcript that includes a partial VD region, a known J-segment and seven constant domains (CH1 to CH7) followed by a secretory tail (upper panel). Another IgW1 heavy chain transcript, which lacks the variable region and most of the constant region but contains a CH5 domain spliced to transmembrane domains followed by its 3′ UTR, likely derives from a membrane isoform of IgW. (B) The IgW2 heavy chain transcript lacks a variable domain but possesses an intact constant region (7 domains) in which the secretory tail is attached at the 3′ end of the CH7 domain. Three other constant region transcripts were identified in the muscle transcriptome of L. chalumnae (not shown).
Lack of IgM in Latimeria
The Ig heavy chain of the IgM class has been reported in all vertebrates characterized thus far and is considered to be essential for the initial phase of the humoral adaptive immune response in jawed vertebrates. Despite an exhaustive search of the coelacanth sequence data, the IgM gene constant region could not be identified, even though orthologs of most of the major genes involved in the adaptive immune system of jawed vertebrates are present. Moreover, L. menadoensis genomic BAC and λ libraries were screened exhaustively using numerous strategies and a variety of probes but no Cµ-like sequences were identified. Additionally, PCR primers (Turchin and Hsu, 1996) that amplified VH from teleost fish, lungfish, amphibians, reptiles, and mammals produced fragments that fell into two distinct groups. One set consisted of bona fide IgW VH elements. VH elements from the second set, which we surmised might be embedded in a different heavy chain locus, were instead found by sequencing to reside in the TCR α/δ locus, described below. Furthermore, numerous additional degenerate primers that were designed to amplify Cµ sequences, based on published sequence data, failed to identify a Cµ homolog. No traces of Cµ were found in the RNA-seq data of coelacanth although, as stated above, transcripts encoding both IgW1 and IgW2 heavy chains were identified (Fig. 5). The apparent lack of genes encoding the IgM heavy chain is unexpected, although it is known that the codfish and its close relatives apparently have lost major components (MHC class II) of their immune systems (Star et al., 2011). The evolutionary relationships of Ig heavy chains, including IgD (to which the coelacanth IgW shows a relationship), will be addressed elsewhere. The lack of IgM in the coelacanth raises the question of whether an IgW molecule supplants classical IgM in a manner analogous to the compensatory modifications seen in the codfish with respect to the function of MHC class II (Malmstrom et al., 2013).
Ig Light Chains
Multiple immunoglobulin light chain isotypes have been identified in all vertebrates studied to date, with the exception of birds, bats and snakes, in which only a single light chain has been described (Lundqvist et al., 2006; Gambon-Deza et al., 2012; Magadan-Mompo et al., 2013). In tetrapods, IgL can be classified into three distinct groups: kappa (κ), lambda (λ), and sigma (σ) (Criscitiello and Flajnik, 2007). However, the σ isotype was thought to have been lost in all lineages after the divergence of amphibians (Das et al., 2012). A close examination of VL based on its phylogenetic relationships, CDR lengths and RSS orientation, recognized four ancestral VL clades that were maintained throughout the vertebrates (Criscitiello and Flajnik, 2007). A distinct variant of the σ isotype, which was named σ-cart (for cartilaginous fish), has been identified only in the shark (Criscitiello and Flajnik, 2007). The organization of the light chain loci among the vertebrates is not as definitive or diagnostic as for the heavy chain loci and can consist of cluster-type, translocon-type or perhaps other variations (Hsu and Criscitiello, 2006).
Homology searches with various vertebrate immunoglobulin light chain amino acid sequences have identified a large number of IgL genes in the African coelacanth genome. Following previous classification schemes (Criscitiello and Flajnik, 2007; Edholm et al., 2011) most coelacanth IgL genes can be separated into four groups based on the amino acid sequence, that is, sigma-cart (cartilaginous fish type I/NS5), sigma (fish L2), kappa (cartilaginous fish type III/NS4, fish L1/L3/F/G, Xenopus rho) and lambda (cartilaginous fish type II/NS3, Xenopus type III) (Fig. 6) and overall are in agreement with the generally accepted classification scheme (Criscitiello and Flajnik, 2007).
Figure 6. Phylogenetic analysis of VL segments in coelacanth. The relationships of VL genes were inferred using the Neighbor-Joining method. All positions containing gaps and missing data were eliminated; % bootstrap replicates are given on the tree. A total of 66 positions of framework region were represented in the final dataset. The Lc sequences are denoted by “JHxxxxxx”, followed by their positions on the scaffolds. Sequence identification numbers (GI) of all other taxa are mentioned after the species name. The light chain classes are given to the right of the tree and are in complete accordance with branching topology. The Sigma-2 designation is the former Sigma-cart subclass that had previously been found only in cartilaginous fishes.
The coelacanth genome encodes IgL genes of the sigma-cart type, which we provisionally denote sigma-2. These loci are in a cluster-type pattern of organization and four clusters are found in three different scaffolds (JH130719, JH128711, and JH132919), where V and J gene segments are germline-joined. An extra C region exon, which shows 94% identity at the nucleotide level with other C regions, is also observed in scaffold JH130719. It is uncertain if this distinctive C gene exon is expressed together with the VJ gene of the neighboring complete cluster or if it represents a pseudogene. Given the notable numbers of ambiguous regions (due to assembly gaps), it is possible that an additional VJ gene segment(s), which may be associated with a C gene segment, is present. CDR1 and CDR2 of V gene segments encode 13 and 11 amino acids, respectively, a characteristic feature of the sigma-cart IgLs of cartilaginous fish. Certain IgLs have insertions in their FRs. One scaffold (JH132380) contains a single C region, of which the amino acid sequence is similar to that in JH130719. The ortholog of this C region was identified in the Indonesian coelacanth transcriptome (testis: comp76432_c0_seq1) but the 5′ region preceding the C region showed little sequence homology with the VJ region. The C gene segment of scaffold JH132919 is unusual in that its exon structure and 3′ end are predicted to be encoded by separate exons, somewhat different from that seen in other sigma-cart type IgL genes. Sigma-cart initially was found only in elasmobranchs (Hikima et al., 2011; Sun et al., 2012), hence its presence in the coelacanth implies a wider distribution than initially thought.
The sigma type of IgL in coelacanth was detected in three scaffolds: JH126613, which contains 3(V-J)-3V-J-C; JH134803, which contains V-J; and JH135686, which contains V-J and a V pseudogene segment. Most J gene segments, with the exception of those most proximal to the C gene segment, may represent pseudogene segments, as they contain an in-frame internal termination codon. The possibility remains that scaffolds JH134803 (8,271 bp) and JH135686 (6,455 bp) are located within JH126613 (3,167,360 bp). VL and JL gene segments are flanked by RSSs with 12 and 23 bp spacers, respectively. The CDR1 and CDR2 of V gene segments encode 10–11 and 12 amino acid residues, respectively, and are equivalent to those of cartilaginous and bony fishes. A YGxG (or PxYGxGFS) motif located at the CDR2-FR3 boundary region is conserved among most of the V gene segments of coelacanth IgL of the sigma type. Genes encoding Kcnv2 (potassium channel, subfamily V, member 2), Kank3 (KN motif and ankyrin repeat domains 3), Angptl4 (angiopoietin-like 4) and Rab11b (member of RAS oncogene family) downstream of C gene segments map to JH126613 (ENSLACG00000017294, ENSLACG00000017209, ENSLACG00000017018, and ENSLACG00000016974, respectively); these same genes also map downstream of the Ig sigma locus (XB-GENE-5806081) on Xenopus scaffold GL173022.1.
Igκ is present in all jawed vertebrates except birds and includes the amphibian rho-type light chain. The most extensive IgL gene family in coelacanth is of the kappa-type and encoded in three scaffolds: JH128084 (580,075 bp), JH129712 (214,159 bp) and JH130074 (167,776 bp). Four V gene segments, four J gene segments and one C segment are encoded in JH130074 in the same transcriptional orientation. A large number of VL gene elements without JL elements and CL segments have been identified in JH128084 and JH129712. The gene encoding ribose 5-phosphate isomerase A (RPIA) is located downstream of C gene segments in JH130074; close linkage of RPIA to the kappa locus is a tetrapod condition (Edholm et al., 2011). The gene encoding succinate-CoA ligase alpha subunit is upstream of V gene segments in JH128084 indicating that the 5′ end of IgL kappa loci likely is encoded in this scaffold. In addition to the above, three V gene segments were identified in scaffolds JH131133 (80,170 bp) and JH131467 (58,214 bp), and a single Vκ gene segment was identified in scaffolds JH130471 (127,935 bp), JH132852 (16,005 bp), JH133287 (13,169 bp), AFYH01278842 (5,496 bp), and AFYH01285422 (1,902 bp). Scaffolds JH128084, JH129712, and JH130074 and other short scaffolds could ostensibly be part of a longer contig. Some V gene segments detected in the aforementioned scaffolds are apparent pseudogenes, as defined by internal truncation, termination codons and/or frame-shift mutations.
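One of the pseudogene criteria mentioned above, the presence of termination codons in the reading frame, lends itself to a simple automated check. The sketch below is an assumption-laden illustration rather than the annotation procedure used here: it reports in-frame stop codons for a given frame and does not attempt to detect truncations or frameshifts, which would require comparison with a reference sequence. The function name and toy sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def in_frame_stops(seq, frame=0):
    """Return the 0-based nucleotide positions of stop codons in `frame`.

    `frame` is the offset (0, 1 or 2) of the first complete codon.
    """
    seq = seq.upper()
    return [i for i in range(frame, len(seq) - 2, 3)
            if seq[i:i + 3] in STOP_CODONS]


# Hypothetical toy sequences, not coelacanth V gene segments.
print(in_frame_stops("ATGGCCTGAGGA"))  # codons ATG GCC TGA GGA
print(in_frame_stops("ATGGCCTGGGGA"))  # codons ATG GCC TGG GGA
```

A segment with an empty result in its expected frame would still need the other criteria checked before being called putatively functional.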
Igλ constitutes the only IgL isotype in avians and was considered missing in fishes until its identification in channel catfish, Atlantic cod, and rainbow trout (Edholm et al., 2009). IgLλ in coelacanth maps to scaffold JH126620, which contains four V gene segments, two J gene segments and one C gene segment in a translocon-type gene organization. The genes for car15 (ENSLACG00000004420) and dgcr2 (ENSLACG00000005606) map downstream of the λ locus. Orthologous genes are located near the Ig type III locus in Xenopus tropicalis and near Igλ chain loci in human (chromosome 22q11) and in mouse (chromosome 16). CDR1 and CDR2 of the coelacanth V gene segments contain 13 and 11 amino acid residues, respectively; CDR2 is generally longer than those of other vertebrate Igλ V gene segments but similar in length to those of some Xenopus Igλ V gene segments.
T-cell receptors (TCRs) are expressed on the surface of T-lymphocytes that recognize antigens presented by MHC and induce a series of intracellular signaling cascades, although it is not clear yet whether the γ/δ chains, which are found only in 5% of T-cells, are MHC-restricted. These signaling cascades regulate T-cell development, homeostasis, activation, acquisition of effector functions and apoptosis (Lin and Weiss, 2001; Okkenhaug et al., 2004). All jawed vertebrates thus far characterized possess four different types of TCR-chains: α, β, δ, and γ. T-lymphocytes are characterized as either being αβ or γδ. In addition, a divergent TCR, TCRµ, which has some features resembling a recently described TCRδ isoform in sharks, has been described in marsupials and a monotreme (Parra et al., 2007; Wang et al., 2011). The T-cell receptor genes, like those of immunoglobulins, consist of V, D, and J segments and a C region.
As expected, genes encoding the four basic types of TCR (α, β, γ, δ) have been identified in the coelacanth genome (described below). These were validated by partial cDNA sequences from both L. menadoensis and L. chalumnae (Supplementary Fig. S7). A phylogenetic tree of the C regions of the four TCR types of the coelacanth confirmed that each type is grouped with the expected clades (Supplementary Fig. S8). No genes encoding a conspicuous TCRµ were identified.
Coelacanth TCR α/δ
Scaffold JH127241 contains the extended TCR α/δ region and scaffold JH126915 contains primarily genes of the TCR-α locus; whether or not these two scaffolds are localized to the same chromosomal region has not been determined definitively. As in all other tetrapods examined to date, the TCR-α locus is embedded with the genes encoding the TCR-δ chains (Fig. 7). The TCR α/δ region encompasses a tract encoding 25 VH genes; these genes are nearly indistinguishable from those encoded at the IgW loci except they are not associated with DH segments. The VHs are located between the TCRα and TCRδ genes. VH genes also were reported at the TCRδ locus in the frog; however, there was no evidence for trans-locus somatic recombination between the loci despite the fact that both loci contain multiple VH gene segments (Parra et al., 2010). Analogous to the case in Xenopus, VH genes also were found to be embedded in the TCRα/δ locus of the platypus and opossum (Parra et al., 2009). In marked contrast, several cDNAs that contained IgM or IgW V segments were rearranged with other gene segments of TCRδ and α in nurse shark (Criscitiello et al., 2010). At this point it is not known whether or not the coelacanth uses these VH gene segments in the context of TCRα/δ genes as no transcripts encoding VH gene segments could be identified in the available transcriptome databases. Fourteen Vα gene segments are located upstream and in the opposite transcriptional orientation as Cα. Another 58 Vα gene segments are in reverse orientation. A total of 80 Jα genes, which are in the same transcriptional orientation as Cα, have been identified. The large number of Jα elements exceeds that reported in frog (Parra et al., 2010), humans and mice (Giudicelli et al., 2005). Only one TCRδ gene encodes Vδ in the same transcriptional orientation as Cδ and at least five Jδ elements have been identified.
Sal-like protein 2 (SALL-2) and Methyl-transferase-like 3 (METTL3) delimit one end of the TCRα/δ locus in coelacanth, similar to the situation in birds and mammals.
Figure 7. Physical map and annotation of T-cell receptor α/δ locus (TCR locus 1). Scaffold JH127241 contains the coelacanth TCR α/δ locus. This locus contains genes for both TCR-α and TCR-δ in a typical arrangement; however, the locus also contains 25 V gene segments that are very closely related to those VHs encoded in the IgW loci. Transcriptional orientation is indicated by the direction of the arrowheads for each segment. Syntenic genes shown in gray are those conserved with mammalian TCRα/δ locus: methyl-transferase like 3 (METTL3), zinc finger protein (SALL2).
The overall organization of the coelacanth TCRα/δ locus as depicted in scaffold JH127241 is highly conserved with those of amphibians, birds and mammals, suggesting that it is a very stable genomic region. VH genes embedded in the TCRα/δ locus (previously named VHδ) have been reported in different lineages of jawed vertebrates. In the amphibian Xenopus tropicalis, the VHδ genes were found to be expressed only in TCRδ chains (Parra et al., 2010). The organization of the TCRα/δ locus in birds varies among lineages. In the zebra finch, the TCRα/δ locus contains VHδ, similar to the amphibians (Parra and Miller, 2012). However, in Galliformes and Anseriformes, the TCRα/δ locus does not contain any VH genes; instead the VHδ genes have been translocated to a separate chromosome, creating a second TCRδ locus (Parra et al., 2012). Among the mammalian lineages, only a monotreme (the platypus) contains a single VHδ in the TCRα/δ locus (Parra et al., 2012); monotremes and marsupials have an additional TCR locus (TCRµ) which contains VH-like and Cδ-like genes (Parra et al., 2007; Wang et al., 2011). These rearrangements suggest that VH-TCRδ genes are mobile and prone to translocation. In cartilaginous fish (nurse shark) there is also evidence that TCRδ chains use VH genes. NAR-TCR is a three-domain receptor composed by double rearrangement of two V genes (VH and Vδ) expressed with TCR Cδ (Criscitiello et al., 2006). In addition, shark VH (IgM and IgW) have been found rearranged with TCRα/δ D and J gene segments and then spliced to either Cα or Cδ TCR constant regions (Criscitiello et al., 2010). While no transcripts have been found, the highly intact and conserved VH elements might imply that the coelacanth also deploys VH-TCRδ chains in a manner similar to mammalian TCRµ, shark NAR-TCR and VH-TCR trans-rearrangements, and the VHδ-TCRδ chains of frogs and birds.
These results are consistent with a selective pressure to maintain T cells that are capable of direct antigen binding.
The second TCRα-containing scaffold (JH126915) is unlike those reported in other species. The IgW2 immunoglobulin heavy chain locus maps to the same scaffold but is in the opposite transcriptional orientation to the respective TCRα genes (Fig. 8). The TCRα region consists of 74 Vα and 59 Jα segments followed by a single Cα domain. All of the TCRα components are in the same transcriptional orientation in this second TCRα scaffold. The linkage of TCRα with an IgH locus, and the interdigitation of putatively functional VH segments with TCRα/δ, raise speculation about the genomic origins of immunoglobulin domains and their possible co-option to new immune functions. A short sequence segment, which is located between Cα and the first Jα, shows limited homology to the C region of kappa light chains in cattle and the secreted form of IgH in rainbow trout. Although this fragment is an obvious member of the Ig superfamily, its relationship is unclear.
Figure 8. Physical map and annotation of T-cell receptor α locus (TCR locus 2) and tight linkage with IgW2. Scaffold JH126915 was annotated and shown to contain both the IgW2 locus (see Fig. 2) as well as TCRα. TCRα components are in reverse orientation with respect to IgW2. Unlike the components in TCRα/δ-containing scaffold (Fig. 7), all TCRα components are in the same transcriptional orientation. There are 74 Vα and 59 Jα segments followed by a single constant domain (Cα). Transcriptional orientation is demonstrated by the direction of the arrowhead for each segment. The chromosomal relationship of this region to the TCRα/δ locus (Fig. 7) is unknown.
Gene segments encoding coelacanth TCRβ were found in seven scaffolds. JH127253 (1,055,683 bp) contains 84 V gene segments, at least one D gene segment, 29 J gene segments and a C gene segment in one orientation and six V gene segments in the opposite orientation downstream of the C gene segment. JH134430 (10,111 bp) and JH134555 (8,965 bp) contain five and two V gene segments, respectively. The other scaffolds, JH137264 (3,510 bp), AFYH01288189 (1,308 bp), AFYH01289906 (1,164 bp), and AFYH01290638 (1,118 bp) each contain only a single V gene segment. Some of the V gene segments apparently are pseudogenes with internal frameshift and/or nonsense mutations. In addition, several segments are only partially characterized and contain poor or marginal sequence stretches. Nonetheless, numerous potentially functional V gene segments as well as J gene segments can be identified. In marked contrast to TCRα, coelacanth TCRβ is encoded at a single locus represented by the scaffold JH127253. It is assumed that the remaining six scaffolds either map to the middle of scaffold JH127253 or encode orphan genes. In addition, orthologs of CLCN1 (chloride channel voltage-sensitive 1; ENSLACG00000013923), FAM131B (family with sequence similarity 131 member B; ENSLACG00000013702), EPHA1 (EPH receptor A1; ENSLACG00000013051), NOBOX (NOBOX oogenesis homeobox; ENSLACG00000012815) and ARHGEF5 (Rho guanine nucleotide exchange factor (GEF) 5; ENSLACG00000012495) are localized on the 5′ side of scaffold JH127253 and orthologs of EPHB6 (EPH receptor B6; ENSLACG00000005719), KEL (Kell blood group, metallo-endopeptidase; ENSLACG00000002852) and NECAP1 (NECAP endocytosis associated 1, i.e., adaptin ear-binding coat-associated protein 1; ENSLACG00000001735) are localized to the 3′ side of scaffold JH127253. The close linkage of CLCN1, FAM131B, EPHA1, NOBOX, ARHGEF35, EPHB6, and KEL to the TCR-β locus has been established in other vertebrates. Specifically, these genes are localized to the 3′ side of the TCRB locus on human chromosome 7q34-35, indicating that a local chromosomal rearrangement, such as an inversion, occurred during vertebrate evolution.
TCRγ genes in the coelacanth can be detected among seven scaffolds: JH127368 (947,744 bp) containing four V gene segments; JH128975 (325,176 bp) containing 11 V gene segments, a pseudo V gene segment, 18 J gene segments and a C gene segment; JH132947 (15,197 bp) containing two V gene segments; JH134594 (8,858 bp) containing two V gene segments and a pseudo V gene segment; JH133588 (11,855 bp) containing two V gene segments and a partial V gene segment; AFYH01286773 (1,488 bp) containing a partial V gene segment; and AFYH01287628 (1,377 bp) containing a V gene segment. The 5′ region of the coelacanth TCRγ locus likely is encoded by JH127368 as it contains non-TCRγ genes such as DNAH5 (dynein, axonemal, heavy chain 5, ENSLACG00000002629), whereas the 3′ region of the coelacanth TCRγ locus is encoded by JH128975. As in other vertebrates, V and J gene segments are associated with RSSs containing 23 and 12 bp internal spacers, respectively.
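The spacer lengths mentioned above can be illustrated with a minimal sketch of how one might scan a sequence for candidate RSSs. It assumes exact matches to the canonical consensus heptamer (CACAGTG) and nonamer (ACAAAAACC); genuine RSSs frequently deviate from the consensus, so this is not the annotation procedure used in the study. The function names and the toy sequence are hypothetical. Only the heptamer-spacer-nonamer orientation is scanned; an RSS in the opposite orientation would be found by scanning the reverse complement.

```python
import re

# Canonical consensus elements; an assumption for illustration only,
# because real RSSs often differ from the consensus at several positions.
HEPTAMER = "CACAGTG"
NONAMER = "ACAAAAACC"


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_rss(seq, spacer_len):
    """Return (position, spacer) for each exact heptamer-spacer-nonamer match.

    A lookahead is used so that overlapping candidates are all reported.
    """
    pattern = re.compile(
        f"(?=({HEPTAMER})([ACGT]{{{spacer_len}}})({NONAMER}))"
    )
    return [(m.start(), m.group(2)) for m in pattern.finditer(seq.upper())]


# Hypothetical toy sequence carrying one 23 bp-spacer RSS and one 12 bp-spacer RSS.
toy = "TTT" + HEPTAMER + "A" * 23 + NONAMER + "GG" + HEPTAMER + "C" * 12 + NONAMER

print(find_rss(toy, 23))  # one hit at position 3
print(find_rss(toy, 12))  # one hit at position 44
# To look for RSSs in the opposite orientation, scan the other strand:
print(find_rss(reverse_complement(toy), 12))
```

A practical scan would tolerate mismatches, for example by scoring each candidate against a position weight matrix rather than requiring an exact match.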
Major Histocompatibility Complex Genes
MHC proteins are integral molecules in adaptive immunity and their genes provide one of the best examples of balancing selection in vertebrates. Unlike Ig and TCR, genes of the MHC do not undergo genomic rearrangement. MHC proteins present peptide fragments of processed intracellular antigens to CD4+ or CD8+ T-cells. MHC I is composed of MHC class I alpha and the invariant beta-2-microglobulin subunits. MHC II is composed of MHC class II alpha and beta subunits. The number and diversity of genes encoding the MHC I alpha subunit and the MHC II subunits are related directly to the potential repertoire of peptide antigens that can be recognized. MHC I and II genes are linked in many vertebrates, but in teleost fishes they are localized to separate genomic regions (Flajnik and Du Pasquier, 2008). The syntenic relationship of MHC regions provides compelling evidence for two rounds of whole genome duplication occurring at an early stage in vertebrate evolution (Kasahara, 1997).
Genes encoding MHC I alpha, beta-2-microglobulin (β2M), MHC II alpha and beta have been identified in homology searches of the African coelacanth genome database. Only the first exon of β2M was detected in the genome database (JH127334: 422278–422347); however, complete transcripts (i.e., comp30430_c0_seq1 of testis transcript) were identified in the Indonesian coelacanth transcriptome (Pallavicini et al., 2013), underscoring the lack of contiguity in some regions of the genome assembly. Nevertheless, at least 29 MHC I alpha, nine MHC II alpha and 12 MHC II beta genes can be recognized in the coelacanth genome database, including a few apparent pseudogenes (Table 2). Notably, the MHC loci are polymorphic and it is expected that some of the sequences could be allelic to each other. In several instances, multiple MHC genes are in close proximity, that is, six and five MHC I alpha genes in scaffold JH127214 and scaffold JH129212, respectively, and four MHC II alpha and four MHC class II beta genes in scaffold JH128941. However, others are in separate scaffolds, many of which represent extended chromosomal regions (Table 2). Some MHC I alpha and MHC II genes can be localized to scaffolds that contain homologs of COL11A2, RXRB, and SLC39A7 (in scaffold JH128993.1) or homologs of DAXX, SYNGAP1, PHF1, KIFC1, ZBTB9, CUTA, PFDN6, RGL2, TAPBP, WDR46, RPS18, RING1, VPS52, HSD17B8, PSMB8, TAP1, PSMB9, and BRD2 (in scaffold JH127214). Many of these also are found in the MHC region in human and Xenopus (Flajnik and Du Pasquier, 2008), although the overall degree of chromosomal synteny in the MHC regions is not entirely clear at this point because of the fragmented nature of the current assembly. 
Furthermore, some genes that are present in the MHC class III region, such as those encoding complement components (e.g., C2, C4A/B, CFB) (Supplementary Table S7), have been reported in the coelacanth genome paper (Amemiya et al., 2013). In higher vertebrates, the class III region is located between the MHC class I and MHC class II regions.
Table 2. Major histocompatibility complex genes in Latimeria chalumnae
|(1) MHC class Iα|
|Scaffold||Scaffold size (bp)||Location in the scaffold|
|Exon 1||Exon 2||Exon 2'||Exon 3||Exon 4||Exon 5||Exon 6||Exon 7|
|JH126818||1,794,345||462,259–462,307||464,491–464,754||465,827–466,102||468,353–468,628||471,820–471,939|| || || |
|JH127214||1,088,405||612,101–612,038||607,138–606,869|| ||603,863–603,588||601,791–601,513|| || || |
|JH127214||1,088,405||656,018–655,955||654,809–654,549|| ||650,086–649,810a||647,527–647,250a|| || || |
|JH127214||1,088,405|| ||669,524–669,262|| || || || || || |
|JH127214||1,088,405|| ||701,062–700,793a|| ||699,468–699,193||697,298–697,020|| || || |
|JH127214||1,088,405||865,322–865,385|| || ||908,130–908,405||910,781–911,059||912,512–912,622||915,120–915,137||915,300–915,349|
|JH128073||586,066|| ||153,687–153,418|| ||140,330–140,055||136,998–136,720|| || || |
|JH128073||586,066||248,515–248,455a||212,796–212,551||208,340–208,308||204,621–204,342||202,272–201,994|| || || |
|JH128073||586,066|| ||261,470–261,724|| ||263,698–263,973||266,037–266,309|| || || |
|JH128073||586,066|| || || ||273,989–274,267|| || || || |
|JH128472||453,462|| ||411,589–411,843|| ||414,650–414,925||417,963–418,241|| || || |
|JH128993||322,248|| ||294,360–294,088|| ||290,524–290,249|| || || || |
|JH128993||322,248|| ||319,591–319,319|| ||313,902–313,627||311,453–311,175||310,305–310,189||309,835–309,803a|| |
|JH129212||282,836|| ||167,686–167,958|| ||169,382–169,657||170,218–170,496|| || || |
|JH129212||282,836|| ||216,314–216,586|| ||218,963–219,235||221,388–221,666||222,528–222,644|| ||228,168–228,208|
|JH129212||282,836|| ||274,491–274,763|| ||277,827–278,102||280,159–280,437||280,986–281,099||281,452–281,484||282,156–282,196|
|JH129714||208,590|| || || ||99,789–99,514|| || || || |
|JH130167||156,402|| || || ||147,202–146,933|| || || || |
|JH130808||100,269|| ||43,099–43,353|| ||46,245–46,520||49,557–49,835|| || || |
|JH130808||100,269|| ||86,006–86,260|| ||88,855–89,130||92,486–92,764|| || || |
|JH130480||126,838|| ||78,758–79,027|| ||81,551–81,826||83,466–83,744|| || || |
|JH130480||126,838|| ||96,595–96,864|| ||123,365–123,640||125,312–125,590|| || || |
|JH130480||126,838|| ||120,592–120,861|| || || || || || |
|JH130646||112,270||108,385–108,322||106,744–106,475a|| || || || || || |
|JH130654||112,651|| ||15,338–15,084|| ||13,108–12,833||6,175–5,913a|| || || |
|JH130654||112,651|| ||81,589–81,861|| ||82,741–83,016||84,873–85,149a|| || || |
|JH132002||46,392|| || || ||12,919–13,179||15,721–15,999|| || || |
|JH132348||23,389|| || || || || ||850–743|| || |
|JH134769||8,345|| ||3,708–3,962|| ||6,586–6,861|| || || || |
|JH134901||8,035|| ||3,539–3,285|| || || || || || |
|AFYH01279416||5,177|| ||3,322–3,068|| ||1,050–775|| || || || |
|AFYH01283960||2,885|| ||1,545–1,291a|| || || || || || |
|AFYH01287256||1,426|| || || ||952–1,227|| || || || |
|(2) MHC class IIα|
|Scaffold||Scaffold size (bp)||Location in the scaffold|
|Exon 1||Exon 2||Exon 3||Exon 4||Exon 5|
|JH128941||332,998|| || || ||3,322–3,325|| |
|JH128941||332,998||151,751–151,832||155,539–155,790|| ||176,762–176,765|| |
|JH128941||332,998|| || ||325,931–326,069||328,282–328,285|| |
|JH132119||43,233|| || ||3,707–3,845|| || |
|JH133334||12,986|| ||5,895–6,146||8,990–9,271||12,390–12,528|| |
|JH133683||11,598||5,404–5,332||4,970–4,722||4,281–4,000|| || |
|JH134128||10,105|| ||3,916–4,197||8,942–9,080|| || |
|JH135774||6,059||4,200–3,949|| || || || |
|AFYH01281632||4,068||221–140|| || || || |
|AFYH01282120||3,832|| ||2,036–2,287|| || || |
|AFYH01284662||2,301|| || ||(2,301)–2,166||142–4|| |
|AFYH01288882||1,242||(1,242)–1,169||806–558||142–(1)|| || |
|AFYH01285476||1,880||1,239–1,164||775–527||50–(1)|| || |
|(3) MHC class IIβ|
|Scaffold||Scaffold size (bp)||Location in the scaffold|
|Exon 1||Exon 2||Exon 3||Exon 4||Exon 5||Exon 6|
|JH127214.1||1,088,405|| || ||310,508–310,224|| || || |
|JH128993.1||322,248|| ||45,498–45,244||41,707–41,429||39,296–39,183|| || |
|JH131812.1||49,899|| || || ||6,788–6,675||3,740–3,708||3,585–3,560|
|JH132119.1||43,233||41,828–41,876|| || || || || |
|JH132855.1||15,952|| || ||2,699–2,977||4,132–4,245||5,389–5,421||5,525–5,550|
|JH132855.1||15,952|| || || || ||15,510–15,478||15,374–15,349|
|JH133922.1||10,792||3,829–3,730|| || || || || |
|JH134191.1||9,945|| || ||(9,945)–9,744||6,004–5,891||5,123–5,091||4,984–4,959|
|JH134411.1||9,293|| || ||6,864–6,583||4,538–4,425||2,434–2,402||2,279–2,254|
|JH134772.1||8,384||535–634||5,207–5,479|| || || || |
|JH135383.1||6,924|| ||979–1,251||5,209–5,490|| || || |
|AFYH01279633.1||5,076|| ||1,402–1,130|| || || || |
|AFYH01283994.1||2,863|| ||840–568|| || || || |
|AFYH01284697.1||2,290|| ||419–691|| || || || |
|AFYH01287635.1||1,377|| ||(1,377)–1,215|| || || || |
|AFYH01288041.1||1,323|| || || ||681–794|| || |
|AFYH01289521.1||1,191||778–877|| || || || || |
Our data on MHC class I genes largely corroborate a previous report (Betz et al., 1994) that identified sequences of L. chalumnae MHC I genes. The analysis of the MHC genes, with reference to their polymorphism, coalescence and evolution, should now be possible given these data and those of a recently published report of genome sequences of additional coelacanth specimens (Nikaido et al., 2013).
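For readers who wish to reuse the coordinates in Table 2, the following minimal sketch parses a single table cell into a start, an end, an inferred strand and an exon length. Two points are assumptions rather than statements from the table: a range written in descending order is interpreted as lying on the reverse strand, and the parentheses and the trailing "a" marker are simply stripped because their meaning is not given here. The function name `parse_exon` is hypothetical.

```python
import re


def parse_exon(cell):
    """Parse a Table 2 cell such as '612,101–612,038'.

    Returns (start, end, strand, length) with start <= end.
    Assumption: descending coordinates indicate the reverse strand.
    """
    # Remove thousands separators, parentheses and a trailing footnote letter.
    cleaned = cell.strip().replace(",", "")
    cleaned = cleaned.replace("(", "").replace(")", "").rstrip("a")
    # Accept either an en dash or a plain hyphen as the range separator.
    first, second = (int(part) for part in re.split(r"[–-]", cleaned))
    strand = "+" if first <= second else "-"
    start, end = min(first, second), max(first, second)
    return start, end, strand, end - start + 1


# Cells taken from the MHC class Iα part of Table 2.
print(parse_exon("462,259–462,307"))   # (462259, 462307, '+', 49)
print(parse_exon("612,101–612,038"))   # (612038, 612101, '-', 64)
print(parse_exon("650,086–649,810a"))  # (649810, 650086, '-', 277)
```

Parsing the cells this way makes it straightforward to check, for example, whether the exons assigned to one gene share the same inferred strand.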
Recombination Activating Genes
Arguably, one of the most important events in the evolution of the adaptive immune system was the integration of the Rag genes via a transposon-based insertion event into the genome of a common ancestor of deuterostomes (Fugmann et al., 2006). Rag1 and Rag2, which mediate V(D)J recombination, are imperative for both the somatic generation of Ig and TCR and, ultimately, the maturation of B- and T-lymphocytes. The genomic and transcriptomic databases of Latimeria were searched using the partial sequences of previously identified coelacanth Rag1 and Rag2 (Brinkmann et al., 2004). The coelacanth Rag1 and Rag2 genes were localized to a 6.58 megabase scaffold (JH126568). Both genes consist of a single exon, unlike those of teleost fishes, and are in opposite transcriptional orientation (Fig. 9A). Coelacanth Rag1 is predicted to consist of 1,058 amino acids, whereas Rag2 consists of 522 amino acids. The distance between the two genes is 10.6 kb, which is shorter than in human (15 kb) but longer than in zebrafish (2.6 kb) and trout (2.4 kb) (Hansen and Kaattari, 1996; Willett et al., 1997). Of greater significance, long-range synteny is evident over the 17 genes flanking the Rag locus of coelacanth and tetrapods, compared with only two conserved flanking genes between coelacanth and bony fish (Brinkmann et al., 2004). These results suggest that the Rag genes have been highly conserved during sarcopterygian evolution in terms of both their gene organization and extended genomic milieu.
Figure 9. Analysis of coelacanth Rag genes. (A) Both Rag1 and Rag2 are located on genomic scaffold JH126568 (6,582,655 bp) at positions 121275–124451 and 135133–136701, respectively, and are oriented in a head-to-head manner. Each Rag gene is composed of a single exon. Only the first 150 kb of the large scaffold is illustrated. (B) Phylogenetic analysis of amino acid sequences of Rag1 (left) and Rag2 (right) by the Maximum Likelihood method. The trees are rooted with bull shark Rag1 and Rag2; the % bootstrap replicates are indicated on the tree. The topology of the tree is consistent with known phylogeny, though strong bootstrap support is lacking for some of the nodes. GenBank accession numbers for all sequences used (Rag1; Rag2): horse (NP_001243830; XP_001488023), rat (NP_445920; NP_001093998), sheep (XP_004016460; XP_004016461), lungfish (AAS75810; AAS75812), turtle (ACJ48241; AF369089), frog (ABS00344; AAI29720), zebrafish (NP_71464; NP_571460), fugu (AAD20561; AAD20562), trout (NP_001118209; AAB18138), carp (AAX16495; AAX16496), medaka (XP_004070148; XM_004069726), human (NP_00439; NP_000527), and cat (XP_003993217; XP_004001473).
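The scaffold positions given in the Figure 9 caption are consistent with the protein lengths and intergenic distance quoted in the text, as the short worked calculation below suggests. It assumes that each interval is inclusive and spans a single-exon open reading frame from the start codon through the stop codon; that assumption is ours and is not stated in the caption. The helper names are hypothetical.

```python
# Scaffold JH126568 positions from the Figure 9 caption (assumed inclusive).
RAG1 = (121275, 124451)
RAG2 = (135133, 136701)


def span_bp(coords):
    """Length in bp of an inclusive interval."""
    start, end = coords
    return end - start + 1


def residues(coords):
    """Amino acid count, assuming the interval is an ORF that includes the stop codon."""
    return span_bp(coords) // 3 - 1


# Number of bases lying strictly between the two genes.
intergenic = RAG2[0] - RAG1[1] - 1

print(span_bp(RAG1), residues(RAG1))  # 3177 1058
print(span_bp(RAG2), residues(RAG2))  # 1569 522
print(intergenic)                     # 10681, i.e. about 10.6 kb
```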
The lack of lymphopoietic tissues in the RNAseq analyses limits the capacity to identify actively transcribed Rag genes. We only could identify three short transcripts (contig3388, contig89407, and contig10412) aligning to different locations within the Rag1 gene from the L. menadoensis liver transcriptome (Pallavicini et al., 2013) (Supplementary Fig. S5). No Rag2 transcripts were identified.
Rag1 and Rag2 sequences are frequently used for phylogenetic analyses due to their ubiquity in all jawed vertebrate taxa and to evolutionary behavior that is not outwardly affected by differences in molecular evolutionary rates (Brinkmann et al., 2004; Cramer et al., 2011). The degree to which these sequences are conserved is shown in Supplementary Table S6, which lists percent similarities and percent identities between coelacanth Rag1 and Rag2 and those of various vertebrate taxa, and demonstrates that the Rag proteins are moderately, but not highly, conserved. Phylogenetic trees constructed from the same amino acid sequences were used to assess the interrelationships of Rag1 and Rag2 among vertebrates (Fig. 9B) and more-or-less corroborate established phylogenetic relationships, although not always with high bootstrap support.
Activation-Induced Cytidine Deaminase
Activation-induced cytidine deaminase (Aicda, AID) is currently thought to be the master regulator of secondary antibody diversification through the initiation of three separate Ig diversification processes: somatic hypermutation, gene conversion, and class switch recombination. Somatic hypermutation involves a programmed mutational process affecting the V regions of Ig genes, whereas gene conversion is involved in partially templated replacement of portions of V regions of genes (Longerich et al., 2008). Both processes are mediated by AID and diversify the antibody repertoire. In contrast, class switch recombination does not alter the specificity of the antibody but supplants the C region of the Ig heavy chain and, consequently, its effector function (Kataoka et al., 1980). Class switch recombination (CSR) appears at the time of the emergence of the amphibians and is conserved in all tetrapods. It is absent in teleost fish (Stavnezer and Amemiya, 2004), although, paradoxically, teleost AID protein has been shown to exhibit CSR catalytic activity in in vitro assays despite the fact that teleosts lack genomic loci amenable to CSR (Barreto et al., 2005; Wakae et al., 2006). The coelacanth AID gene is encompassed in two overlapping scaffolds: scaffold JH127875 (∼656 kb) and scaffold JH135912 (3,785 bp), as depicted in Supplementary Figure S6A. The predicted coding sequence is 582 bp (183 amino acids). The first few amino acids at the amino terminus are somewhat uncertain because of compromised sequence stretches at the 5′ end that confound annotation; however, the predicted protein contains the most important functional portions of the AID protein, including its catalytic domain and carboxy-terminal region, which are essential for the CSR activity (Ichikawa et al., 2006). Although the C-terminus clearly has the NLS motif, the coelacanth C-terminus is distinct from that of any other AID in having a 10–20 residue extension.
The alignment of the coelacanth AID to representative AID molecules from other species is given in Supplementary Figure S6B and shows high overall identity. The IgW loci of coelacanth do not possess cognate switch regions within and around their constant regions, thereby precluding classical CSR; however, from an evolutionary standpoint it will be of interest to assess the biochemical capabilities of coelacanth AID in a surrogate assay system.
Cluster of Differentiation (CD) Molecules
T-cells are classified into subsets based on their functionality and expression of distinct surface receptors. In mammals, the protein encoded by CD3-epsilon (CD3ϵ), together with CD3-gamma (CD3γ), CD3-delta (CD3δ) and CD3-zeta (CD3ζ) and the TCR-α/β and -γ/δ heterodimers, forms the TCR-CD3 complex. The CD3 components largely are responsible for coupling antigen ligation events with intracellular signaling, leading to the activation of the T-cell. CD3γ, CD3δ, and CD3ϵ chains, each of which contains a single extracellular Ig domain, are closely related. However, in chickens, amphibians and fish, the CD3γ and CD3δ subunits are replaced by a CD3γ/δ subunit (Bernot and Auffray, 1991; Dzialo and Cooper, 1997; Ropars et al., 2002; Araki et al., 2005; Park et al., 2005). It is inferred that separate mammalian CD3γ and CD3δ molecules were derived from a tandem gene duplication. In coelacanth, three CD3 chains, which are orthologous to CD3ϵ, CD3γ/δ, and CD3ζ, have been identified in both the genome assembly and transcriptome datasets. Scaffold JH126582 contains both CD3γ/δ and CD3ϵ genes at nucleotide positions 2908204–2913094 and 2923833–2931815, respectively. CD3ζ is located on scaffold JH128766 between 158334 and 168776. The complete cDNA sequences of CD3γ/δ (Contig 43331) and CD3ζ (Contig 96288) have been identified in the liver transcriptome dataset of L. menadoensis. A phylogenetic analysis of amino acid sequences of CD3ϵ, CD3γ, and CD3γ/δ has been performed (Fig. 10). All three chains of CD3 from coelacanth are distinct from the corresponding sequences from other fishes and, in terms of sequence homology, group together with the corresponding molecules found in avians and mammals.
Figure 10. Phylogenetic relationships of CD3. The phylogenetic tree was generated via Maximum Likelihood method. The percentage of trees in which the associated taxa clustered together is shown next to the branches. Initial trees for the heuristic search were obtained automatically by applying Neighbor-Joining and BioNJ algorithms to a matrix of pairwise distances estimated using a JTT model, and then selecting the topology with the superior log likelihood value. A discrete Gamma distribution was used to model evolutionary rate differences among sites. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. The analysis involved 30 predicted amino acid sequences. There were a total of 234 positions in the final dataset. GenBank accession numbers used in this analysis: carp.γ (DQ340867), fugu.γ (AB166800), seabass.γ (FN667954), medaka.γ (XM_004076021), flounder.γ (AB044573), human.γ (NP_000064), mouse.δ (NP_038515), frog.δ (XP_508789), chicken.δ (NP_990843), sterlet.ϵ (AJ242941), salmon.ϵ (GU180241), fugu.ϵ (AB166798), pig.ϵ (AY323829), mouse.ϵ (BC145926), dog.ϵ (M55410), sheep.ϵ (S53077), chicken.ϵ (EU779493), human.ϵ (BC049847), tilapia.ϵ (XP_003449345), fugu.ζ (XM_003966619), catfish.ζ (FJ809774), halibut.ζ (FJ769820), trout.ζ (NM_001165113), human.ζ (AAA60394), rat.ζ (NP_740770), turtle.ζ (ADP21384), pig.ζ (NP_000725).
CD4, a single-chain transmembrane glycoprotein, is expressed by helper T-cells and is a co-receptor with TCR in MHC II-mediated antigen recognition. CD4 has a fundamental role in thymocyte selection during development. In the context of antigen recognition by TCR, CD4 dimerizes and binds to the α2 and β2 domains of MHC class II molecules (Huang et al., 1997; Wu et al., 1997). CD4 is composed of four Ig domains, a transmembrane region and a cytoplasmic tail that contains the canonical CXC motif involved in the interaction of CD4 with p56LCK, which is required for signal 1 of T-cell activation. A CD4 ortholog was identified in the L. menadoensis transcriptome dataset (Supplementary Fig. S9). Key functional motifs that potentially could be involved in the regulation of CD4 transcription also were identified. A large scaffold (JH126582, 771 kb) containing CD4 was identified in the L. chalumnae genome (Fig. 11A). However, many of the exons could not be identified owing to a 27 kb assembly gap (position 304318–356023 bp). This scaffold contains exon 1 (5′ UTR), exon 2 (5′ UTR and leader peptide), exon 9 and exon 10 (3′ UTR), based on the CD4 molecules found in human (Ansari-Lari et al., 1996), chicken (Koskinen et al., 2002) and other fish species, including zebrafish (unpublished data). A more extensive search led to the identification of a 10.4 kb scaffold (JH134022), which contains the “missing” exons 3, 4, and 5. A phylogenetic analysis was carried out that included lungfish CD4 (Fig. 11B); however, the resolution of the coelacanth and lungfish branches was poor due to overly long branches.
Figure 11. Coelacanth CD4 annotation and phylogenetic analysis. (A) Two scaffolds, JH126582 and JH10451, contain 4 and 3 exons, respectively. Scaffold JH134022, which contains exons 3, 4, and 5, maps to the assembly gap (∼23.75 kb) of scaffold JH126582. (B) The unrooted phylogenetic tree was inferred by using the Maximum Likelihood method. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. The analysis involved 19 CD4 amino acid sequences, with a total of 535 positions in the final dataset. The resolution of the coelacanth and lungfish branches is poor due to the long branch lengths. GenBank accession numbers of CD4s: zebrafish (NP_001128568), fugu (NP_001072091), trout (NP_001118011), catfish (ABD93355), seabass (CAO98731), chicken (NP_989980), duck (AF378701), chimpanzee (NP_001009043), human (NP_000607), mouse (NP_038516), dog (NP_001003252), cow (NP_001096695), cat (NP_001009250), monkey (CAA51752), sheep (NP_001123374), whale (NP_001267583), and goat (ACG76115).
CD8 is a membrane-bound glycoprotein found on cytotoxic T-cells that consists of either CD8αα homodimers or CD8αβ heterodimers. Both chains are composed of a single Ig domain linked to the membrane by a segment of extended polypeptide chain. Both CD8α and CD8β have been identified in most jawed vertebrates (Nagarajan et al., 2004; Moore et al., 2005; Suetake et al., 2007), and both genes also have been identified in the coelacanth genome assembly within the same locus (JH128706) at a distance of ∼84 kb (Fig. 12). The CD8β gene consists of nine exons and spans ∼49 kb. For CD8α, four exons were predicted over a span of 25 kb that includes ∼16 kb of poorly assembled sequence. However, it was not possible to identify sequences corresponding to the transmembrane and cytoplasmic tail regions of CD8α because of a sequence gap downstream of exon 4. Both coelacanth CD8α and CD8β show strong similarities to corresponding molecules of other vertebrates. A partial CD8α sequence (contig40330) has been identified in the liver transcriptome, but no expressed CD8β was identified.
Figure 12. Annotation of the scaffold containing CD8 genes. Genes for both CD8α and CD8β are located on JH128706, with an intergenic distance of ∼84 kb. Four exons of CD8α span ∼25 kb, including a stretch of ∼16 kb of poorly assembled sequence. The transmembrane and cytoplasmic tail could not be definitively identified owing to a sequence gap just downstream of exon 4. The CD8β gene consists of nine exons that span ∼49 kb.
Use of text searches on the Ensembl annotated assembly uncovered numerous other CD molecules (Supplementary Table S8). Manual BLAST searches on these molecules were used to validate that these were orthologous to their mammalian counterparts. The T-cell-specific surface glycoprotein, CD28, is located on JH127402 (430316–43931). CD9, which associates with CD3, CD4, CD5, CD29, and CD44, also was found in coelacanth. A gene encoding CD40 (a costimulatory molecule involved in antigen presentation and class switching) was not identified. However, CD40L, also known as CD154 and a key member of the TNF superfamily expressed on activated T-cells, is found on scaffold JH126623 (1333194–1340474). CD40L is comprised of 3 exons and the translated amino acid residues show significant identity (∼40%) with CD40L of turtle (data not shown). CD45, known as leukocyte common antigen, was found on two scaffolds, JH127371 and JH126742. Key mammalian CD molecules, such as the stem cell markers CD34, CD31, and CD117, have not been identified in the coelacanth genome; however, without better search tools and a more complete genome, it is, as yet, difficult to state definitively that they (including CD40) are truly absent.
Cytokines are small signaling molecules secreted by specific cells of the immune system that mediate signals between cells. Cytokines are produced by many cell types as needed in the course of an immune response. They are critical to the development and functioning of both the innate and adaptive immune responses, and modulate the responses in an autocrine or paracrine manner upon binding to their corresponding receptors (Zhu et al., 2013). Cytokines can be divided into interferons (IFNs), interleukins (ILs), tumor necrosis factors (TNFs), colony stimulating factors (CSFs), and chemokines (Savan and Sakai, 2006).
A large number of ILs orthologous to those in mammals have been identified in the coelacanth genome (Fig. 13); however, others such as IL-2, IL-4, IL-5, IL-6, IL-7, IL-9, IL-15, and IL-21, which play crucial roles in the adaptive immunity of mammals, were not detected in this study. As is the case with many other genes, it is unclear whether they are absent from the genome or are not being detected because of substantial sequence divergence and/or sequencing-assembly issues. However, the cognate receptors for some of these (e.g., IL-2, IL-6, IL-7, and IL-21) definitively have been identified in the genome (Supplementary Table S9), making the latter explanation more plausible. STAT-6, a member of the STAT family of transcription factors that plays a central role in exerting IL-4-mediated biological responses, has been identified (JH126563: 4604284–4647669 bp). The genes encoding IL-1β and IL-18 have been discussed in detail in a companion paper (Boudinot et al., 2014).
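The reasoning in the preceding paragraph amounts to a simple set comparison, sketched below. The two sets contain only the interleukins named in the text; the receptor list there is introduced with "e.g.", so Supplementary Table S9 may contain further receptors and the "unresolved" group shown here should not be read as a finding. The variable names are hypothetical.

```python
# Ligands reported as not detected in this study (from the text).
undetected_ligands = {"IL-2", "IL-4", "IL-5", "IL-6", "IL-7", "IL-9", "IL-15", "IL-21"}

# Ligands whose cognate receptors are named in the text as identified.
# The source list is given as examples, so it may be incomplete.
receptor_identified = {"IL-2", "IL-6", "IL-7", "IL-21"}


def by_number(name):
    """Sort key: the numeric part of an interleukin name such as 'IL-21'."""
    return int(name.split("-")[1])


# Receptor present but ligand not found: divergence or assembly gaps are plausible.
likely_missed = sorted(undetected_ligands & receptor_identified, key=by_number)

# No receptor named in the text: absence cannot be distinguished from non-detection.
unresolved = sorted(undetected_ligands - receptor_identified, key=by_number)

print(likely_missed)  # ['IL-2', 'IL-6', 'IL-7', 'IL-21']
print(unresolved)     # ['IL-4', 'IL-5', 'IL-9', 'IL-15']
```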
Figure 13. Annotation of the scaffolds encoding coelacanth interleukin genes. Genes encoding interleukins were identified from the coelacanth annotated assembly at Ensembl. Coding sequences were retrieved from their corresponding scaffolds and imported into Vector NTI sequence analysis software (Invitrogen). GENSCAN (Burge and Karlin, 1997) and BLASTX programs were used, respectively, to predict the exon–intron boundaries and determine amino acid alignments to other vertebrates.
Interleukin-10 is an anti-inflammatory cytokine capable of inhibiting synthesis of pro-inflammatory cytokines such as IFN-γ, IL-2, IL-3, TNFα, and GM-CSF, which are made by cells such as macrophages and regulatory T-cells. In mammals, IL-10 regulates growth and/or differentiation of B-cells, NK cells, cytotoxic and helper T-cells, mast cells, granulocytes, dendritic cells, keratinocytes, and endothelial cells (Moore et al., 2001), and also stimulates certain Th2 cells, mast cells and B-cells. The coelacanth IL-10 gene is comprised of five exons (like that of human) and codes for a protein of 184 amino acid residues (Fig. 13). Coelacanth IL-10 has highest identity with IL-10 of green anole (52%), western clawed frog (50%), chicken (46%) and bottlenose dolphin (46%). The IL-10 family of interleukins also contains IL-20, which is present on the same scaffold (JH127167) as the gene for IL-10, localized ∼45 kb upstream and in opposite transcriptional orientation. Although human IL-20 is composed of five exons, only two exons were identified in coelacanth. This partial amino acid sequence has the highest overall identity (63%) to that of the gray short-tailed opossum.
IL-11, a stromal cell-derived member of the IL-6-type cytokine family, partially shares its receptor and signal transduction with IL-6. IL-11 functions in a wide range of hematopoietic and non-hematopoietic systems and supports the growth of plasmacytoma and hybridoma cells. The coelacanth IL-11 gene consists of six exons, as opposed to the human ortholog, which consists of five exons (Fig. 13). Although the homologous regions of exons 1 and 6 have not been established in any other animals, exons 2–5 are highly conserved with IL-11 genes of other animals, for example, zebra finch (66%), chicken (62%), western painted turtle (60%), and western clawed frog (57%). In addition, two IL-11 receptors (IL-11Rα and IL-11Rβ) also have been identified in coelacanth (Supplementary Table S9).
IL-12 is produced by activated macrophages and dendritic cells, stimulates the production of IFN-γ, induces the differentiation of Th cells to become Th1 cells (Heufler et al., 1996) and enhances the cytolytic functions of cytotoxic T-cells and NK cells. An IL-12 ortholog, which is comprised of exons, has been identified in coelacanth (Fig. 13) and is shown to exhibit moderate identity to rock pigeon (40%), chicken (40%), western painted turtle (40%), and peregrine falcon (39%). IL-10 production by Th1 cells requires an IL-12-induced STAT4 transcription factor (Saraiva et al., 2009), which also has been identified in coelacanth (JH128710 at the position of 208,614–275,761 bp).
IL-16 is involved with adaptive immunity. It is highly sensitive to the mitogens phytohemagglutinin and ConA and stimulates T-lymphocyte proliferation and activation in pufferfish (Wen et al., 2006). An IL-16 homolog, which consists of 26 exons (Fig. 13), has been detected in coelacanth. The predicted IL-16 encodes 1,592 amino acids and exhibits significant homology to that of the green sea turtle (45%), chicken (43%), and mallard (43%).
In human, the interleukin 17 family includes six members, IL-17A, IL-17B, IL-17C, IL-17D, IL-17E/IL-25, and IL-17F, which are produced by multiple cell types. At least four IL-17 genes (IL-17A, IL-17B, IL-17C, and IL-17D) (Fig. 13), as well as genes for three of the corresponding receptors, have been identified in coelacanth (Supplementary Table S9). Both the IL-17 genes and their receptors show closer phylogenetic relationships to orthologous forms in tetrapods than to the teleost orthologs (tree not shown).
In addition to interleukins, the coelacanth genome also contains many other orthologs of mammalian cytokine genes and their receptors. For example, the gene for transforming growth factor beta (TGF-β) has been identified in one scaffold (JH126565:1664929–1738138), whereas those for TGF-β receptors are present in two other scaffolds (JH126740:103899–129604 and JH128485:346074–379745). Macrophage migration inhibitory factor (MIF), which is one of the important regulators of innate immunity, is located in scaffold JH126570. A tumor necrosis factor receptor superfamily member 1B (TNF-1β) is found on scaffold JH126880 and discussed at length in the coelacanth innate immune paper (this issue). Lastly, we have identified genes for a large number of putative cytokines via our data mining efforts (Supplementary Table S9); these will be described and characterized in a separate report.