Role of recombination activating genes in the generation of antigen receptor diversity and beyond

Authors


Correspondence: Sathees C. Raghavan, Department of Biochemistry, Indian Institute of Science, Bangalore-560 012, India. Email: sathees@biochem.iisc.ernet.in

Senior author: Sathees C. Raghavan

Summary

V(D)J recombination is the process by which antibody and T-cell receptor diversity is attained. During this process, antigen receptor gene segments are cleaved and rejoined by non-homologous DNA end joining for the generation of combinatorial diversity. The major players in the initial cleavage step are the proteins known as RAG1 (recombination activating gene 1) and RAG2. In this review, we discuss the physiological function of RAGs as sequence-specific nucleases and their pathological role as structure-specific nucleases. The first part of the review discusses the basic mechanism of V(D)J recombination, and the last part focuses on how the RAG complex functions as a sequence-specific and structure-specific nuclease. It also deals with off-target cleavage by RAGs and its implications for genomic instability.

Introduction

Lymphocytes detect the antigens in the environment by means of antibodies on the surface of B cells and T-cell receptors (TCR) on the T cells. With the diverse and expanding array of antigens, the generation of antibody/TCR diversity using limited genetic resources remained a question that baffled scientists for decades. An almost limitless number of antigens exist in the environment and recent research suggests that among the millions of lymphocytes, each one expresses a structurally different antigen receptor to combat this plethora of antigens. How is the genetic information for all of these antigen receptors encoded in the DNA? Do cells carry enough DNA to encode all the antibody specificities? Or is it that random mutations generate this enormously diverse repertoire of antibodies? Two theories arose initially to answer these questions. Somatic mutation/variation theory suggested that a few inherited genes, with time, encountered mutations or recombinations to encode each antibody.[1] In contrast, germline theory proposed that the genome contains a large repertoire of antigen receptor genes and each of them encodes for separate, specific antibody.[2] Arguments supporting and opposing these theories were put forward and remained unresolved for several years. In this review, we summarize the basic principles that presently govern the generation of diversity of antibody and TCR with special emphasis on V(D)J recombination. We also discuss the role of recombination activating genes (RAGs) in the generation of antibody diversity and chromosomal translocations.

The discovery of RAG genes

In the early 1990s, it was shown that two tightly linked genes, RAG1 and RAG2, which were unique to vertebrates, were responsible for the generation of antigen receptor diversity.[3, 4] An elegant series of experiments involving genomic DNA transfections into mouse 3T3 fibroblasts lacking V(D)J recombination activity, showed that the transfer of a single genomic locus could make these cells proficient for V(D)J recombination.[5] Following this, using the technique of ‘genome walking’, the RAG1 gene was discovered. Comparative sequence analysis of RAG1 genes from various species indicated that they were evolutionarily conserved.[3] Further studies demonstrated that the locus contained two closely linked genes, RAG1 and RAG2 on chromosome 11p in humans and chromosome 2p in mice.[4, 6] The coding and 3′ untranslated sequences of RAG1 and RAG2 were contained in a single exon.[6] The proteins encoded by the RAG genes play a crucial role in the generation of antigen receptor diversity as discussed below.

Antigen receptors of lymphoid system

There are two major antigen receptors for the lymphoid system, antibodies and TCR in B and T cells, respectively. Antibodies or immunoglobulins are glycoproteins that are either secreted from B cells or remain bound to their membrane. Each antibody is composed of four polypeptides – two identical heavy (H) chains and two identical light (L) chains held together by disulphide bonds.[7, 8] The amino acid sequence at the N-terminus of both chains varies greatly among different antibodies, whereas the C-terminal sequence remains strikingly similar.[9] These two regions are referred to as the variable (V) and constant (C) regions, respectively. The V region, composed of 110–130 amino acids, gives the antibody its specificity for binding to antigen. The exon encoding the variable region is assembled from two (or three) individual gene segments,[2, 10] which are classified into variable (V),[11] diversity (D) (present only in immunoglobulin heavy chains, not in the light chains)[12-14] and joining (J)[15, 16] regions (Fig. 1). To obtain a functional variable region, recombination between D and J occurs to give a DJ segment, followed by another recombinational event involving V to yield the final V(D)J fragment. The germline contains multiple V, D and J gene segments, and random recombination among these results in the generation of approximately 10^6 different combinations, accounting for the dramatic expansion in the variability of the sequence (Fig. 1).

Figure 1.

Genomic organization of the antigen receptor loci in human beings. The maroon, green, purple and blue rectangles represent V, D, J and C segments, respectively and the blue ovals indicate enhancers. (a) The human IgH locus, located on chromosome 14 at band 14q32.33. It has 123 to 129 IgHV genes depending on haplotypes, 27 IgHD segments, and nine IgHJ segments and in the most frequent haplotype, 11 IgHC genes. (b) The human IgK locus located on the chromosome 2, at band 2p11.2. It consists of 76 IgKV genes, five IgKJ segments, and a unique IgKC gene. (c) The human IgL locus located on chromosome 22 at band 22q11.2. It consists of 73 to 74 IgLV genes, 7 to 11 IgLJ and 7 to 11 IgLC genes depending on the haplotypes. (d) The human TCRA locus located on chromosome 14 at band 14q11.2. It consists of 116 genes: 54 TCRAV, 61 TCRAJ and a unique TCRAC gene. TCRD locus is embedded in the TCRA locus, between the TCRAV and TCRAJ genes. Though represented in the figure as ‘TCRAV 54’, among the 54 V genes, five can be rearranged to either TCRAJ or TCRDD genes and therefore can be used in the synthesis of α or δ chains. (e) The human TCRB locus on chromosome 7 at 7q34, consisting of 64–67 TCRBV, 2 TCRBD, 14 TCRBJ and two TCRBC genes. (f) The human TCRG locus is on chromosome 7 at 7p14 and consists of 12–15 TCRGV, five TCRGJ and two TCRGC1 genes. The data presented are in accordance with IMGT (http://www.imgt.org).
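As a rough illustration of the combinatorial arithmetic, the segment counts quoted in the Figure 1 legend for the IgH, IgK and IgL loci can be multiplied out. This is only a sketch under stated assumptions: it takes the lower bound of each quoted range, treats every listed segment as usable and every heavy/light pairing as permitted. The locus counts include pseudogenes and non-functional segments, so the result overestimates the functional repertoire. The dictionary and function names are illustrative.

```python
# Segment counts taken from the Figure 1 legend (lower bound of each range).
# Assumption: every listed segment is usable; in reality many are pseudogenes.
SEGMENTS = {
    "IgH": {"V": 123, "D": 27, "J": 9},
    "IgK": {"V": 76, "J": 5},
    "IgL": {"V": 73, "J": 7},
}


def combinations(locus):
    """Number of V(D)J combinations for one locus: product of its segment counts."""
    total = 1
    for count in SEGMENTS[locus].values():
        total *= count
    return total


heavy = combinations("IgH")                        # V x D x J
light = combinations("IgK") + combinations("IgL")  # a cell uses kappa or lambda
paired = heavy * light                             # any heavy with any light

print(f"heavy: {heavy}, light: {light}, paired upper bound: {paired:.2e}")
```

With these inclusive counts the upper bound is of the order of 10^7. Restricting the calculation to functional segments would give a smaller figure, closer to the approximate number of combinations quoted above.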

The TCR is structurally similar to the antigen-binding fragment [F(ab)] of the antibody. Similar to the antibodies, it has two glycoprotein subunits and each is encoded by a somatically rearranged gene. The TCRs are composed of either an αβ or a γδ pair of subunits. The structure of TCR is further stabilized by interchain disulphide bonds. At the 5′ end of each of the TCR loci there is a cluster of V segments followed by J segments (Fig. 1). In the TCR-β and TCR-δ chain loci, these segments are interrupted by a series of D segments similar to that of the immunoglobulin heavy chain (Fig. 1).

Somatic recombination occurs in a strict regimen, with D to J recombination preceding V to DJ on the heavy chain and the heavy chain recombination in turn occurring before that of the light chains.[17] Similarly, the TCR-β rearrangement always precedes that of TCR-α. Besides, the TCR rearrangement is restricted to early stages of the T-cell development and immunoglobulin rearrangement to early B cells. Adherence to this chronological order relies on the cell lineage and cell cycle restricted expression of participating enzymes as well as on chromosomal accessibility of the recombining loci.[18]

A mature B lymphocyte expresses a single species of antibody possessing a unique specificity in spite of having multiple allelic loci for different antibody chains. This specificity is acquired by a process termed allelic exclusion.[19] Initially, two models were put forward to explain this process. In the case of the ‘regulated model’, gene assembly proceeds on one chromosome at a time and the protein products suppress further rearrangements by feedback inhibition.[20] The ‘stochastic model’ suggests that inefficient V(D)J rearrangement results in allelic exclusion.[21] Later studies demonstrated that the process was regulated by several factors including nuclear localization, non-coding RNA transcription, alteration in chromatin structure by histone modifications, epigenetic alterations at the DNA level, and feedback signalling from expressed alleles.[19]

Antigen receptor loci

By 1998, immunoglobulin and TCR genes were fully identified and sequenced. There are seven major loci, which undergo somatic recombination in developing B and T cells during the formation of antigen receptors. These are immunoglobulin heavy chain (IgH), light chain κ (IgK) and light chain λ (IgL) in B cells and TCR-α (TCRA), TCR-β (TCRB), TCR-γ (TCRG) and TCR-δ (TCRD) in T cells. Each of these is further divided into subexons, which undergo the recombination (Fig. 1).

Mechanism of V(D)J recombination

Recombination signal sequence and 12/23 rule

A fairly conserved DNA sequence known as recombination signal sequence (RSS) resides adjacent to each subexon and consists of a palindromic heptamer (CACAGTG) and an A/T-rich nonamer (ACAAAAACC)[14, 22-24] (Fig. 2a,b). The first three nucleotides of the heptamer are crucial for the recombination activity.[25, 26] Though the nonamer binding domain of RAG1 is well characterized, the region of the RAG complex that recognizes the heptamer is yet to be deciphered.[27, 28] The heptamer and nonamer are separated by a spacer DNA sequence of either 12 bp (12RSS) or 23 bp (23RSS) (Fig. 2a). Although the length of the spacer is conserved, its sequence is not of much importance.[12, 24] Generally, a 12RSS recombines only with a 23RSS and vice versa, a restriction termed as the ‘12/23 rule’ (Fig. 2b), which prevents non-productive rearrangements. The coupled cleavage of a 12RSS and 23RSS requires Mg2+, whereas Mn2+ supports RAG-mediated nicking of a single RSS.[29]
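The RSS layout and the 12/23 rule described above can be expressed as a small check. The sketch below assumes an exact match to the consensus heptamer and nonamer given in the text, whereas natural RSSs deviate from the consensus. The function names and the poly-T spacer are hypothetical and chosen only for illustration.

```python
HEPTAMER = "CACAGTG"    # consensus heptamer from the text
NONAMER = "ACAAAAACC"   # consensus nonamer from the text


def spacer_length(rss):
    """Return 12 or 23 if `rss` is heptamer + spacer + nonamer, else None.

    Assumption: exact consensus match; real RSSs tolerate deviations.
    """
    rss = rss.upper()
    if not (rss.startswith(HEPTAMER) and rss.endswith(NONAMER)):
        return None
    spacer = len(rss) - len(HEPTAMER) - len(NONAMER)
    return spacer if spacer in (12, 23) else None


def obeys_12_23_rule(rss_a, rss_b):
    """True only when one signal is a 12RSS and the other is a 23RSS."""
    return {spacer_length(rss_a), spacer_length(rss_b)} == {12, 23}


def make_rss(spacer_bp):
    """Build a toy RSS with an arbitrary poly-T spacer of the given length."""
    return HEPTAMER + "T" * spacer_bp + NONAMER


rss12, rss23 = make_rss(12), make_rss(23)
print(obeys_12_23_rule(rss12, rss23))  # True: 12RSS paired with 23RSS
print(obeys_12_23_rule(rss12, rss12))  # False: two 12RSSs
```

Only the spacer length is tested, not its sequence, which mirrors the observation that the length of the spacer is conserved while its sequence matters little.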

Figure 2.

Mechanism of the V(D)J recombination. (a). Recombination Signal Sequence (RSS). In 12RSS (represented by open triangle), the heptamer is separated from nonamer by 12 bp and in 23RSS (represented by closed triangle) by 23 bp. (b). Distribution of RSS at the seven antigen receptor loci in human beings and the 12/23 rule. The V, D and J regions of the coding segments are depicted as rectangles; the flanking 12RSS is represented by an open triangle and 23RSS by a closed triangle. As per the ‘12/23 rule’, V(D)J recombination is restricted to gene segments that are flanked by signals of different spacer lengths. (c). Mechanism of V(D)J recombination. Schematic model of the protein–DNA complexes in V(D)J recombination. The 12RSS and 23RSS are represented as white and blue triangles, respectively. Coding segments are shown as rectangles and proteins as shaded ovals. ‘V’ stands for variable and ‘J’ for joining gene segments. Recombination activating genes (RAGs; shaded ovals) bind the RSS and introduce a single-strand nick in the DNA precisely at the border between the heptamer of RSS and the coding segment. This nick generates a free 3′-OH group on the coding end which is then covalently linked to the opposing phosphodiester bond, leaving a covalently closed hairpin on the coding end and a blunt 5′ phosphorylated signal end. The ends remain associated with RAGs; constituting a transitory structure referred to as ‘post-cleavage complex’. Following this, the coding ends are processed and joined by NHEJ to create the exon, which forms the antigen-binding region of antigen receptors. Signal end remains bound to RAGs constituting a ‘signal end complex’ that forms the signal joint.

Recently, the ‘beyond 12/23’ rule has been proposed to explain the exclusion of direct TCRBV to TCRBJ joining in the TCR-β region, in spite of the incidence of appropriately oriented pairs of 12RSS and 23RSS.[30] The exclusion was enforced during the DNA cleavage step of the V(D)J recombination and was attributed to several factors, like relatively slow nicking of the TCRB substrates and poor synapsis of the TCRBV and the TCRBJ.[31] Extrachromosomal V(D)J recombination assays could recapitulate the ‘beyond 12/23 rule’ in the TCRBV, implying that it is solely the RAG proteins and RSSs, which play a role in establishing this restriction.[32] In contrast, with respect to TCRDV locus, the involvement of other factors was also suggested.[33]

The perfect cleavage

RAG1 and RAG2 initiate recombination by introducing a single-strand nick in DNA precisely at the border between the heptamer of RSS and the coding segment.[34] The 3′-OH group of the nick at the coding end then becomes covalently linked to the opposing phosphodiester bond of antiparallel strand by a transesterification reaction resulting in hairpin structure at the coding end and blunt signal end.[35] The signal ends remain associated with RAG proteins resulting in a transitory structure referred to as a ‘post-cleavage complex’.[36, 37] In the presence of Mg2+, nicking can take place, and with Mn2+, both nicking and hairpin formation are observed.[37] However, this was demonstrated only in vitro in a non-physiological concentration of MnCl2 using isolated RSS substrates and not in the physiological 12RSS and 23RSS pair.[37] The in vivo scenario is still unclear though it is commonly thought that Mg2+ is the physiological divalent metal ion involved in RAG-mediated cleavage. RAG1 and RAG2 are assisted by high mobility group proteins of the HMG-box family (HMGB1 and HMGB2) for bringing two signal ends together. The HMG proteins interact with the nonamer binding domain of RAG1 in the absence of DNA and enhance its intrinsic DNA bending activity.[38] Following resolution of the hairpin structure, the coding ends are joined to create the exon, which forms the antigen-binding region of the antigen receptors (Fig. 2c). The signal ends remain bound to RAGs, which in turn protect the ends from further nuclease digestion [36, 39] (Fig. 2c).

The imperfect joining

The blunt-ended signal ends can be directly ligated without any modification, while the coding ends undergo further processing (Fig. 2c).[34, 35] The hairpin at the coding end is opened and the ends are joined by non-homologous end joining (NHEJ), the DNA double-strand break repair pathway.[40, 41] Artemis, in conjunction with DNA-PKcs, acts as an endonuclease and resolves the hairpins formed during V(D)J recombination.[42] Ku heterodimer, consisting of Ku70 and Ku80, binds to the broken DNA ends and forms a complex with DNA-PKcs.[43] Artemis has an inherent 5′-3′ exonuclease activity, whereas in association with DNA-PKcs it acts as a 5′-3′ endonuclease.[42] The ends are filled in by the Pol X family of polymerases, namely Pol μ and Pol λ. Mice deficient in Pol μ have been shown to have shorter immunoglobulin light chain V to J junctions,[44] while those lacking Pol λ have shorter immunoglobulin heavy-chain D to J and V to DJ junctions.[45] In addition, Pol μ plays a role in the processing of 3′ ends while Pol λ processes the 5′ ends.[45] This would suggest that Pol μ is involved in V(D)J recombination, but not Pol λ.[46] The ligase IV/XRCC4 complex ligates the processed ends[47, 48] with the help of XLF.[49, 50] Ku70, Ku80, XRCC4 and Ligase IV are considered to be the ‘core’ NHEJ factors as these proteins were conserved during evolution and are required for all known NHEJ reactions.[51, 52] These are also indispensable for the joining of both coding and signal ends. On the other hand, DNA-PKcs and Artemis are believed to have evolved more recently and are needed only for the joining of coding ends.[52]

Diversity at the junction

At the time of joining of V, D and J subexons, several modifications such as insertions and deletions can occur at the junctions, resulting in a further increase in antigen receptor diversity. Asymmetric hairpin opening at the coding ends, caused by nicking a few bases away from the terminus, leaves one DNA strand longer than the other. The shorter strand is extended by addition of nucleotides complementary to the longer strand, resulting in the insertion of palindromic (P) nucleotides at the coding joints.[53] Terminal deoxynucleotidyl transferase (TdT) and DNA Pol μ further diversify these junctional sequences by catalysing the addition of non-templated nucleotides (N-nucleotides) to the coding ends.[54] Junctional diversification can expand the diversity up to 10^11 from the 10^6 achieved through combinatorial diversification.
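The origin of P and N nucleotides can be illustrated with a toy model of a single coding end. This is a simplified sketch, not a simulation of the enzymology: the coding end is a 5′ to 3′ string, off-centre hairpin opening is modelled by appending the reverse complement of the terminal bases, and TdT activity is replaced by random base addition. The function names, the example sequence and the fixed random seed are assumptions made for illustration.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def add_p_nucleotides(coding_end, offset):
    """Model hairpin opening `offset` bases away from the hairpin tip.

    The top strand (5'->3') gains `offset` bases that belonged to the
    bottom strand, so its new terminus is an inverted repeat.
    """
    if offset == 0:
        return coding_end  # opening exactly at the tip adds nothing
    return coding_end + reverse_complement(coding_end[-offset:])


def add_n_nucleotides(coding_end, count, rng):
    """Append `count` non-templated bases (a stand-in for TdT activity)."""
    return coding_end + "".join(rng.choice("ACGT") for _ in range(count))


end = add_p_nucleotides("GGTCGA", offset=2)
print(end)  # GGTCGATC: the added 'TC' is templated by the terminal 'GA'

rng = random.Random(0)  # fixed seed only to make the sketch reproducible
print(add_n_nucleotides(end, 3, rng))
```

The P addition is fully determined by the coding end and the nick position, whereas the N addition is not templated, which is why the latter contributes most of the junctional variability in this model.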

Non-standard recombination products

Alternative outcomes of V(D)J recombination reported were ‘hybrid joint’ and ‘open-shut joint’. During the formation of ‘hybrid joint’, the coding end of one subexon is joined to the signal end of another following the initial cleavage step of V(D)J recombination. In certain cases, the original pair of coding and signal ends, which was separated during the RAG cleavage phase, rejoins leading to formation of an ‘open-shut joint’.[55] When there are no modifications at the joints, ‘open-shut joints’ are hard to detect. The released signal ends may non-specifically attack double-stranded DNA leading to transposition.[55]

Other modes of generation of the diversity

The antigen receptors are further modified by two processes, namely class switch recombination (CSR) and somatic hypermutation (SHM). The CSR refers to the rearrangements of the constant regions of antigen receptors upon encountering an antigen. This further expands the variability in the constant region following rearrangement at the variable region. The CSR replaces the expression from Cμ to Cγ, Cε or Cα, resulting in the switching of immunoglobulin isotype from IgM to IgG, IgE, or IgA without changing the antigen specificity (Fig. 3).[56] The immunoglobulin CH locus comprises an array of CH genes, flanked by a switch (S) region at its 5′ region. The CSR takes place between two S regions, resulting in the loop-out deletion of the intervening DNA segments as circular DNA [57] (Fig. 3). The SHM refers to the random genetic mutations that occur in the B cells (and not T cells) at certain hotspots in the antigen-binding regions following an antigen encounter and results in the increased affinity of the receptor to the antigen.[58] As a result of this, a fraction of the antibodies possessing low-affinity receptors to the defined antigen, further increase their affinity and undergo expansion. The SHM takes place in the V region of both H and L chain genes (VL/H), introducing a million times more point mutations than the genome-wide background leading to the generation of high-affinity antibodies. Hence, CSR and SHM act on entirely different targets, i.e. CH and VL/H, respectively. Therefore, it was believed that these two processes were regulated differently. However, recently, it has been shown that the same enzyme, the activation-induced cytidine deaminase initiates both CSR and SHM in mice and humans.[57, 59, 60]

Figure 3.

Mechanism of generation of the antigen receptor diversity by class switch recombination (CSR). CSR determines the type of constant region of antigen receptor by exchanging the gene encoding the immunoglobulin heavy chain constant region (CH) with one of the downstream CH genes. The figure depicts CSR between Sμ and Sα in the human immunoglobulin heavy chain locus. V, D and J segments of antigen receptor loci are represented as maroon, purple and green rectangles. The ovals represent the switch region and enhancers are marked by ‘E’. During the process, DNA double-strand breaks are introduced at the switch (S) regions followed by their rejoining. This leads to the deletion of the intervening sequence between S regions and juxtaposition of rearranged VDJ segment with the downstream CH fragment.

Mechanism of action of RAG proteins

RAG1

Murine RAG1 comprises 1040 amino acids. The ‘core region’ of RAG1 (cRAG1), consisting of amino acids 384–1008, is essential for all activities in vivo and in vitro.[61, 62] RAG1 exists as a homodimer in solution.[63] It harbours all the elements that are indispensable for activity, including the active site, DNA-binding domain and RAG2-interacting domain. The core regions became the focus of subsequent research, mainly because they were more soluble than their full-length counterparts. Using surface plasmon resonance and in vivo one-hybrid experiments, it was shown that the N-terminus of cRAG1 (amino acids 384–460) harbours the nonamer binding region.[28] The heptamer recognition region of RAGs still remains obscure. The DDE motif (a triad of acidic amino acids: D600, D708 and E962) of RAG1 forms the catalytic centre of the RAG1/RAG2 complex,[64-66] which plays a role in chelating the two divalent metal ions essential for catalysis.[67] The N-terminal non-core region (amino acids 1–383) contains a RING domain fold, which exhibits ubiquitin ligase activity.[68]

Studies by Rodgers's group[63] using limited proteolysis showed that murine cRAG1 is composed of topologically independent domains that can function individually. These include the N-terminal, the central and the C-terminal domains. The central domain has the heptamer binding site, RAG2 binding site and zinc finger motif. The C-terminal domain has the dimerization region and binds DNA co-operatively. Murine cRAG1 was successfully expressed in Escherichia coli as a fusion protein with Maltose binding protein (MBP) tag with high yield and solubility and was active when combined with cRAG2 expressed in human embryonic kidney cell line.[69] However, there is no report of successful bacterial expression of RAG2.

RAG2

Murine ‘core RAG2’ consists of amino acids 1–383 out of the total 527. The molecular function of core RAG2 remains elusive. RAG2 consists of an N-terminal six-bladed beta-propeller domain and a C-terminal plant homeodomain (PHD).[70, 71] The PHD is a motif characteristic of chromatin remodelling proteins.[72] It has been predicted to facilitate the ordered rearrangement of IgH chains and the binding of core histone proteins.[72-74] The C-terminus of RAG2 contains a threonine residue (T490) that acts as a target of Cdk2 kinase.[75] Phosphorylation of this amino acid regulates the proteasomal degradation of RAG2 at the G1/S transition of the cell cycle.[76] This regulatory mechanism ensures that RAG2 is degraded in a cell-cycle-dependent manner, preventing RAG-induced DNA breaks during replication. Biochemical analysis of recombinant RAG2 has identified several basic residue mutants defective in catalysis. Accordingly, Schatz's group[77] has proposed a model for the interaction of RAG2 with DNA in which the amino acids K119 and K283 directly contact DNA. It was shown that the PHD finger specifically recognizes histone 3 trimethylated at lysine 4 (H3K4me3).[78] H3K4me3 increases the catalytic turnover number (kcat) of RAGs and also tethers the complex to DNA.[79] Recently, it was reported that mice harbouring RAG2 with a deleted C-terminal end rapidly developed thymic lymphomas with chromosomal translocations, amplifications and deletions involving TCRA/D and IgH loci, implying a role for the RAG2 C-terminus in genomic stability.[80]

Interaction between RAG1 and RAG2

Protein–DNA cross-linking studies have suggested that cRAG1/cRAG2 may form specific contacts with the heptamer of RSS.[81, 82] In addition, binding assays have shown that cRAG1, even in the absence of RAG2, could bind to RSS in a manner that was specific to both heptamer and nonamer.[38, 82] RAG1 and RAG2 can interact and exist as a complex.[83] Modification interference and direct footprinting studies indicated that RAG1 alone could make extensive contacts with the nonamer, whereas RAG2 is needed for the heptamer occupancy.[81] Direct interaction between RAG2 and DNA in the RAG1/RAG2/RSS complex has also been reported.[77] Fluorescence anisotropy and gel mobility shift assays indicate that RAG1 exhibits sequence specific binding character. It has been suggested that this property is masked by its non-specific DNA interactions and addition of RAG2 effectively rescues RAG1 from this non-specific binding.[84]

Using chromatin immunoprecipitation, it was shown that in vivo RAG binding was focused at the ‘recombination centres’ involving a small region encompassing J (and where present, J-proximal D) subexons in the IgH, IgK, TCRB and TCRA loci.[85, 86] The study showed that RAG2 binds throughout the genome at sites with substantial levels of H3K4me3. RAG1 binding was shown to be restricted to highly active chromatin in the presence of arrays of RSSs.[85, 86]

Role of RAGs in receptor editing

Expression of RAG in early B cells occurs in two waves: the first is responsible for V(D)J rearrangement of IgH genes in pro-B and pre-B-I cells and the second for VJ recombination of the IgL genes in small pre-B-II cells.[87] There are also reports of a third wave of RAG expression, induced in activated mature B cells in vitro and in vivo.[88] Spleen B cells from mice activated with lipopolysaccharide or interleukin-4 showed expression of RAGs.[88] Mice immunized with the antigen trinitrophenyl-keyhole limpet haemocyanin also showed expression of RAG1 and RAG2 in the inguinal and popliteal cells of draining lymph nodes.[88] In addition, immature B cells carrying self-reactive receptors exhibited up-regulated RAG expression. Following B-cell antigen receptor cross-linking, an increased level of RAG mRNA was reported in cultures of developing B cells.[89, 90] The sustained RAG expression allowed secondary V(D)J recombination that involves the replacement of self-reactive antibody V region genes by the V(D)J recombination. During this process, B cells with a non-self-reactive receptor are allowed to exit from prolonged V(D)J recombination, whereas the self-reactive cells are retained for further editing, until they produce a non-self-reactive receptor.[91] ‘Receptor editing’ refers to the secondary rearrangement that occurs in immature B and T cells, which plays a role in mediating tolerance.[92-95] ‘Receptor revision’ refers to the secondary rearrangement in mature B and T cells which alters their receptors in response to antigenic stimulation.[96] In the case of immunoglobulin light chain and TCRA, which lack D gene segments, the secondary rearrangement occurs between unrearranged V gene segments upstream and J segments downstream with deletion of the original rearranged VJ segment. These rearrangements do not violate the 12/23 rule. However, in the case of IgH and TCRB, the rearranged gene contains a D segment and all other unused D segments are lost during DJ and VDJ rearrangements, leaving behind only non-compatible RSSs. This obstacle is overcome by the presence of a 3′ sequence of the V segment, which plays the role of a surrogate RSS, thereby replacing the previously rearranged V, while retaining the already rearranged DJ.[96, 97]

RAG mediated transposition

RAGs have been shown to exhibit transposition activity by integrating excised RSS-flanked signal ends into a target DNA molecule, in vitro. Integration can be intermolecular, wherein the target DNA is a plasmid, or intramolecular, in which the target can be the intervening sequence stretching between RSSs.[98-100] Integration was not sequence-specific but was targeted to altered DNA structures like hairpins.[101] Several lines of studies compared RAGs with bacterial transposons and revealed striking similarities.[102] Isolated studies have shown that RAG transposition can occur in vivo.[103, 104] The first among these demonstrated interchromosomal transpositions, wherein TCR-α signal ends from chromosome 14 inserted into the X-linked hypoxanthine-guanine phosphoribosyl transferase locus, resulting in gene inactivation.[103] It was also shown that RAG expression in yeast could lead to transposition.[104] The transib transposase from the insect Helicoverpa zea was shown to be active in vitro and its breakage and joining activities mimicked that of RAG, providing strong evidence that RAGs and transib transposases were derived from a common progenitor.[105] However, apart from such isolated reports, there is little evidence that RAG-mediated transposition occurs in the mammalian genome. This may be the result of the stringent regulation of the process in the mammalian system.[106]

Structure-specific nuclease activity of RAGs

In contrast to the standard function of being a recombinase, later studies pointed out that the RAG complex can also act as a structure-specific nuclease and this property has several implications in the pathological roles of the RAG complex (Fig. 4). Studies suggested that RAGs possess a structure-specific 3′ flap endonuclease activity that can remove single-strand (ss) extensions from branched DNA structures.[107] RAGs also showed hairpin opening activity in the presence of MnCl2.[108, 109] The fragility of the BCL2 major breakpoint region was attributed to its acquiring a stable non-B DNA structure in the genome, which was prone to RAG cleavage.[110] Further, it was shown that RAGs could cleave symmetric bubbles, heterologous loops and potential G-quadruplex structures at the physiological concentrations of MgCl2[111, 112] (Fig. 4). RAG nicking efficiency was dependent on size of heteroduplex DNA and the proximity to RSS sequence did not affect nicking efficiency.[113] It was also observed that structure specificity of RAGs could be attributed to the sequence at the single-stranded region. Cytosines were the most preferred, followed by thymines while purines were not cleaved at all. A consensus sequence of ‘C(d)C(s)C(s)’ (d, double-stranded; s, single-stranded) was also proposed for the generation of breaks at single-strand/double-strand transitions.[114] The nonamer binding region of RAG1 was not important for RAG cleavage at non-B DNA structures, in contrast to that at RSS.[115] The study showed low cleavage kinetics and a lack of cleavage complex formation at heteroduplex DNA, as the two mechanisms that ensured the control of the pathological activity of RAGs.[115]

Figure 4.

Structure-specific nuclease action of recombination activating genes (RAGs). The cleavage targets of the RAG complex when it acts as a structure-specific nuclease. The shaded ovals represent the RAG complex and the cleavage sites are indicated by red arrows.

Off-target cleavage by RAGs and its pathological relevance

In an ideal scenario, RAGs target RSS within the immunoglobulin/TCR loci. However, a large number of RSS-like sequences (cryptic RSS) exist throughout the genome, and these can lead to non-specific targeting by RAGs, generating DNA double-strand breaks outside the immunoglobulin/TCR loci and resulting in genomic rearrangements. If the rearrangement juxtaposes the immunoglobulin/TCR regulatory sequences like promoters or enhancers to proto-oncogenes, it could lead to over-expression of the oncogenes, culminating in lymphoid malignancies. RAGs are known to generate breaks at sequences resembling the heptamer or nonamer because of misrecognition in several leukaemias and lymphomas, which include rearrangements involving MTS1, LMO2, TTG-1, SIL and SCL.[116-119] The discovery that RAGs can detect and cleave non-B DNA structures further increased the spectrum of non-specific cleavage by RAGs.[110] In the case of the t(14;18) translocation in follicular lymphoma, wherein nearly 75% of the breakpoints are dispersed over a 150-bp region called the major breakpoint region of BCL2,[120] it has been shown that a non-B structure can form, which is specifically targeted and cleaved by RAGs.[110, 111] Later, the nature of this structure was identified as a G-quadruplex.[112, 121] It has also been shown that RAGs can cleave at an eight-nucleotide motif ‘CCACCTCT’ in the minor breakpoint cluster of BCL2 in a nonamer-independent manner.[122]
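A minimal sketch of how heptamer-like (cryptic) sites and the ‘CCACCTCT’ motif might be located in a sequence is given below. The scoring scheme is an assumption made for illustration and is not a published cryptic RSS predictor: the first three bases must be CAC, and at most one mismatch to the consensus heptamer CACAGTG is tolerated in the remaining four positions. The test sequence is made up and is not genomic data.

```python
HEPTAMER = "CACAGTG"     # consensus RSS heptamer
BCL2_MOTIF = "CCACCTCT"  # motif reported in the BCL2 minor breakpoint cluster

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def heptamer_like_sites(seq, max_mismatches=1):
    """Yield (position, site) for heptamer-like 7-mers on the given strand.

    Assumptions: the first three bases must be CAC, and at most
    `max_mismatches` differences are allowed in the last four positions.
    The threshold is arbitrary.
    """
    seq = seq.upper()
    for i in range(len(seq) - len(HEPTAMER) + 1):
        site = seq[i:i + len(HEPTAMER)]
        if not site.startswith("CAC"):
            continue
        mismatches = sum(a != b for a, b in zip(site[3:], HEPTAMER[3:]))
        if mismatches <= max_mismatches:
            yield i, site


def motif_positions(seq, motif):
    """Return all start positions of exact motif matches (overlaps allowed)."""
    seq = seq.upper()
    positions, start = [], seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


toy = "TTCACAGTGAACCACCTCTGGCACTGTGTT"  # made-up sequence, not genomic data
print(list(heptamer_like_sites(toy)))
print(list(heptamer_like_sites(reverse_complement(toy))))
print(motif_positions(toy, BCL2_MOTIF))
```

A sequence-based scan of this kind captures only RSS-like targets. Cleavage at non-B DNA structures such as the G-quadruplex at the BCL2 major breakpoint region depends on DNA structure and would not be detected by motif matching alone.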

Conclusions

To generate a functional antibody or TCR, several of the gene segments inherited in the germline have to be selected, joined in various combinations and further modified. Though the main players in the process have been identified, the mechanism by which each of the individual proteins acts and, more broadly, how the chronological order is regulated are not known. The structure of RAG proteins still remains elusive. Several questions regarding the structure specificity of RAGs remain unanswered. Biochemical and biophysical studies on the domains within the core and non-core regions of these proteins, studies on the full-length proteins in vivo, and detection of their interacting partners are being pursued.

Acknowledgements

We would like to thank Dr Mridula Nambiar, Ms Mrinal Srivastava, Dr Abani Naik, Ms Rupa Kumari and Ms Deepthi R for critical reading of the manuscript. This work was supported by grant (SR/SO/BB/0037/2011) from DST, India. NM is supported by a Senior Research Fellowship from CSIR, India.