Correspondence: Sathees C. Raghavan, Department of Biochemistry, Indian Institute of Science, Bangalore-560 012, India. Email: firstname.lastname@example.org
Senior author: Sathees C. Raghavan
V(D)J recombination is the process by which antibody and T-cell receptor diversity is attained. During this process, antigen receptor gene segments are cleaved and rejoined by non-homologous DNA end joining for the generation of combinatorial diversity. The major players of the initial process of cleavage are the proteins known as RAG1 (recombination activating gene 1) and RAG2. In this review, we discuss the physiological function of RAGs as a sequence-specific nuclease and its pathological role as a structure-specific nuclease. The first part of the review discusses the basic mechanism of V(D)J recombination, and the last part focuses on how the RAG complex functions as a sequence-specific and structure-specific nuclease. It also deals with the off-target cleavage of RAGs and its implications in genomic instability.
Lymphocytes detect the antigens in the environment by means of antibodies on the surface of B cells and T-cell receptors (TCR) on the T cells. With the diverse and expanding array of antigens, the generation of antibody/TCR diversity using limited genetic resources remained a question that baffled scientists for decades. An almost limitless number of antigens exist in the environment and recent research suggests that among the millions of lymphocytes, each one expresses a structurally different antigen receptor to combat this plethora of antigens. How is the genetic information for all of these antigen receptors encoded in the DNA? Do cells carry enough DNA to encode all the antibody specificities? Or is it that random mutations generate this enormously diverse repertoire of antibodies? Two theories arose initially to answer these questions. Somatic mutation/variation theory suggested that a few inherited genes, with time, encountered mutations or recombinations to encode each antibody. In contrast, germline theory proposed that the genome contains a large repertoire of antigen receptor genes and each of them encodes a separate, specific antibody. Arguments supporting and opposing these theories were put forward and remained unresolved for several years. In this review, we summarize the basic principles that presently govern the generation of diversity of antibody and TCR with special emphasis on V(D)J recombination. We also discuss the role of recombination activating genes (RAGs) in the generation of antibody diversity and chromosomal translocations.
The discovery of RAG genes
In the early 1990s, it was shown that two tightly linked genes, RAG1 and RAG2, which were unique to vertebrates, were responsible for the generation of antigen receptor diversity.[3, 4] An elegant series of experiments involving genomic DNA transfections into mouse 3T3 fibroblasts lacking V(D)J recombination activity, showed that the transfer of a single genomic locus could make these cells proficient for V(D)J recombination. Following this, using the technique of ‘genome walking’, the RAG1 gene was discovered. Comparative sequence analysis of RAG1 genes from various species indicated that they were evolutionarily conserved. Further studies demonstrated that the locus contained two closely linked genes, RAG1 and RAG2 on chromosome 11p in humans and chromosome 2p in mice.[4, 6] The coding and 3′ untranslated sequences of RAG1 and RAG2 were contained in a single exon. The proteins encoded by the RAG genes play a crucial role in the generation of antigen receptor diversity as discussed below.
Antigen receptors of lymphoid system
There are two major antigen receptors for the lymphoid system, antibodies and TCR in B and T cells, respectively. Antibodies or immunoglobulins are glycoproteins that are either secreted out from B cells or remain bound to their membrane. Each antibody is composed of four polypeptides – two identical heavy chains (H) and two identical light (L) chains held together by disulphide bonds.[7, 8] Amino acid sequence at the N-terminus of both chains varies greatly among different antibodies, whereas the C-terminal sequence remains strikingly similar. These two regions are referred to as the variable (V) and constant (C) regions, respectively. The V region, composed of 110–130 amino acids, gives the antibody its specificity for binding to antigen. The exon encoding the variable region is assembled from two (or three) individual gene segments,[2, 10] which are classified into variable (V), diversity (D) (present only in immunoglobulin heavy chains, not in the light chains)[12-14] and joining (J)[15, 16] regions (Fig. 1). To obtain a functional variable region, recombination between D and J occurs to give a DJ segment, followed by another recombinational event involving V to yield the final V(D)J fragment. The germline consists of multitudes of V, D and J gene segments and random recombination among these results in the generation of approximately 10^6 different combinations, accounting for the dramatic expansion in the variability of the sequence (Fig. 1).
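The combinatorial arithmetic behind this estimate can be illustrated with a minimal sketch. The segment counts below are hypothetical round numbers chosen only for illustration; they are not values reported in this review, and the function names are likewise our own. The sketch also assumes that any heavy chain can pair with any light chain.

```python
from math import prod

# Hypothetical segment counts, for illustration only (not measured values).
HEAVY = {"V": 40, "D": 25, "J": 6}
LIGHT_KAPPA = {"V": 40, "J": 5}
LIGHT_LAMBDA = {"V": 30, "J": 4}


def chain_combinations(segments):
    """Number of variable-region exons obtained by picking one segment of each type."""
    return prod(segments.values())


def receptor_combinations(heavy, light_loci):
    """Heavy-chain combinations multiplied by the pooled light-chain combinations."""
    light_total = sum(chain_combinations(locus) for locus in light_loci)
    return chain_combinations(heavy) * light_total


heavy_only = chain_combinations(HEAVY)                                # 40 * 25 * 6 = 6000
total = receptor_combinations(HEAVY, [LIGHT_KAPPA, LIGHT_LAMBDA])     # 6000 * (200 + 120)
print(heavy_only, total)  # prints: 6000 1920000
```

With these assumed counts, combinatorial joining and chain pairing alone give roughly two million receptors, which is the order of magnitude quoted above.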
The TCR is structurally similar to the antigen-binding fragment [F(ab)] of the antibody. Similar to the antibodies, it has two glycoprotein subunits and each is encoded by a somatically rearranged gene. The TCRs are composed of either an αβ or a γδ pair of subunits. The structure of TCR is further stabilized by interchain disulphide bonds. At the 5′ end of each of the TCR loci there is a cluster of V segments followed by J segments (Fig. 1). In the TCR-β and TCR-δ chain loci, these segments are interrupted by a series of D segments similar to that of the immunoglobulin heavy chain (Fig. 1).
Somatic recombination occurs in a strict regimen, with D to J recombination preceding V to DJ on the heavy chain and the heavy chain recombination in turn occurring before that of the light chains. Similarly, the TCR-β rearrangement always precedes that of TCR-α. Besides, the TCR rearrangement is restricted to early stages of the T-cell development and immunoglobulin rearrangement to early B cells. Adherence to this chronological order relies on the cell lineage and cell cycle restricted expression of participating enzymes as well as on chromosomal accessibility of the recombining loci.
A mature B lymphocyte expresses a single species of antibody possessing a unique specificity in spite of having multiple allelic loci for different antibody chains. This specificity is acquired by a process termed allelic exclusion. Initially, two models were put forward to explain this process. In the case of the ‘regulated model’, gene assembly proceeds on one chromosome at a time and the protein products suppress further rearrangements by feedback inhibition. The ‘stochastic model’ suggests that inefficient V(D)J rearrangement results in allelic exclusion. Later studies demonstrated that the process was regulated by several factors including nuclear localization, non-coding RNA transcription, alteration in chromatin structure by histone modifications, epigenetic alterations at the DNA level, and feedback signalling from expressed alleles.
Antigen receptor loci
By 1998, immunoglobulin and TCR genes were fully identified and sequenced. There are seven major loci, which undergo somatic recombination in developing B and T cells during the formation of antigen receptors. These are immunoglobulin heavy chain (IgH), light chain κ (IgK) and light chain λ (IgL) in B cells and TCR-α (TCRA), TCR-β (TCRB), TCR-γ (TCRG) and TCR-δ (TCRD) in T cells. Each of these is further divided into subexons, which undergo the recombination (Fig. 1).
Mechanism of V(D)J recombination
Recombination signal sequence and 12/23 rule
A fairly conserved DNA sequence known as the recombination signal sequence (RSS) resides adjacent to each subexon and consists of a palindromic heptamer (CACAGTG) and an A/T-rich nonamer (ACAAAAACC)[14, 22-24] (Fig. 2a,b). The first three nucleotides of the heptamer are crucial for the recombination activity.[25, 26] Though the nonamer binding domain of RAG1 is well characterized, the region of the RAG complex that recognizes the heptamer is yet to be deciphered.[27, 28] The heptamer and nonamer are separated by a spacer DNA sequence of either 12 bp (12RSS) or 23 bp (23RSS) (Fig. 2a). Although the length of the spacer is conserved, its sequence is not of much importance.[12, 24] Generally, a 12RSS recombines only with a 23RSS and vice versa, a restriction termed the ‘12/23 rule’ (Fig. 2b), which prevents non-productive rearrangements. The coupled cleavage of a 12RSS and 23RSS requires Mg2+, whereas Mn2+ supports RAG-mediated nicking of a single RSS.
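The organization of an RSS and the 12/23 rule can be made concrete with a short sketch. It scans one strand for the exact consensus heptamer and nonamer quoted above, separated by a 12-bp or 23-bp spacer, and then checks whether two signals are a permitted pair. Requiring exact consensus matches is a simplifying assumption, since natural RSSs deviate from the consensus; the function names and the spacer filler sequence are hypothetical.

```python
HEPTAMER = "CACAGTG"     # consensus heptamer quoted in the text
NONAMER = "ACAAAAACC"    # consensus nonamer quoted in the text
SPACER_LENGTHS = (12, 23)


def find_rss(seq):
    """Return (heptamer_start, spacer_length) for each exact consensus RSS on this strand."""
    hits = []
    for start in range(len(seq) - len(HEPTAMER) + 1):
        if seq[start:start + len(HEPTAMER)] != HEPTAMER:
            continue
        for spacer in SPACER_LENGTHS:
            nonamer_start = start + len(HEPTAMER) + spacer
            if seq[nonamer_start:nonamer_start + len(NONAMER)] == NONAMER:
                hits.append((start, spacer))
    return hits


def obeys_12_23_rule(spacer_a, spacer_b):
    """True only when one signal has a 12-bp spacer and the other a 23-bp spacer."""
    return {spacer_a, spacer_b} == {12, 23}


def make_rss(spacer_length):
    """Build a toy RSS; the spacer sequence is arbitrary filler."""
    spacer = ("CTGA" * 6)[:spacer_length]
    return HEPTAMER + spacer + NONAMER


rss12 = find_rss("GG" + make_rss(12) + "TT")
rss23 = find_rss(make_rss(23))
print(rss12, rss23)                                   # prints: [(2, 12)] [(0, 23)]
print(obeys_12_23_rule(rss12[0][1], rss23[0][1]))     # prints: True
print(obeys_12_23_rule(12, 12))                       # prints: False
```

Only the spacer length enters the pairing check, in keeping with the observation that the spacer sequence itself is of little importance.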
Recently, the ‘beyond 12/23’ rule has been proposed to explain the exclusion of direct TCRBV to TCRBJ joining in the TCR-β region, in spite of the incidence of appropriately oriented pairs of 12RSS and 23RSS. The exclusion was enforced during the DNA cleavage step of the V(D)J recombination and was attributed to several factors, like relatively slow nicking of the TCRB substrates and poor synapsis of the TCRBV and the TCRBJ. Extrachromosomal V(D)J recombination assays could recapitulate the ‘beyond 12/23 rule’ in the TCRBV, implying that it is solely the RAG proteins and RSSs, which play a role in establishing this restriction. In contrast, with respect to TCRDV locus, the involvement of other factors was also suggested.
The perfect cleavage
RAG1 and RAG2 initiate recombination by introducing a single-strand nick in DNA precisely at the border between the heptamer of RSS and the coding segment. The 3′-OH group of the nick at the coding end then becomes covalently linked to the opposing phosphodiester bond of antiparallel strand by a transesterification reaction resulting in hairpin structure at the coding end and blunt signal end. The signal ends remain associated with RAG proteins resulting in a transitory structure referred to as a ‘post-cleavage complex’.[36, 37] In the presence of Mg2+, nicking can take place, and with Mn2+, both nicking and hairpin formation are observed. However, this was demonstrated only in vitro in a non-physiological concentration of MnCl2 using isolated RSS substrates and not in the physiological 12RSS and 23RSS pair. The in vivo scenario is still unclear though it is commonly thought that Mg2+ is the physiological divalent metal ion involved in RAG-mediated cleavage. RAG1 and RAG2 are assisted by high mobility group proteins of the HMG-box family (HMGB1 and HMGB2) for bringing two signal ends together. The HMG proteins interact with the nonamer binding domain of RAG1 in the absence of DNA and enhance its intrinsic DNA bending activity. Following resolution of the hairpin structure, the coding ends are joined to create the exon, which forms the antigen-binding region of the antigen receptors (Fig. 2c). The signal ends remain bound to RAGs, which in turn protect the ends from further nuclease digestion [36, 39] (Fig. 2c).
The imperfect joining
The blunt-ended signal ends can be directly ligated without any modification, while the coding ends undergo further processing (Fig. 2c).[34, 35] The hairpin at the coding end is opened and joined together by non-homologous end joining (NHEJ), the DNA double-strand break repair pathway.[40, 41] Artemis, in conjunction with DNA-PKcs, acts as an endonuclease and resolves the hairpins formed during V(D)J recombination. Ku heterodimer, consisting of Ku70 and Ku80, binds to the broken DNA ends and forms a complex with DNA-PKcs. Artemis has an inherent 5′-3′ exonuclease activity, whereas in association with DNA-PKcs it acts as a 5′-3′ endonuclease. The ends are filled in by the Pol X family of polymerases, namely Pol μ and Pol λ. Mice deficient in Pol μ are shown to have shorter immunoglobulin light chain V to J junctions, while those lacking Pol λ have shorter immunoglobulin heavy-chain D to J and V to DJ junctions. In addition, Pol μ plays a role in the processing of 3′ ends while Pol λ processes the 5′ ends. This would suggest that Pol μ is involved in V(D)J recombination, but not Pol λ. The ligase IV/XRCC4 complex ligates the processed ends[47, 48] with the help of XLF.[49, 50] Ku70, Ku80, XRCC4 and Ligase IV are considered to be the ‘core’ NHEJ factors as these proteins were conserved during evolution and are required for all known NHEJ reactions.[51, 52] These are also indispensable for the joining of both coding and signal ends. On the other hand, DNA-PKcs and Artemis are believed to have evolved more recently and are needed only for the joining of coding ends.
Diversity at the junction
At the time of joining of V, D and J subexons, several modifications like insertions and deletions can occur at the junctions resulting in further increase in the antigen receptor diversity. Asymmetric hairpin opening at the coding ends due to nicking a few bases away from the terminus results in one DNA strand longer than the other. The shorter strand is extended by addition of nucleotides complementary to the longer strand resulting in the insertion of palindromic (P) nucleotides at the coding joints. Terminal deoxynucleotidyl transferase (TdT) and DNA Pol μ further diversify these junctional sequences by catalysing the addition of non-templated nucleotides (N-nucleotides) to the coding ends. The junctional diversification can expand the diversity up to 10^11 from the 10^6 achieved through combinatorial diversification alone.
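The origin of P nucleotides can be followed in a small sketch. It assumes a simplified model in which the hairpin joins the 3′ end of the top strand to the 5′ end of the bottom strand, the nick falls a given number of nucleotides from the hairpin tip on the bottom strand, and the shorter strand is then filled in. The function names and the example sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def open_hairpin(coding_top, nick_offset):
    """Return (top strand after opening and fill-in, P nucleotides added).

    coding_top  : top strand of the coding end, 5' to 3', ending at the hairpin tip.
    nick_offset : number of bottom-strand nucleotides left attached to the top
                  strand when the hairpin is nicked away from its tip.
    """
    if not 0 <= nick_offset <= len(coding_top):
        raise ValueError("nick_offset must lie within the coding end")
    # The transferred bottom-strand nucleotides are the reverse complement
    # of the last nick_offset nucleotides of the top strand.
    tail = coding_top[len(coding_top) - nick_offset:]
    p_nucleotides = reverse_complement(tail)
    return coding_top + p_nucleotides, p_nucleotides


print(open_hairpin("GGATGC", 0))   # prints: ('GGATGC', '')
print(open_hairpin("GGATGC", 2))   # prints: ('GGATGCGC', 'GC')
print(open_hairpin("GGATGC", 3))   # prints: ('GGATGCGCA', 'GCA')
```

In the last case the terminal six nucleotides, TGCGCA, read the same on both strands, which is why these insertions are described as palindromic. A nick exactly at the tip (offset 0) adds nothing.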
Non-standard recombination products
Alternative outcomes of V(D)J recombination reported were ‘hybrid joint’ and ‘open-shut joint’. During the formation of ‘hybrid joint’, the coding end of one subexon is joined to the signal end of another following the initial cleavage step of V(D)J recombination. In certain cases, the original pair of coding and signal ends, which was separated during the RAG cleavage phase, rejoins leading to formation of an ‘open-shut joint’. When there are no modifications at the joints, ‘open-shut joints’ are hard to detect. The released signal ends may non-specifically attack double-stranded DNA leading to transposition.
Other modes of generation of the diversity
The antigen receptors are further modified by two processes, namely class switch recombination (CSR) and somatic hypermutation (SHM). The CSR refers to the rearrangements of the constant regions of antigen receptors upon encountering an antigen. This further expands the variability in the constant region following rearrangement at the variable region. The CSR replaces the expression from Cμ to Cγ, Cε or Cα, resulting in the switching of immunoglobulin isotype from IgM to IgG, IgE, or IgA without changing the antigen specificity (Fig. 3). The immunoglobulin CH locus comprises an array of CH genes, flanked by a switch (S) region at its 5′ region. The CSR takes place between two S regions, resulting in the loop-out deletion of the intervening DNA segments as circular DNA (Fig. 3). The SHM refers to the random genetic mutations that occur in the B cells (and not T cells) at certain hotspots in the antigen-binding regions following an antigen encounter and results in the increased affinity of the receptor to the antigen. As a result of this, a fraction of the antibodies possessing low-affinity receptors to the defined antigen further increase their affinity and undergo expansion. The SHM takes place in the V region of both H and L chain genes (VL/H), introducing a million times more point mutations than the genome-wide background leading to the generation of high-affinity antibodies. Hence, CSR and SHM act on entirely different targets, i.e. CH and VL/H, respectively. Therefore, it was believed that these two processes were regulated differently. However, recently, it has been shown that the same enzyme, the activation-induced cytidine deaminase, initiates both CSR and SHM in mice and humans.[57, 59, 60]
Mechanism of action of RAG proteins
Murine RAG1 comprises 1040 amino acids. The ‘core region’ of RAG1 (cRAG1) consisting of amino acids 384–1008, is essential for all activities in vivo and in vitro.[61, 62] RAG1 exists as a homodimer in solution. It harbours all the elements that are indispensable for activity including the active site, DNA-binding domain and RAG2-interacting domain. The core regions acted as focal points of subsequent research, mainly because they were more soluble than their full-length counterparts. Using surface plasmon resonance and in vivo one-hybrid experiments, it was shown that the N-terminus of cRAG1 (amino acids 384–460) harbours the nonamer binding region. The heptamer recognition region of RAGs still remains obscure. The DDE motif (a triad of acidic amino acids: D600, D708 and E962) of RAG1 forms the catalytic centre of the RAG1/RAG2 complex,[64-66] which plays a role in chelating the two divalent metal ions essential for catalysis. The N-terminal non-core region (amino acids 1–383) contains a RING domain fold, which exhibits ubiquitin ligase activity.
Studies by Rodgers's group using limited proteolysis showed that murine cRAG1 is composed of topologically independent domains that can function individually. These include the N-terminal, the central and the C-terminal domains. The central domain has the heptamer binding site, RAG2 binding site and zinc finger motif. The C-terminal domain has the dimerization region and binds DNA co-operatively. Murine cRAG1 was successfully expressed in Escherichia coli as a fusion protein with Maltose binding protein (MBP) tag with high yield and solubility and was active when combined with cRAG2 expressed in human embryonic kidney cell line. However, there is no report of successful bacterial expression of RAG2.
Murine ‘core RAG2’ consists of amino acids 1–383 out of the total 527. The molecular function of core RAG2 remains elusive. RAG2 consists of an N-terminal 6-bladed beta-propeller domain and a C-terminal plant homeo domain (PHD).[70, 71] The PHD is a motif characteristic of chromatin remodelling proteins. It has been predicted to facilitate the ordered rearrangement of IgH chains and the binding of core histone proteins.[72-74] The C-terminus of RAG2 contains a threonine residue (T490) that acts as a target of Cdk2 kinase. Phosphorylation of this amino acid regulates the proteasomal degradation of RAG2 at the G1/S transition of the cell cycle. This regulatory mechanism ensures that RAG2 is degraded in a cell-cycle-dependent manner preventing RAG-induced DNA breaks during replication. Biochemical analysis of recombinant RAG2 has identified several basic residue mutants defective in catalysis. Accordingly, Schatz's group has proposed a model for the interaction of RAG2 with DNA in which the amino acids K119 and K283 directly contact DNA. It was shown that the PHD finger specifically recognizes histone 3 trimethylated at lysine 4 (H3K4me3). H3K4me3 increases the catalytic turnover number (kcat) of RAGs and also tethers the complex to DNA. Recently, it was reported that mice harbouring RAG2 with a deleted C-terminal end rapidly developed thymic lymphomas with chromosomal translocations, amplifications and deletions involving TCRA/D and IgH loci implying its role in genomic stability.
Interaction between RAG1 and RAG2
Protein–DNA cross-linking studies have suggested that cRAG1/cRAG2 may form specific contacts with the heptamer of RSS.[81, 82] In addition, binding assays have shown that cRAG1, even in the absence of RAG2, could bind to RSS in a manner that was specific to both heptamer and nonamer.[38, 82] RAG1 and RAG2 can interact and exist as a complex. Modification interference and direct footprinting studies indicated that RAG1 alone could make extensive contacts with the nonamer, whereas RAG2 is needed for the heptamer occupancy. Direct interaction between RAG2 and DNA in the RAG1/RAG2/RSS complex has also been reported. Fluorescence anisotropy and gel mobility shift assays indicate that RAG1 exhibits sequence specific binding character. It has been suggested that this property is masked by its non-specific DNA interactions and addition of RAG2 effectively rescues RAG1 from this non-specific binding.
Using chromatin immunoprecipitation, it was shown that in vivo RAG binding was focused at the ‘recombination centres’ involving a small region encompassing J (and where present, J-proximal D) subexons in the IgH, IgK, TCRB and TCRA loci.[85, 86] The study showed that RAG2 binds throughout the genome at sites with substantial levels of H3K4me3. RAG1 binding was shown to be restricted to highly active chromatin in the presence of arrays of RSSs.[85, 86]
Role of RAGs in receptor editing
Expression of RAG in early B cells occurs in two waves, the former is responsible for V(D)J rearrangement of IgH genes in pro-B and pre-B-I cells and the latter for VJ recombination of the IgL genes in small pre-B-II cells. There are also reports of a third wave of RAG expression, induced in activated mature B cells in vitro and in vivo. Spleen B cells from mice activated with lipopolysaccharide or interleukin-4 showed expression of RAGs. Mice immunized with the antigen trinitrophenyl-keyhole limpet haemocyanin also showed expression of RAG1 and RAG2 in the inguinal and popliteal cells of draining lymph nodes. In addition, immature B cells carrying self-reactive receptors exhibited up-regulated RAG expression. Following B-cell antigen receptor cross-linking, an increased level of RAG mRNA was reported in cultures of developing B cells.[89, 90] The sustained RAG expression allowed secondary V(D)J recombination that involves the replacement of self-reactive antibody V region genes by the V(D)J recombination. During this process, B cells with a non-self-reactive receptor are allowed to exit from prolonged V(D)J recombination, whereas the self-reactive cells are retained for further editing, until they produce a non-self-reactive receptor. ‘Receptor editing’ refers to the secondary rearrangement that occurs in immature B and T cells, which plays a role in mediating tolerance.[92-95] ‘Receptor revision’ refers to the secondary rearrangement in mature B and T cells which alters their receptors in response to antigenic stimulation. In the case of immunoglobulin light chain and TCRA that lacks the D gene segments, the secondary rearrangement occurs between unrearranged V gene segments upstream and J segments downstream with deletion of the original rearranged VJ segment. These rearrangements do not violate the 12/23 rule. 
However, in the case of IgH and TCRB, rearranged gene contains a D segment and all other unused D segments are lost during DJ and VDJ rearrangements leaving behind only non-compatible RSSs. This obstacle is overcome by the presence of a 3′ sequence of the V segment, which plays the role of a surrogate RSS, thereby replacing the previously rearranged V, while retaining the already rearranged DJ.[96, 97]
RAG mediated transposition
RAGs have been shown to exhibit transposition activity by integrating excised RSS-flanked signal ends into a target DNA molecule, in vitro. Integration can be intermolecular wherein the target DNA is a plasmid or intramolecular in which the target can be the intervening sequence stretching between RSSs.[98-100] Integration was not sequence-specific but was targeted to altered DNA structures like hairpins. Several lines of studies compared RAGs with bacterial transposons and revealed striking similarities. Isolated studies have shown that RAG transposition can occur in vivo.[103, 104] The first among these demonstrated interchromosomal transpositions, wherein TCR-α signal ends from chromosome 14 were inserted into the X-linked hypoxanthine-guanine phosphoribosyl transferase locus, resulting in gene inactivation. It was also shown that RAG expression in yeast could lead to transposition. The transib transposase from the insect Helicoverpa zea was shown to be active in vitro and its breakage and joining activities mimicked that of RAG, providing strong evidence that RAGs and transib transposases were derived from a common progenitor. However, there is no evidence that RAG-mediated transposition can occur in the mammalian genome. This can be the result of the stringent regulation of the process in the mammalian system.
Structure-specific nuclease activity of RAGs
In contrast to the standard function of being a recombinase, later studies pointed out that the RAG complex can also act as a structure-specific nuclease and this property has several implications in the pathological roles of the RAG complex (Fig. 4). Studies suggested that RAGs possess a structure-specific 3′ flap endonuclease activity that can remove single-strand (ss) extensions from branched DNA structures. RAGs also showed hairpin opening activity in the presence of MnCl2.[108, 109] The fragility of the BCL2 major breakpoint region was attributed to its acquiring a stable non-B DNA structure in the genome, which was prone to RAG cleavage. Further, it was shown that RAGs could cleave symmetric bubbles, heterologous loops and potential G-quadruplex structures at the physiological concentrations of MgCl2[111, 112] (Fig. 4). RAG nicking efficiency was dependent on size of heteroduplex DNA and the proximity to RSS sequence did not affect nicking efficiency. It was also observed that structure specificity of RAGs could be attributed to the sequence at the single-stranded region. Cytosines were the most preferred, followed by thymines while purines were not cleaved at all. A consensus sequence of ‘C(d)C(s)C(s)’ (d, double-stranded; s, single-stranded) was also proposed for the generation of breaks at single-strand/double-strand transitions. The nonamer binding region of RAG1 was not important for RAG cleavage at non-B DNA structures, in contrast to that at RSS. The study showed low cleavage kinetics and a lack of cleavage complex formation at heteroduplex DNA, as the two mechanisms that ensured the control of the pathological activity of RAGs.
Off-target cleavage by RAGs and its pathological relevance
In an ideal scenario, RAGs target RSS within the immunoglobulin/TCR loci. However, a large number of RSS-like sequences (cryptic RSS) exist throughout the genome and this would lead to the non-specific targeting of RAGs leading to DNA double-strand breaks outside the immunoglobulin/TCR loci, resulting in genomic rearrangements. If the rearrangement juxtaposes the immunoglobulin/TCR regulatory sequences like promoters or enhancers to proto-oncogenes, it could lead to over-expression of the oncogenes culminating in lymphoid malignancies. RAGs are known to generate breaks at sequences resembling heptamer or nonamer because of misrecognition in several leukaemias and lymphomas, which include translocations like MTS1, LMO2, TTG-1, SIL and SCL.[116-119] The discovery that RAGs can detect and cleave non-B DNA structures further increased the spectrum of non-specific cleavage by RAGs. In the case of the t(14;18) translocation in follicular lymphoma, wherein nearly 75% of the breakpoints are dispersed over a 150-bp region called the major breakpoint region of BCL2, it has been shown that a non-B structure can form, which is specifically targeted and cleaved by RAGs.[110, 111] Later, the nature of this structure was identified as a G-quadruplex.[112, 121] It has also been shown that RAGs can cleave at an eight-nucleotide motif ‘CCACCTCT’ in the minor breakpoint cluster of the BCL2 in a nonamer-independent manner.
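How heptamer-like cryptic sites might be flagged in a sequence can be sketched as follows. The scoring scheme is an illustrative assumption rather than a published method: the first three heptamer nucleotides (CAC), described earlier as crucial for recombination, are required to match exactly, and a limited number of mismatches is tolerated in the remaining four positions. The function name, the mismatch threshold and the test sequence are hypothetical.

```python
HEPTAMER = "CACAGTG"   # consensus heptamer quoted earlier in the text
CRUCIAL = 3            # first three nucleotides must match exactly (assumed rule)


def cryptic_heptamers(seq, max_mismatches=1):
    """Return (position, site, mismatches) for each heptamer-like site on this strand."""
    hits = []
    for start in range(len(seq) - len(HEPTAMER) + 1):
        window = seq[start:start + len(HEPTAMER)]
        if window[:CRUCIAL] != HEPTAMER[:CRUCIAL]:
            continue
        # Count mismatches only in the less critical last four positions.
        mismatches = sum(a != b for a, b in zip(window[CRUCIAL:], HEPTAMER[CRUCIAL:]))
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


sequence = "TTCACAGTGAACACTGTGTT"   # made-up sequence for illustration
print(cryptic_heptamers(sequence))
# prints: [(2, 'CACAGTG', 0), (11, 'CACTGTG', 1)]
print(cryptic_heptamers(sequence, max_mismatches=0))
# prints: [(2, 'CACAGTG', 0)]
```

A scan of this kind only nominates candidate sites on one strand; whether a candidate is actually cleaved would depend on factors discussed above, such as chromatin context and local DNA structure.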
To generate a functional antibody or TCR, several of the genomic segments propagating in the embryo have to select each other, merge in various combinations and further modify themselves. Though the main players in the process have been identified, the mechanism by which each of the individual proteins acts and broadly how the chronological order is regulated are not known. The structure of RAG proteins still remains elusive. Several questions regarding the structure specificity of RAGs are unclear. Biochemical and biophysical studies on the domains within the core and non-core regions of these proteins, studies on the full length proteins in vivo, and detection of their interacting partners are being pursued.
We would like to thank Dr Mridula Nambiar, Ms Mrinal Srivastava, Dr Abani Naik, Ms Rupa Kumari and Ms Deepthi R for critical reading of the manuscript. This work was supported by grant (SR/SO/BB/0037/2011) from DST, India. NM is supported by a Senior Research Fellowship from CSIR, India.