Our ability to analyze adaptive immunity and engineer its activity has long been constrained by our limited ability to identify native pairs of heavy–light antibody chains and alpha–beta T-cell receptor (TCR) chains, both of which comprise coupled “halves of a key” that together recognize specific antigens. Here, we report a cell-based emulsion RT-PCR approach that allows the selective fusion of the native pairs of amplified TCR alpha and beta chain genes from complex samples. A new type of PCR suppression technique was developed that makes it possible to amplify the fused library with minimal noise for subsequent analysis by high-throughput paired-end Illumina sequencing. With this technique, a single analysis of a complex blood sample allows identification of multiple native TCR chain pairs. This approach may be extended to identify native antibody chain pairs and, more generally, pairs of mRNA molecules that are coexpressed in the same living cells.
Antibodies and T-cell receptors (TCRs), the weapons of adaptive immunity capable of selective recognition of specific antigens, represent an invaluable resource for biological studies and medical applications [1, 2]. Analysis of the native TCR and antibody repertoires is also of fundamental importance for our understanding of adaptive immunity in health and disease. Because of this, methods for efficient identification of their functional units (native pairs of heavy/light antibody or alpha/beta TCR chains) have been sought for decades.
A series of different approaches have been developed to this end, including hybrid cells, single-cell PCR [5-7], and frequency-based pairing. Native pairs of TCR or antibody chains can also be identified after culturing of lymphocyte clones, or sorting of narrow antigen-specific populations of T cells or B cells. While these approaches can identify native chain pairs, they can only be performed serially for a limited number of clones, which precludes comprehensive analysis of most biological samples. Phage and yeast display technologies [11-13], although efficient for isolation of antigen-specific antibodies, rely on random pairing and do not provide information on the native pairs of chains. Furthermore, those approaches that could yield massive output based on the amplification and assembly of genes in fixed cells [14, 15] have not demonstrated feasibility for the analysis of complex samples. Recently, a micro-well plate assay was reported to pair human immunoglobulin heavy and light chains for high-throughput sequencing. However, this assay is quite laborious and requires the fabrication of custom chips, while its efficiency is not fully clear.
Here, we report a new approach for identifying TCR alpha–beta pairing, based on specific RT of TCR alpha and beta chain mRNA, PCR amplification, and subsequent coupling via overlap extension, all performed within emulsion droplets, each of which contains a single T cell. We overcame the primary obstacle of random, nonspecific overlap extension during postemulsion amplification reactions with a new PCR-suppression technique that we invented to specifically and efficiently block the 3′-ends of the free, nonoverlapped alpha and beta chain PCR products. Notably, the whole technology is simple to implement and does not require any special equipment.
Results and discussion
RT and amplification from cells in emulsion
The key requirement for the success of our approach is the ability to generate a representative library of TCR alpha and beta genes that are fused within emulsion reactions, wherein individual T cells are each contained within a separate minute droplet of sufficient reaction volume (Fig. 1A).
We employed a strategy of starting from RNA, which benefits from a high abundance of molecules encoding TCR alpha and beta chains and universal primer annealing sites on the constant regions of the TCR chains [18, 19]. The emulsion PCR protocol based on ABIL EM 90 surfactant was adapted from a previously published protocol, with revisions to allow for single-stage, cell-based RT and amplification reactions. Control experiments with DAPI-stained cells and a mixture of green and red fluorescent-labeled emulsions demonstrated high emulsion stability (<0.1% fusion of droplets after thermocycling), sufficient abundance of droplets for our reaction (millions), and an undetectable percentage of droplets containing more than one cell (see the Materials and methods).
We designed a multiplex primer mix (Supporting Information Table 1) to prime cDNA synthesis, amplify, and then fuse via overlap extension multiple TCR alpha and TRBV7-family beta genes, starting from T cells (Fig. 1B).
Predictably, the immediate product of cell-based RT-PCR emulsion reactions represented a complex mixture of PCR products, in which overlap-extended products of interest (i.e. fused TCR alpha and beta chain genes) were undetectable. Two sequential nested PCR amplification reactions successfully yielded overlap-extended PCR product (Supporting Information Fig. 1). However, PCR amplification of a control mix of TCR alpha and beta chain genes that were independently amplified from peripheral blood mononuclear cells (PBMCs) in two separate emulsion reactions produced fused product with the same efficiency (Fig. 1D, lanes 1 and 2). Thus, random overlap extension of abundant TCR alpha and beta amplicons released from emulsion was apparently predominant, resulting in loss of cell-specific information amidst random noise.
Noise-free amplification of paired TCR genes using a new type of PCR suppression
To prevent random overlap extension and mega-priming by nonfused TCR alpha and beta PCR products after extraction from emulsion, it was necessary to find a way to efficiently block their 3′-ends. We made several unsuccessful attempts toward this end, using terminal deoxynucleotidyl transferase (TdT) to nonspecifically elongate 3′-ends, biotin labeling of internal overlapping primers to filter nonoverlapped products, increased primer annealing temperatures during nested amplification, and dilution of emulsion-derived PCR products. None of these procedures proved sufficient, yielding no more than a three-cycle delay in the appearance of random background signal for treated overlap-extension PCR reactions relative to untreated reactions (data not shown).
Finally, we developed the new and efficient PCR-suppression approach described schematically in Figure 1C. To “kick out” the 3′-ends of undesirable DNA molecules, we added an excess of an oligonucleotide that is complementary to the 3′-end and also provides a template for further elongation with seven nonsense nucleotides. As a consequence, elongated products lose homology to their counterpart molecule and thus cannot enter the overlap-extension reaction efficiently.
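The blocking principle can be illustrated with a minimal Python sketch. All sequences, the annealing length, and the function names are hypothetical and chosen for illustration only; the oligonucleotides actually used are those listed in Supporting Information Table 1.

```python
# Illustrative sketch only: sequences and lengths are hypothetical,
# not the oligonucleotides used in the study.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def design_blocker(product: str, anneal_len: int, nonsense_tail: str) -> str:
    """Blocking oligo (5'->3'): a nonsense template followed by a region
    that anneals to the 3'-end of the unwanted PCR product strand."""
    return nonsense_tail + reverse_complement(product[-anneal_len:])


def extend_on_blocker(product: str, nonsense_tail: str) -> str:
    """Product strand after its 3'-end has been elongated along the blocker,
    i.e. with the complement of the nonsense template appended."""
    return product + reverse_complement(nonsense_tail)


def can_overlap(strand: str, partner: str, overlap_len: int) -> bool:
    """True if the 3'-terminal overlap_len bases of both strands are
    complementary, so that they could prime each other in overlap extension."""
    return strand[-overlap_len:] == reverse_complement(partner[-overlap_len:])


if __name__ == "__main__":
    overlap = "GATTACAGGCTTACCGATGC"                  # hypothetical 20-nt overlap
    alpha = "ACGTACGTAC" + overlap                    # free alpha product strand
    beta = "TTGGCCAATT" + reverse_complement(overlap) # free beta product strand
    tail = "TCTCTCT"                                  # seven nonsense nucleotides

    blocker = design_blocker(alpha, 20, tail)
    blocked_alpha = extend_on_blocker(alpha, tail)
    print("blocker:", blocker)
    print("overlap possible before blocking:", can_overlap(alpha, beta, 20))
    print("overlap possible after blocking:", can_overlap(blocked_alpha, beta, 20))
```

In this toy model the elongated strand no longer ends in the shared overlap sequence, which is the property the suppression step relies on; annealing kinetics and partial matches are not modeled.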
The presence of 3.2 μM of two such blocking oligonucleotides complementary to the 3′-ends of the overlapping TCR alpha and beta PCR products efficiently suppressed amplification when we combined the products of two separate emulsion reactions, each performed with oligonucleotide sets for either TCR alpha or beta amplification (Fig. 1D, lane 5). At the same time, amplification starting from the product of a full emulsion reaction performed with all oligonucleotides for alpha and beta amplification and overlap extension remained efficient (Fig. 1D, lane 4). Higher concentrations of blocking oligonucleotides suppressed amplification of the pre-fused product of interest (Supporting Information Fig. 2), and we used a concentration of 3.2 μM for subsequent experiments.
This new type of PCR suppression thus solves the most challenging issue of overlap-extension-based approaches for identifying coexpressed gene pairs: the high level of noise resulting from random overlap extension after the emulsion stage. Due to this innovation, the library of paired TCR alpha and beta chains produced within each emulsion can be further selectively amplified and analyzed using next-generation sequencing.
As an additional control for emulsion stability, we performed experiments in which two separate emulsions containing oligonucleotide mix for amplification of either TCR alpha or beta genes were combined after RT (so that the amplification step was performed for the combined emulsions). In this setup, the inclusion of blocking oligonucleotides similarly suppressed the generation of fused product, indicating negligible fusion of cell-containing droplets during amplification (Fig. 1D, lane 6).
Identification of native TCR chain pairs within a human blood sample
To demonstrate the power of the approach, we performed eight independent emulsion reactions each starting from 1 × 10⁶ PBMCs obtained from a human donor. To provide an internal control, we added approximately 1000 cells of a cultured T-cell clone (C1A) with known TCR alpha and beta chains to three emulsion reactions. In addition, we performed two control reactions with cells without emulsion in order to provide statistics for the random pairing of TCR alpha and beta chains in solution. We further amplified the resulting PCR products with our new PCR-suppression method and analyzed the amplicons by paired-end Illumina 2 × 150 bp sequencing. In order to extract CDR3 alpha and beta information and correct PCR errors, we subsequently analyzed the raw Illumina data with recently described TCR analysis approaches.
Hundreds of CDR3 pairs were selectively and prominently enriched in the emulsion reactions, indicating specific pairing, in contrast to the control reaction where the relative abundance of pairs was nearly equal to the frequency-expected random pairing (Fig. 2 and Supporting Information Fig. 3). Of those, multiple TCR alpha–beta pairs were selectively enriched within several emulsion reactions (Table 1), providing independent cross-confirmation. Notably, there were no contradictory cross-pairings between these repeatedly identified pairs, which generally confirms the accuracy of the whole analysis pipeline. More than 700 alpha–beta pairs were unambiguously identified in single emulsion reactions (Supporting Information Table 2, and the Materials and methods).
Table 1. Cross-confirmed pairs of TCR alpha and beta chains
An independent set of experiments revealed two major TCR alpha–beta pairs for sorted CMV (NLVPMVATV peptide) specific CD8+ T cells obtained from the same donor (Supporting Information Fig. 4, Supporting Information Table 3, and the Materials and methods). Both of these pairs, P1 and P4, were successfully identified in multiple emulsion reactions (Table 1). We also identified our internal control clone C1A in two of the three emulsion reactions, indicating the approximate limits of the method's sensitivity in its current implementation and demonstrating that 1000 clonal T cells per million PBMCs (approximately 0.2% of all T cells) are roughly sufficient to repeatedly detect a clone in parallel emulsion reactions. This sensitivity is adequate for applications where one is looking for highly represented TCRs, especially in sorted T-cell subpopulations of interest.
Additionally, one of the clones captured in a single emulsion reaction consisted of the alpha and beta variants CILDNNNDMRF/CASSLAPGATNEKLFF-2, a combination that was previously identified among sorted NLVPMVATV peptide specific cells, confirming the correctness of this pairing (P30, Supporting Information Tables 2 and 3). Strikingly, exactly the same amino acid CDR3 sequences for the TCR alpha and beta chains were earlier reported for an NLVPMVATV peptide specific T-cell clone from another patient.
Accuracy of pairing for the four known clones was generally high in all emulsion reactions, with an average ratio of native alpha–beta pairs to the most abundant mispairing of 334:1 (Fig. 2 and Supporting Information Table 4).
In summary, the independent confirmation of four TCR alpha–beta chain pairs identified in a complex PBMC mix (Table 1) allows us to conclude that the described emulsion-based approach for identifying immune receptor gene pairs works reliably. Having demonstrated the soundness of this concept, we recognize that substantial technical optimization will be necessary in order to efficiently capture chain pairs for rare clones and for the full diversity of TCR V gene segments. Relatively simple improvements, such as the use of advanced live-cell emulsion generators (for example, those manufactured by RainDance Technologies), should considerably expand the power of this method compared to manually generated emulsions, increasing the efficiency of all reactions by consistently generating standardized droplets of sufficient volume. We believe that future iterations of our platform will greatly facilitate analysis of adaptive immune responses and the production of TCR- and antibody-based tools for biology and medicine.
Materials and methods
PBMCs were isolated from the peripheral blood sample of a 45-year-old man (informed consent obtained) by Ficoll-Paque (Paneco) density gradient centrifugation. For each of the eight emulsion and two control reactions, 1 × 10⁶ PBMCs were preincubated overnight with 10 U/mL IL-2 (Roche), which increases TCR mRNA expression level (our observations, data not shown). Emulsions were generated at 25°C as described previously from 900 μL oil surfactant (2% ABIL EM 90 and 0.05% Triton X-100 in mineral oil) and 100 μL aqueous phase. The aqueous phase included PBMCs in 10 μL 150 mM NaCl, PCR mix (7.5 U Encyclo polymerase (Evrogen), 5 μL 10× Encyclo buffer, 5 U MMLV reverse transcriptase (Evrogen), 3.5 mM MgCl2, 1.4 mM DTT; in four emulsion reactions, it was substituted with single-stage RT-PCR mix, Evrogen), 0.5 g/L BSA, 30 U RNasin (Promega), 2.4 mM dNTP mix, and oligonucleotide primers. The primer set included, at 0.2 μM each, BC_synth_rev, AC_synth_rev, BV7_for, and AV_for_uni, as well as a mix of the following 13 TRAV primers (0.02 μM each): AV14_for, AV26–1/4_for, AV26–2/4_for, AV39/24_for, AV38–2_for, AV13–2_for, AV25–2_for, AV41/22_for, AV8_for, AV29/23_for, AV12–3_for, AV27_for, and AV5_for. All oligonucleotides used in the study and their functions are listed in Supporting Information Table 1.
Preliminary experiments demonstrated that MMLV reverse transcriptase survives heating at 65°C for 2 min, while T cells are efficiently destroyed by such incubation (data not shown). Thus, we employed this step in our emulsion reactions as a temperature shock to burst living cells and release RNA into the droplet volume (Fig. 1A).
For the sequential reactions shown in Fig. 1B, we employed the following temperature regimen: 30 min incubation at 50°C for specific cDNA priming and synthesis, followed by 27 cycles of PCR amplification and overlap extension at 94°C (10 s), 53°C (10 s), and 72°C (20 s). The two reverse primers specific to the constant regions of the TCR alpha and beta chains (BC_synth_rev and AC_synth_rev) served a triple function: specific priming of cDNA synthesis, PCR amplification of variable fragments, and PCR amplification of the overlap-extended (fused) chains within emulsion. The forward primers (named “for” in Supporting Information Table 1) were specific to the variable TCR alpha and beta gene segments and served a double function: PCR amplification of variable fragments and introduction of complementary sequences for the overlap-extension reaction. These primers were designed to anneal specifically at corresponding TCR V gene segments with a complementary region of sufficient length for the overlap-extension reaction.
EDTA (1 mM) was added immediately after the emulsion reaction to block polymerase activity. Emulsion product was extracted using diethyl ether/ethyl acetate/diethyl ether as described previously and purified using a PCR cleanup kit (Qiagen).
Postemulsion amplification with PCR suppression
Emulsion-derived DNA was further amplified in two sequential PCR amplification reactions (15 cycles/25-fold dilution/17–20 cycles) with nested primers specific to the constant regions of the TCR alpha and beta genes (Supporting Information Fig. 1 and Supporting Information Table 1). Blocking oligonucleotides were added at a concentration of 3.2 μM to suppress the nonoverlapped segments of TCR alpha and beta genes, as shown in Figure 1C. To minimize potential side effects in PCR, the 3′-ends of the blocking oligonucleotides were phosphate protected. Taq polymerase (Evrogen) was used at these steps to exclude proofreading activity that could alter the efficiency of PCR suppression. The temperature regimen for the first postemulsion amplification was 94°C for 30 s followed by 15 cycles of 94°C (10 s) and 72°C (30 s); the regimen for the second postemulsion amplification was 94°C for 30 s followed by 17–20 cycles of 94°C (10 s), 62°C (10 s), and 72°C (20 s). We analyzed the resulting library of paired TCR alpha/beta fragments (average length ∼400 bp) via paired-end 150 nt Illumina sequencing. Raw data will be deposited in the NCBI Sequence Read Archive (SRA).
(i) After 40–60 min of incubation in Encyclo PCR buffer (Evrogen), most PBMCs were positively stained with 5% trypan blue solution (PanEco, Russia), indicating that they were either dead or at least perforated. However, the whole RT-PCR reaction worked well starting from both centrifuged cells and from the supernatant after such incubation, indicating that although the RNA leaks into solution, the RNasin keeps it intact. To avoid contamination of leaked TCR alpha- and beta-encoding RNA between droplets, we therefore performed manual emulsification immediately after placing the cells into premixed PCR buffer. Manual emulsification itself takes only 2–4 min, while only a minor percentage of PBMCs was positively stained with trypan blue after being incubated in PCR buffer for 7 min.
(ii) To achieve both separation of T cells and completion of further RT-PCR reactions in individual volumes, appropriate droplet size and emulsion stability had to be maintained. To analyze these parameters, we generated two emulsions with the water phase stained by either green (FITC) or red (TAMRA) fluorescent oligonucleotides, mixed them, and subjected them to the RT-PCR temperature regimen. We analyzed these emulsion aliquots using fluorescence microscopy to calculate average droplet size, percentage of droplets of a volume sufficient to carry a T cell, and percentage of droplets that fused during thermocycling and thus fluoresced in both channels. On the basis of analysis of >4000 individual droplets, we estimated that a 50 μL reaction volume consists of approximately 2.5 × 10⁸ droplets with an average volume of 210 fL, of which approximately 9 × 10⁶ droplets have a volume greater than 500 fL, roughly sufficient to include a T cell and a required volume of reaction mix. These experiments also demonstrated minimal (<0.1%) fusion of droplets during thermocycling.
(iii) In another control experiment, we examined 5 × 10⁵ DAPI-stained PBMCs put in emulsion within 50 μL reaction buffer (colored by carboxyfluorescein), and observed cell-containing droplets but failed to identify droplets containing multiple cells among more than 50 000 droplets we manually screened.
(iv) Finally, we performed control experiments to test emulsion stability, using a mixture of two emulsions each containing oligonucleotide mix for amplification of either TCR alpha or beta chains. We performed cell disruption, RT, and inactivation of reverse transcriptase for the two emulsions separately, and then combined them. Thus, amplification and overlap-extension reactions were performed in the same tube; if fusion of cell-containing droplets was occurring, this should have led to generation of full, fused TCR alpha–beta product. As shown in Figure 1D, lane 6, this reaction performed poorly when the TCR alpha and beta RT reactions were performed separately, indicating negligible fusion between cell-containing droplets during amplification. In this control experiment, we intentionally joined emulsions only after RT; therefore, fusion of a small primer-containing droplet to a T-cell-containing droplet could not provide primers for cDNA synthesis of a second TCR chain at the RT stage. After heat inactivation of reverse transcriptase, fusion of small droplets with primers therefore did not lead to generation of a paired TCR alpha/beta product. Thus, this control was specifically for the background noise produced by the fusion of T-cell-containing droplets.
Based on the above analyses, we postulated that the ABIL EM90-based emulsion provides sufficient stability under the temperature regimen used, with a sufficient number of droplets larger than 500 fL potentially suitable for live-cell RT-PCR, and low toxicity for cells during the emulsification timetable we employed.
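The droplet estimates reported in control (ii) can be cross-checked with simple arithmetic. The sketch below uses only the values quoted above; the variable and function names are illustrative and not part of the original protocol.

```python
# Consistency check of the droplet estimates from control (ii).
# Input values are taken from the text; names are illustrative.

FL_PER_UL = 1e9  # 1 microlitre = 1e9 femtolitres

N_DROPLETS = 2.5e8       # estimated droplets per reaction
MEAN_VOLUME_FL = 210.0   # average droplet volume, fL
N_LARGE_DROPLETS = 9e6   # droplets with a volume above 500 fL


def total_volume_ul(n_droplets: float, mean_volume_fl: float) -> float:
    """Total aqueous volume (in microlitres) implied by count x mean volume."""
    return n_droplets * mean_volume_fl / FL_PER_UL


def large_droplet_fraction(n_large: float, n_droplets: float) -> float:
    """Fraction of all droplets that are large enough to hold a T cell."""
    return n_large / n_droplets


if __name__ == "__main__":
    print(f"implied aqueous volume: {total_volume_ul(N_DROPLETS, MEAN_VOLUME_FL):.1f} uL")
    print(f"fraction of droplets > 500 fL: "
          f"{large_droplet_fraction(N_LARGE_DROPLETS, N_DROPLETS):.3f}")
```

With these inputs the summed droplet volume comes to about 52.5 μL, close to the 50 μL reaction volume, and droplets above 500 fL make up about 3.6% of all droplets.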
Next generation sequencing data analysis
To extract CDR3 alpha and beta regions and correct PCR errors that introduce artificial sequence diversity and lead to biased read counts, raw Illumina data were analyzed as previously described. For each emulsion reaction, we obtained between 300 000 and 1.2 million high-quality, paired-end reads containing both TCR alpha and beta CDR3 sequence. To minimize noise from random events such as leakage through PCR suppression or overlap extension of incomplete PCR products during library amplification, we discarded low-count pairs (≤0.01% of total reads in each sample). We identified alpha–beta TCR CDR3 pairs that were significantly and selectively enriched within each emulsion reaction based on their read counts as follows.
First, we computed an expected value for the random pairing of each possible pair as:
#expected_reads = (#alpha_reads × #beta_reads) / #total_reads
where #alpha_reads and #beta_reads are total counts of reads with a given CDR3 alpha and CDR3 beta in a sample, respectively, and #total_reads is the total number of reads in a sample. To evaluate the strength of pairing (i.e. to check if the difference between observed and expected read count is significant), we computed p-values based on hypergeometric distribution, approximated by the worst of the two p-values obtained from the following binomial distributions:
where #pair_reads is the number of reads for each pair and f is a normalization factor, a reads-per-droplet parameter empirically chosen as f = 150, at which no significant events were observed in control reactions. Most of the significantly enriched events in a given emulsion had a very low p-value, and the final list of pairs all had p < 10⁻¹⁰ in corresponding emulsions.
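The enrichment test can be sketched in Python as follows. This is one possible reading, not the exact procedure: the binomial formulas are not reproduced in this text, so the sketch assumes that read counts are divided by f to approximate droplet-level events, that each chain is tested against the sample-wide frequency of its partner, and that the larger (worse) of the two upper-tail p-values is kept. All function names are hypothetical.

```python
import math

# Sketch under stated assumptions; the exact binomial parameterization
# used in the study is not given in the text and is assumed here.


def drop_low_count(pair_counts: dict, total_reads: int,
                   min_fraction: float = 1e-4) -> dict:
    """Discard pairs with <= 0.01% of total reads in the sample."""
    return {pair: n for pair, n in pair_counts.items()
            if n > min_fraction * total_reads}


def expected_random(alpha_reads: int, beta_reads: int, total_reads: int) -> float:
    """Expected read count of a pair if alpha and beta paired at random."""
    return alpha_reads * beta_reads / total_reads


def binom_sf(k: int, n: int, p: float) -> float:
    """Upper tail P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    tail = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        tail += math.exp(log_pmf)
    return min(tail, 1.0)


def pairing_p_value(pair_reads: int, alpha_reads: int, beta_reads: int,
                    total_reads: int, f: int = 150) -> float:
    """Worst of two binomial p-values (assumed form, see lead-in)."""
    k = round(pair_reads / f)  # pair events after reads-per-droplet scaling
    p_from_alpha = binom_sf(k, round(alpha_reads / f), beta_reads / total_reads)
    p_from_beta = binom_sf(k, round(beta_reads / f), alpha_reads / total_reads)
    return max(p_from_alpha, p_from_beta)


if __name__ == "__main__":
    # Hypothetical counts: every read of this alpha chain is paired with one beta.
    print(expected_random(3000, 6000, 600_000))
    print(pairing_p_value(3000, 3000, 6000, 600_000))
```

Taking the larger of the two p-values is the conservative choice: a pair is called enriched only if it is unlikely from the point of view of both chains.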
Next we verified the specificity of pairing based on a pairing factor (PF), described as:
For the identification of highly expanded T-cell clones, those pairs that failed to fulfill the criterion of PF > 40% in at least two emulsion reactions were filtered out. In those cases in which several T-cell clones carry the same TCR beta but different alpha chains, we iteratively computed this score starting from the most abundant pair, then removing it with all its reads, continuing with the next most abundant pair, and so on.
To extract reliable information concerning TCR chain pairing for rare clones, we selected pairs that were unambiguously identified in only one emulsion reaction, and for which alpha and beta partners were also exclusively observed in this particular emulsion reaction (no cross-pairing with other partners in any emulsion). We filtered out those pairs that were observed in more than one emulsion preparation or failed to fulfill the criterion of PF > 80% for the joined data of all emulsion reactions. This eliminated noise that could originate from cross-contamination of minor clones by more abundant clones at any stage of paired library generation, and yielded 732 identified TCR chain pairs, 92 pairs per emulsion on average (Supporting Information Table 2).
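The exclusivity part of the rare-clone filter can be sketched as below. The input layout (a mapping from an emulsion label to the set of enriched alpha/beta CDR3 pairs) and the function name are assumptions made for illustration, and the PF > 80% criterion is deliberately left out because the PF formula is not reproduced in this text.

```python
from collections import defaultdict

# Sketch of the exclusivity filter only; data layout and names are hypothetical.


def exclusive_pairs(enriched: dict) -> dict:
    """Return {(alpha, beta): emulsion} for pairs seen in exactly one emulsion
    whose alpha and beta chains have no other partner in any emulsion."""
    emulsions_of = defaultdict(set)  # pair  -> emulsions where it was enriched
    betas_of = defaultdict(set)      # alpha -> all beta partners observed
    alphas_of = defaultdict(set)     # beta  -> all alpha partners observed

    for emulsion, pairs in enriched.items():
        for alpha, beta in pairs:
            emulsions_of[(alpha, beta)].add(emulsion)
            betas_of[alpha].add(beta)
            alphas_of[beta].add(alpha)

    return {
        pair: next(iter(emulsions))
        for pair, emulsions in emulsions_of.items()
        if len(emulsions) == 1
        and len(betas_of[pair[0]]) == 1
        and len(alphas_of[pair[1]]) == 1
    }


if __name__ == "__main__":
    # Hypothetical labels standing in for CDR3 sequences.
    enriched = {
        "E1": {("A1", "B1"), ("A2", "B2")},
        "E2": {("A2", "B2"), ("A3", "B3"), ("A3", "B4")},
    }
    print(exclusive_pairs(enriched))
```

In this toy input, the pair A2/B2 is dropped because it recurs in two emulsions, and both A3 pairs are dropped because A3 has two different beta partners.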
Independent identification of the native TCR alpha/beta pairs
To identify native pairs of TCR alpha and beta chains within a human PBMC sample by an independent method, we performed the following set of experiments.
PBMCs were isolated from the peripheral blood samples by Ficoll-Paque (Paneco) density gradient centrifugation, incubated with CD8-specific antibody (clone YTC 182.20, Abcam), CD4-specific antibody (clone S3.5, Invitrogen), and HLA-A*02-CMV (NLVPMVATV peptide specific) pelimer (Sanquin) for 20 min at room temperature, then washed twice with PBS, and sorted on a MoFlo cell sorter (DakoCytomation). RNA was extracted using TRIZOL reagent (Life Technologies), and TCR alpha and beta chain libraries were generated using template-switch technology as previously described.
In the first sorting experiment, we cloned and sequenced 50 TCR alpha and 50 TCR beta chain genes. Three variants of TCR alpha and two variants of TCR beta complementarity determining regions were identified (CDR3, Supporting Information Table 3). Among these variants, the predominant CDR3 alpha CIRDNNNDMRF and CDR3 beta CASSLAPGATNEKLFF-1 were suggested as a probable functional pair, a notion confirmed by the identification of a nearly identical NLVPMVATV peptide specific T-cell clone in an independent study. Note that these data were obtained before the design of emulsion reactions. The TRBV-7 family of the CASSLAPGATNEKLFF-1 clone was thus intentionally included so that we initially had an intrinsic control.
To confirm the correctness of these TCR chain pairings, we again sorted NLVPMVATV peptide specific CD8+ T cells from the same patient. We sorted more than 100 000 cells with a 97% purity (Supporting Information Fig. 4), and then performed deep TCR alpha and beta profiling on an Illumina MiSeq (1/10 of a run) using a quantitative RNA-based technique for the generation of CDR3 libraries. More than 20 000 CDR3-containing sequences were obtained and analyzed for TCR alpha and beta libraries. This deep profiling revealed the same major TCR alpha and beta CDR3 variants as the previous cloning and sequencing analysis, but also provided sufficient statistics for rough frequency-based pairing. CDR3 variant CASSLAPGATNEKLFF-1 constituted approximately 90% of all NLVPMVATV peptide specific TCR beta sequences, nearly equal to the cumulative share of TCR alpha variants CIRDNNNDMRF and CAGYSGGGADGLTF. Thus, these data strongly suggested P1 and P4 TCR alpha–beta pairs (Supporting Information Table 3).
We are grateful to Ekaterina Barsova and Dmitriy Shagin (Evrogen JSC) for the valuable help with technical questions. This work was supported by the Molecular and Cell Biology program RAS, Russian Foundation for Basic Research (12-04-33139-mol-a, 12-04-33065-mol-a, 12-04-00229-а, 13-04-00998-a), Ministry of education and science of the Russian Federation (16.740.11.0748), Russian Federation President Grant for young scientists (МК-2382.2012.4), and European Regional Development Fund (CZ.1.05/1.1.00/02.0068).
Conflict of interest
Pairing technology is a property of Evrogen JSC, Moscow, Russia. S.D. is employed by Evrogen JSC. C.D.M. has a share in Evrogen JSC.