Deep Mining of Human Antibody Repertoires: Concepts, Methodologies, and Applications

antibody repertoire development, and strategies of functional antibody discovery from antibody repertoires. Finally, the pitfalls and opportunities in the deep mining of antibody repertoires are discussed.


Introduction
The human humoral immune response to antigens relies on the tremendous diversity of B cell receptors (BCR), the membrane-bound form of immunoglobulins (Ig). Key determinants of BCR diversity include 1) nearly random rearrangement of variable (V), diversity (D), and joining (J) genes in the immunoglobulin heavy chain variable regions (V H ), and rearrangement peripheral blood, [13] this population can only provide a narrow view of the adaptive immune response to antigens; therefore, lymph nodes, spleen, and red bone marrow are the preferred sources of B cells when accessible. V(D)J recombination occurs in the bone marrow, [14,15] which means that, for mature B cells in peripheral blood, either genomic DNA (gDNA) or mRNA can be used to characterize V(D)J sequences. In general, RNA-based approaches are adopted because they require only primers that anneal to the conserved J or constant (C) region, together with template-switching primers for 5′ RACE (rapid amplification of cDNA ends) (Figure 2). [16] This eliminates the biases that occur in DNA-based assays, which require degenerate V primer sets to amplify all V H families (Figure 2). [17,18] Moreover, during the reverse transcription of RNA to cDNA, unique molecular identifier (UMI) tags can be added to correct sequencing errors, [19] and indexing barcodes can be introduced for pooled high-throughput sequencing, so that multiple samples can be sequenced in one lane to reduce costs and eliminate sequencing errors between batches. [20] To provide a better understanding of antibody repertoires in the context of specific B cell populations, there has been increased application of single B cell sequencing, a commercially available RNA-based sequencing technology that uses the polyA tail of mRNA to capture BCR genes. [21,22] In some cases, DNA-based assays can be considered owing to the accessibility, stability, and low heterogeneity of gDNA, whereas immunoglobulin transcription varies dramatically between naive B cells and plasma cells. [23] The rapid development of Ig-Seq in recent years has established a variety of efficient sequencing strategies using different platforms. [6,20,24-33]
Once BCR genes have been sequenced, the raw sequencing data need to be filtered for high-quality reads based on the Phred quality score. [34] The sequence errors that arise from reverse transcription and PCR, or from sequencing artifacts, have been well studied and reviewed elsewhere. [20] Once clean data are obtained, the first step of processing is to assign each BCR gene to its reference gene. Unlike in conventional DNA-Seq or RNA-Seq, V(D)J reference genes consist of a series of similar sequences from different families, which makes accurate gene assignment difficult. Moreover, because of V(D)J junctional diversity and somatic hypermutation (SHM), the sequence alignment rate is usually less than 100%, and occasionally two or more germline genes are called simultaneously because their alignment rates are nearly identical.
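The quality-filtering step described above can be sketched as follows. This is a minimal illustration rather than the procedure of any cited pipeline: it assumes Phred+33 encoded quality strings, and the mean-quality threshold of Q30, the function names, and the toy reads are all hypothetical.

```python
def phred_scores(quality, offset=33):
    """Decode an ASCII quality string (Phred+33 encoding assumed) into scores."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q):
    """Phred definition: Q = -10 * log10(P), hence P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def passes_filter(quality, min_mean_q=30):
    """Keep a read when its mean Phred score reaches the (assumed) threshold."""
    scores = phred_scores(quality)
    return bool(scores) and sum(scores) / len(scores) >= min_mean_q


# Toy reads as (sequence, quality string); 'I' encodes Q40 and '#' encodes Q2.
reads = [("ACGT", "IIII"), ("ACGT", "I#I#"), ("ACGT", "####")]
kept = [read for read in reads if passes_filter(read[1])]
```

Real pipelines usually combine such a filter with per-base trimming and UMI-based consensus building, which are not shown here.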
Dozens of BCR gene annotation and high-level processing tools have been developed since 2013 [35] (https://b-t.cr/c/wiki/15), yet IMGT/V-QUEST and its higher-throughput version, IMGT/HighV-QUEST, [36] remain the most widely used. These are tools of the International Immunogenetics Information System (IMGT) (http://www.imgt.org), a global reference database in immunogenetics and immunoinformatics, which contains a well-annotated set of genomic V, D, and J genes and thus provides the reference for most BCR gene assignment tools. Another tool provided by IMGT, IMGT/StatClonotype, [37] runs in R and performs a pairwise comparison of IMGT/HighV-QUEST output data. Once data sets are uploaded, IMGT clonotypes are clustered automatically and V(D)J gene usage is tested statistically; IMGT/StatClonotype also generates tables and high-quality figures. It is time-saving and convenient for researchers without a bioinformatics background, but only one pair of data sets can be analyzed at a time. Moreover, the limit of 200 000 input sequences for IMGT/HighV-QUEST is often insufficient for the millions of reads produced by typical Ig-Seq. Because ultra-high-throughput sequence processing is so time-consuming, tools that have no sequence input limit and can run on personal servers are more widely used. [20,27] Finally, once the characteristics of antibody repertoires for different samples have been analyzed, the underlying biological or pathological meaning can be extracted, which is discussed in the section "Applications of Ig-Seq." Additionally, functional antibodies can be mined from antibody repertoires, which is discussed in the section "Strategies of Functional Antibody Discovery from Antibody Repertoires."
Figure 1. Overview of Ig-Seq pipeline. IgBLAST [7] and MiXCR [8] are two of the Ig-Seq processing tools.

Methodologies for HTS of Antibody Repertoires
HTS refers to the deep, high-throughput, in-parallel DNA sequencing technologies that were developed a few decades after the Sanger DNA sequencing method first emerged in 1977 [38] and that have had an enormous impact on the life sciences. Sequencing technologies are continually being improved to become faster, more efficient, and cheaper, and their history, development, and applications have been well summarized. [39-42] In early studies of the antibody repertoire, only Sanger sequencing was available to delineate antibody sequences from hundreds of B cells. [43] Such low throughput provides only a limited snapshot of the actual diversity of antibody repertoires. Compared to Sanger sequencing, next-generation sequencing (NGS) can provide a much broader picture of the antibody repertoire. The first NGS platform, 454 GS FLX, generated up to 10^5-10^6 sequences with a read length of 400-600 bp, which is sufficient to cover the full V H or V L region (Figure 2). [44] This platform was instrumental in the initial exploration of the human antibody repertoire. Ion Torrent technology was developed by the inventors of 454 sequencing, reaches a read length of 400 bp with 10^6-10^8 reads per run, and has also been applied to investigations of the full-length variable regions of antibody repertoires. [45,46] Over the past ten years, the Illumina sequencing platforms have matured, providing a sequencing depth of around 10^6-10^10 reads, [47] thus improving the ability to study antibody diversity. Because read lengths were limited to no more than 300 bp, initially only the complementarity-determining region 3 (CDR3) could be sequenced on most Illumina platforms [48-50] (Figure 2). Nevertheless, Illumina platforms such as HiSeq, MiSeq, and NovaSeq 6000 now make it possible to cover the full length of the variable region, owing to maximum paired-end read lengths of 2 × 250 to 2 × 300 bp (Figure 2).
In addition, third-generation sequencing (also known as single-molecule sequencing) platforms, such as single-molecule real-time (SMRT) sequencing developed by Pacific Biosciences and nanopore sequencing offered by Oxford Nanopore, provide average read lengths of 10-15 kb, [51] which allows ultra-long antibody sequences to be explored. [52] Each of these platforms has its own advantages for Ig-Seq, and sequencing strategies tailored to different purposes have flourished (Table 1). [33,53,54]
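The relationship between read length and variable-region coverage discussed above comes down to simple arithmetic: two mates of length L cover an amplicon of length A with an overlap of 2L - A bases. The sketch below makes this explicit. The minimum overlap of 20 bases and the amplicon lengths are illustrative assumptions; real amplicons also carry primers, barcodes, and part of the constant region.

```python
def paired_end_overlap(read_length, amplicon_length):
    """Bases covered by both mates of a paired-end read (negative means a gap)."""
    return 2 * read_length - amplicon_length


def can_merge(read_length, amplicon_length, min_overlap=20):
    """True when the mates overlap enough to be merged into one full-length read.

    min_overlap is an assumed value; merging tools expose it as a parameter.
    """
    return paired_end_overlap(read_length, amplicon_length) >= min_overlap


# An amplicon of roughly 400 bp under three paired-end configurations.
for length in (150, 250, 300):
    print(length, paired_end_overlap(length, 400), can_merge(length, 400))
```

Under these assumptions a 2 × 150 bp run leaves a gap in the middle of a 400 bp amplicon, whereas 2 × 250 bp and 2 × 300 bp runs overlap comfortably.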

V H :V L Paired Ig-Seq
The IGHV gene is located on chromosome 14, [53] and the IGKV and IGLV genes are located on chromosome 2 and chromosome 22, respectively, [54,63] which makes it difficult to acquire V H :V L paired amplicons. As a result, antibody repertoire research usually concentrates solely on IGHV diversity. Initial studies in V H :V L paired antibody repertoire sequencing relied on single B cell cloning and Sanger sequencing of the individual V H and V L genes in 96-well plates, [64] which enabled the isolation of pathogen-neutralizing antibodies. [65] However, this process is inherently low-throughput and expensive, and it yields a very limited set of antibody sequences. Wan et al. [9] took the first step toward V H :V L paired HTS. Rabbits were immunized with human progesterone receptor A/B (PR A/B) peptides, and 80 000 V region sequences from the immunized rabbits' antibody repertoires were sequenced using the Roche 454 platform. This generated a reference database against which protein sequences could be searched for immunoglobulin V region matches. Once antigen-enriched antibodies had been identified with high confidence by liquid chromatography-mass spectrometry (LC-MS/MS), their heavy and light chain sequences were expressed combinatorially in a matrix. Although functional monoclonal antibodies were obtained, the native pairing of IGH and IGK/IGL was disrupted, so recovering functional V H :V L pairs by random combination entailed a large workload and a low probability of success.
To address the challenges in acquiring natively paired heavy and light chains, Busse et al. [66] combined single B cell V H and V L gene amplicons using a 2D, bar-coded primer matrix. In brief, forward and reverse primers that incorporated unique identification tags were used for single B cell PCR, and, according to the unique tag combinations, each IGH and IGK/IGL sequence can be traced back to its original B cell. Nevertheless, using this approach for high-throughput Ig-Seq requires massive numbers of bar-coded primers and subsequent PCR reactions.
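The logic of a two-dimensional tag matrix can be sketched as below: a row tag and a column tag together identify one well, so rows + columns tagged primers address rows × columns wells, and chains that share a tag combination are taken to come from the same B cell. The tag names, the record layout, and the rule of keeping one sequence per locus per well are assumptions made for illustration, not details of the published protocol.

```python
from collections import defaultdict


def primers_needed(n_rows, n_cols):
    """Number of distinct tagged primers for an n_rows x n_cols matrix."""
    return n_rows + n_cols


def wells_addressable(n_rows, n_cols):
    """Number of wells (single cells) that the tag combinations can distinguish."""
    return n_rows * n_cols


def pair_chains_by_well(reads):
    """Pair heavy and light chains that carry the same (row_tag, col_tag).

    reads: iterable of (row_tag, col_tag, locus, sequence) tuples, where locus
    is 'IGH', 'IGK', or 'IGL'. Only one sequence per locus is kept per well.
    """
    wells = defaultdict(dict)
    for row_tag, col_tag, locus, sequence in reads:
        wells[(row_tag, col_tag)][locus] = sequence
    pairs = {}
    for well, chains in wells.items():
        heavy = chains.get("IGH")
        light = chains.get("IGK") or chains.get("IGL")
        if heavy and light:
            pairs[well] = (heavy, light)
    return pairs


reads = [
    ("R1", "C1", "IGH", "heavy_a"),
    ("R1", "C1", "IGK", "kappa_a"),
    ("R2", "C1", "IGH", "heavy_b"),  # no light chain recovered for this well
]
pairs = pair_chains_by_well(reads)
```

For a 16 × 24 layout, 40 tagged primers distinguish 384 wells, which also shows why scaling this scheme to millions of cells becomes impractical.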
An additional approach for sequencing native V H :V L pairs relies on single B cell lysis and mRNA capture in a high-density microwell plate. [67] In this study, single B cells were deposited into microwells and lysed in situ, mRNA was immediately captured on poly-dT magnetic beads, and V H :V L paired amplicons were generated via reverse transcription and emulsion V H :V L linkage PCR. Finally, about 10% unique V H :V L pairs from 7 × 10^4 B cells were sequenced on the Illumina MiSeq 2 × 250 bp platform. This approach opened the door to automated, native V H :V L paired HTS. As an "upgraded version" of this method, the Georgiou laboratory has recently developed an improved V H :V L paired HTS method. [68] Using an axisymmetric flow-focusing device, millions of B cells were separated into microdroplets and lysed, and single-cell mRNA was then captured to produce V H :V L paired amplicons by overlap extension PCR. The Illumina MiSeq 2 × 300 bp platform was used for sequencing. With this method, the throughput of V H :V L paired sequencing is improved to an unprecedented degree. Conveniently, the restriction sites contained in the 32 bp linkers between V H and V L make the amplicons easy to subclone into an antigen-binding fragment (Fab) expression vector. Therefore, this method serves both repertoire sequencing and antigen-specific antibody discovery. Given that the antibody light chain is indispensable for humoral immune function, natively paired V H and V L genes can be expected to become standard in most antibody repertoire studies.
SMART-Seq2 is a plate-based single-cell sequencing technology suited to V H :V L paired sequencing, which generates full-length V H and V L cDNA libraries from each individual cell. [69] In this method, single cells are sorted and lysed in microtiter plates, where mRNA is isolated and reverse transcribed. Next, a DNA tagmentation reaction and adapter ligation are performed, and a subsequent enrichment PCR is used to construct sequencing libraries. A dual-index strategy developed by Illumina allows up to 96 samples to be pooled and sequenced on a single lane of an Illumina sequencer. [70] However, the lack of early multiplexing in this method severely limits the throughput of V H :V L paired sequencing, and the additional PCR purification step may lead to loss of material.
High-throughput single-cell BCR sequencing (BCR-Seq) based on 10X Genomics Chromium, a droplet-based technique, can obtain automatically paired V H and V L sequences from tens of thousands of single B cells in one run and has performed very well in Ig-Seq. [63-72] After B cell sorting, each input cell is encapsulated with a 10X barcoded gel bead in a single partition by the 10X Chromium Controller. Within each nanoliter-scale partition, mRNA from the cell undergoes reverse transcription to generate cDNA, and all cDNA from an individual cell shares a common 10X barcode, thus giving access to paired V H and V L sequences (Figure 3). The resulting 10X barcoded library is compatible with standard NGS short-read sequencing on Illumina sequencers, and the full-length V(D)J sequences are assembled. [73] High-throughput single-cell RNA-Seq methods allow paired V H and V L sequences derived from thousands of cells to be captured and sequenced quickly, but they only generate short reads from one end of a cDNA template, which limits the reconstruction of highly diverse full-length antibody variable regions. [74] A novel method, termed Repertoire and Gene Expression by Sequencing (RAGE-Seq), was developed to overcome this limitation. [71] In brief, droplet-based single-cell BCR-Seq is used to generate a barcoded cDNA library, which is split and subjected in parallel to short-read sequencing for 3′ expression profiling and to targeted capture using custom probes followed by long-read sequencing. The short-read sequencing is used to generate highly accurate cell-barcode sequences, which permit demultiplexing of the long-read data. Demultiplexed long reads are subjected to de novo assembly and error correction to generate full-length V H and V L sequences. Transcriptome profiles generated from short-read sequencing can then be linked to the antigen-receptor sequence of each individual cell.
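Barcode-based pairing, as described above, amounts to grouping assembled contigs by their shared cell barcode. The following sketch assumes a simplified record layout (the field names `barcode`, `chain`, `umis`, and `sequence` are hypothetical) and a simple rule that keeps the heavy and light contig with the most UMIs per cell; it is not the algorithm of any vendor software.

```python
from collections import defaultdict


def pair_by_cell_barcode(contigs):
    """Return {barcode: (heavy_sequence, light_sequence)} for cells with both chains.

    contigs: iterable of dicts with keys 'barcode', 'chain' ('IGH', 'IGK', or
    'IGL'), 'umis' (supporting UMI count), and 'sequence'.
    """
    best = defaultdict(dict)
    for contig in contigs:
        kind = "heavy" if contig["chain"] == "IGH" else "light"
        current = best[contig["barcode"]].get(kind)
        # Keep the best-supported contig of each kind within a cell.
        if current is None or contig["umis"] > current["umis"]:
            best[contig["barcode"]][kind] = contig
    return {
        barcode: (chains["heavy"]["sequence"], chains["light"]["sequence"])
        for barcode, chains in best.items()
        if "heavy" in chains and "light" in chains
    }


contigs = [
    {"barcode": "AAAC", "chain": "IGH", "umis": 12, "sequence": "heavy_a"},
    {"barcode": "AAAC", "chain": "IGK", "umis": 30, "sequence": "kappa_a"},
    {"barcode": "AAAC", "chain": "IGL", "umis": 2, "sequence": "lambda_a"},
    {"barcode": "CCCG", "chain": "IGH", "umis": 7, "sequence": "heavy_b"},
]
paired = pair_by_cell_barcode(contigs)
```

Keeping only the best-supported chain is a deliberate simplification: it hides droplets that contain two cells, and B cells that genuinely express more than one chain, so a production analysis would flag such barcodes instead of silently resolving them.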

Antigen-Specific Ig-Seq
Ig-Seq is a powerful tool for interrogating immune responses to infection and vaccination as reflected by the total antibody repertoire, but it provides limited information about the antigen specificity of the sequenced BCRs. Identifying antigen-enriched antibodies by combining LC-MS/MS with Ig-Seq, as described above, [9] is one feasible route to sequencing antigen-specific BCRs. Additionally, recent technical improvements make single-cell sequencing an invaluable tool for describing complex cell populations, [75-77] T cells, [78,79] or B cells, [80,81] which facilitates the bulk antibody sequencing of antigen-enriched B cells. As discussed above, advances in single-cell sequencing for bulk V H :V L paired sequencing have proved invaluable in the deep mining of antibody repertoires. Because the limit on input cell number of single-cell sequencing platforms necessitates presorting of the millions of sampled B cells, enrichment of antigen-specific B cells before sequencing is widely adopted. [28] For example, using fluorescence-labeled OVA (ovalbumin) conjugated to a secondary antibody, antigen-specific B cells from the lymph nodes of immunized animals were positively selected for OVA binding by FACS, and V H and V L were then sequenced using the 10X Genomics platform and natively paired according to their identical UMIs. [21] More recently, antigen-specific single-cell Ig-Seq was successfully applied to the development of antibodies against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). [62] In this study, biotinylated SARS-CoV-2 RBD (receptor-binding domain) recombinant proteins were coupled to streptavidin dynabeads for antigen-specific B cell enrichment. Using the 10X Genomics platform, similar procedures were carried out to determine the paired V H and V L sequences of the selected cells, and potent neutralizing antibodies were finally identified (Figure 3).
Using a 10X Genomics Chromium Controller platform, Setliff et al. [82] have developed an antigen-specific Ig-Seq method, LIBRA-Seq (linking B cell receptor to antigen specificity through sequencing). First, fluorescently labeled antigens, each carrying a distinctive DNA barcode, were used to sort antigen-positive B cells by fluorescence-activated cell sorting (FACS). Then, sorted single B cells were lysed, and cDNA was reverse transcribed so that all cDNA from one cell shared a common barcode. After the barcoded cDNA from all cells was pooled, target V H and V L libraries were enriched using PCR, and the libraries were sequenced (Figure 3). Finally, the antigen specificity of each paired V H and V L sequence was determined from its associated antigen barcodes. This novel assay enables the detection of monoclonal antibodies specific to panels of diverse antigens, thus facilitating the rapid identification of cross-reactive antibodies and cost-effective antibody development for pathogens with multiple antigens. The LIBRA-Seq technology has been commercialized and will potentially be a practical tool to delineate antigen-specific antibody repertoires that may serve as therapeutic agents or vaccine templates. As novel antigen-specific sequencing strategies generate more precise antibody repertoires, subsequent applications of this technology can be expected to grow.
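The final step, assigning antigen specificity from antigen barcodes, can be illustrated with a deliberately simplified score. The sketch below applies a centered log-ratio transform to the antigen-barcode UMI counts of each cell and calls every antigen whose score passes a threshold. The pseudocount, the threshold of 1.0, the antigen names, and the function names are assumptions; the published LIBRA-Seq scoring may differ in its normalization.

```python
import math


def centered_log_ratio(counts, pseudocount=1.0):
    """Centered log-ratio of one cell's antigen-barcode UMI counts."""
    logs = {antigen: math.log(n + pseudocount) for antigen, n in counts.items()}
    mean_log = sum(logs.values()) / len(logs)
    return {antigen: value - mean_log for antigen, value in logs.items()}


def assign_specificity(cell_counts, min_score=1.0):
    """Return {cell_barcode: sorted list of antigens scoring >= min_score}."""
    calls = {}
    for barcode, counts in cell_counts.items():
        scores = centered_log_ratio(counts)
        calls[barcode] = sorted(a for a, s in scores.items() if s >= min_score)
    return calls


cell_counts = {
    "cell_1": {"HIV_Env": 99, "HA": 0, "OVA": 0},   # single specificity
    "cell_2": {"HIV_Env": 99, "HA": 99, "OVA": 0},  # cross-reactive candidate
    "cell_3": {"HIV_Env": 0, "HA": 0, "OVA": 0},    # no antigen signal
}
calls = assign_specificity(cell_counts)
```

Because the transform is relative within a cell, a cell that binds every antigen in the panel equally would receive no call under this rule, which is one reason a real analysis would also normalize across cells.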

Third-Generation Sequencing
Generally, a V H or V L gene with an approximate size of 400 bp can be covered by NGS platforms. However, challenges remain for conventional NGS of the single-chain variable fragment (scFv) and Fab from combinatorial libraries because of their lengths of approximately 750 and 1700 bp, respectively. In this regard, third-generation sequencing platforms are needed to explore long antibody sequences. In the most widely accepted and used SMRT sequencing platform, a sample of DNA or cDNA loaded onto a chip diffuses into a zero-mode waveguide (ZMW), [83] which allows the observation of only one nucleotide at a time. The four dNTPs are fluorescently labeled; as each nucleotide interacts with the DNA polymerase active site, fluorescence is emitted and detected in the ZMW. Finally, the fluorescent signals are translated into sequences by base calling. [84] Using third-generation sequencing, a full-length phage-displayed scFv antibody repertoire was characterized, [85] novel exons and splice variants in the human antibody heavy chain were identified, [52] and ultralong CDR3s in bovine antibody heavy chains were detected. [86] Compared with NGS, the unique mechanism of third-generation sequencing gives it a higher cost per base and a higher sequencing error rate. However, as a PCR-free sequencing strategy, it excludes the bias for some V H and V L families caused by PCR amplification. Besides, a thorough understanding of antibody repertoires is best achieved through complete de novo genome assembly, for which it is far superior to the 10X Genomics platform. [87] As a demonstration of its utility, third-generation sequencing technology was used to isolate and characterize human monoclonal antibodies that recognize Plasmodium falciparum antigens, and it was found that the gene of human LAIR1 (Leukocyte Associated Immunoglobulin Like Receptor 1) was inserted into the V H ; this insertion is both necessary and sufficient for antibody function. [88] This finding suggests that third-generation sequencing with longer read lengths may be necessary to characterize certain antibody repertoires.
Figure 3. Schematic of the antigen-specific Ig-Seq. PBMCs are isolated from peripheral blood collected from humans. To obtain antigen-specific B cells, two methods are applied. 1) Antigen pulldown: streptavidin dynabeads are mixed with the biotinylated antigen. After incubation, the complex is added directly to the PBMCs, and the mixture is then placed on a magnetic rack to pull down antigen-specific B cells. 2) FACS: biotinylated antigen is mixed with fluorophore-labeled streptavidin, and the complex is then mixed with PBMCs and sorted using FACS. In this method, more specific B cell subsets can be sorted by staining with a variety of cell markers, and multi-antigen-specific B cells can also be obtained by attaching DNA barcodes to each biotinylated antigen. After cell lysis, the mRNAs hybridized to the primers on the microparticle surface are reverse transcribed, each with a unique UMI from the beads. The microfluidic barcoded V H and V L cDNAs are pooled and sequenced.

Applications of Ig-Seq
Over the last decade, Ig-Seq has revolutionized both basic and clinical research on the human immune repertoire. In previous reviews, the impacts of vaccination, infectious diseases, autoimmune diseases, and cancer [6,31,32,89] on antibody repertoires have been discussed in depth. Herein, potential applications of Ig-Seq are reviewed, including antibody diversity measurement, strategies of functional antibody discovery from antibody repertoires, and some novel findings on antibody repertoire development.

Diversity Measurement
It is well known that the diversity of antibody libraries is huge; theoretically, its size in a single individual is estimated to be between 10^11 and 10^13. [90,91] Glanville et al. [90] performed human Ig-Seq and generated 1.9 × 10^5 high-quality reads; combined with a capture-recapture method, the total diversity of the IgM antibody library was determined to be at least 3.5 × 10^10. Limited by the sequencing depth, this value is possibly far from the actual diversity of the antibody repertoire. In 2019, a study carried out by Briney et al. [92] yielded an ultra-large antibody repertoire data set of 2.9 × 10^9 raw reads from ten human subjects. Two estimators, Chao 2 [93] and Recon, [94] were selected for the intercomparable estimation of antibody repertoire diversity. Approximately 5 × 10^9 unique heavy-chain clonotypes (sequences with identical heavy chain CDR3 and V, J gene usage), 1 × 10^11 unique heavy-chain sequences, and a paired antibody diversity of 10^16-10^18 were estimated. Using the same estimation strategy, Soto et al. [89] performed Ig-Seq in three individuals, yielding 1.4 × 10^9, 1.5 × 10^9, and 1.3 × 10^9 raw reads, respectively. The experiments revealed that the number of circulating heavy-chain clonotypes in each individual is in the range of 0.9-1.7 × 10^7, two to three orders of magnitude lower than in the previous study, which can be explained by the further finding that only 1-6% of heavy-chain clonotypes were shared between two subjects, and 0.3% among all three subjects. In this regard, the size of the circulating repertoire estimated from the ten-subject pool by Briney et al. is probably overestimated.
Taken together, these ultra-deep sequencing studies currently provide the most solid diversity measurements, which broaden our knowledge of the authentic antibody repertoire size and guide the design of future research.
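To make the estimation strategy concrete, the sketch below groups sequences into clonotypes using the definition given above (identical heavy chain CDR3 and V, J gene usage) and applies the bias-corrected form of the incidence-based Chao 2 estimator, S = S_obs + ((m - 1) / m) × Q1 (Q1 - 1) / (2 (Q2 + 1)), where m is the number of sampling units and Q1 and Q2 are the numbers of clonotypes seen in exactly one and exactly two units. The record field names and the toy data are hypothetical, and the cited studies may have used different variants or implementations.

```python
from collections import Counter


def clonotype(record):
    """Clonotype key: identical V gene, J gene, and heavy chain CDR3."""
    return (record["v_gene"], record["j_gene"], record["cdr3"])


def chao2(samples):
    """Bias-corrected Chao 2 richness estimate.

    samples: list of sets of clonotypes, one set per sampling unit
    (for example, one set per technical replicate library).
    """
    m = len(samples)
    incidence = Counter(c for sample in samples for c in sample)
    s_obs = len(incidence)
    q1 = sum(1 for n in incidence.values() if n == 1)  # seen in one unit
    q2 = sum(1 for n in incidence.values() if n == 2)  # seen in two units
    return s_obs + ((m - 1) / m) * q1 * (q1 - 1) / (2 * (q2 + 1))


records = [
    {"v_gene": "IGHV3-23", "j_gene": "IGHJ4", "cdr3": "CARDGYW"},
    {"v_gene": "IGHV3-23", "j_gene": "IGHJ4", "cdr3": "CARDGYW"},
    {"v_gene": "IGHV1-69", "j_gene": "IGHJ6", "cdr3": "CARGGMDVW"},
]
unique_clonotypes = {clonotype(r) for r in records}

# Three toy replicates in which clonotypes are abbreviated to single letters.
replicates = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "e"}]
estimate = chao2(replicates)
```

In the toy replicates, five clonotypes are observed, three of them in a single replicate and one in two replicates, so the estimator adds one unseen clonotype. Many singletons relative to doubletons indicate that sampling is far from saturation, which is why the estimate depends so strongly on sequencing depth.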

Comparison among Different Populations
Human neonates have not developed a mature immune system to respond to various antigens, [95] and elderly individuals exhibit a decline in immune function; [96] thus, age is believed to be a key factor in humoral immunity. The molecular mechanisms underlying these changes in immune function can be partially revealed by Ig-Seq. In elderly subjects, an increased number of B cells with long heavy chain CDR3 regions, accumulation of highly mutated IgM and IgG, and persistent clonal B cell populations were observed in the blood. [97] Interestingly, the finding that V, D, and J gene usage is similar in young and elderly individuals is inconsistent with a study of the memory B cell IgM repertoire. [98] In that study, the elderly individuals demonstrated higher usage of IGHV3 and IGHJ4 and lower IGHD1-IGHJ4 recombination than the young individuals. In neonates, the V, D, and J gene usage in the IgM repertoire showed no difference compared with adults, but the heavy chain CDR3 was shorter than that in the healthy human repertoire. Notably, the most prominent difference was the occurrence of an N2 addition (neonates: 64.87% vs adults: 85.69%). [99] Meanwhile, the identification of relatively high frequencies of shared antibody clonotypes between adults and neonates indicated that these clonotypes have persisted for decades. [89]
Gender disparity in immune responses is a well-studied phenomenon, and sex hormone-dependent activation of T cells was shown to play an important role. [100] In B cells, a total of 24 IGHV genes, 9 IGHD genes, and 5 IGHJ genes showed sex-based differences, but no bias was found in V(D)J gene recombination. [101]
Ig-Seq has also made it possible to uncover the extent to which genetic background affects the characteristics of an individual's antibody repertoire. In monozygotic twins, the overall antibody repertoire was found to be profoundly determined by the germline genome, as both IgM and IgG repertoires showed high similarity between twins. In response to an antigenic stimulus, however, antibodies showed individual-specific selection. [102] Rubelt et al. [103] further confirmed that heritable mechanisms affect the antibody repertoire as well as the T cell receptor (TCR) repertoire.
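Comparisons of gene usage between populations, such as those summarized above, start from per-repertoire usage frequencies. The following sketch computes the frequency of each gene in two lists of gene calls and reports the difference. The gene calls are toy data and the function names are hypothetical; a real comparison would aggregate over many donors and apply a statistical test, which is omitted here.

```python
from collections import Counter


def gene_usage(gene_calls):
    """Fraction of sequences assigned to each gene in one repertoire."""
    counts = Counter(gene_calls)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


def usage_difference(group_a, group_b):
    """Per-gene usage in group_a minus usage in group_b, over all genes seen."""
    usage_a, usage_b = gene_usage(group_a), gene_usage(group_b)
    genes = sorted(set(usage_a) | set(usage_b))
    return {gene: usage_a.get(gene, 0.0) - usage_b.get(gene, 0.0) for gene in genes}


elderly = ["IGHV3", "IGHV3", "IGHV1", "IGHV4"]
young = ["IGHV3", "IGHV1", "IGHV1", "IGHV4"]
difference = usage_difference(elderly, young)
```

A positive value means the gene is used more often in the first group; in the toy data IGHV3 is enriched in the first list and IGHV1 in the second.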

Novel Mechanisms in Antibody Repertoire Development Uncovered by Ig-Seq
Susumu Tonegawa discovered that V(D)J recombination is the genetic mechanism behind the diversity of human antibodies. [104] Subsequently, other mechanisms that contribute to the formation of antibody repertoires were discovered, such as somatic hypermutation, class switch recombination, junctional diversity, and the 12/23 rule. [2-4,105] Recently, new mechanisms revealed how B cells exploit cohesin-mediated chromatin loop extrusion to create new antibodies via V(D)J recombination [106] and class switch recombination. [107] Our lab has found a significantly higher occurrence and longer length of the N2 insertion in adults compared to neonates, suggesting a potential molecular mechanism behind the differences in antibody repertoire diversity as humans age. [99] Furthermore, in addition to the common V(D)J recombination, V(DD)J recombinations have been discovered and were detected at a frequency of 1 in 800 circulating B cells. [108] The ultra-long heavy chain CDR3s resulting from V(DD)J recombination expand the diversity of antibody repertoires and can now be accurately aligned. [109] Single-cell Ig-Seq has also revealed that individual B cells can express more than one antibody. [110]

Strategies of Functional Antibody Discovery from Antibody Repertoires
In addition to the hybridoma technique, B cell immortalization, and microneutralization, [111] large-scale screening of libraries by phage, ribosome, yeast, or bacterial display is widely applied in antibody isolation. [112,113,61,114] Using these technologies, we and others have successfully developed functional antibodies against various targets. [115-117] For instance, human single-domain neutralizing antibodies targeting diverse epitopes of SARS-CoV-2 were recently identified using phage display-based technology. [118] Flow cytometry-based single-cell sorting and cloning is another effective tool and has also been successfully employed in the development of neutralizing antibodies against SARS-CoV-2. [119] These traditional techniques are reliable but time-consuming and labor-intensive. Also, single-cell sorting and cloning are limited to the analysis of a small fraction of the human antibody repertoire. In contrast, Ig-Seq is able to capture the entire set of authentic human antibodies in circulating B cells. Different approaches that exploit antibody repertoire sequencing analysis for functional antibody isolation directly from humans or animals, without library screening, have been developed (Table 2).
Even without rounds of selection for antigen binding and without V H and V L pairing information, bulk BCR sequencing has been used to identify many functional antibodies. After the antibody variable region genes are sequenced, the most abundant V H and V L sequences are paired, and the recombinant antibodies have been verified to bind their antigens with nanomolar affinity. [56] Combined with phage display technology, V H clones picked from the antibody repertoire can be paired with the corresponding V L library to yield antibodies with potent affinity against seasonal influenza. [120] Based upon sequence similarity to an HIV-1 broadly neutralizing antibody, [121] V H and V L genes generated by bulk BCR sequencing that fell within matching branches of the phylogenetic trees could be paired to yield novel HIV-1 broadly neutralizing antibodies. [122] Setliff et al. [57] showed that a public antibody clonotype shared among the naïve V H IgG repertoires and antigen-specific V H :V L paired IgG repertoires encodes an antibody with potent binding activity to gp120.
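The abundance-based pairing mentioned above can be sketched as follows: V H and V L sequences are ranked separately by read count, and sequences of equal rank are paired. This is only an illustration of the idea under the assumption that the heavy and light chains of an expanded clone have similar rank; the cutoff `top_n`, the tie-breaking rule, and the toy reads are hypothetical, and every candidate pair would still need experimental validation.

```python
from collections import Counter


def rank_by_abundance(sequences):
    """Unique sequences ordered by descending read count (ties alphabetical)."""
    counts = Counter(sequences)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [sequence for sequence, _ in ordered]


def pair_by_rank(vh_reads, vl_reads, top_n=3):
    """Pair the i-th most abundant V H with the i-th most abundant V L."""
    heavy = rank_by_abundance(vh_reads)[:top_n]
    light = rank_by_abundance(vl_reads)[:top_n]
    return list(zip(heavy, light))


vh_reads = ["H1"] * 5 + ["H2"] * 3 + ["H3"]
vl_reads = ["L2"] * 4 + ["L1"] * 6
candidates = pair_by_rank(vh_reads, vl_reads)
```

Because `zip` stops at the shorter list, unmatched chains such as H3 in the toy data are simply left unpaired.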
Customized antibody sequence databases produced by Ig-Seq have enabled the proteomic identification of antibodies from serum. In this manner, high-affinity antibodies against human PR A/B, [9] hepatitis B virus (HBV), [58] and tetanus toxoid (TT), [59,67] and neutralizing antibodies against HIV-1 [68] and human norovirus [60] were identified accurately and efficiently. With bulk BCR sequencing, vaccine-responsive V H clones that expanded dramatically after vaccination could be identified; together with single-cell BCR sequencing, which preserves natively paired V L sequences, broad anti-influenza antibodies were identified in adults vaccinated against seasonal influenza. [27] In more recent studies, antigen-specific single B cell sequencing has been adopted for the identification of antibodies. Using this method, HIV- and influenza-specific antibodies were identified concurrently in HIV-infected subjects, [82] potent OVA-binding antibodies were identified in OVA-immunized animals, [2] and potent SARS-CoV-2 neutralizing antibodies were identified in recovered patients. [62]
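The proteomic identification step can be illustrated by matching observed peptides against an in-silico tryptic digest of the Ig-Seq-derived sequence database. The sketch below uses the common trypsin rule (cleavage after K or R unless the next residue is P) and exact string matching. The sequences, names, and minimum peptide length are hypothetical, and real search engines match mass spectra with scoring rather than comparing strings.

```python
import re


def tryptic_peptides(protein, min_length=5):
    """In-silico digest: cleave after K or R, except when followed by P."""
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return [piece for piece in pieces if len(piece) >= min_length]


def match_peptides(observed, database):
    """Return {antibody_name: sorted matched peptides} for database hits.

    observed: iterable of peptide strings identified by LC-MS/MS.
    database: dict mapping a name to an amino acid sequence from Ig-Seq.
    """
    observed = set(observed)
    hits = {}
    for name, protein in database.items():
        matched = sorted(set(tryptic_peptides(protein)) & observed)
        if matched:
            hits[name] = matched
    return hits


database = {
    "ab1": "AVYYCARDGKSSGWYRPFDYWGK",
    "ab2": "AVYYCAKDLRGGSYFDYWGK",
}
hits = match_peptides(["SSGWYRPFDYWGK", "NOTPRESENT"], database)
```

Peptides that span the CDR3 are the most informative, since framework peptides are often shared by many antibodies in the database.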

Conclusion
The humoral immune system has evolved to encode an astonishing number of antibodies that collectively comprise the antibody repertoire and can recognize virtually any antigen. Ultradeep Ig-Seq can now provide a full picture of the diversity of the human circulating antibody repertoire. [89] In the last decade, Ig-Seq technologies and related bioinformatics analysis tools have been developed at a breakneck pace and can now provide fast, unbiased, and almost error-free antibody repertoires, reshaping our understanding of many important aspects of humoral immunology and advancing antibody discovery and vaccine development. However, major gaps remain in the Ig-Seq pipeline: 1) access to human tissue-derived B cells is limited, and antibody repertoires derived from peripheral blood lack representativeness, [27] a problem that is currently insurmountable; 2) native pairing of antibody chains remains low-throughput, time-consuming, and costly, and is urgently in need of technological innovation; [26] and 3) importantly, the missing relationship between antibody sequence and function currently diminishes the value of Ig-Seq and makes functional antibody discovery from antibody repertoires labor-intensive. Gaps also remain in the use of antibody repertoire data, as no standards for experimental annotation and data analysis currently exist. [123] Functional antibodies have only been identified in a small fraction of Ig-Seq studies; thus, more effective approaches urgently need to be developed and commercialized. Currently, a total of 1184 BCR and TCR repertoires and 2 billion sequences are deposited in iReceptor (http://ireceptor.irmacs.sfu.ca/) and available across multiple projects, labs, and institutions. BCR sequences can be extracted from bulk unselected RNA sequencing data using the computational tool TRUST. [124] Many computational tools [125-129] have also been developed to assemble paired V H and V L antibody sequences from the growing body of single-cell RNA-Seq data. Moreover, novel technologies for sequencing single protein molecules in nanopores and for identifying convergent selection in antibody repertoires by deep learning [130] may profoundly accelerate Ig-Seq development and the deep mining of antibody repertoires. Overall, it is expected that deep mining of the antibody repertoire will lead to more efficient and faster development of clinical diagnostics and immunological therapeutics.

Tianlei Ying is a full professor at School of Basic Medical Science, Fudan University. He received his B.S. and Ph.D. degree from Fudan University. His research focuses on fully human therapeutic monoclonal antibodies for infectious diseases and cancer, the design, engineering, and clinical application of novel antibody constructs, and precision immunology.