Transcriptomics for child and adolescent tuberculosis

Summary
Tuberculosis (TB) in humans is caused by Mycobacterium tuberculosis (Mtb). It is estimated that 70 million children (<15 years) are currently infected with Mtb, with 1.2 million each year progressing to disease. Of these, a quarter die. The risk of progression from Mtb infection to disease and from disease to death is dependent on multiple pathogen and host factors. Age is a central component in all these transitions. The natural history of TB in children and adolescents is different to adults, leading to unique challenges in the development of diagnostics, therapeutics, and vaccines. The quantification of RNA transcripts in specific cells or in the peripheral blood, using high‐throughput methods such as microarray analysis or RNA‐Sequencing, can shed light on the host immune response to Mtb during infection and disease, as well as on treatment response, disease severity, and vaccination, in a global hypothesis‐free manner. Additionally, gene expression profiling can be used for biomarker discovery, to diagnose disease, predict future disease progression, and monitor response to treatment. Here, we review the role of transcriptomics in children and adolescents, focused mainly on work done in blood, to understand disease biology, and to discriminate disease states to assist clinical decision‐making. In recent years, studies with a specific pediatric and adolescent focus have identified blood gene expression markers with diagnostic or prognostic potential that meet or exceed the current sensitivity and specificity targets for diagnostic tools. Diagnostic and prognostic gene expression signatures identified through high‐throughput methods are currently being translated into diagnostic tests.

cells. 2 If Mtb survives this encounter with the innate system and sensitizes the adaptive immune system, as measured by tuberculin skin testing (TST) or interferon gamma release assays (IGRAs), the individual is said to have Mtb infection (sometimes termed TB infection or latent TB infection). Commonly, the mycobacteria are contained by the immune system with only low numbers of organisms persisting. However, if the bacilli overcome these constraints and multiply, symptoms and signs of TB disease develop, accompanied by radiological changes in the lungs or other sites of disease. It may be possible to isolate mycobacteria from respiratory samples or samples taken from other sites of disease that can be cultured or identified using molecular tests. Overall, about 10% of individuals with Mtb infection will progress to TB disease. Drug therapy given to individuals with Mtb infection is effective at preventing this progression and is termed TB preventive therapy (TPT or latent TB treatment). If an individual develops TB disease, then TB disease treatment can be given, which is again successful in most patients.
The natural history of TB in children and adolescents is different to adults. The risk of Mtb infection rises with age in a relatively linear way dependent on the prevailing TB prevalence in that context, reflecting cumulative exposure. However, the risk of progressing from Mtb infection to TB disease varies with age. 3 Young children (<5 years of age) are at high risk of disease progression, with the risk falling to a nadir in primary school age children. 4 This risk rises as children enter puberty, increasing earlier in females but with males then following and eventually overtaking females in adulthood.
Work done in the era prior to antibiotics suggests that there is a "timetable" for TB in children, with almost all children who progress to disease doing so within a year or two of infection. In addition, the type of disease seen in children is different to adults. Young children typically have paucibacillary disease, a term implying that few or no organisms are commonly found in respiratory samples that undergo microbiological evaluation. TB disease in this age group usually presents as either intrathoracic lymph node disease or disseminated disease, including TB meningitis or miliary TB. 5 As children enter adolescence, they begin to develop adult-type disease with extensive parenchymal involvement and cavities. Large numbers of organisms are commonly isolated from respiratory samples that undergo microbiological evaluation.
The World Health Organization (WHO) defines children as <15 years and adolescents as 10 to <20 years. It is estimated that currently 70 million children have Mtb infection and each year 1.2 million develop TB disease. 6 It is estimated that an additional 535 000 15 to <20-year-olds develop disease each year. 7 Only about half of the estimated number of incident cases of child TB each year are diagnosed and treated, and WHO suggests that 230 000 children die of TB annually, 6 with 96% of these being undiagnosed. 8 For many years, global and most national guidance has been that children <5 years and those living with HIV who have been exposed to an infectious case of TB should receive TPT. Guidance has been to treat following exposure given that tests of Mtb infection are rarely available in high TB-burden, low-resource settings. However, of the 1.27 million children estimated to be eligible for TPT in 2017, only 23% received treatment. 9 The most recent WHO guidance expands TPT provision, stating that all household contacts of infectious TB patients can be given TPT following exclusion of TB disease. 10 Very few of these individuals receive TPT each year. The WHO End TB Strategy seeks to reduce TB deaths by 95% by 2035, compared with 2015 levels, as well as reduce incidence by 90% over the same period, with children and adolescents comprehensively included in the Strategy. However, it is recognized that using current approaches, global progress will fall far short of these targets. 11 The COVID-19 pandemic has severely disrupted health services and TB programs.
This disruption is almost certainly a factor behind the 25-50% fall in the detection and treatment of new TB cases that was observed over just a 3-month period in 2020. 6 Though the impact of COVID-19 control measures on Mtb transmission has not yet been defined, it is predicted that reduced case-finding and treatment during the pandemic will lead to increased TB mortality. [12][13][14][15] The End TB Strategy suggests that new tools will be required to achieve its ambitious targets. These include point-of-care diagnostic tests for TB disease, new tests to identify which individuals will progress to disease in the future, as well as new vaccines and new drugs. To develop these tools, it is increasingly recognized that a more complete understanding of the immune response to Mtb is required, as well as a better understanding of how to use host responses to discriminate between clinical groups. One area of host response biology that has evolved substantially over the last 10 years is transcriptomics, the study of RNA expression. In this article, we discuss the role of transcriptomics in child and adolescent TB, with a particular focus on transcriptomics in blood. We review studies that have employed transcriptomics both to better understand child and adolescent TB and to develop diagnostic tests.

| CHALLENGES IN CHILD AND ADOLESCENT TB
There are multiple challenges facing the field of childhood TB, many of which could benefit from the application of transcriptomic approaches (Figure 1). Some of these challenges relate to decision-making around the clinical management of children, some to the development of new therapies or vaccines, and some to the way that clinical research in child TB is conducted. Below we first outline these challenges and then we systematically describe studies that have been done in this field.

| Discriminating children with TB from children with other diseases
Children with TB disease generally have non-specific symptoms and signs that overlap with other common conditions seen in childhood.
Radiological investigations, the most used being chest X-ray, are also frequently non-specific with abnormalities that could be consistent with TB but also with other conditions. While TB disease in adults is usually diagnosed through the identification of Mtb in the sputum, this type of diagnostic confirmation is uncommon in children. There are two reasons for this. The first is that samples for evaluation can be challenging to obtain. As young children are unable to spontaneously expectorate sputum, more invasive sampling is required, including induced sputum (requiring nebulized hypertonic saline to stimulate coughing followed by aspiration of sputum from the pharynx through the nose), gastric aspiration (the suctioning of stomach contents using a nasogastric tube to collect sputum coughed and swallowed during sleeping), and the collection of stool samples (to identify swallowed Mtb that has passed through the digestive system). The second is that even if good quality respiratory samples are collected, microbiological evaluation using culture or molecular diagnostic testing only identifies Mtb in a relatively low proportion of children determined clinically to have TB disease (commonly about 20%). 16 It is assumed that this is due to the presence of few organisms in respiratory samples. Given these challenges, new diagnostic tests are required, ones that do not rely on the microbiological evaluation of respiratory samples.
Of the children who present to health facilities for evaluation of their clinical symptoms and signs, some will have TB, but many will not. Established symptoms required to classify a child as a presumptive TB case include any of the following 17,18 : (a) cough ≥2 weeks, (b) persistent, unexplained lethargy, (c) unexplained fever ≥1 week, (d) poor growth/weight loss over the preceding 3 months, or (e) cough <1 week with a known TB exposure in the previous 12 months. A positive TST or chest X-ray suggestive of TB are also criteria. A test that can discriminate presumptive TB cases who have TB from presumptive TB cases who have other causes for their symptoms would make a substantial impact on the vast under-diagnosis of child TB that is seen on a global scale. In turn, this could impact dramatically on child TB mortality.
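The "any of the following" structure of the presumptive TB definition can be written down as a simple rule. The sketch below is only an illustration of that logic as listed in the paragraph above; the class and field names are hypothetical, and it is not a validated clinical tool.

```python
from dataclasses import dataclass


@dataclass
class ChildPresentation:
    """Illustrative record; every field name here is a hypothetical placeholder."""
    cough_days: int = 0
    unexplained_lethargy: bool = False
    unexplained_fever_days: int = 0
    poor_growth_or_weight_loss_3m: bool = False
    tb_exposure_last_12m: bool = False
    tst_positive: bool = False
    cxr_suggestive: bool = False


def is_presumptive_tb(c: ChildPresentation) -> bool:
    """Return True if at least one of the criteria listed in the text is met."""
    criteria = [
        c.cough_days >= 14,                               # (a) cough >= 2 weeks
        c.unexplained_lethargy,                           # (b) persistent, unexplained lethargy
        c.unexplained_fever_days >= 7,                    # (c) unexplained fever >= 1 week
        c.poor_growth_or_weight_loss_3m,                  # (d) poor growth/weight loss, 3 months
        0 < c.cough_days < 7 and c.tb_exposure_last_12m,  # (e) cough < 1 week plus exposure
        c.tst_positive,                                   # positive TST
        c.cxr_suggestive,                                 # chest X-ray suggestive of TB
    ]
    return any(criteria)
```

Because any single criterion is sufficient, such a rule is deliberately inclusive, which is why a further test is needed to separate true TB from other causes among presumptive cases.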
In addition to children being brought to healthcare services for evaluation (termed passive case finding), active case finding strategies seek to identify new, undiagnosed cases amongst high-risk populations. Children with HIV, those with malnutrition and those with any degree of immunosuppression should be considered at high risk. It is recommended that these children are regularly screened for TB disease. A further high-risk group is children recently exposed to infectious cases of TB, given the high proportion with infection and the substantial risk of disease progression in young children with recent exposure. Global guidance and almost all national guidelines suggest that after a new infectious adult TB case is diagnosed, the house should be visited, and all household members evaluated for TB disease. Systematic reviews of the yield of household contact tracing suggest that between 5 and 10% of children screened at home visits have prevalent TB disease. 19,20 Deciding which children have TB disease, which have other diseases that are causing symptoms, and which are well, can be challenging as again, symptoms, signs, and radiology are non-specific. A test that could assist in identifying those children who need TB disease treatment would be very beneficial.

| Predicting disease progression in TB-exposed children and adolescents
Following exposure to an infectious case of TB, and following exclusion of TB disease, TPT is advised for young children and children living with HIV. 10 However, even though these children are at increased risk of developing TB compared with HIV-negative older children and adults, most of these "high-risk" children will not progress to TB disease. The only tests that are currently available that assist in decision-making are the tests of Mtb infection, namely the TST and IGRA. These tests signify immunological sensitization by detecting Mtb-specific T-cell responses, and do not indicate if there are either viable bacilli in the child or if there is a high risk of future disease progression. TB-exposed children with a positive TST or IGRA are at higher risk of disease progression than children with negative tests but still only a small proportion of those with positive tests will progress to disease. 21 Children under 5 years with a positive IGRA/TST have a 2-year incidence of ~20% while in children over 5 years this risk is ~10%. Overall, for all TB-exposed children (irrespective of infection status), the risk is substantially below 10%.
FIGURE 1 Overview of the role of transcriptomics in pediatric and adolescent TB, together with steps required for RNA quantification and bioinformatics analysis. Created with BioRender.com
Increasingly there is recognition that older HIV-negative children and adults should also be given TPT following household exposure. 10 For these individuals, the risk of future disease progression is even lower.
This means that many well children, adolescents, and adults need to be given treatment to prevent each TB case. TPT involves several months of medication which can be challenging for children and families and puts additional strain on health systems. TPT is also seen as a low priority by health services and families and treatment completion rates for children are very low. 22 Although rare, adverse events do occur, there are non-specific effects on the microbiome, and any unnecessary pill burden is ideally avoided. A biomarker that could identify which TB-exposed children and adolescents are at high risk of disease progression would allow targeting of interventions to those most needing them.
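The statement that many well individuals must be treated to prevent each TB case can be made concrete with a number-needed-to-treat (NNT) calculation. The sketch below uses the approximate 2-year risks quoted above (~20% and ~10%); the TPT efficacy value is a purely hypothetical placeholder that does not come from this review, and the function name is our own.

```python
def number_needed_to_treat(baseline_risk: float, efficacy: float) -> float:
    """NNT = 1 / absolute risk reduction, with ARR = baseline_risk * efficacy.

    baseline_risk: probability of progressing to TB disease without TPT.
    efficacy: relative risk reduction achieved by TPT (0-1).
    """
    if not 0 < baseline_risk <= 1 or not 0 < efficacy <= 1:
        raise ValueError("baseline_risk and efficacy must be in (0, 1]")
    return 1.0 / (baseline_risk * efficacy)


# HYPOTHETICAL efficacy, chosen only to illustrate the arithmetic.
ASSUMED_EFFICACY = 0.6

for label, risk in [
    ("<5 years, IGRA/TST positive (~20% 2-year risk)", 0.20),
    (">5 years, IGRA/TST positive (~10% 2-year risk)", 0.10),
]:
    nnt = number_needed_to_treat(risk, ASSUMED_EFFICACY)
    print(f"{label}: NNT ~ {nnt:.1f}")
```

The NNT scales inversely with baseline risk, so a biomarker that enriches for children at higher risk of progression would lower the number treated per case prevented in direct proportion.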
Increasingly, the concept of Mtb infection and TB disease as dichotomous states is being challenged and a dynamic continuum recognized. 23 Under the most commonly accepted model of this continuum, 24 those with incipient TB have detectable metabolic activity of Mtb, but without any clinical symptoms or signs, radiological abnormalities or positive microbiology for Mtb. Those with sub-clinical disease do not have clinical symptoms or signs but may have radiological changes or it may be possible to isolate Mtb from respiratory samples. Individuals with incipient or sub-clinical disease are more likely to progress to clinically apparent disease and so any biomarker that could identify these clinical states, allowing appropriate treatment, would be valuable.

| Identifying children and adolescents who are not responding to TB disease treatment
Treatment outcomes for children diagnosed and treated for TB are generally good with low rates of mortality. 25 However, there are several groups of children in whom the proportion with unfavorable outcome is higher. These include children with certain forms of TB, most notably TB meningitis, 26 and children with other conditions complicating their TB treatment, such as HIV co-infection or malnutrition. Also, if a child is treated for drug-susceptible TB, based on clinical criteria, while having drug-resistant TB, then treatment is unlikely to be effective. Children with TB who are not given their TB drugs, or who do not take their drugs, do not do well. If a child is diagnosed with TB based on clinical criteria and fails to respond, one potential reason is that they do not have TB but have another cause for their symptoms, signs, and radiology. In contrast to younger children, treatment outcomes for adolescents are less good, 27 with high rates of treatment failure and death. In all these instances, it would be useful to identify that a child or adolescent is not responding to treatment as early as possible to allow appropriate interventions. In clinical practice, it often takes several months to identify that a child or adolescent is not responding well to treatment. Failure to put on weight, persistence or worsening of symptoms, and worsening radiology all indicate treatment failure, but these take time to detect and are not specific. For adult TB, where most patients are microbiologically confirmed at baseline, sputum smear or culture conversion at 2 months is used as a surrogate marker. 28 While this marker is associated with favorable outcome, it is not a sensitive or specific indicator and 2 months is late to be identifying a patient failing therapy.
Although this 2-month microbiological conversion can be used for individuals who were microbiologically confirmed at baseline, this represents a small proportion of children treated for TB.
If it were possible early in therapy to identify children and adolescents who were not responding to treatment, then those individuals could be evaluated thoroughly. The patient could be counselled intensively and supported to take their treatment if poor adherence was found to be a problem. Management of co-morbidities could be enhanced. Samples could be taken to evaluate for drug resistance and further investigations carried out to look for other diagnoses. Ultimately, it may be possible to use a change in biomarker status after several days to support the diagnosis of TB in those clinically diagnosed. It might be possible to conclude that those without any change in TB-specific biomarkers might not have had TB at baseline or have drug-resistant disease.

| Tailoring therapy to disease severity and treatment response
In many areas of medicine, personalized precision therapy is becoming more common with treatment targeted to host genotype, disease type, disease site, disease severity, and response to treatment.
Yet for programmatic reasons, almost all TB cases are given the same combination of drugs, at the same dosages and for the same duration. 18 Most children with TB do not need 6 months of therapy and some might be successfully treated with substantially shorter durations. Early TB trials in adults demonstrated that although most patients with sputum smear-negative TB were cured after even 2 months, an unacceptably high proportion relapsed. 29 As it was not possible in those early studies to predict which patients might relapse, it was felt preferable to treat all patients with the minimum duration to achieve relapse free cure in >95%. This ultimately means overtreating most individuals and almost all children.
Increasingly there is recognition that different patients with different forms of TB may be appropriately treated with different drug combinations, dosages, or durations. A recently completed phase 3 clinical trial called SHINE (Shorter Treatment for Minimal Tuberculosis in Children) recruited children with minimal TB and randomized them to either the conventional 6 months of treatment or a new 4-month treatment duration using the same drugs. 30 In children, minimal or paucibacillary disease accounts for two thirds of all childhood TB, and so, many children would be spared the additional and unnecessary 2 months of treatment.
The trial found that 4 months of treatment were not inferior to the longer treatment duration. This exciting development has led to a revision to WHO guidance, 31 but a key challenge is to reliably define non-severe disease. For the trial, non-severe disease was classified as extra-thoracic lymph node TB or pulmonary TB which was sputum smear-negative and non-severe on chest X-ray.
These are not easy to determine and are subject to substantial inter-investigator variability.
A biomarker that could discriminate severe disease from non-severe disease would pave the way for decision-making at baseline that could be stratified, or ultimately personalized. In addition to stratifying children at baseline into different phenotypes that may benefit from different therapeutic approaches, it may also be possible to tailor treatment duration to therapeutic response.
A biomarker that modelled the trajectory of response to treatment would make it possible to decide when a child has returned to a "normal" state and at that point it might be possible to stop treatment.

| Identifying which children and adolescents will develop disease-related morbidity
There is increasing recognition that many TB survivors suffer substantial morbidity. A fifth of children with TB meningitis die, and over half of survivors have permanent long-term neurological impairment. 26 Although data in children and adolescents are limited, over half of adults who have survived pulmonary TB have substantial respiratory morbidity, and those surviving TB have increased risk of death. 32,33 Although severity of disease at baseline is a strong indicator of long-term morbidity, the reasons why some individuals develop post-TB morbidity while others do not are poorly understood and are likely to involve the host inflammatory response causing host tissue damage and scarring.
If it were possible to determine either at baseline or during treatment, which individuals were likely to develop morbidity, it may be possible to intervene. This might include host-directed therapies (HDTs) at baseline to prevent future morbidity, 34 or the early identification of those with impairment and provision of supportive therapy.

| Using biomarkers in clinical research for the evaluation of new drugs or vaccines
Demonstrating the efficacy of new anti-TB drugs or TB vaccines requires trial entry and exit points. Entry points are inclusion or exclusion criteria while exit points are trial outcomes. For TB disease treatment trials, the entry point is TB disease, and the exit points include cure, treatment completion, treatment failure, death, or TB relapse. For TPT trials and most vaccine trials that aim to prevent TB, the entry point is the exclusion of TB disease, with the outcome of interest being TB disease or death. If a biomarker were able to distinguish children with TB disease from those without, the ascertainment of these entry and exit points would be made much easier. In addition, if a biomarker was available that indicated children and adolescents with Mtb infection who were at higher risk of disease progression, then TPT trials might opt to focus only on those individuals, making the sample size required for a trial much smaller. In addition, if surrogate biomarkers were identified that served as a correlate of disease or protection, ones which indicated future disease progression or treatment failure but at a much earlier timepoint than clinical outcomes, then the duration, sample size, and cost of clinical trials of new TB drugs (for both TPT and TB disease treatment) as well as for new vaccines, could be reduced.
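The effect of risk enrichment on trial size can be illustrated with the standard normal-approximation formula for comparing two proportions. The sketch below is not a trial design; the event risks and the assumed halving of risk by the intervention are hypothetical values chosen only to show how strongly the required sample size depends on baseline risk.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control: float, p_treat: float,
              alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants per arm for a two-sided test of two proportions.

    Uses n = (z_{1-alpha/2} + z_{power})^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2.
    """
    if p_control == p_treat:
        raise ValueError("the two event risks must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treat) ** 2)


# HYPOTHETICAL scenarios: the intervention is assumed to halve the event risk.
unselected = n_per_arm(p_control=0.02, p_treat=0.01)   # low-risk, unselected contacts
enriched = n_per_arm(p_control=0.20, p_treat=0.10)     # biomarker-positive, high-risk group
print(f"per arm, unselected: {unselected}; per arm, enriched: {enriched}")
```

Under these assumptions the enriched trial needs roughly an order of magnitude fewer participants per arm, although in practice the cost of screening to find biomarker-positive individuals would also need to be considered.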

| Biological insight
The development of new vaccines and new HDTs requires an insight into the biological interaction between the host and Mtb. The aim of most vaccines is to either prime or modulate adaptive immune responses, so that when Mtb is encountered, the response is more effective and can either contain or eradicate the organisms before they proliferate and cause disease. The aim of most TB HDTs is to promote helpful inflammatory processes that assist the immune response in containing or eradicating Mtb, while inhibiting the damaging, destructive components of the host response that either assist bacterial proliferation or cause substantial host tissue damage that leads to mortality or long-term morbidity beyond the impact of Mtb.
By comparing the transcriptomic response of children who have been exposed to Mtb and do not progress to disease with that of TB-exposed children who do progress, it may be possible to better understand the immune response that is effective in Mtb containment.
Vaccines that seek to prevent TB should aim to promote those immune responses. When evaluating the impact of HDTs on TB pathogenesis, it would be informative to compare children with TB and extensive host damage with children with minimal damage. In this way, destructive inflammatory pathways might be identified that could be treated with targeted HDTs.

| Impact of coinfections on TB susceptibility and disease progression
Children and adolescents are infected with multiple pathogens during the first two decades. Consequently, TB-coinfections are common in high TB-prevalence regions, 35 and there is growing recognition that coinfections may influence TB susceptibility, natural history, and the performance of diagnostics. 36,37 Infection with a co-pathogen may provoke an immune response which may disrupt anti-mycobacterial immunological pathways important for controlling and containing Mtb infection.
Growing evidence suggests that viral coinfections such as influenza may increase susceptibility to Mtb infection or increase the risk of progression to disease. 37 with the true number of infections likely to be substantially higher.
Countries with high TB incidence such as India and South Africa have reported tens of millions of SARS-CoV-2 infections to date. 44 The pandemic has disrupted health services and TB control programs, 12 with significant drops in case finding and treatment 6 and predictions that this will lead to increased disease burden and mortality. [12][13][14][15] With their overlapping epidemiology, risk factors, and clinical pre-
Due to accessibility, minimal invasiveness in collection, and the key role peripheral blood and its compartments play in host defense and immunity, peripheral blood has become the focus tissue for host transcriptomic studies in child and adolescent TB. This follows work done in adult studies which first established that immune changes associated with pulmonary disease can be identified and quantified in RNA derived from peripheral blood. 58 In this review, we focus on transcriptomic profiling of whole blood cells and peripheral blood mononuclear cells (PBMC) in the context of Mtb infection and disease.

As RNA is sensitive to degradation, which can hamper quantification, specific RNA stabilizing systems are used for sample collection, shipping and storage to preserve RNA. Different RNA stabilizing blood-sampling systems may introduce differences in the downstream quantification results, with thousands of genes being reported as significantly differentially expressed (SDE) between RNA stabilizing reagents. This needs to be taken into consideration in study design and meta-analyses. 59 Subsequently, fine-tuned protocols for RNA purification can ensure the quality, integrity and yield of isolated RNA, along with minimization of potential DNA contamination. 60 Recent studies have shown that in vitro whole blood stimulation with Mtb antigen peptides, which has been used in proteomic biomarker discovery studies, can unmask transcriptomic signals that are not detectable in unstimulated samples 61 or enhance the diagnostic potential of single gene markers in high burden settings. 62 In terms of sample volume, although blood-volume dependent reduction in gene levels has been reported, transcriptomic profiling can be achieved with small volumes of blood, which is particularly important in young children. High quality RNA-Sequencing results have been reported in neonates using volumes as low as 0.5 mL of peripheral venous or arterial blood. 63

| Methods for RNA quantification
The "candidate gene" approach focuses on measuring expression for small numbers of genes and can be used when the genes of interest are already known. Reverse transcription quantitative polymerase chain reaction (RT-qPCR) is a sensitive, accurate, highly reproducible method able to detect very small amounts of RNA. It is considered a benchmark technology and forms the basis of various nucleic acid identification platforms with bedside use. Reverse transcription as a methodological step is shared between most protocols for RNA quantification. The fluorescence emitted at the end of each cycle of PCR is used for the estimation of the quantity of the starting material. 64 RT-qPCR has enabled multiple scientific breakthroughs and can now be used to simultaneously detect and quantify multiple nucleic acids through multiplexing. 65 Over the past decade, RNA-Seq has become an indispensable and popular tool for transcriptome-wide analysis with many applications in studying TB host response as it is independent from pre-existing sequence information. However, of the total RNA in a cell, 80% is ribosomal RNA (rRNA), 15% is transfer RNA (tRNA), leaving only 5% as mRNA and all the other RNA forms. 73 To focus on the RNA molecules of interest, RNA-Seq libraries are prepared using either polyA+ selection (mRNA enrichment) or rRNA depletion. rRNA depletion allows the detection of more transcripts including non-coding RNA (ncRNA), small nucleolar RNA (snoRNA), and small nuclear RNA (snRNA). A comparison between the two methods has shown that although rRNA depletion captured more unique transcriptome features, for blood-derived RNAs, 220% more reads would have to be sequenced to achieve the same level of exonic coverage in the rRNA depletion method compared with the polyA+ selection approach. 74 Globin transcript depletion is another critical step to obtaining data suited for blood transcriptome analysis. 
Total RNA from whole blood contains a large portion of globin transcripts, which originate from red blood cells and account for 80-90% of total transcripts. 75 These affect the quality and accuracy of gene expression profiling and mask the quantification of genes with low expression levels. Next Generation Sequencing (NGS) technology is based on detecting and recording light that is emitted when a complementary nucleotide is added to a particular fragment of cDNA.
The light detected determines the identity of the nucleotide ("base calling") and subsequently the sequence of the whole "read" at single-base resolution. The reads are either mapped bioinformatically to a reference genome or assembled de novo to produce the transcriptome, a base-resolution expression profile. Read mapping allows for the quantification of RNA and provides abundance estimates. 76 RNA-Seq has revolutionized the field of transcriptomics, also allowing discovery of novel transcripts, alternative splicing, the detection of gene fusion events and allele-specific expression. RNA-Seq also permits simultaneous sequencing of pools of transcripts that may come from different organisms that coexist in the same environment, termed metatranscriptomics. 77,78
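For the candidate gene approach described earlier in this section, RT-qPCR fluorescence is usually summarized as a cycle threshold (Ct) per gene, and relative expression is then derived from Ct values. One widely used calculation, which is not specific to any study reviewed here, is the comparative Ct (2^-ΔΔCt) method. The sketch below assumes roughly 100% amplification efficiency for both target and reference genes; the function and argument names are illustrative.

```python
def relative_expression(ct_target_case: float, ct_ref_case: float,
                        ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Fold change of a target gene in a case sample relative to a control sample.

    Comparative Ct method: fold change = 2 ** -(dCt_case - dCt_ctrl),
    where dCt = Ct(target gene) - Ct(reference gene) within each sample.
    Assumes ~100% PCR efficiency (product doubles every cycle).
    """
    delta_ct_case = ct_target_case - ct_ref_case
    delta_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    delta_delta_ct = delta_ct_case - delta_ct_ctrl
    return 2.0 ** (-delta_delta_ct)


# Toy Ct values (not real data): the target crosses threshold two cycles
# earlier in the case sample once normalized to the reference gene.
fold_change = relative_expression(ct_target_case=22.0, ct_ref_case=18.0,
                                  ct_target_ctrl=24.0, ct_ref_ctrl=18.0)
print(f"fold change: {fold_change:.1f}")
```

A lower Ct means more starting template, so each cycle of difference corresponds to an approximately two-fold difference in abundance under the stated efficiency assumption.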

| Analytical approaches for transcriptome-wide profiling
The quality control, pre-processing and analysis steps for microarrays are mostly standardized, while RNA-Seq data analysis pipelines, which are more complex and computationally demanding, can consist of a greater variety of steps and tools. The abundance per feature per sample is the input for differential expression analysis for both quantification methods. After quality assessment and exclusion of poor-quality samples, normalized microarray expression values are used for downstream analysis. For RNA-Seq, sequence reads need to be adapter- and quality-trimmed, and then aligned either to the human genome or transcriptome. 79 Features are then quantified, low-abundance features are filtered out, and normalization is applied to account for biases, noise, and sequencing depth variation. Subsequent analytical approaches follow from the biological and clinical questions being addressed (e.g., differential gene expression analysis and alternative splicing analysis).
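The filtering and depth-normalization steps can be sketched with counts per million (CPM), one of the simplest ways to account for sequencing depth. This is a minimal illustration only: dedicated RNA-Seq packages apply more elaborate normalization, and the thresholds and function names below are assumptions made for the example.

```python
def counts_per_million(counts: dict) -> dict:
    """Scale raw counts by library size. counts maps sample -> {feature: raw count}."""
    cpm = {}
    for sample, features in counts.items():
        depth = sum(features.values())
        if depth == 0:
            raise ValueError(f"sample {sample!r} has no reads")
        cpm[sample] = {f: c * 1e6 / depth for f, c in features.items()}
    return cpm


def filter_low_abundance(cpm: dict, min_cpm: float = 1.0, min_samples: int = 2) -> dict:
    """Keep features reaching min_cpm in at least min_samples samples."""
    features = list(next(iter(cpm.values())))
    keep = [f for f in features
            if sum(cpm[s][f] >= min_cpm for s in cpm) >= min_samples]
    return {s: {f: cpm[s][f] for f in keep} for s in cpm}


# Toy counts (not real data): sample s2 was sequenced twice as deeply as s1.
raw = {
    "s1": {"geneA": 10, "geneB": 990, "geneC": 0},
    "s2": {"geneA": 20, "geneB": 1980, "geneC": 0},
}
normalized = filter_low_abundance(counts_per_million(raw))
print(normalized)
```

After scaling, geneA has the same value in both samples despite the two-fold difference in depth, and geneC, which has no reads, is removed before differential expression testing.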
Both microarray and RNA-Seq differential expression analysis workflows are followed by multiple testing corrections to control for false positive errors.
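One common choice of multiple testing correction in transcriptomic analysis is the Benjamini-Hochberg procedure, which controls the false discovery rate across the thousands of features tested. The sketch below is a plain-Python illustration of that procedure; it is offered as an example and is not necessarily the correction used in any particular study discussed here.

```python
def benjamini_hochberg(pvalues: list) -> list:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy p-values (not real data) for four features.
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)
significant = [i for i, q in enumerate(adj_p) if q <= 0.05]
print(adj_p, significant)
```

Each adjusted value is the smallest false discovery rate at which that feature would be called significant, so features are typically reported as differentially expressed when the adjusted value falls below a preset threshold such as 0.05.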
The data analysis workflow is quite different for studies intending to discover diagnostic vs. mechanistic transcriptional signatures of disease. The one shared component is a set of initial algorithms to identify gene sets associated with different disease states, termed differentially expressed genes. For biomarker discovery, feature selection methods are employed to identify the marker or the combination of markers that minimize the classification error or maximize the accuracy of classification for patient subgroups, while eliminating noise and redundant genes. Feature selection methods are divided into filter, wrapper, and embedded methods. 80 Filter methods select a feature subset from the original dataset by evaluating the relation between each input variable and the target variable (e.g., statistical methods or feature importance methods). They are usually used as a pre-processing step, followed by a machine learning algorithm. The
In prospective recruitment studies, the positive predictive value (PPV), which reflects the probability of a patient having the disease when the test is positive, the negative predictive value (NPV), which reflects the probability of a patient not having the disease when the test is negative, and likelihood ratios are also reported. 84 Confidence intervals are calculated to measure the reliability of the estimates. For case-control studies, the ratio of gold standard positive and negative individuals does not reflect the real prevalence of the disease in a community or hospital setting, as it would in observational studies. Given the dependency of NPV/PPV on the prevalence of the disease in the population, it is important to provide estimates of these values specific to scenarios in which such a diagnostic test would be applied. In this case, prevalence can be interpreted as "the probability before the test is carried out that the subject has the disease." 82
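The dependence of PPV and NPV on prevalence follows directly from Bayes' theorem and can be checked numerically. In the sketch below, the sensitivity, specificity, and prevalence values are illustrative only and are not performance estimates for any signature discussed in this review.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test applied at a given pre-test probability."""
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must be strictly between 0 and 1")
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    # Undefined when no test result of that kind can occur.
    ppv = true_pos / (true_pos + false_pos) if (true_pos + false_pos) > 0 else float("nan")
    npv = true_neg / (true_neg + false_neg) if (true_neg + false_neg) > 0 else float("nan")
    return ppv, npv


# Same hypothetical test (90% sensitivity, 90% specificity) in three settings.
for prevalence in (0.50, 0.10, 0.02):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

With a balanced case-control ratio the PPV equals 0.90, but at 10% prevalence the same test gives a PPV of 0.50, which is why predictive values taken directly from case-control designs can be misleading for the intended setting of use.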

| Understanding TB biology
Apart from biomarker discovery, interpreting differential expression results in terms of higher order biological processes or molecular pathways is a key outcome of transcriptomic analysis. One of the most commonly used resources is the gene ontology (GO) database, which annotates genes according to a dictionary of annotation terms and allows the identification of terms that are over-represented or enriched. 85 Another commonly used annotation database is the Kyoto Encyclopedia of Genes and Genomes (KEGG), a curated database of molecular pathways and disease signatures. 86 Ingenuity Pathway Analysis (IPA-QIAGEN) provides a series of different functionalities, allowing for the identification of significantly enriched canonical pathways, network analysis, and upstream regulating molecules. 87 Some methods use overlap statistics, such as the cumulative hypergeometric distribution, to identify whether a group of differentially expressed genes is enriched for a pathway or ontology term. In contrast, Gene Set Enrichment Analysis (GSEA) considers all of the genes in an experiment, rather than only those above specific cut-offs. 88 There are different tools for performing GSEA, including MSigDB, 89 g:Profiler, 90 and DAVID. 91

Understanding the cellular composition of bulk tissues is critical to investigating the underlying mechanisms of many biological processes. Molecular profiling using bulk RNA-Seq in heterogeneous tissues, such as blood, is confounded by the relative proportions of different cell types in the tissue. Single cell RNA-Seq is quickly becoming the "gold standard" technique for cell-specific expression profiles but is expensive and data-analysis intensive. (Table 1).
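The overlap statistic mentioned above can be sketched as a one-sided hypergeometric test: given a gene universe, a pathway gene set, and a list of differentially expressed genes, it gives the probability of observing at least the observed overlap by chance. The function name `enrichment_p_value`, its parameter names, and the gene counts are hypothetical. This sketch is not the implementation used by any of the tools cited above.

```python
from math import comb


def enrichment_p_value(n_universe, n_pathway, n_de, n_overlap):
    """One-sided hypergeometric p-value: P(overlap >= n_overlap).

    n_universe: number of genes measured in the experiment
    n_pathway:  number of those genes annotated to the pathway or term
    n_de:       number of differentially expressed genes
    n_overlap:  number of differentially expressed genes in the pathway
    """
    total = comb(n_universe, n_de)
    largest_possible = min(n_pathway, n_de)
    # Sum the probabilities of every overlap at least as large as observed.
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_de - k)
        for k in range(n_overlap, largest_possible + 1)
    )
    return tail / total


# Small hypothetical example: 10 genes measured, 4 in the pathway,
# 3 differentially expressed, 2 of which fall in the pathway.
p_small = enrichment_p_value(10, 4, 3, 2)

# A more realistic scale (again hypothetical counts).
p_large = enrichment_p_value(20000, 100, 500, 10)
```

Because this test only uses genes that pass a differential expression cut-off, its result depends on where that cut-off is placed, which is the limitation that GSEA is designed to avoid. In practice, the resulting p-values across many pathways would themselves need multiple testing correction.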

| Transcriptomics as a diagnostic tool
The field of infectious disease diagnostics has embraced molecular tools that profile the host response and can enhance disease diagnostic pipelines, particularly when the detection of the pathogen of interest is challenging, as in pediatric TB. 96 In clinical practice, a gene signature measured in blood that can distinguish pediatric TB from other diseases with a similar presentation would be of great value in evaluating symptomatic patients presenting to medical services.
Two studies to date have discovered diagnostic gene signatures specific for pediatric TB in a hypothesis-free transcriptome-wide manner. In 2013, Verhagen and colleagues published the first microarray profiling study for pediatric TB biomarker identification, in Warao Amerindian children. 97 A signature of 116 genes identified by the random forest algorithm separated 9 TB cases from 9 children with Mtb infection and 9 healthy controls in the training set, and was subsequently validated in publicly available adult datasets. 72,98 Following random forest bootstrapping, the list was reduced to 10 genes that were validated using RT-qPCR in the discovery cohort (Table 1). Although important as a proof-of-concept,

TABLE 1 Studies that have used transcriptomic approaches in child and adolescent TB, presenting original patient recruitment data and analysis

Columns: First author; Year published; Country - Population; Description of study

Mtb infection, using samples collected at different timepoints prior to TB diagnosis, which was able to discriminate TB progressors from non-progressors (Figure 3). 110

| Treatment response
Currently available tests have very low accuracy for monitoring TB treatment response and predicting failure or relapse in pulmonary TB, even in adults. 115 To improve disease outcomes, we need better biomarkers of an appropriate response to treatment, which would allow treatment failure to be identified early and treatment to be shortened. It has been shown that the RISK6 signature tracks treatment response in adults 113 (Figure 11).

FIGURE 3
Strategy for discovery and validation of the tuberculosis risk signature. Synchronization of the adolescent cohort study training set in terms of the clinical outcome. To ensure optimal extraction of a tuberculosis risk signature from the adolescent cohort study training set, the timescale of the RNA-Sequencing dataset was realigned according to tuberculosis diagnosis instead of study enrolment, allowing gene expression differences to be measured before disease diagnosis. Each progressor within the adolescent cohort study training set is represented by a horizontal bar. The length of the bar represents the number of days between study enrolment and diagnosis with active tuberculosis. During follow-up, each progressor transitioned from an asymptomatic healthy state (green) to pulmonary disease (red). The left graph shows alignment of PAXgene sample collection (black points) with respect to study enrolment. The right graph shows alignment of PAXgene sample collection with respect to diagnosis with active tuberculosis, for use in analysis. From Zak and colleagues. A blood RNA signature for tuberculosis disease risk: a prospective cohort study. Lancet 2016; 387: 2312-22

| Transcriptomic profiles in children vs adolescents vs adults
The risk of progression from Mtb infection to disease is substantially lower in pre-adolescent children above 4 years of age than in postpubescent adolescents and young adults. 123

performance is generally less good. However, it is in these children that tests are most needed. There are several possible explanations for why performance may be less good, and it is likely that each applies to some extent. First, all children with unconfirmed disease may have TB, but a milder form with a less pronounced transcriptomic signature than in children with confirmed disease. Second, some of the children with unconfirmed disease may have TB, with the overall signature for this group "diluted" by children who do not actually have TB at all. Finally, the interaction between host and pathogen in unconfirmed TB may be subtly different to the interaction seen in confirmed disease.
One statistical approach to analyzing diagnostic studies in which the "gold" standard is imperfect is latent class analysis. 131

| Development of point-of-care tests
The use of these biomarkers in a clinical decision process either as standalone diagnostic tools or in conjunction with other tools needs to be studied further. Ultimately, prospective studies would be required in which the decisions about whether and when to initiate TB treatment are evaluated when using the new biomarkers. A concern in using transcriptional signatures as clinical diagnostic tools in resource poor settings is the complexity, cost, and time needed for the current methodologies for isolating and quantifying RNA from blood.
The approaches described above have generally collected samples from children and adolescents and then divided the population into distinct clinical groups before identifying the minimal number of transcripts that can effectively discriminate these groups. A discovery/test approach is commonly used prior to validation in an external cohort. However, the next step necessary to make such a signature clinically useful is to translate it into a true point-of-care test (POCT).
For RT-qPCR based platforms, individual primers to each transcript need to be designed and these need to be then tested experimentally and validated externally. Platforms need to be robust, affordable,

| Evaluation in real-time as point-of-care tests
Once a POCT has been developed it must be evaluated clinically to

| Integration into treatment decision algorithms
In most clinical decision-making, the pre-test probability of a disease is combined with a test result to arrive at a post-test probability that then informs whether treatment should be started. For many children with presumptive TB, information from the clinical history and examination alone is sufficient to either reassure the healthcare worker that the child does not have TB or that the child is highly  135,136 To date, these have not included transcriptomic biomarkers. An illustrative example is shown in Figure 12.
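The combination of a pre-test probability with a test result can be sketched using likelihood ratios, which convert pre-test odds into post-test odds. The function names and all numerical values below (a pre-test probability of 20% and a test with 90% sensitivity and 90% specificity) are hypothetical and are not taken from any of the signatures or algorithms discussed in this review.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR) for a binary test."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a pre-test probability with a likelihood ratio."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Hypothetical child with a 20% pre-test probability of TB,
# assessed with a hypothetical 90% sensitive, 90% specific test.
lr_pos, lr_neg = likelihood_ratios(0.9, 0.9)
prob_if_positive = post_test_probability(0.20, lr_pos)
prob_if_negative = post_test_probability(0.20, lr_neg)
```

With these illustrative numbers, a positive result raises the probability of TB from 20% to about 69%, and a negative result lowers it to about 3%. Whether either value crosses a treatment threshold depends on the clinical context, which is why a biomarker result is most informative when the pre-test probability is intermediate.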

| CONCLUSIONS
Over the last ten years, transcriptomic approaches have led to the generation of multiple biomarkers in adults that can predict future TB disease progression, discriminate TB from other diseases, and monitor treatment response. Transcriptomics has also allowed for novel insights into the pathogenesis of Mtb infection and disease as well as better understanding of the host response to this pathogen. Work in children and adolescents has lagged but several seminal studies have demonstrated that the host response to Mtb varies with age and the discovery of child-specific biomarkers requires child-specific studies. As these signatures are developed, they will need to be translated first into POCTs and then rigorously evaluated in the relevant clinical contexts, alone, and as part of integrated algorithms. In addition to the discovery of pediatric and adolescent biomarkers, transcriptomic studies in children are beginning to help us understand the biology of Mtb infection and disease in this age group, which will be vital to develop better vaccines and therapeutics.

FIGURE 12 Illustration of an integrated TB treatment decision algorithm including biomarkers

Transcriptomics has the potential to substantially contribute to meeting global End TB targets.

DATA AVAILABILITY STATEMENT
Data sharing is not applicable to this article as no new data were created or analyzed in this study. Data presented in Figures 5-9 are publicly available in Anderson et al [99].