Systematic review with meta‐analysis: the accuracy of serological tests to support the diagnosis of coeliac disease

Summary
Background: There is growing support for a biopsy avoidant approach to diagnose coeliac disease in both children and adults, using a serological diagnosis instead.
Aims: To assess the diagnostic accuracy of serological tests for coeliac disease in adults and children.
Methods: Seven electronic databases were searched between January 1990 and August 2020. Eligible diagnostic studies evaluated the accuracy of serological tests for coeliac disease against duodenal biopsy. Risk of bias assessment was performed using QUADAS‐2. Bivariate random‐effects meta‐analyses were used to estimate serology sensitivity and specificity at the most commonly reported thresholds.
Results: 113 studies (n = 28,338) were included, all in secondary care populations. A subset of studies were included in meta‐analyses due to variations in diagnostic thresholds. Summary sensitivity and specificity of immunoglobulin A (IgA) anti‐tissue transglutaminase were 90.7% (95% confidence interval: 87.3%, 93.2%) and 87.4% (84.4%, 90.0%) in adults (5 studies) and 97.7% (91.0%, 99.4%) and 70.2% (39.3%, 89.6%) in children (6 studies); and of IgA endomysial antibodies were 88.0% (75.2%, 94.7%) and 99.6% (92.3%, 100%) in adults (5 studies) and 94.5% (88.9%, 97.3%) and 93.8% (85.2%, 97.5%) in children (5 studies).
Conclusions: Anti‐tissue transglutaminase sensitivity appears to be sufficient to rule out coeliac disease in children. The high specificity of endomysial antibody in adults supports its use to rule in coeliac disease. This evidence underpins the current development of clinical guidelines for a serological diagnosis of coeliac disease. Studies in primary care are needed to evaluate serological testing strategies in this setting.


| INTRODUCTION
Coeliac disease is a chronic small intestinal immune-mediated enteropathy triggered by the ingestion of gluten, a protein found in wheat, rye and barley. 1 Exposure to gluten results in intestinal damage of varying severity in patients affected by coeliac disease.
Symptomatic coeliac disease is characterised by gastrointestinal symptoms, including diarrhoea, nausea, vomiting and abdominal pain, and extraintestinal symptoms such as fatigue and weight loss.
Coeliac disease is estimated to affect around 1% of people in the UK, 2 but only 24% of those with coeliac disease are thought to be diagnosed. 3 These large numbers of undiagnosed patients, known as the "coeliac iceberg", are thought to be a consequence of the nonspecific nature of coeliac disease symptoms and variation in clinical presentation, from none (asymptomatic coeliac disease) to a broad spectrum of symptoms. 1 People with certain health conditions, such as type I diabetes, autoimmune thyroid disease or Down syndrome, as well as first-degree relatives of people with coeliac disease, are at higher risk of developing coeliac disease than the general population and are more likely to present without classical symptoms. 4 Currently, the only treatment for coeliac disease is lifetime adherence to a gluten-free diet, which is expensive and can be difficult to comply with. Left undiagnosed and untreated, coeliac disease often leaves patients with troublesome symptoms that significantly affect their quality of life and lead to a higher risk of complications such as osteoporosis, infertility and small bowel cancer. 5 As such, a timely and accurate diagnosis of coeliac disease is important.
Coeliac disease is diagnosed using a combination of serological tests for coeliac-specific antibodies and endoscopic intestinal biopsy. Current guidelines by the National Institute for Health and Care Excellence recommend that both adults and children with suspected coeliac disease first undergo serological testing for total immunoglobulin A (IgA) and IgA anti-tissue transglutaminase (tTG). 4 In IgA deficient patients, immunoglobulin G (IgG) endomysial antibodies (EMA), IgG deamidated gliadin peptide (DGP) or IgG tTG can be used. In adults who are weakly positive for IgA tTG, IgA EMA should be measured. Seropositive adults should be referred for intestinal biopsy, while seropositive children should be referred for further investigation, which may include intestinal biopsy, IgA EMA, human leukocyte antigen (HLA) genetic testing, or a combination of the above. 4 Intestinal biopsy is invasive and can be burdensome for patients, particularly children, who require general anaesthesia to undergo the procedure. Patients must consume a gluten-containing diet for at least six weeks prior to any serological test or biopsy, meaning those with coeliac disease may continue to experience painful and debilitating symptoms while they wait. Guidelines in the UK have begun to move towards biopsy-avoidance strategies for coeliac disease in children and, more recently, in adults. In their 2013 guidelines, the British Society of Paediatric Gastroenterology, Hepatology and Nutrition advised that children with IgA tTG greater than or equal to 10x the upper limit of normal for the assay, positive for IgA EMA and HLA positive do not need to undergo biopsy to confirm their coeliac disease diagnosis. 6 During the coronavirus pandemic, the British Society of Gastroenterology published interim guidance including a COVID-19 specific non-biopsy protocol for adults with suspected coeliac disease. 7
Previous systematic reviews of the accuracy of serological testing for diagnosing coeliac disease suggest that the tests are highly sensitive and specific in both adults and children. [8][9][10][11][12] These systematic reviews, however, are all out of date and most have methodological limitations. Limitations included: limited search [8][9][10][11]; use of the Moses-Littenberg model 13 to pool estimates of sensitivity and specificity rather than the more robust bivariate or hierarchical summary receiver operating characteristic (HSROC) models 14,15 or no statistical synthesis of results 10; and use of the original QUADAS tool 16 to assess study quality (although at the time this was the most appropriate tool), the results of which were then not incorporated into the synthesis, 8,9,11 or no quality assessment. 10 The most recent, comprehensive review by Maglione et al, conducted for the AHRQ programme, was the only review to include more than 20 studies. 12 However, this review also included existing systematic reviews and only generated overall summary estimates for studies published since existing reviews; it did not produce overall estimates of the accuracy of the included serological tests. This review was restricted to studies that either included at least 300 participants or were conducted in an "at risk" population; the sample size restriction was not justified. None of the reviews considered the study threshold when calculating pooled estimates of sensitivity and specificity.
The purpose of this systematic review is to provide a robust and up-to-date evaluation of the accuracy of serological tests for coeliac disease in adults and children.

| MATERIALS AND METHODS
This review followed Cochrane recommended methods and guidance from the Centre for Reviews and Dissemination for systematic reviews of diagnostic test accuracy. 17,18 Our findings are reported in accordance with the PRISMA-DTA guidelines. 19 We developed and followed a standard protocol for all stages of the review, which was registered with PROSPERO (registration number: CRD42019115506). 20 Any deviations from the protocol are indicated.

| Literature search
MEDLINE, Embase, Cochrane Library, KSR Evidence and the Science databases on Web of Science were searched for relevant studies from January 1990 (when IgA EMA antibodies were introduced into practice) to August 2020, combining terms for "antibodies" and "coeliac disease" (see Appendix 1 for full strategies).
Ongoing and completed studies were identified using the WHO International Clinical Trials Registry and the National Institutes of Health Clinical Trials database. Internet searches using keywords such as "celiac"/"coeliac" and "serological tests" were undertaken.
The reference lists of relevant systematic reviews identified during the literature search were also used as a source of potentially relevant studies. No language restrictions were applied.

| Inclusion criteria
Inclusion criteria were defined during protocol development and piloted on a subset of 500 articles at title and abstract screening to ensure functionality.
Studies using a diagnostic cohort design were included. Studies using a case-control design were excluded as they have been shown to overestimate test accuracy and a substantial evidence base from cohort studies was anticipated. 21 Studies in patients with classical symptoms of coeliac disease (eg, diarrhoea, abdominal pain, fatigue), as well as mixed symptomatic and risk group (eg, type I diabetic) populations, were included.
After piloting our inclusion criteria, we chose to exclude studies in healthy individuals (ie, screening) or specific risk groups only to ensure the review was conducted in a clinically relevant population and that accuracy measures could be reasonably combined in a meta-analysis.
Studies in which patients underwent at least one serological test for coeliac disease, including IgA tTG, IgG tTG, IgA EMA, IgG EMA, IgA DGP, IgG DGP and IgA anti-actin antibodies (AAA), were included.
Combined serological tests, such as IgA/IgG tTG (which detect the presence of IgA tTG or IgG tTG in a serum sample) were also included.
Studies were included if the diagnosis was confirmed by duodenal biopsy and if at least some seronegative patients also underwent a biopsy. Studies in which serology formed part of the reference standard, which could lead to overestimation of accuracy, were excluded.

| Study selection
Titles and abstracts identified through electronic database and web searching were uploaded to Rayyan and independently screened by two reviewers (ALS, MMCE or VC). 22 Articles considered potentially relevant were obtained and assessed by one reviewer (ALS) and checked by a second reviewer (MMCE, LJS or VC) for inclusion in the review. Any discrepancies between reviewers were resolved through discussion or referral to a third reviewer.

| Data extraction
Data from each study were extracted by one reviewer (ALS) and all were checked by a second (MMCE, LJS or VC) using data extraction forms developed in Microsoft Access 2016. Disagreements were resolved through discussion or referral to a third reviewer. Data on study and patient characteristics, serological tests, and biopsy procedures were extracted. Two-by-two data comparing serological test results with reference standard (biopsy) results (number of true positives, false negatives, false positives and true negatives) were extracted.
Data relating to patients who did not undergo biopsy were excluded from the 2 × 2 tables where possible. Where 2 × 2 data were reported at multiple thresholds within a study, data relating to the manufacturer or study authors' pre-specified cut-off were extracted. Where a threshold of primary importance was not pre-specified, data relating to the lowest reported threshold were extracted. Two-by-two data were extracted at biopsy cut-off Marsh Grade 3a if available, or at any reported biopsy cut-off otherwise.
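The 2 × 2 data described above map directly onto per-study sensitivity and specificity. The following is a minimal sketch of that calculation; the class name `TwoByTwo` and the counts are hypothetical and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from one study comparing a serological test with biopsy."""

    tp: int  # serology positive, biopsy positive
    fn: int  # serology negative, biopsy positive
    fp: int  # serology positive, biopsy negative
    tn: int  # serology negative, biopsy negative

    def sensitivity(self) -> float:
        # Proportion of biopsy-confirmed cases that the test detects.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of biopsy-negative patients that the test clears.
        return self.tn / (self.tn + self.fp)


# Hypothetical counts for illustration only.
example = TwoByTwo(tp=90, fn=10, fp=20, tn=180)
print(f"sensitivity = {example.sensitivity():.3f}")
print(f"specificity = {example.specificity():.3f}")
```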

| Study quality
Included studies were assessed for methodological quality using the QUADAS-2 tool, 23 tailored to our review (Appendix 2), which evaluates the risk of bias and applicability in primary diagnostic accuracy studies. The tool consists of four domains: patient selection, index test, reference standard, and flow and timing, each rated as high, low or unclear risk of bias. If at least one of the domains was rated as "high," the study was considered at high risk of bias; if all domains were judged as "low" the study was considered at low risk of bias; otherwise, the study was considered as "unclear" risk of bias.
When a study reported accuracy data for two or more tests, the "index test" and "flow and timing" domains were applied separately to each test. When a study reported accuracy data for adults and children separately, all domains were applied separately to each patient group.
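The rule for combining the four domain ratings into an overall judgement can be sketched as a small function. This is an illustration of the rule as stated in the text, not the tool the reviewers used; the function name and the dictionary layout are assumptions.

```python
from typing import Mapping

# The four QUADAS-2 domains named in the text.
DOMAINS = ("patient selection", "index test", "reference standard", "flow and timing")
VALID_RATINGS = {"high", "low", "unclear"}


def overall_risk_of_bias(ratings: Mapping[str, str]) -> str:
    """Combine per-domain ratings into an overall risk-of-bias rating."""
    values = []
    for domain in DOMAINS:
        rating = ratings[domain].strip().lower()
        if rating not in VALID_RATINGS:
            raise ValueError(f"unexpected rating for {domain!r}: {rating!r}")
        values.append(rating)
    if "high" in values:
        # Any single high-risk domain makes the study high risk overall.
        return "high"
    if all(value == "low" for value in values):
        return "low"
    # No domain is high, but at least one is unclear.
    return "unclear"


# Hypothetical study: blinding of the biopsy reader was not reported.
print(overall_risk_of_bias({
    "patient selection": "low",
    "index test": "low",
    "reference standard": "unclear",
    "flow and timing": "low",
}))
```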

| Quantitative analysis and meta-analysis methods
Analyses were stratified by age group (adults >16 years; children ≤16 years; mixed [adults and children] and age unspecified) and test.
All analyses were performed in Stata version 16.0 using the metandi command. 24

| Primary analyses
For data sets including four or more studies, a bivariate random-effects meta-analysis of sensitivity and specificity was performed, 14 assuming binomial likelihoods for the number of true positive and true negative test results. 25 When there were few (2-3) studies in a data set, univariate fixed-effect meta-analyses of sensitivity and specificity were performed. Where only a single study was available, the sensitivity and specificity reported in that study are presented.
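The choice of synthesis method depends only on the number of studies in a data set. A minimal sketch of that decision rule follows; the function name is hypothetical, and the sketch only selects the approach rather than fitting any model.

```python
def choose_synthesis(n_studies: int) -> str:
    """Return the synthesis approach described in the text for a data set."""
    if n_studies < 1:
        raise ValueError("a data set must contain at least one study")
    if n_studies >= 4:
        return "bivariate random-effects meta-analysis"
    if n_studies >= 2:
        return "univariate fixed-effect meta-analyses"
    return "single-study estimate"


for n in (1, 3, 6):
    print(n, "->", choose_synthesis(n))
```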
Where the extracted data on a test related to a range of thresholds, we report results from two separate meta-analyses. First, we fitted the bivariate model 14 to studies reporting at the most commonly reported threshold only. From these models, we report summary sensitivity and specificity at that threshold. We used summary estimates of sensitivity and specificity to calculate summary positive and (inverse) negative likelihood ratios and associated confidence intervals. Second, we fitted the HSROC model 15 to the full data set, which consisted of one estimate per study to avoid double counting. From these models, we present the summary receiver operating characteristic (ROC) curve, which represents the trade-off between sensitivity and specificity across thresholds.
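The likelihood ratios mentioned above follow from the standard definitions: the positive likelihood ratio is sensitivity / (1 − specificity) and the negative likelihood ratio is (1 − sensitivity) / specificity. The sketch below computes point estimates only; it does not reproduce the confidence intervals, whose derivation is not described here. The function name is an assumption, and the inputs are the summary estimates for IgA tTG in adults quoted in the Summary.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float, float]:
    """Return (LR+, LR-, 1/LR-) for a given sensitivity and specificity."""
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        # Values of exactly 0 or 1 would give a zero or undefined ratio.
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative, 1.0 / lr_negative


# Summary estimates for IgA tTG in adults, as quoted in the Summary.
lr_pos, lr_neg, inv_lr_neg = likelihood_ratios(0.907, 0.874)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}, 1/LR- = {inv_lr_neg:.2f}")
```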
The sensitivity and specificity reported in each study were plotted in ROC space, with colour coding allowing for comparisons between different thresholds to be made. Summary estimates of sensitivity and specificity with 95% confidence intervals at the most commonly reported threshold and summary ROC curves across all reported thresholds are presented.
Summary positive and negative predictive values and natural frequencies were estimated for a hypothetical population of 10,000 people tested for coeliac disease, for a pre-test probability of 2% (the estimated pre-test probability of coeliac disease in a primary care population presenting with symptoms suggestive of coeliac disease 26 ). Values were estimated based on summary sensitivity and specificity, restricted to the most commonly reported threshold.
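The natural-frequency calculation can be sketched as follows, assuming the standard relationships between sensitivity, specificity, pre-test probability and predictive values. The function name is hypothetical; the inputs are the summary estimates for IgA tTG in adults quoted in the Summary, the 2% pre-test probability and the population of 10,000 stated above.

```python
def natural_frequencies(
    sensitivity: float,
    specificity: float,
    pretest_probability: float,
    population: int = 10_000,
) -> dict[str, float]:
    """Expected test outcomes and predictive values in a hypothetical population."""
    with_disease = population * pretest_probability
    without_disease = population - with_disease
    true_pos = sensitivity * with_disease
    false_neg = with_disease - true_pos
    true_neg = specificity * without_disease
    false_pos = without_disease - true_neg
    return {
        "true_positives": true_pos,
        "false_negatives": false_neg,
        "false_positives": false_pos,
        "true_negatives": true_neg,
        # Probability of disease given a positive test.
        "ppv": true_pos / (true_pos + false_pos),
        # Probability of no disease given a negative test.
        "npv": true_neg / (true_neg + false_neg),
    }


result = natural_frequencies(sensitivity=0.907, specificity=0.874, pretest_probability=0.02)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

At a low pre-test probability, false positives can outnumber true positives even for a reasonably specific test, which is why the predictive values are reported alongside sensitivity and specificity.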

| Direct comparisons
For the two most commonly assessed tests, IgA tTG and IgA EMA, we also estimated the relative sensitivity and specificity within each study to summarise their comparative accuracy. Relative sensitivity is the ratio of two sensitivities: a relative sensitivity of 1 means that the two tests have the same sensitivity (and similarly for specificity). We had intended to pool estimates of relative sensitivity and specificity. However, none of the studies that evaluated comparative accuracy reported estimates of sensitivity and specificity for the same thresholds. We therefore report the observed range of these measures across comparative studies (which evaluated both tests in the same group of patients). The relative accuracy of tests with a high estimated sensitivity and/or specificity (>90% across all studies) that were compared to IgA tTG or IgA EMA is also reported.
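Relative sensitivity and specificity within a single comparative study can be sketched as simple ratios. The function name and the input values below are hypothetical and are not taken from any included study; which test is placed in the numerator is also an assumption.

```python
def relative_accuracy(
    sens_a: float, spec_a: float, sens_b: float, spec_b: float
) -> tuple[float, float]:
    """Return (relative sensitivity, relative specificity) of test A versus test B."""
    if sens_b <= 0.0 or spec_b <= 0.0:
        raise ValueError("the comparator's sensitivity and specificity must be positive")
    # A ratio of 1 means the two tests perform the same on that measure.
    return sens_a / sens_b, spec_a / spec_b


# Hypothetical study evaluating two tests in the same group of patients.
rel_sens, rel_spec = relative_accuracy(sens_a=0.95, spec_a=0.90, sens_b=0.90, spec_b=0.99)
print(f"relative sensitivity = {rel_sens:.3f}")
print(f"relative specificity = {rel_spec:.3f}")
```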

| Sensitivity analyses
Sensitivity analyses were performed restricting inclusion to: (1) studies rated at low risk of bias using the QUADAS-2 tool, (2) studies carried out in symptomatic patients only and (3) studies in which all patients received a biopsy.

| Patient and public involvement
Patients and the public were not involved in the choice of research question, the design of the study, the conduct of the study, the interpretation of the results, or our dissemination plans.

| Deviations from the protocol
In the protocol for this review, 20 we described our target population as "adults or children at risk of coeliac disease." After piloting our inclusion criteria at title and abstract screening, we chose to exclude studies in healthy populations (ie, screening) or single risk groups only, as described in the Inclusion criteria.
We described the intervention as "any serological test for coeliac disease," including HLA-DQ typing. We decided not to include anti-gliadin antibodies as they are not recommended for use in the diagnosis of coeliac disease by the National Institute for Health and Care Excellence. 4 We decided to focus this review on serological tests; we have evaluated the accuracy of HLA testing in a separate review. 26 We did not include point-of-care or rapid serological tests as a systematic review of their accuracy has recently been published. 27 We described our comparator as "any reported reference standard." After piloting our exclusion criteria at title and abstract screening, we decided to exclude studies where serology formed part or all of the reference standard as this would lead to over-inflation of test accuracy estimates.
In the strategy for data synthesis, we said "If a test is reported at a single threshold for test positivity across studies, summary operating points will be used to measure the test's accuracy. If a test is reported at differing thresholds across studies, summary ROC curves showing the trade-off between sensitivity and specificity at the various thresholds will be produced." In the review, we produced both summaries of the evidence for completeness: a summary ROC curve across all reported thresholds and summary sensitivity/specificity at the most commonly reported threshold.

| Study characteristics
A total of 15 170 articles were identified through electronic searches (see PRISMA study flow diagram in Figure 1). After removing duplicates, the titles and abstracts of 7956 articles were independently screened by two reviewers, of which 398 were considered potentially relevant and full-texts were obtained. We were unable to ob-

| Quality of evidence
One hundred and thirty-seven sets of 2 × 2 data were judged to be at high risk of bias, 22 were low risk of bias and 44 were deemed unclear (Appendix 4).
Most (118) sets of 2 × 2 data were at high risk of bias because biopsy results were interpreted with knowledge of (or not explicitly blinded to) serology results. In 28 sets of 2 × 2 data, there was potential for partial verification bias due to some patients
1 Although we defined adults as >16 years and children as ≤16 years, cut-offs for age groups differed between studies in practice; for example, some studies in children included patients as old as 18 years and some studies in adults included patients as young as 15 years.

| All thresholds
All accuracy data extracted from the included studies are summa- at the most commonly reported thresholds (Figure 4).

| Sensitivity analyses
There was little evidence that estimates of sensitivity and specificity varied according to study quality, whether all patients presented with symptoms or whether all patients within a study underwent biopsy (Table 3 and Appendix 5). However, formal comparison was not possible as too few studies within each subgroup reported accuracy estimates at consistent thresholds.

| Direct comparisons
Comparative accuracy studies provided little evidence of differences in accuracy between tests (

| Statement of principal findings
The accuracy of serological tests for detecting coeliac disease was high.  were insufficient data to formally evaluate its use as an add-on test.

| Strengths and limitations
There are several key strengths to this review, which avoids the methodological limitations highlighted in previous reviews. Limiting inclusion to diagnostic cohort studies helps to ensure the quality of the supporting evidence and avoids overestimation of test accuracy due to potential bias introduced by case-control designs. However, this also means that fewer studies were available to contribute to summary estimates, potentially resulting in reduced precision of these estimates. Potentially relevant studies were identified through an extensive literature search and screening was carried out independently by two reviewers at each stage. We identified 113 studies that fulfilled our review inclusion criteria, considerably more than were included in previous reviews; the largest number of studies included in any of the previous reviews identified was 31 studies in addition to 11 reviews that included smaller numbers of studies. 12 Data extraction was also performed by one reviewer and checked by a second to ensure accuracy and completeness. We conducted a detailed risk of bias assessment using an appropriate and validated tool. 23 Syntheses of studies were carried out in line with Cochrane recommended methods and sensitivity analyses were performed to explore heterogeneity. 17,18 A large amount of heterogeneity was present across included studies. A wide variety of thresholds for test positivity were reported across studies, with some not reporting the threshold at all.
There is a lack of clarity on how thresholds relate to one another across laboratories and manufacturers. Where threshold units differed between assays, we assumed they represented the same arbitrary units and were comparable; however, as they do not measure absolute amounts of antibodies, there may be slight variation between different commercial assays. We would have liked to investigate differences between commercial kits and whether the accuracy of tests has changed over time as new testing methods have evolved.
However, the different thresholds at which results were reported and wide variety of commercial kits employed mean that there were insufficient data to allow us to stratify our analysis in this way.
There was substantial variation in coeliac disease prevalence between studies, likely due to differences in patient characteristics such as clinical presentation and reason for biopsy. Some studies
TABLE 3 Study estimates of sensitivity and specificity limited to specific subgroups, stratified by age group and test
reviews. The inclusion of case-control studies may have inflated previous accuracy estimates.

| Implications for clinical practice and future research
Serological tests are useful as a first step towards diagnosis in patients with suspected coeliac disease. The British Society of Paediatric Gastroenterology, Hepatology and Nutrition guidelines have already incorporated the safe and secure serological diagnosis of coeliac disease, allowing children meeting certain criteria (including IgA tTG ≥10× the upper limit of normal across a number of different assays) to be diagnosed without biopsy. These guidelines have since been validated in large prospective studies. 29,30 IgA tTG accuracy should always be internally validated against biopsy results within a practice, due to variation between assays and laboratory procedures.
There is increasing evidence of the high predictive value of IgA tTG ≥10× the upper limit of normal in an adult population, 31 with interim guidance including a non-biopsy protocol for adults with suspected coeliac disease published in light of the coronavirus pandemic. 7 We found IgA EMA to be highly specific in adults, lending support to its utility as a secondary test to reduce the likelihood of a false positive tTG result. This may help to pave the way for serological diagnosis of coeliac disease in an adult population in the future, a topic that is of great interest to the gastroenterology community.
However, EMA is not available in all laboratories because it depends much more on observer interpretation than other tests such as tTG.
The interpretation of serological test results remains an important area of research, and further work is needed to confirm the thresh- There is a need for research on serological test accuracy in primary care settings where serological tests are used in practice. With the growing movement towards biopsy-avoidant pathways, the diagnosis and management of coeliac disease is likely to increasingly take place in primary rather than secondary care. It is therefore key that serological testing strategies are evaluated in primary care populations.