Performance of different rapid antigen testing strategies for SARS‐CoV‐2: A living rapid review

Rapid antigen detection tests (RADTs) for SARS‐CoV‐2 testing offer several advantages over molecular tests, but there is little evidence supporting an ideal testing algorithm. We aimed to examine the diagnostic test accuracy (DTA) and the effectiveness of different RADT SARS‐CoV‐2 testing strategies.


| INTRODUCTION
The COVID-19 pandemic remains a public health issue. Globally, as of February 2023, more than three-quarters of a billion confirmed cases of COVID-19 had been reported. 1 Testing is essential not only for the diagnosis and treatment of COVID-19 but also to control its spread by enabling public health measures such as contact tracing, quarantine measures and community disease surveillance.
Nucleic acid amplification tests (NAATs), including real-time reverse-transcription polymerase chain reaction (RT-PCR), are considered to be the 'gold standard' for identification of SARS-CoV-2, [2][3][4] with an upper respiratory specimen as the recommended sampling method for clinical diagnosis. 5 While nasopharyngeal RT-PCR testing has high diagnostic accuracy, 2,3 drawbacks include the difficulty of specimen collection and the discomfort associated with the procedure, along with the need for centralized test processing and a long turnaround time for results. In response, there was a demand for more rapid and accessible assays for SARS-CoV-2 detection, such as antigen tests. Rapid antigen detection tests (RADTs) produce results faster than molecular tests (15-30 min), 6 are less expensive than RT-PCR 7,8 and reduce the need for specialized equipment, allowing the tests to be used in settings where screening among high-risk populations is required (e.g. congregate settings).
RADTs have specificity comparable to RT-PCR but lower sensitivity, particularly in asymptomatic populations. 9 The World Health Organization (WHO) recommends that antigen tests meet the minimum performance requirements of sensitivity and specificity >80% and 97%, respectively. 10 To overcome the lower sensitivity of antigen testing, some organizations have recommended repeated (serial) testing after a negative antigen test. 6,10 However, limited evidence supports the ideal testing algorithm for RADTs. A 2022 Cochrane review examined the diagnostic test accuracy (DTA) of serial testing with RADTs and conducted secondary analyses by sampling strategy, including sampling types or the individual performing the test. 9 That review did not report the effect of the individual performing the sampling on DTA, an important question when developing testing algorithms, especially in community settings. There is also a need to assess how RADT strategies contribute to infection prevention and control programmes by reducing the incidence or transmission of SARS-CoV-2 in the community. The results of RADTs vary across studies, with different factors affecting test performance, including test timing relative to symptom onset, viral load, symptom status (asymptomatic or symptomatic) and vaccination details. 11,12 Additionally, the evaluation of RADTs has been further complicated by the waves of the pandemic and changes in control measures, resulting in a need for a living review.
This living rapid review and meta-analysis aimed to determine the effectiveness of different SARS-CoV-2 RADT strategies. RADT testing strategies of interest included (1) the individual collecting the samples (e.g. self-administered testing); (2) the test operator; (3) different testing frequencies (e.g. one-off compared to serial testing); and (4) sample type (e.g. nasal, nasopharyngeal and saliva). We aimed to systematically examine (A) the DTA of these strategies, as well as (B) their effectiveness in reducing incidence or transmission, to inform clinical and public health policymakers on the best RADT testing strategy.

| METHODS
This review was conducted following Cochrane guidance for rapid reviews 13 and followed a pre-defined protocol publicly registered on PROSPERO (CRD42021284168), the National Collaborating Centre for Methods and Tools Registry (ID#471) and Open Science Framework (https://osf.io/pqjyr/). Reporting guidance was followed using the PRISMA-DTA checklist (Appendix S1). 14

| Search strategy
A detailed search strategy was developed by an information specialist (BS) in consultation with the review team (Appendix S3). Using the Ovid platform, we searched Ovid MEDLINE® ALL, Embase and EBM Reviews Cochrane Central Register of Controlled Trials. No language restrictions were placed on the searches. The initial search was conducted on 12 September 2021, and updated searches were conducted at regular intervals until 21 February 2022. The search strategy was peer-reviewed (via PRESS) 15 before execution (Appendix S2). We also searched COVID-specific resources (

KEYWORDS
COVID-19, SARS-CoV-2 rapid antigen detection tests, self-testing, serial testing, testing strategies

| Eligibility criteria
A complete summary of the eligibility criteria is available in Appendix S4. There were no restrictions on populations or settings. Eligible study designs included randomized controlled trials (RCTs), non-randomized controlled trials and observational studies with a control group. Studies published in languages other than English were excluded due to resource limitations.
We included studies that compared at least two different testing strategies with RADTs in the same testing population. A RADT strategy against no testing was also an eligible comparison. Any SARS-CoV-2 commercial point-of-care RADT was eligible for inclusion. 16 We excluded antigen assays that required laboratory equipment for processing (e.g. chemiluminescent enzyme immunoassays, such as Lumipulse®). We examined DTA outcomes (e.g. test sensitivity) and the transmission and incidence of SARS-CoV-2 infection related to different antigen testing strategies. For DTA outcomes, studies were eligible if the DTA of the RADTs was calculated using a molecular test as the reference standard.

| Study selection
The results of the search were screened using DistillerSR online systematic review software. 17 All reviewers completed a training exercise of 50 articles for title and abstract screening and 10 for full-text review prior to screening to ensure agreement between reviewers. One reviewer (ABeck, ABennett, NS or GZ) independently screened each title and abstract for inclusion. Full-text review was also completed by one reviewer (NS or GZ) with verification by a different second reviewer (NS or GZ) to determine inclusion. Disagreements were resolved through discussion or third-reviewer consultation (ABennett).

| Data extraction
Prior to beginning data extraction, all reviewers independently conducted a pilot training exercise. One reviewer (ABennett, NS or GZ) extracted data from each included study in Microsoft Excel, 18 with a different second reviewer verifying data (NS or GZ).
We extracted study characteristics, including the total number of samples analysed and participants in each study. We also extracted variables that may have influenced test performance, including test timing relative to symptom onset, symptom status (asymptomatic or symptomatic) and vaccination details. In cases where studies did not report vaccination details, data on the cumulative doses for COVID-19 vaccinations were estimated from Our World in Data based on reported study dates and country. 19 For DTA outcomes, summary estimates of sensitivity and specificity were extracted, as well as study data on the number of true positives, false negatives, false positives and true negatives. Any study that reported on multiple eligible antigen tests (e.g. two or more alternative antigen sampling techniques, each compared to a nasopharyngeal swab) had the results from each test extracted as a unique dataset. We also recorded the incidence of PCR-confirmed infection for different testing strategies.
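The relationship between the extracted 2 × 2 counts and the summary accuracy estimates can be sketched as follows. This is a minimal illustration, not the review's extraction tooling: the class name `TwoByTwo` and the example counts are hypothetical and do not come from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cross-classification of an index test (RADT) against a reference standard."""
    tp: int  # index positive, reference positive
    fn: int  # index negative, reference positive
    fp: int  # index positive, reference negative
    tn: int  # index negative, reference negative

    def sensitivity(self) -> float:
        # Proportion of reference-positive samples detected by the index test.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of reference-negative samples correctly called negative.
        return self.tn / (self.tn + self.fp)


# Hypothetical counts, for illustration only.
example = TwoByTwo(tp=78, fn=22, fp=1, tn=99)
print(f"sensitivity={example.sensitivity():.2f}, specificity={example.specificity():.2f}")
```

Each alternative sampling technique reported by a study would yield its own table of this form, which is why such results were extracted as separate datasets.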

| Risk of bias assessment
One reviewer (ABennett or NS) independently assessed the risk of bias in each included study, with a different second reviewer verifying assessments (ABennett or NS). Reviewers independently completed a training exercise of five articles before beginning assessments. We used the QUADAS-C 20 tool to assess the risk of bias in comparative DTA studies and the Cochrane RoB-2 tool for the single randomized cluster-controlled trial. 21

| Data synthesis
We described the study and participant characteristics, intervention and comparator details, outcome results, and risk of bias assessments for all included studies. For transmission and incidence outcomes, data were synthesized narratively. DTA results are presented visually in forest plots and pooled via meta-analysis, where appropriate.
Summary estimates of pooled sensitivity and specificity with 95% CIs were estimated using a univariate random-effects model. A univariate model was selected over a bivariate model due to the small number of included samples and the homogeneity observed in the specificity results. A continuity correction of 0.5 was applied for studies with an observed count of zero for any of the cells of the 2 × 2 table. Studies were included in the meta-analysis if they (1) provided data for at least two RADT strategies, (2) used the same RADT for each testing strategy, and (3) used paired samples for both the reference tests and the two index testing strategies. Studies were only included in the meta-analyses if they provided data to complete a 2 × 2 table. If a study reported more than one analysis that would be eligible for inclusion in a meta-analysis, we first conducted a fixed-effects meta-analysis of those analyses and then used the resulting pooled estimates of sensitivity and specificity in the random-effects meta-analysis.
For the meta-analysis, data were pooled separately for each category of sampling strategy (e.g. accuracy of testing by sample type, the individual performing the sampling). When data were synthesized from sample types, different forms of nasal samples (e.g. anterior nasal, mid-turbinate) were collapsed under 'nasal'. Studies unable to be included in the meta-analysis had their results synthesized narratively.
Heterogeneity was explored through visual inspections of the forest plots and the I² statistic. For the meta-analysis of the individual performing the sampling, pre-specified subgroup analyses were performed for the sample type. Due to small sample sizes in subgroups, heterogeneity between studies and differences between subgroups are described, but no statistical tests were performed. Data were analysed using RevMan software, 22 as well as the metafor and mada packages in R software (R Core Team, 2022).
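To make the pooling steps concrete, the sketch below applies a 0.5 continuity correction and a univariate random-effects model to study-level sensitivities. Several choices here are assumptions rather than details reported in this review: pooling on the logit scale, the DerSimonian-Laird estimator for the between-study variance, and adding the correction to all four cells. The function names and counts are hypothetical. The review itself used RevMan and the R packages metafor and mada, not this code.

```python
import math


def corrected_cells(tp, fn, fp, tn, correction=0.5):
    """Add `correction` to every cell when any cell of the 2 x 2 table is zero."""
    cells = (tp, fn, fp, tn)
    if any(c == 0 for c in cells):
        return tuple(c + correction for c in cells)
    return tuple(float(c) for c in cells)


def logit_sensitivity(tp, fn, fp, tn):
    """Return (logit of sensitivity, approximate variance) for one study."""
    tp, fn, _, _ = corrected_cells(tp, fn, fp, tn)
    return math.log(tp / fn), 1.0 / tp + 1.0 / fn


def pool_random_effects(effects):
    """DerSimonian-Laird pooling (assumed estimator) of (estimate, variance) pairs."""
    weights = [1.0 / v for _, v in effects]
    total = sum(weights)
    fixed = sum(w * y for w, (y, _) in zip(weights, effects)) / total
    q = sum(w * (y - fixed) ** 2 for w, (y, _) in zip(weights, effects))
    c = total - sum(w * w for w in weights) / total
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    re_weights = [1.0 / (v + tau2) for _, v in effects]
    pooled = sum(w * y for w, (y, _) in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))
    return pooled, se, tau2


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical (tp, fn, fp, tn) counts, not data from the included studies.
studies = [(45, 15, 0, 140), (30, 20, 1, 99), (80, 10, 2, 198)]
pooled, se, tau2 = pool_random_effects([logit_sensitivity(*s) for s in studies])
low, high = inv_logit(pooled - 1.96 * se), inv_logit(pooled + 1.96 * se)
print(f"pooled sensitivity {inv_logit(pooled):.2f} "
      f"(95% CI {low:.2f}-{high:.2f}), tau^2 {tau2:.3f}")
```

Specificity can be pooled the same way by swapping in the `tn` and `fp` cells. A bivariate model would instead estimate sensitivity and specificity jointly, which is the alternative the authors considered and set aside.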
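For readers unfamiliar with the I² statistic, the following sketch computes it from Cochran's Q using the standard definition, I² = max(0, (Q − df)/Q) × 100%. The function name `i_squared` and the inputs are hypothetical; the values reported in this review came from the software named above.

```python
def i_squared(estimates, variances):
    """Percentage of total variability attributable to between-study heterogeneity."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    if q <= df:
        return 0.0
    return 100.0 * (q - df) / q


# Hypothetical logit-scale estimates and variances, for illustration only.
print(i_squared([0.0, 2.0], [0.5, 0.5]))
```

Values near 0% indicate that observed differences are compatible with sampling error alone, while values above roughly 75% are conventionally read as considerable heterogeneity.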

| Protocol amendments
Prior to commencing full-text screening for the first search results, the eligibility criteria were amended to include molecular tests as an eligible comparator if studies used molecular tests as a reference standard to evaluate the effectiveness of alternative rapid antigen testing strategies for diagnosing COVID-19. No other amendments were made.

| RESULTS
The searches retrieved 8010 unique articles. After title and abstract screening, 1279 articles underwent full-text review and 18 studies were included (Figure 1). A complete list of excluded full-text studies is available in Appendix S5. Study designs included a cluster-randomized trial, 12 prospective cohort studies, two retrospective cohort studies and four cross-sectional studies.
Study characteristics are summarized in Table 1, and a complete summary of results is available in Table 2. Studies were conducted between 2020 and 2021 and published in 2021 or 2022. One pre-print was included. 23 Most reports were from the US or Western Europe, with additional studies from Hong Kong, 24,25 Pakistan 26 and Japan. 27,28 Although the dominant strain in circulation was rarely reported, the original SARS-CoV-2 24,[29][30][31][32][33] and early variants (Alpha and Delta) were the most common strains based on the study setting. Testing settings ranged from inpatient hospitals to ambulatory and community testing sites. Most reports focused on testing within adult populations, although some specified that those younger than 18 were also tested. 24,27,34,35 Although most studies included asymptomatic and symptomatic people, some solely focused on symptomatic populations. 26,28,36,37 Studies reported a range of days for the timing of the antigen tests in relation to participant symptom onset, with most studies noting an average/median duration of symptoms of fewer than 7 days on the testing day. Although only one study explicitly reported on vaccination details, 23 most studies were conducted prior to vaccine rollout or at the early stages of population-wide vaccination.

| Quality assessment summary
The risk of bias in the 17 diagnostic accuracy studies was assessed with the QUADAS-C tool. 20 Studies had a moderate 30,31,33,[38][39][40] to high [24][25][26][27][28][29]32,34,36,37 risk of bias (Appendix S6) when evaluated using the QUADAS-2 tool, and 15/17 studies [23][24][25][26][27][28][29][31][32][33][34][36][37][38][39] were rated at high risk of bias when the additional QUADAS-C assessment was completed. Several studies 24,27,29,30,30,31,33,38,39 did not specify how patients were selected into the study and were at an unknown risk of bias for the patient selection domain. Studies that recruited participants 34 or that used case-control designs 25,26,28,37 were at higher risk of selection bias. An additional issue identified with the index strategy comparisons was the failure of some studies to use a fully paired design. 25,27,28,33,36,39 Many studies failed to report whether index test results (for either index test strategy) were interpreted without knowledge of the results of the reference standard 26,27,29,31,33,34 or vice versa. 24,27,[29][30][31][32]34,39 Studies were not rated at increased risk of bias due to multiple samples taken simultaneously, as studies have suggested that repeated sample collection does not decrease COVID-19 test performance. 41 Seven studies 23,24,27,29,32,34,36 were at high risk of bias in the flow and timing domain. Common issues included using different reference standards or unclear reporting of participants excluded from analyses. Additional issues for the comparison were that the reference standard varied for some of the index tests 23,25,27,28,31,32,36,38 or was unclear. 26,33 The single cluster-randomized trial was judged to have an overall rating of 'some concerns' (Appendix S7) using the Cochrane tool to assess the risk of bias in cluster-randomized trials (RoB 2). 21 Concerns were noted regarding the timing of identification or recruitment of participants into the trial, the opportunity for bias due to deviations from the intended interventions, and potential bias in the measurement of the outcome.

| Testing frequency
We included one longitudinal study that addressed the DTA of serial testing strategies. Smith et al. 2021 followed 43 adults infected with SARS-CoV-2 as part of a wider university surveillance testing program. 34 Nasal and saliva swabs were taken daily from participants for RT-qPCR and antigen fluorescent immunoassay. Protocol sensitivity (the ability of repeated testing over 2 weeks to detect an infected person) was reported for different testing frequencies. All tests showed >98% sensitivity if administered at least every 3 days. Protocol sensitivity dropped to 80% at weekly testing frequencies. The authors also compared the ability of different testing frequencies to identify cases before or during the period when infectious virus was detectable in nasal samples. When tests were performed less frequently than daily, there was a reduction in protocol sensitivity during this infectious period for both RT-qPCR and antigen testing. When testing weekly, the sensitivity of the antigen test was only 56%.
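The intuition for why more frequent testing raises protocol sensitivity can be sketched with a deliberately simplified model. It assumes every test detects infection independently with the same probability, which is not how the study above estimated protocol sensitivity: those figures were empirical and reflect viral load changing over the course of infection. The function name and inputs are hypothetical.

```python
def protocol_sensitivity(per_test_sensitivity: float, n_tests: int) -> float:
    """Probability that at least one of n independent tests is positive.

    Simplifying assumption: tests are independent and share one fixed
    sensitivity. Real serial testing violates both conditions.
    """
    if not 0.0 <= per_test_sensitivity <= 1.0:
        raise ValueError("per_test_sensitivity must be between 0 and 1")
    if n_tests < 0:
        raise ValueError("n_tests must be non-negative")
    return 1.0 - (1.0 - per_test_sensitivity) ** n_tests


# Illustrative only: a test with 50% single-use sensitivity, repeated 1 to 5 times.
for n in range(1, 6):
    print(n, round(protocol_sensitivity(0.5, n), 3))
```

Under this model the gain from each additional test shrinks as the number of tests grows, so the largest benefit comes from moving from a single test to a small number of repeats.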

| Individual collecting samples
Seven studies 24,29,30,32,37,38,40 were included in the meta-analysis of DTA for self-collected samples versus samples collected by healthcare workers (HCWs) for RADT testing (Figure 2). Pooled sensitivity was higher for HCW-collected samples (78%, 95% CI: 68%-89%) than self-collected samples (64%, 42%-85%), but the pooled specificity remained high (99%) regardless of who collected the samples. The heterogeneity of the sensitivity estimates was very high for both HCW-collected and self-collected samples (I² = 86% and I² = 95%, respectively). To explore heterogeneity between studies, we examined subgroups stratified by who collected the samples and sample type. Subgroup analyses for self-collected nasal samples as the index test strategy (n = 5) 24,30,32,37,38 showed similar pooled sensitivities for self-collected samples (80%, 95% CI: 71%-89%) and paired HCW-collected samples (85%, 95% CI: 80%-90%) (Appendix S8). Sensitivity estimates were very heterogeneous across the small subset of three studies 29,37,40 that used self-collected saliva samples as the index test strategy (Appendix S9). While pooled saliva sensitivity was lower than that of nasal specimens regardless of who collected the samples, the self-collected saliva samples showed a clear decrease in sensitivity compared to the HCW-collected saliva samples (32% vs. 67%). Heterogeneity remained substantial (I² = 86% and I² = 96%, respectively), suggesting that factors other than the individual collecting samples and the sample site affected accuracy. Data from Frediani et al 36 were excluded from the meta-analysis because the study did not use a fully paired design. That study also reported higher sensitivity for HCW-collected samples, although the differences in sensitivity were not statistically significant.

| Test operator
One study compared different operators for RADT performance (non-paired index strategy samples). 33 Nasopharyngeal samples were collected by a laboratory scientist, a fully trained research HCW or a self-trained lay individual, along with a paired nasopharyngeal RT-PCR as the reference standard. RADT sensitivity was highest when administered by laboratory scientists (79%, 95% CI: 72%-84%) and lower when trained HCWs (70%, 64%-76%) or self-trained members of the public (58%, 52%-63%) administered the test.

| Sample type
Four studies 29,31,37,40 were included in the meta-analysis for saliva vs. nasopharyngeal samples for antigen testing (Figure 4). A high degree of heterogeneity was observed for sensitivity across studies for both saliva (I² = 95%) and nasopharyngeal samples (I² = 87%). Summary sensitivity estimates were lower for saliva samples (25%) than for nasopharyngeal samples (59%). Specificity was high (>99%) across all studies for both saliva and nasopharyngeal samples. Four studies were not included in the meta-analysis because they did not use fully paired samples 27,39 or provided sensitivity estimates alone. 26,28 Although estimates were heterogeneous between studies, all four reported that sensitivity was substantially higher in nasopharyngeal samples than in saliva-based samples.
Of note, the pooled sensitivity of RADTs conducted with nasopharyngeal samples varied across analyses (84% for studies with paired nasal samples and 59% for studies with paired saliva samples), suggesting heterogeneity between included studies beyond the sampling location. Indeed, the I² for these analyses was 52% and 87%, respectively. We could not explore between-study heterogeneity due to the small number of included studies and lack of subgroup data for factors of interest. Sources of heterogeneity of interest included symptom status, the brand of antigen test used in each study, the test setting (hospital versus ambulatory-based), the study region and the vaccination status of the population tested.
Three studies 23,25,39 provided data on oropharyngeal samples (Figure 5). Studies were not pooled as two 25,39 of these studies did not use fully paired samples for the index testing strategies. Two studies found equivalent sensitivities when combined oropharyngeal-nasal samples were compared with paired nasopharyngeal samples. Courtellemont et al 39 found that RADTs performed on oropharyngeal samples had lower sensitivity (71%) than tests performed on nasopharyngeal samples (97%).

| Incidence/transmission outcomes
Young and colleagues carried out a cluster-randomized, controlled trial in 201 secondary schools and further education colleges in England to examine the effect of an antigen testing strategy for school-based COVID-19 contacts. Schools were randomized to either (1) self-isolation of contacts for 10 days (control) or (2) voluntary daily lateral flow device (LFD) testing for 7 days, with LFD-negative contacts remaining at school (intervention). Contacts in the intervention schools self-tested by swabbing their anterior nasal cavity, and samples were tested by school staff using a SARS-CoV-2 antigen LFD.
There was no statistically significant difference between study groups in symptomatic PCR-confirmed infection (aIRR = 0.96, 95% CI: 0.75-1.22). The authors concluded that the RADT daily testing strategy for school-based contacts was non-inferior to self-isolation for controlling COVID-19 transmission.
As noted in the quality assessment summary, there were some concerns about the risk of bias.

| DISCUSSION
This rapid review aimed to synthesize studies comparing alternative RADT strategies for SARS-CoV-2 testing. Despite our interest in the effect of different antigen testing strategies on COVID-19 incidence and transmission, we identified limited evidence. Although we found evidence on the DTA of various antigen testing methods, the evidence base was heterogeneous and had risk of bias concerns. One UK cluster-randomized trial found that an antigen-based daily testing strategy was non-inferior to self-isolation for the control of COVID-19 transmission. However, given the data were drawn from a single study with concerns for bias, additional research is needed to support these findings. Daily RADTs as an alternative to self-isolation may be particularly relevant to school or workplace settings, where avoiding extended periods of absence is desirable. A 2022 rapid review reported on the effectiveness of RADT screening programs, including population-level screening, pre-event screening and serial testing, to limit the transmission of SARS-CoV-2. 42 However, the authors could not conclude whether the screening of asymptomatic individuals was effective in limiting transmission due to inconsistency in the results, a low number of included studies, and concerns surrounding methodological quality. As antigen testing programs continue to be implemented in different contexts, a standardized evaluation component should be added so that more data are available to evaluate the effect of these programs on SARS-CoV-2 transmission.
For the single included study that reported DTA by testing frequency, protocol sensitivity remained high if RADTs were administered at least every 3 days but fell at weekly testing frequencies. When infectious virus was detectable in nasal samples, antigen testing less frequently than daily showed a decrease in protocol sensitivity. While the current body of evidence is very limited, these data suggest the importance of repeated antigen testing when developing testing algorithms. Overall, while specificity remained high across strategies, heterogeneity was a significant challenge in conducting the meta-analysis of sensitivity. While the overall trends and patterns observed for testing strategies were similar across included studies, the high level of heterogeneity suggests that the point estimates for sensitivity should be considered cautiously. Regardless of testing strategy, the sensitivity of RADTs was markedly lower than that of molecular tests, with few studies reporting RADTs that met the recommended minimum WHO sensitivity requirements. 10 While still low compared to the RT-PCR reference standard, RADT sensitivity was higher for HCW-collected samples (78%) than for self-collected samples (64%). When also stratified by sample type, nasal samples showed only slightly decreased sensitivities for self-collected samples compared to those collected by HCWs. These findings align with prior meta-analyses for molecular testing; Tsang et al. found no statistical difference between pooled nasal and throat samples collected by HCWs or self-collected samples. 3 By contrast, self-collected saliva samples showed decreased sensitivity compared to paired HCW-collected saliva samples. Antigen testing with nasal samples had sensitivity similar to that with paired nasopharyngeal samples. In contrast, sensitivity was much lower for saliva samples when compared to paired nasopharyngeal samples.
Despite the low sensitivity compared to molecular testing, RADTs have been implemented globally, 43 from testing symptomatic individuals or high-risk groups 44 to population-level screening. 45 The differences in sensitivity between RADT strategies reported in this review warrant consideration in developing comprehensive testing strategies. False negatives may lead to increased community transmission, especially in asymptomatic individuals. False negatives are particularly important in high-risk settings, such as hospitals or long-term care facilities, where undetected SARS-CoV-2 spread can lead to outbreaks with severe consequences.
Others have suggested that a single testing approach for SARS-CoV-2 is not appropriate and that public health strategies must be tailored to the specific purpose of testing, including diagnostic, screening and surveillance testing. 7 While further research is needed to evaluate the DTA of antigen testing strategies in primarily asymptomatic populations, self-collected samples may be ideal for screening settings where routine rapid antigen testing is needed. The ease of collecting samples other than nasopharyngeal swabs may outweigh decreases in sensitivity, especially as this review found only slight decreases in sensitivity for self-collected nasal samples. In non-high-risk settings, such as general population screening, RADTs may be combined with other public health measures to limit transmission, such as masking and physical distancing. Serial testing may be another way to overcome some of the lower sensitivity of RADTs, as it was observed in this review that frequent testing increased protocol sensitivity. Public health programs will likely continue implementing a mixture of SARS-CoV-2 testing programs, dependent on purpose and test setting, to control the pandemic.
Our review had several limitations. First, screening, data extraction and risk of bias assessments were completed by a single reviewer, although we did have a second reviewer verify included studies and the accuracy of extracted data. We were limited in our ability to perform formal statistical analyses of heterogeneity due to the low number of included studies and lack of data reported for factors of interest. We were unable to perform pre-specified subgroup analyses to explore the effect of important test performance factors, such as patient symptom status, days since symptom onset at time of testing or vaccination status, due to the small number of included studies for each strategy. While forest plots and meta-analyses were used to describe patterns in test accuracy, the high level of heterogeneity suggests that the point estimates for sensitivity should be regarded with caution. Future research should explore how patient factors, such as symptom status, viral load and disease severity, may play a role in DTA for each sampling strategy. It has been suggested that RADTs perform well in individuals with the highest viral loads, thus enabling the identification of those who are at highest risk of transmitting the virus. 7 Additionally, although some studies reported that self-collection for RADT testing was supervised by HCWs, there were too few studies reporting supervision to explore this effect. However, in prior reviews there was a negligible difference in RT-PCR accuracy due to supervision in sample collection. 2 Finally, although all DTA studies used RT-PCR as the reference standard, the test brand and sample type used for RT-PCR varied across studies and sometimes within studies. We do not believe that this had a large effect, given the similarly high diagnostic accuracy of molecular test platforms and sample types, 2,3 but this was considered a potential source of bias in our quality assessments. Our overall certainty in the evidence is low due to these inconsistencies and other risk of bias concerns. Future research comparing RADT testing strategies should prioritize the use of fully paired designs with the same molecular reference standards and ensure clear reporting adhering to STARD guidelines. 46
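The practical weight of a sensitivity difference can be illustrated with a short calculation of expected false negatives. The sensitivities of 78% (HCW-collected) and 64% (self-collected) are the pooled estimates from this review; the cohort size of 1000 and the 5% prevalence are hypothetical values chosen only for the example, and the function name is an assumption.

```python
def expected_false_negatives(n_tested: int, prevalence: float, sensitivity: float) -> float:
    """Expected number of infected people missed by a single round of testing."""
    infected = n_tested * prevalence
    return infected * (1.0 - sensitivity)


# Hypothetical screening round: 1000 people, 5% infected.
N_TESTED, PREVALENCE = 1000, 0.05
for label, sens in (("HCW-collected", 0.78), ("self-collected", 0.64)):
    missed = expected_false_negatives(N_TESTED, PREVALENCE, sens)
    print(f"{label}: about {missed:.0f} infections missed")
```

Under these assumed inputs, roughly 11 of 50 infections would be missed with HCW collection and roughly 18 with self-collection. The gap scales linearly with prevalence, so it matters most where infection is common or the consequences of a missed case are severe.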

| CONCLUSION
We conducted a living rapid review between September 2021 and February 2022 to evaluate the effectiveness of different RADT testing strategies. The identified evidence comparing RADT testing strategies was highly heterogeneous and had concerns of risk of bias. Additionally, we found little evidence on serial testing or the value of RADT strategies to limit incidence or transmission. Overall, RADTs showed high specificity when using molecular tests as the reference standard, regardless of testing strategy. RADT sensitivity appears to be affected by the testing frequency, the individual performing sample collection and the sample type. RADT testing strategies should be optimized to fit the purpose and context of testing.

FIGURE 2
Forest plot of paired RADT diagnostic accuracy results for self-collected versus HCW-collected samples (RT-PCR as reference standard). (A) All studies (B) Studies included in meta-analysis

FIGURE 3
Forest plot of paired RADT diagnostic accuracy results for nasal samples versus nasopharyngeal samples (RT-PCR as reference standard). (A) All studies (B) Studies included in meta-analysis

FIGURE 4
Forest plot of paired RADT diagnostic accuracy results for salivary samples versus nasopharyngeal samples (RT-PCR as reference standard). (A) All studies (B) Studies included in meta-analysis

FIGURE 5
Forest plot of paired RADT diagnostic accuracy results for oropharyngeal samples versus nasopharyngeal samples (RT-PCR as reference standard).

TABLE 2
Summary of effectiveness of different COVID-19 rapid testing strategies.
a Protocol sensitivity of each testing platform to detect an infected person before or during the time in which their viral culture was positive.