Estimating the agreement and diagnostic accuracy of two diagnostic tests when one test is conducted on only a subsample of specimens
Abstract
We focus on the efficient use of specimen repositories for evaluating new diagnostic tests and for comparing new tests with existing tests. Typically, all pre‐existing diagnostic tests will already have been conducted on all specimens. However, we propose retesting only a judicious subsample of the specimens with the new diagnostic test. Subsampling reduces study costs and specimen consumption, yet estimates of agreement or diagnostic accuracy can retain adequate statistical efficiency. We introduce methods to estimate agreement statistics and conduct symmetry tests when the second test is conducted on only a subsample and no gold standard exists. The methods treat the subsample as a stratified two‐phase sample and use inverse‐probability weighting. Strata can be defined by any information available on all specimens and can be used to oversample the most informative specimens. The verification bias framework applies if the test conducted on only the subsample is a gold standard. We also present inverse‐probability‐weighting‐based estimators of diagnostic accuracy that take advantage of stratification. We present three examples demonstrating that adequate statistical efficiency can be achieved under subsampling while greatly reducing the number of specimens requiring retesting. Naively using standard estimators that ignore subsampling can lead to drastically misleading estimates. Through simulation, we assess the finite‐sample properties of our estimators and consider other possible sampling designs for our examples that could have further improved statistical efficiency. To help promote subsampling designs, our R package CompareTests computes all of our agreement and diagnostic accuracy statistics. Copyright © 2011 John Wiley & Sons, Ltd.
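To illustrate the inverse‐probability‐weighting idea described in the abstract, the following is a minimal Python sketch, not the CompareTests implementation. It assumes simple random sampling within each stratum, so that every retested specimen in stratum h receives weight N_h / n_h (specimens in the stratum divided by specimens retested). The function name `ipw_agreement`, the tuple layout, and the toy data are hypothetical and chosen only for illustration; variance estimation and symmetry tests are not covered.

```python
from collections import Counter, defaultdict


def ipw_agreement(specimens):
    """Weighted percent agreement and Cohen's kappa for two tests.

    specimens: iterable of (stratum, test1, test2) tuples, where test2 is
    None for specimens that were not retested with the second test.
    Assumes simple random sampling within strata (an assumption of this
    sketch), so each retested specimen gets weight N_h / n_h.
    """
    specimens = list(specimens)
    n_total = Counter(s for s, _, _ in specimens)
    n_sampled = Counter(s for s, _, t2 in specimens if t2 is not None)

    # Weighted cross-classification of test1 by test2.
    table = defaultdict(float)
    for s, t1, t2 in specimens:
        if t2 is None:
            continue
        table[(t1, t2)] += n_total[s] / n_sampled[s]

    total = sum(table.values())
    levels = sorted({k for pair in list(table) for k in pair})

    p_obs = sum(table.get((k, k), 0.0) for k in levels) / total
    row = {k: sum(table.get((k, j), 0.0) for j in levels) / total for k in levels}
    col = {k: sum(table.get((j, k), 0.0) for j in levels) / total for k in levels}
    p_exp = sum(row[k] * col[k] for k in levels)

    # Kappa is undefined when chance agreement is already perfect.
    kappa = float("nan") if p_exp == 1.0 else (p_obs - p_exp) / (1.0 - p_exp)
    return p_obs, kappa


# Toy data (invented): strata are defined by the first test's result.
# All 10 test1-positives are retested; only 9 of 90 test1-negatives are.
specimens = (
    [("pos", 1, 1)] * 8 + [("pos", 1, 0)] * 2
    + [("neg", 0, 0)] * 8 + [("neg", 0, 1)] * 1
    + [("neg", 0, None)] * 81
)

weighted_agreement, weighted_kappa = ipw_agreement(specimens)

# Naive estimate that ignores the sampling design.
retested = [(t1, t2) for _, t1, t2 in specimens if t2 is not None]
naive_agreement = sum(t1 == t2 for t1, t2 in retested) / len(retested)
```

In this toy data set, the weighted agreement is 0.88, whereas the naive estimate from the 19 retested specimens is 16/19 (about 0.84), because the oversampled test‐positive stratum is over‐represented among the retested specimens. The gap would grow with more extreme oversampling.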