Bayesian semiparametric ROC curve estimation and disease diagnosis



We develop a novel semiparametric modeling framework involving mixtures of Polya trees for screening data. The framework has the dual purpose of diagnosing infection or disease status and of assessing the accuracy of continuous diagnostic measures. First, (i) we obtain predictive probabilities of ‘disease’ based on continuous diagnostic test outcomes in conjunction with other information, including relevant covariates and results from one or more independent binary diagnostic tests. An example is the modeling of a serum enzyme-linked immunosorbent assay (ELISA) procedure for detecting antibodies to an infectious agent when it is used together with culture for antigen detection. Second, (ii) we characterize the diagnostic performance of continuous tests by estimating receiver-operating characteristic (ROC) curves and the area under the curve (AUC), primarily when such extra information is available.
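The two quantities in (i) and (ii) can be illustrated with a minimal sketch. It is not the Polya-tree method described here. Instead it assumes hypothetical binormal densities for the continuous test in diseased and nondiseased animals, and a binary culture result with hypothetical sensitivity 0.6 and specificity 0.99 that is conditionally independent of the continuous outcome given disease status. The prevalence of 0.2 is also an assumed value. The functions `posterior_disease`, `roc_point`, `auc_binormal` and `auc_trapezoid` are illustrative names only.

```python
import math
from statistics import NormalDist

# HYPOTHETICAL binormal stand-ins for the continuous test (e.g. ELISA) densities.
# The framework above uses mixtures of Polya trees; these values are assumptions.
F_D = NormalDist(mu=1.5, sigma=1.0)  # diseased
F_N = NormalDist(mu=0.0, sigma=1.0)  # nondiseased


def posterior_disease(y, t=None, prev=0.2, se=0.6, sp=0.99):
    """P(D=1 | y, t) by Bayes' rule.

    t is an optional binary test result (1 = positive, 0 = negative), assumed
    conditionally independent of y given disease status. prev, se and sp are
    hypothetical prevalence, sensitivity and specificity values.
    """
    like_d = prev * F_D.pdf(y)
    like_n = (1 - prev) * F_N.pdf(y)
    if t is not None:
        like_d *= se if t == 1 else 1 - se
        like_n *= (1 - sp) if t == 1 else sp
    return like_d / (like_d + like_n)


def roc_point(c):
    """(false-positive rate, true-positive rate) for the rule 'positive if y > c'."""
    return 1 - F_N.cdf(c), 1 - F_D.cdf(c)


def auc_binormal():
    """Closed-form AUC = P(Y_D > Y_N) for independent normal outcomes."""
    delta = F_D.mean - F_N.mean
    scale = math.hypot(F_D.stdev, F_N.stdev)
    return NormalDist().cdf(delta / scale)


def auc_trapezoid(n=2001, lo=-8.0, hi=10.0):
    """Numerical AUC: trapezoidal rule over ROC points on a threshold grid.

    The grid [lo, hi] is assumed wide enough to cover essentially all of the
    probability mass of both densities.
    """
    pts = sorted(roc_point(lo + (hi - lo) * k / (n - 1)) for k in range(n))
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


if __name__ == "__main__":
    y = 0.75  # equidistant from both means, so y alone carries no information here
    print(posterior_disease(y))       # ~0.2, i.e. the assumed prevalence
    print(posterior_disease(y, t=1))  # ~0.94 once a positive culture is added
    print(posterior_disease(y, t=0))  # drops below the prevalence
    print(auc_binormal(), auc_trapezoid())
```

At an ELISA value that is equally likely under both assumed densities, the continuous test alone leaves the probability at the prior prevalence. The binary result then moves it decisively. The closed-form and numerically integrated AUC values should agree closely.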

When true disease status is unknown, parametric and nonparametric analyses require the distributions of outcome values for the diseased and nondiseased populations to be sufficiently separated. Overlap between these distributions becomes less problematic, however, when additional information is available. This information can take the form of an informative ‘prior’ based on real (preferably data-based) scientific input, additional data such as covariates or binary test results, or both. It can be used to distinguish ‘diseased’ from ‘nondiseased’ individuals. We present an example using simulated data that illustrates this point. We also present an example involving data from an animal-health survey for Johne's disease, in which the performance of a serum ELISA is evaluated using additional information obtained from fecal culture. Issues related to identifiability and partial identifiability are also discussed. Copyright © 2008 John Wiley & Sons, Ltd.
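A generic latent-class formulation can illustrate why additional information helps. It is an assumption here and not necessarily the exact model used in this work. Let $D \in \{0,1\}$ be the unobserved disease status with prevalence $\pi$. Let $f_1$ and $f_0$ be the densities of the continuous outcome $y$ for diseased and nondiseased individuals. Let $t$ be a binary test result with sensitivity $Se$ and specificity $Sp$, conditionally independent of $y$ given $D$. Then

```latex
\begin{aligned}
p(y, t) &= \pi\, f_1(y)\, Se^{t} (1 - Se)^{1-t}
         + (1 - \pi)\, f_0(y)\, (1 - Sp)^{t}\, Sp^{1-t}, \\[4pt]
P(D = 1 \mid y, t) &= \frac{\pi\, f_1(y)\, Se^{t} (1 - Se)^{1-t}}{p(y, t)}.
\end{aligned}
```

Without $t$, only the marginal mixture $\pi f_1(y) + (1 - \pi) f_0(y)$ is observed. When $f_0$ and $f_1$ overlap heavily and are not restricted to a parametric family, many different combinations of $\pi$, $f_0$ and $f_1$ can reproduce the same mixture. The way positive and negative results of $t$ are distributed across values of $y$, combined with prior information on $Se$ and $Sp$, supplies the extra leverage for separating the two components.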