Diagnosis of ulcerative colitis before onset of inflammation by multivariate modeling of genome-wide gene expression data



Background: Endoscopically obtained mucosal biopsies play an important role in the differential diagnosis between ulcerative colitis (UC) and Crohn's disease (CD), but when neither macroscopic nor microscopic signs of inflammation are present, the biopsies provide only inconclusive information. Previous studies indicate that CD cannot be diagnosed by molecular or histological diagnostic tools from colonic biopsies without microscopic signs of inflammation, but it is unknown whether this is also the case for UC.

Methods: The aim of the present study was to apply multivariate modeling of genome-wide gene expression data to investigate whether a diagnosable preinflammatory state exists in biopsies of noninflamed UC colon, and to exploit such information to build a diagnostic tool.

Results: Genome-wide gene expression data were obtained from control subjects and from UC and CD patients. In total, 89 biopsies from 78 patients were included. A diagnostic model was derived with the random forest method from 71 biopsies from 60 patients. The model-internal out-of-bag performance measure yielded perfect classification. Furthermore, the model was validated in 18 independent noninflamed biopsies from 18 patients (7 UC, 7 CD, 4 control), where it achieved 100% sensitivity (95% confidence limits: 60.0–100) and 100% specificity (95% confidence limits: 71.5–100).
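
As a rough illustration of why perfect classification on a small validation set still leaves wide confidence limits, the sketch below computes sensitivity and specificity with exact (Clopper-Pearson) binomial limits. It assumes that UC is the positive class and that the CD and control biopsies together form the negative class (7 and 11 biopsies). The abstract does not state which interval method was used, so the choice of Clopper-Pearson and all function names here are assumptions, and the output need not match the reported limits to the decimal.

```python
from math import comb


def binom_cdf(k, n, p):
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) limits for x successes in n trials."""

    def solve(is_below_root):
        # Plain bisection on [0, 1]; is_below_root(p) is True while p is
        # still smaller than the limit we are looking for.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if is_below_root(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit solves P(X >= x | p) = alpha / 2 (upper tail grows with p).
    lower = 0.0 if x == 0 else solve(
        lambda p: 1 - binom_cdf(x - 1, n, p) < alpha / 2
    )
    # Upper limit solves P(X <= x | p) = alpha / 2 (lower tail shrinks with p).
    upper = 1.0 if x == n else solve(lambda p: binom_cdf(x, n, p) > alpha / 2)
    return lower, upper


# Assumed validation counts: 7 UC biopsies (positives) and
# 7 CD + 4 control = 11 non-UC biopsies (negatives), all classified correctly.
true_pos, n_pos = 7, 7
true_neg, n_neg = 11, 11

sensitivity = true_pos / n_pos
specificity = true_neg / n_neg
sens_limits = clopper_pearson(true_pos, n_pos)
spec_limits = clopper_pearson(true_neg, n_neg)

print(f"sensitivity {sensitivity:.0%}, 95% limits "
      f"{sens_limits[0]:.1%} to {sens_limits[1]:.1%}")
print(f"specificity {specificity:.0%}, 95% limits "
      f"{spec_limits[0]:.1%} to {spec_limits[1]:.1%}")
```

With no misclassifications the upper limit is fixed at 100%, and the lower limit depends only on the number of biopsies in each class, which is why the smaller UC group gives the wider interval.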

Conclusions: The present study demonstrates a preinflammatory state in patients diagnosed with UC. In addition, we demonstrate the usefulness of random forest modeling of genome-wide gene expression data for distinguishing quiescent and active UC colonic mucosa from control and CD colonic mucosa.

(Inflamm Bowel Dis 2009)