In the real world, experts often hold different opinions on the same problem because of their different experience, so reaching a consistent decision becomes a critical issue. For example, patient similarity assessment is an important task in patient cohort identification for comparative effectiveness studies and in clinical decision-support applications. The goal is to derive a clinically meaningful distance metric that measures the similarity between patients represented by their key clinical indicators. Ideally, this distance metric is learned from experts' knowledge of clinical similarity among subjects. However, different physicians often understand patient similarity differently, depending on the specifics of the cases. A distance metric learned for an individual physician therefore often captures only a limited view of the true underlying distance metric. The key challenge is how to integrate the individual distance metrics obtained from a group of physicians into a single, globally consistent metric.
To achieve this goal, this paper proposes the composite distance integration (Comdi) approach. Comdi first constructs discriminative neighborhoods from each individual metric, then combines the discriminative information in all of those neighborhoods to learn a single optimal distance metric. We formulate Comdi as a quadratic optimization problem and propose an efficient alternating strategy to solve it. Besides learning a globally consistent metric, Comdi provides an elegant way to share knowledge across multiple experts without sharing the underlying data, which lowers the risk of disclosing private data. Experiments on several benchmark data sets show approximately 10% improvement in classification accuracy over baseline methods, suggesting that Comdi is an effective and general metric learning approach. We also present two case studies applying Comdi to real-world clinical data sets. © 2012 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 5: 54–69, 2012
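The two-stage idea described above (build discriminative neighborhoods under each expert's metric, then merge them into one metric) can be illustrated with a small Python sketch. It rests on simplifying assumptions that are not taken from the paper: each expert's metric is given as a Mahalanobis matrix, a neighborhood is the k nearest same-label ("homogeneous") and different-label ("heterogeneous") points, experts are weighted equally, and the combined metric is a rank-one projection w that maximizes the weighted trace difference w^T(S_het − S_hom)w subject to ||w|| = 1, found by power iteration. The paper's actual quadratic formulation and its alternating updates are not reproduced, and all function names (`combine_experts`, `neighborhoods`, and so on) are hypothetical.

```python
import math


def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y) for a given matrix M."""
    d = [a - b for a, b in zip(x, y)]
    return sum(d[i] * M[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))


def neighborhoods(X, labels, M, k):
    """Homogeneous (same-label) and heterogeneous (different-label) neighbor pairs,
    keeping the k nearest of each kind for every point under the metric M."""
    hom, het = [], []
    for i in range(len(X)):
        ranked = sorted((mahalanobis_sq(X[i], X[j], M), j) for j in range(len(X)) if j != i)
        same = [j for _, j in ranked if labels[j] == labels[i]][:k]
        diff = [j for _, j in ranked if labels[j] != labels[i]][:k]
        hom.extend((i, j) for j in same)
        het.extend((i, j) for j in diff)
    return hom, het


def scatter(X, pairs, dim):
    """Average outer product of the difference vectors x_i - x_j over the given pairs."""
    S = [[0.0] * dim for _ in range(dim)]
    for i, j in pairs:
        d = [a - b for a, b in zip(X[i], X[j])]
        for r in range(dim):
            for c in range(dim):
                S[r][c] += d[r] * d[c]
    n = max(len(pairs), 1)
    return [[v / n for v in row] for row in S]


def top_eigvec(A, iters=500):
    """Unit eigenvector for the largest eigenvalue of a symmetric matrix A (power iteration).
    A is shifted by a bound on its spectral radius so every eigenvalue becomes positive."""
    dim = len(A)
    shift = sum(abs(v) for row in A for v in row) + 1.0
    B = [[A[r][c] + (shift if r == c else 0.0) for c in range(dim)] for r in range(dim)]
    w = [1.0] * dim
    for _ in range(iters):
        v = [sum(B[r][c] * w[c] for c in range(dim)) for r in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        w = [x / norm for x in v]
    return w


def combine_experts(X, labels, expert_metrics, k=2, weights=None):
    """Hypothetical combination step: weighted sum of (S_het - S_hom) over experts,
    then the unit direction w that maximizes w^T C w."""
    dim = len(X[0])
    if weights is None:
        weights = [1.0 / len(expert_metrics)] * len(expert_metrics)
    C = [[0.0] * dim for _ in range(dim)]
    for alpha, M in zip(weights, expert_metrics):
        hom, het = neighborhoods(X, labels, M, k)
        S_het, S_hom = scatter(X, het, dim), scatter(X, hom, dim)
        for r in range(dim):
            for c in range(dim):
                C[r][c] += alpha * (S_het[r][c] - S_hom[r][c])
    return top_eigvec(C)


def learned_distance(x, y, w):
    """Squared distance under the combined rank-one metric M = w w^T."""
    return sum(wi * (a - b) for wi, a, b in zip(w, x, y)) ** 2


# Toy data: feature 0 separates the two classes, feature 1 varies within each class.
X = [[0.0, 0.0], [0.0, 3.0], [0.2, -3.0], [5.0, 0.0], [5.0, 3.0], [5.2, -3.0]]
labels = [0, 0, 0, 1, 1, 1]
experts = [[[1.0, 0.0], [0.0, 1.0]], [[0.1, 0.0], [0.0, 1.0]]]  # two made-up expert metrics
w = combine_experts(X, labels, experts, k=2)
print(w)
```

On this toy data the combined direction w points almost entirely along feature 0, the one that separates the classes. In the sketch every expert sees the same X; a closer analogue to the data-sharing setting in the abstract would have each party compute its scatter matrices locally and send only those matrices for aggregation.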