Mining large medical claims database to identify high-risk patients: The case of antidepressant utilization



Data mining techniques have been applied to discover knowledge from large observational data sets. In this paper, we focus on mining large medical claims databases to identify high-risk patients. Patient selection, feature extraction, and feature selection are three important processing steps that must precede the successful application of popular data mining techniques. Both patient selection and feature extraction require domain knowledge. The episode treatment group methodology, a useful tool for organizing medical claims data, is used for patient selection and feature extraction in this paper. The specific goal of the study is to identify patients with major depression who have a high risk of receiving inadequate antidepressant medication. A nationwide medical claims database covering a 5-year period is used for this study. The records of 31,721 high-risk patients and 50,022 comparison patients were examined for 18 features that include patient demographics, episode factors, and comorbidity factors. After supervised feature selection, three features were selected and analyzed using the classification and regression tree method. The results showed that two of the features (the number of non-antidepressant medications used and the average number of claims during an episode of major depression) can be used to identify a group of high-risk patients. These patients are 2.67 times more likely to have inadequate antidepressant medication than the comparison patients.

© 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011, 1: 154-163. DOI: 10.1002/widm.5
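The abstract names the classification and regression tree method but gives no algorithmic detail. As a rough illustration only, the sketch below shows the core step of such a tree: choosing the threshold on one numeric feature that minimizes the weighted Gini impurity, then comparing outcome rates on the two sides of the split. The function names, the single-feature setup, and the eight toy records are all hypothetical; they are not the study's data, features, or implementation.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Find the threshold t minimizing the weighted Gini impurity of the
    partition {value <= t, value > t}. Returns (threshold, impurity);
    threshold is None if no split improves on the unsplit data."""
    pairs = list(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between adjacent distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        thr = (lo + hi) / 2
        left = [y for x, y in pairs if x <= thr]
        right = [y for x, y in pairs if x > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (thr, score)
    return best


def relative_risk(flagged_labels, other_labels):
    """Outcome rate in the flagged group divided by the rate in the rest.
    Labels are 0/1; the rate in the other group must be non-zero."""
    rate_flagged = sum(flagged_labels) / len(flagged_labels)
    rate_other = sum(other_labels) / len(other_labels)
    return rate_flagged / rate_other


# Hypothetical toy records (NOT from the study): a medication count per
# patient and a 0/1 flag where 1 stands for inadequate medication.
med_counts = [0, 1, 1, 2, 3, 4, 5, 6]
inadequate = [0, 1, 0, 0, 1, 1, 0, 1]

threshold, impurity = best_split(med_counts, inadequate)
flagged = [y for x, y in zip(med_counts, inadequate) if x > threshold]
others = [y for x, y in zip(med_counts, inadequate) if x <= threshold]
risk_ratio = relative_risk(flagged, others)
```

A full tree would apply `best_split` recursively across several features and add stopping or pruning rules. The ratio computed at the end is the same kind of quantity as the 2.67 reported in the abstract, although the abstract does not state exactly how that figure was derived.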