Building a disease risk model of osteoporosis based on traditional Chinese medicine symptoms and western medicine risk factors



In a cross-sectional Traditional Chinese Medicine (TCM) survey conducted by our team, we sought to determine the risk factors for osteoporosis. Analyzing this study raised three statistical problems: (1) a very large number of potential risk factors, (2) interactions among potential risk factors, and (3) nonlinear effects of some continuous-scale risk factors. To address these issues, we used two data mining methods, support vector machine recursive feature elimination (SVM-RFE) and random forest, to deal with the curse of dimensionality posed by the large number of risk factors; we then applied a third data mining technique, association rule learning, to discover potential associations among risk factors. Finally, we fitted a generalized partial linear model (GPLM) to characterize the nonlinear effect of an important continuous-scale risk factor. The final GPLM shows that TCM symptoms play an important role in assessing osteoporosis risk. It also reveals a nonlinear effect of one important risk factor, menopause years, which a generalized linear model might miss.

Copyright © 2012 John Wiley & Sons, Ltd.
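Association rule learning scores co-occurrence patterns among factors using support, confidence, and lift. The sketch below is a minimal brute-force illustration, assuming each subject is encoded as the set of binary factors present (symptoms, risk factors, and an outcome flag). The abstract does not name the mining algorithm or thresholds, so the function names, thresholds, and toy records (`sym_A`, `sym_B`, `osteo`) are hypothetical and are not study data. A real analysis would more likely use Apriori or FP-growth, which prune the search instead of enumerating every itemset.

```python
from itertools import combinations
from typing import NamedTuple


class Rule(NamedTuple):
    lhs: tuple          # antecedent items
    rhs: tuple          # consequent items
    support: float      # P(lhs and rhs)
    confidence: float   # P(rhs | lhs)
    lift: float         # confidence / P(rhs); > 1 means positive association


def itemset_support(transactions, itemset):
    """Fraction of records that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def association_rules(transactions, min_support=0.3, min_confidence=0.7, max_len=3):
    """Brute-force rule mining (not Apriori): enumerate itemsets up to
    `max_len`, keep the frequent ones, and split each into lhs -> rhs rules."""
    transactions = [frozenset(t) for t in transactions]
    all_items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, max_len + 1):
        for itemset in combinations(all_items, size):
            s = itemset_support(transactions, itemset)
            if s == 0 or s < min_support:  # also guards against division by zero
                continue
            for k in range(1, size):
                for lhs in combinations(itemset, k):
                    rhs = tuple(i for i in itemset if i not in lhs)
                    conf = s / itemset_support(transactions, lhs)
                    if conf >= min_confidence:
                        lift = conf / itemset_support(transactions, rhs)
                        rules.append(Rule(lhs, rhs, s, conf, lift))
    return rules


# Hypothetical toy records (NOT study data): each set lists the factors present.
TOY_RECORDS = [
    {"sym_A", "sym_B", "osteo"},
    {"sym_A", "osteo"},
    {"sym_A", "sym_B", "osteo"},
    {"sym_B"},
    {"sym_A", "sym_B"},
]

if __name__ == "__main__":
    for r in association_rules(TOY_RECORDS, min_support=0.5, min_confidence=0.7):
        print(r.lhs, "->", r.rhs,
              f"supp={r.support:.2f} conf={r.confidence:.2f} lift={r.lift:.2f}")
```

On these toy records, a rule such as `sym_A -> osteo` with lift above 1 flags a co-occurrence stronger than independence would predict; on its own it says nothing about an adjusted or causal effect, which is why a regression model is still needed afterwards.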
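For reference, the textbook GPLM form replaces one linear term of a generalized linear model with an unknown smooth function. The equation below is a sketch assuming a binary osteoporosis indicator $Y$ with a logit link, a covariate vector $\mathbf{X}$ (e.g. TCM symptoms and other risk factors) entering linearly, and a single continuous factor $T$ (here, menopause years) entering through a smooth function $m(\cdot)$; the abstract does not state the link function or the smoother actually used.

```latex
\operatorname{logit} \Pr(Y = 1 \mid \mathbf{X}, T)
  = \log \frac{\Pr(Y = 1 \mid \mathbf{X}, T)}{1 - \Pr(Y = 1 \mid \mathbf{X}, T)}
  = \mathbf{X}^{\top} \boldsymbol{\beta} + m(T)
\qquad \text{(GPLM)}

\operatorname{logit} \Pr(Y = 1 \mid \mathbf{X}, T)
  = \mathbf{X}^{\top} \boldsymbol{\beta} + \gamma T
\qquad \text{(GLM special case, } m(T) = \gamma T \text{)}
```

Because the GLM forces the effect of $T$ to be a straight line on the logit scale, any curvature in $m(T)$ is absorbed into a single slope $\gamma$, which is how a nonlinear effect can be missed.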