Traditional Chinese medicine (TCM) is a clinically based discipline in which real-world clinical practice plays a significant role in both the development of clinical therapy and theoretical research. The large-scale clinical data generated during daily TCM clinical operations provide a highly valuable knowledge source for clinical decision making. Secondary analysis of these data is a vital task for TCM clinical studies before randomised controlled trials are conducted. In this article, we discuss the challenges in processing and analysing real-world TCM clinical data, including structured data curation, data preprocessing and quality, large-scale data management and complex data analysis requirements. We also discuss related state-of-the-art research and solutions in China. We show that a clinical data warehouse built on structured electronic medical record data and clinical terminology is a promising approach for generating clinical hypotheses and supporting the discovery of clinical knowledge from large-scale real-world TCM clinical data. Copyright © 2011 John Wiley & Sons, Ltd.