Development and comparison of circulation type classifications using the COST 733 dataset and software
Article first published online: 6 MAR 2014
© 2014 Royal Meteorological Society
International Journal of Climatology
How to Cite
Philipp, A., Beck, C., Huth, R. and Jacobeit, J. (2014), Development and comparison of circulation type classifications using the COST 733 dataset and software. Int. J. Climatol., doi: 10.1002/joc.3920
- Manuscript Accepted: 13 DEC 2013
- Manuscript Revised: 4 DEC 2013
- Manuscript Received: 26 DEC 2012
- circulation type classification
- weather types
- Rand index
- pattern correlation
- manual classification
- threshold-based classification
- principal component analysis
- cluster analysis
To examine the correspondence between different circulation type classification methods, a dataset of classification catalogs for 12 European regions has been created using a specially developed software package. Twenty-seven basic automatic classification methods have been applied, in several variants, to different input datasets describing the atmospheric circulation. Together with six manual classifications, a total of 33 methods is available for intercomparison.
Pattern correlation, frequency time-series correlation and the adjusted Rand index have been used for the comparison. Highly significant correspondence has been detected for only two clustering techniques, while the remaining classification methods show surprisingly low similarity. A Monte Carlo test with 1000 classifications based on randomly defined types even shows that most methods are no more similar to one another than arbitrarily chosen types are.
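The comparison measures and the Monte Carlo test can be sketched in plain Python (standard library only). This is not the cost733class implementation: the function names, the synthetic data and, in particular, the way "random types" are generated here (randomly drawn days serve as type centroids and every day is assigned to the nearest one) are assumptions for illustration. The adjusted Rand index follows the usual Hubert and Arabie form, and the pattern correlation is taken as the Pearson correlation between two flattened composite fields.

```python
import math
import random
from collections import Counter


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index (Hubert and Arabie form) between two type catalogs."""
    if len(labels_a) != len(labels_b) or len(labels_a) < 2:
        raise ValueError("catalogs must cover the same (at least two) days")
    n = len(labels_a)
    # Contingency table cells n_ij and marginal type frequencies a_i, b_j
    index = sum(math.comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(math.comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(math.comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / math.comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both catalogs have a single type
        return 1.0
    return (index - expected) / (max_index - expected)


def pattern_correlation(field_a, field_b):
    """Pearson correlation between two flattened composite fields (e.g. type-mean SLP)."""
    n = len(field_a)
    mean_a, mean_b = sum(field_a) / n, sum(field_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(field_a, field_b))
    var_a = sum((a - mean_a) ** 2 for a in field_a)
    var_b = sum((b - mean_b) ** 2 for b in field_b)
    return cov / math.sqrt(var_a * var_b)


def random_centroid_catalog(data, k, rng):
    """Hypothetical 'random types': k randomly drawn days act as centroids and
    every day is assigned to the nearest one (squared Euclidean distance)."""
    centroids = rng.sample(data, k)
    return [
        min(range(k), key=lambda t: sum((x - c) ** 2 for x, c in zip(day, centroids[t])))
        for day in data
    ]


def monte_carlo_ari(data, k, n_trials=1000, seed=0):
    """Sorted null distribution of ARI between pairs of independent random catalogs."""
    rng = random.Random(seed)
    scores = [
        adjusted_rand_index(random_centroid_catalog(data, k, rng),
                            random_centroid_catalog(data, k, rng))
        for _ in range(n_trials)
    ]
    return sorted(scores)


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic, structureless stand-in for circulation data: 300 days x 2 PC scores
    days = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(300)]
    null = monte_carlo_ari(days, k=9, n_trials=100)  # fewer trials than 1000, for speed
    print(f"median ARI between random catalogs: {null[len(null) // 2]:.2f}")
    print(f"95th percentile of the null distribution: {null[int(0.95 * len(null))]:.2f}")
```

Under these assumptions, two methods would count as more similar than arbitrary types only if their ARI exceeded an upper percentile (for example the 95th) of the null distribution. Because random nearest-centroid partitions of the same data cloud already share spatially coherent cells, that reference level lies above zero, which is why a positive ARI alone does not demonstrate meaningful agreement.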
The predominant dissimilarity between the methods is interpreted as a consequence of the input data lacking inherent structure. Only simulated annealing clustering and self-organizing maps yield nearly identical results, because both can optimally fit the partitioning to the outer shape of the data cloud in phase space. Methods based on predefined types also produce very different results, because small changes in the definition of thresholds may lead to large differences in the partitioning.
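The threshold sensitivity of predefined types can be illustrated with a toy threshold-based classifier. Everything in this sketch is hypothetical and not one of the methods examined in the study: daily flow is reduced to two synthetic components (u, v) drawn from a structureless Gaussian cloud, days with flow strength below a threshold form a "weak flow" type 0, and all other days fall into one of eight 45° direction sectors (types 1 to 8). Raising the strength threshold from 1.0 to 1.1 and rotating the sector boundaries by 5° reassigns roughly one day in eight in this setup, because both thresholds cut through densely populated parts of the data cloud. It shows the mechanism only, not the magnitudes reported for the real classifications.

```python
import math
import random


def threshold_catalog(days, strength_threshold=1.0, sector_offset_deg=0.0, n_sectors=8):
    """Hypothetical threshold-based classification of (u, v) flow components:
    weak flow -> type 0, otherwise direction sector types 1..n_sectors."""
    width = 360.0 / n_sectors
    catalog = []
    for u, v in days:
        if math.hypot(u, v) < strength_threshold:
            catalog.append(0)
            continue
        # Angle of the flow vector, measured from the (rotated) first sector boundary
        direction = (math.degrees(math.atan2(v, u)) - sector_offset_deg) % 360.0
        # The extra modulo guards against direction rounding up to exactly 360.0
        catalog.append(1 + int(direction // width) % n_sectors)
    return catalog


def fraction_changed(cat_a, cat_b):
    """Share of days assigned to a different type by the two catalogs."""
    return sum(a != b for a, b in zip(cat_a, cat_b)) / len(cat_a)


if __name__ == "__main__":
    rng = random.Random(1)
    # Structureless synthetic flow components for 10,000 days (assumption)
    days = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(10000)]
    base = threshold_catalog(days)
    shifted = threshold_catalog(days, strength_threshold=1.1, sector_offset_deg=5.0)
    print(f"days changing type: {fraction_changed(base, shifted):.1%}")
```

Since the type labels are tied to fixed thresholds rather than fitted to the data, comparing catalogs label by label is meaningful here; for catalogs with arbitrary type numbering, a permutation-invariant measure such as the adjusted Rand index would be needed instead.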
It is concluded that, because the data lack an inner structure, there is no clear statistical reason to prefer any of the examined methods. In synoptic climatology practice, this means that finding a suitable classification for a given purpose may require a broad comparison of methods. To facilitate this task, the software package cost733class, which was developed and used in this study for the development, comparison and evaluation of classifications, is available at http://cost733.geo.uni-augsburg.de.