Special Issue Article
Destruction of normal distribution in small samples by centering and scaling
Article first published online: 4 MAR 2011
Copyright © 2011 John Wiley & Sons, Ltd.
Journal of Chemometrics
Special Issue: 5th International Symposium on Computer Applications and Chemometrics in Analytical Chemistry, Budapest, Hungary, June 21–25, 2010. Guest Editor: Károly Héberger
Volume 25, Issue 5, pages 247–253, May 2011
How to Cite
Tóth, G. (2011), Destruction of normal distribution in small samples by centering and scaling. J. Chemometrics, 25: 247–253. doi: 10.1002/cem.1382
- Issue published online: 24 MAY 2011
- Manuscript Accepted: 6 JAN 2011
- Manuscript Revised: 5 JAN 2011
- Manuscript Received: 11 AUG 2010
Keywords: normal distribution
The scientific literature rarely emphasizes that centering and scaling may drastically change the original distribution of the data in small samples. The extent of the distortion depends on how the mean (used for centering) and the divisor (used for scaling, and related to the spread of the data) are estimated. In this comparative study we focus on samples taken from normally distributed data, where the means and the standard deviations are either population based or sample based. We discuss six cases of transforming the data or the sample means. Most of them have been studied previously, but some have not been investigated theoretically. The transformed data follow a normal distribution in three cases, when the scaling is performed with the population standard deviation. In one case the final distribution is related to the β-distribution, with striking density functions: a Dirac delta for N = 2, a Viking-helmet-like shape for N = 3 and a uniform distribution for N = 4. Another case is the well-known t-distribution. For one of the transformations we were unable to identify the general form of the distribution and obtained only numerical results for N ≥ 3. The effect of the transformations was tested on experimental data representing approximately normally distributed variables. For small samples, data transformed with the sample standard deviation were significantly less normal-like than the original data, whereas the other transformations enhanced the normal-like character. The results show that centering, and especially scaling, require careful consideration for small samples, to the point of questioning the validity of subsequent data evaluation in which a normal distribution is assumed.
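The small-sample shapes mentioned in the abstract can be explored with a short simulation. The sketch below is not taken from the article. It assumes that the β-related case is the one in which each value is centered with its own sample mean and scaled with its own sample standard deviation (N − 1 divisor). The function name `standardize_samples`, the number of simulated samples and the seed are illustrative choices.

```python
import numpy as np


def standardize_samples(n, n_samples=100_000, seed=0):
    """Draw n_samples standard-normal samples of size n and transform each
    value with the mean and standard deviation of its own sample.

    Assumption: this corresponds to sample-based centering and scaling,
    using the N - 1 divisor (ddof=1) for the standard deviation.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=0.0, scale=1.0, size=(n_samples, n))
    mean = x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, ddof=1, keepdims=True)
    return (x - mean) / sd


for n in (2, 3, 4, 30):
    # Look at the first transformed value of every simulated sample.
    z = standardize_samples(n)[:, 0]
    # Transformed values cannot exceed (N - 1) / sqrt(N) in absolute value.
    bound = (n - 1) / np.sqrt(n)
    density, _ = np.histogram(z, bins=6, range=(-bound, bound), density=True)
    print(f"N={n:2d}  max|z|={np.abs(z).max():.4f}  bound={bound:.4f}")
    print("      histogram density:", np.round(density, 2))
```

Under these assumptions, the N = 2 values collapse onto two points, the N = 3 histogram piles up toward the edges of its range, and the N = 4 histogram is approximately flat. For larger N the histogram approaches a bell shape.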