Volume 17, Issue 1
Research Article

Centering and scaling in component analysis

Rasmus Bro

Corresponding Author

E-mail address: Rasmus.bro@optimax.dk

Chemometrics Group, Food Technology, Department of Dairy and Food Science, Royal Veterinary and Agricultural University, DK‐1958 Frederiksberg C, Denmark

Age K. Smilde

Process Analysis and Chemometrics, Department of Chemical Engineering, University of Amsterdam, NL‐1018 WV Amsterdam, The Netherlands

First published: 31 January 2003
Citations: 225

Abstract

In this paper the purpose and use of centering and scaling are discussed in depth. The main focus is on two‐way bilinear data analysis, but the results can easily be generalized to multiway data analysis. In fact, one of the aims of this paper is to show that if two‐way centering and scaling are understood, then multiway centering and scaling are quite straightforward. In the literature it is often stated that preprocessing of multiway arrays is difficult, but here it is shown that most of the difficulties do not pertain to three‐ and higher‐way modeling in particular. It is shown that centering is most conveniently seen as a projection step, where the data are projected onto certain well‐defined spaces within a given mode. This view of centering helps to explain why, for example, centering data with missing elements is likely to be suboptimal if there are many missing elements. Building a model for data consists of two parts: postulating a structural model and using a method to estimate the parameters. Centering has to do with the first part: when centering, a model including offsets is postulated. Scaling has to do with the second part: when scaling, another way of fitting the model is employed. It is shown that centering is simply a convenient technique to estimate model parameters for models with certain offsets, but this does not work for all types of offsets. It is also shown that scaling is a way to fit models with a weighted least squares loss function and that sometimes this change in objective function cannot be performed by a simple scaling step. Further practical aspects of and alternatives to centering and scaling are discussed, and examples are used throughout to show that the conclusions in the paper are not only of theoretical interest but can have an impact on practical data analysis. Copyright © 2003 John Wiley & Sons, Ltd.
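
The two central claims of the abstract can be illustrated with a small sketch. It is not taken from the paper: the data matrix `X`, the stand-in model values `M`, the weights `w`, and all function names are hypothetical, chosen only for illustration. The first part checks that subtracting column means gives the same result as multiplying by the projector P = I - (1/n) 1 1^T. The second part checks that an ordinary least squares loss on column-scaled data equals a weighted least squares loss on the unscaled data, assuming the model values are scaled in the same way.

```python
def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def max_abs_diff(A, B):
    """Largest absolute elementwise difference between two matrices."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def center_columns(X):
    """Ordinary column centering: subtract each column's mean."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]

def centering_projector(n):
    """P = I - (1/n) 1 1^T, projecting onto the complement of the ones vector."""
    return [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)] for i in range(n)]

def scale_columns(X, w):
    """Multiply column j by the weight w[j]."""
    return [[x * wj for x, wj in zip(row, w)] for row in X]

def ls_loss(X, M):
    """Unweighted sum of squared residuals between data X and model values M."""
    return sum((x - m) ** 2 for rx, rm in zip(X, M) for x, m in zip(rx, rm))

def weighted_ls_loss(X, M, w):
    """Sum of squared residuals with column j weighted by w[j] squared."""
    return sum((wj ** 2) * (x - m) ** 2
               for rx, rm in zip(X, M) for x, m, wj in zip(rx, rm, w))

# Hypothetical 3 x 2 data matrix (rows = samples, columns = variables).
X = [[1.0, 10.0], [2.0, 14.0], [6.0, 12.0]]

# Centering as a projection: both routes give the same centered matrix.
P = centering_projector(len(X))
Xc_subtract = center_columns(X)
Xc_project = matmul(P, X)

# Scaling as a change of loss function: M is an arbitrary stand-in for model values.
M = [[1.5, 9.0], [2.5, 13.0], [5.0, 12.5]]
w = [2.0, 0.5]
loss_on_scaled = ls_loss(scale_columns(X, w), scale_columns(M, w))
loss_weighted = weighted_ls_loss(X, M, w)
```

In this sketch `Xc_subtract` and `Xc_project` agree to rounding error, and `P` is idempotent, which is what makes it a projection. Likewise `loss_on_scaled` and `loss_weighted` coincide, so scaling changes how the fit is measured rather than which structural model is postulated. The sketch covers only column-wise centering and scaling of complete two-way data; it does not address the cases the abstract flags as problematic, such as missing elements or offsets that centering cannot remove.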
