Efficiently handling high‐dimensional data from multifactorial designs with unequal group sizes using Rebalanced ASCA (RASCA)

A novel chemometric approach is proposed to analyze high‐dimensional data from unbalanced designs of experiments. It combines a rebalancing strategy based on averages with the ASCA method under the name Rebalanced ASCA (RASCA). The ability of RASCA to handle unbalanced designs was compared with standard ASCA, as well as state‐of‐the‐art methods, such as ASCA+ and WE‐ASCA. For that purpose, a controlled framework was designed to provide a systematic comparison of the various approaches. It included two real datasets obtained from initially balanced designs, which were gradually unbalanced by removing observations belonging to specific combinations of factor levels. The results illustrate that all methods considered led to identical solutions when the initial balanced design was kept. Nevertheless, increasing differences appeared when the design was gradually unbalanced. The proposed benchmark showed that RASCA and ASCA+ provided overall similar results for all effects, with higher agreement with the balanced solutions than classical ASCA and WE‐ASCA. RASCA was found to be a suitable chemometric tool to tackle unbalanced designs by ensuring unbiased parameter estimation, with the added benefit of producing rigorously orthogonal effect matrices, thus facilitating interpretation.

Measurements are often made using advanced data acquisition platforms that may be affected by different sources of variation, and it has become quite common to rely on designed experiments to rationally evaluate the influence of specific factors that are suspected to have an impact on the system under study.
The total variability of each response variable can be independently decomposed into several sources according to the factors included in an experimental design using analysis of variance (ANOVA). 3 However, the univariate nature of this approach prevents it from efficiently addressing potential relationships between variables. Its extension to multivariate ANOVA (MANOVA) 4 is limited to datasets with fewer variables than observations. A well-established chemometric approach to handle multivariate situations involving underdetermined systems is principal components analysis (PCA). PCA can treat wide datasets, while modeling how variables covary. 5 However, unlike ANOVA-related approaches, PCA is unsupervised and does not take the experimental design into account. The effects of the factors are thus mixed in each of the obtained principal components, which makes interpretation more challenging. To bridge the gap between ANOVA and PCA when applied separately to multifactorial data structures, a series of methods combining the advantages of ANOVA, for the decomposition of the data into several sources of variation, and PCA, to handle high-dimensional datasets, were proposed in the literature.
While these methods mostly share a first univariate decomposition step based on ANOVA to take advantage of the experimental design, they rely on different approaches for the downstream multivariate analysis of the effect submatrices. For example, ANOVA-PCA (APCA) 6 and ANOVA-simultaneous components analysis (ANOVA-SCA or ASCA) 2,7 apply, respectively, PCA or SCA to each of the effect matrices. The main difference between these two methods lies in the way the matrix of the residuals is handled. While ASCA applies SCA to each pure effect matrix for dimensionality reduction, APCA adds the residuals to each pure effect matrix before applying PCA. The addition of the residuals to the pure effects results in the so-called augmented effect matrices. Applying PCA to these matrices has the advantage of representing the natural variation between the replicates of each level of an effect or combination of levels. However, since the extracted components maximize the variation in the space of the augmented effect matrices, these directions do not necessarily maximize the differences between the (combination of) levels of the effects of interest. On the other hand, applying SCA directly to the pure effect matrices maximizes the differences between the mean effects but does not account for the natural variation in the data. An extension of ASCA representing this variation was proposed 8 and will be discussed later. Other methods rely on the combination of ANOVA with partial least squares (PLS) such as analysis of variance-PLS (AoV-PLS), 9 ANOVA-PLS, 10 or ANOVA-target projection (ANOVA-TP). 11 The latter processes a posteriori the PLS solution by target projection to condensate the response-relevant variation in a single latent variable. 12 All these methods rely on the construction of a specific model from each of the pure or augmented effect matrices, which can be considered a disadvantage in terms of interpretation when the number of factors is high. 
In this context, other methods have been developed to offer the possibility to build a unique model providing a summary of the decomposition of all the sources of variation. This is the case of parallel factor analysis-SCA (PARAFASCA), 13 analysis of variance multiblock orthogonal partial least squares (AMOPLS), 14 and ANOVA-Common Dimensions (AComDim), 15 whose comparison showed their suitability to analyze complex designs. 16 The detailed description of all these methods being out of the scope of this article, the reader may refer to the above-cited literature for more information.
Although the initial development of these methods and most of their applications in the literature are related to the study of fixed and crossed factor designs, which is also the focus of this work, developments to analyze designs with possibly both fixed and random effects have been made. This is the case of the methods linear mixed model PCA (LiMM-PCA) 17 and repeated measures ASCA+ (RM-ASCA+). 18 Another matter that has been receiving further attention is the question of unbalanced designs of experiments. A design is considered balanced when the number of observations for each possible combination of levels between the factors is strictly equal. Unfortunately, situations where the design is unbalanced are quite common when collecting experimental data, and different strategies have been proposed to tackle this issue. Classical ANOVA decomposition based on the differences of means between factors (combination of) levels leads to biased estimators of the effects when the design is unbalanced. 19 As an alternative, Thiel et al 19 proposed a formulation of the ANOVA statistical modeling based on the general linear models (GLM). In the proposed method named ASCA+, classical ANOVA decomposition in ASCA is replaced by the GLM methodology to decompose the response matrix. In ASCA+, the levels of each effect are coded using sum coding. Other types of coding are available, 20 but Thiel et al 19 reported that a unique estimation of the effects can be obtained using sum coding through a type III sum of squares to jointly estimate the model parameters. Along the same line, Ali et al 21 introduced another extension of ASCA under the name weighted effect-ASCA (WE-ASCA) based on the weighted effect coding 22,23 of the design matrix, as an alternative to sum coding to deal with unbalanced designs.
ANOVA decomposition based on the classical approach (ASCA) and the GLM methodology using sum (ASCA+) or weighted effect (WE-ASCA) coding gives identical results when the design is balanced. Even though the above coding approaches coupled to the GLM methodology can be used to obtain unbiased estimates of the effects of unbalanced designs, the resulting effect submatrices are not all mutually orthogonal. The consequence is that different sources of variability in the data are confounded up to the level of unbalancedness. Although the development of ASCA+ and WE-ASCA is based on well-established ANOVA modeling statistical theory, the ability of these methods to effectively cope with unbalanced multifactorial designs still lacks empirical validation. An objective study evaluating their performance in this context is therefore needed.
In this work, a new strategy is proposed to tackle unbalanced designs. This approach is based on the calculation of averages for each unique combination of factor levels to form a balanced dataset before ANOVA decomposition. This rebalancing strategy was combined with ASCA in this work under the name Rebalanced ASCA (RASCA). A detailed discussion on the implementation of RASCA is proposed. Furthermore, a systematic comparison of the ability of ASCA, ASCA+, WE-ASCA, and RASCA to deal with unbalanced multifactorial designs is presented. This benchmark was performed in a controlled framework where real datasets, with initially balanced designs, were gradually unbalanced by removing observations belonging to specific combinations of levels in the factors.

| RASCA IMPLEMENTATION
Although parameter estimations in ASCA+ and WE-ASCA are unbiased, even in the case of unbalanced designs, the effect matrices obtained may not be mutually orthogonal. As a consequence, the application of SCA to these matrices may be affected by the over- or under-representation of some combinations of levels between the factors. This may lead to components that do not necessarily describe solely specific main effects or interactions but a mixture of them, making interpretation more challenging. RASCA tackles that issue by rebalancing the design of experiments before applying ASCA. It must be emphasized that all methods (ASCA, ASCA+, WE-ASCA, and RASCA) rely on the assumption that all combinations of levels between the factors are present in the data.
In this section, implementation details of RASCA are described. Figure 1 illustrates the RASCA algorithm step by step and serves as a visual support to describe the whole procedure hereafter. Let X^U be the original data matrix and Y^U the associated unbalanced design of experiments with two fixed-effect factors and one interaction, where the first factor has two levels, the second three levels, and the interaction six combinations of levels. Note that hereafter the mathematical notations with the superscripts U and B define unbalanced and balanced designs, respectively.

| Rebalancing strategy
The first step of RASCA consists in rebalancing Y^U. One way to achieve this would be to sample observations from X^U to form a balanced design. However, since it would be arbitrary to define which observations should be kept or not from X^U, balancing could be performed not only once but several times, where N would define the number of undersamplings. Since a design is balanced if there is an equal number of observations, say n, for all possible combinations of factor levels, n observations per unique combination of levels need to be sampled to form a balanced design. The number of undersampling iterations N to perform can be obtained by calculating the number of all possible combinations of n observations to be drawn for each combination of levels, leading to an exhaustive list of all possible balanced data matrices with equal dimensions, from which an average matrix X^B can be calculated. Nevertheless, building exhaustively all these matrices is not computationally efficient. A more efficient way of calculating X^B, and the one advocated in this work, is to directly compute averages of each factor level combination from the original data X^U. To do so, an average measurement is calculated for each unique combination of levels based on all samples associated with that combination. Then, the n replicates of the same combination of levels in X^B are assigned the same average measurement to form the rebalanced mean matrix X^B. The resulting matrix X^B is then used for ANOVA decomposition in step 2a. Since the design Y^B is balanced, performing this decomposition using the classical ANOVA approach or the GLM methodology produces identical results and simplicity can thus be favored to speed up computations. In step 2b, the previously calculated mean effect vectors of the levels, or combination of levels, are stored for later use.
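The direct-averaging shortcut described above can be sketched as follows. The original tools are distributed as Matlab code; this illustrative sketch uses Python with numpy instead, and the function and argument names (`rebalance`, `combos`, `n_reps`) are ours, not the authors':

```python
import numpy as np

def rebalance(X, combos, n_reps):
    """Rebalancing step of RASCA (a sketch): replace the observations of
    each unique combination of factor levels by n_reps identical copies
    of their average, yielding the rebalanced mean matrix X^B.
    `combos` holds one combination label per row of X."""
    X = np.asarray(X, dtype=float)
    combos = np.asarray(combos)
    rows, labels = [], []
    for c in np.unique(combos):
        mean_vec = X[combos == c].mean(axis=0)  # barycentre of the combination
        rows.extend([mean_vec] * n_reps)        # n identical replicates
        labels.extend([c] * n_reps)
    return np.vstack(rows), np.array(labels)
```

Note that `np.unique` sorts the combination labels, so the rows of the returned matrix are grouped per combination of levels; each group contains `n_reps` identical average rows, which is what makes the subsequent ANOVA decomposition balanced.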
The mean effect vectors are defined as x̄_{f,l}, where f corresponds to an effect in {α, β, αβ} and l takes values between 1 and the total number of levels in the associated effect f: 2 for α, 3 for β, and 6 for αβ. Note that hereafter, whenever the subscript f is mentioned, it relates to any of the effects α, β, and αβ. This approach offers a reliable estimation of each level barycenter by taking advantage of average values. The mean effect vectors obtained through the exhaustive construction of all balanced data matrices or through direct averages are equal, but the latter has the critical advantage of being computationally more efficient.

| ASCA step
In step 3, ASCA is applied to the rebalanced mean matrix X^B as it would be in a standard setting. The interested reader can refer to the original articles and a review for a detailed description of the theory and implementation of ASCA. 2,7,24 However, some details on the main results obtained by applying ASCA to the rebalanced mean matrix will be discussed here. The ANOVA decomposition of the rebalanced mean matrix X^B generates mutually orthogonal pure effect matrices X^B_f. SCA is then applied to each pure effect matrix to extract the respective scores T^B_f and loadings P^B_f. Scores can be used to assess potential differences between the levels of each factor or combination of levels. Loadings can then be investigated for interpretation purposes. Since it relies on pure effect matrices composed of identical values for all replicates associated with a given combination of levels, the initial variability is lost in ASCA. Because this variability can be used to evaluate the magnitude of the observed differences between these (combinations of) levels, an extension of ASCA was proposed to reflect it. 8 The variation among the replicates can be calculated a posteriori by projecting the augmented effect matrices, X̃^B_f = X^B_f + X^B_ε, onto the respective loadings P^B_f obtained previously by applying SCA to the pure effect matrices X^B_f, leading to new scores T^B_{f,aug} defined as the augmented scores of the rebalanced mean matrix X^B. However, the rebalanced mean matrix analyzed by ASCA has already lost the natural variability of the initial data matrix X^U, and the scores subsequently calculated do not reflect it.
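For a balanced design such as X^B, the classical decomposition and the SCA step can be sketched as below. Python stands in for the original Matlab implementation, the names are illustrative, and only a main effect is shown; the interaction matrix is obtained analogously from the cell means after subtracting both main effects:

```python
import numpy as np

def effect_matrix(Xc, labels):
    """Pure effect matrix of one factor from column-centred data Xc:
    every row receives the mean of its level (balanced design)."""
    E = np.zeros_like(Xc)
    for l in np.unique(labels):
        mask = labels == l
        E[mask] = Xc[mask].mean(axis=0)
    return E

def sca(E, n_comp):
    """SCA of a pure effect matrix via SVD: scores T and loadings P,
    with E approximately equal to T @ P.T."""
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    return U[:, :n_comp] * s[:n_comp], Vt[:n_comp].T
```

For a two-level factor in a balanced, centred dataset, the pure effect matrix has rank one, so a single component reconstructs it exactly.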

| Representing the natural variation in RASCA models
So far, steps 1 to 3 are very similar to applying the traditional ASCA algorithm, except that it is applied to the rebalanced mean matrix X^B. However, representing the natural variation of the unbalanced data X^U in RASCA models is needed for a proper evaluation of the results. It is proposed that this variation can be retrieved by first performing a synthetic ANOVA decomposition in step 4. The idea of this approach is to perform an ANOVA decomposition of the initial matrix X^U associated with the design Y^U using the previously calculated mean effect vectors x̄_{f,l} stored in step 2b.
To do so, each pure effect matrix X^U_f of the initial data matrix X^U is determined by assigning to each replicate the associated mean effect vector x̄_{f,l} of a given factor and level, or combination of levels. Once the pure effect matrices X^U_f are formed, the residuals matrix X^U_ε is calculated by subtracting all pure effect matrices from X^U. The scores of the pure effect matrices X^U_f can be calculated using the loadings P^B_f obtained in step 3 such that T^U_f = X^U_f P^B_f. Nevertheless, these scores still do not represent the natural variability in X^U. Similarly to the procedure proposed by Zwanenburg et al, 8 the augmented effect matrices X̃^U_f are calculated in step 6 such that X̃^U_f = X^U_f + X^U_ε. Then, the augmented scores representing the natural variation in the data are calculated in step 7 such that T^U_{f,aug} = X̃^U_f P^B_f. Although four sets of scores can be extracted from RASCA, namely, T^B_f, T^B_{f,aug}, T^U_f, and T^U_{f,aug}, the scores T^B_f and T^U_f do not represent the natural within variation of the (combination of) levels and solely represent the between variation. Hence, these scores show equal values for each level, or combination of levels. The only difference between T^B_f and T^U_f is the number of replicates per combination of levels. This is also the case for the augmented scores T^B_{f,aug}, because the ANOVA decomposition of X^B based on averages of unique combinations of factor levels leads to a null residuals matrix X^B_ε, thus only reflecting the between variation. The most relevant set of scores to evaluate the natural variability between the replicates of each level of a factor, or combination of levels, is therefore T^U_{f,aug}.
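Steps 4 to 7 can be sketched as follows, reusing the mean effect vectors stored in step 2b; the function names and dictionary-based bookkeeping are our own illustrative choices, again in Python rather than the original Matlab:

```python
import numpy as np

def synthetic_decomposition(XU, grand_mean, effect_labels, mean_effects):
    """Steps 4-5 sketch: rebuild each pure effect matrix X^U_f from the
    stored mean effect vectors, then back-calculate the residuals X^U_eps.
    effect_labels maps an effect name to one level label per row of XU;
    mean_effects maps an effect name to {level: mean effect vector}."""
    XU = np.asarray(XU, dtype=float)
    resid = XU - grand_mean
    effects = {}
    for f, labels in effect_labels.items():
        Ef = np.vstack([mean_effects[f][l] for l in labels])
        effects[f] = Ef
        resid = resid - Ef          # what is left after all pure effects
    return effects, resid

def augmented_scores(Ef, resid, P):
    """Steps 6-7 sketch: augmented scores (X^U_f + X^U_eps) @ P^B_f."""
    return (Ef + resid) @ P
```

In this toy usage, every replicate equals its level mean, so the back-calculated residuals are null and the augmented scores coincide with the pure effect scores.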

| ANOVA terms explained variance in RASCA
The ANOVA decomposition of a data matrix can also be expressed as a term-specific sum of squares (SSQ). These terms represent the total amount of variance explained in the data matrix by each effect included in the ANOVA model. As proposed by Vis et al., 25 the SSQ of a term is the overall sum of the squared elements of the corresponding ANOVA model term matrix, that is, its squared Frobenius norm ‖·‖². These SSQ can be expressed as explained variation in the data matrix for the main effect, interaction, and residual terms, as shown in Equation 1.
In RASCA, the explained variance of each ANOVA model term is calculated based on the effect matrices obtained after the synthetic ANOVA decomposition in step 4 of Figure 1. It has to be noted that X^U refers more generally to the initial matrix analyzed by RASCA, whether it truly is unbalanced or not, and X^U_μ is the grand mean matrix of the initial data. However, the sum of the percentages of explained variance is not equal to 100% with this approach when the design is unbalanced.
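A minimal sketch of this computation, under the same illustrative naming as above; as stated, the terms need not sum to 100% for an unbalanced design, because the synthetic effect matrices are then not mutually orthogonal:

```python
import numpy as np

def explained_variance(effect_mats, resid, X, grand_mean):
    """Percentage of the total (centred) variation carried by each
    ANOVA term, via squared Frobenius norms of the term matrices."""
    total = np.linalg.norm(X - grand_mean) ** 2
    pct = {f: 100.0 * np.linalg.norm(E) ** 2 / total
           for f, E in effect_mats.items()}
    pct['residuals'] = 100.0 * np.linalg.norm(resid) ** 2 / total
    return pct
```

For a balanced single-factor decomposition with null residuals, the effect term accounts for exactly 100% of the centred variation, which serves as a quick sanity check.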

| SIMULATED EXAMPLE
For the sake of comparison, a first simulated example already described in the literature 19 is proposed to illustrate the effect of unbalanced designs on model parameter estimation. In the case of a design with two fixed-effect factors, that is, α and β with a and b levels, respectively, a single variable is decomposed as in Equation 2.
x_ijk is the response measured for one variable of the kth replicate of the ith and jth levels of factors α and β, respectively. For a given variable, μ is its grand mean; α_i is the effect of the ith level of factor α; β_j is the effect of the jth level of factor β; αβ_ij is the interaction term of the ith and jth levels of factors α and β, respectively; and ϵ_ijk is the error term. Let us now define for this model factors α and β with a = 2 and b = 3 levels, respectively, and k = 2 replicates, with the following known ANOVA model parameters: μ = 10, α = (α_1, α_2) = (2, −2), β = (β_1, β_2, β_3) = (1, 0.5, −1.5), αβ = 0, and σ² = 0. For this model, where the interaction and error terms are null, the response vector x^B (Equation 3, top) of the balanced design can be calculated using Equation 2. The model parameters associated with this response vector can then be estimated using the classical ANOVA decomposition based on differences of means or the GLM methodology, coupled to sum or weighted effect coding. Given that the design is balanced, the results are equal and the true model parameters defined above are retrieved. Suppose now that the 11th and 12th measures are missing, leading to the unbalanced response vector x^U (Equation 3, bottom). The parameters subsequently estimated using classical ASCA, WE-ASCA, ASCA+, and RASCA are shown in Table 1.
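The kind of bias reported in Table 1 can be reproduced in outline. The sketch below (illustrative Python, not the authors' Matlab code) generates the noiseless balanced response, removes two observations, and contrasts the biased means-based estimate of α_1 with the GLM sum-coding estimates. Since the exact ordering of the observations in Equation 3 is not reproduced here, the sketch assumes the two removed measures are the second replicates of two cells of level α_2, which keeps all six cells represented:

```python
import numpy as np
from itertools import product

# Known model: mu = 10, alpha = (2, -2), beta = (1, 0.5, -1.5),
# no interaction, no noise (sigma^2 = 0).
mu, alpha, beta = 10.0, np.array([2.0, -2.0]), np.array([1.0, 0.5, -1.5])
x, A, B = [], [], []
for i, j in product(range(2), range(3)):     # (i, j) level indices
    for _ in range(2):                       # k = 2 replicates per cell
        x.append(mu + alpha[i] + beta[j]); A.append(i); B.append(j)
x, A, B = map(np.array, (x, A, B))

# Unbalance: drop the second replicate of cells (2,2) and (2,3) in
# 1-based level notation -- an assumption for illustration only.
keep = np.ones(12, bool); keep[[9, 11]] = False
x, A, B = x[keep], A[keep], B[keep]

# Classical means-based estimate of alpha_1: biased under unbalance.
alpha1_classical = x[A == 0].mean() - x.mean()

# GLM with sum coding: intercept | alpha | beta (2 cols) | interaction.
a = np.where(A == 0, 1.0, -1.0)
b1 = np.where(B == 0, 1.0, np.where(B == 2, -1.0, 0.0))
b2 = np.where(B == 1, 1.0, np.where(B == 2, -1.0, 0.0))
D = np.column_stack([np.ones_like(x), a, b1, b2, a * b1, a * b2])
coef, *_ = np.linalg.lstsq(D, x, rcond=None)  # unbiased estimates
```

With this particular removal, the means-based estimate of α_1 is 1.5 instead of the true value 2, while the sum-coded least squares fit recovers (10, 2, 1, 0.5, 0, 0) exactly, since the noiseless data lie in the model space.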
As already reported by Thiel et al., 19 Table 1 shows that the parameters estimated by classical ANOVA decomposition based on differences of means (ASCA) are biased for the main effects, while an interaction effect appears even though it is absent in the true model. The latter is not the case with the GLM methodology coupled to the WE coding (WE-ASCA) but the main effects are different from the true model. This can be explained by the fact that although the design matrices obtained with sum or WE coding have the same dimensions, the effect of each level in WE coding represents its deviation from the weighted mean, whereas sum coding computes the deviation from the grand mean. It should be noted that ASCA+ and RASCA estimates are unbiased and correspond to the true model parameters, both for main effects and interactions. ASCA+ and RASCA lead therefore to identical parameter estimates but the main difference lies in the number of replicates associated with each combination of levels. If the initial dataset is unbalanced, ASCA+ effect matrices remain unbalanced, while they are balanced with RASCA. This property of RASCA has the critical advantage of extracting mutually orthogonal effect matrices in addition to the unbiased estimation of model parameters. It has to be noted that whereas ASCA+ effect matrices are not mutually orthogonal, the interaction terms obtained using WE-ASCA are orthogonal to the main effects, but the main effects are not mutually orthogonal.
The decomposition of the rebalanced mean matrix in RASCA ( Figure 1, step 2) does not properly reflect the error term of the model because this natural variation was stripped away from the data by the averaging process. For this reason, the error term has to be back-calculated through the synthetic ANOVA decomposition of the initial data matrix as performed in step 4 of Figure 1. Given that the parameters estimated with ASCA+ and RASCA are equal and unbiased, the error term back-calculated with RASCA is identical to the one from ASCA+.

| Comparison framework
ASCA results obtained when analyzing complete datasets with balanced groups were chosen as a reference to investigate the effect of increasingly unbalanced designs and assess how the different methodologies were able to handle unequal group sizes. Since applying ASCA to a balanced design gives the same results independently of the ANOVA decomposition methodology, it appears as a reasonable common basis to compare the various strategies. Figure 2 illustrates the proposed comparison framework. In order to perform these comparisons, a balanced design Y B was gradually unbalanced as defined in step 1. It is not possible to unbalance a design by removing specific observations associated with a level of a given factor without affecting the other factors. As a consequence, the notion of interchangeable units was considered to control the process of unbalancing the designs. 26 An interchangeable unit is defined as a unique combination of levels between the factors and constrained unbalancing was performed following this principle.
Let us consider an example with a design involving two fixed-effect factors with two and three levels, respectively. This design thus has six unique combinations of levels. If each unique combination of levels contains 10 replicates, it is possible to iterate over the total number of unique combinations of levels (six in this case) and the number of replicates to be removed, m, for a specific combination of levels, leaving the others untouched. From a total of 10 replicates, it is possible to remove from zero (leaving the design balanced) to eight replicates. At least two replicates are indeed needed to estimate the two-term interaction. Hence, the extent to which a design is unbalanced increases with the number of replicates removed. It is possible to estimate how many exclusion iterations must be performed for each combination of levels according to the available number of replicates per combination using combinatorial statistics without replacement. It corresponds to the number of possible combinations of m replicates to be randomly excluded among the total number of replicates k per combination of levels, calculated as k!/((k − m)! · m!), where ! denotes the factorial of a scalar.
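For instance, the number of exclusion iterations per combination of levels can be tabulated directly (illustrative Python, using the standard-library binomial coefficient):

```python
from math import comb

k = 10  # replicates per unique combination of levels, as in the example
# Number of distinct exclusion iterations when removing m replicates,
# i.e. k! / ((k - m)! * m!), for m = 0 (balanced) up to k - 2.
iterations = {m: comb(k, m) for m in range(0, 9)}
```

Note the symmetry of the binomial coefficient: removing eight of ten replicates yields as many distinct iterations as removing two.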
Once the initial design Y^B has been unbalanced into Y^U, ASCA U, ASCA+, WE-ASCA, and RASCA were applied to the data matrix X^U in step 2, resulting in multiple sets of scores and loadings. The signs of the scores and loadings being ambiguous in PCA, the various sets obtained were corrected in step 3 with respect to the reference case using a correlation matrix 27 between the principal components (PCs) of the balanced case (ASCA B) and those of the unbalanced solutions (ASCA U, ASCA+, WE-ASCA, and RASCA). Additionally, the observations in the rows of X^U are not necessarily the same as those in the rows of X^B since some samples were removed. In order to retrieve the scores of all the observations in the initial design, scores were back-calculated in step 4 of Figure 2 by projecting the data matrix X^B onto the space defined by the loadings of each method to obtain T^M_{f,X^B} = X^B P^M_f, where the superscript M denotes the method. At each iteration, the resulting scores T^M_{f,X^B} (step 4) and loadings P^M_f (step 3) were compared with the reference case in step 5 using the modified RV-coefficient. 28 This coefficient, noted RV 2 hereafter, takes values between −1 and +1 and is an intuitive metric for matrix correlations since it can be interpreted in the same way as Pearson's correlation. It has the advantage of providing a single number characterizing the relationship between pairs of matrices and was used to assess the agreement between the scores or loadings obtained in the balanced case and those evaluated from the unbalanced ones.
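The two comparison ingredients of steps 3 and 5, sign correction and the modified RV-coefficient, can be sketched as follows (illustrative Python; the modified RV removes the diagonals of the cross-product matrices, following the definition cited above 28):

```python
import numpy as np

def rv2(A, B):
    """Modified RV coefficient between two matrices with matching rows;
    the diagonals of the cross-product matrices are removed so that the
    value ranges from -1 to +1."""
    AA = A @ A.T; AA = AA - np.diag(np.diag(AA))
    BB = B @ B.T; BB = BB - np.diag(np.diag(BB))
    return np.trace(AA @ BB) / np.sqrt(np.trace(AA @ AA) * np.trace(BB @ BB))

def align_signs(T_ref, T):
    """Resolve the PCA sign ambiguity: flip each column of T so that it
    correlates positively with the matching reference component."""
    signs = np.sign(np.sum(T_ref * T, axis=0))
    signs[signs == 0] = 1.0
    return T * signs
```

Because the cross-product configuration is unchanged by a global sign flip, `rv2(T, -T)` equals 1, which is precisely why the component-wise sign correction of step 3 is needed before visual comparison of score plots.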
FIGURE 2 Comparison framework to assess the effect of unbalancing an experimental design in a controlled fashion

Furthermore, the loadings of the variables were also compared using their rank. When interpreting a model, the focus is usually on variables with the highest, positive or negative, contributions to the observed effects. The idea of the suggested procedure is to sort the loadings in decreasing order of their absolute value for the reference case (ASCA with the complete balanced design) and the unbalanced test cases (ASCA, ASCA+, WE-ASCA, and RASCA) to compare the ranking of the variables. In the ideal situation, it is expected to be the same for both balanced and unbalanced solutions. This comparison was performed exhaustively for subset sizes ranging from 1 to the total number of variables to determine how many and which variables are present or absent in the unbalanced solutions with respect to the balanced one.
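The rank-based comparison reduces to a set difference over the top-ranked variables, as in this sketch (illustrative naming, Python in place of the original Matlab):

```python
import numpy as np

def lost_features(p_ref, p_test, top):
    """How many of the `top` highest-|loading| variables of the balanced
    reference are absent from the unbalanced solution's own top set."""
    ref = set(np.argsort(-np.abs(p_ref))[:top])
    test = set(np.argsort(-np.abs(p_test))[:top])
    return len(ref - test)
```

Iterating `top` from 1 to the total number of variables reproduces the exhaustive subset-size sweep described above, and inspecting `ref - test` and `test - ref` identifies the lost and replacing features, respectively.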

| Data and software availability
ASCA, ASCA+, WE-ASCA, and RASCA were applied to two real case studies within the comparison framework described above. Both datasets originate from omics studies and have a reasonably high number of observations, which is of utmost interest to investigate the effect of unbalancing experimental designs in a controlled fashion. These datasets have already been characterized in the literature and the experimental setup and procedures leading to the acquisition of the measurements are thus not discussed in detail in this article. The reader is referred to the below-cited literature for more information.
The Matlab R2021a tools and software implementations used in this work are made open access via a Zenodo archive (https://doi.org/10.5281/zenodo.5825078) distributed under the MIT license. A template to apply the comparison framework described above is provided as a Matlab script in the supplementary material.

| Case study 1
Case study 1 relates to the analysis of metabolomic profiles collected from Arabidopsis thaliana after leaf wounding. 29 The metabolic response was monitored over time for wild type and mutant plants. Each plant was mechanically wounded and harvested at four time points post-wounding, namely, at 0, 0.5, 2, and 5 h. The plant extracts were analyzed by ultra-high performance liquid chromatography coupled with a time-of-flight mass spectrometer in electrospray negative ionization mode. The dataset was downloaded from the MetaboAnalyst website (https://www.metaboanalyst.ca/resources/data/cress_time.csv, accessed 11.03.2022). It was obtained from a full-factorial design with two fixed effect factors, namely, plant type and time after wounding, with two and four levels, respectively, leading to a total of 72 observations characterized by 124 features (peak intensities). Each unique combination of factor levels involved nine observations. Autoscaling was applied to all features.

| Case study 2
Case study 2 relates to gene expression experiments obtained by mRNA extraction from left liver lobe samples from male Fischer rats after drug-induced hepatotoxicity at different doses and monitored over time. 30 Four doses of acetaminophen (50, 150, 1,500, and 2,000 mg/kg) were administered to the rats involving a time-course with four points (6, 18, 24, and 48 h). The dataset consists of a full-factorial design with two fixed effect factors, namely, dose and time, with four levels each. Each of the 16 unique combinations of factor levels involved four observations, leading to a total of 64 observations characterized by 3,116 genes. To avoid numerical errors, two genes with missing values were removed, resulting in a total of 3,114 genes. This dataset was log transformed and normalized by the authors of the initial study, and it can be retrieved from their supplementary material.

| Case study 1
As described in Section 4.1, the proposed comparison framework involves gradually unbalanced designs covering each possible combination of levels. Thus, scores RV 2 are calculated for each combination. For the sake of simplicity, Figure 3 top row shows average RV 2 values between the scores obtained in the balanced and unbalanced solutions as a function of the number of observations removed across all combinations of levels. Given that this dataset originally has nine observations per unique combination of levels, a maximum of seven observations could be removed per combination. In the rest of this manuscript, the extreme case refers to the case where the maximum number of observations was removed. Additionally, ASCA B and ASCA U refer to the solutions based on a classical ANOVA decomposition for the balanced and unbalanced cases, respectively. The RV 2 when 0 observations were removed corresponds to the balanced case. Figure 3 top row shows that ASCA, ASCA+, WE-ASCA, and RASCA gave identical results based on the complete balanced case. The RV 2 curves represent the mean values over all iterations with the standard deviations shown as error bars. A decrease was observed when the designs were gradually unbalanced. However, even in the extreme case, correlations remained high for all methods. RASCA and ASCA+ showed higher RV 2 values for the plant type and time factors compared with the other methods. As for the interaction, ASCA U showed lower RV 2, while the other methods were associated with values closer to 1.
The RV 2 is an overall correlation measure useful for the global comparison of methods, but it was also observed that values could be significantly impacted depending on the specific unbalanced combination of levels. Figure S1 illustrates RV 2 plots for each combination of levels and each effect. In the extreme case, the lowest RV 2 for the plant type factor was around 0.92 for combinations of levels 1-3 and 2-3. For the time factor, the lowest RV 2 were around 0.76 for combinations 1-2, 1-3, and 2-3. For both main effects, ASCA U and WE-ASCA showed overall lower RV 2 than RASCA and ASCA+ for all combinations of levels. As for the plant type-time interaction, RASCA, ASCA+, and WE-ASCA showed higher RV 2 than ASCA U for all combinations, and the smallest value on average was 0.95 for combination 2-2.
The grouping of the observations in the score plots is one of the tools that can be used to evaluate the magnitude of the effects. A selection of one combination of levels per effect illustrating the greatest differences was made, and for ease of illustration, the score plots of the extreme case are shown in Figure S2. In this case study, both levels of the plant type factor could be separated for all combinations of levels, but the projections obtained with RASCA and ASCA+ were more similar to the balanced solution than those obtained with ASCA U and WE-ASCA. For the time factor, ASCA U and WE-ASCA showed substantially different score projections compared with the balanced case for combinations 1-2, 1-3, and 2-3, where levels 2-4 were not completely separated for combination 1-3, even in PC3. As for the interaction term, although all combinations of levels could be separated with all methods using the first three PCs, the projection of the observations along the PCs could substantially vary depending on the unbalanced combination of levels, as shown for combination 1-4 in the supporting information. As was the case for the scores, RV 2 values calculated for the loadings decreased when the design was gradually unbalanced and showed similar RV 2 profiles to the scores. For ease of illustration, average RV 2 curves as a function of the number of observations removed were used, independently of the unbalanced combination of levels, but also for each one of them, as shown in Figure S3. On average, RASCA and ASCA+ led to higher RV 2 compared with the other methods for the two main effects. It has to be noted that RASCA and ASCA+ had a lower RV 2 for the combination 1-2 of the plant type factor. As for the interaction term, RASCA, ASCA+, and WE-ASCA showed higher RV 2 than ASCA U.
The ranking of the features based on absolute loading values was then compared between the balanced and unbalanced cases to further investigate the effect of gradually unbalancing the designs on interpretation. This comparison was performed independently of the unbalanced combination of levels. Using all the results obtained for the extreme case, the procedure iterated from 1 to the maximum number of features (124 in this example), and the number of features lost in the unbalanced solutions with respect to the balanced case was evaluated at each step. Figure 3, bottom row, shows the average number of lost features per PC as a function of the maximum number of features retained. If the designs had remained balanced, the results would have been equal for all methods and the number of lost features would always have been 0. Thus, curves close to 0 are associated with results most similar to the balanced case. As previously, RASCA and ASCA+ gave similar results for all effects, and their curves are overlaid for the plant type factor (Figure 3, bottom left). They had fewer lost features on average for the time factor and the plant type-time interaction. For the plant type factor, the features were separated into two groups. In the first group, RASCA and ASCA+ had on average two more lost features (i.e., 1.6% of the total number of features) than the other methods. After retaining around 65 features, all four methods were close to zero lost features. Afterwards, the average number of lost features was lower for RASCA and ASCA+ than for the other methods.
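The lost-feature count described above can be reproduced with a short routine. The sketch below (with the hypothetical helper name `lost_feature_curve`) ranks features by absolute loading value on a given PC and, for each cutoff k, counts how many of the top-k features of the balanced solution are missing from the top-k set of an unbalanced solution.

```python
import numpy as np

def lost_feature_curve(loadings_balanced, loadings_unbalanced):
    """Number of 'lost' features as a function of the number retained.

    A feature is 'lost' at cutoff k if it belongs to the top-k
    features (by absolute loading) of the balanced solution but not
    to the top-k features of the unbalanced one.
    """
    rank_bal = np.argsort(-np.abs(loadings_balanced))
    rank_unb = np.argsort(-np.abs(loadings_unbalanced))
    p = len(loadings_balanced)
    return [len(set(rank_bal[:k]) - set(rank_unb[:k])) for k in range(1, p + 1)]
```

At k = p the count is necessarily 0, since all features are retained; curves staying close to 0 for small k indicate rankings similar to the balanced reference.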
In addition to the number of lost features, their labels, along with the labels of the features replacing them, were also compared for each PC of each effect. By investigating the frequencies of the lost features and the replacing ones, it was observed that, independently of the method used, the same features were lost most of the time and the same replacing features were included. For ease of illustration, the average number of lost features for each PC and effect individually is shown, along with the frequencies of lost and replacing features for PC1 of the time factor as an example, in Figure S4. As described above, RASCA and ASCA+ led to similar qualitative and quantitative results regarding the lost and replacing features for all effects. The actual number of lost features varied between the methods, as shown in Figure 3, bottom row, and this influenced the frequencies of the lost features.

| Case study 2
Figure 4, top row, shows average RV² values computed between the scores obtained in the unbalanced solutions and the balanced one as a function of the number of removed observations, independently of the unbalanced combination of levels. Given that case study 2 has four observations for each of the 16 unique combinations of levels, a maximum of two observations could be removed per combination. Again, the RV² when zero observations were removed corresponds to the complete balanced design. In this case, ASCA, ASCA+, WE-ASCA, and RASCA gave identical results with an RV² of 1. As previously, the RV² curves decreased when the design was gradually unbalanced, but for all effects, the average RV² values remained greater than 0.99. RV² values for each combination of levels and each effect are shown in Figure S5. They illustrate that the RV² changed depending on the unbalanced combination of levels, but all values remained high (close to 1).
In this case study, all methods led to similar score plots compared with the balanced case, independently of the unbalanced combination of levels. For each effect, the score plots of the extreme case of a selected combination of levels are shown in Figure S6. It should be noted that levels 3 and 4 of the dose factor were not completely separated in either the balanced or the unbalanced cases. As for the time factor, all levels were separated in the first three PCs.
The RV² curves based on the loadings are shown in Figure S7. RV² values decreased when the design was gradually unbalanced. In the extreme case, an average value of around 0.98 was found for all methods for the dose factor. For the time factor, all methods had an average RV² greater than 0.99. As for the interaction term, all methods had a value of around 0.96.
As previously, the ranking of the variables based on absolute loading values was compared between the balanced case and the unbalanced ones for a total of 3,114 variables. Figure 4, bottom row, shows the average number of lost variables per PC as a function of the maximum number of variables retained. Although high RV² values were obtained for both the scores and the loadings associated with all effects, Figure 4, bottom row, shows that the order of the most contributing variables may still be affected differently depending on the method. For ease of illustration, the average number of lost variables for each PC and effect individually is shown, along with the frequencies of lost and replacing variables for PC1 of the time factor as an example, in Figure S8. As previously, by investigating their frequencies, it was observed that, independently of the method used, the same variables were lost most of the time and the same replacing variables were included. The observed differences were induced by the actual number of lost variables, which impacted the frequencies.

| DISCUSSION
The comparison framework proposed in this study aimed at providing a general strategy to evaluate how methods comparatively deal with unbalanced designs. Since each method provides a large number of results for each effect in the design, investigating the scores and loadings of each effect for each method individually can be cumbersome. To simplify this process, the RV² of the scores and loadings was chosen as an objective metric to compare the results with the balanced situation. For the sake of brevity, a selection of score plots was placed in the supporting information. The number of lost variables was then used as an intuitive measure to evaluate the loadings, with the possibility of highlighting which variables were lost and with which frequency. Together, these two metrics gave an overall picture focused on the comparison between the balanced and unbalanced solutions.
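The controlled unbalancing step at the core of this framework can be sketched as follows. The helper below (hypothetical name `drop_from_combination`) removes a given number of observations from one specific combination of factor levels of an initially balanced design, returning the indices of the observations to keep.

```python
import numpy as np

def drop_from_combination(factor_a, factor_b, combo, n_remove, rng=None):
    """Indices of observations to keep after unbalancing one design cell.

    factor_a, factor_b : 1-D arrays of level labels (one entry per row)
    combo              : (level_a, level_b) combination to unbalance
    n_remove           : number of observations removed from that cell
    """
    rng = np.random.default_rng(rng)
    factor_a = np.asarray(factor_a)
    factor_b = np.asarray(factor_b)
    # Rows belonging to the targeted combination of levels.
    cell = np.flatnonzero((factor_a == combo[0]) & (factor_b == combo[1]))
    if n_remove >= len(cell):
        raise ValueError("at least one observation must remain in the cell")
    removed = rng.choice(cell, size=n_remove, replace=False)
    return np.setdiff1d(np.arange(len(factor_a)), removed)
```

Iterating this over every combination of levels and every admissible value of `n_remove` reproduces the kind of gradually unbalanced designs compared in this study.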
It is important to note that although the solution of the balanced case was used as a reference for the comparison in each case study, it constituted only a snapshot of the phenomena of interest. Thus, if more observations were to be collected, the reference solution would change. However, these samples still formed reasonably large balanced designs, which were gradually unbalanced in a controlled framework. Moreover, caution should be exercised when generalizing the results obtained here to other datasets and data preprocessing treatments. Additional investigations are needed to evaluate the ability of the various methodologies to handle unbalanced designs of experiments. The tools and comparison framework developed in this work can serve as a basis to consider data from different sources, such as spectroscopic signals, as well as data from other analytical instruments. Furthermore, efforts remain to be made to develop and improve comparison frameworks that assess the effect of unbalanced designs on the analyses. In this work, the ability of state-of-the-art methods to cope with the effect of gradually unbalancing the designs of two real datasets was objectively investigated through the evaluation of the scores and loadings.
The RV² of the scores reflected the proportionality of the association between the scores obtained from the unbalanced solutions and the balanced situation, giving information on the grouping of the observations. Values close to 1 indicated a similar grouping of the observations, while lower values reflected differences. As expected, the scores' RV² computed between the unbalanced solutions and the balanced reference decreased for both datasets when the designs were gradually unbalanced. The differences in case study 1 were shown to depend on the studied effects and the structure of the unbalanced combinations of levels, whereas in case study 2, all methods provided consistently high RV² for the scores and loadings of all effects. The presented results mostly focused on the extreme cases, where the designs were unbalanced to the highest degree. This ensured that the differences in the results were maximized. The small number of observations per unique combination of levels in case study 2 prevented the creation of highly unbalanced designs, and this may explain why the unbalanced solutions were very similar to the balanced one. In fact, case study 1 also showed that for lower numbers of removed observations, the results of the unbalanced solutions remained very similar to the balanced one. These results therefore seem to confirm the intuitive perception that the extent to which a design is unbalanced impacts the analyses.
The obtained scores' RV² values were related to the associated score plots in order to investigate the effect of gradually unbalancing the designs on the grouping of the observations. In most situations, all methods achieved comparable separations of (combinations of) levels with respect to the reference case. Nevertheless, even though separation was possible, unbalancing the designs still affected the projection of the samples in the space of the PCs, subsequently impacting the interpretation process through the loadings. In fact, the analysis of the loadings led to lower RV² values than that of the scores, indicating that even in cases of similar grouping of the observations, differences were induced in the loadings. Although other metrics exist for the comparison of coupled matrices, 31 the RV² coefficient was found to provide an intuitive common basis for the comparison of the various strategies. It should be emphasized, however, that the values obtained were in general greater than or equal to 0.95, and differences in this region should not be over-interpreted. This is especially true for case study 2, where RV² values were close to 1 for both scores and loadings.
The investigation of the loadings on a second level mostly focused on qualitative differences between the reference case and the unbalanced solutions. Since attention is brought in practice to the variables with the highest contributions in the loadings, it was shown that, depending on the method, the most contributing variables in absolute value in the balanced case may differ in the unbalanced ones. Although the methods showed differences in the number of lost variables, the most frequently lost and replacing variables were in general similar. Moreover, investigating the trends of the lost variables revealed interesting structures in the data, potentially associated with the relevance of the variables. In case study 1, the variables of PC1 for both the plant type factor and the interaction formed two groups, where the curves of the number of lost variables went back to 0 before increasing again. One hypothesis to explain this bimodal distribution could be a prior selection of the variables, based on the plant type factor, to generate the dataset from a larger data matrix. Furthermore, as shown in the supporting information, by investigating the number of lost variables for each PC of each effect individually, it was observed that the maximum number of lost variables tended to increase with the number of successive PCs extracted. This observation may be explained by the lower amount of variance explained by these successive PCs. The effect is exacerbated for the high-dimensional dataset in case study 2, which may be caused by the presence of many noisy variables, or at least variables not related to the experimental factors.
The comparison framework proposed in this study offers the possibility to investigate how different methods cope with unbalanced designs, along with the effect of unbalancing specific combinations of levels in a controlled fashion. As shown in Figure S1, the ranking of the methods based on RV² values could change depending on which combination of levels was unbalanced. The evaluation of the variables frequently lost, or of those replacing them, can also open new interpretation possibilities regarding the relevance of the variables. This analysis may improve the understanding of the systems under study.
Overall, RV² values for both scores and loadings reflected a high correlation between the balanced and unbalanced solutions, showing a high association between the rows of these matrices. The multiple RV² plots showed that all methods gave identical results in balanced cases. Even though differences appeared when the designs were gradually unbalanced, the metrics used in the framework gave similar results for low numbers of removed observations. It remains, however, difficult to anticipate the extent of the impact of an unbalanced design on the analysis. In that sense, although these metrics are useful to explore the large amount of results obtained when comparing the various strategies, the statistical properties of the methods should not be overlooked, as they are independent of the data at hand. Nevertheless, the suggested framework constitutes a reproducible basis to objectively assess the ability of these methods to handle the two main challenges occurring when designs are unbalanced, that is, unbiased parameter estimation and orthogonality of the effect matrices.
Classical ASCA solves neither the biased parameter estimation nor the non-orthogonality of the effect matrices. As shown in the simulated example, this may result in the occurrence of an artifactual interaction effect. In addition, biased parameter estimates may increase the rank of the pure effect matrices, thus increasing the number of extracted SCA components, as shown by Thiel et al. 19 Interestingly, WE-ASCA takes the prevalence of the (combinations of) levels into account to estimate the mean effects. As shown in the simulated example, this leads to main-effect parameter estimates that are not those of the true model. In case study 1, WE-ASCA showed lower RV² for the main effects compared with RASCA and ASCA+. However, the interaction terms estimated using WE-ASCA are orthogonal to the main effects, 23 and this property may explain why the WE-ASCA solutions showed high RV² values for the scores and loadings of the interaction term in case study 1. It seems, therefore, that WE-ASCA does not completely tackle the two challenges. RASCA and ASCA+ led to very similar results, probably because their parameter estimators are equal, as shown in the simulated example and throughout the real case studies investigated, leading to identical scores. It should be noted, however, that different loadings were obtained. As RASCA first estimates the barycenter of the combinations of levels through a rebalancing strategy based on averages, its pure effect matrices are mutually orthogonal. The loadings are then calculated from these matrices, whereas ASCA+ uses unbalanced pure effect matrices to calculate the loadings. Although the mean effect vectors are equal for both methods, the main difference is the number of replicates associated with each unique combination of levels.
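The rebalancing idea and the orthogonality it induces can be illustrated with a two-factor sketch. This is not the authors' RASCA implementation: it only computes the cell barycenters (one average per unique combination of levels), derives the effect matrices from them on the rebalanced layout, and lets one verify that these matrices are mutually orthogonal even when the original cell sizes differ.

```python
import numpy as np

def rebalanced_effects(X, A, B):
    """Effect matrices built from cell barycenters of a two-factor design.

    X : (n, p) data matrix; A, B : 1-D arrays of level labels.
    Returns the two main-effect matrices and the interaction matrix on
    the rebalanced layout (one row per unique combination of levels).
    """
    la, lb = np.unique(A), np.unique(B)
    # Barycenter (average) of each unique combination of levels.
    cells = np.array([[X[(A == a) & (B == b)].mean(axis=0) for b in lb]
                      for a in la])                        # (I, J, p)
    grand = cells.mean(axis=(0, 1))
    a_eff = cells.mean(axis=1) - grand                     # (I, p)
    b_eff = cells.mean(axis=0) - grand                     # (J, p)
    ab_eff = cells - a_eff[:, None, :] - b_eff[None, :, :] - grand
    I, J, p = cells.shape
    # Expand to one row per cell of the rebalanced design.
    Ma = np.repeat(a_eff, J, axis=0)
    Mb = np.tile(b_eff, (I, 1))
    Mab = ab_eff.reshape(I * J, p)
    return Ma, Mb, Mab
```

Because every effect sums to zero over the rebalanced cells, the inner products trace(MaᵀMb), trace(MaᵀMab), and trace(MbᵀMab) all vanish regardless of how unbalanced the original cell sizes are, which is the orthogonality property discussed above.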
Although the orthogonality of the effect matrices in RASCA did not lead to substantially different results compared with ASCA+ for either real case study, it constitutes an important property, in addition to unbiased parameter estimation, when facing unbalanced designs. While ASCA+ aims at correcting biased parameter estimates, RASCA brings an intuitive solution yielding both unbiased parameter estimates and rigorously orthogonal effect matrices.

| CONCLUSIONS
A novel strategy based on the calculation of averages for each unique combination of factor levels, in association with ASCA, was proposed to efficiently handle high-dimensional data generated using unbalanced designs of experiments. The ability of RASCA to handle unbalanced designs was compared with that of ASCA, ASCA+, and WE-ASCA using a controlled framework involving gradually unbalanced designs. It was shown that all methods provided identical results in the reference setup, when the design was balanced. However, differences were observed when the designs were gradually unbalanced. RASCA proved to be an efficient alternative for dealing with unbalanced designs. It showed a consistently high correlation with the balanced solution for the main and interaction effects, indicating that the proposed rebalancing strategy for estimating the barycenter of the (combinations of) levels was effective. In addition to the estimation of unbiased model parameters, RASCA solves the issue of the non-orthogonality of the effect matrices, which ensures that the variations of the individual effects are not mixed. This advantage is of interest with respect to the interpretation of the models. More generally, it is not straightforward to assess a priori the effect of unbalancing on the analysis. For this reason, the ability of these strategies to handle unbalanced designs should be further investigated in controlled frameworks.