Shift‐invariant tri‐linearity—A new model for resolving untargeted gas chromatography coupled mass spectrometry data

Multi‐way data analysis is popular in chemometrics for the decomposition of, for example, spectroscopic or chromatographic higher‐order tensor datasets. Parallel factor analysis (PARAFAC) and its extension, PARAFAC2, are extensively employed methods in chemometrics. Applications of PARAFAC2 for untargeted data analysis of hyphenated gas chromatography coupled with mass spectrometric detection (GC‐MS) have proven to be very successful. This is attributable to the ability of PARAFAC2 to account for retention time shifts and shape changes in chromatographic elution profiles. Despite its usefulness, the most common implementations of PARAFAC2 are considered quite slow. Furthermore, it is difficult to apply constraints (e.g., non‐negativity) to the shifted mode in PARAFAC2 models. Both aspects are addressed by a new shift‐invariant tri‐linearity (SIT) algorithm proposed in this paper. It is shown on simulated and real GC‐MS data that the SIT algorithm is 20–60 times faster than the latest PARAFAC2‐alternating least squares (ALS) implementation and the PARAFAC2‐flexible coupling algorithm. Further, the SIT method allows the implementation of constraints in all modes. Trials on real‐world data indicate that the SIT algorithm compares well with alternatives. The new SIT method achieves better factor resolution than the benchmark in some cases and tends to need fewer latent variables to extract the same chemical information. Although SIT is not capable of modeling shape changes in elution profiles, trials on real‐world data indicate the great robustness of the method even in those cases.


| INTRODUCTION
Parallel factor analysis (PARAFAC), also known as canonical decomposition (CANDECOMP), is a tensor decomposition method that is used to find a unique low-rank approximation to higher-order tensor data. 1,2 An important assumption for the PARAFAC model to provide chemically meaningful models is that the data must follow a low-rank multilinear structure. Multi-linearity means that a sum of the outer products of a set of factors can approximate the data well. This concept is an extension of bi-linear modeling, as shown in Figure 1, and can also be seen as the mathematical model of the Beer-Lambert law. Examples of bi-linear models are multivariate curve resolution (MCR) or principal component analysis (PCA). Contrary to MCR, PARAFAC provides only one unique set of multi-linear factors that minimizes the residual error under reasonable assumptions. 3 Thus, the solution of the PARAFAC model is unique under mild conditions, as described by Harshman and Lundy 4 and proven by Kruskal. 5 PARAFAC is often used in chemometrics to analyze datasets, such as excitation-emission fluorescence spectroscopy or hyphenated chromatography measurements. 6 Unfortunately, the PARAFAC assumption of low-rank tri-linearity for chromatographic data is rarely fulfilled because of shifts of the peak signals in the retention time mode. 7,8 Therefore, the PARAFAC model is less useful for this type of data because a single latent variable cannot be used to model elution profiles across the sample mode (i.e., the principle of parallel proportional profiles). 7 A more flexible model that can handle the described deviations from trilinearity is the PARAFAC2 model. 9,10 In the PARAFAC2 model, the factors in one mode are allowed to shift and change shape to some extent. More specifically, it is only required that the factors in this mode have constant crossproducts. Thus, factors (elution profiles) in one mode can vary for each sample in another mode under the constraint that the cross-products between the varying factors remain constant from sample to sample. The PARAFAC2 model has been proven to be very successful in modeling hyphenated chromatographic data, like gas chromatography coupled with mass spectrometric detection (GC-MS). 8,11 However, some challenges still need to be addressed to facilitate the use of PARAFAC2 and broaden the application area.
In the classical alternating least squares (ALS) implementation of the PARAFAC2 model, it is not generally possible to impose constraints on the shifted mode. 10,12 This is because the constraints must be imposed on the product of two matrices and not on a single-factor matrix. Alternative implementations have been developed recently that enable the use of constraints in all modes. [12][13][14][15] Some of the developed approaches show unexpected properties (e.g., more components are required to achieve a suitable PARAFAC2 solution). 15 Although some important improvements have been made to increase computational efficiency, 16 most algorithms for calculating the PARAFAC2 model 13,16,17 are still comparatively slow. This is particularly problematic in the analysis of untargeted chromatographic data because hundreds, or even thousands, of low-rank PARAFAC2 models must be fitted to extract chemical information from a high-rank dataset. 11 This paper proposes an alternative algorithm for calculating the PARAFAC2 model that is fast and allows all modes to be constrained. It is based on a novel shift-invariant tri-linearity (SIT) constraint to achieve a unique solution. The SIT method conceptually relies on PARASIAS 15 and a flexible tri-linearity implementation. 18 It specifically overcomes the issue of rank inflation reported for the PARASIAS method. 15 The paper is organized as follows: Section 2 gives an overview of the relevant theoretical concepts leading to the introduction of the proposed method. Section 3 describes the experimental setup and the datasets used for the performance evaluation, and Section 4 provides the results accordingly. A conclusion is given in Section 5.

| PARAFAC and PARAFAC2
Let the matrix X k I Â J ð Þ be the k th slab of the tensor X I Â J Â K ð Þ . Then the rank R PARAFAC model is given by Equation (1) and the least squares loss function by Equation (2). In this model, A and B are factor matrices with dimen- containing the weights for the k th slab. The diagonal elements of D k are the elements of the k th row of the factor matrix C K Â R ð Þ and E k are the residuals with dimension I Â J ð Þ.
Algorithmically, the solutions for A, B and C can be found by ALS. In ALS, each factor matrix is estimated by solving a least squares problem, holding the other two factor matrices constant. The expression BʘC denotes the Kathri-Rao product (column-wise Kronecker product) of the matrices B and C.
Update A, holding B and C constant Update B, holding A and C constant For each update, X must be unfolded along a different mode. The resulting matrices are visualized in Figure 2. Several computational adjustments to the procedure described above have been published to make the algorithm more efficient. A good summary is, for example, given by Tomasi and Bro. 19 This procedure is repeated until the relative change in the loss function value is below a pre-defined convergence criterion.
The PARAFAC2 model was first described by Harshman. 9 It is an extension of the PARAFAC model that allows us to find for each k th slab of X an individual set of factors B k . Thus, the rank R PARAFAC2 model for the k th slab of X is given by Equation (3) and the loss function by Equation (4).
To maintain uniqueness, the factor matrices B k are constrained to have a constant cross-product matrix H for all k. This implies that B k can be expressed as the matrix product of an orthonormal matrix P k with the cross-product matrix H (cf. Equation (5)).
The first efficient algorithm for fitting the PARAFAC2 model is the direct fitting algorithm proposed by Kiers et al. 10 Basically, the idea is to split the optimization problem into two steps. First, P k is updated by calculating the SVD solution of the matrix X T k AD k H À Á . In the second step, X k is compressed by the projection on P k and A, and D k and H are updated by running a few iterations of a PARAFAC model. However, one problem with this algorithm is that the B k matrices cannot be constrained easily. This is because B k is calculated implicitly by the product of the cross-product matrix H with the orthonormal matrix P k , as shown in Equation (5). The difficulties of constraining B k in the classical PARAFAC2-ALS framework are described in detail by Cohen et al. 12 More recently, the PARAFAC2-flexible coupling (PFFC) algorithm has been published that allows B k to be, for example, non-negatively constrained. In this algorithm, the hard coupling B k ¼ P k H is replaced by the more flexible coupling argmin μ k B k À P k H j j j j 2 . In other words, B k must not be equal to but close to P k H. The non-negative flexible coupling algorithm has been reported to be more robust and sometimes provides better factor resolution than the classical PARAFAC2-ALS. 12

| Tri-linearity constraints
An alternative algorithm for calculating the PARAFAC model was proposed by Tauler. 20,21 In this algorithm, the loss function is formulated like an MCR problem, as shown in Equations (6) and (7). This approach is also referred to as multi-set analysis. 22 The matrix X BC has dimensions IxJK ð Þand can be obtained by unfolding X, as shown in Figure 2.
F I G U R E 2 Different unfolding of a three-way array X.
Deviating from the conventional notation in MCR, we use V instead of C to describe the matrix of concentration profiles to prevent confusion with the factor matrix C introduced in the PARAFAC notation. The factor matrix V has size JK Â R ð Þand the factor matrix S is of size I Â R ð Þ. To obtain a unique solution, a tri-linearity constraint is sequentially imposed on the reshaped columns of the factor matrix V. Each column v r is first reshaped into a J Â K ð Þmatrix M. The tri-linearity assumption requires that the rows of M contain scalar multiples of the same factor. If tri-linearity holds, M should be approximately rank one at convergence up to noise. This is enforced by calculating M est from the first principal component (PC) of M as an estimate of M. A visualization of how the tri-linearity constraint is applied to V is given in Figure 3.
One advantage of the tri-linearity approach is that constraints can easily be applied component-wise. [20][21][22][23] Because tri-linearity is imposed factor-by-factor, another advantage is that tri-linearity can be relaxed factor-by-factor. This allows for greater flexibility in modeling deviations from tri-linearity with the potential expense of losing model uniqueness. Extensions of the tri-linearity constraint to quadri-linearity and multi-linearity have also been described. 24

| Discrete Fourier transformation (DFT)
The DFT is one of the most important mathematical operations in signal processing and engineering. The DFT projects a discrete periodic signal with a period length N from the time domain into the frequency domain, as shown by Equation (8). The result is the discrete, complex-valued function b f k ½ , whose real part describes the amplitude of frequencies contained in the signal f n ½ . The imaginary part of b f k ½ describes the phase of the frequencies in f n ½ . The inverse DFT reconstructs the signal in the time domain by a sum of weighted and phase-adjusted frequencies shown in The power spectrum is defined as the squared modulus of b f as shown by Equation (10). One important property of the power spectrum is its shift invariance. The shift information of an input signal is contained in the phase spectrum.
The shift-invariance property of the power spectrum is displayed in Figure 4. In this figure, the results of applying the DFT to a set of signals are illustrated. In (A), this set of signals was created by multiplying random weights to a Gaussian peak, and in (D), the same signal has been shifted randomly along the time axis. As can be seen by comparing (B) and (E), the shift in the time domain does not change the power spectrum of the signal. The shift information is, however, captured in the phase spectrum (F).
In practice, the DFT is calculated by the fast Fourier transformation (FFT) algorithm. 25 Using the most common radix-2 FFT algorithms requires the length of the signal to be a power of two. This can be easily achieved using zeropadding to convert the input signal length to the next larger power of two. A detailed description of other important properties of the DFT can, for example, be found in engineering or signal processing textbooks. 26

| SIT
One flexible implementation of the tri-linearity approach for handling shifted data was recently described for chromatographic data. 18 In this implementation, profiles are aligned based on some specific points within the signal (e.g., the peak maximum). However, the alignment procedure must be formulated explicitly, which limits the applicability of this approach.
To illustrate the difference between explicit and implicit alignment, we consider three signals with arbitrary intensity ( Figure 5). The first one is a normal Gaussian shape, and the second and third are exponentially modified Gaussian shapes (known to be a good model for skewed peaks in chromatography). 27 If the three signals are aligned based on the peak maximum, the result is a rank three system. Conversely, if the same three signals are aligned to a common phase spectrum, the result is a rank two system. The reason for this difference is that the shift-invariant power spectra of the second and third signals are identical and differ only in their phase. Hence, it can be concluded that an implicit alignment is more broadly applicable than an explicit alignment.
A method for modeling shifted GC-MS data implicitly is PARASIAS. 15 In this method, the elution profiles are transformed into shift-invariant power spectra using the DFT. The power spectra are then decomposed using the PARAFAC algorithm. Finally, the inverse DFT is applied to the factor matrix containing the estimates of the elution profiles in the frequency domain. A significant drawback of the PARASIAS decomposition is that it sometimes requires more latent variables than PARAFAC2 to achieve comparable results on the same dataset. 15 This is because the rank of the power spectra matrix is inflated when the raw elution profiles are a sum of multiple underlying elution profiles. This is a consequence of the presence of cross-terms in the power spectra of the raw profiles. In contrast, the proposed SIT is applied factor-by-factor and does not introduce cross-terms. This means that SIT does not artificially increase the rank, as does PARASIAS.
The proposed SIT method combines the idea of the flexible tri-linearity approach 18 and the idea of using DFT for the synchronization of shifted profiles 15 and is illustrated in Figure 6. The DFT is used to synchronize the shifted profiles stored in the reshaped factor matrix M (cf. Figures 3 and 4). The DFT is computed column-wise on each of the K profiles in M. The profiles are zero-padded as needed before calculating the DFT. Zero-padding can be used to adjust for differences in the length of individual profiles before reshaping v to M. After applying the DFT to the columns of M, a J Â K matrix ɸ containing the phase spectra and J Â K matrix containing the shift-invariant power spectra b M are obtained. The power spectra are calculated using Equation (10). Tri-linearity is imposed by performing a PCA decomposition of b M and calculating b M est as the outer product of the first PC. The estimate of the original, shifted signal in time domain M est is obtained by applying the inverse DFT to ɸ and b M est . The proposed method will provide an accurate solution if the matrix of power spectra b M has rank one for all R components. It is important to note that non-negativity on V is applied when V is estimated as the non-negative least squares solution of X T BC S S T S À Á À1 (compared with Equation (6)). Subsequently, factors can still become slightly negative after calculating the inverse DFT of b M est to obtain M est . This can be prevented by setting negative entries in M est to zero. The implementation of SIT shown in Figure 6 is not capable of modeling shifted, non-tri-linear data (e.g., elution profiles having different shapes across samples). In this situation, the rank of b M will be greater than one. This limitation can potentially be overcome by softening the SIT constraint. One approach to do this is by taking more than the first PC for the calculation of b M est : Thus, factors not behaving in a shifted, tri-linear manner can be modeled as bi-linear factors. 22,23 This approach allows for great flexibility, as has been shown in previous studies. 22,23 However, relaxing trilinearity constraints on some factors is expected to affect the uniqueness properties of the solution. 28 A detailed description of the algorithm is provided in Appendix A. To make it a fair comparison, simple random initialization was selected throughout all experiments. Non-negativity constraints were applied to the non-shifted modes for (1) and to all modes for (2) and (3). The convergence criterion was selected as 10 À9 for the relative change in the loss function. This means that the sequence would stop as soon as the relative change in loss function error between consecutive iterations is below 10 À9 . Further, the criterion to distinguish between local and global minima was selected as ΔEFP < 10 À8 , in accordance with the heuristic described by Yu et al. 30 In this case, ΔEFP means the absolute difference in explained fit percentage between the best fit model m 1,R and any other model m i,R taken from an ordered set of models with R components calculated on a specific dataset X. The maximum number of iterations was set to 2 Â    Figure 7 together with the true underlying elution profiles. Each of these 200 (50 different for each of the four conditions) tensors was fitted ten times with each algorithm, starting from different random initializations. The models with the highest explained fit percentage (EFP) out of the ten repetitive fits were then compared. This procedure was used to avoid comparisons biased by the presence of local minima and degenerate solutions. The goal of this experiment was to compare the performance of the different algorithms in terms of computation time, lack of fit, and quality of factor resolution.

| Apple wine
Solid-phase microextraction (SPME) GC-MS measurements have been carried out to monitor the fermentation of apple wine with three different yeast species. A total of 155 samples have been measured over 12 days of fermentation. During the runtime of 46 min, 7091 mass scans from 15 to 300 m/z were recorded to determine the volatiles formed during the fermentation process. The dataset was collected as part of a master's thesis at the Department of Food Science at the University of T A B L E 1 Showing the selected conditions for the simulation of different gas chromatography coupled with mass spectrometric detection (GC-MS) datasets.

Condition
Number of components Signal-to-noise ratio Copenhagen. The results are not published. Three intervals were selected that cover different real challenges in the analysis of GC-MS data. More specifically, Interval 12 has a peak with a low signal-to-noise ratio (S/N). Interval 18 contains one large, shifting peak that changes its shape with increasing intensity. Interval 33 contains two small peaks in the tail of one large cut-off peak. The selected datasets are shown in Figure 8. The goal of this experiment is to qualitatively evaluate how the different algorithms cope with difficult real-world data and to study the limitations of all algorithms comparatively.

| Simulated data
The results of the benchmark study on simulated data are shown in Table 2. Considering all the investigated conditions, SIT is drastically faster than the benchmark. Both PF2 and PFFC show overall similar performance but are F I G U R E 8 Gas chromatography coupled with mass spectrometric detection (GC-MS) data measured on different batches of apple wine fermentation. Interval 12: large baseline component and small peak. Interval 18: intense peak that changes its shape with increasing intensity. Interval 33: two very small peaks in the tail of a very large peak.
T A B L E 2 Results for the benchmark study on simulated data. The results in columns "Iterations", "Computation time", and "EFP max " were calculated as the mean over the 50 datasets for each respective condition. The "Tucker congruence" has been calculated by averaging first over the Tucker congruences of the factor matrices for each of the three modes in one dataset and then by taking the mean over all 50 datasets. distinguishable for specific conditions (e.g., PF2 is slower on Conditions 1, 2, and 4 but significantly faster on Condition 3). The speed-up factor of SIT ranges between 19 and 28, comparing it to the fastest algorithm for each condition, respectively. All algorithms provide very similar EFP values. The added noise puts an upper boundary on the EFP max that can be achieved. Thus, the EFP values of the models (EFP model ) are evaluated relative to the EFP max of a given dataset and noise level. If EFP model (denoting the fit of one of the three algorithms on a specific dataset) exceeds EFP max , the model is overfitted, and if EFP model is below EFP max , the model is underfitted. Therefore, the S/N ratio for each simulated dataset was calculated to determine EFP max . Fit differences, EFP dif , were calculated by subtracting EFP max from EFP model . The mean EFP max values and the mean EFP dif values are reported in Table 2. Negative EFP dif values indicate a lack of fit, and positive values indicate overfitting of the models. From the results in Table 2, it becomes obvious that all models have a slight tendency to overfit the data. However, this tendency generally increases with higher noise levels and a larger number of components. For low noise levels, the EFP dif values are undistinguishable between all models. For higher noise levels, PF2 shows the largest tendency for overfitting, followed by PFFC. The SIT algorithm has a lower tendency for overfitting compared with PFFC and PF2. It follows the expectations that PFFC and SIT have lower fit values than PF2 because they have been more constrained with non-negativity applied to all three modes instead of just two modes. The quality of the resolution was measured by the Tucker congruence 31 (cosine) between the modeled factors and the underlying true factors for each mode. Major differences in the quality of the resolution can be seen for modes A and B while all algorithms provide excellent estimates for mode C. All conditions considered, SIT provides the best resolution, while PF2 tends to perform better than PFFC for lower noise levels and vice versa for higher noise levels. The superior robustness of PFFC compared with PF2 for higher noise levels was also reported by Cohen et al. 12 However, except for the most challenging Condition 4, all algorithms provide good factor resolutions with Tucker congruences higher than 0.9.
In Figure 9, the true and estimated factors of a randomly picked dataset are shown. The first two columns of the matrix plot show the spectral profiles and the elution profiles. Deviations between true and modeled factors can be identified where the colored lines are not laying over the black line. The SIT estimates (yellow) resemble the true profiles very closely. Larger deviations can be seen for the PF2 (blue) and PFFC (orange) estimates. The artifactual peaks in the mass profile estimates of PF2 and PFFC appear at m/z values, at which the baseline signal has strong bands. At the same time, the estimates for the elution profiles of the baseline show "bumps" at the positions of the analyte peaks. The mass profile estimates from PF2 confuse some of the bands between analytes 1 and 2, which is also reflected in the elution profiles showing negative peaks. In summary, it appears that SIT has less difficulties in unmixing the signal contributions from the true underlying factors than PF2 and PFFC have. The third column shows the distributions of the standardized concentration residuals. The residuals have been calculated by subtracting the true concentrations from the estimated concentrations and scaling the result by the true concentration. The quantitative accuracy of the different algorithms is comparable, with the exception that SIT yields higher accuracy on analyte 2.
F I G U R E 1 0 Selected elution profiles of different three-component models fitted to Interval 12 of the apple wine dataset (cf. Figure 8). The upper and lower rows show baseline components, and the middle row shows the analyte signal.

| Apple wine
In order to challenge the results of the simulation study, SIT was benchmarked against PF2 and PFFC on three different real GC-MS datasets. On average, SIT was 65 times faster than PF2 and 58 times faster than PFFC. A detailed summary of the computation times for all three datasets is given in Appendix B.
The first dataset (left-most in Figure 8) is a small peak on a baseline with a slope. The dataset was fitted with three component models: PF2, PFFC, and SIT. The residuals for this model indicate that all chemical information has been captured. Because there is no reference available, the modeled profiles are compared qualitatively. In Figure 10, selected elution profiles are illustrated for the three fitted components, and in Figure 11, the respective mass spectra are shown. The results indicate that PF2 is suffering from two-factor degeneracy (2-FD), as the elution profiles for the baselines are highly negative and the corresponding mass spectra are highly positively correlated. 32 The imposed nonnegativity constraints on the elution profiles save the solutions of PFFC and SIT from becoming degenerate. 33 Nevertheless, PFFC has difficulties unmixing the contributions of baseline and analyte signals. This can be emphasized by the "bumps" in the elution profiles of baseline 1. Moreover, the mass signals of baseline 1 are also very pronounced in the mass spectra of the analyte.
The analyte was identified as butyl acetate by a NIST library search. The Tucker congruences of the mass profiles with the normalized reference spectra are 0.519, 0.279, and 0.985 for PF2, PFFC, and SIT, respectively. Interestingly, the Tucker congruence between modeled and reference spectra does not change going from a three-component SIT model to a two-component SIT model. Moreover, the mass profiles of a four-component PFFC model have a drastically higher Tucker congruence with the butyl acetate reference spectrum (0.969).
F I G U R E 1 1 Mass spectra belonging to the elution profiles shown in Figure 10. The upper and lower rows show the mass spectra of baseline components, and the middle row shows the analyte signal. The analyte mass spectra of PARAFAC2-ALS (PF2) and PARAFAC2-flexible coupling (PFFC) both have a strong band at low m/z-index, which also appears in the baseline mass spectra. This is not the case for the analyte signal of shift-invariant tri-linearity (SIT).
The second dataset (middle in Figure 8) shows an intense, heavily shifted peak signal. A three-component model was selected as being optimal because models with more components resulted in the elution profiles being distorted. In Figure 12, selected elution profiles of the three analyte signals are shown. The PF2 model suffers, as in the previous example, from 2-FD, as can be seen from the elution profiles of analytes 2 and 3. The results from PFFC and SIT look very similar, especially considering the modeled mass profiles shown in Figure 13. However, the elution profiles of analyte 2 modeled with PFFC are distorted. These artifacts get worse with increasing intensities of the analyte 3 signal. Another observation is that the peak shapes of analyte 3 are clearly varying with increasing intensity, from an almost Gaussian shape (low intensity) to a triangular peak shape (high intensity). As outlined in Section 2.4, SIT can basically not model shape changes. If shape changes are present, the power spectra matrix b M will have a rank greater than one. Hence, approximating b M by a one-component PCA model results in truncating the Fourier coefficients. The truncation can cause artifacts in the data such as ringing, for example, related to Gibbs phenomena. Considering this limitation, it is quite surprising how well SIT resolves the elution profiles. Ringing, according to Gibbs phenomena, is present in the elution profiles of analytes 1 and 3, but only gets visible in Figure 12 after zooming in. However, PFFC and SIT fail to model the baseline properly, as can be seen from the offset present in the elution profile of analyte 2.
One limitation of the SIT model becomes obvious when looking at the results achieved by fitting a seven-component model to the third dataset (right in Figure 8). The large cut-off peak causes a discontinuity at the edge of the interval, which cannot be modeled properly by the DFT. The reason is that the discrete time signal is required to be periodic. This means that we must have a smooth transition if we would connect the starting point of the interval with the endpoint. Therefore, we obtain large residuals at the edges of the interval when we model the third dataset with the SIT F I G U R E 1 2 Selected elution profiles of different three-component models fitted to Interval 18 of the apple wine dataset (cf. Figure 8). The factors modeled with PARAFAC2-ALS (PF2) suffer from two-factor degeneracy (2-FD). The results of PARAFAC2-flexible coupling (PFFC) and shift-invariant tri-linearity (SIT) are overall similar. The elution profiles modeled with PFFC for analyte 2 show artifacts. algorithm (cf. Figure 14). Further investigation revealed that the large residuals are exclusively caused by one factor describing the large cut-off peak, while the other factors do not show any artifacts. Thus, SIT can still model the elution profiles of the analytes hidden in the tail of the large peak very well, as can be seen in Figure 15. The mass spectra of analytes 1 and 2 were compared against the NIST library. No reliable match could be found for analyte 1. The best F I G U R E 1 3 Mass spectra belonging to the elution profiles shown in Figure 12. The mass spectra of PARAFAC2-flexible coupling (PFFC) and shift-invariant tri-linearity (SIT) compare very well across all analytes. The mass spectra obtained with PARAFAC2-ALS (PF2) are all the same, indicating factor degeneracy.
F I G U R E 1 4 Residuals of different models fitted to Interval 33 (cf. Figure 8). The model residuals for shift-invariant tri-linearity (SIT) are significantly larger than the residuals obtained with PARAFAC2-ALS (PF2) and PARAFAC2-flexible coupling (PFFC).
F I G U R E 1 5 Selected elution profiles of different three-component models fitted to Interval 33 of the apple wine dataset (cf. Figure 8). All models resolve very similar elution profiles for analyte 1. The elution profiles of analyte 2 compare well between PARAFAC2-flexible coupling (PFFC) and shift-invariant tri-linearity (SIT). The elution profiles obtained with PARAFAC2-ALS (PF2) for analyte 2 look distorted and have ten times larger weights than the elution profiles from PFFC and SIT.
F I G U R E 1 6 Mass spectra belonging to the elution profiles shown in Figure 15. The mass spectra of PARAFAC2-flexible coupling (PFFC) and shift-invariant tri-linearity (SIT) compare very well across all analytes. The mass spectra obtained with PARAFAC2-ALS (PF2) are distinct from the solutions obtained with PFFC and SIT. match for analyte 2 was found to be hexanoic acid butyl ester for SIT (Tucker congruence 0.88) and hexanoic acid 2-methylpropyl ester for PFFC (Tucker congruence 0.80). Because the PF2 solution suffered from 2-FD and the mass spectra looked very unlike the mass spectra of PFFC and SIT, no plausible match was found in either case (cf. Figure 16).

| CONCLUSION
This paper demonstrated that SIT provides a fast, accurate alternative to existing PARAFAC2 algorithms 10,[12][13][14]18 for untargeted modeling of hyphenated chromatography data. In contrast to classical PARAFAC2-ALS, 10 SIT allows constraints in the shifted mode, for example, non-negativity. Compared with existing SIT methods, 18 the proposed SIT approach models shifted data implicitly. The quality of resolution of the proposed SIT method was evaluated on simulated 17 and real GC-MS datasets. 11 Compared with PARAFAC2-ALS, the SIT method performed better on simulated and real GC-MS datasets. On challenging real GC-MS datasets, non-negativity constraints on the shifted mode were necessary to prevent degenerate solutions. In comparison with PFFC, 12 SIT performed better on the simulated datasets and, in some cases, better on real GC-MS data. Specifically, SIT achieved better factor resolution in cases with large baseline components. Moreover, SIT was more efficient on some datasets because fewer components were required to capture the chemically meaningful information. However, other than peak shifts, the currently proposed SIT algorithm does not account for deviations from tri-linearity, such as significant shape changes in the elution profiles. Although SIT handled shape changes surprisingly well (cf. Figures 12 and 13), further improvements to the algorithm can certainly extend the application range of the algorithm further. The SIT algorithm in its current form has been implemented in the PARADISe toolbox (https://ucphchemometrics.com/paradise, [ December 15, 2022] Algorithm: Shift-invariant tri-linearity 0. Initialize V and SSE, e.g.: 2b. Applying Shift-Invariant-Tri-Linearity Constraint for r ¼ 1 :