Near‐infrared spectroscopy for the inline classification and characterization of fruit juices for a product‐customized flash pasteurization

Abstract The feasibility of inline classification and characterization of seven fruit juice varieties was investigated by applying near‐infrared spectroscopy (NIRS) combined with chemometrics. The findings are intended to be used to optimize the flash pasteurization of liquid foods. More precise real‐time information about the type of product is needed to enable a more product‐specific process. Using partial least squares discriminant analysis, the fruit juice varieties were classified with a classification rate of 100% for an internal test set and 69% for an external test set. The juices were characterized by extract content, pH value, turbidity, and viscosity by fitting a partial least squares regression model. The percentage prediction error of the pH value was <3% for the internal and external test sets, and for the Brix value the prediction errors were about 4% (internal) and 20% (external). The parameters viscosity and turbidity were found to be unsuitable. Despite this, the strategy applied to gain more product‐specific information in real time proved to be feasible. By linking the results to a database containing potentially harmful microorganisms for various types of fruit juices, a more product‐specific calculation of the necessary heat input can be performed. To demonstrate the practical relevance, conventional and product‐adapted process control were compared using two fruit varieties as examples for the case of Alicyclobacillus acidoterrestris. Thus, more accurate product information, obtained through NIRS with chemometrics, allows a more precise calculation of the heat input.

bottle pasteurization. The thermal effect required to prevent spoilage by microorganisms is expressed in pasteurization units (PU).
The PU are calculated using a highly simplified model (Heiss, 2004; Rahman, 2007). The so-called fruit juice formula was historically developed based on empirical values (Oliver-Daumen, 2011; Schwarzer et al., 2010). Even though it is dedicated specifically to fruit juices, the simplification inherent in this formula has shortcomings that can lead to both impaired safety and excessive treatment. The major disadvantage of the PU model results first from "globalization", that is, from employing a product-unspecific leading germ as reference, and second from ignoring the heating and cooling sections in a continuous flash pasteurization system. For a realistic assessment of the microbiological hazard potential, product properties such as pH value, extract content ("Brix value"), and turbidity have to be taken into account.
These parameters can affect the inactivation rate of microorganisms, which is expressed by D and z values. The D value is the decimal reduction time at a reference temperature. The z value is the temperature increase required to reduce the D value to a tenth of its value at the reference temperature (Tiwari & Rajauria, 2018). Depending on the fruit juice type and its individual properties, the D and z values of the microorganisms vary to a relevant extent (Oliver-Daumen, 2011). Therefore, obtaining information about the product at the time of processing makes it possible to determine more product-specific D and z values, provided that a database relating these values to properties such as Brix and pH values can be accessed. Using this information, a more specific PU target value can be calculated and the process control can be adjusted accordingly (Schwarzer et al., 2010). Such a database with scientifically approved D and z values for specific product types and specific growth conditions already exists with open Internet access and is continuously fed with new data (Schwarzer et al., 2010). Hence, the aim of this study is to enable manufacturers to profit practically from this database, to obtain a gentler process, and to better protect nutritive juice compounds. To this end, the feasibility of inline product classification (fruit variety) and characterization of relevant properties (Brix value, turbidity, pH value, and viscosity) is demonstrated in this study.
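The relation between D and z values described above can be illustrated with a short sketch. It assumes the common log-linear model D(T) = D_ref · 10^((T_ref − T)/z); the function name and the numbers in the usage lines are hypothetical and are not taken from this study or from the database.

```python
def d_value(temp_c, d_ref_min, t_ref_c, z_c):
    """Decimal reduction time (min) at temp_c under an assumed log-linear model.

    d_ref_min: D value (min) at the reference temperature t_ref_c (deg C)
    z_c:       temperature increase (K) that lowers D to a tenth
    """
    return d_ref_min * 10 ** ((t_ref_c - temp_c) / z_c)


# Hypothetical numbers, chosen only to show the behavior of the model.
d_ref = 10.0   # min at the reference temperature (assumption)
t_ref = 90.0   # deg C (assumption)
z = 8.0        # K (assumption)

for temp in (90.0, 94.0, 98.0):
    print(f"T = {temp:.0f} C  ->  D = {d_value(temp, d_ref, t_ref, z):.2f} min")
```

Raising the temperature by exactly one z value above the reference reduces D to a tenth, which is the definition given in the text.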

| Approach for a case-specific pasteurization
This work follows a novel approach to liquid food preservation that allows for case-specific pasteurization and better protects valuable chemical compounds. Figure 1 shows the underlying idea and workflow, which is investigated here using near-infrared (NIR) analysis.
With the help of inline spectroscopy measurements and chemometrics, a selection of the microorganisms that are specifically relevant for the product is to be made from a large number of potentially harmful microorganisms. Only for these microorganisms do the kinetic parameters (D/z values) have to be considered, with which a ("pessimistic") PU calculation is performed. Although these resulting PU are still to be regarded as a worst-case scenario for safety reasons, this only refers to the ultimately selected few microorganisms. This calculation is therefore much more optimistic and thus gentler than the globalizing assumption of a much larger spectrum of microorganisms. The selection of harmful microorganisms in practical applications is supposed to be conducted in two steps: (i) the categorization of the product type and variety and (ii) the product characterization with regard to Brix, pH, turbidity, and viscosity.

FIGURE 1 Underlying background idea as a novel approach for a case-specific calculation of the pasteurization units (PU) by the individual selection of spoilage microorganisms from fruit variety identification and analysis of relevant product properties
The aforementioned strategy is exemplified within this study using two fruit varieties (apple and grape; of a total of seven varieties contained in this study).

| Near-infrared spectroscopy application on fruit juices
For the determination of product information, an analytical method is needed that can be applied inline and in real time and that is nondestructive and easily adaptable. Near-infrared spectroscopy (NIRS) in combination with chemometrics has the potential to meet these requirements (Günzler & Gremlich, 2002; Kessler, 2007). Near-infrared spectroscopic investigations on fruit juices are the subject of numerous publications. Typical applications are the verification of the correct declaration of regional provenance, the verification of authenticity, or the quantification of special ingredients (Hosseini et al., 2021; Igual et al., 2010; Lanza & Li, 1984; Rambla et al., 1997; Reid et al., 2005; Šnurkovič, 2013; Twomey et al., 1995; Włodarska et al., 2018). In most cases, these investigations are offline applications and place emphasis on a single fruit variety or on particular ingredients. Some studies combine different techniques, such as NIRS with ICA in the study of Ribeiro et al. (2017). An example of a study involving a multiproduct investigation was published by dos Santos et al. (2018). Besides the specific aim in the context of fruit juice pasteurization and the use of NIRS as an inline measurement method, a particularity of this work is the use of a so-called transflection probe, which is a system of transmission measurement with a doubled path length using a reflection surface. Figure 2 shows schematically the experimental design of processing and the ways of analyzing and evaluating the results by chemometric methods, which are explained in more detail in the following sections.

| Sample material
In order to study the feasibility of the classification and quantification of fruit juices by NIRS, 7 × 5 (n = 35) commercial samples were bought from local grocery stores: seven varieties of fruit juices, each from five different producers. The samples included the varieties cloudy apple, orange, pear, peach, cranberry, black currant, and grape. The reference values determined for the parameters investigated within this study are tabulated as mean values with related standard deviation in Table 1. The minimum and maximum values are presented in Table 2. In addition, 26 other fruit juices of different varieties were purchased, the number per variety varying with the availability of different manufacturers.

| Inline NIR measurements
Processing of fruit juice samples and near-infrared inline measurements were carried out in a laboratory system for HTST treatment of liquid foodstuff type HT220 (OMVE). The experimental setup has been described in detail previously (Weishaupt et al., 2020) so that only a brief description of the general setup is given here. The inline NIR measurements were conducted under constant process conditions with flow rate of 90 L/h, temperature of 20°C, and pressure of 3 bar. The heat-holding section of the HTST laboratory plant was extended by a tube coil containing a segment with three ports that allows the insertion of external probes. Through one of these ports the NIR probe was inserted. The NIR probe is a so-called transflection probe sensor with a variable path length and a reflective surface at the opposite side of the light source (Avantes BV). The sensor was connected to a spectrometer type PSS-2120 (Polytec GmbH) with a diode array detector of 256 pixels and a spectral range from 1100 to 2100 nm in combination with the software Pas Labs 1.2 (Polytec GmbH). Recording of the spectra was made in absorbance mode with 100 scans averaged per spectrum. The path length of transflection probe was set to 2 mm. Before running the fruit juice samples, a reference measurement with demineralized water at a temperature of 20°C was conducted, also under constant process conditions with flow rate of 90 L/h. Each juice was measured inline 30 times at 20°C.
For the generation of a so-called external test set of spectral data, which are totally independent from the training dataset, another set of 26 different commercial fruit juices consisting of the same varieties were measured under the same process settings.

| Laboratory reference measurements
Extract ("Brix value"), turbidity, pH value, and viscosity were measured in the laboratory environment to provide reference data, which were used to generate models using the NIR spectra with the help of chemometric methods. These reference values are necessary in the second step after identification of the fruit variety to better individualize the pasteurization requirements according to the overall approach described earlier (Figure 1). … GmbH, Ostfildern-Scharnhausen, Germany) with the adapter CC12 and 230 rpm for 1 min measuring time. All measurements were performed five times, and the mean and standard deviation were then calculated.

| Data analysis
The characterization and classification of the fruit juices were carried out with chemometric methods using Simca 16.1 (MKS Umetrics AB). An informative overview of the basics of the chemometric methods used here is given by Hosseini et al. (2021); therefore, only a brief explanation of partial least squares regression (PLSR) and partial least squares discriminant analysis (PLS-DA) follows. … samples (Eriksson, 2013). PLSR models for Brix value, turbidity, viscosity, and pH value were fitted and validated on an internal test set (spectral dataset was split into 2/3 training and 1/3 test set) and on an external test set (26 totally independent fruit juice samples).
For classification, the method of PLS-DA was employed, which is an adaptation of PLSR for the purpose of classification (Aguilar-Rosas  Ruiz-Perez et al., 2020; Szymańska et al., 2012). The principle is based on the use of a binary "dummy" system, which assigns, for example, a 0 or a 1 depending on the class membership.
With a chosen threshold of 0.5, a sample is considered a member of a group when its predicted y value for that group is the highest and lies above 0.5. To be able to allocate samples, the model has to be trained in a calibration phase on the characteristics of the individual groups. In this study, the aim of the categorization was to distinguish the seven different fruit varieties. Before the regression and classification models were developed, the raw spectra were preprocessed.
For this purpose, regardless of the chemometric method applied, the dataset of 30 spectra recorded for each fruit juice sample was divided into two parts, 20 spectra as training set and 10 spectra as internal test set. With tools such as wavelength selection, standard normal variate (SNV), multiplicative scatter correction (MSC), Savitzky-Golay smoothing, and derivative spectra, model performance was optimized in an iterative process to find the best preprocessing strategy for a high-performing prediction or classification model.
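Two of the preprocessing tools named above, SNV and MSC, can be sketched in a few lines. This is a minimal illustration with NumPy under the textbook definitions of both methods, not the implementation used in Simca; the function names and the synthetic spectra are assumptions made for the example.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: center and scale every spectrum (row) on its own."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum.

    Each spectrum x is regressed on the reference r (x ~ a + b * r) and then
    corrected as (x - a) / b. The reference defaults to the mean spectrum and
    is returned so that it can be reused for test spectra.
    """
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        reference = spectra.mean(axis=0)
    corrected = np.empty_like(spectra)
    for i, x in enumerate(spectra):
        slope, intercept = np.polyfit(reference, x, deg=1)
        corrected[i] = (x - intercept) / slope
    return corrected, reference


# Synthetic spectra: one base shape with different offsets and gains
# (256 points, matching the detector pixel count only for illustration).
base = np.sin(np.linspace(0.0, 3.0, 256)) + 2.0
offsets = np.array([[0.0], [0.2], [-0.1]])
gains = np.array([[1.0], [1.3], [0.8]])
raw = offsets + gains * base

corrected, ref = msc(raw, reference=base)
scaled = snv(raw)
print(np.abs(corrected - base).max())  # deviation from the base shape after MSC
```

In this synthetic case MSC removes the additive and multiplicative differences completely, because the spectra differ from the reference only by an offset and a gain.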
Quality parameters of evaluation were the explained variation (R²) and the predictive quality (Q²). R² represents a measure for the variance explained by the model and Q² for the predictive ability based on the difference between the predicted value and the actual value.
Both shall attain a value close to 1 as indication of a high performance. The number of latent variables formed in the course of dimensional reduction was examined to avoid overfitting by executing the permutation test (Eriksson, 2013; Lindgren et al., 1996). This has the advantage of checking the optimal number of latent variables and of verifying the statistical significance of the model. Starting from the unpermuted model with corresponding R² and Q², the Y variable is randomly mixed up in its assignment to the X variable.
This leads then to changed values of R² and Q². If the original model is high in significance, the R² and Q² resulting from the permutation test are supposed to be significantly lower (R² below 0.3 and Q² lower than 0.05; Eriksson, 2013; Eriksson et al., 2008; Lindgren et al., 1996).
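The idea of the permutation test can be sketched as follows. To keep the example free of external dependencies, an ordinary least squares model stands in for the PLS model (this is an assumption made for illustration only, not the method of the study); R² is computed as 1 − RSS/TSS from fitted values and Q² as 1 − PRESS/TSS from cross-validated predictions. The data, the number of folds, and all function names are hypothetical.

```python
import numpy as np


def r2(y, y_hat):
    """1 - (sum of squared errors) / (total sum of squares)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)


def fit_predict(X_train, y_train, X_new):
    """Stand-in model: ordinary least squares with intercept (NOT PLS)."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_new)), X_new]) @ coef


def q2_cv(X, y, n_folds=7):
    """Q2 from k-fold cross-validated predictions (fold count is arbitrary)."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    folds = np.arange(len(y)) % n_folds
    y_cv = np.empty_like(y)
    for k in range(n_folds):
        held_out = folds == k
        y_cv[held_out] = fit_predict(X[~held_out], y[~held_out], X[held_out])
    return r2(y, y_cv)


def permutation_test(X, y, n_perm=20, seed=0):
    """Refit with randomly shuffled y; return one (R2, Q2) row per permutation."""
    rng = np.random.default_rng(seed)
    rows = []
    for _ in range(n_perm):
        y_perm = rng.permutation(y)
        rows.append((r2(y_perm, fit_predict(X, y_perm, X)), q2_cv(X, y_perm)))
    return np.array(rows)


# Synthetic data with a real X-y relationship.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(scale=0.1, size=60)

r2_orig = r2(y, fit_predict(X, y, X))
q2_orig = q2_cv(X, y)
perm = permutation_test(X, y)
print("original  R2, Q2:", round(r2_orig, 3), round(q2_orig, 3))
print("permuted  mean R2, Q2:", perm.mean(axis=0).round(3))
```

A model that captures a real relationship keeps high R² and Q² values, whereas the permuted models, which can only fit noise, fall far below them.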
In addition to these general quality parameters, there are further quality and performance parameters specific to PLSR and PLS-DA. … (Oliveri & Downey, 2012). In the case of a multiclass situation with a threshold, it is necessary to visualize the classification performance at various thresholds to find the optimum ratio of sensitivity (se) to specificity (sp) for high performance. For this purpose, the receiver operating characteristic (ROC) plot is considered, in which the sensitivity as true positive rate (TPR) is plotted against the false positive rate (FPR = 1 − sp). In the case of more than two classes, it represents the separability of one group against all other groups. This curve should therefore rise as steeply as possible (the ideal case would be TPR = 1 and FPR = 0). If the curve runs close to the bisector, a purely coincidental allocation can be assumed. The threshold value that has the largest normal distance from the bisector is the optimal one. As a measure for the course of the ROC curve, the area under the curve (AUC) was introduced, which has a value of 0.5 for the bisector. The closer this value comes to 1, the better the ability to separate different classes. The AUC value can be interpreted as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative sample (Baratloo et al., 2015; Fawcett, 2006; Oliveri & Downey, 2012; Szymańska et al., 2012).
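The performance parameters and the ROC analysis can be made concrete with a small sketch. It assumes the usual definitions se = TP/(TP + FN), sp = TN/(TN + FP), and acc = (TP + TN)/(TP + TN + FP + FN); the scores and labels are invented for illustration, and the function names are not taken from any particular software.

```python
import numpy as np


def se_sp_acc(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + fn + tn + fp)
    return se, sp, acc


def roc_curve(y_true, scores):
    """Sweep the threshold from high to low; return (fpr, tpr, thresholds)."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1]))
    tpr = np.array([(scores[y_true] >= t).mean() for t in thresholds])
    fpr = np.array([(scores[~y_true] >= t).mean() for t in thresholds])
    return fpr, tpr, thresholds


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


def best_threshold(fpr, tpr, thresholds):
    """Threshold farthest from the bisector; the distance is (tpr - fpr) / sqrt(2)."""
    return float(thresholds[np.argmax(tpr - fpr)])


# Invented one-vs-rest scores: 1 = member of the class, 0 = any other class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.6, 0.3, 0.1, 0.05]

fpr, tpr, thr = roc_curve(y_true, scores)
print("AUC:", auc(fpr, tpr))
print("best threshold:", best_threshold(fpr, tpr, thr))
```

Maximizing TPR − FPR is equivalent to maximizing the normal distance from the bisector, because the two differ only by the constant factor 1/√2.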

| Scenario of a smart selection of pasteurization parameter setting using product-specific D and z values
For a more product-specific control of the pasteurization process, information about the product to be pasteurized is necessary, such as the fruit variety, the Brix value, and the pH value. Within this study, this information is determined by NIRS and chemometric methods.
Consider two fruit juices, for example, apple and grape, that are processed at one production site. First, they need to be identified as apple juice or grape juice by NIR and PLS-DA. Then, Brix and pH values are predicted using NIR measurement and PLSR. After these microbiologically relevant parameters have been determined by means of NIR, product-specific characteristic data of the mortality rates (D and z values) can be retrieved from the database. A comparison with the globalized PU values obtained by applying the fruit juice formula then shows the extent of a possible optimization by a more product-specific calculation of the PU values.

| RESULTS AND DISCUSSION
The foundation for a more product-specific pasteurization is the knowledge of certain product properties, which must already be available at the start of production. The microbially relevant properties to be determined are the fruit variety, the extract content, the pH value, the turbidity, and the viscosity, which are determined by means of NIR spectra in combination with chemometrics. The aim of the following data evaluation is therefore to test the feasibility of this approach.

| Identification of fruit variety
Near-infrared raw spectra (1050 measurements) were divided into a training set (700 measurements), that is, 20 spectra per fruit juice, and a test set (350 measurements), that is, 10 spectra per fruit juice. Two models were generated, one for fruit variety classification applying PLS-DA and another one for characterization of fruit juice properties applying PLSR. For both models, the preprocessing methods of wavelength selection and MSC led to the highest model quality parameters R² and Q² (data shown in Table 3). Besides the parameters R² and Q², the decisive criterion was mainly the prediction performance expressed as RMSEP with respect to the external test set, since low prediction errors are associated with a high robustness of the model against unknown samples.
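The split of the 1050 spectra can be reproduced schematically. The sketch below assumes that the spectra are stacked sample by sample (35 juices × 30 spectra) and that the 20 training spectra of each juice are drawn at random; the text does not state how the spectra were selected, so the random draw and the function name are assumptions.

```python
import numpy as np


def split_per_sample(n_samples=35, n_spectra=30, n_train=20, seed=0):
    """Index arrays (train, test) for spectra stacked sample by sample.

    For every juice, n_train of its n_spectra spectra go to the training
    set and the remaining ones to the internal test set.
    """
    rng = np.random.default_rng(seed)
    train, test = [], []
    for s in range(n_samples):
        idx = s * n_spectra + rng.permutation(n_spectra)
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return np.sort(np.array(train)), np.sort(np.array(test))


train_idx, test_idx = split_per_sample()
print(len(train_idx), len(test_idx))
```

Splitting within each juice keeps every sample represented in both sets, which is why the internal test set is not independent of the training data in the way the external test set of 26 other juices is.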
A comparison of all the results of the different preprocessing methods is shown in Table 4 for the PLS-DA model and in Table 5 for the PLSR model. After evaluation of model quality, the classification performance of the PLS-DA model was validated by calculating the sensitivity and specificity, which were combined into the parameter of accuracy.
The classification performance for the calibration dataset and the internal test dataset showed no failures in classification. Regarding the external test dataset which consists of 26 completely independent juices, the classification rate decreased from 100% to 69% (Table 6), which still corresponds to a high assignment rate.
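The assignment rule of the PLS-DA model (highest predicted y value, and above the threshold of 0.5) and the resulting classification rate can be sketched as follows. The predicted y values in the example are invented, and the function names are hypothetical; in practice the matrix of predicted values would come from the fitted PLS-DA model.

```python
import numpy as np

CLASSES = ["apple", "orange", "pear", "peach", "cranberry", "black currant", "grape"]


def dummy_matrix(labels, classes=CLASSES):
    """Binary dummy coding: one column per class, 1 for members, 0 otherwise."""
    return np.array([[1.0 if lab == c else 0.0 for c in classes] for lab in labels])


def assign(y_pred, classes=CLASSES, threshold=0.5):
    """Assign the class with the highest predicted y if it exceeds the threshold."""
    assigned = []
    for row in np.asarray(y_pred, dtype=float):
        k = int(np.argmax(row))
        assigned.append(classes[k] if row[k] > threshold else None)
    return assigned


def classification_rate(true_labels, assigned):
    """Share of correctly assigned samples in percent."""
    hits = sum(t == a for t, a in zip(true_labels, assigned))
    return 100.0 * hits / len(true_labels)


# Invented predicted y values for three samples (rows) and seven classes.
y_pred = np.array([
    [0.9, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0],   # clearly apple
    [0.3, 0.4, 0.2, 0.0, 0.0, 0.0, 0.1],   # highest value below 0.5: unassigned
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.8],   # clearly grape
])
true_labels = ["apple", "orange", "grape"]
assigned = assign(y_pred)
print(assigned, classification_rate(true_labels, assigned))
```

A sample whose highest predicted value stays at or below the threshold is left unassigned here, which counts as a misclassification in the rate.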
Expressed in terms of the performance parameters se, sp, and acc, the parameter sp exceeds 90% for all fruit juices.
The risk of wrongly assigning a sample to another class is therefore low. For se, the values differ greatly depending on the fruit variety. For orange, peach, and cranberry, se is 100%, for … Table 7. To investigate an approach to optimizing the model performance, the classification task was modified in a second PLS-DA model (model properties are listed in Table 3 under "PLS-DA orange"). The task was now not to classify each fruit variety used in the study, but to distinguish one variety, in this case orange, from the others. This approach increased the classification rate from 67% to 98%, while the sensitivity for orange remained at 100% and the specificity increased to 95%. Based on the overall high values of the performance parameters, especially since the risk of a false negative assignment is higher than the opposite, NIRS can be considered applicable as an inline analysis method for the classification of fruit juice varieties. Furthermore, it became apparent that there is even potential for improvement.

| Product characterization
… Table 3. RMSEE and RMSECV of the parameters Brix and pH values were very low in relation to the reference values and near 0, indicating the potential of these parameters as reference values. In order to verify normality, the residual plots were considered. Linearity was found, and the limits of ±4, beyond which a value would indicate an outlier, were not exceeded. The residual plots of Brix and pH values are shown in Figure 6 as an example. In addition to examining the plots of residuals, an analysis of variance (ANOVA) was performed to test the significance of the models. The CV-ANOVA provides a significance test of the null hypothesis of equal residuals of the two models compared. The p value is considered here, which indicates the probability level at which a model is recognized as significant, usually at a value of <.05. Since all four Y variables resulted in p values smaller than .05, the models can be described as significant.
Results are shown in Table 8.
For optimization and validation of prediction performance, the RMSEPs for the internal and external test sets were calculated (… Figure 7). In the case of viscosity and turbidity, the RMSEP value increased several-fold. A comparison of these results is shown in Table 10.
The difference between the values for the internal and the external test set shows that the calibration dataset needs to be optimized in order to increase the robustness of the model against unknown samples. An increase in the number and diversity of samples in the calibration dataset would provide this. In order to make the RMSEP values more interpretable, they were set in relation to the mean value of the reference values in order to determine the percentage error of prediction (
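The RMSEP and the percentage error of prediction described here, that is, the RMSEP divided by the mean of the reference values, can be written out in a few lines. The pH values in the example are invented for illustration and are not data from this study.

```python
import math


def rmsep(y_ref, y_pred):
    """Root mean square error of prediction."""
    n = len(y_ref)
    return math.sqrt(sum((p - r) ** 2 for r, p in zip(y_ref, y_pred)) / n)


def percentage_error(y_ref, y_pred):
    """RMSEP relative to the mean of the reference values, in percent."""
    mean_ref = sum(y_ref) / len(y_ref)
    return 100.0 * rmsep(y_ref, y_pred) / mean_ref


# Invented reference and predicted pH values.
ph_ref = [3.4, 3.5, 3.6]
ph_pred = [3.5, 3.4, 3.7]
print(f"RMSEP = {rmsep(ph_ref, ph_pred):.3f}")
print(f"percentage error = {percentage_error(ph_ref, ph_pred):.2f} %")
```

Relating the error to the mean reference value makes parameters with different units and magnitudes, such as pH and Brix, comparable.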

| Exemplary application of an individualized pasteurization using product-specific D and z values
Finally, an example is given of how to profit in practice from inline product identification and characterization using NIRS as an analytical tool. A large number of D and z values are available, and many of them are already collected in an Internet-accessible database ("Lemgo D- and z-value Database for Food"). However, more data are required for wider use in practice. Despite its exemplary character, the case shown here is intended to demonstrate the feasibility of an individualized and gentle pasteurization by implementing NIRS as an inline analytical tool. A well-known juice-spoiling germ, Alicyclobacillus acidoterrestris, was therefore selected. In fruit juices, this organism can lead to product spoilage during warm storage (depending on climate and weather in certain countries) or after hot packaging with slow cooling following a heat treatment at too low a temperature, which stimulates germination (Ciuffreda et al., 2015; Komitopoulou et al., 2001 … which the z values used refer, corresponds to the actual temperature measured at the end of the heat-holding section in classical control of the pasteurization process, and t represents the heating time. In the case shown here, one (real) sample from the class apple and one from the class grape were randomly selected and characterized by the PLSR method in terms of Brix and pH values (shown in Table 11).
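How a holding time follows from a PU target can be sketched under the general form PU = t · 10^((T − T_ref)/z), with t as the heating time, T as the holding temperature, and T_ref as the reference temperature to which the z value refers. This general form is an assumption made for the example; it does not reproduce the constants of the fruit juice formula or the D/z values behind Table 11. The target of 63 PU and the temperature of 95°C are taken from the caption of Table 11, whereas T_ref and both z values below are hypothetical placeholders.

```python
def pasteurization_units(t_min, temp_c, t_ref_c, z_c):
    """PU accumulated in t_min minutes at a constant holding temperature."""
    return t_min * 10 ** ((temp_c - t_ref_c) / z_c)


def holding_time(target_pu, temp_c, t_ref_c, z_c):
    """Holding time (min) needed at temp_c to reach target_pu."""
    return target_pu / 10 ** ((temp_c - t_ref_c) / z_c)


TARGET_PU = 63.0   # from the caption of Table 11
TEMP_C = 95.0      # from the caption of Table 11
T_REF_C = 80.0     # hypothetical reference temperature
for label, z in (("generic z (hypothetical)", 10.0),
                 ("product-specific z (hypothetical)", 8.0)):
    t = holding_time(TARGET_PU, TEMP_C, T_REF_C, z)
    print(f"{label}: z = {z} K -> holding time = {t:.2f} min")
```

The comparison of the two lines shows the lever of the approach: a different z value for the selected microorganism in the identified product changes the holding time required for the same PU target.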

| CONCLUSION
Using the PLS-DA method, a classification model was created that is capable of distinguishing seven different fruit juices in the course of inline measurement using NIRS. In addition, these fruit juices were characterized in terms of microbiological properties by the two parameters Brix and pH values using the PLSR method. The turbidity and viscosity could not be represented by NIRS with high coefficients of determination, which led to regression models of low … (2) ("fruit juice formula") was found. The applied strategy of inline analysis using NIRS and chemometrics is therefore suitable for product adaptation of the pasteurization process. However, it is questionable whether the ambitious plan to individualize pasteurization down to fine product properties can succeed. Therefore, it is necessary to further investigate how cloudy fruit juices differ from clear fruit juices or how this applies to different beer types. In conclusion, this study has provided a fundamental basis for further investigation to realize an individualized pasteurization.

ACKNOWLEDGMENTS
The authors thank Polytec GmbH (Waldbronn, Germany) for providing the spectrometer on loan. Financial support for the SMARTPas project (13FH024IX6) was provided by the Federal Ministry of Education and Research (BMBF).

CONFLICT OF INTEREST
The authors declare no conflict of interest.

ETHICAL APPROVAL
The study does not involve any human or animal testing.

DATA AVAILABILITY STATEMENT
Data are available upon request from the authors.

Imke Weishaupt
https://orcid.org/0000-0003-2965-5345

TABLE 11 Comparison of the required holding time in a flash pasteurizer at 95°C to apply a lethal impact of 63 PU based on experimental D/z values (Splittstoesser et al., 1994) on an empirical model (Silva et al., 1999) taking pH value and extract content