Evaluation of human breastmilk adulteration by combining Fourier transform infrared spectroscopy and partial least square modeling

Abstract A two‐step chemometric procedure was developed on attenuated total reflection‐Fourier transform infrared (ATR‐FTIR) data of human breastmilk to detect adulteration by water or cow milk. The samples, collected from a milk bank, were analyzed before and after adulteration with whole, skimmed, or semi‐skimmed cow milk or with water. A preliminary clustering by principal component analysis (PCA) distinguished three classes: pure milk, milk adulterated with water, and milk adulterated with cow milk. A partial least squares‐discriminant analysis (PLS‐DA) classification model was then built and applied to new samples to identify the specific adulterant. In external validation, this model correctly identified 100% of the pure milk samples and identified the type of adulterant in 90% of cases. In the second step, four PLS calibration models were built to quantify the adulterant detected in the classification step. On new samples, these models gave satisfactory predictions, with root mean square error of prediction (RMSEP) and percentage relative error lower than 1.38% and 3.31%, respectively.

Human milk banks collect, treat, and store milk from healthy lactating women. They play an important social role by promoting breastfeeding and encouraging mothers to breastfeed their babies. These banks are also an important resource for the care and treatment of premature, low-weight, and sick newborns (Góes, Torres, Donangelo, & Trugo, 2002).
Commercial interest in breastmilk (BM) has become a reality in recent years, although many countries have no legislation dedicated to this topic. Recent papers have reported cases of BM samples that were infected or contaminated with cow milk or drugs (Keim, Kulkarni, et al., 2015). Adulterated BM is not only inferior in quality and economic value but also dangerous for infants. Simply adding water to milk alters its nutritional composition, for example by lowering its protein and solid content. Infants with cow milk allergy could suffer severely if they ingest BM adulterated with cow milk (Santos, Pereira-Filho, & Rodriguez-Saona, 2013; Zhang et al., 2014).
Over the last 25 years, milk has been studied from many points of view, and one of the main topics has been the development of procedures for monitoring the quality and safety of this food matrix, whatever its source (Poonia et al., 2017).
UV and IR spectroscopic methods have been widely used to determine milk adulteration (Kasemsumran, Thanapase, & Kiatsoonthon, 2007). Recio, García-Risco, López-Fandiño, Olano, and Ramos (2000) used capillary electrophoresis to detect rennet whey solids in milk. The polymerase chain reaction has been used to evaluate adulteration caused by mixing milk of different origins (López-Calleja et al., 2004). Sandwich ELISA, RP-HPLC, immunochromatography, and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-MS) have been used to assay milk adulteration by soybean proteins or serum additions (Chávez et al., 2012; Kruså, Torre, & Marina, 2000; De Noo et al., 2005; Oancea, 2009).
The coupling of instrumental analysis with multivariate data analysis techniques capable of handling very large data matrices is the latest evolution of the methods proposed for food analysis (Bassbasi, Luca, Ioele, Oussama, & Ragno, 2014). Chemometric tools applied to instrumental signals extract the information stored in the data, identifying with remarkable reliability the data patterns and the clustering of samples (objects) based on their similarities. This information can then be used to build mathematical models that predict the properties of new, unknown samples (Bassbasi et al., 2014; Dinç, Ragno, Baleanu, Luca, & Ioele, 2012).
From this perspective, IR spectroscopy offers high sensitivity and specificity and can be used in fingerprint mode, making it a good source of information for multivariate techniques (Rodriguez-Saona & Allendorf, 2011). Near-infrared (NIR) and mid-infrared (MIR) spectroscopy have been widely used to determine protein, lactose, and other milk properties (Kawasaki et al., 2008). Since both the positions and the shapes of the signals in IR fingerprints change when milk is adulterated, several authors have investigated by chemometric analysis (Limm, Karunathilaka, Yakes, & Mossoba, 2018) the correlation between NIR and MIR data and the presence of water, whey (Kasemsumran et al., 2007), urea, and caustic soda (Khan, Krishna, Majumder, & Gupta, 2015).
Moreover, IR spectroscopy and chemometric procedures fully comply with the principles of green analytical chemistry (GAC). Analytical chemists should increasingly focus on developing more environmentally friendly laboratory procedures that minimize the use of chemicals, energy consumption, and waste (Gałuszka, Migaszewski, & Namieśnik, 2013; De Luca, Ioele, Spatari, & Ragno, 2017). IR spectroscopy is well suited to this purpose, as it can analyze complex samples (such as food and environmental matrices) with minimal or no sample preparation and with simple, fast data collection.

The amount of adulterant ranged from 5% to 50% in steps of 5%, and each level was prepared five times. Using the Kennard-Stone sampling method (Galvão et al., 2005), 40 samples were selected from each set for modeling and 10 samples for validating the models.
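The sample design and the calibration/validation split can be sketched as follows. This is a minimal illustration, assuming the classical Kennard-Stone rule (start from the two most distant samples, then repeatedly add the sample farthest from those already chosen) with Euclidean distances on the spectra. The spectra are random placeholders, and the helper `kennard_stone` is hypothetical, not the implementation used by the authors.

```python
import numpy as np


def kennard_stone(X, n_select):
    """Return indices of n_select rows of X chosen by the Kennard-Stone rule.

    Starts from the two most distant samples, then repeatedly adds the sample
    whose distance to its nearest already-selected sample is largest.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distance matrix between all samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    # Seed with the pair of samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]

    while len(selected) < n_select:
        # Distance of each remaining sample to its nearest selected sample.
        min_d = dist[np.ix_(remaining, selected)].min(axis=1)
        best = remaining[int(np.argmax(min_d))]
        selected.append(best)
        remaining.remove(best)
    return np.array(selected)


# Design from the text: levels 5-50 % in 5 % steps, five replicates each,
# i.e. 50 samples per adulterant set.
levels = np.repeat(np.arange(5, 55, 5), 5)

# Placeholder "spectra" (random numbers, NOT real data) to exercise the code.
rng = np.random.default_rng(0)
X = rng.normal(size=(levels.size, 200))

cal_idx = kennard_stone(X, 40)                             # 40 modeling samples
val_idx = np.setdiff1d(np.arange(levels.size), cal_idx)    # 10 validation samples
```

Selecting calibration samples this way spreads them over the whole spectral space, so the validation samples fall inside the region the models were built on.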

| Instruments
The IR fingerprints were recorded with a Spectrum Two Fourier transform infrared (FTIR) spectrometer (PerkinElmer) equipped with an attenuated total reflection (ATR) accessory consisting of a flat top-plate fitted with a 25-reflection, 45°, 50-mm ZnSe crystal.
Before each analysis, the ATR system was cleaned with dry paper and scrubbed with hexane and ethanol; spectra were acquired without a cover. The FTIR-ATR spectrum of room air was used as the background to verify cleanliness and to evaluate the instrumental conditions and room interferences due to H₂O and CO₂. FTIR spectra of the milk samples, placed on the ATR surface, were recorded between 4,000 and 450 cm⁻¹. The number of scans and the resolution were optimized at 16 scans and 4 cm⁻¹, respectively.
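The background step corresponds to the usual ratioing of single-beam spectra: the sample spectrum is divided by the background spectrum and converted to absorbance. The sketch below illustrates only this standard conversion with toy numbers; the spectrometer software performs it internally, and the function name `absorbance` is hypothetical.

```python
import numpy as np


def absorbance(sample_single_beam, background_single_beam):
    """Convert single-beam intensities to absorbance: A = -log10(I / I0).

    I0 is the background single-beam spectrum (here, the clean ATR crystal
    measured against room air). Features present in both spectra, such as
    atmospheric H2O and CO2 bands, cancel in the ratio only if the room
    conditions did not change between the two measurements.
    """
    I = np.asarray(sample_single_beam, dtype=float)
    I0 = np.asarray(background_single_beam, dtype=float)
    return -np.log10(I / I0)


# Toy intensities (arbitrary units, NOT measured data).
background = np.array([100.0, 100.0, 100.0])
sample = np.array([100.0, 10.0, 1.0])
A = absorbance(sample, background)   # absorbance of 0, 1 and 2 units
```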
The Unscrambler X software version 10.3 from CAMO (Computer Aided Modelling) was used for the chemometric treatment of the spectral data.

| Chemometric methods
A PCA study of the data patterns was performed to highlight the …

PLS regression is a factor analysis method that is very useful for processing spectroscopic data in the calibration analysis of complex samples (Geladi & Kowalski, 1986; Mabood, Jabeen, Ahmed, et al., 2017). In the PLS procedure, the spectroscopic data (descriptor variables) are arranged in a matrix X (n × m), while a second matrix Y contains the concentration data (response variables). The PLS1 algorithm is used when there is a single response vector y, whereas PLS2 regression is applied to a matrix Y (n × k) with more than one component or class (k > 1). X and Y are mean-centered and then decomposed into factors. Successive orthogonal factors are extracted so as to maximize the covariance between descriptors and responses. The PLS model is obtained once the factors that explain most of the covariation between the two data sets have been found (Forina, Oliveri, Lanteri, & Casale, 2008).
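The PLS1 procedure just described can be sketched with the NIPALS algorithm, a common way of extracting PLS factors one at a time. This is a minimal NumPy illustration, not the implementation inside The Unscrambler X used by the authors; the names `pls1_nipals` and `pls1_predict` are hypothetical.

```python
import numpy as np


def pls1_nipals(X, y, n_factors):
    """Fit a PLS1 model with the NIPALS algorithm.

    X: (n, m) spectra (descriptor variables); y: (n,) concentrations (response).
    Returns the regression vector b and the means needed to predict new samples.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean          # mean-centering of X and y

    W, P, q = [], [], []
    for _ in range(n_factors):
        w = E.T @ f                        # weight: direction of max covariance with y
        w /= np.linalg.norm(w)
        t = E @ w                          # scores of this factor
        tt = t @ t
        p = E.T @ t / tt                   # X loadings
        qa = f @ t / tt                    # y loading
        E = E - np.outer(t, p)             # deflate X, so the next factor is orthogonal
        f = f - qa * t                     # deflate y
        W.append(w)
        P.append(p)
        q.append(qa)

    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)    # regression coefficients in X space
    return b, x_mean, y_mean


def pls1_predict(model, X_new):
    """Predict responses for new spectra with a model from pls1_nipals."""
    b, x_mean, y_mean = model
    return (np.asarray(X_new, dtype=float) - x_mean) @ b + y_mean


# Placeholder data (NOT real spectra): 30 samples, 5 variables.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))
y = X @ np.array([1.0, -2.0, 0.5, 3.0, 0.0]) + 3.0
model = pls1_nipals(X, y, n_factors=2)
y_hat = pls1_predict(model, X)
```

With as many factors as variables, PLS1 reduces to ordinary least squares; with fewer factors it keeps only the directions most covariant with y, which is what makes it suited to collinear spectral data.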
In PLS-DA, the Y variable takes the value 1 for samples belonging to a given class and 0 for all other samples.

| Exploratory analysis of ATR-FTIR fingerprints of milk samples
Human and cow milk contain similar amounts of water, but the relative amounts of carbohydrate, protein, fat, vitamins, and minerals vary widely. Whole cow milk contains more than twice as much protein as human milk. The protein content of milk is linked to the growth rate of each animal species. Human infants need less protein and more fat because a large amount of energy is consumed in developing the brain, spinal cord, and nerves.
Milk proteins fall into two main categories: caseins and whey proteins. Cow milk contains more casein than human milk: its casein-to-whey ratio is 80:20, whereas in human milk it is 40:60. Whole cow milk and human milk contain similar amounts of fat, but the types of fat differ: cow milk contains more saturated fat, whereas human milk contains more unsaturated fat. The higher level of unsaturated fatty acids in human milk reflects the important role of these fats in brain growth. In humans, the brain develops …

A very strong overlap between the spectral signals of human milk and pasteurized cow milk is evident throughout the full recorded spectral region, suggesting a high similarity in the composition of the two matrices. A multivariate study was therefore needed to interpret the data matrices, taking into account the full information in the FTIR fingerprints of the samples.
Raw FTIR spectra were pretreated to retain the information most useful for chemometric modeling. First, only the wavenumber range between 3,000 and 1,000 cm⁻¹ was considered; the spectral extremes were discarded because they were considered rich in instrumental noise and poor carriers of useful information (Kasemsumran et al., 2007). Next, a mathematical pretreatment of the data was needed to minimize instrumental problems such as baseline fluctuation … showed a further information-rich range between 1,700 and 1,500 cm⁻¹, specific to the protein composition. Figure 4a shows the 3D score plot of PC1, PC2, and PC4. The samples grouped clearly, making it possible to distinguish pure BM samples from BM samples adulterated with water or cow milk. However, this PCA model identified only a single cluster of BM samples adulterated with cow milk and could not distinguish the type of cow milk added.
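The range selection and PCA exploration described above can be sketched with NumPy. This assumes PCA computed by singular value decomposition of the mean-centered matrix; the baseline pretreatment the authors applied is not fully specified in this text and is therefore omitted, and the spectra are random placeholders. The helper names `crop_range` and `pca_scores` are hypothetical.

```python
import numpy as np


def crop_range(wavenumbers, spectra, low=1000.0, high=3000.0):
    """Keep only the spectral columns with low <= wavenumber <= high (cm-1)."""
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    mask = (wavenumbers >= low) & (wavenumbers <= high)
    return wavenumbers[mask], np.asarray(spectra, dtype=float)[:, mask]


def pca_scores(X, n_components):
    """PCA via SVD of the mean-centered matrix.

    Returns the scores T = U * s for the first n_components and the fraction
    of total variance explained by each of them.
    """
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]


# Placeholder data: 12 "spectra" on a 4,000-450 cm-1 axis with 1 cm-1 spacing
# (the spacing is an assumption; random values, NOT real measurements).
wn = np.arange(4000, 449, -1)
rng = np.random.default_rng(0)
spectra = rng.normal(size=(12, wn.size))

wn_sel, X_sel = crop_range(wn, spectra)        # 3,000-1,000 cm-1 only
scores, explained = pca_scores(X_sel, 4)
pc1_pc2_pc4 = scores[:, [0, 1, 3]]             # axes used for a 3D score plot
```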

| Classification of milk samples by PLS-DA modeling
A PLS-DA model was developed to classify the milk samples as pure BM or as BM adulterated with water (W), whole cow milk (CM), semi-skimmed cow milk (SSCM), or skimmed cow milk (SCM). The PLS2 algorithm requires more than one Y variable. In this study, five Y variables/classes (Y_BM, Y_W, Y_CM, Y_SSCM, and Y_SCM) were set in the model, assigning the value 1 to the samples belonging to each class and 0 to those belonging to the other classes.
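The 0/1 coding of the five Y variables can be illustrated with a short sketch. The class order and the helper `dummy_y` are hypothetical choices made here for illustration.

```python
import numpy as np

# Class order chosen for this illustration.
CLASSES = ["BM", "W", "CM", "SSCM", "SCM"]


def dummy_y(labels, classes=CLASSES):
    """Build the (n, k) PLS-DA response matrix: 1 for the sample's class, 0 elsewhere."""
    index = {c: j for j, c in enumerate(classes)}
    Y = np.zeros((len(labels), len(classes)))
    for i, label in enumerate(labels):
        Y[i, index[label]] = 1.0
    return Y


Y = dummy_y(["BM", "SCM", "W"])
# Y[0] -> [1, 0, 0, 0, 0]; Y[1] -> [0, 0, 0, 0, 1]; Y[2] -> [0, 1, 0, 0, 0]
```

This Y matrix, paired with the spectral matrix X, is what a PLS2 model regresses on; each column becomes one of the Y_n responses.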
The PLS-DA classification model was validated by full cross-validation, and its performance was evaluated in terms of the coefficient of determination (R²) and the root mean square error of cross-validation (RMSECV). The figures of merit, shown in Table 1, were statistically acceptable with 6 factors, with RMSECV values between 0.122 and 0.145 and R² between 0.857 and 0.944. Figure 4b shows the PLS-DA score plot of factor 2 versus factor 4, in which the discrimination among all classes is greatly improved.
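Full cross-validation is usually understood as leave-one-out cross-validation, in which each calibration sample is predicted by a model built on all the others. The sketch below computes RMSECV and R² from such predictions. To stay self-contained it uses an ordinary least-squares stand-in model rather than PLS-DA, and R² is taken as 1 − SS_res/SS_tot, one common definition that may differ from the one reported by the software. For PLS-DA, the same figures would be computed separately for each Y column. All helper names are hypothetical.

```python
import numpy as np


def loo_predictions(X, y, fit, predict):
    """Full (leave-one-out) cross-validation: each sample is predicted by a
    model built on all the other samples."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    y_cv = np.empty_like(y)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        model = fit(X[keep], y[keep])
        y_cv[i] = predict(model, X[i:i + 1])[0]
    return y_cv


def rmse(y_true, y_pred):
    """Root mean square error (RMSECV when y_pred are cross-validated predictions)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Stand-in model (least squares with intercept) so the sketch runs on its own.
def ols_fit(X, y):
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def ols_predict(coef, X):
    return coef[0] + np.asarray(X, float) @ coef[1:]


# Placeholder data (NOT from the study).
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(scale=0.1, size=20)
y_cv = loo_predictions(X, y, ols_fit, ols_predict)
print(f"RMSECV = {rmse(y, y_cv):.3f}, R2 = {r2(y, y_cv):.3f}")
```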
This model was applied to an external prediction set of 60 samples: 10 samples of pure BM and 10 samples from each subset of adulterated samples. Following the PLS-DA procedure, a sample was assigned to class n when the predicted value of Y_n was higher than 0.5. The classification results are listed in Table 2 as a confusion matrix. All BM and W samples were correctly classified, whereas some difficulties arose in classifying the samples adulterated with cow milk.
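The 0.5 decision rule and the resulting confusion matrix can be sketched as follows. The extra "none" column for samples with no Y value above the threshold, and counting a sample in every class whose Y value exceeds it, are assumptions of this illustration; the helper names are hypothetical and the predicted values are toy numbers.

```python
import numpy as np

# Class order chosen for this illustration.
CLASSES = ["BM", "W", "CM", "SSCM", "SCM"]


def assign_classes(Y_pred, threshold=0.5):
    """Boolean (n, k) membership: sample i belongs to class n if Y_pred[i, n] > threshold."""
    return np.asarray(Y_pred, dtype=float) > threshold


def confusion_matrix(true_labels, Y_pred, classes=CLASSES, threshold=0.5):
    """Rows: true class; columns: predicted classes plus a final 'none' column.

    A sample above the threshold for several classes is counted in each of
    them; a sample above it for no class is counted under 'none'.
    """
    member = assign_classes(Y_pred, threshold)
    index = {c: j for j, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes) + 1), dtype=int)
    for label, row in zip(true_labels, member):
        i = index[label]
        if row.any():
            cm[i, :len(classes)] += row
        else:
            cm[i, -1] += 1
    return cm


# Toy predicted Y values (NOT results from the study).
Y_pred = np.array([
    [0.90, 0.10, 0.00, 0.00, 0.00],   # BM sample, correctly assigned
    [0.10, 0.05, 0.30, 0.60, 0.20],   # SCM sample assigned to SSCM
    [0.00, 0.00, 0.40, 0.45, 0.30],   # SSCM sample with no Y above 0.5
])
cm = confusion_matrix(["BM", "SCM", "SSCM"], Y_pred)
```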
The PLS-DA model recognized the samples adulterated with cow milk, but the exact type of cow milk added was identified for only 90% of them. The misclassification of three samples (one SCM and two SSCM) was likely due to the strong similarity between these adulterants, since skimmed and semi-skimmed milks differ only in their lipid composition. No sample was classified as being of suspect origin.

| Estimate of adulteration by PLS1 approach
To quantify the amount of adulterant added to BM, four PLS1 calibration models were built using the W, CM, SSCM, and SCM sets, respectively. Full cross-validation was used to select the optimal number of factors for each model by evaluating R² and RMSECV. The resulting parameters for all data sets are listed in Table 3.

Table 3. Statistical parameters of the PLS models from full cross-validation (FCV) and external validation procedures.
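The prediction figures reported for external validation can be computed as below. RMSEP is the root mean square of the prediction errors; for the percentage relative error this sketch assumes the definition RE% = 100·sqrt(Σ(ŷ − y)² / Σy²), which is common in chemometric calibration work but is not stated in this text. The helper names are hypothetical and the numbers are toy values, not results from the paper.

```python
import numpy as np


def rmsep(y_true, y_pred):
    """Root mean square error of prediction on an external set."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


def relative_error_pct(y_true, y_pred):
    """Percentage relative error, ASSUMED here as 100 * sqrt(SSE / sum(y_true**2))."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(100.0 * np.sqrt(np.sum((y_pred - y_true) ** 2) / np.sum(y_true ** 2)))


# Toy example: actual vs predicted adulterant level (%), NOT data from the study.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [10.5, 19.5, 31.0, 39.0]
print(f"RMSEP = {rmsep(actual, predicted):.3f} %")               # 0.791 %
print(f"RE    = {relative_error_pct(actual, predicted):.2f} %")   # 2.89 %
```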

| CONCLUSIONS
The combined use of chemometric methods and IR spectral analysis in a multistep procedure proved very effective in detecting even minimal variations in the composition and characteristics of human milk. Processing ATR-FTIR spectral fingerprints with PLS regression procedures made it possible to detect fraudulent additions of water or cow milk to human breastmilk.
In particular, the PLS-DA technique proved robust in discriminating pure human milk from adulterated milk, with a high percentage of correct classification. Four further PLS1 models, each specific to one type of adulterant, made it possible to determine the amount of adulterant, with excellent accuracy and precision when the models were validated on samples external to the calibration set.
This work demonstrates that ATR-FTIR spectroscopy has great potential for the control of food matrices whose quality and integrity are of fundamental importance for human health. Moreover, compared with other, more complex analytical techniques proposed in the literature, the proposed procedure is inexpensive, fast, and requires no sample pretreatment.

ACKNOWLEDGMENT
This study was supported by Ministero dell'Istruzione, Università e Ricerca (MIUR), Italy.

CONFLICT OF INTEREST
The authors declare that they do not have any conflict of interest.

ETHICAL APPROVAL
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

INFORMED CONSENT
Informed consent was obtained from all individual participants included in the study.