Selectivity‐relaxed classical and inverse least squares calibration and selectivity measures with a unified selectivity coefficient

Two popular calibration strategies are classical least squares (CLS) and inverse least squares (ILS). Underlying CLS is the condition that the net analyte signal used for quantitation is orthogonal to the signal from other components (interferents). This orthogonality avoids analyte prediction bias from modeled interferents. Although the orthogonality condition ensures full analyte selectivity, it may increase the mean squared error of prediction. Under certain circumstances, it can be beneficial to relax the CLS orthogonality requisite, allowing a small interferent bias if, in return, the mean squared error of prediction is reduced. The bias magnitude introduced by an interferent for a relaxed model depends on the analyte and interferent concentrations in conjunction with the analyte and interferent model sensitivities. Presented in this paper is relaxed CLS (rCLS), which allows flexibility in the CLS orthogonality constraints. Although ILS models do not inherently maintain orthogonality, a relaxed ILS is also presented. From the development of rCLS, a significant expansion of the univariate selectivity coefficient definition broadly used in analytical chemistry is presented. The defined selectivity coefficient is applicable to univariate and multivariate CLS and ILS calibrations. As with the univariate selectivity coefficient, the multivariate expression characterizes the bias introduced in a particular sample prediction by interferent concentrations relative to model sensitivities. Specifically, it answers the question: when can a prediction be made for a sample even though the analyte selectivity is poor? Also introduced are new component-wise selectivity and sensitivity measures. Trends in several rCLS figures of merit are characterized for a near infrared data set.

Quality indexes (eg, motor octane number in gasoline) or physical properties (eg, viscosity in polymers) are also modeled. All discussions henceforth are based on spectroscopic calibration with analyte concentration as the quantity being modeled, but the discussions are applicable to other techniques of instrumental analysis.
From a set of calibration samples with known instrumental responses and quantitative information for the analyte, there are 2 main approaches to stating a multivariate calibration model. These approaches are often referred to as direct (the classical least squares [CLS] model being the most used, but CLS variations have been published [1][2][3][4][5]) and inverse (including inverse least squares [ILS] methods such as multiple linear regression, principal component regression, partial least squares [PLS] regression, and ridge regression [RR], among others). 6,7 To provide accurate predictions from nonselective measurements, each calibration approach sets constraints on the model relative to the non-analyte components (interferents). The understanding of such constraints has led to the concept of net analyte signal (NAS) [8][9][10][11][12][13] and selectivity measures. 8,[14][15][16][17][18][19][20] It is well established that CLS imposes strict orthogonality constraints while ILS methods do not. 21 Specifically, depending on the analyte and interferent concentrations in the calibration set as well as the spectral noise structure, different ILS models will be obtained [21][22][23][24] while nearly the same orthogonal CLS models are estimated. 21 In essence, ILS models at appropriate tuning parameter values (eg, latent variables for PLS and the ridge parameter for RR) are the models most useful for prediction and are not strictly orthogonal to interferents. However, ILS models at well-chosen tuning parameter values are commonly nearly orthogonal to the interferents. 25 Proposed in this paper is relaxed CLS (rCLS), a method that relaxes the CLS orthogonality constraints with relaxation parameters to form a family of models. The process is extended to form relaxed ILS (rILS) models. In fact, the proposed relaxation parameters used to form rCLS and rILS models are related to the component-wise selectivity definitions proposed in reference 21, and this relationship is elaborated in this paper.
Also presented in this paper is a unified definition of the selectivity coefficient. The selectivity coefficient is a measure used in analytical chemistry developed for univariate calibration. 26,27 It finds most use with ion-selective electrode calibration. 28,29 The selectivity coefficient (formally defined in Section 3.2 for univariate and Section 3.3 for multivariate) characterizes the sensitivity of a method for an interferent relative to the sensitivity of the method to the analyte. To date, a definition of the selectivity coefficient is nonexistent for multivariate CLS and ILS calibrations. Developed in this paper is a definition that applies to both calibration processes and reduces to the usual univariate definition. With multivariate selectivity coefficients, it is possible to assess the prediction quality expected from a model relative to the sample analyte and interferent concentrations. Thus, even though the selectivity for the analyte can be poor, acceptable concentration predictions may still be possible. 30 In addition to the selectivity coefficient, new component-wise selectivity and sensitivity measures are introduced. This paper begins by discussing the underlying calibration model with respect to CLS and ILS and the orthogonality goals between pure component spectra and model regression vectors. After this, an overview of estimating CLS and ILS models is provided leading into presentations of the relaxed versions of CLS (rCLS) and ILS (rILS). With the relaxed multivariate calibration methods established, selectivity coefficients are defined. This section begins with the classical univariate definition and generalizes the definition to multivariate calibration. The rCLS process and selectivity coefficient definition are exemplified using a small near infrared (NIR) calibration set of 2 species.

| SELECTIVITY-RELAXED CALIBRATION THEORY
For the following mathematical equations and expressions, scalars are represented as nonbold italic letters, column vectors as bold lowercase letters, and matrices as bold uppercase letters. The hat (^) used in the literature over a variable to indicate an estimated quantity has been omitted for simplicity except for predicted quantities of the analyte; whether a quantity is measured or calculated can be deduced from the context. The superscripts "T" and "+" denote transposition and a pseudoinverse, respectively. The matrix I represents a properly dimensioned identity matrix. The L_2 norm of a vector is symbolized by ‖·‖.

| Underlying model
Let r (P × 1) be the measured (background corrected) spectrum, recorded at P wavelengths, of a sample with N responding components. Let the subscript a denote the analyte of interest in a sample and i any other sample component, here regarded as an interferent for the determination of a. Throughout this paper, it is assumed that a sample spectrum is a linear combination of analyte and interferent pure component spectra following the Beer-Lambert law

r = S c + e    (1)

where the columns of S (P × N) are the pure component spectra of the analyte s_a and interferents s_i, c (N × 1) holds the respective concentrations of the analyte (c_a) and interferents (c_i), and e contains the measurement noise. For simplicity, this paper only considers interfering components, but the developments can also be asserted for other sources of variation in r, such as measurement temperature and instrument. A wide range of modeling methods estimate the analyte concentration in a sample by

ĉ_a = r^T b_a    (2)

where b_a contains the model regression coefficients for component a. In some cases, the concentrations of all components in the calibration standards are known and it is possible to derive a vector of regression coefficients for each component. In this case, all components are predicted simultaneously by

ĉ^T = r^T B    (3)

where B is a matrix of regression coefficients whose ath column is b_a and ĉ is an estimate of c in Equation 1. Because of unselective measurements, a common goal of calibration methods is to provide a model that predicts the analyte concentration without being affected by the interferents. This can be expressed as

∂ĉ_a/∂c_k = δ_ka    (4)

where c_k is either the analyte a or an interferent i concentration and δ_ka is the Kronecker delta. Figure 1 illustrates the expected behavior according to Equation 4 where the prediction ĉ_a is the same as the concentration of a in the sample for any value of c_a (ie, ∂ĉ_a/∂c_a = 1) and, at the same time, the prediction does not change with an increasing amount of interferents in the sample (ie, ∂ĉ_a/∂c_i = 0). Substituting Equation 1 into Equation 2 gives

ĉ_a = c^T S^T b_a + e^T b_a = c_a s_a^T b_a + ∑_i c_i s_i^T b_a + e^T b_a    (5)

and applying Equation 4 gives

s_a^T b_a = 1,  s_i^T b_a = 0 ∀i,  e^T b_a = 0    (6)

meaning that b_a must be estimated such that s_a^T b_a = 1 and, at the same time, be orthogonal to the (usually unknown) spectra of all interferents in the sample (s_i^T b_a = 0, ∀i) and to the noise vector (e^T b_a = 0). Similar constraints can be stated for other components in the sample acting as the analyte by redefining a and i.
Assuming Equation 1 is valid and taking all components into account simultaneously, then from Equation 3,

ĉ^T = r^T B = c^T S^T B + e^T B    (7)

and

∂ĉ^T/∂c = S^T B    (8)

For B to fulfill the requirement stated in Equation 4, then

S^T B = I    (9)

where the identity matrix I arises from the Kronecker delta in Equation 4. Estimating a regression vector b_a satisfying Equation 6, or a matrix B fulfilling Equation 9, is the CLS working framework, and ILS regression targets these equations. 21 These points are further discussed in the next section.
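To make the constraints concrete, the following minimal NumPy sketch (all spectra simulated and names illustrative, not from the original work) builds a two-component S, forms B = S(S^T S)^{-1} satisfying Equation 9, and verifies that changing the modeled interferent concentration leaves the analyte prediction unchanged.

```python
import numpy as np

rng = np.random.default_rng(7)
P = 200                                   # number of wavelengths
x = np.linspace(0, 1, P)

# Columns of S: simulated pure component spectra s_a (analyte), s_i (interferent)
S = np.column_stack([np.exp(-(x - 0.45)**2 / 0.005),
                     np.exp(-(x - 0.55)**2 / 0.010)])

# A regression matrix satisfying S^T B = I (Equation 9)
B = S @ np.linalg.inv(S.T @ S)
assert np.allclose(S.T @ B, np.eye(2))    # orthogonality constraints hold

# Mixture spectrum per Equation 1: r = S c + e
c = np.array([0.30, 0.70])
r = S @ c + 1e-4 * rng.standard_normal(P)

# Equation 3: c_hat^T = r^T B; raising c_i leaves c_hat_a unchanged (Equation 4)
print(r @ B)                              # approximately [0.30, 0.70]
r2 = S @ np.array([0.30, 0.90]) + 1e-4 * rng.standard_normal(P)
print(r2 @ B)                             # analyte prediction still ~0.30
```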

| Net analyte signal
Meeting the orthogonality constraints in Equation 6 keeps interferences from biasing the prediction ĉ_a. A model that fulfills this requirement has a null sensitivity for the interferents or, equivalently, full selectivity for the analyte of interest. Such a model is said to maintain orthogonality and uses for prediction only that part of the analyte signal orthogonal to the interferent space, also referred to as the NAS. 8 This definition pertains to CLS. A more encompassing definition that includes CLS and ILS is to define the NAS as that part of a sample signal used for prediction. 10 As such, the NAS plays a fundamental role in the prediction of multivariate models and influences the prediction variance. The NAS for a sample spectrum is commonly given by

r*_a = (r^T s*_a/‖s*_a‖^2) s*_a    (10)

where the net analyte sensitivity vector s*_a can be calculated as

s*_a = b_a/‖b_a‖^2    (11)

The constraints in Equation 4 ensure that modeled interferents will not bias the prediction of the analyte. To fulfill Equation 4, b_a must be orthogonal to the spectra of the interferents, the CLS constraint. This is equivalent to saying that the only contribution to the NAS of the unknown sample, r*_a, must be generated by the analyte of interest. Figure 2A,B show that as the analyte and interferent spectra become more similar, the NAS decreases for the orthogonal projection of s_a against s_i (s*_a is restricted to be orthogonal to the interferent). Similarly, the NAS decreases as the degree of sample complexity increases. All other things being equal, a small norm of the NAS entails a small signal-to-noise ratio, a large prediction variance and hence, large overall prediction errors. 10,30 This realization leads to the awareness that strict adherence to the defining relation in Equation 4 can lead to underperforming models. The underperformance is the price paid by CLS to avoid the interferent from biasing the prediction independent of the interferent concentration.
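Continuing the simulated two-component setup, a short sketch using the reconstructed forms of Equations 10 and 11 assumed here: s*_a = b_a/‖b_a‖² and the NAS as the projection of r onto b_a, so that ‖s*_a‖ = 1/‖b_a‖.

```python
import numpy as np

x = np.linspace(0, 1, 200)
S = np.column_stack([np.exp(-(x - 0.45)**2 / 0.005),
                     np.exp(-(x - 0.55)**2 / 0.010)])
B = S @ np.linalg.inv(S.T @ S)
b_a = B[:, 0]

# Net analyte sensitivity vector (Equation 11) and its norm, the usual SEN
s_star_a = b_a / (b_a @ b_a)
print(np.linalg.norm(s_star_a), 1 / np.linalg.norm(b_a))   # identical values

# NAS of a mixture spectrum (Equation 10): projection of r onto b_a
r = S @ np.array([0.30, 0.70])
r_star = (r @ b_a) / (b_a @ b_a) * b_a
print(np.linalg.norm(r_star))        # equals c_hat_a / ||b_a||
```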
A remedy for this CLS limitation is realizing that the prediction bias caused by an interferent is the result of the model sensitivity for the analyte, the model sensitivity for that interferent, and the amount of interferent in the sample. Hence, when the amount of interferent is expected to be (or known to be) very low, or the sensitivity for the interferent is low, a small prediction bias due to the interferent can become acceptable if there is a trade-off for another prediction quality such as prediction variance, the same bias/variance trade-off premise as ILS. Under these circumstances, a null sensitivity for that interferent is not required for CLS, and the orthogonality (ie, full selectivity) constraint of the analyte against all interferences in Equation 4 can be relaxed to increase the NAS and decrease prediction uncertainty. In this case, s*_a is not constrained to be orthogonal to the interferent. Transitioning from Figure 2B to Figure 2C demonstrates the concept.

| Classical and inverse calibrations
Using a collection of calibration samples, Equation 1 is expressed as

R = C S^T + E    (12)

where R (M × P) holds the measured spectra in rows, C (M × N) contains the known concentrations of the components in the standards, and E (M × P) is the error matrix associated with this decomposition. The calibration (training) step of CLS amounts to estimating the pure component spectra as

S^T = C^+ R = (C^T C)^{-1} C^T R    (13)

In this paper, it is assumed that the calibration standards, and hence S, properly reflect the composite system of matrix interactions. Once S is known, the solution to Equation 9 gives direct access to the CLS regression coefficients satisfying Equations 4 and 6 and is written as

B_CLS = (S^T)^+ = S(S^T S)^{-1}    (14)

With the matrix of regression vectors, concentrations for calibrated components can be estimated for a sample using Equation 3.
In some analytical problems, such as in the analysis of natural products or very complex mixtures, pure component spectra are not known or cannot be estimated by Equation 13 because not all component concentrations are known, even though Equation 12 is the underlying model. In this case, ILS modeling (multiple linear regression, principal component regression, PLS, RR, etc) uses only quantitative information of the analyte of interest for training and does not need pure component spectra. With ILS, the sample prediction for the analyte occurs by Equation 2 using a vector of regression coefficients for analyte a obtained from the pseudoinverse of R expressed as

b_ILS,a = R^+ c_a    (15)

where R^+ depends on the ILS method used. Note that if concentrations for more than one responding component are actually known, the ILS model matrix can be computed by

B_ILS = R^+ C    (16)

where b_ILS,a is the analyte column in B_ILS.

FIGURE 2 Example net analyte sensitivity s*_a affecting the net analyte signal (NAS) (Equation 10) at 2 measured responses r_1 and r_2. A and B, The NAS becomes smaller for the orthogonal projection of s_a against s_i as the similarity between the analyte spectrum s_a and the interferent spectrum s_i increases. C, Relaxing the orthogonality constraint in Figure 2B with respect to the interferent can lead to a larger NAS
Because of the differences between how regression vectors are obtained for CLS and ILS (Equation 14 versus Equation 15 or 16), B_CLS satisfies the orthogonality constraints while b_ILS,a and B_ILS do not. This conclusion is fully derived in Brown. 21 Thus, ILS processes strive to meet orthogonality, and the final model is one that is useful for prediction. This near-orthogonality has been graphically characterized. 25 It should be noted that in the hypothetical noise-free measurement situation, the ILS models do fulfill 31 Equations 6 and 9.
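A hedged sketch of the two training routes (Equations 13 to 15) on a simulated calibration set; the minimum-norm least squares solution stands in for a generic ILS pseudoinverse, and all design values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
M, P = 9, 200
x = np.linspace(0, 1, P)
S_true = np.column_stack([np.exp(-(x - 0.45)**2 / 0.005),
                          np.exp(-(x - 0.55)**2 / 0.010)])

# Calibration design: analyte 10..90%, interferent the complement (Equation 12)
ca = np.linspace(0.1, 0.9, M)
C = np.column_stack([ca, 1 - ca])
R = C @ S_true.T + 1e-4 * rng.standard_normal((M, P))

# CLS training (Equation 13) and CLS regression matrix (Equation 14)
S_hat = (np.linalg.pinv(C) @ R).T              # estimated pure spectra (P x N)
B_CLS = S_hat @ np.linalg.inv(S_hat.T @ S_hat)

# ILS training (Equation 15): a minimum-norm pseudoinverse solution via lstsq
b_ILS_a = np.linalg.lstsq(R, C[:, 0], rcond=None)[0]

# CLS meets S^T B = I exactly; ILS is only approximately orthogonal
print(S_hat.T @ B_CLS)        # identity
print(S_hat.T @ b_ILS_a)      # close to [1, 0], but not exactly
```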

| Selectivity-rCLS
The selectivity requirements in CLS can be relaxed by setting Equation 4 to

∂ĉ_k/∂c_j = l_kj    (17)

where l_kj is the relaxation parameter for the kth component on the jth component, with k and j each representing either the analyte or an interferent as the situation dictates and −∞ < l_kj < ∞. A relaxation parameter represents the change in the predicted analyte concentration per unit change in the respective interferent concentration and allows the analyte concentration prediction to be affected by the interferent (ie, introduces prediction bias). Better relaxation parameters are those that slightly increase bias in exchange for lower prediction variance. The matrix L containing these relaxation coefficients l_kj replaces the identity matrix I in Equation 9, resulting in new regression coefficients for relaxed models B_L expressed as

S^T B_L = L    (18)

Solving Equation 18 for the rCLS coefficients in B_L results in

B_L = (S^T)^+ L = B_CLS L    (19)

The B_L coefficients deviate from B_CLS by the amount defined by L. It is seen that rCLS is a generalization that converges to CLS when full selectivity is imposed by L = I and hence, B_L = B_CLS. Expanding Equation 19, the regression coefficients in rCLS are seen to be

b_L,a = l_aa b_CLS,a + ∑_i l_ia b_CLS,i    (20)

where b_L,a and b_L,i are the respective vectors of regression coefficients in B_L. Using b_L,a to predict the analyte concentration results in

ĉ_a = r^T b_L,a = c_a s_a^T b_L,a + ∑_i c_i s_i^T b_L,a + e^T b_L,a    (21)

Because of the CLS orthogonality constraints,

ĉ_a = c_a l_aa + ∑_i c_i l_ia + e^T b_L,a    (22)

Assuming l_aa = 1 in Equation 22, the bias is 0 when l_ia = 0 (ie, full model selectivity); otherwise, the c_i l_ia terms introduce bias in ĉ_a. As shown in the selectivity coefficient sections (Sections 3.2 and 3.3), this bias will be lower if the concentration of the interferent in the sample is low and/or if the instrument sensitivity for the interferent is also low. In turn, ĉ_a can benefit from a lower variance if the L_2 norm of b_L,a is smaller. As shown in Section 6, other combinations of l_aa and l_ia are possible, leading to other bias/variance trade-off situations for the relaxed B_L. Depending on the sample and the degree of relaxation from orthogonality, the selectivity-relaxed CLS model can have improved prediction performance compared to the CLS model. Values for l_kj must be optimized for the data set at hand, since the interferent spectral characteristics (interferent wavelength sensitivities and similarities with the analyte spectrum) and concentrations differ from one data set to another. Also, note that for a given analyte, only the corresponding column of L must be optimized. One approach to solve for L (or the l_a column) is to use a cross-validation strategy while systematically varying L. Depending on the number of components and hence values in L, this process may not be feasible, and an optimization algorithm such as simulated annealing or a genetic algorithm could be used. For both situations, the optimal L is the one that minimizes prediction error. Neither approach was studied in this paper; only trends in measures of model quality are reported as the relaxation parameters varied.
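The following sketch (simulated spectra; the l_ia = 0.05 value is arbitrary) forms an rCLS model by Equation 19 and shows the intended trade: a small interferent bias enters through Equation 22 in exchange for a smaller model vector norm.

```python
import numpy as np

rng = np.random.default_rng(2)
P = 200
x = np.linspace(0, 1, P)
S = np.column_stack([np.exp(-(x - 0.45)**2 / 0.005),
                     np.exp(-(x - 0.50)**2 / 0.010)])   # heavily overlapped pair
B_CLS = S @ np.linalg.inv(S.T @ S)

# Relaxation matrix L replacing I in Equation 9; l_ia = 0.05 permits a small
# interferent bias in the analyte prediction (Equations 17 and 18)
L = np.array([[1.00, 0.0],
              [0.05, 1.0]])
B_L = B_CLS @ L                       # Equation 19: B_L = B_CLS L
assert np.allclose(S.T @ B_L, L)

b_L_a = B_L[:, 0]
print(np.linalg.norm(B_CLS[:, 0]),    # smaller norm for b_L_a here:
      np.linalg.norm(b_L_a))          # lower prediction variance

# Equation 22: c_hat_a = c_a l_aa + c_i l_ia + e^T b_L_a
r = S @ np.array([0.30, 0.10]) + 1e-4 * rng.standard_normal(P)
print(r @ b_L_a)   # ~0.30 + 0.10*0.05 = 0.305 (small bias, reduced variance)
```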

| Analysis of rCLS as a basis change from CLS
To sustain Equation 12 (R = C S^T + E), the corresponding relaxed expression is

R = C_L S_L^T + E    (23)

where C_L = C L and S_L^T = L^{-1} S^T. By inserting the invertible matrix L, the solutions are not unique (ie, multiple solutions with the same fit E can be obtained). Algebraically, the measured sample spectra (R) can be expressed as their coordinates on a given vector basis. The column vectors in S represent N independent basis vectors that span the space of all possible mixtures of the N components where the linear model is valid. Hence, by setting the basis to the pure component spectra at unit concentration, the coordinates of each sample in R for this pure component basis are the concentrations of each component in the respective mixture spectra. These concentrations are contained in C and mathematically expressed by Equation 1.
The N column vectors in S_L also span a vector space for the spectra in R (apart from noise). The coordinates of the spectra R in this basis set are now given by C_L. Thus, L^{-1} is the transformation matrix from the S basis set to the S_L basis set. Depending on L (the degree of relaxation or transformation), different basis vectors can be formed and, correspondingly, different rCLS models in B_L. The components of S_L can be thought of as pseudo-components, where a pseudo-component is a basis vector made up of mixtures of pure components that serves the same purpose as the pure component basis vectors.
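A brief numerical check of the basis change in Equation 23 under an assumed pseudo-component mixing matrix L: the fit to R is unchanged.

```python
import numpy as np

x = np.linspace(0, 1, 200)
S = np.column_stack([np.exp(-(x - 0.45)**2 / 0.005),
                     np.exp(-(x - 0.50)**2 / 0.010)])
ca = np.linspace(0.1, 0.9, 9)
C = np.column_stack([ca, 1 - ca])
R = C @ S.T                                   # noise-free for clarity

L = np.array([[0.99, 0.0],
              [0.01, 1.0]])                   # illustrative pseudo-component mixing

C_L = C @ L                                   # transformed coordinates
S_L = S @ np.linalg.inv(L).T                  # S_L^T = L^{-1} S^T

# Equation 23: the decomposition of R is unchanged under the basis change
print(np.allclose(C_L @ S_L.T, R))            # True
```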
The creation of pseudo-components from pure components amounts to relaxing the selectivity constraints of the CLS model. The concentration estimate of the analyte in a sample by rCLS (ĉ_L,a) is an unbiased estimate of the amount of the pseudo-component s_L,a but a biased estimate of the amount of the pure component s_a. As an example, it is possible to tune the relaxation (transformation) parameters so that the pseudo-component is almost the same as the pure component spectrum (eg, a two-component situation with s_L,a = 0.99 s_a + 0.01 s_i). In this situation, the solution ĉ_L,a is a biased estimate of c_a due to a small contribution from the interferent.
The transformation of the basis vectors in S by L^{-1} is equally matched by transformation of the B_CLS basis vectors (the N column vectors) to the new basis set B_L with L being the transformation matrix. These basis vector changes are illustrated in Figure 3 (patterned after the general situation in Vainchtein 32). From Figure 3, it is recognized that S and B_CLS have the special property of being reciprocal basis vectors (S^T B_CLS = I), as do S_L and B_L (S_L^T B_L = I). Because of this reciprocal nature within CLS and rCLS, the respective calibrations and predictions can be expressed as covariant and contravariant components with additional special relationships. [32][33][34] Shown in Figure 4 is a geometric depiction (a rendition from the generic portrayal in Vainchtein 32) for a two-component CLS system (analyte and interferent) for a single spectrum from R. The sample spectrum r is not only represented as a linear combination of the basis vectors in S but also as a linear combination of the reciprocal basis vectors from B_CLS. The components for the linear combination of the S basis vectors are the respective concentrations. Using Equation 1 with only one interferent, the relationship is written as

r = c_a s_a + c_i s_i = v_a b_CLS,a + v_i b_CLS,i    (24)

where v_a and v_i represent the covariant components. These components are computed individually by v_a = r^T s_a and v_i = r^T s_i or simultaneously by v^T = r^T S. The vectors in S can be labeled the covariant vectors because when the dot product is taken between r and a pure component spectrum s, the corresponding covariant component is obtained.
A similar depiction to Figure 4 can be made for the transformed basis vectors S_L and B_L by substituting the respective transformed basis vectors for S and B_CLS. Equally, the transformed contravariant and covariant situation is represented by

r = c_L,a s_L,a + c_L,i s_L,i    (25)

r = v_L,a b_L,a + v_L,i b_L,i    (26)

From Figure 4, the relationship between the contravariant basis set and the NAS is clarified. Specifically, the contravariant component c_a can be found from either the component c_a ‖s_a‖ shown in Figure 4 or the projection of r onto the contravariant vector b_CLS,a. This projection is the NAS labeled r*, with the L_2 norm ‖r*‖ = c_a/‖b_CLS,a‖. Figure 4 also shows the covariant components, which are not as useful from an analytical chemist's perspective.
The ambiguity in the decomposition R = C_L S_L^T + E results from the lack of constraints on L in Equation 23. Thus, any invertible L is a solution, as demonstrated in Section 6, and many solutions (regression vectors) with no physical interpretation are possible that form effective predictions, [22][23][24] ie, the models just work. Similarly, without constraints, C_L and S_L are not easy to chemically interpret. Regardless, the pseudo-components, by definition, meet the CLS hard constraint of S_L^T B_L = I. Because ILS models are not reciprocal basis vectors with the pure component spectra in S (the ILS regression vectors are not unique 21,22), an ILS regression vector is not a true contravariant vector. Thus, a sample spectrum cannot be represented as a linear combination of all respective ILS regression vectors (one for the analyte and one for each response-forming interferent, B_ILS) and, simultaneously, a linear combination of S. This is also true for the basis vectors transformed by L^{-1} and L. However, it should be noted that while the reciprocal basis set description (the covariant, contravariant, and orthogonality strictness) does not hold for ILS, many of the computations (NAS, net sensitivity vector) are still relevant. These points are further elaborated in the next section.

| Selectivity-rILS
By expressing ILS in a relaxed format (rILS), insight can be gained on how close an ILS solution comes to satisfying the CLS selectivity constraints. Even though all sample components are not known when using inverse models, it can still be assumed that the Beer-Lambert law (Equations 1 and 12) characterizes the underlying model generating the measured spectra.
Using Equation 2, the fitted concentration for a sample using an ILS calibration is

ĉ_a = r^T b_ILS,a = c_a s_a^T b_ILS,a + ∑_i c_i s_i^T b_ILS,a + e^T b_ILS,a    (27)

Applying the relaxed parameter definition to Equation 27 results in

l_aa = ∂ĉ_a/∂c_a = s_a^T b_ILS,a  and  l_ia = ∂ĉ_a/∂c_i = s_i^T b_ILS,a    (28)

where the derivative in the second equation is taken for every interferent i. As already noted, ILS models implicitly relax the orthogonality constraint to maintain a bias/variance trade-off relative to the respective minimization criteria, 16 ie, l_aa ≠ 1 and l_ia ≠ 0. Variations of s_a^T b_ILS,a and s_i^T b_ILS,a (now termed l_aa and l_ia) as ILS tuning parameters vary have been graphically characterized. 25 Specifically, these terms were used to characterize the bias/variance trade-off in selecting tuning parameters.
An rCLS model is formed from a linear combination of all analyte and interferent CLS models. Similarly, an rILS model is formed from a linear combination of all ILS models. This combination to form the rILS models B_Q is expressed as

B_Q = B_ILS Q    (29)

where Q denotes a matrix of coefficients forming linear combinations of B_ILS. Note that Q is different from the L used in rCLS. With CLS, there is an inverse relationship between S^T and B_CLS and hence, the L matrix used in Equation 19 to form the relaxed models B_L as linear combinations of B_CLS is the same L matrix obtained in Equation 18 for the inner products of the pure component spectra with B_L (Figure 3 is a diagram of the inverse basis relationship). With ILS, the inverse relationship between S^T and B_ILS does not exist. For example, given the generic situation of Equation 9 (S^T B = I), the only B satisfying this equality is B_CLS. As with rCLS, the degree of deviation of L from I for the rILS models in B_Q depends on the coefficients in L shown by

L = S^T B_Q    (30)

relative to prediction by ĉ^T = c^T S^T B_Q + e^T B_Q, similar to Equation 7 for CLS. The more L for a particular modeling process deviates from I, the greater the deviation from an orthogonally constrained model. When Q is the identity matrix, the rILS models are the ILS models and L holds the relaxation coefficients expressing the amount of orthogonality obtained for the ILS models. This statement agrees with previous work where the matrix W = S^T B_ILS − I is defined 16 and can be rearranged to W = L − I. Note that this second equation is true for both rCLS and rILS. The W can be interpreted as a measure of departure from CLS. Prediction of the analyte in a sample by a particular rILS model is expressed by

ĉ_a = r^T b_Q,a = c_a l_aa + ∑_i c_i l_ia + e^T b_Q,a    (31)

The more of S that is known, the better the assessment of the interferent effects defined by L for a particular rILS model. Interpretation of the respective rILS l_aa and l_ia values is the same as for rCLS; l_ia values denote the change in the predicted analyte concentration per unit change in the respective interferent concentration relative to the model used for prediction. This aspect is further commented on in the Selectivity section. Values in L are not as simple for rILS as for rCLS. This complexity can be seen by expanding Equation 29 for a simple situation with an analyte and one interferent to

b_Q,a = q_aa b_ILS,a + q_ia b_ILS,i    (32)

where b_Q,a and b_Q,i are the vectors of regression coefficients in B_Q. Using b_Q,a expressed in Equation 32 to predict the analyte concentration results in

ĉ_a = c_a (q_aa s_a^T b_ILS,a + q_ia s_a^T b_ILS,i) + c_i (q_aa s_i^T b_ILS,a + q_ia s_i^T b_ILS,i) + e^T b_Q,a    (33)

From comparing Equation 33 to Equation 31, the l_aa and l_ia values are written as

l_aa = q_aa s_a^T b_ILS,a + q_ia s_a^T b_ILS,i
l_ia = q_aa s_i^T b_ILS,a + q_ia s_i^T b_ILS,i    (34)

Equation 34 shows that the final l_aa and l_ia values depend on the relationship between the base ILS models being used and the amount (or weight) of the base ILS models combined to form the final analyte model vector b_Q,a. If Q = I, then l_aa = s_a^T b_ILS,a and l_ia = s_i^T b_ILS,a. As with rCLS, a systematic variation of Q or an optimization algorithm can be used to determine appropriate Q values (or the q_a column) for rILS. However, because all the ILS models are rarely known for the analyte and each interferent, the practicality of using rILS is nearly nonexistent. It is interesting to note that Equation 29 can be thought of as a more general regression strategy based on stacked regression [35][36][37] where a collection of analyte regression models is used. For example, using a set of analyte models for B, a simple stacked regression gives ĉ_a = R B q_a = Ĉ q_a. Least squares can be used to solve for q_a.
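A minimal stacked-regression sketch of this idea: a few analyte models (here, illustrative minimum-norm solutions on wavelength subsets, not from the original work) are combined by solving for q_a with least squares.

```python
import numpy as np

rng = np.random.default_rng(3)
M, P = 9, 200
x = np.linspace(0, 1, P)
S = np.column_stack([np.exp(-(x - 0.45)**2 / 0.005),
                     np.exp(-(x - 0.50)**2 / 0.010)])
ca = np.linspace(0.1, 0.9, M)
C = np.column_stack([ca, 1 - ca])
R = C @ S.T + 1e-4 * rng.standard_normal((M, P))

# A small collection of analyte models (columns of B): minimum-norm ILS
# solutions computed on two wavelength subsets -- purely illustrative choices
b1 = np.zeros(P); b2 = np.zeros(P)
b1[:120] = np.linalg.lstsq(R[:, :120], ca, rcond=None)[0]
b2[80:] = np.linalg.lstsq(R[:, 80:], ca, rcond=None)[0]
B = np.column_stack([b1, b2])

# Stacked regression: c_hat_a = R B q_a = C_hat q_a; solve for q_a by least squares
C_hat = R @ B
q_a = np.linalg.lstsq(C_hat, ca, rcond=None)[0]
b_Q_a = B @ q_a                       # combined analyte model vector
print(R @ b_Q_a)                      # fitted analyte concentrations ~ ca
```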

| SELECTIVITY
While selectivity is the focus in this section, analyte and interferent sensitivities are key figures of merit 20,38 that selectivity depends on. In particular, the accuracy of analytical determinations can be adversely affected by other species present in the sample and/or variations of analytical conditions (eg, changes in temperature in NIR spectra) that alter the instrument response for the sample. Sensitivity and selectivity are 2 figures of merit balanced in forming calibration models. 18,25 Approaches to measure and quantify sensitivity and selectivity have been provided (previous works [18][19][20] and references therein). Regarding selectivity, Ridder and Brown 17 state "a method is fully selective against a particular interferent if the result it renders is independent of the concentration of the interferent. If a method is not fully selective, then the results it produces will depend on the concentration of the interferent." In this work, a selectivity definition is proposed based on rCLS and is applicable to the usual multivariate CLS and ILS calibration models and reduces to the standard univariate definition. Because sensitivity is key to defining selectivity, a brief overview of univariate sensitivity is given next.

| Sensitivity in univariate calibration
Sensors are designed to respond to the amount of a component of interest (the "analyte" a) in a sample. The "sensitivity" of the sensor for analyte a, s_a, is defined 39 as the change in the measured response r per unit change of the concentration of a (c_a), expressed as

dr/dc_a = s_a    (35)

In univariate CLS calibration with narrow concentration ranges, the relationship between response and concentration should be describable by a straight line. In this situation, the sensitivity is constant over the concentration range and is the slope of the regression line. 39 If the relationship is not linear, then the sensitivity changes depending on the value of c_a. The sensitivity can also depend on the sample matrix, in which case the standard addition method is used. In addition to the analyte, the sensor can also respond to the presence of other components in the sample (interferents) and hence, the measured response is a mix of responses generated by the analyte and interferents. The degree of contribution from an interferent signal depends on how sensitive the sensor is to the interferent (s_i). This sensitivity can be obtained by measuring samples with increasing known amounts of the interferent when available.

| Selectivity and the selectivity coefficient in univariate CLS
In univariate calibration, the (background corrected) measured response is assumed to follow the linear model

r = c_a s_a + e    (36)

with prediction by

ĉ_a = r b_a    (37)

where b_a = 1/s_a is defined for clarity. Imposing the condition in Equation 4 on Equation 37 gives

∂ĉ_a/∂c_a = s_a b_a = 1    (38)

If unexpected interferents contribute to the measured response of the unknown sample, Equation 36 is now written as

r = c_a s_a + ∑_i c_i s_i + e    (39)

and prediction with Equation 37 is given by

ĉ_a = c_a + ∑_i c_i s_i b_a + e b_a    (40)

A nonzero sensitivity of the sensor for interferent i (s_i ≠ 0) introduces a bias in the prediction ĉ_a in the amount c_i s_i b_a. This bias depends on the amount of the interferent i (c_i) and on the sensitivities for the analyte (b_a = 1/s_a) and for the interferent (s_i). The absolute sensitivity of the sensor for an interferent is not as important as the sensor sensitivity for the interferent relative to the analyte sensitivity. In this sense, "selectivity" for an analyte means lack of sensitivity (low enough sensitivity) for a potential interferent compared to that for the analyte. Accordingly, the univariate "selectivity coefficient" for analyte a in the presence of interferent i is defined as

K_a,i = s_i/s_a    (41)

with −∞ < K_a,i < ∞. In univariate calibration with spectroscopic data, the sensitivities in Equation 41 are usually positive, but for other systems of analysis, negative values are possible. The selectivity coefficient is used to quantitate the model sensitivity toward the interferent relative to the analyte model sensitivity. 18,26,27 With the selectivity coefficient now defined, dividing Equation 39 by s_a and Equation 40 by s_a b_a with rearrangement gives, respectively,

r/s_a = c_a + ∑_i c_i K_a,i + e/s_a  and  ĉ_a = c_a + ∑_i c_i K_a,i + e/s_a    (42)

The analyte prediction error is obtained by subtracting the true analyte concentration from Equation 42, producing

ĉ_a − c_a = ∑_i c_i K_a,i + e/s_a    (43)

and expressed as a relative error by

(ĉ_a − c_a)/c_a = ∑_i (c_i/c_a) K_a,i + e/(c_a s_a)    (44)

Equations 43 and 44 show the crucial significance of selectivity coefficients for quantitating the effect each interferent has on a particular analysis. For example, K_a,i = 0.001 means that the sensor is 1000 times more sensitive to the analyte than to the interferent. Also shown in Equations 42 and 43 is the direct effect the analyte sensitivity has on the prediction variation due to the random error e: the greater the analyte sensitivity, the smaller the effect. Selectivity, like sensitivity, is continuous, and when it is stated that "a sensor is selective for the analyte of interest," what is actually meant is that the sensor is selective enough that errors caused by the sensitivity to the interferent become irrelevant. A sensor can be checked for a number of potential interferents followed by calculating the selectivity coefficient K_a,i for each interferent. The greater the number of low K_a,i values, the more the sensor can be considered globally selective for the analyte.
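A small numeric illustration with hypothetical sensitivities and concentrations:

```python
# Hypothetical univariate sensitivities: K_a,i = s_i / s_a (Equation 41)
s_a, s_i = 2.0, 0.002        # sensor sensitivity for analyte and interferent
K_ai = s_i / s_a             # 0.001: sensor is 1000x more sensitive to the analyte
c_a, c_i = 5.0, 50.0         # hypothetical sample concentrations
bias = c_i * K_ai            # interferent contribution to c_hat_a (Equation 43)
relative_error = bias / c_a  # Equation 44, noise term ignored
print(K_ai, bias, relative_error)   # 0.001, 0.05, 0.01
```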
Applying notation previously developed for selectivity assessment in multivariate ILS 16 to the univariate situation and starting with Equation 40, the prediction error is written as

ĉ_a − c_a = c_a w_a + ∑_i c_i w_i + e b_a    (45)

where w_a = s_a b_a − 1 and w_i = s_i b_a, and because in this univariate situation b_a = 1/s_a, then w_i = K_a,i. The corresponding univariate selectivity indicators developed for multivariate ILS 16 (presented in the next section) are given by Equations 46 and 47 for the analyte and each interferent, respectively. The interferent prediction problem shown in Equations 42 to 44 does not exist with multivariate CLS. In this situation, the regression vector cancels the effect of the interferent (s_i^T b_a = 0) to fulfill the orthogonality constraint.
3.3 | Selectivity, sensitivity, and the selectivity coefficient in multivariate rCLS and ILS

| Selectivity coefficient
For congruence with univariate calibration, a definition for the selectivity coefficient in multivariate calibration should consider how nonmodeled interferents affect the prediction from the multivariate calibration model. Moreover, the selectivity coefficient should also be defined for modeled interferents since they also affect the model's prediction performance by decreasing the NAS and increasing the prediction variance. Applying Equation 41 for the univariate situation to the multivariate case gives the selectivity coefficient as

K_a,i = s_i^T b_a / s_a^T b_a = l_ia/l_aa    (48)

where the generic model vector b_a notation indicates any analyte model vector via CLS, rCLS, ILS, rILS, or another method and −∞ < K_a,i < ∞, with values approaching zero denoting an optimal selectivity situation as with the univariate definition. Using Equation 48, the prediction in Equation 22 can be written as

ĉ_a = l_aa (c_a + ∑_i c_i K_a,i) + e^T b_a    (49)

which is the multivariate representation of the univariate situation in Equation 42. If a model is accurately predicting the analyte, the model is doing a good job of making the contributions sum to the analyte concentration in Equation 49. Specifically, while regression vectors can have uniquely different shapes and magnitudes, the analyte concentration predictions are similar. [21][22][23][24] The corresponding multivariate prediction error equations to the univariate Equations 43 and 44 are, respectively,

ĉ_a − c_a = c_a (l_aa − 1) + ∑_i c_i l_ia + e^T b_a    (50)

ĉ_a − c_a = c_a (l_aa − 1) + l_aa ∑_i c_i K_a,i + e^T b_a    (51)

(ĉ_a − c_a)/c_a = (l_aa − 1) + l_aa ∑_i (c_i/c_a) K_a,i + (e^T b_a)/c_a    (52)

With selectivity coefficients, Equations 50 to 52 show the effect an interferent will have on the analyte prediction (or prediction error) for a sample. Different from the univariate situation, the prediction does depend on the relaxation parameter for the analyte (l_aa), where for the univariate case, l_aa = 1.
With respect to the approach taken in Brown and Ridder, 16 Equation 47 for univariate calibration becomes, for multivariate calibration,

ĉ_a − c_a = c_a w_a + ∑_i c_i w_i + e^T b_a    (53)

Unlike the univariate situation, w_i ≠ K_a,i for multivariate calibration, and Equation 53 written to include the selectivity coefficient is

ĉ_a − c_a = c_a w_a + l_aa ∑_i c_i K_a,i + e^T b_a    (54)

The selectivity coefficient for a particular interferent and model can also be represented by the ratio of the L_2 norms of the net sensitivity interferent (s*_i) and analyte (s*_a) vectors, which are the projections of the respective pure component spectra onto the model vector. With this notation, the selectivity coefficient is written as

K_a,i = ‖s*_i‖/‖s*_a‖    (55)
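A sketch computing the selectivity coefficient both ways (Equations 48 and 55) for a relaxed model; spectra and the relaxation values are simulated and illustrative.

```python
import numpy as np

x = np.linspace(0, 1, 200)
S = np.column_stack([np.exp(-(x - 0.45)**2 / 0.005),
                     np.exp(-(x - 0.50)**2 / 0.010)])
B_CLS = S @ np.linalg.inv(S.T @ S)
b_L_a = B_CLS @ np.array([1.0, 0.05])     # relaxed analyte model (l_aa=1, l_ia=0.05)

# Equation 48: K_a,i = s_i^T b_a / s_a^T b_a = l_ia / l_aa
K_ai = (S[:, 1] @ b_L_a) / (S[:, 0] @ b_L_a)
print(K_ai)                                # ~0.05

# Equivalent norm form (Equation 55) using projections of s_k onto b_a
proj = lambda s, b: (s @ b) / (b @ b) * b
print(np.linalg.norm(proj(S[:, 1], b_L_a)) /
      np.linalg.norm(proj(S[:, 0], b_L_a)))   # same magnitude, sign lost
```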

| Selectivity and sensitivity
The CLS component-wise net sensitivity vector presented and discussed with Equation 11 can be modified for rCLS and others as

s*_k = (s_k^T b_a/‖b_a‖^2) b_a = (l_ka/‖b_a‖^2) b_a    (56)

where again the b_a notation indicates any analyte model vector via CLS, rCLS, ILS, rILS, or another method. The scalar selectivity is

SEL_k = l_ka/(‖s_k‖ ‖b_a‖)    (57)

with the corresponding sensitivity written as

SEN_k = l_ka/‖b_a‖    (58)

To avoid negative SEL and SEN values for the analyte and interferent when a negative value for a relaxation parameter is obtained, absolute values can be used.
In many calibration situations, it is unlikely that S or the full C is known, and ILS is commonly used in these situations. To characterize the model analyte sensitivity, the best approximation is probably the usual SEN_a = 1/‖b_a‖.
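Under the reconstructed forms of Equations 57 and 58 assumed above, the component-wise measures can be computed as follows (simulated spectra, illustrative relaxation values):

```python
import numpy as np

x = np.linspace(0, 1, 200)
S = np.column_stack([np.exp(-(x - 0.45)**2 / 0.005),
                     np.exp(-(x - 0.50)**2 / 0.010)])
B_CLS = S @ np.linalg.inv(S.T @ S)
b_a = B_CLS @ np.array([1.0, 0.05])          # relaxed analyte model

for k, name in [(0, "analyte"), (1, "interferent")]:
    l_ka = S[:, k] @ b_a                     # relaxation parameter l_ka
    SEN_k = l_ka / np.linalg.norm(b_a)       # Equation 58 (assumed form)
    SEL_k = SEN_k / np.linalg.norm(S[:, k])  # Equation 57; |SEL| lies in [0, 1]
    print(name, l_ka, SEN_k, SEL_k)

print(1 / np.linalg.norm(b_a))               # the usual SEN_a approximation for ILS
```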

| MEAN SQUARE ERROR OF PREDICTION
Using the approach set out in other works, 16,21 the expected mean square error of prediction (<MSEP>) for an arbitrary b_a model is expressed as

<MSEP> = w^T Ψ w + b_a^T Σ b_a    (59)

where Ψ denotes the expected concentration covariance for the population of samples being considered, Σ symbolizes the expected measurement error covariance, and w is the (N × 1) vector with analyte value w_a = l_aa − 1 and respective values for each interferent w_i = l_ia. The first term in Equation 59 is termed the bias contribution, and the second term is the variance contribution. With this derivation of <MSEP>, the influence of the relaxation parameters on the bias contribution is directly obtainable. The influence of the relaxation parameters on the variance for rCLS can be distinguished from Equation 59 by replacing the generic b_a with b_L,a calculated from Equation 19. The bias term in Equation 59 depends on the selectivity and sensitivity of the calibration model and features of the population samples being predicted, eg, relative amounts of the interferent compared to the analyte. However, in the CLS case with full orthogonality, w is a zero vector, removing the bias term. Calculated in this paper is the MSEP for the analyte based on observed values for Ψ = C^T C/M and w (from l_a).
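A sketch of Equation 59 with the observed Ψ and the assumed Σ = σ²I; the relaxation column l_a below is an arbitrary trial value, not an optimized one.

```python
import numpy as np

M, P = 9, 200
x = np.linspace(0, 1, P)
S = np.column_stack([np.exp(-(x - 0.45)**2 / 0.005),
                     np.exp(-(x - 0.50)**2 / 0.010)])
ca = np.linspace(0.1, 0.9, M)
C = np.column_stack([ca, 1 - ca])

B_CLS = S @ np.linalg.inv(S.T @ S)
l_a = np.array([0.98, 0.03])               # trial relaxation column for the analyte
b_a = B_CLS @ l_a

# Equation 59: <MSEP> = w^T Psi w + b_a^T Sigma b_a, with Sigma = sigma^2 I
w = l_a - np.array([1.0, 0.0])             # w_a = l_aa - 1, w_i = l_ia
Psi = C.T @ C / M                          # observed concentration covariance
sigma2 = 1.0                               # the paper's simplifying assumption
msep = w @ Psi @ w + sigma2 * (b_a @ b_a)  # bias term + variance term
print(msep)
```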

| Algorithms and data preprocessing
All algorithms were written by the authors using MATLAB 8.1 (The MathWorks, Natick, Massachusetts). Calibration samples are mean-centered relative to the calibration set. New samples are mean-centered to the calibration set mean prior to prediction. For simplicity, Σ = σ²I and σ² = 1 are assumed when computing the MSEP.
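A minimal sketch of the described mean-centering convention (random data, illustrative only; not the authors' MATLAB code):

```python
import numpy as np

rng = np.random.default_rng(5)
R = rng.random((9, 700))                 # calibration spectra (9 mixtures)
c_a = np.linspace(0.1, 0.9, 9)           # analyte concentrations

# Center the calibration set to its own means
r_mean, c_mean = R.mean(axis=0), c_a.mean()
Rc, cc = R - r_mean, c_a - c_mean
b_a = np.linalg.lstsq(Rc, cc, rcond=None)[0]

# Center a new sample to the calibration means before prediction
r_new = rng.random(700)
c_hat = (r_new - r_mean) @ b_a + c_mean
print(c_hat)
```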

| Near infrared data
In addition to pure component methanol and water samples, 9 mixtures of methanol and water were prepared with methanol concentrations in increments of 10% (by volume) ending at 90%. 40 The water concentrations were such that the summed sample concentrations are 100%. Spectra of the 9 mixtures plus pure water and methanol were measured in a 0.5-mm flow cell using an NIRSystems model 6500 NIR spectrophotometer. Spectra were recorded from 1100 to 2500 nm in 2-nm increments for 700 points per spectrum. All samples were measured over 1 hour. The spectra are plotted in Figure 6. Methanol is the analyte and, due to the sample set size, leave-one-out cross-validation (LOOCV) was used for CLS and rCLS modeling. For all studies, only the 9 mixtures were used. As noted previously, estimated matrix-affected pure component spectra were calculated from Equation 13 using the 9 mixtures.

| RESULTS AND DISCUSSION
To graphically assess the trade-offs among different measures of model quality (bias, variance, MSEP, K_a,i, sample-wise prediction errors, etc) for rCLS, a simple two-component system (an analyte and an interferent) was studied. In this case, only 2 relaxation parameters are needed (l_aa and l_ia), and observations from images of model quality measures are possible as the 2 relaxation parameters vary across a range of values. See Section 5 for the ranges. With more than 2 components, such images are not possible.

| Prediction error, bias, and variance
Shown in Figure 7 are the mean values from LOOCV for the root mean square error of calibration (RMSEC), RMSE of cross-validation (RMSECV), MSEP, bias, and the model vector L_2 norms (‖b_L,a‖). The RMSEC, RMSECV, and bias images indicate that the minimum in Figure 7 is at an rCLS model with low l_aa and l_ia values. It should be noted that the mean MSEP images over the LOOCV are the same for the calibration and validation sets and hence, only one is shown in Figure 7. This statement is also true for the respective mean bias images. However, while the individual sample-wise RMSEC and ‖b_L,a‖ images are essentially the same across the LOOCV, the sample-wise RMSECV, bias, and MSEP images do vary depending on the sample left out. Inspection of the sample-wise RMSECV and bias images shown, respectively, in Figures 8 and 9 reveals that the RMSECV and bias values track similarly as the relaxation parameter values vary. Because of this similarity, only 3 of the 9 bias images are represented. Also noteworthy is that, for a given sample, many models will predict equivalently, agreeing with previous work. 22,24 For the situations presented, the b_L,a models in the deep blue zones primarily differ by their respective L_2 norm values (magnitude), not by changes in shape (direction). Figures 8 and 9 show that when the interferent to analyte ratio is low, acceptable rCLS predictions are possible when the l_aa values are nearly 1, regardless of the value of l_ia. Conversely, when the interferent to analyte ratio is high, the l_aa values can range while the value of l_ia must be close to 0 for an acceptable prediction error.
From Figure 8, it is observed for all samples that the orthogonality constraints can be relaxed such that increases in prediction error are obtained from adding bias to the prediction, relative to the CLS prediction at l_aa = 1 and l_ia = 0, with a variance reduction (the ‖b_L,a‖ image in Figure 7). Most notable, however, is that it is also possible to achieve an equivalent prediction error with a reduction in the variance. In this case, l_ia increases in value relative to 0 with a corresponding decrease in the l_aa value from 1. Conversely, equivalent prediction errors are possible with increases in variance (nondesirable solutions) where the l_ia values become more negative with corresponding increases in the l_aa value above 1.
The images in Figures 9 and 7 reveal that the sample-wise bias/variance trade-off has several potential combinations depending on the interferent to analyte concentration ratio. Possible are an increased bias for a decrease in variance (a goal of rCLS, provided the bias increase is not significant), a bias increase for an increase in variance (undesirable models), and, lastly, apparently no change in bias accompanied by a decrease in variance (a desirable model).
Displayed in Figure 10 are the sample-wise MSEP images. The trends are like those seen in Figure 8 for the RMSECV images with the exception that each rCLS model is unique relative to the MSEP. For all samples, the orthogonality constraints can be relaxed to obtain reduced MSEP values, specifically by reducing the l_aa values from 1 while increasing the l_ia values from 0. Thus, while each sample can be predicted with low errors and bias by several respective rCLS models, only one sample-wise rCLS model has the smallest MSEP. Because of the sample-wise variations, 2 simulated calibration ranges were made for the validation sample with the greatest interferent to analyte concentration ratio (the interferent at 90% and the analyte at 10%). Shown on the left in Figure 11 is one calibration situation where the 8 samples in C have the analyte concentration equally incremented from 1% to 50% instead of from 20% to 90% as in Figures 9 and 10 to more closely span the analyte concentration of this particular validation sample. As with the full calibration range, the interferent concentration varies to maintain a concentration sum of 100% in the 8 simulated calibration samples. The bias and MSEP images are obtained using Equation 59 with the same b_L,a and the same l_aa and l_ia ranges used with the full calibration range. The only change is the reduced analyte and interferent C ranges and hence, a concentration covariance matrix more closely centered around the validation sample. The second simulated calibration case is shown on the right of Figure 11 where the analyte concentration range is further reduced to even more closely span the analyte concentration in the same validation sample. In this case, the analyte concentration varies from 1% to 11% in equal increments and the interferent concentration, respectively, varies to maintain the concentration sum at 100% in the 8 calibration samples. In comparing the bias and MSEP images in Figures 9 and 10 for the full calibration range to those in Figure 11, it is observed that as the calibration concentration range decreases to more closely bracket the validation sample situation, the calibration bias and MSEP images converge to those of the specific validation sample. Using local calibration ranges for specific samples allows for calibration samples better matrix matched to the specific samples, thereby leveraging the advantages of rCLS. Using the mean MSEP of the local calibration set then allows proper selection of the relaxation terms for the validation sample where the MSEP values are small. Thus, the more a calibration set is descriptive of a new sample for quantitation, the more likely the selection of the best l_a by the MSEP for the calibration set will match the optimal l_a for the new sample.

| Sensitivity and selectivity
By using the relaxed sensitivity measures in Equation 58, the trade-off between maximizing the model sensitivity toward the analyte and minimizing the model sensitivity to the interferent can be assessed. This trade-off is demonstrated in Figure 12, where it is also observed from the top image that the traditional sensitivity measure does not provide this trade-off information. In conjunction with the prediction errors in Figure 8 and the MSEP values in Figure 10, rCLS models associated with greater sensitivity have lower prediction errors and MSEP values. It is also seen that if the sensitivity is low for the analyte, predictions of new samples with acceptable prediction error are still possible. However, if the interferent to analyte ratio is high, then the MSEP can be poor compared to when the interferent to analyte concentration ratio is low. Because the mean sensitivity values (absolute for SEN_i due to the allowed negative values for l_ia) presented in Figure 12 are essentially the same as the sample-wise images, only the mean image is shown.

FIGURE 12 Mean sensitivity images relative to the analyte methanol relaxed classical least squares (rCLS) calibration models across the leave-one-out cross-validation (LOOCV) for the general measure 1/‖b_L,a‖ (shown is log(1/‖b_L,a‖)) and the relaxed version (Equation 58) relative to respective relaxation parameter values (absolute for SEN_i due to the allowed negative values for l_ia)
When the relaxed selectivity measures defined in Equation 57 are used, the corresponding images in Figure 13 (absolute for SEL_i due to the allowed negative values for l_ia) are similar to the respective sensitivity trends shown in Figure 12 except that the selectivity values range between 0 and 1. Interpretations similar to those made for the RMSECV, MSEP, and sensitivity trends can be stated for the selectivity values. Conversely, the selectivity measures defined by Equations 46 and 47 are shown in the bottom row of Figure 13, and the trends with the RMSECV and MSEP are not obvious. These selectivity definitions depend only on the respective relaxation parameters (deviation from orthogonality). The selectivity measures proposed here depend not only on the relaxation parameter values; the selectivity values are also dictated by the sensitivity for the analyte relative to the model. Depending on the interferent to analyte concentration ratio, the selectivity may be low for the analyte, but predictions of samples with acceptable prediction errors are still possible. This trade-off is well characterized by the selectivity coefficient proposed in Equations 48 and 55 and imaged in Figure 13. As with the sensitivity images in Figure 12, the sample-wise selectivity values shown in Figure 13 are essentially the same as those for the mean images.
The selectivity coefficient values displayed in Figure 13, in conjunction with the RMSECV images in Figure 8, show that if the interferent concentration is large relative to the analyte concentration, acceptable predictions are possible when the selectivity coefficient is low. It is also possible to have low prediction errors where the selectivity coefficients are large. These low prediction errors can occur because the l_aa values are small, thereby compensating for the large interferent prediction bias. Again, the sample-wise selectivity coefficient images are essentially the same as the mean image in Figure 13, and only the mean image is shown.

| CONCLUSION
With rCLS, a family of CLS models can be formed. Some rCLS models allow interferent concentrations to enter the prediction, increasing bias in exchange for an improvement in variance. Other combinations of relaxation parameters are possible that vary the degree of the bias/variance trade-off and hence the MSEP; ie, numerous models can be formed that predict well, but only one has the lowest MSEP. It was shown that the more local the calibration set (concentration ranges closely bracketing the analyte and interferent concentrations), the greater the opportunity for improvement of rCLS over CLS. An rILS strategy was presented to form a family of ILS models. However, for the main reason ILS is used (all concentrations of spectrally responding species are not known), this approach is not practical. Regardless, a selectivity coefficient was defined that is applicable to univariate and multivariate calibration. With such a coefficient, it is possible to assess the predictability of a sample given knowledge of the expected analyte and interferent concentrations.
If the goal is to obtain a model that predicts well and targets the CLS orthogonality constraints, then a penalty regression approach can assist. Specifically, the regression vector satisfying the minimization expression

min ‖R b_a − c_a‖^2 + λ^2 ‖b_a‖^2 + η^2 ‖S^T b_a − i‖^2    (60)

is sought, where the first 2 terms are the usual RR terms and the third term is an extra orthogonality constraint weighted by the η tuning parameter, with i denoting a column vector of zeros with a 1 at the row position for the pure component analyte spectrum in S. Alternatively, the i in Equation 60 could be a vector of target relaxation parameter values. If S is not known, then non-analyte samples can be used instead, giving

min ‖R b_a − c_a‖^2 + λ^2 ‖b_a‖^2 + η^2 ‖N b_a‖^2    (61)

where N is a collection of spectra of samples without the analyte. 41 For Equations 60 and 61, sought from the regression is b_a = b_⊥ + b_i, where b_⊥ denotes the part of b_a orthogonal to S and/or N and b_i is the contribution from the interferents. 41 In both Equations 60 and 61, the MSEP is not directly minimized, but with an MSEP penalty included, the solution is no longer directly computable. As demonstrated in Section 6, the more closely the calibration set mimics a particular sample regarding the full sample matrix, the more likely the solutions of Equations 60 or 61 will correspond to a minimum MSEP.
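Equation 61 can be solved as an augmented least squares problem by stacking the penalty rows onto R; a sketch under simulated data with arbitrary λ and η follows (the analyte-free spectra are scaled copies of the interferent spectrum, purely for illustration).

```python
import numpy as np

rng = np.random.default_rng(6)
M, P = 9, 200
x = np.linspace(0, 1, P)
S = np.column_stack([np.exp(-(x - 0.45)**2 / 0.005),
                     np.exp(-(x - 0.50)**2 / 0.010)])
ca = np.linspace(0.1, 0.9, M)
C = np.column_stack([ca, 1 - ca])
R = C @ S.T + 1e-4 * rng.standard_normal((M, P))
N_spec = S[:, [1]].T * np.linspace(0.5, 1.5, 3)[:, None]   # analyte-free spectra

# Equation 61 as an augmented system:
# minimize ||R b - c_a||^2 + lam^2 ||b||^2 + eta^2 ||N b||^2
lam, eta = 0.01, 10.0
A = np.vstack([R, lam * np.eye(P), eta * N_spec])
y = np.concatenate([ca, np.zeros(P), np.zeros(N_spec.shape[0])])
b_a = np.linalg.lstsq(A, y, rcond=None)[0]

print(S[:, 0] @ b_a, S[:, 1] @ b_a)    # l_aa near 1, l_ia pushed toward 0
```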