A Liu estimator for the beta regression model and its application to chemical data

Beta regression has become a popular tool for performing regression analysis on chemical, environmental, or biological data in which the dependent variable is restricted to the standard unit interval. For the first time, in this paper, we propose a Liu estimator for the beta regression model with a fixed dispersion parameter that may be used in several realistic situations where the degree of correlation among the regressors differs. First, we show analytically that the new estimator outperforms the maximum likelihood estimator (MLE) using the mean square error (MSE) criterion. Second, using a simulation study, we investigate the finite sample properties of six different suggested estimators of the shrinkage parameter and compare them with the MLE. The simulation results indicate that in the presence of multicollinearity, the Liu estimator uniformly outperforms the MLE. Finally, using an empirical application to chemical data, we show the benefit of the new approach to applied researchers.


| INTRODUCTION
In many empirical contexts in which the dependent variable is restricted to the interval [0, 1], such as rates and proportions, the classical solution has been to transform the dependent variable so that it is mapped from the interval [0, 1] onto the real line (R). In this situation (which is common when analyzing chemical, environmental, or biological data), the logistic regression model is considered, where the log-odds is used as the dependent variable in a linear multiple regression model. Another common transformation is the inverse of a suitable cumulative distribution function, which also maps the dependent variable Y ∈ [0, 1] onto the real line; a well-known example of the latter is the probit model. But these kinds of solutions have several demerits; for example, the model parameters cannot easily be interpreted in terms of the original dependent variable. Another demerit is that proportion measures typically show asymmetry, and hence any inference based on the normality assumption can be deceptive, especially in small samples.
As a remedy to the aforementioned problems, Ferrari and Cribari-Neto 1 proposed a beta regression model for a continuous dependent variable (Y) with support in the open interval ]0, 1[. This model is based on the assumption that the dependent variable follows the beta distribution. One of its advantages is that the regression parameters are interpretable in terms of E[Y], and further, the model can also accommodate asymmetries and heteroskedasticity. Later developments have expanded the model so that it is also possible to include covariates for modeling dispersion. 2 While the performance of the maximum likelihood estimator (MLE) for the model parameters in a beta regression model has been investigated under orthogonal regressors, it is of great interest to investigate the performance of the MLE under conditions where the regressors are non-orthogonal, and even more interesting to investigate the performance of the Liu estimator, 3 as an alternative to the MLE, for situations with orthogonal and non-orthogonal covariates. Shrinkage estimators for the linear regression model were first considered by Hoerl and Kennard, 4 who introduced the classical ridge estimator. This method was further developed and applied to chemical data by Vigneau et al, 5 Muniz and Kibria, 6 and Algamal, 7 among others. A major development was the introduction of the Liu estimator, 3 which has an advantage over the ridge estimator since the shrinkage is a linear function of the shrinkage parameter. This method has also been further developed and applied to chemical data by Kaçiranlar et al, 8 Alheety and Kibria, 9 and Lukman et al. 10 These types of methods have been studied so extensively because of the importance of the situation with non-orthogonal regressors, which is the most common situation when analyzing chemical, environmental, or biological data.
In this paper, for the first time, we propose some Liu estimators for the beta regression model with a fixed dispersion parameter. To estimate the shrinkage parameter of the Liu estimator, we use five different methods from the previous literature and one novel method based on the derived optimal value. These estimators are investigated in the case where we have a multicollinearity issue. We evaluate the MSE properties analytically, use a simulation study to investigate the finite sample properties, and show the benefit of using this method on two different chemical data sets. In the first application, we use the same data set analyzed by Ospina et al. 11 In this data set, the dependent variable is defined as the proportion of crude oil that is converted to petrol after fractional distillation. This dependent variable is related to some explanatory variables that are multicollinear. In the second application, we follow Şimşek, 12 where the color of hazelnuts is studied. The color of roasted hazelnuts is measured using a colorimeter, which is one of the most widely used methods in chemistry to determine color characteristics. In both examples, the benefit of our new method is clear to applied researchers analyzing chemical data.
The organization of this paper is as follows: The beta regression model and the Liu estimators are discussed in Section 2. A simulation study comparing the performance of the estimators is provided in Section 3. To illustrate the findings of the paper, two real-world data sets are analyzed in Section 4. Concluding remarks are presented in Section 5.

| BETA REGRESSION MODEL AND LIU ESTIMATORS
The beta regression model and some Liu estimators are discussed in this section.

| Beta regression model
In contrast to the multiple linear regression model, in which the dependent variable Y is assumed to be normally distributed, the beta regression model assumes that Y has a beta distribution with support on the interval ]0, 1[. The beta regression model of Ferrari and Cribari-Neto 1 is based on an alternative parameterization of the beta probability density function (pdf) in terms of a variable mean (μ(x)), a precision parameter (φ), and x, a (k × 1) vector of predictors. Using this parameterization, Y ∼ B(μ(x), φ) with pdf

f(y; μ(x), φ) = [Γ(φ)/(Γ(μ(x)φ)Γ((1 − μ(x))φ))] y^(μ(x)φ − 1) (1 − y)^((1 − μ(x))φ − 1), 0 < y < 1. (1)

In particular, E[Y] = μ(x) and Var(Y) = μ(x)(1 − μ(x))/(1 + φ), and thus, for a fixed value of x, and hence a fixed μ(x), the variance of Y increases/decreases with decreasing/increasing φ, which explains the name dispersion parameter. Further, for fixed φ, the variance of Y is maximized when μ(x) = 0.5 and decreases as μ(x) moves toward the endpoints of the support for Y. Hence, the variance is a function of μ(x), so a regression model using this parameterization is heteroscedastic. From Equation 1, the log pdf becomes

ln f(y; μ(x), φ) = ln Γ(φ) − ln Γ(μ(x)φ) − ln Γ((1 − μ(x))φ) + (μ(x)φ − 1) ln y + ((1 − μ(x))φ − 1) ln(1 − y).

Now let {Y_i}_{i=1}^n be a random sample from the population Y_i ∼ B(μ(x_i), φ), and let g(·) be a strictly monotonic and twice differentiable link function such that

g(μ(x_i)) = x_i′β = η_i,

where x_i is a (k × 1) vector of predictors and β′ = [β_1, …, β_k] is a (k × 1) vector of unknown regression parameters. Moreover, we assume that the link function g(·) : ]0, 1[ → R, and there exist several different link functions whose inverses map the linear predictor onto the space ]0, 1[, such as the logit link g(μ) = ln(μ/(1 − μ)) and the probit link g(μ) = Φ⁻¹(μ), where ln(·) is the natural logarithm and Φ(·) is the standard normal cumulative distribution function.
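As an illustration, the mean-precision parameterization above can be written down directly. This is a minimal Python sketch (function names are ours), using the standard Ferrari-Cribari-Neto shapes a = μφ and b = (1 − μ)φ:

```python
import math

def beta_pdf_mean_precision(y, mu, phi):
    """Beta pdf reparameterized by the mean mu in (0, 1) and the
    precision phi > 0: shape1 = mu*phi, shape2 = (1 - mu)*phi."""
    a, b = mu * phi, (1.0 - mu) * phi
    log_norm = math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(y)
                    + (b - 1.0) * math.log(1.0 - y))

def beta_variance(mu, phi):
    """Var(Y) = mu*(1 - mu)/(1 + phi): shrinks as phi grows and is
    largest at mu = 0.5, which is why phi acts as a precision."""
    return mu * (1.0 - mu) / (1.0 + phi)
```

For instance, with μ = 0.5 and φ = 4, the shapes are a = b = 2 and the density at y = 0.3 equals 6 · 0.3 · 0.7 = 1.26.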
Hence, the beta regression model assumes that the mean of the dependent variable can be represented as μ(x_i) = g⁻¹(x_i′β). When using the logit link function, the beta regression model is obtained by assuming that the conditional mean of Y_i can be written as

μ(x_i) = exp(x_i′β)/(1 + exp(x_i′β)). (8)

Based on the random sample, the log-likelihood function for this family of beta regression models is then given by

ℓ(β, φ) = Σ_{i=1}^n ln f(y_i; μ(x_i), φ), (9)

where μ(x_i) = g⁻¹(η_i). MLEs for β and φ are usually found by numerically maximizing Equation 9. Ospina and Ferrari 13 show that the MLEs can be obtained by using a re-weighted least squares algorithm (Greene 14). These estimators are not unbiased, especially in small samples (Cribari-Neto and Vasconcellos 15), although Simas et al 2 have obtained analytical bias corrections for the MLEs. We do not perform any bias corrections for the MLEs because we use the betareg package in the R system for statistical computing (Cribari-Neto and Zeileis 16), which only offers the possibility to estimate beta regressions without bias corrections. The motivation for this paper is twofold: first, to examine the effect of multicollinearity on the MLEs and, second, to use inbuilt packages for estimating a beta regression model (as is common among practitioners).
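Under the logit link, the log-likelihood in Equation 9 is simply the sum of the log beta densities evaluated at the fitted means. A pure-Python sketch (names are ours, not from the paper):

```python
import math

def beta_loglik(beta, phi, X, y):
    """Log-likelihood sketch for the logit-link beta regression model
    (Equation 9): sum of the log beta density, with mean/precision
    parameterization and mu_i equal to the inverse logit of x_i' beta."""
    ll = 0.0
    for x_i, y_i in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, x_i))
        mu = 1.0 / (1.0 + math.exp(-eta))
        a, b = mu * phi, (1.0 - mu) * phi
        ll += (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
               + (a - 1.0) * math.log(y_i) + (b - 1.0) * math.log(1.0 - y_i))
    return ll
```

In practice, this objective would be handed to a numerical optimizer (or, as in the paper, fitted via betareg's re-weighted least squares); the sketch only shows what is being maximized.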

| Liu estimators for a beta regression model
Different types of shrinkage estimators are available for estimating the regression parameters when the regressors in the model are ill-conditioned. The ordinary ridge regression estimator was first developed for the linear regression model by Hoerl and Kennard. 4 Later developments led to different new types of ridge regression estimators. 17,18 Liu 3 proposed an alternative but similar regression estimator with a different biasing parameter. In a multiple regression context, the Liu estimator of the regression parameter vector β is specified as follows:

β̂_d = (X′X + I)⁻¹(X′X + dI)β̂_ML, (10)

where d is a shrinkage parameter defined in the interval [0, 1] and β̂_ML is the MLE of β.
In a beta regression context, the Liu estimator of the parameter vector β becomes

β̂_d = (X′ŴX + I)⁻¹(X′ŴX + dI)β̂_ML, (11)

where d is a shrinkage parameter defined in the interval [0, 1] and β̂_ML is the MLE of β obtained by a re-weighted least squares algorithm.
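The beta regression Liu estimator above is a single linear-algebra step once the MLE is available. A sketch (names are ours; Ŵ and β̂_ML are assumed to come from the fitted model, here passed in as arguments):

```python
import numpy as np

def liu_beta_regression(X, W_hat, beta_ml, d):
    """Liu estimator sketch for beta regression:
    beta_d = (X'WX + I)^(-1) (X'WX + d I) beta_ml, with d in [0, 1].
    W_hat is the estimated diagonal weight matrix from the final
    re-weighted least squares iteration; beta_ml is the MLE of beta."""
    G = X.T @ W_hat @ X
    I = np.eye(G.shape[0])
    return np.linalg.solve(G + I, (G + d * I) @ beta_ml)
```

Note that d = 1 reproduces the MLE exactly, while d < 1 shrinks the coefficient vector toward zero; this is the sense in which the shrinkage is a linear function of d.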

| A comparison of the MSE properties of Liu and ML
Let Λ = diag(λ_1, …, λ_p) be a diagonal matrix with the eigenvalues of X′ŴX and let Γ = [γ_1, …, γ_p] be the matrix of corresponding orthonormal eigenvectors, so that X′ŴX = ΓΛΓ′. Then, based on the risk function in 27, the MSE of the ML estimator is given by

MSE(β̂_ML) = (1/φ) Σ_{j=1}^p 1/λ_j. (12)

The Liu estimator can be written as follows:

β̂_d = (X′ŴX + I)⁻¹(X′ŴX + dI)β̂_ML. (13)

To derive the MSE of the Liu estimator 13, some simplifying lemmas are listed below.
Proof: The proof of Lemma 1 can be seen in Appendix A.
Proof: The proof of Lemma 2 can be seen in Appendix B.
Based on the risk function in 27 and by 13, we have for the Liu estimator:

MSE(β̂_d) = tr[Cov(β̂_d)] + Bias(β̂_d)′Bias(β̂_d). (14)

By Lemmas 1 and 2, the MSE for the Liu estimator 14 becomes

MSE(β̂_d) = (1/φ) tr[(X′ŴX + dI)(X′ŴX + I)⁻¹(X′ŴX)⁻¹(X′ŴX + I)⁻¹(X′ŴX + dI)] + (d − 1)² β′(X′ŴX + I)⁻²β. (15)

Substituting α = Γ′β into 15 and using the fact that the trace of a square matrix is the sum of its eigenvalues, we obtain

MSE(β̂_d) = (1/φ) Σ_{j=1}^p (λ_j + d)²/(λ_j(λ_j + 1)²) + (d − 1)² Σ_{j=1}^p α_j²/(λ_j + 1)². (16)

Proof: The proof of Theorem 1 can be seen in Appendix C.
To find an optimal β̂(d), we need to determine the value of d such that the MSE of β̂(d) is minimized in the closed interval from zero to one. From the proof of Theorem 1, we have that

∂MSE(β̂(d))/∂d = (2/φ) Σ_{j=1}^p (λ_j + d)/(λ_j(λ_j + 1)²) + 2(d − 1) Σ_{j=1}^p α_j²/(λ_j + 1)². (17)

Setting 17 equal to zero yields

d_opt = [Σ_{j=1}^p (α_j² − 1/φ)/(λ_j + 1)²] / [Σ_{j=1}^p (1/(φλ_j) + α_j²)/(λ_j + 1)²]. (18)
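Given the plug-in quantities, the optimal shrinkage in Equation 18 is a one-line computation. A sketch (the formula is reconstructed from the MSE expression above, so treat this as illustrative rather than verified against the paper's appendix):

```python
import numpy as np

def optimal_d(alpha, lam, phi):
    """Plug-in version of the MSE-minimizing shrinkage parameter
    (Equation 18): alpha = Gamma' beta, lam holds the eigenvalues of
    X'WX, and phi is the precision parameter."""
    alpha = np.asarray(alpha, dtype=float)
    lam = np.asarray(lam, dtype=float)
    num = np.sum((alpha**2 - 1.0 / phi) / (lam + 1.0) ** 2)
    den = np.sum((1.0 / (phi * lam) + alpha**2) / (lam + 1.0) ** 2)
    return num / den
```

Because the MSE in Equation 16 is a convex quadratic in d, this stationary point is its global minimizer; in practice the value would be truncated to [0, 1].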

| Estimation of the shrinkage parameter
To obtain such a shrinkage estimator, several estimators of the shrinkage parameter d have been suggested. We have selected a number of estimators that have shown good results in other applications. These estimators were mainly developed in a multiple regression context for the case where d is restricted to a scalar, hence yielding the same shrinkage for all parameters in β. 3,17,19 These estimators are easily extendable to a beta regression context because the estimator of the unknown parameter vector β can be expressed in the form of a weighted least squares estimator, that is, the form of the Liu estimator 11. To present the investigated Liu estimators of the shrinkage parameter d, we first define λ̂_j to be the jth eigenvalue of the matrix X′ŴX, where X is a (n × p) matrix containing the regressors in the beta regression model and Ŵ is a diagonal matrix of estimated weights. Furthermore, λ̂_max = max(λ̂_1, …, λ̂_p), and β̂²_max is the largest element of β̂ ∘ β̂, where ∘ is the Hadamard product.
The first proposed estimator, d̂_1, is based on the work of Hoerl and Kennard. 4 The next two estimators, d̂_2 and d̂_3, are based on the work of Kibria, 17 and were developed in a linear multiple regression context. The following two estimators, d̂_4 and d̂_5, are based on the work of Khalaf and Shukur, 19 and were also developed in a linear multiple regression context. Finally, the last estimator is based on the optimal β̂(d), where d is determined by plugging the MLEs of φ, {λ_j}_{j=1}^p, and β into 18; hence, the final estimator is

d̂_6 = [Σ_{j=1}^p (α̂_j² − 1/φ̂)/(λ̂_j + 1)²] / [Σ_{j=1}^p (1/(φ̂λ̂_j) + α̂_j²)/(λ̂_j + 1)²], where α̂ = Γ′β̂ and X′ŴX = ΓΛΓ′.

| THE MONTE CARLO EXPERIMENT
To evaluate the different point estimators of the regression parameters β′ = [β_1, …, β_k] in the context of a beta regression model, we perform a Monte Carlo simulation study under several different settings. To make the evaluation similar to empirical applications, the data generating process is specified with both stochastic error terms and stochastic regressors. This is achieved by generating the data in a two-step procedure. In each replication of the Monte Carlo experiments, the first step generates n independent observations of the regressors, drawn from a multivariate normal distribution with a pre-specified correlation structure among the regressors. The corresponding means are then calculated using the inverse link function 8, which yields n different mean values. These mean values are then used as parameters in the second step. In the second step, n independent observations of the dependent variable are obtained by n random draws from a beta distribution with fixed dispersion and varying mean, based on the mean values generated in step 1. Hence, for the ith observation of the regressors, the inverse link function 8 produces the ith mean value, which is then used as a parameter in the second step along with the fixed dispersion parameter. The observations of the dependent variable (Y) usually need to be adjusted marginally to avoid exact zeros and ones. We follow the general recommendation (Smithson and Verkuilen 20) for solving this problem by rescaling the values of the dependent variable (Y_i) as follows:

Y_i* = (Y_i(n − 1) + 0.5)/n.
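The two-step procedure and the Smithson-Verkuilen rescaling can be sketched as follows (function names are ours; the logit inverse link is the one stated in the DGP):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_beta_responses(X, beta, phi):
    """Two-step DGP sketch: the inverse logit link gives each
    observation's mean, then Y_i is drawn from a beta distribution
    with that mean and the fixed precision phi."""
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))       # inverse link, step 1
    return rng.beta(mu * phi, (1.0 - mu) * phi)  # mean/precision shapes, step 2

def rescale_unit_interval(y):
    """Smithson-Verkuilen rescaling that keeps responses strictly
    inside (0, 1): y* = (y*(n - 1) + 0.5)/n."""
    n = len(y)
    return (y * (n - 1) + 0.5) / n
```

The rescaling matters because beta likelihood terms contain ln(y) and ln(1 − y), which are undefined at exact zeros and ones.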

| The design of the Monte Carlo experiment
The main property of the data generating process (DGP) is that the dependent variable follows a beta distribution with a fixed dispersion parameter and variable mean.

| The DGP used in the Monte Carlo experiments
The dependent variable is generated according to the following formula:

Y_i ∼ B(μ(X_i), φ),

where μ(X_i) = exp(X_i′β)/(1 + exp(X_i′β)) and X_i = [X_{i,(1)}, …, X_{i,(p)}]′ is a (p × 1) vector of p different stochastic regressors; more information regarding the specification of the regressors is given below. The variability of the dependent variable is governed by two parts (Equation 2), where φ is a fixed scalar that partly controls the variability; more details about the values used for φ are given in the next subsection.

| The correlation structure among the stochastic regressors
In each Monte Carlo experiment, we generate a (n × p) matrix Z, where p is the number of regressors, by random draws from a multivariate normal distribution with uncorrelated variables and uncorrelated observations; that is, Z ∼ N(0, I_p ⊗ I_n). Hence, the regressors are initially uncorrelated, which is then changed in the Monte Carlo experiments. Our choice of correlation matrix for the regressors is a Toeplitz matrix determined by one parameter, ρ:

P_{ij} = ρ^|i − j|, i, j = 1, …, p,

where P is the population correlation matrix. By using this type of correlation structure, we can generalize the same correlation pattern to any dimension of the matrix of regressors. Setting ρ to zero yields uncorrelated regressors, and when ρ is larger than zero, an individual regressor is not correlated at the same level with all of the other regressors (which is more realistic in some empirical applications). The regressors (X) are generated by post-multiplying Z by the transpose of the Cholesky factor of P so that they follow the correlation structure of the DGP.
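The Cholesky construction above can be sketched as follows (the explicit power-form P[i, j] = ρ^|i − j| is our assumption for the one-parameter Toeplitz matrix, since the matrix itself is not reproduced here):

```python
import numpy as np

def correlated_regressors(n, p, rho, rng):
    """Sketch: draw Z with i.i.d. standard normal entries, then impose
    the one-parameter Toeplitz correlation P[i, j] = rho**|i - j|
    (assumed form) via the Cholesky factor of P."""
    P = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    L = np.linalg.cholesky(P)       # P = L L'
    Z = rng.standard_normal((n, p))
    return Z @ L.T                  # each row now has covariance P
```

Since each row of Z has covariance I_p, post-multiplying by L′ gives rows with covariance L L′ = P, exactly the target correlation structure.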

| Motivation for the parameters used in the Monte Carlo experiments
We have chosen three main settings in which the number of regressors in the DGP equals 3, 6, and 12. We chose these numbers based on empirical studies analyzing chemical data, where the use of more than 12 continuous regressors is unusual. In this study, we used three values for the degrees of freedom (df = n − p − 1), covering small and large sample sizes: 20, 50, and 100. We have chosen a relatively fine resolution for the ρ parameter, which controls the correlation structure among the regressors, because the main focus of this paper is to investigate how correlated regressors affect the precision of the estimators of β in the beta regression model and the potential of a Liu estimator under different settings. We use ρ = 0.7, 0.8, 0.9, 0.95, 0.99.
The beta coefficients were chosen by first setting all coefficients to one, except for the intercept, which was set to zero, and then allocating a +/− sign to each coefficient by a random draw from a Bernoulli distribution with success probability 0.5. Finally, the coefficients were scaled so that the length of the beta vector always equals one.
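The coefficient scheme above can be sketched in a few lines (names are ours):

```python
import numpy as np

def draw_coefficients(p, rng):
    """Coefficient scheme sketch: unit magnitudes, random +/- signs
    from a fair Bernoulli draw, then the vector is scaled to length
    one (the intercept, fixed at zero, is handled separately)."""
    signs = rng.choice([-1.0, 1.0], size=p)
    return signs / np.sqrt(p)  # ||beta||_2 = 1
```

Fixing the Euclidean length of β keeps the signal strength comparable across settings with 3, 6, and 12 regressors.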
The final parameter in the Monte Carlo experiments is the specification of the stochastic terms in the DGP. It is well known that the performance of ridge and Liu estimators is affected by the signal-to-noise ratio in the DGP. The dispersion parameter is therefore set to 1, 10, and 100, where 1 corresponds to a low signal-to-noise ratio.
We investigated a much richer variety of simulation parameter values, but to keep the results presentable in this paper, we have chosen to report the values above.

| Performance criteria for evaluating point estimators of the unknown regression parameters in a beta regression model
To assess how good an estimator is, we first need to choose a loss function that measures how close the estimator is to the unknown parameter. Taking the expectation of the loss function gives the corresponding risk function, and we prefer the estimator with the smallest risk. In this paper, we have chosen the squared L2 norm as the loss function, which yields the mean squared error (MSE) as the risk function for evaluating the performance of the competing estimators of the unknown regression parameters β′ = [β_1, …, β_k] in the beta regression model. Hence, we treat φ as a nuisance parameter and focus solely on the evaluation of the estimators of β. We choose this loss function (and hence the MSE) because it is one of the most common risk functions and allows us to directly compare our results with other results reported in the literature. Hence, we have the following risk function:

MSE(β̂) = E[(β̂ − β)′(β̂ − β)]. (27)

The risk function in 27 is estimated by the corresponding empirical risk function below:

MSÊ(β̂) = (1/R) Σ_{r=1}^R (β̂_r − β)′(β̂_r − β),

where R is the number of replications and β̂_r is the estimate from the rth replication. To simplify the comparison of the different Liu estimators with the MLE, we calculate the relative MSE, defined as MSÊ(β̂_ML)/MSÊ(β̂_d).
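The empirical risk and the relative MSE can be sketched as follows (names are ours; the relative MSE is oriented so that values above one favor the Liu estimator, matching the tables):

```python
import numpy as np

def empirical_mse(beta_hats, beta_true):
    """Empirical risk: the average squared L2 distance between the
    estimates (one Monte Carlo replication per row) and the truth."""
    diff = np.asarray(beta_hats, dtype=float) - np.asarray(beta_true, dtype=float)
    return float(np.mean(np.sum(diff**2, axis=1)))

def relative_mse(mse_mle, mse_liu):
    """Values above one mean the Liu estimator beats the MLE."""
    return mse_mle / mse_liu
```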

| Discussion of simulation results
The simulation results for different degrees of freedom, correlations, and dispersion parameters are provided in Table 1a for p = 3, Table 1b for p = 6, and Table 1c for p = 12. All tables show the relative MSE, where a value above one indicates that the Liu estimator outperforms the MLE. Increasing the degree of correlation increases the benefit of using the Liu estimator. The estimator d_5 is nearly always the optimal choice with the largest decrease in MSE, followed by d_6, which can sometimes yield a lower MSE when the degree of multicollinearity is severe. This is especially clear when the number of degrees of freedom is low. The benefit of the Liu estimator decreases with increasing sample size. The relative gain of applying the Liu estimator increases with the number of regressors; for example, in the case of 12 regressors, the gain of applying the Liu method is substantial in the presence of high multicollinearity. Finally, increasing the variability increases the relative benefit of applying a Liu estimator, and this pattern prevails regardless of the number of regressors used in the simulation study.
In summary, we always recommend the use of a shrinkage estimator in the presence of moderate to strong multicollinearity. The estimators d 5 and d 6 perform especially well. These estimators may also be recommended in some situations where there is a low degree of correlation and even sometimes when orthogonal regressors are present. Simulation results for orthogonal regressors are available on request.

| APPLICATIONS TO CHEMICAL DATA
For illustration purposes, we will consider two real-life applications in this section.

| Gasoline yield data
In this example, we analyze the classical data set from Prater, 21 which was also analyzed by Ospina et al. 11 In this data set, the dependent variable is defined as the proportion of crude oil that is converted to petrol after fractional distillation. We use all four regressors originally considered by Prater. 21 The first regressor (gravity) is the crude oil gravity measured on the American Petroleum Institute (API) scale; it measures how heavy or light a petroleum liquid is compared with water. The second regressor is the vapor pressure of the crude oil (denoted vapor). We also use the temperature (measured in degrees Fahrenheit) at which 10% of the crude oil has vaporized (denoted tempten). Finally, we use the temperature at which all the petrol in the crude oil vaporizes (temp). The condition index of the explanatory variables equals 68.1, which indicates a multicollinearity problem. The estimated coefficients for the different estimation methods, along with the MSE, are shown in Table 2. In some cases, a coefficient changes sign when a Liu-type estimator is applied. However, the effect of that variable is very close to zero, and the variable seems to have no effect on the proportion of crude oil that is converted to petrol after fractional distillation. The MSE decreases when applying a Liu-type estimator, especially when using the d_2, d_3, d_5, and d_6 estimators. In general, the results confirm the simulation study, and the d_5 estimator always minimizes the MSE.
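A condition index such as the 68.1 reported above can be computed as follows. This is a sketch; the exact scaling convention used in the paper is not stated, so we assume unit-length column scaling of the regressor matrix:

```python
import numpy as np

def condition_index(X):
    """Condition index sketch: square root of the ratio of the largest
    to the smallest eigenvalue of X'X after scaling each column of X
    to unit length (assumed convention)."""
    Xs = X / np.linalg.norm(X, axis=0)
    eig = np.linalg.eigvalsh(Xs.T @ Xs)
    return float(np.sqrt(eig.max() / eig.min()))
```

Values near one indicate well-conditioned (near-orthogonal) regressors, while large values, such as those reported for both applications here, flag multicollinearity.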

| Color of hazelnuts
In this example, we follow Şimşek, 12 where the color of hazelnuts is studied. The color of roasted hazelnuts is measured using a colorimeter, which is one of the most widely used methods in chemistry to determine color characteristics. The colorimeter measurement can differ, but in this example, it ranges from 0% to 100%. Determining the color characteristics of hazelnuts is a common research question in food chemistry, and it has been studied by, for example, Chiou and Tsai, 22 Demir et al, 23 Özdemir and Devres, 24 Pattee et al, 25 Robbins and Fryer, 26 Sanders et al, 27 and Şimşek, 12 among others. In those papers, the change in color of hazelnuts during roasting is explained by non-enzymatic browning, which includes the Maillard reaction, caramelization, pigment destruction, and ascorbic acid browning. In Şimşek, 12 it is explained by the oven temperature (x_1) and the roasting time (x_2), using a paraboloid model to investigate the effect of these factors on the color of hazelnuts. The condition index of this model is 665.8, which indicates severe multicollinearity. From the results presented in Table 3, we can see the same paraboloid effect as was shown by Şimşek. 12 However, the fit of the regression model increases substantially when using the new Liu estimator. The optimal choices of the shrinkage parameter, d_5 and d_6, decrease the MSE by almost a factor of 10. Hence, it is clear that when modeling these types of relationships, beta regression is a better choice than the linear regression model employed by Şimşek. 12 Furthermore, our newly suggested method leads to a substantial improvement in the estimated parameters and the MSE.

| CONCLUSIONS
Some Liu shrinkage estimators for the beta regression model are proposed in this paper. Since a theoretical comparison among the estimators is not possible, a simulation study was conducted to compare the performance of the estimators. In the design of the experiment, we change several factors, including the number of degrees of freedom, the number of explanatory variables, the dispersion parameter, and the degree of correlation. We find that the new Liu estimators outperform the MLE most of the time, especially when there is high to strong correlation among the explanatory variables. Furthermore, among the proposed shrinkage estimators, d 5 and d 6 performed better than the rest. Therefore, these estimators may be recommended for practitioners since it moderately shrinks the vector of parameters. We analyze real data to illustrate the findings of the paper and find some support for the simulation study results.