A novel response model and target selection method with applications to marketing

Response models used in marketing are not always constructed for later marketing optimisation, which often results in unsatisfactory results in target selection for future marketing activities. To solve this problem, we develop a new binary response model and a new marketing target selection method. The proposed model can predict multiple propensity scores per customer through customer‐specific propensity score distributions, which is not possible with existing response models, filling a gap in the literature. The target selection method can determine the best propensity scores from those predicted by the proposed model and use them to select customers for further marketing activities. Our simulation results and application to real marketing data confirm that the performance of the proposed model in target selection is significantly better than that of the existing models, including some popular machine learning methods, which indicate that our method can be very useful in practice.


Introduction
Statistical response models have been used in many disciplines, such as marketing.Many response models have been developed and used in marketing research.In addition to the probit and logit models, Alvarez & Brehm (1995) proposed a heterogeneous choice model to deal with possible heterogeneous issue in the data.Manski (1975Manski ( , 1985) ) developed the binary quantile regression (BQR) model, which is a semi-parametric model, and Bult (1993) also discussed its advantages and limitations in marketing.Horowitz (1992) and Kordas (2006) proposed smoothed BQR models.Rossi, McCulloch & Allenby (1996) developed a random coefficient selection model to deal with the heterogeneity in observable demographics and applied the model to purchase history data.Train (2002) gave a more comprehensive discussion of this topic.
Moreover, an aggregate advertising response model based on consumer population dynamics was proposed by Wang et al. (2013), while Li & Ansari (2014) proposed a Bayesian semi-parametric approach for endogeneity and heterogeneity in choice models.Bruno, Cebollada & Chintagunta (2018) developed a new model and used it to deal with the intra-household heterogeneity in customer brand choice behaviour.Kappe, Blank & DeSarbo (2018) developed a random coefficients mixture hidden Markov model for flexible patterns of unobserved heterogeneity in both the state-dependent and transition parameters.
When using these response models in marketing, we first estimate the models by using a training data set, which contains customer information, including customer responses to specific product or service offers obtained in past or pilot marketing activities.Then, we use the estimated models to predict a propensity score for each customer in a test data set, which contains customer information that has not been used in the model estimation, where the propensity score indicates the likelihood of the customer becoming a potential buyer.Finally, the customers in the test set are divided into groups according to their propensity scores, which are arranged in a decreasing order.A group of customers (e.g. the top 10% of the customers) can then be selected for future marketing activities because they have a higher propensity score than others.
The response models used in marketing can perform well in target selection when the predicted propensity scores are unbiased.However, when the predicted propensity scores are biased, these response models' performance in target selection may not be satisfactory.This problem may occur, for example, when the response models are incorrectly specified and/or when data are imbalanced.In such cases, the propensity scores predicted by these models may be biased towards, for example non-buyers.As pointed out by Ling & Li (1998), the propensity scores may be skewed within a narrow range in one end, and thus, errors in propensity score estimation will affect the ranking more easily in target selection.Branco, Torgo & Ribeiro (2016) further pointed out that when data are imbalanced, a response model in marketing can perform well in prediction (as measured by the total percentage of correct predictions), but it may be unsatisfactory in target selection, as the predicted propensity scores may be biased towards non-buyers, resulting in worse target selection results.
It is seen that optimising the subsequent utilisation of the estimated response models may be necessary if they are employed to directly or indirectly assist marketing managers in making marketing-mix decisions.For example, we may need to use the estimated models to identify the optimal propensity scores that can be used to determine the best customer group that is most likely to be potential buyers of a company's new product or service.
However, the existing response models used in marketing are not always constructed for later marketing optimisation (Albers 2012), and the existing models do not allow us to predict multiple propensity scores for each customer.Therefore, we have no room to determine whether the propensity scores predicted by the models are the 'best' for target selection in marketing.
It is important to use optimal propensity scores in marketing because targeting using optimal propensity scores enables marketers to efficiently allocate resources, improve campaign effectiveness, reduce costs, and enhance customer satisfaction.It is also important to use optimal propensity scores in other subject areas, including psychology, political science, economics, finance and sociology, because using optimal propensity scores allows researchers in these fields to better address selection bias, assess the impact of policies and analyse observational data to make causal inferences.
However, to the best of our knowledge, there is currently no response model that predicts multiple propensity scores for each individual, allowing us to determine the optimal propensity score.It is this gap in the literature that motivates us to develop a new response model and related approach to target selection for marketing research in this paper.
The contribution of this work to the related literature is twofold.On the one hand, we contribute to the statistical literature by developing a novel binary response model that can be used to solve practical problems in many disciplines.Unlike existing response models, the proposed model explicitly estimates the entire propensity score distribution.This will play a vital role in later marketing optimisation because it will allow the proposed model to assign multiple propensity scores to each customer, and these propensity scores can cover a wide range of propensity score distribution, allowing us to identify optimal propensity scores for marketing.It is worth noting that this model can also be applied to other business problems including, for example, fraud detection and insurance/risk management.
On the other hand, we contribute to the marketing research by developing a target selection method in order to facilitate the use of the new model in marketing.Our target selection method includes two steps.The first step is to determine the optimal propensity score that should be used for selecting targets, and the second step is to use the identified propensity score to perform target selection in marketing.Since the best propensity score can be used for target selection, our model can deal with some of the problems caused by model specification errors and/or imbalanced data, as shown later in this paper.
Our results confirm that not all propensity scores can produce satisfactory results in target selection.This explains why if the propensity score estimated from a given model is biased, then the model's performance in target selection may be unsatisfactory.Our results also show that the proposed model outperforms baseline response models as well as some popular machine learning methods used in target selection.The main reason is that our method can use the 'best' propensity score for target selection, while the existing methods may not.
In Section 2, we briefly discuss the benchmark response models that are closely related to the model proposed in this paper, and in Section 3, we discuss the proposed response model and its estimation.In Section 4, we present the method for target selection.Simulation results are discussed in Section 5, while in Section 6, we discuss the application of the proposed method to the bank marketing data, and compare the results with those obtained from the benchmark response models and some machine learning methods commonly used in marketing.Section 7 concludes.Appendix A gives the proofs of three theorems, Appendix B and Appendix C give the prior density function and the MCMC method for parameter estimation, respectively, and Appendix D presents results of another simulation study which confirms that the proposed estimation method performs well, and the convergence of the method does not depend on the strength of the prior information on the parameters.

A brief review of the baseline response models
The binary logistic regression (BLR) model is one of the popular response models used for target selection in marketing.Many researchers use it as a benchmark model, see for example Cui, Wong & Wan (2012) and Zahavi & Levin (1997).
Generally, a binary regression model is defined by where i = 1, . . ., n and n is the sample size, α is a vector of parameters, x i = (x 1i , . . ., x ki ) is the observed value of k predictors for customer i , and y i is the observed response of customer i .Moreover, y * i is an unobserved continuous variable that may represent, for example, customers' psychological feelings about a new product, and ε i s are assumed to be iid random variables.
Let F (•) be the distribution function of ε i and η i = η(x i , α).Then, the propensity score of customer i can be estimated by the response probability of the customer, which is given by μ 1) becomes the BLR model.In this case, the conditional mean of y * i , that is η i , is used to estimate the propensity score μ i .It is worth mentioning that, unless otherwise stated, in this paper we assume that ε i follows the logistic distribution.
Note that model (1) cannot handle heterogeneous issues that may exist in the data.In other words, when the variance of ε i is not a constant, model ( 1) is not useful.Yatchew & Griliches (1985) pointed out that in the presence of heterogeneity, the estimation of model parameters will be inconsistent and inefficient.In order to solve the problem of heterogeneity, Alvarez & Brehm (1995) proposed a heterogeneous choice (HC) model: where z i is also a vector of predictors that may be the same as x i , and α 1 and α 2 are vectors of the model parameters.So, model (2) tries to deal with the heterogeneous problem by using the term e z i α 2 .However, Achen (2002) pointed out that model ( 2) is equivalent to the following model and there is no way to distinguish between them.This implies that model ( 2) is in fact a non-linear homogeneous binary response model.Hence, the predicted propensity score is given by μ i = e a /(1 + e a ), where a = η(x i , α 1 )/e z i α 2 .The BQR model developed by Manski (1975Manski ( , 1985)), Horowitz (1992) and Kordas (2006) is a semi-parametric model because they do not assume a distribution for the error term, and they can be estimated by using the methods based on Manski's (1975Manski's ( , 1985) ) maximum score function and its variants.The benefits and limitations of semi-parametric models are also discussed by Bult (1993).Recently, Benoit & Van den Poel (2012) developed a Bayesian approach to the BQR model by using the asymmetric Laplace distribution for ε i , denoted by ALD(ψ = 0, σ = 1, τ ), where σ and ψ are the scale and location of the distribution, respectively, and τ ∈ (0, 1).Hence the model can be expressed by (3) As the τ quantile of ALD is zero, η i = η(x i , α 1 ) is in fact the conditional τ quantile of y * i .Following Hashem et al. (2016), the propensity score can be estimated by μ i = 1 − ALD(ψ = −η(x i , α 1 ), σ = 1, τ = 0.5), which corresponds to the conditional median of y * i .We will use the BLR, HC and BQR models as benchmark response models because they are closely related to our model.This is discussed more at the end of Section 3.1.

Proposed model and parameter estimation
The construction of the existing response models discussed above determines that only one propensity score can be assigned to each customer.Hence, if the predicted propensity scores are biased, the errors in these scores will affect the ranking of the customers, leading to unsatisfactory results in target selection.
In order to overcome the limitations of existing models, in this section, we first develop a new response model by explicitly estimating the distribution of propensity scores, and then we develop a parameter estimation method.

Proposed model
Recall that, for the BLR model, the propensity score is given by μ i = Pr (y i = 1|η i ) = e η i /(1 + e η i ).If we regard μ i as a random variable, then η i is also a random variable.Hence, if we know the distribution of η i , then we also know the distribution of μ i .To treat η i as a random variable, we first propose the following model: where h 2 (β, x i ) > 0, α and β are parameter vectors to be estimated, both ε i and ξ i are iid random variables and ε i follows the logistic distribution.We further assume ε i and ξ i are independent.Model (4) says that h 1 (α, x i ) and h 2 (β, x i ) are the location and scale of the distribution of η i respectively, where the predictors involved in h 1 and h 2 may be different, but to simplify the notation used in the paper, we set them equal.It is seen that the term h 2 (β, x i ) can also be used to deal with some heterogeneity in the data.
The second step of our model building process is to determine the distribution of ξ i in model (4).For target selection, we need the distribution of η i to capture different characteristics of data, such as the characteristics related to the centre, skewness and tails of the data.This is important because, for example, when data are imbalanced, the distribution of the data is skewed.Compared with the centre, the tail of the distribution may contain more useful information that can be used for target selection.
The work of Fournier et al. (2007) suggests that the generalised lambda distribution (GLD) can be a very good candidate for our model because this distribution is so flexible that many standard distributions, such as normal, Weibull, log-normal, t-, skewed t-and F -distributions, and many other distributions can be accurately approximated by this distribution.This shows that the GLD can handle many data structures that these standard distributions may not be able to handle alone.Hence, using the GLD for ξ i will make the model more robust to model specification errors.
It is worth noting that the GLD is only explicitly defined by its quantile function, which can be expressed by Q gld (τ ) = a + bQ(τ , γ ), where a and b are the location and the scale of the distribution, and Q(τ , γ ) is defined by where γ = (γ 1 , γ 2 ) and τ ∈ (0, 1).In fact, Q(τ , γ ) defined by ( 5) is a special case of GLD, with location 0 and scale 1.So we let ξ i follow the GLD defined by (5).However, as the GLD is only explicitly defined by its quantile function, we need to use the quantile function to rewrite model (4), which gives where τ ∈ (0, 1) and Q(τ , γ ) represents the quantile function of ξ i and θ = (α, β, γ ).In this paper, we refer to model (6) as the quantile function response (QFR) model.
and let Q(τ , γ ) be the quantile function of ξ i .Then the conditional quantile function of η i is given by Q Moreover,model (6) implies that the conditional quantile function of the propensity score μ i is given by See Appendix A for a proof.Theorem 1 shows that model (6) actually defines the entire distribution of η i through its quantile function.It further shows that the distribution of propensity scores can also be obtained easily through (7).Moreover, due to the monotone relation between η i and μ i , the τ quantile of η i corresponds to the τ quantile of the propensity score μ i for any τ ∈ (0, 1).This shows another advantage of using the quantile function of η i in the model.We will take advantage of this when we develop the target selection method later in the paper.
In the following, we focus on model ( 6), where Q(τ , γ ) is defined by (5).Hence, Q η i (τ |θ, x i ) defines the GLD with location h 1 (α, x i ) and scale h 2 (β, x i ).Moreover, γ 1 and γ 2 control its left and right tails respectively.If γ 1 = γ 2 , then the distribution of η i is symmetric, otherwise it is skewed.Furthermore, as Gilchrist (2000) pointed out, γ 1 and γ 2 determine not only the skewness but also the relative weights of the tails.This means that the skewness is modelled as a result of tail shape rather than as an independent feature.This is one of the features that other standard distributions may not have.
The final step of our model building process is to specify a functional form for h 1 (α, x i ) and h 2 (β, x i ) respectively.In order to compare with the benchmark response models, we use the form of a linear function for h 1 (α, x i ).We use the form of a quadratic function for h 2 (β, x i ) because estimating generalised non-linear regression models that contain exponential forms can be difficult due to large sampling errors (see, McCullagh & Nelder 1989).Specifically, in this paper, we let where β 0 > 0 and β j ≥ 0, j = 1, . . ., k , which guarantee that h 2 (β, x i ) > 0.
Theorem 2. Consider model ( 6).If Q(τ , γ ) is given by ( 5) and h 1 (α, x i ) and h 2 (β, x i ) are defined by ( 8), then model ( 6) is well defined on the parameter space = {(α j , See Appendix A for a proof.Therefore, once the quantile function ) is available, we know the entire distribution of η i and μ i .As the distribution of η i and μ i defined by the model is very flexible, many features of the data can be captured by the distribution, such as features related to centre, dispersion, skewness, tail and so on.The captured features can then be regarded as potential propensity scores for target selection.We will discuss this issue in Section 4.
The relations between the QFR model ( 6) and the BLR, HC and BQR models are discussed below.If h 2 (β, x i ) = 0, then the QFR model becomes the BLR model.The HC model tries to deal with the heterogeneity through the unobserved y * i , although the model is equivalent to a non-linear homogeneous model, while the QFR model can deal with heterogeneity through η i .The QFR model uses the quantile function approach to response modelling (see, e.g.Gilchrist 2000) and hence its parameters do not depend on τ , while the BQR model uses the quantile regression approach to response modelling (see, e.g.Koenker 2005) and hence its parameters depend on τ .In addition, the QFR model can assign multiple propensity scores to each customer, while the BLR, HC and BQR models can only assign a unique propensity score to each customer.Finally, all response models, namely QFR, BLR, HC and BQR models, do not consider specific data structures, such as panel data, time series data or data with specific heterogeneous structures.Hence, they are good benchmark response models for our research.

Parameter estimation
In order to facilitate the use of the proposed model in practice, we now discuss how to estimate the model parameters.In this paper, we develop an estimation method for model ( 6), where Q(τ , γ ), h 1 (α, x i ) and h 2 (β, x i ) are defined by ( 5) and ( 8) respectively.However, this estimation method can be easily extended to other formulae for Q(τ , γ ), h 1 (α, x i ) and h 2 (β, x i ).

Posterior density function
First recall that in our model, the propensity score μ i is a random variable whose distribution is determined by its quantile function Q μ i (τ |θ, x i ).So a random sample of μ i , also denoted by μ i in order to simplify the notation used in this paper, is given by , where τ i is a random sample of the uniform distribution between 0 and 1. Let However, since μ i depends on τ i and τ i is unobservable, the MLE method is not convenient for the parameter estimation.Thus we consider a Bayesian approach to parameter estimation, which requires us to derive the posterior distribution of θ, μ and τ .Theorem 3. Let π(μ, τ , θ |x, y) and π 0 (θ |x) be the posterior and prior density functions of (μ, τ , θ ) and θ respectively.Then

and τ ∈
See Appendix A for a proof.In this paper, we let M = 10 20 and = 10 −20 .Hence, the difference between 1 and (see Theorem 2) is negligible, thus ensuring that important parameter regions in will not be missed.Theorem 3 shows that the posterior density function is well defined on , but it is very complicated.Hence, the Markov Chain Monte Carlo (MCMC) method is suitable for parameter estimation.

MCMC method
It follows from Theorem 3 that we now need to specify a prior density function π 0 (θ |x) so that it is well defined on 1 .To simplify the calculation, we let π 0 (θ |x) = π(α)π(β)π(γ ), where α j follows a truncated normal distribution on [−M , M ], and β j and −λ v follow a truncated log-normal distribution on [ , M ] respectively.Appendix B provides the detailed formula of the prior density function.
The basic idea of a MCMC method is to generate a sequence of values in the parameter space so that this sequence of values forms a Markov Chain whose equilibrium distribution is the posterior distribution of the parameters.See Brooks (1998) for details.
To achieve this, we use the Metropolis-Hastings algorithm in which a candidate parameter value is simulated from a chosen distribution and this proposed value is accepted as the next in the sequence with a known probability, see Geyer (2011).The detailed steps of our MCMC method are given in Appendix C.
Therefore, after a burn-in period, posterior samples can be collected from the Markov Chain generated by the MCMC method.The model parameters can be estimated by using the average of the posterior samples.We denote the estimated parameters by α, β and γ .Then, the conditional quantile function of η i and μ i can be estimated by Qη Using the distribution of η i or μ i , we can now assign multiple propensity scores to each customer for target selection.

Multiple propensity scores
Recall that if we use an existing model, we can only assign a unique propensity score to each customer.If, for example, the model is not specified correctly or the data are imbalanced, the predicted propensity scores may be biased towards non-buyers, resulting in large errors in target selection.
However, for the QFR model, we have the entire propensity score distribution, defined by Qμ i (τ | θ, x i ).Therefore, we can use, for example, the mean, median, quantiles or other information about the distribution of μ i as the estimated propensity scores for each customer.It can be seen that each customer now has multiple propensity scores, and different propensity scores contain information about different parts (or characteristics) of the μ i distribution.
It is worth noting that which propensity score is more suitable for target selection depends on the data structure.For example, if the data are balanced, a propensity score containing information about the centre of the μ i distribution may be more appropriate, but when the data are imbalanced, a propensity score containing information about the tail of the distribution may be more appropriate.Therefore, a wide range of propensity scores can avoid losing important information about the distribution of μ i and enable us to determine the 'best' propensity score for target selection.
It is worth emphasising again that the relation between η i and the corresponding μ i is monotonic.This suggests that the ranking of customers based on μ i is the same as the ranking of customers based on η i , thus leading to the same results in target selection.Therefore, in order to simplify the calculation, in the rest of this paper, we will use η i to define the propensity scores for target selection.
Let η ij be the j th propensity score of customer i , where j = 1, . . ., J , and J is the total number of propensity scores that are assigned to customer i .Let C j = {η ij , i = 1, . . ., n}.Then C j contains the j th propensity score of all customers.
In order to ensure that η ij can cover a wide distribution range of η i , we can let η ij be the τ j quantile of η i , that is, cover a wide range between 0 and 1.As an example, we can define first seven propensity scores by letting τ j ∈ {0.01, 0.05, 0.25, 0.5, 0.75, 0.95, 0.99}, we can define the eighth propensity score by letting η i 8 = η i 7 − η i 1 .Therefore, these propensity scores reflect the main characteristics of the η i distribution or μ i distribution.More specifically, the tail of the distribution is captured by η i 1 , η i 2 , η i 6 and η i 7 , the central location is captured by the η i 3 , η i 4 and η i 5 , and the dispersion is captured by η i 8 .It is worth noting that other information about the distribution of η i may also be used to define propensity scores, but in this paper, we will mainly use the propensity scores defined above.

Target selection
It is worth noting that in direct marketing, the lift is usually the most important criterion for assessing the performance of a model in target selection (see e.g.Bhattacharya 1999, Vuk & Curk 2006, and Cui, Wong & Wan 2012).The lift is defined by the ratio of the percentage of true positive responses in a specific group of customers identified by a model to the percentage of responses in the group identified by the random model.For example, a model with a lift of 2 in a group of 10% customers is said to be twice as good as the random model in that group.Thus, we can rank customers in an ordered list: customers with higher propensity scores are at the top of the list.Then, we can calculate the lift in the top 10% of customers, and then calculate the lift in the top 20% of customers, and so on, until we reach 100% of customers.Using lifts across these groups is helpful for comparing the performance of different models.
For our model, after assigning multiple propensity scores to each customer, we need to determine which score should be used in target selection.Recall that we have used the training set to estimate the model.We now still need to use the training set to determine the best propensity score for target selection.The main steps of our target selection method are given below.
(i) Given j , rank customers in the training set to obtain an ordered list: customers with higher j th propensity scores are at the top of the list.(ii) Use the ordered customer list to define 10 customer groups, denoted by A j , where = 1, . . ., 10, and A j contains the top 10 % customers.(iii) Calculate the lift in each group A j , denoted by u j .(iv) Calculate the average value of the lifts, denoted by ūj .(v) Repeat the above steps for all j = 1, . . ., J .(vi) Let ūj * = max { ūj , j = 1, . . ., J }. Then the j * th propensity score is the propensity score that should be used for target selection.(vii) Calculate the j * th propensity score of all customers in the test set, rank the customers and select a group of customers for future marketing activities.
Note that max { ūj , j = 1, . . ., J } may not be unique.For example, we may have ū2 = ū6 = max { ūj , j = 1, . . ., J }.In this case, both propensity scores may be used for target selection.Users can determine which one to use in practice.It is also worth noting that if the training and test data sets are not random samples of the same data set, then the proposed target selection method may not work, because in this case, the two data sets may have different data structures.To ensure that the training and test data sets have the same structure, various sampling methods may be used.For example, the simple random sampling method, stratified random sampling method or combinations of several random sampling methods can be used.
In the above method, we used the average of the lifts in the 10 percentage groups to determine which propensity score should be used in target selection.Our results show that this simple optimisation criterion works well.However, it is worth noting that we can also use other criteria to determine the best propensity score to use.For example, instead of the average lift, we may use a median lift, we may use the area under the lift curve, or we may combine multiple propensity scores with the financial cost of marketing (see e.g.Bult & Wansbeek 1995).Therefore, the comparison between different optimisation standards needs to be studied in the future.

Simulation study
We now discuss the simulation results in order to gain some in-depth understanding of the performance of the proposed model in target selection.The purpose of this simulation study is to confirm that, in the presence of model specification errors, the performance of the proposed model in marketing is better than the benchmark response models.This is important because in practice, we often face the problems caused by model specification errors.
The above procedure was repeated 200 times, resulting in 200 independent data sets, the first of which is shown in Figure 1.
It is seen that the data structure is quite complicated and the dependence between variable x and η is not linear.We fitted the following four models to each of the data sets.
QFR model: where ) is given by ( 7).BLR model: HC model: Therefore, all four response models are incorrectly specified.We now consider the performance of these models in prediction and in target selection.When predicting discrete variables, commonly used metrics are the confusion matrix (Kohavi & Provost 1998) and the area under the receiver operating characteristic (ROC) curve (Fawcett 2006), denoted by AUC.The larger the AUC, the higher the predictive power of the model.To generate an ROC curve, we need to (i) calculate the propensity score μ i of the customers in a data set, (ii) select a sequence of score thresholds 0 and y i = 0 otherwise, and calculate the total percentage (denoted by p k ) of correct positive predictions and (iv) plot p k against c k to obtain the ROC curve for a model.It can be seen that the higher the ROC curve, the larger the AUC, and therefore the better the prediction performance of the response model.It is worth noting that in this paper, we use the median of μ i (corresponding to the median of η i ) to predict the response variable.
Figure 2 shows the plot of the AUCs in 200 simulations.It is seen that the AUC values obtained from our QFR model are larger than those obtained from the SLR, HC and BQR models.In fact, we also used the one-tailed t-test to check the significance of the difference between the average AUCs obtained from different models.The results show that, at the 1% significance level, the QFR model has the highest predictive power, the HC model has the second highest predictive power, and the BQR and BLR models have similar predictive power.Now, we consider target selection.We used the first simulated data set as the training set, and simulated another 100,000 data from the same model to give the test set.For the QFR model, we use the median of μ i and η ij as propensity scores for customer i , where η ij is the τ j quantile of η i , j = 1, . . ., 5 and τ j ∈ {0.05, 0.25, 0.5, 0.75, 0.95}.Therefore, in this simulation study, we assigned a total of six propensity scores to each customer.By using the training set and the target selection method discussed in Section 4.2, we calculated the lift in each group, the results of which are given in Table 1, where μ represents the median of μ i .So, we can use Table 1 to determine which propensity score should be used in target selection.
It is seen that the lifts obtained based on the propensity scores defined by the median of μ i and the 50% quantile of η i are the same.This is not surprising, because the median of μ i corresponds to the 50% quantile of η i (i.e. the median of η i ), they should produce the same result in target selection.Comparing the average lift, Table 1 also shows that the average lift based on these two propensity scores is the largest.Therefore, we can use the 50% quantile of η i or the median of μ i for target selection.
Next, we check whether the best result of target selection on the test set still corresponds to the best propensity score determined above.Note that in reality, the test set does not contain values of y i .However, since we use the simulated data, we know the values of y i in the test set and we can use them to check whether the best propensity score identified above is still the best on the test set.
Table 2 shows the lifts obtained from all propensity scores predicted by the QFR model on the test set.It can be seen that the results on the test set are very similar to those on the training set, with the best results again corresponding to the 50% quantile of η i or the median of μ i .This is what we should expect, because both the training set and the test set are simulated from the same model, and hence they have the same data structure.
We also checked performance of the benchmark response models in target selection and included the results in Table 2.It can be seen that the overall performance of our model in target selection is also better than that of the benchmark response models.Note that all three benchmark response models have the same lifts because they are accurate to two decimal places.We further conducted another simulation study in order to confirm that the proposed estimation method performs well and the convergence of the method does not depend on the strength of the prior information on the parameters.Indeed, good results were also obtained.See Appendix D for details.

A marketing application
In this section, we compare the performance of the four response models in terms of prediction and target selection.We also compare the performance of these response models in target selection with some popular machine learning methods commonly used in marketing.

The data
The data considered in this application were collected from marketing campaigns conducted by a Portuguese banking institution in the period from May 2008 to November 2010.Although the original data are not available, Moro, Laureano & Cortez (2011) used data mining techniques and provided a subset of the data, which contains 16 attributes, one output variable and 45211 instances, of which 5289 were successful.Hence the success rate was 11.7%.
The variables given in the data set are defined as follows.The response variable is Y, where Y = 1 represents that the customer subscribed a term deposit and Y = 0 otherwise.The 16 predictors are age (V 1 ), job type (V 2 ), marital status (V 3 ), education level (V 4 ), credit in default (V 5 ), average yearly balance in euros (V 6 ), housing loan (V 7 ), personal loan (V 8 ), contact communication type (V 9 ), last contact day of the month (V 10 ), last contact month of year (V 11 ), last contact duration (V 12 ), number of contacts performed during this campaign (V 13 ), number of days that passed by after the client was last contacted from a previous campaign (V 14 ), number of contacts performed before this campaign and for this client (V 15 ) and the outcome of the previous marketing campaign (V 16 ).
We divided the data set into two random subsets, namely the training set and the test set.The training set consists of 690 customers, and the test set contains the rest of the data.The sample size of 690 was randomly selected between 600 and 800 using the random number generator in the statistical software R. As we do not know the specific structure of the entire data set, we used the simple random sampling method to obtain the training set.We deliberately keep the sample size of the training set small, because in practice, we often need to conduct pilot marketing activities for a small group of customers before we conduct a full scale marketing campaign.Therefore, response models that perform well on smaller data sets are very important in practice.
We will use the training set to estimate the models, and use the test set to examine the performance of models in target selection.For our model, we will also use the training set to determine the propensity score that should be used in target selection.Plots of the data in the training set are shown in Figure 3.It is seen that the data are imbalanced.In fact, the success rate in the training set is 13.6%, very similar to the entire data set.

Estimated response models
We estimated the four response models using the training set.For the QFR model defined by ( 6), ( 5) and ( 8), a Markov Chain was run for 5 × 10 5 steps, whose plot (not shown to save space) suggests that the Markov Chain converged after a burn-in period of the first 10 4 steps.The three benchmark response models were estimated using the statistical software R. Figure 4 summarises the estimated parameters (dots), where each vertical line segment corresponds to the 95% credible interval or confidence interval of a model parameter, and each vertical dashed line indicates that the confidence interval is too wide to be displayed within the range of the graph.

Performance of the response models in prediction
We first consider in-sample prediction.We used the four fitted models to calculate the propensity score of customers in the training set and obtained the total percentage of correct positive predictions at each score threshold.Then, we show the ROC curves of the four models in Figure 5a, where the vertical line corresponds to the typical score threshold 0.5.It can be seen that for in-sample prediction, our model has better performance than the benchmark response models in the entire range of 0-1.For the out-of-sample prediction, we check the average performance of the models.We obtained 100 independent random subsets from the test set, each of which contains 10,000 customers.Then, for each of these 100 subsets, we used the four fitted models to calculate the propensity scores and obtain the total percentage of correct positive predictions.
It is worth mentioning that we do not have any information about the data structure, and therefore, we use 100 independent random subsets of the test set for out-of-sample prediction, which will allow us to check how the models perform on different test sets that may have different unknown structures.
To make the plot clearer, we further calculated the average percentage of correct positive predictions at each score threshold and show the corresponding ROC curves in Figure 5b.Clearly, the performance of our model in prediction is also better than that of the benchmark response models.On the other hand, in the benchmark response models, when the score threshold is less than 0.5, the HC and BQR models are better than the BLR model, otherwise the BLR model is better.

Performance of the response models in target selection
A response model can perform well in prediction, but it may be unsatisfactory in target selection, especially in the case of imbalanced data (Branco, Torgo & Ribeiro 2016).We now examine the performance of the models in target selection.First, we consider our model.
We use the method discussed in Section 4.1 to define the propensity scores in this study.Specifically, for j ≤ 7, let η ij be the j th propensity score of customer i , where η ij is the τ j quantile of η i , and τ j ∈ {0.01, 0.05, 0.25, 0.5, 0.75, 0.95, 0.99}.For j = 8, let η i 8 = η i 7 − η i 1 .Let C j contain the j th propensity score of all customers in the training set.
After ranking customers in the training set according to the propensity scores contained in C j , we calculated the lift in each percentage group.Table 3 shows all the lifts obtained using the training set.It is worth noting that Table 3 does not contain the results corresponding to the propensity scores defined by μ i , because they are the same as the results given in Table 3.
It is seen that the maximum average lift corresponds to j = 5, 6, 7, 8.According to our target selection method, the last four propensity scores are the best propensity scores, and we can use any of them to select targets from the test set.
Figure 6 shows the lifts in Table 3, where the grey and darker curves correspond to j ≤ 4 and j ≥ 5 respectively.Obviously, the grey curves are much lower than the  horizontal line, which means that if the first four propensity scores are used in target selection, the performance of the model in target selection will be much worse than the random model.On the other hand, the dark curves are mainly located above the horizontal line, especially for the first few percentage groups.All these results indicate that if the last four propensity scores are used, the model will produce similar results in target selection, and all these results will be much better than the results obtained from the random model.Therefore, any of the last four propensity scores can be used for target selection.Now, given j ≥ 5, we can calculate the j th propensity score of the customers in the test set, and rank the customers in the test set according to their j th propensity scores.Then, we can select a certain percentage of customers for future marketing activities.Please note that in reality, until the completion of the marketing activities, the performance of the model on the test set for target selection is not known.However, in this study, since the data were collected in the past bank marketing activities, we can use a lift table or lift chart to check the performance of the model on the test set.
More specifically, for comparison purposes, we calculated all eight propensity scores for each customer in the test set, as well as the lift in each percentage group.We present the results in Table 4 and Figure 7.
As expected, Table 4 clearly shows that the last four propensity scores have produced very similar results in target selection, and these results are also much better than those obtained by using the first four propensity scores.It is worth noting that Table 4 also confirms that, when the data are imbalanced, the propensity scores corresponding to the centre of the distribution (50% quantile of η i ) may not be suitable for target selection.
Figure 7 further shows that the lift curves corresponding to the last four propensity scores are much higher than other lift curves, so confirming again that the best propensity scores identified on the training set do produce the best results in target selection on the test set.
We further used the benchmark response models to calculate the propensity scores of customers in the test set and the lift in each percentage group.The results can also be found from Table 4 and Figure 7. Obviously, in terms of target selection, the performance of all benchmark response models is much worse than our model.In fact, they are even worse than the random model in target selection.It is worth noting that in this application, the propensity scores obtained from the benchmark response models correspond to the centre of a skewed distribution, which is similar to our fourth propensity score (that is, the 50% quantile of η i ).Therefore, in this study, they are not as useful as the propensity scores corresponding to the tails of the η i distribution in the target selection.

Performance of some machine learning methods in target selection
Now we further compare our methods with the following popular machine learning methods that are commonly used in marketing.(a) Decision tree (DT) (see, e.g.Hastie, Tibshirani & Friedman 2009), which is a machine learning method that recursively partitions data based on features to create a tree-like structure that predicts target outcomes by traversing the tree's branches according to feature values.(b) Classification and regression trees (CART) (see, e.g.Breiman et al. 1984), which is a DT-based approach for categorical (classification) and continuous (regression) target prediction, where data are divided into subsets at each node according to features, resulting in leaf nodes containing the predicted target value or class.(c) Random forests trees (RFT) (see, e.g.Breiman, 2001), which is an ensemble learning technique that combines multiple DTs, improves prediction accuracy by averaging their outputs and provides robust and reliable target selection by aggregating predictions from different individual trees.(d) K-nearest neighbour classification method (KNN) (see, e.g.Cover & Hart 1967), which assigns target labels to data points by considering the majority class of the k nearest neighbors in the feature space.(e) Quantile regression random forests (QRRF) (see, e.g.Meinshausen 2006), which is a hybrid approach that combines the ensemble power of random forests with quantile regression, allowing  Cortes & Vapnik 1995), which classifies data points into different classes by finding a hyperplane to maximise the separation between data points in feature space, enabling efficient target selection even in complex and high-dimensional data.It is worth noting that some of these methods also work well with imbalanced data.We use each of these methods to perform the target selection and calculate the lift in each percentage group.The results are given in Table 5.Compared with the results obtained from the response models, we see that these machine learning methods outperform the BLR, BQR and HC methods, and the best machine learning method is the classification and regression trees method with an average lift of 1.08.However, these results are still much worse than the results obtained from our method.

Conclusion
We have developed a new response model that can explicitly estimate the entire propensity score distribution.Therefore, it can assign multiple propensity scores to each customer, which provides a way to fill the gap in the literature.To facilitate the use of the proposed model in marketing, we also developed a new target selection method.The target selection method can be used to identify the best propensity score from the propensity scores predicted by the proposed model, and use the identified propensity score to select targets for future marketing activities.
In this paper, we discussed our approach by using a special case of the QFR model ( 6), where Q(τ , γ ) is given by ( 5) and h 1 (α, x i ) and h 2 (β, x i ) are given by (8).In fact, the methodology developed in the paper can be easily extended to other QFR models defined by different Q(τ , γ ), h 1 (α, x i ) and h 2 (β, x i ) functions.
We have seen that the propensity scores identified by our target selection method can indeed produce much improved results in target selection compared to the benchmark response models and the machine learning methods commonly used in marketing.Our results also confirm that not all propensity scores can produce good results in target selection.This explains why the propensity scores predicted by existing response models may not provide satisfactory results in target selection.
It is seen that our model helps us understand and better capture complex patterns and dependencies in customer data, and more accurately predict customer behaviour and responses to marketing campaigns.With our enhanced target selection method, marketers can identify and target specific customer segments more effectively.On the other hand, it is also worth noting that the model we have developed can also be applied to other business problems including, for example, fraud detection and insurance/risk management.
Finally, our target selection method is based on the average lift to determine the optimal propensity score for target selection, which is limited.Therefore, future examination and comparison of many other criteria that may be helpful in determining the optimal propensity score for target selection will be required.

APPENDIX A
Proof of Theorem 1.As Q(τ , γ ) is the quantile function of ξ i , we have Pr As τ is an arbitrary real number between 0 and 1, we see that the quantile function of η i is given by which is the second equation in model ( 6) as required.
Note that μ i = Pr (y i = 1|η i ) = e η i /(1 + e η i ) is a monotone function of η i .Hence, given the quantile function Proof of Theorem 2. We need to show that two distinct parameter vectors, (α, β, γ ) and (α * , β * , γ * ) cannot yield the same value of the maximised likelihood function.
First note that if for the two parameter vectors (α, β, γ ) and (α * , β * , γ * ) such that holds for all x i and τ ∈ (0, 1), then we have where the quantile function of μ * i is given by Hence, we only need to show that if (A1) holds, then we must have α = α * , β = β * and γ = γ * .For (A1) to hold, we need, for all x i and τ ∈ (0, 1), and It follow from (A2) that, for all x i Hence, we must have α i = α * i for i = 0, . . ., k , that is α = α * .For (A3) to hold for all x i and all τ ∈ (0, 1), we need ki for all x i , which means that β = β * must hold; and for all τ ∈ (0, 1).This means that two strictly monotone quantile functions are equal and the parameters do not depend on τ .Therefore, we must have γ 1 = γ * 1 and γ 2 = γ * 2 , that is γ = γ * as required.This completes the proof.
Proof of Theorem 3. First, note that if we let f (x ) be the probability density function of a random variable X and Q x (τ ) its quantile function, then we have f and π i (τ i | θ, x i ) = 1 as τ i is uniformly distributed on (0, 1).This completes the proof.
Since μ i ∈ (0, 1), we see that μ hence it is finite on [ , 1 − ].Therefore, there exists a constant, say M , such that Hence, it follows from

APPENDIX B
The prior density functions for α, β and γ are given below.Clearly, they are density functions truncated on the parameter space of the posterior density function.
, where σ j , s j and λ v are the scale parameters of the respective prior density functions.

APPENDIX C
Our MCMC method is given below.Let μ, τ , α, β, γ represent the current and μ , τ , α , β , γ represent the proposed parameter values.Then our MCMC method consists of the following steps.
Step 6 If the proposed values are accepted, let

Another simulation study
In this simulation study, we show that the proposed estimation method performs well, and the convergence of the method does not depend on the strength of the prior information on the parameters.For i = 1, . . ., 500, we simulated x i uniformly on [0, 5] and η i from which can be achieved by simulating τ i uniformly on (0, 1) and calculating η i = Q η i (τ i |x i ) using (D1).Then μ i values can be calculated by μ i = e η i /(1 + e η i ).Finally, we let y i = 1 with probability μ i and y i = 0 otherwise.By repeating these steps 200 times, we obtained 200 independent data sets.Note that in reality both η i and μ i are not observable, but here we will use them to check the performance of our estimation method.Specifically, we will (a) compare the distribution of the estimated standardised residuals ri = {η i − ( α0 + α1 x i )}/{ β0 + β1 x 2 i } (i = 1, . . ., 500) with the true one Q(τ , γ ) = (τ −0.35 − 1)/(−0.35)− {(1 − τ ) −0.5 − 1}/(−0.5); (D2) (b) compare the estimated distribution of μ with the true one defined by (7); and (c) check the coverage probabilities of the estimated quantile curves of η and μ respectively.If the estimation method performs well, we should expect a good agreement between the estimated results and true results.
It is worth mentioning that the strength of the prior information about the parameters of the posterior density function is measured by the values of σ j , s and λ v (see Appendix B).For example, a large (or small) value of σ j suggests that the prior information on α j is weak (or strong).To check the effect of the prior information on the estimation, we let σ j = s = λ v = ξ , where ξ = 1, 2, 3. So, the variance of α j is given by 1, 4 and 9 for ξ = 1, 2 and 3 respectively, while the variances of β and γ v are the same, given by 4.671, 2926 and 6.565 × 10 7 corresponding to ξ = 1, 2 and 3 respectively.Clearly, when ξ = 2 the strength of the prior information has already become very weak.Now, for each value of ξ , a Markov Chain was run for 5 × 10 5 steps.Testing runs suggest that a burn-in period of the first 2 × 10 4 steps is enough.The posterior samples were then collected after the burn-in period.The first two rows of Figure D1 show the time series plots of the posterior samples obtained by using the first simulated data set and ξ = 2, which suggests that the convergence of the Markov Chain has been achieved.The last two rows of Figure D1 show the probability density function plots of the posterior marginal distributions of the parameters, where the vertical lines correspond to the true parameter values, and the darker continuous, grey continuous and dotted curves correspond to priors with ξ = 1, 2 and 3 respectively.It is seen that in all cases the true parameter values are well within the range of the posterior marginal distributions.To save space, we will present our results corresponding to ξ = 2 in the paper.
For each data set, we recorded the Bayesian estimate together with an associated 95% credible interval, leading to 200 credible intervals for each parameter.Table D1 shows the  true parameter values and the average value of the lower (upper) bounds of these credible intervals.It is seen that all the true parameter values are well within the respective lower and upper bounds, suggesting a good performance of the method.We now compare the distribution of ri with (D2).A good performance of the method is expected if the two distributions are not significantly different.So we estimated a probability density function by using ri (i = 1, . . ., 500) for each data set, which is then compared with (D2) by using the Kolmogorov-Smirnov test.We found that the average p-value of the 200 tests is 0.0642 with a 95% confidence interval [0.0454, 0.0831].Hence the Kolmogorov-Smirnov test shows that, on average, the distribution of ri is not significantly different from the true distribution defined by (D2) at a 1% level of significance.
Similarly, we compared the estimated and true distributions of μ.We found that the average p-value of the Kolmogorov-Smirnov tests over 200 simulations is 0.0433 with a 95% confidence interval ranging from 0.0313 to 0.0554.Hence, these two distributions are also not significantly different at a 1% level of significance, providing further evidence about the good performance of the method.
Let us now consider the estimated conditional quantiles of η and μ.For τ ∈ {0.025, 0.05 , 0.25, 0.5, 0.75, 0.95, 0.975}, let n τ η (or n τ η /500) and n τ μ (or n τ μ /500) be the number (or proportion) of observed η i and μ i in the th simulated data set that are less than Qη i (τ | α, β, γ , x i ) and Qμ i (τ | α, β, γ , x i ) respectively, where = 1, . . ., 200.For each , we further calculated mean squared error (MSE) between n τ η /500 and τ and between n τ μ /500 and τ respectively.A good performance of the method is expected if these MSE values are all small for = 1, . . ., 200.The average MSE value for η is 0.0065 with a 95% confidence interval [0.006, 0.0069] and that for μ is 0.0051 with a 95% confidence interval [0.0049, 0.0054].Clearly, both of these MSEs are very small.In summary, the performance of the estimation method is satisfactory in this study.

©
2024 The Authors.Australian & New Zealand Journal of Statistics published by John Wiley & Sons Australia, Ltd on behalf of Statistical Society of Australia.

Figure 1 .
Figure 1.Plots of the first simulated data set in the simulation study.

Figure 2 .
Figure 2. AUC plots for the fitted models.(a)-(c) Black curves for QFR model and grey curves for BLR, HC and BQR models.(d, e) Black curves for BLR model and grey curves for HC and BQR models.(f) Black curve for HC model and grey curve for BQR model.

Figure 3 .
Figure 3. Plots of the observed bank marketing data in the training set.

Figure 4 .
Figure 4.The parameters (points) of the four response models estimated using the bank marketing data.The vertical line segment shows the 95% credible/confidence interval.The vertical dashed line indicates that the confidence interval is outside the range of the graph.

Figure 5 .
Figure 5. (a) ROC curves on the training set.(b) ROC curves on the test set.The vertical line corresponds to the score threshold 0.5.The continuous, dash-dotted, dotted and dashed ROC curves correspond to QFR, HC, BQR and BLR models respectively.

Figure 6 .
Figure 6.The lift chart for propensity score determination.The grey and darker curves correspond to j ≤ 4 and j ≥ 5 respectively.The horizontal line represents the baseline.

Figure 7 .
Figure 7.The lift chart for the test set: Grey and black curves correspond to the QFR model with j ≤ 4 and j ≥ 5 respectively.Dotted, dot-dashed and dashed curves correspond to BLR, BQR and HC models respectively.Horizontal line represents the baseline.

©
2024 The Authors.Australian & New Zealand Journal of Statistics published by John Wiley & Sons Australia, Ltd on behalf of Statistical Society of Australia.

Figure D1 .
Figure D1.Top two rows: Time series plots of the posterior samples of the model parameters for ξ = 2.Last two rows: Plots of the posterior marginal density functions of the model parameters, where the darker continuous, grey continuous and dotted curves correspond to ξ = 1, 2 and 3 respectively, and the vertical lines correspond to the true model parameters.

Table 1 .
Lifts for propensity score determination.

Table 2 .
Lifts for test set in target selection.

Table 3 .
Lifts for propensity score determination.

Table 4 .
Lifts obtained from different models on the test set.

Table 5 .
Lifts obtained from machine learning methods.