Modeling category-level purchase timing with brand-level marketing variables



Abstract

Purchase timing of households is usually modeled at the category level. However, many potential explanatory variables are observed at the brand level. To explain interpurchase times, one has to either construct category-level measures of marketing effort or integrate the timing model with a brand choice model. In this paper we pursue the latter, using latent brand preferences to capture the relevance of the marketing mix of an individual brand. We compare our new model with several standard approaches on in-sample and out-of-sample fit and on the interpretation of the estimates of key parameters. Copyright © 2009 John Wiley & Sons, Ltd.

1. INTRODUCTION

It is important for store managers to understand the influence of the marketing mix on purchase timing. This information can, for example, be used to determine the optimal time between promotional activities and for active stock management. Several models have been proposed in the literature to describe purchase timing; see Gupta (1991), Jain and Vilcassim (1991), Helsen and Schmittlein (1993), Seetharaman (2004), and Seetharaman and Chintagunta (2003) for a recent overview. These models usually aim to describe the relation between interpurchase times and various explanatory variables, which can be divided into two groups. The first group consists of household-specific variables, such as household size and family income, but also the current stock of the product and the time since the last purchase in the product category. These variables can be linked directly to the interpurchase times. The second group contains marketing-mix variables, such as price and the presence of promotional activities. These variables cannot be linked directly to the interpurchase times, because marketing-mix variables are observed at the brand level whereas purchase timing is modeled at the category level.

In the ideal case, we would know the preferred brand of each household at every moment in time. To explain purchase timing, we could then use the marketing mix of the brand that is purchased, or would be purchased, at any moment. In practice this is, of course, not feasible. First, such data would be practically impossible to collect. Second, the household may not have a unique preferred brand at every point in time. The researcher should therefore either summarize the marketing efforts of all brands in a category-level index or construct an integrated model of brand choice and purchase timing. The latter is the approach we take in this paper. The key question can be summarized as: how can the marketing-mix variables of individual brands be included in a category-level interpurchase time model?

One may think that the answer to this question is just to use the marketing mix of the purchased brand. The problem is, however, that brand choice is only revealed at purchase occasions and is not available at non-purchase moments. One may opt to use the marketing mix of the previously purchased brand for the non-purchase moments, but this is likely to be suboptimal as households may switch brands; see, for example, Vilcassim and Jain (1991). In fact, a household may change preferences several times between two purchases, especially if the marketing mix changes in this period, for example, due to promotions.

We are, of course, not the first to notice these problems in modeling interpurchase times. In every purchase timing study the researcher has to decide how to construct category-level marketing-mix variables. An often-used solution is to take a weighted average of the brand-specific marketing-mix variables. The weights are usually household specific and are obtained from the choice shares of the particular household; see, for example, Gupta (1988, 1991) and Chib et al. (2002). A disadvantage of weighting the marketing mix by choice shares is that household-specific information is required to obtain the weights, which makes this approach less suitable for out-of-sample forecasting. Furthermore, as choice shares are by definition constant over the period used to compute them, the model does not take into account that preferences may change. A simple alternative is to weight the marketing mix by brand choice probabilities following from a logit specification, although there is a more elegant solution, which we propose below.

Another popular approach amounts to using the so-called inclusive value from a brand choice model as a summary statistic for the marketing efforts in a category; see, among others, Bucklin and Gupta (1992), Chintagunta and Prasad (1998), and Bell et al. (1999). The inclusive value has the interpretation of the expected maximum utility over all brands in the category, and it naturally depends on the marketing mix of all brands. A large expected utility is likely to be positively correlated with the probability of a purchase in the category. Although theoretically appealing, this specification is rather restrictive. In the corresponding purchase timing model only one parameter relates all marketing efforts of all brands to purchase timing, namely the coefficient of the inclusive value. Moreover, the effects of the marketing variables on purchase timing are restricted to be proportional to their effects on brand choice. Another problem may be that the relation between the inclusive value and purchase incidence holds only within households. Between households there may be substantial differences in inclusive value that are unrelated to differences in purchase timing. A household with a strong brand preference may have a larger inclusive value than a household with less pronounced preferences, but one cannot conclude from this that the former household will on average have shorter interpurchase times. The between-household differences become even more pronounced when unobserved heterogeneity in brand preferences is incorporated in the brand choice model.

To address the limitations of the solutions above, we introduce a new model in this paper. The idea behind this model is to use brand choice probabilities as indicators of brand preferences, since they are the best information we have on the preferences of households. In the model we combine brand-specific hazard functions using these choice probabilities. During non-purchase weeks the brand choice is treated as a latent variable, which we observe only at purchase occasions.

This idea not only potentially improves the purchase timing model, but may also improve the brand choice part of the model. The fact that a household does not make a purchase in a particular week also reveals information about its preferences. Although this information may be very useful, it is usually ignored when modeling brand choice. For example, consider a household that frequently purchases a certain brand that is also frequently promoted, and assume that this household never purchases other brands when they are promoted. If one considers only purchase occasions, one may overestimate the effect of promotions on brand choice, as promotions during non-purchase weeks are completely ignored. The fact that the household does not purchase the other brands even when they are promoted implies that it has a strong base preference for the frequently purchased brand.

Integration of the interpurchase time model and the brand choice model could therefore lead to better performance in explaining both brand choice and interpurchase times. In the resulting model the brand choices of households are revealed at purchase occasions, while at non-purchase occasions the preferred brand is treated as a latent (unobserved) variable. In this way, we also use the information revealed by households at non-purchase occasions to describe brand choices and interpurchase timing. We call this specification the latent preference purchase timing model. This integrated model is also useful if one is interested only in purchase timing, as it provides a coherent framework for including marketing-mix variables in the duration model.

The outline of this paper is as follows. In Section 2 we propose the latent preference purchase timing model. We briefly discuss two standard approaches in the literature as well as a third alternative. In Section 3 we compare our new approach with the alternative solutions using data on purchases in the detergent category. We present a comparison based on in-sample fit and out-of-sample forecasting performance. Furthermore, we discuss differences in estimates of key parameters in the different models. Finally, in Section 4 we conclude.

2. MODELING INTERPURCHASE TIMING

One of the most popular approaches to describe duration data in economics is to use the concept of a hazard specification; see, for example, Lancaster (1979) and Kiefer (1988). This model is also frequently used to describe purchase timing at the category level; see, for example, Jain and Vilcassim (1991), Vilcassim and Jain (1991), Helsen and Schmittlein (1993) and Chintagunta and Haldar (1998), among many others. Before we discuss the hazard specification for modeling purchase timing, we first introduce some notation.

Let $d_{in}$ be the calendar time of the $n$th purchase of household $i$, $n = 0, \dots, N_i$. The $N_i$ observed interpurchase times are therefore defined as $t_{in} = d_{in} - d_{i,n-1}$, $n = 1, \dots, N_i$. Note that $t$ refers to the time within a particular duration: $t$ equals 0 at the beginning of each duration.

Explanatory variables may vary during the spell. In practice, they evolve according to a step function; that is, they are constant during a period of time and then jump to a new level. Marketing variables, for instance, tend to be constant within a calendar week. Denote by $\tau_l$, $l = 1, \dots, L$, the times at which one of the covariates changes. For ease of discussion we refer to the period between $\tau_l$ and $\tau_{l+1}$ as a ‘week’, although the time between $\tau_l$ and $\tau_{l+1}$ does not have to be equal for different $l$. Using this notation, week 1 corresponds to the interval $(\tau_0, \tau_1]$. Furthermore, denote by $K_{in}(t)$ the week number corresponding to $t$ time periods after the start of the $n$th interpurchase spell of household $i$. Figure 1 gives a graphical representation of the purchase process. In this example there are purchases in weeks 2 and 4, so that $K_{in}(0) = 2$ and $K_{in}(t_{in}) = 4$.

Figure 1. Graphical representation of purchase occasions $d_{i,n}$, interpurchase times $t_{i,n}$ and time indices $\tau_l$ of changes in covariates.
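The bookkeeping of interpurchase times and week numbers can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the purchase times and change points are hypothetical, and week $k$ is taken to be the interval $(\tau_{k-1}, \tau_k]$, as in the text.

```python
from bisect import bisect_left

def interpurchase_times(purchase_times):
    """t_in = d_in - d_{i,n-1}, n = 1, ..., N_i, from calendar purchase times d_in."""
    return [d1 - d0 for d0, d1 in zip(purchase_times, purchase_times[1:])]

def week_index(calendar_time, tau):
    """Week k with tau[k-1] < calendar_time <= tau[k]; tau = [tau_0, ..., tau_L]."""
    k = bisect_left(tau, calendar_time)
    if k == 0 or k == len(tau):
        raise ValueError("calendar time outside (tau_0, tau_L]")
    return k

def K(d_prev, t, tau):
    """K_in(t): week number t time units after the purchase at d_prev = d_{i,n-1}."""
    return week_index(d_prev + t, tau)

# Hypothetical data: weekly change points and three purchases of one household.
tau = [0.0, 7.0, 14.0, 21.0, 28.0]
d = [10.0, 24.5, 26.0]
t = interpurchase_times(d)                     # [14.5, 1.5]
print(K(d[0], 0.0, tau), K(d[0], t[0], tau))   # 2 4, as in the example of Figure 1
```

Because weeks are right-closed intervals, `bisect_left` is the matching search: a calendar time equal to a change point $\tau_k$ is assigned to week $k$.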

The hazard function for the nth interpurchase time for household i is denoted by λin(t), where t = 0 corresponds to the start of the interpurchase spell. As the basic building block of the model we use a general hazard function g(t; win(t)), where win(t) denotes a vector of explanatory variables at duration t associated with the nth interpurchase time. The specification of g(·) depends on the type of hazard model chosen; for example, the proportional hazard specification (Cox, 1972) is given by

$$\lambda_{in}(t) = g(t; w_{in}(t)) = \lambda_0(t) \exp\big(w_{in}(t)'\gamma\big) \qquad (1)$$

where λ0(t) is a baseline hazard function; see Gupta (1991), among many others, for a similar approach. In this specification the sign of an element of γ gives the direction of the effect of an increase in the corresponding element of win(t) on the hazard. That is, if this element is positive, an increase in the covariate raises the hazard and hence decreases the expected interpurchase time. Our approach, however, does not depend on the particular choice of hazard specification. One could also consider the Accelerated Lifetime Model (Prentice and Kalbfleisch, 1979; Chintagunta, 1998) or the Additive Risk specification (Aalen, 1980; Seetharaman, 2004).
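The defining property of the proportional hazard form (1) is that the ratio of the hazards for two covariate values does not depend on t. A minimal Python sketch; the baseline hazard and the coefficients are hypothetical and chosen only for illustration.

```python
import math

def baseline_hazard(t):
    # Hypothetical baseline lambda_0(t); any positive function would do here.
    return 0.1 + 0.05 * t

def ph_hazard(t, w, gamma, lambda0=baseline_hazard):
    """Proportional hazard (1): lambda_0(t) * exp(w(t)' gamma)."""
    return lambda0(t) * math.exp(sum(wk * gk for wk, gk in zip(w, gamma)))

gamma = [0.8, -0.3]                         # hypothetical coefficients
w_low, w_high = [0.0, 1.0], [1.0, 1.0]      # raise the first covariate by one unit
ratios = [ph_hazard(t, w_high, gamma) / ph_hazard(t, w_low, gamma)
          for t in (0.5, 3.0, 10.0)]
# Each ratio equals exp(0.8) up to rounding: the hazard rises, so durations shorten.
print(ratios)
```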

Marketing-mix variables are constant within one calendar week, which leads to the natural assumption that the brand preferences of households are also constant within a week. We denote the brand preference of household $i$ at duration $t$ of the $n$th interpurchase spell by $y_{i,K_{in}(t)}$. Although one could consider smaller time intervals to allow for more frequent changes in preference, the (discrete) preference process, by definition, cannot evolve in continuous time.

If one has only household-specific variables, the hazard function in (1) can be applied directly by including the household characteristics in win(t). However, if one wants to include marketing instruments in the model, it is unclear which brand's marketing-mix variables, or which combination of brand-specific variables, should be included in win(t). Adding the marketing instruments of all brands leads to a model with many parameters, especially if the number of brands is large, and these parameters will be difficult to interpret. Adding the marketing mix of the preferred brand is impossible, as brand choice is only revealed at purchase occasions and not in between. Instead, the researcher should either model brand choice and purchase timing simultaneously or aggregate the marketing instruments of all brands to the category level. The weights in the aggregation can, of course, differ across households. Integrating both models seems preferable, as the brand choice model gives the best summary of households' brand preferences.

Below we discuss the integrated approach in more detail. Alternative methods will be discussed in Section 2.2. For simplicity of notation we will assume that the model only includes marketing instruments. Other types of explanatory variables can be included in the usual way.

2.1. Latent Preference Purchase Timing Model

The brand choice of a household in a certain week is observed if the household makes a purchase in the product category. To describe brand choice at the purchase occasion and the household's preferences in the weeks in between purchases, we use a conditional logit specification:

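$$\Pr[Y_{ik} = j] = \frac{\exp(\alpha_j + x_{ijk}'\beta)}{\sum_{l=1}^{J} \exp(\alpha_l + x_{ilk}'\beta)}, \qquad j = 1, \dots, J \qquad (2)$$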

where the random variable Yik gives the (hypothetical) brand choice of household i out of J brands in week k, and where β measures the effect of the marketing mix xijk on brand choice; see, for example, Guadagni and Little (1983). We impose αJ = 0 for identification. The variable xijk denotes the marketing mix of brand j experienced by household i in week k.
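A minimal sketch of the conditional logit probabilities in (2) for one household-week, with the identification restriction $\alpha_J = 0$ enforced. The intercepts, coefficients and (price, display) values are hypothetical; the shift by the maximum utility is a standard numerical safeguard, not part of the model.

```python
import math

def logit_probabilities(alpha, beta, x_week):
    """Conditional logit (2) for one household-week.

    alpha:  brand intercepts alpha_1..alpha_J, with alpha_J = 0 for identification
    beta:   marketing-mix coefficients
    x_week: x_week[j] is the marketing-mix vector of brand j in this week
    """
    if alpha[-1] != 0:
        raise ValueError("identification requires alpha_J = 0")
    v = [a + sum(b * xk for b, xk in zip(beta, xj)) for a, xj in zip(alpha, x_week)]
    m = max(v)                                  # shift for numerical stability
    e = [math.exp(vj - m) for vj in v]
    total = sum(e)
    return [ej / total for ej in e]

# Hypothetical values: three brands, (price, display) per brand.
alpha = [0.5, -0.2, 0.0]
beta = [-1.5, 0.9]
no_display = [[5.1, 0], [5.0, 0], [5.2, 0]]
second_on_display = [[5.1, 0], [5.0, 1], [5.2, 0]]
p0 = logit_probabilities(alpha, beta, no_display)
p1 = logit_probabilities(alpha, beta, second_on_display)
print([round(p, 3) for p in p0], [round(p, 3) for p in p1])
```

With a positive display coefficient, putting the second brand on display raises its probability and lowers those of the other brands.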

During non-purchase weeks, we do not observe brand choice but we assume that households do have a preferred brand. The preferred brand choice is treated as a latent variable and takes the role of the brand choice. To explain the purchase timing part of the model, consider the hypothetical situation where we know the preferred brands of households in all weeks, including those where no purchase is made. Assume that the preferred brand in week k is given by yik and that the hazard function in this week is given by

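$$\lambda_{in}(t \mid y_{ik}) = \sum_{j=1}^{J} I[y_{ik} = j]\, g\big(t; x_{ijn}(t)\big), \qquad k = K_{in}(t) \qquad (3)$$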

where we use $x_{ijn}(t)$ as shorthand notation for $x_{ij,K_{in}(t)}$, g(·) is given in (1), and I[·] is an indicator function that equals one if the condition in brackets is true and zero otherwise. The probabilities of the preferred brands are given by the logit probabilities Pr[Yik = yik] in (2).

In practice, the marketing-mix variables are constant during a week. Therefore, we assume that the brand preferences given the marketing mix are also constant during a week. Given these assumptions, the joint density function of a duration starting at di, n−1 and preferred brands yik for weeks k = Kin(0), …, Kin(t) is given by

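$$f(t, y_{i,K_{in}(0)}, \dots, y_{i,K_{in}(t)}) = \lambda_{in}(t \mid y_{i,K_{in}(t)})\, S_{in}(t \mid y_{i,K_{in}(0)}, \dots, y_{i,K_{in}(t)}) \prod_{k=K_{in}(0)}^{K_{in}(t)} \Pr[Y_{ik} = y_{ik}] \qquad (4)$$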

where $S_{in}(t \mid y_{i,K_{in}(0)}, \dots, y_{i,K_{in}(t)})$ denotes the survival function given the brand preferences up to time t; see Kiefer (1988) and van den Berg (2001). The survival function gives the probability of no purchase up to time t and is defined through the integrated hazard function, that is,

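$$S_{in}(t \mid y_{i,K_{in}(0)}, \dots, y_{i,K_{in}(t)}) = \exp\Big(-\int_0^t \lambda_{in}\big(s \mid y_{i,K_{in}(s)}\big)\, ds\Big) \qquad (5)$$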

Using the assumption of constant marketing mix and brand preferences during a week, we can expand the survival function to obtain

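$$
\begin{aligned}
f(t, y_{i,K_{in}(0)}, \dots, y_{i,K_{in}(t)}) ={}& \Pr[Y_{i,K_{in}(t)} = y_{i,K_{in}(t)}]\, \lambda_{in}(t \mid y_{i,K_{in}(t)}) \exp\Big(-\int_{\tau_{K_{in}(t)-1} - d_{i,n-1}}^{t} \lambda_{in}(s \mid y_{i,K_{in}(t)})\, ds\Big) \\
&\times \Pr[Y_{i,K_{in}(0)} = y_{i,K_{in}(0)}] \exp\Big(-\int_{0}^{\tau_{K_{in}(0)} - d_{i,n-1}} \lambda_{in}(s \mid y_{i,K_{in}(0)})\, ds\Big) \\
&\times \prod_{k=K_{in}(0)+1}^{K_{in}(t)-1} \Pr[Y_{ik} = y_{ik}] \exp\Big(-\int_{\tau_{k-1} - d_{i,n-1}}^{\tau_k - d_{i,n-1}} \lambda_{in}(s \mid y_{ik})\, ds\Big)
\end{aligned} \qquad (6)
$$

Expression (6) is written for a spell covering more than one week, $K_{in}(t) > K_{in}(0)$; if the purchase takes place in the week in which the spell starts, only the first factor remains, with the integral starting at 0.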

The first part of (6) refers to the week in which the purchase is made. The middle part concerns the period from the start of the spell to the end of the first week. The third part of (6) deals with all other periods of constant preferences and marketing mix.

So far we have assumed that we know the preferred brands, even in weeks without any purchase. Of course, we do not observe brand preferences in these weeks. Hence, to obtain the joint density of the interpurchase time t and the (observed) brand choice $y_{i,K_{in}(t)}$ at the purchase occasion, we sum (6) over all possible values of yik in weeks $k = K_{in}(0), \dots, K_{in}(t) - 1$, that is,

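$$f(t, y_{i,K_{in}(t)}) = \sum_{y_{i,K_{in}(0)}=1}^{J} \cdots \sum_{y_{i,K_{in}(t)-1}=1}^{J} f(t, y_{i,K_{in}(0)}, \dots, y_{i,K_{in}(t)}) \qquad (7)$$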
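The nested sums in (7) need not be evaluated term by term. Each factor in (6) involves the preference of a single week, provided that the choice probabilities do not depend on the latent preferences of earlier weeks (for instance, when the lagged brand choice in (2) refers to the last observed purchase and is therefore fixed within the spell; this is our assumption). The sums can then be pushed inside the product. Writing $\Lambda_{ik}(j)$ (our notation, not the paper's) for the integrated hazard over the part of week $k$ that falls inside the spell when brand $j$ is preferred, (7) becomes a product of per-week mixtures, so its cost grows linearly in the number of weeks instead of as $J^{K_{in}(t)-K_{in}(0)}$:

$$f(t, y_{i,K_{in}(t)}) = \Pr[Y_{i,K_{in}(t)} = y_{i,K_{in}(t)}]\, \lambda_{in}(t \mid y_{i,K_{in}(t)})\, e^{-\Lambda_{i,K_{in}(t)}(y_{i,K_{in}(t)})} \prod_{k=K_{in}(0)}^{K_{in}(t)-1} \sum_{j=1}^{J} \Pr[Y_{ik} = j]\, e^{-\Lambda_{ik}(j)}$$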

The likelihood contribution Lin of the nth interpurchase time of household i, which ends with a purchase of brand $y_{i,K_{in}(t_{in})}$, equals $f(t_{in}, y_{i,K_{in}(t_{in})})$. Hence, the joint likelihood function of the interpurchase times and the brand choices reads

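$$L = \prod_{i} \prod_{n=1}^{N_i} L_{in} = \prod_{i} \prod_{n=1}^{N_i} f\big(t_{in}, y_{i,K_{in}(t_{in})}\big) \qquad (8)$$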

Parameter estimates for the interpurchase time and brand choice model can be obtained by maximizing the log-likelihood function with respect to the model parameters. As in any continuous-time duration model, the model automatically accounts for non-purchase weeks through the contribution of the integrated hazard to the likelihood. Roughly speaking, maximizing the likelihood amounts to making the hazard large at purchase occasions while keeping the integrated hazard small over all non-purchase periods.
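To make (6) and (7) concrete, the sketch below evaluates one likelihood contribution $L_{in}$ under several assumptions that are ours, not the paper's: a proportional hazard with the log-logistic baseline later used in Section 3.2, separate coefficient vectors `beta` (brand choice) and `theta` (hazard), no dependence of the choice probabilities on earlier latent preferences, and made-up covariates. The sum over latent preferences in (7) is computed in two ways, by enumerating all preference paths and by a week-by-week product of mixtures; the latter is valid because each factor of (6) involves a single week's preference, so the two results must agree.

```python
import itertools
import math
from bisect import bisect_left

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def cumhaz0(t, a, g):
    # Assumed log-logistic baseline: Lambda_0(t) = log(1 + (g t)^a)
    return math.log1p((g * t) ** a)

def hazard0(t, a, g):
    # Derivative of cumhaz0: a g (g t)^(a-1) / (1 + (g t)^a)
    return a * g * (g * t) ** (a - 1) / (1 + (g * t) ** a)

def week_terms(d_prev, t, tau, x, alpha, beta, theta, a, g):
    """Per week k of the spell: logit probabilities Pr[Y_ik = j] and, for each
    brand j, exp(-integrated hazard over the part of week k inside the spell)."""
    k0, kt = bisect_left(tau, d_prev), bisect_left(tau, d_prev + t)
    terms = {}
    for k in range(k0, kt + 1):
        s = max(0.0, tau[k - 1] - d_prev)       # segment start on the duration scale
        e = min(t, tau[k] - d_prev)             # segment end on the duration scale
        v = [aj + dot(beta, xj) for aj, xj in zip(alpha, x[k])]
        m = max(v)
        w = [math.exp(vj - m) for vj in v]
        total = sum(w)
        p = [wj / total for wj in w]
        dH0 = cumhaz0(e, a, g) - cumhaz0(s, a, g)
        surv = [math.exp(-math.exp(dot(theta, xj)) * dH0) for xj in x[k]]
        terms[k] = (p, surv)
    return k0, kt, terms

def purchase_factor(t, y, kt, terms, x, theta, a, g):
    # Purchase week: choice probability x hazard at t x survival within the week
    p, surv = terms[kt]
    return p[y] * hazard0(t, a, g) * math.exp(dot(theta, x[kt][y])) * surv[y]

def density_factorized(d_prev, t, y, tau, x, alpha, beta, theta, a, g):
    k0, kt, terms = week_terms(d_prev, t, tau, x, alpha, beta, theta, a, g)
    f = purchase_factor(t, y, kt, terms, x, theta, a, g)
    for k in range(k0, kt):                     # non-purchase weeks: mixture over j
        p, surv = terms[k]
        f *= sum(pj * sj for pj, sj in zip(p, surv))
    return f

def density_bruteforce(d_prev, t, y, tau, x, alpha, beta, theta, a, g):
    k0, kt, terms = week_terms(d_prev, t, tau, x, alpha, beta, theta, a, g)
    J, weeks, total = len(x[kt]), list(range(k0, kt)), 0.0
    for path in itertools.product(range(J), repeat=len(weeks)):  # all latent paths
        term = 1.0
        for k, j in zip(weeks, path):
            p, surv = terms[k]
            term *= p[j] * surv[j]
        total += term
    return total * purchase_factor(t, y, kt, terms, x, theta, a, g)

# Hypothetical inputs: weekly change points and (price, display) per brand per week.
tau = [0.0, 7.0, 14.0, 21.0, 28.0]
x = {2: [[5.1, 0], [5.0, 1], [5.2, 0]],
     3: [[5.1, 1], [5.0, 0], [5.2, 0]],
     4: [[4.8, 0], [5.0, 0], [5.2, 1]]}
params = dict(tau=tau, x=x, alpha=[0.4, -0.1, 0.0], beta=[-1.2, 0.8],
              theta=[-0.5, 0.6], a=1.5, g=0.15)
f_fact = density_factorized(10.0, 14.5, 2, **params)   # spell in weeks 2..4, brand 3 bought
f_enum = density_bruteforce(10.0, 14.5, 2, **params)
print(f_fact, f_enum, math.log(f_fact))
```

The enumeration grows as $J$ to the power of the number of non-purchase weeks, so in an actual optimizer the factorized form is the practical one; the log-likelihood of (8) is the sum of `math.log(f)` over all spells and households.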

We end this subsection with a note on the interpretation of the parameters in the brand choice part of the model. The interpretation of the parameters of the brand choice specification in the latent preference model (7) is different from the interpretation of a separate brand choice model. The separate brand choice model considers brand choice conditional on a purchase. The brand choice specification in the joint model also captures preferences during non-purchase occasions.

2.2. Alternative Approaches

The current solution treats the brand preferences at non-purchase occasions as a latent variable. In this section we briefly discuss three alternative approaches to model interpurchase times while still using brand-specific explanatory variables; two of these are widely used in the literature. One possible solution is to simply weight the marketing mix with the observed household-specific choice shares as in, for example, Gupta (1988, 1991). The choice shares are constant over time. To allow for a more flexible specification, one may also replace the choice shares by the brand choice probabilities. Another approach is to include an inclusive value from a brand choice model in the hazard specification as in, for example, Chintagunta and Prasad (1998). Below we discuss these alternatives in more detail.

Choice Share Weighted Average of Marketing Mix

To construct category-level marketing-mix variables, one may weight the marketing mix of the J brands by the observed choice shares. The hazard function is then given by

$$\lambda_{in}(t) = g\Big(t; \sum_{j=1}^{J} c_{ij}\, x_{ijn}(t)\Big) \qquad (9)$$

where cij denotes the observed choice share of brand j for household i, and the function g(·) is defined in (1). The household-specific choice shares are usually estimated from the in-sample purchases, so out-of-sample forecasts have to be based on the in-sample choice shares. This approach is therefore not useful when one wants to predict the purchase timing of households for which no purchase history is available.
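A minimal sketch of the inputs to (9): the choice shares of one household computed from its in-sample purchase history, and the resulting category-level marketing mix that enters g(·). The history and the (price, display) values are hypothetical.

```python
from collections import Counter

def choice_shares(purchased_brands, J):
    """Observed choice shares c_ij of one household over brands 0..J-1."""
    counts = Counter(purchased_brands)
    n = len(purchased_brands)
    return [counts[j] / n for j in range(J)]

def weighted_mix(weights, x_week):
    """Category-level marketing mix sum_j w_j x_j for one week."""
    return [sum(w * xj[d] for w, xj in zip(weights, x_week))
            for d in range(len(x_week[0]))]

# Hypothetical in-sample purchase history (brand indices 0..5) of one household.
history = [3, 3, 0, 3, 4, 3, 5, 3]
c = choice_shares(history, J=6)          # [0.125, 0.0, 0.0, 0.625, 0.125, 0.125]
# Hypothetical (price, display) of the six brands in one week.
x_week = [[5.1, 0], [5.2, 0], [5.0, 1], [4.9, 1], [5.2, 0], [5.1, 0]]
print(weighted_mix(c, x_week))
```

Passing the logit probabilities of (2) instead of `c` as `weights` gives the preference-weighted marketing mix of (12).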

For this model, the likelihood contribution of the nth interpurchase time follows from standard duration theory and is given by

$$L_{in} = \lambda_{in}(t_{in}) \exp\Big(-\int_0^{t_{in}} \lambda_{in}(s)\, ds\Big) \qquad (10)$$

Note that in this case we do not need to estimate a choice model, as the choice share weights can be obtained directly from the data. To facilitate comparison of the likelihood value of this model with those of the latent preference model and the other alternatives discussed below, we augment (10) with the choice probability of the brand actually chosen at the end of the nth interpurchase time, that is, $\Pr[Y_{i,K_{in}(t_{in})} = y_{i,K_{in}(t_{in})}]$; see (2). The complete likelihood function then becomes

$$L = \prod_{i} \prod_{n=1}^{N_i} L_{in}\, \Pr\big[Y_{i,K_{in}(t_{in})} = y_{i,K_{in}(t_{in})}\big] \qquad (11)$$

Preference Weighted Average of the Marketing Mix

An alternative to the choice share approach is to use a weighted average of the marketing mix, where the weighting scheme follows from choice/preference probabilities Pr[Yik = j] in week k. For this weighting scheme, the hazard specification is given by

$$\lambda_{in}(t) = g\Big(t; \sum_{j=1}^{J} \Pr[Y_{i,K_{in}(t)} = j]\, x_{ijn}(t)\Big) \qquad (12)$$

where again g(·) is defined in (1) and the brand choice probability Pr[Yik = j] is given in (2). The advantage of this approach over using choice shares as weights is that the weights can evolve over time, so that changes in preferences, for example due to promotions, are accounted for. Additionally, this approach can be used to construct out-of-sample weights for households with unknown purchase history. Although the approach may seem straightforward, to our knowledge it has not been used in the literature.

Note that this approach is different from our latent preference purchase timing model. Here we construct a weighted average of the marketing mix, while in the latent preference model we weight (loosely speaking) the brand-specific hazard specifications with the latent brand preference.
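The difference between averaging covariates and averaging brand-specific terms can be seen in a single non-purchase week of a proportional hazard model. In this toy sketch all numbers are hypothetical: `H0` is an assumed baseline integrated hazard for the week and `gamma` a display coefficient. The (12)-style term evaluates one hazard at the preference-weighted display, whereas the latent-preference term averages the brand-specific survival terms over the preference probabilities, as in a non-purchase week of (7).

```python
import math

def weekly_survival_weighted_mix(p, x, gamma, H0):
    """(12)-style: a single hazard evaluated at the preference-weighted covariate."""
    xbar = sum(pj * xj for pj, xj in zip(p, x))
    return math.exp(-H0 * math.exp(gamma * xbar))

def weekly_survival_latent(p, x, gamma, H0):
    """Latent-preference style: brand-specific survival terms averaged over p."""
    return sum(pj * math.exp(-H0 * math.exp(gamma * xj)) for pj, xj in zip(p, x))

p = [0.7, 0.2, 0.1]      # hypothetical preference probabilities in one week
x = [0.0, 1.0, 0.0]      # display indicator: only the second brand is on display
s_mix = weekly_survival_weighted_mix(p, x, gamma=2.0, H0=0.5)
s_lat = weekly_survival_latent(p, x, gamma=2.0, H0=0.5)
print(round(s_mix, 4), round(s_lat, 4))
```

Because the survival term is nonlinear in the covariate, the two aggregation schemes generally give different week contributions even with identical preference probabilities.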

Inclusive Value

Another frequently used approach is to include the inclusive value from a brand choice model as an additional explanatory variable in the hazard function; see, for example, Chintagunta and Prasad (1998). Consider the brand choice model given in (2). The inclusive or category value is defined by

$$IV_{in}(t) = \ln \sum_{j=1}^{J} \exp\big(\alpha_j + x_{ijn}(t)'\beta\big) \qquad (13)$$

This expression can be interpreted as the expected maximum utility over all J brands in the category (up to an additive constant, under the extreme-value error assumption of the logit model). The inclusive value is added to the hazard function as an explanatory variable; hence, in the absence of other explanatory variables, the hazard function is given by λin(t) = g(t; IVin(t)), where g(·) is defined in (1).
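A minimal sketch of (13) as a numerically stable log-sum-exp, with hypothetical utilities $\alpha_j + x'\beta$. It also illustrates the between-household issue raised in the Introduction: the two hypothetical households below have the same average utility, yet the one with a clearly preferred brand has the larger inclusive value, although nothing about its purchase frequency follows from that.

```python
import math

def inclusive_value(v):
    """(13): IV = log(sum_j exp(v_j)), with v_j = alpha_j + x_j' beta.

    Computed as a stable log-sum-exp. Under i.i.d. Gumbel errors the expected
    maximum utility equals IV plus Euler's constant (about 0.5772).
    """
    m = max(v)
    return m + math.log(sum(math.exp(vj - m) for vj in v))

strong = [3.0, 0.0, 0.0]   # hypothetical household with one clearly preferred brand
flat = [1.0, 1.0, 1.0]     # hypothetical household indifferent between brands
print(round(inclusive_value(strong), 3), round(inclusive_value(flat), 3))  # 3.095 2.099
```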

Again, the model parameters can be estimated by maximum likelihood, in one of two ways. One could first estimate the parameters of a brand choice model and construct the inclusive value, which is in turn used to estimate the parameters of the purchase timing model. It is, however, more efficient to estimate both models simultaneously by maximizing the joint likelihood of the choice model and the purchase timing model. For this specification, the likelihood contribution of the nth interpurchase spell of length tin resulting in a purchase of brand $y_{i,K_{in}(t_{in})}$ reads

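$$L_{in} = \lambda_{in}(t_{in}) \exp\Big(-\int_0^{t_{in}} \lambda_{in}(s)\, ds\Big) \Pr\big[Y_{i,K_{in}(t_{in})} = y_{i,K_{in}(t_{in})}\big], \qquad \lambda_{in}(t) = g\big(t; IV_{in}(t)\big) \qquad (14)$$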

If one decides to model purchase incidence using a binary logit specification, the inclusion of an inclusive value can also be justified as a nested logit model specification; see, for example, Ben-Akiva and Lerman (1985), Franses and Paap (2001) and Train (2003).

In this specification, the inclusive value captures the correlation between the purchase timing and brand choice decision. This approach is followed by, for example, Ailawadi and Neslin (1998) and Bell et al. (1999).

3. APPLICATION

In this section we compare the performance of our latent preference purchase timing model with various model specifications using household panel scanner data. For this comparison we estimate the four different specifications discussed in Section 2 on data in the detergent category. The performance of the different specifications is measured using in-sample and out-of-sample criteria. Furthermore, we analyze differences in parameter estimates.

3.1. Data

The data we use are part of the so-called ERIM database, which was collected by A. C. Nielsen. The data span the years 1986–1988, and the subset we use concerns purchases of detergent by households in Sioux Falls (South Dakota, USA). We consider only households that make at least four purchases and buy only brands that are available throughout the complete sample. This reduces the total number of households to 419. We randomly split the dataset into two parts such that the number of households is roughly the same in both. The first part is used to estimate the parameters of the various models, while the second part is used for out-of-sample model comparison. We have 228 households in the first part and 191 households in the second part of the sample, with 2531 and 1867 interpurchase spells, respectively. These purchases are spread over 13 different stores. Note that not all households are observed from the start of the observational period. We distinguish between six brands: Cheer, Oxydol, Surf, Tide, Wisk and a rest brand. The brand-level marketing-mix variables are obtained as a weighted average of product-level figures. To account for differences across stores, we aggregate the brand-level characteristics over stores using household-specific weights. Table I shows some summary statistics of the data. The second column of the table displays the choice shares. Tide is clearly the largest player in this market. Of the other brands, Oxydol and Surf have the lowest choice shares (both close to 9%). The differences in average price are relatively small. We notice, however, substantial differences in the standard deviations of the prices; that is, some brands have more frequent price discounts than others. Wisk and especially Surf have the highest standard deviations of price; these brands also have the highest average display frequency.
Note that the average display frequency denotes the average proportion of products of a certain brand that is on display across all weeks and stores.

Table I. Summary statistics of choice shares, marketing-mix characteristics and interpurchase times(a)

Brand    Choice share   Average price (per 100 oz)   Std. dev. of price   Average display
Cheer    11.37%         5.128                        0.317                0.06%
Oxydol    8.64%         5.229                        0.303                0.68%
Surf      8.80%         5.026                        0.698                1.80%
Tide     36.13%         5.059                        0.276                1.14%
Wisk     10.28%         5.164                        0.459                2.23%
Rest     24.78%         5.058                        0.297                0.20%

Average interpurchase time                  7.55
Median interpurchase time                   5.14
Standard deviation of interpurchase time    7.51

(a) Averages and standard deviations of the marketing-mix variables are computed across all weeks.

3.2. Model Specification

The latent preference model, the inclusive value model and the preference-weighted marketing-mix model each contain a brand choice model. We use the standard conditional logit model (2) to describe brand choice in all three specifications. This model contains brand-specific intercepts, the marketing mix of all brands in the market (price and display) and a lagged brand choice dummy capturing state dependence.

To describe purchase timing we use the proportional hazard specification (1) with a log-logistic baseline hazard; to be more precise, the baseline hazard reads

$$\lambda_0(t) = \frac{\gamma\alpha(\gamma t)^{\alpha-1}}{1 + (\gamma t)^{\alpha}} \qquad (15)$$

where α> 0 and γ> 0. This specification allows the baseline hazard to be monotonically decreasing or inverted U-shaped.
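The shape claims for the baseline hazard can be checked numerically. The sketch below assumes the standard log-logistic parameterization $\lambda_0(t) = \alpha\gamma(\gamma t)^{\alpha-1}/(1+(\gamma t)^{\alpha})$, whose integrated hazard is $\ln(1 + (\gamma t)^{\alpha})$. Setting the derivative to zero shows that for $\alpha > 1$ the hazard peaks at $t^* = (\alpha - 1)^{1/\alpha}/\gamma$, while for $\alpha \le 1$ it decreases monotonically. Parameter values are hypothetical.

```python
import math

def loglogistic_hazard(t, alpha, gamma):
    """lambda_0(t) = alpha*gamma*(gamma t)^(alpha-1) / (1 + (gamma t)^alpha), t > 0."""
    x = gamma * t
    return alpha * gamma * x ** (alpha - 1) / (1 + x ** alpha)

def loglogistic_cumhaz(t, alpha, gamma):
    """Integrated baseline hazard log(1 + (gamma t)^alpha); survival = exp(-cumhaz)."""
    return math.log1p((gamma * t) ** alpha)

def hazard_mode(alpha, gamma):
    """Location of the hazard peak for alpha > 1; None when the hazard is decreasing."""
    return (alpha - 1) ** (1 / alpha) / gamma if alpha > 1 else None

print(hazard_mode(0.8, 0.2))    # None: monotonically decreasing hazard
print(hazard_mode(2.0, 0.2))    # 5.0: inverted U-shape with peak at t = 5
```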

As explanatory variables in the hazard model we use household characteristics and marketing-mix variables. We include household size and household income, as these variables are known to influence interpurchase timing. To control for inventory effects, we include two household-specific inventory variables. As inventory is unobserved, these variables have to be constructed from the data. The first variable is the average quantity purchased by the household, which allows us to distinguish between heavy and light users. The second variable is a measure of the relative inventory of a household. To compute it, we first calculate the (absolute) inventory. We set the initial inventory equal to the quantity purchased at the household's first shopping trip; note that the sample for each household begins with an observed purchase. Each period we reduce the inventory by the household's consumption rate, which is calculated by dividing the total quantity purchased by the household over the entire sample by the length of time the household is observed.¹ When a new purchase is made, the purchased quantity is added to the inventory. We do not allow for negative inventory. Next, we normalize the inventory levels across households by dividing the inventory by the household's average purchase quantity, so that inventory is measured in units of the household's average purchase and can easily be compared across households. Both inventory variables are constructed at the category level and can be included directly in the hazard specification.
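The relative inventory construction can be sketched as follows. The function is hypothetical and works on a discrete grid of periods; it assumes the household is observed from its first purchase (period 0) for `n_periods` periods and that purchases within the same period have already been summed.

```python
def relative_inventory(purchases, n_periods):
    """Relative inventory per period, following the steps described above.

    purchases: dict {period: quantity}; period 0 is the household's first purchase.
    n_periods: number of periods the household is observed (assumed > 0).
    """
    total = sum(purchases.values())
    rate = total / n_periods                 # consumption rate per period
    avg_q = total / len(purchases)           # average purchase quantity
    inventory = purchases[0]                 # initial inventory = first purchase
    path = [inventory / avg_q]
    for period in range(1, n_periods):
        inventory = max(0.0, inventory - rate)     # consume; no negative inventory
        inventory += purchases.get(period, 0.0)    # add any new purchase
        path.append(inventory / avg_q)             # in units of the average purchase
    return path

# Hypothetical household: purchases of 64, 64 and 128 oz in periods 0, 5 and 9.
path = relative_inventory({0: 64.0, 5: 64.0, 9: 128.0}, n_periods=12)
print([round(r, 2) for r in path])
```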

The available marketing instruments are price and display. As these marketing-mix variables are observed at the brand level, we use our latent preference purchase timing model to link them to the category-level interpurchase times. Furthermore, we consider the three other approaches, discussed in Section 2, for including the brand-level marketing mix in the hazard specification. The household characteristics enter in the same way in all four model specifications.

3.3. Model Fit

We estimate the parameters of the four model specifications using the purchase data of the 228 households in the first part of the sample. Before we turn to direct model comparison, we first test the validity of the specification of each of the four models. In linear regression models the residuals are often used for model validation, but in duration models there is no simple test that can be based on the residuals directly. In this paper we use a test for model misspecification based on the fact that the cumulative distribution function (CDF) evaluated at the observed durations (the generalized residuals) is uniformly distributed if the model is correctly specified; see, for example, Arnold (1990, p. 61, Theorem 2-7). We investigate the distribution of these CDF values for each of the four specifications. Several tests are available, and different tests are sensitive to different deviations from the uniform distribution. We have applied the Kolmogorov test, the Kuiper test, the Anderson–Darling test and the Cramér–von Mises test to the generalized residuals of all four model specifications. In none of the cases can we reject the null hypothesis of uniformity. Based on these results we conclude that there is no evidence of serious model misspecification.
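The uniformity check can be sketched with NumPy. For illustration only, durations are drawn from a log-logistic distribution without covariates; in the paper each generalized residual is the model-implied CDF of a spell given its covariates. The sketch computes the Kolmogorov and Kuiper statistics of the residuals under the true shape parameter and under a misspecified one, and compares the Kolmogorov statistic with the usual asymptotic 5% critical value $1.36/\sqrt{n}$.

```python
import numpy as np

def loglogistic_cdf(t, alpha, gamma):
    x = (gamma * t) ** alpha
    return x / (1.0 + x)

def ks_and_kuiper(u):
    """Kolmogorov statistic D and Kuiper statistic V of u against U(0, 1)."""
    u = np.sort(np.asarray(u))
    n = u.size
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - u)
    d_minus = np.max(u - (i - 1) / n)
    return max(d_plus, d_minus), d_plus + d_minus

rng = np.random.default_rng(0)
alpha, gamma = 2.0, 0.2                           # hypothetical true parameters
v = rng.uniform(size=5000)
t = (v / (1 - v)) ** (1 / alpha) / gamma          # inverse-CDF draws from the log-logistic
u_ok = loglogistic_cdf(t, alpha, gamma)           # generalized residuals, correct model
u_bad = loglogistic_cdf(t, 1.0, gamma)            # residuals under a misspecified alpha
D_ok, V_ok = ks_and_kuiper(u_ok)
D_bad, V_bad = ks_and_kuiper(u_bad)
crit = 1.36 / np.sqrt(t.size)                     # approx. asymptotic 5% critical value for D
print(D_ok, D_bad, crit)
```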

Our interest in the estimation results is twofold. First, we want to gain insight into the general fit of the different model specifications. Second, we are interested in the actual parameter estimates and the differences between them. We first focus on the fit, without giving attention to the parameter estimates. In Section 3.4 we return to the interpretation of the parameter estimates.

No Unobserved Heterogeneity

First, we analyze the differences in descriptive power, where we do not allow for unobserved heterogeneity in the brand choice and duration models. In this case the model specification based on individual choice shares (9) has an advantage over the other specifications. It allows for an easy representation of between-household heterogeneity in brand preferences. Differences in brand preferences will have a large influence on the relative importance of the marketing mix of individual brands on the purchase incidence decision. We therefore expect this specification to be superior in in-sample fit. For out-of-sample prediction, individual choice shares may not be available if we consider households outside the estimation sample. One may use the in-sample average choice share across households as a predictor for the out-of-sample individual choice shares. In this case the forecasting performance of the individual choice share specification will probably be lower. We expect that explicit modeling of (unobserved) heterogeneity in brand preferences will lead to the same or even better fit of the alternative models compared to the specification based on choice shares. This assertion is analyzed in the next part of this subsection.

Table II displays in-sample and out-of-sample performance statistics of the four models: the latent preference purchase timing model, the purchase timing model with inclusive value, and the two interpurchase time models based on weighted marketing-mix variables. As in-sample measures we consider the maximum log-likelihood value, the Akaike Information Criterion (AIC), and the Bayesian Information Criterion (BIC). To evaluate the forecasting performance of the various model specifications, we use the value of the log-likelihood function for the out-of-sample observations, evaluated at the in-sample maximum likelihood estimate. For the choice share specification we consider two values of the out-of-sample log-likelihood. The first is based on household-specific choice shares estimated using the out-of-sample observations, while for the second the choice shares are set to the in-sample average choice shares. The latter measure represents the case in which choice share information is not available when forecasting interpurchase times.
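The information criteria in Table II follow the usual definitions, AIC = −2 log L + 2k and BIC = −2 log L + k ln n. In the sketch below, the parameter counts (k = 16 for the inclusive value model and k = 17 for the other three) and the sample size n = 228 households are not stated in the text; they are back-solved from Table II and should be treated as assumptions. With these values the formulas reproduce the reported criteria up to rounding in the last digit.

```python
import math

def aic(loglik, k):
    # Akaike Information Criterion: smaller is better
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    # Bayesian Information Criterion with sample size n: smaller is better
    return -2.0 * loglik + k * math.log(n)

# In-sample log L from Table II; k and n are ASSUMED (back-solved from the table)
TABLE_II = {"Choice shares": (-9757.28, 17), "Inclusive value": (-9769.72, 16),
            "Weighted marketing mix": (-9760.40, 17), "Latent preferences": (-9755.09, 17)}
N_HOUSEHOLDS = 228

for model, (loglik, k) in TABLE_II.items():
    print(f"{model:24s} AIC={aic(loglik, k):.1f} BIC={bic(loglik, k, N_HOUSEHOLDS):.1f}")
```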

Table II. Performance measures of different interpurchase models without correcting for unobserved heterogeneity (a)

| Measure | Choice shares (b) | Inclusive value | Weighted marketing mix | Latent preferences |
|---|---|---|---|---|
| log L | −9757.28 | −9769.72 | −9760.40 | −9755.09* |
| AIC | 19548.6 | 19571.4 | 19554.8 | 19544.2* |
| BIC | 19606.8 | 19626.3 | 19613.1 | 19602.5* |
| Out-of-sample log L | −7696.12 | −7700.73 | −7688.41 | −7682.79* |
| Out-of-sample log L, with out-of-sample shares | −7678.82 | | | |

(a) Entries marked with an asterisk (*) indicate the best-performing model per performance measure.

(b) The parameters of the interpurchase timing model using choice shares can be estimated independently from the brand choice model. To allow for easy comparison, the performance statistics show the results of the combination of the duration model and the brand choice model.

Several conclusions can be drawn from these results. First, contrary to our expectation, the specification based on household-specific choice shares does not have the best in-sample performance. Our latent preference model outperforms this specification by about two log-likelihood points. The information criteria also indicate that the latent preference model produces the best fit, with the choice-share-based specification ranking second. Note that we do not count the choice shares as parameters in computing the information criteria, although strictly speaking these estimated shares should be seen as parameters. If we counted the choice shares as parameters, the choice-share specification would rank lowest on the AIC and BIC. Second, the inclusive value specification performs worst on all measures.

The latent preference model also performs best in terms of out-of-sample likelihood values, although the choice-share specification now comes out third. If the out-of-sample choice shares are assumed to be known, the performance of the choice-share specification improves considerably, and it then performs best, as expected. Note, however, that in a typical out-of-sample forecasting exercise the choice shares will not be available.

Unobserved Heterogeneity

So far, the parameters of the models have been assumed to be the same across households. We now investigate whether the results change when unobserved differences in purchase planning are taken into account. To this end we consider the same models while allowing for unobserved heterogeneity in all model parameters. We model the heterogeneity using a latent segment specification; see Wedel and Kamakura (1999). The resulting models are estimated using the EM algorithm; see Dempster et al. (1977). As the EM algorithm tends to converge slowly near the optimum, we use a rather loose convergence criterion and then fine-tune the parameters by direct numerical maximization of the log-likelihood with the BFGS algorithm. To reduce the risk of ending up in a local maximum of the likelihood, we estimate the parameters of the heterogeneous models from several starting values. The results below are based on the parameter estimates with the highest likelihood value.
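The latent segment estimation can be illustrated with a deliberately simplified model in which each segment has an exponential interpurchase-time distribution and all durations of a household belong to the same segment. The sketch below runs EM with a loose stopping tolerance; the BFGS fine-tuning step described in the text is omitted (in Python one could hand the EM solution to `scipy.optimize.minimize` with `method="BFGS"`). The exponential segment model, the function names and the simulated data are assumptions for illustration only; the actual models contain covariates and a brand choice part.

```python
import math
import random

def household_loglik(durations, rate):
    # Log of prod_i rate * exp(-rate * t_i) for one household in one segment
    return len(durations) * math.log(rate) - rate * sum(durations)

def em_latent_segments(households, rates, probs, max_iter=500, tol=1e-6):
    """EM for a household-level mixture of exponential duration segments."""
    rates, probs = list(rates), list(probs)
    history = []
    for _ in range(max_iter):
        # E-step: posterior segment membership of each household (log-sum-exp)
        posteriors, loglik = [], 0.0
        for d in households:
            logs = [math.log(p) + household_loglik(d, lam) for p, lam in zip(probs, rates)]
            m = max(logs)
            weights = [math.exp(v - m) for v in logs]
            total = sum(weights)
            loglik += m + math.log(total)
            posteriors.append([w / total for w in weights])
        history.append(loglik)
        # Loose stopping rule; in practice one would switch to BFGS here
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
        # M-step: closed-form updates of segment sizes and rates
        for k in range(len(rates)):
            r = [post[k] for post in posteriors]
            probs[k] = sum(r) / len(households)
            rates[k] = (sum(rk * len(d) for rk, d in zip(r, households))
                        / sum(rk * sum(d) for rk, d in zip(r, households)))
    return rates, probs, history

random.seed(7)
households = [[random.expovariate(1.0 if h < 100 else 0.2) for _ in range(10)]
              for h in range(200)]
rates, probs, history = em_latent_segments(households, [0.5, 0.3], [0.5, 0.5])
print([round(r, 3) for r in rates], [round(p, 3) for p in probs], len(history))
```

Because EM never decreases the likelihood, the recorded log-likelihood path is non-decreasing; running it from several starting values and keeping the best result mirrors the strategy described above.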

Unfortunately, we cannot use the model specification test described above for models with unobserved heterogeneity. This test relies on the independence of the observations. If households belong to latent segments, the observations of a given household are no longer independent. However, since allowing for unobserved heterogeneity makes the model specification more flexible, we do not expect serious misspecification problems.

First, we compare the in-sample and out-of-sample performance for one, two, and three segments. We do not consider models with more than three segments, as these turned out to result in very small segments: the smallest segment then corresponds to approximately 4% of the households, that is, only nine households. Table III presents the log-likelihood values for all four model specifications.

Table III. In-sample and out-of-sample log-likelihood values for the four competing models. Per number of segments, the highest log-likelihood value among the models that do not use out-of-sample choice shares is marked with an asterisk (*)

| Model | In-sample, 1 segment | In-sample, 2 segments | In-sample, 3 segments | Out-of-sample, 1 segment | Out-of-sample, 2 segments | Out-of-sample, 3 segments |
|---|---|---|---|---|---|---|
| Inclusive value | −9769.72 | −9307.49 | −9125.41 | −7700.73 | −7467.36 | −7492.60 |
| Pref. weighted mark. mix | −9760.40 | −9310.20 | −9082.25 | −7688.41 | −7449.80* | −7376.55 |
| Latent preferences | −9755.09* | −9298.08* | −9072.45* | −7682.79* | −7450.23 | −7371.08* |
| Choice shares (using in-sample shares) | −9757.28 | −9303.97 | −9078.29 | −7696.12 | −7456.74 | −7383.40 |
| Choice shares (using out-of-sample shares) | | | | −7678.82 | −7436.07 | −7363.23 |

Overall, the relative performance of the different model specifications stays the same after correcting for heterogeneity. In-sample, the latent preference purchase timing model performs best for all heterogeneity specifications. The choice-share specification ranks second. For one and for three segments the specification based on a preference weighted marketing mix ranks third. For two segments this specification performs worst.

Out-of-sample, the picture is much the same, although with two segments the preference weighted marketing mix specification performs best; its advantage over the latent preference specification is, however, very small.

3.4. Parameter Interpretation

We now turn to a detailed discussion of the differences in parameter estimates across the model specifications. We split this discussion into two parts. First, we discuss the parameters dealing with the choice decision (or brand preference). Next, we consider the model parameters directly dealing with purchase timing. To allow for an easy comparison across models we use three segments for all specifications, although the out-of-sample log-likelihood indicates that two segments are more appropriate for the inclusive value model.

Table IV displays the parameter estimates for the households' brand preferences for the four model specifications, together with those of a standard brand choice logit model fitted to the choices only. First we compare the relative sizes of the segments. Except for the pure choice model, all models appear to yield segments of the same size. When inspecting the estimated brand intercepts, we find that for the pure choice model the third segment hardly considers the brand Wisk (brand intercept −10.26). The results for the other specifications show that once we take the purchase timing into account such a segment does not exist.

Table IV. Parameter estimates for the choice model, based on three latent segments (standard errors in parentheses)

Segment 1

| Variable | choice | cs | iv | pwm | lp |
|---|---|---|---|---|---|
| Cheer | −0.85 (0.15) | −0.50 (0.16) | −0.77 (0.19) | −0.49 (0.16) | −0.47 (0.16) |
| Oxidol | −1.42 (0.20) | −0.60 (0.17) | −0.64 (0.16) | −0.58 (0.16) | −0.56 (0.16) |
| Surf | −0.21 (0.12) | −0.44 (0.15) | −0.45 (0.14) | −0.45 (0.15) | −0.45 (0.15) |
| Tide | 0.49 (0.11) | 0.34 (0.13) | 0.30 (0.11) | 0.36 (0.12) | 0.36 (0.13) |
| Wisk | −0.32 (0.13) | −0.69 (0.17) | −0.74 (0.16) | −0.66 (0.17) | −0.65 (0.17) |
| Price | −0.36 (0.13) | −0.52 (0.16) | −0.37 (0.15) | −0.48 (0.16) | −0.26 (0.22) |
| Display | 0.05 (0.01) | 0.04 (0.01) | 0.05 (0.01) | 0.03 (0.01) | −0.02 (0.01) |
| Lagged choice | 1.32 (0.07) | 2.56 (0.08) | 2.52 (0.08) | 2.54 (0.08) | 2.52 (0.08) |
| Segment probability | 0.44 | 0.68 | 0.68 | 0.68 | 0.67 |

Segment 2

| Variable | choice | cs | iv | pwm | lp |
|---|---|---|---|---|---|
| Cheer | −0.97 (0.33) | −0.42 (0.15) | −0.42 (0.12) | −0.42 (0.15) | −0.43 (0.15) |
| Oxidol | −1.34 (0.38) | −0.81 (0.16) | −1.19 (0.16) | −0.81 (0.16) | −0.83 (0.16) |
| Surf | −1.09 (0.35) | −0.72 (0.16) | −1.27 (0.16) | −0.72 (0.16) | −0.72 (0.15) |
| Tide | 0.02 (0.25) | 0.03 (0.13) | −0.29 (0.11) | 0.06 (0.13) | 0.06 (0.13) |
| Wisk | −0.70 (0.31) | −0.61 (0.15) | −0.92 (0.14) | −0.60 (0.15) | −0.61 (0.15) |
| Price | −0.70 (0.38) | −0.17 (0.17) | −0.29 (0.14) | −0.11 (0.17) | 0.50 (0.34) |
| Display | 0.03 (0.02) | 0.05 (0.01) | 0.04 (0.01) | 0.05 (0.01) | 0.03 (0.01) |
| Lagged choice | 4.45 (0.18) | 1.29 (0.09) | 1.84 (0.08) | 1.29 (0.08) | 1.30 (0.08) |
| Segment probability | 0.43 | 0.21 | 0.24 | 0.21 | 0.22 |

Segment 3

| Variable | choice | cs | iv | pwm | lp |
|---|---|---|---|---|---|
| Cheer | 0.77 (0.22) | −0.56 (0.31) | −0.24 (0.18) | −0.53 (0.31) | −0.54 (0.31) |
| Oxidol | 0.56 (0.24) | −3.07 (1.02) | −1.23 (0.35) | −3.06 (1.01) | −3.05 (1.00) |
| Surf | −1.07 (0.34) | −0.84 (0.33) | 0.22 (0.15) | −0.85 (0.33) | −0.85 (0.33) |
| Tide | −0.39 (0.28) | 0.07 (0.25) | 0.54 (0.18) | 0.08 (0.26) | 0.07 (0.26) |
| Wisk | −10.26 (26.37) | −0.44 (0.31) | −0.13 (0.15) | −0.38 (0.31) | −0.42 (0.31) |
| Price | −0.62 (0.34) | −0.31 (0.34) | −0.28 (0.10) | −0.32 (0.36) | 0.16 (0.38) |
| Display | 0.05 (0.03) | 0.02 (0.02) | 0.01 (0.00) | 0.02 (0.02) | 0.02 (0.02) |
| Lagged choice | 1.02 (0.14) | 3.80 (0.16) | 2.84 (0.16) | 3.80 (0.16) | 3.80 (0.16) |
| Segment probability | 0.13 | 0.11 | 0.08 | 0.11 | 0.12 |

Model abbreviations: choice, choice model (no duration part); cs, duration model with choice shares; iv, inclusive value model; pwm, model based on preference weighted marketing mix; lp, latent preference model.

Another remarkable difference between the choice model and the other models lies in the estimates of the lagged choice variable. Especially for the second segment, the choice model yields a rather large estimate of the state-dependence parameter.

For the latent preference model the price coefficient is insignificant in all three segments, whereas in all other models price is significant in the largest segment. For display the picture is similar: in the latent preference specification, display seems to have a smaller influence than in the other models.

In Table V we present the parameter estimates associated with the duration part of the models. The parameters α and γ determine the shape of the baseline hazard; see (15). Across the four model specifications there are hardly any differences in these parameters. This leads to the conclusion that for each model specification we have found the same type of segments. This is confirmed by the size of the segments.
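Equation (15) is not reproduced in this section. Assuming, purely for illustration, a Weibull-type baseline hazard λ0(t) = γα t^(α−1), the estimates of log(α) and log(γ) for the latent preference model in Table V translate into baseline hazards as in the sketch below. Under this assumption α > 1 in every segment, so the baseline hazard increases with the time since the last purchase. The medians computed here ignore all covariates and are not forecasts.

```python
import math

# log(alpha), log(gamma) of the latent preference (lp) model per segment, Table V
LP_BASELINE = {1: (0.60, -3.32), 2: (0.93, -2.45), 3: (0.82, -2.92)}

def baseline_hazard(t, log_alpha, log_gamma):
    # ASSUMED Weibull baseline lambda0(t) = gamma * alpha * t**(alpha - 1); eq. (15) may differ
    alpha, gamma = math.exp(log_alpha), math.exp(log_gamma)
    return gamma * alpha * t ** (alpha - 1.0)

def baseline_survival(t, log_alpha, log_gamma):
    # S0(t) = exp(-gamma * t**alpha) under the same assumption
    alpha, gamma = math.exp(log_alpha), math.exp(log_gamma)
    return math.exp(-gamma * t ** alpha)

def baseline_median(log_alpha, log_gamma):
    # Solve S0(t) = 1/2 in closed form (covariates ignored)
    alpha, gamma = math.exp(log_alpha), math.exp(log_gamma)
    return (math.log(2.0) / gamma) ** (1.0 / alpha)

for seg, (la, lg) in LP_BASELINE.items():
    hazards = [round(baseline_hazard(t, la, lg), 3) for t in (1.0, 5.0, 10.0)]
    print(f"segment {seg}: alpha={math.exp(la):.2f}, hazards at t=1,5,10: {hazards}, "
          f"baseline median={baseline_median(la, lg):.2f}")
```

Estimating log(α) and log(γ) rather than α and γ keeps both parameters positive without constraints, which is presumably why Table V reports them on the log scale.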

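Table V. Parameter estimates for the duration model, based on three latent segments (standard errors in parentheses)

Segment 1

| Variable | cs | iv | pwm | lp |
|---|---|---|---|---|
| Intercept | 1.24 (0.67) | 0.73 (0.37) | 1.24 (0.77) | 1.35 (0.81) |
| Household income | 0.01 (0.01) | 0.01 (0.01) | 0.02 (0.01) | 0.01 (0.01) |
| Household size | 0.13 (0.03) | 0.11 (0.03) | 0.13 (0.03) | 0.14 (0.03) |
| Avg. purchase quantity | −0.02 (0.03) | −0.01 (0.03) | −0.02 (0.03) | −0.02 (0.03) |
| Relative inventory | −1.59 (0.15) | −1.66 (0.11) | −1.60 (0.15) | −1.69 (0.14) |
| Inclusive value | – | 0.44 (0.10) | – | – |
| Price | −0.18 (0.12) | – | −0.19 (0.14) | −0.21 (0.15) |
| Display | 0.04 (0.01) | – | 0.03 (0.01) | 0.07 (0.01) |
| log(α) | 0.58 (0.07) | 0.57 (0.07) | 0.59 (0.07) | 0.60 (0.07) |
| log(γ) | −3.33 (0.17) | −3.19 (0.16) | −3.30 (0.17) | −3.32 (0.17) |
| Segment probability | 0.68 | 0.68 | 0.68 | 0.67 |

Segment 2

| Variable | cs | iv | pwm | lp |
|---|---|---|---|---|
| Intercept | 0.59 (1.15) | 0.64 (0.55) | 2.36 (1.64) | 2.65 (1.69) |
| Household income | 0.07 (0.02) | −0.01 (0.01) | 0.06 (0.02) | 0.06 (0.02) |
| Household size | 0.10 (0.04) | 0.39 (0.03) | 0.10 (0.04) | 0.10 (0.04) |
| Avg. purchase quantity | 0.07 (0.07) | −0.26 (0.05) | 0.04 (0.08) | 0.04 (0.07) |
| Relative inventory | −0.41 (0.05) | −0.47 (0.05) | −0.41 (0.05) | −0.41 (0.05) |
| Inclusive value | – | 0.65 (0.16) | – | – |
| Price | −0.31 (0.22) | – | −0.64 (0.31) | −0.70 (0.31) |
| Display | 0.04 (0.01) | – | 0.02 (0.01) | 0.03 (0.01) |
| log(α) | 0.93 (0.07) | 0.93 (0.06) | 0.94 (0.07) | 0.93 (0.07) |
| log(γ) | −2.47 (0.15) | −2.72 (0.13) | −2.45 (0.15) | −2.45 (0.14) |
| Segment probability | 0.21 | 0.24 | 0.21 | 0.22 |

Segment 3

| Variable | cs | iv | pwm | lp |
|---|---|---|---|---|
| Intercept | 1.92 (0.82) | 0.42 (1.23) | 2.34 (0.84) | 2.37 (0.84) |
| Household income | 0.13 (0.03) | 0.28 (0.04) | 0.13 (0.02) | 0.13 (0.02) |
| Household size | 0.38 (0.04) | −0.13 (0.05) | 0.38 (0.04) | 0.39 (0.04) |
| Avg. purchase quantity | −0.43 (0.06) | −0.51 (0.07) | −0.43 (0.06) | −0.43 (0.06) |
| Relative inventory | 0.06 (0.03) | 0.00 (0.04) | 0.07 (0.03) | 0.07 (0.03) |
| Inclusive value | – | 2.17 (0.50) | – | – |
| Price | −0.51 (0.15) | – | −0.59 (0.16) | −0.59 (0.16) |
| Display | 0.02 (0.01) | – | 0.02 (0.01) | 0.02 (0.01) |
| log(α) | 0.82 (0.05) | 0.77 (0.08) | 0.82 (0.05) | 0.82 (0.05) |
| log(γ) | −2.91 (0.20) | −2.85 (0.26) | −2.93 (0.20) | −2.92 (0.20) |
| Segment probability | 0.11 | 0.08 | 0.11 | 0.12 |

Model abbreviations: cs, duration model with choice shares; iv, inclusive value model; pwm, model based on preference weighted marketing mix; lp, latent preference model. A dash indicates that the parameter is not part of the specification.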

Contrary to the previous results we now observe slightly stronger price and display effects for the latent preference model compared to the other models. Although these differences are not significant, they are consistent across the three segments. For the inclusive value model we do not get explicit estimates for the effect of price and display on the purchase timing. Below, we consider the partial derivatives of the hazard function with respect to price to compare the total price effect on the hazard function across all models.

Especially for the second and the third segment the inclusive value model gives quite different parameter estimates for the household and the inventory variables. For the other specifications these parameters do not differ much. The largest segment (68% of the households) contains households that are very sensitive to the inventory level.

So far, we have considered the choice and the purchase decision separately. In practice these two decisions are, of course, connected: a price promotion for a product that a household will never consider will not influence its purchase timing. It is therefore interesting to consider the impact of a marketing instrument (e.g., price) on the purchase probability. To this end we calculate the derivative of the hazard rate with respect to one of the prices, that is,

[Equation (16): derivative of the hazard rate at time t with respect to the price of one brand at time t]

This derivative can be interpreted as the derivative of the instantaneous probability of a purchase with respect to price. In this way we also account for the dependence of the choice/preference probabilities on the prices.

In the Appendix we present the formulas for all four model specifications. The derivatives depend on the length of the duration and on the value of all explanatory variables. To remove the dependence on time and on the other explanatory variables, we consider the derivative relative to the baseline hazard and we calculate the derivatives at the mean of the explanatory variables.
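The logic of this derivative can be illustrated with a hypothetical hazard in which the hazard relative to the baseline is a preference-weighted sum, Σ_j π_j(p) exp(β p_j), with logit preference probabilities π_j(p). This form is an assumption for illustration and is not claimed to be any of equations (17)–(20). The sketch compares a central finite difference with the analytic derivative, which has two parts: the direct effect of the price on brand j's own term and the indirect effect through the preference probabilities.

```python
import math

def preference_probs(prices, intercepts, b):
    # Hypothetical logit preference probabilities pi_j(p), with price coefficient b
    v = [a + b * p for a, p in zip(intercepts, prices)]
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def relative_hazard(prices, intercepts, b, beta):
    # Hypothetical hazard relative to baseline: sum_j pi_j(p) * exp(beta * p_j)
    pi = preference_probs(prices, intercepts, b)
    return sum(pj * math.exp(beta * p) for pj, p in zip(pi, prices))

def numeric_derivative(prices, j, intercepts, b, beta, h=1e-5):
    # Central finite difference in the price of brand j
    up, down = list(prices), list(prices)
    up[j] += h
    down[j] -= h
    return (relative_hazard(up, intercepts, b, beta)
            - relative_hazard(down, intercepts, b, beta)) / (2 * h)

def analytic_derivative(prices, j, intercepts, b, beta):
    pi = preference_probs(prices, intercepts, b)
    e = [math.exp(beta * p) for p in prices]
    direct = pi[j] * beta * e[j]  # price enters brand j's own term
    # Indirect effect via preferences: d pi_k / d p_j = b * pi_k * (1[k == j] - pi_j)
    indirect = sum(e[k] * b * pi[k] * ((k == j) - pi[j]) for k in range(len(prices)))
    return direct + indirect

# Usage: vary brand 0's price relative to a mean price of 1.0, other prices at their mean
intercepts, b, beta = [0.4, -0.5, 0.0], -2.0, -0.5
for rel in (0.8, 0.9, 1.0, 1.1, 1.2):
    print(rel, round(analytic_derivative([rel, 1.0, 1.0], 0, intercepts, b, beta), 4))
```

Because the preference probabilities themselves depend on the prices, the derivative changes with the relative price, which is the kind of non-constant pattern a fixed-weight specification cannot produce.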

In Figure 2 we present the derivatives of the hazard rate corresponding to the first segment. Each graph presents the derivatives with respect to the price of a single brand, and each line in the graph corresponds to a particular model specification. The horizontal axis gives the relative price at which we calculate the derivative (holding all other variables fixed at their mean). The value 1 on the horizontal axis corresponds to the average observed price, while, for example, the value 0.8 indicates 80% of the average price.

Figure 2. Derivative of the hazard function with respect to the price of one of the brands; results correspond to the largest segment in Table V.

The first thing we notice is that the derivatives are not constant: they tend to be larger for low prices, so households are more price sensitive when prices are low. If the price of a brand is relatively low, the probability that this brand is preferred by a household is relatively large, and changes in its price therefore have a large influence on the purchase probability in the category. The choice share specification is the exception: for this specification, whose weights are fixed choice shares that do not respond to price, the derivative is almost constant. A second observation is that the specification based on the inclusive value underestimates the effect of price relative to the other specifications. The derivatives for the latent preference model and the model based on the preference weighted marketing mix are rather close. Unreported results show that for the other two segments the differences are somewhat larger.

3.5. Forecasting

In this section we consider the forecasting performance of the various model specifications. We separately consider forecasting brand choices and forecasting the duration. Forecasting the brand choice is relatively straightforward. At each purchase occasion the brand with the highest preference probability is taken as the forecast. To evaluate the forecasting performance we rely on the hit rate.
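The hit rate described above amounts to the following computation. This is a minimal sketch; the dictionary-based input format and the function name are assumptions for illustration.

```python
def hit_rate(preference_probs, observed):
    """Share of purchase occasions at which the brand with the highest
    predicted preference probability is the brand actually bought.
    Ties are broken by dictionary order (the first maximal key wins)."""
    hits = 0
    for probs, bought in zip(preference_probs, observed):
        forecast = max(probs, key=probs.get)
        hits += forecast == bought
    return hits / len(observed)

occasions = [{"Tide": 0.6, "Wisk": 0.4},
             {"Tide": 0.2, "Wisk": 0.8},
             {"Tide": 0.5, "Cheer": 0.3, "Wisk": 0.2}]
print(hit_rate(occasions, ["Tide", "Tide", "Tide"]))  # 2 of 3 occasions are hits
```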

Forecasting the duration, however, is not straightforward. To obtain an unbiased forecast we have to calculate the expected value of the interpurchase time, which requires evaluating the integral E[T] = ∫_0^∞ t f(t) dt. The value of the density function f(t) depends on the marketing instruments at time t. Therefore, to compute the expected interpurchase time, we would need to know the values of all explanatory variables at every future point in time, which of course we do not. A feasible alternative is to use the median of the distribution as a point forecast. To find the median, we solve F(t) = 1/2. As long as the solution lies within our data span, we can obtain a point forecast. To evaluate the forecasting performance we use the mean absolute deviation (MAD), as one can show that the median is the optimal forecast under this loss function. Observations for which the solution of F(t) = 1/2 does not lie within the data span are ignored when calculating the MAD.
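The median forecast and the MAD can be sketched as follows. The sketch assumes, for illustration only, a proportional hazard with a Weibull-type baseline γα t^(α−1) and covariates that are constant within unit-length periods (the time unit is also an assumption), so the integrated hazard Λ(t) has a closed form per period. Since F(t) = 1 − exp(−Λ(t)), solving F(t) = 1/2 is equivalent to solving Λ(t) = ln 2, which bisection handles because Λ is increasing. Forecasts whose median lies beyond the observed covariate path are returned as None and skipped in the MAD, as in the text.

```python
import math

def cumulative_hazard(t, alpha, gamma, eta):
    """Integrated hazard Lambda(t) under an ASSUMED Weibull baseline
    gamma * alpha * s**(alpha - 1), scaled by exp(eta[j]) on period [j, j + 1).
    eta[j] is the covariate index x_j' beta, held constant within each period."""
    total = 0.0
    for j, e in enumerate(eta):
        start, end = j, min(t, j + 1)
        if end <= start:
            break
        total += gamma * math.exp(e) * (end ** alpha - start ** alpha)
    return total

def median_forecast(alpha, gamma, eta, tol=1e-8):
    """Solve F(t) = 1/2, i.e. Lambda(t) = ln 2, by bisection on [0, len(eta)].
    Returns None when the median lies beyond the observed covariate path."""
    target = math.log(2.0)
    lo, hi = 0.0, float(len(eta))
    if cumulative_hazard(hi, alpha, gamma, eta) < target:
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cumulative_hazard(mid, alpha, gamma, eta) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def mean_absolute_deviation(forecasts, actuals):
    """MAD over observations with a forecast; None forecasts are skipped."""
    pairs = [(f, y) for f, y in zip(forecasts, actuals) if f is not None]
    return sum(abs(f - y) for f, y in pairs) / len(pairs)

# Usage with hypothetical values: a promotion raises eta on periods [3, 6)
eta = [0.0, 0.0, 0.0, 0.4, 0.4, 0.4] + [0.0] * 14
print(median_forecast(alpha=math.exp(0.60), gamma=math.exp(-3.32), eta=eta))
```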

In Table VI we present the forecasting performance of all models for one, two and three latent segments. All forecasts are made unconditionally; that is, we do not condition on the previously observed behavior of a household. The first panel of the table shows the hit rate for forecasting the brand choice. Remarkably, there are almost no differences across the specifications. The latent preference model comes out best, but its advantage over the pure choice model is at most 0.2 percentage points. The second panel presents the MAD for forecasting the duration. For all specifications the MAD values are rather large compared with the median observed duration of 5.14. Forecasting interpurchase times thus turns out to be very difficult, partly because the dispersion of interpurchase times is very large relative to their mean and median; see also Table I.

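Table VI. Forecasting performance

| Model | In-sample, 1 segment | In-sample, 2 segments | In-sample, 3 segments | Out-of-sample, 1 segment | Out-of-sample, 2 segments | Out-of-sample, 3 segments |
|---|---|---|---|---|---|---|
| Forecasting choices (hit rate) | | | | | | |
| Choice | 0.704 | 0.705 | 0.704 | 0.672 | 0.673 | 0.673 |
| Inclusive value | 0.704 | 0.703 | 0.704 | 0.671 | 0.672 | 0.672 |
| Pref. weighted marketing mix | 0.705 | 0.705 | 0.704 | 0.673 | 0.673 | 0.674 |
| Latent preferences | 0.705 | 0.705 | 0.705 | 0.674 | 0.674 | 0.674 |
| Choice shares | 0.704 | 0.704 | 0.704 | 0.672 | 0.673 | 0.672 |
| Forecasting duration (mean absolute deviation) | | | | | | |
| Inclusive value | 4.297 | 4.485 | 4.457 | 4.727 | 4.613 | 4.626 |
| Pref. weighted marketing mix | 4.307 | 4.486 | 4.470 | 4.725 | 4.583 | 4.595 |
| Latent preferences | 4.314 | 4.485 | 4.467 | 4.722 | 4.581 | 4.589 |
| Choice shares | 4.307 | 4.484 | 4.469 | 4.713 | 4.572 | 4.589 |
| Choice shares (using in-sample shares) | | | | 4.718 | 4.591 | 4.606 |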

The ordering of the models according to the in-sample MAD is not consistent with the ordering according to the in-sample likelihood values. In-sample, the inclusive value specification seems to provide the best forecasts. Out-of-sample, the choice share or the latent preference specifications seem to perform best. We do not want to overemphasize these results: the differences in forecasting performance are rather small, and all models forecast relatively poorly due to the large dispersion in the data.

4. CONCLUSIONS

In this paper we have introduced an integrated model of brand choice and purchase timing. The contribution of this model is twofold.

The first contribution is that the model provides an answer to the practical question of what to do with brand-specific marketing efforts when modeling purchase timing. As purchase timing is measured on the category level, one has to somehow aggregate the brand-level information. In the literature there are two popular techniques. Category marketing efforts are often formed by calculating a weighted average of the marketing mix of individual brands. In this case, household-specific choice shares are often used as weights. Another approach is to summarize all marketing-mix variables of all brands into the so-called inclusive value.
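For reference, the inclusive value of a multinomial logit brand choice model is the log-sum of the brand utilities, log Σ_j exp(V_j). The minimal sketch below, with hypothetical utility parameters and function names, shows how this single category-level number summarizes the prices and displays of all brands and rises when any brand cuts its price.

```python
import math

def brand_utilities(intercepts, prices, displays, beta_price, beta_display):
    # Deterministic logit utilities V_j = a_j + beta_price * price_j + beta_display * display_j
    return [a + beta_price * p + beta_display * d
            for a, p, d in zip(intercepts, prices, displays)]

def inclusive_value(utilities):
    # log(sum_j exp(V_j)), computed stably by factoring out the largest utility
    m = max(utilities)
    return m + math.log(sum(math.exp(v - m) for v in utilities))

# Hypothetical parameters: a 20% price cut for brand 0 raises the category-level index
before = brand_utilities([0.4, -0.5, 0.0], [1.0, 1.0, 1.0], [0, 0, 0], -2.0, 0.05)
after = brand_utilities([0.4, -0.5, 0.0], [0.8, 1.0, 1.0], [0, 0, 0], -2.0, 0.05)
print(round(inclusive_value(before), 4), round(inclusive_value(after), 4))
```

Note that the index is the same for every household with the same utilities, which is one reason it can blur brand-specific effects relative to a preference-based aggregation.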

We have proposed two alternative specifications. The first integrates a brand choice model with the purchase timing model. The second creates category-level marketing instruments using household-specific weights that are obtained from a brand choice model.

In an empirical comparison of the four specifications we find that the latent preference model in general performs best in terms of likelihood fit. This holds in-sample as well as out-of-sample. The differences in forecasting performance across the four models are, however, relatively small.

The second contribution concerns the efficient use of all available information in one coherent framework. By treating brand choice as a latent variable between purchases, we also use the information revealed at non-purchase occasions. Our results show that when the available information is not used to its full potential, one may overstate the effect of marketing instruments on brand choice and understate their effect on purchase timing. In particular, they indicate that marketing-mix activities affect purchase timing rather than brand switching.

Acknowledgements

We thank Pradeep Chintagunta, Philip Hans Franses, and three anonymous reviewers for their comments. All calculations were done using Ox 4.02 (Doornik, 1999).

APPENDIX: DERIVATIVES OF THE HAZARD RATE WITH RESPECT TO A MARKETING INSTRUMENT

In this Appendix we give the formulas for the derivative of the hazard rate at time t with respect to a marketing-mix variable at time t. These derivatives can straightforwardly be obtained from the model specifications in Section 2.

Choice Share

[Equation (17)]

Inclusive Value

[Equation (18)]

Preference Weighted Marketing Mix

[Equation (19)]

Latent Preference

[Equation (20)]
Note 1. In the duration model it is difficult to deal with variables that continuously change over time. We therefore approximate the inventory process by a step function.
