Abstract
The workhorse brand choice models in marketing are the multinomial logit (MNL) and nested multinomial logit (NMNL). These models place strong restrictions on how brand share and purchase incidence price elasticities are related. In this paper, we propose a new model of brand choice, the “price consideration” (PC) model, that allows more flexibility in this relationship. In the PC model, consumers do not observe prices in each period. Every week, a consumer decides whether to consider a category. Only then does he/she look at prices and decide whether and what to buy. Using scanner data, we show the PC model fits much better than MNL or NMNL. Simulations reveal the reason: the PC model provides a vastly superior fit to interpurchase spells. Copyright © 2009 John Wiley & Sons, Ltd.
1. INTRODUCTION
The workhorse brand choice model in quantitative marketing is unquestionably the multinomial logit (MNL). It is often augmented to include a no-purchase option. Occasionally, nested multinomial logit (NMNL) or multinomial probit (MNP) models are used to allow for correlations among the unobserved attributes of the choice alternatives, or the models are extended to allow for consumer taste heterogeneity. There have been rather strong arguments among proponents of different variants of these models.
But in two fundamental ways, all these workhorse brand choice models are more alike than different: (1) they assume essentially static behavior on the part of consumers, in the sense that choices are based only on current (and perhaps past) but not expected future prices, and (2) they all make strong (albeit different) assumptions about when consumers see prices, and when they consider purchasing in a category.
For instance, any brand choice model that does not contain a no-purchase option is itself a purchase timing/incidence model, and one of a very strong form. This incidence model says that a random and exogenous process determines when a consumer decides to buy in a category, and that the consumer only sees prices after he/she has already decided to buy. On the other hand, standard brand choice models that contain a no-purchase option make an opposite and equally strong assumption: that consumers see prices in every week, and decide whether to purchase in the category based on these weekly price vectors. And, in either case, whether one assumes consumers see prices always or only if they have already decided to buy, standard models assume that consumers then make decisions solely based on current (and perhaps past) prices.
In this paper we propose a fundamentally different model of brand choice that we call the “price consideration” or PC model. The difference between this and earlier brand choice models is that, in the PC model, consumers make a weekly decision about whether to consider a category, and this decision is made prior to seeing any price information. Of course, the decision will depend on inventory, whether the brand is promoted in the media, and so on. Only after the consumer has decided to consider a category does he/she see prices. In this second stage, the consumer decides whether and what brand to buy. Thus, the PC model provides a middle ground between the extreme price awareness assumptions that underlie conventional choice models (i.e., always vs. only when you buy) because in the PC model consumers see prices probabilistically.
We estimate the PC model on Nielsen scanner data for the ketchup and peanut butter categories. We compare the fit of the PC model to both MNL with a no-purchase option, and NMNL with the category purchase decision at the upper level of the nest. All three models incorporate (i) state dependence in brand preferences à la Guadagni and Little (1983), (ii) dependence of the value of no-purchase on duration since last purchase (to capture inventory effects), and (iii) unobserved heterogeneity in brand intercepts.
We find that the PC model produces substantially superior likelihood values to both the MNL and NMNL, and dominates on the AIC and BIC criteria. Simulation of data from the models reveals that the PC model produces a dramatically better fit to observed interpurchase spell lengths than do the MNL and NMNL models. In particular, the conventional models greatly exaggerate the probability of short spells. For the PC model, this problem is much less severe.
To our knowledge, this severe failure of conventional MNL and NMNL choice models to fit spell distributions has not been previously noted or, even if it has, it is certainly not widely known. We suspect this is because it is common in the marketing literature to evaluate models based on in-sample and holdout likelihoods, and fit to choice frequencies, while fit to choice dynamics is rarely examined. Since the PC model is as easy to estimate as the conventional models, we conclude it should be viewed as a serious alternative to MNL and NMNL.
2. PROBLEMS WITH CONVENTIONAL CHOICE MODELS, AND THE PC ALTERNATIVE
Keane (1997a) argued that conventional MNL and NMNL choice models (with or without no-purchase) could produce severely biased estimates of own- and cross-price elasticities of demand if inventory-planning behavior by consumers is important. To understand his argument, consider this simple example. Say there are two brands, A and B, and that consumers are totally loyal to either one or the other. The following table lists the number of consumers who buy A, buy B, or make no purchase over a 5-week span under two scenarios. First, a no-promotion environment:

Week          1   2   3   4   5
A            10  10  10  10  10
B            10  10  10  10  10
No-purchase  80  80  80  80  80
And second, a scenario where Brand A is on promotion (say a 20% price cut) in week 2:

Week          1   2   3   4   5
A            10  20   5   5  10
B            10  10  10  10  10
No-purchase  80  70  85  85  80
Now, notice that brand A's market share goes from 50% to 67% in week 2. Brand B's market share drops from 50% to 33%, a decline of 34%, even though no consumer switches away from B. Thus, a brand choice model without a no-purchase option would conclude that the cross-price elasticity of demand is substantial (i.e., .34/.20 = 1.7), even though it is, in reality, zero.
Interestingly, including a no-purchase option does not solve the basic problem. Notice there is a post-promotion dip, presumably arising from inventory behavior, that causes the number of people who buy A to drop by 50% in weeks 3 and 4, before returning to normal in week 5. If we took data from the whole 5-week period, we would see that Brand A's average sales when at its “regular” price are 7.5, increasing to 20.0 when it is on promotion. Thus, a conventional static choice model with a no-purchase option would conclude the price elasticity of demand is (20.0/7.5 − 1)/(.20) = 8.3. The correct answer is that the short-run elasticity is (20/10 − 1)/(.20) = 5 and that the long-run elasticity is zero, since all the extra sales come at the expense of future sales.
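The arithmetic of this example can be checked with a short sketch. It uses only the purchase counts in the two tables above; the variable and function names are ours, introduced purely for illustration.

```python
# Purchase counts from the two scenarios in the text (weeks 1..5).
baseline = {"A": [10, 10, 10, 10, 10], "B": [10, 10, 10, 10, 10]}
promo = {"A": [10, 20, 5, 5, 10], "B": [10, 10, 10, 10, 10]}
PRICE_CUT = 0.20   # brand A's price cut in week 2
PROMO_WEEK = 1     # zero-based index of week 2


def share(counts, brand, week):
    """Share of `brand` among purchases made in `week` (no-purchase excluded)."""
    total = sum(counts[b][week] for b in counts)
    return counts[brand][week] / total


# (1) Brand choice model without a no-purchase option: B's share falls even
#     though no B buyer switches, so the implied cross-price elasticity
#     (reported here in magnitude) is spurious.
share_b_before = share(baseline, "B", PROMO_WEEK)
share_b_after = share(promo, "B", PROMO_WEEK)
cross_elasticity = (1 - share_b_after / share_b_before) / PRICE_CUT

# (2) Static model with a no-purchase option: promoted sales are compared
#     with average sales over all regular-price weeks, dip included.
regular_weeks = [w for w in range(5) if w != PROMO_WEEK]
avg_regular_sales = sum(promo["A"][w] for w in regular_weeks) / len(regular_weeks)
static_own_elasticity = (promo["A"][PROMO_WEEK] / avg_regular_sales - 1) / PRICE_CUT

# (3) Short-run elasticity, measured against the no-promotion level of 10.
short_run_elasticity = (
    promo["A"][PROMO_WEEK] / baseline["A"][PROMO_WEEK] - 1
) / PRICE_CUT

# (4) Long-run response: total sales of A over the five weeks.
long_run_change = sum(promo["A"]) - sum(baseline["A"])
```

With unrounded shares the cross-price figure is 1.67 rather than the 1.7 obtained from the rounded 34% decline; the static own-price figure is 8.33, the short-run figure is 5, and total sales of A are unchanged over the five weeks.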
As Keane (1997a) notes, the exaggeration of the elasticity could be even greater if consumers are able to anticipate future sales. For example, suppose a retailer always puts brand A on sale in week 2 of every 5 week period. As consumers become aware of the pattern, they could concentrate their purchases in week 2, even if their price elasticity of demand were modest. Conventional choice models, however, would infer an enormous elasticity, since a modest 20% price cut causes sales to jump greatly in week two.
An important paper by Sun et al. (2003) shows that the problems with conventional choice models noted by Keane (1997a) are not merely academic. They conduct two experiments. First, they simulate data from a calibrated model where the hypothetical consumers engage in inventory-planning behavior. Thus, Sun et al. know (up to simulation noise) the true demand elasticities. They estimate a set of conventional choice models on these data: MNL with and without no-purchase, and NMNL with no-purchase. Each model includes consumer taste heterogeneity and state dependence. The MNL without no-purchase exaggerates cross-price elasticities by about 100%, while MNL and NMNL with no-purchase do so by about 50%. A dynamic structural model of inventory-planning behavior fit to the same data produces accurate elasticity estimates, as expected since it is the “true” model that was used to simulate the data.
Second, Sun et al. estimate the same set of models on Nielsen scanner data for ketchup. Strikingly, they obtain the same pattern. Estimated cross-price elasticities from the MNL and NMNL models that include no-purchase are about 50% greater than those implied by the dynamic structural model that accounts for inventory-planning behavior. And those from the MNL without no-purchase are about 100% greater.
Thus, marketing research needs to confront the fact that own- and cross-price elasticities of demand are much greater when estimated from pure brand choice models than from models that include a no-purchase option. What accounts for this phenomenon? As we've seen, one explanation is the inventory-planning behavior that Keane (1997a) predicted would cause such a problem. In response, several authors, such as Erdem et al. (2003), Sun et al. (2003) and Hendel and Nevo (2004), have proposed abandoning conventional static choice models (i.e., MNL and NMNL) in favor of dynamic structural inventory-planning models. Of course, the difficulty with these models is that they are extremely hard to estimate.
In this paper, we adopt a different course of action. We seek to develop a simple model, no more complex than MNL or NMNL, that does not suffer from the problems noted by Keane (1997a) and Sun et al. (2003). The motivation for our approach is twofold. First, we conjecture that a brand choice model with a more flexible representation for category consideration/purchase incidence may provide a more accurate reduced-form approximation to consumers' purchase decision rules than conventional models like MNL and NMNL. Second, we note there is a simpler way (than estimation of complex dynamic inventory models) to reconcile the pattern of elasticities across models noted by Sun et al. (2003). The idea is simply that consumers do not choose to observe brand prices in each period.
More specifically, we propose a two-stage “price consideration” (PC) model in which a consumer decides, in each period, whether to consider buying in a category. This decision may be influenced by inventories, advertising, feature and display conditions. If a consumer decides to consider a category, he/she looks at prices and decides whether and what brand to buy. Such a model can generate a pattern where consumers are more sensitive to prices when choosing among brands than when deciding whether to buy in a category. The positive probability that consumers do not even consider the category (i.e., they do not even see prices) creates a wedge between the brand choice price elasticity and the purchase incidence price elasticity.
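A minimal numerical sketch may help to see where the wedge comes from. It assumes, purely for illustration, that the second stage is a logit over the brands and a no-purchase option with utility normalized to zero, and that the consideration probability does not depend on prices. All prices, shares, the price coefficient and the 10% purchase rate are hypothetical, not estimates from the paper. Both cases are calibrated to the same brand shares and the same overall purchase rate, so they differ only in how often prices are seen.

```python
import math


def exp_utils(intercepts, prices, price_coef):
    """exp(V_j) for each brand, with V_j = a_j + price_coef * p_j and V_0 = 0."""
    return [math.exp(a + price_coef * p) for a, p in zip(intercepts, prices)]


def purchase_prob(intercepts, prices, price_coef, consider_prob):
    """P(buy any brand) = P(consider) * P(buy | consider)."""
    s = sum(exp_utils(intercepts, prices, price_coef))
    return consider_prob * s / (1.0 + s)


def brand_share(intercepts, prices, price_coef, j):
    """Share of brand j among purchases (brand choice given purchase)."""
    ev = exp_utils(intercepts, prices, price_coef)
    return ev[j] / sum(ev)


def calibrate(shares, prices, price_coef, purchase_rate, consider_prob):
    """Intercepts that reproduce the given brand shares and purchase rate."""
    q = purchase_rate / consider_prob   # purchase probability given consideration
    s = q / (1.0 - q)                   # required sum of exp(V_j)
    return [math.log(sh * s) - price_coef * p for sh, p in zip(shares, prices)]


def elasticity(f, prices, j, h=1e-6):
    """Central-difference elasticity of f(prices) with respect to prices[j]."""
    up, down = list(prices), list(prices)
    up[j] += h
    down[j] -= h
    return (f(up) - f(down)) / (2.0 * h) * prices[j] / f(prices)


# Hypothetical inputs, chosen only to illustrate the mechanism.
PRICES, SHARES, PRICE_COEF, PURCHASE_RATE = [1.8, 1.9], [0.3, 0.7], -1.0, 0.10

wedge = {}
for label, pi in (("always_considers", 1.0), ("pc_considers_40pct", 0.4)):
    a = calibrate(SHARES, PRICES, PRICE_COEF, PURCHASE_RATE, pi)
    incidence = elasticity(lambda p: purchase_prob(a, p, PRICE_COEF, pi), PRICES, 0)
    choice = elasticity(lambda p: brand_share(a, p, PRICE_COEF, 0), PRICES, 0)
    wedge[label] = (incidence, choice)
```

In this sketch the brand choice elasticity for brand 1 is about −1.26 in both cases, while the purchase incidence elasticity falls in magnitude from about −0.49 when prices are always seen to about −0.41 when the category is considered in 40% of periods. The same price coefficient therefore implies a larger gap between the two elasticities once consideration is probabilistic.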
While this model is extremely simple, it has not, as far as we know, been used before in the marketing literature. It is important to note that the PC model is quite different from a nested multinomial logit (NMNL) model, where consumers first decide whether to buy in a category and then, in a second stage, decide which brand to buy. In the first stage of the NMNL, the decision whether to purchase in the category is a function of the inclusive value from the second stage, which is, in fact, a price index for the category. Thus, the NMNL model assumes that consumers see all prices in all periods. The second stage is different as well, because in our model the second stage includes a no-purchase option. That is, if a consumer decides to consider a category, he/she may still decide, upon seeing prices, to choose no-purchase. An NMNL model can certainly generate a pattern in which brand choice price elasticities substantially exceed purchase incidence price elasticities, but it would do so by assuming that brands are very similar. As noted above, such an assumption is inconsistent with data on brand switching behavior.
There are a number of ways we can test our model against alternative models in the literature. First, we can simply ask whether it fits better than simple MNL and NMNL models that include a no-purchase option. Second, our model can be distinguished from the alternative dynamic inventory story for the same phenomenon by looking at categories where inventories are not important. If brand choice price elasticities exceed purchase incidence price elasticities even for non-storable goods, it favors the simple story.
4. CONSTRUCTION OF THE DATA SETS
We use the Nielsen scanner panel data on ketchup and peanut butter for Sioux Falls, SD and Springfield, MO. The sample period begins in week 25 of 1986 for both categories. It ends in week 34 of 1988 for ketchup, and in week 23 of 1987 for peanut butter. The ketchup category has 3189 households, 114 weeks, 324,795 store visits, and 24,544 purchases, while peanut butter has 7924 households, 51 weeks, 258,136 store visits, and 31,165 purchases.
During this period, there were four major brands in ketchup (Heinz, Hunt's, Del Monte, and the Store Brand) and four major brands in peanut butter (Skippy, JIF, Peter Pan and the Store Brand). There are also some minor brands with very small market shares, and we dropped households that bought these brands from the sample. As a result, we lose 558 (out of 7924) households in peanut butter, and 101 (out of 3189) households in ketchup. The number of store visits is reduced to 236,351 in peanut butter, and 314,417 in ketchup. To impute the initial value of GL and purch_gap, we use the first 10 weeks of individuals' choice history for peanut butter, and 20 weeks for ketchup. This reduced the number of store visits in the data to 175,675 and 259,310 for peanut butter and ketchup, respectively.
Household characteristics included in Z_{it} are household income (inc_{i}) and household size (mem_{i}). Attributes of alternatives included in X_{jt} are an indicator for whether the brand is on display (display_{jt}), whether it is a featured item (feature_{jt}), and a measure of coupon availability (coupon_av_{jt}). Attributes of the category included in X_{ct} are a dummy for whether at least one of the brands is on display (I_{dt}), and a dummy for whether one of the brands is a featured item (I_{ft}).
A difficulty arises in forming the price variable because we model only purchase timing and brand choice, but not quantity choice. Yet each brand offers more than one package size, and price per ounce varies across package sizes. The price variable must be on an equal footing across brands and weeks. Thus, we always use the price of the most common package size when estimating our model: 32 oz for ketchup and 18 oz for peanut butter. Admittedly, this introduces some measurement error into prices, but the problem cannot be resolved by a different construction of the price variable, only by introducing quantity choice.
Another problem in constructing the price and promotion variables is that in scanner data we only observe the price paid by the consumer for the brand he/she actually bought. Similarly, we only observe whether a brand is on display or feature when the consumer chooses the brand. Therefore, prices, display_{j} and feature_{j} for other brands and weeks must be inferred. Erdem et al. (1999) discuss this “missing prices” problem in detail.
To deal with this problem, we use the algorithm described in Keane (1997b). It works as follows: (1) Sort through all the data for a particular store on a particular day. If a consumer is found who bought a particular brand, then use the marked price he/she faced as the marked price for that brand in that store on that day. (2) If no one bought a particular brand in a particular store on a particular day, use the average marked price in that store in that week to fill in the price. (3) If no one bought a particular brand in a particular store in a particular week, then use the average marked price of the brand in that store over the whole sample period to fill in the price.
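The three fallback steps can be sketched as a lookup that moves from the finest to the coarsest level of aggregation. This is only an illustration of the logic described above, not the code of Keane (1997b): the record layout (`store`, `brand`, `week`, `day`, `price`) and the function names are hypothetical, and averaging the prices of several buyers on the same day is our assumption.

```python
from collections import defaultdict


def mean(values):
    return sum(values) / len(values)


def build_price_lookup(purchases):
    """Return a function price(store, brand, week, day) built from observed purchases.

    purchases: iterable of dicts with keys store, brand, week, day, price.
    """
    by_day = defaultdict(list)     # step 1: same store, brand and day
    by_week = defaultdict(list)    # step 2: same store, brand and week
    by_store = defaultdict(list)   # step 3: same store and brand, whole sample
    for r in purchases:
        by_day[(r["store"], r["brand"], r["week"], r["day"])].append(r["price"])
        by_week[(r["store"], r["brand"], r["week"])].append(r["price"])
        by_store[(r["store"], r["brand"])].append(r["price"])

    def price(store, brand, week, day):
        levels = (
            ((store, brand, week, day), by_day),
            ((store, brand, week), by_week),
            ((store, brand), by_store),
        )
        for key, table in levels:
            if key in table:
                return mean(table[key])
        return None  # brand never observed in this store

    return price


# Hypothetical records for one store and one brand.
observed = [
    {"store": 1, "brand": "Heinz", "week": 1, "day": 1, "price": 1.00},
    {"store": 1, "brand": "Heinz", "week": 1, "day": 3, "price": 1.20},
    {"store": 1, "brand": "Heinz", "week": 2, "day": 8, "price": 0.90},
]
lookup = build_price_lookup(observed)
same_day = lookup(1, "Heinz", 1, 1)       # step 1: observed that day
same_week = lookup(1, "Heinz", 1, 2)      # step 2: weekly average
whole_sample = lookup(1, "Heinz", 3, 15)  # step 3: store average over the sample
never_seen = lookup(1, "Hunt's", 1, 1)    # no information at any level
```

The same fallback structure would apply to the display and feature dummies, which is why imputed values of those variables can fall strictly between 0 and 1.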
The missing display_{j} and feature_{j} variables are inferred in a similar fashion. Of course, the observed display_{j}'s and feature_{j}'s are dummies equal to 0 or 1. However, since we may use weekly average values or store average values to fill in the missing display_{j}'s and feature_{j}'s, some of them end up falling between 0 and 1 in our data set.
Next we turn to our coupon variable. Keane (1997b) and Erdem et al. (1999) discuss the extremely severe endogeneity problem—leading to extreme upward bias in price elasticities—created by use of price net of redeemed coupons as the price variable. Instead, we use a brand/week specific measure of coupon availability, coupon_av_{jt}. Basically, this variable measures the average level of coupon redemption for a particular brand in a particular week. The algorithm for constructing it, which is rather involved, is also described in Keane (1997b).
A final problem is that a small percentage of observations have unreasonably high prices, presumably due to measurement/coding errors. So we created a maximum “plausible” price in each category, $3 for peanut butter and $2 for ketchup (in 1985-88 nominal dollars), and replaced prices above this maximum with the mean observed price of the brand. This procedure affected 2% of the observations for peanut butter, and 1.4% of the observations for ketchup.
Summary statistics for households are given in Table I, while those for brands are in Table II. Notice that the two categories are rather different. In peanut butter, the four brands have similar market shares (ranging from 29.9% for Skippy to 19.8% for Peter Pan), while in ketchup Heinz is dominant (with a market share of 63.1%). Households buy peanut butter and ketchup on 11.66% and 7.35% of shopping occasions, respectively. These figures give face validity to the idea of the PC model: when purchase frequency in a category is this low, it seems unlikely that households would check on peanut butter and ketchup prices in every week.
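The price-capping rule can be sketched as follows. The text does not say whether the brand mean is computed with or without the implausible observations; the sketch assumes they are excluded, and the function name and example prices are hypothetical.

```python
def replace_implausible_prices(prices, max_price):
    """Replace prices above max_price with the brand's mean plausible price.

    prices: dict mapping brand -> list of observed prices.
    Assumption: the brand mean is taken over prices at or below max_price.
    """
    cleaned = {}
    for brand, observed in prices.items():
        plausible = [p for p in observed if p <= max_price]
        if not plausible:
            # Nothing to average over; leave the brand's prices unchanged.
            cleaned[brand] = list(observed)
            continue
        brand_mean = sum(plausible) / len(plausible)
        cleaned[brand] = [p if p <= max_price else brand_mean for p in observed]
    return cleaned


# Hypothetical peanut butter prices with one coding error, capped at $3.
raw = {"Skippy": [1.80, 1.90, 9.99]}
capped = replace_implausible_prices(raw, max_price=3.0)
```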
Table I. Summary statistics of the data and household characteristics

                                   Peanut Butter  Ketchup
#Households                        7,366          3,088
Max #weeks observed per household  41             94
#Store visits                      175,675        259,310
#Purchases                         20,478         19,044
Average household income*          5.94           5.99
Average household size             2.84           2.73
Table II. Summary statistics of product characteristics

Peanut Butter
Alternative           No purchase  Skippy  JIF     Peter Pan  Store Brand
#observations         155,197      6,137   4,867   4,061      5,413
share (%)             88.34        3.49    2.77    2.31       3.08
mean(p_{jt})          n.a.         1.842   1.917   1.887      1.366
mean(feature_{jt})    n.a.         0.0211  0.0033  0.0380     0.0298
mean(display_{jt})    n.a.         0.0036  0.0038  0.0317     0.0085
mean(coupon_av_{jt})  n.a.         0.1030  0.063   0.215      0.0025

Ketchup
Alternative           No purchase  Heinz   Hunt's  Del Monte  Store Brand
#observations         240,266      12,042  3,212   1,528      2,262
share (%)             92.65        4.64    1.24    0.59       0.87
mean(p_{jt})          n.a.         1.151   1.146   1.086      0.874
mean(feature_{jt})    n.a.         0.0401  0.0433  0.0583     0.0236
mean(display_{jt})    n.a.         0.0282  0.0329  0.0266     0.0183
mean(coupon_av_{jt})  n.a.         0.1246  0.0835  0.0240     0.0045

Category-level indicators
                      Peanut Butter  Ketchup
mean(I_{ft})          0.0904         0.1602
mean(I_{dt})          0.0463         0.1027
5. ESTIMATION RESULTS
Table III presents results obtained by estimating the multinomial logit model (MNL), nested multinomial logit model (NMNL) and price consideration model (PC), using data for the peanut butter category. We estimate two versions of the PC model. In PC-I, decisions to consider the category depend on category feature and display indicators, the purchase gap, and household size, while in PC-II these factors influence the utility of the no-purchase option as well.
Table III. Estimates for the Peanut Butter Category
(n.a. marks parameters that do not enter the model in that column)

                          MNL                 NMNL                PC I                PC II
Parameter                 Estimate  s.e.      Estimate  s.e.      Estimate  s.e.      Estimate  s.e.
α_{1} (Store Brand)       −5.465    0.148     −6.673    0.212     −3.791    0.078     −4.972    0.158
α_{2} (JIF)               −5.309    0.159     −6.474    0.220     −3.591    0.097     −4.804    0.173
α_{3} (Peter Pan)         −6.003    0.161     −7.217    0.224     −4.317    0.102     −5.499    0.175
α_{4} (Skippy)            −5.115    0.157     −6.251    0.216     −3.387    0.094     −4.596    0.169
α_{init, GL}              106.170   6.061     108.471   5.558     23.548    1.702     25.002    2.223
α_{init, pg}              −0.045    0.004     −0.082    0.007     −0.041    0.004     −0.037    0.005
β_{d}(display_{jt})       1.377     0.054     1.471     0.056     1.728     0.048     1.414     0.055
β_{f}(feature_{jt})       2.061     0.039     2.201     0.042     2.259     0.032     2.084     0.040
β_{c}(coupon_av_{jt})     1.590     0.157     1.986     0.172     1.605     0.164     1.599     0.166
φ_{p}(p_{jt})             −0.239    0.085     −0.346    0.092     −0.846    0.050     −0.226    0.090
φ_{inc}(p_{jt}·inc_{i})   0.017     0.003     0.024     0.003     0.018     0.003     0.018     0.003
φ_{mem}(p_{jt}·mem_{i})   −0.047    0.022     −0.071    0.024     0.112     0.006     −0.071    0.023

State dependence: GL_{ijt} = δ*GL_{ijt−1} + (1 − δ)*d_{ijt−1}
λ(GL_{ijt})               −1.869    0.584     −0.451    0.723     3.179     0.310     2.692     0.320
δ                         0.989     0.001     0.989     0.001     0.949     0.004     0.953     0.005

Utility of no purchase:
β_{fc}(I_{ft})            −0.391    0.030     −0.579    0.034     n.a.      n.a.      −0.289    0.036
β_{dc}(I_{dt})            −0.608    0.038     −0.712    0.038     n.a.      n.a.      −0.493    0.046
β_{mem}(mem_{i})          −0.306    0.039     −0.309    0.031     n.a.      n.a.      −0.330    0.041
β_{pg}(purch_gap_{it})    −0.023    0.002     −0.019    0.001     n.a.      n.a.      0.001     0.002
η                         n.a.      n.a.      −0.855    0.122     n.a.      n.a.      n.a.      n.a.
ρ (= 1/(1 + exp(η)))      n.a.      n.a.      0.702     n.a.      n.a.      n.a.      n.a.      n.a.

Probability of considering a category: P_{it}(C) = exp(L_{it})/(1 + exp(L_{it})), where
L_{it} = γ_{i0} + γ_{f}·I_{ft} + γ_{d}·I_{dt} + γ_{mem}·mem_{i} + γ_{pg}·purch_gap_{it}, and γ_{i0} = γ_{0} + γ_{initial}·purch_gap_{i1} + υ_{i}
γ_{0}                     n.a.      n.a.      n.a.      n.a.      −1.594    0.140     −1.592    0.162
γ_{initial}               n.a.      n.a.      n.a.      n.a.      −0.029    0.012     −0.042    0.013
γ_{mem}                   n.a.      n.a.      n.a.      n.a.      0.118     0.030     0.121     0.033
γ_{f}                     n.a.      n.a.      n.a.      n.a.      1.376     0.102     1.015     0.107
γ_{d}                     n.a.      n.a.      n.a.      n.a.      1.748     0.183     1.045     0.157
γ_{pg}                    n.a.      n.a.      n.a.      n.a.      0.717     0.035     0.868     0.055
σ_{υ}                     n.a.      n.a.      n.a.      n.a.      1.163     0.055     1.312     0.088

Fit statistics            MNL         NMNL        PC I        PC II
−log-likelihood           74596.50    74546.882   73983.11    73869.12
−2(log-likelihood)        149193.00   149093.764  147966.23   147738.23
AIC                       149249.00   149151.764  148028.23   147808.23
BIC                       149539.45   149452.583  148349.79   148171.29
One striking aspect of the results is that the coefficient α_{init, GL} from equation (30), which captures how the presample initialized values of the GL variables are related to the brand intercepts, is very large and significant in all the models. Thus, the presample purchase behavior conveys a great deal of information about brand preferences.
Strikingly, there is no evidence for state dependence in the MNL and NMNL models. The coefficient on GL is the wrong sign (and insignificant in NMNL), and the parameter δ is close to one, implying a brand purchase hardly moves GL. The PC-I and PC-II models, on the other hand, imply significant state dependence. The coefficients on GL are 2.7 to 3.2, and the estimates imply a value of (1 − δ), the coefficient on the purchase dummy, of around .05. Thus, e.g., in PC-II a lagged purchase raises the next period utility for buying a brand by 0.127.
All four models imply similar average price coefficients. The PC-I and PC-II models imply that, for the “average” household, the price coefficients are −.421 and −.321, respectively. Thus, e.g., in PC-II, the effect of a lagged brand purchase on current utility from the brand is equivalent to the effect of a .127/.321 = 39 cent price reduction. Mean prices are in the $1.36 to $1.92 range, so this is a substantial state dependence effect.
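These figures can be reproduced from the PC II column of Table III and the household means in Table I. The sketch below takes the price coefficient for the “average” household to be φ_{p} + φ_{inc}·inc + φ_{mem}·mem, which is how we read the interaction terms in the table; the function names are ours.

```python
def average_price_coef(phi_p, phi_inc, phi_mem, inc, mem):
    """Price coefficient evaluated at mean household income and size."""
    return phi_p + phi_inc * inc + phi_mem * mem


def gl_update(gl_prev, bought_prev, delta):
    """GL_{ijt} = delta * GL_{ijt-1} + (1 - delta) * d_{ijt-1}."""
    return delta * gl_prev + (1 - delta) * bought_prev


# PC II estimates for peanut butter (Table III) and household means (Table I).
LAMBDA, DELTA = 2.692, 0.953
coef = average_price_coef(-0.226, 0.018, -0.071, inc=5.94, mem=2.84)

# Effect of a purchase last period on GL, relative to no purchase last period.
gl_shift = gl_update(0.0, 1, DELTA) - gl_update(0.0, 0, DELTA)
utility_gain = LAMBDA * gl_shift

# Price cut (in dollars) that raises utility by the same amount.
price_equivalent = utility_gain / abs(coef)
```

With unrounded inputs the price coefficient is about −0.321, the utility gain about 0.127, and the price equivalent about $0.39, in line with the figures quoted above.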
Part of why the PC models imply substantial state dependence while MNL and NMNL do not is that the coefficient α_{init, GL}, while still substantial, is smaller in the PC models by a factor of 4. Thus, the MNL and NMNL models ascribe all the presample differences in purchase behavior to heterogeneity, while the PC models do not.
As we would expect, the display, feature and coupon availability variables all have large and highly significant positive coefficients in the brand-specific utility functions in all four models. In the MNL and NMNL models, the utility of the no-purchase option depends negatively on the category feature and display indicators, duration since last purchase in the category (purch_gap), and on household size. The latter two findings are consistent with inventory behavior. Analogously, in the PC model, the probability of considering the category depends positively on the category feature and display indicators, duration since last purchase in the category, and household size. All these effects are highly significant.
It is useful to examine what the PC models imply about the probability of considering the category under different circumstances. Consider a baseline situation with no brand on display or feature, and a household of size 3 that just bought last period (i.e., purch_gap_{it} = 1). Then, the PC-II model estimates imply the probability of considering the peanut butter category is 39.7%, on average. The estimates of γ_{f} and γ_{d} are 1.015 and 1.045, respectively. This implies that the consideration probability increases to 75.6% if one or more brands is on display and feature. [Note that, in peanut butter, the category display and feature indicators equal 1 in 4.63% and 9.04% of weeks, respectively.] Finally, the estimate of γ_{pg} is 0.868. This implies that, starting from the baseline, if we increase the purchase gap to 5 weeks, the probability of considering the category increases to 90.7%.
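A calculation of this kind can be sketched from the consideration equation reported with Table III. Two things are assumed here and are not taken from this section: that υ_{i} is normally distributed with standard deviation σ_{υ}, and the value of the initial purchase gap purch_gap_{i1}, which is set to zero by default. Because of these assumptions the sketch need not reproduce the percentages quoted above exactly; it only illustrates how the average over υ_{i} is formed.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def consideration_prob(gamma, feature, display, mem, purch_gap,
                       initial_gap=0.0, n_nodes=2001):
    """Average of logistic(L + v) over v ~ N(0, sigma^2), by the trapezoid rule.

    initial_gap stands in for purch_gap_{i1}; its default of 0 is hypothetical.
    """
    base = (gamma["0"] + gamma["initial"] * initial_gap
            + gamma["f"] * feature + gamma["d"] * display
            + gamma["mem"] * mem + gamma["pg"] * purch_gap)
    sigma = gamma["sigma"]
    if sigma == 0:
        return logistic(base)
    lo, hi = -8.0 * sigma, 8.0 * sigma
    step = (hi - lo) / (n_nodes - 1)
    total = 0.0
    for k in range(n_nodes):
        v = lo + k * step
        density = math.exp(-0.5 * (v / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        weight = 0.5 if k in (0, n_nodes - 1) else 1.0
        total += weight * logistic(base + v) * density * step
    return total


# PC II estimates for peanut butter from Table III.
PC2 = {"0": -1.592, "initial": -0.042, "mem": 0.121,
       "f": 1.015, "d": 1.045, "pg": 0.868, "sigma": 1.312}

baseline = consideration_prob(PC2, feature=0, display=0, mem=3, purch_gap=1)
promoted = consideration_prob(PC2, feature=1, display=1, mem=3, purch_gap=1)
long_gap = consideration_prob(PC2, feature=0, display=0, mem=3, purch_gap=5)

# Same baseline with the random effect switched off, for comparison.
no_heterogeneity = consideration_prob(dict(PC2, sigma=0.0), 0, 0, 3, 1)
```

Averaging over υ_{i} pulls the probabilities toward one half relative to the value at υ_{i} = 0, which is why the heterogeneity term matters for figures of this kind.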
The PC-II model implies that, even conditional on having decided to consider the category, the category feature and display indicators, and household size, have significant negative effects on the utility of no-purchase. The log-likelihood improvement from PC-I to PC-II is 114 points, while AIC and BIC improve by 220 and 179 points, respectively. This suggests feature and display have some influence on brand choice beyond just drawing attention to the category.
Finally, we turn to our main goal, which is to investigate whether the PC model fits the data better than MNL and NMNL. The bottom panel of Table III presents log-likelihood, AIC and BIC values for the four models. The likelihood value for NMNL is modestly better than that for MNL (i.e., 50 points). The ρ on the inclusive value is 0.70, which implies that brands are closer substitutes than suggested by the MNL model (see equation (9), and recall that if ρ = 1.0 the models are equivalent). As we discussed in Sections 2 and 3.1, NMNL can generate brand choice price elasticities exceeding those for purchase incidence by assuming brands are similar.
However, the log-likelihoods for the PC-I and PC-II models are superior to NMNL by 563.8 points and 677.8 points, respectively. The AIC and BIC produce very similar comparisons. This is to be expected as the models have similar numbers of parameters (i.e., the MNL, NMNL, PC-I and PC-II models have 28, 29, 31 and 35 parameters, respectively). Thus, the PC models clearly produce better fits to the peanut butter data than the MNL and NMNL models.
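The information criteria in Table III can be recomputed from the log-likelihoods and parameter counts using the standard definitions AIC = −2 ln L + 2k and BIC = −2 ln L + k ln n. The text does not state which n enters the BIC; the sketch below assumes n = 236,351, the number of peanut butter store visits reported in Section 4 after dropping minor-brand households, which appears to reproduce the reported values.

```python
import math


def aic(neg_loglik, n_params):
    """Akaike information criterion from the negative log-likelihood."""
    return 2 * neg_loglik + 2 * n_params


def bic(neg_loglik, n_params, n_obs):
    """Bayesian information criterion from the negative log-likelihood."""
    return 2 * neg_loglik + n_params * math.log(n_obs)


# (negative log-likelihood, number of parameters) for peanut butter.
models = {
    "MNL": (74596.50, 28),
    "NMNL": (74546.882, 29),
    "PC I": (73983.11, 31),
    "PC II": (73869.12, 35),
}
N_OBS = 236_351  # assumed sample size for the BIC (see lead-in)

fit = {name: (aic(nll, k), bic(nll, k, N_OBS)) for name, (nll, k) in models.items()}

# Log-likelihood gains of the PC models over NMNL.
gain_pc1 = models["NMNL"][0] - models["PC I"][0]
gain_pc2 = models["NMNL"][0] - models["PC II"][0]
```

Because the penalty terms differ by at most a few dozen points across models, the ranking by AIC and BIC follows the ranking by log-likelihood.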
Table IV presents estimates for the ketchup category. Here the estimate of ρ in the NMNL model is essentially 1, so MNL and NMNL are essentially equivalent. Hence, the parameter estimates and likelihood values for MNL and NMNL are essentially the same. The qualitative results for ketchup are quite similar to those for peanut butter. One small difference is that, in ketchup, MNL and NMNL do generate a positive coefficient λ on the GL variable (about 1.3), but it is not statistically significant. Furthermore, the state dependence implied by the point estimates is very weak. The estimates imply a value of (1 − δ), the coefficient on the purchase dummy, of around 0.005. Thus, a lagged purchase raises the next period utility for buying a brand by only about 0.007. As the price coefficient at the mean of the data is −.797, the effect of a lagged purchase on the current period utility evaluation for a brand is equivalent to a price cut of less than 1 cent.
Table IV. Estimates for the Ketchup Category
(n.a. marks parameters that do not enter the model in that column)

                          MNL                 NMNL                PC I                PC II
Parameter                 Estimate  s.e.      Estimate  s.e.      Estimate  s.e.      Estimate  s.e.
α_{1} (Heinz)             −3.902    0.170     −3.902    0.176     −2.860    0.084     −3.693    0.181
α_{2} (Hunt's)            −5.251    0.171     −5.252    0.176     −4.084    0.087     −4.915    0.182
α_{3} (Del Monte)         −6.130    0.173     −6.129    0.181     −4.959    0.095     −5.786    0.187
α_{4} (Store Brand)       −5.966    0.166     −5.969    0.174     −4.799    0.082     −5.593    0.182
α_{init, GL}              86.105    7.695     86.245    4.269     16.720    1.921     16.327    1.013
α_{init, pg}              −0.037    0.002     −0.037    0.002     −0.030    0.002     −0.031    0.003
β_{d}(display_{jt})       1.097     0.041     1.097     0.042     1.128     0.032     1.137     0.042
β_{f}(feature_{jt})       2.238     0.032     2.238     0.032     2.136     0.024     2.296     0.034
β_{c}(coupon_av_{jt})     1.583     0.105     1.583     0.107     1.835     0.111     1.836     0.111
φ_{p}(p_{jt})             −1.001    0.148     −1.000    0.154     −1.680    0.080     −0.963    0.154
φ_{inc}(p_{jt}·inc_{i})   0.009     0.005     0.009     0.005     0.006     0.005     0.004     0.005
φ_{mem}(p_{jt}·mem_{i})   0.055     0.041     0.055     0.042     0.249     0.012     0.037     0.042

State dependence: GL_{ijt} = δ*GL_{ijt−1} + (1 − δ)*d_{ijt−1}
λ(GL_{ijt})               1.338     0.800     1.346     0.970     4.235     0.453     4.323     0.438
δ                         0.995     0.0005    0.995     0.0003    0.973     0.004     0.972     0.002

Utility of no purchase:
β_{fc}(I_{ft})            −0.037    0.029     −0.037    0.034     n.a.      n.a.      0.244     0.035
β_{dc}(I_{dt})            −0.105    0.034     −0.105    0.038     n.a.      n.a.      0.032     0.040
β_{mem}(mem_{i})          −0.217    0.045     −0.217    0.031     n.a.      n.a.      −0.259    0.048
β_{pg}(purch_gap_{it})    −0.011    0.001     −0.011    0.001     n.a.      n.a.      −0.002    0.001
η                         n.a.      n.a.      −10.212   0.632     n.a.      n.a.      n.a.      n.a.
ρ (= 1/(1 + exp(η)))      n.a.      n.a.      0.999     n.a.      n.a.      n.a.      n.a.      n.a.

Probability of considering a category: P_{it}(C) = exp(L_{it})/(1 + exp(L_{it})), where
L_{it} = γ_{i0} + γ_{f}·I_{ft} + γ_{d}·I_{dt} + γ_{mem}·mem_{i} + γ_{pg}·purch_gap_{it}, and γ_{i0} = γ_{0} + γ_{initial}·purch_gap_{i1} + υ_{i}
γ_{0}                     n.a.      n.a.      n.a.      n.a.      −1.043    0.163     −0.937    0.157
γ_{initial}               n.a.      n.a.      n.a.      n.a.      −0.015    0.008     −0.009    0.008
γ_{mem}                   n.a.      n.a.      n.a.      n.a.      −0.036    0.035     −0.070    0.033
γ_{f}                     n.a.      n.a.      n.a.      n.a.      1.527     0.107     1.925     0.161
γ_{d}                     n.a.      n.a.      n.a.      n.a.      1.035     0.141     1.274     0.182
γ_{pg}                    n.a.      n.a.      n.a.      n.a.      0.475     0.028     0.421     0.025
σ_{υ}                     n.a.      n.a.      n.a.      n.a.      0.866     0.114     0.716     0.107

Fit statistics            MNL         NMNL        PC I        PC II
−log-likelihood           71945.204   71945.196   71357.804   71313.949
−2(log-likelihood)        143890.408  143890.393  142715.608  142627.899
AIC                       143946.408  143948.393  142777.608  142697.899
BIC                       144244.845  144257.489  143108.021  143070.945
As with peanut butter, the PC models again imply much stronger state dependence. The price coefficient for an “average” household is −.964 for PC-I and −.838 for PC-II, which is similar to the values produced by the logit models. But the PC models generate values of λ·(1 − δ) of about (4.3)(.028) = .12, so a lagged purchase is comparable to about a .12/.84 = 14 cent price cut. Mean prices are $0.87 to $1.15, so this is a substantial effect.
Another difference is that category feature and display variables have positive effects on the value of no-purchase in the PC-II model, whereas in peanut butter they had negative effects. Thus, in ketchup, feature and display make a consumer more likely to consider the category, but, conditional on consideration, they make the consumer less likely to actually buy. In peanut butter, they made both consideration and conditional purchase probabilities higher.
To get a sense of how likely a household is to consider ketchup during a store visit, take a situation with no brand on display or feature, and a household of size 3 that bought last period. The estimates of PC-II imply the probability the household considers buying ketchup is 33.4%, on average. This increases to 90.2% if one or more brands is on display and feature, a larger effect than in peanut butter. [Note that category display and feature indicators equal 1 in 10.3% and 16.0% of weeks, respectively.] Finally, if we increase the purchase gap to 5 weeks, the consideration probability increases from 33.4% to 69.4%, a smaller effect than in peanut butter.
Turning to the issue of model fit, the MNL and NMNL produce essentially identical log-likelihoods, for the reason noted earlier. The PC-I model is superior by 587.4 points, and the PC-II model is superior by 631.3 points. Again, the PC-I and PC-II models have only 2 and 6 more parameters than NMNL, so the AIC and BIC tell a very similar story. Thus, the PC model also produces a clearly better fit to the ketchup data than do the MNL and NMNL models.