The periodic review model with independent age‐dependent lifetimes

A retailer places orders periodically for items that are shipped by a wholesaler. Items that are not sold perish randomly and independently of one another, with the perish probability depending on the age class. We consider a first‐in‐first‐out policy for depleting items. We model this problem as a Markov decision process with stochastic demand, unit holding, outdating and ordering costs, plus unit penalty costs for lost sales. We prove convexity for the penultimate period and show convexity may not hold any earlier. A dynamic program can be solved optimally for small instances. We introduce both a one‐stage‐lookahead heuristic and a heuristic which is a combination of two existing standard approaches, the newsvendor and periodic review models. For simulated data, we compare these heuristics to the optimal solution for small problem instances and to further lookahead policies for larger problem instances. We show that the two new heuristics achieve results close to optimal. Our numerical study, which includes real data from a large European retail chain, highlights that products perishing independently from each other strongly affect model behavior compared to existing approaches from the literature.

to the inconvenience caused. Consequently, managing perishable products efficiently is crucial to the profitability of groceries in the retail industry. By taking the characteristics of customer demand and the perishability process into account, retailers can make better inventory decisions which balance satisfying customer needs and minimizing spoilage.
In this paper, we focus on food products. However, our model also applies to other settings such as the healthcare industry, for example in blood supply management. Red blood cells can be transfused up to 42 days after donation (Sarhangian et al., 2018), and blood platelets have a lifetime of only 5 days (Chao et al., 2018; Chen et al., 2019). Perishability is of special importance to blood platelets, since they are expensive to purchase, store, and dispose of, and lives may be put at risk if their management is not planned carefully (Duan & Liao, 2013; Haijema et al., 2007).
We consider store managers' ordering decisions for perishable products in the setting of a large European retail chain

Production and Operations Management
where customers pick older items first, that is, according to a first-in-first-out (FIFO) policy. The FIFO policy is preferred by retailers because it helps to reduce spoilage. Hence, newer items are placed in boxes under the older items or kept in the backroom storage and cold room (Reiner et al., 2013). FIFO is also commonly used in the literature (e.g., Chao et al., 2015; Karaesmen et al., 2011; Ketzenberg et al., 2015; Nahmias, 2011).
When placing an order, it is important for the retailer to forecast both future demand and the amount of products that will perish. Customer demand is stochastic and varies from day to day. The perishing process is characterized by two measures. One is the maximum shelf life, which is the maximum number of days a product can be sold. The other is the perish probability, which determines the probability with which a product has to be discarded at the end of a day. A product perishing affects the retailer both by incurring an outdating cost and by depleting the inventory available for the next day. The closer an item is to its maximum shelf life, the more likely it is to perish.
We model the retailer's problem as a periodic review inventory control model with lost sales. Customer demand has an arbitrary distribution that we do not require to be stationary. In each period, any item perishes independently of the others with a known, age-dependent perish probability. Therefore, items may perish out of sequence, that is, in a different order than they arrived. Any perished item incurs an outdating cost, but there is also a per-item penalty cost for unfulfilled demand. We formulate our model as a Markov decision process (MDP).
Our setting is similar to the one analyzed by Ketzenberg et al. (2015), but with two important extensions. The first is nonstationary demand, which is frequently observed in retailing, where demand tends to be higher at weekends. The second is that all items perish independently; Ketzenberg et al. (2015) assume dependent perishing, where items of the same age class all perish at the same time. Our independence assumption increases the number of possible state transitions but is more realistic for the retail setting we consider; for example, one bad pack of strawberries usually does not spoil the whole batch. As we will show in our numerical study, independent perishing strongly affects model behavior, so dependent perishing is not a good substitute.
To demonstrate the practical use of our model, we collected hourly data from a large European retail chain over a time horizon of 2 years for 66 stores. We analyze sales data and selling prices for five perishable products (fruit and vegetables) with a range of maximum shelf lives and age-dependent perish probabilities. The store managers currently use their own judgment to make order decisions, and the retail chain is considering using commercial software solutions in the future that would contain standard methods such as the periodic review or newsvendor model. In this paper, we compare such standard methods to the optimal solution and the heuristics we develop.
The paper proceeds as follows. In Section 2, we review the literature on periodic review models with perishable products. The following sections make several contributions to this literature. In Section 3, we propose a new model for a real-life problem: the periodic review model with independent age-dependent lifetimes and lost sales. We also develop a simplifying cost transformation and a lower bound on the optimal cost, and show that convexity holds in the penultimate period but may fail earlier.
For instances with a small state space, we obtain optimal solutions via dynamic programming. Each state can have a large number of follow-up states when inventories are large and maximum shelf lives long, so many realistic problem instances are computationally infeasible to solve optimally. Therefore, in Section 4, we introduce two heuristics tractable for larger problem instances. In Section 5, we analyze the performance of these two heuristics for both simulated data and the aforementioned real data from the retail chain. We compare their performance both to each other and to simple order policies which either assume that items always perish (newsvendor model) or never perish (periodic review model). We show that there is a clear benefit from using these heuristics instead of simple policies, which are often applied in practice. In addition, our numerical results indicate that, in any considered problem case, at least one of our two heuristics is always close to optimality.

LITERATURE REVIEW
We review the literature on periodic review models for perishable products with stochastic demand. We distinguish between models that assume a fixed lifetime and those that consider random lifetimes; the latter are also related to the literature on random yield.

Fixed lifetime
Research on fixed lifetime inventory systems originated with Van Zyl (1964), where the periodic review setting with backorders, stochastic demand, a FIFO inventory depletion policy, and a fixed lifetime of two periods is analyzed. Upper and lower bounds on the cost function are derived, and a nonstationary optimal ordering policy is shown to exist. Nahmias and Pierskalla (1973) add a per-item perish cost to this model, and Nahmias (1975) further extends to a lifetime of n periods. A myopic policy based on an approximated outdating distribution function is developed in Nahmias (1976).
The model of Nahmias (1975) has also been studied with lost sales instead of backorders. Nandakumar and Morton (1993) utilize expected outdate bounds similarly to Nahmias (1976) to derive a heuristic. Chao et al. (2015) investigate nonstationary demand which can be correlated to account for seasonality. They introduce two policies that balance cost components and derive worst-case performance bounds. They show that their results hold for both the backlogging and lost-sales settings. Williams and Patuwo (1999) assume a lifetime of two periods but allow positive lead times. They derive optimal order quantities for lead times of up to four periods. Minner and Transchel (2010) analyze a model with positive lead times under service-level constraints. Using their analysis of the inventory distribution, they compare a dynamic order size policy with constant order and base-stock policies. Haijema and Minner (2016) provide further insight into fixed lifetime models by comparing the performances of constant order policies, base-stock policies, and hybrid policies that also take age-dependent inventories into account. Recently, Haijema and Minner (2019) consider a model with batch sizes and a mixture of demand via FIFO and last-in-first-out (LIFO) depletion policies. They propose new solution policies based on a division of stock into new and old products, which reduces the complexity.

Random lifetime
The majority of the random lifetime literature concerns continuous review models. Indeed, Ketzenberg et al. (2015) point out that, by the time of publication of their paper, only they and Nahmias (1977) consider a periodic review model for perishable products with random lifetime and stochastic demand. Nahmias (1977) extends the model by Nahmias (1975) to a stochastic lifetime while assuming that successive goods outdate in the same order that they arrive and that all goods of the same age perish or survive jointly. Using a similar approach to Nahmias (1976), an adjusted myopic policy is derived. Ketzenberg et al. (2015) address the random lifetime setting of Nahmias (1977) for lost sales. While they drop the assumption that products perish in the order in which they arrive, they do assume that all items in the same age group perish together. They evaluate the value of time and temperature history to determine the quality of goods. Assuming a lead time of one period, an infinite time horizon, and a stationary demand distribution, they develop heuristics that trade off simplicity and performance. Kouki and Jouini (2015) study the effect of random lifetimes on the performance of inventory systems. They develop an analytical solution for exponential and deterministic lifetime distributions and conduct a simulation study for the more general case of Erlang distributions. For a comprehensive review of the literature on all types of inventory systems, see Bakker et al. (2012), Goyal and Giri (2001), Janssen et al. (2016), Karaesmen et al. (2011), and Nahmias (1982).
The periodic-review inventory problem that we consider with stochastic demand, lost sales, and age-dependent perishability where each item perishes independently has not been analyzed in the literature, despite this being a very common problem in retailing. Standard models from the literature solve extreme cases of our problem where either all or no items perish each day. We assess both these standard models and the age-group-dependent perishing model of Ketzenberg et al. (2015) in our independent-perishing setting.

Random yield
Related to modeling random lifetimes is the literature on random yield, as only a portion of the products makes it to the next period due to the probability of an item perishing. For example, Voelkel et al. (2020) suggest an MDP that considers random yield during transport where either all or none of the items can spoil in each lead time period. Sonntag and Kiesmüller (2017) also consider stochastic demand and random yield in a multistage system. They introduce quality control systems to monitor the effect of random yield and achieve significant safety stock reductions. Kiesmüller and Inderfurth (2018) propose a periodic-review model with order inflation to solve the multiperiod inventory control problem with stochastic demand, fixed setup costs, and random production yield. They show that simple heuristics can perform very well for this complex problem if the parameters are adjusted to demand and yield risks appropriately. For a more extensive overview of the literature on random yield models, see Yano and Lee (1995) and, more recently, Kiesmüller and Inderfurth (2018).
Even though our setting is similar to random yield problems, common policies such as order inflation cannot be applied: the inflation factor would ignore the fact that inventory in the first period is not subject to random yield as no perishing has occurred yet. Furthermore, the inflation factor does not take the age structure into account, which is important in our setting with age-dependent perish probabilities.

MODEL
In this section, we formulate the mathematical basis for analyzing the stochastic periodic review model of a single stock-keeping unit with a random and age-dependent lifetime. We consider a finite planning horizon of $T$ periods, indexed by $t = 1, \dots, T$. In-stock units have a maximum lifetime of $m$ periods. We assume that items perish independently of one another, each according to a Bernoulli distribution. The complete notation is summarized in Supporting Information Section EC.1.

Sequence of events
First, the retailer observes the state of the inventory: $\mathbf{x}_t \equiv (x_{t,1}, \dots, x_{t,m})$ denotes the discrete numbers of units in stock at the start of period $t$, where $x_{t,i}$ denotes those of age class $i$, in other words, those ordered in period $t - i$. The total on-hand inventory is denoted $x_t \equiv \sum_{i=1}^{m} x_{t,i}$. Second, the retailer places an order $q_t$. Third, a random demand $D_t$ occurs with probability function $f_{D_t}(\cdot)$. According to the FIFO inventory depletion policy, units with the highest age class are purchased first. If $D_t$ exceeds $x_t$, then excess demand is lost and a per-unit shortage penalty cost is incurred. Otherwise, each unsold item incurs an inventory holding cost. Fourth, any unsold units perish independently of one another and the retailer disposes of perished units at per-unit outdating costs. Fifth, the ordered $q_t$ units arrive. This corresponds to an effective lead time of one period, which is common in retailing.

Inventory transition
We write $\mathbf{Y}_t \equiv (Y_{t,1}, \dots, Y_{t,m})$ for the inventory vector between steps 3 and 4 in the sequence of events above, in other words, the unsold units of each age class after demand has been served. The expression
$$Y_{t,i} = \Big[ x_{t,i} - \Big[ D_t - \sum_{j=i+1}^{m} x_{t,j} \Big]^+ \Big]^+$$
follows directly from the FIFO depletion policy, where $[z]^+ \equiv \max(z, 0)$. Throughout we use capital letters for random variables and small letters for their realizations. Therefore, we write
$$y_{t,i} = \Big[ x_{t,i} - \Big[ d_t - \sum_{j=i+1}^{m} x_{t,j} \Big]^+ \Big]^+$$
for the realization of $Y_{t,i}$ where demand $d_t$ occurs. Analogously to the known inventory $\mathbf{x}_t$, we denote the vector of random inventories of different age classes at the start of period $t$ as $\mathbf{X}_t \equiv (X_{t,1}, \dots, X_{t,m})$ and their sum as $X_t$.
Newly ordered items have no yield loss, so $x_{t+1,1} = q_t$. Units of other age classes transfer according to a binomial distribution, that is, $X_{t+1,i+1} = \sum_{j=1}^{Y_{t,i}} \Phi_{i,j}$ for $1 \le i \le m - 1$, where the $\Phi_{i,j}$ are independent and identically distributed Bernoulli random variables with parameter $(1 - \psi_i)$. The parameter $\psi_i$ denotes the perish probability for an item of age class $i$. Because the maximum lifetime is $m$, $\psi_m = 1$.
For $m = 3$, Table 1 shows a sample trajectory that illustrates the dynamics of the process. We start with an inventory $\mathbf{x}_1$ that consists of four, three, and five items of age classes 1, 2, and 3, respectively. In $t = 1$, we order $q_1 = 7$ items and a demand of $d_1 = 4$ occurs. Following the FIFO depletion policy, four items of age class 3 are sold. Next, items in $\mathbf{y}_1$ perish independently with probability depending on the age class. In the example, two of the four age 1 items perish and none of the three age 2 items perish. Since the maximum lifetime is 3, the one unsold age 3 item perishes with certainty. The remaining items of ages 1 and 2 then form age classes 2 and 3 of $\mathbf{x}_2$, respectively, with the $q_1 = 7$ items forming age class 1. In $t = 2$, we order $q_2 = 9$ items and observe a demand of $d_2 = 6$. The demand depletes age classes 2 and 3 completely and uses one item of age class 1. Afterwards, one age 1 item perishes, and $\mathbf{x}_3$ is formed both by the remaining inventories shifting by one age class and the $q_2 = 9$ items arriving. In $t = 3$, we order $q_3 = 4$ items and the demand of $d_3 = 15$ items consumes the entire inventory, with one unit of demand left unsatisfied.
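The transition just described can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the numerical study: the function names `fifo_deplete` and `next_inventory` are hypothetical, and the perish probabilities in the usage line are made up for illustration (they are not taken from Table 1).

```python
import random

def fifo_deplete(x, d):
    """Sell d units from inventory x, oldest age class first.

    x[i] is the stock of age class i + 1, so x[-1] is the oldest.
    Returns (unsold stock per age class, lost demand).
    """
    y = list(x)
    remaining = d
    for i in reversed(range(len(y))):
        sold = min(y[i], remaining)
        y[i] -= sold
        remaining -= sold
    return y, remaining

def next_inventory(x, q, d, psi, rng):
    """One period: FIFO sales, independent perishing, then arrival of order q.

    psi[i] is the perish probability of age class i + 1 (psi[-1] should be 1).
    Returns (next inventory vector, lost demand, number of outdated units).
    """
    y, lost = fifo_deplete(x, d)
    # Each unsold unit survives independently with probability 1 - psi[i].
    survivors = [sum(rng.random() >= psi[i] for _ in range(y[i]))
                 for i in range(len(y))]
    # Survivors age by one class; units leaving the oldest class are discarded.
    x_next = [q] + survivors[:-1]
    outdated = sum(y) - sum(x_next[1:])
    return x_next, lost, outdated

# First period of the sample trajectory: x_1 = (4, 3, 5), q_1 = 7, d_1 = 4.
# The perish probabilities are illustrative assumptions.
rng = random.Random(0)
print(next_inventory([4, 3, 5], q=7, d=4, psi=[0.3, 0.5, 1.0], rng=rng))
```

Because perishing is sampled unit by unit, the number of survivors in each age class is binomial, which matches the sum of Bernoulli variables in the transition above.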

Total cost
The cost components of our model are variable ordering, holding, penalty, and outdating costs. Ordering costs $c q_t$ are linear in the order size. We mark variables with a hat that will change based on a simplifying cost transformation (Proposition 1) presented later in this section. Holding costs are $\hat{h}$ per unsold unit, penalty costs are $\hat{p}$ per unit of lost demand, and outdating costs are $\hat{\theta}$ per perished unit. The total expected discounted cost $\hat{G}(\mathbf{q})$ is given by (2), where $\alpha$ is a discount factor and $\mathbf{q}$ is the vector of orders from $t = 1$ to $T$. Note that (2) assumes that excess inventory can be salvaged at unit salvage value $c$ at the end of the time horizon. The salvage value of older products would usually be less than 100% of the order cost $c$; however, the salvage value only affects the final period $T$, which is in the distant future, so the effect on the overall model is minimal. We conducted a sensitivity analysis to demonstrate this. For the 135 problem instances in the numerical study described in Section 5.1, if the 100% salvage value for all items is replaced by 50%, then the optimal costs rise on average by only 0.03%. If the 100% is replaced by 0%, the corresponding rise is 0.05%. The benefit of assuming a 100% salvage value is that it allows the cost structure to be simplified, as will be seen later in this section in Proposition 1.
Our objective is to find the optimal cost $\min_{\mathbf{q}} \hat{G}(\mathbf{q})$ and the associated optimal policy. To do this, we formulate our model as an MDP and write its Bellman equation in terms of the set of all possible combinations of inventories after perishing, given demand realization $d_t$. Considering the randomness of both the outdating process and the customer demand, we obtain the recursion (3), which involves the probability function of the binomial distribution with parameters $n$ and $\psi$; we assume that the demand $D_t$ is bounded above by $D_t^{\max}$. Equation (3) allows us to find the optimal solution using dynamic programming. The curse of dimensionality renders this method computationally intractable for large problem instances. For example, with $m = 3$, $D^{\max} = 50$, and $T = 200$, the optimal solution takes around 12 hours to calculate, which is not practically useful. However, we will utilize the MDP formulation to analyze the problem's structure and to subsequently obtain heuristic policies. We will also calculate optimal solutions for small problem instances and will use them as a benchmark for our heuristics.
For any problem instance, we may use the MDP structure to form a lower bound on the optimal cost using the following observation. If there is no maximum lifetime and the perish probabilities for each age are all the same, then the age of an item is insignificant, so the state of the MDP is simply the total inventory $x_t$. This leads to a drastic reduction of the state space, and an MDP which is quick to solve. While this is an unrealistic model, if we set $\underline{\psi} \equiv \min_{i=1,\dots,m} \psi_i$ as the perish probability for all ages (in other words, the number of perished items follows a binomial distribution with parameter $\underline{\psi}$), we obtain a problem whose optimal cost is an easily calculable lower bound for the problem with maximum lifetime $m$ and perish probabilities $\psi_1, \dots, \psi_m$. We will investigate the lower bound numerically in Section 5.4.
Next, similarly to Chao et al. (2015), we modify $\hat{h}$, $\hat{p}$, and $\hat{\theta}$ to absorb part of the per-unit order cost $c$, so the only order cost remaining is paid at the terminal time and is independent of the order policy. This simplifies the structure of the model and the convexity proof of Proposition 2 because the cost incurred in period $t$ is independent of the order $q_t$.
We formulate this simplification in Proposition 1. The proof follows a similar structure to that of Chao et al. (2015) and can be found in Supporting Information Section EC.2.
Proposition 1. We define $h$ as $\hat{h} + (\alpha^{-1} - 1)c$, $p$ as $\hat{p} - \alpha^{-1} c$, and $\theta$ as $\hat{\theta} + c$. Then, the MDP with these transformed cost parameters and the corresponding terminal cost function is equivalent to the one formulated in (3).
Convexity holds at the penultimate period. The proof of the following proposition can be found in Supporting Information Section EC.3.

Proposition 2. For any inventory x
Before the penultimate period, convexity of $J_t$ in $q_t$ is not guaranteed. In the following, we show a counterexample where $J_{T-2}$ is not convex.
Let $T = 3$, the maximum lifetime $m = 2$, and the discount factor $\alpha = 1$. Write $\psi \equiv \psi_1$ for the perish probability of age one items. The inventory state at time $t$ is represented by the 2-tuple $\mathbf{x}_t \equiv (x_{t,1}, x_{t,2})$. Suppose $D_t = 3$ for $t = 1, 2, 3$, which has the following implications:
• The retailer never optimally orders more than three items, since any excess over three items will not sell at age one and hence incurs an unnecessary holding cost. Therefore, we may restrict the retailer to orders of at most three, which, assuming the starting age $i$ stock level $x_{1,i} \le 3$ for $i = 1, 2$, means $x_{t,i} \in \{0, 1, 2, 3\}$ for $t = 1, 2, 3$, restricting the state space.
• Under this restriction of the state space, at any time period, all the age two items will sell (so none perish). Therefore, any unsold items must be age one (which perish with probability $\psi$), so the state can be delineated by the total inventory $x_t \equiv x_{t,1} + x_{t,2}$, leading to a state space of $\{0, \dots, 6\}$.

HEURISTIC POLICIES
In this section, we develop and analyze heuristic policies. First, we discuss two policies which are often used in practice but are designed for the simple cases where either all or no product perishes. In Section 4.1, we introduce a newsvendor policy optimal for the case where all product perishes, and in Section 4.2 a periodic review policy which assumes no product perishes. Next, in Section 4.3, we develop our own heuristic, which is a novel combination of the previous two. In Section 4.4, we analyze the ordering behavior of all three heuristics immediately evident from their formulae. Finally, in Section 4.5, we develop policies looking several periods ahead, with a particular focus on the one-stage lookahead policy due to the convexity of our model at time $T - 1$.

Newsvendor heuristic
First, suppose all items perish after one period; this is a newsvendor model. Since the inventory is emptied at the end of each period, the optimal order at period $t$ ($q_t^{NV*}$) is the optimal base-stock level at the start of time $t + 1$, which is known to be
$$q_t^{NV*} = F^{-1}_{D_{t+1}}\left(\frac{p}{p + h + \theta}\right), \qquad (10)$$
where $F_D^{-1}$ represents the generalized inverse cumulative distribution function of the random variable $D$ (Axsäter, 2015). We call (10) the newsvendor heuristic (NV). Note that the quantile in (10) balances the underage cost, $p$, and the overage cost, $h + \theta$.
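As a minimal sketch of the newsvendor heuristic, the code below evaluates the generalized inverse distribution function at the critical ratio $p / (p + h + \theta)$ implied by the underage and overage costs named above. The helper names (`poisson_pmf`, `quantile`, `newsvendor_order`) and the truncation of the demand support are assumptions made for illustration only.

```python
import math

def poisson_pmf(mean, kmax):
    """P(D = k) for k = 0..kmax via the recurrence p_k = p_{k-1} * mean / k."""
    pmf = [math.exp(-mean)]
    for k in range(1, kmax + 1):
        pmf.append(pmf[-1] * mean / k)
    return pmf

def quantile(pmf, level):
    """Generalized inverse CDF: smallest k with P(D <= k) >= level."""
    cdf = 0.0
    for k, prob in enumerate(pmf):
        cdf += prob
        if cdf >= level:
            return k
    return len(pmf) - 1  # level not reached within the truncated support

def newsvendor_order(pmf_next, p, h, theta):
    """NV order: quantile of next-period demand at p / (p + h + theta)."""
    return quantile(pmf_next, p / (p + h + theta))

# Illustrative values: Poisson demand with mean 20, p = 5, h = 1, theta = 5.
print(newsvendor_order(poisson_pmf(20, 200), p=5, h=1, theta=5))
```

Raising $\theta$ lowers the critical ratio and hence the order, which is the sense in which the quantile balances the two costs.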

Periodic review heuristic
Now suppose items do not perish at all; this is a standard periodic review model. Once an item is ordered, it remains in the inventory until it is sold, so its age is immaterial. Therefore, the state is simply the total inventory, denoted $x_t$ at time $t$.
In the following, we present a simple heuristic arising from this model. If items never perish, the overage cost per unit time is $h$. Since the underage cost is still $p$, by (10), the optimal base-stock level at the start of time $t + 1$ is $F^{-1}_{D_{t+1}}\big(p/(p + h)\big)$. If the retailer knew at time $t$ the on-hand stock at time $t + 1$, namely $x_{t+1} = \max(x_t - d_t, 0)$, then the optimal order at time $t$ ($q_t^{PRV*}$) would be
$$q_t^{PRV*} = \Big[ F^{-1}_{D_{t+1}}\Big(\frac{p}{p + h}\Big) - x_{t+1} \Big]^+; \qquad (11)$$
in words, order to bring $x_{t+1}$ up to $F^{-1}_{D_{t+1}}\big(p/(p + h)\big)$, or order 0 if $x_{t+1} \ge F^{-1}_{D_{t+1}}\big(p/(p + h)\big)$. Yet, the retailer does not know $d_t$ at time $t$, so cannot compute $q_t^{PRV*}$ when making the time $t$ order. A sensible heuristic estimates $x_{t+1}$ using its expectation over the demand at time $t$, then rounds to the nearest integer, ordering $q_t^{PRV}$, which is defined as
$$q_t^{PRV} = \Big\lfloor \Big[ F^{-1}_{D_{t+1}}\Big(\frac{p}{p + h}\Big) - \mathbb{E}\big[(x_t - D_t)^+\big] \Big]^+ \Big\rceil, \qquad (12)$$
where $\lfloor \cdot \rceil$ denotes rounding to the nearest integer. We call (12) the periodic review heuristic (PRV).
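The PRV order can be sketched as follows, assuming the base-stock level is the quantile at $p / (p + h)$ and the next-period stock is estimated by the expected leftover $\mathbb{E}[\max(x_t - D_t, 0)]$, as described above. The function names are hypothetical, and Python's built-in `round` (which rounds exact halves to the nearest even integer) stands in for rounding to the nearest integer.

```python
def quantile(pmf, level):
    """Generalized inverse CDF: smallest k with P(D <= k) >= level."""
    cdf = 0.0
    for k, prob in enumerate(pmf):
        cdf += prob
        if cdf >= level:
            return k
    return len(pmf) - 1

def expected_leftover(x_total, pmf):
    """E[max(x_total - D, 0)] for demand D with probabilities pmf[d]."""
    return sum(prob * max(x_total - d, 0) for d, prob in enumerate(pmf))

def prv_order(x_total, pmf_now, pmf_next, p, h):
    """PRV order: base-stock level minus expected leftover, rounded, floored at 0."""
    base_stock = quantile(pmf_next, p / (p + h))
    return max(round(base_stock - expected_leftover(x_total, pmf_now)), 0)

# Illustrative demand: uniform on {0, ..., 4} in both periods, p = 3, h = 1.
uniform = [0.2] * 5
for x_total in (0, 2, 10):
    print(x_total, prv_order(x_total, uniform, uniform, p=3, h=1))
```

Only the total on-hand stock enters the calculation, reflecting that age is immaterial when nothing perishes.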

A combination of NV and PRV
Each unsold age $i$ item carries over to period $t + 1$ with probability $1 - \psi_i$, so the expected on-hand stock at time $t + 1$ is given by
$$I_{t+1} = \sum_{i=1}^{m-1} (1 - \psi_i)\, \mathbb{E}[Y_{t,i}]. \qquad (14)$$
2. Order if on-hand stock at t + 1 was known.
In both NV and PRV, the optimal order if the on-hand stock $x_{t+1}$ was known involves subtracting $x_{t+1}$ from the optimal base-stock level at the start of time $t + 1$, the latter of which is calculated by balancing the underage cost, $p$, and the overage cost per unit time, $h$ in PRV and $h + \theta$ in NV. In our model, the underage cost is again $p$, but the overage cost per unit time now depends on the age of the unsold item; at age $i$, an unsold item incurs a holding cost $h$ and an additional outdating cost $\theta$ with probability $\psi_i$. A simple adjustment would calculate the average overage cost per unit time up to the maximum lifetime $m$; in other words, $h + \bar{\psi}\theta$, where $\bar{\psi}$ is the mean of the $m$ perish probabilities. However, an item unsold at age 1 may not survive the full $m$ ages; it is likely to be bought or perish before then. In most practical cases where $\psi_1 \le \dots \le \psi_m = 1$, such an average would overestimate the true overage cost.
To adjust, we discount the average using an estimate of the probability that an item survives to the next period, described in the following.
For an age $i$ item unsold in period $t$ to remain unsold after period $t + 1$, two things must happen. First, the item must not perish at time $t$, which happens with probability $(1 - \psi_i)$. Second, the product must not sell at time $t + 1$. If $a_{t+1}$ is the probability of the latter, the discounted average cost of an item unsold at time $t$ is given by (15). Note that (15) is consistent with NV and PRV. In NV, $m = 1$ and $\psi_1 = 1$, so (15) is simply $h + \theta$. In PRV, $\psi_i = 0$ for all $i$, so (15) is equal to $h$.
Of course, $a_t$, the probability of an item not selling in period $t$, depends on the age of the item and the current stock levels. For our heuristic, we simply wish to choose a fixed value for $a_t$ that performs well. To do this, note that $a_t$ is also affected by the distribution of demand in period $t$, $D_t$. For an age $i$ item to remain unsold, there must be (assuming the retailer orders sensibly) a lower-than-expected realization of demand (lower than the total on-hand stock aged $i$ and older). Therefore, items of any age are more likely not to sell the more variable $D_t$ is. Write $\sigma_D$ and $\mu_D$ for the standard deviation and mean of a random variable $D$, respectively. We found that $\hat{a}_t$, a version of the coefficient of variation of $D_t$ restricted to lie between 0 and 1 and defined in (16), works well as an estimate for $a_t$.

The ADPRV heuristic
Putting steps 1 and 2 together, our proposed heuristic, which we call the age-dependent PRV (ADPRV), orders $q_t^{ADPRV}$ at time $t$, which is defined in (17), where $I_{t+1}$ is given by (14) and $g_t$ is given by (18), for $\hat{a}_t$ defined in (16).
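The expected on-hand stock used in step 1 can be sketched as below. The sketch assumes that it equals the survival probability $1 - \psi_i$ times the expected number of unsold age $i$ units under FIFO depletion, summed over age classes; the function names are hypothetical, and the remaining ingredients of ADPRV ($g_t$ and $\hat{a}_t$) are not reproduced here.

```python
def unsold_by_age(x, d):
    """Unsold units per age class after demand d under FIFO depletion.

    x[i] is the stock of age class i + 1; older classes are served first, so
    class i + 1 only faces the demand left over by the classes behind it.
    """
    y = []
    for i in range(len(x)):
        older = sum(x[i + 1:])
        y.append(max(x[i] - max(d - older, 0), 0))
    return y

def expected_carryover(x, psi, pmf):
    """Expected stock surviving into the next period, excluding the new order.

    psi[i] is the perish probability of age class i + 1 and pmf[d] = P(D_t = d).
    """
    total = 0.0
    for d, prob in enumerate(pmf):
        y = unsold_by_age(x, d)
        total += prob * sum((1 - psi[i]) * y[i] for i in range(len(x)))
    return total

# Illustrative check with deterministic demand of 4 and made-up probabilities:
# unsold stock is (4, 3, 1), so the carryover is 0.5 * 4 + 1.0 * 3 + 0 * 1 = 5.
print(expected_carryover([4, 3, 5], psi=[0.5, 0.0, 1.0], pmf=[0, 0, 0, 0, 1.0]))
```

Setting every perish probability to zero recovers the expected leftover used by PRV, so this estimate can never exceed PRV's.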

Comparison of the order quantities of NV, PRV, and ADPRV
In this subsection, we compare the three heuristics just introduced to each other and examine whether they underorder or overorder compared to optimal.
No on-hand stock at t
First, we examine the simple case of the order at time $t$ when the inventory is empty. Examination of their respective definitions in (10), (12), and (17) with $x_t = 0$ shows that the orders of NV, ADPRV, and PRV satisfy
$$q_t^{NV} \le q_t^{ADPRV} \le q_t^{PRV}. \qquad (19)$$
The inequalities follow from the fact that $F_D^{-1}$ is increasing for any random variable $D$, and $g_t$ in (18) is between 0 and 1. Since PRV ignores outdating, it underestimates the expected lifetime cost of a new item, so overorders; on the other hand, NV optimally solves the problem where all items outdate, so underorders. In other words, like the order of ADPRV, the optimal order lies between the orders of NV and PRV. This behavior can be seen in Figure 1, which shows the three heuristics' orders and the optimal order as the total on-hand stock varies for two closely related problems with $m = 2$.

Some on-hand stock at t
Now we consider the general case where $x_t \ge 0$. First, consider NV. As (10) shows, NV orders the same amount no matter the value of $x_t$ since all unsold product is assumed to perish. Further, NV orders less than optimal when $x_t = 0$ and the optimal order must drop to 0 for large enough $x_t$. It follows that NV underorders when $x_t$ is small and overorders when $x_t$ is large enough, a pattern evident in Figure 1. When $x_t \ge 0$, PRV simply subtracts the expected number of unsold items from its $x_t = 0$ order. Therefore, as $x_t$ increases, PRV's order monotonically decreases to 0. It follows that there exists $y$ such that PRV orders more than NV if and only if $x_t \le y$; see Figure 1. Ignoring outdating of current stock means PRV overestimates future stock levels, which encourages underordering. The significance grows with $x_t$, so, after overordering when $x_t = 0$, PRV leans more and more towards underordering as $x_t$ increases. Figure 1 shows two cases. In Figure 1a, the optimal order hits 0 before PRV can start to underorder, while in Figure 1b PRV does underorder for large $x_t$ until it hits 0.
Unlike PRV, (17) shows that ADPRV's order at time $t$ depends not only on the total inventory $x_t$ but also on its age distribution. Yet ADPRV still exhibits decreasing behavior in the sense that, for any $i \in \{1, \dots, m\}$, ADPRV's order will decrease if the age $i$ stock increases and all other ages of stock stay the same. To help choose their time $t$ order, both ADPRV and PRV estimate the expected on-hand stock at time $t + 1$. Yet, since it does account for unsold items perishing, ADPRV's estimate is smaller. Therefore, if $x_t$ grows by 1, no matter the age of the extra item, ADPRV's order will decrease by no more than PRV's. In other words, as $x_t$ rises, ADPRV's order decreases less severely than PRV's, as Figure 1 shows.
Recall from (19) that ADPRV always orders less than PRV when $x_t = 0$. In some problems (e.g., Figure 1b but not Figure 1a), PRV catches ADPRV due to its steeper gradient, meaning there exist $x_t$ where ADPRV's order is larger than PRV's. Such problems usually have large perish probabilities, causing a big difference between the estimated expected on-hand stock at time $t + 1$ of ADPRV and PRV and hence a big difference in gradients, or they have $\theta$ small compared to $p$, meaning that the orders of ADPRV and PRV are close when $x_t = 0$. While both problems in Figure 1 have the same perish probability, $p/\theta$ is small in Figure 1a but large in Figure 1b.
Further, Figure 1 shows ADPRV's order is close to optimal for all inventory levels, with a tendency to underorder more than overorder. Reasons behind this behavior will be discussed in Section 5 when the performance of ADPRV is analyzed.

Multi-stage lookahead policies
In this subsection, we design heuristics which consider costs for a given number of future periods, but no further. Recall that $J_t(\mathbf{x}_t, q_t)$ is the expected cost from period $t$ until the terminal time $T$ if at period $t$ the state and order are $\mathbf{x}_t$ and $q_t$, and from period $t + 1$ onwards the optimal order is made. The exact n-lookahead heuristic (En) orders
$$\arg\min_{q_t} J_{\max(t,\, T-n)}(\mathbf{x}_t, q_t) \qquad (20)$$
at period $t$. In other words, if there are fewer than $n$ periods to go, order to minimize the cost over all remaining periods (in which case the optimal order is made). Otherwise, if there are more than $n$ periods to go, order to minimize the cost over the next $n$ periods, assuming that all leftover inventory after these $n$ periods is salvaged, so that the terminal cost summarizes the value of the remaining inventory.
Clearly, the larger we choose n, the closer En will be to the optimal policy, but the greater the computational effort to calculate En.When the optimal policy requires too much computational effort to calculate, the largest n such that En is calculable can provide a good estimate of the optimal policy and hence a good benchmark for other, simpler heuristics.
The computation involved in calculating En can be reduced by assuming that all items of age class i perish together with the age-i perish probability, and otherwise none perish; in other words, by solving our model with n periods but with dependent perishing as in Ketzenberg et al. (2015) replacing our usual independent perishing assumption. Since it is a heuristic for our setting with independent perishing, we call the solution to the model with this assumption the exact n-lookahead heuristic with dependent perishing (En Dep).
Another way to reduce the computational effort to calculate E1 specifically is to use Proposition 2, which shows that the remaining cost at period T − 1 is convex in the order at T − 1. Since E1, aside from the last period, always assumes we are at period T − 1, convex optimization techniques can reduce its computational time.
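As an illustration of how convexity can cut the search effort, the sketch below finds the minimizer of a discretely convex function by bisection on its forward difference, needing on the order of log(hi − lo) evaluations rather than a full scan. This is one generic technique, not necessarily the one used in the numerical study; `argmin_convex` and `expected_cost` are hypothetical names, and the example cost is a plain newsvendor-type cost with uniform demand rather than the period T − 1 cost of the model.

```python
def argmin_convex(f, lo, hi):
    """Smallest minimizer of a discretely convex f on the integers lo..hi."""
    while lo < hi:
        mid = (lo + hi) // 2
        if f(mid + 1) - f(mid) >= 0:
            hi = mid        # slope already non-negative: minimizer is at or left of mid
        else:
            lo = mid + 1    # slope still negative: minimizer is strictly right of mid
    return lo


def expected_cost(q, h=1.0, p=10.0):
    """Illustrative convex cost: holding h and penalty p, demand uniform on 0..40."""
    return sum(h * max(q - d, 0) + p * max(d - q, 0) for d in range(41)) / 41


if __name__ == "__main__":
    best = argmin_convex(expected_cost, 0, 60)
    brute = min(range(61), key=expected_cost)
    print(best, brute)
```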

Experimental design
In this section, we choose parameters to reflect a wide variety of scenarios. A summary of the parameters is given in Table 2. We examine maximum lifetimes m = 2, 3, and 4. We expect the marginal effect of increasing the lifetime to diminish because inventory is depleted according to a FIFO policy, so items in a high age class will rarely be present. To investigate the effect of customer demand, we use three demand distributions with the same mean: a Poisson distribution with mean 20, which is close to symmetrical; a right-skewed negative binomial distribution with parameters 5 and 0.8; and a discrete uniform distribution on the integers 0, 1, …, 40. These distributions have variances of 20, 100, and 140, respectively (the uniform variance is 400/3 under a continuous approximation). Note that the first two distributions are more common in retailing, with the latter included for illustrative purposes.
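The means and variances of the three demand distributions can be checked numerically. The sketch below is a minimal standard-library check; it assumes the negative binomial with parameters 5 and 0.8 is parameterized so that its mean is 5 × 0.8 / (1 − 0.8) = 20, and the function names are hypothetical. For the uniform case, the discrete distribution on the integers 0 to 40 has variance 140, whereas 400/3 is the value for a continuous uniform on [0, 40].

```python
import math


def moments(pmf):
    """Mean and variance of a distribution given as {value: probability}."""
    mean = sum(k * p for k, p in pmf.items())
    var = sum((k - mean) ** 2 * p for k, p in pmf.items())
    return mean, var


def poisson_pmf(lam, kmax=200):
    # Truncated at kmax; the neglected tail mass is negligible for lam = 20.
    return {k: math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
            for k in range(kmax + 1)}


def negbin_pmf(r, p, kmax=2000):
    # Assumed parameterization: P(K = k) = C(k + r - 1, k) * p**k * (1 - p)**r,
    # which has mean r * p / (1 - p) and variance r * p / (1 - p)**2.
    def log_c(k):
        return math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return {k: math.exp(log_c(k) + k * math.log(p) + r * math.log(1 - p))
            for k in range(kmax + 1)}


def discrete_uniform_pmf(lo, hi):
    n = hi - lo + 1
    return {k: 1.0 / n for k in range(lo, hi + 1)}


if __name__ == "__main__":
    for name, pmf in [("Poisson", poisson_pmf(20.0)),
                      ("negative binomial", negbin_pmf(5, 0.8)),
                      ("discrete uniform", discrete_uniform_pmf(0, 40))]:
        mean, var = moments(pmf)
        print(f"{name}: mean={mean:.4f}, variance={var:.4f}")
```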
Inventory holding costs ĥ are normalized to 1, and order cost c is set to 5. The cost of a shortage must be at least the selling price, which must be at least the order cost c = 5. In addition, there may be a loss of future sales due to loss of goodwill. Therefore, we examine p = 10, 15, 20. The cost of outdating could be very small if the retailer is large and has efficient waste disposal. For smaller retailers, it could be a lot higher, so we examine θ = 0, 5, 10.
We choose a high discount factor of 0.999 to ensure costs over the entire time horizon of T = 200 are reflected in the results. This means that the transformed parameter values from Proposition 1 are roughly h ≈ 1; p ≈ 5, 10, 15; θ ≈ 5, 10, 15 (the transformed holding, penalty, and outdating costs, respectively).
As illustrated in Table 3, we consider five distinct perish probability sets for each maximum lifetime. In each, the perish probabilities increase with age to reflect older products becoming more and more likely to perish. In total, we consider 135 settings.

Maximum lifetime m = 2
For a maximum lifetime of m = 2, we can solve the model with T = 200 optimally with dynamic programming and hence calculate the percentage increase from the optimal expected cost to the expected cost of each heuristic. Since the order costs incurred at the terminal time do not depend on the ordering policy, we omit them from the calculations to show the effect of each ordering policy separately. Results are shown in Table 4 with the perish probabilities for m = 2 from Table 3 and parameters from Table 2. Further, a box plot showing the variability of the performance of each heuristic (except E3, which is very close to optimal) can be found in Figure 2a, and another showing E1 and ADPRV for different demand distributions can be found in Figure 2b.
NV and PRV, the most basic of our heuristics and frequently applied in practice, can perform poorly: Figure 2 shows that both can be as much as 60% above optimal. Further, Table 4 shows that, even for parameters where one of the two performs better, ADPRV or E1 is at least two times closer to optimal, often much more. This demonstrates that neglecting the effect of stochastic perishing can lead to substantial losses.
In a PRV model no product perishes, so PRV performs better the lower the age-1 perish probability. In addition, no perishing means the main cost to avoid is the penalty cost p; therefore, PRV stocks high to avoid product shortages. When p is high and θ is low, the optimal policy displays the same behavior, so PRV performs better. NV assumes that all product perishes, so it performs better the larger the age-1 perish probability.
In the remainder of this section, we first focus on the differences between independent and dependent perishing and then compare across different parameters our two main heuristics, E1 and ADPRV.

Dependent versus independent perishing
On average, Table 4 shows that E1 Dep performs almost twice as badly as E1, suggesting a benefit in assuming products of the same age perish independently rather than all together as in Ketzenberg et al. (2015). Yet, in fact, the difference in order quantities between E1 and E1 Dep varies wildly from problem to problem. This effect is demonstrated in Figure 3, which shows, for four of our 135 problem settings, the orders of ADPRV, E1, and E1 Dep minus the optimal order quantity for different total on-hand stock levels.
In some problems, Figure 3d as an example, E1 and E1 Dep are virtually identical. Yet in others, for example, Figures 3a and 3b, E1 Dep begins to overorder compared to optimal once the total on-hand stock surpasses the mean of the demand distribution (which is 20), with the size of the overorder growing with the inventory level up to a remarkable 20. In fact, no matter how large the total on-hand stock levels become, E1 Dep still makes a nonzero order when the optimal, ADPRV and E1 orders are 0. We also see E1 Dep underorder compared to optimal by four items for medium on-hand stock levels in Figure 3c. Below, we explain both behaviors.

Production and Operations Management

FIGURE 2 (caption, continued) … Table 4 (except E3) against the optimal policy. Subfigure (b) shows E1 and ADPRV across the three different demand distributions, where "Po," "NB," and "U" stand for the Poisson, negative binomial, and uniform distributions detailed in Table 2.

First, we explain the overordering of E1 Dep seen in Figures 3a and 3b where p/θ is large. Suppose the state is x_t = (S, S) for large S. If S is large enough, all demand in period t will be covered by the age two stock, so S age one items will remain at the end of period t. The amount that will carry over to period t + 1 follows a binomial distribution with S trials and success probability equal to the age-1 survival probability (one minus the age-1 perish probability), and hence has mean S times this survival probability. If the age-1 perish probability is not too large, we can take S large enough so that the optimal order at time t is 0, since so much age one product is likely to carry over. Since E1 and ADPRV assume the above independent perishing model, their order is also 0 for large S.
If perishing were dependent instead, all S age one products would survive to period t + 1 with probability one minus the age-1 perish probability, and none would survive with probability equal to the age-1 perish probability. When p is large compared to θ, E1 Dep, which assumes dependent perishing, desperately wants to avoid having no product at all at time t + 1 and hence a large penalty cost, so, at the expense of risking a smaller outdating cost in the future, it makes a large order at time t in case all S on-hand items perish before t + 1. In reality with independent perishing, for large S the chance of all or most of the S age one items perishing is negligible, so the retailer is not actually at much risk of incurring penalty costs at time t + 1 at all if they order 0. In other words, E1 Dep orders high to safeguard against a scenario which, in real life, is extremely unlikely to happen.

FIGURE 3 Difference between optimal and E1, E1 Dep, and ADPRV orders at period 1 by total period 1 on-hand stock for m = 2, T = 200, h = 1, and c = 5. Note that p and θ are the same for each row.
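The contrast between the two perishing assumptions can be quantified with a short calculation. The sketch below compares the distribution of the number of age one survivors under independent perishing (binomial) and dependent perishing (all or nothing); the function names and the example values S = 50 and perish probability 0.3 are illustrative assumptions, not values taken from the numerical study.

```python
from math import comb


def independent_survivors_pmf(S, perish_prob):
    """Each of S items survives independently: a binomial number of survivors."""
    s = 1.0 - perish_prob
    return {k: comb(S, k) * s ** k * perish_prob ** (S - k) for k in range(S + 1)}


def dependent_survivors_pmf(S, perish_prob):
    """All S items perish together or none do (requires S >= 1)."""
    return {0: perish_prob, S: 1.0 - perish_prob}


def mean(pmf):
    return sum(k * p for k, p in pmf.items())


if __name__ == "__main__":
    S, perish_prob = 50, 0.3
    ind = independent_survivors_pmf(S, perish_prob)
    dep = dependent_survivors_pmf(S, perish_prob)
    print("expected survivors:", mean(ind), mean(dep))
    print("P(no survivors):   ", ind[0], dep[0])
```

Both models agree on the expected carry-over, but the probability of an empty shelf is the perish probability itself under dependent perishing and that probability raised to the power S under independent perishing, which is the scenario E1 Dep guards against.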
The underordering of E1 Dep in Figure 3c occurs because E1 Dep starts to order 0 just after the total stock exceeds 40, while the optimal policy waits until around 65. The reason for this difference is linked both to a small p/θ and, just like the overordering case, to E1 Dep anticipating a situation likely to happen with dependent perishing, but very unlikely with independent perishing. This time, the situation is that none of a large number of on-hand items perishes, so newly ordered stock will not be sold. Any unsold item risks an outdating cost θ; since θ is large compared to p, E1 Dep orders nothing much sooner than the optimal policy as total stock levels rise.
To summarize, while in some settings the retailer will perform just as well assuming a simpler dependent perishing model, in others, particularly when there is a large difference between penalty and outdating costs, this assumption will achieve very poor results, with the retailer at risk of either over- or underordering compared to optimal.

E1 versus ADPRV
While the lookahead heuristics E2 and particularly E3 perform excellently, their computational complexity prevents their practical use. In this section, we compare our two leading practical heuristics, E1 and ADPRV. On average, ADPRV performs better than E1, as can be seen from Table 4 and Figure 2a. ADPRV also has a smaller standard deviation than E1, showing a steadier performance over a wide range of parameters. Yet, there are problem settings where E1 beats ADPRV and hence should be preferred. In this subsection, we shall explain the patterns in performance concerning demand and perish probability that lead to these preferences and examine the ordering bias of both heuristics.

Demand and perish probability
Both E1 and ADPRV perform better the larger the age-1 perish probability, since the problem is easier to solve: the more items that perish, the higher the proportion of the inventory level at the next period that is made up of the retailer's order, a quantity over which the retailer has complete control. When the age-1 perish probability equals 1, they both reduce to NV, which is optimal in this case.
As for demand, E1, which looks ahead just one period, performs well for Poisson, since there is low variance in demand from period to period, so the next period is more representative of general behavior. As the variance of demand increases from the Poisson to the negative binomial to the uniform, the next period becomes less representative and, as shown in Figure 2b, E1's performance worsens. Figure 2b also shows that ADPRV, on the other hand, thrives when demand variance is high. The cause, explained in the following, is an effect also linked to the perish probability. ADPRV calculates its order by simply taking the expectation over both demand and perishing for the next period. The optimal policy is more bespoke, taking into account the whole distribution of demand and perishing, in both the next and future periods. When the variance of demand or perishing (or both) increases, the expectations used by ADPRV will be less accurate, so the cost under ADPRV will rise. However, the cost under the optimal policy will rise by a greater amount, since its more advanced toolkit, which is of great use when demand and perishing are more certain, becomes less and less advantageous over simply using the expectation as ADPRV does. In other words, the optimality gap of ADPRV decreases when the level of uncertainty in either demand or perishing (or both) rises. As Table 5 shows, the optimal policy shows a strong improvement on ADPRV's crude approach only when there is low uncertainty both in demand (Poisson) and perishing (low age-1 perish probability).

E1, like the optimal policy, also uses the whole distribution of demand and perishing but only for the next period, which is sufficient for good performance when uncertainty is low. As a result, Table 5 shows that E1 has an advantage over ADPRV only when both the age-1 perish probability is small and demand is Poisson.

Bias in the order quantities
Figure 3 demonstrates that both E1 and ADPRV have some ordering bias, the former (respectively, latter) tending to consistently order more (respectively, less) than the optimal policy.
E1 chronically overorders since it assumes all product which survives to age 2 is salvaged at its order cost. In reality, if such product does not sell at age 2, it perishes at cost θ. The resulting overordering will be punished more severely the larger the outdating cost θ, explaining why E1 degrades with the outdating cost θ in Table 4.
Figure 3 also shows that the overordering of E1 is more pronounced for the negative binomial and uniform distributions, as the more unpredictable demand is, the more likely a product is to remain unsold for its entire lifetime and hence perish after two periods, a cost which E1 does not consider. Further, it can be seen from Figure 3 that the overordering effect is worse for a smaller age-1 perish probability, since the smaller this probability is, the greater the proportion of total product perishing that occurs at age 2.
Figure 3 shows that ADPRV usually underorders. For example, in Figure 3d, where p/θ is low, ADPRV underorders when there is no stock, in which case its order is the quantity given in (21), which balances underage and overage costs. This effect occurs because, while the underage cost, p, is known, the overage cost is estimated by ADPRV using g_t (defined in (18)), which accounts for the probability of an overordered item perishing at some point in its lifetime. Numerical investigation suggests that g_t tends to be slightly too high, so the overage cost, h + g_t, is overestimated. The smaller p/θ, the more influence g_t has on (21), so the more likely ADPRV is to underorder when there is no stock. In Figures 3a, 3b and 3d, ADPRV's underordering generally grows with the on-hand stock until it becomes large enough for both ADPRV and the optimal policy to order 0. We see this behavior for the following reason. No matter the current inventory, ADPRV orders at time t with the aim for the on-hand stock to be (21) at time t + 1. However, the larger x_t, the larger the ratio of old to new stock in the time t + 1 inventory. Older stock has less value than newly ordered stock due to a higher chance of perishing sooner; therefore, the larger x_t, the more likely (21) is to be below the optimal time t + 1 on-hand stock level.
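The effect of an overestimated overage cost can be illustrated with the classical newsvendor critical fractile. The sketch below assumes, without confirmation from the text here, that the quantity in (21) has the critical-fractile form with underage cost p and overage cost h + g_t; the function names, the Poisson demand with mean 20, and the example values of g are illustrative assumptions only.

```python
import math


def poisson_cdf(k, lam):
    """P(D <= k) for Poisson demand with mean lam."""
    return sum(math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))
               for j in range(k + 1))


def critical_fractile_order(p, h, g, lam=20.0):
    """Smallest q with P(D <= q) >= p / (p + h + g): underage p, overage h + g."""
    ratio = p / (p + h + g)
    q = 0
    while poisson_cdf(q, lam) < ratio:
        q += 1
    return q


if __name__ == "__main__":
    for g in (0.0, 2.0, 5.0, 10.0):
        print(f"g={g}: order={critical_fractile_order(p=15.0, h=1.0, g=g)}")
```

Under this assumed form, any upward error in g lowers the target fractile and hence the order, which is consistent with the underordering described above.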
We do not see the above effect in Figure 3c for two reasons. First, the age-1 perish probability is larger, so items are less likely to survive to age 2, meaning the old to new stock ratio at time t + 1 is smaller. Second, ADPRV calculates its time t order using only expected values, ignoring the uncertainty arising from demand and perishing at time t. Due to this uncertainty, the optimally ordering retailer slightly biases their order away from the expected values to avoid the heaviest cost. In Figure 3c, the outdating cost is dominant, so the optimal policy orders a little less than the expected values, which counteracts ADPRV's underordering tendencies.

5.3 Maximum lifetime m = 3, 4
Due to the large number of states, problems with maximum lifetimes greater than 2 cannot be optimally solved by dynamic programming, so neither optimal nor heuristic expected costs can be calculated. Therefore, for m = 3, 4, we use simulation to estimate costs under each heuristic.
For each parameter setting, we simulate 10,000 runs with T = 200, with run length chosen by comparing the average simulated heuristic cost to the expected heuristic cost when m = 2. The furthest computationally tractable lookahead heuristic is E3 for m = 3 and E2 for m = 4, both shown to be close to optimal when m = 2. Yet, the average percentage decrease in costs from E2 to E3 for m = 3 is only 0.01%. Therefore, to make the m = 3 and m = 4 results comparable, we benchmark both using E2.
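A single simulated period under FIFO depletion and independent perishing can be sketched as follows. This is a minimal illustration under stated assumptions rather than the simulation code of the study: stock is held as a list by age (youngest first), an order placed in one period arrives as age 1 stock in the next, unsold items of the oldest age class perish with probability 1, and the function names and the fixed-order policy in the usage loop are hypothetical.

```python
import random


def fifo_unsold(x, demand):
    """Serve demand oldest-first; return (unsold stock by age, lost sales)."""
    unsold = list(x)
    remaining = demand
    for i in reversed(range(len(unsold))):   # last index holds the oldest items
        sold = min(unsold[i], remaining)
        unsold[i] -= sold
        remaining -= sold
    return unsold, remaining


def simulate_period(x, order, demand, perish_probs, rng):
    """One transition: FIFO sales, independent perishing, ageing, order arrival."""
    assert perish_probs[-1] == 1.0, "assumed: the oldest age class always perishes"
    unsold, lost = fifo_unsold(x, demand)
    # Each unsold item of age i survives independently with prob 1 - perish_probs[i].
    survivors = [sum(rng.random() >= rho for _ in range(u))
                 for u, rho in zip(unsold, perish_probs)]
    perished = sum(unsold) - sum(survivors)
    next_x = [order] + survivors[:-1]        # survivors age by one period
    return next_x, lost, perished


if __name__ == "__main__":
    rng = random.Random(0)
    x, perish_probs = [20, 0], [0.3, 1.0]    # m = 2 with illustrative values
    total_lost = total_perished = 0
    for _ in range(200):
        demand = rng.randint(0, 40)          # discrete uniform demand on 0..40
        x, lost, perished = simulate_period(x, 20, demand, perish_probs, rng)
        total_lost += lost
        total_perished += perished
    print("lost sales:", total_lost, "perished units:", total_perished)
```

Averaging the resulting costs over many such runs gives a cost estimate for whichever ordering rule replaces the fixed order of 20 used here.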
Results are shown in Table 6 for m = 3, with the corresponding perish probabilities from Table 3 and parameters from Table 2. The results for m = 4 are very similar, so are presented in Table EC.5 in Supporting Information Section EC.5. Such similarity implies that most products are either sold or have perished before reaching a high age class, in accordance with the FIFO depletion policy. For both m = 3 and 4, box plots showing similar patterns to those in Figure 2 for m = 2 can be found in Supporting Information Section EC.6.
NV and PRV can again both perform poorly compared to the other heuristics, but, as expected, NV (which assumes m = 1) gets worse as m increases from 2 to 3, and PRV (which assumes m = ∞ and no perishing) gets better. In fact, the performance of all heuristics except NV improves as m increases from 2 to 3, since inventory lasts longer, so the retailer has more opportunity to correct an under- or overorder in a previous period. NV orders independently of the current stock level, so does not share this benefit.
The difference in performance between E1 and E1 Dep increases with m since there are more opportunities for products to perish, a process for which E1 and E1 Dep have different underlying assumptions. E1 is around three times better than E1 Dep when m = 3.
Similarly to m = 2, Table 6 shows that ADPRV does better when demand is more varied and E1 better when demand is less varied when m = 3, but Table 5 shows it is in the combination of Poisson demand and low perish probabilities where E1's whole advantage lies.Based on these findings, we suggest that the retailer assesses both demand variability and perish probabilities.If demand variability and perish probabilities are both low, choose E1, because there are significant benefits to E1's use of the whole distributions of demand and perishing, and one period is fairly representative of the rest so there is not much benefit to looking ahead any further.Otherwise, choose ADPRV, since the more thorough analysis of E1 (which also adds computational complexity) is not as useful, and, since demand is more varied, it is important to consider perishing costs several time periods ahead.

Lower bound on optimal cost
Recall that for m ≥ 3 the optimal cost is computationally intractable with T = 200 and our choice of demand distributions with mean 20. While E2 provides a very tight upper bound, E2's computational complexity grows sharply with m due to the exponential growth of the state space. In our numerical study, calculating E2 for a single problem case for m = 4 typically takes between 24 and 72 hours.
To quickly approximate the optimal cost for problems with larger m, we turn to the lower bound discussed in Section 3, which solves the problem where the perish probability for all ages is set to the smallest of the age-specific perish probabilities (the minimum over i = 1, …, m). The resulting state space is simply the total inventory, so computational complexity is low and unaffected by an increase in m.

In this section, we test the lower bound by calculating the percentage decrease from the benchmark cost to the lower bound for m = 2, 3, and 4. As before, the benchmark is the optimal expected cost for m = 2 and the simulated E2 cost for m = 3 and 4. Table 7 shows the results, both overall and split by demand distribution or age-1 perish probability.
We see a clear improvement from m = 2 to m = 3, 4, a useful trait since m = 3, 4 is where the bound is most needed. Recall that the uniform distribution is included in our numerical study for illustrative purposes and not for any relevance to retailing. In the realistic cases of Poisson and negative binomial, the lower bound is within 0.5% of optimal on average for m = 3 and 4. However, Table 7 shows that the lower bound performs much better in some problem instances than others.
To explain these patterns, note that the perish probabilities in Table 3 used in our numerical study increase with age. Therefore, the perish probability used for all ages by the lower bound, the minimum over all ages, equals the age-1 perish probability, which becomes less accurate as product age increases. When either the age-1 perish probability is large or demand is less variable, product is less likely to survive until these badly estimated older ages, so the lower bound performs better.
Perish probabilities also explain the improvement in the lower bound with m. The most inaccurately estimated perish probability is the mth, where the lower bound model is out by one minus the age-1 perish probability. As m increases, the significance of this large inaccuracy decreases, since by FIFO product becomes less and less likely to remain in the inventory the older it becomes. This explains why the lower bound is tighter for m = 3, 4 than for m = 2, but we would also expect to see the lower bound tighter for m = 4 than m = 3. However, the results are almost identical because so few products make it to age 3 that there is little benefit to the better approximation of the third perish probability when m = 4.

Real data
We compare our leading heuristics E1 and ADPRV both to each other and to the basic heuristics NV and PRV using real data from a large European retail company to better understand their performance in practice. We consider different types of perishable products (fruit and vegetables) that have different maximum lifetimes and perishability patterns. Table 8 contains the perish probabilities of the products, which start from 0.3 or 0.4 for the first day, and increase or stay the same for the days thereafter, reaching 1 by a maximum of 8 days. At the end of the day, the store manager checks which items are still salable for the next day and can be carried over. All perished items are discarded. The managers use their own judgment to place an order for the next day.
We evaluate these products with sales data of 66 stores. We collected daily selling prices and current batch sizes for each date, since the retailer can only order products in given batch sizes. Shortage penalty costs consist of the difference between selling price and ordering costs adjusted with a discount factor of 0.999, the latter chosen to ensure that costs over the entire time horizon are reflected in the results. For confidentiality reasons, we normalize the data so shortage penalty costs equal one. The daily unit inventory holding cost rates are 0.25% of the penalty costs. The outdating cost equals the penalty cost multiplied by 1.12 to reflect the cost of a lost sale in addition to disposal and handling costs of the outdated item. For each product-store combination, we have over 500 days of data.
Since we are only able to collect sales data and lost sales are unobserved, we estimate demand using hourly sales data and product- and store-specific demand patterns based on the nonparametric approach developed by Lau and Lau (1996). We assume that demand is Poisson-distributed and forecast demand with rolling-horizon Poisson regression. As external variables, we consider lagged demand, selling price, and weekdays. Note that we can only consider a time lag of greater than 1 day since the last day's sales are not yet known when the order decision is made.
Results are shown in Table 9. Over all 321 product-store combinations, E1 performs the best, as expected since demand is Poisson, and is hence used to benchmark the other three heuristics. NV performs very badly on average, with PRV a substantial improvement. However, the standard deviation of PRV is higher than that of NV, showing that there are scenarios where PRV can perform just as badly as NV. Indeed, in their worst-case scenarios, NV and PRV have costs 23.8% and 22.8% higher than E1, respectively.
On average, the performance of ADPRV is close to E1 without much variation. In its best-case scenario, ADPRV has costs 2.1% lower than E1, and in its worst case costs 4.6% higher. Therefore, choosing E1 over ADPRV or vice versa does not make much of a difference to the retailer's costs, and, with its reduced computational complexity, ADPRV may well be preferable.

CONCLUSION
We develop a periodic review inventory control model with stochastic demand and perishing, lost sales, and a FIFO depletion policy where the perish probability depends on the age class of the product. The novelty of our model is that we allow products to perish independently of one another. This is an important feature not only because of its practical relevance (one bad product rarely spoils the whole bunch) but also because we find significant differences in ordering patterns compared to the dependent perishing assumption of Ketzenberg et al. (2015). Consequently, assuming dependent perishing can lead to poor performance if products perish independently in reality.
We construct an MDP and show that convexity holds in the penultimate period, but, via a simple counterexample, that convexity can fail any earlier. In addition to an optimal solution algorithm, we also develop two new heuristic policies which find excellent solutions to large problem instances that cannot be solved optimally due to the curse of dimensionality. The first is a combination of the optimal solutions to the well-known newsvendor and periodic review models, and the second is a one-stage lookahead policy.
Using simulated data and data from a large European retail chain, we show that our heuristics outperform existing models, which do not fully consider the characteristics of the perishing process. This research also provides important practical insights. The retail chain initially considered using commercial software solutions that contain standard methods such as NV and PRV. However, our research clearly indicates that they would be better off using the new E1 and ADPRV heuristics presented in this paper. In a thorough comparison of E1 and ADPRV, we explain their performance patterns arising in our numerical study. In particular, we highlight that, for problem parameters where one heuristic shows weaker performance, the other shows a stronger performance. Therefore, our results can advise retailers which heuristic to use depending on the nature of the demand they receive and the perish probabilities of the products they sell.
In our model, we assume a FIFO approach that corresponds to common stocking policies of retailers. It would also be interesting to analyze a LIFO approach or a mixture of approaches to account both for different stocking policies and customers that pick the freshest products from behind. Another important aspect in retailing is the multiechelon structure, where the supply chain consists of suppliers, distribution centers, and/or warehouses from which products are delivered to multiple stores. A multiechelon inventory policy would require making order decisions at the different echelons and considering their interdependencies. For example, if the warehouse does not have enough stock to supply all orders, the policy would require shortage allocation rules, or transshipments between stores could be used to better balance inventories. We leave these investigations to future research.

ACKNOWLEDGMENTS
We thank the department editor Jayashankar Swaminathan, the senior editor, and the anonymous referees for their constructive comments to improve the paper. The authors are grateful for the support of the EPSRC-funded EP/L015692/1 STOR-i Centre for Doctoral Training.

REFERENCES

FIGURE 1 Optimal and heuristic orders in period 1 by total period 1 stock for two problems with m = 2, T = 200, h = 1, age-1 perish probability 0.3, c = 5, discount factor 0.999, and demand following a negative binomial distribution with parameters 5 and 0.8. For optimal and ADPRV, where the order amount depends on the individual ages of the stock, the average over all states with the same total stock is taken. In (a), p = 5 and θ = 15, and PRV > ADPRV for all states; in (b), p = 15 and θ = 5, and ADPRV > PRV for large stock.

FIGURE 2 Box plots showing the variability of heuristic performance for m = 2. Subfigure (a) shows all heuristics in Table

TABLE 1 Example state transition (columns: t, x_t, q_t, d_t, y_t)
and shortage penalty costs are p[D_t − x_t]^+. The number of outdated units is E_t ≡ […], where […] denotes a binomially distributed random variable with parameters n and […]. Outdating costs are θE_t. We assume that c, p, ĥ, and θ are nonnegative. The one-period cost function, given order q_t and inventory x_t, is
\[ \hat{G}_t(x_t, q_t) \equiv c q_t + \hat{h}[x_t - D_t]^+ + p[D_t - x_t]^+ + \theta E_t, \tag{1} \]
and the costs over the whole planning horizon T are Ĝ […]

In NV, no items carry over from period t to t + 1, while in PRV, all items unsold in period t carry over. In our model, each unsold item carries over with an age-dependent probability. Therefore, to calculate the expected on-hand stock at t + 1, we need to calculate the number of items of each age class which are unsold in period t. As previously, write x_t ≡ (x_{t,1}, …, x_{t,m}) for the state at time t, where x_{t,i} is the age i on-hand stock at time t. By FIFO, age i items are only sold once all older items have sold out. The number of unsold items of age i at time t can therefore be calculated as […]

While the heuristics NV in (10) and PRV in (12) are simple to calculate, they both ignore the age-dependent perishability of items, a key feature of our model. NV and PRV stand at opposite ends of the perishability spectrum; in this subsection, we propose a heuristic which is a novel combination of the two.

1. Expected on-hand stock at t + 1

TABLE 4 Percentage increase from optimal to heuristic expected cost for m = 2 across different parameters
Abbreviations: ADPRV, age-dependent PRV; E1, E2, E3, exact lookahead heuristics; E1 Dep, exact lookahead heuristic with dependent perishing; NB, negative binomial distributions; NV, newsvendor heuristic; Pois, Poisson distributions; PRV, periodic review heuristic; U, uniform distributions.
TABLE 6 Percentage increase from the benchmark (E2) to heuristic cost over 10,000 runs for m = 3 across different parameters

TABLE 7 Percentage decrease from benchmark to lower bound for m = 2, 3, 4 across different demand distributions and age 1 perish probabilities

TABLE 8 Perish probabilities (perish probability at the end of each day)

TABLE 9 Average and standard deviation of NV, PRV, and ADPRV expressed as percentages over E1 for 321 product-store combinations
Abbreviations: ADPRV, age-dependent PRV; E1, exact lookahead heuristic order; NV, newsvendor heuristic; PRV, periodic review heuristic.