• agent-based simulation
• reinforcement learning
• pricing
• supply chain


This study analyses simultaneous ordering and pricing decisions for retailers operating in a competitive multi-retailer environment over an infinite horizon. The retailers compete for the same market, in which demand is uncertain. In each period, the customer selects the winning agent (retailer) by random utility maximization, where utility depends primarily on the retailer's price and a random error term. Competition, the need for simultaneous decisions, and demand uncertainty make the problem too complex for standard analytical methods. Therefore, we model the problem using reinforcement learning (RL), which is founded on stochastic dynamic programming, together with agent-based simulation. We analyse the effects of competitiveness and the performance of RL in three scenarios: a monopolistic case in which a single retailer employing an RL agent maximizes its profit, a duopolistic case in which one retailer employs RL and the other uses adaptive pricing and ordering policies, and a duopolistic case in which both retailers employ RL.
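The customer choice rule described above can be illustrated with a minimal sketch. It is not the study's implementation: the linear utility form, the Gumbel-distributed error (which yields a multinomial logit choice model), and the names `choose_retailer`, `win_shares`, and `price_sensitivity` are all assumptions made for illustration.

```python
import math
import random


def choose_retailer(prices, price_sensitivity=1.0, rng=random):
    """Return the index of the retailer with the highest random utility.

    Assumed utility form (hypothetical): U_i = -price_sensitivity * p_i + e_i,
    where e_i is standard Gumbel noise.
    """
    utilities = []
    for price in prices:
        u = rng.random()
        while u == 0.0:  # avoid log(0); random() returns values in [0, 1)
            u = rng.random()
        noise = -math.log(-math.log(u))  # Gumbel(0, 1) by inverse transform
        utilities.append(-price_sensitivity * price + noise)
    return max(range(len(prices)), key=utilities.__getitem__)


def win_shares(prices, n_customers=10_000, seed=0):
    """Estimate each retailer's share of customers won by simulation."""
    rng = random.Random(seed)
    counts = [0] * len(prices)
    for _ in range(n_customers):
        counts[choose_retailer(prices, rng=rng)] += 1
    return [c / n_customers for c in counts]


if __name__ == "__main__":
    # Two retailers; the cheaper one should win more often, but not always.
    print(win_shares([10.0, 11.0]))
```

Under these assumptions the cheaper retailer wins most customers but not all of them, because the random error term can outweigh a small price difference. This is what makes pricing a stochastic decision problem for each retailer.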