Surrogate-based optimization of a periodic rescheduling algorithm

Periodic rescheduling is an iterative method for real-time decision-making on industrial process operations. The design of such methods involves high-level when-to-schedule and how-to-schedule decisions, the optimal choices of which depend on the operating environment. The evaluation of the choices typically requires computationally costly simulation of the process, which — if not sufficiently efficient — may result in a failure to deploy the system in practice. We propose the continuous con-trol parameter choices, such as the re-optimization frequency and horizon length, to be determined using surrogate-based optimization. We demonstrate the method on real-time rebalancing of a bike sharing system. Our results on three test cases indicate that the method is useful in reducing the computational cost of optimizing an online algorithm in comparison to the full factorial sampling


| INTRODUCTION
Process scheduling is operational decision-making that arises in many fields of process systems engineering (PSE), such as production of metals, petrochemicals, pulp and paper, and transportation of goods.
The main decisions involve defining the set of tasks to be executed, their timing, and the allocated resources. 1In offline scheduling, these decisions are determined prior to the beginning of the process operation based on the available information.Mathematical optimization methods can find the optimal solution to a vast variety of such (often combinatorially complex) scheduling problems-subject to sufficient computing resources and execution time.[3] In online scheduling, the scheduling decisions are made in real-time while operating the process.If the decision-making is based on mathematical optimization, this process typically involves iterative execution of optimization procedures, the results of which are used to update the current schedule.Each optimization procedure aims to find the optimal operations for a predefined scheduling horizon to the future using the latest information of the process.As this scheduling horizon moves ahead in time, the method is generally referred to as the rolling horizon approach. 2While the methods for offline scheduling are fairly established, less attention has been put on the design of online scheduling methods. 4A major challenge in online scheduling is how to optimally use new information and the given computing resources in a limited time.The high-level decisions related to online scheduling can be categorized into when-to-schedule and how-to-schedule decisions. 5pta et al. 6 review the literature of online scheduling and highlighted four aspects for the design of online scheduling algorithms: (i) recomputation trigger, (ii) computation technology, (iii) uncertainty modeling, and (iv) allowable changes and constraints to the schedule.While the re-computation trigger is essentially a synonym for the when-toschedule decision, the last three aspects are all different subcategories of how-to-schedule decisions.
Regarding the when-to-schedule decision, periodic rescheduling is perhaps the most commonly used in the literature to trigger scheduling procedures.Recently, Gupta and Maravelias 7 investigate how the reoptimization interval and horizon length of open-loop scheduling affect the closed-loop solutions.The same authors also propose a systematic way to design a periodic rescheduling algorithm, 4 focusing on the interval of scheduling procedures and the horizon length.Stevenson et al. 8 study the effect of periodic rescheduling interval to the performance of a multipurpose production plant in different operation conditions.Dong et al. 9 present a re-optimization framework for a maritime inventory routing problem based on a discrete-time arc-flow formulation and stochastic simulations to account for uncertainty.Other commonly used methods for rescheduling timing are event-driven rescheduling 10,11 and a hybrid of periodic and event-driven rescheduling. 12Ikonen et al. 13 propose an approach where a reinforcement learning agent is trained to decide on the timing and computing time allocation of rescheduling procedures.
Recent studies on allowable changes and constraints in online scheduling are published in articles. 14,15Vieira et al. 16 provide a thorough review of rescheduling strategies, including re-optimization triggers.In this study, we focus on the decisions related to the periodic rescheduling.
It is worth noticing that an evaluation of an online scheduling algorithm requires repeatedly performed optimization procedures, as well as a simulation of the process, to which the optimized scheduling decisions are applied.In order to obtain a reliable metric of the closed loop performance of the algorithm, the evaluation may take hours of computing time.Considering that design of an online scheduling algorithm has many algorithmic settings (discussed above), a full factorial evaluation of all settings may require a prohibitively long computing time.In the literature, this aspect of online scheduling is rarely discussed.
Surrogate-based optimization is a sample efficient global search method suitable for black-box type objective functions, the evaluation of which is computationally expensive.The main idea is to, first, generate an initial sampling plan of the search space based on a design of experiment method, 17 for example, the Latin hypercube sampling 18 or the Morris-Mitchell Maximin distance sampling. 19Second, a surrogate (also referred to as a meta-model, a response surface, or an emulator) is constructed based on an initial sampling.The purpose of the surrogate is to serve as a cheap-to-evaluate representation of the objective function in the search space.Third, the surrogate is used in an iterative process where new candidate points (also referred to as infill points) are identified and evaluated, and the resulting input-output data are used to update the surrogate.A field partly overlaid with surrogate-based optimization is Bayesian optimization with the distinction that in the latter the surrogate is typically updated based on Bayesian statistics. 20rly works in these fields were done by Kushner, 21 Močkus and Žilinskas 22,23 during 1960-1970s.Sacks et al. 24 pioneered surrogate modeling of computational simulations in late 1980s.Jones et al. 25 proposed the efficient global optimization (EGO) method, where the infill points are determined based on a surrogate by maximizing the expected improvement in the objective function value (developing on the idea presented by Močkus 22 ).Commonly used surrogates are radial basis functions, 26 kriging, 27 and polynomial functions.Surrogate-based optimization can take into consideration features such as noisy experiments, 28,29 multiobjective optimization, 30,31 multifidelity function evaluations, 32 and parallel evaluation of infill points. 33,34Surrogate-based optimization, as well as Bayesian optimization, have been used extensively in the design of engineering systems. 35,36McBride and Sundmacher 37 review surrogate modeling in chemical process engineering.
Regarding planning and scheduling, Wan et al. 38 41 apply Bayesian optimization to the scheduling of pump operations.Hutter et al. 42 developed Bayesian optimization algorithms for the optimal configuration of the mixed integer programming (MIP) solver CPLEX, often used to solve scheduling problems.
In this study, we first draw attention to the high computing cost of optimizing the closed-loop performance of an online scheduling algorithm via simulation.Second, we propose the optimization of continuous parameter of a periodic rescheduling algorithm to be performed using surrogate-based optimization, in order to mitigate the above-mentioned high computing cost.As far as we are able to ascertain, such an approach has not been reported in the literature.Third, Gupta and Maravelias 4 report results of a two-dimensional control parameter optimization using full factorial sampling, but, to our knowledge, results of systematic optimization in higher dimensions have not been reported in the literature.We extend the search space into three dimensions, and discuss how surrogate-based optimization could facilitate even higher dimensional search spaces.
We evaluate the approach on the bike sharing rebalancing problem (BRP), which is a generalization of the one-commodity pickup and delivery traveling salesman problem. 43The objective is to minimize travel time of a number of rebalancing vehicles that transport bikes between fixed stations while fulfilling the service level requirement (SLR) of the system.We also discuss the transferability of the research to more complex processes in PSE, such as process scheduling and supply chain optimization.Artificial intelligence (AI) algorithms are used in this study (i) to make probabilistic predictions of the future demand at the bike stations, using a continuous time Markov chain, (ii) to cluster the stations of the system into subsets that are allocated to a single rebalancing vehicle, and (iii) in the surrogate-based optimization of control parameters of the online scheduling algorithm.

| OPTIMIZATION PROBLEM: BIKE SHARING REBALANCING
Urban bike sharing systems are expanding concurrently in large metropolitan areas and smaller cities around the world.In June 2021, the number of active bike sharing systems globally was around 2000. 44ke sharing is a form of sharing economy, aiming to increase the utilization rate of goods and reduce the overall greenhouse gas emissions.
The bikes of these systems are often used as last mile transportation, for example, between home and a train station.The operation of these systems is challenging, because the daily pickup and return distributions on different stations are asymmetric, leading to a risk of undesired empty or overfull stations.The system operators typically mitigate the imbalance by transporting bikes between stations by trucks.In the literature, the problem of finding the optimal operation of the rebalancing vehicles is referred to as the BRP.
The reason for selecting the bike sharing system as the example problem here is manifold.First, the process is very dynamic and has elements that follow certain patterns but cannot be fully forecasted.
An example of these is customer behavior, which resembles the product demand in typical PSE applications.Second, both the dynamic BRP and a typical online production scheduling problem involve solving computationally expensive MIP models in limited amount of time.
In addition, solving both problems follow similar procedures that need to be handled in the best possible manner, including the use of available data and information and an intelligent combination of optimization and AI technologies.Third, the studied problem is an instance of capacitated vehicle routing problems, which are frequently encountered by chemical companies that transport chemicals between and within their production sites, as well as to their customer network.
Fourth, and perhaps the foremost, open data are widely available for the BRP and can be used for research and development purposes.This allows to develop, test, and publish methodologies in practice long before it would be possible with partially confidential company data.Instead of using anonymized data, we can directly refer to real process data collected from the described environment.

| Background
The problems studied in the literature can be categorized, in general, into static and dynamic BRPs.The static BRP corresponds to a case of overnight rebalancing, in which the effect of the users on the system during rebalancing is assumed to be negligible and the rebalancing vehicles can operate in the region without congestion.A typical objective is to minimize the cost of operations, subject to fulfillment of station-specific target levels.Raviv et al. 45 formulated MIP models to minimize a weighted objective function, combining the user dissatisfaction function 46 and total operating costs.Chemla et al., 47 Dell'Amico et al., 48 and Erdo gan et al. 49 formulate exact MIP models, and branch-and-cut algorithms to solve them, in order to obtain the optimal operations of rebalancing vehicles.Schuijbroek et al. 50determine the SLRs from historical datasets and propose a clustering-based optimization approach.In their approach, each station is assigned to a self-sufficient cluster of stations, which is assigned a single truck to perform only transportations within the cluster.
The operations of the vehicles are determined by a MIP model.These models are useful in practice in the overnight rebalancing when the usage level is low, as shown, for example, by Freund et al. 51 in an experimental study in New York City.However, during the day-time, the state of the system is significantly altered by the users, leading to a situation where the optimal solution to the static BRP quickly becomes outdated.The expanded problem that also considers the usage of the system during rebalancing is referred to as the dynamic BRP.The key elements to solve this problem are as follows: (i) a good prediction of the future pickup and return rates at the stations (typically based on historical and online data), (ii) an efficient solution method (typically either MIP or a heuristic algorithm), and (iii) a re-optimization scheme that updates the planned operations when appropriate.Chiariotti et al. 52 propose a dynamic rebalancing approach that uses the birth-death process to model the future bike levels at the stations and updates the planned operations periodically using a heuristic algorithm.Ghosh et al. 53 propose a MIP model for the dynamic BRP that uses Lagrangian dual decomposition and an abstraction mechanism that groups stations close to each other into an abstract station in the optimization model.For an extensive review on BRPs, we refer the reader to the recent review paper by Shui and Szeto. 54

| Real-time optimization problem
Let us now formally describe the real-time optimization problem of rebalancing a bike sharing system.The system consists of a set of stations S and a set of rebalancing vehicles V that perform the transportation of bikes in between the stations.Each station i S has C i docks where bikes can be stored.Each vehicle v V has a maximum capacity of bikes, Q v .The relative locations of the stations are characterized by the distance matrix d i,j , where i, j S. The vehicles are assumed to move from a station to another at a constant speed of v. Loading a bike from a station to a vehicle, as well as the opposite action (i.e., unloading), takes time t load .Initially, at time t 0 , the stations have initial bike levels of s init i , i S, and the vehicles initial loads of q init v , v V, and initial locations of l init v , v V, where l init v points to a station in set S. A historical dataset of station-specific bike pickup μ i t ð Þ, i S and return rates λ i t ð Þ, i S as a function of the time of the day t is available for the online scheduling algorithm*.Figure 1 shows these rates at a station located in a commercial district of Helsinki in June 2020. 55ile also other re-optimization triggers are studied in the literature (see Section 1), we here only consider re-optimization procedures that are triggered periodically at an undefined time interval of Δt.We evaluate the periodic rescheduling algorithms by repeatedly formulating and Average pickup and return rates, μ(t) and λ(t), at a bike sharing station, located in a commercial district of Helsinki (Unioninkatu, 60.17 N, 24.95 E) in June 2020 solving optimization problems that determine the operations on the system (Figure 2).The re-optimization procedures are executed on a fixedcapacity computing resource that is continuously available.Thus, each optimization procedure is allocated a computing time limit equal to the re-optimization interval Δt.During the ith re-optimization procedure, the system evolves from the time point t i to t i+1 = t i + Δt.We simulate the evolution of the bike sharing system using real pickup and return data.
The data used in the simulation is not included in the historical dataset.
The operations obtained from the ith re-optimization procedure that started at t i can be put into action at t i+1 .The information of bike pickups and returns by users during the evaluation becomes available for the online scheduling algorithm at the time they occur.We evaluate the algorithm during a time span of t 0 …t end ½ .The objective is to minimize the sum of time the stations of the system are in an undesired state (i.e., empty or full), P i S t ef i , where t ef i is the time station i is empty or full.The next section describes the methods we use to formulate an optimization problem based on the prevailing state of the bike sharing system.

| Optimization problem formulation
We formulate the optimization problems using a slightly modified version of the model by Schuijbroek et al. 50The model, in fact, consists of three sub-models: (i) a probabilistic prediction model of the future bike levels at the stations, (ii) MIP model to solve routing problems, and (iii) the aforementioned clustering model to decompose large problems into smaller ones.Although the model was originally designed for static BRP problems, the model is also well-suited for dynamic problems because it recognizes stations that do not need to be visited (referred to as self-sufficient stations) and it is scalable to larger problems due to its clustering feature that decomposes the original problem into smaller sub-problems.In the following, we describe the model and report our modifications to it.We follow mostly the symbol notation used by Schuijbroek et al. 50

| Probabilistic prediction model
The inventory of bikes S i (t) at a station i S is modeled as a stochastic process on the state space 0, …, C i f g , where C i is the number of docks at the station (assuming that the station cannot be overfull).Let us denote this process as {S i (t): t ≥ 0}, where t represents the time.
Schuijbroek et al. 50model this inventory as a M t /M t /1/K queuing system, 56 such that the number of "customers" in the queue corresponds to the number of bikes at the station.The first two elements of the notation indicate that the arrivals of new "customers" (i.e., bikes in our case) and their service both are non-stationary Markovian processes.Both types of events are exponentially distributed with time-dependent rates λ i (t) and μ i (t), respectively.The third element indicates that there is one machine serving the customers, and the fourth element that the queue has a total of K = C i waiting spaces.
The queuing system is summarized in Figure 3.
A few assumptions need to be stated here (the same assumptions are made by Schuijbroek et al. 50).First, the model assumes all bike users to travel alone.In reality, a portion of the users travel in batches and therefore arrive at the pickup and/or return stations nearly simultaneously, which is against the assumption of exponentially distributed bike pickups and returns (Poisson process).We estimate † that out of all trips conducted in Helsinki in June 2020 81.5% were made alone, 16.1% in pairs, and 2.4% in batches of three or more.As the majority of the trips are conducted alone, the effect of some users traveling in batches is presumably small.Second, the model considers the stations separately without considering possible interactions between them.As example of this, let us consider a user starting a trip who cannot pickup a bike from a station because it is empty (and the opposite case of ending a trip to a full station).These users would typically proceed next to a nearby station.Nevertheless, this behavior is not considered in the model.
Let us denote the transient probability for the number of bikes at station i to equal σ 0,…, C i f g at time t ≥ 0 given that it was Evaluation of a periodic rescheduling algorithm on an operating environment during a time span of t 0 …t end ½ F I G U R E 3 Markov chain of the bike inventory S i (t) at station i S, having the capacity of C i .The pickup and return rates of bikes are μ i (t) and λ i (t), respectively Regarding the M t /M t /1/K queuing system, no closed form solution exists for these probabilities.Nevertheless, we can solve the probabilities numerically using the Kolmogorov forward equations, 57 which for this system are as follows: We use the Runge-Kutta method of order 5(4) ‡ , implemented in the Python library Scipy, 58 to solve this system of differential equations.
Figure 4 shows two 4-h probabilistic predictions made at 6:00 and 14:00 for the Unioninkatu bike sharing station, based on the data shown in Figure 1, and the starting inventory at 6:00 and 14:00 on August 17, 2021.It is worth noticing that the station has higher return than pickup rate in the morning (Figure 1), and vice versa in the afternoon.Thus, the station is at risk of becoming full during the morning and empty during the afternoon.It is worth noticing that the prediction method is only based on historical data of pickups and returns on the station.The method does not consider other information (e.g., about the weather or nearby events), which could potentially improve the accuracy in practice.
From the operations point of view, we are interested in the fraction of demand that can be satisfied.Schuijbroek et al. 50define the expected fraction of satisfied pickup and return demand at station i to be where T the length of the observation period.The authors define minimum values for these fractions, referred to as pickup and return SLRs, , respectively, for all stations i S. Regarding station i, let us denote its initial inventory by s 0 i and the inventory after the rebalancing by s i .Raviv et al. 46 provide a proof that g i (s, 0, T) and g i (s, C i , T) increase and decrease, respectively, as the starting inventory s increases.Based on this proof, Schuijbroek et al. 50determine the lower and upper bounds for the inventory s i after rebalancing: where The stations belonging to set S 0 ¼ i S : referred to as self-sufficient stations.These stations do not require rebalancing, but can be used as a source or a sink for bikes (as long as the bounds are not violated).We make an assumption that satisfying the pickup and return demand is equally important and that all stations have the same importance §.This allows us to define a general system-wide SLR , the lower and upper bounds cannot be satisfied.In such cases, we define the bounds to equal the inventory level that maximizes the worst of the two SLRs: We define the set of stations to be visited initially as S Ã ¼ S ∖ S 0 .
This set may, however, have a shortage of bikes or empty docks.In such a case, we incrementally add stations from S 0 to S Ã until S Ã is self-sufficient or S Ã ¼ S. We add stations from S 0 in the order of the greatest g i (s, 0, T) if the shortage is of bikes or g i (s, C i , T) if the shortage is of docks (Equation 3).
Two focal control parameters of the probabilistic prediction method, presented in this section, are the length of the prediction horizon T and the SLR β.These control parameters, along with the reoptimization interval Δt, are optimized using a method, which we will describe later in Section 3.

| Routing model
The objective of the MIP routing model by Schuijbroek et al. 50 included in the objective function.Therefore, the objective of the model is essentially to minimize the time to bring system to the above-described state that satisfies the SLRs.We have defined both the loading and unloading times to be equal to t load , and thus in The model, which is both arc-and sequence-indexed, has three sets of integer variables.Binary variable x i,j,t,v defines whether arc (i, j) is traveled at time step t T by the vehicle v. Integer variables y À i,t,v and y þ i,t,v define the number of bikes pickups and deliveries, respectively, from/to station i at the time step t by vehicle v.
As we can see from the variable definitions, Schuijbroek et al. 50fine the model to take into account multiple vehicles ¶ .However, when they use it with the earlier mentioned clustering model, they solve the routing problem for one vehicle at the time.We use the routing model in the same way.Therefore, for us, set V has always the cardinality of 1.We use the following slightly modified version of the model: Constraints 8 and 9 impose that the SLR (specified by the inventory bounds in Equation 4) is fulfilled at all stations.Constraints 10-12 ensure that the route of vehicle v is started from one station, is continuous, and has no branching.Constraints 13 and 14 impose that the pickup and delivery of bikes to/from station i is only possible when leaving from it.The set N ¼ S Ã [ 0 f g includes an artificial station that allows operations to be performed at the final station of the route without incurring a new routing cost.Constraints The starting location l v of vehicle v is its next destination after time Δt from the start of the re-optimization procedure.q 0 v is the corresponding load when arriving to that station (i.e., the starting load for the next planned route).
Regarding Equation ( 14), Schuijbroek et al. 50formulated it as In their formulation, the pickup of bikes can happen when leaving from a station, and delivery of bikes when arriving at a station.However, to our understanding, this form would allow the capacity of vehicles to be violated.In our experiments, we use Equation ( 14), defining that both pickup and delivery can happen only when leaving a station, precluding the violation.
When performing rebalancing with more than one vehicle, we first solve the clustering model 50 to decompose the problem into n single-vehicle routing problems (Equations 7-23), where n is the number of vehicles.The clustering model, which is also a MIP model, is defined in the paper by Shuijbroek et al. 50

| SURROGATE-BASED OPTIMIZATION
In the previous section, we described the online algorithm for the dynamic BRP.We highlighted the prediction horizon T, SLR β, and the re-optimization interval Δt as the main control parameters of the algorithm, which we believe to have a major impact on its performance.Let us denote these parameters by x = [T, β, Δt] T , and the metric of their goodness simply as y x ð Þ ¼ P i S t ef i .Finding the parameters x that minimize y(x) in a certain operating environment is hindered by the high computing cost of the evaluating y(x).
Surrogate-based optimization is a method that is well-suited for optimization problems with expensive function evaluations.A pseudo-code for a parallel surrogate-based optimization with the total budget of N sampling points is shown in Algorithm 1.The process is started by distributing n doe points in the search space by a design of experiment method.We use the Latin hypercube sampling 18 as the design of experiment method.The method distributes the points in the search space, such that their projections onto the variable axes have a uniform spread.The objective function value y is then evaluated at the points x 1 , …, x n doe È É .
The remaining N À n doe sampling points are used in an iterative process, where (i) the surrogate model b y representing y is updated with all data obtained so far, (ii) new sampling points are chosen based on a specified infill criterion (a procedure that involves several evaluations of the surrogate), and (iii) the objective function y is evaluated at these points.In the following, we describe the chosen surrogate model that is based on Kriging (Section 3.1) and the infill criteria (Section 3.2) in more detail.

| Kriging
Before evaluating the objective function y at some point x, we have uncertainty of the value we are going to obtain.In Kriging (also referred to as Gaussian process regression), the observations of the function are modeled as if they are from a stochastic process, represented by the random vector having the mean of 1μ, where 1 is a n Â 1 vector of ones.In this study, we model the correlation of the objective function values at points x i and x j using the Gaussian correlation function, defined as where d is the number of dimensions in the search space and θ 1 …θ d f g are hyperparameters that control how quickly Y changes in the different dimensions of the search space.
Our evaluations of objective function y x ð Þ ¼ P i S t ef i presumably contains noise.Therefore, we define the random vector Y to have a covariance of where σ 2 is variance, λ is the regularization hyperparameter (accounting for the noise), I is an n Â n identity matrix, and R is the correlation matrix, defined as R λ is referred to as the regularized correlation matrix.
The distribution of Y is dependent on the hyperparameters Determine the next n batch sampling points using the specified infill criterion.
Evaluate y in parallel at points x n , …, x nþnbatch È É .
Set n n þ n batch .
end while return solution x with the smallest y.
l yjθ, μ 2 , σ, λ Expressions for the optimal values of μ and σ 2 can be obtained by equating the partial derivatives ∂l/∂μ and ∂l/∂σ 2 to zero.The resulting expressions are Substituting Equation ( 30) to (29), and omitting the constant terms, yields the so-called "concentrated log-likelihood" This form only requires finding the optimal values for θ l , l 1,…, d f g and λ, after which b μ and b σ 2 can be obtained by substitutions.
After choosing the hyperparameters, the model can be used to make a prediction of the objective function value y at point x* that has not been evaluated.Let us define a new correlation vector r as The predicted mean b y x Ã ð Þ and variance b s

| Infill criteria
where y min is the best known objective function value. 25Using integration by parts, the equation can be transformed into where Φ(Á) and ϕ(Á) are the cumulative distribution function and the density function of the normal distribution, respectively, and In this study, we sample the first n ei infill points using the expected improvement criterion (maximization of Equation 35) and finalize the search by sampling n b y infill points using the predictionbased criterion (minimization of ŷðx Ã Þ in Equation 33).In order to reduce the total time of an optimization procedure we use a batch size of n batch = 3 when sampling using the expected improvement criterion.The parallel infill points are sampled using the method by Ginsbourger et al. 33 The final infill points based on prediction-based criterion are evaluated in series (n batch = 1).The method we use follows fairly closely the original EGO method 25 with the exceptions that we use a slightly different correlation function (Equation 26) and finalize the search by the prediction-based infill criterion.Despite of these differences, let us refer to the described method as EGO for the sake of brevity.As the implementation of the method, we use EGO function from the Python module SMT. 60A reader interested in further information on surrogate-based optimization may consult the works by Jones 61 and Forrester et al. 62

| RESULTS
We formulate real-time optimization problems based on data recorded from the bike sharing system in the Helsinki region.The origin-destination data of the trips conducted in the system, as well as the locations and capacities of its stations, have been published by Helsinki Region Transport 55,63 since 2016 under the CC BY 4.0 license.Kainu 64 has published the number of available bikes at the stations of the system at the sampling interval of 5 min since 2017 (also under CC BY 4.0).In 2020, the system consisted of 352 stations.
We have chosen a subset of these stations that lay inside the rectan- We determine the rates by calculating the mean of pickups and returns during 15-min intervals; the resulting values are maximum likelihood estimates for Poisson variables.Figure 1 shows the obtained rates for the station on Unioninkatu (i.e., station 11 in Figure 5).
When evaluating an online algorithm on the test data, we initiate the system at 6:00 using the numbers of bikes at the stations from the data recorded by Kainu. 64We then repeatedly perform reoptimization procedures, and simulate the evolution of the system (Figure 2).At t end = 15:00, the rebalancing is stopped.Finally, we simulate the system until 22:00 (without rebalancing) and determine the total time, P i S t ef i , the stations in the system were either empty or full during the time window of 6:00-22:00.The goodness of an online algorithm is measured based on P i S t ef i , averaged over the 10 test days.
Unlike some other bike sharing systems, the system in Helsinki allows bikes to be returned to full stations.However, in order to keep the methods analogous to other systems, we preclude returning a bike to a full station in our simulations, and use the number of docks C i as the initial number of bikes if the station was initially overfull.We also wish to point out that in the origin-destination data, the pickup demand is censored (i.e., the demand is not visible in the data) during the time periods when the station in question is empty.The return demand, on the other hand, is not censored in the data when the station is full, as the system allows stations to be overfull.We assume the censored pickup demand to have only a small effect on our evaluations.
Regarding the model parameters, we determine the elements of the distance matrix d i,j as the Euclidean distance between the stations with a detour factor of 1.4.The vehicle(s) have a capacity of Picking up or delivering a bike from/to a station takes 1 min.These three parameters are the same as in the work by Schuijbroek et al. 50rther, we define vehicles to have an average speed of v ¼ 20 km/h.Thus, based on the average speed and the pickup/delivery time, parameters d À and d + have the value of 333 m.In the beginning of operation, the load on the vehicle(s) is half of their capacity, q init v ¼ 11,v V.
In the following, we present the results from test cases with one

| Two-dimensional case with one vehicle
Let us start with a case where one rebalancing vehicle operates in the region shown in Figure 5 (the vehicle has the initial location of l init 1 ¼ 1).In this case, the purpose is to optimize the SLR β and the prediction horizon T while keeping the re-optimization interval fixed at Δt = 1 h.The value represents a natural choice for the re-optimization interval that could be chosen in practice.We specify the search domain using the bounds β 0:1,…,0:95 ½ and T 0:1, …,6:0 ½ h.
Figure 6 illustrates the results obtained by the EGO method and the full factorial sampling.The dots in both subfigures represent the initial sampling points.In Figure 6A, the crosses indicate the locations of the iteratively evaluated infill points.After the optimization procedure with EGO method, we trained a surrogate using all observed data.The contour in the Figure 6A represents the predicted objective function value (ŷðx Ã Þ in Equation 33) in the search space based on this surrogate.
The locations of 92 bike stations around the city center of Helsinki that are used in this study.The map is generated using OpenStreetMap 66 Both figures indicate the design space to have a vertically aligned valley of good control parameter combinations.The valley starts at the top of the search space at around (β = 0.8, T = 6.0 h) and becomes wider when Δt is reduced.The valley is surrounded by two separate regions with poorly performing control parameter combinations.The wide region at the top left corner of the search space corresponds to algorithms where the SLR β is so low that no or only a few stations are added to the route of the vehicle.This leads to situations where the vehicle is on idle and waits for instructions.If no rebalancing is performed during the test days, the objective function has the value of P i S t ef i ¼ 245:49 h (applies to several points in this region).On the other hand, the (vertically) narrow region at the top right corner of the search space corresponds to algorithms where too many stations are added to the routing model, leading to combinatorially complex models.As a result, the MIP solver fails to find an integer solution in the given computing time, and again no rebalancing instructions can be sent to the vehicle.
The points with the smallest objective function values are circled in Figure 6.The numeric values of these points, as well as the corresponding objective function values, are listed in Table 1.With the control parameters obtained by the EGO method the total time the stations are empty or full, P i S t ef i , is 3.32% smaller than with those obtained by the full factorial sampling.
Figure 7 shows the realized and planned route of the vehicle in the morning on August 17, 2020 when using the optimized control parameters obtained by the EGO method (see the map in Figure 5).
The realized and planned routes are indicated by solid and dashed lines, respectively.The three subfigures correspond the first three time points of the day when updates of the optimized operations are sent to the vehicle.Figure 8 shows the realized and planned load on the vehicle corresponding to these states.These two figures illustrate how the optimized plan of operations changes when new information is obtained.In Figure 8C, the one-bike gap between the realized and the planned load is caused by a shortage of bikes at station 121, precluding the vehicle to follow the planned operations precisely.
With the optimized control parameters, obtained by the EGO method, the average solution time of the MIP solver was 130.9 s, indicating that in many cases the solver did not need all of the allocated solution time of Δt = 3600 s to find the optimal operations.However, the solution times had significant variance.The minimum and maximum solution times were 0.0015 and 3600.0 s, respectively.The MIP gap varied from 0.00% to 2.33%.
F I G U R E 6 Results obtained using the EGO method (A) and full factorial sampling (B).The dots and crosses represent the initial and infill sampling point, respectively.The best obtained point is circled.The contour in (A) is from the surrogate that is trained with all observed data.EGO, efficient global optimization; SLR, service level requirement

| Three-dimensional case with one vehicle
Considering that the MIP solver did not need most of the allocated solution time in the case, the next question is whether we could find a better algorithm if we also relax the re-optimization interval Δt and include it as a new dimension in the search space.In this section, we study the same dynamic BRP as in the previous section but seek the optimal control parameter combination from the three-dimensional search space, defined by bounds β 0:1…0:95 ½ , T 0:1…6:0 ½ h, and Δt 0:01…3:0 ½ h.
We use the 64 sampling points allocated to the full factorial sampling by discretizing each dimension using four points (4 3 = 64).
Figure 9B illustrates the results obtained at these sampling points.It is worth noticing that the objective function landscape at all four 4 Â 4 horizontal layers resemble that of Figure 6B. Figure 9A shows the isosurfaces obtained by a surrogate that is trained with all observed data.The improvement in the best obtained objective function value is only 0.30% in comparison to the previous two-dimensional case (compare Tables 1 and 2) although the re-optimization procedures are performed more frequently.Arguably, there could be at least two reasons for observing only a relatively small improvement.First, we have

| Two-dimensional case with two vehicles
Finally, let us examine a case with two rebalancing vehicles.As the inclusion of the re-optimization interval Δt in the search space provided only a minor improvement in the objective function value, we fix it, for the sake of simplicity, again to the value of Δt = 1 h.Thus, the case is otherwise the same as that of Section 4.1, but has two rebalancing vehicles instead of one.The vehicles have initial locations of l init 1 ¼ 1 and l init 2 ¼ 81, located at the opposite sides of the region.Figure 10 illustrates the results with the EGO method and full factorial sampling.The objective function landscape is similar to that of  3).In this case, the EGO method yielded control parameters with an objective function value 0.76% lower than that from the full factorial sampling.Here, the re-optimization of operations may reallocate a station from one vehicle to another (see, e.g., station 32 in the optimized operations at time points 6:00 and 7:00).
We tested also an alternative infill strategy for the EGO method where also the first n ei infill points are generated based on the prediction-based infill criterion.The optimized objective function values in the three test cases (Sections 4.1-4.3)were within 0.7% from those of the primary strategy described earlier in this section.
Arguably, if the objective function has multiple local minima, the alternative strategy may yield sub-optimal solutions, as it is essentially a local search method.

| DISCUSSION AND CONCLUSIONS
In this study, we have investigated surrogate-based optimization of a periodic rescheduling algorithm on a dynamic BRP.We optimized three continuous control parameters of the algorithm (the SLR β, prediction horizon T and re-optimization interval Δt) and investigated how these parameters affect the performance of the algorithm.Based on our results on two-and three-dimensional search spaces, the main benefit of using the surrogate-based optimization is the reduction in the computational cost of tuning an online algorithm to a certain environment.The full factorial sampling was allocated 64 evaluations, all of which involve evaluating the performance of the algorithm on 10 days, each having 9 h of rebalancing vehicle operation.Thus, the total allocated computing time to solve the MIP models was 5760 CPU-hours.On the three studied cases, the surrogatebased optimization method was able to find control parameter combinations with 0.76% to 3.32% better objective function values with 50% fewer function evaluations in comparison to the full factorial sampling.However, the actual computing times of evaluating a point varied significantly depending on the region of the search space.In general, points with the smallest SLR β had the smallest computing times (in an order of seconds) due to the low combinatorial complexity of the resulting MIP models, whereas the points with a largest SLR β used, in many cases, all of the allocated computing time.
We can identify a few directions for improving the method.First, while the Latin hypercube sampling is a frequently used design of experiment method, yielding a uniform projected spread along the variable dimensions, it is not necessarily a space-filling method.Such a design of experiment method is, for example, that by Morris and Mitchell, 19 which maximizes the minimum distance among the points of a Latin hypercube.
Second, cost-aware methods have recently been developed at least in the field of Bayesian optimization. 65Considering the high variance that we observed in the actual computing cost, these methods could be useful in reducing the computing time of the optimization procedures, as they construct also a surrogate for the evaluation time and use this surrogate when deciding new sampling points.Third, surrogate-based optimization is applicable to optimization problems with up to around 20 variables ‡ ‡ . 20,35The future study should investigate expanding the dimensions of the search space by other continuous parameters related to an online scheduling algorithm (e.g., the allocated computing time per re-optimization procedure).A pre-screening of parameters by a fractional factorial design 17 could be useful prior to surrogate-based optimization, in order to include only important parameters in the optimization.
On the second test case, in which we also optimized the reoptimization interval, the best obtained control parameter combination has the re-optimization interval of Δt = 0.142 h (i.e., around 8.5 minutes).Sending updated operations to the drivers of the vehicles this frequently would be very impractical in reality, as the drivers would have uncertainty of the operations they need to perform after visiting the next planned station.Therefore, most operators would prefer to use the re-optimization interval of Δt = 1 h of the first case, the objective function value of which is only marginally compromised.This is also the case in most industries facing varying demand patterns or disturbances in the processes.Once an operator decides to take an action, for example, perform a small maintenance routine, he is occupied for a certain time and this routine may also limit or determine the next production decisions.For instance, in multi-product processes, a sequence-dependent changeover may take even 30 minutes (in, e.g., Pulp & Paper or steel industry) and there is no rationale to swap the next product during this time.
As already indicated above, the methodology described in this paper is generic and can be applied to various processes.The dynamics of a bike sharing system and process industries are both stochastic by nature.On the material side, there may be urgent orders or supply problems from the upstream processes.Additionally, the processing times are heavily affected by potential disturbances or deviations in, for example, material qualities and yields.Many of these stochastic elements cannot be fully prepared for but may be observed or predicted, for example, by surrogate models, allowing to better tune the optimization model parameters, as can also be seen in many earlier contributions in PSE, cited in the literature overview.The main challenge is to have access to sufficient information and suitable methodologies in order to better describe the underlying process model through historical data.The discussed study, thus, also contributes to the research on the integration of scheduling and control: here, we use AI/ML methods to acquire more information and knowledge about the low-level process dynamics (that in this case cannot be fully observed nor controlled) in order to improve the upper-level awareness and, consequently, the overall decision quality.

ENDNOTES
* This dataset can be generated from user trip data, containing the identifiers of the origin and destination stations, as well as the corresponding time stamps.Such data are widely published by the operators of bike sharing systems.† The estimate is based on an assumption that two trips belong to the same batch, if they pairwise have the same pickup and return stations and their corresponding time stamps are within 3 min.‡ The method takes steps using the fifth-order Runge-Kutta method and controls the error by assuming an accuracy of the fourth-order method.§ This assumption has the caveat that the algorithm cannot prioritize a station with high demand over a station with low demand.Nevertheless, station-specific service level requirements are outside the scope of this study.¶ The routing model with multiple vehicles has a high combinatorial complexity already with relatively small number of stations.The authors use it mainly as a reference model.
** We use the data from the second half of August as the test data because July and the first half of August are the main vacation times in Finland.The vacations presumably have an effect on the daily demand profiles of the bike sharing system.Studying this effect falls outside the scope of this study.† † We omit the computing time to determine the bounds s min i and s max i ,i S (Section 2.3.1), as this time is typically in an order of seconds, or tens of seconds.‡ ‡ With a greater number of variables the repeatedly performed retraining the surrogate may incur a prohibitively large computing cost.
deploy surrogate modeling in simulation-based optimization of a three-stage supply chain problem under uncertainty.Sahay and Ierapetritou 39 minimize the total cost of a multi-enterprise supply chain by finding the optimal warehouse capacities using surrogate-based optimization based on Kriging and expected improvement maximization.Shi and You 40 reformulate an integrated optimization problem as a mixed-integer nonlinear program and propose a bilevel structure, consisting of scheduling dynamic optimization levels.Surrogate models based on piecewise linear functions are used to represent the linking variables of the levels (i.e., processing times and costs).Candelieri et al.

F
is to minimize the maximum travel distance among a set of rebalancing vehicles subject to satisfying the bounds in Equation 4 at all stations I G U R E 4 Probabilistic prediction of the number of bikes at Unioninkatu bike station in Helsinki.The prediction is made based on historical data from June 2020 and given starting levels at 6:00 (A) and 14:00 (B) on August 17, 2020.The white lines show the actual realizations on that day.The station has a capacity of 22 bikes.The brightest yellow color exceeds the scale i S. Considering that the vehicles travel at the constant speed v, the authors transform the time to pickup or deliver a bike into the corresponding distances d À and d + , respectively, which the vehicle would travel during these operations.These distances are

15 and 16
limit the amount of transshipment.Constraints 17 and 18 ensure that the load on vehicle v is within 0, …, Q v f gthroughout the rebalancing.Constraint 19 tracks the total distance h v traveled by vehicle v (taking also into account the bike pickup and delivery times).Constraint 20 tracks the maximum distance H among h v , v V. Constraints 21-23 were not in the original model by Schuijbroek et al.. 50 We add them, in order to make the model more suitable for dynamic BRPs.Constraint 21 imposes vehicle v to start the route from its starting location l v .Constraints 22 and 23 enforce that a station can only be visited once, which reduces the computational complexity of the problem.Accordingly, in order to ensure the feasibility of the model, we adjust bounds s min i and s max i to be at most Q v bikes away from initial number of bikes s 0 i (i.e., the bounds can be satisfied by one visit).We reason this choice by the recursive nature of online scheduling.If a station requires multiple visits, a new visit would be scheduled at the next iteration after visiting the station for the first time.As all stations can be visited at most once we define the number of time steps in T to equal the number of stations in set S Ã .

Finding the optimum of y based
on the surrogate b y requires balancing between sampling points that b y indicates to be competitive (i.e., exploitation) and sampling points that lead to improvements in the accuracy of b y (i.e., exploration).b y should be accurate especially in regions where competitive points are found.The new points are chosen based on an infill criterion.A natural infill criterion for exploiting b y is to sample the point x* that minimizes ŷðx Ã Þ in Equation (33).This is referred to as the prediction-based infill criterion.A commonly used infill criterion in surrogate-based optimization is to sample the point x* that maximizes the expected improvement E(I) in the best known objective function value.This infill criterion has a balanced exploration and exploitation.The expected improvement is defined as gular region, specified by the coordinate ranges [60.15 … 60.19 N] and [24.90 … 25.00 E] (Figure 5).The region, capturing the city center of Helsinki, has a total of 92 stations.We use the origin-destination data recorder in June 2020 as the training data, and the corresponding data recorded during August 17-28, 2020, as the test data**.For the sake of simplicity, we only consider weekdays in this work, as the demand profiles are different on weekdays and weekends, and should anyways be handled separately.The number of weekdays in June 2020 was 22, whereas the specified time window in August 2020 includes 10 weekdays.Here, training the prediction model means determining the pickup and return rates, μ(t) i and λ t ð Þ i ,i S, from the aforementioned origin-destination data.

(
Sections 4.1 and 4.2) and two rebalancing vehicles (Section 4.3).In all cases, we allocate 32 evaluations for the EGO method, described in Section 3, and use the following sampling plan: first, we evaluate n doe = 15 initial points, then n EI = 15 infill points using the expected improvement criterion in batches of n batch = 3, and finally n b y ¼ 2 infill points in series using the prediction-based criterion.We also show reference results generated using full factorial sampling.In this method, the variables, defining the search space, are discretized evenly, and all variable combinations are evaluated.In order to ensure that the full factorial sampling is sufficiently dense, we allocate 64 evaluations for it.The evaluations were performed on a high performance computing facility, consisting Intel Xeon E5 2680/2690 v3 central processing units (CPUs).The number of allocated threads to evaluate a day in the test data was chosen to be the same as the number of rebalancing vehicles in the test case.Each evaluation was allocated 4 GB of memory.We use Gurobi 9.1.2to solve the MIP models (i.e., the routing model [Equations 7-23] and the clustering model, defined in 50).Regarding the solver time limits, if only one vehicle is used in rebalancing, we define the solver time limit to equal the re-optimization interval † † Δt.If multiple vehicles are used, we define the time limit to solve to the clustering model as 20% of Δt, and the time limit to solve the resulting routing models as Δt À t clu , where t clu is the run time of solving the clustering model (the routing models are solved in parallel).

7 F I G U R E 8
Realized (solid line) and planned route (dashed line) of the vehicle when obtaining first three operations updates at 6:00 am (A), 7:00 am (B), and 8:00 am (C) on August 17, 2020 Realized (solid line) and planned load (dashed line) on the vehicle corresponding to the three states shown in Figure 7.The vertical dotted lines indicate the arrival times to specific stations cube (where Δt is small), the isocurves are aligned nearly vertically.This indicates low dependency of the re-optimization interval Δt.Further, the hyperparameters θ 1 , …, θ d f gof the Kriging model give an indication of the importance of the different variables.The hyperparameters θ 1 and θ 2 , corresponding to β and T have values of 0.708 and 0.454, respectively, whereas θ 3 , corresponding to Δt has a value of 0.101, indicating lesser importance.

Figure 6
Figure 6 but, in this case, the valley of good control parameter combinations has shifted slightly toward top right.The shift is due to the clustering method, which reduces the combinatorial complexity of the routing models.Even at the upper bound of the SLR β = 0.95, the MIP solver can find an integer solution in most of the optimization procedures.The control parameters with the best objective function

Figure 11
Figure 11 illustrates the routes of the vehicles when they receive the first three operations updates in the morning on August 17, 2020.
The re-optimization interval has the fixed value of Δt = 1 h.Abbreviations: EGO, efficient global optimization; SLR, service level requirement.

Table 2
T A B L E 3 Optimized control parameters, and the corresponding objective function values, on the twodimensional case with two vehicles Note:The re-optimization interval has the fixed value of Δt = 1 h.Abbreviations: EGO, efficient global optimization; SLR, service level requirement.