### Keywords:

• heuristics;
• dynamic programming;
• transportation;
• tabu search

### 1. Introduction

The double traveling salesman problem with multiple stacks (DTSPMS) models a realistic routing and packing problem. It consists in determining a pair of routes (pickup and delivery) for a unique vehicle in two different and disjoint networks. Items are collected in the pickup route, stored in one of several identical stacks in the vehicle, and then delivered in the other network using the delivery route. Each network has a depot where the corresponding route starts and ends. The stacks have a limited size. When items in the stacks are delivered, they must obey the last-in-first-out order. Therefore, given an assignment to a stack for every collected item, a pair of pickup and delivery routes are feasible if and only if, for every pair of items stored in the same stack, the relative order in which they appear in the pickup route is reversed in the delivery route.
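The last-in-first-out condition can be checked directly from this definition. The sketch below (our own helper, not part of any algorithm in the literature discussed here) verifies that a pair of routes is compatible with a given stack assignment:

```python
def routes_feasible(pickup, delivery, stack_of):
    """Check the LIFO condition: for any two clients assigned to the same
    stack, their relative order in the delivery route must be the reverse
    of their relative order in the pickup route."""
    pos_p = {c: i for i, c in enumerate(pickup)}
    pos_d = {c: i for i, c in enumerate(delivery)}
    clients = list(pickup)
    for i in range(len(clients)):
        for j in range(i + 1, len(clients)):
            a, b = clients[i], clients[j]
            if stack_of[a] == stack_of[b]:
                # same stack: the pickup order must be reversed at delivery
                if (pos_p[a] < pos_p[b]) == (pos_d[a] < pos_d[b]):
                    return False
    return True
```

For instance, with clients 1 and 2 sharing a stack, delivering them in the reverse of their pickup order is feasible, while delivering them in the same order is not.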

The problem was first introduced by Petersen and Madsen (2009) and, since it has the traveling salesman problem as a special case (when there are at least as many stacks as there are clients), is NP-hard.

The DTSPMS models a realistic problem where transportation of goods has to be performed between two cities. A vehicle is used to collect the items in a container in the supplying city. The container is then transported (in a ship, plane, train, etc.) to the demanding city where the items are distributed by a different vehicle respecting the package policy applied in the pickup route, that is, always delivering items from the top of one of the stacks.

DTSPMS instances consist of a given number of clients, two distance matrices specifying the distances between every pair of clients as well as the distance of every client to the depot in the pickup and delivery networks and two integer numbers specifying the number of available stacks and their capacity. Solutions to the problem may be seen as two permutations of the clients representing the pickup and delivery routes and a stack assignment for each client.

Three sets of benchmark instances were introduced in Petersen and Madsen (2009). Each set is composed of 20 instances with, respectively, 12, 33, and 66 clients. All instances have three stacks, and the capacity of every stack is such that all of them are full when the pickup route is complete. That is, the capacity of each stack is four for the 12-client instances, 11 for the 33-client instances, and 22 for the 66-client instances. A normal semitrailer in Norway loads 3 × 11 pallets, making the medium-size test instances realistic. Some trucks may stack the pallets in two levels, having room for 66 pallets, but this gives a slightly different configuration than the larger test cases. In the rest of the paper, we assume that the stacks are full after the pickup phase for every instance.

In Petersen and Madsen (2009), besides proposing the problem, the authors describe four heuristic algorithms based on iterated local search, tabu search, simulated annealing, and large neighborhood search. Two simple route operators are proposed to create neighborhoods that are used to explore the solution space. Four complex route operators (and their associated neighborhoods) as well as a variable neighborhood search heuristic are introduced in Felipe et al. (2009a, 2009b). A new approach by the same authors is presented in Felipe et al. (2011), in which solutions where the stacks are overflowed are temporarily permitted in order to diversify the search process. In Urrutia et al. (2012), the stack loading/unloading constraints were temporarily relaxed in order to use fast route operators borrowed from the classical traveling salesman problem. In Casazza et al. (2012), a set of problem properties is analyzed and a fast heuristic is developed using polynomial algorithms derived from those properties. In Côté et al. (2012), the pickup and delivery traveling salesman problem with multiple stacks is proposed. The authors heuristically approach the problem by large neighborhood search. Since this problem generalizes the DTSPMS, they were able to test their approach on benchmark DTSPMS instances, obtaining competitive results.

Exact algorithms have also been proposed for the problem (Carrabs et al., 2013; Lusby and Larsen, 2011; Lusby et al., 2010; Alba Martínez et al., 2013; Petersen et al., 2010). The branch and cut in Alba Martínez et al. (2013) is, to the best of our knowledge, the best exact algorithm for the problem at the time of writing. Concerning instances with three stacks, this algorithm was able to solve instances with up to 25 clients and stack sizes up to 9 in one hour of computation. On the other hand, it was unable to solve to optimality some instances with 21 clients in one hour of execution.

In the rest of the paper, n is the number of clients; s, the number of stacks; and c is their capacity. For the instances considered in this work, n = s · c holds. Informally, we do not distinguish between a client and the demand of the client, so we may say that a client is stored in a given stack.

Let us define a loading plan as a matrix with s rows and c columns. The loading plan determines the storage arrangement of the clients once the pickup tour is finished. The matrix contains a permutation of the n clients and each row represents the state of each stack. Figure 1 shows a loading plan and a pair of compatible routes for that loading plan.

In this work, we propose a new heuristic for the DTSPMS. The new heuristic is based on a local search in the loading plan. A dynamic programming (DP) algorithm then computes the optimal pickup and delivery routes for the resulting loading plan. This novel approach differs from the rest of the literature that works directly on the pickup and delivery routes.

The rest of this paper is organized as follows. In the next section, we describe our approach and analyze the solution space of the problem in terms of loading plans. Section 'The DP algorithm' describes the DP algorithm and discusses its integration with a local search heuristic. Section 'The core heuristic algorithm' describes the core heuristic being proposed and all its components. Section 'New starting solution from path relinking' introduces the path-relinking enhancement and its usage in the core heuristic. In Section 'Computational results', we show the computational results and in the last section we provide some concluding remarks.

### 2. Optimal routes from loading plan via DP

In Casazza et al. (2012), it is shown that it is possible to compute optimal routes with respect to a given loading plan in polynomial time (as long as the number of stacks is fixed) via DP. This fact allows local search algorithms to work in the loading plan solution space instead of working directly with the routes. To the best of our knowledge, no such approach has been proposed before. This absence may be explained by the relatively high computational cost of obtaining the optimal routes of a given loading plan, O(s^2 c^s), which is polynomial and equal to O(n^s) for a fixed number of stacks s. This high computational cost may be seen as prohibitive for local search algorithms, for which the evaluation of the cost function is the most frequently performed operation.

On the other hand, such an approach has a set of advantages when compared with approaches working directly on the routes. First, the size of the solution space is n!, corresponding to the number of permutations of clients in the loading plan. This size is much smaller than the size of the solution space in route approaches, which equals (n!)^2, because for any of the n! permutations of clients on the pickup route, there are another n! permutations (not all of them feasible) in the delivery route.

Second, contrary to any route approach, there are no infeasibility issues in the loading plan approach. In fact, any loading plan has a pair of associated optimal feasible routes. In the route-based approaches, depending on the number of stacks, the major part of the solution space is composed of infeasible solutions. This fact leads to the development of complex move operators to avoid the evaluation of infeasible solutions (Felipe et al., 2009a, 2009b).

Finally, the average quality of the solutions in a loading plan approach is expected to be much better than in the route approaches. Each pair of routes obtained as a result of the execution of the DP algorithm is optimal among all pairs of routes that can be derived from that loading plan.

The proposed approach falls into the matheuristic paradigm (Maniezzo et al., 2009) in the sense that it uses an exact mathematical programming tool such as DP inside a metaheuristic algorithm. It has some similarities with the corridor method (Caserta et al., 2010; Caserta and Voß, 2010, 2012; Sniedovich and Voß, 2006), in which subproblems are solved exactly by restricting the search for solutions to a small portion of the search space. In the corridor method, an exact approach for the problem at hand is adapted to search for the optimal solution inside a search space consisting of candidate solutions that are not too different from a given incumbent solution. By redefining how much a solution can differ from the incumbent, the “corridor” can be widened or narrowed. In our approach, instead of having an incumbent solution we have a loading plan. In this case, the “corridor” has constant size and consists of all pairs of feasible routes that can be obtained from the given loading plan.

### 3. The DP algorithm

Assuming s = 3, the DP algorithm takes as arguments a loading plan L (whose entry L(i, j) corresponds to the client at position j of stack i), a distance matrix D (pickup or delivery distance network), the depot, and the stack capacity c. It computes the cost of the optimal pickup or delivery route that can be obtained with the given loading plan. In order to obtain the cost of the optimal solution for the given loading plan, the DP algorithm has to be executed twice, once for the pickup route and once for the delivery route. Let S be the four-dimensional state matrix of the DP algorithm, indexed by the number of items already picked up (resp. delivered) in each stack and the index of the last modified stack. The S matrix stores the optimal cost for each state.

The base step of the DP algorithm consists in computing the optimal costs of states in which just one item was picked up (resp. delivered), namely, S(1, 0, 0, 0), S(0, 1, 0, 1), and S(0, 0, 1, 2). This is done by computing the distance from the depot to the first client of each stack:

S(1, 0, 0, 0) = D(depot, L(0, 1)),
S(0, 1, 0, 1) = D(depot, L(1, 1)),
S(0, 0, 1, 2) = D(depot, L(2, 1)).

Then, the rest of S can be computed as follows. Consider a state (x, y, z, k) in which stack k was the last one modified, and let (x', y', z') be obtained from (x, y, z) by decreasing the coordinate of stack k by one. The client just served is the topmost served client of stack k, and the client served before it is the topmost served client of the stack j that was last modified in the predecessor state. Then

S(x, y, z, k) = min over j ∈ {0, 1, 2} with a valid predecessor state of { S(x', y', z', j) + D(topmost served client of stack j in (x', y', z'), client just served on stack k) }.

The optimal cost associated with the loading plan L can then be computed as the minimum, over all three stacks, of the cost corresponding to picking up (resp. delivering) the overall last item at the last client in that stack plus the distance from that client to the depot:

cost(L) = min over k ∈ {0, 1, 2} of { S(c, c, c, k) + D(L(k, c), depot) }.
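The recursion above can be sketched as follows. This is our own straightforward Python transcription for s = 3, using zero-based stack positions (L[i][j] is the client at position j of stack i); it is a sketch of the recursion, not the optimized implementation discussed later:

```python
from itertools import product

def route_cost(L, D, depot, c):
    """Cost of the optimal (pickup or delivery) route for a loading plan L
    with three stacks of capacity c.  D[a][b] is the distance from a to b."""
    INF = float("inf")
    base = {(1, 0, 0, 0), (0, 1, 0, 1), (0, 0, 1, 2)}
    # S[x][y][z][k]: optimal cost of serving the first x, y, z clients of
    # stacks 0, 1, 2, the last served client being on stack k.
    S = [[[[INF] * 3 for _ in range(c + 1)] for _ in range(c + 1)]
         for _ in range(c + 1)]
    S[1][0][0][0] = D[depot][L[0][0]]
    S[0][1][0][1] = D[depot][L[1][0]]
    S[0][0][1][2] = D[depot][L[2][0]]
    for x, y, z in product(range(c + 1), repeat=3):
        for k, cnt in ((0, x), (1, y), (2, z)):
            if cnt == 0 or (x, y, z, k) in base:
                continue
            new = L[k][cnt - 1]            # client just served on stack k
            prev = [x, y, z]
            prev[k] -= 1
            px, py, pz = prev
            best = INF
            for j, pcnt in ((0, px), (1, py), (2, pz)):
                if pcnt > 0:               # predecessor state must exist
                    last = L[j][pcnt - 1]  # its last served client
                    best = min(best, S[px][py][pz][j] + D[last][new])
            S[x][y][z][k] = best
    # close the route: last client of the last modified stack back to depot
    return min(S[c][c][c][k] + D[L[k][c - 1]][depot] for k in range(3))
```

Because `product` enumerates states in lexicographically increasing order and every predecessor state differs by a decrement of one coordinate, all costs needed for a state are available when it is reached.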

#### 3.1. Working with the DP algorithm in a local search algorithm

As mentioned in Section 'Optimal routes from loading plan via DP', the DP algorithm evaluates neighbor solutions inside a local search procedure. Therefore, given the complexity of the DP algorithm, it is natural to think that its execution will be the bottleneck of the algorithm in terms of computational time. To tackle this issue, we explore here four possible courses of action:

• Implement the DP algorithm as efficiently as possible. We comment on our actual implementation in this section and discuss some further possible improvements in Section 'Conclusions and future work' as future work.
• Try to take advantage of the similarity between the loading plans for which the optimal routes are to be computed. We discuss Δ-evaluation (Hoos and Stützle, 2004) for a swap neighborhood in this section.
• Solve the problem of obtaining the optimal routes heuristically instead of using the DP algorithm while evaluating neighbors, and use the DP algorithm only once per iteration of the search after the new current solution is selected. We comment on this option in this section.
• Evaluate only promising neighbors during the search. This idea is discussed in Section 'Estimating the cost of a move' and is a fundamental part of the proposed algorithm.

On the benchmark instances used in this work (see Section 'Computational results'), the current implementation of the DP algorithm is around five times faster than a naive implementation developed with a first version of the heuristic algorithm. There are two main reasons for that difference in performance.

First, instead of creating or updating states using a transition function while browsing already solved states, our DP implementation browses the yet unsolved DP states in such a way that, when browsing a given state, all state costs needed for its computation are already available. This allows us to solve the great majority of the states with a single min operation on three summations, which is computed much faster than the conditional branching instructions needed for verifying the existence of a given state and the need to update its cost.

Second, all “border” states, that is, those states whose cost is not calculated as the minimum of three summations (all states in which any of x, y, or z is equal to zero, and those in which x = 1 and the last modified stack is the first one, y = 1 and it is the second one, or z = 1 and it is the third one, so that one of the candidate predecessor states does not exist), are computed first, outside the main cycle of the DP algorithm.

Therefore, our DP algorithm consists only of a series of cycles, all of them performing min and summation operations without the use of any conditional branching instructions besides the (inevitable) cycle termination conditions. Browsing states instead of creating them accelerates the DP algorithm execution but, on the other hand, precludes the use of fathoming of suboptimal states (Morin and Marsten, 1976). Note that the described implementation of the DP algorithm does not compute the optimal routes but only their costs. Whenever the optimal routes need to be computed, a slightly more time-consuming version of the DP algorithm, which maintains state-to-state pointers, has to be used.

Now, suppose that the DP algorithm has been run for a given loading plan, and consider a new loading plan constructed by swapping the contents of any two positions of the original loading plan. It is clear that the costs of states in which the swapped clients were not yet served do not change. Also, note that the costs of reaching the final states from states in which both swapped clients were already served do not change either. The costs of reaching the final states can be computed for all states by executing the DP algorithm in an inverse fashion (from the final states to the initial states).

Therefore, once the DP algorithm has been run in a direct and inverse fashion for a given loading plan, it is possible to compute the cost of the optimal routes of loading plans in which two clients have been swapped by recomputing only the cost of states where one of the swapped clients was already served and the other was not served yet. Note that this strategy is general and can be applied to moves other than swap. In the instances considered in this paper (with 33 clients and three stacks), the DP algorithm has 4752 states. Only 8.59% of those states must be recomputed in the best case (the two clients being swapped are consecutively stored in the same stack). On the other hand, in the worst case (the two clients are stored in the first and last positions of different stacks), 86.83% of the states must be recomputed. The average percentage of states that have to be recomputed over the whole swap neighborhood is equal to 45.11%. Therefore, one can argue that this method would be able to reduce the computational time associated with the computation of the cost of the whole swap neighborhood by a little more than a half.
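The state count quoted above can be reproduced from the state definition: a state fixes how many clients have been served on each stack plus the index of the last modified stack, which must have served at least one client. A small sketch of this count (our own helper):

```python
def dp_state_count(s, c):
    """Number of DP states (x_1, ..., x_s, k): the counter of the last
    modified stack k ranges over 1..c, the other s - 1 counters over 0..c."""
    return s * c * (c + 1) ** (s - 1)
```

For the 33-client instances (s = 3, c = 11), this gives 3 · 11 · 12² = 4752 states, matching the figure in the text.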

On the other hand, as noted before, the computational time of the DP algorithm greatly depends on implementation issues. The need to maintain two state matrices for the current solution (with the cost of reaching each state from the initial and the final states), plus the need to update some of the states for the solution being evaluated, implies some degree of copying and conditional branching. As a consequence, in our implementation, the Δ-evaluation underperforms the DP algorithm executed from scratch when computing the cost of each candidate solution in the whole swap neighborhood. Also note that for more complex neighborhoods, with more clients being moved, the percentages given in the previous paragraph would increase, making the Δ-evaluation even less appealing.

Another way to deal with this high computational cost would be to obtain the costs of the routes associated with the neighbor loading plans heuristically. This idea was actually implemented in the context of this work by replacing the DP algorithm with a 2-exchange local search algorithm developed for the problem in Urrutia et al. (2012). In this approach, every time a neighbor solution is evaluated, the move is performed in both routes instead of being performed in the loading plan. Then, the local search is applied to both routes, and the cost of the locally optimal solution found is returned as the cost of the neighbor. The neighbor solution that is finally selected in each iteration has its optimal routes computed via DP. Note that, due to the restrictions of the problem and the fixed loading plan, the size of the 2-exchange neighborhood used in the local search is linear with respect to the number of clients, as opposed to the quadratic size of the same neighborhood in the context of the traveling salesman problem. In consequence, the local search is faster than the DP algorithm. However, although applying this idea decreased the cost evaluation time by as much as 50%, the decrease in quality of the selection of the next current solution within the search usually deteriorated the quality of the best solution found in a fixed period of time.

### 4. The core heuristic algorithm

Tabu search is a metaheuristic based on neighborhood search (Glover and Laguna, 1997). It starts from an initial feasible solution and iteratively moves to the best solution in its neighborhood that is not considered tabu. The tabu status is used to avoid cycling and guide the search through different parts of the solution space.

Our proposal is a multistart algorithm using tabu search followed by a single-step neighborhood search procedure to improve the initial solutions. Both searches use different neighborhoods and are combined to improve the solution as much as possible before a new initial solution is constructed. Algorithm 1 shows a pseudocode of the proposed heuristic. Both search procedures are described later in this section.

Algorithm 1: The core heuristic
Input: DTSPMS instance
Output: best solution found before the stopping criterion is met
1 while stopping criterion not met do
2       construct an initial solution sol
3       repeat
4             sol ← tabu search starting from sol
5             sol ← single-step swap-neighborhood improvement of sol
6       until both improving procedures fail improving

In the rest of this section, we first describe the randomized constructive heuristic used to obtain feasible initial solutions. Then, we introduce two move operators to be used on the loading plan. Next, the two improving procedures are described. In the next section, we describe a path-relinking approach to be used instead of the constructive heuristic for creating initial solutions once a set of elite solutions is established. Parameter settings are detailed in Section 'Computational results'.

#### 4.1. Initial solution

As noted in Petersen and Madsen (2009), a solution for a one-stack instance of the DTSPMS is extendible to a feasible solution for the same instance with multiple stacks. In such a single-stack instance, the delivery route is always equal to the reversed pickup route. In multistack instances, considering such routes, any stack assignment for the clients would yield a feasible solution with exactly the same cost. In consequence, one possible way to obtain reasonable feasible solutions for the DTSPMS is to consider the sum of the pickup and delivery matrices and apply any constructive heuristic for the traveling salesman problem, considering the depot as a client. Then, use the obtained solution as the pickup route and its reverse as the delivery route. Finally, assign the customers to the stacks sequentially, respecting their capacity.

In our approach, we used a randomized version of a simple TSP insertion heuristic. First, we form a partial cycle including only the two clients farthest apart. Then, while there are clients outside the route under construction, we randomly choose a client not in the route and insert it in the best possible position in the current route. Finally, we assign a stack to every client randomly, observing the stack capacities.
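A minimal sketch of such a randomized insertion heuristic (our own transcription; the tie-breaking rule and data layout are assumptions, and the depot is treated as an ordinary node of the tour as described above):

```python
import random

def initial_pickup_route(D, clients, depot):
    """Randomized cheapest-insertion sketch on the (summed) distance
    matrix D: start a cycle with the two nodes farthest apart, then
    insert the remaining nodes in random order at the cheapest position."""
    nodes = clients + [depot]
    a, b = max(((i, j) for i in nodes for j in nodes if i != j),
               key=lambda p: D[p[0]][p[1]])
    tour, rest = [a, b], [x for x in nodes if x not in (a, b)]
    random.shuffle(rest)
    for x in rest:
        # cheapest insertion: minimize the extra length over edge (p, p+1)
        p = min(range(len(tour)),
                key=lambda p: D[tour[p]][x] + D[x][tour[(p + 1) % len(tour)]]
                            - D[tour[p]][tour[(p + 1) % len(tour)]])
        tour.insert(p + 1, x)
    i = tour.index(depot)                  # rotate so the route starts at the depot
    return tour[i:] + tour[:i]
```

The returned order is used as the pickup route; its reverse serves as the delivery route, and stacks are then assigned at random.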

#### 4.2. Neighborhoods and local search

Two simple neighborhoods are proposed in this work to explore the loading plan solution space. In the swap neighborhood, given a current solution, the clients in two different positions of the loading plan are swapped to obtain a neighbor solution. Observe that we do not distinguish between interstack and intrastack swaps; all of them are part of the swap neighborhood. The size of this simple neighborhood is n(n − 1)/2 = O(n^2).

In the 3-exchange neighborhood, given a current solution, three clients in three different positions of the loading plan exchange their positions: either the first client moves to the position of the second one, the second one moves to the position of the third one, and the third one goes to the position of the first one; or the first client moves to the position of the third one, the second one moves to the position of the first one, and the third one goes to the position of the second one. Observe again that no distinction is made between interstack and intrastack moves. The size of the neighborhood is twice the number of triples of positions, that is, n(n − 1)(n − 2)/3 = O(n^3).
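The two cyclic rotations per triple of positions can be enumerated directly; this hypothetical helper (our own) reproduces the neighborhood size stated above:

```python
from itertools import combinations

def three_exchange_moves(n):
    """Enumerate the 3-exchange moves over n loading-plan positions:
    each unordered triple of positions yields its two cyclic rotations."""
    for a, b, c in combinations(range(n), 3):
        yield (a, b, c)   # client at a -> b, b -> c, c -> a
        yield (a, c, b)   # client at a -> c, c -> b, b -> a
```

For n positions this yields 2·C(n, 3) = n(n − 1)(n − 2)/3 moves, e.g. 20 moves for n = 5.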

In both neighborhoods, the move evaluation is done by computing the costs of the optimal pickup and delivery routes for the resulting loading plan, by calling the DP algorithm mentioned in Section 'Optimal routes from loading plan via DP' twice (once for the pickup route and once for the delivery route). Therefore, since the DP algorithm executes in O(s^2 c^s) time, the computational cost of evaluating the whole neighborhood is O(n^2 s^2 c^s) for the swap neighborhood and O(n^3 s^2 c^s) for the 3-exchange neighborhood. If s is considered fixed and equal to 3, then c = n/3 and the computational costs are O(n^5) and O(n^6), respectively.

#### 4.3. Estimating the cost of a move

As seen, computing the cost of all the solutions in the whole 3-exchange neighborhood can be extremely expensive in terms of computational cost. In order to tackle this, we propose a neighbor cost estimator function to assist the local search procedure in identifying promising neighbors.

An O(1) move estimator may be implemented by simply assuming that the loading and unloading sequences of the stacks are unchanged after the move is performed. That is, if the client in position p of stack t is placed in the kth position of the pickup route (resp. delivery route), then, after the move is performed, the client in position p of stack t is still placed in the kth position of the pickup route (resp. delivery route), even if the client in that position of the stack has changed.

Following this idea, the move is performed not only in the loading plan but also in both the pickup and delivery routes. The resulting routes are feasible but not necessarily optimal for the obtained loading plan. In fact, the new cost of the routes is an upper bound on the cost of the optimal routes that can be obtained via DP using the obtained loading plan. Observe that the estimator can be computed in O(1) per neighbor, since it only requires recomputing the client-to-client trips that are modified (up to six in the 3-exchange neighborhood) in both routes.
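For the simpler swap move, this position-preserving upper-bound estimator reduces to recomputing only the arcs incident to the two affected positions of one route. A sketch (our own, assuming the depot is stored at both ends of the route list):

```python
def swap_delta(route, D, i, j):
    """Upper-bound cost change of exchanging the clients at positions
    i < j of a route (depot at both ends), with every other visiting
    position kept fixed.  Only incident arcs change, so this is O(1)."""
    r = route
    assert 0 < i < j < len(r) - 1
    if j == i + 1:            # adjacent positions share one arc
        old = D[r[i-1]][r[i]] + D[r[i]][r[j]] + D[r[j]][r[j+1]]
        new = D[r[i-1]][r[j]] + D[r[j]][r[i]] + D[r[i]][r[j+1]]
    else:
        old = (D[r[i-1]][r[i]] + D[r[i]][r[i+1]]
               + D[r[j-1]][r[j]] + D[r[j]][r[j+1]])
        new = (D[r[i-1]][r[j]] + D[r[j]][r[i+1]]
               + D[r[j-1]][r[i]] + D[r[i]][r[j+1]])
    return new - old
```

The full estimate for a move is the sum of the deltas over the pickup and delivery routes; a 3-exchange move is handled analogously with up to six modified arcs per route.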

In order to improve the quality of this estimator, we propose to evaluate better feasible route positions for the clients involved in the move. Let x be one client involved in the 3-exchange move, and let x⁻ and x⁺ be the clients that are stored consecutively before and after x in its own stack. Define x⁻ (resp. x⁺) to be the depot if x is the first (resp. last) client in its stack. The fact that x⁻, x, and x⁺ are in consecutive positions in their stack does not imply that they are in consecutive positions in either of the routes, because clients stored in other stacks can be visited in between. Clearly, shifting the position of client x in a route to a new position that is still in between x⁻ and x⁺ does not generate any infeasibility.

As an example, consider client 8 in Fig. 1. There are three alternative positions for that client in the pickup route that do not generate any infeasibility. The positions are those currently used by other clients that are in between the positions of clients 3 and 5, predecessor and successor of client 8 in the second stack.

In our improved move cost estimator, after performing the move in the routes, we evaluate the cost of moving every client involved in the move to every alternative feasible position in both routes and return the best cost found. This cost is still an upper bound on the cost of the optimal routes for the modified loading plan, and it is usually much better than the first one proposed.

#### 4.4. Tabu search

Tabu search heuristics are iterative procedures that, starting from an initial current solution, find in each iteration the best nontabu solution in a given neighborhood. The best solution found before the stopping criterion is met is returned. The tabu criterion, which determines which neighbor solutions are considered tabu, is defined by the heuristic developer and is meant to avoid cycling and to guide the heuristic to different areas of the solution space.

In this work, we design a tabu search procedure using the 3-exchange neighborhood. Since evaluating each solution of the 3-exchange neighborhood via DP is extremely expensive in terms of computational cost (O(n^6) for the three-stack instances of this work), we use the move estimator described in Section 'Estimating the cost of a move' to evaluate only the most promising part of the neighborhood. Each iteration of our tabu search is divided into three steps.

###### 4.4.1. Estimation step

In this step, the cost of each move in the neighborhood is estimated using the move estimator described in Section 'Estimating the cost of a move'. Due to the size of the neighborhood (O(n^3)), even estimating the cost of all moves in it is an expensive procedure. So, not every move cost is estimated in each iteration of the tabu search. In the first iteration, the cost of each move is estimated. In the following iterations, only the neighbors with the best estimated costs from the previous iteration are re-estimated. A complete re-estimation iteration is triggered after a certain number of iterations. After the neighbor solutions are estimated, the estimation step concludes by sorting the moves by their estimated cost in nondecreasing order.

###### 4.4.2. Evaluation step

This step receives from the previous step a list of tabu and nontabu moves sorted by estimated cost. Considering the moves in that order makes it possible to evaluate just the neighbors with the best estimated costs, some of which are presumably good. In this step, up to a given number of the nontabu neighbors with the best cost estimations are evaluated via DP, and up to a given number of tabu neighbors are also evaluated. If a nontabu neighbor is evaluated as being better than the current solution, or a tabu neighbor is evaluated as being better than the best solution known so far (aspiration criterion), the evaluation step is halted and that neighbor is selected as the next current solution. Otherwise, after evaluating the allowed number of nontabu neighbors, the evaluation step terminates by selecting the best nontabu evaluated neighbor as the next current solution. If the allowed number of tabu neighbors is reached before that of nontabu ones, the remaining tabu neighbors are skipped while scanning the list of neighbors.

###### 4.4.3. Move step

In this step, the move selected in the previous step is performed and the tabu status is updated. In our heuristic, the tabu criterion is based on the position of a client in the loading plan: any solution in which a client involved in the move is back in the position of the loading plan it just left is considered tabu for a given number of iterations, plus a small number of additional iterations chosen at random. Observe that, in this way, a client can be moved in consecutive iterations as long as it does not return to a position of the loading plan where it has recently been, unless the aspiration criterion applies, that is, the solution found is better than the best known so far.

The tabu search stops after a number of iterations without improving the best-known solution. This parameter is set to an initial value the first time the tabu search is called and is then increased by one after each restart. In this way, each restart is expected to be more time consuming than the previous ones but, on the other hand, is expected to search the solution space more intensively and have a greater chance of finding better solutions. Algorithm 2 depicts the tabu search procedure.

Algorithm 2: The tabu search procedure
input : sol: an initial solution; limit: maximum number of iterations without improving the best-known solution
output: best solution found
1 best ← sol
2 while the number of iterations without improvement is below limit do
3       if complete re-estimation iteration then
4             estimate the cost of all moves
5       else
6             re-estimate the cost of the moves with the best estimated cost
7       sort all moves by estimated cost and consider them in that order
8       while less than the allowed number of nontabu moves have been evaluated and no solution better than best was found do
9             if the move is not tabu or less than the allowed number of tabu moves have been evaluated then
10                   evaluate the move by calling dynamic programming
11       if a solution better than best was found then
12             perform the best of the evaluated moves in sol
13       else
14             perform the best of the nontabu evaluated moves in sol
15       update tabu status

The tabu search final solution is not necessarily locally optimal for the swap neighborhood. Therefore, after the tabu search heuristic stops, an improving solution is sought for in the swap neighborhood of the solution returned by the tabu search heuristic. If an improving solution is found, this solution is used as a new initial solution for a new execution of the tabu search.

### 5. New starting solution from path relinking

Path relinking (Resende et al., 2010) is a technique that explores the solution space between an initial and a guiding solution. By using a given neighborhood, path relinking creates a path of solutions connecting the initial and the guiding solutions. Starting from the initial solution, path relinking evaluates all moves in the neighborhood that decrease the distance from the current solution to the guiding solution. The best of those moves is selected, until the guiding solution is reached. If both the initial and the guiding solutions are of good quality, it is expected that the solutions in the path, which share properties of both, will also be of good quality.

Path relinking is sometimes used as a postoptimization procedure, see for example Ribeiro and Resende (2012). In our approach, after a set of elite solutions is constructed, path relinking is used for the construction of initial solutions in replacement of the randomized greedy heuristic introduced in Section 'Initial solution'.

#### 5.1. Elite pool management

A pool of high-quality and diverse solutions is maintained during the execution of the heuristic. The Hamming distance between two solutions is the number of positions of their loading plans in which the stored clients differ. A parameter defines the size of the elite pool. After each of the first iterations of our multistart heuristic, until the pool is full, the solution obtained is stored in the elite pool. A solution obtained in a subsequent iteration replaces the worst-cost solution of the elite pool if its Hamming distance to every other solution in the pool is at least a given threshold. The solution can also enter the elite pool by being better than a solution in the pool whose distance to it is smaller than that threshold; in this case, the newly found solution replaces the solution that is similar to it.
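A possible transcription of this pool-management rule is sketched below. The names `dmin` (distance threshold) and `size` (pool capacity) stand for the parameters left unnamed in the text, and the quality-before-diversity tie-breaking is an assumption:

```python
def hamming(lp1, lp2):
    """Number of loading-plan positions holding different clients."""
    return sum(a != b for r1, r2 in zip(lp1, lp2) for a, b in zip(r1, r2))

def try_insert(pool, sol, cost, dmin, size):
    """Elite-pool update sketch.  `pool` is a list of (cost, loading_plan)
    pairs, mutated in place; returns True if `sol` entered the pool."""
    if len(pool) < size:                      # first iterations: always store
        pool.append((cost, sol))
        return True
    close = [k for k, (_, s) in enumerate(pool) if hamming(sol, s) < dmin]
    if not close:                             # diverse enough: replace worst
        worst = max(range(len(pool)), key=lambda k: pool[k][0])
        if cost < pool[worst][0]:
            pool[worst] = (cost, sol)
            return True
        return False
    for k in close:                           # replace a similar, worse solution
        if cost < pool[k][0]:
            pool[k] = (cost, sol)
            return True
    return False
```

Loading plans are compared position by position, so two plans containing the same clients in permuted positions are considered distant.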

#### 5.2. Integration of the path relinking with the core heuristic

After a given number of restarts, each subsequent restart of the heuristic is considered either a constructive restart, in which the initial solution is constructed with the randomized constructive heuristic as in the core heuristic, or a path-relinking restart, in which the initial solution is constructed with path relinking. Initially, one of every few restarts is a path-relinking restart; this frequency parameter is decreased by 1 whenever, after a given number of consecutive constructive restarts, the obtained solution does not qualify to enter the elite pool.

In path-relinking restarts, two elite solutions are chosen at random. One is considered the initial solution and the other the guiding solution. Then, a number of steps is chosen at random from a range bounded by the Hamming distance between the two solutions.

Finally, the chosen number of path-relinking steps is executed, starting from the initial solution toward the guiding solution, using the swap neighborhood. In each step, every swap move that decreases the number of differences between the current and the guiding solution is evaluated via DP, and the best among them is selected. The solution obtained after these steps is then used as the initial solution for the current iteration of the core heuristic.
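
The truncated walk of a path-relinking restart can be sketched as below. The helper names (`relink_steps`, `pr_restart`) are ours, the DP evaluation is abstracted into a cost callback, and the step count is drawn uniformly at random from 1 up to the Hamming distance of the two elite solutions.

```cpp
#include <cstddef>
#include <functional>
#include <limits>
#include <random>
#include <utility>
#include <vector>

using Solution = std::vector<int>;

int hamming(const Solution& a, const Solution& b) {
    int d = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i] != b[i]) ++d;
    return d;
}

// Perform at most k swap steps from `s` toward `guide`, each time applying
// the cheapest (per `cost`) swap that strictly reduces the Hamming distance.
Solution relink_steps(Solution s, const Solution& guide, int k,
                      const std::function<double(const Solution&)>& cost) {
    for (int step = 0; step < k && hamming(s, guide) > 0; ++step) {
        const int d0 = hamming(s, guide);
        double bestC = std::numeric_limits<double>::infinity();
        std::size_t bi = 0, bj = 0;
        bool found = false;
        for (std::size_t i = 0; i < s.size(); ++i)
            for (std::size_t j = i + 1; j < s.size(); ++j) {
                std::swap(s[i], s[j]);      // trial move
                if (hamming(s, guide) < d0) {
                    const double c = cost(s);
                    if (c < bestC) { bestC = c; bi = i; bj = j; found = true; }
                }
                std::swap(s[i], s[j]);      // undo
            }
        if (!found) break;
        std::swap(s[bi], s[bj]);            // commit best move
    }
    return s;
}

// A path-relinking restart: draw k uniformly in [1, d(start, guide)] and
// return the solution after k steps as the new initial solution.
Solution pr_restart(const Solution& start, const Solution& guide,
                    const std::function<double(const Solution&)>& cost,
                    std::mt19937& rng) {
    const int d = hamming(start, guide);
    if (d == 0) return start;
    std::uniform_int_distribution<int> pick(1, d);
    return relink_steps(start, guide, pick(rng), cost);
}
```
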

### 6. Computational results

The proposed heuristic was coded in C++ and executed on an Intel Core i5 at 2.3 GHz with 4 GB of RAM. The experiments were conducted on the 20 instances with 33 clients proposed in Petersen and Madsen (2009). Parameter values were tuned experimentally.

Table 1 shows the computational results. The first two columns show the instance name and the cost of its best-known solution. The next column considers the core heuristic introduced in this paper without the path-relinking approach: it shows the cost of the best solution found in 10 executions of 3 minutes of computation, expressed as a ratio to the best-known cost for that instance. The last column shows the same information for the core heuristic with the path-relinking approach.

Table 1. Computational results

| Instance | Best known | TS min | TS-PR min |
| --- | --- | --- | --- |
| R00 | 1063 | 1.002 | 1.000 |
| R01 | 1032 | 1.000 | 1.000 |
| R02 | 1065 | 1.000 | 1.000 |
| R03 | 1100 | 1.001 | 1.000 |
| R04 | 1052 | 1.005 | 1.000 |
| R05 | 1008 | 1.000 | 1.000 |
| R06 | 1110 | 1.000 | 1.000 |
| R07 | 1105 | 1.000 | 1.000 |
| R08 | 1109 | 1.000 | 1.000 |
| R09 | 1091 | 1.000 | 1.000 |
| R10 | 1016 | 1.000 | 1.000 |
| R11 | 1001 | 1.000 | 1.000 |
| R12 | 1109 | 1.002 | 1.000 |
| R13 | 1084 | 1.000 | 1.000 |
| R14 | 1034 | 1.000 | 1.000 |
| R15 | 1142 | 1.000 | 1.000 |
| R16 | 1093 | 1.000 | 1.000 |
| R17 | 1073 | 1.007 | 1.004 |
| R18 | 1118 | 1.018 | 1.000 |
| R19 | 1089 | 1.004 | 1.000 |

Considering the heuristic without the path-relinking approach, a solution as good as the best-known solution was found for 13 of the 20 instances when executing the heuristic 10 times per instance. The largest gap between the best solution found and the best-known solution was 1.8%, for instance R18. The average cost of the solutions found by the core heuristic without path relinking, in one execution of at most 3 minutes, was 0.8% above the best-known cost. With the path-relinking approach the results are considerably better: a solution as good as the best-known solution was found for 19 of the 20 instances, the largest gap was 0.4%, for instance R17, and the average cost in one execution of at most 3 minutes was 0.17% above the best-known cost.

Although the obtained results are not as good as those of the current state-of-the-art heuristic (Felipe et al., 2009a, 2009b), they are competitive, and they compare favorably with other heuristic approaches (see, e.g., Petersen and Madsen, 2009; Urrutia et al., 2012).

With the current parameter setting, the proposed heuristic spends the great majority of its computation time in the tabu search procedure. The path relinking and the one-step swap neighborhood search account for less than 10% of the time. Inside the tabu search procedure, most of the time is spent computing the cost of optimal routes for neighbor loading plans via DP.

### 7. Conclusions and future work

This paper introduced a novel local search approach to the DTSPMS. Instead of working directly on the routes, the local search works on the loading plan and a DP algorithm is used to construct optimal solutions for each loading plan.

The main drawback of the proposed approach is the high computational cost associated with the evaluation of each move via DP. To overcome this problem, a move cost estimator was introduced. The estimator, combined with a partial neighbor evaluation policy, made the approach feasible and competitive with the best heuristics in the literature. Although our novel approach does not outperform the best route-search-based heuristic algorithms, it obtains, in a short period of time, good-quality solutions for instances for which, currently, no exact algorithm is able to obtain proven optimal solutions and no heuristic can consistently obtain the best-known solution cost over the same time period.

Being a restart algorithm, the core heuristic developed in this work suffers from the absence of a long-term memory policy. With this in mind, a path-relinking approach was proposed to enable the heuristic to use previously obtained good-quality solutions in the construction of new initial solutions. Computational results showed that this approach enhanced the performance of the heuristic.

Despite the competitive computational results obtained for medium-size instances with 33 clients, the heuristic, in its current form, is not suitable for much larger instances. This fact is due to the high computational cost of the DP algorithm used to evaluate move costs during local search.

Future work should aim at overcoming this last issue. As noted in Section 'Working with the DP algorithm in a local search algorithm', there are several ways to reduce the computational cost of evaluating neighbor solutions via DP. Some of them were studied in this work, and one of them, the partial evaluation of the neighborhood using a cost estimator, is a fundamental part of the proposed heuristic. The possibility of fathoming dynamic programming states that cannot influence the optimal cost of final states, as proposed in Morin and Marsten (1976), is a strategy that should be investigated in the context of the DTSPMS. The Δ-evaluation strategy mentioned in the same section can also be further investigated and incorporated into the algorithm. In addition, a memory approach could be used to keep track of loading plans already solved: before calling the DP algorithm, the heuristic could check whether the same or an equivalent loading plan had already been solved. Finally, parallel processing may also reduce the computational time of the proposed heuristic.

### Acknowledgments

This research was partially funded by the Research Council of Norway as a part of the DOMinant II project. The author Sebastián Urrutia was partially supported by CNPq grant 303442/2010-7. The authors wish to thank the anonymous referees for their helpful and constructive comments that greatly contributed to improve the quality of the paper. The first and second authors thank Arne Løkketangen for his scientific contribution, but also for his kindness, happiness, wisdom, and friendship. Rest in peace.

### References

• 2013. A branch-and-cut algorithm for the double traveling salesman problem with multiple stacks. INFORMS Journal on Computing 25, 1, 41–55.
• 2013. A branch-and-bound algorithm for the double TSP with two stacks. Networks 61, 1, 58–75.
• 2012. Efficient algorithms for the double traveling salesman problem with multiple stacks. Computers & OR 39, 1044–1053.
• 2010. A math-heuristic for the multi-level capacitated lot sizing problem with carryover. Lecture Notes in Computer Science 6025, 462–471.
• 2010. A math-heuristic algorithm for the DNA sequencing problem. Lecture Notes in Computer Science 6073, 25–36.
• 2012. A math-heuristic Dantzig-Wolfe algorithm for the capacitated lot sizing problem. Lecture Notes in Computer Science 7219, 31–41.
• 2012. Large neighborhood search for the pickup and delivery traveling salesman problem with multiple stacks. Networks 60, 19–30.
• 2009a. The double traveling salesman problem with multiple stacks: a variable neighborhood search approach. Computers & OR 36, 2983–2993.
• 2009b. New neighborhood structures for the double traveling salesman problem with multiple stacks. TOP 17, 190–213.
• 2011. Using intermediate infeasible solutions to approach vehicle routing problems with precedence and loading constraints. European Journal of Operational Research 211, 1, 66–75.
• 1997. Tabu Search. Kluwer Academic Publishers, Boston, MA.
• 2004. Stochastic Local Search: Foundations and Applications. Morgan Kaufmann, San Francisco, CA.
• 2011. Improved exact method for the double TSP with multiple stacks. Networks 58, 290–300.
• 2010. An exact method for the double TSP with multiple stacks. International Transactions in Operational Research 17, 637–652.
• (eds), 2009. Matheuristics: Hybridizing Metaheuristics and Mathematical Programming. Springer Verlag, Berlin, Heidelberg.
• 1976. Branch-and-bound strategies for dynamic programming. Operations Research 24, 611–627.
• 2010. Exact solutions to the double travelling salesman problem with multiple stacks. Networks 56, 229–243.
• 2009. The double travelling salesman problem with multiple stacks—formulation and heuristic solution approaches. European Journal of Operational Research 198, 1, 139–147.
• 2010. Scatter search and path-relinking: fundamentals, advances, and applications. In Handbook of Metaheuristics (2nd edn). Springer, New York, pp. 87–107.
• 2012. Path-relinking intensification methods for stochastic local search algorithms. Journal of Heuristics 18, 193–214.
• 2006. The corridor method: a dynamic programming inspired metaheuristic. Control and Cybernetics 35, 551–578.
• 2012. A strategic oscillation heuristic for the double traveling salesman problem with multiple stacks. Proceedings of the 5th International Workshop on Freight Transportation and Logistics, Mykonos, Greece.