The phenomenon of decision oscillation: A new consequence of pathology in game trees

Random minimaxing studies the consequences of using a random number for scoring the leaf nodes of a full-width game tree and then computing the best move using the standard minimax procedure. Experiments in Chess showed that the strength of play increases as the depth of the lookahead is increased. Previous research by the authors provided a partial explanation of why random minimaxing can strengthen play by showing that, when one move dominates another move, then the dominating move is more likely to be chosen by minimax. This paper examines a special case of determining the move probability when domination does not occur. Specifically, we show that, under a uniform branching game tree model, whether the probability that one move is chosen is greater than that of another depends not only on the branching factors of the moves involved, but also on whether the number of ply searched is odd or even. This is a new type of game tree pathology, where the minimax procedure will change its mind as to which move is best, independently of the true value of the game, and oscillate between moves as the depth of lookahead alternates between odd and even.


INTRODUCTION
The minimax procedure is the fundamental search algorithm for deciding the next move to play in two-player perfect information games between white and black; [1][2][3] Chess, Checkers, Othello and Go are examples of such games. The minimax procedure is utilised by constructing a full-width +1-ply game tree (with ≥ 0) from the current position (with white, maximizing, to move), where nodes represent game positions and arcs represent legal moves from one position to another. The value of the root is determined by computing a score for each leaf value using a static evaluation function and backing up these scores using the minimax rule. In practice, many refinements have been developed in order to improve the performance of the minimax search algorithm. 3,4 Here, we concentrate on the classical minimax search algorithm, noting that the results of the paper would apply to any variant of the algorithm that guarantees the same outcome as classical minimax, such as alpha-beta pruning. 4 Many of the variants of the minimax search algorithm 4 would not necessarily return the same minimax value as classical minimax and therefore would demand separate study.
In order to measure the utility of the minimax procedure, we follow Reference 5 in using a random static evaluation function that returns a natural number uniformly distributed between 1 and , with > 1; this variation of the minimax procedure is called random minimaxing. In this way, we can decouple the effectiveness of the minimax procedure from the accuracy of the static evaluation function, allowing us to study the minimax procedure in its raw form. The experiments carried out by Beal and Smith, 5 using random minimaxing in Chess, produced the interesting result that the strength of play increases as the depth of the lookahead is increased. In an attempt to understand this result, we showed in Reference 6 that, when one move dominates another move, then the dominating move is more probable, that is, more likely to be chosen by the minimax procedure. A move by white dominates another when choosing the former move over the latter would give less choice for black, at all levels, when it is black's turn to move, and subsequently more choice for white when it is white's turn to move. Under random minimaxing, domination can be viewed as a form of mobility, which lends support to the hypothesis that the minimax procedure is successful in games (such as Chess and Othello) where having greater mobility is considered to be advantageous.
A problem that was left open in Reference 6 was to determine the probability of one move being chosen rather than another when domination does not occur. In this paper we tackle this problem for the case where the subgame trees rooted at the nodes representing the moves have uniform branching factors. The branching factor of such a subgame tree T i can be viewed as the average number of choices a player would have in subsequent moves, given that the move that leads to the root of T i was actually chosen. We call this model of a game tree the uniform branching model.
Our main result is that modelling a game tree in this way leads to decision oscillation, which is a form of pathological behaviour in the following sense. We show that, under reasonable assumptions, whether the probability of one move being chosen is greater than that of another move depends not only on the branching factors of the moves involved but also on whether the number of ply searched is odd or even. Decision oscillation implies that the minimax procedure may change its mind as to which move is best, independently of the true value of the game, resulting in the move choice alternating between two moves as the search deepens. A similar phenomenon, called the odd/even effect (in which the backed-up value oscillates each time the depth of search increases by one), 7 has been observed in practical game-playing programs. However, for computer chess programs, it has been suggested that this phenomenon is related to quiescence, 1 which is not related to decision oscillation as described here.
Game tree pathology has been studied for over three decades. 8 Under the uniform game tree model, when the scores of the leaf nodes are independent identically distributed random variables restricted to the values 1 (win) or 0 (loss), the term pathological means that, as the depth of lookahead increases, the probability that a move chosen by the minimax procedure is correct tends to the probability that a randomly chosen move is correct. 9 A related form of minimax pathology, called the "last player theorem", was exhibited by Nau, 10 who showed that, under the uniform game tree model with win/loss values at the leaf nodes, the probability that the last player appears to have a winning strategy tends to one as the depth increases. (Nau's game tree model is slightly different from ours in that he assumes that the last player is maximizing, while we assume that the first player is maximizing.) One explanation for the apparent absence of pathology in real games could be that the scores of leaf positions are seldom independent. Beal 11 showed that, when sibling node dependence is taken into account, the error probability decreases with the depth of lookahead, and Pearl 12 showed that if the density of "traps" exceeds a certain threshold then pathology can be avoided. This explanation for the lack of pathology, when real games contain numerous "traps" that lead to early termination of the game, is based on the fact that for such terminal nodes the score assigned to the position by the evaluation function reflects the true value of the game. A further refinement was suggested by Scheucher and Kaindl, 13 who showed that the use of multivalued evaluation functions with a non-uniform error distribution allows for improved evaluation quality, so that evaluation errors decrease when the depth of search is increased. Luštrek et al. 14 extended the model further to include real values for both heuristic and true values, showing that pathology can be eliminated in this model due to reduced variance of the values returned by the minimax procedure.
Despite these results, it was shown by Sadikov et al. 15 that in a king and rook versus king chess endgame, with errors introduced into the evaluation function, pathology still exists in the sense that deeper search does not reduce the evaluation error. They conjecture that the cause is the bias introduced by the minimax procedure, where the bias is defined as the difference between the true and minimax values of a position, averaged over all the positions examined during a minimax search. Interestingly, despite the fact that evaluation accuracy may decrease with deeper searches, it was shown that bias affected all evaluations more or less equally, and therefore it did not affect the relative ordering of moves with respect to their quality. Hence, the decision accuracy of minimax was not impaired in this context and was more accurate with deeper searches.
Schrüfer 16 uses a model similar to that of Nau, 9 that is, game trees have a uniform branching factor and a finite depth, and only two heuristic values, win and loss, are considered. However, the error attached to the heuristic value returned by the evaluation function is modelled using two different sets of independent identically distributed random variables. This is to distinguish the probability of assigning a win value to a leaf when it is a loss and the probability of assigning a loss value to a leaf when it is a win (in Nau 9 the probability of these two types of error are equal). Schrüfer 16 showed that, for the errors to decrease with increasing depth of search, the probability of having a single "good" candidate move should be small.
Notwithstanding this long history of studying minimax pathology, there is continuing interest in the problem. Lorenz and Monien 17 investigated arbitrary (not necessarily uniform) win/loss game trees, using the same probability model as in Reference 9. They show that, in order for minimax to be non-pathological in this more general model, there must be at least two leaf-disjoint strategies that prove the true value at the root of the game tree. A more recent paper by Nau et al. 18 used simulations and experimental tests to show that pathology is more likely to occur in practice when the heuristic evaluation function has a small number of possible values, the branching factor of the game trees is high, and the local similarity values of nearby nodes in game trees is low. Another recent paper by Zuckerman et al. 19 suggests that local pathologies in the game tree occur due to the error of the static evaluation function used when applying the minimax procedure. The authors propose to minimise the error by tracking these local pathologies in tandem with computing the minimax value. Also, even more recently, a paper by Liu et al. 20 suggests a variant to the minimax algorithm that overcomes pathology by optimising the minimax backup rule.
It is also worth mentioning that the newer Monte Carlo tree search (MCTS) algorithm 21,22 has proved to be competitive with the minimax algorithm when combined with reinforcement learning methods. 23 However, we note that in Reference 24 it was shown that the Stockfish Chess engine, 25 which deploys minimax rather than MCTS, outperforms a state-of-the-art MCTS-based Chess engine on solving a well-known Chess endgame puzzle. In addition, Reference 26 demonstrated that a best-first variant of minimax is competitive with state-of-the-art reinforcement learning methods that use MCTS. To combine the strengths of both MCTS and minimax, hybrid search strategies have been introduced. 27
The layout of the rest of the paper is as follows. In Section 2, we provide notational background for the minimax procedure and then, in Section 3, we introduce the definitions and assumptions used in the paper. In Section 4, we adapt the results from Reference 10 to random minimaxing of game trees. In Section 5, we show that the uniform branching model leads to decision oscillation for random minimaxing. Finally, in Section 6, we give our concluding remarks.

THE MINIMAX PROCEDURE
We assume that the reader is familiar with the basic minimax procedure. 2 However, we briefly recall some of the definitions from Reference 6 relevant to this paper. A game tree T is a special kind of tree, whose nodes represent game positions and arcs represent legal moves from one position to another; the root node represents the current position. In general, we will not distinguish between the nodes and the positions they represent, nor between the arcs and the moves they represent. Furthermore, when no confusion arises, we will refer to the position arrived at as a result of making a move as the move itself. We are assuming a two-player zero-sum perfect information game between the first player, called white, and the second player, called black, where the game has three possible outcomes: win for white (i.e., loss for black), loss for white (i.e., win for black), or draw (see Reference 28 for a precise definition of a game).
The level of a node n in T is defined recursively as follows: if n is the root of T then its level is zero, otherwise the level of n is one greater than the level of its parent node. Nodes of T at even levels are called max-nodes and those at odd levels are called min-nodes. At a max-node it is white's turn to move and at a min-node it is black's turn to move. We assume that T is a +1-ply game tree, with ≥ 0, where the number of ply is one less than the number of levels of T. (Thus the root of T is at level zero and the leaves are at level +1.) Non-leaf nodes of a game tree are called internal nodes. A full-width game tree satisfies the condition that there is an arc for each legal move from every position represented by an internal node. We will assume that all game trees are full-width.
We define the following two operators on T: 1. root moves(T) is the set of children of the root of T, that is, the set of nodes representing the possible positions arrived at after white makes a single move; 2. T[n] is the subgame tree of T rooted at a node n; if n is the root of T then T[n] = T.
We let minimax(T, +1, score, ) denote a procedure that returns the leaf node of the principal variation chosen by minimaxing, 2 where T is the +1-ply game tree whose root represents the current position, score is a static evaluation function, and is a natural number representing the maximum possible score.
We assume that the scoring of leaf nodes is computed by the function score, which returns a natural number between 1 and inclusive, with > 1. We denote the set {1, 2, … , } by [ ]. For the purpose of scoring, we assume that all leaf nodes are distinct although, in practice, two distinct leaf nodes may represent the same position (for example, through a transposition of moves 3 ).
In general, it is possible that there is more than one principal variation, in which case we assume that the minimax procedure returns the set of leaf nodes of all the principal variations. For our purposes, it is sufficient to know whether or not a particular leaf node could be returned by the minimax procedure. We note that normally, in practical implementations, the leaf node of only one principal variation is returned.
The score assigned to an internal node n of T during the evaluation of minimax(T, +1, score, ) is called the backed-up score of n and is denoted by sc(n); when n is a leaf node sc(n) = score(n). The backed-up score of a subgame tree T[n] is sc(n), the score of its root n; so the backed-up score of T is the score of the leaf node of any principal variation.
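The definitions of backed-up score and principal variation can be illustrated with a minimal Python sketch. It assumes a hypothetical tree encoding that is not part of the paper: an internal node is a list of children and a leaf is a `(name, score)` pair, so that leaves stay distinct even when their scores coincide. The function returns the backed-up score together with the leaves of all principal variations, as in the set-valued convention described above.

```python
def minimax(node, maximizing=True):
    """Return (backed-up score, leaves of all principal variations).

    Hypothetical encoding: a leaf is a (name, score) tuple,
    an internal node is a list of child nodes.
    """
    if isinstance(node, tuple):
        return node[1], [node[0]]
    results = [minimax(child, not maximizing) for child in node]
    scores = [score for score, _ in results]
    # Max-nodes (white to move) take the maximum, min-nodes the minimum.
    best = max(scores) if maximizing else min(scores)
    # Keep the leaves below every child that attains the backed-up score.
    pv_leaves = [leaf for score, leaves in results if score == best
                 for leaf in leaves]
    return best, pv_leaves


# A 2-ply tree: the root is a max-node, its three children are min-nodes.
tree = [
    [("a", 3), ("b", 5)],
    [("c", 2), ("d", 9)],
    [("e", 3), ("f", 3)],
]
root_score, pv = minimax(tree)
print(root_score, pv)
```

In this example two root moves tie for the best backed-up score, so more than one leaf is returned; a practical implementation would normally keep only one of them.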

RANDOM MINIMAXING
For given , score and , the procedure minimax(T, +1, score, ) defines a strategy for playing a particular combinatorial game. We assume, from now on, that successive calls of score return a sequence of independent random integers uniformly distributed between 1 and . In this case, we will refer to the induced minimax strategy as random minimaxing. 5,6,29 We are interested in the probability that any given node lies on a principal variation. For a node n ∈ root moves(T), we define prob(n, i) to be the probability that n is on a principal variation of T and sc(n) = i. This may be calculated as follows: count the number of assignments of scores to the leaf nodes of T such that n is on a principal variation of T and sc(n) = i, and then divide this count by N , the total number of assignments of scores to the leaf nodes of T, where N is the number of leaf nodes of T. We now define prob(n), the probability that a node n ∈ root moves(T) is on a principal variation of T, by From now on, we assume that n ∈ root moves(T) and that T[n] has a uniform branching factor b ≥ 1, where b may depend on n. Suppose first that m is a max-node in T[n] and that m ′ is a child of m. We note that the distribution of sc(m ′ ) is the same for all the children of m. We define where 1 ≤ i ≤ . Then, since m is a max-node, Now suppose conversely that m is a min-node in T[n]. In this case we define Then, since m is a min-node, We therefore define and consider the recurrence relation for t ≥ 0 (see [Reference 6, Equation (9)] for a justification of this formula), where Starting with x 0 , we can compute the corresponding sequence x t , for 1 ≤ t ≤ , by successive applications of F b as in (6). We denote the final value x of the above sequence by For any given value of i, we may substitute x t for x(i) in equations (2) to (5), for appropriate values of t. 
It is then easy to verify that, when -t is even, x t is the probability that the score of a min-node of T[n] at level + 1-t does not exceed i, and that, when -t is odd, x t is the probability that the score of a max-node of T[n] at level + 1-t exceeds i.
For t = , we have We observe that F b (0) = 1 and F b (1) = 0, otherwise 0 < F b (x) < 1. Thus, when x is 0 or 1, if is even then F b (x) = x, and if is odd then F b (x) = 1 − x.
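The displayed formulas for the propagation function are not legible in this copy, so the following Python sketch rests on an assumption: that the propagation function has the form F_b(x) = 1 - x^b, which agrees with the stated properties F_b(0) = 1 and F_b(1) = 0 and with the max-node and min-node argument above. The names `d` (the depth parameter, so that leaves are at level d + 1) and `alpha` (the maximum score) are hypothetical. The sketch compares the recurrence with a direct count over all assignments of scores to the leaves of a small subgame tree.

```python
from fractions import Fraction
from itertools import product


def propagate(x0, b, steps):
    """Apply the assumed propagation function F_b(x) = 1 - x**b repeatedly."""
    x = x0
    for _ in range(steps):
        x = 1 - x ** b
    return x


def recurrence_cdf(b, d, alpha, i):
    """P(sc(n) <= i) for a root move n, computed from the recurrence."""
    # For even d the leaves sit at a min-node level, so x0 = P(score <= i);
    # for odd d they sit at a max-node level, so x0 = P(score > i).
    x0 = Fraction(i, alpha) if d % 2 == 0 else Fraction(alpha - i, alpha)
    return propagate(x0, b, d)


def backed_up(leaves, b, is_min):
    """Backed-up score of a uniform tree given its leaf scores in order."""
    if len(leaves) == 1:
        return leaves[0]
    size = len(leaves) // b
    children = [backed_up(leaves[k * size:(k + 1) * size], b, not is_min)
                for k in range(b)]
    return min(children) if is_min else max(children)


def brute_force_cdf(b, d, alpha, i):
    """P(sc(n) <= i) by counting assignments of scores to the b**d leaves."""
    total = hits = 0
    for leaves in product(range(1, alpha + 1), repeat=b ** d):
        total += 1
        hits += backed_up(leaves, b, True) <= i  # root moves are min-nodes
    return Fraction(hits, total)


print(recurrence_cdf(2, 2, 3, 1), brute_force_cdf(2, 2, 3, 1))
```

Exact rational arithmetic is used so that the two computations can be compared for equality rather than up to rounding; the enumeration is only feasible for very small trees.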
From now on, unless stated otherwise, we will assume that there are exactly two nodes, n 1 and n 2 in root moves (T), and that the subgame trees T [n 1 ] and T [n 2 ] have uniform branching factors b 1 and b 2 , respectively.
The following lemma (cf. Lemma 5.1 in Reference 6) expresses the probability of a move in terms of propagation functions.
Proof. Since the root of T is a max-node, and prob(n 1 , i) and prob(n 2 , i) are independent, we have The result then follows from (7). ▪
The following corollary generalises Lemma 1 to the case when the root of T has more than two children.

Corollary 1. Suppose that n ∈ root moves(T) has branching factor b, and that all other nodes in root moves(T) also have uniform branching factors. Then, for each i ∈ [ ],
where sib(n) denotes the set of sibling nodes of n, b ′ is the branching factor of n ′ , and Ω(T, i) is independent of n.
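Since the displayed expressions are not legible here, the following is one elementary way to write the quantity in question. It is a sketch derived only from the definitions above and is not necessarily the form used in the original equations. The symbol \alpha for the maximum score is an assumed name. Because the root is a max-node and the subgame trees below distinct root moves receive independent leaf scores, a move n lies on a principal variation with score i exactly when sc(n) = i and no sibling scores more than i:

```latex
\mathrm{prob}(n, i)
  = \Pr[\,sc(n) = i\,] \prod_{n' \in sib(n)} \Pr[\,sc(n') \le i\,],
\qquad
\mathrm{prob}(n) = \sum_{i=1}^{\alpha} \mathrm{prob}(n, i).
```

With exactly two root moves the product reduces to the single factor for the other move.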

THE LAST PLAYER THEOREM FOR RANDOM MINIMAXING
In this section we restate the relevant results from Reference 10 in the context of our formulation of random minimaxing, which differs slightly from that in Reference 10: it is assumed that the last player is maximizing, while we make the assumption that the first player (root node) is maximizing.
The following theorem and corollary follow directly from the results in the Appendix and Table 1 in Reference 10.
; (e) $\lim_{b \to \infty} v_b = 1$ and the convergence is strictly monotonic increasing. (f) $\frac{1}{2} \le v_b < 1$ and the inequality is strict except when $b = 1$.
(We note that $v_b = 1 - w_b$, where $w_b$ is the threshold value in Reference 10.)
Corollary 2. Let $x_0 \in [0, 1]$ and let $b > 1$. We define the sequence $x_t$ as in (6).
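The critical value can be explored numerically with a short Python sketch. It assumes that the critical value v_b for branching factor b is the unique root in [0, 1] of v^b + v = 1, that is, the fixed point of x -> 1 - x^b. This characterisation is an inference from the stated limits and bounds and from the proof of Lemma 2, not a formula legible in this copy. The function name `critical_value` and the tolerance are hypothetical.

```python
def critical_value(b, tol=1e-12):
    """Approximate the root in [0, 1] of v**b + v = 1 by bisection.

    Assumption: this root is the critical value for branching factor b.
    g(v) = v**b + v - 1 is strictly increasing with g(0) < 0 < g(1).
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid ** b + mid - 1 < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for b in (1, 2, 3, 10, 100):
    print(b, round(critical_value(b), 6))
```

Under this assumption the computed values start at 1/2 for b = 1, increase with b, and approach 1, in line with parts (e) and (f) of Theorem 1.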
In the next section we will also need the following lemma.

Lemma 2. For all $b > 1$, $v_b$ is irrational.
Proof. Suppose to the contrary that, for some $b > 1$, $v_b = y/z$ where y and z are co-prime integers. Then $y > 1$, since, by Theorem 1 (f), $v_b > \frac{1}{2}$. Now, if u is a prime factor of y, it is also a prime factor of $z^b$ since, by Theorem 1 (a), The result now follows, since u is then also a prime factor of z, which contradicts the fact that y and z are co-prime. ▪
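The step in the proof that relies on Theorem 1 (a) can be sketched as follows, under the assumption that part (a) states v_b^b + v_b = 1 (the fixed-point equation, which is not legible in this copy) and that b is an integer. Substituting v_b = y/z and clearing denominators gives:

```latex
\left(\frac{y}{z}\right)^{b} + \frac{y}{z} = 1
\;\Longrightarrow\;
y^{b} + y\,z^{b-1} = z^{b}
\;\Longrightarrow\;
y\left(y^{b-1} + z^{b-1}\right) = z^{b}.
```

Any prime factor u of y therefore divides z^b, and hence divides z, which is the contradiction used in the proof.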

A DECISION OSCILLATION THEOREM FOR RANDOM MINIMAXING
We now present our main results, showing that the choice of move under random minimaxing is pathological in the sense that the choice depends not only on the branching factor of the possible moves, but also on whether the depth of search is even or odd. The result depends on modelling game trees using the uniform branching model, and is in contrast to the situation with non-uniform game trees where domination may be present. 6 The following definition imposes some reasonable constraints on the maximum score , and the critical values, v b 1 and v b 2 , for branching factors b 1 and b 2 . The intuition behind the constraints is that the maximum score should be large enough.
Definition 1 (Admissible maximum scores). We assume that $b_1 > b_2$ and that $v_{b_1}$ and $v_{b_2}$ are defined as in Theorem 1. (Note that $v_{b_1} > v_{b_2}$ by Theorem 1 (e).) We say that the natural number , > 2, is admissible if, for some i ∈ [ ], $v_{b_2} < x_0^i < v_{b_1}$. The following lemma, which is an immediate consequence of Definition 1 and Theorem 1 (f), implies that, in practice, admissibility of is not too restrictive provided is large enough.
The following theorem establishes our main result, that random minimaxing leads to decision oscillation under the uniform branching model. Theorem 2. Suppose that b 1 > b 2 > 1 and that is admissible. Then, for large enough , 1. prob(n 1 ) > prob(n 2 ) if and only if is even, 2. prob(n 1 ) < prob(n 2 ) if and only if is odd.
Proof. Clearly, since is either even or odd, we only need to prove the "if" parts of (a) and (b).
(a) Assume that is even and let i ∈ [ ]. Then x i 0 is strictly increasing with i. We first compute the value of prob(n 1 , i) for each i ∈ [ ]. It follows from Lemma 2 that, So there are three cases to consider:

by parts (a) and (c) of Corollary 2, lim
Clearly there exists a unique i ∈ [ ] satisfying case (C), which implies that prob(n 1 ) → 1 (see also [Reference 30, Theorem 1]). We now similarly compute the value of prob(n 2 , i) for i ∈ [ ]. There are again three cases to consider: This implies that x i 0 < b 1 , by the definition of admissibility. By Corollary 2 (a) lim →∞ F b 1 (x i 0 ) = 0, so prob(n 2 , i) → 0, by Lemma 1. It follows that prob(n 2 ) → 0 and thus, for large enough , prob(n 1 ) > prob(n 2 ) as required.
(b) This follows using a similar argument to that used in (a), utilising parts (b) and (d) of Corollary 2. ▪ We extend the notion of admissibility to game trees with more than two moves by letting b 1 and b 2 in Lemma 3 be the branching factors of any two moves in root moves(T) satisfying b 1 > b 2 . Thus, for to be admissible, the conditions of Definition 1 must hold for the branching factors b 1 and b 2 of any two moves in root moves(T) with b 1 > b 2 .
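The oscillation described in Theorem 2 can be observed numerically. The following Python sketch assumes hypothetical names `d` (depth parameter, with leaves at level d + 1) and `alpha` (maximum score). It works directly with cumulative distributions of backed-up scores rather than with the paper's propagation function: at a max-node the distribution function of the children is raised to the power b, and at a min-node the same is done to its complement. A move counts as chosen when its score is at least that of the other move, so ties count for both.

```python
def score_cdf(b, d, alpha):
    """cdf[i] = P(sc(n) <= i), i = 0..alpha, for a root move n whose
    subgame tree has uniform branching factor b and leaves at level d + 1."""
    cdf = [i / alpha for i in range(alpha + 1)]   # uniform leaf scores
    for level in range(d, 0, -1):                 # parents of the current nodes
        if level % 2 == 1:                        # odd level: min-node
            cdf = [1 - (1 - p) ** b for p in cdf]
        else:                                     # even level: max-node
            cdf = [p ** b for p in cdf]
    return cdf


def move_probabilities(b1, b2, d, alpha):
    """Return (prob(n1), prob(n2)) for two root moves."""
    c1, c2 = score_cdf(b1, d, alpha), score_cdf(b2, d, alpha)
    p1 = sum((c1[i] - c1[i - 1]) * c2[i] for i in range(1, alpha + 1))
    p2 = sum((c2[i] - c2[i - 1]) * c1[i] for i in range(1, alpha + 1))
    return p1, p2


for d in (60, 61):
    p1, p2 = move_probabilities(3, 2, d, 100)
    print(d, "n1" if p1 > p2 else "n2")
```

With branching factors 3 and 2 and a maximum score of 100, the move with the larger branching factor is favoured at the even depth and the other move at the odd depth, which is the behaviour the theorem describes for sufficiently deep searches. The chosen parameters are illustrative only.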
As a corollary, we show that the main result can be extended to game trees with more than two moves in root moves(T). For node n j ∈ root moves(T), from (9) we have prob(n j , i) = where sib(n j ) denotes the set of sibling nodes of n j . Proof. We note from (10) that prob(n 1 , i) and prob(n 2 , i) are given by the expressions in (8) multiplied in each case by the same factor, namely the product in (10)  We now extend Theorem 2 to the case when b 2 = 1. Proof. We first assume that is even, and show that prob(n 1 ) > prob(n 2 ) in this case.
Using an identical argument to that made in the proof of Theorem 2, it follows that for cases (A) and (B), prob(n 1 , i) → 0 as → ∞. However, for case (C), we have F b 2 (x i 0 ) = x i 0 for all even by Theorem 1 (c). So, by Lemma 1 and (1), as → ∞, so, by Lemma 1, as → ∞, prob(n 2 , i) → 0 or 1 , respectively. Therefore, by (1) by Theorem 1 (e). Straightforward calculation shows that ⌈ b 1 ⌉ + ⌊ b 1 ⌋ > , for ≥ 4. So, for large enough , prob(n 1 ) > prob(n 2 ) as required. As in Theorem 2, the proof when is odd follows by a similar argument. ▪ We next extend the result of Theorem 3 to game trees with more than two moves. The proof of the following corollary is exactly analogous to that of Corollary 3. A subgame tree is level-regular if all nodes at the same level have the same branching factor, that is, the same number of children.
The next extension of Corollary 3 is an interesting and stronger form of decision oscillation pathology. Intuitively, it states that this result still holds when we attach, between each move n j ∈ root moves(T) and the subgame tree T[n j ] below it, an arbitrary level-regular -ply subgame tree T j , where is an even integer.
For a given move n ∈ root moves(T), let T ′ [n] be the ( + )-ply subgame tree obtained by substituting the subgame tree T[n] for each leaf node of T . We construct T ′ [n] in this way for each node n ∈ root moves(T), where the branching factors r 1 , r 2 , … , r for different nodes n may be different. Let T ′ be the game tree where root moves(T ′ ) consists of the root nodes of T ′ [n] for n ∈ root moves(T). We call T ′ a level-regular extension of T.

Corollary 5. The result of Corollary 3 holds for any level-regular extension of T when
Proof. Let n ′ ∈ root moves(T ′ ) with propagation function F r (F b (⋅)). By the continuity of this function and Corollary 2, for any since F r (0) = 0 and F r (1) = 1 for even . Thus, when is large enough, the attached subgame tree T serves to propagate, essentially unchanged, the values passed to it by the subgame trees T[n] substituted for its leaves. It follows that prob(n) and prob(n ′ ) tend to the same value as tends to infinity, which yields the result. ▪ Corollary 5 is interesting because it indicates that, provided the branching factor is uniform at the lower levels of each subgame tree T[n], the branching factors at the top levels have little effect on the probability of a move. We conjecture that Corollary 5 still holds for an arbitrary extension T of T, even when T is not level-regular, provided the probability distribution of the leaves is suitably constrained.

CONCLUDING REMARKS
We have shown that, under the uniform game tree model, random minimaxing exhibits decision oscillation. That is, for moves whose subgame trees have different branching factors the minimax procedure will most probably change its selection of best move depending on whether the depth of lookahead is odd or even, provided the depth is large enough. The results in this paper suggest that, in order to understand which of two moves is more likely to be chosen, we need to take into account the structure of the subgame trees rooted at these moves. Corollary 5 indicates that the lower levels of the tree have more influence on the minimax value. However, it may be the case that using game specific information, such as sibling dependence, decision oscillation will be eliminated.
A non-uniform random static evaluation function is obviously a more realistic model for a real game than a uniform random function. In a typical game-specific application, the evaluation tries to produce the same move ordering as would a function that estimated the probability of winning. The evaluation function is often a linear combination of terms, to which a squashing function (such as a sigmoid function; see Chap. 4 of Reference 31) could be applied to estimate the probability of winning. In this typical arrangement, the leaf probability values in the range [0, 1] are not uniformly distributed.
The results we have presented generalise to evaluations with and without a squashing function; the only requirement on the squashing function is that it be monotonic increasing. This is because, when a monotonic increasing function is applied to the leaf evaluations, the ordering of the scores is preserved, so the minimax procedure will still choose the same move at every node in the tree and hence select the same leaf. The typical squashing functions used in game-specific applications, for example, the logistic function, are monotonic. Thus the result applies to any model in which a uniform random distribution is used to model the linear evaluation, with a squashing function applied to map the linear evaluation to a probability of winning.
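The invariance under a monotonic squashing function can be checked on a small example. The following Python sketch uses a hypothetical nested-list tree with integer leaf evaluations and the logistic function as the squashing function. It computes the root move selected by minimax with and without squashing.

```python
import math


def backed_up(node, maximizing, transform):
    """Backed-up value of a nested-list tree after transforming leaf scores."""
    if not isinstance(node, list):
        return transform(node)
    values = [backed_up(child, not maximizing, transform) for child in node]
    return max(values) if maximizing else min(values)


def chosen_move(tree, transform=lambda s: s):
    """Index of the root move with the largest backed-up value."""
    values = [backed_up(child, False, transform) for child in tree]
    return values.index(max(values))


def logistic(s):
    """Strictly increasing map from a linear evaluation to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-s))


tree = [[3, -1], [0, 2], [1, 4]]   # root is a max-node, children are min-nodes
print(chosen_move(tree), chosen_move(tree, logistic))
```

Both calls select the same root move, because the logistic function preserves the ordering of the leaf evaluations and minimax depends only on that ordering.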
Finally, we note that it follows from Reference 32 that Theorem 1 and Corollary 2 can be generalised to random evaluation functions that are not necessarily either uniform or identically distributed, as long as they are independent. Moreover, Nau also showed that the condition of independence can be somewhat relaxed. It follows that decision oscillation will also hold in this more general setting provided the probability distribution of the leaves is suitably constrained.
We have shown that decision oscillation is present in a situation where domination does not occur. It is an open problem to investigate, in detail, the relationship between these two phenomena. In this context it would also be interesting to find weaker conditions than domination 6 that eliminate decision oscillation. Finally, it remains to be seen how these results relate to practical game playing systems, and whether there is any connection between decision oscillation and the odd/even effect. 7 It would also be interesting to investigate game pathology in the context of MCTS, as it was shown in Reference 33 that pathology may also arise when MCTS is deployed.

DATA AVAILABILITY STATEMENT
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.