A New Pathology in the Simulation of Chaotic Dynamical Systems on Digital Computers

Abstract Systematic distortions are uncovered in the statistical properties of chaotic dynamical systems when represented and simulated on digital computers using standard IEEE floating‐point numbers. This is done by studying a model chaotic dynamical system with a single free parameter β, known as the generalized Bernoulli map, many of whose exact properties are known. Much of the structure of the dynamical system is lost in the floating‐point representation. For even integer values of the parameter, the long time behaviour is completely wrong, subsuming the known anomalous behaviour for β = 2. For non‐integer β, relative errors in observables can reach 14%. For odd integer values of β, floating‐point results are more accurate, but still produce relative errors two orders of magnitude larger than those attributable to roundoff. The analysis indicates that the pathology described, which cannot be mitigated by increasing the precision of the floating point numbers, is a representative example of a deeper problem in the computation of expectation values for chaotic systems. The findings sound a warning about the uncritical application of numerical methods in studies of the statistical properties of chaotic dynamical systems, such as are routinely performed throughout computational science, including turbulence and molecular dynamics.


DOI: 10.1002/adts.201900125
real numbers can dramatically alter the dynamics of chaotic systems after a short amount of simulation time. This was observed for the Fermi-Pasta-Ulam-Tsingou problem in the 1950s, [1] for the Lorenz system and the Hénon-Heiles problem in the 1960s, [2,3] for the Chirikov-Taylor map in the 1970s, [4] and for too many systems to enumerate thereafter. It has long been recognized that extreme sensitivity to initial conditions precludes accuracy of orbits after too long a time, both in the computational and experimental domains. It is well known in turbulence [5] and in molecular dynamics. [6] To overcome this problem and restore the predictive power of the scientific method to such systems, dynamicists retreated to the position that, while accuracy for individual orbits may not be possible, accuracy in an averaged sense may still be possible, for some variables of interest in some systems of interest. For example, if the dependent variables of the Lorenz system are denoted by (x, y, z), and if the initial conditions are uniformly distributed in a sphere of unit radius centered at the origin, it may be true that one hundred different computer programs will yield one hundred different answers for (x, y, z) at time 100, but there is hope that the average value of x 2 is more robust, and that it may be calculated and compared with empirical results. The averaging here may be a time average, an ensemble average, or both.
The entire statistical theory of turbulence is built upon the above supposition. For driven, incompressible Navier-Stokes flow in the turbulent regime, for example, it may be impossible to know the hydrodynamic velocity and the pressure at a particular point in space at a late time, but fluid dynamicists are convinced that it ought to be possible to know, say, the average of the fourth power of the x component of velocity divided by its variance squared-a dimensionless number-to very high precision. Lacking any way to compute this number analytically, they routinely resort to digital computer simulation for this purpose. Because the quantity in question is an average over a long time or a large ensemble, they are less concerned about the detrimental effects of floating-point truncation on individual orbits. There is a vaguely articulated but nonetheless widespread hope that any such errors will not lead to systematic deviations in the statistical quantities of interest. The methods of Direct Numerical Simulation (DNS) and Large-Eddy Simulation (LES) for the Navier-Stokes equations are predicated on this hope, and are routinely www.advancedsciencenews.com www.advtheorysimul.com used to predict everything from the weather, to the drag past airplanes and automobiles, to the flow of air through ducts and to the flow of blood through our arteries.
Here, we demonstrate that, for at least one simple-butprototypical driven, dissipative dynamical system, namely the generalized Bernoulli Map, the abovementioned hopes are dashed. While it has long been known that individual orbits of this map are chaotic in nature, and that even statistical averages are problematic for the particular case of the system parameter = 2, [7][8][9] our present work demonstrates a more serious problem. Even assuming generic values of , and even assuming idealized statistical averages over infinite evolution time and infinite ensemble size, the results of such averages will be inaccurate by factors of order unity. Unlike other sources of error associated with floating-point numbers, such as loss-of-significance errors, noise in function evaluation, and underflow and overflow, [10] the problem we describe in this work is due to the discrete and finite nature of the floating-point numbers and the extremely delicate structure of the attracting set of chaotic dynamical systems. Though the root of this problem resides in the use of finite-precision floating-point arithmetic, it cannot be mitigated by increasing the precision of the floating-point representation. Our analysis strongly suggests that the pathology we describe will exhibit for mantissa and exponent fields of any finite length whatsoever, and for floating-point numbers encoded in any radix whatsoever. Indeed, there is every reason to anticipate that this anomalous behaviour is generic in dissipative chaotic systems of the kind encountered in turbulence and molecular dynamics, and that it is entirely possible that many published results of numerical simulation are substantially inaccurate for this reason.
Single-precision IEEE floating-point numbers consist of 32 bits, of the form where the subscript 2 indicates that the enclosed bits are to be interpreted as base-2 numbers. For = 0, as the integer (e 1 , … , e 8 ) 2 ranges from 1 to 254, [11] the format is capable of representing numbers in [1,2) times exponentials ranging between 2 −126 and 2 +127 . In each interval [2 j , 2 j+1 ), there are 2 23 equally spaced singleprecision floating point numbers, distinguished by their mantissa bits. This means that there are as many floating-point numbers in [1,2) as there are in [ 1 2 , 1), as there are in [ 1 4 , 1 2 ), as there are in [ 1 8 , 1 4 ), etc. It is this discrete and uneven distribution of floating-point numbers, superposed upon the delicate distribution of chaotic attracting sets, that causes the pathology studied in this work. Double-precision IEEE floating-point numbers are constructed in similar fashion, but using 52 mantissa bits and 11 exponent bits.
A typical problem in the numerical simulation of chaotic dynamical systems is the estimation of expectation values of observables that depend on the state of the system in a long-time average, an ensemble average, or both. The effects of the pathology we describe fall into one of three categories, depending on a model parameter: i) observable expectation values that are nearly correct, ii) observable expectation values that are obviously wrong, or, iii) last and most insidiously, observable expectation values that are wrong, but not obviously so. In the last situation, we will demonstrate relative errors of order unity, even though the results might superficially seem accurate.
The model dynamical system that we examine to reveal this new pathology is the generalized Bernoulli map, sometimes called the beta shift. [12] Mathematically, this is a dynamical system whose state space is in correspondence with real numbers in the interval [0,1). The initial condition is denoted by x 0 . The state of the system at time j + 1, denoted by x j+1 , is given by For the original Bernoulli map, was taken to be two, but in this study we examine many different values of > 1. We chose this system because it has a dense and complex attracting set, it is simple enough to examine analytically, an exact expression is known for its invariant measure, and it (or straightforward variants or generalizations of it) is topologically conjugate to many dynamical systems of interest to engineers, biologists, chemists, physicists, and mathematicians. Owing to these properties, we are able to calculate exact expectation values of observables on this set using term-by-term integration over the known invariant measure. We refer to the exact value of an observable O(x) calculated in this way as O ex .
We compare O ex with the result that would be obtained by an ideal floating-point simulation, in which the initial conditions comprise an infinite ensemble randomly sampled from the interval [0,1), each of which is allowed to run for an infinite length of time. To obtain such an idealized result in a finite amount of time using single-precision floating-point numbers, we must 1. enumerate all of the limit cycles of the dynamics, 2. identify the basins of attraction of each of those limit cycles in the set of all floating-point numbers in [0,1), 3. compute the probability that the random number generator will select an initial condition in the basin of attraction of each limit cycle, and 4. average the observable over each of the limit cycles, weighted by their respective probabilities.
In order to do (2) correctly, one has to pay careful attention to the non-uniform distribution of the floating-point numbers. The 2 23 floating point numbers in [2 −j−1 , 2 −j ) must each be assigned a weight 2 −j−24 so that the total probability of selecting an initial condition in this interval is 2 23 the length of the interval. This may involve combining probabilities of very different magnitudes, and so it is computed by first sorting the list of contributing probabilities and then adding them from smallest to largest, using double-precision arithmetic, to avoid loss of significance.
We can work this out using single-precision floating-point numbers by enumerating all of the asymptotic limit cycles of the dynamics, as well as the fraction of initial states that lie in the basins of attraction of those limit cycles. By computing averages over the limit cycles, and then weighting those averages by the  (1)). They are not as smooth as these graphs make them appear.
fractional sizes of the corresponding basins of attraction in [0,1), we obtain the result that one would obtain if one could perform an ideal floating-point simulation of the system, for an infinite period of time and using an infinite number of ensemble elements. We are able to compute this result, which is the best that one can hope for from any floating-point calculation, only because there are just about a billion [13] single-precision floatingpoint numbers in [0,1). The structure of the limit cycles thus found permits us to infer how the result would behave for floating-point numbers of arbitrarily high accuracy, and we call this result O fp . The relative error between O ex and O fp is attributable to the newfound pathology.
If is a positive integer greater than or equal to two, it is easy to understand the effect of the generalized Bernoulli map on the base-representation of the state. The action of the map simply eliminates the first digit of x j , and shifts the remaining digits one place to the left to obtain x j+1 . This makes clear that all rational numbers lie on periodic or eventually periodic orbits, since their base-digit representations will either repeat or eventually repeat. It likewise makes clear that all irrational numbers lie on chaotic orbits. The state space therefore consists of a dense set of unstable periodic orbits. This set of orbits, and similar sets in more complicated dynamical systems, have been termed "the skeleton of chaos" [14] because a knowledge of these orbits, including observable averages over them, is sufficient to calculate exact observable averages over the system's invariant measure using the dynamical zeta function formalism. [15] Unfortunately, as we shall demonstrate, the exquisite complexity of these dynamics is badly damaged by casting it into floating-point arithmetic.
In spite of the complexity of the generalized Bernoulli map, much is known about it. For any integer value of ≥ 2, the Perron-Frobenius equation (see, e.g., ref. [16]) can be used to demonstrate that the invariant measure of the dynamics is uniform on [0,1). For non-integer , the invariant measure is much more complicated, but an exact expression for it is given by the following series due to Hofbauer [17] h ( where x j := f j (x) (so that, in particular, 1 j denotes f j (1)), is the Heaviside function, [18] and C is a normalization constant. Assuming that the orbit {1 j } ∞ j=0 is ergodic, the above series makes manifest that the invariant measure has discontinuities at a dense set of points in [0,1). Examples of this invariant measure are shown in Figure 1 for three non-integer values of , though the reader should be cautioned that these graphs are not as smooth as they appear.
In this study, we use observables of the form O(x) = x q for q = 1, … , 100. From Equation (1), it is evident that Because 1 j ∈ [0, 1), the error incurred by truncating the above sums can be bounded by a geometric series, allowing us to compute the result to desired accuracy.

www.advancedsciencenews.com www.advtheorysimul.com
The damage that floating-point arithmetic does to these dynamics is most easily appreciated for the case = 2. Because computer arithmetic is done in base two, and because the binary digits shift one place to the left with each iteration, one bit of precision is lost with each application of the map. Since there are 23 bits of mantissa for single-precision numbers, the result will be zero after 23 iterations. The use of double precision with 52 bits of mantissa just delays the final result. Either way, the invariant measure will be a Kronecker delta at x = 0. If it were possible to let the number of bits of mantissa approach infinity, that Kronecker delta would effectively approach a delta distribution at x = 0. The exact time-asymptotic result for the floating-point dynamics would never be a uniform measure, the correct answer for the real-valued dynamics. It is straightforward to demonstrate that exactly the same thing happens for any even integer value of . Lest the above-described pathological behavior be dismissed as a consequence of the fact that is a multiple of two and computer arithmetic is carried out in base two, it may be noted that a computer based on ternary arithmetic would have the same problem if were any multiple of three, and so on.
It is worthwhile to pause and ask why the spectrum of periodic orbits is damaged to the extent described above. The real Bernoulli map with = 2, for example, has a periodic orbit of period two, wherein 1 3 maps to 2 3 , which maps back to 1 3 . Unfortunately, the binary expansion of 1 3 is 0.010101 …, and that of 2 3 is 0.101010 …, neither of which terminate. Therefore, no matter how large the number of bits of mantissa, these quantities are not exactly representable as floating-point numbers, and neither is the periodic orbit that they comprise. Beginning close to a point on this periodic orbit is insufficient because all these orbits are demonstrably unstable. Any roundoff error in the initial condition will inevitably grow. For The next line of inquiry that suggests itself is the examination of odd integer values of . By doing a thorough examination of the periodic orbit spectrum of these dynamical systems for singleprecision arithmetic, we have managed to classify all the periodic orbits of such systems, for any odd integer . The collection of , partition the set of odd integers greater than or equal to three, and thereby define an equivalence relation ∼ on the odd numbers greater than or equal to three. It is possible to show that equivalent odd values of have the same periodic orbit spectrum. For odd values of from 3 to 17, Table 1 provides these details. It is seen, for example, that 3 ∼ 11 (both in equivalence class S − 2 ), and therefore beta shifts with = 3 and = 11 have the same periodic orbit spectrum. Likewise 5 ∼ 13 (both in equivalence class S + 2 ), and therefore beta shifts with = 5 and = 13 have the same periodic orbit spectrum. Table 1 makes clear that the periodic orbit spectrum for singleprecision floating-point numbers obtained for odd integer is very different from that of the real continuum dynamical system. Only orbits consisting of dyadic fractions [19] can be represented precisely, and these have periods that are restricted to powers of two. The density of orbits as a function of period thus decays ex- ponentially, indicating a negative topological entropy, but the actual topological entropy is easily shown to be ln > 0. Because knowledge of periodic orbits and their properties is sufficient to work out time-asymptotic expectation values of observables, one might be concerned that this badly damaged orbit spectrum would lead to correspondingly badly damaged observables. The exact invariant measure for odd integer should also be uniform, so that for the observable O(x) = x q the exact expectation value is O ex = 1 q+1 . Figure 2 plots the relative error, (O ex − O fp )∕O ex versus q, where 1 ≤ q ≤ 100, for three different odd integer values of . While the magnitude of this relative error may be regarded as small, it is nonetheless about two orders of magnitude larger than machine precision, and clearly trends upward with q.
In the more general case of fractional , the invariant measure has the interesting structure shown in Figure 1. Floating-point roundoff properties figure necessarily into the calculation of the orbits, and floating-point orbit periods are no longer restricted to powers of two. The floating-point orbits themselves are fewer and further between, and orbit periods tend to be much smaller than they are for (odd) integer . In the case = 3 2 , for example, while there are exact prime periodic orbits for all periods ≥ 3, there are a total of only ten floating-point periodic orbits. Beyond the trivial period-one orbit {0}, the next one has period 186, followed by periods 243, 270, 404, 540, 960, 1800, 3479, and 11050, the last of these being the longest orbit present. It is in such cases that the most serious problems are encountered -serious in that the answers obtained are not obviously wrong, but wrong nonetheless. For observable x q , Figure 3 plots the relative error, (O ex − O fp )∕O ex versus q, where 1 ≤ q ≤ 100, for three different fractional values of . It is seen that the error incurred can be substantial indeed; for 1 ≤ q ≤ 100, it can reach approximately 2.5% for = 5 4 , 14% for = 4 3 , and 7.5% for = 3 2 . These errors are far greater than those encountered for odd integer .
Thus far, we have restricted attention to observables of the form x q . To see that similar problems will be encountered for more general functions of x, we can examine the invariant measure itself. The theoretical invariant measure was presented in Figure 1, but we can also compute a "numerical invariant www.advancedsciencenews.com www.advtheorysimul.com  measure." This is done by taking the collection of all points on periodic orbits, weighting each by the fractional size of the basin of attraction of its orbit, and constructing a weighted histogram accordingly. For even integer , this would result in a delta distribution at x = 0. The histograms for = 3, 5, 7, 9, and for = 3 2 , 4 3 , 5 4 , 6 5 are presented in Figure 4, along with the invariant measures of the corresponding continuum systems, calculated using Equation (1). It is seen that while the invariant measures coincide, at least on the scale illustrated in the figure, for odd integer , they differ by order unity for non-integer . This egregious discrepancy in the invariant measure is the origin of the order unity differences observed between the theoretical and numerical expectation values of x q . The above argument makes manifest that similar differences would be observed for the expectation value of almost any function of x.
The suggestion that the error is due to short maximum orbit periods is confirmed by Figure 5, which plots the maximum fractional error observed for 1 ≤ q ≤ 100 versus the period of the longest orbit present in the floating-point dynamics. For the values of considered, the smallest errors observed were those for = 6 5 for which the longest orbit period is 36,897. The largest were those for = 4 3 for which the longest orbit period is only 3,567. The downward trend is clear from the figure. It should be noted that all the maximum periods observed for fractional are several orders of magnitude smaller than those for (odd) integer , as recorded in Table 1. This is consistent with the observed discrepancies for fractional being much higher than those for (odd) integer . For that matter, it is also consistent with the observed discrepancies for even integer being highest of all, since those cases have only one orbit of length one, namely {0}.
The grossly truncated nature of the periodic orbit spectrum and the shortness of the orbits for fractional beta strongly suggest that these problems will not be mitigated by increasing the mantissa length. No matter how long the mantissa, floating-point numbers will always be dyadic rational numbers. The periodic orbit spectrum will always be limited to orbits all of whose states are dyadic rationals. The topological entropy will still be negative, rather than positive. The period of the kth floating-point periodic orbit, ordered by period, is likely to be far smaller than the period of that of the continuum system. In short, simply throwing more bits of precision at this problem is unlikely to make it go away.
Efforts to justify the validity of using floating-point arithmetic for chaotic dynamical systems often appeal to the Shadowing Lemma. [20] For a discrete hyperbolic map x n+1 = f (x n ), this states that for any > 0, however small, there exists an > 0 such that a sequence of computer-generated points {y n } ∞ n=0 for which y n+1 is within an epsilon ball of f (y n ) will itself lie within a neighbourhood of some true orbit. If is taken to be machine precision, this suggests that a computer-generated orbit will always be close to an actual orbit. Lest one derive undue comfort from this observation, however, it is important to note that the Shadowing Lemma provides no guarantee that the statistics of the subset of continuum orbits that are shadowed by computer-generated orbits will be at all similar to those of the entire collection of true orbits of the system.
As an illustration, consider the example of = 2, where the most egregious errors in floating point arithmetic arise. For = 2, the only roundoff error that takes place is when we represent the initial conditions. After that, there is no roundoff error at all. Suppose that we could somehow sample the initial condition x 0 from the true invariant measure on [0,1). The computer will round x 0 to a dyadic fraction y 0 , and dyadic fractions all eventually reach the orbit {0} under the action of the map. The sequence reported by the computer, {y n }, is an actual orbit; hence it is shadowed by itself. It neither shadows nor is shadowed by the orbit {x n }. The sequence {y n } will not remain in any reasonable neighborhood of the sequence {x n } and vice versa. [21] In the end, the sequence {x n } will sample the invariant measure precisely for almost all initial conditions x 0 , but the computable sequence {y n } will do nothing of the sort. There is nothing here that is in contradiction with the Shadowing Lemma; the lemma just has nothing to say about the origins of the floating point pathologies we have described.
In conclusion, we have demonstrated a serious systematic error in the numerical calculation of the statistical properties of a very representative chaotic nonlinear dynamical system. This error is distinct from previously studied numerical errors related to rounding and loss of precision, in that it would persist for any finite-precision mantissa, however large. It arises from the discreteness of floating-point numbers, their non-uniform distribution along the real axis, and their inability to represent points on periodic orbits of the dynamics in a precise way, giving rise to a dramatically truncated periodic orbit spectrum. It cannot be mitigated by the use of fixed-point arithmetic or other recently proposed adjustments to the floating-point system of representation [22,23] owing to the discrete nature of any finite-state digital computer.
It is true that many chaotic dynamical systems of interest in the natural sciences are far more complex than the generalized Bernoulli map presented here. The chaos of turbulent fluid flow, for example, is subtly correlated in ways that have no analogue in the model considered in this paper. We do not believe, however, that practitioners should draw any comfort from the fact that their models are more complex than this one. Rather, we would suggest that if so simple a system exhibits such egregious pathologies, a more complex system will probably exhibit even more devilish ones. Hence we see no reason to doubt that substantial errors of this sort will be present in numerical simulations of chaotic dynamical systems of widespread interest in science and engineering, including computer simulations of thermostatted molecular dynamics, [6] turbulent fluid dynamics, and reaction-diffusion dynamics, about which until now computational scientists have been completely unaware.