
# Statistical Diversions

Article first published online: 16 JAN 2014

DOI: 10.1111/test.12034

© 2013 The Authors. Teaching Statistics © 2013 Teaching Statistics Trust


#### How to Cite

Petocz, P. and Sowey, E. (2014), Statistical Diversions. Teaching Statistics, 36: 27–31. doi: 10.1111/test.12034

#### Publication History

- Issue published online: 16 JAN 2014

In our column in *Teaching Statistics* vol. 35 no. 2 last year, we looked at some puzzles and paradoxes of probability arising from the subtle concepts of randomness and statistical independence. We now want to uncover some paradoxes of randomness.

We saw previously that statisticians are concerned with two aspects of the concept of randomness: defining the characteristics of a random sequence of numbers (as a theoretical underpinning of the concept of a ‘random variable’) and then defining operational methods for generating sequences of numbers that have these characteristics (as the basis for selecting a random sample and for many other practical applications). The first of these aspects we can summarize by saying that a random sequence is, over a ‘very long’ run of numbers, notionally patternless and each number is unpredictable from knowledge of those that came before. We say ‘notionally patternless’ because there is no formal definition of a ‘very long’ run and because there is no limit to the kinds of patterns we might want *not* to have in a random sequence. In practice, we have to limit ourselves to some small set of patterns to be excluded – specified so as to be appropriate to the practical context for which we need the random numbers. Then we look for ways of generating sequences of numbers in which such patterns have only a low chance of turning up.

When it comes to unpredictability, we are similarly up against a practical constraint. In theory, unpredictability requires that a specific kind of pattern be absent: there must be no exact relation connecting the *n*th random number to the (*n* − 1)th, no exact relation connecting the *n*th random number to any combination of the (*n* − 1)th and the (*n* − 2)th, and so on. In practice, we usually limit ourselves to guarding against only the first of these. The discussion so far presumes that our context is one where we choose to, or are obligated to, generate (what we shall call) ‘truly random’ numbers. As we shall see, there are statistical contexts where truly random numbers are not necessarily the statistician's first choice.

Well, then, how are truly random numbers generated and for what purposes? An everyday instance is tossing a coin to resolve an ‘either-or’ choice of action. Intuitively, we accept that the many small motions of the tossing hand and chance variations in the forces applied to the coin will ensure the unpredictability of the outcome at each toss and an absence of systematic patterns in any long sequence of tosses. (It's assumed that there is no trickery, for example, by catching the coin and glimpsing its face before it is slapped against the wrist.) Surprisingly, however, our intuition may be wrong: coin tosses are not perfectly random. Such, at least, is the finding of the careful (and mathematically advanced) study of the physics of coin tossing in P. Diaconis, S. Holmes and R. Montgomery (2007), Dynamical bias in the coin toss, *SIAM Review*, **49**, 211–235, which concludes that a vigorously tossed coin tends to land with the same face up as it started, with a probability of about 0.51.

A quite different context is the public drawing of lottery prizes where the first three prizes (say) are all substantial sums of money. Clearly, if these three prizes were won by three tickets purchased successively from the same ticket seller, questions would be asked about the integrity of the draw, *even though such an outcome is perfectly compatible with true randomness over a ‘very long’ run of lottery drawings*. Lottery winners must not only be chosen truly at random, they must also be *seen* to be chosen truly at random. Thus, extensive precautions are taken. For the UK National Lottery, for instance, several bins for mixing balls are available, as are several ball sets. On any particular occasion, a chance-selected bin is paired up with a chance-selected ball set. A visibly thorough mixing of the balls follows. Only then are the winning ball numbers drawn.

All public lotteries take similar precautions, yet these are sometimes inadequate to ensure a truly random outcome. A prominent example is the 1970 US draft lottery, by which men aged between 20 and 26 years were conscripted for military service in Vietnam according to the day and month of their birth. The departures from randomness in this lottery are informatively analysed in N. Starr (1997), Nonrandom risk: the 1970 draft lottery, *Journal of Statistics Education*, **5** (2), online at http://www.amstat.org/publications/jse/v5n2/datasets.starr.html

With few exceptions, it is only by ‘physical’ methods (i.e. methods based on a physical device such as a lottery bin or a roulette wheel) that one can efficiently generate long sequences of the truly random numbers we have been referring to. It was, indeed, from such devices that statisticians generated random sequences in the 1930s – the early years of modern inferential statistics – for sampling and simulation studies. But before these sequences could be (let's use the term) ‘certified’ as random, they needed to pass statistical tests of randomness, that is, tests suggesting the absence of several kinds of unwanted patterns. It's not surprising, then, that several tests of randomness in number sequences were devised in the same era.
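To make the idea of a test of randomness concrete, here is a minimal Python sketch of two classic checks. The first is a chi-square frequency test, which asks whether the digits 0 to 9 are equally common. The second is a lag-1 serial correlation, which asks whether each number is related to the one before it. These are standard textbook checks and not necessarily the particular tests devised in the 1930s. The function names are our own. Note that a sequence can pass one check and fail the other, which is why several tests are applied together.

```python
import math
import random
from collections import Counter


def chi_square_frequency(digits):
    """Chi-square statistic comparing digit counts with equal frequencies.

    With 10 categories there are 9 degrees of freedom; a value above about
    16.92 (the upper 5% point) is evidence that the digits are unevenly spread.
    """
    expected = len(digits) / 10
    counts = Counter(digits)
    return sum((counts[d] - expected) ** 2 / expected for d in range(10))


def lag1_correlation(xs):
    """Sample correlation between each value and the value just before it."""
    a, b = xs[:-1], xs[1:]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Digits from Python's built-in generator, which is itself pseudorandom.
rng = random.Random(1955)
digits = [rng.randrange(10) for _ in range(10_000)]
print(chi_square_frequency(digits), lag1_correlation(digits))

# The blatantly patterned sequence 0, 1, ..., 9, 0, 1, ... has perfectly even
# counts, so it passes the frequency check, but it fails the serial check.
patterned = [i % 10 for i in range(10_000)]
print(chi_square_frequency(patterned))  # 0.0
print(lag1_correlation(patterned))      # about 0.45
```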

Various physical methods were refined and automated in the following decades. This effort culminated in the US RAND Corporation's production, from 1949 onwards, of ‘a million random digits with 100,000 normal deviates’. These million digits had been carefully tested for randomness before they were published under the above title in 1955. When this unique book of some 600 pages appeared, whimsical book reviewers enjoyed themselves (‘how did they proofread it?’, ‘I can't recommend it: it has thousands of characters but no plot’). Today anyone can download a copy free of charge at http://www.rand.org/pubs/monograph_reports/MR1418.html

However speedily such ‘certified’ random sequences could be generated in that era, the process of certifying them was quite cumbersome, especially when extremely long sequences of random numbers were needed, for example, for life-like simulation of the operation of a large industrial plant. Also, all these numbers needed to be stored long term – a challenge for the limited memory of computers of the time – because replicability of results was indispensable during the testing stage of software designed for simulation studies. A more compact and more reliable method of generating random numbers was needed, perhaps from some kind of formula. But *what* kind of formula?

A remarkable formula was soon proposed – remarkable because it is entirely deterministic! For that very reason, as we shall shortly see, it is enmeshed in its own web of paradoxes. It was first convincingly demonstrated in a paper by the US mathematician D. H. Lehmer (1951), Mathematical methods in large scale computing units, *Annals of the Computation Laboratory of Harvard University*, **26**, 141–146. The formula Lehmer proposed is called a ‘multiplicative congruential generator’ (MCG). The (*n* + 1)th random number is derived from the *n*th by the recurrence relation

$$X_{n+1} = kX_n \bmod M.$$

Here, *k* and *M* are parameters: *k* is termed the ‘multiplier’ and *M* the ‘modulus’. For given values of *k*, *M* and $X_n$, the value $X_{n+1}$ is the integer remainder left when $kX_n$ is divided by *M*. An equation involving modular arithmetic is called a ‘congruence’, hence the name of the generator. To start the generating process off requires a value $X_0$, termed the ‘seed’. Here is a simple arithmetic example: put $X_0 = 9$, $k = 11$, $M = 13$. Then $X_1 = 99 \bmod 13 = 8$, $X_2 = 88 \bmod 13 = 10$ and so on. Lehmer and dozens of subsequent writers investigated the best choices of values for *k*, *M* and $X_0$ to ensure that the MCG produces sequences that will pass standard tests of randomness. However, it was clear from the outset that every MCG generates sequences that are periodic, that is, after a number of steps (determinable in advance), the sequence repeats itself exactly. In the above simple example, for instance, $X_{12} = 9 = X_0$, so the sequence repeats itself every 12 steps.
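The recurrence and the worked example can be checked with a short Python sketch. The function name `mcg` is our own. It returns the values that follow the seed, so the output runs from $X_1$ to $X_{13}$, and the return of 9 (the seed) at $X_{12}$ makes the period of 12 visible.

```python
def mcg(seed, k, m, count):
    """Return `count` successive values of the multiplicative congruential
    generator X_{n+1} = k * X_n mod m, starting from X_0 = seed.
    The seed itself is not included in the output."""
    values = []
    x = seed
    for _ in range(count):
        x = (k * x) % m  # remainder after dividing k * X_n by m
        values.append(x)
    return values


# The example from the text: X_0 = 9, k = 11, M = 13.
print(mcg(9, 11, 13, 13))
# [8, 10, 6, 1, 11, 4, 5, 3, 7, 12, 2, 9, 8]
```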

If we choose to work with an MCG, we face two paradoxes. Unpredictability, we said above, is an indispensable attribute of a random sequence. How, then, can the MCG legitimately be described as a random number generator when it is obvious that, if the formula is known, *all* the successive values it generates are entirely predictable? A sensible way to resolve this paradox is to adapt the terminology: MCG-generated numbers clearly cannot be called truly random, but they can be called ‘pseudorandom’ – that is, MCGs can (for suitable parameter values) produce sequences that will pass standard tests of randomness, even though they are not unpredictable.

Then again, how can they be essentially patternless if they are periodic? This time, we cannot escape by changing the terminology. Instead, what we must seek are parameter sets for which the MCG generates a ‘certified’ sequence that is long enough for our needs in a particular application but still shorter than the period of that MCG. Studies of the period length of different MCGs are common in the scholarly literature of the past 50 years, as are similar studies for the many deterministic alternatives to the MCG formula that have been proposed as pseudorandom number generators. Today, scientists in all fields use pseudorandom numbers to solve many different kinds of statistical *and non-statistical* problems. That this field of research remains fertile, even though there are already well over 1000 research papers on the theme, says a lot about the importance scientists attach to having long-period pseudorandom number generators with excellent randomness properties.
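How strongly the period depends on the parameters can be seen even with a tiny modulus. The sketch below uses a hypothetical helper, `mcg_period`, which counts the steps an MCG takes to return to its seed. It assumes that *M* is prime and that both *k* and the seed lie strictly between 0 and *M*. Under those assumptions the sequence is guaranteed to come back to the seed, and the period does not depend on which seed is chosen. With *M* = 13, only four multipliers reach the maximum possible period of *M* − 1 = 12.

```python
def mcg_period(seed, k, m):
    """Steps taken by X_{n+1} = k * X_n mod m to return to its seed.

    Assumes m is prime, 0 < k < m and 0 < seed < m; then k is invertible
    mod m, the sequence is purely periodic and is sure to revisit the seed.
    """
    x = (k * seed) % m
    steps = 1
    while x != seed:
        x = (k * x) % m
        steps += 1
    return steps


# Period for every admissible multiplier with M = 13 and seed 9.
periods = {k: mcg_period(9, k, 13) for k in range(1, 13)}
print(periods)
# {1: 1, 2: 12, 3: 3, 4: 6, 5: 4, 6: 12, 7: 12, 8: 4, 9: 3, 10: 6, 11: 12, 12: 2}
print([k for k, period in periods.items() if period == 12])  # [2, 6, 7, 11]
```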

Further insight into the MCG and its variants, as well as detail on some standard tests of randomness in generated sequences, can be found at http://www.ams.org/samplings/feature-column/fcarc-random

It will be clear from this discussion why winning numbers in public lotteries and in electronic gambling machines are not decided by the output of a pseudorandom number generator, but rather by truly random numbers. Truly random numbers are also preferred by many in the field of cryptography for securely encoding messages. This last avenue of application of random numbers has developed so rapidly with the growth of online commerce, mobile telephony and electronic surveillance that research effort on ways of producing reliable truly random sequences has been renewed in recent years. Old ‘physical’ methods (e.g. intermittent capture of mid-calculation values in computer memory; counts per unit time of atomic particles ejected from a radioactive substance) are being reappraised and new ‘physical’ methods (e.g. capture of atmospheric noise; counts of the impacts of cosmic radiation) explored. A rare non-‘physical’ method has also found success, with the report that the successive digits of the decimal expansion of π form a truly random sequence; see Y. Dodge (1996), A natural random number generator, *International Statistical Review*, **64**, 329–344. This was no minor investigation: more than 6 billion digits of π passed multiple tests of randomness!

With so many viable sources of truly random sequences, there is now even a niche industry for the commercial supply of such sequences. A typical supplier can be found at http://www.random.org

We saw above that pseudorandom number generators were devised to assure the replicability of results during the testing stage of software development, without the need to store extensive arrays of generated numbers. That was in an era when computers had sharply limited memory capacity. That restriction no longer applies today. Thus, it has become feasible to work extensively with truly random number generators, storing their output whenever results must be replicated. As a philosophical bonus, doing so avoids the need to grapple with the paradoxical characteristics of pseudorandom number generators.
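The difference in how the two kinds of generator support replicability can be sketched in Python. The standard `random` module uses a deterministic pseudorandom generator (the Mersenne Twister, not an MCG), so reusing a seed replays the same stream. `os.urandom` reads from the operating system's randomness source. On typical systems this is a cryptographically secure generator fed by hardware entropy, so whether it meets the column's strict sense of ‘truly random’ is debatable. It cannot be reseeded to replay a stream, however, so the numbers used must be kept. The seed value 2014 is arbitrary.

```python
import os
import random

# Two generators given the same seed replay identical streams, so a
# simulation can be rerun exactly without storing its random numbers.
run_1 = random.Random(2014)
run_2 = random.Random(2014)
first = [run_1.random() for _ in range(5)]
second = [run_2.random() for _ in range(5)]
print(first == second)  # True

# os.urandom cannot be reseeded to replay an earlier stream: to rerun an
# analysis exactly, the numbers actually used must be stored.
stored = list(os.urandom(5))  # five integers in the range 0..255
print(stored)
```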

Fortunately or unfortunately, depending on your perspective, other paradoxes of randomness remain to tantalise and disconcert us! See, for example, our new questions 2 and 5, below.

Here are our answers to the questions from our previous column.

#### Question 1

In the casino game Blackjack (also known as Vingt-et-un or Twenty-one), the gambler aims to score a higher total with his or her cards than the dealer, but no more than 21. Face cards (kings, queens and jacks) are counted as ten, an ace can be counted as one or 11, and other cards are counted at face value. A successful gambler is paid the amount of his or her original stake, plus the return of the stake. But if the gambler and the dealer have the same point total, this is called a ‘push’ and the gambler doesn't win or lose money on that bet. The gambler could then retrieve the stake and quit, or could use it to bet on the next round.

#### Question 2

The author referred to is Sir William Petty (1623–1687), an economist and a founding member of The Royal Society, who pioneered the field of ‘political arithmetic’ (or economic statistics, as we call it today). Here is Petty's blunt assessment of lotteries, written in 1662 in chapter 8 of his book *A Treatise of Taxes and Contributions*: “Now in the way of lottery, men do also tax themselves in the general, though out of hopes of advantage in particular. A lottery therefore is properly a tax upon unfortunate self-conceited fools. … Now because the world abounds with this kinde of fools, it is not fit that every man that will may cheat every man that would be cheated; but it is rather ordained, that the sovereign should have the guardianship of these fools, or that some favourite should beg the sovereign's right of taking advantage of such men's folly, even as in the case of lunaticks and idiots.” (online at http://www.public-domain-content.com/books/taxes_contributions/8.shtml)

#### Question 3

The first of Adam Smith's two assertions is, in practice, always correct: if you buy all the tickets in a lottery you will win all the prizes but they will, in total, be of lower value than the amount you have spent – otherwise the people running the lottery would make no money. The second assertion appears incorrect: if you buy an extra ticket it would, in fact, increase your chance of winning a prize, *until you had bought so many tickets that you had paid out more than the total of the available prizes*. Perhaps Smith was thinking of a particular context where the tipping point specified in the italicized words of our previous sentence was reached with the acquisition of only relatively few tickets.

Alternatively, Smith may have been thinking intuitively, for there is a frame of reasoning – unrecognized in Smith's day – in which his statement, slightly adapted, makes sense. Our adaptation is to switch attention from a focus on chance (i.e. the *probability* of a return from buying tickets) to a focus on the money amount that might be won or lost as a result of the play of chance (i.e. the *expected return* from buying tickets) and then to reason in terms of expected return.

Suppose 1000 tickets are sold at £1 each in a lottery where there is only one prize, £500. Then, the expected return if you buy *N* tickets is defined by ‘*N* chances in 1000 of winning £500 less £*N*, the cost of the tickets’. In symbols we write: expected return (in £) = 500(*N*/1000) − *N* = − (*N*/2). In a commercial lottery, the expected return will *always* be negative. And the more tickets you buy, the greater will be your expected loss in money terms. In this example, the expected loss will rise to £500 when you buy all the tickets.
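The arithmetic of this example can be checked exactly with Python's `fractions` module. The sketch below, using a hypothetical function `expected_return` with the example's numbers as defaults, shows the two quantities side by side. The chance of winning, *N*/1000, rises with every ticket bought, while the expected return, −*N*/2, falls with every ticket bought.

```python
from fractions import Fraction


def expected_return(n_tickets, total_tickets=1000, prize=500, price=1):
    """Expected return (in £) from buying n_tickets in a one-prize lottery."""
    p_win = Fraction(n_tickets, total_tickets)  # N chances in 1000
    return prize * p_win - price * n_tickets    # 500(N/1000) - N


for n in (1, 10, 100, 1000):
    # number of tickets, chance of winning, expected return in £
    print(n, float(Fraction(n, 1000)), float(expected_return(n)))
# 1 0.001 -0.5
# 10 0.01 -5.0
# 100 0.1 -50.0
# 1000 1.0 -500.0
```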

#### Question 4

The Musical Dice Game (*Musikalisches Würfelspiel*) was published in Germany in 1793 and attributed by the publisher to Wolfgang Amadeus Mozart. Successive rolls of two dice select 16 individual bars of music from a compendium of bars. These bars, when assembled in the random order selected, produce a harmonious minuet (a type of dance popular in the 17th and 18th centuries) every time. More details can be found in Z. Ruttkay (1997), Composing Mozart variations with dice, *Teaching Statistics*, **19**, 18–19. There are several computer versions of the game, for instance, one by John Chuang at http://sunsite.univie.ac.at/Mozart/dice/ where you can try out the process yourself.
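The selection mechanism can be sketched in a few lines of Python. We assume, as a hypothetical layout of the compendium, that each of the 16 bar positions has 11 candidate bars, one for each possible total of two dice (2 to 12). The sketch only chooses the dice totals and does not model the music. It also highlights a statistical point: the totals are far from equally likely, so some candidate bars turn up much more often than others.

```python
import random
from collections import Counter

BAR_POSITIONS = 16  # bars in the minuet, as described in the text


def roll_two_dice(rng):
    """Total of two fair six-sided dice: an integer from 2 to 12."""
    return rng.randint(1, 6) + rng.randint(1, 6)


def choose_bars(rng):
    """One dice total per bar position; each total is assumed to index one of
    11 candidate bars written for that position (hypothetical table layout)."""
    return [roll_two_dice(rng) for _ in range(BAR_POSITIONS)]


rng = random.Random(1793)
print(choose_bars(rng))

# Number of possible sequences of dice totals (11 totals per position).
print(11 ** BAR_POSITIONS)  # 45949729863572161

# 7 arises in 6 of the 36 equally likely outcomes of two dice; 2 and 12 in 1 each.
ways = Counter(a + b for a in range(1, 7) for b in range(1, 7))
print(ways[7], ways[2], ways[12])  # 6 1 1
```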

#### Question 5

As we noted in the previous column, the result of repeated straight-up bets at roulette can be modelled using the binomial distribution. If *n* is the number of bets made and *p* = 1/37 is the success probability, then the number of wins, *X*, has a binomial(*n*, *p*) distribution. Since each bet costs $1 and each successful bet returns $36 (including the original $1 bet), your total profit from *n* bets will be *Y* = 36*X* − *n*, and you will be ahead if this is positive. Now, let us consider the probability of being ahead as a function of *n*, the number of plays.

It is because the number of plays is a discrete variable that the paradox highlighted in the question arises.

To be ahead at any time during the first 35 plays, you need to win only once. Each extra play will give you an added chance of winning and so your probability of being ahead will increase with *n*. During the plays from 36 to 71, however, you will need two wins to be ahead. So when you make the 36th play, your probability of being ahead will go down. For plays from 37 to 71, the probability will increase, but it will go down again when you make the 72nd play, since you will now need three wins to be ahead. As a function of the number of plays, the chance of being ahead increases for every play except for those that are a multiple of 36. At these points, your chance of being ahead decreases.

Using the binomial distribution, we calculated the numerical probabilities of being ahead (i) at an exact multiple of 36 plays and (ii) at one less than the next multiple of 36. These probabilities are graphed in Figure 1 for selected multiples of 36 (‘circles’) and for 35 plays later, just before the next multiple of 36 (‘triangles’). As the number of multiples of 36 increases, you can see that each of these sets of probabilities forms a decreasing sequence, reflecting the basic gambling truth that “the longer you play, the less likely you are to be ahead”.

Figure 2, which plots the binomial probability of being ahead for all values of *n* between 500 and 600, lets you see in close-up how this probability behaves. The function displays a sawtooth pattern, increasing for every value of *n* except the ones that correspond to the next multiple of 36, when the probability drops. This illustrates clearly that, most of the time, playing one extra game will increase your chances of coming out ahead.
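The sawtooth can be reproduced from the binomial model with a short Python sketch. The function `prob_ahead` is our own name. Being ahead means 36*X* > *n*, that is, at least ⌊*n*/36⌋ + 1 wins. The sketch then lists every number of plays, up to 600, at which one extra bet lowers the chance of being ahead.

```python
import math

P_WIN = 1 / 37  # straight-up bet on a single-zero wheel
PAYOUT = 36     # dollars returned per winning $1 bet, stake included


def prob_ahead(n, p=P_WIN):
    """Exact probability that the profit Y = 36X - n is positive, X ~ binomial(n, p)."""
    wins_needed = n // PAYOUT + 1  # smallest x with 36x > n
    q = 1 - p
    # P(X < wins_needed); math.comb(n, x) is 0 when x > n.
    below = sum(math.comb(n, x) * p**x * q ** (n - x) for x in range(wins_needed))
    return 1 - below


print(prob_ahead(35), prob_ahead(36))  # about 0.62, then about 0.25

# Plays at which one extra bet lowers the chance of being ahead.
drops = [n for n in range(2, 601) if prob_ahead(n) < prob_ahead(n - 1)]
print(drops)  # every multiple of 36 up to 576, and nothing else
```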

*Note*: The normal approximation to the binomial probability of being ahead treats the number of plays as a continuous variable. The (approximate) probability of being ahead then shows no sawtooth pattern; rather, it declines monotonically. This can be seen graphically in Figure 1, where the normal approximation to the binomial probability of being ahead is plotted with crosses.
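For comparison, here is a sketch of the normal approximation to the probability of being ahead. The profit *Y* = 36*X* − *n* has mean −*n*/37 and standard deviation 36√(*npq*), so the standardized value of zero profit is √*n*/216. This grows steadily with *n*, and so the approximate probability falls at every step. We do not know whether the crosses in Figure 1 used a continuity correction; this sketch uses none.

```python
import math

P_WIN = 1 / 37
PAYOUT = 36


def prob_ahead_normal(n, p=P_WIN):
    """Normal approximation, without continuity correction, to P(36X - n > 0)."""
    mean_profit = PAYOUT * n * p - n                # equals -n/37 when p = 1/37
    sd_profit = PAYOUT * math.sqrt(n * p * (1 - p))
    z = -mean_profit / sd_profit                    # standardized value of zero profit
    return 0.5 * math.erfc(z / math.sqrt(2))        # upper-tail normal probability


approx = [prob_ahead_normal(n) for n in range(1, 601)]
print(all(a > b for a, b in zip(approx, approx[1:])))  # True: no sawtooth
```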

And here are five new questions.

- We remarked above that MCGs can (for suitable parameter values) produce sequences that will pass most tests of randomness. However, there is a pattern that is unavoidable in any sequence generated by an MCG by virtue of its recursive structure (quite apart from its periodicity). What sort of pattern is this? Investigate the pattern using the first 12 values generated by our illustrative MCG, $X_{n+1} = 11X_n \bmod 13$ with $X_0 = 9$, by plotting overlapping successive pairs of these values, $(X_n, X_{n+1})$, as Cartesian co-ordinates.
- We quote from the work of a British mathematician and philosopher:

  “We have a randomizing machine that produces a series of ones and noughts. We require for experimental purposes a random series of 16 ones and noughts. We start the machine which now gives us a series of 16 noughts. We of course reject this series as unsuitable and suspect the machine of being biased. It is returned to the makers for adjustment. When it comes back we have a very long experiment for which we require a random series of 2,000,000 ones and noughts. We leave the machine running … but on checking through the 2,000,000 ones and noughts it produces we are surprised to find not a single run of 16 noughts. Again we suspect it of being biased and send it back. But what is its designer to say to all this? First we send it back because it produces 16 noughts in a row. Very well: he puts in a device to prevent its doing this. We then send it back because it never produces 16 noughts in a row. … It seems we are never satisfied.” Implicit here is another paradox of the notion of randomness. What is this paradox? How do you propose that it be resolved?

- A, B and C are three random events. If A is dependent on B and B is dependent on C, is A necessarily dependent on C?
- From an ordinary deck of playing cards, I take five red and five black cards, shuffle them together well, and offer you the following bet. You start with an amount of $100 (call it your ‘pot’); your first bet is half of this, $50, and I match this amount. I deal out a card: if it is black, you win and take the money that we have bet ($100); if it is red, you lose and I take the money. We continue turning over the cards in the mini-deck, and each time you bet half of your current pot on the colour of the next card and I match your bet. Since there are equal numbers of black and red cards, the game will be fair. Do you agree?
- Two people, A and B, play a coin tossing game. They toss a fair coin repeatedly. After each toss, A scores a point if the result is a ‘head’ while B scores a point if the result is a ‘tail’. What is the most likely value for the proportion of the time (i.e. the proportion of the total number of tosses) for which A is ahead on total points scored? If their game lasts for 1000 tosses and A is ahead for only 50 of these, what should she conclude?

(*Note*: of course, at each toss A is as likely to score a point as B, but something surprising happens as they keep track of their accumulating points, and thus of who is ahead. Indeed, most people would find the answers to these questions paradoxical. While the results can be stated simply, their mathematical demonstration is quite advanced!)

If you have any comments on this column, please e-mail us at Peter.Petocz@mq.edu.au