In the 1920s in Cambridge, England, Muriel Bristol, a biologist at the new Rothamsted Experimental Station, claimed that she could tell by the taste of a cup of tea whether the milk or the tea had been poured into the cup first (Fig. 1). Pompous dons scorned the idea: “It just can't be done, don't y'know!” But why not? Could it be true? And if so, how could we tell?

Dr. Bristol's future husband, William Roach, suggested that she be given a chance to prove her claim, an event that not only makes entertaining reading but is also important in the history of science.1, 2 That is because another Rothamsted scientist, Ronald A. Fisher (1890-1962), a founder of modern statistics, suggested a way to make sense or nonsense of Dr. Bristol's claim.

If Dr. Bristol was just making an attention-seeking claim, then on any single cup she would have a 50-50 probability of naming the right pouring order just by chance. By itself, therefore, a single sip between cup and lip would be useless in evaluating her claim. Fisher, however, thought he could formalize a test that would be more convincing. What would that require, and what would the result mean? What do we even mean by a ‘test’? And, by the way, where does that obvious-sounding 50-50 probability come from?

In this case, each cup has two possible pouring orders: tea first, then milk, or milk first, then tea. If you were taking the test and really were just guessing, your guess would bear no relation to the truth. But you might get lucky. The key to Fisher's strategy lies in the word ‘probability,’ or the idea of getting it right ‘just by chance.’ The classical way to distinguish chance from causal truth is to carry out a series of repeated trials under the same conditions, in which probability refers to the frequency of hits, or correct responses.
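To make the frequency idea concrete, here is a minimal Python sketch (an illustration, not anything Fisher wrote) that simulates a taster who is purely guessing. It assumes each cup's true pouring order and the taster's call are independent fair coin flips; the function name `hit_frequency` and the fixed random seed are illustrative choices. As the number of trials grows, the fraction of hits settles near ½, which is one way to see where the 50-50 figure comes from.

```python
import random


def hit_frequency(n_trials: int, seed: int = 0) -> float:
    """Fraction of correct calls made by a taster who is purely guessing.

    Assumption: on each trial the cup is milk-first or tea-first with equal
    probability, and the guess is made independently of the truth.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(n_trials):
        truth = rng.choice(["milk-first", "tea-first"])
        guess = rng.choice(["milk-first", "tea-first"])
        hits += guess == truth  # True counts as 1, False as 0
    return hits / n_trials


# Short runs wander; long runs settle close to 0.5.
for n in (10, 1_000, 100_000):
    print(n, hit_frequency(n))
```

With only ten trials the observed frequency can stray well away from ½, which is exactly why a single cup, or even a handful, says little about the taster's skill.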

In the event, Fisher presented Dr. Bristol with eight cups of tea (Fig. 2), four poured milk-first and the other four tea-first. She identified all eight correctly. Or was it guessing? (The Brits do know their tea, after all.) Fisher worked out the number of ways in which someone who was just guessing could get one, two, three … up to all eight cups right. He did this by assuming a fixed probability of ½ that any given guess would be correct ‘just by chance.’

For example, there are eight ways to get one guess right and seven wrong: RWWWWWWW, WRWWWWWW, and so on to WWWWWWWR. If we assume that the sip from each cup was an independent tasting, the probability of each guess being right just by chance is ½, so the probability of any one specific string of eight answers, such as RWWWWWWW, is $(1/2)^8 = 1/256$. There is a similarly countable number of ways to get seven of the eight guesses right, again “just by chance.” If you total the probabilities of all of these possibilities, from none right to eight right, they must add up to 1.0, because together they cover every possible outcome. Then you can ask: if Dr. Bristol was really just guessing, what is the chance that she would have done as well as she actually did? Does her palate match her performance?
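The counting in this paragraph can be checked with a short Python sketch under the same simplifying assumption as the text: eight independent guesses, each right with probability ½ (the binomial model). `math.comb(8, k)` counts the R/W strings containing exactly k R's, each of which has probability $(1/2)^8$; exact fractions avoid rounding. The names `prob_exactly` and `dist` are illustrative only.

```python
from fractions import Fraction
from math import comb

N_CUPS = 8
P_RIGHT = Fraction(1, 2)  # chance of a correct call on any one cup when guessing


def prob_exactly(k: int, n: int = N_CUPS, p: Fraction = P_RIGHT) -> Fraction:
    """Probability of exactly k correct calls out of n independent guesses."""
    # comb(n, k) distinct R/W strings have k R's; each has probability p^k (1-p)^(n-k)
    return comb(n, k) * p**k * (1 - p) ** (n - k)


dist = {k: prob_exactly(k) for k in range(N_CUPS + 1)}

for k, pr in dist.items():
    print(f"{k} right: {comb(N_CUPS, k):2d} ways, probability {pr}")

print("total:", sum(dist.values()))                 # prints: total: 1
print("P(8 of 8 by chance):", dist[8])              # prints: ... 1/256
print("P(at least 7 by chance):", dist[7] + dist[8])  # prints: ... 9/256
```

Under this model, a guesser would get all eight right only about 0.4% of the time (1/256), and seven or more right about 3.5% of the time (9/256).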

In 1925, Fisher worked out all of these probabilities in a famous book, *Statistical Methods for Research Workers* (http://psychclassics.yorku.ca/Fisher/Methods/). Later, in a retrospective chapter, he described the tea-tasting experiment.3 I've simplified his analysis a bit; a full treatment would also take into account, for example, the order in which the cups were presented or the fraction of cups that were poured milk-first.

It seems natural or even obvious that the set-up would be four cups poured tea-first and four poured milk-first, but that, perhaps, is just a cultural bias. The cups could be, say, six with milk first and two with tea first. But a 50-50 split optimizes the power of the test to discriminate luck from Lapsang. In fact, Fisher told Dr. Bristol that there were four cups of each kind, though not the order in which they would be presented to her. Could she have detected knowing but inadvertent expressions on Fisher's face, the way soothsayers do? Could the pourer have filled the cups to slightly different levels with whichever liquid went in first, leaving a color difference between the two pouring orders? Were they all stirred thoroughly? Could knowing that half the cups were milk-first affect her guessing and the probabilities used to evaluate the results? Presumably Dr. Bristol wasn't told how she was doing, because then her guesses really would not have been independent. She would have known, for example, that when she came to the last cup it had to be whichever kind she had identified only three times up to that point (remembering what has already been played is one way people try to win at cards).
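Knowing that exactly four cups were milk-first does change the probabilities. Fisher's own published analysis of the experiment used this information: if the taster knows there are four milk-first cups and simply picks four at random to call milk-first, there are $\binom{8}{4} = 70$ equally likely choices, only one of which is entirely correct, so a perfect score by luck has probability 1/70 rather than the 1/256 of the simplified model. The Python sketch below computes that hypergeometric distribution under the stated assumption that she labels exactly four cups milk-first; the function name `prob_k_milk_first_correct` is hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_k_milk_first_correct(k: int, n_cups: int = 8, n_milk: int = 4) -> Fraction:
    """Chance that exactly k of the taster's n_milk 'milk-first' picks are truly
    milk-first, assuming she knows there are n_milk such cups and chooses that
    many at random (a hypergeometric probability)."""
    ways = comb(n_milk, k) * comb(n_cups - n_milk, n_milk - k)
    return Fraction(ways, comb(n_cups, n_milk))


# Fisher's 4-and-4 design. Each milk-first cup she misses forces one tea-first
# cup to be wrongly called milk-first, so the number of cups right is 2 * k
# (7 of 8 right is impossible, echoing the last-cup point above).
for k in range(5):
    print(f"{2 * k} of 8 cups right: {prob_k_milk_first_correct(k)}")

# Chance of a perfect score by luck for each split of 8 cups,
# again assuming the taster is told how many are milk-first.
for n_milk in range(1, 8):
    print(n_milk, "milk-first:", Fraction(1, comb(8, n_milk)))
```

Under this assumption the 4-4 split gives the smallest chance of a lucky perfect score (1/70, against 1/28 for a six-and-two split), which is one concrete sense in which the even split makes the test most discriminating.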