## The birthday problem

The classical birthday problem is well known among statisticians and even among some schoolchildren. It often comes as one of their early lessons in probability – perhaps because it involves birthdays, which all children like, and it has a somewhat unexpected answer. It is also fairly simple to solve. It can be expressed as follows: “How large a group of people do you need to make it more likely than not that two of them share a birthday?” Or, in a classroom: “How big does this class have to be before two children are likely to have the same birthday?” Most children (and most adults) would guess the answer to be around half of 365 – call it 183, since you cannot have half-people; of course they are way out. The correct answer is 23 – see the box. In a class of 23 children, it is more likely than not that two of them have the same birthday. On a football pitch there are 23 people if you count the referee; in just over half of all football matches there will be at least one shared birthday on the pitch.

### How many children do you need in a class before two of them are likely to have the same birthday?

The way to work this out is to calculate the probability that the children all have *different* birthdays, and then subtract that answer from 1. Suppose the class announce their birthdays in turn. Child A has 365 possible days for his birthday (we will ignore leap years, here and throughout, as one complication too many); suppose it is June 27th. Child B has one chance in 365 of having the same birthday – so 364 chances in 365 of having a different one. Say hers is April 3rd. If we have got this far without a shared birthday, Child C announces his birthday (it is November 5th); the probability that it is neither of those two dates is 363 out of 365. So the probability of the first three children all having different birthdays is 364/365 × 363/365, which is about 0.99179.

Child D has her turn. Hers must not be on any of the three dates mentioned so far, which leaves 362 possible days out of 365; so the probability that all four birthdays are different is the previous answer multiplied by 362/365 (which comes to 0.98364); and so on down the rest of the class. By the time the 23rd child gives his birthday, the probability that all of them so far are different is the product of 22 fractions: 364/365 × 363/365 × 362/365 × 361/365 × … × 343/365. This comes out to 0.493, which is a little under a half. In other words, the chance that all the birthdays so far are different is just under 50–50.

And that in turn means that with 23 children the chance that not all the birthdays are different – that at least two of them share the same day – is just over 50–50.
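The calculation in the box can be sketched in a few lines of Python. This is only an illustration under the same assumptions as the text (independent birthdays, 365 equally likely days, no leap years); the function names `prob_all_different` and `prob_shared` are invented here for the example.

```python
def prob_all_different(n, days=365):
    """Probability that n people all have different birthdays,
    assuming independent, equally likely birthdays (no leap years)."""
    p = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already announced.
        p *= (days - k) / days
    return p


def prob_shared(n, days=365):
    """Probability that at least two of n people share a birthday."""
    return 1.0 - prob_all_different(n, days)


# Class sizes discussed in the box: 3 and 4 children, then 22 and 23.
for n in (3, 4, 22, 23):
    print(n, round(prob_all_different(n), 5), round(prob_shared(n), 5))
```

For 23 children the first figure comes out at about 0.4927 and the second at about 0.5073, while for 22 children the probability of a shared birthday is still below a half.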

So far so simple. And we can continue beyond 50–50 chances: in a party of 57 people there is a 0.99 probability of a shared birthday; 70 people are enough to give a 0.999 probability^{1}. And of course if you have 366 people in a room (or 367 if you really want leap years), a shared birthday is certain, because there are more people than there are days to go round. But you do not need anything like that many: even for a group of 100 people the probability is 0.999999693, which is as close to one as makes no real difference.
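These thresholds can be checked by searching for the smallest group that reaches a given probability. The sketch below again assumes 365 equally likely, independent birthdays; `smallest_group` is a hypothetical helper written for this illustration.

```python
def prob_all_different(n, days=365):
    """Probability that n people all have different birthdays (uniform model)."""
    p = 1.0
    for k in range(n):
        p *= (days - k) / days
    return p


def prob_shared(n, days=365):
    """Probability that at least two of n people share a birthday."""
    return 1.0 - prob_all_different(n, days)


def smallest_group(target, days=365):
    """Smallest group size whose shared-birthday probability reaches target.

    The target must lie strictly between 0 and 1: in floating point the
    probability rounds to exactly 1.0 long before the group reaches 366.
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    n = 1
    while prob_shared(n, days) < target:
        n += 1
    return n


for target in (0.5, 0.99, 0.999):
    print(target, smallest_group(target))

print(prob_shared(100))

# Certainty only arrives once there are more people than days.
print(prob_all_different(365) > 0, prob_all_different(366) == 0)
```

The last line illustrates the pigeonhole point: with 365 people the chance of no shared birthday is unimaginably small but not zero, and it becomes exactly zero only with 366.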

However, we have made two assumptions in all of this. Firstly, we supposed that birthdays are independent – that the date of one birthday does not affect the date of another. This would not be true if we had done the experiment at a convention of twins and triplets – one twin's birthday would determine the birthday of someone else in the room. We have also assumed equiprobability: that birthdays are evenly distributed throughout the year – that January 1st, April 23rd and September 19th are all equally likely as birthdays, each with a probability of 1/365.

The assumption of uniform birth dates simplifies these calculations considerably; however, it is not actually true. There is much evidence that it does not hold for real human populations: birthday distributions actually depend on social, religious, economic and environmental factors. Figure 1 illustrates these patterns for children born in 2011 in four European and four American countries. It uses the adjusted proportions of months of birth and approximate month of conception, and takes into account the different number of days in months.

The countries shown have substantial variations in the months when peaks occur, in their amplitudes, and in the deviations from the hypothesis of uniformity. Western Europe has birth-peaks from July to October, with spring troughs; in the Americas, Brazil's births are distributed in an opposite pattern. These differences may reflect environmental factors, notably temperature, which has an effect on human fertility and on human desires^{2}. However, aggregated national monthly figures like these conceal effects resulting from (i) the well-known variability of the number of births by day of the week (hospitals discourage weekend births); (ii) public holidays, which may lead to increases in conceptions; and (iii) regional variations. What effect does this non-uniform spread of birthdays have on our calculations? Does it increase the probability of common birthdays or decrease it?

It has been shown^{3} that any deviation from the equiprobable birthday model *increases* the probability of finding at least two individuals with a common birthday in a group of size *n*. The probabilities we calculated are therefore lower bounds for any real-life, non-uniform distribution, and the group sizes we found (23 for a better-than-even chance, for example) are upper bounds on the sizes actually needed. It is also known that the adjustments required to account for such empirical distributions tend to be small – too small to change the results from the equiprobability assumption, even by one person^{4}.
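Both claims can be explored numerically. The sketch below computes the exact shared-birthday probability for any set of daily probabilities: the chance that *n* birthdays are all different is *n*! times the sum, over every set of *n* distinct days, of the product of those days' probabilities, and that sum can be built up one day at a time. The seasonal pattern used here (a cosine wave with a 10% amplitude) is invented purely for illustration; it is not the data behind Figure 1, and the name `prob_shared_general` is likewise hypothetical.

```python
import math


def prob_shared_general(day_probs, n):
    """Exact probability that at least two of n people share a birthday,
    for independent birthdays drawn from an arbitrary daily distribution."""
    # e[k] holds the sum, over all sets of k distinct days seen so far,
    # of the product of their probabilities.
    e = [1.0] + [0.0] * n
    for p in day_probs:
        # Go downwards so that each day is used at most once per set.
        for k in range(n, 0, -1):
            e[k] += p * e[k - 1]
    # n! orderings of each set of n distinct days.
    return 1.0 - math.factorial(n) * e[n]


DAYS = 365
uniform = [1.0 / DAYS] * DAYS

# Hypothetical seasonal pattern: an assumption, not real birth data.
amplitude = 0.10
weights = [1.0 + amplitude * math.cos(2 * math.pi * d / DAYS) for d in range(DAYS)]
total = sum(weights)
seasonal = [w / total for w in weights]

for n in (22, 23):
    print(n, prob_shared_general(uniform, n), prob_shared_general(seasonal, n))
```

Under this made-up pattern the probability for 23 people rises slightly above the uniform value, while the probability for 22 people stays below a half, so the answer of 23 is unchanged.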