
While out and about, you run into an old friend of yours. To your surprise, they have a young boy in tow. This friend quickly tells you all about the child, most notably that he was born on a Tuesday, but then hurries off to pick up their other child. You wave goodbye, then realize: you'd no idea they had one kid, let alone *two*, yet your friend only talked about the tyke next to them. What can we say about this other child?

An obvious starting point is their gender. Not because it’s important, but because it’s easy: we only assign babies to one of two genders, male or female. That suggests the following table.

2nd Child | |
---|---|
male | female |

There are only two possibilities, so you should be 50% certain the other child was assigned male, and 50% certain they were assigned female.

We can double-check this with Bayes’ Theorem. Let’s represent “the next child is male” with *B*, and the odds of *B* with *p(B)*. The odds of two male children coming up will be *p(BB)*, and the odds of the second child being male, given that the first was, will be *p(BB | B1)*. That translates to

$$p(BB \mid B1) = \frac{p(B1 \mid BB)\,p(BB)}{p(B1)} = \frac{1 \times \frac{1}{4}}{\frac{1}{2}} = \frac{1}{2}$$

The same logic applies with the genders swapped.

The Bayesian version is more flexible than the tabular one. Some wags point out that, worldwide, 107 boys are born for every 100 girls; that’s tough to accommodate in tabular form, but trivial via Bayes’ Theorem.
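As a minimal sketch of why the uneven ratio is easy to handle this way, the same Bayes’ Theorem calculation can be done with exact fractions in Python. The function name is illustrative, and it assumes the two births are independent and that “107 boys for every 100 girls” means any given child is a boy with probability 107/207.

```python
from fractions import Fraction

def second_boy_given_first_boy(p_boy):
    """p(BB | B1) via Bayes' Theorem, assuming independent births."""
    p_bb = p_boy * p_boy          # prior probability of two boys
    p_b1_given_bb = Fraction(1)   # if both are boys, the first one certainly is
    p_b1 = p_boy                  # probability the first child is a boy
    return p_b1_given_bb * p_bb / p_b1

print(second_boy_given_first_boy(Fraction(1, 2)))      # 1/2
print(second_boy_given_first_boy(Fraction(107, 207)))  # 107/207
```

With the skewed ratio the answer is simply the base rate, 107/207 (about 0.517): under independence, knowing the first child's sex tells you nothing new about the second.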

But hold on here: we specified that this person had two children, no more, no less. So all the possibilities are:

1st Child | 2nd Child |
---|---|
male | male |
male | female |
female | male |
female | female |

There are four states in total: one is impossible because it has two girls, two have a girl as the other child (order doesn’t matter here), and one has a boy as the other. So the odds of the other kid being a girl are actually two-thirds, not half, and the odds of them being a boy are a third.
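The counting in that table can be checked by brute force. This Python sketch (function and variable names are illustrative) weights each ordered pair by its probability, throws out the families without a boy, and asks how often the remaining child is a girl. The optional `p_boy` argument lets you plug in the 107:100 ratio mentioned earlier, treated here as a per-child probability of 107/207 with independent births.

```python
from fractions import Fraction
from itertools import product

def other_child_is_girl(p_boy=Fraction(1, 2)):
    """P(other child is a girl | at least one of two children is a boy)."""
    weight = {"male": p_boy, "female": 1 - p_boy}
    # every ordered (1st child, 2nd child) pair, weighted by its probability
    states = {pair: weight[pair[0]] * weight[pair[1]]
              for pair in product(weight, repeat=2)}
    # keep only the families containing at least one boy
    kept = {pair: w for pair, w in states.items() if "male" in pair}
    # among those, the "other" child is a girl whenever the pair is mixed
    mixed = sum(w for pair, w in kept.items() if "female" in pair)
    return mixed / sum(kept.values())

print(other_child_is_girl())                    # 2/3
print(other_child_is_girl(Fraction(107, 207)))  # 200/307
```

Under the skewed ratio the answer shifts only slightly, to 200/307 (about 0.651).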

The Bayesian account shifts slightly. According to the table above, the likelihood of encountering a boy given a two-child pairing is 3/4, but the likelihood of encountering a girl is the same, so the two hypotheses wind up with equal probability. If we let *E* stand for “encountering a child of an old friend,” and *2K* stand for “that friend having two kids,”

So each hypothesis carries equal weight. Those probabilities get shoved to certainties when we learn the child actually is a boy. As for the actual question,

Where *p(BG)* is the probability of a male/female pair, ignoring order, and *p(B)* the odds of one of the children being a boy (which the question sets at 100%). And to satisfy the wags,

So which answer is right? Let’s pretend your friend had an infinite number of kids, and always brings the first when wandering around. If we assume perfect gender parity, then of all the infinite arrangements half will have a boy as the first kid, while half will have a girl. We can then ignore that first kid and consider the rest of the possibilities. Since an infinite list doesn’t get shorter when you subtract an item from it, we’re left with the *exact same collection* of infinite arrangements. Every single arrangement after we subtract that kid can be mapped to one before, with no extras or gaps. So the logic we applied to the first kid applies just as well to the second: if the first kid had a fifty-fifty chance of being assigned female, so must the second.

But we’re not dealing with an infinite number of kids here. The question states there are only two kids, hence “other” instead of “another.” Selecting one without replacing them *does* have an impact on the rest of the sample, especially one this small, and ignoring that would mean ignoring information shared by the question. The one-third/two-thirds answer seems the more sensible of the two; indeed, when Marilyn vos Savant empirically evaluated this question via a poll, she got that answer rather than a fifty-fifty split, and some very pedantic researchers looked at actual demographic data to arrive at the same conclusion.[1]

No, that’s not quite right. Neither vos Savant nor the researchers considered this *exact* question, since this version mentions the boy was born on a Tuesday. It seems like a trivial detail, but then again I nitpicked the difference between “other” and “another.” Let’s redo the analysis, this time incorporating the day of the week each child was born.

Right, so 169 of the 196 possibilities drop out because they contain no boy born on a Tuesday (every pair of girls included). That leaves 27, of which 14 have a girl as the other child, so that puts the odds of the other child being female at 14/27.
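The 196-square grid can be rebuilt by brute force instead of by counting squares. This Python sketch (names are illustrative) crosses two sexes with seven weekdays, keeps every ordered pair that contains a boy born on a Tuesday, and counts how many of those have a girl as the other child. The weighted function at the end goes beyond the counting argument and rests on extra assumptions: each weekday is treated as equally likely (real birth data isn't quite uniform), and 107 boys per 100 girls is treated as a per-child probability of 107/207.

```python
from fractions import Fraction
from itertools import product

SEXES = ["male", "female"]
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
TUESDAY_BOY = ("male", "Tue")

# 14 kinds of child, so 14 x 14 = 196 ordered (1st, 2nd) pairs
CHILD = list(product(SEXES, DAYS))
PAIRS = list(product(CHILD, repeat=2))

# keep only the pairs that contain at least one boy born on a Tuesday
kept = [pair for pair in PAIRS if TUESDAY_BOY in pair]
# among those, the other child is a girl exactly when the pair contains a girl
girl_other = [pair for pair in kept if any(sex == "female" for sex, _ in pair)]

print(len(PAIRS), len(PAIRS) - len(kept), len(kept), len(girl_other))  # 196 169 27 14
print(Fraction(len(girl_other), len(kept)))                            # 14/27

def tuesday_other_is_girl(p_boy=Fraction(1, 2)):
    """Weighted version: sexes weighted by p_boy, weekdays assumed uniform."""
    def weight(child):
        sex, _day = child
        return (p_boy if sex == "male" else 1 - p_boy) * Fraction(1, 7)
    total = sum(weight(a) * weight(b) for a, b in kept)
    girl = sum(weight(a) * weight(b) for a, b in girl_other)
    return girl / total

print(tuesday_other_is_girl())                    # 14/27
print(tuesday_other_is_girl(Fraction(107, 207)))  # 1400/2791
```

With the skewed sex ratio the answer moves to 1400/2791, about 0.502.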

14/27?! What a bizarre number, yet if you count squares above you’ll see it’s true. The Bayesian approach comes to the same conclusion.
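One way to set up that Bayesian calculation, sketched under the assumption that weekdays are equally likely and independent of sex, is to treat the unordered pairs *BB*, *BG* and *GG* as hypotheses with priors 1/4, 1/2 and 1/4, and let *T* (a label introduced here) stand for “at least one child is a boy born on a Tuesday.” Two boys give *p(T | BB)* = 1 − (6/7)² = 13/49, a boy and a girl give *p(T | BG)* = 1/7 = 7/49, and two girls make *T* impossible:

$$
p(BG \mid T) = \frac{p(T \mid BG)\,p(BG)}{p(T \mid BB)\,p(BB) + p(T \mid BG)\,p(BG) + p(T \mid GG)\,p(GG)}
= \frac{\frac{7}{49} \cdot \frac{1}{2}}{\frac{13}{49} \cdot \frac{1}{4} + \frac{7}{49} \cdot \frac{1}{2} + 0 \cdot \frac{1}{4}}
= \frac{14/196}{27/196} = \frac{14}{27}
$$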

And this time we’ll pander to the scientific pedants:

This result screams at common sense. How can such a trivial detail have so great an effect? Yet the explanation is exactly the same as for the two-child version: samples of a finite pool without replacement can do strange things. By bumping up the size of the pool, from four to 196, we make the scenario behave more like the infinite case. So rejecting 13/27 as the odds of a brother means rejecting 1/3 as well.

But accepting it leads to all sorts of wild scenarios. What if we changed the qualifier from “born on a Tuesday” to, say, being born in December? Or being born with red hair, or being born on a leap day?
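Those qualifiers can be compared with a closed-form version of the same bookkeeping. Assuming boys and girls are equally likely and that a boy has the qualifying trait with probability *t*, independently of his sibling, the chance the other child is a girl works out to 2/(4 − t). The Python sketch below (the function name is hypothetical) computes it exactly. Treating December as 1/12 of births and a leap day as 1/1461 are simplifications that ignore unequal month lengths, the century rule, and seasonal birth patterns; red hair is left out because no rate for it is given here, though any rate can be plugged in.

```python
from fractions import Fraction

def girl_given_trait_boy(t, p_boy=Fraction(1, 2)):
    """P(other child is a girl | at least one boy has a trait of probability t)."""
    t = Fraction(t)
    hit = p_boy * t                          # a given child is a boy with the trait
    p_any_hit = 1 - (1 - hit) ** 2           # at least one of the two children is
    p_hit_and_girl = 2 * hit * (1 - p_boy)   # one such boy plus one girl, either order
    return p_hit_and_girl / p_any_hit

qualifiers = [
    ("any boy", 1),
    ("Tuesday", Fraction(1, 7)),
    ("December", Fraction(1, 12)),    # simplification: months treated as equal
    ("leap day", Fraction(1, 1461)),  # simplification: one Feb 29 per 1461 days
]
for label, t in qualifiers:
    p = girl_given_trait_boy(t)
    print(f"{label:>9}: {p} (about {float(p):.4f})")
```

The rarer the qualifier, the closer the answer creeps toward fifty-fifty; with no qualifier at all (t = 1) it returns to two-thirds.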

It looks like we can rig the results to whatever we want. And what’s worse, it isn’t because we’re deploying sketchy statistical methods or screwing up our math; all we’re doing is changing what we sample in the possibility space, by changing what we consider important to the answer.

Normally this would be the place where I turn around and cut to a simple solution. But this is a “problem” and not a “paradox:” it’s really just a reminder to carefully consider “what do we care about?” If a variable isn’t important to your conclusions, don’t factor it in. If it is, justify why, and do some quick testing to see how much it wiggles your conclusion around. Be very mindful of how sampling can change your conclusion, because even trivial-looking decisions can do so.
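As one example of that kind of quick testing, here is a small Monte Carlo sketch in Python comparing two ways the encounter could have been sampled. Both procedures are hypothetical readings of the story, not something the story pins down: in the first, we keep every two-child family with at least one boy; in the second, we meet one of the two children at random and keep the family only if that child is a boy.

```python
import random

def simulate(trials=200_000, seed=1):
    rng = random.Random(seed)
    any_boy = girl_given_any_boy = 0
    met_boy = girl_given_met_boy = 0
    for _ in range(trials):
        kids = [rng.choice("BG") for _ in range(2)]  # independent, 50/50 births
        # Procedure 1: keep every family that has at least one boy.
        if "B" in kids:
            any_boy += 1
            girl_given_any_boy += "G" in kids
        # Procedure 2: meet one child at random; keep the family if it's a boy.
        i = rng.randrange(2)
        if kids[i] == "B":
            met_boy += 1
            girl_given_met_boy += kids[1 - i] == "G"
    return girl_given_any_boy / any_boy, girl_given_met_boy / met_boy

p_any, p_met = simulate()
print(f"kept families with any boy: {p_any:.3f}")
print(f"met a random child, a boy:  {p_met:.3f}")
```

The first procedure lands near two-thirds and the second near one-half, even though every family is generated the same way; only the filter differs.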

I know this example sounds like a rare corner case, but there are variations that recur in the scientific literature.

How different investigators might conceive the planning and execution of a study can also lead to p values with widely varying magnitudes. As an example of this, let us examine Fisher’s (1935, ch. 2) classic experiment of the ‘lady tasting tea,’ as described by Lindley (1993). The lady in question claimed she could distinguish between whether milk or tea had been poured first into a cup (of tea). In the experiment, the lady is presented with six pairs of cups of tea, and she must determine whether milk or tea entered the cup first. The null hypothesis—that she cannot, in fact, discriminate—is that she would guess 50% right (R) and 50% wrong (W). Suppose that she gets the first five results right and the last one wrong, or RRRRRW. The p value for this outcome, Lindley notes, is 7(1/2)⁶, or .110, which is not statistically significant at the .05 level. This p value, like all of them, consists of two parts. In this case: 6(1/2)⁶ = .094 (probability of observed outcomes) + (1/2)⁶ = .016 (probability of more extreme outcomes). …

Suppose instead of the above design, another researcher decides to repeat the experiment until the lady makes her first mistake. In this case, and with the same RRRRRW data, the p value is now statistically significant at the .032 level [(1/2)⁶ + (1/2)⁶ = .016 + .016 = .032]. The two parts of this p value are explained as follows: (1/2)⁶ = .016 (probability of observed outcomes)—but without this expression being multiplied by 6 because the mistaken choice, W, must always come at the end (see, e.g., Goodman, 1999)— + (1/2)⁶ = .016 (probability of more extreme outcomes).

Of course, these experimental results make no sense. The exact same data, obtained in the exact same sequence, should yield the exact same p values. But they do not. And all because two different investigators held alternate conceptions as to how the experiment should be run.[2]
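The two p values in that passage can be reproduced in a few lines of Python, as a sketch. The fixed-sample design is a binomial tail (five or more right out of six), while the stop-at-first-mistake design only asks how likely it is to see at least five rights before the first wrong answer, which is the chance the first five cups are all right. Exact fractions give 7/64 and 1/32; the passage's .110 and .032 come from adding parts that were already rounded.

```python
from fractions import Fraction
from math import comb

HALF = Fraction(1, 2)  # null hypothesis: each call is a coin flip
N = 6                  # six cups judged; observed data RRRRRW

# Design 1: n fixed at six, so the p value is P(5 or more right out of 6).
p_fixed_n = sum(comb(N, k) * HALF**N for k in range(5, N + 1))

# Design 2: stop at the first mistake. Five or more rights before the
# first W happens exactly when the first five cups are all right.
p_stop_at_first_error = HALF**5

print(p_fixed_n, float(p_fixed_n))                          # 7/64 0.109375
print(p_stop_at_first_error, float(p_stop_at_first_error))  # 1/32 0.03125
```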

There’s a huge debate in the literature about stopping rules, as it’s been known for some time that they can change study outcomes. Switching to the Bayesian interpretation is an easy fix for that one, but it should be obvious from the above that sampling issues will persist no matter which probability framework you choose.
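A sketch of why the Bayesian route sidesteps this particular problem, assuming a flat prior over a grid of 99 candidate success rates (the grid, the prior and the function names are illustrative choices): the two stopping rules give likelihoods for RRRRRW that differ only by the constant factor 6, and constants cancel when the posterior is normalized.

```python
from fractions import Fraction

def grid_posterior(likelihood, grid):
    """Normalize likelihood times a flat prior over a grid of candidate rates."""
    weights = [likelihood(theta) for theta in grid]
    total = sum(weights)
    return [w / total for w in weights]

def fixed_six_cups(theta):
    # RRRRRW with n fixed at 6: the single W could have been any of 6 cups
    return 6 * theta**5 * (1 - theta)

def until_first_error(theta):
    # RRRRRW when stopping at the first mistake: the W must come last
    return theta**5 * (1 - theta)

# theta = the lady's chance of calling a cup correctly
grid = [Fraction(i, 100) for i in range(1, 100)]
post_fixed = grid_posterior(fixed_six_cups, grid)
post_until = grid_posterior(until_first_error, grid)

print(post_fixed == post_until)  # True
better_than_guessing = sum(p for theta, p in zip(grid, post_until) if theta > Fraction(1, 2))
print(float(better_than_guessing))
```

Both designs produce the identical posterior, so under this model the probability that she beats chance doesn't depend on which investigator ran the experiment.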

Account for them as best you can.

[1] Carlton, Matthew A., and William D. Stansfield. “Making babies by the flip of a coin?” *The American Statistician* 59.2 (2005).

[2] Hubbard, Raymond, and R. Murray Lindsay. “Why P values are not a useful measure of evidence in statistical significance testing.” *Theory & Psychology* 18.1 (2008): 69-88.

giliell said:

Wait, I need to nitpick your first table, because I don’t get it.

You say that the chance that the other child is a girl is 2/3, but if we look at the table there are twice as many “male” as “female” children.

We don’t know if this child is first or second, so we cannot eliminate that first “female” in the first side, ok, but that also means we don’t know which of the “males” our Tuesday boy is, so the first “male” “male” row should get counted twice:

Possibility 1: He’s the firstborn and has a younger brother

Possibility 2: He’s the secondborn and has an older brother

Possibility 3: He’s the firstborn and has a younger sister

Possibility 4: He’s the secondborn and has an older sister

4 possibilities, 2 male, 2 female

hjhornbeck said:

The order is important to your sampling, yet it isn’t to that table. Consider these statements:

The second one gives the expected two-thirds odds (try it, if you don’t believe me), while the others are fifty-fifty. You’ve appended the last scenarios together, and since each of them is equally likely the odds of getting heads are (1/2)*(1/2) + (1/2)*(1/2) = 1/2.

Probability can be tricky business. You don’t want to know how many drafts this post had. 😛