1/2 a girl vs. 2/3 a boy; or—I suck at stats

Listen, here's the thing. If you can't spot the sucker in the first half hour at the table, then you are the sucker.
“Matt Damon [1]”, “Rounders [2]”

Back in my college days, I was invited to a poker game and, I'm sure by sheer coincidence, that day just happened to be payday. Now, while I knew (and still know) what the various hands are (“flush”: five cards of the same suit; “full house”: three of a kind plus a pair; “royal flush”: the ace, king, queen, jack and 10 of a single suit; etc.), I didn't know (and still don't) the ranking of the hands, that is, which hands beat which. I was assured that wouldn't matter and that I could have a “cheat sheet.” So I arrived at the game with a big pocketful of money and an attitude of “how hard can this be?” That attitude was reinforced as I won a few early rounds.

The end of the night came with the end of my money.

I learned two lessons that night.

While the first lesson sank in (to this day, I haven't played another game of poker, so my record stands at a rather dismal 0–1), I forgot the second lesson: that I suck at statistics.

Monday, I wrote about pairs of kids [3] and the odds of a particular pairing, given some information.

Let's say, hypothetically speaking, you met someone who told you they had two children, and one of them is a girl. **What are the odds that person has a boy and a girl?**

“Coding Horror: The Problem of the Unfinished Game [4]”

I read the explanation for the 2/3 result [5], said “Okay, I can see that,” accepted it as gospel and went about my business, which involved going back and forth with someone over this issue [6], with both of us firmly holding our respective viewpoints (me: 2/3; Vorlath [7]: 1/2).

Wanting to settle this once and for all, I wrote a very verbose program [8] (written for clarity, not speed; this is a very tricky problem, and yes, the program is verbose) that generates a bazillion pairs of kids and brute-forces the results so I can figure out who's right and who's wrong.

Table: Number of kids

|  | Value | Percentage |
|---|---:|---:|
| Total # of kids | 20000000 | 100.0 |
| Boys | 10002254 | 50.0 |
| Girls | 9997746 | 50.0 |

I ran this program for 10,000,000 pairs, which created 20,000,000 virtual kids: 50% boys, 50% girls. No controversy here.

Table: Pair Stats

|  | Value | Percentage |
|---|---:|---:|
| Total # of pairs | 10000000 | 100.0 |
| Boy/Boy | 2501203 | 25.0 |
| Boy/Girl | 2499753 | 25.0 |
| Girl/Boy | 2500095 | 25.0 |
| Girl/Girl | 2498949 | 25.0 |
| At least one Boy | 7501051 | 75.0 |
| At least one Girl | 7498797 | 75.0 |

Again, nothing unexpected here: four possible pairings, 25% each. Of the pairs, 75% have at least one girl and 75% have at least one boy, straight from the numbers. So far, so good.
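To make the first two tables concrete, here is a minimal sketch of the pair-generation step. It is not the author's kids.c [8]; it assumes a small xorshift64 generator (so runs repeat on any platform) and treats each kid as an independent fair coin flip. The names `random_girl()`, `count_pairs()` and `struct pair_stats` are hypothetical, made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: xorshift64 generator with a fixed nonzero seed,
   used instead of rand() so results are reproducible everywhere. */
static uint64_t rng_state = 0x9E3779B97F4A7C15ULL;

static uint64_t xorshift64(void)
{
  uint64_t x = rng_state;
  x ^= x << 13;
  x ^= x >> 7;
  x ^= x << 17;
  rng_state = x;
  return x;
}

/* Hypothetical helper: fair coin from the top bit; true = girl, false = boy. */
static bool random_girl(void)
{
  return (xorshift64() >> 63) != 0;
}

struct pair_stats
{
  long bb;                /* boy/boy   */
  long bg;                /* boy/girl  (first kid is the boy) */
  long gb;                /* girl/boy  (first kid is the girl) */
  long gg;                /* girl/girl */
  long at_least_one_boy;
  long at_least_one_girl;
};

/* Generate `pairs` random pairs of kids and tally each pairing. */
static struct pair_stats count_pairs(long pairs)
{
  struct pair_stats s = { 0, 0, 0, 0, 0, 0 };

  assert(pairs > 0);
  for (long i = 0 ; i < pairs ; i++)
  {
    bool first  = random_girl();
    bool second = random_girl();

    if (!first && !second)
      s.bb++;
    else if (!first && second)
      s.bg++;
    else if (first && !second)
      s.gb++;
    else
      s.gg++;

    if (!first || !second)
      s.at_least_one_boy++;
    if (first || second)
      s.at_least_one_girl++;
  }
  return s;
}
```

Calling `count_pairs(10000000)` should put each of the four pairings near 25% and each “at least one” tally near 75%, in line with the Pair Stats table.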

Table: Disclosure table #1—Overview

|  | Value | Percentage |
|---|---:|---:|
| Total # of pairs | 10000000 | 100.0 |
| Disclosed First Kid | 5000671 | 50.0 |
| Disclosed Second Kid | 4999329 | 50.0 |
| Disclosed Girl | 4999440 | 50.0 |
| Disclosed Boy | 5000560 | 50.0 |

Nothing seems wrong here: half the disclosed kids are the first kid of the pair and, independently, half of the disclosed kids are boys. But there is a problem here. For now, I'll leave it to the reader to spot (and it is an issue with the problem itself); I didn't spot it myself until later.

Table: Disclosure table #2—disclosed a Girl

|  | Value | Percentage |
|---|---:|---:|
| Disclosed Girl | 4999440 | 100.0 |
|   First kid | 2499547 | 50.0 |
|   Second kid | 2499893 | 50.0 |
| Disclosed Girl, other girl | 2498949 | 50.0 |
|   First kid | 1249211 | 25.0 |
|   Second kid | 1249738 | 25.0 |
| Disclosed Girl, other boy | 2500491 | 50.0 |
|   First kid | 1250336 | 25.0 |
|   Second kid | 1250155 | 25.0 |
| Disclosed Girl, pick girl, correct | 2498949 | 50.0 |
|   First kid | 1249211 | 25.0 |
|   Second kid | 1249738 | 25.0 |
| Disclosed Girl, pick girl, wrong | 2500491 | 50.0 |
|   First kid | 1250336 | 25.0 |
|   Second kid | 1250155 | 25.0 |
| Disclosed Girl, pick boy, correct | 2500491 | 50.0 |
|   First kid | 1250336 | 25.0 |
|   Second kid | 1250155 | 25.0 |
| Disclosed Girl, pick boy, wrong | 2498949 | 50.0 |
|   First kid | 1249211 | 25.0 |
|   Second kid | 1249738 | 25.0 |

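[Editor's note: The first three lines of this particular table give the number of times a girl was disclosed, then how many of those times she was the first kid and how many times the second kid of the pair. The line labeled “Disclosed Girl, pick girl, correct” can be read as: a girl was disclosed, we picked the other kid as being a girl, and we were correct.]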
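The disclosure rule behind these tables can be sketched the same way. This is a hypothetical version, not the author's code: the parent picks one of the two kids at random (the assumption at this point in the story), and whenever the disclosed kid is a girl we count how often the other kid is a girl too. The generator and the names `coin()` and `other_girl_given_random_disclosure()` are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: xorshift64 generator with a fixed nonzero seed. */
static uint64_t rng_state = 0x9E3779B97F4A7C15ULL;

static uint64_t xorshift64(void)
{
  uint64_t x = rng_state;
  x ^= x << 13;
  x ^= x >> 7;
  x ^= x << 17;
  rng_state = x;
  return x;
}

/* Hypothetical helper: fair coin from the top bit of the generator. */
static bool coin(void)
{
  return (xorshift64() >> 63) != 0;
}

/* The parent picks one of the two kids at random and tells us that kid's
   sex.  Returns the fraction of "it's a girl" disclosures in which the
   other kid is also a girl. */
static double other_girl_given_random_disclosure(long pairs)
{
  long disclosed_girl = 0;
  long other_girl     = 0;

  assert(pairs > 0);
  for (long i = 0 ; i < pairs ; i++)
  {
    bool kid[2];               /* true = girl */
    kid[0] = coin();
    kid[1] = coin();

    int told = coin() ? 1 : 0; /* which kid the parent mentions */

    if (kid[told])
    {
      disclosed_girl++;
      if (kid[1 - told])
        other_girl++;
    }
  }
  return disclosed_girl ? (double)other_girl / (double)disclosed_girl : 0.0;
}
```

With a large `pairs` argument, the returned fraction should hover around 0.5, the same 50/50 split the table shows.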

Well … XXXX! I was wrong! The odds are 50/50. I was all set to start posting this when I noticed Vorlath conceding the 2/3 position in this follow-up post [9].

I must have missed something in the program.

Okay, what if I exclude the boy/boy pairs from consideration entirely? How do the odds change then? One two-line patch later and …

Table: Number of kids

|  | Value | Percentage |
|---|---:|---:|
| Total # of kids | 15000398 | 100.0 |
| Boys | 4998619 | 33.3 |
| Girls | 10001779 | 66.7 |

Okay, the numbers are 75% of what we had … so far, so good.

Table: Pair Stats

|  | Value | Percentage |
|---|---:|---:|
| Total # of pairs | 7500199 | 100.0 |
| Boy/Boy | 0 | 0.0 |
| Boy/Girl | 2500052 | 33.3 |
| Girl/Boy | 2498567 | 33.3 |
| Girl/Girl | 2501580 | 33.4 |
| At least one Boy | 4998619 | 66.6 |
| At least one Girl | 7500199 | 100.0 |

Yes, that's what we'd expect after dropping a quarter of all pairings.

Table: Disclosure table #1—Overview

|  | Value | Percentage |
|---|---:|---:|
| Total # of pairs | 7500199 | 100.0 |
| Disclosed First Kid | 3750492 | 50.0 |
| Disclosed Second Kid | 3749707 | 50.0 |
| Disclosed Girl | 5002113 | 66.7 |
| Disclosed Boy | 2498086 | 33.3 |

Table: Disclosure table #2—disclosed a Girl

|  | Value | Percentage |
|---|---:|---:|
| Disclosed Girl | 5002113 | 100.0 |
|   First kid | 2500888 | 50.0 |
|   Second kid | 2501225 | 50.0 |
| Disclosed Girl, other girl | 2501580 | 50.0 |
|   First kid | 1250803 | 25.0 |
|   Second kid | 1250777 | 25.0 |
| Disclosed Girl, other boy | 2500533 | 50.0 |
|   First kid | 1250085 | 25.0 |
|   Second kid | 1250448 | 25.0 |
| Disclosed Girl, pick girl, correct | 2501580 | 50.0 |
|   First kid | 1250803 | 25.0 |
|   Second kid | 1250777 | 25.0 |
| Disclosed Girl, pick girl, wrong | 2500533 | 50.0 |
|   First kid | 1250085 | 25.0 |
|   Second kid | 1250448 | 25.0 |
| Disclosed Girl, pick boy, correct | 2500533 | 50.0 |
|   First kid | 1250085 | 25.0 |
|   Second kid | 1250448 | 25.0 |
| Disclosed Girl, pick boy, wrong | 2501580 | 50.0 |
|   First kid | 1250803 | 25.0 |
|   Second kid | 1250777 | 25.0 |

And it's still 50/50! Am I missing anything else?
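The boy/boy exclusion can be sketched as a `continue` ahead of the disclosure step. This is a hypothetical reconstruction of the “two-line patch”, not the author's actual change: it assumes boy/boy pairs are simply skipped rather than redrawn (which the 7,500,199-pair total suggests), and it keeps the random-kid disclosure rule. The generator and the names `coin()`, `run_without_boy_boy()` and `struct excluded_bb_stats` are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: xorshift64 generator with a fixed nonzero seed. */
static uint64_t rng_state = 0x9E3779B97F4A7C15ULL;

static uint64_t xorshift64(void)
{
  uint64_t x = rng_state;
  x ^= x << 13;
  x ^= x >> 7;
  x ^= x << 17;
  rng_state = x;
  return x;
}

/* Hypothetical helper: fair coin from the top bit of the generator. */
static bool coin(void)
{
  return (xorshift64() >> 63) != 0;
}

struct excluded_bb_stats
{
  long kept;            /* pairs left after dropping boy/boy         */
  long disclosed_girl;  /* times the randomly chosen kid was a girl  */
  long other_girl;      /* ... and the other kid was a girl as well  */
};

static struct excluded_bb_stats run_without_boy_boy(long pairs)
{
  struct excluded_bb_stats s = { 0, 0, 0 };

  assert(pairs > 0);
  for (long i = 0 ; i < pairs ; i++)
  {
    bool kid[2];                /* true = girl */
    kid[0] = coin();
    kid[1] = coin();

    if (!kid[0] && !kid[1])     /* the assumed "patch": skip boy/boy pairs */
      continue;

    s.kept++;

    int told = coin() ? 1 : 0;  /* still a randomly chosen kid */
    if (kid[told])
    {
      s.disclosed_girl++;
      if (kid[1 - told])
        s.other_girl++;
    }
  }
  return s;
}
```

Under those assumptions, about 75% of the pairs survive, about two-thirds of the disclosures are girls, and the other kid is still a girl only about half the time, which is the same 50/50 the tables show.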

Okay, after re-reading even more comments [10] and looking more closely at the original problem statement:

Let's say, hypothetically speaking, you met someone who told you they had two children, and one of them is a girl. **What are the odds that person has a boy and a girl?**

“Coding Horror: The Problem of the Unfinished Game [11]”

Oh, there's an unstated assumption going on: namely, which gender the hypothetically speaking parent will reveal! So far, I've had the hypothetically speaking parent disclose a randomly picked child (first or second), who could be either a girl or a boy. Add a few more lines to force the parent to disclose a girl (if there is one) and …

Table: Disclosure table #2—disclosed a Girl

|  | Value | Percentage |
|---|---:|---:|
| Disclosed Girl | 7500174 | 100.0 |
|   First kid | 4999692 | 66.7 |
|   Second kid | 2500482 | 33.3 |
| Disclosed Girl, other girl | 2501019 | 33.3 |
|   First kid | 2501019 | 33.3 |
|   Second kid | 0 | 0.0 |
| Disclosed Girl, other boy | 4999155 | 66.7 |
|   First kid | 2498673 | 33.3 |
|   Second kid | 2500482 | 33.3 |
| Disclosed Girl, pick girl, correct | 2501019 | 33.3 |
|   First kid | 2501019 | 33.3 |
|   Second kid | 0 | 0.0 |
| Disclosed Girl, pick girl, wrong | 4999155 | 66.7 |
|   First kid | 2498673 | 33.3 |
|   Second kid | 2500482 | 33.3 |
| Disclosed Girl, pick boy, correct | 4999155 | 66.7 |
|   First kid | 2498673 | 33.3 |
|   Second kid | 2500482 | 33.3 |
| Disclosed Girl, pick boy, wrong | 2501019 | 33.3 |
|   First kid | 2501019 | 33.3 |
|   Second kid | 0 | 0.0 |

Sheesh!
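The change that finally produced 2/3 can be sketched like this, again as a hypothetical version rather than the author's code. Boy/boy pairs have no girl to mention and are skipped, and the parent always mentions a girl: the first kid if she is one, otherwise the second. The generator and the names `coin()`, `run_forced_girl()` and `struct forced_girl_stats` are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: xorshift64 generator with a fixed nonzero seed. */
static uint64_t rng_state = 0x9E3779B97F4A7C15ULL;

static uint64_t xorshift64(void)
{
  uint64_t x = rng_state;
  x ^= x << 13;
  x ^= x >> 7;
  x ^= x << 17;
  rng_state = x;
  return x;
}

/* Hypothetical helper: fair coin from the top bit of the generator. */
static bool coin(void)
{
  return (xorshift64() >> 63) != 0;
}

struct forced_girl_stats
{
  long disclosed_girl;  /* pairs with at least one girl            */
  long first_kid_told;  /* the disclosed girl was the first kid    */
  long other_boy;       /* the other kid is a boy                  */
};

static struct forced_girl_stats run_forced_girl(long pairs)
{
  struct forced_girl_stats s = { 0, 0, 0 };

  assert(pairs > 0);
  for (long i = 0 ; i < pairs ; i++)
  {
    bool kid[2];               /* true = girl */
    kid[0] = coin();
    kid[1] = coin();

    if (!kid[0] && !kid[1])    /* no girl to mention: skip boy/boy */
      continue;

    /* Always mention a girl: the first kid if she is one, else the second. */
    int told = kid[0] ? 0 : 1;

    s.disclosed_girl++;
    if (told == 0)
      s.first_kid_told++;
    if (!kid[1 - told])
      s.other_boy++;
  }
  return s;
}
```

Both ratios should come out near 2/3: the other kid is a boy about two-thirds of the time, and the disclosed girl is the first kid about two-thirds of the time, which lines up with the last table.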

So, I suck at statistics, and statistical word problems are hard to write properly.
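For the record, the same two answers fall out of Bayes' rule once the disclosure rule is written down. A short derivation, assuming all four pairings are equally likely (as the Pair Stats tables bear out), with D standing for “the parent says one of them is a girl”:

```latex
% Prior: P(BB) = P(BG) = P(GB) = P(GG) = 1/4
% D = "the parent says one of the kids is a girl"
\begin{align*}
\text{Random kid disclosed:}\quad
  & P(D \mid GG) = 1,\quad P(D \mid BG) = P(D \mid GB) = \tfrac{1}{2},\quad P(D \mid BB) = 0 \\
P(\text{boy and girl} \mid D)
  &= \frac{\tfrac14 \cdot \tfrac12 + \tfrac14 \cdot \tfrac12}
          {\tfrac14 \cdot 1 + \tfrac14 \cdot \tfrac12 + \tfrac14 \cdot \tfrac12}
   = \frac{1/4}{1/2} = \frac{1}{2} \\[6pt]
\text{Girl disclosed if there is one:}\quad
  & P(D \mid GG) = P(D \mid BG) = P(D \mid GB) = 1,\quad P(D \mid BB) = 0 \\
P(\text{boy and girl} \mid D)
  &= \frac{\tfrac14 + \tfrac14}{\tfrac14 + \tfrac14 + \tfrac14}
   = \frac{1/2}{3/4} = \frac{2}{3}
\end{align*}
```

The only thing that changes between the two lines is P(D | BG) and P(D | GB), which is exactly the unstated assumption in the word problem.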

And now I can put this problem to rest.

[1] http://www.imdb.com/name/nm0000354/

[2] http://www.imdb.com/title/tt0128442/

[3] /boston/2009/01/05.1

[4] http://www.codinghorror.com/blog/archives/001203.html

[5] http://www.codinghorror.com/blog/archives/001204.html

[6] http://my.opera.com/Vorlath/blog/2009/01/04/sample-space

[7] http://my.opera.com/Vorlath/blog/

[8] /boston/2009/01/09/kids.c

[9] http://www.codinghorror.com/blog/archives/001204.html

[10] http://www.codinghorror.com/blog/archives/001204.html

[11] http://www.codinghorror.com/blog/archives/001203.html
