There’s a certain country where everybody wants to have a son. Therefore each couple keeps having children until they have a boy; then they stop. What fraction of the population is female?
Well, of course, you can’t know for sure, because, by some extraordinary coincidence, the last 100,000 families in a row might have gotten boys on the first try. But in expectation, what fraction of the population is female? In other words, if there were many such countries, what fraction would you expect to observe on average?
I first heard this problem decades ago, and so, perhaps, did you. It comes up in job interviews at places like Google. The answer they expect is simple, definitive and wrong.
And no, it’s not wrong because of small discrepancies between the number of male and female births, or because of anything else that’s extraneous to the spirit of the problem. It’s just really wrong. The correct answer, unlike the expected one, is not simple.
So: Are you smarter than the folks at Google? What’s the answer?
I’d like to include a hat tip here, but I don’t want to make it too easy for you to cheat, so I’ll hat tip when I give you the solution a few days from now. Unless, of course, it shows up in comments first.
The proportion of people who have no girls is 1/2. The proportion with 1 girl is 1/4. The proportion with 2 girls is 1/8. And so on. Therefore the E(Number of Daughters) in any given family is:
Sum(n=1:Inf, (n-1)/2^n)
Which converges to 1. Since all families have 1 son, the proportion of children who are girls is 1/2.
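For anyone who wants to check the series numerically, here is a minimal sketch using exact fractions. The function name and the cutoffs are just illustrative choices, not part of the original argument.

```python
from fractions import Fraction


def expected_daughters(max_children):
    """Partial sum of (n - 1) / 2**n for n = 1..max_children."""
    return sum(Fraction(n - 1, 2**n) for n in range(1, max_children + 1))


if __name__ == "__main__":
    # The partial sums creep up towards 1 as the cutoff grows.
    for cutoff in (5, 10, 20, 40):
        print(cutoff, float(expected_daughters(cutoff)))
```

The partial sums approach 1 from below, which is consistent with one expected daughter per family alongside the one son.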
An even more elegant way of proving it is by saying that the probability of any given birth being a girl is always 1/2. Since expectation sums regardless of independence the population expectation is still 1/2.
Let N be the number of daughters a couple has, then P(N=k) = 1/2^(k+1), assuming it’s equally likely to have a boy or girl at each birth.
This implies that EN = \sum_{N=0}^{\infty} N/2^(N+1). By the root test n(th)root(n/2^(n+1)) = n(th)root(1/2) * n(th)root(n/2^n) -> 1 * (1/2) = 1/2 < 1, the sum converges. However, I'm not really sure how to sum this.
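One standard way to sum a series of this shape is to differentiate the geometric series. The derivation below is a sketch, with x standing in for 1/2:

```latex
\begin{align*}
\sum_{N=0}^{\infty} x^{N} &= \frac{1}{1-x}, \qquad |x| < 1 \\
\sum_{N=0}^{\infty} N x^{N} &= \frac{x}{(1-x)^{2}}
  \quad \text{(differentiate both sides, then multiply by } x\text{)} \\
\sum_{N=0}^{\infty} \frac{N}{2^{N+1}}
  &= \frac{1}{2} \cdot \frac{1/2}{(1 - 1/2)^{2}} = 1
\end{align*}
```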
This is the expected number of daughters that a couple expects to have and since each couple only has one son, the ratio of sons to daughters is 1:EN. Thus the fraction of the child population that is girls EN/(1+EN).
Running a quick computer simulation, it looks like EN = 1, so the ratio of daughters to sons is 1:1 and the fraction of girls is 1/2.
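A simulation along these lines might look like the sketch below. It assumes a fair coin at every birth; the function name, seed and family count are arbitrary illustrative choices.

```python
import random


def simulate_mean_daughters(num_families, seed=0):
    """Estimate EN, the average number of daughters per family."""
    rng = random.Random(seed)
    total_daughters = 0
    for _ in range(num_families):
        # Each birth is a girl with probability 1/2; stop at the first boy.
        while rng.random() < 0.5:
            total_daughters += 1
    return total_daughters / num_families


if __name__ == "__main__":
    en = simulate_mean_daughters(200_000)
    print("estimated EN:", en)
    print("EN / (1 + EN):", en / (1 + en))
```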
The first comment is the standard answer. Probably the “google answer” that the moderator refers to.
Consider these questions that may spoil the simple approach: Will any parents have 100 girls then a boy? What about twins?
I’d guess the expected answer is 50% female. Still working on the right answer, though :)
Okay, I’ve run a few simulations, and my answer still stands at 50%. I’m curious to know the actual solution now…
I would agree with Doug here, my calculations show the same result. But it sounds too simple and obvious, which might mean that it is, according to Steve, the “simple, definitive and wrong” result.
Whilst Doug did exactly what I would have done… I suspect Doug is wrong, because if they have a boy *they stop*.
So there will always be slightly less girls than boys. How much less? I’m not sure. :P
Also, this is just a modified version of gambler’s ruin.
http://en.wikipedia.org/wiki/Gambler's_ruin
If the answer was anything other than 50% one could easily outperform the stock market (or any martingale for that matter). Just invest some money on Day 1 and keep it in the market until you’ve had one up day. If your expected number of up days was greater than 1/2, you have now just beat the market and are on the road to riches. (If it’s less just short).
I agree with Doug:
“The proportion of people who have no girls is 1/2. The proportion with 1 girl is 1/4. The proportion with 2 girls is 1/8. And so on. Therefore the E(Number of Daughters) in any given family is:
Sum(n=1:Inf, (n-1)/2^n)”
…but my calculation shows it converges to 2 meaning expected number of girls is 2 for a family. The expected number of boys in a family is given by Sum{n=1:Inf} 1/2^n, which converges to 1.
Should I conclude that there would be 33% boys and 67% girls?
50-50. By simple natural selection logic, after a few generations, you can expect (in aggregate) the probability any one person gives birth to a male to drop to 25%.
Quick run through. First child, half are boys and half are girls, so there is the same number of boys and girls as the first child.
The half that had girls will go on to have a second child, half of which will again be boys. So of those with 2 children there will be equal numbers of boys and girls as the second child.
This will continue, with equal numbers of boys and girls in each subsequent child. So I guess the “expected” answer is 50% That at least would be my first answer if asked at an interview.
So if you started with 128 people, 64 would have 1 boy and 64 would have a girl as first child. So 64 would have only one boy.
32 would have a boy and a girl
16 would have a boy and 2 girls
8 would have a boy and 3 girls
4 would have a boy and 4 girls
2 would have a boy and 5 girls
1 would have a boy and 6 girls
1 would have 7 girls
That one would go on to have another child, but with a 50% expectation of boy or girl, it does not alter the outcome.
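The tally above can be reproduced mechanically. This sketch assumes, as the example does, that exactly half of the remaining families get a boy in each round, so it only makes sense when the starting count is a power of two. The function name is hypothetical.

```python
def tally_cohort(num_families=128):
    """Count boys and girls when exactly half the remaining families stop each round."""
    boys = girls = 0
    still_trying = num_families
    girls_so_far = 0  # daughters already born to each family still trying
    while still_trying > 1:
        stopping = still_trying // 2        # these families get their boy now
        boys += stopping
        girls += stopping * girls_so_far
        still_trying -= stopping
        girls_so_far += 1                   # everyone else just had a girl
    # The one remaining family has only girls so far.
    girls += still_trying * girls_so_far
    return boys, girls


if __name__ == "__main__":
    print(tally_cohort(128))
```

For 128 families this gives 127 boys and 127 girls, counting the last family's 7 girls and no boy yet.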
However, this is only the number of children the first generation has. We want to know the number of females alive at a particular moment. If we assume an average of 2 years between children, then the only child boys are 14 before the 7th girl is born. Assuming equal life expectancy, these will die first. This will result in more girls being alive at any particular moment. However, it is possible that this effect will vanish after a few generations. It becomes clear that it is not such a simple model.
On average the sex ratio is approximately 50%; this is caused by the genetic equilibrium. Some genes increase the probability of having a boy, some genes increase the probability of having a girl. These genes are not evenly distributed across the population, so for some pairs the probability of having a boy is, let’s say, 55%, and for others the probability of having a girl is 55%. The interesting conclusion is that those pairs with a 55% probability of having a girl will have more children on average if they follow the rule of having only one son. So the desire to have one son will increase the ratio of girls to above 50%; the size of this effect depends on the distribution of sex-ratio genes across the population.
It is possible to model the resulting genetic disequilibrium, and it is interesting to speculate what will restore it in the long run, perhaps a mutation that creates a genetic predisposition to rebel against the one-boy rule will restore it.
From the last time I participated in this discussion:
“””
No policy changes can possibly affect the gender ratio.
For example, “one child, but you can try again if it’s a girl” means half the families have 1 boy, a quarter have 1 girl + 1 boy, and the last quarter have 2 girls. That’s still a 50/50 ratio overall. “Keep trying until you have a boy” produces a similar effect.
If you want to affect the gender ratio, you need selective abortion or infanticide. I don’t support either of these strategies, but the fact remains that if you see a widely disparate gender ratio, that’s the cause to look for.
“””
Every child born is equally likely to be male or female. That’s not quite the same as “what fraction of the population is female?” though.
A couple ways you could get answers other than 1/2:
– per-child expectation isn’t exactly 50/50; I’ll let the biologists address this one.
– one gender survives longer than the other, on average, so more of them are alive now to be counted as members of the population.
I suspect Google’s “right” answer assumes that a woman’s capacity for offspring is infinite. On a similar theme, I hear Google have an interview question to do with dividing up the bounty between captain and crew on a pirate ship which also requires one to completely ignore any kind of understanding of the world to get the “right” answer…
Figuring a 50/50 chance of boy or girl (which isn’t exactly accurate I know), I looked at it this simple way: say there are 1000 families. For a first child, 500 have a boy and 500 have a girl. The families with boys stop, and families with girls have another child – 250 boys and 250 girls. This continues on for subsequent kids – 125 boys and 125 girls (and then you just have to overlook the partial kids being born – 62.5 each, etc). The point is, looking at it this way, also, there is the same ratio of boys to girls as there is here – approximately 50-50. I assume this is the simple, definitive, and wrong answer you’re talking about, though. It doesn’t get much simpler than that.
Okay, I’ve knocked up a quick simulation, and I’m consistently getting slightly more girls than boys now. I hadn’t, however, considered running it for more than one generation.
My result makes sense, I think, because a family has AT MOST one boy, and AT LEAST 0 girls, but they can have a theoretically infinite number of girls before they have that boy. All it takes is one improbable family and the average is dragged above 50/50.
Jennifer is almost there. The problem is that in the limit, you have one last family who can keep on going theoretically infinitely until they get a boy.
I’d like to take a stab at what’s wrong with the 50% estimate.
First, an example about dice: I am an eminent dice researcher. I roll two dice, and write down the value. I do this a lot of times, say 100, average all the values, and I’m going to get around 7. I have the same chance of rolling 6 as 8, 5 as 9, 4 as 10, 3 as 11, and 2 as 12.
Now suppose I change the rules. When I roll below a 7, I record the roll, but I tell myself to add a ‘bonus roll’ at the end of the experiment. For now, assume that low-valued bonus rolls don’t add another bonus roll–steven’s original example includes the infinite rolling business, but let’s ignore that for the time being.
Let’s look at two trials under the new rules, where I start out choosing to roll 5 dice. In trial 1, the first 5 rolls come up 7,8,10,8,7, and in trial 2, the first 5 rolls come up 7,6,4,6,7. At this point, trial 1 is equally likely to happen as trial 2. If I were to compare the averages of these two trials at the 5 roll mark, they would deviate from 7 by the same amount. BUT we’re forgetting something: my rules require the trial that came out 7,6,4,6,7 to roll 3 more pairs of dice.
There are many, many different ways those 3 pairs of dice could come out, but I trust that we can all agree on the fact that the average roll from those extra 3 pairs of dice will most likely be 7. Not surprisingly, increasing the number of rolls in a trial where the first few rolls come out unusually low is most likely to raise the average.
See what happened there? The rules force unusually low-yielding trials to roll more often, and it lets unusually high-yielding trials stay as they are. Let us all agree that these rules will skew the results I get in favor of high rolls. I will very likely get an average above 7 in each trial I run under the newer rules.
Now for us to finally get something out of that overdrawn example: girls = low rolls, boys = high rolls.
And we get the result that our rules will give us populations with a higher percentage of boy offspring.
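One way to probe the bonus-roll argument is a quick simulation. This is only a sketch under the rules as stated (5 starting rolls, one bonus roll per roll below 7, no bonus on bonus rolls); the function names are made up for illustration. It reports two different numbers: the average of the per-trial averages, and the pooled average over every roll in every trial.

```python
import random


def run_trial(rng, base_rolls=5):
    """One trial: base_rolls two-dice rolls, plus one bonus roll per roll below 7."""
    rolls = [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(base_rolls)]
    bonus = sum(1 for r in rolls if r < 7)
    # Bonus rolls never trigger further bonus rolls.
    rolls += [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(bonus)]
    return rolls


def compare_averages(num_trials=100_000, seed=0):
    """Return (mean of per-trial averages, pooled average over all rolls)."""
    rng = random.Random(seed)
    sum_of_trial_means = 0.0
    total = count = 0
    for _ in range(num_trials):
        rolls = run_trial(rng)
        sum_of_trial_means += sum(rolls) / len(rolls)
        total += sum(rolls)
        count += len(rolls)
    return sum_of_trial_means / num_trials, total / count


if __name__ == "__main__":
    print(compare_averages())
```

In this sketch the per-trial average does land a little above 7, while the pooled average over all rolls stays close to 7, so it matters which of the two is being asked about.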
If you assume they get infinite tries at having a boy, then your population ends up being infinity due to that poor couple in the limit that just can’t seem to have a boy. Then when you go to figure out the proportion of boys in the population you get:
#boys/infinity = 0
So 0% of the population are boys. Put model in garbage.
Here’s a wrinkle. Not all couples with n daughters will succeed in having child n+1.
Consider the 2 family case
1. BB – 25% converges to 0% girls
2. BG – 25% converges to 50% girls
3. GB – 25% converges to 50% girls
4. GG – 25% does not converge to 100% girls
the tree of outcomes is:
100% boys x 50% of the time
50% (or 1/2) boys x 25% of the time
1/3 boys x 12.5% (1/8) of the time …
or % boys = Sum[ 1 / (n x 2^n) ], for n = 1 to infinity.
I don’t know if there’s a closed-form solution, but Excel produces after n = 18 that the % boys = 69.3% and thus the % girls = 30.7%
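The spreadsheet step can be reproduced in a few lines. This sketch compares the partial sum with the natural log of 2, which is what the number 69.3% suggests as a candidate closed form; the function name is an illustrative choice.

```python
import math


def boys_fraction_partial_sum(terms):
    """Partial sum of 1 / (n * 2**n) for n = 1..terms."""
    return sum(1.0 / (n * 2**n) for n in range(1, terms + 1))


if __name__ == "__main__":
    approx = boys_fraction_partial_sum(18)
    print("partial sum, 18 terms:", approx)
    print("ln(2):                ", math.log(2))
```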
The geometric distribution gives the expected number of trials needed for an event to happen. In this case, it is the number of births before a boy is born. The expected value of the geometric distribution is 1/p. With p set to 50%, the expected number of children born per family is 2. Hence, the proportion “in expectation” must be 50-50 per family and also in aggregate.
The solution that @Doug and others got can be arrived at by noting that the expected number of female births in each family follows a negative binomial distribution (p=.5, r=1), http://en.wikipedia.org/wiki/Negative_binomial_distribution
I suspect this is the “incorrect” solution Steve is referring to. Could it be that the correct answer takes into account death rates? Assume that children that are born first also die first. Since, in each family, the girls are born before the boy, the girls would also die before the boy, so that boys will comprise more than 50% of the population. I haven’t worked this out analytically, but it seems like this should be true.
Am I on the right track?
Let me step in here to assure all of you that:
1) The answer is definitely not 50%.
2) The reason it is not 50% has nothing to do with multiple births. It has nothing to do with changes in the probability of any given child being a girl (that probability is assumed to be 50%, always). It has nothing to do with limits on the number of children a given woman can have.
3) Nobody has yet gotten to the heart of the matter.
4) There’s something very very interesting going on here.
n+1: you have proven that the # of families with % of boys exceeding 50% will exceed the # of families with % of girls exceeding 50%, but that’s a different claim than the claim that # of boys > # of girls, since girl families have more children than boy families.
My earlier comment about the boys dying first is wrong, because in each “cycle” of births there are equal numbers of boys and girls. For each only child boy, there are equal numbers of same age girls with siblings.
I don’t see how the numbers of girl and boy offspring of the “seed” parents can be other than 50%
How about fluctuations from statistical means producing a shift? If in one country there are more boys born, this deviation from the mean persists. In the neighboring country with more girls born, the deviation is removed by subsequent births. Say country A, through a statistical fluke, gets 100% boys in the first generation. There will be no girls born. Country B gets 100% girls. There will be a second “batch” of children, half of which will be boys. The average of the two countries is more boys than girls.
My calculation estimates that the result would be about 56% girls. As I’m sure that everyone else has noticed, some families have more boys than girls, some have more girls than boys, and some are split relatively evenly, for biological reasons. In this model families that have more boys than girls will have fewer children overall than families that have more girls than boys.
However, based on Steve’s last comment it doesn’t seem that this is the right argument either.
If you look at a single couple and ask what the expected proportion of girls you will find it will be:
0 x 1/2 + 50% x (1/2) ^ 2 + 66.66% x (1/2) ^ 3 + …
and so
Sum from 1 to infinity of (n/(n+1)) x (1/2) ^ (n+1)
which comes out at about 0.30685.
Can we then say that if each individual family has an expectation of 30.685% girls then the proportion of girls across all families is also 30.685%?
I don’t remember the rules about combining individual expectations but I think if all the families are independent that might be true.
So 30.685%.
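Whether a per-family expectation carries over to the whole population can be tested directly. The sketch below assumes fair, independent births and measures two things over the same simulated families: the average of each family's own girl fraction, and the girl fraction of all children pooled together. The names and seed are illustrative.

```python
import random


def simulate_families(num_families, seed=0):
    """Return (mean per-family girl fraction, pooled girl fraction)."""
    rng = random.Random(seed)
    fraction_sum = 0.0
    total_girls = 0
    for _ in range(num_families):
        girls = 0
        while rng.random() < 0.5:      # girl; keep going until a boy arrives
            girls += 1
        fraction_sum += girls / (girls + 1)   # this family's girls / children
        total_girls += girls
    mean_family_fraction = fraction_sum / num_families
    # Every family contributes exactly one boy to the pooled count.
    pooled_fraction = total_girls / (total_girls + num_families)
    return mean_family_fraction, pooled_fraction


if __name__ == "__main__":
    print(simulate_families(200_000))
```

In this sketch the per-family average sits near 0.307 while the pooled fraction sits near 0.5, so the two quantities are not interchangeable even when families are independent.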
I am going to add that the key point is NOT in any way biological; it has nothing to do with some families being more prone to boys than others, or boys having different lifespans than girls, or any such thing. This is not a trick question. There is a key purely mathematical reason why the answer is not 50%.
Some things aren’t clear from the way you present the puzzle.
It is clear that you are fixing the probability of a boy or girl at 50/50 for any birth. Okay.
But are you looking at the immediate or incremental effect, or long term? Are you looking at what happens after many generations? Is it a closed system, or are new men & women from other countries plentiful? When you ask about percentage of the population are you considering lifespans? Are you assuming monogamy, and trying to simulate the effect the gender ratio of children will have on the number of future couples?
Hmm, my calculus is rusty. But is the key mathematical reason at all related to the intuitive idea that this biased stopping rule entails that no family will have more than one boy? Whereas many families will have more than one girl?
At any moment in time, a family can have N children.
N=1
Half of these families have boys, half have girls.
N=2
Three-quarters of the children in these families are girls.
N=3
5/6 of the children in these families are girls.
N=4
7/8 of the children in these families are girls.
etc.
In conclusion, more than half of all children at any moment in time are girls.
Hmm. Maybe there’s more than one steady-state ‘attractor’ solution. And maybe it isn’t steady-state. And starting with a population that has zero children isn’t realistic, so that might eliminate the 50/50 solution.
So… I’m guessing that you really ought to start with a non-zero number of children, in a finite population, finite maturation rate, finite birth and death rates, then go on to derive the correct non-linear differential population equations, and then do the global attractor analysis. Not so simple.
Or in other words, once a family has finished having children, it has on average an even number of boys and girls.
But in every family that will have children in the future, there are at least as many girls as boys (if there are no children yet) or there are more girls than boys (if there is at least 1 child so far).
I note that Mr Landsburg did not ask for the sex-ratio of total births. Rather he asked us for the sex-ratio of the current population. That is a different matter. Suppose, for example, that single children are much healthier and live longer than children with siblings. Only boys are single children.
But that may be a non-mathematical explanation, which Landsburg’s comments seem to rule out.
Steve –
Does your answer have to do with timing? By that, I mean:
A family that has a boy, and therefore has stopped having kids, would have an equal number of boys and girls, in expectation. A family that is still trying, would have at least as many girls as boys. At any given time, the population is comprised of both types of families. Taking the expectation across both types of families says that there will be more girls in the population than boys.
Kausic has it right.
Each family has only one boy. How many girls?
Prob Number of girls
1/2 0
1/2 1
1/4 2
1/8 3
1/16 4
1/32 5
The expected number of girls = 1/2 + 2/4 + 3/8 + 4/16 + 5/32 . . .
The first two numbers sum to 1. The rest of the series sums to 1, as well. So the average family has 1 boy and 2 girls.
Note: I could not figure out the trick to rewrite the sum of the infinite series to solve the thing mathematically, so I did “convergence by spreadsheet” and found that the sum of the rest of the series converges to 1.
I think Steve means this. Couples toss a perfectly fair coin. If they get H they mark Boy on their door and stop. If they get T they mark Girl on their door and continue. The door always has room for more marks.
In short he asserts the reasonably simple proof for 50% is missing something about probability or convergence or some purely mathematical thing.
Bob Ayers:
But that may be a non-mathematical explanation, which Landsburg’s comments seem to rule out.
Yes, I’m ruling it out. You should assume either that all children live forever, or that I was really asking about sex-ratio of total births, not of the current population.
MattF:
So… I’m guessing that you really ought to start with a non-zero number of children, in a finite population, finite maturation rate, finite birth and death rates, then go on to derive the correct non-linear differential population equations, and then do the global attractor analysis. Not so simple.
All very interesting, but completely irrelevant to what I’ve got in mind.
Cos: It’s a closed system, it’s always been this way, there’s no immigration or emigration, etc. etc.
KenB:
I think Steve means this. Couples toss a perfectly fair coin. If they get H they mark Boy on their door and stop. If they get T they mark Girl on their door and continue. The door always has room for more marks.
Yes. Exactly.
Right, so these immortal children, can they themselves have children? If so is this society polyamorous, i.e. potential fecund pairings are unlimited by the number of virile males?
Incidentally, I did find the answer on google, but I don’t understand it. :P So I’ll wait for one of the interlocutors here to figure it out and read their reasoning.
Maybe this a polygamous Mormon cult which excommunicates many males? It would certainly help get around the limits to childbirth.
OK so I’ll take a stab at this.
Let there be N couples flipping coins in this way.
We let B be the number of boys and G the number of girls born.
B is deterministically N. G is the sum of N geometric random variables with mean 1. So as N is presumably very large, G is approximately normal.
We care about E[G/(B+G)]=E[G/(N+G)]. Now G is approximately symmetric (being normalish). When G is really large (say 1.5N) this fraction goes up (to 0.6 in this case). But when G is small (say 0.5N) it goes down more dramatically (to about 0.33 in this case).
Now I’m not sure how much of an effect this actually has for N tending to infinity. And I’m certainly not sure this is what you’re looking for, but thought I’d throw it out there.
I do not see how this can be answered on mathematics alone. If the sex ratio in the population veers away from 50%, what assumptions do we need to make regarding coupling for new cohorts? As for the initial population, at some point all but a negligible fraction of them stops reproducing because they all will have a boy.
Wouldn’t there be infinitely many girls (however many the couples have managed to have) to at least one boy? At first I thought that there would be a continuing decline in the number of girls with each generation, but thinking about it more, since there is only ever one boy per couple and one couple could potentially have a near infinite number of girls before having a boy (not the greatest of probabilities, but in the mathematical sense it could happen), it would actually work the other way, where the number of girls could be infinitely larger than the number of boys.
Non-mathematical explanations are explicitly excluded from consideration here, so this is a bit of a tangent, but doesn’t anybody remember that some men (e.g. Henry VIII) are unable to produce male issue?
The effect on the overall distribution of male vs female births is lost in the noise, but the problem insists on duplicating the overall distribution in each and every household.
So what you’d actually get would be the overwhelming majority of parents producing a nice even 50/50 — which would be utterly swamped by the infinite number of girls produced by those fathers who can’t produce sons. Yes, infinite: There’s no end point to the algorithm.
So, to a first approximation, 100% of the children will be female.
This is a stab in the dark but I think the answer you’re looking for might have something to do with caring about the fraction
G/(B+G) instead of the total number of girls and boys born.
Number of boys is deterministically N. Number of girls is distributed approximately normally with mean N. Large values of G will have less effect on G/(B+G) than small values. I’m doubtful this has an actual effect in the limit as N tends to infinity but that’s the best I can think of right now.
I guess what is interesting about this problem is that we have a process that generates a 50/50 expectation of boys and girls by the time it completes (this is the standard Google solution).
Moreover, if the process hasn’t completed yet, then the future expectation of boys and girls is 50/50, because of its memory-less property.
But if the process is still ongoing, then its past performance must have been tilted in favor of girls (because by definition there have been no boys produced by an ongoing process).
Hmm.
Retardo: If we limit each family to one child per year, then at no point does any family have an infinite number of children. This is enough to completely erase the effect you’re imagining.
John Arkwright: Your probabilities sum to more than one, which suggests you need to adjust your probabilities.
jensen’s?
ratio of the expectations
!=
expectation of the ratio
@John Arkwright, you get the math wrong.
The probability that a given family has exactly one girl is 1/4, not 1/2. All your further probabilities are also double what they ought to be. You can easily see that your probabilities can’t be right because they add up to more than 1.
That said, if you do the math right, you get an equal sex ratio for the reasons explained by others previously. And the good professor assures us that is wrong. I can’t wait to find out why.
It seems to me that if each WOMAN reproduces until she has a boy and one male is sufficient to service all women, then the female sex ratio approaches one with a few, very busy males.
jensen’s?
expectation of the ratio
!=
ratio of the expectation
what’s the expected ratio of #girls/#children for one specific family? it’s:
(1/2)*(0/1) + (1/4)*(1/2) + (1/8)*(2/3) + … << 1/2.
We mostly have been starting from a position of a finite, equal number of men and women and testing the proportion of children they would have under the conditions described. I think this must be the wrong approach, but I don’t see the right one.
I don’t have a complete solution, but I *do* have something that suggests that the ratio will not be 0.5.
Suppose that the population consists of just a single family. This family has N children, and due to the way they produced their family, we know that they have 1 boy and N-1 girls, for a sex ratio of (N-1)/N.
Further, the probability that they have N children is 1/2^N. So the expected average of the fraction of girls in the population over all single-family populations is:
Sum from 1 to infinity of (1/2^N)*(N-1)/N
Or:
0/1 * 1/2 +
1/2 * 1/4 +
2/3 * 1/8 +
3/4 * 1/16 +
4/5 * 1/32 +
5/6 * 1/64 +
6/7 * 1/128 + …
This sum converges to ~30.7% girls over all single-family populations. (I did this numerically, I’m not sure how to do it analytically).
Note that the expected ratio of girls to boys is
Sum from 1 to infinity of (1/2^N)*(N-1), which converges to 1.0!
I.e., it makes a difference whether you’re asking for the expected fraction of girls in the population or the expected ratio of girls to boys, because they’re not necessarily related the way you would expect.
I’m not sure how to analytically extend this result to larger populations than a single family, but I expect that as the population gets larger the fraction of girls in the population tends towards 50%.
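Short of an analytic extension, the many-family case can at least be estimated by simulation. This sketch treats each "country" as a fixed number of independent families with fair births, and averages the girl fraction over many such countries. The function and parameter names are hypothetical.

```python
import random


def mean_girl_fraction(families_per_country, num_countries, seed=0):
    """Average of girls / (girls + boys) over many simulated countries."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_countries):
        girls = 0
        for _ in range(families_per_country):
            while rng.random() < 0.5:   # girl; this family keeps going
                girls += 1
        boys = families_per_country     # exactly one boy per family
        total += girls / (girls + boys)
    return total / num_countries


if __name__ == "__main__":
    for size in (1, 2, 10, 100):
        print(size, mean_girl_fraction(size, 20_000))
```

In runs of this sketch the expected fraction starts near 0.31 for a single family and rises towards 0.5 as the number of families grows, staying below it for the sizes tried.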
OK, my first comment is completely wrong, because it is not the case that 1/2 of 1-child families will have girls. If you have 1 girl you will soon become a 2-child family, so the majority of 1-child families at any moment in time will have boys. So never mind that approach.
And my other comments don’t have much promise either, because at any moment in time families that have finished having children have on average more boys than girls.
So I don’t know.
I think Jonathan Kariv has it: E[G/B] = 1, as is intuitive, but E[G/(G+B)] is not 1/2.
E[G/(G+B)] = Sum{i from 1 to infinity} (0.5)^(i+1) * (i/(i+1))
This comes to about 31%.
Not sure what the closed form value is.
Here is my simulation of the problem in Python:
#!/usr/bin/python
import random

BOYS = 1
GIRLS = 0  # indexes into lChildren
NUMFAMILIES = 1000000
lChildren = [0, 0]  # starting off with this number of offspring
random.seed()
for i in range(NUMFAMILIES):
    t = GIRLS
    while t == GIRLS:
        dRand = random.random()  # simulate a birth
        if dRand >= .5:
            t = BOYS  # this will stop the birth loop for the family.
        lChildren[t] = lChildren[t] + 1  # count every birth, boy or girl
print('Girls=' + str(lChildren[GIRLS]) + '. Boys=' + str(lChildren[BOYS]) + '.')
And here is the output for the first few runs:
Girls=1001800. Boys=1000000.
Girls=998676. Boys=1000000.
Girls=998357. Boys=1000000.
Girls=1000451. Boys=1000000.
Girls=1001039. Boys=1000000.
Girls=997328. Boys=1000000.
Girls=999801. Boys=1000000.
Girls=1001145. Boys=1000000.
Girls=1001228. Boys=1000000.
Girls=998136. Boys=1000000.
Girls=1000731. Boys=1000000.
Girls=1001362. Boys=1000000.
Girls=1001195. Boys=1000000.
Girls=1003358. Boys=1000000.
It sure looks close to 50/50, although there are slightly more girls than boys. The average of the number of girls born is 1000329. I assume my simulation is missing something.
I also got ~31% using Excel and a Monte Carlo method.
Sorry, the code did not format properly. Here is the link to the code:
https://docs.google.com/leaf?id=0B9LN7PCVtKrHZGZlZWEzYzktMGQ2Ny00YjdhLTk5MTItYjYwOGYyNjVjMDI1&hl=en&authkey=CNSiy3w
OK, let’s look at the expected number of boys for any given family: 1. That was the easy part.
The expected number of girls is…
0*P(first kid = boy) + 1*P(second kid = boy) + 2*P(third kid = boy) etc.
0*1/2+1*1/4+2*1/8+3*1/16+… = Sum(n=1 to inf) n/2^(n+1). Can anyone do that summation off the top of their head? (I don’t have Mathematica on this computer). I really hope the answer isn’t 1.
That seems to give us the expected number of girls for any given family.
Sorry, in my above equation, that should be the sum from n=0 to infinity, not n=1
Nevermind^^ Was doing something very wrong. Results reverted to 50/50 in Excel when I fixed a major problem
Coincidentally, I interviewed at Google yesterday. I was not asked any questions or brain teasers that were unrelated to computer programming. I’ve interviewed at four companies with Google-like interview processes in the past couple months, and have been asked only one non-programming brain teaser.
@Steve L. and everyone else who denies 50%:
Please explain how the sum of n variables with E(1/2) can ever be anything different than 1/2 * n. Remember we’re not talking about higher moments here, the mean of a sum of variables is ALWAYS equal to the sum of their means.
Forget about the calculations, forget about the simulations, this simple fact from probability theory dictates that the expected number of girl births is always 1/2 the number of births.
@David Pinto:
I haven’t done Python in years but it looks to me like you are not creating 100K families and doing the history of each separately. I.e., you are not simulating the condition that once family J has a boy they stop breeding (and read about the axiom of choice for fun instead). You are flipping a coin for each family regardless of their past.
@Jonathan Campbell: Your calculation works for a country with 1 family. I think it gets closer to 1/2 as we add more families in. But assuming the country is of finite size then it’s never quite 1/2. I’m not sure if this is what Steve wants.
following up on my earlier comment, a friend of mine found a closed-form solution to the sum: [1 / (n x 2^n)], n = 1 to infinity.
The answer is pleasantly: LN(2), arrived at by integrating both sides of: 1 / (1-x) = 1 + x + x^2 + x^3 + …
Thus the portion of boys, found to be:
100% boys x 50% of the time
50% (or 1/2) boys x 25% of the time
1/3 boys x 12.5% (1/8) of the time …
is finally = LN(2) = 69.3%, and the girls = 1 – LN(2) = 30.7%.
Dr. Landsburg indicated that “3) Nobody has yet gotten to the heart of the matter.” but I don’t know that this precludes these from being the final correct answers :)
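The LN(2) value is easy to confirm numerically. This is a minimal Python sketch (the name `boy_share_one_family` is just an illustrative label) that sums 1/(n x 2^n) term by term and compares the result with the natural log of 2.

```python
import math

def boy_share_one_family(terms):
    """Partial sum of 1 / (n * 2**n) for n = 1 .. terms."""
    return sum(1.0 / (n * 2 ** n) for n in range(1, terms + 1))

approx = boy_share_one_family(60)
print(approx, math.log(2))   # the two values should agree closely
print(1 - approx)            # the corresponding share of girls
```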
Jonathan Campbell: The point that the expected value of a ratio is not equal to the ratio of expected values is spot on. Dave B did your same calculation earlier, but I believe you are both computing the expected number of girls each mother will have.
(In what follows, I prefer to work with the percentage of boys, because the number of boys for a fixed number of mothers is not random.)
Each mother will have one boy and a total number of children that has a geometric distribution. Similar to what you and Dave B derived, the expected ratio of boys to children for a particular mother is the summation from 1 to infinity of (1/n)*(1/2)^n which is about 0.6931.
To put this another way, if B_m is the number of boys born to the mth mother, and C_m is the number of children born to the mth mother, then the expected ratio of B_m/C_m is about 0.7. However, the question was about the ratio of the total number of boys (or girls) to the total number of children. This is
(B_1 + B_2 + B_3 + . . . )/(C_1 + C_2 + C_3 + . . . )
and its expected value is not equal to the expected value of B_m/C_m.
For instance, if we have only two mothers in the generation, then they will give birth to 2 boys. The total number of children they will have will have the distribution:
Pr[total = n] = (n-1)(1/2)^n, n=2,3,4,…
The expected ratio of total boys to children will be the summation of
(2/n)(n-1)(1/2)^n
from n=2 to infinity, which drops to about 0.6137. It seems like the more mothers we add to the generation, the closer this ratio will get to 0.5.
I have come to trust the good Professor on this sort of thing, so I’m probably missing something. I’ll try to find more time for this later . . . maybe the issue involves looking at the effect of subtle differences over a few generations?
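To see how quickly the ratio approaches 0.5, here is a rough Python sketch. It assumes the two-mother formula above generalizes to M mothers as Pr[total = n] = C(n-1, M-1)(1/2)^n for n >= M (the usual negative-binomial form; that generalization and the name `expected_boy_fraction` are additions, not part of the comment). The sum is cut off at `max_total`, which is far out in the tail for the values of M tried here.

```python
import math

def expected_boy_fraction(mothers, max_total=2000):
    """Sum of (mothers / n) * Pr[total = n] for n = mothers .. max_total."""
    total = 0.0
    for n in range(mothers, max_total + 1):
        # Probability that the mothers-th boy arrives on birth number n.
        prob = math.comb(n - 1, mothers - 1) / 2 ** n
        total += (mothers / n) * prob
    return total

for m in (1, 2, 5, 20, 100):
    print(m, round(expected_boy_fraction(m), 4))
```

With one mother this should reproduce the 0.6931 figure, and with two mothers the 0.6137 figure.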
Good question.
@Steve: You did not mention a locale. Is it St Petersburg?
Jonathan Campbell:
Ah, I think you’re right.
Let N = # families, G = # girls, B = # boys.
B = N and E(G) = N, normally distributed as N becomes large.
To figure out the standard deviation of G we have to figure out the variance of the number of girls for an individual family.
Using some generating functions I believe this variance for an individual family is 2, so the standard deviation of G as N becomes large is sqrt(2*N). I could be wrong.
So G is normally distributed with mean N and standard deviation sqrt(2*N). The expected proportion of girls is the expectation of G/(G+B) = G/(G+N) = 1 – N/(G+N). N/(G+N) is the reciprocal of 1+G/N, which is normally distributed with mean 2 and standard deviation sqrt(2/N).
So what is 1 – the expected value of the reciprocal of a normal variable with mean 2 and standard deviation sqrt(2/N)?
First we have to figure out the mean and variance
Then B = N and as N becomes large G is a normally distributed variable with mean N and standard deviation
A little simpler than some of the other suggestions, but…
Why not think of this as a binomial distribution?
If the probability of any family, on any trial of having a girl is 0.5, then we need to find the number of trials (n) where the probability of having all girls = 0. Using the excel formula we need to find binomdist(n,n,0.5,false) = 0. Since this will never happen, then the ratio of girls to boys is infinite, and the expected ratio of the population that is girls will approach 1.
Jonathan Kariv & Thomas Bayes: You’re right, I calculated for only 1 family, and it was the same calc as Dave B’s. So now I’m stumped as to why the answer isn’t 0.5, unless, as you say, it is just a matter of there being a finite # of families.
Here’s a Perl implementation that starts with some number of families and iterates through generations until each family has had a boy. It is rarely more than ±2% off of the obvious 50%.
http://pastebin.com/By4TUW8n
Whatever the trick is here, it’s hard to see where these simulations are going wrong.
“Then B = N and as N becomes large G is a normally distributed variable with mean N and standard deviation”
Then the number of girls would be N *1/2*N b/c N families will try to have kids, 1/2*N will have boys, so 1/2*N will have at least one girl. If those families will have, on average, N girls, then you have 1/2*N^2 girls.
So the proportion of girls in the first generation would be
1/2*N^2/(N+[1/2*N^2]) = 1/2*N/(1+1/2*N) the limit of which is 1 – so 100% of the first gen. would be girls.
I think several responses here are related. Let me repeat my first example, with 2 families.
Consider the 2-family case:
1. BB – 25% converges to 0% girls
2. BG – 25% converges to 50% girls
3. GB – 25% converges to 50% girls
4. GG – 25% does not converge to 100% girls
Here we see ratios g/(g+b) of 0, .5, .5, and x, with x < 1 (I am too lazy to calculate x). What is E(g/(g+b))? It is
0(1/4) + .5(1/2) + x(1/4) = .25 + x/4 with x < 1, which is < .5
(N families is not 1 family N times. Here we see the disproportionate effect of early males in most families.)
Is this right or am I missing something?
Thomas Bayes: I suspected the jump from one family to all families was dodgy. As you say the answer definitely depends on the number of families. If this is the solution it’s a little unsatisfactory since the spirit of the puzzle suggests although does not state an infinite number of families. But then you don’t prove that in the infinite case it goes to 0.5 but I would bet it does.
Number of girls is distributed normally with mean N and standard deviation order of square root N. So yeah as the number of families goes to infinity the ratio goes to 1/2.
I think 3 girls to 1 boy. Key is that we never allow for multiple boy possibilities.
Doug said, on December 21, 2010 at 1:32 pm:
“Please explain how the [expectation of the] sum of n variables with E(1/2) can ever be anything different than 1/2 * n. … Forget about the calculations, forget about the simulations, this simple fact from probability theory dictates that the expected number of girl births is always 1/2 the number of births.”
Indeed it does, and that is the standard “good” answer that avoids all summing of series, all discussions of twins, all extraneous chit-chat.
So I suspect that Steve L means something different by “non mathematical” than I do.
++
Steve L writes:
“KenB:
I think Steve means this. Couples toss a perfectly fair coin. If they get H they mark Boy on their door and stop. If they get T they mark Girl on their door and continue. The door always has room for more marks.
Yes. Exactly.”
If we accept that, we see that we have a suite of couples, and that some are flipping coins and writing down the results. It doesn’t matter how or why some are continuing to flip and others have stopped. If the coins are fair, the total history will be that of a fair coin being repeatedly flipped.
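The door-marking version is easy to simulate. The following is a sketch in Python under exactly those rules (fair coin, no limit on flips). The function name, the seed, and the number of couples are arbitrary choices for illustration.

```python
import random

def simulate_country(couples, rng):
    """Return (boys, girls) once every couple has flipped its first head."""
    boys = girls = 0
    for _ in range(couples):
        while rng.random() < 0.5:   # tails: mark Girl and flip again
            girls += 1
        boys += 1                   # heads: mark Boy and stop
    return boys, girls

rng = random.Random(2010)           # arbitrary seed, only for repeatability
boys, girls = simulate_country(200_000, rng)
print(girls / (boys + girls))       # pooled fraction of girls in one large country
```

Note that this pools every flip in one large country, so it speaks to the pooled fraction only, not to the average over many small countries.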
To answer the question asked, Are You Smarter Than Google? I’m not. I’ve determined this by reading all your replies. I’ll wait for Steve’s answer but I’m sure I won’t understand. :)
Lets assume a finite number of families (N) and that all of them have completed the task of conceiving a boy. What is the expected population in this situation? Well what we want to do is integrate over the total number of families the random variable of how many girls they have plus the number of boys they had. From my assumption, we know that each family has one boy. So we integrate:
0 to N int(1 + E[G])
Where E[G] = 0 to inf int(X*f(X)dX) <-{expected number of girls}
Where X is our random variable that follows a Poisson distribution of arrival time 1/2
Skipping some math, we know the expected value of a Poisson distribution of arrival time 1/2 is exactly the arrival time, so E[G] = 1/2. We can integrate over all of our families and the solution will be a newborn population of size 1.5*N. We know every family has 1 boy, so the number of new boys will be N. Given our newborn population, we know the number of girls will be 0.5*N (either from integrating our E[G] or total newborn population less new boys). So the proportion of newborns will be 2/3 boys and 1/3 girls.
If we further assume our prior population distribution was 1/2 boys and 1/2 girls then we need to fuse this with our proportion of newborns. With N families, this leads to 0.5*N boys and 0.5*N girls. Our new total population will be N+1.5*N = 2.5*N. Total number of boys will be 0.5*N+N = 1.5*N which gives us a proportion of 3/5. Total number of girls will be 0.5*N+0.5*N = N which gives us a proportion of 2/5.
@Bob Ayres:
I don’t think you are addressing the question asked, which is not the separate expectations of numbers of boys and girls born, but the joint expectation of the ratio. It is different in different cases. With one boy it is 0; with one of each it is .5; with 2 girls and one boy it is .67, etc. They will on average have the same number of B and G. But the ratio g/(b+g) will not be a constant; it will be a fraction < 1. Each of these fractions has a probability affected by the stop-flipping rule. Add up these fractions times their probabilities and the sum is not 1/2 (except, it seems, asymptotically).
Q:There’s a certain country where everybody wants to have a son. Therefore each couple keeps having children until they have a boy; then they stop. What fraction of the population is female?
A: The name of the country is China. I believe the population is something like 46% female total, 42% female at birth, or something like that.
Sorry, I meant the ratio of females to males approaches infinity, not one, in my earlier post. The expected number of males is finite and the expected number of females is infinite.
Now for the really interesting question. Is Dark Star Old Chestnut a good beer, or does it just have a suitable logo?
Good question . . . I’m stumped.
Which of these statements are false . . . ?
1. Each girl will give birth to one boy.
2. Each girl will give birth to n girls with probability
p(n) = (1/2)(1/2)^n, n=0,1,2,…
(This is a geometric distribution.)
3. A generation that has G girls will give birth to G boys.
4. A generation that has G girls will give birth to M girls with probability
p(M) = C(M+G-1,M) (1/2)^M (1/2)^G, M=0,1,2,…
where C(M+G-1,M) is the binomial coefficient. (This is a negative binomial distribution.)
5. If there are G_k girls and B_k boys out of C_k total children in the kth generation, then the next generation will have
B_(k+1) = G_k
G_(k+1) = NB(G_k,1/2)
C_(k+1) = G_k + G_(k+1)
where NB(G_k,1/2) is a negative-binomial random variable with parameters G_k and 1/2.
Dave B and Thomas Bayes: Perhaps the spirit of the puzzle does not require assuming there are an infinite number of families (that would qualify as a mathematical point), so instead of 1/2 it’s just very close to 1/2.
I take it all back. The expected number of male offspring *per woman* must be one (1/2+1/4+1/8+…) which implies a sex ratio of one (50% males). It is true each woman has an expected greater number of female offspring, but each of them has an expected number of male offspring equal to one. I await the correct answer which must have something to do with transfinite numbers.
JeffSemel: I think that is it . . . I was looking for a ‘paradox’-type answer, and I think the point is just that the expected proportion of boys (or girls) in a generation is different from 0.5 by an amount that converges to 0.25/B, where B is the number of boys in a given generation (which is equal to the number of girls in the previous generation).
I haven’t derived this claim from the negative-binomial yet; I am basing it on a simple numerical analysis for various values of B. Here is the code in matlab:
B = 1000; n=0:1:1e4; B*(.5-sum((B./(B+n)).*nbinpdf(n, B, .5)))
This computes the number of boys (B) times 1/2 minus the expected ratio of boys to children when there are 1000 mothers in a generation. The answer is -0.2500. This is the same answer for any value of B greater than about 20, and doesn’t deviate too much until B is below 5.
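For readers without Matlab, here is a rough Python equivalent of that one-liner. It assumes `nbinpdf(n, B, .5)` is the probability of n girls before the B-th boy, and it computes that probability through `math.lgamma` to avoid overflow. The helper names `nb_pmf` and `scaled_gap` are made up for this sketch.

```python
import math

def nb_pmf(n, b, p=0.5):
    """Pr[n girls before the b-th boy], with p the chance of a boy."""
    log_pmf = (math.lgamma(n + b) - math.lgamma(n + 1) - math.lgamma(b)
               + b * math.log(p) + n * math.log(1 - p))
    return math.exp(log_pmf)

def scaled_gap(b, n_max=10_000):
    """b * (0.5 - E[b / (b + n)]), truncating the sum at n_max girls."""
    expected_ratio = sum(b / (b + n) * nb_pmf(n, b) for n in range(n_max + 1))
    return b * (0.5 - expected_ratio)

print(scaled_gap(1000))   # should land near the -0.2500 reported above
```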
This is my FINAL answer. The expected number of female offspring per woman is one and the expected number of male offspring per woman is one, so there are two females for every male. 2/3 of the population is female on average.
I know someone else has already shown that.
I’m inclined to go with the answers that say 0% or that the population goes to 0 over time.
Here are my assumptions that I think are valid for the problem:
1) A couple is a man and woman who stays together for life.
2) Every couple is able to have children.
3) A man and a man can’t form a couple that can have kids
4) The natural birth rate is 105 boys to 100 girls (close to 50%, but not 50%)
With this birthrate, there will always be more men, so when it is time to pair up, there will be men who are unable to form couples who can have kids. Therefore the number of couples that can have kids will reduce with each generation.
I think that even if you assume a 50% birthrate on average, the same applies. There may be a 10 year period where more men are born, which would have the same effect as reducing the number of partners that can have kids.
I think that the phrase “Therefore each couple keeps having children until they have a boy; then they stop.” is in the puzzle to just assume that on average every parent has 2 kids.
“In other words, if there were many such countries, what fraction would you expect to observe on average?”
Take N islands, place a family on each.
0.5*N islands end up with (B)
0.25*N islands end up with (G,B)
0.125*N islands – (G,G,B)
…
Average over the countries %of boys is
0.5*(1) + 0.25*(1/2) + 0.125*(1/3) + …
– that’s if parents are not counted.
I think I see the issue now, this is a continuously changing population ratio. If we look at this period by period it can be clearer. My math was also wrong in the previous post, I should have assumed starting at an equal ratio of N boys and N girls. Taking my previous solution as one period (and adjusting math) we will get a new total population of 2*N boys and 1.5*N girls. Our additional population will always be dependent on the number of girls in the population. So showing a few periods and assuming infinitely lived individuals to not complicate the math:
Period 0: N boys N girls (1/2 and 1/2)
Period 1: 2*N boys 1.5*N girls (4/7 and 3/7)
Period 2:
We find the newborn population as 0 to 1.5*N int(1 + E[G])
This will give us 1.5*N newborn boys and 0.75*N newborn girls. Now:
3.5*N boys 2.25*N girls (14/23 and 9/23)
Period 3:
We find the newborn population as 0 to 2.25*N int(1 + E[G])
This will give us 2.25*N newborn boys and 1.125*N newborn girls. Now:
5.75*N boys 3.375*N girls (46/73 and 27/73)
…
This appears to be like a Markov process. Each period we add boys at a rate equal to the number of girls in the country and add girls at a rate equal to one half the number of girls in the country. Setting up our first vector of population as X_0 = [b,g] where the first element is boys and second element is girls then our projection matrix will be P = [1,0; 1,1.5] which can be translated to new amount of boys will equal the old amount plus total number of girls while new girls will equal 1.5 times the total number of girls. Our system can be set up like:
X_n = X_0 P^n <-matrix notation
Where n is defined by the period we are in. Solving this out for boys and girls we get:
g_n = (1.5^n)g_0
b_n = b_0 + SUM{i=0,n}[(1.5^i)g_0]
Normalize initial starting populations to 1. Girls grow at a constant rate of 50%. Boys will grow in a much more complicated fashion. Boys should grow at a rate that depends on time and also increases over time. Skipping some math, it appears the growth rate looks something like:
[(1-1.5)(1.5)^(n+1)] / [1-(1.5)^(n+1)]
The boys growth rate will dominate thus in the limit the population ratio will approach 100% boys.
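Taking this comment's assumptions at face value (each period every girl contributes one newborn boy and, on average, half a newborn girl, and nobody dies), the period-by-period fractions can be reproduced with exact arithmetic. This is only a sketch of that particular model in Python. The function name `next_period` is illustrative, and the model itself is the commenter's, not an established result.

```python
from fractions import Fraction

def next_period(boys, girls):
    """One period of the comment's rule: every girl adds 1 boy and 1/2 girl."""
    return boys + girls, girls + girls / 2

boys, girls = Fraction(1), Fraction(1)   # period 0, in units of N
for period in range(1, 4):
    boys, girls = next_period(boys, girls)
    total = boys + girls
    print(period, boys / total, girls / total)
```

The three printed rows correspond to Periods 1 to 3 above.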
Thomas Bayes: Very cool.
Okay, I think I have it.
Start with some number of mothers in a generation.
B is the number of boys they will have.
G is the number of girls they will have.
We want to know the expected value of B/(B+G).
B is equal to the number of mothers, so it is not random.
G is random and has a negative binomial distribution. Two important things to know about G: its mean is B and its variance is 2B.
To find the expected value of f(G) = B/(B+G), first expand the function in a Taylor series about B:
f(G) = f(B) + f'(B)*(G-B) + 0.5*f''(B)*(G-B)^2 + …
where f'(B) and f''(B) are the first and second derivatives of f(G) with respect to G, evaluated at G=B.
Because the expected value of G-B is zero, and the expected value of (G-B)^2 is 2B, the expected value of f(G) is:
E[f(G)] = f(B) + 0.5*f''(B)*2B + …
But f(B) = 1/2 and f''(B) = 2*B/(B+B)^3 = 1/(4*B^2), so
E[f(G)] = 1/2 + 0.25/B + …
B is the number of boys, but it is also the number of mothers from the previous generation, so we have a way of computing the expected proportion of boys for a given number of mothers. We could refine this by adding another term or two in the Taylor series, but this is a very good approximation.
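Written out step by step, the second-order argument looks like this. It is a restatement in LaTeX of the derivation above, using the same f, B and G and assuming only E[G] = B and Var(G) = 2B as stated.

```latex
\begin{align*}
f(G) &= \frac{B}{B+G}, \qquad
f'(G) = -\frac{B}{(B+G)^2}, \qquad
f''(G) = \frac{2B}{(B+G)^3}, \\
f(B) &= \frac{1}{2}, \qquad
f''(B) = \frac{2B}{(2B)^3} = \frac{1}{4B^2}, \\
\mathbb{E}[f(G)] &\approx f(B) + f'(B)\,\mathbb{E}[G-B]
  + \tfrac{1}{2}\, f''(B)\,\mathbb{E}\!\left[(G-B)^2\right] \\
&= \frac{1}{2} + 0 + \frac{1}{2}\cdot\frac{1}{4B^2}\cdot 2B
 = \frac{1}{2} + \frac{1}{4B}.
\end{align*}
```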
@ Robert
I think saying that the population ratio will approach 100% boys means that the population goes to 0. E.g. if there are 100 boys and 1 girl. Then only 1 couple can be formed. 50% of the time that child will be a boy. At this point there is no option for children to be born.
The answer is 75% Boy Children, 25% Girl Children per generation with a rapidly declining population until extinction.
The reason is that half of the families which have a boy on the first child terminate and there are no more chances of having a girl at all.
The remaining births are true trials with a 50%/50% expected outcome
Child 1 B G G G G G
Child 2 B G G G G
Child 3 B G G G
Child 4 B G G
Child 5 B G
Child 6 B
For the many who are off on this particular wrong track: The dynamics of the population ratio (the way it changes from one period to the next) have nothing to do with this.
Q to the author: The puzzle implies these people don’t have a natural lifespan limit (they keep having children as many times as they want).
What exactly happens to those who get a boy:
– do they leave the country?
– live happily ever after (forever), no more children?
– divorce and find new life partners?
Why does anyone here think there will be more males than females? We have a population that keeps breeding as long as it produces girls and stops as soon as a boy appears. There is a bias towards breeding females.
What if every family has a boy as their first child (not impossible for a verrrry small country). Then the game is over, and eventually that country ceases to exist.
If I’m following Thomas Bayes’ line of reasoning correctly …
There is an asymmetry to the problem. For a country with M mothers, there is a 2^-M probability that all mothers will have a first born son, and the fraction of girls will fall off to zero. At the other extreme, there is a 2^-M probability that all mothers will have a first born daughter. But after this happened each mother would on average have one more boy and one more girl, and the fraction of girls would be somewhere around 2/3.
That is, for small values of p, it is much more likely for girls to be some fraction p of the population than it is for them to be 1-p of the population. So the expected number of girls might be the same as the expected number of boys in each generation, but it could also be true that the expected fraction of the population that are girls is less than one-half.
As the number of mothers increases, the likelihood of an extreme event falls and the amount of asymmetry decreases. This is consistent with Thomas Bayes’ approximate formula of E[f(G)] = 0.5 + 0.25/M.
@ Will A, no the ratio of boys in the population can approach 100% without the population itself going to 0. This is implied by the population itself going to infinity. It is sort of like asking what is the limit of g(x)/[f(x)+g(x)] as x goes to infinity if g(x) = x^2 and f(x)=x. However this is a moot point as apparently the time dynamics are not the issue that Dr. Landsburg is focusing on.
@ Niel:
What I’m having a hard time seeing is how any bias male or female doesn’t kill the population.
E.g. my reasoning is that if there is a bias toward females in any given generation, say 60/40, and there are currently 100,000 in the current population, then in the next generation there will be 60,000 girls and 40,000 boys.
This means that only 80,000 couples can have kids.
If you can explain how I’m wrong, I’d appreciate it.
I am assuming that a male can impregnate multiple females, so the only thing you need to look at is the female population, and figure out the expected number of male and female offspring per woman.
I apologize if I am stealing someone’s answer. I haven’t read all comments. I think I have a framework to show that the percentage of male births will be less than 50%:
In a population with N mothers, we know that the number of male babies will be N in the limit – each mother stops having kids after she has a male, so each mother will have exactly one male (assuming she can try an infinite amount of times if necessary).
On the other hand, the expected number of female children each woman will have is:
W = 1*1/4 + 2*1/8 + 3*1/16 + 4*1/32…..
i.e. she will have 1 female child with probability 1/4 (female-male), she will have 2 with probability 1/8 (female-female-male), 3 with probability 1/16 (female-female-female-male), and so on…
Thus, the exact percentage of male births in a population of N mothers will be N/(N+W), where W is the series above. Further, without going into gory details, it can be shown that W > 1. Thus, in expectation, each mother will have more than 1 woman child, and at most 1 man child, leaving the percentage of births for a given set of N mothers less than 50% male.
The intuition (I think) is that, by adding a constraint that each woman must stop procreating after a single male baby, society eliminates the possibility of multiple male families, which actually decreases the overall likelihood of men.
But I may be completely wrong….
Regarding the bias issue, IF the analysis I did earlier is correct, then we can set the probability of a girl to be p and the expected proportion of boys will be:
E[B/(B+G)] = 1-p + p(1-p)/M,
where M is the number of mothers in a given generation.
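That formula can be checked against a direct sum, at least numerically. The sketch below is Python with illustrative names. It assumes the number of girls born to M mothers follows a negative-binomial distribution with girl probability p, computes E[B/(B+G)] by brute force, and prints it next to 1-p + p(1-p)/M.

```python
import math

def expected_boy_share(mothers, p_girl, n_max=20_000):
    """Brute-force E[B/(B+G)] when each birth is a girl with probability p_girl."""
    p_boy = 1.0 - p_girl
    total = 0.0
    for g in range(n_max + 1):
        # log of Pr[g girls before the mothers-th boy]
        log_pmf = (math.lgamma(g + mothers) - math.lgamma(g + 1)
                   - math.lgamma(mothers)
                   + mothers * math.log(p_boy) + g * math.log(p_girl))
        total += mothers / (mothers + g) * math.exp(log_pmf)
    return total

def approx_boy_share(mothers, p_girl):
    """The formula from the comment: 1 - p + p(1 - p)/M."""
    return 1 - p_girl + p_girl * (1 - p_girl) / mothers

for p in (0.4, 0.5, 0.6):
    print(p, expected_boy_share(200, p), approx_boy_share(200, p))
```

The approximation is a truncated Taylor series, so small differences between the two columns are expected, especially for few mothers.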
Ok, I may have got it.
Every couple (M + F) begets exactly 1 M + on average 1 F and then passes away.
A mental picture is like: there is a country with x M and y F.
Every now and then, 1M + 1F leave and get replaced (after a few years delay) with 1M + very-close-to-one F.
In other words, whatever initial ratio of M/F was, it will stay unchanged, assuming the country is large enough. There is no attractor.
Or, to be more precise:
Number of M stays the same.
Number of F follows a random walk with no drift. So is F/M ratio.
No attractors. A capturing wall at F = 0.
ok, i haven’t read all the comments, but i skimmed dr. landsburg’s comments to gather that the answer does not depend on complex biological factors like a gender bias in the population or multiple men being able to father children with the same women, etc. he also seems to just want the percentage of women in the OFFSPRING (doesn’t care about the parents). I am also going to make one more over-simplification (that could be fatal), but i’m not going to take into consideration the procreation of offspring. (basically, i’m just imagining one procreating couple).
the percentage of females in the population is the probability of a randomly selected child being female at a given point in time. i’m calling such a point in time a generation.
p(F)
gen 1) 50% (1/2)
gen 2) prob (1st child being female) + prob (2nd child being female) = 1/2 + 1/4 = 75%
gen 3) 1/2 + 1/4 + 1/8 = 87.5%
gen 4) 1/2 + 1/4 + 1/8 + 1/16 = 93.75%
ok, now bear with me as i’ve forgotten how to write infinite series, let alone solve them…
prob (female) in generation n = 1 / 2^n + 1 / 2^(n-1) + 1 / 2^(n-2) +…1/2^(n-n) (need the -1 because in generation 0, probability of female must be 0 )
which means that p(F) = {[sum (2^n)] / 2^n} – 1
therefore p(F) = {sum [2^(n-1)]} / 2^n (n>1)
Again, because I don’t remember how to solve infinite series, I have to hazard a guess, but I think that converges to 1…
Therefore, the more generations go by, the more the population will tend toward being 100% women…
Does that make sense?
Do other people find this sort of question interesting? I see it as, basically, guess the unrealistic assumptions that Steve wants to make. Clearly, realistic assumptions would involve a maximum number of children per mother.
Anyway, I wrote an Octave (Matlab) script to simulate this using what I think are the most simple (and unrealistic) set of assumptions.
http://pastebin.com/sdMSkXnx
The results from 4 runs with 1e8 mothers makes it look like the result is very close to 50%:
1.000000e+08 mothers
1.9998399900 average children per mother
28 most children per mother
99983999 girls
199983999 total children
49.9959994299 percentage girls
1.000000e+08 mothers
2.0001137500 average children per mother
30 most children per mother
100011375 girls
200011375 total children
50.0028435883 percentage girls
1.000000e+08 mothers
1.9999452600 average children per mother
28 most children per mother
99994526 girls
199994526 total children
49.9986314625 percentage girls
1.000000e+08 mothers
2.0001095500 average children per mother
29 most children per mother
100010955 girls
200010955 total children
50.0027386000 percentage girls
What I think is interesting is if we make it a little bit more realistic by adding an assumption that the mothers will stop if they reach say, 8 girls. This results in the girl percentage being a little under 50% (about 49.2%):
1.000000e+07 mothers
1.9603266000 average children per mother
8 most children per mother
9642407 girls
19603266 total children
49.1877577951 percentage girls
1.000000e+07 mothers
1.9612225000 average children per mother
8 most children per mother
9651092 girls
19612225 total children
49.2095720909 percentage girls
1.000000e+07 mothers
1.9603011000 average children per mother
8 most children per mother
9642122 girls
19603011 total children
49.1869437812 percentage girls
1.000000e+07 mothers
1.9609180000 average children per mother
8 most children per mother
9647983 girls
19609180 total children
49.2013587514 percentage girls
Ooops, I forgot to increment the counts when the mother stopped with all girls. After I did that, I get about 50%, even with the mothers stopping with 8 girls. So, the stopping at 8 case is not particularly interesting, either…
http://pastebin.com/3mj4uqUX
1.000000e+07 mothers
1.9915936000 average children per mother
8 most children per mother
9955053 girls
19915936 total children
49.9853634798 percentage girls
1.000000e+07 mothers
1.9914570000 average children per mother
8 most children per mother
9953591 girls
19914570 total children
49.9814507670 percentage girls
1.000000e+07 mothers
1.9922274000 average children per mother
8 most children per mother
9961677 girls
19922274 total children
50.0027105339 percentage girls
1.000000e+07 mothers
1.9917205000 average children per mother
8 most children per mother
9956254 girls
19917205 total children
49.9882086869 percentage girls
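The capped case can also be worked out exactly instead of simulated. This Python sketch uses exact fractions and assumes a mother stops at her first boy or after `max_girls` girls. The flag `count_all_girl_families` is a guess at what the earlier bug amounted to (leaving out the families that ended with all girls), so treat that part as an assumption.

```python
from fractions import Fraction

def capped_expectations(max_girls, count_all_girl_families=True):
    """Expected (girls, boys) per mother who stops at a boy or at max_girls girls."""
    half = Fraction(1, 2)
    girls = boys = Fraction(0)
    for g in range(max_girls):          # g girls, then a boy
        prob = half ** (g + 1)
        girls += g * prob
        boys += prob
    if count_all_girl_families:         # max_girls girls and no boy
        girls += max_girls * half ** max_girls
    return girls, boys

for flag in (True, False):
    girls, boys = capped_expectations(8, flag)
    print(flag, float(girls + boys), float(girls / (girls + boys)))
```

If the guess about the bug is right, the two printed rows should line up with the two sets of simulation results for the 8-girl cap.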
The answer is: we should expect 55% girls in this population.
Every single family will end up with one – and only one – boy. So, if there are 1000 families in this population, we should expect exactly 1000 boys…no more and no less. On the other hand, 500 families will have no girls at all, and the other 500 families will have at least one girl.
However, the 500 families that have at least one girl will have – on average – about 2.3 girls each. Some families (but only a third of the families with girls) will have just one girl before being blessed with their son. The rest will have two, three, four, five, etc. A tiny fraction will have an enormous number of girls before finally having a boy. It becomes clear that, of the 500 families with girls, the average number of girls will be fractionally higher than 2. Again, each of these families will end up with one and only one son.
Summing the probabilities of these 1000 families gave me an expected average of 1222 girls out of the population of 2222 children, or 55% girls.
Thank you for the challenge, Dr. Landsburg!
Is everyone counting the gender of the parents in their calculation? In a family with one son, the ratio of male to female is 66.6%. The ratio doesn’t reach 50-50 until the family has one son and one daughter.
So, my back of the envelope gives me 28.5% are male.
out of 1000 couples:
The first 500 families have a ratio of 66.6% male to female
then
250 are 50%
125 are 40%
62.5 are 33.32%
31.25 are 28%
15.62 are 25%
7.81 are 22.22%
3.905 are 20%
I stopped there and added all the percentages and divided by 10.
@ bart.mitchell:
I found the best way to see this was to write a computer simulation. Write a program like the following (Java here, but any language would do). Then start running this with numCouples = 1 and then start increasing it.
import java.util.Random;

class Country
{
    int numBoys = 0;
    int numGirls = 0;

    // fraction of this country's children who are girls
    double getAverage() { return (double) numGirls / (numBoys + numGirls); }
}

public class Simulation
{
    public static void main(String[] args)
    {
        int numCouples = 1;
        int numCountries = 100;
        Random random = new Random();
        Country[] ctry = new Country[numCountries];
        for (int cIdx = 0; cIdx < numCountries; cIdx++)
        {
            // simulate population for each country
            ctry[cIdx] = new Country();
            for (int i = 0; i < numCouples; i++)
            {
                int r;
                do
                {
                    // generate a random number, either 0 or 1
                    r = random.nextInt(2);
                    if (r == 0)
                    {
                        // it's a girl
                        ctry[cIdx].numGirls++;
                    }
                    else
                    {
                        // it's a boy
                        ctry[cIdx].numBoys++;
                    }
                } while (r == 0); // continue looping until a boy is born
            }
        }
        // we have populated the boys and girls for each country.
        // let's get the average of the averages.
        double totAvg = 0;
        for (int i = 0; i < numCountries; i++)
        {
            totAvg += ctry[i].getAverage();
        }
        System.out.println(totAvg / numCountries);
    }
}
So is root cause of more girls than boys the fact that some mothers will eventually HAVE to stop having kids when they are unable to procreate anymore? I see how the calculations get the #s reported, but I’m still trying to wrap my head around why. Maybe I missed an explanation further up in the comments.
Jennifer:
So is root cause of more girls than boys the fact that some mothers will eventually HAVE to stop having kids when they are unable to procreate anymore?
No, the calculations I’ve done assume that women never lose the ability to procreate.
I demonstrate the paradox with a simulation here. The question is poorly stated by the author, however. It clearly asks for the fraction of the population as a whole, but he wants the fraction of girls in the average family, which is a completely different animal. Steve should lose the bet for asking one question and answering another.
loveactuary and David Pinto produce the same answer (or have the same thinking anyway), and that is the average percentage of girls per family:
1 boy, 0 girls = 0%
1 boy, 1 girl = 50%
1 boy, 2 girls = 67%
So, we have 3 families with 3 boys and 3 girls, and a simple average (average by family) in this illustration is 38.9%.
Phil Birnbaum explains it better here.
You can use the method from loveactuary to get just under 31%. Instead of treating each family as its own country, you can pool families, and you will get a result of .499999… as Phil demonstrates.
Perhaps Steven can be more explicit in his question. He does say this:
“In other words, if there were many such countries, what fraction would you expect to observe on average?”
He’s suggesting it the way Phil is interpreting it, that each country will have an equal weight, regardless of how many girls there are. loveactuary’s description is perfect in this regard.
Would a country in which every couple followed such a breeding rule eventually have a population of zero?
If we imagine the country begins with a generation containing k couples, then in expectation that generation will procreate k males and k* females, where k* < k. Then since every couple must have one male and one female, the second generation can have only k* couples (with k – k* bachelors). The rate of decrease in the population speeds up as k decreases because the expected ratio between k* and k is smaller for smaller k. Eventually, we would expect to get to a generation with only one couple; if this couple has a son as its first child, procreation ceases.
@ Brian:
I believe that you are correct. I find it interesting that others aren’t arguing over this. If you were applying for a job at Google, writing a simulation program that shows what happens over time to a population following this rule would seem to be an acceptable approach.
I think it would be fair to ask a programmer to simulate this in an interview, because I could see if they defensively check for divide by 0 in their code.
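A simulation of the kind discussed in the last few comments might look like the sketch below. The model is my own assumption, not something stated in the post: every couple has girls until its first boy, and the next generation forms as many couples as there are boy-girl pairs, so min(boys, girls). The helper girl_fraction shows the defensive divide-by-zero check.

```python
import random

def next_generation(couples, rng):
    """Number of couples the children of this generation can form."""
    boys = couples  # every couple stops at exactly one boy
    girls = 0
    for _ in range(couples):
        while rng.random() < 0.5:  # assumed: a girl with probability 1/2
            girls += 1
    # Assumed pairing rule: one boy and one girl per couple.
    return min(boys, girls)

def simulate(initial_couples, max_generations=10_000, seed=0):
    """Return the number of couples in each generation until none are left."""
    rng = random.Random(seed)
    history = [initial_couples]
    couples = initial_couples
    while couples > 0 and len(history) <= max_generations:
        couples = next_generation(couples, rng)
        history.append(couples)
    return history

def girl_fraction(boys, girls):
    """Fraction of girls, or None when there are no children at all."""
    total = boys + girls
    return girls / total if total else None

print(simulate(100))
```

Under this pairing rule the number of couples can never grow, since each generation has exactly as many boys as it had couples.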
If I am not missing something, it should be 1 - ln(2), which is approximately 0.307. Technically, isn't this just the argument that a ratio of expectations is not equal to the expectation of a ratio? Intuitively, half of the families will have only 1 child, which is a boy.
One quarter will have 1 boy and 1 girl, one eighth will have 2 girls and 1 boy, and so on.
Hence, assuming there are 100 families, (0 + 12.5 + 12.5*2/3 + 6.25*3/4 + …)/(50 + 25 + 12.5 + 6.25 + …) ≈ 0.307.
More formally, \sum_{n=1}^{\infty} ((n-1)/n) (1/2)^n = 1 - ln(2).
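The series is easy to check numerically. A small Python sketch, where the function name and the cutoff of 200 terms are my own choices:

```python
import math

def expected_girl_fraction_one_family(terms=200):
    """Partial sum of ((n-1)/n) * (1/2)^n for n = 1..terms."""
    # A family with n children (n-1 girls followed by 1 boy) occurs
    # with probability (1/2)^n and has a girl fraction of (n-1)/n.
    return sum((n - 1) / n * 0.5 ** n for n in range(1, terms + 1))

value = expected_girl_fraction_one_family()
print(value, 1 - math.log(2))
```

The identity follows from two standard sums: \sum (1/2)^n = 1 and \sum (1/n)(1/2)^n = ln(2).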
Oh no!
I’ve forgotten to factor in that if the children are all boys for a given time, there will be no way to give birth to ANYTHING, as no girls equals bye bye country.
Still if you look at the question it says give the answer as a fraction, which for 50 percent is 1/2, and for the predicted answer is just under 11/25.
Or has that really oversimplified the point?
I don’t like the way they have said no biological factors are involved, seeing as I’m actually a biochemistry student I would have enjoyed applying them to the problem.
I’ve thought about trying this out with a single-celled organism which only reproduces sexually, but that doesn’t cover the question. May do it anyway. Dissertation here we come.
I think there is a confusion here between two questions:
– If you knock on one door in this universe and ask them what the proportion of boys is in their household, what is the expected result?
– What is the expected proportion of boys in this universe?
They are different, as a single family with 1000 girls would have a big impact in the second case, but not in the first.
I am assuming here that the question is the second one. We have N families, N boys, and for a family k, G_k girls. The G_k are i.i.d with geometric distribution (parameter 1/2)
The proportion of boys P_b is P_b = N/(N+G_1+…+G_N)
We write S_N = G_1+…+G_N and divide by N on both sides of the fraction :
P_b = 1/(1+S_N/N)
By the law of large numbers S_N/N -> E(G_1) = 2 (the inverse of the parameter) with probability 1
P_b converges to 1/3 and P_g to 2/3 with probability 1
Unless there is a mistake somewhere
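One way to look for a mistake is to simulate S_N/N directly. The sketch below is only a check under the assumptions stated above (each birth is a girl with probability 1/2, each family stops at its first boy); the function names are mine. Note that "geometric with parameter 1/2" has two common conventions: the number of trials up to and including the first success has mean 1/p = 2, while the number of failures before the first success has mean (1-p)/p = 1. G_k counts girls only, so it appears to be the second kind.

```python
import random

def girls_before_first_boy(rng):
    """Number of girls a family has before its first (and last) boy."""
    girls = 0
    while rng.random() < 0.5:  # assumed: a girl with probability 1/2
        girls += 1
    return girls

def check(num_families=200_000, seed=0):
    """Return (S_N / N, P_b) for one simulated population of N families."""
    rng = random.Random(seed)
    s_n = sum(girls_before_first_boy(rng) for _ in range(num_families))
    mean_girls = s_n / num_families
    p_b = 1 / (1 + mean_girls)
    return mean_girls, p_b

print(check())
```

If S_N/N comes out near 1 rather than 2, the same argument would give P_b converging to 1/2 instead of 1/3.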
If people had all their children instantaneously upon getting married, and could have any number of children, I wouldn’t envy them. And the answer would be:
Expected # of boys = 1
Expected # of children = 1/2 + 2/2^2 + 3/2^3 + …
= 1/2 + 1/2^2 + 1/2^3 + … +
1/2(1/2 + 1/2^2 + 1/2^3 + …) +
1/2^2(1/2 + 1/2^2 + 1/2^3 + …) +
…
or, the sum from k=0 to infinity of [(1/2^k) * (series that sums to 1)] = 2. So the ratio would be 1:1. This is not because “half of all births are girls”; it is not a random series of births. It’s because the excess girls had by families trying for a boy are exactly balanced by the fact that half the families are boys-only.
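The sum 1/2 + 2/2^2 + 3/2^3 + … can also be checked with exact fractions. A short Python sketch, with a function name of my own choosing:

```python
from fractions import Fraction

def expected_children(terms):
    """Partial sum of n / 2^n for n = 1..terms, as an exact fraction."""
    return sum(Fraction(n, 2 ** n) for n in range(1, terms + 1))

for terms in (5, 10, 20, 40):
    partial = expected_children(terms)
    print(terms, float(partial), float(2 - partial))
```

The gap to 2 after N terms is exactly (N + 2)/2^N, so the partial sums approach 2 very quickly.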
People are overlooking the fact that families don’t have all their children instantaneously on being married, and don’t live forever. At any moment, many families still have only girls. To solve it, you’d have to know how many years apart people usually have children, and how long a family survives. But there will be more boys than girls.
You could avoid having to know how far apart people have children if you assume that families live infinitely long, and that this had been going on for an infinite timespan (note that the 1:1 sex ratio and 2-children replacement rate give a stable population), but I’m pretty sure that would converge to the 1:1 ratio. So I don’t think there is a defined non-simple answer.
Oops. I see that what Steve is getting at is that the expected value of the fraction of children who are girls is not 1/2. The tricky part is that the expected NUMBER of boys and girls is equal, but that does not translate into the expected fraction being equal. Trials where few children are born (mostly boys) have a larger impact per child born than trials where many are born (mostly girls).
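To put numbers on this, the expected fraction of girls can be computed exactly for a country of k families. The sketch below assumes 50/50 births and that every family has completed its children; the function name and the truncation at 600 girls are my own choices. With k families the total number of girls G follows a negative binomial distribution, P(G = g) = C(g + k - 1, g) / 2^(g + k).

```python
from math import comb, log

def expected_girl_fraction(num_families, max_girls=600):
    """E[G / (G + k)] for k families, truncating the sum at max_girls."""
    k = num_families
    total = 0.0
    for g in range(max_girls + 1):
        # Probability that the k families have exactly g girls in total.
        prob = comb(g + k - 1, g) / 2 ** (g + k)
        total += prob * g / (g + k)
    return total

print(expected_girl_fraction(1), 1 - log(2))
for k in (1, 2, 10, 100):
    print(k, expected_girl_fraction(k))
```

The expected fraction stays below one half for every k and creeps toward it as the country grows, even though the expected numbers of boys and girls are equal throughout.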