What you probably forgot about models
1 point by erikb on Nov 29, 2011
http://news.ycombinator.com/item?id=3289839

In the linked post there is a lot of discussion about whether knowing that a father has a child who is a son born on a Tuesday leads to a different probability for this father's second child being a son or a daughter.

I really want to make a point here, so please stick with me to the end before you start commenting, because I will say that you (yes, you!) are wrong.

Some answered P(child_2=boy)=0.5, because boys and girls are equally likely and knowing anything about this father's other child is irrelevant. Another group said P(child_2=boy)=1/3, because the father has 2 children, each a boy or a girl, and you can already rule out both being girls, since he said he has at least one son. That leaves three equally likely cases (boy-boy, boy-girl, girl-boy), and only one of them has two sons. For more details, read the comments and the blog post, all of which can be found via the link above. The third answer is P(child_2=boy)=13/27: it does the same as the P=1/3 group but adds the weekdays in the same way, then excludes the cases that are impossible given what the father said.
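To make the three numbers concrete, here is a minimal sketch that counts equally likely two-child families under each reading of the father's statement. The names (`prob_both_boys`, `TUESDAY`, and so on) are my own, and it assumes that sexes and weekdays are uniform and independent, and that "the second child is a boy" means "both children are boys". It is one way to reproduce the figures, not necessarily how the commenters derived them.

```python
from fractions import Fraction
from itertools import product

SEXES = ("boy", "girl")
DAYS = range(7)   # seven weekdays, assumed equally likely
TUESDAY = 1       # arbitrary label for one of the seven days

def prob_both_boys(condition):
    """P(both children are boys | condition) over equally likely families."""
    # A family is a pair of children; each child is a (sex, weekday) tuple.
    families = list(product(product(SEXES, DAYS), repeat=2))
    kept = [f for f in families if condition(f)]
    both = [f for f in kept if all(sex == "boy" for sex, _ in f)]
    return Fraction(len(both), len(kept))

# Model 1: the statement is about one specific child (here: the first one),
# so it says nothing about the other child.
p_half = prob_both_boys(lambda f: f[0] == ("boy", TUESDAY))

# Model 2: we only use "at least one of the two children is a boy".
p_third = prob_both_boys(lambda f: any(sex == "boy" for sex, _ in f))

# Model 3: we use "at least one child is a boy born on a Tuesday".
p_13_27 = prob_both_boys(lambda f: any(c == ("boy", TUESDAY) for c in f))

print(p_half, p_third, p_13_27)
```

The only thing that changes between the three calls is the condition, i.e. which part of the father's statement the model chooses to use.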

Each of these groups argues that its probability is better than the other 2. So which one is right? I tell you that all groups are wrong and all probabilities are wrong. The reason is that none of the groups knows whether the child is a boy or not. They just model this uncertainty using probability theory. So all of these answers are models. And what do we know about models? Models are all wrong, because they always generalize the real world. If you had a truly correct model, it would be the real world itself.

On the other hand, all 3 probabilities are also good and correct to use, because they all help us make some assumption about whether the second child is a boy or not. All 3 answers are better than "I don't know", and depending on how well you argue for each model, all 3 of them are also mathematically correct.

So instead of arguing that your number is better than the others and that your truth is more true than the truth of the others (isn't that called religion?), you should understand from this kind of question that probability is just a model, which has its flaws and its advantages.

That said, I think it is practically better to underfit (= use less information than you could) than to overfit (= use data that doesn't actually improve your knowledge, which is misleading). That is why, in a real-life application, I would use 0.5 until I know I really have a situation that allows me to infer more: given different equally likely models, it is best to use the simplest one (google "Occam's Razor").

And that is actually all there is to this kind of problem. If you can't understand the math well enough to get to ALL THREE probabilities, then you should study probability more deeply before you start arguing. If you think one of them is better than the others, then you might be great at math but you don't understand the basic philosophy of every scientist and engineer (your model is always wrong). And last but not least, you should become a theoretical scientist if you don't understand Occam's Razor.

Thanks, you are a very patient reader and I'm proud of you. Now you can start arguing why your probability is actually the one and only, if you still feel like it!



