
Because we are not doing hypothesis testing, we are doing classification on a toy dataset. Sure, one could treat this as a forecasting challenge, but then one would need another Titanic sinking in roughly the same context, with the same features... That demand is as unreasonable as calling this modeling knowledge competition meaningless.

And if you see classification as a form of hypothesis testing, then cross-validation is a valid way of testing whether the hypothesis holds on unseen data.
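To make the cross-validation point concrete, here is a minimal sketch in plain Python. The data is made up (it is not the Titanic dataset), the "model" is just a majority vote per feature value, and the function names are mine, so treat it as an illustration of the idea rather than a real pipeline.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_majority_by_group(rows):
    """Toy 'model': for each feature value, remember the majority label."""
    counts = {}
    for feature, label in rows:
        pos, total = counts.get(feature, (0, 0))
        counts[feature] = (pos + label, total + 1)
    # Predict 1 when at least half of the training rows in the group are 1.
    return {f: int(2 * pos >= total) for f, (pos, total) in counts.items()}

def cross_validate(rows, k=5, seed=0):
    """Return one accuracy score per fold, each measured on held-out rows."""
    scores = []
    for held_out in k_fold_indices(len(rows), k, seed):
        held = set(held_out)
        train = [r for i, r in enumerate(rows) if i not in held]
        test = [rows[i] for i in held_out]
        model = fit_majority_by_group(train)
        # Feature values never seen in training default to predicting 0.
        correct = sum(model.get(f, 0) == y for f, y in test)
        scores.append(correct / len(test))
    return scores

# Made-up data: (sex, survived) pairs with assumed survival rates.
rng = random.Random(42)
rows = []
for _ in range(200):
    sex = rng.choice(["female", "male"])
    p = 0.75 if sex == "female" else 0.2  # assumption, for illustration only
    rows.append((sex, int(rng.random() < p)))

scores = cross_validate(rows, k=5)
print("per-fold accuracy:", scores)
print("mean accuracy:", sum(scores) / len(scores))
```

Each fold's score comes from rows the model never saw during fitting, which is the sense in which the "hypothesis" gets tested on unseen data. Note that all the folds are still drawn from the same pool of rows.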




I think that is the rub. If the goal is just to find some variables that correlate, it is a neat project. But ultimately it is not indicative of a predictive classifier, if only because you do not have any independent samples to cross-validate with: all the samples are from the same crash.

This would be like sampling all the coins in my pockets and thinking you could build a predictive model from year printed to value of coin. You probably could for the change I carry. Not a wise predictor, though.


You are right, but only in a very strict, not-fun manner :). Even if we had more data on different boats sinking, the model would not be very useful: we don't sail on the Titanic anymore, and the icebergs have all been plotted. Still, if a cruise ship were to go down, I'd place a bet on ranking younger women of nobility traveling first class higher for survivability than old men with family traveling third class, wise predictor or no.

This dataset is more in line with what you are looking for: https://www.kaggle.com/saurograndi/airplane-crashes-since-19...


Makes sense. And yes, I completely meant my points in a purely pedantic manner. :)



