
My experience with Kaggle's data-science competitions - dhruvbhatia
http://dhruvbhatia.com/thoughts/kaggle-data-science-competitions/
======
triplesec
TL;DR (for article): I enter these things and they are ranked. I like it.

Not really a very informative blog post.

~~~
thrush
Agreed, I think this made the front page solely riding off the hype of the
recent AI and Machine Learning media surge.

I would love to hear more information though. Fun fact, part of my data
analytics course next semester involves participating in Kaggle competitions.

------
jbooth
My problem with a lot of these competitions, particularly the high-reward
ones, is you've got people duking it out at the 4th or 5th decimal place for
the top few spots. At that point, you're not even competing for the best
generalizable algorithm, you're competing for 'algorithm which happens to fit
the holdout set exceptionally well'. It's overfitting at a distance.
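
To make the 'overfitting at a distance' point concrete, here is a minimal sketch in Python. It assumes every entrant has exactly the same true accuracy, so any spread in holdout scores is pure sampling noise. The constants (`TRUE_ACC`, `N_ENTRANTS`, `HOLDOUT_SIZE`) are made up for illustration and not taken from any real competition.

```python
import random

TRUE_ACC = 0.80      # assumption: every simulated entrant has the same real accuracy
N_ENTRANTS = 200     # hypothetical number of competitors
HOLDOUT_SIZE = 2000  # hypothetical number of holdout rows

def noisy_score(rng, n_rows, acc):
    """Accuracy measured on n_rows examples for a model whose true accuracy is acc."""
    return sum(rng.random() < acc for _ in range(n_rows)) / n_rows

rng = random.Random(0)

# Score every entrant once on the same-sized holdout set.
holdout_scores = [noisy_score(rng, HOLDOUT_SIZE, TRUE_ACC) for _ in range(N_ENTRANTS)]
best_holdout = max(holdout_scores)

# Re-score the "winner" on fresh data: its true accuracy is still TRUE_ACC.
winner_fresh = noisy_score(rng, HOLDOUT_SIZE, TRUE_ACC)

print(f"best holdout score:   {best_holdout:.4f}")
print(f"winner on fresh data: {winner_fresh:.4f}")
```

Any edge the top entry shows over 0.80 here comes purely from taking the maximum of many noisy measurements, not from a better algorithm.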

~~~
Houshalter
The dataset used for the leaderboard is not the dataset used in the final
evaluation. They also limit the number of final submissions you can have and
the number of submissions you can make per day to prevent that.

Often the person at the top of the leaderboard will move down a few places in
the final evaluation.
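
As a rough illustration of how a leader can slip in the final evaluation, here is a small Python sketch that scores the same submissions on two disjoint parts of a test set. The 30% public fraction, the team names, and the 0.75 accuracy are assumptions for the example, not Kaggle's actual settings.

```python
import random

def public_private_scores(correct, public_idx):
    """Return (public_accuracy, private_accuracy) for one submission.

    correct    -- list of 0/1 flags, one per test row (1 = prediction was right)
    public_idx -- set of row indices scored for the live leaderboard
    """
    pub = [c for i, c in enumerate(correct) if i in public_idx]
    priv = [c for i, c in enumerate(correct) if i not in public_idx]
    return sum(pub) / len(pub), sum(priv) / len(priv)

rng = random.Random(42)
N_ROWS = 1000
PUBLIC_FRACTION = 0.3  # assumption: the real split varies by competition
public_idx = set(rng.sample(range(N_ROWS), int(N_ROWS * PUBLIC_FRACTION)))

# Hypothetical teams, all equally good on average.
submissions = {
    name: [1 if rng.random() < 0.75 else 0 for _ in range(N_ROWS)]
    for name in ("team_a", "team_b", "team_c", "team_d")
}

scores = {name: public_private_scores(flags, public_idx)
          for name, flags in submissions.items()}
public_rank = sorted(scores, key=lambda n: scores[n][0], reverse=True)
private_rank = sorted(scores, key=lambda n: scores[n][1], reverse=True)

print("public ranking: ", public_rank)
print("private ranking:", private_rank)
```

Because the two scores come from different rows, tuning against the public number gives no guarantee about the private one.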

~~~
dhruvbhatia
Further to Houshalter's point: the President & Chief Scientist of Kaggle
addresses this specific issue and explains the Public/Private Leaderboard
system in this webcast:
[http://youtu.be/GrugzF0-V3I](http://youtu.be/GrugzF0-V3I)

------
kephra
My problem with Kaggle is that they are greedier than Apple, Kickstarter
and PayPal put together. Apple takes 30%, Kickstarter takes 10%, PayPal takes
3%, but Kaggle takes 50%!

I once asked them what the price would be to start a competition to outsource
a difficult machine learning problem. Their answer was that we would need to
pay $20k if we wanted the winner to earn $10k.

That's when I stopped thinking about those greedy bastards.

~~~
Houshalter
That seems quite high, though those people do need to make money somehow. The
other businesses aren't really comparable because they have much, much more
money flowing through them and can afford to take a smaller percentage to stay
competitive.

------
ForHackernews
Wait, so companies post real problems, and then people solve them for them for
free? Is there at least a prize for the winners?

~~~
dhruvbhatia
It is somewhat akin to spec work within the design field[1], in that only the
top performers make money and everyone else goes home empty-handed despite
their hours of work. However, I've noticed that most of the near-winners tend
to open-source and share their solutions on GitHub, so one could argue that it
helps the community (unlike, say, 99designs, on which non-winning entries are
basically worthless).

[1]: [http://www.nospec.com/faq](http://www.nospec.com/faq)

------
LambdaAlmighty
Lost me at "big data problems".

~~~
pmelendez
Why? A multi-gigabyte dataset in machine learning is considered big enough
to be called that. An algorithm like a support vector machine, whose
complexity can be O(n^3), would be prohibitive on a single computer.
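
For a back-of-the-envelope feel of what O(n^3) means, here is a minimal Python sketch. It assumes the cost is exactly n^3 basic operations and that one machine sustains a billion of them per second. Both figures are illustrative assumptions, and `cubic_cost_seconds` is a hypothetical helper, not part of any library.

```python
def cubic_cost_seconds(n, ops_per_second=1e9):
    """Rough seconds for n**3 basic operations at the assumed throughput."""
    return n ** 3 / ops_per_second

# Print the estimate for a few hypothetical training-set sizes.
for n in (1_000, 10_000, 100_000, 1_000_000):
    secs = cubic_cost_seconds(n)
    print(f"n={n:>9,}: {secs:,.0f} s (~{secs / 86_400:,.1f} days)")
```

Because the cost is cubic, doubling n multiplies the estimate by eight, which is why sample counts in the millions quickly become impractical on one machine.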

