
Data Science Competitions 101: Anatomy and Approach - akashtndn
https://techandmortals.wordpress.com/2016/07/27/data-science-competitions-101-anatomy-and-approach/
======
mattkrause
I wonder how often there is an actual, statistically significant difference
between the performance of the first-, second-, third- (etc.) place entries.

Performance on the test set is a point estimate of the classifier's
performance, but there is some variance around that point estimate due to the
precise composition of the test set. For example, a classifier would perform
somewhat differently if _these_ 10,000 digits were in the test set instead of
_those_ 10,000 digits.

Academic papers have gotten better about reporting confidence intervals, but I
don't think I've ever seen a contest that uses them.
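
For what it's worth, here is a rough sketch of how one could put a confidence
interval on a single test-set score by bootstrapping the test set itself
(Python; the labels and predictions below are fabricated just to illustrate
the idea):

    import numpy as np

    rng = np.random.default_rng(0)

    # Pretend these are the true labels and a classifier's predictions on a
    # 10,000-digit test set (fabricated here purely for illustration).
    y_true = rng.integers(0, 10, size=10_000)
    noise = rng.integers(0, 10, size=10_000)
    y_pred = np.where(rng.random(10_000) < 0.95, y_true, noise)

    # Resample the test set with replacement and recompute accuracy each time;
    # the spread shows how much the score depends on which examples happened
    # to land in the test set.
    scores = []
    for _ in range(2_000):
        idx = rng.integers(0, len(y_true), size=len(y_true))
        scores.append(np.mean(y_true[idx] == y_pred[idx]))

    lo, hi = np.percentile(scores, [2.5, 97.5])
    print(f"accuracy = {np.mean(y_true == y_pred):.4f}, "
          f"95% bootstrap CI = [{lo:.4f}, {hi:.4f}]")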

~~~
ACow_Adonis
As a stats guy, that's actually one of the things that stops me from
participating in many of these comps.

The first issue is that the leaderboards for many of the tasks end up being
statistically the same, which means where you finish in the top N has a
significant random component (assuming you actually know what you're doing in
the first place).
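
To make that concrete, here is a rough sketch (Python, with fabricated
predictions) of checking whether two adjacent leaderboard entries are actually
distinguishable, using a paired bootstrap of their score gap on the same test
set:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 10_000

    # Fabricated: true labels and two entrants' predictions on the same test set.
    y = rng.integers(0, 2, size=n)
    pred_a = np.where(rng.random(n) < 0.900, y, 1 - y)  # roughly 90.0% accurate
    pred_b = np.where(rng.random(n) < 0.898, y, 1 - y)  # roughly 89.8% accurate

    # Paired bootstrap of the accuracy gap: resample the same indices for both
    # entries so their errors stay paired.
    gaps = []
    for _ in range(5_000):
        idx = rng.integers(0, n, size=n)
        gaps.append(np.mean(y[idx] == pred_a[idx]) - np.mean(y[idx] == pred_b[idx]))

    lo, hi = np.percentile(gaps, [2.5, 97.5])
    print(f"observed gap = {np.mean(y == pred_a) - np.mean(y == pred_b):.4f}, "
          f"95% CI = [{lo:.4f}, {hi:.4f}]")
    # If the interval covers zero, the two "places" are statistically
    # indistinguishable on this test set.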

The second is that it's also common for the top placers to game the structure
of the data or to overfit to the particular dataset or training set.

Good for competition :)

But bad data science :(

~~~
hcarvalhoalves
> The second is that it's also common for the top placers to game the
> structure of the data or to overfit to the particular dataset or training
> set.

And somewhat often, the datasets for those competitions are f***ed up, because
having clean, unbiased historical data is uncommon in organizations. Or they
mess up while anonymizing the data.

Another point about competitions is that you don't need to deal with things
like concept drift, implementation complexity, resource/latency constraints on
predictions, and a lot of other things that people applying ML in practice
have to face. More often than not, these factors dominate having a 1% or 0.1%
better model - unless you're somewhere like Google or Facebook.
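
For instance, a crude concept-drift check that a production system might run
continuously, but which a competition never asks for (illustrative sketch; the
feature values and the 0.01 threshold are made up):

    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(2)

    # Fabricated: one feature as seen at training time vs. in recent live
    # traffic, where the live distribution has shifted.
    train_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)
    live_feature = rng.normal(loc=0.3, scale=1.1, size=5_000)

    # Two-sample Kolmogorov-Smirnov test as a cheap drift alarm.
    stat, p_value = ks_2samp(train_feature, live_feature)
    if p_value < 0.01:
        print(f"possible drift: KS statistic = {stat:.3f}, p = {p_value:.2e}")
    else:
        print("feature distribution looks stable")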

------
bllguo
So is it fair to say a large part of what separates the top placers in these
competitions is feature engineering?

~~~
benhamner
The edge is almost never the choice of model architecture. It's very common
knowledge now that deep neural networks and gradient boosted machines are
incredibly effective at different problem classes, and that ensembling almost
always marginally boosts performance.

The edge winning teams have varies from competition to competition. It
includes

- robust cross-validation strategies
- robust feature selection strategies
- creative feature engineering
- finding uniquely valuable external datasets that improve performance
- robustly controlling for distributional differences between train and test
  sets (one common check for this is sketched below)
- (and unfortunately, on occasion) information leakage
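
One common way the train/test distribution point gets handled in practice is
"adversarial validation": train a classifier to tell train rows from test
rows, and worry if it can. A minimal sketch with scikit-learn and made-up data
(this is a general technique, not a description of what any particular winning
team did):

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(3)

    # Fabricated train/test feature matrices; the test set is deliberately
    # shifted relative to the training set.
    X_train = rng.normal(0.0, 1.0, size=(2_000, 10))
    X_test = rng.normal(0.2, 1.0, size=(1_000, 10))

    # Label each row by which set it came from and see whether a model can
    # tell them apart.
    X_all = np.vstack([X_train, X_test])
    y_all = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])

    auc = cross_val_score(GradientBoostingClassifier(), X_all, y_all,
                          cv=5, scoring="roc_auc").mean()
    print(f"adversarial validation AUC = {auc:.3f}")
    # AUC near 0.5: train and test look alike. AUC well above 0.5: they
    # differ, so local CV scores may not track the leaderboard.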

~~~
p4wnc6
Are we as a community learning anything noteworthy about the particular
composition of those approaches, e.g. the list of stuff at the bottom of your
comment, from the contest winners?

If yes, then why don't many, many participants snap to doing things exactly
the way past winners did them (not just the same model architectures, but the
same cross-validation strategies, the same inclusion of good outside data
sets, etc.)? That would make the differences between entries almost
indistinguishable very fast, meaning that winning vs. losing would probably
just come down to random noise, and contest performance probably shouldn't be
related to whether someone is good in that field or would be good for a
certain job.

If no, then what purpose do the contests serve? Perhaps just having fun, which
is fine, but clearly not advancing knowledge of how to design good general
model pipelines -- in which case contest performance really shouldn't be
related to whether someone is good in the field (that is, the researchers or
whoever _is_ figuring out general new knowledge about the best model pipelines
would be, but not necessarily contest winners), and contest performance again
wouldn't be very related to, say, whether someone would be good at a job in
this area.

I don't point this out to suggest the contests aren't valuable. On the
contrary, I think they are fun, interesting, and very valuable.

But I have always been skeptical that, even from first principles, contests
like this can be useful for finding "the best" engineers to hire. Either (a)
everyone snap-copies what the winners do, so either everyone's a good employee
or none of them are, or (b) you don't learn anything over time from the
general types of things that winners repeatedly do differently or better than
losers, and so being in the "winner" category is not systematically different
from being in the "loser" category.

The only third option I see is that we do steadily learn new systematic
principles that emerge from what winning groups happen to do, but for some
reason none of the other contestants choose to copy or recycle those ideas.
Then, whatever property it is that makes winners winners would be related to
discovering new effective ways of doing this, and not just luck-of-the-noise.

~~~
bllguo
I think things other than the choice of algorithm are harder to copy. For
instance, finding good outside data sets sounds a lot easier than it is. But I do
see your point. For me these competitions are more a fun educational tool
where I can practice the model-building process. I agree that they probably
aren't useful for finding the best engineers... although there are companies
who see things differently.

