

$10,000 algorithm competition: predict commute times on a Sydney freeway - datageek
http://kaggle.com/RTA

======
tgflynn
I would suggest that anyone considering entering a kaggle.com competition take
a look at their Terms and Conditions : <http://kaggle.com/Legals/terms> Note
section 9 which appears to require the winner to assign copyright on the
submission to the organizer.

Giving up all rights to your code has never seemed to me to be a good strategy
for a developer (except possibly in the case of full time employment).

~~~
reader5000
Submissions do not consist of code, merely predictions (at least for the
competitions I'm familiar with). I don't think there is any way they could
compel you to give up your code, but I'm no lawyer.

~~~
nl
Actually:

 _The winning entry has to be a general algorithm that can be implemented by
the RTA. That means that it takes a timestamp as an input and can generate
predictions for the next 15 mins, 30 mins, 45 mins, 60 mins etc. Results must
also be replicated by the RTA before any prize money is awarded._

So it sounds like they are expecting your code.
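
For illustration, here is a minimal Python sketch of the interface the quoted rule describes: a timestamp goes in, and predictions for the 15, 30, 45 and 60 minute horizons come out. The class name, the `(timestamp, seconds)` history format and the "mean of the same weekday and 15-minute slot" baseline are all assumptions made up for this example, not anything specified by the RTA or Kaggle.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

HORIZONS_MIN = (15, 30, 45, 60)  # the horizons named in the quoted rule


def slot(ts):
    """Bucket a timestamp by weekday and 15-minute slot of the day."""
    return ts.weekday(), (ts.hour * 60 + ts.minute) // 15


class SlotMeanPredictor:
    """Hypothetical baseline: mean historical travel time per weekday/slot."""

    def __init__(self, history):
        # history: (timestamp, travel_time_seconds) pairs; this format is assumed.
        history = list(history)
        buckets = defaultdict(list)
        for ts, seconds in history:
            buckets[slot(ts)].append(seconds)
        self.means = {key: mean(values) for key, values in buckets.items()}
        # Overall mean, used when a slot has never been observed.
        self.fallback = mean(seconds for _, seconds in history)

    def predict(self, ts):
        """Return {horizon_minutes: predicted_seconds} for a query timestamp."""
        return {
            h: self.means.get(slot(ts + timedelta(minutes=h)), self.fallback)
            for h in HORIZONS_MIN
        }


# Made-up readings for two Mondays, for illustration only.
history = [
    (datetime(2010, 11, 1, 8, 15), 600),
    (datetime(2010, 11, 8, 8, 15), 800),
    (datetime(2010, 11, 8, 8, 30), 900),
]
predictor = SlotMeanPredictor(history)
forecast = predictor.predict(datetime(2010, 11, 15, 8, 0))
```

Anything with this shape could be re-run by the organiser to replicate results, which is presumably the point of the rule.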

~~~
noverloop
Pseudocode will do as well.

------
rrrhys
A tollbooth was taken off this road about 3-4 months ago that was a major
congestion point - 2 years of historical data is largely useless I think.

~~~
robryan
As a little aside, being from Melbourne I was amazed how many toll roads there
are driving around Sydney.

~~~
rrrhys
Being from Sydney, I was amazed to see you guys have solved this particular
problem on your motorways (time to destination based on traffic).

Is it good? I was just roadtripping through.

~~~
robryan
I have no idea how accurate those times are. I always assumed they would just
be basic approximations, but you may well be right that they could be more in
depth.

~~~
Nick_C
When I lived there up to a couple of years ago, they were surprisingly
accurate, even during fairly heavy traffic.

I can't recall the exact details, but from memory they use road counters at
entries and exits to measure traffic numbers within sectors and, based on
theoretical maximum number of cars per sector, predict whether traffic is
capable of flowing at the speed limit or at some calculated reduced speed for
each sector. The speed calculations, along with each sector's distance, give
you predicted trip time to the various exits on the display.

As I said, don't quote me, but that was my understanding.
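
The sector-based estimate described above could be sketched roughly as follows in Python. This is only an illustration: the `Sector` fields, the 50% occupancy threshold, the linear slowdown and the 5 km/h floor are assumptions invented for the example, not the method actually used on Melbourne's motorways.

```python
from dataclasses import dataclass


@dataclass
class Sector:
    length_km: float        # sector length
    speed_limit_kmh: float  # posted limit
    capacity: int           # assumed theoretical maximum number of cars in the sector
    cars: int               # current count: entries minus exits from the road counters


def sector_speed_kmh(s, free_flow_share=0.5, crawl_kmh=5.0):
    """Speed limit while occupancy is low, then a linear drop towards a crawl.

    The threshold, the linear drop and the crawl speed are illustrative
    assumptions, not measured values.
    """
    occupancy = min(s.cars / s.capacity, 1.0)
    if occupancy <= free_flow_share:
        return s.speed_limit_kmh
    excess = (occupancy - free_flow_share) / (1.0 - free_flow_share)
    return s.speed_limit_kmh - excess * (s.speed_limit_kmh - crawl_kmh)


def trip_minutes(sectors):
    """Sum distance / speed over the sectors between the sign and the exit."""
    return sum(60.0 * s.length_km / sector_speed_kmh(s) for s in sectors)


# Made-up sectors: one flowing freely, one completely full.
route = [
    Sector(length_km=5.0, speed_limit_kmh=100.0, capacity=400, cars=100),
    Sector(length_km=2.0, speed_limit_kmh=80.0, capacity=200, cars=200),
]
minutes_to_exit = trip_minutes(route)
```

The displayed time to each exit would then be the running sum over the sectors up to that exit.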

~~~
rrrhys
Here is your $10,000

------
ryan-allen
Would you consider this kind of action as 'scraping the bottom of the barrel'?
In the design community so-called 'competitions' are considered rorts and
shunned by anyone worth their weight.

What is ironic is that the NSW govt. probably have spent 50 times the figure
working with private companies attempting to develop an acceptable solution,
and this stinks of a last ditch effort.

~~~
nikster
I wonder why they only offer $10k. It seems cheap, particularly, as you say,
they probably spent WAY more than that trying to do this in-house.

$10k is honestly hardly worth my time. I guess they're hoping for that one
bright high school kid... good luck!

~~~
brc
I'll say. There is a lot of work in this, and $10k is not going to attract
many professionals. They're obviously out for the talented hobbyist or
beginner.

Considering how much the RTA budget would be, it's almost insulting. Just
having a policeman sit in a car by some roadworks (so-called 'specials') would
cost them about $1000/shift.

The last time a new road project got opened near me the launch party would
have cost nearly $10,000 by the time it was done.

The prize should be at least $50,000 or $100,000. It would be a tiny speck in the
budget but that would motivate people to spend some real time on the problem.

~~~
willcannings
Take a look at some of the other projects on Kaggle - a lot of them have no
money or roughly similar amounts. One of the main crowds Kaggle attracts is
University students and staff, and they do these projects because they're
interesting and fun. It's similar to a shared task at an academic conference,
which is the sort of thing most of the active ML/KD&DM community are
interested in.

------
sandaru1
I'm a little bit sad to see that HN'ers are discussing the prize and money/time
ratio instead of the challenge itself. What happened to the "hacker" in
"Hacker News"?

I agree, money is a factor - but it doesn't have to decide your every move.
This is an interesting challenge whether the prize offered is worth your time
or not. Entrepreneurs are supposed to be a bunch of people who like solving
interesting challenges - not a bunch of people who like solving interesting
problems only if it makes them money.

~~~
papaf
I've fired up R and am doing some exploratory data analysis. I think it's an
interesting problem and will definitely submit something if I can get decent
predictions with the downloaded data.

I've looked at the literature and research has tackled such problems using
computer science (neural nets) and statistics (linear regression, Kalman
filters). In my opinion, it's easier to approach it as a statistics problem
than a computer science one and this may explain why there is some hostility
on HN. For instance, getting $10000 for a linear regression is easy money
(although I would be surprised if the winning entry used this).
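
To make the linear regression idea concrete, here is a minimal Python sketch (Python rather than R, purely for the sake of a self-contained example) that regresses each travel-time reading on the previous one using ordinary least squares. The function names and the sample numbers are made up for illustration, and nothing here suggests that such a simple model would be competitive.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_lagged(series, lag=1):
    """Regress each reading on the reading `lag` steps earlier."""
    return fit_line(series[:-lag], series[lag:])


# Made-up travel times in seconds at 15-minute intervals, for illustration only.
travel_times = [600, 620, 660, 740, 900, 880, 800, 700]
intercept, slope = fit_lagged(travel_times)

# One-step-ahead forecast from the latest reading.
next_reading = intercept + slope * travel_times[-1]
```

Longer horizons could be handled by fitting with a larger `lag`, at the cost of fewer training pairs.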

------
gregable
I've never looked at commute time data before, so I'm just an armchair
analyst.

It seems to me though, as a driver, that commute times are dominated by
accidents. And certainly while you could estimate the mean probability of an
accident based on the number of travelers, the daily variance seems insanely
high and it's unlikely that any amount of training data is going to decrease
that much. I wonder how good these results are going to be.

~~~
Andys
Still, something could be done - for example, I know that the M4 heads west
out of the city, and so is subject to poor visibility during afternoon peak
due to sunset.

------
icefox
Something more interesting is the company hosting the competition:
<http://kaggle.com/>

"a platform for data prediction competitions that allows organizations to post
their data and have it scrutinized by the world's best data scientists"

They have several up, it looks like.

------
c-oreills
This is slightly susceptible to result intervention - algorithm not doing too
well? Drive a tractor slowly down the inside lane until reality matches your
prediction!

------
jjcm
The competition seems artificially restricted based on what it's trying to
accomplish. Why limit yourself to past data? We have immense sources of data
for events months before they actually happen. Why wouldn't you account for a
cricket match/product release/concert that's scheduled to happen a month in
the future?

~~~
datageek
Doesn't seem to be anything stopping ppl from including other data.

~~~
pufuwozu
_You're welcome to bring additional data as long as it's publicly available._

<http://kaggle.com/view-postlist/forum-29-rta-freeway-travel-time-prediction/topic-195-using-additional-datasets-eg-rain-fog-etc/task_id-2467>

------
siculars
Not worth it. A better way to host a public contest is the way NYC is doing it
with NYCBigApps. Take a look at their ToS, <http://nycbigapps.com/rules>, note
section D. Those terms look fair to me.

------
yycom
Problem is you can't predict the speed limit on the M4.

