
$50,000 Spock Entity Resolution Challenge - bootload
http://challenge.spock.com
======
mattculbreth
This is a good way to find a) people to hire, b) publicity, and c) solutions
to tough problems. I have to assume they're doing it for the first two
reasons, not so much the latter. My guess is that they've already got a good
solution to this, given that it's integral to their business. Maybe they think
they'll find a better solution out there, and if so it'd make sense to hire
that person.

------
zkinion
Yes, this very much confirms my theory that startups must use unorthodox
methods to gain traffic, especially when the users of the site impact other
users.

spock is worthless without other users on spock. What is a phone
book/directory without people? If they can break through the chicken-and-egg
problem that many others have faced in the past, they'll be very successful,
because that same problem will then stand as a barrier to entry for competitors.

Holding competitions, paying users, or even sending spam are all viable methods
of getting initial traffic for a site.

~~~
danw
_spock is worthless without other users on spock_

From what I understand it crawls the internet to find people rather than
getting people to submit information. This way it's useful even without other
users on spock.

------
dood
A small prize for a big problem. Is this problem (differentiating individuals)
not what spock claims to be its core technology? If so, it is a mite worrying
that they need to advertise for a solution.

~~~
ntoshev
They are copying Netflix with this approach. However, it makes much more sense
for Netflix: the prize is $1m, they are an established company, and the
recommendation system is important but not the core of what they do.

I will go ahead and download the corpus though - the problem is interesting
and you can't easily find datasets like this.

Edit: Anyone care to post it as a torrent? I think posting a link here and on
Reddit will be enough to get a good download speed. I wish someone would post a
torrent for the Google N-grams corpus as well...

------
mattculbreth
Here's the Reddit discussion link:
<http://programming.reddit.com/info/1hy27/comments>

If we had Alex's social comment site we'd already have this linked in.

------
mattculbreth
Ok I'm registered. Downloading the 1.5G document set now. This is pretty
interesting.

~~~
bootload
Way to go, Matt. Did you say 1.5G? For those reading who want a summary, the
problem is 'entity resolution'.

 _'... A common problem that we face is that there are many people with the
same name. Given that, how do we distinguish a document about Michael Jackson
the singer from Michael Jackson the football player ...'_
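
To make the quoted problem concrete, here is a minimal sketch of one naive way to split documents that mention the same name into separate people. It is not Spock's method and not a contest entry: the function names (`tokens`, `jaccard`, `resolve`), the stopword list, the 0.2 threshold, and the sample sentences are all assumptions made up for illustration. It strips the shared name, compares the remaining words with Jaccard similarity, and greedily groups documents whose contexts overlap.

```python
import re

# Tiny, hypothetical stopword list; a real system would use a proper one.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "with", "his", "to"}


def tokens(text, name):
    """Return the set of lowercased context words, minus stopwords and the name."""
    name_words = set(re.findall(r"[a-z]+", name.lower()))
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w not in STOPWORDS and w not in name_words}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def resolve(docs, name, threshold=0.2):
    """Greedily cluster documents that share a name by context-word overlap.

    Returns a list of clusters, each a list of indices into `docs`.
    The threshold is an arbitrary assumption, not a tuned value.
    """
    clusters = []  # each entry: [merged_token_set, [doc indices]]
    for i, doc in enumerate(docs):
        t = tokens(doc, name)
        best, best_score = None, threshold
        for cluster in clusters:
            score = jaccard(t, cluster[0])
            if score >= best_score:
                best, best_score = cluster, score
        if best is None:
            clusters.append([set(t), [i]])  # start a new "person"
        else:
            best[0] |= t                    # grow that person's vocabulary
            best[1].append(i)
    return [members for _, members in clusters]


if __name__ == "__main__":
    docs = [
        "Michael Jackson released the album Thriller and toured as a singer",
        "Singer Michael Jackson recorded a new album after the Thriller tour",
        "Michael Jackson the football player scored a touchdown for his team",
        "The team said football player Michael Jackson scored a late touchdown",
    ]
    print(resolve(docs, name="Michael Jackson"))
```

On a 1.5G corpus something this simple would likely be far too slow and too crude; it only shows the shape of the problem, where the name itself carries no signal and the surrounding context has to do all the work.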

~~~
mattculbreth
Yes, 1.5G. And that's just the small dataset to get you going! They really
should have used a torrent for this thing. Right now I'm hoping my wireless keeps
up with the download.

~~~
yaacovtp
There are times when you want to keep things in your house. They're using
Amazon S3, so it's costing them $0.30 per download. Pennies compared to the $50k
prize.
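
As a rough check on "pennies", this small sketch takes the $0.30-per-download figure and the $50,000 prize at face value and counts how many full downloads it would take for bandwidth to cost as much as the prize. The function name is made up for illustration, and the per-download cost is the estimate given above, not a verified price.

```python
# Work in cents to avoid floating-point rounding.
PRIZE_CENTS = 50_000 * 100        # $50,000 prize, from the thread title
COST_PER_DOWNLOAD_CENTS = 30      # $0.30 per download, the estimate above


def downloads_to_match_prize(prize_cents, cost_cents):
    """Number of whole downloads whose total cost stays within the prize amount."""
    return prize_cents // cost_cents


if __name__ == "__main__":
    print(downloads_to_match_prize(PRIZE_CENTS, COST_PER_DOWNLOAD_CENTS))
```

That works out to roughly 166,000 downloads before the transfer bill reaches the size of the prize.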

