Hacker News
$50,000 Spock Entity Resolution Challenge (spock.com)
4 points by bootload on April 15, 2007 | 11 comments


This is a good way to find a) people to hire, b) publicity, and c) solutions to tough problems. I have to assume they're doing it for the first two reasons, not so much the latter. My guess is that they've already got a good solution to this, given that it's integral to their business. Maybe they think they'll find a better solution out there, and if so it'd make sense to hire that person.


A small prize for a big problem. Is this problem (differentiating individuals) not what spock claims to be its core technology? If so, it is a mite worrying that they need to advertise for a solution.


They are copying Netflix with this approach. However, it makes much more sense for Netflix: the prize is $1m, they are an established company, and their recommendation system is important but not the core of what they do.

I will go ahead and download the corpus though - the problem is interesting and you can't easily find datasets like this.

Edit: Anyone care to post it as a torrent? I think posting a link here and on Reddit will be enough to get a good download speed. I wish someone had posted a torrent for the Google N-grams corpus as well...


Yes, this very much confirms my theory that startups must use unorthodox methods to gain traffic, especially when the users of the site affect other users.

spock is worthless without other users on spock. What is a phone book/directory without people? If they can break through the chicken-and-egg problem that many others have faced in the past, they'll be very successful, because that same problem will become a barrier to entry for competitors.

Having competitions, paying users, or even sending spam are all viable methods for getting initial traffic to a site.


spock is worthless without other users on spock

From what I understand it crawls the internet to find people rather than getting people to submit information. This way it's useful even without other users on spock.


Here's the Reddit discussion link: http://programming.reddit.com/info/1hy27/comments

If we had Alex's social comment site we'd already have this linked in.


OK, I'm registered. Downloading the 1.5G document set now. This is pretty interesting.


Way to go, Matt. Did you say 1.5G? For those reading who want a summary, the problem is 'entity resolution'.

'... A common problem that we face is that there are many people with the same name. Given that, how do we distinguish a document about Michael Jackson the singer from Michael Jackson the football player ...'
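For anyone wondering what a first attempt at this looks like, here is a minimal sketch of one common baseline, not Spock's method and not anything from the challenge kit: represent each document that mentions the ambiguous name as a bag of context words (dropping the name itself), then group documents whose cosine similarity clears a threshold. The function names, the tiny stopword list, the 0.2 threshold and the toy documents are all hypothetical choices for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real run would use a proper one.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "on", "for",
             "his", "he", "was", "is", "with", "to"}


def bag_of_words(text, name_tokens):
    """Lowercased word counts, ignoring stopwords and the shared name itself."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS and w not in name_tokens)


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def resolve(docs, name, threshold=0.2):
    """Group documents sharing `name` into clusters, one per presumed person.

    Single-link clustering via union-find: two documents end up together if a
    chain of pairwise similarities >= threshold connects them.
    """
    name_tokens = set(name.lower().split())
    vectors = [bag_of_words(d, name_tokens) for d in docs]
    parent = list(range(len(docs)))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(docs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    docs = [
        "Michael Jackson released the album Thriller and toured with his band.",
        "The singer Michael Jackson performed songs from the album Thriller on tour.",
        "Michael Jackson played linebacker in the football game for Denver.",
        "Denver linebacker Michael Jackson made a tackle late in the football game.",
    ]
    print(resolve(docs, "Michael Jackson"))  # [[0, 1], [2, 3]]
```

On these toy documents the two music documents share "album" and "thriller" and the two football documents share four context words, so they split into two people. Single-link grouping is just the simplest choice here; on a 1.5G corpus you would need blocking or approximate nearest neighbours rather than comparing every pair.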


Yes, 1.5G. And that's just the small dataset to get you going! They really should have used a torrent for this thing. Right now I'm hoping my wireless keeps up with the download.


There are times when you want to keep things in-house. They're using Amazon S3, so it's costing them $0.30 per download. Pennies compared to the $50k prize.


or offer a DVD burning service.



