Two years ago I wrote http://formula1db.com to teach myself SQL. To get the data, I had to screen-scrape formula1.com. Once I had the data, learning SQL became a joy. I haven't done much with the data since I built the site, but I am considering open-sourcing it.

What I don't understand is why sports leagues don't open-source their data. What do they lose? It's a good thing that people are so excited about your sport that they build custom apps based on it. Sadly, sports leagues don't seem to get it. I remember MLB cracking down on a fan-generated database of baseball statistics a while ago.

Actually, MLB data is fairly close to being open-sourced. Historical data IS open-sourced (though not by MLB itself):

http://baseball1.com/content/view/57/82/

http://retrosheet.org/

Current major and minor league data is available as well, though MLB will crack down on anyone who is trying to make money off of derivative products. Here's where you'll find it, as XML:


One can do pretty cool stuff with all of it, and many people have, despite the fact that we can't make money off of it:

http://minorleaguesplits.com/ (my site)

http://baseball.bornbybits.com/2008/pitchers.html (analysis based on detailed pitch speed / break information that MLB started collecting last year.)

They think that they can't sell the data if they open source it. They probably also have control issues.

They think that a popular application using "their data" is necessarily lucrative, and they think that they should get a huge hunk of that money.

