
Ask HN: How to legally obtain sports data for commercial use? - sga
This is a long-standing question of mine, reignited by bignoggins' post at http://news.ycombinator.com/item?id=1772224 where he described the success of http://www.fantasymonsterapp.com.

I've often thought of building sports-related apps (esp. pertaining to fantasy sports), but I've always struggled with how to legally obtain the necessary data (scheduling, statistics, player images, team logos, etc.) so that I can pursue it as a commercial venture. An obvious solution is to simply scrape the info, but I'd assume you'd get shut down or blocked rather quickly. Yahoo offers a Fantasy Sports API, but it's to be used for non-commercial purposes only.

Can anyone shed light on where/how to obtain current and past sports data that is available for commercial use? (I'm most interested in NFL, NBA, MLB, NHL data)

Thanks!
======
mattmaroon
As for the stats, you purchase them from a stats provider. The big dog is
Stats.com, but they're very expensive. They are the primary source for all of
the major fantasy sites, though some use secondary sources (I think for
accuracy verification).

At Draftmix we used a competitor of theirs called PA Sports Ticker, which
Stats bought shortly after we shut Draftmix down. We had previously used a
cheaper one called XML Team, but we realized quickly that we had gotten what
we had paid for as the feeds were often updated very late or contained errors.
They're fine enough for getting started (and probably the easiest to
implement, since you pull the data on demand rather than having them post it
to you) especially if you don't require live stats. Live stats cost more and
are harder to implement. You could get post-game stats and schedule data for a
few grand a year back then from XML Team, live stats for a few times that, but
I don't know what Stats buying their primary competitor has done to prices. I
can't imagine it's made them get cheaper.

There's a new one called Sports Direct. I don't have any experience with them,
but our former salesman from PA works there. I'd be happy to put you in
contact if you'd like, just email me. He's a good salesman at least.

For player images and team logos you need to set up licenses. Logos come from
the league (NFL, MLB), player images from the players' unions (NFLPA, MLBPA).
This is very costly. The actual images themselves can be provided by Stats and
other sources, but you can't use them without paying the license (though Stats
may have worked out a deal that lets them include that in the package).

The Fantasy Sports Trade Association (fsta.org) is the best place to find
service providers for the industry. Anyone worth anything is a member.

~~~
mattmaroon
I should also mention that while scraping is an option, though possibly not a
legal one, some of the stats sites (I've heard) routinely put some false data
in there to catch scrapers. You could probably get around this by scraping
multiple sites and throwing out any data that isn't identical.
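
A rough sketch of that cross-checking idea, in Python. The source dicts, the
`(player, stat)` keys and the numbers are made up for illustration; the only
rule implemented is "keep a value only if every source reports it identically".

```python
def consensus_stats(sources):
    """Keep only the facts that every scraped source reports identically.

    `sources` is a list of dicts, one per site, mapping a hypothetical
    (player, stat) key to a value.
    """
    if not sources:
        return {}
    # Only keys present in every source can be cross-checked at all.
    common_keys = set(sources[0]).intersection(*sources[1:])
    agreed = {}
    for key in common_keys:
        values = {src[key] for src in sources}
        if len(values) == 1:  # all sources agree, so keep the value
            agreed[key] = values.pop()
    return agreed


# Made-up example: site_b carries one planted (false) number.
site_a = {("Player A", "pass_yds"): 287, ("Player B", "pass_yds"): 310}
site_b = {("Player A", "pass_yds"): 287, ("Player B", "pass_yds"): 311}
site_c = {("Player A", "pass_yds"): 287, ("Player B", "pass_yds"): 310}

clean = consensus_stats([site_a, site_b, site_c])
```

Here `clean` keeps only Player A's line, since the sources disagree on Player
B. Note that this throws away the disputed value rather than guessing which
site is right, and it says nothing about whether the scraping itself is legal.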

------
iheartmemcache
This[1] StackOverflow thread answers this question pretty thoroughly.
Unfortunately it looks like if you want it to be legal, you should be prepared
to pay a hefty (> 4 figures) monthly subscription fee.

[1] <http://stackoverflow.com/questions/57106/anyone-know-of-an-nfl-or-nba-api>

~~~
weixiyen
It's a lot more than 4 figures. Getting just the basic data will be in the 5s,
and that's per season of 1 sport. Maybe I suck at research but after talking
to the reps about pricing, 4 figures would be an absolute steal.

~~~
niravshah
4 figures a month, 5 figures a year/season :)

~~~
weixiyen
Ah it appears I just suck at reading. Thanks :)

------
mccutchen
I too have wondered about this for a couple of minor side projects. I've
always resorted to scraping, but it feels wrong and I know it's not feasible
for commercial projects.

I've always assumed that there was some commercial data source out there that
would provide all of this information in a nice, structured format for some
kind of fee, but I have yet to find it.

One nice thing about major sites moving to "live" scoreboards is that you can
often find nicely structured data sources behind them. For instance, here's
the NFL's live score feed, in JSON:

<http://www.nfl.com/liveupdate/scores/scores.json>

(Unfortunately, it's empty as I write this because there are no games going on
right now. Here's an example taken late on a Sunday or on Monday morning:
<http://gist.github.com/626612>)
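
For anyone who wants to play with a feed like that, here is a minimal Python
sketch of turning such a payload into readable lines. The field names in
`SAMPLE` (`home`, `away`, `abbr`, `score`, `T`, `qtr`) and the game id are
assumptions for illustration, not a documented layout, so check them against
the real JSON before relying on any of this. It also handles the empty feed
you get when no games are on.

```python
import json

# Hypothetical payload: the structure and values are illustrative assumptions.
SAMPLE = """
{
  "2010101000": {
    "home": {"abbr": "HOM", "score": {"T": 24}},
    "away": {"abbr": "AWY", "score": {"T": 17}},
    "qtr": "Final"
  }
}
"""


def summarize_scores(raw):
    """Return one 'AWAY n @ HOME m (status)' line per game in the payload."""
    # An empty body (no games in progress) is treated as "no games".
    games = json.loads(raw) if raw.strip() else {}
    lines = []
    for game_id in sorted(games):
        game = games[game_id]
        lines.append("{} {} @ {} {} ({})".format(
            game["away"]["abbr"], game["away"]["score"]["T"],
            game["home"]["abbr"], game["home"]["score"]["T"],
            game.get("qtr", "?")))
    return lines


for line in summarize_scores(SAMPLE):
    print(line)
```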

Another, related question is how to get good gambling information (point
spreads, totals, etc.) for the same use case. I think this might be easier, as
I've come across various sports book sites in the past that offer subscription
services.

On Yahoo's NFL odds page, it says their data source is OddsShark
(<http://www.oddsshark.com/>) whose home page advertises

    
    
      Offer comparative live odds and other sports stats on your site FREE of charge.
    
      You pick the sportsbooks, you pick the bet types and OddsShark.com sports betting odds engine does the rest, delivering feeds to your site.
    

I got in touch with them, but never received a response...

~~~
davidedicillo
That's what I was using for Twootball; the problem is that it breaks during
the playoffs.

------
dustym
I've worked with STATS, Inc and XML Team. I've also implemented features
against the Yahoo! Fantasy API and it was very nice to work with.

First off, you are going to have to deal with a rep.

STATS is the big name in the business and they feed, at least partially, many
stat resellers from whom you might be able to get cheaper rates. From there I'd
say you should find a cheap service or a mechanism (scraping, etc.) that gives
you just enough data to work with and start building against it. Look at XML
Team for competitive pricing. If you get to the point where your app is past
prototype, you should then investigate buying into the full service.

Depending on the day and the feed, wrangling sports data is awesome or
horrible or both.

On the subject of scraping, I'm not sure what the legalities are. Obviously
you are probably violating the TOS of any site you are visiting if you grab
the data, but at the same time, strikes, balls and fouls are facts of the
game.

Images and logos are sometimes provided by sports data brokers.

Take a look at <http://www.stats.com/> and <http://www.xmlteam.com/>

------
dougb
What about crowd sourcing the data collection ? Make an app to let sports fans
enter the data as they watched the game and publish it under a CC license ?

I've been to many baseball games where I've seen people keeping score on paper
while watching the games.

------
jat850
In my previous development experience for a fantasy gaming website, I dealt
with Stats Inc and the API/datasets they provided.

Without permission I don't think it would be fair for me to provide you a
direct contact, but they did offer all of the data our site required, in
useable formats.

Our initial site only dealt with the NBA as they provided the best avenue for
use of their logos and player names.

Feel free to contact me more directly if you want a bit more info.

Best of luck!

------
cloudkj
Hmmm, seems to me there's a business opportunity in providing lower cost, more
developer friendly sports data. I remember looking into getting access to
sports data since I wanted to do some analytics after I read Moneyball. Old,
archived data is easy to come by, but any fresh, real-time data sources seem
to have non-trivial costs.

I guess there might be some restrictions on who gets access to the official
raw data for various games, depending on the sports league. If the costs for
getting that data are high, then the only way to circumvent that would be to
collect them yourself. Even then, I don't know if the leagues would come at
you hard for gathering data and using team names or player names...

~~~
kreek
I don't know about other sports, but I have researched this for the Premier
League (soccer) and they own the stats lock, stock, and barrel. Even if you
hired a bunch of mechanical turks to watch every game and jot down every shot
and tackle, the stats would still belong to the Premier League because the
games are arranged by them. If you list the Premier League schedule on your
blog, you'll get a cease and desist unless you pay them a fee.

~~~
LargeWu
American courts have ruled that sports statistics are "pure facts", and
therefore not copyrightable. In fact, companies like STATS, Inc. rely on the
fact that they are public domain. When you pay STATS, what you are buying is
the compilation and quality control they provide.

~~~
notahacker
UK courts have ruled otherwise, and the FA has a copyright troll that
specialises in sending C&D letters to webmasters who display fixture lists,
and to their web hosts.

------
mikerhoads
I work for the NFL and we license our data directly here:
<https://www.nfl.info/NFLConsProd/Welcome/index.html>

I don't really know the exact pricing structure but I don't imagine it is
cheap.

------
retree
The situation in the UK is that anything (and this includes fixtures, logos,
statistics, even live Twitter score updates) has to be licensed through a
company called Football DataCo. The cost comes to ~$6000/season just for
fixtures. You can't even use names that sound similar, for example calling
Liverpool 'Merseyside Red'. [1]

They enforce this strongly, outsourcing it to a company that does only this
sort of thing.

[1] <http://www.epltalk.com/2010-11-premier-league-opening-day-fixtures-fiasco-21003>

------
gcaprio
I'm actually glad someone brought this up. I'm starting a company around this
very idea: making data available and consumable. Our first site is up:

<http://www.cfbreference.com>

There's about 5 years of data that we've culled from the NCAA about CFB. We're
adding more every week and will soon go back in time for historical data.

But our twist is that the site will be upgraded to be a completely consumable
site: full REST API support, dynamic URL data generation, and more. We're
adding new stuff every day. So you can get the data your way in JSON, RDF, XML,
and HTML depending on your Accept header, query string parameter, and even URL
parameters.
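
To make the format selection concrete, here is a small Python sketch of how a
server could choose between those representations. It is an illustration under
assumptions, not our actual implementation: the `.json`-style URL suffix, the
`?format=` query parameter, the precedence order (URL, then query string, then
Accept header) and the `example.com` URLs are all hypothetical, and q-values in
the Accept header are ignored.

```python
from urllib.parse import urlsplit, parse_qs

# Format name -> media type it corresponds to in an Accept header.
SUPPORTED = {
    "json": "application/json",
    "rdf": "application/rdf+xml",
    "xml": "application/xml",
    "html": "text/html",
}


def pick_format(url, accept_header="", default="html"):
    """Choose a response format from the URL, query string, or Accept header."""
    parts = urlsplit(url)

    # 1. URL parameter, assumed here to be a file-style suffix (/teams/1.json).
    last_segment = parts.path.rsplit("/", 1)[-1]
    if "." in last_segment:
        suffix = last_segment.rsplit(".", 1)[-1].lower()
        if suffix in SUPPORTED:
            return suffix

    # 2. Query string parameter, assumed here to be ?format=...
    requested = parse_qs(parts.query).get("format", [""])[0].lower()
    if requested in SUPPORTED:
        return requested

    # 3. Accept header: first listed media type that we support.
    for item in accept_header.split(","):
        media_type = item.split(";", 1)[0].strip().lower()
        for name, mime in SUPPORTED.items():
            if media_type == mime:
                return name

    return default


print(pick_format("http://example.com/teams/1.json"))
print(pick_format("http://example.com/teams/1?format=xml"))
print(pick_format("http://example.com/teams/1", "application/rdf+xml, text/html;q=0.9"))
```

Falling back to HTML when nothing matches keeps plain browser requests working.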

We are going to try to build apps on top of this data, but the data sites are
and will remain FREE. We want to encourage community participation and
contributions. That means free for anyone, anywhere, even if you yourself
don't contribute data.

We're also going to add scoring / charting apps for mobile phones so that you
can chart your own games and, if you'd like, contribute the data back to us.

We're not 100% there yet, but I'll post here when we are. We'd love feedback
from the entire HN community, not only on the sports data aspect but on the
technical implementation. After all, if it's not easy to use & powerful, we're
not doing a good enough job.

~~~
sga
It definitely seems that there must be an opportunity here to compete against
STATS and others. But if (like you) you wanted to compete, where do you legally
get the stats in the first place, and on an ongoing basis? What is the original
source of the data? Do you have to pay for game tapes, watch them and compile
the stats yourself?

~~~
cloudkj
I'm also curious as to how you get the data. The fact that there seems to be
almost a monopoly by STATS on the data leads me to believe that there needs to
be more competition in this space. Curious as to how you guys are going about
it.

~~~
gcaprio
The NCAA publishes their data as batch feeds for use. We're starting with just
using that. In addition to the current data they publish that way (2000-), a
lot of old data is available dating back to the 1800s in various formats. (I'm
speaking of college football here in particular.)

However, I believe there are certain restrictions surrounding the real time
access of data. We haven't come up against that obstacle yet, since we're
first attacking historical archiving of data after the fact, even if it's only
1 day after the games have been played.

------
ironblunt
We do mostly baseball and in our first year, we used a bunch of retrosheet
data for historical data and we went with BIS for current season data. They're
pretty laid back and their data was pretty detailed for us. Recently, we went
with MLB.com's xml data over at <http://gd2.mlb.com/components/game/mlb/>
where we had to do a bunch more calculations to get all the data we wanted,
but it's free (for now).
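
To give an idea of the kind of calculation I mean, here is a Python sketch
that derives the usual rate stats from raw counting stats. The formulas are
the standard ones (AVG = H/AB, OBP = (H+BB+HBP)/(AB+BB+HBP+SF), SLG = TB/AB);
the function name, argument names and the sample line are made up for
illustration and are not taken from the feed itself.

```python
def batting_rates(ab, h, bb=0, hbp=0, sf=0, doubles=0, triples=0, hr=0):
    """Derive AVG, OBP and SLG from counting stats, rounded to 3 places."""
    singles = h - doubles - triples - hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr

    avg = h / ab if ab else 0.0
    # On-base denominator counts every plate appearance that affects OBP.
    obp_denominator = ab + bb + hbp + sf
    obp = (h + bb + hbp) / obp_denominator if obp_denominator else 0.0
    slg = total_bases / ab if ab else 0.0

    return {"avg": round(avg, 3), "obp": round(obp, 3), "slg": round(slg, 3)}


# Made-up game line: 2-for-4 with a walk, one of the hits a home run.
line = batting_rates(ab=4, h=2, bb=1, hr=1)
print(line)
```

The zero checks are there because a player can appear in a game without an
official at-bat.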

We also looked at XML Team and I found their prices to be completely
reasonable, and they have a per-document pricing structure which allows you to
control your costs to a much greater extent.

We also spoke with Stats Inc and found them to be pretty unreasonable in terms
of dealing with startups and for home projects.

Hit me up if you want any more data or if Benchcoach can help with the data on
the baseball side. We're looking at expanding it to football and basketball
this year, so we've been speaking with XML Team about that.

------
luffy
Regarding scraping:

I have a hard time figuring out what the difference is between having a human
read a web page with sports scores on it, and then entering those scores in to
your application vs. having a scraper grab those scores automatically. In most
cases, these source web pages will be publicly available without requiring any
agreement to a terms of service contract.

Scraping a site and using the actual HTML in your application would be a
copyright violation, definitely. Sometimes a particular format can even be
patented. So I'd definitely stay away from actually scraping out an entire
table and inserting that into your app.

But as far as the scores/facts - those are not subject to copyright. So what
is the particular legal issue if you are scraping and only getting _non-
copyrightable facts_ from a publicly available web page? I'm genuinely curious
to know.

~~~
sga
I don't know. I agree that the game scores, roster data, etc. are facts, but at
the same time work was performed at some point to compile that data, so I
assume there must be some legal restriction on simply copying and reusing
it. Of course I don't know for sure; it's just a gut feeling. Wish I could find
a definitive answer without having to pay legal consultation fees I can't
afford.

~~~
bignoggins
From a legal point of view, duplication of facts is allowable according to
Feist Publications v. Rural Telephone Service. Learned this in my IP law
class. Duplication of original expression is forbidden. Stats, imo, fall into
the "facts" category.

------
hakan
I had the same experience a lot of people are describing here with STATS -
just too expensive out of the box. If you need real-time data, they're your
only option, but if you need weekly updates or historical data, I may be able
to help.

Playerfilter (<http://www.playerfilter.com>) is built on top of an API that we
are looking to expose to the public (you can see it being used in the URL
hash). API support isn't live yet but we are working with beta testers.
Basically, we return data for players, seasons and games over any time period
since 1970. Please check it out and drop us a line if you'd be interested in
more details.

------
terra_t
With scraping I'd be concerned about legal issues more than I would about
getting "shut down or blocked" technically. On the other hand, I spent my
postdoc time working on "low observable" webcrawlers rather than physics...

I'd seriously considered a sports-related project based on open data and I was
still concerned that I could get into legal trouble, so I sorta merged the
project into something much bigger, in which the sports content would be
barely noticeable.

------
stevederico
I know Bloomberg Sports provides data via an API, but it's not cheap.
<http://www.bloombergsports.com/>

------
shafqat
We (<http://platform.newscred.com>) might be able to help. Drop me a line
(email in platform).

