
Ask HN: Would anyone use an API for sports data? - romellogoodman
I'm brainstorming for a personal project. It would be similar to http://sportradar.us and be an API for sports stats.
======
AstroJetson
To practice my R skills (I wanted a different dataset to play with) I took an
online class from EdX taught by BostonU baseball guru Andy Andres. They talk a
lot in the class about sabermetrics and the data sources you can get. The
public stuff (either free or through paid sources) is a fraction of the data
that the clubs get and use for their own internal use. Things like wind on the
field, each pitch speed, where it went in the strike zone, etc. are not
released out.

So you have a tough climb to get access to that data without getting the clubs
to give it up. But if you build it the fans will come :) there is an amazing
number of people that live for the little nuanced data.

For those of you interested in Sabermetrics or think you'd like to know more
about baseball stats, I highly recommend Andy's class.

------
eropple
I do use APIs for sports data, but I pull that data from existing places.
Unless you have some impressive connections or a _lot_ of time on your hands,
you're probably not getting data with the same detail as, say, baseball's Sean
Lahman or Retrosheet (or commercial options like the data that underlies
Synergy video tagging for the NBA), and specialty stuff like SportVU data
(again NBA) is not exactly easy to come by without dropping All Of The Money
on a license. I can't speak to the NFL or NHL, but I know datastores exist
with accessible APIs and very good data sources.

This is a pretty mined-out space. Might be fun (and if so, go for it!), but I
don't know how applicable it would be if you're looking to build it for others
to consume.

~~~
larrykubin
Do you mind sharing some of the existing data sources / APIs you are using?

~~~
eropple
I don't operate in this space professionally at this point, so I steer clear
of the paid options, but everything I listed in that post is a decent starting
point. For pure statistics in other sports the Sports-Reference APIs are fine
(though I'd rather pull down the data, and the Lahman database is better for
baseball); I don't have any good play-by-play data for the NBA aside from
commercial sources. I used to have some affordable, but kinda crappy, NBA
play-by-play/shot-chart data, but if you're interested in that I'd just be
ready to open the wallet.

------
danvoell
Let me know if you need any help. We built a machine learning service called
hopdata and I have been looking for a sports API to link into so we could do
something similar to fivethirtyeight.
[http://www.hopdata.com](http://www.hopdata.com)

~~~
romellogoodman
Sounds good. What type of data would you want to get from a sports data api?

~~~
danvoell
At this point, I had it in my mind as a broad and large undertaking, we would
really need to dig into what information is available. For instance, if we
look at baseball, we could make predictions based on historic outcomes and
real-time scores but it would be much better the more granular we can get.
Current score, current inning, current outs, current count, which players are
in the outfield, who are the next 9 people to bat, who is pitching, which
ballpark, the weather...
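
As a rough illustration, the fields listed above could travel together as one snapshot record per pitch. This is only a sketch: the class name `GameState`, every field name, and all sample values are made up here and do not come from any existing API.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class GameState:
    """Hypothetical snapshot of one baseball game at a single moment."""
    home_score: int
    away_score: int
    inning: int
    top_of_inning: bool
    outs: int
    balls: int
    strikes: int
    pitcher: str
    ballpark: str
    outfielders: list = field(default_factory=list)
    next_batters: list = field(default_factory=list)
    weather: dict = field(default_factory=dict)

    def to_json(self) -> str:
        # Serialize the snapshot so it can be handed to a prediction service.
        return json.dumps(asdict(self), sort_keys=True)


# Placeholder values, not real game data.
snapshot = GameState(
    home_score=3, away_score=2, inning=7, top_of_inning=False,
    outs=1, balls=2, strikes=1,
    pitcher="Pitcher A", ballpark="Example Park",
    outfielders=["LF A", "CF B", "RF C"],
    next_batters=["Batter 1", "Batter 2", "Batter 3"],
    weather={"temp_f": 72, "wind_mph": 8},
)
print(snapshot.to_json())
```

A flat record like this is easy to log per pitch, which is roughly the granularity a model would need.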

~~~
romellogoodman
That makes sense. I feel like the more granular and real time you can get, the
better

------
arcanus
Would be fun to have something that easily imported into pandas or another
decent data platform so one could run some simple regressions/data science on
it. Might be able to get some people to help out if you open sourced it on
GitHub, or something similar.

~~~
romellogoodman
I never thought of this. What other data platforms are out there and what is
the most popular or widely used in your opinion?

~~~
arcanus
Most of my stuff is in numpy/scipy, so that sort of array would be great (they
are C arrays under the hood, so pretty fast).

As you can tell, most of my work is in python. But R is absolutely huge, so
providing an interface to that would certainly be handy to most.
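
One language-neutral way to serve both camps is plain CSV, which pandas (`read_csv`) and R (`read.csv`) can both load. A minimal sketch, assuming the API hands back nested JSON-style records; the record layout and the `flatten` and `to_csv` helpers are invented for illustration.

```python
import csv
import io


def flatten(record, prefix=""):
    """Turn nested dicts into one flat dict with dotted column names."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


def to_csv(records):
    """Write flattened records to a CSV string, one row per record."""
    rows = [flatten(r) for r in records]
    # Collect every column seen, keeping first-seen order.
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample records, not real statistics.
games = [
    {"player": "Player A", "stats": {"points": 20, "rebounds": 5}},
    {"player": "Player B", "stats": {"points": 12, "rebounds": 9}},
]
csv_text = to_csv(games)
print(csv_text)
```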

------
callmeed
Absolutely. I have a couple of sports apps [0] that use both scraped web data
and a hidden API to get realtime baseball game information. One of them won a
sports hackathon a couple years ago (Sportradar was a sponsor).

From what I recall, Sportradar is really expensive because they have people
manually entering data all the time. If you could find a happy medium where
your data is close to real-time but doesn't require a lot of manual work, you
could keep your costs low and prices affordable for small app developers.

I'd suggest targeting mobile developers and not just fantasy/daily fantasy.
I'd love to work some more up-to-date questions in my sports trivia app.
Example: "Who had the most rebounds in Warriors/Cavs game 1?"

[0]

[https://itunes.apple.com/us/app/hat-trick-daily-sports-
trivi...](https://itunes.apple.com/us/app/hat-trick-daily-sports-
trivia/id722427984?mt=8)

[https://itunes.apple.com/us/app/on-deck-baseball-
alerts/id91...](https://itunes.apple.com/us/app/on-deck-baseball-
alerts/id911024857?mt=8)

~~~
Touche
How far off are we from using machine learning to gather data from video,
removing the manual entry cost?

~~~
Eridrus
You're probably better off with an OCR system that's configured per channel
display format, rather than an ML solution. ML systems require training
regardless of effectiveness.

------
krick
I don't really see the point of the question. I doubt you hope to find out
there is a significant consumer-base in need, for reasons already mentioned,
so are you asking if it's useless? I would say no, because making data
accessible always is good.

Personally, I don't care in the slightest for stuff like football, hockey or
whatever, but I remember struggling to find a comprehensive data set on human
physical abilities, such as results in light & heavy athletics, which would
include year, age, measurements and all other stuff that might be potentially
interesting when analyzing something like that. Data on not-mainstream events
is especially hard to find, which is a pity, as it is potentially even more
interesting, because olympic athletes are obviously less human-like.

------
kornish
Definitely! Thinking about building a natural language query API over sports
data, similar to Statmuse [0], but the whole thing kind of depends on having
data in the first place.

[0]: [https://www.statmuse.com/](https://www.statmuse.com/)

~~~
romellogoodman
What type of data would you be looking for?

~~~
kornish
Individual player and team stats, probably at a per-game resolution, would be
ideal. Relational works great but any format could work. Additionally, a join
table of player-team memberships over time (per season?) would be useful for
visualizing timelines. Bulk data downloads could also be useful, since the
tool I'm envisioning wouldn't depend so much on real-time data as the ability
to explore past trends.
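
To make the relational idea concrete, here is a small sketch using Python's built-in `sqlite3`. The table and column names (`players`, `teams`, `memberships`, `player_game_stats`) and all rows are hypothetical; team-level stats would follow the same pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE teams   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Join table: which team a player belonged to in a given season.
CREATE TABLE memberships (
    player_id INTEGER NOT NULL REFERENCES players(id),
    team_id   INTEGER NOT NULL REFERENCES teams(id),
    season    INTEGER NOT NULL,
    PRIMARY KEY (player_id, team_id, season)
);

-- Per-game stat lines at player resolution.
CREATE TABLE player_game_stats (
    player_id INTEGER NOT NULL REFERENCES players(id),
    game_date TEXT NOT NULL,
    points    INTEGER NOT NULL
);
""")

# Made-up placeholder rows.
conn.execute("INSERT INTO players VALUES (1, 'Player A')")
conn.executemany(
    "INSERT INTO teams VALUES (?, ?)", [(1, "Team X"), (2, "Team Y")]
)
conn.executemany(
    "INSERT INTO memberships VALUES (?, ?, ?)",
    [(1, 1, 2014), (1, 1, 2015), (1, 2, 2016)],
)


def team_timeline(connection, player_id):
    """Return (season, team name) pairs for one player, oldest first."""
    return connection.execute(
        """
        SELECT m.season, t.name
        FROM memberships AS m
        JOIN teams AS t ON t.id = m.team_id
        WHERE m.player_id = ?
        ORDER BY m.season
        """,
        (player_id,),
    ).fetchall()


timeline = team_timeline(conn, 1)
print(timeline)
```

Shipping the whole thing as a single SQLite file would also cover the bulk-download case.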

------
AlwaysBCoding
I currently use sportradar. People will definitely use a sports data API --
but you would have to beat sportradar somewhere (either on developer
experience, price, accuracy, feature set, number of sports covered, etc...) to
make a viable business out of it.

~~~
romellogoodman
That's what I was thinking. The easiest way to beat them would be price and
developer experience. The data is a little pricey and the
docs seem hard to understand.

What are the biggest shortcomings you've found with the service?

~~~
AlwaysBCoding
You probably won't beat them for B2B. The service is good but not great, but
the switching costs are so high that even if your service was better in some
way you wouldn't win any customers. They're competent and get the job done and
with B2B if you have an existing foothold that's normally enough.

The service is definitely too expensive, the data is occasionally wrong and I
have to hound them to fix it, and the developer experience is surely lacking,
but at the end of the day a working JSON API of sports data is a working JSON
API of sports data, all the other stuff is just noise.

The proper way to beat sportradar would be to find a way to pursue a B2C model
instead of a B2B model. i.e. instead of 3k a month to license the data if
there was some way to pay $25/month for a similar experience you might be able
to capture a much larger market share that sportradar prices out. If you could
get college kids hacking on sports stats to use your service because it's the
only thing they can afford you might be able to grow that into something.

~~~
romellogoodman
Yeah, that's what I was thinking, a very low entry point. I don't see any
problems associated with only charging $15-25/month to access the data.

One thing that I have been wondering is, how do I get away with using the data
without the leagues intervening? Or do they not own the data because the stats
are publicly accessible?

~~~
AlwaysBCoding
You can use the stats without the league's permission for sure. Check out
[http://www.bitlaw.com/source/cases/copyright/nba.html](http://www.bitlaw.com/source/cases/copyright/nba.html)
if you want the actual legal grounds for why you can use stats.

------
slindz
Depends on exactly what you're setting up.

If you're doing the collection, and making that data available - I would be
very interested. If it's the framework to do collection myself - not so much.

Looking forward to more details!

~~~
romellogoodman
I would be doing the collection myself and making the data available.

~~~
stevefeinstein
Have you looked into what the MLB, NFL, NBA, etc would do to block you? While
they have no legal protection of facts, you can't copy their facts from them
and hope they won't come after you.

~~~
romellogoodman
No I have not. But is there any way for them to know if a fact was copied from
them or not? If Steph Curry made 10 out of 11 field goals in a game, how would
they know if I watched the game and found that data out or looked on their
website?

------
abathur
Once in a blue moon I get the urge to run some analysis or another on sports
stats, but I almost inevitably end up having second thoughts when I realize
obtaining good data is going to be an ongoing chore.

~~~
romellogoodman
Yeah, I feel like obtaining historical data will be easy, but staying up to date
is the real challenge.

~~~
abathur
On the upside, the existence of that friction definitely points to a gap where
value can be added. Another likely source of friction where you can add value
is the work anyone working on multiple information sources (whether this is
several databases/apis for one sport, or for different sports altogether) has
to do to integrate multiple services and data models. If you can identify some
people who are doing that work in the first place, you can probably make a
business case for how you can simplify their application and give them a
single interface/API for all of that data.

I'll also echo the suggestion from @scottrblock on the potential usefulness of
various metadata. If all you have are game statistics, while people can
obviously do lots of novel analysis on the data, it's fairly obvious what the
"use" is. With a wide spectrum of metadata, you might end up simultaneously
creating something of value to someone analyzing the effect of weather on
different sports/teams/players and for hotel chains trying to set prices based
on the historical attendance for all sporting events within 50 miles.

If you could create a revenue model that supports delivering the service both
to continual high-volume business users and to users just using the API
occasionally to support an analysis project, perhaps it's possible to pass
revenue downstream to other data sources. Granted, this adds a number of
administrative problems (people submitting data collected/generated/sold by
other parties as their own to collect revenue on it; getting low-
quality/falsified data; etc.)

* I wasn't aware of Stattleship or Sportradar before this thread; no idea how well they cover these.

------
swanson
[http://developers.stattleship.com/#introduction](http://developers.stattleship.com/#introduction)

~~~
romellogoodman
Seems like what I was looking for but I wonder how robust their data is.

------
rralian
I was looking for an API for NCAA bowl game results for a simple app to
coordinate a family pool that we do every year and I couldn't find anything
that would cost less than several thousand dollars (I probably wouldn't pay
anything really). I just created a dashboard to enter the scores. If something
was available I'd love to tie into it.

------
jboggan
Yes indeed. I have an ongoing project with college football and I was
surprised nothing good for this actually exists.

Correctness and comprehensiveness are probably more important in my mind than
having the information immediately. I'd wager most people around here would
want to use it for training data rather than any sort of real-time system.

~~~
romellogoodman
What type of data are you collecting in your system, if you don't mind me asking?

~~~
jboggan
I was just trying to start with really simple things like all of the weekly
game scores and what rank the teams held at that point in time. I was
surprised no one had this and I just had to write a scraper.

------
imroot
ESPN had an API for a while, but, it didn't provide scores (they were getting
the scores from a third party).

I've been looking for something like this for one of my personal projects,
but, haven't really had the time to spend on it recently.

~~~
romellogoodman
Besides scores, what other data would you look for?

~~~
imroot
Upcoming Games, player stats, feeds of player photos.

------
pbreit
Wouldn't files be much, much easier to work with? Or do APIs enable you to
regulate consumption since the data is so expensive?

~~~
romellogoodman
APIs allow anyone to request and use the data without having to host all of the
data. At least that's what I was thinking.

------
scottrblock
Very much so. Especially if it pulled in other tangential data such as weather
at the time and location of the game.

~~~
romellogoodman
So stats and metadata such as weather, location, and time are what you would
look for?

------
bbcbasic
My company uses such services in other sports. They are professional punters.

I imagine William Hill etc would find it useful

------
lowmagnet
Stats, LLC is another example.

~~~
romellogoodman
Do they have a publicly available api though? It seems like you have to
contact a sales rep just to get access.

------
dayre
I would build a REST client for it and pipe all the data to /dev/null.

~~~
romellogoodman
Yeah, that's what I was thinking. A REST API to interface with all of the data.

------
blts
Hedge funds?

