
When Sports Betting Is Legal, the Value of Game Data Soars - jbredeche
https://www.nytimes.com/2018/07/02/sports/sports-betting.html
======
jonbaer
"Some legal experts, like Ryan Rodenberg, an associate professor of sports law
at Florida State University, believe that, as with musical recordings and
other copyrighted material, courts will find that real-time sports data is
owned by those who produce it: the leagues and their players.

Others dismiss that view. Marc Edelman, a professor of law at Baruch College,
said he believed that only "pre-scripted" events were subject to copyright —
meaning that while professional wrestling performances might qualify,
football, basketball and other true competitions would not."

~~~
rayiner
I don't think either would fall under copyright. As to the first, the Second
Circuit has held (in a widely cited case) that NBA game data is not subject to
copyright:
[https://en.wikipedia.org/wiki/National_Basketball_Ass%27n_v....](https://en.wikipedia.org/wiki/National_Basketball_Ass%27n_v._Motorola,_Inc).
The guiding rule, under the Supreme Court's _Feist_ case, is that facts cannot
be protected. What happened at an NBA game is a fact; the fact itself cannot
be copyrighted. Only an expression of that fact (an article or radio segment
reporting on it) can be protected.

~~~
btown
Fixed link:
[https://en.wikipedia.org/wiki/National_Basketball_Ass%27n_v....](https://en.wikipedia.org/wiki/National_Basketball_Ass%27n_v._Motorola,_Inc%2E)

> The district court held that Motorola and STATS did not infringe NBA's
> copyright because only facts from the broadcasts, not the broadcasts
> themselves were transmitted. The Second Circuit Court agreed with the
> district court's argument that the "[d]efendants provide purely factual
> information which any patron of an NBA game could acquire from the arena
> without any involvement from the director, cameramen, or others who
> contribute to the originality of the broadcast" [939 F. Supp. at 1094].

It’s a really fascinating line that the court drew. The concept that a certain
player scores a 3-pointer with around 2 seconds left on the clock is clearly a
fact knowable to every patron. But what about the exact location from which
they took the shot? What about the number of milliseconds the ball was in the
air, or the angle at which the shot was launched? These are facts that mere
humans cannot access with accuracy; they require the involvement of cameras,
camera operators, and other entities characteristic of a copyrightable
expression of the fact. If I were to use these to create a 3D simulation of a
game, would that not be a derivative work of the film from which I obtained
sufficient facts to make that simulation? Where do we decide that pixels on a
video feed are more special than other parts of the content?

~~~
rayiner
> Where do we decide that pixels on a video feed are more special than other
> parts of the content?

It comes from the Copyright Act: 17 USC 102 states that copyright protects
"original works of authorship fixed in any tangible medium of expression." It
also states that in "no case does copyright protection for an original work of
authorship extend to any idea, procedure, process, system, method of
operation, concept, principle, or discovery, regardless of the form in which
it is described, explained, illustrated, or embodied in such work."

The way this has been interpreted is as a dichotomy between ideas/facts and
expression. You might use very expensive equipment to, for example, map the
ocean floor. That data cannot be copyrighted. But a video visualization of
that data could be copyrighted.

The result is, as you point out, somewhat unintuitive in that case, since
generating the data is the hard part, not creating the visualization. But
that's where Congress chose to draw the line in order to circumscribe the
scope of copyright.

~~~
secabeen
> The result is, as you point out, somewhat unintuitive in that case, since
> generating the data is the hard part, not creating the visualization. But
> that's where Congress chose to draw the line in order to circumscribe the
> scope of copyright.

And then the fun part, where data owners intentionally add false data to catch
people. Because a false item is not a fact, the fact exclusion doesn't cover
anyone who copies it, so any inclusion of it is clear infringement.
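A minimal sketch of how a publisher might check a suspect dataset for planted entries, assuming each record is a simple tuple of strings. The street names, the `fingerprint` and `find_canaries` helpers, and the normalisation rule (trim and lowercase) are all hypothetical, not any vendor's actual method.

```python
import hashlib


def fingerprint(record: tuple[str, ...]) -> str:
    """Stable hash of a normalised record, so canaries need not be stored in plain text."""
    normalised = "\x1f".join(field.strip().lower() for field in record)
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def find_canaries(suspect: list[tuple[str, ...]], canary_hashes: set[str]) -> list[tuple[str, ...]]:
    """Return the records in `suspect` that match a planted canary."""
    return [r for r in suspect if fingerprint(r) in canary_hashes]


# Hypothetical publisher data: two real-looking entries plus one invented "trap" street.
published = [
    ("Main Street", "Springfield"),
    ("Oak Avenue", "Springfield"),
    ("Quarry Lane", "Springfield"),  # fabricated: no such street exists
]
canaries = {fingerprint(("Quarry Lane", "Springfield"))}

# A suspect dataset that reproduces the trap entry with different formatting.
suspect = [("main street", "Springfield"), ("QUARRY LANE ", "springfield")]

print(find_canaries(suspect, canaries))  # [('QUARRY LANE ', 'springfield')]
```

Normalising before hashing matters because a copier may change capitalisation or whitespace; hashing lets the check run without keeping the fake entries in plain text next to the detection code.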

~~~
devilsbabe
Wow really? That's devilishly ingenious. Do you have examples of that?

~~~
ermir
Here's an example in the cartography world:

[https://en.wikipedia.org/wiki/Trap_street](https://en.wikipedia.org/wiki/Trap_street)

~~~
secabeen
Yeah, there are equivalents in the yellow/white pages industry, from what I've
heard. That Wikipedia page shows that it doesn't always work, but it is an
interesting idea.

------
PaulRobinson
This is not new. Sports betting has been legal in most parts of the world for
some time. Football data in Europe is so expensive that very good businesses
like Opta make a very nice living from it. Go get a quote from them for Premier
League games alone: thousands a year for only some of the data.

For years, people getting into sports analytics have done so via baseball (and
the Sabermetrics community) and the NBA, because that data has not been seen as
having commercial utility. It's been collected by fans.

That will change dramatically, but it should be resisted. Leagues and players
should embrace open data because in the long term it will lead to analysis
that helps them and, more importantly, foster a deeper interest in their game,
which makes their own careers more valuable.

~~~
phillc73
> Football data in Europe is so expensive

Depending on which data you need, there are already some good sources of free
football data.[1][2][3]

Someone has also conveniently wrapped much of this in an R library.[4]

Football is actually one of the better sports in terms of easily obtainable
data at no cost. For rugby, extensive datasets are much more difficult to find,
although there are some interesting attempts.[5]

Decent cricket data also exists in a few places[6], but generally requires
faster and more regular updating. However, there are R libraries for cricket
data too.[7] This one scrapes from the ESPN Cricinfo site.

It is possible to obtain horse racing data for the UK and Ireland at a
reasonable price for personal use,[8] and Hong Kong does a great job of making
a huge volume of horse racing data available at no cost, though not in a
particularly machine-usable format (extensive scraping required). Sadly, other
large racing jurisdictions such as Australia and the US don't have anything
free, or even reasonably priced, as far as I'm aware. Ray Paulick has covered
this as a general problem for the sport for a few years now.[9]

[1] [http://www.football-data.co.uk/data.php](http://www.football-data.co.uk/data.php)

[2] [https://github.com/openfootball](https://github.com/openfootball)

[3] [https://github.com/jokecamp/FootballData](https://github.com/jokecamp/FootballData)

[4] [https://github.com/dashee87/footballR](https://github.com/dashee87/footballR)

[5] [http://api.drop22.net/](http://api.drop22.net/)

[6] [https://cricsheet.org/](https://cricsheet.org/)

[7] [https://github.com/tvganesh/cricketr](https://github.com/tvganesh/cricketr)

[8] [https://www.betwise.co.uk/](https://www.betwise.co.uk/)

[9] [https://www.paulickreport.com/news/the-biz/gardner-horse-rac...](https://www.paulickreport.com/news/the-biz/gardner-horse-racing-statistics-and-data-locked-away-from-most-players-and-fans/)

~~~
eeeuo
I would argue that almost all of the information in your post is stats, not
data.

The type of data that people in this thread are talking about would be more
in line with detailed positional information about each of the players on a
football pitch over 90 minutes. In a cricket context, it would be more along
the lines of the exact release angle and speed for each of the bowlers.

This type of information is clearly available, as Michael Caley is able to
quickly generate xG maps for an entire game[1], but I do not believe it's
public.

Your [9] link points out that much more information is available to baseball
bettors, but even baseball has a significant walled garden in terms of data.
For example, the raw data used to generate the stats in [2] is not open to the
public.

[1] [https://twitter.com/caley_graphics](https://twitter.com/caley_graphics)

[2] [https://www.youtube.com/watch?v=tzPKlQXo6hk](https://www.youtube.com/watch?v=tzPKlQXo6hk)

~~~
phillc73
You make a good point, and my post needs clarification.

My links were all to post-event data, not live in-play data sources. I still
wouldn't call, for example, the number of wickets taken by a bowler in a
cricket match a stat. It's just data. A stat is derived from the data, for
example bowling strike rate or economy. Likewise, the fact that a trainer had a
winner at a certain racetrack is just post-event data. If you want to derive
further statistics, you have to calculate them yourself.[1]

The links above just have, for the most part, raw event data.

[1] [https://blog.betwise.net/2018/06/19/loops-with-r-creating-a-...](https://blog.betwise.net/2018/06/19/loops-with-r-creating-a-racecard-with-trainer-and-jockey-stats/)
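To make the data/stat distinction concrete, here is a minimal sketch that derives a bowler's wickets, strike rate (balls per wicket) and economy (runs per six-ball over) from raw ball-by-ball records. The `Delivery` record and its fields are hypothetical, and the sketch assumes every record is a legal delivery (no wides or no-balls) with all runs charged to the bowler.

```python
from dataclasses import dataclass


@dataclass
class Delivery:
    """One ball in a hypothetical ball-by-ball feed (field names are assumptions)."""
    bowler: str
    runs_conceded: int
    wicket: bool


def bowling_figures(deliveries: list[Delivery], bowler: str) -> dict[str, float]:
    """Derive summary stats for one bowler from raw per-ball records.

    Simplification: every record counts as a legal delivery, so wides and
    no-balls are not handled, and all runs are charged to the bowler.
    """
    balls = [d for d in deliveries if d.bowler == bowler]
    n_balls = len(balls)
    wickets = sum(d.wicket for d in balls)
    runs = sum(d.runs_conceded for d in balls)
    return {
        "wickets": wickets,
        # Strike rate: balls bowled per wicket (infinite if no wickets yet).
        "strike_rate": n_balls / wickets if wickets else float("inf"),
        # Economy: runs conceded per six-ball over.
        "economy": runs / (n_balls / 6) if n_balls else 0.0,
    }


# Two overs from bowler "A": (runs off the ball, wicket?)
feed = [Delivery("A", runs, out) for runs, out in [
    (0, False), (4, False), (1, False), (0, True), (0, False), (2, False),
    (1, False), (0, False), (0, True), (1, False), (0, False), (0, False),
]]
print(bowling_figures(feed, "A"))  # {'wickets': 2, 'strike_rate': 6.0, 'economy': 4.5}
```

Twelve balls, nine runs and two wickets give a strike rate of 6.0 and an economy of 4.5; the per-ball list is the "data", and everything returned is derived from it.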

~~~
eeeuo
The number of wickets taken is a stat. The raw data that informs it is the
collective set of all balls bowled by a bowler.

I'm not being needlessly pedantic; it's an important distinction when
considering the level of analysis that one is able to perform. If you are
doing serious cricket analytics, you need ball-by-ball information, including
as much detail as possible about the bowler's position, movement and arm
motion, the batter's position, movement and stroke, how the field is set up,
the condition of the pitch, the situation in the match, etc.

For example, consider a situation where we're attempting to compare two
bowlers. Bowler A may have got a wicket off a shot that 95% of batters would
not play, whereas Bowler B did _not_ get a wicket despite bowling a ball that
achieves a wicket 10% of the time. The stats suggest that bowler A is in
better form, but a data-driven view of the game suggests that bowler B is
actually in better form.
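One way to formalise this comparison is "expected wickets": attach an estimated wicket probability to every delivery (from some model of the ball-by-ball data, not shown here) and compare the sum with the wickets actually taken. The probabilities below are illustrative: the 0.10 for bowler B comes from the example above, while the 0.05 for bowler A's ball is an assumption.

```python
from typing import NamedTuple


class Ball(NamedTuple):
    wicket_prob: float  # hypothetical model estimate of P(wicket) for this delivery
    wicket: bool        # what actually happened


def expected_and_actual(balls: list[Ball]) -> tuple[float, int]:
    """Return (expected wickets, actual wickets) for a bowler's deliveries."""
    expected = sum(b.wicket_prob for b in balls)
    actual = sum(b.wicket for b in balls)
    return expected, actual


# Bowler A: a wicket from a low-probability ball, thanks to a poor shot.
bowler_a = [Ball(wicket_prob=0.05, wicket=True)]
# Bowler B: a ball that takes a wicket 10% of the time, but no wicket this time.
bowler_b = [Ball(wicket_prob=0.10, wicket=False)]

for name, balls in (("A", bowler_a), ("B", bowler_b)):
    xw, w = expected_and_actual(balls)
    print(f"Bowler {name}: {w} wicket(s), {xw:.2f} expected")
```

On the stats, A looks better (1 wicket to 0); on expected wickets, B is ahead (0.10 to 0.05). In practice the per-ball probabilities would have to come from exactly the kind of detailed data described above.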

As it stands, stats are available in abundance for every major sport, but
detailed data is not. If a bettor had access to the latter and were able to
parse it with an in-depth understanding of the sport, they'd have a huge
advantage over bettors who did not, and they would reap the benefits.

------
geraldbauer
You are more than welcome to join in. I've started two open sports data
initiatives. The World Cup is the world's biggest sporting event (3+ billion
fans), but open data (or data services) are hard to find. See the football.db
[1] or football.csv [2] projects for more. Enjoy the beautiful game with open
data :-).

[1] [https://github.com/openfootball](https://github.com/openfootball)

[2] [https://github.com/footballcsv](https://github.com/footballcsv)

------
gadders
This [1] was an interesting article from 3 years ago about tennis "Court
Siders": people paid to transmit results of games back to betting syndicates
faster than the official bookmaking results services. Kind of low-latency
sports betting...

[1] [https://www.bbc.co.uk/news/magazine-32402945](https://www.bbc.co.uk/news/magazine-32402945)

~~~
anonu
I wonder about the technical details. Data-gathering hardware: just one button,
or two or more? What data is important to collect? Just points? What is the
edge in doing this? Data quality? Latency? How much faster are you than TV
feeds?

~~~
eeeuo
It is likely used with live betting systems (common on many gambling sites),
where you can bet on the outcome of a single game of tennis. The line for the
game moves with each point to reflect the new situation. If a bettor can get
information about points seconds ahead of other bettors, they can bet at the
old odds, already knowing the new situation, before the line moves as the rest
of the bettors react to the new information.

For example, consider a game with evenly matched opponents. At 0-0, the server
might have odds at ~1.5, with the non-server having odds at ~2.7[1]. When the
score moves to 0-15, the odds might move closer to even, e.g. 1.9/1.9. If you're
able to get information about the first point ahead of the crowd and place a
bet on the non-server at 2.7 when the true "predicted outcome" is closer to
1.9, you obviously have a massive advantage.

This can also be used to bet on sets or matches, but the advantage is much
smaller. Still, a bettor with any sort of advantage will always win over the
long term.

[1] Yes, those don't add up to 100%, welcome to the vig :)
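The arithmetic behind these odds fits in a few lines. Decimal odds imply a probability of 1/odds, the amount by which those implied probabilities exceed 100% is the bookmaker's margin, and the expected profit of a one-unit bet is probability times odds minus 1. Removing the margin by scaling the implied probabilities proportionally is one simple convention among several (an assumption here), and it turns the 1.9/1.9 market into a 50/50 estimate.

```python
def implied_probability(decimal_odds: float) -> float:
    """Probability implied by decimal (European) odds, before removing the margin."""
    return 1.0 / decimal_odds


def overround(odds: list[float]) -> float:
    """Bookmaker margin: how far the implied probabilities exceed 100%."""
    return sum(implied_probability(o) for o in odds) - 1.0


def remove_margin(odds: list[float]) -> list[float]:
    """Estimate 'fair' probabilities by scaling implied ones to sum to 1.

    Proportional normalisation is only one of several de-margining methods.
    """
    probs = [implied_probability(o) for o in odds]
    total = sum(probs)
    return [p / total for p in probs]


def expected_value(decimal_odds: float, true_prob: float) -> float:
    """Expected profit per unit staked at `decimal_odds` if the outcome has `true_prob`."""
    return true_prob * decimal_odds - 1.0


# Numbers from the comment: server/non-server at 0-0, then at 0-15.
print(f"margin at 0-0:  {overround([1.5, 2.7]):.1%}")   # margin at 0-0:  3.7%
print(f"margin at 0-15: {overround([1.9, 1.9]):.1%}")   # margin at 0-15: 5.3%

# Backing the non-server at the stale 2.7 after they have already won the first point.
fair_non_server = remove_margin([1.9, 1.9])[1]
print(f"edge: {expected_value(2.7, fair_non_server):+.1%}")  # edge: +35.0%
```

Under these assumptions, the stale 2.7 price is worth about +35% of the stake in expectation, against a margin of only a few percent, which is why a few seconds of latency matter.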

------
anonu
I think it's absurd to try to control "data scraping" at events. Just as you
should be able to scrape the public-facing pages of any website (as recent
legal cases have shown), you should be able to attend an event and collect any
data you like.

------
sputknick
I'm excited to see what decentralized technologies like Augur and Gnosis do to
disrupt some of these conversations. The question raised in the article about
"who adjudicates data disputes" is one of the main features of Augur with its
decentralized oracles. Also if people can anonymously use a decentralized
system to change the odds, it decreases the value of data from these paid
sources.

------
chillydawg
I am a consumer of such real time data and I can tell you from experience that
where there is no competition quality drops and prices increase. I really,
really hope they leave the market open.

------
forapurpose
> Data on the second-by-second action — exactly when a goal is scored, where
> it landed in the net, who had the assist — creates manifold betting
> opportunities.

Will there be high-frequency sports betting, like HFT on financial markets?
Why not?

~~~
chillydawg
Go look at the betfair.com exchange (if you are able). They handle several
thousand bets/sec at high load. Plenty of high frequency traders there
employing the same tricks as the city boys.

------
foobaw
As an aside, my friend is a full-time sports bettor who has an extremely high
ROI during football season on one of those fantasy sites. He has difficulty
making any money on other sports. I wonder why.

~~~
eeeuo
Variance. Unless your friend has sustained this performance over a number of
years, it's probably just variance.

Most successful long-time sports bettors are not playing on a level playing
field. In order to beat the vig over the long term, a successful bettor must
be privy to information that is unavailable to other bettors, or at least the
majority of bettors. This either means a novel analytical technique, which is
rare, or inside information.

The number of people who simply watch games closely and are able to discern
information that allows them to bet successfully on future games _over the
long term_ is exceptionally rare in the real world.
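The variance point can be illustrated with a small simulation. It assumes, purely for illustration, a bettor with no edge at all: every bet is a fair 50/50 outcome priced at 1.9 (the kind of margin in the tennis example earlier in the thread), and one unit is staked each time. Real fantasy contests are not priced like this, so this is a toy model, and the function names are hypothetical.

```python
import random


def season_roi(n_bets: int, odds: float, true_prob: float, rng: random.Random) -> float:
    """Return on investment after n_bets one-unit bets at fixed decimal odds."""
    profit = 0.0
    for _ in range(n_bets):
        if rng.random() < true_prob:
            profit += odds - 1.0  # a win pays odds * stake, i.e. odds - 1 profit
        else:
            profit -= 1.0  # a loss forfeits the stake
    return profit / n_bets


def share_profitable(n_bets: int, n_bettors: int = 1000, seed: int = 1) -> float:
    """Fraction of simulated zero-edge bettors who finish with a positive ROI."""
    rng = random.Random(seed)
    rois = [season_roi(n_bets, odds=1.9, true_prob=0.5, rng=rng) for _ in range(n_bettors)]
    return sum(r > 0 for r in rois) / n_bettors


for n in (20, 200, 2000):
    print(f"{n:>5} bets: {share_profitable(n):.0%} of bettors are in profit")
```

By the binomial distribution, a zero-edge bettor in this model finishes ahead with probability of roughly 41% after 20 bets, around 20% after 200 and about 1% after 2,000, so the simulated shares should land near those figures: a hot streak over one season says little, while beating the margin over thousands of bets is very hard to do by luck.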

