

Quest to Track Every Shot in the NBA Changed Basketball - esalazar
http://www.wired.com/2014/10/faster-higher-stronger/

======
jimmytucson
I've made the same types of charts--hexbins encoded by size (for shot
frequency) and color (for Effective Field Goal Percentage) using shot
coordinate data scraped off of ESPN--and my charts don't come out anywhere
near as smooth and "trend-depicting" as his do. I've concluded that he must be
smoothing the data so much that the result barely resembles reality.

First of all, the frequency of shots near the basket so overwhelms that of
shots anywhere else that the hexbins away from the basket end up being so
minuscule that they're barely visible. So there must be some kind of frequency
capping or logarithmic scaling somewhere in his charts. This is not the most
egregious "lie" in his charts but it hides an interesting truth: 70% of shots
are taken from 30% of the half-court (I'm making those numbers up but you get
the idea.)
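
To make the scaling point concrete, here is a minimal sketch of three ways a bin's shot count could be mapped to a hexagon size. The function name `scaled_size`, its parameters, and the example counts are all hypothetical; nothing here is known to be what Goldsberry's charts actually do.

```python
import math

def scaled_size(count, max_count, method="log", cap=None, max_size=1.0):
    """Map a bin's raw shot count to a marker size in [0, max_size].

    Hypothetical helper for illustration only.
    """
    if count <= 0 or max_count <= 0:
        return 0.0
    if method == "linear":
        # Size proportional to count: far-from-basket bins nearly vanish.
        return max_size * count / max_count
    if method == "log":
        # Logarithmic scaling compresses the gap between busy and sparse bins.
        return max_size * math.log1p(count) / math.log1p(max_count)
    if method == "cap":
        # Frequency capping: every bin at or above `cap` gets the full size.
        limit = cap if cap is not None else max_count
        return max_size * min(count, limit) / min(max_count, limit)
    raise ValueError(f"unknown method: {method}")

# Made-up counts: 500 shots in a bin at the rim, 5 in a mid-range bin.
rim, midrange = 500, 5
for method in ("linear", "log", "cap"):
    size = scaled_size(midrange, rim, method=method, cap=50)
    print(method, round(size, 3))
```

With linear scaling the mid-range bin is 1% of the rim bin's size; log scaling or a cap makes it visible, at the cost of hiding how lopsided the real distribution is.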

Secondly, I found that a player's (or even a team's) eFG% varies so much from
bin to bin that you rarely get smooth color patterns like the ones that show
up in his charts. His charts show orangered hexbins close to the basket that
somewhat evenly and predictably get lighter and yellower as distance from the
basket increases. But in practice, this is nearly impossible. Each hexbin
would have to span hundreds if not thousands of shots--much more than an
entire season's worth of data for a single player--for a pattern like that to
appear. To me, this is almost deceitful because it tells a "story" that isn't
there. eFG% is much more "random" than his charts depict. A player might go
10/20 from one location and 5/30 from the one directly adjacent.
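
A rough way to check the "hundreds if not thousands of shots" claim is the normal-approximation margin of error for a proportion. The sketch below uses the standard eFG% formula, (FGM + 0.5 * 3PM) / FGA, but treats each bin as a plain make/miss percentage when estimating noise, which is a simplifying assumption (eFG% weights threes, so its true variance differs somewhat). The function names are hypothetical.

```python
import math

def efg_pct(fgm, fg3m, fga):
    """Effective field goal percentage: (FGM + 0.5 * 3PM) / FGA."""
    return (fgm + 0.5 * fg3m) / fga

def margin_of_error(made, attempts, z=1.96):
    """Approximate 95% margin of error for a make rate (normal approximation)."""
    p = made / attempts
    return z * math.sqrt(p * (1 - p) / attempts)

def shots_needed(p, margin, z=1.96):
    """Attempts needed so the margin of error is at most `margin`."""
    return math.ceil(p * (1 - p) * (z / margin) ** 2)

# The two adjacent bins from the comment above: 10/20 and 5/30.
print(margin_of_error(10, 20))   # about 0.22, i.e. +/- 22 percentage points
print(margin_of_error(5, 30))    # about 0.13
print(shots_needed(0.5, 0.05))   # attempts per bin for +/- 5 points
```

Under these assumptions, a single bin needs several hundred attempts before its percentage is pinned down to within five points, which is consistent with per-bin colors looking noisy for a single player's season.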

------
danh1979
A curious non sequitur:

"The data wasn't exactly private, but neither was it public—Goldsberry scraped
it from the web."

If Goldsberry scraped the data from the web, wasn't the data inherently
public?

~~~
pp19dd
This situation comes up in the news world all the time. A company or agency
holds the original databases, Excel sheets, what have you, but doesn't
consider that publishing them in a "human-readable" format is nearly the same
thing as publishing the raw data. Try calling the place for a copy, and
they'll hang up on you. It doesn't occur to them that a crafty outsider can
probably reconstruct the original by scraping.

What's particularly interesting here is guessing the motivation behind
publishing. Was the information a trade secret, or did a middle-manager want
to show that their team is ahead of the others? Or are these feathers to show
the company has the know-how and capability?

In any case, most web-published data isn't initially thought of as published
data by the publishers, who in turn don't think to state any restrictions
governing it. That's when we scrape and make use of it - and even if there
are restrictions on republishing, you can still perform and claim
transformative derivative work.

The fun legalese part is what happens when they discover what you're doing and
try to lash out, or interrupt a standing scrape. One time, all it took to
unblock access was to show up at a meeting and get yelled at by a police
captain for 30 minutes. Our retort started with "In the interest of public
safety, ..."

~~~
function_seven
> One time, all it took to unblock access was to show up at a meeting and get
> yelled at by a police captain for 30 minutes. Our retort started with "In
> the interest of public safety, ..."

I'd like to hear more about your example.

~~~
pp19dd
Well, the PD found out that we were scraping and publishing data when a
superior asked them about it. They were embarrassed and ambushed. Imagine your
boss asking you "hey data guy, when did we start sending data to the paper?"

The data itself was public safety information and there was every reason to
publish it. Anyhow, our access got cut off and when we inquired about it, they
set up a meeting at their headquarters instead of providing any answers. That
morning, I showed up at their deathstar-looking building with my editor and we
spent 30 minutes getting chewed out by guys in uniforms, suits and badges for
"incorrect geocoding" and other false information that we were publishing.

We said that yes, there were some errors but that we made every reasonable
attempt to validate the data (see
[http://pp19dd.com/2009/02/vessels-in-distress/](http://pp19dd.com/2009/02/vessels-in-distress/)). After the guy
running the show vented, he showed us the proper way to geocode and correct
errors during which time I was thinking "uh, why not send us the lat/lng that
you're showing us here, instead of berating us?"

The compromise was that they'd add "precinct zone" information to the dataset,
and we could proceed so long as we checked whether a geocoded point was within
the zone. We promised to check this process with a point-in-polygon algorithm,
and the guy was happy as a clam that we took note of his work and gave him
respect. After that, he eased up and showed us some of the other cool stuff
the PD data guys were working on. For example, they pre-plot escape vectors
for burglaries so when cops are dispatched, they first go to where bad guys
are likely running to, not where they ran from.
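
For readers unfamiliar with the point-in-polygon check mentioned above, here is a minimal sketch using the common ray-casting (even-odd) approach. It is not the code from that project: the function name, the square "zone", and the coordinates are made up, and it treats lat/lng as flat x/y, which is only a reasonable assumption for small areas.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon?

    `polygon` is a list of (x, y) vertices in order. Points exactly on an
    edge may be classified either way.
    """
    inside = False
    j = len(polygon) - 1  # index of the previous vertex
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Does the edge (j -> i) straddle the horizontal line through y?
        if (yi > y) != (yj > y):
            # x coordinate where the edge crosses that horizontal line.
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside  # ray to the right crosses this edge
        j = i
    return inside

# Hypothetical square precinct zone and two geocoded points.
zone = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
print(point_in_polygon(2.0, 2.0, zone))  # inside the zone
print(point_in_polygon(5.0, 2.0, zone))  # outside: flag for manual review
```

A geocoded address that lands outside its stated zone would then be flagged as a likely geocoding error instead of being published.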

------
adamfeldman
"Now, as new technologies start to generate terabytes of data about players
and tactics, that next great competitive advantage will go to the number
crunchers and analysts who can make sense of all those signals. Take the
statistical tsunami of SportVU in the NBA. “It's not an exaggeration to say
that 85 percent of the teams don't know what to do with this data,” Goldsberry
says. “The idea that this is going to revolutionize the NBA—well, I'm not sure
that's true unless teams awaken really quickly to things like machine learning
and data visualization.”"

~~~
Nimi
Well, this sounds pretty much like an admission of the uselessness of his
methods. Contrast this with Wins Produced
([http://www.boxscoregeeks.com/faq](http://www.boxscoregeeks.com/faq)), a
simple statistical model, and it's hard to see the point...

------
bmoresbest55
As a basketball player/sports fan and computer programmer/tech enthusiast, I
love to see people post these types of articles.

~~~
swanson
Do you read Grantland? If not, check it out - they have great stuff at the
intersection of nerds + sports on a regular basis.

~~~
PKop
[http://fivethirtyeight.com/sports/](http://fivethirtyeight.com/sports/) as
well

------
genofon
I have worked with some NBA teams and they are already using software to track
the ball, players, referees, training, sleep, heart rate, etc. There is really
nothing revolutionary in this article.

~~~
mountaineer
Are you able to share any details of the software, devices and/or sensors they
use? I've been kicking around ideas on how to bring these types of things to
rec sports.

~~~
genofon
Software: depends on the team. Devices: most of them use STATS SportVU
[http://www.stats.com/sportvu/sportvu.asp](http://www.stats.com/sportvu/sportvu.asp)
to track games, and various wearable technology (which is not allowed in
games) in training. They are not as far behind as the media portray them; all
these tracking devices (sleep, HR, GPS...) have been around for a long time.
You wouldn't believe the stuff they've tried. There are some snake-oil
salesmen, but they usually end up pretty badly.

Also, for software there is a lot of variance. I feel that they prefer people
they know personally or who have previous experience. Only a few people make
the decision to adopt a platform for tracking all the data floating around a
team, usually a few senior managers on the physio side; that's your audience.
Usually it's highly customized and there is a lot of support, as the user
(coach, physio staff, athlete...) is not a tech person.

------
the_cat_kittles
"This Guy" publishes stuff regularly on Grantland; it's not like he is an
unknown.

