
The Distribution of Library Book Circulation Is Not a Power Law - chrismealy
http://cscs.umich.edu/~crshalizi/weblog/744.html
======
mayank
Many, _many_ things aren't power laws but are widely reported as such, thanks
to sloppy statistics and a deep misunderstanding of regression on
log-transformed data.

The problem is this: people frequently log-transform empirical counts of
things (like library book circulation) and then fit a line to the
distribution. Since the line generally fits very well, with very high (but
ultimately meaningless) R-squared values, they rush off to write a paper.
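
To make that concrete, here is a minimal sketch (not anyone's published method) of the procedure described above, run on data that is log-normal by construction. The helper name `loglog_tail_fit`, the seed, the log-normal parameters and the choice to fit only the top 20% of the sample are all assumptions made for illustration.

```python
import math
import random


def loglog_tail_fit(samples, tail_fraction=0.2):
    """Least-squares line through the log-log empirical CCDF of the upper tail.

    Returns (slope, r_squared). Hypothetical helper, for illustration only.
    """
    xs = sorted(samples)
    n = len(xs)
    start = int(n * (1.0 - tail_fraction))
    # Empirical CCDF at the i-th smallest sample: fraction of samples >= it.
    log_x = [math.log(xs[i]) for i in range(start, n)]
    log_p = [math.log((n - i) / n) for i in range(start, n)]

    m = len(log_x)
    mean_x = sum(log_x) / m
    mean_p = sum(log_p) / m
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    syy = sum((b - mean_p) ** 2 for b in log_p)
    sxy = sum((a - mean_x) * (b - mean_p) for a, b in zip(log_x, log_p))
    # Slope of the fitted line, and R^2 as the squared correlation.
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


rng = random.Random(0)  # fixed seed, arbitrary choice
# Log-normal draws: heavy-tailed, but not a power law.
lognormal = [rng.lognormvariate(0.0, 2.0) for _ in range(5000)]
slope, r_squared = loglog_tail_fit(lognormal)
print(f"slope = {slope:.2f}, R^2 = {r_squared:.3f}")
```

Under these settings the R-squared should come out high even though the data are not power-law distributed, which is the whole problem: a straight-ish line on log-log axes does not discriminate between heavy-tailed alternatives.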

I could cite at least a dozen very "important" papers that gush over power law
relationships in this or that. Instead, I'll let Cosma Shalizi's excellent
blog post on the topic do the talking and citing:
<http://cscs.umich.edu/~crshalizi/weblog/491.html>

It's an excellent read for anyone reaching for a log-transform.

EDIT: I didn't realize that Cosma was the author of the linked piece. The
other blog post of his that I link here is also highly recommended.

~~~
Zaak
Everything is linear if plotted log-log with a fat magic marker.

~~~
mayank
It's not just that -- if the tail of the distribution doesn't seem to fit, you
can just call it a "power-law with a cutoff", "exponential tail", "multiple
scaling regimes", "piecewise linear" or some other cruft. Physicists,
particularly, are egregious abusers of this. Like Agent Mulder, they want to
believe.

------
arjunnarayan
I like the point Cosma (the author of this piece) makes in another of his blog
posts, on how to tell whether you've found a power law distribution[1]:

Ask yourself whether you really care. Maybe you don't. A lot of the time, we
think, all that's genuinely important is that the tail is heavy, and it doesn't
really matter whether it decays linearly in the log of the variable (power
law) or quadratically (log-normal) or something else.
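
Spelled out, the "linearly" versus "quadratically" remark refers to the log-density as a function of log x. The symbols below (exponent α, constant c, log-normal parameters μ and σ) are standard notation added here for illustration; they do not appear in the quoted post.

```latex
\begin{align*}
\text{power law:}\quad \log p(x) &= -\alpha \log x + c \\
\text{log-normal:}\quad \log p(x) &= -\frac{(\log x - \mu)^2}{2\sigma^2} - \log x - \log\!\left(\sigma\sqrt{2\pi}\right)
\end{align*}
```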

But I don't know why I would expect the circulation of library books to follow
a power law: physical books are inherently limited in how many people can hold
them, and the number of books at my school library that are permanently
checked out is always rather high. There's a strict upper bound on the maximum
circulation of a book, so you don't get the super-high values for a few books
that a power law requires. Besides, do you care that it is log-normal and not
a power law? I think the lesson is to stop calling things "power law" when you
mean "fat-tailed".

Fat-tailed is a good enough description in most use-cases anyway.

-- [1] <http://cscs.umich.edu/~crshalizi/weblog/491.html>

------
PaulHoule
Note that you've got a cutoff because a library book can only circulate to one
patron at a time (and libraries will rarely buy 10+ copies of a bestseller).

You may very well have a power law in the range before that, but in practice
all systems that show power laws have finite size and other cutoffs.

95% of academic work on power laws is bunk. Ask your average professor or
graduate student whose name is on a power law paper what a "statistical
estimator" is and you'll get a blank look.
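
For anyone who would give that blank look, here is a rough sketch of what a statistical estimator looks like in this setting: the standard maximum-likelihood estimate of the exponent of a continuous power law (often called the Hill estimator), checked against synthetic data where the true exponent is known. The function name `power_law_mle`, the seed, the sample size and the assumption that `x_min` is known in advance are all illustrative choices, not something taken from the thread.

```python
import math
import random


def power_law_mle(samples, x_min):
    """MLE of alpha for a continuous power law p(x) ~ x**(-alpha), x >= x_min.

    Returns (alpha_hat, standard_error). x_min is assumed known here;
    in real data it usually has to be estimated too.
    """
    tail = [x for x in samples if x >= x_min]
    n = len(tail)
    log_sum = sum(math.log(x / x_min) for x in tail)
    alpha_hat = 1.0 + n / log_sum
    # Asymptotic standard error of the estimate.
    return alpha_hat, (alpha_hat - 1.0) / math.sqrt(n)


rng = random.Random(1)  # fixed seed, arbitrary choice
true_alpha = 2.5
# random.paretovariate(a) has survival function x**(-a) on x >= 1,
# so its density exponent is a + 1.
data = [rng.paretovariate(true_alpha - 1.0) for _ in range(10000)]
alpha_hat, std_err = power_law_mle(data, x_min=1.0)
print(f"alpha_hat = {alpha_hat:.3f} +/- {std_err:.3f}")
```

Unlike a regression line through a log-log plot, this comes with a standard error, though it still says nothing about whether a power law is a good model in the first place; that needs a separate goodness-of-fit test.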

------
ahi
"Zipf, Power-laws, and Pareto - a ranking tutorial"

<http://www.hpl.hp.com/research/idl/papers/ranking/ranking.html>

