
So You Think You Have a Power Law (2007) - tosh
http://bactra.org/weblog/491.html
======
ballooney
Rule 6 of Akin's Laws of Spacecraft Design:

6\. Everything is linear if plotted log-log with a fat magic marker.

[http://spacecraft.ssl.umd.edu/akins_laws.html](http://spacecraft.ssl.umd.edu/akins_laws.html)

~~~
itronitron
I was surprised to not see my favorite NASA quote... "no situation can be so
bad that it can't be made worse"

------
yannis7
Back in my academic years, I was really frustrated with one group of
researchers (the "cult of power law", I called them), mainly centered around
the Santa Fe Institute.

In all their talks, the pattern was always the same:
1) take a dataset from some random system, and strip it of all its domain
context ("interdisciplinary" research)
2) brag about being a "physicist", thus applying a "physicist's" approach to
new areas of research
3) plot the data on a log-log scale - kind of looks like a power law
4) make a toy model and use it to "simulate the system"
5) plot simulation vs real data on a log-log scale - kinda looks the same
6) promise that your trivial little model will reveal whole new horizons for
that field you know nothing about - because others are stupid and you're a
"physicist"
7) write a grant proposal

------
drallison
Power laws and scale-free networks are discussed in
[https://arxiv.org/abs/1801.03400](https://arxiv.org/abs/1801.03400) with HN
comments at
[https://news.ycombinator.com/item?id=16144867](https://news.ycombinator.com/item?id=16144867).
Both this and the cited paper are worthwhile reads for power law users.

------
clircle
You can't use a goodness of fit test to claim that your data follows a power
law (or any distribution). You can only use a GoF test (such as Kolmogorov-
Smirnov) to collect evidence that your data don't follow some hypothesized
distribution. And if you collect enough data, your GoF test will reject every
hypothesized distribution.
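
To make the "enough data rejects everything" point concrete, here is a minimal sketch using only the Python standard library. Everything in it is an illustrative assumption rather than anything from the article: the data come from a normal distribution whose mean is off by 0.05, the hypothesis is a standard normal, and the 5% cutoff is the usual asymptotic 1.358/sqrt(n), which only applies when the hypothesized distribution is fully specified in advance.

```python
import math
import random


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDF of `sample` and the hypothesized `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_rejects(sample, cdf, c_alpha=1.358):
    """True if the KS test rejects at roughly the 5% level.
    c_alpha / sqrt(n) is the asymptotic critical value (an approximation)."""
    return ks_statistic(sample, cdf) > c_alpha / math.sqrt(len(sample))


def demo(sizes=(100, 10_000, 100_000), shift=0.05, seed=0):
    """Draw from N(shift, 1) and test against N(0, 1) at several sample sizes."""
    rng = random.Random(seed)
    results = {}
    for n in sizes:
        sample = [rng.gauss(shift, 1.0) for _ in range(n)]
        results[n] = ks_rejects(sample, normal_cdf)
    return results


if __name__ == "__main__":
    for n, rejected in demo().items():
        print(n, "rejected" if rejected else "not rejected")
```

The hypothesis is "almost right" at every sample size. The only thing that changes is how much data the test gets, and at the largest size the small mismatch is enough for a rejection.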

~~~
hexane360
That's where #6 comes in:

>Use Vuong's test to check alternatives, and be prepared for disappointment.
Even if you've estimated the parameters of your power law properly, and the
fit is decent, you're not done yet. You also need to see whether other,
non-power-law distributions could have produced the data. This is a model
selection problem, with the complication that possibly neither the power law
nor the alternative you're looking at is exactly right; in that case you'd at
least like to know which one is closer to the truth.
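
A rough sketch of what such a comparison can look like, in plain Python. This is a Vuong-style normalized log-likelihood ratio between a continuous power law and a shifted exponential, both fitted by maximum likelihood. The assumptions are mine, not the article's: `xmin` is treated as known, the exponential is the only alternative considered, and the function names are made up for illustration.

```python
import math
import random


def sample_pareto(rng, alpha, xmin):
    """Inverse-transform draw from p(x) ~ x^(-alpha) for x >= xmin (alpha > 1)."""
    return xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))


def power_law_loglik(xs, xmin):
    """Pointwise log-likelihoods under the MLE-fitted continuous power law."""
    n = len(xs)
    alpha = 1.0 + n / sum(math.log(x / xmin) for x in xs)
    const = math.log(alpha - 1.0) - math.log(xmin)
    return [const - alpha * math.log(x / xmin) for x in xs]


def exponential_loglik(xs, xmin):
    """Pointwise log-likelihoods under the MLE-fitted exponential shifted to xmin."""
    n = len(xs)
    lam = n / sum(x - xmin for x in xs)
    return [math.log(lam) - lam * (x - xmin) for x in xs]


def vuong_statistic(loglik_a, loglik_b):
    """Normalized log-likelihood ratio for two non-nested models.
    Positive z favors model A, negative z favors model B; the two-sided
    p-value uses the standard normal approximation."""
    d = [a - b for a, b in zip(loglik_a, loglik_b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / n
    z = math.sqrt(n) * mean / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


def compare_power_law_vs_exponential(xs, xmin):
    """Return (z, p); z > 0 means the power law is the closer fit."""
    return vuong_statistic(power_law_loglik(xs, xmin),
                           exponential_loglik(xs, xmin))


if __name__ == "__main__":
    rng = random.Random(1)
    data = [1.0 + rng.expovariate(1.0) for _ in range(2000)]
    z, p = compare_power_law_vs_exponential(data, xmin=1.0)
    print("z =", round(z, 2), "p =", p)
```

The sign of z says which model is closer to the data, and the p-value says whether that sign can be trusted. A large p-value means the data cannot tell the two apart, which is the "disappointment" case.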

------
PaulHoule
Note that the paper appeared in SIAM Review, not "Reviews of Modern Physics"
or some place where physicists might read it.

(Speaking as someone who wrote a probably invalid paper about power laws and
who had Mark Newman working just across the hall in the 1990s)

------
v_lisivka
Can anybody say which kind of function these graphs show?

Distribution of atoms in the Solar System looks like cos(m):
[https://upload.wikimedia.org/wikipedia/commons/e/e6/SolarSys...](https://upload.wikimedia.org/wikipedia/commons/e/e6/SolarSystemAbundances.png)

Star clusters look like sin(s):
[http://cdn.iopscience.com/images/0004-637X/725/2/1717/Full/a...](http://cdn.iopscience.com/images/0004-637X/725/2/1717/Full/apj371793f4_lr.jpg)

Exoplanets look like sin(m):
[http://exoplanets.co/img/exoplanets-mass-distribution.jpg](http://exoplanets.co/img/exoplanets-mass-distribution.jpg)
(more at
[http://exoplanetsdigest.com/author/yaqoob/](http://exoplanetsdigest.com/author/yaqoob/)
)

~~~
marcosdumay
Star clusters look like a normal distribution. Exoplanets look like two
normals added up (probably because of two different detection procedures).

Distribution of atoms in the Solar System is the weird one. No idea what it
is.

~~~
MereInterest
Yeah, the distribution of elements is weird and wonky because it depends on a
lot of details about nuclear physics. First off, nuclei with an odd number of
protons or neutrons are less stable than nuclei with even numbers. That is
what causes the staggered bumps at every other element.

The low-mass region is largely determined by stellar nucleosynthesis. After
helium, it largely works by shoving more helium nuclei on. So you get lots of
Carbon-12 and Oxygen-16, which are kind of like 3 or 4 helium nuclei stuck
together.

The heavier regions get really weird, because those depend on the r-process.
Some astrophysical events spit out heaps and loads of neutrons, which stick to
nuclei. You go way the heck away from stability, then beta-decay back once
things calm down. What products you get depend on the reaction rates of
hundreds of possible reactions, few of which can be experimentally measured in
a lab, because there is no way to make that strong of a neutron source.

Explaining that distribution is a topic of many, many dissertations, and does
not in any way reduce down to a simple law.

Source: am nuclear physicist

------
btrettel
Any similar articles or advice about power law relationships in general (not
distributions)? I've fit a lot of data to power law relationships in the past
but don't know if there are any non-obvious pitfalls. I can recognize when a
power law obviously won't work, but as has been said, a lot of data can look
like a power law relationship. So for each obvious failure of a power law
relationship, there are a certain number of false positives.

------
Asdfbla
Any guess why their paper from June 2007, mentioned in the blog, was only
released in February 2009? Seems like a long revision period.

Nothing important, it just threw me a bit off guard seeing the date of the
blog post and the author's own prediction "forthcoming (2009)". But maybe he
edited the blog once he knew when the paper finally came out.

~~~
username1723
I know for a fact that Clauset, Shalizi, and Newman first submitted their
article to Reviews of Modern Physics. It was rejected there and then submitted
to SIAM Review, which published it after one round of revision. SIAM Review back
then was not known for being a "speedy" journal.

------
haberman
Title should have: (2007).

~~~
dang
Added. Thanks!

------
itronitron
IMHO this feels like quibbling as the author doesn't make clear when this
would actually make a difference.

~~~
princeb
> the author doesn't make clear

He makes the point across several items (4, 6, 7, and maybe 1): using a power
law gets tail estimates very badly wrong (or they fit the tail and then get
the distribution over the rest of the domain wrong).

------
larkeith
This is a fascinating article that provides lots of worthwhile information for
anyone planning on fitting a power law to their data.

I will probably never read another of the author's writings due to the
pervasive negativity.

~~~
teraflop
I don't see any negativity in this article, outside of maybe the headline and
the first paragraph. It consists almost entirely of constructive suggestions
for how to do data analysis.

------
gnarbarian
I used a power series to approximate the radius of a planet from its mass for
a WebGL space game I'm making [0], where solar systems are randomly generated.
I needed something to roughly approximate the size of a planet or star based
solely on mass. Using the 8 planets (plus Sedna and Pluto) and our star to
generate the curve function, I got an r^2 value of 0.989. [1]

I'm not about to make some scientific claim based on it, but for my purpose (a
neat game) using a power series to approximate radius from mass was an
extremely efficient and simple solution to my problem.

[0] [http://thedagda.co:9000/](http://thedagda.co:9000/)

edit: if you have an xbox controller you can use this:
[http://thedagda.co:9000/?gamepad=true](http://thedagda.co:9000/?gamepad=true)

[1]
[https://docs.google.com/spreadsheets/d/1GKPNNMJrZMaf8aQqgGD-...](https://docs.google.com/spreadsheets/d/1GKPNNMJrZMaf8aQqgGD-9DNJMkYVO7BKJvAIQL9o5aI/edit?usp=sharing)

~~~
contravariant
Yeah that's a power law. It would have been pretty easy to see had you used a
log-log plot.

~~~
EtDybNuvCu
This is literally what the article is arguing against doing.

~~~
contravariant
They're cautioning against drawing statistical conclusions from it. That
doesn't mean the log-log plot isn't more useful than the linear plot.

Also they seem to be talking about a power law distribution, whereas here
we're talking about the dependence of two variables. As a first step plotting
them on a log-log plot and saying 'yeah, looks straight' is fine. Among other
things this tells us the dependence is suspiciously close to a cube root law,
which could possibly be justified using dimensional analysis.
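
For what it's worth, the cube root is what a constant-density sphere gives you: M = (4/3)·pi·rho·R^3, so R is proportional to M^(1/3). Here is a small sketch of the "fit a straight line in log-log space" step in plain Python. The bodies are synthetic, with made-up mass and density ranges, not real planetary data, and real planets and stars are not constant-density spheres, so treat the exponent as a sanity check rather than a result.

```python
import math
import random


def fit_power_law(xs, ys):
    """Least-squares line through (log x, log y).
    Returns (exponent, prefactor, r_squared) for y ~ prefactor * x**exponent."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    syy = sum((b - my) ** 2 for b in ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy * sxy / (sxx * syy)
    return slope, math.exp(intercept), r_squared


def sphere_radius(mass, density):
    """Radius of a uniform sphere with the given mass and density (SI units)."""
    return (3.0 * mass / (4.0 * math.pi * density)) ** (1.0 / 3.0)


def synthetic_bodies(n=50, seed=0):
    """Hypothetical bodies: masses over 8 decades, densities varying a few-fold."""
    rng = random.Random(seed)
    masses, radii = [], []
    for _ in range(n):
        mass = 10 ** rng.uniform(22, 30)        # kg, assumed range
        density = rng.uniform(1000.0, 5500.0)   # kg/m^3, assumed range
        masses.append(mass)
        radii.append(sphere_radius(mass, density))
    return masses, radii


if __name__ == "__main__":
    masses, radii = synthetic_bodies()
    exponent, prefactor, r2 = fit_power_law(masses, radii)
    print("exponent:", exponent, "r^2:", r2)
```

Note that the high r^2 here mostly reflects the masses spanning many orders of magnitude. It says the line is a handy approximation for a game, not that the relationship is exactly a power law.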

