There's actually a joke in the field that when you get a new dataset, the first thing you do is fit it to a power law. If that doesn't work, you fit it to a broken power law.
The ONLY variables that are normally distributed are those that are averages of many independent, identically distributed variables of finite variance.
Thus, if you cannot find the finite-variance variables that average up to form a variable X, then X is not normally distributed.
(The parts about independent and identically distributed are technical red herrings. The only essential condition is finite variance.)
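A minimal numerical sketch of that claim (the choice of exponential and Cauchy here is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_means, n_per_mean = 10_000, 1_000

# Averages of iid finite-variance draws (exponential): the spread of the means
# shrinks like 1/sqrt(n) and the standardized means look normal.
expo_means = rng.exponential(scale=1.0, size=(n_means, n_per_mean)).mean(axis=1)
print("exponential: std of the means =", expo_means.std())   # ~ 1/sqrt(1000) ~ 0.032

# Averages of iid Cauchy draws (infinite variance): the mean of n standard Cauchys
# is itself standard Cauchy, so averaging buys you nothing.
cauchy_means = rng.standard_cauchy(size=(n_means, n_per_mean)).mean(axis=1)
print("cauchy: IQR of the means =",
      np.percentile(cauchy_means, 75) - np.percentile(cauchy_means, 25))  # stays ~2
```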
Macro and financial economists have traditionally been the worst offenders.
What you're saying is probably part of it, although in my experience that criticism can be leveled as much, if not more, at wet-lab-type biologists who eschew all but the most minimal stats.
With the social sciences, though, there's another phenomenon at play, which is that the phenomena are often so abstract that there's not really a good theoretical reason to assume anything in particular. And if that's the case, because the normal is the entropy-maximizing distribution (for a given mean and variance), you're actually better off assuming that rather than some other distribution. You could also use nonparametric stats, but that has its own advantages and disadvantages.
Bias-variance dilemma and all that.
The truth is, it's hard to beat the normal even when it's wrong. And if you subscribe to the inferential philosophy that every model is wrong, you're better off being conservatively wrong, which implies a normal.
I'm not saying everything should be assumed to be normal. But unless (1) things are obviously super non-normal, or (2) you have some very strongly justified model that produces a non-normal distribution, you're probably best off using a normal if you're going to go parametric. And I think those two conditions are met much more often than we like to admit.
The normal distribution is kind of over-maligned, I think. I started my stats career being enamoured of rigorously nonparametric stats, and still am (esp. exact tests, bootstrapping/permutation-based inference, and empirical likelihood), but have grown to strongly appreciate normal distributions (or whatever maxent distribution is appropriate).
With data in hand, a skew/kurtosis scatter plot (one point per variable) is a good way to gauge the shape of a higher-dimensional data set. Another option is to cluster the variables of the data set using something like HDBSCAN and color the plot points based on cluster membership.
If you have to go guessing distributions without evidence, you're better off choosing a Student's t distribution with low degrees of freedom (for robustness to outliers) or a gamma distribution (if you think your data might be skewed).
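Concretely, the skew/kurtosis picture is only a few lines. This is just a sketch with made-up columns standing in for real variables, and the HDBSCAN coloring step is left out:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(1)
# Stand-in data: each entry plays the role of one variable of the data set.
data = {
    "var_normal": rng.normal(size=5_000),
    "var_skewed": rng.gamma(shape=2.0, size=5_000),
    "var_heavy":  rng.standard_t(df=3, size=5_000),
}

# One point per variable: x = skewness, y = excess kurtosis.
for name, x in data.items():
    plt.scatter(stats.skew(x), stats.kurtosis(x), label=name)

plt.axhline(0, lw=0.5); plt.axvline(0, lw=0.5)   # the normal sits at (0, 0)
plt.xlabel("skewness"); plt.ylabel("excess kurtosis")
plt.legend(); plt.show()
```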
Mind, you might find different distributions when solving diffusion equations (the Cox-Ingersoll-Ross process involves a Bessel distribution if I'm not mistaken). But the diffusion-jump paradigm is much better justified there than distributional (or even finite moment) assumptions in discrete land.
Put it this way: if finance is abusing distributional assumptions, there's money being left on the table.
And unsurprisingly there are quant funds and prop trading firms that use this very fact to make lots of money. Academic economists' love for the normal distribution is derided by pretty much every fellow real-world trader I know.
Look, even if we disregard jumps as a possible term in equations, merely considering volatility to be stochastic and driven by a Brownian already gets you variances that may grow arbitrarily fast. This is without considering stranger nonlinearities on the Brownian term itself.
People are overly impressed by forceful arguments of the Taleb variety and log-log plots of empirical distributions and start parroting talking points about heavy/fat tails.
If I wasn't computerless and on my phone I would link to a paper that does the analog of the Anscombe quartet for log-log distribution plots "proving that data is Pareto/power law/etc." It's a good vaccine against fat-tail hipsterism; look it up.
The Tanaka equation is an example (admittedly: not a diffusion, not a case of plug-and-chug with the Ito lemma) of a Brownian-driven process whose defining equation has a discontinuous coefficient. How the hell? From memory,
dTNK = sgn(TNK) dB
Now imagine a model with three equations.
X1 is a bog-standard geometric diffusion (we could have picked something that's chi-squared distributed driven by a Brownian from standard interest rate models) as in the 1970s Black-Scholes models. But instead of having an exogenous volatility, it has its Brownian term dB1 multiplied by a second equation, X2.
dX2 could be a standard mean-reversion equation, but again its dB2 is multiplied by X3.
dX3 is something like abs(dB3 - dB4* X3).
Voilà, an equation (X1) with a sudden break driven by the level of a mean-reverting equation (X2, which tells us volatility should come down in finite time even if it grows by a lot at times) that's set to blow up at an X2-dependent but stochastic level.
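In case a concrete version helps, here is a literal Euler-Maruyama reading of that three-equation sketch. All parameters, starting levels, and the blow-up threshold are made up for illustration; this is my reading of the sketch, not a calibrated model:

```python
import numpy as np

rng = np.random.default_rng(2)
dt = 1e-3
sqdt = np.sqrt(dt)
mu, kappa, theta = 0.05, 5.0, 0.2   # assumed drift / mean-reversion parameters
x1, x2, x3 = 1.0, 0.2, 0.1          # assumed starting levels

for step in range(1, 5_001):
    dB1, dB2, dB3, dB4 = rng.normal(0.0, sqdt, size=4)
    x1 += mu * x1 * dt + x2 * x1 * dB1           # X1: geometric diffusion, vol given by X2
    x2 += kappa * (theta - x2) * dt + x3 * dB2   # X2: mean-reverting, Brownian term scaled by X3
    x3 += abs(dB3 - x3 * dB4)                    # X3: "something like abs(dB3 - dB4 * X3)"
    if not np.isfinite(x1) or abs(x1) > 1e12:
        print(f"X1 left any reasonable range at step {step} (X3 had reached {x3:.3g})")
        break
else:
    print(f"after {step} steps: X1={x1:.3g}, X2={x2:.3g}, X3={x3:.3g}")
```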
Don't get me wrong, Poisson-like jumps are very common (they're precisely the limiting process for sudden jumps) but people overstate (perhaps because they didn't really read the conditions for Ito isometry) how much a Brownian motion forces a system into normality or smoothness.
But hey, people get away with being hipsters about programming languages, why shouldn't they do that for stochastic calculus too, you know?
Power law walks into a bar. Bartender says, "I've seen a hundred power laws. Nobody orders anything." Power law says, "1000 beers, please".
Agh, I'm a mathematician and should get this, but I don't. Is the joke just that power laws give behaviour that is very small for quite a while, and then becomes suddenly large?
Could you elaborate a little, or maybe give an example?
But if something can happen at any given scale, the distribution must operate very differently. The distribution can't pick out any favorite size to work at --- some fraction of events must happen at every possible scale. So, if we're looking at the distribution of stellar luminosities, then there will be some stars which are as bright as the Sun, some which are 1% as bright as the Sun, and some which are 100 times as bright as the Sun. And since the star formation process doesn't have any preferred scale, the ratio of the number of Solar-luminosity stars to the number of stars which are 1% as luminous as the Sun must be about the same as the ratio of the number of stars which are 100 times as bright as the Sun to the number of Solar-luminosity stars. It's perfectly fine for the distribution to prefer dimmer or brighter stars because this doesn't end up introducing a preferred scale --- it just tilts the slope of the distribution (which is a line on a log-log graph).
I'm not a mathematician, so I'm not sure if these statements are true in any rigorous sense, but in practice scale-free processes tend to produce power laws in the universe.
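For what it's worth, the "no preferred scale implies power law" claim can be made precise with a standard one-line argument (this is the textbook version, not something from the comment above). Write scale-freeness as: rescaling the variable only rescales the density,

\[ p(cx) = g(c)\,p(x) \quad \text{for all } c > 0 . \]

Setting x = 1 gives g(c) = p(c)/p(1); differentiating in c and then setting c = 1 gives x\,p'(x) = \frac{p'(1)}{p(1)}\,p(x), whose only solutions are the power laws p(x) \propto x^{-\alpha} with \alpha = -p'(1)/p(1).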
Well worth the 20 minutes.
I recommend reading them through, but slide 39 is the tl;dr.
"So You Think You Have a Power Law — Well Isn't That Special?"
> it was made after the discovery that on a log log plot everything is a straight line(o)
i'd recommend the entire lecture, but definitely at least check out the anecdote this joke bookends
it is about hubble's original 1929 embarrassing failed attempt to calculate the rate of the expansion of the universe(i)
That is indeed true, but why should such a property imply that the use of the Normal distribution is appropriate? Just a rhetorical question, of course, because your comment does indicate that the Normal is not a good choice unless one has compelling reasons to use it.
Another argument that is used to justify its use is the central limit theorem. That says a suitably normalized sum of (nearly) independent variables with (nearly) identical distributions and finite variance converges to the Normal distribution. If the process under observation is indeed a superposition of such random processes, then yes, the choice of the Gaussian can be justified. But it is surprisingly common that one of the three requirements is violated. A common violation is that the variance is infinite, or it is so high that the process is better modeled as one with infinite variance. In such situations the family of stable distributions is the more appropriate choice.
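If it helps, here is a small sketch of the stability property itself; alpha = 1.7 and the block size are arbitrary, and scipy's levy_stable is assumed as the sampler:

```python
import numpy as np
from scipy.stats import levy_stable

alpha, n = 1.7, 50   # symmetric alpha-stable; block size n is arbitrary

# Stability: a sum of n iid alpha-stable draws, rescaled by n**(1/alpha), has the
# same distribution as a single draw. Among finite-variance laws only the normal
# (alpha = 2) behaves this way, which is why it is the classical CLT limit.
x = levy_stable.rvs(alpha, 0.0, size=100_000, random_state=1)
sums = levy_stable.rvs(alpha, 0.0, size=(100_000, n), random_state=2).sum(axis=1)
rescaled = sums / n ** (1 / alpha)

qs = [0.05, 0.25, 0.5, 0.75, 0.95]
print(np.quantile(x, qs).round(2))
print(np.quantile(rescaled, qs).round(2))   # should be close to the line above
```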
Gauss' own use of the Gaussian was motivated by convenience rather than any deep theory. Even at that time it was well known that other distributions, for example the Laplace distribution, can work better.
The negative log of the (unnormalized) Gaussian density is basically x^2, and this fact comes in very handy.
It's used extensively in Bayesian regression: modeling the noise as additive Gaussian leads to the squared loss function, and if you also model the prior as Gaussian you get an L2 regularizer.
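Spelling out the standard derivation: with y_i = f(x_i; w) + \varepsilon_i, \varepsilon_i \sim N(0, \sigma^2), and a Gaussian prior w \sim N(0, \tau^2 I), the negative log posterior is

\[ -\log p(w \mid \mathcal{D}) \;=\; \frac{1}{2\sigma^2}\sum_i \big(y_i - f(x_i; w)\big)^2 \;+\; \frac{1}{2\tau^2}\lVert w \rVert^2 \;+\; \text{const}, \]

so the MAP estimate is exactly squared loss plus an L2 penalty with weight \lambda = \sigma^2/\tau^2.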
It also simplifies many other calculations. Fitting a mixture of Gaussians by expectation-maximization or the Kalman filter come to mind.
For squared loss, almost every theorem you want to be true actually is true. In a way it's dangerous because it sets up bad expectations (* no nerd pun intended).
A result that holds for squared loss but not for other losses is the truncated SVD being the best low-rank approximation of a matrix.
*Expectation minimizes the squared loss
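A quick numerical check of the low-rank statement above (matrix size and rank are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(50, 30))
k = 5

# Truncated SVD: keep the k largest singular values/vectors.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] * s[:k] @ Vt[:k, :]

# Eckart-Young: the Frobenius error equals the energy in the discarded singular
# values, and no other rank-k matrix can do better.
print(np.linalg.norm(A - A_k))          # optimal rank-k error
print(np.sqrt((s[k:] ** 2).sum()))      # same number, from the discarded singular values

# Compare with a random rank-k approximation (project onto a random k-dim column span):
Q, _ = np.linalg.qr(rng.normal(size=(50, k)))
print(np.linalg.norm(A - Q @ Q.T @ A))  # strictly worse, with probability 1
```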
I am also a (former) physicist, and it drives me crazy when people fit whatever happens to be in the chart to a straight line, "to get the trend". Whenever I ask them for the theory which predicts that x and y will be linked by y = ax + b, they do not have any.
This also goes on with extrapolations or interpolations, usually without the slightest theoretical reason to do so.
The problem is that when working with marketing, HR, or even finance people, they are so used to "getting the trend" with a linear fit that it feels hopeless.
I once drew a parabola and asked for the trend (around the minimum). They were surprised by this stupid question, as "it is obvious that there is no trend". Yet some random points linking share value to the number of women in a company "obviously fit a trend line".
So there is a huge bias among researchers to assume them simply because it makes their treatment of the data easier.
That's funny -- my field has the same thing with a different distribution. Analyzing stochastic processes is much easier if you assume an exponential distribution. It's one of my criteria for whether somebody's giving a bad job talk. If an unfounded assumption that waiting times are exponentially distributed shows up in the first three slides, the rest of the presentation is probably B.S. Even if the presenter used Beamer and filled the slides with beautiful equations. (The exponential distribution of waiting times is basically an assumption that events are independent.)
Edit: Especially if the presenter's slides are full of beautiful equations.
This is only true if events are independent. Suppose we're modeling rider arrivals at a bus stop. The bus comes once an hour. Does the exponential distribution adequately model how long you need to wait for another person to show up? If a Poisson process is an appropriate model, the expected value of the number of people arriving in the 20 minutes after the bus departs is the same as the expected value of the number of people arriving in the 20 minutes before. Clearly not. Rider arrival rates depend on how long it is until the bus is supposed to depart. Indeed, most arrival processes are not ideal Poisson processes, but it may be a better approximation in some cases than in others.
We use Poisson processes for two reasons:
1. They're often a good enough approximation of reality to make useful predictions.
2. They're mathematically tractable.
Unfortunately, our reasons for using Poisson processes are often more (2) than (1).
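To put numbers on the bus example above, here is a small sketch; the ramp-shaped arrival rate is just one assumption about how riders behave:

```python
import numpy as np

rng = np.random.default_rng(5)
period, lam_max, n_hours = 60.0, 6.0, 2_000   # bus every 60 min; peak rate 6 riders/min

def rate(t):
    # Riders arrive faster as the departure approaches, then the rate resets.
    return lam_max * (t % period) / period

# Simulate the inhomogeneous Poisson process by thinning a homogeneous one.
T = n_hours * period
cand = np.sort(rng.uniform(0.0, T, size=rng.poisson(lam_max * T)))
arrivals = cand[rng.uniform(size=cand.size) < rate(cand) / lam_max]

phase = arrivals % period
print("share arriving in the 20 min BEFORE a departure:", (phase >= period - 20).mean())
print("share arriving in the 20 min AFTER a departure: ", (phase < 20).mean())
# A homogeneous (memoryless) Poisson model would give ~1/3 for both windows;
# with the ramp rate the 'before' window gets far more than the 'after' window.
```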
None of this before/after bus mess.
Evidently. I'm afraid we have no common ground on which to discuss stochastic processes. Have a pleasant weekend.
There's something deeply satisfying about using a model so simple to explain a piece of nature. :)
Example: cosmic ray energy spectrum. http://iopscience.iop.org/1367-2630/12/7/075009/downloadFigu...
Of course in reality, at some point, some other physical process starts to dominate and cuts off the power law. This then turns it into a power law with a different slope or an exponential cutoff.
Edit: I forgot to add that a star's light profile on an image is actually approximately Gaussian. But it isn't exactly Gaussian --- I seem to remember that the core more closely resembles a Lorentzian. I am obviously not an observer, otherwise I would have thought of that immediately!
The maximum entropy distribution with a given mean is the exponential. I think to define power law distributions you will need additional constraints more concrete than "undefined variance".
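For what it's worth, the extra constraint that produces a power law is a fixed mean of log x rather than anything about the variance: among densities on [x_min, \infty) with E[\ln(x/x_{\min})] held fixed, the maximum-entropy density is

\[ p(x) \;=\; \frac{\alpha - 1}{x_{\min}} \left(\frac{x}{x_{\min}}\right)^{-\alpha}, \qquad x \ge x_{\min}, \]

which is just the exponential/maxent statement again after a log transform.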
Power law fits to empirical data are heavily misused.
Well, how about the magnitudes of stars? So, are the magnitudes of stars normally distributed? I googled "distribution magnitude of stars" and got images (by clicking on the images tab) that seem pretty normally distributed to me. Aren't they?
Consider the following example: The [Cauchy distribution](https://en.wikipedia.org/wiki/Cauchy_distribution) is a pathological example of a distribution with very heavy tails. Heavier than the power law distributions mentioned in the GP. So heavy, the distribution doesn't even have a mean value. But if you just look at the pdf, it looks almost exactly like a Gaussian.
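A quick way to see how deceptive the resemblance is (the normal's scale is chosen so the two peaks match; a sketch, not a fit to anything):

```python
import numpy as np
from scipy.stats import cauchy, norm

# Match the peak heights: the standard Cauchy peaks at 1/pi, so give the normal
# a standard deviation of sqrt(pi/2) ~ 1.25.
g = norm(scale=np.sqrt(np.pi / 2))
c = cauchy()

for x in (0.0, 1.0, 2.0):
    print(f"pdf at {x}: normal={g.pdf(x):.3f}  cauchy={c.pdf(x):.3f}")   # centre: comparable

for t in (5, 10, 50):
    print(f"P(|X| > {t}): normal={2 * g.sf(t):.2e}  cauchy={2 * c.sf(t):.2e}")  # tails: wildly different
```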
Too bad inference with long tailed distributions is so hard.
Check out Chebyshev's inequality.
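For reference, the bound: for any distribution with finite variance,

\[ P\big(|X - \mu| \ge k\sigma\big) \;\le\; \frac{1}{k^2}, \]

a distribution-free tail bound you can fall back on when you don't trust a parametric model of the tail.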
Just don't start talking about it to your pointy-haired boss. You'll get fired.
I'm pretty jaded at this point.
If EX = infinity but X is finite with prob > 0, the standard deviation will be infinite too.
Not really. Plus keep in mind magnitude is a logarithmic measure. Things get skewed when you take logarithms. Also if you look for example at star sizes:
The distribution is certainly not Gaussian, and very likely heavy-tailed.
That doesn't tell you whether the magnitudes are normally distributed though. That just tells you how the magnitude is computed from an object's observed/measured flux.
The size of a ruler that a machine cuts (quasi normal/Gaussian)
This sounds wrong. One is invariant under scaling, the other under translation.
However, there is a way to make it correct. Your second sentence is spot on for the intuition. If you log-transform a power-law distributed random variable, the RV that you get is an exponentially distributed RV.
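Concretely, using the usual Pareto convention with lower cutoff x_min: if

\[ P(X > x) = \left(\frac{x}{x_{\min}}\right)^{-(\alpha - 1)}, \qquad x \ge x_{\min}, \]

then Y = \ln(X / x_{\min}) satisfies P(Y > y) = P(X > x_{\min} e^{y}) = e^{-(\alpha - 1)y}, i.e. Y is exponential with rate \alpha - 1.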
It shouldn't, because there is a big reason to lean towards the normal distribution, and that is the central limit theorem. If you add up samples from a variety of different (finite-variance) distributions, the sum will tend towards a normal distribution. That is why the normal distribution pops up in so many places.
I've had debates arguing that not everything comes out as a Bell Curve, only to be called an idiot for claiming that.