
As a (former) astronomer, I've never understood why people assume normal distributions for everything. I understand that there are good theoretical motivations for this --- namely, the normal distribution is the distribution that maximizes entropy for a given mean and variance. But in astronomy, nothing is normally distributed. (At least, nothing comes to mind.) Instead, everything is a power law. The reason for this is that most astrophysical processes are scale free over many orders of magnitude, and if you want a scale free process, it must be distributed as a power law.

There's actually a joke in the field that when you get a new dataset, the first thing you do is fit it to a power law. If that doesn't work, you fit it to a broken power law.

I work in CG animation, and one thing artists do when procedurally generating geometry is tweak the size distribution of things meant to mimic natural phenomena, such as leaves and rocks. They are typically assembling these as expressions, so it's common to first reach for a simple uniform distribution in the form of scale*rand(), but more experienced folks know to go straight to a power law instead. I think it's fascinating that such an extremely high-level way to characterize natural processes could make it into an artist's toolkit like that.
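The two approaches mentioned above can be sketched roughly as follows, in Python rather than any particular tool's expression language. The function names, the exponent `alpha`, and the size bounds are illustrative assumptions, not taken from any specific package.

```python
import random

def uniform_size(scale, rng=random):
    """The simple expression from the comment: scale * rand()."""
    return scale * rng.random()

def power_law_size(s_min, s_max, alpha, rng=random):
    """Draw a size with density proportional to s**(-alpha) on [s_min, s_max].

    Uses inverse-CDF sampling. alpha is a hypothetical artist-facing knob;
    this sketch assumes alpha != 1 and 0 < s_min < s_max.
    """
    u = rng.random()
    a = s_min ** (1.0 - alpha)
    b = s_max ** (1.0 - alpha)
    return (a + u * (b - a)) ** (1.0 / (1.0 - alpha))

if __name__ == "__main__":
    rng = random.Random(1)
    print([round(power_law_size(0.1, 10.0, 2.0, rng), 2) for _ in range(10)])
```

With a steep exponent most samples are small pebbles and only a few are boulders, which is the look the uniform version cannot give.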

To build on what 'blevin says, check out for instance https://bost.ocks.org/mike/algorithms/

I don't quite get the relevance to his comment, but wow! What a great link.

A smooth star cluster can be plotted by giving each star an independent, normally distributed coordinate in each dimension. The normal distribution is the only one for which independent per-axis draws give a radially even cluster - any other distribution results in visible deformation along the coordinate axes. This geometric property of the normal distribution holds in any number of dimensions.
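One rough way to check the claim above numerically: a rotationally symmetric cloud should have the same one-dimensional profile along every direction, so the kurtosis of the points projected onto an axis should match the kurtosis projected onto the diagonal. The helper names below are made up for this sketch, and kurtosis is just one convenient shape statistic.

```python
import math
import random

def kurtosis(values):
    """Fourth standardized moment (about 3 for a Gaussian)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2)

def axis_vs_diagonal_kurtosis(sampler, n=100_000, seed=0):
    """Kurtosis of a 2-D cloud projected on the x axis and on the 45-degree diagonal."""
    rng = random.Random(seed)
    pts = [(sampler(rng), sampler(rng)) for _ in range(n)]
    along_axis = [x for x, _ in pts]
    along_diag = [(x + y) / math.sqrt(2.0) for x, y in pts]
    return kurtosis(along_axis), kurtosis(along_diag)

def gauss_coord(rng):
    return rng.gauss(0.0, 1.0)

def uniform_coord(rng):
    return rng.uniform(-1.0, 1.0)

if __name__ == "__main__":
    print("gaussian:", axis_vs_diagonal_kurtosis(gauss_coord))
    print("uniform: ", axis_vs_diagonal_kurtosis(uniform_coord))
```

For Gaussian coordinates the two numbers agree; for uniform coordinates the cloud is a square, and the diagonal projection is visibly more peaked than the axis projection.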

I dabble in my own crap gamey stuff, and I use random [mostly 0-1] to power of something all the time; not because I know what I'm doing, but because it's simple and often good enough. I don't know math, but multiplying all sorts of things with each other to see what happens I can do; that and getting random numbers from perlin noise can cover a lot of sins, or at least make for a lot of fun :)

There's this funny "heuristic" converse of the central limit theorem (by Mandelbrot, I think):

The ONLY variables that are normally distributed are those that are averages of many independent, identically distributed variables of finite variance.

Thus, if you cannot find the finite-variance variables that average up to form a variable X, then X is not normally distributed.

(The parts about independent and identically distributed are technical red-herrings. The only essential condition is finite variance.)

Averages or sums. But yes, the point still holds.

Well, a sum is just an average multiplied by a constant.

Not to mention that in the wider world, the most common types of processes follow a Zipfian distribution[1]. I think this misunderstanding comes from undergraduate statistics courses teaching you all about normal and uniform distributions, while most students are unlikely to learn the limits and general inapplicability of that model unless they go into research. Most people's first encounter with real statistics is bell curves. Not to mention that the central limit theorem lulls you into a false sense of security that "at the end of the day" everything will look normal.

[1]: https://en.m.wikipedia.org/wiki/Zipf%27s_law

I think it has a lot to do with people in the social sciences wishing that the CL Theorem could simplify everything sampled to a single mathematical model they can have tools programmed to do magically when fed the data.

There's also a lot of cultural memory from the times when computation was really hard. Much can be improved in the most-data-impoverished social sciences with things like MCMC or even bootstrapping of frequentist statistics, buuuut... statistics are tools, not the focus of social scientists (who design and validate their hypotheses in a primarily qualitative fashion), so the old ways hang on for a long time.

If I hadn't recently read Benoit Mandelbrot's book "The Misbehavior of Markets," I might dismiss this as empty cynicism. But I think there is some truth to it.

Macro and financial economists have traditionally been the worst offenders.

I don't mean it to be cynical, it's just my experience that a lot of data gathering in the social sciences (my experience was in education) makes terrible assumptions, often implicitly, the worst being that the variables are IID. And it's very obvious from the publication that the number crunching is some cargo-cult approach they probably learned from their mandated semester or two in research methods. (Okay, that last bit was cynical!)

I could probably be characterized as a social scientist, at least a behavioral scientist.

What you're saying is probably part of it, although in my experience that criticism can be leveled as much, if not more, at wet-lab-type biologists who eschew all but the most minimal stats.

With the social sciences, though, there's another phenomenon at play, which is that the phenomena are so abstract often that there's not really a good theoretical reason to assume anything in particular. And if that's the case, because the normal is the entropy-maximizing distribution, you're actually better off assuming that rather than some other distribution. You could also use nonparametric stats, but that has its own advantages and disadvantages.

Bias-variance dilemma and all that.

The truth is, it's hard to beat the normal even when it's wrong. And if you subscribe to the inferential philosophy that every model is wrong, you're better off being conservatively wrong, which implies a normal.

I'm not saying everything should be assumed to be normal. But unless things are (1) obviously super non-normal, or (2) you have some very strongly justified model that produces a non-normal distribution, you're probably best off using a normal if you're going to go parametric. And I think those two conditions are met much more often than we like to admit.

The normal distribution is kind of over-maligned, I think. I started my stats career being enamoured of rigorously nonparametric stats, and still am (esp. exact tests, bootstrapping/permutation-based inference, and empirical likelihood), but have grown to strongly appreciate normal distributions (or whatever maxent distribution is appropriate).

From my perspective, either you have data and you have an idea what sort of distributions you're working with, or you don't, and you should fix that problem first rather than going down the theory rabbit hole.

With data in hand, a skew/kurtosis scatter plot is a good way to gauge the higher dimensional distribution of your data. Another option is to cluster the variables of the data set using something like HDBSCAN and color the plot points based on cluster membership.

If you have to go guessing distributions without evidence, you're better off choosing a low k student's T distribution (for robustness to outliers) or a gamma distribution (if you think your data might be skewed).

I'm ignorant of "heavy-tailed statistics" as a field, but the normal distribution in quantitative finance is justified by the Levy characterization of Brownian motion (basically every continuous stochastic process is a drift-diffusion driven by a Brownian) and the Levy representation of a discontinuous process as a sum of a diffusion and a pure jump process.

Mind, you might find different distributions when solving diffusion equations (the Cox-Ingersoll-Ross process involves a Bessel distribution if I'm not mistaken). But the diffusion-jump paradigm is much better justified there than distributional (or even finite moment) assumptions in discrete land.

Put it this way: if finance is abusing distributional assumptions, there's money being left on the table.

The main issue with this being that markets don't exhibit behaviour analogous to Brownian motion. It's a nice little assumption that makes the math work, but frequency analysis of real world markets shows that they behave like pink noise instead.

And unsurprisingly there are quant funds and prop trading firms that use this very fact to make lots of money. Academic economists' love for the normal distribution is derided by pretty much any fellow real-world trader I know.

You are throwing a lot of technical words around, and the specific facts you state aren't wrong, but that cannot make a strong case for the choice of the Normal distribution. All you need is that the variance of bounded (in time) increments be unbounded for the Gaussian behavior to fall apart (for example non-rare jumps of sufficient magnitude).

I did say "diffusion-jump process".

Look, even if we disregard jumps as a possible term in equations, merely considering volatility to be stochastic and driven by a Brownian already gets you variances that may grow arbitrarily fast. This is without considering stranger nonlinearities on the Brownian term itself.

People are overly impressed by forceful arguments of the Taleb variety and log-log plots of empirical distributions and start parroting talking points about heavy/fat tails.

If I wasn't computerless and on my phone I would link to a paper that does the analog of the Anscombe quartet for log-log distribution plots "proving that data is Pareto/power law/etc." It's good vaccine for fat tail hipsterism, look it up.

I don't think we are disagreeing (as long as you do not claim that stock prices are merely a Gaussian process).

I thought of a simple example on my way to my commute. (It's a short walk.)

The Tanaka equation is an example (admittedly: not a diffusion, not a case of plug-and-chug the Ito lemma) of a process driven by a Brownian with a discontinuous probability distribution. How the hell? From memory,

dTNK = dB if dB>0 else -dB

Now imagine a model with three equations.

X1 is a bog-standard geometric diffusion (we could have picked something that's chi-squared distributed driven by a Brownian from standard interest rate models) as in the 1970s Black-Scholes models. But instead of having an exogenous volatility, it has its Brownian term dB1 multiplied by a second equation, X2.

dX2 could be a standard mean-reversion equation, but again its dB2 is multiplied by X3.

dX3 is something like abs(dB3 - dB4* X3).

Voilà, an equation (X1) with a sudden break driven by the level of a mean-reverting equation (X2, which tells us volatility should come down in finite time even if it grows by a lot at times) that's set to blow up at a X2-dependent but stochastic level.

Don't get me wrong, Poisson-like jumps are very common (they're precisely the limiting process for sudden jumps) but people overstate (perhaps because they didn't really read the conditions for Ito isometry) how much a Brownian motion forces a system into normality or smoothness.

But hey, people get away with being hipsters about programming languages, why shouldn't they do that for stochastic calculus too, you know?

Hey. Why do people upvote a post that says "Taleb is a jerk" and makes a reputation argument in a forceful manner (reminiscent of Taleb's own style) -- and then downvote a post with technical detail?

Probably the fact that you flatly ignored the remark about stuffing your post with opaque technical terms--I know a lot of maths, but I can't make heads or tails of what you're saying. Who are you even talking to? I assume it means something to you, but it doesn't even look like you're trying to communicate your point in a clear manner. Just that you enjoy using words. When trying to read around those technical terms, what is left has a rather nasty and arrogant tone. Therefore, doesn't add much to the discussion = downvote.

Can't recommend the Mandelbrot book highly enough - he was a student of Paul Levy, and he wrote extensively about why Gaussian is not a good choice to model financial time series.

Disagree; economists have failed more visibly, but they've got nothing on social scientists.

There's also a joke that goes:

Power law walks into a bar. Bartender says, "I've seen a hundred power laws. Nobody orders anything." Power law says, "1000 beers, please".

> Power law walks into a bar. Bartender says, "I've seen a hundred power laws. Nobody orders anything." Power law says, "1000 beers, please".

Agh, I'm a mathematician and should get this, but I don't. Is the joke just that power laws give behaviour that is very small for quite a while, and then becomes suddenly large?

I'm not a mathematician, but I think the joke is that you can't infer a meaningful average from a power law.

> if you want a scale free process, it must be distributed as a power law.

Could you elaborate a little, or maybe give an example?

The links that others have provided are good, but the way I think about this is basically that the scale of a process basically sets the order of magnitude at which it occurs. We tend to think of processes that have a particular scale, which makes power laws somewhat unintuitive (even though there are also lots of scale-free processes in ordinary life, too). So, cars have a natural scale, since they have to fit people inside them. So the distribution of car sizes is pretty limited --- you aren't going to find cars that are microns long, and you won't find cars that are kilometers long.

But if something can happen at any given scale, the distribution must operate very differently. The distribution can't pick out any favorite size to work at --- some fraction of events must happen at every possible scale. So, if we're looking at the distribution of stellar luminosities, then there will be some stars which are as bright as the Sun, some which are 1% as bright as the Sun, and some which are 100 times as bright as the Sun. And since the star formation process doesn't have any preferred scale, the ratio of the number of Solar luminosity stars to the number of stars which are 1% as luminous as the Sun must be about the same as the ratio of the number of stars which are 100 times as bright as the Sun to the number of Solar luminosity stars. It's perfectly fine for the distribution to prefer dimmer or brighter stars because this doesn't end up introducing a preferred scale --- it just tilts the slope of the distribution (which is a line on a log-log graph).
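The "same ratio at every scale" argument above can be illustrated with a small calculation. This sketch assumes an idealized Pareto tail with a made-up exponent and compares it with an exponential, which does have a preferred scale; the function names are invented for the example.

```python
import math

def pareto_fraction(lo, hi, alpha, x_min=1.0):
    """P(lo <= X < hi) for a Pareto tail P(X > x) = (x / x_min) ** -alpha."""
    return (lo / x_min) ** -alpha - (hi / x_min) ** -alpha

def exponential_fraction(lo, hi, scale):
    """P(lo <= X < hi) for an exponential distribution with the given scale."""
    return math.exp(-lo / scale) - math.exp(-hi / scale)

def decade_ratios(fraction, decades=4):
    """Ratio of the population in each decade [10^k, 10^(k+1)) to the next decade up."""
    edges = [10.0 ** k for k in range(decades + 1)]
    counts = [fraction(edges[k], edges[k + 1]) for k in range(decades)]
    return [counts[k] / counts[k + 1] for k in range(decades - 1)]

if __name__ == "__main__":
    print(decade_ratios(lambda lo, hi: pareto_fraction(lo, hi, 1.5)))
    print(decade_ratios(lambda lo, hi: exponential_fraction(lo, hi, 10.0)))
```

For the power law every decade-to-decade ratio is the same constant, 10**alpha; for the exponential the ratio changes wildly once you pass its characteristic scale.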

I'm not a mathematician, so I'm not sure if these statements are true in any rigorous sense, but in practice scale-free processes tend to produce power laws in the universe.

This isn't a proof but if you want to understand it, these slides are alright: http://cs.brynmawr.edu/Courses/cs380/spring2013/section02/sl...

I recommend reading them through, but slide 39 is the tl;dr.

Richard McElreath has some nice points about variable independence assumptions. Concretely, for a model this assumption represents the state of "ignorance" and is the most conservative choice (i.e. doesn't assume any correlation). That said, he also mentions the "mind projection fallacy" of confusing epistemological claims with ontological claims, which may in turn be the reason why many assume normal distributions for everything.


"So You Think You Have a Power Law — Well Isn't That Special?"

reminds me of this joke from a lecture Lawrence Krauss gave in 2009 of his work in 'A Universe From Nothing':

> it was made after the discovery that on a log log plot everything is a straight line(o)

i'd recommend the entire lecture, but definitely at least check out the anecdote this joke bookends

it is about hubble's original 1929 embarrassing failed attempt to calculate the rate of the expansion of the universe(i)

(o) https://www.youtube.com/watch?v=7ImvlS8PLIo&t=18m25s

(i) https://www.youtube.com/watch?v=7ImvlS8PLIo&t=12m55s

The version I've heard is "everything on a log log plot in a sufficiently thick marker".

> I understand that there are good theoretical motivations for this --- namely, the normal distribution is the distribution that maximizes entropy for a given mean and variance.

That is indeed true, but why should such a property imply that the use of Normal distribution is appropriate ? Just a rhetorical question, of course, because your comment does indicate that Normal is not a good choice unless one has compelling reasons to do so.

Another argument that is used to justify its use is the central limit theorem. That says a sum of (nearly) independent variables with (nearly) identical distributions with finite variance converges to the Normal distribution. If the process under observation is indeed a superposition of such random processes, then yes the choice of the Gaussian can be justified. But it is surprisingly common that one of the 3 requirements is violated. A common violation is that the variance is infinite or it is so high that the process is better modeled as one with an infinite variance. In such situations the family of stable distributions is the more appropriate choice.
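A small simulation of the infinite-variance case described above, using the Cauchy distribution as the textbook example. The sample sizes and helper names are arbitrary choices for this sketch. It measures the interquartile range of sample means: for Gaussian draws it shrinks like 1/sqrt(n), while for Cauchy draws averaging does not help at all.

```python
import math
import random
import statistics

def iqr(values):
    """Interquartile range, a spread measure that exists even without a variance."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

def cauchy(rng):
    """Standard Cauchy draw via the inverse CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))

def gaussian(rng):
    return rng.gauss(0.0, 1.0)

def iqr_of_sample_means(draw, n, trials=2000, seed=0):
    """Spread of the mean of n draws, estimated over many repeated trials."""
    rng = random.Random(seed)
    means = [sum(draw(rng) for _ in range(n)) / n for _ in range(trials)]
    return iqr(means)

if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, iqr_of_sample_means(gaussian, n), iqr_of_sample_means(cauchy, n))
```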

Gauss' own use of the Gaussian was motivated by convenience rather than any deep theory. Even at that time it was well known that other distributions, for example the Laplace distribution, work better.

As you say, mathematical convenience is another argument.

The negative log of the (unnormalized) Gaussian density is basically x^2 and this fact comes in very handy.

It's used extensively in Bayesian regression, modeling noise as additive Gaussian leads to the squared loss function, and if you also model the prior as Gaussian it leads to an L2 regularizer.
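To make the correspondence above concrete, here is a one-parameter sketch: a model y = w*x plus Gaussian noise with standard deviation sigma, and a zero-mean Gaussian prior on w with standard deviation tau. The toy data and all names are assumptions for illustration. Dropping constants, the negative log posterior is squared loss plus an L2 penalty with weight sigma^2 / tau^2, so its minimiser is the ridge solution.

```python
def neg_log_posterior(w, xs, ys, sigma, tau):
    """Negative log posterior of w, up to an additive constant.

    Gaussian noise contributes the squared loss term,
    the Gaussian prior contributes the w**2 (L2) term.
    """
    squared_loss = sum((y - w * x) ** 2 for x, y in zip(xs, ys))
    return squared_loss / (2.0 * sigma ** 2) + w ** 2 / (2.0 * tau ** 2)

def ridge_solution(xs, ys, lam):
    """Closed-form minimiser of sum((y - w*x)**2) + lam * w**2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [1.1, 1.9, 3.2, 3.9]
    sigma, tau = 0.5, 2.0
    w_map = ridge_solution(xs, ys, sigma ** 2 / tau ** 2)
    print(w_map, neg_log_posterior(w_map, xs, ys, sigma, tau))
```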

It also simplifies many other calculations. Fitting a mixture of Gaussians by expectation-maximization or the Kalman filter come to mind.

Absolutely !

For squared loss almost every theorem you want to be true is actually true. In a way it's dangerous because it sets up bad expectations (* no nerd pun intended).

A result that holds for squared loss but not for others is the SVD as a low rank approximation of a matrix.

*Expectation minimizes the squared loss
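A quick brute-force check of the footnote above, on a small made-up sample with one outlier: the constant that minimises squared loss is the mean, while switching to absolute loss moves the minimiser to the median. The grid search is only for illustration.

```python
import statistics

def squared_loss(c, data):
    return sum((x - c) ** 2 for x in data)

def absolute_loss(c, data):
    return sum(abs(x - c) for x in data)

def best_constant(loss, data, candidates):
    """Candidate value with the smallest total loss."""
    return min(candidates, key=lambda c: loss(c, data))

data = [1.0, 2.0, 2.0, 3.0, 12.0]          # toy sample with an outlier
grid = [i / 100 for i in range(0, 1501)]   # candidates 0.00 .. 15.00

if __name__ == "__main__":
    print(best_constant(squared_loss, data, grid), statistics.mean(data))
    print(best_constant(absolute_loss, data, grid), statistics.median(data))
```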

I think that "good theoretical motivations" is the key point.

I am a (also former) physicist and it drives me crazy when people fit whatever happens to be in the chart to a straight line, "to get the trend". Whenever I ask them for the theory which predicts that x and y will be linked by y = ax + b, they do not have any.

This also goes on with extrapolations or interpolations, usually without the slightest theoretical reason to do so.

The problem is that when working with marketing, HR or even finance, they are so used to "getting the trend" with a linear fit that I am hopeless.

I once drew a parabola and asked for the trend (around the minimum). They were surprised by this stupid question as "it is obvious that there is no trend". Taking some random points linking share value with the number of women in a company "obviously fits to a trend line".
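The parabola anecdote above is easy to reproduce. This sketch fits an ordinary least-squares line to points on y = x^2 sampled symmetrically around the minimum; the data and function name are invented for the example.

```python
def least_squares_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [float(x) for x in range(-5, 6)]
ys = [x * x for x in xs]          # a perfect parabola, no noise at all
slope, intercept = least_squares_line(xs, ys)

if __name__ == "__main__":
    print(slope, intercept)
```

The fitted "trend" is a flat line through the mean of y, even though y is completely determined by x. The linear fit only answers the question it was asked.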

My boss and I talk about this phenomenon a lot. Normal distributions have lots of nice properties, and lots of tools that can do lots of fancy things, and infer lots of nice "facts" from them.

So there is a huge bias among researchers to assume them to make their treatment of the data easier.

Physicists in particular have a reputation for overusing the assumption of a power law! http://bactra.org/weblog/491.html

> There's actually a joke in the field that when you get a new dataset, the first thing you do is fit it to a power law. If that doesn't work, you fit it to a broken power law.

That's funny -- my field has the same thing with a different distribution. Analyzing stochastic processes is much easier if you assume an exponential distribution. It's one of my criteria for whether somebody's giving a bad job talk. If an unfounded assumption that waiting times are exponentially distributed shows up in the first three slides, the rest of the presentation is probably B.S. Even if the presenter used Beamer and filled the slides with beautiful equations. (The exponential distribution of waiting times is basically an assumption that events are independent.)

Edit: Especially if the presenter's slides are full of beautiful equations.

..is it not? Waiting times would be a poisson distribution. In my field where most folks are much less math-savvy, it's WAY more common for someone to be "surprised" or find it "insightful" that the 80/20 rule exists on a dataset (that I had concluded would already be power-law distributed before I looked at it); or who struggle with defining how another data set like email opens over time is distributed (because it's Poisson and hey we're never really introduced to that idea)

> Waiting times would be a poisson distribution

This is only true if events are independent. Suppose we're modeling rider arrivals at a bus stop. The bus comes once an hour. Does the exponential distribution adequately model how long you need to wait for another person to show up? If a Poisson process is an appropriate model, the expected value of the number of people arriving in the 20 minutes after the bus departs is the same as the expected value of the number of people arriving in the 20 minutes before. Clearly not. Rider arrival rates depend on how long it is until the bus is supposed to depart. Indeed, most arrival processes are not ideal Poisson processes, but it may be a better approximation in some cases than in others.

We use Poisson processes for two reasons:

1. They're often a good enough approximation of reality to make useful predictions.

2. They're mathematically tractable.

Unfortunately, our reasons for using Poisson processes are often more (2) than (1).
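The before-and-after-the-bus argument above can be checked against what an ideal Poisson process would predict. This sketch simulates a homogeneous Poisson process over one hour by summing exponential gaps and compares the average number of arrivals in the first and last 20 minutes. The rate and trial count are arbitrary assumptions.

```python
import random

def arrivals_in_windows(rate_per_min, trials=5000, seed=0):
    """Average arrivals in minutes [0, 20) and [40, 60) of an hour.

    Arrivals come from a homogeneous Poisson process, generated by
    accumulating independent exponential inter-arrival times.
    """
    rng = random.Random(seed)
    early = late = 0
    for _ in range(trials):
        t = rng.expovariate(rate_per_min)
        while t < 60.0:
            if t < 20.0:
                early += 1
            elif t >= 40.0:
                late += 1
            t += rng.expovariate(rate_per_min)
    return early / trials, late / trials

if __name__ == "__main__":
    print(arrivals_in_windows(0.5))
```

Both windows average about rate * 20 arrivals. Real riders, who time their arrival to the schedule, would load the last window much more heavily, which is exactly where the model stops being a good approximation.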

I don't really understand your example. The number of people arriving at the bus stop would match a poisson distribution where the x-axis is time-to-next-bus.

None of this before/after bus mess.

> I don't really understand your example.

Evidently. I'm afraid we have no common ground on which to discuss stochastic processes. Have a pleasant weekend.

Ok dude you too. Nice chatting

That's almost not a joke - I did particle astrophysics, not astronomy, but for the exact same reason you state, our physics results (we used particle physics as a window to astronomy, or at least we tried to) would end up with broken power law fits.

There's something deeply satisfying using a model so simple to explain a piece of nature. :)

Example: cosmic ray energy spectrum. http://iopscience.iop.org/1367-2630/12/7/075009/downloadFigu...

When you want to fit a distribution to a star's image on the plate, normal does indeed seem a weird choice. But the point about multidimensional gaussians is that they are the only distribution that is maximally "ignorant" (maximum entropy) -- so if there's a small (unresolvable) companion star we don't know about, we can do no better than a gaussian if we want to ignore it.

Well, that's not quite true. The normal distribution maximizes entropy for a finite mean and variance. But it's not guaranteed that certain physical processes have well defined means and variances. A power law is the maximum entropy distribution with a well defined mean, but undefined variance, on the range (0, infinity). [1]

Of course in reality, at some point, some other physical process starts to dominate and cuts off the power law. This then turns it into a power law with a different slope or an exponential cutoff.

Edit: I forgot to add that a star's light profile on an image is actually approximately Gaussian. But it isn't exactly Gaussian --- I seem to remember that the core more closely resembles a Lorentzian. I am obviously not an observer, otherwise I would have thought of that immediately!

[1]: https://en.wikipedia.org/wiki/Maximum_entropy_probability_di...

> A power law is the maximum entropy distribution with a well defined mean, but undefined variance, on the range (0, infinity).

The maximum entropy distribution with a given mean is the exponential. I think to define power law distributions you will need additional constraints more concrete than "undefined variance".

That's if you require the range to be [0, infinity). If you don't require 0 to be a valid point in the distribution then a power law is the maximum entropy distribution. A power law just diverges if you try to integrate to zero, so you have to find a lower entropy distribution.

I think that if the power law cannot be integrated on [0,infinity), it cannot be integrated on (0,infinity) either.

> But in astronomy, nothing is normally distributed. (At least, nothing comes to mind.) Instead, everything is a power law.

Power law fits to empirical data are heavily misused.


that's really weird. I'm not an astronomer so I thought "hmmm, what's the first thing I can think of in astronomy."

Well, how about the magnitude of stars. So, is the magnitude of stars normally distributed? I googled "distribution magnitude of stars" and got images (by clicking on images tab) that seem pretty normally distributed to me. Aren't they?

I think you're being downvoted a bit unfairly. What most non-statisticians don't realize is that some of the most "unnormal" distributions actually look pretty normally distributed, and some of the most ugly looking distributions actually behave like a normal distribution. When you look at a distribution's pdf, you're only looking at the "center" of the distribution. But what statisticians (usually) care about are the tails of a distribution.

Consider the following example: The [Cauchy distribution](https://en.wikipedia.org/wiki/Cauchy_distribution) is a pathological example of a distribution with very heavy tails. Heavier than the power law distributions mentioned in the GP. So heavy, the distribution doesn't even have a mean value. But if you just look at the pdf, it looks almost exactly like a gaussian.
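To put numbers on the tail difference described above, this sketch compares the two-sided tail probabilities of a standard normal and a standard Cauchy using their closed-form CDFs. The choice of cutoffs is arbitrary.

```python
import math

def normal_two_sided_tail(k):
    """P(|Z| > k) for a standard normal."""
    return math.erfc(k / math.sqrt(2.0))

def cauchy_two_sided_tail(k):
    """P(|X| > k) for a standard Cauchy."""
    return 1.0 - (2.0 / math.pi) * math.atan(k)

if __name__ == "__main__":
    for k in (1, 2, 5, 10):
        print(k, normal_two_sided_tail(k), cauchy_two_sided_tail(k))
```

Near the center the two bells look alike, but at 5 units out the normal leaves less than one in a million while the Cauchy still has more than 10% of its mass.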

It is pretty easy to tell a Cauchy or a low k Student's T apart from a Gaussian. Just look at your outliers. One thing a lot of people miss is that a normal distribution with outliers _IS_ a long tail distribution.

Too bad inference with long tailed distributions is so hard.

thanks! Very interesting. Is the standard deviation still useful for some of the other distributions you mentioned, that look vaguely Gaussian?

You're referring to the square root of the second moment about the mean. The short answer is yes, it's always useful, provided you have an estimator.

Check out Chebyshev's inequality.

Just don't start talking about it to your pointy-haired boss. You'll get fired.

thanks. but what does your last line, about the boss mean?

That this sort of knowledge is often uninteresting to non-technical types.

I'm pretty jaded at this point.

For the specific example of the Cauchy distribution, the standard deviation isn't defined.

The standard deviation is defined as

sqrt(E[(X - EX)^2])

If EX = infinity but X is finite with prob > 0, the standard deviation will be infinite too.

You mean like this one?


Not really. Plus keep in mind magnitude is a logarithmic measure. Things get skewed when you take logarithms. Also if you look for example at star sizes:


(from http://spiff.rit.edu/classes/phys440/lectures/size/size.html).

The distribution is certainly not gaussian, and very much likely heavy tail.

Not to mention that the distributions from three different sky surveys (in your first link) look dramatically different. Which means the histograms are saying at least as much about the measurement process as they are about what's actually out there in the universe.

To add to that, the sharp cutoff at faint (large number) magnitudes is almost certainly a reflection of the fact that astronomical surveys and images have a flux limit, below which objects cannot be individually detected.

First sentence on Wikipedia: "In astronomy, magnitude is a logarithmic measure of the brightness of an object..."

> First sentence on Wikipedia: "In astronomy, magnitude is a logarithmic measure of the brightness of an object..."

That doesn't tell you whether the magnitudes are normally distributed though. That just tells you how the magnitude is computed from an object's observed/measured flux.

I'm explaining why magnitude looking normally distributed doesn't contradict the parent.

I think it's about size and independence. Gaussian distributions connote small scale (the background noise of the electrical output of this sensor) vs power laws connote large scale (the number of electrical sensors per company) that you normally deal with in your field.

The size of a ruler that a machine cuts (quasi normal/Gaussian)

Wealth (Power/Pareto)

It's not exactly scale. Things tend to fit a Gaussian when individuals tend towards the mean, and power laws when there are "the rich get richer" processes pushing individuals towards extrema.

Power law indicates an exponential distribution, which is often taught right after normal distribution in most probability classes. Normal distribution is usually considered as white noise, hence a fixed value + white noise is often Normal, which is quite common as well.

> Power law indicates an exponential distribution

This sounds wrong. One is invariant under scaling, the other under translation.

> This sounds wrong.

It is.

However there is a way to make it correct. Your second sentence is spot on the intuition. If you log-transform a power-law-distributed random variable, the RV that you get is exponentially distributed.
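The log-transform remark above can be checked by simulation. If X is Pareto with tail P(X > x) = x^(-alpha) for x >= 1, then P(ln X > t) = P(X > e^t) = e^(-alpha t), which is an exponential distribution with rate alpha. The sketch below uses the standard library's Pareto sampler; the exponent and sample size are arbitrary.

```python
import math
import random

def log_pareto_samples(alpha, n=100_000, seed=0):
    """Logarithms of Pareto(alpha) draws with minimum value 1."""
    rng = random.Random(seed)
    return [math.log(rng.paretovariate(alpha)) for _ in range(n)]

def fraction_above(values, t):
    return sum(1 for v in values if v > t) / len(values)

if __name__ == "__main__":
    alpha = 2.0
    logs = log_pareto_samples(alpha)
    print("mean of logs:", sum(logs) / len(logs), "expected:", 1.0 / alpha)
    print("P(log > 1): ", fraction_above(logs, 1.0), "expected:", math.exp(-alpha))
```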

> As a (former) astronomer, I've never understood why people assume normal distributions for everything.

It shouldn't, because there is a big reason to lean towards normal distribution, and that is the central limit theorem. If you sample a variety of different distributions, the sum of all these samples will go towards a normal distribution. That is why the normal distribution pops up in so many places.

I understand that. But in astronomy, there are so many vastly different scales to work with, that in practice, at any particular scale, there is usually only one physical process that dominates. It is extremely unlikely for two independent physical processes to operate at nearly the same scale! So it's hard to get processes that are the sum of enough random variables to make a distribution approximately Gaussian. The situation is probably different in a field like biology because there are many factors that, say, affect a person's height. The closest that I've ever encountered in astronomy was the distribution of the powers of a class of radio galaxies I studied that exhibited an approximately log-normal distribution. (See figure 7 of [1].)

[1]: http://adsabs.harvard.edu/abs/2012ApJ...756..116A

That sounds nice, but justifying why you expect your distribution to be the sum of other distributions, and not the result of another operation like multiplication, takes a lot more work; in some fields (e.g. finance), it's completely wrong.

If you multiply random numbers together you get a log normal distribution. Which means the logarithm of the number is normally distributed (multiplication is just adding logarithms after all.) However multiplicative noise is a lot less common than additive noise.
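A small simulation of the multiplicative case described above. The factors here are uniform on (0.5, 1.5), an arbitrary choice for the sketch, as are the helper names. The products themselves are strongly right-skewed, while their logarithms (sums of logs, so the central limit theorem applies) are close to symmetric.

```python
import math
import random

def skewness(values):
    """Third standardized moment (0 for a symmetric distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def products(k=30, n=20_000, seed=0):
    """n products, each of k independent positive random factors."""
    rng = random.Random(seed)
    return [math.prod(rng.uniform(0.5, 1.5) for _ in range(k)) for _ in range(n)]

if __name__ == "__main__":
    p = products()
    print("skewness of products:    ", skewness(p))
    print("skewness of log(product):", skewness([math.log(v) for v in p]))
```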

The central limit theorem only yields normality when the source distributions have finite variance; otherwise the sums go to Lévy alpha-stable distributions.

They want everything to hit a Bell Curve because that is the only thing they remember from statistics class.

I've had debates that not everything comes out as a Bell Curve only to be called an idiot for claiming that.
