
Are mass shootings really random events? A look at the US numbers - jipumarino
http://www.empiricalzeal.com/2012/12/24/are-mass-shootings-really-random-events-a-look-at-the-us-numbers/?utm_source=rss&utm_medium=rss&utm_campaign=are-mass-shootings-really-random-events-a-look-at-the-us-numbers
======
RyanZAG
All these comments about statistics don't seem to even begin to understand
statistics.

1) Yes, we can use the chi-squared test here - the small number of events is
built into the p value.

2) No, we do not need to include all murders, as we are not testing for
murders. We are testing for mass shootings. This line of reasoning is the same
as saying we cannot test for rotten apples only, we can only test if all fruit
is rotten. Statistics on categories is acceptable and meaningful if apples are
a specific kind of fruit, or mass shootings are a specific kind of crime. We
can't draw a conclusion about crime, but we can draw a conclusion about mass
shootings.

3) These statistics prove only a single thing: mass shootings in USA are
likely random events and have a mean value of ~2.

    
    
      The following conclusions are applicable:
      - Mass shootings are likely not a 'copycat' crime,
        and each event is likely completely independent
        of any other event occurring.
      - You should expect about 10 mass shootings over the
        next 5 years. Not so nice...
    
      The following conclusions are NOT applicable:
      - Gun laws have any effect. Gun laws may or may not
        decrease the average, these statistics do not say.
      - No measures that have been put in place to reduce
        mass shootings (I assume there are?) have had any
        effect so far. They may have had a positive or
        negative effect, but these effects may be small
        or may be cancelled out by other, opposite effects.
      - The chance of a mass shooting is stable, and no
        increase or decrease in events seems to be happening.
        The distribution fit test only looks at the
        aggregate data and shows that it fits that
        distribution.
        The correct way to check for this is to break the data
        set in half, and compare the first set against the
        second set.
    
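The split-half check suggested above can be made concrete. This is a minimal sketch, assuming only the event counts in each half are known: if both halves share one underlying rate, then given the total N, the count in the first half is binomial with p = t1 / (t1 + t2), where t1 and t2 are the lengths of the two periods. The function name and the example counts are hypothetical, not taken from the article's data.

```python
from math import comb

def split_half_test(first: int, second: int, t1: float = 1.0, t2: float = 1.0) -> float:
    """Two-sided conditional binomial test for equal Poisson rates.

    first, second: event counts in the two periods.
    t1, t2: lengths of the two periods (e.g. in years).
    Returns a p-value for "both periods share one underlying rate".
    """
    n = first + second
    p = t1 / (t1 + t2)
    # Binomial(n, p) probability of every possible first-half count.
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[first]
    # Add up every outcome at least as unlikely as the observed one;
    # the small tolerance guards against floating-point ties.
    p_value = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, p_value)

# Hypothetical example: 25 events in the first 15 years, 37 in the next 15.
print(split_half_test(25, 37))
```

Conditioning on the total sidesteps having to estimate the common rate separately.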

And now we return you to our regular statistics hate...

EDIT: Or not - nobody is questioning the real possible problem here: the data
itself?

<http://www.bradycampaign.org/xshare/pdf/major-shootings.pdf>

This seems to imply there are FAR more mass shootings per year than indicated
by the data. None of my conclusions above are correct if the data itself is
wrong, and I don't even live in the USA so I can't vouch for the correctness
of the data.

~~~
pretoriusB
> _These statistics prove only a single thing: mass shootings in USA are
> likely random events and have a mean value of ~2._

The whole "it is random" BS is based on the pre-assigned value of 2 per year.
Why presuppose that 2 per year is an acceptable norm that is not problematic
in itself?

Consider a US with a mean value of 10,000 mass shootings per year, with the
occasional 0, 20,000 or 50,000 etc. mass shootings fitting the Poisson
distribution. Using the same methodology as the article, the same "conclusion"
would have been reached, that mass shootings are random.

This is a major misunderstanding of what "random" implies in this context. It
just means that the motives and decisions at the individual level to "go for
it" are triggered independently, with the specific year bearing no
influence. That is, it proves that the specific events are not _co-ordinated_.

This is mighty fine, but it doesn't at all mean that the cultural / law / etc.
climate that makes even considering an attempt at one of these events possible
(let alone at a rate of 2 a year) is "random" or unchangeable by human
intervention.

Way to misapply statistics. As they say, there are lies, damned lies, and
statistics.

As an extreme example, in a country where there are no guns (either legal or
illegal), there would be no mass shootings AT ALL. No "spontaneous random
activity" pertaining to mass shootings could change that.

In a similar but more practical way, if other Western countries have had a median
of about 0 mass shootings per year for decades (with the occasional
exception), this tells us that even if the number of mass shootings per year
in the US is random, the fact that it has mass shootings at all is not random
but a cultural/structural result.

~~~
svantana
A minor point here, but relevant since the OP is emphasizing the importance of
understanding the Poisson distribution - if the mean really was 10,000, any
observation outside the range [9,500, 10,500] would be extremely unlikely.

<http://en.wikipedia.org/wiki/Poisson_distribution>
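svantana's point follows from the Poisson variance being equal to the mean: with a mean of 10,000 the standard deviation is sqrt(10,000) = 100, so 9,500 and 10,500 sit five standard deviations away. The short sketch below sums the exact Poisson probabilities (in log space via `math.lgamma`, to avoid overflowing factorials) to show how little probability lies outside that range; the function names are illustrative.

```python
import math

def poisson_logpmf(k: int, lam: float) -> float:
    """log P(X = k) for X ~ Poisson(lam), computed without huge factorials."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def prob_outside(lam: float, lo: int, hi: int) -> float:
    """P(X < lo or X > hi) for X ~ Poisson(lam)."""
    inside = sum(math.exp(poisson_logpmf(k, lam)) for k in range(lo, hi + 1))
    return 1.0 - inside

print(prob_outside(10_000, 9_500, 10_500))
```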

~~~
pretoriusB
Thanks, I should correct for that, but don't have the "edit" link anymore.

------
rdl
This seems like trying to use a hammer (statistics) because you have a hammer,
not because it's the right tool for the job.

Essentially, if you look at the incidents, you see enough common factors
(increasingly, using semiautomatic carbines, carrying multiple weapons,
attacking schools, wearing armor or load bearing gear, etc.) to think there is
some common factor at work. The population of random people on the street
doesn't pick the AR-15 to do _anything_, and certainly doesn't pick a school
as a target for anything. The solution space here isn't "spree shootings at
schools through time", it is traits of spree shootings themselves -- location,
methods, etc. They're pretty tightly clustered.

Either there is a common hidden factor, or these incidents are feeding on each
other.

I personally don't think gun control is the major tool to deal with this, and
don't think violent video games are the problem, but rather the non-stop
multi-day press coverage by the media of each of these incidents.

Some insignificant douchebags from a Colorado school became about as famous as
the 9/11 terrorists (and far more than Fortune 50 CEOs or scientists or
classical musicians) by murdering their classmates.

(Columbine was essentially as big a deal for the 'how to respond to shootings'
world as 9/11 was to aviation security; previously, you cordoned off the area
and called in SWAT to negotiate, thinking it was a hostage situation -- now,
the first 1-2 responders on scene move directly to the threat with whatever
weapons they have on them at the time, ignore any wounded victims, close, and
engage/destroy -- similarly, hijacked airliners are now viewed as air to
ground missiles vs. hostage negotiations.)

Every time the media talks about the shooters in one of these situations,
making them famous, it reinforces the rational (if defective) choice of
someone who wants to be famous at the cost of doing evil to copycat.

The mythological/historical example is Herostratus, who burned the temple of
Artemis just to be famous.

~~~
fijal
May I ask how gun control is not a tool to deal with that? In other words,
what are the differences between Americans and Europeans that prevent most
such violence from happening in Europe (mostly, at least), other than
restricted access to guns?

~~~
yummyfajitas
One key fact to observe is that crime _without guns_ in the US is also very
high. The US has a stabbing+bludgeoning+poison+etc murder rate of 1.8/100k,
which is still higher than murder by all methods in most of Europe. I.e.,
Americans just like to kill people.

<http://www.cdc.gov/nchs/fastats/homicide.htm>

[https://en.wikipedia.org/wiki/List_of_countries_by_intention...](https://en.wikipedia.org/wiki/List_of_countries_by_intentional_homicide_rate)

One possible cause is demographics - more than half of our murders are
committed by a demographic group making up only 10% of our population, and
this subgroup is very uncommon in Europe.

[http://en.wikipedia.org/wiki/Race_and_crime_in_the_United_St...](http://en.wikipedia.org/wiki/Race_and_crime_in_the_United_States)

~~~
rdl
Possibly also the drug war, which is quite connected with violence, race,
racism, poverty, urban/suburban/rural migration patterns, public health, human
nature, geography, immigration, expansion of federal power, military conquest,
etc.

------
beloch
Mass shootings are rare enough that you're going to get poor stats due to
granularity. It might be more interesting to look at stats like homicide rate.

The U.S. has a homicide rate that is second only to Russia in the G8, and is
more than 3 times that of any G8 nation besides Russia. This sets off
warning bells for me...

Normally I'd love to dig a little deeper, but it's time for me to play Santa.
Merry Christmas!

~~~
tomjen3
The sad truth is that if you exclude inner-city slums, the murder rate in the US
plummets to mostly normal levels.

~~~
seanmcdirmid
Except, is that really true?

[http://www.nationalatlas.gov/articles/people/IMAGES/crime_mu...](http://www.nationalatlas.gov/articles/people/IMAGES/crime_murder.gif)

What the heck is wrong with Northern Idaho? Elko, Nevada? SoCal's Inland
Empire? I didn't think there was anyone to kill in western New Mexico. The
Mississippi Delta also doesn't look like a healthy place to live.

If we excluded Louisiana, the murder rate would probably plummet to mostly
normal levels. And it isn't just New Orleans.

~~~
samdk
I don't know if the person you're responding to is actually correct or not,
but the map you're referring to is very misleading. The places you named--and,
not coincidentally, many of the places on the map with the highest murder rate
as a percentage of population--are all sparsely populated.

Notice that many of the places with the lowest murder rates _also_ tend to
be sparsely populated. What you're seeing is the increased variance that one
should expect to see when dealing with smaller sample sizes, not evidence that
small towns are more dangerous than big cities.
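The small-sample effect described above can be checked by simulation. The sketch assumes, hypothetically, that every county has exactly the same true murder rate of 5 per 100,000 and differs only in population; the observed rates of small counties still scatter far more widely than those of large ones. The Poisson sampler is Knuth's simple multiplication method, which is adequate for the small means used here; the populations and the rate are made-up illustration values, not real data.

```python
import math
import random
import statistics

def poisson_sample(lam: float, rng: random.Random) -> int:
    """Knuth's method: count uniforms multiplied until the product drops to e^-lam."""
    threshold = math.exp(-lam)
    k = 0
    product = rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def observed_rates(population: int, true_rate_per_100k: float,
                   n_counties: int, seed: int = 0) -> list[float]:
    """Simulated per-100k murder rates for n_counties counties of equal size."""
    rng = random.Random(seed)
    lam = population * true_rate_per_100k / 100_000  # expected murders per county
    return [poisson_sample(lam, rng) * 100_000 / population for _ in range(n_counties)]

# Same hypothetical true rate everywhere; only the population differs.
small = observed_rates(2_000, 5.0, 500)
large = observed_rates(2_000_000, 5.0, 500)
print("small counties: spread", statistics.stdev(small), "max", max(small))
print("large counties: spread", statistics.stdev(large), "max", max(large))
```

Both groups have the same underlying risk, so any map shading built from such rates would make the small counties look both the most and the least dangerous.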

~~~
seanmcdirmid
Ya, I just wish I had a better map like this one for vehicular homicide:

<http://www.textmap.com/offence/vehicular-homicide.htm>

I bet we would see more geographic correlation with not too much variation in
city areas (e.g., the Delta is still a dangerous place to live, which would
correspond to anecdotal evidence). Unfortunately, I can't google up a decent
gun homicide heat map.

------
chewxy
For anyone who's interested, this blog post piqued a very morbid interest in
me, and so I decided to have a look at the data.

Here's a distribution of the number of days since the previous incident (as
reported by Mother Jones; I'm personally not familiar with the number of
shootings in the US, which according to different sources ranges from very few
to way too many): <http://imgur.com/Onoxv>. The colours group incidents by how
many incidents happened in the 180 days prior to each one.

Here is the distribution of the number of incidents 180 days prior to an
incident: <http://imgur.com/9VTjQ>

Draw your own conclusions.
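The two summaries chewxy describes, the gap since the previous incident and the number of incidents in the preceding 180 days, can be computed from a list of incident dates. This is a sketch, not chewxy's actual code, and it assumes the dates are already available as `datetime.date` values (for example, transcribed from the Mother Jones list); the example dates are placeholders, not real incidents.

```python
from datetime import date
from typing import Optional

def gaps_and_recent_counts(dates: list[date],
                           window_days: int = 180) -> list[tuple[date, Optional[int], int]]:
    """For each incident: (date, days since previous incident, incidents in prior window).

    The first incident has no predecessor, so its gap is None.
    """
    ds = sorted(dates)
    rows = []
    for i, d in enumerate(ds):
        gap = (d - ds[i - 1]).days if i > 0 else None
        # Earlier incidents that fall within window_days before this one.
        recent = sum(1 for prev in ds[:i] if (d - prev).days <= window_days)
        rows.append((d, gap, recent))
    return rows

incidents = [date(2012, 1, 1), date(2012, 3, 1), date(2012, 12, 1)]  # placeholder dates
for row in gaps_and_recent_counts(incidents):
    print(row)
```

A histogram of the gaps, grouped by the third column, reproduces the kind of plot linked above.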

------
aes256
There is absolutely an observed trend of copycat school shootings which flies
in the face of randomness.

That said, I think this trend is hidden from this dataset because it relates
specifically to _mass_ shootings, including only those incidents in which the
shooter took the lives of at least four people.

I would hypothesize that each major shooting is followed in the succeeding
months by a number of slapdash copycat shootings, in many of which fewer than
four people are killed by the shooter.

------
clarkm
Since mass shootings are such rare events, the data is overdispersed, so a
negative binomial distribution is likely more appropriate than a Poisson
distribution. This, along with the quasi-Poisson model, is commonly used by
criminologists to account for such problems.

See this paper for a more detailed discussion:

<http://www.crim.upenn.edu/faculty/papers/berk/regression.pdf>

~~~
dbecker
Deviations from the Poisson distribution are exactly what he is testing for.
The paper you referenced discusses how to model in the presence of
overdispersion, not how to test for it as a hypothesis.

Moreover, Poisson distributions arise from events that occur in a very
small fraction of all opportunities, but with many opportunities in each time
period. So, unless the timing of shootings is correlated (again, the thing he
is testing for), this is a perfect use case for the Poisson distribution.
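One standard way to turn "deviation from Poisson" into a single number is the index of dispersion. For Poisson counts the variance equals the mean, so D = sum((x_i - mean)^2) / mean should be close to n - 1 for n yearly counts, and under the Poisson null it is approximately chi-square distributed with n - 1 degrees of freedom. Values well above n - 1 point to overdispersion (clustering or a changing rate). A minimal sketch; the counts in the example are hypothetical.

```python
def dispersion_index(counts: list[int]) -> tuple[float, int]:
    """Index of dispersion D and its degrees of freedom (n - 1).

    Under a homogeneous Poisson model, D is approximately chi-square with
    n - 1 degrees of freedom; D much larger than n - 1 suggests overdispersion.
    """
    n = len(counts)
    mean = sum(counts) / n
    if mean == 0:
        raise ValueError("all counts are zero; dispersion is undefined")
    d = sum((c - mean) ** 2 for c in counts) / mean
    return d, n - 1

# Hypothetical yearly counts, for illustration only.
print(dispersion_index([1, 3, 2, 0, 2, 4, 1, 2, 3, 2]))
```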

------
SoftwareMaven
This was a great article, but I'm not convinced mass shootings are completely
independent. Given the media attention, it is very likely that event n+1 was,
in some way, impacted by event n.

I don't have any evidence to back this up, so I could be (and I hope I am)
completely wrong. But if I'm right, then I think it would imply we need to
reduce the correlation between events. And it is likely the media that
provides that correlation. Instead of reporting about the <insert your word
here> that causes these events, report on the _impacts_ of these events. It
would, hopefully, stop deifying the perpetrator, which might reduce the
likelihood of other perpetrators doing the same.

~~~
jeremyjh
There should be some way to formalize this into a hypothesis, testable with
the same data. If you were correct, we should see more clustering than we
would with a random sample.

~~~
carbocation
Perhaps looking at the time between mass shootings would be more interesting
than simply looking at mass shootings per year.
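The two suggestions above combine naturally. If incidents follow a homogeneous Poisson process, the gaps between them are exponentially distributed, and an exponential distribution has a coefficient of variation (standard deviation divided by mean) of 1. Clustering pushes the coefficient above 1; unusually regular spacing pushes it below. This is a rough sketch that assumes a list of incident dates; it gives a descriptive number, not a formal test, and the example dates are made up.

```python
import statistics
from datetime import date, timedelta

def gap_cv(dates: list[date]) -> float:
    """Coefficient of variation of the gaps (in days) between sorted incident dates.

    Roughly 1 for a homogeneous Poisson process, above 1 for clustered events.
    Needs at least three distinct dates so there are two positive gaps.
    """
    ds = sorted(dates)
    gaps = [(b - a).days for a, b in zip(ds, ds[1:])]
    return statistics.stdev(gaps) / statistics.mean(gaps)

start = date(2000, 1, 1)
clustered = [start + timedelta(days=d) for d in (0, 1, 2, 300, 301, 302)]  # made-up dates
print(gap_cv(clustered))
```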

~~~
vsbuffalo
You'd get weird mixture-model-type systems. There are certain subclasses:
school-based vs. non-school-based. You'd have to control for this, since
school-based shootings are highly temporally correlated. But then maybe there's
crossover too. Like everything, estimating these things with more complex
models is hard to do right, and findings are usually overstated.

------
lostlogin
This hits HN as another shooting covers front pages. A new dark twist for
extra newsworthiness.
[http://www.nytimes.com/2012/12/25/nyregion/2-firefighters-ki...](http://www.nytimes.com/2012/12/25/nyregion/2-firefighters-killed-in-western-new-york.html?hp&_r=0)

Edit: it is unclear at this stage whether this event would qualify for
inclusion in the numbers according to the criteria in the article. Below are
the criteria used.

The killings were carried out by a lone shooter. (Except in the case of the
Columbine massacre and the Westside Middle School killings, both of which
involved two shooters.)

The shootings happened during a single incident and in a public place.
(Public, except in the case of a party in Crandon, Wisconsin, and another in
Seattle.) Crimes primarily related to armed robbery or gang activity are not
included.

The shooter took the lives of at least four people. An FBI crime
classification report identifies an individual as a mass murderer—as opposed
to a spree killer or a serial killer—if he kills four or more people in a
single incident (not including himself), and typically in a single location.

If the shooter died or was hurt from injuries sustained during the incident,
he is included in the total victim count. (But we have excluded cases in which
there were three fatalities and the shooter also died, per the previous
criterion.)

We included six so-called "spree killings"—prominent cases that fit closely
with our above criteria for mass murder, but in which the killings occurred in
multiple locations over a short period of time.

------
phaselock
The author does a horrible job. He assumes that the reader doesn't know
statistics, and when he gets to the actual meat he just states as fact "I
calculate k=32.5 m/j" and "what does this mean? It means I'm right". At a
minimum he should either assume completely that the reader knows everything
necessary to interpret the hypothesis test, or actually make an effort to
explain to a layman what the p-value means.

------
archgoon
Kudos to the author for attacking the question this way, it's always important
to look at data and try to model it. (and it's a lot of work!)

Unfortunately, I'm pretty sure there's a fatal flaw in the analysis.

Suppose we had two types of years, both following a Poisson distribution, but
one with a higher incidence and the other with a lower incidence, and they
alternated[1].

Now sample each type of year separately. Each will give you a Poisson
distribution. To get the distribution of all years you'd just add bins of the
two types together.

The sum of two Poisson distributions is a Poisson distribution (with a
different mean), therefore, we cannot conclude anything about how the mean
value is changing with time (and thus, answering the question: is the
incidence rate rising) unless we actually bin the data by year, and see if the
mean value is changing with time.

Unfortunately, by splitting time up, we reduce the already small sample sizes
for each bracket. Time series analysis is a more appropriate tool to tackle
this question.

[1] The argument is unchanged if the rate is continuously increasing; I chose
this setup so that you can think of collecting the same data over multiple years.
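The recommendation to look at how the rate changes over time has a classical one-number version: the Laplace test for trend in a Poisson process. Given N event times t_i observed over a window of length T, a constant rate makes the times look uniform on [0, T], so U = (mean(t_i) - T/2) / (T * sqrt(1 / (12N))) is approximately standard normal. A large positive U suggests a rising rate, a large negative U a falling one. A minimal sketch, assuming event times are measured in years from the start of the observation window; the example times are hypothetical.

```python
import math

def laplace_trend(times: list[float], window: float) -> float:
    """Laplace trend statistic U for event times in [0, window].

    Approximately N(0, 1) if events come from a constant-rate Poisson process;
    positive values suggest the rate is increasing over time.
    """
    n = len(times)
    if n == 0:
        raise ValueError("need at least one event")
    mean_time = sum(times) / n
    return (mean_time - window / 2) / (window * math.sqrt(1 / (12 * n)))

# Hypothetical event times, in years since the start of a 30-year window.
events = [1.2, 4.5, 7.9, 11.0, 15.3, 18.8, 21.4, 24.0, 26.6, 28.1, 29.5]
print(laplace_trend(events, 30.0))
```

If only yearly counts are available, placing each event at mid-year is a rough approximation.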

------
omershapira

      "If mass shootings are really occurring at random, then this suggests that they are extreme, unpredictable events, and are not the most relevant measure of the overall harm caused by gun violence."
    

In my undergrad years I would get crucified for arguing that a computational
model works and concluding that it is therefore the way reality works.

Seriously though, can anyone reading this fathom a way to formalize the event
"Guns are available"? What's the probability of getting a gun if it's not
available in shops? You need to model a different country for that.

The same reasoning can be used to show that the probability of a mass shooting
is lower given a lack of available guns than given their availability. In other
words, this: <http://www.youtube.com/watch?v=KsN0FCXw914>.

And if they are indeed rare events, what's the metric for that? If we have a
chance of stopping a rare, unexpected event that would add quite a lot to the
expected value of deaths per year, would it not matter significantly in our
efforts to stop it?

~~~
jlgreco
I don't think the author is attempting to imply anything other than _"Assuming
mass shootings are random unrelated events, 7 mass shootings in 1 year does
not necessarily signal a significant change from the past 30 years. (Though if
it happens again, it does.)"_

~~~
omershapira
I thought so too until I read the last 3 paragraphs. "Those numbers check out"
is totally valid. But concluding with a claim about the relevance of the measure
of gun harm is an observation about reality, not statistics; it is debatable,
and it has no relevance to what was previously mentioned.

We haven't yet defined the goals, unless the goal is the "minimum number of
gun-related deaths" (clearly, it isn't that simple in this country), in which
case his highlighted claim is definitely false: we don't know much about
effective metrics. Are the psychological effects of a mass shooting similar to
a snowball effect? Maybe in the long run? There's a lot more to research
before saying 'insignificant', especially if you don't describe the model.

One more thought: could you define 'mass shooting'? Usually the media is
responsible for that definition, whereas a random murder isn't called that.

~~~
jlgreco
The conclusion about the relevance of the measure of gun harm seems to me to be
the _"mass shootings in 1 year does not necessarily signal a significant change
from the past 30 years"_ part. It means that discussions along the lines of
_"Why the change?"_, or _"How can we go back to the way it was before?"_ are
premature, though it does not say anything at all about discussions along the
lines of _"How do we reduce mass shootings?"_.

Mother Jones probably did a pretty decent job of defining 'mass shooting', I
assume they erred on the side of inclusion (I only recall 2 this year). The
author's analysis seems to make it pretty clear it is only considering that
definition.

I'm really not reading this as an attack on discussions of gun violence in
general, as it seems you are.

------
chrisringrose
Beware: the whole point of this article (and train of thought) is to cast a
shadow of doubt over gun control. It's to remind us how these are random and
extreme events. Yes, mass shootings are random and extreme. A layperson can
conclude this without having to look at any statistics.

But unfortunately this distracts from the real problem. Machine guns. And how
they're legal, and easy, for anyone to buy. And ammo. And modifications for
guns.

When the Constitution was written to allow citizens the right to bear arms,
the only arms in the legislators' wildest dreams took a minute to load one
bullet, likely would miss, and probably wouldn't kill with one shot. Since the
world is very different today, shouldn't the laws change too? Look at the
success of gun control in the United Kingdom.

Enough of this nonsense. It reminds me of climate change deniers pointing out
"Yeah but it's snowing now."

------
dizzystar
I think a better methodology would be using linear time clusters and measuring
the possible impact of the Werther Effect.

Look at the regional areas where the news may not have spread out.

Look at the attempts that were stopped and see if they clustered. The stopped
attempts are probably a better indicator as people, after hearing the news,
are naturally more alert to the indicators that something may happen.

The other problem with the data is that there is no way, from looking at the
graph, to tell whether an event happening in December affected an event
happening in January, so delimiting by years is arbitrary.

------
joe_the_user
The question probably requires a more detailed treatment.

For example, even if the distribution of killings by frequency looked Poisson,
if the number were uniformly increasing year by year, there would still be a trend.

------
waldrews
Statistician here. Saying nothing about the substance of the argument or
whether their data source appropriately classifies what counts as this kind of
crime, the methodology is on the right track but not optimal.

Of course binning the events by year throws away data about specific timing.
Having actual event times would allow fitting a hazard rate model.

There are two simple alternative hypotheses - explanations for a deviation
from the Poisson assumption. What they're calling "random" is really "occurring
at a homogeneous rate;" so we have to ask - as opposed to what?

1) clustering/overdispersion, because the events happen more often alongside
each other (copycat effect or whatever; risk of a new event is a function of
time since last event)

2) secular trend (the rate of events is changing over time)

We can't really distinguish between these two without explicitly modeling, and
it doesn't look like we have enough data to do that. The tool to do it would
be a generalized linear model with an overdispersed Poisson dependent
variable.

It's kind of bad form to estimate the Poisson mean from the data, and then use
that to fit the distribution you're testing against. You're using the data
twice, so the p-value isn't what you think it is. You should be conditioning
on the total number of events in the sample.

Also, chi-square distribution comparison is for large samples. This is a meh-
kinda-borderline-midsize sample, to use a technical term.

The test they want that has neither of these problems is Fisher's exact test
for equal proportions. If the events were generated by a Poisson process, the
number in each year would be conditionally binomial, and they'd add up to the
total observed event count. The test is: is the data explained by all the
years getting events randomly at the same rate (null hypothesis) vs. each year
having its own rate (alt hypothesis).
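The conditional test described above can be approximated by simulation. This is a sketch that assumes yearly counts are available: hold the observed total N fixed, scatter N events uniformly at random across the years (the null hypothesis of one common rate), and count how often the simulated counts are at least as uneven as the observed ones, measured by the dispersion statistic sum((x - mean)^2) / mean. It is a Monte Carlo version of an exact conditional test for equal proportions, swapped in for illustration rather than a literal implementation of Fisher's exact test; the function name and example counts are hypothetical.

```python
import random

def conditional_equal_rate_test(counts: list[int], n_sims: int = 10_000,
                                seed: int = 0) -> float:
    """Monte Carlo p-value for "every year has the same rate", conditioning on the total.

    Under the null, given the total N, the yearly counts are multinomial
    with equal cell probabilities.
    """
    n_years = len(counts)
    total = sum(counts)
    if total == 0:
        raise ValueError("need at least one event")
    mean = total / n_years

    def unevenness(xs: list[int]) -> float:
        return sum((x - mean) ** 2 for x in xs) / mean

    observed = unevenness(counts)
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(n_sims):
        simulated = [0] * n_years
        for _ in range(total):                  # drop each event into a random year
            simulated[rng.randrange(n_years)] += 1
        if unevenness(simulated) >= observed - 1e-9:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value away from exactly zero.
    return (at_least_as_extreme + 1) / (n_sims + 1)

# Hypothetical yearly counts, for illustration only.
print(conditional_equal_rate_test([1, 3, 2, 0, 2, 4, 1, 2, 3, 2]))
```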

And finally, yes, you get a p-value. But you also need to think about the
power of the test to detect an anomaly if there was one (type II error). For
something like this, you could do power analysis by a simple simulation, but
you'd need to specify what kind of anomaly you'd want power to detect (e.g.
all events cluster together in one year, to pick an extreme).

Otherwise, all you have is a design that has a correct p-value. If you reject
the null hypothesis at 5%, but you have low power, you might as well toss a
coin and declare the data "nonrandom" 5% of the time. This might be as good
a test as we can get with the data available, but p-value isn't the only thing
you can look at.
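The power analysis mentioned above can also be sketched by simulation. The anomaly has to be chosen up front; here, as a purely hypothetical example, years alternate between a low and a high rate. The sketch estimates the 5% critical value of the dispersion statistic by simulating constant-rate years, then counts how often data generated under the anomaly exceed it. For simplicity it does not condition on the total, so it is an unconditional approximation rather than the exact conditional test; the rates, the 30-year span, and the function names are assumptions for illustration.

```python
import math
import random

def poisson_sample(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; fine for the small means used here."""
    threshold = math.exp(-lam)
    k = 0
    product = rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def dispersion(counts: list[int]) -> float:
    """sum((x - mean)^2) / mean, or 0.0 if every count is zero."""
    mean = sum(counts) / len(counts)
    return 0.0 if mean == 0 else sum((c - mean) ** 2 for c in counts) / mean

def estimate_power(lam_low: float, lam_high: float, n_years: int = 30,
                   n_sims: int = 2_000, alpha: float = 0.05, seed: int = 1) -> float:
    """Fraction of simulated alternating-rate data sets flagged by the dispersion test."""
    rng = random.Random(seed)
    lam_null = (lam_low + lam_high) / 2  # constant rate with the same average
    # Null distribution of the statistic, used to find the critical value.
    null_stats = sorted(
        dispersion([poisson_sample(lam_null, rng) for _ in range(n_years)])
        for _ in range(n_sims)
    )
    critical = null_stats[int((1 - alpha) * n_sims)]
    rejections = 0
    for _ in range(n_sims):
        counts = [poisson_sample(lam_low if year % 2 == 0 else lam_high, rng)
                  for year in range(n_years)]
        if dispersion(counts) > critical:
            rejections += 1
    return rejections / n_sims

print(estimate_power(1.5, 2.5))
```

A mild anomaly like 1.5 vs. 2.5 events per year is the kind of case where low power would show up, which is exactly the situation the comment warns about.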

------
jQueryIsAwesome
I have flagged this article; it is not only link bait using the subject
everyone is talking about, it is also extremely disingenuous in using
statistics about a concept as ambiguous as "mass shootings" to draw
conclusions about a real-world situation.

~~~
mitchi
I come here to gain knowledge about things related to science and computers.
I'm tired of all the "humanities".

~~~
vitalique
I'd say that if you are going to learn about or do any kind of serious
science, you should resist putting labels left and right just because you are
tired, on this site and elsewhere. There's nothing wrong with humanities in
terms of being able to benefit from applying CS to them.

