

Crime Over Time: Visualizing Crime Data in Chicago - glaugh
http://www.socrata.com/blog/crime-time-visualizing-crime-data-chicago/

======
andymboyle
Well, I'm not sure what the point of this was. My team at the Chicago Tribune
has been working on visualizing and explaining Chicago crime data for quite
some time. Here's one of our biggest projects, which pulls in (I think) the
same data Socrata used, as well as our own tracking of shootings, homicides
and suburban crime data:
[http://crime.chicagotribune.com](http://crime.chicagotribune.com)

How we built it: [http://blog.apps.chicagotribune.com/2013/02/28/the-chicago-c...](http://blog.apps.chicagotribune.com/2013/02/28/the-chicago-crime-site/)

More about the suburban crime data:
[http://blog.apps.chicagotribune.com/2014/01/31/displaying-cr...](http://blog.apps.chicagotribune.com/2014/01/31/displaying-crime-data-for-chicagos-suburbs/)

Hopefully one of my coworkers who's more knowledgeable than me will join in
here, but remember that crime data, such as what Socrata is displaying, is just
a snapshot of crimes that have been reported. A spike could mean those crimes
were simply enforced more heavily at that time, or that more people decided to
report them, etc. It is not a crime victimization survey.

------
dougmccune
It's always really important to understand what you're analyzing when dealing
with crime data. In this case, the timestamps show when crimes are reported or
when arrests are made. That's not the same thing as when the crime occurs. So the spikes
for burglary in the morning and at lunch hour are more likely because that's
when people notice the crime has occurred and call the cops, not when someone
is actually breaking in. And the prostitution trends are probably when the
cops are choosing to be out and about arresting, which might not at all be
correlated with when the most illegal activity is actually happening.

It's interesting that the author thinks the spikes on the first of the month
are bad data, but that the huge spike in sexual assault on Jan 1 of each year
is valid data. I wouldn't doubt that there are slightly more sexual assaults
on New Year's Eve, but on the order of 10x more than any other day of the
year? That doesn't seem right. I'd guess there are more nuances about how
sexual assaults are reported or logged that are at play here.
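
One rough way to test that guess is to check how many of the records dated the 1st are stamped at exactly midnight, which would hint at a placeholder date rather than a real time. This is only a sketch: the sample timestamps are made up, the function names are hypothetical, and nothing here is taken from the actual Chicago data set.

```python
from collections import Counter
from datetime import datetime


def day_of_month_counts(timestamps):
    """Count records per day of month (1..31)."""
    return Counter(ts.day for ts in timestamps)


def midnight_share(timestamps, day=1):
    """Fraction of records on the given day of month stamped exactly 00:00:00."""
    on_day = [ts for ts in timestamps if ts.day == day]
    if not on_day:
        return 0.0
    at_midnight = [
        ts for ts in on_day if (ts.hour, ts.minute, ts.second) == (0, 0, 0)
    ]
    return len(at_midnight) / len(on_day)


# Hypothetical sample, invented for illustration only.
sample = [
    datetime(2012, 1, 1, 0, 0),
    datetime(2012, 1, 1, 0, 0),
    datetime(2012, 1, 1, 0, 0),
    datetime(2012, 1, 1, 22, 15),
    datetime(2012, 1, 14, 3, 30),
    datetime(2012, 2, 9, 18, 45),
]

counts = day_of_month_counts(sample)
share = midnight_share(sample)
print(counts[1], share)  # 4 records on the 1st, 0.75 of them at midnight
```

A high midnight share wouldn't prove anything by itself, but it would be a reason to ask how the date was entered before trusting the spike.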

------
adrianh
I did (some of) this with chicagocrime.org in 2005... :-)

...and I've since learned the error of my ways. It's just too misleading to do
these types of overly simplistic data reports. The data set is flawed in many
ways, including:

* Data model. The Chicago crime data only has a single date/time field. For many crimes, such as break-ins, the victim/reporter isn't able to pinpoint an exact time; they might just provide a time _range_. That doesn't jibe well with the data model.

* Data mistakes. I dealt with public record databases extensively from 2005-2012 (Washington Post, chicagocrime.org, EveryBlock). Government data sets (like any data sets!) contain mistakes, which are compounded when you do aggregate queries.

* Systematic police department effort to reduce crime numbers through data trickery. See the amazing recent piece by Chicago Magazine: [http://www.chicagomag.com/Chicago-Magazine/June-2014/Chicago...](http://www.chicagomag.com/Chicago-Magazine/June-2014/Chicago-crime-statistics/)
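
To make the data-model point concrete, here's a minimal sketch of what a time-range record could look like and how one incident might be spread across the hours it could have happened in. The `Incident` class and its `earliest`/`latest` fields are invented for illustration; the Chicago data has only the single date/time field, so this is what the data would need, not what it offers.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical fields: the window in which the crime could have occurred.
    earliest: datetime
    latest: datetime


def hourly_weights(incident):
    """Spread one incident evenly over each clock hour its window touches."""
    hours = []
    cursor = incident.earliest.replace(minute=0, second=0, microsecond=0)
    while cursor <= incident.latest:
        hours.append(cursor.hour)
        cursor += timedelta(hours=1)
    weight = 1.0 / len(hours)
    totals = {}
    for hour in hours:
        totals[hour] = totals.get(hour, 0.0) + weight
    return totals


# A break-in discovered at 11:30 that could have happened any time after 08:00.
burglary = Incident(datetime(2014, 6, 2, 8, 0), datetime(2014, 6, 2, 11, 30))
print(hourly_weights(burglary))  # {8: 0.25, 9: 0.25, 10: 0.25, 11: 0.25}
```

Collapsing that window to a single timestamp forces all the weight onto one hour, which is where misleading hourly spikes can come from.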

All in all, obviously this post is harmless link bait at face value, but more
thought should be given to these issues. Open govt. data is a good thing, but
it's healthy to be skeptical.

(Note chicagocrime.org is no longer around, as I redirected it to my other
project, EveryBlock, several years ago:
[http://www.holovaty.com/writing/chicagocrime.org-tribute/](http://www.holovaty.com/writing/chicagocrime.org-tribute/))

------
minimaxir
The presentation of the data is extremely confusing, especially with the lack
of titles and axis names. Does each bin between each month represent a year
from 2001 to 2013? On some charts, like narcotics and criminal sexual
assaults, you switch to weekly bins with no indication at all.

~~~
incision
I was going to say the same thing.

I don't know how to say it without coming off overly negative, but those
graphs read like textbook examples of what not to do.

Thankfully, there's no shortage of good, free resources for learning these
things [1].

1: [http://datasciencemasters.org/](http://datasciencemasters.org/)

------
ejain
Socrata and Statwing are both very useful services, but without some domain
knowledge and a good understanding of how the data was collected, even the
best tools won't get you valid insights.

------
fiatmoney
What's with the spike in sexual assaults on the first of every month? Some
reporting artifact?

~~~
minimaxir
I checked the data myself and I believe it's a coding error on his end. I'm
not seeing a spike in December or January for Criminal Sexual Assaults. (It's
worth noting that using a month function on a null date usually returns 1,
i.e. January.)

Here's what I'm seeing with the data:
[https://www.icloud.com/iw/#numbers/BAJdMdQ8oUn9yYC-REmBpZlzD...](https://www.icloud.com/iw/#numbers/BAJdMdQ8oUn9yYC-REmBpZlzDDlXE7j3N3SF/Criminal_Sex_Chicago)
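
The null-date effect is easy to reproduce. This sketch assumes missing dates get coerced to epoch zero (1970-01-01) before the month is extracted; I don't know which tool the post used or whether it behaves this way, and the sample values and function names are made up.

```python
from collections import Counter
from datetime import datetime, timezone


def month_counts_naive(raw_epochs):
    """Buggy on purpose: a missing value is treated as 0, i.e. 1970-01-01."""
    return Counter(
        datetime.fromtimestamp(e or 0, tz=timezone.utc).month for e in raw_epochs
    )


def month_counts_checked(raw_epochs):
    """Drop missing values first and report how many were dropped."""
    valid = [e for e in raw_epochs if e is not None]
    missing = len(raw_epochs) - len(valid)
    counts = Counter(
        datetime.fromtimestamp(e, tz=timezone.utc).month for e in valid
    )
    return counts, missing


# Hypothetical sample: one June record, one July record, two with no date.
sample = [
    datetime(2012, 6, 15, tzinfo=timezone.utc).timestamp(),
    datetime(2012, 7, 4, tzinfo=timezone.utc).timestamp(),
    None,
    None,
]

naive = month_counts_naive(sample)
checked, missing = month_counts_checked(sample)
print(naive[1], checked[1], missing)  # 2 0 2
```

In the naive version the two undated records show up as January, which is the kind of phantom spike you'd see if nulls aren't filtered out before grouping.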

~~~
kbart
What's wrong with this link? It says Firefox (v29.0) is not supported, and if I
choose to ignore that, an error occurs.

