
Examining crime reports in San Francisco with SQL - speckz
https://www.periscopedata.com/blog/safety-in-san-francisco.html
======
dougmccune
Almost all the weird spikes in the SF crime data are data issues, not real
crime trends. If you're naive about it, you'll conclude that one spot in SOMA
along Bryant is the crime capital of SF, except that's just where the Hall of
Justice is, and tons of crimes get geolocated there. That doesn't mean they
actually happened there. The same goes for many of the police stations and
hospitals throughout the city. The top 5 locations in the data aren't
"real". The fact that Mondays, the 1st of every month, and Jan 1 also have the
highest counts doesn't strike me as legit either.

The article acknowledges some of this, noting that "It could also be an effect
of how this data is collected". That's my guess for every insight in the article.
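
A quick way to catch geocoding defaults like the Hall of Justice before trusting a "hotspot" is to rank locations by their share of all incidents and eyeball anything with an outsized share. This is a minimal sketch, assuming you've already loaded one address string per incident row; the function name, the threshold, and the toy data are hypothetical, and a flagged address is only a candidate for review, not proof of bad data.

```python
from collections import Counter

def flag_hotspot_artifacts(locations, share_threshold=0.01):
    """Return (location, count, share) tuples, largest count first, for every
    location whose share of all incidents is at least share_threshold.

    `locations` is assumed to be one address string per incident row.
    Flagged rows are candidates for manual review (courthouse, police
    station, hospital), not proof of a geocoding artifact.
    """
    counts = Counter(locations)
    total = sum(counts.values())
    # most_common() is empty for empty input, so we never divide by zero.
    return [
        (loc, n, n / total)
        for loc, n in counts.most_common()
        if n / total >= share_threshold
    ]

# Hypothetical toy data: one address accounts for 30% of all rows.
rows = ["800 Block of BRYANT ST"] * 30 + [f"addr {i}" for i in range(70)]
for loc, n, share in flag_hotspot_artifacts(rows, share_threshold=0.1):
    print(f"{loc}: {n} incidents ({share:.0%})")
```

Cross-checking the flagged addresses against a short list of known police stations, courthouses, and hospitals is what separates real clusters from places where reports simply get filed.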

~~~
minimaxir
I did a follow-up to my analysis of this dataset by plotting each arrest
location as a single point on a map
([http://minimaxir.com/2015/12/sf-arrest-maps/](http://minimaxir.com/2015/12/sf-arrest-maps/)).
The cluster locations there are meaningful, not randomly geolocated (all of
the Tenderloin and 16th St. Mission).

~~~
dougmccune
The major clusters in the TL and Mission are real, for sure. But as one
example, the Hall of Justice at 850 Bryant still sticks out in a lot of these.
You can see it in the by-crime-type version where you plot individual points:
it's the spot just south of the highway, before the Bay Bridge. In your
hex-binned map, it's one of the pink hexagons in every crime-type map.

So yeah, the bad-data effect is dampened somewhat when you aggregate the data
into larger areas, and it doesn't make the overall trend of TL and Mission
crime rates invalid or less obvious. But it's still visible in all your maps.

------
minimaxir
It's worth noting that this particular dataset _easily_ fits into memory. I
did a similar analysis 9 months ago on the same dataset (blog post:
[http://minimaxir.com/2015/12/sf-arrests/](http://minimaxir.com/2015/12/sf-arrests/),
Jupyter notebook: [https://github.com/minimaxir/sf-arrests-when-where/blob/mast...](https://github.com/minimaxir/sf-arrests-when-where/blob/master/crime_data_sf.ipynb)),
and it is 377.9 MB on disk and 180.9 MB in memory.

Using a SQL approach for that amount of data instead of manipulating it via
R/Python may be overkill and unnecessarily verbose (e.g., date/time
conversions and catching edge cases like Feb 29).
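
To make the Feb 29 point concrete: if you group incidents by day-of-year, every date after Feb 28 shifts by one in leap years, so Mar 1, 2015 and Feb 29, 2016 land in the same bucket. The sketch below contrasts that with grouping on (month, day). It assumes dates arrive as "MM/DD/YYYY" text, which you should check against your actual export; the helper names are hypothetical.

```python
from collections import Counter
from datetime import datetime

def parse_dates(raw, fmt="%m/%d/%Y"):
    # Assumed date format; adjust fmt to match the export you downloaded.
    return [datetime.strptime(s, fmt).date() for s in raw]

def counts_by_day_of_year(dates):
    # Naive: tm_yday is 60 for Mar 1 in 2015 but 61 in 2016 (leap year),
    # and Feb 29, 2016 is also 60, so unrelated dates collide.
    return Counter(d.timetuple().tm_yday for d in dates)

def counts_by_calendar_day(dates):
    # Keyed on (month, day): Mar 1 lines up across years, and Feb 29
    # gets its own bucket instead of colliding with Mar 1.
    return Counter((d.month, d.day) for d in dates)

dates = parse_dates(["03/01/2015", "03/01/2016", "02/29/2016"])
print(counts_by_day_of_year(dates))   # Counter({60: 2, 61: 1})
print(counts_by_calendar_day(dates))  # Counter({(3, 1): 2, (2, 29): 1})
```

When comparing "per day of year" counts, remember that Feb 29 only exists in some years, so its raw count isn't comparable to the other days without normalizing by the number of years it appears in.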

~~~
loukrazy
Alternatively, you could have your SQL cake and eat it too by using any SQL
engine that lets you define custom functions, such as SparkSQL.
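
The same idea works outside Spark. As a stand-in for SparkSQL (which I'm not showing here), this sketch registers a Python function with SQLite through the standard-library `sqlite3.Connection.create_function` and then calls it from a `GROUP BY`, so the fiddly date parsing lives in Python while the query stays in SQL. The table, the column names, the "MM/DD/YYYY" format, and the `day_of_month` helper are all hypothetical.

```python
import sqlite3
from datetime import datetime

def day_of_month(date_text):
    # Assumed "MM/DD/YYYY" text dates; returns NULL (None) for bad input.
    try:
        return datetime.strptime(date_text, "%m/%d/%Y").day
    except (TypeError, ValueError):
        return None

conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL as day_of_month(text), taking 1 argument.
conn.create_function("day_of_month", 1, day_of_month)

conn.execute("CREATE TABLE incidents (category TEXT, incident_date TEXT)")
conn.executemany(
    "INSERT INTO incidents VALUES (?, ?)",
    [("THEFT", "01/01/2016"), ("ASSAULT", "02/01/2016"), ("THEFT", "02/15/2016")],
)

# Count incidents per day of the month, e.g. to check the "1st of the month" spike.
rows = conn.execute(
    "SELECT day_of_month(incident_date) AS dom, COUNT(*) "
    "FROM incidents GROUP BY dom ORDER BY dom"
).fetchall()
print(rows)  # [(1, 2), (15, 1)]
```

In PySpark the analogous step is registering the function with `spark.udf.register` before referencing it in a SQL string; the shape of the query stays the same.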

------
NolMan
I believe you have reversed latitude and longitude in your first data figure.

The title should likely read "with Redshift" instead of "with SQL".
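
Swapped latitude and longitude are easy to catch automatically for a single-city dataset: San Francisco sits near latitude 37.7 and longitude -122.4, so the two orders can't both fall inside the city. This is a rough sketch; the bounding box below is an approximation I chose (not an official boundary), and the function names are hypothetical.

```python
# Approximate San Francisco bounding box (an assumption, not an official boundary).
SF_LAT_RANGE = (37.6, 37.9)
SF_LON_RANGE = (-122.6, -122.3)

def in_sf(lat, lon):
    """True if (lat, lon) falls inside the approximate SF bounding box."""
    return (SF_LAT_RANGE[0] <= lat <= SF_LAT_RANGE[1]
            and SF_LON_RANGE[0] <= lon <= SF_LON_RANGE[1])

def detect_axis_order(pairs):
    """Guess whether (a, b) pairs are ordered (lat, lon) or (lon, lat).

    Returns "lat_lon", "lon_lat", or "unknown" by majority vote over the
    pairs that land inside the SF box under each interpretation.
    """
    as_lat_lon = sum(in_sf(a, b) for a, b in pairs)
    as_lon_lat = sum(in_sf(b, a) for a, b in pairs)
    if as_lat_lon > as_lon_lat:
        return "lat_lon"
    if as_lon_lat > as_lat_lon:
        return "lon_lat"
    return "unknown"

print(detect_axis_order([(37.78, -122.41), (37.76, -122.42)]))  # lat_lon
print(detect_axis_order([(-122.41, 37.78)]))                    # lon_lat
```

A majority vote rather than an all-or-nothing check tolerates the occasional junk coordinate that falls outside the city under either ordering.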

------
danso
I used this data for a class lesson. Looking at the general numbers doesn't
yield many interesting insights, and, as minimaxir said, it's a dataset that
easily fits into memory.

I never figured out why prostitution dropped so much, and the SFPD never got
back to me. Another interesting trend: car theft is, IIRC, the one category
that shows a distinct rise in the past couple of years.

[http://www.padjo.org/2014-10-14/](http://www.padjo.org/2014-10-14/)

Edit: FWIW, the numbers in my post (from Fall 2014) are not reflected in the
latest dataset. That is, instead of 269 prostitution-related incidents in
2013, there are now 692. In 2014, there were 449, so it's still a drop.
It's interesting that older data gets so many updates/revisions 2+ years later
(other than the change of disposition, of course).

------
syassami
1st of the month == rent due soon, therefore go commit crime? Krayzie Bone may
have alluded to this
[http://www.azlyrics.com/lyrics/bonethugsnharmony/1stofthamon...](http://www.azlyrics.com/lyrics/bonethugsnharmony/1stofthamonth.html)

~~~
thesmallestcat
There is nothing in that song about committing crime to pay rent though. It's
about celebrating because the first and fifteenth are when people receive
welfare checks, which is why the refrain includes "cash your checks."

~~~
was_boring
Exactly. I'd even take it a step further and say it's about a cash injection
into the local economy, so everyone is out either working to collect others'
money or spending it themselves.

> The 1st be the day for the dopeman; Slangin' that cocaine fool, and I'm
> working late tonight; And all them fiends be lovin' them thugs; 'Cause I got
> them rocks for them pipes

------
jwcrux
Nice! I did something similar with the San Antonio Police Department open
data: [http://jordan-wright.com/blog/post/2016-05-06-exploring-sapd...](http://jordan-wright.com/blog/post/2016-05-06-exploring-sapd-call-data-with-elk/)

