
Machine Learning: Regression of 911 Calls - lukasz_km
http://machinelearningexp.com/machine-learning-regression-911-calls/
======
minimaxir
The model predicts the number of 911 calls from latitude and
longitude... naturally, one would expect that number to be strongly
correlated with the number of potential victims. ("The model score
(R2) is 0.81, so its accuracy is significant" is an incorrect interpretation
of R2; it indicates the proportion of variance explained by the model, which
is very different from an accuracy metric, and the high value could be
explained by that correlation.)
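To make the R2-vs-accuracy distinction concrete, here is a minimal sketch that computes R2 from its definition, 1 - SS_res / SS_tot (the same quantity scikit-learn's `sklearn.metrics.r2_score` reports). The call counts are made-up toy values, not data from the post. Note that the "close" predictions never hit a single count exactly, so an exact-match "accuracy" is 0, yet R2 is high.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total variation around the mean
    return 1.0 - ss_res / ss_tot

# Hypothetical call counts for five map cells and two sets of predictions.
calls = [10, 20, 30, 40, 50]
close = [12, 18, 33, 38, 49]      # tracks every cell fairly well, but never exactly
mean_only = [30, 30, 30, 30, 30]  # always predicts the mean count

exact_match = sum(t == p for t, p in zip(calls, close)) / len(calls)

print(r2_score(calls, close))      # 0.978
print(r2_score(calls, mean_only))  # 0.0
print(exact_match)                 # 0.0
```

A model that only ever predicts the mean scores exactly 0, which is why R2 reads as "share of variance explained" rather than "share of predictions that are right".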

I built a similar model which predicts the types of crimes in San Francisco
using LightGBM (better than xgboost, which is better than scikit-learn's
GBMs/GBTs), with lat/long, month, day-of-week, hour, and year as features
([http://minimaxir.com/2017/02/predicting-arrests/](http://minimaxir.com/2017/02/predicting-arrests/)).
The classification aspect is much trickier than a simple regression. But even
then, latitude and longitude constituted 70% of the Gain in the GBM model.
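For readers unfamiliar with "Gain" importance: in LightGBM's `importance_type="gain"` mode, a feature's importance is the total gain of all splits that use it. A minimal sketch of turning per-split gains into percentage shares, assuming a hypothetical list of (feature, gain) splits pooled over all trees (the numbers are invented for illustration; real values would come from the trained booster, e.g. `Booster.feature_importance(importance_type="gain")`):

```python
from collections import defaultdict

def gain_share(splits):
    """Sum split gains per feature and return each feature's share of the total gain."""
    totals = defaultdict(float)
    for feature, gain in splits:
        totals[feature] += gain
    grand_total = sum(totals.values())
    return {feature: gain / grand_total for feature, gain in totals.items()}

# Hypothetical splits: (feature used by the split, gain of that split).
splits = [
    ("latitude", 30.0), ("longitude", 20.0), ("hour", 15.0),
    ("latitude", 10.0), ("day_of_week", 10.0), ("month", 10.0), ("year", 5.0),
]

shares = gain_share(splits)
geo_share = shares["latitude"] + shares["longitude"]
print(f"lat/long share of total gain: {geo_share:.0%}")  # 60%
```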

(As an aside, day-of-week and hour should probably be encoded as categorical
variables using one-hot encoding, although when I tested that in my post, the
results were, oddly, unchanged.)
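A minimal sketch of what one-hot encoding day-of-week and hour would look like, in plain Python so it is self-contained (in pandas, `pd.get_dummies` does the same job). The category lists and the `encode_row` helper are illustrative assumptions, not code from either post. The point of the encoding is that it drops the implied ordering, so Sunday (6) is no longer "far" from Monday (0) and hour 23 is no longer "far" from hour 0.

```python
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def one_hot(value, categories):
    """Return a 0/1 indicator list with a single 1 at the value's category position."""
    return [1 if value == c else 0 for c in categories]

def encode_row(day, hour):
    # Hypothetical feature layout: 7 day-of-week indicators, then 24 hour-of-day indicators.
    return one_hot(day, DAYS) + one_hot(hour, range(24))

row = encode_row("Wed", 17)
print(len(row))  # 31
print(row[:7])   # [0, 0, 1, 0, 0, 0, 0]
```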

~~~
lukasz_km
Minimaxir, thanks for the comment. Regarding "accuracy", I agree that the word
is unfortunate in the case of R2; "goodness of fit" would be a better
description. Regarding "one-hot" encoding of inputs, I would say it depends.
I know one-hot encoding is common practice in machine learning, and there are
cases where it really helps the model converge (for example, neural networks
sometimes require that kind of input preparation). However, the model I used
in my post is decision-tree based, and if you think about it for a moment, a
decision-tree based model should handle numerical input just fine, since each
split simply compares a feature against a threshold. Decision-tree based
models usually do not even require standardization/normalization of input and
work correctly without it.
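The claim that trees don't need standardization can be checked on a toy case. A minimal sketch, assuming a single-feature regression stump (exhaustive search for the threshold minimizing squared error), which is a simplification of what real tree libraries do: because standardizing with a positive standard deviation preserves the ordering of values, the stump chooses the same partition of rows either way. The hours and call counts are hypothetical.

```python
import statistics

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_threshold(xs, ys):
    """Regression-stump split: threshold on xs minimizing total SSE of ys.

    Assumes the xs are distinct, so every neighbouring pair can be separated.
    """
    pairs = sorted(zip(xs, ys))
    best_cost, best_t = float("inf"), None
    for i in range(1, len(pairs)):
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_cost = cost
            best_t = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbours
    return best_t

def rows_left_of(xs, threshold):
    """Indices of rows that fall on the left side of the split."""
    return [i for i, x in enumerate(xs) if x <= threshold]

hours = [1, 3, 8, 12, 18, 22]
calls = [2, 3, 9, 10, 11, 4]  # hypothetical call counts per hour

mean, std = statistics.mean(hours), statistics.pstdev(hours)
standardized = [(h - mean) / std for h in hours]

raw_split = rows_left_of(hours, best_threshold(hours, calls))
std_split = rows_left_of(standardized, best_threshold(standardized, calls))
print(raw_split, std_split)  # [0, 1] [0, 1]
```

Only the threshold value changes (5.5 on raw hours, some negative number on the standardized scale); the rows sent left and right are identical.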

------
slavakurilyak
TL;DR

It's possible to predict the location and number of 911 calls from call
data (accident time, description of the emergency call, and geolocation data)
using regression analysis.

