
Using logistic regression to predict parking difficulty - cing
https://research.googleblog.com/2017/02/using-machine-learning-to-predict.html
======
nharada
I appreciate a good ol' logistic regression model. I know deep learning is hot
shit right now, but this right here is probably the best way to solve most
real world ML problems. Just good data, insightful features, and a simple
classifier.

~~~
et2o
The dirty secret in ML is that logistic regression, SVM, and random forests
often work better than deep learning on real problems.

~~~
akhilcacharya
For image classification, CNNs are still the way to go. But creating your own
architecture and training your own novel model don't seem necessary for most
problems anymore, thanks to transfer learning.

~~~
markovbling
I threw all of the above plus CNNs at the MNIST problem, and boosted decision
trees outperformed CNNs.

Granted, if I had tuned both perfectly, CNNs probably would have come out ahead,
but with defaults and a small amount of parameter search, boosting worked best.

~~~
akhilcacharya
That's pretty interesting. Why are boosted decision trees so effective? I've
heard the same meme applied to Kaggle competitions (everything is just a way
to shove data into xgboost, etc.).

~~~
romaniv
The grandparent post contains a hint of something I've already heard in
lectures and from ML practitioners: boosting and random forests are more
resilient to improper tuning (to put it another way, they are more universal
and work well "out of the box").

Which, BTW, makes them more appealing to me personally. In many real-life
cases an extra few percent of accuracy matters very little, but the ability to
just apply something to a problem without much fuss matters _a lot_.
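
For context on what "boosting" means in this subthread, here is a minimal sketch of AdaBoost with decision stumps in plain Python. It is one classic boosting algorithm, not xgboost and not whatever was run on MNIST above. The toy disc dataset and every function name are made up for illustration. It only shows that many weak one-threshold rules can add up to a boundary no single rule can draw. It reports training accuracy only, so it is not a benchmark and says nothing about tuning sensitivity.

```python
import math
import random


def stump_predict(stump, x):
    # A stump is (feature index, threshold, sign); it votes +sign above the threshold.
    feat, thresh, sign = stump
    return sign if x[feat] > thresh else -sign


def best_stump(X, y, w):
    # Exhaustive search for the stump with the lowest weighted error.
    best, best_err = None, float("inf")
    for feat in range(len(X[0])):
        for thresh in sorted({x[feat] for x in X}):
            err_pos = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict((feat, thresh, 1), xi) != yi)
            # Flipping the sign flips every vote, and the weights sum to 1.
            for sign, err in ((1, err_pos), (-1, 1.0 - err_pos)):
                if err < best_err:
                    best, best_err = (feat, thresh, sign), err
    return best, best_err


def adaboost(X, y, rounds=20):
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight the points this stump got wrong, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    # Weighted vote of all stumps.
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score >= 0 else -1


def accuracy(ensemble, X, y):
    return sum(predict(ensemble, xi) == yi for xi, yi in zip(X, y)) / len(X)


def run_demo(seed=0, n=150):
    rng = random.Random(seed)
    X = [(rng.random(), rng.random()) for _ in range(n)]
    # Toy labels: +1 inside a disc, -1 outside. No single threshold separates them.
    y = [1 if (a - 0.5) ** 2 + (b - 0.5) ** 2 < 0.1 else -1 for a, b in X]
    single = adaboost(X, y, rounds=1)
    boosted = adaboost(X, y, rounds=20)
    return accuracy(single, X, y), accuracy(boosted, X, y)


if __name__ == "__main__":
    single_acc, boosted_acc = run_demo()
    print("one stump:", single_acc, "20 boosted stumps:", boosted_acc)
```

The only knob here is the number of rounds, which hints at why these methods feel low-fuss, though real libraries expose many more settings.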

------
minimaxir
You can't write a blog post on how to build a statistical model _without
stating how good the model is_ in actuality, along with validating the other
regression details such as independence of features, train/test split, etc. (I
am coincidentally working on a blog post along a similar San Francisco dataset
which specifically addresses these concerns, so it's on the mind)

Logistic regression in particular has many features which provide more
information about feature importance _or lack thereof_ and many metrics to
confirm model quality, and it is disappointing to see this post only do a
high-level overview. Yes, it may be a Google trade secret, but there has to be
give-and-take.
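
As a rough sketch of the kind of reporting meant here (not Google's model or data), the following fits a logistic regression on synthetic data, holds out a test split, and prints held-out accuracy, log loss, a majority-class baseline, and the fitted weights. Everything in it is an assumption for illustration: the data generator, the two features (one informative, one pure noise), and the plain-Python gradient descent used in place of a real library.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_rows(n, rng):
    # Synthetic data, invented for illustration: x0 drives the label, x1 is noise.
    rows = []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        y = 1 if rng.random() < sigmoid(2.0 * x[0] - 0.5) else 0
        rows.append((x, y))
    return rows


def fit(rows, epochs=300, lr=0.5):
    # Plain batch gradient descent on the mean log loss.
    w, b = [0.0, 0.0], 0.0
    n = len(rows)
    for _ in range(epochs):
        gw, gb = [0.0, 0.0], 0.0
        for x, y in rows:
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            gw[0] += err * x[0]
            gw[1] += err * x[1]
            gb += err
        w = [w[0] - lr * gw[0] / n, w[1] - lr * gw[1] / n]
        b -= lr * gb / n
    return w, b


def evaluate(rows, w, b):
    # Returns (accuracy, mean log loss) on the given rows.
    correct, loss = 0, 0.0
    for x, y in rows:
        p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
        p = min(max(p, 1e-12), 1 - 1e-12)  # keep log() finite
        correct += int((p >= 0.5) == (y == 1))
        loss -= math.log(p) if y == 1 else math.log(1 - p)
    return correct / len(rows), loss / len(rows)


def run_demo(seed=0):
    rng = random.Random(seed)
    rows = make_rows(2000, rng)
    # Rows are generated independently, so a plain slice is a fair split here.
    train, test = rows[:1500], rows[1500:]
    w, b = fit(train)
    acc, loss = evaluate(test, w, b)
    positives = sum(y for _, y in test)
    baseline = max(positives, len(test) - positives) / len(test)
    return {"weights": w, "bias": b, "test_accuracy": acc,
            "test_log_loss": loss, "majority_baseline": baseline}


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(name, value)
```

The weight on the noise feature should land near zero, which is the "or lack thereof" part: the coefficients themselves tell you which inputs the model is leaning on.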

~~~
vmarsy
> You can't write a blog post on how to build a statistical model without
> stating how good the model is in actuality

> it is disappointing to see this post only do a high-level overview. Yes, it
> may be a Google trade secret, but there has to be give-and-take.

Why exactly? This is not an academic paper. People who get this feature to
show up might be curious about how it works, and 99.9% of them won't
understand anything about independence of features, train/test split, etc.
Worse, they would find the article too boring and technical. Just knowing that
it is powered by an ML algorithm (and not some human input) is enough. I'm not
sure why there has to be a give-and-take.

The fact that they put links to wikipedia for what Logistic regression is
should give a good idea of the intended audience of this blog post.

~~~
minimaxir
> Worse, they would find the article too boring and technical.

> The fact that they put links to wikipedia for what Logistic regression is
> should give a good idea of the intended audience of this blog post.

The Wikipedia page on logistic regression is an order of magnitude more
technical than this blog post.

------
ams6110
> we were able to ... utilize anonymous aggregated information from users who
> opt to share their location data

That should read, "from users who did not disable the on-by-default sharing of
their location data"

~~~
halflings
Is it actually on by default? Last time I reset my Android phone, I had to
agree to sharing my location history etc.

------
bahro
"When we started the training process, many of us thought that the
“fingerprint” feature described above would be the “silver bullet” that would
crack the problem for us. We were surprised to note that this wasn’t the case
at all — in fact, it was features based on the dispersion of parking locations
that turned out to be one of the most powerful predictors of parking
difficulty."

I assume dispersion of parking locations is the distance from parking location
to destination? I would have liked to see more about what kinds of inputs they
used and how they cleaned them up to account for the confounding factors they
mention (public transit users, private parking.)

~~~
feral
> I assume dispersion of parking locations is the distance from parking
> location to destination?

I would guess it's the density of parking locations in a given area, rather
than distance to destination?
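
The post does not define "dispersion", so both readings in this subthread are guesses. A rough sketch of how they differ, assuming parked positions are already projected to planar metres (the coordinates, the destination at the origin, and the function names are all hypothetical):

```python
import math


def mean_distance_to(points, target):
    # Reading 1: average distance from where people parked to the destination.
    return sum(math.dist(p, target) for p in points) / len(points)


def centroid(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def rms_spread(points):
    # Reading 2: how scattered the parked positions are around their own centre
    # (root-mean-square distance to the centroid).
    c = centroid(points)
    return math.sqrt(sum(math.dist(p, c) ** 2 for p in points) / len(points))


destination = (0.0, 0.0)
# Everyone ends up in one garage about 300 m away: far, but tightly clustered.
garage = [(300, 0), (302, 0), (298, 0), (300, 2), (300, -2)]
# People circle and park wherever they can within about 100 m: close, but scattered.
street = [(100, 0), (-100, 0), (0, 100), (0, -100)]

for name, pts in (("garage", garage), ("street", street)):
    print(name,
          "mean distance:", round(mean_distance_to(pts, destination), 1),
          "spread:", round(rms_spread(pts), 1))
```

The two measures can disagree: the garage case has the larger distance but almost no spread, while the street case is the reverse. A density measure would behave roughly like the inverse of the spread.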

------
legulere
> in a pre-launch experiment, we saw a significant increase in clicks on the
> transit travel mode button, indicating that users with additional knowledge
> of parking difficulty were more likely to consider public transit rather
> than driving.

This shows pretty clearly that we shouldn't try to accommodate cars as much as
possible when there already is good public transport at a certain location.

~~~
oftenwrong
We should also remember that excessive accommodation of cars will often
prevent the possibility of good mass transit.

------
KirinDave
I was just retaking CS261 on Coursera alongside a friend (we're in week 3) and
they were asking, "What good is this anyways?"

Related techniques and how to implement them are covered in the first 2 weeks.
While a lot more is going on in this system, one could call the core of the
system that does this estimation "simple" for the field.

------
axplusb
I am wondering how much of this gets to be real-time. Are they computing the
difficulty of finding a spot based on Maps/Waze users' live data or using
daily/weekly patterns on past data?

------
jeffreygoesto
If you go to Stein's you better walk or take public transport, though. ;-)

------
gcatalfamo
Can we use an API to help test parking prediction by using it in our apps?

