
Wide and Deep Learning: Better Together with TensorFlow - hurrycane
https://research.googleblog.com/2016/06/wide-deep-learning-better-together-with.html
======
john-whelan
I'm curious: is it possible to train a model and then send that model to a
Raspberry Pi or something, to make something like this realistic to use? Then
take the results from the Raspberry Pi, send them elsewhere to be trained
into the model, and repeat?

~~~
joefkelley
Yes, this is very frequently done. Training is far more computationally
expensive than evaluation in most cases.

I don't know of any specific cases of this being done with a Raspberry Pi,
but many phone apps, for example, have this sort of architecture: train a
model on a powerful server/cluster, send the model to phones for use, have
the phones collect more training data that is sent back to the
server/cluster, and repeat.
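A minimal sketch of that split using TensorFlow checkpoints (the toy model
and file name are made up; a phone deployment would more likely ship a
frozen GraphDef):

    import tensorflow as tf

    # toy model: logistic regression on 4 features
    x = tf.placeholder(tf.float32, [None, 4], name="x")
    w = tf.Variable(tf.zeros([4, 1]))
    b = tf.Variable(tf.zeros([1]))
    y = tf.sigmoid(tf.matmul(x, w) + b, name="y")
    saver = tf.train.Saver()

    # on the server/cluster: train, then write a checkpoint to ship
    with tf.Session() as sess:
        sess.run(tf.initialize_all_variables())
        # ... run training ops on the collected data here ...
        saver.save(sess, "model.ckpt")

    # on the device (Pi, phone): restore the weights, evaluate only
    with tf.Session() as sess:
        saver.restore(sess, "model.ckpt")
        print(sess.run(y, feed_dict={x: [[0.1, 0.2, 0.3, 0.4]]}))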

------
twelfthnight
I don't know if I missed it, but I would have liked to see a
performance/accuracy comparison between wide+deep learning and a simple
ensemble of separate wide and deep models. The advantage of having 2
separate models is that you could use just one or the other if something
went wrong, or if you needed to make a faster prediction (i.e. when the
escalator breaks, you still get stairs).
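As a hypothetical sketch of that fallback (wide_model and deep_model stand
in for any two independently trained models with a predict_proba-style
method):

    def ensemble_predict(x, wide_model, deep_model, w=0.5):
        # blend the two independently trained models
        return (w * wide_model.predict_proba(x)
                + (1 - w) * deep_model.predict_proba(x))

    def degraded_predict(x, wide_model):
        # "stairs" mode: fall back to the cheap wide model alone
        # when the deep model is broken or too slow
        return wide_model.predict_proba(x)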

~~~
plusepsilon
In many cases the simple ensemble is more fragile because you have to keep
track of multiple things at once. The winning solution for the Netflix Prize
competition was an intractable ensemble that never got used in production.
Also, when you update your models you'll have to tune them individually and
then manually re-tune the ensemble weights.

Another advantage of joint learning (which the authors mention) is that the
individual parts need not be as big as they would have to be if trained
independently, since they complement each other. The joint model as a whole
will surely be bigger than each of the individual parts, though.
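For contrast, the joint setup announced in the post trains both halves
against a single objective inside one estimator, so there are no separate
ensemble weights to tune. A sketch along the lines of the TensorFlow
tutorial (the feature columns are illustrative and train_input_fn is
assumed to be defined elsewhere):

    import tensorflow as tf

    gender = tf.contrib.layers.sparse_column_with_keys(
        "gender", keys=["female", "male"])
    age = tf.contrib.layers.real_valued_column("age")

    m = tf.contrib.learn.DNNLinearCombinedClassifier(
        linear_feature_columns=[gender],    # wide half
        dnn_feature_columns=[               # deep half
            tf.contrib.layers.embedding_column(gender, dimension=8),
            age],
        dnn_hidden_units=[100, 50])
    m.fit(input_fn=train_input_fn, steps=200)  # one loss, one fit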

~~~
argonaut
They (reasonably) claim the joint model doesn't have to be as big, but, for
example, it would be interesting to see an ensemble of 2 models: a wide
model the same size as the wide half of the joint model, and a deep model
the same size as the deep half of the joint model.

------
ninjin
I am about to catch a flight, so I am unable to do anything better than skim
the post and paper. But isn't this just good old feature embeddings coupled
with learnt features, which have been around for several years now?

~~~
mattj
I think the change here is that they're learning the embeddings alongside
the feature weights (i.e. they're part of the same loss function).
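A rough sketch of what sharing the loss function means (all shapes and
sizes are made up): the embedding table and the linear weights get their
gradients from one and the same cross-entropy over the summed logits.

    import tensorflow as tf

    ids = tf.placeholder(tf.int32, [None])            # categorical ids
    x_wide = tf.placeholder(tf.float32, [None, 100])  # crossed features
    labels = tf.placeholder(tf.float32, [None, 1])

    # wide half: plain linear weights
    w = tf.Variable(tf.zeros([100, 1]))
    linear_logits = tf.matmul(x_wide, w)

    # deep half: learnt embeddings feeding a small hidden layer
    emb_table = tf.Variable(tf.random_uniform([100, 8], -1.0, 1.0))
    w_h = tf.Variable(tf.random_uniform([8, 4], -1.0, 1.0))
    w_o = tf.Variable(tf.random_uniform([4, 1], -1.0, 1.0))
    emb = tf.nn.embedding_lookup(emb_table, ids)
    dnn_logits = tf.matmul(tf.nn.relu(tf.matmul(emb, w_h)), w_o)

    # one loss over the summed logits: embeddings and feature
    # weights are updated together by the same gradients
    logits = linear_logits + dnn_logits
    loss = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(logits, labels))
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)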

------
saltvedt
Can anyone give more examples of specific problems this could solve?

~~~
anantzoid
The corresponding paper linked in the blog post explains the recommendation
system behind Google App Store. The recommendations generated from this model
led to a significant increase in app downloads.

~~~
rockmeamedee
Nice.

It strikes me that the example in the blog post is just a general search
problem, e.g. Google search could use this: if you type "Brexit" you want a
general overview of a lot of different things, vs. when you type a specific
query and are looking for a specific page.

