
Running your models in production with TensorFlow Serving - hurrycane
http://googleresearch.blogspot.com/2016/02/running-your-models-in-production-with.html?m=1
======
Smerity
Model serving in production is a persistent pain point for many ML backends,
and is usually done quite poorly, so this is great to see.
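
For context, the status quo without something like this is rolling your own
export + RPC layer. The manual export half looks roughly like this (toy
model, made-up paths, plain TF save APIs rather than TF Serving's own
exporter):

    import tensorflow as tf

    # Tiny stand-in model: y = Wx + b.
    x = tf.placeholder(tf.float32, [None, 4], name="input")
    W = tf.Variable(tf.zeros([4, 2]), name="weights")
    b = tf.Variable(tf.zeros([2]), name="bias")
    y = tf.add(tf.matmul(x, W), b, name="output")

    saver = tf.train.Saver()
    with tf.Session() as sess:
        sess.run(tf.initialize_all_variables())
        # ... train ...
        saver.save(sess, "/tmp/model.ckpt")                      # weights
        tf.train.write_graph(sess.graph_def, "/tmp", "g.pbtxt")  # graph

Everything after that (versioning, hot-swapping models, batching requests)
is the part TF Serving is meant to take off your hands.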

I'm expecting leaps and bounds from TensorFlow itself, so this improvement
to the surrounding infrastructure is a nice surprise, just as TensorBoard
was one of the nicest "value-adds" of the original library[4].

Google has ensured that many high-quality people are active as
evangelists[3], helping build a strong community and answer base. While
there are still gaps between what the whitepaper[1] promises and what has
made it into the open source world[2], it's coming along steadily.

My largest interests continue to be single-machine performance (a profiler
for performance analysis + speedier RNN implementations) and multi-device /
distributed execution. Single-machine performance got a huge bump from v0.5
to v0.6 for CNNs, eliminating one of the pain points there, so they're on
their way.
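
Absent a real profiler, crude wall-clock timing of session.run is about all
you can do for single-machine numbers today - a sketch (arbitrary sizes):

    import time
    import tensorflow as tf

    a = tf.Variable(tf.random_normal([1024, 1024]))
    b = tf.Variable(tf.random_normal([1024, 1024]))
    c = tf.matmul(a, b)

    with tf.Session() as sess:
        sess.run(tf.initialize_all_variables())
        sess.run(c)  # warm-up run: excludes graph setup / device transfer
        start = time.time()
        for _ in range(100):
            sess.run(c)
        print("%.1f ms per matmul" % ((time.time() - start) * 10))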

I'd have expected this to lead to an integration with Google Compute Engine
(TensorFlow training / prediction as a service), except for the conspicuous
lack of GPU instances on GCE. While GPUs are usually essential for training
(and could theoretically be abstracted away behind a magical GCE TF layer),
there are still many situations in which you'd want access to the GPU
itself, particularly as performance can be unpredictable even across
similar hardware and model architectures.

[1]:
[http://download.tensorflow.org/paper/whitepaper2015.pdf](http://download.tensorflow.org/paper/whitepaper2015.pdf)

[2]: Extricating TensorFlow from "Google internal" must be a real
challenge, given that TF's distributed training interacts with various
internal infra tools for which the open source equivalents still have gaps.

[3]: Shout out to @mrry who seems to have his fingers permanently poised above
the keyboard -
[http://stackoverflow.com/users/3574081/mrry?tab=answers&sort...](http://stackoverflow.com/users/3574081/mrry?tab=answers&sort=newest)

[4]: I've been working on a dynamic memory network
([http://arxiv.org/abs/1506.07285](http://arxiv.org/abs/1506.07285))
implementation recently and it's just lovely to see a near perfect
visualization of the model architecture by default -
[http://imgur.com/a/PbIMI](http://imgur.com/a/PbIMI)
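
For anyone curious, getting that graph view is just a matter of logging the
graph def (logdir hypothetical; API names as of TF 0.6/0.7):

    import tensorflow as tf

    x = tf.placeholder(tf.float32, [None, 8], name="input")
    h = tf.nn.relu(tf.matmul(x, tf.Variable(tf.zeros([8, 8]))),
                   name="hidden")

    with tf.Session() as sess:
        # Writing the graph def is all TensorBoard needs for the graph
        # pane; then: tensorboard --logdir=/tmp/dmn_logs
        writer = tf.train.SummaryWriter("/tmp/dmn_logs", sess.graph_def)
        writer.close()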

~~~
mrry
Thanks for the shout out!

~~~
colah3
Derek is also extremely available to his colleagues at Google. He's always
friendly when I ask questions, and very thoughtful. I feel lucky to work with
him, however distantly! :)

------
dgacmu
Note also that we've released v0.7 of TensorFlow today - more details in the
release announcement:
[https://groups.google.com/a/tensorflow.org/forum/#!topic/dis...](https://groups.google.com/a/tensorflow.org/forum/#!topic/discuss/_IdeX4XCRqg)

------
TheGuyWhoCodes
This looks great and brings TensorFlow closer to production use, where the
model has a life cycle.

I wish they'd implement other well-known ML algos like trees and give Spark
ML some fight :)

~~~
barneso
There are plenty of alternatives to Spark ML out there; here is a survey of
RF implementations: [https://github.com/szilard/benchm-ml/tree/master/z-other-
too...](https://github.com/szilard/benchm-ml/tree/master/z-other-tools)

There is a whole other world of algorithms out there that aren't based on
stochastic gradient descent; IMO it's sensible for TensorFlow to stick to
one class of algorithms and do it well.

(Disclaimer: I work on mldb, one of the tools on that list).

~~~
TheGuyWhoCodes
mldb looks great, but I was referring to distributed, horizontally scaled
model building, which Spark ML does and TensorFlow says it does. If they
could implement a distributed gradient boosted tree across nodes, maybe
even with GPU support (although I'm not sure if it's applicable), that
could be huge.

~~~
barneso
Once the open source version of TensorFlow gets multi-node support, that
would be one way to make it work. There are potential gains from using a
GPU for RF training. As for distributing: in my experience, for small
models it doesn't make much difference, and for larger models the cost of
distributing the dataset dominates the benefit of having multiple nodes.
But an implementation carefully designed for a given node topology could be
made more performant.
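
To make the boosting part concrete: each tree fits the residuals of the
ensemble so far, which is exactly what makes it sequential and awkward to
distribute - only the per-tree fitting parallelizes. A toy single-node
sketch using scikit-learn (not a Spark ML or TensorFlow API):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def fit_gbt(X, y, n_trees=50, lr=0.1, depth=3):
        # Each tree fits the residual (negative gradient of squared
        # error) of the ensemble so far - inherently sequential.
        trees, pred = [], np.zeros(len(y))
        for _ in range(n_trees):
            tree = DecisionTreeRegressor(max_depth=depth).fit(X, y - pred)
            pred += lr * tree.predict(X)
            trees.append(tree)
        return trees

    def predict_gbt(trees, X, lr=0.1):
        return lr * sum(tree.predict(X) for tree in trees)

    # Smoke test on synthetic data.
    X = np.random.randn(200, 4)
    y = X[:, 0] ** 2 + 0.1 * np.random.randn(200)
    trees = fit_gbt(X, y)
    print("train MSE:", np.mean((predict_gbt(trees, X) - y) ** 2))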

------
swah
Off-topic: whenever I open a C++ project from Google, it's so tidy and
clean. It just feels like a work of craftsmanship, if that actually exists
in software:
[https://github.com/tensorflow/serving/tree/master/tensorflow...](https://github.com/tensorflow/serving/tree/master/tensorflow_serving/core)

OTOH, I have a strong prejudice against JavaScript on the backend... and
it's not due to it being dynamic - the same doesn't happen with Python
codebases. It is completely irrational.

~~~
pjmlp
I don't share that opinion after reading the NDK code: plain C compiled
with a C++ compiler, with pseudo-Hungarian notation.

------
curiousfiddler
I'm not sure if TensorFlow already provides this, but it would also be
pretty awesome to have access to some of Google's datasets to train the
models.

~~~
nl
Which data are you after? The ImageNet data is public _and_ they released the
pretrained model.

They've promised to release (or have already released) the models for
_Exploring the Limits of Language Modeling_ [1], which were trained on the
One Billion Word Benchmark corpus[2], also public data.

Note that for these, the trained models are often more immediately useful
than the raw data. The language model alone was trained for 3 weeks on 32
Tesla K40s; that's not something many can replicate casually.

[1]
[http://arxiv.org/pdf/1602.02410v2.pdf](http://arxiv.org/pdf/1602.02410v2.pdf)

[2] [http://www.statmt.org/lm-benchmark/](http://www.statmt.org/lm-benchmark/)
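
For completeness, consuming such a release usually means rebuilding the
graph and restoring the checkpoint instead of training yourself; a minimal
sketch (variable names and paths are hypothetical, only standard
tf.train.Saver calls):

    import tensorflow as tf

    # Variables must match the names/shapes in the released checkpoint.
    embedding = tf.Variable(tf.zeros([10000, 128]), name="embedding")

    saver = tf.train.Saver()
    with tf.Session() as sess:
        saver.restore(sess, "/tmp/lm/model.ckpt")  # hypothetical path
        print(sess.run(tf.reduce_sum(embedding)))  # weights now loaded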

