
I wish the diagrams were bigger, they are hard to read and a bit blurry.

One of the interesting points that is often overlooked in ML is model deployment. They mention TensorFlow, which has a model export feature you can use as long as your client can run the TensorFlow runtime. But they don't seem to be using that, because they said they just exported the weights and are using Go, which would seem to imply some kind of agnostic export of raw weight values. The nice part of the TF export feature is that it can be used to recreate your architecture on the client. But they did mention Keras too, which lets you export your architecture in a more agnostic way, since it works on many platforms, such as Apple's new CoreML, which can run Keras models.
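An "agnostic export of raw weight values" could be as simple as the following numpy sketch. The file layout and layer names here are my own assumptions for illustration, not whatever format Uber actually used:

```python
import json
import numpy as np

def export_weights(weights, path):
    """Dump a dict of named weight arrays to framework-agnostic JSON."""
    serializable = {name: w.tolist() for name, w in weights.items()}
    with open(path, "w") as f:
        json.dump(serializable, f)

def load_weights(path):
    """Reload the arrays; any runtime (Go, Python, ...) can parse this."""
    with open(path) as f:
        raw = json.load(f)
    return {name: np.array(w) for name, w in raw.items()}
```

JSON keeps the export human-readable and trivially parseable from Go; a binary format would be more compact but less portable.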




Warning: I'm a vendor. Take everything I say with a grain of salt. I will try to sell you something.

One biased perspective I have here: infra is often a different team from data science. Data scientists don't always do the deploying; beyond "some sort of serving thing" they might not necessarily know what's being deployed. This is not true at every organization and there are exceptions, but it's typically true of most companies we sell to. There are usually ML platform teams that do the "real" deployment (especially at sizable scale).

Another characteristic of production is that it's "boring". "Production" is a mix of databases to track model accuracy over time, possibly microservices depending on how deployment is "done", standardized ways of giving feedback when a model is wrong, experiment tracking, and model maintenance, among other things.
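The "databases to track model accuracy over time" part really can be this boring; a hypothetical minimal sketch with stdlib sqlite3 (the schema is invented for illustration):

```python
import sqlite3

def init_db(conn):
    # One row per evaluation of a deployed model.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS model_metrics (
               model_id TEXT, recorded_at REAL, accuracy REAL)"""
    )

def log_accuracy(conn, model_id, recorded_at, accuracy):
    conn.execute(
        "INSERT INTO model_metrics VALUES (?, ?, ?)",
        (model_id, recorded_at, accuracy),
    )

def latest_accuracy(conn, model_id):
    """Most recent accuracy for a model, or None if never logged."""
    row = conn.execute(
        "SELECT accuracy FROM model_metrics WHERE model_id = ?"
        " ORDER BY recorded_at DESC LIMIT 1",
        (model_id,),
    ).fetchone()
    return row[0] if row else None
```

In practice this grows alerting, dashboards, and retention policies, but the core is just time-series rows keyed by model.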

A lot of these things are typically very specific to the company's infrastructure.

The "fun" and "shareable" part that people (especially ML people) latch on to is usually "what neural net did they use?"

The other thing to think about here: "production" isn't just "TF Serving/CoreML and you're done". There are typically security concerns, different data sources, etc. that are often involved as well and that might be specific to a company's infrastructure. There might also be different deployment mechanisms for each potential model deployment, e.g. mobile vs. cloud.

Grain-of-salt sales pitch here: we usually see the "deployment" side of things, where it's a completely different set of best practices that happen to overlap with data scientists' experiments. This includes latency timing, persisting data pipelines as JSON, GPU resource management, Kerberos auth for accessing data, managing databases and an associated schema for auditing a model in production (including data governance), connecting to an actual app/dashboard like the ELK stack, and so on.
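"Persisting data pipelines as JSON" roughly means serializing the preprocessing steps, not just the model, so serving applies the exact transforms that training did. A toy sketch (step names and format are mine, not any vendor's actual schema):

```python
import json

# Registry of named preprocessing steps; a real pipeline has many more
# (normalization, vectorization, categorical encoding, ...).
STEPS = {
    "lowercase": lambda s: s.lower(),
    "strip": lambda s: s.strip(),
}

def save_pipeline(step_names, path):
    """Persist the pipeline as an ordered list of step names."""
    with open(path, "w") as f:
        json.dump({"steps": step_names}, f)

def load_pipeline(path):
    """Rebuild a callable pipeline from the JSON spec."""
    with open(path) as f:
        spec = json.load(f)

    def run(value):
        for name in spec["steps"]:
            value = STEPS[name](value)
        return value

    return run
```

The point is that the serving side only needs the JSON spec plus the step registry, never the training code.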

TLDR: The deployment model would be its own blog post.


The Google paper Machine Learning: The High Interest Credit Card of Technical Debt [1] offers a semi-rigorous introduction to the topic of real-world ML model engineering/deployment considerations and best practices. (If anyone else knows of similar work I'd be grateful to hear about it.)

[1] https://research.google.com/pubs/pub43146.html


This is actually a great reference! Thanks for the link.


Looking forward to your model deployment blog post -- it's still a new pattern for most


What would you like to see? I can see what we can do. Typically "deployment" is an overloaded term.

Thanks for your interest!


There's TensorFlow Serving [0] and the SavedModel export format [1] to help with this.

[0] https://tensorflow.github.io/serving/ [1] https://github.com/tensorflow/tensorflow/blob/master/tensorf...


If you have to rely on model serialization schemes, you have a problem, because they express the model in terms of low-level operations.

You probably want to run experiments with multiple model variants, or tweak your model and fine-tune from deployed weights. To do that you need a way to recreate it from layer-level objects instead of the add/reshape operations TensorFlow and its kin store internally.
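Recreating a model from layer-level objects usually means a config-driven rebuild, loosely in the spirit of Keras's get_config()/from_config. A simplified sketch of my own (not the actual Keras API):

```python
import numpy as np

class Dense:
    """Toy layer: y = x @ W + b, rebuildable from a config dict."""

    def __init__(self, in_dim, out_dim):
        self.config = {"type": "Dense", "in_dim": in_dim, "out_dim": out_dim}
        self.W = np.zeros((in_dim, out_dim))  # placeholder until weights load
        self.b = np.zeros(out_dim)

    def __call__(self, x):
        return x @ self.W + self.b

# Maps config "type" back to a constructor.
LAYERS = {"Dense": lambda cfg: Dense(cfg["in_dim"], cfg["out_dim"])}

def rebuild(configs):
    """Recreate the architecture from layer-level configs; deployed
    weights can then be loaded into the fresh layers and fine-tuned."""
    return [LAYERS[cfg["type"]](cfg) for cfg in configs]
```

Because the config describes layers, not individual tensor ops, swapping a layer or fine-tuning a variant stays easy.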


Keras has a save_model() [including both weights and architecture] and a load_model() function.

The model has to be converted to a format for CoreML, which does not work with the Keras 2 API yet: https://pypi.python.org/pypi/coremltools


You can just use the weights in a Go matrix. There's some code at https://www.tensorflow.org/versions/master/install/install_g... but it wouldn't surprise me if they have their own implementation.


That example is essentially building (and ostensibly training) a model, albeit a trivial one, in Go. Typically you have a more complicated architecture, so your deployment has two parts:

* the weights
* the architecture
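In that two-part scheme, serving reduces to loading the weights into matrices and replaying the architecture in order. A numpy sketch of the idea (a Go version would look much the same with a matrix library; the export format here is hypothetical):

```python
import numpy as np

def forward(x, weights, architecture):
    """Replay a feed-forward net from exported weights.

    architecture: ordered list of layer names.
    weights: maps each name to a (W, b) pair.
    """
    for name in architecture:
        W, b = weights[name]
        x = np.maximum(x @ W + b, 0.0)  # dense layer + ReLU
    return x
```

No training code is needed at serving time, which is exactly why the raw-weights export is attractive for a Go service.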


Yes, the example is TF Hello World.

But the very first sentence on that page says: "These APIs are particularly well-suited to loading models created in Python and executing them within a Go application."

This is exactly what Uber is doing.



