
I wish the diagrams were bigger, they are hard to read and a bit blurry.

One of the interesting points that is often overlooked in ML is model deployment. They mention TensorFlow, which has a model export feature you can use as long as your client can run the TensorFlow runtime. But they don't seem to be using that, because they said they just exported the weights and are using Go, which would seem to imply some kind of agnostic export of raw weight values. The nice part of the TF export feature is that it can be used to recreate your architecture on the client. But they did mention Keras too, which lets you export your architecture in a more agnostic way, since it works on many platforms, such as Apple's new CoreML, which can run Keras models.
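An "agnostic export of raw weight values" could be as simple as the following numpy sketch. The file layout and layer names here are my own assumptions for illustration, not whatever format Uber actually used:

```python
import json
import numpy as np

def export_weights(weights, path):
    """Dump a dict of named weight arrays to framework-agnostic JSON."""
    serializable = {name: w.tolist() for name, w in weights.items()}
    with open(path, "w") as f:
        json.dump(serializable, f)

def load_weights(path):
    """Reload the arrays; any runtime (Go, Python, ...) can parse this."""
    with open(path) as f:
        raw = json.load(f)
    return {name: np.array(w) for name, w in raw.items()}
```

JSON keeps the export human-readable and trivially parseable from Go; a binary format would be more compact but less portable.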




Warning: I'm a vendor. Take everything I say with a grain of salt. I will try to sell you something.

One biased perspective I have here: infra is often a different team from data science. Data scientists don't always do the deploying; beyond "some sort of serving thing" they might not necessarily know what's being deployed. This is not true at every organization and there are exceptions, but it's typically true of most companies we sell to. There are usually ML platform teams that do the "real" deployment (especially at sizable scale).

Another characteristic of production is that it's "boring". "Production" is a mix of databases to track model accuracy over time, possibly microservices depending on how deployment is "done", standardized ways of giving feedback when a model is wrong, experiment tracking, and model maintenance, among other things.
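The "databases to track model accuracy over time" part really can be this boring; a hypothetical minimal sketch with stdlib sqlite3 (the schema is invented for illustration):

```python
import sqlite3

def init_db(conn):
    # One row per evaluation of a deployed model.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS model_metrics (
               model_id TEXT, recorded_at REAL, accuracy REAL)"""
    )

def log_accuracy(conn, model_id, recorded_at, accuracy):
    conn.execute(
        "INSERT INTO model_metrics VALUES (?, ?, ?)",
        (model_id, recorded_at, accuracy),
    )

def latest_accuracy(conn, model_id):
    """Most recent accuracy for a model, or None if never logged."""
    row = conn.execute(
        "SELECT accuracy FROM model_metrics WHERE model_id = ?"
        " ORDER BY recorded_at DESC LIMIT 1",
        (model_id,),
    ).fetchone()
    return row[0] if row else None
```

In practice this grows alerting, dashboards, and retention policies, but the core is just time-series rows keyed by model.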

A lot of these things are typically very specific to the company's infrastructure.

The "fun" and "shareable" part that people (especially ML people) latch on to is usually "what neural net did they use?"

The other thing to think about here: "production" isn't just "TF Serving/CoreML and you're done". There are typically security concerns, different data sources, etc. that are often involved as well and that might be specific to a company's infrastructure. There might also be different deployment mechanisms for each potential model deployment, e.g. mobile vs. cloud.

Grain-of-salt sales pitch here: we usually see the "deployment" side of things, where it's a completely different set of best practices that happen to overlap with data scientists' experiments. This includes latency timing, persisting data pipelines as JSON, GPU resource management, Kerberos auth for accessing data, managing databases and an associated schema for auditing a model in production (including data governance), connecting to an actual app/dashboard like the ELK stack, and so on.
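"Persisting data pipelines as JSON" roughly means serializing the preprocessing steps, not just the model, so serving applies the exact transforms that training did. A toy sketch (step names and format are mine, not any vendor's actual schema):

```python
import json

# Registry of named preprocessing steps; a real pipeline has many more
# (normalization, vectorization, categorical encoding, ...).
STEPS = {
    "lowercase": lambda s: s.lower(),
    "strip": lambda s: s.strip(),
}

def save_pipeline(step_names, path):
    """Persist the pipeline as an ordered list of step names."""
    with open(path, "w") as f:
        json.dump({"steps": step_names}, f)

def load_pipeline(path):
    """Rebuild a callable pipeline from the JSON spec."""
    with open(path) as f:
        spec = json.load(f)

    def run(value):
        for name in spec["steps"]:
            value = STEPS[name](value)
        return value

    return run
```

The point is that the serving side only needs the JSON spec plus the step registry, never the training code.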

TLDR: The deployment model would be its own blog post.


The Google paper Machine Learning: The High Interest Credit Card of Technical Debt [1] offers a semi-rigorous introduction to the topic of real-world ML model engineering/deployment considerations and best practices. (If anyone else knows of similar work I'd be grateful to hear about it.)

[1] https://research.google.com/pubs/pub43146.html


This is actually a great reference! Thanks for the link.


Looking forward to your model deployment blog post -- it's still a new pattern for most


What would you like to see? I can see what we can do. Typically "deployment" is an overloaded term.

Thanks for your interest!


There's TensorFlow Serving [0] and the SavedModel export format [1] to help with this.

[0] https://tensorflow.github.io/serving/ [1] https://github.com/tensorflow/tensorflow/blob/master/tensorf...


If you have to rely on model serialization schemes, you have a problem, because they express the model in terms of low-level operations.

You probably want to run experiments with multiple model variants, or tweak your model and fine-tune from deployed weights. To do that you need a way to recreate it from layer-level objects instead of the add/reshape operations TensorFlow and its kin store internally.
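Recreating a model from layer-level objects usually means a config-driven rebuild, loosely in the spirit of Keras's get_config()/from_config. A simplified sketch of my own (not the actual Keras API):

```python
import numpy as np

class Dense:
    """Toy layer: y = x @ W + b, rebuildable from a config dict."""

    def __init__(self, in_dim, out_dim):
        self.config = {"type": "Dense", "in_dim": in_dim, "out_dim": out_dim}
        self.W = np.zeros((in_dim, out_dim))  # placeholder until weights load
        self.b = np.zeros(out_dim)

    def __call__(self, x):
        return x @ self.W + self.b

# Maps config "type" back to a constructor.
LAYERS = {"Dense": lambda cfg: Dense(cfg["in_dim"], cfg["out_dim"])}

def rebuild(configs):
    """Recreate the architecture from layer-level configs; deployed
    weights can then be loaded into the fresh layers and fine-tuned."""
    return [LAYERS[cfg["type"]](cfg) for cfg in configs]
```

Because the config describes layers, not individual tensor ops, swapping a layer or fine-tuning a variant stays easy.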


Keras has a save_model() [including both weights and architecture] and a load_model() function.

The model has to be converted to a format for CoreML, which does not work with the Keras 2 API yet: https://pypi.python.org/pypi/coremltools


You can just use the weights in a Go matrix. There's some code at https://www.tensorflow.org/versions/master/install/install_g... but it wouldn't surprise me if they have their own implementation.


That example is essentially building (and ostensibly training) a model, albeit a trivial one, in Go. Typically you have a more complicated architecture, so your deployment has two parts:

* the weights
* the architecture
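In that two-part scheme, serving reduces to loading the weights into matrices and replaying the architecture in order. A numpy sketch of the idea (a Go version would look much the same with a matrix library; the export format here is hypothetical):

```python
import numpy as np

def forward(x, weights, architecture):
    """Replay a feed-forward net from exported weights.

    architecture: ordered list of layer names.
    weights: maps each name to a (W, b) pair.
    """
    for name in architecture:
        W, b = weights[name]
        x = np.maximum(x @ W + b, 0.0)  # dense layer + ReLU
    return x
```

No training code is needed at serving time, which is exactly why the raw-weights export is attractive for a Go service.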


Yes, the example is TF Hello World.

But the very first sentence on that page says: "These APIs are particularly well-suited to loading models created in Python and executing them within a Go application."

This is exactly what Uber is doing.



