Ask HN: What does your production machine learning pipeline look like?
Doing some design for an upcoming project and taking a survey.

I'll go first. Model training happened nightly on a Spark cluster. The output was an SVM model in PMML format. The model was instantiated on a cluster of compute servers running Openscoring. A thin Node web service wrapper used the Openscoring cluster to serve realtime client prediction requests. The dataset was in the hundreds of millions of examples, with hundreds of features. It handled thousands of requests per second, no problem.

Separating the training technology from the execution technology was nice, but the PMML format limits you to the kinds of models that both your trainer and your executor support. What are people doing who use the same tech for both? For something like TensorFlow, I assume you have to save the model as a binary from the train step and then send it off to the prediction cluster to be instantiated again for execution?






This is a bit of a shameless plug, since I'm one of the creators of the tool, but I'd recommend Pachyderm. It's built with ML pipelines as the primary use case. It version-controls your data and keeps track of the data provenance of your models. All your computations can be expressed as Docker containers, so you can use any tools you want. We have an example in our docs of how to use TensorFlow.

https://github.com/pachyderm/pachyderm

DeepDetect for both dev and prod, with a minimum of code in front of it. This setup definitely can't accommodate all modern ML needs, but the fast and secure model update from dev to prod is the easiest option for us. Disclaimer: I'm the DD author, so the bias is very high. Apologies for that; maybe my comment will still be useful to some.

We use scikit-learn to train the models every few weeks, when we get more labeled data. Once a model is trained, we use joblib to save the entire pipeline (normalization, feature processing, etc.). In production we have a thin REST wrapper that loads the model pipeline into memory and serves prediction requests. We scale the number of these servers based on the load.
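
For anyone who wants to see the shape of this, here is a rough sketch of the "save the whole pipeline as one artifact, load it once, serve from memory" idea. It is not our actual code. To keep it runnable with only the standard library, it swaps scikit-learn and joblib for a toy pipeline stored as JSON; with scikit-learn you would call joblib.dump(pipeline, path) after training and joblib.load(path) at server startup instead. The names Pipeline, save_pipeline, load_pipeline and handle_request are made up for illustration, and the request format is an assumption.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class Pipeline:
    """Toy stand-in for a fitted pipeline: scaling step plus linear classifier."""
    # Preprocessing parameters learned at training time.
    means: List[float]
    stds: List[float]
    # Linear model parameters.
    weights: List[float]
    bias: float

    def predict(self, row: List[float]) -> int:
        # Apply the same normalization that was used during training.
        scaled = [(x - m) / s for x, m, s in zip(row, self.means, self.stds)]
        score = sum(w * x for w, x in zip(self.weights, scaled)) + self.bias
        return 1 if score > 0 else 0


def save_pipeline(pipeline: Pipeline, path: str) -> None:
    # Training side: write preprocessing and model together as one artifact.
    with open(path, "w") as f:
        json.dump(asdict(pipeline), f)


def load_pipeline(path: str) -> Pipeline:
    # Serving side: call once at startup and keep the result in memory.
    with open(path) as f:
        return Pipeline(**json.load(f))


def handle_request(pipeline: Pipeline, body: str) -> str:
    # Body of a hypothetical REST endpoint: JSON in, JSON out.
    features = json.loads(body)["features"]
    return json.dumps({"prediction": pipeline.predict(features)})
```

The point of saving preprocessing and model together is that the server never has to reimplement the feature processing, so training and serving can't drift apart. Each server process calls load_pipeline once and then only runs handle_request per request, which is what makes scaling out by adding servers straightforward.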

