How to put machine learning models into production (stackoverflow.blog)
119 points by Aaronmacaron 5 days ago | 34 comments





Too many people focus on "properly" putting ML into production...

I'd like to propose an alternative... Build a model (once) on your dev machine. Copy it to S3. Do CPU inference in some microservice. Get the production system to query your microservice, and if it doesn't reply within some (very short) timeout, fall back to whatever behaviour your company was using before ML came along.
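
Concretely, the fallback pattern doesn't need to be more than something like this (a sketch - the endpoint, timeout and legacy heuristic are made up):

    import requests

    def legacy_score(customer_id):
        # Whatever the business was doing before ML came along
        return 0.5

    def get_score(customer_id):
        try:
            # Hypothetical internal endpoint serving the model copied from S3
            resp = requests.post(
                "http://ml-scoring.internal/predict",
                json={"customer_id": customer_id},
                timeout=0.1,  # very short timeout
            )
            resp.raise_for_status()
            return resp.json()["score"]
        except requests.RequestException:
            return legacy_score(customer_id)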

If the results of your ML can be saved (e.g. a per-customer score), save the output values for each customer and don't even run the ML in real time at all!

Don't handle retraining the model. Don't bother with high reliability or failover. Don't page anyone if it breaks.

By doing this, you get rid of 80% of the effort required to deploy an ML system, yet still get 80% of the gains. Sure, retraining the model hourly might be optimal, but for most businesses the gains simply don't pay for the complexity and ongoing maintenance.

Insider knowledge says some very big companies deploy the above strategy very successfully...


You only get 80% of the gains if your data distribution is stationary and/or you can construct a non-ML solution with nearly 80% of the ML one's performance. If a non-ML solution is that close to the ML one, then I would just stick with it - the advantages in terms of predictability and ease of maintenance would likely outweigh the small performance gains.

A model that can't be continuously trained inevitably rots due to data, interface or environment changes (and the code is typically very difficult to maintain across team members - if the author leaves it's often a ticking time bomb). If you're OK with the model rotting then it wasn't that important to your business to begin with. This is not true for all businesses.


Yeah, there was a long, multi-developer effort to build the "complete" solution, but I did what you described to satisfy impatient stakeholders. The fancy solution is still incomplete and mothballed, while the simple deployment has been running ever since.

The next value-add was consolidating data in a DB with a web UI so relevant non-devs can view it and help add to it, plus easily integrating different data sources with automatic validation. I wish there was a nice open source thing here you could spin up without much effort. There's a middle ground between a complex/paid full solution that can scale to huge data sets and integrate with turking etc., and emailing CSVs or sharing Google Sheets links around.


Agreed! It seems like there's a lot of power in having "one to beat". As in, let's get some model up before we worry about one that needs daily updates.

Overall, a well written article.

If you're interested in ML Ops, I have a shameless plug to share: on November 19th I host a free online panel, "Rage Against the Machine Learning", with industry experts. [0]

[0]: https://cotacapital.zoom.us/webinar/register/8116020076218/W...


Thanks for the share. Will check it out.

Registered! BTW, awesome panel title.

I'm so glad you like it! Sometimes you're not sure whether people will get from your words the sense that you had in mind :)

There's a great paper from Google about this, "Machine Learning: The High Interest Credit Card of Technical Debt" [0], which discusses why you should use a framework to deploy ML models (the authors are involved in developing TFX).

In my experience, explaining results to the business is also a very time-consuming element of deploying a model.

[0]: https://research.google/pubs/pub43146/


I was expecting this to be more about running inference in production, though the information in the article itself was interesting on its own.

There does seem to be a dearth of writing on the actual topic of deploying models as prediction APIs, however. I work on an open source ML deployment platform ( https://github.com/cortexlabs/cortex ) and the problems we spend the most time on - and that teams struggle with the most - don't seem to be written about very often, at least in depth (e.g. How do you optimize inference costs? When should you use batch vs realtime? How do you integrate retraining, validation, and deployment into a CI/CD pipeline for your ML service?).

Not taking anything away from the article, of course - it is well written and interesting imo.


There seems to be the idea that training an ML model is like compiling code - but every "compile" leaks information into the training pipeline. Repeated testing and choosing (unless it is on a fresh draw) is an optimization step: you are optimizing on the test set.

Using a fresh draw is difficult and expensive, especially since the labels may not be available. Using A/B tests is expensive; multi-armed bandits are more efficient, but again there is an optimisation element there (waits for shouting to start).

Additionally, surely there is a really significant qualitative judgement step for any model that is going to be used to make real-world decisions?


You don’t typically perform optimizations iteratively with feedback from the final test set. Instead you split your training set into validation and training, and you iterate on that, leaving your true hold out test set completely unexamined all along.

You would do model comparisons, quality checks, ablation studies, goodness of fit tests and so forth only using the training & validation portions.

Finally you test the chosen models (in their fully optimized states) on the test set. If performance is not sufficient to solve the problem, then you do not deploy that solution. If you want to continue work, now you must collect enough data to constitute at minimum a fully new test set.
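
A rough sketch of that discipline with sklearn (names and split sizes are illustrative):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, random_state=0)

    # One-time split: the test set is locked away until the very end
    X_trainval, X_test, y_trainval, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )

    # All iteration (model selection, ablations, tuning) happens on these
    X_train, X_val, y_train, y_val = train_test_split(
        X_trainval, y_trainval, test_size=0.25, random_state=0
    )

    # Only the final, chosen model ever sees X_test / y_test, exactly once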


I agree with the process you describe - but the traps are things like running a beauty contest (Kaggle ?) of n models against the final test set...

I've recently come across the MLOps community here https://mlops.community/.

The meetups are all on YouTube and have great topics like putting models into production, but also more interesting ones (to me) like ml observability and feature stores.

Their Slack channel is great too - I learned a lot about the reality of using kubeflow vs. the Medium article hype.


As a practical detail, I'm wondering if it always makes sense to wrap your predictor in a simple if-then based predictor. If your learned model makes bad predictions in certain specific cases, you can "cheat" with Boolean logic. This could also be useful when the business has a special case that doesn't follow the main patterns.
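
Something like this wrapper, I mean (all the rules and feature names below are made up):

    def predict_with_overrides(features, model):
        # Hand-written rules for cases where the learned model is known to be off
        if features["country"] == "XY":        # hypothetical special-case market
            return 0.0
        if features["account_age_days"] < 1:   # brand-new accounts get a fixed score
            return 0.5
        # Otherwise defer to the learned model (feature order must match training)
        return model.predict([list(features.values())])[0]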

Any thoughts on that?


The if-then predictor is called an expert system, and it is a kind of model. Combining multiple models is called an ensemble. See, that's how the idea can sound smart rather than like cheating. ;-)

On a more serious note - as always, it depends. On the logic, on the company's processes and skills, on how frequently the logic will change, on visibility into the decision making, ... It may be a good idea, or you may bring in a lot of manual work.


This is almost always a very bad idea because the if/then condition you describe is business logic - as in, how do you recognize the business situation when you want to decline a prediction, and how does that change?

It is very complex because most of the time there is no simple rule such as a threshold on the confidence score of the prediction. In practice it might be more like, “if the user has more than 7 items in their cart and if the user is not a returning customer that filled out personal data and the value of their cart is greater than $100 and they have not put a new item in the cart for 2 minutes, and the confidence score of the predictor is less than 0.4, THEN don’t show the next recommended item, just display a checkout link.”

And the number of items, the cart value limit, the time since last item-add, etc., will all be hotly debated by product management and changed 5 times every quarter.

In grad school there was a professor who said of machine learning that “parameters are the death of an algorithm” - so you want to avoid coupling extra business logic parameters tightly with the use of machine learning models.


I understand it may not be recommended in complex situations such as the one you described, but I think @steve_g's idea may be interesting for some scenarios.

I work in automatic train traffic planning, mainly for heavy-haul railways [1]. Recently, we've been working on a regression model to predict train sectional running times based on historical data.

As our tool is used during real-time operation, we can't risk the model outputting an infeasible value. So we're thinking about defining possible speed intervals, e.g. (0 km/h, 80 km/h] for ore trains, and falling back to a default value if the predicted running time causes the speed to fall outside this range.
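
Roughly this kind of guard (numbers and names below are illustrative, not our real values):

    def guarded_running_time(model, features, section_length_km,
                             v_max_kmh=80.0, default_minutes=45.0):
        # Fall back to a default if the prediction implies an infeasible speed
        predicted_minutes = float(model.predict([features])[0])
        if predicted_minutes <= 0:
            return default_minutes
        implied_speed_kmh = section_length_km / (predicted_minutes / 60.0)
        if implied_speed_kmh > v_max_kmh:
            return default_minutes
        return predicted_minutes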

[1] https://railmp.com/en/our-solution/


Might want to look at 'guarded' learning. This https://arxiv.org/abs/2006.03863?context=cs.SE might inspire you. In one team we managed to train with both simulation and data and 'force' the trained model to infer within the 'security' simulation.

I think there might be some better research on this... Not bothered to look it up much.


To me this sounds like a bad misunderstanding of regression and your use case. If you need a system capable of predicting within an interval, you should use something different, for example you could use Bayesian regression with a prior that only has support on your permitted domain.

You very much should be choosing models that actually map to the physical reality of your problem domain, not using something that is fundamentally unphysical for your use case and attempting to correct it with hand-made business logic.
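
One lightweight way to bake the constraint into the model itself (not the Bayesian route, just an illustration of the principle, with made-up numbers) is to regress on a transformed target so predictions can only land inside the physical interval:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    LO, HI = 0.0, 80.0  # permitted speed range in km/h (illustrative)
    EPS = 1e-3          # keep targets strictly inside the open interval

    def to_unbounded(v):
        # Map (LO, HI) onto the real line so ordinary regression can be used
        p = np.clip((v - LO) / (HI - LO), EPS, 1 - EPS)
        return np.log(p / (1 - p))

    def to_bounded(z):
        # Inverse transform: predictions are guaranteed to lie inside (LO, HI)
        return LO + (HI - LO) / (1 + np.exp(-z))

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))  # toy features
    y = np.clip(40 + 10 * X[:, 0] + rng.normal(scale=5, size=200), 1, 79)

    model = LinearRegression().fit(X, to_unbounded(y))
    y_pred = to_bounded(model.predict(X))  # always within (0, 80)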


The title doesn't really match the article in my mind. To me, it talks about everything but actually deploying a machine learning model in production. In particular, there are a lot of words around where training data is stored. In my experience, the training data is really more part of the development process than the actual productionisation of the model.

That said, there is a piece here on TFX, which is valuable in this context. I also think the advice about going with proprietary tools that speed up the process is good. Tools like Microsoft's AI tooling, Dataiku and H2O are good in that context.

I would have liked to have seen some discussion around when you should deploy a model as an API vs generating batch predictions and storing them - I've done both on a test bench, but I don't really know how well the API scales.


> To me, it talks about everything but actually deploying a machine learning model in production.

This seems to be a common theme of a lot of articles about 'how to put ML models into production'.


“Deploying a model” is sort of a nebulous concept. You probably have some kind of server, which loads a serialized model and runs data from requests or from batch files through the model to get predictions. Which part are we deploying? The service can be deployed like any other service. The model is probably a file. You deploy it by copying it to s3 I guess? You can go more in depth, and that’s what the article is about.

If you don't expect to need to tweak the model parameters very often, and it has a simple form (decision trees, linear regression) and forms a sub component of an existing application, another deployment option is just hard coding the model as a library or even an expression amidst the rest of the application code.
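
For a small linear model the "deployment" can literally be this (coefficients are placeholders standing in for values exported from the trained model):

    def churn_score(tenure_months, monthly_spend, support_tickets):
        # Placeholder coefficients copied out of the fitted model
        return (
            0.42
            - 0.031 * tenure_months
            + 0.0045 * monthly_spend
            + 0.12 * support_tickets
        )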

That’s a good option too. Sometimes the model consists of some training and inference code, which can be loaded as a library, plus a bunch of weights, which may be large enough to warrant a separate file. Either way, I think the deployment of the model isn’t really a hard problem. Validation of model quality, keeping track of what code was used, and what training data, making sure the updated data is where it’s supposed to be before training, tying all these parts together in some kind of comprehensive management system are all harder problems I think.

There are some subtleties to deployment. If you trained a model with, say, sklearn version X and the component serving requests deserializes it but is using sklearn version Y, then things can break badly.
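
One cheap guard is to store the training-time library version next to the serialized model and check it at load time (a sketch; file names and the toy model are arbitrary):

    import json, pickle
    import sklearn
    from sklearn.linear_model import LinearRegression

    # Training side
    model = LinearRegression().fit([[0.0], [1.0]], [0.0, 1.0])  # stand-in for the real model
    with open("model.pkl", "wb") as f:
        pickle.dump(model, f)
    with open("model_meta.json", "w") as f:
        json.dump({"sklearn_version": sklearn.__version__}, f)

    # Serving side
    with open("model_meta.json") as f:
        meta = json.load(f)
    if meta["sklearn_version"] != sklearn.__version__:
        raise RuntimeError("Model trained with sklearn %s but serving with %s"
                           % (meta["sklearn_version"], sklearn.__version__))
    with open("model.pkl", "rb") as f:
        model = pickle.load(f)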

Is anyone running TFX in production at their company? How has the experience been?

Since basically everyone is on K8s, I'm wondering if kubeflow isn't the more natural fit.


kubeflow is pretty horrendously bad, unfortunately. Most of the installation docs are incomplete and inaccurate, and since the workflow requires building a separate container for each submitted task (instead of separately specifying a version-control commit), you cannot actually get reproducible results. You'd have to scrape the state of the code out of the container tied to a job, since that container can have been created from any arbitrary, out-of-band changes a developer was working on, such as a branch they never pushed.

This workflow also doesn't work well in hybrid on-prem + cloud environments because, for example, your model training might run in a cloud Spark task while your CI pipeline (responsible for building and publishing a container to an on-prem container repo) runs on-prem. kubeflow has a hard requirement to put containers into cloud container registries, and makes assumptions about the networking situation allowing connections between on-prem and cloud container resources.

I think industry shifting focus to kubeflow is actually a giant mistake.


There are several other platforms/tools taking different approaches to productionizing ML workloads. At Polyaxon[1], we used to create a container for each task, log the git commit for reference, and provide ways to push the container to an in-cluster registry or a cloud registry. In the new version of our platform, we improved the process by injecting the specific git commit inside pre-built docker images, not only to reduce build time but also to allow easier integration with external tools such as GitHub Actions.

[1]: https://github.com/polyaxon/polyaxon


Hey, thanks for the comment. Could you talk about what you see as an alternative... but assuming that k8s (most likely some cloud-managed flavor like EKS, etc.) is what the native devops is based on?

What I'm seeing is that ML/data engineering is diverging from devops reality and going off and building its own orchestration layers, which is impractical for all but the largest orgs.

I've yet to find something that fits in with kubernetes, which is why it seems everyone here is using fully managed solutions like Sagemaker.


I do think managed solutions like Fargate & Sagemaker are good choices. Some providers, notably GCP, have no offerings that seriously match these (Cloud Run has too many limitations).

Kubernetes is very poor for workload orchestration for machine learning. It’s ok for simple RPC-like services in which each isolated pod just makes stateless calls to a prediction function and reports a result and a score.

But it’s very poor for stateful combinations of ML systems, like task queues or robustness in multi-container pod designs for cooperating services. And it is especially bad for task execution. Operating Airflow / Luigi on k8s is horrendous, which is why nearly every ML org I’ve seen ends up writing their own wrappers around native k8s Job and CronJob.

Kubeflow can be thought of like an attempt to do this in a single, standard manner, but the problem is that there are too many variables in play. No org can live with the different limitations or menu choices that kubeflow enforces, because they have to fit the system into their company’s unique observability framework or unique networking framework or unique RBAC / IAM policies, etc. etc.

I recommend leveraging a managed cloud solution that takes all that stuff out of the internal datacenter model of operations, moving it off of k8s, and only using systems you have end-to-end control over (e.g. do your own logging, do your own alerting, etc. through vendors and cloud - don't rely on SRE teams to give you a solution, because it almost surely will not work for machine learning workloads).

If you cannot do that because of organizational policy, then create your own operators and custom resources in k8s and write wrappers around your main workload patterns, and do not try to wedge your workloads into something like kubeflow or TFX / TF Serving, MLflow, etc. You may have occasional workloads that use some of these, but you need to ensure you have wrapped a custom “service boundary” around them at a higher level of abstraction, otherwise you are hamstrung by their (deep seated) limitations.
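
For what it's worth, a thin wrapper around a native k8s Job doesn't have to be much more than this (official Python client; image, namespace and command are placeholders):

    from kubernetes import client, config

    def submit_training_job(image, name="train-job", namespace="ml"):
        # Submit a one-off training run as a plain k8s Job
        config.load_kube_config()  # or config.load_incluster_config() in-cluster
        container = client.V1Container(
            name="train", image=image, command=["python", "train.py"]
        )
        template = client.V1PodTemplateSpec(
            spec=client.V1PodSpec(restart_policy="Never", containers=[container])
        )
        job = client.V1Job(
            api_version="batch/v1",
            kind="Job",
            metadata=client.V1ObjectMeta(name=name),
            spec=client.V1JobSpec(template=template, backoff_limit=2),
        )
        client.BatchV1Api().create_namespaced_job(namespace=namespace, body=job)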


This was super-brilliant, thank you so much! I wish you would write a blog post on this.

Airflow is a good replacement and works well; it's easy to deploy and to add data sources/steps.

Airflow is only a DAG task executor, which hardly scratches the surface of what is needed for managing experiment tracking, telemetry for model training, and ad hoc vs scheduled ML workloads.

Airflow is useful as a component of an ML platform, but it is only in principle capable of addressing a really really tiny part of the requirements.

You also need to ensure Airflow can easily provision the required execution environment (eg. distributed training, multi-gpu training, heavily custom runtime environments).

Overall Airflow isn’t a big part of ML workflows, just a small side tool for a small subset of cases.
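
For context, the part Airflow does cover is just this kind of scheduled DAG definition (a minimal sketch, assuming Airflow 2.x import paths):

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def retrain():
        pass  # pull data, fit the model, push the artifact somewhere

    with DAG(
        dag_id="nightly_retrain",
        start_date=datetime(2020, 11, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(task_id="retrain", python_callable=retrain)

Everything else mentioned above (experiment tracking, telemetry, ad hoc runs, provisioning custom execution environments) sits outside that box.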



