Are there ML models that are trained while being used? Humans learn as we go along, but this "train and deploy" pattern that's so common doesn't seem sustainable.
If it can't be done online, then it shouldn't be that difficult to save each week's outputs and finetune the production model on them each weekend, right? Especially if there are humans correcting the faulty or uncertain outputs.
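Roughly what I'm imagining, as a sketch (every name here is hypothetical, and the `fit` call stands in for whatever finetuning the framework offers):

```python
# Sketch of the weekly loop I'm imagining; all names are hypothetical.
import json
from pathlib import Path

LOG = Path("predictions_this_week.jsonl")

def log_prediction(example_id, features, prediction, confidence):
    # During the week: persist every model output for later review.
    with LOG.open("a") as f:
        f.write(json.dumps({"id": example_id, "features": features,
                            "prediction": prediction, "confidence": confidence}) + "\n")

def weekend_finetune(model, human_corrections):
    # On the weekend: build a finetune set, letting human labels
    # override faulty or low-confidence outputs.
    records = [json.loads(line) for line in LOG.open()]
    X = [r["features"] for r in records]
    y = [human_corrections.get(r["id"], r["prediction"]) for r in records]
    model.fit(X, y)   # stand-in for a few epochs of finetuning
    LOG.unlink()      # start a fresh log for next week
    return model
```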
That's a pretty standard part of MLOps. I have a fraud model in production; it's incrementally retrained each week on a sliding window of data from the last x months.
You can do it "online", which works for some models, but most need monitoring to make sure they don't go off the rails.
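For the "online" flavor, a minimal sketch using scikit-learn's `partial_fit` (the fraud framing and the 0.9 alert threshold are just illustrative, not my production setup):

```python
# Minimal online-learning sketch using scikit-learn's partial_fit.
import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier(loss="log_loss")
CLASSES = np.array([0, 1])  # 0 = legit, 1 = fraud

def on_new_batch(X, y):
    # Prequential ("test, then train") evaluation: score the incoming
    # batch before training on it, so monitoring reflects unseen data.
    if hasattr(model, "coef_"):  # skip scoring before the first fit
        acc = model.score(X, y)
        if acc < 0.9:  # arbitrary threshold for the sketch
            print(f"warning: accuracy dropped to {acc:.2f}, possible drift")
    model.partial_fit(X, y, classes=CLASSES)
```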
That's good to hear. How does it work in practice? Is it basically running the same training as from scratch, but with only the new data, on a separate machine, producing a new version that then replaces the old production version? Is part of MLOps starting a new training session each week, checking that the loss function looks OK, and then redeploying?
I still think of how humans work. We don't get retrained from time to time to improve; we learn continually as we gain experience. It should be doable in at least some cases, like classification, where it's easy to tell if a label is right or wrong.
* Take the previous model checkpoint and retrain/finetune it on a window that includes the new data. You typically don't want to retrain everything from scratch; starting from the checkpoint saves time and money. Large models need specialized GPUs to train, so the training typically happens on separate machines.
* Check the model statistics in depth. We look at way more statistics than just the loss function.
* Check actual examples of the model in action.
* Check the data quality. If the data is bad, then you're just amplifying human mistakes with a model.
* Push it to production and monitor the result.
MLOps practice differs from team to team; this checklist isn't universal, just one possible approach. Everyone does things a little differently. Condensed into code, the automatable part looks something like the sketch below.
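(The thresholds, paths, and the `load_window`/`deploy` hooks here are placeholders for whatever a team actually uses, not a universal recipe.)

```python
# Condensed weekly retrain-evaluate-gate job. Thresholds, paths, and the
# load_window()/deploy() hooks are placeholders, not a universal recipe.
import joblib
from sklearn.metrics import precision_score, recall_score, roc_auc_score

def weekly_retrain(load_window, deploy):
    # 1. Start from the previous checkpoint instead of training from scratch.
    model = joblib.load("checkpoints/latest.joblib")
    X_train, y_train, X_holdout, y_holdout = load_window(months=6)

    # 2. Finetune on the sliding window of recent data.
    model.fit(X_train, y_train)

    # 3. Look at more than the loss: per-metric checks catch failures
    #    that a single aggregate number hides.
    preds = model.predict(X_holdout)
    scores = model.predict_proba(X_holdout)[:, 1]  # assumes a probabilistic classifier
    metrics = {
        "precision": precision_score(y_holdout, preds),
        "recall": recall_score(y_holdout, preds),
        "auc": roc_auc_score(y_holdout, scores),
    }

    # 4./5. Gate the deployment: only push if the new model clears the bar.
    if metrics["recall"] > 0.80 and metrics["auc"] > 0.95:
        joblib.dump(model, "checkpoints/latest.joblib")
        deploy(model)
    else:
        print(f"holding back deployment, metrics: {metrics}")
```

The manual steps, eyeballing examples and checking data quality, sit between steps 3 and 4 and don't really automate away.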
> I still think of how humans work. We don't get retrained from time to time to improve; we learn continually as we gain experience. It should be doable in at least some cases, like classification, where it's easy to tell if a label is right or wrong.
For some models, like fraud, correctness is critical, and those models need a lot of babysitting. As for humans: think about how the average facebooker reacts to misinformation; you don't want that happening with your model.
Other models, like recommendation systems, are OK with more passive monitoring.
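Even the passive case can be cheap to automate, e.g. a week-over-week check on the score distribution (the KS test and the 0.05 cutoff here are just one conventional choice):

```python
# Passive monitoring sketch: flag drift in the model's score distribution.
from scipy.stats import ks_2samp

def check_drift(last_week_scores, this_week_scores):
    # Two-sample KS test between consecutive weeks of model outputs.
    stat, p_value = ks_2samp(last_week_scores, this_week_scores)
    if p_value < 0.05:  # conventional cutoff, not a tuned value
        print(f"possible drift (KS statistic {stat:.3f}), take a closer look")
```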
Continuous online training can be done; maybe take a look at reinforcement learning? It's not widely applied and has some limitations, but also some interesting applications. These kinds of things might become more common in the future.
When I learned about RL, we were taught to disable exploration when evaluating the model, since the exploration part is stochastic. I don't think that would work in production: either you keep exploring and occasionally serve users a random action, or you turn exploration off and the deployed model stops gathering new experience to learn from.
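To make that concrete, in ε-greedy terms it's just forcing ε to zero at serve time (toy sketch, not tied to any particular RL library):

```python
# Toy epsilon-greedy policy showing the train/eval exploration split.
import random

def select_action(q_values, epsilon):
    # Explore with probability epsilon, otherwise exploit the best action.
    if random.random() < epsilon:
        return random.randrange(len(q_values))                  # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit

# Training: keep some exploration so the agent keeps discovering.
train_action = select_action(q_values=[0.1, 0.7, 0.2], epsilon=0.1)

# Evaluation/production: epsilon=0 makes the policy deterministic, which is
# the "disable exploration" rule from class; it also means the deployed
# policy generates no exploratory data to keep learning from.
prod_action = select_action(q_values=[0.1, 0.7, 0.2], epsilon=0.0)
```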