Best Practices for ML Engineering (2017) (developers.google.com)
129 points by dsr12 8 months ago | 11 comments

My favorite part is what so many people seem to forget:

Rule #1: Don’t be afraid to launch a product without machine learning.

Just because you can use machine learning doesn’t mean you should.

Also, it should be said: just because you can use the framework and it produces what you expect the first time doesn't mean that you're using the technique correctly.

(AI includes some very advanced techniques that require attention to detail and background knowledge to use correctly.)

Sounds like a disclaimer for any programming. Just because the happy path works doesn't mean you have a finished system.

About a year ago I spoke to someone at Google who described her job as trying to convince people not to use machine learning in cases where it wasn't really appropriate. They take this seriously.

The part about a solid pipeline is very important. Recently I talked to someone who had a bad experience detecting phrases using gensim. I have always had a good experience, so I was curious. It turns out they were doing OCR on PDFs and feeding the result directly into gensim. As anyone with OCR experience knows, there is often a lot of noise, which is definitely going to screw up gensim or any other NLP tool. The quality of your preprocessing is important!
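
To make the preprocessing point concrete, here is a minimal sketch of the kind of filter that could sit between OCR output and a phrase detector. It is only an illustration: the function name `clean_ocr_lines` and the `min_alpha_ratio` threshold are made up for this example and are not part of gensim or of the pipeline described above, and real OCR cleanup usually needs more than this.

```python
import re

# Hypothetical helper for illustration; not part of gensim or any OCR library.
def clean_ocr_lines(lines, min_alpha_ratio=0.7):
    """Turn raw OCR lines into lists of lowercase word tokens,
    dropping lines that look like scanning noise."""
    sentences = []
    for line in lines:
        # Ignore whitespace when judging how "wordy" a line is.
        chars = [c for c in line if not c.isspace()]
        if not chars:
            continue  # blank line
        alpha_ratio = sum(c.isalpha() for c in chars) / len(chars)
        if alpha_ratio < min_alpha_ratio:
            continue  # mostly symbols or digits: likely an OCR artifact
        # Keep alphabetic tokens only (with an optional apostrophe part).
        tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", line.lower())
        # Drop stray single letters, which OCR noise produces often.
        tokens = [t for t in tokens if len(t) > 1 or t in {"a", "i"}]
        if tokens:
            sentences.append(tokens)
    return sentences


raw = ["The quick brown fox", "~~|| 3#@ ..;;", "machine learning is useful", ""]
cleaned = clean_ocr_lines(raw)
```

The result is a list of token lists, which is the general shape gensim's `Phrases` model takes as input. The 0.7 threshold is an arbitrary starting point that would need tuning against real scans.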

Might be worth noting that this is old [1]. I think either in 2016 or 2017, the same document was posted in PDF form.

[1] http://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf

I want to highlight #1 and #2 especially. Machine learning is not necessarily the correct tool to use to solve a problem. You won't even understand whether it might be unless you've done at least some of the work for Rule #2. If the data you've collected, and the metrics you think a machine learning product can nudge, support using ML, make sure you have as complete a set of metrics as possible to measure whether your ML product is useful.

It's easy for data scientists and machine learning engineers to see P/R, AUC, or whatever as the goal, especially if there isn't much support in the organization for measuring product performance. It's often not the end goal. In this context, measurements of a model's performance describe its statistical behavior on training and test data. Real, live measurements from "in the wild" application are the true fitness test.
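
For anyone unfamiliar with the shorthand, P/R is precision and recall. A minimal sketch of how they are computed on a held-out test set, assuming binary 0/1 labels (the function name and the toy data are made up for illustration):

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    # Fall back to 0.0 when a denominator is empty instead of dividing by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy test-set labels and model predictions, invented for this example.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 1, 0, 1, 0]
precision, recall = precision_recall(y_true, y_pred)
```

Both numbers depend only on the labeled examples you pass in, which is exactly why they say nothing by themselves about how the product performs with live traffic.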

"Track as much as possible in your current system."

But... "don't be evil" now!

woah i had no idea zinkevich worked at google. that guy is really legit.
