
Best Practices for ML Engineering (2017) - dsr12
https://developers.google.com/machine-learning/rules-of-ml/
======
stochastic_monk
My favorite part is what so many people seem to forget:

Rule #1: Don’t be afraid to launch a product without machine learning.

Just because you can use machine learning doesn’t mean you should.

~~~
monksy
Also, it should be said: just because you can use the framework and it
produces what you expect the first time doesn't mean you're using the
technique correctly.

(AI includes some very advanced techniques that require attention to detail
and background knowledge to use correctly.)

~~~
hueving
Sounds like a disclaimer for any programming. Just because the happy path
works doesn't mean you have a finished system.

------
rpedela
The part about a solid pipeline is very important. Recently I talked to
someone who had a bad experience detecting phrases using gensim. I have
always had a good experience, so I was curious. It turns out they were doing
OCR on PDFs and feeding the result directly into gensim. As anyone with OCR
experience knows, the output often contains a lot of noise, which is
definitely going to throw off gensim or any other NLP tool. The quality of
your preprocessing matters!

------
theCricketer
Might be worth noting that this is old [1]. I think the same document was
posted in PDF form in 2016 or 2017.

[1] http://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf

~~~
dang
Discussed in 2017: https://news.ycombinator.com/item?id=13414776

------
sidlls
I want to highlight #1 and #2 especially. Machine learning is not necessarily
the correct tool for a given problem, and you won't even know whether it
_might be_ until you've done at least some of the work for Rule #2. If the
data you've collected and the metrics you expect an ML product to nudge do
support it, make sure you have as complete a set of metrics as possible to
measure whether the ML product is actually useful.

It's easy for data scientists and machine learning engineers to see P/R, AUC,
or whatever as the goal, especially if there isn't much support in the
organization for measuring product performance. It's often not the end goal.
Measurements of a model's performance in this context indicate some measure of
statistical performance with respect to training and test data. Real, live
measurements from "in the wild" application are the true fitness test.
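To make the distinction concrete, here is a small sketch of the offline numbers being described (P/R computed against held-out labels) — my illustration, not anything from the linked document:

```python
def precision_recall(y_true, y_pred):
    """Offline precision/recall for a binary classifier.

    These numbers measure statistical fit against labeled test data;
    they are not the product metric (clicks, revenue, retention) the
    model is ultimately supposed to move in live traffic.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Held-out labels vs. model predictions
print(precision_recall([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

A model can score well here and still fail the "in the wild" fitness test, which is why the live product metrics need to be instrumented too.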

------
kornakiewicz
"Track as much as possible in your current system."

~~~
tejasmanohar
But... "don't be evil" now!

------
mathinpens
Woah, I had no idea Zinkevich worked at Google. That guy is really legit.

