
How to deliver on Machine Learning projects - jakek
https://blog.insightdatascience.com/how-to-deliver-on-machine-learning-projects-c8d82ce642b0
======
vjsc
So we had this idea of a new feature for our product. The only way to quickly
do it was to somehow implement a machine learning algo and that would give us
the result that we wanted. Voilà!! It seemed simple.

Now our company doesn't have any machine learning expert or a data science
genius. Going for hiring one would take time. Taking someone up on contract
would be very expensive (our CEO wasn't ready to shell out that kinda money).
So the task fell on me. They asked me to go through the multitudes of Machine
learning MOOCs out there and get a working prototype ready in 2 weeks.

I had already done Andrew Ng's course back when it came out for the first
time. But my memory had faded for lack of practice.

I went through the course again. I went over a couple of online ML books too.

Then I started thinking of the problem at hand. Unfortunately, it turned out
to be a chicken and egg problem. For the feature to work perfectly we needed a
large amount of training data to train our models. But without the feature
actually deployed, we didn't have any way to collect any training data.

So we ultimately fell back to a simple algo that made its decisions based on a
few hard-coded rules. Things have been working fine so far.
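
A minimal sketch of what such a rule-based fallback could look like, assuming
it also logs every input and decision so that training data accumulates once
the feature is live. The rules, field names (`amount`, `text`) and the
`decide_and_log` helper are all hypothetical; the comment above does not say
what the actual feature or rules were.

```python
import json
from typing import Callable, Dict, List

# Hypothetical rules for illustration only; the real feature is not described.
Rule = Callable[[Dict], bool]

RULES: List[Rule] = [
    lambda item: item.get("amount", 0) > 1000,
    lambda item: "urgent" in item.get("text", "").lower(),
]


def decide(item: Dict) -> bool:
    """Return True if any hard-coded rule fires for this item."""
    return any(rule(item) for rule in RULES)


def decide_and_log(item: Dict, log: List[str]) -> bool:
    """Decide, then record the input/decision pair as one JSON line.

    The log is a plain list here; in practice it might be a file or a table.
    """
    decision = decide(item)
    log.append(json.dumps({"input": item, "decision": decision}, sort_keys=True))
    return decision
```

Logged pairs like these only become training data once someone confirms or
corrects the decisions, so the log is a starting point, not a labeled dataset.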

~~~
hellogoodbyeeee
They gave you two weeks to become a data scientist and implement a working
solution? That's nuts. I'm still pretty early career, but I have done data
science work for about four years now and I would've quoted at least two months
to figure out data, clean it, feature engineer, run models, compare results,
and then deliver the best performing solution.

~~~
tedivm
And they didn't even have data!

~~~
pletnes
No data cleaning required. That’s often 80% of a project. So 2 months -> 2
weeks makes sense now!

------
fromthestart
Machine Learning is much more nuanced than people seem to understand. You
can't just throw data at a net and expect results. This field requires a heavy
degree of intuition, and engineers must be prepared for nets to pick up on
patterns not obvious to humans, which can lead to unintuitive results.

Neural nets are basically black box heuristics, with unpredictable edge cases.
Much like human reasoning, I'd warrant!

------
b_tterc_p
This doesn’t seem to offer any novel perspectives. I read it as intended for
self-marketing.

~~~
e_ameisen
Co-author here. This post came out of a discussion with Adam, where we both
realized that the advice we were giving to ML teams and ML Engineers to guide
them to better results was very often process-centric rather than
model-centric.

Many resources exist online about how to get a model to converge, and that’s
not usually what makes or breaks a project.

Data acquisition, augmentation, model selection, and iterative exploration,
however, seem to be discussed quite rarely given how important we have seen
them be. This is our attempt at sharing this outside of our usual circles.

------
seren
That sounds awfully close to DMAIC.

[https://en.wikipedia.org/wiki/DMAIC](https://en.wikipedia.org/wiki/DMAIC)

Nothing wrong with that though...

------
sgt101
So we do the loop 50 times and we now have an algorithm that works (97%!) on
the test set. We are happy! We run it in production and everything looks good
(probably 92% ish). Everyone is happy! We all get promoted or get new jobs.
Then, one day, someone actually looks at what it's doing... and lo. It. does.
not. work (~51%). Everyone is sad. Apart from us! Yay!

Seriously - an optimisation loop on a test set? Seriously?
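
For anyone wondering what the alternative looks like, here is a rough sketch
of a three-way split: iterate against a validation set and touch the test set
once at the end. The function name `three_way_split` and the default fractions
are my own assumptions, not something from the article.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def three_way_split(
    rows: Sequence[T],
    val_frac: float = 0.15,
    test_frac: float = 0.15,
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle rows and split them into (train, validation, test)."""
    if val_frac <= 0 or test_frac <= 0 or val_frac + test_frac >= 1:
        raise ValueError("fractions must be positive and sum to less than 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Tune and compare models on `val` as many times as needed;
# score on `test` once, after all choices are frozen.
train, val, test = three_way_split(list(range(100)))
```

A random split like this still will not catch a shift between offline data and
production traffic, which is the other half of the story above.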

------
rfeather
The point about hacking away at the code needs to be heavily caveated. It's too
easy to conclude you've got negative or positive results when what you really
have is a silly little bug. The lack of focus on implementation skills in data
(or even "real" science) is frightful. The one takeaway anyone trained in
software engineering could share is that if you aren't very sure if it is
working as intended, it's very likely not. Code review is very applicable here
when making major pivots, even if unit or other testing is decidedly too time
consuming for the train test improve loop.
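
Even when full testing is too slow for the loop, a hand-checked sanity test on
the metric itself is cheap. A minimal sketch, assuming a plain accuracy metric;
`accuracy` and `check_accuracy` are hypothetical names for illustration.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of positions where the prediction equals the label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on empty input")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def check_accuracy() -> None:
    """Compare the metric against values worked out by hand."""
    assert accuracy([1, 0, 1, 1], [1, 0, 0, 1]) == 0.75  # 3 of 4 correct
    assert accuracy([0, 0], [0, 0]) == 1.0               # all correct
    assert accuracy([1, 1], [0, 0]) == 0.0               # none correct


check_accuracy()
```

If a check like this fails, the "result" was a bug in the measurement, not a
property of the model.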

Edit: typo "of" to "if". Somewhat serendipitous if you think about it.

------
reureu
I love that "Data Scientist" has become such an inflated and meaningless title
that now we have "Machine Learning Engineer".

~~~
ende
Well, “Data Scientist” has been appropriated by the overflow of PhDs w/o any
actual stats or computational backgrounds and few academia prospects, so I
guess you need to create new job titles for those who are going to do the
actual work.

~~~
reureu
I totally agree, and wasn't arguing that a new title wasn't necessary. And I'm
ok with my downvotes for that comment :)

It's just funny that "Data Scientist" seemed to be originally branded as the
more technical/engineer-y version of a data analyst. Now I get recruiters
contacting me for "Data Scientist" positions that revolve entirely around SQL
and Excel, and nobody in the Bay Area hires "Data Analysts" anymore.

Alright, guess it's time to update my LinkedIn and resume to adjust for this
inflation? Maybe I should jump up a few inflation levels and just become a
"Deep Learning Engineer."

~~~
borroka
I do not see any problem with that. There is a ton of confusion in the tech
world regarding labels, who does what, and whether it is needed or not, outside of the
core actions that need to be done. The net effect of laying off 50% of tech
people from public tech companies might even result in a net positive for the
companies. Not for a tech worker like me, so please do not tell them.

Taking advantage as much as possible of hype and other people's laziness is
fine in my book. It is certainly not my duty from the outside to educate
recruiters and business people who make hiring decisions on the field – when I
tried, from the inside, to gently point out that what they were thinking did
not make any sense, I just put myself in a dangerous spot. I can be a data
scientist, deep learning engineer, machine learning engineer, machine learning
research scientist, whatever pays more and whoever has the most fun. If using
an RNN instead of a more effective and efficient linear regression gives me
more money and prestige, I will do it – as an IC you either go with the flow
or you are not having a good time. The vast majority of us are not saving lives
anyway.

