
Unpopular Opinion – Data Scientists Should Be More End-to-End - importantbrian
https://eugeneyan.com/writing/end-to-end-data-science/?mkt_tok=eyJpIjoiWmpBNVlURTRNRE00WXpjMiIsInQiOiJockxCUXB1YmRPOGxvNm40UDhwdEk2dlI0UHhRY0NzTmdDV3ZnbFl4d3p6WGNZRU56Qmh1S2VNXC9zbmR2RUJrTjk4bW1DeVNUVHdBQW50aHdEOXhLd3VPVDRnbmRWOWdiSFFQVTVmaFZwZ3o3N0tLMk9Xdk9sYXBPc0I3Y1FhREoifQ%3D%3D
======
geebee
People will agree, in theory. But here's what happens. The data science team
will grill the generalist about finding a steepest descent vector and how
logistic regression works, and conclude that the candidate just isn't going to
be able to populate, run, and interpret output from a neural net.

Then, the data engineering team will ask the generalist to write code to print
the path from one leaf node to another in a binary tree at the whiteboard in
45 minutes, and conclude that the candidate just isn't going to be able to
figure out how to identify matching terms in different parts of a JSON tree.

Eventually, both groups withdraw into their own silos and confuse each other.
They may decry the lack of generalists, but when it comes time to hire, they
will resolutely not hire candidates with 80%ile skills in both areas. They will
hire only people with 95%ile skills in one area. They may get some people who
have well rounded skill sets through sheer chance, but their process selects
against this outcome.

------
alexfromapex
Agreed, but first companies need to realize that you don’t need a math PhD to
do 80% of the things a data scientist does. My company has data science and
machine learning experts that aren’t as well-rounded in the software
engineering side but there’s no cross training because they want SEs to have a
CompSci or SE degree and DSs to have a math or DS degree.

------
aeternum
The problem with this argument is it could be applied to almost every role.
E.g., the sales team should be more end-to-end (if they spec features and maybe even
write code they will understand the product better). For most companies, this
would be a terrible idea.

------
Viliam1234
People should have way more skills than they have now, they should learn all
those skills in their free time, and be available for the same salary. Then we
could simply hire fewer of them to do the same work. Also, I deserve a pony!

------
mlthoughts2018
This is sadly misguided because it mistakes the behavior that’s good for the
company (fast iteration loops, tight alignment between product and
engineering) for the means to get there (make data scientists be more end to
end).

The goal of tight iteration with good alignment is of course a good one, but
verbal sleight of hand doesn’t mean the way to achieve it is with more end to
end responsibilities for data science.

The huge cost of course is that data scientists and ML engineers have a hugely
asymmetric comparative advantage when spending their time on model training
and statistical solutions. You want them at full utilization for this set of
tasks because nobody else you employ can do that same statistical work, and
that work is often hugely valuable whereas most of the end to end work is
frankly grunt work and fighting through errors that anyone can do.

If you hire Michael Jordan for your basketball team, why would you make him
spend his (expensive) time cleaning up soda bottles or checking the elevators
for maintenance issues? It utterly makes no sense and wastes the comparative
advantage - all your Michael Jordans will be heading for the door.

~~~
ska
This argument is a really good one, but only in a small fraction (< 10%, < 1%?) of
cases. Far more often the "grunt work" is generating at least as much value as
the modeling, and Michael Jordan doesn't want anything to do with your office
pick-up game.

Especially in the case where you can't deploy meaningfully because your "data
scientists" are working in a silo with poor communication.

~~~
mlthoughts2018
Based on my ~8 years of experience managing ML teams in big companies, I’d say
it’s closer to 80% of cases.

“You can’t deploy correctly because of data scientists” is a failure of SRE
organizations to provide support, tooling and training.

In fact, most data science and ML engineers are quite skilled in systems
engineering, because you have to do so much work with GPU hardware issues,
underlying scientific package management, efficient data transportation, etc.

Editing some Kubernetes config, hardening a high traffic web service or
optimizing a query based on an index are trivial by comparison; they are just
boilerplate timewasters that need to be a different team’s job to automate.

~~~
amznthrowaway5
"In fact, most data science and ML engineers are quite skilled in systems
engineering, because you have to do so much work with GPU hardware issues,
underlying scientific package management, efficient data transportation, etc."

This does not align with my experience working with many data/applied
scientists at very large companies. A lot of them cannot even write basic code
or use git commands. The engineers who are capable are often silo'd away from
the scientists, and the organizations struggle to produce any real value.

~~~
mlthoughts2018
I’m sorry you’ve had such a rare and incredibly uncommon, unrepresentative
experience. It sounds very idiosyncratic to your workplace and likely to the
hiring processes.

~~~
amznthrowaway5
Why do you suspect this experience is so rare and unrepresentative, as opposed
to yours? It is a company-wide problem in the cases I've seen; the scientist
positions are not even expected to have junior software engineer level
competence in things like programming.

