
There's this undertone of "I should be paid as much/more than Data Science people because I'm better than them at statistics and data science = machine learning = statistics".

My experience doing data science at small companies that can't afford to hire more than 1 person for the role is that it is so much more than just building models or doing statistics.

You have to:

1. Build APIs and work with developers to get predictive models integrated into the rest of the software stack

2. Know how to handle logging, auditing, monitoring, containerization, web scraping, data cleaning(!!), SQL scripts, dashboards, BI tools, etc.

3. Do some basic descriptive stats, some basic inferential stats, some predictive modeling, work on time-series data, sometimes apply survival analysis, etc. (Python/R/Excel who cares)

4. Set up data pipelines and CI/CD to automate all this crap

5. Unpack vague high level requirements along the lines of "Hey do you think we could use our data to build an 'AI' to do this instead of manually doing it" and then come up with a combination of software / statistical models that performs at least as well as humans at the task.

6. Work with non-technical business users and be able to translate this back to technical requirements.

Hey, if all you do all day is "build models" then that sounds like a very cushy DS job you have. It's definitely not been my experience. I would describe it more like a combination of software engineering, statistics and business analysis. That's why it pays higher than just statistics. But this is just my experience.




The author starts with

> The data science world may reject me and my lack of both experience and a credential above a bachelors degree

More likely the data science world will reject him because he is so confident about a field he has so little experience or knowledge of.

Data scientist is a profession rather than the name of an academic field. So data scientists' job is to solve practical problems. That involves a lot more than class assignments, and in some cases involves using machine learning to maximize predictive accuracy (because common ML models like gradient boosting capture interactions and non-linearities in a richer way than the GLM models the author is familiar with).
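To make the interactions point concrete, here is a toy sketch, not anything from the blog post. It assumes an XOR-style target, a plain linear model with main effects only (the simplest GLM, with no hand-added interaction term), and a hand-written depth-2 decision tree standing in for the kind of tree that gradient boosting builds on. The data and all the function names are made up for illustration.

```python
# XOR-style target: y is 1 only when exactly one of the two inputs is 1.
# This is a pure interaction effect with no main effects.
DATA = [((0, 0), 0.0), ((0, 1), 1.0), ((1, 0), 1.0), ((1, 1), 0.0)]


def fit_linear(data, lr=0.1, steps=5000):
    """Fit y ~ b + w1*x1 + w2*x2 by gradient descent on mean squared error."""
    b = w1 = w2 = 0.0
    n = len(data)
    for _ in range(steps):
        gb = gw1 = gw2 = 0.0
        for (x1, x2), y in data:
            err = (b + w1 * x1 + w2 * x2) - y
            gb += 2 * err / n
            gw1 += 2 * err * x1 / n
            gw2 += 2 * err * x2 / n
        b, w1, w2 = b - lr * gb, w1 - lr * gw1, w2 - lr * gw2
    return b, w1, w2


def tree_predict(x1, x2):
    """Hand-written depth-2 tree: split on x1, then on x2 inside each branch."""
    if x1 == 0:
        return 0.0 if x2 == 0 else 1.0
    return 1.0 if x2 == 0 else 0.0


def mse(predict, data):
    """Mean squared error of a two-argument predictor over the data."""
    return sum((predict(x1, x2) - y) ** 2 for (x1, x2), y in data) / len(data)


b, w1, w2 = fit_linear(DATA)
linear_mse = mse(lambda x1, x2: b + w1 * x1 + w2 * x2, DATA)
tree_mse = mse(tree_predict, DATA)
print(f"linear: b={b:.3f} w1={w1:.3f} w2={w2:.3f} mse={linear_mse:.3f}")
print(f"tree:   mse={tree_mse:.3f}")
```

The linear fit can do no better than predicting the mean everywhere, while the two-level split reproduces the target. To be fair to the GLM side, adding an explicit x1*x2 term would fix the linear model too, but someone has to know to add it.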

Their argument "that's a garbage model because we can't reasonably interpret underlying parameters," is placing their personal criteria above what is needed to solve some problems.

They can blame it on only having a bachelor's degree. But the real problem is the belief that a bachelor's degree taught them everything there is to know, and those in the DS field are ~ idiots who got lucky enough to be paid more.


I feel like the blog post should be read in line with how it was probably written: informal, personal and somewhat sarcastic, with a bitter note because he chose one major and now it turns out people value something else instead. Hence the title "the final stage of grief". I did not get the impression that the author thinks machine learning is stupid or that he knows everything about it.


And in all honesty, no data scientist with an ounce of self-worth should work long-term for such companies, unless it just happens to be their own.

You're basically doing 3 jobs for the price of one: Software Engineer, Data Engineer, Data Scientist.

Sure, you'll be a jack of all trades, as far as data goes, but it'll be at the cost of some specialization.

I'm probably gonna get a lot of sh!t for this post - probably from data [x] people that are in that exact position themselves, but the above description is exactly why I'd aim for larger companies with somewhat established analytics / data / ML teams or offices. You get to focus on the important stuff, instead of juggling ten balls at the same time.

(And it's not only in the field of data science. Some of the traditional SE positions I see at startups or small companies look absolutely grotesque - basically the whole IT and Dev. department baked into one job)


No one with an ounce of self-worth should work long-term for companies that expect them to do exactly what their title implies they should and not a thing more.

You're basically being arbitrarily restricted to learning and enjoying exactly one thing when it would often make more sense in context to become involved in: Customer relations, systems administration, management, software engineering, data science, etc.

Sure, you'll become really good at that one thing, but it'll be at the cost of personal growth and job satisfaction.

> I'm probably gonna get a lot of sh!t for this post...

I mean, yeah. You've basically lampooned anybody who enjoys working in ill-defined cross-disciplinary circumstances as having "[not an] ounce of self-worth".

It sounds like, from your perspective, your field is "the important stuff" and other fields are just balls to be juggled. There's nothing wrong with that, but lots of people don't think that way. To some people, the important stuff is anything that makes their customers happy. To others, it's anything that helps them learn.

And let's dispense with the notion that "doing 3 jobs for the price of one" is an accurate description of having broad rather than narrow responsibilities. One comes at the cost of the other. If you're an equally capable specialist and generalist, and you're capable of genuinely performing those 3 jobs at once, then if you were to specialize you'd be performing the work of 3 average specialists, and you'd be in the same boat as before.

Do what you're best at and try to get the best possible compensation for it, monetary or experiential. It's as simple as that.


> And in all honesty, no data scientist with an ounce of self-worth should work long-term for such companies, unless it just happens to be their own.

You're kind of saying that no one should work for a startup ever. At small companies you have to do many things. As you said about software engineers, there aren't dedicated front-end engineers or dev-ops. Marketing teams don't have content vs. growth vs. performance vs. brand vs. email marketers. You might be the first sales person, which means no sales ops support, no account manager for ongoing relationships, etc.

There are trade-offs, of course! There are trade-offs to anything. Some people value working on many aspects of a company. Some people find understanding more than their narrow field to be interesting and rewarding and, you know, self-worth-y!

So I think it might be helpful to step back and consider that not everyone has the same priorities, experiences, interests, or definitions of happiness and self-worth as you do. And that's okay.


>You get to focus on the important stuff, instead of juggling ten balls at the same time.

So on one hand, you can't build any models without the work of the engineers, but on the other the model building is "the important stuff"?

Maybe it's just me, but I enjoy working on all aspects of the data pipeline.


Model building is often the trivial part, and often you can't build models without a solid understanding of things like the data pipeline.


Some people enjoy doing full stack DS and get paid well for it.


And there's the potential risk of being mediocre in many fields and becoming less competitive in each area. My opinion is it really needs to be the field you love (biology, science etc) to be worth the effort.


to be fair, how large should a field be?

specializing in physics as a whole is way too broad. but is specializing in front end software development too broad?

Once you have learned the fundamentals of computer science and its associated fields (networking and systems engineering mainly), the difference between doing back end and front end work is not that large.


I had two semesters of databases at university. I wonder why they wasted so much time. I mean, you don't really need university education to understand the SELECT statement, right? /s

The problem with computer science is that it may take a lot of time to understand something thoroughly, but it only takes a little while to find a tutorial, copy some example code from the web, and build a simple version that only contains a few bugs and will be a nightmare to maintain and scale, but hey, it mostly works, as long as you use it in the predicted way and don't put in too much data and don't type or click too fast.

And because this works most of the time, and can be sold, this became the standard. People only capable of copy-paste development still get jobs. Design patterns, that's just some academic nonsense for nerds, right? Hey, my teenage nephew made a simple application in PHP over the weekend; how much more difficult can your job be, seriously? Java and JavaScript are the same thing, aren't they? Okay, one of them has optional semicolons, but now you are making a mountain out of a molehill, just admit it. What database design? Just make some tables and put the data there; if it's not fast enough, add some indexes. Web page design? Just put the button on the top; if it doesn't fit, put it on the bottom; if it still doesn't fit, put it on the side or maybe in the header, whatever. What's all this talk about technical debt? I am paying you to add new features...

Yes, software developers who don't have deep knowledge in any field, and who do mediocre work, are quite interchangeable. No need to invent specialized job positions for them. If your company develops 10 products, all the wheels will be reinvented 10 times and most of them won't rotate properly, but that's life. The advantage is that replaceable people are cheaper and more obedient.


I mostly agree but the competition in the field will dictate that. Like you stated, if it's features you are after and the customer won't mind these small details then maybe they (developers) are interchangeable. But e.g. in science mediocre work will cost a lot if certain minor details are found wrong. That's what the whole reproducibility crisis in science is about. The reputation of a whole institute / department might be at risk if the mistakes are exposed.


Some of what you're describing in 1-4 is Data Engineering. 5-6 exist (in some form) for most software jobs.

The general breakdown I give people is:

Data Scientists:

* Get data.

* Clean data (~60% - 70% of time required).

* Research.

* Low level data analysis.

* Building models.

It's mostly the "knowing data" and the modelling.

Data Engineers:

* Data storage

* Data processing

* Automation

* Infrastructure

It's about getting the Data Scientist's output into production / making data easily available to them.

This is especially true for big ETL jobs. The more we can automate your ETL jobs, the happier you'll be!
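For anyone wondering what "automating an ETL job" looks like at its very smallest, here is a rough sketch using only the Python standard library. Everything in it is hypothetical: the CSV contents, the column names, the `sales` table and the cleaning rules are invented for illustration, and a real job would add logging, scheduling and a record of the rejected rows.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: inconsistent whitespace, casing and number formats.
RAW_CSV = """customer,amount,country
 Alice ,10.50,us
BOB,"1,200.00",US
,5.00,de
carol,n/a,DE
"""


def extract(text):
    """Read rows from CSV text as dictionaries keyed by the header line."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Normalize fields and drop rows that cannot be repaired."""
    clean = []
    for row in rows:
        name = row["customer"].strip().title()
        country = row["country"].strip().upper()
        try:
            # Strip thousands separators before parsing the amount.
            amount = float(row["amount"].replace(",", ""))
        except ValueError:
            continue  # unparseable amount, e.g. "n/a"
        if not name:
            continue  # missing customer name
        clean.append((name, amount, country))
    return clean


def load(rows, conn):
    """Write cleaned rows into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT customer, amount, country FROM sales ORDER BY customer"
).fetchall()
print(loaded)
```

Two of the four rows survive here; the other two are silently dropped, which is exactly the kind of decision the data scientist and the data engineer need to agree on before it gets automated.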


I talk regularly with business people who have hired data scientists, and 5 and 6 are always the biggest complaint. That, plus new hires are always unprepared to handle how messy real-world data is.


The messiness of data is something I even see creating a growing rift between academic ML/DS and real world applications.

What makes for a nice paper doesn't necessarily make for a model that will survive contact with new user generated data.


I remember some similar discussions when I worked in logistics consulting for a while. The data source was a mess, like a mess. Data was even included as screenshots of spreadsheets in other spreadsheets.

Some math PhD was in charge of that modelling. Without any domain knowledge concerning the data (logistics, consumption, maintenance) and thus unable to properly interpret the raw data to begin with. Most of the time was spent on writing some Python scripts to analyze the raw data, still full of errors. And build predictive models on top of that mess. Kind of formed my view of data science, unfairly so.


Just like almost all software work is maintenance, almost all data science is data cleaning.


I liken it to sending a chef to a grocery store. It's not just about being a good cook. Half the battle is in choosing the correct ingredients. Not just a dozen eggs, but free range where the yolks will be a vibrant orange yellow and improve the presentation. The cleanest models with the highest fidelity often fall out as the next obvious transformation of a well groomed and hygienic dataset.


I hate the fact that you're right. And that you've described my job in a small company.



