Hacker News
AI vs. Data Science vs. Data Engineering (insightdatascience.com)
97 points by mwakanosya on Sept 30, 2017 | 13 comments

The term "data science" always struck me as odd. As in, don't all sciences necessarily involve data?

This comes up every time that the term data science is mentioned on Hacker News, and it's frustrating that so many of the replies to you are the same ha-ha-only-serious responses that always pop up. The term is over 50 years old, and refers -- straightforwardly enough -- to the science of studying how to learn from data.[1]

To all the jokes of "data science is just statistics done by engineers" and other such things, read Breiman's "Statistical Modeling: The Two Cultures." [2] It talks about how the field of statistics largely ignored "algorithmic modeling" techniques, and therefore historically those techniques have been developed outside of academic statistics, either in computer science departments or in industry.

If you look at all the big name people who are pushing forward on deep learning and machine learning -- Yann LeCun, Andrew Ng, Geoffrey Hinton -- at Facebook, Google and other places, they don't have statistics degrees, they have computer science degrees. There's a whole wave of techniques and schools of thought that developed outside of statistics. To come back now and say "data science is statistics done by engineers" as some slight against engineers is malicious, parochial and wrong, and it annoys me greatly that it comes up so often on Hacker News.

[1] http://courses.csail.mit.edu/18.337/2015/docs/50YearsDataSci...

[2] https://projecteuclid.org/download/pdf_1/euclid.ss/100921372...

Scientists from fields that rely on data, like physics, often do well as data scientists.

I myself like Information Science more than Data Science, but I do not care that much for semantics. There was a need to name the role of someone who makes sense of data and gathers insights using tools from mathematics, computer science, statistics, and information theory. It's also a different type of science, data-driven science, as opposed to theoretical/metaphysical, empirical, or computational science.

There was an old joke that AI stood for Advanced Informatics. I think the commercialization of the term "AI" is a bit harmful and obfuscating. Companies tumble over one another to market their professionals as Applied AI or their products as AI. AI is the automation of human thought. It includes philosophy and cognitive science, both of which seem completely missing from applied AI.

I know many AI researchers already switched to calling themselves ML researchers a few years back, because the field of AI became muddied with futurist adherents of the Singularity. It did not help that the public perception of AI is somewhere between "Skynet is coming!" and "AI will take my job". Nowadays, ML is also heavily saturated and hyped beyond repair. Meanwhile, the field of AI has not even solved the common sense problem.


Originally Peter Naur said it was a term synonymous with Computer Science. Between 1996 and 2001, academics began using it in a manner similar to today's. After the Harvard Business Review called it "the sexiest job of the 21st century", it stuck.

See https://en.wikipedia.org/wiki/Data_science

It started as a secret code that meant "statistician that can program." There's a long debate in stats regarding whether what we do is science.

"Only programmers program" is a myth believed only by programmers. Programming wouldn't even exist without statisticians, engineers, physicists and others.

Data Science is what statistics is called by engineers.

I've taken a lot of statistics classes. Not once were random forests mentioned. Boosting was. Gradient boosting wasn't. Linear and logistic models were mentioned, but those are day 1 data science.

No, statistics is what statistics is called by engineers.

Data science is rudimentary-level statistics, done on a Mac while sipping a latte and pretending that data mining and other disciplines haven't existed for decades already.

One interesting point is the relative prevalence of different programming languages in data science vs. data engineering. Python and R are obviously dominant in data science, while Java and JVM languages are more widespread in data engineering, and that divide means that the algorithms don't always plug in well to the big data stack.

The article makes the false equivalency that all three skills are mutually exclusive. In actuality, having proficiency in all of AI/DS/Data Engineering is important, as they are all interrelated with DS proper (where AI is used for more robust modeling and Data Engineering is for practical schema management).

And DevOps too. Honestly I’d like to see more thought pieces about statistical devops workflows that aren’t from startups which intentionally complicate the process to sell their own product.

Wouldn't a false equivalency be if the author said all jobs are equivalent because they share some common skills? Seems like you might be claiming a false "exclusivity"?

I'm legitimately asking because I had to just admit to myself that I didn't really know what "false equivalency" means and looked it up.
