Hacker News new | past | comments | ask | show | jobs | submit login

In my experience it's a bit of both.

Most of a data scientist's time is definitely spent getting data from various places, possibly enriching it, cleaning it, and converting it into a dataframe that can then be modeled (and surprisingly a lot of those tasks are often beyond pure actuaries / statisticians who often get nervous writing SQL queries with joins nevermind Spark jobs)...

But also once you've got that dataframe, it's often too big to be processed on a single machine in the office so you need to spin up a VM and move the data around which also requires a lot more computer science / hacking skills than you typically learn in a stats university degree so I think the term data scientist for someone who can do the computer sciency stuff in addition to the stats has its place...




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: