Ask HN: Things You Wish You Knew Before Getting into Machine Learning
56 points by onuralp on Feb 21, 2019 | 15 comments
Especially for those who switched careers to become a machine learning practitioner, data scientist, data engineer, etc.

From my own experience (switched to ML 1.5 years ago):

1. That software engineering skills are way more important than ML skills.

2. That you'd be spending more time on making presentations than doing ML (and it makes sense, it's very important to present statistics properly).

3. That most problems don't need good ML models. Something cheap and easy is often good enough. What does need to be good is the data pipelines around them (see 1).
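To make point 3 concrete, here is a minimal sketch of the kind of "cheap and easy" baseline worth measuring before reaching for anything heavier. The function names and the toy labels are made up for illustration; they are not from any particular library or from the commenter's own work.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Return a predictor that always outputs the most common training label."""
    most_common_label, _count = Counter(train_labels).most_common(1)[0]
    return lambda _row: most_common_label


def accuracy(predict, rows, labels):
    """Fraction of rows for which the predictor returns the true label."""
    correct = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return correct / len(labels)


# Toy, made-up data: the rows are ignored by this baseline on purpose.
train_labels = ["ham", "ham", "spam", "ham", "ham"]
test_rows = ["msg a", "msg b", "msg c", "msg d"]
test_labels = ["ham", "spam", "ham", "ham"]

baseline = majority_baseline(train_labels)
baseline_accuracy = accuracy(baseline, test_rows, test_labels)
print(f"majority-class baseline accuracy: {baseline_accuracy:.2f}")
```

Any fancier model has to beat this number by enough to justify its cost, which is often a higher bar than it sounds.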

In my case, I learned ML well enough to feel "senior" compared to other people in the company and online in less than a year. The same path to senior SWE took me much longer (a way larger mandatory knowledge base, probably because ML is a young field). So I'd say ML is definitely easier.

Those are great points. Could you comment on what resources you have used to learn ML?

Mostly Kaggle: reading others' solutions and notebooks and integrating them into my code.

Also there's a great Coursera course on ML for Kaggle: https://www.coursera.org/learn/competitive-data-science

I think once you finish it, you're better than 60% of Silicon Valley data scientists, no kidding.

Very good pointers. I would like to get in touch with you regarding how you transitioned to ML. I don't see any contact info in your profile. My email is in my profile. Please let me know.

Kaggle is more than enough to get started. I would hire anyone who's a Master there. You probably don't even need to be a Master, just have enough knowledge to explain why one approach works and another would not.

See this course to get into Kaggle: https://www.coursera.org/learn/competitive-data-science

Thank you for the input and the course reference.

It's starting to become a cliche (which might be a good thing), but building datasets, cleaning that data and validating that data is the hard part, by far. The actual machine learning is quickly becoming a commodity.

In our code base, the ratio of data cleaning/APIs/pre-processing code to API calls into ML packages is something like 98:1.
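For a sense of what that cleaning and validating code tends to look like, here is a small sketch, assuming records arrive as plain dicts. The field names, the `id` key, and the age range are all hypothetical, picked only to show the pattern of separating usable records from ones that need attention.

```python
def validate_records(records, required_fields, ranges):
    """Split records into clean ones and a list of (index, issues) problems.

    required_fields: field names that must be present and non-empty.
    ranges: maps a numeric field name to its allowed (low, high) bounds.
    """
    clean, problems = [], []
    seen_ids = set()
    for index, record in enumerate(records):
        issues = []
        for field in required_fields:
            if record.get(field) in (None, ""):
                issues.append(f"missing {field}")
        for field, (low, high) in ranges.items():
            value = record.get(field)
            if isinstance(value, (int, float)) and not low <= value <= high:
                issues.append(f"{field} out of range")
        record_id = record.get("id")  # "id" is an assumed key name
        if record_id is not None and record_id in seen_ids:
            issues.append("duplicate id")
        seen_ids.add(record_id)
        if issues:
            problems.append((index, issues))
        else:
            clean.append(record)
    return clean, problems


# Made-up records covering the three kinds of problem checked above.
records = [
    {"id": 1, "age": 34, "country": "DE"},
    {"id": 2, "age": -5, "country": "US"},
    {"id": 2, "age": 41, "country": "US"},
    {"id": 3, "age": 29, "country": ""},
]
clean, problems = validate_records(
    records, required_fields=["id", "age", "country"], ranges={"age": (0, 120)}
)
for index, issues in problems:
    print(f"record {index}: {', '.join(issues)}")
```

Real pipelines grow many more checks than this, which is roughly how a ratio like the one above comes about.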

So, I agree that data work is a large part of the job.

When presented with a new problem, you have to build the data infrastructure before you can do any learning.

But if you are on a team maintaining a project over the long term, you amortise the cost of this a bit. You will still see big impacts from improving your data, but you will also see big gains from modeling improvements, though often that will just mean plugging in a different box.

But I actually think this is a good thing if your goal is to build applications. These methods are hot right now because they are very good at things that are hard for us to program, so they let us build better systems.

One other thing I'll mention is that GDPR has made a lot of inane things pretty painful, largely due to overly conservative lawyers. This is probably mostly an issue for large consumer tech companies.

Except in rare cases, or specific teams tackling problems that are both exceptionally hard and exceptionally well-suited for deep learning, I would take someone with medium-level stats knowledge and advanced Python/pandas coding ability over a PhD in ML.

Sometimes people you work with, like team members or PMs, will really want to understand ML and be involved but will have a hard time grasping the concepts being discussed. I found it really helped to draw out and illustrate the different components and data flows!

The best places to start for a complete beginner are Precalculus and Hello-World in C.

I'm serious about this. Ultimately the job is just software development plus statistics.

If you are a software developer, work on your statistics.

If you're a statistician, learn to program.

Most people will have gaps in both of these sub-fields.

Do not, under any circumstances, take any online courses that include the phrases "data science" or "machine learning" in the title.

It isn't as interesting as we are led to believe.

Your ability to develop an amazing ML model is limited by your organization's ability to collect and clean data. However, the great news is that most problems do not need an incredible model. Small uplifts in performance could still result in substantial outcomes.

In industry, you also need to balance the amount of time and effort it takes to build your model against the incremental benefit.
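A back-of-the-envelope way to frame that trade-off, as a sketch only: every figure below is hypothetical and chosen purely to illustrate the arithmetic, and `net_benefit` is a made-up helper, not a standard formula from anywhere.

```python
def net_benefit(baseline_value, relative_uplift, build_cost):
    """Value added by the model's uplift minus the cost of building it."""
    return baseline_value * relative_uplift - build_cost


# Hypothetical figures: yearly value of the process the model touches,
# and two candidate models with different uplift and build cost.
baseline_value = 2_000_000
simple_model = net_benefit(baseline_value, relative_uplift=0.02, build_cost=10_000)
fancy_model = net_benefit(baseline_value, relative_uplift=0.025, build_cost=60_000)

print(f"simple model net benefit: {simple_model:,.0f}")
print(f"fancy model net benefit:  {fancy_model:,.0f}")
```

With these invented numbers the simpler model comes out ahead even though its uplift is smaller, because the extra half point of uplift does not cover the extra build cost.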

In industry it's much more important to build or get a great data set for training and testing than to build the perfect model.
