Ask HN: What are the 20% of ML/DL skills that are used 80% of the time?
21 points by snyp 8 months ago | 6 comments

Talking about ML and DL and figuring out how companies can add a few buzzwords in their marketing to say that they use AI.

One of them is probably data preprocessing: properly preparing the data before presenting it to the algorithm.

This is one thing that frustrates me about AI. I can pre-process data all day, every day; I've been writing ETLs and data warehouses for years. But what am I supposed to pre-process it into? What is the ideal shape of the data?

A lot of courses gloss over this. They dedicate a whole section to cleaning data and then skip straight to ML with ready-made datasets. Or, slightly better, they make you pre-process the data but tell you exactly which columns you need, not why. So when you have a new project, unless it is nearly identical to the example in the course, you may not know what to do.
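
For what it's worth, one common (though not universal) answer for tabular problems is: one row per example, one numeric column per feature, and the target kept in a separate vector. Below is a minimal pure-Python sketch of getting raw records into that shape. The records and the column names (`age`, `city`, `churned`) are made up for illustration, and the choice of one-hot encoding plus mean/std scaling is an assumption, not a rule.

```python
# Hypothetical raw records, e.g. rows pulled out of a warehouse table.
records = [
    {"age": 34, "city": "Austin", "churned": 0},
    {"age": 51, "city": "Boston", "churned": 1},
    {"age": 27, "city": "Austin", "churned": 0},
]

# Categorical values become one 0/1 column per category (one-hot encoding).
cities = sorted({r["city"] for r in records})

# Numeric values are often rescaled; here: subtract the mean, divide by the std.
ages = [r["age"] for r in records]
mean_age = sum(ages) / len(ages)
std_age = (sum((a - mean_age) ** 2 for a in ages) / len(ages)) ** 0.5


def to_feature_row(record):
    """Return one all-numeric feature row: [scaled_age, one 0/1 flag per city]."""
    scaled_age = (record["age"] - mean_age) / std_age
    one_hot = [1.0 if record["city"] == c else 0.0 for c in cities]
    return [scaled_age] + one_hot


X = [to_feature_row(r) for r in records]  # n_examples rows, n_features columns
y = [r["churned"] for r in records]       # one target value per example

print(len(X), len(X[0]), len(y))
```

The "why" behind the columns is that most algorithms only see numbers: a city name has to become flags, and features on very different scales are often rescaled so that none dominates.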

I consider a very high level of competency in Pandas/R and SQL very important, along with knowing what to do in SQL and what to do in a scripting language. I have wasted so many days writing stuff in SQL that I should have written in Python.
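
As one illustration of that split (a sketch, not a rule): set-based work such as filtering and aggregating tends to be short in SQL, while row-by-row text parsing is usually easier in Python. The `events` table, its columns, and the message format below are invented for the example, which uses the standard-library `sqlite3` module with an in-memory database.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, message TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        (1, "paid amount=19.99 currency=USD"),
        (1, "paid amount=5.00 currency=USD"),
        (2, "refund amount=19.99 currency=USD"),
        (2, "paid amount=42.50 currency=EUR"),
    ],
)

# SQL side: filter and count per user, a natural fit for a set-based query.
paid_counts = dict(
    conn.execute(
        "SELECT user_id, COUNT(*) FROM events "
        "WHERE message LIKE 'paid %' GROUP BY user_id ORDER BY user_id"
    ).fetchall()
)

# Python side: pull structured fields out of free text with a regex,
# which is awkward to express in most SQL dialects.
pattern = re.compile(r"amount=(?P<amount>\d+\.\d+) currency=(?P<currency>[A-Z]{3})")
paid_totals = {}
for user_id, message in conn.execute(
    "SELECT user_id, message FROM events WHERE message LIKE 'paid %'"
):
    match = pattern.search(message)
    if match:
        key = (user_id, match["currency"])
        paid_totals[key] = paid_totals.get(key, 0.0) + float(match["amount"])

conn.close()
print(paid_counts)
print(paid_totals)
```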


And model.fit_transform()
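
For context, in scikit-learn `fit_transform()` is a method of transformers (scalers, encoders, and the like) rather than of predictive models, which expose `fit()` and `predict()`. A minimal sketch with made-up numbers, assuming scikit-learn and NumPy are installed:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Made-up data: y = 2 * x exactly, so the fit is easy to check.
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([2.0, 4.0, 6.0, 8.0])
X_new = np.array([[5.0]])

scaler = StandardScaler()
# fit_transform learns the mean/std from the training data and applies them.
X_train_scaled = scaler.fit_transform(X_train)
# New data is only transformed, reusing the statistics learned above.
X_new_scaled = scaler.transform(X_new)

model = LinearRegression()
model.fit(X_train_scaled, y_train)
prediction = model.predict(X_new_scaled)
print(prediction)
```

Calling `transform()` rather than `fit_transform()` on new data keeps its statistics from leaking into the preprocessing step.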

