What I've found in reality is that machine learning is 99% data cleaning scripts and 1% the part you're talking about. I've also seen the heavy-duty statistics people writing data cleaning Python scripts, which probably leads to a lot of frustration :)



I think what may be understated here is that while it’s true that ML is mostly data cleaning, data cleaning is not easy. There are a million little decisions to make, and it’s rarely clear which ones are most effective. Experimenting with various techniques is great, but the iteration time and cost are usually too high to try more than a small handful of approaches.



