
The code itself is indeed simple, thanks to the combined efforts of very smart and capable researchers and developers across the world. But the time taken to write the actual code to perform ML is negligible compared to:

- choosing the right algorithm(s) for the specific task and data at hand

- tuning hyperparameters

- interpreting preliminary results and/or the output of statistical tests

- chasing down and cleaning data(!!)

- deploying the resulting ML pipeline into something more maintainable than a GitHub repo of inconsistently named Jupyter Notebooks strewn with hardcoded paths to CSV files and dangerously obsolete documentation written by employees who have since left your company.

Do you have any references that could help with 'choosing the right algorithm'? That seems to be something that comes with experience, or knowing how a particular algorithm works, or more importantly, does not work.

One possible strategy is:

1. Find out what kind of information is wanted from the model, and what data you have available that could help provide it. 1b. Note down any other requirements for the solution.

2. Find out how to formulate the information need in terms of a well-understood and well-researched problem.

3. Find out what the current best-performing, well-understood solutions are for this problem, usually through a literature search.

4. Choose a few candidates, and rank them wrt your particular requirements.

5. Test out 1-3 of them

This is a business-needs-first, top-down method. Prior knowledge of the details of many algorithms is not needed. The ability to understand terminology and quickly identify relevant material and papers is critical. A good overview of common problem formulations and methods builds up over time and speeds things up immensely.
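To make step 4 concrete, here is a rough sketch of ranking a few candidates against weighted requirements. Everything in it is an assumption for illustration: the requirement names, the weights, the candidate list, and especially the per-requirement scores, which are made-up placeholders and not measurements. In practice the scores would come from your literature search and your own judgment.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Candidate:
    name: str
    # Per-requirement score in [0, 1]. Placeholder values, not measured results.
    scores: Dict[str, float]


def rank_candidates(
    candidates: List[Candidate], weights: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (name, weighted score) pairs, best first.

    A requirement missing from a candidate's scores counts as 0.
    """
    total_weight = sum(weights.values())
    ranked = []
    for candidate in candidates:
        weighted = sum(
            weight * candidate.scores.get(requirement, 0.0)
            for requirement, weight in weights.items()
        )
        ranked.append((candidate.name, weighted / total_weight))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical requirements from step 1b, with relative importance.
weights = {"accuracy": 3.0, "interpretability": 2.0, "training_cost": 1.0}

# Hypothetical candidates from step 3, scored by hand.
candidates = [
    Candidate(
        "logistic regression",
        {"accuracy": 0.6, "interpretability": 0.9, "training_cost": 0.9},
    ),
    Candidate(
        "gradient-boosted trees",
        {"accuracy": 0.9, "interpretability": 0.5, "training_cost": 0.6},
    ),
    Candidate(
        "neural network",
        {"accuracy": 0.9, "interpretability": 0.2, "training_cost": 0.2},
    ),
]

ranking = rank_candidates(candidates, weights)
shortlist = [name for name, _ in ranking[:2]]  # the ones to test in step 5

for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

The point is not the arithmetic but that writing the requirements and weights down forces the trade-offs into the open before any model is trained. Change the weights and the shortlist changes with them.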

This one has been recommended to me a lot: https://scikit-learn.org/stable/tutorial/machine_learning_ma...

But I agree, I wish there were more resources on this; it seems to be mostly a trial-and-error process.

For my final project in control engineering I was tasked with writing a couple of ML algorithms and ranking them to suggest the best one. The code I was working on was not good at all and all the data was being processed in MATLAB, so I spent the whole project refactoring and proposing a viable way to use Python as a math backend for the Java application.

Anyway, I had to research the subject a little, and what I found is that there isn't a straightforward approach to choosing algorithms. I could be mistaken, but I believe the best approach would be to get to know each ML algorithm well and maybe use a cheat sheet to guide you. Ultimately, though, knowing your data set well (distribution of classes, occurrences, which traits are better for the categorization you want to do, etc.) will get you farther than looking for a recipe for choosing an algorithm.
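As a small example of what "knowing your data set" can mean in practice, here is a minimal sketch that checks the class distribution of a label column before any algorithm is chosen. The labels are toy values invented for illustration, and the helper names are my own, not from any library.

```python
from collections import Counter


def class_distribution(labels):
    """Return {label: fraction of samples} for a sequence of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def imbalance_ratio(labels):
    """Ratio of the most common class count to the least common one.

    Assumes at least one label is present.
    """
    counts = Counter(labels).values()
    return max(counts) / min(counts)


# Toy labels, invented for illustration only.
labels = ["ok"] * 90 + ["faulty"] * 10

dist = class_distribution(labels)
ratio = imbalance_ratio(labels)

print(dist)
print(ratio)
```

A heavily skewed distribution like this one is worth knowing about early: plain accuracy becomes a misleading metric, and it may steer you toward methods or evaluation schemes that handle imbalance.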
