Roughly speaking, the roadmap for a typical ML/AI student looks like this:
0) Learn the prerequisites of math, CS, etc. That usually means calc 1-3, linear algebra, probability and statistics, and fundamental CS topics like programming, OOP, data structures and algorithms.
1) Elementary machine learning course, which covers all the classic methods.
2) Deep Learning, which covers the fundamental parts of DL. Note, though, this one changes fast.
From there, you kind of split between ML engineering and ML research.
For ML engineering, you study more technical things that relate to the whole ML-pipeline. Big data, distributed computing, way more software engineering topics.
For ML research, you focus more on the science itself - which usually involves reading papers, learning topics which are relevant to your research. This usually means having enough technical skills to translate research papers into code, but not necessarily at a level that makes the code good enough to ship.
I'll echo what others have said, though: use the tools at hand to implement stuff. It is fun and helpful to implement things from scratch, for the learning, but it is easy to get extremely bogged down trying to implement every model out there.
When I tried to learn "practical" ML, I took some model, and tried to implement it in such a way that I could input data via some API, and get back the results. That came with some challenges:
- Data processing (typical ETL problem)
- Developing and hosting software (core software engineering problems)
- API development
And then you have the model itself; lots of work goes into that alone.
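To make those pieces concrete, here's a minimal sketch of that kind of setup. Everything here is made up for illustration - the "model" is a fixed linear scorer standing in for a real trained model, and the request handler is a plain function where a real service would use an HTTP framework:

```python
import json

# "ETL" step: parse raw CSV-style rows into numeric feature vectors
# (a hypothetical input format, just for the sketch).
def extract_features(raw_rows):
    return [[float(v) for v in row.split(",")] for row in raw_rows]

# Stand-in "model": fixed weights in place of anything actually trained.
WEIGHTS = [0.5, -0.2, 1.0]

def predict(features):
    return sum(w * x for w, x in zip(WEIGHTS, features))

# "API" layer: accept a JSON payload, return JSON results.
def handle_request(body: str) -> str:
    rows = json.loads(body)["rows"]
    scores = [predict(f) for f in extract_features(rows)]
    return json.dumps({"scores": scores})
```

Even in this toy form, the three problem areas show up as three separate layers - which is roughly why each one eats its own share of the work.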
As someone a wee bit along the journey but with the maths dragging me down a bit, I've found that, while in a perfect world I'd love to get my maths up to solid 2nd year undergrad level, it's just going to take me another year or so. That hasn't stopped me moving forwards. I understand y = ax + b, bits of linear algebra, gradient descent, but I still don't have the critical intuition to pass a college level maths exam.
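For what it's worth, y = ax + b already goes a long way. A least-squares line fit, for instance, is a few lines of arithmetic - a toy sketch (not anything from the thread), no libraries needed:

```python
# Fit y = a*x + b by ordinary least squares, closed form.
def fit_line(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx  # line passes through the mean point
    return a, b
```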
This has helped me build the intuition for understanding these concepts in ML, and as an experienced developer I've found I've been able to pick up the ML stuff relatively easily - it's mostly libraries at the practical level. This has in turn shown me two things: ML is data quality, prep, and monitoring; I actually like the maths: it annoys me that there's this whole branch of knowledge that I don't grok intuitively and I want to know more. As I go deeper on the maths, I find myself retrospectively contextualising my ML knowledge.
So: do both and they'll reinforce each other - just accept you'll be lost for a bit.
Also: working with LLMs is incredible, as you can skip the training step and go straight to using the models. They're fucking wild technology.
I personally think it is possible to grasp how many ML models learn through intuition alone, without the formal math knowledge - but only up to a certain point.
From my time in college studying this, you had approximately four types of students:
1) Those that didn't understand how models worked, and lacked the math to theoretically understand the models (dropped the class after a couple of weeks)
2) Those that understood (intuitively) how the models worked, but lacked the math to read and formalize models. Lots of students from the CS program fell under this group - but I think that is due to CS programs here having less math requirements than traditional engineering and science majors.
3) Those that understood how the models worked, and had the math knowledge. This was the majority of students.
4) Those that did not understand the models, but had the math knowledge.
Of these, 2-3 were the most common types of students. On the rare occasion, you had type 4 students. They would have no problem with deriving formulas or proving stuff - but they'd more or less freeze up or start to stumble when asked to explain, on a blackboard, how the models worked.
With that said, if someone has any ambition of doing ML research, I think math prereqs are a must. Hell, even people with good (graduate level) math skills can have a hard time reading papers, as there are so many different fields/branches of math involved. Lots and lots of inconsistent math notation, overloading, and all that.
There's a lot of contrived "mathiness" in papers, even where simple diagrams would do the trick. If your paper doesn't include a certain amount of equations / math, people won't take it seriously... so some authors will just spam their papers with somewhat related equations, using whatever notation they're most comfortable with.
#2 is interesting to me. My computer engineering degree had me take enough math classes that it only took a few extra to get a minor in math, so it's surprising that CS students lack the math background. Must be different curriculums.
If I can offer some unsolicited advice, try to seek out ML educational material with polished visualizations. If you're struggling with the math, trying to learn the concepts by reading libraries or textbooks or papers will be very hard - you might understand a specific thing if you look at it closely, but it will be hard to develop an intuition for why it works. A good visualization, or a strong educator explaining by way of analogy, can make a huge difference.
For example, gradient descent in conjunction with your learning rate can be visualized as calculating your error vector (your gradient), stretching it by your learning rate, applying it to your parameters, computing the next error vector, and so on. If you think of what applying this vector might look like in 3d space, training your model is basically getting all your parameters to fall into a hole (an optimum). This kind of conceptualization helps you understand the purpose and impact of the learning rate: a way to stretch out the steps you make to descend into holes, so that you might hopefully "shoot over" local non-global optima while still being able to "fall into" other optima.
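That loop - compute the gradient, stretch it by the learning rate, apply it - fits in a few lines. A toy 1-d sketch on f(w) = (w - 3)^2, chosen only because its gradient is trivial:

```python
# Gradient descent on f(w) = (w - 3)**2, whose minimum sits at w = 3.
def descend(w, learning_rate, steps):
    for _ in range(steps):
        grad = 2 * (w - 3)            # gradient of (w - 3)**2
        w = w - learning_rate * grad  # stretch the gradient, apply it
    return w
```

With learning_rate=0.1, starting from w=0, the iterates fall steadily into the hole at 3. Crank the learning rate past 1.0 on this function and each step overshoots the hole by more than the last - the "shooting over" effect taken too far, so the iterates diverge instead of settling.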
You could read papers and stare at code for a very very long time without developing that kind of intuition. I don't think I could ever come up with this myself just dabbling.
And as a side note, in mathematics at least for me, the most unexpectedly important factor in understanding something is exposure time. In college and grad school I found I didn't fully intuit most material until about 12 months after I had finished studying it - even if I hadn't actively studied it at all in the interim. I think it has something to do with the different ways our brains encode recent/medium/long term knowledge, or sleep, or something - not really sure, but I do know the earlier you start learning something and exposing yourself to the concepts, the sooner your subconscious builds that intuitive understanding. So you can do yourself a huge favor by just making an effort to dive into the math material even if it feels like a slog or you're not getting it right this minute - you might wake up one day in a few months and just "get it" somehow.