
The kind of math that would be necessary to understand papers like [1][2][3][4].

[1]: https://arxiv.org/abs/1703.04933
[2]: https://arxiv.org/abs/1701.07875
[3]: https://arxiv.org/abs/1706.01350
[4]: https://arxiv.org/abs/1512.04860




You should keep in mind that probably none of those machine-learning researchers has studied only math specific to that domain, so their papers are likely to include whatever math they have a background in, plus any new techniques they had to learn to get their results.

That said, everything I saw in the papers you linked was linear algebra, calculus or probability theory plus the usual smattering of background notation and set theory.

Once you have a solid background in those areas, it is likely more productive to look up the specific concepts mentioned in a paper (such as the Kullback-Leibler divergence or the Bellman equation), because by then you are probably too deep in the woods to find one resource that adequately covers all those different directions.
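For a sense of what such a lookup turns up, here are the usual textbook forms of the two concepts named above. This is only a sketch in conventional notation, not taken from the linked papers: the symbols P, Q, V^*, R, \gamma are the standard ones, and the discrete case is assumed for simplicity.

```latex
% Kullback-Leibler divergence between discrete distributions P and Q
% (assumes Q(x) > 0 wherever P(x) > 0)
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}

% Bellman optimality equation for the state-value function of a finite MDP
% with reward R(s,a), transition probabilities P(s' \mid s, a), discount \gamma \in [0,1)
V^{*}(s) = \max_{a} \Big[ R(s,a) + \gamma \sum_{s'} P(s' \mid s, a) \, V^{*}(s') \Big]
```

Both only need sums, logarithms and conditional probabilities to read, which is the point: with the basics in place, the definition itself is usually a short lookup.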


That's mostly linear algebra, probability theory and calculus. You're going to have a difficult time self-studying all of that if you haven't had much exposure to it.

Books are probably a less efficient method of learning the mathematics if you have targeted subjects you want to learn about. They're typically suited to introductions and breadth-wise coverage of fields, but once you get higher up, "linear algebra" (for example) can get fuzzy with things like abstract algebra. That means you'll end up with several tome-like books to work through which can be productive, but it'll take a while and you'll need to map the material to the applications you're interested in on your own. It's more efficient to develop a good baseline of understanding about a broad subject area, learn the foundational theorems, then move on to the specific areas you need to learn. This is typically doable if you've developed the requisite mathematical maturity overall and if you have learned the "essentials."

Practically speaking: maybe pick up foundation texts like Strang's (linear algebra), Spivak's (calculus) and Ross' (probability theory). You're going to want a solid foundation in analysis before moving on to higher order probability theory, so drill down on that after you do a refresher on the calculus. From there you should attempt to read each paper (even if you struggle a lot), take notes on what confuses you or doesn't make sense, read the prior art on those topics and then come back to it.

I don't read machine learning papers particularly often, but I read mathematical cryptography papers very often (at least once per day I find myself in a new one). It's not typical that I read a research paper introducing a novel primitive or construction and follow the math immediately on a single pass; I often come across things I need to read about first. From a thirty-thousand-foot view the math for both of these subjects is broadly similar in rough topical surface area, so I think this methodology for academic reading is fairly applicable to most subjects that involve a lot of mathematical understanding.

Basically: don't approach learning the heavy math with a monolithic, brute-force approach as if you were in university. That's a slog and it's demotivating. Learn the minimum foundation for each area you need, then proceed to more advanced topics as you need them.


For calculus books, why Spivak over Swokowski? Can you compare and contrast them and explain why someone might suggest one over the other? I don't have a preference myself, but it would be good to understand the differences.

The linear algebra is obviously key, and I wish we'd done more of it in my advanced high school classes instead of elementary analysis.


Thanks a lot for your comment. I do have exposure to all three topics. I self-studied with Strang's MIT OCW course in high school, and took calculus and probability in high school and undergrad. So I'm not really looking for big introductory books, for two reasons: I don't really have time to go through big books, and since I already have some exposure, it is hard to find new things to learn from such introductory books. I was looking for something more concise that covers this mathematics efficiently.

EDIT: I think the main topic missing from my background is this so-called "analysis". I never formally studied it. Is there a more efficient way to study analysis than Spivak's, for someone who has a decent background otherwise?


Check out Mattuck's Introduction to Analysis.

Analysis is basically "really rigorous calculus". Basic analysis courses are also usually where you learn to do proofs.

(In reasonable generality, "calculus" stands for "rules of manipulation", while analysis is the mathematical theory behind the calculus. So I can teach you stochastic calculus in a couple of two-hour sessions, but understanding what the hell is going on (stochastic analysis) requires measure theory, some functional analysis and much courage.)
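A small illustration of the difference, as a sketch using the standard textbook definition (the symbols f, a, L, \varepsilon, \delta are the conventional ones, not something from this thread). In calculus you compute a limit by manipulation; in analysis you first pin down what the statement means and then prove it:

```latex
% Epsilon-delta definition of the limit of f at a
\lim_{x \to a} f(x) = L
\iff
\forall \varepsilon > 0 \;\; \exists \delta > 0 \;\; \forall x :\;
0 < |x - a| < \delta \implies |f(x) - L| < \varepsilon
```

Most of a first analysis course consists of proving the familiar rules of calculus from definitions like this one.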


I don't know why, but people always seem to forget that optimization is an important topic in machine learning that requires study. Boyd's book is the canonical source (and free online). If you want to get some functional analysis background at the same time, you can look at Optimization by Vector Space Methods. It's an older book but it is still worth a read and provides more theoretical foundations than Boyd.


> I don't particularly read machine learning papers often, but I read mathematical cryptographic ones very often (at least once per day I find myself in a new one).

Where do you find new ones? I'd like to get into this.


The IACR eprint archive, which is essentially arXiv for cryptography. Nearly everything worth reading in cryptography is either in the IACR eprint archive or in a conference proceeding. All proceedings from the IACR conferences can be read online for a fairly cheap membership fee. More often than not everything is cross-posted to the eprint archive even if it's published in a conference or a journal (of which there's basically one: The Journal of Cryptology).


Nah, Lax/Folland/Shiryaev!



