That said, everything I saw in the papers you linked was linear algebra, calculus or probability theory plus the usual smattering of background notation and set theory.
Once you have a solid background in those areas, it is likely more productive to look up the specific concepts mentioned in a paper (such as the Kullback-Leibler divergence or the Bellman equation), because by then you are probably too deep in the woods to find one resource that adequately covers all those different directions.
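To make "look up the specific concepts" concrete: something like the Kullback-Leibler divergence is a single formula you can chase down and then verify yourself in a few lines. A minimal sketch (the distributions p and q below are made-up toy values, not from any paper):

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) = sum over i of p_i * log(p_i / q_i).

    Assumes p and q are sequences of probabilities over the same
    support, each summing to 1, with q_i > 0 wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy example: a heavily biased coin vs. a fair coin.
p = [0.9, 0.1]
q = [0.5, 0.5]

print(kl_divergence(p, q))  # positive: p "diverges" from q
print(kl_divergence(p, p))  # 0.0: zero divergence from itself
```

Reimplementing a definition like this, then checking its basic properties (non-negativity, zero iff p == q), is usually faster than reading a whole textbook chapter around it.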
Books are probably a less efficient way to learn the mathematics if you have targeted subjects you want to learn about. They're typically suited to introductions and breadth-first coverage of fields, but once you get higher up, the boundaries of "linear algebra" (for example) get fuzzy and blur into things like abstract algebra. That means you'll end up with several tome-like books to work through, which can be productive, but it'll take a while and you'll have to map the material to the applications you're interested in on your own. It's more efficient to develop a good baseline understanding of a broad subject area, learn the foundational theorems, and then move on to the specific areas you need. This is typically doable if you've developed the requisite mathematical maturity overall and have learned the "essentials."
Practically speaking: maybe pick up foundational texts like Strang's (linear algebra), Spivak's (calculus), and Ross's (probability theory). You'll want a solid foundation in analysis before moving on to higher-level probability theory, so drill down on that after you do a refresher on the calculus. From there, attempt to read each paper (even if you struggle a lot), take notes on what confuses you or doesn't make sense, read the prior art on those topics, and then come back to it.
I don't read machine learning papers particularly often, but I read mathematical cryptography papers very often (at least once a day I find myself in a new one). It's rare that I can follow the math of a paper introducing a novel primitive or construction on a single pass, and I often come across things I need to read about first. From a thirty-thousand-foot view the math for both subjects covers broadly similar ground, so I think this methodology for academic reading applies to most subjects that demand a lot of mathematical understanding.
Basically: don't approach learning the heavy math with a monolithic, brute-force approach as if you were in university. That's a slog and it's demotivating. Learn the minimum foundation for each area you need, then proceed to more advanced topics as you need them.
The linear algebra is obviously key, and I wish we'd done more of it in my advanced high school classes instead of elementary analysis.
EDIT: I think the main topic missing from my background is this so-called "analysis". I never formally studied it. Is there a more efficient way to study analysis than Spivak's, for someone who otherwise has a decent background?
Analysis is basically "really rigorous calculus". Basic analysis courses are also usually where you learn to do proofs.
(To a reasonable degree of generality, "calculus" stands for "rules of manipulation", while analysis is the mathematical theory behind calculus. So I can teach you stochastic calculus in a couple of two-hour sessions, but understanding what the hell is going on (stochastic analysis) requires measure theory, some functional analysis, and much courage.)
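To give one standard example of what "rigorous calculus" looks like: a first analysis course replaces the manipulation rule "take the limit" with a precise statement you then prove things about, the epsilon-delta definition of a limit:

```latex
\lim_{x \to a} f(x) = L
\iff
\forall \varepsilon > 0 \;\, \exists \delta > 0 :
\quad 0 < |x - a| < \delta \implies |f(x) - L| < \varepsilon
```

Most of a basic analysis course is learning to write proofs directly from definitions like this one.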
Where do you find new ones? I'd like to get into this.