
Learning Math for Machine Learning - vincentschen
https://blog.ycombinator.com/learning-math-for-machine-learning/
======
ivan_ah
Here is a nice "cheat sheet" that introduces many math concepts needed for ML:
[https://ml-cheatsheet.readthedocs.io/en/latest/](https://ml-cheatsheet.readthedocs.io/en/latest/)

> As soft prerequisites, we assume basic comfortability with linear
> algebra/matrix calc [...]

That's a bit of an understatement. I think anyone interested in learning ML
should invest the time needed to _deeply_ understand Linear Algebra:
vectors, linear transformations, representations, vector spaces, matrix
methods, etc. Linear algebra knowledge and intuition is key to all things ML,
probably even more important than calculus.

Book plug: I wrote the "No Bullshit Guide to Linear Algebra" which is a
compact little brick that reviews high school math (for anyone who is "rusty"
on the basics), covers all the standard LA topics, and also introduces dozens
of applications. Check the extended preview here
[https://minireference.com/static/excerpts/noBSguide2LA_previ...](https://minireference.com/static/excerpts/noBSguide2LA_preview.pdf)
and the amazon reviews
[https://www.amazon.com/dp/0992001021/noBSLA#customerReviews](https://www.amazon.com/dp/0992001021/noBSLA#customerReviews)

~~~
ssivark
> _I think anyone interested in learning ML should invest the time needed to
> deeply understand Linear Algebra: vectors, linear transformations,
> representations, vector spaces, matrix methods, etc. Linear algebra
> knowledge and intuition is key to all things ML, probably even more
> important than calculus._

To play devil's advocate, (EDIT: an intuitive understanding of) probabilistic
reasoning (probability theory, stochastic processes, Bayesian reasoning,
graphical models, variational inference) might be equally if not more
important.

The emphasis on linear algebra is an artifact of a certain computational
mindset (and currently available hardware), and the recent breakthroughs with
deep neural networks (tremendously exciting, but modest success, in the larger
scheme of what we wish to accomplish with machine learning). Ideas from
probabilistic reasoning might well be the blind spot that's holding back
progress.

Further, for a lot of people doing "data science" (and not using neural
networks out the wazoo) I think that they can abstract away several linear
algebra based implementation details if they understand the probabilistic
motivations -- which hints at the tremendous potential for the nascent area of
"probabilistic programming".

~~~
throwawaymath
_> To play devil's advocate, probabilistic reasoning (probability theory,
stochastic processes, Bayesian reasoning, graphical models, variational
inference) might be equally if not more important._

And of course, you're not going to get very far with probability theory and
stochastic processes unless you have a mature understanding of analysis and
measure theory :)

This comment exchange neatly demonstrates the intrinsic problem. Most of these
articles start off much like this one does: by assuming "basic comfortability
with linear algebra." That sounds straightforward, but most software engineers
_don't have it._ They haven't needed it, so they haven't retained it even if
they learned it in college. It takes a good student a semester in a classroom
to achieve that "comfortability", and for most it doesn't come until a second
course or after revisiting the material.

If you don't already have it, you can't just use StackExchange to fill in the
blanks. The random walk method to learning math doesn't really pan out for
advanced material because it all builds on prior definitions. Then people like
you make a comment to point out (correctly) that probability theory is just as
important for all the machine learning that isn't just numerical optimization.
But unless you want to restrict yourself to basic statistics and discrete
probability, you're going to have a bad time working on probability without
analysis. And analysis is going to be a pain without calculus, and so on and so
forth.

There are certain things you need to spend a lot of time learning. Engineering
and mathematics are both like that. But I think many of these articles do a
disservice by implying that you can cut down on the learning time for the math
if you have engineering experience. That's really not the case. If you're
working in machine learning and you need to know linear algebra (i.e. you
can't just let the underlying library handle that for you), you can't just
pick and choose what you need. You need to have a robust understanding of the
material. There isn't a royal road.

I think it's really great people like the author (who is presumably also the
submitter) want to write these kinds of introductions. But at the same time,
the author is a research assistant in the Stanford AI Lab. I think it's fair
to say he may not have a firm awareness of how far most software engineers are
from the _prerequisites_ he outlined. And by extension, I don't think most
people know what "comfortability with linear algebra" means if they don't
already have it. It's very hard to enumerate your unknown unknowns in this
territory.

~~~
Xeronate
I get what you are saying, but is the right way to learn math to follow a
"connected path"? I've heard the "Art of Problem Solving" series works through
math in the correct order, but I'm not sure how far I would get on that alone.
Right now I'm trying to gain intuition in linear algebra via OCW with Strang,
but I would like to truly understand it. Is the only way to do a second
bachelor's in math?

~~~
throwawaymath
You don't need to do a second bachelor's - you really need four or so courses.
If you have the patience and dedication you can sit down with the textbooks
and work through them on your own.

~~~
asafira
This.

There's always more you might want to learn, but when people talk about these
basics, it's really just being super focused on 4 or so classes, not a whole
Ivy League undergrad curriculum in math.

probability & stats, multivariable calculus, and linear algebra will take you
a long way.

~~~
Xeronate
Cool. I will look into those, but I was asking out of a general interest in
math. I actually have no interest in machine learning. I'm bored of chasing
money. I'm interested in 3D computer graphics and math for math's sake.

------
JesseAldridge
I think a lot of people need to start from the basics because they don't have
a good foundation in math. The core problem is that schools will push you
along if you can somehow produce the correct answer for 70% of the problems on
a test. Combine this with intense pressure not to fail and you will very
likely end up
in higher level math courses with many gaping holes in your foundational
knowledge. You thus end up relying on tricks and memorization rather than
useful understanding. Here is a TED talk where Sal Khan of Khan Academy talks
about this:
[https://www.youtube.com/watch?v=-MTRxRO5SRA](https://www.youtube.com/watch?v=-MTRxRO5SRA)

After struggling to understand advanced math in a lot of different contexts I
decided to go through the entire K-12 set of exercises on Khan Academy. I
blazed through the truly elementary stuff like counting and addition in a few
hours, but I was surprised at how quickly my progress started slowing down. I
found I could not solve problems involving negative numbers with 100%
accuracy. Like (5 + (-6) - 4). I would get them right probably 90% of the time
but the thing is Khan Academy doesn't grant you the mastery tag unless you get
them right 100% of the time. I found most of my problems were due to sloppy
mental models. Like, I didn't understand how division works -- if someone were
to ask me what (3/4) / (5/6) even means conceptually I would not have been
able to provide a coherent, accurate explanation. "Uh... it's like taking 5/6
of 3/4... wait no that's multiplication... you need to flip the second
fraction over... for some reason..." It was around the 8th grade level that I
found myself having to actually work hard. (What does Pi even mean?) And I've
been through advanced Calculus courses at the university level.

~~~
throwawaymath
_> Like, I didn't understand how division works -- if someone were to ask me
what (3/4) / (5/6) even means conceptually I would not have been able to
provide a coherent, accurate explanation. "Uh... it's like taking 5/6 of
3/4... wait no that's multiplication... you need to flip the second fraction
over... for some reason..."_

In case you (or others reading this) still struggle to formalize division, a
very nice way to conceptualize it is as the inverse of multiplication. This
neatly sidesteps the problem of trying to figure out a clean analogue for what
it means to multiply a fraction of something by another fraction of
something, since the intuitive group-adding idea of multiplication sort of
breaks down with ratios.

Addition is a straightforward operation, but subtraction is trickier. For all
real _x_ there exists an additive inverse _-x_ satisfying _x_ + _(-x)_ = 0.
So to subtract 3 from 4 we instead take the sum 4 + (-3) = 1.

Likewise to multiply 3 by 4 we add four groups of 3: 3 + 3 + 3 + 3 = 12. We
accomplish division by using a multiplicative inverse: for every nonzero real
_x_ there exists a 1/_x_ such that _x_ (1/_x_) = 1.

So (3/4) / (5/6) is equal to (3 * 1/4) / (5 * 1/6). In other words, take the
multiplicative inverse of 4 and 6 and multiply them by 3 and 5 respectively.
Then multiply the first product by the inverse of the second product.

This parallels the axiomatic treatment of subtraction: subtraction is the sum
of a number and another number's additive inverse, and multiplication is
repeated addition. Division, then, is the product of a number and another
number's multiplicative inverse. From this perspective you need
not even understand division computationally if all you'll ever deal with are
fractions and not decimals.
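The whole chain can be checked mechanically; a quick sketch with Python's `fractions` module (the variable names are just for illustration):

```python
from fractions import Fraction

# Division as multiplication by the multiplicative inverse:
# (3/4) / (5/6) = (3/4) * (6/5)
a = Fraction(3, 4)
b = Fraction(5, 6)

quotient = a / b
via_inverse = a * (1 / b)  # 1/b is the multiplicative inverse of b

print(quotient)                 # 9/10
print(quotient == via_inverse)  # True
```

The "flip the second fraction over" rule is exactly the `1 / b` step: dividing by 5/6 is multiplying by its inverse, 6/5.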

~~~
andai
So this is really interesting, math is all about relationships, and you've got
a really solid understanding of how different operations are related.

------
cs702
This is excellent. Thank you for taking the time to write it.

I don't know what it is about math -- especially when it involves manipulation
of symbols as opposed to pictures or lay language -- that turns off so many
people.

The fact that so many software developers "don't like math" is ironic, because
they're perfectly happy to manipulate symbols such as "x", "file", or
"user_id" that stand in for other things every day. The entirety of
mathematical knowledge is very much like a gigantic computer language (a
formal system) in which every object is and must be precisely defined in terms
of other objects, using and reusing symbols like "x", "y", "+", etc. that
stand in for other things.

Perhaps the issue is _motivation_? Many wonder, "why do I need to learn this
hard stuff?" If so, the approach taken by Rachel Thomas and Jeremy Howard at
fast.ai seems to be a good one: build things, and then fill the theoretical
holes as needed, motivated by a genuine desire to understand.

~~~
mindcrime
_I don't know what it is about math -- especially when it involves
manipulation of symbols as opposed to pictures or lay language -- that turns
off so many people._

I can tell you at least part of it, from my subjective perspective. I tend to
"think" in a very verbal fashion and I instinctively try to sub-vocalize
everything I read. So when I see math, as soon as I see a symbol that I can't
"say" to myself (eg, a greek letter that I don't recognize, or any other
unfamiliar notation) my brain just tries to short-circuit whatever is going
on, and my eyes want to glaze over and jump to the stuff that _is_ familiar.

OTOH, with written prose, I might see a word I don't recognize, but I can
usually work out how to pronounce it (at least approximately) and I can often
infer the meaning (at least approximately) from context. So I can read prose
even when bits of it are unfamiliar.

There's also the issue that math is so linear in terms of dependencies, and
it's - in my experience - very "use it or lose it" in terms of how quickly you
forget bits of it if you aren't using it on a day-in / day-out basis.

~~~
ivan_ah
Very good point about reading/verbalizing symbols and notation. Once you know
what they mean, they are super useful for expressing complex concepts
precisely and compactly, but when you're getting started they look like an
alien language...

~~~
mindcrime
This is why I'm using Anki to memorize the Greek alphabet, and to keep basic
algebraic (h.s. algebra that is) stuff in mind. It might seem like a small
thing, but remembering the various rules for factoring, working with
fractions, dealing with exponents / roots, etc. is not easy when you don't do
math all the time.

------
g9yuayon
I'm kinda curious why so many people think that Linear Algebra Done Right is
an introductory book for beginners who have math anxiety. Don't get me wrong,
the book is great and I enjoyed working it through. It was a magical
experience when I saw how simple it was to prove some seemingly hard theorems
by just linking the right definitions and theorems. That said, the book does
require a certain level of mathematical maturity, as it achieves its elegance
by staying at a certain level of abstraction, and its style is quite formal --
so much so that a person who can use this book as their first linear algebra
textbook shouldn't have math anxiety at all.

~~~
throwawaymath
Speaking as one of the people who recommended it in this thread: I don't think
math anxiety is the right focus for which textbook to choose. More precisely,
I don't think you should try to solve that problem by getting a different
linear algebra textbook. To put it bluntly, someone with math anxiety probably
just doesn't have the mathematical maturity for linear algebra yet. In that
case they'd be doing themselves a disservice by attempting the material using
some sort of "more accessible" book; instead, they should focus on resolving
that anxiety through developing a solid foundation in the prerequisite
material.

Linear Algebra is typically the first course in which students have to
transition from predominantly rote computation to proof-based theory. Axler's
_Linear Algebra Done Right_ is very often the textbook used for that course
because it (mostly [1]) lives up to its name. This isn't Math 55: compared to
Rudin and Halmos, Axler is a very accessible introduction to linear algebra
_for those who are ready for linear algebra_. The floor for understanding this
subject doesn't get much lower than Axler (and in my opinion, it
doesn't get much _better_ at the undergraduate level either).

It's unfortunate that so many people want to skip to math they're not ready
for, because there's no shame in building up to it. A lot of frustration can
be eliminated by figuring out what you're actually prepared for and starting
from there. If that means reviewing high school algebra then so be it; better
to review "easy" material than to bounce around a dozen resources for advanced
material you're not ready for.

__________________

1. See Noam Elkies' commentary on where it could improve:
[http://www.math.harvard.edu/~elkies/M55a.10/index.html](http://www.math.harvard.edu/~elkies/M55a.10/index.html)

~~~
jacobolus
Tools from linear algebra can be accessible and useful to many people who
don’t want to (or are not yet prepared to) prove nontrivial theorems. Indeed,
a book like Axler’s should probably be used in a _second_ semester-long linear
algebra course for typical undergraduates wanting to study abstract
mathematics; a gentler more concrete introduction would probably be better for
students without previous exposure to linear algebra or hard mathematical
thinking. For engineers or others who want to use linear algebra in practical
contexts, something like Boyd & Vandenberghe’s new book might be better for a
first (or even second) course than Axler’s book:
[https://web.stanford.edu/~boyd/vmls/](https://web.stanford.edu/~boyd/vmls/)

Elkies’s post is in the context of a course for _very_ well prepared and
motivated first-year undergraduate pure math students who are racing through
the undergraduate curriculum because most of them intend to take graduate-
level courses starting in their second year.

Those two audiences are very far apart.

~~~
throwawaymath
_> Those two audiences are very far apart._

Yes, that's precisely why I said, "This isn't Math 55: compared to Rudin and
Halmos, Axler is a very accessible introduction to linear algebra for those
who are ready for linear algebra."

How do you propose to teach linear algebra beyond basic matrix operations and
Gaussian elimination if you're not teaching any theory? You can take some
disparate tools from linear algebra (just like you can with analysis to make
calculus), but the dichotomy between learning the mechanical tools of linear
algebra and learning its theory is a false one. Axler's
textbook is a very nice compromise that provides students an understanding of
_why_ things are the way they are while still teaching them how to work
through the numerical motions of things. You need not go so far as reading
_Finite Dimensional Vector Spaces_ if you want to avoid theory, but you need
_enough_ of it to put the mechanical operations in some kind of context.

~~~
jacobolus
Personally I think that the undergraduate mathematics curriculum does a poor
job of exposing people to examples and concrete situations before introducing
new abstractions.

Students are often entirely unfamiliar with the context (problems, structures,
goals, ...) for the new abstractions that are rained down on them, and end up
treating their proofs as little exercises in symbol twiddling / pattern
matching, without much understanding of what they are doing.

The undergraduate curriculum is put in this position because there is a lot of
material to get through in not much time, and students are generally
unprepared coming in. Ideally students would have a lot of exposure to basic
material and lots of concrete examples starting in middle school or before,
but that’s not where we are.

~~~
throwawaymath
I think we're in agreement on that point. In my experience most people's
difficulty with higher mathematics comes from the tendency of elementary and
high schools to push students along through grades without ensuring they've
really mastered the material. Unfortunately most students come to hate math
because they're introduced to ever more abstract and complex material when
they haven't achieved a solid foundation to build upon. I don't see this
artifact of our education system going away any time soon.

------
SpaceManNabs
My bullet list might be too ambitious and theory-focused, but this is what I
used, coming from my physics background.

Learn some:

Calc up to 3 (you can skip some of the divergence and curl stuff)

Linear algebra (no need for Jordan change of basis)

Real analysis

Intermediate probability theory (MLE, MAP, conjugate priors, minus the
measure theory stuff)

A little bit of differential geometry (at least geodesics. This is for
dimension reduction)

Discrete math (know counting and sums really well)

Learn a little bit of Physics (at least know Lagrangians and Hamiltonians)

A little bit of complex analysis (to know contour integration and
fourier/laplace transforms)

Some differential equations (up to Frobenius and wave equations)

Some graph theory (my weak spot, but I have used the matrix representations a
few times)

After all that, read some Kevin Murphy and Peter Norvig.

Congrats, now you can read most machine learning papers. The above will also
give you the toolkit to learn things as they come up like Robbins-Monro.

OP's article is much better if you are trying to be an ML
developer/practitioner. Like I said, this list might be too theory-focused,
but it lets me read lots of applied math papers that aren't ML focused.

~~~
blt
I'm interested to know where you encountered contour integrals in machine
learning?

~~~
mlevental
ya lol and Hamiltonians. sometimes people just reel off all the math they've
heard of to sound impressive. next we'll have people talking about de Rham
cohomology because of TDA (or something like that)

~~~
cfcf14
Hamiltonian mechanics, along with many other seemingly out-of-place 'advanced'
maths, shows up in modern Bayesian statistics pretty frequently. Hamiltonian
Monte Carlo and Riemannian Manifold Monte Carlo are pretty cutting edge
(although they are implemented in popular libraries like MC-Stan and PyMC3),
and both require fairly advanced physics to really understand.

Additionally, we're seeing the introduction of even more sophisticated
stochastic samplers (stochastic gradient hamiltonian monte-carlo, etc) that
require even more esoteric branches of math and physics to really grok. I have
a strong math background but frequently find myself struggling with a lack of
knowledge in statistical mechanics when trying to read papers in these areas.

So yeah - there's plenty of bullshit and exaggeration. But there's also some
wicked cool stuff happening which requires very sophisticated (and
specialized) knowledge to understand.

~~~
blt
I agree. My original question came from curiosity, not incredulity :)

~~~
cfcf14
:) Personally I've never used any serious complex analysis in my job (I'm very
grateful too, because I always struggled a bit with it). The closest thing
I've seen, which I did run into recently, is the use of complex numbers to
compute very accurate finite differences. It's one of those delightful tricks
that is both elegant and useful:
[https://blogs.mathworks.com/cleve/2013/10/14/complex-step-differentiation/](https://blogs.mathworks.com/cleve/2013/10/14/complex-step-differentiation/)

I've been working in golang, which fortunately has a built-in complex128 type,
so it's proved very helpful in a project!
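For anyone curious, the trick is essentially one line: since f(x + ih) ≈ f(x) + ih·f'(x) for small h, the imaginary part divided by h recovers the derivative with no subtractive cancellation. A sketch in Python (the test function and step size are just illustrative):

```python
import cmath
import math

def complex_step_derivative(f, x, h=1e-20):
    # f'(x) ≈ Im(f(x + i*h)) / h -- no subtraction of nearly equal
    # numbers, so h can be tiny and the result stays accurate.
    return f(complex(x, h)).imag / h

# d/dx sin(x) at x = 1 should be cos(1)
approx = complex_step_derivative(cmath.sin, 1.0)
print(abs(approx - math.cos(1.0)))  # essentially zero
```

Compare that to a forward difference (f(x+h) - f(x))/h, which loses roughly half its digits to cancellation for any choice of h.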

~~~
blt
oh cool, I really like Cleve Moler's Matlab posts.

I took a complex analysis class and did OK, but I get the feeling that EEs are
the ones who really benefit from it (at least in the applied world). They seem
to have some very rich analyses of linear dynamical systems using frequency
domain methods.

------
shashanoid
For those who don't know, please check out 3blue1brown videos on youtube for a
better understanding of concepts like Linear algebra and other things required
for machine learning. Thank me later.

~~~
vincentschen
I love 3blue1brown! Will add to resources. :)

------
skadamat
This is something we're striving hard to do at the startup I'm involved with
(end-to-end resources for learning machine learning, with just a high school
math background assumed).

In our Data Scientist Track ([https://www.dataquest.io/path/data-scientist?](https://www.dataquest.io/path/data-scientist?)), I specifically
focused on teaching K-nearest neighbors first b/c it has minimal math but you
can still teach ML concepts like cross-validation, and then I wrote Linear
Algebra and Calculus courses before diving into Linear Regression.
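For what it's worth, a bare-bones k-nearest-neighbors classifier really does need almost no math beyond distances; a minimal numpy sketch (the function name and toy data are mine, not Dataquest's):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to each row
    nearest = np.argsort(dists)[:k]              # indices of the k closest
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data: two clusters labeled 0 and 1
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.9, 1.0], [1.0, 0.8]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.05, 0.1])))  # 0
print(knn_predict(X, y, np.array([0.95, 0.9])))  # 1
```

Everything here is distances, sorting, and counting, which is why it works as a first algorithm before any linear algebra or calculus.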

[https://www.dropbox.com/s/lh23y44dsg96xpv/Screenshot%202018-...](https://www.dropbox.com/s/lh23y44dsg96xpv/Screenshot%202018-08-01%2014.55.45.png?dl=0)

------
xenihn
I recommend the No Bullshit books for anyone with no real math background past
trig to get their feet wet, and/or anyone who hasn't done any serious math
study for years.

[https://minireference.com/](https://minireference.com/)

~~~
ivan_ah
Thx! Had I known you'd post this, I wouldn't have self-promoted so
shamelessly :) I'll add some direct links to PDF previews:

MATH & PHYS book:
[https://minireference.com/static/excerpts/noBSguide_v5_previ...](https://minireference.com/static/excerpts/noBSguide_v5_preview.pdf)

LA book:
[https://minireference.com/static/excerpts/noBSguide2LA_previ...](https://minireference.com/static/excerpts/noBSguide2LA_preview.pdf)
+ free tutorial:
[https://minireference.com/static/tutorials/linear_algebra_in...](https://minireference.com/static/tutorials/linear_algebra_in_4_pages.pdf)

~~~
xenihn
Hah I beat you by 2 minutes. Thanks for the great books!

------
ultrasounder
Thanks for posting this!! Was actually searching for this the other day here
on HN and found a link to [https://github.com/mml-book/mml-book.github.io](https://github.com/mml-book/mml-book.github.io). I
haven't checked it out yet, but the links in the OP look solid.

~~~
vincentschen
Looks interesting! Have you gone through it yourself? And how does it compare
to other resources?

~~~
ultrasounder
Like I commented above, I haven't had a chance to go through the
[https://github.com/mml-book/mml-book.github.io](https://github.com/mml-book/mml-book.github.io) book yet. But now that I have read your article in
full, I think diving in headlong with ML and then backfilling the
Math/Stats/Probability holes is the best approach to learning ML engineering.
It's like the SICP authors' musing that modern software development is
"programming by poking at it using APIs" rather than learning to program just
for the heck of it.

------
rasmi
Hi Vincent, you may want to point your "Best Practices for ML Engineering"
link to the non-PDF version here: [https://developers.google.com/machine-
learning/guides/rules-...](https://developers.google.com/machine-
learning/guides/rules-of-ml/)

~~~
vincentschen
Thanks for the heads up!

------
Bizarro
The LAFF: Linear Algebra class just started again for the "fall semester"
[https://courses.edx.org/courses/UTAustinX/UT.5.02x/1T2015/co...](https://courses.edx.org/courses/UTAustinX/UT.5.02x/1T2015/course/)

Maybe one of these days I'll complete it :)

I really like 3Blue1Brown for a wide range of math topics. He's just a great
teacher.

[https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2x...](https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab)

Frankly, I find the UT Austin linear algebra class less than ideal, but it's
free, with lots of classmates and material, so...

------
thanatropism
Re: PCA vs. tSNE. I don't know much about tSNE, but if it _is_ a "manifold
learning method" as the sklearn docs say, you could try something like LTSA
instead:

e.g.
[http://www.aaai.org/ocs/index.php/aaai/aaai11/paper/download...](http://www.aaai.org/ocs/index.php/aaai/aaai11/paper/download/3603/3894)

Granted, it's not difficult to understand what a manifold _is_, but it took me
a number of attempts to get it, and I only did when studying them formally
with Spivak 1963. Now the concept of a manifold seems patently obvious to me
and not really needing much formalization, but...
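For anyone who wants to try LTSA without implementing the paper, scikit-learn exposes it as a mode of `LocallyLinearEmbedding`; a minimal sketch (the synthetic data and parameter values are just illustrative):

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

# A noisy curve embedded in 3-D; its intrinsic dimension is 1.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 3 * np.pi, 300))
X = np.column_stack([np.sin(t), np.cos(t), t]) + 0.01 * rng.normal(size=(300, 3))

# Local Tangent Space Alignment via method="ltsa"
ltsa = LocallyLinearEmbedding(n_neighbors=12, n_components=1, method="ltsa")
X_1d = ltsa.fit_transform(X)
print(X_1d.shape)  # (300, 1)
```

The recovered one-dimensional coordinate should vary monotonically with the curve parameter t, which is the "unrolling" a manifold method is supposed to do.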

~~~
vincentschen
Thanks for the reference... will give it a read.

~~~
nimithryn
There's also UMAP, which is new but looks promising.

------
amorphous
Just wondering if someone had a similar experience: I absolutely loved Math in
school, zipped through the classes, always one of the best.

Then things changed at university (studying computer science) and I completely
lost interest. Not sure why (bad teacher, going from being best in class to
being average, the math at uni being different from school math).

Now, much later, I regret not having followed through and miss the beauty of
Math. I'm re-discovering it and wondering how I could use more of it in my
work.

------
harias
Nice article. Would you recommend this MOOC?
[https://www.coursera.org/specializations/mathematics-machine-learning](https://www.coursera.org/specializations/mathematics-machine-learning)
It doesn't focus on probability or statistics though. If not, is
there any other MOOC you would suggest?

------
Tenoke
>A student’s mindset, as opposed to innate ability, is the primary predictor
of one’s ability to learn math (as shown by recent studies).

The article seems good overall, but I only skimmed the rest after seeing a
citation of a 5-year-old Atlantic article describing disputed and at minimum
highly exaggerated findings presented as 'shown in recent studies'.

~~~
mkl
That may be a bad reference, but there are lots of studies about this. See the
book _Mathematical Mindsets_ by Jo Boaler for many references.

------
Bizarro
I really want a shallow dive into machine learning and I know I need linear
algebra as a foundation. I would love an interactive course in linear algebra
where we could input matrices and see some visual stuff with animations.

~~~
ivan_ah
Check out 3blue1brown on youtube
[https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2x...](https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab)
Everything this guy does is gold!

This is also really good for connecting LA concepts with visuals
[http://immersivemath.com/ila/index.html](http://immersivemath.com/ila/index.html)

------
dsiegel2275
Does anyone have suggestions on learning resources for matrix calculus? I'm
trying to come up to speed with the topic and could use pointers to worked
examples, video lectures, etc.

~~~
iamaaditya
[The Matrix Calculus You Need For Deep Learning](https://arxiv.org/pdf/1802.01528.pdf)

------
lordfoom
Anyone have a suggestion for a good online course in linear algebra?

~~~
throwawaymath
Yes, UIUC offers very good online math courses:
[https://netmath.illinois.edu/college/math-415](https://netmath.illinois.edu/college/math-415).
There is also a more pure/abstract version of that course available.

If you don’t care about accreditation and are patient, sit down with Axler’s
_Linear Algebra Done Right_ and Hoffman & Kunze’s _Linear Algebra_ , in that
order.

I would caution you against trying to learn linear algebra using a “take what
you need” approach. A random walk approach to learning the material is faster
than an accumulation approach, but it’s more brittle and prone to confusion. A
lot of things which appear to be irrelevant or unnecessary for machine
learning (computation or research) can be imperative for understanding or
implementing much more complex concepts later on.

~~~
lordfoom
Thank you! I am interested in the knowledge rather than the credits, so I
appreciate the book recommendations :)

------
saintPirelli
I have been waiting for something like this for months. This is inconceivably
valuable to me. Thank you so much!

------
coherentpony
In the author's example, the function max(0, x) that they subsequently
differentiate isn't differentiable (at x = 0).

~~~
artwr
It is within the context of distributions or generalized functions
([https://en.wikipedia.org/wiki/Distribution_(mathematics)](https://en.wikipedia.org/wiki/Distribution_\(mathematics\)))
but people are often loose on the terminology and tend to just use the term
"functions". It's a wonderful topic, with a lot of interesting applications in
differential equations and physics.

I just found a quick explanation by Terence Tao about why people are generally
loose in this case: some properties carry over nicely from the smooth (here,
differentiable) category to the rough ones by passing to limits and density
arguments:
[http://www.math.ucla.edu/~tao/preprints/distribution.pdf](http://www.math.ucla.edu/~tao/preprints/distribution.pdf)

Of course there are exceptions.
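In practice, autodiff frameworks sidestep the issue by simply picking a value for the derivative at the kink; a sketch of the usual convention (choosing 0 at x = 0 is a convention, not a theorem):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # max(0, x) is not differentiable at x = 0; frameworks pick a
    # subgradient there, commonly 0, which is what this does.
    return (x > 0).astype(float)

xs = np.array([-2.0, 0.0, 3.0])
print(relu(xs))       # [0. 0. 3.]
print(relu_grad(xs))  # [0. 0. 1.]
```

Any value in [0, 1] at the kink is a valid subgradient; since a training run almost never lands exactly on x = 0, the choice rarely matters.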

------
graycat
Might look at the thread

Foundations Machine Learning (bloomberg.github.io)

at

[https://news.ycombinator.com/item?id=17519591](https://news.ycombinator.com/item?id=17519591)

There machine learning (ML) is basically a lot of empirical curve fitting. The
context is usually a lot of data: thousands of variables, millions or billions
of data points -- observations pairing the values of thousands of independent
variables with the value of the corresponding dependent variable. The work is
all a larger, more-data version of this: You have a high school style X-Y
coordinate system and some points plotted there. So, you want to find values
for coefficients a and b so the line

y = ax + b

fits the points as well as possible. But, you can do variations, try to fit,
say,

log(y) = a sin(x) + b

Or replace log or sin with any functions you want and try again.

The _logic_, the rational support, is essentially as follows: take, say, 1000
x-y pairs. Partition these into 500 _training_ data and 500 _test_ data. Find
the best fit you can, using whatever fits, to the training data. Then take the
equation and see how well it fits the test data. If the fit on the test data
is also good, then that is your _model_.

Now you want to apply the model in practice, that is, apply it to data it did
not see among the given 1000 points. So in the application, you will be given
a value of x, plug it into the equation, and get the corresponding value of y.
That's what you want -- maybe the value of y gives you Y|N for ad targeting,
Y|N cancer, what MSFT will be selling for next month, what the revenue will be
for next year, etc.

The rational, logical justification here is an assumption (which should have
some justification from somewhere) that the x you are given and the y you want
for that value of x are sufficiently _like_ the x-y values you had in the
original 1000 points.

Okay. Empirical curve fitting to a lot of data to make a predictive model,
that is found with training data, tested with test data, and applied where the
given data in the application is _like_ the data used in the fitting.
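The whole train/test recipe above fits in a few lines; a sketch with numpy (the true line y = 2x + 1 and the noise level are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# 1000 (x, y) pairs from a noisy line: y = 2x + 1 + noise
x = rng.uniform(-5, 5, 1000)
y = 2 * x + 1 + rng.normal(0, 0.5, 1000)

# Partition into 500 training and 500 test points
x_train, y_train = x[:500], y[:500]
x_test, y_test = x[500:], y[500:]

# Fit y = a*x + b by least squares on the training half only
a, b = np.polyfit(x_train, y_train, 1)

# Judge the model by how well it fits the held-out half
test_rmse = np.sqrt(np.mean((a * x_test + b - y_test) ** 2))
print(round(a, 2), round(b, 2))  # close to 2 and 1
print(test_rmse)                 # close to the noise level, 0.5
```

Swapping `np.polyfit(..., 1)` for a fit of log(y) or sin(x) terms is exactly the "try other functions" variation described above; the train/test logic stays the same.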

The OP mentions that some people believe that to make progress toward real
machine intelligence, we need more math than what I outlined.

My guess is that to make that intended progress, for all but some tiny niche
cases, we first need some much more powerful and quite different ideas,
techniques, etc. than in the curve fitting ML I outlined.

Yes, there is a chance that with lots of data from working brains and lots of
such empirical fitting we will be able to find some fits that will uncover
some of the workings of the brain crucial for real intelligence. Uh, that's a
definite maybe!

But there is a lot more to what can be done to build predictive models than
such curve fitting, empirical or otherwise. I outlined some such in the thread
that I referenced above.

So, for the question in the OP, what math? Well, if you want to pursue directions
other than the empirical curve fitting in the Bloomberg course I referenced
above, my experience is -- quite a lot. For the education, start with a good
undergraduate major in pure math. So, cover the usual topics, calculus,
abstract algebra, linear algebra, differential equations, advanced calculus,
probability, statistics. Then continue with more in algebra, analysis, and
geometry.

