
Ask HN: What maths are critical to pursuing ML/AI? - chrisherd
What maths must be understood to enable pursuit of either of the above fields? Are there any seminal texts/courses/content which should be consumed before starting?
======
CuriouslyC
You absolutely need a solid grounding in multi-variable calculus, linear
algebra, probability theory and information theory. It will also be helpful to
be well versed in graph theory.

In my opinion one of the best starting points is "Information Theory,
Inference and Learning Algorithms" by David MacKay. It's a bit long in the
tooth now, but it is still one of the most approachable and well-written books
in the field.

Another old book that stands up very well is "Probability Theory: the Logic of
Science" by E. T. Jaynes.

"Elements of Statistical Learning" by Hastie, Tibshirani, and Friedman is also good.

"Bayesian Data Analysis" by Andrew Gelman is another great read.

"Deep Learning" by Ian Goodfellow and Yoshua Bengio is useful for getting
caught up with recent advances in that field.

~~~
tptacek
I'm not super interested in ML but I am very interested in applied mathematics
in computer science. I've got a fair bit of linear algebra due to
cryptography, but have had virtually no need of any form of calculus (unless
I'm relying on it without knowing it) in my career.

So beyond just saying that you'd need grounding in multivariable calculus to
do serious ML work, I would be super interested in hearing more about why that
is and what kinds of problems crop up in ML that demand it.

~~~
jules
Most of ML is fitting models to data. To fit a model you minimise some error
measure as a function of its real valued parameters, e.g. the weights of the
connections in a neural network. The algorithms to do the minimisation are
based on gradient descent, which depends on derivatives, i.e. differential
calculus.
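
The loop described above can be sketched in a few lines; this is an illustrative toy (the quadratic error and step size are made up), not any particular library's implementation:

```python
# Minimize the error f(w) = (w - 3)^2 by gradient descent.
# The derivative f'(w) = 2*(w - 3) says which way is downhill.
def gradient_descent(lr=0.1, steps=100):
    w = 0.0
    for _ in range(steps):
        grad = 2 * (w - 3)   # differential calculus supplies this
        w -= lr * grad       # step against the gradient
    return w

print(round(gradient_descent(), 4))  # 3.0, the minimizer
```

Fitting a neural network is the same loop with millions of parameters and the gradients computed by backpropagation.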

If you're doing Bayesian inference you're going to need integral calculus
because Bayes' law gives the posterior distribution as an integral.
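
A toy numerical version of that integral (a coin-flip posterior under a uniform prior; the example is mine, not from the thread):

```python
import numpy as np

# Posterior for a coin's bias theta after observing 7 heads in 10 flips,
# with a uniform prior. The denominator in Bayes' law is an integral,
# approximated here on a grid.
theta = np.linspace(0.0, 1.0, 10001)
dx = theta[1] - theta[0]
likelihood = theta**7 * (1 - theta)**3    # binomial kernel
unnorm = likelihood * 1.0                 # times a uniform prior
posterior = unnorm / (unnorm.sum() * dx)  # normalize by the integral

post_mean = (theta * posterior).sum() * dx
print(round(post_mean, 3))  # 0.667; the exact Beta(8, 4) mean is 8/12
```

In real models the integral is rarely tractable on a grid, which is why MCMC and variational methods exist.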

For ML you just need Calculus 1 and 2. Curl/div and Stokes' theorem are
Calculus 3, which is a physics thing. You don't need that for ML.

You may need the basics of functional analysis in certain areas of ML, which
is arguably Calculus 4.

~~~
LrnByTeach
Could not agree more .......

> Most of ML is fitting models to data. To fit a model you minimise some error
> measure as a function of its real valued parameters, e.g. the weights of the
> connections in a neural network. The algorithms to do the minimisation are
> based on gradient descent, which depends on derivatives, i.e. differential
> calculus.

> If you're doing Bayesian inference you're going to need integral calculus
> because Bayes' law gives the posterior distribution as an integral.

------
leecarraher
It will depend on the level you plan to engage in the ML/AI space. If you just
want a job in ML/AI, you are in luck. Due to the growing assortment of
available, mostly-to-fully automated solutions like DataRobot, H2O, scikit-
learn, and Keras (w/ TensorFlow), the only math you will absolutely 'need' is
probably just statistics. Regardless of what's going on behind the scenes with
whatever automatically tuned and selected algorithm your chosen solution
uses, you will still need some stats in the end to show the brass that 'your'
model works. The upside is that you can then spend time learning feature
extraction, data engineering, and the aforementioned toolkits, in particular
what models they make available.

If you want to develop new techniques and algorithms, the sky's the limit;
you'll of course want stats too, though.

~~~
rcarrigan87
Can you recommend a Stats course that would be most relevant for people trying
to be more practitioners (not researchers)?

~~~
samstave
Please just recommend the best online stats course you know of as a general
toolbelt-notch.

~~~
mindcrime
There's a series of courses on Coursera, part of a Specialization from Duke
titled something like "Statistics and Probability with R". I've taken the
first few classes in that series and have found them pretty good. The class on
Bayesian statistics is a little more difficult, but not too bad. I'll just say
that you might want to complement the class with another book or other
references on Bayesian stats. I've used this book:

[https://www.amazon.com/Bayes-Rule-Tutorial-Introduction-
Baye...](https://www.amazon.com/Bayes-Rule-Tutorial-Introduction-
Bayesian/dp/0956372848)

------
irchans
Basic probability is very helpful: expectation, standard deviation, P(A and B)
= P(A)*P(B) if A and B are independent, P(A or B) = P(A)+P(B) if A and B are
mutually exclusive. Also, knowing algebra is very helpful.
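
Those rules are easy to sanity-check by simulation; a hypothetical sketch with two dice:

```python
import random

random.seed(0)
n = 100_000
# Two independent events: A = first die is even, B = second die is even.
hits_a = hits_b = hits_ab = 0
for _ in range(n):
    a = random.randint(1, 6) % 2 == 0
    b = random.randint(1, 6) % 2 == 0
    hits_a += a
    hits_b += b
    hits_ab += a and b

p_a, p_b, p_ab = hits_a / n, hits_b / n, hits_ab / n
# For independent events, P(A and B) is close to P(A) * P(B).
print(abs(p_ab - p_a * p_b) < 0.01)  # True
```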

In a way, you don't really need to know much more because there is a lot of
good software out there.

If you want to learn more math, learn Linear Regression, Logistic Regression,
p-values, probability density functions, cumulative distribution functions, the
Central Limit Theorem, Gaussian Distributions, Exponential Distributions,
Binomial Distribution, (maybe) Student-T distribution.

If you want to learn even more, first learn matrices (adding, multiplying,
inverting, rank, span, matrix decomposition (SVD, and eigendecomposition are
the most important)).
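
A small sketch of those matrix operations with NumPy (the matrix here is made up for illustration):

```python
import numpy as np

# A rank-2 matrix: the third row is the sum of the first two.
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 3.0]])

# Rank via SVD: count the nonzero singular values.
U, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-10))
print(rank)  # 2

# Eigendecomposition of the symmetric matrix A^T A: its eigenvalues
# are the squared singular values of A.
eigvals = np.linalg.eigvalsh(A.T @ A)
print(np.allclose(sorted(eigvals, reverse=True), s**2))  # True
```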

If you want to learn even more, it's time to learn calculus. Integral calculus
is needed for continuous probability distributions and information theory.
Differential calculus is needed to understand back propagation.

There are a lot of other good suggestions written by the other commentators.

~~~
septimus111
This is a great list of the main concepts to know.

------
KirinDave
If you care about actually reading the journals, as I do, and you had a very
poor math education (mine was abysmally opposed to both math and science, as
enemies of religion), then here are the things I've determined I need to know
to read journals:

\- Core statistics. You need to be familiar with how statisticians treat data,
because it comes up a lot.

\- Calculus. You do not need to be a wizard at working the numbers but you do
need to understand how to describe the process of differentiation and
integration over multiple variables comfortably.

\- Linear algebra. It's essentially the basis for everything, even more than
statistics.

\- Numerical methods for computing. I constantly have to refer to references
to understand why people make the choices they do.

\- Theory of computation and the research clustered around it. Familiarity
here helps a lot. Sometimes I even catch errors or am able to recognize
improvements available. Also there is a lot of crossover, as one would expect.
An example: everyone is remembering how good automatic differentiation is! And
given that properly combined differentiable functions are also differentiable,
AD lets you optimize over your optimization process. It's differentiable
turtles all the way down.
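
Forward-mode automatic differentiation is small enough to sketch with dual numbers; this is a toy illustration, not a production AD system:

```python
# A minimal forward-mode automatic differentiation sketch using
# dual numbers: each value carries its derivative alongside it.
class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__

def f(x):
    return 3 * x * x + 1   # f(x) = 3x^2 + 1, so f'(x) = 6x

x = Dual(2.0, 1.0)         # seed the derivative dx/dx = 1
y = f(x)
print(y.val, y.dot)  # 13.0 12.0
```

Reverse mode, which backpropagation uses, applies the same chain-rule bookkeeping in the opposite order.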

My next big challenge is nonparametric statistics. Many researchers tell me
that this is a very fruitful place to be and many methods there are
increasingly making improvements in ML.

~~~
cynicaldevil
How did you learn these topics? Did you solve problems for each of them?

~~~
KirinDave
Reading, study, and textbooks.

Tbh, I'm not where I want to be with them. So maybe next year I can talk about
2017 and my math odyssey.

------
mindcrime
It depends on how deep you want to go and what your goals are, but I'd say
that CuriouslyC pretty much nailed it. Multi-variable calculus, linear
algebra, and probability / stats are definitely the core.

If you're interested in finding more "freely available online" maths
references, check out:

[http://people.math.gatech.edu/~cain/textbooks/onlinebooks.ht...](http://people.math.gatech.edu/~cain/textbooks/onlinebooks.html)

[http://www.openculture.com/free-math-
textbooks](http://www.openculture.com/free-math-textbooks)

[https://open.umn.edu/opentextbooks/SearchResults.aspx?subjec...](https://open.umn.edu/opentextbooks/SearchResults.aspx?subjectAreaId=7)

[https://ocw.mit.edu/courses/online-
textbooks/#mathematics](https://ocw.mit.edu/courses/online-
textbooks/#mathematics)

[https://aimath.org/textbooks/approved-
textbooks/](https://aimath.org/textbooks/approved-textbooks/)

There's also a TON of high-quality maths instructional content on Youtube,
Videolectures.net, etc. For example, there's some really good stuff by David
MacKay (also mentioned in CuriouslyC's post) here:

[http://videolectures.net/david_mackay/](http://videolectures.net/david_mackay/)

Be sure to check out Professor Leonard:

[https://www.youtube.com/user/professorleonard57](https://www.youtube.com/user/professorleonard57)

Gilbert Strang:

[https://www.youtube.com/results?search_query=gilbert+strang](https://www.youtube.com/results?search_query=gilbert+strang)

and 3blue1brown:

[https://www.youtube.com/channel/UCYO_jab_esuFRV4b17AJtAw](https://www.youtube.com/channel/UCYO_jab_esuFRV4b17AJtAw)

as well.

~~~
jonahx
Another upvote for 3blue1brown. I just watched his linear algebra series and
it's probably the most outstanding math instruction I've encountered.

~~~
zimzim
[https://www.youtube.com/user/EugeneKhutoryansky](https://www.youtube.com/user/EugeneKhutoryansky)

another nice yt channel about math and physics.

------
WhitneyLand
Surprising level of disagreement here on a few items for a sub field that has
its own degree tracks.

Multivariable calc you either "absolutely" need or don't really need. You
should be well versed in graph theory, or you don't need it much.

Surely some of the contradiction is caused by different assumptions about what
the goal is. But some of it is hard to relate to as a reader. For example, I
haven't been in the field but have tried to read enough to understand the
concepts, and having studied graph theory I don't see how it's a top-5
recommendation.

I don't doubt anyone's experience, would just be nice to know which assumption
is behind a suggestion.

~~~
PeterisP
To apply known methods in cases where they mostly work, you don't need to know
the math behind them, you just need to know _basic_ stats and _basic_
probability to interpret the results. So if the assumption is that you'll
simply be solving your problems by applying the known methods using the
(great!) tooling made by others, then you don't need the math background; you
can certainly train undergrads to solve quite nifty problems with the powerful
tools without going into much if any detail about the underlying math,
treating it as an engineering problem of following best practices. After all,
the choice of e.g. a particular gradient descent optimization algorithm is
_not_ based on their mathematical properties (the proven bounds are _so_ far
away from practical results, and a better proven bound doesn't correlate that
much with having better results) but on empirical evaluation, and in most
cases you're not going to implement any of the low-level structures/formulas
on your own anyway, in practical solution development you're just going to
choose them from a list by name in the framework of your choice.

On the other hand, if the assumption is that your particular problem _is not_
solvable easily and reliably with the current approaches, then quite a lot of
the math background helps - if you want to improve on the current results, or
debug/understand why your solution doesn't work as intended, or why the
conceptual solution can't work on your problem because of incompatible
assumptions, then these areas of math are useful. If you want to use a new
bleeding-edge construct, or a rare niche construct that's not yet implemented
in the framework of your choice, then you're going to need to write it
yourself, and _then_ you need to understand how it works.

There's a large distance between _using and applying_ ML techniques and
_researching and improving_ ML techniques; it's a continuum, but there's space
for _many_ people standing purely in the applied end.

~~~
WhitneyLand
I think it's more nuanced. On average the better grasp of the theory an
engineer has, the more pathways to success they have. Making better decisions,
less guessing, leading a team, wanting to have input into future products and
services, and so on.

Just having things be less opaque reduces cognitive load, makes more room for
creative solutions.

------
sunsu
None. You can be a productive ML engineer without understanding the math. Many
elitist engineers here will downvote me, but it's true. ML libraries that let
you get productive quickly have come a long way. BUT, you have to have a
solid understanding of WHICH algorithms/tools to use WHEN. There is also a lot
of "voodoo" knowledge to gain that isn't well documented or explained
(unrelated to maths).

------
gtani
For going all in,
[https://ocw.mit.edu/courses/mathematics/18-657-mathematics-o...](https://ocw.mit.edu/courses/mathematics/18-657-mathematics-
of-machine-learning-fall-2015/syllabus/)

But a good number of people who are doing this work haven't taken real
analysis, or it's been a while, so you should be current on multivariable and
vector calculus. Calculus of variations shows up from time to time.

For math reviews, look at the following (there's others if you want more refs,
ping me):

[http://www.deeplearningbook.org/](http://www.deeplearningbook.org/)

[https://metacademy.org/roadmaps/](https://metacademy.org/roadmaps/)

[http://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning...](http://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/understanding-
machine-learning-theory-algorithms.pdf)

------
ratsimihah
Make sure to differentiate between AI researcher and applied AI software
engineer, or whatever that is called.

The former needs the mathematical background mentioned here to develop
groundbreaking algorithms or improve on existing ones, while the latter merely
implements them and requires a much smaller mathematical toolset.

------
Myrmornis
It depends whether you want to work more as an engineer / data analyst, or
more as an "ML researcher". For the latter, yes, as everyone says below,
you need to be totally comfortable with multivariable calculus, linear
algebra, probability and statistics, numerical optimization, etc. But many
jobs are more practical in nature, in which case the essential skill is being
able to run a bunch of different models with different parameter values,
collect and interpret the results efficiently and reproducibly, and be able
to talk about them and make recommendations for the way forward. In those
jobs you're not actually going to need to be able to derive the updates for
backpropagation, even though it's certainly satisfying to understand it.

~~~
mindcrime
Yep. We have to keep in mind the distinction between "applied ML" and "ML
research" (while realizing that this is a continuum, not a binary
distinction). Not everybody is doing cutting edge original research... some
people really can just get by with downloading DL4J, reading a few tutorials,
and then applying a basic network to their problem, and create some value in
the process.

I think cars are a good analogy. In the early days of automobiles, you needed
to be something just short of a mechanical engineer to keep one going for any
length of time, and it was routine to need to carry around tools and spare
parts to perform significant repairs. You really needed to know a pretty good
bit about how the car worked to use it effectively. But over time cars
developed better abstractions and became more dependable, and it became
possible to operate a car without caring one lick about how it works, beyond
knowing that it needs gas (or electricity!) and taking it in for the
occasional tuneup / tire change / alignment / etc.

I wouldn't say we're at the point yet where ML affords one the opportunity to
be completely divorced from caring about the underlying details, but I think
we are at a point where you can legitimately get useful stuff done without
needing to be able to, say, derive the equations for backprop by hand.

------
mtzet
Honestly, I don't think learning a bunch of prerequisites before starting is
necessary, especially for a field as wide as ML/AI. It's much better to start
out trying to learn something you're interested in, and then try to fill in
the gaps. This will also help you understand, and motivate, the underlying
theory you're reading.

So for example, start with some source in ML/AI you'd like to read. If you get
stuck, ask somewhere (possibly an online forum like this) what field you're
having trouble with and how to get started there.

------
wadams19
Totally depends on where you want to land on the engineering-AI-products to
pure-AI-research spectrum.

So, what do you mean by "pursuing"?

But even still, I would caution against trying to upload a bunch of new math
concepts into your brain without first understanding the ML/AI context.

I would say go through both of Andrew Ng's ML and DL courses on Coursera.

Then, pick a domain/ problem that you're interested in.

Then, read papers about how ML/AI is applied in that domain.

Then, try to reproduce a paper that you understand and are interested in.

------
charlescearl
Michael I. Jordan's suggested reading list has been posted here a few times

[https://news.ycombinator.com/item?id=1055389](https://news.ycombinator.com/item?id=1055389)

------
coconut_crab
Maybe unrelated to OP's question, but I have always felt that it is impossible
to get a job in AI/ML without a PhD in that field (by getting a job I mean
doing something new/useful and not just coding algorithms devised by other
people). I studied mechatronics in university and am fairly comfortable with
math (calculus, linear algebra, and stats); I even wrote a small neural
network back then to optimise parameters for lathe machining. But that's
nowhere near enough for a job in AI/ML. Unlike writing a web page, which
someone can learn within a week to produce something usable, I feel like you
need years and years of studying to barely get a start in ML/AI, and there is
no hope for us non-computer-scientists at all.

[Added] Of course writing webpages pays well enough, but I still can't shake
off this feeling that I am missing something by not jumping on the AI/ML train
though.

------
amrrs
Statistics and Probability - For a non-math background, OpenIntro.org with the
R and SAS labs is a good one. The Khan Academy videos on the same topics again
make a lot of concepts easier.

[http://www.r-bloggers.com/in-depth-introduction-to-
machine-l...](http://www.r-bloggers.com/in-depth-introduction-to-machine-
learning-in-15-hours-of-expert-videos/) Introduction to Statistical Learning
[http://www-bcf.usc.edu/~gareth/ISL/](http://www-bcf.usc.edu/~gareth/ISL/)
(by Trevor Hastie, Rob Tibshirani, et al.; free, I believe); for more depth,
Elements of Statistical Learning by the same authors.

Linear Algebra (the linear algebra section in Andrew Ng's Introduction to
Machine Learning is short and crisp).

If you're not scared of derivatives, you can look into them too. But you can
easily survive, and even excel, as a data scientist or ML practitioner with
just these.

------
sumitgt
I won't really comment about ML/AI in general. But if you specifically care
about getting into deep learning, I would say only bother looking into basic
linear algebra and matrix algebra.

Since you would rely on frameworks like TensorFlow to handle figuring out the
derivatives for you, you don't really need to know much calculus. Just read up
on what the derivative of a function at a particular point signifies. This
should give you enough intuition to understand things initially.

A skill that would really come in useful is the ability to look at a function
and think about how increasing/decreasing one of the variables would affect
its value. This helps develop intuition around a lot of concepts used in deep
learning topologies.
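
That intuition can be checked numerically; a hypothetical one-neuron example (the names and values here are made up):

```python
# Checking the "nudge one variable" intuition numerically:
# how does a small increase in one weight change the output?
def output(w1, w2, x1=1.0, x2=2.0):
    return w1 * x1 + w2 * x2   # a one-neuron linear model

eps = 1e-6
base = output(0.5, 0.5)
nudged = output(0.5 + eps, 0.5)
sensitivity = (nudged - base) / eps   # approximates d(output)/d(w1)
print(round(sensitivity, 3))  # 1.0, i.e. the input x1 that w1 multiplies
```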

------
rdrey
I previously attempted Andrew Ng's old course, but didn't complete the
tutorials. Now I would start in this order:

Watch the course.fast.ai lectures quickly, just to see a lot of practical
ML/AI applications. You'll see how effective you can be just by knowing the
tools with very little math background.

Next I'd look at the NEW Andrew Ng introduction on Coursera. It is much more
approachable than his first course. You might still feel a little overwhelmed
by a few equations, but then you'll implement them yourself in numpy. (And the
ipython/jupyter notebooks are really well written, walking you through every
step.)

------
mendeza
I wish the people answering this question were current deep learning engineers
or data scientists who use deep learning in real-world settings; I am worried
that people who are not credible are giving advice, which is not valuable. I
am a masters student taking a PhD class in Bayesian machine learning to figure
this out as well. I hope to have a better answer for this by the end of the
course!

~~~
mindcrime
_I wish the people answering this question were current deep learning
engineers or data scientists who use deep learning in real-world settings_

Why do you want answers only from people doing deep learning? Deep learning is
just a subset of the overall field (albeit an incredibly popular and useful
one).

Anyway, the simple solution is just to use some simple machine learning of
your own to analyze the data set which these threads constitute. Look for
patterns... are certain answers being repeated over and over again, by
different posters? Then I'd argue that your Bayesian posterior for "this is
legitimately important" should go up.

Take linear algebra, for example... given the sheer number of people saying
"linear algebra" in their answers, it seems a reasonable bet to me that LA is
really, truly useful. Either that or there's some _really_ freaking group-
think shit going on. :-)

~~~
mendeza
I guess what I am looking for is advice from practitioners who won't lead
astray people who are really interested in diving deep into ML.

I have attempted to read the Statistical Learning book, and it's so daunting
because the book expects a lot of background knowledge, and it takes a while
to really wrap your head around these concepts. I think people should learn
from a lighter book before diving into these books if they are lacking the
background.

My current approach to pursuing a career in DL and ML is going to graduate
school, taking a graduate ML course, and trying to apply my knowledge to
different problems I am interested in.

I am reading the Bishop book, Pattern Recognition, now. From the perspective
of having to re-learn a lot of calculus and probability, I think that book is
more approachable than Statistical Learning.

My advice (which I am attempting now) for diving deep into ML is as follows:

1\. Take the Bayesian ML class (at Cornell)

2\. Read/study Pattern Recognition by Bishop, for 5 hrs/day

3\. Try the exercises; if you fail, review the solutions

4\. If lost (which is usual), review the missing concepts via MIT OCW Scholar
courses

------
jochenleidner
1\. You can get a long way with high school calculus and probability theory.

2\. Regarding books, I second the late David MacKay's "Information Theory,
Inference and Learning Algorithms" and the second edition of "Elements of
Statistical Learning" by Tibshirani et al. (there's also a more accessible
version of a subset of the material, targeting MBA students, called James et
al., An Introduction to Statistical Learning). Duda/Hart/Stork's Pattern
Classification (2nd ed.) is also great. The self-published volume by Abu-
Mostafa/Magdon-Ismail/Lin, Learning from Data: A Short Course, is impressive,
short, and useful for self-study.

3\. Wikipedia is surprisingly good at providing help, and so is Stack
Exchange, which has a statistics sub-forum, and of course there are many
online MOOC courses on statistics/probability and more specialized ones on
machine learning.

4\. After that you will want to consult conference papers and online tutorials
on particular models (k-means, Ward/HAC, HMM, SVM, perceptron, MLP, linear and
logistic regression, kNN, multinomial naive Bayes, ...).

------
xchip
All I needed to write my conv net library was to understand the chain rule and
some basic multiplication.

People like to make this look harder than it is.
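
For what it's worth, the chain rule really does carry most of the weight; a toy two-layer composition (the weights are made up), checked against a finite difference:

```python
import math

# Forward pass: y = sigmoid(w2 * sigmoid(w1 * x)).
# Backprop is just the chain rule applied layer by layer.
def sigmoid(z):
    return 1 / (1 + math.exp(-z))

x, w1, w2 = 1.0, 0.4, 0.7

h = sigmoid(w1 * x)
y = sigmoid(w2 * h)

# Chain rule, outermost factor first:
# dy/dw1 = sigmoid'(w2*h) * w2 * sigmoid'(w1*x) * x
dy_dw1 = y * (1 - y) * w2 * h * (1 - h) * x

# Compare against a finite-difference estimate.
eps = 1e-6
y_eps = sigmoid(w2 * sigmoid((w1 + eps) * x))
numeric = (y_eps - y) / eps
print(abs(dy_dw1 - numeric) < 1e-6)  # True
```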

------
ivan_ah
Probability theory and linear algebra are pretty much the core. Learning LA
will help you become comfortable with multi-dimensional quantities and vector
spaces, and give you some powerful computational techniques, e.g., SVD==PCA.

Here is a short tutorial on linear algebra:
[https://minireference.com/static/tutorials/linear_algebra_in...](https://minireference.com/static/tutorials/linear_algebra_in_4_pages.pdf)
and a preview of the full book:
[https://minireference.com/static/excerpts/noBSguide2LA_previ...](https://minireference.com/static/excerpts/noBSguide2LA_preview.pdf)
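
The SVD==PCA connection in a few lines of NumPy (synthetic data, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-D data: the second coordinate is roughly twice the first.
t = rng.normal(size=(200, 1))
X = np.hstack([t, 2 * t + 0.1 * rng.normal(size=(200, 1))])

# PCA via SVD: center the data; the right singular vectors of the
# centered matrix are the principal directions.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Vt[0]

# The leading direction should line up with (1, 2) normalized.
expected = np.array([1.0, 2.0]) / np.sqrt(5.0)
print(abs(pc1 @ expected) > 0.99)  # True (the sign of pc1 is arbitrary)
```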

------
blt
If you want to understand SVMs deeply, a course in convex optimization. In
general, deriving maximum likelihood estimates for a lot of classic machine
learning models involves the method of Lagrange multipliers. But not deep
neural networks :)

------
pramalin
I found this study plan very useful:
[https://www.analyticsvidhya.com/blog/2017/01/the-most-
compre...](https://www.analyticsvidhya.com/blog/2017/01/the-most-
comprehensive-data-science-learning-plan-for-2017/)

It provides a very good idea of the courses required and their time frame. I
roughly followed this path, but took "Analytics Edge"
[https://www.edx.org/course/analytics-edge-
mitx-15-071x-3](https://www.edx.org/course/analytics-edge-mitx-15-071x-3) for
an introduction to ML algorithms.

------
framebit
A thorough, intuitive grounding in statistics is crucial, IMO.

Doing any kind of ML means questioning all the assumptions that go into your
results and understanding how those assumptions could affect the outcome. That
process starts in stats.

------
bjourne
It depends on what "pursuing ML/AI" means. I've written a recommendation
engine while barely understanding linear algebra, and a spam filter without
knowing Bayes' theorem. A programmer can work on ML systems without having a
solid foundation in higher maths. However, if you want to develop your own
solutions then you surely need the math.

I would recommend reading Toby Segaran's Programming Collective Intelligence:
[http://shop.oreilly.com/product/9780596529321.do](http://shop.oreilly.com/product/9780596529321.do)

------
ChadyWady
For ML, the other users gave a good coverage of topics. But AI is an
incredibly broad field, and each specialty uses different math topics.
Learning all of the math would be infeasible. What are your particular
interests?

Russell and Norvig have a good book at
[http://aima.cs.berkeley.edu](http://aima.cs.berkeley.edu) that covers many
different topics in AI, although it is definitely not comprehensive. I would
say that whatever you learn in an undergraduate CS degree would give you a
good starting point for learning any particular AI topics.

------
septimus111
Not knowing anything about you, I'll assume that

\- you are starting with the equivalent of a high school level of maths

\- you want to take a ML course or read an ML book without feeling totally
lost

As some commenters have said, Calculus, Probability and Linear Algebra will be
very helpful.

Some people like to recommend the "best" or "most important" books which you
"should" read, but there is a strong chance these will end up sitting on a
bookshelf, barely touched. So I will recommend some books which are perhaps
more accessible.

\- Calculus by Gilbert Strang

\- Linear Algebra by Gilbert Strang

For Probability: I don't have any favourites, sorry.

------
pramalin
You can also dive in first and then cover the math behind ML, by taking Andrew
Ng's courses. [https://www.coursera.org/learn/machine-
learning](https://www.coursera.org/learn/machine-learning)
[https://www.coursera.org/specializations/deep-
learning](https://www.coursera.org/specializations/deep-learning)

------
deepnotderp
Basic high school calculus and linear algebra is really the only _required_
thing.

I would recommend probability theory and statistics as well.

------
e19293001
You can learn the required maths along the way through Andrew Ng's deep
learning course at coursera.

------
pveierland
The following is a concise and good explanation of necessary knowledge of
information theory:

[http://colah.github.io/posts/2015-09-Visual-
Information/](http://colah.github.io/posts/2015-09-Visual-Information/)

------
jules
* Calculus

* Linear algebra

* Optimisation

* Probability

Various universities have very good course content freely available online,
often including textbook recommendations, course notes, exercises, sample
exams, and video lectures. Realistically it is probably going to be quite
difficult to learn this on your own.

------
ronald_raygun
I'd say everything in this list is good to know
[http://pages.cs.wisc.edu/~tdw/files/cookbook-
en.pdf](http://pages.cs.wisc.edu/~tdw/files/cookbook-en.pdf)

~~~
roumenguha
I believe there's a more recent version of this document available here:
[http://statistics.zone](http://statistics.zone)

------
wickedgamer
Calculus: functions, derivatives, integration, analytic geometry. That's all,
I think.

------
dtjon
Probability, and thus multivariate calculus and partial differential
equations. Linear algebra. Convex optimization, and thus multivariate calculus
and partial differential equations. Some principles of statistics are usually
helpful.

~~~
jules
Why do you need partial differential equations? I don't think you necessarily
need any knowledge of differential equations to do ML, though the top ML
people certainly would know it because of their general math education.

~~~
septimus111
I spent a lot of time messing with PDEs as a student but sadly that knowledge
hasn't been very useful - I've only seen them come up in quite specialised
areas like optical flow...

------
EternalData
Some people have had a more comprehensive view on this -- if I were to focus
on one field of math to understand really well though, it'd be statistical
reasoning and the understanding of probability and uncertainty.

------
bitL
Calculus (preferably both multi-variate and discrete), probability,
statistics, operations research, graph theory, topology, computational
complexity. All depends on how deep you'd like to go.

~~~
clircle
Discrete calculus? I think you mean univariate.

------
wickedgamer
Yet it depends. There's a lot out there on Google one could learn.

------
master_yoda_1
There are two ways to approach ML/AI:

1) First read all the prerequisites and then work on a problem

2) Start working on a problem and learn all the ML/AI math as you need it

The second option works best.

------
graycat
Part I

(1) Calculus

Generally should have college freshman and sophomore calculus.

(1.1) Functions

So, one can better understand what a _function_ is. E.g., the function

    
    
         f(x) = 3x^2 + 1.
    

(1.2) Derivatives

Then one will learn how to find the slope of the graph of a function. That is
the _derivative_ of the function. E.g., for the function f with f(x) = 3x + 2,
as in high school algebra, the slope is 3. So for each x, the derivative of f
at x is just 3.

The derivative of function f is denoted by either of

    
    
         f'(x) = d/dx f(x)
    

E.g., for function f(x) = 3x^2 + 1 it turns out that

    
    
         f'(x) = 6x.
    
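A quick numerical check of that claim (a central finite difference, for illustration):

```python
# Check numerically that the derivative of f(x) = 3x^2 + 1 is 6x.
def f(x):
    return 3 * x**2 + 1

def numeric_derivative(f, x, eps=1e-6):
    return (f(x + eps) - f(x - eps)) / (2 * eps)  # central difference

for x in [0.0, 1.0, 2.5]:
    print(round(numeric_derivative(f, x), 3), 6 * x)  # each pair matches
```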

(1.3) Integration

For function

    
    
         g(x) = 6 x
    

maybe we want to know what function f(x) will give us

    
    
         f'(x) = g(x)
    

Finding such a function f is _anti-differentiation_, that is, it undoes
differentiation. So, sure,

    
    
         f(x) = 3x^2 + C
    

for any constant C.

Such anti-differentiation is also the way to find the area under a curve. So,
one can use that to find the area of a circle, the volume of a cylinder, etc.
Done that way, the anti-differentiation is _integration_.

The fundamental theorem of calculus shows how differentiation and integration
are related.
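The fundamental theorem can be seen numerically, too: the area under
g(x) = 6x, computed as a Riemann sum, matches f(b) - f(a) for the
antiderivative f(x) = 3x^2. A small sketch (names are illustrative):

```python
# Fundamental theorem of calculus, numerically: the integral of
# g(x) = 6x from a to b equals f(b) - f(a), where f(x) = 3x^2.

def g(x):
    return 6 * x

def riemann_sum(g, a, b, n=100000):
    # midpoint rule: sum of g at midpoints times the strip width
    dx = (b - a) / n
    return sum(g(a + (i + 0.5) * dx) for i in range(n)) * dx

area = riemann_sum(g, 0.0, 2.0)
print(area)  # close to 3*2^2 - 3*0^2 = 12
```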

(1.4) Analytic Geometry

Commonly taught at the beginning of a calculus course is _analytic_ geometry.

So, take a cone and cut it with a plane. Then the cut surface will be one of a
circle, an ellipse, a parabola, a hyperbola, or just two crossed straight
lines. Since those curves come from a cone, they are the _conic sections_.

There is some simple associated algebra.

Conic sections are important off and on; e.g., applied math is awash in
circles; the planets move in ellipses; a baseball moves in a parabola or
nearly so; an electron moving toward a negative charge will turn away from
that charge in a hyperbola.

It turns out that in linear algebra (below) circles and ellipses are
important.

(1.5) Role of Calculus

Calculus was invented by Newton (and, independently, Leibniz) as part of
working with force and acceleration to understand the motion of the planets.

E.g., if at time t function d(t) gives distance traveled, then function v(t) =
d'(t) is the velocity at time t and function a(t) = v'(t) is the acceleration
at time t.

Then Newton's second law is

    
    
         F(t) = m a(t)
    

where F(t) is the force at time t applied to mass m.

Calculus is the first approach to the analysis of continuous change and is a
pillar of civilization.

Knowledge of calculus will commonly be assumed in work in ML/AI, data science,
statistics, optimization, applied math, engineering, etc.

E.g., a lot in ML, AI, and data science is getting best fits to data; best
fitting is to minimize errors in the fit; such minimization is mostly a
calculus problem; one of the main steps in ML is steepest descent, and that is
from a derivative.
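That connection between fitting and derivatives can be shown in a few lines: a
minimal steepest-descent sketch (the names and the toy one-parameter error are
illustrative assumptions, not from the text):

```python
# Minimal gradient (steepest) descent: minimize the squared error
# E(w) = (w*x - y)^2 for a single data point by stepping against dE/dw.

def descend(x, y, w=0.0, lr=0.01, steps=1000):
    for _ in range(steps):
        grad = 2 * (w * x - y) * x   # dE/dw, by the chain rule
        w -= lr * grad               # step opposite the gradient
    return w

# Fit w so that w * 2 is close to 6; the minimizer is w = 3.
w = descend(x=2.0, y=6.0)
print(w)  # close to 3
```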

Probability theory (e.g., evaluating coin tossing, poker hands, accuracy in
ML) will be important in ML/AI, etc.; two of the basic notions in probability
are cumulative distributions and density distributions; the cumulative is from
an integration, and the density is from a differentiation.
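That relation between cumulative and density can also be checked numerically
for the standard normal distribution, using only the standard library (the
helper names here are illustrative):

```python
import math

# Standard normal: the cumulative distribution function is
# Phi(x) = 0.5 * (1 + erf(x / sqrt(2))), and its derivative is the
# density phi(x) = exp(-x^2 / 2) / sqrt(2 * pi).

def cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def pdf(x):
    return math.exp(-x**2 / 2) / math.sqrt(2 * math.pi)

# Differentiating the cumulative numerically recovers the density.
h = 1e-6
x = 0.7
print((cdf(x + h) - cdf(x - h)) / (2 * h))  # close to pdf(0.7)
```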

~~~
graycat
Part II

(2) Linear Algebra

(2.1) Linear Equations

The start of linear algebra was seen in high school algebra, solving systems
of linear equations.

E.g., we seek numerical values of x and y so that

    
    
         3 x - 2 y = 7
    
         -x  + 2 y = 8
    

So, that is two equations in the two unknowns x and y.

Well, for positive integers m and n, we can have m linear equations in n
unknowns ( _linear_ as in the example above; a careful definition is omitted
here).

Then depending on the constants, there will be none, one, or infinitely many
solutions.

E.g., likely the central technique of ML and data science is fitting a linear
equation to data. There the central idea is the set of _normal equations_,
which are linear (and, crucially, _symmetric_ and _positive semi-definite_, as
covered carefully in linear algebra).
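For the one-variable case, the normal equations collapse to a closed form; a
small sketch (the function name is illustrative):

```python
# Least-squares fit of a line y = a + b*x. In one variable the normal
# equations reduce to the familiar closed-form slope and intercept.

def fit_line(xs, ys):
    n = len(xs)
    xm = sum(xs) / n
    ym = sum(ys) / n
    b = sum((x - xm) * (y - ym) for x, y in zip(xs, ys)) / \
        sum((x - xm) ** 2 for x in xs)
    a = ym - b * xm
    return a, b

# Points that lie exactly on y = 1 + 2x are recovered exactly.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # close to (1, 2)
```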

(2.2) Gauss Elimination

The first technique for attacking linear equations is Gauss elimination. There
one can determine whether there are none, one, or infinitely many solutions.
For one solution, one can find it. For infinitely many solutions, one can find
one solution and characterize the rest as arising from arbitrary values of
several of the variables.
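For a square system with one solution, Gauss elimination is short to write
out; a sketch that solves the 2 x 2 example from above (the function name is
illustrative, and this simple version assumes the system has a unique
solution):

```python
# Gauss elimination with partial pivoting for a square system A x = b.

def gauss_solve(A, b):
    n = len(A)
    # Build the augmented matrix [A | b].
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        # Pivot: pick the row with the largest entry in this column.
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        # Eliminate this column from the rows below.
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) \
               / M[r][r]
    return x

# 3x - 2y = 7 and -x + 2y = 8, from the text.
print(gauss_solve([[3, -2], [-1, 2]], [7, 8]))  # [7.5, 7.75]
```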

(2.3) Vectors and Matrices

A nice step forward in working with systems of linear equations is the subject
of vectors and matrices.

A good start is just

    
    
         3 x - 2 y = 7
    
         -x  + 2 y = 8
    

we saw above. What we do is just rip out the x and y, call that pair a
_vector_ , leave the constants on the left as a _matrix_ , and regard the
constants on the right side as another vector. Then the left side becomes the
matrix theory _product_ of the matrix of the constants and the vector of the
unknowns x and y.

The matrix will have two rows and two columns written roughly as in

    
    
       /         \
       |  3  - 2 |
       |         |
       | -1    2 |
       \         /
    

So, this matrix is said to be 2 x 2 (2 by 2).

Sure, for positive integers m and n, we can have a matrix that is m x n (m by
n) which means m rows and n columns.

The vector of the unknowns x and y is 2 x 1 and is written

    
    
       /   \
       | x |
       |   |
       | y |
       \   /
    

So, we can say that the matrix is A; the unknowns are the _components_ of
vector v; the right side is vector b; and that the system of equations is

    
    
         Av = b
    

where the Av is the matrix product of A and v. How is this product defined? It
is defined to give us just what we had with the equations we started with --
here omitting a careful definition.

So, we use a matrix and two vectors as new notation to write our system of
linear equations. That's the start of matrix theory.

It turns out that our new notation is another pillar of civilization.

Given an m x n matrix A and an n x p matrix B, we can form the m x p matrix
product AB. Amazingly, this product is associative. That is, if we also have a
p x q matrix C, then we can form the m x q product

ABC = (AB)C = A(BC)

It turns out this fact is profound and powerful.

The proof is based on interchanging the order of two summation signs, and that
fact generalizes.
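The associativity is easy to confirm on a concrete case; a minimal sketch with
lists of lists (the matrices here are arbitrary examples):

```python
# Matrix product, and a check that it is associative: (AB)C == A(BC).

def matmul(A, B):
    # (m x n) times (n x p) gives (m x p).
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[3, -2], [-1, 2]]        # 2 x 2
B = [[1, 0, 2], [4, 1, -1]]   # 2 x 3
C = [[2], [0], [5]]           # 3 x 1

# With integer entries the two groupings agree exactly.
print(matmul(matmul(A, B), C) == matmul(A, matmul(B, C)))  # True
```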

Matrix product is the first good example of a _linear operator_ in a _linear
system_. The world is awash in linear systems. There is a lot on linear
operators, e.g., Dunford and Schwartz, _Linear Operators_. Electronic
engineering, acoustics, and quantum mechanics are awash in linear operators.

To build a model of the real world, for ML, AI, data science, etc., the
obvious first cut is to build a linear system.

And if one linear system does not fit very well, then we can use several in
patches of some kind.

(2.4) Vector Spaces

For the set of real numbers R and a positive integer n, consider the set V of
all n x 1 vectors of real numbers. Then V is a _vector space_. We can write
out the definition of a vector space and see that the set V does satisfy that
definition. That's the first vector space we get to consider.

But we encounter lots more vector spaces; e.g., in 3 dimensions, a 2
dimensional plane through the origin is also a vector space.

Gee, I mentioned _dimension_; we need a good definition and a lot of
associated theorems. Linear algebra has those.

So, for matrix A, vector x, and vector of zeros 0, the set of all solutions x
to

Ax = 0

is a vector space, and it and its dimension are central in what we get in many
applications, e.g., at the end of Gauss elimination, fitting linear equations
to data, etc.

(2.5) Eigen Values, Vectors

 _Eigen_ in German translates to English roughly as own, proper,
characteristic, or some such.

Well, for an n x n matrix A, we might have that

Ax = lx

for some number l and some nonzero vector x. In this case what matrix A does
to vector x is just change its length by the factor l and keep its direction.
So, l and x are quite special. Then l is an _eigenvalue_ of A, and x is a
corresponding _eigenvector_ of A.

These eigen quantities are central to the crucial singular value
decomposition, the polar decomposition, principal components, etc.
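The eigen relation Ax = lx is easy to verify on a concrete matrix; a sketch
(the matrix and helper name are illustrative):

```python
# Checking A x = l x for a concrete symmetric 2 x 2 matrix.

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[2, 1], [1, 2]]

# x = (1, 1) is an eigenvector with eigenvalue l = 3:
print(matvec(A, [1, 1]))   # [3, 3], i.e., 3 times x

# x = (1, -1) is an eigenvector with eigenvalue l = 1:
print(matvec(A, [1, -1]))  # [1, -1], i.e., 1 times x
```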

(2.6) Texts

A good, now quite old, intermediate text in linear algebra is by Hoffman and
Kunze, IIRC now available for free as PDF on the Internet.

A special, advanced linear algebra text is P. Halmos, _Finite Dimensional
Vector Spaces_ written in 1942 when Halmos was an assistant to John von
Neumann at the Institute for Advanced Study. The text is an elegant finite
dimensional introduction to infinite dimensional Hilbert space.

At

[http://www.american.com/archive/2008/march-april-magazine-
co...](http://www.american.com/archive/2008/march-april-magazine-contents/why-
can2019t-a-woman-be-more-like-a-man/?searchterm=Sommers)

is an entertaining article about Harvard's course Math 55. At one time that
course used that book by Halmos and also, see below, Baby Rudin.

For more there is

Richard Bellman, _Introduction to Matrix Analysis_.

Horn and Johnson, _Matrix Analysis_.

There is much more, e.g., on numerical methods. There a good start is LINPACK,
the software, associated documentation, and references.

(3) More

The next two topics would be probability theory and statistics.

For a first text in either of these two, I'd suggest you find several leading
research universities, call their math departments, and find what texts they
are using for their first courses in probability and statistics. I'd suggest
you get the three most recommended texts, carefully study the most recommended
one, and use the other two for reference.

Similarly for calculus and linear algebra.

For more, that would take us into an undergrad math major. Again, make some
phone calls for a list of recommended texts. One of those might be

W. Rudin, _Principles of Mathematical Analysis_.

aka, "Baby Rudin". It's highly precise and challenging.

For more,

H. Royden, _Real Analysis_

W. Rudin, _Real and Complex Analysis_

L. Breiman, _Probability_

M. Loeve, _Probability_

J. Neveu, _Mathematical Foundations of the Calculus of Probability_

The last two are challenging.

For Bayesian, that's conditional expectation from the Radon-Nikodym theorem
with a nice proof by John von Neumann in Rudin's _Real and Complex Analysis_.

After those texts, often can derive the main results of statistics on your own
or just use Wikipedia a little. E.g., for the Neyman-Pearson result in
statistical hypothesis testing, there is a nice proof from the Hahn
decomposition from the Radon-Nikodym theorem.

~~~
laurus
I have been inspired by some of your past posts suggesting a path for studying
mathematics and doing graduate level work, and have changed my direction to
try and follow what you suggest. Is there any way I can get in touch with you
privately? (I'm not looking for help with specific technical questions if
you're concerned about that.)

~~~
yorwba
Are you doing the "Get the book. Read the book. Do the exercises." method? If
you are, what's your experience?

I have had some books stored up since forever, and graycat's post did motivate
me to finally get around to reading them, but I find it hard to integrate into
my daily routine. His 24h challenge killed my productivity for a day, and I
can't really afford to get distracted by some tricky proof when I'm supposed
to do something else.

~~~
laurus
Yes, I'm working through a few books that way. I didn't see his 24h challenge
so I'm not sure what it is, but what has been effective for me is blocking off
a few hours every day to work on this stuff. I haven't gotten to the really
difficult material he's talking about yet, but I'm looking forward to seeing
how this goes. Good luck to both of us!

~~~
yorwba
The exercises are here
[https://news.ycombinator.com/item?id=15022458](https://news.ycombinator.com/item?id=15022458)
(that post was downvoted & flagged to death, so you might have to turn on
showdead in your profile to see it)

In a different comment chain on the same submission
([https://news.ycombinator.com/item?id=15024640](https://news.ycombinator.com/item?id=15024640)),
he challenged the commenters disagreeing with him to do these exercises in 24
hours. The tone was pretty abrasive, TBH, but I found the questions
interesting enough that I tackled them in earnest.

I posted my solution attempts, so don't scroll down too far if you want to try
them on your own ;)

------
bootcat
Probability and Statistics to begin with !

------
bluetwo
Not a mention so far about game theory or Nash equilibrium.

I'm no expert but does anyone think these apply?

~~~
srean
It very much does. Boosting, one of the best off-the-shelf ensemble
classifiers, is derived from a game-theoretic formulation. Besides that, there
is a huge body of literature on prediction under non-probabilistic sequences
of test cases. That line of work is held up primarily by game-theoretic
arguments and by online convex optimization.

~~~
bluetwo
I wasn't familiar with Boosting. Now off to read some articles.

------
dekhn
Linear algebra, probability, and tree/graphs.

------
mindhash
Demystified has a good series on calculus and linear algebra. It's lightweight.

------
blubb-fish
just start learning and you will see what math comes along ;)

------
adamnemecek
Stanford EE263 is very spicy
[http://ee263.stanford.edu](http://ee263.stanford.edu)

------
proofofstake
> What maths must be understood to enable pursuit of either of the above
> fields?

None.

> Are there any seminal texts/courses/content which should be consumed before
> starting?

No.

You don't need to know binary to start being a programmer/developer either.
Just start already. As long as you are not in charge of a medical diagnosis or
financial model, you don't get any drawback in experimenting (and failing
miserably).

Assuming applied ML, the most difficult part will be the human-political
business element of it: People not understanding your model or using its
output correctly, bias, feedback loops, acquiring enough resources, etc. The
more you can explain to them, without resorting to heavy maths, the better
communicator you are.

That said, it can't hurt to do Ng's Coursera course (a lot of top performers
started out with this course). Learning from Data by Caltech's Abu-Mostafa
goes very wide on machine learning. "Programming Collective Intelligence" is
a good, if somewhat dated, book.

As for seminal texts, the field is too wide for this. A better bet is: Find a
professor in the field you are interested in. Say "Deep Learning", you could
have a look at LeCun, Hinton, Schmidhuber, Bengio, ... Now look at their PhD-
students, their papers, their courses, their conference talks, their software,
their current research. Basically become a student under the most
authoritative professor in the subfield you can find and resonate with,
without ever paying any university tuition or them knowing you exist. This is
very possible these days.

But by all means: Just start out. Machine learning is fun. Learning about dry
100 year old maths not so much. Make mistakes. Learn to detect and avoid
overfit. Find out if you are passionate and curious about parts of the field,
then the theory will come eventually. A lot of the time these questions seem
to demand answers like: "You need a PhD-level understanding of mathematics"
Just so your brain can go: "I am not good enough for this, so let's look at
something easier". Don't use this as an excuse. Start making intelligent
stuff. There are 16-year-olds on Kaggle routinely beating maths PhDs.

Also remember that, despite the current trend of calling everything "AI", AI
is a very wide field, of which mathematics is only a small part. There is
philosophy, linguistics, cognitive science, physics, neuroscience, psychology,
computer science, robotics, logic, ... and all these parts vary wildly in
their prerequisite maths knowledge.

