
Algebra, Topology, Differential Calculus, and Optimization Theory for CS and ML [pdf] - adamnemecek
https://www.cis.upenn.edu/~jean/math-deep.pdf
======
knzhou
Giant, broad sets of notes like these are very satisfying for the writer --
not so much for the reader. I should know, since I have a similar 1500 page
work on the foundations of physics, and nobody's ever found it comprehensible.
[1] I think a lot of academics have these personal tomes lying around, though
most haven't bothered to type them up.

I see a lot of comments asking how one should read these things, but the
answer is that one shouldn't. These documents are more like diaries than
textbooks. They'll show you the scaffolding of somebody else's technical mind,
not how to build your own. The best textbooks are always focused, and honed by
teaching experience.

1: [https://knzhou.github.io/#lectures](https://knzhou.github.io/#lectures)

~~~
jolmg
I don't know about the whole book or how it compares to other textbooks, but
as someone who didn't really know what exactly Groups, et al. were, I found
that first page of Chapter 2 to be very clear and to-the-point. I'm sure not
everyone learns the same way, but the way that first page didn't beat around
the bush at explaining what Groups, Monoids, and Abelian Groups are seems to
really appeal to me.

Then again, maybe I just haven't experienced better material on that topic. Do
you have any suggestions?

~~~
knzhou
Short segments of a massive tome can be good! Honestly all decent expositions
of groups are basically equally good, though different styles can still appeal
more depending on the person.

The problem is when people try to use the _whole_ book for some goal --
whatever that goal is, it's never an efficient route.

------
armitron
Since it mentions CS I feel criticism is fair: I have a CS degree from CMU and
this material is simply incomprehensible to me. It's not only written from a
mathematical perspective but also aimed at mathematicians, and not computer
scientists.

Unfortunately, math textbooks that are both immediately useful and instantly
accessible to computer scientists are few and far between. Why doesn't anyone
strive to cover that gap? I find that the major point of contention with most
math textbooks is their tendency to pile one definition after another without
showing me how that definition is useful in practice. I have a limit in how
many abstract mathematical structures and theorems I can absorb before my
brain simply wipes them out since it deems them to be useless.

For that reason, the best math textbooks for me have been grounded in
practicality (and are not really math textbooks). I learned calculus from
Spivak, abstract algebra from Antoine Joux and analysis from Sussman.

------
calf
Where's Chapter 1? And wouldn't it be great if huge textbooks like these had a
detailed dependency graph, so you could easily build up through a sequence of
topic modules to reach something that interests you.

~~~
siddboots
A while ago I did this for a few analysis text books that I was working
through: Baby Rudin, Big Rudin, Calculus on Manifolds, and a couple of others.

As I was working through the books I would capture every numbered definition
or theorem in a table, along with all of the explicit cross-references or
implicit dependencies. Then I wrote a little D3 app to visualise the graph,
with a few customisable views. E.g what is the dependency structure of the
chapters? What is the full list of dependencies for a given theorem?

The result is actually pretty interesting, and the different levels of rigor
that different authors choose really comes through. I would love to have a few
weeks to sink into this project properly and make a public tool out of it.

~~~
graycat
When I was writing my undergraduate honors paper on group representations, I
tried something like your _dependency structure_! Result: Mostly all I got was
just a simple sequence with nothing interesting or surprising.

For that nearly 2000 page PDF, actually I know enough about the contents to
suspect that again a _dependency structure_ graph, especially for anyone with
a background in Baby Rudin ( _Principles of Mathematical Analysis_ ), Big
Rudin ( _Real and Complex Analysis_ or maybe _Functional Analysis_ ),
surprisingly, would not look very interesting.

From just going through the table of contents, there are a lot of interesting
and uncommon topics in those ~2000 pages, but the book looks more like a small
encyclopedia or reference than an integrated whole.

And for something that long and in places with some rare topics, there is,
surprisingly, especially for the goal of ML (machine learning), comparatively
little on probability, stochastic processes, and mathematical statistics.

But in those many pages, there are many topics possibly valuable and not easy
to find elsewhere, so keep a copy of the PDF and use it as a reference when
appropriate.

Here is a short remark for readers without much background: Don't be afraid to
dig in nearly anywhere in the ~2000 pages; maybe you will have to backtrack a
little but in general can start nearly anywhere and don't have to read
sequentially chapter 1, 2, ....

Warning: Some topics are covered relatively superficially while others have
some really rare details.

Broad, Curious Point: The ~2000 pages may be this and that, but it is
relatively close to the applied mathematics of operations research and
numerical analysis, closer to those fields than what is usually regarded as
computer science until computer science wants to _absorb_ the math of
operations research and numerical analysis. For more, some of the numerical
analysis was, as I recall, mostly of interest for numerical solutions of
partial differential equations. That field is also to be subsumed by computer
science?

~~~
siddboots
> Mostly all I got was just a simple sequence with nothing interesting or
> surprising.

It absolutely depends on the text of course. One one end of the spectrum you
have something like Euclid's Elements, where a DAG representation seems to be
very well suited to spirit of the text. Of the books that I explored, Baby
Rudin came the closest to this ideal. While many of the individual chapters
contains long runs of mostly linear development, the interdependence of the
chapters is complex, and apparently very carefully considered.

You might be right about the linked textbook. I suspect that there would be a
core linear algebra subgraph that most of the rest of the book depends on, but
otherwise little else in the way of complex structure.

One thing that occurred to me was that the reductio ad absurdum of this ends
up being something like mathematical logic. As you try to capture the
dependency structure with more and more precision, you find yourself trying to
build a proof assistant.

Still, I think that there's something to be said for studying the structure of
great textbooks purely as cultural artifacts.

~~~
graycat
> Of the books that I explored, Baby Rudin came the closest to this ideal.
> While many of the individual chapters contains long runs of mostly linear
> development, the interdependence of the chapters is complex, and apparently
> very carefully considered.

Naw! Take another pass through Baby Rudin!

Here is some help: With appropriate mild assumptions, a continuous, real value
function on a compact set is uniformly continuous. For the finite dimensional
space of interest, compact is the same as closed and bounded. That is the main
content of the first chapters.

First Application: With uniform continuity, get to show that the Riemann and
Riemann Stieltjes sums converge so that those integrals do exist.

Second Application: The uniform limit of a sequence of continuous functions is
continuous. This was a question on my Analysis Ph.D. qualifying exam -- I got
it! Thus a certain normed space is _complete_ , that is, is a Banach space --
I'm not sure this remark is in the book.

The chapter(s) on sequences and series is used to define ln, sin, cosine so
that can do Fourier series.

The exterior algebra is essentially separate.

That's the main stuff.

Don't take the Riemann integral very seriously: The good stuff is Lebesgue's
integral (partitions the range instead of the domain) as in the first half of
Rudin's _Real and Complex Analysis_. There also get to see Banach space,
Hilbert space, von Neumann's novel proof of the Radon-Nikodym theorem (grown
up version of the fundamental theorem of calculus and with astounding power)
and the Fourier integral. I won't take off points if you don't read the second
half.

Rudin's _Functional Analysis_ \-- get to see distributions.

~~~
siddboots
I don't doubt your expertise, and no doubt I still have a lot to learn, but I
have to admit I find it off-putting that you immediately assumed a position of
talking down to me.

It's not at all clear to me what part of my comment you are correcting.

~~~
graycat
"What part"?

You wrote

> Of the books that I explored, Baby Rudin came the closest to this ideal.
> While many of the individual chapters contains long runs of mostly linear
> development, the interdependence of the chapters is complex, and apparently
> very carefully considered.

and I removed what was "complex".

Many students hear bad things about Baby Rudin and don't try it. Of students
who do try it, too many don't finish well. Of the students who do finish, only
a small fraction take a second pass where they will get some of the clarity I
typed in for you.

For my Ph.D., there were five qualifying exams. I did the best on four of
them, and one of those was Analysis and fairly close to Baby Rudin. The
department didn't offer a course to help students prepare for the Analysis
exam. So, I was likely the only student who had taken two passes through Baby
Rudin; the first pass was in a course; I did okay; the second pass was on my
own and slowly; it was fun! The second pass is the main reason I did the best
on that exam. The next year the department tried teaching a course from Baby
Rudin: They did that for only one year; I can believe that few or none of the
students did well with the course. A friend from another department wanted to
learn Baby Rudin, took the course, had a hard time, and dropped it. He went
away unhappy, and that was sad and not really necessary.

So, my experience with Baby Rudin suggests (i) take a formal course, (ii) use
advice such as I gave here to see during the course some clarity in what is
going on, how, and why; (iii) also study or glance at one or two other
competitive or similar books, e.g., Spivak, _Calculus on Manifolds_ , Fleming,
_Functions of Several Variables_ , Kolmogorov and Fomin, Apostol, and more,
and (iv) take a second pass through Baby Rudin.

Net, Baby Rudin has discouraged a lot of students, too many. To many students,
Baby Rudin looks forbiddingly severe and abstract; students can't figure out
what the heck is going on, how, or why. E.g., your view was "complex" \-- it
can be "simple". With some help, e.g., as I typed in here for you, Baby Rudin
can be okay.

~~~
siddboots
You are misunderstanding my use of the word "complex".

I'm saying that after encoding the graph structure I found Baby Rudin
contained more complexity (i.e. connectedness) than the other texts that I
repeated the excersise for.

------
abhisuri97
Previous discussion:
[https://news.ycombinator.com/item?id=20570025](https://news.ycombinator.com/item?id=20570025)

Tl;dr:

* This is dense

* This is hard to understand if you don't have an education in mathematics

* This is not really meant to be an "introduction to X" book unless you have the background knowledge

* Gallier (who imo is an awesome professor) really values rigor and not hiding anything behind hand-wavy statements. He is an incredibly serious mathematician + computer scientist ([https://en.wikipedia.org/wiki/Jean_Gallier](https://en.wikipedia.org/wiki/Jean_Gallier)). Accordingly, this book (work in progress) tries to be incredibly detailed on the topics it chooses to cover (and may miss out on some areas of CS + ML solely bc it chooses detail over comprehensiveness).

~~~
wenc
The title is interesting because it purports to cover the areas of math
relevant to CS and ML. So I took at a look at the topics covered, and my
impression is that they were chosen by a generalist mathematician rather a
specialist in each of the areas -- in optimization for instance, some of the
topics would not be developed or weighted in quite in this fashion in a state-
of-the-art course. Other subtopics serve purely theoretical interests --
either too theoretical or lack sufficient depth to be useful to the
practitioner or the earnest researcher. I fear this may mislead some into
thinking that the topics proposed are the primary ones in each area of study
-- I would be wary of using this a trunk of a semantic tree to build knowledge
on.

There seems to a lack of mathematical unification in the work except perhaps
these were a collection of topics that were of interest to the author(s).

I want to say this could be a good handbook, but handbooks (e.g. engineering
handbooks) are usually collections of pieces written by specialists in each
subfield. I suspect that by attempting to cover such a large chunk of the
mathematical landscape, some things may have been lost in the treatment
(except in topics that the authors are specialists in). Still, it is a good
effort and I congratulate the authors on the ambitious attempt but would
advise readers to consult texts written by experts in each subfield to get a
sense of what topics are important and what are not.

~~~
graycat
> my impression is that they were chosen by a generalist mathematician rather
> a specialist in each of the areas -- in optimization for instance, some of
> the topics would not be developed or weighted in quite in this fashion in a
> state-of-the-art course.

Yes. Compared with this book, the one semester course "Foundations of
Optimization" I took was better _balanced_ and usually deeper.

> I want to say this could be a good handbook

I see it as best as a _handbook_. So, e.g., if hear about some topic X, want
to know more, and find a treatment in this book, then give the book a shot but
don't take the book too seriously as the balanced view of an expert.

> advise readers to consult texts written by experts in each subfield to get a
> sense of what topics are important and what are not.

Yes. And not just for "a sense" but for the actual content. E.g., I would go
to some of the books in my professional library for the numerical analysis,
linear programming, non-linear programming, regression analysis, Hilbert
space, exterior algebra, abstract algebra, Lagrangian techniques, etc. And in
the table of contents I didn't see anything on network flows which,
considering what else is in that book, is a curious omission.

But, again, in summary, if want to know about topic X that is in the book,
then likely give the book a shot, at least for a few minutes.

One more point: For students who want careers in academic research, one
reason, likely the most important reason, to study such advanced and
specialized topics is as an introduction to some promising _streams_ of
research. Well, I'd try to avoid using this book much for that goal and,
instead, look for better material from better sources.

------
redsymbol
Wow, this is epic! Nearly 2,000 pages, that's amazing.

Weird as it sounds, some of the happiest moments of my life come from deeply
studying mathematics like this. When I finally cash out of the entrepreneur
game, I'm looking forward to spending several decadent hours each day studying
something like this again. Great fun.

~~~
cheez
You don't need to cash out. Just build a life where you can take sabbaticals
often. That's what I've done. It let me raise both my kids, one to college
age, one almost there, on my own.

~~~
redsymbol
I have different goals, my friend. But congrats on how you've set things up,
sounds like it worked out great!

~~~
cheez
Not sure what your goals are, but I still make mid six figures.

------
Myrmornis
Should there exist better methods for deciding where one should allocate one's
available time within the vast landscape of expositional mathematical material
at undergrad and graduate level?

------
jahbrewski
Honest question: how do you tackle 2k pages of material like this?

~~~
george_ciobanu
I went to high school in Romania and first two years of college as well. We
did tons of theory on about 2/3 (if not 3/4) of the material here (I also
covered some of it when I continued my college in Montreal, Canada). But that
took about 6 years of daily (maybe there was a day in the week when we didn't
do math) study. And I'm sure we didn't cover it in the same depth as they do
here (for most things we did a lot of proofs and theory). So it's possible but
it takes a lot of time. In addition, it was usually motivated by the next
step: you need to learn how to compute the inverse matrix in order to further
process it with more complex algorithms which you actually use in labs or some
sort of application (even if it's made up).

The interesting thing is that, for a Math student (or a college professor)
this is the FOUNDATION of math you need in order to study advanced stuff. For
some of the recent math proofs (think famous ones) you'd need to study a whole
year (after you know most of this stuff) just to learn the theory foundations
for that proof.

I'm of course exaggerating a bit - most Math majors would probably go through
most of the stuff in this PDF but not necessarily all (machine learning stuff
for example, and possibly numerical analysis stuff) of it. And if they do they
usually forget* most of it, except the areas they use in their daily research
(assuming they got for an advanced degree). And when you use some of this
stuff every day it becomes second nature after a certain point.

My experience is mostly based on the Romanian educational system which is
typically more hardcore science (Math and Physics) even for Computer
Engineering degrees.

*Sometimes you don't forget stuff, you just don't think about it until you see it again (concrete example for me: Schur decomposition) and then partial memories of it come back.

~~~
adamnemecek
There's no way you covered 2/3 of this in hs.

~~~
xenihn
CIS nations don't fuck around when it comes to math ed. Why do you think
there's so many Eastern Europeans in fintech?

~~~
adamnemecek
I have first hand experience with mathematical education in post-communist
countries. There's no way they did 2/3 of this in hs.

------
shademaan
Jean Gallier (the author, and a CS prof at Penn) is quite the character. Also
a very nice guy who cares a lot of about teaching. W Wish I'd read (some of)
this book before I started my PhD!

------
pizza
Nearly 2000 pages of dense material, mein Gott..

------
anarrayofatoms
What happened to the introduction?

~~~
adamnemecek
I'm guessing they'll write it last. The book is still being written.

------
xmly
It is a great handbook! Thanks a lot.

------
ur-whale
200 pages of theory to build up to ... SVM's??

TBBH, that's kind of disappointing.

I was hoping to see some sort of insight around deep learning stuff.

~~~
pmorais
I don’t think this kind of insight even exists into deep learning stuff (yet).
One of the main reasons people even care about SVMs is that they have nice
analytical properties

------
hos234
Man...Math textbook/website/vid overload is making me weep for some reason.
Most of this stuff hasn't changed in 200 years. Wtf do we need so many books
for? Total failure of Math Pedagogy imho.

~~~
Myrmornis
I think this is a fair topic to bring up. Between commercial textbooks and
freely-distributed PDFs, there is a huge amount of expositional material for
any given topic in sub-research level mathematics and one does wonder whether
there could be a better method of clustering and ranking the material.
(Clustering is probably very doable.)

