Hacker News new | past | comments | ask | show | jobs | submit login
Algebra, Topology, Differential Calculus, and Optimization Theory for CS and ML [pdf] (upenn.edu)
319 points by adamnemecek on Oct 13, 2019 | hide | past | favorite | 58 comments

Giant, broad sets of notes like these are very satisfying for the writer -- not so much for the reader. I should know, since I have a similar 1500 page work on the foundations of physics, and nobody's ever found it comprehensible. [1] I think a lot of academics have these personal tomes lying around, though most haven't bothered to type them up.

I see a lot of comments asking how one should read these things, but the answer is that one shouldn't. These documents are more like diaries than textbooks. They'll show you the scaffolding of somebody else's technical mind, not how to build your own. The best textbooks are always focused, and honed by teaching experience.

1: https://knzhou.github.io/#lectures

Cal Newport has written about experimenting with this; he calls it the Textbook method[0][1].

[0] https://www.calnewport.com/blog/2012/08/10/you-know-what-you...

[1] https://www.calnewport.com/blog/2012/08/16/experiments-with-...

I don't know about the whole book or how it compares to other textbooks, but as someone who didn't really know what exactly Groups, et al. were, I found that first page of Chapter 2 to be very clear and to-the-point. I'm sure not everyone learns the same way, but the way that first page didn't beat around the bush at explaining what Groups, Monoids, and Abelian Groups are seems to really appeal to me.

Then again, maybe I just haven't experienced better material on that topic. Do you have any suggestions?

Short segments of a massive tome can be good! Honestly all decent expositions of groups are basically equally good, though different styles can still appeal more depending on the person.

The problem is when people try to use the whole book for some goal -- whatever that goal is, it's never an efficient route.

Since it mentions CS I feel criticism is fair: I have a CS degree from CMU and this material is simply incomprehensible to me. It's not only written from a mathematical perspective but also aimed at mathematicians, and not computer scientists.

Unfortunately, math textbooks that are both immediately useful and instantly accessible to computer scientists are few and far between. Why doesn't anyone strive to cover that gap? I find that the major point of contention with most math textbooks is their tendency to pile one definition after another without showing me how that definition is useful in practice. I have a limit in how many abstract mathematical structures and theorems I can absorb before my brain simply wipes them out since it deems them to be useless.

For that reason, the best math textbooks for me have been grounded in practicality (and are not really math textbooks). I learned calculus from Spivak, abstract algebra from Antoine Joux and analysis from Sussman.

Where's Chapter 1? And wouldn't it be great if huge textbooks like these had a detailed dependency graph, so you could easily build up through a sequence of topic modules to reach something that interests you.

A while ago I did this for a few analysis text books that I was working through: Baby Rudin, Big Rudin, Calculus on Manifolds, and a couple of others.

As I was working through the books I would capture every numbered definition or theorem in a table, along with all of the explicit cross-references or implicit dependencies. Then I wrote a little D3 app to visualise the graph, with a few customisable views. E.g what is the dependency structure of the chapters? What is the full list of dependencies for a given theorem?

The result is actually pretty interesting, and the different levels of rigor that different authors choose really comes through. I would love to have a few weeks to sink into this project properly and make a public tool out of it.

When I was writing my undergraduate honors paper on group representations, I tried something like your dependency structure! Result: Mostly all I got was just a simple sequence with nothing interesting or surprising.

For that nearly 2000 page PDF, actually I know enough about the contents to suspect that again a dependency structure graph, especially for anyone with a background in Baby Rudin (Principles of Mathematical Analysis), Big Rudin (Real and Complex Analysis or maybe Functional Analysis), surprisingly, would not look very interesting.

From just going through the table of contents, there are a lot of interesting and uncommon topics in those ~2000 pages, but the book looks more like a small encyclopedia or reference than an integrated whole.

And for something that long and in places with some rare topics, there is, surprisingly, especially for the goal of ML (machine learning), comparatively little on probability, stochastic processes, and mathematical statistics.

But in those many pages, there are many topics possibly valuable and not easy to find elsewhere, so keep a copy of the PDF and use it as a reference when appropriate.

Here is a short remark for readers without much background: Don't be afraid to dig in nearly anywhere in the ~2000 pages; maybe you will have to backtrack a little but in general can start nearly anywhere and don't have to read sequentially chapter 1, 2, ....

Warning: Some topics are covered relatively superficially while others have some really rare details.

Broad, Curious Point: The ~2000 pages may be this and that, but it is relatively close to the applied mathematics of operations research and numerical analysis, closer to those fields than what is usually regarded as computer science until computer science wants to absorb the math of operations research and numerical analysis. For more, some of the numerical analysis was, as I recall, mostly of interest for numerical solutions of partial differential equations. That field is also to be subsumed by computer science?

> Mostly all I got was just a simple sequence with nothing interesting or surprising.

It absolutely depends on the text of course. One one end of the spectrum you have something like Euclid's Elements, where a DAG representation seems to be very well suited to spirit of the text. Of the books that I explored, Baby Rudin came the closest to this ideal. While many of the individual chapters contains long runs of mostly linear development, the interdependence of the chapters is complex, and apparently very carefully considered.

You might be right about the linked textbook. I suspect that there would be a core linear algebra subgraph that most of the rest of the book depends on, but otherwise little else in the way of complex structure.

One thing that occurred to me was that the reductio ad absurdum of this ends up being something like mathematical logic. As you try to capture the dependency structure with more and more precision, you find yourself trying to build a proof assistant.

Still, I think that there's something to be said for studying the structure of great textbooks purely as cultural artifacts.

> Of the books that I explored, Baby Rudin came the closest to this ideal. While many of the individual chapters contains long runs of mostly linear development, the interdependence of the chapters is complex, and apparently very carefully considered.

Naw! Take another pass through Baby Rudin!

Here is some help: With appropriate mild assumptions, a continuous, real value function on a compact set is uniformly continuous. For the finite dimensional space of interest, compact is the same as closed and bounded. That is the main content of the first chapters.

First Application: With uniform continuity, get to show that the Riemann and Riemann Stieltjes sums converge so that those integrals do exist.

Second Application: The uniform limit of a sequence of continuous functions is continuous. This was a question on my Analysis Ph.D. qualifying exam -- I got it! Thus a certain normed space is complete, that is, is a Banach space -- I'm not sure this remark is in the book.

The chapter(s) on sequences and series is used to define ln, sin, cosine so that can do Fourier series.

The exterior algebra is essentially separate.

That's the main stuff.

Don't take the Riemann integral very seriously: The good stuff is Lebesgue's integral (partitions the range instead of the domain) as in the first half of Rudin's Real and Complex Analysis. There also get to see Banach space, Hilbert space, von Neumann's novel proof of the Radon-Nikodym theorem (grown up version of the fundamental theorem of calculus and with astounding power) and the Fourier integral. I won't take off points if you don't read the second half.

Rudin's Functional Analysis -- get to see distributions.

I don't doubt your expertise, and no doubt I still have a lot to learn, but I have to admit I find it off-putting that you immediately assumed a position of talking down to me.

It's not at all clear to me what part of my comment you are correcting.

"What part"?

You wrote

> Of the books that I explored, Baby Rudin came the closest to this ideal. While many of the individual chapters contains long runs of mostly linear development, the interdependence of the chapters is complex, and apparently very carefully considered.

and I removed what was "complex".

Many students hear bad things about Baby Rudin and don't try it. Of students who do try it, too many don't finish well. Of the students who do finish, only a small fraction take a second pass where they will get some of the clarity I typed in for you.

For my Ph.D., there were five qualifying exams. I did the best on four of them, and one of those was Analysis and fairly close to Baby Rudin. The department didn't offer a course to help students prepare for the Analysis exam. So, I was likely the only student who had taken two passes through Baby Rudin; the first pass was in a course; I did okay; the second pass was on my own and slowly; it was fun! The second pass is the main reason I did the best on that exam. The next year the department tried teaching a course from Baby Rudin: They did that for only one year; I can believe that few or none of the students did well with the course. A friend from another department wanted to learn Baby Rudin, took the course, had a hard time, and dropped it. He went away unhappy, and that was sad and not really necessary.

So, my experience with Baby Rudin suggests (i) take a formal course, (ii) use advice such as I gave here to see during the course some clarity in what is going on, how, and why; (iii) also study or glance at one or two other competitive or similar books, e.g., Spivak, Calculus on Manifolds, Fleming, Functions of Several Variables, Kolmogorov and Fomin, Apostol, and more, and (iv) take a second pass through Baby Rudin.

Net, Baby Rudin has discouraged a lot of students, too many. To many students, Baby Rudin looks forbiddingly severe and abstract; students can't figure out what the heck is going on, how, or why. E.g., your view was "complex" -- it can be "simple". With some help, e.g., as I typed in here for you, Baby Rudin can be okay.

You are misunderstanding my use of the word "complex".

I'm saying that after encoding the graph structure I found Baby Rudin contained more complexity (i.e. connectedness) than the other texts that I repeated the excersise for.

> is relatively close to the applied mathematics of operations research and numerical analysis

I know these areas well, and I would be hesitant to agree with this statement.

My Ph.D. from one of the world's best research universities is in essentially the math of operations research. My most important prof was a star student of E. Cinlar, long editor in chief of Mathematics of Operations Research. I've published peer-reviewed original research, taught graduate courses, and applied to US national security and business the math of operations research.

I stand behind my statement.

Then I would be doubly surprised -- perhaps you did not notice the lack of coverage of certain important OR areas like (mixed-) integer programming and the many advances therein? Or in nonlinear optimization, interior point methods? Active-sets? SSOCs? Constraint qualifications? Also pretty thin on theory of statistics, queuing theory, combinatorics, etc.

These are all fundamental OR topics but the book makes different choices on topics to cover. (to be fair, the book does not claim to target OR, so I'm not disagreeing with the book's premise, only yours).

I do think though that the book could benefit from feedback from specialists in each individual area. In some areas, I find the selection of topics to be a little bit "unconventional".

You are not disagreeing with what I stated. I said that the book was close to the math of OR (operations research), that is, much of the book is a subset of the math of OR. You outlined some major OR omissions in the book -- I agree fully. So, you are saying that the book does not cover OR, that its OR material is a proper subset of OR. Fine. I agree.

I thought his linear programming coverage was superficial. In particular, I didn't notice any coverage of linear programming for the problem of least cost flows on a network where each arc has a maximum capacity. The nice anti-cycling rule for the version of the simplex algorithm, with strongly feasible bases, is from W. Cunningham, long Chair at the Waterloo Department of Combinatorics and Optimization. It turns out that that algorithm makes a nice dent in the challenges of linear integer programming.

Yes, I saw not much on constraint qualifications. I'm the guy who showed that the Zangwill and Kuhn-Tucker constraint qualifications are independent for problems with only functional constraints -- my work also solved a problem stated but not solved in the famous Arrow, Hurwicz, Uzawa paper on constraint qualifications in mathematical economics.

Any really good book on linear programming is "close" to OR but can be attacked with your objection that it omits lots of OR.

>My most important prof was a star student of E. Cinlar

so you're 2 removed from someone whose authority on the matter we would trust?

>I've published peer-reviewed original research, taught graduate courses

lol so does every phd student. not a high bar were' talking about here.

Could you please stop being a jerk on HN? You've done it a lot, it's dismaying, and we've had to ask you before.

Please review the site guidelines: https://news.ycombinator.com/newsguidelines.html. Note that the first rule under Comments is "Be kind" and the second is "Don't be snarky".

For the information I presented, my qualifications are fine, as you in effect confirmed.

I'd love to learn more about this visualisation you produced. Can you say more? Is an example output or code available?

Sure. The code is a mess of scripts and initial work on a web interface. It is not currently in a form to release. It began as just an aid to help myself understand what I was studying. To get a sense of the "road map".

I collected the data in a spreadsheet for each text. The columns were "Id, Type, Name, Chapter, Description, Dependencies". Id is the margin reference from the textbook, Type is one of theorem, definition, corollary, etc., Name is for named theorems or definitions, Dependencies is a comma separated list of Ids.

I then had a small Python script that built up a NetworkX [1] representation. Initially I was using Cytoscape [2] to explore the graphs, but then started work on a web interface using dagre-d3 for automated graph layout [3]. This example [4] is very similar to the types of layouts it produced.

I found that a large textbook quickly grows into a big hairy ball, and that it is only useful if you are able to collapse and/or hide nodes intelligently. I.e. show the chapter dependency structure, or look at the sub-DAG involving a selection of nodes. I only got a little way into doing this when I got moved on to other distractions. Shazaam for bird song, or something like that...

[1] https://networkx.github.io/ [2] https://cytoscape.org/ [3] https://github.com/dagrejs/dagre-d3 [4] http://cs.brown.edu/people/jcmace/d3/graph.html?id=small.jso...

Is there any way we can stay "updated" on this? I'd be really interested in seeing this!

Not currently. I would certainly ShowHN if I ever develop it to such a point, but I honestly wouldn't say that this is likely.

Please feel free to run with it youself!

The introduction is always the last thing you write. Presumably Jean's not done yet.

Previous discussion: https://news.ycombinator.com/item?id=20570025


* This is dense

* This is hard to understand if you don't have an education in mathematics

* This is not really meant to be an "introduction to X" book unless you have the background knowledge

* Gallier (who imo is an awesome professor) really values rigor and not hiding anything behind hand-wavy statements. He is an incredibly serious mathematician + computer scientist (https://en.wikipedia.org/wiki/Jean_Gallier). Accordingly, this book (work in progress) tries to be incredibly detailed on the topics it chooses to cover (and may miss out on some areas of CS + ML solely bc it chooses detail over comprehensiveness).

The title is interesting because it purports to cover the areas of math relevant to CS and ML. So I took at a look at the topics covered, and my impression is that they were chosen by a generalist mathematician rather a specialist in each of the areas -- in optimization for instance, some of the topics would not be developed or weighted in quite in this fashion in a state-of-the-art course. Other subtopics serve purely theoretical interests -- either too theoretical or lack sufficient depth to be useful to the practitioner or the earnest researcher. I fear this may mislead some into thinking that the topics proposed are the primary ones in each area of study -- I would be wary of using this a trunk of a semantic tree to build knowledge on.

There seems to a lack of mathematical unification in the work except perhaps these were a collection of topics that were of interest to the author(s).

I want to say this could be a good handbook, but handbooks (e.g. engineering handbooks) are usually collections of pieces written by specialists in each subfield. I suspect that by attempting to cover such a large chunk of the mathematical landscape, some things may have been lost in the treatment (except in topics that the authors are specialists in). Still, it is a good effort and I congratulate the authors on the ambitious attempt but would advise readers to consult texts written by experts in each subfield to get a sense of what topics are important and what are not.

> my impression is that they were chosen by a generalist mathematician rather a specialist in each of the areas -- in optimization for instance, some of the topics would not be developed or weighted in quite in this fashion in a state-of-the-art course.

Yes. Compared with this book, the one semester course "Foundations of Optimization" I took was better balanced and usually deeper.

> I want to say this could be a good handbook

I see it as best as a handbook. So, e.g., if hear about some topic X, want to know more, and find a treatment in this book, then give the book a shot but don't take the book too seriously as the balanced view of an expert.

> advise readers to consult texts written by experts in each subfield to get a sense of what topics are important and what are not.

Yes. And not just for "a sense" but for the actual content. E.g., I would go to some of the books in my professional library for the numerical analysis, linear programming, non-linear programming, regression analysis, Hilbert space, exterior algebra, abstract algebra, Lagrangian techniques, etc. And in the table of contents I didn't see anything on network flows which, considering what else is in that book, is a curious omission.

But, again, in summary, if want to know about topic X that is in the book, then likely give the book a shot, at least for a few minutes.

One more point: For students who want careers in academic research, one reason, likely the most important reason, to study such advanced and specialized topics is as an introduction to some promising streams of research. Well, I'd try to avoid using this book much for that goal and, instead, look for better material from better sources.

> This is hard to understand if you don't have an education in mathematics

Right: Sure, some undergraduate math major who has been rushing ahead in math since, say, middle school, could study all of this book, but most of the book is mostly more advanced and/or specialized than undergraduate math majors will encounter.

But, yes, it would be wise to regard an undergraduate math major as a nearly necessary prerequisite for the book.

In that case, the 70+% of the academic computer science community will have a tough time having math background enough to make good use of this book.

A few years back I read Gallier's notes on matrix Lie groups, this was definitely the most accessible (yet complete) introduction to the subject I found.

Wow, this is epic! Nearly 2,000 pages, that's amazing.

Weird as it sounds, some of the happiest moments of my life come from deeply studying mathematics like this. When I finally cash out of the entrepreneur game, I'm looking forward to spending several decadent hours each day studying something like this again. Great fun.

You don't need to cash out. Just build a life where you can take sabbaticals often. That's what I've done. It let me raise both my kids, one to college age, one almost there, on my own.

I have different goals, my friend. But congrats on how you've set things up, sounds like it worked out great!

Not sure what your goals are, but I still make mid six figures.

Should there exist better methods for deciding where one should allocate one's available time within the vast landscape of expositional mathematical material at undergrad and graduate level?

Honest question: how do you tackle 2k pages of material like this?

I went to high school in Romania and first two years of college as well. We did tons of theory on about 2/3 (if not 3/4) of the material here (I also covered some of it when I continued my college in Montreal, Canada). But that took about 6 years of daily (maybe there was a day in the week when we didn't do math) study. And I'm sure we didn't cover it in the same depth as they do here (for most things we did a lot of proofs and theory). So it's possible but it takes a lot of time. In addition, it was usually motivated by the next step: you need to learn how to compute the inverse matrix in order to further process it with more complex algorithms which you actually use in labs or some sort of application (even if it's made up).

The interesting thing is that, for a Math student (or a college professor) this is the FOUNDATION of math you need in order to study advanced stuff. For some of the recent math proofs (think famous ones) you'd need to study a whole year (after you know most of this stuff) just to learn the theory foundations for that proof.

I'm of course exaggerating a bit - most Math majors would probably go through most of the stuff in this PDF but not necessarily all (machine learning stuff for example, and possibly numerical analysis stuff) of it. And if they do they usually forget* most of it, except the areas they use in their daily research (assuming they got for an advanced degree). And when you use some of this stuff every day it becomes second nature after a certain point.

My experience is mostly based on the Romanian educational system which is typically more hardcore science (Math and Physics) even for Computer Engineering degrees.

*Sometimes you don't forget stuff, you just don't think about it until you see it again (concrete example for me: Schur decomposition) and then partial memories of it come back.

There's no way you covered 2/3 of this in hs.

CIS nations don't fuck around when it comes to math ed. Why do you think there's so many Eastern Europeans in fintech?

I have first hand experience with mathematical education in post-communist countries. There's no way they did 2/3 of this in hs.

My process based on the book “Make it stick: the science of successful learning” and trial and error.

Before you begin, have a single source of truth for your daily goals whether its a calendar, todo list etc. I like a paper agenda notebook, my favorite layout is the emergent task planner from David Seah and the roterunner on amazon. A quick note on process: you want your schedule to be flexible and fault tolerant but specific, actionable and accountable. I like to block times of day for an activity and have a general idea of my goals for that day. I keep a big weekly todo list for each activity and chip away at it each week. You want to give yourself the least possible amount of choices each hour of your day but be able to tolerate errors in estimating time for a goal.

Write down a purpose, of what u want to get out of the book. Thing back to the learning objectives on a syllabus in college, these should be specific.

Look at the outline, take inventory of the topics you want to cover and their chapters/sections.

Now schedule some block of time in your calendar, agenda, etc each week to study the book.

Now when studying a chapter/section: first survey for a summary, practice questions, exercises etc. to get an overview.

Start by answering the exercises, making predictions about a section and generally pre testing your knowledge of the topic. Based on some psychological studies in the book, the more you challenge your mind the more connections your brain makes thus improving learning.

Next read through the section in order to fill in the gaps identified in your pre test phase, this purpose based reading improves retention. Dont underline, copy sections into notes, highlight etc these arent worth the time.

Now create a diagram, outline or other summary of the text for your notes without copying things from the book. Try to do it in your own words.

Finally, go back to the exercises you didnt complete or revisit your pretest questions and answer them. This greatly increases retention. Quizzing yourself is the best way to learn.

In summary the way you tackle 2k pages of material is the same way you do in school. Weekly work towards the goal with practice, clear objectives, creating study guides and outlines and testing yourself.

How do you eat an elephant? Piece by piece. it sounds stupid, but start at the beginning, and pace yourself. Chapter by chapter, you'll get through it.

I'm really poor at math, so I've learned to work slowly and check if I really understand something before I go to the next chapter. Most maths is based on previous work / understanding. So, when I rush or try to skip pieces, I misunderstand the next part, making my rushing completely useless.

Get a math degree and take a few grad courses and then use it as a reference. Alternative route: get an advanced degree in a relatively mathematical non-math STEM field and use it as a reference. I doubt there’s 1-2 people on the planet who are going to use that kind of book otherwise.

Use it as a reference

By accepting that you will spend two years studying it.

I don't think you do, in the same way no one learns the webster dictionary by rote.

OTOH, this is a fairly extensive and in-depth reference, which means that if - in the course of your studies or research - you hit on a topic you need to understand better, the book is here to be consulted on that specific topic.

And you won't get the "I don't understand that part" feeling when you read it because the part that's giving you a headache is likely explained somewhere else in the book.

Binary search.

Regard all of math as a tree based on essentially depth. Right, this is an over simplification since really have a directed graph of dependencies. But cheat a little and get a tree.

Then study the main trunk and larger branches.

For the deep roots, that is, the foundations, go light on those or will get confused, discouraged, and generally stuck-o. E.g., do some axiomatic set theory if really want; see some of the axiom of choice, but go light on the latest material on model theory and forcings.

Then with the trunk and larger branches, will be able, as circumstances require, get to nearly any of the leafs fairly quickly as required.

For that ~2000 page PDF, there, too, try to find the trunk and larger branches and mostly skip over the leafs. Maybe 80% of that book is leafs.

Finding the trunk and larger branches in that book could be challenging. So, instead, just stay close to the commonly recommended undergraduate math major texts and topics. For each such text, at least glance at the 2-3 leading competitive texts and, then, take what is in common as the trunk and larger branches. Also for each topic, pick the best looking treatment and then glance at the other treatments.

If you can, find a really good math prof in a really good math department at a really good university and try to chat with him about your progress for a cup of coffee (you pay) hopefully 4 times a year. If you can find more than one such prof, even better.

Some departments publish their old copies of their qualifying exams; might get some of those!

One more: Don't let yourself get stuck or discouraged. It's a fact of life that not all the materials are good. So there are some actual errors. Some exercises are just too darned hard, need material presented later in the book or not at all, or are just tangential curiosities not worth the trouble. Some of the writing can be obscure. Often there is not enough motivation to let you figure out what is important and what is not: If really obscure definition X or theorem Y seems not to be used in the rest of the book, then find some good reason to study X and Y or maybe delay them or just skip them. Sometimes obscure X, Y actually are important; sometimes not. There should be motivation, but commonly there is not.

If you can't after a good effort understand something or solve an exercise, don't assume that the difficulty is you or that you are leaving a dangerous hole in your knowledge. If later you decide you need that something, then return and give it another shot.

Generally for a really good, mature view of a book, just need more than one pass through it; so don't seek everything on just a first pass. Even given a really thorough first pass, you will notice significantly more on a second pass, e.g., some later material on the first pass will help you better understand the real points and importance of some earlier material on a second pass. And do glance at some alternative treatments.

One recommendation is to chew on a proof until you really see it thoroughly. A counter recommendation is just to notice that we are all short of time. Net, use some judgment.

Jean Gallier (the author, and a CS prof at Penn) is quite the character. Also a very nice guy who cares a lot of about teaching. W Wish I'd read (some of) this book before I started my PhD!

Nearly 2000 pages of dense material, mein Gott..

What happened to the introduction?

I'm guessing they'll write it last. The book is still being written.

It is a great handbook! Thanks a lot.

200 pages of theory to build up to ... SVM's??

TBBH, that's kind of disappointing.

I was hoping to see some sort of insight around deep learning stuff.

I don’t think this kind of insight even exists into deep learning stuff (yet). One of the main reasons people even care about SVMs is that they have nice analytical properties

Man...Math textbook/website/vid overload is making me weep for some reason. Most of this stuff hasn't changed in 200 years. Wtf do we need so many books for? Total failure of Math Pedagogy imho.

I think this is a fair topic to bring up. Between commercial textbooks and freely-distributed PDFs, there is a huge amount of expositional material for any given topic in sub-research level mathematics and one does wonder whether there could be a better method of clustering and ranking the material. (Clustering is probably very doable.)

Check out Wolfram Alpha

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact