Suggestions for resources? (textbooks, videos, etc)
0) Algebra, Trigonometry, Calculus
Make sure you have a decent grasp over high school level math topics. You might not need to use these topics frequently (though trig comes up a surprising amount), but they are necessary to establish a base level of mathematical maturity.
1) Linear Algebra
Obviously very important if you want to do any 3D work, and it also comes up in later topics like graph theory.
Linear Algebra and Its Applications - Strang
2) Discrete Math
You need an understanding of proofs and logic before you can get to the real algorithms material. Counting and probability are also very important.
Discrete Mathematics and Its Applications - Kenneth Rosen
This looks like a good free option, but there are no exercises:
CLRS, but here's a free option: http://www.cs.berkeley.edu/~vazirani/algorithms.html
This should take you some time as each of these topics usually corresponds to a college class. When you finish with those, start reading about CS Theory, Combinatorics, or pick up a graduate Algorithms text.
There's a cute idea to put 100 wireless customers all on the same wavelength. Basically, have a tricky antenna pattern with a lobe for each user, so that each user gets only their own signal. It's all trig.
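To make the "it's all trig" point concrete, here is a minimal sketch of the array factor of a uniform phased array steered toward one user's angle. Everything here (element count, half-wavelength spacing, function name) is an illustrative assumption, not the actual system described above:

```python
import math

def array_factor(theta, theta0, n=8):
    # n-element array, half-wavelength spacing: each element adds a phase
    # shift of pi * (sin(theta) - sin(theta0)); the phasors add in phase
    # only at the steered angle theta0, forming a lobe there.
    psi = math.pi * (math.sin(theta) - math.sin(theta0))
    re = sum(math.cos(k * psi) for k in range(n))
    im = sum(math.sin(k * psi) for k in range(n))
    return math.hypot(re, im) / n  # normalized gain in [0, 1]

print(array_factor(0.3, 0.3))  # 1.0: full gain at the steered angle
print(array_factor(0.9, 0.3))  # much smaller off the main lobe
```

Summing cosines and sines of multiples of one angle is exactly the trig the commenter has in mind.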
Yes, all or nearly all! That the trig functions are the source of the most important orthogonal basis in Hilbert space theory is worth knowing for far more than just physics or signal processing!
Let's see: Also I mentioned characteristic functions and Bochner's theorem which are core probability and not just physics or signal processing. Characterization of time invariant linear systems might be in mechanical engineering and might be tried even in economics! Seismic data analysis via the fast Fourier transform is, yes, in signal processing but is also mostly regarded as geology or just looking for oil or anything 'down there'.
I tried, as you can see, I really tried, to show how trig was for more than just your father's topics in physics and signal processing. I tried!
1. Reading the "aha!" books by Martin Gardner, as a child.
2. Reading Lockhart's "A Mathematician's Lament" 
3. Linear algebra, calculus, and complex analysis classes. I was taught
these at Cornell; you might look for them via MIT's OpenCourseWare.
4. A bunch of combinatorics, from Cornell's classes in probability.
You may also want to start in a completely different direction, since you're interested more in CS-based topics (I am interested more in physics-based topics): begin with number theory and modular arithmetic. MIT's Comp Sci department also produces lectures on OpenCourseWare; you may wish to watch those.
I haven't found many other books that explain calculus as well as that one. If you get that and then a book of exercises from Schaum's outlines, you'll be able to get pretty good at it. Also, the book is a really good read for a math book.
It takes you through all the relevant mathematics required to learn the basics of the CS topics you are looking for (algorithms, proofs, and other practical applications).
It also acts as a good primer for learning basic computer algorithms. The USP of the book is that it's noob-friendly, and I think it's perfectly suited to your needs.
Check it out here: http://www.amazon.com/Think-About-Algorithms-Jeff-Edmonds/dp...
If you have some prior exposure to the SKI combinators, one big aha-moment may come from the book's exploration of several alternative axiom systems that, while preceding the SK system chronologically, are often left out of modern CS-oriented treatments of combinatory logic.
Also, while I think the exercises on symbol manipulation are not very enlightening on their own, the syntactic rewriting point of view is a useful complement if your view of combinatory logic has been mostly semantic. Proving completeness in the syntactic framework becomes a matter of showing that any combinatory rewriting rule can be expressed in terms of your base combinators. This in turn can be done if your combinators allow arbitrary permutations to be expressed in addition to duplication and dropping of symbols (cf. lengthening and shortening rules in GEB). For example, K is a dropping combinator, and rather than viewing S as a substitution operator on the semantic level (as is done when translating lambda calculus to SK calculus), it can be viewed as a hybrid permutation and duplication combinator at the syntactic level. If these dual roles are split into separate combinators, you get alternative axiom systems.
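The syntactic rewriting point of view above is mechanical enough to sketch as a tiny rewriting engine. This is a minimal, hypothetical Python sketch (the term representation and function names are my own): terms are the atoms 'S' and 'K' or nested application tuples, reduced leftmost-outermost by the rules K x y -> x and S x y z -> x z (y z):

```python
def app(f, x):
    return ('app', f, x)

def spine(t):
    """Unwind the application spine: returns (head atom, [arguments])."""
    args = []
    while isinstance(t, tuple):
        _, f, x = t
        args.append(x)
        t = f
    return t, list(reversed(args))

def step(t):
    """One leftmost-outermost reduction step, or None if in normal form."""
    head, args = spine(t)
    if head == 'K' and len(args) >= 2:
        new, rest = args[0], args[2:]                   # K x y -> x
    elif head == 'S' and len(args) >= 3:
        x, y, z = args[0], args[1], args[2]
        new, rest = app(app(x, z), app(y, z)), args[3:]  # S x y z -> x z (y z)
    else:
        for i, a in enumerate(args):                    # otherwise reduce inside an argument
            r = step(a)
            if r is not None:
                t2 = head
                for a2 in args[:i] + [r] + args[i + 1:]:
                    t2 = app(t2, a2)
                return t2
        return None
    for a in rest:
        new = app(new, a)
    return new

def normalize(t, limit=100):
    for _ in range(limit):
        r = step(t)
        if r is None:
            return t
        t = r
    return t

# the classic derived identity combinator: I = S K K
I = app(app('S', 'K'), 'K')
print(normalize(app(I, 'K')))  # K
```

Playing with a toy like this makes the "any rewriting rule expressible from the base combinators" claim feel tangible: you can add or remove base combinators and see what still normalizes.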
There's some really great stuff listed there, I use it all the time.
1. The Art and Craft of Problem Solving by Paul Zeitz.
2. How to Solve It by George Polya.
I'd also recommend learning any of the competition Mathematics topics like number theory, combinatorics or geometry as base material to improve your problem solving skills. I have also found the Schaum's series on Combinatorics and Graph Theory concise and useful.
More generally, if you're not already comfortable with linear algebra (a couple college semesters "or equivalent experience"), I'd recommend both  and  for two entirely different perspectives. For modern algebra more generally, I'm a huge fan of .
If I could only take one introductory mathematics book to a desert island, I'd cheat a bit and take , , and . While never directly involved in CS, Courant was very interested in the practical applications of theoretical mathematics, see, e.g.,  and, well, most everything else he wrote.
I could go on and on, so I'll stop here.
It then follows something similar to a university-level curriculum in Europe (continental, perhaps not UK), which might lead all the way to research maths, but perhaps it is not the most appropriate path for computer science because it entirely misses the mathematical concept of computability.
Computer Science: http://aduni.org
Another vote for khanacademy. I did this myself, and worked out a plan to put every piece of information into long term memory. When you're motivated and not limited by century-old university bureaucracy, you can move very quickly, and run circles around people with a "real" degree, most of whom have forgotten everything but the general concepts.
Udacity is also coming up as an excellent resource, but the important thing to realize is that every resource adds a different perspective on the matter that is extremely valuable.
For CS-specific information, aduni.org is a glorious resource. It's from 2001, but the CS basics never change. I'd recommend getting a handle on all the math from khanacademy so that you understand everything they're referencing.
Good place to start would be to go through khan academy's probability videos.
Then you can move to a discrete math book like Discrete Mathematics with Applications or Mathematics for Computer Science (free pdf courses.csail.mit.edu/6.042/fall10/mcs-ftl.pdf) and any Calculus book of your liking.
There are also great resources to learn Math using Haskell. I'm working through The Haskell Road to Logic, Maths and Programming and it's proving to be great http://www.haskell.org/haskellwiki/Books_and_tutorials/Mathe...
I have also signed up for a class at the nearest community college in my area.
Udacity's introduction to statistics, so far, has been really easy.
From what I've seen so far, Udacity are aiming to be more inclusive "community college" level and Coursera are aiming for a true university level education.
I'm having a really tough time with the exercises even though I should have the prereqs... What can I say, we don't do much rigorous math in engineering.
Anyways this book came highly recommended to me from a friend in quantitative finance.
The benefit of DIY maths is that you get to focus on what your interest is, whether that be algebra or geometry or stats/data analysis or discrete maths or foundations or whatever. (Contrary to some answers here, I don't think trig, measure theory, or planar geometry are necessary, unless you read something and it catches your fancy.) If your goal is just to do more maths, then just throw away anything that bores you and dive into anything that interests you.
If you're interested in C.S. applications then I think discrete maths is the thing to focus on. Eg, "Concrete Mathematics" by Knuth. But you can go an even more direct route and if algorithms is your goal, there are some free courses available online that do just that.
MIT OCW and AcademicEarth are excellent for in-depth (course-length) treatments of material by seasoned teachers. Wikipedia and math.StackExchange are good for broad overviews and quick Q's, respectively.
For "proofs, and other practical applications" -- that's quite broad. OCW has a course on geophysics and a course on signal processing -- both are "practical applications" but I don't know if that's what you have in mind. Most of the OCW stuff -- whether it's chemistry, materials sci, DSP, whatever -- is aimed at engineers (other than the upper level pure maths), and therefore practical.
As for proofs, you will see them in almost any maths textbook. Spivak makes some challenging ones and if you haven't taken calculus or linear algebra then Flanigan & Kazdan might be worth a skim. But if you're interested in CS applications then you can find those kinds of argumentation in an algorithms course.
BTW, I do not recommend the Khan Academy. It gets a lot of press but I have not found it so great. (It is fine but so is just surfing Youtube -- a lot of people make videos besides SK.)
Also, if you can still edit this question, you might give us some idea of where you're starting from and how much time you expect to spend, to elicit better answers.
PS By "the hard way" do you mean doing exercises? Because pretty much all mathematics texts expect you to do it "the hard way" if that's what you mean.
Use something like Khan Academy to fill in the blanks.
It is squarely targeted at beginners, if you know the stuff already then many other commenters have recommended more advanced books.
Then I started applying it to programming step by step, which made it possible for me to look at every little calculation.
Then I made a tiny physics simulation: http://www.youtube.com/watch?v=ud8NirjyLAA
Here is a nutshell description:
The standard high school level subjects are algebra, plane geometry, second year algebra, trigonometry, and solid geometry.
The standard college level subjects are calculus, abstract algebra, linear algebra, advanced calculus, ordinary differential equations. Might also take elementary courses in probability and statistics.
Standard graduate school topics are point set topology, measure theory, functional analysis.
For more in applications, can do (1) optimization -- linear, network linear, linear integer, quadratic, non-linear, and dynamic programming -- and (2) probability, statistics, and stochastic processes based on measure theory.
Here is an overview:
The course in abstract algebra will make you familiar with a long list of topics that come up in computing including sets, Boolean algebra, logic, relations, mappings, integers, prime numbers, modular arithmetic, rational numbers, the fundamental theorem of arithmetic, the fundamental theorem of algebra, elementary number theory, and the Euclidean greatest common divisor algorithm. You will also see finite field theory which has long been important in algebraic coding theory. You may touch on the question of P versus NP. You may see some of the work of Goedel, etc. and model theory. You will get good with math based on definitions, theorems, and proofs and develop 'mathematical maturity', i.e., ability to read, understand, and do abstract mathematics. Your ability to describe logical material in writing will improve.
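Of the topics in that list, the Euclidean greatest common divisor algorithm is the one that fits in a few lines; a standard sketch:

```python
def gcd(a, b):
    # Euclid: gcd(a, b) = gcd(b, a mod b), and gcd(a, 0) = a
    while b:
        a, b = b, a % b
    return a

print(gcd(252, 105))  # 21
```

It is also the usual first example of proving an algorithm correct and bounding its running time, so it pairs naturally with the "definitions, theorems, and proofs" style of an abstract algebra course.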
Famous authors in abstract algebra include Birkhoff, MacLane, Herstein, Lang. More recent authors likely touch on algebraic geometry.
Linear algebra, mostly about 'vectors' and linear transformations, is the core of an ocean of applications in multivariate statistics (regression, factor analysis, analysis of variance, discriminant analysis, and more, and, thus, applications in ad targeting, machine learning, data mining, anomaly detection, recommendation engines, etc.), optimization, group representation theory as in molecular spectroscopy, Shannon's information theory, and more. E.g., linear algebra is the place to learn about solving systems of linear equations, e.g., by Gauss elimination, and, thus, the place to get started with matrix inversion and the simplex algorithm of linear programming (optimization) and network linear programming.
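Gauss elimination itself is short enough to sketch. This is a bare-bones illustrative version with partial pivoting (pure Python, no numpy; a real code would also guard against singular matrices):

```python
def solve(A, b):
    """Solve A x = b by Gauss elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        # partial pivoting: bring the largest remaining pivot to the top
        p = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[p] = M[p], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]  # eliminate below the pivot
    # back substitution on the resulting upper-triangular system
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

print(solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0]))  # [0.8, 1.4]
```

For serious numerical work see Forsythe and Moler, as recommended below; this is only the textbook scheme.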
A good first text in linear algebra is by Ben Noble. The crown jewel is by Paul Halmos, 'Finite Dimensional Vector Spaces', written when Halmos was an assistant to von Neumann; the book approaches linear algebra much like it was functional analysis and, thus, is a baby version of Hilbert space theory and, thus, sometimes used to teach Hilbert space theory to physicists for quantum mechanics. Richard Bellman wrote tons. See also Roger Horn's books. Evar Nering's is a good first book (although his treatment of linear programming is not good); so is Hoffman and Kunze (apparently now available on the Internet in PDF for free). For numerical linear algebra, see Forsythe and Moler.
The standard advanced calculus text to teach the theorems and proofs of calculus is Walter Rudin's 'Principles of Mathematical Analysis'. After that book, notation such as O( n ln(n) ) as in Knuth's TACP will be child's play.
Note: At one time the Halmos 'Finite Dimensional Vector Spaces' and Rudin's 'Principles' were used in Harvard's Math 55 with a colorful description in
Also, good coverage of the binomial theorem (from high school or abstract algebra) and an elementary course in probability will be a help in following Knuth's TACP.
Measure theory will take a huge load off your back: Calculus, that is, Riemann integration theory as taught through Rudin's 'Principles', has some rough edges, understood back to at least 1900, and Lebesgue and Borel and measure theory (heavily due to Lebesgue) provide a clean solution. As from Kolmogorov, the solution also makes a clean foundation for probability, stochastic processes, and statistics.
Likely the nicest first text in measure theory is Royden's 'Real Analysis'. Also good is the first, real, half of Rudin's 'Real and Complex Analysis' where he also gives nice introductions to both Hilbert and Banach spaces and a nice treatment of the Fourier transform.
My favorite authors in probability based on measure theory are J. Neveu (long in Paris) and L. Breiman (long at Berkeley). Or just read from their teacher, M. Loeve (long at Berkeley but with a writing style that looks like some cross between English and French).
This background will also let you do much more in both pure and applied math in many directions and for many applications.
For learning math, it is "not a spectator sport". Most of the work is between your ears as you think about the material. A good teacher in abstract algebra is likely necessary to get you started. Also without at least occasional contact with a solid university department it is too easy to get off track. Still, nearly all the work is to be done alone in a quiet room, and there a book is fine. In principle videos could help, but so far I've never seen even one that I would recommend for any utility at all in learning math.
You are right that you do need algebra, linear algebra, just enough calculus to understand infinite series (which is usually put in the second calculus course), etc.
You left out graph theory and combinatorics, both of which are extremely important to CS.
I would also suggest more emphasis on logic, sets, and type theory. Category Theory is also something fairly common in CS. So I would recommend (Free!): www.cs.unibo.it/~asperti/PAPERS/book.pdf
Abstract algebra: http://abstract.ups.edu/
And while I am skeptical about the need for linear algebra in CS, it is such a key requirement for mathematical maturity that I will suggest: http://www.math.miami.edu/~ec/book/. If linear algebra is a building, then abstract algebra is the frame. The two should really be taught at the same time.
Linear algebra is a necessary piece of background for linear programming (including the simplex method) and or standard approximation algorithms for many NP-hard problems.
Strassen's algorithm for fast matrix multiplication is commonly taught in algorithms class. It does not make much sense unless you know what matrix multiplication is.
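For the curious, the heart of Strassen's trick is that a 2x2 block product needs only 7 multiplications instead of 8 (the recursion then applies this to matrix blocks). A sketch of just the 2x2 base case, with my own variable names:

```python
def strassen_2x2(A, B):
    """Multiply 2x2 matrices using Strassen's 7 products instead of 8."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    p1 = a * (f - h)
    p2 = (a + b) * h
    p3 = (c + d) * e
    p4 = d * (g - e)
    p5 = (a + d) * (e + h)
    p6 = (b - d) * (g + h)
    p7 = (a - c) * (e + f)
    # recombine the 7 products into the 4 entries of A @ B
    return [[p5 + p4 - p2 + p6, p1 + p2],
            [p3 + p4, p1 + p5 - p3 - p7]]

print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Saving one multiplication per level of recursion is what drops the exponent from 3 to log2(7) ≈ 2.81.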
Also if you want to work in computer graphics (which comes up both in theoretical and applied problems) you will need a solid understanding of linear algebra and matrices to understand the material.
I could list more, but that's enough to demonstrate that linear algebra does come up in a lot of places.
(p.s. i read your stuff on kelly criterion a long whiles ago, top notch, thanks!)
You're welcome, and thanks for the compliment. :-)
Right. The only advanced calculus text I listed was 'Baby Rudin', and the main contribution there is just to get some of the more important properties of the real numbers, Euclidean n-space, infinite sequences and series, and Riemann integration solid.
If a CS student is to stop before these topics, okay, but if they are going to go on then these topics will be part of what is generally assumed.
For CS, as we know, it is doing what EE did -- moving beyond its core tools and into what to do with those tools. E.g., EE got into nonlinear filtering and stochastic integration. E.g., CS is now getting into both optimization and statistics. Then being handy with that Baby Rudin material, at least the early chapters, will get to be important.
With Baby Rudin already done, there can be a good course in differential equations, and such a course can be a good way to see some of the value of what you did in linear algebra and Baby Rudin and to exercise that material. At some point in the future, a CS guy might well get into some work involving differential equations -- viral growth models, flight of airplanes and space vehicles, and much of mechanical engineering.
One of the main themes in the future of computer applications is handling 'randomness', and my view is that serious work in that direction should have the measure theory foundations. I did an A/B on that! Early in my career I tried the easy way. After measure theory, Neveu, Breiman, Loeve, Dynkin, Lipster, Shiryaev, etc., I concluded that the measure theory approach to probability, stochastic processes, and statistics was essential.
In particular, without measure theory, people too easily get totally stuck in the mud on what 'random' means, while with measure theory Kolmogorov has a really nice answer. My view is that people may not like Kolmogorov's answer but that, with some really simple assumptions, we get forced into that answer anyway!
For the rest, the Hilbert and Banach spaces won't go away! E.g., a huge dessert buffet of really finger lick'n good applications of the Hahn-Banach theorem is David G. Luenberger, 'Optimization by Vector Space Methods', and that book will be a nice source of methods for a lot that computing people might encounter.
> You left out graph theory and combinatorics, both of which are extremely important to CS.
For combinatorics, I assumed that one would get enough, in some ways deeper than in Knuth's TACP, from abstract algebra and elementary probability. E.g., combinatorics is a lot about counting, and so is group theory in abstract algebra.
For graph theory, I assumed that one would get enough from optimization on networks. E.g., a 'basis' in the network simplex method on a network is a minimum spanning tree! And the max flow/min cut theorem can follow just from linear programming. Dynamic programming, which I mentioned, can be viewed as graph theory.
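Since minimum spanning trees just came up: Kruskal's MST algorithm is a compact example of the graph-theory-meets-optimization material being discussed (toy graph, union-find with path compression; variable names are my own):

```python
def kruskal(n, edges):
    """Minimum spanning tree of an n-node graph; edges are (weight, u, v)."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x
    mst = []
    for w, u, v in sorted(edges):  # greedily take cheapest edges first
        ru, rv = find(u), find(v)
        if ru != rv:               # keep the edge only if it joins two components
            parent[ru] = rv
            mst.append((u, v, w))
    return mst

edges = [(1, 0, 1), (4, 0, 2), (2, 1, 2), (3, 1, 3), (5, 2, 3)]
print(kruskal(4, edges))  # three edges spanning the 4 nodes, total weight 6
```

The correctness proof (the greedy choice is always safe across a cut) is exactly the kind of argument an algorithms or optimization course exercises.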
I did mention linear programming (and so does CLRS), and one reason is that in the CS study of algorithms the algorithms for linear programming are important, and surprising, benchmarks. Also integer linear programming is one of the more important motivations for the question of P versus NP.
For some of the surprise, the simplex algorithm has low-degree polynomial expected performance (K. Borgwardt) but exponential worst-case performance (Klee and Minty), while the polynomial algorithms (Khachiyan) are too slow for practice even in the cases where they beat simplex! So, at one of the first places we looked at computational complexity for problems more challenging than, say, heap sort and AVL trees, an exponential algorithm turned out to be superior in nearly every sense of interest to a polynomial algorithm! There never was a guarantee that the study of computational complexity would be easy!
But there's a lot of overlap: I didn't mention courses in 'finite mathematics', combinatorics, or graph theory, but one way and another what I described should provide enough coverage. I didn't mention the CS book CLRS, but I mentioned good coverage of some of the more advanced topics in that book. Or, there is a lot of blending old wine and pouring it into new bottles.
There is a broad point: A major theme in CS now is to borrow, modify, and apply work done some years ago in applied math, especially from operations research, e.g., linear programming, flows on networks, stochastic point processes, and statistics. While CS has some new applications, typically the material is done more carefully in the old applied math sources. So, I emphasized learning the material as math instead of as CS. Besides, the OP was asking about math for CS and not 'mathematical CS'! Also the title asked for the "hard way"!
There's another broad point: What to learn and why to learn it? Just a first cut view of what O( n ln(n) ) means should take only a little searching on the Internet. Besides, Knuth's TACP is quite clear on such 'asymptotics'. I wish Microsoft's MSDN documentation of .NET was as clear and easy to read as TACP!
One reason not to learn this stuff is to confirm that Knuth, Sedgewick, CLRS, etc. were correct after all!
Generally the reason to learn such stuff is for new applications in the future. For that, my view was just to stay with a relatively traditional course in relatively applied math although I leaned away from classic mathematical physics and more to business applications.
I think there is a point of diminishing returns.
Baby Rudin I'm dubious about. But Royden and big Rudin (both of which you recommended) I have certainty about. There are good reasons that I never saw CS students in my real analysis classes. I don't think it is particularly valuable for CS, either then or now, to acquire a deep understanding of real analysis.
And yes, I know about measure theory. I know how it applies to probability. But I went the other way. I learned measure theory. Then I learned probability. Then I began having to do probability stuff in the real world. And not once has my measure theory background been particularly relevant.
As for Hilbert and Banach spaces, they are key pieces of mathematics. In fields from wavelets to optimization theory, they come up over and over again. But I would wager that most computer science professors do not need to know what Hilbert and Banach spaces are. I'd even bet that most have not heard of the Hahn-Banach theorem. Again, if you find yourself going that way, learn it later.
On combinatorics and graph theory, you claim that people will learn enough of that material elsewhere. Maybe, maybe not. But it is clear that programming problems routinely get turned into graph theory problems, many of the most important programming algorithms are about graph theory (start with the traveling salesman problem and work your way through the list of NP-complete problems), and at its heart, analyzing an algorithm's run-time is a combinatorics problem. Acquiring the necessary concepts and vocabulary for those is necessary, whether you classify the book you're learning from as a math text or a CS text.
Yes, there is a big question about what to learn, about how much to invest in such things.
> Baby Rudin I'm dubious about. But Royden and big Rudin (both of which you recommended) I have certainty about.
But Baby Rudin is a prerequisite to Royden and big Rudin.
I'm sorry, but probability, stochastic processes, and mathematical statistics were junk for me until I went at them via measure theory.
I floundered terribly with random variables until I saw the measure theory definition; it's terrific: Go take 10,000 measurements. Now you have the values of 10,000 random variables. Any 10,000 measurements at all. So far, no concept of 'randomness' at all. So, random variables are very general things and, e.g., handle even deterministic processes as a special case.
E.g., sufficient statistics is just an application of the Radon-Nikodym theorem, and a total train wreck to do otherwise. Yes, order statistics are always sufficient, maybe nice to know in 'data mining'. That sample mean and sample variance are sufficient in the Gaussian case is mind blowing; nice opportunity for 'data compression'!
E.g., measure theory and the Radon-Nikodym theorem define conditional expectation, that is, under mild assumptions, E[Y|X] = f(X) for some measurable f. Then easily f(X) is the best non-linear least squares approximation of Y. Nice.
Further, if 'cross tabulate' Y on X, then have a discrete approximation to E[Y|X] which shows that cross tabulation is a discrete version of the best non-linear approximation of Y given X.
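The cross-tabulation point is easy to see numerically. A toy sketch (the model Y = X^2 + noise and all names are illustrative assumptions): bin X, average Y within each bin, and the bin averages track E[Y|X]:

```python
import random

random.seed(0)
data = []
for _ in range(10000):
    x = random.uniform(0, 1)
    y = x * x + random.gauss(0, 0.05)  # toy model: E[Y|X] = X^2
    data.append((x, y))

bins = 10
sums = [0.0] * bins
counts = [0] * bins
for x, y in data:
    b = min(int(x * bins), bins - 1)   # which bin x falls in
    sums[b] += y
    counts[b] += 1

# average Y per bin: a discrete approximation to E[Y|X]
cond_mean = [s / c for s, c in zip(sums, counts)]
for b, m in enumerate(cond_mean):
    center = (b + 0.5) / bins
    print(f"bin {b}: avg Y = {m:.3f}, center^2 = {center * center:.3f}")
```

Each printed bin average sits close to the square of the bin center, i.e., close to the best (non-linear) predictor of Y from X, just as claimed above.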
Measure theory permits working with all the forms of convergence of random variables, especially strong convergence, at least awkward to do otherwise.
Then martingale theory makes little sense without measure theory.
Measure theory, the Kolmogorov extension, shows that we really can have a collection of random variables with desired properties.
Measure theory is crucial in even defining E[Y|U(t), t <= a] since we are conditioning on an uncountably infinite collection of random variables. Measure theory, then, shows that we can replace U(t), t <= a by the sigma algebra they generate and, then, condition on the sigma algebra. Cute.
Constructions such as
E[Y|U(t), t <= a]
are crucial in the nice qualitative, axiomatic definition of the Poisson process.
Similarly for independence of two collections of random variables where each collection has uncountably infinitely many random variables.
Measure theory was crucial in the standard results of ergodic theory.
I wrote a paper on anomaly detection in server farms and networks, and the key idea in the paper was a finite group of measure preserving transformations lifted roughly from ergodic theory.
Via measure theory we can show that the space of real valued L^2 random variables is complete, and, thus, a Hilbert space, which continues to blow my mind that any such thing could be true. I'd also like to have locally compact, but that's a bit much to hope for!
The Doob decomposition shows that every stochastic process is the sum of a martingale and a predictable process, all measure theory!
It's tough enough to believe in probability with the measure theory foundations; otherwise, I couldn't swallow the stuff!
There is a broad point: Maybe OP wants to know what to learn for the applications of the future. Then what current CS profs know is not necessarily very relevant!
That does not tell me that this will be the future of CS. But there are niches where this stuff is more applicable than I had realized.
But one direction is handling 'randomness', and for that I recommend a measure theory foundation of probability, stochastic processes, and mathematical statistics. Just a recommendation. It's risky; your mileage may vary. As Yoda said, "Always difficult to see, the future".
One thing I'd like to point out is that measure theory is not the only and probably the least interesting way to study probability. There is the more elegant (IMO) approach via Nonstandard Analysis. And the fun more practical approach via Game Theory and Markets (which also support "imprecise probabilities").
I also think there's room for different approaches to the same thing, each offering its own unique insight. Many differential equation modelling problems, especially those involving populations, could be fruitfully replaced by agent modelling.
Vovk and Shafer's game-theoretic approach is interesting in that related approaches like bandit models and online learning have recently picked up in popularity.
And real understanding takes time no matter what you do, the best one can do is start off in a manner such that the tools required to understand well enough are not more complicated than the subject matter itself.
However you've created difficulties for the day that they dive back in and try to really learn the subject. Because as they go to learn what a real number really is, and what its properties are, and about pathological functions, they also have to learn about the hyper-reals and a complex model-theory construction that (in both variations that I am aware of) requires choice.
There are a variety of other pedagogical choices that do not present such barriers to comprehension. (Note, by no means am I a fan of the limit approach. I know full well that it goes over the heads of the students, and I see no point in having the person at the front blather on about stuff that the class cannot be expected to understand.)
In any case my biggest complaint about how Calculus is taught is this. I think people come out of a first Calculus course without understanding the tangent line properly. If you don't understand the tangent line, Calculus is a mass of formulas. Despite how easy it is symbolically to jump directly from the tangent line to the derivative, I think that a solid week should be spent on the tangent line (calculating it for more complicated functions, finding applications, etc.) until every student understands it well enough that they are ready to take the leap of looking at the slope and building a function out of it.
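The tangent-line-first idea can even be demonstrated numerically before any symbolic rules. A small sketch (my own choice of function and point): watch secant slopes through (a, f(a)) settle down to the tangent slope as the second point slides in:

```python
def f(x):
    return x ** 3  # example curve; tangent slope at a=1 is 3

a = 1.0
for h in [0.1, 0.01, 0.001, 0.0001]:
    slope = (f(a + h) - f(a)) / h  # slope of the secant through a and a+h
    print(f"h = {h}: secant slope = {slope:.4f}")
# the tangent line at a is then y = f(a) + 3 * (x - a)
```

Seeing the slopes converge makes "the tangent line is the limit of secant lines" an observation rather than a definition to memorize.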
Anyone that can follow the proofs required to construct the reals can easily do so for the hyperreals. I am not an expert in the area and it's been years since I studied it in depth, but I know there are methods which avoid the need for model theory. I do not recall them invoking the axiom of choice but I could be wrong on that.
But the point is not that the learner gets a Hyperreal-only approach but a varied exposure. I think that is the key, much more important than understanding tangents (what happens to visualization in dimension 4?). I think that learning all of the disparate parts at once and letting the student have the time to become comfortable (linear maps, derivatives, surfaces, groups) is what would be best. Allowing them to drift and backtrack and then whenever they felt ready would take whatever appropriate exams to show mastery of each area. The exams would also allow for more interesting problems.
It would take longer but the end product would be a far more cohesive understanding than the mishmashed nature of the current historical siloed approach. You've done machine learning, right? I think one can take something from there about learning. The brain is just a much more advanced version of those basic objects: more and varied examples is better than curated and small examples. It won't overwhelm anyone unless they're overly impatient, and in that case math is probably not for that personality type.
The proofs for the hyperreals involve a lot more machinery than the proofs for the standard reals. That is my educated opinion based on knowing both sets of proofs and constructions. (Of course this need not make infinitesmals a pedagogical disaster - very few students actually care much about learning the proofs.)
As for the model theory approach, I am intimately familiar with the ultrafilter construction. It uses choice. I know there is a second construction which I am not familiar with, but from what I've read it also requires choice. Both involve model theory. That's a mighty big sledgehammer for a pretty small fly.
Incidentally, Dedekind cuts can be understood as follows. The set of reals can be equated with the set of points where you can cut the rationals into two. More precisely, if X is a nonempty subset of the rationals with an upper bound, we get a cut of the rationals into the set of upper bounds of X and the set of rationals that are not upper bounds of X. Any two subsets can be considered equivalent if their sets of upper bounds are identical. An equivalence class of subsets is a real number.
For any cut you can generate a unique set A of rationals that are not upper bounds, and a set B of rationals that are upper bounds. When you do this, every rational in A is less than every rational in B, and A does not contain a greatest element.
Conversely, if we have a partition of the rationals into two nonempty sets A and B such that every member of A is less than every member of B, and A does not contain a greatest element, then we have a cut of the rationals.
So there is a one-to-one correspondence between reals, places where we can cut the rationals, and partitions of the rationals with that property.
Those partitions of the rationals are called Dedekind cuts.
(This is one of two constructions of the real numbers. The other, Cauchy sequences, turns out to generalize more usefully in the field of topology.)
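The cut description above can be sketched numerically. Here is a minimal illustration (my own toy code, not a construction of the reals): the cut for sqrt(2) is modeled as a predicate picking out the lower set A, and the two defining properties are checked on sample rationals.

```python
from fractions import Fraction

# Model the lower set A of the cut for sqrt(2): every rational q with
# q < 0 or q^2 < 2. (The name in_lower_set is hypothetical, for illustration.)
def in_lower_set(q: Fraction) -> bool:
    return q < 0 or q * q < 2

# Property 1: every member of A is less than every member of B.
samples = [Fraction(n, d) for n in range(-4, 5) for d in range(1, 5)]
A = [q for q in samples if in_lower_set(q)]
B = [q for q in samples if not in_lower_set(q)]
assert all(a < b for a in A for b in B)

# Property 2: A contains no greatest element -- given q in A we can always
# find a larger rational still in A by halving a step until it fits.
def larger_element(q: Fraction) -> Fraction:
    step = Fraction(1)
    while not in_lower_set(q + step):
        step /= 2
    return q + step

q = Fraction(1)
for _ in range(10):
    q = larger_element(q)
assert in_lower_set(q)
print(float(q))  # a rational creeping up toward sqrt(2), never reaching it
```

The point is just that a real number here is nothing more than a way of splitting the rationals in two; the irrational "value" lives in the gap between A and B.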
I did find out that there is a constructive approach, though, so the axiom of choice is not actually necessary: www.math.ucla.edu/~asl/bsl/0403/0403-001.ps
MIT 18.06, 18.085, 18.086, 6.262, 6.450
Stanford EE263, EE364A, EE261
Profs. Strang, Boyd, and Gallager are quite a bit better with math than the typical engineering lecturer, even though their courses are not exactly at the level of Rudin, Breiman, et al.
It's not comprehensive by any means, and you probably need to know at least calculus to be ready for most of this, but it covers some pretty cool stuff, including RSA.
Generally in advanced calculus I avoided the discussion of, and much connection with, geometry, Stokes' theorem, and exterior algebra. So, I avoided Buck, Fleming, Spivak, and of course also, now in English, Henri Cartan, 'Differential Forms'. I even avoided the classic applied advanced calculus text, long used at MIT: Francis B. Hildebrand, 'Advanced Calculus for Applications'.
For Loomis and Sternberg, I agree with you, and have both the hard copy and the PDF.
Since I've mentioned such advanced calculus, I will try to save many students. Students, there's a secret: vector analysis, Stokes' theorem, etc. are important in physics and engineering; they will also be important in computing when computing concentrates on such physics and engineering. But still, mostly what you will find in physics and engineering is vector analysis much as it was done in the 19th century, which late 20th century math departments liked about as much as a skunk at a garden party.
If you read the modern treatments, complete with differential forms, then you will be at the head of the class in an advanced class in general relativity (e.g., Misner, Thorne, and Wheeler) but will still be lost in much of old physics and engineering!
So, what to do? Sure, go to Tom M. Apostol, 'Mathematical Analysis: A Modern Approach to Advanced Calculus', Addison-Wesley, Reading, Massachusetts, 1957. The good thing about this book is the lie in the title -- it's mostly a 19th century treatment and not "modern"! So do whatever you have to do to get a copy. And get the 1957 edition and NOT a more recent edition where he omitted the 'good stuff'!
Then, in about 20 pages of the sweetest dessert you ever tasted, with line integrals, conservative force fields, and potentials, volume and surface integrals, nice stuff like that, you will find a charmingly clear presentation of what you need. Right: The treatment is not up to the precision of Rudin and actually needs pictures. Still it's what you need for much of physics and engineering. It's, uh, 'intuitive' math; trying to make that material as precise as Rudin could take you, well, a long time.
And it's EASY -- you can take it in with a couple of beers and have a really fun evening. Then don't tell anyone where you learned it! Besides, at its core, it's just nice uses of the fundamental theorem of calculus you saw in freshman calculus! Did I mention, it's easy?
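That "nice uses of the fundamental theorem of calculus" remark can be made concrete. Here is a little numerical sketch (my own toy example, not from any of the texts above) of the fundamental theorem for line integrals: for a conservative field F = grad(phi), the line integral of F along a path depends only on the endpoints.

```python
# Toy check: phi(x, y) = x^2 * y, so F = grad(phi) = (2*x*y, x^2).
# Path: r(t) = (t, t^2) for t in [0, 1], from (0, 0) to (1, 1).
def F(x, y):
    return (2 * x * y, x * x)

def phi(x, y):
    return x * x * y

def line_integral(n=10_000):
    total, dt = 0.0, 1.0 / n
    for i in range(n):
        t = (i + 0.5) * dt        # midpoint rule
        x, y = t, t * t           # r(t)
        dx, dy = 1.0, 2.0 * t     # r'(t)
        fx, fy = F(x, y)
        total += (fx * dx + fy * dy) * dt
    return total

print(line_integral())  # ~= phi(1, 1) - phi(0, 0) = 1.0
```

Run it along any other path between the same endpoints and you get the same answer; that path independence is exactly what "conservative force field" and "potential" are about in those 20 pages.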
The key point about Rudin's 'Principles' is the care with which he covers the real numbers, compactness, continuity and uniform continuity, sequences and series, and the Riemann integral (yes, patched up with the Stieltjes extension which isn't much different). So, he concentrates hard on the foundations. For someone like the OP, getting those foundations solid is likely more important than rushing into many of the more famous topics in 'advanced calculus' -- Fourier series, the heat equation, Lagrange multipliers, vibrating strings (boundary value problems), the Navier-Stokes equations, series solutions to ordinary differential equations, etc.
While I like Rudin, 'cut many of my math teeth' on Rudin, and really like some of his treatments of some topics, I omitted some notes on how to read Rudin; some such notes could be helpful. In particular, Rudin has some places where it's easy to get stuck, and students should be advised not to get stuck (don't assume that just because you can't see how to solve some one exercise you must be missing something important) and, if necessary, just to look for other sources, ask for help, skip over and come back, or just f'get about it. Rudin was one of the best writers on his material, but he was not perfect; his writing varied, got easier to read as he wrote more, but is still relatively severe. Due to that severity, there have been some people, e.g., at Courant, who didn't like Rudin!
As much as I like the real half of his 'R&CA', he gets a bit severe and obscure in a few places (his novel and surprising but long and 'unstructured' construction of Lebesgue measure and his work on regular Borel measures); net, for most students it would be good to read Royden first or in parallel.
Rudin has two exercises that can slow people down: (1) every closed set is the union of a perfect set and a set that is at most countable, and (2) there are no countably infinite sigma-algebras. Both exercises require paying attention to what is countable versus uncountable. I worked on the first one about 14 hours a day for two weeks before someone mentioned 'uncountable', at which time I got it in about 90 seconds. The second one took me a long evening, but I was the only one in the class who got it. For the first one, Rudin eventually included the hint. Students: don't get stuck on such exercises.
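For exercise (2), the countable-versus-uncountable punch line can be sketched with the standard counting argument (which may or may not be the route Rudin intended):

```latex
% Sketch: no sigma-algebra is countably infinite. An infinite sigma-algebra
% on X contains an infinite sequence of nonempty, pairwise disjoint sets
% A_1, A_2, \dots. Each subset S of the naturals then yields a member
\[
  U_S \;=\; \bigcup_{n \in S} A_n , \qquad S \subseteq \mathbb{N},
\]
% and S \mapsto U_S is injective because the A_n are pairwise disjoint.
% So the sigma-algebra has at least 2^{\aleph_0} members: uncountable.
```

So a sigma-algebra is either finite or has at least the cardinality of the continuum; "countably infinite" is exactly the size it can never be.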
For Rudin's 'Functional Analysis', I nearly went to Brown's Division of Applied Math but at the last moment went to Hopkins instead. Brown was using Rudin's FA, so I got a copy and at Hopkins asked for a reading course in it. Alas, the prof had never heard of that book and declined to participate! So, Rudin's FA, along with his 'Fourier Analysis on Groups' or some such are still sitting new on my shelf as I write software!
But there are some risks.
First, broadly, a big risk is getting off track. There are many poor books; dig into one of those and you might never come out. And there are side streets; go down one of those and you might never get back to the main road. Don't do those things.
So, the lessons: Pick subjects carefully. In any subject, use only very highly recommended books, and use more than one in parallel although likely one book as your 'primary' source and the others as 'supplementary'.
For book recommendations, get those from a better source than Hacker News! I suggest getting recommendations from some of the best courses and professors in the departments at some of the world's best, and I mean top 20 or top 10, research universities. So, for a subject you want to learn, find what books such courses, profs, departments, and universities are using and recommending.
Second, another big risk is getting stuck. One way is to encounter an exercise you can't solve, believe that you are missing something important, and then just grind to a stop, for days, weeks, or forever, on that exercise. Don't do that.
(A) There are a lot of 'misplaced' exercises. Just because an exercise is in a book doesn't mean that it is reasonable to be able to solve it yet. Yes, in a good book you should be able to solve 90% of the exercises, but one book may have 1-20 exercises that are just nasty chuckholes in the road. I know one beautiful book where the exercises are so difficult, and apparently usually beyond the book, that one should get a prize for each doable exercise they find! (B) There are some tricky exercises that are just tricky and not really important. (C) There's no way you are going to miss much if you skip a few exercises. (D) You can continue on, maybe ask for help, look at your supplementary sources, etc. and then either get a solution or conclude that you haven't missed much. Yes, do work hard, but definitely don't get stuck. After enough hours at one spot, just move on.
Going on will sometimes give you the tools you need to resolve the place you were having trouble. Generally it's no sin, and sometimes helpful, to rush ahead quickly, get an overview, and then return to a more complete pass, and the overview can help you judge if the place you were having trouble was important or not. Also there's no way just one pass over some material can be as effective as three or so. E.g., one goal is a 'synthesis' of the material, and that needs several passes and several sources.
Third, there are occasional actual errors in the sources. So, if something doesn't make much sense, then it's wise to suspect an actual error someplace. You might be able to resolve the issue by consulting supplementary sources, doing some derivations on your own, say, to find a counterexample, or asking for help. By now the Internet may actually be a good place to ask for help. And, there's no law against writing the author of a book!
Fourth, some books are just much better than others. And even a good book might make a mess out of a few topics. So, again, consider several sources.
Fifth, there is a broad point about 'standards of quality'. Math is by a wide margin the most precise of all academic subjects. Still, in the total body of math that is done currently or already on the shelves of the libraries, the standards of quality vary widely.
Some of the most highly polished math ever put on paper is by (the collection of largely anonymous mathematicians) Bourbaki; very little of the rest of math is so highly polished; some of the best math and math writing is not as highly polished as Bourbaki.
So, there is a danger: it's possible to work to standards that are, in some or nearly all respects, too high. Beware: in the best work, the value is not the high polish but the real 'content' that would still be good just copied off a blackboard.
Why high polish? Because it can make one more immune to criticism. But, even Bourbaki gets criticized. Some criticism will always be there. So, don't try to defend against all possible criticism.
So, the 'standards of quality' that might be worth emphasizing would be the real 'content', the 'meaning', the 'significance', the 'power', the 'meat' and not secondary issues such as polish.
Contact with a good university department will give you some insight into what standards of quality you want to pursue, not too high, not too low, worthwhile and not just wasted effort, etc.
As I've written in this thread, it may be important for you to have a good teacher for abstract algebra since that is where your approach to math will take a big change, to how the rest of math works -- definitions, theorems, and proofs. You will need to learn how to get the understanding, intuition, 'conceptual models', helpful pictures, promising applications, the meaning, the 'meat', etc. out of presentations that look just abstract or like just abstract nonsense; of course, attempt to do such things only for what are without doubt high quality presentations.
You will need to learn how to write math with definitions, theorems, and proofs, and for that you may need a good prof to read and mark your homework.
If you tackle abstract algebra without a teacher, then beware of these 'side' lessons you need somehow to learn, and make some effort, likely not very large, to teach them to yourself. In particular, look at some beautifully written math, e.g., John von Neumann (e.g., his 'Quantum Mechanics'), Paul Halmos, Leo Breiman, and pay close attention to how they write.
All the warnings aside, if you are careful about avoiding the dangers, then you should be able to do well.
Let me suggest that, generally, some occasional contact with a good university department should help. E.g., if you work through Halmos's 'Finite Dimensional Vector Spaces', Rudin's 'Principles of Mathematical Analysis', MacLane and Birkhoff's 'Algebra', or Leo Breiman's 'Probability', along with supplementary sources for each, then maybe ask to sit in, totally informally, just to observe, on one or a few sessions of a corresponding university course. And get some homework assignments and some test copies. Then see if it appears you understood the material. You might be pleasantly surprised! Also, get old copies of the qualifying exams -- if these are easy for you, then you are doing well!
At least at one time, the Princeton math department, a good candidate for the best in the world, just stated that the graduate courses were introductions to research by experts, that no courses were given for preparation for the qualifying exams, and that students were expected to do such preparation on their own. Well, Princeton was stating that they expected students to do much like what you are trying to do, independent study.
There is a big theme: It's a good guess that what's valuable in math is creating new stuff. For a Ph.D. or tenure at a research university, research is the main work. For valuable applications to problems outside math, it may be that work that is at least a little new will be much more valuable.
Then, nearly always original work in math is done largely or totally independently. So, net, at some point, being good at independent work is crucial.