Confession as an AI researcher; seeking advice — 346 points by subroutine | 8 months ago | 98 comments

 Being a grad student, this person is at a perfect place to build up their math background. Any school almost certainly offers the following:

1. Convex Optimization -- not all problems are convex, but solutions for nonconvex problems end up primarily using convex methods with slight adaptations.
2. Stochastic Optimization -- ML is pretty much all stochastic optimization. No surprise there.
3. Statistical/Theoretical Machine Learning -- courses built around concentration bounds, PAC learnability, and the Valiant/Vapnik school of thought. This gives you what you need to talk about generalizability and sample complexity.
4. Numerical Linear Algebra -- being smart about linear algebra is most of efficient machine learning: knowing which kinds of factorizations help you solve problems efficiently. Can you do a Gram-Schmidt factorization? Cholesky decomposition? LU factorization? When do these things fail? When do you benefit from sparse representations?
5. Graphical Models -- Markov chains, Markov random fields, causal relationships, HMMs, factor graphs, the forward-backward and sum-product algorithms.

If you're in school, take advantage of the fact that you're in school. Once you have a grasp on these things (and you'll have to catch up on real analysis, matrix calculus, and a few other fields of math), you'll be able to start reasoning about ways to improve existing methods or come up with your own. I think a lot of it is just developing the mathematical maturity to give you a vocabulary to think about things with.
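To make point 4 concrete, here's a minimal sketch (NumPy, toy matrices of my own -- not from any particular course) of why Cholesky is the right tool for a symmetric positive-definite system, and of the way it breaks down when the matrix isn't positive definite:

```python
import numpy as np

# Solve A x = b for a symmetric positive-definite A via Cholesky:
# A = L L^T, then two triangular solves instead of forming A^{-1}.
A = np.array([[4.0, 2.0],
              [2.0, 3.0]])   # symmetric positive definite (toy example)
b = np.array([1.0, 2.0])

L = np.linalg.cholesky(A)    # A = L @ L.T
y = np.linalg.solve(L, b)    # forward substitution
x = np.linalg.solve(L.T, y)  # back substitution
assert np.allclose(A @ x, b)

# Cholesky fails exactly when the matrix is not positive definite:
try:
    np.linalg.cholesky(np.array([[1.0, 2.0],
                                 [2.0, 1.0]]))  # indefinite (eigenvalues 3, -1)
except np.linalg.LinAlgError:
    print("not positive definite -- Cholesky breaks down")
```

Two triangular solves are cheaper and more stable than forming the inverse explicitly, which is exactly the "being smart about linear algebra" lesson.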
 Alright, so I do research in and produce optimization tools professionally. From my biased point of view, someone is better off learning generic optimization theory and algorithms rather than specialized versions like convex or stochastic optimization. Generally speaking, generic nonlinear optimization methods form the foundation for everything, and then there's a series of tricks to specialize the algorithm.

Very specifically, there are two books that I think provide a good foundation. First is Convex Functional Analysis by Kurdila and Zabarankin. Not many people know about this book, but essentially it provides a self-contained background to prove the Generalized Weierstrass Theorem, which details the conditions necessary for the existence and/or uniqueness of a solution to an optimization problem. This is important because even convex problems don't necessarily have a minimum. For example, min exp(-x) doesn't have one, but it does have an infimum. The background necessary to understand this book is real analysis, and as a quick aside I think Rudin's Principles of Mathematical Analysis is the best for this.

Second is Nocedal and Wright's Numerical Optimization. It provides a good overview of the powerful algorithms in optimization that we should be using. Its weakness is that it often cheats and uses a stronger constraint qualification than we're afforded in practice. Candidly, I find that the derivative of the constraints will not remain full rank and we will likely violate the LICQ. Further, it covers a number of algorithms that really shouldn't be used in practice, ever. That said, it does cover the good algorithms, and it generally has the best presentation of the books out there.

Sadly, I don't know of any killer books for numerical linear algebra. And, yes, I've read cover to cover things like Saad's Iterative Methods for Sparse Linear Systems, Trefethen and Bau's Numerical Linear Algebra, and Golub and Van Loan's Matrix Computations. They're valuable and well-written, but don't quite cover what I end up having to do to make linear algebra work in my solvers.

Anyway, this is all bias and opinion, so take what you will. If someone else has favorite references for optimization or numerical linear algebra, I'd love to hear them.
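Since min exp(-x) came up: the existence failure is easy to watch numerically. A small sketch (plain Python; the step size and iteration count are arbitrary choices of mine):

```python
import math

# f(x) = exp(-x) is convex and bounded below by 0 (its infimum),
# but attains no minimum: gradient descent just drifts right forever.
x = 0.0
step = 1.0
for _ in range(1000):
    grad = -math.exp(-x)   # f'(x) = -exp(-x), always negative
    x -= step * grad       # so x increases on every iteration

print(x)                   # keeps growing; no stationary point is ever reached
print(math.exp(-x))        # creeps toward the infimum 0, never equals it
```

The iterates wander off to infinity while the objective approaches 0 -- exactly the situation the Generalized Weierstrass Theorem's hypotheses rule out.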
 I can only speak from experience in my CS PhD program, but my above recommendations are based directly on that experience. We have an exam after the first two years and a similar process. It depends on what your advisor expects/wants. Mine has been flexible, and I focused more on efficient software engineering and applications of prior methods than on new research as I got up to speed on other matters. It made obvious a lot of ways I could improve them, just by being forced to look at and implement all of the details under the hood.

And, at least in my CS department, there is a very heavy emphasis on mathematics among the ML/AI folks. They coauthor a number of papers with the applied math department, and the rest of their papers are mostly proofs. They'll usually back it up with proof-of-concept implementations, but in that regard they're very much like researchers in applied math, except that they use Python instead of MATLAB.
 I think this is way too much for a pure CS person. It is not likely they will make a big contribution on the math side without being a mathematician first, e.g. an applied mathematician moving into CS.

For ML, the OP already has linear algebra, which is sufficient. Deep neural networks are backprop, which is basically high school math. You could have mentioned ODEs and sensitivity analysis, which I think are more relevant than convex optimization. For NNs we don't even care about identifiability, from both the statistics and dynamical systems points of view. NNs blow away SVMs and almost everything except random forests in some domains. Both of these have the interesting property that nobody understands them except as black boxes, for the most part. Boosting is another example. It really is stranger than fiction.

That being said, I think statistics/probability theory and Bayesian stats/networks are useful to know for any scientist. I would talk to your advisor about what to do. They will be able to advise on what's important and what to learn/focus on.
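For what it's worth, the "backprop is basically high school math" claim is easy to demonstrate: for a single sigmoid neuron it is literally just the chain rule. A toy sketch (the starting weights, learning rate, and training example are all made up for illustration):

```python
import math

# One sigmoid neuron trained on a single example -- all of "backprop"
# here is the chain rule from calculus.
w, b = 0.5, 0.0           # hypothetical starting weight and bias
x, target = 1.0, 1.0      # one made-up training example
lr = 1.0

for _ in range(200):
    z = w * x + b
    y = 1.0 / (1.0 + math.exp(-z))     # sigmoid activation
    loss = 0.5 * (y - target) ** 2
    # chain rule: dL/dw = (y - target) * y * (1 - y) * x
    delta = (y - target) * y * (1.0 - y)
    w -= lr * delta * x
    b -= lr * delta

print(loss)   # small: the neuron has fit its one example
```

Scaling this up to many neurons and layers is bookkeeping (matrix products and repeated chain rule), not deeper math.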
 > Boosting is another example. It really is stranger than fiction.

Is this true? Boosting is pretty well formulated in the PAC framework, and the classical algorithms (e.g. AdaBoost) are well characterized.
 You're correct. Boosting was directly formulated in the PAC framework. (Source: http://l2r.cs.uiuc.edu/Teaching/CS446-17/LectureNotesNew/boo... -- "The original boosting algorithm was proposed as an answer to a theoretical question in PAC learning [The Strength of Weak Learnability; Schapire, 89.]")

It took a while, but there's been a lot of work over the last 5 years or so explaining neural nets' performance, from papers showing PAC learnability for specific architectures (https://arxiv.org/abs/1710.10174), to work saying that most local optima are close to global optima (http://www.offconvex.org/2016/03/22/saddlepoints/), to work saying that the optimization error incurred (as separate from approximation and estimation errors) serves as a form of regularization for deep neural networks.

And understanding how these things work helps improve and speed up these methods and models: it's hybrid algorithms which are enabling performance on time-series data and more complex tasks. The future will almost certainly use neural networks as parts of many algorithms, but I doubt that the full machinery will be simple feed-forward nets of ever-increasing size.
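For anyone who wants to see how un-mysterious AdaBoost is, here's a from-scratch sketch on a toy 1-D dataset of my own invention, with decision stumps as the weak learners. No single stump can separate this data, but the boosted ensemble can:

```python
import math

# Minimal AdaBoost with decision stumps on a toy 1-D dataset.
X = [0, 1, 2, 3, 4, 5]
y = [+1, +1, -1, -1, +1, +1]   # hypothetical labels; not stump-separable
n = len(X)
w = [1.0 / n] * n              # example weights D_t
ensemble = []                  # list of (alpha, threshold, direction)

def stump(x, thr, d):
    # predicts d for x < thr, -d otherwise
    return d if x < thr else -d

for _ in range(5):
    # weak learner: exhaustive search over stumps for minimum weighted error
    best = None
    for thr in [i - 0.5 for i in range(7)]:
        for d in (+1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if stump(xi, thr, d) != yi)
            if best is None or err < best[0]:
                best = (err, thr, d)
    err, thr, d = best
    alpha = 0.5 * math.log((1 - err) / max(err, 1e-12))
    ensemble.append((alpha, thr, d))
    # reweight: misclassified points gain weight, then normalize
    w = [wi * math.exp(-alpha * yi * stump(xi, thr, d))
         for xi, yi, wi in zip(X, y, w)]
    total = sum(w)
    w = [wi / total for wi in w]

def predict(x):
    score = sum(a * stump(x, thr, d) for a, thr, d in ensemble)
    return 1 if score >= 0 else -1

print([predict(x) for x in X])   # matches y: zero training error
```

The alpha = 0.5 ln((1-err)/err) vote weights and the exponential reweighting are exactly the quantities the PAC-style analyses bound.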
 This would’ve literally been my answer. Brilliant. I can’t recommend this answer enough.
 I finished my masters, but one problem is that that's over half the courses in a 10-course masters program, which generally only allows for 4 total electives. As a CS grad, I felt handcuffed by required core courses and other "select 2 of these 4" requirements, which took up 6 courses in the major. The remaining 4 courses were electives, and 3 were allowed outside of CIS. Those courses were mostly offered in the same semester and would also conflict with required core courses.

Another problem is that most data scientist positions are filled by statisticians, who will be giving you the job interview. Almost all of the questions will be about stats. I personally feel a mastery of those courses would be great, but they would also not help me land a job, because improving LDA to run on small text input by using a variational autoencoder doesn't help me recite the formula for a t-test.
 What courses (starting from pre-calculus) should one take to do what you listed above? I want to match your recommendations to course titles starting with pre-calculus. List book recommendations as well if you would. Thanks!
 I guess baby Rudin (or Hubbard & Hubbard for something simpler) in the analysis department; and Halmos (or Axler) in the linear algebra department.This is, essentially, Math 55. All 4 books have been used at different stages in this famous course.
 Halmos seems to discuss the same things as Hoffman and Kunze, which is the more "standard" and recommended book. Nevertheless, after these you will still have to read up on multilinear algebra (tensors and determinant-like functions) as well as the numerical side of linear algebra.
 Convex: Bertsekas -- Convex Optimization Theory, Convex Optimization Algorithms; Nesterov -- Lecture Notes (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.693...)

Statistical/Theoretical: Shai Shalev-Shwartz & Shai Ben-David's Understanding Machine Learning (http://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning...); Mohri's Foundations of Machine Learning (https://cs.nyu.edu/~mohri/mlbook/)

The two above courses could share SSS's Online Learning text (https://www.cs.huji.ac.il/~shais/papers/OLsurvey.pdf). To be fair, the stochastic variants of most optimization algorithms can be learned reasonably quickly off of a statistical machine learning/basic optimization background. There's the option of Spall's Introduction to Stochastic Search and Optimization, which covers neural networks, reinforcement learning, annealing, MCMC, and a wide variety of other applications and techniques. (http://www.jhuapl.edu/ISSO/)

Similar to what kxyvr said, I also don't know of any killer linear algebra text, which is why I think a course is so useful. The Matrix Cookbook is helpful along the way. kxyvr is also entirely right that general nonlinear optimization is important -- though perhaps less indispensable. (Going the other way, the Bertsimas linear optimization textbook I've had for years mostly gathers dust.)

For PGMs: I got Predicting Structured Data back when it was new (https://mitpress.mit.edu/books/predicting-structured-data), but I think that Chris Bishop's treatment in PRML is easier to follow. He has some lecture slides which expand on it quite well. (https://www.microsoft.com/en-us/research/people/cmbishop/) Bishop would also be my go-to intro ML book over Murphy.

I can't in fairness offer recommendations for the rest of the intermediate undergraduate math texts because I took them so long ago, but I can say that I have benefited from reviewing the MIT OCW courses from time to time.
 Great, thanks!
 Learning all the advanced math within a few years is a hopeless endeavour. It would take decades of hard work because there is so much out there, and you'd need to know all of it if you want to make progress (e.g. pull a fancy-name theorem out of nowhere to solve some practical problem). I find a better approach is to focus on a few basic ideas you need specifically for your work and dig deep there. Nobody can be expert-level in everything, but you can be expert-level in your specific domain of research.

Also, for ML stuff, it's hard to overemphasize the importance of understanding linear algebra really well. Here is an excerpt of a book I wrote on LA which should get you started on learning this fascinating swiss-army-knife-like subject: https://minireference.com/static/excerpts/noBSguide2LA_previ...
 What you say is correct, but that reddit thread concerns (I think) a narrow class of devs of which there are many here: US CS majors who graduate with only a few semesters of calc and discrete math and are underequipped to bootstrap themselves if they need to understand, say, topology. But they have to gain some understanding, so I put a few book lists in that reddit thread; took about ... 5 minutes.

BTW I really like what I've seen of your guidebooks, or whatever you call them (neo-textbooks?).
 "Learning all the advanced math within a few years is a hopeless endeavour."Not in my experience. It's possible to get the equivalent of a bachelors and masters in math within two years (which is enough to overcome the issues listed in the post), but it's all you'll be doing for that period of time. Well worth it imo.
 > Not in my experience. It's possible to get the equivalent of a bachelors and masters in math within two years

Seriously. In fact, most people who learn this math learn it in a period of 2-3 years. Far from being impossible, learning this math in a few years is normal. It's not even a full-time job: most people learn all this math while also doing other classes and school stuff. Even a very dedicated math major probably only spends 20-25 hours a week actually studying math, and I'm not sure much more than that is sustainable for most people anyway.

Now, I'll grant, this is going to be a lot harder to do without the structure of well-thought-out syllabi and lectures, but it's certainly manageable.
 If you want to learn decent undergrad math you really don't even need that much: a year of analysis, a year of algebra, topology. Then maybe a combination of a more in-depth linear algebra course, an optimization course, a probability course, and a stats course. You could probably do all that and a few more electives in one year full time.

From what I've heard from some foreign students, the 400-level American undergrad courses are what they are expected to learn as freshmen.
 > It's possible to get the equivalent of a bachelors and masters in math within two years (which is enough to overcome the issues listed in the post), but it's all you'll be doing for that period of time.About that - I don't think OP (or their research group) can afford to hold off publishing and attending ML conferences for 2 whole years while brushing up on math.
 The explore/exploit ratio depends on what your goals are. If it's just to bang out a PhD in applied ML, then perhaps, but if you're in it for the long haul it's well worth it. It would unlock a whole bunch of research directions, and catching back up to the field with solid math skills in hand would be quick. If you pay attention to the leaders in AI, they're mostly applied mathematicians in disguise.
 You don’t need to understand everything in every field and I doubt very many people do.When Einstein was working on general relativity, he had a lot of help from friends and colleagues who pointed him towards the math he needed. He didn’t learn differential geometry until he was already deep into general relativity.Find a level of abstraction that you’re comfortable with and learn to be okay with black boxes at the lower level, and only dig into those boxes when what’s inside them actually matters.
 I think an important lesson for any grad student is to learn to read through the bullshit in papers and try to understand what the authors actually did. It helps a lot that in CS you can often see the code that the authors published along with the paper. Just staring at formulae doesn't mean much, because for all you know the author just hammed up the equations to get their paper into a top conference. That's not to say that the equations are excessive, or that the authors are being misleading, but I think there is definitely an expectation in some fields that putting equations in makes your paper look clever, even if they're broadly unnecessary.

It's also wildly different depending on the field. If you look at variational methods in computer vision, images are [continuous] mappings from some domain onto the reals (I : Ω -> R^3 for colour). Does that change the fact that an image in memory is just a bunch of numbers in a grid? Not really, but it's bloody confusing the first time you see it.

This doesn't help with understanding the maths, but at some point you have to give up and say, "This guy proved it, and someone else peer reviewed it, so I can use it to solve my problem." It's perfectly OK to stand on other people's work and still make creative contributions to your field; that's the point of research.
 > I think an important lesson for any grad student is to learn to read through the bullshit in papers and try and understand what the authors actually did.

We actively work to make our writing hard to understand in this field. I do this all the time myself. I don't really need this complex-looking equation to make my point, but if I don't have it in there, a reviewer will think my writing is not academic enough. So there you have it. Once you go in realizing this is the case everywhere, it becomes a lot easier to understand academic papers.
 I get your point, but I wish this wasn't the case with most research. I, like the author, am not a math guy, but I have been reading tons of ML papers recently. I usually skip the formal definition parts and get to the 'juicy' implementation parts. I wish there was an ELI5 section in each paper.
 What have been your favorite papers so far?
 That's hard, as I haven't read too many. The recent DeepMind papers (the ones about imagination) were good. The papers themselves were pretty standard, but they came with explanatory blog posts[1], and some videos covered them too[2][3][4]. This supplementary content is what made them accessible for me.
 > When Einstein was working on general relativity, he had a lot of help from friends and colleaguesAs well as his lover, later wife, lest we forget:
 Are you sure? I thought his first wife helped him only at the beginning, the GP talked about GR which came later.
 Quite possibly. I suppose I read "when Einstein was working he got help with the math" - so "he also got help (from his wife and others) when working on special relativity" might be more accurate.
 In particular, Einstein worked with Marcel Grossman: https://en.wikipedia.org/wiki/Marcel_Grossmann who did most of the calculating and verifying of intuitive ideas that Einstein came up with.
 How does one expect to do 'AI research' without much of a background in math? Machine learning is pretty much all math. Researchers are generally expected to be experts in their field. The people writing the papers on arXiv likely spent most of their lives learning about machine learning and mathematics. Unfortunately, there's not an easy path to becoming an expert. One just has to dig in and learn from the ground up.

Edit: The good news is that it's never been as easy to learn math as it is now. When I was an undergrad in math, there were almost no resources available for learning the intuitions behind the math. One just had to keep doing proofs and exercises and hope that it would 'click' at some point. But sometimes that wouldn't happen until many years later. Nowadays, one can watch YouTube videos where experts describe the intuition behind the math. It's awesome.
 Absolutely you can do 'AI Research' without a degree in maths. Sure you need a grounding in linear algebra, stats, probability and calculus, but not much more than a CS or physics degree will teach you. That stuff is indeed learned easily, and it's not what the Reddit user is worried about.That also ignores applications of machine learning, which is also a massive (and lucrative) field. But because it's a trendy field, I think there is an obsession with people needing to understand everything theoretical that comes out for fear of missing the boat.Some of the really interesting papers that have come out over the last few years - for example artistic style transfer and Faster R-CNN - have hardly any maths in. You can count the equations on one hand in both those papers. No doubt the authors know their stuff, but how readable are those papers compared to e.g. a 100-page proof? Which did I learn more from?They're a combination of two things: intuitive network architecture and a clever loss function. The first thing is a combination of intuition and programming, the second involves a little maths, but nothing outrageous.
 You're right - I see now when he says 'AI Research' he really means doing applied ML. And, that can certainly be done without knowing everything.If the goal is to make a lot of money doing applied ML, then become a consultant and aim to know 10% more than the customers. If the goal is to create models that are relatively effective, then read tutorials, play with data, experiment, and iterate. But, if the goal is to create very effective models and be able to actually explain why they work (which I think is what many companies want), then one has to understand the math.That is, showing some interesting relationships, trends, predictions, or inferences on a data analysis portal or consumer web site is one thing. But, using ML to dispense medication, regulate a medical device, drive a power plant, or identify criminal suspects - those may require different skills.(BTW I don't mean to disparage the middle group, as that's largely what I do. But, luckily I have people in the latter group who can validate what I'm doing.)
 Not a degree, no, but you should have depth in the math techniques you are using to solve your problem. You should also develop a high-level, abstracted understanding of math, to have the intuition that your techniques are likely the right ones. This approach will help you make incremental improvements. You might even get lucky and hit on something that cites really well.
 Actually I think it's sometimes harmful to take the maths too seriously. There are three parts to the ideal paper:

1. Describe a new technique;
2. Show that it works;
3. Explain why it works.

Understanding why things work is easily the hardest thing. This is where the most maths gets deployed... But often people are reaching for the fancier maths when they can't find a simpler intuition behind the idea. You can also use fancier analysis to substitute for less impressive empirical results. These explanations might convince reviewers, but that doesn't make them any more likely to be correct.

I find it effective to take a very "computer's eye view" of things. Instead of thinking primarily about the formalisation, I mostly think about what's being computed. What sort of information is flowing around, during both the prediction and the updates? What dynamics emerge?
 > Machine learning is pretty much all math.Eh, I would say math is the language we use to communicate to computers presently, but the underlying _concepts_ don't require "more maths" and can often be grasped through an intuitive approach. For example, almost everyone who uses photoshop understands the underlying concept of a convolution (e.g. Gaussian blur), even if they don't know the mathematics that can be used to describe the operation. Yes, there are difficult notions that formalization or generalization assist with--perhaps it is better to see math as augmenting the initial intuition, rather than driving the intuition?
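The Photoshop intuition translates almost directly into code. A minimal sketch (NumPy, a toy 5x5 image of my own) of a Gaussian-ish blur as a convolution -- each output pixel is just a weighted average of its neighbourhood:

```python
import numpy as np

# A 3x3 Gaussian-style kernel; dividing by the sum makes the weights
# average rather than brighten the image.
kernel = np.array([[1, 2, 1],
                   [2, 4, 2],
                   [1, 2, 1]], dtype=float)
kernel /= kernel.sum()

image = np.zeros((5, 5))
image[2, 2] = 1.0            # a single bright pixel

blurred = np.zeros_like(image)
for i in range(1, 4):
    for j in range(1, 4):    # skip the border for simplicity
        patch = image[i-1:i+2, j-1:j+2]
        blurred[i, j] = (patch * kernel).sum()

print(blurred[2, 2])         # 0.25: the spike spreads into its neighbours
```

No formal definition of convolution is needed to see what it does here, which is the point: the math is a compact description of an operation you can already picture.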
 It's not always helpful even when someone gives you a correct direction but none of their abstracted vision of how to apply that direction. For example: convolution is not _just_ Gaussian blur; it also lets you find an object in a scene (or the shape of something in the time dimension). How is that related to Gaussian blur, and why are they the same operation? It takes time to understand the full domain of the concept.
 To understand the underlying principles, but not to use it. In this case, it's a point and click -- if they like the effect, keep it. Literally no maths are required for understanding, yet the outcome is achieved.
 > Unfortunately, there’s not an easy path to become an expert. One just has to dig in and learn from the ground up.Indeed, there's no royal road to geometry!
 I have no idea why you were downvoted; this was the exact quote that came to my mind when I read the text you quoted, as it sums up perfectly the principle that knowledge of math just doesn't come passively. Some context from Wikipedia about the quote: 'Euclid is said to have replied to King Ptolemy's request for an easier way of learning mathematics that "there is no Royal Road to geometry," according to Proclus' [1].
 I guess I cannot rely on a widespread knowledge of the history of mathematics :(Still, I thought this line was famous "enough".
 "“I understood nothing, but it was really fascinating,” he said. So Scholze worked backward, figuring out what he needed to learn to make sense of the proof. “To this day, that’s to a large extent how I learn,” he said. “I never really learned the basic things like linear algebra, actually — I only assimilated it through learning some other stuff.”"
 I've long wanted a series of interactive math ebooks that work that way. Each would take one interesting theorem, such as the prime number theorem, and work backward.

When you start the book, it would give the theorem and proof at the level that would be used in a research journal. For each step of the proof, you would have two options for getting more detail. The first option would stay at the same level, but be less terse. E.g., if the proof said something like "A implies B", asking for more detail might change that to "A implies B by the Soandso theorem". Asking for more detail there might elaborate on how you use the Soandso theorem with A. The second expansion option gives you the background to understand what is going on. In the above example, doing this kind of expansion on the Soandso theorem would explain that theorem and how to prove it.

Both types of expansion can be applied to the results of either type of expansion. In particular, you can use the second type to go all the way down to high school mathematics. If you started with just high school math and used one of these books, you would get the basics... but only those parts of the basics you need to understand the starting theorem. Pick a different starting theorem, and you get a different subset of the basics. It should be possible to pick a set of theorems to treat this way that together end up covering most of the basics.

That might be a more engaging way to teach mathematics, because you are always working directly toward some interesting theorem.
 Yes -- you, and absolutely everyone else in the world who loves math, didn't have time to get a PhD, and isn't elitist, wants this. Sadly, the monetization of this is tricky; it probably has to be an open-source effort. It needs some visionary like Wales or Khan, but they are very, very rare.
 It's a great idea, and I think it's much bigger than maths. If you do not already know about it, searching around for what a "Dynabook" is cannot be a waste of time. You may also be interested in this way of laying out a proof: https://lamport.azurewebsites.net/pubs/proof.pdf
 Yeah. Reading the post, I see a guy overwhelmed by a bunch of equations and numbers. Which isn't to say he shouldn't learn them, but math is always far more intimidating than other subjects when you don't understand it.
 > "intimidating"

Exactly.

/speculation

There is a point where one starts to see "behind" the symbols. It's a strange sensation, as if one could understand the ideas in a non-verbal way. The symbols become optional. Intimidation gives way to curiosity at this point.

An amazing book on the subject is Hadamard's The Psychology of Invention in the Mathematical Field.

/speculation
 What took me a long time -- and is still a skill I'm developing -- is to both verify and "read" the math at the same time, to see the proof and the story at the same time.At one level, you're observing a technical construction and trying to ensure that it's (mostly) sound; but at another level, you're trying to understand the broader picture of how it fits in, what the builder was trying to accomplish or what perspective of the world they're trying to share.Mathematics is -- like any language -- just the articulation of an experience, of an insight, of an understanding. As you get further into mathematics (and possess more technical skills of your own), it becomes more important to see "Oh, he's trying to apply the machinery of homotopy to type theories as a means of discussing equivalence" than it is to get bogged down in the technical details. Often, the details are wrong in the first draft, but in a fixable way. (This is extremely common in major proofs.)> There is a point where one starts to see "behind" the symbols. It's a strange sensation, as if one could understand the ideas in a non-verbal wayI think at some point, you have to compile mathematics to non-verbal ideas for computational reasons -- your verbal processing skills are simply too slow and too simple compared to other systems. Your visual and motor systems are way more powerful and (in the case of motor systems) operate in high dimensions. 
Much like GPUs in computers, if you can find a representation of a problem that works on a specialized system, you can often get a big computational boost; in mathematics, we have to push our understanding of self and experience to the limits to find more efficient representations of ideas, so we can operate on more interesting or complex ones.

I think most mathematicians work in extremely personal, non-portable internal representations, and then use the symbols as a way to create an external representation that other mathematicians can compile into their own internal representations. If you see mathematics as extremely high-level code meant to be compiled to equivalent internal representations on thousands of slightly different compilers, I think the language starts to make more sense -- it's meant to be a reverse compilation target for machine code that's been under revision for ~3000 years, so of course it looks a little funky.

Ed: I will say this -- one thing I've noticed as I've gotten older is that we do a really poor job of teaching students the story of mathematics: the human motivations, the community, the long-standing projects (some have gone on for hundreds of years; some are still ongoing). I sincerely believe that for young kids (less than, say, 10), it would be better for their development to teach skills 4 days a week and simply tell them part of the story on the 5th. It would make mathematics much more relatable and understandable.
 > Ed: ...

A few people have thought about this very idea. You may take a look at: http://www.vpri.org/html/words_links/links_ifnct.htm
 I liked but didn't love mathematics in high school, and as such I just did what I had to do and moved on. A decade later, I worked through a CS degree and gravitated towards books about mathematicians; now I have a deep fascination with mathematics, and I wish I had read these books when I was in high school!
 A survey of how mathematicians think about mathematics [citation needed] found 80% visually, 15% kinesthetically, and 5% symbolically (i.e. in terms of notation).
 > math is always far more intimidating when you don't understand it than other subjects.

In a way it is like a magic trick: frustrating when you don't know how it works, but when you find out, it's like, oh, was that all there was to it? However, unlike a magic trick, math leaves you with something that can actually be useful.
 And once you understand it you can't see how you didn't understand it before.
 Hmm that's very interesting. I just don't understand how he made it through university. When I was enrolled in CS I somewhat got along with Algebra and was completely lost when it came to Analysis and so I dropped out. Back then I was working so hard at my courses I felt that I simply had no time to even consider "other stuff". I would like to know how it was obvious to him what he had to do.
 > Scholze started teaching himself college-level mathematics at the age of 14

And also that a few people have exceptional intellectual abilities, built in.
 I think this is a fairly common first-year grad student emotional response. And quite frankly, it is the job of your mentor and department to ensure you receive sufficient training for an academic or research career. Modern AI is evolving rapidly, but there is a foundation upon which everyone draws. The Sutton and Barto book is one such foundational text.

Find a collaborator in the math department to work with, and participate daily in the Stack Exchange forums for math and stats, such as Cross Validated. I can also recommend CASI by Efron and Hastie -- a deep historical understanding of where we are today in probabilistic inference.
I always had the same feeling. I'm not bad at math in general (though I'm not well versed in higher-level math either), but as a developer, trying to jump into the ML field seems really impossible. One would think you could teach yourself ML algorithms, but you ALWAYS end up reading math notation instead of pseudocode.

To be honest, the 3blue1brown videos seem really wonderful at explaining what is going on without going too deep, whereas the math in ML lectures seems to be trying to prove everything, and is always taught using math notation.

I guess this is happening because most of ML comes out of research, since it's all new, so it's being taught mostly by people who can grok the math, meaning mathematicians, not programmers. This really shows how much math should stay math and not leak into fields where practice matters more. Programming languages and pseudocode exist for a reason. Computers don't talk math.

So as the years go by, ML will be taught more as a practical subject rather than a theoretical one, and things will get better. I think it's just a matter of how it's being taught, because reading code will always make more sense than reading high-level math. Videos and oral explanations also help a lot.
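To illustrate the gap this comment describes: the update rule papers write as x ← x − η∇f(x) is, in code, just a loop. A minimal sketch (the quadratic objective and learning rate here are made up purely for illustration):

```python
# Gradient descent on f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
# The paper notation "x <- x - eta * grad f(x)" becomes an ordinary loop.

def grad_f(x):
    return 2 * (x - 3)

x = 0.0    # starting point
eta = 0.1  # learning rate (step size)
for _ in range(100):
    x = x - eta * grad_f(x)

print(round(x, 4))  # converges toward the minimum at x = 3
```

The code and the notation say exactly the same thing; which one reads more naturally mostly depends on which you were trained on.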
You might appreciate the Deep Learning for Coders courses from fast.ai. It's basically ML as a practical subject rather than a theoretical one, as you suggested.

I felt similar to you when I first started learning ML, but their code-first approach really helped it click for me on an intuitive level. Then you can go back and dig into the math behind it.
Currently getting my master's in AI. I'll be honest, I can understand the concepts when they're presented to me, but the mathematical proofs are beyond me. I've learned to be OK with that. There's just not enough time in a two-year program to teach myself the underlying mathematics of everything I encounter.
I think he should just go on with his research and not bother with understanding every obscure reference in papers. One can grasp the core ideas surprisingly well even when skipping over proofs and formulas.

And when the time comes to write his own papers, he should remember to intentionally make them harder to read for outsiders. E.g. instead of writing "I calculated the total error by summing the per-neuron errors", one should write "the loss function utilized an integral over the output lattice using a discretized method by Newton et al.", or some other bullshit.
 As an amateur who has jumped in and out of learning basic ML over the years, it has been interesting to see the web of terminology expand to the point where your post is no longer satire. Writing a dictionary or annotating papers to decipher ML-speak to basic-math speak would be a pretty worthwhile endeavor for someone (I see glimmers of this in the work being done by the folks at fast.ai) and in general would probably not remove much real information.
Sounds like there's a need for an "Imposter's Handbook" for mathematics, just like there is for computer science: https://bigmachine.io/products/the-imposters-handbook
A little off-topic, but I hope this serves as another reminder that despite their huge success/hype these days, deep learning and machine learning are just more tools for solving problems, and you can't go very far if you treat them like a magic framework (i.e. import tensorflow as tf) and fail to understand the underlying principles.
When I left school, I found myself in a similar boat, and decided to set a goal of getting the knowledge equivalent of an undergrad degree in math. I already had a physics degree under my belt, so it wasn't as long a path as it might be for others, but over a bunch of years of self-study it paid off. When it comes to your career, a few years is nothing. The strong foundation pays regular dividends because learning things that build on it comes so much more quickly.

It's a huge, slow, painful investment, but totally doable, and with tremendous ROI if you want to work with stats/ML/optimization/really any numerical computing for a living.

The reason I recommend this route is that most of the more advanced math books you will encounter assume this stuff as the reader's common knowledge. With that foundation, the majority of the literature is already tailored to you!
A technique I have found works well is to read a lot of papers and their citations, but not dive deep. Each paper usually provides some easy-to-grasp insight (far too little per paper, but that is elitism for you) that you can use to build a good picture of the field. Reread papers to grasp more insight. Once you have a good overall picture, find some area/problem that really interests you, whose math you like, and that isn't covered well, and bone up on the math techniques. Do your research and present.

Showing up at thesis defenses is good too. You learn a lot from the back and forth with advisors.

The key is to understand at a high level why the different math techniques are being used without actually understanding all the details. This won't be sufficient for your own work, but at least you'll have a good idea of how your part fits into the scheme of things.
That person approaches the issue as his personal problem. However, there are likely many other students around him with the same problem. It is a problem of the whole field recently.

One solution would be simply to arrange a local seminar and work through a couple of papers in full detail. It would help to invite a couple of mathematically aware students from mathematics, physics, or the part of the CS faculty where they prove things. They should be able to explain and answer questions immediately, which is far more effective than reading whole books or taking courses. Those can be read for details later.

If the papers chosen are deep learning papers, part of the outcome of the seminar is likely to be the realization that the authors of these papers do not necessarily understand the mathematics themselves.
 This all does bring up a good point. Why not create a publicly accessible 'math tree' that people can use to learn about any kind of math. If there is a symbol or step they don't understand they should be able to follow it all the way down to basic counting.
Based on your post, I highly recommend taking a year or two off to focus on math only. You can get the equivalent of a bachelor's and master's in pure math in just two years (if that's all you're doing with your time), and it would be enough to fix all the issues you're experiencing. Just take the pure math courses instead of computational math, as abstract and difficult as possible; it will generalize much better :)

I got into math for exactly the same reasons while doing research in computer vision as an undergrad, and taking the requisite time off to learn advanced math (actually going overboard on it) has been an incredible boon to my AI career.
I think the real issue is that the OP has a mistaken expectation that they should understand everything. For instance, the group that wrote the Wasserstein GAN paper are surely people who think night and day about distance metrics. And they might be totally lost reading a paper about some energy-based method that relies on concepts from physics.

The point is that researchers have their little niche and try to make contributions in areas adjacent to it. It's unrealistic to think everyone publishing papers understands all the other papers, particularly in a cross-disciplinary field like ML. There's also a big gap between a researcher deep in their career and a student fresh out of a master's program.

It's also hard to transition from someone who's used to reading and understanding textbooks to someone who's often reading really technical research and understanding very little of it at first. You just have to push through and have confidence that you'll eventually learn enough to make a contribution. That's what it means to "become an expert": you start off not being an expert and then beat your head against the wall for a few years until you bootstrap your way out of it. And if you want to do it in a reasonable amount of time, you should probably choose something you already have some of the fundamentals for.
From one of the comments:

> "Professional heavy math people are those who said in the 60s that the perceptron's limitations proved all AI was impossible. And in the 90s that one hidden layer was all you needed, deep learning was useless."

Can anyone provide citations for this? I was aware of the latter claim but not the former. You can find people still repeating the one-hidden-layer stuff up to a few years ago just by reading Stack Exchange.
I wonder if the author is referring to the oft-repeated claim that Minsky and Papert's proof that a perceptron cannot learn the XOR function had a chilling effect on research into neural networks generally, even though Minsky and Papert themselves had shown that multi-layer networks were capable of doing so [1][2].

I realize that even this alleged misunderstanding is not the same as a claim that AI is impossible. The closest attempt at a mathematical proof of the impossibility of AI that I am aware of is the Lucas-Penrose argument from Gödel's first incompleteness theorem [3].

[2] Minsky, M. L. and Papert, S. A. 1969. Perceptrons. Cambridge, MA: MIT Press.
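The XOR result mentioned above can be checked directly: no single linear threshold unit classifies all four XOR points, while one hidden layer of hand-built OR and NAND units fixes it. A minimal sketch (the random search over weights is only an illustration of the failure; the impossibility itself follows from a short algebraic argument):

```python
import random

def step(z):
    return 1 if z > 0 else 0

points = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor = {(a, b): a ^ b for a, b in points}

# Single-layer perceptron: no (w1, w2, bias) gets all four XOR points right.
random.seed(0)
best = 0
for _ in range(10000):
    w1, w2, bias = (random.uniform(-2, 2) for _ in range(3))
    acc = sum(step(w1*a + w2*b + bias) == xor[(a, b)] for a, b in points)
    best = max(best, acc)
print(best)  # never reaches 4 of 4

# Two layers: OR and NAND hidden units, then AND on top, compute XOR exactly.
def two_layer(a, b):
    h1 = step(a + b - 0.5)      # OR
    h2 = step(1.5 - a - b)      # NAND
    return step(h1 + h2 - 1.5)  # AND

assert all(two_layer(a, b) == xor[(a, b)] for a, b in points)
```

The algebraic version: the four constraints on (w1, w2, bias) imposed by the XOR truth table sum to a contradiction, so no linear separator exists.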
 Thanks, the first ref looks like it may be the one.
Besides, heavy math people were also those who proved that recurrent multi-layer neural networks are Turing complete and got the field off the ground.

But sorry, I don't have citations. It's stuff I read a long time ago, in random books picked off library shelves by looking at the covers :)
 Do you really not need familiarity with the relevant math to be admitted to AI doctoral programs? I wouldn't have thought that was the case.
 How does one get into an ML PhD like this? I was under the impression it was impossible if you’re not a math double major.
 I am a math double major doing a research masters in ML/CV right now, and I know plenty of pure CS majors who are doing just fine. The math that 95% of ML scientists use is not that hard to grasp. Sure, when they encounter functional analysis stuff, they start to cry inside a little, but that doesn't happen very often.
 Interesting. I’m interested in a related MS/PhD with a similar math background as OP and assumed I was disqualified.
As Alan Kay noted, the right point of view can add 80 IQ points. I was in a quantitatively heavy field and always felt outclassed by those with strong physics and math backgrounds. Nevertheless, I published two papers in Nature journals and overturned about 10 years of high-profile research, not because I was smarter but because I spent more time trying to find the right perspective, and when I found anomalies, instead of brushing over them, confident in my own intelligence, I drilled down until I found the root of the problem: something that everyone else had overlooked. You don't need to be a classical genius to make a contribution, but you probably do have to be tenacious.
One thing that hasn't been mentioned: learning mathematics by talking to another human can be 10-100 times faster than getting it from books. Another thing: mathematics is huge and seems to accommodate all personality types. Pick something that turns you on and grow outwards from there. The folks on reddit seem to be obsessed with Rudin, and that's good stuff, but there are so many other roads to follow.

And I'm so impressed by how much better the comments are here than on reddit! Good job HN, you rock.
> the “utility density” of reading those 1000-page textbooks is very low. A lot of pages are not relevant, but I don’t have an efficient way to sift them out. I understand that some knowledge might be useful some day, but the reward is too sparse to justify my attention budget. The vicious cycle kicks in again.

That is their main problem. All those "useless" pages are what becomes useful later.

And we find the same attitude everywhere in tech: why read a full RFC when you can assume shit and get by with a two-paragraph tutorial?
Perhaps you could focus on improving the usability of AI tool-sets for a wider market rather than on finding the Next Big Magic Equation. Example: https://github.com/RowColz/AI An AI expert may do the initial setup, but factor tables allow more "typical" office workers to tune and prune the results.
 I like the comment that likens this sort of deeply-linked knowledge to a DAG. In my own (limited) experience, once I’ve mentally found the DAG where every node either references some other node or baseline knowledge, the learning task almost immediately switches from daunting to routine. Just work on understanding each node in the dependency chain until you get to the one you seek!
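The dependency-DAG framing above maps directly onto a topological sort: record each topic's prerequisites as edges, then study nodes in an order where every prerequisite comes first. A toy sketch with hypothetical topic names (the graph contents are made up for illustration), using Python's standard-library `graphlib`:

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical prerequisite graph: each topic maps to the topics it depends on.
prereqs = {
    "linear algebra": set(),
    "real analysis": set(),
    "probability": {"real analysis"},
    "convex optimization": {"linear algebra", "real analysis"},
    "statistical learning": {"probability", "linear algebra"},
}

# static_order() yields a study order where prerequisites always come first.
order = list(TopologicalSorter(prereqs).static_order())
print(order)
```

The same sorter also raises `graphlib.CycleError` if the "DAG" you wrote down turns out to have a cycle, which is itself a useful signal that two topics need to be learned together.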
From TillWinter's response:

> Also: doing the master is to understand that you don't know anything, and doing your doctorate is to learn the others know nothing as well.

I had always heard variations on the first part, that going to a good school was supposed to humble you by showing you how much you don't actually know.

Never heard the second part. That's great.
 I am an AI researcher and faculty member at a large and famous university. I probably know less math than that Reddit poster. Math is important if you are specifically interested in the math of AI. If you are interested in inventing algorithms and solutions you mostly don't need the math.
Starting from pre-calculus, what areas of mathematics (with book recommendations) should one study rigorously to build the foundations for pursuing a PhD in machine learning?
The depth and illegibility of the field he is describing make me believe that we are much further away from general-purpose AI than I previously thought.
Remember: "Complex models are rarely useful (unless for those writing their dissertations)." (V. I. Arnold)
 Perhaps the author should write a ML tool to help sift through all the material ;)