As a researcher in the field, I am not quite sure how I feel about these kinds of resources. I am all for making research accessible to a wider audience, and I believe that you don't need a PhD, or any degree, to do meaningful work.
At the same time, the low barrier to entry and the hype have resulted in a huge number of people downloading Keras, copying a bunch of code, tuning a few parameters, and then putting their result on arXiv so they can put AI research on their resume. This has resulted in so much noise and low-quality work that it really hurts the field.
You don't need a degree, but I think you do need to spend some time getting a deep enough understanding of what's going on under the hood, which often includes some math and takes time. This can be made accessible, and there are plenty of good resources for that. But all these "become an AI pro by looking at some visualizations and copying this code" resources may be hurting more than they help, because they give the illusion of understanding when it's actually not there. I wouldn't want people learning (solely) from this touching my production systems, writing blogs, or putting papers on arXiv.
I self-taught this material and have been working professionally in the field for some years now. It was primarily driven by the need to solve problems for autonomous systems I was creating. When I am asked how to do it, I give the progression I followed. First, have (preferably) a CS background, or at least Calc 1 & 2, Linear Algebra, and university statistics, then:
1. Read "Artificial Intelligence A Modern Approach" and complete all the exercises [1]
2. Complete a ML course I recommend the Andrew Ng Coursera one [2]
3. Complete a DL course I recommend the Andrew Ng Coursera one [3]
4. Complete specialization courses in whatever you are interested in such as computer vision, reinforcement learning, natural language processing, etc. These will cover older traditional methods in the introduction also which will be very useful for determining when DL is not the correct solution.
Additionally, I suggest a data science course, which usually covers other important things like data visualization and how to handle bad/missing data. Also, I learned a lot simply by being surrounded by brilliant people who know all this, being able to ask them questions, and seeing how they approach problems. So not really self-taught so much as untraditionally taught.
Unfortunately, not a single person has actually followed the advice. Everyone has only watched random YouTube bloggers and read blogs. Some have gotten into trouble after landing a job by talking buzzwords and have asked for help, but my advice does not change.
It does make it rather hard to find a job without a degree, though; I would not recommend it. All of my jobs have come only from strong references, after luckily getting my foot in the door initially.
Thanks for sharing your learning path, and congrats on your success. There are so many great resources out there that it's possible for anyone to become an expert. Unfortunately, people like you who are willing to put in the hard work seem to be the minority. All of your success is well-deserved and props to you. Just like you said, most gravitate towards easy-to-understand videos and blogs instead of confronting their gaps in knowledge. I've had the same experience giving advice - anything that looks like it requires focused work (solving textbook problems!) or is unsexy is readily ignored in favor of the latest hype demo or high-level framework.
> It was primarily driven by the need to solve problems for autonomous systems I was creating.
I wonder if your success also had something to do with the fact that you had a specific problem you were trying to solve. Did you feel that your specific problem put everything you were learning into context and made it easier to stay motivated?
> I wonder if your success also had something to do with the fact that you had a specific problem you were trying to solve.
As a side note, I've found that this is one of the best ways to learn: find a hard but attainable problem and work towards it, gathering all the knowledge you need along the way. What works for me is breaking a project down into a bunch of mini projects, so it becomes a lot easier to track progress and specify what I need to learn. That way, even if you don't finish the project, there's still a clear record of what you learned and can do.
I completely agree with this. In undergrad, I majored in non-profit management. Every single course in the major had a fieldwork requirement. The grant-writing class required us to work with an area charity and write them a grant application, which our professor graded as one of our assignments. Same thing for the program evaluation class and the rest. In addition to learning the topics, I learned so much about how to work with real-world teams.
> Did you feel that your specific problem put everything you were learning into context and made it easier to stay motivated?
Absolutely. Having a focus on finding better solutions to a single problem for multiple years certainly helped put everything into context and stay motivated. That is the biggest problem, I think: really learning this stuff at the depth needed to solve new problems takes years of investment and just can't be done in a week or a month. It's pretty hard to stay focused on something for that long without a goal and support of some sort.
The way having that specific long-term problem to solve really helped was in always having that thread spinning in the back of my mind, thinking about how something new could be applied to solve part of it, and possibly trying it out. Also thinking about whether certain approaches could even be practical at all.
I suppose that is probably fairly similar to grad school though.
Teaching isn't just about presenting the information to the student.
That's basically all you've done. Here, student, read these complex topics and at the end of it all you will have learned machine learning!
The art of being a teacher is much more nuanced. You (apparently) fail to present the material in a way that is accessible, relatable, and not overwhelming.
For example, the first thing you say to do is go read a college textbook front to back, and do all the exercises. And you're surprised that nobody has followed your steps?
Yes, you are right. I'm not trying to teach the dozen or so people who have asked, only to lay out a progression with prerequisites similar to what I did for self-learning. I certainly do not have time to create courses and problem sets, or to do anything more than answer specific questions.
The AIMA book has a lot of open resources around it that I always mention, including a full open course I believe; it should all be linked on the site. I also mention that, while it is probably not a good idea, they can skip it and go right on to the ML course. Both of the Coursera courses are complete with lecture videos and work presented in a very accessible manner, including interesting projects.
Well, maybe I got the wrong impression, but after reading the (very accessible) YOLOv3 paper [1], it seems to me that even the experts do little real math and lots of guesswork, kicking a model until it starts giving results.
Oh yes, I don't think advanced math is required in any way. However, there is a difference in that researchers who have worked with these models for many years (often including having done some of the math) have a very good intuitive understanding of them. Once you have that, it's fine to be driven by gut feeling - just like many engineers are driven by gut feelings that come from tacit knowledge built through experience.
What is dangerous is reading a 30-minute blog post and getting the illusion of having some kind of understanding, when in reality it can take years to develop. It's like cloning the postgres GitHub repository, compiling it, running a few queries, and then saying you've built databases and being hired as the "database expert" at some company, spreading wrong knowledge left and right.
That's why the popularization of these quick, immediate-reward tutorials is dangerous. It takes time and effort to become knowledgeable at something. Of course, many people are smart enough to know that these tutorials (or cloning the postgres repo) are just the first step on a longer learning journey, and in that case it's totally fine. But there are also many people who come away thinking they are experts ready to work on research or production models, not being aware of the many things they don't know [0].
It sounds like what you’re complaining about is some people who put papers on arXiv and deceptively claim to be experts. And you seem to be implying that these blog posts are to blame, so the blog posts should be retracted because of those dishonest academics? You’re making a very confusing point.
Not confusing. He's saying that watching Judge Judy for a week doesn't make you a lawyer. And when hiring, be careful because lots of people who claim to be experts are far from it.
That’s valid, but he’s implying that Judge Judy should be taken off air because some people watch it and then pretend to be lawyers. Squelching information seems like a terrible way to eliminate a few impostors downstream.
Yes, but that's a bad example because pretending to be a lawyer is hard. A better example would be gurus spreading nutrition recommendations that are not wrong per se, but extremely simplified. Nutrition is a complex topic, and individual differences make it hard to generalize. Let's say the information is so simplified that it is likely to hurt people who blindly follow it without doing further research. So, should this information be taken off the air or not? I would say yes, and perhaps you would say no. In either case, I don't think the answer is quite as clear-cut.
As an FDA law firm engineer, I review and submit about three AI devices a week to the FDA (I've probably done at least 50 AI submissions at this point). As part of that, I review all the submission docs, including the design specification for each submission. I'm convinced that none of the companies understand how any of the math works; they just submit models they have grabbed from TensorFlow or PyTorch.
The "real math" comes in when you're debugging a huge network that occasionally emits NaNs deep in the reverse pass. At that point, you need to understand the details of gradient descent and a whole lot more practical analytic theory.
> it seems to me that even the experts do little real math and lots of guesswork
Machine Learning is a rich and varied field. Like most applied sciences, there is a spectrum from the heavily applied to the heavily theoretical, to some which try to span both sides of the spectrum at the same time (e.g. https://arxiv.org/abs/1704.04932).
The paper is definitely accessible, but that doesn't mean you don't need a solid understanding of the math to do this stuff under the hood. They gloss over things like "focal loss", plus this is just an update on some small tweaks they've made, so obviously it wouldn't be super math-heavy.
All in all, not a great paper to prove your point imo.
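For what it's worth, focal loss is a good example of that gap: the code is a few lines, but knowing why the (1 - p_t)^gamma factor down-weights easy examples is the part that needs the background. A rough sketch of the binary case (my own toy version, with the class-balancing alpha term left out; not any paper's reference implementation):

    import torch
    import torch.nn.functional as F

    def focal_loss(logits, targets, gamma=2.0):
        # Standard BCE gives -log(p_t); exp(-bce) recovers p_t, the prob of the true class.
        bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
        p_t = torch.exp(-bce)
        # Easy examples (p_t near 1) get scaled down by (1 - p_t)^gamma.
        return ((1 - p_t) ** gamma * bce).mean()

    logits = torch.randn(16)
    targets = torch.randint(0, 2, (16,)).float()
    print(focal_loss(logits, targets))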
I am one of those without a PhD, but I have taken the time to learn the math and contribute quite a bit to this area.
That being said, I also don't think deep learning itself is really a "science". The issue I have is that you can't predict whether a network will learn.
We're effectively testing deep learning networks the same way the Romans used to test bridges. Send a bunch of elephants over them, if it holds it's good enough.
There are obviously some indicators of success, but on the whole the interaction between components is very difficult to calculate and near impossible to predict. While I think it's important to understand how layers interact and how a given function will impact your optimization, etc., a deep understanding of the mathematics is not strictly required, at least in most cases.
I also personally don't view anything on arXiv as worth much by itself. I typically read the articles/papers myself when reviewing a candidate, and/or I'd like to see their publications at conferences or in journals. Otherwise, it's essentially a blog post (which IMO is fine, but it will require me to review it).
The mathematics of neural networks is very well understood from a theoretical and scientific perspective. It is easy to say that a neural network will have predictive power given labelled training data.
Whether or not the predictions it is making satisfy unstated requirements is another problem altogether.
> We're effectively testing deep learning networks the same way the Romans used to test bridges. Send a bunch of elephants over them, if it holds it's good enough.
Well, for human-level tasks maybe, but what about other areas of research where we want to discover patterns not seen by a human - like finding links between genomic interplay and external perturbations such as radiotherapy? We'd be very lucky to have such 'elephants' to test with at all.
Well, I’m certainly no expert on this, but given the previous comment I would guess that the field is possibly too immature at the moment to have as much mathematical certainty as you might find with other methods and fields of mathematics. I recently read the beginning of Introduction to Mathematical Philosophy by Bertrand Russell, in which he explains that although the Egyptians invented geometry, in their time it wasn't very formal; it was much like the grandparent's description of machine learning as the Romans testing bridges - they would just throw enough examples at something until they thought it worked. This didn't mean geometry could never become a system in which the surety of its theorems was known. The Greeks did just that, by starting from basic assumptions, or axioms, and building a consistent and partially complete system that allowed them to prove many things that followed from those assumptions. There's a possibility that at some future time (possibly for future generations) we'll have better mathematical tools to figure out the specifics of why neural networks and machine learning work, and to what specific extent they work.
I am also currently reading A Programmer's Introduction to Mathematics by Jeremy Kun (an excellent book, by the way, which I would wholeheartedly recommend to programmers who have some background in math or thinking in abstractions and want to learn more math). It has a quote stating that learning mathematics is a lot like walking into a series of dark rooms and feeling around, getting a sense of what is in each room, until you flip on the light switch - but then you can always go to a new room and start all over. I think in that sense machine learning is a series of rooms, some lit, but a majority still dark, that we haven't grasped yet.
I think there's a difference between pure mathematics and applied math in these discussions. Exploring the dark rooms of math is one thing; having a whole lot of great tools and figuring out how to use them in a rainforest to build a livable dwelling is quite another. Math, being an abstraction of the world, needs bridges to the problems we are facing - hence the practice of trying a few real-life examples (elephants) to test whether a method works.
Do you ever feel like all the noise influences the way you think about your own career? I work as a data scientist and sometimes find the hype so off-putting that I think that I should look for a role that's related to solving some optimization problems outside of ML, or a software engineering role in some completely different domain and work as a backend developer or something similar.
Definitely. I came from a software engineering background into AI 6+ years ago, and I was lucky to be there at the right time. From a purely career perspective, leaving aside that this may be your life's passion, I believe that right now is the worst time to get started in AI. There is so much noise and competition but very few actual jobs or value created. It's just a research PR machine. Just as with Data Science, companies that think they need AI often just need better data collection, pipelines, and infrastructure engineering. Good backend/infrastructure/data engineers are so much harder to find these days, and these skills IMO provide much more value than doing some kind of modeling.
> Good backend/infrastructure/data engineers are so much harder to find these days, and these skills IMO provide much more value than doing some kind of modeling.
After all these years of FOMO on AI, this is music to my ears.
Right, I’ve been reading about machine learning for about 5 years, have read hundreds of articles about different techniques, and have often tried to explore ways it could be used.
However, I’ve never found a practical use in software engineering.
Every time I think I discover something that could use machine learning, I usually don’t have any data to work with or don’t have a clear definition of what the inputs and outputs would be.
In the end, I find a way to develop a solution that doesn’t need AI and often makes me realize that AI would not have been able to provide the required reliability.
I even participated in a self-driving RC car competition (running on a Raspberry Pi 3). I ended up scrapping the ML solution because it could only run at 10 Hz, which was not even close to what was needed.
I ended up writing a custom algorithm in C that ran at 200fps and could do “advanced” moves like backing up and course correcting.
I’m not saying that ML doesn’t have its place. I’m saying that right now it’s all about images and nlp and has huge costs.
In any practical usage, the ML parts that I would need are already available as a service that I can use almost free (like azure cognitive services).
> In the end, I find a way to develop a solution that doesn’t need AI and often makes me realize that AI would not have been able to provide the required reliability.
This was brought home to me back in the 1990s at a place where we were trying to popularise expert system technology (a positively prehistoric form of AI). It was one of my first jobs in industry after leaving academia. The goal was to create a system that could predict the speed at which a unit of military vehicles could move so that a contact report expert system could decide whether two groups of vehicles could have feasibly moved between two points in a given time. The system could use this to help decide whether two reports referred to two different enemy units or a single unit that had moved from point A to point B.
After quite a lot of time cranking out and debugging rules to describe the movement behaviours, I realised that some simple convoy arithmetic would do just as good a job - units are often constrained by the speed of the slowest vehicle and respect inter-vehicle distances. For most purposes this simple arithmetic was just as good as (and orders of magnitude faster than) the complex rule engine.
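The "convoy arithmetic" really was about this simple; something like the following back-of-the-envelope check (the numbers are made up for illustration, not the ones from that project):

    def convoy_travel_time_h(distance_km, vehicle_speeds_kph, n_vehicles, gap_km=0.05):
        """Time for the whole unit to cover the distance: limited by the slowest
        vehicle, plus the extra length of the column itself (inter-vehicle gaps)."""
        column_speed = min(vehicle_speeds_kph)
        column_length = (n_vehicles - 1) * gap_km
        return (distance_km + column_length) / column_speed

    # Could reports 2 hours apart, 40 km apart, plausibly refer to the same unit?
    t = convoy_travel_time_h(40, vehicle_speeds_kph=[70, 55, 48], n_vehicles=12)
    print(f"{t:.2f} h -> {'same unit is feasible' if t <= 2.0 else 'likely two units'}")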
I think a lot of more practical deep learning applications are coming out, but it often requires working with people who know the field intimately.
For example, in my PhD thesis project, I am making a series of deep learning models for a cardiac MRI autopilot. I've built a series of deep learning networks to localize the cardiac landmarks that define the cardiac imaging planes, and our group has even made it into a clinical prototype that works within our imaging workflow.
I think the field is shifting in that ML technologies increasingly require domain-level knowledge to reach a practical endpoint.
I've been involved in a lot of Data Science projects, and the most successful ones are those where the Data Scientist either did the infrastructure and data engineering or was part of it.
Most of the failed projects basically were "Hire a Data Scientist to make a magic model" and nothing else was supported. Basically, "Here's some data, do some magic." I read somewhere that 90% of data is low/zero value, and I'd agree with that.
There's absolutely a place for people doing pure statistical or ML/Deep Learning modeling. But the rest of the organization has to support them so they can have that narrow focus. A lot of places want to take a shortcut and not do the data work.
5 years ago the FOMO was about Hadoop. Hadoop here, Hadoop there... people just forgot you have to actually collect a lot of high-quality data to make Hadoop useful. Now it's AI.
I’ve found that non-technical management expectations are just too swayed by the AI hype machine.
I’ve also found the operations research field to be more grounded, while still having plenty of research opportunities. There’s something very fulfilling about optimizing a real-world system, particularly when the answer isn’t intuitive.
>I work as a data scientist and sometimes find the hype so off-putting that I think that I should look for a role that's related to...
I'm a Data Scientist, and the hype is making me really jaded about the whole thing. I work in a customer-facing role, and it's just painful to see how many sensible, solvable ML/Deep Learning problems get buried and destroyed under the pile of hype around solving unrealistic "sexy" problems with "AI." It was okay when Machine Learning was the hot thing; it's been awful since "AI" became the hot thing.
>At the same time, the low barrier to entry and the hype have resulted in a huge number of people downloading Keras, copying a bunch of code, tuning a few parameters, and then putting their result on arXiv so they can put AI research on their resume. This has resulted in so much noise and low-quality work that it really hurts the field.
This seems extremely short-sighted to me.
If the barrier to entry (on an already relatively young technology) has come down so far that there are a bunch of noobs running rampant right now, does that not bode extremely well for future advancements in the field (assuming some non-zero conversion rate from noobs to productive members of the field over time)?
Also, let's not pretend that people with relevant degrees don't make shitty contributions to arXiv all the time.
I'm surprised anyone would default to trusting arXiv papers, or even browse new additions, without first checking up on the credentials and experience of the authors.
If the problem is finding new papers, then you really should be using sources other than arXiv.
Are you sure you want to start excluding and limiting this field? Part of the higher salaries is hype, not necessarily value. That brings in the crowd who download something and change a parameter or two. They are not making worldwide breakthroughs, but through that simple act they understand better how things work and can make more informed decisions around the topic.
Any company that hires based on a simple hello world might be a company that doesn't even know why it wants ML and is just following the trend. This person might be the best candidate, because they can think about ML on a slightly deeper level than the company and can help bridge that first step. They probably know the buzzwords and what's hot, which is important to marketing.
I would disagree that having a lot of interest hurts the field. Where are you seeing “noise and low quality work”? For your sake, I hope you’re not reading random papers from unknown authors on arXiv in your spare time!
I think we’ve seen impressive contribution from people without PhDs. Chris Olah and Alec Radford come to mind first. (Note: I’m not implying that you disagree with that statement, just wanted to point some role models out to those without PhDs who want to contribute to the literature.)
High quality work comes from a tiny incisive fraction of the research community. Most published research, even from PhDs, isn’t worth reading. Easily accessible tutorials promoting Colab are not the problem!
There are many papers that seem like obvious crap that get into supposedly prestigious conferences (NeurIPS, ICML) every year. I think many people might reasonably believe that a paper in the supposedly best venue in the field will be worth reading, but that is not really true.
Compared to machine learning, other research communities in computer science have much less nonsense. You can really feel the difference in quality if you look at papers accepted to equivalently competitive non-ML conferences.
I think the broader problem may be that people hiring for very lucrative, competitive jobs are effectively outsourcing their hiring decisions to conference reviewers, which causes the whole system to break down.
>At the same time, the low barrier to entry and the hype have resulted in a huge number of people downloading Keras, copying a bunch of code, tuning a few parameters, and then putting their result on arXiv so they can put AI research on their resume. This has resulted in so much noise and low-quality work that it really hurts the field.
I'm sorry, but given that many papers at NeurIPS, ICML, etc. are exactly what you described, I find your criticism a bit lacking.
The peer review system is so broken, especially with the number of submissions that the top journals get in ML now, that I don't even pay attention to what's accepted where anymore. It's all the same as arXiv to me. The best way to figure out what's useful and what's not is to wait and see which papers pass the test of time. But if you're an academic you don't necessarily have that luxury.
Well, that definitely speaks more to the field in general and not necessarily to "beginners" who tune a pretrained model and post the results to arXiv. There are massive incentives to post papers with incremental progress when there are potentially billions of dollars in grants and VC money for folks with their names on those papers.
> At the same time, the low barrier to entry and the hype have resulted in a huge number of people downloading Keras, copying a bunch of code, tuning a few parameters, and then putting their result on arXiv so they can put AI research on their resume. This has resulted in so much noise and low-quality work that it really hurts the field.
Note that this is not a problem exclusive to AI or Computer Science.
I have a hobbyist interest in Entomology, and I was so disappointed to see that people are still pumping out papers that are minor tweaks off old population modeling papers from the 1980s. The field is shockingly stagnant. I've read random PhD theses from the 70s that are written on damn typewriters that are higher quality than modern papers from so-called "top tier" research universities.
> ... put AI research on their resume. This has resulted in so much noise and low-quality work that it really hurts the field.
That’s one way of looking at it. Perhaps another way is to ask why all of these employers are hiring people who have done low-quality research, without understanding who they’re really hiring.
It reminds me of the adverse market effects on gamers when Bitcoin miners were buying up all the GPUs a few years ago. It’s another emergent collective phenomenon that’s distorting the market.
I think there may be quite a few employers out there with non-technical management who have become convinced that they need to hire a machine learning expert, without any particular reason why. They might hear a competitor has hired someone and so they need to as well. Really weird.
1. There is much more supply than demand in terms of Machine Learning and Data Science. There aren't actually that many jobs outside of research, it's just that the hype makes it look that way. Now that all these PhDs seem to be starting in ML, I wonder what they will do in 4-5 years when they finish the degree. I don't think a market for them will exist.
2. Many companies don't know what they are doing. They are hiring ML people because they want to put AI into their marketing materials. In reality, they don't need ML, they just need someone collecting data, building a database, and running a query. They just don't realize that's the case. The same happened with "Data Science" and "Big Data" - What most companies needed were software engineers building infrastructure and data collection, not people running sklearn.
> they want to put AI into their marketing materials.
Somebody should start a thread that captures the silly examples of this. My personal favorite was a workout template “powered by AI”. Mind you, the only information the customer provided was the basics like age, sex, weight, and goal. This signaled peak AI hype for me.
I have seen RFPs that ask for machine learning based solutions without any clue how they are applicable to their business, just because they wanted to eventually have some headlines about the updated software.
I consider this equivalent to the democratization of website development and then app development. It has certainly led to an explosion of crummy and security-nightmare apps, but in exchange it has been an on-ramp for some good developers and exciting products as well.
The massive hype surrounding anything “AI” has caused the literature to become a dumpster fire, yes, but a handful of good papers still appear. Just use a low pass filter as with most things these days.
I agree in some sense, but I also think that everybody has the right to learn at any level, mainly because not everything in the neural networks field is research to create the next optimizer or a new architecture.
This is perfect for the people I work with in the roles we have (and I am talking about people some of whom are PhDs in math and physics, but without much CS knowledge). Some of these people just need to see that this is a stack of non-linear functions that has to be minimized, and then they grasp the idea of NNs really fast.
To me this would be like saying that if you don't know computational complexity theory or how compilers work you are not a good software developer; I think it is just a different level, and as long as you don't try to inflate your resume it is fine.
Sometimes I sense some fear in the AI field about the knowledge being spread, yet there are a bunch of backgrounds (maths, physics, quantitative finance, chemistry, ...) that need exactly this to at least demystify NNs.
Yeah, if you're not programming your models in binary then gtfo imposters!!
On a more serious note: yes, understanding this - or anything, really - takes time and investment. The problem for me originally, not only with ML but also with engineering back when I was trying to learn (and couldn't afford school), was finding quality sources to get started on that process of learning. By providing simplified resources like this Google one, the hope is that many beginners get that one "aha!" moment where they start building the basic understanding that allows them to start tinkering and learning.
People without a decent understanding shouldn't be submitting research papers; I fully agree there. It's basically a waste of everyone's time and harmful to the field of research as a whole, as it dilutes the overall signal-to-noise ratio. However, there's so much space in the ML world that doesn't involve research, not only for fun hobby projects but also professionally. Resources like this are critical to reducing the knowledge gap between researchers and the programmers in the field who work on little ML projects for things like sentiment analysis for their company.
These sub-research projects are mission-critical at a lot of companies, yet they are held up at the majority of non-FAANG companies because there's only one data scientist while the teams of engineers are clueless as to how to assist.
> the low barrier to entry and the hype have resulted in a huge number of people downloading Keras, copying a bunch of code, tuning a few parameters, and then putting their result on arXiv so they can put AI research on their resume. This has resulted in so much noise and low-quality work that it really hurts the field.
Then the problem isn't the low barrier and ease of use; it's arXiv's filtering and how resumes are valued and verified.
I guess the same could be argued for coding in general. I myself don't have a degree in CS, yet I can tell amateur code from well-designed code (unless I have fooled everyone for the last 20 years!).
The industry has ways to filter out sloppy work, somehow, so I wouldn't worry too much about it. I would, instead, include the necessity of being able to judge good from bad in this field too, as has always been the case for everything else.
But then again, a lot of current ML jobs are basically just that - find some optimal architecture, tune hyperparameters, and bam! You're now a modern "AI"-powered company.
Heck, I've encountered plenty of ML jobs that didn't require anything more than familiarity with some well-known frameworks or libraries and the ability to apply known methods to real-world data/problems.
So I can absolutely understand why people copy/paste tutorials or papers and just make some slight changes. You're practically miles ahead of the competition when it comes to the job search.
This is really a myth. Most ML jobs require very detailed understanding of statistics because the devil is in the details.
You need to understand things like multicollinearity, coding biases, missing-data techniques, convergence of Markov chains, learning curves, the mechanics of various higher-order gradient optimization methods, how to carefully evaluate goodness of fit across a huge range of model categories (neural nets are the vast minority of all models used in production settings), and a ton more beyond this.
If you read “tensorflow for hackers” and believe you can write production neural nets, it’s a disaster.
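To make just one item on that list concrete: a learning curve is about the cheapest diagnostic for whether more data would even help or whether the model itself is the problem, and it's exactly the kind of check the quick tutorials skip. A minimal sketch with scikit-learn (toy data and default settings, purely illustrative):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import learning_curve

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    sizes, train_scores, val_scores = learning_curve(
        LogisticRegression(max_iter=1000), X, y,
        train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

    # A large, persistent train/validation gap suggests overfitting (more data may help);
    # two low, flat curves suggest the model or features are the bottleneck.
    for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
        print(f"n={n:5d}  train={tr:.3f}  val={va:.3f}")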
I can only speak from my own experience, having interviewed for and researched a ton of ML jobs: the majority of ML jobs today seem to be re-branded analytics jobs.
I'd say that a solid 4 in 5 of the jobs I've interviewed for that were tagged in the ML domain were just that: typical [x] analytics jobs that don't really require more than stats 101 and good handling of Excel. Basic scripting knowledge was often in the nice-to-have section.
Now, there might be a world of difference between the typical ML jobs you see in startup hubs like SF and the jobs you see elsewhere - but companies are, and have been for almost 10 years, desperate to get aboard the hype train, and have re-branded a lot of jobs to attract those wanting to work with ML or Data Science.
I share your sentiment to a fair extent and I have written about it before here on HN.
Yes, for a lot of this stuff you don't need a PhD (and frankly I find that marketing weird for the above course). But you do need strong intuitions, an understanding of some CS, and math "savviness", i.e. you don't have to know All The Math now, but you should be able to pick up what you need when you're trying to understand your problem and/or structure your solution.
One could learn all of this online today - the amount of good resources out there is crazy. Frankly, I am jealous, because I began working on ML more than 10 years ago, and we were relatively starved for resources on pretty much all fronts: material to study from (reading or videos), affordable compute power, software libraries. But unfortunately, despite their abundance today, most people don't take the time to dive deep. In all the years of my suggesting courses and books to people (when asked), only ONE (or maybe two) managed to go through them to a fair extent. But there is a significant fraction of the rest for whom reductive messaging like "become a pro in AI in 3 weeks" has been misleading.
As a hiring manager, I find candidates are sometimes as surprised as I am when an interview doesn't go well after the resume seemed promising to both sides. And to be very clear, I don't blame them (sure, there are some pretentious opportunists who flat out lie, but I've found them not to be the norm); all this messaging seems to have created a bubble where you often don't know what you don't know. It's amusing that, on three occasions, rejected candidates reached out to me saying the interview was quite eye-opening! I have been at the receiving end too - where 90% of the interview seemed to be about some very specific setting of a library or a method, because that was the interviewer's conception of ML.
I think people should learn, by whatever means, and create stuff because they can - this is the best kind of learning. Silly projects are great too, if they are fun - and if they don't advance your understanding, they might motivate you to be less silly! What's missing in this ecosystem is honest messaging about where your skill levels really are. I don't know how to fix that in a way that doesn't also harm, in some way, the widespread learning of and awareness regarding ML. On a smaller scale, though, I have accepted that this has increased my scope of work in screening resumes: if someone lists her GitHub repo or an arXiv paper, I actually need to spend time going through them. I don't see this as noise, but as a widening of the spectrum of available ML skills in the market, and I need to put in some effort to place an applicant on this spectrum. I've accepted that this is the flip side of working in a hot area: for the multiple job opportunities accessible to me, I have to put in more thought when hiring. I can't have the luxury of the former without the responsibility of the latter. Although, being lazy, I'd totally want to ;).
The voices need better data curation and longer training, but some speakers such as David Attenborough are quite good.
I've also built a real time streaming voice conversion system. I want to generalize it better so that it can be an actual product. I think it could be a killer app for Discord. Imagine talking to your friends as Ben Stein or Ninja.
I've been watching TTS and VC evolve over the last few years, and the pace at which things are coming along is incredible. There are now singing neural networks that sound better than Vocaloid. If you follow researchers on GitHub (seriously, its social features are a killer app!), you'll see model after model get uploaded - complete with citations and results. It's super exciting, and it's the future I hoped research would become.
If you're diving into this, I would recommend using PyTorch, not TensorFlow. PyTorch is much easier to use and has better library/language support. TorchScript / JIT is really fantastic, too. I mean this even if you're just poking around with someone else's model - find a PyTorch alternative if you can. It's much easier to wrap your head around. TensorFlow is just too obtuse for no good reason.
Hi Brandon, nice work! Some questions to learn more, if you don't mind: are you using Tacotron2 for the voice generation? If so, are you using a base model before you train up new speakers, or is each speaker model trained from scratch? How long do you run the training for normally (in both cases), and what hardware are you running?
You mentioned elsewhere you're renting the V100s, what services have you used, and would you recommend them?
By the way, your Trumped.com is throwing some errors in the console so the site isn't working for me.
> are you using Tacotron2 for the voice generation?
Nope. glowtts. Tacotron2 has higher fidelity (it's a denser network), but it's really slow and expensive to run.
> are you using a base model before you train up new speakers
Absolutely! Transfer learning is essential to training on sparse or limited data, and it's incredibly effective.
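Mechanically there isn't much to it; a bare-bones sketch of what "start from a base model" means here (a toy two-part module standing in for the real TTS network, which is obviously much bigger):

    import torch
    import torch.nn as nn

    class TinyTTS(nn.Module):
        """Toy stand-in for a real TTS model: a shared encoder plus a speaker-specific decoder."""
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(80, 256), nn.ReLU())
            self.decoder = nn.Linear(256, 80)
        def forward(self, x):
            return self.decoder(self.encoder(x))

    base = TinyTTS()                                          # pretend this was trained on many speakers
    model = TinyTTS()
    model.load_state_dict(base.state_dict(), strict=False)    # warm-start from the base weights

    for p in model.encoder.parameters():                      # freeze the shared layers...
        p.requires_grad = False
    optim = torch.optim.Adam(                                 # ...and fine-tune the rest on the new speaker
        (p for p in model.parameters() if p.requires_grad), lr=1e-4)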
> How long do you run the training for normally
> and what hardware are you running?
I think I explained this in my other answer? But if not, it's typically a while. The best guiding light, though, is to frequently listen to the inference results. Are they improving? Watch the TensorBoard graphs to see what loss and attention look like and make sure you're actually learning.
> By the way, your Trumped.com is throwing some errors in the console so the site isn't working for me.
Oh no. I realize there are some bugs I left in for iPhone (simply because I don't have one), and I really need to get those fixed. I'm not sure if that's your case or not. Perhaps the pods with the Trump model are also experiencing duress - I'll need to investigate that too. I have yet to hook up monitoring (yikes).
There's a reason for the word omission: I'm using the CMUdict Grapheme -> Phoneme database. There are 140,000 entries, but it doesn't capture everything. I've had to add words like "pokemon" and "fortnite".
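The lookup side is nothing exotic; it roughly amounts to something like this (here via NLTK's packaged copy of CMUdict, which is an assumption about the data source, plus a hand-maintained dict for missing words; the added entry is illustrative):

    import nltk
    from nltk.corpus import cmudict

    nltk.download("cmudict", quiet=True)
    PRON = cmudict.dict()                              # word -> list of ARPAbet pronunciations
    EXTRA = {"fortnite": [["F", "AO1", "R", "T", "N", "AY2", "T"]]}   # hand-added entries

    def phonemes(word):
        entry = PRON.get(word.lower()) or EXTRA.get(word.lower())
        if entry is None:
            raise KeyError(f"no pronunciation for {word!r} - needs a G2P model")
        return entry[0]

    print(phonemes("hello"))                           # one ARPAbet pronunciation of "hello"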
I'm looking for a model that handles arbitrary grapheme/word -> phoneme/polyphone transformation. I'm also interested in perhaps replacing Arpabet with IPA (using the existing arpabet database to construct it). This might work well for non-English languages.
Do you happen to know an existing model that does this?
This is an amazing side project. Would you share some details about how you've hosted the models? Also, if possible, some training details, like how long it took you to train them. Did you do it on your local GPU, a cloud provider, or a free service like Colab?
I'm pretty serious, so I've put some money into it. And even more time.
I've got a 2x1080Ti setup I used locally back in the day, but it's really slow. I still train stuff on it, but only things I know will train successfully over a long run (e.g. the MelGAN model).
I use rented V100 GPUs to train the speaker models. They're quick and allow me to refine the datasets and parameters much more quickly than if I were doing all of it on my own box. Colabs are great, and I could probably get along with them if I weren't running so many experiments in parallel.
I can get reasonable results in a few hours on an 8xV100. Once I home in on a direction I like, I'll let it train for a few days. (The David Attenborough model is a result of this.)
I still have a ton of refinement to do. I'm also working on singing models, and these should be ready by the weekend.
I've thought about buying beefy GPUs at this point as I've proven to myself it's not just a temporary hobby. Cloud compute is expensive.
The models are hosted on Rust microservices (a frontend proxy that fans out into multiple model servers), and this is deployed to a Kubernetes cluster. I'm planning to add more intelligence to the proxy and individual model containers so they independently scale.
Tangentially related (and also using the ubiquitous MNIST dataset), Sebastian Lague started a brilliant, but unfortunately unfinished video series on building neural networks from scratch.
This video was an absolute eye-opener for me [1] on what classification is, how it works, and why a non-linear activation function is required. I probably learned more in the 5 minutes watching this than from doing multiple Coursera courses on the subject.
One "aha" I had about that required non-linear function was that if you were just passing numbers through a series of linear functions, they could by definition be combined into a single equation.
The whole AI/ML space has become so hyped up that it's probably time for me to find another topic of interest in software engineering. It's a weird melange nowadays, where frameworks and "academic credentials" are fused together by major tech companies, and it leaves me - someone who has deployed a dozen classical ML models into production that are still running after a couple of years - wondering what this is all about.
Overall, having worked with people from different backgrounds, an ML-related PhD is usually neither correlated nor anti-correlated with having a good understanding of the relationship between models and their applications.
I wish we could leave the framework and name-dropping behind and talk more about what it takes to evaluate predictions, how to cope with biases, etc.
Really familiar territory. I think the hype has poisoned the minds of many, and at this point "AI/ML" has turned into a simple buzzword, much like "blockchain" 2 years ago. And while I'm still as fascinated by ML as I was 5 years ago, like many others I've decided to stay in the shadows and do my own thing just for the fun of it, especially since marketing and ego started playing a big role in those communities. It genuinely makes me sad, but I think I always knew in the back of my mind that this, and not robots taking over the world, would likely turn out to be a nail in the coffin of AI/ML.
The way I see it, ML/AI is little more than a marketing campaign for much of the industry, and few people realize that it's often a small component and rarely a major selling point for anything - like an "ML-powered kitchen blender" or whatever. As you said, few people discuss evaluating predictions or tackling biases; I suspect that's because most people are a lot more interested in snatching a piece of the cake.
It's different. With the ML stuff there's a bunch of actually useful applications and interesting problems at the core with a lot of fluff and marketing piled on top of it. That's the reason why you're seeing the ML/AI hype last so much longer than blockchain (which was basically a quick cash grab with no substance).
Don't get me wrong, AI/ML is immensely more valuable than blockchain ever was. Not a single doubt in my mind. But in terms of exploitation for marketing purposes - it's a very similar story. That's what I'm referring to.
> It's different. With the ML stuff there's a bunch of actually useful applications and interesting problems at the core with a lot of fluff and marketing piled on top of it. That's the reason why you're seeing the ML/AI hype last so much longer than blockchain (which was basically a quick cash grab with no substance).
LOL...I thought supermarkets were using blockchain to track the provenance of their cabbages, coffee beans, beef joints, etc LOL
I think you have a point about ML/AI. To my mind, judging by the hype, there seem to be a lot of solutions looking for a problem. Having said that, I feel I should also jump on the bandwagon and get my ML/AI credentials as insurance against future demand for the skillset ;-)
"I wish we could leave the framework and name-dropping behind and talk more about what it takes to evaluate predictions, how to cope with biases, etc."
We can, can't we? "We" as in professional software developers. I always thought of this word-hyping as a management thing and a buzzword pool for non-techies.
I know this influences our work, but that doesn't keep us from concentrating on what's really up... or am I wrong?
By the way, I tried getting into ML, but I'm really poor at maths and at that time was not willing to put time into maths. And nearly every tutorial back then threw formula after formula in my face... So a bit of mathematical education could not hurt. Doesn't have to be a PhD, though.
> We can, can't we? "We" as in professional software developers. I always thought of this word-hyping as a management thing and a buzzword pool for non-techies. I know this influences our work, but that doesn't keep us from concentrating on what's really up... or am I wrong?
It's just that it is tiring to refute BS or to explain why a certain approach is not viable.
> So a bit of mathematical education could not hurt
Definitely, but the word "PhD" is often wielded as if you'd need secret knowledge that is otherwise not accessible to you, which isn't true.
Well, the hype is required to get your grandmom, older execs, or a strategy/biz dev team at a brick-and-mortar firm that can afford only one dev to gain confidence that they too can use ML.
It's true that with today's frameworks and easy API calls, almost everyone with a little technological background can deploy an ML/AI model and get sufficient results. But bootcamps cannot replace an academic education. As soon as you are not able to understand and review newly released papers and insights, and have to wait for a high-level blog entry or video course on the topic, you are worth nothing. Without a deeper understanding you can only guess what is going on inside that black-box NN or ML model and have to rely on blindly changing parameters; even worse, you are not able to understand your results or compare someone else's results with yours using statistical tests and so on. So in the end, people (maybe not everyone) without an academic background are just API callers who will struggle in the long term.
It's a pity that most TensorFlow tutorials out there seem to deal with images. We tried to use it for real-time data classification (data -> [yes | no]). Every tutorial out there seems to assume you're using Python (which is probably not an invalid assumption). Here's my 2c from trying to use TensorFlow with C++:
a) Loading SavedModels is a pain. I had to trawl the TensorFlow repo and Python wrappers to see how it worked.
b) It's incredibly slow. It added ~250ms to our latency. We had to drop it.
c) It has a C++ framework that doesn't work out of the box; you have to use the C lib, which wraps an old version of the C++ framework (confused? me too).
d) It's locked to C++03.
TensorFlow Lite looked to fit the bill for us, but our models weren't convertible to it. We no longer use TensorFlow.
I don't understand why you are getting downvoted. TensorFlow 1.x barely worked and people stuck with it because the alternatives were worse. I moved to PyTorch as soon as I could because it is better than TensorFlow 1.x, TensorFlow 2.x, or Keras w/TensorFlow backend.
TensorFlow is designed by committee and is more of a brand now than a machine learning framework. Did you know there is TensorFlow Quantum?
There were alternatives. IMO, Microsoft's CNTK was one that was better than TensorFlow. TensorFlow 'won' because of herd mentality and the perception that everything Google does is cool.
I am using Python's TensorFlow API from C# through my own binding, and I don't understand how you got 250 ms latency with the C API without something going wrong on your side. I could effortlessly run a super-real-time network playing a video game with soft actor-critic with my setup on 1.15.
We didn't understand it either. One thing: our cloud instances had no GPUs. Our model wasn't exactly complex - just maybe 15 float and string scalar inputs.
This is a nice introduction, even though, like most tutorials on ML, it goes from 0 to 100 in 2 lessons.
A couple of years ago I started studying ML, and since I have a design background, I needed to digest all the math and concepts slowly in order to understand them properly.
Now I think I understand most of the fundamental concepts, and I've been using ML quite a lot for creative applications and teaching. I have to say the best resource I've found for beginners, by far, is "Make Your Own Neural Network" by Tariq Rashid.
It starts really from the beginning and takes you through all the steps of building a NN from zero, with no previous knowledge assumed. Really good.
Since everyone is talking about hype in ML, I wish there was some hype for good ol' conventional scientific computing. Yes, it's not so sexy: you have to build your model yourself, and then the hard work is in finding and verifying a suitable numerical method and finally devising a solid implementation. It requires a vast range of different skills, anything from pure math to low-level programming, and it is definitely not trivial work, but it does not seem to pay that well.
It's also a lot of fun. For anyone that likes math, but also likes weird approximations, there's all kinds of juicy stuff in scientific computing. It will make you a better ML coder, too.
I am constantly puzzled by people saying that AI is overhyped and fresh grads won't have enough jobs. Almost every real-life industry - retail, logistics, construction, farming, heavy industry, mining, medicine - has only recently started to try AI. The number of manual and suboptimal tasks that have to be automated and optimized is enormous. I am pretty sure there is more than enough work for applied DSs with domain knowledge in the industries mentioned.
Agreed, I feel we are just barely scratching the surface. I'm fairly involved in the AI/ML world (research at FAANG) and the amount of hype definitely scares me, but ML/AI isn't going anywhere soon imo.
I agree with this totally. The amount of benefit we've seen in the last 5 years due to ML model improvements is staggering. There is huge potential to apply even pre-trained models in tons of industries. BERT + fine tuning solves domain-specific NLP problems far better than the cutting edge research of a couple years ago.
That being said, the business / marketing use of terms like "AI" is way out of control.
This is very well done, hitting on some pain points and explaining how to work around them.
I have devoted close to 100% of my paid working time on deep learning for the last six years (most recently managing a deep learning team) and not only has the technology advanced rapidly but the online learning resources have kept pace.
A personal issue: after seven years of not updating my Java AI book, I am taking advantage of free time at home to do a major update. New material on deep learning was the most difficult change because there are so many great resources, and there is only so much you can do in one long chapter. I ended up deciding to do just two DL4J examples and then spending most of the chapter just on advice.
The field of deep learning is getting saturated. Recently I did a free mentoring session with someone with a very good educational background (PhD from MIT) and we were talking about needing specializations and specific skills, just using things like DL, cloud dev ops, etc. as necessary tools, but not always enough to base a career on.
Definitely, it can help people's careers to work through great online DL material, but great careers are usually made by having good expertise in two or three areas. Learn DL, but combine it with other skills and domain knowledge.
I attended a conference talk by an FB AI engineer presenting her paper, with backprop equations so obviously wrong my eyes hurt, and incorrect definitions of objects. It did not stop her from participating (btw, it is always unclear who did what) in state-of-the-art research in object detection.
A PhD is overrated in the deep learning context. It is more about forging the intellectual resilience and ability to pursue ideas for months or years than about learning useful things/tricks/theorems.
Twenty five years ago, this would have been "LINUX, UNIX, and serving, without a PhD" and Matt Welsh's Linux Installation And Getting Started was the intro (https://www.mdw.la/papers/linux-getting-started.pdf). I was one of many who adopted Linux early, using this book (later I read the BSD Unix Design and Implementation, which I would describe as senior undergrad/junior grad student material).
Having those sorts of resources to introduce junior folks to advanced concepts is really great to me - my experience is that I learn a lot more by reading a good tutorial than a theory book, up until I need to do advanced work (this is particular to my style of learning; I can read code that implements math, but I struggle to parse math notation).
Agreed. I used TensorFlow 1.x before PyTorch matured. Today, all of my deep learning is PyTorch based because TensorFlow 2.x didn't fix the core TensorFlow 1.x issues (e.g., poor reproducibility, unintuitive API design) and PyTorch performs better than Keras w/TensorFlow backend.
For what it's worth, I've found PyTorch to be much more rigid than TF. Maybe I just haven't found the easy way to do things. For example, here's a function that applies an N×N box filter to all but the first 2 dimensions of a tensor (apologies to mobile users):
Is there a simple way to do this in PyTorch? Preferably without having to inherit from the base class for convolution. It seems to me that PyTorch is like Keras and TensorFlow is like NumPy.
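One way to express that operation in PyTorch is a grouped (depthwise) convolution; this is only my sketch of the filter being discussed (assuming a 4-D NCHW tensor), not the TensorFlow code referred to above:

    import torch
    import torch.nn.functional as F

    def box_filter(x, n=3):
        """N x N mean filter over the spatial dims of an (N, C, H, W) tensor,
        applied to each channel independently via a grouped convolution."""
        c = x.shape[1]
        kernel = torch.full((c, 1, n, n), 1.0 / (n * n), dtype=x.dtype, device=x.device)
        return F.conv2d(x, kernel, padding=n // 2, groups=c)

    x = torch.randn(2, 3, 32, 32)
    print(box_filter(x).shape)   # torch.Size([2, 3, 32, 32])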
I read somewhere that TF2 is good for production while PyTorch is good for research (and papers) - is this true? I'm more interested in putting one of them into real products, especially standalone embedded devices.
It used to be true, but nowadays there are several options for deploying PyTorch models:
1. The PyTorch C++ API, which can trivially load and execute your models exported via JIT (a rough export sketch follows the list)
2. ONNX export and inference in TensorRT (the highest-performance inference option)
3. Or just deploy straight-up PyTorch Python code - it'll run fine in "production".
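For options 1 and 2, the export step is a couple of lines; a minimal sketch with a toy model (the model, input shape, and file names are placeholders, not from any real project):

    import torch

    model = torch.nn.Sequential(
        torch.nn.Linear(32, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2)).eval()
    example = torch.randn(1, 32)

    # Option 1: TorchScript, loadable from the C++ API (torch::jit::load) with no Python runtime.
    torch.jit.trace(model, example).save("model_ts.pt")

    # Option 2: ONNX, which TensorRT or ONNX Runtime can then consume.
    torch.onnx.export(model, example, "model.onnx",
                      input_names=["input"], output_names=["logits"])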
One place where PyTorch is weaker than TF is mobile. TFLite is a lot more mature and has all sorts of acceleration support (GPU, DSP). So if that's what you need, at this point there's really no other good choice IMO.
In my experience, not quite as fast for fully-tuned code, but the difference is small - and given the same project deadline, the PyTorch version will probably be faster.
Properly tuned (DistributedDataParallel + mixed precision), it will train faster and consume a lot less memory, allowing you to use larger batches and a higher learning rate.
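For reference, a minimal sketch of that combination (toy model and synthetic data; assumes a CUDA machine and launching via torchrun so that LOCAL_RANK is set):

    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = DDP(torch.nn.Linear(512, 10).cuda(), device_ids=[local_rank])  # toy model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    scaler = torch.cuda.amp.GradScaler()

    for _ in range(100):                                # stand-in for a real DataLoader
        x = torch.randn(256, 512, device="cuda")
        y = torch.randint(0, 10, (256,), device="cuda")
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():                 # mixed-precision forward pass
            loss = torch.nn.functional.cross_entropy(model(x), y)
        scaler.scale(loss).backward()                   # scale loss to avoid fp16 underflow
        scaler.step(optimizer)
        scaler.update()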
The video version [1] is also pretty awesome, though the code itself is a bit outdated now.
It explains a lot of very practical issues that you won't find in most academic textbooks but encounter every day in practice.
A simple rant here. Every now and then, all these big-money companies come out with statements that doing AI|ML is very easy, that everyone including their cats should take AI|ML courses and training (preferably on their platform), and that once that is done the job market is yours. Reality is far from this.
- Today AI|ML does not have the capabilities marketed by these big companies. Incidentally, the marketing is targeted at governments, big non-tech companies, and gullible undergraduates.
- Undergrads often take these training courses, acquire the skill set to call these APIs, and then flood a job market where data-entry and data-analyst jobs are tagged as AI|ML jobs.
- High-paying jobs in AI|ML still require a Masters, a PhD, or a mathematical background.
In conclusion, the current hype around AI|ML is misleading gullible undergrads and governments (I don't mind the governments being cheated, though).
I’d like to add to that list all the “learn AI” blogs etc. that gloss over the details and show you how to call some libraries to do cool stuff. Everyone and their cat can learn how to invoke some APIs, which is cool, but the reality is that data is hard, and if you want to do good, sound work, you need to have an understanding of the math and fundamental principles behind it.
It might not save you from making stupid mistakes or wasting your time, but at least you will know what mistakes you can - and will - make.
Another side of this that I've seen is people with PhDs in applied mathematics on a research team where a PhD is a hard requirement, spending all of their time passing dataframes around and convincing the sales team that the bar charts in the product are in fact correct.
Have we been working on the same team? :P
The worst part (or the best part of the joke) is that with a mathematical PhD you're very ill-equipped for a discussion with your typical sales rep.
I am doing quite a bit of deep learning work as part of my consultancy practice, and it's all hand-made stuff, with lots of trial and error while trying to replicate papers that might be relevant for the task at hand.
So I totally agree with your statement: big corps overhype the shit out of it in order to sell, and lots of n00bs fall for it.
Regarding your last points: even if it's super easy nowadays to deploy YOLO, and everyone and his mother can do it, actually making something that works and provides business value is hardcore. And without scientific skills - no chance.
If you don't mind me asking, what do you mean by "hand-made"? I'm currently working on an information science degree and trying to focus on machine learning and data science. Could you go into a little more detail on what gives something business value?
- Looking very carefully at the specific challenge of your client
- Figuring out how (and if) ML can help
- Figuring out whether it is still economically feasible (cost of research vs. perceived(!) benefit)
- Deriving a solution
- Tinkering, tinkering, tinkering - usually more with the data than with the models :-)
All my AI projects are essentially outsourced R&D projects where we deliver the brain and the computing power. So far, it has never been as easy as installing YOLO or any other off-the-shelf product.
Edit:
You also very often need custom software to create custom datasets. AI models are usually only tested on academic datasets, but I have observed empirically that their performance transfers badly to real-world datasets. So you need to create your own datasets, which is often a non-trivial problem; I have written a lot of dataset-creation tools in my AI practice.
Not the OP, but ultimately something that increases revenue or decreases costs by some measurable amount.
The best thing you can do to make yourself good at this is to practice. Get some Kaggle data and try to fit a model. Realise your data is crap, clean it, repeat.
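In spirit, that loop looks something like this (the column names and model choice are made up for illustration):

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    df = pd.read_csv("train.csv")                       # some Kaggle competition data
    df = df.drop_duplicates()
    df["age"] = df["age"].fillna(df["age"].median())    # patch missing values you discover
    X = pd.get_dummies(df.drop(columns=["target"]))     # naive categorical encoding
    y = df["target"]

    model = RandomForestClassifier(n_estimators=200, random_state=0)
    print(cross_val_score(model, X, y, cv=5).mean())    # score, inspect errors, clean more, repeat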
Every useful system is hand-made in the sense that there's a vast amount of setup and operational code. Mostly the model is the easy part (although it will take a lot of time to run).
Actually, making some things with ML has become really easy. Google AutoML image recognition, or the equivalent from Azure and Amazon, gives you a website where you upload tagged images and it builds a model from them. If the problem to solve is not very complicated (say, room-type recognition), it just works. The only problem is if you try to apply that model at volume - prepare for the costs.
In my case, I train the model with them and download it to use on our machines, without having to touch any ML-related code.
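For example, if the exported model comes down as a .tflite file (as AutoML Vision Edge exports typically do), running it locally can look roughly like this (the file name and input handling are assumptions):

    import numpy as np
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path="downloaded_model.tflite")
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    image = np.zeros(inp["shape"], dtype=inp["dtype"])  # stand-in for a real preprocessed image
    interpreter.set_tensor(inp["index"], image)
    interpreter.invoke()
    print(interpreter.get_tensor(out["index"]))         # per-class scores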
So let's see. Suppose you do a Google certification for AI|ML. This makes you an adept user of Google's AI platform. Your value lies not with Google but with other companies that want to use Google's platform for AI|ML work.
For core AI|ML work (like developing the APIs that you are using), Google will hire PhDs and grad students who specialise in AI|ML. For engineering those products, Google will hire software and distributed-systems engineers. You will be hired by someone who wants to use Google's platform.
The title is obviously clickbait-y, but it’s fine: they’re trying to sell a product (Google Colab).
IMO, if you're interested in AI research or ML engineering, you already know that - in order to avoid getting people killed - you have to understand how it works under the hood. You're doing yourself, your employer, and your fellow humans a favour.
Just keep up the good work, and ignore the bullshit. If an AI winter comes, you’ll be well prepared to migrate to another engineering role.
I wonder if Google is using this resource to train their own staff without PhDs and then letting them work as ML engineers? That would lend credibility to such a program - instead, it seems more aimed at selling ML computing power to the masses (who won't really understand how to use it to get meaningful results).
As usual with tools, what matters is the use, not the instrument. Domain knowledge still has advantages over generalism in applied and critical fields. I mean, you can't yet do medicine or materials science with AI/ML without understanding the domain, can you?
Is this tutorial aimed at entry level? Those graphs are quite difficult. I guess it will take a lot of time to work through the homework on some fundamental curricula.
It is totally possible, and easy, to use TensorFlow/PyTorch if you haven't skipped your linear algebra classes. A PhD is needed if you are going for a job where you design sophisticated models (not just adding layers, but experimenting with activations, attention, etc.).
A PhD isn't even a requirement for doing the more advanced stuff. Obviously you need a lot of math and ML specific knowledge but there's no reason why you can't have that knowledge with an undergraduate math degree (for example). Spending 3-6 years doing research in a very narrow and possibly unrelated branch of mathematics will give you a PhD, but the linear algebra and multivariable calculus that you actually need for the ML stuff are covered in a bunch of undergrad/masters courses in mathematics, computer science, engineering, physics etc.
I second this. I have worked with a bunch of undergrads (I am pursuing a masters degree in CS) and they had a thorough grasp of the math and could really contribute to the research agenda of the group. When I did my undergrad (in 2011-15), I ended up taking a lot of electronics/hardware courses. It turns out undergrads these days just swap those for math and machine learning courses. Good for them.
I would say a PhD (or a masters degree at minimum) is required if you are developing novel architectures, but not if you are iterating on existing architectures for a particular application.
Applied data science does not require developing as-yet-unseen architectures; being able to read cutting-edge papers and apply the research is more than likely enough.