Machine Learning: The Great Stagnation (marksaroufim.substack.com)
484 points by puttycat 10 months ago | 218 comments

Some good points in the article, but I disagree with the tone and the conclusion.

> we’ve rewarded and lauded incremental researchers as innovators, increased their budgets so they can do even more incremental research

There isn't a scientific field where every single paper is groundbreaking. It's a Brownian motion of small incremental innovations, until eventually we stumble upon something big (like deep learning). In no way is machine learning unique in this. Sounds like the author is simply disappointed that, like in any other profession, day-to-day of a researcher is a slog and not a perennial intellectual festival. We've been in an exciting deep learning craze for a while, but it's silly to expect it to last forever. Back to the grind now.

> Machine Learning Researchers can now engage in risk-free, high-income, high-prestige work

Not sure what author means by "risk-free". Yes, if you're not publishing enough you're most likely not going to starve. Is that a bad thing? Is the survival instinct the only good motivator for doing good research?

I would argue that there's plenty of risk, in that people who don't publish good research don't get very far in their academic careers, which in my view is good enough motivation. "They must do good research or starve" is a rather cynical take, especially from someone who seems to not be doing too badly for themselves.

I'd rather more fields provided similar benefits. Maybe then going into science wouldn't be associated with so much sacrifice, so more smart people would choose science over investment banking or such, and we'd make more scientific progress faster.

> CNNs use convolutions which are a generalization of matrix multiplication.

A nitpick: CNNs are most definitely not a generalization of matrix multiplication. In fact, the opposite: you can view CNNs as a matrix multiplication with a particular matrix structure.

>We've been in an exciting deep learning craze for a while, but it's silly to expect it to last forever. Back to the grind now.

This was effectively my response to hardmaru when this topic came up on reddit [1]

Basically 2010-2018 was an open field for ML/DL research with old(ish) methods being rapidly applied to low hanging fruit and large datasets with newly cheap compute.

DeepMind and others are actually making new methods, but by and large they are remixes of those same old approaches.

Most of the research out there trying other approaches (Numenta, OpenCog, Causal Calculus, anything Schmidhuber etc...) doesn't really get any love because it doesn't fit within the mass TensorFlow/Torch framework.


> Basically 2010-2018 was an open field for ML/DL research with old(ish) methods being rapidly applied to low hanging fruit and large datasets with newly cheap compute.

Absolutely! And what many folks need to continue to remember is that many scientific disciplines and domains are really just starting to wrestle with the utility and implications of this first generation of deep learning tools and applications. I graduated with my PhD in atmospheric science from an R1 just over 4 years ago; at that time, very few people were looking at how DL provided useful tools for their work. These days, the field is inundated with folks playing with these tools and knocking tons of low-hanging fruit off the tree - it might not be "deep", revolutionary research, but it's fomenting a mini-revolution with respect to R2O and real applications of what had previously been somewhat niche science.

There's no reason to think this trend won't continue. New tools let new generations of scientists take new stabs at their discipline, and of course the low-hanging fruit drops first as folks get their bearings, build skills/experience, and - most importantly - prove efficacy so that they can get funding for more ambitious work.

This seems to be the trend in every computational science field. Take something compute intensive, apply DL magic and get results in a fraction of the CPU time as the old approach.

Similarly on the commercial side, plenty of opportunity to solve existing problems with DL.

So while it may be true that DL research progress has slowed, still plenty to do in applying existing DL.

Those other approaches you mentioned don’t get much love because they don’t work. Their advocates worked on them for many years and have nothing to show for it.

>they don't work.

In the same way that ANNs didn't work for quite some time, until we've had the compute and the data to train them successfully?

I get that it's important to prove that an idea is worthwhile, and the easiest way to do that is to use it to solve a practical problem. At the same time, I am conscious that we shouldn't put all our eggs in the deep learning basket: who knows where the ceiling is going to be.

Don't get me wrong, I like deep learning, and you have to be silly not to admit how successful it has been. But the field would be so much more boring if not for the people with alternative views and ideas.

No, not in the same way. Convnets and LSTMs worked great when they were invented in the 90s. Yes, they were limited by the available compute power, but they did work well right away.

When I was in grad school in 1999 studying AI the general consensus was that neural networks didn't work very well, and that you were better off with the more mathematically grounded methods like support vector machines. TD-Gammon was just about the only success story for neural networks, there hadn't been much else since 1992.

I don't know where that "general consensus" came from, because in 1998 LeCun clearly showed [1] convnets beating all other models, including SVMs, at image recognition.

[1] http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf

It, like other neural network research, was ignored because neural networks were considered a dead end at the time. In the early 2000s I recreated his LeNet-5 implementation, and no one was interested despite the great results I was getting in OCR and medical image processing with tumor detection.

Younger people don't realize there was strong bias against using neural networks in the late 90s up until Hinton's talk on NNs around 2007. I get the feeling we're going through the same thing where novel research is becoming ignored because everything must fit the deep learning paradigm to be noticed.

Most research will be ignored if it does not have good results, regardless of whether it's novel, or whether it's DL. In fact, a lot of recently published DL research is ignored for exactly this reason. Top DL conferences are so competitive currently that the quality bar is pretty high.

There are lots of ideas floating around in the AI field. Some of them might be good, most are not. If you have an idea and want others to look at it, you'd better demonstrate how it outperforms every other method when applied to some task.

OP's point is that research is/was being ignored despite having good results. But this is normal, it just takes time, and a critical mass of good results for most researchers to switch to the new paradigm. (A decade is a very short time in the history of science.)

Like they say, science progresses one funeral at a time.

ConvNets were invented in the 80s.

The first convnet that is similar to what we use today was described in 1989 [1], and immediately became the best method for handwritten digit recognition.

[1] http://yann.lecun.com/exdb/publis/pdf/lecun-89e.pdf

As opposed to DNNs, which worked right off the bat with no decade of ignominy... /s

> There isn't a scientific field where every single paper is groundbreaking. It's a Brownian motion of small incremental innovations, until eventually we stumble upon something big (like deep learning). In no way is machine learning unique in this.

Except ... while machine learning is great, has made important and significant strides, it's not yet a science. It involves essentially a series of sophisticated, mathematically informed recipes for feeding data to giant algorithms and having them create something useful (maybe very useful but still).

An analogy from a couple years ago is bridge building before physics. You accumulate rules of thumb, you get a vague understanding of what works. You get better. But you aren't producing a systematic field.

And that implies mere advancement isn't necessarily progress (which isn't to say there's no progress, but building a larger SOTA isn't that, as the article notes).

I would agree that as a science machine learning is in its infancy. There's statistical learning theory, but arguably that's a fairly small subfield, if not a separate field completely. ML has largely been an engineering enterprise so far, but in my experience the interest in developing underlying theory has ramped up in recent years.

What we should be careful about is to not be too strict when defining "science". In my view the goal of any scientific field is to build understanding that is useful for predicting outcomes of experiments. Now, this understanding could be defined mathematically, but it doesn't have to be. Don't see why building heuristics can't be a part of this, assuming such heuristics reliably predict outcomes of experiments.

> machine learning is great, has made important and significant strides, it's not yet a science. It involves essentially a series of sophisticated, mathematically informed recipes for feeding data to giant algorithms and having them create something useful (maybe very useful but still).

So... scientists need to do scienceing until it is. This is what happened with Biology over the last 50 years after 2000 years of pinning things on cards and putting them in drawers.

My original point was in response to OP's saying that not every paper needs to be brilliant. The thing about "normal science" is that it begins with more or less verified theories and extends their theory and practice - until they fail, reach the edge of the theories, and the science community must search for alternative hypotheses. Machine learning is more a matter of working with a bunch of practices, rules of thumb and suggestions for doing practices. These can't really fail since they're roughly repeating themselves in different domains. The problem is that this can be done forever and you never need new theories or approaches.

So I'd claim we're really at that point right now: the exploring-new-ideas, "crisis of science" phase, where essentially people have to start brainstorming (and not all ideas are good here either, but they need to be somewhat original).

All this is using Thomas Kuhn's Structure Of Scientific Revolutions model very roughly.

> An analogy from a couple years ago is bridge building before physics. You accumulate rules of thumb, you get a vague understanding of what works. You get better. But you aren't producing a systematic field.

This would be more like Kuhn's pre-paradigmatic (pre-scientific) activity than moving from one scientific paradigm to another?


(I don't know which one is more appropriate here.)

Yeah - SVM were considered to be optimal, then they got smashed by deep networks, now people keep bringing them back and saying "they are just as good" and yet deep networks keep being used to do all the breakthrough work.

Thanks for your comment. I should go and read someone who’s actually studied the phenomenon, instead of making conjectures.

> An analogy from a couple years ago is bridge building before physics. You accumulate rules of thumb, you get a vague understanding of what works. You get better. But you aren't producing a systematic field.

Sounds like science to me; systematic recording of (perceived) cause and effect.

Arguably, deep learning and cell biology both appear like equal parts pure wizardry and flailing in the dark, but maybe that’s just because we haven’t gathered enough pieces yet, and not necessarily because people are doing the wrong things, thus failing to advance?

Currently ML is building models to fit data and trying to avoid over-fitting. I think "engineering field" is a more suitable term.

I think you are exactly right, and want to add a couple things ():

- It's easy to recognize scientific / technological revolutions in hindsight, but at the time they're anything but. I think most recognize the importance of persistence in the progenitor of an idea. What's often missed in these discussions is how important the subsequent incremental progress is: working out consequences of a theoretical insight, figuring out what you can build on top of a new tech, etc.

- I don't have a strong opinion about the optimal amount of "risk," but I do think making risk existential (as in one would starve without a research breakthrough) would have the opposite effect, because while we all say we like people to take risks, what we really mean is we like people to take risks and succeed. And yes, too little risk can breed complacency.

And yes, convolutions are a specific kind of linear transformation, whereas matrices represent linear transformations on vector spaces. The specific structure is that convolutions represent linear transformations that are translation-invariant, a property that many types of data (e.g., images) have, at least approximately.

() My perspective as an academic but not a computer scientist.

Where in the world did you come across the () ... () style? I like it.

Found the LISP programmer

Thanks. I actually type "(asterisk)", forgetting every single time that HN uses asterisks for formatting, and too lazy to go back and fix it...

Couldn’t agree more.

The author cites no compelling trend that ML is stagnating, esp relative to other disciplines.

I’d also add: is it BAD if universities are churning out highly skilled workers when there is high demand for them?

This seems to be more a rant of ML becoming more mainstream and accessible to a wider variety of students than anything.

They admit that some of the most “innovative and challenging” problems are still around, but many do not focus on that. Ok, so what, there are maybe more tiers and splitting of work into subdomains, some more “vocational” and some more “research-y” in nature. Is this a BAD thing?

Just to be clear, there are a lot of researchers making money that aren't producing squat. There are many AI ethicists I admire, and there are many that can't code, can't produce, and just spend their time getting into arguments with people like Yann LeCun, who actually have produced groundbreaking research.

LeCon's work wasn't groundbreaking; he was saved from oblivion only by Moore's Law. LeCun stood still doing the same thing for decades. But when wave after wave of faster hardware arrived he was finally declared successful.

So if you want to credit success to dumb luck, or to stubbornness, or insanity (as Einstein said: "doing the same thing over and over again and expecting different results"), then LeCon is your man. I don't know of any quote where LeCun says something to the effect of: "Well, I knew the hardware would speed up and then my neural net software would work exactly as I designed it to and manifest AI. So I waited twenty years."

IOW it wasn't LeCun's surfboard (his software) that handed him success, it was the wave (advancing hardware speeds):


> "Well, I knew the hardware would speed up and then my neural net software would work exactly as I designed it to and manifest AI. So I waited twenty years."

You're talking like we only discovered Moore's law now, instead of it being a good bet for the last half-century...

I hope calling him a “con” wasn’t intentional.

He speaks French after all...

>> CNNs use convolutions which are a generalization of matrix multiplication.

> A nitpick: CNNs are most definitely not a generalization of matrix multiplication. In fact, the opposite: you can view CNNs as a matrix multiplication with a particular matrix structure.

Neither is really a generalization. Any matrix multiplication can be implemented with a convolution function, and yet any convolution can be represented as a matrix multiplication via im2col.
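For what it's worth, the im2col half of this is easy to check by hand. Below is a minimal pure-Python sketch, not how any particular framework implements it: the names `im2col`, `conv2d_direct` and `conv2d_im2col` and the toy 3x3 image are made up for illustration, and it assumes a single channel, stride 1 and no padding. It unrolls every patch into a row, so the whole convolution becomes one matrix-vector product.

```python
def im2col(image, kh, kw):
    """Unroll every kh x kw patch of a 2D list into one row (stride 1, no padding)."""
    h, w = len(image), len(image[0])
    rows = []
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            rows.append([image[i + di][j + dj]
                         for di in range(kh) for dj in range(kw)])
    return rows


def conv2d_direct(image, kernel):
    """Sliding-window cross-correlation, the operation CNN layers call 'convolution'."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(w - kw + 1)]
            for i in range(h - kh + 1)]


def conv2d_im2col(image, kernel):
    """Same result, computed as (patch matrix) times (flattened kernel)."""
    kh, kw = len(kernel), len(kernel[0])
    flat_kernel = [v for row in kernel for v in row]
    patches = im2col(image, kh, kw)
    # One dot product per patch row: this is the matrix-vector multiplication.
    flat_out = [sum(a * b for a, b in zip(patch, flat_kernel)) for patch in patches]
    out_w = len(image[0]) - kw + 1
    return [flat_out[r:r + out_w] for r in range(0, len(flat_out), out_w)]


# Toy data, purely illustrative.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]

direct = conv2d_direct(image, kernel)
via_matmul = conv2d_im2col(image, kernel)
```

Both paths give the same 2x2 output here; the im2col version just trades memory (each pixel is copied into several rows) for the ability to reuse a fast matrix multiply.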

Discrete convolutions are a special case of matrix multiplication in the sense that a discrete convolution can be formulated as the product between a Toeplitz matrix and the input signal.

[1] https://en.wikipedia.org/wiki/Toeplitz_matrix#Discrete_convo...
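To make the Toeplitz formulation concrete, here is a small pure-Python sketch under stated assumptions: 1D signals, "full" convolution, and hypothetical helper names (`conv1d_full`, `toeplitz_from_kernel`, `matvec`) chosen only for this example. It builds the matrix with entries T[i][j] = k[i - j] and checks that multiplying it by the input reproduces the directly computed convolution.

```python
def conv1d_full(x, k):
    """Direct 'full' discrete convolution: y[n] = sum over m of k[m] * x[n - m]."""
    n_out = len(x) + len(k) - 1
    y = [0] * n_out
    for n in range(n_out):
        for m, km in enumerate(k):
            if 0 <= n - m < len(x):
                y[n] += km * x[n - m]
    return y


def toeplitz_from_kernel(k, n_in):
    """Build the (n_in + len(k) - 1) x n_in matrix T with T[i][j] = k[i - j]."""
    n_out = n_in + len(k) - 1
    return [[k[i - j] if 0 <= i - j < len(k) else 0 for j in range(n_in)]
            for i in range(n_out)]


def matvec(T, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(t * xi for t, xi in zip(row, x)) for row in T]


# Toy signal and kernel, purely illustrative.
x = [1, 2, 3, 4]
k = [1, 0, -1]

T = toeplitz_from_kernel(k, len(x))
y_matrix = matvec(T, x)
y_direct = conv1d_full(x, k)
```

Every diagonal of `T` is constant, which is the "particular matrix structure" mentioned upthread: a general 6x4 matrix has 24 free entries, while this one is fully determined by the 3 kernel values.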

Having a sinecure from which you can make forays into new ideas is far from the reality of most jobs. It is more akin to the "gentlemen scientists" of yore. Whether this is actually a good way to stimulate knowledge accumulation seems like something that should be studied rather than assumed.

Agree, and the article sets up a weird dichotomy between empiricism and innovation. The empirical work of adapting a Transformer architecture to be smaller or faster or work better on a particular task is scientific work. Like a lot of Comp Sci, a lot of it is engineering, but there is a clear scientific component in that you come up with a model first, then implement and test it (using a lot of pre-built components). It's certainly as scientific as e.g. designing a medicine.

See also the recent discussion on "Peer-reviewed papers are getting increasingly boring".


> It's a Brownian motion of small incremental innovations, until eventually we stumble upon something big (like deep learning).

Is deep learning really the result of incremental research? The SOTA-chasing frenzy came after the discovery of deep learning. Incremental research can hardly be justified as an attempt to discover the next deep learning, although it might do so.

Of course it is, if we agree that incremental research isn't necessarily "SOTA chasing". Deep learning, and in fact all scientific breakthroughs, did not materialize out of nowhere. They're a culmination of a long sequence of scientific papers (some more incremental than others), conference presentations, poster sessions, hallway conversations, even coffee chats.

It's not some dude disappearing into the forest for a few years and coming back with a revolutionary idea (maybe in movies). Everything that helped shape the scientist's thinking has contributed to the idea, even if sometimes the idea is so revolutionary that it's hard to see the direct link. Besides, show me a scientific paper with no references. :-)

It's a shame that this is rarely acknowledged and more often than not we talk about "that guy who invented X and got a Nobel prize for it". But us mere mortals can take solace in knowing that even if we didn't change the world with our own ideas, it's possible that we've influenced someone who has.

What does SOTA mean? "State of the Art"?

Yes. Alternatively, “look, ma, my number is the biggest number”. Apologies for the sass, just not very happy that SOTA chasing is often discouraging original contributions that would actually move the field forward.

I'd argue that deep learning isn't even new. I was doing multi-layer neural networks back in 1992. The compute available, though, was woefully underpowered for what they are using it for today.

Yeah, "multilayer perceptrons" date back to the 80's...

agreed with the nitpick wholeheartedly

This article is dead-on, but I think it is missing a fairly large segment of where ML is actually working well: anomaly detection and industrial defect detection.

While I agree that everyone was shocked, myself included, when we saw how well SSD and YOLO worked, the last mile problem is stagnating. What I mean is: 7 years ago I wrote an image pipeline for a company using traditional AI methods. It was extremely challenging. When we saw SSDMobileNet do the same job 10x faster with a fraction of the code, our jaws dropped. Which is why the dev ship turned on a dime: there's something big in there.

The industry has stagnated for exactly the reasons brought up: we don't know how to squeeze out the last mile problem because NNs are EFFING HARD and research is very math heavy: e.g., it cannot be hacked by a Zuck-type into a half-assed product overnight, it needs to be carefully researched for years. This makes programmers sad, because by nature we love to brute force trial-and-error our code, and homey don't play that game with machine learning.

However, places where it isn't stagnating are things like vibration and anomaly detection. This is a case where https://github.com/YumaKoizumi/ToyADMOS-dataset really shines because it adds something that didn't exist before, and it doesn't have to be 100% perfect: anything is better than nothing.

At Embedded World last year I saw tons of FPGA solutions for rejecting parts on assembly lines. Since every object appears nearly in canonical form (good lighting, centered, homogeneous presentation), NNs are kicking ass bigtime in that space.

It is important to remember Self-Driving Car Magic is just the consumer-facing hype machine. ML/NNs are working spectacularly well in some domains.

I worked on and built out a proof of concept industrial defect detection system recently, with a large focus on modern DNN architectures. We worked with a plant to curate a 30000+ multi-class defect dataset, many with varying lighting and environment conditions. As you said, modifying and parameter-tuning NNs is not always a hopeful endeavor.

However, you can make significant gains to your models by going back to traditional image filtering/augmentation. Sticking with well-researched object detectors/segmentation algorithms and putting the effort into improving the algorithms that clean up the data takes you far. It's impossible to avoid because images will always be full of reflections, artifacts, strange coloration unless you have the perfect lighting tunnel setup; doable nonetheless.

Currently doing the image collection for a NN. Created a custom HW rig to speed things up, lighting, turntables, actuators for novel objects, the works. It's really hard and tedious. We're doing liquid detection and even under IR/UV lights, it's still really hard.

We'd love to be able to work with a company for a few days, get the parameters set up right for our case, and then let them take the thousands of images. My company would easily pay $100K+ for such a data set.

> This makes programmers sad, because by nature we love to brute force trial-and-error our code, and homey don't play that game with machine learning.

Huh? If anything I would say ML is way more trial-and-error focused than imperative programming.

Well, if your only experience is reading python opencv stack overflow posts, then of course...

I mean in the sense that using ML for a problem often requires just trying a dozen different modeling techniques, then a bunch of hyper-parameter searching, then a bunch of stochastic tuning…

Oh. I see what you mean. Yeah, I guess by definition backwards propagation is trial-and-error. Huh, I never thought of it that way. Thanks for clarifying, I thought you were being saucy: my apologies for being snarky.

It's HN; we're all here for the snark!

> However, places where it isn't stagnating are things like vibration and anomaly detection. This is a case where https://github.com/YumaKoizumi/ToyADMOS-dataset really shines because it adds something that didn't exist before, and it doesn't have to be 100% perfect: anything is better than nothing.

This is a link to a dataset, unless I'm missing something it's not about anomaly detection. I looked into this area a few years ago and always try to keep my eye open for breakthroughs... care to share any other links?

Oops! Thanks, it was from this challenge. Lots of neat stuff in here.


>The industry has stagnated for exactly the reasons brought up: we don't know how to squeeze out the last mile problem because NNs are EFFING HARD and research is very math heavy: e.g., it cannot be hacked by a Zuck-type into a half-assed product overnight, it needs to be carefully researched for years. This makes programmers sad, because by nature we love to brute force trial-and-error our code, and homey don't play that game with machine learning.

Uh what? You can literally finetune a Fast.ai model overnight to be borderline SOTA on whatever problem you have data for. 0 Math involved, isn't that exactly a hacker's wet dream?

Fast.ai works pretty well when you're working on standard tasks but starts to fall apart when you want to do something more exotic.

I doubt there's much in CV for instance that couldn't be achieved easily with fast.ai. You don't need to be doing exotic things to build a product.

Never said you need an exotic model to build a product nor that you couldn't do exotic things in fast.ai. Fast.ai is just a leaky abstraction.

And yet, it might fail in actual application.

If every DGP could be captured by fast-tuning a sophisticated enough model, science probably would be solved even before DL.

We're talking about building useable products not solving science...

My point being that the reason many products end up not usable, ref. accounts in this thread, is the same reason why science isn’t solved and doing ML correctly isn’t easy.

There's a nice talk by Yann LeCun where he goes on to explain really well why deep learning has such fast progress. [0]

He goes on to explain how theory always comes later.

I thought that information theory came before practice, but it turns out it also came after. (There were a bunch of heuristics for sending messages with teletypes.)

A nice example is Roman technological advances in architecture and materials that predate the use of geometry or mathematics completely. All advancements a result of tinkering and heuristics.

[0]: https://www.youtube.com/watch?v=gG5NCkMerHU

It's not so simple as "what comes first?" (theory or experiment). What generally happens is that we stumble upon something, build a rudimentary but useful or interesting piece, then try to understand it, generalize it, and expand it (and usually succeed, with science). Both happen iteratively.

Information theory came after the telegraph and early communication systems. However, we could not have built modern communication devices without information theory (the insights and design principles). We build, then we theorize, then we build better, etc. It's not a simple procedure. Computers were developed similarly: there were all sorts of ad-hoc logical apparatus, we built Boolean theory to explain it, and then we did all sorts of experiments trying to build computers. Their architects were largely mathematicians with very good ideas of mathematical design principles (and creative new mathematical ideas), not a group stringing together electrical elements and seeing what happens. The same goes for the development of ML/Deep learning, and many other technological marvels.

Although "accidental" discoveries do happen, they happen in a methodical setting, with knowledge, intuition, and good priors.

A similar historical pattern is the invention of the steam engine, and then the theoretical framework of engine & thermodynamics formulated by Carnot. (Carnot engine)

The theoretical framework came almost 100 years after the steam engine's original invention.

Shannon derived the maximum channel capacity decades, almost a century, before we are finally getting close.
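For anyone who wants the number behind "maximum channel capacity": for a band-limited channel with additive white Gaussian noise, the Shannon-Hartley form is C = B * log2(1 + S/N). A tiny sketch, assuming that channel model; the function names and the 3 kHz / 30 dB figures are illustrative choices, not anything from the comment above.

```python
import math


def db_to_linear(snr_db):
    """Convert a signal-to-noise ratio from decibels to a plain power ratio."""
    return 10.0 ** (snr_db / 10.0)


def shannon_capacity_bps(bandwidth_hz, snr_linear):
    """Shannon-Hartley capacity in bits per second for an AWGN channel."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)


# Illustrative numbers only: a 3 kHz channel at 30 dB SNR.
capacity = shannon_capacity_bps(3000.0, db_to_linear(30.0))
```

The formula only gives the ceiling; it says nothing about how to build a code that reaches it, which is why practice took so long to catch up.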

So in that sense forming a theory about something is integrating all the practical stuff you've seen before. To me that makes sense. IMO that "practical stuff" is what experiments are, isn't it? Probably the experiments are far from perfect, if remotely good even, but that's how empiricism should work? I think? (hmm... not entirely sure)

It can go both ways. E.g. often, experimental physicists will try to prove/disprove a priori predictions from theoretical physicists.

Theory is just an explanation of how the world works: you can come up with that explanation as a reason for observations, or as a logical consequence of other theories (which is then verified by observation).

I once read a paper from the late 90s or so which openly said ML research was partly "experimental maths".

"Science owes more to the steam engine than the steam engine owes to science"

Hey everyone this is OP, was a really nice surprise seeing my article generating so much discussion. I unfortunately don't think I'll have time to answer everyone but feel free to reply to this comment if you'd like to ask me anything.

It seems like the article was a bit polarizing, some of the comments made me realize I made a few imprecise statements and I'll fix those. Other comments didn't approve of my tone and that's a bit harder to fix since I tried writing this article in the same way I usually speak. I've written a lot of more dry content for my robotics ebook and I was purposeful in trying something different for this article.

Some of the comments actually generalized my observations to management and software more generally and it's always nice to see people taking my ideas further than I thought they could go.

One major thing I'd like to point out is I don't think "ML is dumb" is the right conclusion, there are lots of incentives making it so more of ML is stagnating but this is certainly not true for the field at large. Interesting ideas need to involve a certain amount of risk. The latter half of my article showcases a few ML adjacent projects which I think are absolutely fascinating.

And if you're interested in reading more stuff by me my robotics and machine learning ebook is very easy to read http://robotoverlordmanual.com/ and will teach you all you need to start building robots at home

Great job writing this article.

Where did you find the memes?

I made most of the edgy ones and had shared them throughout 2020 on my Twitter - only Stack more layers and Squirrel are not original content and ofc the really polished ones like how many angels dance on the head of a pin and the daft punk reference

The writing process for this article was meme first, content second

Astronomy has entered a period of great stagnation. More and more grad students are investing huge amounts of time and building larger and larger radio telescopes just to learn more about black hole formation and the properties of pulsars. Little consideration is given to how to more efficiently use inexpensive consumer telescopes purchased at Walmart! Where are the big discoveries? New planets in our solar system? A new structure for the Milky Way? I sincerely hope that the Julia language will allow us to discover ET radio signals, because we all know that Python lacks this capability.

This made me lol - We should write a second post together

There is about to be a 'great pivot' in ML.

There has been a rabid frenzy of throwing money at anything that has ML in it. Soon investors and CEOs will realize that ML is effective in narrow ways and that not everything needs ML.

They will also realize that 1 ML team + ML as a service (Azure ML, Sagemaker, Google AI platform) is cheaper and works more reliably. The services will keep improving and an underpaid mediocre ML-Engineering team can work to keep the production system up and running.

Basically, ML teams might lose jobs just as DB/Cluster admins did with the advent of serverless compute as service.

I expect it to (already happening) create a day trading company like hierarchy. The Fair/Brain/OpenAIs will pay 7 figures to the top grads to be first to market. BigN ML product teams will expand and stay as well paid as they are. Then there will be a huge drop as we move to offshore ML product teams that are viewed as cost centers by the remainder of 99% companies. These will be most of the jobs available.

In such a system, a pure ML scientist (usually a PhD) will only exist at the top companies. So if you are not in the top 1-3% percentile, you will not have a pure ML job. However, there will still be hybrid DS-SDE jobs (ML engineering, ML product maintenance, ML-as-a-service user) or hybrid DS-PM jobs (Analysts, Consultants, data driven business decision makers). So, anyone who is not in that top 1-3% will have to pivot to one of these 3 roles.


I won't call this an AI winter. But it will definitely become boring for the majority of those employed in the field.

I am an experienced ML manager in a large ecommerce company, and I mostly agree with you, and I can’t wait for this to happen - and I think people just entering college or grad school for ML should not fear it. It’s a good thing.

Right now, there is so much misunderstanding about what ML is, what resources it needs, and how it works that the corporate environment is very stressful.

ML jobs are well paid, but they are NOT fun. No one understands ML devops & the infra needs to enable tight experimentation loops. Existing observability and telemetry systems are wildly bad for model training, reproducibility or any form of online or semi-online learning. As an ML engineer you’ll have to take on huge workloads of devops, infra, tooling, data munging. I’ve seen more than a few brilliant ML engineers burn out and quit because of this.

As ML becomes better understood as a boring technology, and decisions around ML projects, team structure and especially ops support start to get more standardized, I think this will get better.

The pivot you mention means a thinning out of the headcount on the pure ML research side. But it also means opening up more positions in ML engineering, infra & devops.

If people choose their specialization appropriately and remain open to being less on the research side of this, then I think there will continue to be lots of opportunities for high-paying jobs, and people will know their required responsibilities more unambiguously and probably will be happier, rather than dredging through the endless series of bait and switch jobs that exist today, promising a focus on ML research but typically forcing you more into ML devops & data platform management.

> ML jobs are well paid, but they are NOT fun. No one understands ML devops & the infra needs to enable tight experimentation loops. Existing observability and telemetry systems are wildly bad for model training, reproducibility or any form of online or semi-online learning. As an ML engineer you’ll have to take on huge workloads of devops, infra, tooling, data munging. I’ve seen more than a few brilliant ML engineers burn out and quit because of this.

Can I cry? I feel so understood right now.

I love my job in ML, the subject matter is fun, but there is such a huge burden of expectations on a team's titular data scientist. It is exciting in a 'mid 90s during the web revolution' sort of wild-west way, but you also have the cynicism of the mature Software field. A good ML engineer is worth their weight in gold.


I also wrote this in a pseudo-fictional dystopian sense. An 'If I was an ML pessimist' take on the state of things.

The other comments made to the parent I originally posted are great counterarguments. (2012-14: Alexnet, 14-16: Deep LSTMs, 16-18: Resnet, M-RCNN, Yolo, 18-20: Transformers, 2020+: Alphafold, GPT3, CLIP, et al.) Deep learning has been improving pretty linearly over the last decade. If I was looking at it in a naively statistical sense, then ML will actually be able to match the rising supply of ML scientists with a rising demand. That's the optimistic take though. In that case it will actually feel like being a programmer in the 90s, in that a couple of pivots can propel you to multi-millionaire.

Bravo Sir!!! You have put it brilliantly!!

Mostly disagree.

> CEOs will realize that ML is effective in narrow ways and that not everything needs ML.

Any stable business isn't unjustifiably sinking costs here. I project a rise in AI-funded efforts at large businesses in FY21.

> They will also realize that 1 ML team + ML as a service

Yes/no. This has more platform implications than actual ML implications.

> ML teams might lose jobs

Assumes ML Jobs only do some form of R&D. Data is a utility, and advanced analytics is valuable. Stable ML Jobs don't just work on deep learning.

> I expect it to (already happening)...

partially agree. already happening. But cost centers are only taking on what was standardized yesterday. Tomorrow still requires advanced analytics capabilities.

> In such a system

moot point. This is the system.

I see a pivot in focused efforts. More optimism in an early AI commodity vs stagnation. We're moving from research to integration. Further areas to improve AI in applications (with continuous feedback training) and many domains of advanced analytics.

The optimistic side of me agrees with you.

For my career I say, "tere u me shakkar". ("let there be sugar in your mouth", ie. let your words come true)

Ha & to be fair I've been on and am on a call now trying to sugar coat this exact issue at scale with in-flux relatable scenarios.

> become boring

80-95% of "ML" has always been "boring" and any data scientist/ML-engineer worth their salt knows this. This also concerns what you refer to as "pure ML job". It only takes fresh grads and juniors a couple of projects to realize the true meaning of "data cleaning", "outlier detection", "robustness" and their likes - it's painstaking work.

>Soon investors and CEOs will realize that ML is effective in narrow ways and that not everything needs ML.

Sure but ML is also not being used in 90+% of the narrow use cases it is good for.

>They will also realize that 1 ML team + ML as a service (Azure ML, Sagemaker, Google AI platform) is cheaper and works more reliably.

These services replace part of the ML Ops component but not much else except in very narrow use cases. There's also already GUI based tools for building models but they're also not used much. I don't see this part of ML Ops as being the majority of what ML Engineers do so the majority of ML Engineers don't have to worry.

I think this analysis is completely spot on. Personally, I have had the same thoughts for about a year and I have just switched from a 'pure' ML/DS' to exactly the hybrid engineering style role you describe.

I've talked to colleagues about this before but I find it's mostly a cyclical nature fuelled by tunnel vision.

Every good couple of years researchers come up with a good advancement or fresh concept that reignites the community. However, all that happens till the next breakthrough is basic tweaks. The amount of junk papers that slightly adjust the method, get a few decimal places of improvement, then call it a snazzy name would fill a mountain. Drawing blood from a stone. People get so obsessed with specific methods, thinking it's the new great thing, that they don't stop and think it's probably not the only way.

Marketing and media are the worst though for general public perception. The amount of times they would warp ML into a magical panacea. "ML will make you skinny!" I mean it's just mathematical approximations, been around a while.

Don't get me wrong, I like this field. I'm fortunate that I get to apply it to a problem that helps people, but at times I just want to shout from the rooftops that it's not a God and it won't make all your dreams come true. Then they get annoyed that it doesn't, and we get a bunch of stagnation articles like this.

Interesting article but the part where he says “all” you need to learn are matrix multiplications seems vacuously true to the point of meaninglessness. All numerical methods across every field essentially boil down to matrix multiplications.

Your math ability needs to be such that you can transform a problem into a form that can be computed numerically using matrix multiplications, which requires more skill (sometimes significantly more) than simply knowing how matrix multiplications work. Sometimes this ability to reframe complex problems numerically using matrix multiplications is the lion's share of the research!
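To make the reframing point concrete, here is a minimal sketch (a toy example of my own, not from the article) that rewrites a 1D sliding-window convolution as a single matrix-vector product. The function names, the signal, and the kernel are all made up for illustration. The matrix multiply itself is trivial; the work is in constructing the banded matrix.

```python
def conv1d_direct(signal, kernel):
    """'Valid' sliding-window correlation, the operation CNN layers compute."""
    n, k = len(signal), len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(n - k + 1)]


def conv_matrix(kernel, n):
    """Build the (n-k+1) x n banded matrix whose rows are shifted copies of the kernel."""
    k = len(kernel)
    rows = []
    for i in range(n - k + 1):
        row = [0.0] * n
        row[i:i + k] = kernel  # place the kernel at offset i, zeros elsewhere
        rows.append(row)
    return rows


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


# Hypothetical toy data, chosen only for illustration.
signal = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [1.0, 0.0, -1.0]

direct = conv1d_direct(signal, kernel)
via_matmul = matvec(conv_matrix(kernel, len(signal)), signal)
print(direct, via_matmul)
```

Both paths give the same numbers, but only one of them required knowing that the operation has this particular matrix structure.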

Good point. Almost as useful as saying "all you need to learn is Boolean algebra" when it comes to computing.

All you need is ZFC...

All you need is ZF...

> Academics sacrifice material opportunity costs in exchange for intellectual freedom.

Most academics I’ve come across only think they’re doing this. My perception is they are too insecure about their self-worth to pursue material opportunities.

I admit, the number of academic types I know is not vast so maybe it’s too small a subset to make any judgments

I went a very uncommon route in my career: I started PhD school after 14 years of industry. My motivation was that I was tired of creating new and interesting things in industry only to have them be killed by politics. So, my ideas never saw the light of day sans a few dozen people. My thought was that if I have a PhD, I am rewarded in my career for publishing these thoughts. If they are published, maybe some organization smarter than mine would use the ideas. I am a true scientist at heart in that I want to know where my ideas break, and a good way to know that is to have a lot of public scrutiny over them.

How has the experience been getting a PhD after so long out of school? I'm quickly approaching 14 years in industry but still haven't ruled out a PhD.

I just started my PhD in AI (specifically NLP). Also took a non-traditional route (creative writing major -> 10 years in industry -> phd).

It's interesting so far. Research feels very open ended compared to industry. While I was in industry (AI fintech startup), even though goals were rapidly changing, I had a good idea of what problems to work on and how to gauge progress.

In contrast, research is far more undefined. There are days I feel lost and others where I'm chasing rabbits down deep holes. It's been hard for me to figure out if the problems I'm trying to solve are worth exploring (are they good research questions? and more importantly are they publishable?).

But that being said, it's only about 6 months in and I feel like I'm still learning what it means to do research. I've definitely enjoyed having the space to explore problems at my own pace and think deeply about them.

from "creative writing" to AI (industry or academia) sounds like such a wide gap, I did not think such a jump would be possible... congrats!

So how’s it working out for you?

The core problem is that you don't have intellectual freedom. You won't get funded if you are not researching the hot new thing.

Exactly, and that's assuming you are one of the very select few that gets to even select and lead the research areas. Most of the opportunities are working alongside professors and department heads that already have an idea of what you should be working on. If you are at one of the elite universities with billion dollar endowments that don't have a lot of these funding problems, then you are working on what the corporate donors suggest the University work on. The university will sell it as "staying relevant for the future job market".

If one has a supportive advisor and adequate non-grant support (e.g., a fellowship), it is possible to work on self-driven ambitious projects. However, typically an advisor only wants to invest significant time if the work is interesting to them. Also, often I've been approached by students who want to do their own thing in terms of high risk work, but they don't have adequate skills or enough existing work to graduate with. My goal is to ensure students advance knowledge, graduate, and get the kind of job they want. Ambitious projects that fail will significantly impair positive outcomes.

The billion dollar endowments don't usually go toward supporting research directly.

This is clearly overstated. Some nuance would be appreciated.

> We’ve gamified and standardized the process so much that it’s starting to resemble case studies at consulting interviews.

This is precisely true, as someone who has passed both screens for competitive jobs.

Cracking the coding interview <> Case in Point

Live coding <> Do 3-digit multiplication in your head (eg 347 * 469)

Sorting algorithms <> M&A Evaluation Frameworks

I could go on...

You just memorize a bunch of crap that's vaguely (but not really) applicable but is super random, and then you just keep asking "do you want me to keep going" in various tones until they tell you to stop.

>> We’ve gamified and standardized the process so much that it’s starting to resemble case studies at consulting interviews.

Love the way this is put.. it is so true :(

I still feel like much of AI is a plot to dumb down the modern economy. We want our business people to be just as effective as our quants; we want nothing to require real intellectual labor.

The idea that you traditionally have these programmers who spout mumbo-jumbo all day, cost a lot of money, and seem to always be planning stuff behind your back is threatening, and all the more so because you are utterly dependent on them. ML breaks their control over the means of production.

Now, that's not to say I am against labor saving devices. I most certainly am for them, but an economy in which everyone is in a deep learning arms race is an irrational shit show that could only result in less productivity.

(It's possible a single central planner AI could do better, because at least the training data would be "real world" and not output of other deep learning black box actors. But of course single-planner economies have a huge amount of other downsides.)

This seems fair to me. The executive view of ML is "can you do me a magic?" And as this article's "Graduate Student Descent" bit makes clear, the worker response is often to semi-randomly perturb code, show some graphs, and say, "Is this a magic?"

For me, most software development is about finding something boring and laborious. We get a computer to do the work so humans can level up and work on something requiring actual thought. That requires getting a deep understanding of the actual work.

Some of that definitely happens in well-run ML projects. But there's a bunch of Silver Bullet Syndrome stuff going on, where ML's shiny results and magazine articles lead to inflated expectations and inflated claims of success. A fellow nerd says, "I did an algorithm!" Someone turns that into an impressive presentation with claims of X% gains in the Key Business Metric, hallowed be its name. In reality, it's more plus or minus X% when you account for externalities, natural variation, and actor adaptation. But that's ok, because by the time anybody finds out, attention is elsewhere.

That's not to really blame ML for that. For a period years ago, I kept getting asked, "Can we use a wiki for that?" I would start an explanation of what it actually takes to make a wiki work (hint: it's not the software). Their eyes would glaze over in short order, because they realized that it would take actual work. So many people want the silver bullet, the magic pill. Especially people in the managerial caste, as the reigning dogma there is that management is a universal skill. Details are for the little people.

> Especially people in the managerial caste, as the reigning dogma there is that management is a universal skill. Details are for the little people.

Exactly. Machine learning is the perfect ideological dual. It's "universal labor" for "universal management", and both sides are equally illiterate in the ways of the world.

Ooh, that's very well put.

This rings true from my experience as a data scientist for a non-tech company. I spend more time doing expectation management than ML work. This seems to hold true whether we're trying to use ML on a new and novel problem statement, or trying to reproduce some published application or supposed solution a competitor claims to have.

>Especially people in the managerial caste, as the reigning dogma there is that management is a universal skill.

This would be funny if it were not also so true and sad... management as a skill (and it is a skill, it is not IMHO something that can be taught, especially in business schools!) is such a rarity.

It's just another tool. I have this saying: all machine learning is just clever tricks and techniques until someone learns how to brute force the solution.

It's the same adoption/business technology tension that has existed since Frederick Taylor in the early 1900s or Vonnegut's Player Piano concept, where they propose using a recorder to automate human-averse tasks by recording workers' movements. The hype is trying to replace people. Real-world adoption seems to take place where machine learning complements human activity to do things humans are not good at, rather than replacing it. It's not making them dumber, it's making them more enabled.


In the early 1900’s Frederick Taylor called public attention to the problem of ‘national efficiency’ by proposing to eliminate ‘rule of thumb’ management techniques and replacing them with the principle of scientific management [157]. Scientific management reasons about the motions and activities of workers and states that wasted work can be eliminated through careful planning and optimization by managers. While Taylor was concerned about the inputs and outputs of manufacturing and material processes, computers and the information age brought about parallel concepts in the management and organization of information and its processes. Work efficiency could now be measured not by bricks or steel, but by their information flows.

F. W. Taylor, “Principles of Scientific Management”, Harper & Row, New York, NY, 1911.

Not all tools are alike. Taylorism in particular was based around, yes, a bunch of empiricism, but also trying to understand the production process in greater detail. The modern ML seems more like the modern MBAism where the details are considered beneath management. That would have been antithetical to Taylor, who was all about how the sausage is made.

> I have this saying: all machine learning is just clever tricks ...

It appears to me that you are missing the mark here, unless this is largely a definitional issue.

Do you consider the foundations of ML to be a clever trick?

Do you think human brains primarily learn by clever tricks?

When a metaphor or saying falls apart with one more level of questioning, I would suggest it may be time to find a better metaphor.

To your first question, yes it's just a style of solving a problem that can be implemented in software for 90%+ of the time.

For your second, I think that's part of the problem. Most people confuse how a human brain learns with what is actually running in a machine learning program. It's similar at some level, but not really doing the same thing at another.

> To your first question, yes it's just a style of solving a problem that can be implemented in software for 90%+ of the time.

What insight is gained by this statement? What does it explain? What does it downplay?

I'm not currently seeing much value in it. I'll explain why. Saying 'just a method of solving a problem' comes across as reductive without being useful.

Imagine if someone said 'flying is just a means of movement' in the context of studying a hummingbird's agility. It says more about the speaker than the subject. It suggests the person is uninterested or focused on other things.

So with regards to your statement, it suggests you don't care and/or don't appreciate what makes learning difficult.

I'm talking about learning theory. About generalization given data. This is certainly not easy. Yes, it can be encoded in an algorithm, but that does not make it less interesting.

BTW, Google Brain posted this on the subject of "brute force" solutions yesterday. I thought it would be of interest here.

The big idea here is that enough brute force will lead to better compute-use techniques which will in turn make it possible to do more with less compute. But the current reality is that these systems don’t tend to justify their existence when compared to greener, more useful technologies. It’s hard to pitch an AI system that can only be operated by trillion-dollar tech companies willing to ignore the massive carbon footprint a system this big creates.

> Now, that's not to say I am against labor saving devices.

That's good to know. Because "labor saving devices" is by no means just the field of machine learning. A hammer is a "labor saving device" if all you have are rocks. I've got the impression that a lot of people conflate machine learning and robotics with labor saving in general. Of course, compared to the state of the art, machine learning lets us hope to find magical shortcuts to get our work done. But an IDE, a word processor, or a compiler is also a labor saving device. So is a piece of paper: it is way faster to doodle on a piece of paper at your desk than to go find the nearest cave to doodle in.

Agreed, I find that ML is a meta tool. You gotta know the tool you need, then use ML to build it. Build the wrong tool and there’s little value. The right tool will make people more productive.

I always see that the data used in the ML field is not praised enough. I don't see enough people realising it was the large collection of data / curated information that enabled the advancement (even if the ML methods were heavily and manually tweaked to accomplish certain tasks). It simply wasn't possible before the age of the internet.

That reminds me how SQL was advertised - "Now you can program computers in plain English and fire all your programmers!". Fast forward, and today good luck getting a job as a Microsoft SQL programmer if you only know Oracle SQL.

Yes there is some of that, but at least SQL is rather readable and declarative, even if the syntax is way too irregular for the ideas.

I blame SQL's false advertising less than the insane tunability and lack of effort put into migration tooling for making the relational database world so much more kafkaesque. Or really, between the nature of Oracle, Microsoft, and their customers, it might have been an inevitable insanity along the lines of Conway's law: too profitable -> too many cooks in the kitchen -> too complex.

Wrong sub-field. Business insights are usually the domain of statistics, not deep learning. Deep learning is mostly for processing raw text, images, sounds and recommendations. It's dumb, boring stuff like translation, OCR, spam filtering and search ranking. How to run a business is the domain of game theory in economics, or maybe you could do something with Bayesian methods.

As someone who works on open-source tooling for AI, good.

The more people who can leverage AI, the better the industry will be as a whole. It serves as a check on AI hype, and it also drives innovation by encouraging new applications.

Huggingface Transformers is a good demo of this philosophy in action.

Trying to use ML to get rid of programmers will just replace them with ML experts who also have to be programmers to implement the models and munge all the data. These people will in turn have to be paid more than the original programmers were.

The goal of ML isn't so much to get rid of programmers as to get rid of specialized programmers. Right now if you want to solve a problem, you need someone with a deep understanding of that specific problem to develop a solution. For a complex problem like diagnosing cancer patients, you are talking about a team of people with decades if not centuries of combined experience in oncology on top of the actual programming expertise to implement the tool. The holy grail of ML is to reach the point where someone who is an expert on making ML systems can apply the same (or substantially similar) tools and expertise to a wide range of problems - the same team that makes a cancer diagnosis system could also make a legal text search system or a protein folding system. Realistically we'll probably never get to the point where zero domain knowledge is required, but even if the requirement is just substantially reduced to the point where the ML expert can learn what they need in months instead of years, that would be revolutionary.

"The" goal of ML? No.

A goal? No. ML is a field, not something with agency.

One effect of ML is more generalization of prediction and inference problems. (I'm using inference in the statistical sense.)

Won't they be much fewer in number, though?

Yeah, I think in a very few but flashy situations, ML gives worse results but needs even fewer programmers, so it's worth it in some sense. The idea that non-programmers should be doing a little something is also very good. It's just too bad that the embodiment of that is the mess that is non-programmer Python.

> “ The idea that you have these programmers who spout mumbo-jumbo all day, cost a lot of money, and seem to always be planning stuff behind your back is threatening, and this breaks their control over the means of production.”

That is a very bizarre description of ML engineers. In every company I’ve worked at, ML is a team or teams that partners with product managers and other engineering teams to learn about problems they need solved. It’s very systematic, boring, and tied heavily to those other teams as the leaders and decision makers.

First you look for high level value propositions, like automating a decision process, removing a customer friction point, creating key metrics where simple metrics are intractable, or various multi-modal information retrieval goals.

You identify opportunities in these kinds of high level areas in lock-step with product managers and other engineering leaders. Then you move on to identify sources of data that can be leveraged, and eventually (much later) you get to the smaller set of work training a model, validating with acceptance tests and hardening the implementation for safe production deployment.

If people are spouting “mumbo jumbo” and making big ML model decisions without lock-step synchronization with other stakeholders, that sounds like organizational dysfunction, not any type of issue with ML.

I also find it odd that you bring up “cost[ing] a lot of money” .. that’s very out of place among everything else mentioned. That seems more like insecurity or jealousy over the market demand for ML talent, and wanting to cut other people down rather than work with them or acknowledge the level of effort it required to get that level of expertise in ML.

> That is a very bizarre description of ML engineers

No no no, that's a description of regular programmers.

There's so much more arcana in the field of programming and computer science as a whole, especially with the piss-poor job we've done deprecating bad old interfaces etc. (the monster that is modern Unix grows without bound). And of course there are the various language and other fads. All that is a nightmare for a traditional business person, whether they know it or not, and the regular programmers probably feel like an extortion racket of sorts.

A plot? By whom?

ML, depending on the situation may refer to a field of research, a methodology, a set of tools, among other things. Only things with agency 'plot'.

Richard Feynman — 'Philosophy of science is about as useful to scientists as ornithology is to birds.'

Well, this blogpost shows where this analogy breaks down: birds (AFAWK) don't try to discuss ornithology.

"bureaucrats running the asylum" is normal science!

Thomas Kuhn has shown how it works in The Structure of Scientific Revolutions.

The problem is that without Kuhn our expectations are set by pop history of science, which only remembers 'abnormal', extraordinary science: the paradigm changes.

(Otherwise, this is a great blogpost.)

I kind of agree with the author's major sentiment: that ML research is stuck in a rut with incremental improvement. However, the longer the article goes on, the less and less I agree with any of their statements. They start off criticizing the incremental improvers. They advocate later that if "stack more layers" beats a method, the method isn't good, while completely ignoring everything other than the standard SOTA metrics, such as data efficiency, computational efficiency, model representation efficiency, etc.

And, while I agree there is a "fake rigor" problem in ML research, the particular examples that they bring up aren't extremely good exemplars in my opinion. Instead, they seem to have a problem with the standard operating procedure of mathematics while missing the point that it is what got the "stack more layers" school here in the first place. Simplifying a problem so you can understand it, and then relaxing the assumptions and seeing if you can figure out what implications that has is how advances are made.

Plus, they have some hot takes and statements that are just plain wrong.

> With Automatic Differentiation, the backward pass is essentially free and is as engaging to compute as 50 digit number long division. Deriving long complicated gradients is fake rigor that was useful before we had computers.

What? First of all, AD is not a solved problem and using it is not "essentially free." There's a huge performance overhead when adding AD to a system. Try using a second order method with AD. I hope your Hessian actually finished computing.
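A rough back-of-the-envelope sketch of one part of the second-order problem: even before any AD overhead, a dense Hessian has one entry per pair of parameters. The parameter counts and the float32 assumption below are hypothetical, picked only to show the scaling, and the function name is my own.

```python
def hessian_storage_bytes(num_params, bytes_per_entry=4):
    """Bytes needed to store a dense num_params x num_params Hessian.

    Assumes a dense matrix and 4-byte (float32) entries by default.
    """
    return num_params * num_params * bytes_per_entry


# Hypothetical model sizes, for illustration only.
for n in (1_000, 1_000_000, 100_000_000):
    terabytes = hessian_storage_bytes(n) / 1e12
    print(f"{n:>11,} params -> {terabytes:,.6f} TB for a dense float32 Hessian")
```

Under these assumptions a million-parameter model already needs about 4 TB just to hold the matrix, which is why practical second-order methods tend to rely on Hessian-vector products or other approximations instead of the full thing.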

> Julia on the other hand is a language made for scientific computing where everything is automatically differentiable by default.

This is just plain false. I'm a huge proponent of Julia, and there are some great AD packages, but in no way is everything automatically differentiable (even with the nice packages), nor is that a design goal. The work on Flux.jl (a package) is extremely impressive though, and there are particular features of Julia that allow some awesome package interoperability (e.g. the fact that some ODE solvers can be differentiated through with Flux).

> and there are some great AD packages, but in no way is everything automatically differentiable (even with the nice packages), nor is that a design goal.

I work on the AD infrastructure for Julia. That absolutely is a design goal. Certainly we are not there yet; we still have a long way to go. But that is where we want to go.

With the caveat that we won't differentiate things that are not mathematically defined to have derivatives (e.g. the derivative of `xs[i]` with respect to `i`).

But for stuff like mutation (the big thing Zygote doesn't support (though some of our other ADs do)), we sure do want it to.

Sorry, I guess I was unclear.

It's definitely a goal for the AD ecosystem, but, as far as I'm aware, AD is not a design goal of Julia itself. That was the point I was trying to make.

Perhaps not originally, but now it is certainly a language design goal that the entire compilation pipeline can be parameterized and modified to allow for a huge variety of custom transforms. AD is a subset of the transformations they are endeavouring to support.

E.g. see https://github.com/JuliaLang/julia/pull/33955

Generally speaking, if there is any sort of code transformation pass that's needed for AD but not supported, you can expect people to be working to support either that specific transformation, or a generalization of the transformation. This has been a theme in the language development for years now.

That’s really interesting. Thanks for sharing.

>some ODE solvers

Some? Which ones aren't? Could you open an issue?

I was unsure if every ODE solver was able to be differentiated through. The "some" was me trying to avoid making a blanket statement about all, when I wasn't sure.

Sorry for the confusion.

"Neural Network weights are learnt via Gradient Descent and Neural Network architectures are learnt via Graduate Student Descent."

Like it!

Graduate Student Descent -- there's also the other saying: Every moment you spend writing, publishing, or working on your dissertation is a moment that you are getting behind on doing unique and cutting edge research.

But also "The difference between screwing around and science is writing it down."

I like yours better.

There's so much stuff about the world that we don't understand and want to and that even trivial machine learning techniques can help us figure out. Yet, at least in my field (computer security), it has become incredibly difficult to publish research that applies machine learning without having some needlessly obtuse methodological twist to make it seem more complicated. Even low hanging fruit needs picking and I'm worried discoveries with intrinsic value are being left to rot in the name of feeling "scientific."

>The construction of Academia was predicated on providing a downside hedge or safety net for researchers. Where they can pursue ambitious ideas where the likelihood of success is secondary to the boldness of the vision

Is there a citation for this?

Much as I would like it to be true, from the little I've read it seems that plenty of universities/colleges were originally religious in nature and were intended to defend orthodoxy, even to combat specific heresies. Definitely not to take intellectual risks.

I dropped out of a PhD in RFIC/MMIC design. Graduate student descent and the lack of first-principles reasoning hit too close to home. Maybe it's just the nature of highly empirical engineering disciplines to throw many hours of grad student and simulation time blindly at problems in hopes that one lucky fella finds a new optimization.

I fear ML and AI are taking the lion's share of analytics resources in most big companies and that's just not right. We still have basic problems with data integrity and blending. Getting a real end-to-end view of any business process is still a herculean task. Try matching something like invoices with marketing leads, good luck...

If it makes you feel any better, in practice, most ML projects fail and just turn into (super valuable) reporting like you describe.

If you want a reporting solution, buy AI.

It's about time someone formalized Graduate Student Descent. Please, someone write a paper diving into all its properties!

"Graduate Student Descent is one of the most reliable ways of getting state of the art performance in Machine Learning today and it’s also ... fully parallelizable"

So, this is the claim I thought was interesting:

"Every paper is SOTA, has strong theoretical guarantees, an intuitive explanation, is interpretable and fair but almost none are mutually consistent with each other"

How can that be true unless the "theory" itself isn't really worked out?

Here's something to put ML and related tech to work: to quantify the influence of shilling, astroturfing, brigading, gatekeeping, and all kinds of political campaigning on the 'net, the devastating consequences of which are only now seen by the general public.

Though I have to say that I've personally seen ML invading conferences not related to ML per se, with over half of the submissions applying ML techniques to random problems in the primary field, problems that were clearly of no interest to the presenter in themselves and only a vehicle for their graduation, even more than usual. So I admit to seeing ML people as mostly in it for advancing their careers; glad to be convinced otherwise.

Winter is coming, again.

That's unlikely. There are many machine learning applications that have been shown to be good enough for commercial use and they aren't going anywhere. The worst case for the field is that progress slows down, people realise that their expectations were unrealistic and the hype inevitably dies down. Which has to happen eventually. So even if ML isn't the hottest thing or a massively growing field, it will still be used for a long time.

> There are many machine learning applications that have been shown to be good enough for commercial use and they aren't going anywhere.

If you could name three of them I'd be really grateful. Serious question; everything surrounding ML seems to be only good for (non-monetizable) art projects.

As art it is amazing, not going to lie, but "commercial use" seems like a huge stretch.

I think you are underestimating how many of the services you already use incorporate machine learning somewhere. It's not about GANs or deepfakes (right now those are closer to the art projects you are referring to). Some examples for actual applications or classes of applications are image recognition, speech recognition, recommender systems, virtual assistants, fraud detection, financial forecasting.

Here's 3 off the top of my head, but there's more especially when you get into less flashy territory.

Translation (Google translate, DeepL)

Automatically generated product descriptions, sometimes also edited by humans (Alibaba)

Image Tagging (Facebook photos)

Google translate, good enough?

Today I received a package from Amazon containing router bits (for woodworking, not IT). It contains a so-called "User Manual" that is so badly translated, I assume automatically, that it would fool only a spell checker into thinking it is actually written in German.

I often hear and read good things about Google Translate but every time I read something from it, e.g. when a browser or webpage helpfully decides that I would prefer a butchered salad of German words instead of an English web page, I am repulsed.

I don't know, but I live in Berlin, don't know German and use Google Translate on all sorts of official and non-official stuff (e.g. for my Anmeldung two weeks ago, reading up on Mietendeckel, news reports, random letters and shopping sites etc.) and it works almost always. Obviously not as perfect as a real translator but good enough > 99% of the time.

In this case, give DeepL a try. Translations between English and German, at least, are orders of magnitude better than Google's, IMO. Not without errors, of course.

Okay, I should have worded my comment more carefully.

These applications seem to firmly fall into the "I'm willing to compromise on quality if I don't have to pay a living person a wage" niche, so they're value-destroying, not value-creating.

Are there examples of value-creating applications for ML? (From a business point of view; obviously the "shitty translations but at no cost" proposition creates value for the average Internet user.)

Your qualification of value destroying/value-creating makes little sense to me. If I need a news article or some other webpage translated I can do it thanks to Google Translate. If Google Translate was not here, I would simply not get it translated and lose the information, there is no way I would contact a translation firm for that kind of stuff. To me it is firmly value creating.

At any rate here are way more than 3 other uses of DL today off the top of my head:

* Autocompletion (be it in search engines toolbars or in Gmail/Word/...)

* Superresolution GAN, the most interesting example to me being NVIDIA DLSS: you render a game at ~720p or less and then upscale it to the target resolution of 1080p or 4k, which gives you quality that the machine could not have delivered by rendering at the target resolution directly.

* Image recognition/tagging: Most of this is used in the security domain, but there is also a lot of stuff around inventory management, safety etc.

* Semantic search

* Protein folding (AlphaFold)

* In astrophysics: detection of supernovae, FRBs and probably a bunch of other stuff I'm not aware of

* Self-driving cars: Even assuming self-driving technology does not evolve anymore from now on, the current state of the art is still a selling point.

* Predictive maintenance: Used for plane engines and other things

I don't understand this. Do you think that e.g. the average engineer is value destroying because if the business hires a more expensive and experienced one they will do a better job?

In either case it is only 'value-destroying' if the business has unlimited resources.

Image tagging is not in that category. Google Photos, Synology Moments and other similar projects are valuable for end users. Nobody had time to manually tag hundreds of gigabytes of photos with even 10% of the items in them.

Like I said, I'm not talking about end users. (Obviously for end users ML enables doing lots of things "for free" that used to be paid services before.)

I'm talking about commercial applications, like the OP said. That is, things I could potentially pitch to management and substantiate with something concrete that isn't "you can fire your lowest-paid contractors now".

Some things are worth doing but not worth paying people. Google is using ML to read house numbers from Google Street View photos. Why is that not a good commercial application? Google gets its map updated faster and cheaper with ML.

As far as I'm aware, ML is driving a pretty sizeable amount of new functionality at the big tech companies. How do you think Google does search and translate, Netflix recommends films, Tesla drives your car, Siri recognises your voice, etc.?

I see the sentiment you've expressed a lot, and feel it speaks to a massive disconnect amongst developers. Most people are interacting with ML systems dozens if not hundreds of times a day.

Question wasn't if ML is useful. The question is - does ML make money or create (financial) value for the company?

The stuff you listed doesn't, it's just part of a moat for already established products that don't depend on ML for their market share. (It's not like Netflix will lose market share if they switch from ML to some other approach for their recommendation system.)

Spam detection is the oldest and most well-known one.

It is also used quite a bit in graphics and imaging; DLSS is a consumer-facing application, but it is also used in other domains, like OCR.

Machine translation is another ubiquitous use case. As is any kind of language processing, like text-to-speech.

Also, in the industry, it is heavily used for anomaly and defect detection. Also, Google reportedly uses it for a lot of search/recommendation stuff.

ML is definitely monetizable, but not every company needs it, by a long shot. It is not "AI" and seems to often resemble a complex DSP step when used in practice. I think it's overhyped, but it is far, far more useful than anything blockchain will ever be.

Not even remotely an ML expert. Just some ideas I had:

Speech recognition + basic NLP for automatic customer support triage. None of these are great to use as a customer, but they seem to be effective enough to continue using and save companies lots of money.

Automatic "offensive" content detection for social media. I'd bet they use ML to do a first-pass on uploaded content to make sure it doesn't contain porn, gore, etc. Probably lets these companies save money.

Automatic defect detection in factories, instead of training humans to detect subtle manufacturing issues. I think companies like Samsara are experimenting with offering this tech as a service.

Facial recognition/tracking for law enforcement/defense. Ignoring the ethics of it for a moment, it seems like governments would be willing to pay a good amount of money for this tech. Could be used to automatically search through hours of footage to find which frames (if any) contain a target.


Most of these were developed with actual business partners and are being used right now.

Not sure what you're talking about. Neural networks are powering a LOT of the internet.

> Siri/Cortana/Google Home/Alexa, powered by DeepSpeech+language models

> Google Search, powered by BERT

> Tesla, powered by variants of YOLO

> Facial recognition powered by MTCNN+FaceNet

> AirBnB product search+recommendations

> Amazon product recommendations

GANs are a bit artsy, but JFC they're not even a decade old - we've gone from shitty MNIST clones to fully synthetic faces in the span of 5 years!

Predictions: 1. I suspect we're going to see DeepFakes in Hollywood - famous people might license their faces to movies that they might not have the time to star in

2. People are going to start building even more powerful versions of search, like combinations of CLIP

3. Neural networks still aren't optimized for edge devices - we're going to see a deluge of cheap drones with cutting edge computer vision by default

Any online service with considerable content creation and recommendation that you can think of is using ML to improve every aspect of the system (better embeddings, better retrieval, better ranking, better text understanding, better personalization, etc.) And the commercial usefulness is measurable.

Industrial robotics, driver assist, drug discovery, computational photography, speech translation, and so many other examples illustrate a clear commercial applicability of ML methods scaled in the last 10 years specifically.

Do you consider voice assistants, DLSS and protein folding predictors to be non-monetizable art projects?

>If you could name three of them I'd be really grateful.

Latent Dirichlet Allocation, message-passing in Hidden Markov models, and Naive Bayes for spam filtering. Outside my subfield, there's always the basic handwriting recognition employed in ATMs.
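
For anyone who hasn't seen how small the spam-filtering case is, here is a minimal sketch of a multinomial Naive Bayes classifier in plain Python. The toy messages, the function names (`train`, `classify`) and the whitespace tokenizer are all assumptions made up for illustration; a real filter would use far more data and better tokenization.

```python
import math
from collections import Counter, defaultdict


def train(messages):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of messages
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the highest log-posterior score."""
    word_counts, label_counts, vocab = model
    total_messages = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # Log prior: fraction of training messages with this label.
        score = math.log(label_counts[label] / total_messages)
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids log(0) for unseen pairs.
            count = word_counts[label][word]
            score += math.log((count + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy data, purely for illustration.
TOY_DATA = [
    ("win cash prize now", "spam"),
    ("cheap pills win big", "spam"),
    ("meeting moved to monday", "ham"),
    ("lunch on monday then", "ham"),
]
MODEL = train(TOY_DATA)
```

With this toy model, `classify("win a cash prize", MODEL)` comes out as spam and `classify("lunch meeting monday", MODEL)` as ham, which is about all a four-message training set can show.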

Speech-to-text. Every voice assistant, many phone trees, assistive software.

Facial recognition. iPhone face unlock, photo tagging, etc.

Behavior prediction for advertising. Ad quality scores basically.

We do machine learning for large organizations and business is good. We've been doing it for many years. We're even making our platform (https://iko.ai) to do it faster and more consistently as it takes care of all the menial tasks.

These projects have an impact in the real world [predictive maintenance on infrastructure that serves people, for example].

Winter is always coming, the important question is when it will arrive. Current ML research is still destroying new problems with ease so the current velocity is high. It will slow down first, before people start to question why the new crop of problems are too hard, and then the cycle will start again.

I think there's already a good amount of discussion around where current ML methods fall short: stuff like lack of sample efficiency, adjusting for distribution shift, etc.

Just a quick belief of mine (might change tomorrow): ML winter will not come until we hit the flattening of the specialized hardware S-curve. I know people believe that ML can still scale in funding (for even bigger models than GPT-3 with current hardware) but that shouldn’t be our hope. I also cannot imagine that the exponential efficiency increase in architectures with optimized structure can continue for a couple more years (happy to be proven wrong here). Also, the scaling in dataset sizes will ultimately halt (or might have already) for supervised learning, which might give a bump to reinforcement learning, where there is usually unlimited data (due to simulation).

Most commercial use of ML, including neural nets, is small data application-specific business logic. Think phishing / spam / fraud detection, anomaly detection, semantic search, image search, keyword labeling, classifying or segmenting customer data. These applications have well-understood business value propositions. Much, much less often is the ML application focused on truly large data.

I think we might see a winter in super big applications like self-driving cars or voice assistants, but ML in general is just a boring, non-controversial business tool with hundreds of valuable applications.

You’ll still need statistical specialists to train and operate models and ensure systems avoid pitfalls like overfitting, poor convergence, multicollinearity, confounders, etc.

So I doubt this will have much impact on ML job market. Companies that invest in ML will continue to run circles around companies that don’t. You’ll just see the unjustified over-focus on SOTA neural networks die down and become just another boring tool in the toolbox like everything else in ML.

I agree, or at least I hope you are right. I have some doubts about investment into ML without the hype (at least for legacy big corps) but probably their hype-driven ML efforts are/were misguided by poor incentives. But I agree again that companies that invest in ML from a grassroots/first principles way should run circles around the legacy ones. I think the „boring“ part you mention might be what I refer to as ML/AI winter. But then again do you think there is much space left in the ML toolbox apart from neural networks? I think at least for e.g. computer vision we can agree that neural nets ate all the cake, right?

Yeah, if I got a penny every time I heard that one...

Promises made by AI caused more spent pennies during the last 60 years. And people continue to be inspired by such promises.

Python is the new Lisp.

Not sure that it's as much stagnation as commoditization.

Sounds like the field has accelerated itself into an intellectual dead end.

It's not enough to know how to walk. You need to know where you want to go, and figure out whether any path exists in the first place, or you might end up squaring the circle.

Matrix multiplication is nothing to be laughed at. It is how anyone performs linear projection anyway.
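
To make that concrete, here is a small sketch of a projection written as a matrix-vector product, in plain Python. It reads "linear projection" narrowly as an orthogonal projection onto a line, which is an assumption (in ML the phrase often just means any learned linear map); the helper names `matvec` and `projector_onto` are made up for the example.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def projector_onto(direction):
    """Build P = d d^T / (d^T d), the orthogonal projector onto span(d)."""
    norm_sq = sum(d * d for d in direction)
    return [[a * b / norm_sq for b in direction] for a in direction]


P = projector_onto([1.0, 1.0])   # [[0.5, 0.5], [0.5, 0.5]]
x = [3.0, 1.0]
projected = matvec(P, x)         # [2.0, 2.0]
```

Applying `P` a second time leaves `projected` unchanged, which is the defining property of a projection (P·P = P).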

Fair enough to say, currently ML research is fixated on exhausting combinations of blocks to squeeze a marginal improvement out of a few public benchmarks.

But it is like saying software has stagnated because no new ISAs are being created. Building ML applications remains as challenging as ever; it is just that modeling itself has become streamlined before everything else.

Technology Review summary of AI and ML adoption in business. It indicates that, while you leading thinkers have figured it out, business is a long way from being anywhere near peak ML.


This article makes a lot of points, but nothing that implies actual stagnation? I wish the writing was more concise and to the point.

BERT engineer? That acronym is essentially ungooglable, since you just find every person ever named Bert.

This is what I finally found: BERT, which stands for Bidirectional Encoder Representations from Transformers.


Searching for "BERT machine learning" would have worked.

A number of industrial domains have largely benefitted from this tool and outsiders can even set up their own virtual r&d lab at almost no upfront investment these days. As usual, a tool is (still...) only a tool, and it (still, just...) needs domain knowledge to work in full and produce noticeable ROI?

> It’s important to avoid becoming Gary Marcus and criticize existing techniques that work without proposing something else that works even better.

Hilarious. Has Gary Marcus actually done anything, in practical terms, like actual code or something, that outperforms the DL approaches he attacks so viciously?

Marcus argues convincingly and appears less biased than others who benefit from the current high level of investment in DNN.

It seems to be true that no better proposals for solutions have come from his side so far. But I think his criticism per se is valuable, especially his reminder that one cannot simply ignore sixty years of research.

EDIT: and of course he is not the only renowned scientist who calls for reflection; here is a quote from an interview with Judea Pearl: "AI is currently split. First, there are those who are intoxicated by the success of machine learning and deep learning and neural nets. They don’t understand what I’m talking about. They want to continue to fit curves. But when you talk to people who have done any work in AI outside statistical learning, they get it immediately. I have read several papers written in the past two months about the limitations of machine learning." (https://www.theatlantic.com/technology/archive/2018/05/machi...)

Gary Marcus has been proven wrong on claims he's made and is mostly just nay-saying with adhoc reasoning made up to support it.

>appears less biased than others who benefit from the current high level of investment in DNN.

Yes, instead he blatantly tries to benefit from the counter-investment in AI skepticism.

There somehow seems to be an unwritten law of the Python generation like "Thou shalt not criticise Machine Learning". Or is there a better explanation for the emotions that flare up every time someone dampens the exaggerated expectations and reminds us of earlier research in the field of linguistics or AI?

Let's not make this a generational divide thing, shall we? There's people of all ages who seem to think AI research is like a silly ball game, where one cheers their home team's plays and boos their er, whatsitcalled, the other team's plays, no matter the plays.

And who can blame them? AI (read: machine learning) (read: deep learning) research has turned into a huge feeding frenzy. People see all the billions thrown about by Google, Facebook et al. and they go crazy. Maybe they think that if they cheer hard enough and boo hard enough they'll look knowledgeable and "passionate about machine learning" and maybe someone will hire them. Maybe they just want to be on the right side of history, with the winners, not the losers. And when there's so much money to win, there sure are plenty of losers!

A while ago someone posted here an article that advised that to become an expert in machine learning one should (among other things) "flashcard X papers in major sub-fields" or something along those lines. Pretty revealing of what people are thinking of: Google is hiring machine learning specialists. Shut up Gary Marcus, you'll scare the fish off.

There's plenty of good criticism of given products, papers, and approaches. There's also bad criticism though and it's important to distinguish the two.

The discourse often has little to do with science though.

Not your problem. Just let 'em be stuck in their local optimum (probably better this way) :P

> Not your problem

Unfortunately I'm not unaffected.

I keep coming across versions of this comment.

Why does it matter? It’s not like the validity or invalidity of Gary’s criticisms would be different if he had done other totally separate ML research.

I am a DL researcher and practitioner.

Any response to criticism that brushes off the criticism based on the source of the criticism is bad and not in the scientific spirit. Progress comes by asking questions and pointing out flaws. You don't need to have an answer to the question before you ask the question.

Marcus's criticisms are valid. Most DL researchers have a huge blind spot. It is not good for the field.

I can't direct a movie but I can tell that Plan 9 from Outer Space is crap (or so crap it's gold, whatever). The kid who pointed out the emperor had no clothes didn't need to be a master tailor. And Gary Marcus doesn't need to propose alternatives to current deep learning practice in order for his criticism of deep learning to be valid.

... and yet he has: neuro-symbolic integration. That's his suggestion.

And in fact that's a whole field that's been publishing work for a while now. So he's not just making it up.

Are you sure that overall deep learning customers are spending more than traditional ML customers? My guess: it’s not even close. The financial sector is still an order of magnitude larger as a paying customer than those hot CV/NLP companies.

What a nothing burger.

fast.ai link actually points to http://book.realworldhaskell.org/ What gives? Edit: having read to the end, there's a mention of Haskell book, so just an editing error then.

Hasn't this been the state of academia for a long time now? At least for ML.

I liked it until it mentioned Keras and fast.ai. No offense to Keras and fast.ai; they are very good for beginners in ML. But for serious work you need more (try writing an autoregressive generative model, etc.). Here again a bullshit article trending on the HN front page, shame.

lol I am a so called BERT engineer and I can't agree more to this wow

I see a lot of parallels between ML/DL and cryptocurrency research.

At least with machine learning there is some actual substance underneath the hype.

> cryptocurrency research

We're thinking very hard about whether the price of our deep-lolcoin is going to go up enough to buy a new vacation house....

Can you elaborate on that?

I can't speak for them, but I took it this way: there are certain industries that are simply incentivized to produce more of their own industry.

The criticisms of the financial industry where they are not really creating anything of value other than optimizing more value out of existing money.

Adtech industry where a whole generation of technical researchers spent time figuring out how to optimize clicks.

Cryptoeconomy and blockchain where large amounts of money are created out of perceived value and gargantuan efforts are made to build, not solutions to real-world problems (yes I know there are some), but ways to increase the shared, total value of the technology or individual cryptocurrency.

Lots of salty kids in here trying to justify spending 6 years as an anonymous hyperparameter tuning drone in the Stanford AI Lab.

Missing (2020)?

Two months ago?
