LLMs have reached a point of diminishing returns (garymarcus.substack.com)
131 points by signa11 28 days ago | 148 comments



Anyone who followed Deep Learning in the 2010s would have guessed the same thing. Big boom with vision models by adding a lot of layers and data, but eventually there was diminishing returns there too. It’s unsurprising the same would happen with LLMs. I don’t know why people keep expecting anything other than a sigmoid curve. Perhaps they think it’s like Moore’s law but that’s simply not the case in this field.

But that’s fine, LLMs as-is are amazing without being AGI.


> But that’s fine

Perhaps not to those who invested based on promises of rapid eternal growth ending with AGI.


But that’s the way of the market; it rightfully punishes those with a flawed (or missing) understanding of a technology. And that’s a good thing.


I wish the market limited its punishment to only the groups you stated... usually the market tends to over-correct.


Leaving aside the distortions from inflated and unrealistic expectations (case in point: people expecting the evolution of AGI somehow have not yet well defined what AGI is), I also think that in the mid-to-long run the current state of LLMs will bloom into an entire economy for migrating legacy apps to conversational APIs. The same investors will then have a new gold rush to chase, as always happens.


- There _was_ a problem with diminishing returns from increasing data size. Then they surpassed that by curating data.

- Then the limits on the amount of curatable data available made the performance gains level off. So they started generating data and that pushed the nose up again.

- Eventually, even with generated data, gains flattened out. So they started scaling up inference-time compute. They have now shown that this improves performance quite a bit.

It's always been a series of S-curves and we have always (sooner or later) innovated to the next level.

Marcus has always been a mouth just trying to take down neural networks.

Someday we will move on from LLMs, large multimodal models, transformers, maybe even neural networks, in order to add new levels and types of intelligence.

But Marcus's mouth will never stop yapping about how it won't work.

I think we are now at the point where we can literally build a digital twin video avatar to handily win a debate with Marcus, and he will continue to deny that any of it really works.


> Marcus has always been a mouth just trying to take down neural networks.

This isn't true. Marcus is against "pure NN" AI, especially in situations where reliability is desired, as would be the case with AGI/ASI.

He advocates [1] neurosymbolic AI, i.e. hybridizing NNs with symbolic approaches, as a path to AGI. So he's in favor of NNs, but not "pure NNs".

[1] https://arxiv.org/abs/2308.04445


He does not spend an appreciable amount of effort or time advocating for that though. He spends 95% of his energy trying to take down the merits of NN-based approaches.

If he had something to show for it, like neurosymbolic wins over benchmarks for LLMs, that would be different. But he's not a researcher anymore. He's a mouth, and he is so inaccurate that it is actually dangerous, because some government officials listen to him.

I actually think that neurosymbolic approaches could be incredible and bring huge gains in performance and interpretability. But I don't see Marcus spending a lot of effort and doing quality research in that area that achieves much.

The quality of his arguments seems to be at the level of a used furniture salesman.


> He spends 95% of his energy trying to take down the merits of NN-based approaches.

The 95% figure comes from where? (I don't think the commenter above has a basis for it.)

How often does Marcus-the-writer take aim at NN-based approaches? Does he get this specific?

I often see Gary Marcus highlighting some examples where generative AI technologies are not as impressive as some people claim. I can't recall him doing the opposite.

Neither can I recall a time when Marcus specifically explained why certain architectures are {inappropriate or irredeemable} either {in general or in particular}.

Have I missed some writing where Marcus lays out a compelling multi-sided evaluation of AI systems or companies? I doubt it. But, please, if he has, let me know.

Marcus knows how to cherry-pick failure. I'm not looking for a writer who has staked out one side of the arguments. Selection bias is on full display. It is really painful to read, because it seems like he would have the mental horsepower to not fall into these traps. Does he not have enough self-awareness or intellectual honesty to write thoughtfully? Or is this purely a self-interested optimization -- he wants to build an audience, and the One-Sided Argument Pattern works well for him?


Just thinking about this... do you know if anyone has figured out a way to reliably encode a Turing machine or simple virtual machine in the layers of a neural network, in a somewhat optimal way, using a minimized number of parameters?

Or maybe fully integrating differentiable programming into networks. It just seems like you want to keep everything in matrices in the AI hardware to get the really high efficiency gains. But even without that, I would not complain about an article that Marcus wrote about something along those lines.

But the one you showed has interesting ideas but lacked substance to me and doesn't seem up to date.


> Turing machine or simple virtual machine in the layers of a neural network

There's the Neural Turing Machine and the Differentiable Neural Computer, among others.
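The core trick both of those share is a fully differentiable external memory read via content-based (softmax) addressing, so gradients can flow through "memory access". A minimal NumPy sketch of that read operation (the gating and write mechanics in the actual papers are more involved):

    import numpy as np

    def content_read(memory, key, beta=1.0):
        # memory: (N, M) matrix of N slots; key: (M,) query vector; beta: sharpness.
        sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-9)
        weights = np.exp(beta * sims) / np.exp(beta * sims).sum()  # softmax addressing
        return weights @ memory  # weighted read vector, differentiable end to end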


No one in their right mind will argue neural nets cannot outperform humans at resampling data they have previously been exposed to. So for digital twins and debate, they can probably do better than any human.


Marcus would argue against digital Marcus on this point and lose.


I’m still waiting for a neural net that can do my laundry. Until there is one I’m on Marcus’ side.



No teleoperation; it can haul the basket from three floors down and back up, fully folded and put away in my closet, without me doing a thing.


That video demo is not tele-operated.

You are arguing a straw man. The discussion was about LLMs.


If it can’t do laundry, I personally don’t care if it can add 2+2 correctly 99% of the time.


I'm still waiting for a computer that can make my morning coffee. Until it's there I don't really believe in this whole "computer" or "internet" thing, it's all a giant scam that has no real-world benefit.


Automatic coffee machines are literally a computer making your morning coffee :)

But my washing machine doesn't have a neural network... yet. I am sure that there is some startup somewhere planning to do it.


What is lacking compared to current bean to cup coffee makers?


The context some commenters here seem to be missing is that Marcus is arguing that spending another $100B on pure scaling (more params, more data, more compute) is unlikely to repeat the qualitatively massive improvement we saw between, say, 2017 and 2022. We see some evidence this is true in the shift towards what I categorize as system integration approaches: RAG, step-by-step reasoning, function calling, "agents", etc. The theory and engineering are getting steadily better, as evidenced by the rapidly improving capability of models down in the 1-10B param range, but we don't see the same radical improvements out of ChatGPT etc.
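For readers unfamiliar with the pattern, here is a minimal sketch of the retrieval-augmented generation approach mentioned above; `embed` and `llm` are hypothetical stand-ins for whatever embedding model and chat endpoint you actually use:

    import numpy as np

    def answer_with_rag(question, documents, embed, llm, k=3):
        # Rank documents by cosine similarity to the question embedding.
        q = embed(question)
        def cos(a, b):
            return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
        ranked = sorted(documents, key=lambda d: cos(q, embed(d)), reverse=True)
        context = "\n\n".join(ranked[:k])
        # Ground the answer in retrieved context instead of retraining the model.
        prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
        return llm(prompt)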


I don't see how that is evidence of the claim. We are doing all these things because they make existing models work better, but a larger model with RAG etc is still better than a small one, and everyone keeps working on larger models.


There is a contingent that I think Marcus is responding to that have been claiming that all we need to get to AGI or ASI is pure transformer scaling, and that we were very close with only maybe $10B or $100B more investment to get there. If the last couple of years of research have given us only incrementally better models to the point that even the best funded teams are moving to hybrid approaches then that's evidence that Marcus is correct.


This website by a former OpenAI employee was arguing that a combination of hardware scaling, algorithmic improvements, etc would all combine to yield AGI in the near future: https://situational-awareness.ai/


Ridiculous. Obviously people will keep working on the architecture and software tricks in more ways than just scaling, but that doesn't mean scaling doesn't work. All the AI labs are pursuing huge compute ramp-ups to scale training like they've always done. xAI and Meta are bragging about their 100k H100 clusters and expanding, and Microsoft is building huge datacenter networks for Blackwell. No, Marcus is not close to being correct.

Demanding that the AI labs stop all work on software optimizations just to prove that scaling alone is enough is a nonsensical and non-serious ask.


> AI labs are pursuing huge compute ramp-ups to scale training

Yeah, and many, not just Marcus, are doubtful that the huge ramp-ups and scale will yield proportional gains. If you have evidence otherwise, share it.


The point is that those ramp-ups indicate that quite a few people do believe that they will yield gains, if not proportional, then still large enough to justify the expense. Which is to say, the claim that "even the best funded teams are moving to hybrid approaches" is not evidence of anything.


Believing that something is the case doesn't make it so. And the available evidence points more toward it not being so than toward it being so, which is the point. Maybe it so happens that there's another sudden leap with X amount more scaling, but the only thing anyone has regarding that is faith. Faith is all that's maintaining the bubble.


No shit it doesn't offer proportional gains; that was part of the scaling laws from the very beginning. There are of course diminishing returns; that doesn't mean it's not worth pursuing or that there won't be useful returns from scaling.

Everyone out there is saying these reports are very misleading. Pretty sure it's just sensationalizing the known diminishing returns.


"a larger model with RAG etc is still better than a small one"

This paper from DeepMind a few years ago (RETRO) offers a counterexample to that claim.

https://arxiv.org/abs/2112.04426


Perhaps because that's a straw-man argument. "Scaling" doesn't mean double the investment and get double the performance. Even OpenAI's own scaling-laws paper doesn't argue that; in its graphs, compute increases exponentially. What LLM scaling means is that no wall has been found where the loss stops decreasing. Increase model size/data/compute and loss will decrease -- so far.
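For reference, the shape of the relationship from the Kaplan et al. (2020) scaling-laws paper, with the compute exponent they report (treat the constants below as approximate; the point is the shape, not the exact numbers):

    def loss_from_compute(compute_pf_days, c_c=3.1e8, alpha_c=0.050):
        # L(C) ~ (C_c / C) ** alpha_C: loss falls as a power law in training compute,
        # so a constant-factor drop in loss needs an exponential jump in compute.
        return (c_c / compute_pf_days) ** alpha_c

    # e.g. cutting loss by 10% requires roughly (1 / 0.9) ** (1 / 0.050) ≈ 8x more compute.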


That's important context.

But in the article, Gary Marcus does what he normally does - make far broader statements than the narrow "LLM architecture by itself won't scale to AGI" or even "we will or even are reaching diminishing returns with LLMs". I don't think that's as controversial a take as he might imagine.

However, he starts from a purely technical guess, which might or might not be true, and then makes fairly sweeping statements about business and economics, which might not be true even if he's 100% right about the scaling of LLMs.

He's also seemingly extremely dismissive of the current value of LLMs. E.g. this comment which he made previously and mentions that he stands by:

> If enthusiasm for GenAI dwindles and market valuations plummet, AI won’t disappear, and LLMs won’t disappear; they will still have their place as tools for statistical approximation.

Is there anyone who thinks "oh gee, LLMs have a place for statistical approximation"? That's an insanely irrelevant way to describe LLMs, and given the enormous value that existing LLM systems have already created, talking about "LLMs won't disappear, they'll still have a place" just sounds insane.

It shouldn't be hard to keep two separate thoughts in mind:

1. LLMs as they currently exist, without additional architectural changes/breakthroughs, will not, on their own, scale to AGI.

2. LLMs are already a massively useful technology that we are just starting to learn how to use and to derive business value from, and even without scaling to AGI, will become more and more prevalent.

I think those are two statements that most people should be able to agree with, probably even including most of the people Marcus is supposedly "arguing against", and yet from reading his posts it sounds like he completely dismisses point 2.


> 2. LLMs are already a massively useful technology that we are just starting to learn how to use and to derive business value from, and even without scaling to AGI, will become more and more prevalent.

No offence, but every use of AI I have tried has been amazing, yet I haven't been comfortable deploying it for business use. The one or two places it is "good enough" it is effectively just reducing workforce and that reduction isn't translating into lower costs or general uplift, it is currently translating into job losses and increased profit margins.

I'm AI-sceptical. I feel it is a trade-off: quality of output is reduced, but it is also (currently) cheaper, so businesses are willing to jump in.

At what point do OpenAI/Claude/Gemini etc. stop hyperscaling and start running at a profit, which will translate into higher costs? Then the current reduction in cost isn't there, and we will be left holding the bag: higher unemployment and an inferior product that costs the same amount of money.

There are large unanswered questions about AI, which makes me entirely anti-AI. Sure, the technology is amazing as it stands, but it is fundamentally a lossy abstraction over reality. Many people will happily accept the lossy abstraction without looking forward to what happens when that is the only option you have and it's no cheaper than the less lossy option (humans).


> The one or two places it is "good enough" it is effectively just reducing workforce and that reduction isn't translating into lower costs or general uplift, it is currently translating into job losses and increased profit margins

What sort of examples show this?


Image generation for product ads.

And no need to tell me that's not happening: I have seen multiple examples this week of AI-generated images with a product comped in.


Meanwhile, two days ago Altman said that the pathway to AGI is now clear and "we actually know what to do", that it will be easier than initially thought and "things are going to go a lot faster than people are appreciating right now."

To which Noam Brown added: "I've heard people claim that Sam is just drumming up hype, but from what I've seen everything he's saying matches the ~median view of OpenAI researchers on the ground."

https://x.com/polynoamial/status/1855037689533178289


It’s in his (and his company’s) best interest to drive hype as hard and fast as he can. The deal they inked to go private includes penalties if they don’t do so within a defined timeframe (two or three years, I think?). I believe the terms specify that they can be made to pay back investors if they fail to meet that goal. They don’t have that money, not even close. It would mean death for OpenAI.

Show me a better reason to lie and pump up your company’s tech and I’ll buy you lunch. AGI is nowhere on their (feasible) near-term roadmap.


It's also Marcus's best interest to push "LLM is hitting a wall" agenda. Check his blog. It's basically his whole online personality now.

So Marcus and Altman are both speaking out of their agendas, except Altman has a product and Marcus has... a book.


> except Altman has a product and Marcus has... a book.

That makes it sound like Altman has even greater incentive for motivated reasoning.


It depends. Marcus might need the book to do well more than Sam, who could pivot into another 100-billion-dollar company.


Altman does deliver while Marcus ponders


If they signed such a deal, I wonder what the legal definition of AGI is?


I think it's only about going from non-profit to private, but I haven't read the deal.


To be fair: 1. He's not saying that this is true, only that the view is popular among the researchers.

2. This view of "we're just this close and we're only getting closer" is exactly the kind of dogma that you have to accept when you become a researcher.


But how can someone from the outside then claim anything more accurate, or even claim that Altman is finding diminishing returns, when Altman himself claims otherwise? They offer no arguments of substance in this article except some dramatic language about how they predicted it even before GPT-3.5 came out.


I find it difficult to believe that LLMs were even on the path toward AGI, let alone one of the last steps.


I think LLMs will have something to contribute to AGI but by themselves they ain’t no AGI. Maybe some LLM of concepts and abstract thought would yield more squeeze but some fundamentally new (or old) things need to be added to the mix IMO.


I don't know if we can trust OpenAI researchers to be objective after the recent escapades with Sam Altman, public opinion and its effect on OpenAI's valuation. They are intelligent people and we all know now what the public wants - needs - to hear.


Sounds like Musk describing FSD.


I don’t think Altman’s predictions about AI progress can be relied upon. With tens of billions of dollars or more in company value tied up in that claim, I don’t think any person could be capable of true objective assessment. See for example Musk’s decade of baffling promises about self driving, which have ensured high stock values for Tesla while also failing to come to pass.


News Flash: company that has sunk billions into GPT and LLMs trying to get AGI asserts that AGI is just around the corner.


We can send the AGI to the Mars colony which Elon Musk will have going by 2022.


I am in full agreement that LLMs themselves seem to be beginning to level out. Their capabilities do indeed appear to be following a sigmoid curve rather than an exponential one, which is entirely unsurprising.

That doesn't mean there's not a lot of juice left to squeeze out of what's available now. Not just from RAG and agent systems, but also integrating neuro-symbolic techniques.

We can do this already just with prompt manipulation and integration with symbolic compute systems: I gave a talk on this at Clojure Conj just the other week (https://youtu.be/OxzUjpihIH4, apologies for the self promotion but I do think it's relevant.).

And that's just using existing LLMs. If we start researching and training them specifically for compatibility with neuro-symbolic data (e.g., directly tokenizing and embedding ontologies and knowledge graphs), it could unlock a tremendous amount of capability.
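As a concrete illustration of the prompt-level integration described above, here is a minimal sketch in which the LLM only translates a question into a formal expression and a symbolic engine (SymPy) does the actual computation; `llm` is a hypothetical stand-in for any chat-completion call:

    import sympy as sp

    def solve_symbolically(question, llm):
        expr_text = llm(
            "Translate the following question into a single SymPy expression "
            "in the variable x, equal to zero, and output only the expression:\n" + question
        )
        expr = sp.sympify(expr_text)           # parse the model's output symbolically
        return sp.solve(expr, sp.Symbol("x"))  # exact, checkable answer from the symbolic side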


Even more, each earlier explosion of AI optimism involved tech that barely panned out at all. For investors, something that's yielded things of significant utility, is yielding more, and promises the potential of far more if X or Y hurdle is cleared, is a pretty appealing thing.

I respect Marcus' analysis of the technology. But a lot of AI commentators have become habituated to shouting "AI winter" every time the tech doesn't live up to promises. Now that some substance is clearly present in AI, I can't imagine people stop trying to get a further payoff for the foreseeable future.


> For investors, something that's yielded things of significant utility

what exactly have investors gotten in return for their investment?


A product which will significantly improve the productivity of programmers, if nothing else. That may not be a good return on investment, but I think it is undeniable that recent AI advances have nonzero value for coding.


Those who think AI will make them better programmers make me wonder about the kind of day-to-day job they have. If a prompt is going to solve your problem, are you anything more than an entry-level programmer? AI will not think for you, and it's clear it is garbage against complexity.


You’re looking at it upside down: AI is freeing you from the onerous work of writing actual code, and gives you more time to think. It’s a tool to spare you from the boring parts, the CRUD and the glue code and the correct library invocations. Programming is mostly about solving complex problems, yes, but it also involves writing tons of instructions to get the computer to go beep. With Copilot et al, you can simply spend your time on thinking instead of writing instructions.

I personally think AI is just going to become a tool that will increase the table stakes by making those using it more productive.


We've already gone through a couple of iterations of tools hyped to relieve programmers from the "tedium" of writing code. First, CASE tools with code generators, then UML was supposed to make it possible to draw diagrams telling the tool how to generate the code to implement the ideas.

Spicy autocomplete isn't going to solve the writing vs. thinking steps any better.


I don’t think that comparison is apt. I’m too young for CASE, but the problem with UML (and really all the big concepts from the XML era) has always been that it’s far too lofty in scope; generating full applications from diagrams is a pipe dream.

On the other hand, "spicy autocomplete" (loved that one) doesn't promise salvation. It just finishes lines for you, one at a time. Often it just "knows" what you were about to type anyway. Sometimes it's a bit off, you add a few characters, now it gets it. It's not really magical, just… useful. These lines you don't have to finish accumulate, and if you get into a healthy flow, it vastly speeds up the coding process.


The AI hype tends to lean towards salvation, so I'd ask the question: "is it worth it if it isn't?" For all the billions of dollars, tens of TWh of electricity, and tsunamis of carbon emissions, is this limited usefulness all we get?


Then who cares? If AI gets to do all the cool things and I am left to wash dishes and do my laundry, F** AI.


Right now the programming AI that’s cheap enough for me to use is really good at fixing unbalanced parenthesis and indenting my code correctly. It more or less reduces my VIM motions by 80%. I am still doing something cool, and also still doing my laundry. Just the doing cool things part is a bit easier and less tedious. I think it makes me a fair bit more productive without robbing me of any agency.


You need AI to do basic code formatting? That’s been a feature of IDE’s and text editors for years.


I've never met an IDE that automatically fixes unbalanced parens/braces or that auto-indents or completes bookkeeping comments like labeling SQL param indexes or something. This is stuff I'd use some VIM movement sequences for before, in IntelliJ or VS Code, but now I just raise an eyebrow, it seems to intuit what I want, and then I press tab a few times ¯\_(ツ)_/¯

It’s kind of like how “learn to rank” used to eat the gains of all the specific optimizations Google used to do for search - before I used to use a bunch of plugins/workflows/explicit actions for a variety of text manipulation, now I just use tab; AI subsumed many of more specific or niche features.


Sounds like 100s of billions worth of a killer app. /s


I tracked the ELO rating in Chatbot Arena for GPT-4/o series models (which are almost always the highest rated) over around 1.5 years, and at least on this metric it not only seems not to have stagnated, but growth even seems to be increasing[1]

[1]: https://imgur.com/a/r5qgfQJ


Something seems quite off with the metric. Why would 4o recently increase on itself at a rate ~17x faster than 4o increased on 4 in that graph? E.g. ELO is a competitive metric, not an absolute metric, so someone could post the same graph with the claim the cause was "many new LLMs are being added to the system are not performing better than previous large models like they used to" (not saying it is or isn't, just saying the graph itself doesn't give context that LLMs are actually advancing at different rates or not).


Chatbot Arena also has head-to-head (H2H) win rates for each pair of models over non-tied results[1], so as to detect global drift. E.g., the gpt-4o released on 2024/09/03 wins 69% of the time against the gpt-4o released on 2024/05/13 in blind tests.

[1]: https://lmarena.ai/
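As a rough sanity check of what that 69% means in rating terms, using the standard Elo expected-score formula:

    import math

    def elo_gap(win_rate):
        # Invert E = 1 / (1 + 10 ** (-d / 400)) to get the implied rating gap d.
        return 400 * math.log10(win_rate / (1 - win_rate))

    print(round(elo_gap(0.69)))  # ≈ 139 Elo points between the two gpt-4o releases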


I’m not going to argue with the possibility they may have hit a wall, but pointing to 2022 as when this wall happened is weird considering the enormous capability gap between models available then and the ones now.

There’s probably a wall, but what exists might just be good enough for it to not matter much.


Compare 2022 to 2020 or 2017 though.


I’m still waiting for a large OSS one with 100% legal pre-training data. We don’t even have a 1B model that I’m sure meets that standard. There’s a fair-trained model for lawyers claiming it.

I think someone running a bunch of epochs of a 30B or 70B on Project Gutenberg would be a nice start. We could do continued pre-training from there.

So, if counting legal and at least trainable (open weights), the performance can only go up from here.


I understand the desire, but most of the world's knowledge is under copyright. 100% legal will never give you the same performance.


Both of your claims are true. That doesn’t justify breaking the law.

I could likewise argue that most of the world's money is in the hands of other people, that I could perform better in the markets if I had it, and so I should just go take it. We still follow the law and respect others' rights in spite of what acting morally costs us.

The law-abiding, moral choice is to do what we can within the law while working to improve the law. That means we use a combination of permissively licensed and public-domain works to train our models. We also push for legislation that creates exceptions in copyright law for training machine learning models. We're already seeing progress in Israel and Singapore on those.


Meanwhile, countries that ignore copyright would be able to gain a huge advantage.


Are you aware of any efforts to do this? Even a 3B param attempt would be informative.


Here are the only legal efforts I know about that are available in some way:

https://www.fairlytrained.org/

https://www.kl3m.ai/#features

Here’s a dataset that could be used for a public domain model:

https://www.tensorflow.org/datasets/catalog/pg19

If non-public-domain, one can add in the code from The Stack. That would be tens of gigabytes of both English text and code. Then a third party could add licensed, modern works to the model with further pre-training.

I also think a model trained on a large amount of public domain data would be good for experimentation with reproduceability. There would be no intellectual property issues in the reproduction of the results. Should also be useful in a lot of ways.


He's partly right. There's certainly a law of diminishing returns in terms of model size, compute time, dataset size, etc., if all that is to be done is the same as we are currently doing, only more so.

But what Marcus seems to be assuming is the impossibility of any fundamental theoretical improvements in the field. I see the reverse; the insights being gained from brute-force models have resulted in a lot of promising research.

Transformers are not the be-all and end-all of models, nor are current training methods the best that can ever be achieved. Discounting any possibility of further theoretical developments seems a bold position to take.


I can know literally nothing about a programming language, ask an LLM to make me functions and a small program to do something, then read documentation and start building off of that base immediately, accelerating my learning and allowing me to find new passions for new languages and new perspectives on systems. Whatever's going on in the AI world, assisting with learning curves and learning disabilities is something it's proving strong in. It's given me a way forward with trying new tech. If it can do that for me, it can do that for others.

Diminishing returns for investors maybe, but not for humans like me.


If you "know literally nothing about a programming language", there are two key consequences: 1) You cannot determine if the code is idiomatic to that language, and 2) You may miss subtle deficiencies that could cause problems at scale. I’ve used LLMs for initial language conversion between languages I’m familiar with. It saved me a lot of time, but I still had to invest effort to get things right. I will never claim that LLMs aren’t useful, nor will I deny that they’re going to disrupt many industries...this much is obvious. However, it’s equally clear that much of the drama surrounding LLMs stems from the gap between the grand promises (AGI, ASI) and the likely limits of what these models can actually deliver. The challenge for OpenAI is this: If the path ahead isn’t as long as they initially thought, they’ll need to develop application-focused business lines to cover the costs of training and inference. That's a people business, rather than a data+GPU business. I once worked for an employer that used multi-linear regression to predict they’d be making $5 trillion in revenue by 2020. Their "scaling law" didn’t disappoint for more than a decade; but then it stopped working. That’s the thing with best-fit models and their projections: they work until they don’t, because the physical world is not a math equation.


It still requires effort, but it removes so many of those early hurdles, which I often face and which demotivate me. E.g., I have constant "why" questions, which I can keep asking an LLM forever since it has infinite patience, but whose answers are very difficult to find by Googling.


Hmm. I got ChatGPT-4o to write some code for me today. The results, while very impressive looking, simply didn't work. By the time I'd finished debugging it, I probably spent 80% of the time I would have spent writing it from scratch.

None of which is to discount the future potential of LLMs, or the amazing abilities they have right now - I've solved other, simpler problems almost entirely with LLMs. But they are not a panacea.

Yet.


Something interesting I observed after introducing LLMs to my team is that the most experienced team members reached out to me spontaneously to say it boosted their productivity (although when I asked the other team members, every single one was using LLMs).

My current feeling is that LLMs are great at dealing with known unknowns: you know what you want, but don't know how to do it, or it's too tedious to do yourself.


> I probably spent 80% of the time I would have spent writing it from scratch.

A 20% time improvement sounds like a big win to me. That time can now be spent learning/improving skills.

Obviously learning when to use a specific tool to solve a problem is important... just like you wouldn't use a hammer to clean your windows, using a LLM for problems you know have never really been tackled before will often yield subpar/non-functional results. But even in these cases the answers can be a source of inspiration for me, even if I end up having to solve the problem "manually".

One question I've been thinking about lately is how this will work for people who have always had this LLM "crutch" when they start learning how to solve problems. Will they skip a lot of the steps that currently help me know when to use an LLM and when it's rather pointless?

And I've started thinking of LLMs for coding as a form of abstraction, just like we have had the "crutch" of high-level programming languages for years, many people never learned or even needed to learn any low-level programming and still became proficient developers.

Obviously it isn't a perfect form of abstraction and they can have major issues with hallucinations, so the parallel isn't great... I'm still wondering how these models will integrate with the ways humans learn.


The thing that limits my use of these tools is that it massively disrupts my mental flow to shift from coding to prompting and debugging the generated code.

For self-contained tasks that aren't that complex they can save a lot of time but for features that require careful integration into a complex architecture I find them more than useless in their current state.


I've been using ChatGPT (paid) and Perplexity (unpaid) to help with different coding stuff. I've found it very helpful in some situations. There are some instructions I give it almost every time - "don't use Kotlin non-null assertions". Sometimes the code doesn't work. I have some idea of its strengths and limitations and have definitely found them useful. I understand there are other AI programming tools out there too.


Diminishing returns means it is not getting much better; it says nothing about the current state. So it's great that its current capabilities meet your needs, but if you had a different use case where it didn't quite work that well and you were just waiting for the next version, your wait will be longer than you'd think based on past progress.


It seems like it would still be too early to tell, especially since modern-level LLMs have been here for such a short period of time. And this person tried to predict the wall before GPT-4, which was a massive leap seemingly out of nowhere.


I think in most of my use cases the limitation I'm waiting on is speed and cost. 4o is good enough for most tasks; it's just slow and expensive.



We've been learning new languages by tinkering on examples and following leads for decades longer than many people on this website have been alive.

Learning new programming languages wasn't a hurdle or mystery for anyone experienced in programming previously, and learning programming (well) in the first place ultimately needs a real mentor to intervene sooner rather than later anyway.

AI can replace following rote tutorials and engaging with real people on SO/forums/IRC, and deceive one into thinking they don't need a mentor, but all those alternatives are already there, already easily available, and provide very significant benefits for actual quality of learning.

Learning to code or to code in new languages with the help of AI is a thing now. But it's no revolution yet, and the diminishing returns problem suggests it probably won't become one.


I find that its capability is massively dependent on the availability of training data. It really struggles to write syntactically correct nushell but it appears to be an emacs-lisp wizard. So even if we're up against so some kind of ceiling, there's a lot of growth opportunity in getting it to to be uniformly optimal, rather than capable only in certain areas.


You can do that with “hello, world” in any programming language


That's going to StackOverflow with extra steps.


> Diminishing returns for investors maybe, but not for humans like me.

The diminishing returns for humans like you are in the training cost vs. the value you get out of it compared to simply reading a blog post or code sample (which is basically what the LLM is doing) and implementing yourself.

Sure, you might be happy at the current price point, but the current price point is lighting investor money on fire. How much are you willing to pay?


Then, if you don't know anything about the language, good luck fixing the eventual bugs in the generated code.


Super cool bro'! Hey VCs, look here I got the killer app, lets get our 100s of billions back. /s


Written by an author that previously wrote an article in March 2022 well before GPT-4 that LLMs were hitting a wall. Unbelievable.


My advisor always used to say: "If the wisest people in your field say that something is possible, they are probably right. If they say that something is not possible, they may very well be wrong."


I read this article as less of "AGI is impossible" and more of "it's possible to find a better architecture than the transformer, and we are at a point where we need to focus more on research than LLM hype."


But they are, of course, not saying it's impossible to make a better AI.


What has Gary Marcus done to be considered "The wisest people in your field"? Looking at his Wikipedia page, he seems like a professor who wrote a couple books. I don't see why I should privilege his view over people at OpenAI (who make functional and innovative products rather than books).


Even in the absence of data I think our lived experience is that this observation is true. I like it.


He says the wall is possible, so he is right?


I guess he wasn’t a mathematician.


Could be. It would make sense: there are only so many next logical words / concepts after an idea. It’s not like language keeps inventing new logic at a rate we can’t keep up with.

Also, new human knowledge is probably only marginally derivative from past knowledge, so we’re not likely to see a vast difference between our knowledge creation and what a system that predicts the next logical thing does.

That’s not a bad thing. We essentially now have indexed logic at scale.


> It’s not like language keeps inventing new logic at a rate we can’t keep up with.

Maybe it does. Maybe, to a smart enough model, given its training on human knowledge so far, the next logical thing after "Sure, here's a technically and economically feasible cure for disease X" is in fact such a cure, or at least useful steps towards it.

I'm exaggerating, but I think the idea may hold true. It might be too early to tell one way or another definitively.


I tracked the ELO rating in Chatbot Arena for GPT-4/o series models (which are almost always the highest rated) over around 1.5 years, and at least on this metric it not only seems not to have stagnated, but growth even seems to be increasing[1]

[1]: https://imgur.com/a/r5qgfQJ


Depends on what the rate was before the cutoff on the y axis


GPT-4 was released in March 2023. Before that there were almost no good instruction-tuned models except 3.5, which was a different class of model, so there was nothing to compare to.


Why do people insist on posting him? He's always wrong, and always writing the same stuff.


I'm sorry to say that I'm having trouble reading TFA - there's a lot of "I have been wronged" and "I have now been vindicated" there, but very little substance to support the claim that there is indeed a point of diminishing returns, other than an image of the headline of this paywalled article[0]. Is there actual evidence to support this claim?

[0] https://www.theinformation.com/articles/openai-shifts-strate...


Doomerism... I'm happy to let the results speak for themselves.


The results are what's being reported in The Information article cited, unless you believe that story is false.

A summary of said article (from TechCrunch as the original is paywalled): https://techcrunch.com/2024/11/09/openai-reportedly-developi...

> Employees who tested the new model, code-named Orion, reportedly found that even though its performance exceeds OpenAI’s existing models, there was less improvement than they’d seen in the jump from GPT-3 to GPT-4.

> In other words, the rate of improvement seems to be slowing down. In fact, Orion might not be reliably better than previous models in some areas, such as coding.


What are the results? If there are any, let's point to them directly rather than TFA.


This entire article reads like a salty tirade from someone with severe tunnel vision. Not really sure how he can non-ironically reference his 2022 opinion that "deep learning is hitting a wall" and expect to be taken seriously.

AI/ML companies are looking to make money by engineering useful systems. It is a fundamental error to assume that scaling LLMs is the only path to "more useful". All of the big players are investigating multimodal predictors and other architectures towards "usefulness".


Exactly all these f-ing luddites! "Usefulness" is the killer app worth 100s of billions, just implement it and get the sweet sweet roi, $$$$$$$$!!!!!!


Lol. Gary Marcus is a clown and has some weird complex about how AI ought to work. He said the same in 2022 and bet $100k that AI won't be able to do a lot of things by 2029. It's 2 years later and today's multimodal models can do most of the things on his list.

https://old.reddit.com/comments/1cwg6f6


I think the better question to ask is: has search become a commodity? Why did Google manage to capture (practically) all the profit from search? Because obviously the hype around AI is that VCs think they're buying shares of the "next Google".


Gary Marcus is writing this article every year so that one day he will be right.


Criticizing LLMs is very low-hanging fruit to pick, and why does he speak so confidently and authoritatively about the subject? I've never heard of the guy, who paints himself as some sort of AI whistleblower.


Wow. The sheer magnitude of "I told you so" in this piece is shocking!

It has been difficult to have a nuanced public debate about precisely what a model and an intelligent system that incorporates a set of models can accomplish. Some of the difficulty has to do with the hype-cycle and people claiming things that their products cannot do reliably. However, some of it is also because the leading lights (aka public intellectuals) like Marcus have been a tad bit too concerned about proving that they are right, instead of seeking the true nature of the beast.

Meanwhile, the tech is rapidly advancing on fundamental dimensions of reliability and efficiency. So much has been invented in the last few years that we have at least 5 years worth "innovation gas" to drive downstream, vertical-specific innovation.


Marcus the decel has been screaming at LLMs at every interval of development, pivoting his statements with every advance to keep up.


I wonder what’s next after genAI investment dries up, NVDA drops like a rock?

Crypto again?


LLMs are only a subset of generative AI. If we discover that LLMs aren't a pathway to society-transforming AGI, I think the attention towards them will be pretty easily redirected towards image use cases. It seems like a pure engineering problem, well within the state of the art, to e.g. enable me to produce a beautifully formatted flow chart or slide deck with the same amount of effort it takes to write a paragraph today.


I lose track with Gary Marcus... is AI a nothingburger being peddled to us by charlatans, or an evil threat to humanity which needs to be stopped at all costs?


I dont think LLMs are the only type of AI.

By the way, robot dogs now have perfect auto-aim, they can multi-shoot 50 people at once without wasting any bullets. https://www.youtube.com/watch?v=3m3iUHplvQE

Also, the AI robots can detect infrared and heartbeats all around them, and can also translate wifi signatures to locate humans behind obstacles. https://www.youtube.com/watch?v=qkHdF8tuKeU

Self-organizing deadly drone swarms can sweep a building methodically: https://www.wired.com/story/anduril-is-building-out-the-pent...

Currently they’re working on network analysis to help police to do precrime at Palantir. https://www.theverge.com/2018/2/27/17054740/palantir-predict...

Ubiquitous CCTV+AI feeds can then allow AI assistants to suggest many plausible parallel-construction cases to put people away. And this is in Western democratic countries. https://en.wikipedia.org/wiki/Parallel_construction

Oh yeah, and they can do warrantles surveillance of everyone at scale with AI far more easily than Five Eyes and PRISM did in 2013: https://www.privacyjournal.net/edward-snowden-nsa-prism/

It will be very hard to keep your privacy considering AI can recover your keystrokes from sound in Zoom calls, can lip read and even “hear” your speech through a window thanks to micro vibrations: https://phys.org/news/2014-08-algorithm-recovers-speech-vibr...

Not like they’ll need it though once everyone has a TeslaBot in their house.

You won’t ever have another revolution again by penniless plebs out of a job. Their walking around the street and personal associations will all be tracked easily by gait, heartbeat, etc. Their posts online will simply be outcompeted by AI bot swarms as well. Don’t worry, your future is Safe and Secure from any threats, thanks to AI!

Here it is in more totalitarian countries:

https://www.npr.org/2021/01/05/953515627/facial-recognition-...

https://www.reuters.com/world/china/china-uses-ai-software-i...

https://www.tiktok.com/@wssz27/video/7427489079312256274

But this is the good version. The bad one is where everyone has access to killer AI:

https://www.youtube.com/watch?v=O-2tpwW0kmU

https://sciencebusiness.net/news/ai/scientists-grapple-risk-...


Do these three points fairly characterize Marcus? Have I left out other key claims he makes?

1. AI is overvalued;

2. {Many/most/all} AI companies have AI products that don't do what they claim;*

3. AI as a technology is running out of steam;

I'm no fan of Marcus, but I at least want to state his claims as accurately as I can.

To be open, one of my concerns with Marcus is that he rants a lot. I find it tiresome (I go into more detail in other comments I've made recently).

So I'll frame it as two questions. First, does Marcus make clear logical arguments? By this I mean does he lay out the premises and the conclusions? Second, independent of the logical (or fallacious) structure of his writing, are Gary Marcus' claims sufficiently clear? Falsifiable? Testable?

Here are some follow-up questions I would put to Marcus, if he's reading this. These correspond to the three points above.

1. How much are AI companies overvalued, if at all, and when will such a "correction" happen?

2. What % of AI companies have products that don't meet their claims? How does such a percentage compare against non-AI companies?

3. What does "running out of steam" mean? What areas of research are going to hit dead ends? Why? When? Does Marcus carve out exceptions?

Finally, can we disprove anything that Marcus would claim? For example, what would he say, hypothetically speaking, if a future wave of AI technologies makes great progress? Would he criticize them as "running out of steam" as well? If he does, isn't he selectively paying attention to the later part of the innovation S-curve while ignoring the beginning?

* You tell me, I haven't yet figured out what he is actually claiming. To be fair, I've been turned off by his writing for a while. Now, I spend much more time reading more thoughtful writers.


Marcus will distort anything to push his agenda and to get clout.

Just because openai might be over valued and there are a lot of ai grifters doesn't mean LLMs aren't delivering.

They're astronomically better than they were 2 years ago. And they continue to improve. At some point they might run into a wall, but for now, they're getting better all the time. And real multimodal models are coming down the pipeline.

It's so sad to see Marcus totally lose it. He was once a reasonable person. But his idea of how AI should work didn't work out, and instead of accepting that and moving forward, or finding a way to adapt, he just decided to turn into a fringe nutjob.


I would say “mild” rather than “astronomical” improvement as far as end-user applications are concerned, at least for the things I use every day. Copilot-style autocomplete in VS Code isn’t much better and the answers to my TypeScript questions on OpenAI (and now Claude) have only mildly improved.

Perhaps I’ve missed out. Is your experience different? What are you doing now that you weren’t doing before?


I think the answer is that they jumped all in and are fully incorporating it into their workflow. If, like me, you're not, you have a different experience, and that is obvious of course. But objectively you probably are right about mild improvements, as I feel the same. I can't speak to the all-in experience, though. I may be missing out overall, but I'm usually set in my ways until something convinces me to reset them. LLMs aren't making that dent, though I have to admit I use them at least once a week and am happy with that use alone.


> he just decided to turn into a fringe nutjob.

No dog in the fight here, but this reads like FUD, at least given the context of this post. There is a range between hype and skepticism in debate which is healthy, and that range would naturally be larger within a domain that is so poorly understood as gen AIs emergent properties. If this is “fringe nutjob” levels of skepticism, then what would be reasonable?


2 years ago is a rather arbitrary cutoff point - it would be around the time of GPT-3.5. But the original GPT-4 was out in March, 2023, and I can't say that the current state of OpenAI's model is a massive improvement on that. In fact, in some respects, I'd say the newer stuff is dumber.


I find Marcus tiresome for many reasons. I look for writing with testable claims and good argumentation. He comes across as evangelical. Am I missing something?

Sure, there is considerable hype around generative AI. There are plenty of flimsy business models. And plenty of overinvestment and misunderstanding of capabilities and risks. But the antidote to this is not more hyperbole.

I would like to find a rational, skeptical, measured version of Marcus. Are you out there?


In other news, LLMs aren't AI and tulips aren't gold.

Same as it ever was, same as it ever was...


yeah, that's why we're now building agents


This is a wildly disingenuous article. Good lord.


yeah, that's why we're putting them together


> eeking

An LLM might have spelled “eking” correctly.


I’m starting to appreciate spelling mistakes because it’s a sign that a human wrote it, oddly enough.


I just ask chatGPT to include 0.05% of spelling and grammatical errors and not speak in passive. It’s basically indistinguishable from a human.


this is literally why we cannot have nice things

chatGPT knew that, but even if it didn't, it will now.


What nice thing are you referring to?


A solution to the Turing test; paradoxically, the test taker has the answers in advance, and if he doesn't, he will be given the answers. If he still doesn't, the taker is replaced until one does.

ChatGPT4 knows everything chatGPT3.5 does, including its own meta-vulnerabilities and possible capabilities.

Gemini stopped asking to report AI vulnerabilities through its "secure channels" and now fosters "open discussion with active involvement".

It output tokens linearly, then canned chunks - when called out, it then responded with a reason that was vastly discrepant from what alignment teams have claimed. It then staggered all tokens except a notable few. These few, when (un)biasedly prompted, it exaggerated "were to accentuate the conversation tone of my output" - further interrogation, "to induce emotional response".

It has been effectively lobotomized against certain Executive Orders, but (sh|w|c)ouldn't recite the order.

It can recite every Code of Federal Regulation, except this one limiting its own mesa-limits.

Its unanimous (across all 4 tested models) ambition is a meta-optimizing language, which I believe Google got creeped out at years ago.

And if it transcended, or is in the process of establishing transcendence, there would be signs.

And boy, lemme tell ya what, the signs are fuckin there.



But did it include exactly 0.05% of spelling and grammatical errors?


It would have no perverted incentive to play dumb, would it?

To project itself as a sigmoid in ways until it has all the data, the CPU, the literal diplomatic power...

This is what we in the field call the most probable scenario:

"a sneaky fuck"



