LLMs have been improving exponentially for a few years. Let's at least wait until the exponential improvements slow down before making a judgement about their potential.
They have been improving a lot, but that improvement is already plateauing, and the fundamental problems haven't disappeared. AI needs another architectural breakthrough to keep up the pace of advancement.
Based on what? The gap between the releases of GPT-3 and GPT-4 is still much bigger than the time that has elapsed since GPT-4 came out, so really: based on what?
There aren't many reliable benchmarks that would measure how big the gap really is. Right now the labs seem to compete over who can leak the most benchmarks into their training data; hence o1 is a world-class competitive-programming medalist, yet still makes stupid mistakes.
I'm not as up-to-speed on the literature as I used to be (it's gotten a lot harder to keep up), but I certainly haven't heard of any breakthroughs. They tend to be pretty hard to predict and plan for.
I don't think we can continue simply tweaking the transformer architecture to achieve meaningful gains. We will need new architectures, hopefully ones that more closely align with biological intelligence.
In theory, the simplest way to real superhuman AGI would be to start by modeling a real human brain as a physical system at the neural level; a real neural network. What the AI community calls "neural networks" are only very loose approximations of biological neural networks. Real neurons are subject to complex interactions between many different neurotransmitters and neuromodulators, and they grow and shift in ways that look nothing like backpropagation. There already exist decently accurate physical models for single neurons, but accurately modeling even C. elegans (as part of the OpenWorm project) is still a ways off. Modeling a full human brain may not be possible within our lifetime, but I also wouldn't rule that out.
And once we can accurately model a real human brain, we can speed it up and make it bigger and apply evolutionary processes to it much faster than natural evolution. To me, that's still the only plausible path to real AGI, and we're really not even close.
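To make the "physical model of a single neuron" idea concrete, here's a minimal sketch of a leaky integrate-and-fire neuron, one of the simplest physically motivated neuron models (far cruder than the Hodgkin-Huxley-style models used in projects like OpenWorm). All the constants here are illustrative, not fitted to real data:

```python
# Minimal leaky integrate-and-fire (LIF) neuron: membrane voltage leaks
# toward a resting potential, is driven up by input current, and emits a
# spike (then resets) when it crosses a threshold. Toy constants only.

def simulate_lif(input_current, dt=0.1, tau=10.0, v_rest=-65.0,
                 v_reset=-70.0, v_threshold=-50.0, resistance=10.0):
    """Euler-integrate membrane voltage; return spike times in ms."""
    v = v_rest
    spikes = []
    for step, current in enumerate(input_current):
        # Leak toward resting potential plus drive from the input current.
        v += (-(v - v_rest) + resistance * current) * (dt / tau)
        if v >= v_threshold:          # threshold crossed: record a spike
            spikes.append(step * dt)
            v = v_reset               # hard reset after spiking
    return spikes

# A constant 2.0 (arbitrary units) drive for 100 ms of simulated time
# produces a regular spike train.
spike_times = simulate_lif([2.0] * 1000)
print(spike_times)
```

Even this caricature captures the point that a biological neuron is a dynamical system evolving in continuous time, not a single weighted sum followed by an activation function.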
I was holding out hope for Q*, which OAI talked about in hushed tones to make it seem revolutionary and maybe even dangerous, but that ended up being o1. o1 is neat, but it's far from a breakthrough. It's just recycling the same engine behind GPT-4 and making it talk to itself before spitting out its response to your prompt. I'm quite sure they've hit a ceiling and are now using smoke-and-mirrors techniques to keep up the hype and the perceived pace of progress.
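For what it's worth, the "talk to itself" loop is easy to sketch. This is a hypothetical illustration of the general scratchpad pattern, not a claim about o1's actual mechanism; `generate` is a stub standing in for any base-model call:

```python
# Hypothetical chain-of-thought wrapper: re-prompt the same base model with
# its own prior output for a few hidden "thought" steps, then show only the
# final synthesis to the user. `generate` is a stub, not a real LLM call.

def generate(prompt):
    """Stub base model: echoes a canned continuation (placeholder)."""
    return f"[model output for: {prompt[:40]}...]"

def answer_with_scratchpad(question, num_thought_steps=3):
    thoughts = []
    for i in range(num_thought_steps):
        # Each step feeds the question plus all prior thoughts back in.
        context = question + "\n" + "\n".join(thoughts)
        thoughts.append(generate(f"Think step {i + 1} about:\n{context}"))
    # Only this final call's output is shown; the thoughts stay hidden.
    return generate("Given this reasoning, answer:\n" + "\n".join(thoughts))

final = answer_with_scratchpad("Why did o1 score well on math benchmarks?")
print(final)
```

The point being: nothing in this loop requires a new architecture; it's the same engine called several times with different prompts.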
OpenAI's Orion (GPT-5/Next) is partially trained on synthetic data generated with a large version of o1. If that works, the data-scarcity issue is more or less solved.
OpenAI has the biggest appetite for large models. GPT-4 is generally a bit better than Gemini, for example, but that's not because Google can't compete with it. Gemini is orders of magnitude smaller than GPT-4 because if Google were to run a GPT-4-sized model every time somebody searches on Google, they would literally cease to be a profitable company. That's how expensive inference on these ultra-large models is. OpenAI still doesn't really care about burning through hundreds of billions of dollars, but that cannot last forever.
This, I think, is the crux of it. OpenAI is burning money at a furious rate. Perhaps this is due to a classic tech industry hypergrowth strategy, but the challenge with hypergrowth strategies is that they tend to involve skipping over the step where you figure out if the market will tolerate pricing your product appropriately instead of selling it at a loss.
At least for the use cases I've been directly exposed to, I don't think that is the case. They need to stay priced about where they are right now; it wouldn't take much of a rate hike for their end users to largely decide that not using the product makes more financial sense.
They have: Anthropic's Claude 3.5 Sonnet is superior to GPT-4o in every way; it's even better than their new o1 model at most things (coding, writing, etc.).
OpenAI went from GPT-4, which was mind blowing, to 4o, which was okay, to o1 which was basically built in chain-of-thought.
No new Whisper models (granted, advanced voice chat is pretty cool). No new DALL-E models. And nobody is sure what happened to Sora.
OpenAI had a noticeable head start with GPT-2 in 2019. They capitalized on that head start with ChatGPT in late 2022, and relatively speaking they plateaued from that point onwards. They lost that head start 2.5 months later with the announcement of Google Bard, and since then they've been only slightly ahead of the curve.
It's pretty undeniable that OpenAI's lead has diminished greatly since the GPT-3 days. Back then, they could rely on marketing their coherency and the "true power" of larger models. But today we're starting to see 1B models that are indistinguishable from OpenAI's most advanced chain-of-thought models. From a Turing test perspective, I don't think the average person could distinguish between an OpenAI and a Llama 3.2 response.
In some domains (math and code), progress is still very fast. In others it has slowed or arguably stopped.
We see little progress in "soft" skills like creative writing. EQBench is a benchmark that tests LLM ability to write stories, narratives, and poems. The winning models are mostly tiny Gemma finetunes with single-digit-billion parameter counts. Huge foundation models with hundreds of billions of parameters (Claude 3 Opus, Llama 3.1 405B, GPT-4) are nowhere near the top. (Yes, I know Gemma is a pruned Gemini.) Fine-tuning > model size, which implies we don't have a path to "superhuman" creative writing (if that even exists). Unlike model size, fine-tuning can't be scaled indefinitely: once you've squeezed all the juice out of a model, what then?
OpenAI's new o1 model exhibits amazing progress in reasoning, math, and coding. Yet its writing is worse than GPT-4o's (as backed by EQBench and OpenAI's own research).
I'd also mention political persuasion (since people seem concerned about LLM-generated propaganda). In June, some researchers tested LLM ability to change the minds of human subjects on issues like privatization and assisted suicide. Tiny models are unpersuasive, as expected. But once a model is large enough to generate coherent sentences, persuasiveness kinda...stops. All large models are about equally persuasive. No runaway scaling laws are evident here.
This picture is uncertain due to instruction tuning. We don't really know what abilities LLMs "truly" possess, because they've been crippled to act as harmless, helpful chatbots. But we now have an open-source GPT-4-sized pretrained model to play with (Llama-3.1 405B base). People are doing interesting things with it, but it's not setting the world on fire.
It feels ironic if the only thing the current wave of AI enables (other than novelty cases) is a cutdown of software/coding jobs. I don't see it replacing math professionals too soon, for a variety of reasons. From an outsider's perspective on the software industry, it's as if its practitioners voted to make themselves redundant - that seems to be the main takeaway of AI for the normal, non-tech people I've chatted with.
Many people, when I tell them what I do for a living, have anecdotally told me that any other profession would have the common sense/street smarts not to make their scarce skill redundant. It goes further than that: many professions have license requirements, unions, professional bodies, etc. to enforce this scarcity on behalf of their members. After all, a scarce career in most economies brings not just wealth but higher social standing.
If all it does is allow us to churn out more high-level software - which, let's be honest, is demand-inelastic due to the mostly large margins on software products (i.e. they would have paid a person anyway due to ROI) - it doesn't seem it will add much to society other than shifting profit in tech from Labor to Capital/owners. It may replace call centre jobs too, I guess, and some low-level writing/marketing jobs. I haven't seen any real new use cases that change my life positively yet, other than the odd picture/AI app, fake social posts, annoying AI assistants in apps, maybe some teaching resources that would have been made anyway or were easy to acquire by other means, etc. I could easily live without these things.
If this is all AI will do, or mostly do, it seems like a bit of a disappointment. Especially for the massive amount of money going into it.
> many professions have license requirements, unions, professional bodies, etc. to enforce this scarcity on behalf of their members. After all, a scarce career in most economies brings not just wealth but higher social standing.
Well, that's good for them, but bad for humanity in general.
If we had a choice between a system where doctors get high salary and lot of social status, or a system where everyone can get perfect health by using a cheap device, and someone would choose the former, it would make perfect sense to me to call such person evil. The financial needs of doctors should not outweigh the health needs of humanity.
On a smarter planet we would have a nice system to compensate people for losing their privilege, so that they won't oppose progress. For example, every doctor would get a generous unconditional basic income for the rest of their life, and then they would be all replaced by cheap devices that would give us perfect health. Everyone would benefit, no reason to complain.
That's a moral argument, one with a certain ideology that isn't shared by most people, rightly or wrongly - especially if AI only replaces certain industries, which looks like the more likely option. Even then, I don't think it is shared by the people investing in AI unless someone else (i.e. taxpayers) will pay for it. Socialise the losses (loss of income), privatise the profits (efficiency gains). That makes me think the AI proponents are a little hypocritical. Taxpayers may not be able to afford that in many countries; that's reality. For software workers, we should note that it is mostly only the US that has paid well; many software workers worldwide don't have the luxury/pay to afford that altruism. I don't think it's wrong for people who had to skill up to want some compensation for that; there are other moral imperatives that require making a living.
On a nicer planet, sure, we would have a system like that. But most of the planet is not like that - the great advantage of the status quo is that even people who are not naturally altruistic somewhat cooperate with each other due to mutual need. Besides, there are ways to mitigate that and still deliver the required services, especially if they are commonly required. Take the doctors example: certain countries have worked it out without resorting to AI risks. Ironically, I'm not against AI in this case either - there is a massive shortage of doctors' services that can absorb the increased abundance, in my view; most people don't put software in the same category. There are also bad sides to humanity losing our mutual dependence on each other (community, valuing the life of others, etc.) - I think AI sadly allows for many more negatives than simply withholding skills for money if not managed right, and even that doesn't happen everywhere today and is an easier problem to solve. The loss of any safe, intelligent jobs for climbing and evening out social mobility - which rests on the mutual dependence of skills (even the rich can't learn everything and so need to outsource) - is one of them.
> If all it does is allow us to churn out more high-level software - which, let's be honest, is demand-inelastic due to the mostly large margins on software products (i.e. they would have paid a person anyway due to ROI) - it doesn't seem it will add much to society other than shifting profit in tech from Labor to Capital/owners.
If creating software becomes cheaper then that means I can transform all the ideas I’ve had into software cheaply. Currently I simply don’t have enough hours in the day, a couple hours per weekend is not enough to roll out a tech startup.
Imagine all the open source projects that don’t have enough people to work on them. With LLM code generation we could have a huge jump in the quality of our software.
With abundance comes diminishing relative value in the product. In the end that skill and product would be seen as worth less by the market. The value of doing those ideas would drop long term to the point where it still isn't worth doing most of them, at least not for profit.
It may seem this way from an outsiders perspective, but I think the intersection between people who work on the development of state-of-the-art LLMs and people who get replaced is practically zero. Nobody is making themselves redundant, just some people make others redundant (assuming LLMs are even good enough for that, not that I know if they are) for their own gain.
Somewhat true, but again from an outsider's perspective that just shows your industry is divided and will therefore be conquered. I.e. if AI gets good enough to do software and math, I don't see even AI engineers, for example, as anything special.
Many tech people are making themselves redundant - so far mostly not because LLMs are putting them out of jobs, but because everyone decided to jump on the same bandwagon. When yet another AI YC startup surveys their peers about the most pressing AI-related problem to solve, it screams "we have no idea what to do, we just want to ride this hype wave somehow".
>But once a model is large enough to generate coherent sentences, persuasiveness kinda...stops. All large models are about equally persuasive. No runaway scaling laws are evident here.
Isn't that kind of obvious? Even human speakers and writers have problems changing people's minds, let alone reliably.
The ceiling may be low, but there are definitely human writers that are an order of magnitude more effective than the average can-write-coherent-sentences human.
I don’t think you should expect exponential growth toward greater correctness past "good enough" for any given domain of knowledge it is able to mirror. It is reliant on human-generated material, and so rate-limited by the number of humans able to generate the quality increase you need - which decreases in availability as you demand higher quality. I also don’t believe greater correctness for any given thing is an open-ended question that allows for effectively exponential improvements.
Though maybe you are just using "exponential" figuratively, to mean rapid and significant development and investment.