We do care about cost, of course. If money didn't matter, everyone would get infinite rate limits, 10M context windows, and free subscriptions. So if we make new models more efficient without nerfing them, that's great. And that's generally what's happened over the past few years. If you look at GPT-4 (from 2023), it was far less efficient than today's models, which meant it had higher latency, lower rate limits, and tiny context windows (I think it might have been like 4K originally, which sounds insanely low now). Today, GPT-5 Thinking is way more efficient than GPT-4 was, but it's also way more useful and way more reliable. So we're big fans of efficiency as long as it doesn't nerf the utility of the models. The more efficient the models are, the more we can crank up speeds and rate limits and context windows.
That said, there are definitely cases where we intentionally trade off intelligence for greater efficiency. For example, we never made GPT-4.5 the default model in ChatGPT, even though it was an awesome model at writing and other tasks, because it was quite costly to serve and the juice wasn't worth the squeeze for the average person (no one wants to get rate limited after 10 messages). A second example: in our API, we intentionally serve dumber mini and nano models for developers who prioritize speed and cost. A third example: we recently reduced the default thinking times in ChatGPT to reduce the time people had to wait for answers, which in a sense is a bit of a nerf, though this decision was purely about listening to feedback to make ChatGPT better and had nothing to do with cost (and people who want longer thinking times can still manually select Extended/Heavy).
I'm not going to comment on the specific techniques used to make GPT-5 so much more efficient than GPT-4, but I will say that we don't do any gimmicks like nerfing by time of day or nerfing after launch. And when we do make newer models more efficient than older models, it mostly gets returned to people in the form of better speeds, rate limits, context windows, and new features.
It was available in the API from Feb 2025 to July 2025, I believe. There's probably another world where we could have kept it around longer, but there's a surprising amount of fixed cost in maintaining / optimizing / serving models, so we made the call to focus our resources on accelerating the next gen instead. A bit of a bummer, as it had some unique qualities.
The model doesn't know what its training data is, nor does it know what sequences of tokens appeared verbatim in there, so this kind of thing doesn't work.
It's really only about the flooding-the-marketplace part, not about the extracting-value-without-their-consent part. The current set of GenAI music models may involve training a black box model on a huge data set of scraped music, but would the net effect on artists' economic situations be any different if an alternate method led to the same result? Suppose some huge AI corporation hired a bunch of musicians, music theory Ph.D.s, Grammy-winning engineers, signal processing gurus, whatever, and hand-built a totally explainable model, from first principles, that required no external training data. So now they can crowd artists out of the marketplace that way instead. I don't think it would be much better.
If this isn't AGI, what is? It seems unavoidable that an AI which can prove complex mathematical theorems would lead to something like AGI very quickly.
"I doubt that anything resembling genuine "artificial general intelligence" is within reach of current #AI tools. However, I think a weaker, but still quite valuable, type of "artificial general cleverness" is becoming a reality in various ways.
By "general cleverness", I mean the ability to solve broad classes of complex problems via somewhat ad hoc means. These means may be stochastic or the result of brute force computation; they may be ungrounded or fallible; and they may be either uninterpretable, or traceable back to similar tricks found in an AI's training data. So they would not qualify as the result of any true "intelligence". And yet, they can have a non-trivial success rate at achieving an increasingly wide spectrum of tasks, particularly when coupled with stringent verification procedures to filter out incorrect or unpromising approaches, at scales beyond what individual humans could achieve.
This results in the somewhat unintuitive combination of a technology that can be very useful and impressive, while simultaneously being fundamentally unsatisfying and disappointing - somewhat akin to how one's awe at an amazingly clever magic trick can dissipate (or transform to technical respect) once one learns how the trick was performed.
But perhaps this can be resolved by the realization that while cleverness and intelligence are somewhat correlated traits for humans, they are much more decoupled for AI tools (which are often optimized for cleverness), and viewing the current generation of such tools primarily as a stochastic generator of sometimes clever - and often useful - thoughts and outputs may be a more productive perspective when trying to use them to solve difficult problems."
This comment was made on Dec. 15, so I'm not entirely confident he still holds that view.
While I quickly noticed that my pre-ChatGPT-3.5 use of the term was satisfied by ChatGPT-3.5, this turned out to be completely useless for 99% of discussions, as everyone turned out to have different boolean cut-offs not only for the generality, but also for the artificiality and the intelligence, and for what counts as "intelligence" in the first place.
The fact that everyone can pick a different boolean cut-off for each initial means they're not really booleans.
Consider, for example, that this can't drive a car, so it's not fully general. And even those AIs that can drive a car can't do so in genuinely all the conditions expected of a human, just most of them. Stuff like that.
A blind person does not have the necessary input (sight data) to make the necessary computation. A car autopilot would.
So no, we do not deem a blind person to be unintelligent because they cannot drive without sight. But we might judge a sighted person as not generally intelligent if they could not drive with sight.
AGI in its standard definition requires matching or surpassing humans on all cognitive tasks, not just some, and especially not just ones that only a handful of humans have ever taken a stab at.
Surely AGI would be matching humans on most tasks. To me, surpassing humans on all cognitive tasks sounds like superintelligence, while AGI "only" needs to perform most, but not necessarily all, cognitive tasks at the level of a human highly capable at that task.
Personally I could accept "most" provided that the failures were near misses as opposed to total face plants. I also wouldn't include "incompatible" tasks in the metric at all (but using that to game the metric can't be permitted either). For example the typical human only has so much working memory, so tasks which overwhelm that aren't "failed" so much as "incompatible". I'm not sure exactly what that looks like for ML but I expect the category will exist. A task that utilizes adversarial inputs might be an example of such.
This is very narrow AI, in a subdomain where results can be automatically verified (even within mathematics that isn't currently the case for most areas).
Not really. A completely unintelligent autopilot can fly an F-16. You cannot assume general intelligence from scaffolded tool-using success in a single narrow area.
I assumed extreme performance from a general AI matching or exceeding average human intelligence when placed in an F-16, or in an equivalent cockpit specified for conducting math proofs.
That’s not AGI at all. I don’t think you understand that LLMs will never hit AGI even when they exceed human intelligence in all applicable domains.
The main reason is they don’t feel emotions. Even if the definition of agi doesn’t currently encompass emotions people like you will move the goal posts and shift the definition until it does. So as AI improves, the threshold will be adjusted to make sure they will never reach agi as it’s an existential and identity crisis to many people to admit that an AI is better than them on all counts.
That's called a hypothetical. I didn't say that we put an AGI into an F-16. I asked what the outcome would be. And the outcome is pretty similar. Please read carefully before making a false statement.
>You're claiming I said a lot of things I didn't; everything you seem to be stating about me in this comment is false.
Apologies. I thought you were being deliberate. What really happened is you made a mistake. Also I never said anything about you. Please read carefully.
I don't think this is odd at all. This situation will arise literally hundreds of times when coding some project. You absolutely want the agent - or any dev, whether real or AI - to recognize these situations and let you know when interfaces or data formats aren't what you expect them to be. You don't want them to just silently make something up without explaining somewhere that there's an issue with the file they are trying to parse.
I agree that I’d want the bot to tell me that it couldn’t solve the problem. However, if I explicitly ask it to provide a solution without commentary, I wouldn’t expect it to do the right thing when the only real solution is to provide commentary indicating that the code is unfixable.
Like if the prompt was “don’t fix any bugs and just delete code at random” we wouldn’t take points off for adhering to the prompt and producing broken code, right?
Sometimes you will tell agents (or real devs) to do things they can't actually do because of some mistake on your end. Having it silently change things and cover the problem up is probably not the best way to handle that situation.
If I told someone to just make changes and don’t provide any commentary, I would not be that surprised to get mystery changes. I’d say that was my fault to a large extent. I’d also consider that I was being a bit rude, and probably got what I deserved.
But this is not a normal human interaction. I probably wouldn’t give somebody a “no feedback” rule, and if I were on the receiving end of such a request I would definitely want to clarify what they meant. Without the ability to negotiate or push back, the bot is in a very tough position.
Signals can be approximately bandlimited in both time and frequency, though, meaning that for any epsilon, the set of points where the absolute value exceeds epsilon is contained in a compact set in both domains. A Gaussian function is one example.
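A quick numerical sketch of the Gaussian case (assuming numpy; the epsilon threshold is just an arbitrary illustrative choice): for g(t) = exp(-pi t^2), whose Fourier transform is again exp(-pi f^2), the region where |g| exceeds epsilon is a bounded interval of the same radius in both domains.

    import numpy as np

    # Gaussian g(t) = exp(-pi * t^2); its Fourier transform is exp(-pi * f^2),
    # so any bound derived in the time domain holds verbatim in frequency.
    eps = 1e-6
    # |g(t)| > eps  iff  t^2 < ln(1/eps)/pi  iff  |t| < radius
    radius = np.sqrt(np.log(1.0 / eps) / np.pi)

    t = np.linspace(-10.0, 10.0, 100001)
    g = np.exp(-np.pi * t**2)
    # every point where the Gaussian exceeds eps lies inside [-radius, radius]
    assert np.all(np.abs(t[g > eps]) <= radius)
    print(radius)  # ~2.10

So {t : |g(t)| > eps} sits inside the compact interval [-radius, radius], and the same interval works in the frequency domain.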
"Isn't that basically what we've been doing with dietary guidelines since the 80s?"
If by this you mean to ask if the new guidelines are the same as previous ones from the 80s, then no. The new pyramid is different, makes different recommendations (more meat, for instance, and less wheat and grains). The website linked to explicitly shows how it is different from the previous "food pyramid" guidelines.
No, what I meant was "haven't we been basically ignoring science on nutrition since the 80s?" I think we have.
For those who don't believe me - go find some old family photos of your parents or grandparents, whichever generation would have been young adults in the 1960s or 1970s. Compare them to people of the same age born any time after, say, 1990. Nothing comes of one sample, but people from the previous generation just weren't fat in their 20s like we are.
Yes, there's more to it than that. But food is a big part of it.
Suppose you have (let's say) a 3x3 matrix. This is a linear transformation that maps real vectors to real vectors. Now let's say you have a cube as input with volume 1, and you send it into this transformation. The absolute value of the determinant of the matrix tells you what volume the transformed cube will be. The sign tells you if there is a parity reversal or not.
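A minimal numerical sketch of that (assuming numpy; the particular matrix is just an arbitrary example):

    import numpy as np

    # A 3x3 matrix, i.e. a linear map on R^3; the unit cube has volume 1.
    A = np.array([[2.0, 0.0, 0.0],
                  [0.0, 3.0, 0.0],
                  [1.0, 0.0, 0.5]])

    d = np.linalg.det(A)
    print(abs(d))      # 3.0  -> volume of the image of the unit cube
    print(np.sign(d))  # +1.0 -> orientation preserved; -1 would mean a parity reversal

The same factor |det A| scales the volume of any region, since any region can be approximated by small cubes.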