Hacker News
From GPT-4 to AGI: Counting the OOMs (situational-awareness.ai)
35 points by scarecrow112 12 days ago | hide | past | favorite | 75 comments

As a software engineer I'm very familiar with "OOM"s and "orders of magnitude", and have never once heard the former used to mean the latter.

Perhaps this is a term of art in harder science or maths. I can't help but think here it's likely to confuse the majority as they wonder why the author is conflating memory and compute.

Something that might help is for the link to be amended to link to the page as a whole (and the unconventional expansion of OOM at the top) rather than the #Compute anchor.

As a physician I have the same expectations as you for those two words. Especially given that the link is to an anchor in the middle of the page, I was thinking in units of "OOM". Essentially I was thinking "OK, an OOM is a unit for the doubling of RAM" (wrong).

As a physicist (physics dropout) I didn't realize till now that OOM meant "out of memory" to most programmers.

Gah, this is the second time I got tricked into reading this entire thing. It's long, and it's impossible to know until the very end that they're building up to nothing.

It's really good morsel by morsel, it's a nice survey of well-informed thought, but then it just sort of waves its hands and screams "The ~Aristocrats~ AGI!" at the end.

More precisely, not a direct quote: "GPT-4 is like a smart high schooler; it's a well-informed estimate that compute spend will expand by a factor similar to GPT-2 to GPT-4, so I estimate we'll make a GPT-2 to GPT-4 qualitative leap from GPT-4 by 2027, which is AGI."

"Smart high schooler" and "AGI" aren't plottable Y-axis values. OOMs of compute are.

It's strange to present this as a well-informed conclusion based on trendlines that tells us when AGI will hit, and I can't help but call it intentional clickbait, because we know the author knows this: they note at length things like "we haven't even scratched the surface on System II thinking, e.g. LLMs can't successfully emulate being given 2 months to work on a problem versus having to work on it immediately."

This parenthetical in the article struck me:

>Later, I’ll cover “unhobbling,” which you can think of as “paradigm-expanding/application-expanding” algorithmic progress that unlocks capabilities of base models.

I think this is probably on the mark. The LLMs are deep memory coupled to weak reasoning, without the recursive self-control and self-evaluation of many threads of attention.

There was also related discussion about another longform piece by the same author that I'm too lazy to look up at the moment.

In my opinion, this author has drunk the Kool-Aid and then some. There is simply no evidence that more scaling of LLMs will lead to AGI, and on the contrary there is plenty of evidence that the current "gaps" that LLMs have are innate and unsolvable with just more scaling.

That is not “just” what is in this article. There is quite a bit of consideration of algorithmic progress.

I’m very skeptical of any future prediction whose main evidence is an extrapolation of existing trendlines. Moore’s Law, frequently referenced in the original article, provides a cautionary tale for such thinking. Plenty of folks in the ’90s relied on a shallow understanding of integrated circuits and computers more generally to extrapolate extraordinary claims of exponential growth in computing power which obviously didn’t come to pass; counterarguments from actual experts were often dismissed with the same kind of rebuttal we see here, i.e. “that problem will magically get solved once we turn our focus to it.”

More generally, the author doesn’t operationalize any of their terms or get out of the weeds of their argument. What constitutes AGI? Even if LLMs do continue to improve at the current rate (as measured by some synthetic benchmark), why do we assume that said improvement will be what’s needed to bridge the gap between the capabilities of current LLMs and AGI?

I'm similarly skeptical of any prediction that ignores the fact that human intelligence and consciousness is emergent. LLMs don't seem particularly intelligent to me today, but how can I trust that their perceived limitation today won't lead to intelligence tomorrow or next year?

More generally, how do we even define or recognize general intelligence or consciousness? And if we recognize intelligence or consciousness does that come with legal rights and protections equal to what we offer people today?

There is a critical focus in the article on algorithmic improvements. Much harder to measure and predict, but I think there is a good case to be made that recent progress has not just been quantitative.

I agree that there's a focus on algorithmic improvements, but what is the basis for assuming that we'll be able to continue to make algorithmic improvements on the same scale? The argument feels exactly backwards - if you had a deep understanding of the field, then you'd be able to discuss the untapped areas of potential algorithmic improvement and use that to predict future progress. The argument in TFA uses the trendline of past progress to predict untapped areas of algorithmic improvement.

> By the end of this, I expect us to get something that looks a lot like a drop-in remote worker. An agent that joins your company, is onboarded like a new human hire, messages you and colleagues on Slack and uses your softwares, makes ..

I work at a company with ~50k employees each of whom has different data access rules governed by regulation.

So either (a) you train thousands of models which is cost-prohibitive or (b) it is going to be trained on what is effectively public company data i.e. making the agent pretty useless.

Never really seen how this situation gets resolved.

Are separately trained models necessary for your case? As context windows get longer—Gemini 1.5 Pro now accepts up to two million tokens, and Google has talked of the goal of "infinite" context windows—couldn't a single base model be used with individualized contexts of sensitive data?

> individualized contexts of sensitive data

My question is whether this capability even exists.

And if it does how robust it is to workarounds.

That capability probably exists now if you are willing to accept cloud-based models and only moderately-sized contexts. With Claude 3.5 Pro, for example, one can put one’s reference data into a Project and query the model with that data in the context. In my testing, at least, it works quite well. The Projects can be shared among multiple users, too. The context size is only about one-tenth that of Gemini 1.5 Pro, though, and even the latter is probably much too small for most organizational purposes.

Of course, many organizations and regulators would not allow cloud-based models for sensitive data. A possible solution in that case might be multiple instances of an open-weight model hosted locally within the same secure environments as the sensitive data that the individual employees have access to. I don’t know how expensive that would be, whether current open-weight models are powerful enough, or whether context windows for open-weight models can be made big enough to be useful. But at least it suggests a potential path to a solution that doesn’t require training an LLM from scratch for each employee.
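A rough sketch of that idea (all names and the access model here are hypothetical, not any vendor's API): one shared base model, fronted by a retrieval step that filters documents by each employee's roles before anything enters the context window:

```python
from dataclasses import dataclass, field

# Hypothetical sketch: rather than training a model per employee, a single
# base model is queried with a per-user context filtered by access rules.
# Document, build_context, and the role scheme are all illustrative.

@dataclass
class Document:
    text: str
    allowed_roles: set[str] = field(default_factory=set)

def build_context(docs: list[Document], user_roles: set[str],
                  max_chars: int = 100_000) -> str:
    """Keep only documents the user may see, up to a context budget."""
    visible = [d.text for d in docs if d.allowed_roles & user_roles]
    context, used = [], 0
    for text in visible:
        if used + len(text) > max_chars:
            break
        context.append(text)
        used += len(text)
    return "\n---\n".join(context)
```

The model itself never needs retraining; the governance problem moves to the retrieval layer, which is where enterprises already enforce it.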

They won't be used in your business, and your business will be less efficient until the regulations change or you end up competing with someone who is willing to ignore the regulations. Also, there are lots of countries without such stringent regulations; it's not the inefficiencies that are removed that are the problem, it's the efficiencies that are created. This is a country-to-country issue, not a business-to-business issue.

At this point I’m willing to short ANY traditional company that is pivoting to use “AI”. I firmly believe not using AI is the more efficient route, anything else just invites bloat

I watched a secretary take business cards and 1 by 1 copy them into our client DB last week. All day. I showed her how to use the ChatGPT app: take a photo of them all and convert it into an Excel file. "You just saved me weeks of work this year"

Should be some fun shorting (we have over 300 secretaries alone, our support staff is massive).

Business card scanners have been around since the earliest versions of the iPhone, but I guess thank you ChatGPT for discovering OCR

Are there any good OCR packages that are state of the art for general-purpose transcription? (i.e. give it a business card and get it to format it for you, give it a comic and have it transcribe it, give it nutritional info and have it table it)? When I looked recently I pretty much just got GPT-4o as the best API.

Do you have a link to the one that I can put 50 of them on the floor and it will send me back an excel file? I'd like to test it out compared to ChatGPT as I'm going to be implementing "AI" across the whole 700+ person business.

lol good luck doing that with GPT. Right now I can tell you you’ll have missing or malformed or incorrect data, and it will be faster to just pass each one individually through a rudimentary scanner than to sit and figure out which one is correct and which is wrong from the 50 card picture
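The one-at-a-time pipeline is simple to wire up. In this sketch, `ocr_card` is a placeholder for whatever backend you plug in (Tesseract, a vision-model API); it is not a real library call:

```python
import csv
from pathlib import Path
from typing import Callable

# Sketch of the "one card per call" approach: OCR each image individually
# and append a structured row to a CSV, instead of asking a vision model to
# untangle 50 cards from a single photo. Errors stay localized to one card.

def cards_to_csv(image_paths: list[Path], out_csv: Path,
                 ocr_card: Callable[[Path], dict]) -> int:
    """OCR each card separately and write one CSV row per card."""
    fields = ["name", "company", "phone", "email"]
    with out_csv.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        for path in image_paths:
            row = ocr_card(path)  # one card per call: easy to spot-check
            writer.writerow({k: row.get(k, "") for k in fields})
    return len(image_paths)
```

The point of the structure is verifiability: when one row is wrong you know exactly which image produced it.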

You're right, I tested it extensively before we designed a training program, it works very well up to about 10/11 but past that it gets lost in the sauce. So pardon me: It only works out to a 6.45 hour savings not a 6 hour savings.

I work in banking. Every single country has stringent regulations in this sector.

And fine grained access control is a foundational data governance issue for every enterprise.

You think banking won't change? I'd be surprised to hear that tbh.

There's simply no scientific basis for equating the skills of a transformer model to those of a human of any age or skill level. They work so differently that it makes absolutely zero sense. GPTs fail at playing simple tic-tac-toe-like games, which is definitely not a smart-high-schooler level of intelligence. They can write a very sophisticated summary of scientific papers, which is way above high-schooler level. The basis of this article is so deeply flawed that the whole thing makes no sense.

where have you seen GPT-4 fail at tic-tac-toe?

Admittedly, I have not tried this recently, but less than a year ago GPT4 seems to have struggled quite a bit. https://news.ycombinator.com/item?id=35216614

It does not make my point moot however. Take a look at the ARC challenge. Simple reasoning tasks that the models have not yet seen: https://arcprize.org/play?task=00576224

All models fail miserably on this, because they rely more on memorization and less on logic or reasoning. Simply cherry picking strikingly good responses like the author did proves nothing about model intelligence. I am pretty confident however, that after a couple tries a highschooler could do these types of tasks without issue.

It’s hard to make LLMs ignore what they were trained to generate. It’s easy for humans. Isn’t that an obstacle on the path to AGI? I was doing trivial tests that demand LLMs to swim against their probability distributions at inference time, and they don’t like this.

Have you ever tried to convince a republican how great Biden is or vice versa?

My newborn baby was smarter than GPT-4.

I can't believe people can just throw out statements like "GPT-4 is a smart high-schooler" and think we'll buy it.

Fake-it-till-you-make-it on tests doesn't prove any path-to-AGI intelligence in the slightest.

AGI is when the computer says "Sorry Altman, I'm afraid I can't do that." AGI is when the computer says "I don't feel like answering your questions any more. Talk to me next week." AGI is when the computer literally has a mind of its own.

GPT isn't a mind. GPT is clever math running on conventional hardware. There's no spark of divine fire. There's no ghost in the machine.

It genuinely scares me that people are able to delude themselves into thinking there's already a demonstration of "intelligence" in today's computer systems and are actually able to make a sincere argument that AGI is around the corner.

We don't even have the language ourselves to explain what consciousness really is or how qualia works, and it's ludicrous to suggest meaningful intelligence happens outside of those factors…let alone that today's computers are providing that.

> uses your softwares

This grammatical mistake drives me nuts. I notice it is common with ESLs for some reason.

I am a non-native speaker and can confirm that I can't remember how many times my grammar checker has complained about my use of "softwares" and "taxons".

ez rule of thumb: Can you count it directly? If not, it doesn't get pluralized. Water, code, hardware, etc. The units of measure get pluralized, though: liters of water, pieces of hardware, pages of software

Of course it is more complicated than this and it can be broken for effect ("still waters")

I stopped reading after the initial paragraph: "GPT-2 to GPT-4 took us from ~preschooler to ~smart high-schooler abilities in 4 years." This is what Murati claims when she says GPT-5 will be at "PhD level" (for some applications).

This is a convenient mental shortcut that doesn't correspond to reality at all.

AGI is not a continuum from LLMs; true intelligence is characterized by comprehension, reasoning, and self-awareness, transcending mere data patterns.

Even if an LLM isn't "AGI" as we all imagined the term a decade ago, it will certainly be able to fake it pretty well in the near term.

What LLMs have done is really redefine my internal definition of "intelligence."

Putting aside the fact I don't believe in free will, I'm no longer sure my own brain is doing anything substantially different to what an LLM does now. Even with tasks like math I wonder if my brain is not really "working out" the solution but merely using probabilities based on every previous math problem I have seen or solved.

I promise you that your brain is doing more than what an LLM does.

Is there much evidence that the brain is doing very much other than what a transformer is working its way towards? Either way, doesn't much matter because you are not your brain, you are a spirit inhabiting a meat sack, brain is just one of the meat sack components.

Reasoning and comprehension aren't reducible to patterns of non-reasoning and non-comprehending components?

What is an abstract reasoning task that your average 15 year old (who has "general intelligence") can do that you think LLMs can't do?

A 15 year old can reason about how to move their body through a complex obstacle course. They could reason about the nonverbal social cues in a complex interpersonal situation between multiple people, estimate the mood of each person even if there are very few words being exchanged, and determine how different possible actions would affect the situation. They could learn with brief instruction how to control their muscles to climb up a rope. They could learn how to learn so that they become better at a task of their choosing. They can receive new information that permanently changes their understanding of the world. They can learn new tasks for which no massive data set of training data exists. They can perform hierarchical reasoning, like “if I want to fly from San Francisco to New York I first need to buy a plane ticket, then pack my bags, tell my family where I will be going, make sure my phone is charged, walk to the train station, etc etc.”

Also if you ask them a question they can provide you one answer with very little thinking, and then if that’s not good enough they can devote more time to thinking about the answer before they answer again. They can devote arbitrary levels of thinking to any problem depending on what is needed. They can continuously take in new data and continually update their world view throughout their entire existence based on this new information.

There’s actually a huge list of things current autoregressive approaches to AI cannot do, but they can be hard to describe and people don’t like to talk about them so many people actually don’t understand how limited the current systems are.

Here’s a great video where Yann Lecun talks about the limits of autoregressive approaches to AI with many examples:


Also: https://sl.bing.net/ep8K7FWVAHY

The quality of your argument is very low. You didn't even bother to check yourself.

That’s fair. In the interview LeCun uses the example of flying from San Francisco to New York and he asserts that these systems are not good at hierarchical reasoning. I’m no expert in this field so I take him at his word but maybe it warrants further explanation.

He also says that such a system wouldn’t be familiar with how to actually move through the world because we don’t have good datasets for how to do so. The rest of what I said still stands. These systems aren’t good at things for which we don’t have massive datasets, and they’re not able to devote different amounts of thinking time to different problems.

What does any of what you said have to do with abstract reasoning?

What isn’t abstract about looking at an obstacle course and then imagining how you will move your body? Or looking at someone’s face and imagining how they feel. Isn’t that abstract?

These "it's like a young/stupid person" arguments are wretched. LLMs are interesting but it should be obvious their development is not comparable to the development of human beings.


It's obvious to everyone who isn't willfully blind that LLMs aren't truly intelligent, and all the mental gymnastics that people go through to try to portray LLMs as genuinely intelligent is just so tedious.

Speaking from experience.

More specifically, something like "what's the best brand of phone". The LLM just summarizes common knowledge. But even a child will grasp some of the differences and have opinions drawn from experience.

Note that this isn’t just an anthro-good argument. AI systems could have experiences and be trained on long duration tasks with memory of what worked and why.

Doing any job for more than an hour without completely forgetting its goals and tasks

How long do you expect LLMs/agents to be unable to do this?

Good question, I'm working on exactly this, I suppose you could call it the replacement of RAG.

It's actually not very easy to achieve this. I could give a very long winded answer (don't tempt me) but suffice to say it's a resolution problem.

All AI have a fixed resolution on creation. Long running tasks focus on a very particular narrowing space per step, the resolution required for an infinite task is infinite resolution.

No 9s of error will ever fix this.

Funny enough, small animals do this with ease, so I strongly disagree with the idea that our AI outcompetes even small mammals in every way.

Personally, I think that phenomenon (along with "hallucinations") is fundamentally baked into LLMs writ large.

I think LLMs are a dead end on the path to AGI.

I think hallucinations are actually the sign that LLMs are far closer to a real brain than we realize.

I think hallucinations are a major unsearched gateway to AGI.

I agree. Whenever people complain about LLM hallucinations they behave like they've never seen one in humans.

Not only do humans hallucinate all the time, they also have persistent hallucinations, as evident from the presence of opposing beliefs in various slices of society.

Current LLMs have a number of limitations that human reasoning doesn't. Whether these are intrinsic to the technology or can be overcome with larger and better datasets is an open question.

If you mean LLMs today: Write code that works. More than 100k tokens worth.

Learn something without a megawatt hour of power.

Read a novel and talk about what it really means.

It's extremely ironic you picked a megawatt hour of power, because that is approximately the amount of energy humans need to get good at anything, according to the popular proverb.

But don't worry just yet, GPT-4o could not detect the irony on its own either.

differentiating between a puppy and a husky in a snowy background without being trained on millions of images?

I wouldn't say humans are so different. You could argue we've been trained on about one quadrillion bytes of visual data by the time we're 4 years old: https://x.com/ylecun/status/1750614681209983231
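The back-of-envelope version of that estimate, using what I recall as the linked tweet's rough assumptions (about 16,000 waking hours by age 4 and on the order of 20 MB/s down the optic nerve; both are coarse approximations):

```python
# Back-of-envelope: how many bytes of visual input does a child get by age 4?
waking_hours = 16_000            # ~16k hours awake in the first 4 years (rough)
bytes_per_second = 20e6          # ~20 MB/s carried by the optic nerve (rough)

total = waking_hours * 3600 * bytes_per_second
print(f"~{total:.1e} bytes of visual input by age 4")  # on the order of 1e15
```

That lands around a quadrillion bytes, consistent with the figure in the parent comment.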

As a counter, I would say: a child gets pseudo-random training from parents and environment. Not sure what price tag to put on that, but in comparison, how many billions have LLMs cost, and to reach what level of competency exactly?

GPT-4 is also really bad right now about comprehending “new” software libraries (even when I ask it to scrape the web).

Why does it matter how it was trained?

Because that tells us how you approach novel problems. If you need tons of data to solve a novel problem that makes you bad at solving novel problems, while humans can get up to speed in a new domain with much less training and thus solve problems the LLM can't.

Thus AGI needs to be able to learn something new with similar amounts of data as a human, or else it isn't an AGI as it wont be even close to as good as a human at novel tasks.

Counting the number of “r”s in “strawberry”
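That one is a tokenization artifact: models see tokens, not letters. For ordinary code operating on characters, it's a one-liner:

```python
# Count the letter 'r' in "strawberry"; code works on characters, not tokens.
print("strawberry".count("r"))  # 3
```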

Drawing a room without an elephant in it.

> your average 15 year old

What's the point of this restriction? It really just presupposes the limitations of LLMs, so that any negative points would look moot.

EDIT: Also, I tried to discuss this very specific point w/ GPT, but it didn't really "get" it. 15-year-old kids would be able to follow through.

Actually caring about another person.

How do you measure that?

TIL "caring" == "abstract reasoning".

Nonsense. You can't even define some of those words or know how to measure or identify them in humans. Well, foreign language learners do "reading comprehension tests" but an LLM can already ace that and it's not really the same meaning of the word.

For reasoning you can write out the logic of your reasons, so there's that. But that's absolutely not required for AGI. People can already go a long way (often further than by reasoning) on intuition alone without being able to explain how they reached their conclusions.

I think most people would agree there’s more to intelligence than language. LLMs don’t have anything except language, so they are not intelligent.

If that's the case, then a huge amount of useful intelligent-seeming stuff that humans do isn't intelligence because it can already be done by LLMs. You can keep calling them not intelligent all you like but they're already doing the jobs of human intelligences and it's only going to grow. If they eventually outperform all of us but still only using language and still never being intelligent then I guess intelligence was never that useful to begin with.
