Why LLMs are not and probably will not lead to "AI" (an opinion)
24 points by kylebenzle on Jan 12, 2024 | 39 comments
As someone working in statistics, and peripherally in machine learning, it has been endlessly tiresome to hear LLMs marketed as "AI" to an unsuspecting audience. LLMs are no closer to AI than Alexa was this time last year.

While the capabilities of Large Language Models are impressive, calling them "AI" remains contentious. Here's why some in the technical community, including Sam Altman, have their doubts:

Limited understanding and reasoning: LLMs excel at pattern recognition and statistical analysis, but they lack true understanding of the data they process. They can't reason logically, draw meaningful conclusions, or grasp the nuances of context and intent. This limits their ability to adapt to new situations and solve complex problems beyond the realm of data-driven prediction.

Black box nature: LLMs are trained on massive datasets, and the internal representations they learn are largely opaque even to their creators. This "black box" nature makes it challenging to explain their predictions, debug errors, or ensure unbiased outputs.

Lack of "general intelligence": LLMs currently lack the broad, transferable intelligence that characterizes humans. They excel at specific tasks within their training data, but struggle with novel situations or requiring different skills. An inability to generalize outside their training data restricts their claim to the title of "AI."

Focus on prediction over understanding: LLMs, for all their impressive feats, remain slaves to their training data. They excel at mimicking and recombining existing information, akin to a masterful DJ remixing familiar tracks. They remain powerful tools, like supercharged search engines and spell checkers, but calling them AI risks mistaking virtuosity for originality. LLMs are inherently statistical models, predicting outputs based on past observations, nothing more.

Overestimating progress: The rapid advancements in LLMs can lead to overoptimistic claims about their capabilities. Comparing them to human intelligence is misleading; the underlying mechanisms and levels of understanding differ significantly.



Without a definition of "true understanding" this is meaningless.

Saying they "can't reason logically" is also purely subjective - you out the benchmark much higher than I do, given I've seen ChatGPT provide reasoning at a level many humans fail to reach on a regular basis. This does not mean it's good at it, but that humans reasoning ability is often rather poor.

I see nothing in this other than an argument that they are not good enough now, coupled with unsupported, implied speculation about what understanding, reasoning, and intelligence entail - claims we have little evidence either for or against.

All in all this comes across as blind belief.


> given I've seen ChatGPT provide reasoning at a level many humans fail to reach on a regular basis.

And you don't know for sure whether that was really reasoning or memorization and stochastic parroting. On the other hand, there are many research results demonstrating that neural networks have limitations in learning even something like multiplying numbers with many digits.


I don't know if humans are "really reasoning or memorization and stochastic parroting" either without a testable hypothesis of what makes something "really reasoning" as opposed to the other two.

And most humans struggle with multiplying numbers with even quite few digits.


Okay, I mentioned this in some of my past comments, but try this:

Ask your "AI" something about a non-popular domain (e.g., Clifford algebra) or even ask it to derive something (e.g., the geometric product) given some axioms and definitions.

Mathematics is as precise as it gets, yet it failed to derive the geometric product given a set of axioms (e.g., the distributive property) and definitions. Ask it to derive a theorem from perspective geometry.
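For reference, the derivation being asked for is short. A minimal sketch, assuming the usual contraction axiom v^2 = |v|^2 alongside distributivity (this axiom set is my framing of the exercise, not a quote from any actual prompt):

    % Assumed axioms: distributivity and the contraction axiom v^2 = |v|^2
    \begin{align*}
    (a+b)^2 &= a^2 + ab + ba + b^2                    && \text{distributivity} \\
    (a+b)^2 &= |a+b|^2 = |a|^2 + 2\,a\cdot b + |b|^2  && \text{contraction axiom} \\
    ab + ba &= 2\,a\cdot b                            && \text{equate, cancel } a^2, b^2 \\
    ab &= a\cdot b + a\wedge b                        && \text{with } a\wedge b := \tfrac{1}{2}(ab - ba)
    \end{align*}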

Or try something else that is not so popular or widespread (something outside of math); I suspect it also does a poor job deriving things from first principles (axioms, definitions, theorems).


> And most humans struggle with multiplying numbers with even quite few digits.

Most humans can learn the algorithm and carry out many iterations reliably if you give them pen and paper; current LLMs struggle with this.

So, if we define reasoning as learning and executing an algorithm of complexity N, then humans can learn and execute algorithms for higher N than current LLMs can.
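For concreteness, the algorithm in question is just schoolbook long multiplication. A minimal Python sketch of the pen-and-paper procedure (the function name and digit-list representation are illustrative choices, not anything from the thread):

    def long_multiply(x: int, y: int) -> int:
        """Schoolbook long multiplication of non-negative integers, digit by digit."""
        xs = [int(d) for d in str(x)][::-1]  # least-significant digit first
        ys = [int(d) for d in str(y)][::-1]
        result = [0] * (len(xs) + len(ys))
        for i, a in enumerate(xs):
            carry = 0
            for j, b in enumerate(ys):
                total = result[i + j] + a * b + carry
                result[i + j] = total % 10   # keep one digit in this column
                carry = total // 10          # carry the rest to the next column
            result[i + len(ys)] += carry
        return int("".join(map(str, result[::-1])))

    assert long_multiply(12345, 6789) == 12345 * 6789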


Current LLMs have gotten better at this with each iteration, without the extensive rote repetition of applying the algorithms, followed by additional training on corrections, that we subject people to.

Still, I think you overestimate the average human's ability to reliably execute even algorithms that simple.

I've had LLMs do that correctly up to far larger numbers than I've multiplied by hand. I could do it, but I'm not confident I wouldn't get sloppy and make mistakes without going back and reviewing my work.

I've also had LLMs execute algorithms most people wouldn't even understand. E.g. converting between NFAs and DFAs, or taking a grammar in an unspecified BNF variant and generating grammatically valid sentences from it, or conversely validating a sentence against the grammar of a fictional language.
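For reference, the NFA-to-DFA conversion mentioned here is the standard subset construction. A rough Python sketch (the data representation is my own choice, and this is obviously not the model's output):

    from collections import deque

    def nfa_to_dfa(alphabet, delta, start, accept):
        """Subset construction: each DFA state is the frozenset of NFA states
        the NFA could currently be in. delta maps (state, symbol) -> set of
        successor states; epsilon transitions are omitted for brevity."""
        start_set = frozenset([start])
        dfa_delta, dfa_accept = {}, set()
        queue, seen = deque([start_set]), {start_set}
        while queue:
            current = queue.popleft()
            if current & accept:              # contains an accepting NFA state
                dfa_accept.add(current)
            for sym in alphabet:
                nxt = frozenset(t for s in current for t in delta.get((s, sym), set()))
                dfa_delta[(current, sym)] = nxt
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return start_set, dfa_delta, dfa_accept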

I think a whole lot of the arguments here assume a level of human reasoning well above average.

That's not to say current LLMs don't also have glaring blind spots, but so, it seems to me, do people who dismiss their apparent reasoning out of hand because of assumptions about human abilities we mostly haven't tested.


> I've had LLMs do that correctly up to far larger numbers than I've multiplied by hand.

But given enough time you can multiply any numbers, while an LLM has a fundamental inability to learn the algorithm.

> I've also had LLMs execute algorithms most people wouldn't even understand. E.g. converting between NFAs and DFAs

This is not a good-quality discussion; such claims should be expressed in the form of research results: describing exactly what you did, how the data was obtained and verified for correctness, what the complexity of your NFAs is, etc. Otherwise I personally have a hard time judging how indicative your claim is.


> But given enough time you can multiply any numbers, while an LLM has a fundamental inability to learn the algorithm.

GPT-4 at least knows the algorithms - many alternative versions - and will follow them. What it won't do, just like most humans, is take the time to go over the work and review each step for possible errors without further prompting and inducement.

Yes, humans could. Most humans don't, without outside pressure. With pressure applied, an LLM will too.

Again, this is an example of how people overestimate how humans do under the same conditions.

I would suggest you look up some studies on numeracy - sure, most people can learn if put through enough pressure, and good humans would do well against current LLMs, but average adult human numeracy is shockingly poor.

Consider whether you would question people's ability to reason in general if you found a few examples of humans who do poorly at it.

It is fascinating when people compare LLMs this way against some platonic ideal of a human, without considering that many of the failures of LLMs to stick to the prompt occur in humans too. E.g. when you argue a human could do something with enough time, you leave out the effort you'd need to expend to convince said human to do so and to follow your instructions carefully enough.

Yes, there are still plenty of conditions where humans can beat LLMs given sufficient inducement, and even in many cases easily. But that is a very different subject.

> This is not a good-quality discussion; such claims should be expressed in the form of research results: describing exactly what you did, how the data was obtained and verified for correctness, what the complexity of your NFAs is, etc. Otherwise I personally have a hard time judging how indicative your claim is

I'll happily do that if you do the same for your unsubstantiated claims first.

Otherwise you can simply actually try to ask ChatGPT about these things yourself, and then try to ask a random person about them.


> With pressure applied, an LLM will too.

unless it is fundamentally incapable.


ChatGPT knows how to write Python code, run it, and use the results.

It can also use WolframAlpha. It’s not a pure LLM solution but it’s good enough to multiply numbers.
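For what it's worth, the code-interpreter route makes the multiplication itself trivial, since Python integers are arbitrary precision (a hypothetical snippet, not an actual ChatGPT transcript):

    a = 987654321987654321
    b = 123456789123456789
    print(a * b)  # exact product; Python ints do not overflow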

Have you tried it?


ChatGPT knows how to multiply step by step without either of those, but - like a human, who'll be annoyed and usually sloppy if you ask them to do things step by step instead of taking shortcuts - you increasingly need to push it to get it to do so with care.

The irony of these objections is that people speak from a "but we learned to do that at school" way of thinking, often on the basis of above-average skills, rather than considering that while most people maybe could do it (I'm not convinced), most people won't do it step by step unless you really goad them into it and make sure they don't skip or gloss over steps.


The question is ChatGPT's inability to do logical reasoning. Multiplying numbers is a simple test that demonstrates this.

The same with coding: it can produce code for tasks where something similar was in the training dataset, but if you ask it to solve a completely unknown problem, it will fail.


Yep, or somehow they make it apparent that they indeed "parrot" stuff they had in their large dataset. See my past comments for details. Fooled by randomness? (See Nassim Nicholas Taleb)


Is there a difference between memorization + stochastic parroting and understanding, if the former is indistinguishable from human understanding?

If a hunk of metal looks like a car, sounds like a car, and drives like a car is it a car?


Who knows; but once it can fool me by randomness (as Nassim Taleb suggests), i.e., derive something given a set of axioms and definitions (from a non-widespread domain), then I might join your "ignorance is bliss" argument.

Also, hype is good for the stock market, so keep hyping it up? In the end, Elon Musk's success seems to stem from hype and sensationalism. CNN (i.e., "breaking news" everywhere) has financial success due to sensationalism and hype, so hype is good, I guess?


It is indistinguishable on some trivial results, but that does not scale to the next level of complexity.


I agree with your perspective overall, but I find that when something like this is posted the responses are usually along the lines of "how do we know that statistical prediction on a large enough scale is not akin to understanding?" Which to me is a somewhat lazy retort, but it highlights that one huge issue in this field is not really having definitions, or an understanding of our own, of what we are debating, and that we can't rely on saying "X is not Y" to dispel hype, because at our current level of rigor around definitions someone can just as easily say "nuh uh, X is Y".

For me the biggest indicator that something is still missing, and a reason not mentioned here, is the lack of any killer app besides general chatbots and code completion, even more than a year out from their explosion onto the scene (at which point both of those use cases already existed). It was assumed that LLMs would quickly change everything, but it appears their seemingly minor limitations are pretty fundamental for many use cases.


This isn't an argument about the capabilities of LLMs, it's an argument about whether we use the term AI - and that ship sailed a long long time ago.

If your point is that LLMs won't lead to some kind of extreme AI capability that you're imagining or you've read about in science fiction books, you can just say that instead. But LLMs are already as AI as can be.


I am doomed to repeat myself, but I mentioned it countless times in the past and I also mentioned it here: try asking it to derive something given a set of definitions and axioms from a non-widespread domain (e.g., Clifford algebra, perspective geometry). Then you might see that it parrots what the majority of its dataset says. It is essentially similar to Google, except that you don't mix and match the search results yourself; the LLM, or the "AI", does.


I don't see how that's relevant.


The non-popular or non-widespread aspect of a "prompt" (e.g., "derive the geometric product given axioms/definitions ...") should be understood as a data point suggesting that it might lack reasoning skills, as it seemingly parrots what most of its dataset "says". For more details, check my other past comments. Calling such LLMs "AI" seems to be sensationalism at work.


This is a problem of your expectations. Things don't need to be perfect to qualify as AI.


Hmm... wouldn't you agree that a formal (relatively precise) language such as math is easier to get right than an imprecise human language?


So as understanding of "AI tech" improves, it becomes a tool with a specific label, and the magical AI hype migrates to something more advanced and less understood. However, the utility of stochastic parrots in this case is more general and adaptable to context: you can have it work as a chatbot, a Wikipedia, and a web search, as well as write code/scripts/markup, so there is a semblance of "General Intelligence", and developing this capability improves the result without reaching some dead-end state. Perhaps we overvalue "General Human Intelligence" and think stochastic parrots are somehow below us, when our brains are shaped by similar forces and confabulate bullshit just as well as stochastic parrots do.


Can you elaborate on the distinction you see between prediction and understanding?

I see them both being very very similar, possibly even the same thing. Are you saying a prediction only matters if it was done in a certain way, following a specific thought process?


You might see them both "being very, very similar", but that does not mean they are.

I mentioned it countless times already, but it gets tiresome to repeat myself. Just ask your "AI", capable of "thought", a question (e.g., derive something given a set of axioms and definitions) from a non-widespread field (e.g., perspective geometry, Clifford algebra).

Then you might also arrive at the conclusion that it is a hyped up "Google search results mixer and matcher".


>They can't reason logically, draw meaningful conclusions, or grasp the nuances of context and intent.

Can you come up with some examples that would demonstrate this on ChatGPT (GPT-4)? I'm really curious, as last time I tried that I was not able to come up with good examples.



And if you ask the average person in the street they wouldn't be able to answer any of these either.

I feel like this has been covered 100 times: artificial intelligence doesn't necessarily mean superhuman intelligence.

People answer things wrong all the time. Showing an intelligence gets something wrong doesn't mean it's not an intelligence.


Seems like you are moving the goalposts (i.e., don't expect it to derive a theorem given a set of definitions and axioms from a non-widespread field). It should, in fact, derive any formula given a set of axioms and definitions, but we can't be sure whether randomness has fooled us (again). If an LLM can indeed derive any theorem given axioms/definitions, even if we make up a field, then at that point it might not matter:

Ignorance is bliss.


Are we discussing "intelligence" here, or high IQ? Those problems sit at a high-IQ level for humans and are not a good fit for testing intelligence in general. If one claims that passing those tests is a requirement for intelligence, are they saying that humans with an IQ of 80 are not intelligent?


This post seems to be trapped in the line of thinking of 'to be intelligent it must replicate a human', which really isn't, and hasn't been, the goal of AI for a long time, from what I can tell. Why limit ourselves to doing something we're already good at? Why not build things that are both intelligent and excel at tasks we aren't so good at, or don't like doing?

BIG NOTE: there is no universally agreed-upon and useful definition of intelligence, so discussing intelligence is generally a bad time.


Was this (from the second paragraph onwards) written by GPT-4, by any chance?


Nope.


LLMs are a component of AI. LLMs do not comprise AI. I expect that AGI will require a mix of technologies, much like the human brain system.

Data requiring processing will be transferred between this mix of technologies.


A popular user here on HN also had a similar revelation or sentiment. We will see...


LLM may lead to AI the way language leads to society. It might be the glue that connects more distinct computational frameworks.

And it may still be the interface by which we interact with a fully sentient machine.


OpenAI is trying to put out these views to confuse its opponents. I can clearly feel that the day when mathematics as a profession is disrupted is not far away from us: 5-10 years.


Not much of an opponent if you buy into it? These views come from people who haven't worked with the domain and don't really understand it. The OP states they only work tangentially with ML, sticking more to the domain one might recognize as traditional statistics.



