LLMs have limitations, but nobody is being clear about them.

The overall state of LLMs can be distilled into 3 points:

1. LLMs can produce output that is equal in intelligence and creativity to that of humans. They can even produce output that is objectively better than what humans produce. This EVEN applies to novel responses that are completely absent from the training set. This is the main reason there's so much hype around LLMs right now.

2. The main problem is that LLMs can't produce good output consistently. Sometimes the output is better, sometimes it's the same, sometimes it's worse. LLMs sometimes "hallucinate", they are sometimes inconsistent, and they have obvious memory problems. But none of these problems completely precludes an LLM from producing output that is objectively better than or equal to human-level reasoning... it's just not doing this consistently.

3. Nobody fully understands the internal state of LLMs. We can observe inputs and outputs, but the internal process is not completely understood, so we can only make limited statements about how an LLM "thinks". Nobody can state that LLMs obviously have zero understanding of the world, and nobody can state that LLMs are just stochastic parrots, because we don't really know what's going on internally.

We only have outputs from LLMs that are remarkably novel and intelligent, and outputs that are incredibly stupid and inconsistent. The data does not point towards a definitive conclusion; it only points towards possibilities.

There's actually a cargo cult around downplaying AI. There are people who say the AI is clearly a stochastic parrot, and they point to the intention of the algorithm behind the LLM. Yes, at the lowest level the algorithm can be thought of as a next text predictor. But this is just a low-level explanation. It's like saying a computer system is simply a Turing machine executing simplistic instructions from a tape when such instructions can form things like games and 3D simulations of entire open worlds. The high-level characteristics of this AI are something we currently cannot understand. Yes, we built a text predictor, but something unexpected came out as an emergent property, and this emergent property is something we still cannot make a definitive statement about.

What does the future hold? What follows is my personal opinion: I believe we will never be able to make a definitive statement about LLMs or even AGI. We will never fully understand these things; instead, AGI will come about from a series of trials, errors and accidents. What we build will largely come about as an art, through unexpected emergent properties of trying different things.

I believe this for two reasons. The first is philosophical. There's a blurry concept I believe in: a complex intelligence cannot fully comprehend something equal in complexity to itself. We can only partially understand complexity equal to our own by symbolically abstracting parts away, but not everything can be abstracted like this. Sometimes true understanding requires comprehension of the entire complex crystal without abstracting any part of it away. I believe that the concept of "intelligence" is such a crystal, but that's just a guess.

The second reason is scientific. We've had physical instances of complex intelligence right in front of our eyes, that we can touch, manipulate and influence, for decades. The human brain and other animal brains have been studied extensively, and our understanding has consistently remained far from complete. Given this failure to understand the human brain even when it's right in front of us, I'd say we're unlikely to ever completely understand LLMs either.




> It's like saying a computer system is simply a Turing machine executing simplistic instructions from a tape when such instructions can form things like games and 3D simulations of entire open worlds.

That's a bad analogy, none of those things are emergent behavior.

We can debate whether what an LLM does is "emergent", but that's basically a definitional question and isn't very interesting.

In reality, what's most surprising is that so much of what we say is explainable as next token prediction. It's not the other way around: we're showing how predictable we are, rather than how smart the AI is. But it's clear to me that the differences lie in the outlying cases. AI doesn't extrapolate outside its training data, and even if it gets (100 − α)% of its output right, there is always some α that's not in the training data and that differentiates pattern matching or fancy key-value lookup (which is how we know AI works) from whatever intelligence is.
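
To make "next token prediction" concrete, here's a toy bigram sketch in Python (a deliberately crude stand-in of my own; real LLMs are neural networks over learned representations, but the input/output contract is the same: given context, emit a next token):

    # Toy bigram "next token predictor" -- a crude illustration, not a real LLM.
    import random
    from collections import Counter, defaultdict

    corpus = "the cat sat on the mat the cat ate the rat".split()

    follows = defaultdict(Counter)          # token -> counts of what follows it
    for prev, nxt in zip(corpus, corpus[1:]):
        follows[prev][nxt] += 1

    def next_token(prev):
        counts = follows[prev]
        if not counts:                      # dead end: fall back to a uniform draw
            return random.choice(corpus)
        tokens, weights = zip(*counts.items())
        return random.choices(tokens, weights=weights)[0]

    text = ["the"]
    for _ in range(8):                      # generate by repeated prediction
        text.append(next_token(text[-1]))
    print(" ".join(text))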


The analogy is about abstraction, not about emergent properties. A computer program is characterized differently when it's a 3D engine versus a series of instructions.

Same with LLMs. We can characterize an LLM as a text predictor at the lowest level. But when the LLM gives me a novel response and fixes a bug in my code, is text prediction really the only way to characterize that? Obviously there is a higher-level analysis that we cannot fully comprehend yet.

In this case, yes: the 3D engine is not an emergent property while the novel responses of an LLM are. But this dichotomy is irrelevant to the analogy.


> Nobody can state that LLMs obviously have zero understanding of the world, and nobody can state that LLMs are just stochastic parrots, because we don't really know what's going on internally

For such strong statements that they do have an understanding of the world, and are not simply stochastic parrots (arguably the null hypothesis), the burden of proof is on the LLM proponents. Precious little proof has been provided, and stating that nobody knows what goes on inside obviously does not add to that.


> stating that nobody knows what goes on inside obviously does not add to that.

No one is saying that LLMs absolutely understand the world. But many people are saying that an aspect of understanding is a possibility likely enough to warrant further investigation and speculation. When someone says nobody knows what's going on, they are simply acknowledging this possibility.

Not realizing this and even dismissing the possibility of something beyond a stochastic parrot does not add to anything.

What is the burden of proof that you yourself are not a stochastic parrot? It seems we can't tell either; we can only guess from your inputs and outputs. This blurriness in even proving sentience for you makes the output of LLMs that much more interesting. Do you seriously need to assign burden of proof when clearly there is something very compelling going on with the output of LLMs?


Saying 'we don't know how human intelligence works AND we don't know how AI works IMPLIES human intelligence EQUALS AI' is clearly a logical fallacy, sadly one heard far too often on HN, given that people here should know better.


Except this was never said.

What was said is that intelligent output from an LLM implies a "possibility" (keyword) of intelligence.

After all, outputs and inputs are all that we use to assume you as a human are intelligent. As of this moment we have no other way of judging whether something is intelligent or not.

You should read more carefully.


> What was said is that intelligent output from an LLM implies a "possibility" (keyword) of intelligence.

No it doesn't, because you can break down how they "learn" and generate output from their models, and thought or intelligence doesn't occur at any step of it.

It's like the first chess computer, which was actually a small guy hiding under the table. If you just show that to someone who treats it as a black box, sure, you might wonder if this machine understands chess. But if you put a little guy in there, you know for a fact that it doesn't.


No, you can't break it down. Experts don't fully understand the high-level behavior of an LLM. This is definitive: we have no theoretical model of what an LLM will output. We can't predict it at all, and therefore we do not fully understand LLMs at a high level.


'Possibility' - thus as per my original point, the burden of proof is on the proponents.

'outputs and inputs' - that is reduction almost to absurdity; clearly human intelligence is rather more than that. Again, we come back to 'we don't understand human intelligence, therefore something else we don't understand but seems to mimic humans under certain conditions is also intelligent'.


The only thing absurd is your argument. Short of mind reading, inputs and outputs are the only things we have to determine what is intelligent. Go ahead, prove to me you are an intelligent being without emitting any output, and I'll 100 percent flip my stance and believe you.

That is the whole point of the Turing test. Turing developed it precisely because we can't know what is intelligent through telepathy. We can only compare outputs and inputs.

> thus as per my original point, the burden of proof is on the proponents

There are no proponents claiming that the intelligence is definitely real. There are only proponents saying it is possibly real.

Burdens are human constructs assigned to random people for no apparent reason. If it talks like a human then the possibility becomes open by common sense; "burden of proof" is just a random tag you are using here.

But again, no one is making a claim that LLMs are conscious. You, however, seem to be making a claim that they aren't. You made a claim, great: looks like it's your burden now. Or perhaps this burden thing is just stupid and we should all use common sense to investigate what's going on, rather than making baseless claims and then throwing burdens on everyone else.


I think the Turing Test has a lot to answer for in the current fandango. It (and your input/output argument) boils down to 'if it can't be measured it cannot exist', which does not hold up to philosophical scrutiny.

Burden of proof is a well-established legal and scientific concept that puts the onus on one side of a debate to show they are right; if they are unable to prove it, the other side is automatically given the 'judgement'. For example, if someone claimed there was life on the Moon, it would be on them to prove it, otherwise the opposite would quite rightly be assumed (after all, the Moon is an apparently lifeless place). Another example: a new drug has to be proven safe and effective before it can be rolled out, instead of others having to prove it is NOT safe and effective to STOP the rollout.


Nobody said if it can't be measured it doesn't exist. Nothing of this nature was said or implied.

What I do believe is that if it can't be measured then its existence is only worthwhile and relevant to you. It is not worthwhile to talk about unmeasurable things in a rigorous way. We can talk about unmeasurable things hypothetically, but topics like whether something is intelligent or not, where we need definitive information one way or the other, require measurements and communication in a shared reality that is interpretable by all parties.

If you want to make a claim outside of our shared reality then sure, be my guest. Let's talk about religion and mythology and all that stuff, it's fine. However...

There's a hard demarcation between this stuff and science and a reason why people on HN tend to stick with science before jumping off the deep end into philosophy or religion.

My point on burden of proof was lost on you. Who the burden is placed on is irrelevant to the situation. Imagine we see a house explode and I thus make a claim that because I saw a house explode an actual house must have exploded. Then you suddenly conveniently declare that if I made the claim the burden is on me to prove it. What? Do you see the absurdity there?

We see AI imitating humans pretty well. I make a soft claim that maybe the AI is intelligent and suddenly some guy is like the burden of proof is on you to prove that AI is intelligent!

Bro, let's be real. First, no definitive claim was made; second, it's reasonable speculation regardless of burdens. The burden of proof exists in medicine to prevent harmful distribution and save lives; people do not use the burden of proof to prevent reasonable speculation.


>> What is the burden of proof that you yourself are not a stochastic parrot?

Because the person you're talking to is a human?


Am I? How do you know this isn't output generated by an LLM?


Well, you tell me: was it?

I assume we're having a good faith conversation?


We are. But the point is you can't tell. You are entirely relying on my output to make an identification.


Really? I thought I was relying on the intuition that most comments on this site are unlikely to be generated by an LLM.

Also, I thought your point was "What is the burden of proof that you yourself are not a stochastic parrot?".


[flagged]


>> Go use that on your philosophy friends

Don't be an asshole.


Having read your comment again, I think the key word here is 'speculation', in all its (in)glorious forms.


There's a difference between wild speculation and reasonable speculation with high likelihood.

For example, I speculate you are male, and it's highly likely I'm right. The speculation I'm doing here is of the same nature as the speculation about intelligence.

The angle you're coming at it from is that any opinion other than 'LLMs are stochastic parrots' is completely wild speculation. The irony is that you're doing this without realizing your position is in itself speculation.


What do you mean by the "stochastic parrots" (null) hypothesis in this case? Cards on the table, I think by any reasonable interpretation it's either uninformative or pretty conclusively refuted, but I'm curious what your version is.


I mean that it simply surfaces patterns in the training data.

So responses will be an 'aggregation' (obviously more complex than that) of similar prompt/responses from the training corpus, with some randomness thrown in to make things more interesting.


"Surfaces patterns in the training data" seems not to pin things down very much. You could describe "doing math" as a pattern in the training data, or really anything a human might learn from reading the same text. I suspect you mean simpler patterns than that, but I'm not sure how simple you're imagining.

A useful rule of thumb, I think, is that if you're trying to describe what LLMs can do, and what you're saying is something that a Markov chain from 2003 could also do, you're missing something. In that vein, I think talking about building from a "similar prompt/response from the training corpus", though you allow "complex" aggregation, can be pretty misleading in terms of LLM capabilities. For example, you can ask a model to write code, run the code and give the model an error message, and the model will quite often be able to identify and correct its mistake (true for GPT-4 and Claude at least). Sure, maybe both the original broken solution and the fixed one were in the training corpus (or something similar enough was), but it's not randomness taking us from one to the other.
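
For concreteness, the write/run/repair loop I'm describing is roughly the sketch below; `ask_llm` is a hypothetical placeholder for whichever chat API you use, not a real library call:

    # Sketch of the write/run/repair loop described above.
    import subprocess
    import tempfile

    def ask_llm(prompt):
        # Hypothetical stand-in: wire this to a real chat-completion API.
        raise NotImplementedError

    def run_python(code):
        # Run the candidate code; return stderr ("" means it ran cleanly).
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
        result = subprocess.run(["python", f.name], capture_output=True, text=True)
        return result.stderr

    code = ask_llm("Write a Python script that ...")
    for _ in range(3):                      # allow a few repair rounds
        error = run_python(code)
        if not error:
            break
        code = ask_llm(f"This code:\n{code}\nfailed with:\n{error}\nPlease fix it.")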


There is a big difference between 'doing math' by repeating/elaborating on previously seen patterns, and by having an intuitive grasp of what is going on 'under the hood'. Of course our desktop calculators work (very well) on the latter principle.

As you say, both the broken and correct solutions were likely in the training corpus (and indeed the error message), so really we are doing a smoke and mirrors performance to make it look like the correct solution was 'thought out' in some sense.


I think dismissing problem-solving as "smoke and mirrors" based on regurgitating training data will give you a poor predictive model for what else models can do. For example, do you think that if you change the variable names to something statistically likely to be unique in human history, the ability will break?

As for pattern recognition vs intuitive grasp: I don't think I follow. I would call pattern recognition part of intuition, unlike logically calculating out the consequences of a model; on the other hand, I would not say that a desktop calculator "grasps" anything. It is not able on its own to apply its calculating ability to real-world instantiations of mathematical problems in the way that humans (and sometimes LLMs) can.


> There's actually a cargo cult around downplaying AI.

> The high-level characteristics of this AI are something we currently cannot understand.

The lack of objectivity, creativity, imagination, and outright denial you see on HN around this topic is staggering.


Yeah, I also remember hyperloop.


> LLMs can produce output that is equal in intelligence and creativity to that of humans

You could have said the same about photocopiers, when they were just as novel - where you'd never seen the "creation"'s original.

Generative "AI" became impressive only at the point its input funnel gained access to more original material than any individual impressee could truly comprehend.

Cargo cult, yes.


If anything's for sure it's that if anyone says they know exactly what a large neural net is doing and why, they're most likely full of shit.


> LLMs can produce output that is equal in intelligence and creativity to that of humans.

"Intelligence" is begging the question when the whole debate is about whether they are intelligent, and what intelligence even is.

"Creativity" however... consider my eyebrows raised. Where do you see LLMs surpassing humans in creativity?

> They can even produce output that is objectively better than what humans produce.

Examples?


>"Intelligence" is begging the question when the whole debate is about whether they are intelligent, and what intelligence even is.

Intelligent as in: the output response is often indistinguishable from human-level reasoning and/or creativity. Sometimes superior.

>"Creativity" however... consider my eyebrows raised. Where do you see LLMs surpassing humans in creativity?

>Examples?

Query:

   Compose a poem about a half dragon half hippo on a quest to defeat a speck of dirt. The speck of dirt has armies of atomic bombs on legs defending him. Make the poem epic, heroic and eloquent.
Response:

   In realms where imagination's fervor flows,
   There lived a creature of unusual pose,
   Half dragon, half hippo, in heroic repose,
   Embarking on a quest that nobody chose.

   A speck of dirt, mundane, minute,
   Guarded by atoms, in explosive pursuit.
   Their deadly legs, marching, resolute,
   The specter of destruction, absolute.

   Yet undeterred, our hero strode,
   Down the path that destiny showed.
   With dragon's fire and hippo's load,
   To face the atomic, legged abode.

   Mountains crumbled, rivers fled,
   The sky above turned fiery red.
   Yet the half-dragon, half-hippo, met the dread,
   His heart on the quest, his mind on the thread.

   "Begone!" he roared with a dragon's ire,
   Faced the atomic legions, an unholy choir.
   Against the speck, he breathed his fire,
   In the face of oblivion, he did not tire.

   He summoned his strength from the hippo's well,
   Against the explosive onslaught, he fell.
   But in his heart, a courage swell,
   Against the speck of dirt, he'd rebel.

   With one mighty heave, the ground did shake,
   The atomic army faltered, began to break.
   He plunged into the heart of the quake,
   For his home, for his world, for all of their sake.

   The speck of dirt, it's reign thus end,
   No longer a threat, no longer a fiend.
   Victorious stood our hippo-dragon friend,
   His quest completed, his land to mend.

   In tales of valor, they'll speak his name,
   The dragon-hippo who tamed the atomic flame.
   In epic sagas, his heroic claim,
   Against a speck of dirt, he won the game.
Illustration:

https://imgur.com/a/AOHlBkf

(you will get a cautionary warning about NSFW but don't worry... the picture is SFW)

I mean you could argue that this entire response is just stochastic parroting. But my point is you can't say anything either way. We don't know how these LLMs came up with the poem or the illustration. But one thing we do know is that none of what I posted here is a copy of anything that exists.

Is it objectively better than or equal to what humans can produce? I don't know. You can try to prove me wrong: write a better poem and draw a better picture in less time.


I'm sorry, I didn't ask you for a poem-like text generator.

Your claim was:

> LLMs can produce output that is equal in intelligence and creativity to that of humans. They can even produce output that is objectively better than what humans produce.

I don't see this poem about half-dragon / half-hippos as particularly creative, but I'll preempt the "my opinion vs your opinion" with this: it definitely does NOT surpass what humans can come up with. Human poems are unarguably better.

And this word salad of a poem definitely fed from human creations and is derivative of them.

I didn't ask whether LLMs could create poem-like texts.


You asked for examples where it could do better than you, and you stated it couldn't be creative. I gave you an example, in both text form and picture form, where it is creative and does better than you.

First, this proves it can do better than you. The word salad is likely better than anything you can come up with. Again, feel free to prove me wrong here by doing better: draw me a better illustration and write me a better poem. These are your initial points. Stick to the point and prove me wrong. Do not deviate.

Second, there is no denying this is creative. Both the picture and the text are the definition of creative. Whether it's a poem or not is beside the point. Whether it's "particularly creative" or not is also beside the point. The picture and the text prove your initial points wrong. I will be sticking to this point until you prove otherwise. Until then I request you do not deviate the conversation to alternative points.


> You asked for examples where it could do better than you

No. I suggest you read again. Or is that "you" a collective for "humankind"?

> First, this proves it can do better than you.

No. You are misusing the word "proof" in a dishonest way.

> The word salad is likely better than anything you can come up with.

Feeling combative, are we? You know nothing about me. I don't feel compelled to write anything for your amusement; I suppose that makes me different from a LLM-powered chatbot.

> The picture and the text prove your initial points wrong. I will be sticking to this point until you prove otherwise. Until then I request you do not deviate the conversation to alternative points.

I feel no obligation to follow your whims, unlike a chatbot. The text and picture prove nothing of the sort. Besides, I didn't claim I was a particularly good writer, let alone a good poem writer (I didn't claim the contrary; I made no claims at all).

I didn't claim there is no creativity with LLMs. I claimed it's barely equal to and certainly doesn't surpass human creativity.

PS: I am very skilled at drawing (in a different style than the example) and I can easily surpass it in my preferred style. I don't find the illustration you showed very good, either.


Not being combative; you are mistaken. I am simply trying to keep the conversation on point and prevent deviation. You made initial points; I want those points determined to be definitively right or wrong before moving on and branching off into deviations.

By "you" I mean the average human. The common human. It can surpass you as an average human and thus it can surpass the common human aka most humans. I don't know you but I made an assumption that you are average.

If you are good at drawing, that doesn't mean you can do better. When I compare the art from LLMs to that of other artists, it is in general equal. In this case it matches you in your preferred style, but it likely beats you in photorealistic styles. I know artists often use simplistic styles to make things easier. Is this the case for you? I wouldn't know. But looking at other artists, I find it very likely it matches you in skill.

The claim I made is that an LLM can surpass and match humans. I did not claim that it does this consistently. I believe the poem and the picture prove this, as everyone in this thread is unlikely to provide any proof to the contrary.

Maybe you can do slightly better for the illustration. But slow speed prevents you from proving this.


"You" the average human, but then you challenge me to provide something better? Weird.

I wouldn't write something as bad as this poem, and I'm not even a poet!

No, my art style is not "simple", but it's not photorealistic either (the style you showed isn't photorealistic either, mind you).

Without taking away from how the current AI image generators work, which is impressive, I find good human artists are better. And the AI is taking from them, anyway. It's one thing to say "draw like van Gogh", and another thing entirely to be van Gogh for the first time.

Comparing an algorithm to "average people" makes no sense. Some people are not creative at all, so maybe a clever chimp is more creative! A vector-graphics game from the 80s-90s is better than most people at drawing vector art, so what? This is not how meaningful creativity comparisons work.

Creativity is not measured in speed either. If this is the metric you're using, I can see the source of our disagreement.


Yeah, why not challenge you? I assume you're average. That's not weird at all.

If you wouldn't write something as bad as the poem then write something better.

LLMs are taking away from artists simply because in the eyes of consumers they are roughly equivalent if not better. Who's to say your judgement is better than the judgement of consumers of art?

Why not compare algorithms to the average person? It's certainly better than comparing to some off-the-charts anomaly of a person. What you're not seeing is that an LLM beating average people is already proof it's creative. But then again, LLM art surpasses even those who are above average.

Creativity is not measured in speed; this I agree with. But that was not my point. My point was that speed allows LLMs to supply me with an endless array of proof and examples, while slow speed prevents you from providing anything. It's your word against actual example outputs created by ChatGPT or Stable Diffusion.


> Yeah, why not challenge you? I assume you're average. That's not weird at all. If you wouldn't write something as bad as the poem then write something better.

Because, like I explained, I'm not at your beck and call. I'm not ChatGPT; you cannot order me to do things for your amusement.

> LLMs are taking away from artists simply because in the eyes of consumers they are roughly equivalent if not better.

You are making a wildly unsupported claim ("equivalent if not better"). Also, people who enjoy art are not "consumers" nor is art a "product". Your mindset is all wrong about this, which might explain why you're so easily satisfied with AI art.

> Why not compare algorithms to the average person?

Because a completely dumb algorithm that takes paragraphs from random texts in Project Gutenberg, without paying much attention to fine coherence, is already producing something "better" than the average person. Yet nobody, not even you, would call it a breakthrough in either AI or creativity.

This is not how meaningful discussion about creativity will happen.

By the way, the onus is on you. You made an extraordinary claim, it's on you to provide a convincing example. I don't have to "provide" anything (yet).


> Guarded by atoms

atoms, not atom bombs.

> his mind on the thread

What is that?

All in all I found the poem to be really bad. "He won the game" is not something you'd hear in an epic; it generally seems to go by the gamer definition of "epic", which is just calling something epic because you can't be bothered to examine or describe it. It reminds me of Edgar Allan Poe and his "draw the rest of the owl" style. "It was so foreboding and beyond human imagination". Show, don't tell.

It breathed fire, it was so heroic and resolute, and a lot of other adjectives just float about; there is no fight at all - that is all skipped - and the army faltered (because fire was breathed on atomic bombs? okay?)... it's just a bunch of filler text with no substance. I can't imagine any sequence of events based on this.

And one of the images shows several people riding on the hippo, with another hippo in the background, totally failing the assignment. None of them shows atomic bombs on legs, and none even attempts to depict the speck of dirt.


Bad poem, but creative. It took some creative liberties which you did not like. The LLM also took creative liberties with the picture, similar to a human. I guess if a human drew extra people in some mock-up I would automatically assume that human is a robot. Makes sense? No.

As for the speck of dirt: it's there, it's just too small for you to see.

I guess you not liking the poem is now the demarcation for intelligence? Come on, man. This poem is better than anything you can come up with, and it's creative.

Hmm, as for the nukes: that is your most legitimate claim. It definitively failed in that respect. But I would hardly call that a clear sign that it's not intelligent. It's more a clear sign that the LLM is not understood. We don't know why it didn't draw the nukes. To say it didn't because it's not intelligent? That's too bold a claim.


>> Compose a poem about a half dragon half hippo on a quest to defeat a speck of dirt. The speck of dirt has armies of atomic bombs on legs defending him. Make the poem epic, heroic and eloquent.

This is certainly creative. But, if I understand correctly, this is your prompt, yes?


Composing a poem from this prompt is creative. The poem and the picture had to fill in elements not included in my prompt.


It's creative (though possibly grammatically correct word salad from human sources; no small feat, but not exactly what's claimed either).

What it is not is good poetry. Certainly no proof that LLMs can surpass humans.


Then write a better poem. Draw a better picture.

I wouldn't say this example surpasses all humans. It surpasses most humans and matches those trained in poetry and illustration. Where it definitively excels is speed: both the poem and the pictures were generated in less than a minute. No human can create that quickly. Even the best of us cannot match that speed.


Speed is not the measure of creativity. I don't think anyone will deny that machines can do some things way faster than humans; this has little to do with AI in particular.

I don't think this "poem" matches or surpasses most humans trained in poetry.

I don't have to provide anything. I mean, there's a huge body of poetry (that this LLM was trained on, by the way) to compare it to. Pick poetry you like, and compare it to this one. You'll see the difference in quality.


Speed is not a measure of creativity but it is a critical factor in the generation of evidence.

In that respect it is beating your argument on all counts.

You don't have to provide anything. But it makes your argument weaker if you can't generate better works of creativity from the given prompt.

Let's stick to the dynamic prompt. The point is to choose a prompt that will create works that don't exist. We don't want the LLM or the artist in question copying anything that already exists. Proof of creativity requires an actual live demonstration of it.


> In that respect it is beating your argument on all counts.

What, pray tell, do you believe my argument is?


What I mean is that the OP's prompt to the LLM is creative, not the LLM's output. The LLM's output just expounds on the human's prompt, so the poem it generated is clearly not an example of creativity.


> output that is objectively better than or equal to human-level reasoning... it's just not doing this consistently

I'd say the inability to do it consistently is because it's not reasoning.



