Because they are both statements about the future. Either humans can inductively reason about future events in a meaningful way, or they can’t. So both statements are equally meaningful in a logical sense. (Hume)
Models have been improving. By induction, we should expect them to keep improving until we actually see them stop. There is no prevailing understanding of these models that lets us predict a parameter count and/or training-set size past which they’ll plateau. So arguing “how do we know they’ll get better” is the same as arguing “how do we know the sun will rise tomorrow”… We don’t, technically, but experience shows it’s the likely outcome.
It's comparing the odds that a thing which has never happened before will happen (with no specified time frame) against the odds that a thing which has happened billions of times will suddenly not happen (tomorrow). The interesting thing is, we know for sure the sun will eventually die. We do not know at all that LLMs will ever stop hallucinating to a meaningful degree. It could very well be that the LLM paradigm just isn't enough.
What? LLMs have been improving for years and years as we’ve been researching and iterating on them. “Obviously they’ll improve” does not require “solving the hallucination problem”. Humans hallucinate too, and we’re deemed good enough.
Humans hallucinate far less readily than any LLM. And "years and years" of improvement have made no change whatsoever to their hallucinatory habits. Inductively, I see no reason to believe that years and years of further improvement would make a dent in LLM hallucination either.
As my boss used to say, "well, now you're being logical."
The LLM true believers have decided that (a) hallucinations will eventually go away as these models improve, it's just a matter of time; and (b) people who complain about hallucinations are setting the bar too high and ignoring the fact that humans themselves hallucinate too, so their complaints are not to be taken seriously.
In other words, logic is not going to win this argument. I don't know what will.
I don’t know if it’s my fault or what, but my “LLMs will obviously improve” comment is specifically not “LLMs will stop hallucinating”. I hate the AI fad (or am maybe more annoyed by it), but I’ve seen enough to know these things are powerful and going to get better with all the money people are throwing at them. I mean, you’d have to be willfully ignoring reality lately not to have been exposed to this stuff.
What I think is actually happening is that some people have simply taken the stance that it’s impossible for an AI model to be useful if it ever hallucinates, and they probably always will hallucinate to some degree or under some conditions, ergo they will never be useful. End of story.
I agree it’s stupid to try and inductively reason that AI models will stop hallucinating, but that was never actually my argument.
> Humans hallucinate far less readily than any LLM.
This is because “hallucinate” means very different things in the human and LLM contexts. Humans have false/inaccurate memories all the time, and those are closer to what LLM “hallucination” represents than human hallucinations are.
Not really, because LLMs aren't human brains. Neural nets are nothing like neurons. LLMs are text predictors. They predict the next most likely token. Any true fact that happens to fall out of them is sheer coincidence.
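To make "predict the next most likely token" concrete, here's a minimal sketch of that mechanism using the small gpt2 checkpoint from Hugging Face transformers (the model and prompt are just stand-ins for illustration, not anyone's production setup):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # The model's entire output is a probability distribution over the next token.
    inputs = tokenizer("The capital of France is", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)
    probs = torch.softmax(logits[0, -1], dim=-1)

    # Show the five most likely continuations and their probabilities.
    top = torch.topk(probs, k=5)
    for p, token_id in zip(top.values, top.indices):
        print(f"{tokenizer.decode(int(token_id))!r}  p={float(p):.3f}")

Whether " Paris" lands on top is a property of the training data's statistics; there's no separate lookup of a stored fact happening anywhere in that code path.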
This, for me, is the gist: if we're always playing pachinko when we hit go, then where would a 'fact' emerge from anyway? LLMs don't store facts, correct me if I am wrong. My topology knowledge is somewhat rudimentary, so here goes: first my take, and after that I'll paste GPT-4's attempt to pull it into something with more clarity!
We are interacting with a multidimensional topological manifold, and the context we create has a topology within that manifold which constrains the range of output to the fuzzy multidimensional boundary of a geodesic: the shortest route between our topology and the LLM.
I think some visualisation tools are badly needed; being able to view what is happening is, for me, a very promising avenue to explore with regard to emergent behaviour (see the sketch after the GPT-4 passage below).
GPT-4 says:
When interacting with a large language model (LLM) like GPT-4, we engage in a complex and multidimensional process. The context we establish – through our inputs and the responses of the LLM – forms a structured space of possibilities within the broader realm of all possible interactions.
The current context shapes the potential responses of the model, narrowing down the vast range of possible outputs. This boundary of plausible responses could be seen as a high-dimensional 'fuzzy frontier'. The model attempts to navigate this frontier to provide relevant and coherent responses, somewhat akin to finding an optimal path – a geodesic – within the constraints of the existing conversation.
In essence, every interaction with the LLM is a journey through this high-dimensional conversational space. The challenge for the model is to generate responses that maintain coherence and relevancy, effectively bridging the gap between the user's inputs and the vast knowledge that the LLM has been trained on.
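As a starting point for that kind of visualisation, here's a very rough sketch of one way to "view" the journey: take the hidden state at each token position and project it down to 2D. (gpt2 and PCA are just the simplest stand-ins I could think of, not a claim about the right model or the right projection.)

    import torch
    from sklearn.decomposition import PCA
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)

    prompt = "We are interacting with multidimensional topological manifolds"
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # Final-layer hidden states: one 768-dimensional vector per token position.
    hidden = outputs.hidden_states[-1][0].numpy()

    # Project the high-dimensional trajectory down to 2D so it can be plotted.
    path_2d = PCA(n_components=2).fit_transform(hidden)
    for token_id, (x, y) in zip(inputs["input_ids"][0], path_2d):
        print(f"{tokenizer.decode(int(token_id))!r:>20}  ({x:+.2f}, {y:+.2f})")

Plot those points in order and you get a crude picture of the path a context traces through the model's representation space; whether that picture says anything useful about emergent behaviour is exactly the open question.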
If you believe humans hallucinate far less then you have a lot more to learn about humans.
There are a few recent Nova specials from PBS that are on YouTube that show just how much bullshit we imagine and make up at any given time. It's mostly our much older and simpler systems below intelligence that keep us grounded in reality.
It's like you said, "...our much older and simpler systems... keep us grounded in reality."
Memory is far from infallible but human brains do contain knowledge and are capable of introspection. There can be false confidence, sure, but there can also be uncertainty, and that's vital. LLMs just predict the next token. There's not even the concept of knowledge beyond the prompt, just probabilities that happen to fall mostly the right way most of the time.
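To make the "just probabilities" point concrete: the closest thing to uncertainty an LLM exposes is the probability it assigned to each token it emitted. A minimal sketch of reading that off, with gpt2 and the prompt as stand-ins:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tokenizer("The inventor of the telephone was", return_tensors="pt")
    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=5,
            do_sample=False,                 # greedy: always take the most likely token
            return_dict_in_generate=True,
            output_scores=True,
            pad_token_id=tokenizer.eos_token_id,
        )

    # Print each generated token with the probability the model gave it.
    new_tokens = out.sequences[0, inputs["input_ids"].shape[1]:]
    for token_id, scores in zip(new_tokens, out.scores):
        p = torch.softmax(scores[0], dim=-1)[token_id]
        print(f"{tokenizer.decode(int(token_id))!r}  p={float(p):.3f}")

A confidently wrong continuation and a correct one come out of the exact same machinery; there's no separate "I actually know this" signal to consult.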
We don't know that the mechanism used to predict the next token would not be described by the model as "introspection" if the model were "embodied" (or otherwise given persistent context and memory) like a human. We don't really know that LLMs operate any differently than an essentially ego-less human brain... and any claim that they work differently than the human brain would need to be supported with an explanation of how the human brain does work, which we don't understand well enough to say "it's definitely not like an LLM".
I'm trying to give what you said a strong, faithful interpretation. To that end, when you say “surely it will improve”, I assume you mean it will improve enough to be trustworthy in contexts where hallucination is considered a deal-breaker. What you seem to be pushing for is the much weaker interpretation that they'll get better at all, which is well, pretty obviously true. But that doesn't mean squat, so I doubt that's what you are saying.
On the other hand, the problem of getting people to trust AI in sensitive contexts where there could be a lot at stake is non-trivial, and I believe people will definitely demand better-than-human ability in many cases, so pointing out that humans hallucinate too is not a great answer. This isn't entirely irrational either: LLMs do things that humans don't, and humans do things that LLMs don't, so it's pretty tricky to actually convince people that it's not just smoke and mirrors, that it can be trusted in tricky situations, etc., which is made harder by the fact that LLMs have trouble with logical reasoning[1] and generally seem to make shit up when there's little or no data rather than answering that they don't know. GPT-4 accomplishes impressive results with unfathomable amounts of training resources, some of the most cutting-edge research, and multiple models woven together, and it is still not quite there.
If you want to know my personal opinion, I think it will probably get there. But in no way do we live in a world where it is a guaranteed certainty that language-oriented AI models are the answer to a lot of hard problems, or that it will get there really soon just because the research and progress have been crazy for a few years. Who knows where things will end up. Laugh if you will, but there's plenty of time for another AI winter before these models advance to the point where they are considered reliable and safe for many tasks.
> What you seem to be pushing for is the much weaker interpretation that they'll get better at all, which is well, pretty obviously true. But that doesn't mean squat, so I doubt that's what you are saying.
I mean, this is what I was saying. I just don't think the technology has to become hallucination-free to be useful. So my bad if I didn't catch the implicit "any hallucination is a dealbreaker, so why even care about security" angle of the post I initially responded to.
My take is simply that "these things are going to be used more and more as they improve, so we'd better start worrying about supply chain and provenance sooner rather than later". I strongly doubt hallucination is going to stop them from being used, despite the skeptics, and I suspect hallucination is a problem of missing context more than an innate shortcoming, but I'm no expert on that front.
And I'm someone who's been asked to try and add AI to a product and had the effort ultimately fail because the model hallucinated at the wrong times... so I well understand the dynamics.
Just because you can inductively reason about one thing doesn't mean you can inductively reason about all things.
In particular, you absolutely can't just blindly extrapolate short-term phenomena into the future and pretend that has the same weight as something like the sun rising, which is the result of fundamental mechanisms that have been observed, explored, and understood better and better over an extremely long time.