Keep in mind that the actual test is adversarial - a human is simultaneously chatting via text with a human and a program, knowing that one of them is not human, and trying to divine which one is the machine.
And all that is kind of irrelevant, because if LLMs had human-level general intelligence, they would solve all these questions correctly without blinking.
No human would score high on that puzzle if the images were given to them as a series of tokens. Even previous LLMs scored much better than humans if tested in the same way.
And most humans wouldn't do well on maths problems if the input were given to them as binary. The reason that reversal isn't important is that tokens are an implementation detail of how an AI is meant to solve the real-world problems humans face; no one cares about humans solving tokens.
Humans communicate with each other to get things done. We have to think carefully about how we communicate with each other, given the shortcomings of humans and of different communication mediums.
The fact that we might need to be mindful of how we communicate with a person/system/whatever doesn't mean too much in the context of AI. Just as with humans, the details of how these systems work will need to be considered, and the standard trope of "that's an implementation detail" won't work.
The problem is that there are no tasks that LLMs are reliably good at. I believe that's what OP is getting at.
I fixed a production issue earlier this year that turned out to be a naive infinite loop - it was trying to load all data from a paginated API endpoint, but there was no logic to update the page number being fetched.
There was a test for it. Alas, the test didn't actually cover this scenario.
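The shape of the bug was roughly this (a minimal sketch with hypothetical names and response format, not the actual code):

    import requests

    def fetch_all_items(base_url):
        # Intended to walk every page of a paginated endpoint.
        items = []
        page = 1
        while True:
            resp = requests.get(base_url, params={"page": page})
            results = resp.json()["results"]
            if not results:
                break
            items.extend(results)
            # Bug: no `page += 1` here, so page 1 is fetched forever.
        return items

My guess is the test mocked the responses as a sequence (one page of results, then an empty page) without asserting which page number was actually requested, so the missing increment never surfaced.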
I mention this because it was committed by a co-worker whose work is historically excellent, but who started using Copilot / ChatGPT. I'm pretty sure it was an LLM-generated function and test, and they were deeply broken.
Mostly, though, these tools have been working great for this co-worker.
I understand that, the point I'm making is that reliability is not a requirement for utility. One does not need to be reliable to be reliably useful :)
A very similar example is StackOverflow. If you copy/paste answers verbatim from SO, you will have problems. Some top answers are deeply broken or have obvious bugs. Frequently, SO answers are only related to your question, but do not explicitly answer it.
SO is useful to the industry in the same way LLMs are.
Sure, there is a range. If it works 100% of the time it's clearly useful. If it works 0% of the time then it clearly isn't.
LLMs are in the middle. It's unclear which side of the line they are on. Some anecdotes say one thing, some say another. That's why studies would be great. It's also why syntax highlighting is a bad comparison, since that is not in the grey zone.
If you want your writing to be taken as establishing facts, you need to base it on a priori established facts; if you base it on hypotheses and beliefs then you're just expanding on the hypothesis, contributing further belief.
I suspect, though, that it's not the author's beliefs, just an ambiguously written way of saying that it was done by those people with that belief - in the same way that an atheist or a follower of some other religion might say that Christians pray in order to communicate their wishes to God. Of course, there is the rude, angry-for-no-clear-reason brand of atheist who could never bring themselves to say such a thing, but to the rest of us there is no problem in describing someone else's actions by their own reasons for doing them, even if we don't share that motivation. I have colleagues who run for pleasure, though I take none in it myself.
In my experience, people often conflate forgiveness, reconciliation, and redemption. They are distinct things. People also often do not understand the role repentance plays (evangelical Christians are _especially_ prone to miss it entirely).
We can forgive someone for horrific things they've done, even when they are utterly unrepentant. Forgiving them is changing our attitude about the harms the offender did from anger and hatred to acceptance and peace. This does not mean saying "what they did was okay," but rather accepting that they chose to do something horrible, learning to make peace with their hurtful choices, and focusing on healing the damage to ourselves and others rather than seeking to injure the injurer.
Reconciliation is mending a damaged relationship between two (or more) people, as much as it can be. Unlike forgiveness, true reconciliation requires that all parties make an effort - the injured party must put in the work of forgiving the offender, and the offender must do the work of repenting.
"Repent" now means "make a show of acting sorry" in evangelical Christian circles, but genuine repentance is very different. It is recognizing the harmful choices you have made and truly regretting them, and as a natural outgrowth of that regret, choosing to do what you can to heal the damage you have done. It means accepting the consequences of your choices; a repentant offender expects to be distrusted and treated differently for their crimes, and does not resent it, realizing the shattered relationships (emotional, familial, civil, and otherwise) are part of what they chose. Without repentance, others may forgive an offender, but reconciliation is not possible - if the injured continue a relationship with the aggressor, it is necessarily a stilted, broken, fragmented one.
Redemption is rarer and harder to achieve than either forgiveness or reconciliation. It is when reconciliation succeeds and goes beyond success, giving birth to something fuller and more beautiful than what was first broken. Adapting the words of my lost-yet-beloved faith, "He will wipe away every tear from their eyes, and there will be no more death or sorrow or crying or pain. Behold, He is making all things new."
This is not, in my experience, how conservative Protestant evangelicals use these words. However, in the decades I practiced Christianity, I slowly found these ideas in the Bible, despite the ludicrously-wrong exegesis practiced in US churches, by reading the texts many, many times. I find a great deal of truth and wisdom in them, considered this way. I hope perhaps my explanation can help you understand why so many people resonate with the message recorded in the Gospels, and the one that Dostoevsky so loved.
> Go back a few decades and you'd see articles like this about CPU manufacturers struggling to improve processor speeds and questioning if Moore's Law was dead. Obviously those concerns were way overblown.
Am I missing something? I thought the general consensus was that Moore's Law did in fact die:
The fact that we've still found ways to speed up computations doesn't obviate that.
We've mostly done that by parallelizing and applying different algorithms. IIUC that's precisely why graphics cards are so good for LLM training - they have highly-parallel architectures well-suited to the problem space.
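As a toy illustration (numpy on a CPU standing in for the GPU; the sizes are arbitrary): training is dominated by matrix multiplies, and every cell of a matmul's output is an independent dot product - exactly the kind of work that spreads across thousands of GPU cores.

    import numpy as np

    # LLM training time is dominated by matrix multiplies like this one.
    # Each of the 4096 x 4096 output cells is an independent dot product,
    # so the ~69 billion multiply-adds can all proceed in parallel.
    x = np.random.randn(4096, 4096).astype(np.float32)  # activations
    w = np.random.randn(4096, 4096).astype(np.float32)  # weights
    y = x @ w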
All that seems to me like an argument that LLMs will hit a point of diminishing returns, and maybe the article gives some evidence we're starting to get there.
The article you pointed out says the end came in 2016 - eight years ago.
My point is that those types of articles have been popping up every few years since the 1990s. Sure, at some point these sorts of predictions will be proven correct about LLMs as well. Probably in a few decades.
P-hack badly constructed datasets until they find a coincidence that reinforces their preferred narrative.
I mean, no, not all of them do that all of the time.
But it seems to be pretty common, and I'm not at all convinced that it's smarter, more correct, or wiser to live by research than by subjective experience.
> we do know that a calculator doesn't think or have awareness in the way we consider ourselves to.
If only this were true.
Some people take panpsychism seriously, and while it may sound ludicrous at first blush, it isn't actually unreasonable. Hard materialist reductionism may be true, but no one has really given an irrefutable explanation for the oddity that is consciousness.
It's tempting to say "Things that don't respond to stimulus are obviously not self-aware", but see locked-in syndrome for a falsification of that claim.
> Empirically assessing the result after the fact is how we determine intelligence and reasoning in.. anything humans included.
It isn't, though. At least not exclusively, and I'd argue not even primarily.
We look instead at how someone describes their thought process to see if it seems reasonable, whether they're using logically valid reasoning forms, and whether their arguments back up their claims.
It's common to say that someone's _reasoning_ is sound but their _conclusion_ is incorrect, due to <subtle observation they missed> or <invalid premise>.
This is why math teachers always tell students to show their work, and give out 0s for correct answers with no steps.
All of that is simply part of the "result" here though. Result in this discussion is the output of the black box, whether that's a machine or a human brain. I don't mean it in the "final answer" sense.
For example, all the conclusions the math teacher draws are still ultimately made by assessing the student's "result" alone.