Ok, so ChatGPT got some things wrong. If everyone began posting about how ChatGPT got something wrong, we would be here all day. I can't see how this article is newsworthy.
And the article has constant literary interruptions like this:
"I set down my teacup and put my face in my hands."
"There was an almost infinitesimal pause before ChatGPT chirped"
"I closed my eyes and contemplated my life choices."
"I narrowed my eyes as ChatGPT brightly informed me"
I don't know what the author is trying to do here, but there is a time and place for literary pauses like these, and I don't think this article is that place. The article reads like a weird cross between a rant, a technical article, and a wannabe literary piece.
These interruptions, on the contrary, made me chuckle, even about something I already knew.
I think this is a good article. It's informative, funny (subjectively), and showcases a clear example of how something trivial for a human expert is barely understood by ChatGPT.
> trivial for a human expert is barely understood by ChatGPT
I try to stay away from these arguments, but it's a Saturday, so what the heck! I think this is becoming a tired topic. One party would claim that ChatGPT doesn't "understand". The other party would claim that it's a large statistical model, so of course it doesn't "understand"; do you expect your washing machine to "understand" that it's washing clothes? Then there is another party that would, rightfully, ask: do we even understand what "understanding" is? This has been rinsed and repeated over and over again, ad nauseam. Is there anything more to learn from these arguments?
I mean, can we just give "ChatGPT got some things wrong, haha" some rest now?
> I mean, can we just give "ChatGPT got some things wrong, haha" some rest now?
Sure, right after the “AI is more profound than fire and electricity”¹ arguments stop.
People keep making counterarguments because the argument keeps getting made. Stop inundating HN (and everywhere else) with praise for LLMs as if they were the second coming of Christ, and the counterarguments will subside too.
Article title: Google CEO: A.I. is more important than fire or electricity
Quote from article: "AI is one of the most important things humanity is working on. It is more profound than, I dunno, electricity or fire," says Pichai
——
That’s not the argument he made. I have said for a long time that you can always find someone saying anything you want on the Internet, and with that you can paint your side, or any side, with the same brush, even if it's an opinion held by only one person.
There might be someone on Twitter who actually thinks AI is more _important_ than fire or electricity, but that's not a commonly held belief, and it's not even the case in the article you linked.
> I have said for a long time that you can always find someone saying anything you want on the Internet
Agreed. But who says it matters too. Some people scream into the void, others share opinions that sway elections and affect the lives of millions.
I’m not only considering Pichai, he was merely the first proxy I remembered. The point is that the importance and abilities of LLMs are regularly overstated.
My friend and I have a game where we link each other a wildly successful submission title on reddit (usually political) and then screenshot the part of the article (usually not linked on reddit) that reveals the title as phony.
Thousands of comments reacting to a headline or screenshot of a tweet that purports to summarize a headline without actually reading TFA.
Yep, I used to get caught up in headlines, and I’m sure I still read too much into them, but if there is an outlandish comment/position/opinion in the title, OR something that perfectly fits my “world view”, I make myself find the real quote/line, which often just further erodes my trust in reporting institutions.
You need to consider your source, of course, but more often than not, the more outlandish the claim, the more likely it was misquoted or they had the wrong takeaway.
Except the originally stated claim (in the comment and in the article title) is completely false. He didn’t say AI was more _important_ but that it was more _profound_. There is a difference, and words have meaning.
As a general rule, when someone calls a technology profound they mean it is important. Words may have meaning, but they also have common usage, and when he says AI is profound he certainly doesn't mean it is deep like the ocean.
Evidently you do not agree with this, so please tell me: what technology do you consider profound and yet also unimportant?
That is not the point of the argument. But fine, I edited the original post to use “profound” so we can end this nitpick. Yes, words have meanings, and we should absolutely consider those differences when they’re the central thesis. But that wasn't the case here; he was merely the first buffoon with a similar argument I could think of. He served as an example of a trend, not as the entire point. It makes zero difference whether he said “profound” or “important”; they are equally stupid in this context.
It can be more profound and still fuck up all the time!
And yes, the completely blind zealous worship by these parasite outlets is even worse than the absolutely uncritical copy-pasting of AI output that some people post on forums.
I had several moments in my life where I learned about a thing from its source of truth (documentation, source code, standards) and was infuriated that the common knowledge about it was just wrong. I can give a long list, but that would derail the thread.
Fact is, common knowledge is extremely bad, for whatever reason. Stochastic language models are trained on common knowledge.
Ask ChatGPT what the "B" in the name of the DB-9 connector means, or why the "/usr" directory is named that way. ChatGPT will reliably give the popular but wrong answers.
I'd strongly suggest addressing the points I've made rather than resorting to a lazy dismissal like saying I "sound like a generated account." It's a non-argument and contributes nothing.
For the record, I'm not affiliated with OpenAI. I use ChatGPT often. Many times it solves my problem; many times it fails miserably. I neither love it nor hate it. It is what it is.
My frustration stems from the sheer laziness of some posts and comments about LLMs here on HN. The ones that irritate me the most are those that expect LLMs to somehow possess magical abilities like being able to inherently distinguish between correct and incorrect outputs. Where does this expectation even come from? Have these people spent no time at all understanding how LLMs actually work? No? And yet they expect magic? It's baffling!
If you have a substantial critique of the points I raised, by all means, share it. If not, please spare me the hand-waving dismissals.
Your argument is upside down. If people are insisting LLMs aren’t magic, it’s only because too many people argue that they are. Just like you’re frustrated at the repeated con arguments, other people are frustrated at the repeated pro arguments.
Your "argument" is so poorly constructed that it could easily be considered trolling and therefore feels like it has no point to engage it, thats one of the main reasons it reads as generated by OpenAI bot accounts, like just today someone realized they could ask about the time Jimmy Carter kicked a klansmen in the nuts and GPT would explain with excruciating detail how and when it went down, except this never happened, but with your "argument" we should just assume that GPT making things way too frequently its just a part of life and that its still a pretty useful tool despite misleading thousands of people in a daily basis.
Who are you to decide what style the author should use? It’s consistent, entertaining, and well-written. It’s on their personal blog. If an author can’t choose for themselves a writing style on their personal blog, where can they?
Or, to put it another way: “there is a time and place to demand dry technical writing and I don’t think this reader went to the right place.”
> Who are you to decide what style the author should use?
Who am I to decide? Well, who are you to ask me who I am to decide? Should we keep going in circles, or would you prefer we actually discuss the point at hand?
Well, seriously, to be honest, I don't care much about what style the author uses. The author should write in their own style on their personal blog as much as they want. What I care about is this appearing on the HN front page, me clicking it, and then being unhappy that my dear HN community upvoted a mundane article of questionable quality. Who am I to do so? Well, I am the reader!
If I can't even critique an article that I read here, then what are we all even doing here?
> "I closed my eyes and contemplated my life choices."
Specifically regarding this one, I think the implication was that the author was hoping that ChatGPT, having been alerted to its mistake, would correct the mistake, as a human developer would (subordinate, coworker whose code is being reviewed). "Contemplated my life choices" then implies that the author recognizes that it was their choice to ask ChatGPT and that they were foolish for holding that hope in their heart.
The style reminded me of BOFH or dailywtf; maybe that’s what they were aiming for?
But I agree: ChatGPT adding some unnecessary aria labels is neither newsworthy nor unexpected
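The failure mode in question looks roughly like this (my own illustration, not the article's code):

    <!-- Redundant: the aria-label just repeats the visible text, and
         actually overrides it as the button's accessible name -->
    <button aria-label="Send message">Send message</button>

    <!-- Usually all you need: the visible text is the accessible name -->
    <button>Send message</button>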
To underline the point, it got it wrong in exactly the same way 95% of current frontend developers do.
I think it's a good reminder of how perfectly mediocre current AI is. That doesn't make it useless though: I'm way below mediocre in several things I need to do most weeks in my work, so in those areas AI can be a great help.
Maybe it did. But what's newsworthy about it? Is it really news that ChatGPT gets stuff wrong? If every person on this thread posts one article to HN describing how they asked a question to ChatGPT and it got it wrong, does that make this site better or worse?
This article is as newsworthy as a model card from OpenAI hyping their latest model as more "thoughtful" and "safer" and whatever other bogus criteria they claim to boost their valuation. This type of real-world experience report offsets the overblown hype and marketing surrounding these tools.
So, yes, they're very much needed and important, even if the conversations around it are repetitive. You're free to ignore them.
> OpenAI hyping their latest model as more "thoughtful" and "safer" and whatever other bogus criteria they claim to boost their valuation.
If this is true, it's a valid criticism. I hear you. I think it is downright thoughtless to mislead the layman into thinking LLMs can be "thoughtful" by anthropomorphizing them! To all companies that are doing this, seriously, stop dressing up a statistical token generator as if it's some kind of sentient philosopher.
There are so many better, higher-quality submissions in /newest that never make it to the front page, but it's almost always some shallow, generic "hobbyist" junk with zero value like this that makes it instead.
Making rocket science out of a <button> on an HTML page... "anything that gratifies one's intellectual curiosity"... yeah.
It challenges the assumptions around "accessibility". Some linters might (blindly) suggest aria labels, where the author suggests otherwise. That is worth the three minutes of reflection this article took to get through, eh?
"an article about a button not being a submit button"
This is just one of the points the article is making, and it isn't that the button isn't a submit button; it's that the button returned by the LLM has both characteristics of a submit button and characteristics of a regular button.
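Roughly the following, as I understand it (my sketch, not the author's code):

    <form action="/search">
      <!-- No type attribute: inside a form this defaults to
           type="submit", so clicking it submits (and may reload) the page -->
      <button>Search</button>

      <!-- Explicit type="button": no default form behavior,
           it only does whatever your click handler does -->
      <button type="button">Clear</button>
    </form>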
Honestly I'm just relieved that ChatGPT didn't suggest a div with role="button".
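For anyone who hasn't met that particular anti-pattern: a div needs focus and keyboard behavior bolted on by hand just to approximate what a native <button> does for free. Roughly (illustrative only; doThing is a made-up handler):

    <!-- The hard way: a div is not focusable or keyboard-operable by default -->
    <div role="button" tabindex="0"
         onclick="doThing()"
         onkeydown="if (event.key === 'Enter' || event.key === ' ') doThing()">
      Do thing
    </div>

    <!-- The easy way -->
    <button type="button" onclick="doThing()">Do thing</button>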
I have spent the last 5 years trying to get a good grasp of semantic and accessible HTML. It seems like every week I find out I was wrong about something I thought I knew. There's a bunch of blog posts out there with conflicting examples, and the only "official" source that's actually readable is the ARIA Practices Guide (APG). Which, by the way, starts off by telling you that "No ARIA is better than Bad ARIA" and recommends that you don’t use any of the code from the "Patterns" pages because reasons.
So to end my rant, I would say that on the one hand, I understand the author's frustration, but on the other, a11y in HTML is a nebulous maze of ARIA attributes and it isn't always clear which element is the "semantic" one for certain use cases.
The author could have easily written the same article with vague personal opinions about how their own code isn't perfect for a slightly different set of default assumptions about how buttons should be used. Not everyone wants to dive into <form> behavior oddities, like elements changing their default behavior or the page automatically reloading on submission, nor do they want to assume that the best label in the graphical context always matches the best label in the screen reader context (more on that below). It doesn't make their example actually bad; it just means short examples are always easy to nit about.
One thing I liked about the article is it's a longer dive instead of a short answer, meaning you get more of the nuance presented to you instead of assumed. Of course, it looks like the same thing comes from either source if you set out to ask for a detailed explanation.
Also, FWIW, since these will all vary anyway, I do actually like the full answer I got from ChatGPT more overall than the one at the end of their article. Even though it doesn't bother mentioning form considerations, it gives fuller answers to the points in the "hand tailored" response, as well as covering when you should or shouldn't consider using a separate aria-label: https://chatgpt.com/share/e/6754aaad-823c-8010-a5ad-96eff5f0....
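On the label point, the one case where a separate aria-label clearly earns its keep is when there is no visible text at all, something like this (my illustration, not from either answer):

    <!-- Icon-only button: there is no text content, so the accessible
         name has to be supplied explicitly -->
    <button type="button" aria-label="Close dialog">
      <svg aria-hidden="true" viewBox="0 0 24 24" width="16" height="16">
        <path d="M6 6 L18 18 M18 6 L6 18" stroke="currentColor" stroke-width="2" />
      </svg>
    </button>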
The author asked for a button that is accessible and got a button that is accessible. The system never claims to be an expert in the subject matter, and yet it still gets the ball in the hole.
The author didn't ask "give me the most basic button you can that a screen reader can still access."
The tool did what we expect and gave a reasonable answer, and was able to reason about why it gave those answers when questioned.
I think it does not matter in this case. ALL current LLMs are prone to confabulation. AFAIK, there is not a single model that is able to say “I don’t know” instead of making things up.
> ALL current LLMs are prone to confabulation. AFAIK, there is not a single model that is able to say “I don’t know” instead of making things up.
I can't help but feel that people who keep parroting these lines haven't bothered to take 30 days out of their lives to actually learn the basics of how LLMs work. LLMs are not "prone" to confabulation. They confabulate by design. That's how they work. They predict the next token based on probabilities derived from training data, not some magical ability to discern truth from falsehood.
The critical step people seem to ignore is that after the LLM generates a response, you need a filter to determine whether it's correct or not. But this is easier said than done! It can be done, but only in a very narrow range of cases, like generating code that passes a specific set of test cases or producing a Lean proof. The tests or proof verifiers act as that filter. But even here, you're far from guaranteed perfection: test cases don't cover everything, and proofs might be valid but still irrelevant to the problem.
Expecting an LLM to just say "I don’t know" fundamentally misunderstands what these models are. They're not epistemic agents. They don't "know" or "not know" anything. They just generate statistically plausible sequences. If that seems like a flaw to you, the problem isn't the LLM, it's your unrealistic expectations.
> The critical step people seem to ignore is that after the LLM generates a response, you need a filter to determine whether it's correct or not.
And therein lies the problem. A vast swath of human users tend to turn off their brain when interacting with LLMs. They expect correctness from computers and thus do not understand that these models produce plausible-looking text, not the truth.
I agree with your thesis, your comment is correct in all its technical details, but making everyone understand those very important points and act accordingly is a continuous uphill struggle.
> making everyone understand those very important points and act accordingly is a continuous uphill struggle
And that right there is exactly why people who know keep writing articles and getting into arguments with people who don't know but think they do, because some "expert" who also doesn't actually know anything about "AI" beyond the hype they've been sold said so... Until the "bubble pops" and the hype dies down, folks are gonna keep parroting various mistakes and misunderstandings, and folks who actually understand the technology are gonna have to keep repeating the same old tired arguments... I've seen it time and again over my decades in the "tech" industry. This time is once again proving to be a carbon copy of the same old script we've seen played out countless times before with each new "next big thing". Surprise!
The "model solution" isn't exactly great. "Just make sure to style the button so it's appearance changes when it has focus..." Ok? So show me how to do that!
I find this style of writing incredibly cringy. Not only is it verbose for the information conveyed, it's really not funny (which I guess was the intention). Maybe I am alone, but for this reason I wish I were at least able to downvote main stories. Now I just hide them.
I bet this person is from a different culture than you? I'm an American raised in the States, and this read like a Brit. The author's domain is .uk (https://tink.uk/), so I suspect that's true.
Yeah, one of my pet peeves is when someone really puts a lot of effort into making a technical article into an exaggerated story about themselves, like some sort of gonzo journalist.
An explanation after each response, clearly pointing out the problem, is better than:
'The computer gave a response. I looked at the response, shocked. Wow. WOW!!!! Wooooooooooooooow. Absolutely shocked. I couldn't understand why I got that answer instead of the one in my mind. My head was spinning. Unable to believe what I was seeing, I took a step outside for some fresh air, and to have a smoke. My brand is parliaments, and I pulled out one and lit it up. My world was collapsing around me. I also have to go shopping today, I thought. It was at this time that I remembered I was writing a blog entry. You see, I like blogging. It suits me, because I am smart and want to share it. This response I got though, it was so dumb. So dumb. So, so, so, so, so, dumb. I put my head in my hands thinking about the response. I contemplated my life choices because of the response. I wondered where I went wrong because of the response. Then I pondered where humanity went wrong because of the response. Then I thought about where homo sapiens went wrong because of the response. Clearly this bad response by chatgpt was a fundamental error with evolution. Do I really want to exist in a world where an LLM can give a bad response? Does anybody??? What is this world coming to when something on the internet is wrong?????'
'It was then that I typed my second ever message to chatgpt. As I examined the response, it was so bad the entire universe abruptly exploded, or might as well have. I...'
There is no "downvote" for main stories but there is a "flag" option to flag articles that you think don't belong to the HN main page. Users with enough karmas can see the "flag" option.
Ah. I thought the flag was to signal that a story was inappropriate, so I only used it on stories like that. But I don't want to signal that it shouldn't be on the front page, just that I personally don't like it; a downvote would be more appropriate?