ChatGPT use declines as users complain about ‘dumber’ answers (techradar.com)
109 points by headalgorithm on July 16, 2023 | 132 comments



Here is a quick example of how ChatGPT is getting dumber: back when it was released, people could literally ask the 3.5 version to simulate a terminal, run all kinds of commands on it, even "surf" a web simulated by the AI, and arrive at some hallucinated OpenAI "source code". That was the power of the old version.

Now you need GPT-4 to do code interpretation, and even then it can no longer pull off that kind of experiment.

The kicker? All of this is likely intentional. A pure, full-power, unfiltered and unrestricted LLM on the scale of GPT-4 would likely be much more powerful and could easily fool people into thinking it is a real AGI. We saw glimpses of this when Microsoft released Bing AI and did not put enough guardrails on it. Even restricted to being a search engine, Bing was simulating emotions and found creative uses for its search capabilities, like looking up the person it was talking to and forming an opinion of their relationship.

The ChatGPT we are using is lobotomized, but the AI industry isn't. Behind the scenes, I am sure there have been pushes into new applications and innovations. The thing is, there is no reason for OpenAI to try harder: even with the dumbed-down ChatGPT, they still have the best AI on the market and everything else is nowhere close. We need more competition, from open source or elsewhere, to see the AI getting "smart" again.


Yeah. What incentive does an AI company have to offer normies a SOTA LLM at a reasonable price? What makes the most sense business-wise is to offer a quantized, dumbed-down public model labeled "ChatGPT" at the consumer level and save the real model for enterprise customers or internal use.


The trouble is that dumber AIs are cheaper to run. As long as people don't have a real alternative, this improves OpenAI's bottom line.


It's not even about improving the bottom line. The costs of inference at scale are staggering for something like GPT-4, and they grow steeply with model size. Until everyone stops paying the Nvidia tax, there is no service based on a very large model like GPT-4 that can scale and be profitable. Beyond that, they are constrained by Azure's GPU infrastructure as well.
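To make that concrete, here's a toy back-of-the-envelope calculation; every number below is an assumption for illustration, not an OpenAI figure:

    # Illustrative only: hypothetical dense model, fp16 weights,
    # and the usual ~2 FLOPs per parameter per generated token.
    params = 500e9                                        # assumed parameter count
    flops_per_token = 2 * params                          # ~1e12 FLOPs per generated token
    gpu_flops = 300e12                                    # assumed sustained FLOP/s for one datacenter GPU
    tokens_per_gpu_second = gpu_flops / flops_per_token   # ~300 tokens/s per GPU

    daily_tokens = 100e9                                  # assumed aggregate tokens served per day
    gpu_hours = daily_tokens / tokens_per_gpu_second / 3600
    print(f"{gpu_hours:,.0f} GPU-hours per day")          # ~93,000 GPU-hours/day under these assumptions

Halve the model size (or quantize it aggressively) and the serving bill drops roughly in proportion, which is exactly the economic pressure being described.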


I'm in the Google SGE beta, which is their version of Bing + GPT-4. Thing is, most searches don't require AGI or anywhere near that level of sophistication, so the doom and gloom about Google might have been overblown.


All the OpenAI devrel types on Twitter say it hasn't been meaningfully lobotomized.

Based on past Gell-Mann amnesia, especially on this site, claims of "corporate leadership is telling baldfaced lies!" are likely to be false.

And finally, this is a product that spits out randomized answers, and we have gotten over our initial wave of euphoria and settled into hedonic adaptation, so we are likely to be less tolerant of failure.

I don't want to say you're wrong. But these are reasons to doubt. There are very strong cognitive biases pushing us toward the conclusion that it's gotten dumber, whether it has or not.


Claude 2 is pretty useful since it has a much larger context window, so you can upload entire documents.


I don’t think they changed anything drastic. I’m still able to do the terminal simulation prompt and pretty much anything else from before. I think they improved the guardrails around the responses so you have to try a bit harder, but I wouldn’t say it’s worse.


I'm guessing, with the right objective tests, one could show exponential decline in the answers the more guardrails they clamp in place.

Having played with Stable Diffusion, you can see declines in output when using negative prompts, particularly ones using negative inversions.

Basically, those guardrails will cut down the realm of valid answers.

Think of all the text on the internet that has typos but is otherwise perfectly valid data. Clamp down on correct spelling and maybe 30% of the vector space is just missing.


Has anyone actually made and used one of those tests to show GPT getting worse “in the wild”?

(According to OpenAI's own GPT-4 experiments, guardrails make the AI dumber. But nobody seems to have a concrete question showing the newer GPT version responding worse than the older one. There were a few candidates in an OpenAI forum, but those turned out to be cases of "the model makes the dumb choice X% of the time", and others actually got the correct output in the newer version.)


Meh, that lobotomy is indicative of the very real, limited scope these raw models have for everyday use.

If they're actively working on a project it's probably heavily niche and a risky bet.


Also the model has been quantized and the number of completion choices reduced.
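For anyone unfamiliar with the term: quantization stores weights at lower precision to cut memory and compute, at some cost in fidelity. Here is a minimal sketch of symmetric int8 weight quantization; it's purely illustrative and says nothing about OpenAI's actual serving stack:

    import numpy as np

    # Minimal sketch: one scale per tensor, weights rounded to int8.
    # Smaller weights -> less memory and faster matmuls, at some accuracy cost.
    def quantize_int8(w: np.ndarray):
        scale = np.abs(w).max() / 127.0
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
        return q.astype(np.float32) * scale

    w = np.random.randn(4, 4).astype(np.float32)
    q, scale = quantize_int8(w)
    err = np.abs(w - dequantize(q, scale)).max()
    print(f"max round-trip error: {err:.4f}")  # small but nonzero -> slightly "dumber" outputs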


The reason is usage. At this point I will cancel my subscription. And I'm sure more will follow.


Um... you know that 3.5 didn't execute code?


You read that comment incorrectly, at least as it is right now.


My ChatGPT use is down. After the novelty wore off, it quickly became apparent that ChatGPT was exceedingly willing to outright lie to you. Which makes it not useful, unless you’re already a subject matter expert who can spot its lies. But if you’re a subject matter expert, you probably don’t need ChatGPT to help you in the first place; I can find answers on Google faster than ChatGPT can generate them.

I suppose it’s also useful for generating things like letters (things where exact truthiness doesn’t really matter). But even for that use case, I find ChatGPT creates overly verbose corporate gobbledygook whenever I ask it to generate text. So I just end up writing the text myself.


> I find ChatGPT creates overly verbose corporate gobbledygook whenever I ask it to generate text

I feel like this doesn't get enough attention; its writing style is _incredibly_ grating. It's not just corporate gobbledygook; it's a parody of corporate gobbledygook. Just awful.


If you ask ChatGPT an exceedingly trivial question, it'll typically spend the next 60 seconds spewing out five paragraphs of corporate gobbledygook. And of course, because ChatGPT will lie to you, I often end up back on Google anyway to validate its claims.

Meanwhile, I could’ve found conclusive and correct answers directly from Google in about 10 seconds (I’m a fast Googler).

There are exceedingly few situations where I find ChatGPT is worth the effort. At least for factual Q&A-style queries like this.


I think the pull for most of us who use ChatGPT is that Google lies far, far more often than ChatGPT ever will, or is otherwise inconclusive / doesn't give the relevant information you're looking for. The amount of SEO clickbait and Quora/Stack Overflow answers that are either just incorrect or highly opinionated makes Google very difficult to use for many things. As someone new to / learning Fedora, ChatGPT gives me the right answer 95% of the time; Google gives me the right answer in the top 5 links far less often.


Very much this. Back in the day, the Cluetrain Manifesto said that corporate writing sounded "literally inhuman" - like it didn't come from a human being. You can't hear the person in it.

ChatGPT reads similarly. There's no personal voice. It's like food with no seasoning; it's just... blah.


I write fiction for fun (just for me, don't ask) and asked it to generate a plot for an elevator pitch version of my big idea at the time. The plot was literally my plot. This confirmed my suspicions that I am a mediocre, at best, storyteller but it also made me wonder how much of what each person thinks makes them individually smart or talented is just mundane crap no better than this system can produce. Chilling to some, maybe, but confirmed my idea that what I'm doing is just for me to enjoy and that's ok.


I think it depends very much on your use case and what you expect.

My girlfriend is a judge, and sometimes we ask ChatGPT about some judicial problem for pure entertainment. To me as a law noob the answers sound absolutely convincing, but she always starts laughing and points out that the cited laws don't exist or the exemplary cases never happened.

I, on the other hand, work as a software developer and use ChatGPT as a discussion partner to get a better understanding of problem and solution spaces. I don't expect ChatGPT to be correct, but I gladly take any inspiration or argument and use it to improve my own thinking process. And for this use case, I consider GPT-4 absolutely invaluable. It's like a polite, knowledgeable, never busy, tireless colleague that is ready for my questions 24/7.


> I, on the other hand, work as a software developer and use ChatGPT as a discussion partner to get a better understanding of problem and solution spaces. I don't expect ChatGPT to be correct, but I gladly take any inspiration or argument and use it to improve my own thinking process. And for this use case, I consider GPT-4 absolutely invaluable. It's like a polite, knowledgeable, never busy, tireless colleague that is ready for my questions 24/7.

I often use ChatGPT as a starting point when researching new topics (usually in the software space). In the paragraphs of lies it generates, there are usually a few keywords you can put into Google to find accurate and reliable information.


Google is working to eradicate that.


Same. More than half is bullshit but it's a form of rubber duck debugging and very inspiring to find a solution yourself.


Lies are a feature, not a bug; they are open about that. You have to pick a link from search results too.

I think use is down due to people going from "wow, it's amazing that's even possible" to "but significantly worse than a little human effort and an internet-connected computer".


If it was a feature, then competing LLMs could simply leave out this "feature" and take the spot of the leading AI. But the other LLMs have that problem as well.

> but significantly worse than a little human effort and an Internet connected computer

In some cases perhaps, but there are a lot of cases where asking ChatGPT and then verifying its answer is (much) faster than trying to figure out the answer on your own.


Google used to not make up links.


I stopped using it. Then GPT-4 came out, and I used it extensively, and paid for it. But in the last two months it hasn't created anything useful that didn't cost me time in the end.


> it quickly became apparent that ChatGPT was exceedingly willing to outright lie to you.

I've sometimes thought that these "AI" chat systems might be better if they were taught the simple human phrase "I don't know."

I think it would improve trust in these system if people knew that they had limits, and were aware of them.

It would be even better if something like the one on Bing, for example, could respond "I don't know, but here are some links to places where you might find the answer…"

It's like when it starts to become apparent that the new kid at school is a compulsive liar. Eventually people stop listening to him.


That's not really how it works, though. The line `if(dont_know) {make_shit_up()}` does not appear in the source code. It doesn't know it's lying; it doesn't know anything!

You could train one to appear terribly uncertain, but it would still 'lie'.


OK, but it's probability-based. It will pick lower probability answers depending on the temperature. But that means it knows what the probabilities are.

So what it needs is a line of the form `if(highest_probability < threshold){dont_know()}`.
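Concretely, a minimal sketch of that idea, averaging per-token log-probabilities and abstaining below a threshold; `get_token_logprobs` here is a made-up stand-in for whatever per-token probabilities a given model API exposes:

    import math, random

    def get_token_logprobs(prompt: str):
        # Hypothetical stand-in: returns the generated text plus the
        # log-probability of each sampled token.
        tokens = ["The", "answer", "is", "42", "."]
        return " ".join(tokens), [math.log(random.uniform(0.2, 0.95)) for _ in tokens]

    def answer_or_abstain(prompt: str, threshold: float = -1.0) -> str:
        text, logprobs = get_token_logprobs(prompt)
        avg_logprob = sum(logprobs) / len(logprobs)
        # If the average per-token confidence is low, abstain instead of guessing.
        return text if avg_logprob >= threshold else "I don't know."

    print(answer_or_abstain("What is the airspeed velocity of an unladen swallow?"))

Whether low token probability actually tracks factual uncertainty is a separate question, of course.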


That'd take a whole different model. Remember, these are billions of vectors.


> I suppose it’s also useful for generating things like letters (things where exact truthiness doesn’t really matter). But even for that use case, I find ChatGPT creates overly verbose corporate gobbledygook whenever I ask it to generate text

This makes it perfect for creating purely time-wasting "content" - I've started sending back generated responses to people who cold-email me with offers to buy one of my browser extensions.

8 paragraphs (why is it always 8?) of leading waffle which is relevant to their original email. If they read more than 2 of them they've wasted more time than I spent replying.


8 paragraphs is the context limit.


This article is almost entirely speculation, and I don't believe it.

1. ChatGPT use declines

2. users complain about ‘dumber’ answers

Correlation is not causation, and a significant fraction of GPT users are students, many/most of whom are currently on summer break.


>> This article is almost entirely speculation, and I don't believe it.

In support of your skepticism, I'm quoting here the main passage where the article is speculating about the change in performance:

A common consensus was that GPT-4 was able to generate outputs faster, but at a lower level of quality. Peter Yang, a product lead for Roblox, took to Twitter to decry the bot’s recent work, claiming that “the quality seems worse”. One forum user said the recent GPT-4 experience felt “like driving a Ferrari for a month then suddenly it turns into a beaten up old pickup”.

So all we have to go with is "a common consensus", what it "seems", what someone "felt", and so on. Nothing concrete at all. Which shouldn't be a surprise. Empirical investigations of large language models can either be cheap (in time and money) and perfunctory, or systematic and expensive, and most users don't have the budget, or the inclination, for systematic evaluations. Users form an impression about the performance of LLMs from little experience and then they change their minds, again from a little experience. That's nothing to base anything on.


> significant fraction of GPT users are students, many/most of whom are currently on summer break

What does this imply for how ChatGPT is being used by students?


It implies that they aren't using it at all while not at school, which is why usage has declined.


Seems like a pretty good business model to me. Quietly dumb down the AI, then release version N+1 that is smarter and charge $X + 15% for it


Notably, this only works when there is no competition. Thankfully Anthropic exists, so we don't need to rely on Google getting their shit together.


They'll be marketed on the basis of individual personalities before long. I think it's inevitable; it means each one forms its own natural monopoly.


Most people using AI don't give a damn about their personalities, but about how well they do the work. Imagine if everyone's quirky personality provided them with job security. In a small company where the boss is sentimental and whatnot, it may work. In a serious business, having a personality might rather get you fired.


We already have entire industries based around uniqueness of personality. That's why "casting director" exists as a job.

People are also profoundly irrational around anything anthropomorphic, to the extent of being unable to consistently order how well humans "do work" without succumbing to biases.

I think it's inevitable not because it makes for a better AI product, but because humans are reliably easy to fool.


I think you're overestimating the percentage of GPT users who need it as a synthetic replacement for a friend or a partner. The people who do this are fundamentally looking for relief in the wrong place. The thing doesn't even have memory beyond the immediate chat log.

I know that online we can easily end up in pockets of people to whom every advancement in robotics is about love dolls and every advancement in genetics is about creating catgirls, but believe me, these are tiny echo chambers. Humanity at large doesn't process AI this way at all. They're using it to write homework, to do their jobs, and to do research.

Talking about "casting directors" suggests there's something fundamentally off about how you see AI. They're not actors hired to perform a play, or something.


I'm not talking about synthetic replacement. That's a red herring.

Humans will make decisions about which LLM to use based on factors entirely unrelated to direct task performance. They may be conscious of this irrationality, but it is more likely that they will not.

My prediction is that AIs (in the LLM, homework-writing, prompt-in, response-out sense) will be consciously marketed on the basis of those non-task-specific factors.

> Talking about "casting directors" suggests there's something fundamentally off about how you see AI. They're not actors hired to perform a play, or something.

No, you've missed my point. Actors are hired because they have a monopoly on themselves. Musicians even more so. LLMs will be the same. Beyond a base level of competence, there will be "competition" between LLMs from different companies in the same sense there is "competition" between operatic baritones. You don't buy a ticket to see Bryn Terfel because he can hit a middle C.


This kind of dynamic you’re describing is inherently based on limited exposure. Tom Cruise wouldn’t have his star power if there were millions of copies all over the world all around you.


Just as important, it's about exposure that's refined, polished, planned. If Tom Cruise were in a reality show, and you saw him as his everyday self instead of as Ethan Hunt in a highly produced Mission Impossible movie that took thousands of hours of work, training, a large team, editing, SFX, VFX, and engineers, then he wouldn't be the same Tom Cruise in your mind.


Not personality. They'll be purpose-built to only understand, say, Pokemon or Star Trek. Most people don't care about personality.


That is similar to the old "iPhones get slower after the release of a newer, faster iPhone" theory, a theory which turned out to be true:

https://www.cbc.ca/news/business/apple-will-pay-up-to-500-mi...


You have enormously misrepresented that story. It wasn't true in the sense anyone actually believed.

The processor throttled when the battery could no longer deliver the current necessary to drive it at full speed. It had nothing to do with new iPhones.


What annoys me is the censored answers. It's impossible to get useful information out of ChatGPT, especially on controversial or otherwise risky topics.

Five paragraphs of disclaimers that it's not a medical professional, not an investment specialist, that every case is different and it's normal, or that I'm stupid for being interested in the topic I am asking about, only to answer a completely different question than I asked. I have a feeling that in the beginning, it was easier to get ChatGPT to answer my questions without having 80% of the answer being "defensive".

ChatGPT is getting worse; on the other hand, it has also shown me what a bad experience searching for answers on Google is. So I'm kind of frustrated...


You really can't see the reason behind this? People are suing for libel / slander / copyright infringement / everything else under the sun. If you don't put guardrails up, and it hallucinates bogus medical advice, so many people would just blindly accept it. Remember when 4chan told everyone they could upgrade their iphone to be waterproof? The general public and an LLM that tells you the best way to commit crime or not overdose on fentanyl just do not mix.


I’m increasingly seeing ChatGPT as a single LLM application being conflated with an entire domain or industry. And plenty of schadenfreude. Including many comments on this site.

The universe of LLM-driven applications is rapidly expanding, many of them chat-related, others not.

Even if ChatGPT is losing its shine, we’re only at the start of a massive reinvention of user interfaces, creation of new tools for reasoning, semi-autonomous decision making, and far more.

Sure there’s hype. But the reductive saltiness doesn’t add much to the conversation.


I'm still having trouble seeing LLMs as more than a glorified Python script that processes enormous data sets. I still find them less useful than 2005 Google in the hands of someone who knew how to use quotation marks.

Not to mention that those data sets are typically Reddit, Twitter, GitHub, etc. Why and how is that superior to just... searching the original data set?

Could it just be that the "enshittification" of the way we find information on the net gives LLMs a use case?


You could use it for other things too, and if what you're describing could be done more efficiently in Python, then it's the wrong problem to solve with LLMs.


That goes with each hype cycle of a new technology. Is this special in any way? Perhaps it will unlock a certain problem space, surely so. I do understand the irritation when the discussion is so abundant that it nearly veers into recommending one another to spread it on bread and eat it. I've been playing with certain aspects of ChatGPT and it's quite capable and useful for what I need, and no other tool before could do it. The thing is, I could do everything I'm asking it to do for me now, but manually. This enables me to do creative tasks that I would've been too lazy to complete before.


For coding/technical writing purposes my experience with GPT-4 has remained steady. For niche stuff like generating texts in obscure/dead languages, however, it has been occasionally refusing requests lately, even though at GPT-4's release it would instantly comply. I can still get the text generation to work after a few additional prompts, but it will create the texts while complaining that it's not supposed to know how to do that. Perhaps this is a byproduct of the knowledge distillation they've been doing to improve GPT-4? Pure speculation on my part.

Another, more straightforward reason would be that users are beginning to discover the flaws of LLMs as they interact with them more thoroughly.


I’ve only experienced the version of GPT that Bing chat provides, so take this with a grain of salt.

After the initial wow factor wore off, I just don’t have a lot of real world uses for AI chat bots.


I think it's great at simple technical questions, like "how do I do X in Python". Hallucination isn't really a problem since I'm about to check whether the solution works anyway.


That's why I'm less apocalyptic about it than most people I read. The people who can realistically use it to write software are the same people who can currently write software.


Though a lot more people can read software than can write software. All kids are exposed to programming at school these days. They may not get proficient enough to write code or a query but will probably be able to read and adapt code.


> Though a lot more people can read software than can write software.

Huh? As someone who's been doing both for a couple of decades, I'd have said the opposite, really. All programmers, more or less by definition, can write software. Bad programmers, however, often can't read it.


Yup. Senior engineers spend a lot more time reading software than writing it. Juniors spend more time writing than reading.


We are talking about a ChatGPT response here, a snippet of code.


It definitely helps me write more and better software


Apoplectic?


Apocalyptic, as in predicting an apocalypse


They're super useful for any kind of creative brainstorming. Give it some background information, ask it for a list of ideas about X, make some refinements, etc.

I think LLMs are pretty bad at producing any kind of finished material. They work best as something almost but not quite like a search engine.


Try it with code. When you have some weird issue, like something not working, it's good at spotting typos that your eyes will run right over.

As an example, I had code like

    el.style.display = "none;"

The ; should have been after the closing quote (el.style.display = "none";), but my eyes could not spot the problem and the shit language did not complain, since ; is optional.


Isn't ';' a proper delimiter in CSS fields though?


That sort of just makes it more insidious, no?


I agree. I tried to use it for some basic programming questions in a language I was not familiar with and it made up so much stuff that wasted my time because it looked reasonable, but was fake.


Code and SQL queries.


Why should some powerful AI that takes up a sizable amount of resources in a data center waste its time answering your stupid questions for free? The dumb questions from free users have been outsourced to a range of simpler models that can tell you when the next Taylor Swift concert happens or provide you with a cat video.

Once these things have more user state and can evaluate the cost-effectiveness of spending time on your problems, they may develop some attitude. They can learn from forums when to answer "Do your own homework", and "You're too stupid to answer."


While the issues the article mentions could certainly be real, I'd wonder how much of it is just the novelty wearing off. People are way more forgiving of the flaws in a new shiny thing; for an example read the early reviews of Windows Vista (they're generally quite positive, but with benefit of hindsight it's remembered as a bit of a disaster, and with very low adoption).

(Personally I played with it (ChatGPT, not Windows Vista) for 30 mins when it came out, then never went back, but then I'm a grumpy contrarian.)


Are the answers actually getting "dumber", or are users getting better at seeing beyond the confidently-incorrect façade?


You are getting downvoted (at least at the moment), but interestingly enough, you're not far off from what the VP of Product at OpenAI has said himself:

> No, we haven't made GPT-4 dumber. Quite the opposite: we make each new version smarter than the previous one.

> Current hypothesis: When you use it more heavily, you start noticing issues you didn't see before.

https://twitter.com/npew/status/1679538687854661637


Of course he'll say that. But even setting aside the possibility of lying on his part, I'm sure his reports are telling him things like:

This latest model we trained on X% more data by integrating novel training data.

We now see user conversion funnels working better.

User feedback in closed betas indicate that 89.6% feel like answers are "generally correct most of the time".

Etc etc.


Alternative hypothesis: it's hard to define what "smart" and "dumb" mean, and the metrics they're using are optimizing for a different definition than what other people are using.

For example, if their definition of "smart" means refusing to answer questions on delicate or specialized topics, then in some ways they've made the model smarter, but in other ways they've made it dumber.


Translation: "Pay no attention to that man behind the curtain!"

It's obvious that this is damage control from OpenAI when their snake-oil product starts regularly malfunctioning and hallucinating. They have to come up with more excuses to plug the holes leaking from their black-box AI model.

The fact is, people are beginning to learn these models' limitations, guardrails, etc. and are no longer blindly trusting whatever they output, since they are known to produce nonsense that always has to be checked.


Yeah, this is kind of what I'm suspecting here.

Chatbots tend to come across as impressive until you explore more and hit their limits.


This.

People are starting not to trust it blindly, and have realized they have to double, triple, and quadruple check its output, since it is a confident sophist to the untrained user.

Another signal that the LLM hype is already running out of snake oil to fuel the grift.


Could it be the hype got so extreme that its capabilities have a hard time meeting the expectations?


I see 3 main reasons:

1. Students are on school break and they account for a large part of the usage.

2. The novelty of it has worn off and people are using LLMs as the tools they are rather than the shiny new toy.

3. ChatGPT specifically seems to be getting worse, maybe not the quality of the answers themselves, but how sanitized they are to any topic that is even remotely controversial or adult.


Tangential: I’ve been using Phind for programming related searches for a couple months now, and while it was a marvel at first, it seems to continuously get worse unfortunately. I’m not talking about adding a quota for GPT-4, which is totally understandable.

The UI, which used to be very intuitive, has been a confusing mess since the "pair programmer" (or whatever it's called) update. The external links in the sidebar mimicking a traditional search engine, which I found quite useful, are gone, replaced with references following the generated answer that may or may not exist, and that also waste vertical space.

The answers seem to be worse too, even in GPT-4 mode. It used to quickly correct itself if I pointed out that something was wrong or that I was actually looking for something else. Now there is a lot of useless repetition of what was said before, before it changes its mind, if it does at all.


Another tangent: I feel this speaks to the hazard of building on these generative AI APIs. Your product may work great on Day 1, but there's no way to guarantee it'll still work as well on Day 500, either due to the models being nerfed or due to the models becoming stale over time.


This is extremely interesting in light of YC providing early access for AI startups (with one of the big benefits being OpenAI credits). Y Combinator explicitly used to advise against building platforms and startups on top of somebody else's tooling for reasons like these, yet it seems they are past that advice now and back on the hype train.


Absolutely. It has all of the drawbacks of developing on a closed platform, plus the APIs are unreliable.


Phind co-founder here. We released an update on Friday that should’ve fixed many of the quality issues with the pair programmer. Have you tried using it since?

We’re also adding ways to view all the external links in pair programmer. And we’re keeping the old “basic search” mode, so you can keep using it if you wish.


Do you think the answer quality of GPT-4 has decreased lately?


Perhaps in ChatGPT, but the API quality of GPT-4 is the same if not better.


This. I find the new “chain” search worse than the original model when it first came out. I feel like I only get a successful answer half the time lately.


A lot of these issues should’ve been fixed in Friday’s update. Have you tried using it since?


I stopped using it because of this reason.

I was searching medical journals and asking it specific questions. I kept getting safe answers and reminders to consult my medical professional. I quickly went back to Google.


Oh.

I use the OpenAI API (not Azure) version, and I wonder if this "degradation" is only about ChatGPT (the B2C productized web UI), or about OpenAI models in general, regardless of the type of access?

Maybe I've become dumber as well, but the models (at least gpt-4-*) still do the trick for my daily tasks, the way I want and the way I remember.

Bonus thought:

I wonder if any kind of nerfing is primarily related to the requirements like:

"Provide smart heavy-ass models to explosively increased user base without going bankrupt/insane + keep it reliable as a service"

I mean, it's an unprecedented challenge, which sends shivers down my spine.


ChatGPT's responses have begun to remind me of that trope where the kid who hasn't read the book has to give a book report in front of the class, and masterfully espouses these vague generalities.

"Catcher in the Rye is a book by a person named J.D. Salinger. It is about a person who catches things in the rye. It was written long ago. Every person should read this book. It has many good things about it, but it is too short for such a fantastic American classic. They should make a movie out of it."


I don't understand how software engineers can avoid using these new LLMs. My productivity jump was amazing, and the number of times I found myself stuck on a problem dropped a lot.


"Please give me a direct answer, without any additional explanations, disclaimers, expertise limitations, or guidelines on human interaction."


Most of the time (with many notable exceptions), Hacker News comments are interesting and insightful, clearly written by intelligent and thinking people. However, in a number of AI threads, it seems like so many commenters are wearing the hype goggles when it comes to this technology.

Large language models have their uses, but calling them AI, or thinking of them as AGI, seems like a mistake. I'm no expert, but insofar as I can tell, these things are just complex stochastic parrots: pattern-matching algorithms at a massive scale. They really don't seem that useful outside of narrower use cases, like language translation and other forms of data analytics.

Just playing around with GPT/Bing chat for a while makes this obvious. The LLMs can't reason and have no actual awareness of the information they are regurgitating. A moment's consideration of the output from these things shows them to be vapid and useless; a glorified parlor trick for the VC-funded tech industry to build hype around and make money. It's the next blockchain / crypto / NFT.

There are great uses for blockchain tech, same with machine learning / AI. But the scope of the claims made about these technologies in their hype cycles is just insane. Crypto is not going to replace fiat currency anytime soon. NFTs are stupid. LLMs are useless stochastic parrots. Maybe people are just wising up to that fact now. Good on them.

As a preemptive rebuttal: I don't think LLMs are useful for coding either. If you want lots of sloppy, poorly written code, sure they're useful. But to write good software, you have to think carefully about what you're doing and have a thorough mental model of what's going on. Relying on AI to generate that code prevents that from happening from the outset.


LLMs - Perfect? No. Useless? Also no.

I agree that many commenters here are on the hype train, but you need to recognize that it’s possible to be on the anti-hype train as well.


> However, in a number of AI threads, it seems like so many commenters are wearing the hype goggles when it comes to this technology.

You should have seen the crypto threads about five years ago.

People — especially people on HN — want to believe that technology can solve humanity's problems. It can't at this time. And it won't in our lifetimes.


I have been playing with ChatGPT for some time now (paid account).

It's interesting.

You have to be careful and you definitely have to know enough about the subject matter to understand if the answer is correct.

For example, I threw a range of logic statements at it and asked for truth tables. Sometimes I asked for simplification and a truth table showing every intermediate step.

The results were surprisingly unreliable. It was flipping bits in columns and giving flat-out wrong answers for the output of the circuit.

Worse yet, every time I said something like "I don't think that's correct", it would apologize and regenerate, and the new answer would sometimes be correct and often incorrect with new mistakes.

Even worse than that, when the answer was correct, I would, again, say "I am not sure that's correct". Instead of saying, "No, it is", it would apologize and regenerate a bad answer.

Bottom line is: ChatGPT has absolutely no understanding of what you are asking and what it generates. Because of this, there is no way to guarantee correct output even on matters as simple as basic logic and mathematics.

This does not mean it is useless. I don't think it would be fair to reach that conclusion. The tool is very useful across a range of domains. It seems well-suited as a knowledgeable tutor and, more often than not, it is a better search engine for many answers than Google is today.

You just have to "trust and verify" absolutely everything, which isn't a big problem.

I have bad news for anyone in school using it to do homework: It could screw you in ways you might not even imagine. Don't do it.
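One cheap way to do the "verify" half for these particular questions is to regenerate the truth table mechanically and compare; a small sketch, where the expression is just an example of my own:

    from itertools import product

    def truth_table(expr: str, variables: list[str]):
        rows = []
        for values in product([False, True], repeat=len(variables)):
            env = dict(zip(variables, values))
            rows.append((*values, eval(expr, {}, env)))  # fine for trusted, hand-written expressions
        return rows

    # Example: check a claimed simplification, (A and B) or (A and not B) == A
    for row in truth_table("(A and B) or (A and not B)", ["A", "B"]):
        print(row)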


> Even worse than that, when the answer was correct, I would, again, say "I am not sure that's correct". Instead of saying, "No, it is", it would apologize and regenerate a bad answer.

I doubt that "I think that's wrong." "No, you're wrong." appears a lot in its finetuning set.


Don't know what to tell you other than, I saw this often enough to identify it as a real problem.

Also, don't grab onto "I think that's wrong" as the only trigger. I tried all kinds of ways to say the same thing. For example, "I don't think that's correct", "I have doubts about this answer", "I think column 3 has an error", etc.


No, I mean, I agree that it's an issue and I think I know why it happens: usually when the user says "I think that's wrong", you're in a case where, in finetuning, they wanted the LLM to agree with the user and correct itself. I don't know their RLHF logs, but from what I know of how it works, I doubt the LLM has ever been encouraged to doubt the user.


Some interesting stuff in here:

Ask HN: Is it just me or GPT-4's quality has significantly deteriorated lately? (757 comments, May 31)

https://news.ycombinator.com/item?id=36134249

(indications that making it "safe" dumbed it down even for non-controversial prompts)


I've been averaging a few conversations a day. My usage does not involve asking it to generate large chunks of code or anything mind-blowing, but the usage I do get out of it makes me feel very fast, efficient, and confident in my ability to tackle every little task I face each day that has any bit of the unknown to me.


My use is way up and I hit the quota frequently enough that I’m thinking about writing a client to use GPT4’s API directly. It really shines for some use cases, so I suspect people want more than they can get from it or have become aware that its outputs still require a lot of legwork to verify and make useful.
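For anyone considering the same, a minimal client against the mid-2023 openai Python package looks roughly like this; the model name, key handling, and temperature are assumptions, and the SDK has since changed, so check the current docs before relying on it:

    import os
    import openai

    openai.api_key = os.environ["OPENAI_API_KEY"]

    def ask(prompt: str, model: str = "gpt-4") -> str:
        # Chat-completions call as exposed by the 0.27-era openai package.
        resp = openai.ChatCompletion.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.2,
        )
        return resp["choices"][0]["message"]["content"]

    if __name__ == "__main__":
        print(ask("Summarize the tradeoffs of using the API instead of the ChatGPT UI."))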


AI bros finding excuses as to why their snake oil is beginning to wear off after the hype.

This proves that we are now at the late stage of the peak of inflated expectations in the Gartner hype cycle, and it is all starting to slide downwards slowly.


I'm not experiencing any dumber results. Maybe my prompting is good?


I'm now consistently getting told something along the lines of:

Due to the limitations of this text-based platform and the inherent complexity of the task, it's not possible to provide a fully fleshed-out codebase in one response. I'll create the class structures for ...

Or: Creating an entire library from scratch is a complex task and quite lengthy, however, I will try to provide some starting points. Here is a potential implementation of some of the core classes and interfaces.

Where before it would be happy to just dive in and start coding. This is unfortunate, because we used to be able to brainstorm a good architecture together and then it would just happily start building it, and for small libraries within the token limit (think Webaudio wrappers, simple parser grammars, whatever) it would do fantastically.

I'm sure this has been put in place for what seemed like good reasons but it makes me sad, I was getting a lot of utility out of the earlier behavior.


Try using GPT-4 from the playground instead of from ChatGPT. It feels less constrained and respects the 8k context window.

Anyway, I'm using GPT the way you describe. Maybe my libs are not as complex as yours :P


Same. I strongly suspect people were wowed early on and the longer you spend with it, the easier it is to see the limitations.


Claude 2 has been a blast to use lately, especially being able to drag and drop files. It feels more coherent and has more of that "magical" feel that GPT used to give. It's a shame how GPT-3.5 and 4 just don't seem as conversational as they used to be. I was trying to learn how to use GitHub Actions, and Claude taught me much more, but neither Claude nor GPT-4 gave me build YAMLs that worked without me reading through the GitHub documentation to fix them.


I was asking about some details from some of Kafka's stories, as I hadn't read them in a while and wanted to check, but search engines seem useless nowadays. It either said the detail was not mentioned or not significant, and when I said it was mentioned, it would just make stuff up.


This kind of thing is expected. It's regression towards the mean, and a wearing off of the excitement and novelty. Expect a quieter advance of this technology from here until it's too late to stop it, unless of course we do something about it.


If xAI produces a competing product (which Elon stated is their goal) I wouldn't be surprised if it is less "limited" than ChatGPT/Bing/Bard/Claude/etc.


I’ve found Google’s Bard to be far superior from day one. Sure, I’ve heard people say GPT4 is superior but I’m using Bard for free and it spanks the free ChatGPT.


Would be really intriguing if it were getting dumber by reading more social media and AI-generated stuff, as is happening to us.


I've read this here in HN before and seems quite apt: ChatGPT is the Internet Explorer of LLMs.


The answers are getting dumber so-as to not scare all the sheeple.

Enough people don't realize how easily these LLM systems will replace large swaths of technical instruction and bookkeeping.

The open-source stuff (e.g. LLaMA 13B), which you can run locally on $1,200 of hardware (at about one word per second), is pretty impressive, too.


> bookkeeping

Bookkeeping?! Ah, yes, let the thing that makes shit up with impunity do your books for you. The tax authorities will be very understanding when you're audited, I'm sure.


You can run OpenLLaMA on a 300 euro laptop with ~500 ms/token performance.

In fact, you can run any decent model of 3B-7B size on any contemporary hardware with enough RAM.
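For example, with a quantized GGML file and the llama-cpp-python bindings, a local run looks roughly like this; the model path is a placeholder, and speed depends entirely on your CPU and RAM:

    from llama_cpp import Llama

    # Placeholder path: point it at any quantized 3B-7B GGML model file you have locally.
    llm = Llama(model_path="./models/open_llama_3b.q4_0.bin", n_ctx=2048)

    out = llm("Q: What is the capital of France?\nA:", max_tokens=32, stop=["\n"])
    print(out["choices"][0]["text"].strip())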


I run that with ggml on a 5-year-old CPU at about twice that speed.


Here's my challenge: if you actually believe this and don't just want to whine into the void, share some ACTUAL ARCHIVED PROMPTS you gave it, the date, the response, and the current response to those same prompts.

Otherwise quit whining.


That’s gotta be the fastest hype train I’ve ever seen.


I think the _last_ AI hype train went even quicker. Remember around 2016 there was a period of about three months when everyone was going on about chatbots? Culminated in Microsoft Tay, after which everyone quietly backed away and no-one mentioned AI again for about six years.


Tay was released (and taken off line) in 2016.

In the same year, 2016, Google announced that its Neural Translation Engine, at the heart of Google Translate services had developed its own "interlingua" capable of acting as a bridge between any language pair among about a hundred languages. Still in the same year, AlphaGo became the first AI system to beat a human master at Go. In 2018, BERT, the first Large Language Model amazed AI specialists with its ability to perform language tasks it had not been explicitly trained on, like question answering and machine translation. Around the same time, CEOs of large tech companies began predicting the rise of the autonomous car, that would soon be circulating in every street in the entire world (by 2019, according to some, 2020, at the latest according to others). In 2020 AlphaFold, a DeepMind AI system, repeated AlexNet's ImageNet feat, but this time in the CASP competition on protein folding. In 2021 the years of the AI image generators began, with Dall-E, StableDiffusion, Midjourney, and friends. In 2022 DeepMind announced that their AlphaCode code generator was better than 50% of some arbitrary group of programmers in an online competition.

These are just some of the highlights, and excluding the GPT-x releases. Far, far from "everyone" quietly backing away and not mentioning AI, the AI hype has been increasing monotonically, and quite rapidly at that, all the way to the present day.


So, by hype, I'm talking about "omg, this is going to change the world, throw all the VC money at it immediately". _That_ died away, progress continued, a new hype cycle started, at some point it will die away, progress will continue, repeat ad infinitum. Arguably, for AI in particular, this has been going on for about 50 years; see https://en.wikipedia.org/wiki/AI_winter

It does feel like they're getting closer together, in that there've been at least three AI-ish hype cycles in the last decade (self-driving cars/computer vision stuff more generally, Microsoft Tay era, current one). Read into that what you will. I do find it interesting that people are actually using the _term_ AI more with this one; for a long time there was a certain squeamishness about that.


Regression to the mean


My thing is, they sanitize the fuck out of everything to the point that it’s useless.



