Ask HN: Why does no one seem to care that AI gives wrong answers?
70 points by arduinomancer 9 months ago | 116 comments
If you had a piece of code or software that sometimes produces totally wrong output we would consider that a bug.

Yet it seems like with AI all the investors/founders/PMs don’t really care and just ship a broken product anyway

I feel like I’m going crazy seeing all the AI stuff ship in products that gives straight up wrong outputs

It’s like a big collective delusion where we just ignore it or hand wave that it’ll get fixed eventually magically




My graduate research was in this area. My lab group developed swarm robots for various terrestrial and space exploration tasks. I spent a lot of time probing why our swarm robots developed pathological behavioral breakdowns: running away from construction projects, burying each other, and so on. The issue was so fundamental to our machine learning methods that we never found a way to reliably address it, at least not by the time I left. No matter how we reconfigured the neural networks, retrained them, punished them, deprived them, forced forgetting, or fine-tuned, nothing seemed to eliminate the catastrophic behavioral edge cases. Nothing, that is, except dramatically simplifying the neural networks.

Once I started seeing these behaviors in our robots, their appearance became much more pronounced every time I dug deeply into proposed ML systems: autonomous vehicles, robotic assistants, chatbots, and LLMs.

As I've had time to reflect on our challenges, I think that neural networks very quickly tend to overfit, and deep neural networks are overfitted to an incomparable degree. That condition makes them sensitive to hidden attractors that cause the system to break down, catastrophically, whenever it gets near them.

How do we define "near"? That would have to be determined using some topological method. But these systems are so complicated that we can't analyze their networks' topology or even brute-force probe their activations. Further, the larger, deeper, and more highly connected the network, the more challenging these hidden attractors are to find.

I was bothered by this topic a decade ago, and nothing I have seen today has alleviated my concern. We are building larger, deeper, and more connected networks on the premise that we'll eventually get to a state so unimaginably overfitted that it becomes stable again. I am unnerved by this idea and by the amount of money flowing in that direction with reckless abandon.


I believe I saw this research featured in a documentary or some other film. Am I remembering incorrectly?


I’m not aware of any specific films we were in. We filmed a lot of our robotics trials for various collaborations, but no documentaries while I was there. Shortly after I left, my team got some acclaim for their Lunar Ark project (which is really cool). But, I had been out for a couple of years by that point. If they filmed a documentary, it likely would have been for that project.


Personally, I and people I've spoken with use LLMs less and less because of how often they're wrong. The other day I asked ChatGPT about a specific built-in method in Java and it told me that it couldn't do one specific thing. I was already using it in that context so I pushed back and it said "Oh yeah you're right, sorry"

I feel like I can't trust anything it says. Mostly I use it to parse things I don't understand and then do my own verification that it's correct.

All that to say, from my perspective, they're losing some small amount of ground. The other side is that the big corps that run them don't want their golden geese to be cooked. So they keep pushing them and shoving them into everything unnecessarily and we just have to eat it.

So I think it's a perception thing. The corps want us to think it's super useful so it continues to give them record profits, while the rest of us are slowly waking up to how useless they are when they confidently tell us incorrect answers, and are moving away from them.

So you may just be seeing sleazy marketing at work here.


> I was already using it in that context so I pushed back and it said "Oh yeah you're right, sorry"

Same thing happened to me. I asked for all the Ukrainian noun cases, it listed and described six.

I responded that there are seven. "Oh, right." It then named and described the seventh.

That's no better than me taking an exam, so why should I rely on it, or use it at all?


If you find it absolutely necessary to only work with coworkers who are incapable of making mistakes, I assume that you probably work alone?


If you have a coworker who makes mistakes at the same rate that ChatGPT makes them, it might be perfectly reasonable for you not to want to work with that coworker.

Humans aren't perfect. But "makes some mistakes" and "confidently spews errors at a high rate" are not the same. The difference matters.


Actually I do mostly work alone, I'm a truck driver. But that has nothing to do with my scepticism.


Coworkers learn, chatgpt doesn't.


Hype is fading, so usage decreases.

But you must admit that it is still useful, and usage will not drop to zero.


LLMs would be better nomenclature than AI in this context.

LLMs are not factual databases. They are not trained to retrieve or produce factual statements.

LLMs give you the most likely word after some prior words. They are incredibly accurate at estimating the probabilities of the next word.

It is a weird accident that you can use auto-regressive next word prediction to make a chat bot. It's even weirder that you can ask the chatbot questions and give it requests and it appears to produce coherent answers and responses.
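
A toy illustration of that loop, nothing like a real model, just the shape of "predict the next word, append it, repeat" (the probability table below is entirely made up):

  import random

  # Hypothetical toy "model": a lookup of next-word probabilities given the last word.
  NEXT_WORD_PROBS = {
      "the": {"cat": 0.5, "dog": 0.3, "answer": 0.2},
      "cat": {"sat": 0.7, "ran": 0.3},
      "dog": {"barked": 1.0},
      "answer": {"is": 1.0},
  }

  def generate(prompt_word, steps=3):
      words = [prompt_word]
      for _ in range(steps):
          probs = NEXT_WORD_PROBS.get(words[-1])
          if probs is None:
              break
          choices, weights = zip(*probs.items())
          # Sample the next word in proportion to its estimated probability, then repeat.
          words.append(random.choices(choices, weights=weights)[0])
      return " ".join(words)

  print(generate("the"))  # e.g. "the cat sat"

Nothing in that loop knows or checks whether the output is true; it only knows which continuations are likely.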

LLMs are best thought of as language generators (or "writers") not as repositories of knowledge and facts.

LLM chatbots were a happy and fascinating (and for some, very helpful) accident. But they were not designed to be "factually correct"; they were designed to predict words.

People don't care about (or are willing to accept) the "wrong answers" because there are enough use cases for "writing" that don't require factual accuracy. (see for instance, the entire genre of fiction writing)

I would argue that it is precisely LLMs ability to escape the strict accuracy requirements of the rest of CS and just write/hallucinate some fiction that is actually what makes this tech fascinating and uniquely novel.


> LLM chatbots ... were not designed to be "factually correct"; they were designed to predict words.

For this question, what LLMs were designed for is, I think, less relevant than what they are advertised for, e.g.

"Get answers. Find inspiration. Be more productive. Free to use. Easy to try. Just ask and ChatGPT can help with writing, learning, brainstorming, and more." https://openai.com/chatgpt/

No mention of predicting words.


The thing I find fascinating is that apparently there is a chunk of behavior that we might define as “intelligent” on some level that seems directly encoded in language itself.


I completely agree. As language is the preferred encoding method for intelligent thought (at least in our species) it could very well be that a sufficiently accurate language model is also a generally intelligent model.


> LLMs are best thought of as language generators (or "writers") not as repositories of knowledge and facts.

And the utility of a "language generator" without reliable knowledge or facts is extremely limited. The technical term for that kind of language is bullshit.

> People don't care about (or are willing to accept) the "wrong answers" because there are enough use cases for "writing" that don't require factual accuracy. (see for instance, the entire genre of fiction writing)

Fiction, or at least good fiction, requires factual accuracy, just not the kind of factual accuracy involved in recalling stuff from an encyclopedia. For instance: factual accuracy about what it was like to live in the world at a certain time or place, so you can create a believable setting; or about human psychology, so you can create believable characters.


I'd argue that what you're talking about in fiction is coherence (internal consistency), not factual accuracy (consistency with an externally verifiable ground truth).

I'd also argue that the economic value of coherent bullshit is ... quite high. Many people have made careers out of producing coherent bullshit (some even with incoherent bullshit :-).

Of course, in the long run, factual accuracy has more economic value than bullshit.


> I'd argue that what you're talking about in fiction is coherence (internal consistency), not factual accuracy (consistency with an externally verifiable ground truth).

No. I'm talking about "factual accuracy (consistency with an externally verifiable ground truth)." Mere internal consistency is not enough: a fictional world where everyone consistently stabs themselves in the eye when they see flashing lights is consistent, but lacks factual accuracy, and is therefore garbage fiction.

> I'd also argue that the economic value of coherent bullshit is ... quite high. Many people have made careers out of producing coherent bullshit (some even with incoherent bullshit :-).

I agree there's (greedily selfish) "economic value" to coherent bullshit, but there's negative social value to it. It's basically a kind of scam.

IMHO, some of the best applications for LLMs are for things like spam and scams, not the utopian BS they're promoted for (e.g. some LLM will diagnose your illness better, faster, and cheaper than a doctor).


So, if you dealt with a person who knew all the vocabulary related to a field, and could make well-constructed sentences about that field, and sounded confident, it would almost always mean they had spent a lot of time studying that field. That tends to mean that, although they may occasionally make a mistake, they will usually be correct. People apply the same intuition to LLMs, and because it's not a person (and it's not intelligent), this intuition is way off.

There is, additionally, the fact that there is no easy (or even moderately difficult) way to fix this aspect of LLMs, which means that the choices are either: 1) ship it now anyway and hope people pay for it regardless, or 2) admit that this is a niche product, useful in certain situations but not for most.

Option 1 means you get a lot of money (at least for a little while). Option 2 doesn't.


Yep. I think this is right on. The anthropomorphization in descriptions of their behavior and problems is flawed.

It's precisely that analogy we learned early in our study of neural networks: the layers analyze the curves, straight segments, edges, size, shape, etc. But when we look at the activation patterns, we see they are not doing anything remotely like that. They look like stochastic correlations, and the activation patterns are almost entirely random.

The same thing is happening here, but at incomprehensible scales and with fortunes being sunk into hope.


Even with a human speaker, that's not a totally safe assumption, and a certain type of fraudster relies heavily on people making this type of assumption (the likes of L Ron Hubbard in particular liked to use the language of expertise for various fields in a nonsensical way, and this is extremely convincing to a certain sort of person). But LLMs might almost have been designed to exploit this particular cognitive bias; there's really significant danger here.


Agreed. The optimistic scenario is that, being exposed to so many "hallucinating" LLMs, people will become better at spotting the same thing in humans. But I admit freely that is just the optimistic scenario.


I find that "intelligence" was a shaky concept to begin with, but LLMs have completely thrown the idea in the trash. When someone says "LLMs are not intelligent", I treat that as a bit of a signal that I shouldn't pay much attention to their other points, because if you haven't realized that you don't have a good definition for intelligence, what else haven't you realized?


> When someone says "LLMs are not intelligent", I treat that as a bit of a signal that I shouldn't pay much attention to their other points, because if you haven't realized that you don't have a good definition for intelligence, what else haven't you realized?

So you have a good definition for "intelligent", and it applies to LLM? Please tell us! And explain how that definition is so infallible that you know that everyone who says LLMs aren't intelligent is wrong?


> So you have a good definition for "intelligent", and it applies to LLM?

No. I feel like that was my whole point.


I'm sure there's plenty I haven't realized, but the reason it's worth pointing out that LLMs are not intelligent is that their boosters routinely refer to them as "AI", and the "I" in there stands for "intelligence", so pointing out that the label applied is not accurate is important.


In that case, what AI system do you feel that the label can be applied to?


I don't think there is one. Many researchers had tried to switch the field to using "ML" instead, since it's a lot more accurate of a label, but it doesn't hype as well, and that appears to have been the decisive factor.


When I did my masters fifteen years ago, it was a masters in "machine learning". AI was already just what laypeople called it.

If your argument is that "this isn't AGI", I don't think anyone at all disagrees, but then that's a bit of a tautology.


I haven't found a human that answers every single question correctly, either. You know whom to ask a question based off that person's domain of expertise. Well, AI's domain of expertise is everything (supposedly).

What gets difficult is evaluating the response, but let's not pretend that's any easier to do when interacting with a human. Experts give wrong answers all the time. It's generally other experts who point out wrong answers provided by one of their peers.

My solution? Query multiple LLMs. I'd like to have three so I can establish a quorum on an answer, but I only have two. If they agree then I'm reasonably confident the answer is correct. If they don't agree - well, that's where some digging is required.
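
A minimal sketch of that quorum check, assuming hypothetical ask_gpt / ask_claude / ask_gemini wrappers that each return a short answer string:

  from collections import Counter

  def normalize(text):
      # Crude normalization so trivially different phrasings can still match.
      return text.strip().lower().rstrip(".")

  def quorum_answer(question, clients, threshold=2):
      """Ask several LLM clients; trust an answer only if enough of them agree."""
      answers = [normalize(ask(question)) for ask in clients]
      answer, votes = Counter(answers).most_common(1)[0]
      return answer if votes >= threshold else None  # None -> time to do your own digging

  # clients = [ask_gpt, ask_claude, ask_gemini]  # hypothetical wrappers around each vendor's API
  # print(quorum_answer("How many noun cases does Ukrainian have?", clients))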

To your point, nobody is expecting these systems to be infallible because I think we intuitively understand that nothing knows everything. Wouldn't be surprised if someone wrote a paper on this very topic.


This is a common argument to support the usage of LLMs: "Well, humans do it too."

We have many rules, regulations, strategies, patterns, and legions of managers and management philosophy for dealing with humans.

With humans, they're incorrect sometimes, yes, and we actively work around their failures.

We expect humans to develop over time. We expect them to join a profession and give bad answers a lot. As time goes on, we expect them to produce better answers, and if they don't we have remediations to limit the negative impact they have on our business processes. We fire them. We recommend they transfer to a different discipline. We recommend they go to college.

Comparing the successes and failures of LLMs to humans is silly. We would have fired them all by now.

The big difference is that computers CAN answer every single question correctly. They ARE better than humans. LLMs are a huge step back from the benefits we got from computers.


Also, humans can say "I don't know," a skill that seems impossible for LLMs.


What's more useful to me is "I'm not sure" or "I'm very sure", but LLMs can't provide you with a level of certainty. They're very sure about everything, including things they make up.


> The big difference is that computers CAN answer every single question correctly.

I emphatically disagree on that point. AFAIK, nobody has been able to demonstrate, even in principle, that omniscience is possible over a domain of sufficient complexity and subtlety. My gut tells me this is related to Gödel's Incompleteness Theorem.


I find this a useful frame of reference: don't assume anyone, anything is correct. Learn to work with what is, not what could be. AI is very helpful to me, as long as I don't have unrealistic expectations.


a) If you ask me about surgery I will say "I don't know". LLMs won't do that.

b) Experts may give wrong answers but it will happen once. LLMs will do it over and over again.


>b) Experts may give wrong answers but it will happen once. LLMs will do it over and over again.

Well... Sometimes "experts" will give the wrong answer repeatedly.


Assuming they aren't convinced they are wrong - and multiple opinions can be valid while differing. The world isn't always black and white. An AI can never be convinced it's wrong permanently, and sometimes it can't be convinced even temporarily, depending on the model.


Although new models do get trained and replace older ones, so from a user's perspective it's not like they'll never change their answers on things. We've seen improvements over time, so while individual models are relatively fixed, the LLM industry itself is much more dynamic.


Yes, but the annoying part to me is that the model doesn't enhance itself based on my history with it.


> investors/founders/PMs don’t really care

Garry Tan from YC is a great example of this.

It's not that he doesn't care. It's just that he believes that the next model will be the one that fixes it. And companies that jump on board now can simply update their model and be in prime position. Similar to how Tesla FSD is always 2 weeks away from perfection and when it happens they will dominate the market.

And because companies are experimenting with how to apply AI these startups are making money. So investors jump in on the optimism.

The problem is that for many use cases e.g. AI agents, assistance, search, process automation etc. they very much do care about accuracy. And they are starting to run out of patience for the empty promises. So there is a reckoning coming for AI in the coming year or two and it will be brutal. Especially in this fundraising environment.


> It's not that he doesn't care. It's just that he believes that the next model will be the one that fixes it.

No, what he does is he hopes that they can keep the hype alive long enough to cash out and then go to the next hype. Not only Garry Tan, but most VCs. That's the fundamental business model of VCs. That's also why Tesla FSD is always two weeks away. The gold at the end of the rainbow.


When I was a kid there was this new thing that came out called Wikipedia. I couldn't convince anyone it was useful though because they pointed out it was wrong sometimes. Eventually they came around though.

AI is like that right now. It's only right sometimes. You need to use judgement. Still useful though.


I feel like this fails on the premise that the models can be improved to the point where they are reliable. I'm not sure that holds true. It is extremely uncommon that making a system more complex makes it more reliable.

In the rare cases where more complexity produces a more reliable system, that complexity is always incremental, not sudden.

With our current approach to deep neural networks and LLMs, we missed the incremental step and jumped to rodent brain levels of complexity. Now, we are hoping that we can improve our way to stability.

I don't know of any examples where that has happened - so I am not optimistic about the chances here.


The difference right now is that many people are paying for it. It feels odd to pay for something that could give wrong answers.


Your point is valid if you believe LLM/Generative AI is deterministic; it is not. It is inference-based, and thus it provides different answers even given the same input at times.

The question then becomes, "How wrong can it be and still be useful?" This depends on the use case. It is much harder for applications that require highly deterministic output but less important for those that do not. So yes, it does provide wrong outputs, but it depends on what the output is and the tolerance for variation. In the context of Question and Answer, where there is only one right answer, it may seem wrong, but it could also provide the right answer in three different ways. Therefore, understanding your tolerance for variation is most important, in my humble opinion.


> Your point is valid if you believe LLM/Generative AI is deterministic; it is not. It is inference-based, and thus it provides different answers even given the same input at times.

Inference is no excuse for inconsistency. Inference can be deterministic and so deliver consistency.
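
A toy decoder (not any vendor's API) makes the distinction concrete: greedy decoding is deterministic, sampling is not. The distribution below is made up:

  import random

  probs = {"seven": 0.6, "six": 0.3, "eight": 0.1}  # hypothetical next-token distribution

  def greedy(p):
      # Deterministic: the same distribution always yields the same token.
      return max(p, key=p.get)

  def sample(p, temperature=1.0):
      # Stochastic: different runs can yield different tokens.
      weights = [w ** (1.0 / temperature) for w in p.values()]
      return random.choices(list(p.keys()), weights=weights)[0]

  print(greedy(probs))  # always "seven"
  print(sample(probs))  # usually "seven", sometimes not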


Yeah. Almost all of the "killer apps" for LLMs revolve around generating content, images, or videos. My question is always the same: "Is there really such a massive market for mediocre content?"


Lots of people care.

From a coding perspective, proper technical systems already have checks and balances (e.g. test cases) to catch bad code, which is something that's important to have regardless of generative AI usage.

From a creative/informational perspective, there are stories every day of hallucinations and the tech companies are correctly dunked on because of it. That's more product management error than AI error.

AI hallucination isn't a showstopper issue, it just has to be worked around.


The fact that every AI-based company gets dunked-on for hallucination somewhat suggests that hallucination is a showstopper issue and in fact cannot be worked around.


I agree personally. I don't use LLMs for most things these days because I've been bitten in the ass enough times (whether it was real fallout or just me being able to prove the LLM wrong) that I don't trust them at all. To me it's a showstopper because it can get even the simplest explanations very incorrect and that's not useful to me.

They're still my first go-to over Google these days, but I usually only use them for code or weird word transformations, and any information is double checked. Pretty useless as an answer bot.


A "showstopper" issue is a QA term of art for an issue that blocks the project from being pushed to production as long as it exists. The AI project managers made the calculation that the proportion of hallucination issues (and their consequences) is within acceptable bounds.

The tide is only turning recently on whether that's a good business tradeoff.


Nothing is a showstopper, if your quality-control is poor enough. The value of AI in industries with high-quality product management is inversely proportional to the quality of human input. In most well-paying careers, that makes AI obsolete from the get-go.


Where "acceptable bounds" equals whatever the current proportion is.


There's no "hallucination". This word is simply PR spin to disguise faulty output.


I'm also curious about this. This morning I needed to generate summaries of news articles when I noticed Bing AI was inserting facts that weren't present in the source article. (Not correct at all.) It really hurts the potential of what AI could do if I have to double check everything it generates. We wouldn't accept a spreadsheet program that required double checking with a calculator, so why do LLMs get a pass?


Because the big corps are the ones running the show and they have money and they have money invested and want returns. I can't think of another reason. Your spreadsheet example is perfect.


With Excel you might want that calculator.



Excel must be such an incredible codebase. The amount of spaghetti + spaghetti to keep legacy spaghetti behaviour must really be a sight to behold.


They did something insane to hide the 0.1 + 0.2 != 0.3 problem w/ binary floating point which causes even stranger results to occur occasionally.
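
The underlying issue is plain IEEE 754 double arithmetic, reproducible anywhere; the usual workaround (and, reportedly, roughly what Excel does) is to round what gets displayed to about 15 significant digits:

  print(0.1 + 0.2 == 0.3)   # False
  print(repr(0.1 + 0.2))    # 0.30000000000000004

  # One way a spreadsheet could hide it: round before comparing/displaying.
  print(round(0.1 + 0.2, 15) == 0.3)  # True once rounded for display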


Crazy that it is not regulated.


The reason they don't care is the typical user doesn't notice. He asks the bot questions to which he does not know the answer, leaving him unable to detect when the bot answer is wrong.


My completely baseless theory is that there is an unbelievable amount of astroturfing happening in defense of this technology at a rate never seen before because of how badly people with capital want this to work and how "close" the tech is to achieving that.

If this is correct, then it's less of "people don't care" and more "the hype is louder than them."

That said: I, too, am completely perplexed by people within the tech community using LLMs heavily in making software while, unironically, saying that they have to keep an eye on it since it might produce incorrect work.


I work in AI product eng for a larger company. The honest answer is that with good RAG and few-shot prompting, we can consider actual incorrect output to be a serious and reproducible bug. This means that when we call LLMs in production, we get about the same wrong-answer rate as we do any other kind of product engineering bug.
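
For anyone curious what "good RAG and few-shot prompting" means concretely, here is a minimal sketch (not our production code), where retrieve() and llm() are hypothetical stand-ins for a search index and a model call:

  def answer_with_rag(question, retrieve, llm, k=4):
      """Ground the model in retrieved text instead of asking it to recall facts."""
      passages = retrieve(question, k=k)  # hypothetical search over your own documents
      context = "\n\n".join(passages)
      prompt = (
          "Answer using ONLY the context below. "
          "If the context does not contain the answer, say \"I don't know.\"\n\n"
          # A few worked question/answer examples (the few-shot part) would normally go here too.
          f"Context:\n{context}\n\n"
          f"Question: {question}\nAnswer:"
      )
      return llm(prompt)  # hypothetical call to whatever model is in production

When the answer has to come from the supplied context rather than from the model's memory, a wrong answer becomes something you can reproduce and debug like any other defect.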


Think of it as a system returning complex results after a search. Think of it as a synthetic search result. Think of it as a result where you still have to evaluate the source for reliability. Think of it as a junior engineer making mistakes. Think of it as a reason why you will have job security for complex tasks because the easy tasks can be done by dumber project managers. Think of it like you are now a senior engineer with a junior engineer doing the mundane stuff.

Do the mundane stuff in school/college/boot camp. Do the cool stuff at work.


> Think of it as a junior engineer making mistakes.

A junior engineer who repeats the same mistakes even after correction, never learns ... and soon gets the sack.


Looks like you have little experience with junior engineers. Now try experiencing junior engineers who will never grow to senior engineering levels.


In 1942, atom bombs didn't work at all. Does that mean nobody on the Manhattan Project cared? In my mind, when I hear that nobody on earth has achieved something, but a massive bubble of people are feverishly working on it, my conclusion is that a lot of people care a lot. I assume by how your post was phrased, you're using the term AI to mean LLM. You noticed that 100% of LLMs, every last one of them, in research labs and commercial businesses, hallucinate and give wrong answers when prompted for something very different from their training data. Thousands are this way. None at all exist that are the other way. A very peculiar property of an entire technology. But your conclusion wasn't that it's an inherent property of LLMs (a statistics machine), or that we need to move beyond LLMs to achieve AGI. It wasn't that LLMs have a lot of powerful uses when kept inside the narrow scope of their training. Your conclusion was that, across the entire earth, without exception, "investors/founders/PMs don’t really care". I'm sorry, I am not following; perhaps if you elaborate more on each logical step you took to get to that conclusion, we can shed more light on what you're missing.


> In 1942, atom bombs didn't work at all. Does that mean nobody on the Manhattan Project cared?

In 1942 the a-bomb didn't exist. It wasn't an overhyped, extant bomb that gave wrong responses! Straw man. Straw bomb?


So far this year the following companies have asked me for money for their new "AI" features:

  - slack   
  - github   
  - microsoft   
  - google    
  - atlassian   
  - notion   
  - clickup   
  - hubspot
So ask yourself: Who benefits from the hype? And who would benefit from a better general understanding of the flaws?


Gitlab as well


Hype. Every so many years some sort of semi-novel software gets invented/improved and some dude puts 100mg of 2c-b in his Huel shake and realizes that he’s invented the godhead. This dude invariably has buddies that do VC.

It’s the same reason why we heard about blockchain for years despite it having near zero practical uses


2020: VR/AR. 2021: Blockchain. 2022: NFT. 2023: AI.

The hype is real. None of these solved anything that people need.

And I'm talking people, not "users".


Isn't this just about unrealistic expectations? Lots of people derive lots of value from AI, but it isn't good for everything, and can't be trusted on its own for many things.


Because everyone incorrectly assumes there is intelligence at work and because people don't want to critically evaluate answers because that takes a lot of time to do.


I found LLMs to be quite useful, so the time saved is worth the effort to double check answers.

Granted, it might have to do with how I use LLMs. If you just blindly ask a question you increase the chance of hallucinations. If you give a lengthy input, and the output is highly dependent on the input, then you will get better results. Think email re-writing, summarizing, translation.


Think about this: when was the last time SV or the broader tech industry brought a revolutionary innovation to the consumer that improved their lives? The smartphone? Video streaming? I can't think of a single thing since. And those were both over a decade ago.

The tech industry is an environment composed almost entirely of companies running a loss to prove viability (and don't see that as ironic) to raise more funding from investors. AI is just the latest in a long series of empty hype to keep the gravy train running, last year it was VR, and it looks like at this point that the whole thing is teetering on a cliff. It's a bunch of MBAs scrambling for a sales pitch.

LLMs are useful. But "extremely lossy compression of documents with natural language lookup built in" doesn't sell endless subscriptions as well as "we created a mind." So they sell hype, which of course they cannot live up to because LLMs aren't minds.


Because 99% of people who have this issue ask a bare LLM a question. What you do is add RAG to it, or make it an agent that can retrieve information, and suddenly it's very accurate.

LLMs are language models, not magical information models with all information in the world somehow fit into several gigabytes. Use them right.
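
A rough sketch of the "agent that can retrieve information" part; llm() and web_search() below are hypothetical stand-ins, not any particular product's API:

  def agent(question, llm, web_search, max_steps=3):
      """Let the model ask for lookups instead of answering purely from memory."""
      notes = []
      for _ in range(max_steps):
          prompt = (
              f"Question: {question}\n"
              f"Notes so far: {notes}\n"
              "Reply with SEARCH: <query> to look something up, "
              "or ANSWER: <final answer> once the notes are sufficient."
          )
          reply = llm(prompt)
          if reply.startswith("SEARCH:"):
              notes.append(web_search(reply[len("SEARCH:"):].strip()))
          else:
              return reply.removeprefix("ANSWER:").strip()
      return llm(f"Question: {question}\nNotes: {notes}\nGive your best answer.")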


If people are paying for their product, why should they care?

As for why people are paying for a product that returns incorrect results, could be any number of reasons:

- People buy into the hype/marketing and actually think AI-thing is going to replace some part of their workflow

- People want to experiment and see how well it does at replacing part of their workflow

- Whatever the AI-thing does for a customer is not reliant on it being correct, therefore producing incorrect output simply doesn't matter

A good example would be my company's corporate IT AI bot that is effectively a very poor search engine for internal corporate wiki self-help articles on IT and HR related stuff. The actual IT/HR portal has a traditional search that, if you know the terms to search for, does a much better job. So most people ignore the AI, but I'm pretty sure we bought the engine from someone.


I am interested to know why you omitted

- People don't know that much of the output is incorrect


LLMs are basically useless for anything requiring reasoning or for solving real problems, but the big use case is that LLMs are competing with Google search. Google search is so polluted with low-quality, ads-first SEO garbage that the may-be-hallucinating LLM is a more effective way to find some sorts of information than manually slogging through the garbage and possibly getting the wrong answer anyway.

I suppose that there is also some hope that the hallucination problem will erode as more effort/compute is poured into the training. There may need to be a paradigm shift, though; the current structure around generating tokens based on probabilities seems like it will forever be a 'regurgitator'.


It can give wrong answers sometimes and still be useful. Also there are many tasks where it almost always gives correct answers, text to speech with function calling is 100x better now than it was 2 years ago. And in some spaces, correctness is a fuzzy concept anyway (creative spaces).


Of course most everyone cares, but the value proposition is high enough that people aren't going to hold off on using it until it is perfect.

Nonetheless, as with autopilot, you don't want to substitute paying attention with it. "Trust, but verify" as Reagan said.


Probably for the same reason that tech hasn't improved anything in a long time.

Tolerable pizza delivery is ruined. The Internet is a walled wasteland now. Far too much "content" that doesn't need to exist. Everything is an ad.

None of our lives have been improved by software.


Amen. I hate technology these days. It feels like the big corps managed to mess it up in the worst ways. Oh, you want to check the weather real quick? Here's an ad first. I'm sick of it. None of it even works properly or does what it's supposed to do half of the time but we're just supposed to eat it all up and pay more and more for it all.


Maybe it’s because I willingly pay for quality content and services that I enjoy like news and YT premium, but I don’t think things are that bad. LLMs mess up but they’re new and still have growing pains. My main IDE is great, Windows sucks but the mythical year of the Linux Desktop feels like it’s kind of here and quite usable, games are (much) worse and more predatory, as are most social media platforms but I just avoid those aspects. Privacy and rights over data isn’t great, but the GDPR is pretty good. Incredible resources like sci-hub have come about.

Idk, I felt like this a few years ago but things feel like they’ve got better. I miss the internet of yore just like everyone else (and particularly miss the decentralisation) but a lot is much better.


We built a correctness checker for LLM-generated SQL code for the military before LLMs were commercially available; it is going live soon on http://sql.ai . Some people do care about this problem, but it is hard to solve; even for SQL alone, this requires significant computer algebra, automated theorem proving, having to define what 'correct' even means, and much else.
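
(That is a much heavier approach than most teams take; a far weaker but common sanity check is simply running the generated SQL against a small fixture database and comparing it with a trusted query. A sketch of that weaker idea, not our checker:)

  import sqlite3

  def same_results(generated_sql, reference_sql, fixture_rows):
      """Weak check: do both queries return the same rows on a tiny test dataset?"""
      con = sqlite3.connect(":memory:")
      con.execute("CREATE TABLE orders (id INTEGER, amount REAL, region TEXT)")
      con.executemany("INSERT INTO orders VALUES (?, ?, ?)", fixture_rows)
      got = sorted(con.execute(generated_sql).fetchall())
      want = sorted(con.execute(reference_sql).fetchall())
      return got == want  # agreement on one fixture proves nothing in general

  rows = [(1, 9.5, "eu"), (2, 20.0, "us"), (3, 5.0, "us")]
  print(same_results("SELECT region, SUM(amount) FROM orders GROUP BY region",
                     "SELECT region, TOTAL(amount) FROM orders GROUP BY region",
                     rows))  # True on this fixture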


It's impossible to solve with LLMs. You keep adding RAGs until you're back at a non-LLM implementation. LLMs are probabilistic. Most programs, including SQL, need to be deterministic to provide value.


To paraphrase Upton Sinclair, it is difficult to get a man to care about something, when his salary depends on his not caring about it.

A lot of money has poured into AI, money potentially well in excess of the return on investment over the next several years. The field, from investors to CEOs and downwards to developers, is in a state of collective suspension of disbelief. There is going to be a lot of people out of work when reality reasserts itself.


It's not apparent if you're not already an expert in the domain you're querying, so users trust its answers, especially because it's delivered with an air of confidence (until you challenge it).

Unfortunately that's good enough for a lot of people, especially when you don't actually care and just need an output to give to someone else (office jobs etc).


There are use cases where it doesn't matter, e.g. creative writing. Additionally, I don't think AI engineers have even figured out the path for LLMs to be hallucination-free and extremely accurate. It's better to ship something that is not perfect (or even not great) now; that way the industry gains experience and the tools slowly but surely get better.


This is a false line of thinking. Karpathy says hallucinating is what makes LLMs special. LLMs are way more like compression with a mix of hallucination than anything else.


It's because AI is useful enough despite its current limitations.

Developers work with what we have on the table, not what we may have years later.


It depends on how you look at it. The creative process isn't just a piece of code. It usually involves trying, tweaking, testing, and tuning before an optimal solution is reached. In both the real world and with software development, achieving a perfect result in a single shot is more the exception than the rule.


"Show me the incentive, and I'll show you the outcome."

There is a belief, cynical or otherwise, that AI will make (a very small number of) people extraordinarily wealthy. The drive to stuff it into every facet of the digital experience reflects this belief.


I’ll be honest, every day people tell me the magical things all these new AI tools can do and I try them and usually find the results useless.

Every AI chatbot I’ve ever interacted with has been unable to help me. The things I’ve had them write do usually pass the Turing Test, but are rarely even close to as good as what I could write myself. (I admit, being self-employed for a long time, I can just avoid a lot of busy work that many people cannot, so I may be missing lots of great use cases there. I never find myself having to write something that isn’t great and wanting to just get it over with. AI might be great if you do. )

I’ve been trying to use image/video creation to do lots of other things and I’ve not even come close to getting anything usable.

I appreciate certain things (ability to summarize, great voice to text transcription, etc.) but find a lot of it to be not very useful and overhyped in its current form.


(1) Some problems are probabilistic either in theory or practice. For instance there could be a sentiment analysis problem where the state of the art was 67% accuracy 5 years ago and with an LLM it is easy to get 85% accuracy. 100% accuracy is going to be impossible anyway because sometimes you really can't figure how somebody feels.

(2) It's a big topic that could be addressed in different ways but I'll boil it down to "people are sloppy" and that many people become uncomfortable with complex problems that have high stakes answers and will trade correctness for good vibes.

(3) LLMs are good at seducing people. To take an example, I know that I was born the same day as a famous baseball player who was also born exactly a year before an even more famous cricket player. I tried to get Microsoft's Copilot to recognize this situation but it struggled, thinking they were born on the same day or a day apart rather than a whole year. Once I laid it out explicitly and my own personal connection it had effusive praise and said I must be really happy to be connected to some sports legends like that, which I am. That kind of praise works on people.

(4) A lot of people think that fixing LLMs is going to be easy. For instance I'll point out that Copilot is completely unable to put items in orders that aren't excessively easy (like US states in reverse alphabetical order) and others will point out that Copilot could just write a Python program that does the sorting.
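
(The program in question is as trivial as the snippet below, which is exactly why offloading to code works even when the model itself can't order things reliably:)

  us_states = ["Alabama", "Alaska", "Arizona", "Wyoming"]  # abbreviated list for illustration
  print(sorted(us_states, reverse=True))  # ['Wyoming', 'Arizona', 'Alaska', 'Alabama']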

That's right and it is part of the answer, but it just puts off the problem. What's really irksome about Copilot's inability to sort is that it doesn't know that it can't sort; if you ask it what the probability is that it will sort a list in the right order, it will tell you that it is very high. It's not so easy to know what is possible in terms of algorithms either, see

https://en.wikipedia.org/wiki/Collatz_conjecture

as evidence that it's (practically) impossible to completely understand very simple programs. See the book

https://en.wikipedia.org/wiki/G%C3%B6del,_Escher,_Bach

for interesting meditations on what a chatbot can and can't do. My take is that LLMs as we know them will reach an asymptote and not improve explosively with more investment, but who knows?


You'd consider it a bug because you can fix a bug. Doesn't seem like the tech's there yet, but it's still good enough to be useful.


Honestly I find LLMs to be a great tool, when using them right, and with the sufficient skills to know when they’re wrong. And for some problems, you don’t need a 100% right answer.

Earlier today I asked ChatGPT to give me a Go script to parse a Go codebase (making heavy use of Go AST libraries which I never use normally) and it gave me a 90% good solution which saved me a lot of time. To be clear the solution was non functional on its own, but it still saved me from doing exploration work and gave me a quick overview of the APIs I would need.

A few days ago it helped me generate code for some obscure AWS API using aws-sdk-go-v2. It was again almost fully working, and better than the examples I could find online.

I have examples like this every week. It’s not as amazing as some people say, but still pretty useful. I rejected AI stuff at first but don’t regret adding LLMs to my toolbelt.


In my usage it's competing with Google searches, and often those bring up a lot of nonsense. You have to filter it.


Another question, why does nobody care about the enormous and absurd energy cost to train and run models?


Line go up. The only three words that matter nowadays.


Unless we’re talking about solar’s cost per watt, then line go down. Down down down.


Not at all. That energy could have been used to cool someone's house, filter water, or charge a car, but instead it's being used to write an essay for someone's homework while coal is being burned.


>why does nobody care about the enormous and absurd energy cost to ___________

Just fill in the blank and you've described human history after we started digging up coal en masse.

Now, beyond that, think of it this way...

You're very rich, but to keep having new yachts and pleasure islands and jumbo jets you need the plebs around you to keep breeding and then spending 18 years to train them not to be completely stupid, then another 4 to 10 years for them to be experts, all while hoping they don't get hit by a car or off themselves. That's a massive amount of resource expenditure if you're looking to make yourself even richer. Instead of spending a lot of effort training those meat popsicles, you train a machine. Yea, it takes a lot of time, effort, and energy to get it where it needs to go. But after you have this 'general' machine, you never need other humans again. How much energy will that save you on your goal of world domination?


If the marginal utility of AI is less than the cost of the energy needed to use it, then presumably capitalism will solve this problem?


Does anyone know why this post no longer appears on HN?

It hasn't been flagged.


Because the point often is not to provide a working solution; it is just to sell a solution. Look at so many software projects in history: did they provide a correctly working solution, or did they generate a lot of billable work?


I didn't care too much at first because it seemed that the rate of improvement was sufficient to cover a multitude of sins. But I'm starting to, because it is becoming clear that progress hit an absolute brick wall with GPT4. If anything it has gone backward since then.

Just today, ChatGPT4o screwed up a rudimentary arithmetic problem ( https://i.imgur.com/2jNXPBF.png ) that I'd swear the previous GPT4 model would have gotten right.

And then there's this shitshow: https://news.ycombinator.com/item?id=40894167 which is still happening as of this morning, only now all my previous history is gone. Nothing left but links to other people's chats. If someone at OpenAI still cares what they are doing, it's not obvious.


I think this is the community that's an aberration. More often I see that "AI" has become synonymous with hallucinations and slop.


Blind optimism


Are you saying the emperor has no clothes!?


Everyone I know that doesn't work in tech is disappointed with AI assistants blurting out wrong answers. Investors will catch on soon.


"It is difficult to get a man to understand something when his NVIDIA shares depend upon his not understanding it"


IMHO because over the last two decades we have become so accustomed to lies and brokenness being the standard that it just doesn't matter to most people.

Move fast and break things, and don't pay anyone, but when you do that long enough and burn billions in VC money, you end up rich. Why does that work?

Why can someone like Trump lie and lie and lie and be convicted for felonies and turn up on the worst people list and nobody seems to care?

There are no more consequences. You break software, people don't care if it's the only thing available in the walled garden. You fuck up games, people don't care if you shove a TB worth of updates down their pipes later. You rugpull millions of dollars and walk out unscathed, as long as someone made a profit they will keep praising you.

You used to be actually shunned and driven out of the village for shit behavior. Not anymore. We find all kinds of ways to justify being terrible at stuff.

So along comes tech that costs us barely anything to use and produces meh results most of the time. That's amazing. It used to take thousands of talentless hacks to come up with all that mediocre wrong shit and they all wanted a paycheck. It's progress in a world where nothing means anything anymore.


It’s the grift. Doesn’t matter. Just slap AI on ANYTHING as fast as you can and hopefully no one will notice and hopefully someone else will have fixed it. Oh did I mention AI!





