> I don't understand how anybody expects to convince LLM users this doesn't work; it obviously does work.

This is really one of the hugest divides I've seen in the discourse about this: anti-LLM people saying very obviously untrue things, which is uh, kind of hilarious in a meta way.

https://bsky.app/profile/caseynewton.bsky.social/post/3lo4td... is an instance of this from a few days ago.

I am still trying to sort out why experiences are so divergent. I've had much more positive LLM experiences while coding than many other people seem to, even as someone who's deeply skeptical of what's being promised about them. I don't know how to reconcile the two.




> I am still trying to sort out why experiences are so divergent. I've had much more positive LLM experiences while coding than many other people seem to, even as someone who's deeply skeptical of what's being promised about them. I don't know how to reconcile the two.

I am also trying to sort this out, but I'm probably someone you'd consider to be on the other, "anti-LLM" side.

I wonder if part of this is simply level of patience, or, similarly, having a work environment that's chill enough to allow for experimentation?

From my admittedly short attempts to use agentic coding so far, I usually give up pretty quickly because I experience, as others in the thread described, the agent just spinning its wheels or going off and mangling the codebase like a lawnmower.

Now, I could totally see a scenario where if I spent time tweaking prompts, writing rule files, and experimenting with different models, I could improve that output significantly. But this is being sold to me as a productivity tool. I've got code to write, and I'm pretty sure I can write it fairly quickly myself, and I simply don't have time at my startup to muck around with babysitting an AI all day -- I have human junior engineers that need babysitting.

I feel like I need to be a lot more inspired that the current models can actually improve my productivity in order to spend the time required to get there. Maybe that's a chicken-or-egg problem, but that's how it is.


> I'm probably someone you'd consider to be on the other, "anti-LLM" side.

I think if you're trying stuff, you're not; otherwise you wouldn't even use them. What I'd say is more that you're having a bad time, whereas I'm not.

> I wonder if part of this is simply level of patience, or, similarly, having a work environment that's chill enough to allow for experimentation?

Maybe? I don't feel like I've had to have a ton of patience. But maybe I'm just discounting that, or chiller or something, as you allude to.


> Now, I could totally see a scenario where if I spent time tweaking prompts, writing rule files, and experimenting with different models, I could improve that output significantly.

I think this is it. Some people are willing to invest the time in writing natural language code for the LLM.

> I spent time tweaking prompts, writing rule files, and experimenting with different models, I could improve that output significantly. But this is being sold to me as a productivity tool. I've got code to write, and I'm pretty sure I can write it fairly quickly myself, and I simply don't have time at my startup to muck around with babysitting an AI all day -- I have human junior engineers that need babysitting.

I agree, and I think this is the divide: skeptical people think this is a flimsy patch that will eventually collapse. I for one can't see how trying to maintain ever-growing files in natural language won't lead to a huge cognitive load quite soon, and I bet we're about to hear people discussing how to use LLMs to do that.


> This is really one of the hugest divides I've seen in the discourse about this: anti-LLM people saying very obviously untrue things, which is uh, kind of hilarious in a meta way.

> https://bsky.app/profile/caseynewton.bsky.social/post/3lo4td... is an instance of this from a few days ago.

Not sure why this is so surprising? ChatGPT search was only released in November last year, was a different mode, and it sucked. Search in o3 and o4-mini came out like three weeks ago. Otherwise you were using completely different products from Perplexity or Kagi, which aren't widespread yet.

Casey Newton even half acknowledges that timing ("But it has had integrated web search since last year"), even while in the next comment criticising criticisms that rely on things "you half-remember from when ChatGPT launched in 2022".

If you give the original poster the benefit of the doubt, you can sort of see what they're saying, too. An LLM, on its own, is not a search engine and can not scan the web for information. The information encoded in them might be ok, but is not complete, and does not encompass the full body of the published human thought it was trained on. Trusting an offline LLM with an informational search is sometimes a really bad idea ("who are all the presidents that did X").

The fact that they're incorrect when they say that LLMs can't trigger search doesn't seem that "hilarious" to me, at least. The OP post maybe should have been less strident, but it also seems like a really bad idea to gatekeep anybody wanting to weigh in on something if their knowledge of product roadmaps is more than six months out of date (which I guarantee is all of us for at least some subject we are invested in).


> ChatGPT search was only released in November last year

It is entirely possible that I simply got involved at a particular moment that was crazy lucky: it's only been a couple of weeks. I don't closely keep up with when things are released, I had just asked ChatGPT something where it did a web search, and then immediately read a "it cannot do search" claim right after.

> An LLM, on its own, is not a search engine and can not scan the web for information.

In a narrow sense, this is true, but that's not the claim: the claim is "You cannot use it as a search engine, or as a substitute for searching." That is pretty demonstrably incorrect, given that many people use it as such.

> Trusting an offline LLM with an informational search is sometimes a really bad idea ("who are all the presidents that did X").

I fully agree with this, but it's also the case with search engines. They do not always "encompass the full body of the published human thought" either, or always provide answers that are comprehensible.

I recently was looking for examples of accomplishing things with a certain software architecture. I did a bunch of searches, which led me to a bunch of StackOverflow and blog posts. Virtually all of those posts gave vague examples which did not really answer my question with anything other than platitudes. I decided to ask ChatGPT about it instead. It was able to not only answer my question in depth, but provide specific examples, tailored to my questions, which the previous hours of reading search results had not afforded me. I was further able to interrogate it about various tradeoffs. It was legitimately more useful than a search engine.

Of course, sometimes it is not that good, and a web search wins. That's fine too. But suggesting that it's never useful for a task is just contrary to my actual experience.

> The fact that they're incorrect when they say that LLMs can't trigger search doesn't seem that "hilarious" to me, at least.

It's not them, it's the overall state of the discourse. I find it ironic that the fallibility of LLMs is used to suggest they're worthless compared to a human, when humans are also fallible. OP did not directly say this, but others often do, and it's the combination that's amusing to me.

It's also frustrating to me, because it feels impossible to have reasonable discussions about this topic. It's full of enthusiastic cheerleaders that misrepresent what these things can do, and enthusiastic haters that misrepresent what these things can do. My own feelings are all over the map here, but it feels impossible to have reasonable discussions about it due to the polarization, and I find that frustrating.


If you've only been using AI for a couple of weeks, that's quite likely a factor. AI services have been improving incredibly quickly, and many people have a bad impression of the whole field from a time when it was super promising, but basically unusable. I was pretty dismissive until a couple of months ago, myself.

I think the other reason people are hostile to the field is that they're scared it's going to make them economically redundant, because a tsunami of cheap, skilled labor is now towering over us. It's loss-aversion bias, basically. Many people are more focused on that risk than on the amazing things we're able to do with all that labor.


These are mostly value judgments, and people are using words that mean different things to different people, but I would point out that LLM boosters have been saying the same thing for each product release: "now it works, you are just using the last-gen model/technique which doesn't really work (even though I said the same thing for that model/technique and every one before that)." Moreover, there still hasn't been significant, objectively observable impact: no explosion in products, no massive acceleration of feature releases, no major layoffs attributed to AI (to which the response every time is that it was just released and you will see the effects in a few months).

Finally, if it really were true that some people know the special sauce of how to use LLMs to make a massive difference in productivity but many people didn't know how to do that, then you could make millions or tens of millions per year as a consultant training everyone at big companies. In other words if you really believed what you were saying you should pick up the money on the ground.


> using words that mean different things to different people

This might be a good explanation for the disconnect!

> I would point out that LLM boosters have been saying the same thing

I certainly 100% agree that lots of LLM boosters are way over-selling what they can accomplish as well.

> In other words if you really believed what you were saying you should pick up the money on the ground.

I mean, I'm doing that in the sense that I am using them. I also am not saying that I "know the special sauce of how to use LLMs to make a massive difference in productivity," but what I will say is, my productivity is genuinely higher with LLM assistance than without. I don't necessarily believe that means it's replicable; one of the things I'm curious about is "is it something special about my setup or what I'm doing or the technologies I'm using or anything else that makes me have a good time with this stuff when other smart people seem to only have a bad time?" Because I don't think that the detractors are just lying. But there is a clear disconnect, and I don't know why.


There is so much tacit understanding by both LLM-boosters and LLM-skeptics that only becomes apparent when you look at the explicit details of how they are trying to use the tools. That's why I've asked in the past for recordings of real-time development that would capture all the nuance explicitly. Cherry-picked chat logs are second best, but even then I haven't been particularly impressed by the few examples I've seen.

> I mean, I'm doing that in the sense that I am using them.

My point is that whatever you are doing is worth millions of dollars less than teaching the non-believers how to do it, if you could figure out how (actually, probably even if you couldn't but sold snake oil).


> I am still trying to sort out why experiences are so divergent. I've had much more positive LLM experiences while coding than many other people seem to, even as someone who's deeply skeptical of what's being promised about them. I don't know how to reconcile the two.

As with many topics, I feel like you can divide people in a couple of groups. You have people who try it, have their mind blown by it, so they over-hype it. Then the polar opposite: people who are overly dismissive and cement themselves into a really defensive position. Both groups are relatively annoying, inaccurate, and too extreme. Then another group of people might try it out, find some value, integrate it somewhat, maybe get a little productivity boost, and move on with their day. Then a bunch of other groupings in-between.

Problem is that the people in the middle tend to not make a lot of noise about it, while the extremists (on both ends) tend to be very vocal about their preference, each in their own way. So you end up perceiving the whole thing as very polarizing. There are many real and valid drawbacks to LLMs as well, but that also ends up poisoning the entire concept/conversation/ecosystem for some people, and they tend to be noisy as well.

Then the whole experience depends a lot on your setup, how you use it, what you expect, what you've learned, and so much more, and some folks are very quick to judge a whole ecosystem without giving parts of it an honest try. It took me a long time to try Aider, Cursor and others, and even now, after I've tried them out, I feel like there are probably better ways to use this new category of tooling we have available.

In the end I think reality is a bit less black/white for most folks; the common sentiment I see and hear is that LLMs are probably not hellfire ending humanity, nor are they digital-Jesus coming to save us all.


> I feel like you can divide people in a couple of groups.

This is probably a big chunk of it. I was pretty anti-LLM until recently, when I joked that I wanted to become an informed hater, so I spent some more time with things. It's put me significantly more in the middle than either extremely pro or extremely anti. It's also hard to talk about anything that's not purely anti in the spaces I seemingly run in, so that also contributes to my relative quiet about it. I'm sure others are in a similar boat.

> for most folks; the common sentiment I see and hear is that LLMs are probably not hellfire ending humanity, nor are they digital-Jesus coming to save us all.

Especially around non-programmers, this is the vibe I get as well. They also tend to see the inaccuracies as much less significant than programmers seem to, that is, they assume they're checking the output already, or see it as a starting point, or that humans also make mistakes, and so don't get so immediately "this is useless" about it.


> anti-LLM people saying very obviously untrue things, which is uh, kind of hilarious in a meta way.

tptacek shifted the goal posts from "correct a hallucination" to "solve a copy pasted error" (very different things!) and just a comment later there's someone assassinating me as an "anti-LLM person" saying "very obviously untrue things", "kind of hilarious". And you call yourself "charitable". It's a joke.


EDIT: wait, I think you're tptacek's parent. I was not talking about your post, I was talking about the post I linked to. I'm leaving my reply here but there's some serious confusion going on.

> there's someone assassinating me as an "anti-LLM person"

Is this not true? That's the vibe the comment gives off. I'm happy to not say that in the future if that's not correct, and if so, I apologize.

I myself was pretty anti-LLM until the last month or so. My opinions have shifted recently, and I've been trying to sort through my feelings about it. I'm not entirely enthusiastically pro, and have some pretty big reservations myself, but I'm more in the middle than where I was previously, which was firmly anti.

> "very obviously untrue things"

At the time I saw the post, I had just tabbed away from a ChatGPT session where it had relied on searching the web for some info, so the contrast was very stark.

> "kind of hilarious"

I do think it is kind of funny when people say that LLMs occasionally hallucinate things and are therefore worthless, while others make false claims about them for the purpose of suggesting we shouldn't use them. You didn't directly say this in your post, only handwaved towards it, but I'm talking about the discourse in general, not you specifically.

> And you call yourself "charitable"

I am trying to be charitable. A lot of people reached for some variant of "this person is stupid," and I do not think that's the case, or a good way to understand what people mean when they say things. A mistake is a mistake. I am actively not trying to simply dismiss arguments on either side here, but to take them seriously.


> I am still trying to sort out why experiences are so divergent

I suspect part of it is that there still isn't much established social context for how to interact with an LLM, and best practices are still being actively discovered, at least compared to tools like search engines or word processors.

Search engines somewhat have this problem, but there's some social context around search engine skill, colloquially "google-fu", if it's even explicitly mentioned.

At some point, being able to get the results from a search engine stopped being entirely about the quality of the engine and instead became more about the skill of the user.

I imagine that as the UX for AI systems stabilizes, and as knowledge of the "right way" to use them diffuses through culture, experiences will be less divergent.


> I suspect part of it is that there still isn't much established social context for how to interact with an LLM, and best practices are still being actively discovered, at least compared to tools like search engines or word processors.

Likely, but I think another big reason for diverging experiences is that natural language is ambiguous, and human conversation leaves out a lot of explicit details because they can be inferred or assumed.

I can't speak for others, but it's difficult for me to describe programming ideas and concepts using natural language; that's why we have programming languages for this: a language that is limited in vocabulary and explicit in conveying your meaning.

Natural language is anything but, and it can be difficult to be exact. You can instinctively leave out all kinds of details using natural language, whereas leaving out those details in a programming language would cause a compiler error.

I've never really understood the push toward programming with natural language, even before LLMs. It's just not a good fit. And much like how you can pass specific parameters into Google, I think we'll end up getting to a place where LLMs have their own DSL for prompting to make it easier to get the result you want.


So is the real engineering work in the agents rather than in the LLM itself then? Or do they have to be paired together correctly? How do you go about choosing an LLM/agent pair efficiently?


> How do you go about choosing an LLM/agent pair efficiently?

I googled "how do I use ai with VS Code" and it pointed me at Cline. I've then swapped between their various backends, and just played around with it. I'm still far too new to this to have strong opinions about LLM/agent pairs, or even largely between which LLMs, other than "the free ChatGPT agent was far worse than the $20/month one at the task I threw it at." As in, choosing worse algorithms that are less idiomatic for the exact same task.


I also wonder how hard it would be to create your own agent that remembers your preferences and other stuff that you can make sure stays in the LLM context.

...Maybe a good first LLM assisted project.
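
A minimal sketch of the idea, assuming the OpenAI Python client (the model name and the preferences file are just placeholders I made up):

    from pathlib import Path
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    # Your standing preferences, e.g. "prefer the standard library, write tests with pytest"
    preferences = Path("preferences.md").read_text()

    def ask(prompt: str) -> str:
        # The preferences ride along as the system message on every call,
        # so they never fall out of the context.
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[
                {"role": "system", "content": "Follow these preferences:\n" + preferences},
                {"role": "user", "content": prompt},
            ],
        )
        return response.choices[0].message.content

    print(ask("Write a function that parses ISO 8601 timestamps."))

That only covers the "remembers your preferences" part; looping the model's edits and tool calls back in is where it starts to become a real agent.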


No need to write your own whole thing (though it is a good exercise) — the existing tools all support ways of customizing the prompting with preferences and conventions, whether globally or per-project.


The Aider leaderboards are quite helpful; performance there seems to match people's subjective experience pretty well.

https://aider.chat/docs/leaderboards/

Regarding choosing a tool, they're pretty lightweight to try and they're all converging in structure and capabilities anyway.


I think it is pretty simple: people tried it a few times a few months ago in a limited setting, formed an opinion based on those limited experiences and cannot imagine a world where they are wrong.

That might sound snarky, but it probably works out for people in 99% of cases. AI and LLMs are advancing at a pace that is so different from any other technology that people aren't yet trained to re-evaluate their assumptions at the high rate necessary to form accurate new opinions. There are too many tools coming (and going, to be fair).

HN (and certain parts of other social media) is a bubble of early adopters. We're on the front lines seeing the war in realtime and shaking our heads at what's being reported in the papers back home.


Yeah, I try to stay away from reaching for these sorts of explanations, because it feels uncharitable. I saw a lot of very smart people repost the quoted post! They're not the kind who "cannot imagine a world where they are wrong."

But at the same time, the pace of advancement is very fast, and so "they simply haven't re-evaluated things recently" is a significantly more likely explanation, while also being a more charitable one, I think.


My language is inflammatory for certain, but I believe it is true. I don't think most minds are capable of reevaluating their opinions as quickly as AI is demanding. There is some evidence that stress is strongly correlated with uncertainty. AI is complicated, the tools are complicated, the trade-offs are complicated. So that leaves a few options: live in uncertainty/stress, expend the energy to reevaluate, or choose to believe in certainty based on past experience.

If someone is embracing uncertainty or expending the time/energy/money to reevaluate then they don't post such confidently wrong ideas on social media.


> an instance of this from a few days ago.

Bro I've been using LLMs for search since before they even had search capabilities...

"LLMs not being for search" has been an argument from the naysayers for a while now, but very often when I use an LLM I am looking for the answer to something - if that isn't [information] search, then what is?

Whether they hallucinate or outright bullshit sometimes is immaterial. For many information retrieval tasks they are infinitely better than Google and have been since GPT3.


I think this is related, but I'm more interested in the factual aspects than the subjective ones. That is, I don't disagree that there are also arguments over "are LLMs good for the same things search engines are for", but I'm focused on the more objective "they do not search the web" part. We need to have agreement on the objective aspects before we can have meaningful discussion of the subjective, in my mind.



