Hacker News | overgard's comments

"Clever" is not a word you want to hear in front of accounting.

Definitely agree with this. Even without a large backlog, one of the things I find working on my personal project/product where I'm simultaneously the engineer/designer/project manager is it's really easy to ask the LLM to implement an idea I've been mulling for an hour or two, it one-shots it and I'm happy, and then a week or two later it starts to dawn on me that the feature was maybe not a great idea. Which isn't on the LLM, but I know when I write features by hand I tend to reach the "this is a bad idea" conclusion a lot earlier because I'm directly confronted with the cases where it won't work out, I have to think a lot harder about corner cases, etc.

Where I think/hope this goes is instead of using LLMs to go faster, we use them to do better work. I'd rather someone vibe code up better ways to test things, or use it to do in-depth code analysis and bug fixing, etc., than just pile in features.


The good thing is since the feature was cheap to implement, you can just say "this was a bad idea" and remove it, as long as adding that feature wasn't a one way decision. People are typically more reticent to remove things that were hard to implement, even if that's the right thing to do.

That's the problem, even with an LLM, removing a feature two weeks later can be a nightmare because things have grown to depend on it. In a way it's even harder because the velocity of stuff piling in is much greater.

> you can just say "this was a bad idea" and remove it

Have you ever actually done that in a "serious" product, or just made it up?

In any product with actual customers, especially those from other (big) companies, features don't just go away at the snap of a finger. Otherwise, have fun discovering users moving to your competitors.

Anecdotally, another company's product removed or significantly changed important features our users rely on every day, several times and with short notice. We didn't hesitate to migrate to another service, and the migration was complete within about a month.


> People are typically more reticent to remove things that were hard to implement, even if that's the right thing to do.

Careful. The sunk cost fallacy isn't just about time, it's also about money, and people may naturally be reluctant to remove bad features that cost them a lot of tokens, especially if the act of removal itself is going to cost even more tokens.


That's a pretty good point, and I assume at $work they wouldn't appreciate throwing away $n dollars worth of code.

"my personal project"

Sure.


That’s why I usually start with the simplest version of something I want, which is usually a shell or a Perl script. Once I've nailed down my workflows and needs, that’s when I build a proper program. This is one of the reasons I like CLI programs. They’re like Lego blocks for workflows.

Same thing with bigger projects. Write the most minimalistic version of a feature, and then ship it (or do a round of testing). Iterate based on feedback.


I feel like with the hype cycle and constant publishing of sketchy claims that I pretty much daily have an "oh shit" moment followed by a "nope, everything is about the same" moment. It's frankly exhausting. It's hard for me to recall a subject that has irritated me as much over a period of years, and it's barely even about AI itself but instead just feeling harassed with the constant anxiety and rage baiting.

Pretty good take. I don't really get the feelings of anxiety, but sometimes I'm working and I'm like, "I'm flying, this is so fast!" And then everything comes crashing down when I can't figure out one last bug.

I felt the same way, then I started with "I'll believe it when I see it". Now I'm a bit happier.

I mean, I don't think commits are the place for tool attributions. I want to know what the change was, I'm not really interested in your tool selection (put that in the PR if it's relevant). It'd be just as irrelevant to see "written on my MacBook in Neovim".

Depends on what the Claude attribution actually means. A lot of people will just get the thing building and then ship. To me that attribution is generally a red flag.

It means “this contribution likely infringes someone else’s copyright.”

FWIW, I asked ChatGPT to review the article just for my amusement. Its conclusion was:

"My honest assessment is that this is a competent calculation performed on a badly confounded measurement, followed by conclusions substantially stronger than the calculation warrants. It is useful as a rebuttal to “the Claude releases are obviously unprecedented disasters,” but not as evidence that Claude was harmless."


The TLDR seems to be: needs more data.

Yeah, my switch to Linux and Mac for most things is more about just finding Microsoft's policies so obnoxious and hostile that I just won't deal with them anymore, even if I have to deal with more technological hassles. The only reason I haven't completely nuked my Windows partition is because I can at least use Rufus to turn off the worst stuff. But frankly, the amount of software that keeps me on Windows is dwindling fast, and every time Windows Update resets my browser to fucking Edge or signs me into a Microsoft account system-wide without my consent I just get that much closer. It feels like malware at this point.

There will always be shitty LinkedIn posts.

I just watched Copilot today turn an 8-line fix into 500 lines, so, yeah, verbosity is a big side effect.

It occurs to me this pattern might be the average code we humans have produced. We all have made those quick fixes, copy-pastas, and dirty hacks... they learned it somewhere! I also assume that some of the behavior is an artifact of their training regime.

So with LLMs outputting average code, and people using LLMs more and more, I guess the average code will become worse over time?

There is a belief that everyone is just taking whatever the LLM (really agents now) outputs. This is not the case anywhere I work. We use human oversight to have it iteratively improve the code. The average quality is going up.

Do you have some metrics to back this up? Because from what I can see with my own eyes, there are outages everywhere and security holes everywhere too, so it doesn't seem that things are improving.

Not advocating for AI code slop--but if AI coded software works correctly, maybe it doesn't matter? Except sometimes when a specialist will have to get involved. Not a perfect analogy, but most people don't write assembly these days--they have a compiler do that. Assembly still has a place, but it's a specialist task.

> if AI coded software works correctly, maybe it doesn't matter?

The problem isn't the amount of code, it's how fitting/unfitting the abstractions are. Wrong abstractions are bugs in waiting. If there's much code with wrong abstractions, future change becomes difficult.

Source: me, I've created many bad abstractions and they led to much pain...


Yeah. It's kind of strange - Claude is great at some tasks, but it seems really rubbish at coming up with good abstractions a lot of the time. I've often caught it making a conceptual mistake (like "X cannot do Y") - then spending hundreds of lines working around an issue that doesn't actually exist.

It's also really bad at inventing and leaning on invariants. I make rules in my code all the time - "by the time we get to path X, we know Y and Z are true." In aggregate, these invariants make code simpler and easier to reason about. But Claude doesn't do that. It just kind of - slops through and adds bespoke "just in case" workarounds all over the place. Every time I read through code it's written - without fail - I find bad design / architectural choices.

Maybe Mythos will change this. But for now I've slowed way down on my Claude Code usage. You can't build a skyscraper on a foundation of mud.
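
To make the invariant point concrete, here is a minimal Python sketch. Every name in it (`Order`, `parse_order`, `total_cents`) is hypothetical and invented for illustration, not taken from any codebase discussed in this thread. The idea is to check untrusted input once at the boundary, so that code further down the path can lean on what is already known to be true instead of re-checking "just in case". It assumes every `Order` is built through `parse_order`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """Hypothetical type. Invariant: quantity >= 1 and unit_price_cents >= 0."""
    quantity: int
    unit_price_cents: int


def parse_order(raw: dict) -> Order:
    """Boundary: the only place where untrusted input is checked."""
    quantity = int(raw["quantity"])
    unit_price_cents = int(raw["unit_price_cents"])
    if quantity < 1:
        raise ValueError("quantity must be at least 1")
    if unit_price_cents < 0:
        raise ValueError("unit_price_cents must not be negative")
    return Order(quantity, unit_price_cents)


def total_cents(order: Order) -> int:
    """Past the boundary: no defensive fallbacks, the invariant already holds."""
    return order.quantity * order.unit_price_cents
```

The "just in case" style would instead repeat the quantity and price checks inside `total_cents` and every other consumer, which is roughly the pile of workarounds described above.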


Yeah, I have a contract project for a webapp/integration to a legacy Excel tool with an API endpoint for exchanging data with Excel. Over time, I'd notice issues or need to add functionality in the data processing, and I hadn't been closely watching the code changes Claude Code made to the API as long as it worked as expected/tests passed.

When I eventually read through the current state of the upload processing code, it was like an absurd tree of checks on checks on fallbacks on triple checks, added in a bizarrely additive way in response to whatever bug I reported. It could be massively simplified (which would also make it less brittle to the edge cases that then demanded more checks and workarounds).

The other issue is that the upload API has documentation, but not for every little bug or edge case. So each time the model "wakes up" and loads everything into context, it sees that crazy web of checks and edge cases as the only source of truth for the API and is hesitant to touch anything unless 100% necessary. That leads to more conservative, additive code, which makes the problem worse over time.

Codex seems a bit better, but I still have to guide it towards proper abstractions/refactors to avoid that piling-on-cruft effect.


More verbose code takes up more space in the context. It's harder for humans to review, but also harder for future AIs to edit. Unless you manage to keep the AI to firm module boundaries & have it replace modules wholesale it's not really equivalent to how assembly gets replaced wholesale when a compilation unit changes. Compilers aren't editing the `.o` files when you rebuild, they throw the old ones out & replace them. But when you prompt an AI it is reading & editing the source files, so excess verbosity in the source files is detrimental.

Well, if tokens = cost, and verbosity = more tokens, then smaller code is a financial (and human!) win. Although I'm worried vibe coders are just going to have LLMs modify minified code in caveman mode so they can have 100 agents in a swarm.

On a more serious note, I wonder if this might eventually encourage people to use languages that are a little harder to write but much more concise (functional languages for instance). When you're paying per token, enterprise-bean Java-style verbosity totally sucks.


But the truth is: it doesn't work correctly. I've seen the quality of software drop significantly.

At work we are integrating with a third-party platform to automate Excel-powered calculations. It is awful. Rendering the table in the browser takes 10s, and one click on the Export button will throw the backend into an OutOfMemory state.


AI mirrors the code around it. So if there is bad code or good abstractions, it's going to do the same. Even with good code, it will do bad things, you have to remain in the loop and catch these. It can write good code, it just needs nudging.

I don't disagree there is a lot of slop being produced right now, but I'm still optimistic in the long run.


In my case, where I see it most often is when the LLM has to rework something multiple times, and the feedback loop is vague (especially when all I have to give it is "no error messages, but it's still broken"). It seems like after the third or fourth try it just kinda goes off the rails. I find that the one-shot quality tends to be a little better, if the slot machine happened to work correctly that time.

You shouldn't be using an LLM directly (web chat style). A proper harness allows an agent to see the errors itself and correct as needed. You can then correct it at higher, more meaningful levels.

If you can make it 800 you can claim to be a 100x engineer!

Missed opportunity! I obviously have skill issues.

Real question is, are you a 100x prompter?

I haven't tried to make a TUI admittedly, but double buffering is the oldest technique on the planet. A TUI doesn't even need to pay the cost of a lot of pixels since its effective resolution is much lower.

Long long time ago, I used to do some graphics stuff in 320x240, which uses a whopping 64KB per buffer, and still has more resolution than a terminal.

In 1GB I could probably fit all the buffers to double-buffer all the TUIs in a whole country. Well, maybe not. But it's likely not that far off.
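
For anyone who hasn't seen it applied to a terminal, here is a rough Python sketch of the idea, not a production renderer. The class name `DoubleBufferedScreen` and its methods are made up for illustration. It keeps a front buffer (what the terminal is assumed to show) and a back buffer (what you draw into), and on flush emits only the cells that differ, using the standard ANSI cursor-position sequence `ESC [ row ; col H`. Strictly speaking this is the diff-and-copy variant rather than a pointer swap.

```python
class DoubleBufferedScreen:
    """Hypothetical helper: draw into a back buffer, emit only changed cells."""

    def __init__(self, rows: int, cols: int) -> None:
        self.rows = rows
        self.cols = cols
        # front: what the terminal is assumed to show; back: what we draw into.
        self.front = [[" "] * cols for _ in range(rows)]
        self.back = [[" "] * cols for _ in range(rows)]

    def put(self, row: int, col: int, text: str) -> None:
        """Write text into the back buffer, clipped to the screen bounds."""
        for offset, ch in enumerate(text):
            if 0 <= row < self.rows and 0 <= col + offset < self.cols:
                self.back[row][col + offset] = ch

    def flush(self) -> str:
        """Return escape sequences for changed cells, then sync the buffers."""
        out = []
        for r in range(self.rows):
            for c in range(self.cols):
                if self.back[r][c] != self.front[r][c]:
                    # ANSI cursor position is 1-based: ESC [ row ; col H
                    out.append(f"\x1b[{r + 1};{c + 1}H{self.back[r][c]}")
                    self.front[r][c] = self.back[r][c]
        return "".join(out)
```

Usage would be something like `sys.stdout.write(screen.flush())` once per frame. An 80x24 terminal is only 1,920 cells, so both buffers together are tiny, and an unchanged frame flushes to an empty string.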

