When AI promises speed but delivers debugging hell (nsavage.substack.com)
200 points by nsavage 2 days ago | 252 comments





Coding is trying to order bytes into doing arbitrary stuff that is useful because of some transient conjunction of factors in the real world.

We have developed programming languages because coding in machine language is horrible, and over the decades we've refined them into tools people can use fluently, thinking directly in code when they have to make a computer system behave in a certain way.

Only someone who has never built anything of significant complexity and utility can think that putting natural language encoding between you and the bytes is a net positive.


The hype is around having AI replace your typing, that is, to code for you.

The hype should not be around replacing the typing, but around assisting your thoughts.

When you code, there's the dialogue in your brain that thinks about the code and forms the questions you know you must answer before you can transition to the dialogue with the machine, that is, typing code.

And in this first part LLMs can be extremely useful; it may come to the point where you select a line and explain your intent, and while the AI retrieves documentation and possible solutions, you reason about the problem and then pick and choose from what it has collected for you.

> Only someone who has never built anything of significant complexity and utility can think that putting natural language encoding between you and the bytes is a net positive.

The question is then if assistants are "sitting next to you", like a secretary and a mentor, or if they are sitting between you and the editor, as the thing you need to control.

An assistant can be a really effective refinement in the programming process. Even so far that it ends up motivating you instead of you constantly getting demotivated by hitting the wall of "not another problem that I need to solve before I can really continue" (which happens all too often).


Personally, I haven't found LLMs to be helpful for the internal dialogue. Even with a lot of exposition, code samples, and documentation, they always provide either obvious solutions (store items in a vector!) or pointless modifications (use map and filter instead of reduce!), or they just make up APIs that don't exist.

I think it's really good on the 101-level academic side. Learning the basics of anything in a conversational manner can be massively helpful.

As soon as your situation exceeds textbook level, I've found them to always be a waste of my time, and nothing I've seen as of late makes me think they're trending in a direction to be helpful in this scenario.


> As soon as your situation exceeds textbook level, I’ve found them to always be a waste of my time

Doubly so. A very knowledgeable and helpful tutor would say ”you’re asking for advanced or detailed guidance. I don’t know the specifics, but I can point you to some places that you’d be able to find these answers on your own”.

What the AI does is continue its babbling confidently while being incredibly wrong and nonsensical, like a person suffering a stroke or concussion whose mannerisms are normal but who doesn't remember their name. It seems completely unable to judge its own knowledge (probably because it is).


Yes, it would have to be unable to reason about its own confidence levels, wouldn’t it? It produces content—and, as far as I understand, sort of simulates basic reasoning—by making predictions based on a huge corpus of text. The larger this corpus becomes, and the more sophisticated its method of analyzing it, the better it becomes at “reasoning about” the things described in its corpus.

But the question “How sure are you?” inherently refers to something—the LLM’s “mental state”, if such a thing can be said to exist—that isn’t referred to anywhere in the corpus. No improvement in the quality of the corpus or the power of the predictions made based on the corpus can have any impact on this problem.


I want LLMs to answer simple, but tedious questions that arise when I do the thinking. I want them to help me find relevant sections in multi-thousand page datasheets regardless of whether I happened to use the same synonym that documentation's author has used. I want them to remind me the meaning of a term that was defined 12 chapters ago without me having to context switch and look for it again. I want them to consolidate information spread over 6 PDFs that I need to look at to understand something.

I want them to be an interface between me and reliable resources. I want them to essentially facilitate ad-hoc fact retrieval without requiring me to master field-specific jargon first. I don't need them to do any thinking, that's my job. I don't want them to try to answer my questions, I want them to point me to resources that let me answer them and save me the time on searching in the process. You don't need to know where to look beforehand if you're a machine that can ingest whole libraries in seconds - so let me actually benefit from this power rather than provide me with a sketchy equivalent of a clueless intern trying to make a good impression and not realizing how tremendously bad they are at it.

I believe LLMs (or, more specifically, complex systems utilizing LLMs) can end up being incredible productivity boosters, but right now they're being so hopelessly misapplied that it will take a good while for them to get there. LLMs can already be somewhat helpful if you approach them carefully, but they're still far from life-changing - unless your life can be changed by reducing the amount of boilerplate you need to type to code, in which case I guess you're already happier.


Where I find coding assistants the most useful _is_ in writing code that I already want to write.

A la: I need to write this unit test, it has these checks, it validates these methods.

Or write a log message for me about what error was encountered here. Those are annoying to write out, but often the LLM has enough context that I just start to write and it completes it appropriately.

All of these are things I can easily do myself and are easy to validate for correctness, but writing them would consume my limited mental energy for the day.
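
For concreteness, here's a minimal sketch (in Python, with made-up names like parse_price) of the kind of rote test-and-logging code I mean - the checks are obvious and easy to review, just tedious to type:

    # Hypothetical example of the rote code described above; parse_price
    # and its behavior are invented for illustration only.
    import logging
    import pytest

    logger = logging.getLogger(__name__)


    def parse_price(raw: str) -> float:
        """Hypothetical function under test."""
        try:
            return round(float(raw.strip().lstrip("$")), 2)
        except ValueError:
            # The annoying-to-write log message about which error was hit.
            logger.error("Failed to parse price from input %r", raw)
            raise


    def test_parse_price_strips_currency_symbol():
        assert parse_price("$19.99") == 19.99


    def test_parse_price_rejects_garbage():
        with pytest.raises(ValueError):
            parse_price("not a number")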


Sounds like everyone has different use cases for LLMs, which makes sense; we need help with different things.

Personally, I try to get rid of writing boilerplate (which it sounds like you're talking about?) entirely; no one should write that.

Instead I get help from LLMs in areas where I'm not super great, like math. I know what results I want, but I'm not sure how to get them. So I set up a bunch of tests and the interface I want, then let the LLM figure out the implementation itself.
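
Roughly what that workflow looks like, as a hypothetical Python sketch - the interface and the test are mine, and only the body of fit_line would be left to the LLM:

    # Hypothetical illustration: I write the signature and the test below;
    # the implementation body is what I'd hand off to the LLM.
    from typing import Sequence, Tuple


    def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
        """Return (slope, intercept) of the least-squares line through the points."""
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
            (x - mean_x) ** 2 for x in xs
        )
        return slope, mean_y - slope * mean_x


    def test_fit_line_recovers_known_coefficients():
        xs = [0.0, 1.0, 2.0, 3.0]
        ys = [1.0, 3.0, 5.0, 7.0]  # exactly y = 2x + 1
        slope, intercept = fit_line(xs, ys)
        assert abs(slope - 2.0) < 1e-9
        assert abs(intercept - 1.0) < 1e-9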


I find it is good at writing about 90% of tests, either from requirements or existing code.

It is terrible once you get 3 or 4 tweaks into the main parts of the code base. Just like autocorrect, it degrades over time.


> Where I find coding assistants the most useful _is_ in writing code that I already want to write.

... for a very specific definition of "code that I already want to write". :-)

The code that I want to write often involves very novel (and sometimes quite mathematical) concepts, or code that is "low-level wizardry". Thus the AI, almost by definition, will commonly fail at these tasks.

Also, this kind of code often (though not always) has the property that "if you do something wrong, everything falls apart", i.e. it is the kind of problem with a low tolerance for any "AI hallucinations", meaning the code is either (nearly) perfectly right or not helpful. Debugging any wrong code (even if you have test cases available) will commonly take a very long time.


Not everyone is the same, but I have a different view of programming than what you describe. Most tasks involve thinking about the domain and what implementation techniques to use while trying to reduce the technical debt in the project. For the domain, I talk to people, rely on past experience (mine or others'), or do research. For implementation techniques I look at other people's code, read books, or ask someone more experienced. Both are heavily influenced by the context, aka what already exists in the project and the constraints that I have to deal with. I heavily distrust LLMs because they cannot assimilate the context like an experienced person and provide me direction based on experience. Why experience? Because the problem and the constraints always exist in the real world.

The hype in some areas is around replacing coders, which is a fantasy without orders of magnitude better systems.

Yes, that's exactly it. Many corporations firing coders because "AI can do that" and then discovering that AI can't.

It's offshoring 2.0. $developing_country is cheaper so let's hire them, and why is everything broken now?

This fantasy economy is based entirely on Numbers Must Go Up. Everyone seems to have forgotten the numbers are linked to real people in a physical world, and the real people and the physical world both have rules and consequences of their own which don't care about quarterly returns.


> This fantasy economy is based entirely on Numbers Must Go Up.

Everyone with any form of savings wants numbers to go up. Everyone who wants to retire wants numbers to go up. The financial industry is just doing what their customers want, and anyone with net-positive wealth is a customer.

The problem is they can't forever. The Earth is not infinitely large and there is not infinite demand. Even if we did settle the Moon or Mars, it probably wouldn't make a huge difference to the terrestrial economy because of the distances involved. We'd just have founded another economy out there that would follow its own growth and maturation curve.

I think this is why you see such an outright panic right now about birth rates. If people don't have more and more kids, number can't go up. I'm predicting -- to the extent that it's not already here -- the emergence of something I'm calling authoritarian natalism. The government will try to more or less force people to have kids, probably by taking away women's rights and/or taxing people who are childless. It won't work, but it will be attempted in some places.

Eventually the financial industry as we know it will collapse. It's inevitable. A major political question of the 21st century might be how much pain we inflict on ourselves in an attempt to save it.

I know a lot of people will cheer for that, but it will also likely mean the end of low to zero risk compound interest on savings and the end of retirement among other things. I don't expect retirement as an institution or widespread practice to survive much longer than 20-30 more years.


> It's offshoring 2.0

Doesn’t this analogy break down inasmuch as offshoring eventually worked?


T-shirts are made in Bangladesh while code is still being made by extremely well compensated professionals in developed countries. That's despite code being easier to transport internationally than clothes. It's hard to say that it's worked yet.

> while code is still being made by extremely well compensated professionals in developed countries

T-shirts are also being made in small print shops across America. As is code being written on the cheap in Poland, Brazil and India.


Sure, but I don't think the code being produced in the US comes from "small code shops" producing artisanal code. It's not about whether code is produced in other places; it's whether it's produced at high scale locally. I don't think it's a problem that Poland and India have programmers.

A lot of code is made by people in countries with relatively low incomes, eg eastern Europe. Most of the software in your car is probably written by an offshore team for example.

That probably explains why most people use Android Auto and Apple CarPlay!

Probably not for their engine control unit or their abs controller.

The problem is not with offshoring, per se but the mentality of the person doing the offshoring and why they are doing it.

Did it?

It did, e.g. Poland has a very good bang-for-buck ratio. Something like Bangladesh wouldn't work for IT, but more developed countries do.

The challenge with Poland is that, if there is going to be a WW3 soon, it looks likely to be on the immediate front lines. So it is relatively high risk from a business-disruption perspective.

I think there is a skill issue. Just like in any other pursuit, some people are going to be better at using AI productively. It is a tool. You are still responsible for the quality of the resulting code whatever the mix of human and tool generated.

Hard disagree.

An assistant should not help you think; any AI agent/tool should do what you want with a minimal amount of explanation.

The only way I accept the current hype is if I am able to type in "make a Twitter clone" and it does the implementation; I can run it, write "make it red, silver and yellow color themed", and it does just that. I am the one doing the thinking here - I don't care about technical details. That should be the state of the art.

I can write my own Twitter clone, and if I have to write prompt after prompt it is going to take me more time and more typing, so it is useless.

A person who cannot write their own Twitter clone is not going to prompt their way to a working, deployed Twitter clone.


This reminds me of something Dijkstra wrote almost 50 years ago in On the foolishness of "natural language programming" [1]:

> When all is said and told, the "naturalness" with which we use our native tongues boils down to the ease with which we can use them for making statements the nonsense of which is not obvious.

Although this is obviously not about LLMs, it's astonishing how many parallels can be drawn to today's usage of AI systems.

1: https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD667...


Wow, thanks for sharing this beautiful essay by a legend; it captures the essence of the LLM debate.

That is, the use of formal languages - although an evil (since they are not natural human languages) - is essential to avoid nonsense (i.e. hallucinations). While intuition/natural language is more imaginative, formalism (i.e. narrow interfaces) is a forcing function to make things work.

According to Dijkstra, relying on natural languages would have regressed civilization by thousands of years (because of the nonsense and imprecision)! So expect thousands of years of LLM hell if we adopt them to replace our formal languages.

Indeed, math and other symbolism/formalism is a crowning achievement of humans.


I use AI (Sonnet) to write relatively complex SQL queries, but they always need a review before implementation. It's sometimes brilliant, and at the same time it can suggest wrapping a single, atomic query in a transaction for no reason. Just asking "why?" will result in profuse apologies and thank-yous, but it never explains what caused the mistake in the first place.
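
For illustration (my own made-up example, using Python's sqlite3 just to make it concrete), this is the kind of needless suggestion I mean - a single UPDATE is already atomic, so the explicit transaction around it adds nothing:

    # Hypothetical example; the accounts table and values are invented.
    import sqlite3

    conn = sqlite3.connect("app.db", isolation_level=None)  # plain autocommit mode
    conn.execute(
        "CREATE TABLE IF NOT EXISTS accounts (id INTEGER PRIMARY KEY, status TEXT)"
    )

    # What the assistant suggests: a transaction around one atomic statement.
    conn.execute("BEGIN")
    conn.execute("UPDATE accounts SET status = 'active' WHERE id = ?", (42,))
    conn.execute("COMMIT")

    # The same thing without the ceremony; the single statement is atomic on its own.
    conn.execute("UPDATE accounts SET status = 'active' WHERE id = ?", (42,))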

Trusting an AI to code an app from start to finish seems crazy to me but hey... if some people can pull it off, good for them I guess.


That's the type of thing AI is great for! I find AI is pretty decent at generating data-related code with Pandas, and since I only rarely use Pandas it saves me a ton of time relearning everything.
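
As a hypothetical example of the "relearn it every time" Pandas code I have in mind (the data and column names are invented):

    # Made-up data; the point is the groupby/period idiom I always forget.
    import pandas as pd

    df = pd.DataFrame(
        {
            "order_date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-03"]),
            "amount": [120.0, 80.0, 45.5],
        }
    )

    # Monthly revenue totals.
    monthly_totals = (
        df.assign(month=df["order_date"].dt.to_period("M"))
          .groupby("month", as_index=False)["amount"]
          .sum()
          .sort_values("month")
    )
    print(monthly_totals)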

Where AI starts breaking down is in how to effectively incorporate a new feature into a complicated existing codebase. That is where we engineers can continue to hold an advantage.


This massively underestimates what current LLMs can do. Yesterday, I was able to create a 600-line script in 20 minutes or so that essentially sets up Cloudflare worker bindings (KV, Queues, Hyperdrive, etc.). The complexity is very low and debuggability is easy. Reading this infra code is fast. However, if I were to do this manually, it would have taken me a full day reading through the docs and trying the implementation back and forth for each binding I am connecting to.

Claude 3.5 did it on the first shot.


And 2 months later I get asked to debug code like yours when it doesn't work for a customer and have to spend days or weeks digging into your code before I notice the LLM took some shortcuts that work most of the time, but are ever so slightly broken in edge cases, followed by me having to rebuild it all from scratch.

I literally just spent a full week on such a project. Respectfully, fuck people who don't read the docs/spec.


Respectfully, I don't understand what your problem is. First, you don't have to debug any code; that's your personal choice. Second, whether LLMs are used or not is irrelevant. The original developer is the party who decides whether to check and double-check their work and whether to extensively verify what the code is doing (and maybe test it tightly). LLMs are not making this worse or better.

First:

> First, you don't have to debug any code. That's your personal choice.

To quote the Simpsons:

> Money can be exchanged for goods and services

Second:

> LLMs are not making this worse or better.

The bottleneck is never "writing code", it's "thinking". LLMs can solve one, but not the other. They allow you to produce more thought (and give you false confidence), but the code has less thought put into it, making it worse.


Maybe it's easier to verify a working program with the docs rather than try to build something from scratch? I agree it's a bad idea to fire off AI code without any sort of understanding though!!

Agree it's good for boilerplate, provided the thing you want to do is extremely basic / just setup. Once you need something slightly more complex it seems to break down rather quickly.

Claude 3.5 is pretty good at giving you the right hints though. If you aren't familiar with a library, it's definitely faster than grepping through docs. If you are an expert in a library, then it's pretty useless.


You still have to become a domain expert to debug it though?

Not when it's code that was only hard to write because you needed to know the right incantations to pipe data between different services.

Now you see the incantations that mostly work and the job of transforming it is easy.

Java's Bouncy Castle crypto library is a good example of this. The thing you're trying to do might be simple, but to do it, you might need to instantiate 8+ Java classes. It doesn't mean it's complex to read or hard to debug.


> The thing you're trying to do might be simple, but to do it, you might need to instantiate 8+ Java classes. It doesn't mean it's complex to read or hard to debug.

I’m skeptical that code that needs to instantiate eight separate classes will remain easy to debug in the general case.


LLMs give you a lot of false confidence, just because something looks right doesn't mean it is.

Especially with cryptography you should NEVER use LLMs. Read the docs, write down some notes, and make sure you properly understand everything before you use it. You need to really think it through before you end up leaking user data or worse.


> Only someone who has never built anything of significant complexity and utility can think that putting natural language encoding between you and the bytes is a net positive.

AppleScript might convince me of your argument.. ;-) But, seriously, we've been putting abstractions between "us and the bytes" ever since Fortran and COBOL appeared (and indeed, earlier). We can argue about the quality and expressiveness of those abstractions, and there are a lot of arguments against natural languages in this task, but the broad idea of putting things in between developers and machines is sound so it's worth continuing to explore IMHO.


All those layers are stable and deterministic. You can open them up and check, in most cases, how the upper layer calls the lower layer, should that be necessary.

If you have an LLM between you and the code, there is not even such a thing as ”source code”, only a history of prompts. You can't check your prompts into git and regenerate the same code later.

In fact, it’s more like the antithesis of the reproducible builds movement. It’s introducing a proprietary networked high latency chaos agent into the critical path.


I accept your points today, but I'm optimistic this problem will be resolved in the mid-term. I just feel some of the arguments smell similar to concerns levelled at the earliest compiler developers about the perils of abstraction. Now we gleefully stack layer upon layer, with most developers unable to grok more than a layer below their choice of abstraction (if they're good!).

I am extremely LLM-optimistic though and largely in favor of abstractions, so that fuels my viewpoint. I still remember my dad, an embedded developer in the 90s-00s, ranting about how many people were starting to use 'inefficient and unpredictable' C compilers instead of whatever assembly he was using. I reckon he'd be appalled to learn that now even assembly isn't always a reliable model of what's really happening on the CPU, thanks to microcode and optimizations.. ;-)


> I just feel some of the arguments smell similar to concerns levelled at the earliest compiler developers about the perils of abstraction

Like you, I am a sucker for abstraction (often to my detriment). I just don’t see any resemblance between an LLM layer and abstraction, aside from that they both introduce complexity. I guess that abstraction superficially ”hides” its inner workings but this is a misnomer, it’s never actually hidden, it’s only claimed that you don’t need to look behind the curtain (but you can, and then you can see exactly what’s going on).

> I'm optimistic this problem will be resolved in the mid-term

Which one? Stability? Assume you can get open source models running locally and with no floating point non-determinism. Now you have local stability, in some narrow sense, yes. For exactly the same version of the LLM compiler and the same input (all prompt history), you can reproduce the result. But this is not enough. If you roll a new version of the compiler (meaning new training data or algorithm), all previous responses are invalidated. So how would you maintain backwards compatibility?

With abstractions OTOH (like say C++), you don’t check in the generated assembly or binary. It’s enough to check in the C++ code, the rest is derived directly. When C++ releases their new foot-machinegun feature, you simply update your compiler and your old code works.

My point is that LLMs are not engineering products. They are closer to organic in the sense that they act in the moment, and by necessity they are non-stable with respect to small variations in input and training. I heard a good quote from a researcher at Anthropic who claimed we don't program the (neural) networks, we ”grow” them.


The thing is, going from a high-level scripting language to a natural-language interface is a massive jump. The difference between a python script and a hand-crafted executable is essentially nothing in comparison. I don’t think that a single technology is going to bridge that yawning gap overnight.

Plus, I feel like compilers are at least deterministic? AI is not repeatable or provable at all.

> I still remember my dad, an embedded developer in the 90s-00s, ranting about how many people were starting to use 'inefficient and unpredictable' C compilers than whatever assembly he was using

I remember reading that C compilers were buggy and generated lots of instructions. He may have had a point back then.


> It’s introducing a proprietary networked high latency chaos agent into the critical path.

Well (and amusingly) said. The same (or at least a very similar) problem exists at the other end of the pipeline, i.e., whenever a user has to use a natural-language interface to get software to do something they want. Are we really going to tell our AI assistants to take complex actions on our behalf in the real world, and then just sit back? Are we really going to do this when money is involved?


Sure, but these languages we speak to computers in are deliberately unambiguous and lack nuance. Natural language is both ambiguous and nuanced.

"Have a nice day!" can mean many things (an insult, or sincere, for example).


I accept your point and agree in principle! However, I think there's a lot of fertile ground to be researched and experimented with here. I'm mostly pushing back against the parent's assertion that "putting natural language encoding between [a developer] and the bytes" is inherently of no net benefit.

(I'd also argue many programming languages introduce ambiguity, are riddled with undefined behaviors, subject to 'dialects', and distorted by issues similar to those affecting natural languages. C++ perhaps most notoriously. This is something I'd rather debate in good humor in person, though, as I suspect this could be an entirely separate problem.. ;-))


I’m not sure why you need to do any research. Just look at how hard it is to drag software requirements from a business person. Or how hard it is to explain to a “pedantic asshole” how to make a sandwich (https://youtu.be/FN2RM-CHkuI?si=3lanUOhXkj2sP8GA).

Very maddening; some regular phrases can be interpreted in a multitude of ways, often contradictory. Even with full context it's still ambiguous.

Exactly. The formalism of computer code will always create some amount of boilerplate—there's no perfect language—but in my (admittedly limited) experience, an LLM is a middleman which distances you unacceptably from your own code. Whether you review the code or you write the code, the intellectual effort of deciding which approach is best and understanding the solution still needs to be undertaken. All it's saving you are the keystrokes, at which point it's glorified intellisense.

Note that IntelliSense/code completion was never about saving keystrokes, so I assume LLMs shouldn't be either. Code completion has always been about getting a list of things you can do with a particular type, saving you from having to remember everything in your head (and APIs have gotten more numerous as a result). The LLM code assistance I've seen is poorly designed in that it usually gives you the one most likely choice (to save keystrokes) and doesn't allow you to browse through a bunch of likely possibilities.

AI is a tool like any other. Autocomplete on steroids -- markov chains taken to the extreme.

We already put natural language between us and the bytes. Hence why most keywords and variable names (a hard part of computer science) are in simple English and it is considered a net positive.


The memory and compute requirements to develop and run these models make no sense if the marginal improvement in autocomplete is the big end result. They only make sense in a world where machine can derive intent from natural language and actually conform to what people mean when they ask for something. This is clearly a fantastical result that LLMs are very short of.

It’s interesting. I would’ve agreed that ‘deriving intent from natural language is something that LLMs have fallen far short of’ maybe a month ago.

Since then, I spent a week trying to get Cursor to work, and after dealing with all the bugs, and restarting the composer each time with a new prompt, I was able to get what I would consider quality output for a moderately complex app (a parimutuel betting market).

The issue isn’t that LLMs are terrible; it’s that software like Cursor is buggy and poorly written.

It should know that I don’t want to use code from an old version of the library I am using, because the new version is already in my project’s dependencies.

It should let me set up preferences for different programming languages. And preferences for all programming languages.

So when I give it a prompt, it looks at the dependencies and language rules I already have set up, adds those to the prompt and produces the quality output I’m seeing now without me having to manually specify all those things.

Short version: LLMs rule; the software is just shitty.


My experience has been the opposite, I found cursor to be an improvement over comparable tools such as aider.

I was able to write a plugin for ComfyUI (a 60k loc python/js codebase) in 2 hours thanks to semantic search. It's not an exercise I'm versed in.

It wasn't that different from the kind of internal monologue I'd have held in my head had I done it on my own, including misguided confidence that gets crushed 5 minutes later as you read other parts of the code that show you had the wrong understanding of how it actually works.

In this context, LLMs can be very useful because a ground truth already exists to compare their replies against.


> My experience has been the opposite, I found cursor to be an improvement over comparable tools such as aider.

That sounds like a similar finding to what I had (comparing to copilot in my own case).

My point, which my post maybe didn’t make so well, was that a huge amount of the prompt should have been written for me in order to get to an acceptable result sooner.


I see this and conclude the opposite: that they were adhering to the principle. Basically, the AI writes the buggy code that is upsetting you.

Similar experience trying to use GenAIScript, btw, and peering inside the box the code and product is pretty well incomprehensible.


My post above does not seem to be well written as it’s been frequently misinterpreted.

Yes, the AI is writing the buggy parts that upset me but my point was creating a good quality prompt would’ve taken a lot less time if Cursor had had some reasonable defaults.


LLMs don't "know" anything; it's just a souped-up Stack Overflow search.

I totally agree. I think almost all coding could be done by today’s best LLMs, IF they had the right context and tooling. Using Cursor is sometimes like magic, but it also feels painfully clear that the LLM is being held back by a lack of information, leaving me to have to interface between the codebase and the LLM, in both directions. Selecting which files to include in context feels so stupid, and like something that will hopefully quickly go away.

>The memory and compute requirements to develop and run these models make no sense

There is the story that von Neumann flew off the handle the first time he saw an assembler.

>>How dare you waste compute cycles on this frivolity? Just use machine code like everyone else.


> There is the story that von Neumann flew off the handle the first time he saw an assembler.

That was in the 1940s when labour was very cheap and compute was insanely expensive. We’re talking hundreds to thousands of programmers’ salaries for the cost of one computer.


How much did the hardware to train gpt4 cost?

I’d expect $50-$100m. 1/40 the value of Rockstar energy drinks.

The hardware costs tens of billions. The electricity cost is around what you guessed.

> AI is a tool like any other. Autocomplete on steroids

No, AI is a shitty tool that has yet to prove its utility. Autocomplete works by analyzing the official API and interface; that's completely different from AI, which hallucinates meaning between words and also regurgitates stuff it was fed before it met you.

> variable names (a hard part of computer science)

Naming is for software engineering, not CS. One more confusion by people who want to sell us AI at all cost.


At some point, you become the luddite. Maybe you have no experience with modern AI dev tools, maybe you work in a language that is underrepresented in models meaning off the shelf tools don't work well, or maybe you're just an old curmudgeon who will die on a hill.

But modern AI tools are far beyond "auto complete". (I actually turn off those in-line completions, I feel they ruin flowstate). The tools now are fully prompted, with multi-file editing, with full codebase context, with web/search and doc integration, and for "on the rails" development are producing high quality code for "easier" tasks.

These modern models and tools can solve nearly every single leet code problem faster than you. They can do every single Advent of Code problem likely 10X-100X faster than you can.

In my professional, high-standards, very legal- and contract-driven web app world, AI tools are still very useful for doing "on the rails" development. Is it architecting entire systems? No of course not (yet). Is it emulating existing patterns and extending them for new functionality 10X faster than a Jr or Mid? Yes it is. Is it writing nearly perfect automated tests based on examples? Yes it is. Is it scaffolding new ideas and putting down a great starting point? Yep. And it's even able to iterate on feature work pretty well, and much faster than a Jr/Mid.

The kind of work I'd give to a Jr/Mid and expect to take 2-3 days before they need serious feedback up and down the change, these AI are doing in about 30 seconds, maybe 90 seconds if you need to iterate a few times on the prompt.

I get that "AI" is a buzzword that is pumping valuations and making business people see $$$.

But coding assistants are not that. For many programmers, they are quickly becoming valuable tools that do in fact speed up development.


> you work in a language that is underrepresented in models meaning off the shelf tools don't work well

That's exactly what happens, and why I think the whole hype is a joke. I have tried all the models and tools though, it's always an annoying mess.

> tools can solve nearly every single leet code problem faster than you

That would be useful if I were paid to "leet code" or solve Christmas games. This is not a good rebuttal, though it made me smile.

> The kind of work I'd give to a Jr/Mid

Good, but I don't want to know what happens in 20 years when there are no more juniors to feed the AI and work on becoming seniors. I will be retired by then and I'll enjoy writing my own open-source stuff.


> These modern models and tools can solve nearly every single leet code problem faster than you.

That's expected, since all the leetcode problems have ready-to-use solutions on the internet.

(In fact, the reason they ask leetcode questions isn't to test your IQ, it's to know if you've read the obvious and available literature.)


Two points:

>That's expected, since all the leetcode problems have ready-to-use solutions on the internet.

1) If the implication is "The model knows the answer and regurgitates it like lyrics to a song" then I would push back. Put a leet code problem into deepseek r1 chain-of-reasoning model and watch it spend 2 minutes spitting out 5000 words thinking through every single facet of the problem and genuinely solving it at a level that is higher than 95% of programmers.

And point 2)

If you do believe it's fundamentally about how much the model has been trained on, then it has seen your CRUD app - it has already seen the feature or system you're about to write 10,000 times - so it should be a foregone conclusion that it can also do all of that development work too. Only the higher-order architecting and proprietary domains should be challenging for it, as there would be far fewer examples to train on (scarcity), or the model doesn't understand a complex solution (architecting systems at scale is something it can't do).

(I also point out how well these models did for Advent of Code 2024, when there were zero examples in the training data for it).


Is it “thinking” or is it regurgitating analysis of the problem it found somewhere on the internet?

Are you "thinking" or are you regurgitating analysis of the problem based on what you read on the internet, too?

This one is funny because for something like leet code, nearly everyone just reads the best answers, learns them and learns how to regurgitate them in an interview environment.


> thinking through

It's not "thinking", it's regurgitating an internet search and padding it out with markov-chain style text autocompletion.

The 5000 words of padding do not actually provide any value, it's verbal white noise to fill space.

> ...then it has seen your CRUD app and has already seen 10,000 times the feature or system you're about to write

Well, yes. Lots of pointless waste in software engineering. Fortunately I don't write CRUD apps and AI does nothing at all for me in a professional context.


> (In fact, the reason they ask leetcode questions isn't to test your IQ, it's to know if you've read the obvious and available literature.)

Rather: it tests whether you are sufficiently docile and devoted to be willing to cram lots of leetcode exercise books that have no relevance for the programming concepts that the job will involve, just for a lottery ticket for a somewhat well-paid position.

I know that there is so much more to programming and related topics that is sooo much deeper (in particular if non-trivial mathematics becomes involved) than these leetcode-style brainteasers. So I strongly prefer to read about such deeply intellectually inspiring topics related to programming instead of jumping through the idiotic hoops that other people want me to.

Indeed, I thus fail the test for docility and devotedness, but I honestly can't take organisations seriously that demand such jumping through hoops.


I feel like 95% of the developers who are touting AI are doing web dev or app development - a field with low stakes, a low barrier to entry, and an incredible amount of reinventing the wheel - all things that an LLM is naturally going to excel at. I can't imagine you'd hold these same beliefs if you were writing control software for a life-saving medical device where a single bug could kill someone, where the board package is proprietary and not understood by an LLM, and the thing is written in a C dialect combining only half the features of C99 with a subset of obscure compiler flags.

I think people are running into a couple of challenges. One is keeping up with the pace of improvement. The tools are much improved over 6-12-24 months ago. A poor first impression can leave people thinking a tool is terrible forever more. Second is that someone must learn to work with the new tools. The hype can lead people to think the tools will magically just do all these things. The reality is, like most tools, it takes some trial and error to learn how to best use it.

ChatGPT in 2023 was a fun toy, but just that.

Claude in 2025, especially with the Projects feature, is far better. It can complete a CRUD project on its own, and all I have to do is fix glaring issues and design the API up front.

Which might not be impressive to some, but it is good at that. And a few years ago, it would not have been possible.


CRUD generation has been a staple of web framework CLI tooling since, like, forever. Once upon a time WordPress was this: the boilerplate application you scripted out and then adapted. In certain programming languages, macros or type systems power up this kind of tactic, and IDEs typically have very good support for these kinds of shortcuts.

Then the project management tooling does a lot more, like automatically reverse engineer existing databases and so on.


> Autocomplete works by analyzing the official API and interface, it's completely different than AI

You can (and should) give the AI access to your existing codebase and any relevant documentation to use as context if you want good results. If you give the AI zero context for the problem it is trying to solve, of course it will struggle. If you give it all the necessary context, it will do much better.

I've found that just uploading the documentation of the API or library you are working with before asking the AI questions about it makes a huge difference in the quality of its output.


>> variable names (a hard part of computer science)

>Naming is for software engineering, not CS.

I figured they were referencing the “two hard problems of computer science”, those two being naming things, cache invalidation, and off-by-one errors.

Everybody knows the hardest problems in software engineering are assembling promo packets and building consensus on number of spaces per indent.


‘Number of spaces per indent’ begs the question: why spaces? The proper indentation character is the corn emoji.

> Hence why most keywords and variable names (a hard part of computer science) are in simple English and it is considered a net positive.

As I'm not a native English speaker, I disagree. I learned programming long before I got decent at English, and even today I just consider the English keywords in programming languages to be some "abstract mathematical concept" that by mere coincidence is named after some real, existing English word. Even today, being somewhat decent at English, I still think this way when I see program code.

I actually would insist that this is a much more useful way to think about good programming, since this way you have no difficulty asking yourself all the time whether it would make sense to replace some "English-named" concept with something more useful that has no analogue in the English language (or any other natural language).


There have been studies (in the 80s or 90s, I never wrote down references, unfortunately, but they probably involved lexical priming) that support that idea. They suggest that English keywords get a meaning of their own for non-native speakers.

49 20 72 61 74 68 65 72 20 74 68 69 6e 6b 20 74 68 61 74 20 74 68 65 20 6e 61 6d 65 73 20 6f 66 20 6b 65 79 77 6f 72 64 73 20 6d 61 74 74 65 72 20 61 20 6c 6f 74 2e

4b 65 6e 74 20 50 69 74 6d 61 6e 20 72 65 6c 61 74 65 64 20 5b 30 5d 20 74 68 65 20 63 6f 6d 6d 65 6e 74 20 6f 66 20 61 20 70 72 6f 67 72 61 6d 6d 65 72 20 69 6e 20 50 61 6e 61 6d 61 2c 20 77 68 6f 20 6c 69 6b 65 6e 65 64 20 45 6e 67 6c 69 73 68 20 6b 65 79 77 6f 72 64 73 20 74 6f 20 6d 75 73 69 63 61 6c 20 6e 6f 74 61 74 69 6f 6e 73 20 62 6f 72 72 6f 77 65 64 20 66 72 6f 6d 20 49 74 61 6c 69 61 6e 2e 0a 0a 5b 30 5d 20 68 74 74 70 73 3a 2f 2f 64 65 76 65 6c 6f 70 65 72 73 2e 73 6c 61 73 68 64 6f 74 2e 6f 72 67 2f 73 74 6f 72 79 2f 30 31 2f 31 31 2f 30 33 2f 31 37 32 36 32 35 31 2f 6b 65 6e 74 2d 6d 2d 70 69 74 6d 61 6e 2d 61 6e 73 77 65 72 73 2d 6f 6e 2d 6c 69 73 70 2d 61 6e 64 2d 6d 75 63 68 2d 6d 6f 72 65

>Hence why most keywords and variable names (a hard part of computer science) are in simple English

"Natural language" is about far more than individual words.


Bad take. Identifiers are just labels.

I don’t think anyone disagrees that identifiers are labels. If you’re claiming that these labels are unimportant, I’d be interested in why you think this.

Identifiers are super important and should be chosen wisely, but a C program with English identifiers is still a C program, while an LLM prompt is in fact NL and a whole new layer of that between your brain and the bytes. Which is why I think that saying "we already put NL between us and the bytes" minimizes that difference and is a bad take.

Exactly.

Not to mention that our existing programming languages have a deterministic output given the same code and the same compiler.

LLMs do not.

Thus, LLM prompts are an entirely different class of tool than a programming language.

This should be obvious to anyone who has written code, but alas.


Ooh. Good point. I guess we need traditional languages as a way to debug the created machine code. But… what if we didn’t? Ie the LLM made bytecode and we had some better way to talk about the concrete implementation.

They mostly aren't important though. When I first learned Pascal, JavaScript and PHP as a child, I had barely any idea what all those English words meant. Later on, when I was learning English in middle school, I was remembering their meanings by recalling what they do in code.

I agree, and it's sad that, despite all those pitfalls, some companies and CEOs will keep pushing the idea that human programmers can be replaced by AI.

But well, I guess there's a bright side here: those LLMs applied to software development might become the new GeneXus, and there are going to be plenty of open positions for humans to rewrite entire systems in a not-so-far future.


That's obviously wrong, as demonstrated by the engineers invested in building tools to enable that.

A lot of engineers invested in building crypto stuff and we didn’t go far in personal banking. Hype-driven development is not guaranteed to succeed.

That was not what was argued, and not what I am arguing. OP claimed that "only someone who has never built anything of significant complexity and utility can think that putting natural language encoding between you and the bytes is a net positive."

Unless OP is also willing to claim that all the people who are working on LLM dev tools are frauds, and act against their better knowledge, OPs claim is obviously false. The entire premise that the people who build these tools operate under is that natural language "between you and bytes" can be a net positive.


Which has been proved again and again to only get you 70% there [1].

[1] https://addyo.substack.com/p/the-70-problem-hard-truths-abou...


Popularity doesn’t demonstrate anything

This has been my experience with a recent attempt to guide the LLM to a complete implementation of a small internal tool. I had in an hour what would have taken me 4 or 5 to write. But after that, it was an endless loop of the LLM adding logging code to find some bug and failing to fix it, only to add more logging code and ineffectual changes, and so on. The problem is that even after it's lost at sea, it's still answering in a completely confident and self-assured tone, so when you decide to take matters into your own hands you might be too far gone from sanity and have an unfixable mess on your hands. I guess I could go back to where it strayed and retake it from there, but by now the experiment seems to be a failure.

Back in early 2023 I tried to write a tool to do my taxes based on my broker's CSV files. Since I wasn't familiar with how the data was structured, I let the LLM lead me while building this in incremental steps. The result was not just buggy; it simply failed to detect the relationships in the data (multiple somewhat implicitly embedded tables that needed to be joined). Even after I pointed this out, it failed to handle it, getting stuck in the same kind of loop you described.

To this day, no LLM that I've tried has passed this task of leading the development while detecting the underlying structure of the data.


At least in my experience, as soon as something goes a little wrong it just gets worse from there. The more of its confusion and contradictory information is in the chat history, the worse it gets. It also has to make changes to the code, so you accumulate these spurious changes and the problem gets more confusing. I've had some luck starting over with a new chat asking what is wrong, but if that doesn't work I just assume I'm on my own.

I've found that quality degrades really quickly after just the first reply, for some reason. They all seem heavily biased towards one-shot correct answers, and as you say, they go down the wrong path really quickly if you even get the first message slightly wrong.

I tend to restart chats from the beginning pretty much all the time, because of this.


I’ve also found this to be the case. Starting a new chat or Cursor composer session puts things back on the right track. Also, prompting is really important. A lot of people seem to think they have some kind of oracle - “fix the bug” - how is anything supposed to work from that?

> But after that, it was an endless loop of the LLM adding logging code to find some bug and failing to fix it, only to add more logging code and ineffectual changes and so on. The problem is that even after it's lost at sea, it's still answering in a completely confident and self assured tone, so when you decide to take matters in your hands you might be too far gone from sanity and have an unfixable mess in your hands.

I wonder how much better or worse things would get, if we took the human factor out of the loop. Give the LLM the ability to run tests and see the results, then iterate on its own output and branch off with different approaches, gradually increase the temperature etc.
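
A rough sketch of such a loop, assuming pytest for the test run; ask_model and apply_patch are hypothetical placeholders for whatever LLM client and patch tool would actually be used:

    # Hypothetical sketch only: run tests, feed failures back, apply the fix, repeat.
    import subprocess


    def ask_model(prompt: str) -> str:
        """Hypothetical call into whatever LLM client is in use."""
        raise NotImplementedError


    def apply_patch(diff: str) -> None:
        """Hypothetical helper that applies a unified diff to the working tree."""
        raise NotImplementedError


    def iterate_until_green(max_attempts: int = 10) -> bool:
        for _ in range(max_attempts):
            result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
            if result.returncode == 0:
                return True  # tests pass; stop iterating
            patch = ask_model(
                "The tests failed with this output; propose a fix as a unified diff:\n"
                + result.stdout + result.stderr
            )
            apply_patch(patch)
        return False  # give up and hand it back to a human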

Maybe it’d turn out that you need 10 LLMs running in parallel for an hour to fix something, or perhaps even a 100 would never stumble upon a solution for a particular type of problem. And even then I wonder, whether it’d get better if you fed it your entire codebase or the codebases of the entire libraries or frameworks that you use (though at that point you’re either training it yourself or are selectively finding and feeding the correct bits not to exceed the context).


But why? What is there to be gained in all of this work around the inherent limitations of this technology?

Get more people into computer science. Knuth said that early in his career he thought he needed to make the computer faster or cheaper, but really it was about getting more users. Anyone can program, or try to, and then learn about computer science.

Where has Knuth said that? Sounds like the opposite of what I've generally heard from him (not caring about popularity etc).

Exploration of what’s possible and what’s not, identifying whether the weaknesses can or cannot be addressed.

A bit like traditional autocomplete can help streamline familiarising oneself with various libraries, a clear step ahead when compared to just needing to dig through documentation as much.

Maybe there’s a class of code problems that LLMs can be decent at solving, given the ability to iterate, verify solutions and what works or doesn’t, perhaps with 10x more compute than is utilized in the typical chat mode of interaction though.


Part of the skill in using these tools is recognizing when it spins off the rails and backtracking immediately. Most of the time something can be gleaned from that wrong approach which can then guide further attempts.

This is my experience with Aider. When I first started using it, I turned off the auto git commits, but I've since turned them back on because they serve as perfect rollback points. My personal style is to only commit once I have a feature fully working, but with Aider it's best to have it commit after each exchange.

I've gone 2-6 steps down a path before realizing this isn't going to work or the LLM is stuck in a loop. I just hard reset back to the first commit in that chain and either approach the task differently or skip it if it wasn't really that important.


You're not graded on getting the LLM to output perfect code; the point is to get the code into git and PR'd. If your LLM tooling doesn't automatically commit to git so you can trivially go back to "where it strayed", you need to find a better tool. (My current favorite is Aider.)

It's a tool not a person. When was the last time you got mad at a hammer for being smug?


To drive the point home, hammers are quite smug.

Mine always thinks it nailed it on the first try, and it's pretty hard-headed when you point out mistakes.

If you can't work around those limitations, you're screwed.


The current state of so-called AI does not provide much meaningful assistance in software development beyond basic tasks such as explaining workflows, breaking down thought processes, and performing simple conversions. I believe that generative AI, in its current form, is not true artificial intelligence. Rather, it is a sophisticated prediction engine that lacks genuine reasoning or understanding.

True AI should be capable of comprehending problems and devising its own solutions, rather than merely generating statistically likely outputs. Until AI reaches that level of cognitive ability, its applications in the real world remain limited, and much of what we see today is largely hype.

Tokenization and embeddings merely help models predict the most probable next token, a process that is executed at scale using vast computational resources. This is not intelligence but large-scale probabilistic prediction. The terminology used in computer science, especially in recent years, can often be misleading.


I think this comment underlines the biggest difference between people that say AI is a transformative tool and people that say it is nowhere close to working as expected.

I never expect some magic "understanding" to ever arrive, but doing remedial pattern matching is already a hugely valuable power that frees up humans to do more interesting work. This is how I use current AI: spitting out 5-line functions I could spend 5 minutes writing that it can do in 3 seconds and that take me 10 seconds to review. Like "check for circular references" or "use the Django ORM to write a query for all categories that have this flag for users that have this permission".
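
As a hypothetical sketch of that second request (Category, is_featured, the owner foreign key, and the permission codename are all invented names, not a real project's):

    # Made-up models/fields for illustration only.
    from django.contrib.auth.models import Permission, User

    from myapp.models import Category  # hypothetical app and model

    perm = Permission.objects.get(codename="can_view_reports")

    # Users holding the permission directly or via a group.
    users_with_perm = (
        User.objects.filter(user_permissions=perm)
        | User.objects.filter(groups__permissions=perm)
    ).distinct()

    flagged_categories = Category.objects.filter(
        is_featured=True,           # "this flag"
        owner__in=users_with_perm,  # assumed Category.owner -> User foreign key
    ).distinct()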

It doesn't "write the app" or solve difficult problems for me (unless it is some configuration issue). I can paste in an error code and save myself a few minutes of manual debugging. If I add a new parameter to a function, it prefills the correct type definition, and things like that. These are all micro-improvements, but they add up to a lot of saved time. Some people have success with editing across files, but I rarely even try that - it excels at solving discrete, repeatable bits of work with tidy solutions, so I use it for that.

Until AI can return "I don't know" or, better, "did you want it this way or that way?" it will be severely limited. Yes, it acts like a junior dev in some ways, but a junior dev that never asks any questions, which is not the junior dev you ever want to give important work.


Do we really want this? As soon as possible, employers will fire software engineers and replace them with AI. I’m positive they will not care about what AI can do, only how many salaries they can eliminate and still achieve the same results. You and I will not be the inheritors of AI.

I think that by the time AI can genuinely replace software engineers, a lot else in society will change.

It's hard to predict what it will look like. I could write both utopian and dystopian narratives and I can pretty much guarantee they'll both be wrong. Not "in the middle" but something unexpected, the way nobody predicted cat videos or doomscrolling.

But you are almost certainly right that we will not be the inheritors.


Yes, because employers will also be replaced by AI. Technology penetration won’t stop at some arbitrary boundary, it will go all the way through to logical conclusion. We have a chance at qualitatively better world, but we’ll need to act and push for new economic systems - when the time comes.

Maybe I read too much science fiction, but my first thought when speaking about "true AI" isn't the worry that a lot of us will get fired, it's the worry that we'll have created an army of digital slaves.

"Army of digital slaves" doesn't really sound that bad when I think about it. As long as it's your army and not your adversary's army... In what ways do you think "an army of digital slaves" is bad?

I guess only in the sense that any form of slavery is bad and morally reprehensible, and ethically inexcusable.

“and still achieve the same results”.

That is the part that won’t actually happen, at least pretty quickly.


I have a non-technical friend who in the last two months has bootstrapped a SaaS startup using nothing but AI. He's got just over a handful of paying customers at this point on a monthly subscription[0].

I asked him to show me his process[1] after trying my hand (20 years, principal) and noticed a big difference in how we used AI: I instruct the AI how to code, he asks the AI to fix problems. In other words, I have a tendency to look at the code and ask the AI to fix it in the more specific and direct ways that I want it fixed. On the other hand, if something doesn't work, my friend will copy/paste the error to the AI directly out of the dev tools console and ask the AI to fix the error. The two approaches are totally different.

My lesson here is that you're not meant to debug AI generated code; hand the error off to the AI and let it fix itself. I think if you're debugging AI generated code, you're doing AI generated code wrong. If you're an experienced dev picking up AI coding, I think you need to shift your mindset entirely. Ideally, someone out there will just create a closed loop where the AI can fix itself when it finds an error (integrate some browser and autonomous test loop into Cursor, for example, and let it fix its own errors).

Conclusion: if you're going to use AI to code, commit to it and use AI to fix the errors as well. Use AI for every aspect of it.

[0] Yes, I'm sure there are security holes and code issues galore, but those can always be fixed later when he's proven the business model.

[1] Yes, I have told him that he should create a YT channel or stream on Twitch, because how well he's been able to use AI makes for super interesting content.


In my experience, AI isn't very good at debugging AI-generated code. If it fails to make the right insight, it loops continuously until it's completely off the rails. I'm surprised your friend hasn't fully gotten stuck with this, as it seems like a huge risk for his startup.

Having had an inside view of a YC startup that went from seed to Series C, I can tell you that code quality means a lot less than one would think when it comes to the early days of a startup.

The biggest risk to a startup is that you get the business model wrong or you don't ship code, even if the code is buggy and messy.


I don't know which specific LLM your friend used, but pasting the error to the LLM usually ends in an endless loop where they tell you to do the same thing over and over again, or the solution doesn't really work or generates another error.

So maybe he was lucky or he is using a very good LLM I'm not aware of.


Claude Sonnet. If your choices are to pay out of pocket for an offshore contractor and wait for weeks or pay $20/mo. for an LLM, it's pretty clear that even if you have to sit there for a few days until you get what you want, using the LLM is the better bet if you're non-technical. In either case, the code would be of questionable quality and a non-technical person would not be able to tell the difference anyways. I see it as a wash.

This probably only works if you glue a bunch of high level, popular APIs together. It might work, but will be fragile and expensive.

> fragile and expensive

Unfortunately, that’s the most common kind of software in the saas industry anyway.


Most SaaS apps today can be done by gluing together popular APIs (e.g. Stripe, Shopify, etc.).

No better or worse than hiring cheap offshore contractors to do the same, IMO.


As an experienced developer, that’s also how I use it. What I’m finding is that it generally rabbit holes as I give it new errors that its previous fix has produced.

However, usually after three or four of those kind of fixes, I can walk it back to the starting point before the initial error, and I now know how to prompt it to produce correct code, because I now have a better mental model of how the thing is supposed to work.

This has been super helpful in my process of learning new things, as well as relearning things I haven’t worked with in a while.


In my experience, fixing security issues after the fact is extremely challenging. A secure system has a very different architecture, with security as its fundamental function and business logic almost an afterthought.

It's not impossible to fix later. But it's often more effective to scrap and rewrite. Hopefully your proven business model has yielded enough money for that, before someone else has pwned it.


One can only imagine how many corners your friend had to cut to get to the product you call finished.

He's got paying customers from organic inbound, word of mouth only; there must be some value there.


I don't see how these two would be connected.

His SaaS solves a problem for a very specific industry that's quite small (his market research yielded about 6000 customers) so it's a small, niche industry where a lot of the small business proprietors know each other through a trade group and more or less have the same problem: the alternative solutions currently in the market are expensive, entrenched, legacy providers that operate through a POS while his solution is web-based and costs less.


I wonder what this codebase will look like after a year or so of doing this.

Bugs will escalate from syntax errors to business logic errors ("one customer was charged twice"). There won't be anything to copy/paste, no AI will be able to fix these errors and no human will touch this codebase with a long pole.

Have you seen the job market at the moment? Humans will do a lot of things to keep a roof over their heads.

I’ve seen some users do that, and get stuck in a loop where the AI says "ah, this error is because...", doesn't fix it properly, or fixes it while introducing a different issue by modifying an unrelated part of the code at the same time. Next, the code is fixed, but only by reintroducing the previously fixed issue.

I saw that with people asking for VBA code to be generated, trying to automate part of their email and Excel work.


It's possible, but his choices are 1) hire someone else, 2) just sit there and prompt again until it's fixed. Since he's bootstrapping this with < $50/mo, the choice is simple.

Also, it may be the case that the corpus of training data with VBA is not as good as it is with React these days.


I have tried that and the AI gets stuck just attempting whatever and writing more code that won't even compile. I have had more success trying to get it to follow steps or examples.

Maybe the language your friend is using has more examples for training, or perhaps the dynamism of some languages gets it to runtime errors that have better details it can work with.


React and JS so I think it has some benefits since 1) it has a large corpus of recent training data, 2) the browser gives pretty good errors.

I also tried it and the biggest issue I ran into is that I'm very specific about what I want. I wanted to use `nanostores` for state and routing. Problem is that the LLM keeps using code from `react-router` instead of `@nanostores/router`. As soon as I point it out, the LLM fixes it, but the first pass code generation is almost always wrong, even using an instruction file (as documented in both Cursor and GH Copilot).

That's when I realized that we are using the AI in two totally different ways: he simply doesn't care about the implementation, prop drilling, any of the technical details. None of that matters to him except that when "this button is clicked, that action happens". So however complex or inefficient or imperfect the code is, he doesn't care whereas I still have a tendency to read the code and try to ask the AI to do it in specific ways.


> integrate some browser and autonomous test loop into Cursor

Doesn't this exist yet? It's such an obvious idea I'd be astonished if no-one has done it.


They exist in separate pieces; I've not seen it integrated into one loop yet.

Code gen -> show the AI an example of how it's supposed to work -> error -> code gen -> AI tries it again by itself -> Code gen


This only works if there is an error message. Do you instruct the AI to fill in the code with asserts and not implemented exceptions?

I was only on a session with him for like 15 minutes and he showed me his prompt history. Basically when he hits an error, he will paste the error and give a simple instruction like "I'm getting this error when I click this button: <ERROR_HERE>" and then repeat until it's fixed. Nothing special; imagine a non-technical PM giving directions to a junior dev except this junior dev codes nearly instantaneously.

Yeah, this works for CRUD apps with conventional methods for accounts, email, and payments, and not at all for anything complex, especially if it isn’t a super commonly used language or framework. Try coding a single game with AI that isn’t something done 10000 times already. It actually is impossible.

The vast majority of code in the world is the former though

The last 20 years of programming tells me that this isn’t the case.

This is only the case for new projects which don’t yet have users. Add users to even the simplest project and it evolves into a special snowflake with never before seen edge cases.

That’s why low code solutions are great for prototyping but eventually always explode into a nightmare of complexity.


This roughly mirrors my experience so far. Mind you I'm an extremely qualified engineer who has worked at FAANG.

Except I'd add that as one gets experience working with the AI I can only assume they'd get much better at making it go smoothly. For example, I wouldn't manually rewrite localhost, I'd tell the AI "Why is localhost everywhere? Will this work if I deploy to a droplet?" and it will fix it for you.

Also I just paste error-messages directly into the AI and it usually knows how to fix them.

Sometimes it's net positive, sometimes it's net-negative due to creating a mess that's really hard to get out of or debug. But I imagine it's only a matter of time until the scopes in which it's cost-effective go up.

I don't like that AI is a threat with huge monopolistic and job-reducing potential, but I don't think downplaying it is a long-term strategy to combat that.


> For example, I wouldn't manually rewrite localhost, I'd tell the AI "Why is localhost everywhere? Will this work if I deploy to a droplet?" and it will fix it for you.

The solution is multi-occur (Emacs), the quickfix list (Vim), or any editor that has whole-project find and replace.


Which will also be much faster because you don't have to worry about sanitizing your code before sending it to an LLM or that the LLM made a mistake somewhere along the way.

> I'm an extremely qualified engineer who has worked at FAANG.

> I just paste error-messages directly into the AI

...


I find it funny that commenters on HN actually think their having past or current experience working at a FAANG is some sort of signal for two reasons.

On HN especially, that’s really nothing novel; many of us have (including me), and the only thing it takes to get into one as a software engineer is memorizing the solutions to coding problems.

When I’m hiring - mostly for green field initiatives - coming from BigTech is usually a negative signal for me.


I'm not sure what your point is here...

Where the author went wrong in this post is that he tried to interpret an error ("I was asking claude to solve the wrong problem"), was wrong, and then wasted a lot of his own time.

I really think it's best practice when describing a problem to anybody that you start with what you observe and then if you want to hint your suspicions you call those out afterward as such. If you're very confident the LLM is going down a wrong path, you can ask it things like "How would I test the theory that environment variables aren't set in my docker container?"


This. Great, AI can produce code. But it produces code without inducing understanding of the code in the person who wrote (or rather supervised the production of) it, which is half the point.

At some point AI will probably be good enough that this won’t matter. But it feels like we’re still a long way off that.


Oh look, a load of future work to fix these.

Why is this just like the last cost-cutting exercise, where the cheapest people in India produced a lot of "interesting" code?


Way back in 2000 (or even before that, can't remember!) I wanted to get into winsock programming. I found a page where someone from India explained that with examples.

The variables, functions and so on had names like:

a aa aaa b bb bbb

It helped me to grasp the basic concept, but was kinda hard to follow, tho. :D


You can update requirements, educate developers, and fix bad code with an LLM many orders of magnitude faster than you can with Wipro.

Because, ignoring a heroic effort from all the women in India, the number of Indian developers does not double every 4 years.

The number of flops a gpu can output on the other hand does.


See also: almost every bespoke internal app written in FoxPro, VB, Excel with VBScript, etc

Can anyone explain why everyone is so hyper-focused on speed? 500 images per second, 100 minutes of video in 30 minutes, a thousand lines of code per hour. Who is going to consume all that?

Most of what generative models produce is shit so they have to produce a lot in hopes _some_ of it is OK-ish.

It's also about responsiveness. LLMs produce junior-level quality of code at a rate of hundreds of lines per minute. I need it to produce enough to spot where it's completely wrong as quickly as possible so I can change the prompt.

It's like an edit-compile-run cycle, which also needs to be fast or you lose attention.

I was tempted to say it's another _step_ in the edit-compile-run but often the code is so bad I don't even bother compiling.


I'm firmly of the belief that most software would benefit immensely from us all slowing the hell down and putting more thought into what we build. But it would appear stability and a focus on core strengths doesn't sell nearly as well as endless new features for the marketing sheet added as quickly as possible.

"Who is going to consume 1000 lines of code per hour?" he types into his mass-manufactured thinking machine running an advanced operating system, before clicking reply, sending it across a global mesh of said devices.

Other machines.

The images are almost good but still in the uncanny valley. The code is almost good but full of bad practices and hidden bugs and undefined behavior. Since most AI grifters are neither coders nor artists, all they can do is produce more more more capitalism-style.

I have had great experience with Claude for coding, but you really need to be a programmer yourself, to be able to divide the problems into manageable chunks.

Same here, I really don't get all the "it's totally useless for programming" posts on here.

It makes me think many people haven't taken the time to actually learn to use the tool.

It just feels like they tried Copilot or ChatGPT for 5 minutes last year and concluded that all LLMs are useless and will be useless forever.

It makes me wonder if those people know that Claude 3.5 Sonnet projects and/or Cursor with Claude exist?

Do they not appreciate some help to document their code? Do they never need to write or quickly understand scripts or code in one of the 100's of languages/stacks they're not too familiar with that they might encounter in the wild? How to get out of yet another git mess? Build a proof of concept in an hour that would've taken you days? A refresher on how to set up x toolchain to get started asap (the nr 1 hardest thing in programming :p) etc etc.


Same here. I see these tools as teaching me patiently and challenging me (unwittingly) in areas where I'm out of my depth. When I'm lucky they will do simpler stuff for me, but for $40/month, I don't feel entitled to a SaaS-unicorn-terraformer.

> Do they not appreciate some help to document their code?

How does an LLM help there? What the code does should be obvious by looking at it, WHY it was written that way is the interesting question. Answering it often requires more context and domain knowledge.

> Do they never need to write or quickly understand scripts or code in one of the 100's of languages/stacks they're not too familiar with that they might encounter in the wild?

I'd rather take the time to do it myself because if I'm not familiar with a language/stack I won't be able to spot mistakes made by the LLM as easily.

> How to get out of yet another git mess?

Learn to solve the git issue and apply the knowledge in the future so you don't rely on yet another tool.

> Build a proof of concept in an hour that would've taken you days?

I question the premise.

> A refresher on how to set up x toolchain to get started asap (the nr 1 hardest thing in programming :p) etc etc.

How often do you do that? I think it's worth spending the time to do it yourself so you get an understanding of what exactly you're doing there. When you're done you can document the process and come back to it next time.


What you're basically saying here is: you should just learn more and know more faster.

And what I'm saying is: that's exactly what LLM's are super useful for.

To answer your last question: about every 6 months or so. I'm a freelancer, I do a new project for a new client every 6 months on average. All of their toolchains, build systems, OS of choice for the dev machine, OS of choice for the SoC, documentation methods, PCB design tools, version management systems, release systems, testing frameworks are completely different per client and change constantly (even within the same company) depending on department and moment in time.


> What you're basically saying here is: you should just learn more and know more faster.

I didn't say anything about speed. I think you should take the time to deeply understand what you are working on.

> And what I'm saying is: that's exactly what LLM's are super useful for.

I disagree, LLMs aren't good teachers. You won't be able to spot subtle issues with their output if you're not already familiar with the topic.

> To answer your last question: about every 6 months or so. [...]

I don't see the big advantage of using an LLM there. It can't set up the environment for you.


> I don't see the big advantage of using an LLM there. It can't set up the environment for you.

I can give it tons of random documentation without having to read through it all to figure out which parts are useful or filter through the irrelevant/mistyped/outdated/badly written stuff. I can even give it undocumented (shitty old) code and ask specific questions about how it works or what the likely intention was.

I promise you it is useful to me. I use it every day. It can't set up the environment for me all by itself, but it sure can help me do it WAY faster and understand it way faster. Especially the boring stuff nobody wants to do.

PS: who has time to deeply understand all aspects of what they are working on? This makes no sense to me in the context of a job. If I would take time to deeply understand every tool, script or even source code I touch, I would NEVER get to doing ANY work.


Programs are communication between 2 loosely coupled audiences -- the humans who have to maintain / modify the code and the computer that gets to run the code.

Human language, used to convey ideas to other humans, is imprecise. It's fine that it's imprecise because the media (humans) have both good error correction and a reasonable set of global defaults.

Computer languages require enormous precision because they are mechanically translated down to machine code.

Perhaps you can train an LLM on lots of code, and it'll find semantic relationships between some clever code it's been trained on and your specific request. Perhaps not, and it'll just give a dumb answer or an incorrect answer, (ideally some code copilot will actually try running the candidate answer code against your specific ask?) -- but once the answer gets complex you run into the "it's much harder to debug code than write it, so don't write code that's almost too complex for you to understand" problem.

At work, I constantly have to remind people "don't use math data structures for identities" "but int is smaller" "Are you ever going to want the 95th percentile customerID?" "no that's silly" "then it isn't a number". Or I get to constantly remind people "a string with lots of curly braces and quotes isn't necessarily json; if you're not using a serialized API and just sending bytes to stdout someone else has to parse it" "but I'm using a logging library" "does anything else ever send stuff to stdout while your logging library is running?" "oh yes, we're going to open a ticket to debug that." So I'm not optimistic that running code written by a machine is long-term viable.

That said -- there are situations where machine generated code works -- I think it's been a long time since anyone manually drew masks for etching dies when making CPUs.


Anyone who has ever worked with VCs or shareholders before knows that, if you tell them the reality and limitations of something, they will either fire you or ignore what you say. They have been desperate to remove the leverage programmers have due to their skill and replace us with AI that they don't have to pay salaries to. All we can do at this point is just take VC money promising them exactly what they want to hear, that they will be able to replace us with a NLP model. Sometimes you just can't save people but you can profit from their voluntary fall from the cliff?

If you don't know why it works when it works, you won't know why it doesn't work when it doesn't work.

The key issue here was staying on top of the AI's help.

Use AI wisely: as an assistant, not as a drunken lead developer.


I played with OpenHands for a few days (using gpt-4o since I already had an OpenAI account). I found it to be decent at writing new code, but then it had a hard time making changes when there was a lot of repetitive code (in a TypeScript / React project that I had it create with vite).

One of the interesting things about OpenHands is that you can see what the AI is doing in the terminal window where you launched it. Since it can't really load the whole codebase into its context window, it does a lot of grepping files, showing 10 lines on either side of the match, and then doing a search and replace based on this. This is pretty similar to what a human might do: attempt to identify the relevant function and change it.
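
For illustration, a rough Python sketch of that grep-with-context-then-replace loop (this is not OpenHands' actual implementation; the file glob, context width, and helper names are assumptions):

    import re
    from pathlib import Path

    def grep_with_context(root: str, pattern: str, glob: str = "*.tsx", context: int = 10):
        """Locate candidate edit sites: search matching files and return each
        hit with ~10 lines of surrounding context, as described above."""
        regex = re.compile(pattern)
        hits = []
        for path in Path(root).rglob(glob):
            lines = path.read_text(errors="ignore").splitlines()
            for i, line in enumerate(lines):
                if regex.search(line):
                    lo, hi = max(0, i - context), i + context + 1
                    hits.append((str(path), i + 1, "\n".join(lines[lo:hi])))
        return hits

    def replace_once(path: str, old: str, new: str) -> None:
        """The 'edit' step: a literal search-and-replace in one chosen file."""
        p = Path(path)
        p.write_text(p.read_text().replace(old, new, 1))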

I think I might have better luck with a simpler project, e.g. a Sinatra or Flask app where each route is relatively self-contained. I might give it or Cursor another try in the future when the tech has progressed a bit.


I appreciate posts that are about practical usage of AI and its strengths/weaknesses and the kind of conversation it generates. Conversations about AI are tough for me to navigate because there are camps of people that seem very invested in AI being either omniscient or completely useless. I regularly see people saying that AI is at the level that it can replace engineers or build whole apps. When I try this with state of the art models, I am seeing results that are nowhere close. That said, I still use AI every day during my development and I have a flow I think makes me way more productive. I want more conversations like this about the mechanics of using AI as it currently exists, honestly evaluating its strengths and weaknesses without getting into hypothetical debates about the future or whether or not the AI "understands".

It seems there is a battle of two opposite viewpoints. One is that LLMs are just dumb autocompletes with no ability to understand anything. Another is that LLMs can already, right now, be substitutes for programmers. I personally think it is neither, but for experts who know what they are doing it is a massive time saver. I.e. in cases where you know what code you want to write, but it's tedious, LLMs can do it for you. Also LLMs are great in cases where you are less familiar with a new API or language, but have a generally good understanding of programming.

Despite my broadly positive view on usefulness of LLMs, I do not think they are good enough (yet) to build a full system from scratch without an expert supervisor. This should not IMO be used as a 'proof' they are dumb autocompleters.


> I.e. in cases where you know what code you want to write, but it's tedious, LLMs can do it for you

I feel like I'm living on another planet when I see this point. I have almost never in my career encountered the situation where actually typing out the code is the time consuming part. The time consuming part is knowing what code you want to write, running it in a variety of circumstances to gain confidence that it's correct, and iterating when it isn't.

Please don't think I'm saying you're wrong by the way—if anything this just shows how diverse programming can be as a career. But I see this point raised a lot and it doesn't match my experience at all.


Experts who know what they are doing have long had alternatives beyond LLMs to make their work faster.

They have open source libraries, stack overflow, tutorials, documentation, simple code generator tools and snippets.

The speed up we’re seeing is from LLMs basically caching all those things into a huge mathematical model and retrieving information in summarized form ready for consumption.

And while speed is always nice, LLMs are expensive, require maintenance themselves to maintain relevant context, are still error prone, and terrible at true innovation.

In a few years we’ll be talking about the big “AI crash” and “what went wrong” when it has been obvious to experts all along. Winter is coming.


I am sorry, but the comparison of 'stack overflow' and tutorials to LLMs is bizarre. The amount of time to get to the answer from LLMs is drastically shorter. And claiming that they only 'cache things' is just wrong. They are certainly capable of correctly answering things that were not directly in their training set.

Do you have any examples of a question you could ask an AI right now that you couldn’t find from a basic search on stack overflow and Google? Didn’t think so.

One thing I think would have helped the author: write a spec first.

Seriously. It seems stupid. But AI works a lot better with a written spec.

The incredible thing is that the AI can actually be an excellent resource for writing the spec. And it will actually produce better code when you feed the spec back into said AI!

The current generation of AI seems to have fooled a lot of people into thinking that somehow you can jump straight to coding. (Well, you can, and it will probably work if you want to make something small or limited in scope.) Not so!

But, on the bright side, it’s just as good at design as code if you ask the right questions!

I say this having used 4 and 4o extensively in this manner. Just started using sonnet3.5 in this way in the last month or so, and it is amazing at this.


The issue with AI is that it generates what it is trained on. Most publicly available coding content/examples are just docs or blogspam (geeksforgeeks/javapoint/whatever) where mostly surface-level code is peddled. Even many small-scale OSS projects don't follow best practices or have a good code base; they have just enough to get whatever is needed done. When you train AI on such data, it'll excel at (statistically) reproducing the same kind of code.

Once the quality of training data improves (somehow getting access to high-quality codebases behind corporate walls by promoting these assistants and ingesting the code), the output improves.

There's a popular saying: garbage in, garbage out.


It delivers debugging hell if you don't know what you're doing, which is usually the case for inexperienced developers. It assists experienced developers very well, since they can sort through which parts of the AI's output are useful and which are not.

Heh, after decades of functional programmers being the "well, actually..." crowd at every conference, turns out they were right all along. Just for the wrong reasons!

The pitch:

- AI generates tons of plausible-looking garbage
- Static types catch garbage at compile time
- OCaml/F#/Haskell fans quietly sipping tea in the corner

The irony? We spent years debating static vs dynamic typing for human developers. But the killer use case may end up being catching AI hallucinations.

Finally, a business case for monads that doesn't require a PhD!

Time to dust off those Haskell books. Who knew safety could be so profitable? Plot twist: Category theory becomes a required interview question by 2025


I dream of a world in which more investment is put into creating better programming languages and runtime environments than trying to use LLMs as a way of coping with the complexities of current systems.

I was recently experimenting with local-only LLM coding assistants in JetBrains products. They did speed things up a bit, but I quickly realized that they were essentially automating the creation of copy-paste errors, resulting in time lost to debugging errors I never would have introduced myself, so I stopped using them.

My social feeds are full of tech bros who keep telling people AI codes everything for them. AI obviously has some impressive coding skills, but for me it never really worked well.

So is this just an illusion they create, or is it really possible to build software with AI, at least at a mediocre level?

I'm looking for open source projects that were built mostly with AI, but so far I couldn't find any bigger projects that were built with AI coding tools.


AI isn't great at creating software, but it is great at writing functions. I often ask AI to "write a function that takes A, looks up B in a SQL database, and returns C, or write a function that implements the FooBar algorithm in C++" and on the whole that works pretty well. Asking it to write documentation for those functions also works really well. Asking it to write unit tests for those functions works pretty well (although you have to be extra careful, because sometimes the tests are wrong).
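
For example, a hypothetical function of exactly that shape, using Python's standard sqlite3 module (the table and column names are invented):

    import sqlite3

    def lookup_customer_email(db_path: str, customer_id: int):
        """Take a customer id (A), look up the customer (B) in a SQL
        database, and return their email (C), or None if not found."""
        conn = sqlite3.connect(db_path)
        try:
            row = conn.execute(
                "SELECT email FROM customers WHERE id = ?", (customer_id,)
            ).fetchone()
            return row[0] if row else None
        finally:
            conn.close()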

What you have to do, and what AI cannot do well, is to decide where in the codebase to put those functions, and decide how to structure the code around those functions. You have to decide how and when and why to call each of those functions.


When I have to be that specific with it, it would be faster for me to just write it directly in my normal IDE with great auto complete

it would be faster for me to just write it directly in my normal IDE

Then you are a much better developer than me (which you may very well be). I'd like to think I'm pretty good, and I've many times spent hours trying to think through complex SQL queries or getting all the details right in some tricky equation or algorithm. Writing the same code with an AI often takes 2-20 minutes.

If it's faster for me, it might not be faster for everybody, but it is probably faster for many people.


The way to get better is to do it a lot. Every time you dig through a problem to solve it, you're not just learning about that problem - you're learning about every problem near it in the problem space, including the meta-problems about how to get information, solve problems, and test solutions.

In a sense you're slowly building the LLM in your head, but it's more valuable there because of the much-better idea evaluation, and lack of network/GPU overhead.


If you have to spend a significant amount of time thinking things through, how do you know the output from AI is correct and covers all details?

how do you know the output from AI is correct and covers all details?

Same way as any other code. You look at it, ask the 'author' to explain any parts you don't understand, reason through what it's doing and then test it.


That’s so much slower than writing it myself though.

The only reason to bother doing that with a junior developer is to teach them.


You have to do all those steps no matter whether you write it yourself or an AI helps you generate the code. Just typing the first thing that comes into your head and not bothering to test it will rarely create good code in all but the most trivial cases.

I find having an AI write out the algorithm and then walking me through the steps and generating test cases for all the corner cases much faster than looking up the algorithm online, trying to understand it, implementing all the details and then writing a test suite for it by hand. But I guess YMMV.
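
A trivial, hypothetical illustration of what "corner-case tests" look like here (the function and cases are invented for the example):

    def clamp(value: float, low: float, high: float) -> float:
        """Clamp value into the closed range [low, high]."""
        return max(low, min(high, value))

    def test_clamp_corner_cases():
        assert clamp(5, 0, 10) == 5      # inside the range
        assert clamp(-1, 0, 10) == 0     # below the lower bound
        assert clamp(11, 0, 10) == 10    # above the upper bound
        assert clamp(0, 0, 10) == 0      # exactly on a bound
        assert clamp(3, 3, 3) == 3       # degenerate range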


Well I was assuming you already understood the domain and the algorithm (or at least a good idea how you’d solve the problem). If you don’t, sure having an LLM cough up something is probably faster than finding and teaching yourself a new algorithm.

I definitely wouldn’t trust an LLM to come up with an optimal algorithm if I didn’t already have an idea of how to solve the problem myself though. There’s too much room for subtle bugs and unknown unknowns.

Tests aren’t a substitute for thoroughly understanding a solution (and thoroughly understanding a solution involves at least having an idea about the tradeoffs of different solutions, which you won’t have if you had no idea how to solve something yourself).

Most functions in line of business software though are going to be something like “loop over each item in this list, transform them from one format to another, then add them all up and save that somewhere.”
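
Something like this hypothetical Python sketch, which is roughly as deep as much of that day-to-day glue code gets (the field names and rates are invented):

    def total_in_usd(line_items, fx_rates):
        """Transform each item into USD, add them up, and return the
        figure that would get saved somewhere."""
        converted = [item["amount"] * fx_rates[item["currency"]]
                     for item in line_items]
        return round(sum(converted), 2)

    # e.g. total_in_usd([{"amount": 10.0, "currency": "EUR"},
    #                    {"amount": 25.0, "currency": "USD"}],
    #                   {"EUR": 1.05, "USD": 1.0})  -> 35.5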

Actually typing out the code to do that is in no way a bottleneck for me (unless I’m working in an unfamiliar language and don’t understand the syntax well).


Actually typing out the code to do that is in no way a bottleneck for me

For me it often is. Understanding and solving the problem in the abstract is often the easy part. I might realise that one way to solve this problem is by representing the data X as a graph with properties Y and then use the fact that those graphs behave like Z under conditions A. Or I might know that it's possible to get the information I need out of this database using some sort of complex series of joins. Correctly writing the relevant C++ or SQL to actually make the computer solve the problem still takes me a lot of time.

I often find that coming up with a good approach to solving a problem is the fast part, and actually typing out the bug-free code to implement that idea is the slow part, and LLMs at least for me speed up that slow part significantly.


Ok, I don't know. When I first read it, it sounded like you have to spoon-feed details until the AI gets it right. Or perhaps your IDE could be better!

"AI isn't great at creating software, but it is great at writing functions."

This 100%. In my experience (ChatGPT - paid account), it often causes more problems than it solves when I ask it to do anything complex, but for writing functions that I describe in simple English, much like your example here, it has been overall pretty amazing. Also, I love asking it to generate tests for the function it writes (or that I write!). That has also been a huge timesaver for me. I find testing to be so boring and yet it's obviously essential, so it's nice to offload (some of) that to an LLM!


It manages simple functions but I've tried to get it to do complex ones (e.g. a parser with a bunch of edge cases) and it totally shit the bed.

For the simpler cases I think prompting still took about as long as just writing the damn thing myself if I was familiar with the language.

The coding I have found it useful for is small, self contained, well defined scripts in bash where the tedious part is reminding myself of all of the command switches and the funky syntax.


Current gen AI can spit out some very, very basic web sites (I won't even elevate to the word "app") with some handholding and poking and prodding to get it to correct its own mistakes.

There is no one out there building real, marketable production apps where AI "codes everything for them". At least not yet, but even in the future it seems infeasible because of context. I think even the most pro-AI people out there are vastly underestimating the amount of context that humans have and need to manage in order to build fully fledged software.

It is pretty great as a ridealong pair programmer though. I've been using Cursor as my IDE and can't imagine going back to a non-AI coding experience.


I think it’s selection bias. Marketers are going to post the proof-of-concept that it works (if only in a small isolated scenario), algorithms are going to emphasize the more amazing “toys” this produces, over the boring rebuttals. In the end, you will see hundreds of examples where it worked and not the thousands where it produced buggy or dangerous code.

That attention does not map well to the important, hard, and more valuable parts of development.

Anecdotally, I still find it to be useful and it’s improving. I do think it’s going to have a huge impact in time.

Hype is part of the industry and it can be distracting to users, developers, and investors BUT it can also be useful (and I don’t know how to replace it) so, we live with it.


https://github.com/williamcotton/webdsl

Made almost entirely with Cursor and Claude 3.5 Sonnet.

11k lines of C and counting.


What I find interesting is that the project claims MIT license, but if it is "almost entirely" AI generated, I am not sure it even is copyrightable. So either the licensing terms deserve some large disclaimers, or it is not "almost entirely" made with AI. Based on the name I assume it is your project, could you shed some light on which of those two options is correct?

I guess copyright laws treat AIs as tools. If you paint a picture with a brush it's also almost entirely "brush created", and still you can claim copyright for it.

There are differing levels of abstraction when considering copyright.

I used Windsurf mostly on a feature to build out user authentication and then another tool to generate the PR documentation entirely.

https://github.com/jsonresume/jsonresume.org/pull/176

Meets my good enough standards for sure


I like the concept, but honestly I don't see myself writing an entire webapp with this.

Here is some feedback:

There are a bunch of libraries that need to expose an http API. There is a niche for providing an embeddable http server that comes batteries included with all the features such as rate limiting, authentication, access control, etc. Things that constantly have to be reimplemented from scratch, but would not warrant adding a large framework by themselves.

That's where I think the idea of a "WebDSL" would shine the most.


Thank you for the feedback.

I also don't see myself writing an entire webapp with this either - perhaps small sites or simple API endpoints?

I was mainly scratching an itch I've had for a couple of years. I also really like tuning C code just for the fun of it!


It's funny that this is MIT licensed, expecting credit for uncopyrightable work.

I work in copyright law. Familiarize yourself with the AFC test concept and I'm willing to have a conversation about what would be copyrightable in this project of mine.

https://en.wikipedia.org/wiki/Abstraction-Filtration-Compari...


You misunderstand, the repo lists an MIT license which requires attribution. You want people to give you credit if they use this LLM-generated code.

LLMs which were trained on the works of thousands of other developers with similar licenses, who are offered no similar credit here.

It also claims copyright of the code as though you have authored it, but you're claiming here to have used LLMs to generate it. Seems like trying to have it both ways.


I want people to give me attribution for the parts of my project that are copyrightable.

From the article,

The second step is to remove from consideration aspects of the program which are not legally protectable by copyright. The analysis is done at each level of abstraction identified in the previous step. The court identifies three factors to consider during this step: elements dictated by efficiency, elements dictated by external factors, and elements taken from the public domain.

This means that the code written for interfacing with an external API, eg, GitHub OAuth, would not be covered by any sort of copyright as the expression is dictated by requirements of the API itself.

The overall structure and organization of the code was not generated by LLMs and is fully covered by copyright.

LLMs are in fact very good at writing code that would probably not be copyrightable in the first place and are pretty bad at writing the overall expressive systems structures that would be covered by copyright.


> I want people to give me attribution for the parts of my project that are copyrightable.

A requirement not extended to the open source developers whose code you are, essentially, using a copyright and licensing laundering engine to get around.


Thanks, that's exactly what I'm looking for.

Same experience. It has become pretty good at writing creative SQL queries though. It's actually rather good at that.

When I am working on something niche, it does not help either. I have tried to make it build modern UI applications for myself using modern Java, but it just can't. It hallucinates libs and functions that do not exist, and I can't really get it to produce what I want. I have had better experiences with languages that are simpler and more predictable (Go), and languages with huge amounts of learning material available (Typescript / React). But I have been trying to build open source UI apps in JavaFX and GTK, and it just cannot help me when I am stuck.


I experimented with Cursor over Christmas, with writing a simple-ish Swift/SwiftUI app on iOS as the challenge. I can code fairly well in Python, moderately in JS, and almost not at all in Swift. I was using Cursor on a Mac, in parallel to XCode.

Basically, it worked, but not without issues:

- The biggest issue was debugging: because the bugs appeared in XCode, not Cursor, it either meant laboriously describing/transcribing errors into Cursor, or manually fixing them.

- The 'parallel' work between Cursor and XCode was clunky, especially when Cursor created new files. It took a while to figure out a halfway-decent workflow.

- At one point something screwed up somewhere deep in the confusing depths of XCode, and the app refused to compile altogether. Neither Cursor nor I could figure it out, but a new project with the files transferred over worked just fine.

But... after a few short hours' chatting, learning, and fixing, I had a functional app. It wasn't free of frustrations, and it's pretty far from the level where a non-coder could do the same, but it impressed me that it's already at the level where it's a decent multiplier of someone's abilities.


aider (an AI assistant that will do coding for/with you, depending on how you use it) has one of the more illuminating pieces of information on this. Here is a graph of aider's percentage contribution to its own development over time:

https://aider.chat/HISTORY.html


I don't think it is an illusion. It can remove a lot of barriers to entry for some people, and this is probably what you're seeing in the anecdata.

For example, my brother. He is what I'd refer to as 'tech-aligned' - he can and has written code before, but does not do it for a living and only ever wrote basic Python scripts every now and then to help with his actual work.

LLMs have enabled him to build out web apps in perhaps 1/5 of the time it would have taken him if he tried to learn and build them out from scratch. I don't think he would have even attempted it without an LLM.

Now it doesn't 'code everything' - he still has to massage the output to get what he wants, and there is still a learning curve to climb. But the spring-board that LLMs can give people, particularly those who don't have much experience in software development, should not be underestimated.


There is a big gap between being able to create a somehow working application and shipping a product to a customer.

Those claims are about being able to create a profitable product with 10x efficiency.


Current-gen AI can write obvious code well, but fails at anything that involves complexity or subtlety in my experience

I think it’s that AI unlocks the ability to code something up and test an idea for people who’re technical enough to get it working, but not really developers themselves. It’s not (yet at least) a substitute for a good dev team that knows what they’re doing.

But this is still huge, and shouldn’t be disregarded.


In my experience, AI is good at building stuff in two scenarios:

- You have zero engineering background and you use an LLM to build an MVP from scratch. As long as the MVP is sufficiently simple there is plenty of training data for LLM to do well. E.g. some kind of React website with a simple REST API backend. This works as long as the app is simple enough, but it'll start breaking down as the app becomes more complex and requires domain-specific business knowledge or more sophisticated engineering techniques. Because you don't understand what the LLM is doing, you can't debug or extend any of it.

- You are an experienced developer and know EXACTLY what you want. You then use an LLM to write all the boilerplate for you. I was surprised at how much of my daily engineering work is actually just boilerplate. Using an LLM has made me significantly more productive. This only works if you know what you're doing, can spot mistakes immediately, and can describe in detail HOW an LLM should be doing the task.

For use cases in middle, LLMs kind of suck.

So I think the comparison to a (very) junior engineer is quite apt. If the task is simple you can just let them do it. If the task is hard or requires a lot of context, you need to give them step by step instructions on how to go about it, and that requires that you know how to do it yourself.


These are exactly my experiences as well. Senior devs on my team are rocking and rolling with the AI; my junior devs have all but given up using it, even after numerous retros etc…

For me, AI has been pretty useful. The difference is I'm not a software engineer, I just write scripts to help me do my job. If I wrote bigger applications I doubt LLMs could help me.

AI is awesome for small coding tasks, with a defined scope. I don't write shell (or powershell) scripts anymore, AI does it now for me.

But once a project has more than 20 source files, most AI tools seem to be unable to grasp the context of the project. In my experience AI is really bad at multi threading code and distributed systems. It seems to be unable to build its "mental model" for those kind of problems.


This. It's good at CMake, and mostly good at dealing with COM boilerplate (although hallucinations are still a problem).

But threading and asynchronous code are implicit - there's a lot going on that you can't see on the page; you need to think about what the system is actually doing rather than simply the words to make it do the thing.


They are useful for small tasks like refactoring a method, however big the whole project is.

It’s not mentioned anywhere in the post, but it would be good to hear what the total time was, including all the problems.

Well, still trying to get into nvrhi, I went on to ask ChatGPT to write me an example program using it.

To make it short, it got better when I made a project, uploaded the headers and docs of it as project files and moved my chat into that project as well.

That said, AI can help you, but it needs a lot of support from you to do things somewhat right.


Content marketing for a new text editor thinly disguised as AI rage-bait.

HN fell for it hard - 156 points, 180 comments (as of this writing).

Well done Nick! :) And congrats on launching Codescribble! Hope to see a "how my post on AI grew my userbase" followup in a few weeks!


> LLMs are useless if you don’t understand the context
> AI can be worse than useless when you don't understand the underlying technologies

I made a saying about this some weeks ago: "A.I. can make the road for you, but you have to know where you are going". In Greek it sounds a little bit better.

Also code is the truth, but it is not the only truth. The underlying computer, the network infrastructure and other things have an effect on the code. So, there could be a saying in addition to the first: "A.I. can make the road for you, but you have to test the road".


I put it: "copilot doesn't save me much thinking, but it saves a ton of typing".

If you drop all pretenses and use a photocopier to steal code directly instead of performing an elaborate laundering step, you will not have these issues.

Maybe AI will shine when working with strongly typed languages. Most errors can be caught at compile time avoiding debugging hell.

There's not enough of a corpus out there for the LLMs to snarf up

Garbage in, garbage out. Code spewed by a random generator that has not the slightest understanding of what it is doing, whacked at by a hammer until it seems to be working.

What is this supposed to produce other than a mass of bugs and vulnerabilities? "A.I." is utter garbage and always will be, it is foolish to think otherwise.


AI allows more people to be more productive and therefore code more and produce more lines of code. That alone means more debugging needs to be done. When more people are doing anything, there is naturally more liability within that realm of action, simply because of larger participation in those actions.

AI tools are just that, tools. I’ve said this since the very beginning of LLMs. I’ve yet to see anything change my mind. Aider/Devin/Copilot/Cursor/etc, all the different flavors of LLM tools are great but if you don’t know what you’re doing they are going to get stuck in a loop/corner/bad-path. Sometimes it takes 2-6+ exchanges before you realize it’s lost the thread which is why I love Aider’s “auto git commit” feature (defaults to on). You can always jump back X steps if you realize the LLM is lost.

You also have to get a good feel for when it’s best if you make a change vs the LLM. Aider doesn’t handle new files and moving around massive chunks super well. It can do it, but if I want to rename something everywhere or break out components/types/etc into different files then I know I should be doing that in my IDE myself. Same for little syntax errors when a diff the LLM makes isn’t quite right.

I spent a few nights last week using LLMs to help build a chrome extension to match my Amazon transactions with my YNAB transactions for the purpose of updating the memo field in YNAB with the item names I bought from Amazon to speed up my categorization and serve as history of what I bought (previously I did this whole process manually). I think it really helped and made the whole process go much faster.

It really excels (for me) in UI. I’d like to think I’m pretty competent at writing code/logic but I’m not great at UI. In many projects I get bogged down when it comes to UI. If I get stuck coming up with a UI or I don’t like how something looks I can lose motivation to continue forward on it. With Aider I can ask for UI and while it might be abhorrent to a designer I think it looks pretty damn good (better than what I could do) and lets me focus on the logic. Aider also lets me try radical changes knowing I can easy reset back a few steps if it doesn’t work out.

I’ve said many times at work that a huge power of LLMs is taking something that would take 30-60min down to <5min, specifically around things like little scripts to investigate a problem or get more details. For example, I might have a log that I can see there is data in that I want to extract. I know I can write a chained/piped command of sed/awk/grep/cut/sort/uniq/etc but it’s going to take some trial and error as well as time. With an LLM I can bang out the full command in 1-3 exchanges.
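
As a made-up example of that kind of throwaway extraction, here is the same idea as a small Python script rather than a shell pipeline (the 'ERROR <code>' log format and regex are assumptions):

    import re
    from collections import Counter

    def top_error_codes(log_path: str, n: int = 10):
        """Stand-in for a grep | cut | sort | uniq -c pipeline: pull a code
        out of each matching line and count how often each one occurs."""
        pattern = re.compile(r"ERROR\s+(\S+)")
        counts = Counter()
        with open(log_path, errors="ignore") as f:
            for line in f:
                m = pattern.search(line)
                if m:
                    counts[m.group(1)] += 1
        return counts.most_common(n)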

Same deal with visualizing some piece of data in the logs (note: yes, we use Prometheus/Grafana, but not everything can go in there, and for new bugs/issues in the field I’m normally dealing with something we haven’t seen before and thus haven’t set up monitoring/alerting on). I’ve had LLMs churn out simple HTML/JS/CSS files that I can feed data into: “graph all instances of this happening if X > Y and time is between A and B”, etc.

Again, I can write this stuff from scratch but often don’t do it in practice because the ROI isn’t guaranteed. In the middle of a production issue do I want to waste 10-30+ min writing the script to see if I can prove a theory? No, it’s not worth it if it doesn’t pan out, but if I’m using an LLM and it takes me less than five minutes then I can throw a lot more stuff at the wall to see if it sticks.


Never believe the snake oil sellers

I’ve built and iterated a bunch of web applications with Claude in the past year—I think the author’s experience here was similar to some of my first tries, where I nearly just decided not to bother any further, but I’ve since come to see it as a massive accelerant as I’ve gotten used to the strengths and weaknesses. Quick thoughts on that:

1. It’s fun to use it to try unfamiliar languages and frameworks, but that exponentially increases the chance you get firmly stuck in a corner like OP’s deployment issue, where the AI can no longer figure it out and you find yourself needing to learn everything on the fly. I use a Django/Vue/Docker template repo that I’ve deployed many production apps from and know like the back of my hand, and I’m deeply familiar with each of the components of the stack.

2. Work in smaller chunks and keep it on a short leash. Agentic editors like Windsurf have a lot of promise but have the potential to make big sweeping messes in one go. I find the manual file context management of Aider to work pretty well. I think through the project structure I want and I ask it to implement it chunk by chunk—one or two moving pieces at a time. I work through it like I would pair programming with someone else at the keyboard: we take it step by step rather than giving a big upfront ask. This is still extremely fast because it’s less prone to big screwups. “Slow is smooth and smooth is fast.”

3. Don’t be afraid to undo everything it just did and re-prompt.

4. Use guidelines—I have had great success getting the AI to follow my desired patterns, e.g. how and where to make XHRs, by stubbing them in somewhere as an example or explicitly detailing them in a file.

5. Suggest the data structures and algorithms you want it to use. Design the software intentionally yourself. Tell it to make a module that does X with three classes that do A, B and C.

6. Let the AI do some gold plating: sometimes you gotta get in there and write the code yourself, but having an LLM assistant can help make it much more robust than I’d bother to in a PoC type project—thorough and friendly error handling, nice UI around data validation, extensive tests I’m less worried about maintaining, etc. There are lots of areas where I find myself able to do more and make better quality-oriented things even when I’m coding the core functionality myself.

7. Use frameworks and libraries the AI “knows” about. If your goal is speed, using something sufficiently mainstream that it has been trained on lots of examples helps a lot. That said, if something you’re using has had a major API change, you might struggle with it writing 1.0-style code even though you’re using 2.0.

8. Mix in other models. I’ve often had Claude back itself into a corner, only to loop in o1 via Aider’s architect mode and have it figure out the issue and tell Claude how to fix it.

9. Get a feel for what it’s good at in your domain—since I’m always ready to quickly roll back changes, I always go for the ambitious ask and see whether it can pull it off—sometimes it’s truly amazing in one shot! Other times it’s a mess and I undo it. Either way over time you get an intuition for when it will screw up. Just last week I was playing around with a project where I had a need to draw polygons over a photograph for debugging purposes. A nice to have on top of that was being able to add, delete, and drag to reshape them, but I never would have bothered coding it myself or pulling in a library just for that. I asked Claude for it, and got it in one shot.


The real revolution will be when an AI tool can just be powered by our laptop to use our own codebase as the input....

Until then it's just nonsense pretending to be something else...


So, an M3 MacBook with 64GiB of RAM running Deepseek R1 Zero in ollama prompted via aider?

Coding assistants do use your code base.


